Tekstgenres analyseren op lexicale complexiteit met TScan (Open Access)

Using T-Scan to analyse the lexical complexity of text genres

T-Scan is a tool for the automatic analysis of Dutch text. This paper presents the first large-scale corpus analysis with T-Scan, focusing on lexical complexity. A collection of nearly 1000 text specimens was assembled, containing ten genres: travel blogs, celebrity news features, novels, textbooks for vocational secondary schools, textbooks for general secondary schools, news reports, opinion pieces, political programs, medical advice texts and research articles. The lexical complexity features in the analysis include morphology, word frequency, various word concreteness indices, personal pronouns, names and verb tense. Systematic genre differences are found, such that a genre detection model comprising 18 T-Scan features correctly identifies 83 percent of the corpus texts. Most lexical features differentiating genres intuitively relate to text topic complexity. A closer analysis is offered of the contrast between the two textbook samples in the corpus, which differ only in the educational levels they cater for. Again, topic variation seems a more important factor than stylistic variation. We demonstrate a new method to examine stylistic variation, which consists of within-genre comparisons using the genre prediction; more specifically, ‘deviant’ texts are compared to ‘typical’ members of their genre.

Keywords: automatic text analysis; corpus research; lexical complexity; readability; stylistic variation

Document Type: Research Article

Affiliations: 1: Henk Pander Maat is a senior researcher at the Utrecht Institute of Linguistics OTS, Utrecht University. 2: Nick Dekker studied Dutch and Communication and Organisation in Utrecht; he is now a web editor at Vitens.

Publication date: December 1, 2016

More about this publication
  • The Dutch-language journal Tijdschrift voor Taalbeheersing appears three times a year and offers scholars in the field a platform for publishing the results of academic research on the use of language and texts. The research reported addresses written as well as oral language skills, comprehensible and/or effective language use, the forms and functions of different text genres, and linguistic characteristics of communication. The journal welcomes research from a range of disciplines, including linguistics, text studies, conversation analysis, communication science, psychology, educational science, argumentation theory and rhetoric.