
Fishing for variation: A digital search for unknown North/South differences in the grammar of Dutch
Abstract
Belgian Dutch (BD) and Netherlandic Dutch (ND) are known to exhibit phonetic and lexical differences, but national variation in the syntax of Dutch has often been claimed to be virtually non-existent. This view is rooted in the fact that both laypersons and researchers are oblivious to national divergences in the grammar of Dutch (unless they are categorical and/or heavily mediatized), but also in the largely unquestioned belief that BD and ND are different surface manifestations of ‘the same grammatical motor’. As a result, only a few syntactic phenomena have so far been shown to be sensitive to national constraints. In this paper we illustrate a computational bottom-up approach (pioneered in Bannard & Callison-Burch 2005) that casts the net as widely as possible. Building on statistical machine translation and a parallel corpus of Dutch translations of English subtitles, we identify plausible mappings between English n-grams and their Dutch translations. We do this in order to obtain paraphrases, i.e., stretches of interchangeable Dutch text that carry approximately the same meaning. In a first case study, the discovered paraphrases yielded corroborating evidence for many syntactic variables that have previously been attested in Dutch, including complementizer variation, existential er-variation, word order phenomena, and inflection variation. Crucially, we also discovered a number of alternations we had not anticipated as interesting variables. In order to detect national constraints on the newly found variables, we carried out a second experiment with a smaller corpus of Belgian and Netherlandic subtitles: the two variables we investigated in this light, deictic strength variation and subordination variation, did indeed manifest national sensitivity.
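The paraphrase step described in the abstract can be illustrated with a minimal sketch of pivot-based paraphrasing in the spirit of Bannard & Callison-Burch (2005), here with Dutch phrases pivoting through English: a candidate paraphrase is scored as the sum, over shared English translations, of P(English | Dutch source) times P(Dutch candidate | English). The phrase tables, their probabilities, and the function name `pivot_paraphrases` are hypothetical and invented for illustration; they are not taken from the paper's data or code.

```python
from collections import defaultdict

# Hypothetical toy phrase tables; all probabilities are invented for illustration.
# P(english | dutch): how likely each English phrase is as a translation of a Dutch phrase.
P_EN_GIVEN_NL = {
    "omdat": {"because": 0.9, "since": 0.1},
    "aangezien": {"since": 0.7, "because": 0.3},
    "want": {"because": 1.0},
}
# P(dutch | english): how likely each Dutch phrase is as a translation of an English phrase.
P_NL_GIVEN_EN = {
    "because": {"omdat": 0.6, "want": 0.3, "aangezien": 0.1},
    "since": {"aangezien": 0.6, "omdat": 0.4},
}


def pivot_paraphrases(source, p_en_given_nl, p_nl_given_en):
    """Rank Dutch paraphrase candidates for `source` by pivoting through English.

    score(candidate) = sum over English pivots e of
                       P(e | source) * P(candidate | e)
    """
    scores = defaultdict(float)
    for pivot, p_pivot in p_en_given_nl.get(source, {}).items():
        for candidate, p_back in p_nl_given_en.get(pivot, {}).items():
            if candidate != source:  # a phrase is not a paraphrase of itself
                scores[candidate] += p_pivot * p_back
    # Highest-scoring candidates first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = pivot_paraphrases("omdat", P_EN_GIVEN_NL, P_NL_GIVEN_EN)
for candidate, score in ranked:
    print(f"{candidate}\t{score:.3f}")
```

In a real setting the two tables would come from phrase alignments learned over the parallel subtitle corpus, and the ranked candidates would still need filtering before they could be treated as syntactic variables.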
Keywords: Syntactic variation; computational linguistics; machine translation; national variation; subtitles
Document Type: Research Article
Publication date: April 1, 2020
Nederlandse Taalkunde publishes contributions to the scholarly study of the Dutch language in the broadest sense. The journal aims to include contributions from as many subdisciplines of Dutch linguistics as possible, and from as many approaches within those subdisciplines as possible. All types of contribution (articles, squibs, and book reviews) may be written in English or Dutch. In addition to research articles, Nederlandse Taalkunde publishes overviews and discussions of contemporary topics in the field. Articles in Nederlandse Taalkunde become available in Open Access after a period of three years.