On the Relative Influence of Corpus and Dictionary Size in a Study Using Non-Parallel Corpora
Author: Cromm O.
Source: Journal of Quantitative Linguistics, Volume 8, Number 2, August 2001, pp. 137-148(12)
Abstract:
We did an experiment on Japanese-to-German translation of 2-part compound nouns via their components using a small dictionary and a large Target Language (TL) corpus. As TL translation variants, we considered expressions containing adjectives or genitive adjuncts, as well as diverse forms for the first component of a German compound. Verification in a TL corpus is a good means of deciding among these forms, at least. In order to get significant statistics from corpora, large data quantities are important. As parallel data are still quite scarce, using monolingual corpora instead is an option, but it requires the use of a dictionary. In our study, insufficient dictionary size was an obstacle much bigger than corpus size. We tried to quantify the relative influence of the two resources to assess system balance. We predict that a middle-sized dictionary of about 100,000 entries would give good coverage of compound noun components.
Document Type: Research article
DOI: 10.1076/jqul.8.2.137.4103