
On the Relative Influence of Corpus and Dictionary Size in a Study Using Non-Parallel Corpora

We conducted an experiment on Japanese-to-German translation of two-part compound nouns via their components, using a small dictionary and a large target language (TL) corpus. As TL translation variants, we considered expressions containing adjectives or genitive adjuncts, as well as diverse forms for the first component of a German compound. Verification in a TL corpus is a good means of deciding among these forms, at least. Large quantities of data are needed to obtain significant statistics from corpora. As parallel data are still quite scarce, using monolingual corpora instead is an option, but it requires the use of a dictionary. In our study, insufficient dictionary size was a much bigger obstacle than corpus size. We tried to quantify the relative influence of the two resources in order to assess system balance. We predict that a medium-sized dictionary of about 100,000 entries would give good coverage of compound noun components.
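
The generate-and-verify idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the function names, the list of linking elements, the toy dictionary, and the toy corpus are all assumptions made for illustration, and only the compound variant is covered (not the adjective or genitive variants). Each component is looked up in a bilingual dictionary, candidate German compounds are formed with different forms of the first component, and the candidates are ranked by how often they occur in a TL corpus.

```python
from collections import Counter
from itertools import product

# Hypothetical linking elements appended to the first component
# (an illustrative assumption, not a list taken from the study).
LINKERS = ["", "s", "n", "en", "es"]


def candidates(first, second, dictionary):
    """Yield German compound candidates for a two-part source compound."""
    first_translations = dictionary.get(first, [])
    second_translations = dictionary.get(second, [])
    for a, b in product(first_translations, second_translations):
        for link in LINKERS:
            # The second component is lowercased inside the compound.
            yield a + link + b.lower()


def rank(first, second, dictionary, corpus_counts):
    """Return candidates attested in the corpus, most frequent first."""
    scored = {c: corpus_counts.get(c, 0)
              for c in candidates(first, second, dictionary)}
    attested = [(c, n) for c, n in scored.items() if n > 0]
    return sorted(attested, key=lambda item: (-item[1], item[0]))


# Toy resources, invented for this example only.
toy_dictionary = {"労働": ["Arbeit"], "市場": ["Markt"]}
toy_corpus = Counter(
    "der Arbeitsmarkt ist stabil und der Arbeitsmarkt wächst".split()
)

print(rank("労働", "市場", toy_dictionary, toy_corpus))
print(rank("労働", "価格", toy_dictionary, toy_corpus))
```

In this sketch the second call returns an empty list because one component is missing from the toy dictionary. No amount of corpus data can compensate in that case, which mirrors the abstract's observation that dictionary size was the larger obstacle.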

Document Type: Research Article


Publication date: August 1, 2001

