On the Relative Influence of Corpus and Dictionary Size in a Study Using Non-Parallel Corpora

We conducted an experiment on Japanese-to-German translation of two-part compound nouns via their components, using a small dictionary and a large Target Language (TL) corpus. As TL translation variants, we considered expressions containing adjectives or genitive adjuncts, as well as diverse forms for the first component of a German compound. Verification in a TL corpus is a good means of deciding among these forms, at least. Obtaining significant statistics from corpora requires large quantities of data. As parallel data are still quite scarce, using monolingual corpora instead is an option, but it requires the use of a dictionary. In our study, insufficient dictionary size was a much bigger obstacle than corpus size. We tried to quantify the relative influence of the two resources in order to assess system balance. We predict that a medium-sized dictionary of about 100,000 entries would give good coverage of compound noun components.
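
The general idea of translating a compound via its components and then checking the candidates against a TL corpus can be illustrated with a minimal sketch. This is not the authors' system: the dictionary entries, the first-component forms, the corpus text, and all function names below are hypothetical and invented for illustration, and the sketch covers only compound forms, not the adjective or genitive variants.

```python
from collections import Counter
from itertools import product

# Hypothetical toy dictionary (romanized Japanese component -> German nouns).
# Invented for illustration; not taken from the article.
TOY_DICTIONARY = {
    "gengo": ["Sprache"],
    "gakushuu": ["Lernen"],
}

# Hypothetical forms a German noun may take as the first compound component.
TOY_FIRST_FORMS = {
    "Sprache": ["Sprach", "Sprache", "Sprachen"],
}

# Invented stand-in for a large monolingual TL corpus.
TOY_CORPUS = (
    "Das Sprachenlernen macht Freude . Sprachenlernen braucht Zeit . "
    "Das Sprachlernen beginnt heute ."
).split()


def candidates(first_src, second_src):
    """Build German compound candidates from the two source components."""
    firsts = TOY_DICTIONARY.get(first_src, [])
    seconds = TOY_DICTIONARY.get(second_src, [])
    out = []
    for first, second in product(firsts, seconds):
        # Try each known form of the first component; fall back to the noun itself.
        for stem in TOY_FIRST_FORMS.get(first, [first]):
            out.append(stem + second.lower())
    return out


def rank_by_corpus(cands, corpus_tokens):
    """Keep candidates attested in the corpus, most frequent first."""
    counts = Counter(corpus_tokens)
    attested = [(c, counts[c]) for c in cands if counts[c] > 0]
    return sorted(attested, key=lambda pair: -pair[1])


ranked = rank_by_corpus(candidates("gengo", "gakushuu"), TOY_CORPUS)
print(ranked)
```

In this sketch a component missing from the dictionary yields no candidates at all, however large the corpus is, which mirrors the abstract's point that dictionary coverage can be the limiting resource.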

Document Type: Research Article


Publication date: 2001-08-01
