Extracting Multilingual Lexicons from Parallel Corpora

Authors: Tufiş, Dan; Barbu, Ana; Ion, Radu

Source: Computers and the Humanities, Volume 38, Number 2, May 2004, pp. 163-189 (27 pages)

Publisher: Springer

Abstract:

The paper describes our recent work on the automatic extraction of translation equivalents from parallel corpora. We describe three increasingly complex algorithms: a simple iterative baseline and two more elaborate non-iterative versions. While the baseline algorithm is described mainly for illustrative purposes, the non-iterative algorithms illustrate the use of different working hypotheses, which may be motivated by different kinds of applications and, to some extent, by the languages concerned. The first two algorithms rely on cross-lingual POS preservation, whereas the third does not require POS invariance as an extraction condition. The algorithms were evaluated on three different corpora and several language pairs.
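To make the idea of an iterative, POS-preserving baseline more concrete, here is a minimal sketch of one generic way such an extractor could work. It is not the authors' algorithm: the data layout (sentence-aligned units of lemmatized, tagged `(lemma, POS)` tokens), the association score (Dunning's log-likelihood G²), the threshold of 3.84 (the chi-square critical value at p = 0.05, one degree of freedom), the "mutual best candidate" selection rule, and the names `log_likelihood` and `extract_equivalents` are all illustrative assumptions.

```python
import math
from collections import Counter
from itertools import product

# Assumed data layout (hypothetical): a token is a (lemma, POS) pair, and the
# bitext is a list of sentence-aligned units (source tokens, target tokens).
Token = tuple[str, str]


def log_likelihood(n11: int, n1_: int, n_1: int, n: int) -> float:
    """Dunning's G^2 statistic for a 2x2 co-occurrence contingency table."""
    cells = [
        (n11, n1_, n_1),                          # source and target together
        (n1_ - n11, n1_, n - n_1),                # source without target
        (n_1 - n11, n - n1_, n_1),                # target without source
        (n - n1_ - n_1 + n11, n - n1_, n - n_1),  # neither
    ]
    g2 = 0.0
    for observed, row, col in cells:
        if observed > 0:
            expected = row * col / n
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def extract_equivalents(bitext, threshold=3.84, max_iter=10):
    """Iteratively extract mutually-best, POS-preserving translation pairs (sketch)."""
    units = [(list(src), list(tgt)) for src, tgt in bitext]
    lexicon: dict[tuple[Token, Token], float] = {}
    for _ in range(max_iter):
        n = len(units)
        src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
        for src, tgt in units:
            s_set, t_set = set(src), set(tgt)
            src_freq.update(s_set)
            tgt_freq.update(t_set)
            for s, t in product(s_set, t_set):
                if s[1] == t[1]:  # POS preservation: only same-tag candidates
                    pair_freq[(s, t)] += 1
        best_for_src, best_for_tgt = {}, {}
        for (s, t), n11 in pair_freq.items():
            if n11 * n <= src_freq[s] * tgt_freq[t]:
                continue  # skip pairs that co-occur no more often than chance
            g2 = log_likelihood(n11, src_freq[s], tgt_freq[t], n)
            if g2 < threshold:
                continue
            if g2 > best_for_src.get(s, (None, -1.0))[1]:
                best_for_src[s] = (t, g2)
            if g2 > best_for_tgt.get(t, (None, -1.0))[1]:
                best_for_tgt[t] = (s, g2)
        # Keep only pairs where each side is the other's best-scoring candidate.
        new_pairs = {(s, t): g2 for s, (t, g2) in best_for_src.items()
                     if best_for_tgt[t][0] == s}
        if not new_pairs:
            break
        lexicon.update(new_pairs)
        # Remove extracted pairs from the units where they co-occur, then re-score.
        for src, tgt in units:
            for s, t in new_pairs:
                if s in src and t in tgt:
                    src[:] = [tok for tok in src if tok != s]
                    tgt[:] = [tok for tok in tgt if tok != t]
    return lexicon
```

On a toy English-Romanian bitext such as `[([("house", "N"), ("big", "A")], [("casa", "N"), ("mare", "A")]), ...]`, the POS filter keeps nouns paired with nouns and adjectives with adjectives, and removing each accepted pair lets weaker associations surface in later iterations. A non-iterative variant, or one that drops the `s[1] == t[1]` test, would be a natural point of comparison with the working hypotheses the abstract mentions.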

Keywords: alignment; evaluation; lemmatization; tagging; translation equivalence

Document Type: Research article

DOI: http://dx.doi.org/10.1023/B:CHUM.0000031172.03949.48

Affiliations: (1) email: tufis@racai.ro

Publication date: 2004-05-01
