Extracting Multilingual Lexicons from Parallel Corpora
Authors: Tufiş, Dan; Barbu, Ana; Ion, Radu
Source: Computers and the Humanities, Volume 38, Number 2, May 2004, pp. 163-189 (27 pages)
Publisher: Springer
Abstract:
The paper describes our recent developments in the automatic extraction of translation equivalents from parallel corpora. We describe three increasingly complex algorithms: a simple iterative baseline method and two more elaborate, non-iterative versions. While the baseline algorithm is described mainly for illustrative purposes, the non-iterative algorithms illustrate different working hypotheses, which may be motivated by the intended application and, to some extent, by the languages concerned. The first two algorithms rely on cross-lingual preservation of part of speech (POS), whereas the third does not require POS invariance as an extraction condition. The algorithms were evaluated on three different corpora and several language pairs.
Keywords: alignment; evaluation; lemmatization; tagging; translation equivalence
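The general idea of an iterative baseline that extracts POS-preserving translation equivalents from sentence-aligned, tagged and lemmatized text can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' algorithm: the association score (Dunning's log-likelihood ratio, G²), the mutual-best selection rule, the thresholds `min_count` and `min_score` (3.84 is the χ² critical value for one degree of freedom at p = 0.05), and the names `extract`, `g2`, `Unit` and `EXAMPLE` are all hypothetical. Each pass counts, per translation unit, co-occurrences of source/target lemma pairs with the same POS, keeps the pairs whose members are each other's best-scoring partner, removes them from the units where both occur, and repeats until no new pair passes the filters.

```python
import math
from collections import Counter

# Hypothetical token representation: (lemma, POS tag), e.g. ("house", "N").
Token = tuple[str, str]
Unit = tuple[list[Token], list[Token]]  # one aligned translation unit: (source, target)


def g2(n11: int, n1_: int, n_1: int, n: int) -> float:
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table.

    n11: units containing both words; n1_: units containing the source word;
    n_1: units containing the target word; n: total number of units.
    """
    cells = [
        (n11, n1_, n_1),                          # source present, target present
        (n1_ - n11, n1_, n - n_1),                # source present, target absent
        (n_1 - n11, n - n1_, n_1),                # source absent, target present
        (n - n1_ - n_1 + n11, n - n1_, n - n_1),  # both absent
    ]
    total = 0.0
    for observed, row, col in cells:
        if observed > 0:  # empty cells contribute 0 to the sum
            total += observed * math.log(observed / (row * col / n))
    return 2.0 * total


def extract(units: list[Unit], min_count: int = 2, min_score: float = 3.84,
            max_iter: int = 10) -> dict[tuple[Token, Token], float]:
    """Iteratively extract mutually best, POS-preserving translation pairs (a sketch)."""
    units = [(list(src), list(tgt)) for src, tgt in units]
    n = len(units)
    lexicon: dict[tuple[Token, Token], float] = {}
    for _ in range(max_iter):
        src_freq, tgt_freq, cooc = Counter(), Counter(), Counter()
        for src, tgt in units:
            s_set, t_set = set(src), set(tgt)
            src_freq.update(s_set)
            tgt_freq.update(t_set)
            for s in s_set:
                for t in t_set:
                    if s[1] == t[1]:  # candidate only if the POS is preserved
                        cooc[(s, t)] += 1
        scores = {}
        for (s, t), k in cooc.items():
            # keep frequent pairs that co-occur more often than chance predicts
            if k >= min_count and k * n > src_freq[s] * tgt_freq[t]:
                score = g2(k, src_freq[s], tgt_freq[t], n)
                if score >= min_score:
                    scores[(s, t)] = score
        # accept a pair only if each word is the other's best-scoring partner
        best_src: dict[Token, tuple[Token, float]] = {}
        best_tgt: dict[Token, tuple[Token, float]] = {}
        for (s, t), sc in scores.items():
            if s not in best_src or sc > best_src[s][1]:
                best_src[s] = (t, sc)
            if t not in best_tgt or sc > best_tgt[t][1]:
                best_tgt[t] = (s, sc)
        chosen = {(s, t) for s, (t, _) in best_src.items() if best_tgt[t][0] == s}
        if not chosen:
            break  # nothing new passed the filters: stop iterating
        for pair in chosen:
            lexicon[pair] = scores[pair]
        # remove the extracted words from every unit where both members occur
        pruned = []
        for src, tgt in units:
            hits = {(s, t) for s, t in chosen if s in src and t in tgt}
            drop_s = {s for s, _ in hits}
            drop_t = {t for _, t in hits}
            pruned.append(([x for x in src if x not in drop_s],
                           [y for y in tgt if y not in drop_t]))
        units = pruned
    return lexicon


# Hypothetical toy English-Romanian bitext (lemmatized and POS-tagged).
EXAMPLE: list[Unit] = [
    ([("house", "N"), ("big", "A")], [("casa", "N"), ("mare", "A")]),
    ([("house", "N"), ("small", "A")], [("casa", "N"), ("mica", "A")]),
    ([("dog", "N"), ("big", "A")], [("caine", "N"), ("mare", "A")]),
    ([("dog", "N"), ("small", "A")], [("caine", "N"), ("mica", "A")]),
]

if __name__ == "__main__":
    for (s, t), score in sorted(extract(EXAMPLE).items()):
        print(f"{s[0]}/{s[1]} -> {t[0]}/{t[1]}  G2={score:.2f}")
```

On the four toy units, the first pass pairs each English lemma with the Romanian lemma of the same POS that always co-occurs with it, and the second pass finds nothing left to extract. Because POS equality is a hard filter in this sketch, a cross-category pair such as house/mare is never even counted; that mirrors the POS-preservation condition of the first two algorithms, which, according to the abstract, the third algorithm drops.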
Document Type: Research article
DOI: http://dx.doi.org/10.1023/B:CHUM.0000031172.03949.48
Contact email (affiliation 1): tufis@racai.ro
Publication date: 2004-05-01
