TWO APPROACHES TO AUTOMATED TEXT ALIGNING OF PARALLEL FICTION TEXTS
Author: MIKHAIL Mikhailov
Source: Across Languages and Cultures, Volume 2, Number 1, 13 December 2001, pp. 87-96(10)
Publisher: Akademiai Kiado
Abstract:
Parallel text corpora supply researchers with data for multilingual lexicographic research, translation studies, and language typology. The objectives of the ParRus research project at the University of Tampere are to compile a Russian-Finnish parallel corpus and to develop the software for the maintenance of the corpus. Text aligning is the crucial problem in compiling parallel corpora. The study of parallel texts shows that, in most cases, the translator retains the paragraphs of the original in the translation. The Source Language-Target Language (SL-TL) quotient (the ratio of the number of words in the originals to the number of words in the translations) is also a stable value. The aligning programme developed at the Department compares the original with the translation, paragraph by paragraph, adding new paragraphs to the extracts being aligned until the extracts match the SL-TL quotient. The system only produces good results if the translation is structurally close to the original. However, the study of parallel texts also shows that the frequencies of words and of their translation equivalents do not usually match. Therefore, paragraphs and larger text units are the only elements of formal text structure which can be used for comparing parallel texts, unless knowledge structures are exploited.
Document Type: Research article
Publication date: 2001-12-13
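The paragraph-by-paragraph procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the ParRus programme itself: the function names, the greedy choice of which side to extend, and the `tolerance` and `max_span` parameters are all assumptions made for illustration, since the abstract only states that paragraphs are added to the extracts until they match the SL-TL quotient.

```python
def word_count(paragraphs):
    """Total number of whitespace-separated words in a list of paragraphs."""
    return sum(len(p.split()) for p in paragraphs)


def align_paragraphs(source, target, quotient, tolerance=0.25, max_span=4):
    """Greedily pair runs of source and target paragraphs.

    quotient  -- expected (source words) / (target words), the SL-TL quotient
    tolerance -- accepted relative deviation from the quotient (assumed value)
    max_span  -- most paragraphs allowed on one side of an extract (assumed)
    """
    pairs = []
    i = j = 0
    while i < len(source) and j < len(target):
        src_end, tgt_end = i + 1, j + 1
        while True:
            src_words = word_count(source[i:src_end])
            tgt_words = word_count(target[j:tgt_end])
            ratio = src_words / max(tgt_words, 1)
            if abs(ratio - quotient) <= tolerance * quotient:
                break  # the two extracts match the quotient
            if ratio < quotient:
                # Source extract is too short: try to add a source paragraph.
                if src_end < len(source) and src_end - i < max_span:
                    src_end += 1
                else:
                    break  # cannot extend further; accept the extract as is
            else:
                # Target extract is too short: try to add a target paragraph.
                if tgt_end < len(target) and tgt_end - j < max_span:
                    tgt_end += 1
                else:
                    break
        pairs.append((source[i:src_end], target[j:tgt_end]))
        i, j = src_end, tgt_end
    # Any paragraphs left over on either side form a final, unmatched extract.
    if i < len(source) or j < len(target):
        pairs.append((source[i:], target[j:]))
    return pairs


# Toy data: the second source paragraph was split in two by the "translator".
source = ["a b c d", "e f", "g h i j"]
target = ["A B C D", "E", "F", "G H I J"]
pairs = align_paragraphs(source, target, quotient=1.0)
for src, tgt in pairs:
    print(src, "<->", tgt)
```

In this sketch the quotient is passed in as a known constant; in practice it would have to be estimated from texts that are already aligned. As the abstract notes, an approach of this kind can only be expected to work when the translation is structurally close to the original.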