Named Entity Recognition and transliteration in Bengali

Authors: Ekbal, Asif; Naskar, Sudip Kumar; Bandyopadhyay, Sivaji

Source: Lingvisticae Investigationes, Volume 30, Number 1, 2007 , pp. 95-114(20)

Publisher: John Benjamins Publishing Company

Buy & download fulltext article:

OR

Price: $37.41 plus tax (Refund Policy)

Abstract:

The paper reports about the development of a Named Entity Recognition (NER) system in Bengali using a tagged Bengali news corpus and the subsequent transliteration of the recognized Bengali Named Entities (NEs) into English. Three different models of the NER have been developed. A semi-supervised learning method has been adopted to develop the first two models, one without linguistic features (Model A) and the other with linguistic features (Model B). The third one (Model C) is based on statistical Hidden Markov Model. A modified joint-source channel model has been used along with a number of alternatives to generate the English transliterations of Bengali NEs and vice-versa. The transliteration models learn the mappings from the bilingual training sets optionally guided by linguistic knowledge in the form of conjuncts and diphthongs in Bengali and their representations in English. The NER system has demonstrated the highest average Recall, Precision and F-Score values of 89.62%, 78.67% and 83.79% respectively in Model C. Evaluation of the proposed transliteration models demonstrated that the modified joint source-channel model performs best in terms of evaluation metrics for person and location names for both Bengali to English (B2E) transliteration and English to Bengali transliteration (E2B). The use of the linguistic knowledge during training of the transliteration models improves performance.
Related content

Tools

Key

Free Content
Free content
New Content
New content
Open Access Content
Open access content
Subscribed Content
Subscribed content
Free Trial Content
Free trial content

Text size:

A | A | A | A
Share this item with others: These icons link to social bookmarking sites where readers can share and discover new web pages. print icon Print this page