English to Bengali Multimodal Neural Machine Translation using Transliteration-based Phrase Pairs Augmentation

Cite (ACL): Sahinur Rahman Laskar, Pankaj Dadure, Riyanka Manna, Partha Pakray, and Sivaji Bandyopadhyay. 2022. English to Bengali Multimodal Neural Machine Translation using Transliteration-based Phrase Pairs Augmentation. In Proceedings of the 9th Workshop on Asian Translation, pages 111–116, Gyeongju, Republic of Korea. International Conference on Computational Linguistics.

Anthology ID: 2022.wat-1.14
Volume: Proceedings of the 9th Workshop on Asian Translation
Month: October
Year: 2022
Address: Gyeongju, Republic of Korea
Venue: WAT
Publisher: International Conference on Computational Linguistics
Pages: 111–116
Bibkey: laskar-etal-2022-english

Abstract

Automatic translation of one natural language to another is a popular task of natural language processing. Although the deep learning-based technique known as neural machine translation (NMT) is a widely accepted machine translation approach, it needs an adequate amount of training data, which is a challenging issue for low-resource pair translation. Moreover, the multimodal concept utilizes text and visual features to improve low-resource pair translation. WAT2022 (Workshop on Asian Translation 2022), hosted by COLING 2022, organized an English to Bengali multimodal translation task, in which we participated as team CNLP-NITS-PP in two tracks: 1) text-only and 2) multimodal translation. Herein, we propose a transliteration-based phrase pairs augmentation approach that improves the multimodal translation task and achieves benchmark results on the Bengali Visual Genome 1.0 dataset. We attained the best results on the challenge and evaluation test sets for English to Bengali multimodal translation, with BLEU scores of 28.70 and 43.90 and RIBES scores of 0.688931 and 0.780669, respectively.

This paper reports the development of a Named Entity Recognition (NER) system in Bengali using a tagged Bengali news corpus and the subsequent transliteration of the recognized Bengali Named Entities (NEs) into English. Three different NER models have been developed. A semi-supervised learning method has been adopted to develop the first two models, one without linguistic features (Model A) and the other with linguistic features (Model B). The third (Model C) is based on a statistical Hidden Markov Model. A modified joint source-channel model has been used along with a number of alternatives to generate the English transliterations of Bengali NEs and vice versa. The transliteration models learn the mappings from the bilingual training sets, optionally guided by linguistic knowledge in the form of conjuncts and diphthongs in Bengali and their representations in English. The NER system demonstrated its highest average Recall, Precision and F-Score values of 89.62%, 78.67% and 83.79%, respectively, with Model C. Evaluation of the proposed transliteration models demonstrated that the modified joint source-channel model performs best in terms of evaluation metrics for person and location names for both Bengali to English (B2E) and English to Bengali (E2B) transliteration. The use of linguistic knowledge during training of the transliteration models improves performance.
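The Recall, Precision and F-Score figures reported for Model C can be checked against each other. The abstract does not say which F-Score formula was used, so the sketch below assumes the usual balanced F1, the harmonic mean of precision and recall. The function name f1_score is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-score (assumed here): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Model C averages as quoted in the abstract, in percent.
recall = 89.62
precision = 78.67

f_score = f1_score(precision, recall)
print(f"{f_score:.2f}")  # prints 83.79
```

Under that assumption the computed value rounds to the reported 83.79%, which suggests the three figures are mutually consistent.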