<doi_batch xmlns="http://www.crossref.org/schema/4.4.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="4.4.0"><head><doi_batch_id>3db4b26e-57b4-4ed9-8e1f-23e620908be4</doi_batch_id><timestamp>20210524042524439</timestamp><depositor><depositor_name>wseas:wseas</depositor_name><email_address>mdt@crossref.org</email_address></depositor><registrant>MDT Deposit</registrant></head><body><journal><journal_metadata language="en"><full_title>WSEAS TRANSACTIONS ON INFORMATION SCIENCE AND APPLICATIONS</full_title><issn media_type="electronic">2224-3402</issn><issn media_type="print">1790-0832</issn><archive_locations><archive name="Portico"/></archive_locations><doi_data><doi>10.37394/23209</doi><resource>http://wseas.org/wseas/cms.action?id=4046</resource></doi_data></journal_metadata><journal_issue><publication_date media_type="online"><month>3</month><day>2</day><year>2021</year></publication_date><publication_date media_type="print"><month>3</month><day>2</day><year>2021</year></publication_date><journal_volume><volume>18</volume><doi_data><doi>10.37394/23209.2021.18</doi><resource>https://www.wseas.org/cms.action?id=23296</resource></doi_data></journal_volume></journal_issue><journal_article language="en"><titles><title>POS-Tagging based Neural Machine Translation System for European Languages using Transformers</title></titles><contributors><person_name sequence="first" contributor_role="author"><given_name>Preetham</given_name><surname>Ganesh</surname><affiliation>University of Texas at Arlington, Arlington, TX, USA</affiliation></person_name><person_name sequence="additional" contributor_role="author"><given_name>Bharat S.</given_name><surname>Rawal</surname><affiliation>Gannon University, Erie, PA, USA</affiliation></person_name><person_name sequence="additional" contributor_role="author"><given_name>Alexander</given_name><surname>Peter</surname><affiliation>Softsquare, Silver Spring, MD, USA</affiliation></person_name><person_name 
sequence="additional" contributor_role="author"><given_name>Andi</given_name><surname>Giri</surname><affiliation>Softsquare, Toronto, ON, Canada</affiliation></person_name></contributors><jats:abstract xmlns:jats="http://www.ncbi.nlm.nih.gov/JATS1"><jats:p>Interaction between human beings has always faced difficulties, one of which is the language barrier. It would be a tedious task for someone to learn the vocabulary of a new language in a short period and converse with a native speaker without grammatical errors. Moreover, having a human translator at hand at all times would be intrusive and expensive. We propose a novel Neural Machine Translation (NMT) approach that uses interlanguage word similarity-based model training and Part-of-Speech (POS) tagging-based model testing. We compare these approaches using two classical architectures: a Luong Attention-based Sequence-to-Sequence model and a Transformer-based model. The sentences for the Luong Attention-based Sequence-to-Sequence model were tokenized using the SentencePiece tokenizer, while the sentences for the Transformer model were tokenized using the Subword Text Encoder. Three European languages were selected for modeling, namely Spanish, French, and German. The datasets were downloaded from multiple sources, such as the Europarl Corpus, ParaCrawl Corpus, and Tatoeba Project Corpus. 
Sparse Categorical Cross-Entropy served as the loss function during the training stage, while the Bilingual Evaluation Understudy (BLEU) score, Precision score, and Metric for Evaluation of Translation with Explicit Ordering (METEOR) score were the evaluation metrics during the testing stage.</jats:p></jats:abstract><publication_date media_type="online"><month>5</month><day>24</day><year>2021</year></publication_date><publication_date media_type="print"><month>5</month><day>24</day><year>2021</year></publication_date><pages><first_page>26</first_page><last_page>33</last_page></pages><ai:program xmlns:ai="http://www.crossref.org/AccessIndicators.xsd" name="AccessIndicators"><ai:free_to_read start_date="2021-05-24"/><ai:license_ref applies_to="am" start_date="2021-05-24">https://www.wseas.org/multimedia/journals/information/2021/a105109-004(2021).pdf</ai:license_ref></ai:program><archive_locations><archive name="Portico"/></archive_locations><doi_data><doi>10.37394/23209.2021.18.5</doi><resource>https://www.wseas.org/multimedia/journals/information/2021/a105109-004(2021).pdf</resource></doi_data><citation_list><citation key="ref0"><unstructured_citation>Ethnologue. How many languages are there in the world?, Feb 2020. </unstructured_citation></citation><citation key="ref1"><unstructured_citation>Worldometer. World population (live), Oct 2020. </unstructured_citation></citation><citation key="ref2"><unstructured_citation>Wikipedia contributors. List of languages by total number of speakers — Wikipedia, the free encyclopedia. [Online; accessed 21-September-2020]. </unstructured_citation></citation><citation key="ref3"><unstructured_citation>Wikipedia contributors. List of languages by the number of countries in which they are recognized as an official language — Wikipedia, the free encyclopedia, 2020. </unstructured_citation></citation><citation key="ref4"><unstructured_citation>Wikipedia contributors. Official language — Wikipedia, the free encyclopedia, 2020. 
</unstructured_citation></citation><citation key="ref5"><doi>10.1038/scientificamericanmind0718-5</doi><unstructured_citation>Dana Smith. At what age does our ability to learn a new language like a native speaker disappear?, May 2018. </unstructured_citation></citation><citation key="ref6"><unstructured_citation>Steffy Zameo. Neural machine translation: Tips &amp; advantages for digital translations — textmaster, May 2019. </unstructured_citation></citation><citation key="ref7"><unstructured_citation>Delip Rao. The real problems with neural machine translation, Jul 2018. </unstructured_citation></citation><citation key="ref8"><unstructured_citation>Wikipedia contributors. Statistical machine translation — Wikipedia, the free encyclopedia. [Online; accessed 21-September-2020]. </unstructured_citation></citation><citation key="ref9"><unstructured_citation>Wikipedia contributors. Neural machine translation — Wikipedia, the free encyclopedia. [Online; accessed 21-September-2020]. </unstructured_citation></citation><citation key="ref10"><unstructured_citation>Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. In Advances in neural information processing systems, pages 3104–3112, 2014. </unstructured_citation></citation><citation key="ref11"><unstructured_citation>Wikipedia contributors. Seq2seq — Wikipedia, the free encyclopedia, 2020. [Online; accessed 21-September-2020]. </unstructured_citation></citation><citation key="ref12"><doi>10.3115/v1/w14-4009</doi><unstructured_citation>Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014. </unstructured_citation></citation><citation key="ref13"><doi>10.18653/v1/d15-1166</doi><unstructured_citation>Minh-Thang Luong, Hieu Pham, and Christopher D Manning. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025, 2015. 
</unstructured_citation></citation><citation key="ref14"><doi>10.18653/v1/p16-1162</doi><unstructured_citation>Rico Sennrich, Barry Haddow, and Alexandra Birch. Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909, 2015. </unstructured_citation></citation><citation key="ref15"><doi>10.18653/v1/d18-2012</doi><unstructured_citation>Taku Kudo and John Richardson. SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 66–71. Association for Computational Linguistics, November 2018. </unstructured_citation></citation><citation key="ref16"><doi>10.1162/tacl_a_00065</doi><unstructured_citation>Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144, 2016. </unstructured_citation></citation><citation key="ref17"><doi>10.18653/v1/p17-1012</doi><unstructured_citation>Jonas Gehring, Michael Auli, David Grangier, and Yann N Dauphin. A convolutional encoder model for neural machine translation. arXiv preprint arXiv:1611.02344, 2016. </unstructured_citation></citation><citation key="ref18"><unstructured_citation>Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. arXiv preprint arXiv:1706.03762, 2017. </unstructured_citation></citation><citation key="ref19"><doi>10.1145/3321124</doi><unstructured_citation>Yongjing Yin, Jinsong Su, Huating Wen, Jiali Zeng, Yang Liu, and Yidong Chen. POS tag-enhanced coarse-to-fine attention for neural machine translation. ACM Trans. Asian Low-Resour. Lang. Inf. Process., 18(4), April 2019. 
</unstructured_citation></citation><citation key="ref20"><doi>10.18653/v1/w17-4708</doi><unstructured_citation>Jan Niehues and Eunah Cho. Exploiting linguistic resources for neural machine translation using multi-task learning, 2017. </unstructured_citation></citation><citation key="ref21"><unstructured_citation>Charles Kelly. English-Spanish sentences from the Tatoeba Project, 2020. [Online; accessed 20-September-2020]. </unstructured_citation></citation><citation key="ref22"><unstructured_citation>Philipp Koehn. Europarl: A parallel corpus for statistical machine translation. Citeseer, 2005. </unstructured_citation></citation><citation key="ref23"><unstructured_citation>ParaCrawl, 2018. </unstructured_citation></citation><citation key="ref24"><doi>10.3115/1118108.1118117</doi><unstructured_citation>Edward Loper and Steven Bird. NLTK: The Natural Language Toolkit. In Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics - Volume 1, ETMTNLP ’02, pages 63–70, USA, 2002. Association for Computational Linguistics. </unstructured_citation></citation><citation key="ref25"><unstructured_citation>Matthew Honnibal, Ines Montani, Sofie Van Landeghem, and Adriane Boyd. spaCy: Industrial-strength Natural Language Processing in Python, 2020. </unstructured_citation></citation><citation key="ref26"><unstructured_citation>J. F. Kolen and S. C. Kremer. Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies, pages 237–243. 2001. </unstructured_citation></citation><citation key="ref27"><unstructured_citation>Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. Understanding the exploding gradient problem. ArXiv, abs/1211.5063, 2012. </unstructured_citation></citation><citation key="ref28"><doi>10.3115/1073083.1073135</doi><unstructured_citation>Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: a method for automatic evaluation of machine translation. 
pages 311–318, 2002. </unstructured_citation></citation><citation key="ref29"><doi>10.3115/1626355.1626389</doi><unstructured_citation>Satanjeev Banerjee and Alon Lavie. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 65–72, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics. </unstructured_citation></citation><citation key="ref30"><doi>10.1145/3190508.3190551</doi><unstructured_citation>Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viegas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous distributed systems, 2016. </unstructured_citation></citation><citation key="ref31"><unstructured_citation>Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2017.</unstructured_citation></citation></citation_list></journal_article></journal></body></doi_batch>