Abstract:
Statistical machine translation (SMT) is an approach to text translation grounded in information theory. Translating text from one natural language (Bangla) to another (Odia) with a learned model is challenging, and a central problem is word alignment. Word alignment determines which word of the source language (Bangla) maps to which word of the target language (Odia) according to a probability distribution. These probabilities are estimated iteratively over the word pairs in a parallel corpus. A bilingual dictionary and phrase-based translation are sometimes also required. The bilingual (parallel) corpus covers the agriculture domain and was collected from TDIL (Technology Development for Indian Languages, Govt. of India); 70% of the text is used for training and 30% for testing. Accuracy is computed on the test set from the confusion matrix, together with precision, recall, and F-score. This accuracy indicates the performance of the model, which is to be improved in future work. Accuracy is measured at four levels: word-wise, phrase-wise, sentence-wise, and paragraph-wise translation, which yield 0.92, 0.88, 0.85, and 0.83, respectively.
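The iterative estimation of word-alignment probabilities described above can be illustrated with a minimal sketch. It assumes an expectation-maximization procedure in the style of IBM Model 1 (without a NULL word), which the abstract does not name explicitly. The function names `train_alignment` and `align`, and the placeholder tokens `b1`..`b4` (standing in for Bangla words) and `o1`..`o4` (standing in for Odia words), are hypothetical and are not taken from the TDIL corpus.

```python
from collections import defaultdict

def train_alignment(pairs, iterations=20):
    """Estimate t(target | source) by EM, IBM Model 1 style (assumed method)."""
    src_vocab = {s for src, _ in pairs for s in src}
    tgt_vocab = {t for _, tgt in pairs for t in tgt}
    # Start from a uniform distribution over target words for every source word.
    t_prob = {(t, s): 1.0 / len(tgt_vocab) for s in src_vocab for t in tgt_vocab}
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of links per source word
        # E-step: distribute each target word over the source words of its sentence.
        for src, tgt in pairs:
            for t in tgt:
                norm = sum(t_prob[(t, s)] for s in src)
                for s in src:
                    frac = t_prob[(t, s)] / norm
                    count[(t, s)] += frac
                    total[s] += frac
        # M-step: renormalize the expected counts into probabilities.
        for (t, s) in t_prob:
            t_prob[(t, s)] = count[(t, s)] / total[s] if total[s] > 0 else 0.0
    return t_prob

def align(src, tgt, t_prob):
    """Link each target word to its most probable source word in the sentence."""
    return [(t, max(src, key=lambda s: t_prob[(t, s)])) for t in tgt]

# Hypothetical toy parallel corpus: (source tokens, target tokens).
pairs = [
    (["b1", "b2"], ["o1", "o2"]),
    (["b1", "b3"], ["o1", "o3"]),
    (["b4", "b3"], ["o4", "o3"]),
]
t_prob = train_alignment(pairs)
alignment = align(["b1", "b2"], ["o1", "o2"], t_prob)
print(alignment)
```

Words that co-occur consistently across sentence pairs accumulate probability mass with each iteration, so the alignment sharpens even though no dictionary is supplied. A system of the kind described would operate on a far larger corpus and may combine these probabilities with a bilingual dictionary and phrase-based translation.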