A Novel Approach for Automatic Bengali Question Answering System using Semantic Similarity Analysis


Abstract

Finding semantically accurate answers is one of the key challenges in advanced search. Unlike keyword-based search, what matters here is the meaning of the question or query, and answers are ranked by relevance. It is quite natural for a question sentence and its answer sentence to share almost no words. This paper describes an approach for finding semantically relevant answers in a Bengali dataset. In the first phase of the algorithm, a set of statistical parameters, such as frequency, index and part-of-speech (POS), is matched between a question and the candidate answers. In the second phase, entropy and similarity are calculated in separate modules. Finally, a sense score is generated to rank the answers. The algorithm is tested on a repository of 275,000 sentences. This Bengali repository is a product of the Technology Development for Indian Languages (TDIL) project, sponsored by the Government of India, and was provided by the Language Research Unit of the Indian Statistical Institute, Kolkata. The shallow parser developed by the LTRC group of IIIT Hyderabad is used for POS tagging. The correct answer is ranked 1st in 82.3% of cases and within the top five in 90.0% of cases. Based on the confusion matrix, the accuracy of the system is 97.32% and its precision is 98.14%. The challenges and pitfalls of the work are reported at the end of the paper.
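The abstract does not give the formulas behind the matching, entropy, similarity and sense-score modules. As a rough illustration only, the Python sketch below shows one generic way to combine per-candidate signals into a single score, rank the candidates, and compute the kinds of evaluation figures the abstract reports (top-k rank rates, and accuracy and precision from a confusion matrix). Everything in it is an assumption, not the paper's method: sentences are assumed to be already POS-tagged `(word, pos)` pairs, IDF-weighted word overlap and POS-tag cosine similarity are stand-ins for the paper's modules, the weights `W_OVERLAP` and `W_POS` are hypothetical, and the tokens, tags and counts in the example are toy values invented for illustration.

```python
import math
from collections import Counter

# Assumed input: a sentence is a list of (word, pos_tag) pairs, produced
# beforehand by an external POS tagger (no tagger is called here).
Sentence = list[tuple[str, str]]

# Hypothetical weights for the combined score; not taken from the paper.
W_OVERLAP, W_POS = 0.7, 0.3


def pos_cosine(a: Sentence, b: Sentence) -> float:
    """Cosine similarity between the POS-tag count vectors of two sentences."""
    ca, cb = Counter(t for _, t in a), Counter(t for _, t in b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def rank_candidates(question: Sentence, candidates: list[Sentence]) -> list[int]:
    """Return candidate indices sorted by descending combined score."""
    n = len(candidates)
    # Document frequency of each word across the candidate sentences.
    df = Counter(word for sent in candidates for word in {w for w, _ in sent})

    def weight(word: str) -> float:
        # Smoothed IDF; words absent from every candidate get the largest weight.
        return math.log((n + 1) / (df[word] + 1)) + 1.0

    q_words = {w for w, _ in question}
    total = sum(weight(w) for w in q_words)

    def score(cand: Sentence) -> float:
        c_words = {w for w, _ in cand}
        # IDF-weighted share of question words that also occur in the candidate.
        overlap = sum(weight(w) for w in q_words & c_words) / total if total else 0.0
        return W_OVERLAP * overlap + W_POS * pos_cosine(question, cand)

    return sorted(range(n), key=lambda i: score(candidates[i]), reverse=True)


def top_k_rate(gold_ranks: list[int], k: int) -> float:
    """Share of questions whose correct answer has a 1-based rank <= k."""
    return sum(r <= k for r in gold_ranks) / len(gold_ranks)


def accuracy_precision(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Standard confusion-matrix accuracy and precision."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    return accuracy, precision


if __name__ == "__main__":
    # Toy English-gloss tokens with illustrative tags (not real corpus data).
    question = [("when", "WQ"), ("tagore", "NNP"), ("nobel", "NN"), ("win", "VM")]
    candidates = [
        [("tagore", "NNP"), ("won", "VM"), ("nobel", "NN"), ("1913", "QC")],
        [("kolkata", "NNP"), ("is", "VM"), ("city", "NN")],
        [("tagore", "NNP"), ("wrote", "VM"), ("songs", "NN")],
    ]
    print(rank_candidates(question, candidates))  # → [0, 2, 1]
    print(top_k_rate([1, 1, 3, 7], 1), top_k_rate([1, 1, 3, 7], 5))  # → 0.5 0.75
```

A weighted linear combination keeps each signal's contribution easy to inspect; another signal, such as an entropy-based term, could be added inside `score` without touching the ranking or evaluation functions.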