Term Selection for Query Expansion in Medical Cross-Lingual Information Retrieval


Abstract

We present a model for automatic query expansion in cross-lingual information retrieval in the medical domain. The model uses machine translation to translate each query from the source language into the language of the documents, and a linear regression function to predict the retrieval performance of each translated query when it is expanded with a candidate term. The candidate terms (in the document language) are drawn from multiple sources: the translation hypotheses for the query produced by the machine translation system, Wikipedia articles, and PubMed abstracts. A query is expanded only when the model predicts a score for a candidate term that exceeds a threshold tuned in advance, so that queries are expanded only with terms strongly related to them. Our experiments were run on the CLEF eHealth 2013-2015 test collections and show a marked improvement in both cross-lingual and monolingual retrieval.
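The selection step described in the abstract can be illustrated with a minimal sketch: fit a linear regression that maps candidate-term features to a retrieval score, then keep only the candidates whose predicted score exceeds a threshold. Everything specific here is an assumption for illustration and is not taken from the paper: the two toy features, the training values, the threshold of 0.4, and the function names `fit_linear_regression`, `predict_scores`, and `expand_query` are all hypothetical.

```python
import numpy as np


def fit_linear_regression(features, targets):
    """Ordinary least squares with an intercept; the bias is the last weight."""
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights


def predict_scores(weights, features):
    """Predicted retrieval score for each candidate term."""
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    return design @ weights


def expand_query(query_terms, candidates, candidate_features, weights, threshold):
    """Append only the candidates whose predicted score exceeds the threshold."""
    scores = predict_scores(weights, candidate_features)
    selected = [
        term
        for term, score in zip(candidates, scores)
        if score > threshold and term not in query_terms
    ]
    return query_terms + selected


# Hypothetical training data: two toy features per (query, candidate) pair and
# the retrieval score observed after expanding the query with that candidate.
train_features = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
train_targets = np.array([0.0, 0.5, 0.3, 0.8])
weights = fit_linear_regression(train_features, train_targets)

# Hypothetical translated query and candidate terms in the document language.
query = ["heart", "attack"]
candidates = ["myocardial", "weather", "infarction"]
candidate_features = np.array([[1.0, 1.0], [0.0, 0.0], [1.0, 0.0]])

expanded = expand_query(query, candidates, candidate_features, weights, threshold=0.4)
print(expanded)
```

In this sketch the threshold is a fixed constant. The abstract describes it as tuned in advance, which in practice would mean choosing it on held-out queries rather than hard-coding it.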

