Term Selection for Query Expansion in Medical Cross-lingual Information Retrieval


Abstract in English

We present a method for automatic query expansion in cross-lingual information retrieval in the medical domain. The method uses machine translation to translate source-language queries into the document language, and linear regression to predict the retrieval performance of each translated query when it is expanded with a candidate term. Candidate terms (in the document language) come from multiple sources: query translation hypotheses obtained from the machine translation system, Wikipedia articles, and PubMed abstracts. A query is expanded with a candidate term only when the model's predicted score for that term exceeds a tuned threshold, so that queries are expanded with strongly related terms only. Our experiments are conducted on the CLEF eHealth 2013-2015 test collection. The results show significant improvements in both cross-lingual and monolingual settings.
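The selection step described in the abstract can be illustrated with a minimal sketch: fit a linear regression that predicts a retrieval score for a query expanded with a candidate term, then keep only candidates whose predicted score exceeds a threshold. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two features (similarity to the query, frequency in general-domain text), the training values, the threshold of 0.3, and the function names.

```python
from typing import Dict, List, Sequence


def fit_linear_regression(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    A bias column is prepended, so the result is [bias, w1, w2, ...].
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def predict(weights: Sequence[float], features: Sequence[float]) -> float:
    """Predicted retrieval score for a query expanded with one candidate term."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))


def expand_query(
    query_terms: Sequence[str],
    candidates: Dict[str, Sequence[float]],
    weights: Sequence[float],
    threshold: float,
) -> List[str]:
    """Append only the candidates whose predicted score exceeds the threshold."""
    selected = [t for t, feats in candidates.items() if predict(weights, feats) > threshold]
    return list(query_terms) + selected


# Hypothetical training data: (similarity to query, general-domain frequency)
# paired with an observed retrieval score of the expanded query.
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
train_y = [0.1, 0.6, -0.1, 0.4]
weights = fit_linear_regression(train_X, train_y)

# Hypothetical candidate terms in the document language.
candidates = {"myocardial": (0.9, 0.1), "weather": (0.1, 0.9)}
expanded = expand_query(["heart", "attack"], candidates, weights, threshold=0.3)
print(expanded)  # ['heart', 'attack', 'myocardial']
```

In this toy setting the threshold leaves a query unexpanded when no candidate is predicted to help, which mirrors the selective behaviour the abstract describes.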

