IR like a SIR: Sense-enhanced Information Retrieval for Multiple Languages

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Abstract

With the advent of contextualized embeddings, attention towards neural ranking approaches for Information Retrieval increased considerably. However, two aspects have remained largely neglected: i) queries usually consist of few keywords only, which increases ambiguity and makes their contextualization harder, and ii) performing neural ranking on non-English documents is still cumbersome due to shortage of labeled datasets. In this paper we present SIR (Sense-enhanced Information Retrieval) to mitigate both problems by leveraging word sense information. At the core of our approach lies a novel multilingual query expansion mechanism based on Word Sense Disambiguation that provides sense definitions as additional semantic information for the query. Importantly, we use senses as a bridge across languages, thus allowing our model to perform considerably better than its supervised and unsupervised alternatives across French, German, Italian and Spanish languages on several CLEF benchmarks, while being trained on English Robust04 data only. We release SIR at https://github.com/SapienzaNLP/sir.
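The query expansion idea in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the sense inventory is hand-written, and the disambiguation step is a simplified Lesk-style word overlap rather than the Word Sense Disambiguation system used in SIR, which the abstract does not describe in detail. The function names `disambiguate` and `expand_query` are hypothetical.

```python
# Hypothetical, hand-written sense inventory (illustration only).
SENSE_INVENTORY = {
    "bank": [
        "a financial institution that accepts deposits and lends money",
        "sloping land beside a river or lake",
    ],
    "java": [
        "an island of Indonesia",
        "an object-oriented programming language",
    ],
}


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def disambiguate(word, context_words):
    """Pick the definition sharing the most words with the rest of the query.

    This is a simplified Lesk-style heuristic, used here as a stand-in
    for a real WSD model. Returns None for words not in the inventory.
    """
    senses = SENSE_INVENTORY.get(word)
    if not senses:
        return None
    context = set(context_words) - {word}
    # max() keeps the first best item, so ties fall back to the first sense.
    return max(senses, key=lambda d: len(context & set(tokenize(d))))


def expand_query(query):
    """Append the chosen sense definition of each known word to the query."""
    words = tokenize(query)
    definitions = []
    for word in words:
        definition = disambiguate(word, words)
        if definition is not None:
            definitions.append(definition)
    return " ".join([query] + definitions)
```

For instance, `expand_query("river bank erosion")` appends the river-side definition of "bank" because it shares the word "river" with the query. A short keyword query thus gains extra context that a ranking model can use.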

References used

https://aclanthology.org/

Term Selection for Query Expansion in Medical Cross-lingual Information Retrieval

3022 - Springer 2019 Research paper

We present a method for automatic query expansion for cross-lingual information retrieval in the medical domain. The method employs machine translation of source-language queries into a document language and linear regression to predict the retrieval performance for each translated query when expanded with a candidate term. Candidate terms (in the document language) come from multiple sources: query translation hypotheses obtained from the machine translation system, Wikipedia articles and PubMed abstracts. Query expansion is applied only when the model predicts a score for a candidate term that exceeds a tuned threshold, which restricts expansion to strongly related terms. Our experiments are conducted using the CLEF eHealth 2013-2015 test collection with seven source languages and also in the monolingual case. The results show significant improvements in both cross-lingual and monolingual settings.
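The threshold-based term selection described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the two features, the weights, the bias, and the threshold are made-up values, and the names `predict_score`, `select_expansion_terms`, and `expand` are hypothetical rather than taken from the paper.

```python
# Hypothetical linear model: in practice the weights would be learned by
# linear regression and the threshold tuned on held-out queries.
WEIGHTS = [0.6, 0.3]  # assumed features: [similarity to query, translation confidence]
BIAS = -0.2
THRESHOLD = 0.25


def predict_score(features):
    """Linear prediction of retrieval performance if the term is added."""
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


def select_expansion_terms(candidates, threshold=THRESHOLD):
    """Keep candidate terms whose predicted score exceeds the threshold."""
    return [
        term
        for term, features in candidates.items()
        if predict_score(features) > threshold
    ]


def expand(query, candidates):
    """Append the selected candidate terms to the translated query."""
    return " ".join([query] + select_expansion_terms(candidates))
```

With candidates `{"infarkt": [0.9, 0.8], "wetter": [0.1, 0.2]}`, only the first term clears the threshold, so the query is expanded with "infarkt" alone.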

Information retrieval, Machine translation, eHealth, Cross-lingual information retrieval

Introducing Information Retrieval for Biomedical Informatics Students

784 - Association for Computational Linguistics 2021 Article

Introducing biomedical informatics (BMI) students to natural language processing (NLP) requires balancing technical depth with practical know-how to address application-focused needs. We developed a set of three activities introducing introductory BMI students to information retrieval with NLP, covering document representation strategies and language models from TF-IDF to BERT. These activities provide students with hands-on experience targeted towards common use cases, and introduce fundamental components of NLP workflows for a wide variety of applications.

biomedical informatics students, biomedical informatics, introducing biomedical informatics

Improving Arabic Information Retrieval Results Semantically Using Ontology

2985 - Al-Baath University 2016 Research paper

This research proposes a new way to improve Arabic semantic search results by abstractively summarizing Arabic texts (abstractive summarization) using natural language processing (NLP) algorithms, Word Sense Disambiguation (WSD), and techniques for measuring semantic similarity in the Arabic WordNet ontology.

Natural Language Processing (NLP), Information Retrieval (IR), Abstractive Summarization, Ontology, Arabic WordNet (AWN), Conceptual Semantic Relation, Semantic Similarity, Semantic Analysis, Word Sense Disambiguation (WSD)

Disentangling Document Topic and Author Gender in Multiple Languages: Lessons for Adversarial Debiasing

580 - Association for Computational Linguistics 2021 Article

Text classification is a central tool in NLP. However, when the target classes are strongly correlated with other textual attributes, text classification models can pick up "wrong" features, leading to bad generalization and biases. In social media analysis, this problem surfaces for demographic user classes such as language, topic, or gender, which influence the generated text to a substantial extent. Adversarial training has been claimed to mitigate this problem, but thorough evaluation is missing. In this paper, we experiment with text classification of the correlated attributes of document topic and author gender, using a novel multilingual parallel corpus of TED talk transcripts. Our findings are: (a) individual classifiers for topic and author gender are indeed biased; (b) debiasing with adversarial training works for topic, but breaks down for author gender; (c) gender debiasing results differ across languages. We interpret the result in terms of feature space overlap, highlighting the role of linguistic surface realization of the target classes.

multiple languages, disentangling document topic, author gender

Neural Morphology Dataset and Models for Multiple Languages, from the Large to the Endangered

623 - Association for Computational Linguistics 2021 Article

We train neural models for morphological analysis, generation and lemmatization for morphologically rich languages. We present a method for automatically extracting a substantially large amount of training data from FSTs for 22 languages, out of which 17 are endangered. The neural models follow the same tagset as the FSTs in order to make it possible to use them as fallback systems together with the FSTs. The source code, models and datasets have been released on Zenodo.

neural morphology dataset, neural morphology
