Abstract
Systems for Open-Domain Question Answering (OpenQA) generally depend on a retriever for finding candidate passages in a large corpus and a reader for extracting answers from those passages. In much recent work, the retriever is a learned component that uses coarse-grained vector representations of questions and passages. We argue that this modeling choice is insufficiently expressive for dealing with the complexity of natural language questions. To address this, we define ColBERT-QA, which adapts the scalable neural retrieval model ColBERT to OpenQA. ColBERT creates fine-grained interactions between questions and passages. We propose an efficient weak supervision strategy that iteratively uses ColBERT to create its own training data. This greatly improves OpenQA retrieval on Natural Questions, SQuAD, and TriviaQA, and the resulting system attains state-of-the-art extractive OpenQA performance on all three datasets.
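The contrast between coarse-grained and fine-grained matching can be illustrated with a minimal sketch. The abstract does not spell out a scoring function; the code below assumes the late-interaction (MaxSim) operator that ColBERT is known for, where each question token is matched to its most similar passage token and the maxima are summed. The function names, the toy 2-dimensional embeddings, and the use of mean pooling as a stand-in for a single-vector retriever are all illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_pool(token_vecs: Sequence[Vector]) -> List[float]:
    """Collapse token vectors into one vector (assumed stand-in for single-vector encoders)."""
    n = len(token_vecs)
    dim = len(token_vecs[0])
    return [sum(v[i] for v in token_vecs) / n for i in range(dim)]


def single_vector_score(q_tokens: Sequence[Vector], p_tokens: Sequence[Vector]) -> float:
    """Coarse-grained score: cosine similarity between one pooled vector per text."""
    return dot(normalize(mean_pool(q_tokens)), normalize(mean_pool(p_tokens)))


def late_interaction_score(q_tokens: Sequence[Vector], p_tokens: Sequence[Vector]) -> float:
    """Fine-grained score: sum over question tokens of the best cosine match in the passage."""
    q = [normalize(v) for v in q_tokens]
    p = [normalize(v) for v in p_tokens]
    return sum(max(dot(qv, pv) for pv in p) for qv in q)


if __name__ == "__main__":
    # Hypothetical toy embeddings: two question tokens and two candidate passages.
    question = [[1.0, 0.0], [0.0, 1.0]]
    passage_a = [[1.0, 0.0], [0.0, 1.0]]  # matches each question token exactly
    passage_b = [[1.0, 1.0], [1.0, 1.0]]  # only matches the question "on average"

    for name, passage in (("A", passage_a), ("B", passage_b)):
        print(
            name,
            round(single_vector_score(question, passage), 4),
            round(late_interaction_score(question, passage), 4),
        )
```

In this toy setup the pooled vectors of both passages point in the same direction as the pooled question vector, so the single-vector score cannot separate them, while the token-level score prefers passage A. Real systems would obtain the token vectors from a trained encoder rather than hand-written lists.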
References used
https://aclanthology.org/
Scientific literature analysis needs fine-grained named entity recognition (NER) to provide a wide range of information for scientific discovery. For example, chemistry research needs to study dozens to hundreds of distinct, fine-grained entity types
State-of-the-art deep neural networks require large-scale labeled training data that is often expensive to obtain or not available for many tasks. Weak supervision in the form of domain-specific rules has been shown to be useful in such settings to a
One of the first building blocks to create a voice assistant relates to the task of tagging entities or attributes in user queries. This can be particularly challenging when entities number in the tens of millions, as is the case for, e.g., music catalogs
Humans can distinguish new categories very efficiently with few examples, largely because they can leverage knowledge obtained from relevant tasks. However, deep learning-based text classification models tend to struggle to achie
We present ReasonBert, a pre-training method that augments language models with the ability to reason over long-range relations and multiple, possibly hybrid contexts. Unlike existing pre-training methods that only harvest learning signals from local