ترغب بنشر مسار تعليمي؟ اضغط هنا

What did you Mention? A Large Scale Mention Detection Benchmark for Spoken and Written Text

96   0   0.0 ( 0 )
 نشر من قبل Yosi Mass
 تاريخ النشر 2018
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We describe a large, high-quality benchmark for the evaluation of Mention Detection tools. The benchmark contains annotations of both named entities as well as other types of entities, annotated on different types of text, ranging from clean text taken from Wikipedia, to noisy spoken data. The benchmark was built through a highly controlled crowd sourcing process to ensure its quality. We describe the benchmark, the process and the guidelines that were used to build it. We then demonstrate the results of a state-of-the-art system running on that benchmark.



قيم البحث

اقرأ أيضاً

Mention detection is an important preprocessing step for annotation and interpretation in applications such as NER and coreference resolution, but few stand-alone neural models have been proposed able to handle the full range of mentions. In this wor k, we propose and compare three neural network-based approaches to mention detection. The first approach is based on the mention detection part of a state of the art coreference resolution system; the second uses ELMO embeddings together with a bidirectional LSTM and a biaffine classifier; the third approach uses the recently introduced BERT model. Our best model (using a biaffine classifier) achieves gains of up to 1.8 percentage points on mention recall when compared with a strong baseline in a HIGH RECALL coreference annotation setting. The same model achieves improvements of up to 5.3 and 6.2 p.p. when compared with the best-reported mention detection F1 on the CONLL and CRAC coreference data sets respectively in a HIGH F1 annotation setting. We then evaluate our models for coreference resolution by using mentions predicted by our best model in start-of-the-art coreference systems. The enhanced model achieved absolute improvements of up to 1.7 and 0.7 p.p. when compared with our strong baseline systems (pipeline system and end-to-end system) respectively. For nested NER, the evaluation of our model on the GENIA corpora shows that our model matches or outperforms state-of-the-art models despite not being specifically designed for this task.
Resolving abstract anaphora is an important, but difficult task for text understanding. Yet, with recent advances in representation learning this task becomes a more tangible aim. A central property of abstract anaphora is that it establishes a relat ion between the anaphor embedded in the anaphoric sentence and its (typically non-nominal) antecedent. We propose a mention-ranking model that learns how abstract anaphors relate to their antecedents with an LSTM-Siamese Net. We overcome the lack of training data by generating artificial anaphoric sentence--antecedent pairs. Our model outperforms state-of-the-art results on shell noun resolution. We also report first benchmark results on an abstract anaphora subset of the ARRAU corpus. This corpus presents a greater challenge due to a mixture of nominal and pronominal anaphors and a greater range of confounders. We found model variants that outperform the baselines for nominal anaphors, without training on individual anaphor data, but still lag behind for pronominal anaphors. Our model selects syntactically plausible candidates and -- if disregarding syntax -- discriminates candidates using deeper features.
On the WikiSQL benchmark, state-of-the-art text-to-SQL systems typically take a slot-filling approach by building several dedicated models for each type of slots. Such modularized systems are not only complex butalso of limited capacity for capturing inter-dependencies among SQL clauses. To solve these problems, this paper proposes a novel extraction-linking approach, where a unified extractor recognizes all types of slot mentions appearing in the question sentence before a linker maps the recognized columns to the table schema to generate executable SQL queries. Trained with automatically generated annotations, the proposed method achieves the first place on the WikiSQL benchmark.
Neural models have achieved significant results on the text-to-SQL task, in which most current work assumes all the input questions are legal and generates a SQL query for any input. However, in the real scenario, users can input any text that may no t be able to be answered by a SQL query. In this work, we propose TriageSQL, the first cross-domain text-to-SQL question intention classification benchmark that requires models to distinguish four types of unanswerable questions from answerable questions. The baseline RoBERTa model achieves a 60% F1 score on the test set, demonstrating the need for further improvement on this task. Our dataset is available at https://github.com/chatc/TriageSQL.
While named entity recognition (NER) is a key task in natural language processing, most approaches only target flat entities, ignoring nested structures which are common in many scenarios. Most existing nested NER methods traverse all sub-sequences w hich is both expensive and inefficient, and also dont well consider boundary knowledge which is significant for nested entities. In this paper, we propose a joint entity mention detection and typing model via prior boundary knowledge (BoningKnife) to better handle nested NER extraction and recognition tasks. BoningKnife consists of two modules, MentionTagger and TypeClassifier. MentionTagger better leverages boundary knowledge beyond just entity start/end to improve the handling of nesting levels and longer spans, while generating high quality mention candidates. TypeClassifier utilizes a two-level attention mechanism to decouple different nested level representations and better distinguish entity types. We jointly train both modules sharing a common representation and a new dual-info attention layer, which leads to improved representation focus on entity-related information. Experiments over different datasets show that our approach outperforms previous state of the art methods and achieves 86.41, 85.46, and 94.2 F1 scores on ACE2004, ACE2005, and NNE, respectively.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا