بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Practice in Synonym Extraction at Large Scale

227 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Liangliang Cao

تاريخ النشر 2014

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Liangliang Cao - Chang Wang

الحساب واللغة

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Synonym extraction is an important task in natural language processing and often used as a submodule in query expansion, question answering and other applications. Automatic synonym extractor is highly preferred for large scale applications. Previous studies in synonym extraction are most limited to small scale datasets. In this paper, we build a large dataset with 3.4 million synonym/non-synonym pairs to capture the challenges in real world scenarios. We proposed (1) a new cost function to accommodate the unbalanced learning problem, and (2) a feature learning based deep neural network to model the complicated relationships in synonym pairs. We compare several different approaches based on SVMs and neural networks, and find out a novel feature learning based neural network outperforms the methods with hand-assigned features. Specifically, the best performance of our model surpasses the SVM baseline with a significant 97% relative improvement.

قيم البحث

292 - Chang Wang , Liangliang Cao , Bowen Zhou 2015

In this paper, we present a novel approach for medical synonym extraction. We aim to integrate the term embedding with the medical domain knowledge for healthcare applications. One advantage of our method is that it is very scalable. Experiments on a dataset with more than 1M term pairs show that the proposed approach outperforms the baseline approaches by a large margin.

الحساب واللغة

DocRED: A Large-Scale Document-Level Relation Extraction Dataset

86 - Yuan Yao , Deming Ye , Peng Li 2019

Multiple entities in a document generally exhibit complex inter-sentence relations, and cannot be well handled by existing relation extraction (RE) methods that typically focus on extracting intra-sentence relations for single entity pairs. In order to accelerate the research on document-level RE, we introduce DocRED, a new dataset constructed from Wikipedia and Wikidata with three features: (1) DocRED annotates both named entities and relations, and is the largest human-annotated dataset for document-level RE from plain text; (2) DocRED requires reading multiple sentences in a document to extract entities and infer their relations by synthesizing all information of the document; (3) along with the human-annotated data, we also offer large-scale distantly supervised data, which enables DocRED to be adopted for both supervised and weakly supervised scenarios. In order to verify the challenges of document-level RE, we implement recent state-of-the-art methods for RE and conduct a thorough evaluation of these methods on DocRED. Empirical results show that DocRED is challenging for existing RE methods, which indicates that document-level RE remains an open problem and requires further efforts. Based on the detailed analysis on the experiments, we discuss multiple promising directions for future research.

الحساب واللغة

LOME: Large Ontology Multilingual Extraction

58 - Patrick Xia , Guanghui Qin , Siddharth Vashishtha 2021

We present LOME, a system for performing multilingual information extraction. Given a text document as input, our core system identifies spans of textual entity and event mentions with a FrameNet (Baker et al., 1998) parser. It subsequently performs coreference resolution, fine-grained entity typing, and temporal relation prediction between events. By doing so, the system constructs an event and entity focused knowledge graph. We can further apply third-party modules for other types of annotation, like relation extraction. Our (multilingual) first-party modules either outperform or are competitive with the (monolingual) state-of-the-art. We achieve this through the use of multilingual encoders like XLM-R (Conneau et al., 2020) and leveraging multilingual training data. LOME is available as a Docker container on Docker Hub. In addition, a lightweight version of the system is accessible as a web demo.

الحساب واللغة

Biomedical Entity Representations with Synonym Marginalization

313 - Mujeen Sung , Hwisang Jeon , Jinhyuk Lee 2020

Biomedical named entities often play important roles in many biomedical text mining tools. However, due to the incompleteness of provided synonyms and numerous variations in their surface forms, normalization of biomedical entities is very challengin g. In this paper, we focus on learning representations of biomedical entities solely based on the synonyms of entities. To learn from the incomplete synonyms, we use a model-based candidate selection and maximize the marginal likelihood of the synonyms present in top candidates. Our model-based candidates are iteratively updated to contain more difficult negative samples as our model evolves. In this way, we avoid the explicit pre-selection of negative samples from more than 400K candidates. On four biomedical entity normalization datasets having three different entity types (disease, chemical, adverse reaction), our model BioSyn consistently outperforms previous state-of-the-art models almost reaching the upper bound on each dataset.

الحساب واللغة التعلم الآلي

Testing Interestingness Measures in Practice: A Large-Scale Analysis of Buying Patterns

53 - Martin Kirchgessner , Vincent Leroy , Sihem Amer-Yahia 2016

Understanding customer buying patterns is of great interest to the retail industry and has shown to benefit a wide variety of goals ranging from managing stocks to implementing loyalty programs. Association rule mining is a common technique for extra cting correlations such as people in the South of France buy rose wine or customers who buy pate also buy salted butter and sour bread. Unfortunately, sifting through a high number of buying patterns is not useful in practice, because of the predominance of popular products in the top rules. As a result, a number of interestingness measures (over 30) have been proposed to rank rules. However, there is no agreement on which measures are more appropriate for retail data. Moreover, since pattern mining algorithms output thousands of association rules for each product, the ability for an analyst to rely on ranking measures to identify the most interesting ones is crucial. In this paper, we develop CAPA (Comparative Analysis of PAtterns), a framework that provides analysts with the ability to compare the outcome of interestingness measures applied to buying patterns in the retail industry. We report on how we used CAPA to compare 34 measures applied to over 1,800 stores of Intermarche, one of the largest food retailers in France.

قواعد البيانات

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة حلوان

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Practice in Synonym Extraction at Large Scale

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً