In this paper, we study the importance of context in predicting the citation worthiness of sentences in scholarly articles. We formulate this problem as a sequence labeling task and solve it with a hierarchical BiLSTM model. We contribute a new benchmark dataset containing over two million sentences and their corresponding labels. We preserve the sentence order in this dataset and perform document-level train/test splits, which, importantly, allows contextual information to be incorporated into the modeling process. We evaluate the proposed approach on three benchmark datasets. Our results quantify the benefits of using context and contextual embeddings for citation worthiness. Lastly, through error analysis, we provide insights into cases where context plays an essential role in predicting citation worthiness.
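A minimal plain-Python sketch of how sentence-level data could be organized for ordered, document-level splitting and sequence labeling. The record layout `(doc_id, sentence_index, sentence, label)`, the helper names `group_by_document`, `document_level_split` and `to_sequences`, and the toy sentences are hypothetical illustrations, not the authors' released code or dataset format. The sketch restores sentence order within each document and assigns whole documents to a single split, so each document can be passed to a sequence labeler (such as a hierarchical BiLSTM) as an intact sequence of sentences with one label per sentence.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical record layout: (doc_id, sentence_index, sentence, label),
# where label 1 = citation-worthy and 0 = not citation-worthy.
Record = Tuple[str, int, str, int]
# One document = its sentences in original order, each paired with a label.
Document = List[Tuple[str, int]]


def group_by_document(records: List[Record]) -> Dict[str, Document]:
    """Group sentences by document and restore their original order."""
    buckets: Dict[str, List[Tuple[int, str, int]]] = {}
    for doc_id, idx, sentence, label in records:
        buckets.setdefault(doc_id, []).append((idx, sentence, label))
    # Sorting by sentence_index keeps the order that context modeling needs.
    return {
        doc_id: [(sentence, label) for _, sentence, label in sorted(rows)]
        for doc_id, rows in buckets.items()
    }


def document_level_split(
    docs: Dict[str, Document], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[Dict[str, Document], Dict[str, Document]]:
    """Assign whole documents to train or test; no document is split."""
    doc_ids = sorted(docs)
    random.Random(seed).shuffle(doc_ids)
    n_test = max(1, round(len(doc_ids) * test_fraction))
    test_ids = set(doc_ids[:n_test])
    train = {d: docs[d] for d in doc_ids if d not in test_ids}
    test = {d: docs[d] for d in doc_ids if d in test_ids}
    return train, test


def to_sequences(
    docs: Dict[str, Document],
) -> Tuple[List[List[str]], List[List[int]]]:
    """Turn each document into one input sequence, one label per sentence."""
    sentences = [[s for s, _ in doc] for doc in docs.values()]
    labels = [[y for _, y in doc] for doc in docs.values()]
    return sentences, labels


# Toy usage with invented sentences (not from the paper's dataset).
records = [
    ("paperA", 1, "Prior work applied CNNs to this task.", 1),
    ("paperA", 0, "We study the role of context.", 0),
    ("paperB", 0, "BiLSTMs are widely used for tagging.", 1),
    ("paperC", 0, "Our results improve over the baseline.", 0),
    ("paperD", 0, "This follows earlier findings.", 1),
]
docs = group_by_document(records)
train, test = document_level_split(docs, test_fraction=0.25, seed=42)
train_x, train_y = to_sequences(train)
```

Splitting at the sentence level instead could place neighboring sentences of the same paper in both train and test, which leaks information and breaks the contiguous context a sequence model relies on.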