Identifying the intent of a citation in scientific papers (e.g., background information, use of methods, comparing results) is critical for machine reading of individual publications and automated analysis of the scientific literature. We propose structural scaffolds, a multitask model that incorporates the structural information of scientific papers into citation classification for effective identification of citation intents. Our model achieves a new state of the art on an existing ACL anthology dataset (ACL-ARC) with a 13.3% absolute increase in F1 score, without relying on external linguistic resources or hand-engineered features as done in existing methods. In addition, we introduce a new dataset of citation intents (SciCite) that is more than five times larger than existing datasets and covers multiple scientific domains. Our code and data are available at: https://github.com/allenai/scicite.
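The scaffold idea lends itself to a compact implementation: a shared sentence encoder feeds one head for the citation-intent label and auxiliary heads for structural signals, such as the section a citation appears in or whether a sentence needs a citation at all. The sketch below is a minimal illustration of that multitask setup in PyTorch; the BiLSTM encoder, layer sizes, scaffold-loss weights, and class names are illustrative assumptions, not the authors' exact architecture.

```python
# A minimal sketch of the "structural scaffolds" multitask idea: a shared
# encoder with a main citation-intent head and auxiliary structural heads.
# All sizes and the BiLSTM choice are illustrative assumptions.
import torch
import torch.nn as nn

class ScaffoldModel(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=100, hidden_dim=50,
                 n_intents=3, n_sections=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        enc_dim = 2 * hidden_dim
        # Main task head: citation intent (e.g., background / method / result).
        self.intent_head = nn.Linear(enc_dim, n_intents)
        # Scaffold heads trained on cheap structural labels.
        self.section_head = nn.Linear(enc_dim, n_sections)  # section title
        self.worthiness_head = nn.Linear(enc_dim, 2)        # cite-worthy or not

    def forward(self, token_ids):
        h, _ = self.encoder(self.embed(token_ids))
        sent = h.mean(dim=1)  # simple mean pooling over token states
        return (self.intent_head(sent), self.section_head(sent),
                self.worthiness_head(sent))

model = ScaffoldModel()
tokens = torch.randint(0, 10000, (4, 20))  # a toy batch of token ids
intent_logits, section_logits, worthy_logits = model(tokens)
# Joint loss: main task plus down-weighted scaffold losses (weights assumed).
loss = (nn.functional.cross_entropy(intent_logits, torch.randint(0, 3, (4,)))
        + 0.1 * nn.functional.cross_entropy(section_logits, torch.randint(0, 5, (4,)))
        + 0.1 * nn.functional.cross_entropy(worthy_logits, torch.randint(0, 2, (4,))))
loss.backward()
```

The appeal of scaffold tasks of this kind is that their labels come for free from a paper's own structure, so the auxiliary heads add training signal without any extra annotation effort.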
Structural inequalities persist in society, conferring systematic advantages to some people at the expense of others, for example, by giving them substantially more influence and opportunities. Using bibliometric data about authors of scientific publications …
Previous work on text summarization in the scientific domain has mainly focused on the content of the input document, seldom considering its citation network. However, scientific papers are full of uncommon domain-specific terms, making it almost impossible …
The goal of our research is to understand how ideas propagate, combine, and are created in large social networks. In this work, we look at a sample of relevant scientific publications in the area of high-frequency analog circuit design and their citations …
In this paper, we introduce the use of Semantic Hashing as an embedding for the task of Intent Classification and achieve state-of-the-art performance on three frequently used benchmarks. Intent Classification on a small dataset is a challenging task for …
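Subword-level semantic hashing can be sketched in a few lines: hash the character trigrams of each boundary-marked token into a fixed-size count vector and train an ordinary classifier on top. The bucket count, the trigram scheme, and the logistic-regression classifier below are illustrative assumptions, not the paper's exact pipeline.

```python
# A minimal sketch of semantic hashing as a fixed-size text embedding:
# character trigrams of each token are hashed into count-vector buckets.
import hashlib
import numpy as np
from sklearn.linear_model import LogisticRegression

N_BUCKETS = 2048  # assumed hash-vector dimensionality

def semantic_hash(text, n_buckets=N_BUCKETS):
    """Hash character trigrams of each token into a count vector."""
    vec = np.zeros(n_buckets)
    for token in text.lower().split():
        padded = f"#{token}#"  # mark word boundaries
        for i in range(len(padded) - 2):
            trigram = padded[i:i + 3]
            # Stable hash so train and test share the same buckets.
            idx = int(hashlib.md5(trigram.encode()).hexdigest(), 16) % n_buckets
            vec[idx] += 1
    return vec

# Toy intent-classification data; the real benchmarks are far larger.
texts = ["book a flight to boston", "play some jazz music",
         "reserve a table for two", "turn up the volume"]
labels = [0, 1, 0, 1]
X = np.stack([semantic_hash(t) for t in texts])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict([semantic_hash("book me a flight")]))
```

Because unseen words still share subword trigrams with training words, the hashed representation degrades gracefully on small datasets, which is the setting the abstract targets.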
This paper investigates the effectiveness of pre-training for few-shot intent classification. While existing paradigms commonly further pre-train language models such as BERT on vast amounts of unlabeled text, we find it highly effective and efficient …
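In that spirit, the simplest strong baseline is to fine-tune an off-the-shelf BERT checkpoint directly on the handful of labeled utterances. The sketch below uses the Hugging Face transformers API; the model name, toy data, and hyperparameters are illustrative choices, not the paper's exact setup.

```python
# A minimal sketch of few-shot intent classification by directly fine-tuning
# a pre-trained BERT checkpoint on a tiny labeled support set.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# A 4-shot toy training set: two examples per intent.
texts = ["what's the weather tomorrow", "will it rain today",
         "set an alarm for 7am", "wake me up at six"]
labels = torch.tensor([0, 0, 1, 1])
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few passes over the tiny support set
    out = model(**batch, labels=labels)  # returns loss when labels are given
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```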