Do you want to publish a course? Click here

Navigation-Based Candidate Expansion and Pretrained Language Models for Citation Recommendation

76   0   0.0 ( 0 )
 Added by Rodrigo Nogueira
 Publication date 2020
and research's language is English




Ask ChatGPT about the research

Citation recommendation systems for the scientific literature, to help authors find papers that should be cited, have the potential to speed up discoveries and uncover new routes for scientific exploration. We treat this task as a ranking problem, which we tackle with a two-stage approach: candidate generation followed by re-ranking. Within this framework, we adapt to the scientific domain a proven combination based on bag of words retrieval followed by re-scoring with a BERT model. We experimentally show the effects of domain adaptation, both in terms of pretraining on in-domain data and exploiting in-domain vocabulary. In addition, we introduce a novel navigation-based document expansion strategy to enrich the candidate documents processed by our neural models. On three different collections from different scientific disciplines, we achieve the best-reported results in the citation recommendation task.



rate research

Read More

Citation recommendation describes the task of recommending citations for a given text. Due to the overload of published scientific works in recent years on the one hand, and the need to cite the most appropriate publications when writing scientific texts on the other hand, citation recommendation has emerged as an important research topic. In recent years, several approaches and evaluation data sets have been presented. However, to the best of our knowledge, no literature survey has been conducted explicitly on citation recommendation. In this article, we give a thorough introduction into automatic citation recommendation research. We then present an overview of the approaches and data sets for citation recommendation and identify differences and commonalities using various dimensions. Last but not least, we shed light on the evaluation methods, and outline general challenges in the evaluation and how to meet them. We restrict ourselves to citation recommendation for scientific publications, as this document type has been studied the most in this area. However, many of the observations and discussions included in this survey are also applicable to other types of text, such as news articles and encyclopedic articles.
In this study, we address the challenges in developing a deep learning-based automatic patent citation recommendation system. Although deep learning-based recommendation systems have exhibited outstanding performance in various domains (such as movies, products, and paper citations), their validity in patent citations has not been investigated, owing to the lack of a freely available high-quality dataset and relevant benchmark model. To solve these problems, we present a novel dataset called PatentNet that includes textual information and metadata for approximately 110,000 patents from the Google Big Query service. Further, we propose strong benchmark models considering the similarity of textual information and metadata (such as cooperative patent classification code). Compared with existing recommendation methods, the proposed benchmark method achieved a mean reciprocal rank of 0.2377 on the test set, whereas the existing state-of-the-art recommendation method achieved 0.2073.
Citation recommendation is an important task to assist scholars in finding candidate literature to cite. Traditional studies focus on static models of recommending citations, which do not explicitly distinguish differences between papers that are caused by temporal variations. Although, some researchers have investigated chronological citation recommendation by adding time related function or modeling textual topics dynamically. These solutions can hardly cope with function generalization or cold-start problems when there is no information for user profiling or there are isolated papers never being cited. With the rise and fall of science paradigms, scientific topics tend to change and evolve over time. People would have the time preference when citing papers, since most of the theoretical basis exist in classical readings that published in old time, while new techniques are proposed in more recent papers. To explore chronological citation recommendation, this paper wants to predict the time preference based on user queries, which is a probability distribution of citing papers published in different time slices. Then, we use this time preference to re-rank the initial citation list obtained by content-based filtering. Experimental results demonstrate that task performance can be further enhanced by time preference and its flexible to be added in other citation recommendation frameworks.
Lawyers and judges spend a large amount of time researching the proper legal authority to cite while drafting decisions. In this paper, we develop a citation recommendation tool that can help improve efficiency in the process of opinion drafting. We train four types of machine learning models, including a citation-list based method (collaborative filtering) and three context-based methods (text similarity, BiLSTM and RoBERTa classifiers). Our experiments show that leveraging local textual context improves recommendation, and that deep neural models achieve decent performance. We show that non-deep text-based methods benefit from access to structured case metadata, but deep models only benefit from such access when predicting from context of insufficient length. We also find that, even after extensive training, RoBERTa does not outperform a recurrent neural model, despite its benefits of pretraining. Our behavior analysis of the RoBERTa model further shows that predictive performance is stable across time and citation classes.
The COVID-19 pandemic has sparked unprecedented mobilization of scientists, generating a deluge of papers that makes it hard for researchers to keep track and explore new directions. Search engines are designed for targeted queries, not for discovery of connections across a corpus. In this paper, we present SciSight, a system for exploratory search of COVID-19 research integrating two key capabilities: first, exploring associations between biomedical facets automatically extracted from papers (e.g., genes, drugs, diseases, patient outcomes); second, combining textual and network information to search and visualize groups of researchers and their ties. SciSight has so far served over $15K$ users with over $42K$ page views and $13%$ returns.
comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا