
Scientific Dataset Discovery via Topic-level Recommendation

Published by Shichao Pei
Publication date: 2021
Research field: Informatics Engineering
Research language: English





Data-intensive research requires the support of appropriate datasets. However, it is often time-consuming to discover usable datasets matching a specific research topic. We formulate the dataset discovery problem on an attributed heterogeneous graph, which is composed of paper-paper citations, paper-dataset citations, and paper content. We propose to characterize both paper and dataset nodes by their commonly shared latent topics, rather than learning user and item representations via canonical graph embedding models, because the usage of datasets and the themes of research projects can be understood on the common basis of research topics. The datasets relevant to a given research project can then be inferred in the shared topic space. The experimental results show that our model can generate reasonable profiles for datasets and recommend proper datasets for a query, which represents a research project linked with several papers.
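The idea of inferring relevant datasets in a shared topic space can be illustrated with a minimal sketch. This is not the authors' model: it assumes that topic vectors for papers and datasets have already been learned by some means, and it uses simple averaging and cosine similarity as stand-ins for the inference step. All names (`query_profile`, `recommend`, `dataset_a`, and so on) and all numbers are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def query_profile(paper_topics):
    """Assumption: a research project is profiled by averaging its papers' topic vectors."""
    n = len(paper_topics)
    dims = len(paper_topics[0])
    return [sum(p[k] for p in paper_topics) / n for k in range(dims)]


def recommend(paper_topics, dataset_topics, top_k=2):
    """Rank datasets by cosine similarity to the query profile in the shared topic space."""
    q = query_profile(paper_topics)
    scored = [(name, cosine(q, vec)) for name, vec in dataset_topics.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up topic distributions over three latent topics.
papers = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
datasets = {
    "dataset_a": [0.8, 0.1, 0.1],
    "dataset_b": [0.1, 0.8, 0.1],
    "dataset_c": [0.1, 0.1, 0.8],
}
ranking = recommend(papers, datasets, top_k=2)
for name, score in ranking:
    print(name, round(score, 3))
```

In this toy setup the query leans toward the first topic, so the dataset whose profile is concentrated on that topic ranks first. A real system would obtain the topic vectors from the heterogeneous graph and paper content rather than from hand-written lists.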


Read also

Globally, recommendation services have become important because they support e-commerce applications and different research communities. Recommender systems have a large number of applications in many fields, including economics, education, and scientific research. Different empirical studies have shown that recommender systems are more effective and reliable than keyword-based search engines for extracting useful knowledge from massive amounts of data. The problem of recommending similar scientific articles in the scientific community is called scientific paper recommendation. Scientific paper recommendation aims to recommend new or classical articles that match researchers' interests. It has become an attractive area of study since the number of scholarly papers increases exponentially. In this survey, we first introduce the importance and advantages of paper recommender systems. Second, we review the recommendation algorithms and methods, such as content-based methods, collaborative filtering methods, graph-based methods, and hybrid methods. Then, we introduce the evaluation methods of different recommender systems. Finally, we summarize open issues in paper recommender systems, including cold start, sparsity, scalability, privacy, serendipity, and unified scholarly data standards. The purpose of this survey is to provide a comprehensive review of scholarly paper recommendation.
Yi Luan, 2018
As a research community grows, more and more papers are published each year. As a result, there is increasing demand for improved methods for finding relevant papers, automatically understanding the key ideas, and recommending potential methods for a target problem. Despite advances in search engines, it is still hard to identify new technologies according to a researcher's need. Due to the large variety of domains and extremely limited annotated resources, there has been relatively little work on leveraging natural language processing in scientific recommendation. In this proposal, we aim at making scientific recommendations by extracting scientific terms from a large collection of scientific papers and organizing the terms into a knowledge graph. In preliminary work, we trained a scientific term extractor using a small amount of annotated data and obtained state-of-the-art performance by leveraging a large amount of unannotated papers through multiple semi-supervised approaches. We propose to construct a knowledge graph in a way that makes minimal use of hand-annotated data, using only the extracted terms, unsupervised relational signals such as co-occurrence, and structural external resources such as Wikipedia. Latent relations between scientific terms can be learned from the graph. Recommendations will be made through graph inference for both observed and unobserved relational pairs.
Item-based collaborative filtering (ICF) enjoys the advantages of high recommendation accuracy and ease of online personalization, and is thus favored by industrial recommender systems. ICF recommends items to a target user based on their similarities to the items the user has previously interacted with. Great progress has been achieved for ICF in recent years by applying advanced machine learning techniques (e.g., deep neural networks) to learn the item similarity from data. The early methods simply treat all the historical items equally, while recent ones distinguish the different importance of items for a prediction. Despite the progress, we argue that those ICF models neglect the diverse intents of users in adopting items (e.g., watching a movie because of the director, leading actors, or the visual effects). As a result, they fail to estimate the item similarity at a finer-grained level to predict the user's preference for an item, resulting in sub-optimal recommendation. In this work, we propose a general factor-level attention method for ICF models. The key of our method is to distinguish the importance of different factors when computing the item similarity for a prediction. To demonstrate the effectiveness of our method, we design a light attention neural network to integrate both item-level and factor-level attention for neural ICF models. It is model-agnostic and easy to implement. We apply it to two baseline ICF models and evaluate its effectiveness on six public datasets. Extensive experiments show that the factor-level attention enhanced models consistently outperform their counterparts, demonstrating the potential of differentiating user intents at the factor level for ICF recommendation models.
One way to assess a certain aspect of the value of scientific research is to measure the attention it receives on social media. While previous research has mostly focused on the number of mentions of scientific research on social media, the current study applies topic networks to measure public attention to scientific research on Twitter. Topic networks are the networks of co-occurring author keywords in scholarly publications and networks of co-occurring hashtags in the tweets mentioning those scholarly publications. This study investigates which topics in opioid scholarly publications have received public attention on Twitter. Additionally, it investigates whether the topic networks generated from the publications tweeted by all accounts (bot and non-bot accounts) differ from those generated by non-bot accounts. Our analysis is based on a set of opioid scholarly publications from 2011 to 2019 and the tweets associated with them. We use co-occurrence network analysis to generate topic networks. Results indicated that Twitter users have mostly used generic terms to discuss opioid publications, such as Opioid, Pain, Addiction, Treatment, Analgesics, Abuse, Overdose, and Disorders. Results confirm that topic networks provide a legitimate method to visualize public discussions of health-related scholarly publications and how Twitter users discuss health-related scientific research differently from the scientific community. There was a substantial overlap between the topic networks based on the tweets by all accounts and non-bot accounts. This result indicates that it might not be necessary to exclude bot accounts when generating topic networks, as they have a negligible impact on the results.
Decision-making usually takes five steps: identifying the problem, collecting data, extracting evidence, identifying pro and con arguments, and making decisions. Focusing on extracting evidence, this paper presents a hybrid model that combines latent Dirichlet allocation and word embeddings to obtain external knowledge from structured and unstructured data. We study the task of sentence-level argument mining, as arguments mostly require some degree of world knowledge to be identified and understood. Given a topic and a sentence, the goal is to classify whether a sentence represents an argument in regard to the topic. We use a topic model to extract topic- and sentence-specific evidence from the structured knowledge base Wikidata, building a graph based on the cosine similarity between the entity word vectors of Wikidata and the vector of the given sentence. Also, we build a second graph based on topic-specific articles found via Google to tackle the general incompleteness of structured knowledge bases. Combining these graphs, we obtain a graph-based model which, as our evaluation shows, successfully capitalizes on both structured and unstructured data.