Using Supervised Learning to Classify Metadata of Research Data by Discipline of Research

Published by: Tobias Weber

Publication date: 2019

Research field: Informatics Engineering

Language: English

Authors: Tobias Weber, Dieter Kranzlmuller, Michael Fromm

Information Retrieval, Digital Libraries, Machine Learning

Automated classification of metadata of research data by their discipline(s) of research can be used in scientometric research, by repository service providers, and in the context of research data aggregation services. Openly available metadata of the DataCite index for research data were used to compile a large training and evaluation set comprising 609,524 records, which is published alongside this paper. These data allow classification approaches, such as tree-based models and neural networks, to be assessed reproducibly. According to our experiments with 20 base classes (multi-label classification), multi-layer perceptron models perform best with an F1-macro score of 0.760, closely followed by Long Short-Term Memory models (F1-macro score of 0.755). Possible applications of the trained classification models are the quantitative analysis of trends towards interdisciplinarity of digital scholarly output and the characterization of growth patterns of research data, stratified by discipline of research. Both applications perform at scale with the proposed models, which are available for re-use.
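
The F1-macro score reported in the abstract averages one F1 value per class, so rare disciplines weigh as much as frequent ones. The following is a minimal sketch of that metric for multi-label predictions, not the authors' evaluation code. The toy data (4 records, 3 hypothetical disciplines), the function names, and the convention of scoring a label with no true or predicted positives as 0.0 are all assumptions made for illustration.

```python
from typing import List, Sequence


def f1_per_label(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> List[float]:
    """Return one F1 score per label column of two binary indicator matrices."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
        denom = 2 * tp + fp + fn
        # Assumption: a label with no positives at all scores 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def f1_macro(y_true: Sequence[Sequence[int]],
             y_pred: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of the per-label F1 scores."""
    scores = f1_per_label(y_true, y_pred)
    return sum(scores) / len(scores)


# Toy data: rows are records, columns are hypothetical disciplines.
# A record may carry several labels at once (multi-label setting).
Y_TRUE = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
]
Y_PRED = [
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
]

if __name__ == "__main__":
    print(f1_per_label(Y_TRUE, Y_PRED))
    print(f1_macro(Y_TRUE, Y_PRED))
```

In this toy case the per-label scores are 1, 2/3 and 2/3, so the macro average is 7/9. The experiments in the paper use 20 base classes rather than three.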

Kristina M. Hettne, Harish Dharuri, Jun Zhao, 2013

One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. The preservation of the materials and methods of such computational experiments with clear annotations is essential for understanding an experiment, and this is increasingly recognized in the bioinformatics community. Our assumption is that offering means of digital, structured aggregation and annotation of the objects of an experiment will provide necessary meta-data for a scientist to understand and recreate the results of an experiment. To support this we explored a model for the semantic description of a workflow-centric Research Object (RO), where an RO is defined as a resource that aggregates other resources, e.g., datasets, software, spreadsheets, text, etc. We applied this model to a case study where we analysed human metabolite variation by workflows.

Genomics, Digital Libraries

SciSight: Combining faceted navigation and research group detection for COVID-19 exploratory scientific search

Tom Hope, Jason Portenoy, Kishore Vasan, 2020

The COVID-19 pandemic has sparked unprecedented mobilization of scientists, generating a deluge of papers that makes it hard for researchers to keep track and explore new directions. Search engines are designed for targeted queries, not for discovery of connections across a corpus. In this paper, we present SciSight, a system for exploratory search of COVID-19 research integrating two key capabilities: first, exploring associations between biomedical facets automatically extracted from papers (e.g., genes, drugs, diseases, patient outcomes); second, combining textual and network information to search and visualize groups of researchers and their ties. SciSight has so far served over 15K users with over 42K page views and 13% returns.

Information Retrieval, Digital Libraries, Human-Computer Interaction

A Semi-supervised Multi-task Learning Approach to Classify Customer Contact Intents

Li Dong, Matthew C. Spencer, Amir Biagi, 2021

In the area of customer support, understanding customers' intents is a crucial step. Machine learning plays a vital role in this type of intent classification. In reality, it is typical to collect confirmation from customer support representatives (CSRs) regarding the intent prediction, though asking CSRs to assign existing or new intents to the misclassified cases can incur unnecessary and prohibitive cost. Apart from the confirmed cases with and without intent labels, there can be a number of cases with no human curation. This data composition (Positives + Unlabeled + multiclass Negatives) creates unique challenges for model development. In response, we propose a semi-supervised multi-task learning paradigm. In this manuscript, we share our experience in building text-based intent classification models for a customer support service on an E-commerce website. We improve the performance significantly by evolving the model from multiclass classification to semi-supervised multi-task learning by leveraging the negative cases, domain- and task-adaptively pretrained ALBERT on customer contact texts, and a number of un-curated data with no labels. In the evaluation, the final model boosts the average AUC ROC by almost 20 points compared to the baseline finetuned multiclass classification ALBERT model.

Information Retrieval, Artificial Intelligence, Computation and Language

Thematic analysis of multiple sclerosis research by enhanced strategic diagram

Nazlahshaniza Shafina, Che Aishah Nazariah Ismaila, Mohd Zulkiflin Mustafa, 2021

This bibliometric review summarised the research trends and analysed research areas in multiple sclerosis (MS) over the last decade. The documents containing the term multiple sclerosis in the article title were retrieved from the Scopus database. We found a total of 18003 articles published in journals in the English language between 2012 and 2021. The emerging keywords identified utilising the enhanced strategic diagram were covid-19, teriflunomide, clinical trial, microglia, b cells, myelin, brain, white matter, functional connectivity, pain, employment, health-related quality of life, meta-analysis and comorbidity. In conclusion, this study demonstrates the tremendous growth of MS literature worldwide, which is expected to more than double during the next decade, especially in the identified emerging topics.

Neurons and Cognition, Digital Libraries

Assigning Creative Commons Licenses to Research Metadata: Issues and Cases

Marta Poblet, Amir Aryani, Paolo Manghi, 2016

This paper discusses the problem of lack of clear licensing and transparency of usage terms and conditions for research metadata. Making research data connected, discoverable and reusable is a key enabler of the new data revolution in research. We discuss how the lack of transparency hinders discovery of research data and disconnects it from the publication and other trusted research outcomes. In addition, we discuss the application of Creative Commons licenses for research metadata, and provide some examples of the applicability of this approach to internationally known data infrastructures.

Computers and Society