
Causal Knowledge Extraction from Scholarly Papers in Social Sciences

Publication date: 2020
Language: English





The scale and scope of scholarly articles today are overwhelming human researchers who seek to digest and synthesize knowledge in a timely manner. In this paper, we develop natural language processing (NLP) models to accelerate the extraction of relationships from scholarly papers in the social sciences, identify hypotheses in these papers, and extract the cause-and-effect entities. Specifically, we develop models to 1) classify sentences in scholarly documents in business and management as hypotheses (hypothesis classification), 2) classify these hypotheses as causal relationships or not (causality classification), and, if they are causal, 3) extract the cause and effect entities from these hypotheses (entity extraction). We achieve high performance on all three tasks using different modeling techniques. Our approach may generalize to scholarly documents across a wide range of the social sciences, as well as to other types of textual material.
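To make the three-stage pipeline concrete, here is a minimal sketch of how the stages could be chained. The abstract does not name its models, so the TF-IDF plus logistic regression classifiers below are hypothetical stand-ins, assumed to have been fit on labeled sentences beforehand; stage 3 (entity extraction) is noted but not implemented.

```python
# Minimal sketch of the pipeline: hypothesis classification (stage 1),
# then causality classification (stage 2). TF-IDF + logistic regression
# are stand-in models, not the paper's; both classifiers are assumed to
# be already fit on labeled sentence data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

hypothesis_clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                               LogisticRegression(max_iter=1000))
causality_clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                              LogisticRegression(max_iter=1000))

def causal_hypotheses(sentences):
    """Keep sentences classified as hypotheses (stage 1), then keep those
    classified as stating causal relationships (stage 2). Stage 3 would
    tag cause/effect spans inside the surviving hypotheses."""
    hyps = [s for s, y in zip(sentences, hypothesis_clf.predict(sentences)) if y == 1]
    if not hyps:
        return []
    return [h for h, y in zip(hyps, causality_clf.predict(hyps)) if y == 1]
```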



Related research

In this paper, we formulate keyphrase extraction from scholarly articles as a sequence labeling task solved using a BiLSTM-CRF, where the words in the input text are represented with deep contextualized embeddings. We evaluate the proposed architecture with both contextualized and fixed word embedding models on three benchmark datasets (Inspec, SemEval 2010, SemEval 2017) and compare it with popular existing unsupervised and supervised techniques. Our results quantify the benefits of (a) using contextualized embeddings (e.g., BERT) over fixed word embeddings (e.g., GloVe); (b) using a BiLSTM-CRF architecture with contextualized word embeddings over fine-tuning the contextualized word embedding model directly; and (c) using genre-specific contextualized embeddings (SciBERT). Through error analysis, we also provide insights into why particular models work better than others. Lastly, we present a case study in which we analyze different self-attention layers of the two best models (BERT and SciBERT) to better understand the predictions each makes for the task of keyphrase extraction.
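As a rough sketch of the architecture this abstract describes, the model below feeds frozen BERT token embeddings through a BiLSTM whose outputs score BIO keyphrase tags for a CRF layer. It relies on the third-party pytorch-crf package; the model name, hidden size, and tag count are illustrative choices, not the paper's configuration.

```python
# Hedged sketch of a BiLSTM-CRF tagger over fixed contextualized
# embeddings. Requires `transformers` and `pytorch-crf` (torchcrf).
import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF

class BiLSTMCRFTagger(nn.Module):
    def __init__(self, encoder_name="bert-base-cased", num_tags=3, hidden=256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        for p in self.encoder.parameters():   # freeze: fixed contextual
            p.requires_grad = False           # embeddings, no fine-tuning
        self.lstm = nn.LSTM(self.encoder.config.hidden_size, hidden,
                            batch_first=True, bidirectional=True)
        self.emit = nn.Linear(2 * hidden, num_tags)  # per-token tag scores
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        x = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.emit(self.lstm(x)[0])
        mask = attention_mask.bool()
        if tags is not None:                       # training: negative log-likelihood
            return -self.crf(emissions, tags, mask=mask)
        return self.crf.decode(emissions, mask=mask)  # inference: best tag paths
```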
Someone looking for a particular publication in the field of computer science is likely to use DBLP to find it. The DBLP data set is continuously extended with new publications, or rather their metadata, such as the names of the authors involved, the title, and the publication date. While the size of the data set is already remarkable, specific areas can still be improved. DBLP offers a huge collection of English papers, because most papers in computer science are published in English. Nevertheless, there are official publications in other languages that ought to be added to the data set; Japanese papers are one such category. This diploma thesis shows a way to automatically process publication lists of Japanese papers and prepare them for import into the DBLP data set, paying particular attention to problems that arise along the way, such as transcription handling and Personal Name Matching with Japanese names.
The COVID-19 pandemic has spawned a diverse body of scientific literature that is challenging to navigate, stimulating interest in automated tools to help find useful knowledge. We pursue the construction of a knowledge base (KB) of mechanisms -- a fundamental concept across the sciences encompassing activities, functions and causal relations, ranging from cellular processes to economic impacts. We extract this information from the natural language of scientific papers by developing a broad, unified schema that strikes a balance between relevance and breadth. We annotate a dataset of mechanisms with our schema and train a model to extract mechanism relations from papers. Our experiments demonstrate the utility of our KB in supporting interdisciplinary scientific search over COVID-19 literature, outperforming the prominent PubMed search in a study with clinical experts.
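The exact schema is defined in the paper; purely as a loose illustration, a record in such a KB of mechanisms might be represented as a typed relation between two free-text entity spans. The field names and the coarse direct/indirect distinction below are assumptions for illustration, not the authors' definitions.

```python
# Illustrative (assumed) record type for a KB of mechanisms: a coarse,
# typed relation between two free-text entity spans, plus provenance.
from dataclasses import dataclass
from typing import Literal

@dataclass(frozen=True)
class MechanismRelation:
    subject: str                              # e.g. "spike protein binding"
    object: str                               # e.g. "viral cell entry"
    relation: Literal["direct", "indirect"]   # coarse relation type (assumed)
    source: str                               # identifier of the source paper

kb = [MechanismRelation("spike protein binding", "viral cell entry",
                        "direct", "paper-0001")]  # hypothetical entry
```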
The broad coverage of the search for the Higgs boson in the mainstream media is a relative novelty for high energy physics (HEP) research, whose achievements have traditionally been limited to scholarly literature. This paper illustrates the results of a scientometric analysis of HEP computing in scientific literature, institutional media and the press, and a comparative overview of similar metrics concerning representative particle physics measurements. The picture emerging from these scientometric data documents the scientific impact and social perception of HEP computing. The results of this analysis suggest that improved communication of the scientific and social role of HEP computing would be beneficial to the high energy physics community.
Zeyi Wen, Zeyu Huang, Rui Zhang (2019)
Entity extraction is an important task in text mining and natural language processing. A popular method for entity extraction is to compare substrings from free text against a dictionary of entities. In this paper, we present several techniques that serve as a post-processing step to improve the effectiveness of an existing entity extraction technique. These techniques utilise models trained on web-scale corpora, which makes them robust and versatile. Experiments show that our techniques bring notable improvements in both efficiency and effectiveness.
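A bare-bones version of dictionary matching followed by a post-processing filter might look like the sketch below; the scoring function standing in for the web-corpus-trained models, and the 0.5 threshold, are assumptions made for illustration.

```python
# Sketch of dictionary-based entity extraction with a post-processing
# step. `score_fn`, standing in for a model trained on web-scale corpora,
# and the 0.5 threshold are illustrative assumptions.
def extract_entities(text, dictionary, score_fn, threshold=0.5):
    """Match dictionary entries as substrings of the text, then keep only
    candidates the scoring model judges plausible in context."""
    candidates = [(entry, text.find(entry)) for entry in dictionary if entry in text]
    return [(entry, pos) for entry, pos in candidates
            if score_fn(text, entry, pos) >= threshold]
```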