ﻻ يوجد ملخص باللغة العربية
Document subject classification is essential for structuring (digital) libraries and allowing readers to search within a specific field. Currently, the classification is typically made by human domain experts. Semi-supervised Machine Learning algorithms can support them by exploiting the labeled data to predict subject classes for unclassified new documents. However, while humans partly do, machines mostly do not explain the reasons for their decisions. Recently, explainable AI research to address the problem of Machine Learning decisions being a black box has increasingly gained interest. Explainer models have already been applied to the classification of natural language texts, such as legal or medical documents. Documents from Science, Technology, Engineering, and Mathematics (STEM) disciplines are more difficult to analyze, since they contain both textual and mathematical formula content. In this paper, we present first advances towards STEM document classification explainability using classical and mathematical Entity Linking. We examine relationships between textual and mathematical subject classes and entities, mining a collection of documents from the arXiv preprint repository (NTCIR and zbMATH dataset). The results indicate that mathematical entities have the potential to provide high explainability as they are a crucial part of a STEM document.
In this paper we propose a new document classification method, bridging discrepancies (so-called semantic gap) between the training set and the application sets of textual data. We demonstrate its superiority over classical text classification approa
Entity linking (or Normalization) is an essential task in text mining that maps the entity mentions in the medical text to standard entities in a given Knowledge Base (KB). This task is of great importance in the medical domain. It can also be used f
Domain specific information retrieval process has been a prominent and ongoing research in the field of natural language processing. Many researchers have incorporated different techniques to overcome the technical and domain specificity and provide
We present WISER, a new semantic search engine for expert finding in academia. Our system is unsupervised and it jointly combines classical language modeling techniques, based on text evidences, with the Wikipedia Knowledge Graph, via entity linking.
We introduce and make publicly available an entity linking dataset from Reddit that contains 17,316 linked entities, each annotated by three human annotators and then grouped into Gold, Silver, and Bronze to indicate inter-annotator agreement. We ana