أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Malte Ostendorff

A Qualitative Evaluation of User Preference for Link-based vs. Text-based Recommendations of Wikipedia Articles

144 - Malte Ostendorff , Corinna Breitinger , Bela Gipp 2021

Literature recommendation systems (LRS) assist readers in the discovery of relevant content from the overwhelming amount of literature available. Despite the widespread adoption of LRS, there is a lack of research on the user-perceived recommendation characteristics for fundamentally different approaches to content-based literature recommendation. To complement existing quantitative studies on literature recommendation, we present qualitative study results that report on users perceptions for two contrasting recommendation classes: (1) link-based recommendation represented by the Co-Citation Proximity (CPA) approach, and (2) text-based recommendation represented by Lucenes MoreLikeThis (MLT) algorithm. The empirical data analyzed in our study with twenty users and a diverse set of 40 Wikipedia articles indicate a noticeable difference between text- and link-based recommendation generation approaches along several key dimensions. The text-based MLT method receives higher satisfaction ratings in terms of user-perceived similarity of recommended articles. In contrast, the CPA approach receives higher satisfaction scores in terms of diversity and serendipity of recommendations. We conclude that users of literature recommendation systems can benefit most from hybrid approaches that combine both link- and text-based approaches, where the users information needs and preferences should control the weighting for the approaches used. The optimal weighting of multiple approaches used in a hybrid recommendation system is highly dependent on a users shifting needs.

استرجاع المعلومات المكتبات الرقمية

Aspect-based Document Similarity for Research Papers

79 - Malte Ostendorff , Terry Ruas , Till Blume 2020

Traditional document similarity measures provide a coarse-grained distinction between similar and dissimilar documents. Typically, they do not consider in what aspects two documents are similar. This limits the granularity of applications like recomm ender systems that rely on document similarity. In this paper, we extend similarity with aspect information by performing a pairwise document classification task. We evaluate our aspect-based document similarity for research papers. Paper citations indicate the aspect-based similarity, i.e., the section title in which a citation occurs acts as a label for the pair of citing and cited paper. We apply a series of Transformer models such as RoBERTa, ELECTRA, XLNet, and BERT variations and compare them to an LSTM baseline. We perform our experiments on two newly constructed datasets of 172,073 research paper pairs from the ACL Anthology and CORD-19 corpus. Our results show SciBERT as the best performing system. A qualitative examination validates our quantitative results. Our findings motivate future research of aspect-based document similarity and the development of a recommender system based on the evaluated techniques. We make our datasets, code, and trained models publicly available.

الحساب واللغة استرجاع المعلومات

Contextual Document Similarity for Content-based Literature Recommender Systems

96 - Malte Ostendorff 2020

To cope with the ever-growing information overload, an increasing number of digital libraries employ content-based recommender systems. These systems traditionally recommend related documents with the help of similarity measures. However, current doc ument similarity measures simply distinguish between similar and dissimilar documents. This simplification is especially crucial for extensive documents, which cover various facets of a topic and are often found in digital libraries. Still, these similarity measures neglect to what facet the similarity relates. Therefore, the context of the similarity remains ill-defined. In this doctoral thesis, we explore contextual document similarity measures, i.e., methods that determine document similarity as a triple of two documents and the context of their similarity. The context is here a further specification of the similarity. For example, in the scientific domain, research papers can be similar with respect to their background, methodology, or findings. The measurement of similarity in regards to one or more given contexts will enhance recommender systems. Namely, users will be able to explore document collections by formulating queries in terms of documents and their contextual similarities. Thus, our research objective is the development and evaluation of a recommender system based on contextual similarity. The underlying techniques will apply established similarity measures and as well as neural approaches while utilizing semantic features obtained from links between documents and their text.

استرجاع المعلومات المكتبات الرقمية

Towards an Open Platform for Legal Information

100 - Malte Ostendorff , Till Blume , Saskia Ostendorff 2020

Recent advances in the area of legal information systems have led to a variety of applications that promise support in processing and accessing legal documents. Unfortunately, these applications have various limitations, e.g., regarding scope or exte nsibility. Furthermore, we do not observe a trend towards open access in digital libraries in the legal domain as we observe in other domains, e.g., economics of computer science. To improve open access in the legal domain, we present our approach for an open source platform to transparently process and access Legal Open Data. This enables the sustainable development of legal applications by offering a single technology stack. Moreover, the approach facilitates the development and deployment of new technologies. As proof of concept, we implemented six technologies and generated metadata for more than 250,000 German laws and court decisions. Thus, we can provide users of our platform not only access to legal documents, but also the contained information.

المكتبات الرقمية الحساب واللغة

Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles

87 - Malte Ostendorff , Terry Ruas , Moritz Schubotz 2020

Many digital libraries recommend literature to their users considering the similarity between a query document and their repository. However, they often fail to distinguish what is the relationship that makes two documents alike. In this paper, we mo del the problem of finding the relationship between two documents as a pairwise document classification task. To find the semantic relation between documents, we apply a series of techniques, such as GloVe, Paragraph-Vectors, BERT, and XLNet under different configurations (e.g., sequence length, vector concatenation scheme), including a Siamese architecture for the Transformer-based systems. We perform our experiments on a newly proposed dataset of 32,168 Wikipedia article pairs and Wikidata properties that define the semantic document relations. Our results show vanilla BERT as the best performing system with an F1-score of 0.93, which we manually examine to better understand its applicability to other domains. Our findings suggest that classifying semantic relations between documents is a solvable task and motivates the development of recommender systems based on the evaluated techniques. The discussions in this paper serve as first steps in the exploration of documents through SPARQL-like queries such that one could find documents that are similar in one aspect but dissimilar in another.

المكتبات الرقمية الحساب واللغة استرجاع المعلومات

Enriching BERT with Knowledge Graph Embeddings for Document Classification

123 - Malte Ostendorff , Peter Bourgonje , Maria Berger 2019

In this paper, we focus on the classification of books using short descriptive texts (cover blurbs) and additional metadata. Building upon BERT, a deep neural language model, we demonstrate how to combine text representations with metadata and knowle dge graph embeddings, which encode author information. Compared to the standard BERT approach we achieve considerably better results for the classification task. For a more coarse-grained classification using eight labels we achieve an F1- score of 87.20, while a detailed classification using 343 labels yields an F1-score of 64.70. We make the source code and trained models of our experiments publicly available

الحساب واللغة استرجاع المعلومات التعلم الآلي

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد