ترغب بنشر مسار تعليمي؟ اضغط هنا

Multi-level computational methods for interdisciplinary research in the HathiTrust Digital Library

82   0   0.0 ( 0 )
 نشر من قبل Jaimie Murdock
 تاريخ النشر 2017
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We show how faceted search using a combination of traditional classification systems and mixed-membership topic models can go beyond keyword search to inform resource discovery, hypothesis formulation, and argument extraction for interdisciplinary research. Our test domain is the history and philosophy of scientific work on animal mind and cognition. The methods can be generalized to other research areas and ultimately support a system for semi-automatic identification of argument structures. We provide a case study for the application of the methods to the problem of identifying and extracting arguments about anthropomorphism during a critical period in the development of comparative psychology. We show how a combination of classification systems and mixed-membership models trained over large digital libraries can inform resource discovery in this domain. Through a novel approach of drill-down topic modeling---simultaneously reducing both the size of the corpus and the unit of analysis---we are able to reduce a large collection of fulltext volumes to a much smaller set of pages within six focal volumes containing arguments of interest to historians and philosophers of comparative psychology. The volumes identified in this way did not appear among the first ten results of the keyword search in the HathiTrust digital library and the pages bear the kind of close reading needed to generate original interpretations that is the heart of scholarly work in the humanities. Zooming back out, we provide a way to place the books onto a map of science originally constructed from very different data and for different purposes. The multilevel approach advances understanding of the intellectual and societal contexts in which writings are interpreted.



قيم البحث

اقرأ أيضاً

In the social sciences, researchers search for information on the Web, but this is most often distributed on different websites, search portals, digital libraries, data archives, and databases. In this work, we present an integrated search system for social science information that allows finding information around research data in a single digital library. Users can search for research data sets, publications, survey variables, questions from questionnaires, survey instruments, and tools. Information items are linked to each other so that users can see, for example, which publications contain data citations to research data. The integration and linking of different kinds of information increase their visibility so that it is easier for researchers to find information for re-use. In a log-based usage study, we found that users search across different information types, that search sessions contain a high rate of positive signals and that link information is often explored.
Many digital libraries recommend literature to their users considering the similarity between a query document and their repository. However, they often fail to distinguish what is the relationship that makes two documents alike. In this paper, we mo del the problem of finding the relationship between two documents as a pairwise document classification task. To find the semantic relation between documents, we apply a series of techniques, such as GloVe, Paragraph-Vectors, BERT, and XLNet under different configurations (e.g., sequence length, vector concatenation scheme), including a Siamese architecture for the Transformer-based systems. We perform our experiments on a newly proposed dataset of 32,168 Wikipedia article pairs and Wikidata properties that define the semantic document relations. Our results show vanilla BERT as the best performing system with an F1-score of 0.93, which we manually examine to better understand its applicability to other domains. Our findings suggest that classifying semantic relations between documents is a solvable task and motivates the development of recommender systems based on the evaluated techniques. The discussions in this paper serve as first steps in the exploration of documents through SPARQL-like queries such that one could find documents that are similar in one aspect but dissimilar in another.
Position bias describes the tendency of users to interact with items on top of a list with higher probability than with items at a lower position in the list, regardless of the items actual relevance. In the domain of recommender systems, particularl y recommender systems in digital libraries, position bias has received little attention. We conduct a study in a real-world recommender system that delivered ten million related-article recommendations to the users of the digital library Sowiport, and the reference manager JabRef. Recommendations were randomly chosen to be shuffled or non-shuffled, and we compared click-through rate (CTR) for each rank of the recommendations. According to our analysis, the CTR for the highest rank in the case of Sowiport is 53% higher than expected in a hypothetical non-biased situation (0.189% vs. 0.123%). Similarly, in the case of Jabref the highest rank received a CTR of 1.276%, which is 87% higher than expected (0.683%). A chi-squared test confirms the strong relationship between the rank of the recommendation shown to the user and whether the user decided to click it (p < 0.01 for both Jabref and Sowiport). Our study confirms the findings from other domains, that recommendations in the top positions are more often clicked, regardless of their actual relevance.
We present an overview of the recently funded Merging Science and Cyberinfrastructure Pathways: The Whole Tale project (NSF award #1541450). Our approach has two nested goals: 1) deliver an environment that enables researchers to create a complete na rrative of the research process including exposure of the data-to-publication lifecycle, and 2) systematically and persistently link research publications to their associated digital scholarly objects such as the data, code, and workflows. To enable this, Whole Tale will create an environment where researchers can collaborate on data, workspaces, and workflows and then publish them for future adoption or modification. Published data and applications will be consumed either directly by users using the Whole Tale environment or can be integrated into existing or future domain Science Gateways.
Lifecycle models for research data are often abstract and simple. This comes at the danger of oversimplifying the complex concepts of research data management. The analysis of 90 different lifecycle models lead to two approaches to assess the quality of these models. While terminological issues make direct comparisons of models hard, an empirical evaluation seems possible.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا