No Arabic abstract
Position bias describes the tendency of users to interact with items on top of a list with higher probability than with items at a lower position in the list, regardless of the items actual relevance. In the domain of recommender systems, particularly recommender systems in digital libraries, position bias has received little attention. We conduct a study in a real-world recommender system that delivered ten million related-article recommendations to the users of the digital library Sowiport, and the reference manager JabRef. Recommendations were randomly chosen to be shuffled or non-shuffled, and we compared click-through rate (CTR) for each rank of the recommendations. According to our analysis, the CTR for the highest rank in the case of Sowiport is 53% higher than expected in a hypothetical non-biased situation (0.189% vs. 0.123%). Similarly, in the case of Jabref the highest rank received a CTR of 1.276%, which is 87% higher than expected (0.683%). A chi-squared test confirms the strong relationship between the rank of the recommendation shown to the user and whether the user decided to click it (p < 0.01 for both Jabref and Sowiport). Our study confirms the findings from other domains, that recommendations in the top positions are more often clicked, regardless of their actual relevance.
In the social sciences, researchers search for information on the Web, but this is most often distributed on different websites, search portals, digital libraries, data archives, and databases. In this work, we present an integrated search system for social science information that allows finding information around research data in a single digital library. Users can search for research data sets, publications, survey variables, questions from questionnaires, survey instruments, and tools. Information items are linked to each other so that users can see, for example, which publications contain data citations to research data. The integration and linking of different kinds of information increase their visibility so that it is easier for researchers to find information for re-use. In a log-based usage study, we found that users search across different information types, that search sessions contain a high rate of positive signals and that link information is often explored.
Many recommendation algorithms are available to digital library recommender system operators. The effectiveness of algorithms is largely unreported by way of online evaluation. We compare a standard term-based recommendation approach to two promising approaches for related-article recommendation in digital libraries: document embeddings, and keyphrases. We evaluate the consistency of their performance across multiple scenarios. Through our recommender-as-a-service Mr. DLib, we delivered 33.5M recommendations to users of Sowiport and Jabref over the course of 19 months, from March 2017 to October 2018. The effectiveness of the algorithms differs significantly between Sowiport and Jabref (Wilcoxon rank-sum test; p < 0.05). There is a ~400% difference in effectiveness between the best and worst algorithm in both scenarios separately. The best performing algorithm in Sowiport (terms) is the worst performing in Jabref. The best performing algorithm in Jabref (keyphrases) is 70% worse in Sowiport, than Sowiport`s best algorithm (click-through rate; 0.1% terms, 0.03% keyphrases).
We show how faceted search using a combination of traditional classification systems and mixed-membership topic models can go beyond keyword search to inform resource discovery, hypothesis formulation, and argument extraction for interdisciplinary research. Our test domain is the history and philosophy of scientific work on animal mind and cognition. The methods can be generalized to other research areas and ultimately support a system for semi-automatic identification of argument structures. We provide a case study for the application of the methods to the problem of identifying and extracting arguments about anthropomorphism during a critical period in the development of comparative psychology. We show how a combination of classification systems and mixed-membership models trained over large digital libraries can inform resource discovery in this domain. Through a novel approach of drill-down topic modeling---simultaneously reducing both the size of the corpus and the unit of analysis---we are able to reduce a large collection of fulltext volumes to a much smaller set of pages within six focal volumes containing arguments of interest to historians and philosophers of comparative psychology. The volumes identified in this way did not appear among the first ten results of the keyword search in the HathiTrust digital library and the pages bear the kind of close reading needed to generate original interpretations that is the heart of scholarly work in the humanities. Zooming back out, we provide a way to place the books onto a map of science originally constructed from very different data and for different purposes. The multilevel approach advances understanding of the intellectual and societal contexts in which writings are interpreted.
Log analysis in Web search showed that user sessions often contain several different topics. This means sessions need to be segmented into parts which handle the same topic in order to give appropriate user support based on the topic, and not on a mixture of topics. Different methods have been proposed to segment a user session to different topics based on timeouts, lexical analysis, query similarity or external knowledge sources. In this paper, we study the problem in a digital library for the social sciences. We present a method based on a thesaurus and a classification system which are typical knowledge organization systems in digital libraries. Five experts evaluated our approach and rated it as good for the segmentation of search sessions into parts that treat the same topic.
By combining data from the text, citation, and reference databases with data from the ADS readership logs we have been able to create Second Order Bibliometric Operators, a customizable class of collaborative filters which permits substantially improved accuracy in literature queries. Using the ADS usage logs along with membership statistics from the International Astronomical Union and data on the population and gross domestic product (GDP) we develop an accurate model for world-wide basic research where the number of scientists in a country is proportional to the GDP of that country, and the amount of basic research done by a country is proportional to the number of scientists in that country times that countrys per capita GDP. We introduce the concept of utility time to measure the impact of the ADS/URANIA and the electronic astronomical library on astronomical research. We find that in 2002 it amounted to the equivalent of 736 FTE researchers, or $250 Million, or the astronomical research done in France. Subject headings: digital libraries; bibliometrics; sociology of science; information retrieval