Searching for concepts in science and technology is often a difficult task. To facilitate concept search, different types of human-generated metadata have been created to define the content of scientific and technical disclosures. Classification schemes such as the International Patent Classification (IPC) and MEDLINE's MeSH are structured and controlled, but require trained experts and central management to restrict ambiguity (Mork, 2013). While unstructured tags of folksonomies can be processed to produce a degree of structure (Kalendar, 2010; Karampinas, 2012; Sarasua, 2012; Bragg, 2013), the freedom enjoyed by the crowd typically results in less precision (Stock, 2007). Existing classification schemes suffer from inflexibility and ambiguity. Since humans understand language, inference, implication, abstraction and hence concepts better than computers, we propose to harness the collective wisdom of the crowd. To do so, we propose a novel classification scheme that is sufficiently intuitive for the crowd to use, yet powerful enough to facilitate search by analogy, and flexible enough to deal with ambiguity. The system will enhance existing classification information. Linking up with the semantic web and computer intelligence, a Citizen Science effort (Good, 2013) would support innovation by improving the quality of granted patents, reducing duplicative research, and stimulating problem-oriented solution design. A prototype of our design is in preparation. A crowd-sourced fuzzy and faceted classification scheme will allow for better concept search and improved access to prior art in science and technology.
The COVID-19 pandemic has sparked unprecedented mobilization of scientists, generating a deluge of papers that makes it hard for researchers to keep track and explore new directions. Search engines are designed for targeted queries, not for discovery of connections across a corpus. In this paper, we present SciSight, a system for exploratory search of COVID-19 research integrating two key capabilities: first, exploring associations between biomedical facets automatically extracted from papers (e.g., genes, drugs, diseases, patient outcomes); second, combining textual and network information to search and visualize groups of researchers and their ties. SciSight has so far served over 15K users with over 42K page views and 13% returns.
Background: The web has become a primary information resource about illnesses and treatments for both medical and non-medical users. Standard web search is by far the most common interface for such information. It is therefore of interest to find out how well web search engines work for diagnostic queries and what factors contribute to successes and failures. Among diseases, rare (or orphan) diseases represent an especially challenging and thus interesting class to diagnose, as each is rare, diverse in symptoms and usually has scattered resources associated with it. Methods: We use an evaluation approach for web search engines for rare disease diagnosis which includes 56 real-life diagnostic cases, state-of-the-art evaluation measures, and curated information resources. In addition, we introduce FindZebra, a specialized (vertical) rare disease search engine. FindZebra is powered by open-source search technology and uses curated, freely available online medical information. Results: FindZebra outperforms Google Search, both in its default setup and when customised to the resources used by FindZebra. We extend FindZebra with specialized functionalities exploiting medical ontological information and UMLS medical concepts to demonstrate different ways of displaying the retrieved results to medical experts. Conclusions: Our results indicate that a specialized search engine can improve the diagnostic quality without compromising the ease of use of the currently widely popular web search engines. The proposed evaluation approach can be valuable for future development and benchmarking. The FindZebra search engine is available at http://www.findzebra.com/.
The proliferation of ubiquitous mobile devices makes location-based services prevalent. Mobile users are able to volunteer as providers of specific services and, at the same time, to search for such services. For example, drivers may be interested in tracking available nearby users who are willing to help with motor repair or are willing to provide travel directions or first aid. With the diffusion of mobile users, it is necessary to provide scalable means of enabling such users to connect with other nearby users so that they can help each other with specific services. Motivated by these observations, we design and implement a general location-based system, HelPal, for mobile users to provide and use instant services, which we call mobile crowd service. In this demo, we introduce a mobile crowd service system featuring several novel techniques. We sketch the system architecture and illustrate scenarios via several cases. The demonstration shows the user-friendly search interface that lets users conveniently find skilled and qualified nearby service providers.
The task of expert finding has been getting increasing attention in the information retrieval literature. However, the current state of the art still lacks principled approaches for combining different sources of evidence in an optimal way. This paper explores the use of learning-to-rank methods as a principled approach for combining multiple estimators of expertise, derived from the textual contents, from the graph structure of citation patterns in the community of experts, and from profile information about the experts. Experiments on a dataset of academic publications in the area of Computer Science attest to the adequacy of the proposed approaches.
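The idea of learning how to weight several expertise estimators can be illustrated with a minimal sketch. It assumes three hypothetical feature scores per candidate (text, citation graph, profile) and uses a simple pairwise perceptron as a stand-in; the abstract does not say which learning-to-rank algorithms were evaluated, and all names and numbers below are invented for illustration.

```python
def train_pairwise_ranker(pairs, n_features, epochs=50, lr=0.1):
    """Learn linear weights from (better, worse) feature-vector pairs.

    Assumption: a pairwise perceptron, chosen only for brevity. It is not
    claimed to be one of the methods used in the paper.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            if margin <= 0:
                # The pair is misordered (or tied): move w towards diff.
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def score(w, features):
    """Combined expertise score: weighted sum of the estimators."""
    return sum(wi * fi for wi, fi in zip(w, features))


# Hypothetical candidates: (text score, citation-graph score, profile score).
candidates = {
    "A": (0.9, 0.8, 0.7),
    "B": (0.5, 0.4, 0.6),
    "C": (0.1, 0.2, 0.9),
}
# Hypothetical relevance judgments: A is preferred to B, and B to C.
training_pairs = [
    (candidates["A"], candidates["B"]),
    (candidates["B"], candidates["C"]),
    (candidates["A"], candidates["C"]),
]

weights = train_pairwise_ranker(training_pairs, n_features=3)
ranking = sorted(candidates, key=lambda name: score(weights, candidates[name]),
                 reverse=True)
```

Here `ranking` orders the candidates by the learned combination of the three estimators, so no hand-tuned weight per evidence source is needed.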
We describe Space Warps, a novel gravitational lens discovery service that yields samples of high purity and completeness through crowd-sourced visual inspection. Carefully produced colour composite images are displayed to volunteers via a web-based classification interface, which records their estimates of the positions of candidate lensed features. Images of simulated lenses, as well as real images which lack lenses, are inserted into the image stream at random intervals; this training set is used to give the volunteers instantaneous feedback on their performance, as well as to calibrate a model of the system that provides dynamical updates to the probability that a classified image contains a lens. Low-probability systems are retired from the site periodically, concentrating the sample towards a set of lens candidates. Having divided 160 square degrees of Canada-France-Hawaii Telescope Legacy Survey (CFHTLS) imaging into some 430,000 overlapping 82 by 82 arcsecond tiles and displaying them on the site, we were joined by around 37,000 volunteers who contributed 11 million image classifications over the course of 8 months. This Stage 1 search reduced the sample to 3381 images containing candidates; these were then refined in Stage 2 to yield a sample that we expect to be over 90% complete and 30% pure, based on our analysis of the volunteers' performance on training images. We comment on the scalability of the Space Warps system to the wide-field survey era, based on our projection that searches of $10^5$ images could be performed by a crowd of $10^5$ volunteers in 6 days.
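The dynamical probability update described above can be sketched as a Bayesian update per classification. This is a minimal illustration under stated assumptions, not the model specified in the abstract: each volunteer is assumed to be summarised by two skill estimates calibrated on training images (the hypothetical parameters `p_lens_given_lens` and `p_dud_given_dud`), and the retirement threshold `retire_below` is an invented value.

```python
def update_lens_probability(prior, said_lens, p_lens_given_lens, p_dud_given_dud):
    """Bayes update of P(image contains a lens) after one classification.

    p_lens_given_lens: assumed P(volunteer says "lens" | image has a lens)
    p_dud_given_dud:   assumed P(volunteer says "no lens" | image has no lens)
    """
    if said_lens:
        like_lens = p_lens_given_lens
        like_dud = 1.0 - p_dud_given_dud
    else:
        like_lens = 1.0 - p_lens_given_lens
        like_dud = p_dud_given_dud
    evidence = like_lens * prior + like_dud * (1.0 - prior)
    if evidence == 0.0:
        # The vote is impossible under both hypotheses; keep the prior.
        return prior
    return like_lens * prior / evidence


def classify_stream(prior, votes, retire_below=1e-4):
    """Apply votes in order; stop early once the image would be retired.

    votes: iterable of (said_lens, p_lens_given_lens, p_dud_given_dud).
    retire_below is a hypothetical threshold chosen for this sketch.
    """
    p = prior
    for said_lens, p_ll, p_dd in votes:
        p = update_lens_probability(p, said_lens, p_ll, p_dd)
        if p < retire_below:
            break
    return p
```

With this form, a volunteer whose two skill values are both 0.5 leaves the probability unchanged, while votes from more skilled volunteers move it further, which is the motivation for calibrating on training images.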