Services and applications based on the Memento Aggregator can suffer from slow response times due to the federated search across web archives performed by the Memento infrastructure. In an effort to decrease response times, we established a cache system and experimented with machine learning models to predict archival holdings. We reported experimental results in previous work; now, after these optimizations have been in production for two years, we can evaluate their effectiveness based on long-term log data. We find that the cache is very effective, with a 70-80% cache hit rate for human-driven services. The machine learning prediction operates at an acceptable average recall of 0.727, but our results also show that more frequent retraining of the models is needed to further improve prediction accuracy.
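As a rough illustration of the caching idea behind these numbers, the sketch below puts a TTL cache in front of the federated archive lookup so that repeated requests for the same URI avoid re-querying every archive. It is a hypothetical reconstruction, not the production system; all names (TTLCache, query_archives) and the TTL value are assumptions.

```python
import time

class TTLCache:
    """Toy time-to-live cache mapping a URI to its archival holdings."""

    def __init__(self, ttl_seconds=86400):  # illustrative one-day TTL
        self.ttl = ttl_seconds
        self.store = {}  # uri -> (timestamp, holdings)

    def get(self, uri):
        entry = self.store.get(uri)
        if entry and time.time() - entry[0] < self.ttl:
            return entry[1]  # cache hit
        return None          # cache miss or expired entry

    def put(self, uri, holdings):
        self.store[uri] = (time.time(), holdings)

def lookup(uri, cache, query_archives):
    """Serve from cache when possible; fall back to the slow federated search."""
    holdings = cache.get(uri)
    if holdings is None:
        holdings = query_archives(uri)  # queries every upstream web archive
        cache.put(uri, holdings)
    return holdings
```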
In recent years, there has been increased interest in applying Natural Language Processing (NLP) to legal documents. Convolutional and recurrent neural networks, combined with word embedding techniques, have produced promising results on textual classification problems such as sentiment analysis and topic segmentation of documents. This paper proposes the use of NLP techniques for textual classification, with the purpose of categorizing the descriptions of the services provided by the Public Prosecutor's Office of the State of Paraná to the population in one of the areas of law covered by the institution. Our main goal is to automate the process of assigning petitions to their respective areas of law, reducing the cost and time associated with this process while allowing human resources to be allocated to more complex tasks. We compare different approaches to word representation for this task, including document-term matrices and several word embeddings. With regard to classification models, we evaluated three families: linear models, boosted trees, and neural networks. The best results were obtained with a combination of Word2Vec trained on a domain-specific corpus and a Recurrent Neural Network (RNN) architecture (more specifically, an LSTM), reaching an accuracy of 90% and an F1-score of 85% in the classification of eighteen categories (law areas).
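The sketch below shows the general shape of the best-performing pipeline: Word2Vec embeddings trained on a domain corpus feeding an LSTM classifier over eighteen classes. The toy corpus, hyperparameters, and layer sizes are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np
import tensorflow as tf
from gensim.models import Word2Vec

# Toy tokenized petitions and labels (hypothetical stand-ins for the
# domain-specific corpus and the eighteen law-area classes).
docs = [["pensao", "alimenticia", "filho"],
        ["contrato", "aluguel", "despejo"]]
labels = np.array([0, 1])

# Train Word2Vec on the domain corpus.
w2v = Word2Vec(docs, vector_size=100, window=5, min_count=1)

# Map tokens to embedding indices (0 reserved for padding) and pad sequences.
vocab = {w: i + 1 for i, w in enumerate(w2v.wv.index_to_key)}
X = tf.keras.preprocessing.sequence.pad_sequences(
    [[vocab[w] for w in d] for d in docs], maxlen=50)

# Embedding layer initialized from the domain-trained Word2Vec vectors.
emb = np.vstack([np.zeros(100), w2v.wv.vectors])
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
        len(emb), 100,
        embeddings_initializer=tf.keras.initializers.Constant(emb),
        mask_zero=True),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(18, activation="softmax"),  # eighteen law areas
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, labels, epochs=3)
```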
The proliferation of Web services makes it difficult for users to select the most appropriate one among numerous functionally identical or similar candidates. Quality-of-Service (QoS) describes the non-functional characteristics of Web services and has become the key differentiator for service selection. However, users cannot invoke all Web services to obtain the corresponding QoS values, due to the high time cost and resource overhead. It is therefore essential to predict unknown QoS values. Although various QoS prediction methods have been proposed, few of them take outliers into consideration, even though outliers can dramatically degrade prediction performance. To overcome this limitation, we propose an outlier-resilient QoS prediction method. Our method uses Cauchy loss to measure the discrepancy between observed and predicted QoS values; owing to the robustness of Cauchy loss, it is resilient to outliers. We further extend the method to provide time-aware QoS predictions by taking temporal information into consideration. Finally, we conduct extensive experiments on both static and dynamic datasets. The results demonstrate that our method achieves better performance than state-of-the-art baseline methods.
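For intuition about the outlier resilience, a standard form of the Cauchy loss is rho(r) = log(1 + (r / gamma)^2), which grows logarithmically in the residual r rather than quadratically. The snippet below only illustrates that property; gamma and the residuals are arbitrary, and this is not the paper's full prediction model.

```python
import numpy as np

def cauchy_loss(residual, gamma=1.0):
    # log(1 + (r/gamma)^2): a huge residual (an outlier) contributes far
    # less to the objective than it would under squared error.
    return np.log1p((residual / gamma) ** 2)

residuals = np.array([0.1, 0.5, 10.0])  # the last value mimics an outlier
print(cauchy_loss(residuals))  # outlier contributes only ~4.6
print(residuals ** 2)          # under squared loss it contributes 100
```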
We introduce the concept of expected exposure as the average attention ranked items receive from users over repeated samples of the same query. Furthermore, we advocate for the adoption of the principle of equal expected exposure: given a fixed information need, no item should receive more or less expected exposure than any other item of the same relevance grade. We argue that this principle is desirable for many retrieval objectives and scenarios, including topical diversity and fair ranking. Leveraging user models from existing retrieval metrics, we propose a general evaluation methodology based on expected exposure and draw connections to related metrics in information retrieval evaluation. Importantly, this methodology relaxes classic information retrieval assumptions, allowing a system, in response to a query, to produce a distribution over rankings instead of a single fixed ranking. We study the behavior of the expected exposure metric and stochastic rankers across a variety of information access conditions, including ad hoc retrieval and recommendation. We believe that measuring and optimizing expected exposure metrics using randomization opens a new area for retrieval algorithm development and progress.
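As a concrete illustration, under an assumed RBP-style user model (exposure decays geometrically with rank; this is one of several user models the methodology can leverage), the expected exposure of each item can be estimated by averaging rank-based exposure over rankings sampled from a stochastic ranker:

```python
import numpy as np

def expected_exposure(sampled_rankings, n_items, gamma=0.8):
    """Average rank-based exposure per item over sampled rankings."""
    exposure = np.zeros(n_items)
    for ranking in sampled_rankings:
        for rank, item in enumerate(ranking):
            exposure[item] += gamma ** rank  # geometric decay with rank
    return exposure / len(sampled_rankings)

# Two equally relevant items: a deterministic ranker always places item 0
# first, while a stochastic ranker alternates and equalizes exposure.
print(expected_exposure([[0, 1], [0, 1]], 2))  # unequal: [1.0, 0.8]
print(expected_exposure([[0, 1], [1, 0]], 2))  # equal:   [0.9, 0.9]
```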
The proliferation of ubiquitous mobile devices has made location-based services prevalent. Mobile users can volunteer as providers of specific services and, at the same time, search for such services. For example, drivers may be interested in tracking available nearby users who are willing to help with motor repair, travel directions, or first aid. As the number of mobile users grows, scalable means are needed to connect them with nearby users so that they can help each other with specific services. Motivated by these observations, we design and implement HelPal, a general location-based system that lets mobile users provide and enjoy instant services, a paradigm we call mobile crowd service. In this demo, we introduce a mobile crowd service system featuring several novel techniques. We sketch the system architecture and illustrate usage scenarios through several cases. The demonstration shows a user-friendly search interface that lets users conveniently find skilled and qualified nearby service providers.
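The core query such a system answers can be illustrated with a simple skill-and-radius filter ranked by haversine distance; this hypothetical, non-scalable sketch is for intuition only and does not reflect HelPal's actual indexing or matching techniques.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# Toy provider registry: (name, lat, lon, offered skills).
providers = [
    ("alice", 40.71, -74.01, {"motor repair"}),
    ("bob",   40.73, -73.99, {"first aid", "directions"}),
]

def nearby(skill, lat, lon, radius_km=5.0):
    """Providers offering `skill` within `radius_km`, nearest first."""
    hits = [(haversine_km(lat, lon, p_lat, p_lon), name)
            for name, p_lat, p_lon, skills in providers if skill in skills]
    return sorted(hit for hit in hits if hit[0] <= radius_km)

print(nearby("first aid", 40.72, -74.00))  # -> [(~1.4, 'bob')]
```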
High-quality user feedback data is essential to training and evaluating a successful music recommendation system, particularly one that has to balance the needs of multiple stakeholders. Most existing music datasets suffer from noisy feedback and the self-selection biases inherent in data collected by music platforms. Using the Piki Music dataset of 500k ratings collected over a two-year period, we evaluate the performance of classic recommendation algorithms with respect to three important stakeholders: consumers, well-known artists, and lesser-known artists. We show that a matrix factorization algorithm trained on both likes and dislikes performs significantly better than one trained only on likes for all three stakeholders.
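A minimal sketch of the idea follows, assuming a plain SGD matrix factorization with likes encoded as +1 and dislikes as -1; the dimensions, learning rate, and toy data are assumptions, and the paper's exact model and training setup may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 100, 200, 16
U = rng.normal(scale=0.1, size=(n_users, k))  # user factors
V = rng.normal(scale=0.1, size=(n_items, k))  # item factors

# (user, item, rating) triples: +1 for a like, -1 for a dislike (toy data).
ratings = [(0, 3, 1.0), (0, 7, -1.0), (5, 3, -1.0)]

lr, reg = 0.05, 0.01
for epoch in range(100):
    for u, i, r in ratings:
        u_old = U[u].copy()
        err = r - u_old @ V[i]  # signed feedback enters the loss directly
        U[u] += lr * (err * V[i] - reg * U[u])
        V[i] += lr * (err * u_old - reg * V[i])

print(U[0] @ V[3], U[0] @ V[7])  # predictions approach +1 and -1
```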