Evaluating Memento Service Optimizations

Published by Martin Klein

Publication date: 2019

Research field: Informatics Engineering

Paper language: English

Authors: Martin Klein, Lyudmila Balakireva, Harihar Shankar

Information Retrieval, Machine Learning

Abstract

Services and applications built on the Memento Aggregator can suffer from slow response times because of the federated search across web archives that the Memento infrastructure performs. To reduce response times, we established a cache system and experimented with machine learning models that predict archival holdings. We reported experimental results in previous work; now that these optimizations have been in production for two years, we evaluate their efficiency based on long-term log data. We find that the cache is very effective, with a 70-80% cache hit rate for human-driven services. The machine learning prediction operates at an acceptable average recall of 0.727, but our results also show that the models need more frequent retraining to further improve prediction accuracy.
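The abstract does not describe the cache or the prediction models in detail. As a rough illustration only, the following Python sketch assumes a simple time-to-live (TTL) cache keyed by URI in front of a slow federated lookup, and shows how a cache hit rate and a per-URI recall (the share of archives actually holding a URI that the model predicted) could be computed. The class `TTLCache`, the function `recall`, the one-hour TTL, and the archive names are hypothetical and are not part of the Memento Aggregator.

```python
import time
from typing import Callable, Dict, Hashable, Set, Tuple


class TTLCache:
    """Hypothetical TTL cache placed in front of a slow federated lookup
    (an illustration, not the Memento Aggregator's actual cache)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[Hashable, Tuple[float, object]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_fetch(self, key: Hashable,
                     fetch: Callable[[Hashable], object]) -> object:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            # Fresh cached response: no archives need to be queried.
            self.hits += 1
            return entry[1]
        # Missing or expired entry: run the slow lookup and cache the result.
        self.misses += 1
        value = fetch(key)
        self._store[key] = (now, value)
        return value

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def recall(predicted: Set[str], actual: Set[str]) -> float:
    """Share of the archives that actually hold a URI which the model predicted."""
    if not actual:
        raise ValueError("recall is undefined when no archive holds the URI")
    return len(predicted & actual) / len(actual)


if __name__ == "__main__":
    # Hypothetical slow lookup returning the archives that hold a URI.
    def lookup(uri: Hashable) -> object:
        return {"archive-a", "archive-b"}

    cache = TTLCache(ttl_seconds=3600)
    for uri in ["http://example.com/", "http://example.com/",
                "http://example.org/", "http://example.com/"]:
        cache.get_or_fetch(uri, lookup)
    print(cache.hit_rate())  # 2 hits out of 4 requests -> 0.5
    print(recall({"archive-a"}, {"archive-a", "archive-b"}))  # 0.5
```

Averaging `recall` over many logged URIs would give a figure comparable in kind to the average recall reported above; the TTL here is only a stand-in for whatever freshness policy a production cache would use.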

Similar papers

Mariana Y. Noguti, Eduardo Vellasques, Luiz S. Oliveira (2020)

In recent years, there has been increased interest in applying Natural Language Processing (NLP) to legal documents. Convolutional and recurrent neural networks combined with word embedding techniques have shown promising results on textual classification problems such as sentiment analysis and topic segmentation of documents. This paper proposes using NLP techniques for textual classification in order to assign the descriptions of the services that the Public Prosecutor's Office of the State of Parana provides to the population to one of the areas of law covered by the institution. Our main goal is to automate the assignment of petitions to their respective areas of law, reducing the associated cost and time while freeing human resources for more complex tasks. We compare different word representations for this task, including document-term matrices and several word embeddings. For classification, we evaluate three model families: linear models, boosted trees, and neural networks. The best results were obtained by combining Word2Vec trained on a domain-specific corpus with a Recurrent Neural Network (RNN) architecture (specifically, an LSTM), reaching an accuracy of 90% and an F1-score of 85% when classifying eighteen categories (areas of law).

Information Retrieval, Machine Learning

Outlier-Resilient Web Service QoS Prediction

Fanghua Ye, Zhiwei Lin, Chuan Chen (2020)

The proliferation of Web services makes it difficult for users to select the most appropriate one among numerous functionally identical or similar service candidates. Quality-of-Service (QoS) describes the non-functional characteristics of Web services, and it has become the key differentiator for service selection. However, users cannot invoke every Web service to obtain the corresponding QoS values because of the high time cost and resource overhead. It is therefore essential to predict unknown QoS values. Although various QoS prediction methods have been proposed, few of them take outliers into account, even though outliers can dramatically degrade prediction performance. To overcome this limitation, we propose an outlier-resilient QoS prediction method. Our method uses the Cauchy loss to measure the discrepancy between the observed QoS values and the predicted ones; owing to the robustness of the Cauchy loss, the method is resilient to outliers. We further extend it to provide time-aware QoS predictions by taking temporal information into account. Finally, we conduct extensive experiments on both static and dynamic datasets. The results demonstrate that our method achieves better performance than state-of-the-art baseline methods.
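The robustness argument can be illustrated on a much smaller problem than QoS matrix prediction. The Python sketch below is an illustration under stated assumptions, not the authors' method: it estimates a single typical response time by minimizing a sum of Cauchy losses with iteratively reweighted least squares (IRLS). It uses one common parameterization of the Cauchy loss, (c^2/2) * log(1 + (r/c)^2); the scale `c`, the iteration count, and the sample values are hypothetical.

```python
import math
import statistics
from typing import Sequence


def cauchy_loss(residual: float, c: float = 1.0) -> float:
    """One common form of the Cauchy loss: (c^2 / 2) * log(1 + (r / c)^2).
    It grows only logarithmically, unlike the squared loss r^2 / 2."""
    return 0.5 * c * c * math.log1p((residual / c) ** 2)


def cauchy_location(values: Sequence[float], c: float = 1.0,
                    iters: int = 50) -> float:
    """Estimate one 'typical' value by minimizing the summed Cauchy loss
    with iteratively reweighted least squares (IRLS)."""
    mu = statistics.median(values)  # robust starting point
    for _ in range(iters):
        # IRLS weight for the Cauchy loss is 1 / (1 + (r / c)^2),
        # so observations with large residuals get weights close to zero.
        weights = [1.0 / (1.0 + ((v - mu) / c) ** 2) for v in values]
        mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return mu


if __name__ == "__main__":
    # Hypothetical response times in seconds, with one outlier.
    times = [1.0, 1.1, 0.9, 1.05, 50.0]
    print(sum(times) / len(times))  # plain mean, about 10.8
    print(cauchy_location(times))   # stays close to 1.0
```

The mean is dragged above 10 by the single extreme value, whereas the Cauchy estimate stays near 1 because the outlier's weight is almost zero; this bounded influence is the property the abstract relies on.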

Information Retrieval, Software Engineering

Evaluating Stochastic Rankings with Expected Exposure

Fernando Diaz, Bhaskar Mitra, Michael D. Ekstrand (2020)

We introduce the concept of expected exposure as the average attention that ranked items receive from users over repeated samples of the same query. Furthermore, we advocate for adopting the principle of equal expected exposure: given a fixed information need, no item should receive more or less expected exposure than any other item of the same relevance grade. We argue that this principle is desirable for many retrieval objectives and scenarios, including topical diversity and fair ranking. Leveraging user models from existing retrieval metrics, we propose a general evaluation methodology based on expected exposure and draw connections to related metrics in information retrieval evaluation. Importantly, this methodology relaxes classic information retrieval assumptions, allowing a system, in response to a query, to produce a distribution over rankings instead of a single fixed ranking. We study the behavior of the expected exposure metric and stochastic rankers across a variety of information access conditions, including ad hoc retrieval and recommendation. We believe that measuring and optimizing expected exposure metrics using randomization opens a new area for retrieval algorithm development and progress.
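To make the definition concrete, the Python sketch below assumes a rank-biased-precision-style browsing model in which a user examines the item at position k (1-based) with probability gamma^(k-1); the patience value gamma = 0.5 and the item names are hypothetical. Expected exposure is then each item's exposure averaged over the sampled rankings for one query.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def exposure(ranking: Sequence[str], gamma: float = 0.5) -> Dict[str, float]:
    """Attention each item receives in a single ranking under an assumed
    RBP-style model: position k (1-based) is examined with prob. gamma**(k-1)."""
    # enumerate() starts at 0, which equals k - 1 for 1-based position k.
    return {item: gamma ** idx for idx, item in enumerate(ranking)}


def expected_exposure(rankings: List[Sequence[str]],
                      gamma: float = 0.5) -> Dict[str, float]:
    """Average exposure per item over repeated samples of rankings for one query."""
    totals: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for item, e in exposure(ranking, gamma).items():
            totals[item] += e
    return {item: total / len(rankings) for item, total in totals.items()}


if __name__ == "__main__":
    # Items "a" and "b" are assumed equally relevant; "c" is less relevant.
    deterministic = [["a", "b", "c"], ["a", "b", "c"]]
    stochastic = [["a", "b", "c"], ["b", "a", "c"]]
    print(expected_exposure(deterministic))  # {'a': 1.0, 'b': 0.5, 'c': 0.25}
    print(expected_exposure(stochastic))     # {'a': 0.75, 'b': 0.75, 'c': 0.25}
```

A deterministic ranker always favors `a` over the equally relevant `b`, while a ranker that swaps them half the time gives both the same expected exposure, which is what the equal-expected-exposure principle asks for.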

Information Retrieval

HelPal: A Search System for Mobile Crowd Service

Yao Wu, Tianzhen Wu, Ziyi Xiong (2017)

The proliferation of ubiquitous mobile devices has made location-based services prevalent. Mobile users can volunteer as providers of specific services and, at the same time, search for such services. For example, drivers may want to find available nearby users who are willing to help with motor repair, give travel directions, or provide first aid. As mobile users become more widespread, scalable means are needed to connect such users with others nearby so that they can help each other with specific services. Motivated by these observations, we design and implement HelPal, a general location-based system through which mobile users can provide and receive instant services, which we call mobile crowd service. In this demo, we introduce a mobile crowd service system featuring several novel techniques. We sketch the system architecture and illustrate usage scenarios through several cases. The demonstration shows a user-friendly search interface that lets users conveniently find skilled and qualified service providers nearby.

Information Retrieval

Evaluating Music Recommendations with Binary Feedback for Multiple Stakeholders

Sasha Stoikov, Hongyi Wen (2021)

High-quality user feedback data is essential for training and evaluating a successful music recommendation system, particularly one that has to balance the needs of multiple stakeholders. Most existing music datasets suffer from noisy feedback and self-selection biases inherent in the data collected by music platforms. Using the Piki Music dataset of 500k ratings collected over a two-year period, we evaluate the performance of classic recommendation algorithms for three important stakeholders: consumers, well-known artists, and lesser-known artists. We show that a matrix factorization algorithm trained on both likes and dislikes performs significantly better than one trained only on likes for all three stakeholders.

Information Retrieval