No Arabic abstract
The proliferation of Web services makes it difficult for users to select the most appropriate one among numerous functionally identical or similar service candidates. Quality-of-Service (QoS) describes the non-functional characteristics of Web services, and it has become the key differentiator for service selection. However, users cannot invoke all Web services to obtain the corresponding QoS values due to high time cost and huge resource overhead. Thus, it is essential to predict unknown QoS values. Although various QoS prediction methods have been proposed, few of them have taken outliers into consideration, which may dramatically degrade the prediction performance. To overcome this limitation, we propose an outlier-resilient QoS prediction method in this paper. Our method utilizes Cauchy loss to measure the discrepancy between the observed QoS values and the predicted ones. Owing to the robustness of Cauchy loss, our method is resilient to outliers. We further extend our method to provide time-aware QoS prediction results by taking the temporal information into consideration. Finally, we conduct extensive experiments on both static and dynamic datasets. The results demonstrate that our method is able to achieve better performance than state-of-the-art baseline methods.
In recent years, there has been an increased interest in the application of Natural Language Processing (NLP) to legal documents. The use of convolutional and recurrent neural networks along with word embedding techniques have presented promising results when applied to textual classification problems, such as sentiment analysis and topic segmentation of documents. This paper proposes the use of NLP techniques for textual classification, with the purpose of categorizing the descriptions of the services provided by the Public Prosecutors Office of the State of Parana to the population in one of the areas of law covered by the institution. Our main goal is to automate the process of assigning petitions to their respective areas of law, with a consequent reduction in costs and time associated with such process while allowing the allocation of human resources to more complex tasks. In this paper, we compare different approaches to word representations in the aforementioned task: including document-term matrices and a few different word embeddings. With regards to the classification models, we evaluated three different families: linear models, boosted trees and neural networks. The best results were obtained with a combination of Word2Vec trained on a domain-specific corpus and a Recurrent Neural Network (RNN) architecture (more specifically, LSTM), leading to an accuracy of 90% and F1-Score of 85% in the classification of eighteen categories (law areas).
As a new type of e-commerce platform developed in recent years, local consumer service platform provides users with software to consume service to the nearby store or to the home, such as Groupon and Koubei. Different from other common e-commerce platforms, the behavior of users on the local consumer service platform is closely related to their real-time local context information. Therefore, building a context-aware user behavior prediction system is able to provide both merchants and users better service in local consumer service platforms. However, most of the previous work just treats the contextual information as an ordinary feature into the prediction model to obtain the prediction list under a specific context, which ignores the fact that the interest of a user in different contexts is often significantly different. Hence, in this paper, we propose a context-aware heterogeneous graph attention network (CHGAT) to dynamically generate the representation of the user and to estimate the probability for future behavior. Specifically, we first construct the meta-path based heterogeneous graphs with the historical behaviors from multiple sources and comprehend heterogeneous vertices in the graph with a novel unified knowledge representing approach. Next, a multi-level attention mechanism is introduced for context-aware aggregation with graph vertices, which contains the vertex-level attention network and the path-level attention network. Both of them aim to capture the semantic correlation between information contained in the graph and the outside real-time contextual information in the search system. Then the model proposed in this paper aggregates specific graphs with their corresponding context features and obtains the representation of user interest under a specific context and input it into the prediction network to finally obtain the predicted probability of user behavior.
Services and applications based on the Memento Aggregator can suffer from slow response times due to the federated search across web archives performed by the Memento infrastructure. In an effort to decrease the response times, we established a cache system and experimented with machine learning models to predict archival holdings. We reported on the experimental results in previous work and can now, after these optimizations have been in production for two years, evaluate their efficiency, based on long-term log data. During our investigation we find that the cache is very effective with a 70-80% cache hit rate for human-driven services. The machine learning prediction operates at an acceptable average recall level of 0.727 but our results also show that a more frequent retraining of the models is needed to further improve prediction accuracy.
To achieve collaborative tasks, robots in a team need to have a shared understanding of the environment and their location within it. Distributed Simultaneous Localization and Mapping (SLAM) offers a practical solution to localize the robots without relying on an external positioning system (e.g. GPS) and with minimal information exchange. Unfortunately, current distributed SLAM systems are vulnerable to perception outliers and therefore tend to use very conservative parameters for inter-robot place recognition. However, being too conservative comes at the cost of rejecting many valid loop closure candidates, which results in less accurate trajectory estimates. This paper introduces DOOR-SLAM, a fully distributed SLAM system with an outlier rejection mechanism that can work with less conservative parameters. DOOR-SLAM is based on peer-to-peer communication and does not require full connectivity among the robots. DOOR-SLAM includes two key modules: a pose graph optimizer combined with a distributed pairwise consistent measurement set maximization algorithm to reject spurious inter-robot loop closures; and a distributed SLAM front-end that detects inter-robot loop closures without exchanging raw sensor data. The system has been evaluated in simulations, benchmarking datasets, and field experiments, including tests in GPS-denied subterranean environments. DOOR-SLAM produces more inter-robot loop closures, successfully rejects outliers, and results in accurate trajectory estimates, while requiring low communication bandwidth. Full source code is available at https://github.com/MISTLab/DOOR-SLAM.git.
VOStat is a Web service providing interactive statistical analysis of astronomical tabular datasets. It is integrated into the suite of analysis and visualization tools associated with the international Virtual Observatory (VO) through the SAMP communication system. A user supplies VOStat with a dataset extracted from the VO, or otherwise acquired, and chooses among $sim 60$ statistical functions. These include data transformations, plots and summaries, density estimation, one- and two-sample hypothesis tests, global and local regressions, multivariate analysis and clustering, spatial analysis, directional statistics, survival analysis (for censored data like upper limits), and time series analysis. The statistical operations are performed using the public domain {bf R} statistical software environment, including a small fraction of its $>4000$ {bf CRAN} add-on packages. The purpose of VOStat is to facilitate a wider range of statistical analyses than are commonly used in astronomy, and to promote use of more advanced methodology in {bf R} and {bf CRAN}.