In this work, we propose FM-Pair, an adaptation of Factorization Machines with a pairwise loss function, making them effective for datasets with implicit feedback. The optimization model in FM-Pair is based on the BPR (Bayesian Personalized Ranking) criterion, which is a well-established pairwise optimization model. FM-Pair retains the advantages of FMs in generality, expressiveness and performance, and yet it can be used for datasets with implicit feedback. We also propose how to apply FM-Pair effectively to two collaborative filtering problems, namely context-aware recommendation and cross-domain collaborative filtering. By performing experiments on different datasets with explicit or implicit feedback, we empirically show that on most of the tested datasets FM-Pair outperforms state-of-the-art learning-to-rank methods such as BPR-MF (BPR with a Matrix Factorization model). We also show that FM-Pair is significantly more effective for ranking than the standard FMs model. Moreover, we show that FM-Pair can utilize context or cross-domain information effectively, as the accuracy of recommendations consistently improves with the right auxiliary features. Finally, we show that FM-Pair has linear time complexity and scales linearly when exploiting additional features.
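To make the pairwise objective concrete, the following minimal sketch (our own illustration, not the authors' reference implementation of FM-Pair) shows a BPR-style stochastic update over factorization-machine scores; the sparse feature representation, learning rate and regularization constant are assumptions.

    import numpy as np

    def fm_score_and_sum(x, w0, w, V):
        """FM score for sparse features x (dict index -> value); also returns sum_f = sum_i V[i]*x_i."""
        linear = w0 + sum(w[i] * v for i, v in x.items())
        sum_f = np.zeros(V.shape[1])
        sum_sq = np.zeros(V.shape[1])
        for i, v in x.items():
            sum_f += V[i] * v
            sum_sq += (V[i] * v) ** 2
        return linear + 0.5 * float(np.sum(sum_f ** 2 - sum_sq)), sum_f

    def bpr_update(x_pos, x_neg, w0, w, V, lr=0.05, reg=0.01):
        """One stochastic pairwise update: rank the observed (positive) item above a sampled unobserved one."""
        s_pos, sum_pos = fm_score_and_sum(x_pos, w0, w, V)
        s_neg, sum_neg = fm_score_and_sum(x_neg, w0, w, V)
        coef = 1.0 / (1.0 + np.exp(s_pos - s_neg))   # gradient factor of -ln sigma(s_pos - s_neg)
        for x, sum_f, sign in ((x_pos, sum_pos, 1.0), (x_neg, sum_neg, -1.0)):
            for i, v in x.items():
                grad_V = v * sum_f - V[i] * (v ** 2)  # d(score)/dV[i] for the second-order FM term
                V[i] += lr * (sign * coef * grad_V - reg * V[i])
                w[i] += lr * (sign * coef * v - reg * w[i])

Because the loss depends only on the score difference, the global bias cancels out; context or cross-domain information enters simply as extra active indices in the sparse feature dictionaries.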
Factorization Machines (FMs) are effective in incorporating side information to overcome the cold-start and data sparsity problems in recommender systems. Traditional FMs adopt the inner product to model the second-order interactions between different attributes, which are represented via feature vectors. The problem is that the inner product violates the triangle inequality property of feature vectors. As a result, it cannot capture fine-grained attribute interactions well, leading to sub-optimal performance. Recently, the Euclidean distance has been exploited in FMs to replace the inner product and has delivered better performance. However, previous FM methods, including those equipped with the Euclidean distance, all focus on attribute-level interaction modeling and ignore the critical intrinsic feature correlations inside attributes. Thereby, they fail to model the complex and rich interactions exhibited in real-world data. To tackle this problem, in this paper we propose an FM framework equipped with generalized metric learning techniques to better capture these feature correlations. In particular, based on this framework, we present a Mahalanobis distance method and a deep neural network (DNN) method, which effectively model the linear and non-linear correlations between features, respectively. Besides, we design an efficient approach for simplifying the model functions. Experiments on several benchmark datasets demonstrate that our proposed framework outperforms several state-of-the-art baselines by a large margin. Moreover, we collect a new large-scale dataset on second-hand trading to justify the effectiveness of our method over cold-start and data sparsity problems in recommender systems.
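As a hedged illustration of the distance-based interaction, the sketch below contrasts the standard inner-product FM term with a learnable Mahalanobis-distance term for a pair of active attributes; the parameterization M = L^T L (which keeps the metric positive semi-definite) and the embedding sizes are assumptions, not the paper's exact model.

    import numpy as np

    rng = np.random.default_rng(0)
    k = 8                                  # embedding dimension (assumed)
    V = rng.normal(size=(1000, k))         # attribute embeddings (hypothetical vocabulary size)
    L = 0.1 * rng.normal(size=(k, k))      # M = L.T @ L is positive semi-definite by construction

    def inner_product_interaction(i, j):
        """Standard FM second-order term for attributes i and j."""
        return float(V[i] @ V[j])

    def mahalanobis_interaction(i, j):
        """Distance-based interaction: a smaller learned distance means a stronger affinity, hence the minus sign."""
        d = V[i] - V[j]
        return -float(d @ (L.T @ L) @ d)

The DNN variant described in the abstract would replace the fixed quadratic form with a small learned network applied to the pair of embeddings, which is omitted here for brevity.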
This paper proposes implicit CF-NADE, a neural autoregressive model for collaborative filtering tasks using implicit feedback (e.g., click, watch, and browse behaviors). We first convert a user's implicit feedback into a like vector and a confidence vector, and then model the probability of the like vector, weighted by the confidence vector. The training objective of implicit CF-NADE is to minimize a weighted negative log-likelihood. We test the performance of implicit CF-NADE on a dataset collected from a popular digital TV streaming service. More specifically, in the experiments, we describe how to convert watch counts into implicit relative ratings and feed them into implicit CF-NADE. Then we compare the performance of the implicit CF-NADE model with the popular implicit matrix factorization approach. Experimental results show that implicit CF-NADE significantly outperforms the baseline.
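A purely illustrative sketch of the preprocessing and objective described above follows; the like threshold and the logarithmic confidence mapping are our assumptions rather than the paper's exact choices.

    import numpy as np

    def to_like_and_confidence(watch_counts, like_threshold=1.0, alpha=0.5):
        """Binary 'like' vector plus a confidence that grows with the watch count (assumed mapping)."""
        counts = np.asarray(watch_counts, dtype=float)
        likes = (counts >= like_threshold).astype(float)
        confidence = 1.0 + alpha * np.log1p(counts)
        return likes, confidence

    def weighted_nll(predicted_like_prob, likes, confidence):
        """Confidence-weighted negative log-likelihood of the like vector (the training objective form)."""
        p = np.clip(predicted_like_prob, 1e-8, 1.0 - 1e-8)
        ll = likes * np.log(p) + (1.0 - likes) * np.log(1.0 - p)
        return -np.sum(confidence * ll)

    likes, conf = to_like_and_confidence([0, 3, 1, 0, 7])
    loss = weighted_nll(np.array([0.1, 0.8, 0.6, 0.2, 0.9]), likes, conf)

In the full model, the predicted like probabilities come from the autoregressive network rather than from a fixed vector as in this toy usage example.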
Product search serves as an important entry point for online shopping. In contrast to web search, the retrieved results in product search not only need to be relevant but should also satisfy customers' preferences in order to elicit purchases. Previous work has shown the efficacy of purchase history in personalized product search. However, customers with little or no purchase history do not benefit from personalized product search. Furthermore, preferences extracted from a customer's purchase history are usually long-term and may not always align with her short-term interests. Hence, in this paper, we leverage clicks within a query session, as implicit feedback, to represent users' hidden intents, which further act as the basis for re-ranking subsequent result pages for the query. Modeling user preference with implicit feedback has been studied extensively in recommendation tasks. However, there has been little research on modeling users' short-term interests in product search. We study whether short-term context can help promote a user's ideal item in the following result pages for a query. Furthermore, we propose an end-to-end context-aware embedding model which can capture both long-term and short-term context dependencies. Our experimental results on datasets collected from the search log of a commercial product search engine show that short-term context leads to much better performance compared with long-term and no context. Our results also show that our proposed model is more effective than word-based context-aware models.
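The following sketch is only a simplified illustration of the re-ranking idea (not the end-to-end embedding model from the paper): candidates on subsequent result pages are scored against an embedding of the items clicked earlier in the query session, and that short-term signal is blended with the original relevance score; the mean-pooled context and the mixing weight are assumptions.

    import numpy as np

    def session_context(clicked_item_vecs):
        """Short-term context = mean of embeddings of items clicked so far in the query session."""
        return np.mean(clicked_item_vecs, axis=0)

    def rerank(candidates, item_vecs, relevance, clicked_item_vecs, mix=0.3):
        """Blend the original relevance score with cosine similarity to the short-term context."""
        ctx = session_context(clicked_item_vecs)
        ctx = ctx / (np.linalg.norm(ctx) + 1e-8)
        scores = {}
        for item in candidates:
            v = item_vecs[item]
            sim = float(v @ ctx) / (np.linalg.norm(v) + 1e-8)
            scores[item] = (1.0 - mix) * relevance[item] + mix * sim
        return sorted(candidates, key=scores.get, reverse=True)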
In this paper, we reflect on ways to improve the quality of bio-medical information retrieval by drawing implicit negative feedback from negated information in noisy natural language search queries. We begin by studying the extent to which negations occur in clinical texts and quantify their detrimental effect on retrieval performance. Subsequently, we present a number of query reformulation and ranking approaches that remedy these shortcomings by resolving natural language negations. Our experimental results are based on data collected in the course of the TREC Clinical Decision Support Track and show consistent improvements compared to state-of-the-art methods. Using our novel algorithms, we are able to reduce the negative impact of negations on early precision by up to 65%.
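As a toy illustration of the negation-aware reformulation idea (not the TREC-tuned algorithms evaluated in the paper), the sketch below detects simple negation cues in a natural-language query and turns the negated terms into exclusion clauses instead of matching them positively; the cue list, the fixed negation scope and the clause structure are assumptions.

    NEGATION_CUES = {"no", "not", "without", "denies", "negative"}  # assumed cue list

    def split_negated_terms(query_tokens, scope=2):
        """Treat up to `scope` tokens after a negation cue as negated; everything else is positive."""
        positive, negated = [], []
        i = 0
        while i < len(query_tokens):
            if query_tokens[i].lower() in NEGATION_CUES:
                negated.extend(query_tokens[i + 1:i + 1 + scope])
                i += 1 + scope
            else:
                positive.append(query_tokens[i])
                i += 1
        return positive, negated

    def reformulate(query_tokens):
        """Keep positive terms as must-match clauses and demote negated terms to must-not clauses."""
        positive, negated = split_negated_terms(query_tokens)
        return {"must": positive, "must_not": negated}

    print(reformulate("chest pain no fever".split()))
    # -> {'must': ['chest', 'pain'], 'must_not': ['fever']}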
In this paper, we propose a robust sequential learning strategy for training large-scale Recommender Systems (RS) over implicit feedback, mainly in the form of clicks. Our approach relies on the minimization of a pairwise ranking loss over blocks of consecutive items, each constituted by a sequence of non-clicked items followed by a clicked one, for each user. Parameter updates are discarded if, for a given user, the number of sequential blocks falls below or above given thresholds estimated over the distribution of block counts in the training set. This guards against an abnormal number of clicks on some targeted items, mainly due to bots, as well as against users with very few interactions. Both scenarios affect the decisions of the RS and induce a shift in the distribution of items shown to the users. We provide a theoretical analysis showing that, in the case where the ranking loss is convex, the deviation between the loss evaluated at the sequence of weights found by the proposed algorithm and its minimum is bounded. Furthermore, experimental results on five large-scale collections demonstrate the efficiency of the proposed algorithm with respect to state-of-the-art approaches, both regarding different ranking measures and computation time.
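The sketch below (our own, with an assumed quantile-based rule for the thresholds) illustrates the block construction and filtering described above: a user's chronological interaction sequence is cut into blocks of consecutive non-clicked items terminated by a clicked one, and the user's updates are skipped when the block count falls outside thresholds derived from the training distribution.

    import numpy as np

    def to_blocks(interactions):
        """interactions: list of (item, clicked) in chronological order.
        A block = the run of non-clicked items followed by the clicked item that ends it."""
        blocks, current = [], []
        for item, clicked in interactions:
            current.append(item)
            if clicked:
                blocks.append((current[:-1], current[-1]))  # (negatives, positive)
                current = []
        return blocks

    def block_thresholds(block_counts, low_q=0.05, high_q=0.95):
        """Assumed rule: keep users whose number of blocks lies between two quantiles of the training distribution."""
        return np.quantile(block_counts, low_q), np.quantile(block_counts, high_q)

    def keep_user(user_blocks, lo, hi):
        """Discard updates for users with too few blocks (little signal) or too many (bot-like behavior)."""
        return lo <= len(user_blocks) <= hi

Within each retained block, the pairwise ranking loss of the paper would then be applied between the clicked item and the preceding non-clicked ones.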