No Arabic abstract
Recommender systems rely on user behavior data like ratings and clicks to build personalization model. However, the collected data is observational rather than experimental, causing various biases in the data which significantly affect the learned model. Most existing work for recommendation debiasing, such as the inverse propensity scoring and imputation approaches, focuses on one or two specific biases, lacking the universal capacity that can account for mixed or even unknown biases in the data. Towards this research gap, we first analyze the origin of biases from the perspective of textit{risk discrepancy} that represents the difference between the expectation empirical risk and the true risk. Remarkably, we derive a general learning framework that well summarizes most existing debiasing strategies by specifying some parameters of the general framework. This provides a valuable opportunity to develop a universal solution for debiasing, e.g., by learning the debiasing parameters from data. However, the training data lacks important signal of how the data is biased and what the unbiased data looks like. To move this idea forward, we propose textit{AotoDebias} that leverages another (small) set of uniform data to optimize the debiasing parameters by solving the bi-level optimization problem with meta-learning. Through theoretical analyses, we derive the generalization bound for AutoDebias and prove its ability to acquire the appropriate debiasing strategy. Extensive experiments on two real datasets and a simulated dataset demonstrated effectiveness of AutoDebias. The code is available at url{https://github.com/DongHande/AutoDebias}.
Embedding learning of categorical features (e.g. user/item IDs) is at the core of various recommendation models including matrix factorization and neural collaborative filtering. The standard approach creates an embedding table where each row represents a dedicated embedding vector for every unique feature value. However, this method fails to efficiently handle high-cardinality features and unseen feature values (e.g. new video ID) that are prevalent in real-world recommendation systems. In this paper, we propose an alternative embedding framework Deep Hash Embedding (DHE), replacing embedding tables by a deep embedding network to compute embeddings on the fly. DHE first encodes the feature value to a unique identifier vector with multiple hashing functions and transformations, and then applies a DNN to convert the identifier vector to an embedding. The encoding module is deterministic, non-learnable, and free of storage, while the embedding network is updated during the training time to learn embedding generation. Empirical results show that DHE achieves comparable AUC against the standard one-hot full embedding, with smaller model sizes. Our work sheds light on the design of DNN-based alternative embedding schemes for categorical features without using embedding table lookup.
Knowledge distillation (KD) is a well-known method to reduce inference latency by compressing a cumbersome teacher model to a small student model. Despite the success of KD in the classification task, applying KD to recommender models is challenging due to the sparsity of positive feedback, the ambiguity of missing feedback, and the ranking problem associated with the top-N recommendation. To address the issues, we propose a new KD model for the collaborative filtering approach, namely collaborative distillation (CD). Specifically, (1) we reformulate a loss function to deal with the ambiguity of missing feedback. (2) We exploit probabilistic rank-aware sampling for the top-N recommendation. (3) To train the proposed model effectively, we develop two training strategies for the student model, called the teacher- and the student-guided training methods, selecting the most useful feedback from the teacher model. Via experimental results, we demonstrate that the proposed model outperforms the state-of-the-art method by 2.7-33.2% and 2.7-29.1% in hit rate (HR) and normalized discounted cumulative gain (NDCG), respectively. Moreover, the proposed model achieves the performance comparable to the teacher model.
Cross-domain recommendation can alleviate the data sparsity problem in recommender systems. To transfer the knowledge from one domain to another, one can either utilize the neighborhood information or learn a direct mapping function. However, all existing methods ignore the high-order connectivity information in cross-domain recommendation area and suffer from the domain-incompatibility problem. In this paper, we propose a textbf{J}oint textbf{S}pectral textbf{C}onvolutional textbf{N}etwork (JSCN) for cross-domain recommendation. JSCN will simultaneously operate multi-layer spectral convolutions on different graphs, and jointly learn a domain-invariant user representation with a domain adaptive user mapping module. As a result, the high-order comprehensive connectivity information can be extracted by the spectral convolutions and the information can be transferred across domains with the domain-invariant user mapping. The domain adaptive user mapping module can help the incompatible domains to transfer the knowledge across each other. Extensive experiments on $24$ Amazon rating datasets show the effectiveness of JSCN in the cross-domain recommendation, with $9.2%$ improvement on recall and $36.4%$ improvement on MAP compared with state-of-the-art methods. Our code is available online ~footnote{https://github.com/JimLiu96/JSCN}.
Session based model is widely used in recommend system. It use the user click sequence as input of a Recurrent Neural Network (RNN), and get the output of the RNN network as the vector embedding of the session, and use the inner product of the vector embedding of session and the vector embedding of the next item as the score that is the metric of the interest to the next item. This method can be used for the match stage for the recommendation system whose item number is very big by using some index method like KD-Tree or Ball-Tree and etc.. But this method repudiate the variousness of the interest of user in a session. We generated the model to modify the vector embedding of session to a symmetric matrix embedding, that is equivalent to a quadratic form on the vector space of items. The score is builded as the value of the vector embedding of next item under the quadratic form. The eigenvectors of the symmetric matrix embedding corresponding to the positive eigenvalues are conjectured to represent the interests of user in the session. This method can be used for the match stage also. The experiments show that this method is better than the method of vector embedding.
Modern deep learning-based recommendation systems exploit hundreds to thousands of different categorical features, each with millions of different categories ranging from clicks to posts. To respect the natural diversity within the categorical data, embeddings map each category to a unique dense representation within an embedded space. Since each categorical feature could take on as many as tens of millions of different possible categories, the embedding tables form the primary memory bottleneck during both training and inference. We propose a novel approach for reducing the embedding size in an end-to-end fashion by exploiting complementary partitions of the category set to produce a unique embedding vector for each category without explicit definition. By storing multiple smaller embedding tables based on each complementary partition and combining embeddings from each table, we define a unique embedding for each category at smaller memory cost. This approach may be interpreted as using a specific fixed codebook to ensure uniqueness of each categorys representation. Our experimental results demonstrate the effectiveness of our approach over the hashing trick for reducing the size of the embedding tables in terms of model loss and accuracy, while retaining a similar reduction in the number of parameters.