Fast Matrix Factorization with Non-Uniform Weights on Missing Data

64 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Xiaoyu Du

تاريخ النشر 2018

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Xiangnan He - Jinhui Tang - Xiaoyu Du

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Matrix factorization (MF) has been widely used to discover the low-rank structure and to predict the missing entries of data matrix. In many real-world learning systems, the data matrix can be very high-dimensional but sparse. This poses an imbalanced learning problem, since the scale of missing entries is usually much larger than that of observed entries, but they cannot be ignored due to the valuable negative signal. For efficiency concern, existing work typically applies a uniform weight on missing entries to allow a fast learning algorithm. However, this simplification will decrease modeling fidelity, resulting in suboptimal performance for downstream applications. In this work, we weight the missing data non-uniformly, and more generically, we allow any weighting strategy on the missing data. To address the efficiency challenge, we propose a fast learning method, for which the time complexity is determined by the number of observed entries in the data matrix, rather than the matrix size. The key idea is two-fold: 1) we apply truncated SVD on the weight matrix to get a more compact representation of the weights, and 2) we learn MF parameters with element-wise alternating least squares (eALS) and memorize the key intermediate variables to avoid repeating computations that are unnecessary. We conduct extensive experiments on two recommendation benchmarks, demonstrating the correctness, efficiency, and effectiveness of our fast eALS method.

قيم البحث

135 - Eduardo Pavez , Antonio Ortega 2019

In this paper we study covariance estimation with missing data. We consider missing data mechanisms that can be independent of the data, or have a time varying dependency. Additionally, observed variables may have arbitrary (non uniform) and dependen t observation probabilities. For each mechanism, we construct an unbiased estimator and obtain bounds for the expected value of their estimation error in operator norm. Our bounds are equivalent, up to constant and logarithmic factors, to state of the art bounds for complete and uniform missing observations. Furthermore, for the more general non uniform and dependent cases, the proposed bounds are new or improve upon previous results. Our error estimates depend on quantities we call scaled effective rank, which generalize the effective rank to account for missing observations. All the estimators studied in this work have the same asymptotic convergence rate (up to logarithmic factors).

نظرية الإحصاء نظرية الإحصاء

Nonnegative Matrix Factorization (NMF) with Heteroscedastic Uncertainties and Missing data

290 - Guangtun Zhu 2016

Dimensionality reduction and matrix factorization techniques are important and useful machine-learning techniques in many fields. Nonnegative matrix factorization (NMF) is particularly useful for spectral analysis and image processing in astronomy. I present the vectorized update rules and an independent proof of their convergence for NMF with heteroscedastic measurements and missing data. I release a Python implementation of the rules and use an optical spectroscopic dataset of extragalactic sources as an example for demonstration. A future paper will present results of applying the technique to image processing of planetary disks.

الأجهزة والأساليب للزيئات الفيزياء الفلكية

A fast two-stage algorithm for non-negative matrix factorization in streaming data

136 - Ran Gu , Qiang Du , Simon J. L. Billinge 2021

In this article, we study algorithms for nonnegative matrix factorization (NMF) in various applications involving streaming data. Utilizing the continual nature of the data, we develop a fast two-stage algorithm for highly efficient and accurate NMF. In the first stage, an alternating non-negative least squares (ANLS) framework is used, in combination with active set method with warm-start strategy for the solution of subproblems. In the second stage, an interior point method is adopted to accelerate the local convergence. The convergence of the proposed algorithm is proved. The new algorithm is compared with some existing algorithms in benchmark tests using both real-world data and synthetic data. The results demonstrate the advantage of our algorithm in finding high-precision solutions.

التحسين والتحكم التحليل العددي التحليل العددي

Clustered Monotone Transforms for Rating Factorization

388 - Gaurush Hiranandani , Raghav Somani , Oluwasanmi Koyejo 2018

Exploiting low-rank structure of the user-item rating matrix has been the crux of many recommendation engines. However, existing recommendation engines force raters with heterogeneous behavior profiles to map their intrinsic rating scales to a common rating scale (e.g. 1-5). This non-linear transformation of the rating scale shatters the low-rank structure of the rating matrix, therefore resulting in a poor fit and consequentially, poor recommendations. In this paper, we propose Clustered Monotone Transforms for Rating Factorization (CMTRF), a novel approach to perform regression up to unknown monotonic transforms over unknown population segments. Essentially, for recommendation systems, the technique searches for monotonic transformations of the rating scales resulting in a better fit. This is combined with an underlying matrix factorization regression model that couples the user-wise ratings to exploit shared low dimensional structure. The rating scale transformations can be generated for each user, for a cluster of users, or for all the users at once, forming the basis of three simple and efficient algorithms proposed in this paper, all of which alternate between transformation of the rating scales and matrix factorization regression. Despite the non-convexity, CMTRF is theoretically shown to recover a unique solution under mild conditions. Experimental results on two synthetic and seven real-world datasets show that CMTRF outperforms other state-of-the-art baselines.

استرجاع المعلومات التعلم الآلي التعلم الالي

Factorization Machines for Data with Implicit Feedback

126 - Babak Loni , Martha Larson , Alan Hanjalic 2018

In this work, we propose FM-Pair, an adaptation of Factorization Machines with a pairwise loss function, making them effective for datasets with implicit feedback. The optimization model in FM-Pair is based on the BPR (Bayesian Personalized Ranking) criterion, which is a well-established pairwise optimization model. FM-Pair retains the advantages of FMs on generality, expressiveness and performance and yet it can be used for datasets with implicit feedback. We also propose how to apply FM-Pair effectively on two collaborative filtering problems, namely, context-aware recommendation and cross-domain collaborative filtering. By performing experiments on different datasets with explicit or implicit feedback we empirically show that in most of the tested datasets, FM-Pair beats state-of-the-art learning-to-rank methods such as BPR-MF (BPR with Matrix Factorization model). We also show that FM-Pair is significantly more effective for ranking, compared to the standard FMs model. Moreover, we show that FM-Pair can utilize context or cross-domain information effectively as the accuracy of recommendations would always improve with the right auxiliary features. Finally we show that FM-Pair has a linear time complexity and scales linearly by exploiting additional features.

استرجاع المعلومات