Memorize, Factorize, or be Naive: Learning Optimal Feature Interaction Methods for CTR Prediction


Published by Fuyuan Lyu

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Fuyuan Lyu - Xing Tang - Huifeng Guo

Machine Learning, Information Retrieval


Click-through rate (CTR) prediction is one of the core tasks in commercial recommender systems. It aims to predict the probability of a user clicking a particular item given user and item features. As feature interactions bring in non-linearity, they are widely adopted to improve the performance of CTR prediction models. Therefore, effectively modelling feature interactions has attracted much attention in both research and industry. Current approaches can generally be categorized into three classes: (1) naive methods, which do not model feature interactions and only use original features; (2) memorized methods, which memorize feature interactions by explicitly viewing them as new features and assigning trainable embeddings; (3) factorized methods, which learn latent vectors for original features and implicitly model feature interactions through factorization functions. Studies have shown that modelling feature interactions by any one of these methods alone is suboptimal due to the unique characteristics of different feature interactions. To address this issue, we first propose a general framework called OptInter, which finds the most suitable modelling method for each feature interaction. Different state-of-the-art deep CTR models can be viewed as instances of OptInter. To realize the functionality of OptInter, we also introduce a learning algorithm that automatically searches for the optimal modelling method. We conduct extensive experiments on four large datasets. Our experiments show that OptInter improves on the best-performing state-of-the-art baseline deep CTR models by up to 2.21%. Compared to the memorized method, which also outperforms the baselines, OptInter reduces the number of parameters by up to 91%. In addition, we conduct several ablation studies to investigate the influence of different components of OptInter. Finally, we provide interpretable discussions on the results of OptInter.
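The three classes named in the abstract can be illustrated with a toy sketch. This is not the OptInter implementation: the class `InteractionSketch`, its method names, the element-wise product used as the factorization function, and the hand-written per-interaction choice are all assumptions made for illustration. In the paper the choice is found by a learned search, which is not reproduced here.

```python
import random


class InteractionSketch:
    """Toy embedding tables for three ways of handling one feature pair."""

    def __init__(self, dim=4, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.feature_emb = {}  # one vector per original feature value
        self.cross_emb = {}    # one vector per (value_a, value_b) pair

    def _lookup(self, table, key):
        # Create a small random vector the first time a key is seen.
        if key not in table:
            table[key] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.dim)]
        return table[key]

    def naive(self, a, b):
        # Naive: the interaction is not modelled, so it contributes nothing.
        return []

    def memorized(self, a, b):
        # Memorized: the pair itself is a new feature with its own embedding.
        return self._lookup(self.cross_emb, (a, b))

    def factorized(self, a, b):
        # Factorized: combine the two original embeddings; the element-wise
        # product is just one common choice of factorization function.
        ea = self._lookup(self.feature_emb, a)
        eb = self._lookup(self.feature_emb, b)
        return [x * y for x, y in zip(ea, eb)]

    def interact(self, a, b, method):
        methods = {
            "naive": self.naive,
            "memorized": self.memorized,
            "factorized": self.factorized,
        }
        if method not in methods:
            raise ValueError(f"unknown method: {method}")
        return methods[method](a, b)


# Hypothetical per-interaction choice, written by hand for this sketch.
model = InteractionSketch(dim=4, seed=0)
city, category = ("city", "Paris"), ("category", "shoes")
memorized_vec = model.interact(city, category, "memorized")
factorized_vec = model.interact(city, category, "factorized")
```

The sketch also hints at the parameter trade-off the abstract mentions: the memorized table grows with the number of distinct value pairs, while the factorized table grows only with the number of distinct original values.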


134 - Shu Wu, Feng Yu, Xueli Yu 2020

Click-through rate (CTR) prediction plays a central role in computational advertising and recommender systems. Several kinds of methods have been proposed in this field, such as Logistic Regression (LR), Factorization Machines (FM) and deep learning based methods like Wide&Deep, Neural Factorization Machines (NFM) and DeepFM. However, such approaches generally use the vector product of each pair of features, which ignores the different semantic spaces of the feature interactions. In this paper, we propose a novel Tensor-based Feature interaction Network (TFNet) model, which introduces an operating tensor to elaborate feature interactions via multi-slice matrices in multiple semantic spaces. Extensive offline and online experiments show that TFNet: 1) outperforms the competing methods on the widely used Criteo and Avazu datasets; 2) achieves a large improvement in revenue and click rate in online A/B tests in the largest Chinese App recommender system, Tencent MyApp.
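To make the contrast concrete, the sketch below compares a single vector product per feature pair with one score per matrix slice. It is only one possible reading of "multi-slice matrices", not the TFNet architecture: the function names and the bilinear form are assumptions introduced for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def bilinear(v_i, matrix, v_j):
    """Compute v_i^T M v_j for a square matrix given as a list of rows."""
    return sum(
        v_i[r] * matrix[r][c] * v_j[c]
        for r in range(len(v_i))
        for c in range(len(v_j))
    )


def multi_slice_interaction(v_i, slices, v_j):
    """One interaction score per slice, i.e. per assumed semantic space."""
    return [bilinear(v_i, m, v_j) for m in slices]


v_i, v_j = [1.0, 2.0], [3.0, 4.0]
identity = [[1.0, 0.0], [0.0, 1.0]]  # this slice reproduces the plain product
swap = [[0.0, 1.0], [1.0, 0.0]]      # this slice pairs different dimensions
scores = multi_slice_interaction(v_i, [identity, swap], v_j)
```

With the identity slice the bilinear score equals the plain inner product, so the single vector product is the special case of one fixed slice. Additional slices let the same pair of embeddings interact in more than one way.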

Information Retrieval

Explicit Semantic Cross Feature Learning via Pre-trained Graph Neural Networks for CTR Prediction

393 - Feng Li, Bencheng Yan, Qingqing Long 2021

Cross features play an important role in click-through rate (CTR) prediction. Most existing methods adopt a DNN-based model to capture the cross features in an implicit manner. These implicit methods may lead to suboptimal performance due to their limitations in explicit semantic modeling. Although traditional statistical explicit semantic cross features can address the problem in these implicit methods, they still suffer from some challenges, including a lack of generalization and expensive memory cost. Few works focus on tackling these challenges. In this paper, we take the first step in learning the explicit semantic cross features and propose Pre-trained Cross Feature learning Graph Neural Networks (PCF-GNN), a GNN-based pre-trained model aiming at generating cross features in an explicit fashion. Extensive experiments are conducted on both public and industrial datasets, where PCF-GNN shows competence in both performance and memory efficiency in various tasks.

Artificial Intelligence, Information Retrieval, Machine Learning

Naive Feature Selection: Sparsity in Naive Bayes

159 - Armin Askari, Alexandre d'Aspremont, Laurent El Ghaoui 2019

Due to its linear complexity, naive Bayes classification remains an attractive supervised learning method, especially in very large-scale settings. We propose a sparse version of naive Bayes, which can be used for feature selection. This leads to a combinatorial maximum-likelihood problem, for which we provide an exact solution in the case of binary data, or a bound in the multinomial case. We prove that our bound becomes tight as the marginal contribution of additional features decreases. Both binary and multinomial sparse models are solvable in time almost linear in problem size, representing a very small extra relative cost compared to the classical naive Bayes. Numerical experiments on text data show that the naive Bayes feature selection method is as statistically effective as state-of-the-art feature selection methods such as recursive feature elimination, $l_1$-penalized logistic regression and LASSO, while being orders of magnitude faster. For a large data set with more than $1.6$ million training points and about $12$ million features, and with a non-optimized CPU implementation, our sparse naive Bayes model can be trained in less than 15 seconds.

Machine Learning

Multi-Interactive Attention Network for Fine-grained Feature Learning in CTR Prediction

300 - Kai Zhang, Hao Qian, Qing Cui 2020

In the Click-Through Rate (CTR) prediction scenario, users' sequential behaviors are widely used in the recent literature to capture user interest. However, despite being extensively studied, these sequential methods still suffer from three limitations. First, existing methods mostly apply attention to users' behaviors, which is not always suitable for CTR prediction, because users often click on new products that are unrelated to any historical behavior. Second, in real scenarios, many users were active long ago but have become relatively inactive recently. Thus, it is hard to precisely capture users' current preferences from early behaviors. Third, multiple representations of users' historical behaviors in different feature subspaces are largely ignored. To remedy these issues, we propose a Multi-Interactive Attention Network (MIAN) to comprehensively extract the latent relationship among all kinds of fine-grained features (e.g., gender, age and occupation in the user profile). Specifically, MIAN contains a Multi-Interactive Layer (MIL) that integrates three local interaction modules to capture multiple representations of user preference through sequential behaviors and simultaneously utilize the fine-grained user-specific as well as context information. In addition, we design a Global Interaction Module (GIM) to learn the high-order interactions and balance the different impacts of multiple features. Finally, offline experimental results on three datasets, together with an online A/B test in a large-scale recommendation system, demonstrate the effectiveness of our proposed approach.

Information Retrieval, Artificial Intelligence

CPM-sensitive AUC for CTR prediction

175 - Zhaocheng Liu, Guangxue Yin 2019

The prediction of click-through rate (CTR) is crucial for industrial applications, such as online advertising. AUC is a commonly used evaluation indicator for CTR models. For advertising platforms, online performance is generally evaluated by CPM. However, in practice, AUC often improves in offline evaluation while online CPM does not. As a result, precious online traffic and human effort are wasted. This is because there is a gap between offline AUC and online CPM. AUC only reflects the ordering by CTR; it does not reflect the ordering by CTR*Bid. Moreover, the bids of different advertisements differ, so the income lost to each reverse-order pair also differs. For this reason, we propose the CPM-sensitive AUC (csAUC) to solve all these problems. We also give a csAUC calculation method based on dynamic programming. It can fully support the calculation of csAUC on large-scale data in real-world applications.
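The gap between ordering by CTR and ordering by CTR*Bid can be seen in a small sketch. It does not implement csAUC, whose definition is not given in the abstract. It only computes the standard pairwise AUC and compares two rankings, and the ad names, predicted CTRs and bids are made-up values for illustration.

```python
def pairwise_auc(scores, labels):
    """Standard AUC: fraction of (clicked, not clicked) pairs ordered correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical ads: (name, predicted CTR, bid).
ads = [("A", 0.10, 1.0), ("B", 0.05, 3.0), ("C", 0.08, 1.0)]

# Ranking by predicted CTR alone, which is all that AUC looks at.
by_ctr = [name for name, ctr, bid in sorted(ads, key=lambda ad: -ad[1])]

# Ranking by CTR * bid, which is what drives revenue per impression.
by_ctr_bid = [name for name, ctr, bid in sorted(ads, key=lambda ad: -ad[1] * ad[2])]
```

Here `by_ctr` is `["A", "C", "B"]` while `by_ctr_bid` is `["B", "A", "C"]`: the bids never enter `pairwise_auc`, so a model can improve its AUC without improving the ordering that determines income.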

Machine Learning