An Embedding Learning Framework for Numerical Features in CTR Prediction


Abstract in English

Click-Through Rate (CTR) prediction is critical for industrial recommender systems, where most deep CTR models follow an Embedding & Feature Interaction paradigm. However, the majority of methods focus on designing network architectures to better capture feature interactions while the feature embedding, especially for numerical features, has been overlooked. Existing approaches for numerical features are difficult to capture informative knowledge because of the low capacity or hard discretization based on the offline expertise feature engineering. In this paper, we propose a novel embedding learning framework for numerical features in CTR prediction (AutoDis) with high model capacity, end-to-end training and unique representation properties preserved. AutoDis consists of three core components: meta-embeddings, automatic discretization and aggregation. Specifically, we propose meta-embeddings for each numerical field to learn global knowledge from the perspective of field with a manageable number of parameters. Then the differentiable automatic discretization performs soft discretization and captures the correlations between the numerical features and meta-embeddings. Finally, distinctive and informative embeddings are learned via an aggregation function. Comprehensive experiments on two public and one industrial datasets are conducted to validate the effectiveness of AutoDis. Moreover, AutoDis has been deployed onto a mainstream advertising platform, where online A/B test demonstrates the improvement over the base model by 2.1% and 2.7% in terms of CTR and eCPM, respectively. In addition, the code of our framework is publicly available in MindSpore(https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/recommend/autodis).

Download