TADO: Time-varying Attention with Dual-Optimizer Model

Publication date: 2020

Fields: Informatics Engineering

Research language: English

Authors: Yuexin Wu, Tianyu Gao, Sihao Wang

Information Retrieval

Review-based recommender systems are commonly used to measure users' preferences for different items. In this paper, we address three main problems in review-based methods. First, these methods suffer from class imbalance: rating levels with lower proportions are ignored to some extent, so performance on relatively rare rating levels is unsatisfactory. As the first attempt in this field to address this problem, we propose a flexible dual-optimizer model to gain robustness from both regression loss and classification loss. Second, to address the insufficient contextual information extraction ability of word embeddings, we introduce BERT into the review-based method for the first time to improve semantic analysis. Third, existing methods ignore the feature information of time-varying user preferences. Therefore, we propose a time-varying feature extraction module with bidirectional long short-term memory and a multi-scale convolutional neural network. An interaction component then further summarizes the contextual information of the user-item pairs. To verify the effectiveness of the proposed TADO, we conduct extensive experiments on 23 benchmark datasets selected from Amazon Product Reviews. Compared with several recently proposed state-of-the-art methods, our model obtains significant average gains over ALFM, MPCN, and ANR of 20.98%, 9.84%, and 15.46%, respectively. Further analysis demonstrates the necessity of jointly using the proposed components in TADO.
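The idea of drawing on both a regression loss and a classification loss for rating prediction can be illustrated with a small sketch. The abstract does not say how TADO computes or combines the two losses, nor how its two optimizers are wired, so everything below is an assumption: ratings are treated as classes 1..K, the classification loss is a cross-entropy over those classes, and the regression loss is the squared error of the expected rating. The function names and the example logits are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classification_loss(logits, rating):
    """Cross-entropy, treating rating levels 1..K as classes (assumption)."""
    probs = softmax(logits)
    return -math.log(probs[rating - 1])

def regression_loss(logits, rating):
    """Squared error between the expected rating and the true rating (assumption)."""
    probs = softmax(logits)
    expected = sum((k + 1) * p for k, p in enumerate(probs))
    return (expected - rating) ** 2

# Hypothetical model output for one review: scores for ratings 1..5.
logits = [0.1, 0.2, 0.3, 1.5, 2.0]
rating = 2  # a comparatively rare, low rating

print("classification loss:", classification_loss(logits, rating))
print("regression loss:    ", regression_loss(logits, rating))
```

In a dual-optimizer setup each loss would presumably drive its own optimizer step; that part is omitted here because the abstract gives no detail on it.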

Learning Multi-touch Conversion Attribution with Dual-attention Mechanisms for Online Advertising

100 - Kan Ren , Yuchen Fang , Weinan Zhang 2018

In online advertising, Internet users may be exposed to a sequence of different ad campaigns, i.e., display ads, search, or referrals from multiple channels, before being led to a final sales conversion and transaction. For both campaigners and publishers, it is critical to estimate the contribution of each ad campaign touch-point during the customer journey (conversion funnel) and to assign the right credit to the right ad exposure. However, existing research on the multi-touch attribution problem lacks a principled way of utilizing users' pre-conversion actions (i.e., clicks), and quite often fails to model the sequential patterns among the touch-points in a user's behavior data. To make matters worse, current industry practice merely employs a set of arbitrary rules as the attribution model, e.g., the popular last-touch model assigns 100% credit to the final touch-point regardless of actual attributions. In this paper, we propose a Dual-attention Recurrent Neural Network (DARNN) for the multi-touch attribution problem. It learns the attribution values through an attention mechanism directly from the conversion estimation objective. To achieve this, we use sequence-to-sequence prediction for user clicks, and combine post-view and post-click attribution patterns for the final conversion estimation. To quantitatively benchmark attribution models, we also propose a novel yet practical attribution evaluation scheme through the proxy of budget allocation (under the estimated attributions) over ad channels. The experimental results on two real datasets demonstrate significant performance gains of our attribution model over the state of the art.
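The contrast between the rule-based last-touch model and attention-style attribution can be made concrete with a toy sketch. This is not the DARNN model: the per-touch-point scores below are made-up numbers standing in for whatever a trained attention mechanism would produce, and the channel names and function names are hypothetical.

```python
import math

def last_touch_credits(num_touch_points):
    """Rule-based baseline: the final touch-point receives all of the credit."""
    credits = [0.0] * num_touch_points
    credits[-1] = 1.0
    return credits

def attention_credits(scores):
    """Softmax-normalize per-touch-point scores into credits that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

journey = ["display", "search", "referral"]  # hypothetical channel sequence
scores = [2.0, 0.5, 1.0]                     # hypothetical attention scores

rule = last_touch_credits(len(journey))
learned = attention_credits(scores)
for channel, r, a in zip(journey, rule, learned):
    print(f"{channel:10s} last-touch={r:.2f} attention={a:.2f}")
```

Under the last-touch rule the first two channels receive no credit whatever their actual influence, whereas normalized scores spread the credit across the whole journey.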

Information Retrieval Artificial Intelligence Machine Learning

Why does attention to web articles fall with time?

422 - M.V. Simkin , V.P. Roychowdhury 2012

We analyze access statistics of a hundred and fifty blog entries and news articles, for periods of up to three years. The access rate falls as an inverse power of the time passed since publication. The power law holds for periods of up to a thousand days. The exponents differ between blogs and are distributed between 0.6 and 3.2. We argue that the decay of attention to a web article is caused by the link to it first dropping down the list of links on the website's front page, then disappearing from the front page, and subsequently moving further into the background. The other proposed explanations, which use a novelty factor that decays with time or some intricate theory of human dynamics, cannot explain all of the experimental observations.
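An inverse power law means the access rate behaves like rate(t) = c * t^(-alpha), which becomes a straight line with slope -alpha on log-log axes. The sketch below estimates alpha by ordinary least squares on the logs. The abstract does not state the fitting procedure the authors used, so this method is an assumption, and the data are synthetic values generated for illustration, not measurements from the paper.

```python
import math

def fit_power_law_exponent(days, rates):
    """Estimate alpha in rate = c * t**(-alpha) via least squares on log-log data."""
    xs = [math.log(t) for t in days]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return -slope  # the decay exponent is the negated log-log slope

# Synthetic access rates generated with exponent 1.5 (illustrative only).
days = [1, 2, 5, 10, 50, 100, 500, 1000]
rates = [200.0 * t ** -1.5 for t in days]

print("estimated exponent:", fit_power_law_exponent(days, rates))
```

On real access logs the counts are noisy and often zero at late times, so a plain log-log regression like this one would need binning or a more careful estimator.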

Information Retrieval Physics and Society

Linear-Time Self Attention with Codeword Histogram for Efficient Recommendation

89 - Yongji Wu , Defu Lian , Neil Zhenqiang Gong 2021

Self-attention has become increasingly popular in a variety of sequence modeling tasks, from natural language processing to recommendation, due to its effectiveness. However, self-attention suffers from quadratic computational and memory complexity, which prohibits its application to long sequences. Existing approaches that address this issue mainly rely on a sparse attention context, using either a local window or a permuted bucket obtained by locality-sensitive hashing (LSH) or sorting, so crucial information may be lost. Inspired by the idea of vector quantization, which uses cluster centroids to approximate items, we propose LISA (LInear-time Self Attention), which enjoys both the effectiveness of vanilla self-attention and the efficiency of sparse attention. LISA scales linearly with the sequence length, while enabling full contextual attention by computing differentiable histograms of codeword distributions. Meanwhile, unlike some efficient attention methods, our method poses no restriction on causal masking or sequence length. We evaluate our method on four real-world datasets for sequential recommendation. The results show that LISA outperforms the state-of-the-art efficient attention methods in both performance and speed, and that it is up to 57x faster and 78x more memory efficient than vanilla self-attention.
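The two ingredients named in the abstract, vector quantization and histograms of codewords, can be illustrated in isolation. The sketch below is not LISA: it uses hard nearest-centroid assignment rather than a differentiable histogram, and it does not compute attention at all. It only shows, under those assumptions, how a running histogram over a sequence prefix can be maintained in a single pass, so the cost grows linearly with sequence length. The codebook, the item vectors, and the function names are hypothetical.

```python
def nearest_codeword(vec, codebook):
    """Index of the codeword (centroid) closest to vec in squared Euclidean distance."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sqdist(vec, codebook[k]))

def prefix_histograms(sequence, codebook):
    """For each position, count how often each codeword occurs up to that position.

    Only items at or before position i contribute to histogram i, which is
    consistent with causal masking. One pass: O(len(sequence) * len(codebook)).
    """
    counts = [0] * len(codebook)
    histograms = []
    for vec in sequence:
        counts[nearest_codeword(vec, codebook)] += 1
        histograms.append(list(counts))
    return histograms

codebook = [[0.0, 0.0], [1.0, 1.0]]                 # hypothetical centroids
sequence = [[0.1, 0.0], [0.9, 1.0], [0.2, 0.1]]     # hypothetical item embeddings

for position, hist in enumerate(prefix_histograms(sequence, codebook)):
    print(position, hist)
```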

Information Retrieval

A Brand-level Ranking System with the Customized Attention-GRU Model

73 - Yu Zhu , Junxiong Zhu , Jie Hou 2018

On e-commerce websites like Taobao, brand plays an increasingly important role in users' click and purchase decisions, partly because users now attach more importance to product quality and brand is an indicator of quality. However, existing ranking systems are not specifically designed to satisfy this kind of demand. Some design tricks may partially alleviate this problem, but they still cannot provide satisfactory results or may create additional interaction cost. In this paper, we design the first brand-level ranking system to address this problem. The key challenge of this system is how to sufficiently exploit users' rich behavior on e-commerce websites to rank the brands. In our solution, we first conduct feature engineering specifically tailored to the personalized brand ranking problem and then rank the brands with an adapted Attention-GRU model containing three important modifications. Note that our proposed modifications can also apply to many other machine learning models on various tasks. We conduct a series of experiments to evaluate the effectiveness of our proposed ranking model and test the response of real users to the brand-level ranking system on a large-scale e-commerce platform, i.e., Taobao.

Information Retrieval Machine Learning

Self-Attention and Ingredient-Attention Based Model for Recipe Retrieval from Image Queries

104 - Matthias Fontanellaz , Stergios Christodoulidis , Stavroula Mougiakakou 2019

Direct computer-vision-based nutrient content estimation is a demanding task, due to deformation and occlusion of ingredients, as well as high intra-class and low inter-class variability between meal classes. To tackle these issues, we propose a system for recipe retrieval from images. The recipe information can subsequently be used to estimate the nutrient content of the meal. In this study, we use the multi-modal Recipe1M dataset, which contains over 1 million recipes accompanied by over 13 million images. The proposed model can operate as a first step in an automatic pipeline for the estimation of nutrient content by supplying hints related to ingredients and instructions. Through self-attention, our model can directly process raw recipe text, making the upstream instruction sentence embedding process redundant and thus reducing training time, while providing desirable retrieval results. Furthermore, we propose an ingredient attention mechanism to gain insight into which instructions, parts of instructions, or single instruction words are important for processing a single ingredient within a certain recipe. Attention-based recipe text encoding helps to address the issue of high intra-class and low inter-class variability by focusing on preparation steps specific to the meal. The experimental results demonstrate the potential of such a system for recipe retrieval from images. A comparison with two baseline methods is also presented.

Information Retrieval Computation and Language