We analyze the access statistics of one hundred and fifty blog entries and news articles over periods of up to three years. The access rate falls as an inverse power of the time elapsed since publication, and the power law holds for periods of up to a thousand days. The exponents differ from blog to blog and are distributed between 0.6 and 3.2. We argue that the decay of attention to a web article is caused by the link to it first dropping down the list of links on the website's front page, then disappearing from the front page, and subsequently moving further into the background. Alternative explanations, based on a novelty factor that decays with time or on intricate theories of human dynamics, cannot account for all of the experimental observations.
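A minimal sketch of how such a power-law exponent can be estimated from access logs, via linear regression in log-log space; the synthetic data and the exponent value 1.2 below are illustrative assumptions, not figures from the paper:

```python
# Estimating the exponent of a power-law decay n(t) ~ t^(-alpha)
# by fitting log(hits) = log(c) - alpha * log(t).
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(1, 1001)                       # days since publication
alpha_true = 1.2                             # assumed exponent for the demo
hits = 500 * t**(-alpha_true) * rng.lognormal(0.0, 0.3, t.size)

slope, intercept = np.polyfit(np.log(t), np.log(hits), 1)
print(f"estimated exponent: {-slope:.2f}")   # close to the assumed 1.2
```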
Deep matching models aim to help search engines retrieve more relevant documents by mapping queries and documents into semantic vectors during first-stage retrieval. When BERT is used as the deep matching model, the attention score between two words is built solely on local contextualized word embeddings; it lacks the prior global knowledge needed to distinguish the importance of different words, which has been shown to play a critical role in information retrieval tasks. In addition, BERT performs attention only across sub-word tokens, which weakens whole-word attention representations. We propose a novel Global Weighted Self-Attention (GLOW) network for web document search. GLOW fuses global corpus statistics into the deep matching model: by adding prior weights derived from global information, such as BM25, into attention generation, GLOW learns weighted attention scores jointly with the query matrix $Q$ and key matrix $K$. We also present an efficient whole-word weight-sharing solution that brings prior whole-word knowledge into sub-word-level attention, helping the Transformer learn whole-word-level attention. To make our model applicable to complex web search scenarios, we introduce a combined-fields representation that accommodates documents with multiple fields, even with variable numbers of instances. We demonstrate that GLOW is more effective at capturing topical and semantic representations of both queries and documents. Intrinsic evaluation and experiments on public data sets show GLOW to be a general framework for document retrieval tasks: it outperforms BERT and other competitive baselines by a large margin while retaining the same model complexity as BERT.
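As a rough illustration of the mechanism the abstract describes, the hedged sketch below adds a global term-importance prior (a stand-in for BM25/IDF weights) as a bias on self-attention scores. The additive fusion, the function names, and all shapes are assumptions for illustration, not the paper's actual formulation:

```python
# Sketch: self-attention with a global term-importance prior fused into
# the attention scores. Sub-word tokens of one whole word would share
# the same prior weight under the whole-word weight-sharing idea.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def weighted_self_attention(X, Wq, Wk, Wv, prior):
    """X: (n, d) token embeddings; prior: (n,) global token weights."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)              # local contextual scores
    scores = scores + np.log(prior)[None, :]   # global prior as additive bias (assumption)
    return softmax(scores) @ V

n, d = 6, 16
rng = np.random.default_rng(1)
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
idf_like = rng.uniform(0.5, 3.0, size=n)       # stand-in for BM25/IDF weights
print(weighted_self_attention(X, Wq, Wk, Wv, idf_like).shape)  # (6, 16)
```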
Review-based recommender systems are commonly used to measure users' preferences towards different items. In this paper, we address three main problems of existing review-based methods. First, these methods suffer from class imbalance: rating levels with lower proportions are ignored to some extent, so performance on relatively rare rating levels is unsatisfactory. As the first attempt in this field to address this problem, we propose a flexible dual-optimizer model that gains robustness from both a regression loss and a classification loss. Second, to compensate for the limited ability of word embeddings to extract contextual information, we are the first to introduce BERT into a review-based method to improve the performance of semantic analysis. Third, existing methods ignore the feature information of time-varying user preferences; we therefore propose a time-varying feature extraction module based on a bidirectional long short-term memory network and a multi-scale convolutional neural network. An interaction component then further summarizes the contextual information of user-item pairs. To verify the effectiveness of the proposed TADO, we conduct extensive experiments on 23 benchmark datasets selected from Amazon Product Reviews. Compared with several recently proposed state-of-the-art methods, our model achieves average gains of 20.98%, 9.84%, and 15.46% over ALFM, MPCN, and ANR, respectively. Further analysis confirms the necessity of jointly using the proposed components of TADO.
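The dual-optimizer idea, as far as the abstract states it, amounts to training against both a regression loss and a classification loss over rating levels. A minimal PyTorch sketch follows; the module sizes, the loss-mixing weight, and the omission of the BiLSTM/multi-scale-CNN feature extractor are all illustrative assumptions:

```python
# Sketch: one head trained with a regression loss and one with a
# classification loss over rating levels, to counter class imbalance.
import torch
import torch.nn as nn

class DualHead(nn.Module):
    def __init__(self, dim=64, num_levels=5):
        super().__init__()
        self.reg_head = nn.Linear(dim, 1)           # predicts the rating value
        self.cls_head = nn.Linear(dim, num_levels)  # predicts the rating level

    def forward(self, features):
        return self.reg_head(features).squeeze(-1), self.cls_head(features)

model = DualHead()
feats = torch.randn(8, 64)                   # stand-in for review features
ratings = torch.randint(1, 6, (8,)).float()  # ratings in 1..5
reg_pred, cls_logits = model(feats)
loss = nn.functional.mse_loss(reg_pred, ratings) + \
       0.5 * nn.functional.cross_entropy(cls_logits, (ratings - 1).long())
loss.backward()
```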
Online health communications often provide biased interpretations of evidence and have unreliable links to the source research. We tested the feasibility of a tool for matching webpages to their source evidence. From 207,538 eligible vaccination-related PubMed articles, we evaluated several approaches using 3,573 unique links to webpages from Altmetric. We evaluated methods for ranking the source articles for vaccine-related research described on webpages, comparing simple baseline feature-representation and dimensionality-reduction approaches to the same approaches augmented with canonical correlation analysis (CCA). Performance measures included the median rank of the correct source article; the percentage of webpages for which the source article was ranked first (recall@1); and the percentage ranked within the top 50 candidate articles (recall@50). While augmenting baseline methods with CCA generally improved results, no CCA-based approach outperformed the best baseline method, which ranked the correct source article first for over a quarter of webpages and within the top 50 for more than half. Tools that help people identify evidence-based sources for the content they encounter on vaccination-related webpages are potentially feasible and may support the prevention of bias and misrepresentation of research in news and social media.
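A hedged sketch of the CCA-augmented ranking pipeline the abstract describes: project webpage and article feature vectors into a shared space with canonical correlation analysis, then rank candidate articles by similarity. The feature vectors, their pairing, and all dimensions below are synthetic assumptions:

```python
# Sketch: rank candidate source articles for each webpage in a
# CCA-projected shared space, then compute recall@1 on known pairs.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(2)
pages = rng.normal(size=(100, 50))                         # webpage features
articles = pages + rng.normal(scale=0.5, size=(100, 50))   # paired source articles

cca = CCA(n_components=10).fit(pages, articles)
P, A = cca.transform(pages, articles)

sims = cosine_similarity(P, A)           # pages x candidate articles
ranks = (-sims).argsort(axis=1)          # best candidate first
recall_at_1 = np.mean(ranks[:, 0] == np.arange(100))
print(f"recall@1 on synthetic pairs: {recall_at_1:.2f}")
```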
Self-attention has become increasingly popular in a variety of sequence modeling tasks, from natural language processing to recommendation, due to its effectiveness. However, self-attention suffers from quadratic computational and memory complexity, prohibiting its application to long sequences. Existing approaches to this issue mainly rely on a sparse attention context, using either a local window or a permuted bucket obtained by locality-sensitive hashing (LSH) or sorting, so crucial information may be lost. Inspired by the idea of vector quantization, which uses cluster centroids to approximate items, we propose LISA (LInear-time Self Attention), which enjoys both the effectiveness of vanilla self-attention and the efficiency of sparse attention. LISA scales linearly with the sequence length while enabling full contextual attention via computing differentiable histograms of codeword distributions. Moreover, unlike some efficient attention methods, our method imposes no restriction on causal masking or sequence length. We evaluate our method on four real-world datasets for sequential recommendation. The results show that LISA outperforms state-of-the-art efficient attention methods in both performance and speed; it is up to 57x faster and 78x more memory efficient than vanilla self-attention.
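A rough reconstruction, from the abstract alone, of how codeword-based attention can run in linear time: keys are soft-assigned to a small codebook (acting as a differentiable histogram of codeword usage), and queries attend over codewords instead of keys. Everything below, including the shapes, the additive log-count bias, and the mean-value aggregation, is an assumption for illustration, not LISA's actual algorithm:

```python
# Sketch: attention over c codewords instead of n keys, O(n*c) cost.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def codeword_attention(Q, K, V, codebook):
    assign = softmax(K @ codebook.T)        # soft key-to-codeword assignment, (n, c)
    counts = assign.sum(axis=0)             # differentiable codeword histogram, (c,)
    V_agg = assign.T @ V                    # value mass per codeword, (c, d)
    # exp(q.c_j) * count_j reproduces full attention when keys sit on codewords.
    attn = softmax(Q @ codebook.T + np.log(counts + 1e-9))
    return attn @ (V_agg / (counts[:, None] + 1e-9))

n, d, c = 512, 32, 16
rng = np.random.default_rng(3)
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
codebook = rng.normal(size=(c, d))
print(codeword_attention(Q, K, V, codebook).shape)  # (512, 32)
```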
Can quantum-mechanical particles propagating on a fixed spacetime background be approximated as test bodies satisfying the weak equivalence principle? We ultimately answer the question in the negative, but find that, when the universality of free fall is assessed locally, a nontrivial agreement between quantum mechanics and the weak equivalence principle exists. Implications for mass sensing by quantum probes are discussed in some detail.