Self-attention has become increasingly popular in a variety of sequence modeling tasks, from natural language processing to recommendation, due to its effectiveness. However, self-attention suffers from quadratic computational and memory complexity, prohibiting its application to long sequences. Existing approaches that address this issue mainly rely on a sparse attention context, either using a local window or a permuted bucket obtained by locality-sensitive hashing (LSH) or sorting, so crucial information may be lost. Inspired by the idea of vector quantization, which uses cluster centroids to approximate items, we propose LISA (LInear-time Self Attention), which enjoys both the effectiveness of vanilla self-attention and the efficiency of sparse attention. LISA scales linearly with the sequence length while enabling full contextual attention via computing differentiable histograms of codeword distributions. Meanwhile, unlike some efficient attention methods, our method poses no restriction on causal masking or sequence length. We evaluate our method on four real-world datasets for sequential recommendation. The results show that LISA outperforms the state-of-the-art efficient attention methods in both performance and speed; it is up to 57x faster and 78x more memory efficient than vanilla self-attention.
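As a rough illustration of the codeword-histogram idea described above, the sketch below approximates keys with a small codebook of centroids, scores each query against the codebook only, and weights those scores by a causal running histogram of how often each codeword occurs in the prefix (computed with cumulative sums). The function names, the hard nearest-centroid assignment, and the exact weighting are illustrative assumptions, not LISA's actual formulation.

```python
import numpy as np

def lisa_like_attention(Q, K, V, codebook):
    """Illustrative linear-time attention via a codebook of C centroids.

    Q, K, V: (n, d) arrays; codebook: (C, d). Cost is O(n * C), not O(n^2).
    """
    n, d = Q.shape
    C = codebook.shape[0]

    # Hard-assign every key to its nearest codeword (vector quantization).
    dists = ((K[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)     # (n, C)
    assign = np.eye(C)[dists.argmin(-1)]                              # (n, C) one-hot

    # Running (causal) histogram of codewords and running value sums per codeword.
    hist = np.cumsum(assign, axis=0)                                  # (n, C)
    vsum = np.cumsum(assign[:, :, None] * V[:, None, :], axis=0)      # (n, C, d)

    # Each query scores the C centroids only; the scores are then weighted by
    # how many prefix keys fell into each codeword (the codeword histogram).
    logits = Q @ codebook.T / np.sqrt(d)                              # (n, C)
    w = np.exp(logits - logits.max(-1, keepdims=True)) * hist
    w = w / np.maximum(w.sum(-1, keepdims=True), 1e-9)

    # Mean value per codeword over the prefix, mixed with the attention weights.
    mean_v = vsum / np.maximum(hist[:, :, None], 1e-9)                # (n, C, d)
    return (w[:, :, None] * mean_v).sum(1)                            # (n, d)

# Toy usage: 6 positions, 4-dim features, 3 codewords.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(6, 4)) for _ in range(3))
print(lisa_like_attention(Q, K, V, codebook=rng.normal(size=(3, 4))).shape)  # (6, 4)
```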
Yongji Wu, Lu Yin, Defu Lian (2021)
Sequential recommendation plays an increasingly important role in many e-commerce services such as display advertising and online shopping. With the rapid development of these services over the last two decades, users have accumulated a massive amount of behavior data. Richer sequential behavior data has been proven to be of great value for sequential recommendation. However, traditional sequential models fail to handle users' lifelong sequences, as their linear computational and storage cost prohibits them from performing online inference. Recently, lifelong sequential modeling methods that borrow the idea of memory networks from NLP have been proposed to address this issue. However, the RNN-based memory networks they are built upon intrinsically suffer from an inability to capture long-term dependencies and may instead be overwhelmed by noise on extremely long behavior sequences. In addition, as a user's behavior sequence gets longer, it exhibits a wider range of interests, so modeling and capturing users' diverse interests becomes crucial. To tackle these issues, we propose a novel lifelong incremental multi-interest self-attention based sequential recommendation model, namely LimaRec. Our proposed method benefits from carefully designed self-attention to identify the information in users' behavior sequences that is relevant to each of their different interests. It is still able to incrementally update users' representations for online inference, similarly to memory-network-based approaches. We extensively evaluate our method on four real-world datasets and demonstrate its superior performance compared to state-of-the-art baselines.
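The abstract does not spell out LimaRec's architecture, but the two properties it emphasizes, attention-based interest extraction and incremental online updates, can be illustrated with a kernelized (linear) attention state of constant size: each new behavior updates two running sums, and a handful of hypothetical "interest" queries read multi-interest representations from that state at any time. Everything below (the feature map, the single shared state, the interest queries) is an assumption for illustration only, not the paper's mechanism.

```python
import numpy as np

def feature_map(x):
    # A positive feature map (softplus); an illustrative assumption.
    return np.log1p(np.exp(x))

class IncrementalAttentionState:
    """Constant-size state for linear attention, updated one behavior at a time."""

    def __init__(self, d):
        self.S = np.zeros((d, d))   # running sum of phi(key) outer value
        self.z = np.zeros(d)        # running sum of phi(key)

    def update(self, key, value):
        phi = feature_map(key)
        self.S += np.outer(phi, value)
        self.z += phi

    def read(self, interest_queries):
        """interest_queries: (k, d) hypothetical vectors, one per user interest."""
        phi_q = feature_map(interest_queries)          # (k, d)
        num = phi_q @ self.S                           # (k, d)
        den = phi_q @ self.z + 1e-9                    # (k,)
        return num / den[:, None]                      # (k, d) interest vectors

rng = np.random.default_rng(1)
state = IncrementalAttentionState(d=8)
for item in rng.normal(size=(10, 8)):                  # behaviors arrive online
    state.update(key=item, value=item)
interests = state.read(rng.normal(size=(4, 8)))        # 4 hypothetical interests
print(interests.shape)                                 # (4, 8)
```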
Entity interaction prediction is essential in many important applications such as chemistry, biology, materials science, and medical science. The problem becomes quite challenging when each entity is represented by a complex structure, namely a structured entity, because two types of graphs are involved: local graphs for the structured entities and a global graph that captures the interactions between them. We observe that existing works on structured entity interaction prediction cannot properly exploit this unique graph-of-graphs model. In this paper, we propose a Graph of Graphs Neural Network, namely GoGNN, which extracts features from both the structured entity graphs and the entity interaction graph in a hierarchical way. We also propose a dual-attention mechanism that enables the model to preserve neighbor importance at both levels of graphs. Extensive experiments on real-world datasets show that GoGNN outperforms the state-of-the-art methods on two representative structured entity interaction prediction tasks: chemical-chemical interaction prediction and drug-drug interaction prediction. Our code is available on GitHub.
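To make the hierarchical, two-level idea concrete, here is a minimal sketch under our own assumptions: an attention-weighted readout over each structured entity's local graph produces an entity embedding, and an attentive aggregation step over the interaction graph refines those embeddings using neighbor-importance scores. The particular attention scores and update rules are placeholders, not GoGNN's dual-attention mechanism.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def local_readout(node_feats, adj, W):
    """One message-passing step on an entity's own graph, then attentive pooling."""
    h = np.tanh((adj @ node_feats) @ W)
    scores = softmax(h @ h.mean(0))        # local-level node importance
    return scores @ h

def global_step(entity_embs, inter_adj, W):
    """One attentive aggregation step on the interaction graph (global level)."""
    out = np.zeros_like(entity_embs)
    for i in range(len(entity_embs)):
        nbrs = np.nonzero(inter_adj[i])[0]
        if len(nbrs) == 0:
            out[i] = entity_embs[i]
            continue
        scores = softmax(entity_embs[nbrs] @ entity_embs[i])   # neighbor importance
        out[i] = np.tanh((scores @ entity_embs[nbrs]) @ W) + entity_embs[i]
    return out

# Toy usage: 3 entities, each a small local graph with 4 nodes and 5-dim features.
rng = np.random.default_rng(2)
W_local, W_global = rng.normal(size=(5, 5)), rng.normal(size=(5, 5))
entities = [(rng.normal(size=(4, 5)), (rng.random((4, 4)) > 0.5).astype(float))
            for _ in range(3)]
embs = np.stack([local_readout(x, a, W_local) for x, a in entities])
inter_adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
print(global_step(embs, inter_adj, W_global).shape)   # (3, 5)
```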
Recently, there have been breakthroughs in graph analysis from applying graph neural networks (GNNs) that follow a neighborhood aggregation scheme, demonstrating outstanding performance on many tasks. However, we observe that in existing GNN-based graph embedding approaches, the network parameters and node embeddings are represented as real-valued matrices, which may limit the efficiency and scalability of these models. It is well known that binary vectors are usually far more space- and time-efficient than real-valued vectors. This motivates us to develop a binarized graph neural network that learns binary representations of the nodes with binary network parameters, following the GNN-based paradigm. Our proposed method can be seamlessly integrated into existing GNN-based embedding approaches to binarize the model parameters and learn compact embeddings. Extensive experiments indicate that the proposed binarized graph neural network, namely BGN, is orders of magnitude more efficient in terms of both time and space while matching state-of-the-art performance.
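A minimal sketch of what a binarized neighborhood-aggregation layer could look like, assuming sign-based binarization of both the weights and the output embeddings (with the usual caveat that training such a layer would rely on a straight-through estimator for gradients). This is a generic binarized GNN layer, not BGN's actual design.

```python
import numpy as np

def binarize(x):
    """sign() binarization to {-1, +1}; training would use a straight-through
    estimator so gradients can pass through this non-differentiable step."""
    return np.where(x >= 0, 1.0, -1.0)

def binarized_gnn_layer(H, adj, W):
    """One neighborhood-aggregation layer with binary weights and binary outputs.

    H:   (n, d) node features (binary after the first layer)
    adj: (n, n) adjacency with self-loops
    W:   (d, d_out) real-valued latent weights, binarized on the fly
    """
    deg = adj.sum(1, keepdims=True)
    agg = (adj @ H) / np.maximum(deg, 1.0)        # mean aggregation over neighbors
    return binarize(agg @ binarize(W))            # binary params, binary embeddings

# Toy usage: 4 nodes, 8-dim features, two stacked binarized layers.
rng = np.random.default_rng(3)
adj = np.eye(4) + np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
H1 = binarized_gnn_layer(rng.normal(size=(4, 8)), adj, rng.normal(size=(8, 8)))
H2 = binarized_gnn_layer(H1, adj, rng.normal(size=(8, 8)))
print(np.unique(H2))   # [-1.  1.]
```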
Yi Cao, Jian Gao, Defu Lian (2017)
A quantitative understanding of the relationship between students' behavioral patterns and their academic performance is a significant step towards personalized education. In contrast to previous studies, which were mainly based on questionnaire surveys, in this paper we collect behavioral records from the smart cards of 18,960 undergraduate students and propose a novel metric, called orderness, which measures the regularity of each student's campus daily life (e.g., meals and showers). Empirical analysis demonstrates that academic performance (GPA) is strongly correlated with orderness. Furthermore, we show that orderness is an important feature for predicting academic performance, remarkably improving prediction accuracy even when students' diligence is taken into account. Based on these analyses, education administrators could better guide students' campus lives and implement effective interventions at an early stage when necessary.
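The abstract does not give the formula for orderness, but one simple proxy for "regularity of campus daily life" is how concentrated the timing of a recurring activity is. The sketch below scores a student by one minus the normalized entropy of the hour-of-day histogram of a single activity (e.g., meals); this is purely an illustrative stand-in, not the paper's metric.

```python
import numpy as np

def orderness(event_hours, bins=24):
    """Illustrative regularity score in [0, 1]: 1 minus the normalized entropy of
    the hour-of-day histogram of one activity type. Higher means the student does
    the activity at more predictable times."""
    counts, _ = np.histogram(event_hours, bins=bins, range=(0, 24))
    p = counts / counts.sum()
    p = p[p > 0]
    entropy = -(p * np.log(p)).sum()
    return 1.0 - entropy / np.log(bins)

regular = [12.0, 12.1, 11.9, 12.2, 12.0] * 20                  # lunch at a fixed time
erratic = list(np.random.default_rng(4).uniform(0, 24, 100))   # no routine at all
print(round(orderness(regular), 2), round(orderness(erratic), 2))  # high vs. low
```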
The identification of urban mobility patterns is very important for predicting and controlling spatial events. In this study, we analyzed millions of geographical check-ins crawled from a leading Chinese location-based social networking service (Jiepang.com), which contains demographic information that facilitates group-specific studies. We determined the distinct mobility patterns of natives and non-natives in all five large cities that we considered. We used a mixed method that assigns different algorithms to natives and non-natives, which greatly improved the accuracy of location prediction compared with the basic algorithms. We also propose so-called indigenization coefficients to quantify the extent to which an individual behaves like a native, which depend only on check-in behavior rather than requiring demographic information. Surprisingly, the hybrid algorithm weighted using the indigenization coefficients outperformed a mixed algorithm that used additional demographic information, suggesting the advantage of behavioral data over demographic information in characterizing individual mobility. These location prediction algorithms can find applications in urban planning, traffic forecasting, mobile recommendation, and so on.
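As an illustration of an indigenization coefficient that depends only on check-in behavior, the sketch below scores a user by how much more likely their check-in locations are under the natives' aggregate location distribution than under the non-natives', squashed to [0, 1]. The construction is a hypothetical stand-in, not the paper's definition; it only shows that such a score needs no demographic input.

```python
import numpy as np

def indigenization(user_counts, native_dist, nonnative_dist, eps=1e-9):
    """Illustrative indigenization coefficient in [0, 1], from the per-check-in
    log-likelihood ratio of the user's locations under the natives' versus the
    non-natives' aggregate location distributions."""
    ll_native = (user_counts * np.log(native_dist + eps)).sum()
    ll_nonnat = (user_counts * np.log(nonnative_dist + eps)).sum()
    gap = (ll_native - ll_nonnat) / max(user_counts.sum(), 1)
    return 1.0 / (1.0 + np.exp(-gap))

# Toy usage with 5 locations: natives favor locations 0-2, non-natives 3-4.
native_dist = np.array([0.4, 0.3, 0.2, 0.05, 0.05])
nonnative_dist = np.array([0.05, 0.05, 0.2, 0.3, 0.4])
local_user = np.array([10, 6, 3, 0, 1])    # check-in counts per location
tourist = np.array([0, 1, 2, 8, 9])
print(indigenization(local_user, native_dist, nonnative_dist))  # > 0.5, native-like
print(indigenization(tourist, native_dist, nonnative_dist))     # < 0.5, non-native-like
```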