أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Lixin Duan

Collaborative Generative Hashing for Marketing and Fast Cold-start Recommendation

328 - Yan Zhang , Ivor W. Tsang , Lixin Duan 2020

Cold-start has being a critical issue in recommender systems with the explosion of data in e-commerce. Most existing studies proposed to alleviate the cold-start problem are also known as hybrid recommender systems that learn representations of users and items by combining user-item interactive and user/item content information. However, previous hybrid methods regularly suffered poor efficiency bottlenecking in online recommendations with large-scale items, because they were designed to project users and items into continuous latent space where the online recommendation is expensive. To this end, we propose a collaborative generated hashing (CGH) framework to improve the efficiency by denoting users and items as binary codes, then fast hashing search techniques can be used to speed up the online recommendation. In addition, the proposed CGH can generate potential users or items for marketing application where the generative network is designed with the principle of Minimum Description Length (MDL), which is used to learn compact and informative binary codes. Extensive experiments on two public datasets show the advantages for recommendations in various settings over competing baselines and analyze its feasibility in marketing application.

استرجاع المعلومات

Region Comparison Network for Interpretable Few-shot Image Classification

113 - Zhiyu Xue , Lixin Duan , Wen Li 2020

While deep learning has been successfully applied to many real-world computer vision tasks, training robust classifiers usually requires a large amount of well-labeled data. However, the annotation is often expensive and time-consuming. Few-shot imag e classification has thus been proposed to effectively use only a limited number of labeled examples to train models for new classes. Recent works based on transferable metric learning methods have achieved promising classification performance through learning the similarity between the features of samples from the query and support sets. However, rare of them explicitly considers the model interpretability, which can actually be revealed during the training phase. For that, in this work, we propose a metric learning based method named Region Comparison Network (RCN), which is able to reveal how few-shot learning works as in a neural network as well as to find out specific regions that are related to each other in images coming from the query and support sets. Moreover, we also present a visualization strategy named Region Activation Mapping (RAM) to intuitively explain what our method has learned by visualizing intermediate variables in our network. We also present a new way to generalize the interpretability from the level of tasks to categories, which can also be viewed as a method to find the prototypical parts for supporting the final decision of our RCN. Extensive experiments on four benchmark datasets clearly show the effectiveness of our method over existing baselines.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

Dynamic and Static Context-aware LSTM for Multi-agent Motion Prediction

70 - Chaofan Tao , Qinhong Jiang , Lixin Duan 2020

Multi-agent motion prediction is challenging because it aims to foresee the future trajectories of multiple agents (textit{e.g.} pedestrians) simultaneously in a complicated scene. Existing work addressed this challenge by either learning social spat ial interactions represented by the positions of a group of pedestrians, while ignoring their temporal coherence (textit{i.e.} dependencies between different long trajectories), or by understanding the complicated scene layout (textit{e.g.} scene segmentation) to ensure safe navigation. However, unlike previous work that isolated the spatial interaction, temporal coherence, and scene layout, this paper designs a new mechanism, textit{i.e.}, Dynamic and Static Context-aware Motion Predictor (DSCMP), to integrates these rich information into the long-short-term-memory (LSTM). It has three appealing benefits. (1) DSCMP models the dynamic interactions between agents by learning both their spatial positions and temporal coherence, as well as understanding the contextual scene layout.(2) Different from previous LSTM models that predict motions by propagating hidden features frame by frame, limiting the capacity to learn correlations between long trajectories, we carefully design a differentiable queue mechanism in DSCMP, which is able to explicitly memorize and learn the correlations between long trajectories. (3) DSCMP captures the context of scene by inferring latent variable, which enables multimodal predictions with meaningful semantic scene layout. Extensive experiments show that DSCMP outperforms state-of-the-art methods by large margins, such as 9.05% and 7.62% relative improvements on the ETH-UCY and SDD datasets respectively.

الرؤية الحاسوبية وتمييز الأنماط

Reconstruction Regularized Deep Metric Learning for Multi-label Image Classification

241 - Changsheng Li , Chong Liu , Lixin Duan 2020

In this paper, we present a novel deep metric learning method to tackle the multi-label image classification problem. In order to better learn the correlations among images features, as well as labels, we attempt to explore a latent space, where imag es and labels are embedded via two unique deep neural networks, respectively. To capture the relationships between image features and labels, we aim to learn a emph{two-way} deep distance metric over the embedding space from two different views, i.e., the distance between one image and its labels is not only smaller than those distances between the image and its labels nearest neighbors, but also smaller than the distances between the labels and other images corresponding to the labels nearest neighbors. Moreover, a reconstruction module for recovering correct labels is incorporated into the whole framework as a regularization term, such that the label embedding space is more representative. Our model can be trained in an end-to-end manner. Experimental results on publicly available image datasets corroborate the efficacy of our method compared with the state-of-the-arts.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي التعلم الالي

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد