أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Yu Zheng

NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task--Next Sentence Prediction

426 - Yi Sun , Yu Zheng , Chao Hao 2021

Using prompts to utilize language models to perform various downstream tasks, also known as prompt-based learning or prompt-learning, has lately gained significant success in comparison to the pre-train and fine-tune paradigm. Nonetheless, virtually all prompt-based methods are token-level, meaning they all utilize GPTs left-to-right language model or BERTs masked language model to perform cloze-style tasks. In this paper, we attempt to accomplish several NLP tasks in the zero-shot scenario using a BERT original pre-training task abandoned by RoBERTa and other models--Next Sentence Prediction (NSP). Unlike token-level techniques, our sentence-level prompt-based method NSP-BERT does not need to fix the length of the prompt or the position to be predicted, allowing it to handle tasks such as entity linking with ease. Based on the characteristics of NSP-BERT, we offer several quick building templates for various downstream tasks. We suggest a two-stage prompt method for word sense disambiguation tasks in particular. Our strategies for mapping the labels significantly enhance the models performance on sentence pair tasks. On the FewCLUE benchmark, our NSP-BERT outperforms other zero-shot methods on most of these tasks and comes close to the few-shot methods.

الحساب واللغة الذكاء الاصطناعي

Spatio-Temporal Graph Contrastive Learning

329 - Xu Liu , Yuxuan Liang , Yu Zheng 2021

Deep learning models are modern tools for spatio-temporal graph (STG) forecasting. Despite their effectiveness, they require large-scale datasets to achieve better performance and are vulnerable to noise perturbation. To alleviate these limitations, an intuitive idea is to use the popular data augmentation and contrastive learning techniques. However, existing graph contrastive learning methods cannot be directly applied to STG forecasting due to three reasons. First, we empirically discover that the forecasting task is unable to benefit from the pretrained representations derived from contrastive learning. Second, data augmentations that are used for defeating noise are less explored for STG data. Third, the semantic similarity of samples has been overlooked. In this paper, we propose a Spatio-Temporal Graph Contrastive Learning framework (STGCL) to tackle these issues. Specifically, we improve the performance by integrating the forecasting loss with an auxiliary contrastive loss rather than using a pretrained paradigm. We elaborate on four types of data augmentations, which disturb data in terms of graph structure, time domain, and frequency domain. We also extend the classic contrastive loss through a rule-based strategy that filters out the most semantically similar negatives. Our framework is evaluated across three real-world datasets and four state-of-the-art models. The consistent improvements demonstrate that STGCL can be used as an off-the-shelf plug-in for existing deep models.

التعلم الآلي الذكاء الاصطناعي

Generative and Contrastive Self-Supervised Learning for Graph Anomaly Detection

256 - Yu Zheng , Ming Jin , Yixin Liu 2021

Anomaly detection from graph data has drawn much attention due to its practical significance in many critical applications including cybersecurity, finance, and social networks. Existing data mining and machine learning methods are either shallow met hods that could not effectively capture the complex interdependency of graph data or graph autoencoder methods that could not fully exploit the contextual information as supervision signals for effective anomaly detection. To overcome these challenges, in this paper, we propose a novel method, Self-Supervised Learning for Graph Anomaly Detection (SL-GAD). Our method constructs different contextual subgraphs (views) based on a target node and employs two modules, generative attribute regression and multi-view contrastive learning for anomaly detection. While the generative attribute regression module allows us to capture the anomalies in the attribute space, the multi-view contrastive learning module can exploit richer structure information from multiple subgraphs, thus abling to capture the anomalies in the structure space, mixing of structure, and attribute information. We conduct extensive experiments on six benchmark datasets and the results demonstrate that our method outperforms state-of-the-art methods by a large margin.

التعلم الآلي

DGCN: Diversified Recommendation with Graph Convolutional Networks

151 - Yu Zheng , Chen Gao , Liang Chen 2021

These years much effort has been devoted to improving the accuracy or relevance of the recommendation system. Diversity, a crucial factor which measures the dissimilarity among the recommended items, received rather little scrutiny. Directly related to user satisfaction, diversification is usually taken into consideration after generating the candidate items. However, this decoupled design of diversification and candidate generation makes the whole system suboptimal. In this paper, we aim at pushing the diversification to the upstream candidate generation stage, with the help of Graph Convolutional Networks (GCN). Although GCN based recommendation algorithms have shown great power in modeling complex collaborative filtering effect to improve the accuracy of recommendation, how diversity changes is ignored in those advanced works. We propose to perform rebalanced neighbor discovering, category-boosted negative sampling and adversarial learning on top of GCN. We conduct extensive experiments on real-world datasets. Experimental results verify the effectiveness of our proposed method on diversification. Further ablation studies validate that our proposed method significantly alleviates the accuracy-diversity dilemma.

استرجاع المعلومات

Dialogue Object Search

44 - Monica Roy , Kaiyu Zheng , Jason Liu 2021

We envision robots that can collaborate and communicate seamlessly with humans. It is necessary for such robots to decide both what to say and how to act, while interacting with humans. To this end, we introduce a new task, dialogue object search: A robot is tasked to search for a target object (e.g. fork) in a human environment (e.g., kitchen), while engaging in a video call with a remote human who has additional but inexact knowledge about the targets location. That is, the robot conducts speech-based dialogue with the human, while sharing the image from its mounted camera. This task is challenging at multiple levels, from data collection, algorithm and system development,to evaluation. Despite these challenges, we believe such a task blocks the path towards more intelligent and collaborative robots. In this extended abstract, we motivate and introduce the dialogue object search task and analyze examples collected from a pilot study. We then discuss our next steps and conclude with several challenges on which we hope to receive feedback.

علم الروبوتات الذكاء الاصطناعي

Causally Invariant Predictor with Shift-Robustness

85 - Xiangyu Zheng , Xinwei Sun , Wei Chen 2021

This paper proposes an invariant causal predictor that is robust to distribution shift across domains and maximally reserves the transferable invariant information. Based on a disentangled causal factorization, we formulate the distribution shift as soft interventions in the system, which covers a wide range of cases for distribution shift as we do not make prior specifications on the causal structure or the intervened variables. Instead of imposing regularizations to constrain the invariance of the predictor, we propose to predict by the intervened conditional expectation based on the do-operator and then prove that it is invariant across domains. More importantly, we prove that the proposed predictor is the robust predictor that minimizes the worst-case quadratic loss among the distributions of all domains. For empirical learning, we propose an intuitive and flexible estimating method based on data regeneration and present a local causal discovery procedure to guide the regeneration step. The key idea is to regenerate data such that the regenerated distribution is compatible with the intervened graph, which allows us to incorporate standard supervised learning methods with the regenerated data. Experimental results on both synthetic and real data demonstrate the efficacy of our predictor in improving the predictive accuracy and robustness across domains.

التعلم الالي التعلم الآلي

Dynamic Planning and Learning under Recovering Rewards

73 - David Simchi-Levi , Zeyu Zheng , Feng Zhu 2021

Motivated by emerging applications such as live-streaming e-commerce, promotions and recommendations, we introduce a general class of multi-armed bandit problems that have the following two features: (i) the decision maker can pull and collect reward s from at most $K$ out of $N$ different arms in each time period; (ii) the expected reward of an arm immediately drops after it is pulled, and then non parametrically recovers as the idle time increases. With the objective of maximizing expected cumulative rewards over $T$ time periods, we propose, construct and prove performance guarantees for a class of Purely Periodic Policies. For the offline problem when all model parameters are known, our proposed policy obtains an approximation ratio that is at the order of $1-mathcal O(1/sqrt{K})$, which is asymptotically optimal when $K$ grows to infinity. For the online problem when the model parameters are unknown and need to be learned, we design an Upper Confidence Bound (UCB) based policy that approximately has $widetilde{mathcal O}(Nsqrt{T})$ regret against the offline benchmark. Our framework and policy design may have the potential to be adapted into other offline planning and online learning applications with non-stationary and recovering rewards.

التعلم الالي الرياضيات المتقطعة التعلم الآلي

Check It Again: Progressive Visual Question Answering via Visual Entailment

200 - Qingyi Si , Zheng Lin , Mingyu Zheng 2021

While sophisticated Visual Question Answering models have achieved remarkable success, they tend to answer questions only according to superficial correlations between question and answer. Several recent approaches have been developed to address this language priors problem. However, most of them predict the correct answer according to one best output without checking the authenticity of answers. Besides, they only explore the interaction between image and question, ignoring the semantics of candidate answers. In this paper, we propose a select-and-rerank (SAR) progressive framework based on Visual Entailment. Specifically, we first select the candidate answers relevant to the question or the image, then we rerank the candidate answers by a visual entailment task, which verifies whether the image semantically entails the synthetic statement of the question and each candidate answer. Experimental results show the effectiveness of our proposed framework, which establishes a new state-of-the-art accuracy on VQA-CP v2 with a 7.55% improvement.

الرؤية الحاسوبية وتمييز الأنماط

Fast Nearest Neighbor Machine Translation

85 - Yuxian Meng , Xiaoya Li , Xiayu Zheng 2021

Though nearest neighbor Machine Translation ($k$NN-MT) cite{khandelwal2020nearest} has proved to introduce significant performance boosts over standard neural MT systems, it is prohibitively slow since it uses the entire reference corpus as the datas tore for the nearest neighbor search. This means each step for each beam in the beam search has to search over the entire reference corpus. $k$NN-MT is thus two-order slower than vanilla MT models, making it hard to be applied to real-world applications, especially online services. In this work, we propose Fast $k$NN-MT to address this issue. Fast $k$NN-MT constructs a significantly smaller datastore for the nearest neighbor search: for each word in a source sentence, Fast $k$NN-MT first selects its nearest token-level neighbors, which is limited to tokens that are the same as the query token. Then at each decoding step, in contrast to using the entire corpus as the datastore, the search space is limited to target tokens corresponding to the previously selected reference source tokens. This strategy avoids search through the whole datastore for nearest neighbors and drastically improves decoding efficiency. Without loss of performance, Fast $k$NN-MT is two-order faster than $k$NN-MT, and is only two times slower than the standard NMT model. Fast $k$NN-MT enables the practical use of $k$NN-MT systems in real-world MT applications.footnote{Code is available at url{https://github.com/ShannonAI/fast-knn-nmt.}}

الحساب واللغة

Rectangle-like hysteresis in a Dysprosium Metallacrown Magnet with Linear F-Dy-F Anisotropic Moiety

60 - Si-Guo Wu , Ze-Yu Ruan , Jie-Yu Zheng 2021

Single-molecule magnets (SMMs) exhibiting open hysteresis loops may potentially apply to molecule-based information processing and storage. However, the capacity to retain magnetic memory is always limited by zero-field quantum tunneling of magnetiza tion (QTM). Herein, a well-designed dysprosium metallacrown SMM, consisting of an endohedral approximate linear F-Dy-F strong anisotropic moiety in a peripheral [15-MCNi-5] metallacrown (MC), is reported with the largest reversal barrier of 1060 cm-1 among d-f SMMs. Rectangle-like hysteresis loops are observed with the huge squareness (remanence/saturation magnetization) up to 97% at 2 K. More importantly, zero-field QTM step is phenomenologically removed by minimizing the dipole coupling and hyperfine interactions. The results demonstrate for the first time that zero-field QTM step can be eliminated via manipulating the ligand field and vanishing the external magnetic perturbations, which illuminates a promising blueprint for developing high-performance SMMs.

الفيزياء ميسكالي وننكالي

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد