Natural language inference (NLI) requires models to learn and apply commonsense knowledge. These reasoning abilities are particularly important for explainable NLI systems that generate a natural language explanation in addition to their label prediction. The integration of external knowledge has been shown to improve NLI systems; here we investigate whether it can also improve their explanation capabilities. To this end, we examine different sources of external knowledge and evaluate the performance of our models on in-domain data as well as on special transfer datasets designed to assess fine-grained reasoning capabilities. We find that different sources of knowledge affect reasoning abilities differently; for example, implicit knowledge stored in language models can hinder reasoning on numbers and negations. Finally, we conduct the largest and most fine-grained explainable NLI crowdsourcing study to date. It reveals that even large differences in automatic performance scores are not reflected in human ratings of label, explanation, commonsense, or grammar correctness.
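As a rough illustration of the label-plus-explanation setup, the sketch below frames explainable NLI as sequence-to-sequence generation. The checkpoint and prompt format are illustrative assumptions, not the models studied here; an off-the-shelf t5-small would only produce sensible output after fine-tuning on explanation-annotated NLI data.

    # Hypothetical sketch: explainable NLI as seq2seq generation of a label
    # followed by a free-text explanation. Checkpoint and prompt are placeholders.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    premise = "A man is playing a guitar on stage."
    hypothesis = "A person is performing music."
    prompt = f"explain nli premise: {premise} hypothesis: {hypothesis}"

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=48)
    # After fine-tuning, the output would look like:
    # "entailment: playing a guitar on stage is performing music"
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))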
We introduce ROS-X-Habitat, a software interface that bridges the AI Habitat platform for embodied reinforcement learning agents with other robotics resources via ROS. This interface not only offers standardized communication protocols between embodied agents and simulators, but also enables physics-based simulation. With this interface, roboticists can train their own Habitat RL agents in another simulation environment or develop their own robotic algorithms inside Habitat Sim. Through in silico experiments, we demonstrate that ROS-X-Habitat has minimal impact on the navigation performance and simulation speed of Habitat agents; that a standard set of ROS mapping, planning, and navigation tools can run in the Habitat simulator; and that a Habitat agent can run in the standard ROS simulator Gazebo.
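To make the bridging idea concrete, here is a minimal sketch of a ROS node of the kind such an interface exposes: simulator observations are republished as standard ROS messages, and ROS velocity commands are forwarded back to the agent. Topic names and message flow are illustrative assumptions, not ROS-X-Habitat's actual API.

    # Hypothetical bridge node: republish simulator RGB frames and consume
    # ROS velocity commands. Topic names are placeholders.
    import rospy
    from sensor_msgs.msg import Image
    from geometry_msgs.msg import Twist

    def on_cmd_vel(msg):
        # In a real bridge, this would step the embodied agent in Habitat Sim.
        rospy.loginfo("cmd: linear=%.2f angular=%.2f", msg.linear.x, msg.angular.z)

    rospy.init_node("habitat_bridge")
    rgb_pub = rospy.Publisher("/camera/rgb/image_raw", Image, queue_size=1)
    rospy.Subscriber("/cmd_vel", Twist, on_cmd_vel)
    rospy.spin()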
We investigate joint source channel coding (JSCC) for wireless image transmission over multipath fading channels. Inspired by recent works on deep learning based JSCC and model-based learning methods, we combine an autoencoder with orthogonal frequency division multiplexing (OFDM) to cope with multipath fading. The proposed encoder and decoder use convolutional neural networks (CNNs) and directly map the source images to complex-valued baseband samples for OFDM transmission. The multipath channel and OFDM are represented by non-trainable (deterministic) but differentiable layers so that the system can be trained end-to-end. Furthermore, the JSCC decoder incorporates explicit channel estimation, equalization, and additional subnets to enhance performance. The proposed method exhibits a 2.5 to 4 dB SNR gain at equivalent image quality compared to conventional schemes that employ state-of-the-art but separate source and channel coding, such as BPG and LDPC. The performance further improves when the system incorporates channel state information (CSI) feedback. The proposed scheme is robust to OFDM signal clipping and to mismatch between the channel-model parameters used in training and evaluation.
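The key enabler is that the channel is written as deterministic, differentiable tensor operations. The sketch below (a simplification, not the authors' architecture; the cyclic prefix is omitted and the convolution is taken as circular) shows how gradients can flow through an OFDM multipath channel layer in PyTorch.

    # Simplified differentiable OFDM multipath channel: IFFT (modulation),
    # circular convolution with random taps, AWGN, FFT (demodulation).
    import torch

    def ofdm_multipath_channel(x_freq, snr_db=10.0, n_taps=8):
        """x_freq: complex OFDM symbols, shape (batch, n_subcarriers)."""
        x_time = torch.fft.ifft(x_freq, dim=-1)                  # OFDM modulation
        h = torch.randn(n_taps, dtype=torch.cfloat) / n_taps ** 0.5
        H = torch.fft.fft(h, n=x_freq.shape[-1])                 # channel response
        y_time = torch.fft.ifft(torch.fft.fft(x_time, dim=-1) * H, dim=-1)
        noise = 10 ** (-snr_db / 20) * torch.randn_like(y_time)  # AWGN
        return torch.fft.fft(y_time + noise, dim=-1)             # demodulation

    x = torch.randn(4, 64, dtype=torch.cfloat, requires_grad=True)
    y = ofdm_multipath_channel(x)
    y.abs().pow(2).mean().backward()   # gradients reach the encoder side
    print(x.grad.shape)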
While recent research on natural language inference has considerably benefited from large annotated datasets, the amount of inference-related knowledge (including commonsense) provided in the annotated data is still rather limited. Two lines of approaches can be used to further address this limitation: (1) unsupervised pretraining can leverage knowledge in much larger unstructured text data; (2) structured (often human-curated) knowledge has started to be considered in neural-network-based models for NLI. An immediate question is whether these two approaches complement each other, and how to develop models that bring together their advantages. In this paper, we propose models that leverage structured knowledge in different components of pre-trained models. Our results show that the proposed models perform better than previous BERT-based state-of-the-art models. Although our models are proposed for NLI, they can be easily extended to other sentence or sentence-pair classification problems.
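One simple instantiation of the combination, sketched below under illustrative assumptions (the paper injects knowledge into several components, not only the classifier): pretrained-BERT sentence features are concatenated with a small vector of structured-knowledge features, e.g., WordNet relation indicators for aligned word pairs, before classification.

    # Sketch: combine pretrained (unstructured) knowledge from BERT with
    # structured knowledge features in a single NLI classifier.
    import torch
    import torch.nn as nn
    from transformers import BertModel

    class KnowledgeEnrichedNLI(nn.Module):
        def __init__(self, n_kb_features=5, n_labels=3):
            super().__init__()
            self.bert = BertModel.from_pretrained("bert-base-uncased")
            hidden = self.bert.config.hidden_size
            self.classifier = nn.Linear(hidden + n_kb_features, n_labels)

        def forward(self, input_ids, attention_mask, kb_features):
            # kb_features: (batch, n_kb_features), e.g. synonym/antonym counts
            cls = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask).pooler_output
            return self.classifier(torch.cat([cls, kb_features], dim=-1))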
A new strategy, the clean numerical simulation (CNS), was proposed (J. Computational Physics, 418:109629, 2020) to obtain reliable, convergent simulations (with negligible numerical noise) of spatiotemporal chaotic systems over a long enough interval of time, which provide a benchmark solution for comparison. Here we illustrate that machine learning (ML) can give seemingly good fitting predictions of spatiotemporal chaos when trained, separately, on two quite different training sets: one is the clean database given by the CNS with negligible numerical noise; the other is the polluted database given by traditional algorithms in single/double precision with considerably large numerical noise. However, even in statistics, the ML predictions based on the polluted database are quite different from those based on the clean database. This illustrates that database noise has a huge influence on ML predictions of some spatiotemporal chaotic systems, even at the level of statistics. Thus, we must use a clean database for machine learning of such systems. This surprising result might open a new door for the study of machine learning.
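The comparison protocol is easy to reproduce in miniature: train the same model once on a clean trajectory and once on a noise-polluted copy, then compare the statistics of the free-running (closed-loop) predictions. The sketch below uses the logistic map as a stand-in chaotic system; the model and noise level are illustrative.

    # Toy comparison: same ML model, clean vs. polluted training data,
    # statistics of closed-loop predictions.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    x = np.empty(20000); x[0] = 0.4
    for i in range(len(x) - 1):
        x[i + 1] = 3.9 * x[i] * (1 - x[i])           # clean chaotic trajectory

    def free_run_stats(series, steps=5000):
        model = MLPRegressor((64, 64), max_iter=500, random_state=0)
        model.fit(series[:-1, None], series[1:])
        s, out = series[-1], []
        for _ in range(steps):                       # closed-loop prediction
            s = model.predict([[s]])[0]
            out.append(s)
        return np.mean(out), np.std(out)

    print("clean   :", free_run_stats(x))
    print("polluted:", free_run_stats(x + 1e-3 * rng.standard_normal(len(x))))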
RGBD (RGB plus depth) object tracking is gaining momentum as RGBD sensors have become popular in many application fields such as robotics. However, the best RGBD trackers are extensions of state-of-the-art deep RGB trackers: they are trained with RGB data, and the depth channel is used only as a sidekick for subtleties such as occlusion detection. This can be explained by the fact that there are no RGBD datasets large enough to 1) train deep depth trackers and 2) challenge RGB trackers with sequences for which the depth cue is essential. This work introduces a new RGBD tracking dataset, DepthTrack, that has twice as many sequences (200) and scene types (40) as the largest existing dataset, and three times more objects (90). In addition, the average sequence length (1,473 frames), the number of deformable objects (16), and the number of annotated tracking attributes (15) have been increased. Furthermore, by running state-of-the-art RGB and RGBD trackers on DepthTrack, we propose a new RGBD tracking baseline, DeT, which reveals that deep RGBD tracking indeed benefits from genuine training data. The code and dataset are available at https://github.com/xiaozai/DeT
Access to labeled time series data is often limited in the real world, which constrains the performance of deep learning models in time series analysis. Data augmentation is an effective way to address small sample size and class imbalance in time series datasets. The two key factors of data augmentation are the distance metric and the choice of interpolation method. SMOTE does not perform well on time series data because it uses a Euclidean distance metric and interpolates directly on the raw series. We therefore propose DTWSSE, a DTW-based synthetic minority oversampling technique that uses a Siamese encoder for interpolation. To measure the distance between time series reasonably, DTW, which has been verified to be an effective metric for time series, is employed as the distance metric. To adapt to the DTW metric, we use an autoencoder trained in an unsupervised self-training manner for interpolation: the encoder is a Siamese neural network that maps the time series data from the DTW hidden space to a Euclidean deep feature space, and the decoder maps the deep feature space back to the DTW hidden space. We validate the proposed method on a number of different balanced and unbalanced time series datasets. Experimental results show that the proposed method leads to better performance of the downstream deep learning model.
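The recipe is easy to see in a simplified form: DTW selects a minority-class neighbour, and a synthetic sample is blended between the series and that neighbour. In the sketch below the blend is a placeholder linear interpolation on the raw series, whereas DTWSSE instead interpolates in the Siamese encoder's latent space and maps back through the decoder.

    # Simplified DTW-based oversampling; the linear blend stands in for the
    # encoder/decoder interpolation used by the actual method.
    import numpy as np

    def dtw(a, b):
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    def oversample(minority, k=3, seed=0):
        rng = np.random.default_rng(seed)
        i = int(rng.integers(len(minority)))
        d = [dtw(minority[i], s) for s in minority]
        neighbours = np.argsort(d)[1:k + 1]          # skip the series itself
        j = int(rng.choice(neighbours))
        lam = rng.random()                           # placeholder: linear blend
        return (1 - lam) * minority[i] + lam * minority[j]

    minority = [np.sin(np.linspace(0, 6, 50) + p) for p in np.random.rand(10)]
    print(oversample(minority).shape)                # (50,)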
Android faces unprecedented malicious threats daily, but existing methods for malware detection often fail to cope with the evolving camouflage of malware. To address this issue, we present HAWK, a new malware detection framework for evolutionary Android applications. We model Android entities and behavioural relationships as a heterogeneous information network (HIN), exploiting its rich semantic metastructures to specify implicit higher-order relationships. An incremental learning model is created to handle applications that manifest dynamically, without the need to reconstruct the whole HIN and the subsequent embedding model. The model can rapidly pinpoint the proximity between a new application and existing in-sample applications and aggregate their numerical embeddings under various semantics. Our experiments examine more than 80,860 malicious and 100,375 benign applications developed over a period of seven years, showing that HAWK achieves the highest detection accuracy against baselines and takes only 3.5 ms on average to detect an out-of-sample application, with training 50x faster than the existing approach.
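The incremental step can be pictured as follows (a sketch with illustrative entity names, metapaths, and weights, not HAWK's trained aggregation): a new app is embedded by aggregating the stored embeddings of in-sample entities it shares relations with, one weighted term per semantic metapath, without retraining the HIN model.

    # Sketch: embed an out-of-sample app from stored in-sample embeddings.
    import numpy as np

    in_sample_emb = {"app_1": np.random.rand(8), "app_2": np.random.rand(8)}
    metapath_neighbours = {                 # neighbours under each metapath
        "app-API-app":        ["app_1", "app_2"],
        "app-permission-app": ["app_2"],
    }
    metapath_weight = {"app-API-app": 0.7, "app-permission-app": 0.3}

    def embed_new_app():
        z = np.zeros(8)
        for path, neigh in metapath_neighbours.items():
            z += metapath_weight[path] * np.mean(
                [in_sample_emb[n] for n in neigh], axis=0)
        return z

    new_emb = embed_new_app()   # then scored by the trained classifier
    print(new_emb.shape)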
Unsupervised domain adaptation (UDA) aims to transfer knowledge learnt from a labeled source domain to an unlabeled target domain. Previous work is mainly built upon convolutional neural networks (CNNs) to learn domain-invariant representations. With the recent exponential increase in applying Vision Transformer (ViT) to vision tasks, the capability of ViT in adapting cross-domain knowledge remains unexplored in the literature. To fill this gap, this paper first comprehensively investigates the transferability of ViT on a variety of domain adaptation tasks. Surprisingly, ViT demonstrates superior transferability over its CNN-based counterparts by a large margin, and the performance can be further improved by incorporating adversarial adaptation. Notwithstanding, directly using CNN-based adaptation strategies fails to take advantage of ViT's intrinsic merits (e.g., the attention mechanism and sequential image representation), which play an important role in knowledge transfer. To remedy this, we propose a unified framework, Transferable Vision Transformer (TVT), to fully exploit the transferability of ViT for domain adaptation. Specifically, we devise a novel and effective unit, which we term the Transferability Adaption Module (TAM). By injecting learned transferabilities into attention blocks, TAM compels ViT to focus on both transferable and discriminative features. Besides, we leverage discriminative clustering to enhance feature diversity and separation, which are undermined during adversarial domain alignment. To verify its versatility, we perform extensive studies of TVT on four benchmarks, and the experimental results demonstrate that TVT attains significant improvements over existing state-of-the-art UDA methods.
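The gating idea can be sketched as follows (a simplification under stated assumptions: the real TAM learns patch-level transferabilities adversarially, while here a plain sigmoid head stands in, so only the wiring into attention is shown).

    # Sketch: reweight self-attention by a per-patch transferability score.
    import torch
    import torch.nn as nn

    class GatedSelfAttention(nn.Module):
        def __init__(self, dim=64):
            super().__init__()
            self.qkv = nn.Linear(dim, dim * 3)
            self.transfer_head = nn.Linear(dim, 1)   # stand-in score head

        def forward(self, x):                        # x: (batch, patches, dim)
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            attn = torch.softmax(
                q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
            gate = torch.sigmoid(self.transfer_head(x)).transpose(-2, -1)
            attn = attn * gate                       # favour transferable patches
            return attn @ v

    out = GatedSelfAttention()(torch.randn(2, 16, 64))
    print(out.shape)   # torch.Size([2, 16, 64])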
This paper investigates the secrecy capacity region of multiple access wiretap (MAC-WT) channels where, besides confidential messages, the users also have open messages to transmit. All these messages are intended for the legitimate receiver (Bob for brevity), but only the confidential messages need to be protected from the eavesdropper (Eve). We first consider a discrete memoryless (DM) MAC-WT channel where both Bob and Eve jointly decode the messages they are interested in. By using random coding, we find an achievable rate region within which perfect secrecy can be realized, i.e., all users can communicate with Bob with arbitrarily small probability of error while the confidential information leaked to Eve tends to zero. Due to the high implementation complexity of joint decoding, we also consider the DM MAC-WT channel where Bob simply decodes messages independently while Eve still applies joint decoding. We then extend the results in the DM case to a Gaussian vector (GV) MAC-WT channel. Based on the information-theoretic results, we further maximize the sum secrecy rate of the GV MAC-WT system by designing precoders for all users. Since the problems are non-convex, we provide iterative algorithms to obtain suboptimal solutions. Simulation results show that, compared with existing schemes, secure communication can be greatly enhanced by the proposed algorithms, and, in contrast to works that focus only on the network secrecy performance, the system spectrum efficiency can be effectively improved since open messages can be transmitted simultaneously.
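For intuition, achievable secrecy regions for multiple access wiretap channels typically take a "rate to Bob minus leakage to Eve" form. The LaTeX sketch below shows the standard shape of such a constraint for the confidential rates of a subset S of users; it is illustrative and not the exact region derived in this paper.

    % Illustrative shape of a perfect-secrecy constraint, with output Y at Bob
    % and Z at Eve; [x]^+ denotes max(x, 0).
    \sum_{k \in S} R_k^{\mathrm{c}} \le
        \left[ I(X_S; Y \mid X_{S^c}) - I(X_S; Z) \right]^{+}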