أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Shuo Wang

Intermediate anomalous Hall states induced by noncollinear spin structure in magnetic topological insulator MnBi2Te4

103 - Jing-Zhi Fang , Shuo Wang , Xing-Guo Ye 2021

The combination of topology and magnetism is attractive to produce exotic quantum matters, such as the quantum anomalous Hall state, axion insulators and the magnetic Weyl semimetals. MnBi2Te4, as an intrinsic magnetic topological insulator, provides a platform for the realization of various topological phases. Here we report the intermediate Hall steps in the magnetic hysteresis of MnBi2Te4, where four distinguishable magnetic memory states at zero magnetic field are revealed. The gate and temperature dependence of the magnetic intermediate states indicates the noncollinear spin structure in MnBi2Te4, which can be attributed to the Dzyaloshinskii-Moriya interaction as the coexistence of strong spin-orbit coupling and local inversion symmetry breaking on the surface. Moreover, these multiple magnetic memory states can be programmatically switched among each other through applying designed pulses of magnetic field. Our results provide new insights of the influence of bulk topology on the magnetic states, and the multiple memory states should be promising for spintronic devices.

علم المواد الفيزياء ميسكالي وننكالي

Tea: Program Repair Using Neural Network Based on Program Information Attention Matrix

110 - Wenshuo Wang , Chen Wu , Liang Cheng 2021

The advance in machine learning (ML)-driven natural language process (NLP) points a promising direction for automatic bug fixing for software programs, as fixing a buggy program can be transformed to a translation task. While software programs contai n much richer information than one-dimensional natural language documents, pioneering work on using ML-driven NLP techniques for automatic program repair only considered a limited set of such information. We hypothesize that more comprehensive information of software programs, if appropriately utilized, can improve the effectiveness of ML-driven NLP approaches in repairing software programs. As the first step towards proving this hypothesis, we propose a unified representation to capture the syntax, data flow, and control flow aspects of software programs, and devise a method to use such a representation to guide the transformer model from NLP in better understanding and fixing buggy programs. Our preliminary experiment confirms that the more comprehensive information of software programs used, the better ML-driven NLP techniques can perform in fixing bugs in these programs.

هندسة البرمجيات الذكاء الاصطناعي

Joint Motion Correction and Super Resolution for Cardiac Segmentation via Latent Optimisation

282 - Shuo Wang , Chen Qin , Nicolo Savioli 2021

In cardiac magnetic resonance (CMR) imaging, a 3D high-resolution segmentation of the heart is essential for detailed description of its anatomical structures. However, due to the limit of acquisition duration and respiratory/cardiac motion, stacks o f multi-slice 2D images are acquired in clinical routine. The segmentation of these images provides a low-resolution representation of cardiac anatomy, which may contain artefacts caused by motion. Here we propose a novel latent optimisation framework that jointly performs motion correction and super resolution for cardiac image segmentations. Given a low-resolution segmentation as input, the framework accounts for inter-slice motion in cardiac MR imaging and super-resolves the input into a high-resolution segmentation consistent with input. A multi-view loss is incorporated to leverage information from both short-axis view and long-axis view of cardiac imaging. To solve the inverse problem, iterative optimisation is performed in a latent space, which ensures the anatomical plausibility. This alleviates the need of paired low-resolution and high-resolution images for supervised learning. Experiments on two cardiac MR datasets show that the proposed framework achieves high performance, comparable to state-of-the-art super-resolution approaches and with better cross-domain generalisability and anatomical plausibility.

معالجة الصور والفيديو الرؤية الحاسوبية وتمييز الأنماط

Language Models are Good Translators

95 - Shuo Wang , Zhaopeng Tu , Zhixing Tan 2021

Recent years have witnessed the rapid advance in neural machine translation (NMT), the core of which lies in the encoder-decoder architecture. Inspired by the recent progress of large-scale pre-trained language models on machine translation in a limi ted scenario, we firstly demonstrate that a single language model (LM4MT) can achieve comparable performance with strong encoder-decoder NMT models on standard machine translation benchmarks, using the same training data and similar amount of model parameters. LM4MT can also easily utilize source-side texts as additional supervision. Though modeling the source- and target-language texts with the same mechanism, LM4MT can provide unified representations for both source and target sentences, which can better transfer knowledge across languages. Extensive experiments on pivot-based and zero-shot translation tasks show that LM4MT can outperform the encoder-decoder NMT model by a large margin.

الحساب واللغة

KIT Bus: A Shuttle Model for CARLA Simulator

128 - Yusheng Xiang , Shuo Wang , Tianqing Su 2021

With the continuous development of science and technology, self-driving vehicles will surely change the nature of transportation and realize the automotive industrys transformation in the future. Compared with self-driving cars, self-driving buses ar e more efficient in carrying passengers and more environmentally friendly in terms of energy consumption. Therefore, it is speculated that in the future, self-driving buses will become more and more important. As a simulator for autonomous driving research, the CARLA simulator can help people accumulate experience in autonomous driving technology faster and safer. However, a shortcoming is that there is no modern bus model in the CARLA simulator. Consequently, people cannot simulate autonomous driving on buses or the scenarios interacting with buses. Therefore, we built a bus model in 3ds Max software and imported it into the CARLA to fill this gap. Our model, namely KIT bus, is proven to work in the CARLA by testing it with the autopilot simulation. The video demo is shown on our Youtube.

علم الروبوتات

On the Language Coverage Bias for Neural Machine Translation

71 - Shuo Wang , Zhaopeng Tu , Zhixing Tan 2021

Language coverage bias, which indicates the content-dependent differences between sentence pairs originating from the source and target languages, is important for neural machine translation (NMT) because the target-original training data is not well exploited in current practice. By carefully designing experiments, we provide comprehensive analyses of the language coverage bias in the training data, and find that using only the source-original data achieves comparable performance with using full training data. Based on these observations, we further propose two simple and effective approaches to alleviate the language coverage bias problem through explicitly distinguishing between the source- and target-original training data, which consistently improve the performance over strong baselines on six WMT20 translation tasks. Complementary to the translationese effect, language coverage bias provides another explanation for the performance drop caused by back-translation. We also apply our approach to both back- and forward-translation and find that mitigating the language coverage bias can improve the performance of both the two representative data augmentation methods and their tagged variants.

الحساب واللغة

An Efficient Training Approach for Very Large Scale Face Recognition

94 - Kai Wang , Shuo Wang , Zhipeng Zhou 2021

Face recognition has achieved significant progress in deep-learning era due to the ultra-large-scale and well-labeled datasets. However, training on ultra-large-scale datasets is time-consuming and takes up a lot of hardware resource. Therefore, designing an efficient training approach is crucial and indispensable. The heavy computational and memory costs mainly result from the high dimensionality of the Fully-Connected (FC) layer. Specifically, the dimensionality is determined by the number of face identities, which can be million-level or even more. To this end, we propose a novel training approach for ultra-large-scale face datasets, termed Faster Face Classification (F$^2$C). In F$^2$C, we first define a Gallery Net and a Probe Net that are used to generate identities centers and extract faces features for face recognition, respectively. Gallery Net has the same structure as Probe Net and inherits the parameters from Probe Net with a moving average paradigm. After that, to reduce the training time and hardware costs of the FC layer, we propose a Dynamic Class Pool (DCP) that stores the features from Gallery Net and calculates the inner product (logits) with positive samples (whose identities are in the DCP) in each mini-batch. DCP can be regarded as a substitute for the FC layer but it is far smaller, thus greatly reducing the computational and memory costs. For negative samples (whose identities are not in DCP), we minimize the cosine similarities between negative samples and those in DCP. Then, to improve the update efficiency of DCPs parameters, we design a dual data-loader including identity-based and instance-based loaders to generate a certain of identities and samples in mini-batches.

الرؤية الحاسوبية وتمييز الأنماط

Robust Training Using Natural Transformation

142 - Shuo Wang , Lingjuan Lyu , Surya Nepal 2021

Previous robustness approaches for deep learning models such as data augmentation techniques via data transformation or adversarial training cannot capture real-world variations that preserve the semantics of the input, such as a change in lighting c onditions. To bridge this gap, we present NaTra, an adversarial training scheme that is designed to improve the robustness of image classification algorithms. We target attributes of the input images that are independent of the class identification, and manipulate those attributes to mimic real-world natural transformations (NaTra) of the inputs, which are then used to augment the training dataset of the image classifier. Specifically, we apply textit{Batch Inverse Encoding and Shifting} to map a batch of given images to corresponding disentangled latent codes of well-trained generative models. textit{Latent Codes Expansion} is used to boost image reconstruction quality through the incorporation of extended feature maps. textit{Unsupervised Attribute Directing and Manipulation} enables identification of the latent directions that correspond to specific attribute changes, and then produce interpretable manipulations of those attributes, thereby generating natural transformations to the input data. We demonstrate the efficacy of our scheme by utilizing the disentangled latent representations derived from well-trained GANs to mimic transformations of an image that are similar to real-world natural variations (such as lighting conditions or hairstyle), and train models to be invariant to these natural transformations. Extensive experiments show that our method improves generalization of classification models and increases its robustness to various real-world distortions

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

The 5th AI City Challenge

72 - Milind Naphade , Shuo Wang , David C. Anastasiu 2021

The AI City Challenge was created with two goals in mind: (1) pushing the boundaries of research and development in intelligent video analysis for smarter cities use cases, and (2) assessing tasks where the level of performance is enough to cause rea l-world adoption. Transportation is a segment ripe for such adoption. The fifth AI City Challenge attracted 305 participating teams across 38 countries, who leveraged city-scale real traffic data and high-quality synthetic data to compete in five challenge tracks. Track 1 addressed video-based automatic vehicle counting, where the evaluation being conducted on both algorithmic effectiveness and computational efficiency. Track 2 addressed city-scale vehicle re-identification with augmented synthetic data to substantially increase the training set for the task. Track 3 addressed city-scale multi-target multi-camera vehicle tracking. Track 4 addressed traffic anomaly detection. Track 5 was a new track addressing vehicle retrieval using natural language descriptions. The evaluation system shows a general leader board of all submitted results, and a public leader board of results limited to the contest participation rules, where teams are not allowed to use external data in their work. The public leader board shows results more close to real-world situations where annotated data is limited. Results show the promise of AI in Smarter Transportation. State-of-the-art performance for some tasks shows that these technologies are ready for adoption in real-world systems.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

Uncovering Interpretable Internal States of Merging Tasks at Highway On-Ramps for Autonomous Driving Decision-Making

87 - Huanjie Wang , Wenshuo Wang , Shihua Yuan 2021

Humans make daily routine decisions based on their internal states in intricate interaction scenarios. This paper presents a probabilistically reconstructive learning approach to identify the internal states of multi-vehicle sequential interactions w hen merging at highway on-ramps. We treated the merging tasks sequential decision as a dynamic, stochastic process and then integrated the internal states into an HMM-GMR model, a probabilistic combination of an extended Gaussian mixture regression (GMR) and hidden Markov models (HMM). We also developed a variant expectation-maximum (EM) algorithm to estimate the model parameters and verified it based on a real-world data set. Experiment results reveal that three interpretable internal states can semantically describe the interactive merge procedure at highway on-ramps. This finding provides a basis to develop an efficient model-based decision-making algorithm for autonomous vehicles (AVs) in a partially observable environment.

علم الروبوتات أنظمة وتحكم أنظمة وتحكم

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد