Mistranslated numbers can have serious consequences, such as financial loss or medical misinformation. In this work we develop comprehensive assessments of the robustness of neural machine translation (NMT) systems to numerical text via behavioural testing. We explore a variety of numerical translation capabilities that a system is expected to exhibit and design effective test examples to expose system underperformance. We find that numerical mistranslation is a general issue: major commercial systems and state-of-the-art research models fail on many of our test examples, for both high- and low-resource languages. To the best of our knowledge, our tests reveal novel errors that have not previously been reported in NMT systems. Lastly, we discuss strategies to mitigate numerical mistranslation.
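One basic behavioural check suggested by the abstract above is that every number in the source survives translation. The sketch below is our own minimal illustration and is not the authors' test suite. The helper names `extract_numbers` and `check_numbers` are hypothetical. It also assumes English-style separators ("," for thousands, "." for decimals), so it would wrongly flag a correct locale conversion such as "3.5" to "3,5", and it ignores numbers written as words.

```python
import re
from collections import Counter

# Digit runs with optional internal "," or "." separators, e.g. "1,000" or "3.5".
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def extract_numbers(text):
    """Return the numbers in `text` as normalized strings.

    Assumption: "," is a thousands separator (English convention), so it is
    dropped; other locales would need their own normalization rule.
    """
    return [tok.replace(",", "") for tok in NUMBER_RE.findall(text)]


def check_numbers(source, translation):
    """Compare the multisets of numbers in a source sentence and its translation.

    Returns (passed, missing, extra): `missing` holds source numbers absent
    from the translation, `extra` holds translation numbers absent from the source.
    """
    src = Counter(extract_numbers(source))
    hyp = Counter(extract_numbers(translation))
    missing = src - hyp  # Counter subtraction keeps only positive counts
    extra = hyp - src
    return (not missing and not extra), dict(missing), dict(extra)
```

For example, `check_numbers("Take 2 tablets every 8 hours.", "Prenez 20 comprimés toutes les 8 heures.")` reports the source "2" as missing and "20" as extra. This is the kind of medically relevant error the abstract warns about.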
Neural machine translation systems are known to be vulnerable to adversarial test inputs; however, as we show in this paper, these systems are also vulnerable to training attacks. Specifically, we propose a poisoning attack in which a malicious adversary inserts a small poisoned sample of monolingual text into the training set of a system trained using back-translation. This sample is designed to induce a specific, targeted translation behaviour, such as peddling misinformation. We present two methods for crafting poisoned examples and show that only a tiny handful of instances, amounting to just 0.02% of the training set, is sufficient to enact a successful attack. We outline a defence method against such attacks, which partly ameliorates the problem. However, we stress that this is a blind spot in modern NMT that demands immediate attention.
385 - Yehui Tang, Kai Han, Chang Xu 2021
Transformer models have recently made great progress on computer vision tasks. The rapid development of vision transformers is mainly attributable to their strong representation ability for extracting informative features from input images. However, mainstream transformer models are designed with deep architectures, and feature diversity is continuously reduced as depth increases, i.e., feature collapse. In this paper, we theoretically analyze the feature collapse phenomenon and study the relationship between shortcuts and feature diversity in these transformer models. We then present an augmented shortcut scheme, which inserts additional paths with learnable parameters in parallel with the original shortcuts. To save computational cost, we further explore an efficient approach that uses block-circulant projection to implement the augmented shortcuts. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed method, which improves the accuracy of state-of-the-art vision transformers by about 1% without noticeably increasing their parameters or FLOPs.
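The augmented shortcut described above can be sketched as $Z_{l+1} = \mathrm{MSA}(Z_l) + Z_l + \sum_i \sigma(Z_l W_i^\top)$, where each extra path has learnable weights $W_i$. The minimal NumPy sketch below rests on several stated assumptions: ReLU for $\sigma$, a precomputed stand-in for the MSA output, and hypothetical function names. The weights are block-circulant, so each $b \times b$ block stores only $b$ numbers and is applied as a circular convolution via the FFT. This illustrates the idea and is not the authors' implementation.

```python
import numpy as np


def block_circulant_project(Z, C):
    """Multiply every row of Z by a block-circulant matrix using the FFT.

    Z: (n_tokens, q*b) token features.
    C: (p, q, b) first columns of a p x q grid of b x b circulant blocks;
       the implied dense weight W has shape (p*b, q*b) with
       W[i*b + r, j*b + s] = C[i, j, (r - s) % b].
    Returns an (n_tokens, p*b) array equal to Z @ W.T.
    """
    n = Z.shape[0]
    p, q, b = C.shape
    Zf = np.fft.fft(Z.reshape(n, q, b), axis=-1)  # (n, q, b)
    Cf = np.fft.fft(C, axis=-1)                   # (p, q, b)
    # A circulant block acts as a circular convolution, i.e. an elementwise
    # product in the frequency domain; sum contributions over input blocks.
    Yf = np.einsum("pqb,nqb->npb", Cf, Zf)
    return np.real(np.fft.ifft(Yf, axis=-1)).reshape(n, p * b)


def augmented_shortcut_block(Z, msa_out, C_list):
    """Z_next = MSA(Z) + Z + sum_i relu(Z @ W_i.T), each W_i block-circulant.

    msa_out stands in for the attention output. Each C in C_list must have
    p == q so the output width matches Z. ReLU and the number of augmented
    paths (len(C_list)) are illustrative assumptions.
    """
    out = msa_out + Z  # original identity shortcut
    for C in C_list:
        out = out + np.maximum(block_circulant_project(Z, C), 0.0)
    return out
```

Because each block is circulant, a $(pb) \times (qb)$ weight matrix has $pqb$ parameters instead of $pqb^2$. Each block also costs $O(b \log b)$ rather than $O(b^2)$ to apply, which is where the computational saving comes from.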
We view disentanglement learning as discovering an underlying structure that equivariantly reflects the factorized variations shown in data. Traditionally, such a structure is fixed to be a vector space, with data variations represented by translations along individual latent dimensions. We argue that this simple structure is suboptimal, since it requires the model to learn to discard the properties of data variations (e.g., different scales of change, different levels of abstractness), which is extra work beyond equivariance learning. Instead, we propose to encode the data variations with groups. A group can not only represent variations equivariantly but can also be adaptively optimized to preserve the properties of data variations. Since it is hard to train directly on group structures, we focus on Lie groups and adopt a parameterization using the Lie algebra. From this parameterization, some disentanglement learning constraints are naturally derived. A simple model named Commutative Lie Group VAE is introduced to realize group-based disentanglement learning. Experiments show that our model can effectively learn disentangled representations without supervision and can achieve state-of-the-art performance without extra constraints.
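One way to read the Lie algebra parameterization above is as follows. A latent code $z \in \mathbb{R}^k$ is mapped to a group element $g(z) = \exp\left(\sum_i z_i A_i\right)$ with learnable matrix generators $A_i$, and a commutative group requires $A_i A_j = A_j A_i$. The NumPy sketch below computes $g(z)$ together with a commutator penalty that vanishes exactly when the generators commute. The penalty form, the Taylor-series `expm` helper, and the function names are our assumptions for illustration, not the paper's code.

```python
import numpy as np


def expm(A, terms=30):
    """Matrix exponential via scaling and squaring with a truncated Taylor series.

    Adequate for the small, moderately scaled matrices in this sketch;
    scipy.linalg.expm is the robust choice in practice.
    """
    norm = np.linalg.norm(A, ord=1)
    # Scale A so its norm is at most 1/2, exponentiate, then square back up.
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = A / 2.0 ** s
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ X / k
        result = result + term
    for _ in range(s):
        result = result @ result
    return result


def group_element(z, generators):
    """Map a latent code z of shape (k,) to g(z) = exp(sum_i z_i A_i).

    generators: (k, m, m) array of hypothetical learnable Lie algebra basis matrices.
    """
    return expm(np.einsum("i,imn->mn", z, generators))


def commutator_penalty(generators):
    """Sum of squared Frobenius norms of [A_i, A_j] over all pairs i < j.

    Zero exactly when all generators commute, i.e. the group is commutative.
    """
    k = generators.shape[0]
    total = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            A, B = generators[i], generators[j]
            total += np.sum((A @ B - B @ A) ** 2)
    return total
```

When the penalty is zero, $g(z)$ factors as $\prod_i \exp(z_i A_i)$. Each latent dimension then acts as an independent one-parameter subgroup, which is what makes the commutative case attractive for disentanglement.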
Compared with cheap addition operations, multiplication operations have much higher computational complexity. The widely used convolutions in deep neural networks are exactly cross-correlations that measure the similarity between input features and convolution filters, and computing them involves massive numbers of multiplications between floating-point values. In this paper, we present adder networks (AdderNets), which trade these massive multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions to reduce computation costs. In AdderNets, we take the $\ell_1$-norm distance between filters and input features as the output response. We first develop a theoretical foundation for AdderNets by showing that both the single-hidden-layer AdderNet and the width-bounded deep AdderNet with ReLU activation functions are universal function approximators. An approximation bound for AdderNets with a single hidden layer is also presented. We further analyze the influence of this new similarity measure on the optimization of neural networks and develop a special training scheme for AdderNets. Based on the gradient magnitude, an adaptive learning rate strategy is proposed to enhance the training procedure of AdderNets. AdderNets achieve 75.7% Top-1 accuracy and 92.3% Top-5 accuracy using ResNet-50 on the ImageNet dataset without any multiplication in the convolutional layers.
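The adder response described above can be sketched as $Y(m,n,t) = -\sum_{i}\sum_{j}\sum_{k} |X(m+i,n+j,k) - F(i,j,k,t)|$. This is the negated $\ell_1$ distance between each input patch and each filter, so larger outputs mean higher similarity, as with cross-correlation. The negation, the stride-1 no-padding setting, and the function name `adder_conv2d` are assumptions of this minimal NumPy sketch. It shows the forward pass only, not the paper's training scheme or adaptive learning rate.

```python
import numpy as np


def adder_conv2d(X, F):
    """Adder 'convolution': negated l1 distance between input patches and filters.

    X: (H, W, C_in) input feature map; F: (d, d, C_in, C_out) filters.
    Stride 1, no padding. Returns (H-d+1, W-d+1, C_out) with
    Y[m, n, t] = -sum_{i,j,k} |X[m+i, n+j, k] - F[i, j, k, t]|.
    Only subtraction, absolute value and summation are used.
    """
    H, W, C_in = X.shape
    d, _, _, C_out = F.shape
    Y = np.zeros((H - d + 1, W - d + 1, C_out))
    for m in range(H - d + 1):
        for n in range(W - d + 1):
            patch = X[m:m + d, n:n + d, :, None]  # (d, d, C_in, 1), broadcast over C_out
            Y[m, n] = -np.abs(patch - F).sum(axis=(0, 1, 2))
    return Y
```

The loops are for clarity. A practical version would vectorize patch extraction, for example with `numpy.lib.stride_tricks.sliding_window_view`. A filter that exactly matches a patch gives the maximal response of 0.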
Capturing interpretable variations has long been one of the goals of disentanglement learning. However, unlike the independence assumption, interpretability has rarely been exploited to encourage disentanglement in the unsupervised setting. In this paper, we examine the interpretability of disentangled representations by investigating two questions: where to interpret and what to interpret. A latent code is easy to interpret if it consistently affects a certain subarea of the generated image. We therefore propose to learn a spatial mask that localizes the effect of each individual latent dimension. On the other hand, interpretability usually comes from latent dimensions that capture simple and basic variations in data. We therefore impose a perturbation on a certain dimension of the latent code and expect the perturbation along this dimension to be identifiable from the generated images, so that the encoding of simple variations is enforced. Additionally, we develop an unsupervised model selection method, which accumulates perceptual distance scores along axes in the latent space. On various datasets, our models learn high-quality disentangled representations without supervision, showing that the proposed modeling of interpretability is an effective proxy for achieving unsupervised disentanglement.
The objective of this study is to predict the near-future flooding status of road segments from the current status of those segments and their adjacent segments, using a deep learning framework on fine-grained traffic data. Predictive flood monitoring for situational awareness of road network status plays a critical role in supporting crisis response activities, such as evaluating the loss of access to hospitals and shelters. Studies on near-future prediction of road network flooding status at the road-segment level have been lacking. Using fine-grained traffic speed data for road sections, this study designed and implemented three spatio-temporal graph convolutional network (STGCN) models. These models predict road network status during flood events at the road-segment level, in the context of Hurricane Harvey (2017) in Harris County (Texas, USA). Model 1 consists of two spatio-temporal blocks considering the adjacency and distance between road segments, while Model 2 contains an additional elevation block to account for elevation differences between road segments. Model 3 includes three blocks considering the adjacency and the product of distance and elevation difference between road segments. The analysis tested the STGCN models and evaluated their prediction performance. Our results indicate that Model 1 and Model 2 perform reliably and accurately when predicting road network flooding status in the near future (e.g., 2-4 hours ahead), with precision and recall values above 98% and 96%, respectively. With reliable predictions of road network status during floods, the proposed models can help affected communities avoid flooded roads and help emergency management agencies implement evacuation and relief resource delivery plans.
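The spatial part of such a model can be sketched as a single first-order graph convolution over road segments, $H' = \tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2} H W$. Here $\tilde{D}$ is the diagonal degree matrix of $A + I$, and the edge weights of $A$ combine adjacency and distance. This is a hedged illustration only. The Gaussian distance kernel, the bandwidth `sigma`, and the function names are assumptions, and the temporal components and elevation block of the STGCN models described above are omitted.

```python
import numpy as np


def distance_weighted_adjacency(adj, dist, sigma):
    """Weight each existing road-segment link by a Gaussian kernel of distance.

    adj: (N, N) 0/1 adjacency between road segments; dist: (N, N) distances.
    The kernel exp(-d^2 / sigma^2) is an illustrative assumption.
    """
    return adj * np.exp(-(dist ** 2) / sigma ** 2)


def graph_conv(H, A, W):
    """One first-order graph convolution: D^{-1/2} (A + I) D^{-1/2} H W.

    H: (N, F_in) per-segment features (e.g. recent traffic speeds),
    A: (N, N) symmetric nonnegative weighted adjacency, W: (F_in, F_out) weights.
    """
    A_hat = A + np.eye(A.shape[0])  # self-loops keep each segment's own status
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return A_norm @ H @ W
```

To mimic Model 3, one could replace `dist` with the element-wise product of distance and elevation difference before building the kernel. Whether that matches the authors' construction is not stated in the abstract.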
113 - Ying Wang, Liang Qiao, Chang Xu 2021
Ever since its first release in 2009, the Go programming language (Golang) has been well received by software communities. A major reason for its success is its powerful support for library-based development, where a Golang project can be conveniently built on top of other projects by referencing them as libraries. As Golang has evolved, it has recommended a new library-referencing mode to overcome the limitations of the original one. Although these two library modes are incompatible, both are supported by the Golang ecosystem. The heterogeneous use of library-referencing modes across Golang projects has caused numerous dependency management (DM) issues, leading to reference inconsistencies and even build failures. Motivated by this problem, we conducted an empirical study to characterize DM issues, understand their root causes, and examine their fixing solutions. Based on our findings, we developed Hero, an automated technique to detect DM issues and suggest proper fixing solutions. We applied Hero to 19,000 popular Golang projects. The results showed that Hero achieved a high detection rate of 98.5% on a DM issue benchmark and found 2,422 new DM issues in 2,356 popular Golang projects. We reported 280 issues, of which 181 (64.6%) have been confirmed, and 160 of those (88.4%) have been fixed or are being fixed. Almost all the fixes adopted our fixing suggestions.
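The two library-referencing modes mentioned above correspond to GOPATH mode and Go modules. One concrete source of inconsistency between them is semantic import versioning. In module mode, a dependency at major version v2 or higher must be required under a module path ending in the matching `/vN` suffix, whereas GOPATH-era repositories without a `go.mod` are pulled in with a `+incompatible` version instead. The Python sketch below checks a single `require` entry against those rules. It is a hypothetical illustration of one issue type, not Hero's detection logic, and it ignores special cases such as `gopkg.in` paths.

```python
import re

SEMVER_RE = re.compile(r"v(\d+)\.(\d+)\.(\d+)(?:[-+].*)?")
MAJOR_SUFFIX_RE = re.compile(r"/v(\d+)$")


def require_is_consistent(module_path, version):
    """Check one go.mod `require` entry against semantic import versioning.

    - v0/v1: the module path carries no /vN suffix.
    - v2+: the path must end in the matching suffix, e.g. example.com/lib/v2 v2.3.0.
    - "+incompatible" (v2+ repositories without go.mod): no suffix is used.
    gopkg.in-style ".vN" paths are not handled in this sketch.
    """
    m = SEMVER_RE.fullmatch(version)
    if m is None:
        raise ValueError(f"not a semantic version: {version!r}")
    major = int(m.group(1))
    suffix = MAJOR_SUFFIX_RE.search(module_path)
    path_major = int(suffix.group(1)) if suffix else None
    if version.endswith("+incompatible"):
        return path_major is None and major >= 2
    if major < 2:
        return path_major is None
    return path_major == major
```

For example, requiring `github.com/a/lib` at `v2.3.0` fails the check. `github.com/a/lib/v2 v2.3.0` and `github.com/a/lib v2.3.0+incompatible` both pass.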
121 - Wentao Xu, Weiqing Liu, Chang Xu 2021
Stock trend forecasting, which aims to predict future stock trends, is crucial for investors seeking to maximize profits from the stock market. In recent years, many event-driven methods have used events extracted from news, social media, and discussion boards to forecast stock trends. However, existing event-driven methods have two main shortcomings: 1) they overlook how the influence of event information differs depending on stock-dependent properties; 2) they neglect the effect of event information from other related stocks. In this paper, we propose a relational event-driven stock trend forecasting (REST) framework that addresses these shortcomings. To remedy the first shortcoming, we model the stock context and learn the effect of event information on stocks under different contexts. To address the second, we construct a stock graph and design a new propagation layer that propagates the effect of event information from related stocks. Experimental studies on real-world data demonstrate the efficiency of our REST framework, and investment simulations show that our framework achieves a higher return on investment than the baselines.
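The two components described above can be illustrated with the NumPy sketch below: context-dependent event effects and propagation over a stock graph. Everything in it is hypothetical. The sigmoid gate computed from stock context, the row-normalized adjacency, and the single residual propagation step are our assumptions, chosen for clarity. They are not the equations of the REST framework.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def event_effects(events, contexts, W_gate):
    """Hypothetical context-dependent effect of each stock's own events.

    events: (S, E) event embeddings per stock; contexts: (S, K) stock context
    features; W_gate: (K, E). A sigmoid gate computed from the context scales
    each event dimension, so the same event can matter differently per stock.
    """
    return sigmoid(contexts @ W_gate) * events


def propagate(effects, adj):
    """Add the row-normalized effects of related stocks to each stock's own effect.

    adj: (S, S) nonnegative relation weights of the stock graph (zero diagonal).
    Stocks with no relations keep their own effect unchanged.
    """
    adj = np.asarray(adj, dtype=float)
    row_sum = adj.sum(axis=1, keepdims=True)
    norm = np.divide(adj, row_sum, out=np.zeros_like(adj), where=row_sum > 0)
    return effects + norm @ effects
```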
83 - Niu Wan, Takayuki Myo, Chang Xu 2020
Using the bare Argonne V4 (AV4), V6 (AV6), and V8 (AV8) nucleon-nucleon (NN) interactions, we calculate the nuclear equations of state (EOSs) for neutron matter with the unitary correlation operator and high-momentum pair methods. Neutron matter is described with a finite-particle-number approach, using the magic number $N=66$ under periodic boundary conditions. The central short-range correlation arising from the short-range repulsion in the NN interaction is treated by the unitary correlation operator method (UCOM). The tensor correlation and spin-orbit effects are described by two-particle two-hole (2p2h) excitations of nucleon pairs, in which two nucleons with a large relative momentum are regarded as a high-momentum (HM) pair. As the number of 2p2h configurations increases, the total energy per particle of neutron matter converges well in this UCOM+HM framework. By comparing the results calculated with the AV4, AV6, and AV8 NN interactions, we demonstrate the effects of the short-range correlation, the tensor correlation, and the spin-orbit coupling on the density dependence of the total energy per particle of neutron matter. Moreover, the contribution of each Hamiltonian component to the total energy per particle is discussed. The EOSs of neutron matter calculated within the present UCOM+HM framework agree with the calculations of six different microscopic many-body theories, particularly with auxiliary field diffusion Monte Carlo calculations.
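The magic number $N=66$ above comes from shell filling in a periodic box. With periodic boundary conditions in a cube of side $L$, single-particle momenta are $\mathbf{k} = (2\pi/L)\,\mathbf{n}$ for integer vectors $\mathbf{n}$, and states with equal $|\mathbf{n}|^2$ are degenerate. The Python sketch below counts those states, with spin degeneracy 2 for neutrons, to list the closed-shell particle numbers. It also relates the box size to the density via $L = (N/\rho)^{1/3}$. The function names are our own.

```python
import itertools
import math


def closed_shell_numbers(max_n2, spin_degeneracy=2):
    """Particle numbers that exactly fill plane-wave shells in a periodic cube.

    Momenta are k = (2*pi/L) * n for integer vectors n; all n with the same
    |n|^2 are degenerate. Filling every shell up to |n|^2 <= max_n2 gives the
    list of closed-shell (magic) numbers. spin_degeneracy=2 for pure neutron
    matter (spin up/down only).
    """
    r = math.isqrt(max_n2)
    shell_sizes = {}
    for n in itertools.product(range(-r, r + 1), repeat=3):
        n2 = n[0] ** 2 + n[1] ** 2 + n[2] ** 2
        if n2 <= max_n2:
            shell_sizes[n2] = shell_sizes.get(n2, 0) + 1
    magic, total = [], 0
    for n2 in sorted(shell_sizes):
        total += spin_degeneracy * shell_sizes[n2]
        magic.append(total)
    return magic


def box_length(n_particles, density):
    """Side L of the periodic cube holding n_particles at the given density (L^3 = N / rho)."""
    return (n_particles / density) ** (1.0 / 3.0)
```

Running `closed_shell_numbers(5)` gives `[2, 14, 38, 54, 66, 114]`. $N=66$ fills every shell up to $|\mathbf{n}|^2 = 4$, so the free-particle ground state is non-degenerate, and each target density then fixes the box side $L$.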