
Multilingual Neural Machine Translation (NMT) enables one model to serve all translation directions, including ones that are unseen during training, i.e. zero-shot translation. Despite being theoretically attractive, current models often produce low-quality translations -- commonly failing to even produce outputs in the right target language. In this work, we observe that off-target translation is dominant even in strong multilingual systems trained on massive multilingual corpora. To address this issue, we propose a joint approach to regularize NMT models at both the representation level and the gradient level. At the representation level, we leverage an auxiliary target language prediction task to regularize decoder outputs to retain information about the target language. At the gradient level, we leverage a small amount of direct data (in the thousands of sentence pairs) to regularize model gradients. Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance, by +5.59 and +10.38 BLEU on the WMT and OPUS datasets, respectively. Moreover, experiments show that our method also works well when the small amount of direct data is not available.
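A minimal sketch of the representation-level idea described above: an auxiliary classifier predicts the target language from pooled decoder states, and its loss is added to the usual translation loss. The toy tensors, the mean pooling, and the 0.1 weighting are illustrative assumptions, not the paper's exact architecture or hyperparameters.

```python
import torch
import torch.nn as nn

batch, seq_len, hidden, num_langs = 8, 20, 512, 4
decoder_states = torch.randn(batch, seq_len, hidden)      # stand-in for decoder outputs
target_lang_ids = torch.randint(0, num_langs, (batch,))   # gold target-language labels

# Auxiliary head: pool decoder states and predict the target language,
# encouraging the decoder to retain target-language information.
lang_classifier = nn.Linear(hidden, num_langs)
pooled = decoder_states.mean(dim=1)
aux_loss = nn.functional.cross_entropy(lang_classifier(pooled), target_lang_ids)

translation_loss = torch.tensor(2.3)             # placeholder for the usual NMT loss
total_loss = translation_loss + 0.1 * aux_loss   # 0.1 is an assumed weighting
```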
Neural abstractive summarization systems have made significant progress in recent years. However, abstractive summarizers often produce inconsistent statements or false facts. How can we automatically generate highly abstractive yet factually correct summaries? In this paper, we propose an efficient weakly supervised adversarial data augmentation approach to construct a factual consistency dataset. Based on this artificial dataset, we train an evaluation model that can not only make accurate and robust factual consistency judgments but is also capable of interpretable factual error tracing via the backpropagated gradient distribution over token embeddings. Experiments and analysis conducted on public annotated summarization and factual consistency datasets demonstrate that our approach is effective and reasonable.
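An illustrative sketch of weakly supervised data augmentation for factual consistency: perturb reference summaries to create synthetic "inconsistent" negatives paired with the originals as positives. The concrete perturbation rule below (swapping one entity) is an assumption for illustration, not the paper's actual augmentation scheme.

```python
import random

def corrupt_summary(summary: str, entities: list[str]) -> str:
    """Swap one entity in the summary for a different entity to break factuality."""
    tokens = summary.split()
    present = [e for e in entities if e in tokens]
    if not present:
        return summary
    old = random.choice(present)
    new = random.choice([e for e in entities if e != old] or [old])
    return " ".join(new if t == old else t for t in tokens)

summary = "Apple acquired the startup in 2019"
entities = ["Apple", "Google", "Amazon"]
positive = (summary, 1)                              # factually consistent example
negative = (corrupt_summary(summary, entities), 0)   # synthetic inconsistency
```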
How can neural machine translation (NMT) models be effectively adapted to emerging cases without retraining? Despite the great success of neural machine translation, updating deployed models online remains a challenge. Existing non-parametric approaches that retrieve similar examples from a database to guide the translation process are promising but are prone to overfit the retrieved examples. In this work, we propose to learn Kernel-Smoothed Translation with Example Retrieval (KSTER), an effective approach to adapt neural machine translation models online. Experiments on domain adaptation and multi-domain machine translation datasets show that, even without expensive retraining, KSTER is able to achieve improvements of 1.1 to 1.5 BLEU over the best existing online adaptation methods. The code and trained models are released at https://github.com/jiangqn/KSTER.
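A minimal sketch of the general retrieval-interpolation idea: build a next-token distribution from retrieved examples using a kernel over retrieval distances, then mix it with the model's distribution. The fixed bandwidth and mixing weight below are assumptions; the actual KSTER method learns these quantities, and the retrieval datastore is only stubbed here.

```python
import torch

vocab = 10
model_probs = torch.softmax(torch.randn(vocab), dim=-1)   # NMT next-token distribution

# Retrieved examples: (distance to the query context, target token id).
distances = torch.tensor([0.2, 0.5, 0.9])
tokens = torch.tensor([3, 3, 7])

bandwidth = 1.0
weights = torch.softmax(-distances / bandwidth, dim=-1)    # kernel-smoothed example weights
retrieval_probs = torch.zeros(vocab).index_add_(0, tokens, weights)

lam = 0.5  # mixing weight (assumed fixed here, learned in the real method)
next_token_probs = lam * retrieval_probs + (1 - lam) * model_probs
```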
Current research on quality estimation of machine translation focuses on the sentence-level quality of the translations. By using explainability methods, we can use these quality estimates for word-level error identification. In this work, we compare different explainability techniques and investigate gradient-based and perturbation-based methods by measuring their performance and required computational effort. Throughout our experiments, we observe that using absolute word scores boosts the performance of gradient-based explainers significantly. Further, we combine explainability methods into ensembles to exploit the strengths of individual explainers and obtain better explanations. We propose the use of absolute gradient-based methods, which work comparably well to popular perturbation-based ones while being more time-efficient.
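A sketch of a gradient-based word-level explainer with absolute scores: backpropagate a sentence-level quality score to the token embeddings and take the absolute gradient-times-embedding saliency per token. The tiny embedding-plus-linear model is a stand-in, not an actual quality estimation system.

```python
import torch
import torch.nn as nn

vocab, dim, seq_len = 100, 32, 6
embedding = nn.Embedding(vocab, dim)
scorer = nn.Linear(dim, 1)                       # toy sentence-level QE head

token_ids = torch.randint(0, vocab, (seq_len,))
embeds = embedding(token_ids)
embeds.retain_grad()                             # keep gradients on the token embeddings
sentence_score = scorer(embeds.mean(dim=0)).squeeze()
sentence_score.backward()

# Absolute gradient * input saliency, one score per token.
word_scores = (embeds.grad * embeds).sum(dim=-1).abs()
print(word_scores)
```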
In this work, we propose a novel framework, Gradient Aligned Mutual Learning BERT (GAML-BERT), for improving the early exiting of BERT. GAML-BERT's contributions are two-fold. First, we conduct a set of pilot experiments, which show that mutual knowledge distillation between a shallow exit and a deep exit leads to better performance for both. From this observation, we use mutual learning to improve BERT's early exiting performance; that is, we ask each exit of a multi-exit BERT to distill knowledge from the others. Second, we propose GA, a novel training method that aligns the gradients from the knowledge distillation losses with those from the cross-entropy losses. Extensive experiments conducted on the GLUE benchmark show that GAML-BERT significantly outperforms the state-of-the-art (SOTA) BERT early exiting methods.
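A sketch of one common way to align two loss gradients on a shared parameter: if the distillation gradient conflicts with the cross-entropy gradient (negative dot product), remove its opposing component. This projection rule is an assumption for illustration; GAML-BERT's exact alignment procedure may differ.

```python
import torch

grad_ce = torch.tensor([1.0, 0.0, 2.0])   # gradient of the cross-entropy loss
grad_kd = torch.tensor([-2.0, 1.0, 0.5])  # gradient of the distillation loss

dot = torch.dot(grad_kd, grad_ce)
if dot < 0:
    # Drop the component of the KD gradient that opposes the CE gradient.
    grad_kd = grad_kd - dot / grad_ce.norm().pow(2) * grad_ce

combined_grad = grad_ce + grad_kd          # aligned update direction
```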
We propose the first general-purpose gradient-based adversarial attack against transformer models. Instead of searching for a single adversarial example, we search for a distribution of adversarial examples parameterized by a continuous-valued matrix, hence enabling gradient-based optimization. We empirically demonstrate that our white-box attack attains state-of-the-art attack performance on a variety of natural language tasks, outperforming prior work in terms of adversarial success rate with matching imperceptibility as per automated and human evaluation. Furthermore, we show that a powerful black-box transfer attack, enabled by sampling from the adversarial distribution, matches or exceeds existing methods, while only requiring hard-label outputs.
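A minimal sketch of the core mechanism: a continuous matrix parameterizes a distribution over token sequences, sampled differentiably with Gumbel-softmax so the matrix can be optimized by gradient descent. The victim model and attack objective below are placeholders, not the paper's full loss (which also includes fluency and similarity constraints).

```python
import torch
import torch.nn.functional as F

seq_len, vocab, dim = 5, 50, 16
theta = torch.randn(seq_len, vocab, requires_grad=True)   # attack distribution parameters
word_embeddings = torch.randn(vocab, dim)                 # victim model's embedding table

optimizer = torch.optim.Adam([theta], lr=0.1)
for step in range(10):
    # Differentiable sample: soft one-hot tokens mixed into embeddings.
    soft_tokens = F.gumbel_softmax(theta, tau=1.0, hard=False)
    inputs = soft_tokens @ word_embeddings    # (seq_len, dim) "soft" input sequence
    adversarial_loss = -inputs.sum()          # stand-in for the victim's adversarial loss
    optimizer.zero_grad()
    adversarial_loss.backward()
    optimizer.step()
```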
We introduce a simple and highly general phonotactic learner which induces a probabilistic finite-state automaton from word-form data. We describe the learner and show how to parameterize it to induce unrestricted regular languages, as well as how to restrict it to certain subregular classes such as Strictly k-Local and Strictly k-Piecewise languages. We evaluate the learner on its ability to learn phonotactic constraints in toy examples and in datasets of Quechua and Navajo. We find that an unrestricted learner is the most accurate overall when modeling attested forms not seen in training; however, only the learner restricted to the Strictly Piecewise language class successfully captures certain nonlocal phonotactic constraints. Our learner serves as a baseline for more sophisticated methods.
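A toy version of the most restricted setting mentioned above: a Strictly 2-Local model that scores word forms by the product of segment-bigram probabilities (with word-boundary symbols). This is a deliberate simplification of the probabilistic finite-state learner, for illustration only; the smoothing constant and alphabet size are assumptions.

```python
from collections import Counter

def train_bigram(words):
    counts, context = Counter(), Counter()
    for w in words:
        segs = ["#"] + list(w) + ["#"]          # '#' marks word boundaries
        for a, b in zip(segs, segs[1:]):
            counts[(a, b)] += 1
            context[a] += 1
    return counts, context

def score(word, counts, context, alpha=0.1, alphabet_size=30):
    segs = ["#"] + list(word) + ["#"]
    p = 1.0
    for a, b in zip(segs, segs[1:]):
        # Add-alpha smoothing so unseen bigrams get nonzero probability.
        p *= (counts[(a, b)] + alpha) / (context[a] + alpha * alphabet_size)
    return p

counts, context = train_bigram(["taka", "kata", "tata"])
print(score("taka", counts, context), score("atka", counts, context))
```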
The amount of digital images produced in hospitals is increasing rapidly. Effective medical images play an important role in aiding diagnosis and treatment, and they are also useful in education, where explaining cases with these images helps healthcare students in their studies. New trends in image retrieval using automatic image classification have been investigated in recent years. Medical image classification can serve both diagnostic and teaching purposes in medicine, and different imaging modalities are used for these purposes. Many classification schemes have been created for medical images, using both grey-scale and color images. In this paper, the algorithms used in each step of medical image processing are studied. The first is the preprocessing step, with algorithms such as the median filter [1], histogram equalization (HE) [2], dynamic histogram equalization (DHE), and Contrast Limited Adaptive Histogram Equalization (CLAHE). The second is the feature selection and extraction step [3,4], for example using the Gray Level Co-occurrence Matrix (GLCM). The third is the classification step, which this paper divides into three groups of techniques: texture classification, neural network classification, and K-Nearest Neighbor classification. In this paper, we use MRI brain images to determine the area of a tumor in the brain. The process starts with preprocessing operations before the image is input to the algorithm: the image is converted to grayscale, film artifacts are removed using a dedicated algorithm, and the skull portions are removed without affecting the white and gray matter of the brain using another algorithm. The image is then enhanced using an optimized median filter algorithm, which also removes impurities produced by the first and second steps.
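A sketch of the preprocessing and feature-extraction steps named above on a synthetic grayscale image: median filtering and CLAHE contrast enhancement via OpenCV, followed by a small hand-rolled GLCM and a contrast feature. The synthetic input, window sizes, and quantization levels are illustrative assumptions, not the paper's exact parameters.

```python
import cv2
import numpy as np

image = np.random.randint(0, 256, (128, 128), dtype=np.uint8)  # stand-in for an MRI slice

denoised = cv2.medianBlur(image, 5)                             # median filter, 5x5 window
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(denoised)                                 # contrast-limited adaptive HE

def glcm(img, levels=8):
    """Gray Level Co-occurrence Matrix for horizontally adjacent pixels."""
    q = (img.astype(np.int32) * levels) // 256                   # quantize to `levels` gray levels
    m = np.zeros((levels, levels), dtype=np.float64)
    for a, b in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        m[a, b] += 1
    return m / m.sum()

g = glcm(enhanced)
i, j = np.indices(g.shape)
contrast = float(((i - j) ** 2 * g).sum())                       # a standard GLCM texture feature
```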
In this paper, we present two new methods for finding numerical solutions of systems of nonlinear equations. The basic idea depends on establishing a relationship between the minimum of a function and the solution of a system of nonlinear equations. The first method seeks the numerical solution using a sequence of search directions that depends on the gradient and Hessian matrix of the function, while the second method is based on a sequence of conjugate search directions. The study shows that both methods are convergent and find exact solutions for quadratic functions, so they can find highly accurate solutions for functions beyond the quadratic case. The two proposed algorithms are programmed in Mathematica Version 9. The approximate solutions of some test problems are given, and comparisons of our results with other methods illustrate the efficiency and high accuracy of our suggested methods.
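A sketch of the basic idea: solving f(x) = 0 by minimizing F(x) = ||f(x)||^2 with steps built from the gradient and an (approximate) Hessian. A Gauss-Newton approximation J^T J is used in place of the full Hessian here, on a made-up two-equation system; the paper's exact search directions are not reproduced.

```python
import numpy as np

def f(x):          # example system: x0^2 + x1^2 - 4 = 0 and x0 - x1 = 0
    return np.array([x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]])

def jacobian(x):
    return np.array([[2 * x[0], 2 * x[1]], [1.0, -1.0]])

x = np.array([1.0, 0.5])
for _ in range(20):
    J, r = jacobian(x), f(x)
    grad = 2 * J.T @ r                              # gradient of F(x) = ||f(x)||^2
    step = np.linalg.solve(J.T @ J, -0.5 * grad)    # Gauss-Newton search direction
    x = x + step

print(x, f(x))     # converges to (sqrt(2), sqrt(2)), where f(x) = 0
```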
This paper presents parallel computer architectures, especially superscalar processors and vector processors, and builds a simulator based on the basic characteristics of each architecture. The simulator reproduces their mechanisms of work programmatically, with the aim of comparing the performance of the two architectures in executing Data Level Parallelism (DLP) and Instruction Level Parallelism (ILP). The results show that the effectiveness of executing instructions in parallel depends significantly on choosing the appropriate architecture for execution, according to the type of parallelism that can be applied to the instructions, and that the vector features of the vector architecture achieve a remarkable performance improvement that cannot be ignored when executing DLP, simplifying the code and reducing the number of instructions. The provided simulator is a good core that can be developed and extended, especially for teaching Computer Science and Engineering students and for research.
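A toy illustration of the DLP comparison: the same element-wise addition expressed as a scalar loop (one add instruction per element) versus vector operations over lanes (one vector instruction per group of elements). The lane width and instruction counts are simplified assumptions and not the cycle model of the simulator described above.

```python
import numpy as np

a = np.arange(1024, dtype=np.float64)
b = np.arange(1024, dtype=np.float64)

# Scalar execution: one add instruction issued per element.
scalar_result = np.empty_like(a)
for i in range(a.size):
    scalar_result[i] = a[i] + b[i]
scalar_instructions = a.size

# Vector execution: one vector add issued per group of `lanes` elements.
lanes = 8
vector_result = a + b                             # whole array handled in vector form
vector_instructions = a.size // lanes

print(scalar_instructions, vector_instructions)   # 1024 vs 128 instructions issued
```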