Research papers and master's and doctoral theses about machine

The Mininglamp Machine Translation System for WMT21

835 - Association for Computational Linguistics 2021 article

This paper describes the Mininglamp neural machine translation systems for the WMT2021 news translation task. We participated in eight translation directions for news text: Chinese to/from English, Hausa to/from English, German to/from English and French to/from German. Our base system is built on the Transformer architecture, with wider or smaller configurations for different translation directions. We mainly used back-translation, knowledge distillation and fine-tuning to improve single models, and ensembling to combine them. Our final submission ranked first in the English to/from Hausa task.

mininglamp machine translation, machine translation system, mininglamp neural machine
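The Mininglamp abstract says single models were combined by ensembling. A common way to do this at decoding time is to average the next-token probability distributions of the member models; the sketch below assumes that averaging scheme, which the abstract does not specify, and the helper name `ensemble_next_token` and the toy token probabilities are hypothetical.

```python
import math
from typing import Dict, List


def ensemble_next_token(dists: List[Dict[str, float]]) -> Dict[str, float]:
    """Combine next-token distributions from several models (hypothetical helper).

    Each dict maps a candidate token to the probability one model assigns it.
    Returns the log of the averaged probability for every candidate token.
    """
    if not dists:
        raise ValueError("need at least one model distribution")
    vocab = set().union(*dists)  # every token proposed by any model
    n = len(dists)
    scores = {}
    for tok in vocab:
        avg = sum(d.get(tok, 0.0) for d in dists) / n  # missing token -> probability 0
        scores[tok] = math.log(max(avg, 1e-12))  # floor avoids log(0)
    return scores


# Toy distributions from two single models at the same decoding step.
model_a = {"house": 0.6, "home": 0.4}
model_b = {"house": 0.2, "home": 0.8}
scores = ensemble_next_token([model_a, model_b])
best = max(scores, key=scores.get)
print(best)  # home
```

Averaging probabilities (rather than picking the single most confident model) lets a token that several models consider plausible win even when no single model ranks it first.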

The NiuTrans Machine Translation Systems for WMT21

863 - Association for Computational Linguistics 2021 article

This paper describes the NiuTrans neural machine translation systems for the WMT 2021 news translation task. We made submissions in 9 language directions, including the English2Chinese, Japanese, Russian, Icelandic and English2Hausa tasks. Our primary systems are built on several effective variants of the Transformer, e.g., Transformer-DLCL and ODE-Transformer. We also use back-translation, knowledge distillation, post-ensemble and iterative fine-tuning to further improve model performance.

machine translation systems, niutrans machine translation

Machine Extraction of Tax Laws from Legislative Texts

894 - Association for Computational Linguistics 2021 article

Using a corpus of compiled codes from U.S. states containing labeled tax law sections, we train text classifiers to automatically tag tax-law documents and, further, to identify the associated revenue source (e.g. income, property, or sales). After evaluating classifier performance on held-out test data, we apply the classifiers to a historical corpus of U.S. state legislation to extract the flow of relevant laws over the years 1910 through 2010. We document that the classifiers are effective on the historical corpus, for example by automatically detecting the establishment of state personal income taxes. The trained models and replication code are published at https://github.com/luyang521/tax-classification.

machine extraction, legislative texts, tax law sections

Bandits Don't Follow Rules: Balancing Multi-Facet Machine Translation with Multi-Armed Bandits

778 - Association for Computational Linguistics 2021 article

Training data for machine translation (MT) is often sourced from a multitude of large corpora that are multi-faceted in nature, e.g. containing content from multiple domains or of different levels of quality or complexity. Naturally, these facets do not occur with equal frequency, nor are they equally important for the test scenario at hand. In this work, we propose to optimize this balance jointly with the MT model parameters to relieve system developers of manual schedule design. A multi-armed bandit is trained to dynamically choose between facets in the way that is most beneficial for the MT system. We evaluate it on three different multi-facet applications: balancing translationese and natural training data, data from multiple domains, or data from multiple language pairs. We find that bandit learning leads to competitive MT systems across tasks, and our analysis provides insights into its learned strategies and the underlying data sets.

follow rules, multi-facet machine translation
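The bandit paper trains a multi-armed bandit to choose which data facet to train on next. The abstract does not name the bandit algorithm or the reward, so the sketch below assumes EXP3, a standard adversarial bandit, and a hypothetical reward in [0, 1] (for example, whether the development loss improved after a step on the chosen facet). The class name `Exp3FacetSampler` and the simulated rewards are illustrative only.

```python
import math
import random
from typing import List


class Exp3FacetSampler:
    """EXP3 bandit choosing which data facet (arm) to draw the next batch from."""

    def __init__(self, n_facets: int, gamma: float = 0.1, seed: int = 0):
        self.n = n_facets
        self.gamma = gamma  # exploration rate in (0, 1]
        self.weights = [1.0] * n_facets
        self.rng = random.Random(seed)

    def probs(self) -> List[float]:
        # Mix the weight-proportional distribution with uniform exploration.
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.n for w in self.weights]

    def choose(self) -> int:
        return self.rng.choices(range(self.n), weights=self.probs())[0]

    def update(self, arm: int, reward: float) -> None:
        # Reward is assumed to lie in [0, 1]; dividing by the arm's probability
        # gives an unbiased estimate of the reward of the arm actually played.
        p = self.probs()[arm]
        self.weights[arm] *= math.exp(self.gamma * (reward / p) / self.n)
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]  # rescale; probabilities unchanged


# Toy simulation: facet 1 always helps the (imaginary) MT model, facets 0 and 2 never do.
sampler = Exp3FacetSampler(n_facets=3)
for _ in range(500):
    arm = sampler.choose()
    sampler.update(arm, 1.0 if arm == 1 else 0.0)
probs = sampler.probs()
```

The `gamma` term keeps every facet's probability at least `gamma / n_facets`, so the schedule can still react if a neglected facet becomes useful later in training.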

Pushing the Right Buttons: Adversarial Evaluation of Quality Estimation

653 - Association for Computational Linguistics 2021 article

Current Machine Translation (MT) systems achieve very good results on a growing variety of language pairs and datasets. However, they are known to produce fluent translations that can contain important meaning errors, which undermines their reliability in practice. Quality Estimation (QE) is the task of automatically assessing the performance of MT systems at test time, so to be useful, QE systems should be able to detect such errors. However, this ability has yet to be tested in current evaluation practice, where QE systems are assessed only by their correlation with human judgements. In this work, we bridge this gap by proposing a general methodology for adversarial testing of QE for MT. First, we show that despite the high correlation with human judgements achieved by recent state-of-the-art (SOTA) models, certain types of meaning errors remain difficult for QE to detect. Second, we show that, on average, a model's ability to discriminate between meaning-preserving and meaning-altering perturbations is predictive of its overall performance, potentially allowing QE systems to be compared without relying on manual quality annotation.

quality estimation, current machine translation, buttons
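The QE paper measures how well a model discriminates between meaning-preserving and meaning-altering perturbations. One simple way to turn that idea into a single number, assumed here rather than taken from the paper, is the fraction of perturbation pairs for which the QE score of the meaning-preserving version is higher. `PerturbationPair`, `discrimination_rate` and the scores below are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PerturbationPair:
    preserving: float  # QE score after a meaning-preserving edit (e.g. synonym swap)
    altering: float    # QE score after a meaning-altering edit (e.g. negation flip)


def discrimination_rate(pairs: Sequence[PerturbationPair]) -> float:
    """Fraction of pairs where the QE model scores the meaning-preserving
    perturbation higher than the meaning-altering one."""
    if not pairs:
        raise ValueError("need at least one pair")
    hits = sum(p.preserving > p.altering for p in pairs)
    return hits / len(pairs)


# Hypothetical QE scores for three perturbed translations.
pairs = [
    PerturbationPair(preserving=0.80, altering=0.30),
    PerturbationPair(preserving=0.70, altering=0.75),  # QE missed this meaning error
    PerturbationPair(preserving=0.90, altering=0.20),
]
rate = discrimination_rate(pairs)
```

Because the comparison needs no human quality labels, a score like this can be computed for any QE model given only the perturbed outputs.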

Uncertainty-Aware Balancing for Multilingual and Multi-Domain Neural Machine Translation Training

910 - Association for Computational Linguistics 2021 article

Learning a multilingual and multi-domain translation model is challenging, as heterogeneous and imbalanced real-world data make the model converge inconsistently across corpora. One common practice is to adjust the share of each corpus in training, so that the learning process is balanced and low-resource cases can benefit from high-resource ones. However, automatic balancing methods usually depend on intra- and inter-dataset characteristics, which are usually unknown or require human priors. In this work, we propose MultiUAT, an approach that dynamically adjusts training data usage based on the model's uncertainty on a small set of trusted clean data for multi-corpus machine translation. We experiment with two classes of uncertainty measures in multilingual (16 languages with 4 settings) and multi-domain settings (4 in-domain and 2 out-of-domain on English-German translation) and demonstrate that MultiUAT substantially outperforms its baselines, including both static and dynamic strategies. We analyze cross-domain transfer and show the deficiencies of static and similarity-based methods.

multi-domain neural machine, multi-domain neural
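MultiUAT adjusts how much each corpus is used according to the model's uncertainty on a small trusted set. The sketch below is a simplified stand-in, not the paper's update rule: it assumes predictive entropy as one possible uncertainty measure and a softmax over corpora as the mapping to sampling weights. The function names, corpus labels and probabilities are hypothetical.

```python
import math
from typing import Dict, List


def entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of one predicted token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def uncertainty_weights(
    dev_dists: Dict[str, List[List[float]]], temperature: float = 1.0
) -> Dict[str, float]:
    """Map each corpus to a sampling weight that grows with the model's mean
    predictive entropy on that corpus's trusted dev data (softmax over corpora)."""
    mean_unc = {
        corpus: sum(entropy(d) for d in dists) / len(dists)
        for corpus, dists in dev_dists.items()
    }
    exps = {c: math.exp(u / temperature) for c, u in mean_unc.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


# Toy token distributions predicted on two trusted dev sets.
weights = uncertainty_weights({
    "de-en": [[0.9, 0.1], [0.95, 0.05]],  # model is confident
    "ha-en": [[0.5, 0.5], [0.6, 0.4]],    # model is uncertain
})
```

Recomputing these weights periodically during training shifts sampling towards corpora the model still finds hard; `temperature` controls how sharply it does so.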

Zero-Shot Clinical Questionnaire Filling From Human-Machine Interactions

462 - Association for Computational Linguistics 2021 article

In clinical studies, chatbots mimicking doctor-patient interactions are used to collect information about the patient's health state. This information must later be processed and structured for the doctor. One way to organize it is to automatically fill in questionnaires from the human-bot conversation, which would help the doctor spot possible issues. Since no dataset is available for this task and collecting one is costly and sensitive, we explore the capabilities of state-of-the-art zero-shot models for question answering, textual inference and text classification. We provide a detailed analysis of the results and propose further directions for clinical questionnaire filling.

clinical questionnaire filling, human-machine interactions, zero-shot clinical questionnaire

Uncertainty-Aware Machine Translation Evaluation

1145 - Association for Computational Linguistics 2021 article

Several neural-based metrics have been recently proposed to evaluate machine translation quality. However, all of them resort to point estimates, which provide limited information at segment level. This is made worse as they are trained on noisy, biased and scarce human judgements, often resulting in unreliable quality predictions. In this paper, we introduce uncertainty-aware MT evaluation and analyze the trustworthiness of the predicted quality. We combine the COMET framework with two uncertainty estimation methods, Monte Carlo dropout and deep ensembles, to obtain quality scores along with confidence intervals. We compare the performance of our uncertainty-aware MT evaluation methods across multiple language pairs from the QT21 dataset and the WMT20 metrics task, augmented with MQM annotations. We experiment with varying numbers of references and further discuss the usefulness of uncertainty-aware quality estimation (without references) to flag possibly critical translation mistakes.

multilingual simulators, evaluate machine translation
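The uncertainty-aware evaluation paper obtains confidence intervals from Monte Carlo dropout or deep ensembles: the same segment is scored several times and the spread of those scores expresses uncertainty. The sketch below assumes the repeated scores are roughly Gaussian and reports the mean plus or minus 1.96 standard deviations; it does not call COMET itself, and the function name and scores are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def score_with_interval(
    samples: Sequence[float], z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) for one segment's quality score, treating the
    repeated stochastic scores as draws from a Gaussian (z=1.96 -> ~95%)."""
    if len(samples) < 2:
        raise ValueError("need at least two stochastic scores")
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)  # sample standard deviation of the scores
    return mu, mu - z * sigma, mu + z * sigma


# Hypothetical scores for one segment from five dropout-enabled forward passes.
mu, lo, hi = score_with_interval([0.60, 0.62, 0.58, 0.61, 0.59])
```

A wide interval signals that the metric is unsure about the segment, which is the kind of case the abstract suggests flagging for possible critical mistakes.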

Improving Machine Translation of Rare and Unseen Word Senses

632 - Association for Computational Linguistics 2021 article

The performance of NMT systems has improved drastically in the past few years, but the translation of multi-sense words still poses a challenge. Since word senses are not represented uniformly in the parallel corpora used for training, MT output overuses the most frequent sense. In this work, we propose CmBT (Contextually-mined Back-Translation), an approach for improving multi-sense word translation that leverages pre-trained cross-lingual contextual word representations (CCWRs). Because of their contextual sensitivity and their large pre-training data, CCWRs can easily capture word senses that are missing or very rare in the parallel corpora used to train MT. Specifically, CmBT applies bilingual lexicon induction on CCWRs to mine sense-specific target sentences from a monolingual dataset, and then back-translates these sentences to generate a pseudo-parallel corpus as additional training data for an MT system. We test the translation quality of ambiguous words on the MuCoW test suite, which was built to test the word sense disambiguation effectiveness of MT systems. We show that our system improves the translation of difficult unseen and low-frequency word senses.

improving machine translation, improving machine
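CmBT mines sense-specific target sentences by applying bilingual lexicon induction to cross-lingual contextual word representations. A much-simplified version of that retrieval step, assumed here to be a cosine-similarity threshold against a single source-sense vector, is sketched below; the mined sentences would then be back-translated to build the pseudo-parallel corpus. The function names, the threshold and the 2-d vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mine_sense_sentences(
    sense_vec: Sequence[float],
    target_sents: Dict[str, List[Sequence[float]]],
    threshold: float = 0.8,
) -> List[str]:
    """Return target-language sentences containing at least one token whose
    contextual embedding is close (cosine >= threshold) to the source-sense vector."""
    return [
        sent
        for sent, token_vecs in target_sents.items()
        if any(cosine(sense_vec, t) >= threshold for t in token_vecs)
    ]


# Toy 2-d "contextual embeddings"; real CCWRs have hundreds of dimensions.
mined = mine_sense_sentences(
    [1.0, 0.0],
    {
        "sentence A": [[0.9, 0.1], [0.0, 1.0]],  # first token matches the sense
        "sentence B": [[0.0, 1.0]],              # orthogonal, no match
    },
)
```

Because the embeddings are contextual, the same surface word in different sentences gets different vectors, which is what lets this kind of search separate senses that a static dictionary lookup would merge.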

Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task

840 - Association for Computational Linguistics 2021 article

This report describes Microsoft's machine translation systems for the WMT21 shared task on large-scale multilingual machine translation. We participated in all three evaluation tracks, the Large Track and two Small Tracks; the former is unconstrained and the latter two are fully constrained. Our model submissions were initialized with DeltaLM, a generic pre-trained multilingual encoder-decoder model, and fine-tuned on the large collection of parallel data and the data sources allowed under each track's settings, using progressive learning and iterative back-translation to further improve performance. Our final submissions ranked first on all three tracks in terms of the automatic evaluation metric.

tentrans, large-scale multilingual, microsoft machine translation
