Research papers, master's and doctoral theses about generalization

An Unsupervised Method for Building Sentence Simplification Corpora in Multiple Languages

153 - Association for Computational Linguistics 2021 article

Parallel sentence simplification (SS) corpora are scarce for neural SS modeling. We propose an unsupervised method to build SS corpora from large-scale bilingual translation corpora, alleviating the need for supervised SS corpora. Our method is motivated by the following two findings: neural machine translation models tend to generate more high-frequency tokens, and text complexity levels differ between the source and target languages of a translation corpus. By pairing the source sentences of a translation corpus with the translations of their references in a bridge language, we can construct large-scale pseudo-parallel SS data. We then keep the sentence pairs with a higher complexity difference as SS sentence pairs. Building SS corpora with an unsupervised approach can satisfy the expectations that the aligned sentences preserve the same meaning and differ in text complexity level. Experimental results show that SS methods trained on our corpora achieve state-of-the-art results and significantly outperform the results on the English benchmark WikiLarge.
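The filtering step described in this abstract (keeping only pairs with a large complexity difference) can be illustrated with a minimal sketch. The complexity measure here, mean word length, and the `min_gap` threshold are assumptions chosen for illustration only; the abstract does not say which complexity measure or threshold the authors use.

```python
def complexity(sentence: str) -> float:
    """Hypothetical complexity proxy: mean word length in characters."""
    words = sentence.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


def select_pairs(pairs, min_gap=0.5):
    """Keep pairs whose complexity gap is at least min_gap (assumed threshold)."""
    kept = []
    for a, b in pairs:
        ca, cb = complexity(a), complexity(b)
        if abs(ca - cb) >= min_gap:
            # Order each kept pair as (more complex, simpler).
            kept.append((a, b) if ca > cb else (b, a))
    return kept


candidates = [
    ("The feline reposed upon the rug", "The cat sat on the mat"),
    ("I like tea", "I like tea"),
]
selected = select_pairs(candidates)
```

In this toy run only the first pair survives, because the second pair has no complexity gap at all.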

enabling systematic generalization, building sentence simplification, sentence simplification corpora

Regularising Fisher Information Improves Cross-lingual Generalisation

220 - Association for Computational Linguistics 2021 article

Many recent works use 'consistency regularisation' to improve the generalisation of fine-tuned pre-trained models, both multilingual and English-only. These works encourage model outputs to be similar between a perturbed and normal version of the input, usually via penalising the Kullback-Leibler (KL) divergence between the probability distribution of the perturbed and normal model. We believe that consistency losses may be implicitly regularizing the loss landscape. In particular, we build on work hypothesising that implicitly or explicitly regularizing the trace of the Fisher Information Matrix (FIM) amplifies the implicit bias of SGD to avoid memorization. Our initial results show both empirically and theoretically that consistency losses are related to the FIM, and show that the flat minima implied by a small trace of the FIM improve performance when fine-tuning a multilingual model on additional languages. We aim to confirm these initial results on more datasets, and use our insights to develop better multilingual fine-tuning techniques.
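The KL-based consistency penalty mentioned in this abstract can be sketched in a few lines. This is a generic illustration, not the authors' implementation: the function names are hypothetical, and the direction of the divergence (clean distribution first) is an assumption, since the abstract does not specify it.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def consistency_loss(clean_logits, perturbed_logits):
    """Penalty that grows as outputs on the perturbed input drift from the clean ones."""
    p = softmax(clean_logits)       # model output on the normal input
    q = softmax(perturbed_logits)   # model output on the perturbed input
    return kl_divergence(p, q)


same = consistency_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
different = consistency_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

The penalty is zero when the two output distributions match and positive otherwise, so adding it to the task loss pushes the model toward perturbation-invariant predictions.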

improves cross-lingual generalisation, information improves cross-lingual, regularising Fisher information

Compositional Generalization via Semantic Tagging

450 - Association for Computational Linguistics 2021 article

Although neural sequence-to-sequence models have been successfully applied to semantic parsing, they fail at compositional generalization, i.e., they are unable to systematically generalize to unseen compositions of seen components. Motivated by traditional semantic parsing where compositionality is explicitly accounted for by symbolic grammars, we propose a new decoding framework that preserves the expressivity and generality of sequence-to-sequence models while featuring lexicon-style alignments and disentangled information processing. Specifically, we decompose decoding into two phases where an input utterance is first tagged with semantic symbols representing the meaning of individual words, and then a sequence-to-sequence model is used to predict the final meaning representation conditioning on the utterance and the predicted tag sequence. Experimental results on three semantic parsing datasets show that the proposed approach consistently improves compositional generalization across model architectures, domains, and semantic formalisms.

compositional generalization, semantic tagging

Compositional Networks Enable Systematic Generalization for Grounded Language Understanding

165 - Association for Computational Linguistics 2021 article

Humans are remarkably flexible when understanding new sentences that include combinations of concepts they have never encountered before. Recent work has shown that while deep networks can mimic some human language abilities when presented with novel sentences, systematic variation uncovers the limitations in the language-understanding abilities of networks. We demonstrate that these limitations can be overcome by addressing the generalization challenges in the gSCAN dataset, which explicitly measures how well an agent is able to interpret novel linguistic commands grounded in vision, e.g., novel pairings of adjectives and nouns. The key principle we employ is compositionality: that the compositional structure of networks should reflect the compositional structure of the problem domain they address, while allowing other parameters to be learned end-to-end. We build a general-purpose mechanism that enables agents to generalize their language understanding to compositional domains. Crucially, our network has the same state-of-the-art performance as prior work while generalizing its knowledge when prior work does not. Our network also provides a level of interpretability that enables users to inspect what each part of networks learns. Robust grounded language understanding without dramatic failures and without corner cases is critical to building safe and fair robots; we demonstrate the significant role that compositionality can play in achieving that goal.

concept formation, enable systematic generalization

Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics

180 - Association for Computational Linguistics 2021 article

Much of recent progress in NLU was shown to be due to models' learning dataset-specific heuristics. We conduct a case study of generalization in NLI (from MNLI to the adversarially constructed HANS dataset) in a range of BERT-based architectures (adapters, Siamese Transformers, HEX debiasing), as well as with subsampling the data and increasing the model size. We report 2 successful and 3 unsuccessful strategies, all providing insights into how Transformer-based models learn to generalize.

simple heuristics, generalization in NLI, learning dataset-specific heuristics

Clustering Monolingual Vocabularies to Improve Cross-Lingual Generalization

500 - Association for Computational Linguistics 2021 article

Multilingual language models exhibit better performance for some languages than for others (Singh et al., 2019), and many languages do not seem to benefit from multilingual sharing at all, presumably as a result of poor multilingual segmentation (Pyysalo et al., 2020). This work explores the idea of learning multilingual language models based on clustering of monolingual segments. We show significant improvements over standard multilingual segmentation and training across nine languages on a question answering task, both in a small model regime and for a model of the size of BERT-base.

improve cross-lingual generalization, vocabularies to improve, clustering monolingual vocabularies

OCHADAI-KYOTO at SemEval-2021 Task 1: Enhancing Model Generalization and Robustness for Lexical Complexity Prediction

173 - Association for Computational Linguistics 2021 article

We propose an ensemble model for predicting the lexical complexity of words and multiword expressions (MWEs). The model receives as input a sentence with a target word or MWE and outputs its complexity score. Given that a key challenge with this task is the limited size of annotated data, our model relies on pretrained contextual representations from different state-of-the-art transformer-based language models (i.e., BERT and RoBERTa), and on a variety of training methods for further enhancing model generalization and robustness: multi-step fine-tuning and multi-task learning, and adversarial training. Additionally, we propose to enrich contextual representations by adding hand-crafted features during training. Our model achieved competitive results and ranked among the top-10 systems in both sub-tasks.

complexity prediction, enhancing model generalization

SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation (MCL-WiC)

214 - Association for Computational Linguistics 2021 article

In this paper, we introduce the first SemEval task on Multilingual and Cross-Lingual Word-in-Context disambiguation (MCL-WiC). This task allows the largely under-investigated inherent ability of systems to discriminate between word senses within and across languages to be evaluated, dropping the requirement of a fixed sense inventory. Framed as a binary classification, our task is divided into two parts. In the multilingual sub-task, participating systems are required to determine whether two target words, each occurring in a different context within the same language, express the same meaning or not. Instead, in the cross-lingual part, systems are asked to perform the task in a cross-lingual scenario, in which the two target words and their corresponding contexts are provided in two different languages. We illustrate our task, as well as the construction of our manually-created dataset including five languages, namely Arabic, Chinese, English, French and Russian, and the results of the participating systems. Datasets and results are available at: https://github.com/SapienzaNLP/mcl-wic.

enhancing model generalization, cross-lingual disambiguation, cross-lingual

Synthetic Examples Improve Cross-Target Generalization: A Study on Stance Detection on a Twitter corpus.

352 - Association for Computational Linguistics 2021 article

Cross-target generalization is a known problem in stance detection (SD), where systems tend to perform poorly when exposed to targets unseen during training. Given that data annotation is expensive and time-consuming, finding ways to leverage abundant unlabeled in-domain data can offer great benefits. In this paper, we apply a weakly supervised framework to enhance cross-target generalization through synthetically annotated data. We focus on Twitter SD and show experimentally that integrating synthetic data is helpful for cross-target generalization, leading to significant improvements in performance, with gains in F1 scores ranging from +3.4 to +5.1.

improve cross-target generalization, improve cross-target

The Concept of the Epistemological Obstacle in Gaston Bachelard and its Essential Forms

4120 - Tishreen University 2018 research paper

The importance of this research is that it is one of the few studies that address the philosophy of science of the French philosopher Gaston Bachelard and the role he played in the development of epistemology through the concepts he introduced to it, such as the epistemological obstacle, epistemological estrangement, and temporal regression. It also examines the relationship of these concepts with each other, which contributed to the enrichment and evolution of epistemology.

Concepts, Perceptions, Obstacle, Disconnection, Generalization, Circular, Philosophy of Science, Cognitive Obstacles
