Research papers, master's and doctoral theses about Source

APGN: Adversarial and Parameter Generation Networks for Multi-Source Cross-Domain Dependency Parsing

638 - Association for Computational Linguistics 2021 article

Thanks to the strong representation learning capability of deep learning, especially pre-training techniques with language model loss, dependency parsing has achieved a great performance boost in the in-domain scenario with abundant labeled training data for target domains. However, the parsing community has to face the more realistic setting where parsing performance drops drastically when labeled data only exists for several fixed out-domains. In this work, we propose a novel model for multi-source cross-domain dependency parsing. The model consists of two components, i.e., a parameter generation network for distinguishing domain-specific features, and an adversarial network for learning domain-invariant representations. Experiments on a recently released NLPCC-2019 dataset for multi-domain dependency parsing show that our model can consistently improve cross-domain parsing performance by about 2 points in averaged labeled attachment accuracy (LAS) over strong BERT-enhanced baselines. Detailed analysis is conducted to gain more insight into the contributions of the two components.

cross-domain dependency parsing, multi-source cross-domain dependency

CodeQA: A Question Answering Dataset for Source Code Comprehension

844 - Association for Computational Linguistics 2021 article

We propose CodeQA, a free-form question answering dataset for the purpose of source code comprehension: given a code snippet and a question, a textual answer is required to be generated. CodeQA contains a Java dataset with 119,778 question-answer pairs and a Python dataset with 70,085 question-answer pairs. To obtain natural and faithful questions and answers, we implement syntactic rules and semantic analysis to transform code comments into question-answer pairs. We present the construction process and conduct systematic analysis of our dataset. Experiment results achieved by several neural baselines on our dataset are shown and discussed. While research on question answering and machine reading comprehension develops rapidly, few prior works have drawn attention to code question answering. This new dataset can serve as a useful research benchmark for source code comprehension.

source code comprehension

Refocusing on Relevance: Personalization in NLG

643 - Association for Computational Linguistics 2021 article

Many NLG tasks, such as summarization, dialogue response, or open-domain question answering, focus primarily on a source text in order to generate a target response. This standard approach falls short, however, when a user's intent or context of work is not easily recoverable based solely on that source text, a scenario that we argue is more the rule than the exception. In this work, we argue that NLG systems in general should place a much higher level of emphasis on making use of additional context, and suggest that relevance (as used in Information Retrieval) be thought of as a crucial tool for designing user-oriented text-generating tasks. We further discuss possible harms and hazards around such personalization, and argue that value-sensitive design represents a crucial path forward through these challenges.

source text, refocusing on relevance

Tracing Source Language Interference in Translation with Graph-Isomorphism Measures

595 - Association for Computational Linguistics 2021 article

Previous research has used linguistic features to show that translations exhibit traces of source language interference and that phylogenetic trees between languages can be reconstructed from the results of translations into the same language. Recent research has shown that instances of translationese (source language interference) can even be detected in embedding spaces, comparing embedding spaces of original language data with embedding spaces resulting from translations into the same language, using a simple Eigenvector-based divergence from isomorphism measure. To date, it remains an open question whether alternative graph-isomorphism measures can produce better results. In this paper, we (i) explore Gromov-Hausdorff distance, (ii) present a novel spectral version of the Eigenvector-based method, and (iii) evaluate all approaches against a broad linguistic typological database (URIEL). We show that language distances resulting from our spectral isomorphism approaches can reproduce genetic trees on a par with previous work without requiring any explicit linguistic information, and that the results can be extended to non-Indo-European languages. Finally, we show that the methods are robust under a variety of modeling conditions.

source language interference, tracing source language, source language

YNU-HPCC at SemEval-2021 Task 10: Using a Transformer-based Source-Free Domain Adaptation Model for Semantic Processing

920 - Association for Computational Linguistics 2021 article

Data sharing restrictions are common in NLP datasets. The purpose of this task is to develop a model trained in a source domain that makes predictions for a target domain with related domain data. To address this, the organizers provided participants with pre-trained models fine-tuned on a large amount of source-domain data, together with the dev data, but did not distribute the source-domain data itself. This paper describes the model provided for the NER (named entity recognition) task and the ways we developed it. Because little data is provided, pre-trained models are well suited to cross-domain tasks. Models fine-tuned on a large amount of data from another domain can be effective in a new domain because the task itself does not change.

transformer-based source-free domain

Self-Adapter at SemEval-2021 Task 10: Entropy-based Pseudo-Labeler for Source-free Domain Adaptation

510 - Association for Computational Linguistics 2021 article

Source-free domain adaptation is an emerging line of work in deep learning research since it is closely related to the real-world environment. We study domain adaptation in the sequence labeling problem where a model trained on the source domain data is given. We propose two methods: Self-Adapter and Selective Classifier Training. Self-Adapter is a training method that uses sentence-level pseudo-labels filtered by a self-entropy threshold to provide supervision to the whole model. Selective Classifier Training uses token-level pseudo-labels and supervises only the classification layer of the model. The proposed methods are evaluated on data provided by SemEval-2021 Task 10, and Self-Adapter ranks 2nd.
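As a rough illustration of sentence-level pseudo-labels filtered by an entropy threshold, the sketch below keeps a sentence only when the mean entropy of its token predictions is low. It is not the authors' implementation: the function names, the use of the mean as the sentence-level aggregate, and the threshold value are all assumptions made for this example.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one token's predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def sentence_entropy(sentence_probs):
    """Mean token entropy over a sentence (an assumed aggregation)."""
    return sum(token_entropy(p) for p in sentence_probs) / len(sentence_probs)

def select_pseudo_labels(batch_probs, threshold):
    """Keep sentences whose mean entropy is below `threshold`.

    Returns (sentence_index, labels) pairs, where labels holds the
    argmax label index for each token of a kept sentence.
    """
    selected = []
    for i, sentence_probs in enumerate(batch_probs):
        if sentence_entropy(sentence_probs) < threshold:
            labels = [max(range(len(p)), key=p.__getitem__) for p in sentence_probs]
            selected.append((i, labels))
    return selected

# Hypothetical model outputs: two sentences, two tokens each, three labels.
confident = [[0.98, 0.01, 0.01], [0.02, 0.97, 0.01]]
uncertain = [[0.40, 0.30, 0.30], [0.34, 0.33, 0.33]]
kept = select_pseudo_labels([confident, uncertain], threshold=0.5)
```

Here only the first sentence passes the filter, so only its pseudo-labels would be used as supervision in a later training step.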

entropy-based pseudo-labeler, source-free domain

SemEval-2021 Task 10: Source-Free Domain Adaptation for Semantic Processing

755 - Association for Computational Linguistics 2021 article

This paper presents the Source-Free Domain Adaptation shared task held within SemEval-2021. The aim of the task was to explore adaptation of machine-learning models in the face of data sharing constraints. Specifically, we consider the scenario where annotations exist for a domain but cannot be shared. Instead, participants are provided with models trained on that (source) data. Participants also receive some labeled data from a new (development) domain on which to explore domain adaptation algorithms. Participants are then tested on data representing a new (target) domain. We explored this scenario with two different semantic tasks: negation detection (a text classification task) and time expression recognition (a sequence tagging task).

source-free domain adaptation, semantic processing

On the Embeddings of Variables in Recurrent Neural Networks for Source Code

628 - Association for Computational Linguistics 2021 article

Source code processing relies heavily on methods widely used in natural language processing (NLP), but involves specifics that need to be taken into account to achieve higher quality. An example of this specificity is that the semantics of a variable is defined not only by its name but also by the contexts in which the variable occurs. In this work, we develop dynamic embeddings, a recurrent mechanism that adjusts the learned semantics of the variable when it obtains more information about the variable's role in the program. We show that using the proposed dynamic embeddings significantly improves the performance of the recurrent neural network in code completion and bug fixing tasks.

source code processing

A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code

632 - Association for Computational Linguistics 2021 article

There is an emerging interest in the application of natural language processing models to source code processing tasks. One of the major problems in applying deep learning to software engineering is that source code often contains a lot of rare identifiers, resulting in huge vocabularies. We propose a simple yet effective method, based on identifier anonymization, to handle out-of-vocabulary (OOV) identifiers. Our method can be treated as a preprocessing step and, therefore, allows for easy implementation. We show that the proposed OOV anonymization method significantly improves the performance of the Transformer in two code processing tasks: code completion and bug fixing.
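A minimal sketch of what identifier anonymization as a preprocessing step might look like is given below. It is an illustration under stated assumptions, not the paper's procedure: the `VAR1`, `VAR2`, ... placeholder scheme, the pre-tokenized input, and the function name `anonymize_oov` are all hypothetical.

```python
import keyword

def anonymize_oov(tokens, vocabulary):
    """Replace identifiers missing from `vocabulary` with placeholders.

    Each distinct OOV identifier gets its own placeholder, reused for
    every later occurrence within the same snippet.
    """
    mapping = {}
    anonymized = []
    for tok in tokens:
        # Treat a token as an identifier if it is a valid Python name
        # and not a reserved keyword such as `def` or `return`.
        is_name = tok.isidentifier() and not keyword.iskeyword(tok)
        if is_name and tok not in vocabulary:
            if tok not in mapping:
                mapping[tok] = f"VAR{len(mapping) + 1}"
            anonymized.append(mapping[tok])
        else:
            anonymized.append(tok)
    return anonymized, mapping

# Hypothetical pre-tokenized snippet and in-vocabulary identifier set.
tokens = ["def", "f", "(", "my_rare_name", ")", ":",
          "return", "my_rare_name", "+", "len", "(", "other_rare", ")"]
vocab = {"f", "len"}
anon, mapping = anonymize_oov(tokens, vocab)
```

Because the mapping is built per snippet, repeated uses of the same rare identifier stay linked through a shared placeholder while the model's vocabulary stays small.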

approach for handling, simple approach, source code

Using Student's t-test to compare the variations of magic Latin square design in microbiological experiments

1684 - Al-Baath University 2017 research paper

This research was conducted to study the effectiveness of variations of the magic Latin square design in reducing experimental error in microbiological experiments (Lactobacillus acidophilus), and to improve the activity of random rectangles as one source of variation in the magic Latin square design. Six treatments were randomly distributed over 36 experimental units, and the distribution process was repeated 150 times in order to obtain magic Latin squares that satisfy the condition that each treatment appears once in each row, once in each column, and once in each rectangle within a single design.
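The condition described above (each treatment once per row, once per column, and once per rectangle) can be checked with a short sketch like the one below. The 2 x 3 rectangle shape is an assumption, since the abstract does not state the rectangle dimensions, and the function name and example squares are hypothetical.

```python
def is_magic_latin_square(square, block_rows=2, block_cols=3):
    """Check rows, columns, and rectangles of an n x n treatment layout.

    The rectangle shape (block_rows x block_cols) is an assumption;
    it must tile the square and contain exactly n cells.
    """
    n = len(square)
    treatments = set(range(1, n + 1))
    # Every row and every column must contain each treatment exactly once.
    rows_ok = all(set(row) == treatments for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == treatments for c in range(n))
    # Every rectangle must also contain each treatment exactly once.
    blocks_ok = all(
        {square[r][c]
         for r in range(r0, r0 + block_rows)
         for c in range(c0, c0 + block_cols)} == treatments
        for r0 in range(0, n, block_rows)
        for c0 in range(0, n, block_cols)
    )
    return rows_ok and cols_ok and blocks_ok

# A 6 x 6 layout of six treatments that satisfies all three conditions.
valid = [
    [1, 2, 3, 4, 5, 6],
    [4, 5, 6, 1, 2, 3],
    [2, 3, 1, 5, 6, 4],
    [5, 6, 4, 2, 3, 1],
    [3, 1, 2, 6, 4, 5],
    [6, 4, 5, 3, 1, 2],
]
# A cyclic Latin square: rows and columns are fine, rectangles are not.
cyclic = [[(r + c) % 6 + 1 for c in range(6)] for r in range(6)]

valid_ok = is_magic_latin_square(valid)
cyclic_ok = is_magic_latin_square(cyclic)
```

The cyclic example shows why an ordinary Latin square is not enough: it passes the row and column checks but repeats treatments inside its rectangles.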

magic Latin square, Student's t-test, source of the variations
