Research papers and master's and doctoral theses about "رمز" (code, symbol, token)

Multilingual Translation via Grafting Pre-trained Language Models

281 - Association for Computational Linguistics 2021 Article

Can pre-trained BERT for one language and GPT for another be glued together to translate texts? Self-supervised training using only monolingual data has led to the success of pre-trained (masked) language models in many NLP tasks. However, directly connecting BERT as an encoder and GPT as a decoder can be challenging in machine translation, for GPT-like models lack a cross-attention component that is needed in seq2seq decoders. In this paper, we propose Graformer to graft separately pre-trained (masked) language models for machine translation. With monolingual data for pre-training and parallel data for grafting training, we take maximal advantage of both types of data. Experiments on 60 directions show that our method achieves average improvements of 5.8 BLEU in x2en and 2.9 BLEU in en2x directions compared with the multilingual Transformer of the same size.

augmented code generation, grafting pre-trained language, phosphoric acid production

Retrieval Augmented Code Generation and Summarization

455 - Association for Computational Linguistics 2021 Article

Software developers write a lot of source code and documentation during software development. Intrinsically, developers often recall parts of source code or code summaries that they had written in the past while implementing software or documenting them. To mimic developers' code or summary generation behavior, we propose a retrieval augmented framework, REDCODER, that retrieves relevant code or summaries from a retrieval database and provides them as a supplement to code generation or summarization models. REDCODER is distinctive in two ways. First, it extends the state-of-the-art dense retrieval technique to search for relevant code or summaries. Second, it can work with retrieval databases that include unimodal (only code or natural language description) or bimodal instances (code-description pairs). We conduct experiments and extensive analysis on two benchmark datasets of code generation and summarization in Java and Python, and the promising results endorse the effectiveness of our proposed retrieval augmented framework.

improving numerical reasoning, code generation, augmented code generation
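
The retrieve-then-supplement idea described in the REDCODER abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract describes a dense retriever, whereas the sketch below swaps in a simple bag-of-words cosine similarity so that it runs with the standard library alone. The database entries, the function names (`retrieve`, `augment`), and the `[SEP]` separator are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for a real tokenizer)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, database, k=1):
    """Return the k entries whose summary is most similar to the query.

    Assumption: bag-of-words similarity replaces the dense retriever
    mentioned in the abstract.
    """
    q = Counter(tokenize(query))
    ranked = sorted(
        database,
        key=lambda item: cosine(q, Counter(tokenize(item["summary"]))),
        reverse=True,
    )
    return ranked[:k]


def augment(query, retrieved, sep=" [SEP] "):
    """Append retrieved code to the query as extra input for a generator."""
    return sep.join([query] + [item["code"] for item in retrieved])


# Hypothetical bimodal database of (summary, code) pairs.
DATABASE = [
    {
        "summary": "read a text file and return its lines",
        "code": "def read_lines(p): return open(p).read().splitlines()",
    },
    {
        "summary": "compute the sum of a list of numbers",
        "code": "def total(xs): return sum(xs)",
    },
]

query = "return the sum of numbers in a list"
top = retrieve(query, DATABASE, k=1)
augmented = augment(query, top)
print(augmented)
```

In a full system the augmented string would be passed to a code generation model; the sketch stops at building that input.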

Corpus Creation and Language Identification in Low-Resource Code-Mixed Telugu-English Text

170 - Association for Computational Linguistics 2021 Article

Code-Mixing (CM) is a common phenomenon in multilingual societies. CM plays a significant role in technology and medical fields where terminologies in the native language are not available or known. Language Identification (LID) of the CM data will help solve NLP tasks such as Spell Checking, Named Entity Recognition, Part-Of-Speech tagging, and Semantic Parsing. In the current era of machine learning, a common problem for the above-mentioned tasks is the availability of training data. In this paper, we introduce two manually annotated Telugu-English CM datasets (Twitter dataset and Blog dataset). The Twitter dataset contains more romanization variability and misspelled words than the Blog dataset. We compare various classification models and perform extensive benchmarking using both Classical and Deep Learning Models for LID compared to existing models. We propose two architectures for language classification (Telugu and English) in CM data: (1) Word Level Classification and (2) Sentence Level word-by-word Classification, and compare these approaches, presenting two strong baselines for LID on these datasets.

code-mixed telugu-english text, corpus creation, low-resource code-mixed telugu-english

CoTexT: Multi-task Learning with Code-Text Transformer

205 - Association for Computational Linguistics 2021 Article

We present CoTexT, a pre-trained, transformer-based encoder-decoder model that learns the representative context between natural language (NL) and programming language (PL). Using self-supervision, CoTexT is pre-trained on large programming language corpora to learn a general understanding of language and code. CoTexT supports downstream NL-PL tasks such as code summarizing/documentation, code generation, defect detection, and code debugging. We train CoTexT on different combinations of available PL corpora including both "bimodal" and "unimodal" data. Here, bimodal data is the combination of text and corresponding code snippets, whereas unimodal data is merely code snippets. We first evaluate CoTexT with multi-task learning: we perform Code Summarization on 6 different programming languages and Code Refinement on both the small and medium sizes featured in the CodeXGLUE dataset. We further conduct extensive experiments to investigate CoTexT on other tasks within the CodeXGLUE dataset, including Code Generation and Defect Detection. We consistently achieve SOTA results in these tasks, demonstrating the versatility of our models.

code-text transformer, code

Time-Efficient Code Completion Model for the R Programming Language

476 - Association for Computational Linguistics 2021 Article

In this paper we present a deep learning code completion model for the R language. We introduce several techniques to utilize a language-modeling-based architecture in the code completion task. With these techniques, the model requires low resources, but still achieves high quality. We also present an evaluation dataset for the R language completion task. Our dataset contains multiple autocompletion usage contexts that provide robust validation results. The dataset is publicly available.

code completion model, time-efficient code completion, code completion

UoR at SemEval-2021 Task 4: Using Pre-trained BERT Token Embeddings for Question Answering of Abstract Meaning

181 - Association for Computational Linguistics 2021 Article

Most question answering tasks focus on predicting concrete answers, e.g., named entities. These tasks can normally be achieved by understanding the contexts without additional information required. In the Reading Comprehension of Abstract Meaning (ReCAM) task, abstract answers are introduced. To understand abstract meanings in the context, additional knowledge is essential. In this paper, we propose an approach that leverages the pre-trained BERT token embeddings as a prior knowledge resource. According to the results, our approach using the pre-trained BERT outperformed the baselines. It shows that the pre-trained BERT token embeddings can be used as additional knowledge for understanding abstract meanings in question answering.

pre-trained bert token, bert token embeddings

Quality Evaluation of the Low-Resource Synthetically Generated Code-Mixed Hinglish Text

391 - Association for Computational Linguistics 2021 Article

In this shared task, we ask the participating teams to investigate the factors influencing the quality of code-mixed text generation systems. We synthetically generate code-mixed Hinglish sentences using two distinct approaches and employ human annotators to rate the generation quality. We propose two subtasks on the synthetic Hinglish dataset: quality rating prediction and annotators' disagreement prediction. The proposed subtasks will put forward the reasoning and explanation of the factors influencing the quality and human perception of the code-mixed text.

low-resource synthetically generated, generated code-mixed hinglish, synthetically generated code-mixed

Entity at SemEval-2021 Task 5: Weakly Supervised Token Labelling for Toxic Spans Detection

432 - Association for Computational Linguistics 2021 Article

Detection of toxic spans - detecting toxicity of contents in the granularity of tokens - is crucial for effective moderation of online discussions. The baseline approach for this problem using the transformer model is to add a token classification head to the language model and fine-tune the layers with the token-labeled dataset. One of the limitations of such a baseline approach is the scarcity of labeled data. To improve the results, we studied leveraging existing public datasets for a related but different task of entire comment/sentence classification. We propose two approaches: the first approach fine-tunes transformer models that are pre-trained on sentence classification samples. In the second approach, we perform weak supervision with soft attention to learn token-level labels from sentence labels. Our experiments show improvements in the F1 score over the baseline approach. The implementation has been released publicly.

weakly supervised token, supervised token labelling

dhivya-hope-detection@LT-EDI-EACL2021: Multilingual Hope Speech Detection for Code-mixed and Transliterated Texts

232 - Association for Computational Linguistics 2021 Article

In this paper we work with a hope speech detection corpus that includes English, Tamil, and Malayalam datasets. We present a two-phase mechanism to detect hope speech. In the first phase, we build a classifier to identify the language of the text. In the second phase, we build a classifier to detect hope speech, non hope speech, or not lang labels. Experimental results show that hope speech detection is challenging and there is scope for improvement.

multilingual hope speech, code-mixed and transliterated
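
The two-phase mechanism in the abstract above can be sketched roughly as follows. The abstract does not say how the two classifiers are combined, so the wiring here is an assumption: phase 1 guesses the language, a mismatch with the dataset's expected language yields `not_lang`, and otherwise phase 2 decides between hope and non-hope speech. Both "classifiers" are toy rule-based stand-ins (Unicode script counting and a hypothetical cue-word list), not trained models, and script counting would not handle the transliterated text the paper targets.

```python
# Hypothetical cue words standing in for a trained hope speech classifier.
HOPE_CUES = {"hope", "support", "together", "believe"}


def identify_language(text):
    """Phase 1 (toy): guess the language from the dominant Unicode script."""
    counts = {"tamil": 0, "malayalam": 0, "english": 0}
    for ch in text:
        cp = ord(ch)
        if 0x0B80 <= cp <= 0x0BFF:      # Tamil Unicode block
            counts["tamil"] += 1
        elif 0x0D00 <= cp <= 0x0D7F:    # Malayalam Unicode block
            counts["malayalam"] += 1
        elif ch.isascii() and ch.isalpha():
            counts["english"] += 1
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "unknown"


def classify_hope(text):
    """Phase 2 (toy): label text by the presence of a cue word."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return "hope_speech" if words & HOPE_CUES else "non_hope_speech"


def two_phase_label(text, expected_language):
    """Combine both phases; the not_lang rule is an assumption of this sketch."""
    if identify_language(text) != expected_language:
        return "not_lang"
    return classify_hope(text)


print(two_phase_label("We believe things will get better together!", "english"))
print(two_phase_label("The bus was late again.", "english"))
print(two_phase_label("நம்பிக்கை", "english"))
```

Replacing `identify_language` and `classify_hope` with trained models would keep the same control flow while making the pipeline usable on real code-mixed data.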

SYMBOL IN THE CONSCIOUSNESS CURRENT STORIES: ANEESA ABBOUD AND GHADA ALSAMMAN

1227 - Al-Baath University 2018 Research paper

The plurality of symbolic forms in the stream-of-consciousness story, and their repetition, are the main concerns of this research, which seeks to follow the movement of thoughts and their turmoil deep inside the human mind.

symbol, consciousness, mind, current
