A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code

Added by: Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: approach for handling, simple approach, source code

There is an emerging interest in the application of natural language processing models to source code processing tasks. One of the major problems in applying deep learning to software engineering is that source code often contains a lot of rare identifiers, resulting in huge vocabularies. We propose a simple, yet effective method, based on identifier anonymization, to handle out-of-vocabulary (OOV) identifiers. Our method can be treated as a preprocessing step and, therefore, allows for easy implementation. We show that the proposed OOV anonymization method significantly improves the performance of the Transformer in two code processing tasks: code completion and bug fixing.
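
The abstract does not spell out how the anonymization is implemented, so the following is only a minimal sketch of the general idea, under stated assumptions: identifiers that are missing from a fixed vocabulary are replaced by numbered placeholder tokens, and the same identifier always maps to the same placeholder within one snippet. The function name `anonymize_oov`, the `VAR` prefix, and the toy vocabulary are hypothetical and not taken from the paper.

```python
import keyword


def anonymize_oov(tokens, vocab, prefix="VAR"):
    """Replace out-of-vocabulary identifiers with numbered placeholders.

    Hypothetical helper for illustration only; not the authors' code.
    Placeholders are numbered by first occurrence within the snippet.
    """
    mapping = {}
    anonymized = []
    for tok in tokens:
        # Treat a token as an identifier if it is a valid Python name
        # and not a reserved keyword (an assumption of this sketch).
        is_identifier = tok.isidentifier() and not keyword.iskeyword(tok)
        if is_identifier and tok not in vocab:
            if tok not in mapping:
                mapping[tok] = f"{prefix}{len(mapping) + 1}"
            anonymized.append(mapping[tok])
        else:
            anonymized.append(tok)
    return anonymized, mapping


# Toy example: `total_cnt` and `my_items` are rare, `len` is in the vocabulary.
tokens = ["total_cnt", "=", "len", "(", "my_items", ")", "+", "total_cnt"]
vocab = {"len", "=", "(", ")", "+"}
anonymized, mapping = anonymize_oov(tokens, vocab)
print(anonymized)
print(mapping)
```

Because the step only rewrites the token sequence, it can run before any model sees the data, which matches the abstract's description of the method as a preprocessing step. The returned mapping can be inverted to restore the original names in model output.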

References used

https://aclanthology.org/

A Deep Metric Learning Approach to Account Linking

429 - Association for Computational Linguistics, 2021 (article)

We consider the task of linking social media accounts that belong to the same author in an automated fashion on the basis of the content and meta-data of the corresponding document streams. We focus on learning an embedding that maps variable-sized samples of user activity, ranging from single posts to entire months of activity, to a vector space, where samples by the same author map to nearby points. Our approach does not require human-annotated data for training purposes, which allows us to leverage large amounts of social media content. The proposed model outperforms several competitive baselines under a novel evaluation framework modeled after established recognition benchmarks in other domains. Our method achieves high linking accuracy, even with small samples from accounts not seen at training time, a prerequisite for practical applications of the proposed linking framework.

Keywords: deep metric learning, deep metric, metric learning approach

CodeQA: A Question Answering Dataset for Source Code Comprehension

447 - Association for Computational Linguistics, 2021 (article)

We propose CodeQA, a free-form question answering dataset for the purpose of source code comprehension: given a code snippet and a question, a textual answer is required to be generated. CodeQA contains a Java dataset with 119,778 question-answer pairs and a Python dataset with 70,085 question-answer pairs. To obtain natural and faithful questions and answers, we implement syntactic rules and semantic analysis to transform code comments into question-answer pairs. We present the construction process and conduct systematic analysis of our dataset. Experiment results achieved by several neural baselines on our dataset are shown and discussed. While research on question-answering and machine reading comprehension develops rapidly, little prior work has drawn attention to code question answering. This new dataset can serve as a useful research benchmark for source code comprehension.

Keywords: source code, comprehension

A Simple and Efficient Multi-Task Learning Approach for Conditioned Dialogue Generation

477 - Association for Computational Linguistics, 2021 (article)

Conditioned dialogue generation suffers from the scarcity of labeled responses. In this work, we exploit labeled non-dialogue text data related to the condition, which are much easier to collect. We propose a multi-task learning approach to leverage both labeled dialogue and text data. The three tasks jointly optimize the same pre-trained Transformer: a conditioned dialogue generation task on the labeled dialogue data, and a conditioned language encoding task and a conditioned language generation task on the labeled text data. Experimental results show that our approach outperforms the state-of-the-art models by leveraging the labeled texts, and that it also obtains a larger performance improvement than previous methods that leverage text data.

Keywords: simple and efficient, conditioned dialogue generation, efficient multi-task learning

Fake News Detection for Portuguese with Deep Learning

344 - Association for Computational Linguistics, 2021 (article)

The exponential growth of the internet and social media in the past decade gave way to an increase in the dissemination of false or misleading information. Since the 2016 US presidential election, the term "fake news" has become increasingly popular and this phenomenon has received more attention. In the past years several fact-checking agencies were created, but due to the great number of daily posts on social media, manual checking is insufficient. Currently, there is a pressing need for automatic fake news detection tools, either to assist manual fact-checkers or to operate as standalone tools. There are several projects underway on this topic, but most of them focus on English. This research-in-progress paper discusses the employment of deep learning methods, and the development of a tool, for detecting false news in Portuguese. As a first step we shall compare well-established architectures that were tested in other languages and analyse their performance on our Portuguese data. Based on the preliminary results of these classifiers, we shall choose a deep learning model or combine several deep learning models which hold promise to enhance the performance of our fake news detection system.

Keywords: fake news detection

Adversarial Self-Supervised Learning for Out-of-Domain Detection

506 - Association for Computational Linguistics, 2021 (article)

Detecting out-of-domain (OOD) intents is crucial for the deployed task-oriented dialogue system. Previous unsupervised OOD detection methods only extract discriminative features of different in-domain intents while supervised counterparts can directly distinguish OOD and in-domain intents but require extensive labeled OOD data. To combine the benefits of both types, we propose a self-supervised contrastive learning framework to model discriminative semantic features of both in-domain intents and OOD intents from unlabeled data. Besides, we introduce an adversarial augmentation neural module to improve the efficiency and robustness of contrastive learning. Experiments on two public benchmark datasets show that our method can consistently outperform the baselines with a statistically significant margin.

Keywords: OOD, unsupervised OOD detection, intents
