New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Long-Range Modeling of Source Code Files with eWASH: Extended Window Access by Syntax Hierarchy

نمذجة طويلة المدى لملفات التعليمات البرمجية المصدر مع Ewash: الوصول الموسع نافذة الوصول عن طريق التسلسل الهرمي بناء الجملة

408 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

extended window access syntax hierarchy extended window إمكانية الوصول إلى النافذة الممتدة بناء جملة الهرمية نافذة ممتدة صناعة حمض الفوسفور

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

عثرت نمذجة اللغة الإحصائية والترجمة مع المحولات العديد من التطبيقات الناجحة في فهم البرنامج ومهام الجيل، وتحديد معايير عالية للأدوات في بيئات تطوير البرمجيات الحديثة. ومع ذلك، فإن نافذة السياق المحدودة لهذه النماذج العصبية تعني أنهم لن يكونوا غير قادرين على الاستفادة من السياق الكامل بأكمله من الملفات والحزم الكبيرة لأي مهمة معينة. في حين أن هناك العديد من الجهود المبذولة لتوسيع نافذة السياق، فإننا نقدم نهجا مستقلا بالهندسة المعمارية للاستفادة من التسلسلات الهيدروجسية النحوية من التعليمات البرمجية المصدرية لإدماج سياق كامل مستوى الملف في نافذة ذات طول ثابت. باستخدام أشجار بناء جملة الخرسانة من كل ملف مصدر نستخرج التسلسلات الهرمية النحوية ودمجها في نافذة السياق عن طريق إزالة بشكل انتقائي من عرض نطاقات أكثر تحديدا وأقل أهمية لمهمة معينة. نقوم بتقييم هذا النهج على مهام توليد التعليمات البرمجية والترجمة المشتركة للغة الطبيعية ومزدئة المصدر في لغة البرمجة الثابتة، وتحقيق حالة جديدة من بين الفن في إكمال التعليمات البرمجية وتلخيص Python في معيار Codexglue. نقدم أيضا معايير CodexGlue جديدة للمهام الدوافع المتعلقة بتجربة المستخدمين: إكمال التعليمات البرمجية مع الحرفيات الطبيعية، طريقة إتمام الأسلوب / تلخيص / رمز رمز مكيف في سياق مستوى الملفات.

Statistical language modeling and translation with transformers have found many successful applications in program understanding and generation tasks, setting high benchmarks for tools in modern software development environments. The finite context window of these neural models means, however, that they will be unable to leverage the entire relevant context of large files and packages for any given task. While there are many efforts to extend the context window, we introduce an architecture-independent approach for leveraging the syntactic hierarchies of source code for incorporating entire file-level context into a fixed-length window. Using concrete syntax trees of each source file we extract syntactic hierarchies and integrate them into context window by selectively removing from view more specific, less relevant scopes for a given task. We evaluate this approach on code generation tasks and joint translation of natural language and source code in Python programming language, achieving a new state-of-the-art in code completion and summarization for Python in the CodeXGLUE benchmark. We also introduce new CodeXGLUE benchmarks for user-experience-motivated tasks: code completion with normalized literals, method body completion/code summarization conditioned on file-level context.

References used

https://aclanthology.org/

rate research

CAST: Enhancing Code Summarization with Hierarchical Splitting and Reconstruction of Abstract Syntax Trees

206 - Association for Computation Linguistics 2021 مقالة

Code summarization aims to generate concise natural language descriptions of source code, which can help improve program comprehension and maintenance. Recent studies show that syntactic and structural information extracted from abstract syntax trees (ASTs) is conducive to summary generation. However, existing approaches fail to fully capture the rich information in ASTs because of the large size/depth of ASTs. In this paper, we propose a novel model CAST that hierarchically splits and reconstructs ASTs. First, we hierarchically split a large AST into a set of subtrees and utilize a recursive neural network to encode the subtrees. Then, we aggregate the embeddings of subtrees by reconstructing the split ASTs to get the representation of the complete AST. Finally, AST representation, together with source code embedding obtained by a vanilla code token encoder, is used for code summarization. Extensive experiments, including the ablation study and the human evaluation, on benchmarks have demonstrated the power of CAST. To facilitate reproducibility, our code and data are available at https://github.com/DeepSoftwareAnalytics/CAST.

hierarchical splitting splitting and reconstruction abstract syntax trees تقسيم هرمي تقسيم وإعادة الإعمار أشجار بناء الجملة مجردة صناعة حمض الفوسفور المزيد..

Access Time Reduction To Image Files Using Enhanced B+ Tree Indexing Method

1500 - Tishreen University 2015 ورقة بحثية

The Research suggests a novel model aims to reduce the time of search for image files by proposing a new indexing mechanism to avoid the plague algorithm used with indexing so that the access time to these files becomes as less as possible. The fi rst stage in this paper is to clarify the importance of archiving in organizing files via designing a database, storing images in it and recording the times needed to obtain the required files from the database. Then the indexing process for image files stored in the database is applied by proposing a new algorithm -B+ Tree enhanced- for organizing image files according to a certain mechanism to facilitate accessing any file, conducting queries and recording the times used to get those files from the database to compare them with the times required to access files before indexing in order to show the efficiency of the proposed method.

قواعد البيانات Database الفهرسة الشجرية الفهرسة الاستعلام Tree indexing Indexing query B+ Tree المزيد..

Do Long-Range Language Models Actually Use Long-Range Context?

333 - Association for Computation Linguistics 2021 مقالة

Language models are generally trained on short, truncated input sequences, which limits their ability to use discourse-level information present in long-range context to improve their predictions. Recent efforts to improve the efficiency of self-atte ntion have led to a proliferation of long-range Transformer language models, which can process much longer sequences than models of the past. However, the ways in which such models take advantage of the long-range context remain unclear. In this paper, we perform a fine-grained analysis of two long-range Transformer language models (including the Routing Transformer, which achieves state-of-the-art perplexity on the PG-19 long-sequence LM benchmark dataset) that accept input sequences of up to 8K tokens. Our results reveal that providing long-range context (i.e., beyond the previous 2K tokens) to these models only improves their predictions on a small set of tokens (e.g., those that can be copied from the distant context) and does not help at all for sentence-level prediction tasks. Finally, we discover that PG-19 contains a variety of different document types and domains, and that long-range context helps most for literary novels (as opposed to textbooks or magazines).

اشرح الجملة long-range transformer language لغة محول طويلة المدى صناعة حمض الفوسفور

CoTexT: Multi-task Learning with Code-Text Transformer

292 - Association for Computation Linguistics 2021 مقالة

We present CoTexT, a pre-trained, transformer-based encoder-decoder model that learns the representative context between natural language (NL) and programming language (PL). Using self-supervision, CoTexT is pre-trained on large programming language corpora to learn a general understanding of language and code. CoTexT supports downstream NL-PL tasks such as code summarizing/documentation, code generation, defect detection, and code debugging. We train CoTexT on different combinations of available PL corpus including both bimodal'' and unimodal'' data. Here, bimodal data is the combination of text and corresponding code snippets, whereas unimodal data is merely code snippets. We first evaluate CoTexT with multi-task learning: we perform Code Summarization on 6 different programming languages and Code Refinement on both small and medium size featured in the CodeXGLUE dataset. We further conduct extensive experiments to investigate CoTexT on other tasks within the CodeXGlue dataset, including Code Generation and Defect Detection. We consistently achieve SOTA results in these tasks, demonstrating the versatility of our models.

code-text transformer code محول النص رمز الشفرة صناعة حمض الفوسفور

Retrieval Augmented Code Generation and Summarization

628 - Association for Computation Linguistics 2021 مقالة

Software developers write a lot of source code and documentation during software development. Intrinsically, developers often recall parts of source code or code summaries that they had written in the past while implementing software or documenting t hem. To mimic developers' code or summary generation behavior, we propose a retrieval augmented framework, REDCODER, that retrieves relevant code or summaries from a retrieval database and provides them as a supplement to code generation or summarization models. REDCODER has a couple of uniqueness. First, it extends the state-of-the-art dense retrieval technique to search for relevant code or summaries. Second, it can work with retrieval databases that include unimodal (only code or natural language description) or bimodal instances (code-description pairs). We conduct experiments and extensive analysis on two benchmark datasets of code generation and summarization in Java and Python, and the promising results endorse the effectiveness of our proposed retrieval augmented framework.

تحسين المنطق العددي code generation augmented code generation رمز الجيل توليد رمز المعزز صناعة حمض الفوسفور

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Long-Range Modeling of Source Code Files with eWASH: Extended Window Access by Syntax Hierarchy

نمذجة طويلة المدى لملفات التعليمات البرمجية المصدر مع Ewash: الوصول الموسع نافذة الوصول عن طريق التسلسل الهرمي بناء الجملة

Ask ChatGPT about the research

Read More

suggested questions