CGEMs: A Metric Model for Automatic Code Generation using GPT-3

Published by Aishwarya N

Publication date: 2021

Research field: Informatics Engineering

Language: English

Author: Aishwarya Narasimhan, B M S College of Engineering

Artificial Intelligence

Today, AI technology is showing its strengths in almost every industry and walk of life. NLP is used widely, from text generation and text summarization to chatbots. One such paradigm is automatic code generation. An AI could generate anything; hence the output space is unconstrained. A self-driving car is driven for 100 million miles to validate its safety, but tests cannot be written to monitor and cover an unconstrained space. One solution for validating AI-generated content is to constrain the problem and convert it from abstract to realistic. This can be accomplished either by validating the unconstrained algorithm using theoretical proofs or by using Monte Carlo simulation methods. In this case, we use the latter approach to test and validate a statistically significant number of samples. Validating AI-generated code is the main motive of this work, and to determine whether AI-generated code is reliable, a metric model, CGEMs, is proposed. This is an extremely challenging task, as programs can have different logic and different naming conventions, but the metrics must capture the structure and logic of the program. This is similar to the importance grammar carries in AI-based text generation, Q&A, translation, etc. The metrics gathered in this work to support the evaluation of generated code are: compilation, NL-description-to-logic conversion, number of edits needed, some commonly used static-code metrics, and NLP metrics. These metrics are applied to 80 code samples generated using OpenAI's GPT-3. A neural network is then designed for binary classification (acceptable/not acceptable quality of the generated code). The inputs to this network are the feature values obtained from the metrics. The model achieves a classification accuracy of 76.92% and an F1 score of 55.56%. XAI is added for model interpretability.
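As a rough illustration of the kind of features and scores the abstract mentions, the sketch below computes a compilation check, an edit count, and accuracy/F1 for a binary classifier. It is not the paper's implementation: it assumes the generated code is Python, approximates "number of edits needed" by a line-level diff against a hand-corrected version, and the function names `compiles`, `edits_needed`, and `accuracy_and_f1` are hypothetical.

```python
import ast
import difflib


def compiles(source: str) -> bool:
    """Assumption: the generated code is Python, so 'compilation' means it parses."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def edits_needed(generated: str, corrected: str) -> int:
    """Count changed lines between generated code and a hand-corrected version.

    Assumption: a line-level diff is an acceptable proxy for 'number of edits'.
    """
    gen_lines = generated.splitlines()
    fix_lines = corrected.splitlines()
    matcher = difflib.SequenceMatcher(a=gen_lines, b=fix_lines, autojunk=False)
    edits = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A replaced, inserted, or deleted block counts once per affected line.
            edits += max(i2 - i1, j2 - j1)
    return edits


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 for binary labels (1 = acceptable, 0 = not acceptable)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1
```

Accuracy and F1 can diverge considerably when the two classes are imbalanced, which is one possible reading of the gap between the reported 76.92% accuracy and 55.56% F1.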

Martin Bauer, Harald Kostler, Ulrich Rude, 2020

Lattice Boltzmann methods are a popular mesoscopic alternative to macroscopic computational fluid dynamics solvers. Many variants have been developed that vary in complexity, accuracy, and computational cost. Extensions are available to simulate multi-phase, multi-component, turbulent, or non-Newtonian flows. In this work we present lbmpy, a code generation package that supports a wide variety of different methods and provides a generic development environment for new schemes as well. A high-level domain-specific language allows the user to formulate, extend and test various lattice Boltzmann schemes. The method specification is represented in a symbolic intermediate representation. Transformations that operate on this intermediate representation optimize and parallelize the method, yielding highly efficient lattice Boltzmann compute kernels not only for single- and two-relaxation-time schemes but also for multi-relaxation-time, cumulant, and entropically stabilized methods. An integration into the HPC framework waLBerla makes massively parallel, distributed simulations possible, which is demonstrated through scaling experiments on the SuperMUC-NG supercomputing system.

Mathematical Software; Computational Engineering, Finance, and Science; Distributed, Parallel, and Cluster Computing

Automatic Generation of Interpolants for Lattice Samplings: Part II -- Implementation and Code Generation

Joshua Horacsek, Usman Alim, 2021

In the prequel to this paper, we presented a systematic framework for processing spline spaces. In this paper, we take the results of that framework and provide a code generation pipeline that automatically generates efficient implementations of spline spaces. We decompose the final algorithm from Part I and translate the resulting components into LLVM-IR (a low-level language that can be compiled to various targets/architectures). Our design provides a handful of parameters for a practitioner to tune; this is one of the avenues that gives us the flexibility to target many different computational architectures and tune performance on those architectures. We also provide an evaluation of the effect of the different parameters on performance.

Mathematical Software

Improving Tree-Structured Decoder Training for Code Generation via Mutual Learning

Binbin Xie, Jinsong Su, Yubin Ge, 2021

Code generation aims to automatically generate a piece of code given an input natural language utterance. Currently, among dominant models, it is treated as a sequence-to-tree task, where a decoder outputs a sequence of actions corresponding to the pre-order traversal of an Abstract Syntax Tree. However, such a decoder only exploits the preceding actions of the pre-order traversal, which are insufficient to ensure correct action predictions. In this paper, we first thoroughly analyze the context modeling difference between neural code generation models with different traversal-based decodings (pre-order traversal vs. breadth-first traversal), and then propose a mutual learning framework to jointly train these models. Under this framework, we continuously enhance both models via mutual distillation, which involves synchronous executions of two one-to-one knowledge transfers at each training step. More specifically, we alternately choose one model as the student and the other as its teacher, and require the student to fit the training data and the action prediction distributions of its teacher. By doing so, both models can fully absorb the knowledge from each other and thus can be improved simultaneously. Experimental results and in-depth analysis on several benchmark datasets demonstrate the effectiveness of our approach. We release our code at https://github.com/DeepLearnXMU/CGML.
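The mutual distillation step described above can be sketched for a single action prediction as follows. This is a simplified illustration, not the released CGML code: it assumes both models score the same set of candidate actions for the same AST node, uses KL divergence from teacher to student as the distillation term, and the `weight` parameter and all function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, gold_index):
    """Negative log-likelihood of the gold action: the 'fit the training data' term."""
    return -math.log(probs[gold_index])


def kl_divergence(p, q):
    """KL(p || q): how far the student distribution q is from the teacher distribution p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def student_loss(student_logits, teacher_logits, gold_index, weight=1.0):
    """Loss for the model currently acting as student.

    Assumption: the total loss is cross-entropy plus a weighted KL term
    toward the teacher's action distribution.
    """
    student = softmax(student_logits)
    teacher = softmax(teacher_logits)
    return cross_entropy(student, gold_index) + weight * kl_divergence(teacher, student)


def mutual_losses(logits_a, logits_b, gold_index, weight=1.0):
    """Each model takes the student role once, with the other as its teacher."""
    loss_a = student_loss(logits_a, logits_b, gold_index, weight)
    loss_b = student_loss(logits_b, logits_a, gold_index, weight)
    return loss_a, loss_b
```

When the two models already agree, the KL term vanishes and each loss reduces to plain cross-entropy; disagreement adds a penalty that pulls the student toward its teacher.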

Artificial Intelligence

Fine-tuning GPT-3 for Russian Text Summarization

Alexandr Nikolich, Arina Puchkova, 2021

Automatic summarization techniques aim to shorten and generalize information given in the text while preserving its core message and the most relevant ideas. This task can be approached with a variety of methods; however, not many attempts have been made to produce solutions specifically for the Russian language, despite existing localizations of state-of-the-art models. In this paper, we aim to showcase ruGPT3's ability to summarize texts, fine-tuning it on corpora of Russian news with their corresponding human-generated summaries. Additionally, we employ hyperparameter tuning so that the model's output becomes less random and more tied to the original text. We evaluate the resulting texts with a set of metrics, showing that our solution can surpass the performance of state-of-the-art models without additional changes in architecture or loss function. Despite being able to produce sensible summaries, our model still suffers from a number of flaws: it is prone to altering named entities present in the original text (such as surnames, places, and dates), deviating from facts stated in the given document, and repeating information in the summary.

Computation and Language

Automatic Text Summarization of COVID-19 Medical Research Articles using BERT and GPT-2

Virapat Kieuvongngam, Bowen Tan, Yiming Niu, 2020

With the COVID-19 pandemic, there is a growing urgency for the medical community to keep up with the accelerating growth in new coronavirus-related literature. As a result, the COVID-19 Open Research Dataset Challenge has released a corpus of scholarly articles and is calling for machine learning approaches to help bridge the gap between researchers and the rapidly growing body of publications. Here, we take advantage of recent advances in pre-trained NLP models, BERT and OpenAI GPT-2, to address this challenge by performing text summarization on this dataset. We evaluate the results using ROUGE scores and visual inspection. Our model provides abstractive and comprehensive information based on keywords extracted from the original articles. Our work can help the medical community by providing succinct summaries of articles for which abstracts are not already available.
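For readers unfamiliar with ROUGE, the sketch below computes a simplified ROUGE-1 score (unigram overlap) between a candidate summary and a reference. It is only an illustration of the idea: it assumes lowercased whitespace tokenization with no stemming, and the function name `rouge1` is hypothetical rather than taken from the paper or from any ROUGE package.

```python
from collections import Counter


def rouge1(candidate: str, reference: str):
    """Return (precision, recall, f1) based on clipped unigram overlap.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand_counts & ref_counts).values())
    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

A short summary that copies only words from the reference gets perfect precision but low recall, which is why ROUGE is usually reported together with some form of human or visual inspection, as the abstract does.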

Computation and Language; Machine Learning