Run, Forest, Run? On Randomization and Reproducibility in Predictive Software Engineering

121 0 0.0 ( 0 )

Download Cite

Added by Cynthia Liem

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Cynthia C. S. Liem - Annibale Panichella

Software Engineering

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Machine learning (ML) has been widely used in the literature to automate software engineering tasks. However, ML outcomes may be sensitive to randomization in data sampling mechanisms and learning procedures. To understand whether and how researchers in SE address these threats, we surveyed 45 recent papers related to three predictive tasks: defect prediction (DP), predictive mutation testing (PMT), and code smell detection (CSD). We found that less than 50% of the surveyed papers address the threats related to randomized data sampling (via multiple repetitions); only 8% of the papers address the random nature of ML; and parameter values are rarely reported (only 18% of the papers). To assess the severity of these threats, we conducted an empirical study using 26 real-world datasets commonly considered for the three predictive tasks of interest, considering eight common supervised ML classifiers. We show that different data resamplings for 10-fold cross-validation lead to extreme variability in observed performance results. Furthermore, randomized ML methods also show non-negligible variability for different choices of random seeds. More worryingly, performance and variability are inconsistent for different implementations of the conceptually same ML method in different libraries, as also shown through multi-dataset pairwise comparison. To cope with these critical threats, we provide practical guidelines on how to validate, assess, and report the results of predictive methods.

rate research

On the Replicability and Reproducibility of Deep Learning in Software Engineering

86 - Chao Liu , Cuiyun Gao , Xin Xia 2020

Deep learning (DL) techniques have gained significant popularity among software engineering (SE) researchers in recent years. This is because they can often solve many SE challenges without enormous manual feature engineering effort and complex domain knowledge. Although many DL studies have reported substantial advantages over other state-of-the-art models on effectiveness, they often ignore two factors: (1) replicability - whether the reported experimental result can be approximately reproduced in high probability with the same DL model and the same data; and (2) reproducibility - whether one reported experimental findings can be reproduced by new experiments with the same experimental protocol and DL model, but different sampled real-world data. Unlike traditional machine learning (ML) models, DL studies commonly overlook these two factors and declare them as minor threats or leave them for future work. This is mainly due to high model complexity with many manually set parameters and the time-consuming optimization process. In this study, we conducted a literature review on 93 DL studies recently published in twenty SE journals or conferences. Our statistics show the urgency of investigating these two factors in SE. Moreover, we re-ran four representative DL models in SE. Experimental results show the importance of replicability and reproducibility, where the reported performance of a DL model could not be replicated for an unstable optimization process. Reproducibility could be substantially compromised if the model training is not convergent, or if performance is sensitive to the size of vocabulary and testing data. It is therefore urgent for the SE community to provide a long-lasting link to a replication package, enhance DL-based solution stability and convergence, and avoid performance sensitivity on different sampled data.

Software Engineering Machine Learning

ARTENOLIS: Automated Reproducibility and Testing Environment for Licensed Software

68 - Laurent Heirendt , Sylvain Arreckx , Christophe Trefois 2017

Motivation: Automatically testing changes to code is an essential feature of continuous integration. For open-source code, without licensed dependencies, a variety of continuous integration services exist. The COnstraint-Based Reconstruction and Analysis (COBRA) Toolbox is a suite of open-source code for computational modelling with dependencies on licensed software. A novel automated framework of continuous integration in a semi-licensed environment is required for the development of the COBRA Toolbox and related tools of the COBRA community. Results: ARTENOLIS is a general-purpose infrastructure software application that implements continuous integration for open-source software with licensed dependencies. It uses a master-slave framework, tests code on multiple operating systems, and multip

Software Engineering

Quantum Software Engineering: Landscapes and Horizons

97 - Jianjun Zhao 2020

Quantum software plays a critical role in exploiting the full potential of quantum computing systems. As a result, it is drawing increasing attention recently. This paper defines the term quantum software engineering and introduces a quantum software life cycle. Based on these, the paper provides a comprehensive survey of the current state of the art in the field and presents the challenges and opportunities that we face. The survey summarizes the technology available in the various phases of the quantum software life cycle, including quantum software requirements analysis, design, implementation, test, and maintenance. It also covers the crucial issue of quantum software reuse.

Software Engineering Programming Languages Quantum Physics

Towards Modal Software Engineering

519 - Ramy Shahin 2021

In this paper we introduce the notion of Modal Software Engineering: automatically turning sequential, deterministic programs into semantically equivalent programs efficiently operating on inputs coming from multiple overlapping worlds. We are drawing an analogy between modal logics, and software application domains where multiple sets of inputs (multiple worlds) need to be processed efficiently. Typically those sets highly overlap, so processing them independently would involve a lot of redundancy, resulting in lower performance, and in many cases intractability. Three application domains are presented: reasoning about feature-based variability of Software Product Lines (SPLs), probabilistic programming, and approximate programming.

Software Engineering Programming Languages

A Survey on Deep Learning for Software Engineering

120 - Yanming Yang , Xin Xia , David Lo 2020

In 2006, Geoffrey Hinton proposed the concept of training Deep Neural Networks (DNNs) and an improved model training method to break the bottleneck of neural network development. More recently, the introduction of AlphaGo in 2016 demonstrated the powerful learning ability of deep learning and its enormous potential. Deep learning has been increasingly used to develop state-of-the-art software engineering (SE) research tools due to its ability to boost performance for various SE tasks. There are many factors, e.g., deep learning model selection, internal structure differences, and model optimization techniques, that may have an impact on the performance of DNNs applied in SE. Few works to date focus on summarizing, classifying, and analyzing the application of deep learning techniques in SE. To fill this gap, we performed a survey to analyse the relevant studies published since 2006. We first provide an example to illustrate how deep learning techniques are used in SE. We then summarize and classify different deep learning techniques used in SE. We analyzed key optimization technologies used in these deep learning models, and finally describe a range of key research topics using DNNs in SE. Based on our findings, we present a set of current challenges remaining to be investigated and outline a proposed research road map highlighting key opportunities for future work.

Software Engineering