
ARTENOLIS: Automated Reproducibility and Testing Environment for Licensed Software

Publication date: 2017
Language: English





Motivation: Automatically testing changes to code is an essential feature of continuous integration. For open-source code, without licensed dependencies, a variety of continuous integration services exist. The COnstraint-Based Reconstruction and Analysis (COBRA) Toolbox is a suite of open-source code for computational modelling with dependencies on licensed software. A novel automated framework of continuous integration in a semi-licensed environment is required for the development of the COBRA Toolbox and related tools of the COBRA community.

Results: ARTENOLIS is a general-purpose infrastructure software application that implements continuous integration for open-source software with licensed dependencies. It uses a master-slave framework, tests code on multiple operating systems, and multip
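To make the multi-environment testing idea concrete, the following is a minimal, hypothetical Python sketch of what a build agent in a master-slave continuous-integration setup might run: it sweeps over locally installed versions of a licensed interpreter and reports a failure to the CI master if any combination fails. The MATLAB installation paths, version names, and test command are illustrative assumptions, not taken from ARTENOLIS itself.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of a CI build-agent script: run a licensed
interpreter's test suite once per installed version and fail the build
if any combination fails. Paths, versions, and the MATLAB test command
are illustrative assumptions, not taken from ARTENOLIS itself."""

import subprocess
import sys
from pathlib import Path

# Assumed locations of licensed MATLAB installations on this build agent.
MATLAB_HOMES = {
    "R2016b": Path("/usr/local/MATLAB/R2016b"),
    "R2017a": Path("/usr/local/MATLAB/R2017a"),
}

# Assumed test entry point: run the unit tests in ./test and return a
# non-zero exit status if any test failed.
TEST_COMMAND = "results = runtests('test'); exit(any([results.Failed]));"


def run_matrix() -> int:
    failures = 0
    for version, home in MATLAB_HOMES.items():
        matlab = home / "bin" / "matlab"
        if not matlab.exists():
            print(f"[skip] {version}: not installed on this agent")
            continue
        print(f"[run ] {version}: launching test suite")
        try:
            result = subprocess.run(
                [str(matlab), "-nodisplay", "-nosplash", "-r", TEST_COMMAND],
                timeout=3600,
            )
            failed = result.returncode != 0
        except subprocess.TimeoutExpired:
            failed = True
        print(f"[done] {version}: {'FAILED' if failed else 'ok'}")
        failures += failed
    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(run_matrix())
```

In a master-slave layout, the CI master would trigger this script on agents running different operating systems, so the same test suite is exercised across the whole matrix of platforms and interpreter versions.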



Related research


Machine learning (ML) has been widely used in the literature to automate software engineering tasks. However, ML outcomes may be sensitive to randomization in data sampling mechanisms and learning procedures. To understand whether and how researchers in SE address these threats, we surveyed 45 recent papers related to three predictive tasks: defect prediction (DP), predictive mutation testing (PMT), and code smell detection (CSD). We found that less than 50% of the surveyed papers address the threats related to randomized data sampling (via multiple repetitions); only 8% of the papers address the random nature of ML; and parameter values are rarely reported (only 18% of the papers). To assess the severity of these threats, we conducted an empirical study using 26 real-world datasets commonly considered for the three predictive tasks of interest, considering eight common supervised ML classifiers. We show that different data resamplings for 10-fold cross-validation lead to extreme variability in observed performance results. Furthermore, randomized ML methods also show non-negligible variability for different choices of random seeds. More worryingly, performance and variability are inconsistent for different implementations of the conceptually same ML method in different libraries, as also shown through multi-dataset pairwise comparison. To cope with these critical threats, we provide practical guidelines on how to validate, assess, and report the results of predictive methods.
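To illustrate the kind of variability the study measures, the sketch below repeats 10-fold cross-validation under different random seeds and reports how much the mean score moves. It uses scikit-learn with a stand-in dataset and classifier; it is not the paper's replication package, and the dataset, metric, and number of repetitions are assumptions for illustration only.

```python
"""Illustrative sketch: quantify how much cross-validated performance moves
when only the random seeds of data resampling and the learner change."""

import numpy as np
from sklearn.datasets import load_breast_cancer          # stand-in dataset
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

scores_per_seed = []
for seed in range(30):  # 30 repetitions of 10-fold CV with different resamplings
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    scores = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
    scores_per_seed.append(scores.mean())

scores_per_seed = np.array(scores_per_seed)
print(f"mean AUC over seeds : {scores_per_seed.mean():.4f}")
print(f"std  AUC over seeds : {scores_per_seed.std():.4f}")
print(f"range (max - min)   : {scores_per_seed.max() - scores_per_seed.min():.4f}")
```

Reporting the spread over repetitions, rather than a single fold split, is exactly the kind of practice the surveyed guidelines recommend.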
Chao Liu, Cuiyun Gao, Xin Xia (2020)
Deep learning (DL) techniques have gained significant popularity among software engineering (SE) researchers in recent years. This is because they can often solve many SE challenges without enormous manual feature engineering effort and complex domain knowledge. Although many DL studies have reported substantial advantages over other state-of-the-art models on effectiveness, they often ignore two factors: (1) replicability - whether the reported experimental result can be approximately reproduced with high probability with the same DL model and the same data; and (2) reproducibility - whether reported experimental findings can be reproduced by new experiments with the same experimental protocol and DL model, but different sampled real-world data. Unlike traditional machine learning (ML) models, DL studies commonly overlook these two factors and declare them as minor threats or leave them for future work. This is mainly due to high model complexity with many manually set parameters and the time-consuming optimization process. In this study, we conducted a literature review on 93 DL studies recently published in twenty SE journals or conferences. Our statistics show the urgency of investigating these two factors in SE. Moreover, we re-ran four representative DL models in SE. Experimental results show the importance of replicability and reproducibility, where the reported performance of a DL model could not be replicated because of an unstable optimization process. Reproducibility could be substantially compromised if the model training is not convergent, or if performance is sensitive to the size of vocabulary and testing data. It is therefore urgent for the SE community to provide a long-lasting link to a replication package, enhance DL-based solution stability and convergence, and avoid performance sensitivity on different sampled data.
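As a concrete illustration of the replicability concern, the sketch below shows the kind of seed pinning a replication package can document so that a DL experiment is at least repeatable on the same data. It assumes a PyTorch setup; the model, optimizer, and hyperparameters are placeholders, not any surveyed study's code.

```python
"""Illustrative sketch: pin the common sources of nondeterminism before a
DL training run. Model and hyperparameters are placeholders."""

import os
import random

import numpy as np
import torch


def pin_randomness(seed: int = 42) -> None:
    """Fix every common source of nondeterminism before training starts."""
    random.seed(seed)                      # Python's own RNG
    np.random.seed(seed)                   # NumPy sampling (e.g. data shuffling)
    torch.manual_seed(seed)                # CPU weight initialisation
    torch.cuda.manual_seed_all(seed)       # GPU weight initialisation
    torch.backends.cudnn.deterministic = True   # deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False      # no run-dependent autotuned kernels
    os.environ["PYTHONHASHSEED"] = str(seed)    # stable hashing, e.g. of vocab items


pin_randomness(42)

# Placeholder model: replicability also requires reporting all such settings
# (layer sizes, optimizer, learning rate, vocabulary size) in the package.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2)
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```

Seed pinning addresses replicability on the same data; reproducibility across differently sampled data additionally requires reporting performance over several independent samples, as the abstract argues.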
We describe preliminary investigations of using Docker for the deployment and testing of astronomy software. Docker is a relatively new containerisation technology that is developing rapidly and being adopted across a range of domains. It is based upon virtualization at operating system level, which presents many advantages in comparison to the more traditional hardware virtualization that underpins most cloud computing infrastructure today. A particular strength of Docker is its simple format for describing and managing software containers, which has benefits for software developers, system administrators and end users. We report on our experiences from two projects -- a simple activity to demonstrate how Docker works, and a more elaborate set of services that demonstrates more of its capabilities and what they can achieve within an astronomical context -- and include an account of how we solved problems through interaction with Docker's very active open-source development community, which is currently the key to the most effective use of this rapidly changing technology.
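As a rough illustration of how such containers are scripted, the sketch below uses the Docker SDK for Python to build and run a hypothetical analysis image. The image name, build context, paths, and command are assumptions for illustration; they are not the services described in the abstract.

```python
"""Illustrative sketch: build and run a containerised analysis step with the
Docker SDK for Python. Image name, paths, and command are hypothetical."""

import docker

client = docker.from_env()  # talks to the local Docker daemon

# Build an image from a Dockerfile in ./pipeline (hypothetical build context).
image, build_logs = client.images.build(path="./pipeline", tag="astro-pipeline:latest")
for entry in build_logs:
    if "stream" in entry:
        print(entry["stream"], end="")

# Run the containerised tool: the same image behaves identically on any host
# with a Docker engine, which is the portability benefit described above.
output = client.containers.run(
    "astro-pipeline:latest",
    command="python reduce.py --input /data/raw --output /data/reduced",
    volumes={"/srv/astro-data": {"bind": "/data", "mode": "rw"}},
    remove=True,  # clean up the container after it exits
)
print(output.decode())
```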
C. A. Middelburg (2010)
This note concerns a search for publications in which the pragmatic concept of a test as conducted in the practice of software testing is formalized, a theory about software testing based on such a formalization is presented, or it is demonstrated on the basis of such a theory that there are solid grounds to test software in cases where in principle other forms of analysis could be used. This note reports on the way in which the search has been carried out and the main outcomes of the search. The message of the note is that the fundamentals of software testing are not yet complete in some respects.
When it comes to industrial organizations, current collaboration efforts in software engineering research are very often kept in-house, depriving these organizations of the skills necessary to build independent collaborative research. The current trend towards empirical software engineering research requires certain standards to be established to guide these collaborative efforts in creating a strong partnership that promotes independent, evidence-based software engineering research. This paper examines key enabling factors for an efficient and effective industry-academia collaboration in the software testing domain. A major finding of the research was that while technology is a strong enabler of better collaboration, it must be complemented with industrial openness to disclose research results and the use of a dedicated tooling platform. We use as an example an automated test generation approach that has been developed over the last two years collaboratively with Bombardier Transportation AB in Sweden.