Applying Bayesian Analysis Guidelines to Empirical Software Engineering Data: The Case of Programming Languages and Code Quality

84 0 0.0 ( 0 )

Download Cite

Added by Carlo A. Furia

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Carlo A. Furia - Richard Torkar - Robert Feldt

Software Engineering

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Statistical analysis is the tool of choice to turn data into information, and then information into empirical knowledge. To be valid, the process that goes from data to knowledge should be supported by detailed, rigorous guidelines, which help ferret out issues with the data or model, and lead to qualified results that strike a reasonable balance between generality and practical relevance. Such guidelines are being developed by statisticians to support the latest techniques for Bayesian data analysis. In this article, we frame these guidelines in a way that is apt to empirical research in software engineering. To demonstrate the guidelines in practice, we apply them to reanalyze a GitHub dataset about code quality in different programming languages. The datasets original analysis (Ray et al., 2014) and a critical reanalysis (Berger at al., 2019) have attracted considerable attention -- in no small part because they target a topic (the impact of different programming languages) on which strong opinions abound. The goals of our reanalysis are largely orthogonal to this previous work, as we are concerned with demonstrating, on data in an interesting domain, how to build a principled Bayesian data analysis and to showcase some of its benefits. In the process, we will also shed light on some critical aspects of the analyzed data and of the relationship between programming languages and code quality. The high-level conclusions of our exercise will be that Bayesian statistical techniques can be applied to analyze software engineering data in a way that is principled, flexible, and leads to convincing results that inform the state of the art while highlighting the boundaries of its validity. The guidelines can support building solid statistical analyses and connecting their results, and hence help buttress continued progress in empirical software engineering research.

rate research

Bayesian Data Analysis in Empirical Software Engineering Research

205 - Carlo A. Furia , Robert Feldt , Richard Torkar 2018

Statistics comes in two main flavors: frequentist and Bayesian. For historical and technical reasons, frequentist statistics have traditionally dominated empirical data analysis, and certainly remain prevalent in empirical software engineering. This situation is unfortunate because frequentist statistics suffer from a number of shortcomings---such as lack of flexibility and results that are unintuitive and hard to interpret---that curtail their effectiveness when dealing with the heterogeneous data that is increasingly available for empirical analysis of software engineering practice. In this paper, we pinpoint these shortcomings, and present Bayesian data analysis techniques that provide tangible benefits---as they can provide clearer results that are simultaneously robust and nuanced. After a short, high-level introduction to the basic tools of Bayesian statistics, we present the reanalysis of two empirical studies on the effectiveness of automatically generated tests and the performance of programming languages. By contrasting the original frequentist analyses with our new Bayesian analyses, we demonstrate the concrete advantages of the latter. To conclude we advocate a more prominent role for Bayesian statistical techniques in empirical software engineering research and practice.

Software Engineering Methodology

Data Quality in Empirical Software Engineering: A Targeted Review

94 - Michael Franklin Bosu , Stephen G. MacDonell 2021

Context: The utility of prediction models in empirical software engineering (ESE) is heavily reliant on the quality of the data used in building those models. Several data quality challenges such as noise, incompleteness, outliers and duplicate data points may be relevant in this regard. Objective: We investigate the reporting of three potentially influential elements of data quality in ESE studies: data collection, data pre-processing, and the identification of data quality issues. This enables us to establish how researchers view the topic of data quality and the mechanisms that are being used to address it. Greater awareness of data quality should inform both the sound conduct of ESE research and the robust practice of ESE data collection and processing. Method: We performed a targeted literature review of empirical software engineering studies covering the period January 2007 to September 2012. A total of 221 relevant studies met our inclusion criteria and were characterized in terms of their consideration and treatment of data quality. Results: We obtained useful insights as to how the ESE community considers these three elements of data quality. Only 23 of these 221 studies reported on all three elements of data quality considered in this paper. Conclusion: The reporting of data collection procedures is not documented consistently in ESE studies. It will be useful if data collection challenges are reported in order to improve our understanding of why there are problems with software engineering data sets and the models developed from them. More generally, data quality should be given far greater attention by the community. The improvement of data sets through enhanced data collection, pre-processing and quality assessment should lead to more reliable prediction models, thus improving the practice of software engineering.

Software Engineering

A Taxonomy of Data Quality Challenges in Empirical Software Engineering

187 - Michael Franklin Bosu , Stephen G. MacDonell 2021

Reliable empirical models such as those used in software effort estimation or defect prediction are inherently dependent on the data from which they are built. As demands for process and product improvement continue to grow, the quality of the data used in measurement and prediction systems warrants increasingly close scrutiny. In this paper we propose a taxonomy of data quality challenges in empirical software engineering, based on an extensive review of prior research. We consider current assessment techniques for each quality issue and proposed mechanisms to address these issues, where available. Our taxonomy classifies data quality issues into three broad areas: first, characteristics of data that mean they are not fit for modeling; second, data set characteristics that lead to concerns about the suitability of applying a given model to another data set; and third, factors that prevent or limit data accessibility and trust. We identify this latter area as of particular need in terms of further research.

Software Engineering

Qualitative software engineering research -- reflections and guidelines

93 - Per Lenberg , Robert Feldt , Lucas Gren 2017

Researchers are increasingly recognizing the importance of human aspects in software development and since qualitative methods are used to, in-depth, explore human behavior, we believe that studies using such techniques will become more common. Existing qualitative software engineering guidelines do not cover the full breadth of qualitative methods and knowledge on using them found in the social sciences. The aim of this study was thus to extend the software engineering research communitys current body of knowledge regarding available qualitative methods and provide recommendations and guidelines for their use. With the support of an epistemological argument and a literature review, we suggest that future research would benefit from (1) utilizing a broader set of research methods, (2) more strongly emphasizing reflexivity, and (3) employing qualitative guidelines and quality criteria. We present an overview of three qualitative methods commonly used in social sciences but rarely seen in software engineering research, namely interpretative phenomenological analysis, narrative analysis, and discourse analysis. Furthermore, we discuss the meaning of reflexivity in relation to the software engineering context and suggest means of fostering it. Our paper will help software engineering researchers better select and then guide the application of a broader set of qualitative research methods.

Software Engineering Computers and Society

Bayesian Statistics in Software Engineering: Practical Guide and Case Studies

139 - Carlo A. Furia 2016

Statistics comes in two main flavors: frequentist and Bayesian. For historical and technical reasons, frequentist statistics has dominated data analysis in the past; but Bayesian statistics is making a comeback at the forefront of science. In this paper, we give a practical overview of Bayesian statistics and illustrate its main advantages over frequentist statistics for the kinds of analyses that are common in empirical software engineering, where frequentist statistics still is standard. We also apply Bayesian statistics to empirical data from previous research investigating agile vs. structured development processes, the performance of programming languages, and random testing of object-oriented programs. In addition to being case studies demonstrating how Bayesian analysis can be applied in practice, they provide insights beyond the results in the original publications (which used frequentist statistics), thus showing the practical value brought by Bayesian statistics.

Software Engineering