No Arabic abstract
Representative sampling appears rare in empirical software engineering research. Not all studies need representative samples, but a general lack of representative sampling undermines a scientific field. This article therefore reports a systematic review of the state of sampling in recent, high-quality software engineering research. The key findings are: (1) random sampling is rare; (2) sophisticated sampling strategies are very rare; (3) sampling, representativeness and randomness often appear misunderstood. These findings suggest that textit{software engineering research has a generalizability crisis}. To address these problems, this paper synthesizes existing knowledge of sampling into a succinct primer and proposes extensive guidelines for improving the conduct, presentation and evaluation of sampling in software engineering research. It is further recommended that while researchers should strive for more representative samples, disparaging non-probability sampling is generally capricious and particularly misguided for predominately qualitative research.
Researchers are increasingly recognizing the importance of human aspects in software development and since qualitative methods are used to, in-depth, explore human behavior, we believe that studies using such techniques will become more common. Existing qualitative software engineering guidelines do not cover the full breadth of qualitative methods and knowledge on using them found in the social sciences. The aim of this study was thus to extend the software engineering research communitys current body of knowledge regarding available qualitative methods and provide recommendations and guidelines for their use. With the support of an epistemological argument and a literature review, we suggest that future research would benefit from (1) utilizing a broader set of research methods, (2) more strongly emphasizing reflexivity, and (3) employing qualitative guidelines and quality criteria. We present an overview of three qualitative methods commonly used in social sciences but rarely seen in software engineering research, namely interpretative phenomenological analysis, narrative analysis, and discourse analysis. Furthermore, we discuss the meaning of reflexivity in relation to the software engineering context and suggest means of fostering it. Our paper will help software engineering researchers better select and then guide the application of a broader set of qualitative research methods.
Background: Assessing and communicating software engineering research can be challenging. Design science is recognized as an appropriate research paradigm for applied research but is seldom referred to in software engineering. Applying the design science lens to software engineering research may improve the assessment and communication of research contributions. Aim: The aim of this study is 1) to understand whether the design science lens helps summarize and assess software engineering research contributions, and 2) to characterize different types of design science contributions in the software engineering literature. Method: In previous research, we developed a visual abstract template, summarizing the core constructs of the design science paradigm. In this study, we use this template in a review of a set of 38 top software engineering publications to extract and analyze their design science contributions. Results: We identified five clusters of papers, classifying them according to their alignment with the design science paradigm. Conclusions: The design science lens helps emphasize the theoretical contribution of research output---in terms of technological rules---and reflect on the practical relevance, novelty, and rigor of the rules proposed by the research.
A meaningful and deep understanding of the human aspects of software engineering (SE) requires psychological constructs to be considered. Psychology theory can facilitate the systematic and sound development as well as the adoption of instruments (e.g., psychological tests, questionnaires) to assess these constructs. In particular, to ensure high quality, the psychometric properties of instruments need evaluation. In this paper, we provide an introduction to psychometric theory for the evaluation of measurement instruments for SE researchers. We present guidelines that enable using existing instruments and developing new ones adequately. We conducted a comprehensive review of the psychology literature framed by the Standards for Educational and Psychological Testing. We detail activities used when operationalizing new psychological constructs, such as item pooling, item review, pilot testing, item analysis, factor analysis, statistical property of items, reliability, validity, and fairness in testing and test bias. We provide an openly available example of a psychometric evaluation based on our guideline. We hope to encourage a culture change in SE research towards the adoption of established methods from psychology. To improve the quality of behavioral research in SE, studies focusing on introducing, validating, and then using psychometric instruments need to be more common.
An increasingly popular set of techniques adopted by software engineering (SE) researchers to automate development tasks are those rooted in the concept of Deep Learning (DL). The popularity of such techniques largely stems from their automated feature engineering capabilities, which aid in modeling software artifacts. However, due to the rapid pace at which DL techniques have been adopted, it is difficult to distill the current successes, failures, and opportunities of the current research landscape. In an effort to bring clarity to this cross-cutting area of work, from its modern inception to the present, this paper presents a systematic literature review of research at the intersection of SE & DL. The review canvases work appearing in the most prominent SE and DL conferences and journals and spans 84 papers across 22 unique SE tasks. We center our analysis around the components of learning, a set of principles that govern the application of machine learning techniques (ML) to a given problem domain, discussing several aspects of the surveyed work at a granular level. The end result of our analysis is a research roadmap that both delineates the foundations of DL techniques applied to SE research, and likely areas of fertile exploration for the future.
Empirical Standards are natural-language models of a scientific communitys expectations for a specific kind of study (e.g. a questionnaire survey). The ACM SIGSOFT Paper and Peer Review Quality Initiative generated empirical standards for research methods commonly used in software engineering. These living documents, which should be continuously revised to reflect evolving consensus around research best practices, will improve research quality and make peer review more effective, reliable, transparent and fair.