Automating Test Case Identification in Java Open Source Projects on GitHub

81 0 0.0 ( 0 )

Download Cite

Added by Matej Madeja

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Matej Madeja - Jaroslav Poruban - Michaela Bav{c}ikova

Software Engineering

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Software testing is one of the very important Quality Assurance (QA) components. A lot of researchers deal with the testing process in terms of tester motivation and how tests should or should not be written. However, it is not known from the recommendations how the tests are written in real projects. In this paper, the following was investigated: (i) the denotation of the word test in different natural languages; (ii) whether the number of occurrences of the word test correlates with the number of test cases; and (iii) what testing frameworks are mostly used. The analysis was performed on 38 GitHub open source repositories thoroughly selected from the set of 4.3M GitHub projects. We analyzed 20,340 test cases in 803 classes manually and 170k classes using an automated approach. The results show that: (i) there exists a weak correlation (r = 0.655) between the number of occurrences of the word test and the number of test cases in a class; (ii) the proposed algorithm using static file analysis correctly detected 97% of test cases; (iii) 15% of the analyzed classes used main() function whose represent regular Java programs that test the production code without using any third-party framework. The identification of such tests is very complex due to implementation diversity. The results may be leveraged to more quickly identify and locate test cases in a repository, to understand practices in customized testing solutions, and to mine tests to improve program comprehension in the future.

rate research

Identifying Unmaintained Projects in GitHub

131 - Jailton Coelho , Marco Tulio Valente , Luciana L. Silva 2018

Background: Open source software has an increasing importance in modern software development. However, there is also a growing concern on the sustainability of such projects, which are usually managed by a small number of developers, frequently working as volunteers. Aims: In this paper, we propose an approach to identify GitHub projects that are not actively maintained. Our goal is to alert users about the risks of using these projects and possibly motivate other developers to assume the maintenance of the projects. Method: We train machine learning models to identify unmaintained or sparsely maintained projects, based on a set of features about project activity (commits, forks, issues, etc). We empirically validate the model with the best performance with the principal developers of 129 GitHub projects. Results: The proposed machine learning approach has a precision of 80%, based on the feedback of real open source developers; and a recall of 96%. We also show that our approach can be used to assess the risks of projects becoming unmaintained. Conclusions: The model proposed in this paper can be used by open source users and developers to identify GitHub projects that are not actively maintained anymore.

Software Engineering

Making Quantum Computing Open: Lessons from Open-Source Projects

76 - Ruslan Shaydulin , Caleb Thomas , Paige Rodeghero 2019

Quantum computing (QC) is an emerging computing paradigm with potential to revolutionize the field of computing. QC is a field that is quickly developing globally and has high barriers of entry. In this paper we explore both successful contributors to the field as well as wider QC community with the goal of understanding the backgrounds and training that helped them succeed. We gather data on 148 contributors to open-source quantum computing projects hosted on GitHub and survey 46 members of QC community. Our findings show that QC practitioners and enthusiasts have diverse backgrounds, with most of them having a PhD and trained in physics or computer science. We observe a lack of educational resources on quantum computing. Our goal for these findings is to start a conversation about how best to prepare the next generation of QC researchers and practitioners.

Software Engineering

Visualization of Contributions to Open-Source Projects

74 - Andreas Schreiber 2020

We want to analyze visually, to what extend team members and external developers contribute to open-source projects. This gives a high-level impression about collaboration in that projects. We achieve this by recording provenance of the development process and use graph drawing on the resulting provenance graph. Our graph drawings show, which developers are jointly changed the same files -- and to what extent -- which we show at Germanys COVID-19 exposure notification app Corona-Warn-App.

Software Engineering Information Retrieval

Characterizing the Roles of Contributors in Open-source Scientific Software Projects

75 - Reed Milewicz , Gustavo Pinto , Paige Rodeghero 2020

The development of scientific software is, more than ever, critical to the practice of science, and this is accompanied by a trend towards more open and collaborative efforts. Unfortunately, there has been little investigation into who is driving the evolution of such scientific software or how the collaboration happens. In this paper, we address this problem. We present an extensive analysis of seven open-source scientific software projects in order to develop an empirically-informed model of the development process. This analysis was complemented by a survey of 72 scientific software developers. In the majority of the projects, we found senior research staff (e.g. professors) to be responsible for half or more of commits (an average commit share of 72%) and heavily involved in architectural concerns (seniors were more likely to interact with files related to the build system, project meta-data, and developer documentation). Juniors (e.g.graduate students) also contribute substantially -- in one studied project, juniors made almost 100% of its commits. Still, graduate students had the longest contribution periods among juniors (with 1.72 years of commit activity compared to 0.98 years for postdocs and 4 months for undergraduates). Moreover, we also found that third-party contributors are scarce, contributing for just one day for the project. The results from this study aim to help scientists to better understand their own projects, communities, and the contributors behavior, while paving the road for future software engineering research

Software Engineering

Poster: Communication in Open-Source Projects--End of the E-mail Era?

177 - Verena Kafer , Daniel Graziotin , Ivan Bogicevic 2018

Communication is essential in software engineering. Especially in distributed open-source teams, communication needs to be supported by channels including mailing lists, forums, issue trackers, and chat systems. Yet, we do not have a clear understanding of which communication channels stakeholders in open-source projects use. In this study, we fill the knowledge gap by investigating a statistically representative sample of 400 GitHub projects. We discover the used communication channels by regular expressions on project data. We show that (1) half of the GitHub projects use observable communication channels; (2) GitHub Issues, e-mail addresses, and the modern chat system Gitter are the most common channels; (3) mailing lists are only in place five and have a lower market share than all modern chat systems combined.

Software Engineering