
An Empirical Study of Usages, Updates and Risks of Third-Party Libraries in Java Projects

Added by Bihuan Chen
Publication date: 2020
Language: English





Third-party libraries are a central building block in developing software systems. However, outdated third-party libraries are commonly used, and developers are usually unaware of the potential risks. Therefore, a quantitative and holistic study on the usages, updates and risks of third-party libraries can provide practical insights to improve the ecosystem sustainably. In this paper, we conduct such a study in the Java ecosystem. Specifically, we conduct a library usage analysis (e.g., usage intensity and outdatedness) and a library update analysis (e.g., update intensity and delay) using 806 open-source projects. The two analyses aim to quantify usage and update practices holistically from the perspective of both open-source projects and third-party libraries. Then, we conduct a library risk analysis (e.g., potential risk and developer response) in terms of bugs with 15 popularly used third-party libraries. This analysis aims to quantify the potential risk of using outdated libraries and the developer response to that risk. Our findings from the three analyses provide practical insights for developers and researchers on problems and potential solutions in maintaining third-party libraries (e.g., smart alerting and automated updating of outdated libraries). To demonstrate the usefulness of our findings, we propose a bug-driven alerting system for assisting developers in making confident decisions when updating third-party libraries.
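
As a rough illustration of what "outdatedness" means in practice, the sketch below is not the authors' tooling: it reads the <dependency> entries of a Maven pom.xml with the JDK's built-in XML parser and flags every dependency whose declared version differs from a hard-coded latestVersions map. The map, the versions in it, and the class name are hypothetical; a real checker would query Maven Central instead.

// Minimal sketch (not the authors' system): flag dependencies in a pom.xml
// whose declared version differs from a hypothetical map of latest releases.
import java.io.File;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class OutdatednessSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical "latest" versions; a real tool would query Maven Central.
        Map<String, String> latestVersions = Map.of(
            "com.google.code.gson:gson", "2.10.1",
            "org.apache.commons:commons-lang3", "3.14.0");

        Document pom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File(args.length > 0 ? args[0] : "pom.xml"));
        NodeList deps = pom.getElementsByTagName("dependency");

        for (int i = 0; i < deps.getLength(); i++) {
            Element dep = (Element) deps.item(i);
            String key = text(dep, "groupId") + ":" + text(dep, "artifactId");
            String declared = text(dep, "version");
            String latest = latestVersions.get(key);
            if (latest != null && !latest.equals(declared)) {
                System.out.printf("%s is outdated: declared %s, latest %s%n",
                        key, declared, latest);
            }
        }
    }

    // Returns the text of the first child element with the given tag, or "?".
    private static String text(Element dep, String tag) {
        NodeList nodes = dep.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : "?";
    }
}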

Related research


Software systems are designed according to guidelines and constraints defined by business rules. Some of these constraints define the allowable or required values for data handled by the systems. These data constraints usually originate from the problem domain (e.g., regulations), and developers must write code that enforces them. Understanding how data constraints are implemented is essential for testing, debugging, and software change. Unfortunately, there are no widely-accepted guidelines or best practices on how to implement data constraints. This paper presents an empirical study that investigates how data constraints are implemented in Java. We study the implementation of 187 data constraints extracted from the documentation of eight real-world Java software systems. First, we perform a qualitative analysis of the textual description of data constraints and identify four data constraint types. Second, we manually identify the implementations of these data constraints and reveal that they can be grouped into 30 implementation patterns. The analysis of these implementation patterns indicates that developers prefer a handful of patterns when implementing data constraints and deviations from these patterns are associated with unusual implementation decisions or code smells. Third, we develop a tool-assisted protocol that allows us to identify 256 additional trace links for the data constraints implemented using the 13 most common patterns. We find that almost half of these data constraints have multiple enforcing statements, which are code clones of different types.
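
As a purely hypothetical illustration (not one of the 30 patterns catalogued in the study), the following Java snippet shows what an enforcing statement for a domain data constraint, here "a discount rate must lie between 0 and 100 percent", can look like, including a cloned second check of the kind the study observes for almost half of the constraints. The class and field names are invented for the example.

// Hypothetical example of a data constraint enforced in code.
public class Order {
    private double discountPercent;

    public void setDiscountPercent(double discountPercent) {
        // Enforcing statement for the constraint: 0 <= discount <= 100.
        if (discountPercent < 0.0 || discountPercent > 100.0) {
            throw new IllegalArgumentException(
                "discount must be between 0 and 100, got " + discountPercent);
        }
        this.discountPercent = discountPercent;
    }

    public double applyDiscount(double price) {
        // A second, cloned enforcing statement; the study reports that almost
        // half of the constraints are enforced in more than one place.
        if (discountPercent < 0.0 || discountPercent > 100.0) {
            throw new IllegalStateException("invalid discount: " + discountPercent);
        }
        return price * (1.0 - discountPercent / 100.0);
    }
}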
Third-party libraries (TPLs) have been widely used in mobile apps and play an essential part in the entire Android ecosystem. However, TPLs are a double-edged sword. On the one hand, they can ease the development of mobile apps. On the other hand, they also bring security risks to mobile apps, such as privacy leaks or increased attack surfaces (e.g., by introducing over-privileged permissions). Although there are already many studies characterizing third-party libraries, including automated detection, security and privacy analysis of TPLs, and TPL attribute analysis, what strikes us as odd is that there is no systematic study summarizing those research endeavors. To this end, we conduct the first systematic literature review on Android TPL-related research. Following a well-defined systematic literature review protocol, we collected 74 primary research papers closely related to Android third-party libraries, published from 2012 to 2020. After carefully examining these studies, we designed a taxonomy of TPL-related research and conducted a systematic study to summarize current solutions, limitations, challenges and possible implications for new research directions related to third-party library analysis. We hope that these contributions give readers a clear overview of existing TPL-related studies and inspire them to go beyond the status quo by advancing the discipline with innovative approaches.
JSON is an essential file and data format in domains that span scientific computing, web APIs and configuration management. Its popularity has motivated significant software development effort to build multiple libraries to process JSON data. Previous studies focus on performance comparisons among these libraries and lack a software engineering perspective. We present the first systematic analysis and comparison of the input/output behavior of 20 JSON libraries in a single software ecosystem: Java/Maven. We assess behavior diversity by running each library against a curated set of 473 JSON files, including both well-formed and ill-formed files. The main design differences, which influence the behavior of the libraries, relate to the choice of data structure to represent JSON objects and to the encoding of numbers. We observe a remarkable behavioral diversity with ill-formed files, or corner cases such as large numbers or duplicate data. Our unique behavioral assessment of JSON libraries paves the way for robust processing of ill-formed files through a multi-version architecture.
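
A minimal sketch of such a behavioral comparison, assuming Gson and org.json on the classpath (chosen here only as familiar examples, not as a claim about which 20 libraries the paper studied): it feeds the same corner-case documents, a duplicate key and a very large number, to both libraries and prints how each reacts, without asserting which reaction is correct, since behavior can vary across library versions.

// Sketch: observe how two JSON libraries react to the same corner-case inputs.
import com.google.gson.JsonParser;
import org.json.JSONObject;

public class JsonDiversitySketch {
    public static void main(String[] args) {
        String[] inputs = {
            "{\"a\": 1, \"a\": 2}",                        // duplicate key
            "{\"n\": 123456789012345678901234567890}"       // number beyond long/double
        };
        for (String input : inputs) {
            System.out.println("input: " + input);
            try {
                System.out.println("  org.json -> " + new JSONObject(input));
            } catch (Exception e) {
                System.out.println("  org.json -> rejected: " + e.getMessage());
            }
            try {
                System.out.println("  Gson     -> " + JsonParser.parseString(input));
            } catch (Exception e) {
                System.out.println("  Gson     -> rejected: " + e.getMessage());
            }
        }
    }
}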
Context: software projects are common resources in Software Engineering experiments, although they are often selected without following a specific strategy, which reduces the representativeness and replicability of the results. An option is the use of preserved collections of software projects, but these must be current, with explicit guidelines that guarantee their updating over a long period of time. Goal: to carry out a systematic secondary study of the strategies used to select software projects in empirical studies, in order to discover the guidelines taken into account, the degree of use of project collections, the metadata extracted and the subsequent statistical analysis conducted. Method: a systematic mapping study to identify studies published from January 2013 to December 2020. Results: 122 studies were identified, of which 72% used their own guidelines for project selection and 27% used existing project collections. Likewise, there was no evidence of a standardized framework for the project selection process, nor of the application of statistical methods related to the sample collection strategy.
Software testing is one of the most important Quality Assurance (QA) components. Many researchers deal with the testing process in terms of tester motivation and how tests should or should not be written. However, these recommendations do not tell us how tests are actually written in real projects. In this paper, the following was investigated: (i) the denotation of the word test in different natural languages; (ii) whether the number of occurrences of the word test correlates with the number of test cases; and (iii) which testing frameworks are mostly used. The analysis was performed on 38 open-source GitHub repositories thoroughly selected from a set of 4.3M GitHub projects. We analyzed 20,340 test cases in 803 classes manually and 170k classes using an automated approach. The results show that: (i) there exists a weak correlation (r = 0.655) between the number of occurrences of the word test and the number of test cases in a class; (ii) the proposed algorithm using static file analysis correctly detected 97% of test cases; (iii) 15% of the analyzed classes used a main() function, i.e., they are regular Java programs that test the production code without using any third-party framework. The identification of such tests is very complex due to implementation diversity. These results may be leveraged to more quickly identify and locate test cases in a repository, to understand practices in customized testing solutions, and to mine tests to improve program comprehension in the future.
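
A simplified sketch of static test detection, given as an assumption-laden approximation rather than the paper's algorithm: it walks a directory of .java files and counts @Test-annotated methods (a rough proxy for test cases) alongside raw occurrences of the word test, the two signals whose correlation the study examines. The class name, default path and regexes are illustrative choices.

// Sketch: count @Test annotations and occurrences of the word "test" per file.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class TestCaseScan {
    private static final Pattern TEST_ANNOTATION = Pattern.compile("@Test\\b");
    private static final Pattern TEST_WORD = Pattern.compile("(?i)\\btest\\b");

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(p -> p.toString().endsWith(".java"))
                 .forEach(TestCaseScan::report);
        }
    }

    private static void report(Path file) {
        try {
            String source = Files.readString(file);
            System.out.printf("%s: %d @Test methods, %d 'test' occurrences%n",
                    file, count(TEST_ANNOTATION, source), count(TEST_WORD, source));
        } catch (IOException e) {
            System.err.println("skipping " + file + ": " + e.getMessage());
        }
    }

    private static int count(Pattern pattern, String source) {
        Matcher m = pattern.matcher(source);
        int n = 0;
        while (m.find()) n++;
        return n;
    }
}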