The Behavioral Diversity of Java JSON Libraries



Added by Nicolas Harrand

Publication date: 2021

Field: Informatics Engineering

Language: English

Authors: Nicolas Harrand, Thomas Durieux, David Broman

Software Engineering


Abstract

JSON is an essential file and data format in domains that span scientific computing, web APIs, and configuration management. Its popularity has motivated significant software development effort to build multiple libraries to process JSON data. Previous studies focus on performance comparisons among these libraries and lack a software engineering perspective. We present the first systematic analysis and comparison of the input/output behavior of 20 JSON libraries in a single software ecosystem: Java/Maven. We assess behavioral diversity by running each library against a curated set of 473 JSON files, including both well-formed and ill-formed files. The main design differences that influence the behavior of the libraries relate to the choice of data structure used to represent JSON objects and to the encoding of numbers. We observe remarkable behavioral diversity on ill-formed files and on corner cases such as large numbers or duplicate data. Our unique behavioral assessment of JSON libraries paves the way for robust processing of ill-formed files through a multi-version architecture.
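The abstract names two design choices, number encoding and the data structure used for JSON objects, as the main sources of divergent behavior. The sketch below is a hypothetical illustration written only with the JDK (Java 9+ for `List.of` and `Map.entry`); it is not code from the paper or from any of the 20 studied libraries, and the class and method names (`JsonCornerCases`, `viaDouble`, `viaBigDecimal`, `lastWins`, `firstWins`) are invented for illustration. It shows how the same large number literal decodes differently as a `double` versus a `BigDecimal`, and how two assumed duplicate-key policies yield different objects from the same input.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of two JSON decoding design choices; not any real library's API.
final class JsonCornerCases {

    // Decode a JSON number literal as a 64-bit double, then print its exact value.
    // Integers above 2^53 cannot all be represented, so digits may be silently rounded.
    static String viaDouble(String literal) {
        return new BigDecimal(Double.parseDouble(literal)).toPlainString();
    }

    // Decode the same literal as an arbitrary-precision BigDecimal: every digit is kept.
    static String viaBigDecimal(String literal) {
        return new BigDecimal(literal).toPlainString();
    }

    // Assumed policy 1: "last wins", what a plain Map.put loop over object members yields.
    // LinkedHashMap preserves member order; a HashMap would not guarantee it.
    static <V> Map<String, V> lastWins(List<Map.Entry<String, V>> members) {
        Map<String, V> obj = new LinkedHashMap<>();
        for (Map.Entry<String, V> m : members) {
            obj.put(m.getKey(), m.getValue());
        }
        return obj;
    }

    // Assumed policy 2: "first wins", keeping the earliest value for a repeated key.
    static <V> Map<String, V> firstWins(List<Map.Entry<String, V>> members) {
        Map<String, V> obj = new LinkedHashMap<>();
        for (Map.Entry<String, V> m : members) {
            obj.putIfAbsent(m.getKey(), m.getValue());
        }
        return obj;
    }

    public static void main(String[] args) {
        String big = "9007199254740993"; // 2^53 + 1
        System.out.println(viaDouble(big));     // 9007199254740992
        System.out.println(viaBigDecimal(big)); // 9007199254740993

        // Members of the ill-formed-in-spirit object {"a": 1, "a": 2}
        List<Map.Entry<String, Integer>> dup = List.of(Map.entry("a", 1), Map.entry("a", 2));
        System.out.println(lastWins(dup));  // {a=2}
        System.out.println(firstWins(dup)); // {a=1}
    }
}
```

Both outcomes are plausible for a JSON library, and a third option is to reject the input outright; which policy each studied library actually follows is reported in the paper, not assumed here.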


An Empirical Study of Usages, Updates and Risks of Third-Party Libraries in Java Projects

Ying Wang, Bihuan Chen, Kaifeng Huang (2020)

Third-party libraries are a central building block to develop software systems. However, outdated third-party libraries are commonly used, and developers are usually less aware of the potential risks. Therefore, a quantitative and holistic study on usages, updates and risks of third-party libraries can provide practical insights to improve the ecosystem sustainably. In this paper, we conduct such a study in the Java ecosystem. Specifically, we conduct a library usage analysis (e.g., usage intensity and outdatedness) and a library update analysis (e.g., update intensity and delay) using 806 open-source projects. The two analyses aim to quantify usage and update practices holistically from the perspective of both open-source projects and third-party libraries. Then, we conduct a library risk analysis (e.g., potential risk and developer response) in terms of bugs with 15 widely used third-party libraries. This analysis aims to quantify the potential risk of using outdated libraries and the developer response to the risk. Our findings from the three analyses provide practical insights to developers and researchers on problems and potential solutions in maintaining third-party libraries (e.g., smart alerting and automated updating of outdated libraries). To demonstrate the usefulness of our findings, we propose a bug-driven alerting system for assisting developers to make confident decisions in updating third-party libra

Software Engineering

On the diversity and frequency of code related to mathematical formulas in real-world Java projects

Oliver Moseler, Felix Lemmer, Sebastian Baltes (2020)

In this paper, the term formula code refers to fragments of source code that implement a mathematical formula. We present empirical studies that analyze the diversity and frequency of formula code in open-source software projects. In an exploratory study, we investigated what kinds of formulas are implemented in real-world Java projects and derived syntactical patterns and constraints. We refined these patterns for sum and product formulas to automatically detect formula code in software archives and to reconstruct the implemented formula in mathematical notation. In a quantitative study of a large sample of engineered Java projects on GitHub, we analyzed the frequency of formula code and estimated that one in 700 lines of code in this sample implements a sum or product formula. For a sample of scientific-computing projects, we found that one in 100 lines of code implements a sum or product formula. To assess the need for tool support, we investigated the helpfulness of comments for program understanding in a sample of formula-code fragments and performed an online survey. Our findings provide first insights into the characteristics of formula code, which can motivate further studies on the role of formula code in software projects and the design of formula-related tools.

Software Engineering

Model-based Testing of the Java Network API

Cyrille Artho (2017)

Testing networked systems is challenging because the client or server side cannot be tested by itself. We present a solution using the tool Modbat that generates test cases for Java's network library java.nio, where we test both blocking and non-blocking network functions. Our test model can dynamically simulate actions in multiple worker and client threads, thanks to a carefully orchestrated design that covers non-determinism while ensuring progress.

Software Engineering

Adabot: Fault-Tolerant Java Decompiler

Zhiming Li, Qing Wu, Kun Qian (2019)

Reverse engineering (RE) has been a fundamental task in software engineering. However, most traditional Java reverse engineering tools are strictly rule-defined and thus not fault-tolerant, which poses a serious problem when noise and interference are introduced into the system. In this paper, we view reverse engineering as a statistical machine translation task instead of a rule-based task, and propose a fault-tolerant Java decompiler based on machine translation models. Our model is based on attention-based Neural Machine Translation (NMT) and Transformer architectures. First, we measure the translation quality on both the redundant and purified datasets. Next, we evaluate the fault tolerance (anti-noise ability) of our framework on test sets with different unit error probabilities (UEP). In addition, we compare the suitability of different word segmentation algorithms for the decompilation task. Experimental results demonstrate that our model is more robust and fault-tolerant than traditional Abstract Syntax Tree (AST) based decompilers. Specifically, in terms of BLEU-4 and Word Error Rate (WER), our performance reaches 94.50% and 2.65% on the redundant test set, and 92.30% and 3.48% on the purified test set.

Software Engineering Computation and Language

A Longitudinal Analysis of Bloated Java Dependencies

Cesar Soto-Valero, Thomas Durieux, Benoit Baudry (2021)

We study the evolution and impact of bloated dependencies in a single software ecosystem: Java/Maven. Bloated dependencies are third-party libraries that are packaged in the application binary but are not needed to run the application. We analyze the history of 435 Java projects. This historical data includes 48,469 distinct dependencies, which we study across a total of 31,51

Software Engineering