Computational notebooks have emerged as the platform of choice for data science and analytical workflows, enabling rapid iteration and exploration. By keeping intermediate program state in memory and segmenting units of execution into so-called cells, notebooks allow users to execute their workflows interactively and enjoy a particularly tight feedback loop. However, as cells are added, removed, reordered, and rerun, this hidden intermediate state accumulates in a way that is not necessarily correlated with the notebook's visible code, making execution behavior difficult to reason about and leading to errors and lack of reproducibility. We present NBSafety, a custom Jupyter kernel that uses runtime tracing and static analysis to automatically manage lineage associated with cell execution and global notebook state. NBSafety detects and prevents errors that users make during unaided notebook interactions, all while preserving the flexibility of existing notebook semantics. We evaluate NBSafety's ability to prevent erroneous interactions by replaying and analyzing 666 real notebook sessions. Of these, NBSafety identified 117 sessions with potential safety errors, and in the remaining 549 sessions, the cells that NBSafety identified as resolving safety issues were more than $7\times$ more likely to be selected by users for re-execution compared to a random baseline, even though the users were not using NBSafety and were therefore not influenced by its suggestions.
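To make the kind of hidden-state hazard concrete, the following is a minimal sketch of the underlying idea (not NBSafety's actual implementation): record when each symbol was last defined and when each cell last ran, and flag a cell as stale if it reads a symbol that was redefined after that cell's last execution. All class and variable names here are illustrative.

```python
# Illustrative sketch of stale-cell detection via symbol lineage (hypothetical API).

class LineageTracker:
    def __init__(self):
        self.symbol_written_at = {}   # symbol -> logical time of last definition
        self.cell_ran_at = {}         # cell_id -> logical time of last execution
        self.clock = 0

    def record_run(self, cell_id, defines, uses):
        """Called after a cell executes, with the symbols it defined and used."""
        self.clock += 1
        self.cell_ran_at[cell_id] = self.clock
        for sym in defines:
            self.symbol_written_at[sym] = self.clock

    def stale_inputs(self, cell_id, uses):
        """Symbols redefined since this cell last ran (candidate unsafe reads)."""
        ran_at = self.cell_ran_at.get(cell_id, 0)
        return {s for s in uses if self.symbol_written_at.get(s, 0) > ran_at}


# Usage: cell 1 defines df, cell 2 consumes df, then cell 1 is rerun with new data.
tracker = LineageTracker()
tracker.record_run("cell1", defines={"df"}, uses=set())
tracker.record_run("cell2", defines=set(), uses={"df"})
tracker.record_run("cell1", defines={"df"}, uses=set())   # df redefined
print(tracker.stale_inputs("cell2", uses={"df"}))          # {'df'} -> cell2 is now stale
```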
Modern software development is increasingly dependent on components, libraries, and frameworks coming from third-party vendors or open-source suppliers and made available through a number of platforms (or forges). This way of writing software puts an emphasis on reuse and on composition, commoditizing the services which modern applications require. On the other hand, bugs and vulnerabilities in a single library living in one such ecosystem can affect, directly or by transitivity, a huge number of other libraries and applications. Currently, only product-level information on library dependencies is used to contain this kind of danger, but this knowledge often proves too imprecise to lead to effective (and possibly automated) handling policies. We will discuss how fine-grained function-level dependencies can greatly improve reliability and reduce the impact of vulnerabilities on the whole software ecosystem.
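As an illustration of the difference between product-level and function-level dependency information, the following sketch (not any specific tool's API; the call graph and names are hypothetical) shows a simple reachability check: a product-level policy would flag every application depending on a vulnerable library version, whereas a function-level check only flags applications that can actually reach the vulnerable function.

```python
# Hypothetical call-graph reachability check for a vulnerable library function.
from collections import deque

# Call graph edges: caller -> callees (functions qualified by library).
call_graph = {
    "app.main": ["libA.parse", "libB.render"],
    "libA.parse": ["libA.tokenize"],
    "libB.render": [],
    "libA.tokenize": [],
    "libA.unsafe_eval": [],   # the vulnerable function, never reached here
}

def reaches(entry, target, graph):
    """Breadth-first search: can `entry` transitively call `target`?"""
    seen, queue = {entry}, deque([entry])
    while queue:
        fn = queue.popleft()
        if fn == target:
            return True
        for callee in graph.get(fn, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False

# Product-level view: the app depends on libA, which contains the vulnerability.
# Function-level view: the vulnerable function is unreachable from app.main.
print(reaches("app.main", "libA.unsafe_eval", call_graph))  # False -> likely unaffected
```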
Fault localization aims to identify faulty source code. It can be performed at various granularities, e.g., classes, methods, and statements. Most automated fault localization (AFL) approaches are coarse-grained because it is challenging to accurately locate fine-grained faulty software elements, e.g., statements. Spectrum-based fault localization (SBFL), based on the dynamic execution of test cases only, is simple, intuitive, and generic (working at various granularities). However, its accuracy deserves significant improvement. To this end, in this paper, we propose a hybrid fine-grained AFL approach based on both dynamic spectrums and static statement types. The rationale of the approach is that some types of statements are significantly more/less error-prone than others, and thus statement types can be exploited for fault localization. On a corpus of faulty programs, we compute the error-proneness of each statement type, and assign priorities to special statement types that are steadily more/less error-prone than others. For a given faulty program under test, we first leverage a traditional spectrum-based fault localization algorithm to identify all suspicious statements and to compute their suspiciousness scores. For each of the resulting suspicious statements, we retrieve its statement type as well as the priority associated with that type. The final suspiciousness score is the product of the SBFL suspiciousness score and the priority assigned to the statement type. A significant advantage of the approach is that it is simple and intuitive, making it efficient and easy to interpret and implement. We evaluate the proposed approach on the widely used Defects4J benchmark. The evaluation results suggest that the proposed approach outperforms widely used SBFL, reducing the absolute waste effort (AWE) by 9.3% on average.
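A minimal sketch of this scoring scheme, under stated assumptions: the SBFL suspiciousness is computed with the Ochiai formula (one common choice; the abstract does not name a specific formula), and the per-statement-type priorities come from a precomputed error-proneness table whose values below are made up for illustration.

```python
# Hybrid suspiciousness = SBFL score x statement-type priority (illustrative values).
import math

def ochiai(failed_cov, passed_cov, total_failed):
    """Ochiai suspiciousness for one statement from its spectrum counts."""
    if total_failed == 0 or (failed_cov + passed_cov) == 0:
        return 0.0
    return failed_cov / math.sqrt(total_failed * (failed_cov + passed_cov))

# Hypothetical priorities: >1 for steadily error-prone statement types,
# <1 for steadily less error-prone ones, 1.0 otherwise.
TYPE_PRIORITY = {"return": 1.2, "if": 1.1, "assignment": 1.0, "import": 0.8}

def hybrid_score(stmt_type, failed_cov, passed_cov, total_failed):
    """Final suspiciousness score for a statement of the given type."""
    base = ochiai(failed_cov, passed_cov, total_failed)
    return base * TYPE_PRIORITY.get(stmt_type, 1.0)

# A return statement covered by 3 of 4 failing tests and 5 passing tests:
print(hybrid_score("return", failed_cov=3, passed_cov=5, total_failed=4))
```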
The notebook validation tool nbval loads and executes Python code from a Jupyter notebook file. While computing outputs from the cells in the notebook, it compares these outputs with the outputs saved in the notebook file, treating each cell as a test. Deviations are reported as test failures, with various configuration options available to control this behaviour. Application use cases include the validation of notebook-based documentation, tutorials, and textbooks, as well as the use of notebooks as additional unit, integration, and system tests for the libraries used in the notebook. Nbval is implemented as a plugin for the pytest testing framework.
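A minimal usage sketch, invoking nbval programmatically through pytest (equivalent to running `pytest --nbval` on the command line); the notebook path is a placeholder, and both pytest and nbval are assumed to be installed.

```python
# Run nbval on a notebook: each cell is collected as a test, and its recomputed
# output is compared against the output stored in the .ipynb file.
import pytest

exit_code = pytest.main(["--nbval", "docs/tutorial.ipynb"])  # placeholder path
print("all cells reproduced their saved outputs" if exit_code == 0
      else "some cells deviated from their saved outputs")
```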
Context: Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they actually study not only bugs, but also other concerns irrelevant to the study of bugs. Objective: We want to improve our understanding of the prevalence of tangling and the types of changes that are tangled within bug fixing commits. Methods: We use a crowdsourcing approach for manual labeling to validate which changes contribute to bug fixes for each line in bug fixing commits. Each line is labeled by four participants; if at least three participants agree on the same label, we have consensus. Results: We estimate that between 17% and 32% of all changes in bug fixing commits modify the source code to fix the underlying problem. However, when we only consider changes to the production code files, this ratio increases to 66% to 87%. We find that about 11% of lines are hard to label, leading to active disagreements between participants. Due to confirmed tangling and the uncertainty in our data, we estimate that 3% to 47% of data is noisy without manual untangling, depending on the use case. Conclusion: Tangled commits have a high prevalence in bug fixes and can lead to a large amount of noise in the data. Prior research indicates that this noise may alter results. As researchers, we should be skeptical and assume that unvalidated data is likely very noisy, until proven otherwise.
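A small sketch of the consensus rule described above: each changed line receives four labels, and a label is accepted only if at least three participants agree. The label names and example data are illustrative, not taken from the study.

```python
# Consensus labeling: >= 3 of 4 participants must agree, otherwise "no consensus".
from collections import Counter

def consensus(labels, required=3):
    """Return the consensus label if one exists, otherwise None (disagreement)."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= required else None

lines = {
    "src/Parser.java:42":      ["bugfix", "bugfix", "bugfix", "refactoring"],
    "src/Parser.java:43":      ["bugfix", "test", "refactoring", "whitespace"],
    "test/ParserTest.java:10": ["test", "test", "test", "test"],
}

for line, labels in lines.items():
    print(line, "->", consensus(labels) or "no consensus")
```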
In this paper, we present and discuss challenging applications for fine-grained visual classification (FGVC): biodiversity and species analysis. We not only give details about two challenging new datasets suitable for computer vision research with up to 675 highly similar classes, but also present initial results with localized features using convolutional neural networks (CNNs). We conclude with a list of challenging new research directions in the area of visual classification for biodiversity research.