The bug triaging process, an essential process of assigning bug reports to the most appropriate developers, is closely related to the quality and costs of software development. As manual bug assignment is a labor-intensive task, especially for large-scale software projects, many machine-learning-based approaches have been proposed to automatically triage bug reports. Although developer collaboration networks (DCNs) are dynamic and evolving in the real world, most automated bug triaging approaches focus on static tossing graphs at a single time slice. Moreover, none of the previous studies consider periodic interactions among developers. To address these problems, in this article we propose a novel spatial-temporal dynamic graph neural network (ST-DGNN) framework, comprising a joint random walk (JRWalk) mechanism and a graph recurrent convolutional neural network (GRCNN) model. In particular, JRWalk samples local topological structures in a graph with two sampling strategies that consider both node importance and edge importance. GRCNN has three structurally identical components, i.e., hourly-periodic, daily-periodic, and weekly-periodic components, to learn the spatial-temporal features of dynamic DCNs. We evaluated our approach's effectiveness by comparing it with several state-of-the-art graph representation learning methods on two domain-specific node classification tasks. In both tasks, experiments on two real-world, large-scale developer collaboration networks collected from the Eclipse and Mozilla projects indicate that the proposed approach outperforms all the baseline methods.
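As a rough illustration of the kind of sampling JRWalk performs, the sketch below draws a walk whose next hop is chosen by mixing node importance and edge importance. The function name, the degree-based importance proxy, and the mixing weight `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch of a random walk that mixes node and edge importance when
# choosing the next hop, in the spirit of the JRWalk mechanism described above.
import random
import networkx as nx

def joint_random_walk(G: nx.Graph, start, walk_length=10, alpha=0.5):
    """Sample a walk where the next node is drawn with probability proportional
    to alpha * node_importance + (1 - alpha) * edge_importance."""
    walk = [start]
    for _ in range(walk_length - 1):
        current = walk[-1]
        neighbors = list(G.neighbors(current))
        if not neighbors:
            break
        # Node importance approximated by degree; edge importance by edge weight.
        scores = [
            alpha * G.degree(n) + (1 - alpha) * G[current][n].get("weight", 1.0)
            for n in neighbors
        ]
        walk.append(random.choices(neighbors, weights=scores)[0])
    return walk

# Example: sample a walk from a toy collaboration-style network.
G = nx.karate_club_graph()
print(joint_random_walk(G, start=0, walk_length=8))
```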
Modern software development is increasingly dependent on components, libraries, and frameworks coming from third-party vendors or open-source suppliers and made available through a number of platforms (or forges). This way of writing software puts an emphasis on reuse and composition, commoditizing the services which modern applications require. On the other hand, bugs and vulnerabilities in a single library living in one such ecosystem can affect, directly or by transitivity, a huge number of other libraries and applications. Currently, only product-level information on library dependencies is used to contain this kind of danger, but this knowledge often proves too imprecise to lead to effective (and possibly automated) handling policies. We will discuss how fine-grained function-level dependencies can greatly improve reliability and reduce the impact of vulnerabilities on the whole software ecosystem.
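To make the contrast concrete, the sketch below compares product-level with function-level reasoning on a toy cross-package call graph: a client is flagged only if one of its functions can actually reach the vulnerable function. The package and function names are hypothetical and not drawn from any real ecosystem.

```python
# A minimal sketch: function-level reachability over a cross-package call graph.
import networkx as nx

calls = nx.DiGraph()  # edge (f, g): function f calls function g
calls.add_edges_from([
    ("app.main", "libA.parse"),
    ("app.main", "libB.render"),
    ("libA.parse", "libC.unsafe_eval"),   # hypothetical vulnerable function
    ("libB.render", "libC.safe_escape"),
])

vulnerable = "libC.unsafe_eval"
affected = {f for f in calls.nodes if nx.has_path(calls, f, vulnerable)}
print(sorted(affected))
# Product-level dependency data would flag every user of libC; the call graph
# shows that only the path through libA.parse actually reaches the flaw.
```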
The issue tracking system (ITS) is a rich data source for data-driven decision making. Different characteristics of bugs, such as severity, priority, and time to fix, may be misleading. Similarly, these values may be subjective; e.g., severity and priority values are assigned based on the intuition of a user or a developer rather than through a structured and well-defined procedure. Hence, we explore the dependency graph of the bugs and its complexity as an alternative that better reflects the actual project evolution. In this work, we aim to overcome uncertainty in decision making by tracking the complexity of the bug dependency graph (BDG) to come up with a bug resolution policy that balances different considerations, such as bug dependency, severity, and fixing time, for bug triaging. We model the evolution of the BDG by mining the issue tracking systems of three open-source projects over the past ten years. We first design a Wayback machine to examine the current bug fixing strategies, and then we define eight rule-based bug prioritization policies and compare their performance using ten distinct internal and external indices. We simulate the behavior of the ITS and trace back the effect of each policy across the history of the ITS. Considering the strategies related to the topology of the BDG, we are able to address bug prioritization problems under different scenarios. Our findings show that the network-related policies outperform the prioritization actually performed by the projects in most cases. Among the selected open-source projects, LibreOffice triagers are the only ones who disregard the importance of the BDG, and that project faces a very dense BDG. Although we found no single remedy that satisfies all the expectations of developers, the graph-related policies are robust and deemed more suitable for bug triaging.
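As a rough sketch of what a rule-based, topology-aware policy might look like, the snippet below ranks bugs in a toy BDG by combining how many bugs they transitively block with severity and estimated fixing time. The scoring weights and attribute names are illustrative assumptions, not the eight policies studied in the paper.

```python
# A minimal sketch of a rule-based prioritization over a bug dependency graph.
import networkx as nx

bdg = nx.DiGraph()  # edge (a, b): bug a blocks bug b
bdg.add_edges_from([(1, 2), (1, 3), (3, 4)])
severity = {1: 3, 2: 1, 3: 2, 4: 1}          # higher = more severe
fix_days = {1: 5.0, 2: 1.0, 3: 2.0, 4: 3.0}  # estimated fixing time in days

def priority(bug):
    blocked = nx.descendants(bdg, bug)       # bugs transitively blocked by `bug`
    return len(blocked) + severity[bug] - 0.1 * fix_days[bug]

for bug in sorted(bdg.nodes, key=priority, reverse=True):
    print(bug, round(priority(bug), 2))
```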
Software vulnerabilities are usually caused by design flaws or implementation errors, which can be exploited to damage the security of a system. At present, the most commonly used method for detecting software vulnerabilities is static analysis. Most of the related techniques work based on rules or code similarity (at the source code level) and rely on manually defined vulnerability features. However, these rules and vulnerability features are difficult to define and design accurately, which makes static analysis face many challenges in practical applications. To alleviate this problem, some researchers have proposed using neural networks, which can extract features automatically, to improve the intelligence of detection. However, there are many types of neural networks, and different data preprocessing methods have a significant impact on model performance. Choosing a proper neural network and data preprocessing method for a given problem is a great challenge for engineers and researchers. To address this problem, we conducted extensive experiments to test the performance of the two most typical neural networks (i.e., Bi-LSTM and RVFL) with the two most classical data preprocessing methods (i.e., vector representation and program symbolization) on software vulnerability detection problems and obtained a series of interesting research conclusions, which can provide valuable guidelines for researchers and engineers. Specifically, we found that 1) RVFL always trains faster than Bi-LSTM, but the prediction accuracy of the Bi-LSTM model is higher than that of RVFL; 2) using doc2vec for vector representation gives the model faster training and better generalization than using word2vec; and 3) multi-level symbolization helps improve the precision of neural network models.
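To illustrate the program symbolization preprocessing discussed above, the sketch below maps user-defined function and variable names in a code snippet to generic tokens (FUN1, VAR1, ...). The regexes, keyword list, and two-level scheme are simplified assumptions rather than the exact procedure evaluated in the experiments.

```python
# A minimal sketch of multi-level symbolization: user-defined identifiers are
# replaced by generic tokens so the model learns patterns, not specific names.
import re

C_KEYWORDS = {"void", "char", "int", "if", "while", "return", "sizeof",
              "strcpy", "malloc", "free"}  # keywords and library calls kept as-is

def symbolize(code: str) -> str:
    fun_map, var_map = {}, {}
    # Level 1: user-defined function names (identifiers followed by '(').
    for name in re.findall(r"\b([A-Za-z_]\w*)\s*\(", code):
        if name not in C_KEYWORDS:
            fun_map.setdefault(name, f"FUN{len(fun_map) + 1}")
    # Level 2: remaining user-defined identifiers become variables.
    for name in re.findall(r"\b([A-Za-z_]\w*)\b", code):
        if name not in C_KEYWORDS and name not in fun_map:
            var_map.setdefault(name, f"VAR{len(var_map) + 1}")
    for name, token in {**fun_map, **var_map}.items():
        code = re.sub(rf"\b{re.escape(name)}\b", token, code)
    return code

print(symbolize("void copy_input(char *user_buf) { char dst[8]; strcpy(dst, user_buf); }"))
# -> void FUN1(char *VAR1) { char VAR2[8]; strcpy(VAR2, VAR1); }
```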
The refinement calculus provides a methodology for transforming an abstract specification into a concrete implementation by following a succession of refinement rules. These rules have been mechanized in theorem provers, thus providing a formal and rigorous way to prove that a given program refines another one. In a previous work, we extended this mechanization to object-oriented programs, where the memory is represented as a graph, and we integrated our approach within the rCOS tool, a model-driven software development tool providing a refinement language. Hence, for any refinement step, the tool automatically generates the corresponding proof obligations and the user can manually discharge them, using a provided library of refinement lemmas. In this work, we propose an approach to automate the search for possible refinement rules from one program to another, using the rewriting tool Maude. Each refinement rule in Maude is associated with the corresponding lemma in Isabelle, thus allowing the tool to automatically generate the Isabelle proof when a refinement rule can be found automatically. The user can add a new refinement rule by providing the corresponding Maude rule and Isabelle lemma.
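As a loose, language-agnostic illustration of the rewriting search (the actual approach uses Maude for rewriting and Isabelle for proofs), the Python sketch below treats refinement rules as pattern-to-replacement pairs and searches for a sequence of rule applications turning one program text into another; the term representation and the rules themselves are toy assumptions, not rCOS or Maude syntax.

```python
# A minimal sketch: breadth-first search for a chain of refinement rules that
# rewrites an abstract program into a concrete one. Each applied rule name
# would correspond to an Isabelle lemma to be used in the generated proof.
from collections import deque

# Each rule: (name, pattern substring, replacement substring) -- toy examples.
RULES = [
    ("assign_intro", "skip", "x := 0"),
    ("seq_assoc", "(P; Q); R", "P; (Q; R)"),
]

def find_refinement(abstract: str, concrete: str, max_depth=5):
    queue = deque([(abstract, [])])
    seen = {abstract}
    while queue:
        term, applied = queue.popleft()
        if term == concrete:
            return applied            # sequence of rule/lemma names
        if len(applied) >= max_depth:
            continue
        for name, pat, rep in RULES:
            if pat in term:
                new_term = term.replace(pat, rep, 1)
                if new_term not in seen:
                    seen.add(new_term)
                    queue.append((new_term, applied + [name]))
    return None

print(find_refinement("(P; Q); R; skip", "P; (Q; R); x := 0"))
```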
Graph representation learning has emerged as a powerful technique for addressing real-world problems. Various downstream graph learning tasks have benefited from its recent developments, such as node classification, similarity search, and graph classification. However, prior work on graph representation learning focuses on domain-specific problems and trains a dedicated model for each graph dataset, which is usually not transferable to out-of-domain data. Inspired by recent advances in pre-training from natural language processing and computer vision, we design Graph Contrastive Coding (GCC) -- a self-supervised graph neural network pre-training framework -- to capture the universal network topological properties across multiple networks. We design GCC's pre-training task as subgraph instance discrimination in and across networks and leverage contrastive learning to empower graph neural networks to learn intrinsic and transferable structural representations. We conduct extensive experiments on three graph learning tasks and ten graph datasets. The results show that GCC pre-trained on a collection of diverse datasets can achieve performance competitive with or better than its task-specific, trained-from-scratch counterparts. This suggests that the pre-training and fine-tuning paradigm presents great potential for graph representation learning.
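For intuition, the sketch below shows an InfoNCE-style contrastive objective of the kind used for instance discrimination: two views of the same subgraph should embed close together, while embeddings of other subgraphs act as negatives. The encoder is abstracted away, and the temperature and toy embeddings are assumptions, not GCC's exact implementation.

```python
# A minimal numpy sketch of a contrastive (InfoNCE-style) loss for
# subgraph instance discrimination.
import numpy as np

def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """Loss for one query against one positive key and N negative keys."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(query, positive_key)] +
                      [cos(query, k) for k in negative_keys]) / temperature
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                    # positive sits at index 0

rng = np.random.default_rng(0)
q = rng.normal(size=64)                          # embedding of one view of a subgraph
k_pos = q + 0.1 * rng.normal(size=64)            # embedding of another view of the same subgraph
k_neg = [rng.normal(size=64) for _ in range(16)] # embeddings of other subgraphs
print(float(info_nce(q, k_pos, k_neg)))
```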