No Arabic abstract
IR-based fault localization approaches achieves promising results when locating faulty files by comparing a bug report with source code. Unfortunately, they become less effective to locate faulty methods. We conduct a preliminary study to explore its challenges, and identify three problems: the semantic gap problem, the representation sparseness problem, and the single revision problem. To tackle these problems, we propose MRAM, a mixed RNN and attention model, which combines bug-fixing features and method structured features to explore both implicit and explicit relevance between methods and bug reports for method level fault localization task. The core ideas of our model are: (1) constructing code revision graphs from code, commits and past bug reports, which reveal the latent relations among methods to augment short methods and as well provide all revisions of code and past fixes to train more accurate models; (2) embedding three method structured features (token sequences, API invocation sequences, and comments) jointly with RNN and soft attention to represent source methods and obtain their implicit relevance with bug reports; and (3) integrating multirevision bug-fixing features, which provide the explicit relevance between bug reports and methods, to improve the performance. We have implemented MRAM and conducted a controlled experiment on five open-source projects. Comparing with stateof-the-art approaches, our MRAM improves MRR values by 3.8- 5.1% (3.7-5.4%) when the dataset contains (does not contain) localized bug reports. Our statistics test shows that our improvements are significant
Combinatorial interaction testing is an efficient software testing strategy. If all interactions among test parameters or factors needed to be covered, the size of a required test suite would be prohibitively large. In contrast, this strategy only requires covering $t$-wise interactions where $t$ is typically very small. As a result, it becomes possible to significantly reduce test suite size. Locating arrays aim to enhance the ability of combinatorial interaction testing. In particular, $(overline{1}, t)$-locating arrays can not only execute all $t$-way interactions but also identify, if any, which of the interactions causes a failure. In spite of this useful property, there is only limited research either on how to generate locating arrays or on their minimum sizes. In this paper, we propose an approach to generating minimum locating arrays. In the approach, the problem of finding a locating array consisting of $N$ tests is represented as a Constraint Satisfaction Problem (CSP) instance, which is in turn solved by a modern CSP solver. The results of using the proposed approach reveal many $(overline{1}, t)$-locating arrays that are smallest known so far. In addition, some of these arrays are proved to be minimum.
In NLP, convolutional neural networks (CNNs) have benefited less than recurrent neural networks (RNNs) from attention mechanisms. We hypothesize that this is because the attention in CNNs has been mainly implemented as attentive pooling (i.e., it is applied to pooling) rather than as attentive convolution (i.e., it is integrated into convolution). Convolution is the differentiator of CNNs in that it can powerfully model the higher-level representation of a word by taking into account its local fixed-size context in the input text t^x. In this work, we propose an attentive convolution network, ATTCONV. It extends the context scope of the convolution operation, deriving higher-level features for a word not only from local context, but also information extracted from nonlocal context by the attention mechanism commonly used in RNNs. This nonlocal context can come (i) from parts of the input text t^x that are distant or (ii) from extra (i.e., external) contexts t^y. Experiments on sentence modeling with zero-context (sentiment analysis), single-context (textual entailment) and multiple-context (claim verification) demonstrate the effectiveness of ATTCONV in sentence representation learning with the incorporation of context. In particular, attentive convolution outperforms attentive pooling and is a strong competitor to popular attentive RNNs.
Context: Combinatorial interaction testing is known to be an efficient testing strategy for computing and information systems. Locating arrays are mathematical objects that are useful for this testing strategy, as they can be used as a test suite that enables fault localization as well as fault detection. In this application, each row of an array is used as an individual test. Objective: This paper proposes an algorithm for constructing locating arrays with a small number of rows. Testing cost increases as the number of tests increases; thus the problem of finding locating arrays of small sizes is of practical importance. Method: The proposed algorithm uses simulation annealing, a meta-heuristic algorithm, to find locating array of a given size. The whole algorithm repeatedly executes the simulated annealing algorithm by dynamically varying the input array size. Results: Experimental results show 1) that the proposed algorithm is able to construct locating arrays for problem instances of large sizes and 2) that, for problem instances for which nontrivial locating arrays are known, the algorithm is often able to generate locating arrays that are smaller than or at least equal to the known arrays. Conclusion: Based on the results, it is concluded that the proposed algorithm can produce small locating arrays and scale to practical problems.
Search results personalization has become an effective way to improve the quality of search engines. Previous studies extracted information such as past clicks, user topical interests, query click entropy and so on to tailor the original ranking. However, few studies have taken into account the sequential information underlying previous queries and sessions. Intuitively, the order of issued queries is important in inferring the real user interests. And more recent sessions should provide more reliable personal signals than older sessions. In addition, the previous search history and user behaviors should influence the personalization of the current query depending on their relatedness. To implement these intuitions, in this paper we employ a hierarchical recurrent neural network to exploit such sequential information and automatically generate user profile from historical data. We propose a query-aware attention model to generate a dynamic user profile based on the input query. Significant improvement is observed in the experiment with data from a commercial search engine when compared with several traditional personalization models. Our analysis reveals that the attention model is able to attribute higher weights to more related past sessions after fine training.
Semantic code search, which aims to retrieve code snippets relevant to a given natural language query, has attracted many research efforts with the purpose of accelerating software development. The huge amount of online publicly available code repositories has prompted the employment of deep learning techniques to build state-of-the-art code search models. Particularly, they leverage deep neural networks to embed codes and queries into a unified semantic vector space and then use the similarity between codes and querys vectors to approximate the semantic correlation between code and the query. However, most existing studies overlook the codes intrinsic structural logic, which indeed contains a wealth of semantic information, and fails to capture intrinsic features of codes. In this paper, we propose a new deep learning architecture, COSEA, which leverages convolutional neural networks with layer-wise attention to capture the valuable codes intrinsic structural logic. To further increase the learning efficiency of COSEA, we propose a variant of contrastive loss for training the code search model, where the ground-truth code should be distinguished from the most similar negative sample. We have implemented a prototype of COSEA. Extensive experiments over existing public datasets of Python and SQL have demonstrated that COSEA can achieve significant improvements over state-of-the-art methods on code search tasks.