Behavioral software models play a key role in many software engineering tasks; unfortunately, these models either are not available during software development or, if available, quickly become outdated as the implementations evolve. Model inference techniques have been proposed as a viable solution to extract finite-state models from execution logs. However, existing techniques do not scale well when processing very large logs, such as system-level logs obtained by combining component-level logs. Furthermore, in the case of component-based systems, existing techniques assume that the definitions of the communication channels between components are known. However, this information is usually not available for systems integrating third-party components with limited documentation. In this paper, we address the scalability problem of inferring the model of a component-based system from the individual component-level logs, when the only available information about the system is the high-level architectural dependencies among components and a (possibly incomplete) list of log message templates denoting communication events between components. Our model inference technique, called SCALER, follows a divide-and-conquer approach. The idea is to first infer a model of each system component from the corresponding logs; the individual component models are then merged, taking into account the dependencies among components as reflected in the logs. We evaluated SCALER in terms of scalability and accuracy, using a dataset of logs from an industrial system; the results show that SCALER can process much larger logs than a state-of-the-art tool, while yielding more accurate models.
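Although the abstract does not spell out SCALER's inference algorithm, the per-component step can be illustrated with a standard starting point for finite-state model inference: building a prefix tree acceptor (PTA) from one component's log traces. This is a minimal, hypothetical sketch; SCALER's actual inference and the subsequent merge step are more involved.

```python
# Minimal sketch: build a prefix tree acceptor (PTA) from the traces of a
# single component's log. A PTA is a common starting point for finite-state
# model inference (e.g., in k-tails-style algorithms); it is NOT SCALER's
# actual algorithm, only an illustration of the per-component step.
def build_pta(traces):
    """traces: list of event sequences (lists of log-message templates).
    Returns a transition map (state, event) -> state; state 0 is initial."""
    transitions = {}
    next_state = 1
    for trace in traces:
        state = 0
        for event in trace:
            key = (state, event)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
    return transitions

traces = [["init", "send", "ack"],
          ["init", "send", "timeout", "send", "ack"]]
print(build_pta(traces))
```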
Behavioral software models play a key role in many software engineering tasks; unfortunately, these models either are not available during software development or, if available, quickly become outdated as implementations evolve. Model inference techniques have been proposed as a viable solution to extract finite-state models from execution logs. However, existing techniques do not scale well when processing the very large logs commonly found in practice. In this paper, we address the scalability problem of inferring the model of a component-based system from large system logs, without requiring any extra information. Our model inference technique, called PRINS, follows a divide-and-conquer approach. The idea is to first infer a model of each system component from the corresponding logs; the individual component models are then merged, taking into account the flow of events across components as reflected in the logs. We evaluated PRINS in terms of scalability and accuracy, using nine datasets composed of logs extracted from publicly available benchmarks and from a personal computer running desktop business applications. The results show that PRINS can process large logs much faster than a publicly available and well-known state-of-the-art tool, without significantly compromising the accuracy of the inferred models.
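The merge step can be caricatured as follows, under strong simplifying assumptions: each event is attributable to exactly one component, each component model is a deterministic transition function, and a cross-component edge is recorded whenever control flows from one component to another in the system log. The names `component_of` and `step` are hypothetical placeholders, not PRINS's actual API.

```python
# Hedged sketch of the merge idea: replay the system-level log and record a
# "bridge" whenever consecutive events belong to different components.
# component_of and step stand in for the event-to-component attribution and
# the per-component models (both assumed given).
def stitch(system_log, component_of, step):
    states = {}        # current state per component; 0 is the initial state
    bridges = set()    # inferred cross-component transitions
    prev = None        # (component, state) reached by the previous event
    for event in system_log:
        comp = component_of(event)
        src = states.get(comp, 0)
        if prev is not None and prev[0] != comp:
            bridges.add((prev, (comp, src)))
        states[comp] = step(comp, src, event)
        prev = (comp, states[comp])
    return bridges

log = ["A:req", "B:recv", "B:proc", "A:done"]
print(stitch(log,
             component_of=lambda e: e.split(":")[0],
             step=lambda c, s, e: s + 1))   # toy: linear component models
```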
Robotic Process Automation (RPA) is a technology for automating routine work, such as copying data across applications or filling in document templates using data from multiple applications. RPA tools allow organizations to automate a wide range of routines. However, identifying and scoping routines that can be automated with RPA tools is time-consuming. Manual identification of candidate routines via interviews, walk-throughs, or job shadowing allows analysts to identify the most visible routines, but these methods are not suitable for identifying the long tail of routines in an organization. This article proposes an approach to discover automatable routines from logs of user interactions with IT systems and to synthesize executable specifications for such routines. The approach starts by discovering frequent routines at the control-flow level (candidate routines). It then determines which of these candidate routines are automatable and synthesizes an executable specification for each such routine. Finally, it identifies semantically equivalent routines so as to produce a set of non-redundant automatable routines. The article reports on an evaluation of the approach using a combination of synthetic and real-life logs. The evaluation results show that the approach can discover automatable routines that are known to be present in a UI log, and that it identifies automatable routines that users recognize as such in real-life logs.
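The first step (candidate-routine discovery) can be illustrated with a deliberately naive baseline: counting frequently repeated action n-grams in a UI log. This is a hedged sketch only; the article's actual discovery technique works at the control-flow level and is considerably more sophisticated.

```python
# Illustrative sketch (not the article's algorithm): find frequently
# repeated action sequences in a UI log by counting n-grams of actions.
from collections import Counter

def candidate_routines(actions, min_len=3, max_len=6, min_support=5):
    """Return (sequence, count) pairs occurring at least min_support times."""
    counts = Counter(
        tuple(actions[i:i + n])
        for n in range(min_len, max_len + 1)
        for i in range(len(actions) - n + 1)
    )
    return [(seq, c) for seq, c in counts.items() if c >= min_support]

# Hypothetical UI log: the same copy-paste routine performed ten times.
log = ["openCRM", "copyName", "openSheet", "pasteName", "save"] * 10
for seq, count in candidate_routines(log):
    print(count, seq)
```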
We present a new general framework for robust and adaptive control that allows for distributed and scalable learning and control of large systems of interconnected linear subsystems. The control method is demonstrated for a linear time-invariant system with bounded parameter uncertainties, disturbances, and noise. The presented scheme continuously collects measurements to reduce the uncertainty about the system parameters and adapts dynamic robust controllers online in a stable and performance-improving way. A key enabler for our approach is the choice of a time-varying dynamic controller implementation, inspired by recent work on System Level Synthesis. We leverage a new robustness result for this implementation to propose a general robust adaptive control algorithm. In particular, the algorithm allows us to impose communication and delay constraints on the controller implementation and is formulated as a sequence of robust optimization problems that can be solved in a distributed manner. The proposed control methodology performs particularly well when the interconnection between subsystems is sparse and the dynamics of local regions of subsystems depend only on a small number of parameters. As we show on an exemplary five-dimensional chain system, the algorithm can exploit system structure to efficiently learn and control the entire system while respecting communication and implementation constraints. Moreover, although current theoretical results require the assumption of small initial uncertainties to guarantee robustness, we present simulations that show good closed-loop performance even in the case of large uncertainties. This suggests that the assumption is not critical for the presented technique; future work will focus on providing less conservative guarantees.
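The collect-measurements-then-resynthesize loop can be sketched in a few lines, with a loud caveat: a certainty-equivalent LQR stands in here for the paper's robust System Level Synthesis step, and the uncertainty-set bookkeeping and distributed structure are omitted entirely. All matrices below are made up for illustration.

```python
# Heavily simplified sketch of one adaptation round: excite the system,
# re-estimate (A, B) by least squares, and re-synthesize a controller.
# LQR is a stand-in for the paper's robust optimization-based synthesis.
import numpy as np
from scipy.linalg import solve_discrete_are

def estimate_dynamics(X, U):
    """Least-squares estimate of (A, B) from x_{t+1} = A x_t + B u_t + w_t."""
    Z = np.hstack([X[:-1], U])                       # regressors [x_t, u_t]
    Theta, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)
    n = X.shape[1]
    return Theta[:n].T, Theta[n:].T                  # A_hat, B_hat

def lqr_gain(A, B, Q, R):
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

rng = np.random.default_rng(0)
A_true = np.array([[1.0, 0.2], [0.0, 1.0]])          # hypothetical plant
B_true = np.array([[0.0], [0.1]])
X, U = [np.zeros(2)], []
for t in range(200):
    u = rng.normal(size=1)                           # exploratory input
    U.append(u)
    X.append(A_true @ X[-1] + B_true @ u + 0.01 * rng.normal(size=2))
A_hat, B_hat = estimate_dynamics(np.asarray(X), np.asarray(U))
K = lqr_gain(A_hat, B_hat, np.eye(2), np.eye(1))     # apply u = -K x next round
```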
We consider the problem of approximate Bayesian inference in log-supermodular models. These models encompass regular pairwise MRFs with binary variables, but can also capture higher-order interactions, which are intractable for existing approximate inference techniques such as belief propagation, mean field, and their variants. We show that a recently proposed variational approach to inference in log-supermodular models, L-FIELD, reduces to the widely studied minimum norm problem for submodular minimization. This insight allows us to leverage powerful existing tools and hence to solve the variational problem orders of magnitude more efficiently than previously possible. We then provide another natural interpretation of L-FIELD, demonstrating that it exactly minimizes a specific type of Rényi divergence measure. This insight sheds light on the nature of the variational approximations produced by L-FIELD. Furthermore, we show how to perform parallel inference as message passing in a suitable factor graph at a linear convergence rate, without having to sum over all configurations of the factor. Finally, we apply our approach to a challenging image segmentation task. Our experiments confirm the scalability of our approach, the high quality of the marginals, and the benefit of incorporating higher-order potentials.
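To make the reduction concrete, here is a sketch under our reading of the result: for a log-supermodular model p(S) ∝ exp(-F(S)) with F submodular, the optimal fully factorized approximation is obtained from the minimum norm point s* of the base polytope B(F), with marginals σ(-s*ᵢ). The min-norm point is usually computed with Wolfe's algorithm; plain Frank-Wolfe is used below only to keep the sketch short.

```python
# Sketch: approximate the minimum norm point of B(F) with Frank-Wolfe, whose
# linear oracle over the base polytope is Edmonds' greedy algorithm.
import numpy as np

def greedy_vertex(F, g):
    """Edmonds' greedy algorithm: argmin over vertices v of B(F) of <g, v>."""
    v = np.zeros(len(g))
    prefix, prev = [], 0.0
    for i in np.argsort(g):            # increasing order of g minimizes <g, v>
        prefix.append(i)
        val = F(prefix)
        v[i] = val - prev
        prev = val
    return v

def min_norm_point(F, n, iters=2000):
    s = greedy_vertex(F, np.zeros(n))
    for t in range(iters):
        v = greedy_vertex(F, s)        # gradient of ||s||^2 / 2 is s
        s += 2.0 / (t + 2.0) * (v - s) # standard Frank-Wolfe step size
    return s

# Toy log-supermodular model p(S) ∝ exp(-F(S)): a chain cut plus unaries.
edges = [(0, 1), (1, 2), (2, 3)]
bias = np.array([0.8, -0.4, 0.2, -0.6])
def F(S):
    S = set(S)
    cut = sum(1.0 for a, b in edges if (a in S) != (b in S))
    return cut + sum(bias[i] for i in S)

s_star = min_norm_point(F, n=4)
marginals = 1.0 / (1.0 + np.exp(s_star))   # sigmoid(-s*_i)
print(marginals)
```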
As a popular Q&A site for programming, Stack Overflow is a treasure for developers. However, the amount of questions and answers on Stack Overflow make it difficult for developers to efficiently locate the information they are looking for. There are two gaps leading to poor search results: the gap between the users intention and the textual query, and the semantic gap between the query and the post content. Therefore, developers have to constantly reformulate their queries by correcting misspelled words, adding limitations to certain programming languages or platforms, etc. As query reformulation is tedious for developers, especially for novices, we propose an automated software-specific query reformulation approach based on deep learning. With query logs provided by Stack Overflow, we construct a large-scale query reformulation corpus, including the original queries and corresponding reformulated ones. Our approach trains a Transformer model that can automatically generate candidate reformulated queries when given the users original query. The evaluation results show that our approach outperforms five state-of-the-art baselines, and achieves a 5.6% to 33.5% boost in terms of $mathit{ExactMatch}$ and a 4.8% to 14.4% boost in terms of $mathit{GLEU}$.