Code changes constitute one of the most important features of software evolution. Studying them can provide insights into the nature of software development and also lead to practical solutions: recommendations and automation of popular changes for developers. In our work, we developed a tool called PythonChangeMiner that discovers code change patterns in the histories of Python projects. We validated the tool and then employed it to discover patterns in a dataset of 120 projects from four different domains of software engineering. We manually categorized patterns that occur in more than one project from the standpoint of their structure and content, and compared different domains and patterns in that regard. We conducted a survey of the authors of the discovered changes: 82.9% of them said that they could give the change a name and 57.9% expressed their desire to have the changes automated, indicating the ability of the tool to discover valuable patterns. Finally, we interviewed 9 members of a popular integrated development environment (IDE) development team to estimate the feasibility of automating the discovered changes. The interviews revealed that independence from the context and high precision made a pattern a better candidate for automation. The patterns received mainly positive reviews and several were ranked as very likely for automation.
Many code changes that developers make in their projects are repeated and constitute recurrent change patterns. It is of interest to collect such patterns from the version history of open-source repositories and suggest the most useful of them as quick fixes. In this paper, we present Revizor, a tool for building custom plugins for PyCharm, a popular Python IDE. A Revizor-based plugin can take change patterns and highlight potential places for their application in the developer's code editor. If the developer accepts the quick fix, the plugin automatically performs the edit. Our approach uses a graph-based representation of code changes, which allows it to support complex distributed code patterns. Experienced developers have rated the usability and the performance of such a Revizor-based plugin positively. The source code of the tool and test plugin prototype are available on GitHub: https://github.com/JetBrains-Research/revizor. A demonstration video with a short tool description can be found on YouTube: https://youtu.be/5eLs14nco7E.
Learning to solve diagrammatic reasoning (DR) problems is a challenging but interesting task for the computer vision research community. It is believed that next-generation pattern recognition applications should be able to simulate the human brain in understanding and reasoning about images. However, due to the lack of benchmarks for diagrammatic reasoning, present research primarily focuses on visual reasoning that can be applied to real-world objects. In this paper, we present a diagrammatic reasoning dataset that provides a large variety of DR problems. In addition, we propose a Knowledge-based Long Short Term Memory (KLSTM) to solve diagrammatic reasoning problems. Our analysis is arguably the first work in this research area. The proposed KLSTM framework is compared against several state-of-the-art learning frameworks. Preliminary results indicate that the domain is highly related to computer vision and pattern recognition research and offers several challenging avenues.
I examine the topic of training scientific generalists. To focus the discussion, I propose the creation of a new graduate program, analogous in structure to existing MD/PhD programs, aimed at training a critical mass of scientific researchers with substantial intellectual breadth. In addition to completing the normal requirements for a PhD, students would undergo an intense, several-year training period designed to expose them to the core vocabulary of multiple subjects at the graduate level. After providing some historical and philosophical context for this proposal, I outline how such a program could be implemented with little institutional overhead by existing research universities. Finally, I discuss alternative possibilities for training generalists by taking advantage of contemporary developments in online learning and open science.
Hashing produces compact representations for documents, so that tasks like classification or retrieval can be performed on these short codes. When hashing is supervised, the codes are trained using labels on the training data. This paper first shows that the evaluation protocols used in the literature for supervised hashing are not satisfactory: we show that a trivial solution that encodes the output of a classifier significantly outperforms existing supervised or semi-supervised methods, while using much shorter codes. We then propose two alternative protocols for supervised hashing: one based on retrieval on a disjoint set of classes, and another based on transfer learning to new classes. We provide two baseline methods for image-related tasks to assess the performance of (semi-)supervised hashing: without coding and with unsupervised codes. These baselines give a lower and an upper bound on the performance of a supervised hashing scheme.
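As a rough illustration of why encoding a classifier's output is such a strong trivial solution, the sketch below writes a predicted class index as a short binary code. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `class_to_code` and `hamming` are hypothetical, the classifier is assumed to exist elsewhere and to return an integer class index, and plain binary encoding of that index is an assumed choice of code.

```python
from typing import List


def class_to_code(class_index: int, num_classes: int) -> List[int]:
    """Encode a predicted class index as a fixed-length binary code.

    Assumption: the code is simply the binary expansion of the index,
    so it needs about log2(num_classes) bits.
    """
    if not 0 <= class_index < num_classes:
        raise ValueError("class_index must lie in [0, num_classes)")
    # Number of bits needed to represent the largest index (at least 1).
    n_bits = max(1, (num_classes - 1).bit_length())
    # Most significant bit first.
    return [(class_index >> b) & 1 for b in reversed(range(n_bits))]


def hamming(a: List[int], b: List[int]) -> int:
    """Hamming distance between two codes of equal length."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical usage: two items predicted as class 3 out of 10 classes
# receive identical 4-bit codes, so their Hamming distance is zero.
query_code = class_to_code(3, 10)
database_code = class_to_code(3, 10)
distance = hamming(query_code, database_code)
```

Under a protocol where query and database items share the same label set, any two items assigned the same class get identical codes, which is why retrieval quality then reduces to classifier accuracy.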
We calculate Bayes factors to quantify how the feasibility of the constrained minimal supersymmetric standard model (CMSSM) has changed in the light of a series of observations. This is done in the Bayesian spirit, where probability reflects a degree of belief in a proposition and Bayes' theorem tells us how to update it after acquiring new information. Our experimental baseline is the approximate knowledge that was available before LEP, and our comparison model is the Standard Model with a simple dark matter candidate. To quantify the amount by which experiments have altered our relative belief in the CMSSM since the baseline data, we compute the Bayes factors that arise from learning in sequence the LEP Higgs constraints, the XENON100 dark matter constraints, the 2011 LHC supersymmetry search results, and the early 2012 LHC Higgs search results. We find that LEP and the LHC strongly shatter our trust in the CMSSM (with $M_0$ and $M_{1/2}$ below 2 TeV), reducing its posterior odds by approximately two orders of magnitude. This reduction is largely due to substantial Occam factors induced by the LEP and LHC Higgs searches.
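The sequential updating described above can be written schematically as follows. The notation is an assumption introduced here for illustration: $d_0$ denotes the pre-LEP baseline data, $d_1,\dots,d_n$ the data sets learned in sequence, $B_i$ the Bayes factor for step $i$, and $\mathrm{SM{+}DM}$ the comparison model.

```latex
\[
\frac{P(\mathrm{CMSSM}\mid d_0,\dots,d_n)}{P(\mathrm{SM{+}DM}\mid d_0,\dots,d_n)}
  = \left[\prod_{i=1}^{n} B_i\right]
    \frac{P(\mathrm{CMSSM}\mid d_0)}{P(\mathrm{SM{+}DM}\mid d_0)},
\qquad
B_i = \frac{p(d_i \mid d_0,\dots,d_{i-1},\,\mathrm{CMSSM})}
           {p(d_i \mid d_0,\dots,d_{i-1},\,\mathrm{SM{+}DM})}.
\]
```

Each $B_i$ is a ratio of marginal likelihoods, so a model that spreads its prior over a large parameter volume excluded by the new data is penalized automatically; this is the Occam factor referred to above.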