Just as in production code, code smells also occur in test code, where they are called test smells. Test smells have a detrimental effect not only on the test code itself but also on the production code being tested. To date, the majority of research on test smells has focused on programming languages such as Java and Scala. However, despite Python's rapid growth in popularity in recent years, no automated tools are available to support the identification of test smells in Python. In this paper, we strive to extend the research to Python, build a tool for detecting test smells in this language, and conduct an empirical analysis of test smells in Python projects. We started by gathering a list of test smells from existing research and selecting those that can be considered language-agnostic or have similar functionality in Python's standard unittest framework. In total, we identified 17 diverse test smells. Additionally, we searched for Python-specific test smells by mining frequent code change patterns that can be considered as either fixing or introducing test smells. Based on these changes, we proposed our own test smell called Suboptimal assert. To detect all these test smells, we developed a tool called PyNose in the form of a plugin for PyCharm, a popular Python IDE. Finally, we conducted a large-scale empirical investigation aimed at analyzing the prevalence of test smells in Python code. Our results show that 98% of the projects and 84% of the test suites in the studied dataset contain at least one test smell. Our proposed Suboptimal assert smell was detected in as many as 70.6% of the projects, making it a valuable addition to the list.
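To illustrate the kind of pattern a Suboptimal assert smell targets, here is a minimal unittest sketch; the example and the helper get_user_name are illustrative assumptions, not code taken from the paper:

```python
import unittest


def get_user_name(user_id: int) -> str:
    # Hypothetical production function, included only to make the example runnable.
    return "alice"


class UserNameTest(unittest.TestCase):
    def test_user_name_suboptimal(self):
        # Smelly: a boolean assert wrapping a comparison; on failure it reports only
        # "False is not true" and hides the actual and expected values.
        self.assertTrue(get_user_name(42) == "alice")

    def test_user_name_improved(self):
        # Preferred: the specialised assert reports both values when it fails.
        self.assertEqual(get_user_name(42), "alice")


if __name__ == "__main__":
    unittest.main()
```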
TextAttack is an open-source Python toolkit for adversarial attacks, adversarial training, and data augmentation in NLP. TextAttack unites 15+ papers from the NLP adversarial attack literature into a single framework, with many components reused across attacks. This framework allows both researchers and developers to test and study the weaknesses of their NLP models. Building such an open-source NLP toolkit requires solving some common problems: How do we enable users to supply models from different deep learning frameworks? How can we build tools to support as many different datasets as possible? We share our insights into developing a well-written, well-documented NLP Python framework in the hope that they can aid future development of similar packages.
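As a usage sketch, the snippet below runs a published attack recipe against a Hugging Face model. The class and function names follow the TextAttack documentation, but exact signatures can change between versions, so treat the details as assumptions:

```python
import transformers
import textattack
from textattack.attack_recipes import TextFoolerJin2019
from textattack.datasets import HuggingFaceDataset
from textattack.models.wrappers import HuggingFaceModelWrapper

# Wrap a pretrained sentiment classifier so TextAttack can query it.
model = transformers.AutoModelForSequenceClassification.from_pretrained(
    "textattack/bert-base-uncased-imdb"
)
tokenizer = transformers.AutoTokenizer.from_pretrained("textattack/bert-base-uncased-imdb")
model_wrapper = HuggingFaceModelWrapper(model, tokenizer)

# Build an attack from one of the bundled recipes and run it on a few examples.
attack = TextFoolerJin2019.build(model_wrapper)
dataset = HuggingFaceDataset("imdb", split="test")
attack_args = textattack.AttackArgs(num_examples=10)
textattack.Attacker(attack, dataset, attack_args).attack_dataset()
```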
Test bots are automated testing tools that autonomously and periodically run a set of test cases to check whether the system under test meets the requirements set forth by the customer. The automation decreases the amount of time a development team spends on testing. As development projects grow larger, it becomes important to improve test bots by designing more effective test cases; otherwise, time and usage costs can increase greatly, and misleading conclusions, such as false positives in the test execution, might be drawn from the test results. However, the literature currently lacks insights into how test case design affects the effectiveness of test bots. This paper uses a case study approach to investigate these effects by identifying challenges in designing tests for test bots. Our results include guidelines for a test design schema for such bots that supports practitioners in overcoming the challenges mentioned by participants during our study.
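As a rough illustration of the concept (not an artifact of the paper), a test bot can be as simple as a script that periodically and autonomously runs a test suite and records the outcome; the hourly interval and the use of pytest below are assumptions:

```python
import subprocess
import time

INTERVAL_SECONDS = 60 * 60  # run the suite once per hour


def run_suite() -> bool:
    # pytest exits with code 0 only when every collected test passes.
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    print(result.stdout)
    return result.returncode == 0


if __name__ == "__main__":
    while True:
        print("suite passed" if run_suite() else "suite failed")
        time.sleep(INTERVAL_SECONDS)
```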
Background: Previous studies have shown that up to 99.59% of Java apps using crypto APIs misuse the API at least once. However, these studies have been conducted on Java and C, while empirical studies for other languages are missing. For example, a controlled user study with crypto tasks in Python has shown that 68.5% of professional developers write a secure solution for a crypto task. Aims: To understand whether this observation holds for real-world code, we conducted a study of crypto misuses in Python. Method: We developed a static analysis tool that covers common misuses of 5 different Python crypto APIs. With this analysis, we analyzed 895 popular Python projects from GitHub and 51 MicroPython projects for embedded devices. Further, we compared our results with the findings of previous studies. Results: Our analysis reveals that 52.26% of the Python projects have at least one misuse. Further, the API design of some Python crypto libraries helps keep developers from misusing crypto functions, misuses that were much more common in studies conducted with Java and C code. Conclusion: We conclude that good API design has a visible positive impact on crypto misuses in Python applications. Further, our analysis of MicroPython projects reveals the importance of hybrid analyses.
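For illustration, the snippet below shows one misuse pattern of the kind such analyses typically flag, together with a safer alternative, using the pycryptodome library; the example is ours and is not necessarily among the exact rules implemented by the paper's tool:

```python
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes

key = get_random_bytes(16)

# Misuse: ECB mode is deterministic per block and leaks plaintext patterns.
insecure = AES.new(key, AES.MODE_ECB)
leaky_ciphertext = insecure.encrypt(b"16-byte message!")

# Safer: an authenticated mode such as GCM with a fresh, random nonce per message.
nonce = get_random_bytes(12)
secure = AES.new(key, AES.MODE_GCM, nonce=nonce)
ciphertext, tag = secure.encrypt_and_digest(b"sensitive payload")
```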
Background: Test-driven development (TDD) is a technique that repeats short coding cycles interleaved with testing. The developer first writes a unit test for the desired functionality, then writes the necessary production code, and finally refactors the code. Many empirical studies neglect unique process characteristics related to TDD's iterative nature. Aim: We formulate four process characteristics: sequencing, granularity, uniformity, and refactoring effort. We investigate how these characteristics impact quality and productivity in TDD and related variations. Method: We analyzed 82 data points collected from 39 professionals, each capturing the process used while performing a specific development task. We built regression models to assess the impact of process characteristics on quality and productivity. Quality was measured by functional correctness. Results: Quality and productivity improvements were primarily positively associated with granularity and uniformity. Sequencing, the order in which test and production code are written, had no important influence. Refactoring effort was negatively associated with both outcomes. We explain the unexpected negative correlation with quality by the possible prevalence of mixed refactoring. Conclusion: The claimed benefits of TDD may not be due to its distinctive test-first dynamic, but rather to the fact that TDD-like processes encourage fine-grained, steady steps that improve focus and flow.
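For readers unfamiliar with the test-first dynamic, the following sketch (names and task are illustrative, not from the study) shows a single fine-grained TDD cycle in Python: the unit test is written first, then just enough production code is added to make it pass before refactoring:

```python
import unittest


def slugify(title: str) -> str:
    # Production code written after the test below, kept to the minimum needed to pass.
    return title.strip().lower().replace(" ", "-")


class SlugifyTest(unittest.TestCase):
    def test_spaces_become_dashes(self):
        # Written first; it fails until slugify is implemented.
        self.assertEqual(slugify("Hello World"), "hello-world")


if __name__ == "__main__":
    unittest.main()
```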
Diversity has been used as an effective criterion to optimise test suites for cost-effective testing. In particular, diversity-based (alternatively referred to as similarity-based) techniques have the benefit of being generic and applicable across different Systems Under Test (SUT), and have been used to automatically select or prioritise large sets of test cases. However, it is a challenge to feed diversity information back to developers and testers, since the results are typically multi-dimensional. Furthermore, the generality of diversity-based approaches makes it harder to choose when and where to apply them. In this paper, we address these challenges by investigating: i) the trade-offs of using different sources of diversity (e.g., diversity of test requirements or test scripts) to optimise large test suites, and ii) how visualisation of test diversity data can assist testers in test optimisation and improvement. We perform a case study on three industrial projects and present quantitative results on the fault detection capabilities and redundancy levels of different sets of test cases. Our key result is that test similarity maps, based on pair-wise diversity calculations, helped industrial practitioners identify issues with their test repositories and decide on actions for improvement. We conclude that the visualisation of diversity information can assist testers in their maintenance and optimisation activities.
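As a simplified sketch of the pair-wise diversity calculations behind such similarity maps (the test representation and the choice of Jaccard distance are assumptions, not necessarily the measures used in the paper), test scripts can be compared as sets of steps:

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    # 0.0 means identical step sets, 1.0 means completely disjoint.
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Each test case is represented by the set of steps (tokens) in its script.
tests = {
    "test_login":      {"open", "enter_user", "enter_pass", "click_login", "assert_home"},
    "test_login_copy": {"open", "enter_user", "enter_pass", "click_login", "assert_home"},
    "test_logout":     {"open", "click_logout", "assert_login_page"},
}

# The pair-wise distances form the matrix behind a test similarity map;
# values near 0 flag redundant test cases that are candidates for removal.
for (name_a, steps_a), (name_b, steps_b) in combinations(tests.items(), 2):
    print(f"{name_a} vs {name_b}: {jaccard_distance(steps_a, steps_b):.2f}")
```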