
Prioritize Crowdsourced Test Reports via Deep Screenshot Understanding

Added by Shengcheng Yu
Publication date: 2021
Language: English





Crowdsourced testing is increasingly dominant in mobile application (app) testing, but inspecting the huge number of test reports is a great burden for app developers. Many approaches have been proposed to handle test reports based only on texts or, additionally, simple image features. However, in mobile app testing, the texts contained in test reports are condensed and provide inadequate information. Many screenshots are included as complements and contain much richer information than the texts. This trend motivates us to prioritize crowdsourced test reports based on deep screenshot understanding. In this paper, we present a novel crowdsourced test report prioritization approach, named DeepPrior. We first represent the crowdsourced test reports with a newly introduced feature, DeepFeature, which includes all the widgets along with their texts, coordinates, types, and even intents, based on a deep analysis of the app screenshots and the textual descriptions in the crowdsourced test reports. DeepFeature comprises the Bug Feature, which directly describes the bug, and the Context Feature, which depicts the thorough context of the bug. The similarity of the DeepFeature, which we formally define as DeepSimilarity, is used to represent the similarity of test reports and to prioritize them. We also conduct an empirical experiment to evaluate the effectiveness of the proposed technique on a large group of datasets. The results show that DeepPrior is promising: it outperforms the state-of-the-art approach with less than half the overhead.
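As a rough illustration of how similarity-driven prioritization of this kind can work, the sketch below represents each report with a bug feature and a context feature, scores pairs with a toy stand-in for DeepSimilarity, and orders reports so that the least similar (most likely distinct) bugs surface first. The feature encoding, weights, and greedy strategy are assumptions made for illustration, not DeepPrior's published definitions.

```python
# Hypothetical sketch of similarity-based report prioritization; field names
# and weights are illustrative, not DeepPrior's actual definitions.
from dataclasses import dataclass, field
from typing import List


@dataclass
class TestReport:
    report_id: str
    bug_feature: dict = field(default_factory=dict)      # widgets/text directly describing the bug
    context_feature: dict = field(default_factory=dict)  # surrounding widgets, screenshot context


def deep_similarity(a: TestReport, b: TestReport,
                    bug_weight: float = 0.6, ctx_weight: float = 0.4) -> float:
    """Toy stand-in for DeepSimilarity: Jaccard overlap of feature keys,
    weighted between bug and context features (the weights are assumptions)."""
    def jaccard(x: dict, y: dict) -> float:
        kx, ky = set(x), set(y)
        return len(kx & ky) / len(kx | ky) if kx | ky else 0.0
    return bug_weight * jaccard(a.bug_feature, b.bug_feature) + \
           ctx_weight * jaccard(a.context_feature, b.context_feature)


def prioritize(reports: List[TestReport]) -> List[TestReport]:
    """Greedy diversity-first ordering: repeatedly pick the report least
    similar to those already selected, so distinct bugs surface early."""
    remaining = list(reports)
    ordered: List[TestReport] = []
    while remaining:
        if not ordered:
            ordered.append(remaining.pop(0))
            continue
        nxt = min(remaining,
                  key=lambda r: max(deep_similarity(r, s) for s in ordered))
        remaining.remove(nxt)
        ordered.append(nxt)
    return ordered
```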




Read More

Crowdsourced testing, as a distinct testing paradigm, has attracted much attention in software testing, especially in the mobile application (app) testing field. Compared with in-house testing, crowdsourced testing performs better because it utilizes the diverse testing environments of different crowdworkers, which addresses the mobile testing fragmentation problem. However, crowdsourced testing also brings some problems. The crowdworkers involved have varying expertise and are not professional testers. Therefore, the reports they submit are numerous and of uneven quality. App developers have to distinguish high-quality reports from low-quality ones to support bug revealing and fixing. Some crowdworkers submit inconsistent test reports, in which the textual descriptions do not focus on the attached screenshots showing the bug occurrence. Such reports waste both the time and the human resources of app development and testing. To solve this problem, we propose ReCoDe in this paper, which is designed to detect the consistency of crowdsourced test reports via deep image-and-text fusion understanding. First, according to a pre-conducted survey, ReCoDe classifies the crowdsourced test reports into 10 categories, which cover the vast majority of the problems reported in test reports. Then, for each category of bugs, we apply a distinct processing model. The models perform a deep fusion understanding of both image information and textual descriptions. We have also conducted an experiment to evaluate ReCoDe, and the results show its effectiveness in detecting the consistency of crowdsourced test reports.
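A minimal sketch of the consistency-checking idea follows, assuming a pretrained text encoder, precomputed screenshot features, and a report-category classifier are supplied by upstream models; ReCoDe's actual per-category models and thresholds are not reproduced here.

```python
# Illustrative image-and-text consistency check; the encoders, classifier,
# and per-category thresholds are assumed inputs, not ReCoDe's components.
from typing import Callable
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))


def is_consistent(description: str,
                  screenshot_features: np.ndarray,
                  classify_category: Callable[[str], str],
                  text_encoder: Callable[[str], np.ndarray],
                  thresholds: dict) -> bool:
    """Decide whether the textual bug description matches the screenshot."""
    category = classify_category(description)       # one of the bug categories
    text_vec = text_encoder(description)             # embed the description
    score = cosine(text_vec, screenshot_features)    # image/text agreement score
    return score >= thresholds.get(category, 0.5)    # per-category decision rule
```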
Testing is the most direct and effective technique to ensure software quality. However, it is a burden for developers to understand poorly commented tests, which are common in industrial projects. Mobile applications (apps) are GUI-intensive and event-driven, so test scripts focusing on GUI interactions play an important role in mobile app testing, in addition to the test cases for the source code. Therefore, more attention should be paid to user interactions and the corresponding user event responses. However, test scripts are loosely linked to the apps under test (AUT) through widget selectors, making it hard to map the operations to the functionality code of the AUT. In such a situation, code understanding algorithms may lose efficacy if directly applied to mobile app test scripts. We present a novel approach, TestIntent, to infer the intent of mobile app test scripts. TestIntent combines GUI image understanding and code understanding technologies. The test script is transformed into an operation sequence model. For each operation, TestIntent extracts the operated widget selector and links the selector to the UI layout structure, which stores the detailed information of the widgets, including coordinates, type, etc. With code understanding technologies, TestIntent can locate response methods in the source code. Afterwards, NLP algorithms are adopted to understand the code and generate descriptions. TestIntent can also locate widgets on the app GUI images and then understand the widget intent with an encoder-decoder model. By combining the results from GUI and code understanding, TestIntent generates the test intents in natural language. We also conduct an empirical experiment, and the results demonstrate the outstanding performance of TestIntent. A user study further shows that TestIntent saves developers' time in understanding test scripts.
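To make the first steps of such a pipeline concrete, the sketch below parses a hypothetical Espresso-style script into an operation sequence and joins each operated widget with layout metadata and a response-method name supplied by external code analysis. The regular expression, layout format, and handler map are illustrative assumptions, not TestIntent's implementation.

```python
# Sketch: test script -> operation sequence -> widget/response-method mapping.
import re
from typing import Dict, List, Tuple

# Example Espresso-style script line (assumed format):
#   onView(withId(R.id.login_button)).perform(click());
SCRIPT_LINE = re.compile(r"withId\(R\.id\.(\w+)\)\)\.perform\((\w+)\(\)\)")


def parse_operations(script: str) -> List[Tuple[str, str]]:
    """Extract (widget_id, action) pairs in execution order."""
    return [(m.group(1), m.group(2)) for m in SCRIPT_LINE.finditer(script)]


def link_to_response_methods(operations: List[Tuple[str, str]],
                             layout: Dict[str, dict],
                             handlers: Dict[str, str]) -> List[dict]:
    """Join each operation with widget metadata from the UI layout and the
    response method found by code analysis (both supplied externally)."""
    sequence = []
    for widget_id, action in operations:
        sequence.append({
            "widget": widget_id,
            "action": action,
            "bounds": layout.get(widget_id, {}).get("bounds"),
            "type": layout.get(widget_id, {}).get("type"),
            "response_method": handlers.get(widget_id),  # e.g. "LoginActivity.onLoginClicked"
        })
    return sequence
```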
We introduce a new logic named Quantitative Confidence Logic (QCL) that quantifies the level of confidence one has in the conclusion of a proof. By translating a fault tree representing a system's architecture into a proof, we show how to use QCL to give a solution to the test resource allocation problem that takes the given architecture into account. We implemented a tool called Astrahl and compared our results to other testing resource allocation strategies.
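QCL's inference rules are not spelled out in this abstract. Purely as a generic illustration of the kind of computation involved, the sketch below propagates per-component confidence values upward through a fault tree using a common min/noisy-OR simplification; this is not QCL itself and not Astrahl's algorithm.

```python
# Generic fault-tree confidence propagation (illustration only, not QCL).
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    confidence: float  # confidence gained by testing this component


@dataclass
class Gate:
    kind: str                             # "AND" or "OR"
    children: List[Union["Gate", Leaf]]


def propagate(node: Union[Gate, Leaf]) -> float:
    """Combine child confidences upward through the fault tree."""
    if isinstance(node, Leaf):
        return node.confidence
    child_conf = [propagate(c) for c in node.children]
    if node.kind == "AND":
        return min(child_conf)            # all children must hold: weakest link
    prod = 1.0
    for c in child_conf:                  # OR gate: combine as independent evidence
        prod *= (1.0 - c)
    return 1.0 - prod


# Example: a system whose failure requires either of two subsystems to fail.
tree = Gate("OR", [Leaf(0.7), Gate("AND", [Leaf(0.9), Leaf(0.6)])])
print(propagate(tree))
```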
It is integral to test the API functions of widely used deep learning (DL) libraries. The effectiveness of such testing requires DL-specific input constraints on these API functions. Such constraints enable the generation of valid inputs, i.e., inputs that follow these DL-specific constraints, to explore deeply and test the core functionality of API functions. Existing fuzzers have no knowledge of such constraints, and existing constraint extraction techniques are ineffective at extracting DL-specific input constraints. To fill this gap, we design and implement a document-guided fuzzing technique, D2C, for the API functions of DL libraries. D2C leverages sequential pattern mining to generate rules for extracting DL-specific constraints from API documents and uses these constraints to guide the fuzzing to generate valid inputs automatically. D2C also generates inputs that violate these constraints to test the input validity checking code. In addition, D2C uses the constraints to generate boundary inputs to detect more bugs. Our evaluation on three popular DL libraries (TensorFlow, PyTorch, and MXNet) shows that D2C's accuracy in extracting input constraints is 83.3% to 90.0%. D2C detects 121 bugs, while a baseline fuzzer without input constraints detects only 68 bugs. Most (89) of the 121 bugs are previously unknown, 54 of which have been fixed or confirmed by developers after we reported them. In addition, D2C detects 38 inconsistencies within documents, 28 of which were fixed or confirmed after we reported them.
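The sketch below illustrates constraint-guided input generation in the spirit of document-guided fuzzing: from one extracted constraint it derives a valid input, a constraint-violating input, and a boundary input. The constraint schema and values are illustrative assumptions rather than D2C's actual extraction rules.

```python
# Sketch of constraint-guided input generation for a DL API argument.
import numpy as np

# A constraint as it might be extracted from an API document, e.g. for an
# argument documented as "a float tensor of rank 2 with values in [0, 1]".
constraint = {"dtype": np.float32, "rank": 2, "min": 0.0, "max": 1.0}


def gen_valid(c, size=4):
    """Input that satisfies every extracted constraint."""
    shape = (size,) * c["rank"]
    return np.random.uniform(c["min"], c["max"], shape).astype(c["dtype"])


def gen_violating(c, size=4):
    """Input that deliberately breaks one constraint (wrong dtype here),
    used to exercise the API's input-validation code."""
    shape = (size,) * c["rank"]
    return np.random.uniform(c["min"], c["max"], shape).astype(np.int64)


def gen_boundary(c, size=4):
    """Input pinned at the documented boundary value."""
    shape = (size,) * c["rank"]
    return np.full(shape, c["max"], dtype=c["dtype"])


# A fuzzing loop would feed these three kinds of inputs to the API under test
# (e.g. a TensorFlow or PyTorch function) and watch for crashes or wrong results.
for payload in (gen_valid(constraint), gen_violating(constraint), gen_boundary(constraint)):
    print(payload.dtype, payload.shape)
```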
A Graphical User Interface (GUI) provides a visual bridge between a software application and its end users, through which they can interact with each other. With the development of technology and aesthetics, the visual effects of GUIs are increasingly attractive. However, such GUI complexity poses a great challenge to GUI implementation. According to our pilot study of crowdtesting bug reports, display issues such as text overlap, blurred screens, and missing images often occur during GUI rendering on different devices due to software or hardware compatibility. They negatively influence app usability, resulting in poor user experience. To detect these issues, we propose a novel approach, OwlEye, based on deep learning for modelling the visual information of GUI screenshots. OwlEye can thus detect GUIs with display issues and also locate the detailed region of the issue in the given GUI, guiding developers to fix the bug. We manually construct a large-scale labelled dataset with 4,470 GUI screenshots containing UI display issues and develop a heuristics-based data augmentation method for boosting the performance of OwlEye. The evaluation demonstrates that OwlEye achieves 85% precision and 84% recall in detecting UI display issues, and 90% accuracy in localizing these issues. We also evaluate OwlEye with popular Android apps on Google Play and F-droid, and successfully uncover 57 previously undetected UI display issues, with 26 of them confirmed or fixed so far.
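As a rough sketch of screenshot-level detection and localization, the code below shows a small CNN classifier whose intermediate feature map doubles as a coarse localization heatmap; the architecture and heatmap scheme are generic stand-ins, not OwlEye's published model, and no training is shown.

```python
# Minimal sketch of GUI display-issue detection with a CNN (illustration only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class DisplayIssueNet(nn.Module):
    """Binary classifier: does this GUI screenshot contain a display issue?"""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2),
        )

    def forward(self, x):
        fmap = self.features(x)        # spatial feature map, reused for localization
        return self.head(fmap), fmap


model = DisplayIssueNet()
screenshot = torch.rand(1, 3, 224, 224)    # a dummy screenshot tensor
logits, fmap = model(screenshot)
# Coarse localization: average channel activations and upsample; highly
# activated regions hint at where the display issue likely sits.
heatmap = F.interpolate(fmap.mean(dim=1, keepdim=True),
                        size=screenshot.shape[-2:], mode="bilinear",
                        align_corners=False)
print(logits.shape, heatmap.shape)
```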