Object Detection for Graphical User Interface: Old Fashioned or Deep Learning or a Combination?

71 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Jieshan Chen

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Jieshan Chen - Mulong Xie - Zhenchang Xing

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Detecting Graphical User Interface (GUI) elements in GUI images is a domain-specific object detection task. It supports many software engineering tasks, such as GUI animation and testing, GUI search and code generation. Existing studies for GUI element detection directly borrow the mature methods from computer vision (CV) domain, including old fashioned ones that rely on traditional image processing features (e.g., canny edge, contours), and deep learning models that learn to detect from large-scale GUI data. Unfortunately, these CV methods are not originally designed with the awareness of the unique characteristics of GUIs and GUI elements and the high localization accuracy of the GUI element detection task. We conduct the first large-scale empirical study of seven representative GUI element detection methods on over 50k GUI images to understand the capabilities, limitations and effective designs of these methods. This study not only sheds the light on the technical challenges to be addressed but also informs the design of new GUI element detection methods. We accordingly design a new GUI-specific old-fashioned method for non-text GUI element detection which adopts a novel top-down coarse-to-fine strategy, and incorporate it with the mature deep learning model for GUI text detection.Our evaluation on 25,000 GUI images shows that our method significantly advances the start-of-the-art performance in GUI element detection.

قيم البحث

61 - Dipu Manandhar , Hailin Jin , John Collomosse 2021

We present Magic Layouts; a method for parsing screenshots or hand-drawn sketches of user interface (UI) layouts. Our core contribution is to extend existing detectors to exploit a learned structural prior for UI designs, enabling robust detection of UI components; buttons, text boxes and similar. Specifically we learn a prior over mobile UI layouts, encoding common spatial co-occurrence relationships between different UI components. Conditioning region proposals using this prior leads to performance gains on UI layout parsing for both hand-drawn UIs and app screenshots, which we demonstrate within the context an interactive application for rapidly acquiring digital prototypes of user experience (UX) designs.

الرؤية الحاسوبية وتمييز الأنماط تفاعل الإنسان والحاسوب التعلم الآلي

Intelligent Exploration for User Interface Modules of Mobile App with Collective Learning

67 - Jingbo Zhou , Zhenwei Tang , Min Zhao 2020

A mobile app interface usually consists of a set of user interface modules. How to properly design these user interface modules is vital to achieving user satisfaction for a mobile app. However, there are few methods to determine design variables for user interface modules except for relying on the judgment of designers. Usually, a laborious post-processing step is necessary to verify the key change of each design variable. Therefore, there is a only very limited amount of design solutions that can be tested. It is timeconsuming and almost impossible to figure out the best design solutions as there are many modules. To this end, we introduce FEELER, a framework to fast and intelligently explore design solutions of user interface modules with a collective machine learning approach. FEELER can help designers quantitatively measure the preference score of different design solutions, aiming to facilitate the designers to conveniently and quickly adjust user interface module. We conducted extensive experimental evaluations on two real-life datasets to demonstrate its applicability in real-life cases of user interface module design in the Baidu App, which is one of the most popular mobile apps in China.

هندسة البرمجيات تفاعل الإنسان والحاسوب التعلم الآلي

GuiTeNet: A graphical user interface for tensor networks

73 - Lisa Sahlmann , Christian B. Mendl 2018

We introduce a graphical user interface for constructing arbitrary tensor networks and specifying common operations like contractions or splitting, denoted GuiTeNet. Tensors are represented as nodes with attached legs, corresponding to the ordered di mensions of the tensor. GuiTeNet visualizes the current network, and instantly generates Python/NumPy source code for the hitherto sequence of user actions. Support for additional programming languages is planned for the future. We discuss the elementary operations on tensor networks used by GuiTeNet, together with high-level optimization strategies. The software runs directly in web browsers and is available online at http://guitenet.org.

البرمجيات الرياضية الإلكترونات المرتبطة بشدة الفيزياء الحسابية

Deep Learning for Weakly-Supervised Object Detection and Object Localization: A Survey

178 - Feifei Shao , Long Chen , Jian Shao 2021

Weakly-Supervised Object Detection (WSOD) and Localization (WSOL), i.e., detecting multiple and single instances with bounding boxes in an image using image-level labels, are long-standing and challenging tasks in the CV community. With the success o f deep neural networks in object detection, both WSOD and WSOL have received unprecedented attention. Hundreds of WSOD and WSOL methods and numerous techniques have been proposed in the deep learning era. To this end, in this paper, we consider WSOL is a sub-task of WSOD and provide a comprehensive survey of the recent achievements of WSOD. Specifically, we firstly describe the formulation and setting of the WSOD, including the background, challenges, basic framework. Meanwhile, we summarize and analyze all advanced techniques and training tricks for improving detection performance. Then, we introduce the widely-used datasets and evaluation metrics of WSOD. Lastly, we discuss the future directions of WSOD. We believe that these summaries can help pave a way for future research on WSOD and WSOL.

الرؤية الحاسوبية وتمييز الأنماط

Cats or CAT scans: transfer learning from natural or medical image source datasets?

84 - Veronika Cheplygina 2018

Transfer learning is a widely used strategy in medical image analysis. Instead of only training a network with a limited amount of data from the target task of interest, we can first train the network with other, potentially larger source datasets, c reating a more robust model. The source datasets do not have to be related to the target task. For a classification task in lung CT images, we could use both head CT images, or images of cats, as the source. While head CT images appear more similar to lung CT images, the number and diversity of cat images might lead to a better model overall. In this survey we review a number of papers that have performed similar comparisons. Although the answer to which strategy is best seems to be it depends, we discuss a number of research directions we need to take as a community, to gain more understanding of this topic.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي التعلم الآلي