Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search

292 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Yifan Gong

تاريخ النشر 2021

مجال البحث هندسة إلكترونية الهندسة المعلوماتية

والبحث باللغة English

تأليف Zheng Zhan - Yifan Gong - Pu Zhao

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Though recent years have witnessed remarkable progress in single image super-resolution (SISR) tasks with the prosperous development of deep neural networks (DNNs), the deep learning methods are confronted with the computation and memory consumption issues in practice, especially for resource-limited platforms such as mobile devices. To overcome the challenge and facilitate the real-time deployment of SISR tasks on mobile, we combine neural architecture search with pruning search and propose an automatic search framework that derives sparse super-resolution (SR) models with high image quality while satisfying the real-time inference requirement. To decrease the search cost, we leverage the weight sharing strategy by introducing a supernet and decouple the search problem into three stages, including supernet construction, compiler-aware architecture and pruning search, and compiler-aware pruning ratio search. With the proposed framework, we are the first to achieve real-time SR inference (with only tens of milliseconds per frame) for implementing 720p resolution with competitive image quality (in terms of PSNR and SSIM) on mobile platforms (Samsung Galaxy S20).

قيم البحث

224 - Han Huang , Li Shen , Chaoyang He 2021

Single Image Super-Resolution (SISR) tasks have achieved significant performance with deep neural networks. However, the large number of parameters in CNN-based methods for SISR tasks require heavy computations. Although several efficient SISR models have been recently proposed, most are handcrafted and thus lack flexibility. In this work, we propose a novel differentiable Neural Architecture Search (NAS) approach on both the cell-level and network-level to search for lightweight SISR models. Specifically, the cell-level search space is designed based on an information distillation mechanism, focusing on the combinations of lightweight operations and aiming to build a more lightweight and accurate SR structure. The network-level search space is designed to consider the feature connections among the cells and aims to find which information flow benefits the cell most to boost the performance. Unlike the existing Reinforcement Learning (RL) or Evolutionary Algorithm (EA) based NAS methods for SISR tasks, our search pipeline is fully differentiable, and the lightweight SISR models can be efficiently searched on both the cell-level and network-level jointly on a single GPU. Experiments show that our methods can achieve state-of-the-art performance on the benchmark datasets in terms of PSNR, SSIM, and model complexity with merely 68G Multi-Adds for $times 2$ and 18G Multi-Adds for $times 4$ SR tasks. Code will be available at url{https://github.com/DawnHH/DLSR-PyTorch}.

معالجة الصور والفيديو الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Real-Time Quantized Image Super-Resolution on Mobile NPUs, Mobile AI 2021 Challenge: Report

107 - Andrey Ignatov , Radu Timofte , Maurizio Denna 2021

Image super-resolution is one of the most popular computer vision problems with many important applications to mobile devices. While many solutions have been proposed for this task, they are usually not optimized even for common smartphone AI hardwar e, not to mention more constrained smart TV platforms that are often supporting INT8 inference only. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based image super-resolution solutions that can demonstrate a real-time performance on mobile or edge NPUs. For this, the participants were provided with the DIV2K dataset and trained quantized models to do an efficient 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated NPU capable of accelerating quantized neural networks. The proposed solutions are fully compatible with all major mobile AI accelerators and are capable of reconstructing Full HD images under 40-60 ms while achieving high fidelity results. A detailed description of all models developed in the challenge is provided in this paper.

معالجة الصور والفيديو الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Real-Time Video Super-Resolution on Smartphones with Deep Learning, Mobile AI 2021 Challenge: Report

148 - Andrey Ignatov , Andres Romero , Heewon Kim 2021

Video super-resolution has recently become one of the most important mobile-related problems due to the rise of video communication and streaming services. While many solutions have been proposed for this task, the majority of them are too computatio nally expensive to run on portable devices with limited hardware resources. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based video super-resolution solutions that can achieve a real-time performance on mobile GPUs. The participants were provided with the REDS dataset and trained their models to do an efficient 4X video upscaling. The runtime of all models was evaluated on the OPPO Find X2 smartphone with the Snapdragon 865 SoC capable of accelerating floating-point networks on its Adreno GPU. The proposed solutions are fully compatible with any mobile GPU and can upscale videos to HD resolution at up to 80 FPS while demonstrating high fidelity results. A detailed description of all models developed in the challenge is provided in this paper.

معالجة الصور والفيديو الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Achieving Real-Time Object Detection on MobileDevices with Neural Pruning Search

144 - Pu Zhao , Wei Niu , Geng Yuan 2021

Object detection plays an important role in self-driving cars for security development. However, mobile systems on self-driving cars with limited computation resources lead to difficulties for object detection. To facilitate this, we propose a compil er-aware neural pruning search framework to achieve high-speed inference on autonomous vehicles for 2D and 3D object detection. The framework automatically searches the pruning scheme and rate for each layer to find a best-suited pruning for optimizing detection accuracy and speed performance under compiler optimization. Our experiments demonstrate that for the first time, the proposed method achieves (close-to) real-time, 55ms and 99ms inference times for YOLOv4 based 2D object detection and PointPillars based 3D detection, respectively, on an off-the-shelf mobile phone with minor (or no) accuracy loss.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices

115 - Wei Niu , Mengshu Sun , Zhengang Li 2020

Mobile devices are becoming an important carrier for deep learning tasks, as they are being equipped with powerful, high-end mobile CPUs and GPUs. However, it is still a challenging task to execute 3D Convolutional Neural Networks (CNNs) targeting fo r real-time performance, besides high inference accuracy. The reason is more complex model structure and higher model dimensionality overwhelm the available computation/storage resources on mobile devices. A natural way may be turning to deep learning weight pruning techniques. However, the direct generalization of existing 2D CNN weight pruning methods to 3D CNNs is not ideal for fully exploiting mobile parallelism while achieving high inference accuracy. This paper proposes RT3D, a model compression and mobile acceleration framework for 3D CNNs, seamlessly integrating neural network weight pruning and compiler code generation techniques. We propose and investigate two structured sparsity schemes i.e., the vanilla structured sparsity and kernel group structured (KGS) sparsity that are mobile acceleration friendly. The vanilla sparsity removes whole kernel groups, while KGS sparsity is a more fine-grained structured sparsity that enjoys higher flexibility while exploiting full on-device parallelism. We propose a reweighted regularization pruning algorithm to achieve the proposed sparsity schemes. The inference time speedup due to sparsity is approaching the pruning rate of the whole model FLOPs (floating point operations). RT3D demonstrates up to 29.1$times$ speedup in end-to-end inference time comparing with current mobile frameworks supporting 3D CNNs, with moderate 1%-1.5% accuracy loss. The end-to-end inference time for 16 video frames could be within 150 ms, when executing representative C3D and R(2+1)D models on a cellphone. For the first time, real-time execution of 3D CNNs is achieved on off-the-shelf mobiles.

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط الحوسبة العصبية والتطورية