SplitSR: An End-to-End Approach to Super-Resolution on Mobile Devices


Published by Xin Liu

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Xin Liu, Yuang Li, Josh Fromm

Human-Computer Interaction, Computer Vision and Pattern Recognition


Super-resolution (SR) is a coveted image processing technique for mobile apps ranging from the basic camera apps to mobile health. Existing SR algorithms rely on deep learning models with significant memory requirements, so they have yet to be deployed on mobile devices and instead operate in the cloud to achieve feasible inference time. This shortcoming prevents existing SR methods from being used in applications that require near real-time latency. In this work, we demonstrate state-of-the-art latency and accuracy for on-device super-resolution using a novel hybrid architecture called SplitSR and a novel lightweight residual block called SplitSRBlock. The SplitSRBlock supports channel-splitting, allowing the residual blocks to retain spatial information while reducing the computation in the channel dimension. SplitSR has a hybrid design consisting of standard convolutional blocks and lightweight residual blocks, allowing people to tune SplitSR for their computational budget. We evaluate our system on a low-end ARM CPU, demonstrating both higher accuracy and up to 5 times faster inference than previous approaches. We then deploy our model onto a smartphone in an app called ZoomSR to demonstrate the first-ever instance of on-device, deep learning-based SR. We conducted a user study with 15 participants to have them assess the perceived quality of images that were post-processed by SplitSR. Relative to bilinear interpolation -- the existing standard for on-device SR -- participants showed a statistically significant preference when looking at both images (Z=-9.270, p<0.01) and text (Z=-6.486, p<0.01).
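As a rough illustration of the channel-splitting idea described in the abstract, the sketch below transforms only a fraction of the channels and passes the rest through unchanged. It is not the authors' implementation: the function names, the `ratio` parameter, the use of a pointwise (1x1) transform in place of a full convolution, the omission of the residual connection, and the channel ordering after concatenation are all assumptions made for illustration.

```python
import numpy as np


def conv_macs(height, width, c_in, c_out, kernel=3):
    """Multiply-accumulates for one stride-1, 'same'-padded convolution."""
    return height * width * c_in * c_out * kernel * kernel


def channel_split_block(x, weight):
    """Transform the first k channels of x and pass the others through.

    x:      feature map of shape (H, W, C)
    weight: (k, k) matrix acting as a pointwise (1x1) transform, k <= C
            (a hypothetical stand-in for a real convolution)
    """
    k = weight.shape[0]
    active, passive = x[..., :k], x[..., k:]
    processed = np.maximum(active @ weight, 0.0)  # pointwise transform + ReLU
    # Assumed ordering: untouched channels go first, so a following block
    # that again takes the first k channels sees different ones.
    return np.concatenate([passive, processed], axis=-1)


if __name__ == "__main__":
    # Cost of convolving all 64 channels vs. only a quarter of them.
    full = conv_macs(32, 32, 64, 64)
    split = conv_macs(32, 32, 16, 16)
    print(full // split)  # 16

    x = np.random.default_rng(0).normal(size=(4, 4, 8))
    y = channel_split_block(x, np.eye(2))
    print(y.shape)  # (4, 4, 8): spatial size and channel count are unchanged
```

Because convolution cost scales with the product of input and output channels, processing a quarter of the channels cuts the multiply-accumulate count of that layer by a factor of 16 in this toy setting, while the spatial resolution of the feature map is left untouched.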


Gabriela Bosetti, Sergio Firmenich, 2019

The trend towards mobile device usage has made the Web, more than ever, a ubiquitous platform where users perform all kinds of tasks. In some cases, users access the Web with native mobile applications developed for well-known sites, such as LinkedIn, Facebook, Twitter, etc. These native applications might offer further (e.g. location-based) functionalities to their users in comparison with their corresponding Web sites, because they were developed with mobile features in mind. However, most Web applications lack such a native mobile counterpart, and users access them using browsers on the mobile device. Users might eventually want to add mobile features to these Web sites even though those features were not supported originally. In this paper we present a novel approach to allow end users to augment their preferred Web sites with mobile features. This end-user approach is supported by a framework for mobile Web augmentation that we describe in the paper. We also present a set of supporting tools and a validation experiment with end users.

Human-Computer Interaction

End-to-End Adaptive Monte Carlo Denoising and Super-Resolution

Xinyue Wei, Haozhi Huang, Yujin Shi, 2021

The classic Monte Carlo path tracing can achieve high quality rendering at the cost of heavy computation. Recent works make use of deep neural networks to accelerate this process, by improving either low-resolution or fewer-sample rendering with super-resolution or denoising neural networks in post-processing. However, denoising and super-resolution have only been considered separately in previous work. We show in this work that Monte Carlo path tracing can be further accelerated by joint super-resolution and denoising (SRD) in post-processing. This new type of joint filtering allows only a low-resolution and fewer-sample (thus noisy) image to be rendered by path tracing, which is then fed into a deep neural network to produce a high-resolution and clean image. The main contribution of this work is a new end-to-end network architecture, specifically designed for the SRD task. It contains two cascaded stages with shared components. We discover that denoising and super-resolution require very different receptive fields, a key insight that leads to the introduction of deformable convolution into the network design. Extensive experiments show that the proposed method outperforms previous methods and their variants adopted for the SRD task.

Computer Vision and Pattern Recognition, Computer Graphics

From Search Engines to Search Services: An End-User Driven Approach

Gabriela Bosetti, Sergio Firmenich, 2019

The World Wide Web is a vast and continuously changing source of information where searching is a frequent, and sometimes critical, user task. Searching is not always the user's primary goal but an ancillary task that is performed to find complementary information needed to complete another task. In this paper, we explore primary and/or ancillary search tasks and propose an approach for simplifying the user interaction during search tasks. Rather than focusing on dedicated search engines, our approach allows the user to abstract search engines already provided by Web applications into pervasive search services that will be available for performing searches from any other Web site. We also propose to allow users to manage the way in which search results are displayed and the interaction with them. In order to illustrate the feasibility of this approach, we have built a support tool based on a plug-in architecture that allows users to integrate new search services (created by themselves by means of visual tools) and execute them in the context of both kinds of searches. A case study illustrates the use of these tools. We also present the results of two evaluations that demonstrate the feasibility of the approach and the benefits of its use.

Human-Computer Interaction, Information Retrieval

An end-to-end Optical Character Recognition approach for ultra-low-resolution printed text images

Julian D. Gilbey, Carola-Bibiane Schonlieb, 2021

Some historical and more recent printed documents have been scanned or stored at very low resolutions, such as 60 dpi. Though such scans are relatively easy for humans to read, they still present significant challenges for optical character recognition (OCR) systems. The current state of the art is to use super-resolution to reconstruct an approximation of the original high-resolution image and to feed this into a standard OCR system. Our novel end-to-end method bypasses the super-resolution step and produces better OCR results. This approach is inspired by our understanding of the human visual system, and builds on established neural networks for performing OCR. Our experiments have shown that it is possible to perform OCR on 60 dpi scanned images of English text, which is a significantly lower resolution than the state of the art, and we achieved a mean character level accuracy (CLA) of 99.7% and word level accuracy (WLA) of 98.9% across a set of about 1000 pages of 60 dpi text in a wide range of fonts. For 75 dpi images, the mean CLA was 99.9% and the mean WLA was 99.4% on the same sample of texts. We make our code and data (including a set of low-resolution images with their ground truths) publicly available as a benchmark for future work in this field.

Computer Vision and Pattern Recognition

End-to-end Ultrasound Frame to Volume Registration

Hengtao Guo, Xuanang Xu, Sheng Xu, 2021

Fusing intra-operative 2D transrectal ultrasound (TRUS) images with a pre-operative 3D magnetic resonance (MR) volume to guide prostate biopsy can significantly increase the yield. However, such a multimodal 2D/3D registration problem is a very challenging task. In this paper, we propose an end-to-end frame-to-volume registration network (FVR-Net), which can efficiently bridge the previous research gaps by aligning a 2D TRUS frame with a 3D TRUS volume without requiring hardware tracking. The proposed FVR-Net utilizes a dual-branch feature extraction module to extract the information from the TRUS frame and volume to estimate transformation parameters. We also introduce a differentiable 2D slice sampling module which allows gradients to backpropagate from an unsupervised image similarity loss for content correspondence learning. Our model shows superior efficiency for real-time interventional guidance with highly competitive registration accuracy.

Image and Video Processing, Computer Vision and Pattern Recognition
