User Constrained Thumbnail Generation using Adaptive Convolutions

107 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Ayan Kumar Bhunia

تاريخ النشر 2018

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Perla Sai Raj Kishore - Ayan Kumar Bhunia - Shuvozit Ghose

الرؤية الحاسوبية وتمييز الأنماط

قم بزيارة صفحتنا على فيسبوك

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Thumbnails are widely used all over the world as a preview for digital images. In this work we propose a deep neural framework to generate thumbnails of any size and aspect ratio, even for unseen values during training, with high accuracy and precision. We use Global Context Aggregation (GCA) and a modified Region Proposal Network (RPN) with adaptive convolutions to generate thumbnails in real time. GCA is used to selectively attend and aggregate the global context information from the entire image while the RPN is used to predict candidate bounding boxes for the thumbnail image. Adaptive convolution eliminates the problem of generating thumbnails of various aspect ratios by using filter weights dynamically generated from the aspect ratio information. The experimental results indicate the superior performance of the proposed model over existing state-of-the-art techniques.

قيم البحث

93 - Seyed A. Esmaeili , Bharat Singh , Larry S. Davis 2016

Fast-AT is an automatic thumbnail generation system based on deep neural networks. It is a fully-convolutional deep neural network, which learns specific filters for thumbnails of different sizes and aspect ratios. During inference, the appropriate f ilter is selected depending on the dimensions of the target thumbnail. Unlike most previous work, Fast-AT does not utilize saliency but addresses the problem directly. In addition, it eliminates the need to conduct region search on the saliency map. The model generalizes to thumbnails of different sizes including those with extreme aspect ratios and can generate thumbnails in real time. A data set of more than 70,000 thumbnail annotations was collected to train Fast-AT. We show competitive results in comparison to existing techniques.

الرؤية الحاسوبية وتمييز الأنماط

Sentence Specified Dynamic Video Thumbnail Generation

186 - Yitian Yuan , Lin Ma , Wenwu Zhu 2019

With the tremendous growth of videos over the Internet, video thumbnails, providing video content previews, are becoming increasingly crucial to influencing users online searching experiences. Conventional video thumbnails are generated once purely b ased on the visual characteristics of videos, and then displayed as requested. Hence, such video thumbnails, without considering the users searching intentions, cannot provide a meaningful snapshot of the video contents that users concern. In this paper, we define a distinctively new task, namely sentence specified dynamic video thumbnail generation, where the generated thumbnails not only provide a concise preview of the original video contents but also dynamically relate to the users searching intentions with semantic correspondences to the users query sentences. To tackle such a challenging task, we propose a novel graph convolved video thumbnail pointer (GTP). Specifically, GTP leverages a sentence specified video graph convolutional network to model both the sentence-video semantic interaction and the internal video relationships incorporated with the sentence information, based on which a temporal conditioned pointer network is then introduced to sequentially generate the sentence specified video thumbnails. Moreover, we annotate a new dataset based on ActivityNet Captions for the proposed new task, which consists of 10,000+ video-sentence pairs with each accompanied by an annotated sentence specified video thumbnail. We demonstrate that our proposed GTP outperforms several baseline methods on the created dataset, and thus believe that our initial results along with the release of the new dataset will inspire further research on sentence specified dynamic video thumbnail generation. Dataset and code are available at https://github.com/yytzsy/GTP.

الرؤية الحاسوبية وتمييز الأنماط

Revisiting Adaptive Convolutions for Video Frame Interpolation

86 - Simon Niklaus , Long Mai , Oliver Wang 2020

Video frame interpolation, the synthesis of novel views in time, is an increasingly popular research direction with many new papers further advancing the state of the art. But as each new method comes with a host of variables that affect the interpol ation quality, it can be hard to tell what is actually important for this task. In this work, we show, somewhat surprisingly, that it is possible to achieve near state-of-the-art results with an older, simpler approach, namely adaptive separable convolutions, by a subtle set of low level improvements. In doing so, we propose a number of intuitive but effective techniques to improve the frame interpolation quality, which also have the potential to other related applications of adaptive convolutions such as burst image denoising, joint image filtering, or video prediction.

الرؤية الحاسوبية وتمييز الأنماط

Adaptive Convolutions with Per-pixel Dynamic Filter Atom

163 - Ze Wang , Zichen Miao , Jun Hu 2021

Applying feature dependent network weights have been proved to be effective in many fields. However, in practice, restricted by the enormous size of model parameters and memory footprints, scalable and versatile dynamic convolutions with per-pixel ad apted filters are yet to be fully explored. In this paper, we address this challenge by decomposing filters, adapted to each spatial position, over dynamic filter atoms generated by a light-weight network from local features. Adaptive receptive fields can be supported by further representing each filter atom over sets of pre-fixed multi-scale bases. As plug-and-play replacements to convolutional layers, the introduced adaptive convolutions with per-pixel dynamic atoms enable explicit modeling of intra-image variance, while avoiding heavy computation, parameters, and memory cost. Our method preserves the appealing properties of conventional convolutions as being translation-equivariant and parametrically efficient. We present experiments to show that, the proposed method delivers comparable or even better performance across tasks, and are particularly effective on handling tasks with significant intra-image variance.

الرؤية الحاسوبية وتمييز الأنماط

Constrained Image Generation Using Binarized Neural Networks with Decision Procedures

119 - Svyatoslav Korneev , Nina Narodytska , Luca Pulina 2018

We consider the problem of binary image generation with given properties. This problem arises in a number of practical applications, including generation of artificial porous medium for an electrode of lithium-ion batteries, for composed materials, e tc. A generated image represents a porous medium and, as such, it is subject to two sets of constraints: topological constraints on the structure and process constraints on the physical process over this structure. To perform image generation we need to define a mapping from a porous medium to its physical process parameters. For a given geometry of a porous medium, this mapping can be done by solving a partial differential equation (PDE). However, embedding a PDE solver into the search procedure is computationally expensive. We use a binarized neural network to approximate a PDE solver. This allows us to encode the entire problem as a logical formula. Our main contribution is that, for the first time, we show that this problem can be tackled using decision procedures. Our experiments show that our model is able to produce random constrained images that satisfy both topological and process constraints.

الرؤية الحاسوبية وتمييز الأنماط

سجل دخول لتتمكن من نشر تعليقات

التعليقات (0)

no comments...

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة سوهاج

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

User Constrained Thumbnail Generation using Adaptive Convolutions

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً