Variable Rate Image Compression with Recurrent Neural Networks

302 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل George Toderici

تاريخ النشر 2015

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف George Toderici - Sean M. OMalley - Sung Jin Hwang

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي الحوسبة العصبية والتطورية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

A large fraction of Internet traffic is now driven by requests from mobile devices with relatively small screens and often stringent bandwidth requirements. Due to these factors, it has become the norm for modern graphics-heavy websites to transmit low-resolution, low-bytecount image previews (thumbnails) as part of the initial page load process to improve apparent page responsiveness. Increasing thumbnail compression beyond the capabilities of existing codecs is therefore a current research focus, as any byte savings will significantly enhance the experience of mobile device users. Toward this end, we propose a general framework for variable-rate image compression and a novel architecture based on convolutional and deconvolutional LSTM recurrent networks. Our models address the main issues that have prevented autoencoder neural networks from competing with existing image compression algorithms: (1) our networks only need to be trained once (not per-image), regardless of input image dimensions and the desired compression rate; (2) our networks are progressive, meaning that the more bits are sent, the more accurate the image reconstruction; and (3) the proposed architecture is at least as efficient as a standard purpose-trained autoencoder for a given number of bits. On a large-scale benchmark of 32$times$32 thumbnails, our LSTM-based approaches provide better visual quality than (headerless) JPEG, JPEG2000 and WebP, with a storage size that is reduced by 10% or more.

قيم البحث

112 - George Toderici , Damien Vincent , Nick Johnston 2016

This paper presents a set of full-resolution lossy image compression methods based on neural networks. Each of the architectures we describe can provide variable compression rates during deployment without requiring retraining of the network: each ne twork need only be trained once. All of our architectures consist of a recurrent neural network (RNN)-based encoder and decoder, a binarizer, and a neural network for entropy coding. We compare RNN types (LSTM, associative LSTM) and introduce a new hybrid of GRU and ResNet. We also study one-shot versus additive reconstruction architectures and introduce a new scaled-additive framework. We compare to previous work, showing improvements of 4.3%-8.8% AUC (area under the rate-distortion curve), depending on the perceptual metric used. As far as we know, this is the first neural network architecture that is able to outperform JPEG at image compression across most bitrates on the rate-distortion curve on the Kodak dataset images, with and without the aid of entropy coding.

الرؤية الحاسوبية وتمييز الأنماط

Target-Quality Image Compression with Recurrent, Convolutional Neural Networks

150 - Michele Covell , Nick Johnston , David Minnen 2017

We introduce a stop-code tolerant (SCT) approach to training recurrent convolutional neural networks for lossy image compression. Our methods introduce a multi-pass training method to combine the training goals of high-quality reconstructions in area s around stop-code masking as well as in highly-detailed areas. These methods lead to lower true bitrates for a given recursion count, both pre- and post-entropy coding, even using unstructured LZ77 code compression. The pre-LZ77 gains are achieved by trimming stop codes. The post-LZ77 gains are due to the highly unequal distributions of 0/1 codes from the SCT architectures. With these code compressions, the SCT architecture maintains or exceeds the image quality at all compression rates compared to JPEG and to RNN auto-encoders across the Kodak dataset. In addition, the SCT coding results in lower variance in image quality across the extent of the image, a characteristic that has been shown to be important in human ratings of image quality

الرؤية الحاسوبية وتمييز الأنماط

Learned Multi-Resolution Variable-Rate Image Compression with Octave-based Residual Blocks

101 - Mohammad Akbari , Jie Liang , Jingning Han 2020

Recently deep learning-based image compression has shown the potential to outperform traditional codecs. However, most existing methods train multiple networks for multiple bit rates, which increase the implementation complexity. In this paper, we pr opose a new variable-rate image compression framework, which employs generalized octave convolutions (GoConv) and generalized octave transposed-convolutions (GoTConv) with built-in generalized divisive normalization (GDN) and inverse GDN (IGDN) layers. Novel GoConv- and GoTConv-based residual blocks are also developed in the encoder and decoder networks. Our scheme also uses a stochastic rounding-based scalar quantization. To further improve the performance, we encode the residual between the input and the reconstructed image from the decoder network as an enhancement layer. To enable a single model to operate with different bit rates and to learn multi-rate image features, a new objective function is introduced. Experimental results show that the proposed framework trained with variable-rate objective function outperforms the standard codecs such as H.265/HEVC-based BPG and state-of-the-art learning-based variable-rate methods.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو

DRAW: A Recurrent Neural Network For Image Generation

419 - Karol Gregor , Ivo Danihelka , Alex Graves 2015

This paper introduces the Deep Recurrent Attentive Writer (DRAW) neural network architecture for image generation. DRAW networks combine a novel spatial attention mechanism that mimics the foveation of the human eye, with a sequential variational aut o-encoding framework that allows for the iterative construction of complex images. The system substantially improves on the state of the art for generative models on MNIST, and, when trained on the Street View House Numbers dataset, it generates images that cannot be distinguished from real data with the naked eye.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي الحوسبة العصبية والتطورية

Block-optimized Variable Bit Rate Neural Image Compression

67 - Caglar Aytekin , Xingyang Ni , Francesco Cricri 2018

In this work, we propose an end-to-end block-based auto-encoder system for image compression. We introduce novel contributions to neural-network based image compression, mainly in achieving binarization simulation, variable bit rates with multiple ne tworks, entropy-friendly representations, inference-stage code optimization and performance-improving normalization layers in the auto-encoder. We evaluate and show the incremental performance increase of each of our contributions.

التعلم الآلي التعلم الالي