Seeing in the dark with recurrent convolutional neural networks

124 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Till Hartmann

تاريخ النشر 2018

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Till S. Hartmann

قم بزيارة صفحتنا على فيسبوك

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Classical convolutional neural networks (cCNNs) are very good at categorizing objects in images. But, unlike human vision which is relatively robust to noise in images, the performance of cCNNs declines quickly as image quality worsens. Here we propose to use recurrent connections within the convolutional layers to make networks robust against pixel noise such as could arise from imaging at low light levels, and thereby significantly increase their performance when tested with simulated noisy video sequences. We show that cCNNs classify images with high signal to noise ratios (SNRs) well, but are easily outperformed when tested with low SNR images (high noise levels) by convolutional neural networks that have recurrency added to convolutional layers, henceforth referred to as gruCNNs. Addition of Bayes-optimal temporal integration to allow the cCNN to integrate multiple image frames still does not match gruCNN performance. Additionally, we show that at low SNRs, the probabilities predicted by the gruCNN (after calibration) have higher confidence than those predicted by the cCNN. We propose to consider recurrent connections in the early stages of neural networks as a solution to computer vision under imperfect lighting conditions and noisy environments; challenges faced during real-time video streams of autonomous driving at night, during rain or snow, and other non-ideal situations.

قيم البحث

150 - Michele Covell , Nick Johnston , David Minnen 2017

We introduce a stop-code tolerant (SCT) approach to training recurrent convolutional neural networks for lossy image compression. Our methods introduce a multi-pass training method to combine the training goals of high-quality reconstructions in area s around stop-code masking as well as in highly-detailed areas. These methods lead to lower true bitrates for a given recursion count, both pre- and post-entropy coding, even using unstructured LZ77 code compression. The pre-LZ77 gains are achieved by trimming stop codes. The post-LZ77 gains are due to the highly unequal distributions of 0/1 codes from the SCT architectures. With these code compressions, the SCT architecture maintains or exceeds the image quality at all compression rates compared to JPEG and to RNN auto-encoders across the Kodak dataset. In addition, the SCT coding results in lower variance in image quality across the extent of the image, a characteristic that has been shown to be important in human ratings of image quality

الرؤية الحاسوبية وتمييز الأنماط

Convolutional Recurrent Neural Networks for Music Classification

218 - Keunwoo Choi , George Fazekas , Mark Sandler 2016

We introduce a convolutional recurrent neural network (CRNN) for music tagging. CRNNs take advantage of convolutional neural networks (CNNs) for local feature extraction and recurrent neural networks for temporal summarisation of the extracted featur es. We compare CRNN with three CNN structures that have been used for music tagging while controlling the number of parameters with respect to their performance and training time per sample. Overall, we found that CRNNs show a strong performance with respect to the number of parameter and training time, indicating the effectiveness of its hybrid structure in music feature extraction and feature summarisation.

الحوسبة العصبية والتطورية التعلم الآلي الوسائط المتعددة

Variable Rate Image Compression with Recurrent Neural Networks

301 - George Toderici , Sean M. OMalley , Sung Jin Hwang 2015

A large fraction of Internet traffic is now driven by requests from mobile devices with relatively small screens and often stringent bandwidth requirements. Due to these factors, it has become the norm for modern graphics-heavy websites to transmit l ow-resolution, low-bytecount image previews (thumbnails) as part of the initial page load process to improve apparent page responsiveness. Increasing thumbnail compression beyond the capabilities of existing codecs is therefore a current research focus, as any byte savings will significantly enhance the experience of mobile device users. Toward this end, we propose a general framework for variable-rate image compression and a novel architecture based on convolutional and deconvolutional LSTM recurrent networks. Our models address the main issues that have prevented autoencoder neural networks from competing with existing image compression algorithms: (1) our networks only need to be trained once (not per-image), regardless of input image dimensions and the desired compression rate; (2) our networks are progressive, meaning that the more bits are sent, the more accurate the image reconstruction; and (3) the proposed architecture is at least as efficient as a standard purpose-trained autoencoder for a given number of bits. On a large-scale benchmark of 32$times$32 thumbnails, our LSTM-based approaches provide better visual quality than (headerless) JPEG, JPEG2000 and WebP, with a storage size that is reduced by 10% or more.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي الحوسبة العصبية والتطورية

Quaternion Convolutional Neural Networks for Heterogeneous Image Processing

397 - Titouan Parcollet , Mohamed Morchid , Georges Linar`es 2018

Convolutional neural networks (CNN) have recently achieved state-of-the-art results in various applications. In the case of image recognition, an ideal model has to learn independently of the training data, both local dependencies between the three c omponents (R,G,B) of a pixel, and the global relations describing edges or shapes, making it efficient with small or heterogeneous datasets. Quaternion-valued convolutional neural networks (QCNN) solved this problematic by introducing multidimensional algebra to CNN. This paper proposes to explore the fundamental reason of the success of QCNN over CNN, by investigating the impact of the Hamilton product on a color image reconstruction task performed from a gray-scale only training. By learning independently both internal and external relations and with less parameters than real valued convolutional encoder-decoder (CAE), quaternion convolutional encoder-decoders (QCAE) perfectly reconstructed unseen color images while CAE produced worst and gray-sca

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي التعلم الالي

Pruning Convolutional Neural Networks with Self-Supervision

360 - Mathilde Caron , Ari Morcos , Piotr Bojanowski 2020

Convolutional neural networks trained without supervision come close to matching performance with supervised pre-training, but sometimes at the cost of an even higher number of parameters. Extracting subnetworks from these large unsupervised convnets with preserved performance is of particular interest to make them less computationally intensive. Typical pruning methods operate during training on a task while trying to maintain the performance of the pruned network on the same task. However, in self-supervised feature learning, the training objective is agnostic on the representation transferability to downstream tasks. Thus, preserving performance for this objective does not ensure that the pruned subnetwork remains effective for solving downstream tasks. In this work, we investigate the use of standard pruning methods, developed primarily for supervised learning, for networks trained without labels (i.e. on self-supervised tasks). We show that pruned masks obtained with or without labels reach comparable performance when re-trained on labels, suggesting that pruning operates similarly for self-supervised and supervised learning. Interestingly, we also find that pruning preserves the transfer performance of self-supervised subnetwork representations.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي الحوسبة العصبية والتطورية