Reinforcement Learning for Adaptive Video Compressive Sensing

86 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Sidi Lu

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Sidi Lu - Xin Yuan - Aggelos K Katsaggelos

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We apply reinforcement learning to video compressive sensing to adapt the compression ratio. Specifically, video snapshot compressive imaging (SCI), which captures high-speed video using a low-speed camera is considered in this work, in which multiple (B) video frames can be reconstructed from a snapshot measurement. One research gap in previous studies is how to adapt B in the video SCI system for different scenes. In this paper, we fill this gap utilizing reinforcement learning (RL). An RL model, as well as various convolutional neural networks for reconstruction, are learned to achieve adaptive sensing of video SCI systems. Furthermore, the performance of an object detection network using directly the video SCI measurements without reconstruction is also used to perform RL-based adaptive video compressive sensing. Our proposed adaptive SCI method can thus be implemented in low cost and real time. Our work takes the technology one step further towards real applications of video SCI.

قيم البحث

120 - Zhengjue Wang , Hao Zhang , Ziheng Cheng 2021

To capture high-speed videos using a two-dimensional detector, video snapshot compressive imaging (SCI) is a promising system, where the video frames are coded by different masks and then compressed to a snapshot measurement. Following this, efficien t algorithms are desired to reconstruct the high-speed frames, where the state-of-the-art results are achieved by deep learning networks. However, these networks are usually trained for specific small-scale masks and often have high demands of training time and GPU memory, which are hence {bf em not flexible} to $i$) a new mask with the same size and $ii$) a larger-scale mask. We address these challenges by developing a Meta Modulated Convolutional Network for SCI reconstruction, dubbed MetaSCI. MetaSCI is composed of a shared backbone for different masks, and light-weight meta-modulation parameters to evolve to different modulation parameters for each mask, thus having the properties of {bf em fast adaptation} to new masks (or systems) and ready to {bf em scale to large data}. Extensive simulation and real data results demonstrate the superior performance of our proposed approach. Our code is available at {smallurl{https://github.com/xyvirtualgroup/MetaSCI-CVPR2021}}.

معالجة الصور والفيديو الرؤية الحاسوبية وتمييز الأنماط

Low-Cost Compressive Sensing for Color Video and Depth

186 - Xin Yuan , Patrick Llull , Xuejun Liao 2014

A simple and inexpensive (low-power and low-bandwidth) modification is made to a conventional off-the-shelf color video camera, from which we recover {multiple} color frames for each of the original measured frames, and each of the recovered frames c an be focused at a different depth. The recovery of multiple frames for each measured frame is made possible via high-speed coding, manifested via translation of a single coded aperture; the inexpensive translation is constituted by mounting the binary code on a piezoelectric device. To simultaneously recover depth information, a {liquid} lens is modulated at high speed, via a variable voltage. Consequently, during the aforementioned coding process, the liquid lens allows the camera to sweep the focus through multiple depths. In addition to designing and implementing the camera, fast recovery is achieved by an anytime algorithm exploiting the group-sparsity of wavelet/DCT coefficients.

الرؤية الحاسوبية وتمييز الأنماط

Adaptive ROI Generation for Video Object Segmentation Using Reinforcement Learning

127 - Mingjie Sun , Jimin Xiao , Eng Gee Lim 2019

In this paper, we aim to tackle the task of semi-supervised video object segmentation across a sequence of frames where only the ground-truth segmentation of the first frame is provided. The challenges lie in how to online update the segmentation mod el initialized from the first frame adaptively and accurately, even in presence of multiple confusing instances or large object motion. The existing approaches rely on selecting the region of interest for model update, which however, is rough and inflexible, leading to performance degradation. To overcome this limitation, we propose a novel approach which utilizes reinforcement learning to select optimal adaptation areas for each frame, based on the historical segmentation information. The RL model learns to take optimal actions to adjust the region of interest inferred from the previous frame for online model updating. To speed up the model adaption, we further design a novel multi-branch tree based exploration method to fast select the best state action pairs. Our experiments show that our work improves the state-of-the-art of the mean region similarity on DAVIS 2016 dataset to 87.1%.

الرؤية الحاسوبية وتمييز الأنماط

AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition

94 - Rameswar Panda , Chun-Fu Chen , Quanfu Fan 2021

Multi-modal learning, which focuses on utilizing various modalities to improve the performance of a model, is widely used in video recognition. While traditional multi-modal learning offers excellent recognition results, its computational expense lim its its impact for many real-world applications. In this paper, we propose an adaptive multi-modal learning framework, called AdaMML, that selects on-the-fly the optimal modalities for each segment conditioned on the input for efficient video recognition. Specifically, given a video segment, a multi-modal policy network is used to decide what modalities should be used for processing by the recognition model, with the goal of improving both accuracy and efficiency. We efficiently train the policy network jointly with the recognition model using standard back-propagation. Extensive experiments on four challenging diverse datasets demonstrate that our proposed adaptive approach yields 35%-55% reduction in computation when compared to the traditional baseline that simply uses all the modalities irrespective of the input, while also achieving consistent improvements in accuracy over the state-of-the-art methods.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي التعلم الآلي

Remote Multilinear Compressive Learning with Adaptive Compression

113 - Dat Thanh Tran , Moncef Gabbouj , Alexandros Iosifidis 2021

Multilinear Compressive Learning (MCL) is an efficient signal acquisition and learning paradigm for multidimensional signals. The level of signal compression affects the detection or classification performance of a MCL model, with higher compression rates often associated with lower inference accuracy. However, higher compression rates are more amenable to a wider range of applications, especially those that require low operating bandwidth and minimal energy consumption such as Internet-of-Things (IoT) applications. Many communication protocols provide support for adaptive data transmission to maximize the throughput and minimize energy consumption. By developing compressive sensing and learning models that can operate with an adaptive compression rate, we can maximize the informational content throughput of the whole application. In this paper, we propose a novel optimization scheme that enables such a feature for MCL models. Our proposal enables practical implementation of adaptive compressive signal acquisition and inference systems. Experimental results demonstrated that the proposed approach can significantly reduce the amount of computations required during the training phase of remote learning systems but also improve the informational content throughput via adaptive-rate sensing.

الرؤية الحاسوبية وتمييز الأنماط معالجة الصور والفيديو