ترغب بنشر مسار تعليمي؟ اضغط هنا

FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation

410   0   0.0 ( 0 )
 نشر من قبل Tarun Kalluri
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

A majority of methods for video frame interpolation compute bidirectional optical flow between adjacent frames of a video, followed by a suitable warping algorithm to generate the output frames. However, approaches relying on optical flow often fail to model occlusions and complex non-linear motions directly from the video and introduce additional bottlenecks unsuitable for widespread deployment. We address these limitations with FLAVR, a flexible and efficient architecture that uses 3D space-time convolutions to enable end-to-end learning and inference for video frame interpolation. Our method efficiently learns to reason about non-linear motions, complex occlusions and temporal abstractions, resulting in improved performance on video interpolation, while requiring no additional inputs in the form of optical flow or depth maps. Due to its simplicity, FLAVR can deliver 3x faster inference speed compared to the current most accurate method on multi-frame interpolation without losing interpolation accuracy. In addition, we evaluate FLAVR on a wide range of challenging settings and consistently demonstrate superior qualitative and quantitative results compared with prior methods on various popular benchmarks including Vimeo-90K, UCF101, DAVIS, Adobe, and GoPro. Finally, we demonstrate that FLAVR for video frame interpolation can serve as a useful self-supervised pretext task for action recognition, optical flow estimation, and motion magnification.



قيم البحث

اقرأ أيضاً

Most approaches for video frame interpolation require accurate dense correspondences to synthesize an in-between frame. Therefore, they do not perform well in challenging scenarios with e.g. lighting changes or motion blur. Recent deep learning appro aches that rely on kernels to represent motion can only alleviate these problems to some extent. In those cases, methods that use a per-pixel phase-based motion representation have been shown to work well. However, they are only applicable for a limited amount of motion. We propose a new approach, PhaseNet, that is designed to robustly handle challenging scenarios while also coping with larger motion. Our approach consists of a neural network decoder that directly estimates the phase decomposition of the intermediate frame. We show that this is superior to the hand-crafted heuristics previously used in phase-based methods and also compares favorably to recent deep learning based approaches for video frame interpolation on challenging datasets.
We propose RIFE, a Real-time Intermediate Flow Estimation algorithm for Video Frame Interpolation (VFI). Many recent flow-based VFI methods first estimate the bi-directional optical flows, then scale and reverse them to approximate intermediate flows , leading to artifacts on motion boundaries. RIFE uses a neural network named IFNet that can directly estimate the intermediate flows from coarse-to-fine with much better speed. We design a privileged distillation scheme for training intermediate flow model, which leads to a large performance improvement. Experiments demonstrate that RIFE is flexible and can achieve state-of-the-art performance on several public benchmarks. The code is available at url{https://github.com/hzwer/arXiv2020-RIFE}
132 - Bin Zhao , Xuelong Li 2021
Video frame interpolation can up-convert the frame rate and enhance the video quality. In recent years, although the interpolation performance has achieved great success, image blur usually occurs at the object boundaries owing to the large motion. I t has been a long-standing problem, and has not been addressed yet. In this paper, we propose to reduce the image blur and get the clear shape of objects by preserving the edges in the interpolated frames. To this end, the proposed Edge-Aware Network (EA-Net) integrates the edge information into the frame interpolation task. It follows an end-to-end architecture and can be separated into two stages, emph{i.e.}, edge-guided flow estimation and edge-protected frame synthesis. Specifically, in the flow estimation stage, three edge-aware mechanisms are developed to emphasize the frame edges in estimating flow maps, so that the edge-maps are taken as the auxiliary information to provide more guidance to boost the flow accuracy. In the frame synthesis stage, the flow refinement module is designed to refine the flow map, and the attention module is carried out to adaptively focus on the bidirectional flow maps when synthesizing the intermediate frames. Furthermore, the frame and edge discriminators are adopted to conduct the adversarial training strategy, so as to enhance the reality and clarity of synthesized frames. Experiments on three benchmarks, including Vimeo90k, UCF101 for single-frame interpolation and Adobe240-fps for multi-frame interpolation, have demonstrated the superiority of the proposed EA-Net for the video frame interpolation task.
Video frame interpolation, the synthesis of novel views in time, is an increasingly popular research direction with many new papers further advancing the state of the art. But as each new method comes with a host of variables that affect the interpol ation quality, it can be hard to tell what is actually important for this task. In this work, we show, somewhat surprisingly, that it is possible to achieve near state-of-the-art results with an older, simpler approach, namely adaptive separable convolutions, by a subtle set of low level improvements. In doing so, we propose a number of intuitive but effective techniques to improve the frame interpolation quality, which also have the potential to other related applications of adaptive convolutions such as burst image denoising, joint image filtering, or video prediction.
Video frame interpolation aims at synthesizing intermediate frames from nearby source frames while maintaining spatial and temporal consistencies. The existing deep-learning-based video frame interpolation methods can be roughly divided into two cate gories: flow-based methods and kernel-based methods. The performance of flow-based methods is often jeopardized by the inaccuracy of flow map estimation due to oversimplified motion models, while that of kernel-based methods tends to be constrained by the rigidity of kernel shape. To address these performance-limiting issues, a novel mechanism named generalized deformable convolution is proposed, which can effectively learn motion information in a data-driven manner and freely select sampling points in space-time. We further develop a new video frame interpolation method based on this mechanism. Our extensive experiments demonstrate that the new method performs favorably against the state-of-the-art, especially when dealing with complex motions.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا