Previous cycle-consistency correspondence learning methods usually leverage image patches for training. In this paper, we present a fully convolutional method, which is simpler and more consistent with the inference process. While directly applying fully convolutional training results in model collapse, we study the underlying reason behind this collapse phenomenon and find that the absolute positions of pixels provide a shortcut for easily achieving cycle consistency, which hinders the learning of meaningful visual representations. To break this absolute-position shortcut, we propose to apply different crops to the forward and backward frames and to adopt feature warping to establish correspondence between the two crops of the same frame. The former technique forces corresponding pixels on the forward and backward tracks to have different absolute positions, and the latter effectively blocks the shortcuts between the forward and backward tracks. On three label propagation benchmarks for pose tracking, face landmark tracking, and video object segmentation, our method substantially improves over the vanilla fully convolutional cycle-consistency method, achieving very competitive performance compared with state-of-the-art self-supervised approaches.
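Below is a minimal sketch of the two-crop cycle-consistency idea described in the abstract, assuming a PyTorch-style dense feature encoder. The function names, the pure-translation warp between crops, and the loss formulation are illustrative assumptions, not the paper's implementation: the forward track starts from crop A of frame t, the backward track returns to a different crop B of the same frame, and feature warping relates the two crops so that absolute pixel positions can no longer close the cycle.

```python
import torch
import torch.nn.functional as F


def affinity(feat_a, feat_b, temperature=0.07):
    """Row-stochastic pixel-to-pixel affinity between two dense feature maps (C, H, W)."""
    c, h, w = feat_a.shape
    a = F.normalize(feat_a.reshape(c, -1), dim=0)            # (C, HW)
    b = F.normalize(feat_b.reshape(c, -1), dim=0)            # (C, HW)
    return torch.softmax(a.t() @ b / temperature, dim=-1)    # (HW, HW)


def warp_between_crops(feat, offset):
    """Resample features of crop A onto the pixel grid of crop B.

    `offset` is the (dy, dx) displacement between the two crops in feature
    coordinates; this stands in for the feature-warping step and assumes a
    pure translation between the crops (no scale change).
    """
    c, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    # Crop B's grid expressed in crop A coordinates, normalized to [-1, 1].
    gx = (xs + offset[1]).float() / (w - 1) * 2 - 1
    gy = (ys + offset[0]).float() / (h - 1) * 2 - 1
    grid = torch.stack([gx, gy], dim=-1).unsqueeze(0)        # (1, H, W, 2)
    return F.grid_sample(feat.unsqueeze(0), grid, align_corners=True)[0]


def two_crop_cycle_loss(encoder, frame_t_cropA, frame_t_cropB, frame_t1_crop, offset):
    """One forward/backward cycle using two different crops of frame t."""
    fa = encoder(frame_t_cropA.unsqueeze(0))[0]   # frame t, crop A  -> (C, H, W)
    fb = encoder(frame_t_cropB.unsqueeze(0))[0]   # frame t, crop B
    f1 = encoder(frame_t1_crop.unsqueeze(0))[0]   # frame t+1
    # Align crop A's features with crop B's grid so the cycle target is well defined.
    fa_on_b = warp_between_crops(fa, offset)
    # Forward: frame t (crop A) -> frame t+1; backward: frame t+1 -> frame t (crop B).
    cycle = affinity(fa_on_b, f1) @ affinity(f1, fb)          # (HW, HW), rows sum to 1
    target = torch.arange(cycle.shape[0])                     # each pixel should return to itself
    return F.nll_loss(torch.log(cycle + 1e-8), target)
```

Because the forward and backward tracks live on different crops, a pixel's absolute coordinate differs between the two tracks, and only content-based matching (mediated by the warp) can satisfy the identity target.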