
Space-Time Correspondence as a Contrastive Random Walk

Published by Allan Jabri
Publication date: 2020
Research field: Informatics Engineering
Research language: English





This paper proposes a simple self-supervised approach for learning a representation for visual correspondence from raw video. We cast correspondence as prediction of links in a space-time graph constructed from video. In this graph, the nodes are patches sampled from each frame, and nodes adjacent in time can share a directed edge. We learn a representation in which pairwise similarity defines transition probability of a random walk, so that long-range correspondence is computed as a walk along the graph. We optimize the representation to place high probability along paths of similarity. Targets for learning are formed without supervision, by cycle-consistency: the objective is to maximize the likelihood of returning to the initial node when walking along a graph constructed from a palindrome of frames. Thus, a single path-level constraint implicitly supervises chains of intermediate comparisons. When used as a similarity metric without adaptation, the learned representation outperforms the self-supervised state-of-the-art on label propagation tasks involving objects, semantic parts, and pose. Moreover, we demonstrate that a technique we call edge dropout, as well as self-supervised adaptation at test-time, further improve transfer for object-centric correspondence.
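The cycle-consistency objective described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`transition`, `cycle_loss`), the toy 2-D embeddings, and the softmax temperature of 0.07 are all assumptions made for illustration, and edge dropout and the learned encoder are omitted. Each frame is a list of unit-norm node embeddings; pairwise dot-product similarity is turned into a row-stochastic transition matrix, the matrices are chained along a palindrome of frames, and the loss is the negative log-probability of returning to the starting node.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def softmax(row, temperature):
    """Numerically stable softmax of row / temperature."""
    m = max(row)
    exps = [math.exp((x - m) / temperature) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def transition(src, dst, temperature=0.07):
    """Row-stochastic matrix P with P[i][j] = softmax_j(<src_i, dst_j> / T)."""
    return [
        softmax([sum(a * b for a, b in zip(u, v)) for v in dst], temperature)
        for u in src
    ]

def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def cycle_loss(frames, temperature=0.07):
    """Walk forward then backward in time; penalize not returning to the start."""
    palindrome = frames + frames[-2::-1]  # e.g. [a, b, c] -> [a, b, c, b, a]
    n = len(frames[0])
    walk = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for src, dst in zip(palindrome, palindrome[1:]):
        walk = matmul(walk, transition(src, dst, temperature))
    # Cross-entropy against the identity: each node should return to itself.
    loss = -sum(math.log(walk[i][i] + 1e-12) for i in range(n)) / n
    return loss, walk

# Toy example (hypothetical data): two nodes drifting slowly over three frames.
frames = [
    [normalize([1.0, 0.0]), normalize([0.0, 1.0])],
    [normalize([0.9, 0.1]), normalize([0.1, 0.9])],
    [normalize([0.8, 0.2]), normalize([0.2, 0.8])],
]
loss, walk = cycle_loss(frames)
print(f"cycle loss: {loss:.6f}")
```

In this toy case the embeddings stay well separated across frames, so the round-trip walk concentrates on the diagonal and the loss is close to zero. In a trainable setting the embeddings would come from an encoder, and the gradient of this single path-level loss would flow through every intermediate transition matrix.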



Read also

A physical-mathematical approach to anomalous diffusion may be based on generalized diffusion equations (containing derivatives of fractional order in space and/or time) and related random walk models. The fundamental solution (for the Cauchy problem) of the fractional diffusion equations can be interpreted as a probability density evolving in time of a peculiar self-similar stochastic process that we view as a generalized diffusion process. By adopting appropriate finite-difference schemes of solution, we generate models of random walk discrete in space and time suitable for simulating random variables whose spatial probability density evolves in time according to a given fractional diffusion equation.
A.V. Plyukhin, 2009
In a simple model of a continuous random walk a particle moves in one dimension with the velocity fluctuating between V and -V. If V is associated with the thermal velocity of a Brownian particle and allowed to be position dependent, the model accounts readily for the particle's drift along the temperature gradient and recovers basic results of the conventional thermophoresis theory.
Self-supervised learning has recently begun to rival supervised learning on computer vision tasks. Many of the recent approaches have been based on contrastive instance discrimination (CID), in which the network is trained to recognize two augment
Norio Konno, Shunya Tamura, 2021
In this paper, following the recent paper on Walk/Zeta Correspondence by the first author and his coworkers, we compute the zeta function for the three- and four-state quantum walk and correlated random walk, and the multi-state random walk on the one-dimensional torus by using Fourier analysis. We also deal with the four-state quantum walk and correlated random walk on the two-dimensional torus. In addition, we introduce a new class of models determined by the generalized Grover matrix bridging the gap between the Grover matrix and the positive-support of the Grover matrix. Finally, we give a generalized version of the Konno-Sato theorem for the new class. As a corollary, we calculate the zeta function for the generalized Grover matrix on the d-dimensional torus.
In this paper, we focus on the self-supervised learning of visual correspondence using unlabeled videos in the wild. Our method simultaneously considers intra- and inter-video representation associations for reliable correspondence estimation. The intra-video learning transforms the image contents across frames within a single video via the frame pair-wise affinity. To obtain the discriminative representation for instance-level separation, we go beyond the intra-video analysis and construct the inter-video affinity to facilitate the contrastive transformation across different videos. By forcing the transformation consistency between intra- and inter-video levels, the fine-grained correspondence associations are well preserved and the instance-level feature discrimination is effectively reinforced. Our simple framework outperforms the recent self-supervised correspondence methods on a range of visual tasks including video object tracking (VOT), video object segmentation (VOS), pose keypoint tracking, etc. It is worth mentioning that our method also surpasses the fully-supervised affinity representation (e.g., ResNet) and performs competitively against the recent fully-supervised algorithms designed for the specific tasks (e.g., VOT and VOS).
