Counting Out Time: Class Agnostic Video Repetition Counting in the Wild

Added by Debidatta Dwibedi

Publication date 2020

Fields Informatics Engineering

Research language English

Authors Debidatta Dwibedi - Yusuf Aytar - Jonathan Tompson

Computer Vision and Pattern Recognition

We present an approach for estimating the period with which an action is repeated in a video. The crux of the approach lies in constraining the period prediction module to use temporal self-similarity as an intermediate representation bottleneck, which allows generalization to unseen repetitions in videos in the wild. We train this model, called RepNet, on a synthetic dataset generated from a large unlabeled video collection by sampling short clips of varying lengths and repeating them with different periods and counts. This combination of synthetic data and a powerful yet constrained model allows us to predict periods in a class-agnostic fashion. Our model substantially exceeds state-of-the-art performance on existing periodicity (PERTUBE) and repetition counting (QUVA) benchmarks. We also collect a new, challenging dataset called Countix (~90 times larger than existing datasets), which captures the challenges of repetition counting in real-world videos. Project webpage: https://sites.google.com/view/repnet .
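
The two ideas in the abstract, synthetic repetition by tiling a short clip and a temporal self-similarity matrix as the intermediate representation, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the frames are hand-written 2-D feature vectors rather than learned embeddings, the similarity function (negative squared Euclidean distance with a row-wise softmax) is one plausible choice that the abstract does not specify, and the hypothetical `estimate_period` heuristic stands in for RepNet's learned period predictor.

```python
import math

def repeat_clip(clip, count):
    """Tile a short clip (list of per-frame feature vectors) `count` times."""
    return [frame for _ in range(count) for frame in clip]

def self_similarity(frames):
    """Temporal self-similarity matrix: negative squared Euclidean distance
    between every pair of frames, followed by a row-wise softmax."""
    n = len(frames)
    sim = [[-sum((a - b) ** 2 for a, b in zip(frames[i], frames[j]))
            for j in range(n)] for i in range(n)]
    out = []
    for row in sim:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def estimate_period(tsm):
    """Hand-written stand-in for a learned predictor: choose the lag whose
    off-diagonal has the highest mean similarity (smallest lag wins ties)."""
    n = len(tsm)
    best_lag, best_score = None, -1.0
    for lag in range(1, n // 2 + 1):
        score = sum(tsm[i][i + lag] for i in range(n - lag)) / (n - lag)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Toy 4-frame "clip" tracing the corners of a square, repeated 3 times.
clip = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
video = repeat_clip(clip, 3)
tsm = self_similarity(video)
period = estimate_period(tsm)
count = len(video) // period
print(period, count)
```

Because the matrix depends only on how frames relate to each other and not on what they depict, the same procedure applies to any repeated action, which is the intuition behind the class-agnostic claim.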

Viewpoint-Invariant Exercise Repetition Counting

Yu Cheng Hsu, Qingpeng Zhang, Efstratios Tsougenis 2021

Counting repetitions of human exercise and physical rehabilitation movements is a common task in rehabilitation and exercise training. Existing vision-based repetition counting methods place little emphasis on concurrent motions in the same video. This work presents a vision-based method for counting human motion repetitions that handles concurrent motions using skeleton locations extracted by various pose estimation methods. The method was validated on the University of Idaho Physical Rehabilitation Movements Data Set (UI-PRMD) and the MM-Fit dataset. The overall mean absolute error (MAE) on MM-Fit was 0.06, with an off-by-one accuracy (OBOA) of 0.94. The overall MAE on UI-PRMD was 0.06, with an OBOA of 0.95. We also tested performance across a variety of camera locations and concurrent motions on conveniently collected video, obtaining an overall MAE of 0.06 and an OBOA of 0.88. The proposed method provides view-angle-agnostic and motion-agnostic counting of concurrent motions. It could potentially be used in large-scale remote rehabilitation and exercise training with only one camera.
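
As a rough illustration of the two metrics reported above, the sketch below computes MAE and OBOA on made-up counts. The abstract does not give formulas, so the definitions here are assumptions based on common usage in repetition counting: MAE is taken as the absolute count error normalized by the ground-truth count and averaged over videos, and OBOA as the fraction of videos whose predicted count is within one of the ground truth. The function names and sample numbers are hypothetical.

```python
def normalized_mae(predicted, ground_truth):
    """Mean of |pred - true| / true over all videos (assumed definition)."""
    errors = [abs(p - t) / t for p, t in zip(predicted, ground_truth)]
    return sum(errors) / len(errors)

def off_by_one_accuracy(predicted, ground_truth):
    """Fraction of videos with |pred - true| <= 1 (assumed definition)."""
    hits = [abs(p - t) <= 1 for p, t in zip(predicted, ground_truth)]
    return sum(hits) / len(hits)

# Made-up counts for four videos, for illustration only.
predicted = [10, 12, 7, 20]
ground_truth = [10, 11, 9, 20]

mae = normalized_mae(predicted, ground_truth)
oboa = off_by_one_accuracy(predicted, ground_truth)
print(round(mae, 4), oboa)
```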

Computer Vision and Pattern Recognition Human-Computer Interaction

Point in, Box out: Beyond Counting Persons in Crowds

Yuting Liu, Miaojing Shi, Qijun Zhao 2019

Modern crowd counting methods usually employ deep neural networks (DNN) to estimate crowd counts via density regression. Despite their significant improvements, regression-based methods cannot detect individuals in crowds. Detection-based methods, on the other hand, have not been widely explored in recent crowd counting work because they need expensive bounding box annotations. In this work, we instead propose a new deep detection network that requires only point supervision. It can simultaneously detect the size and location of human heads and count them in crowds. We first mine useful person size information from point-level annotations and initialize the pseudo ground truth bounding boxes. An online updating scheme is introduced to refine the pseudo ground truth during training, while a locally-constrained regression loss is designed to provide additional constraints on the size of the predicted boxes in a local neighborhood. Finally, we propose a curriculum learning strategy that trains the network first on images with relatively accurate and easy pseudo ground truth. Extensive experiments are conducted on both detection and counting tasks on several standard benchmarks, e.g. the ShanghaiTech, UCF_CC_50, WiderFace, and TRANCOS datasets, and the results show the superiority of our method over the state of the art.
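
One simple way to picture the initialization step is sketched below: each annotated head point gets a square pseudo box whose side equals the distance to its nearest annotated neighbor. This sizing rule is an assumption chosen for illustration, since the abstract only says that size information is mined from the point annotations, and the helper name `pseudo_boxes` is hypothetical. The online refinement and curriculum steps are not shown.

```python
import math

def pseudo_boxes(points):
    """Return one (x_min, y_min, x_max, y_max) box per point.

    Assumed rule: box side = distance to the nearest other point.
    Requires at least two points.
    """
    boxes = []
    for i, (x, y) in enumerate(points):
        nearest = min(math.dist((x, y), q)
                      for j, q in enumerate(points) if j != i)
        half = nearest / 2
        boxes.append((x - half, y - half, x + half, y + half))
    return boxes

# Made-up head annotations: two close together, one isolated.
heads = [(10.0, 10.0), (14.0, 10.0), (30.0, 10.0)]
boxes = pseudo_boxes(heads)
for box in boxes:
    print(box)
```

In dense regions the nearest neighbor is close, so the boxes come out small, while isolated heads get larger and less reliable boxes, which is one reason a refinement scheme during training is useful.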

Computer Vision and Pattern Recognition

Counting Conjugacy Classes in $Out(F_N)$

Michael Hull, Ilya Kapovich 2017

We show that if a f.g. group $G$ has a non-elementary WPD action on a hyperbolic metric space $X$, then the number of $G$-conjugacy classes of $X$-loxodromic elements of $G$ coming from a ball of radius $R$ in the Cayley graph of $G$ grows exponentially in $R$. As an application we prove that for $N\ge 3$ the number of distinct $Out(F_N)$-conjugacy classes of fully irreducibles $\phi$ from an $R$-ball in the Cayley graph of $Out(F_N)$ with $\log\lambda(\phi)$ on the order of $R$ grows exponentially in $R$.

Group Theory Dynamical Systems Geometric Topology

Time-Agnostic Prediction: Predicting Predictable Video Frames

Dinesh Jayaraman, Frederik Ebert, Alexei A. Efros 2018

Prediction is arguably one of the most basic functions of an intelligent system. In general, the problem of predicting events in the future or between two waypoints is exceedingly difficult. However, most phenomena naturally pass through relatively predictable bottlenecks: while we cannot predict the precise trajectory of a robot arm between being at rest and holding an object up, we can be certain that it must have picked the object up. To exploit this, we decouple visual prediction from a rigid notion of time. While conventional approaches predict frames at regularly spaced temporal intervals, our time-agnostic predictors (TAP) are not tied to specific times so that they may instead discover predictable bottleneck frames no matter when they occur. We evaluate our approach for future and intermediate frame prediction across three robotic manipulation tasks. Our predictions are not only of higher visual quality, but also correspond to coherent semantic subgoals in temporally extended tasks.

Computer Vision and Pattern Recognition Machine Learning

Dilated-Scale-Aware Attention ConvNet For Multi-Class Object Counting

Wei Xu, Dingkang Liang, Yixiao Zheng 2020

Object counting aims to estimate the number of objects in images. The leading counting approaches focus on single-category counting and achieve impressive performance. However, real scenes contain multiple categories of objects, and multi-class object counting expands the scope of application of object counting. Multi-target detection can achieve multi-class object counting in some scenarios, but it requires datasets annotated with bounding boxes. Compared with the point annotations used in mainstream object counting, box-level annotations are more difficult to obtain. In this paper, we propose a simple yet efficient counting network based on point-level annotations. Specifically, we first change the traditional output channel from one to the number of categories to achieve multi-class counting. Since all categories of objects use the same feature extractor in our proposed framework, their features interfere with one another in the shared feature space. We further design a multi-mask structure to suppress harmful interaction among objects. Extensive experiments on challenging benchmarks show that the proposed method achieves state-of-the-art counting performance.
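
The step of widening the output from one channel to one channel per category can be pictured as follows. This sketch assumes the network outputs one density map per category and that a category's count is the sum of its map, as is usual in density-based counting; the abstract does not state this explicitly. The maps are made-up numbers rather than network outputs, the helper name `counts_per_class` is hypothetical, and the multi-mask structure is not modeled.

```python
def counts_per_class(density_maps):
    """Sum each category's density map (a list of rows) to get its count."""
    return [sum(sum(row) for row in channel) for channel in density_maps]

# Made-up output with 2 categories, each a 2x3 density map.
density_maps = [
    [[0.5, 0.5, 0.0],
     [0.25, 0.75, 1.0]],   # category 0: densities sum to 3 objects
    [[0.0, 0.25, 0.25],
     [0.5, 0.0, 0.0]],     # category 1: densities sum to 1 object
]

counts = counts_per_class(density_maps)
print(counts)
```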

Computer Vision and Pattern Recognition