ترغب بنشر مسار تعليمي؟ اضغط هنا

Counting Out Time: Class Agnostic Video Repetition Counting in the Wild

187   0   0.0 ( 0 )
 نشر من قبل Debidatta Dwibedi
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We present an approach for estimating the period with which an action is repeated in a video. The crux of the approach lies in constraining the period prediction module to use temporal self-similarity as an intermediate representation bottleneck that allows generalization to unseen repetitions in videos in the wild. We train this model, called Repnet, with a synthetic dataset that is generated from a large unlabeled video collection by sampling short clips of varying lengths and repeating them with different periods and counts. This combination of synthetic data and a powerful yet constrained model, allows us to predict periods in a class-agnostic fashion. Our model substantially exceeds the state of the art performance on existing periodicity (PERTUBE) and repetition counting (QUVA) benchmarks. We also collect a new challenging dataset called Countix (~90 times larger than existing datasets) which captures the challenges of repetition counting in real-world videos. Project webpage: https://sites.google.com/view/repnet .



قيم البحث

اقرأ أيضاً

Counting the repetition of human exercise and physical rehabilitation is a common task in rehabilitation and exercise training. The existing vision-based repetition counting methods less emphasize the concurrent motions in the same video. This work p resents a vision-based human motion repetition counting applicable to counting concurrent motions through the skeleton location extracted from various pose estimation methods. The presented method was validated on the University of Idaho Physical Rehabilitation Movements Data Set (UI-PRMD), and MM-fit dataset. The overall mean absolute error (MAE) for mm-fit was 0.06 with off-by-one Accuracy (OBOA) 0.94. Overall MAE for UI-PRMD dataset was 0.06 with OBOA 0.95. We have also tested the performance in a variety of camera locations and concurrent motions with conveniently collected video with overall MAE 0.06 and OBOA 0.88. The proposed method provides a view-angle and motion agnostic concurrent motion counting. This method can potentially use in large-scale remote rehabilitation and exercise training with only one camera.
Modern crowd counting methods usually employ deep neural networks (DNN) to estimate crowd counts via density regression. Despite their significant improvements, the regression-based methods are incapable of providing the detection of individuals in c rowds. The detection-based methods, on the other hand, have not been largely explored in recent trends of crowd counting due to the needs for expensive bounding box annotations. In this work, we instead propose a new deep detection network with only point supervision required. It can simultaneously detect the size and location of human heads and count them in crowds. We first mine useful person size information from point-level annotations and initialize the pseudo ground truth bounding boxes. An online updating scheme is introduced to refine the pseudo ground truth during training; while a locally-constrained regression loss is designed to provide additional constraints on the size of the predicted boxes in a local neighborhood. In the end, we propose a curriculum learning strategy to train the network from images of relatively accurate and easy pseudo ground truth first. Extensive experiments are conducted in both detection and counting tasks on several standard benchmarks, e.g. ShanghaiTech, UCF_CC_50, WiderFace, and TRANCOS datasets, and the results show the superiority of our method over the state-of-the-art.
We show that if a f.g. group $G$ has a non-elementary WPD action on a hyperbolic metric space $X$, then the number of $G$-conjugacy classes of $X$-loxodromic elements of $G$ coming from a ball of radius $R$ in the Cayley graph of $G$ grows exponentia lly in $R$. As an application we prove that for $Nge 3$ the number of distinct $Out(F_N)$-conjugacy classes of fully irreducibles $phi$ from an $R$-ball in the Cayley graph of $Out(F_N)$ with $loglambda(phi)$ on the order of $R$ grows exponentially in $R$.
Prediction is arguably one of the most basic functions of an intelligent system. In general, the problem of predicting events in the future or between two waypoints is exceedingly difficult. However, most phenomena naturally pass through relatively p redictable bottlenecks---while we cannot predict the precise trajectory of a robot arm between being at rest and holding an object up, we can be certain that it must have picked the object up. To exploit this, we decouple visual prediction from a rigid notion of time. While conventional approaches predict frames at regularly spaced temporal intervals, our time-agnostic predictors (TAP) are not tied to specific times so that they may instead discover predictable bottleneck frames no matter when they occur. We evaluate our approach for future and intermediate frame prediction across three robotic manipulation tasks. Our predictions are not only of higher visual quality, but also correspond to coherent semantic subgoals in temporally extended tasks.
Object counting aims to estimate the number of objects in images. The leading counting approaches focus on the single category counting task and achieve impressive performance. Note that there are multiple categories of objects in real scenes. Multi- class object counting expands the scope of application of object counting task. The multi-target detection task can achieve multi-class object counting in some scenarios. However, it requires the dataset annotated with bounding boxes. Compared with the point annotations in mainstream object counting issues, the coordinate box-level annotations are more difficult to obtain. In this paper, we propose a simple yet efficient counting network based on point-level annotations. Specifically, we first change the traditional output channel from one to the number of categories to achieve multiclass counting. Since all categories of objects use the same feature extractor in our proposed framework, their features will interfere mutually in the shared feature space. We further design a multi-mask structure to suppress harmful interaction among objects. Extensive experiments on the challenging benchmarks illustrate that the proposed method achieves state-of-the-art counting performance.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا