Sports highlight videos attract a broad viewership and have considerable commercial potential, so it is important to detect highlight scenes that match human interest with high temporal accuracy. Because people instinctively suppress blinks during attention-grabbing events and blink synchronously at attention break points in videos, the instantaneous blink rate can serve as a highly accurate temporal indicator of human interest. In this study, we therefore propose a novel, automatic highlight detection method based on the blink rate. The method trains a one-dimensional convolutional neural network (1D-CNN) to estimate the blink rate at each video frame from the spatio-temporal pose features of figure skating videos. Experiments show that the method successfully estimates the blink rate in 94% of the video clips and predicts the temporal change in the blink rate around a jump event with high accuracy. Moreover, the method detects as key frames not only the representative athletic actions but also the distinctive artistic expressions of a figure skating performance. This suggests that the blink-rate-based supervised learning approach enables high-accuracy highlight detection that more closely matches human sensibility.
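To make the notion of an instantaneous blink rate concrete, the following minimal sketch counts blink onsets in a sliding window centered on each frame and flags frames where the rate drops. It is an illustration only, not the method of the abstract: the function names, the window length, and the threshold rule are all assumptions.

```python
def instantaneous_blink_rate(blink_frames, num_frames, fps=30.0, window_sec=1.0):
    """Return blinks per second at each frame, using a centered sliding window.

    blink_frames: frame indices of observed blink onsets (hypothetical input).
    window_sec:   window length in seconds (an assumed value, not from the source).
    """
    half = int(round(window_sec * fps / 2))
    counts = [0] * num_frames
    for b in blink_frames:
        if 0 <= b < num_frames:
            counts[b] += 1

    rates = []
    for t in range(num_frames):
        lo = max(0, t - half)
        hi = min(num_frames, t + half + 1)
        # Normalize by the actual window length so edge frames are not biased low.
        window_len_sec = (hi - lo) / fps
        rates.append(sum(counts[lo:hi]) / window_len_sec)
    return rates


def low_blink_frames(rates, threshold):
    """Frames whose blink rate is below the threshold (candidate highlights)."""
    return [t for t, r in enumerate(rates) if r < threshold]
```

For example, `low_blink_frames(instantaneous_blink_rate(blinks, n), threshold)` would return the frames in which viewers suppressed blinking, which the abstract associates with attention-grabbing events.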
In this paper, we test the hypothesis that interesting events in unstructured videos are inherently audiovisual. We combine deep image representations for object recognition and scene understanding with representations from an audiovisual affect recognition model. To this set, we add content-agnostic audio-visual synchrony representations and mel-frequency cepstral coefficients to capture other intrinsic properties of audio. These features are used in a modular supervised model. We present results from two experiments: an efficacy study of single features on the task, and an ablation study in which we leave one feature out at a time. For the video summarization task, our results indicate that the visual features carry the most information, and that including audiovisual features improves over visual-only information. To better study the task of highlight detection, we run a pilot experiment with highlight annotations for a small subset of video clips and fine-tune our best model on it. Results indicate that we can transfer knowledge from the video summarization task to a model trained specifically for the task of highlight detection.
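The two experiment designs described above (single-feature efficacy and leave-one-out ablation) can be enumerated as feature subsets. The sketch below only generates the configurations. The feature labels are hypothetical shorthand for the five feature families named in the abstract, and no training or evaluation code is implied.

```python
# Hypothetical labels for the feature families mentioned in the abstract.
FEATURES = ["object", "scene", "affect", "av_sync", "mfcc"]


def single_feature_configs(features):
    """One configuration per feature, each using that feature alone."""
    return [[name] for name in features]


def leave_one_out_configs(features):
    """Return (left_out, remaining) pairs, one per feature."""
    configs = []
    for i, name in enumerate(features):
        remaining = features[:i] + features[i + 1:]
        configs.append((name, remaining))
    return configs
```

Each returned subset would then be passed to whatever training routine the modular model uses; comparing the full-set score against each leave-one-out score indicates how much the omitted feature contributes.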
This paper targets learning to score figure skating sports videos. To address this task, we propose a deep architecture that includes two complementary components, i.e., a Self-Attentive LSTM and a Multi-scale Convolutional Skip LSTM. These two components can efficiently learn the local and global sequential information in each video. Furthermore, we present a large-scale figure skating sports video dataset, the FisV dataset. This dataset includes 500 figure skating videos with an average length of 2 minutes and 50 seconds. Each video is annotated with two scores from nine different referees, i.e., the Total Element Score (TES) and the Total Program Component Score (PCS). Our proposed model is validated on the FisV and MIT-skate datasets. The experimental results show the effectiveness of our models in learning to score figure skating videos.
In this work, we model the movement of a figure skater gliding on ice by the Chaplygin sleigh, a classic pedagogical example of a nonholonomic mechanical system. The Chaplygin sleigh is controlled by a movable added mass, modeling the movable center of mass of the figure skater. The position and velocity of the added mass act as controls that can be used to steer the skater in order to produce prescribed patterns. For any piecewise smooth prescribed curve, this model can be used to determine the controls needed to reproduce that curve by approximating the curve with circular arcs. Tracing of the circular arcs is exact in our control procedure, so the accuracy of the method depends solely on the accuracy of approximation of a trajectory by circular arcs. To reproduce the individual elements of a pattern, we employ an optimization algorithm. We conclude by reproducing a classical double flower figure skating pattern and compute the resulting controls.
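As a rough illustration of approximating a prescribed curve by circular arcs, the sketch below fits a circle through each consecutive triple of sampled curve points. This is only one simple way to obtain arcs and is an assumption of this example; it is not the optimization algorithm referred to in the abstract, and the function names are hypothetical.

```python
import math


def circle_through(p1, p2, p3):
    """Return (cx, cy, r) of the circle through three points, or None if collinear."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-12:
        return None  # Collinear points: the "arc" degenerates to a straight segment.
    s1 = x1 * x1 + y1 * y1
    s2 = x2 * x2 + y2 * y2
    s3 = x3 * x3 + y3 * y3
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return cx, cy, math.hypot(x1 - cx, y1 - cy)


def arcs_from_samples(points):
    """Fit one circle per triple of samples; consecutive arcs share an endpoint."""
    return [
        circle_through(points[i], points[i + 1], points[i + 2])
        for i in range(0, len(points) - 2, 2)
    ]
```

Sampling the curve more densely yields shorter arcs and a closer approximation, which matches the abstract's point that overall accuracy is limited only by how well the arcs approximate the trajectory.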
We derive and analyze a three-dimensional model of a figure skater. We model the skater as a three-dimensional body moving in space subject to a non-holonomic constraint enforcing movement along the skate's direction and holonomic constraints of continuous contact with the ice and pitch constancy of the skate. For a static (non-articulated) skater, we show that the system is integrable if and only if the projection of the center of mass on the skate's direction coincides with the contact point with the ice, under some mild (and realistic) assumptions on the directions of the inertia axes. The integrability is proved by showing the existence of two new constants of motion linear in momenta, providing a new and highly nontrivial example of an integrable non-holonomic mechanical system. We also consider the case when the projection of the center of mass on the skate's direction does not coincide with the contact point and show, by studying the divergence of nearby trajectories, that this non-integrable case exhibits apparently chaotic behavior. We also demonstrate the intricate behavior during the transition from the integrable to the chaotic case. Our model shows many features of real-life skating, especially figure skating, and we conjecture that real-life skaters may intuitively use the discovered mechanical properties of the system to control their performance on ice.
Personalized video highlight detection aims to shorten a long video to interesting moments according to a user's preference, and has recently attracted the community's attention. Current methods regard the user's history as holistic information to predict the user's preference but neglect the inherent diversity of the user's interests, resulting in a vague preference representation. In this paper, we propose a simple yet efficient preference reasoning framework (PR-Net) that explicitly takes these diverse interests into account for frame-level highlight prediction. Specifically, a distinct user-specific preference is produced for each input query frame, represented as the similarity-weighted sum of the history highlights with respect to that query frame. Next, distinct comprehensive preferences are formed from the user-specific preferences and a learnable generic preference to give a more complete highlight measurement. Lastly, the degrees of highlight and non-highlight for each query frame are calculated as its semantic similarity to its comprehensive preference and to the non-highlight preference, respectively. In addition, to alleviate the ambiguity caused by incomplete annotation, a new bi-directional contrastive loss is proposed to ensure a compact and differentiable metric space. In this way, our method significantly outperforms state-of-the-art methods, with a relative improvement of 12% in mean average precision.
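The preference reasoning steps above can be sketched in a few lines. The abstract does not specify the similarity function, the weighting, or how the generic preference is combined, so this example assumes dot-product similarity with softmax weights, element-wise addition of the generic preference, and cosine similarity for scoring. All names are hypothetical and the vectors stand in for learned frame embeddings.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    m = max(xs)  # Subtract the maximum for numerical stability.
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm > 0 else 0.0


def user_preference(query, history):
    """Similarity-weighted sum of history highlight embeddings for one query frame."""
    weights = softmax([dot(query, h) for h in history])  # assumed weighting
    return [sum(w * h[k] for w, h in zip(weights, history)) for k in range(len(query))]


def highlight_scores(query, history, generic, non_highlight):
    """Return (highlight degree, non-highlight degree) for one query frame."""
    pref = user_preference(query, history)
    comprehensive = [p + g for p, g in zip(pref, generic)]  # assumed combination
    return cosine(query, comprehensive), cosine(query, non_highlight)
```

Because the weights are recomputed for every query frame, two frames from the same video can draw on different parts of the user's history, which is the diversity of interests the abstract emphasizes.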