Action detection plays an important role in high-level video understanding and media interpretation. Many existing studies perform this spatio-temporal localization by modeling context, capturing the relationships among the actors, objects, and scenes in the video. However, they often treat all actors uniformly, without considering the consistency and distinctness of individuals, leaving much room for improvement. In this paper, we explicitly highlight the identity information of the actors in both long-term and short-term context through a graph memory network, namely the identity-aware graph memory network (IGMN). Specifically, we propose a hierarchical graph neural network (HGNN) to comprehensively model long-term relations within the same identity as well as between different identities. For short-term context, we develop a dual attention module (DAM) that generates an identity-aware constraint to reduce interference from actors of different identities. Extensive experiments on the challenging AVA dataset demonstrate the effectiveness of our method, which achieves state-of-the-art results on AVA v2.1 and v2.2.
Detecting 3D landmarks on cone-beam computed tomography (CBCT) is crucial to assessing and quantifying the anatomical abnormalities in 3D cephalometric analysis. However, the current methods are time-consuming and suffer from large biases in landmark
Weakly supervised temporal action localization aims to detect and localize actions in untrimmed videos with only video-level labels during training. However, without frame-level annotations, it is challenging to achieve localization completeness and
In this technical report, we describe our solution to the temporal action proposal task (task 1) in the ActivityNet Challenge 2019. First, we fine-tune a ResNet-50-C3D CNN on ActivityNet v1.3, starting from a Kinetics-pretrained model, to extract snippet-level video repre
This technical report presents our solution for the temporal action detection task in the ActivityNet Challenge 2021. The purpose of this task is to locate and identify actions of interest in long untrimmed videos. The crucial challenge of the task comes fr
Despite recent progress in automatic medical image segmentation techniques, fully automatic results usually fail to meet clinical requirements and typically require further refinement. In this work, we propose a quality-aware memory network for interactiv