ﻻ يوجد ملخص باللغة العربية
The goal of weakly-supervised video moment retrieval is to localize the video segment most relevant to the given natural language query without access to temporal annotations during training. Prior strongly- and weakly-supervised approaches often leverage co-attention mechanisms to learn visual-semantic representations for localization. However, while such approaches tend to focus on identifying relationships between elements of the video and language modalities, there is less emphasis on modeling relational context between video frames given the semantic context of the query. Consequently, the above-mentioned visual-semantic representations, built upon local frame features, do not contain much contextual information. To address this limitation, we propose a Latent Graph Co-Attention Network (LoGAN) that exploits fine-grained frame-by-word interactions to reason about correspondences between all possible pairs of frames, given the semantic context of the query. Comprehensive experiments across two datasets, DiDeMo and Charades-Sta, demonstrate the effectiveness of our proposed latent co-attention model where it outperforms current state-of-the-art (SOTA) weakly-supervised approaches by a significant margin. Notably, it even achieves a 11% improvement to Recall@1 accuracy over strongly-supervised SOTA methods on DiDeMo.
Weakly supervised semantic segmentation is receiving great attention due to its low human annotation cost. In this paper, we aim to tackle bounding box supervised semantic segmentation, i.e., training accurate semantic segmentation models using bound
Content-based video retrieval aims to find videos from a large video database that are similar to or even near-duplicate of a given query video. Video representation and similarity search algorithms are crucial to any video retrieval system. To deriv
During the first wave of COVID-19, hospitals were overwhelmed with the high number of admitted patients. An accurate prediction of the most likely individual disease progression can improve the planning of limited resources and finding the optimal tr
Cervical cancer is the second most prevalent cancer affecting women today. As the early detection of cervical carcinoma relies heavily upon screening and pre-clinical testing, digital cervicography has great potential as a primary or auxiliary screen
Weakly Supervised Object Detection (WSOD) has emerged as an effective tool to train object detectors using only the image-level category labels. However, without object-level labels, WSOD detectors are prone to detect bounding boxes on salient object