ﻻ يوجد ملخص باللغة العربية
Recent studies on image memorability have shed light on the visual features that make generic images, object images or face photographs memorable. However, a clear understanding and reliable estimation of natural scene memorability remain elusive. In this paper, we provide an attempt to answer: what exactly makes natural scene memorable. Specifically, we first build LNSIM, a large-scale natural scene image memorability database (containing 2,632 images and memorability annotations). Then, we mine our database to investigate how low-, middle- and high-level handcrafted features affect the memorability of natural scene. In particular, we find that high-level feature of scene category is rather correlated with natural scene memorability. Thus, we propose a deep neural network based natural scene memorability (DeepNSM) predictor, which takes advantage of scene category. Finally, the experimental results validate the effectiveness of DeepNSM.
With the explosion of video content on the Internet, there is a need for research on methods for video analysis which take human cognition into account. One such cognitive measure is memorability, or the ability to recall visual content after watchin
Recent studies indicate that hierarchical Vision Transformer with a macro architecture of interleaved non-overlapped window-based self-attention & shifted-window operation is able to achieve state-of-the-art performance in various visual recognition
Consider end-to-end training of a multi-modal vs. a single-modal network on a task with multiple input modalities: the multi-modal network receives more information, so it should match or outperform its single-modal counterpart. In our experiments, h
Contrastive learning between multiple views of the data has recently achieved state of the art performance in the field of self-supervised representation learning. Despite its success, the influence of different view choices has been less studied. In
Contrastive visual pretraining based on the instance discrimination pretext task has made significant progress. Notably, recent work on unsupervised pretraining has shown to surpass the supervised counterpart for finetuning downstream applications su