ﻻ يوجد ملخص باللغة العربية
Previous methods for representing scene images based on deep learning primarily consider either the foreground or background information as the discriminating clues for the classification task. However, scene images also require additional information (hybrid) to cope with the inter-class similarity and intra-class variation problems. In this paper, we propose to use hybrid features in addition to foreground and background features to represent scene images. We suppose that these three types of information could jointly help to represent scene image more accurately. To this end, we adopt three VGG-16 architectures pre-trained on ImageNet, Places, and Hybrid (both ImageNet and Places) datasets for the corresponding extraction of foreground, background and hybrid information. All these three types of deep features are further aggregated to achieve our final features for the representation of scene images. Extensive experiments on two large benchmark scene datasets (MIT-67 and SUN-397) show that our method produces the state-of-the-art classification performance.
Nowadays it is prevalent to take features extracted from pre-trained deep learning models as image representations which have achieved promising classification performance. Existing methods usually consider either object-based features or scene-based
Existing research in scene image classification has focused on either content features (e.g., visual information) or context features (e.g., annotations). As they capture different information about images which can be complementary and useful to dis
Ambient sound scenes typically comprise multiple short events occurring on top of a somewhat stationary background. We consider the task of separating these events from the background, which we call foreground-background ambient sound scene separatio
Indoor image features extraction is a fundamental problem in multiple fields such as image processing, pattern recognition, robotics and so on. Nevertheless, most of the existing feature extraction methods, which extract features based on pixels, col
The ability to decompose scenes in terms of abstract building blocks is crucial for general intelligence. Where those basic building blocks share meaningful properties, interactions and other regularities across scenes, such decompositions can simpli