In this paper, we propose an integrated model of semantic-aware and contrast-aware saliency that combines bottom-up and top-down cues for effective saliency estimation and eye-fixation prediction. The proposed model processes visual information along two pathways. The first pathway captures attractive semantic information in images, especially the presence of meaningful objects and object parts such as human faces. The second pathway is based on multi-scale online feature learning and information maximization; it learns an adaptive sparse representation of the input and discovers high-contrast salient patterns within the image context. The two pathways characterize long-term and short-term attention cues, respectively, and are integrated dynamically using maxima normalization. We investigate two implementations of the semantic pathway, an end-to-end deep neural network and a dynamic feature integration scheme, resulting in the SCA and SCAFI models, respectively. Experimental results on artificial images and five popular benchmark datasets demonstrate the superior performance and better plausibility of the proposed model over both classic approaches and recent deep models.
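As a concrete illustration of the fusion step, the following is a minimal NumPy/SciPy sketch of the classic Itti–Koch-style maxima-normalization operator N(·) named in the abstract. The neighborhood size, the (M − m̄)² weighting, and the final summation in `integrate` are assumptions for illustration; the abstract does not spell out the exact formulation used by SCA/SCAFI.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def maxima_normalize(smap, local_size=15):
    """Itti-Koch-style maxima normalization N(.):
    scale the map to [0, 1], then weight it by (M - m_bar)^2, where M is
    the global maximum and m_bar is the mean of the other local maxima.
    Maps with one dominant peak are boosted; maps with many comparable
    peaks are suppressed. local_size is an assumed neighborhood size."""
    smap = smap.astype(np.float64)
    rng = smap.max() - smap.min()
    if rng == 0:
        return np.zeros_like(smap)
    smap = (smap - smap.min()) / rng  # scale to [0, 1]
    M = smap.max()
    # Local maxima: points equal to the maximum over their neighborhood.
    peaks = smap == maximum_filter(smap, size=local_size)
    peak_vals = smap[peaks]
    # Mean of local maxima excluding the global maximum.
    others = peak_vals[peak_vals < M]
    m_bar = others.mean() if others.size else 0.0
    return smap * (M - m_bar) ** 2

def integrate(semantic_map, contrast_map):
    """Assumed integration of the two pathway outputs: each pathway's
    saliency map is maxima-normalized, then the two maps are summed."""
    return maxima_normalize(semantic_map) + maxima_normalize(contrast_map)
```

Under this scheme, a pathway whose map contains a single strong peak contributes more to the fused map than one whose activity is spread evenly, which matches the abstract's description of dynamic integration.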
Existing weakly supervised semantic segmentation (WSSS) methods usually utilize the results of pre-trained saliency detection (SD) models without explicitly modeling the connections between the two tasks, which is not the most efficient configuration.
Aesthetic image cropping is a practical but challenging task which aims at finding the best crops with the highest aesthetic quality in an image. Recently, many deep learning methods have been proposed to address this problem, but they did not reveal …
Objective image quality assessment (IQA) is imperative in the current multimedia-intensive world, in order to assess the visual quality of an image at close to a human level of ability. Many parameters such as color intensity, structure, sharpness, c…
Current semantic segmentation methods focus only on mining local context, i.e., dependencies between pixels within individual images, by context-aggregation modules (e.g., dilated convolution, neural attention) or structure-aware optimization criteria…
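For context, here is a minimal PyTorch sketch of the dilated-convolution idea this snippet refers to; the channel sizes and layer settings are illustrative, not taken from the paper. A 3×3 kernel with dilation 2 samples a 5×5 neighborhood with the same nine weights, so stacking dilated layers aggregates context from pixels far apart within the same image.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)  # (batch, channels, height, width)

# Plain 3x3 convolution: each output pixel sees a 3x3 neighborhood.
conv_plain = nn.Conv2d(64, 64, kernel_size=3, padding=1, dilation=1)

# Dilated 3x3 convolution: same number of weights, 5x5 receptive field.
conv_dilated = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)

# Both settings preserve the 32x32 spatial resolution.
print(conv_plain(x).shape, conv_dilated(x).shape)
```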
Deep-learning-based algorithms have led to impressive results in visual-saliency prediction, but the impact of noise in training gaze data has been largely overlooked. This issue is especially relevant for videos, where the gaze data tends to be inco…