No Arabic abstract
As it is said by Van Gogh, great things are done by a series of small things brought together. Aesthetic experience arises from the aggregation of underlying visual components. However, most existing deep image aesthetic assessment (IAA) methods over-simplify the IAA process by failing to model image aesthetics with clearly-defined visual components as building blocks. As a result, the connection between resulting aesthetic predictions and underlying visual components is mostly invisible and hard to be explicitly controlled, which limits the model in both performance and interpretability. This work aims to model image aesthetics from the level of visual components. Specifically, object-level regions detected by a generic object detector are defined as visual components, namely object-level visual components (OVCs). Then generic features representing OVCs are aggregated for the aesthetic prediction based upon proposed object-level and graph attention mechanisms, which dynamically determines the importance of individual OVCs and relevance between OVC pairs, respectively. Experimental results confirm the superiority of our framework over previous relevant methods in terms of SRCC and PLCC on the aesthetic rating distribution prediction. Besides, quantitative analysis is done towards model interpretation by observing how OVCs contribute to aesthetic predictions, whose results are found to be supported by psychology on aesthetics and photography rules. To the best of our knowledge, this is the first attempt at the interpretation of a deep IAA model.
In recent years, visual comfort assessment (VCA) for 3D/stereoscopic content has aroused extensive attention. However, much less work has been done on the perceptual evaluation of stereoscopic image retargeting. In this paper, we first build a Stereoscopic Image Retargeting Database (SIRD), which contains source images and retargeted images produced by four typical stereoscopic retargeting methods. Then, the subjective experiment is conducted to assess four aspects of visual distortion, i.e. visual comfort, image quality, depth quality and the overall quality. Furthermore, we propose a Visual Comfort Assessment metric for Stereoscopic Image Retargeting (VCA-SIR). Based on the characteristics of stereoscopic retargeted images, the proposed model introduces novel features like disparity range, boundary disparity as well as disparity intensity distribution into the assessment model. Experimental results demonstrate that VCA-SIR can achieve high consistency with subjective perception.
Visual comfort is a quite important factor in 3D media service. Few research efforts have been carried out in this area especially in case of 3D content retargeting which may introduce more complicated visual distortions. In this paper, we propose a Hybrid Distortion Aggregated Visual Comfort Assessment (HDA-VCA) scheme for stereoscopic retargeted images (SRI), considering aggregation of hybrid distortions including structure distortion, information loss, binocular incongruity and semantic distortion. Specifically, a Local-SSIM feature is proposed to reflect the local structural distortion of SRI, and information loss is represented by Dual Natural Scene Statistics (D-NSS) feature extracted from the binocular summation and difference channels. Regarding binocular incongruity, visual comfort zone, window violation, binocular rivalry, and accommodation-vergence conflict of human visual system (HVS) are evaluated. Finally, the semantic distortion is represented by the correlation distance of paired feature maps extracted from original stereoscopic image and its retargeted image by using trained deep neural network. We validate the effectiveness of HDA-VCA on published Stereoscopic Image Retargeting Database (SIRD) and two stereoscopic image databases IEEE-SA and NBU 3D-VCA. The results demonstrate HDA-VCAs superior performance in handling hybrid distortions compared to state-of-the-art VCA schemes.
Personalized image aesthetic assessment (PIAA) has recently become a hot topic due to its usefulness in a wide variety of applications such as photography, film and television, e-commerce, fashion design and so on. This task is more seriously affected by subjective factors and samples provided by users. In order to acquire precise personalized aesthetic distribution by small amount of samples, we propose a novel user-guided personalized image aesthetic assessment framework. This framework leverages user interactions to retouch and rank images for aesthetic assessment based on deep reinforcement learning (DRL), and generates personalized aesthetic distribution that is more in line with the aesthetic preferences of different users. It mainly consists of two stages. In the first stage, personalized aesthetic ranking is generated by interactive image enhancement and manual ranking, meanwhile two policy networks will be trained. The images will be pushed to the user for manual retouching and simultaneously to the enhancement policy network. The enhancement network utilizes the manual retouching results as the optimization goals of DRL. After that, the ranking process performs the similar operations like the retouching mentioned before. These two networks will be trained iteratively and alternatively to help to complete the final personalized aesthetic assessment automatically. In the second stage, these modified images are labeled with aesthetic attributes by one style-specific classifier, and then the personalized aesthetic distribution is generated based on the multiple aesthetic attributes of these images, which conforms to the aesthetic preference of users better.
Nowadays, most existing blind image quality assessment (BIQA) models 1) are developed for synthetically-distorted images and often generalize poorly to authentic ones; 2) heavily rely on human ratings, which are prohibitively labor-expensive to collect. Here, we propose an $opinion$-$free$ BIQA method that learns from synthetically-distorted images and multiple agents to assess the perceptual quality of authentically-distorted ones captured in the wild without relying on human labels. Specifically, we first assemble a large number of image pairs from synthetically-distorted images and use a set of full-reference image quality assessment (FR-IQA) models to assign pseudo-binary labels of each pair indicating which image has higher quality as the supervisory signal. We then train a convolutional neural network (CNN)-based BIQA model to rank the perceptual quality, optimized for consistency with the binary labels. Since there exists domain shift between the synthetically- and authentically-distorted images, an unsupervised domain adaptation (UDA) module is introduced to alleviate this issue. Extensive experiments demonstrate the effectiveness of our proposed $opinion$-$free$ BIQA model, yielding state-of-the-art performance in terms of correlation with human opinion scores, as well as gMAD competition. Codes will be made publicly available upon acceptance.
Existing blind image quality assessment (BIQA) methods are mostly designed in a disposable way and cannot evolve with unseen distortions adaptively, which greatly limits the deployment and application of BIQA models in real-world scenarios. To address this problem, we propose a novel Lifelong blind Image Quality Assessment (LIQA) approach, targeting to achieve the lifelong learning of BIQA. Without accessing to previous training data, our proposed LIQA can not only learn new distortions, but also mitigate the catastrophic forgetting of seen distortions. Specifically, we adopt the Split-and-Merge distillation strategy to train a single-head network that makes task-agnostic predictions. In the split stage, we first employ a distortion-specific generator to obtain the pseudo features of each seen distortion. Then, we use an auxiliary multi-head regression network to generate the predicted quality of each seen distortion. In the merge stage, we replay the pseudo features paired with pseudo labels to distill the knowledge of multiple heads, which can build the final regressed single head. Experimental results demonstrate that the proposed LIQA method can handle the continuous shifts of different distortion types and even datasets. More importantly, our LIQA model can achieve stable performance even if the task sequence is long.