No Arabic abstract
A good distortion representation is crucial for the success of deep blind image quality assessment (BIQA). However, most previous methods do not effectively model the relationship between distortions or the distribution of samples with the same distortion type but different distortion levels. In this work, we start from the analysis of the relationship between perceptual image quality and distortion-related factors, such as distortion types and levels. Then, we propose a Distortion Graph Representation (DGR) learning framework for IQA, named GraphIQA, in which each distortion is represented as a graph, ieno, DGR. One can distinguish distortion types by learning the contrast relationship between these different DGRs, and infer the ranking distribution of samples from different levels in a DGR. Specifically, we develop two sub-networks to learn the DGRs: a) Type Discrimination Network (TDN) that aims to embed DGR into a compact code for better discriminating distortion types and learning the relationship between types; b) Fuzzy Prediction Network (FPN) that aims to extract the distributional characteristics of the samples in a DGR and predicts fuzzy degrees based on a Gaussian prior. Experiments show that our GraphIQA achieves the state-of-the-art performance on many benchmark datasets of both synthetic and authentic distortions.
Existing blind image quality assessment (BIQA) methods are mostly designed in a disposable way and cannot evolve with unseen distortions adaptively, which greatly limits the deployment and application of BIQA models in real-world scenarios. To address this problem, we propose a novel Lifelong blind Image Quality Assessment (LIQA) approach, targeting to achieve the lifelong learning of BIQA. Without accessing to previous training data, our proposed LIQA can not only learn new distortions, but also mitigate the catastrophic forgetting of seen distortions. Specifically, we adopt the Split-and-Merge distillation strategy to train a single-head network that makes task-agnostic predictions. In the split stage, we first employ a distortion-specific generator to obtain the pseudo features of each seen distortion. Then, we use an auxiliary multi-head regression network to generate the predicted quality of each seen distortion. In the merge stage, we replay the pseudo features paired with pseudo labels to distill the knowledge of multiple heads, which can build the final regressed single head. Experimental results demonstrate that the proposed LIQA method can handle the continuous shifts of different distortion types and even datasets. More importantly, our LIQA model can achieve stable performance even if the task sequence is long.
Nowadays, most existing blind image quality assessment (BIQA) models 1) are developed for synthetically-distorted images and often generalize poorly to authentic ones; 2) heavily rely on human ratings, which are prohibitively labor-expensive to collect. Here, we propose an $opinion$-$free$ BIQA method that learns from synthetically-distorted images and multiple agents to assess the perceptual quality of authentically-distorted ones captured in the wild without relying on human labels. Specifically, we first assemble a large number of image pairs from synthetically-distorted images and use a set of full-reference image quality assessment (FR-IQA) models to assign pseudo-binary labels of each pair indicating which image has higher quality as the supervisory signal. We then train a convolutional neural network (CNN)-based BIQA model to rank the perceptual quality, optimized for consistency with the binary labels. Since there exists domain shift between the synthetically- and authentically-distorted images, an unsupervised domain adaptation (UDA) module is introduced to alleviate this issue. Extensive experiments demonstrate the effectiveness of our proposed $opinion$-$free$ BIQA model, yielding state-of-the-art performance in terms of correlation with human opinion scores, as well as gMAD competition. Codes will be made publicly available upon acceptance.
The explosive growth of image data facilitates the fast development of image processing and computer vision methods for emerging visual applications, meanwhile introducing novel distortions to the processed images. This poses a grand challenge to existing blind image quality assessment (BIQA) models, failing to continually adapt to such subpopulation shift. Recent work suggests training BIQA methods on the combination of all available human-rated IQA datasets. However, this type of approach is not scalable to a large number of datasets, and is cumbersome to incorporate a newly created dataset as well. In this paper, we formulate continual learning for BIQA, where a model learns continually from a stream of IQA datasets, building on what was learned from previously seen data. We first identify five desiderata in the new setting with a measure to quantify the plasticity-stability trade-off. We then propose a simple yet effective method for learning BIQA models continually. Specifically, based on a shared backbone network, we add a prediction head for a new dataset, and enforce a regularizer to allow all prediction heads to evolve with new data while being resistant to catastrophic forgetting of old data. We compute the quality score by an adaptive weighted summation of estimates from all prediction heads. Extensive experiments demonstrate the promise of the proposed continual learning method in comparison to standard training techniques for BIQA.
Visual comfort is a quite important factor in 3D media service. Few research efforts have been carried out in this area especially in case of 3D content retargeting which may introduce more complicated visual distortions. In this paper, we propose a Hybrid Distortion Aggregated Visual Comfort Assessment (HDA-VCA) scheme for stereoscopic retargeted images (SRI), considering aggregation of hybrid distortions including structure distortion, information loss, binocular incongruity and semantic distortion. Specifically, a Local-SSIM feature is proposed to reflect the local structural distortion of SRI, and information loss is represented by Dual Natural Scene Statistics (D-NSS) feature extracted from the binocular summation and difference channels. Regarding binocular incongruity, visual comfort zone, window violation, binocular rivalry, and accommodation-vergence conflict of human visual system (HVS) are evaluated. Finally, the semantic distortion is represented by the correlation distance of paired feature maps extracted from original stereoscopic image and its retargeted image by using trained deep neural network. We validate the effectiveness of HDA-VCA on published Stereoscopic Image Retargeting Database (SIRD) and two stereoscopic image databases IEEE-SA and NBU 3D-VCA. The results demonstrate HDA-VCAs superior performance in handling hybrid distortions compared to state-of-the-art VCA schemes.
Ensemble methods are generally regarded to be better than a single model if the base learners are deemed to be accurate and diverse. Here we investigate a semi-supervised ensemble learning strategy to produce generalizable blind image quality assessment models. We train a multi-head convolutional network for quality prediction by maximizing the accuracy of the ensemble (as well as the base learners) on labeled data, and the disagreement (i.e., diversity) among them on unlabeled data, both implemented by the fidelity loss. We conduct extensive experiments to demonstrate the advantages of employing unlabeled data for BIQA, especially in model generalization and failure identification.