No Arabic abstract
The quality assessment of light field images presents new challenges to conventional compression methods, as the spatial quality is affected by the optical distortion of capturing devices, and the angular consistency affects the performance of dynamic rendering applications. In this paper, we propose a two-pass encoding system for pseudo-temporal sequence based light field image compression with a novel frame level bit allocation framework that optimizes spatial quality and angular consistency simultaneously. Frame level rate-distortion models are estimated during the first pass, and the second pass performs the actual encoding with optimized bit allocations given by a two-step convex programming. The proposed framework supports various encoder configurations. Experimental results show that comparing to the anchor HM 16.16 (HEVC reference software), the proposed two-pass encoding system on average achieves 11.2% to 11.9% BD-rate reductions for the all-intra configuration, 15.8% to 32.7% BD-rate reductions for the random-access configuration, and 12.1% to 15.7% BD-rate reductions for the low-delay configuration. The resulting bit errors are limited, and the total time cost is less than twice of the one-pass anchor. Comparing with our earlier low-delay configuration based method, the proposed system improves BD-rate reduction by 3.1% to 8.3%, reduces the bit errors by more than 60%, and achieves more than 12x speed up.
Compared with conventional image and video, light field images introduce the weight channel, as well as the visual consistency of rendered view, information that has to be taken into account when compressing the pseudo-temporal-sequence (PTS) created from light field images. In this paper, we propose a novel frame level bit allocation framework for PTS coding. A joint model that measures weighted distortion and visual consistency, combined with an iterative encoding system, yields the optimal bit allocation for each frame by solving a convex optimization problem. Experimental results show that the proposed framework is effective in producing desired distortion distribution based on weights, and achieves up to 24.7% BD-rate reduction comparing to the default rate control algorithm.
Light field (LF) representations aim to provide photo-realistic, free-viewpoint viewing experiences. However, the most popular LF representations are images from multiple views. Multi-view image-based representations generally need to restrict the range or degrees of freedom of the viewing experience to what can be interpolated in the image domain, essentially because they lack explicit geometry information. We present a new surface light field (SLF) representation based on explicit geometry, and a method for SLF compression. First, we map the multi-view images of a scene onto a 3D geometric point cloud. The color of each point in the point cloud is a function of viewing direction known as a view map. We represent each view map efficiently in a B-Spline wavelet basis. This representation is capable of modeling diverse surface materials and complex lighting conditions in a highly scalable and adaptive manner. The coefficients of the B-Spline wavelet representation are then compressed spatially. To increase the spatial correlation and thus improve compression efficiency, we introduce a smoothing term to make the coefficients more similar across the 3D space. We compress the coefficients spatially using existing point cloud compression (PCC) methods. On the decoder side, the scene is rendered efficiently from any viewing direction by reconstructing the view map at each point. In contrast to multi-view image-based LF approaches, our method supports photo-realistic rendering of real-world scenes from arbitrary viewpoints, i.e., with an unlimited six degrees of freedom (6DOF). In terms of rate and distortion, experimental results show that our method achieves superior performance with lighter decoder complexity compared with a reference image-plus-geometry compression (IGC) scheme, indicating its potential in practical virtual and augmented reality applications.
Learning-based light field reconstruction methods demand in constructing a large receptive field by deepening the network to capture correspondences between input views. In this paper, we propose a spatial-angular attention network to perceive correspondences in the light field non-locally, and reconstruction high angular resolution light field in an end-to-end manner. Motivated by the non-local attention mechanism, a spatial-angular attention module specifically for the high-dimensional light field data is introduced to compute the responses from all the positions in the epipolar plane for each pixel in the light field, and generate an attention map that captures correspondences along the angular dimension. We then propose a multi-scale reconstruction structure to efficiently implement the non-local attention in the low spatial scale, while also preserving the high frequency components in the high spatial scales. Extensive experiments demonstrate the superior performance of the proposed spatial-angular attention network for reconstructing sparsely-sampled light fields with non-Lambertian effects.
A good distortion representation is crucial for the success of deep blind image quality assessment (BIQA). However, most previous methods do not effectively model the relationship between distortions or the distribution of samples with the same distortion type but different distortion levels. In this work, we start from the analysis of the relationship between perceptual image quality and distortion-related factors, such as distortion types and levels. Then, we propose a Distortion Graph Representation (DGR) learning framework for IQA, named GraphIQA, in which each distortion is represented as a graph, ieno, DGR. One can distinguish distortion types by learning the contrast relationship between these different DGRs, and infer the ranking distribution of samples from different levels in a DGR. Specifically, we develop two sub-networks to learn the DGRs: a) Type Discrimination Network (TDN) that aims to embed DGR into a compact code for better discriminating distortion types and learning the relationship between types; b) Fuzzy Prediction Network (FPN) that aims to extract the distributional characteristics of the samples in a DGR and predicts fuzzy degrees based on a Gaussian prior. Experiments show that our GraphIQA achieves the state-of-the-art performance on many benchmark datasets of both synthetic and authentic distortions.
Existing blind image quality assessment (BIQA) methods are mostly designed in a disposable way and cannot evolve with unseen distortions adaptively, which greatly limits the deployment and application of BIQA models in real-world scenarios. To address this problem, we propose a novel Lifelong blind Image Quality Assessment (LIQA) approach, targeting to achieve the lifelong learning of BIQA. Without accessing to previous training data, our proposed LIQA can not only learn new distortions, but also mitigate the catastrophic forgetting of seen distortions. Specifically, we adopt the Split-and-Merge distillation strategy to train a single-head network that makes task-agnostic predictions. In the split stage, we first employ a distortion-specific generator to obtain the pseudo features of each seen distortion. Then, we use an auxiliary multi-head regression network to generate the predicted quality of each seen distortion. In the merge stage, we replay the pseudo features paired with pseudo labels to distill the knowledge of multiple heads, which can build the final regressed single head. Experimental results demonstrate that the proposed LIQA method can handle the continuous shifts of different distortion types and even datasets. More importantly, our LIQA model can achieve stable performance even if the task sequence is long.