No Arabic abstract
In contrast with traditional video, omnidirectional video enables spherical viewing direction with support for head-mounted displays, providing an interactive and immersive experience. Unfortunately, to the best of our knowledge, there are few visual quality assessment (VQA) methods, either subjective or objective, for omnidirectional video coding. This paper proposes both subjective and objective methods for assessing quality loss in encoding omnidirectional video. Specifically, we first present a new database, which includes the viewing direction data from several subjects watching omnidirectional video sequences. Then, from our database, we find a high consistency in viewing directions across different subjects. The viewing directions are normally distributed in the center of the front regions, but they sometimes fall into other regions, related to video content. Given this finding, we present a subjective VQA method for measuring difference mean opinion score (DMOS) of the whole and regional omnidirectional video, in terms of overall DMOS (O-DMOS) and vectorized DMOS (V-DMOS), respectively. Moreover, we propose two objective VQA methods for encoded omnidirectional video, in light of human perception characteristics of omnidirectional video. One method weighs the distortion of pixels with regard to their distances to the center of front regions, which considers human preference in a panorama. The other method predicts viewing directions according to video content, and then the predicted viewing directions are leveraged to allocate weights to the distortion of each pixel in our objective VQA method. Finally, our experimental results verify that both the subjective and objective methods proposed in this paper advance state-of-the-art VQA for omnidirectional video.
The diversity of video delivery pipeline poses a grand challenge to the evaluation of adaptive bitrate (ABR) streaming algorithms and objective quality-of-experience (QoE) models. Here we introduce so-far the largest subject-rated database of its kind, namely WaterlooSQoE-IV, consisting of 1350 adaptive streaming videos created from diverse source contents, video encoders, network traces, ABR algorithms, and viewing devices. We collect human opinions for each video with a series of carefully designed subjective experiments. Subsequent data analysis and testing/comparison of ABR algorithms and QoE models using the database lead to a series of novel observations and interesting findings, in terms of the effectiveness of subjective experiment methodologies, the interactions between user experience and source content, viewing device and encoder type, the heterogeneities in the bias and preference of user experiences, the behaviors of ABR algorithms, and the performance of objective QoE models. Most importantly, our results suggest that a better objective QoE model, or a better understanding of human perceptual experience and behaviour, is the most dominating factor in improving the performance of ABR algorithms, as opposed to advanced optimization frameworks, machine learning strategies or bandwidth predictors, where a majority of ABR research has been focused on in the past decade. On the other hand, our performance evaluation of 11 QoE models shows only a moderate correlation between state-of-the-art QoE models and subjective ratings, implying rooms for improvement in both QoE modeling and ABR algorithms. The database is made publicly available at: url{https://ece.uwaterloo.ca/~zduanmu/waterloosqoe4/}.
Video live streaming is gaining prevalence among video streaming services, especially for the delivery of popular sporting events. Many objective Video Quality Assessment (VQA) models have been developed to predict the perceptual quality of videos. Appropriate databases that exemplify the distortions encountered in live streaming videos are important to designing and learning objective VQA models. Towards making progress in this direction, we built a video quality database specifically designed for live streaming VQA research. The new video database is called the Laboratory for Image and Video Engineering (LIVE) Live stream Database. The LIVE Livestream Database includes 315 videos of 45 contents impaired by 6 types of distortions. We also performed a subjective quality study using the new database, whereby more than 12,000 human opinions were gathered from 40 subjects. We demonstrate the usefulness of the new resource by performing a holistic evaluation of the performance of current state-of-the-art (SOTA) VQA models. The LIVE Livestream database is being made publicly available for these purposes at https://live.ece.utexas.edu/research/LIVE_APV_Study/apv_index.html.
Perceptual quality assessment of the videos acquired in the wilds is of vital importance for quality assurance of video services. The inaccessibility of reference videos with pristine quality and the complexity of authentic distortions pose great challenges for this kind of blind video quality assessment (BVQA) task. Although model-based transfer learning is an effective and efficient paradigm for the BVQA task, it remains to be a challenge to explore what and how to bridge the domain shifts for better video representation. In this work, we propose to transfer knowledge from image quality assessment (IQA) databases with authentic distortions and large-scale action recognition with rich motion patterns. We rely on both groups of data to learn the feature extractor. We train the proposed model on the target VQA databases using a mixed list-wise ranking loss function. Extensive experiments on six databases demonstrate that our method performs very competitively under both individual database and mixed database training settings. We also verify the rationality of each component of the proposed method and explore a simple manner for further improvement.
Image quality assessment (IQA) models aim to establish a quantitative relationship between visual images and their perceptual quality by human observers. IQA modeling plays a special bridging role between vision science and engineering practice, both as a test-bed for vision theories and computational biovision models, and as a powerful tool that could potentially make profound impact on a broad range of image processing, computer vision, and computer graphics applications, for design, optimization, and evaluation purposes. IQA research has enjoyed an accelerated growth in the past two decades. Here we present an overview of IQA methods from a Bayesian perspective, with the goals of unifying a wide spectrum of IQA approaches under a common framework and providing useful references to fundamental concepts accessible to vision scientists and image processing practitioners. We discuss the implications of the successes and limitations of modern IQA methods for biological vision and the prospect for vision science to inform the design of future artificial vision systems.
Deep learning based methods have achieved remarkable success in image restoration and enhancement, but most such methods rely on RGB input images. These methods fail to take into account the rich spectral distribution of natural images. We propose a deep architecture, SpecNet, which computes spectral profile to estimate pixel-wise dynamic range adjustment of a given image. First, we employ an unpaired cycle-consistent framework to generate hyperspectral images (HSI) from low-light input images. HSI is further used to generate a normal light image of the same scene. We incorporate a self-supervision and a spectral profile regularization network to infer a plausible HSI from an RGB image. We evaluate the benefits of optimizing the spectral profile for real and fake images in low-light conditions on the LOL Dataset.