No Arabic abstract
The latest High Efficiency Video Coding (HEVC) standard has been increasingly applied to generate video streams over the Internet. However, HEVC compressed videos may incur severe quality degradation, particularly at low bit-rates. Thus, it is necessary to enhance the visual quality of HEVC videos at the decoder side. To this end, this paper proposes a Quality Enhancement Convolutional Neural Network (QE-CNN) method that does not require any modification of the encoder to achieve quality enhancement for HEVC. In particular, our QE-CNN method learns QE-CNN-I and QE-CNN-P models to reduce the distortion of HEVC I and P frames, respectively. The proposed method differs from the existing CNN-based quality enhancement approaches, which only handle intra-coding distortion and are thus not suitable for P frames. Our experimental results validate that our QE-CNN method is effective in enhancing quality for both I and P frames of HEVC videos. To apply our QE-CNN method in time-constrained scenarios, we further propose a Time-constrained Quality Enhancement Optimization (TQEO) scheme. Our TQEO scheme controls the computational time of QE-CNN to meet a target, meanwhile maximizing the quality enhancement. Next, the experimental results demonstrate the effectiveness of our TQEO scheme from the aspects of time control accuracy and quality enhancement under different time constraints. Finally, we design a prototype to implement our TQEO scheme in a real-time scenario.
In this paper, we propose a partition-masked Convolution Neural Network (CNN) to achieve compressed-video enhancement for the state-of-the-art coding standard, High Efficiency Video Coding (HECV). More precisely, our method utilizes the partition information produced by the encoder to guide the quality enhancement process. In contrast to existing CNN-based approaches, which only take the decoded frame as the input to the CNN, the proposed approach considers the coding unit (CU) size information and combines it with the distorted decoded frame such that the degradation introduced by HEVC is reduced more efficiently. Experimental results show that our approach leads to over 9.76% BD-rate saving on benchmark sequences, which achieves the state-of-the-art performance.
High-efficiency video coding (HEVC) encryption has been proposed to encrypt syntax elements for the purpose of video encryption. To achieve high video security, to the best of our knowledge, almost all of the existing HEVC encryption algorithms mainly encrypt the whole video, such that the user without permissions cannot obtain any viewable information. However, these encryption algorithms cannot meet the needs of customers who need part of the information but not the full information in the video. In many cases, such as professional paid videos or video meetings, users would like to observe some visible information in the encrypted video of the original video to satisfy their requirements in daily life. Aiming at this demand, this paper proposes a multi-level encryption scheme that is composed of lightweight encryption, medium encryption and heavyweight encryption, where each encryption level can obtain a different amount of visual information. It is found that both encrypting the luma intraprediction model (IPM) and scrambling the syntax element of the DCT coefficient sign can achieve the performance of a distorted video in which there is still residual visual information, while encrypting both of them can implement the intensity of encryption and one cannot gain any visual information. The experimental results meet our expectations appropriately, indicating that there is a different amount of visual information in each encryption level. Meanwhile, users can flexibly choose the encryption level according to their various requirements.
In this paper, we propose a quality enhancement network of versatile video coding (VVC) compressed videos by jointly exploiting spatial details and temporal structure (SDTS). The proposed network consists of a temporal structure fusion subnet and a spatial detail enhancement subnet. The former subnet is used to estimate and compensate the temporal motion across frames, and the latter subnet is used to reduce the compression artifacts and enhance the reconstruction quality of compressed video. Experimental results demonstrate the effectiveness of our SDTS-based method.
High frame rates have been known to enhance the perceived visual quality of specific video content. However, the lack of investigation of high frame rates has restricted the expansion of this research field particularly in the context of full-high-definition (FHD) and 4K ultra-high-definition video formats. This study involves a subjective and objective quality assessment of compressed FHD videos. First, we compress the FHD videos by employing high-efficiency video coding, and VP9 at five quantization parameter levels for multiple frame rates, i.e., 15fps, 30fps, and 60fps. The FHD videos are obtained from a high frame-rate video database BVI-HFR, spanning various scenes, colors, and motions, and are shown to be representative of the BBC broadcast content. Second, a detailed subjective quality assessment of compressed videos for both encoders and individual frame rates is conducted, resulting in subjective measurements in the form of the differential mean opinion score reflecting the quality of experience. In particular, the aim is to investigate the impact of compression on the perceptual quality of compressed FHD videos and compare the performance of both encoders for each frame rate. Finally, 11 state-of-the-art objective quality assessment metrics are benchmarked using the subjective measurements, to investigate the correlation as a statistical evaluation between the two models in terms of correlation coefficients. A recommendation for enhancing the quality estimation of full-reference (FR) video quality measurements (VQMs) is presented after the extensive investigation.
Quality assessment of in-the-wild videos is a challenging problem because of the absence of reference videos and shooting distortions. Knowledge of the human visual system can help establish methods for objective quality assessment of in-the-wild videos. In this work, we show two eminent effects of the human visual system, namely, content-dependency and temporal-memory effects, could be used for this purpose. We propose an objective no-reference video quality assessment method by integrating both effects into a deep neural network. For content-dependency, we extract features from a pre-trained image classification neural network for its inherent content-aware property. For temporal-memory effects, long-term dependencies, especially the temporal hysteresis, are integrated into the network with a gated recurrent unit and a subjectively-inspired temporal pooling layer. To validate the performance of our method, experiments are conducted on three publicly available in-the-wild video quality assessment databases: KoNViD-1k, CVD2014, and LIVE-Qualcomm, respectively. Experimental results demonstrate that our proposed method outperforms five state-of-the-art methods by a large margin, specifically, 12.39%, 15.71%, 15.45%, and 18.09% overall performance improvements over the second-best method VBLIINDS, in terms of SROCC, KROCC, PLCC and RMSE, respectively. Moreover, the ablation study verifies the crucial role of both the content-aware features and the modeling of temporal-memory effects. The PyTorch implementation of our method is released at https://github.com/lidq92/VSFA.