New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Sub-sampled Cross-component Prediction for Emerging Video Coding Standards

68 0 0.0 ( 0 )

Download Cite

Added by Meng Wang

Publication date 2020

fields Informatics Engineering Electronic Engineering

and research's language is English

Authors Junru Li - Meng Wang - Li Zhang

Multimedia Image and Video Processing

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Cross-component linear model (CCLM) prediction has been repeatedly proven to be effective in reducing the inter-channel redundancies in video compression. Essentially speaking, the linear model is identically trained by employing accessible luma and chroma reference samples at both encoder and decoder, elevating the level of operational complexity due to the least square regression or max-min based model parameter derivation. In this paper, we investigate the capability of the linear model in the context of sub-sampled based cross-component correlation mining, as a means of significantly releasing the operation burden and facilitating the hardware and software design for both encoder and decoder. In particular, the sub-sampling ratios and positions are elaborately designed by exploiting the spatial correlation and the inter-channel correlation. Extensive experiments verify that the proposed method is characterized by its simplicity in operation and robustness in terms of rate-distortion performance, leading to the adoption by Versatile Video Coding (VVC) standard and the third generation of Audio Video Coding Standard (AVS3).

rate research

An Overview of Emerging Technologies for High Efficiency 3D Video Coding

118 - Qifei Wang 2015

3D video coding is one of the most popular research area in multimedia. This paper reviews the recent progress of the coding technologies for multiview video (MVV) and free view-point video (FVV) which is represented by MVV and depth maps. We first discuss the traditional multiview video coding (MVC) framework with different prediction structures. The rate-distortion performance and the view switching delay of the three main coding prediction structures are analyzed. We further introduce the joint coding technologies for MVV and depth maps and evaluate the rate-distortion performance of them. The scalable 3D video coding technologies are reviewed by the quality and view scalability, respectively. Finally, we summarize the bit allocation work of 3D video coding. This paper also points out some future research problems in high efficiency 3D video coding such as the view switching latency optimization in coding structure and bit allocation.

Multimedia

Deep Learning-Based Video Coding: A Review and A Case Study

88 - Dong Liu , Yue Li , Jianping Lin 2019

The past decade has witnessed great success of deep learning technology in many disciplines, especially in computer vision and image processing. However, deep learning-based video coding remains in its infancy. This paper reviews the representative works about using deep learning for image/video coding, which has been an actively developing research area since the year of 2015. We divide the related works into two categories: new coding schemes that are built primarily upon deep networks (deep schemes), and deep network-based coding tools (deep tools) that shall be used within traditional coding schemes or together with traditional coding tools. For deep schemes, pixel probability modeling and auto-encoder are the two approaches, that can be viewed as predictive coding scheme and transform coding scheme, respectively. For deep tools, there have been several proposed techniques using deep learning to perform intra-picture prediction, inter-picture prediction, cross-channel prediction, probability distribution prediction, transform, post- or in-loop filtering, down- and up-sampling, as well as encoding optimizations. In the hope of advocating the research of deep learning-based video coding, we present a case study of our developed prototype video codec, namely Deep Learning Video Coding (DLVC). DLVC features two deep tools that are both based on convolutional neural network (CNN), namely CNN-based in-loop filter (CNN-ILF) and CNN-based block adaptive resolution coding (CNN-BARC). Both tools help improve the compression efficiency by a significant margin. With the two deep tools as well as other non-deep coding tools, DLVC is able to achieve on average 39.6% and 33.0% bits saving than HEVC, under random-access and low-delay configurations, respectively. The source code of DLVC has been released for future researches.

Multimedia Image and Video Processing

Optimal Lagrange Multipliers for Dependent Rate Allocation in Video Coding

54 - Ana De Abreu , Gene Cheung , Pascal Frossard 2016

In a typical video rate allocation problem, the objective is to optimally distribute a source rate budget among a set of (in)dependently coded data units to minimize the total distortion of all units. Conventional Lagrangian approaches convert the lone rate constraint to a linear rate penalty scaled by a multiplier in the objective, resulting in a simpler unconstrained formulation. However, the search for the optimal multiplier, one that results in a distortion-minimizing solution among all Lagrangian solutions that satisfy the original rate constraint, remains an elusive open problem in the general setting. To address this problem, we propose a computation-efficient search strategy to identify this optimal multiplier numerically. Specifically, we first formulate a general rate allocation problem where each data unit can be dependently coded at different quantization parameters (QP) using a previous unit as predictor, or left uncoded at the encoder and subsequently interpolated at the decoder using neighboring coded units. After converting the original rate constrained problem to the unconstrained Lagrangian counterpart, we design an efficient dynamic programming (DP) algorithm that finds the optimal Lagrangian solution for a fixed multiplier. Finally, within the DP framework, we iteratively compute neighboring singular multiplier values, each resulting in multiple simultaneously optimal Lagrangian solutions, to drive the rates of the computed Lagrangian solutions towards the bit budget. We terminate when a singular multiplier value results in two Lagrangian solutions with rates below and above the bit budget. In extensive monoview and multiview video coding experiments, we show that our DP algorithm and selection of optimal multipliers on average outperform comparable rate control solutions used in video compression standards such as HEVC that do not skip frames in Y-PSNR.

Multimedia

Spatial-Temporal Residue Network Based In-Loop Filter for Video Coding

69 - Chuanmin Jia , Shiqi Wang , Xinfeng Zhang 2017

Deep learning has demonstrated tremendous break through in the area of image/video processing. In this paper, a spatial-temporal residue network (STResNet) based in-loop filter is proposed to suppress visual artifacts such as blocking, ringing in video coding. Specifically, the spatial and temporal information is jointly exploited by taking both current block and co-located block in reference frame into consideration during the processing of in-loop filter. The architecture of STResNet only consists of four convolution layers which shows hospitality to memory and coding complexity. Moreover, to fully adapt the input content and improve the performance of the proposed in-loop filter, coding tree unit (CTU) level control flag is applied in the sense of rate-distortion optimization. Extensive experimental results show that our scheme provides up to 5.1% bit-rate reduction compared to the state-of-the-art video coding standard.

Multimedia

Probabilistic Tile Visibility-Based Server-Side Rate Adaptation for Adaptive 360-Degree Video Streaming

80 - Junni Zou , Chenglin Li , Chengming Liu 2019

In this paper, we study the server-side rate adaptation problem for streaming tile-based adaptive 360-degree videos to multiple users who are competing for transmission resources at the network bottleneck. Specifically, we develop a convolutional neural network (CNN)-based viewpoint prediction model to capture the nonlinear relationship between the future and historical viewpoints. A Laplace distribution model is utilized to characterize the probability distribution of the prediction error. Given the predicted viewpoint, we then map the viewport in the spherical space into its corresponding planar projection in the 2-D plane, and further derive the visibility probability of each tile based on the planar projection and the prediction error probability. According to the visibility probability, tiles are classified as viewport, marginal and invisible tiles. The server-side tile rate allocation problem for multiple users is then formulated as a non-linear discrete optimization problem to minimize the overall received video distortion of all users and the quality difference between the viewport and marginal tiles of each user, subject to the transmission capacity constraints and users specific viewport requirements. We develop a steepest descent algorithm to solve this non-linear discrete optimization problem, by initializing the feasible starting point in accordance with the optimal solution of its continuous relaxation. Extensive experimental results show that the proposed algorithm can achieve a near-optimal solution, and outperforms the existing rate adaptation schemes for tile-based adaptive 360-video streaming.

Multimedia Image and Video Processing

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Sub-sampled Cross-component Prediction for Emerging Video Coding Standards

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions