3D video coding is one of the most active research areas in multimedia. This paper reviews recent progress in coding technologies for multiview video (MVV) and for free-viewpoint video (FVV), which is represented by MVV plus depth maps. We first discuss the traditional multiview video coding (MVC) framework with different prediction structures and analyze the rate-distortion performance and view-switching delay of the three main prediction structures. We then introduce joint coding technologies for MVV and depth maps and evaluate their rate-distortion performance. Scalable 3D video coding technologies are reviewed with respect to quality scalability and view scalability, respectively. Finally, we summarize work on bit allocation for 3D video coding. The paper also points out open research problems in high-efficiency 3D video coding, such as optimizing view-switching latency in both the coding structure and the bit allocation.
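To make the view-switching trade-off concrete, the sketch below (not taken from the paper; the two prediction structures, view labels, and GOP length are illustrative assumptions) counts how many frames must be decoded before a target frame becomes displayable, for a simulcast structure versus an inter-view prediction structure.

```python
# Illustrative sketch: view-switching cost measured as the number of frames
# that must be decoded (transitive reference closure) before a target frame
# is displayable. The prediction structures are simplified assumptions.

def switching_cost(deps, target):
    """Count all frames reachable through reference dependencies from `target`."""
    seen, stack = set(), [target]
    while stack:
        frame = stack.pop()
        for ref in deps.get(frame, []):
            if ref not in seen:
                seen.add(ref)
                stack.append(ref)
    return len(seen)

# Two views (V0, V1), four frames each, IPPP in time.
simulcast = {                      # each view predicted only from its own past
    ("V1", 3): [("V1", 2)], ("V1", 2): [("V1", 1)], ("V1", 1): [("V1", 0)],
}
inter_view = {                     # V1 additionally predicted from base view V0
    ("V1", 3): [("V1", 2), ("V0", 3)], ("V1", 2): [("V1", 1), ("V0", 2)],
    ("V1", 1): [("V1", 0), ("V0", 1)], ("V1", 0): [("V0", 0)],
    ("V0", 3): [("V0", 2)], ("V0", 2): [("V0", 1)], ("V0", 1): [("V0", 0)],
}

print(switching_cost(simulcast, ("V1", 3)))   # 3 prior frames of V1 only
print(switching_cost(inter_view, ("V1", 3)))  # 7 frames: prior V1 frames plus base view
```

The example shows the trade-off discussed in the survey: inter-view prediction improves rate-distortion performance, but switching into a dependent view forces the decoder to reconstruct the base-view frames as well, increasing the switching delay.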
Cross-component linear model (CCLM) prediction has repeatedly been shown to be effective in reducing inter-channel redundancy in video compression. The linear model is trained identically at the encoder and the decoder from the accessible luma and chroma reference samples, which raises the operational complexity because the model parameters are derived by least-squares regression or a max-min rule. In this paper, we investigate the capability of the linear model in the context of sub-sampling based cross-component correlation mining, as a means of substantially relieving this operational burden and facilitating hardware and software design for both encoder and decoder. In particular, the sub-sampling ratios and positions are carefully designed by exploiting the spatial correlation and the inter-channel correlation. Extensive experiments verify that the proposed method is simple in operation and robust in terms of rate-distortion performance, which led to its adoption in the Versatile Video Coding (VVC) standard and the third generation of the Audio Video Coding Standard (AVS3).
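As a concrete illustration (a minimal sketch, not the normative VVC or AVS3 derivation; the stride of 2 and the sample layout are assumptions), the snippet below derives the linear model pred_C = alpha * rec_L + beta from a sub-sampled set of reconstructed luma/chroma reference pairs using a max-min rule, then applies it to the down-sampled luma block.

```python
import numpy as np

def cclm_predict(ref_luma, ref_chroma, luma_block, step=2):
    """Sketch of sub-sampled CCLM prediction.

    ref_luma, ref_chroma : 1-D arrays of co-located reconstructed reference
                           samples (already down-sampled to chroma resolution).
    luma_block           : down-sampled reconstructed luma block to predict from.
    step                 : sub-sampling stride applied to the reference samples,
                           which is what reduces the parameter-derivation cost.
    """
    # Use only every `step`-th reference pair for model derivation.
    l = np.asarray(ref_luma, dtype=np.float64)[::step]
    c = np.asarray(ref_chroma, dtype=np.float64)[::step]

    # Max-min rule: fit the line through the pairs with the largest and
    # smallest luma values, avoiding a least-squares solve.
    i_max, i_min = np.argmax(l), np.argmin(l)
    if l[i_max] == l[i_min]:
        alpha, beta = 0.0, float(c.mean())   # degenerate case: flat reference
    else:
        alpha = (c[i_max] - c[i_min]) / (l[i_max] - l[i_min])
        beta = c[i_min] - alpha * l[i_min]

    # Apply the linear model to the co-located luma block.
    return alpha * np.asarray(luma_block, dtype=np.float64) + beta
```

In the actual standards the reference samples come from the above and left neighboring lines and the arithmetic is integer-only; the floating-point version here is only meant to show where sub-sampling enters the parameter derivation.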
In a typical video rate allocation problem, the objective is to optimally distribute a source rate budget among a set of (in)dependently coded data units so as to minimize the total distortion of all units. Conventional Lagrangian approaches convert the lone rate constraint into a linear rate penalty scaled by a multiplier in the objective, resulting in a simpler unconstrained formulation. However, the search for the optimal multiplier, one that yields the distortion-minimizing solution among all Lagrangian solutions satisfying the original rate constraint, remains an elusive open problem in the general setting. To address this problem, we propose a computationally efficient search strategy to identify this optimal multiplier numerically. Specifically, we first formulate a general rate allocation problem where each data unit can be dependently coded at different quantization parameters (QP) using a previous unit as predictor, or left uncoded at the encoder and subsequently interpolated at the decoder using neighboring coded units. After converting the original rate-constrained problem to its unconstrained Lagrangian counterpart, we design an efficient dynamic programming (DP) algorithm that finds the optimal Lagrangian solution for a fixed multiplier. Finally, within the DP framework, we iteratively compute neighboring singular multiplier values, each yielding multiple simultaneously optimal Lagrangian solutions, to drive the rates of the computed Lagrangian solutions towards the bit budget. We terminate when a singular multiplier value results in two Lagrangian solutions with rates below and above the bit budget. In extensive monoview and multiview video coding experiments, we show that our DP algorithm and optimal multiplier selection on average outperform, in Y-PSNR, comparable rate control solutions that do not skip frames, as used in video compression standards such as HEVC.
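The sketch below illustrates only the outer multiplier search (it is not the paper's singular-multiplier iteration): the inner solver that returns a Lagrangian-optimal (rate, distortion) pair for a fixed multiplier is abstracted away, whereas in the paper that role is played by the DP algorithm. The toy two-unit rate-distortion curves are invented for demonstration.

```python
def multiplier_search(solve, budget, lo=1e-6, hi=1e6, iters=60):
    """Find a multiplier whose Lagrangian solution meets the rate budget.

    solve(lmbda) -> (rate, dist) of the solution minimizing dist + lmbda * rate.
    A larger lmbda penalizes rate more, so the returned rate is non-increasing
    in lmbda, which makes a bisection search applicable.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        rate, _ = solve(mid)
        if rate > budget:
            lo = mid          # solution too expensive: increase the rate penalty
        else:
            hi = mid          # under budget: try a smaller penalty
    return hi                 # smallest tested multiplier meeting the budget

# Toy stand-in for the inner solver: two independent units, each choosing the
# operating point minimizing D + lambda * R from its own RD curve.
RD_POINTS = [
    [(10.0, 1.0), (6.0, 2.5), (3.0, 5.0)],   # unit 0: (distortion, rate)
    [(8.0, 1.5), (5.0, 3.0), (2.0, 6.0)],    # unit 1
]

def toy_solve(lmbda):
    rate = dist = 0.0
    for points in RD_POINTS:
        d, r = min(points, key=lambda p: p[0] + lmbda * p[1])
        dist, rate = dist + d, rate + r
    return rate, dist

best_lambda = multiplier_search(toy_solve, budget=6.0)
print(best_lambda, toy_solve(best_lambda))
```

Because the achievable rates are discrete, the bracketing behavior described in the abstract appears here as the bisection converging to a multiplier value at which the chosen operating points jump from above to below the bit budget.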
Deep learning has produced tremendous breakthroughs in image and video processing. In this paper, an in-loop filter based on a spatial-temporal residue network (STResNet) is proposed to suppress visual artifacts such as blocking and ringing in video coding. Specifically, spatial and temporal information is jointly exploited by feeding both the current block and its co-located block in the reference frame into the in-loop filter. The STResNet architecture consists of only four convolutional layers, keeping its memory footprint and coding complexity low. Moreover, to fully adapt to the input content and improve the performance of the proposed in-loop filter, a coding tree unit (CTU) level control flag is applied under rate-distortion optimization. Extensive experimental results show that our scheme provides up to 5.1% bit-rate reduction compared with the state-of-the-art video coding standard.
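A minimal sketch of this kind of filter is given below, assuming PyTorch and assuming the current and co-located blocks are stacked as input channels; the layer widths, kernel sizes, and residual connection are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class STResNetSketch(nn.Module):
    """Illustrative 4-layer residual in-loop filter (layer widths assumed).

    Input: the reconstructed current block and its co-located block from a
    reference frame, stacked as two channels. Output: the filtered current
    block, formed as current block + learned residue.
    """
    def __init__(self, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(2, channels, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
        )

    def forward(self, current, colocated):
        x = torch.cat([current, colocated], dim=1)   # N x 2 x H x W
        return current + self.body(x)                # residual learning

def ctu_filter_flag(original, filtered, reconstructed):
    """CTU-level on/off decision (sketch): since a one-bit flag is signaled
    either way, the choice reduces to comparing distortions to the original."""
    d_on = torch.mean((original - filtered) ** 2)
    d_off = torch.mean((original - reconstructed) ** 2)
    return bool(d_on < d_off)
```

The residual formulation means the network only has to learn the compression artifacts to subtract, which is part of why such a shallow four-layer model can remain competitive at low complexity.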
We report results from a measurement study of three video streaming services (YouTube, Dailymotion, and Vimeo) on six different smartphones. We measure and analyze the traffic and energy consumption when streaming videos of different qualities over Wi-Fi and 3G. We identify five different techniques used to deliver the video and show that the choice of technique depends on the device, player, quality, and service. The energy consumption varies dramatically between devices, services, and video qualities depending on the streaming technique used. Based on these findings, we offer suggestions for improving the energy efficiency of mobile video streaming services.
In a distributed storage system, code symbols are dispersed across space, in nodes or storage units, rather than across time. In settings such as a large data center, an important consideration is the efficient repair of a failed node. Efficient repair calls for erasure codes that, in the face of node failure, minimize the amount of repair data transferred over the network, the amount of data accessed at each helper node, and the number of helper nodes contacted. Coding theory has evolved to handle these challenges by introducing two new classes of erasure codes, namely regenerating codes and locally recoverable codes, as well as by devising novel ways to repair the ubiquitous Reed-Solomon code. This survey provides an overview of the efforts in this direction over the past decade.
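As a toy illustration of the locality idea (not from the survey; the code layout and group size are assumptions): if each small group of data nodes stores one XOR parity, a single failed node can be repaired by contacting only the other nodes in its group, rather than k nodes as in a classical MDS or Reed-Solomon repair.

```python
# Toy local-repair illustration (layout assumed, not from the survey):
# data blocks are split into small groups, each protected by one XOR parity.
# Repairing one failed block then needs only the other blocks of its group.

def xor_bytes(blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

def encode_local_groups(data_blocks, group_size=2):
    """Append one XOR parity per group of `group_size` data blocks."""
    groups = [data_blocks[i:i + group_size]
              for i in range(0, len(data_blocks), group_size)]
    return [group + [xor_bytes(group)] for group in groups]

def repair(group, failed_index):
    """Recover one lost block from the remaining blocks of its group."""
    helpers = [b for i, b in enumerate(group) if i != failed_index]
    return xor_bytes(helpers)

data = [b"node-A--", b"node-B--", b"node-C--", b"node-D--"]
groups = encode_local_groups(data)        # 2 groups of (2 data + 1 parity)
lost = groups[0][1]                       # pretend the node holding B fails
assert repair(groups[0], 1) == lost       # 2 helpers, instead of k=4 for MDS
print("repaired:", repair(groups[0], 1))
```

The price of this locality is extra storage overhead and weaker erasure-correction guarantees than an MDS code of the same length, which is precisely the trade-off space that regenerating and locally recoverable codes explore.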