Do you want to publish a course? Click here

Visual Analysis Motivated Rate-Distortion Model for Image Coding

173   0   0.0 ( 0 )
 Added by Zhimeng Huang
 Publication date 2021
and research's language is English




Ask ChatGPT about the research

Optimized for pixel fidelity metrics, images compressed by existing image codec are facing systematic challenges when used for visual analysis tasks, especially under low-bitrate coding. This paper proposes a visual analysis-motivated rate-distortion model for Versatile Video Coding (VVC) intra compression. The proposed model has two major contributions, a novel rate allocation strategy and a new distortion measurement model. We first propose the region of interest for machine (ROIM) to evaluate the degree of importance for each coding tree unit (CTU) in visual analysis. Then, a novel CTU-level bit allocation model is proposed based on ROIM and the local texture characteristics of each CTU. After an in-depth analysis of multiple distortion models, a visual analysis friendly distortion criteria is subsequently proposed by extracting deep feature of each coding unit (CU). To alleviate the problem of lacking spatial context information when calculating the distortion of each CU, we finally propose a multi-scale feature distortion (MSFD) metric using different neighboring pixels by weighting the extracted deep features in each scale. Extensive experimental results show that the proposed scheme could achieve up to 28.17% bitrate saving under the same analysis performance among several typical visual analysis tasks such as image classification, object detection, and semantic segmentation.



rate research

Read More

End-to-end optimization capability offers neural image compression (NIC) superior lossy compression performance. However, distinct models are required to be trained to reach different points in the rate-distortion (R-D) space. In this paper, we consider the problem of R-D characteristic analysis and modeling for NIC. We make efforts to formulate the essential mathematical functions to describe the R-D behavior of NIC using deep network and statistical modeling. Thus continuous bit-rate points could be elegantly realized by leveraging such model via a single trained network. In this regard, we propose a plugin-in module to learn the relationship between the target bit-rate and the binary representation for the latent variable of auto-encoder. Furthermore, we model the rate and distortion characteristic of NIC as a function of the coding parameter $lambda$ respectively. Our experiments show our proposed method is easy to adopt and obtains competitive coding performance with fixed-rate coding approaches, which would benefit the practical deployment of NIC. In addition, the proposed model could be applied to NIC rate control with limited bit-rate error using a single network.
642 - Zhengfang Duanmu 2019
Rate-distortion (RD) theory is at the heart of lossy data compression. Here we aim to model the generalized RD (GRD) trade-off between the visual quality of a compressed video and its encoding profiles (e.g., bitrate and spatial resolution). We first define the theoretical functional space $mathcal{W}$ of the GRD function by analyzing its mathematical properties.We show that $mathcal{W}$ is a convex set in a Hilbert space, inspiring a computational model of the GRD function, and a method of estimating model parameters from sparse measurements. To demonstrate the feasibility of our idea, we collect a large-scale database of real-world GRD functions, which turn out to live in a low-dimensional subspace of $mathcal{W}$. Combining the GRD reconstruction framework and the learned low-dimensional space, we create a low-parameter eigen GRD method to accurately estimate the GRD function of a source video content from only a few queries. Experimental results on the database show that the learned GRD method significantly outperforms state-of-the-art empirical RD estimation methods both in accuracy and efficiency. Last, we demonstrate the promise of the proposed model in video codec comparison.
102 - Jian Yue , Yanbo Gao , Shuai Li 2021
In-loop filtering is used in video coding to process the reconstructed frame in order to remove blocking artifacts. With the development of convolutional neural networks (CNNs), CNNs have been explored for in-loop filtering considering it can be treated as an image de-noising task. However, in addition to being a distorted image, the reconstructed frame is also obtained by a fixed line of block based encoding operations in video coding. It carries coding-unit based coding distortion of some similar characteristics. Therefore, in this paper, we address the filtering problem from two aspects, global appearance restoration for disrupted texture and local coding distortion restoration caused by fixed pipeline of coding. Accordingly, a three-stream global appearance and local coding distortion based fusion network is developed with a high-level global feature stream, a high-level local feature stream and a low-level local feature stream. Ablation study is conducted to validate the necessity of different features, demonstrating that the global features and local features can complement each other in filtering and achieve better performance when combined. To the best of our knowledge, we are the first one that clearly characterizes the video filtering process from the above global appearance and local coding distortion restoration aspects with experimental verification, providing a clear pathway to developing filter techniques. Experimental results demonstrate that the proposed method significantly outperforms the existing single-frame based methods and achieves 13.5%, 11.3%, 11.7% BD-Rate saving on average for AI, LDP and RA configurations, respectively, compared with the HEVC reference software.
Today, according to the Cisco Annual Internet Report (2018-2023), the fastest-growing category of Internet traffic is machine-to-machine communication. In particular, machine-to-machine communication of images and videos represents a new challenge and opens up new perspectives in the context of data compression. One possible solution approach consists of adapting current human-targeted image and video coding standards to the use case of machine consumption. Another approach consists of developing completely new compression paradigms and architectures for machine-to-machine communications. In this paper, we focus on image compression and present an inference-time content-adaptive finetuning scheme that optimizes the latent representation of an end-to-end learned image codec, aimed at improving the compression efficiency for machine-consumption. The conducted experiments show that our online finetuning brings an average bitrate saving (BD-rate) of -3.66% with respect to our pretrained image codec. In particular, at low bitrate points, our proposed method results in a significant bitrate saving of -9.85%. Overall, our pretrained-and-then-finetuned system achieves -30.54% BD-rate over the state-of-the-art image/video codec Versatile Video Coding (VVC).
Handling digital images is almost always accompanied by a lossy compression in order to facilitate efficient transmission and storage. This introduces an unavoidable tension between the allocated bit-budget (rate) and the faithfulness of the resulting image to the original one (distortion). An additional complicating consideration is the effect of the compression on recognition performance by given classifiers (accuracy). This work aims to explore this rate-distortion-accuracy tradeoff. As a case study, we focus on the design of the quantization tables in the JPEG compression standard. We offer a novel optimal tuning of these tables via continuous optimization, leveraging a differential implementation of both the JPEG encoder-decoder and an entropy estimator. This enables us to offer a unified framework that considers the interplay between rate, distortion and classification accuracy. In all these fronts, we report a substantial boost in performance by a simple and easily implemented modification of these tables.
comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا