
A Multiparametric Class of Low-complexity Transforms for Image and Video Coding

Added by Renato J. Cintra
Publication date: 2020
Language: English





Discrete transforms play an important role in many signal processing applications, and low-complexity alternatives to classical transforms have become popular in recent years. In particular, the discrete cosine transform (DCT) has proven convenient for data compression, being employed in well-known image and video coding standards such as JPEG, H.264, and High Efficiency Video Coding (HEVC). In this paper, we introduce a new class of low-complexity 8-point DCT approximations based on a series of works published by Bouguezel, Ahmad, and Swamy. We also derive a multiparametric fast algorithm that encompasses both known and novel transforms. We select the best-performing DCT approximations by solving a multicriteria optimization problem, and apply a scaling method to obtain larger transform sizes. We assess these DCT approximations in both JPEG-like image compression and video coding experiments. The optimal DCT approximations yield compelling results in terms of coding efficiency and image quality metrics, and require only a few addition and bit-shifting operations, making them suitable for low-complexity, low-power systems.
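
To make the flavor of such transforms concrete, here is a minimal sketch (not the specific class proposed in the paper) of a classical multiplierless 8-point DCT approximation obtained by rounding the scaled exact DCT matrix: every entry lands in {0, ±1}, so the transform core needs only additions and subtractions, and the orthonormalizing diagonal scaling can be absorbed into a codec's quantization step.

```python
import numpy as np

N = 8
n = np.arange(N)
# Exact 8-point DCT-II matrix: C[k, n] = a_k * cos((2n + 1) k pi / (2N))
C = np.sqrt(2.0 / N) * np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / (2 * N))
C[0, :] = 1.0 / np.sqrt(N)

# Low-complexity approximation: round the scaled exact matrix.
# All entries fall in {-1, 0, +1}, so T @ x uses additions/subtractions only.
T = np.round(2.0 * C)

# For this T, T @ T.T is diagonal, so a diagonal matrix D restores
# orthonormality; in a codec, D is folded into the quantization tables.
D = np.diag(1.0 / np.sqrt(np.diag(T @ T.T)))

x = np.random.randn(N)                     # any length-8 input block
print(np.max(np.abs(C @ x - D @ T @ x)))   # small residual: it is an approximation
```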



Related research

R. J. Cintra, 2020
Approximate methods have been considered as a means of evaluating discrete transforms. In this work, we propose and analyze a class of integer transforms for the discrete Fourier, Hartley, and cosine transforms (DFT, DHT, and DCT), based on simple dyadic rational approximation methods. The introduced method is general and applicable to several block lengths, whereas existing approaches are usually dedicated to specific transform sizes. The suggested approximate transforms enjoy low multiplicative complexity, and orthogonality is achievable via matrix polar decomposition. We show that the obtained transforms are competitive with established methods in the literature. New 8-point square-wave approximate transforms for the DFT, DHT, and DCT are also introduced as particular cases of the proposed methodology.
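
The two ingredients named above can be sketched in a few lines, assuming an 8-point DCT and a fixed dyadic precision of b = 3 bits (both assumptions for illustration): each entry of the exact matrix is snapped to the nearest dyadic rational m/2^b, which is implementable with shifts and adds, and orthogonality is then recovered from the orthogonal factor T(TᵀT)^(-1/2) of the polar decomposition.

```python
import numpy as np

def dct_matrix(N=8):
    """Exact DCT-II matrix."""
    n = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / (2 * N))
    C[0, :] = 1.0 / np.sqrt(N)
    return C

def dyadic_approx(C, b=3):
    """Snap each entry to the nearest dyadic rational m / 2**b (shift-and-add friendly)."""
    return np.round(C * 2**b) / 2**b

def polar_orthogonal_factor(T):
    """Orthogonal factor U = T (T^T T)^{-1/2} of the polar decomposition T = U P."""
    w, V = np.linalg.eigh(T.T @ T)          # T^T T is symmetric positive definite
    return T @ (V @ np.diag(1.0 / np.sqrt(w)) @ V.T)

C = dct_matrix()
T = dyadic_approx(C, b=3)                   # low multiplicative complexity
Q = polar_orthogonal_factor(T)              # exactly orthogonal, close to T and C
print(np.allclose(Q @ Q.T, np.eye(8)))      # True
print(np.abs(Q - C).max())                  # residual error against the exact DCT
```
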
Principal component analysis (PCA) is widely used for data decorrelation and dimensionality reduction. However, PCA may be impractical in real-time applications, or in situations where energy and computing constraints are severe. In this context, the discrete cosine transform (DCT) becomes a low-cost alternative for data decorrelation. This paper presents a method to derive computationally efficient approximations of the DCT. The proposed method minimizes the angle between the rows of the exact DCT matrix and the rows of the approximated transformation matrix. The resulting transformation matrices are orthogonal and have extremely low arithmetic complexity. Considering popular performance measures, one of the proposed transformation matrices outperforms the best competitors in both matrix error and coding capability. Practical applications in image and video coding demonstrate the relevance of the proposed transformation. In fact, we show that the proposed approximate DCT can outperform the exact DCT for image encoding at certain compression ratios. The proposed transform and its direct competitors are also physically realized as digital prototype circuits using FPGA technology.
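
A sketch of the stated objective (the minimization itself, i.e., the search over candidate low-complexity matrices, is not reproduced here): the figure of merit is the angle between each row of the exact DCT matrix and the corresponding row of a candidate approximation. The rounded candidate used below is just one example with entries in {0, ±1}.

```python
import numpy as np

def dct_matrix(N=8):
    n = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / (2 * N))
    C[0, :] = 1.0 / np.sqrt(N)
    return C

def row_angles(C, T):
    """Angles (radians) between corresponding rows of C and T."""
    Cn = C / np.linalg.norm(C, axis=1, keepdims=True)
    Tn = T / np.linalg.norm(T, axis=1, keepdims=True)
    cos = np.clip(np.sum(Cn * Tn, axis=1), -1.0, 1.0)
    return np.arccos(cos)

C = dct_matrix()
T = np.round(2.0 * C)              # one candidate with entries in {0, +/-1}
print(row_angles(C, T).sum())      # total angle: the quantity a search would minimize
```
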
We propose a very simple and efficient video compression framework that focuses solely on modeling the conditional entropy between frames. Unlike prior learning-based approaches, we reduce complexity by not performing any form of explicit transformation between frames, and assume each frame is encoded with an independent state-of-the-art deep image compressor. We first show that a simple architecture modeling the entropy between the image latent codes is as competitive as other neural video compression works and video codecs, while being much faster and easier to implement. We then propose a novel internal learning extension on top of this architecture that brings an additional 10% bitrate savings without sacrificing decoding speed. Importantly, our approach outperforms H.265 and other deep learning baselines in MS-SSIM on higher-bitrate UVG video, and outperforms all video codecs at lower frame rates, while decoding thousands of times faster than deep models that use an autoregressive entropy model.
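
As a toy illustration of the central quantity, the sketch below counts the bits needed to code a quantized latent under a hand-picked conditional model, a factorized discretized Gaussian centered on the previous frame's latent. In the paper this conditional distribution is learned by a network, so everything here (the Gaussian form, the scales, the synthetic latents) is an assumption for illustration only.

```python
import numpy as np
from scipy.stats import norm

def bits(y, loc, scale):
    """Bits to code integer-quantized latents y under a discretized
    Gaussian N(loc, scale), integrated over unit-width bins."""
    p = norm.cdf(y + 0.5, loc=loc, scale=scale) - norm.cdf(y - 0.5, loc=loc, scale=scale)
    return -np.log2(np.clip(p, 1e-12, 1.0)).sum()

rng = np.random.default_rng(0)
y_prev = np.round(rng.normal(0.0, 3.0, size=1000))          # previous frame's latent
y_t = np.round(y_prev + rng.normal(0.0, 1.0, size=1000))    # temporally correlated latent

print(bits(y_t, loc=y_prev, scale=1.0))   # conditional coding: far fewer bits
print(bits(y_t, loc=0.0, scale=3.0))      # unconditional baseline for comparison
```
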
Jian Yue, Yanbo Gao, Shuai Li, 2021
In-loop filtering is used in video coding to process the reconstructed frame and remove blocking artifacts. With the development of convolutional neural networks (CNNs), CNNs have been explored for in-loop filtering, since it can be treated as an image denoising task. However, besides being a distorted image, the reconstructed frame is also produced by a fixed pipeline of block-based encoding operations, so it carries coding-unit-based coding distortion with similar characteristics across blocks. Therefore, in this paper, we address the filtering problem from two aspects: global appearance restoration for disrupted texture, and restoration of the local coding distortion caused by the fixed coding pipeline. Accordingly, a three-stream global appearance and local coding distortion fusion network is developed, with a high-level global feature stream, a high-level local feature stream, and a low-level local feature stream. An ablation study validates the necessity of the different features, demonstrating that global and local features complement each other in filtering and achieve better performance when combined. To the best of our knowledge, we are the first to clearly characterize the video filtering process in terms of global appearance and local coding distortion restoration, with experimental verification, providing a clear pathway to developing filtering techniques. Experimental results demonstrate that the proposed method significantly outperforms existing single-frame methods, achieving 13.5%, 11.3%, and 11.7% BD-rate savings on average for the AI, LDP, and RA configurations, respectively, compared with the HEVC reference software.
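
A schematic sketch of a three-stream layout of this kind follows; the stream definitions, channel counts, and fusion head are illustrative placeholders, not the authors' network.

```python
import torch
import torch.nn as nn

class ThreeStreamFilter(nn.Module):
    """Illustrative three-stream in-loop filter: a global appearance stream,
    a high-level local stream, and a low-level local stream, fused into a
    residual that is added to the reconstructed frame."""
    def __init__(self, ch=32):
        super().__init__()
        self.global_stream = nn.Sequential(   # wider receptive field via striding
            nn.Conv2d(1, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
        )
        self.local_high = nn.Sequential(      # high-level local features
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
        )
        self.local_low = nn.Conv2d(1, ch, 3, padding=1)  # low-level local detail
        self.fuse = nn.Conv2d(3 * ch, 1, 3, padding=1)

    def forward(self, rec):
        f = torch.cat([self.global_stream(rec),
                       self.local_high(rec),
                       self.local_low(rec)], dim=1)
        return rec + self.fuse(f)             # predict a correction residual

x = torch.randn(1, 1, 64, 64)                 # a reconstructed luma block
print(ThreeStreamFilter()(x).shape)           # torch.Size([1, 1, 64, 64])
```
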
Shoou-I Yu, Yi Yang, Zhongwen Xu, 2016
The large number of user-generated videos uploaded to the Internet every day has led to many commercial video search engines, which mainly rely on text metadata for search. However, metadata is often lacking for user-generated videos, leaving them unsearchable by current engines. Content-based video retrieval (CBVR) tackles this metadata-scarcity problem by directly analyzing the visual and audio streams of each video. CBVR encompasses multiple research topics, including low-level feature design, feature fusion, semantic detector training, and video search/reranking. We present novel strategies in these topics to enhance CBVR in both accuracy and speed under different query inputs, including purely textual queries and query by video example. Our proposed strategies were incorporated into our submission for the TRECVID 2014 Multimedia Event Detection evaluation, where our system outperformed the other submissions on both text queries and video example queries, demonstrating the effectiveness of the proposed approaches.
