A practical convolutional neural network as loop filter for intra frame


Published by Xiaodan Song

Publication date: 2018

Research field: Informatics Engineering

Paper language: English

Authors: Xiaodan Song - Jiabao Yao - Lulu Zhou

Multimedia


Abstract

Loop filters are used in video coding to remove artifacts or improve coding performance. Recent work that replaces traditional loop filters with a convolutional neural network (CNN) shows large gains but raises problems for practical application. First, a separate model is used for each quantization parameter (QP) with which frames are encoded, which is expensive in hardware. Second, floating-point operations in the CNN lead to inconsistency between encoding and decoding across different platforms. Third, redundancy within the CNN model consumes precious computational resources. This paper proposes a CNN as the loop filter for intra frames, together with a scheme that addresses these problems. The aim is a single, low-redundancy CNN model that adapts to decoded frames of different qualities and ensures consistency. To adapt to reconstructions of different qualities, both the reconstruction and the QP are taken as inputs. After training, the obtained model is compressed to reduce redundancy. To ensure consistency, dynamic fixed point (DFP) arithmetic is adopted when testing the CNN: parameters of the compressed model are first quantized to DFP and then used for inference, and the outputs of each layer are computed with DFP operations. Experimental results on JEM 7.0 report BD-rate savings of 3.14%, 5.21%, and 6.28% for the luma and two chroma components, respectively, under the all-intra configuration when replacing all traditional filters.
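The DFP idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the word length or how fractional lengths are assigned, so the 8-bit signed word, the per-group fractional length, and all function names below are assumptions made for illustration, not the authors' implementation. The point is that once weights and activations are integers, the multiply-accumulate is exact and gives the same result on every platform.

```python
import math

def choose_frac_bits(values, word_bits=8):
    """Pick a fractional length so the largest magnitude fits a signed word.

    Assumption: one fractional length is shared by a whole group of values
    (for example, all weights of one layer).
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return word_bits - 1
    int_bits = math.floor(math.log2(max_abs)) + 1  # bits for the integer part
    return word_bits - 1 - int_bits                # one bit is the sign

def quantize(values, frac_bits, word_bits=8):
    """Scale by 2**frac_bits, round, and clip to the signed word range."""
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    scale = 2.0 ** frac_bits
    return [min(hi, max(lo, round(v * scale))) for v in values]

def fixed_dot(x_q, w_q, x_frac, w_frac, out_frac):
    """Integer-only dot product, rescaled to out_frac fractional bits."""
    acc = sum(a * b for a, b in zip(x_q, w_q))  # exact integer accumulation
    shift = x_frac + w_frac - out_frac
    return acc >> shift if shift >= 0 else acc << -shift

weights = [0.5, -0.25, 0.75]
inputs = [1.0, 2.0, -1.0]
w_frac = choose_frac_bits(weights)   # 7 fractional bits
x_frac = choose_frac_bits(inputs)    # 5 fractional bits
w_q = quantize(weights, w_frac)
x_q = quantize(inputs, x_frac)
y_q = fixed_dot(x_q, w_q, x_frac, w_frac, out_frac=5)
print(y_q / 2 ** 5)  # floating-point view of the fixed-point result
```

In this sketch the fractional length adapts to the dynamic range of each group, which is what distinguishes dynamic fixed point from a single global fixed-point format.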


107 - Hengyu Man, Xiaopeng Fan, Ruiqin Xiong 2021

As a crucial part of video compression, intra prediction utilizes local information of images to eliminate redundancy in the spatial domain. In both H.265/HEVC and H.266/VVC, multiple directional prediction modes are employed to find the texture trend of each small block, and the prediction is then made based on reference samples in the selected direction. Recently, intra prediction schemes based on neural networks have achieved great success. In these methods, the networks are trained and applied to intra prediction in addition to the directional prediction modes. In this paper, we propose a novel data clustering-driven neural network (dubbed DCDNN) for intra prediction, which can learn deep features of the clustered data. In DCDNN, each network can be split into two networks by adding or subtracting Gaussian random noise. A data clustering-driven training is then applied to train all the derived networks recursively. In each iteration, the entire training dataset is partitioned according to the recovery qualities of the derived networks. For the experiments, DCDNN is implemented in the HEVC reference software HM-16.9. The experimental results demonstrate that DCDNN can reach an average Bjontegaard distortion rate (BD-rate) improvement of 4.2% (up to 7.0%) over HEVC with the all-intra configuration. Compared with existing fully connected network-based intra prediction methods, the bitrate saving performance is further improved.

Multimedia

Spatial-Temporal Residue Network Based In-Loop Filter for Video Coding

69 - Chuanmin Jia, Shiqi Wang, Xinfeng Zhang 2017

Deep learning has demonstrated tremendous breakthroughs in the area of image/video processing. In this paper, a spatial-temporal residue network (STResNet) based in-loop filter is proposed to suppress visual artifacts such as blocking and ringing in video coding. Specifically, the spatial and temporal information is jointly exploited by taking both the current block and the co-located block in the reference frame into consideration during in-loop filtering. The architecture of STResNet consists of only four convolution layers, which is friendly to memory and coding complexity. Moreover, to fully adapt to the input content and improve the performance of the proposed in-loop filter, a coding tree unit (CTU) level control flag is applied in the sense of rate-distortion optimization. Extensive experimental results show that our scheme provides up to 5.1% bit-rate reduction compared to the state-of-the-art video coding standard.
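A CTU-level control flag chosen by rate-distortion optimization can be sketched as follows. The abstract does not say how distortion or flag rate is measured, so this sketch assumes sum of squared errors as distortion, a Lagrangian cost J = D + lambda * R, and a fixed one-bit flag cost; the function names and the flat-list CTU representation are hypothetical.

```python
def sse(a, b):
    """Sum of squared errors between two equally sized sample lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def choose_ctu_flags(original, unfiltered, filtered,
                     lam=1.0, rate_on=1, rate_off=1):
    """Return one boolean per CTU: True means the CNN filter is enabled.

    Each argument is a list of CTUs; each CTU is a flat list of samples.
    rate_on / rate_off are the assumed bit costs of signalling the flag.
    """
    flags = []
    for org, rec, flt in zip(original, unfiltered, filtered):
        cost_off = sse(org, rec) + lam * rate_off  # keep unfiltered CTU
        cost_on = sse(org, flt) + lam * rate_on    # use filtered CTU
        flags.append(cost_on < cost_off)
    return flags

original = [[10, 10], [20, 20]]
unfiltered = [[12, 12], [20, 20]]
filtered = [[10, 11], [22, 22]]
flags = choose_ctu_flags(original, unfiltered, filtered)
print(flags)
```

With equal flag costs the decision reduces to a distortion comparison; an entropy-coded flag would make the two rates differ and bring lambda into play.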

Multimedia

Learning a Representation for Cover Song Identification Using Convolutional Neural Network

71 - Zhesong Yu, Xiaoshuo Xu, Xiaoou Chen 2019

Cover song identification represents a challenging task in the field of Music Information Retrieval (MIR) due to complex musical variations between query tracks and cov

Multimedia, Machine Learning, Computer Sound Systems

A Novel Convolutional Neural Network for Image Steganalysis with Shared Normalization

161 - Songtao Wu, Sheng-hua Zhong, 2017

Deep learning based image steganalysis has attracted increasing attention in recent years. Several Convolutional Neural Network (CNN) models have been proposed and have achieved state-of-the-art performance in detecting steganography. In this paper, we explore an important technique in deep learning, batch normalization, for the task of image steganalysis. Unlike natural image classification, steganalysis must discriminate between cover images and stego images, the latter being the result of adding weak stego signals to covers. This characteristic makes a cover image more statistically similar to its stego than to other cover images, requiring steganalytic methods to use paired learning to extract effective features. Our theoretical analysis shows that a CNN model with multiple normalization layers is hard to generalize to new data in the test set when it is well trained with paired learning. To handle this difficulty, we propose a novel normalization technique called Shared Normalization (SN). Unlike the batch normalization layer, which uses the mini-batch mean and standard deviation to normalize each input batch, SN shares the same statistics across all training and test batches. Based on the proposed SN layer, we further propose a novel neural network model for image steganalysis. Extensive experiments demonstrate that the proposed network with SN layers is stable and can detect state-of-the-art steganography with better performance than previous methods.
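The contrast between batch normalization and shared statistics can be shown with a small sketch. The abstract does not say how the shared statistics are obtained, so the fixed `shared_mean` and `shared_std` values below are placeholders, and both function names are hypothetical. The sketch only shows the property the abstract describes: with shared statistics, a sample's normalized value does not depend on which other samples are in its batch.

```python
import statistics

def batch_normalize(batch, eps=1e-5):
    """Normalize with the mean and variance of this batch only."""
    mean = statistics.fmean(batch)
    var = statistics.pvariance(batch, mu=mean)
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]

def shared_normalize(batch, shared_mean, shared_std, eps=1e-5):
    """Normalize with statistics shared by every batch (assumed given)."""
    return [(x - shared_mean) / (shared_std + eps) for x in batch]

batch_a = [1.0, 2.0, 3.0]
batch_b = [1.0, 2.0, 30.0]   # same first sample, different batch mates

bn_a = batch_normalize(batch_a)[0]
bn_b = batch_normalize(batch_b)[0]
sn_a = shared_normalize(batch_a, shared_mean=2.0, shared_std=1.5)[0]
sn_b = shared_normalize(batch_b, shared_mean=2.0, shared_std=1.5)[0]

print(bn_a, bn_b)  # differ: the output depends on the rest of the batch
print(sn_a, sn_b)  # identical: the output depends only on the sample
```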

Multimedia

Enhancing HEVC Compressed Videos with a Partition-masked Convolutional Neural Network

144 - Xiaoyi He, Qiang Hu, Xintong Han 2018

In this paper, we propose a partition-masked Convolutional Neural Network (CNN) to achieve compressed-video enhancement for the state-of-the-art coding standard, High Efficiency Video Coding (HEVC). More precisely, our method utilizes the partition information produced by the encoder to guide the quality enhancement process. In contrast to existing CNN-based approaches, which take only the decoded frame as input to the CNN, the proposed approach considers the coding unit (CU) size information and combines it with the distorted decoded frame so that the degradation introduced by HEVC is reduced more efficiently. Experimental results show that our approach leads to over 9.76% BD-rate saving on benchmark sequences, achieving state-of-the-art performance.
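One way to picture combining CU partition information with the decoded frame is to build a mask from the CU layout and feed it as an extra input channel. The abstract does not specify how the mask is generated or fused, so the boundary-marking mask, the channel stacking, and the names below are assumptions for illustration only.

```python
def partition_mask(height, width, cus):
    """Build a mask with 1.0 on CU boundary samples and 0.0 elsewhere.

    cus is a list of (x, y, size) tuples describing square coding units;
    this layout description is a hypothetical stand-in for encoder output.
    """
    mask = [[0.0] * width for _ in range(height)]
    for x, y, size in cus:
        for dy in range(size):
            for dx in range(size):
                on_edge = dx in (0, size - 1) or dy in (0, size - 1)
                if on_edge:
                    mask[y + dy][x + dx] = 1.0
    return mask

def stack_inputs(frame, mask):
    """Channels-first network input: [decoded frame, partition mask]."""
    return [frame, mask]

# An 8x4 decoded frame covered by two 4x4 CUs placed side by side.
frame = [[128.0] * 8 for _ in range(4)]
mask = partition_mask(4, 8, [(0, 0, 4), (4, 0, 4)])
net_input = stack_inputs(frame, mask)
print(len(net_input), len(net_input[0]), len(net_input[0][0]))
```

The mask marks where blocking artifacts are most likely, which is the kind of guidance the abstract attributes to the partition information.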

Multimedia