A Bayesian Approach to Block Structure Inference in AV1-based Multi-rate Video Encoding

Published by: Bichuan Guo

Publication date: 2018

Research field: Informatics Engineering

Paper language: English

Authors: Bichuan Guo, Xinyao Chen, Jiawen Gu

Multimedia

Abstract

Due to differences in frame structure, existing multi-rate video encoding algorithms cannot be directly adapted to encoders that use special reference frames, such as AV1, without introducing substantial rate-distortion loss. To tackle this problem, we propose a novel Bayesian block structure inference model inspired by a modification to an HEVC-based algorithm. It estimates the posterior probability distributions of block partitioning and adapts early terminations in the RDO procedure accordingly. Experimental results show that the proposed method provides flexibility for controlling the tradeoff between speed and coding efficiency, and can achieve an average time saving of 36.1% (up to 50.6%) with negligible bitrate cost.
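As a rough illustration of the idea in the abstract, the sketch below computes a posterior probability that a block should be split and maps it to an early-termination decision. It is not the authors' model: the prior, the likelihood values, the two thresholds, and the function names `split_posterior` and `rdo_decision` are all hypothetical.

```python
def split_posterior(prior_split, lik_split, lik_nosplit):
    """Bayes' rule: P(split | evidence) from a prior and two likelihoods."""
    num = prior_split * lik_split
    den = num + (1.0 - prior_split) * lik_nosplit
    if den == 0.0:
        # Evidence has zero probability under both hypotheses; keep the prior.
        return prior_split
    return num / den


def rdo_decision(posterior, skip_below=0.1, force_above=0.9):
    """Map a posterior to an RDO action using hypothetical thresholds."""
    if posterior >= force_above:
        return "split_only"      # skip evaluating the unsplit block
    if posterior <= skip_below:
        return "no_split_only"   # terminate early, do not recurse
    return "full_search"         # uncertain: run the full RDO comparison


# Hypothetical numbers: the evidence is far more likely under "split".
p = split_posterior(prior_split=0.5, lik_split=0.9, lik_nosplit=0.05)
print(round(p, 3), rdo_decision(p))
```

Moving the two thresholds closer to 0 and 1 sends more blocks to the full search, which is one plausible way to trade speed against coding efficiency.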

Bichuan Guo, Yuxing Han, Jiangtao Wen (2018)

The widely used adaptive HTTP streaming requires an efficient algorithm to encode the same video at different resolutions. In this paper, we propose a fast block structure determination algorithm based on the AV1 codec that accelerates high-resolution encoding, which is the bottleneck of multi-resolution encoding. The block structure similarity across resolutions is modeled by the fineness of frame detail and the scale of object motion; this enables us to accelerate high-resolution encoding based on low-resolution encoding results. The average depth of a block's co-located neighborhood is used to decide early termination in the RDO process. Encoding results show that our proposed algorithm reduces encoding time by 30.1%-36.8% while keeping the BD-rate low at 0.71%-1.04%. Compared to the state of the art, our method halves performance loss without sacrificing time savings.
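The early-termination rule mentioned in this abstract can be sketched as follows. This is only an illustration under assumptions: the low-resolution partition depths are given as a 2-D grid already aligned to the high-resolution block, and the neighborhood radius, the `margin` parameter, and both function names are hypothetical rather than taken from the paper.

```python
def average_neighborhood_depth(depth_map, row, col, radius=1):
    """Mean partition depth over the co-located block and its neighbors."""
    rows, cols = len(depth_map), len(depth_map[0])
    values = [
        depth_map[r][c]
        for r in range(max(0, row - radius), min(rows, row + radius + 1))
        for c in range(max(0, col - radius), min(cols, col + radius + 1))
    ]
    return sum(values) / len(values)


def should_terminate_early(current_depth, avg_depth, margin=0.5):
    """Stop splitting once the depth exceeds the neighborhood average by a margin."""
    return current_depth >= avg_depth + margin


# Hypothetical partition depths from the low-resolution encode.
low_res_depths = [
    [0, 1, 1],
    [1, 2, 1],
    [0, 1, 1],
]
avg = average_neighborhood_depth(low_res_depths, 1, 1)
print(avg, should_terminate_early(2, avg))
```

A larger `margin` terminates less often, so it costs encoding time but risks less rate-distortion loss.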

Multimedia

CNN-based driving of block partitioning for intra slices encoding

Franck Galpin, Fabien Racape, Sunil Jaiswal (2020)

This paper provides a technical overview of a deep-learning-based encoder method that aims to optimize next-generation hybrid video encoders by driving the block partitioning in intra slices. An encoding approach based on Convolutional Neural Networks is explored to partly replace classical heuristics-based encoder speed-ups with a systematic and automatic process. The solution allows controlling the trade-off between complexity and coding gains, in intra slices, with a single parameter. This algorithm was proposed in response to the Call for Proposals of the Joint Video Exploration Team (JVET) on video compression with capability beyond HEVC. In the All Intra configuration, for a given allowed topology of splits, a speed-up of $\times 2$ is obtained without BD-rate loss, or a speed-up above $\times 4$ with a loss below 1% in BD-rate.

Multimedia

A novel technique for image steganography based on Block-DCT and Huffman Encoding

A. Nag (2010)

Image steganography is the art of hiding information in a cover image. This paper presents a novel technique for image steganography based on Block-DCT, where the DCT is used to transform the original (cover) image blocks from the spatial domain to the frequency domain. First, a gray-level image of size M x N is divided into disjoint 8 x 8 blocks, and a two-dimensional Discrete Cosine Transform (2-D DCT) is performed on each of the P = MN / 64 blocks. Then Huffman encoding is performed on the secret message or image before embedding, and each bit of its Huffman code is embedded in the frequency domain by altering the least significant bit of each of the DCT coefficients of the cover image blocks. The experimental results show that the algorithm has a high capacity and good invisibility. Moreover, the PSNR between the cover image and the stego-image is better than that of other existing steganography approaches. Furthermore, satisfactory security is maintained, since the secret message or image cannot be extracted without knowing the decoding rules and the Huffman table.
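The embedding step described above can be illustrated with a minimal pure-Python sketch for a single 8 x 8 block. It makes several assumptions not stated in the abstract: coefficients are rounded to integers before their least significant bits are overwritten, the message bits are supplied directly (the Huffman coding stage is omitted), and the inverse transform back to pixels is not shown. All function names are hypothetical.

```python
import math

BLOCK = 8  # block side length used in the abstract


def block_count(m, n):
    """P = MN / 64 for an M x N image whose sides are multiples of 8."""
    return (m * n) // (BLOCK * BLOCK)


def dct2(block):
    """Orthonormal 2-D DCT-II of one 8 x 8 block (direct, unoptimized form)."""
    def alpha(k):
        return math.sqrt(1.0 / BLOCK) if k == 0 else math.sqrt(2.0 / BLOCK)

    out = [[0.0] * BLOCK for _ in range(BLOCK)]
    for u in range(BLOCK):
        for v in range(BLOCK):
            s = 0.0
            for x in range(BLOCK):
                for y in range(BLOCK):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * BLOCK))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * BLOCK)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def embed_bits(coeffs, bits):
    """Round coefficients to integers and overwrite their LSBs with message bits."""
    flat = [int(round(c)) for row in coeffs for c in row]
    if len(bits) > len(flat):
        raise ValueError("message longer than block capacity")
    for i, bit in enumerate(bits):
        flat[i] = (flat[i] & ~1) | bit  # clear the LSB, then set it to the bit
    return [flat[i * BLOCK:(i + 1) * BLOCK] for i in range(BLOCK)]


def extract_bits(int_coeffs, count):
    """Read the first `count` message bits back from the coefficient LSBs."""
    flat = [c for row in int_coeffs for c in row]
    return [c & 1 for c in flat[:count]]


cover_block = [[16] * BLOCK for _ in range(BLOCK)]  # hypothetical flat gray block
stego_coeffs = embed_bits(dct2(cover_block), [1, 0, 1, 1])
print(block_count(512, 512), extract_bits(stego_coeffs, 4))
```

In a full pipeline the modified coefficients would be inverse-transformed and stored as pixels, and that rounding step can disturb the embedded bits, so a practical scheme has to account for it.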

Multimedia

Optimizing Video Caching at the Edge: A Hybrid Multi-Point Process Approach

Xianzhi Zhang, Yipeng Zhou, Di Wu (2021)

Delivering a huge volume of videos over the Internet is a persistently challenging problem. To meet the high bandwidth and stringent playback demands, one feasible solution is to cache video content on edge servers based on predicted video popularity. Traditional caching algorithms (e.g., LRU, LFU) are too simple to capture the dynamics of video popularity, especially for long-tailed videos. Recent learning-driven caching algorithms (e.g., DeepCache) show promising performance, but such black-box approaches lack explainability and interpretability. Moreover, their parameter tuning requires a large number of historical records, which are difficult to obtain for videos with low popularity. In this paper, we optimize video caching at the edge using a white-box approach, which is highly efficient and also completely explainable. To accurately capture the evolution of video popularity, we develop a mathematical model called the HRS model, which combines multiple point processes, including Hawkes self-exciting, reactive, and self-correcting processes. The key advantages of the HRS model are its explainability and its much smaller number of model parameters. In addition, all its model parameters can be learned automatically by maximizing the log-likelihood function constructed from past video request events. Next, we design an online HRS-based video caching algorithm. To verify its effectiveness, we conduct a series of experiments using real video traces collected from Tencent Video, one of the largest online video providers in China. Experimental results demonstrate that our proposed algorithm outperforms the state-of-the-art algorithms, with a 12.3% improvement on average in cache hit rate under realistic settings.
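To give a feel for the self-exciting component only, the sketch below evaluates a standard Hawkes intensity with an exponential kernel and ranks videos by it. This is not the HRS model: the reactive and self-correcting parts are omitted, and the kernel choice, the parameter values `mu`, `alpha`, `beta`, and the function names are assumptions for illustration.

```python
import math


def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """lambda(t) = mu + alpha * sum(exp(-beta * (t - t_i))) over past events t_i < t."""
    return mu + alpha * sum(
        math.exp(-beta * (t - ti)) for ti in event_times if ti < t
    )


def top_k_to_cache(request_log, now, k):
    """Rank videos by predicted request intensity and keep the k highest."""
    scores = {vid: hawkes_intensity(now, times) for vid, times in request_log.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical request timestamps per video.
log = {"a": [1.0, 9.0, 9.5], "b": [0.5, 1.0, 1.5, 2.0], "c": []}
print(top_k_to_cache(log, now=10.0, k=2))
```

Video "a" has fewer requests than "b" overall but ranks first because its requests are recent, which is the kind of popularity dynamics that frequency counts such as LFU do not capture.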

Multimedia; Networking and Internet Architecture

Hysia: Serving DNN-Based Video-to-Retail Applications in Cloud

Huaizheng Zhang, Yuanming Li, Qiming Ai (2020)

Combining video streaming and online retailing (V2R) has been a growing trend recently. In this paper, we provide practitioners and researchers in multimedia with a cloud-based platform named Hysia for easy development and deployment of V2R applications. The system consists of: 1) a back-end infrastructure providing optimized V2R-related services, including a data engine, model repository, model serving, and content matching; and 2) an application layer that enables rapid V2R application prototyping. Hysia addresses industry and academic needs in large-scale multimedia by: 1) seamlessly integrating state-of-the-art libraries including NVIDIA video SDK, Facebook faiss, and gRPC; 2) efficiently utilizing GPU computation; and 3) allowing developers to bind new models easily to keep up with rapidly changing deep learning (DL) techniques. On top of that, we implement an orchestrator to further optimize DL model serving performance. Hysia has been released as an open-source project on GitHub and has attracted considerable attention. We have published Hysia to DockerHub as an official image for seamless integration and deployment in current cloud environments.

Multimedia; Distributed, Parallel, and Cluster Computing; Machine Learning