
Assessing the Quality-of-Experience of Adaptive Bitrate Video Streaming

Published by Zhengfang Duanmu
Publication date: 2020
Paper language: English





The diversity of video delivery pipelines poses a grand challenge to the evaluation of adaptive bitrate (ABR) streaming algorithms and objective quality-of-experience (QoE) models. Here we introduce the largest subject-rated database of its kind to date, namely WaterlooSQoE-IV, consisting of 1350 adaptive streaming videos created from diverse source contents, video encoders, network traces, ABR algorithms, and viewing devices. We collect human opinions for each video with a series of carefully designed subjective experiments. Subsequent data analysis and testing/comparison of ABR algorithms and QoE models using the database lead to a series of novel observations and interesting findings, in terms of the effectiveness of subjective experiment methodologies, the interactions between user experience and source content, viewing device, and encoder type, the heterogeneities in the bias and preference of user experiences, the behaviors of ABR algorithms, and the performance of objective QoE models. Most importantly, our results suggest that a better objective QoE model, or a better understanding of human perceptual experience and behaviour, is the most dominant factor in improving the performance of ABR algorithms, as opposed to advanced optimization frameworks, machine learning strategies, or bandwidth predictors, on which the majority of ABR research has focused in the past decade. On the other hand, our performance evaluation of 11 QoE models shows only a moderate correlation between state-of-the-art QoE models and subjective ratings, implying room for improvement in both QoE modeling and ABR algorithms. The database is made publicly available at: https://ece.uwaterloo.ca/~zduanmu/waterloosqoe4/.




Read also

One of the challenges faced by many video providers is the heterogeneity of network specifications, user requirements, and content compression performance. The universal solution of a fixed bitrate ladder is inadequate in ensuring a high quality of user experience without re-buffering or introducing annoying compression artifacts. However, a content-tailored solution, based on extensive encoding across all resolutions and over a wide quality range, is highly expensive in terms of computational, financial, and energy costs. Inspired by this, we propose an approach that exploits machine learning to predict a content-optimized bitrate ladder. The method extracts spatio-temporal features from the uncompressed content, trains machine-learning models to predict the Pareto front parameters, and, based on that, builds the ladder within a defined bitrate range. The method has the benefit of significantly reducing the number of encodes required per sequence. The presented results, based on 100 HEVC-encoded sequences, demonstrate a reduction in the number of encodes required when compared to an exhaustive search and an interpolation-based method, by 89.06% and 61.46%, respectively, at the cost of an average Bjøntegaard Delta Rate difference of 1.78% compared to the exhaustive approach. Finally, a hybrid method is introduced that selects either the proposed or the interpolation-based method depending on the sequence features. This results in an overall 83.83% reduction of required encodings at the cost of an average Bjøntegaard Delta Rate difference of 1.26%.
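The final ladder-construction step described above can be sketched as follows, assuming the feature extraction and regression have already produced per-resolution rate-quality curves. All names and numbers here are hypothetical, not from the paper.

```python
# Hedged sketch of bitrate-ladder construction from predicted rate-quality
# curves. In the paper, these curves come from ML models trained on
# spatio-temporal features; here they are hard-coded placeholders.

# Predicted quality score at a few bitrates (kbps), one curve per resolution.
curves = {
    2160: {8000: 96, 4000: 90, 2000: 78, 1000: 60},
    1080: {8000: 93, 4000: 89, 2000: 82, 1000: 70},
     540: {8000: 85, 4000: 84, 2000: 80, 1000: 74},
}

def build_ladder(curves, bitrates):
    """For each target bitrate, pick the resolution predicted to give the
    highest quality -- i.e. follow the predicted Pareto front."""
    return {
        br: max(curves, key=lambda res: curves[res][br])
        for br in bitrates
    }

ladder = build_ladder(curves, [1000, 2000, 4000, 8000])
print(ladder)   # lower bitrates favour lower resolutions
```

The savings claimed in the abstract come from needing only the few encodes that realize this ladder, rather than exhaustively encoding every (resolution, bitrate) pair.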
Increasing the frame rate of a 3D video generally results in improved Quality of Experience (QoE). However, higher frame rates involve a higher degree of complexity in capturing, transmission, storage, and display. The question that arises here is what frame rate guarantees high viewing quality of experience given the existing/required 3D devices and technologies (3D cameras, 3D TVs, compression, transmission bandwidth, and storage capacity). This question has already been addressed for the case of 2D video, but not for 3D. The objective of this paper is to study the relationship between 3D quality and bitrate at different frame rates. Our performance evaluations show that increasing the frame rate of 3D videos beyond 60 fps may not be visually distinguishable. In addition, our experiments show that when the available bandwidth is reduced, the highest possible 3D quality of experience can be achieved by adjusting (decreasing) the frame rate instead of increasing the compression ratio. The results of our study are of particular interest to network providers for rate adaptation in variable bitrate channels.
The fundamental conflict between the enormous space of adaptive streaming videos and the limited capacity for subjective experiments poses significant challenges to objective Quality-of-Experience (QoE) prediction. Existing objective QoE models exhibit complex functional forms, failing to generalize well in diverse streaming environments. In this study, we propose an objective QoE model, namely the knowledge-driven streaming quality index (KSQI), to integrate prior knowledge on the human visual system and human-annotated data in a principled way. By analyzing the subjective characteristics towards streaming videos from a corpus of subjective studies, we show that a family of QoE functions lies in a convex set. Using a variant of projected gradient descent, we optimize the objective QoE model over a database of training videos. The proposed KSQI demonstrates strong generalizability to diverse streaming environments, evident by state-of-the-art performance on four publicly available benchmark datasets.
The emergence of multiview displays has made the need for synthesizing virtual views more pronounced, since it is not practical to capture all of the possible views when filming multiview content. View synthesis is performed using the available views and depth maps. There is a correlation between the quality of the synthesized views and the quality of the depth maps. In this paper we study the effect of depth map quality on the perceptual quality of synthesized views through subjective and objective analysis. Our evaluation results show that: 1) 3D video quality depends highly on the depth map quality, and 2) the Visual Information Fidelity index computed between the reference and distorted depth maps has a Pearson correlation coefficient of 0.75 and a Spearman rank-order correlation coefficient of 0.67 with the subjective 3D video quality.
Video represents the majority of internet traffic today, leading to a continuous technological arms race between generating higher-quality content, transmitting larger file sizes, and supporting network infrastructure. Adding to this is the recent COVID-19-pandemic-fueled surge in the use of video conferencing tools. Since videos take up substantial bandwidth (~100 Kbps to a few Mbps), improved video compression can have a substantial impact on network performance for live and pre-recorded content, providing broader access to multimedia content worldwide. In this work, we present a novel video compression pipeline, called Txt2Vid, which substantially reduces data transmission rates by compressing webcam videos (talking-head videos) to a text transcript. The text is transmitted and decoded into a realistic reconstruction of the original video using recent advances in deep-learning-based voice cloning and lip syncing models. Our generative pipeline achieves two to three orders of magnitude reduction in the bitrate as compared to the standard audio-video codecs (encoders-decoders), while maintaining equivalent Quality-of-Experience based on a subjective evaluation by users (n=242) in an online study. The code for this work is available at https://github.com/tpulkit/txt2vid.git.
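The orders-of-magnitude claim above can be sanity-checked with a back-of-envelope calculation. The rates below are assumptions for illustration (speech at roughly 150 words per minute, about 6 bytes of transcript per word, and a 100 kbps video call at the low end of the range the abstract cites), not figures from the paper.

```python
# Back-of-envelope check: transcript bitrate vs. a low-end video-call bitrate.
# All rates are hypothetical assumptions, not measurements from Txt2Vid.
words_per_s = 150 / 60                  # assumed speaking rate
text_bps    = words_per_s * 6 * 8       # transcript bitrate in bits/s
video_bps   = 100_000                   # assumed low-end audio-video bitrate

reduction = video_bps / text_bps
print(f"text: {text_bps:.0f} bps, reduction: {reduction:.0f}x")
```

Under these assumptions the transcript runs at roughly 120 bps, a reduction of close to three orders of magnitude, consistent with the range the abstract reports.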
