No Arabic abstract
Immersive video offers the freedom to navigate inside virtualized environment. Instead of streaming the bulky immersive videos entirely, a viewport (also referred to as field of view, FoV) adaptive streaming is preferred. We often stream the high-quality content within current viewport, while reducing the quality of representation elsewhere to save the network bandwidth consumption. Consider that we could refine the quality when focusing on a new FoV, in this paper, we model the perceptual impact of the quality variations (through adapting the quantization stepsize and spatial resolution) with respect to the refinement duration, and yield a product of two closed-form exponential functions that well explain the joint quantization and resolution induced quality impact. Analytical model is cross-validated using another set of data, where both Pearson and Spearmans rank correlation coefficients are close to 0.98. Our work is devised to optimize the adaptive FoV streaming of the immersive video under limited network resource. Numerical results show that our proposed model significantly improves the quality of experience of users, with about 9.36% BD-Rate (Bjontegaard Delta Rate) improvement on average as compared to other representative methods, particularly under the limited bandwidth.
In this paper, we study the server-side rate adaptation problem for streaming tile-based adaptive 360-degree videos to multiple users who are competing for transmission resources at the network bottleneck. Specifically, we develop a convolutional neural network (CNN)-based viewpoint prediction model to capture the nonlinear relationship between the future and historical viewpoints. A Laplace distribution model is utilized to characterize the probability distribution of the prediction error. Given the predicted viewpoint, we then map the viewport in the spherical space into its corresponding planar projection in the 2-D plane, and further derive the visibility probability of each tile based on the planar projection and the prediction error probability. According to the visibility probability, tiles are classified as viewport, marginal and invisible tiles. The server-side tile rate allocation problem for multiple users is then formulated as a non-linear discrete optimization problem to minimize the overall received video distortion of all users and the quality difference between the viewport and marginal tiles of each user, subject to the transmission capacity constraints and users specific viewport requirements. We develop a steepest descent algorithm to solve this non-linear discrete optimization problem, by initializing the feasible starting point in accordance with the optimal solution of its continuous relaxation. Extensive experimental results show that the proposed algorithm can achieve a near-optimal solution, and outperforms the existing rate adaptation schemes for tile-based adaptive 360-video streaming.
Adaptive bitrate (ABR) streaming is the de facto solution for achieving smooth viewing experiences under unstable network conditions. However, most of the existing rate adaptation approaches for ABR are content-agnostic, without considering the semantic information of the video content. Nevertheless, semantic information largely determines the informativeness and interestingness of the video content, and consequently affects the QoE for video streaming. One common case is that the user may expect higher quality for the parts of video content that are more interesting or informative so as to reduce video distortion and information loss, given that the overall bitrate budgets are limited. This creates two main challenges for such a problem: First, how to determine which parts of the video content are more interesting? Second, how to allocate bitrate budgets for different parts of the video content with different significances? To address these challenges, we propose a Content-of-Interest (CoI) based rate adaptation scheme for ABR. We first design a deep learning approach for recognizing the interestingness of the video content, and then design a Deep Q-Network (DQN) approach for rate adaptation by incorporating video interestingness information. The experimental results show that our method can recognize video interestingness precisely, and the bitrate allocation for ABR can be aligned with the interestingness of video content while not compromising the performances on objective QoE metrics.
Omnidirectional applications are immersive and highly interactive, which can improve the efficiency of remote collaborative work among factory workers. The transmission of omnidirectional video (OV) is the most important step in implementing virtual remote collaboration. Compared with the ordinary video transmission, OV transmission requires more bandwidth, which is still a huge burden even under 5G networks. The tile-based scheme can reduce bandwidth consumption. However, it neither accurately obtain the field of view(FOV) area, nor difficult to support real-time OV streaming. In this paper, we propose an edge-assisted viewport adaptive scheme (EVAS-OV) to reduce bandwidth consumption during real-time OV transmission. First, EVAS-OV uses a Gated Recurrent Unit(GRU) model to predict users viewport. Then, users were divided into multicast clusters thereby further reducing the consumption of computing resources. EVAS-OV reprojects OV frames to accurately obtain users FOV area from pixel level and adopt a redundant strategy to reduce the impact of viewport prediction errors. All computing tasks were offloaded to edge servers to reduce the transmission delay and improve bandwidth utilization. Experimental results show that EVAS-OV can save more than 60% of bandwidth compared with the non-viewport adaptive scheme. Compared to a two-layer scheme with viewport adaptive, EVAS-OV still saves 30% of bandwidth.
Compressed videos constitute 70% of Internet traffic, and video upload growth rates far outpace compute and storage improvement trends. Past work in leveraging perceptual cues like saliency, i.e., regions where viewers focus their perceptual attention, reduces compressed video size while maintaining perceptual quality, but requires significant changes to video codecs and ignores the data management of this perceptual information. In this paper, we propose Vignette, a compression technique and storage manager for perception-based video compression. Vignette complements off-the-shelf compression software and hardware codec implementations. Vignettes compression technique uses a neural network to predict saliency information used during transcoding, and its storage manager integrates perceptual information into the video storage system to support a perceptual compression feedback loop. Vignettes saliency-based optimizations reduce storage by up to 95% with minimal quality loss, and Vignette videos lead to power savings of 50% on mobile phones during video playback. Our results demonstrate the benefit of embedding information about the human visual system into the architecture of video storage systems.
Combining underline{v}ideo streaming and online underline{r}etailing (V2R) has been a growing trend recently. In this paper, we provide practitioners and researchers in multimedia with a cloud-based platform named Hysia for easy development and deployment of V2R applications. The system consists of: 1) a back-end infrastructure providing optimized V2R related services including data engine, model repository, model serving and content matching; and 2) an application layer which enables rapid V2R application prototyping. Hysia addresses industry and academic needs in large-scale multimedia by: 1) seamlessly integrating state-of-the-art libraries including NVIDIA video SDK, Facebook faiss, and gRPC; 2) efficiently utilizing GPU computation; and 3) allowing developers to bind new models easily to meet the rapidly changing deep learning (DL) techniques. On top of that, we implement an orchestrator for further optimizing DL model serving performance. Hysia has been released as an open source project on GitHub, and attracted considerable attention. We have published Hysia to DockerHub as an official image for seamless integration and deployment in current cloud environments.