No Arabic abstract
This paper investigates adaptive streaming of one or multiple tiled 360 videos from a multi-antenna base station (BS) to one or multiple single-antenna users, respectively, in a multi-carrier wireless system. We aim to maximize the video quality while keeping rebuffering time small via encoding rate adaptation at each group of pictures (GOP) and transmission adaptation at each (transmission) slot. To capture the impact of field-of-view (FoV) prediction, we consider three cases of FoV viewing probability distributions, i.e., perfect, imperfect, and unknown FoV viewing probability distributions, and use the average total utility, worst average total utility, and worst total utility as the respective performance metrics. In the single-user scenario, we optimize the encoding rates of the tiles, encoding rates of the FoVs, and transmission beamforming vectors for all subcarriers to maximize the total utility in each case. In the multi-user scenario, we adopt rate splitting with successive decoding and optimize the encoding rates of the tiles, encoding rates of the FoVs, rates of the common and private messages, and transmission beamforming vectors for all subcarriers to maximize the total utility in each case. Then, we separate the challenging optimization problem into multiple tractable problems in each scenario. In the single-user scenario, we obtain a globally optimal solution of each problem using transformation techniques and the Karush-Kuhn-Tucker (KKT) conditions. In the multi-user scenario, we obtain a KKT point of each problem using the concave-convex procedure (CCCP). Finally, numerical results demonstrate that the proposed solutions achieve notable gains over existing schemes in all three cases. To the best of our knowledge, this is the first work revealing the impact of FoV prediction on the performance of adaptive streaming of tiled 360 videos.
Omnidirectional (or 360-degree) images and videos are emergent signals in many areas such as robotics and virtual/augmented reality. In particular, for virtual reality, they allow an immersive experience in which the user is provided with a 360-degree field of view and can navigate throughout a scene, e.g., through the use of Head Mounted Displays. Since it represents the full 360-degree field of view from one point of the scene, omnidirectional content is naturally represented as spherical visual signals. Current approaches for capturing, processing, delivering, and displaying 360-degree content, however, present many open technical challenges and introduce several types of distortions in these visual signals. Some of the distortions are specific to the nature of 360-degree images, and often different from those encountered in the classical image communication framework. This paper provides a first comprehensive review of the most common visual distortions that alter 360-degree signals undergoing state of the art processing in common applications. While their impact on viewers visual perception and on the immersive experience at large is still unknown ---thus, it stays an open research topic--- this review serves the purpose of identifying the main causes of visual distortions in the end-to-end 360-degree content distribution pipeline. It is essential as a basis for benchmarking different processing techniques, allowing the effective design of new algorithms and applications. It is also necessary to the deployment of proper psychovisual studies to characterise the human perception of these new images in interactive and immersive applications.
The fundamental conflict between the enormous space of adaptive streaming videos and the limited capacity for subjective experiment casts significant challenges to objective Quality-of-Experience (QoE) prediction. Existing objective QoE models exhibit complex functional form, failing to generalize well in diverse streaming environments. In this study, we propose an objective QoE model namely knowledge-driven streaming quality index (KSQI) to integrate prior knowledge on the human visual system and human annotated data in a principled way. By analyzing the subjective characteristics towards streaming videos from a corpus of subjective studies, we show that a family of QoE functions lies in a convex set. Using a variant of projected gradient descent, we optimize the objective QoE model over a database of training videos. The proposed KSQI demonstrates strong generalizability to diverse streaming environments, evident by state-of-the-art performance on four publicly available benchmark datasets.
In this paper, we study the server-side rate adaptation problem for streaming tile-based adaptive 360-degree videos to multiple users who are competing for transmission resources at the network bottleneck. Specifically, we develop a convolutional neural network (CNN)-based viewpoint prediction model to capture the nonlinear relationship between the future and historical viewpoints. A Laplace distribution model is utilized to characterize the probability distribution of the prediction error. Given the predicted viewpoint, we then map the viewport in the spherical space into its corresponding planar projection in the 2-D plane, and further derive the visibility probability of each tile based on the planar projection and the prediction error probability. According to the visibility probability, tiles are classified as viewport, marginal and invisible tiles. The server-side tile rate allocation problem for multiple users is then formulated as a non-linear discrete optimization problem to minimize the overall received video distortion of all users and the quality difference between the viewport and marginal tiles of each user, subject to the transmission capacity constraints and users specific viewport requirements. We develop a steepest descent algorithm to solve this non-linear discrete optimization problem, by initializing the feasible starting point in accordance with the optimal solution of its continuous relaxation. Extensive experimental results show that the proposed algorithm can achieve a near-optimal solution, and outperforms the existing rate adaptation schemes for tile-based adaptive 360-video streaming.
With the merit of containing full panoramic content in one camera, Virtual Reality (VR) and 360-degree videos have attracted more and more attention in the field of industrial cloud manufacturing and training. Industrial Internet of Things (IoT), where many VR terminals needed to be online at the same time, can hardly guarantee VRs bandwidth requirement. However, by making use of users quality of experience (QoE) awareness factors, including the relative moving speed and depth difference between the viewpoint and other content, bandwidth consumption can be reduced. In this paper, we propose OFB-VR (Optical Flow Based VR), an interactive method of VR streaming that can make use of VR users QoE awareness to ease the bandwidth pressure. The Just-Noticeable Difference through Optical Flow Estimation (JND-OFE) is explored to quantify users awareness of quality distortion in 360-degree videos. Accordingly, a novel 360-degree videos QoE metric based on PSNR and JND-OFE (PSNR-OF) is proposed. With the help of PSNR-OF, OFB-VR proposes a versatile-size tiling scheme to lessen the tiling overhead. A Reinforcement Learning(RL) method is implemented to make use of historical data to perform Adaptive BitRate(ABR). For evaluation, we take two prior VR streaming schemes, Pano and Plato, as baselines. Vast evaluations show that our system can increase the mean PSNR-OF score by 9.5-15.8% while maintaining the same rebuffer ratio compared with Pano and Plato in a fluctuate LTE bandwidth dataset. Evaluation results show that OFB-VR is a promising prototype for actual interactive industrial VR. A prototype of OFB-VR can be found in https://github.com/buptexplorers/OFB-VR.
In this paper, we propose a systematic solution to the problem of scheduling delay-sensitive media data for transmission over time-varying wireless channels. We first formulate the dynamic scheduling problem as a Markov decision process (MDP) that explicitly considers the users heterogeneous multimedia data characteristics (e.g. delay deadlines, distortion impacts and dependencies etc.) and time-varying channel conditions, which are not simultaneously considered in state-of-the-art packet scheduling algorithms. This formulation allows us to perform foresighted decisions to schedule multiple data units for transmission at each time in order to optimize the long-term utilities of the multimedia applications. The heterogeneity of the media data enables us to express the transmission priorities between the different data units as a priority graph, which is a directed acyclic graph (DAG). This priority graph provides us with an elegant structure to decompose the multi-data unit foresighted decision at each time into multiple single-data unit foresighted decisions which can be performed sequentially, from the high priority data units to the low priority data units, thereby significantly reducing the computation complexity. When the statistical knowledge of the multimedia data characteristics and channel conditions is unknown a priori, we develop a low-complexity online learning algorithm to update the value functions which capture the impact of the current decision on the future utility. The simulation results show that the proposed solution significantly outperforms existing state-of-the-art scheduling solutions.