ترغب بنشر مسار تعليمي؟ اضغط هنا

Fine granularity access in interactive compression of 360-degree images based on rate-adaptive channel codes

73   0   0.0 ( 0 )
 نشر من قبل Navid Mahmoudian Bidgoli
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

In this paper, we propose a new interactive compression scheme for omnidirectional images. This requires two characteristics: efficient compression of data, to lower the storage cost, and random access ability to extract part of the compressed stream requested by the user (for reducing the transmission rate). For efficient compression, data needs to be predicted by a series of references that have been pre-defined and compressed. This contrasts with the spirit of random accessibility. We propose a solution for this problem based on incremental codes implemented by rate-adaptive channel codes. This scheme encodes the image while adapting to any user request and leads to an efficient coding that is flexible in extracting data depending on the available information at the decoder. Therefore, only the information that is needed to be displayed at the users side is transmitted during the users request, as if the request was already known at the encoder. The experimental results demonstrate that our coder obtains a better transmission rate than the state-of-the-art tile-based methods at a small cost in storage. Moreover, the transmission rate grows gradually with the size of the request and avoids a staircase effect, which shows the perfect suitability of our coder for interactive transmission.

قيم البحث

اقرأ أيضاً

In this paper, we study the server-side rate adaptation problem for streaming tile-based adaptive 360-degree videos to multiple users who are competing for transmission resources at the network bottleneck. Specifically, we develop a convolutional neu ral network (CNN)-based viewpoint prediction model to capture the nonlinear relationship between the future and historical viewpoints. A Laplace distribution model is utilized to characterize the probability distribution of the prediction error. Given the predicted viewpoint, we then map the viewport in the spherical space into its corresponding planar projection in the 2-D plane, and further derive the visibility probability of each tile based on the planar projection and the prediction error probability. According to the visibility probability, tiles are classified as viewport, marginal and invisible tiles. The server-side tile rate allocation problem for multiple users is then formulated as a non-linear discrete optimization problem to minimize the overall received video distortion of all users and the quality difference between the viewport and marginal tiles of each user, subject to the transmission capacity constraints and users specific viewport requirements. We develop a steepest descent algorithm to solve this non-linear discrete optimization problem, by initializing the feasible starting point in accordance with the optimal solution of its continuous relaxation. Extensive experimental results show that the proposed algorithm can achieve a near-optimal solution, and outperforms the existing rate adaptation schemes for tile-based adaptive 360-video streaming.
State-of-the-art 2D image compression schemes rely on the power of convolutional neural networks (CNNs). Although CNNs offer promising perspectives for 2D image compression, extending such models to omnidirectional images is not straightforward. Firs t, omnidirectional images have specific spatial and statistical properties that can not be fully captured by current CNN models. Second, basic mathematical operations composing a CNN architecture, e.g., translation and sampling, are not well-defined on the sphere. In this paper, we study the learning of representation models for omnidirectional images and propose to use the properties of HEALPix uniform sampling of the sphere to redefine the mathematical tools used in deep learning models for omnidirectional images. In particular, we: i) propose the definition of a new convolution operation on the sphere that keeps the high expressiveness and the low complexity of a classical 2D convolution; ii) adapt standard CNN techniques such as stride, iterative aggregation, and pixel shuffling to the spherical domain; and then iii) apply our new framework to the task of omnidirectional image compression. Our experiments show that our proposed on-the-sphere solution leads to a better compression gain that can save 13.7% of the bit rate compared to similar learned models applied to equirectangular images. Also, compared to learning models based on graph convolutional networks, our solution supports more expressive filters that can preserve high frequencies and provide a better perceptual quality of the compressed images. Such results demonstrate the efficiency of the proposed framework, which opens new research venues for other omnidirectional vision tasks to be effectively implemented on the sphere manifold.
Omnidirectional (or 360-degree) images and videos are emergent signals in many areas such as robotics and virtual/augmented reality. In particular, for virtual reality, they allow an immersive experience in which the user is provided with a 360-degre e field of view and can navigate throughout a scene, e.g., through the use of Head Mounted Displays. Since it represents the full 360-degree field of view from one point of the scene, omnidirectional content is naturally represented as spherical visual signals. Current approaches for capturing, processing, delivering, and displaying 360-degree content, however, present many open technical challenges and introduce several types of distortions in these visual signals. Some of the distortions are specific to the nature of 360-degree images, and often different from those encountered in the classical image communication framework. This paper provides a first comprehensive review of the most common visual distortions that alter 360-degree signals undergoing state of the art processing in common applications. While their impact on viewers visual perception and on the immersive experience at large is still unknown ---thus, it stays an open research topic--- this review serves the purpose of identifying the main causes of visual distortions in the end-to-end 360-degree content distribution pipeline. It is essential as a basis for benchmarking different processing techniques, allowing the effective design of new algorithms and applications. It is also necessary to the deployment of proper psychovisual studies to characterise the human perception of these new images in interactive and immersive applications.
As a technology that can prevent the information of original image and additional information from being disclosed, the reversible data hiding in encrypted images (RDHEI) has been widely concerned by researchers. How to further improve the performanc e of RDHEI methods has become a focus of research. To this end, this work proposes a high-capacity RDHEI method based on bit plane compression of prediction error. Firstly, to reserve the room for embedding information, the image owner rearranges and compresses the bit plane of prediction error. Next, the image after reserving room is encrypted with a serect key. Finally, the information hiding device embeds the additional information into the reserved room. This paper makes full use of the correlation between adjacent pixels. Experimental results show that this method can realize the real reversibility and provide higher embedding capacity than state-of-the-art works.
94 - Zhaoxia Yin , Yinyin Peng , 2019
Reversible data hiding in encrypted images (RDHEI) receives growing attention because it protects the content of the original image while the embedded data can be accurately extracted and the original image can be reconstructed lossless. To make full use of the correlation of the adjacent pixels, this paper proposes an RDHEI scheme based on pixel prediction and bit-plane compression. Firstly, to vacate room for data embedding, the prediction error of the original image is calculated and used for bit-plane rearrangement and compression. Then, the image after vacating room is encrypted by a stream cipher. Finally, the additional data is embedded in the vacated room by multi-LSB substitution. Experimental results show that the embedding capacity of the proposed method outperforms the state-of-the-art methods.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا