Research papers and master's and doctoral theses published by Philip A. Chou

82 - Johannes Balle , Philip A. Chou , David Minnen 2020

We review a class of methods that can be collected under the name nonlinear transform coding (NTC), which over the past few years have become competitive with the best linear transform codecs for images, and have superseded them in terms of rate-distortion performance under established perceptual quality metrics such as MS-SSIM. We assess the empirical rate-distortion performance of NTC with the help of simple example sources, for which the optimal performance of a vector quantizer is easier to estimate than with natural data sources. To this end, we introduce a novel variant of entropy-constrained vector quantization. We provide an analysis of various forms of stochastic optimization techniques for NTC models; review architectures of transforms based on artificial neural networks, as well as learned entropy models; and provide a direct comparison of a number of methods to parameterize the rate-distortion trade-off of nonlinear transforms, introducing a simplified one.
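As a rough illustration of the entropy-constrained vector quantization mentioned above, the following minimal sketch runs the classical ECVQ iteration on scalar samples. It is not the novel variant the abstract introduces; the function name `ecvq`, the squared-error distortion, and the rule of dropping empty cells are assumptions made for this example.

```python
import math

def ecvq(samples, codebook, lam, iters=20):
    """Classical ECVQ sketch on scalars (hypothetical helper, not the paper's variant).

    Each sample is assigned to the codeword minimizing
    squared error + lam * (ideal code length in bits).
    """
    probs = [1.0 / len(codebook)] * len(codebook)
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for x in samples:
            # Lagrangian cost: distortion plus lam times -log2(probability).
            costs = [(x - c) ** 2 - lam * math.log2(p)
                     for c, p in zip(codebook, probs)]
            cells[costs.index(min(costs))].append(x)
        # Drop unused codewords, then re-estimate centroids and probabilities.
        kept = [cell for cell in cells if cell]
        codebook = [sum(cell) / len(cell) for cell in kept]
        probs = [len(cell) / len(samples) for cell in kept]
    return codebook, probs

# Nine samples at 0.0 and one outlier at 1.0.
data = [0.0] * 9 + [1.0]
low_rate = ecvq(data, [0.0, 1.0], lam=1.0)   # heavy rate penalty
high_rate = ecvq(data, [0.0, 1.0], lam=0.1)  # light rate penalty
```

With the heavier rate penalty the rarely used codeword becomes too expensive to signal, so the outlier is absorbed into the dominant cell; with the lighter penalty both codewords survive.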

Information Theory, Image and Video Processing

Deep Implicit Volume Compression

134 - Danhang Tang , Saurabh Singh , Philip A. Chou 2020

We describe a novel approach for compressing truncated signed distance fields (TSDF) stored in 3D voxel grids, and their corresponding textures. To compress the TSDF, our method relies on a block-based neural network architecture trained end-to-end, achieving state-of-the-art rate-distortion trade-off. To prevent topological errors, we losslessly compress the signs of the TSDF, which also upper bounds the reconstruction error by the voxel size. To compress the corresponding texture, we designed a fast block-based UV parameterization, generating coherent texture maps that can be effectively compressed using existing video compression algorithms. We demonstrate the performance of our algorithms on two 4D performance capture datasets, reducing bitrate by 66% for the same distortion, or alternatively reducing the distortion by 50% for the same bitrate, compared to the state-of-the-art.

Image and Video Processing, Computer Vision and Pattern Recognition, Machine Learning

Surface Light Field Compression using a Point Cloud Codec

69 - Xiang Zhang , Philip A. Chou , Ming-Ting Sun 2018

Light field (LF) representations aim to provide photo-realistic, free-viewpoint viewing experiences. However, the most popular LF representations are images from multiple views. Multi-view image-based representations generally need to restrict the range or degrees of freedom of the viewing experience to what can be interpolated in the image domain, essentially because they lack explicit geometry information. We present a new surface light field (SLF) representation based on explicit geometry, and a method for SLF compression. First, we map the multi-view images of a scene onto a 3D geometric point cloud. The color of each point in the point cloud is a function of viewing direction known as a view map. We represent each view map efficiently in a B-Spline wavelet basis. This representation is capable of modeling diverse surface materials and complex lighting conditions in a highly scalable and adaptive manner. The coefficients of the B-Spline wavelet representation are then compressed spatially. To increase the spatial correlation and thus improve compression efficiency, we introduce a smoothing term to make the coefficients more similar across the 3D space. We compress the coefficients spatially using existing point cloud compression (PCC) methods. On the decoder side, the scene is rendered efficiently from any viewing direction by reconstructing the view map at each point. In contrast to multi-view image-based LF approaches, our method supports photo-realistic rendering of real-world scenes from arbitrary viewpoints, i.e., with an unlimited six degrees of freedom (6DOF). In terms of rate and distortion, experimental results show that our method achieves superior performance with lighter decoder complexity compared with a reference image-plus-geometry compression (IGC) scheme, indicating its potential in practical virtual and augmented reality applications.

Multimedia

Comments on Compression of 3D Point Clouds Using a Region-Adaptive Hierarchical Transform

248 - Gustavo Sandri , Ricardo L. de Queiroz , Philip A. Chou 2018

The recently introduced coder based on the region-adaptive hierarchical transform (RAHT) for the compression of point cloud attributes was shown to have performance competitive with the state-of-the-art, while being much less complex. In the paper "Compression of 3D Point Clouds Using a Region-Adaptive Hierarchical Transform", top performance was achieved using arithmetic coding (AC), while adaptive run-length Golomb-Rice (RLGR) coding was presented as a lower-performance, lower-complexity alternative. However, we have found that by reordering the RAHT coefficients we can greatly increase the runs of zeros and significantly increase the performance of the RLGR-based RAHT coder. As a result, the new coder, using ordered coefficients, was shown to outperform all other coders, including AC-based RAHT, at an even lower computational cost. We present new results and plots that should enhance those in the work of Queiroz and Chou to include the new results for RLGR-RAHT. We venture to say, based on the results herein, that RLGR-RAHT with sorted coefficients is the new state-of-the-art in point cloud compression.

Image and Video Processing

Rate-Utility Optimized Streaming of Volumetric Media for Augmented Reality

83 - Jounsup Park , Philip A. Chou , 2018

Volumetric media, popularly known as holograms, need to be delivered to users using both on-demand and live streaming, for new augmented reality (AR) and virtual reality (VR) experiences. As in video streaming, hologram streaming must support network adaptivity and fast startup, but must also moderate large bandwidths, multiple simultaneously streaming objects, and frequent user interaction, which requires low delay. In this paper, we introduce the first system to our knowledge designed specifically for streaming volumetric media. The system reduces bandwidth by introducing 3D tiles, and culling them or reducing their level of detail depending on their relation to the user's view frustum and distance to the user. Our system reduces latency by introducing a window-based buffer, which in contrast to a queue-based buffer allows insertions near the head of the buffer rather than only at the tail of the buffer, to respond quickly to user interaction. To allocate bits between different tiles across multiple objects, we introduce a simple greedy yet provably optimal algorithm for rate-utility optimization. We introduce utility measures based not only on the underlying quality of the representation, but on the level of detail relative to the user's viewpoint and device resolution. Simulation results show that the proposed algorithm provides superior quality compared to existing video-streaming approaches adapted to hologram streaming, in terms of utility and user experience over variable, throughput-constrained networks.
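The greedy bit allocation described above can be pictured with a generic marginal-utility-per-bit sketch. This is an illustration under stated assumptions, not the authors' algorithm: the function name `greedy_allocate`, the tile names, and the (bits, utility) tables are hypothetical, and each tile's utility is assumed to be concave in its bit count, which is the usual condition under which this kind of greedy rule behaves well.

```python
import heapq

def greedy_allocate(tiles, budget):
    """Generic greedy rate-utility sketch (hypothetical, not the paper's code).

    tiles: dict name -> list of (bits, utility) per level of detail,
           sorted by increasing bits. Level -1 means "tile not sent".
    Repeatedly takes the upgrade with the best utility gain per extra bit.
    """
    choice = {name: -1 for name in tiles}
    spent = 0
    heap = []

    def push_next(name):
        nxt = choice[name] + 1
        levels = tiles[name]
        if nxt >= len(levels):
            return  # already at the highest level of detail
        prev_bits, prev_util = levels[choice[name]] if choice[name] >= 0 else (0, 0.0)
        d_bits = levels[nxt][0] - prev_bits
        d_util = levels[nxt][1] - prev_util
        # heapq is a min-heap, so store the negated slope.
        heapq.heappush(heap, (-d_util / d_bits, name, d_bits))

    for name in tiles:
        push_next(name)
    while heap:
        _, name, d_bits = heapq.heappop(heap)
        if spent + d_bits > budget:
            continue  # upgrade does not fit; this tile stays at its level
        spent += d_bits
        choice[name] += 1
        push_next(name)
    return choice, spent

# Hypothetical tiles: a nearby tile is worth more than a distant one.
tiles = {
    "near": [(100, 10.0), (300, 14.0)],
    "far": [(100, 4.0), (300, 5.0)],
}
allocation, used = greedy_allocate(tiles, budget=400)
```

In this toy case the budget first buys the coarse level of both tiles and then spends the remainder on refining the nearby tile, since that upgrade yields more utility per bit than refining the distant one.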

Multimedia

Dynamic Polygon Clouds: Representation and Compression for VR/AR

113 - Philip A. Chou , Eduardo Pavez , Ricardo L. de Queiroz 2016

We introduce the polygon cloud, also known as a polygon set or soup, as a compressible representation of 3D geometry (including its attributes, such as color texture) intermediate between polygonal meshes and point clouds. Dynamic or time-varying polygon clouds, like dynamic polygonal meshes and dynamic point clouds, can take advantage of temporal redundancy for compression, if certain challenges are addressed. In this paper, we propose methods for compressing both static and dynamic polygon clouds, specifically triangle clouds. We compare triangle clouds to both triangle meshes and point clouds in terms of compression, for live captured dynamic colored geometry. We find that triangle clouds can be compressed nearly as well as triangle meshes, while being far more robust to noise and other structures typically found in live captures, which violate the assumption of a smooth surface manifold, such as lines, points, and ragged boundaries. We also find that triangle clouds can be used to compress point clouds with significantly better performance than previously demonstrated point cloud compression methods. In particular, for intra-frame coding of geometry, our method improves upon octree-based intra-frame coding by a factor of 5-10 in bit rate. Inter-frame coding improves this by another factor of 2-5. Overall, our dynamic triangle cloud compression improves over the previous state-of-the-art in dynamic point cloud compression by 33% or more.

Computer Graphics

Graph-based compression of dynamic 3D point cloud sequences

376 - Dorina Thanou , Philip A. Chou , 2015

This paper addresses the problem of compression of 3D point cloud sequences that are characterized by moving 3D positions and color attributes. As temporally successive point cloud frames are similar, motion estimation is key to effective compression of these sequences. It however remains a challenging problem as the point cloud frames have varying numbers of points without explicit correspondence information. We represent the time-varying geometry of these sequences with a set of graphs, and consider 3D positions and color attributes of the point clouds as signals on the vertices of the graphs. We then cast motion estimation as a feature matching problem between successive graphs. The motion is estimated on a sparse set of representative vertices using new spectral graph wavelet descriptors. A dense motion field is eventually interpolated by solving a graph-based regularization problem. The estimated motion is finally used for removing the temporal redundancy in the predictive coding of the 3D positions and the color characteristics of the point cloud sequences. Experimental results demonstrate that our method is able to accurately estimate the motion between consecutive frames. Moreover, motion estimation is shown to bring significant improvement in terms of the overall compression performance of the sequence. To the best of our knowledge, this is the first paper that exploits both the spatial correlation inside each frame (through the graph) and the temporal correlation between the frames (through the motion estimation) to compress the color and the geometry of 3D point cloud sequences in an efficient way.
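The step of interpolating a dense motion field from sparse estimates can be illustrated with a generic graph-smoothing sketch. The abstract does not give the exact regularization problem, so the objective below is an assumption: it minimizes the squared mismatch at vertices with known motion plus `lam` times the weighted squared differences across graph edges, solved here by Gauss-Seidel sweeps on one scalar motion component. The name `interpolate_motion` and all parameters are hypothetical.

```python
def interpolate_motion(n, edges, known, lam=0.1, iters=500):
    """Graph-regularized interpolation sketch (assumed objective, not the paper's).

    n:     number of vertices
    edges: list of (i, j, weight) undirected edges
    known: dict vertex -> motion value estimated on the sparse set
    Minimizes sum_known (x_i - y_i)^2 + lam * sum_edges w_ij (x_i - x_j)^2.
    """
    nbrs = [[] for _ in range(n)]
    for i, j, w in edges:
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))
    x = [known.get(i, 0.0) for i in range(n)]
    for _ in range(iters):
        for i in range(n):
            m = 1.0 if i in known else 0.0
            # Closed-form minimizer for x_i with all neighbors held fixed.
            num = m * known.get(i, 0.0) + lam * sum(w * x[j] for j, w in nbrs[i])
            den = m + lam * sum(w for _, w in nbrs[i])
            if den > 0:
                x[i] = num / den
    return x

# Path graph 0 - 1 - 2 with motion known only at the two end vertices.
field = interpolate_motion(3, [(0, 1, 1.0), (1, 2, 1.0)], {0: 0.0, 2: 2.0})
```

On this three-vertex path the unknown middle vertex settles halfway between its neighbors, while the two known vertices are pulled slightly toward each other because the data term is a soft constraint rather than a hard one.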

Computer Vision and Pattern Recognition, Computer Graphics
