ترغب بنشر مسار تعليمي؟ اضغط هنا

Optimal Layered Representation for Adaptive Interactive Multiview Video Streaming

273   0   0.0 ( 0 )
 نشر من قبل Ana De Abreu
 تاريخ النشر 2015
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We consider an interactive multiview video streaming (IMVS) system where clients select their preferred viewpoint in a given navigation window. To provide high quality IMVS, many high quality views should be transmitted to the clients. However, this is not always possible due to the limited and heterogeneous capabilities of the clients. In this paper, we propose a novel adaptive IMVS solution based on a layered multiview representation where camera views are organized into layered subsets to match the different clients constraints. We formulate an optimization problem for the joint selection of the views subsets and their encoding rates. Then, we propose an optimal and a reduced computational complexity greedy algorithms, both based on dynamic-programming. Simulation results show the good performance of our novel algorithms compared to a baseline algorithm, proving that an effective IMVS adaptive solution should consider the scene content and the client capabilities and their preferences in navigation.



قيم البحث

اقرأ أيضاً

Interactive multi-view video streaming (IMVS) services permit to remotely immerse within a 3D scene. This is possible by transmitting a set of reference camera views (anchor views), which are used by the clients to freely navigate in the scene and po ssibly synthesize additional viewpoints of interest. From a networking perspective, the big challenge in IMVS systems is to deliver to each client the best set of anchor views that maximizes the navigation quality, minimizes the view-switching delay and yet satisfies the network constraints. Integrating adaptive streaming solutions in free-viewpoint systems offers a promising solution to deploy IMVS in large and heterogeneous scenarios, as long as the multi-view video representations on the server are properly selected. We therefore propose to optimize the multi-view data at the server by minimizing the overall resource requirements, yet offering a good navigation quality to the different users. We propose a video representation set optimization for multiview adaptive streaming systems and we show that it is NP-hard. We therefore introduce the concept of multi-view navigation segment that permits to cast the video representation set selection as an integer linear programming problem with a bounded computational complexity. We then show that the proposed solution reduces the computational complexity while preserving optimality in most of the 3D scenes. We then provide simulation results for different classes of users and show the gain offered by an optimal multi-view video representation selection compared to recommended representation sets (e.g., Netflix and Apple ones) or to a baseline representation selection algorithm where the encoding parameters are decided a priori for all the views.
Enabling users to interactively navigate through different viewpoints of a static scene is a new interesting functionality in 3D streaming systems. While it opens exciting perspectives towards rich multimedia applications, it requires the design of n ovel representations and coding techniques in order to solve the new challenges imposed by interactive navigation. Interactivity clearly brings new design constraints: the encoder is unaware of the exact decoding process, while the decoder has to reconstruct information from incomplete subsets of data since the server can generally not transmit images for all possible viewpoints due to resource constrains. In this paper, we propose a novel multiview data representation that permits to satisfy bandwidth and storage constraints in an interactive multiview streaming system. In particular, we partition the multiview navigation domain into segments, each of which is described by a reference image and some auxiliary information. The auxiliary information enables the client to recreate any viewpoint in the navigation segment via view synthesis. The decoder is then able to navigate freely in the segment without further data request to the server; it requests additional data only when it moves to a different segment. We discuss the benefits of this novel representation in interactive navigation systems and further propose a method to optimize the partitioning of the navigation domain into independent segments, under bandwidth and storage constraints. Experimental results confirm the potential of the proposed representation; namely, our system leads to similar compression performance as classical inter-view coding, while it provides the high level of flexibility that is required for interactive streaming. Hence, our new framework represents a promising solution for 3D data representation in novel interactive multimedia services.
170 - Laura Toni , Gene Cheung , 2015
To enable Interactive multiview video systems with a minimum view-switching delay, multiple camera views are sent to the users, which are used as reference images to synthesize additional virtual views via depth-image-based rendering. In practice, ba ndwidth constraints may however restrict the number of reference views sent to clients per time unit, which may in turn limit the quality of the synthesized viewpoints. We argue that the reference view selection should ideally be performed close to the users, and we study the problem of in-network reference view synthesis such that the navigation quality is maximized at the clients. We consider a distributed cloud network architecture where data stored in a main cloud is delivered to end users with the help of cloudlets, i.e., resource-rich proxies close to the users. In order to satisfy last-hop bandwidth constraints from the cloudlet to the users, a cloudlet re-samples viewpoints of the 3D scene into a discrete set of views (combination of received camera views and virtual views synthesized) to be used as reference for the synthesis of additional virtual views at the client. This in-network synthesis leads to better viewpoint sampling given a bandwidth constraint compared to simple selection of camera views, but it may however carry a distortion penalty in the cloudlet-synthesized reference views. We therefore cast a new reference view selection problem where the best subset of views is defined as the one minimizing the distortion over a view navigation window defined by the user under some transmission bandwidth constraints. We show that the view selection problem is NP-hard, and propose an effective polynomial time algorithm using dynamic programming to solve the optimization problem. Simulation results finally confirm the performance gain offered by virtual view synthesis in the network.
Emerging applications in multiview streaming look for providing interactive navigation services to video players. The user can ask for information from any viewpoint with a minimum transmission delay. The purpose is to provide user with as much infor mation as possible with least number of redundancies. The recent concept of navigation segment representation consists of regrouping a given number of viewpoints in one signal and transmitting them to the users according to their navigation path. The question of the best description strategy of these navigation segments is however still open. In this paper, we propose to represent and code navigation segments by a method that extends the recent layered depth image (LDI) format. It consists of describing the scene from a viewpoint with multiple images organized in layers corresponding to the different levels of occluded objects. The notion of extended LDI comes from the fact that the size of this image is adapted to take into account the sides of the scene also, in contrary to classical LDI. The obtained results show a significant rate-distortion gain compared to classical multiview compression approaches in navigation scenario.
Multiview video with interactive and smooth view switching at the receiver is a challenging application with several issues in terms of effective use of storage and bandwidth resources, reactivity of the system, quality of the viewing experience and system complexity. The classical decoding system for generating virtual views first projects a reference or encoded frame to a given viewpoint and then fills in the holes due to potential occlusions. This last step still constitutes a complex operation with specific software or hardware at the receiver and requires a certain quantity of information from the neighboring frames for insuring consistency between the virtual images. In this work we propose a new approach that shifts most of the burden due to interactivity from the decoder to the encoder, by anticipating the navigation of the decoder and sending auxiliary information that guarantees temporal and interview consistency. This leads to an additional cost in terms of transmission rate and storage, which we minimize by using optimization techniques based on the user behavior modeling. We show by experiments that the proposed system represents a valid solution for interactive multiview systems with classical decoders.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا