Connected vehicles, whether equipped with advanced driver-assistance systems or fully autonomous, are currently constrained to visual information in their lines of sight. A cooperative perception system among vehicles increases their situational awareness by extending their perception ranges. Existing solutions impose significant network and computation load, and vehicles receive a high volume of data that is not always relevant. To address these issues, and to account for the inherently diverse informativeness of the data, we present Augmented Informative Cooperative Perception (AICP), the first fast-filtering system that optimizes the informativeness of shared data at vehicles. AICP displays the filtered data to drivers on an augmented reality head-up display. To this end, we formulate an informativeness maximization problem in which each vehicle selects a subset of data to display to its driver. Specifically, we propose (i) a dedicated system design with a custom data structure and a lightweight routing protocol for convenient data encapsulation, fast interpretation, and transmission, and (ii) a comprehensive problem formulation and an efficient fitness-based sorting algorithm to select the most valuable data to display at the application layer. We implement a proof-of-concept prototype of AICP with a bandwidth-hungry, latency-constrained real-life augmented reality application. The prototype realizes informativeness-optimized cooperative perception with only 12.6 milliseconds of additional latency. Next, we test the networking performance of AICP at scale and show that AICP effectively filters out less relevant packets and decreases the channel busy time.
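The fitness-based selection step mentioned above can be illustrated with a minimal sketch. The abstract does not define the fitness function or the data structure, so `PerceivedObject`, `example_fitness`, and `select_for_display` are hypothetical names, and the scoring rule (closer objects outside the line of sight score higher) is only an assumption made for illustration, not AICP's actual formulation.

```python
from dataclasses import dataclass


@dataclass
class PerceivedObject:
    # Hypothetical record for one shared perception item.
    object_id: str
    distance_m: float
    in_line_of_sight: bool


def example_fitness(obj):
    # Assumed scoring rule: objects the driver can already see add nothing;
    # hidden objects are more informative the closer they are.
    if obj.in_line_of_sight:
        return 0.0
    return 1.0 / (1.0 + obj.distance_m)


def select_for_display(objects, fitness, max_items):
    # Score every object, drop those with no value, sort by descending
    # fitness, and keep at most max_items for the display.
    scored = [(fitness(o), o) for o in objects]
    scored = [(s, o) for s, o in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [o for _, o in scored[:max_items]]


objects = [
    PerceivedObject("car-1", 50.0, False),
    PerceivedObject("ped-2", 10.0, False),
    PerceivedObject("car-3", 5.0, True),
]
shown = select_for_display(objects, example_fitness, max_items=2)
```

Passing the fitness function as a parameter keeps the sorting step independent of how informativeness is actually measured.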
ARIANNA stands for pAth Recognition for Indoor Assisted Navigation with Augmented perception. It is a flexible and low-cost navigation system for visually impaired people. ARIANNA lets users follow colored paths painted or taped on the floor by conveying their directions through vibrational feedback on commercial smartphones.
Multi-modal representation learning by pretraining has attracted increasing interest due to its ease of use and potential benefit for various Visual-and-Language (V-L) tasks. However, its requirement for a large volume of high-quality vision-language pairs greatly limits its value in practice. In this paper, we propose a novel label-augmented V-L pretraining model, named LAMP, to address this problem. Specifically, we leverage auto-generated labels of visual objects to enrich vision-language pairs with fine-grained alignment, and we design a corresponding novel pretraining task. We also find that such label augmentation in second-stage pretraining further benefits various downstream tasks. To evaluate LAMP, we compare it with several state-of-the-art models on four downstream tasks. The quantitative results and analysis demonstrate the value of labels in V-L pretraining and the effectiveness of LAMP.
We study the potential for interaction in natural language classification. We add a limited form of interaction for intent classification, where users provide an initial query using natural language, and the system asks for additional information using binary or multi-choice questions. At each turn, our system decides between asking the most informative question and making the final classification prediction. The simplicity of the model allows for bootstrapping of the system without interaction data, instead relying on simple crowdsourcing tasks. We evaluate our approach on two domains, showing the benefit of interaction and the advantage of learning to balance between asking additional questions and making the final prediction.
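The per-turn decision described above can be sketched as follows for binary questions. This is only an illustration under stated assumptions: the abstract does not say how informativeness is measured or how the ask-or-predict choice is made, so expected information gain and a fixed confidence `threshold` are stand-ins (the abstract indicates the balance is learned). All names, including `next_action` and the question table, are hypothetical.

```python
import math


def entropy(belief):
    # Shannon entropy (bits) of a distribution given as {intent: probability}.
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)


def posterior(belief, likelihood, answer):
    # Bayes update; likelihood[intent] is the assumed P(answer is "yes" | intent).
    unnorm = {
        y: p * (likelihood[y] if answer else 1.0 - likelihood[y])
        for y, p in belief.items()
    }
    z = sum(unnorm.values())
    return {y: v / z for y, v in unnorm.items()} if z > 0 else dict(belief)


def info_gain(belief, likelihood):
    # Expected reduction in entropy from asking one binary question.
    p_yes = sum(belief[y] * likelihood[y] for y in belief)
    expected = 0.0
    for answer, p_answer in ((True, p_yes), (False, 1.0 - p_yes)):
        if p_answer > 0:
            expected += p_answer * entropy(posterior(belief, likelihood, answer))
    return entropy(belief) - expected


def next_action(belief, questions, threshold=0.9):
    # Predict once the top intent is confident enough (assumed rule),
    # otherwise ask the question with the highest expected information gain.
    best_intent = max(belief, key=belief.get)
    if belief[best_intent] >= threshold or not questions:
        return ("predict", best_intent)
    best_q = max(questions, key=lambda q: info_gain(belief, questions[q]))
    return ("ask", best_q)


belief = {"billing": 0.5, "tech_support": 0.5}
questions = {
    "is_about_payment": {"billing": 1.0, "tech_support": 0.0},
    "is_urgent": {"billing": 0.5, "tech_support": 0.5},
}
action = next_action(belief, questions)
updated = posterior(belief, questions["is_about_payment"], answer=True)
```

In this toy setup the payment question fully separates the two intents, so it is asked first, and a "yes" answer makes the belief confident enough to predict.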
Volumetric media, popularly known as holograms, need to be delivered to users using both on-demand and live streaming, for new augmented reality (AR) and virtual reality (VR) experiences. As in video streaming, hologram streaming must support network adaptivity and fast startup, but it must also handle large bandwidths, multiple simultaneously streaming objects, and frequent user interaction, which requires low delay. In this paper, we introduce what is, to our knowledge, the first system designed specifically for streaming volumetric media. The system reduces bandwidth by introducing 3D tiles, and by culling them or reducing their level of detail depending on their relation to the user's view frustum and their distance to the user. Our system reduces latency by introducing a window-based buffer, which, in contrast to a queue-based buffer, allows insertions near the head of the buffer rather than only at the tail, so that it can respond quickly to user interaction. To allocate bits between different tiles across multiple objects, we introduce a simple greedy yet provably optimal algorithm for rate-utility optimization. We introduce utility measures based not only on the underlying quality of the representation, but also on the level of detail relative to the user's viewpoint and device resolution. Simulation results show that the proposed algorithm provides superior quality compared to existing video-streaming approaches adapted to hologram streaming, in terms of utility and user experience over variable, throughput-constrained networks.
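A greedy bit allocation of the kind described above can be sketched as follows. The abstract does not give the algorithm's details, so this is a generic marginal-utility-per-bit greedy, not necessarily the authors' method: it assumes each tile offers levels of detail with increasing bit cost and diminishing utility gains, and it makes no optimality claim. The function name `allocate_bits` and the data layout are hypothetical.

```python
import heapq


def allocate_bits(tiles, budget):
    # tiles: {tile_id: [(bits, utility), ...]} sorted by increasing bits;
    # level 0 is the cheapest option, e.g. (0, 0.0) for a culled tile.
    # Returns ({tile_id: chosen level index}, total bits spent).
    choice = {tid: 0 for tid in tiles}
    spent = sum(levels[0][0] for levels in tiles.values())
    heap = []

    def push_upgrade(tid):
        # Queue the next level of this tile, keyed by utility gain per bit.
        lvl, levels = choice[tid], tiles[tid]
        if lvl + 1 < len(levels):
            d_bits = levels[lvl + 1][0] - levels[lvl][0]
            d_util = levels[lvl + 1][1] - levels[lvl][1]
            if d_bits > 0 and d_util > 0:
                heapq.heappush(heap, (-d_util / d_bits, tid, d_bits))

    for tid in tiles:
        push_upgrade(tid)

    while heap:
        _, tid, d_bits = heapq.heappop(heap)
        if spent + d_bits > budget:
            continue  # upgrade does not fit; cheaper ones may still fit
        spent += d_bits
        choice[tid] += 1
        push_upgrade(tid)
    return choice, spent


tiles = {
    "a": [(0, 0.0), (100, 10.0), (200, 14.0)],
    "b": [(0, 0.0), (100, 6.0), (200, 8.0)],
}
levels, used = allocate_bits(tiles, budget=300)
```

Here tile "a" is upgraded first because its first level yields the most utility per bit, and the remaining budget is split by the same rule.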
Most P2P VoD schemes focus on optimizing service architectures and overlays without considering segment rarity or the performance of prefetching strategies. As a result, they cannot adequately support VCR-oriented service in heterogeneous environments where clients freely use VCR controls. Despite the remarkable popularity of VoD systems, there exists no prior work that studies the performance gap between different prefetching strategies. In this paper, we analyze the performance of different prefetching strategies. Our analytical characterization gives us not only a better understanding of several fundamental tradeoffs in prefetching strategies, but also important insights into the design of P2P VoD systems. On the basis of this analysis, we propose a cooperative prefetching strategy called cooching. In this strategy, the segments requested in VCR interactivities are prefetched into the session beforehand using information collected through gossip. We evaluate our strategy through extensive simulations. The results indicate that the proposed strategy outperforms existing prefetching mechanisms.