No Arabic abstract
Prediction of trajectories such as that of pedestrians is crucial to the performance of autonomous agents. While previous works have leveraged conditional generative models like GANs and VAEs for learning the likely future trajectories, accurately modeling the dependency structure of these multimodal distributions, particularly over long time horizons remains challenging. Normalizing flow based generative models can model complex distributions admitting exact inference. These include variants with split coupling invertible transformations that are easier to parallelize compared to their autoregressive counterparts. To this end, we introduce a novel Haar wavelet based block autoregressive model leveraging split couplings, conditioned on coarse trajectories obtained from Haar wavelet based transformations at different levels of granularity. This yields an exact inference method that models trajectories at different spatio-temporal resolutions in a hierarchical manner. We illustrate the advantages of our approach for generating diverse and accurate trajectories on two real-world datasets - Stanford Drone and Intersection Drone.
In most practical situations, the compression or transmission of images and videos creates distortions that will eventually be perceived by a human observer. Vice versa, image and video restoration techniques, such as inpainting or denoising, aim to enhance the quality of experience of human viewers. Correctly assessing the similarity between an image and an undistorted reference image as subjectively experienced by a human viewer can thus lead to significant improvements in any transmission, compression, or restoration system. This paper introduces the Haar wavelet-based perceptual similarity index (HaarPSI), a novel and computationally inexpensive similarity measure for full reference image quality assessment. The HaarPSI utilizes the coefficients obtained from a Haar wavelet decomposition to assess local similarities between two images, as well as the relative importance of image areas. The consistency of the HaarPSI with the human quality of experience was validated on four large benchmark databases containing thousands of differently distorted images. On these databases, the HaarPSI achieves higher correlations with human opinion scores than state-of-the-art full reference similarity measures like the structural similarity index (SSIM), the feature similarity index (FSIM), and the visual saliency-based index (VSI). Along with the simple computational structure and the short execution time, these experimental results suggest a high applicability of the HaarPSI in real world tasks.
Pedestrian trajectory prediction is a challenging task as there are three properties of human movement behaviors which need to be addressed, namely, the social influence from other pedestrians, the scene constraints, and the multimodal (multiroute) nature of predictions. Although existing methods have explored these key properties, the prediction process of these methods is autoregressive. This means they can only predict future locations sequentially. In this paper, we present NAP, a non-autoregressive method for trajectory prediction. Our method comprises specifically designed feature encoders and a latent variable generator to handle the three properties above. It also has a time-agnostic context generator and a time-specific context generator for non-autoregressive prediction. Through extensive experiments that compare NAP against several recent methods, we show that NAP has state-of-the-art trajectory prediction performance.
Recently, ghost imaging has been attracting attentions because its mechanism would lead to many applications inaccessible to conventional imaging methods. However, it is challenging for high contrast and high resolution imaging, due to its low signal-to-noise ratio (SNR) and the demand of high sampling rate in detection. To circumvent these challenges, we here propose a ghost imaging scheme that exploits Haar wavelets as illuminating patterns with a bi-frequency light projecting system and frequency-selecting single-pixel detectors. This method provides a theoretically 100% image contrast and high detection SNR, which reduces the requirement of high dynamic range of detectors, enabling high resolution ghost imaging. Moreover, it can highly reduce the sampling rate (far below Nyquist limit) for a sparse object by adaptively abandoning unnecessary patterns during the measurement. These characteristics are experimentally verified with a resolution of 512 times 512 and a sampling rate lower than 5%. A high-resolution (1000 times 1000 times 1000) 3D reconstruction of an object is also achieved from multi-angle images.
Graph Neural Networks (GNNs) have recently caught great attention and achieved significant progress in graph-level applications. In this paper, we propose a framework for graph neural networks with multiresolution Haar-like wavelets, or MathNet, with interrelated convolution and pooling strategies. The underlying method takes graphs in different structures as input and assembles consistent graph representations for readout layers, which then accomplishes label prediction. To achieve this, the multiresolution graph representations are first constructed and fed into graph convolutional layers for processing. The hierarchical graph pooling layers are then involved to downsample graph resolution while simultaneously remove redundancy within graph signals. The whole workflow could be formed with a multi-level graph analysis, which not only helps embed the intrinsic topological information of each graph into the GNN, but also supports fast computation of forward and adjoint graph transforms. We show by extensive experiments that the proposed framework obtains notable accuracy gains on graph classification and regression tasks with performance stability. The proposed MathNet outperforms various existing GNN models, especially on big data sets.
We introduce the use of autoregressive normalizing flows for rapid likelihood-free inference of binary black hole system parameters from gravitational-wave data with deep neural networks. A normalizing flow is an invertible mapping on a sample space that can be used to induce a transformation from a simple probability distribution to a more complex one: if the simple distribution can be rapidly sampled and its density evaluated, then so can the complex distribution. Our first application to gravitational waves uses an autoregressive flow, conditioned on detector strain data, to map a multivariate standard normal distribution into the posterior distribution over system parameters. We train the model on artificial strain data consisting of IMRPhenomPv2 waveforms drawn from a five-parameter $(m_1, m_2, phi_0, t_c, d_L)$ prior and stationary Gaussian noise realizations with a fixed power spectral density. This gives performance comparable to current best deep-learning approaches to gravitational-wave parameter estimation. We then build a more powerful latent variable model by incorporating autoregressive flows within the variational autoencoder framework. This model has performance comparable to Markov chain Monte Carlo and, in particular, successfully models the multimodal $phi_0$ posterior. Finally, we train the autoregressive latent variable model on an expanded parameter space, including also aligned spins $(chi_{1z}, chi_{2z})$ and binary inclination $theta_{JN}$, and show that all parameters and degeneracies are well-recovered. In all cases, sampling is extremely fast, requiring less than two seconds to draw $10^4$ posterior samples.