No Arabic abstract
Partial observations of continuous time-series dynamics at arbitrary time stamps exist in many disciplines. Fitting this type of data using statistical models with continuous dynamics is not only promising at an intuitive level but also has practical benefits, including the ability to generate continuous trajectories and to perform inference on previously unseen time stamps. Despite exciting progress in this area, the existing models still face challenges in terms of their representational power and the quality of their variational approximations. We tackle these challenges with continuous latent process flows (CLPF), a principled architecture decoding continuous latent processes into continuous observable processes using a time-dependent normalizing flow driven by a stochastic differential equation. To optimize our model using maximum likelihood, we propose a novel piecewise construction of a variational posterior process and derive the corresponding variational lower bound using trajectory re-weighting. Our ablation studies demonstrate the effectiveness of our contributions in various inference tasks on irregular time grids. Comparisons to state-of-the-art baselines show our models favourable performance on both synthetic and real-world time-series data.
Event sequences can be modeled by temporal point processes (TPPs) to capture their asynchronous and probabilistic nature. We propose an intensity-free framework that directly models the point process distribution by utilizing normalizing flows. This approach is capable of capturing highly complex temporal distributions and does not rely on restrictive parametric forms. Comparisons with state-of-the-art baseline models on both synthetic and challenging real-life datasets show that the proposed framework is effective at modeling the stochasticity of discrete event sequences.
Normalizing flows transform a simple base distribution into a complex target distribution and have proved to be powerful models for data generation and density estimation. In this work, we propose a novel type of normalizing flow driven by a differential deformation of the Wiener process. As a result, we obtain a rich time series model whose observable process inherits many of the appealing properties of its base process, such as efficient computation of likelihoods and marginals. Furthermore, our continuous treatment provides a natural framework for irregular time series with an independent arrival process, including straightforward interpolation. We illustrate the desirable properties of the proposed model on popular stochastic processes and demonstrate its superior flexibility to variational RNN and latent ODE baselines in a series of experiments on synthetic and real-world data.
A normalizing flow is an invertible mapping between an arbitrary probability distribution and a standard normal distribution; it can be used for density estimation and statistical inference. Computing the flow follows the change of variables formula and thus requires invertibility of the mapping and an efficient way to compute the determinant of its Jacobian. To satisfy these requirements, normalizing flows typically consist of carefully chosen components. Continuous normalizing flows (CNFs) are mappings obtained by solving a neural ordinary differential equation (ODE). The neural ODEs dynamics can be chosen almost arbitrarily while ensuring invertibility. Moreover, the log-determinant of the flows Jacobian can be obtained by integrating the trace of the dynamics Jacobian along the flow. Our proposed OT-Flow approach tackles two critical computational challenges that limit a more widespread use of CNFs. First, OT-Flow leverages optimal transport (OT) theory to regularize the CNF and enforce straight trajectories that are easier to integrate. Second, OT-Flow features exact trace computation with time complexity equal to trace estimators used in existing CNFs. On five high-dimensional density estimation and generative modeling tasks, OT-Flow performs competitively to state-of-the-art CNFs while on average requiring one-fourth of the number of weights with an 8x speedup in training time and 24x speedup in inference.
We compare the discretize-optimize (Disc-Opt) and optimize-discretize (Opt-Disc) approaches for time-series regression and continuous normalizing flows (CNFs) using neural ODEs. Neural ODEs are ordinary differential equations (ODEs) with neural network components. Training a neural ODE is an optimal control problem where the weights are the controls and the hidden features are the states. Every training iteration involves solving an ODE forward and another backward in time, which can require large amounts of computation, time, and memory. Comparing the Opt-Disc and Disc-Opt approaches in image classification tasks, Gholami et al. (2019) suggest that Disc-Opt is preferable due to the guaranteed accuracy of gradients. In this paper, we extend the comparison to neural ODEs for time-series regression and CNFs. Unlike in classification, meaningful models in these tasks must also satisfy additional requirements beyond accurate final-time output, e.g., the invertibility of the CNF. Through our numerical experiments, we demonstrate that with careful numerical treatment, Disc-Opt methods can achieve similar performance as Opt-Disc at inference with drastically reduced training costs. Disc-Opt reduced costs in six out of seven separate problems with training time reduction ranging from 39% to 97%, and in one case, Disc-Opt reduced training from nine days to less than one day.
Deep learning methods have shown state of the art performance in a range of tasks from computer vision to natural language processing. However, it is well known that such systems are vulnerable to attackers who craft inputs in order to cause misclassification. The level of perturbation an attacker needs to introduce in order to cause such a misclassification can be extremely small, and often imperceptible. This is of significant security concern, particularly where misclassification can cause harm to humans. We thus propose Deep Latent Defence, an architecture which seeks to combine adversarial training with a detection system. At its core Deep Latent Defence has a adversarially trained neural network. A series of encoders take the intermediate layer representation of data as it passes though the network and project it to a latent space which we use for detecting adversarial samples via a $k$-nn classifier. We present results using both grey and white box attackers, as well as an adaptive $L_{infty}$ bounded attack which was constructed specifically to try and evade our defence. We find that even under the strongest attacker model that we have investigated our defence is able to offer significant defensive benefits.