No Arabic abstract
Parameter retrieval and model inversion are key problems in remote sensing and Earth observation. Currently, different approximations exist: a direct, yet costly, inversion of radiative transfer models (RTMs); the statistical inversion with in situ data that often results in problems with extrapolation outside the study area; and the most widely adopted hybrid modeling by which statistical models, mostly nonlinear and non-parametric machine learning algorithms, are applied to invert RTM simulations. We will focus on the latter. Among the different existing algorithms, in the last decade kernel based methods, and Gaussian Processes (GPs) in particular, have provided useful and informative solutions to such RTM inversion problems. This is in large part due to the confidence intervals they provide, and their predictive accuracy. However, RTMs are very complex, highly nonlinear, and typically hierarchical models, so that often a shallow GP model cannot capture complex feature relations for inversion. This motivates the use of deeper hierarchical architectures, while still preserving the desirable properties of GPs. This paper introduces the use of deep Gaussian Processes (DGPs) for bio-geo-physical model inversion. Unlike shallow GP models, DGPs account for complicated (modular, hierarchical) processes, provide an efficient solution that scales well to big datasets, and improve prediction accuracy over their single layer counterpart. In the experimental section, we provide empirical evidence of performance for the estimation of surface temperature and dew point temperature from infrared sounding data, as well as for the prediction of chlorophyll content, inorganic suspended matter, and coloured dissolved matter from multispectral data acquired by the Sentinel-3 OLCI sensor. The presented methodology allows for more expressive forms of GPs in remote sensing model inversion problems.
Solving inverse problems is central to geosciences and remote sensing. Radiative transfer models (RTMs) represent mathematically the physical laws which govern the phenomena in remote sensing applications (forward models). The numerical inversion of the RTM equations is a challenging and computationally demanding problem, and for this reason, often the application of a nonlinear statistical regression is preferred. In general, regression models predict the biophysical parameter of interest from the corresponding received radiance. However, this approach does not employ the physical information encoded in the RTMs. An alternative strategy, which attempts to include the physical knowledge, consists in learning a regression model trained using data simulated by an RTM code. In this work, we introduce a nonlinear nonparametric regression model which combines the benefits of the two aforementioned approaches. The inversion is performed taking into account jointly both real observations and RTM-simulated data. The proposed Joint Gaussian Process (JGP) provides a solid framework for exploiting the regularities between the two types of data. The JGP automatically detects the relative quality of the simulated and real data, and combines them accordingly. This occurs by learning an additional hyper-parameter w.r.t. a standard GP model, and fitting parameters through maximizing the pseudo-likelihood of the real observations. The resulting scheme is both simple and robust, i.e., capable of adapting to different scenarios. The advantages of the JGP method compared to benchmark strategies are shown considering RTM-simulated and real observations in different experiments. Specifically, we consider leaf area index (LAI) retrieval from Landsat data combined with simulated data generated by the PROSAIL model.
We present a personalized and reliable prediction model for healthcare, which can provide individually tailored medical services such as diagnosis, disease treatment, and prevention. Our proposed framework targets at making personalized and reliable predictions from time-series data, such as Electronic Health Records (EHR), by modeling two complementary components: i) a shared component that captures global trend across diverse patients and ii) a patient-specific component that models idiosyncratic variability for each patient. To this end, we propose a composite model of a deep neural network to learn complex global trends from the large number of patients, and Gaussian Processes (GP) to probabilistically model individual time-series given relatively small number of visits per patient. We evaluate our model on diverse and heterogeneous tasks from EHR datasets and show practical advantages over standard time-series deep models such as pure Recurrent Neural Network (RNN).
Deep Gaussian processes (DGPs) have struggled for relevance in applications due to the challenges and cost associated with Bayesian inference. In this paper we propose a sparse variational approximation for DGPs for which the approximate posterior mean has the same mathematical structure as a Deep Neural Network (DNN). We make the forward pass through a DGP equivalent to a ReLU DNN by finding an interdomain transformation that represents the GP posterior mean as a sum of ReLU basis functions. This unification enables the initialisation and training of the DGP as a neural network, leveraging the well established practice in the deep learning community, and so greatly aiding the inference task. The experiments demonstrate improved accuracy and faster training compared to current DGP methods, while retaining favourable predictive uncertainties.
Gaussian processes (GPs) are nonparametric priors over functions. Fitting a GP implies computing a posterior distribution of functions consistent with the observed data. Similarly, deep Gaussian processes (DGPs) should allow us to compute a posterior distribution of compositions of multiple functions giving rise to the observations. However, exact Bayesian inference is intractable for DGPs, motivating the use of various approximations. We show that the application of simplifying mean-field assumptions across the hierarchy leads to the layers of a DGP collapsing to near-deterministic transformations. We argue that such an inference scheme is suboptimal, not taking advantage of the potential of the model to discover the compositional structure in the data. To address this issue, we examine alternative variational inference schemes allowing for dependencies across different layers and discuss their advantages and limitations.
It has long been known that a single-layer fully-connected neural network with an i.i.d. prior over its parameters is equivalent to a Gaussian process (GP), in the limit of infinite network width. This correspondence enables exact Bayesian inference for infinite width neural networks on regression tasks by means of evaluating the corresponding GP. Recently, kernel functions which mimic multi-layer random neural networks have been developed, but only outside of a Bayesian framework. As such, previous work has not identified that these kernels can be used as covariance functions for GPs and allow fully Bayesian prediction with a deep neural network. In this work, we derive the exact equivalence between infinitely wide deep networks and GPs. We further develop a computationally efficient pipeline to compute the covariance function for these GPs. We then use the resulting GPs to perform Bayesian inference for wide deep neural networks on MNIST and CIFAR-10. We observe that trained neural network accuracy approaches that of the corresponding GP with increasing layer width, and that the GP uncertainty is strongly correlated with trained network prediction error. We further find that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks. Finally we connect the performance of these GPs to the recent theory of signal propagation in random neural networks.