We demonstrate applications of the Gaussian process-based landmarking algorithm proposed in [T. Gao, S.Z. Kovalsky, and I. Daubechies, SIAM Journal on Mathematics of Data Science (2019)] to geometric morphometrics, a branch of evolutionary biology centered on the analysis and comparison of anatomical shapes, and compare the automatically sampled landmarks with ground-truth landmarks manually placed by evolutionary anthropologists. The results suggest that Gaussian process landmarks perform equally well or better, in terms of both spatial coverage and downstream statistical analysis. We also provide a detailed exposition of numerical procedures and feature-filtering algorithms for computing high-quality, semantically meaningful diffeomorphisms between disk-type anatomical surfaces.
As a means of improving the analysis of biological shapes, we propose an algorithm for sampling a Riemannian manifold by sequentially selecting points with maximum uncertainty under a Gaussian process model. This greedy strategy is known to be near-optimal in the experimental design literature, and in our application it appears to outperform user-placed landmarks in representing the geometry of biological objects. In the noiseless regime, we establish an upper bound for the mean squared prediction error (MSPE) in terms of the number of samples and geometric quantities of the manifold, demonstrating that the MSPE for our proposed sequential design decays at a rate comparable to the oracle rate achievable by any sequential or non-sequential optimal design; to our knowledge, this is the first result of this type for sequential experimental design. The key is to link the greedy algorithm to reduced basis methods in the context of model reduction for partial differential equations. We expect this approach will find additional applications in other fields of research.
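The following is a minimal sketch of the greedy maximum-uncertainty selection described above, assuming a squared-exponential kernel on a point cloud standing in for the manifold surface; the kernel choice, length scale, and function names are illustrative assumptions, not the paper's exact construction.

```python
# Greedy GP landmarking sketch: repeatedly pick the point where the GP
# posterior variance (given the landmarks chosen so far) is largest.
import numpy as np

def rbf_kernel(A, B, length_scale=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * length_scale**2))

def greedy_gp_landmarks(X, n_landmarks, jitter=1e-9):
    """Sequentially select points of maximum GP posterior variance."""
    K = rbf_kernel(X, X)
    selected = [int(np.argmax(np.diag(K)))]  # start at max prior variance
    for _ in range(n_landmarks - 1):
        Kss = K[np.ix_(selected, selected)] + jitter * np.eye(len(selected))
        Ksx = K[selected, :]
        # Posterior variance at every candidate given the chosen landmarks.
        var = np.diag(K) - np.einsum('ij,ij->j', Ksx,
                                     np.linalg.solve(Kss, Ksx))
        var[selected] = -np.inf          # never re-select a landmark
        selected.append(int(np.argmax(var)))
    return selected

landmarks = greedy_gp_landmarks(np.random.default_rng(0).random((500, 3)), 20)
```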
Gaussian processes are the ideal tool for modelling the Galactic ISM, combining statistical flexibility with a good match to the underlying physics. In an earlier paper we outlined how they can be employed to construct three-dimensional maps of dust extinction from stellar surveys. Gaussian processes scale poorly to large datasets, however, which puts the analysis of realistic catalogues out of reach. Here we show how a novel combination of the Expectation Propagation method and certain sparse matrix approximations can be used to accelerate the dust-mapping problem. We demonstrate, using simulated Gaia data, that the resultant algorithm is fast, accurate, and precise. Critically, it can be scaled up to map the Gaia catalogue.
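To illustrate the scaling problem generically (this is not the paper's EP-based algorithm, only a stand-in): an inducing-point (Nyström/Subset-of-Regressors) approximation reduces the cost of GP regression from O(n³) to O(nm²) for m inducing points, which is the kind of sparse-matrix shortcut that makes large catalogues tractable. All names and parameter values below are assumptions for the sketch.

```python
# Subset-of-Regressors GP prediction with m inducing inputs: the linear
# system solved is m x m instead of n x n.
import numpy as np

def rbf(A, B, ell=1.0):
    return np.exp(-((A[:, None] - B[None, :]) ** 2) / (2 * ell**2))

def sor_predict(X, y, Z, Xstar, noise=0.1):
    Kzz = rbf(Z, Z)
    Kzx = rbf(Z, X)
    Kzs = rbf(Z, Xstar)
    A = noise**2 * Kzz + Kzx @ Kzx.T        # m x m, not n x n
    return Kzs.T @ np.linalg.solve(A, Kzx @ y)

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, 2000)                 # n = 2000 training points
y = np.sin(X) + 0.1 * rng.standard_normal(X.size)
Z = np.linspace(0, 10, 30)                   # m = 30 inducing points
mean = sor_predict(X, y, Z, np.linspace(0, 10, 200))
```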
Correcting observations of a signal for delays in the measurement process is a common problem in signal processing, with prominent examples in a wide range of fields. An important instance is the nowcasting of COVID-19 mortality: given a stream of reported daily death counts, can we correct for reporting delays to paint an accurate picture of the present, with uncertainty? Without this correction, raw data will often mislead by suggesting an improving situation. We present a flexible approach using a latent Gaussian process that is capable of describing the changing auto-correlation structure present in the reporting time-delay surface. This approach also yields robust estimates of uncertainty for the nowcasted numbers of deaths. We test assumptions in model specification, such as the choice of kernel and hyperpriors, and evaluate model performance on a challenging real dataset from Brazil. Our experiments show that Gaussian process nowcasting performs favourably against both comparable methods and a small sample of expert human predictions. Our approach has substantial practical utility in disease modelling: by applying it to COVID-19 mortality data from Brazil, where reporting delays are large, we can make informative predictions of important epidemiological quantities such as the current effective reproduction number.
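A schematic of the nowcasting idea, under strong simplifying assumptions: estimate reporting completeness from the reporting triangle, inflate recent counts accordingly, and smooth with a GP to obtain a nowcast with uncertainty. This toy stand-in does not model the full time-delay surface with a latent GP as the paper does; the kernel, hyperparameters, and helper names are all illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

def nowcast(triangle, n_complete=30):
    """triangle[t, d]: deaths occurring on day t, reported with delay d
    (NaN where the report has not arrived yet)."""
    T, D = triangle.shape
    # Empirical delay distribution from rows old enough to be fully reported.
    full = triangle[:n_complete]
    pmf = np.nansum(full, axis=0) / np.nansum(full)
    reported = np.nansum(triangle, axis=1)
    # Completeness of day t = probability mass of the delays observed so far.
    seen = np.minimum(T - np.arange(T), D)
    completeness = np.array([pmf[:s].sum() for s in seen])
    corrected = reported / np.clip(completeness, 1e-3, None)
    # A GP smooth of the corrected log-counts gives uncertainty bands.
    t = np.arange(T, dtype=float)[:, None]
    gp = GaussianProcessRegressor(
        kernel=Matern(length_scale=7.0, nu=1.5) + WhiteKernel(0.1),
        normalize_y=True)
    gp.fit(t, np.log1p(corrected))
    mean, sd = gp.predict(t, return_std=True)
    return np.expm1(mean), sd
```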
We first introduce a novel profile-based alignment algorithm, the multiple continuous Signal Alignment algorithm with Gaussian Process Regression profiles (SA-GPR). SA-GPR addresses the limitations of currently available signal alignment methods by adopting a hybrid of particle smoothing and Markov chain Monte Carlo (MCMC) to align signals, and by applying Gaussian process regression to construct profiles that can be aligned continuously. SA-GPR shares all the strengths of existing profile-dependent alignment algorithms but is more exact in the sense that profiles need not be discretized into sequential bins; the sensitivity of performance to the resolution of such bins is thereby eliminated. This methodology produces alignments that are consistent, that regularize extreme cases, and that properly reflect the inherent uncertainty. We then extend SA-GPR to a specific problem in paleoceanography with a method called Bayesian Inference Gaussian Process Multiproxy Alignment of Continuous Signals (BIGMACS). The goal of BIGMACS is to infer continuous ages for ocean sediment cores using two classes of age proxies: proxies that explicitly return calendar ages (e.g., radiocarbon) and those used to synchronize ages across multiple marine records (e.g., the oxygen-isotope-based benthic $\delta^{18}\mathrm{O}$). BIGMACS integrates these two proxies by iterating two steps: profile construction from benthic $\delta^{18}\mathrm{O}$ age models, and alignment of each core to the profile while also reflecting radiocarbon dates. We use BIGMACS to construct a new Deep Northeastern Atlantic stack (i.e., a profile built from a particular set of benthic $\delta^{18}\mathrm{O}$ records) of five ocean sediment cores. We conclude by constructing multiproxy age models for two additional cores from the same region by aligning them to the stack.
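A minimal sketch of the profile-construction step only: pooled (age, value) observations from several records, assumed already on a common age scale, are regressed with a GP to obtain a continuous profile with pointwise uncertainty, with no binning. The alignment machinery (particle smoothing plus MCMC) is omitted, and the kernel and noise level are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def build_profile(records, grid):
    """records: list of (ages, values) arrays on a shared age scale;
    grid: 1-D array of ages at which to evaluate the profile."""
    ages = np.concatenate([a for a, _ in records])[:, None]
    vals = np.concatenate([v for _, v in records])
    gp = GaussianProcessRegressor(
        kernel=RBF(length_scale=5.0) + WhiteKernel(0.05),
        normalize_y=True)
    gp.fit(ages, vals)
    mean, sd = gp.predict(grid[:, None], return_std=True)
    return mean, sd  # continuous profile + uncertainty, no binning required
```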
Modern media data such as 360 videos and light field (LF) images are typically captured in much higher dimensions than the observer's visual display. To efficiently browse high-dimensional media over bandwidth-constrained networks, a navigational streaming model is considered: a client navigates the large media space by dictating a navigation path to a server, which in response transmits the corresponding pre-encoded media data units (MDUs) to the client one by one in sequence. Intra-coding an MDU (I-MDU) results in a large bitrate but allows random access, while inter-coding an MDU (P-MDU) using another MDU as a predictor incurs a small coding cost but imposes an order in which the predictor must first be transmitted and decoded. From a compression perspective, the technical challenge is: how to achieve coding gain via inter-coding of MDUs while enabling adequate random access for satisfactory user navigation? To address this problem, we propose landmarks, a selection of key MDUs from the high-dimensional media. Using landmarks as predictors, nearby MDUs in local neighborhoods are inter-coded, resulting in a predictive MDU structure with controlled coding cost. Any requested MDU can then be decoded by transmitting at most one landmark and one inter-coded MDU, enabling navigational random access. To build a landmarked MDU structure, we employ a tree-structured vector quantizer (TSVQ) to first optimize landmark locations, then iteratively add/remove inter-coded MDUs as refinements using a fast branch-and-bound technique. Taking interactive LF images and viewport-adaptive 360 images as illustrative applications, and using I-, P-, and previously proposed merge frames to intra- and inter-code MDUs, we show experimentally that landmarked MDU structures can noticeably reduce the expected transmission cost compared with MDU structures without landmarks.
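A minimal sketch of the landmark-placement step only, assuming MDUs are represented as feature vectors: a tree-structured vector quantizer splits the set recursively by 2-means, and the MDU nearest each leaf centroid becomes a landmark. The branch-and-bound refinement of inter-coded MDUs is not shown, and all names and parameters are illustrative.

```python
# TSVQ landmark placement: recursive 2-means splits, one landmark per leaf.
import numpy as np

def two_means(X, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    c = X[rng.choice(len(X), 2, replace=False)]
    for _ in range(iters):
        lab = np.argmin(((X[:, None] - c[None]) ** 2).sum(-1), axis=1)
        for k in (0, 1):
            if np.any(lab == k):
                c[k] = X[lab == k].mean(0)
    return lab

def tsvq_landmarks(X, depth):
    """Recursively split X to the given depth; return one index per leaf."""
    leaves, stack = [], [(np.arange(len(X)), depth)]
    while stack:
        ids, d = stack.pop()
        if d == 0 or len(ids) < 2:
            leaves.append(ids)
            continue
        lab = two_means(X[ids])
        stack += [(ids[lab == 0], d - 1), (ids[lab == 1], d - 1)]
    # Landmark = MDU closest to its leaf's centroid.
    return [ids[np.argmin(((X[ids] - X[ids].mean(0)) ** 2).sum(-1))]
            for ids in leaves if len(ids)]

marks = tsvq_landmarks(np.random.default_rng(2).random((256, 8)), depth=3)
```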