Many large-scale machine learning problems involve estimating an unknown parameter $\theta_{i}$ for each of many items. For example, a key problem in sponsored search is to estimate the click-through rate (CTR) of each of billions of query-ad pairs. Most common methods, though, only give a point estimate of each $\theta_{i}$. A posterior distribution for each $\theta_{i}$ is usually more useful but harder to get. We present a simple post-processing technique that takes point estimates or scores $t_{i}$ (from any method) and estimates an approximate posterior for each $\theta_{i}$. We build on the idea of calibration, a common post-processing technique that estimates $\mathrm{E}\left(\theta_{i}\!\!\bigm|\!\! t_{i}\right)$. Our method, second order calibration, uses empirical Bayes methods to estimate the distribution of $\theta_{i}\!\!\bigm|\!\! t_{i}$ and uses the estimated distribution as an approximation to the posterior distribution of $\theta_{i}$. We show that this can yield improved point estimates and useful accuracy estimates. The method scales to large problems: our motivating example is a CTR estimation problem involving tens of billions of query-ad pairs.
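As a rough illustration of the idea, the sketch below bins items by their score $t_{i}$, fits a Beta distribution to the latent rates within each bin by the method of moments, and treats that fitted distribution as the approximate posterior for every $\theta_{i}$ in the bin. The quantile binning, the Beta-binomial model, the moment estimator, and the names `fit_beta_by_moments` and `second_order_calibration` are all assumptions made for illustration; the abstract does not specify how the distribution of $\theta_{i}\!\!\bigm|\!\! t_{i}$ is estimated.

```python
import numpy as np


def fit_beta_by_moments(clicks, impressions):
    """Fit Beta(a, b) to the latent rates behind binomial counts.

    Illustrative assumption: clicks[i] ~ Binomial(impressions[i], theta_i)
    with theta_i ~ Beta(a, b), and at least some impressions[i] >= 2.
    """
    clicks = np.asarray(clicks, dtype=float)
    impressions = np.asarray(impressions, dtype=float)
    rates = clicks / impressions
    mean = float(np.clip(rates.mean(), 1e-9, 1.0 - 1e-9))

    # Observed spread of the raw rates = spread of the true rates
    # plus binomial sampling noise; subtract the noise part.
    inv_n = np.mean(1.0 / impressions)
    observed_var = np.mean((rates - mean) ** 2)
    max_var = mean * (1.0 - mean)
    latent_var = (observed_var - max_var * inv_n) / (1.0 - inv_n)
    latent_var = float(np.clip(latent_var, 1e-12, 0.999 * max_var))

    strength = max_var / latent_var - 1.0  # equals a + b
    return mean * strength, (1.0 - mean) * strength


def second_order_calibration(scores, clicks, impressions, n_bins=10):
    """Return each item's score bin and a Beta(a, b) estimate per bin."""
    scores = np.asarray(scores, dtype=float)
    clicks = np.asarray(clicks, dtype=float)
    impressions = np.asarray(impressions, dtype=float)

    edges = np.quantile(scores, np.linspace(0.0, 1.0, n_bins + 1))
    # Use interior edges only, so bin ids run from 0 to n_bins - 1.
    bin_ids = np.searchsorted(edges[1:-1], scores, side="right")

    params = {}
    for b in np.unique(bin_ids):
        mask = bin_ids == b
        params[int(b)] = fit_beta_by_moments(clicks[mask], impressions[mask])
    return bin_ids, params
```

In this sketch the mean of each fitted Beta plays the role of the ordinary calibrated estimate, while its spread gives the accuracy estimate that a point estimate alone lacks.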
The algorithms used for optimal management of ambulances require an accurate description and prediction of the spatio-temporal evolution of emergency interventions. In recent years, several authors have proposed sophisticated statistical approaches to forecast ambulance dispatches, typically modelling the events as a point pattern occurring on a planar region. Nevertheless, ambulance interventions can be more appropriately modelled as a realisation of a point process occurring along a network of lines, such as a road network. The constrained spatial domain raises specific challenges and unique methodological problems that cannot be ignored when developing a proper statistical model. Hence, this paper proposes a spatio-temporal model to analyse the ambulance interventions that occurred in the road network of Milan (Italy) from 2015 to 2017. We adopt a non-separable first-order intensity function with spatial and temporal terms. The temporal component is estimated semi-parametrically using a Poisson regression model, while the spatial dimension is estimated nonparametrically using a network kernel function. A set of weights is included in the spatial term to capture space-time interactions, inducing non-separability in the intensity function. A series of maps and graphical tests show that our approach successfully models the ambulance interventions and captures the space-time patterns.
Arctic sea ice plays an important role in the global climate. Sea ice models governed by physical equations have been used to simulate the state of the ice, including characteristics such as ice thickness, concentration, and motion. More recent models also attempt to capture features such as fractures or leads in the ice. These simulated features can be partially misaligned or misshapen when compared to observational data, whether due to numerical approximation or incomplete physics. In order to make realistic forecasts and improve understanding of the underlying processes, it is necessary to calibrate the numerical model to field data. Traditional calibration methods based on generalized least-squares metrics are flawed for linear features such as sea ice cracks. We develop a statistical emulation and calibration framework that accounts for feature misalignment and misshapenness, which involves optimally aligning model output with observed features using cutting-edge image registration techniques. This work is also applicable to other physical models that produce coherent structures.
Positron Emission Tomography (PET) is an imaging technique which can be used to investigate chemical changes in human biological processes such as cancer development or neurochemical reactions. Most dynamic PET scans are currently analyzed based on the assumption that linear first-order kinetics can be used to adequately describe the system under observation. However, there has recently been strong evidence that this is not the case. In order to provide an analysis of PET data which is free from this compartmental assumption, we propose a nonparametric deconvolution and analysis model for dynamic PET data based on functional principal component analysis. This yields flexibility in the possible deconvolved functions while still performing well when a linear compartmental model setup is the true data generating mechanism. As the deconvolution needs to be performed on only a relatively small number of basis functions rather than voxel by voxel in the entire 3-D volume, the methodology is both robust to typical brain imaging noise levels and computationally efficient. The new methodology is investigated through simulations in both 1-D functions and 2-D images and also applied to a neuroimaging study whose goal is the quantification of opioid receptor concentration in the brain.
In the following, bypassing dynamical systems tools, we propose a simple means of computing the box dimension of the graph of the classical Weierstrass function defined, for any real number~$x$, by~${\cal W}(x)=\displaystyle \sum_{n=0}^{+\infty} \lambda^n\,\cos \left( 2\,\pi\,N_b^n\,x \right)$, where~$\lambda$ and~$N_b$ are two real numbers such that~\mbox{$0 <\lambda<1$},~\mbox{$N_b\,\in\,\mathbb{N}$} and~$\lambda\,N_b > 1$, using a sequence of graphs that approximate the studied one.
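For a numerical feel, the sketch below evaluates a truncated partial sum of the series and the value $2 + \ln\lambda / \ln N_b$, which is the classical box dimension of this graph when $0 < \lambda < 1$ and $\lambda\,N_b > 1$. The truncation level `terms` and the function names are illustrative assumptions, not part of the paper, and the code does not reproduce the paper's graph-approximation argument.

```python
import math


def weierstrass_partial_sum(x, lam, n_b, terms=50):
    """Truncated series: sum over n < terms of lam**n * cos(2*pi*n_b**n*x)."""
    return sum(
        lam**n * math.cos(2.0 * math.pi * n_b**n * x) for n in range(terms)
    )


def box_dimension(lam, n_b):
    """Classical value 2 + ln(lam) / ln(n_b) for the Weierstrass graph."""
    if not (0.0 < lam < 1.0 and lam * n_b > 1.0):
        raise ValueError("requires 0 < lam < 1 and lam * n_b > 1")
    return 2.0 + math.log(lam) / math.log(n_b)
```

For example, `box_dimension(0.5, 4)` gives 1.5, since $\ln(1/2)/\ln 4 = -1/2$.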
This work is motivated by the Obepine French system for SARS-CoV-2 viral load monitoring in wastewater. The objective of this work is to identify, from time series of noisy measurements, the underlying auto-regressive signals, in a context where the measurements contain many missing values, censoring and outliers. We propose a method based on an auto-regressive model adapted to censored data with outliers. Inference and prediction are produced via a discretised smoother. This method is validated both on simulations and on real data from Obepine. The proposed method is used to denoise measurements from the quantification of the SARS-CoV-2 E gene in wastewater by RT-qPCR. The resulting smoothed signal shows a good correlation with other epidemiological indicators, and an estimate of the whole system noise is produced.
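To make the data setting concrete, the sketch below simulates a latent first-order auto-regressive signal and then degrades it with measurement noise, occasional outliers, left-censoring at a detection limit, and missing values. It only generates data of the kind described and does not implement the discretised smoother. The AR(1) order, every parameter value, and the name `simulate_censored_ar1` are assumptions made for illustration.

```python
import numpy as np


def simulate_censored_ar1(n=200, phi=0.9, sigma_state=0.3, sigma_obs=0.5,
                          detection_limit=-0.5, p_missing=0.2,
                          p_outlier=0.05, seed=0):
    """Return (signal, observed, censored) for a noisy, censored AR(1) series."""
    rng = np.random.default_rng(seed)

    # Latent AR(1) signal, started from its stationary distribution.
    signal = np.empty(n)
    signal[0] = rng.normal(0.0, sigma_state / np.sqrt(1.0 - phi**2))
    for t in range(1, n):
        signal[t] = phi * signal[t - 1] + rng.normal(0.0, sigma_state)

    # Noisy measurements, a fraction of which receive a large extra error.
    observed = signal + rng.normal(0.0, sigma_obs, size=n)
    is_outlier = rng.random(n) < p_outlier
    observed[is_outlier] += rng.normal(0.0, 5.0 * sigma_obs,
                                       size=int(is_outlier.sum()))

    # Left-censoring: values below the limit are reported at the limit.
    censored = observed < detection_limit
    observed[censored] = detection_limit

    # Missing measurements are stored as NaN and are not flagged as censored.
    missing = rng.random(n) < p_missing
    observed[missing] = np.nan
    censored &= ~missing
    return signal, observed, censored
```

Keeping the true `signal` alongside the degraded `observed` series makes it possible to score any denoising method against a known ground truth, which is the role simulations play in the validation described above.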