In this paper we revisit the classical problem of nonparametric regression, but impose local differential privacy constraints. Under such constraints, the raw data $(X_1,Y_1),\ldots,(X_n,Y_n)$, taking values in $\mathbb{R}^d \times \mathbb{R}$, cannot be directly observed, and all estimators are functions of the randomised output from a suitable privacy mechanism. The statistician is free to choose the form of the privacy mechanism, and here we add Laplace distributed noise to a discretisation of the location of a feature vector $X_i$ and to the value of its response variable $Y_i$. Based on this randomised data, we design a novel estimator of the regression function, which can be viewed as a privatised version of the well-studied partitioning regression estimator. The main result is that the estimator is strongly universally consistent. Our methods and analysis also give rise to a strongly universally consistent binary classification rule for locally differentially private data.
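To make the mechanism described above concrete, here is a minimal sketch of a Laplace-privatised partitioning estimate in one dimension. It is an illustration under stated assumptions, not the paper's estimator: the feature is assumed to lie in $[0,1)$, the response is clipped to $[-\mathrm{clip}, \mathrm{clip}]$, the privacy budget `eps` is split evenly between the two noisy vectors, and the names `privatise`, `estimate`, `n_cells`, `clip` and `threshold` are hypothetical. The noise scales follow from the standard Laplace mechanism: a one-hot cell indicator has $L_1$ sensitivity 2 and the clipped response vector has $L_1$ sensitivity $2\,\mathrm{clip}$.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatise(x, y, n_cells, eps, clip, rng):
    """One user's randomised report (assumes x in [0, 1); y is clipped)."""
    cell = max(0, min(int(x * n_cells), n_cells - 1))
    y = max(-clip, min(clip, y))
    # Noisy one-hot cell indicator: sensitivity 2, budget eps / 2.
    ind = [(1.0 if j == cell else 0.0) + laplace(4.0 / eps, rng)
           for j in range(n_cells)]
    # Noisy response placed in the user's cell: sensitivity 2 * clip, budget eps / 2.
    resp = [(y if j == cell else 0.0) + laplace(4.0 * clip / eps, rng)
            for j in range(n_cells)]
    return ind, resp


def estimate(reports, n_cells, threshold):
    """Per-cell ratio of averaged noisy responses to averaged noisy counts."""
    n = len(reports)
    counts = [sum(r[0][j] for r in reports) / n for j in range(n_cells)]
    sums = [sum(r[1][j] for r in reports) / n for j in range(n_cells)]
    # Cells whose noisy mass is too small are set to 0 to avoid unstable ratios.
    return [s / c if c > threshold else 0.0 for s, c in zip(sums, counts)]
```

The statistician only ever sees the output of `privatise`. How the number of cells, the clipping level and the threshold should scale with $n$ to obtain consistency is the subject of the paper's analysis and is not reproduced here.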
In this work we investigate a variation of the online kernelized ridge regression algorithm in the setting of $d$-dimensional adversarial nonparametric regression. We derive regret upper bounds on the classes of Sobolev spaces $W_{p}^{\beta}(\mathcal{X})$, $p\geq 2$, $\beta>\frac{d}{p}$. The upper bounds are supported by a minimax regret analysis, which reveals that in the cases $\beta> \frac{d}{2}$ or $p=\infty$ these rates are (essentially) optimal. Finally, we compare the performance of the kernelized ridge regression forecaster to known nonparametric forecasters in terms of regret rates and computational complexity, as well as to the excess risk rates in the setting of statistical (i.i.d.) nonparametric regression.
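For orientation, the textbook online kernel ridge regression forecaster can be sketched as follows: at each round it fits ridge regression in the kernel's function space to all past pairs and predicts at the new input before the response is revealed. This is the standard forecaster, not the variation studied in the paper. Scalar inputs, the Gaussian kernel, and the names `gauss_kernel`, `krr_forecast`, `run_online` and `lam` are assumptions made for illustration.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def gauss_kernel(u, v, bandwidth=1.0):
    # Assumed kernel; the abstract does not specify one.
    return math.exp(-((u - v) ** 2) / (2.0 * bandwidth ** 2))


def krr_forecast(xs, ys, x_new, lam, kernel=gauss_kernel):
    """Predict at x_new from past data by solving (K + lam * I) alpha = y."""
    if not xs:
        return 0.0  # no data yet: predict 0
    n = len(xs)
    gram = [[kernel(xs[i], xs[j]) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    alpha = solve(gram, ys)
    return sum(a * kernel(x, x_new) for a, x in zip(alpha, xs))


def run_online(stream, lam):
    """Predict each response before it is revealed, then store the pair."""
    xs, ys, preds = [], [], []
    for x, y in stream:
        preds.append(krr_forecast(xs, ys, x, lam))
        xs.append(x)
        ys.append(y)
    return preds
```

Refitting from scratch costs cubic time in the number of past rounds, which is the kind of computational cost the abstract's comparison with other forecasters refers to. The regret of the forecaster is the cumulative squared error of `preds` minus that of the best fixed function in the comparison class.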
Let $\{(X_i,Y_i)\}$ be a stationary ergodic time series with values $(X,Y)$ in the product space $R^d \bigotimes R$. This study offers what is believed to be the first strongly consistent (with respect to pointwise, least-squares, and uniform distance) algorithm for inferring $m(x)=E[Y_0|X_0=x]$ under the presumption that $m(x)$ is uniformly Lipschitz continuous. Auto-regression, or forecasting, is an important special case, and as such our work extends the literature of nonparametric, nonlinear forecasting by circumventing customary mixing assumptions. The work is motivated by a time series model in stochastic finance and by perspectives of its contribution to the issues of universal time series estimation.
We apply Gaussian process (GP) regression, which provides a powerful non-parametric probabilistic method of relating inputs to outputs, to survival data consisting of time-to-event and covariate measurements. In this context, the covariates are regarded as the "inputs" and the event times are the "outputs". This allows for highly flexible inference of non-linear relationships between covariates and event times. Many existing methods, such as the ubiquitous Cox proportional hazards model, focus primarily on the hazard rate, which is typically assumed to take some parametric or semi-parametric form. Our proposed model belongs to the class of accelerated failure time models, where we focus on directly characterising the relationship between covariates and event times without any explicit assumptions on what form the hazard rates take. It is straightforward to include various types and combinations of censored and truncated observations. We apply our approach to both simulated and experimental data. We then apply multiple output GP regression, which can handle multiple potentially correlated outputs for each input, to competing risks survival data where multiple event types can occur. By tuning one of the model parameters we can control the extent to which the multiple outputs (the time-to-event for each risk) are dependent, thus allowing the specification of correlated risks. Simulation studies suggest that in some cases assuming dependence can lead to more accurate predictions.
Spike-and-slab priors are popular Bayesian solutions for high-dimensional linear regression problems. Previous theoretical studies on spike-and-slab methods focus on specific prior formulations and use prior-dependent conditions and analyses, and thus cannot be generalized directly. In this paper, we propose a class of generic spike-and-slab priors and develop a unified framework to rigorously assess their theoretical properties. Technically, we provide general conditions under which generic spike-and-slab priors can achieve a nearly optimal posterior contraction rate and model selection consistency. Our results include those of Narisetty and He (2014) and Castillo et al. (2015) as special cases.
The goal of regression is to recover, from noisy observations, an unknown underlying function that best links a set of predictors to an outcome. In non-parametric regression, one assumes that the regression function belongs to a pre-specified infinite-dimensional function space (the hypothesis space). In the online setting, when the observations come in a stream, it is computationally preferable to iteratively update an estimate rather than refitting an entire model repeatedly. Inspired by nonparametric sieve estimation and stochastic approximation methods, we propose a sieve stochastic gradient descent estimator (Sieve-SGD) when the hypothesis space is a Sobolev ellipsoid. We show that Sieve-SGD has rate-optimal MSE under a set of simple and direct conditions. We also show that the Sieve-SGD estimator can be constructed at low computational cost, and requires almost minimal memory usage among all statistically rate-optimal estimators, under some conditions on the distribution of the predictors.