For a nonlinear regression model, the information matrices of designs depend on the parameter of the model. The adaptive Wynn-algorithm for D-optimal design estimates the parameter at each step on the basis of the design points employed and the responses observed so far, and selects the next design point as in the classical Wynn-algorithm for D-optimal design. The name "Wynn-algorithm" is in honor of Henry P. Wynn, who established the latter "classical" algorithm in his 1970 paper. The asymptotics of the sequences of designs and maximum likelihood estimates generated by the adaptive algorithm is studied for an important class of nonlinear regression models: generalized linear models whose (univariate) response variables follow a distribution from a one-parameter exponential family. Under the assumptions of compactness of the experimental region and of the parameter space, together with some natural continuity assumptions, it is shown that the adaptive ML-estimators are strongly consistent and the design sequence is asymptotically locally D-optimal at the true parameter point. If the true parameter point is an interior point of the parameter space, then under some smoothness assumptions the asymptotic normality of the adaptive ML-estimators is obtained.
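The design-point selection step can be illustrated with a minimal sketch of the classical (non-adaptive) Wynn iteration. It assumes a simple linear model with regressors f(x) = (1, x) on a finite candidate grid, which is not the generalized linear model setting of the paper. In the adaptive variant the parameter estimate would be refreshed before each step and the information matrix evaluated at that estimate. The function name `wynn_design` and the grid are hypothetical.

```python
def wynn_design(candidates, start, n_steps):
    """Classical Wynn iteration for regressors f(x) = (1, x).

    Illustrative assumption: a linear model, so the information matrix
    does not depend on any unknown parameter.
    """
    points = list(start)
    for _ in range(n_steps):
        n = len(points)
        # Information matrix M = (1/n) * sum f(x) f(x)^T = [[1, m1], [m1, m2]].
        m1 = sum(points) / n
        m2 = sum(x * x for x in points) / n
        det = m2 - m1 * m1
        if det <= 0.0:
            raise ValueError("start design needs at least two distinct points")

        def sensitivity(x):
            # f(x)^T M^{-1} f(x), written out for the 2x2 case.
            return (m2 - 2.0 * m1 * x + x * x) / det

        # Wynn's rule: add the candidate that maximizes the sensitivity.
        points.append(max(candidates, key=sensitivity))
    return points


grid = [k / 10 - 1.0 for k in range(21)]  # candidate points in [-1, 1]
design = wynn_design(grid, start=[-0.5, 0.5], n_steps=100)
```

For this model the D-optimal design puts weight 1/2 on each endpoint of [-1, 1], and the points added by the iteration alternate between the two endpoints.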
The paper continues the authors' work on the adaptive Wynn algorithm in a nonlinear regression model. In the present paper it is shown that if the mean response function satisfies a condition of "saturated identifiability", which was introduced by Pronzato \cite{Pronzato}, then the adaptive least squares estimators are strongly consistent. The condition states that the regression parameter is identifiable under any saturated design, i.e., the values of the mean response function at any $p$ distinct design points determine the parameter point uniquely, where, typically, $p$ is the dimension of the regression parameter vector. Further essential assumptions are compactness of the experimental region and of the parameter space, together with some natural continuity assumptions. If the true parameter point is an interior point of the parameter space, then under some smoothness assumptions and asymptotic homoscedasticity of the random errors the asymptotic normality of the adaptive least squares estimators is obtained.
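The saturated identifiability condition described in words above can be written as a formula. The symbols are notation assumed here and do not appear in the abstract: $\mu(x,\theta)$ denotes the mean response function, $\mathcal{X}$ the experimental region, and $\Theta$ the parameter space.

```latex
\[
  \bigl(\, \mu(x_i,\theta) = \mu(x_i,\theta'),\quad i = 1,\dots,p \,\bigr)
  \;\Longrightarrow\; \theta = \theta'
  \qquad \text{for all pairwise distinct } x_1,\dots,x_p \in \mathcal{X}
  \text{ and all } \theta,\theta' \in \Theta .
\]
```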
In this paper we develop an online statistical inference approach for high-dimensional generalized linear models with streaming data for real-time estimation and inference. We propose an online debiased lasso (ODL) method to accommodate the special structure of streaming data. ODL differs from offline debiased lasso in two important aspects. First, in computing the estimate at the current stage, it only uses summary statistics of the historical data. Second, in addition to debiasing an online lasso estimator, ODL corrects an approximation error term arising from nonlinear online updating with streaming data. We show that the proposed online debiased estimators for the GLMs are consistent and asymptotically normal. This result provides a theoretical basis for carrying out real-time interim statistical inference with streaming data. Extensive numerical experiments are conducted to evaluate the performance of the proposed ODL method. These experiments demonstrate the effectiveness of our algorithm and support the theoretical results. A streaming dataset from the National Automotive Sampling System-Crashworthiness Data System is analyzed to illustrate the application of the proposed method.
A variance reduction technique in nonparametric smoothing is proposed: at each point of estimation, form a linear combination of a preliminary estimator evaluated at nearby points with the coefficients specified so that the asymptotic bias remains unchanged. The nearby points are chosen to maximize the variance reduction. We study in detail the case of univariate local linear regression. While the new estimator retains many advantages of the local linear estimator, it has appealing asymptotic relative efficiencies. Bandwidth selection rules are available by a simple constant factor adjustment of those for local linear estimation. A simulation study indicates that the finite sample relative efficiency often matches the asymptotic relative efficiency for moderate sample sizes. This technique is very general and has a wide range of applications.
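For orientation, the univariate local linear estimator that serves as the preliminary estimator can be sketched as follows. This is a minimal illustration assuming an Epanechnikov kernel and a fixed bandwidth `h`. The variance-reducing linear combination itself is not reproduced, because its coefficients are not stated in the abstract. The function names are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel, supported on (-1, 1)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def local_linear(x0, xs, ys, h):
    """Local linear estimate of the regression function at x0.

    Solves the kernel-weighted least squares problem
    min over (a, b) of sum w_i * (y_i - a - b * (x_i - x0))**2
    and returns the intercept a.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        d = x - x0
        w = epanechnikov(d / h)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    denom = s0 * s2 - s1 * s1
    if denom <= 0.0:
        raise ValueError("too few distinct design points within the bandwidth")
    # Intercept of the weighted normal equations.
    return (s2 * t0 - s1 * t1) / denom
```

A local linear fit reproduces an exactly linear regression function without bias, which gives a convenient sanity check for an implementation.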
Nonlinear regression models are a standard tool for modeling real phenomena, with applications in machine learning, ecology, econometrics, and other fields. Parameter estimation for these models has received considerable attention over many years. We focus here on a recursive method for estimating the parameters of nonlinear regressions. Such methods, of which the most famous are probably the stochastic gradient algorithm and its averaged version, make it possible to deal efficiently with massive data arriving sequentially. Nevertheless, they can be, in practice, very sensitive to the case where the eigenvalues of the Hessian of the functional we would like to minimize are at different scales. To avoid this problem, we first introduce an online Stochastic Gauss-Newton algorithm. In order to improve the behavior of the estimates in case of bad initialization, we also introduce a new Averaged Stochastic Gauss-Newton algorithm and prove its asymptotic efficiency.
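A generic recursive Gauss-Newton update can be sketched as follows. This is an illustration under stated assumptions and not necessarily the authors' exact algorithm: the matrix H_n is taken to be the running sum of outer products of the gradient of the mean function, its inverse is updated by the Sherman-Morrison formula, and no averaging is shown. The names `sgn_step`, `f`, and `grad` are hypothetical.

```python
def sgn_step(theta, h_inv, x, y, f, grad):
    """One recursive Gauss-Newton update from a single observation (x, y).

    theta : current estimate (list of floats)
    h_inv : current inverse of H = H_0 + sum of g g^T (symmetric, list of lists)
    f     : mean function f(x, theta)
    grad  : gradient of f with respect to theta, returned as a list
    """
    p = len(theta)
    g = grad(x, theta)
    # Sherman-Morrison: (H + g g^T)^{-1} = H^{-1} - (H^{-1} g)(H^{-1} g)^T / (1 + g^T H^{-1} g),
    # using the symmetry of H^{-1}.
    hg = [sum(h_inv[i][j] * g[j] for j in range(p)) for i in range(p)]
    denom = 1.0 + sum(g[i] * hg[i] for i in range(p))
    h_inv = [[h_inv[i][j] - hg[i] * hg[j] / denom for j in range(p)]
             for i in range(p)]
    # Move along the rescaled gradient direction, weighted by the residual.
    residual = y - f(x, theta)
    direction = [sum(h_inv[i][j] * g[j] for j in range(p)) for i in range(p)]
    theta = [theta[i] + direction[i] * residual for i in range(p)]
    return theta, h_inv
```

Rescaling by the inverse of H_n is what distinguishes this update from a plain stochastic gradient step with a scalar step size. When the mean function is linear in the parameter, the update reduces to recursive least squares.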
In this paper, we survey some recent results on statistical inference (parametric and nonparametric statistical estimation, hypothesis testing) about the spectrum of stationary models with tapered data, as well as a question concerning the robustness of inferences carried out on a linear stationary process contaminated by a small trend. We also discuss some questions concerning tapered Toeplitz matrices and operators, central limit theorems for tapered Toeplitz-type quadratic functionals, and tapered Fejér-type kernels and singular integrals. These are the main tools for obtaining the corresponding results and are also of interest in themselves. The processes considered are discrete-time and continuous-time Gaussian, linear, or Lévy-driven linear processes with memory.