In prevalent cohort studies where subjects are recruited at a cross-section, the time to an event may be subject to length-biased sampling, with the observed data being either the forward recurrence time, or the backward recurrence time, or their sum. In the regression setting, it has been shown that the accelerated failure time model for the underlying event time is invariant under these observed data set-ups and can be fitted using standard methodology for accelerated failure time model estimation, ignoring the length-bias. However, the efficiency of these estimators is unclear, owing to the fact that the observed covariate distribution, which is also length-biased, may contain information about the regression parameter in the accelerated life model. We demonstrate that if the true covariate distribution is completely unspecified, then the naive estimator based on the conditional likelihood given the covariates is fully efficient.
The growing availability of network data and of scientific interest in distributed systems has led to the rapid development of statistical models of network structure. Typically, however, these are models for the entire network, while the data consist only of a sampled sub-network. Parameters for the whole network, which are the quantities of interest, are estimated by applying the model to the sub-network. This assumes that the model is consistent under sampling, or, in terms of the theory of stochastic processes, that it defines a projective family. Focusing on the popular class of exponential random graph models (ERGMs), we show that this apparently trivial condition is in fact violated by many popular and scientifically appealing models, and that satisfying it drastically limits ERGMs' expressive power. These results are actually special cases of more general results about exponential families of dependent random variables, which we also prove. Using such results, we offer easily checked conditions for the consistency of maximum likelihood estimation in ERGMs, and discuss some possible constructive responses.
We study a problem of estimation of smooth functionals of parameter $\theta$ of Gaussian shift model $$ X=\theta +\xi,\ \theta \in E, $$ where $E$ is a separable Banach space and $X$ is an observation of unknown vector $\theta$ in Gaussian noise $\xi$ with zero mean and known covariance operator $\Sigma.$ In particular, we develop estimators $T(X)$ of $f(\theta)$ for functionals $f:E\mapsto {\mathbb R}$ of H\"older smoothness $s>0$ such that $$ \sup_{\|\theta\|\leq 1} {\mathbb E}_{\theta}(T(X)-f(\theta))^2 \lesssim \Bigl(\|\Sigma\| \vee ({\mathbb E}\|\xi\|^2)^s\Bigr)\wedge 1, $$ where $\|\Sigma\|$ is the operator norm of $\Sigma,$ and show that this mean squared error rate is minimax optimal at least in the case of standard Gaussian shift model ($E={\mathbb R}^d$ equipped with the canonical Euclidean norm, $\xi =\sigma Z,$ $Z\sim {\mathcal N}(0;I_d)$). Moreover, we determine a sharp threshold on the smoothness $s$ of functional $f$ such that, for all $s$ above the threshold, $f(\theta)$ can be estimated efficiently with a mean squared error rate of the order $\|\Sigma\|$ in a small noise setting (that is, when ${\mathbb E}\|\xi\|^2$ is small). The construction of efficient estimators is crucially based on a bootstrap chain method of bias reduction. The results could be applied to a variety of special high-dimensional and infinite-dimensional Gaussian models (for vector, matrix and functional data).
We introduce estimation and test procedures through divergence minimization for models satisfying linear constraints with unknown parameter. These procedures extend the empirical likelihood (EL) method and share common features with the generalized empirical likelihood approach. We treat the problems of existence and characterization of the divergence projections of probability distributions on sets of signed finite measures. We give a precise characterization of duality, for the proposed class of estimates and test statistics, which is used to derive their limiting distributions (including the EL estimate and the EL ratio statistic) both under the null hypotheses and under alternatives or misspecification. An approximation to the power function is deduced as well as the sample size which ensures a desired power for a given alternative.
Kernel-based nonparametric hazard rate estimation is considered with a special class of infinite-order kernels that achieve favorable bias and mean squared error properties. A fully automatic and adaptive implementation of a density and hazard rate estimator is proposed for randomly right-censored data. Careful selection of the bandwidth in the proposed estimators yields estimates that are more efficient in terms of overall mean squared error and that, in some cases, achieve a nearly parametric convergence rate. Additionally, rapidly converging bandwidth estimates are presented for use with second-order kernels to supplement such kernel-based methods in hazard rate estimation. Simulations illustrate the improved accuracy of the proposed estimator against other nonparametric estimators of the density and hazard function. A real data application is also presented on survival data from 13,166 breast carcinoma patients.
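To make the idea of kernel hazard estimation under right censoring concrete, the following is a minimal sketch that smooths the Nelson-Aalen increments with a kernel. It is not the estimator of the abstract: it assumes a second-order Epanechnikov kernel instead of an infinite-order kernel, a fixed user-supplied bandwidth instead of an adaptive one, and no tied observation times. The function name `kernel_hazard` and its arguments are hypothetical.

```python
import numpy as np

def kernel_hazard(t_grid, times, events, bandwidth):
    """Kernel-smoothed Nelson-Aalen hazard estimate (illustrative sketch).

    times  : observed times (event or censoring), assumed to have no ties
    events : 1 if the time is an observed event, 0 if right-censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=float)

    # Sort observations by time so the risk set shrinks by one at each step.
    order = np.argsort(times, kind="stable")
    times, events = times[order], events[order]

    n = times.size
    at_risk = n - np.arange(n)        # n, n-1, ..., 1 subjects still at risk
    increments = events / at_risk     # Nelson-Aalen jumps (0 at censored times)

    # Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on |u| <= 1 (an assumption).
    u = (np.asarray(t_grid, dtype=float)[:, None] - times[None, :]) / bandwidth
    kern = 0.75 * np.clip(1.0 - u**2, 0.0, None)

    # h(t) = (1/b) * sum_i K((t - X_(i)) / b) * delta_(i) / (n - i + 1)
    return (kern * increments).sum(axis=1) / bandwidth
```

Because each kernel integrates to one, the estimate integrates (away from the boundaries) to the total of the Nelson-Aalen jumps, which gives a quick sanity check on any implementation of this kind.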
In this study, we propose shrinkage methods based on generalized ridge regression (GRR) estimation, which is suitable for both multicollinearity and high-dimensional problems with a small number of samples (large $p$, small $n$). We also derive theoretical properties of the proposed estimators in the low- and high-dimensional cases. Furthermore, we demonstrate the performance of the proposed estimators in both simulation studies and real-data analysis, and compare it with that of existing penalty methods. We show that the proposed methods compare well to competing regularization techniques.
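For readers unfamiliar with generalized ridge regression, here is a minimal sketch of the textbook GRR estimate, which applies a separate penalty $k_j$ to each canonical (singular) direction of the design matrix. It illustrates only the basic estimator, not the shrinkage methods proposed in the abstract. The function name `generalized_ridge` is hypothetical, and when $p > n$ the sketch assumes the estimate is restricted to the row space of $X$.

```python
import numpy as np

def generalized_ridge(X, y, k):
    """Generalized ridge estimate with one penalty per canonical direction.

    k may be a scalar (ordinary ridge) or a vector of length min(n, p).
    Assumes d_j**2 + k_j > 0 for every singular value d_j.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Thin SVD: X = U diag(d) V^T, so X^T X = V diag(d^2) V^T.
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    k = np.broadcast_to(np.asarray(k, dtype=float), d.shape)

    # Canonical coefficients: alpha_j = d_j * (u_j^T y) / (d_j^2 + k_j).
    alpha = d * (U.T @ y) / (d**2 + k)

    # Map back to the original coordinates: beta = V alpha.
    return Vt.T @ alpha
```

With a scalar `k` this reduces to ordinary ridge regression, $(X^\top X + kI)^{-1}X^\top y$, including in the large $p$, small $n$ case; with `k = 0` and a full-column-rank design it reduces to least squares.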