No Arabic abstract
Paired estimation of change in parameters of interest over a population plays a central role in several application domains including those in the social sciences, epidemiology, medicine and biology. In these domains, the size of the population under study is often very large, however, the number of observations available per individual in the population is very small (emph{sparse observations}) which makes the problem challenging. Consider the setting with $N$ independent individuals, each with unknown parameters $(p_i, q_i)$ drawn from some unknown distribution on $[0, 1]^2$. We observe $X_i sim text{Bin}(t, p_i)$ before an event and $Y_i sim text{Bin}(t, q_i)$ after the event. Provided these paired observations, ${(X_i, Y_i) }_{i=1}^N$, our goal is to accurately estimate the emph{distribution of the change in parameters}, $delta_i := q_i - p_i$, over the population and properties of interest like the emph{$ell_1$-magnitude of the change} with sparse observations ($tll N$). We provide emph{information theoretic lower bounds} on the error in estimating the distribution of change and the $ell_1$-magnitude of change. Furthermore, we show that the following two step procedure achieves the optimal error bounds: first, estimate the full joint distribution of the paired parameters using the maximum likelihood estimator (MLE) and then estimate the distribution of change and the $ell_1$-magnitude of change using the joint MLE. Notably, and perhaps surprisingly, these error bounds are of the same order as the minimax optimal error bounds for learning the emph{full} joint distribution itself (in Wasserstein-1 distance); in other words, estimating the magnitude of the change of parameters over the population is, in a minimax sense, as difficult as estimating the full joint distribution itself.
Change-points are a routine feature of big data observed in the form of high-dimensional data streams. In many such data streams, the component series possess group structures and it is natural to assume that changes only occur in a small number of all groups. We propose a new change point procedure, called groupInspect, that exploits the group sparsity structure to estimate a projection direction so as to aggregate information across the component series to successfully estimate the change-point in the mean structure of the series. We prove that the estimated projection direction is minimax optimal, up to logarithmic factors, when all group sizes are of comparable order. Moreover, our theory provide strong guarantees on the rate of convergence of the change-point location estimator. Numerical studies demonstrates the competitive performance of groupInspect in a wide range of settings and a real data example confirms the practical usefulness of our procedure.
Estimating the data uncertainty in regression tasks is often done by learning a quantile function or a prediction interval of the true label conditioned on the input. It is frequently observed that quantile regression -- a vanilla algorithm for learning quantiles with asymptotic guarantees -- tends to emph{under-cover} than the desired coverage level in reality. While various fixes have been proposed, a more fundamental understanding of why this under-coverage bias happens in the first place remains elusive. In this paper, we present a rigorous theoretical study on the coverage of uncertainty estimation algorithms in learning quantiles. We prove that quantile regression suffers from an inherent under-coverage bias, in a vanilla setting where we learn a realizable linear quantile function and there is more data than parameters. More quantitatively, for $alpha>0.5$ and small $d/n$, the $alpha$-quantile learned by quantile regression roughly achieves coverage $alpha - (alpha-1/2)cdot d/n$ regardless of the noise distribution, where $d$ is the input dimension and $n$ is the number of training data. Our theory reveals that this under-coverage bias stems from a certain high-dimensional parameter estimation error that is not implied by existing theories on quantile regression. Experiments on simulated and real data verify our theory and further illustrate the effect of various factors such as sample size and model capacity on the under-coverage bias in more practical setups.
We consider a general statistical estimation problem wherein binary labels across different observations are not independent conditioned on their feature vectors, but dependent, capturing settings where e.g. these observations are collected on a spatial domain, a temporal domain, or a social network, which induce dependencies. We model these dependencies in the language of Markov Random Fields and, importantly, allow these dependencies to be substantial, i.e do not assume that the Markov Random Field capturing these dependencies is in high temperature. As our main contribution we provide algorithms and statistically efficient estimation rates for this model, giving several instantiations of our bounds in logistic regression, sparse logistic regression, and neural network settings with dependent data. Our estimation guarantees follow from novel results for estimating the parameters (i.e. external fields and interaction strengths) of Ising models from a {em single} sample. {We evaluate our estimation approach on real networked data, showing that it outperforms standard regression approaches that ignore dependencies, across three text classification datasets: Cora, Citeseer and Pubmed.}
Due to the recent advancements in wearables and sensing technology, health scientists are increasingly developing mobile health (mHealth) interventions. In mHealth interventions, mobile devices are used to deliver treatment to individuals as they go about their daily lives. These treatments are generally designed to impact a near time, proximal outcome such as stress or physical activity. The mHealth intervention policies, often called just-in-time adaptive interventions, are decision rules that map an individuals current state (e.g., individuals past behaviors as well as current observations of time, location, social activity, stress and urges to smoke) to a particular treatment at each of many time points. The vast majority of current mHealth interventions deploy expert-derived policies. In this paper, we provide an approach for conducting inference about the performance of one or more such policies using historical data collected under a possibly different policy. Our measure of performance is the average of proximal outcomes over a long time period should the particular mHealth policy be followed. We provide an estimator as well as confidence intervals. This work is motivated by HeartSteps, an mHealth physical activity intervention.
We address the joint estimation of the two defining parameters of a displacement operation in phase space. In a measurement scheme based on a Gaussian probe field and two homodyne detectors, it is shown that both conjugated parameters can be measured below the standard quantum limit when the probe field is entangled. We derive the most informative Cramer-Rao bound, providing the theoretical benchmark on the estimation and observe that our scheme is nearly optimal for a wide parameter range characterizing the probe field. We discuss the role of the entanglement as well as the relation between our measurement strategy and the generalized uncertainty relations.