Connecting model-based and model-free approaches to linear least squares regression


Abstract in English

In a regression setting with response vector $mathbf{y} in mathbb{R}^n$ and given regressor vectors $mathbf{x}_1,ldots,mathbf{x}_p in mathbb{R}^n$, a typical question is to what extent $mathbf{y}$ is related to these regressor vectors, specifically, how well can $mathbf{y}$ be approximated by a linear combination of them. Classical methods for this question are based on statistical models for the conditional distribution of $mathbf{y}$, given the regressor vectors $mathbf{x}_j$. Davies and Duembgen (2020) proposed a model-free approach in which all observation vectors $mathbf{y}$ and $mathbf{x}_j$ are viewed as fixed, and the quality of the least squares fit of $mathbf{y}$ is quantified by comparing it with the least squares fit resulting from $p$ independent white noise regressor vectors. The purpose of the present note is to explain in a general context why the model-based and model-free approach yield the same p-values, although the interpretation of the latter is different under the two paradigms.

Download