The Early Roots of Statistical Learning in the Psychometric Literature: A review and two new results

Published by Mark De Rooij in 2019 in Mathematical Statistics. Language: English.

Abstract in English

Machine and statistical learning techniques are becoming increasingly important for the analysis of psychological data. Four core concepts of machine learning are the bias-variance trade-off, cross-validation, regularization, and basis expansion. We present some early psychometric papers, from almost a century ago, that dealt with cross-validation and regularization. From this review it is safe to conclude that the origins of these techniques lie partly in the field of psychometrics. Two new ideas arose from our historical review, which we investigated further: the first concerns the relationship between reliability and predictive validity; the second is whether optimal regression weights should be estimated by regularizing their values towards equality or by shrinking their values towards zero. In a simulation study we show that the reliability of a test score does not influence predictive validity as much as psychometric textbooks usually state. Using an empirical example we show that regularization towards equal regression coefficients is beneficial in terms of prediction error.
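
The contrast between shrinking regression weights towards zero and regularizing them towards equality can be illustrated with a small sketch. This is not taken from the paper: it assumes one common way of writing the two penalties, namely a ridge penalty on the squared weights versus a penalty on the squared deviations of the weights from their mean. The function name `penalized_weights`, the simulated data, and the penalty strength `lam` are all hypothetical choices made for illustration, and the predictors are assumed to be centered so that no intercept is needed.

```python
import numpy as np

def penalized_weights(X, y, lam, target="zero"):
    """Closed-form penalized least squares (illustrative assumption, not the paper's code).

    target="zero":  minimizes ||y - Xb||^2 + lam * sum(b_j^2)           (ridge)
    target="equal": minimizes ||y - Xb||^2 + lam * sum((b_j - mean(b))^2)
    """
    p = X.shape[1]
    if target == "zero":
        P = np.eye(p)
    elif target == "equal":
        # Centering matrix: b' P b equals the sum of squared deviations from mean(b)
        P = np.eye(p) - np.ones((p, p)) / p
    else:
        raise ValueError("target must be 'zero' or 'equal'")
    return np.linalg.solve(X.T @ X + lam * P, X.T @ y)

# Simulated data (hypothetical): five predictors with similar true weights
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
y = X @ np.array([0.5, 0.4, 0.6, 0.5, 0.45]) + rng.standard_normal(n)

# With a very large penalty the two targets give very different limits
b_zero = penalized_weights(X, y, lam=1e6, target="zero")    # all weights near 0
b_equal = penalized_weights(X, y, lam=1e6, target="equal")  # all weights near a common value
```

With a very large penalty the first estimator collapses towards the zero vector, whereas the second collapses towards a single common weight, which amounts to predicting from the unweighted sum score. Intermediate values of `lam` interpolate between these limits and the ordinary least squares solution.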
