How well can we learn large factor models without assuming strong factors?


Abstract in English

In this paper, we consider the problem of learning models with a latent factor structure. The focus is to find what is possible and what is impossible if the usual strong factor condition is not imposed. We study the minimax rate and adaptivity issues in two problems: pure factor models and panel regression with interactive fixed effects. For pure factor models, if the number of factors is known, we develop adaptive estimation and inference procedures that attain the minimax rate. However, when the number of factors is not specified a priori, we show that there is a tradeoff between validity and efficiency: any confidence interval that has uniform validity for arbitrary factor strength has to be conservative; in particular its width is bounded away from zero even when the factors are strong. Conversely, any data-driven confidence interval that does not require as an input the exact number of factors (including weak ones) and has shrinking width under strong factors does not have uniform coverage and the worst-case coverage probability is at most 1/2. For panel regressions with interactive fixed effects, the tradeoff is much better. We find that the minimax rate for learning the regression coefficient does not depend on the factor strength and propose a simple estimator that achieves this rate. However, when weak factors are allowed, uncertainty in the number of factors can cause a great loss of efficiency although the rate is not affected. In most cases, we find that the strong factor condition (and/or exact knowledge of number of factors) improves efficiency, but this condition needs to be imposed by faith and cannot be verified in data for inference purposes.

Download