Missing Values and the Dimensionality of Expected Returns
Combining 100+ cross-sectional predictors requires either dropping 90% of the data or imputing missing values. We compare imputation using the expectation-maximization algorithm with simple ad-hoc methods. Surprisingly, expectation-maximization and ad-hoc methods lead to similar results. This similarity arises because predictors are largely independent: correlations cluster near zero, and more than 10 principal components are required to span 50% of total variance. Observed predictors are therefore largely uninformative about missing predictors, making ad-hoc methods valid. In an out-of-sample principal components (PC) regression test, 50 PCs are required to capture equal-weighted long-short expected returns (30 PCs value-weighted), regardless of the imputation method.
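The link between near-independent predictors and the number of PCs needed to span half the variance can be illustrated with a minimal sketch on simulated data. Everything here is an assumption for illustration: the function names are hypothetical, the data are random draws rather than the paper's predictors, and cross-sectional mean imputation stands in for the "simple ad-hoc methods", which the abstract does not specify.

```python
import numpy as np

def impute_cross_sectional_mean(X):
    """Replace each NaN with the mean of the observed values in its column.

    Assumed stand-in for an ad-hoc imputation method.
    """
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

def n_pcs_to_span(X, share=0.5):
    """Smallest number of PCs whose eigenvalues cover `share` of total variance.

    Uses the eigenvalues of the predictor correlation matrix.
    """
    corr = np.corrcoef(X, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]  # largest first
    cum_share = np.cumsum(eigvals) / eigvals.sum()
    # First index where the cumulative share reaches `share`, as a count.
    return int(np.searchsorted(cum_share, share) + 1)

# Simulated cross-section: 2000 stocks, 100 independent predictors.
rng = np.random.default_rng(0)
X_full = rng.standard_normal((2000, 100))

# Knock out 30% of the entries at random, then impute.
X_missing = X_full.copy()
X_missing[rng.random(X_full.shape) < 0.3] = np.nan
X_imputed = impute_cross_sectional_mean(X_missing)

print("PCs needed for 50% of variance:", n_pcs_to_span(X_imputed))
```

When predictors share a strong common factor, a single PC can cover most of the variance; when they are close to independent, as in this simulation, the eigenvalues are spread out and many PCs are needed.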