Robust Mean Estimation in High Dimensions via Global Outlier Pursuit
We study the robust mean estimation problem in high dimensions, where less than half of the datapoints can be arbitrarily corrupted. Motivated by compressive sensing, we formulate the robust mean estimation problem as the minimization of the ℓ_0-`norm' of an outlier indicator vector, under a second moment constraint on the datapoints. We further relax the ℓ_0-`norm' to the ℓ_p-norm (0<p≤ 1) in the objective and prove that the global minima for each of these objectives are order-optimal for the robust mean estimation problem. Then we propose a computationally tractable iterative ℓ_p-minimization and hard thresholding algorithm that outputs an order-optimal robust estimate of the population mean. Both synthetic and real data experiments demonstrate that the proposed algorithm outperforms state-of-the-art robust mean estimation methods. The source code will be made available at GitHub.
READ FULL TEXT