Median of means principle as a divide-and-conquer procedure for robustness, sub-sampling and hyper-parameters tuning
Many learning methods have poor risk estimates with large probability under moment assumptions on the data, are sensitive to outliers and require hyper-parameter tuning. The purpose here is to introduce an algorithm which, when fed with such learning methods and possibly corrupted data satisfying at best moment assumptions, will: return a robust estimator whose excess risk bounds hold with exponentially large probability, identify large non-corrupted subsamples, and automatically tune hyper-parameters. The procedure is tested on the LASSO, which is known to be highly sensitive to outliers. The basic tool is the median-of-means principle, which can be recast as a divide-and-conquer methodology, making this procedure easily scalable.
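To illustrate the divide-and-conquer reading of the median-of-means (MOM) principle, here is a minimal sketch of the basic MOM estimator of a mean. It is not the full procedure described above: it only shows the core step. The function name `median_of_means`, the random shuffling of the sample and the use of NumPy are assumptions made for illustration. The sample is split into disjoint blocks (divide), each block is averaged independently and could be processed in parallel (conquer), and the median of the block means is returned (combine).

```python
import numpy as np


def median_of_means(x, n_blocks, rng=None):
    """Hypothetical helper: median-of-means estimate of E[x].

    Divide:  shuffle the sample and split it into n_blocks disjoint blocks.
    Conquer: average each block independently (this step is parallelizable).
    Combine: return the median of the block means, plus the block means.
    """
    x = np.asarray(x, dtype=float)
    if not 1 <= n_blocks <= x.size:
        raise ValueError("n_blocks must be between 1 and len(x)")
    rng = np.random.default_rng(rng)
    perm = rng.permutation(x.size)
    blocks = np.array_split(x[perm], n_blocks)
    block_means = np.array([b.mean() for b in blocks])
    return float(np.median(block_means)), block_means


# Usage: heavy-tailed data (Student t with 3 degrees of freedom, mean 0)
# corrupted by 20 gross outliers.
gen = np.random.default_rng(0)
clean = gen.standard_t(df=3, size=1000)
data = np.concatenate([clean, np.full(20, 1e6)])

mom, block_means = median_of_means(data, n_blocks=61, rng=1)
# The empirical mean is dragged far from 0 by the outliers; with more
# blocks than twice the number of outliers, most blocks are outlier-free
# and the median of the block means stays close to 0.
print("empirical mean:", data.mean())
print("median of means:", mom)
```

The number of blocks acts as the robustness knob. With K blocks, at most as many blocks as there are outliers can be contaminated, so the median is unaffected as long as fewer than K/2 blocks contain an outlier.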