Online Updating Huber Robust Regression for Big Data Streams
Big data has attracted great attention across many fields in recent years. Under limited computer memory, how to run regression on big data streams while handling outliers sensibly is worth discussing. Taking this as a starting point, this article proposes an Online Updating Huber Robust Regression algorithm. By integrating Huber regression into an online updating structure, it continuously updates the estimate built from historical data using key features extracted from each new data subset, and it is robust to heavy-tailed distributions, heterogeneous errors, and outliers. The resulting Online Updating estimator is asymptotically equivalent to the Oracle estimator computed from the entire data set and has lower computational complexity. We also conduct simulations and a real data analysis. The experimental results show that our algorithm performs strongly against five competing algorithms in estimation and computational efficiency, making it feasible for real applications.
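The abstract does not spell out the update rule, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes each incoming batch is fitted by Huber regression via iteratively reweighted least squares (IRLS), that the "key features" kept from a batch are its weighted Gram matrix and coefficient estimate, and that batches are combined by Gram-matrix weighting. The names `fit_huber_batch` and `OnlineHuber`, the tuning constant 1.345, and the unit error scale are all illustrative assumptions.

```python
import random

def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def weighted_gram(X, w):
    # X^T W X for a diagonal weight matrix W.
    p = len(X[0])
    return [[sum(wi * xi[a] * xi[b] for wi, xi in zip(w, X))
             for b in range(p)] for a in range(p)]

def fit_huber_batch(X, y, delta=1.345, iters=50):
    # Huber regression on one batch via IRLS (assumes unit error scale).
    p = len(X[0])
    w = [1.0] * len(y)
    for _ in range(iters):
        H = weighted_gram(X, w)
        g = [sum(wi * xi[a] * yi for wi, xi, yi in zip(w, X, y))
             for a in range(p)]
        beta = solve(H, g)
        res = [yi - sum(b * x for b, x in zip(beta, xi))
               for xi, yi in zip(X, y)]
        # Huber weights: 1 for small residuals, delta/|r| for large ones.
        w = [1.0 if abs(r) <= delta else delta / abs(r) for r in res]
    return beta, weighted_gram(X, w)

class OnlineHuber:
    # Keeps only p x p and p x 1 summaries; raw batches are discarded.
    def __init__(self, p):
        self.H = [[0.0] * p for _ in range(p)]
        self.Hb = [0.0] * p
        self.beta = None

    def update(self, X, y):
        beta_k, H_k = fit_huber_batch(X, y)
        p = len(beta_k)
        for a in range(p):
            self.Hb[a] += sum(H_k[a][b] * beta_k[b] for b in range(p))
            for b in range(p):
                self.H[a][b] += H_k[a][b]
        # Cumulative estimate: (sum H_k)^{-1} (sum H_k beta_k).
        self.beta = solve(self.H, self.Hb)
        return self.beta

# Simulated stream: y = 1 + 2x + noise, with about 5% gross outliers.
rng = random.Random(0)
model = OnlineHuber(p=2)
for _ in range(10):
    X, y = [], []
    for _ in range(200):
        x = rng.uniform(-2.0, 2.0)
        noise = rng.gauss(0.0, 1.0)
        if rng.random() < 0.05:
            noise += 50.0
        X.append([1.0, x])
        y.append(1.0 + 2.0 * x + noise)
    est = model.update(X, y)
print(est)
```

Under these assumptions, memory use depends only on the number of coefficients, not on how many batches have arrived, which is the property the abstract attributes to online updating. A practical version would also estimate the error scale rather than fixing it at one.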