Robust EM algorithm for model-based curve clustering
Model-based clustering is an exploratory data analysis paradigm that relies on finite mixture models to automatically uncover a latent structure governing the observed data. It is one of the most popular and successful approaches in cluster analysis. The mixture density is generally estimated by maximizing the observed-data log-likelihood with the expectation-maximization (EM) algorithm. However, it is well known that the EM algorithm is sensitive to its initialization. In addition, the standard EM algorithm requires the number of clusters to be known a priori. Some solutions have been provided in [31, 12] for model-based clustering of multivariate data with Gaussian mixture models. In this paper we focus on model-based curve clustering, where the data are curves rather than vectors, based on regression mixtures. We propose a new robust EM algorithm for clustering curves. We extend the model-based clustering approach presented in [31] for Gaussian mixture models to curve clustering with regression mixtures, including polynomial regression mixtures as well as spline and B-spline regression mixtures. Our approach handles both the initialization problem and the choice of the optimal number of clusters as the EM learning proceeds, rather than in a two-stage scheme. This is achieved by optimizing a penalized log-likelihood criterion. A simulation study confirms the potential benefit of the proposed algorithm in terms of robustness to initialization and finding the actual number of clusters.
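For orientation, the baseline that the proposed method builds on can be sketched as follows. This is a minimal illustration of the standard EM algorithm for a polynomial regression mixture with a fixed number of clusters K, not the robust algorithm proposed in the paper: the penalized criterion and the mechanism for selecting the number of clusters are not specified in this abstract and are therefore not implemented. The function name `em_polyreg_mixture`, its parameters, the random soft initialization, and the assumption that all curves are observed at the same sampling points are illustrative choices, not taken from the paper.

```python
import numpy as np

def em_polyreg_mixture(x, Y, K, degree=2, n_iter=50, seed=0):
    """Standard EM for a mixture of K polynomial regressions (illustrative).

    x : (m,) sampling points shared by all curves (assumption)
    Y : (n, m) array, one curve per row
    Returns mixing proportions, regression coefficients, noise variances,
    posterior responsibilities, and the log-likelihood trace.
    """
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    X = np.vander(x, degree + 1, increasing=True)  # (m, degree + 1) design matrix
    # Random soft assignment; standard EM is sensitive to this choice.
    resp = rng.dirichlet(np.ones(K), size=n)       # (n, K), rows sum to 1
    betas = np.empty((K, degree + 1))
    sigma2 = np.empty(K)
    loglik_trace = []
    for _ in range(n_iter):
        # M-step: update proportions, coefficients, and variances.
        pi = resp.mean(axis=0)
        for k in range(K):
            w = resp[:, k]
            wsum = max(w.sum(), 1e-12)
            # With a shared design matrix, weighted least squares reduces to
            # an ordinary fit of the responsibility-weighted mean curve.
            ybar = (w @ Y) / wsum
            betas[k] = np.linalg.lstsq(X, ybar, rcond=None)[0]
            sq_err = ((Y - X @ betas[k]) ** 2).sum(axis=1)  # (n,)
            sigma2[k] = max((w @ sq_err) / (m * wsum), 1e-8)
        # E-step: posterior cluster probabilities, computed in log space.
        log_p = np.empty((n, K))
        for k in range(K):
            sq_err = ((Y - X @ betas[k]) ** 2).sum(axis=1)
            log_p[:, k] = (np.log(max(pi[k], 1e-300))
                           - 0.5 * m * np.log(2 * np.pi * sigma2[k])
                           - 0.5 * sq_err / sigma2[k])
        mx = log_p.max(axis=1, keepdims=True)
        log_norm = mx + np.log(np.exp(log_p - mx).sum(axis=1, keepdims=True))
        resp = np.exp(log_p - log_norm)
        loglik_trace.append(float(log_norm.sum()))
    return pi, betas, sigma2, resp, loglik_trace
```

Calling `em_polyreg_mixture(x, Y, K=2)` on an array of curves returns soft cluster memberships in `resp`. Both K and the random initialization must be supplied up front, which are the two limitations the paper sets out to address.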