Scalability and robustness of spectral embedding: landmark diffusion is all you need
While spectral embedding is a widely applied dimension reduction technique in various fields, so far it is still challenging to make it scalable and robust to handle "big data". Motivated by the need of handling such data, we propose a novel spectral embedding algorithm, which we coined Robust and Scalable Embedding via Landmark Diffusion (ROSELAND). In short, we measure the affinity between two points via a set of landmarks, which is composed of a small number of points, and ”diffuse” on the dataset via the landmark set to achieve a spectral embedding. The embedding is not only scalable and robust, but also preserves the geometric properties under the manifold setup. The Roseland can be viewed as a generalization of the commonly applied spectral embedding algorithm, the diffusion map (DM), in the sense that it shares various properties of the DM. In addition to providing a theoretical justification of the Roseland under the manifold setup, including handling the U-statistics like quantities and providing a spectral convergence rate, we show various numerical simulations and compare the Roseland with other existing algorithms.
READ FULL TEXT