High Dimensional Clustering with r-nets

11/06/2018
by   Georgia Avarikioti, et al.
0

Clustering, a fundamental task in data science and machine learning, groups a set of objects in such a way that objects in the same cluster are closer to each other than to those in other clusters. In this paper, we consider a well-known structure, so-called r-nets, which rigorously captures the properties of clustering. We devise algorithms that improve the run-time of approximating r-nets in high-dimensional spaces with ℓ_1 and ℓ_2 metrics from Õ(dn^2-Θ(√(ϵ))) to Õ(dn + n^2-α), where α = Ω(ϵ^1/3/(1/ϵ)). These algorithms are also used to improve a framework that provides approximate solutions to other high dimensional distance problems. Using this framework, several important related problems can also be solved efficiently, e.g., (1+ϵ)-approximate kth-nearest neighbor distance, (4+ϵ)-approximate Min-Max clustering, (4+ϵ)-approximate k-center clustering. In addition, we build an algorithm that (1+ϵ)-approximates greedy permutations in time Õ((dn + n^2-α) ·Φ) where Φ is the spread of the input. This algorithm is used to (2+ϵ)-approximate k-center with the same time complexity.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset