Fast Distance Oracles for Any Symmetric Norm
In the Distance Oracle problem, the goal is to preprocess n vectors x_1, x_2, โฏ, x_n in a d-dimensional metric space (๐^d, ยท_l) into a cheap data structure, so that given a query vector q โ๐^d and a subset Sโ [n] of the input data points, all distances q - x_i _l for x_iโ S can be quickly approximated (faster than the trivial โผ d|S| query time). This primitive is a basic subroutine in machine learning, data mining and similarity search applications. In the case of โ_p norms, the problem is well understood, and optimal data structures are known for most values of p. Our main contribution is a fast (1+ฮต) distance oracle for any symmetric norm ยท_l. This class includes โ_p norms and Orlicz norms as special cases, as well as other norms used in practice, e.g. top-k norms, max-mixture and sum-mixture of โ_p norms, small-support norms and the box-norm. We propose a novel data structure with ร(n (d + mmc(l)^2 ) ) preprocessing time and space, and t_q = ร(d + |S| ยทmmc(l)^2) query time, for computing distances to a subset S of data points, where mmc(l) is a complexity-measure (concentration modulus) of the symmetric norm. When l = โ_p , this runtime matches the aforementioned state-of-art oracles.
READ FULL TEXT