Approximate Nearest Neighbor for Curves --- Simple, Efficient, and Deterministic
In the (1+ε,r)-approximate-near-neighbor problem for curves (ANNC) under some distance measure δ, the goal is to construct a data structure for a given set 𝒞 of curves that supports approximate near-neighbor queries: given a query curve Q, if there exists a curve C∈𝒞 such that δ(Q,C)≤r, then return a curve C'∈𝒞 with δ(Q,C')≤(1+ε)r. There exists an efficient reduction from the (1+ε)-approximate-nearest-neighbor problem to ANNC, where in the former problem the answer to a query is a curve C∈𝒞 with δ(Q,C)≤(1+ε)·δ(Q,C^*), where C^* is the curve of 𝒞 closest to Q. Given a set 𝒞 of n curves, each consisting of m points in d dimensions, we construct a data structure for ANNC that uses n·O(1/ε)^(md) storage space and has O(md log(nm/ε)) query time (for a query curve of length m), where the similarity between two curves is their discrete Fréchet or dynamic time warping distance. Our approach discretizes space based on the input curves, which allows us to prepare a small set of curves that approximately captures all possible queries. Our method is simple and deterministic, yet, somewhat surprisingly, it is more efficient than all previous methods. We also apply our method to a version of approximate range counting for curves and achieve similar bounds.
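To make the problem statement concrete, the following minimal Python sketch computes the two distance measures named above and answers a near-neighbor query by a plain linear scan. It is only a brute-force baseline that illustrates what a valid answer looks like, not the data structure constructed in the paper. The function names, the representation of a curve as a list of coordinate tuples, and the choice to accept curves at distance exactly (1+ε)r are all assumptions made for illustration.

```python
from math import dist, inf


def _coupling_cost(P, Q, combine):
    """Dynamic program over monotone couplings of point sequences P and Q.

    combine(prev, d) merges the best cost of a predecessor cell with the
    distance d of the current pair: max gives discrete Frechet, sum gives DTW.
    """
    n, m = len(P), len(Q)
    D = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(P[i], Q[j])  # Euclidean distance between the two points
            if i == 0 and j == 0:
                D[i][j] = d
                continue
            best_prev = min(
                D[i - 1][j] if i > 0 else inf,
                D[i][j - 1] if j > 0 else inf,
                D[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            D[i][j] = combine(best_prev, d)
    return D[n - 1][m - 1]


def discrete_frechet(P, Q):
    """Discrete Frechet distance: minimize the largest pair distance."""
    return _coupling_cost(P, Q, max)


def dtw(P, Q):
    """Dynamic time warping distance: minimize the sum of pair distances."""
    return _coupling_cost(P, Q, lambda prev, d: prev + d)


def brute_force_near_neighbor(curves, Q, r, eps, delta=discrete_frechet):
    """Linear-scan baseline for a (1+eps, r) near-neighbor query.

    Returns the first stored curve within (1+eps)*r of Q, or None.
    Hypothetical helper: each query costs one distance computation per
    stored curve, which is what an ANNC data structure is meant to avoid.
    """
    threshold = (1 + eps) * r
    for C in curves:
        if delta(Q, C) <= threshold:
            return C
    return None
```

For example, calling `brute_force_near_neighbor(curves, Q, r=0.2, eps=0.5)` scans `curves` in order and returns the first curve whose discrete Fréchet distance to `Q` is at most 0.3; passing `delta=dtw` switches the query to dynamic time warping.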