Algorithms for Similarity Search and Pseudorandomness

06/22/2019
by   Tobias Christiani, et al.
0

We study the problem of approximate near neighbor (ANN) search and show the following results: - An improved framework for solving the ANN problem using locality-sensitive hashing, reducing the number of evaluations of locality-sensitive hash functions and the word-RAM complexity compared to the standard framework. - A framework for solving the ANN problem with space-time tradeoffs as well as tight upper and lower bounds for the space-time tradeoff of framework solutions to the ANN problem under cosine similarity. - A novel approach to solving the ANN problem on sets along with a matching lower bound, improving the state of the art. - A self-tuning version of the algorithm is shown through experiments to outperform existing similarity join algorithms. - Tight lower bounds for asymmetric locality-sensitive hashing which has applications to the approximate furthest neighbor problem, orthogonal vector search, and annulus queries. - A proof of the optimality of a well-known Boolean locality-sensitive hashing scheme. We study the problem of efficient algorithms for producing high-quality pseudorandom numbers and obtain the following results: - A deterministic algorithm for generating pseudorandom numbers of arbitrarily high quality in constant time using near-optimal space. - A randomized construction of a family of hash functions that outputs pseudorandom numbers of arbitrarily high quality with space usage and running time nearly matching known cell-probe lower bounds.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/09/2019

Lower Bounds for Oblivious Near-Neighbor Search

We prove an Ω(d n/ ( n)^2) lower bound on the dynamic cell-probe comple...
research
04/08/2019

Subsets and Supermajorities: Unifying Hashing-based Set Similarity Search

We consider the problem of designing Locality Sensitive Filters (LSF) fo...
research
05/23/2020

DartMinHash: Fast Sketching for Weighted Sets

Weighted minwise hashing is a standard dimensionality reduction techniqu...
research
09/07/2018

When Hashing Met Matching: Efficient Search for Potential Matches in Ride Sharing

We study the problem of matching rides in a ride sharing platform. Such ...
research
12/22/2017

Lattice-based Locality Sensitive Hashing is Optimal

Locality sensitive hashing (LSH) was introduced by Indyk and Motwani (ST...
research
06/20/2023

Cuckoo Hashing in Cryptography: Optimal Parameters, Robustness and Applications

Cuckoo hashing is a powerful primitive that enables storing items using ...
research
11/02/2019

ProbMinHash – A Class of Locality-Sensitive Hash Algorithms for the (Probability) Jaccard Similarity

The probability Jaccard similarity was recently proposed as a natural ge...

Please sign up or login with your details

Forgot password? Click here to reset