Distinct Elements in Streams: An Algorithm for the (Text) Book

01/24/2023
by Sourav Chakraborty, et al.
Given a data stream 𝒟 = ⟨ a_1, a_2, …, a_m ⟩ of m elements where each a_i ∈ [n], the Distinct Elements problem is to estimate the number of distinct elements in 𝒟. Distinct Elements has been the subject of theoretical and empirical investigation over the past four decades, resulting in space-optimal algorithms for it. All current state-of-the-art algorithms are, however, beyond the reach of an undergraduate textbook owing to their reliance on notions such as pairwise independence and universal hash functions. We present a simple, intuitive, sampling-based, space-efficient algorithm whose description and proof are accessible to undergraduates with a knowledge of basic probability theory.
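The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of one sampling-based estimator in this spirit, not the authors' exact pseudocode. It keeps a bounded random sample of the stream, halves the sampling probability p whenever the sample fills up, and reports the sample size divided by p. The function name `estimate_distinct`, the parameters `eps` and `delta`, the threshold formula (including the base-2 logarithm), and the choice to return `None` on failure are all assumptions made for illustration.

```python
import math
import random
from typing import Optional, Sequence


def estimate_distinct(
    stream: Sequence[int],
    eps: float,
    delta: float,
    rng: Optional[random.Random] = None,
) -> Optional[float]:
    """Estimate the number of distinct elements in `stream` (illustrative sketch)."""
    rng = rng or random.Random()
    m = len(stream)
    if m == 0:
        return 0.0

    # Assumed sample-size bound; the constant and log base are illustrative.
    thresh = math.ceil(12 / eps**2 * math.log2(8 * m / delta))

    p = 1.0        # current sampling probability
    sample = set() # bounded sample of stream elements

    for a in stream:
        # Forget any earlier decision about `a`, then re-sample it with prob. p,
        # so only the most recent occurrence of each element matters.
        sample.discard(a)
        if rng.random() < p:
            sample.add(a)

        if len(sample) == thresh:
            # Sample is full: keep each element with probability 1/2, halve p.
            sample = {x for x in sample if rng.random() < 0.5}
            p /= 2
            if len(sample) == thresh:
                return None  # rare failure case: nothing was evicted

    return len(sample) / p
```

When the number of distinct elements stays below the threshold, p never drops below 1 and the estimate is exact; for example, `estimate_distinct([1, 2, 3, 2, 1, 3, 3], eps=0.5, delta=0.1)` returns `3.0`. For longer streams the result is randomized, so passing a seeded `random.Random` is useful for reproducibility.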
