Sequential algorithms for testing identity and closeness of distributions

05/12/2022
∙
by   Omar Fawzi, et al.
∙
0
∙

What advantage do sequential procedures provide over batch algorithms for testing properties of unknown distributions? Focusing on the problem of testing whether two distributions 𝒟_1 and 𝒟_2 on {1,â€Ķ, n} are equal or Ïĩ-far, we give several answers to this question. We show that for a small alphabet size n, there is a sequential algorithm that outperforms any batch algorithm by a factor of at least 4 in terms sample complexity. For a general alphabet size n, we give a sequential algorithm that uses no more samples than its batch counterpart, and possibly fewer if the actual distance TV(𝒟_1, 𝒟_2) between 𝒟_1 and 𝒟_2 is larger than Ïĩ. As a corollary, letting Ïĩ go to 0, we obtain a sequential algorithm for testing closeness when no a priori bound on TV(𝒟_1, 𝒟_2) is given that has a sample complexity 𝒊Ėƒ(n^2/3/TV(𝒟_1, 𝒟_2)^4/3): this improves over the 𝒊Ėƒ(n/log n/TV(𝒟_1, 𝒟_2)^2) tester of <cit.> and is optimal up to multiplicative constants. We also establish limitations of sequential algorithms for the problem of testing identity and closeness: they can improve the worst case number of samples by at most a constant factor.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset