Sequential algorithms for testing identity and closeness of distributions
What advantage do sequential procedures provide over batch algorithms for testing properties of unknown distributions? Focusing on the problem of testing whether two distributions 𝒟_1 and 𝒟_2 on {1, …, n} are equal or ϵ-far, we give several answers to this question. We show that for a small alphabet size n, there is a sequential algorithm that outperforms any batch algorithm by a factor of at least 4 in terms of sample complexity. For a general alphabet size n, we give a sequential algorithm that uses no more samples than its batch counterpart, and possibly fewer if the actual distance TV(𝒟_1, 𝒟_2) between 𝒟_1 and 𝒟_2 is larger than ϵ. As a corollary, letting ϵ go to 0, we obtain a sequential algorithm for testing closeness, when no a priori bound on TV(𝒟_1, 𝒟_2) is given, that has a sample complexity of Õ(n^(2/3)/TV(𝒟_1, 𝒟_2)^(4/3)): this improves over the Õ((n/log n)/TV(𝒟_1, 𝒟_2)^2) tester of <cit.> and is optimal up to multiplicative constants. We also establish limitations of sequential algorithms for the problem of testing identity and closeness: they can improve the worst-case number of samples by at most a constant factor.
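To make the quantities in the abstract concrete, the following Python sketch computes the total variation distance between two distributions on {1, …, n} and evaluates the two sample-complexity rates being compared. It is only an illustration: the function names are hypothetical, and all constants and logarithmic factors hidden by the Õ notation are dropped, so the returned values indicate scaling rather than actual sample counts. It does not implement the testers themselves.

```python
import math


def tv_distance(p, q):
    """Total variation distance between two distributions given as
    probability vectors of equal length: half the L1 distance."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same alphabet size")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def sequential_rate(n, tv):
    """Scaling n^(2/3) / TV^(4/3) stated for the sequential closeness tester.
    Assumption: constants and log factors are omitted."""
    return n ** (2 / 3) / tv ** (4 / 3)


def earlier_rate(n, tv):
    """Scaling (n / log n) / TV^2 stated for the earlier tester.
    Assumption: constants and log factors are omitted."""
    return (n / math.log(n)) / tv ** 2


if __name__ == "__main__":
    n = 10**6
    uniform = [1 / n] * n
    # Perturbed distribution: shift mass between the two halves of the alphabet.
    shifted = [1.2 / n] * (n // 2) + [0.8 / n] * (n // 2)
    d = tv_distance(uniform, shifted)
    print(f"TV distance: {d:.3f}")
    print(f"sequential rate: {sequential_rate(n, d):.3e}")
    print(f"earlier rate:    {earlier_rate(n, d):.3e}")
```

With constants ignored, the gap between the two rates grows as the alphabet gets larger or the distance gets smaller; for very small n the comparison is dominated by the omitted constants and should not be read literally.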