Tight Bound of Incremental Cover Trees for Dynamic Diversification
Dynamic diversification, i.e., finding a set of data points with maximum diversity from a time-dependent sample pool, is an important task in recommender systems, web search, database search, and notification services, where it helps avoid showing users duplicate or very similar items. The incremental cover tree (ICT), which offers high computational efficiency and flexibility, has been applied to this task and has shown good performance. Specifically, it was empirically observed that ICT typically provides a set whose diversity is only marginally (∼1/1.2 times) worse than that of the greedy max-min (GMM) algorithm, the state-of-the-art method for static diversification, whose performance bound is optimal for any polynomial-time algorithm. Nevertheless, the known performance bound for ICT is 4 times worse than this optimal bound. In this paper, we aim to close this gap between theory and empirical observations. To this end, we first analyze variants of ICT methods and derive tighter performance bounds. We then investigate the gap between the obtained bound and empirical observations using specially designed artificial data for which the optimal diversity is known. Finally, we analyze the tightness of the bound and show that it cannot be further improved, i.e., this paper provides the tightest possible bound for ICT methods. In addition, we demonstrate a new use of dynamic diversification for generative image samplers, where prototypes are incrementally collected from a stream of artificial images generated by an image sampler.
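For readers unfamiliar with the GMM baseline mentioned above, the following is a minimal sketch of greedy max-min selection on a static pool. It is not taken from the paper: the function names `greedy_max_min` and `min_pairwise_distance`, the Euclidean metric, the choice of the starting point, and the use of the minimum pairwise distance as the diversity measure are all assumptions made for illustration. It does not implement ICT.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def min_pairwise_distance(points: List[Point]) -> float:
    """Diversity measure assumed here: smallest distance between any two points."""
    return min(
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    )


def greedy_max_min(pool: List[Point], k: int, start: int = 0) -> List[int]:
    """Pick k indices from pool, each time taking the point farthest from the
    already selected set (hypothetical helper, Euclidean distance assumed)."""
    if not 1 <= k <= len(pool):
        raise ValueError("k must be between 1 and len(pool)")

    selected = [start]
    # nearest[i] = distance from pool[i] to its closest selected point.
    nearest = [math.dist(p, pool[start]) for p in pool]
    nearest[start] = -1.0  # mark as taken so it is never picked again

    while len(selected) < k:
        # Candidate whose closest selected point is farthest away.
        nxt = max(range(len(pool)), key=nearest.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(pool):
            nearest[i] = min(nearest[i], math.dist(p, pool[nxt]))
        nearest[nxt] = -1.0

    return selected


if __name__ == "__main__":
    pool = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (0.5, 0.0)]
    chosen = greedy_max_min(pool, k=3)
    print(chosen, min_pairwise_distance([pool[i] for i in chosen]))
```

This sketch recomputes nothing when the pool changes, so every insertion or deletion would require rerunning the selection from scratch. That cost is the motivation for incremental structures such as ICT in the dynamic setting.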