On Top-k Selection from m-wise Partial Rankings via Borda Counting
We analyze the performance of the Borda counting algorithm in a non-parametric model. The algorithm needs to utilize probabilistic rankings of the items within m-sized subsets to accurately determine which items are the overall top-k items in a total of n items. The Borda counting algorithm simply counts the cumulative scores for each item from these partial ranking observations. This generalizes a previous work of a similar nature by Shah et al. using probabilistic pairwise comparison data. The performance of the Borda counting algorithm critically depends on the associated score separation Δ_k between the k-th item and the (k+1)-th item. Specifically, we show that if Δ_k is greater than certain value, then the top-k items selected by the algorithm is asymptotically accurate almost surely; if Δ_k is below certain value, then the result will be inaccurate with a constant probability. In the special case of m=2, i.e., pairwise comparison, the resultant bound is tighter than that given by Shah et al., leading to a reduced gap between the error probability upper and lower bounds. These results are further extended to the approximate top-k selection setting. Numerical experiments demonstrate the effectiveness and accuracy of the Borda counting algorithm, compared with the spectral MLE-based algorithm, particularly when the data does not necessarily follow an assumed parametric model.
READ FULL TEXT