Cascading Bandit under Differential Privacy
This paper studies differential privacy (DP) and local differential privacy (LDP) in cascading bandits. Under DP, we propose an algorithm which guarantees ϵ-indistinguishability and a regret of 𝒪((log T/ϵ)^1+ξ) for an arbitrarily small ξ. This is a significant improvement from the previous work of 𝒪(log^3 T/ϵ) regret. Under (ϵ,δ)-LDP, we relax the K^2 dependence through the tradeoff between privacy budget ϵ and error probability δ, and obtain a regret of 𝒪(Klog (1/δ) log T/ϵ^2), where K is the size of the arm subset. This result holds for both Gaussian mechanism and Laplace mechanism by analyses on the composition. Our results extend to combinatorial semi-bandit. We show respective lower bounds for DP and LDP cascading bandits. Extensive experiments corroborate our theoretic findings.
READ FULL TEXT