Do PAC-Learners Learn the Marginal Distribution?
We study a foundational variant of Valiant and Vapnik and Chervonenkis' Probably Approximately Correct (PAC)-Learning in which the adversary is restricted to a known family of marginal distributions 𝒫. In particular, we study how the PAC-learnability of a triple (𝒫,X,H) relates to the learners ability to infer distributional information about the adversary's choice of D ∈𝒫. To this end, we introduce the `unsupervised' notion of TV-Learning, which, given a class (𝒫,X,H), asks the learner to approximate D from unlabeled samples with respect to a natural class-conditional total variation metric. In the classical distribution-free setting, we show that TV-learning is equivalent to PAC-Learning: in other words, any learner must infer near-maximal information about D. On the other hand, we show this characterization breaks down for general 𝒫, where PAC-Learning is strictly sandwiched between two approximate variants we call `Strong' and `Weak' TV-learning, roughly corresponding to unsupervised learners that estimate most relevant distances in D with respect to H, but differ in whether the learner knows the set of well-estimated events. Finally, we observe that TV-learning is in fact equivalent to the classical notion of uniform estimation, and thereby give a strong refutation of the uniform convergence paradigm in supervised learning.
READ FULL TEXT