Phase Transitions in Rate Distortion Theory and Deep Learning

08/03/2020 ∙ by Philipp Grohs, et al.

Rate distortion theory is concerned with optimally encoding a given signal class 𝒮 using a budget of R bits, as R → ∞. We say that 𝒮 can be compressed at rate s if we can achieve an error of 𝒪(R^(-s)) for encoding 𝒮; the supremal compression rate is denoted s^*(𝒮). Given a fixed coding scheme, there usually are elements of 𝒮 that are compressed at a higher rate than s^*(𝒮) by the given coding scheme; we study the size of this set of signals. We show that for certain "nice" signal classes 𝒮, a phase transition occurs: we construct a probability measure ℙ on 𝒮 such that for every coding scheme 𝒞 and any s > s^*(𝒮), the set of signals encoded with error 𝒪(R^(-s)) by 𝒞 forms a ℙ-null-set. In particular, our results apply to balls in Besov and Sobolev spaces that embed compactly into L^2(Ω) for a bounded Lipschitz domain Ω. As an application, we show that several existing sharpness results concerning function approximation using deep neural networks are generically sharp. We also provide quantitative and non-asymptotic bounds on the probability that a random f ∈ 𝒮 can be encoded to within accuracy ε using R bits. This result is applied to the problem of approximately representing f ∈ 𝒮 to within accuracy ε by a (quantized) neural network that is constrained to have at most W nonzero weights and is generated by an arbitrary "learning" procedure. We show that for any s > s^*(𝒮) there are constants c, C such that, no matter how we choose the "learning" procedure, the probability of success is bounded from above by min{1, 2^(C·W⌈log_2(1+W)⌉^2 - c·ε^(-1/s))}.
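
The final bound can be read as a success-probability estimate that collapses once the required accuracy ε is small relative to the network budget W. Below is a minimal Python sketch that evaluates the upper bound min{1, 2^(C·W⌈log_2(1+W)⌉^2 - c·ε^(-1/s))} for illustrative constants; the abstract only asserts that such constants c, C exist, so the concrete values (and the function name) used here are hypothetical.

```python
import math

def success_probability_upper_bound(W, eps, s, c=1.0, C=1.0):
    """Evaluate the abstract's upper bound
        min{1, 2^(C*W*ceil(log2(1+W))^2 - c*eps^(-1/s))}
    on the probability that a random signal from the class S can be
    represented to accuracy eps by a quantized network with at most
    W nonzero weights. The constants c and C are placeholders: the
    result only guarantees their existence, not their values.
    """
    exponent = C * W * math.ceil(math.log2(1 + W)) ** 2 - c * eps ** (-1.0 / s)
    if exponent >= 0:
        return 1.0          # the bound is vacuous in this regime
    return 2.0 ** exponent  # underflows to 0.0 for very negative exponents

# Illustration with hypothetical constants c = C = 1 and rate s = 0.5:
# once eps^(-1/s) outgrows W * ceil(log_2(1+W))^2, the bound drops from 1
# to essentially 0, echoing the phase-transition behaviour described above.
for eps in (1e-2, 1e-3, 1e-4):
    print(f"eps = {eps:.0e}: bound <= {success_probability_upper_bound(10_000, eps, 0.5)}")
```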
