Fundamental Limits of Lossless Data Compression with Side Information
The problem of lossless data compression with side information available to both the encoder and the decoder is considered. The finite-blocklength fundamental limits of the best achievable performance are defined for two versions of the problem: reference-based compression, where a single side information string is used repeatedly in compressing different source messages, and pair-based compression, where a different side information string is used for each source message. General achievability and converse theorems are established for arbitrary source-side information pairs, and the optimal asymptotic behaviour of arbitrary compressors is determined for ergodic source-side information pairs. A central limit theorem and a law of the iterated logarithm are proved, describing the inevitable fluctuations of the second-order asymptotically best possible rate, under appropriate mixing conditions. An idealized version of Lempel-Ziv coding with side information is shown to be first- and second-order asymptotically optimal under the same conditions. Nonasymptotic normal approximation expansions are proved for the optimal rate in both the reference-based and pair-based settings, for memoryless sources. These are stated in terms of explicit finite-blocklength bounds that are tight up to third-order terms. Extensions that go significantly beyond the class of memoryless sources are obtained. The relevant source dispersion is identified and its relationship with the conditional varentropy rate is established. Interestingly, the dispersion is different in reference-based and pair-based compression, and it is proved that the reference-based dispersion is in general smaller.
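The closing claim about the two dispersions can be illustrated numerically with a minimal sketch. It rests on assumptions the abstract does not state: a memoryless source-side information pair with a known joint distribution, a pair-based dispersion taken to be the variance of -log P(X|Y) over the joint law, and a reference-based dispersion taken to be the per-symbol variance of -log P(X|y) averaged over y. Under those assumptions the ordering follows from the law of total variance. The function name `dispersion_terms` and the toy distribution are hypothetical.

```python
import math


def dispersion_terms(joint):
    """Return (H(X|Y), pair_var, ref_var) in bits for a toy joint pmf.

    joint: dict mapping (x, y) -> P(x, y).
    pair_var: Var(-log2 P(X|Y)) under the joint law (assumed pair-based form).
    ref_var:  E_Y[ Var(-log2 P(X|Y) | Y) ] (assumed reference-based form).
    """
    # Marginal distribution of the side information Y.
    p_y = {}
    for (_, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p

    h_cond = 0.0                      # E[-log2 P(X|Y)]
    second_moment = 0.0               # E[(-log2 P(X|Y))^2]
    mean_y = {y: 0.0 for y in p_y}    # E[-log2 P(X|y) | Y = y]
    sq_y = {y: 0.0 for y in p_y}      # E[(-log2 P(X|y))^2 | Y = y]

    for (_, y), p in joint.items():
        if p == 0.0:
            continue                  # zero-probability pairs contribute nothing
        cond = p / p_y[y]             # P(x | y)
        info = -math.log2(cond)       # conditional information density
        h_cond += p * info
        second_moment += p * info ** 2
        mean_y[y] += cond * info
        sq_y[y] += cond * info ** 2

    pair_var = second_moment - h_cond ** 2
    ref_var = sum(p_y[y] * (sq_y[y] - mean_y[y] ** 2) for y in p_y)
    return h_cond, pair_var, ref_var


# Toy example: given Y = 0 the source is Bernoulli(0.1); given Y = 1 it is uniform.
toy_joint = {(0, 0): 0.45, (1, 0): 0.05, (0, 1): 0.25, (1, 1): 0.25}
h_cond, pair_var, ref_var = dispersion_terms(toy_joint)
print(f"H(X|Y) = {h_cond:.4f} bits")
print(f"pair-based variance = {pair_var:.4f}, reference-based variance = {ref_var:.4f}")
```

In this sketch the gap `pair_var - ref_var` equals the variance, over y, of the conditional entropies H(X|Y=y), so the two quantities coincide only when every side information symbol leaves the source with the same conditional entropy.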