On the Estimation of Information Measures of Continuous Distributions
The estimation of information measures of continuous distributions based on samples is a fundamental problem in statistics and machine learning. In this paper, we analyze estimates of differential entropy in K-dimensional Euclidean space, computed from a finite number of samples, when the probability density function belongs to a predetermined convex family P. First, estimating differential entropy to any accuracy is shown to be infeasible if the differential entropy of densities in P is unbounded, which clearly demonstrates the necessity of additional assumptions. Subsequently, we investigate sufficient conditions that enable confidence bounds for the estimation of differential entropy. In particular, we provide confidence bounds for simple histogram-based estimation of differential entropy from a fixed number of samples, assuming that the probability density function is Lipschitz continuous with known Lipschitz constant and known, bounded support. Our focus is on differential entropy, but we provide examples showing that similar results hold for mutual information and relative entropy as well.
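To make the setting concrete, the following is a minimal sketch (not the paper's exact estimator or its confidence bounds) of a histogram-based plug-in estimate of differential entropy from samples; the choice of support [0, 1]^K, the bin width, and the function name are illustrative assumptions.

```python
# Sketch of a histogram-based plug-in estimate of differential entropy
# for a density assumed to be supported on the unit cube [0, 1]^K.
import numpy as np

def histogram_entropy_estimate(samples: np.ndarray, bins_per_dim: int) -> float:
    """Plug-in differential entropy estimate (in nats) from an n x K sample array,
    assuming the density is supported on [0, 1]^K."""
    n, K = samples.shape
    h = 1.0 / bins_per_dim                       # side length of each cubic bin
    # Assign each sample to a bin index along every dimension.
    idx = np.clip((samples / h).astype(int), 0, bins_per_dim - 1)
    # Count samples in each occupied bin.
    _, counts = np.unique(idx, axis=0, return_counts=True)
    p_hat = counts / n                           # empirical bin probabilities
    # Plug-in estimate: -sum over occupied bins of p_hat * log(p_hat / h^K),
    # i.e. the entropy of the piecewise-constant histogram density.
    return float(-np.sum(p_hat * np.log(p_hat / h**K)))

# Example: samples from the uniform density on [0, 1]^2,
# whose true differential entropy is 0 nats.
rng = np.random.default_rng(0)
x = rng.uniform(size=(10_000, 2))
print(histogram_entropy_estimate(x, bins_per_dim=20))
```

Under assumptions of the kind stated in the abstract (Lipschitz density with known constant and known, bounded support), the bin width can in principle be tied to the sample size so that both the discretization bias and the sampling fluctuations of such an estimate are controlled, which is what makes finite-sample confidence bounds possible.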