Universal Approximation of Edge Density in Large Graphs
In this paper, we present a novel way to summarize the structure of large graphs, based on non-parametric estimation of edge density in directed multigraphs. Following coclustering approach, we use a clustering of the vertices, with a piecewise constant estimation of the density of the edges across the clusters, and address the problem of automatically and reliably inferring the number of clusters, which is the granularity of the coclustering. We use a model selection technique with data-dependent prior and obtain an exact evaluation criterion for the posterior probability of edge density estimation models. We demonstrate, both theoretically and empirically, that our data-dependent modeling technique is consistent, resilient to noise, valid non asymptotically and asymptotically behaves as an universal approximator of the true edge density in directed multigraphs. We evaluate our method using artificial graphs and present its practical interest on real world graphs. The method is both robust and scalable. It is able to extract insightful patterns in the unsupervised learning setting and to provide state of the art accuracy when used as a preparation step for supervised learning.
READ FULL TEXT