Graph Compression with Application to Model Selection
Many multivariate datasets, such as social and biological data, exhibit complex dependencies that are best characterized by graphs. Unlike sequential data, graphs are, in general, unordered structures. This means that classic sequential compression methods cannot be applied directly to graph-based data, so new methods for graph compression are needed. In this paper, we present universal source coding methods for the lossless compression of unweighted, undirected, unlabelled graphs. We encode in two steps: 1) transforming the graph into a rooted binary tree, and 2) encoding the rooted binary tree using graph statistics. Our coders showed better compression performance than other source coding methods on both synthetic and real-world graphs. We then applied our graph coding methods to model selection of Gaussian graphical models using the minimum description length (MDL) principle, which requires computing the description length of the conditional independence graph. Experiments on synthetic data show that our approach outperforms common model selection methods. We also applied our approach to electrocardiogram (ECG) data in order to explore the differences between the graph models of two groups of subjects.
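To make the role of a graph code length in MDL model selection concrete, here is a minimal sketch. It is an assumption for illustration only, not the tree-based coder described above. It uses a simple two-part code for a *labelled*, undirected, unweighted graph: the edge count m is sent first (uniformly over 0..M, where M = n(n-1)/2 is the number of vertex pairs), then the index of the edge set among the C(M, m) possibilities. The function names `edge_count_code_length` and `select_by_mdl` are hypothetical, and `data_bits` (the code length of the data given a candidate graph) is assumed to be computed elsewhere, e.g. from a fitted Gaussian graphical model.

```python
import math


def edge_count_code_length(n, edges):
    """Bits for a labelled, undirected, unweighted graph on n vertices
    under a two-part code: edge count m, then which of C(M, m) edge sets."""
    M = n * (n - 1) // 2  # number of possible vertex pairs
    edge_set = set()
    for u, v in edges:
        if not (0 <= u < n and 0 <= v < n):
            raise ValueError(f"edge ({u}, {v}) has a vertex outside 0..{n - 1}")
        if u != v:  # ignore self-loops; the graph is simple
            edge_set.add((min(u, v), max(u, v)))  # unordered pair, deduplicated
    m = len(edge_set)
    # log2(M + 1) bits for m, plus log2 C(M, m) bits for the edge set itself
    return math.log2(M + 1) + math.log2(math.comb(M, m))


def select_by_mdl(n, candidates):
    """Return the index of the candidate (edges, data_bits) pair that
    minimizes total description length = graph bits + data bits."""
    return min(
        range(len(candidates)),
        key=lambda i: edge_count_code_length(n, candidates[i][0]) + candidates[i][1],
    )
```

With 4 vertices, a triangle costs log2(7) + log2(20) bits, while the empty and complete graphs each cost only log2(7) bits, so the graph term penalizes "typical-looking" edge sets of intermediate density most heavily. A code that exploits structure beyond the edge count, as the paper's coders aim to do, would assign shorter lengths to structured graphs and can therefore change which candidate MDL selects.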