Covariate-Assisted Community Detection on Sparse Networks
Community detection is a central problem in network data analysis. In many real data sets, the adjacency matrix is too sparse at some nodes for existing methods to recover any community information. Node covariates have been shown to help community detection, but combining them with the network is challenging: the covariates may be high-dimensional, and their class labels may be inconsistent with the network's community labels. To quantify the relationship between the covariates and the network, we propose a general model, the covariate-assisted degree-corrected stochastic block model (CA-DCSBM). Based on CA-DCSBM, we design the adjusted neighbor-covariate (ANC) data matrix, which leverages covariate information to assist community detection. We then prove that spectral clustering on the ANC matrix combines the information in the network and the covariates. The resulting method, named CA-SCORE, is shown to have the oracle property under mild conditions. In particular, we show that our framework covers challenging scenarios in which the adjacency matrix carries no community information, or the covariate matrix has community labels that differ from those of the adjacency matrix. Finally, we apply CA-SCORE to several synthetic and real data sets and show that it outperforms other community detection methods.
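As a rough illustration of the kind of pipeline described above, the following sketch runs a SCORE-style spectral clustering (entrywise ratios of eigenvectors to the leading eigenvector, followed by k-means) on a matrix that mixes the adjacency matrix with covariate similarities. The abstract does not define the ANC matrix, so the combined matrix `A + alpha * X X^T / p` used here is only a placeholder assumption, not the construction behind CA-SCORE. The names `score_style_clustering` and `kmeans`, the weight `alpha`, the clipping threshold, and the toy data are likewise hypothetical.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd k-means with farthest-point initialisation."""
    rng = np.random.default_rng(seed)
    centers = [points[rng.integers(len(points))]]
    for _ in range(1, k):
        # Squared distance of every point to its nearest chosen center.
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def score_style_clustering(A, X, k, alpha=1.0):
    """SCORE-style clustering on a placeholder network-plus-covariate matrix."""
    n = A.shape[0]
    # ASSUMPTION: stand-in for the ANC matrix, which the abstract does not define.
    M = A + alpha * (X @ X.T) / X.shape[1]
    vals, vecs = np.linalg.eigh(M)             # M is symmetric
    order = np.argsort(-np.abs(vals))[:k]      # k largest eigenvalues in magnitude
    lead = vecs[:, order[0]]
    lead = np.where(np.abs(lead) < 1e-12, 1e-12, lead)   # guard against division by zero
    # Dividing by the leading eigenvector removes node-specific degree scaling.
    ratios = vecs[:, order[1:]] / lead[:, None]
    ratios = np.clip(ratios, -np.log(n), np.log(n))      # assumed clipping threshold
    return kmeans(ratios, k)

# Toy data: two communities of 30 nodes, a planted-partition network,
# and two-dimensional covariates whose means differ by community.
rng = np.random.default_rng(1)
truth = np.repeat([0, 1], 30)
P = np.where(truth[:, None] == truth[None, :], 0.3, 0.05)
upper = np.triu(rng.random((60, 60)) < P, k=1)
A = (upper | upper.T).astype(float)
means = np.array([[2.0, 1.0], [1.0, 2.0]])
X = means[truth] + 0.1 * rng.standard_normal((60, 2))

est = score_style_clustering(A, X, k=2)
# Cluster labels are arbitrary, so compare up to a swap of the two labels.
agreement = max(np.mean(est == truth), np.mean(est != truth))
print(agreement)
```

In this toy setting the network and the covariates share the same community labels. The more challenging scenarios mentioned above, such as covariates whose labels differ from the network's, are exactly where a naive sum like this placeholder could be expected to struggle.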