missSBM: An R Package for Handling Missing Values in the Stochastic Block Model
The Stochastic Block Model (SBM) is a popular probabilistic model for random graphs. It is commonly used to cluster network data by aggregating nodes that share similar connectivity patterns into blocks. When fitting an SBM to a partially observed network, it is important to account for the underlying process that generates the missing values; otherwise the inference may be biased. This paper introduces missSBM, an R package that fits the SBM when the network is partially observed, i.e., when the adjacency matrix contains not only 1 or 0, encoding the presence or absence of an edge, but also NA, encoding missing information between a pair of nodes. It implements a series of algorithms for the binary SBM, optionally accounting for covariates, by performing variational inference under several sampling mechanisms; the methodology is detailed in Tabouy, Barbillon, and Chiquet (2019). Our implementation automatically explores different numbers of blocks and selects the most relevant one according to the Integrated Classification Likelihood (ICL) criterion. The ICL criterion can also help determine which sampling mechanism best fits the data. Finally, missSBM can be used to impute missing entries in the adjacency matrix. We illustrate the package on a network data set consisting of interactions between blogs sampled during the 2007 French presidential election.
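To make the data format concrete, here is a minimal sketch of a partially observed adjacency matrix of the kind described above. It is written in plain Python and does not use missSBM or its API. The function names (`simulate_sbm`, `hide_dyads`, `count_entries`), the parameter values, and the sampling mechanism are all assumptions made for illustration: each pair of nodes is observed independently with a fixed probability, and unobserved pairs are stored as `None`, playing the role of NA.

```python
import random


def simulate_sbm(n_nodes, block_props, connectivity, rng):
    """Draw block memberships and a symmetric binary adjacency matrix."""
    blocks = rng.choices(range(len(block_props)), weights=block_props, k=n_nodes)
    adj = [[0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            # The edge probability depends only on the blocks of the two nodes.
            p_edge = connectivity[blocks[i]][blocks[j]]
            edge = 1 if rng.random() < p_edge else 0
            adj[i][j] = adj[j][i] = edge
    return blocks, adj


def hide_dyads(adj, observe_prob, rng):
    """Assumed mechanism: keep each dyad with probability observe_prob.

    Unobserved dyads are set to None (the analogue of NA).
    """
    n = len(adj)
    observed = [row[:] for row in adj]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() >= observe_prob:
                observed[i][j] = observed[j][i] = None
    return observed


def count_entries(matrix):
    """Count edges (1), non-edges (0) and missing dyads (None) above the diagonal."""
    counts = {1: 0, 0: 0, None: 0}
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            counts[matrix[i][j]] += 1
    return counts


rng = random.Random(42)  # fixed seed so the run is reproducible
n_nodes = 60
blocks, adj = simulate_sbm(
    n_nodes,
    block_props=[0.5, 0.5],                    # illustrative block proportions
    connectivity=[[0.6, 0.05], [0.05, 0.6]],   # illustrative edge probabilities
    rng=rng,
)
observed = hide_dyads(adj, observe_prob=0.7, rng=rng)
counts = count_entries(observed)
print(counts)
```

Under this particular mechanism, missingness does not depend on whether an edge is present. Mechanisms where it does are the case in which ignoring the sampling process may bias the inference.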