A Generalized Estimating Equation Approach to Network Regression
Regression models applied to network data where node attributes are the dependent variables pose a methodological challenge. As has been well studied, naive regression neither properly accounts for community structure nor accounts for the dependent variable acting as both model outcome and covariate. To address this methodological gap, we propose a network regression model motivated by the important observation that, when a network is modular, controlling for community structure can account for a significant share of the meaningful correlation between observations induced by network connections. We propose a generalized estimating equation (GEE) approach to learn model parameters based on clusters defined through any single-membership community detection algorithm applied to the observed network. We provide a necessary condition on the network size and edge formation probabilities to establish the asymptotic normality of the model parameters under the assumption that the graph structure is a stochastic block model. We evaluate the performance of our approach through simulations and apply it to estimate the joint impact of baseline covariates and network effects on the COVID-19 incidence rate among countries connected by a network of commercial airline traffic. We find that the network effect has some influence at the beginning of the pandemic, whereas after the travel ban was in effect the percentage of urban population has more influence on the incidence rate than the network effect.
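To make the idea of a GEE fit over network communities concrete, the following is a minimal sketch and not the estimator developed in the paper. It assumes a linear mean model with one covariate, an independence working correlation (a simplification chosen here for brevity), and community labels that are already available, standing in for the output of any single-membership community detection algorithm. The function names `simulate_communities` and `fit_gee_independence`, the simulated data, and all parameter values are hypothetical. Under these assumptions the point estimate coincides with ordinary least squares, and the community structure enters through the cluster-robust sandwich variance.

```python
import random


def simulate_communities(n_comm=30, size=20, beta=(1.0, 2.0), seed=0):
    """Simulate outcomes with a shared effect per community (hypothetical data).

    The community labels stand in for the clusters a community detection
    algorithm would return on an observed modular network.
    """
    rng = random.Random(seed)
    x, y, labels = [], [], []
    for c in range(n_comm):
        u = rng.gauss(0.0, 1.0)  # shared community effect induces within-cluster correlation
        for _ in range(size):
            xi = rng.gauss(0.0, 1.0)
            x.append(xi)
            y.append(beta[0] + beta[1] * xi + u + rng.gauss(0.0, 1.0))
            labels.append(c)
    return x, y, labels


def fit_gee_independence(x, y, labels):
    """Linear GEE with an independence working correlation.

    Returns (beta, robust_se) for the model E[y] = beta[0] + beta[1] * x,
    with sandwich standard errors clustered on `labels`.
    """
    n = float(len(y))
    sx = sum(x)
    sxx = sum(v * v for v in x)
    sy = sum(y)
    sxy = sum(a * b for a, b in zip(x, y))

    # "Bread": inverse of X'X for design rows (1, x_i).
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    beta = [inv[0][0] * sy + inv[0][1] * sxy,
            inv[1][0] * sy + inv[1][1] * sxy]

    # Per-cluster score vectors: sum over members of (1, x_i) * residual_i.
    scores = {}
    for xi, yi, c in zip(x, y, labels):
        r = yi - beta[0] - beta[1] * xi
        s = scores.setdefault(c, [0.0, 0.0])
        s[0] += r
        s[1] += r * xi

    # "Meat": sum over clusters of score outer products.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in scores.values():
        for a in range(2):
            for b in range(2):
                meat[a][b] += s[a] * s[b]

    # Sandwich covariance: inv * meat * inv.
    tmp = [[sum(inv[a][k] * meat[k][b] for k in range(2)) for b in range(2)]
           for a in range(2)]
    cov = [[sum(tmp[a][k] * inv[k][b] for k in range(2)) for b in range(2)]
           for a in range(2)]
    # Guard against tiny negative values caused by floating-point rounding.
    se = [max(cov[0][0], 0.0) ** 0.5, max(cov[1][1], 0.0) ** 0.5]
    return beta, se


if __name__ == "__main__":
    x, y, labels = simulate_communities()
    beta, se = fit_gee_independence(x, y, labels)
    print("estimates:", beta)
    print("cluster-robust standard errors:", se)
```

In this sketch the intercept's robust standard error is typically much larger than a naive one would be, because the shared community effect makes observations within a community correlated. A working correlation other than independence (for example, exchangeable) would change the estimating equations and is not shown here.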