Two-sample inference for high-dimensional Markov networks
Markov networks are frequently used in the sciences to represent conditional independence relationships underlying the observed variables of a complex system. It is often of interest to understand how such a network differs between two conditions. In this paper, we develop methodology for performing valid statistical inference on the difference between the parameters of two Markov networks in a high-dimensional setting where the number of observed variables is allowed to exceed the sample size. Our proposal is based on the regularized Kullback-Leibler Importance Estimation Procedure (KLIEP), which directly learns the parameters of the differential network without requiring separate or joint estimation of the individual Markov network parameters. This enables applications in which the individual networks are not sparse, such as networks containing hub nodes, but the differential network is sparse. We prove that our estimator is regular and that its distribution is well approximated by a normal distribution under a wide range of data-generating processes; in particular, it is not sensitive to model selection mistakes. Furthermore, we develop a new testing procedure for the equality of two Markov networks based on a max-type statistic, together with a valid bootstrap procedure that approximates the quantiles of the test statistic. The performance of the methodology is illustrated through extensive simulations and real data examples.
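To make the construction concrete, the following is a minimal sketch of a regularized KLIEP objective for pairwise Markov networks; the notation (samples $x^{(1)}, x^{(2)}$, sample sizes $n_1, n_2$, tuning parameter $\lambda$) is introduced here for illustration and is an assumption about the setup rather than a restatement of the paper's formulas. If the two densities satisfy $p_k(x) \propto \exp\{\sum_{j \le l} \theta^{(k)}_{jl} x_j x_l\}$ for $k = 1, 2$, the density ratio $p_1(x)/p_2(x)$ depends only on the differential parameter $\Delta = \theta^{(1)} - \theta^{(2)}$, which can therefore be estimated directly:
\[
\widehat{\Delta} \in \operatorname*{arg\,min}_{\Delta}\;
\log\!\Bigg( \frac{1}{n_2} \sum_{i=1}^{n_2}
\exp\Big\{ \textstyle\sum_{j \le l} \Delta_{jl}\, x^{(2)}_{ij} x^{(2)}_{il} \Big\} \Bigg)
\;-\; \frac{1}{n_1} \sum_{i=1}^{n_1} \sum_{j \le l} \Delta_{jl}\, x^{(1)}_{ij} x^{(1)}_{il}
\;+\; \lambda \lVert \Delta \rVert_1 .
\]
The first term is a log-partition-type function over the second sample, so the objective is convex in $\Delta$ and amenable to standard $\ell_1$-regularized solvers; note that only $\Delta$, not the individual parameter matrices $\theta^{(1)}$ and $\theta^{(2)}$, needs to be sparse. A max-type test of network equality would then reject for large values of a statistic such as $\max_{j \le l} |\widehat{\Delta}_{jl}| / \widehat{\sigma}_{jl}$, with critical values obtained by bootstrap.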