Predictive Independence Testing, Predictive Conditional Independence Testing, and Predictive Graphical Modelling
Testing (conditional) independence of multivariate random variables is a task central to statistical inference and modelling in general, though unfortunately one for which no practicable workflow exists to date. State-of-the-art workflows suffer from the need for heuristic or subjective manual choices, high computational complexity, or strong parametric assumptions. We address these problems by establishing a theoretical link between multivariate/conditional independence testing and model comparison in the multivariate predictive modelling task, also known as supervised learning. This link allows advances in the extensively studied supervised learning workflow to be transferred directly to independence testing workflows. These include automated tuning of the type of machine learning model, which removes the need for a heuristic choice; the ability to trade off computational demand quantitatively against accuracy; and the modern black-box philosophy for checking and interfacing. As a practical implementation of this link between the two workflows, we present a Python package, 'pcit', which implements our novel multivariate and conditional independence tests and interfaces with the supervised learning API of the scikit-learn package. The theory and the package also allow for straightforward learning of graphical model structure based on independence tests. We show empirically that our proposed predictive independence tests outperform or are on par with current practice, and that the derived graphical model structure learning algorithms asymptotically recover the 'true' graph. This paper and the accompanying 'pcit' package thus provide powerful, scalable, generalizable, and easy-to-use methods for multivariate and conditional independence testing, as well as for graphical model structure learning.
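The link between independence testing and predictive model comparison can be illustrated with a minimal sketch built on our own assumptions; it is not the 'pcit' API. To test whether X is independent of Y given Z, fit one model predicting Y from Z alone and one predicting Y from (X, Z), compare their per-sample held-out losses, and reject when adding X lowers the loss significantly. With no Z, the baseline is an uninformed, intercept-only predictor. The names `LeastSquares` and `predictive_ci_test`, the squared loss, the 5-fold split, and the normal-approximation one-sided paired test are all hypothetical, illustrative choices.

```python
import math

import numpy as np


class LeastSquares:
    """Hypothetical stand-in learner: ordinary least squares with an intercept,
    exposing a scikit-learn-style fit(X, y) / predict(X) interface."""

    def fit(self, X, y):
        A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        A = np.column_stack([np.ones(len(X)), X])
        return A @ self.coef_


def predictive_ci_test(x, y, z=None, make_model=LeastSquares, n_folds=5, seed=0):
    """Hypothetical predictive test of H0: x adds no out-of-sample predictive
    power for y beyond z (z=None gives the marginal test against an
    intercept-only, i.e. uninformed, baseline).

    Returns (mean held-out loss reduction, one-sided p-value).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    x = np.asarray(x, dtype=float).reshape(n, -1)
    z = np.empty((n, 0)) if z is None else np.asarray(z, dtype=float).reshape(n, -1)
    xz = np.hstack([x, z])

    # k-fold cross-validation: every sample gets a held-out loss from both models
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    loss_base = np.empty(n)
    loss_full = np.empty(n)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        base = make_model().fit(z[train_idx], y[train_idx])   # predicts y from z only
        full = make_model().fit(xz[train_idx], y[train_idx])  # predicts y from (x, z)
        loss_base[test_idx] = (y[test_idx] - base.predict(z[test_idx])) ** 2
        loss_full[test_idx] = (y[test_idx] - full.predict(xz[test_idx])) ** 2

    # One-sided paired comparison of per-sample losses (normal approximation
    # to a paired t-test): a large positive mean reduction is evidence against H0.
    d = loss_base - loss_full
    se = d.std(ddof=1) / math.sqrt(n)
    t = d.mean() / se if se > 0 else 0.0
    p_value = 0.5 * math.erfc(t / math.sqrt(2))
    return d.mean(), p_value


# Demo on synthetic data where y depends on x only through z,
# so x and y are marginally dependent but independent given z.
rng = np.random.default_rng(1)
z = rng.normal(size=500)
x = z + rng.normal(size=500)
y = 2 * z + rng.normal(size=500)

marginal_gain, marginal_p = predictive_ci_test(x, y)
conditional_gain, conditional_p = predictive_ci_test(x, y, z)
print(f"x vs y:      gain={marginal_gain:.3f}, p={marginal_p:.3g}")
print(f"x vs y | z:  gain={conditional_gain:.3f}, p={conditional_p:.3g}")
```

Because `make_model` only needs to return an object with `fit` and `predict`, a scikit-learn regressor could be swapped in, for example `make_model=lambda: RandomForestRegressor()` after `from sklearn.ensemble import RandomForestRegressor`. In the demo, the marginal test should reject while the conditional one should not. Note that a failure to improve only means that the chosen learner detected no dependence.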