SADA: A General Framework to Support Robust Causation Discovery with Theoretical Guarantee
Causation discovery without manipulation is considered a crucial problem to a variety of applications. The state-of-the-art solutions are applicable only when large numbers of samples are available or the problem domain is sufficiently small. Motivated by the observations of the local sparsity properties on causal structures, we propose a general Split-and-Merge framework, named SADA, to enhance the scalability of a wide class of causation discovery algorithms. In SADA, the variables are partitioned into subsets, by finding causal cut on the sparse causal structure over the variables. By running mainstream causation discovery algorithms as basic causal solvers on the subproblems, complete causal structure can be reconstructed by combining the partial results. SADA benefits from the recursive division technique, since each small subproblem generates more accurate result under the same number of samples. We theoretically prove that SADA always reduces the scales of problems without sacrifice on accuracy, under the condition of local causal sparsity and reliable conditional independence tests. We also present sufficient condition to accuracy enhancement by SADA, even when the conditional independence tests are vulnerable. Extensive experiments on both simulated and real-world datasets verify the improvements on scalability and accuracy by applying SADA together with existing causation discovery algorithms.
READ FULL TEXT