Truncated Rank-Based Tests for Two-Part Models with Excessive Zeros and Applications to Microbiome Data
High-throughput sequencing technology allows us to test for compositional differences in bacteria between populations. One important feature of human microbiome data is that they often include a large number of zeros. Such data can be treated as being generated from a two-part model that includes a point mass at zero. Motivated by the analysis of such non-negative data with excessive zeros, we introduce several truncated rank-based two-group and multi-group tests, including a truncated Wilcoxon rank-sum test for two-group comparison and two truncated Kruskal-Wallis tests for multi-group comparison. We show, both analytically through asymptotic relative efficiency analysis and by simulations, that the proposed tests have higher power than the standard rank-based tests, especially when the proportion of zeros in the data is high. The tests can also be applied to repeated measurements of compositional data via simple within-subject permutations. We apply the tests to a gut microbiome data set to compare the microbiome compositions of healthy subjects and pediatric Crohn's disease patients and to assess treatment effects on microbiome composition. We identify several bacterial genera that are missed by the standard rank-based tests.
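To make the idea of a truncated rank-sum comparison concrete, the following is a minimal sketch and not the paper's exact procedure. It assumes one common reading of "truncated": each group keeps all of its non-zero values plus only enough zeros to match the larger of the two observed non-zero proportions, and ranks are then computed on the reduced data. The function names (`midranks`, `truncated_rank_sum`, `truncated_wilcoxon_perm`) are hypothetical, and the p-value comes from a label permutation rather than from the asymptotic variance derived in the paper.

```python
import numpy as np


def midranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    values = np.asarray(values, dtype=float)
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    upper = np.cumsum(counts)                 # highest rank within each tie block
    average = upper - (counts - 1) / 2.0      # mean rank of each tie block
    return average[inverse]


def truncated_rank_sum(x, y):
    """Centered rank sum of x after dropping surplus zeros from both groups.

    Assumption (not taken from the paper): keep round(p * n) observations per
    group, where p is the larger observed non-zero proportion, so all non-zero
    values are retained and only zeros are discarded.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    p = max(np.mean(x > 0), np.mean(y > 0))
    k1 = max(int(round(p * len(x))), int(np.count_nonzero(x > 0)))
    k2 = max(int(round(p * len(y))), int(np.count_nonzero(y > 0)))
    if k1 == 0 or k2 == 0:
        return 0.0
    x_kept = np.sort(x)[::-1][:k1]            # largest k1 values: non-zeros first
    y_kept = np.sort(y)[::-1][:k2]
    ranks = midranks(np.concatenate([x_kept, y_kept]))
    rank_sum = ranks[:k1].sum()
    return rank_sum - k1 * (k1 + k2 + 1) / 2.0  # subtract the null expectation


def truncated_wilcoxon_perm(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the truncated rank-sum statistic."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])
    observed = truncated_rank_sum(x, y)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        stat = truncated_rank_sum(shuffled[: len(x)], shuffled[len(x):])
        if abs(stat) >= abs(observed) - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Synthetic illustration only: two zero-inflated groups, not real microbiome data.
    group_a = np.array([0] * 15 + [1, 2, 3, 4, 5], dtype=float)
    group_b = np.array([0] * 5 + list(range(10, 25)), dtype=float)
    statistic, p_value = truncated_wilcoxon_perm(group_a, group_b)
    print(statistic, p_value)
```

Because the truncation step is recomputed inside every permutation, the permutation p-value remains valid regardless of how the statistic is scaled; the paper's analytical tests would replace this loop with a normal approximation.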