De novo construction of q-ploid linkage maps using discrete graphical models
Linkage maps are important for fundamental and applied genetic research. New sequencing techniques have created opportunities to substantially increase the density of genetic markers. Such advances in technology bring new challenges in methodology and informatics. In this article, we introduce a novel linkage map algorithm to construct high-quality, high-density linkage maps for diploid and polyploid species. We propose to construct linkage maps using graphical models, either via a sparse Gaussian copula or via a nonparanormal skeptic approach. Linkage groups (LGs), typically chromosomes, and the order of markers in each LG are determined by revealing the conditional independence relationships among a large number of markers in the genome. We illustrate the efficiency of the inference method on a broad range of synthetic data with varying rates of missingness and genotyping errors. We show that our method outperforms other available methods in determining the correct number of linkage groups and in ordering markers, both when the data are clean and complete and when they are noisy and incomplete. In addition, we apply the method to real genotype data of barley and potato from diploid and tetraploid populations, respectively. Given that most tetraploid potato linkage maps have been generated either from diploid populations (Felcher et al., 2012) or from a subset of marker types (e.g. markers for which both parents were heterozygous) (Grandke et al., 2017), a map construction method based on discrete graphical models opens up the opportunity to construct high-quality linkage maps for any biparental diploid or polyploid species using all marker types. We have implemented the method in the R package netgwas (Behrouzi and Wit, 2017b).
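To make the nonparanormal skeptic idea concrete, the following is a minimal Python sketch, not the netgwas implementation. It estimates a latent correlation between each pair of markers as sin(pi/2 * tau), where tau is Kendall's tau, and then groups markers into candidate linkage groups. Two simplifications are assumptions of this sketch: the grouping thresholds the marginal skeptic correlations, whereas the method described above infers conditional independence from a sparse graphical model, and the genotype dosages, the threshold of 0.5, and all function names are hypothetical.

```python
import math


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau_b(x, y):
    """Kendall's tau-b, which corrects for the ties common in genotype dosages."""
    s = nx = ny = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            nx += dx != 0          # pairs not tied in x
            ny += dy != 0          # pairs not tied in y
            s += sign(dx) * sign(dy)  # concordant minus discordant
    return s / math.sqrt(nx * ny) if nx and ny else 0.0


def skeptic_matrix(markers):
    """Skeptic estimate of the latent correlation: sin(pi/2 * tau)."""
    p = len(markers)
    corr = [[1.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(a + 1, p):
            r = math.sin(math.pi / 2 * kendall_tau_b(markers[a], markers[b]))
            corr[a][b] = corr[b][a] = r
    return corr


def linkage_groups(corr, threshold):
    """Connected components of the graph with edges |corr| >= threshold.

    Simplification: the paper's approach uses a sparse precision matrix
    (conditional independence), not marginal thresholding as done here.
    """
    p = len(corr)
    seen, groups = set(), []
    for start in range(p):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            a = stack.pop()
            group.append(a)
            for b in range(p):
                if b not in seen and abs(corr[a][b]) >= threshold:
                    seen.add(b)
                    stack.append(b)
        groups.append(sorted(group))
    return groups


# Hypothetical diploid dosages (0, 1, 2) for 9 individuals and 4 markers.
# Markers 0 and 2 track one locus, markers 1 and 3 another; each pair
# differs in a single individual.
markers = [
    [0, 0, 0, 1, 1, 1, 2, 2, 2],
    [0, 1, 2, 0, 1, 2, 0, 1, 2],
    [0, 0, 0, 0, 1, 1, 2, 2, 2],
    [0, 1, 2, 0, 1, 2, 0, 1, 1],
]

corr = skeptic_matrix(markers)
groups = linkage_groups(corr, threshold=0.5)
print(groups)  # [[0, 2], [1, 3]]
```

In this toy data the two marker pairs are recovered as separate groups even though they are interleaved in the input. Ordering markers within a group, which the method above also addresses, is outside the scope of this sketch.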