An Approximation Algorithm for Ancestral Maximum-Likelihood and Phylogeography Inference Problems under Time Reversible Markov Evolutionary Models
The ancestral maximum-likelihood and phylogeography problems are two fundamental problems in evolutionary studies. The ancestral maximum-likelihood problem asks for a rooted tree, together with sequences for its internal nodes, that maximizes the probability of observing a given set of sequences at the leaves. The phylogeography problem extends the ancestral maximum-likelihood problem to incorporate the geolocations of leaf and internal nodes. While a constant-factor approximation algorithm has been established for the ancestral maximum-likelihood problem on two-state sequences, no such algorithm has been devised for any generalized instance of the problem. In this paper, we focus on a generalization of the two-state model: time-reversible Markov evolutionary models for sequences and geolocations. Under this evolutionary model, we present a (2 log_2 k)-approximation algorithm, where k is the number of input samples, that addresses both the ancestral maximum-likelihood and phylogeography problems. This is the first approximation algorithm for the phylogeography problem. Furthermore, we show how to apply the algorithm to popular evolutionary models such as the generalized time-reversible (GTR) model and its specialization, Jukes and Cantor 69 (JC69).
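To make the objective concrete, the following is a minimal sketch of how the likelihood of one fully labeled tree could be scored under JC69. It is not the approximation algorithm of the paper; it only evaluates a fixed candidate solution. The function names, the edge-list representation, and the choice to measure branch lengths in expected substitutions per site are assumptions made for illustration. The root term is omitted, since under JC69 the stationary distribution is uniform and contributes only a constant for a fixed sequence length.

```python
import math

def jc69_prob(a, b, d):
    """JC69 probability that state a is observed as state b after a branch
    of length d (expected substitutions per site, assumed d > 0)."""
    decay = math.exp(-4.0 * d / 3.0)
    if a == b:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay

def edge_log_likelihood(parent_seq, child_seq, d):
    """Sum of per-site log transition probabilities along one edge."""
    return sum(math.log(jc69_prob(a, b, d))
               for a, b in zip(parent_seq, child_seq))

def tree_log_likelihood(sequences, edges):
    """Log-likelihood of a labeled tree (hypothetical representation).

    sequences: dict mapping node name -> sequence (leaves and internal nodes)
    edges: list of (parent, child, branch_length) tuples
    """
    return sum(edge_log_likelihood(sequences[p], sequences[c], d)
               for p, c, d in edges)

# Two leaves under one root; compare two candidate root labelings.
edges = [("root", "x", 0.1), ("root", "y", 0.1)]
leaves = {"x": "ACGT", "y": "ACGA"}
good = tree_log_likelihood({**leaves, "root": "ACGT"}, edges)
bad = tree_log_likelihood({**leaves, "root": "TTTT"}, edges)
print(good > bad)
```

The ancestral maximum-likelihood problem searches over tree topologies, internal labelings, and branch lengths for the maximum of a score like this one; the sketch fixes the first and last and varies only the root label.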