TY - JOUR
T1 - A Likelihood-Ratio Test for Lumpability of Phylogenetic Data
T2 - Is the Markovian Property of an Evolutionary Process Retained in Recoded DNA?
AU - Vera-Ruiz, Victor A.
AU - Robinson, John
AU - Jermiin, Lars S.
N1 - Publisher Copyright:
© 2021 The Author(s) 2021. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved.
PY - 2022/5/1
Y1 - 2022/5/1
N2 - In molecular phylogenetics, it is typically assumed that the evolutionary process for DNA can be approximated by independent and identically distributed Markovian processes at the variable sites and that these processes diverge over the edges of a rooted bifurcating tree. Sometimes the nucleotides are transformed from a 4-state alphabet to a 3- or 2-state alphabet by a procedure that is called recoding, lumping, or grouping of states. Here, we introduce a likelihood-ratio test for lumpability for DNA that has diverged under different Markovian conditions, which assesses the assumption that the Markovian property of the evolutionary process over each edge is retained after recoding of the nucleotides. The test is derived and validated numerically on simulated data. To demonstrate the insights that can be gained by using the test, we assessed two published data sets, one of mitochondrial DNA from a phylogenetic study of the ratites and the other of nuclear DNA from a phylogenetic study of yeast. Our analysis of these data sets revealed that recoding of the DNA eliminated some of the compositional heterogeneity detected over the sequences. However, the Markovian property of the original evolutionary process was not retained by the recoding, leading to some significant distortions of edge lengths in reconstructed trees. [Evolutionary processes; likelihood-ratio test; lumpability; Markovian processes; Markov models; phylogeny; recoding of nucleotides.]
UR - http://www.scopus.com/inward/record.url?scp=85128493539&partnerID=8YFLogxK
U2 - 10.1093/sysbio/syab074
DO - 10.1093/sysbio/syab074
M3 - Article
SN - 1063-5157
VL - 71
SP - 660
EP - 675
JO - Systematic Biology
JF - Systematic Biology
IS - 3
ER -