TY - JOUR
T1 - Linking branch lengths across sets of loci provides the highest statistical support for phylogenetic inference
AU - Duchêne, David A.
AU - Tong, K. Jun
AU - Foster, Charles S.P.
AU - Duchêne, Sebastián
AU - Lanfear, Robert
AU - Ho, Simon Y.W.
N1 - Publisher Copyright: © 2019 The Author(s). Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved.
PY - 2020/4/1
Y1 - 2020/4/1
AB - Evolution leaves heterogeneous patterns of nucleotide variation across the genome, with different loci subject to varying degrees of mutation, selection, and drift. In phylogenetics, the potential impacts of partitioning sequence data for the assignment of substitution models are well appreciated. In contrast, the treatment of branch lengths has received far less attention. In this study, we examined the effects of linking and unlinking branch-length parameters across loci or subsets of loci. By analyzing a range of empirical data sets, we find consistent support for a model in which branch lengths are proportionate between subsets of loci: gene trees share the same pattern of branch lengths, but form subsets that vary in their overall tree lengths. These models had substantially better statistical support than models that assume identical branch lengths across gene trees, or those in which genes form subsets with distinct branch-length patterns. We show using simulations and empirical data that the complexity of the branch-length model with the highest support depends on the length of the sequence alignment and on the numbers of taxa and loci in the data set. Our findings suggest that models in which branch lengths are proportionate between subsets have the highest statistical support under the conditions that are most commonly seen in practice. The results of our study have implications for model selection, computational efficiency, and experimental design in phylogenomics.
KW - among-lineage rate variation
KW - data partitioning
KW - model selection
KW - phylogenomics
KW - substitution model
UR - http://www.scopus.com/inward/record.url?scp=85082146671&partnerID=8YFLogxK
U2 - 10.1093/molbev/msz291
DO - 10.1093/molbev/msz291
M3 - Article
SN - 0737-4038
VL - 37
SP - 1202
EP - 1210
JO - Molecular Biology and Evolution
JF - Molecular Biology and Evolution
IS - 4
ER -