TY - JOUR
T1 - Phylogeny analysis from gene-order data with massive duplications
AU - Zhou, Lingxi
AU - Lin, Yu
AU - Feng, Bing
AU - Zhao, Jieyi
AU - Tang, Jijun
N1 - Publisher Copyright: © 2017 The Author(s).
PY - 2017/10/16
Y1 - 2017/10/16
AB - Background: Gene order changes, under rearrangements, insertions, deletions and duplications, have been used as a new type of data source for phylogenetic reconstruction. Because these changes are rare compared to sequence mutations, they allow the inference of phylogeny further back in evolutionary time. There exist many computational methods for the reconstruction of gene-order phylogenies, including widely used maximum parsimony methods and maximum likelihood methods. However, both methods face challenges in handling large genomes with many duplicated genes, especially in the presence of whole genome duplication. Methods: In this paper, we present three simple yet powerful methods based on maximum-likelihood (ML) approaches that encode multiplicities of both gene adjacency and gene content information for phylogenetic reconstruction. Results: Extensive experiments on simulated data sets show that our new method achieves the most accurate phylogenies compared to existing approaches. We also evaluate our method on real whole-genome data from eleven mammals. The package is publicly accessible at http://www.geneorder.org. Conclusions: Our new encoding schemes successfully incorporate the multiplicity information of gene adjacencies and gene content into an ML framework, and show promising results in reconstructing phylogenies for whole-genome data in the presence of massive duplications.
KW - Maximum likelihood
KW - Phylogeny reconstruction
KW - Variable length binary encoding
KW - Whole genome duplication
UR - http://www.scopus.com/inward/record.url?scp=85031496377&partnerID=8YFLogxK
U2 - 10.1186/s12864-017-4129-0
DO - 10.1186/s12864-017-4129-0
M3 - Article
SN - 1471-2164
VL - 18
JO - BMC Genomics
JF - BMC Genomics
M1 - 760
ER -