TY - JOUR
T1 - AliSim
T2 - A Fast and Versatile Phylogenetic Sequence Simulator for the Genomic Era
AU - Ly-Trong, Nhan
AU - Naser-Khdour, Suha
AU - Lanfear, Robert
AU - Minh, Bui Quang
N1 - Publisher Copyright:
© 2022 The Author(s) 2022. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.
PY - 2022/5/3
Y1 - 2022/5/3
N2 - Sequence simulators play an important role in phylogenetics. Simulated data has many applications, such as evaluating the performance of different methods, hypothesis testing with parametric bootstraps, and, more recently, generating data for training machine-learning applications. Many sequence simulation programmes exist, but the most feature-rich programmes tend to be rather slow, and the fastest programmes tend to be feature-poor. Here, we introduce AliSim, a new tool that can efficiently simulate biologically realistic alignments under a large range of complex evolutionary models. To achieve high performance across a wide range of simulation conditions, AliSim implements an adaptive approach that combines the commonly used rate matrix and probability matrix approaches. AliSim takes 1.4 h and 1.3 GB RAM to simulate alignments with one million sequences or sites, whereas popular software Seq-Gen, Dawg, and INDELible require 2-5 h and 50-500 GB of RAM. We provide AliSim as an extension of the IQ-TREE software version 2.2, freely available at www.iqtree.org, and a comprehensive user tutorial at http://www.iqtree.org/doc/AliSim.
AB - Sequence simulators play an important role in phylogenetics. Simulated data has many applications, such as evaluating the performance of different methods, hypothesis testing with parametric bootstraps, and, more recently, generating data for training machine-learning applications. Many sequence simulation programmes exist, but the most feature-rich programmes tend to be rather slow, and the fastest programmes tend to be feature-poor. Here, we introduce AliSim, a new tool that can efficiently simulate biologically realistic alignments under a large range of complex evolutionary models. To achieve high performance across a wide range of simulation conditions, AliSim implements an adaptive approach that combines the commonly used rate matrix and probability matrix approaches. AliSim takes 1.4 h and 1.3 GB RAM to simulate alignments with one million sequences or sites, whereas popular software Seq-Gen, Dawg, and INDELible require 2-5 h and 50-500 GB of RAM. We provide AliSim as an extension of the IQ-TREE software version 2.2, freely available at www.iqtree.org, and a comprehensive user tutorial at http://www.iqtree.org/doc/AliSim.
KW - molecular evolution
KW - phylogenetics
KW - sequence simulation
UR - http://www.scopus.com/inward/record.url?scp=85130642639&partnerID=8YFLogxK
U2 - 10.1093/molbev/msac092
DO - 10.1093/molbev/msac092
M3 - Article
SN - 0737-4038
VL - 39
JO - Molecular Biology and Evolution
JF - Molecular Biology and Evolution
IS - 5
M1 - msac092
ER -