TY - JOUR
T1 - HIV-1 Full-Genome Phylogenetics of Generalized Epidemics in Sub-Saharan Africa
T2 - Impact of Missing Nucleotide Characters in Next-Generation Sequences
AU - Ratmann, Oliver
AU - Wymant, Chris
AU - Colijn, Caroline
AU - Danaviah, Siva
AU - Essex, Max
AU - Frost, Simon
AU - Gall, Astrid
AU - Gaseitsiwe, Simani
AU - Grabowski, Mary K.
AU - Gray, Ronald
AU - Guindon, Stephane
AU - Von Haeseler, Arndt
AU - Kaleebu, Pontiano
AU - Kendall, Michelle
AU - Kozlov, Alexey
AU - Manasa, Justen
AU - Minh, Bui Quang
AU - Moyo, Sikhulile
AU - Novitsky, Vlad
AU - Nsubuga, Rebecca
AU - Pillay, Sureshnee
AU - Quinn, Thomas C.
AU - Serwadda, David
AU - Ssemwanga, Deogratius
AU - Stamatakis, Alexandros
AU - Trifinopoulos, Jana
AU - Wawer, Maria
AU - Brown, Andy Leigh
AU - De Oliveira, Tulio
AU - Kellam, Paul
AU - Pillay, Deenan
AU - Fraser, Christophe
N1 - Publisher Copyright: © Copyright 2017, Mary Ann Liebert, Inc.
PY - 2017/11
Y1 - 2017/11
N2 - To characterize HIV-1 transmission dynamics in regions where the burden of HIV-1 is greatest, the "Phylogenetics and Networks for Generalised HIV Epidemics in Africa" consortium (PANGEA-HIV) is sequencing full-genome viral isolates from across sub-Saharan Africa. We report the first 3,985 PANGEA-HIV consensus sequences from four cohort sites (Rakai Community Cohort Study, n = 2,833; MRC/UVRI Uganda, n = 701; Mochudi Prevention Project, n = 359; Africa Health Research Institute Resistance Cohort, n = 92). Next-generation sequencing success rates varied: more than 80% of the viral genome from the gag to the nef genes could be determined for all sequences from South Africa, 75% of sequences from Mochudi, 60% of sequences from MRC/UVRI Uganda, and 22% of sequences from Rakai. Partial sequencing failure was primarily associated with low viral load, increased for amplicons closer to the 3′ end of the genome, was not associated with subtype diversity except HIV-1 subtype D, and remained significantly associated with sampling location after controlling for other factors. We assessed the impact of the missing data patterns in PANGEA-HIV sequences on phylogeny reconstruction in simulations. We found a threshold in terms of taxon sampling below which the patchy distribution of missing characters in next-generation sequences (NGS) has an excess negative impact on the accuracy of HIV-1 phylogeny reconstruction, which is attributable to tree reconstruction artifacts that accumulate when branches in viral trees are long. The large number of PANGEA-HIV sequences provides unprecedented opportunities for evaluating HIV-1 transmission dynamics across sub-Saharan Africa and identifying prevention opportunities. Molecular epidemiological analyses of these data must proceed cautiously because sequence sampling remains below the identified threshold and a considerable negative impact of missing characters on phylogeny reconstruction is expected.
KW - human immunodeficiency virus
KW - phylogenomics
KW - phylodynamics
KW - HIV-1 molecular epidemiology
KW - sub-Saharan Africa
KW - PANGEA
UR - http://www.scopus.com/inward/record.url?scp=85032619182&partnerID=8YFLogxK
U2 - 10.1089/aid.2017.0061
DO - 10.1089/aid.2017.0061
M3 - Article
SN - 0889-2229
VL - 33
SP - 1083
EP - 1098
JO - AIDS Research and Human Retroviruses
JF - AIDS Research and Human Retroviruses
IS - 11
ER -