TY - JOUR
T1 - HaploJuice
T2 - Accurate haplotype assembly from a pool of sequences with known relative concentrations
AU - Wong, Thomas K.F.
AU - Ranjard, Louis
AU - Lin, Yu
AU - Rodrigo, Allen G.
N1 - Publisher Copyright: © 2018 The Author(s).
PY - 2018/10/22
Y1 - 2018/10/22
N2 - Background: Pooling techniques, where multiple sub-samples are mixed in a single sample, are widely used to take full advantage of high-throughput DNA sequencing. Recently, Ranjard et al. (PLoS ONE 13:0195090, 2018) proposed a pooling strategy without the use of barcodes. Three sub-samples were mixed in different known proportions (i.e. 62.5%, 25% and 12.5%), and a method was developed to use these proportions to reconstruct the three haplotypes effectively. Results: HaploJuice provides an alternative haplotype reconstruction algorithm for Ranjard et al.'s pooling strategy. HaploJuice significantly increases the accuracy by first identifying the empirical proportions of the three mixed sub-samples and then assembling the haplotypes using a dynamic programming approach. HaploJuice was evaluated against five different assembly algorithms, Hmmfreq (Ranjard et al., PLoS ONE 13:0195090, 2018), ShoRAH (Zagordi et al., BMC Bioinformatics 12:119, 2011), SAVAGE (Baaijens et al., Genome Res 27:835-848, 2017), PredictHaplo (Prabhakaran et al., IEEE/ACM Trans Comput Biol Bioinform 11:182-91, 2014) and QuRe (Prosperi and Salemi, Bioinformatics 28:132-3, 2012). Using simulated and real data sets, HaploJuice reconstructed the true sequences with the highest coverage and the lowest error rate. Conclusion: HaploJuice provides high accuracy in haplotype reconstruction, making Ranjard et al.'s pooling strategy more efficient, feasible, and applicable, with the benefit of reducing the sequencing cost.
KW - Barcode
KW - Haplotype reconstruction
KW - Pooling strategy
UR - http://www.scopus.com/inward/record.url?scp=85055209825&partnerID=8YFLogxK
U2 - 10.1186/s12859-018-2424-7
DO - 10.1186/s12859-018-2424-7
M3 - Article
SN - 1471-2105
VL - 19
JO - BMC Bioinformatics
JF - BMC Bioinformatics
IS - 1
M1 - 389
ER -