HaploJuice: Accurate haplotype assembly from a pool of sequences with known relative concentrations

Thomas K.F. Wong*, Louis Ranjard, Yu Lin, Allen G. Rodrigo

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    Abstract

    Background: Pooling techniques, where multiple sub-samples are mixed in a single sample, are widely used to take full advantage of high-throughput DNA sequencing. Recently, Ranjard et al. (PLoS ONE 13:0195090, 2018) proposed a pooling strategy without the use of barcodes. Three sub-samples were mixed in different known proportions (i.e. 62.5%, 25% and 12.5%), and a method was developed to use these proportions to reconstruct the three haplotypes effectively.

    Results: HaploJuice provides an alternative haplotype reconstruction algorithm for Ranjard et al.'s pooling strategy. HaploJuice significantly increases the accuracy by first identifying the empirical proportions of the three mixed sub-samples and then assembling the haplotypes using a dynamic programming approach. HaploJuice was evaluated against five other assembly algorithms: Hmmfreq (Ranjard et al., PLoS ONE 13:0195090, 2018), ShoRAH (Zagordi et al., BMC Bioinformatics 12:119, 2011), SAVAGE (Baaijens et al., Genome Res 27:835-848, 2017), PredictHaplo (Prabhakaran et al., IEEE/ACM Trans Comput Biol Bioinform 11:182-91, 2014) and QuRe (Prosperi and Salemi, Bioinformatics 28:132-3, 2012). Using simulated and real data sets, HaploJuice reconstructed the true sequences with the highest coverage and the lowest error rate.

    Conclusion: HaploJuice provides high accuracy in haplotype reconstruction, making Ranjard et al.'s pooling strategy more efficient, feasible, and applicable, with the benefit of reducing the sequencing cost.
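
To illustrate why known, unequal mixing proportions make barcode-free pooling workable, the toy sketch below assigns the bases seen at a single site to the three sub-samples by matching observed base frequencies against the expected ones. It is not HaploJuice's algorithm (which estimates empirical proportions and assembles haplotypes by dynamic programming); the function names, the squared-error criterion, and the example frequencies are assumptions made purely for illustration.

```python
from itertools import product

# Known mixing proportions quoted in the abstract (62.5%, 25%, 12.5%).
PROPORTIONS = (0.625, 0.25, 0.125)
BASES = "ACGT"


def expected_freqs(assignment, proportions=PROPORTIONS):
    """Expected base frequencies at one site if sub-sample i carries assignment[i]."""
    freqs = {base: 0.0 for base in BASES}
    for base, proportion in zip(assignment, proportions):
        freqs[base] += proportion
    return freqs


def best_assignment(observed, proportions=PROPORTIONS):
    """Hypothetical helper: pick the per-sub-sample bases whose expected
    frequencies are closest (least squared error) to the observed ones."""
    def squared_error(assignment):
        expected = expected_freqs(assignment, proportions)
        return sum((observed.get(b, 0.0) - expected[b]) ** 2 for b in BASES)

    # Exhaustive search over 4^3 = 64 candidate assignments for one site.
    return min(product(BASES, repeat=len(proportions)), key=squared_error)


# Made-up observed frequencies at one site (not data from the paper).
observed = {"A": 0.64, "G": 0.36}
print(best_assignment(observed))
```

Because every subset of the three proportions sums to a distinct value (0.125, 0.25, 0.375, 0.625, 0.75, 0.875, 1.0), a noise-free frequency pattern at a variable site maps to exactly one assignment. Real reads add sampling noise and sequencing error, which is where estimating the empirical proportions becomes important.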

    Original language: English
    Article number: 389
    Journal: BMC Bioinformatics
    Volume: 19
    Issue number: 1
    Publication status: Published - 22 Oct 2018
