Abstract
Motivation: To devise a method that, unlike available methods, directly measures variations in phylogenetic signals in gene sequences that result from recombination, tests the significance of the signal variations and distinguishes misleading signals. Results: We have developed a method, that we call 'sister-scanning', for assessing phylogenetic and compositional signals in the various patterns of identity that occur between four nucleotide sequences. A Monte Carlo randomization is done for all columns (positions) within a window and Z-scores are obtained for four real sequences or three real sequences with an outlier that is also randomized. The usefulness of the approach is demonstrated using tobamovirus and luteovirus sequences. Contradictory phylogenetic signals were distinguished in both datasets, as were regions of sequence that contained no clear signal or potentially misleading signals related to compositional similarities. In the tobamovirus dataset, contradictory phylogenetic signals were separated by coding sequences up to a kilobase long that contained no clear signal. Our re-analysis of this dataset using sister-scanning also yielded the first evidence known to us of an interspecies recombination site within a viral RNA-dependent RNA polymerase gene together with evidence of an unusual pattern of conservation in the three codon positions.
Original language | English |
---|---|
Pages (from-to) | 573-582 |
Number of pages | 10 |
Journal | Bioinformatics |
Volume | 16 |
Issue number | 7 |
DOIs | |
Publication status | Published - 2000 |