Weighted k-word matches: A sequence comparison tool for proteins

J. Jing*, S. R. Wilson, C. J. Burden

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    4 Citations (Scopus)

    Abstract

    The use of k-word matches was developed as a fast, alignment-free comparison method for DNA sequences in cases where long-range contiguity has been compromised, for example, by shuffling, duplication, deletion or inversion of extended blocks of sequence. Here we extend the algorithm to amino acid sequences. We define a new statistic, the weighted word match, which reflects the varying degrees of similarity between pairs of amino acids. We computed the mean and variance, and simulated the distribution function, for various forms of this statistic for sequences of independently and identically distributed letters. We present these results and a method for choosing an optimal word size. The efficiency of the method is tested using simulated evolutionary sequences, and the results are compared with BLAST.
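
The abstract does not state the formula for the weighted word match, so the following is only a minimal sketch of the general idea, not the authors' definition. It assumes one plausible form: every pair of k-words, one from each sequence, contributes the product of per-letter similarity weights, so that 0/1 identity weights recover the plain count of exact k-word matches. The function names and the toy weights are hypothetical and are not taken from the paper or from any substitution matrix.

```python
from collections import Counter
from math import prod


def kword_counts(seq, k):
    """Count every overlapping k-letter word in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a, b, k):
    """Unweighted statistic: number of exact k-word matches between a and b."""
    ca, cb = kword_counts(a, k), kword_counts(b, k)
    return sum(n * cb[w] for w, n in ca.items())


def weighted_d2(a, b, k, weight):
    """Assumed weighted form: each pair of k-words (u from a, v from b)
    contributes the product of the per-letter weights weight(x, y)."""
    ca, cb = kword_counts(a, k), kword_counts(b, k)
    total = 0.0
    for u, nu in ca.items():
        for v, nv in cb.items():
            total += nu * nv * prod(weight(x, y) for x, y in zip(u, v))
    return total


def identity(x, y):
    """0/1 weights: with these, weighted_d2 reduces to d2."""
    return 1.0 if x == y else 0.0


def toy_weight(x, y):
    """Invented weights for illustration only: D and E count as half-similar."""
    if x == y:
        return 1.0
    return 0.5 if {x, y} == {"D", "E"} else 0.0


exact = d2("ACDA", "CDAC", 2)
soft = weighted_d2("AD", "AE", 2, toy_weight)
```

Here `exact` counts only identical words, whereas `soft` gives partial credit to the word pair AD/AE, which an exact-match count would score as zero. Looping over distinct word pairs keeps the sketch short; it is not tuned for long sequences or large k.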

    Original language: English
    Pages (from-to): C172-C189
    Journal: ANZIAM Journal
    Volume: 52
    Publication status: Published - 2010
