Word match counts between Markovian biological sequences

Conrad Burden*, Paul Leopardi, Sylvain Forêt

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

The D2 statistic, which counts the number of word matches between two given sequences, has long been proposed as a measure of similarity for biological sequences. Much of the mathematically rigorous work carried out to date on the properties of the D2 statistic has been restricted to the case of ‘Bernoulli’ sequences composed of identically and independently distributed letters. Here the properties of the distribution of this statistic are studied for the biologically more realistic case of Markovian sequences. The approach is novel in that Markovian dependency is defined for sequences with periodic boundary conditions, which enables exact analytic formulae for the mean and variance to be derived. The formulae are confirmed using numerical simulations, and asymptotic approximations to the full distribution are tested.
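
As an illustration of the word-match count described in the abstract, the following is a minimal sketch in Python. It assumes the common form of D2, in which every pair of positions (one in each sequence) where the same k-letter word starts contributes one match, so that D2 equals the sum over words of the product of the two word counts. The function names `kmer_counts` and `d2`, the `periodic` flag, and the assumption that k does not exceed either sequence length are illustrative choices and are not taken from the chapter.

```python
from collections import Counter


def kmer_counts(seq, k, periodic=False):
    """Count the k-letter words in seq.

    With periodic=True the sequence is treated as circular, so words
    may wrap around the end and there are len(seq) word positions.
    Assumes 1 <= k <= len(seq).
    """
    if periodic:
        # Append the first k-1 letters so wrapped words can be sliced directly.
        extended = seq + seq[:k - 1]
        n_words = len(seq)
    else:
        extended = seq
        n_words = len(seq) - k + 1
    return Counter(extended[i:i + k] for i in range(max(n_words, 0)))


def d2(seq_a, seq_b, k, periodic=False):
    """Number of k-word matches between seq_a and seq_b.

    Each pair (i, j) with the same word starting at position i of seq_a
    and position j of seq_b counts once, which equals the sum over
    words of count_a(word) * count_b(word).
    """
    counts_a = kmer_counts(seq_a, k, periodic)
    counts_b = kmer_counts(seq_b, k, periodic)
    return sum(count * counts_b[word] for word, count in counts_a.items())


print(d2("ACGT", "ACGT", 2))                 # 3
print(d2("ACGT", "ACGT", 2, periodic=True))  # 4
```

With periodic boundaries the word "TA" that wraps from the end of "ACGT" back to its start is also counted, which is why the second call reports one more match than the first.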

Original language: English
Title of host publication: Biomedical Engineering Systems and Technologies - 6th International Joint Conference, BIOSTEC 2013, Revised Selected Papers
Editors: Ana Fred, Hugo Gamboa, Pedro L. Fernandes, Jordi Solé-Casals, Mireya Fernández-Chimeno, Sergio Alvarez, Deborah Stacey
Publisher: Springer Verlag
Pages: 147-161
Number of pages: 15
ISBN (Electronic): 9783662444849
Publication status: Published - 2014
Event: 6th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2013 - Barcelona, Spain
Duration: 11 Feb 2013 to 14 Feb 2013

Publication series

Name: Communications in Computer and Information Science
Volume: 452
ISSN (Print): 1865-0929

Conference

Conference: 6th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2013
Country/Territory: Spain
City: Barcelona
Period: 11/02/13 to 14/02/13
