TY - GEN

T1 - The 2-interval pattern matching problems and its application to ncRNA scanning

AU - Wong, Thomas K.F.

AU - YiuT, S. M.

AU - Lam, W.

AU - Sung, Wing Kin

N1 - © Springer-Verlag Berlin Heidelberg 2009

PY - 2009

Y1 - 2009

N2 - This paper focuses on the 2-Interval pattern matching problem for{<, ⊂}-structured pattern and applies it on scanning for the ncRNAs without pseudoknots. Vialette [6] gave an O(mn3 log n) time solution to the problem, where ,n are the number of intervals in the pattern and the given 2-interval set.This solution however is not practical for scanning the secondary structure in a genome- wide or chromosome-wide scale. In this paper, we propose an efficient algorithm to solve the problem in O(mn log n) time. In order to capture morecharacteristics of the secondary structures of ncRNA families, we define a new problem by considering the distance constraints between the intervals and we can still solve it without increasing the time complexity. Experiment showed that the method to the new defined problem can result in much fewer false positives. Moreover, if we assume the only possible base pairs are {(A,U),(C,G),U,G)} which are the case for RNA molecule, we can further improve the ime complexity to O(mq), where q is the length of the input RNA sequences. From the experiment, our new method requires a reasonable time (2.5 min) to scan the whole chromosome for an ncRNA family.

AB - This paper focuses on the 2-Interval pattern matching problem for{<, ⊂}-structured pattern and applies it on scanning for the ncRNAs without pseudoknots. Vialette [6] gave an O(mn3 log n) time solution to the problem, where ,n are the number of intervals in the pattern and the given 2-interval set.This solution however is not practical for scanning the secondary structure in a genome- wide or chromosome-wide scale. In this paper, we propose an efficient algorithm to solve the problem in O(mn log n) time. In order to capture morecharacteristics of the secondary structures of ncRNA families, we define a new problem by considering the distance constraints between the intervals and we can still solve it without increasing the time complexity. Experiment showed that the method to the new defined problem can result in much fewer false positives. Moreover, if we assume the only possible base pairs are {(A,U),(C,G),U,G)} which are the case for RNA molecule, we can further improve the ime complexity to O(mq), where q is the length of the input RNA sequences. From the experiment, our new method requires a reasonable time (2.5 min) to scan the whole chromosome for an ncRNA family.

UR - http://www.scopus.com/inward/record.url?scp=68249122304&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-00727-9_10

DO - 10.1007/978-3-642-00727-9_10

M3 - Conference contribution

SN - 3642007260

SN - 9783642007262

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 79

EP - 89

BT - Bioinformatics and Computational Biology - First International Conference, BICoB 2009, Proceedings

T2 - 1st International Conference on Bioinformatics and Computational Biology, BICoB 2009

Y2 - 8 April 2009 through 10 April 2009

ER -