TY - GEN
T1 - The 2-interval pattern matching problems and its application to ncRNA scanning
AU - Wong, Thomas K.F.
AU - YiuT, S. M.
AU - Lam, W.
AU - Sung, Wing Kin
N1 - © Springer-Verlag Berlin Heidelberg 2009
PY - 2009
Y1 - 2009
N2 - This paper focuses on the 2-Interval pattern matching problem for{<, ⊂}-structured pattern and applies it on scanning for the ncRNAs without pseudoknots. Vialette [6] gave an O(mn3 log n) time solution to the problem, where ,n are the number of intervals in the pattern and the given 2-interval set.This solution however is not practical for scanning the secondary structure in a genome- wide or chromosome-wide scale. In this paper, we propose an efficient algorithm to solve the problem in O(mn log n) time. In order to capture morecharacteristics of the secondary structures of ncRNA families, we define a new problem by considering the distance constraints between the intervals and we can still solve it without increasing the time complexity. Experiment showed that the method to the new defined problem can result in much fewer false positives. Moreover, if we assume the only possible base pairs are {(A,U),(C,G),U,G)} which are the case for RNA molecule, we can further improve the ime complexity to O(mq), where q is the length of the input RNA sequences. From the experiment, our new method requires a reasonable time (2.5 min) to scan the whole chromosome for an ncRNA family.
AB - This paper focuses on the 2-Interval pattern matching problem for{<, ⊂}-structured pattern and applies it on scanning for the ncRNAs without pseudoknots. Vialette [6] gave an O(mn3 log n) time solution to the problem, where ,n are the number of intervals in the pattern and the given 2-interval set.This solution however is not practical for scanning the secondary structure in a genome- wide or chromosome-wide scale. In this paper, we propose an efficient algorithm to solve the problem in O(mn log n) time. In order to capture morecharacteristics of the secondary structures of ncRNA families, we define a new problem by considering the distance constraints between the intervals and we can still solve it without increasing the time complexity. Experiment showed that the method to the new defined problem can result in much fewer false positives. Moreover, if we assume the only possible base pairs are {(A,U),(C,G),U,G)} which are the case for RNA molecule, we can further improve the ime complexity to O(mq), where q is the length of the input RNA sequences. From the experiment, our new method requires a reasonable time (2.5 min) to scan the whole chromosome for an ncRNA family.
UR - http://www.scopus.com/inward/record.url?scp=68249122304&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-00727-9_10
DO - 10.1007/978-3-642-00727-9_10
M3 - Conference contribution
SN - 3642007260
SN - 9783642007262
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 79
EP - 89
BT - Bioinformatics and Computational Biology - First International Conference, BICoB 2009, Proceedings
T2 - 1st International Conference on Bioinformatics and Computational Biology, BICoB 2009
Y2 - 8 April 2009 through 10 April 2009
ER -