T1 - The 2-interval pattern matching problems and its application to ncRNA scanning

AU - Wong, Thomas K.F.

AU - Yiu, S. M.

AU - Lam, T. W.

AU - Sung, Wing Kin

N1 - © Springer-Verlag Berlin Heidelberg 2009

PY - 2009

Y1 - 2009

N2 - This paper focuses on the 2-interval pattern matching problem for {<, ⊂}-structured patterns and applies it to scanning for ncRNAs without pseudoknots. Vialette [6] gave an O(mn^3 log n) time solution to the problem, where m and n are the numbers of intervals in the pattern and in the given 2-interval set, respectively. This solution, however, is not practical for scanning secondary structures on a genome-wide or chromosome-wide scale. In this paper, we propose an efficient algorithm that solves the problem in O(mn log n) time. To capture more characteristics of the secondary structures of ncRNA families, we define a new problem that considers distance constraints between the intervals, and we show that it can still be solved without increasing the time complexity. Experiments showed that the method for the newly defined problem produces far fewer false positives. Moreover, if we assume that the only possible base pairs are {(A,U), (C,G), (U,G)}, as is the case for RNA molecules, we can further improve the time complexity to O(mq), where q is the length of the input RNA sequence. In our experiments, the new method requires a reasonable time (2.5 min) to scan a whole chromosome for an ncRNA family.

