Efficient approach for finding common DNA motifs with gaps

BUET ILS
BUET Institutional Repository: Home
→
Dissertations/Theses
→
Dissertations/Theses - Department of Computer Science and Engineering
→
View Item

dc.contributor.advisor	Rahman, Dr. Atif Hasan
dc.contributor.author	Sayeed, Suri Dipannita
dc.date.accessioned	2019-12-07T09:29:05Z
dc.date.available	2019-12-07T09:29:05Z
dc.date.issued	2019-07-09
dc.identifier.uri	http://lib.buet.ac.bd:8080/xmlui/handle/123456789/5423
dc.description.abstract	Motifs are repeated patterns in groups of protein or nucleic acid sequences and motif discovery is an important and challenging problem in computational biology. This thesis formulates the gapped motif finding problem as multiple longest common sub-sequence (MLCS) problem and presents an algorithm which solves both of them. The algorithm is based on branch and bound strategy and solves the problem recursively. Motif finding has been widely studied and several variants have been proposed. It is the problem of identifying recurring patterns in sequences. Here, we address the problem of finding Common Motifs with Gaps (CMG) that are present in all strings of a finite set. Searching the Longest Common Subsequences (LCS) among a set of biosequences is another fundamental problem in bioinformatics. This is a classical NP-hard problem. In this thesis, we prove that the CMG problem is NP-hard by reducing the MLCS problem to it. To provide efficient exact solution for both of the problems we give a novel algorithm based on branch and bound method. We propose a preprocessing strategy and a data structure based on that preprocessing part. This preprocessed data structure reduces the total space consumption significantly as no additional data structure is required during simulation of the algorithm. We show the result of practical analysis on simulated sequences that our algorithm outperform all the other existing approaches for solving MLCS problem in terms of space. Our implementation of the algorithm also shows promising results in terms of time compared to some extensively used parallel algorithms. We also show how the algorithm can be extended to give an algorithm for CMG after common factors that occur in all the strings have been identified. We have also implemented the algorithm for CMG and it can solve the CMG problem efficiently.	en_US
dc.language.iso	en	en_US
dc.publisher	Department of computer Science and Engineering	en_US
dc.subject	Algorithms-DNA-RNA sequence	en_US
dc.title	Efficient approach for finding common DNA motifs with gaps	en_US
dc.type	Thesis-MSc	en_US
dc.contributor.id	0416052004	en_US
dc.identifier.accessionNumber	117217
dc.contributor.callno	006.31/SUR/2019	en_US