dc.contributor.advisor |
Rahman, Dr. Atif Hasan |
|
dc.contributor.author |
Sayeed, Suri Dipannita |
|
dc.date.accessioned |
2019-12-07T09:29:05Z |
|
dc.date.available |
2019-12-07T09:29:05Z |
|
dc.date.issued |
2019-07-09 |
|
dc.identifier.uri |
http://lib.buet.ac.bd:8080/xmlui/handle/123456789/5423 |
|
dc.description.abstract |
Motifs are repeated patterns in groups of protein or nucleic acid sequences and motif discovery is an important and challenging problem in computational biology. This thesis formulates the gapped motif finding problem as multiple longest common sub-sequence (MLCS) problem and presents an algorithm which solves both of them. The algorithm is based on branch and bound strategy and solves the problem recursively.
Motif finding has been widely studied and several variants have been proposed. It is the problem of identifying recurring patterns in sequences. Here, we address the problem of finding Common Motifs with Gaps (CMG) that are present in all strings of a finite set. Searching the Longest Common Subsequences (LCS) among a set of biosequences is another fundamental problem in bioinformatics. This is a classical NP-hard problem. In this thesis, we prove that the CMG problem is NP-hard by reducing the MLCS problem to it.
To provide efficient exact solution for both of the problems we give a novel algorithm based on branch and bound method. We propose a preprocessing strategy and a data structure based on that preprocessing part. This preprocessed data structure reduces the total space consumption significantly as no additional data structure is required during simulation of the algorithm. We show the result of practical analysis on simulated sequences that our algorithm outperform all the other existing approaches for solving MLCS problem in terms of space. Our implementation of the algorithm also shows promising results in terms of time compared to some extensively used parallel algorithms. We also show how the algorithm can be extended to give an algorithm for CMG after common factors that occur in all the strings have been identified. We have also implemented the algorithm for CMG and it can solve the CMG problem efficiently. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
Department of computer Science and Engineering |
en_US |
dc.subject |
Algorithms-DNA-RNA sequence |
en_US |
dc.title |
Efficient approach for finding common DNA motifs with gaps |
en_US |
dc.type |
Thesis-MSc |
en_US |
dc.contributor.id |
0416052004 |
en_US |
dc.identifier.accessionNumber |
117217 |
|
dc.contributor.callno |
006.31/SUR/2019 |
en_US |