DSpace Repository

Efficient approach for finding common DNA motifs with gaps

Show simple item record

dc.contributor.advisor Rahman, Dr. Atif Hasan
dc.contributor.author Sayeed, Suri Dipannita
dc.date.accessioned 2019-12-07T09:29:05Z
dc.date.available 2019-12-07T09:29:05Z
dc.date.issued 2019-07-09
dc.identifier.uri http://lib.buet.ac.bd:8080/xmlui/handle/123456789/5423
dc.description.abstract Motifs are repeated patterns in groups of protein or nucleic acid sequences and motif discovery is an important and challenging problem in computational biology. This thesis formulates the gapped motif finding problem as multiple longest common sub-sequence (MLCS) problem and presents an algorithm which solves both of them. The algorithm is based on branch and bound strategy and solves the problem recursively. Motif finding has been widely studied and several variants have been proposed. It is the problem of identifying recurring patterns in sequences. Here, we address the problem of finding Common Motifs with Gaps (CMG) that are present in all strings of a finite set. Searching the Longest Common Subsequences (LCS) among a set of biosequences is another fundamental problem in bioinformatics. This is a classical NP-hard problem. In this thesis, we prove that the CMG problem is NP-hard by reducing the MLCS problem to it. To provide efficient exact solution for both of the problems we give a novel algorithm based on branch and bound method. We propose a preprocessing strategy and a data structure based on that preprocessing part. This preprocessed data structure reduces the total space consumption significantly as no additional data structure is required during simulation of the algorithm. We show the result of practical analysis on simulated sequences that our algorithm outperform all the other existing approaches for solving MLCS problem in terms of space. Our implementation of the algorithm also shows promising results in terms of time compared to some extensively used parallel algorithms. We also show how the algorithm can be extended to give an algorithm for CMG after common factors that occur in all the strings have been identified. We have also implemented the algorithm for CMG and it can solve the CMG problem efficiently. en_US
dc.language.iso en en_US
dc.publisher Department of computer Science and Engineering en_US
dc.subject Algorithms-DNA-RNA sequence en_US
dc.title Efficient approach for finding common DNA motifs with gaps en_US
dc.type Thesis-MSc en_US
dc.contributor.id 0416052004 en_US
dc.identifier.accessionNumber 117217
dc.contributor.callno 006.31/SUR/2019 en_US


Files in this item

This item appears in the following Collection(s)

Show simple item record

Search BUET IR


Advanced Search

Browse

My Account