Consensus string matching problem and the diagnosis of allelic heterogeneity

BUET ILS
BUET Institutional Repository: Home
→
Dissertations/Theses
→
Dissertations/Theses - Department of Computer Science and Engineering
→
View Item

Consensus string matching problem and the diagnosis of allelic heterogeneity

Fatema Tuz Zohora

URI: http://lib.buet.ac.bd:8080/xmlui/handle/123456789/643

Date: 2014-10

Abstract:

The consensus problems in strings is motivated by the requirement of nding commonality of a large number of strings and has a variety of applications in Bioinformatics. This thesis presents important theoretical and algorithmic results determining the complexity class of the consensus string problem and provides a road map for diagnosing unknown genetic diseases that show Allelic Heterogeneity, a case where a normal gene mutates in di erent orders resulting in two different gene sequences causing two di erent genetic diseases. In this thesis, we rst show the NP-hardness of the consensus string problem under a well known mutation type, namely transposition as the distance metric. Then we propose a polynomial time algorithm for the relaxed version of the problem which determines the existence of a consensus sequence given two input sequences under the inversion and trans-position metric. Our algorithm detects the existence of a common ancestor gene sequence given two input DNA sequences with theoretical worst case time complexity of O(n4) for both the non-overlapping inversion (reversed complement) metric and transposition met-ric. Here n is the common length of the input sequences. However, for both the inversion and transposition metric, practically the average and worst case time complexity have been found to be O(n2) and O(n3) respectively, where the worst case occurs when both input sequences have similarity of around 90%. Similarly, theoretical worst case space complexity is O(n3) for both the inversion and transposition metric, whereas it is O(n2) practically for the inversion metric. Finally, we present a pathway of detecting Allelic Heterogeneity, a challenging genetic disease, using our algorithm.

Show full item record

Files in this item

Name: thesis_FTZ.pdf

Size: 2.876Mb

Format: PDF

View/Open

This item appears in the following Collection(s)

Dissertations/Theses - Department of Computer Science and Engineering
Post graduate dissertations (Theses) of Computer Science Engineering (CSE)

Consensus string matching problem and the diagnosis of allelic heterogeneity

Consensus string matching problem and the diagnosis of allelic heterogeneity

Abstract:

Files in this item

This item appears in the following Collection(s)

Search BUET IR

Browse

All of IR

This Collection

My Account