Abstract:
String matching and sequence alignment problems are classical problems in Computer Science
with extensive applications in di erent branches of science and engineering. These problems are
interesting as fundamental computer science problems and are considered as basic requirements
in many practical applications. They in fact appear in almost every textbook of algorithms
and data structures. Circular strings or sequences appear in a number of biological contexts,
in all domains of life: bacteria, archaea, and eukaryotes; and in viruses.
This thesis deals with three pattern matching problems, namely, Classical Pattern Match-
ing problem, Circular Pattern Matching (CPM) problem and Circular Sequence Comparison
(CSC). Here, we present some ltering techniques to solve these problems. At rst, we propose
a concept of a ltering pattern signature. Using this concept we develop an algorithm for
search space reduction of the text string. Then we develop algorithms for circular strings and
sequences. Our lters are simple and light-weight. Here light-weight means that executing
the lters will be computationally easy and memory e cient. These lters will ensure that
there are no false negatives; however, there could be false positives which will be handled in
a subsequent processing step. In the sequel, only correct solutions will be determined by the
combined algorithm. Our algorithms have been implemented and rigorously tested using real
genome datasets. We compare our algorithms with the state of the art algorithms and the
results are found to be excellent.