Abstract:
Finding patterns and characterizing regularities in strings are in the heart of research in
the field of stringology. These problems are interesting as fundamental computer science
problems and often turn out to be important and useful in many applications. In case of
string regularities periods, repeats, and covers are the most common forms and there are
several efficient algorithms for computing regularities in a string.
In this thesis, we focus on a generalization of the regular string commonly known
as degenerate strings. A degenerate string can be represented as a sequence T
T[lJT[2] ... T[nj, where T[iJ <;; E for each i, and E is a given alphabet. It is an
extended notion of the regular string which is very much applicable especially in the
context of computational biology. Pattern matching problem for degenerate string has
also been investigated in the literature albeit with little success. The lack of any efficient
algorithms for degenerate pattern matching may be attributed at least partially to the
lack of fruitful research on regularities of degenerate strings. Very recently, the issue of
regularities in degenerate string has received some attention. In this thesis,we study the
problem of string regularities for degenerate strings. We present two novel and efficient
algorithms for computing cover array for degenerate strings. Both of our algorithms run
in O(n2) time and space. Using string combinatorics and probability analysis we have
further proved that expected number of both borders and covers for a degenerate string
is bounded by a constant. By using this probabilistic analysis, we have also shown that
the average running time and space complexity of our algorithms are O(n).