dc.description.abstract |
Genes, segments of long DNA molecules, store the control codes for all the activities of life, and
scientists are devoting great effort to finding the genetic codes in the cells of different living
beings, especially humans. Because of the huge volume of the databases containing the
genomes of various species, computational gene recognition tools have become essential for
the discovery and analysis of genes. Genes constitute only small portions of genomic
DNA sequences and are separated by long non-coding intergenic regions. Coding and
non-coding regions are interleaved within genes as well. The problem of gene
recognition is to identify genes in a huge volume of DNA sequence, and also to identify the
coding and non-coding regions inside the gene. This thesis describes a new and simple
Hidden Markov Model based system, HMMSplice, for recognition of donor and
acceptor splice sites in a genomic DNA sequence. Since identification of splice sites extracts
the coding exons and non-coding introns in a gene and thus completely reveals the structure
of a gene, this system provides substantial aid for recognizing genes in un-annotated DNA
sequences. Hidden Markov Models provide a precise probabilistic method for modeling
sequences of discrete data, and therefore seem to be a natural solution for analyzing various
sites in DNA sequences. Separate HMMs for donor and acceptor splice sites have been
designed for HMMSplice. They are trained and tested with real data, and the results of the
experiments are discussed. Since a complete understanding of the biological process that
recognizes and utilizes genes to synthesize proteins is essential to develop as well as
understand a gene recognition system, a comprehensive discussion of protein synthesis is
provided. The features of splice sites that are considered in the development of the models are
discussed in detail. |
en_US |