Abstract:
A phylogenetic tree represents the evolutionary relationship among a group of species. The
‘quartet-based’ phylogenetic tree construction refers to the method of combining many ‘quartets’
(a phylogenetic tree relating 4 species) into a single phylogenetic tree. In this thesis
we present a new algorithm for ‘quartet-based’ phylogenetic tree construction and show its
superiority in accuracy over the current best method for this problem.
Phylogenetic tree construction methods are inherently computationally very intensive, and
usually can be applied to limited number of species. But the ultimate goal of phylogenetic
reconstruction is to infer the phylogeny involving all lives on earth, i.e., to infer the ‘Tree of
Life’. The ‘supertree’ method (constructing larger tree(s) from many smaller trees) has been
identified as a reasonable solution in this regard, as these methods are computationally less
intensive compared to other existing phylogenetic tree contruction methods (such as maximum
likelihood). Over the past decade, supertree construction has become an area of active theoretical
and practical research. A ‘quartet’ is the the basic piece of phylogenetic information,
so the quartet-based supertree method is responsible for combining many minimal pieces of
information into a single, coherent, and more comprehensive piece of information. Since the
quartet-based supertree methods are inherently computationally less intensive compared to the
other approaches of phylogenetic tree construction, the only challenge of such construction is
to achieve the accuracy and scalability.
In this thesis, we have devised a new quartet based supertree method, QFM (Quartet FM),
which constructs more accurate trees (in terms of topological accuracy) than the current best
quartet based method, QMC (Quartet MaxCut) [32]. The new method is also scalable to large
datasets, that is, it performs well on very large datasets without compromising the accuracy. Also, QFM is found performing same as (even better in some cases) QMC in maximizing the
objective function of the underlying optimization problem. In this thesis, we performed an
extensive experimental study to evaluate the accuracy and scalability of our algorithm on both
simulated and biological datasets. In addition, we have developed a software tool, named,
‘QTREE’ to simulate and analyze the two methods - QFM and QMC.