DSpace Repository

Improving phylogenomic analysis using naive binning with quartet-based super tree estimation


dc.contributor.advisor Bayzid, Dr. Md. Shamsuzzoha
dc.contributor.author Munmun, Farha Akhter
dc.date.accessioned 2025-04-15T06:07:47Z
dc.date.available 2025-04-15T06:07:47Z
dc.date.issued 2024-09-18
dc.identifier.uri http://lib.buet.ac.bd:8080/xmlui/handle/123456789/7038
dc.description.abstract Estimating a species tree from biomolecular sequences is extremely difficult, especially in the presence of gene tree heterogeneity resulting from incomplete lineage sorting (ILS). Two of the most popular techniques for estimating species trees are combined analysis (CA), which concatenates the multiple sequence alignments of different genes into a single supergene alignment and then estimates a tree from this alignment, and summary methods, which compute gene trees from different loci and then combine the inferred gene trees into a species tree. CA can be highly accurate in many cases because the combined gene alignments offer a high level of phylogenetic signal. However, it is agnostic about gene tree discordance (i.e., different genes having different evolutionary histories), leading to statistical inconsistency. Summary methods, on the other hand, can explicitly account for gene tree discordance and its underlying biological causes, and thus can be statistically consistent. But they do not perform well when the number of genes is limited and the gene trees are not well estimated (i.e., gene tree estimation errors are prevalent). In this study, we introduce a hybrid pipeline for species tree estimation that combines the strengths of both combined analysis and summary methods. Specifically, we have updated the process flow of a widely used quartet-based summary method called SVDquartets by combining it with an existing technique called “binning” and a highly accurate quartet amalgamation technique, wQFM. We assessed the performance of the proposed hybrid model on a collection of simulated and real biological datasets that cover a wide range of challenging model conditions with varying numbers of genes, amounts of gene tree estimation error, and levels of gene tree discordance.
Our extensive evaluation on both simulated and real biological datasets suggests that this hybrid model could be a promising approach for estimating species trees, especially in the presence of gene tree estimation error due to short gene sequences. en_US
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering (CSE), BUET en_US
dc.subject Heuristic programming en_US
dc.title Improving phylogenomic analysis using naive binning with quartet-based super tree estimation en_US
dc.type Thesis-MSc en_US
dc.contributor.id 0419052092 en_US
dc.identifier.accessionNumber 119880
dc.contributor.callno 119880 en_US
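
The abstract names "binning" and supergene concatenation without spelling out the mechanics. As a rough illustration only, and assuming that naive binning means randomly partitioning the genes into fixed-size bins and concatenating each bin's alignments into one supergene alignment (which a quartet-based method would then take as input), the step might be sketched as below. The names `concatenate`, `naive_binning`, and `bin_size` are hypothetical and are not taken from the thesis.

```python
import random
from typing import Dict, List

# One gene alignment: taxon name -> aligned sequence (all the same length).
Alignment = Dict[str, str]


def concatenate(alignments: List[Alignment], taxa: List[str]) -> Alignment:
    """Join gene alignments end to end into a single supergene alignment.

    A taxon missing from a gene is padded with gap characters so that
    every row of the supergene keeps the same length.
    Assumes every alignment contains at least one sequence.
    """
    parts: Dict[str, List[str]] = {t: [] for t in taxa}
    for aln in alignments:
        gene_len = len(next(iter(aln.values())))
        for t in taxa:
            parts[t].append(aln.get(t, "-" * gene_len))
    return {t: "".join(chunks) for t, chunks in parts.items()}


def naive_binning(
    alignments: List[Alignment],
    taxa: List[str],
    bin_size: int,
    seed: int = 0,
) -> List[Alignment]:
    """Randomly partition genes into bins of `bin_size` (assumed scheme)
    and return one concatenated supergene alignment per bin.

    The last bin may be smaller if the gene count is not a multiple
    of `bin_size`.
    """
    if bin_size < 1:
        raise ValueError("bin_size must be at least 1")
    order = list(range(len(alignments)))
    random.Random(seed).shuffle(order)  # seeded for reproducibility
    bins = [order[i:i + bin_size] for i in range(0, len(order), bin_size)]
    return [concatenate([alignments[j] for j in b], taxa) for b in bins]
```

Each returned supergene alignment could then be handed to a tree or quartet estimation tool; that step is outside this sketch, which makes no claim about how the thesis implements it.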

