DSpace Repository

Prediction of protein-carbohydrate binding sites from protein primary sequence

dc.contributor.advisor Rahman, Dr. Mohammad Saifur
dc.contributor.author Nawar, Quazi Farah
dc.date.accessioned 2024-06-30T04:25:03Z
dc.date.available 2024-06-30T04:25:03Z
dc.date.issued 2023-09-23
dc.identifier.uri http://lib.buet.ac.bd:8080/xmlui/handle/123456789/6757
dc.description.abstract A protein is a large, complex macromolecule that plays many crucial roles in the human body, as it performs most of the work in cells and tissues. It consists of one or more long chains of amino acid residues. Carbohydrates are another important class of biomolecules, alongside DNA and proteins, and they interact with proteins to facilitate various biological processes. Several biochemical experiments exist to study protein-carbohydrate interactions, but they are expensive, time-consuming, and challenging. As a result of swift advancements in sequencing technologies, the number of known protein sequences has grown exponentially. Therefore, developing computational techniques that effectively predict protein-carbohydrate binding interactions from known protein sequences has emerged as a prominent new area of study. Most computational approaches for protein-carbohydrate binding site prediction are biased towards the negative class, because carbohydrate-binding residues are considerably fewer than non-carbohydrate-binding residues in the benchmark datasets. In this thesis, we introduce an effective ensemble machine learning model called ‘StackCBEmbed’ for the accurate residue-level classification of protein-carbohydrate binding interactions in known protein sequences. StackCBEmbed shows more balanced behavior than state-of-the-art methods, accurately predicting both positive and negative data points. Our research used a benchmark training dataset and two separate independent test sets. Using the Incremental Feature Selection method, we identified crucial sequence-based features and selected the most impactful ones.
Furthermore, we integrated embedding features from a pre-trained transformer-based protein language model, ‘ProtT5-XL-Uniref50’. To the best of our knowledge, this is the first attempt to use a protein language model for predicting protein-carbohydrate binding interactions. StackCBEmbed achieved sensitivity, specificity, and balanced accuracy scores of 0.691, 0.849, and 0.769 on the first independent test set, and 0.627, 0.835, and 0.731 on the second. Compared with earlier prediction models benchmarked on the same datasets, our reported results are significantly better. We therefore hope that StackCBEmbed will help discover novel protein-carbohydrate interactions and advance related research fields. StackCBEmbed is freely available as Python scripts at https://github.com/farah5112github/StackCBEmbed en_US
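The abstract motivates StackCBEmbed by the bias of earlier predictors towards the negative (non-binding) class and reports sensitivity, specificity, and balanced accuracy. The sketch below is a minimal illustration, assuming the standard definitions (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), balanced accuracy = their mean); it is not code from the StackCBEmbed repository, and the toy labels are hypothetical. It shows why a predictor that always answers "non-binding" can look accurate on imbalanced data while its balanced accuracy stays at 0.5.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for 0/1 labels, where 1 marks a binding residue."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:  # t == 1 and p == 0 (labels assumed to be 0 or 1)
            fn += 1
    return tp, tn, fp, fn


def residue_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity, balanced accuracy)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # recall on binding (positive) residues
    specificity = tn / (tn + fp)  # recall on non-binding (negative) residues
    balanced_accuracy = (sensitivity + specificity) / 2
    return accuracy, sensitivity, specificity, balanced_accuracy


# Hypothetical imbalanced toy data: 2 binding residues out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10  # a predictor fully biased towards the negative class
print(residue_metrics(y_true, always_negative))  # (0.8, 0.0, 1.0, 0.5)
```

On the toy data the all-negative predictor reaches 0.8 plain accuracy but only 0.5 balanced accuracy. Under the same definition, averaging the rounded scores reported in the abstract gives (0.691 + 0.849) / 2 = 0.770 and (0.627 + 0.835) / 2 = 0.731, which agree with the reported 0.769 and 0.731 up to rounding.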
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering (CSE), BUET. en_US
dc.subject Parallel processing en_US
dc.title Prediction of protein-carbohydrate binding sites from protein primary sequence en_US
dc.type Thesis-MSc en_US
dc.contributor.id 1017052061 en_US
dc.identifier.accessionNumber 119573
dc.contributor.callno 004.3/QUA/2023 en_US
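The abstract states that sequence-based features were chosen with Incremental Feature Selection (IFS). This record does not say how features were ranked or which model scored each subset, so the following is only a generic sketch of the usual IFS loop under stated assumptions: the features arrive already ranked, `evaluate` is a hypothetical scoring callback (for example, cross-validated balanced accuracy of a classifier trained on the subset), and the feature names and scores are made up.

```python
from collections.abc import Callable, Sequence


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[list[str]], float],
) -> tuple[list[str], float]:
    """Add pre-ranked features one at a time and keep the best-scoring prefix.

    ranked_features: feature names ordered from most to least important
        (how they are ranked is left to the caller).
    evaluate: hypothetical scoring callback, e.g. cross-validated
        balanced accuracy of a classifier trained on the given subset.
    """
    best_subset: list[str] = []
    best_score = float("-inf")
    current: list[str] = []
    for feature in ranked_features:
        current.append(feature)
        score = evaluate(list(current))
        # Strict '>' keeps the smaller subset when two prefixes tie.
        if score > best_score:
            best_score = score
            best_subset = list(current)
    return best_subset, best_score


# Made-up scores indexed by subset size, standing in for a real model.
toy_scores = {1: 0.60, 2: 0.68, 3: 0.71, 4: 0.70}
selected, score = incremental_feature_selection(
    ["f1", "f2", "f3", "f4"], lambda subset: toy_scores[len(subset)]
)
print(selected, score)  # ['f1', 'f2', 'f3'] 0.71
```

Because only prefixes of the ranking are evaluated, IFS needs one model evaluation per feature instead of searching all subsets; the trade-off is that the result depends heavily on the quality of the initial ranking.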


Files in this item

There are no files associated with this item.
