Prediction of protein-carbohydrate binding sites from protein primary sequence

Nawar, Quazi Farah

URI: http://lib.buet.ac.bd:8080/xmlui/handle/123456789/6757

Date: 2023-09-23

Abstract:

A protein is a large, complex macromolecule, and it has many crucial roles in the human body as it performs most of the work in cells and tissues. It consists of one or multiple extended sequences of amino acid components. Another important biomolecule that comes after DNA and proteins is carbohydrates. Carbohydrates interact with proteins to facilitate various biological processes. Several biochemical experiments exist to study protein-carbohydrate interactions, but they are expensive, time-consuming, and challenging. As a result of the swift advancements in sequencing technologies, the quantity of recognized protein sequences has surged exponentially. Therefore, developing a computational technique from known protein sequences for effectively predicting protein-carbohydrate binding interactions has led to the emergence of a prominent new area of study.

Most of the computational approaches for protein-carbohydrate binding sites prediction are biased towards the negative class. This is due to the fact that the count of carbohydrate-binding residues is considerably lower compared to non-carbohydrate-binding residues in the benchmark datasets. In this thesis, we introduce a proficient ensemble machine learning model called ‘StackCBEmbed’ for the accurate classification of protein-carbohydrate binding interactions at the residue level within established protein sequences. StackCBEmbed demonstrates a more balanced behavior compared to the state-of-the-art methods in terms of accurately predicting both the positive and negative data points.

Our research used a benchmark training dataset and two separate independent test sets. Through the use of the Incremental Feature Selection method, we identified crucial sequence-based features and picked the most impactful ones. Furthermore, we integrated embedding characteristics from a pre-trained transformer-based language model known as ‘ProtT5-XL-Uniref50.’ To the best of our knowledge, this is the initial endeavor to utilize a protein language model for predicting protein-carbohydrate binding interactions. StackCBEmbed achieved sensitivity, specificity, and balanced accuracy scores of 0.691, 0.849, 0.769 and 0.627, 0.835, 0.731 in the two independent test sets respectively. Compared to the earlier prediction models that were benchmarked in the same datasets, our reported results are significantly superior. Thus, we hope the StackCBEmbed will help discover novel protein-carbohydrate interactions and advance the related research fields. StackCBEmbed is freely available as Python scripts at https://github.com/farah5112github/StackCBEmbed
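The three metrics quoted in the abstract are related: balanced accuracy is conventionally the mean of sensitivity and specificity, which is why it is preferred when binding residues are much rarer than non-binding ones. The following is a minimal sketch of that relationship, assuming the standard definitions; it is not taken from the StackCBEmbed scripts, and the function names and the confusion-matrix counts are hypothetical illustrations. Only the six reported scores come from the abstract.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true binding residues that are predicted as binding."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of true non-binding residues that are predicted as non-binding."""
    return tn / (tn + fp)


def balanced_accuracy(sens: float, spec: float) -> float:
    """Mean of sensitivity and specificity (standard definition, assumed here)."""
    return (sens + spec) / 2


# Hypothetical residue counts, chosen only to show class imbalance:
# 100 binding residues versus 1000 non-binding residues.
example_sens = sensitivity(tp=69, fn=31)
example_spec = specificity(tn=849, fp=151)
example_bacc = balanced_accuracy(example_sens, example_spec)

# Scores reported in the abstract: (sensitivity, specificity, balanced accuracy).
reported = {
    "independent test set 1": (0.691, 0.849, 0.769),
    "independent test set 2": (0.627, 0.835, 0.731),
}

for name, (sens, spec, bacc) in reported.items():
    recomputed = balanced_accuracy(sens, spec)
    # Small differences are expected because the inputs are already rounded.
    print(f"{name}: reported={bacc:.3f}, recomputed={recomputed:.4f}")
```

Recomputing from the rounded sensitivity and specificity agrees with the reported balanced accuracy to within rounding for both test sets.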

Files in this item

Files	Size	Format	View
There are no files associated with this item.

This item appears in the following Collection(s)

Dissertations/Theses - Department of Computer Science and Engineering
Postgraduate dissertations (theses) of Computer Science and Engineering (CSE)
