Revisiting succinylated lysine residue prediction with carefully selected physicochemical and biochemical properties of amino acids

dc.contributor.advisor	Rahman, Dr. M. Sohel
dc.contributor.author	Shehab, Sarar Ahmed
dc.date.accessioned	2024-01-22T05:43:33Z
dc.date.available	2024-01-22T05:43:33Z
dc.date.issued	2022-05-16
dc.identifier.uri	http://lib.buet.ac.bd:8080/xmlui/handle/123456789/6582
dc.description.abstract	Succinylation of lysine residue is a special type of post-translational modification (PTM). It has a crucial role in balancing the processes of cells. Abnormal succinylation can be the cause of cancers, metabolism diseases, inflammation and nervous system diseases. Detecting succinylation sites is of great importance to explore the function of proteins. However, the experimental methods to detect succinylation sites are costly, time and labor consuming. This thus calls for computational models with high efficacy and attention has been given in the literature for developing such models, albeit with only moderate success in the context of different evaluation metrics. In particular, the existing works failed to balance the two metrics, sensitivity and specificity, leaving a large room for improvements in this context. One important aspect in this context is the biochemical and physicochemical properties of amino acids, which appear to be useful as features for such computational predictors. However, some of the existing computational models did not use the biochemical and physicochemical properties of amino acids, while some others used them without considering the inter-dependency among the properties. In this thesis, we revisit the computational prediction of succinylated lysine residue (SLR) and use a broad spectrum of weaponry to tackle this problem. We first focus on the biochemical and physicochemical properties of amino acids and formulate an optimization problem to find combination that is more suitable for the problem at hand considering their inter-dependencies and other factors. In particular, we propose a variant of genetic algorithm, called IBCGA, to search for suitable combinations thereof for efficient prediction of SLRs. In this context, we leverage the power of Random Forest (RF) and Balanced RF (a variant of RF to handle imbalanced data).
We then propose three deep learning architectures, CNN+Bi-LSTM (CBL), Bi-LSTM+CNN (BLC) and their combination (CBL BLC) thereby leveraging the potential of deep neural network architectures for SLR prediction. We also employ different ensembling techniques to improve upon the performance of our models, which includes heterogeneous ensembling of traditional ML models with deep learning architectures as well. Finally, we apply differential evolution to tune the threshold of ensemble classifiers thereby providing the biologists and practitioners with a knob to balance the sensitivity and specificity. The combinations of biochemical and physicochemical properties derived through our optimization process achieve better results than the results achieved by the combination of all the properties. In this context, one of the best performing combinations consists of only two properties. As for our deep learning architectures,	en_US
dc.language.iso	en	en_US
dc.publisher	Department of Computer Science and Engineering (CSE)	en_US
dc.subject	Computer simulation	en_US
dc.title	Revisiting succinylated lysine residue prediction with carefully selected physicochemical and biochemical properties of amino acids	en_US
dc.type	Thesis-MSc	en_US
dc.contributor.id	419052005	en_US
dc.identifier.accessionNumber	119139
dc.contributor.callno	005.369/SHE/2022	en_US