DSpace Repository

Machine learning based approach for subcellular localization of proteins using generic feature set

dc.contributor.advisor Akhter, Dr. Shahin
dc.contributor.author Upama, Paramita Basak
dc.date.accessioned 2019-04-16T09:35:12Z
dc.date.available 2019-04-16T09:35:12Z
dc.date.issued 2018-08-14
dc.identifier.uri http://lib.buet.ac.bd:8080/xmlui/handle/123456789/5174
dc.description.abstract Protein subcellular localization prediction is the task of predicting where inside the cell a given protein functions. It is considered an important step towards protein function prediction and drug design. Predicting subcellular localization from primary protein sequences is crucial for understanding genome regulation and function. Support vector machine (SVM) based learning methods have been shown to be effective for predicting protein subcellular and subnuclear localization, and extracting informative features for use with an SVM plays an important role in designing an accurate predictor of protein subnuclear localization. Proteins are large, complex molecules that are required for the structure, function, and regulation of the body’s tissues and organs. The subcellular localization of proteins is a means of achieving functional diversity: it determines which interaction partners a protein can access and enables the integration of proteins into functional biological networks. To reach appropriate molecular interaction partners, a protein must be in the right place at the right time. Protein subcellular localization is therefore crucial for protein synthesis and for drug discovery across a broad range of medical conditions and diseases. This study introduces a novel machine learning approach in bioinformatics for classifying 361 protein sequences found inside a cell. The sequences were in string (text) format, and a set of features was extracted from them. The feature set comprises 8 physicochemical properties of proteins found in 6 target locations of a cell. Given the well-established use of SVMs in this field, an SVM-based model was developed to learn these properties and was tested on an independent dataset.
The algorithm developed in this work selects an optimal range of SVM parameters and applies feature selection to obtain the best performance. The proposed algorithm achieved an average accuracy of 90% in classifying proteins into the target locations, outperforming several similar algorithms reported in the literature. The technique could be further extended to protein sequences found in any part of the body. en_US
dc.language.iso en en_US
dc.publisher Institute of Information and Communication Technology en_US
dc.subject Molecular biology--Periodicals en_US
dc.title Machine learning based approach for subcellular localization of proteins using generic feature set en_US
dc.type Thesis-MSc en_US
dc.contributor.id 1015312027 en_US
dc.identifier.accessionNumber 116973
dc.contributor.callno 572.8/UPA/2018 en_US
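The abstract states that the sequences arrive as strings and are converted into physicochemical features before SVM training. As a minimal sketch of that conversion step, assuming Python and a handful of generic properties chosen purely for illustration (sequence length, mean Kyte-Doolittle hydropathy, a crude net charge per residue, and aromatic fraction; these are NOT the 8 properties used in the thesis), a sequence-to-vector function might look like this. The names `extract_features` and `min_max_scale` are hypothetical.

```python
# Illustrative sketch only: hypothetical features, not the thesis's 8 properties.

# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

POSITIVE = set("KR")   # basic residues counted as +1 (histidine ignored for simplicity)
NEGATIVE = set("DE")   # acidic residues counted as -1
AROMATIC = set("FWY")  # aromatic residues


def extract_features(sequence: str) -> list:
    """Return [length, mean hydropathy, net charge per residue, aromatic fraction]."""
    # Keep only standard residues; ambiguous letters such as X or B are dropped.
    residues = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    n = len(residues)
    if n == 0:
        raise ValueError("sequence contains no standard amino acids")
    mean_hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in residues) / n
    net_charge = (sum(aa in POSITIVE for aa in residues)
                  - sum(aa in NEGATIVE for aa in residues)) / n
    aromatic_fraction = sum(aa in AROMATIC for aa in residues) / n
    return [float(n), mean_hydropathy, net_charge, aromatic_fraction]


def min_max_scale(rows: list) -> list:
    """Scale each feature column to [0, 1], since SVMs are sensitive to feature scale."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    # A constant column carries no information, so it is mapped to 0.0.
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in rows]


if __name__ == "__main__":
    vectors = [extract_features(s) for s in ["MKRDEFW", "AILVG", "SSTTY"]]
    print(min_max_scale(vectors))
```

Scaled vectors of this kind could then be passed to any SVM implementation; the thesis's own parameter-range selection and feature-selection procedure is not reproduced in this sketch.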

