Machine learning based approach for subcellular localization of proteins using generic feature set


dc.contributor.advisor	Akhter, Dr. Shahin
dc.contributor.author	Upama, Paramita Basak
dc.date.accessioned	2019-04-16T09:35:12Z
dc.date.available	2019-04-16T09:35:12Z
dc.date.issued	2018-08-14
dc.identifier.uri	http://lib.buet.ac.bd:8080/xmlui/handle/123456789/5174
dc.description.abstract	Protein subcellular localization is the task of predicting where in the cell a given protein functions. It is considered an important step towards protein function prediction and drug design. Predicting subcellular localization from primary protein sequences is crucial for understanding genome regulation and function. Support vector machine (SVM) based learning methods have been shown to be effective for predicting protein subcellular and subnuclear localization. Extracting informative features for use with an SVM plays an important role in designing an accurate system for predicting protein subnuclear localization. Proteins are large, complex molecules required for the structure, function, and regulation of the body's tissues and organs. Subcellular localization of proteins is a means of achieving functional diversity. It determines a protein's access to its interacting partners and enables the integration of proteins into functional biological networks. To gain access to appropriate molecular interaction partners, a protein must be in the right place at the right moment. Therefore, protein subcellular localization is crucial for protein synthesis and for drug discovery across a broad range of medical conditions and diseases. This study introduces a novel machine learning approach in bioinformatics for classifying 361 protein sequences found inside a cell. The sequences were in string (text) format, and a set of characteristics was extracted from them. The feature set includes 8 physicochemical properties of the proteins found in 6 target locations of a cell. Given the well-established use of SVMs in this field, an SVM-based model was developed to learn these properties and was tested on an independent dataset.
The algorithm developed in this work selects an optimal range of SVM parameters and applies feature selection to obtain the best performance. The proposed algorithm achieved an average accuracy of 90% in classifying proteins into the target locations. It performs better than several similar algorithms presented in the literature. The proposed technique can be further extended to protein sequences found in any part of the body.	en_US
dc.language.iso	en	en_US
dc.publisher	Institute of Information and Communication Technology	en_US
dc.subject	Molecular biology--Periodicals	en_US
dc.title	Machine learning based approach for subcellular localization of proteins using generic feature set	en_US
dc.type	Thesis-MSc	en_US
dc.contributor.id	1015312027	en_US
dc.identifier.accessionNumber	116973
dc.contributor.callno	572.8/UPA/2018	en_US
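
The abstract describes turning protein sequences in string format into a numeric feature set before SVM training. The following is a minimal sketch of that kind of extraction step, not the thesis's implementation. The record does not list the eight physicochemical properties used, so the five features here (length, mean Kyte-Doolittle hydropathy, and the fractions of basic, acidic, and aromatic residues) and the function name `sequence_features` are assumptions chosen only for illustration.

```python
# Illustrative sketch only: these features are assumptions, not the
# eight physicochemical properties used in the thesis.

# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_features(seq: str) -> list[float]:
    """Map a protein sequence string to a small numeric feature vector.

    Returns [length, mean hydropathy, basic fraction (K, R),
    acidic fraction (D, E), aromatic fraction (F, W, Y)].
    Characters outside the 20 standard residues are ignored.
    """
    residues = [r for r in seq.strip().upper() if r in KD_HYDROPATHY]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")

    n = len(residues)

    def fraction(chars: str) -> float:
        # Share of residues that belong to the given residue set.
        return sum(r in chars for r in residues) / n

    mean_hydropathy = sum(KD_HYDROPATHY[r] for r in residues) / n
    return [float(n), mean_hydropathy,
            fraction("KR"), fraction("DE"), fraction("FWY")]
```

Vectors of this form, one per sequence, could then be passed to an SVM classifier together with the location labels. Parameter tuning and feature selection, as mentioned in the abstract, would operate on top of such a matrix.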