Multi-label classification of human protein subcellular locations from microscopy images using convolutional neural networks

dc.contributor.advisor	Mohammad Ariful Haque, Dr.
dc.contributor.author	Mitra, Avijit
dc.date.accessioned	2021-08-16T08:59:57Z
dc.date.available	2021-08-16T08:59:57Z
dc.date.issued	2021-01-16
dc.identifier.uri	http://lib.buet.ac.bd:8080/xmlui/handle/123456789/5737
dc.description.abstract	Proteins are the ‘doers’ of all living organisms. Subcellular localization of human proteins plays an important role in inferring their structures and functions in our cells. Owing to recent advances in molecular imaging techniques, analyzing image data to determine protein subcellular locations is now more important than ever. At the same time, image data is becoming a widely popular alternative to conventional 1D protein amino acid sequence data. Classifying human protein cell localization is important for automating and accelerating biomedical research tasks and the diagnosis of diseases, reducing both time and manual effort. Although using deep convolutional neural networks (DCNN) to classify images is a straightforward approach, our task comes with multiple challenges. First, there are 28 distinct labels, and more than one can be assigned to a single image. Second, there is a strong class imbalance in the dataset, with some labels appearing in less than 0.3% of the data. Lastly, the protein location classification task must be performed across a wide range of different human cells. We aim to overcome these challenges through different approaches. In this work, our principal goal is to present an end-to-end system for the classification of mixed-pattern protein subcellular localization from confocal microscopy images using convolutional neural networks. We show the outcomes of several experimental setups for a highly imbalanced dataset and investigate their effectiveness. We also demonstrate that oversampling outperforms cost-sensitive learning in handling the data imbalance problem. In addition, we show that an ensemble of models consistently benefits our task. Using these observations, we achieved a public macro F1 score of 0.574 and a private macro F1 score of 0.515 on the dataset of the Kaggle competition Human Protein Atlas Image Classification.	en_US
dc.language.iso	en	en_US
dc.publisher	Department of Electrical and Electronic Engineering (EEE), BUET	en_US
dc.subject	Neural networks	en_US
dc.title	Multi-label classification of human protein subcellular locations from microscopy images using convolutional neural networks	en_US
dc.type	Thesis-MSc	en_US
dc.contributor.id	0417062288	en_US
dc.identifier.accessionNumber	117758
dc.contributor.callno	623.99/MIT/2021	en_US
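
The abstract reports results as macro F1 scores on a multi-label task. As a rough illustration of that metric only (not the thesis code or the competition's scoring script), the sketch below computes macro F1 as the unweighted mean of per-label F1 over binary indicator rows. The function name `macro_f1`, the three-label toy data, and the convention of scoring a label with no true or predicted positives as 0 are all assumptions made for this example.

```python
from typing import List, Sequence


def macro_f1(y_true: Sequence[Sequence[int]],
             y_pred: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of per-label F1 over 0/1 indicator rows."""
    n_labels = len(y_true[0])
    scores: List[float] = []
    for k in range(n_labels):
        # Count true positives, false positives, false negatives for label k.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 1 and p[k] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 0 and p[k] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 1 and p[k] == 0)
        denom = 2 * tp + fp + fn
        # Assumption: a label with no positives and no predictions scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


# Hypothetical toy data: 4 images, 3 labels (the thesis dataset has 28).
y_true = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [1, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]]

score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

Because every label contributes equally to the average, a rare label that is never predicted (the third label in the toy data) pulls the score down as much as a common one would, which is why class imbalance matters so much under this metric.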