Prediction of cervical cancer in Bangladesh using hybrid machine learning algorithms

dc.contributor.advisor	Mondal, Dr. Md. Rubaiyat Hossain
dc.contributor.author	Khanam, Fahima
dc.date.accessioned	2022-06-28T04:19:58Z
dc.date.available	2022-06-28T04:19:58Z
dc.date.issued	2021-10-10
dc.identifier.uri	http://lib.buet.ac.bd:8080/xmlui/handle/123456789/6030
dc.description.abstract	This research applies machine learning algorithms to the prediction of cervical cancer. Early screening of vulnerable patients is essential to prevent cervical cancer, but many developing countries lack the medical facilities for such screening. Hence, research is needed on data-driven diagnosis of cervical cancer. This thesis considers a dataset of cervical cancer patients that includes attributes suitable for Bangladeshi patients. A further objective is to classify the patients in the dataset using a new, efficient hybrid algorithm. First, an existing dataset from the University of California, Irvine (UCI) Machine Learning Repository is considered, which consists of 36 attributes and 858 instances. To overcome the class imbalance in the data samples, the Borderline Synthetic Minority Over-sampling Technique (Borderline-SMOTE) is used. Next, a new dataset of cervical cancer patients collected from various hospitals in Bangladesh is introduced. This new dataset consists of 21 attributes and 228 instances. The Recursive Feature Elimination method is applied to both datasets to find the most important attributes contributing to cervical cancer. A number of classifiers, including base, ensemble, and hybrid algorithms, are applied to the datasets. A two-stage hybrid algorithm is then proposed in which ExtraTreeClassifier is used in the first stage and a stacking algorithm is used in the second stage. Results show that stacking as a combination of Random Forest, ExtraTreeClassifier, XGBoost, and Bagging exhibits the best classification accuracy of 95.3% for the first dataset. For the second dataset, AdaBoost shows the best classification accuracy of 95.6%. The proposed hybrid method offers classification accuracy of 95.9% and 96.2% for the first and second datasets, respectively. Hence, the Bangladeshi dataset and the proposed hybrid algorithm can play an essential role in predicting cervical cancer.	en_US
dc.language.iso	en	en_US
dc.publisher	Institute of Information and Communication Technology	en_US
dc.subject	Diagnostic imaging-Digital techniques-Breast cancer	en_US
dc.title	Prediction of cervical cancer in Bangladesh using hybrid machine learning algorithms	en_US
dc.type	Thesis-MSc	en_US
dc.contributor.id	0417312045	en_US
dc.identifier.accessionNumber	118614
dc.contributor.callno	616.0754/FAH/2021	en_US
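
The abstract names Borderline-SMOTE as the step used to balance the UCI dataset. The following is a minimal, standard-library sketch of the general Borderline-SMOTE idea, not the thesis's actual implementation: minority samples whose nearest neighbours are mostly (but not entirely) majority samples are treated as "borderline", and synthetic points are interpolated between them and nearby minority samples. The function name `borderline_smote`, the parameters `m`, `k`, and `seed`, and the toy data are all assumptions made for illustration.

```python
import math
import random


def borderline_smote(minority, majority, n_new, m=5, k=3, seed=0):
    """Return n_new synthetic minority points (illustrative sketch only).

    minority, majority: lists of equal-length numeric tuples.
    m: neighbours (from both classes) used to decide if a point is borderline.
    k: minority neighbours used as interpolation partners.
    """
    if len(minority) < 2:
        return []  # interpolation needs at least two minority points
    rng = random.Random(seed)
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]

    # Step 1: find "danger" points, i.e. minority samples for which at least
    # half, but not all, of the m nearest neighbours are majority samples.
    danger = []
    for p in minority:
        neigh = sorted((e for e in labelled if e[0] is not p),
                       key=lambda e: math.dist(p, e[0]))[:m]
        n_major = sum(1 for _, label in neigh if label == 0)
        if len(neigh) / 2 <= n_major < len(neigh):
            danger.append(p)
    if not danger:
        return []

    # Step 2: interpolate between a danger point and one of its k nearest
    # minority neighbours, at a random position along the segment.
    synthetic = []
    for _ in range(n_new):
        p = rng.choice(danger)
        partners = sorted((q for q in minority if q is not p),
                          key=lambda q: math.dist(p, q))[:k]
        q = rng.choice(partners)
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return synthetic


# Toy data (hypothetical): two minority points sit next to a majority cluster.
minority = [(0.0, 0.0), (0.1, 0.1), (1.0, 1.0), (1.1, 1.0)]
majority = [(1.2, 1.1), (1.3, 1.0), (1.2, 0.9),
            (2.0, 2.0), (2.1, 2.0), (2.0, 2.1)]
new_points = borderline_smote(minority, majority, n_new=4, m=3, k=2)
```

In practice a maintained implementation, such as the one in the imbalanced-learn package, would likely be preferable to a hand-written version; the sketch is only meant to make the oversampling step concrete.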