DEVELOPMENT OF AN ENSEMBLE FEATURE SELECTION METHOD FOR CLASSIFICATION OF MEDICAL DATA

dc.contributor.advisor	Md. Rubaiyat Hossain Mondal, Dr.
dc.contributor.author	Arju Manara, Begum
dc.date.accessioned	2024-08-20T09:19:47Z
dc.date.available	2024-08-20T09:19:47Z
dc.date.issued	2023-02-04
dc.identifier.uri	http://lib.buet.ac.bd:8080/xmlui/handle/123456789/6782
dc.description.abstract	Feature selection (FS), a crucial preprocessing step in machine learning, greatly reduces the dimension of the data and improves model performance. The fundamental goal of FS is to choose an optimal subset of features by removing irrelevant and redundant features from the feature space. Feature weights reported in the literature indicate how important each feature is, but they do not guarantee a superior feature set for classification. Interactions among features are complex: in seeking a less redundant or purer set of features, valuable ones may be discarded, which could hinder data classification. Developing a good feature selection strategy is therefore crucial. This research focuses on selecting features for medical data classification. It proposes a new ensemble FS method called PRG_Ensemble, which combines three FS methods to produce a stable and diverse subset of features. The primary goals of the ensemble FS method are to obtain an optimal subset of features and to overcome the shortcomings of a single FS method. The three filter FS approaches employed as base selectors are Pearson's correlation coefficient (PCC), ReliefF, and gain ratio (GR). When applied to a given dataset, these three FS approaches produce three distinct ranked lists, each ordering the features by importance or weight. The final subset of features is chosen using the average weight of each feature and the rank difference of the feature across the three ranked lists; on this basis, unstable and less significant features are eliminated from the feature space. Two well-known medical datasets, chronic kidney disease (CKD) and Lung Cancer, have been used to evaluate the performance of the proposed technique. Data in both datasets are classified using logistic regression (LR). The experimental results show that the proposed method achieves the highest accuracy on each dataset, 99.25% for CKD and 93.5275% for Lung Cancer, compared with the three base FS methods.	en_US
dc.language.iso	en	en_US
dc.publisher	Institute of Information and Communication Technology	en_US
dc.subject	Data mining	en_US
dc.title	DEVELOPMENT OF AN ENSEMBLE FEATURE SELECTION METHOD FOR CLASSIFICATION OF MEDICAL DATA	en_US
dc.type	Thesis-MSc	en_US
dc.contributor.id	0416312030	en_US
dc.identifier.accessionNumber	119553
dc.contributor.callno	0416312030	en_US
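
The selection rule described in the abstract (keep features with a high average weight and a small rank difference across the three ranked lists) can be illustrated with a minimal sketch. This is not the thesis's implementation: the record does not give the thresholds, the weight normalisation, or the exact definition of rank difference. The sketch assumes weights already scaled to [0, 1], takes rank difference to be the highest rank minus the lowest rank across the lists, and uses hypothetical names (`rank_features`, `ensemble_select`, `min_avg_weight`, `max_rank_diff`) and made-up toy weights.

```python
from statistics import mean


def rank_features(weights):
    """Map each feature to its rank in one list (rank 1 = highest weight)."""
    ordered = sorted(weights, key=lambda f: weights[f], reverse=True)
    return {feature: position + 1 for position, feature in enumerate(ordered)}


def ensemble_select(weight_lists, min_avg_weight, max_rank_diff):
    """Keep features that are both important on average and stably ranked.

    Assumptions (not stated in the source): every list scores the same
    features on a comparable [0, 1] scale, and both thresholds are chosen
    by the user.
    """
    rank_lists = [rank_features(weights) for weights in weight_lists]
    selected = []
    for feature in weight_lists[0]:
        avg_weight = mean(weights[feature] for weights in weight_lists)
        ranks = [ranking[feature] for ranking in rank_lists]
        rank_diff = max(ranks) - min(ranks)  # assumed definition of rank difference
        if avg_weight >= min_avg_weight and rank_diff <= max_rank_diff:
            selected.append(feature)
    return selected


# Toy weights standing in for the PCC, ReliefF, and gain ratio scores.
pcc = {"a": 0.9, "b": 0.6, "c": 0.2, "d": 0.5}
relieff = {"a": 0.8, "b": 0.7, "c": 0.1, "d": 0.9}
gain_ratio = {"a": 0.85, "b": 0.65, "c": 0.3, "d": 0.1}

selected = ensemble_select([pcc, relieff, gain_ratio],
                           min_avg_weight=0.4, max_rank_diff=1)
print(selected)  # ['a', 'b']
```

In this toy run, feature `c` is dropped for its low average weight and feature `d` for its unstable ranking (ranks 3, 1, and 4), which mirrors the two elimination criteria named in the abstract.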

Files in this item

Files	Size	Format	View
There are no files associated with this item.

This item appears in the following Collection(s)

Dissertations/Theses - Institute of Information and Communication Technology
Post graduate dissertations (Theses) of Institute of Information and Communication Technology (IICT)
