DSpace Repository

DEVELOPMENT OF AN ENSEMBLE FEATURE SELECTION METHOD FOR CLASSIFICATION OF MEDICAL DATA

dc.contributor.advisor Md. Rubaiyat Hossain Mondal, Dr.
dc.contributor.author Arju Manara, Begum
dc.date.accessioned 2024-08-20T09:19:47Z
dc.date.available 2024-08-20T09:19:47Z
dc.date.issued 2023-02-04
dc.identifier.uri http://lib.buet.ac.bd:8080/xmlui/handle/123456789/6782
dc.description.abstract Feature selection (FS), a crucial preprocessing step in machine learning, reduces the dimensionality of the data and can improve model performance. The fundamental goal of FS is to choose an optimal subset of features by removing irrelevant and redundant features from the feature space. Feature weightings reported in the literature indicate how important each feature is, but they cannot guarantee a superior feature set for classification, because the interactions among features are complex. In pursuing less redundant or purer features, valuable ones may be discarded, which can hinder data classification. Developing a good feature selection strategy is therefore crucial. This research focuses on selecting features for medical data classification. In this work, a new ensemble FS method called PRG_Ensemble is proposed. It combines three FS methods to produce a stable and diverse subset of features. The primary goals of the ensemble FS method are to obtain an optimal subset of features and to overcome the shortcomings of any single FS method. The three filter FS approaches employed as base selectors are Pearson's correlation coefficient (PCC), ReliefF, and gain ratio (GR). When applied to a given dataset, these three approaches produce three distinct lists in which each feature is ordered by importance or weight. The final subset of features is chosen using the average weight of each feature and the rank difference of that feature across the three ranked lists; on this basis, unstable and less significant features are eliminated from the feature space. Two well-known medical datasets, chronic kidney disease (CKD) and Lung Cancer, have been used to evaluate the performance of the proposed technique. Data in both datasets are classified using logistic regression (LR). The experimental results show that the proposed method obtains the highest accuracy on each dataset, 99.25% for CKD and 93.5275% for Lung Cancer, compared with the three base FS methods. en_US
dc.language.iso en en_US
dc.publisher Institute of Information and Communication Technology en_US
dc.subject Data mining en_US
dc.title DEVELOPMENT OF AN ENSEMBLE FEATURE SELECTION METHOD FOR CLASSIFICATION OF MEDICAL DATA en_US
dc.type Thesis-MSc en_US
dc.contributor.id 0416312030 en_US
dc.identifier.accessionNumber 119553
dc.contributor.callno 0416312030 en_US
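
The selection rule described in the abstract (average weight plus rank difference across three ranked lists) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not state how weights are normalized, how rank difference is computed, or which cut-offs are used. The function names, the min-max normalization, the "highest rank minus lowest rank" definition of rank difference, the thresholds `min_avg_weight` and `max_rank_diff`, and the toy weights are all assumptions made for illustration only.

```python
def min_max_normalize(weights):
    """Scale one selector's weights to [0, 1] (assumed normalization)."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return [0.0 for _ in weights]
    return [(w - lo) / (hi - lo) for w in weights]


def ranks_from_weights(weights):
    """Rank 1 is the highest weight; ties keep the original feature order."""
    order = sorted(range(len(weights)), key=lambda i: -weights[i])
    ranks = [0] * len(weights)
    for position, feature in enumerate(order, start=1):
        ranks[feature] = position
    return ranks


def ensemble_select(weight_lists, min_avg_weight=0.3, max_rank_diff=2):
    """Keep features with a high average weight and a stable rank.

    weight_lists holds one list of per-feature weights per base selector.
    Both thresholds are hypothetical; the abstract gives no values.
    """
    normalized = [min_max_normalize(w) for w in weight_lists]
    ranks = [ranks_from_weights(w) for w in weight_lists]
    n_features = len(weight_lists[0])

    # Average normalized weight of each feature across the selectors.
    avg_weight = [
        sum(col[j] for col in normalized) / len(normalized)
        for j in range(n_features)
    ]
    # Assumed rank difference: spread between worst and best rank.
    rank_diff = [
        max(r[j] for r in ranks) - min(r[j] for r in ranks)
        for j in range(n_features)
    ]
    selected = [
        j for j in range(n_features)
        if avg_weight[j] >= min_avg_weight and rank_diff[j] <= max_rank_diff
    ]
    return selected, avg_weight, rank_diff


# Toy weights for four features from three hypothetical base selectors
# standing in for PCC, ReliefF, and gain ratio (not real results).
pcc_weights = [0.9, 0.1, 0.5, 0.7]
relieff_weights = [0.8, 0.2, 0.1, 0.6]
gain_ratio_weights = [0.7, 0.05, 0.9, 0.6]

selected, avg_weight, rank_diff = ensemble_select(
    [pcc_weights, relieff_weights, gain_ratio_weights]
)
print(selected)   # indices of the retained features
print(rank_diff)  # per-feature rank spread across the three lists
```

In this toy input, feature 1 is dropped for its low average weight and feature 2 for its unstable rank, which mirrors the abstract's idea of eliminating "less significant" and "unstable" features respectively.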


Files in this item

There are no files associated with this item.
