DSpace Repository

Multi-layer hybrid balancing technique to remove data imbalance

Show simple item record

dc.contributor.advisor Mustafa, Dr. Hossen Asifu
dc.contributor.author Islam, Muhammad Tanveer
dc.date.accessioned 2022-06-26T06:33:37Z
dc.date.available 2022-06-26T06:33:37Z
dc.date.issued 2021-06-15
dc.identifier.uri http://lib.buet.ac.bd:8080/xmlui/handle/123456789/6019
dc.description.abstract Data is one of the essential elements nowadays for discovering business decisions, de- cision optimization, and scientific research and growing exponentially due to the use of different kinds of applications in various business organizations and production indus- tries. The proper dataset offers organizations and researchers to analyze their showcas- ing techniques, make effective data-driven choices and make superior advertisements. In real-life scenarios, most data sources create a gap among class attribute elements which reduces to build a proper decision in the prediction. An imbalanced dataset cre- ates a critical problem that affects the business decisions and makes a biased result towards the major class. However, existing data balancing techniques can solve the problems of data balancing. Existing data balancing techniques have a major draw- back: these create new artificial samples randomly, which create outliers and hamper the potentiality of the original dataset. Our thesis work proposes a Multi-Layer Hybrid (MLH) Balancing Scheme that combines three over-sampling techniques and processes output in a proper way. This scheme gives a balanced and noise-free output by combin- ing the characteristics of ADASYN, SVM-SMOTE, and SMOTE+ENN. It also creates new data points within the range of the original dataset, which keeps the originality of the new data points. Thus, the generated output from three layers is proper balancing output for machine learning models. We use 34 different imbalanced datasets with dif- ferent imbalance ratios, and experimental results show balanced and proper output for the proposed scheme. We apply the resultant dataset to Random Forest (RF) and Ar- tificial Neural Network (ANN); comparing existing techniques shows that our scheme gives better results. We used various types of the dataset in our thesis and got a differ- ent amount of result for these datasets; so we combined the results and got the average output for different metrics. Using the RF, we achieved, 82%, 83%, 83%, 84% and 91% average Accuracy; 45%, 63%, 72%, 58% and 88% average G-Mean; 39%, 55%, 62%, 51% and 83% average F-Measure for Original Dataset, ADASYN, SMOTEENN, SVMSMOTE and Proposed MLH, respectively. Using the ANN, we achieved, 78%, 77%, 74%, 80% and 79% average Accuracy; 30%, 71%, 73%, 69% and 77% aver- age G-Mean; 26%, 59%, 59%, 60% and 67% average F-Measure for Original Dataset, ADASYN, SMOTEENN, SVMSMOTE and Proposed MLH, respectively. Using our proposed approach, we got a better outcome for the imbalanced dataset than the exist- ing approach and observed a better performance for our proposed approach using the Random Forest. en_US
dc.language.iso en en_US
dc.publisher Institute of Information and Commutation Technology en_US
dc.subject Data mining en_US
dc.title Multi-layer hybrid balancing technique to remove data imbalance en_US
dc.type Thesis-MSc en_US
dc.contributor.id 0417312025 en_US
dc.identifier.accessionNumber 118631
dc.contributor.callno 005.759/TAN/2021 en_US


Files in this item

This item appears in the following Collection(s)

Show simple item record

Search BUET IR


Advanced Search

Browse

My Account