MWMOTE-Boost: majority weighted minority over-sampling technique integrated with boosting for imbalanced data set learning

Barua, Sukarna

URI: http://lib.buet.ac.bd:8080/xmlui/handle/123456789/3368

Date: 2011-10

Abstract:

Imbalanced data sets have an unequal distribution of samples among the classes, which makes it hard for learning algorithms to learn the minority class concepts. Synthetic oversampling techniques address this problem by generating synthetic minority samples to balance the distribution between the majority and minority classes. This thesis identifies that most existing synthetic oversampling techniques may generate wrong synthetic samples in some scenarios and thereby make the learning task harder.

To this end, the thesis presents a new synthetic oversampling method, called Majority Weighted Minority Oversampling Technique (MWMOTE), for handling imbalanced data sets efficiently. The term 'majority weighted minority oversampling' means that the minority samples important for oversampling are identified and weighted using their nearest majority samples, and are then used for oversampling. To do this, MWMOTE uses information from both the minority and majority samples in the data set. First, it identifies hard-to-learn, informative minority samples and assigns them weights according to their importance, using distance information from the nearest majority samples. MWMOTE then identifies the clusters in the minority data set and uses the weighted informative minority samples to generate synthetic samples inside the clusters. This ensures that generated samples always lie inside some minority cluster and do not overlap with majority regions.

Finally, the thesis presents a new stand-alone ensemble algorithm, called MWMOTE-Boost, obtained by inserting the MWMOTE oversampling algorithm into the boosting iteration of the classic AdaBoost.M2 ensemble algorithm. The manner in which MWMOTE and AdaBoost.M2 are integrated is similar to the recent state-of-the-art RAMOBoost algorithm, except that the MWMOTE oversampling procedure is used in place of RAMOBoost's RAMO oversampling procedure.

The proposed methods, MWMOTE and MWMOTE-Boost, have been evaluated extensively on four artificial and seventeen real-world data sets, using several classifier models such as neural network, decision tree, k-nearest neighbor, and ensemble classifier. The simulation results show that MWMOTE and MWMOTE-Boost are better than or comparable to some other existing methods in terms of various assessment metrics, such as precision, recall, F-measure, G-mean, and area under the receiver operating characteristic (ROC) curve, usually known as area under the curve (AUC).
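The oversampling steps named in the abstract (weight minority samples by their nearest majority samples, cluster the minority set, interpolate only within a cluster) can be illustrated with a minimal sketch. This is not the thesis's algorithm: the abstract does not give the weighting formula or the clustering method, so the inverse-distance weight and the threshold-based single-linkage grouping below are stand-in assumptions, and every function and parameter name (`nearest_majority_distance`, `cluster_minority`, `oversample`, `threshold`) is hypothetical.

```python
import math
import random


def nearest_majority_distance(x, majority):
    """Distance from minority sample x to its closest majority sample."""
    return min(math.dist(x, m) for m in majority)


def cluster_minority(minority, threshold):
    """Assumed stand-in clustering: minority points connected by a chain of
    hops no longer than `threshold` receive the same cluster label."""
    labels = [-1] * len(minority)
    current = 0
    for i in range(len(minority)):
        if labels[i] != -1:
            continue
        labels[i] = current
        stack = [i]
        while stack:
            a = stack.pop()
            for b in range(len(minority)):
                if labels[b] == -1 and math.dist(minority[a], minority[b]) <= threshold:
                    labels[b] = current
                    stack.append(b)
        current += 1
    return labels


def oversample(minority, majority, n_new, threshold, seed=0):
    """Generate n_new synthetic minority samples inside minority clusters."""
    rng = random.Random(seed)
    # Assumed weighting: minority samples closer to the majority class are
    # treated as harder to learn and are picked more often as seeds.
    weights = [1.0 / (nearest_majority_distance(x, majority) + 1e-12)
               for x in minority]
    labels = cluster_minority(minority, threshold)
    indices = range(len(minority))
    synthetic = []
    for _ in range(n_new):
        i = rng.choices(indices, weights=weights, k=1)[0]
        # Restrict the interpolation partner to the seed's own cluster so the
        # new point stays on a segment between two same-cluster samples.
        mates = [j for j in indices if labels[j] == labels[i]]
        j = rng.choice(mates)
        alpha = rng.random()
        synthetic.append(tuple(a + alpha * (b - a)
                               for a, b in zip(minority[i], minority[j])))
    return synthetic


if __name__ == "__main__":
    minority = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (10.0, 10.0), (10.5, 10.0)]
    majority = [(5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
    for point in oversample(minority, majority, n_new=5, threshold=1.0):
        print(point)
```

Restricting the interpolation partner to the seed's own cluster is what keeps each synthetic point between two same-cluster minority samples, rather than on a segment that crosses the gap between clusters where majority samples may lie.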

Files in this item

Name: Full Thesis.pdf

Size: 622.7Kb

Format: PDF

This item appears in the following Collection(s)

Dissertations/Theses - Department of Computer Science and Engineering
Postgraduate dissertations (theses) of the Department of Computer Science and Engineering (CSE)
