MWMOTE-Boost: majority weighted minority over-sampling technique integrated with boosting for imbalanced data set learning


dc.contributor.advisor Monirul Islam, Dr. Md.
dc.contributor.author Barua, Sukarna
dc.date.accessioned 2016-06-25T04:20:29Z
dc.date.available 2016-06-25T04:20:29Z
dc.date.issued 2011-10
dc.identifier.uri http://lib.buet.ac.bd:8080/xmlui/handle/123456789/3368
dc.description.abstract Imbalanced data sets contain an unequal distribution of data samples among the classes and pose a challenge to learning algorithms because it becomes hard to learn the minority class concepts. Synthetic oversampling techniques address this problem by generating synthetic minority samples to balance the distribution between the samples of the majority and minority classes. This thesis identifies that most of the existing synthetic oversampling techniques may generate wrong synthetic samples in some scenarios and make the learning task harder. To this end, the thesis presents a new synthetic oversampling method, called Majority Weighted Minority Oversampling Technique (MWMOTE), for handling imbalanced data sets efficiently. The term 'majority weighted minority oversampling' here means that important minority samples for oversampling are identified and weighted by the nearest majority samples, and are then used for oversampling. To do this, MWMOTE uses information from both the minority and majority samples in the data set. First, it identifies hard-to-learn informative minority samples and assigns them weights according to their importance, using distance information from the nearest majority samples. MWMOTE then identifies the clusters in the minority data set and uses the weighted informative minority samples to generate synthetic samples inside the clusters. This ensures that the generated samples always lie inside some minority cluster and do not overlap with majority regions. The thesis finally presents a new stand-alone ensemble algorithm, called MWMOTE-Boost, by integrating MWMOTE into the well-known AdaBoost.M2 boosting procedure. The MWMOTE-Boost algorithm is obtained by inserting the MWMOTE oversampling algorithm into the boosting iterations of the classic AdaBoost.M2 ensemble algorithm.
The manner in which MWMOTE and AdaBoost.M2 are integrated is similar to the recent state-of-the-art RAMOBoost algorithm, except that the MWMOTE oversampling procedure is used in place of RAMOBoost's RAMO oversampling procedure. The proposed methods, i.e., MWMOTE and MWMOTE-Boost, have been evaluated extensively on four artificial and seventeen real-world data sets, using several classifier models such as neural networks, decision trees, k-nearest neighbor, and ensemble classifiers. The simulation results show that our new methods, MWMOTE and MWMOTE-Boost, are better than or comparable to some other existing methods in terms of various assessment metrics, such as precision, recall, F-measure, G-mean, and area under the receiver operating characteristic (ROC) curve, usually known as area under the curve (AUC). en_US
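The oversampling procedure described in the abstract — identify informative minority samples near the majority class, weight them by closeness to the majority, then interpolate inside minority clusters — can be sketched as follows. This is an illustrative simplification, not the thesis's actual MWMOTE implementation: the neighbour count `k`, the inverse-distance weighting, and the clustering step (a crude random-centre assignment here, for brevity) are all placeholder choices.

```python
import numpy as np

def mwmote_sketch(X_min, X_maj, n_synthetic, k=5, n_clusters=3, seed=0):
    """Simplified MWMOTE-style oversampling sketch (illustrative only).

    Loosely following the abstract:
      1. Find "informative" minority samples: those whose k nearest
         neighbours in the full data include at least one majority sample.
      2. Weight each informative sample by the inverse distance to its
         nearest majority sample (closer to the border => larger weight).
      3. Cluster the minority set and generate synthetic points by
         interpolating between a weighted informative sample and a
         random member of the SAME cluster, so synthetic points stay
         inside a minority cluster and away from majority regions.
    """
    rng = np.random.default_rng(seed)
    X_all = np.vstack([X_min, X_maj])
    n_min = len(X_min)

    # Step 1: informative minority samples (border-ish points).
    informative = []
    for i, x in enumerate(X_min):
        d = np.linalg.norm(X_all - x, axis=1)
        neighbours = np.argsort(d)[:k + 1]          # +1: the point itself
        if any(j >= n_min for j in neighbours):     # has a majority neighbour
            informative.append(i)
    if not informative:                             # degenerate fallback
        informative = list(range(n_min))

    # Step 2: weight by closeness to the majority class.
    d_maj = np.array([np.min(np.linalg.norm(X_maj - X_min[i], axis=1))
                      for i in informative])
    w = 1.0 / (d_maj + 1e-12)
    w /= w.sum()

    # Step 3: crude minority clustering via random centres (placeholder
    # for a proper clustering algorithm).
    centres = X_min[rng.choice(n_min, size=min(n_clusters, n_min),
                               replace=False)]
    labels = np.array([np.argmin(np.linalg.norm(centres - x, axis=1))
                       for x in X_min])

    synthetic = []
    for _ in range(n_synthetic):
        i = informative[rng.choice(len(informative), p=w)]
        same_cluster = np.where(labels == labels[i])[0]
        j = rng.choice(same_cluster)
        alpha = rng.random()                        # interpolate in-cluster
        synthetic.append(X_min[i] + alpha * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

Because every synthetic point is a convex combination of two minority samples from the same cluster, the generated samples remain inside the minority region, which is the property the abstract emphasises.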
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering (CSE) en_US
dc.subject Machine learning en_US
dc.title MWMOTE-Boost: majority weighted minority over-sampling technique integrated with boosting for imbalanced data set learning en_US
dc.type Thesis-MSc en_US
dc.contributor.id 0409052061 P en_US
dc.identifier.accessionNumber 110061
dc.contributor.callno 006.31/BAR/2011 en_US

