dc.description.abstract |
Imbalanced data sets contain an unequal distribution of data samples among the classes
and pose a challenge to learning algorithms, because the minority-class concepts become
hard to learn. Synthetic oversampling techniques address this problem by generating
synthetic minority samples to balance the distribution between the majority- and
minority-class samples. This thesis identifies that most existing synthetic oversampling
techniques may, in some scenarios, generate wrong synthetic samples and thereby make the
learning task harder. To this end, the thesis presents a new synthetic oversampling
method, called Majority Weighted Minority Oversampling Technique (MWMOTE),
for handling imbalanced data sets efficiently. The term 'majority weighted minority
oversampling' here means that the minority samples important for oversampling are
identified and weighted using their nearest majority samples, and are then used to
generate the synthetic samples. To do this, MWMOTE uses information from both the
minority and majority samples in the data set. First, it identifies the hard-to-learn,
informative minority samples and assigns them weights according to their importance,
using distance information from their nearest majority samples. MWMOTE then identifies
the clusters in the minority data set and uses the weighted informative minority samples
to generate synthetic samples inside those clusters. This ensures that the generated
samples always lie inside some minority cluster and do not overlap with majority regions.
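The three steps above can be sketched in code. This is a minimal illustrative sketch, not the thesis's exact algorithm: the published MWMOTE derives each sample's weight from several closeness and density factors and clusters the minority set with average-linkage agglomerative clustering, whereas here the weight is simply the inverse distance to the nearest majority sample and the clustering is a crude threshold-based pass. The function name `mwmote_like_oversample` is hypothetical.

```python
import numpy as np

def mwmote_like_oversample(X_min, X_maj, n_synthetic, rng=None):
    """Simplified MWMOTE-style oversampling: weight borderline minority
    samples, cluster the minority set, and interpolate within clusters."""
    rng = np.random.default_rng(rng)

    # Step 1: weight minority samples by closeness to the majority class;
    # borderline (hard-to-learn) samples get larger weights.
    d_to_maj = np.min(
        np.linalg.norm(X_min[:, None, :] - X_maj[None, :, :], axis=2), axis=1)
    weights = 1.0 / (d_to_maj + 1e-12)
    probs = weights / weights.sum()

    # Step 2: crude single-pass clustering of the minority set, linking a
    # sample to an earlier one if their distance is below the mean pairwise
    # distance (a stand-in for the hierarchical clustering in the full method).
    n = len(X_min)
    pairwise = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    threshold = pairwise[np.triu_indices(n, k=1)].mean()
    labels = -np.ones(n, dtype=int)
    next_label = 0
    for i in range(n):
        for j in range(i):
            if pairwise[i, j] <= threshold:
                labels[i] = labels[j]
                break
        if labels[i] == -1:
            labels[i] = next_label
            next_label += 1

    # Step 3: pick seed samples in proportion to their weights and
    # interpolate each seed with a random member of its own cluster, so
    # every synthetic point lies inside some minority cluster.
    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for s in range(n_synthetic):
        i = rng.choice(n, p=probs)
        j = rng.choice(np.where(labels == labels[i])[0])
        alpha = rng.random()
        synthetic[s] = X_min[i] + alpha * (X_min[j] - X_min[i])
    return synthetic
```

Because each synthetic point is a convex combination of two samples from the same minority cluster, no generated point can fall in the gap between disjoint minority regions, which is the failure mode of naive interpolation that the clustering step is meant to avoid.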
The thesis finally presents a new stand-alone ensemble algorithm, called MWMOTE-Boost,
obtained by integrating MWMOTE into the boosting iterations of the well-known
AdaBoost.M2 ensemble procedure. The manner in which MWMOTE and AdaBoost.M2 are
integrated is similar to the recent state-of-the-art RAMOBoost algorithm, except that
the MWMOTE oversampling procedure is used in place of RAMOBoost's RAMO oversampling
procedure. The proposed methods, i.e., MWMOTE and MWMOTE-Boost, have been evaluated
extensively on four artificial and seventeen real-world data sets, using several
classifier models such as neural networks, decision trees, k-nearest neighbor, and
ensemble classifiers. The simulation results show that our new methods, MWMOTE and
MWMOTE-Boost, are better than or comparable to some existing methods in terms of various
assessment metrics, such as precision, recall, F-measure, G-mean, and the area under the
receiver operating characteristic (ROC) curve, usually known as the area under the
curve (AUC). |
en_US |