Abstract:
Using a set of binary classifiers to solve the multiclass classification problem has been a popular
approach over the years. This technique is known as binarization. The decision boundary that
these binary classifiers (also called base classifiers) have to learn is much simpler than the
decision boundary of a multiclass classifier. However, binarization gives rise to a new problem,
the class imbalance problem, which occurs when the data set used
for training has relatively fewer data items for one class than for another. This problem
becomes more severe if the original data set was itself imbalanced. Furthermore, binarization
has so far been applied only in the domain of supervised classification.
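
To illustrate how binarization itself can introduce class imbalance, the following minimal sketch counts positive and negative items in each binary sub-problem. It assumes a one-vs-rest decomposition, which is only one possible binarization scheme and is not stated in this abstract; the function name and the four-class data set are hypothetical.

```python
from collections import Counter


def one_vs_rest_counts(labels):
    """Return {class: (positives, negatives)} for each one-vs-rest sub-problem.

    Assumption: one-vs-rest binarization, where the items of one class are
    positives and the items of all other classes are pooled as negatives.
    """
    totals = Counter(labels)
    n = len(labels)
    return {cls: (count, n - count) for cls, count in totals.items()}


# Hypothetical, perfectly balanced 4-class data set: 100 items per class.
labels = [cls for cls in "ABCD" for _ in range(100)]
counts = one_vs_rest_counts(labels)

for cls, (pos, neg) in sorted(counts.items()):
    print(f"{cls} vs rest: {pos} positive, {neg} negative")
```

Even though every class has the same number of items in this example, each binary sub-problem sees three negatives for every positive. If the original data set were already skewed, the ratio for the smallest class would be worse still.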
In this thesis, we propose a framework called Binarization with Boosting and Oversampling
(BBO). Our framework can handle the class imbalance problem arising from binarization.
As the name of the framework suggests, this is achieved through a combination of boosting
and oversampling. The BBO framework can be used with any supervised classification algorithm.
Moreover, unlike earlier binarization approaches, we also apply our framework to
semi-supervised classification. The BBO framework has been rigorously tested on a number
of benchmark data sets from the UCI Machine Learning Repository. The experimental results
show that the BBO framework achieves higher accuracy than the traditional binarization
approach.