Abstract:
Naive Bayes (NB) is one of the most efficient and effective learning algorithms for
machine learning and data mining tasks, owing to its linear computational and memory
complexity and its ease of implementation. At its core, NB is a simplification of
the Bayes Optimal Classifier. A central difficulty with the Bayes Optimal
Classifier is the direct estimation of the class-conditional probability distribution (CPD)
from a given training data set with a high dimensional feature space while finding the
maximum a posteriori probability (MAP) hypothesis for an example whose class is
unknown. Estimating the CPD from such a training set requires that every possible
combination of attribute values be present in the training data, which is rarely the
case in real life learning domains. NB avoids this problem with the simplifying
assumption that attribute values are conditionally independent given the class value.
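In symbols, for attribute values a_1, ..., a_n and class c, the two classifiers can be
written as follows (a standard formulation consistent with the description above, not
quoted verbatim from the thesis):

    c_{MAP} = \arg\max_{c} P(c \mid a_1, \ldots, a_n) = \arg\max_{c} P(c)\, P(a_1, \ldots, a_n \mid c)

    c_{NB} = \arg\max_{c} P(c) \prod_{i=1}^{n} P(a_i \mid c)

The first argmax requires the full joint CPD; the second needs only the n univariate
conditionals, which is what makes NB linear in time and memory.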
If the attributes are truly conditionally independent given the class, NB makes the
same prediction as the Bayes Optimal Classifier, and NB is then optimal. In practice,
however, this independence assumption is almost always violated, and as a result the
classification accuracy of NB degrades in a large number of learning domains. In this
thesis, we propose a new Naive Bayesian classification algorithm that alleviates the
independence assumption problem, thereby improving the performance of NB and keeping
its predictions close to optimal across learning domains. In our algorithm, a measure
of attribute dependence is computed for each attribute: the dependence of each attribute
on the other attributes is estimated from the training data set using a dependency
equation.
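The dependency equation itself is developed in the thesis; as an illustrative stand-in
only, pairwise dependence between discrete attributes can be estimated with
class-conditional mutual information (all names below are hypothetical):

    import math
    from collections import Counter

    def conditional_mutual_information(xs, ys, cs):
        # Estimate I(X; Y | C) from parallel lists of discrete
        # attribute values xs, ys and class labels cs.
        # Stand-in dependence measure, NOT the thesis's
        # dependency equation.
        n = len(xs)
        n_xyc = Counter(zip(xs, ys, cs))
        n_xc = Counter(zip(xs, cs))
        n_yc = Counter(zip(ys, cs))
        n_c = Counter(cs)
        cmi = 0.0
        for (x, y, c), count in n_xyc.items():
            # p(x,y,c) * log[ p(x,y|c) / (p(x|c) p(y|c)) ]
            cmi += (count / n) * math.log(
                (count * n_c[c]) / (n_xc[(x, c)] * n_yc[(y, c)]))
        return cmi

The dependence of attribute i on the others could then be summarised, for example, as
the sum of this quantity over all other attributes j.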
The most interdependent attributes are then selected, based on their dependency
measure and a leave-one-out cross validation on the training data set. For a given
test example, a subset of training examples is chosen whose values on these attributes
match those of the test example, and a local NB is applied to this subset to
classify the test example.
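Putting the pieces together, a minimal sketch of this lazy, local procedure might look
as follows (the helper names, the fallback to a global NB, and the Laplace smoothing
are assumptions, not the thesis's exact formulation):

    import math
    from collections import Counter, defaultdict

    def classify_local_nb(test_x, train_X, train_y, selected_attrs, alpha=1.0):
        # Restrict the training data to the neighbourhood that matches
        # the test example on the selected (most interdependent)
        # attributes, then apply a standard NB with Laplace smoothing.
        subset = [(x, y) for x, y in zip(train_X, train_y)
                  if all(x[a] == test_x[a] for a in selected_attrs)]
        if not subset:  # no matching examples: assumed fallback to global NB
            subset = list(zip(train_X, train_y))
        class_counts = Counter(y for _, y in subset)
        value_counts = defaultdict(Counter)  # (attr, class) -> value frequencies
        for x, y in subset:
            for a in range(len(test_x)):
                value_counts[(a, y)][x[a]] += 1

        def log_posterior(c):
            # log P(c) + sum_i log P(a_i | c), estimated on the subset
            score = math.log(class_counts[c] / len(subset))
            for a in range(len(test_x)):
                n_values = len({x[a] for x, _ in subset})
                score += math.log(
                    (value_counts[(a, c)][test_x[a]] + alpha)
                    / (class_counts[c] + alpha * n_values))
            return score

        return max(class_counts, key=log_posterior)

Within the neighbourhood the selected attributes are constant, so they contribute
equally to every class score; only the remaining attributes differentiate the classes.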
The algorithm has been tested on a wide range of natural and artificial learning
domains taken from the UCI machine learning repository.
Experimental results show that the new algorithm obtains a lower error rate than the
NB classifier, BSEJ, and LBR. In some domains its error rate is also lower than that
of the decision tree learning algorithm C4.5 and the lazy decision tree learner LAZYDT.