Label based ensemble framework for multi-label data stream classification with recurring and novel class detection



Sajjadur Rahman

URI: http://lib.buet.ac.bd:8080/xmlui/handle/123456789/2980

Date: 2014-06

Abstract:

Of late, the advent of online social media has led to the inception of a new form of data stream called the multi-label data stream, where each stream record carries multiple class labels and requires a classifier to associate multiple categories with each record. Data streams present several challenges that have to be dealt with by any stream classification model. Concept drift, infinite length, and finite memory and processing time are the challenges that have been addressed by the existing multi-label data stream classification models in the literature. In real-world applications that generate data streams, the amount of labeled data is usually very scarce compared to the entire stream. Moreover, with the ever-changing nature of the Internet and social media, the emergence of new classes of data in the stream is a common phenomenon. This phenomenon is known as concept evolution. When this emergence occurs periodically for some classes of data, it is called class recurrence. None of the existing methodologies addresses any of the issues of scarcity of labeled data, concept evolution, and class recurrence. This thesis proposes a layered ensemble based classification framework (LEAD) for multi-label data streams. The primary component of the LEAD framework is a two-layer ensemble architecture. The top layer of the ensemble architecture reflects the most recent concept of the data stream, whereas the bottom layer represents the older concepts of the stream. As a result, the bottom layer enables LEAD to classify recurrent class instances. Moreover, the layered approach also helps to differentiate between recurrent and novel class instances, which significantly reduces the false alarm rate of novel class instance identification. LEAD deploys a fuzzy novel class detection technique to identify the emergence of novel concept(s) in the stream. The problem of the limited amount of labeled data is handled by a deferred classification mechanism. This mechanism allows more labeled data to appear in the stream, which may help the development of a more informed classifier. Experimental results show clearly that LEAD exhibits better performance than the baseline methods.
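The two-layer routing idea described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the abstract does not specify the base classifiers, the fuzzy novel class detector, or the deferred classification mechanism, so none of those are reproduced here. The sketch assumes, purely for illustration, that each per-label model is a centroid with a radius acting as its decision boundary; the names `CentroidModel` and `route` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple


@dataclass
class CentroidModel:
    """Hypothetical per-label model: a centroid plus a radius (assumed decision boundary)."""
    label: str
    centroid: Tuple[float, ...]
    radius: float

    def covers(self, x: Sequence[float]) -> bool:
        # Euclidean distance from x to the centroid, compared with the radius.
        distance = sum((a - b) ** 2 for a, b in zip(self.centroid, x)) ** 0.5
        return distance <= self.radius


def route(
    x: Sequence[float],
    top_layer: List[CentroidModel],
    bottom_layer: List[CentroidModel],
) -> Tuple[Set[str], str]:
    """Return (labels, source) for one stream record.

    The top layer (recent concepts) is consulted first. If no recent model
    covers x, the bottom layer (older concepts) is consulted, so a match there
    is treated as a recurrent class. A record covered by neither layer is only
    flagged as a novel-class candidate; a real detector would need further
    evidence before declaring a novel class.
    """
    recent = {m.label for m in top_layer if m.covers(x)}
    if recent:
        return recent, "recent"
    older = {m.label for m in bottom_layer if m.covers(x)}
    if older:
        return older, "recurrent"
    return set(), "novel-candidate"
```

Because a record may fall inside several models at once, `route` returns a set of labels, which matches the multi-label setting. Checking the older layer before flagging a novel candidate is what lets a layered design separate recurrent classes from genuinely new ones in this simplified picture.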


Files in this item

Name: Full Thesis.pdf

Size: 903.5Kb

Format: PDF


This item appears in the following Collection(s)

Dissertations/Theses - Department of Computer Science and Engineering
Postgraduate dissertations (theses) of Computer Science and Engineering (CSE)
