Conceptual clustering and classification of information using vector space model

Faruk Ahmed, Md.

URI: http://lib.buet.ac.bd:8080/xmlui/handle/123456789/1625

Date: 2002-10

Abstract:

Document clustering is a popular tool for organizing a large collection of documents. Clustering algorithms are usually applied to documents represented as vectors in a high-dimensional term space. The two main problems with such a clustering approach are accurately clustering the correlated documents and determining the proper number of clusters. The first problem is analyzed in the current literature in different ways, including active clustering, the partitional k-means algorithm, projection-based methods including LSI, self-organizing maps, multidimensional scaling, graph-theoretic techniques, and many more. As for the second problem, most clustering approaches assume the number of clusters as a prerequisite quantity, as in the case of Markov State Cluster, partitional methods, and most of the graph-theoretic techniques. A few clustering algorithms that can automatically determine the number of clusters have been analyzed. A popular approach is based on an idea borrowed from Principal Component Analysis. Another approach uses a self-refinement process of discriminative feature identification and cluster label voting to converge to the optimal number of clusters. In this work we have implemented an iterative solution with an inductive knowledge base to achieve the optimal clustering. Both the inter-cluster distance and the number of clusters are iteratively varied to achieve this optimization. This new technique for determining the number of clusters and clustering documents shows promising results, with 81% clustering accuracy. For classification we studied an unsupervised clustering technique together with the group vector, which also minimizes the computational cost usually associated with ordinary classification approaches. The outcome is comparable to current practice and gives 78% classification accuracy.
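
The general idea of clustering documents in a term space without fixing the number of clusters in advance, and then classifying against one group vector per cluster, can be illustrated with a minimal sketch. This is not the thesis's algorithm: the single-pass threshold clustering, the raw term-frequency weighting, the function names (`tf_vector`, `cosine`, `cluster`, `classify`), and the sample documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Represent a document as a sparse term-frequency vector (assumed weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, threshold):
    """Single-pass clustering: a document joins its most similar group if the
    similarity reaches `threshold`, otherwise it starts a new group.
    The number of groups is an outcome, not an input."""
    groups = []  # each group holds a summed group vector and member indices
    for i, doc in enumerate(docs):
        vec = tf_vector(doc)
        best = max(groups, key=lambda g: cosine(vec, g["vector"]), default=None)
        if best is not None and cosine(vec, best["vector"]) >= threshold:
            best["vector"].update(vec)  # add term counts into the group vector
            best["members"].append(i)
        else:
            groups.append({"vector": Counter(vec), "members": [i]})
    return groups


def classify(text, groups):
    """Assign a new document to the group whose group vector is most similar.
    Only one comparison per group is needed, not one per stored document."""
    vec = tf_vector(text)
    return max(range(len(groups)), key=lambda k: cosine(vec, groups[k]["vector"]))


# Hypothetical sample documents, for illustration only.
docs = [
    "stock market prices fall",
    "market prices rise as stock trading grows",
    "football team wins the match",
    "the team lost the football match",
]

# Varying the similarity threshold changes how many clusters emerge.
for threshold in (0.3, 0.9):
    print(threshold, len(cluster(docs, threshold)))

groups = cluster(docs, 0.3)
print([g["members"] for g in groups])
print(classify("football match today", groups))
```

In this sketch the threshold plays the role of an inter-cluster distance setting: a loop that adjusts it and re-clusters is one simple way to search over the number of clusters, under the assumptions stated above.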

Files in this item

Name: Full Thesis .pdf

Size: 744.0Kb

Format: PDF

This item appears in the following Collection(s)

Dissertations/Theses - Department of Computer Science and Engineering
Post graduate dissertations (Theses) of Computer Science Engineering (CSE)