Conceptual clustering and classification of information using vector space model

dc.contributor.advisor	Rahman, Dr. Chowdhury Mofizur
dc.contributor.author	Faruk Ahmed, Md.
dc.date.accessioned	2016-01-10T03:47:24Z
dc.date.available	2016-01-10T03:47:24Z
dc.date.issued	2002-10
dc.identifier.uri	http://lib.buet.ac.bd:8080/xmlui/handle/123456789/1625
dc.description.abstract	Document clustering is a popular tool for organizing a large collection of documents. Clustering algorithms are usually applied to documents represented as vectors in a high-dimensional term space. The two main problems with this approach are accurately clustering the correlated documents and determining the proper number of clusters. The first problem is addressed in the current literature in different ways, including active clustering, the partitional k-means algorithm, projection-based methods including LSI, self-organizing maps, multidimensional scaling, graph-theoretic techniques and many more. As for the second problem, most clustering approaches assume the number of clusters as a prerequisite quantity, as in the case of Markov State Cluster, partitional methods and most of the graph-theoretic techniques. A few clustering algorithms that can automatically determine the number of clusters have been analyzed. A popular approach is based on an idea borrowed from Principal Component Analysis. Another approach uses a self-refinement process of discriminative feature identification and cluster label voting to converge to the optimal number of clusters. In this work we have implemented an iterative solution with an inductive knowledge base to achieve the optimal clustering. Both the inter-cluster distance and the number of clusters are iteratively varied to achieve this optimization. This new technique for determining the number of clusters and clustering documents shows promising results, with 81% clustering accuracy. For classification we studied an unsupervised clustering technique together with the group vector, which also minimizes the computational cost usually associated with ordinary classification approaches. The outcome is comparable to current practice and gives 78% classification accuracy.	en_US
dc.language.iso	en	en_US
dc.publisher	Department of Computer Science and Engineering, BUET	en_US
dc.subject	Cluster analysis-Computer programme	en_US
dc.title	Conceptual clustering and classification of information using vector space model	en_US
dc.type	Thesis-MSc	en_US
dc.contributor.id	040005017 F	en_US
dc.identifier.accessionNumber	98233
dc.contributor.callno	005.1/FAR/2002	en_US
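
The abstract describes documents as vectors in a term space, a number of clusters that is not fixed in advance, and classification against a group vector. The following is a minimal sketch of those ideas only, not the thesis's algorithm: the function names, the raw term-frequency weighting, the cosine similarity measure, the single-pass assignment rule and the threshold values are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (assumed weighting: raw counts)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def threshold_cluster(docs, threshold):
    """Single-pass clustering (an assumed stand-in technique).

    Each document joins the most similar existing cluster if the similarity
    to that cluster's group vector reaches `threshold`; otherwise it starts
    a new cluster. The number of clusters is therefore an output.
    """
    groups = []  # one summed term vector ("group vector") per cluster
    labels = []
    for text in docs:
        vec = term_vector(text)
        best, best_sim = None, threshold
        for i, group in enumerate(groups):
            sim = cosine(vec, group)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            groups.append(Counter(vec))
            labels.append(len(groups) - 1)
        else:
            groups[best].update(vec)  # fold the document into the group vector
            labels.append(best)
    return labels, groups


def classify(text, groups):
    """Assign a new document to the cluster with the nearest group vector."""
    vec = term_vector(text)
    return max(range(len(groups)), key=lambda i: cosine(vec, groups[i]))


docs = [
    "apple banana fruit",
    "banana fruit salad",
    "engine car wheel",
    "car wheel road",
]
labels, groups = threshold_cluster(docs, threshold=0.3)
strict_labels, strict_groups = threshold_cluster(docs, threshold=0.9)
```

Varying the threshold changes how many clusters emerge, which loosely mirrors the abstract's point that the inter-cluster distance and the number of clusters are varied together. Comparing a new document against one group vector per cluster, rather than against every stored document, is where the reduced classification cost would come from in a scheme like this.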