Design and implementation of the new endemism concept to determine special frequent patterns

dc.contributor.advisor	Shahriyar, Dr. Rifat
dc.contributor.author	Basak, Madhusudan
dc.date.accessioned	2018-08-01T06:06:21Z
dc.date.available	2018-08-01T06:06:21Z
dc.date.issued	2018-03-20
dc.identifier.uri	http://lib.buet.ac.bd:8080/xmlui/handle/123456789/4953
dc.description.abstract	In a transaction database or support set, the task of finding the patterns that occur more frequently than a specified threshold is known as frequent pattern mining. Since its inception in the early 1990s, frequent pattern mining has been studied extensively and applied to a wide range of application domains, such as consumer behavior analysis, web log mining, and gene expression profiling. While there has been substantial research on a wide variety of frequent patterns, the evolution of existing application domains and the emergence of new ones demand new kinds of patterns that can reveal distinguishing characteristics of the underlying support set. For example, clickstream analysis seeks to segment users into meaningful clusters based on their click paths, which requires identifying click sequences that contribute to user profiling. There are many such examples where an analyst seeks patterns that can reveal distinguishing characteristics of the underlying population. To the best of our knowledge from the literature review, there is no recent work that determines these characteristics. However, many of them can be identified using a newly proposed concept named endemism. If the constituent elements of a pattern are more likely to be found in combination and less likely to be found otherwise, this co-occurring tendency is referred to as endemism, and such a pattern is called an endemic pattern. This thesis introduces the endemism concept to perform pattern-level grouping of records or users, which can provide valuable information about the underlying support set. This work proposes two scoring strategies, Reluctancy Scoring and Affinity Scoring, to evaluate the endemism of frequent patterns. 
This thesis also proposes three heuristics, TopK Selection, Optimized Search, and Random Selection, as alternatives to the costly Combinatorial Search method for the final grouping of the records. Experiments show that Reluctancy Scoring outperforms Affinity Scoring, and that Optimized Search provides the best results among the heuristics at a small cost in running time.
dc.language.iso	en	en_US
dc.publisher	Department of Computer Science and Engineering (CSE), BUET	en_US
dc.subject	Data mining	en_US
dc.title	Design and implementation of the new endemism concept to determine special frequent patterns	en_US
dc.type	Thesis-MSc	en_US
dc.contributor.id	1014052035P	en_US
dc.identifier.accessionNumber	116187
dc.contributor.callno	005.759/BAS/2018	en_US
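
The abstract describes frequent pattern mining as finding patterns whose frequency exceeds a threshold, and endemism as the tendency of a pattern's elements to occur together rather than apart. The Python sketch below illustrates both ideas on a toy clickstream. It is not the thesis's method: the record does not define Reluctancy Scoring or Affinity Scoring, so `cooccurrence_ratio` is a hypothetical stand-in chosen only to make the intuition concrete, and the function names, the sample data, and the brute-force enumeration are all assumptions.

```python
from itertools import combinations


def support(pattern, transactions):
    """Fraction of transactions that contain every item of `pattern`."""
    pattern = frozenset(pattern)
    return sum(pattern <= t for t in transactions) / len(transactions)


def frequent_patterns(transactions, min_support, max_size=2):
    """Brute-force enumeration of itemsets with support >= min_support.

    Illustrative only: practical miners prune the search space
    instead of testing every combination.
    """
    items = sorted(set().union(*transactions))
    found = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s >= min_support:
                found[frozenset(combo)] = s
    return found


def cooccurrence_ratio(pattern, transactions):
    """Hypothetical endemism-style indicator (NOT a score from the thesis).

    Among transactions containing at least one item of the pattern,
    return the fraction that contain all of its items. A value near 1
    means the items rarely appear without each other.
    """
    pattern = frozenset(pattern)
    any_hit = sum(bool(pattern & t) for t in transactions)
    all_hit = sum(pattern <= t for t in transactions)
    return all_hit / any_hit if any_hit else 0.0


# Toy click paths (assumed data, one set of visited pages per user).
transactions = [frozenset(t) for t in (
    {"home", "search", "cart"},
    {"home", "search", "cart"},
    {"home", "blog"},
    {"home", "search", "cart", "checkout"},
    {"home", "blog", "about"},
)]

frequent = frequent_patterns(transactions, min_support=0.6)
for pattern, s in sorted(frequent.items(), key=lambda kv: sorted(kv[0])):
    ratio = cooccurrence_ratio(pattern, transactions)
    print(sorted(pattern), f"support={s:.2f}", f"ratio={ratio:.2f}")
```

In this toy data, `{"cart", "search"}` and `{"cart", "home"}` have the same support, but only the first pair always appears together, which is the kind of distinction that support alone cannot express.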