DSpace Repository

Design and implementation of the new endemism concept to determine special frequent patterns

dc.contributor.advisor Shahriyar, Dr. Rifat
dc.contributor.author Basak, Madhusudan
dc.date.accessioned 2018-08-01T06:06:21Z
dc.date.available 2018-08-01T06:06:21Z
dc.date.issued 2018-03-20
dc.identifier.uri http://lib.buet.ac.bd:8080/xmlui/handle/123456789/4953
dc.description.abstract In a transaction database or support set, the task of finding the patterns that occur more frequently than a specified threshold is known as frequent pattern mining. Since its inception in the early 1990s, frequent pattern mining has been studied extensively and applied to a wide range of application domains, such as consumer behavior analysis, web log mining, and gene expression profiling. While there has been substantial research on a wide variety of frequent patterns, the evolution of existing application domains and the emergence of new ones demand new varieties of patterns that can reveal distinguishing characteristics of the underlying support set. For example, clickstream analysis seeks to segment users into meaningful clusters based on their click paths, which requires identifying click sequences that contribute to user profiling. There are many such examples where analysts seek patterns that can reveal distinguishing characteristics of the underlying population. To the best of our knowledge from the literature review, there is no recent work on determining these characteristics. However, many of these distinguishing characteristics can be identified using a newly proposed concept named endemism. If the constituent elements of a pattern are more likely to be found in combination and less likely to be found otherwise, this co-occurring tendency of the elements is referred to as endemism, and this type of pattern is called an endemic pattern. This thesis introduces the endemism concept to perform pattern-level grouping of records or users, which can provide valuable information about the underlying support set. This work proposes two scoring strategies, Reluctancy Scoring and Affinity Scoring, to evaluate the endemism of frequent patterns.
This thesis also proposes three heuristics, TopK Selection, Optimized Search, and Random Selection, as alternatives to the costly Combinatorial Search method for the final grouping of the records. Experiments show that Reluctancy Scoring outperforms Affinity Scoring, and that Optimized Search provides the best results among the heuristics at a small cost in running time.
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering (CSE), BUET en_US
dc.subject Data mining en_US
dc.title Design and implementation of the new endemism concept to determine special frequent patterns en_US
dc.type Thesis-MSc en_US
dc.contributor.id 1014052035P en_US
dc.identifier.accessionNumber 116187
dc.contributor.callno 005.759/BAS/2018 en_US

