Cleaning and clustering of sensor data by k-means algorithm for efficient query processing

dc.contributor.advisor	Latiful Hoque, Dr. Abu Sayed Md.
dc.contributor.author	Muhidul Islam Khan, Md.
dc.date.accessioned	2015-12-26T10:38:47Z
dc.date.available	2015-12-26T10:38:47Z
dc.date.issued	2009-08
dc.identifier.uri	http://lib.buet.ac.bd:8080/xmlui/handle/123456789/1560
dc.description.abstract	The way sensor data is collected will be revolutionized once the emerging technology of distributed sensor networks becomes fully functional and widely available. Distributed sensor networks are an attractive technology, but the program/stack memory and battery life of today's nodes do not permit complex data mining at run time. Effective data mining can instead be implemented on the central base station, where computational power is generally not constrained. Today's real-world databases are highly susceptible to noisy, missing and inconsistent data because of their typically huge size and their likely origin from multiple, heterogeneous sources. Low-quality data leads to low-quality mining results. There are many possible causes of noisy data (data with incorrect attribute values). The sensor nodes used for data collection may be faulty. Errors can occur in data transmission. There may be technology limitations, such as a limited buffer size for coordinating synchronized data transfer and consumption. Incorrect data may also result from inconsistencies in the naming conventions or data codes used, or from inconsistent formats for input fields. Duplicate tuples also require data cleaning. Preprocessing is therefore required to remove noisy, missing and inconsistent data before Wireless Sensor Network (WSN) data can be mined efficiently. A number of research works have addressed mining WSN data, but we found none on preprocessing WSN data for efficient query processing. In this project, we evaluated a number of statistical techniques for handling missing data. Among these, the mean-before-after technique was found most suitable. We implemented the Approximate Duplicate Record Detection method to remove duplicate records from a dataset. We used WSN datasets available on the Internet for the experiments. The k-means algorithm was applied to cluster the dataset. The cleaned and clustered dataset showed better query-processing performance than the dirty, non-clustered data.	en_US
dc.language.iso	en	en_US
dc.publisher	Department of Computer Science and Engineering, BUET	en_US
dc.subject	Data mining	en_US
dc.title	Cleaning and clustering of sensor data by k-means algorithm for efficient query processing	en_US
dc.type	Thesis-MSc	en_US
dc.contributor.id	100705049 P	en_US
dc.identifier.accessionNumber	107379
dc.contributor.callno	005.759/MUH/2009	en_US
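
The abstract reports that the mean-before-after technique was the most suitable of those evaluated for handling missing data, but the record does not say how it works. The sketch below is a minimal illustration under one common reading of that technique: a missing reading is replaced by the mean of the nearest observed readings before and after it. The function name, the use of `None` for a missing reading, and the fallback at the ends of the series are assumptions made for this example, not details taken from the thesis.

```python
from typing import List, Optional


def mean_before_after(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each missing reading (None) with the mean of the nearest
    observed readings before and after it (hypothetical helper)."""
    filled = list(values)
    n = len(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        # Nearest observed reading before position i, searched in the
        # original data so that filled values are never reused.
        before = next(
            (values[j] for j in range(i - 1, -1, -1) if values[j] is not None),
            None,
        )
        # Nearest observed reading after position i.
        after = next(
            (values[j] for j in range(i + 1, n) if values[j] is not None),
            None,
        )
        if before is not None and after is not None:
            filled[i] = (before + after) / 2
        elif before is not None:
            # Assumption: at the end of the series, carry the last reading forward.
            filled[i] = before
        elif after is not None:
            # Assumption: at the start of the series, carry the next reading back.
            filled[i] = after
        # If no reading is observed at all, the gap is left as None.
    return filled


if __name__ == "__main__":
    readings = [20.0, None, 22.0, None, None, 25.0]
    print(mean_before_after(readings))
```

Searching the original series rather than the partially filled one means that consecutive gaps all receive the same value, the mean of the two observed readings that bracket them.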