Evaluating the scalability of Hadoop in a real and virtual environment on a huge volume of unstructured data

Siddiq Bin Nur, Md.

URI: http://lib.buet.ac.bd:8080/xmlui/handle/123456789/4741

Date: 2017-03-27

Abstract:

Businesses across all industries, as well as academic institutions and research organizations, are gathering and storing more and more unstructured data every day. Unstructured data is constantly generated through call center logs, emails, web documents, blogs, tweets, customer comments, customer reviews, and so on. It accounts for the lion's share of the digital space, occupying approximately 80% by volume compared with only 20% for structured data. Until recently, technology offered little support for doing anything with it beyond storing it or analyzing it manually. As the amount of unstructured data grows rapidly, it becomes increasingly challenging for businesses to summarize, understand, and make sense of it in order to make better business decisions, yet organizations urgently need to process and exploit such data to gain a business edge. Some big data tools, primarily those based on Hadoop and MapReduce, are designed from the ground up to manage and analyze unstructured information. This project evaluates the scalability of a Hadoop cluster when performing word count on huge volumes of unstructured text. To this end, a Hadoop cluster is set up in a real environment and sample data sets are analyzed on it. The cluster is found to reduce processing time significantly, with the reduction depending on the cluster size: the results show that task completion time improves as the cluster size increases.
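The word-count job described above follows the standard MapReduce pattern: a map phase emits a (word, 1) pair for every token, the framework sorts and groups the pairs by key, and a reduce phase sums the counts for each word. The block below is a minimal Python sketch of that data flow in the style of a Hadoop Streaming mapper and reducer (tab-separated `key<TAB>value` lines). It is not the code used in the thesis; the function names `mapper`, `reducer`, and `run_local`, and the plain whitespace tokenization, are assumptions made for illustration. `run_local` only simulates the shuffle/sort step inside one process, so it shows the logic of the job, not cluster performance.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map phase: emit one 'word<TAB>1' record per whitespace-separated token.

    Tokenization by str.split() is an assumption; the thesis does not specify it.
    """
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_records: Iterable[str]) -> Iterator[str]:
    """Reduce phase: records arrive sorted by key, so equal words are adjacent."""
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_local(lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper: simulate map -> shuffle/sort -> reduce in one process."""
    # Hadoop sorts map output by key before the reduce phase; emulate that here.
    shuffled = sorted(mapper(lines), key=lambda record: record.split("\t", 1)[0])
    result: Dict[str, int] = {}
    for record in reducer(shuffled):
        word, total = record.split("\t")
        result[word] = int(total)
    return result
```

For example, `run_local(["the quick fox", "the lazy dog the"])` counts `the` three times and every other word once. On an actual cluster, `mapper` and `reducer` would each be wrapped in a script reading `sys.stdin`, and Hadoop would run many map tasks in parallel over input splits and distribute the sorted keys across reduce tasks; adding nodes lets more of these tasks run concurrently, which is presumably the mechanism behind the shorter completion times reported for larger clusters.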


Files in this item

Name: Full Thesis, ...

Size: 943.5Kb

Format: PDF

This item appears in the following Collection(s)

Dissertations/Theses - Institute of Information and Communication Technology
Post graduate dissertations (Theses) of Institute of Information and Communication Technology (IICT)