DSpace Repository

Evaluating the scalability of hadoop in a real and virtual environment on a huge volume of unstructured data

dc.contributor.advisor Saiful Islam, Md.
dc.contributor.author Siddiq Bin Nur, Md.
dc.date.accessioned 2018-01-27T09:02:42Z
dc.date.available 2018-01-27T09:02:42Z
dc.date.issued 2017-03-27
dc.identifier.uri http://lib.buet.ac.bd:8080/xmlui/handle/123456789/4741
dc.description.abstract Businesses across all industries, academic institutions, and research organizations are gathering and storing more and more unstructured data on a daily basis. Unstructured data is constantly generated via call center logs, emails, documents on the web, blogs, tweets, customer comments, customer reviews, and so on. Unstructured data takes the lion’s share of the digital space, occupying approximately 80% by volume compared to only 20% for structured data. Until recently, technology had not evolved to support doing much with it except storing it or analyzing it manually. While the amount of unstructured data is increasing rapidly, businesses’ ability to summarize, understand, and make sense of such data for making better business decisions is becoming a challenge. Yet organizations are in dire need of processing and exploiting unstructured data to gain an edge in business. Some big data tools, primarily those based on Hadoop and MapReduce, are designed from the ground up to manage and analyze unstructured information. In this project, an attempt is made to determine the scalability of a Hadoop cluster on huge volumes of unstructured textual data for a word count task. For this purpose, a Hadoop cluster is established in a real environment, and sample data sets are analyzed using the cluster. It is found that the processing time needed to produce the desired output is significantly reduced depending on the Hadoop cluster size. The results show that as the cluster size increases, performance improves in terms of task completion time. en_US
dc.language.iso en en_US
dc.publisher Institute of Information and Communication Technology (IICT) en_US
dc.subject Database management | Optical data processing en_US
dc.title Evaluating the scalability of hadoop in a real and virtual environment on a huge volume of unstructured data en_US
dc.type Thesis - Post Graduate Diploma en_US
dc.contributor.id 0413311014 en_US
dc.identifier.accessionNumber 115892
dc.contributor.callno 005.74/SID/2017 en_US

