DSpace Repository

Scalable algorithm for multi-class support vector machine on geo-distributed datasets

Show simple item record

dc.contributor.advisor Adnan, Dr. Muhammad Abdullah
dc.contributor.author Kabir, Tasnim
dc.date.accessioned 2019-11-09T04:42:24Z
dc.date.available 2019-11-09T04:42:24Z
dc.date.issued 2019-07-23
dc.identifier.uri http://lib.buet.ac.bd:8080/xmlui/handle/123456789/5363
dc.description.abstract Training machine learning models on large-scale data to efficiently discover valuable information while maintaining the security and privacy of the data remains an important research issue. Many real-life applications, such as health-care systems or financial organizations, distribute data over many data centers, and these data centers may have different privacy policies. Making joint decisions over such datasets without sharing the local information of each data center is necessary and remains a challenging problem. State-of-the-art methods mainly rely on cryptographic techniques to ensure the privacy of data communication between data centers. These techniques alone are not suitable for large-scale geo-distributed datasets, as they are designed for small-scale systems. To solve these problems, we propose a novel approach to a privacy-preserving Support Vector Machine (SVM) algorithm in which the training set is distributed and each partition can contain large-scale data. We use the traditional SVM model to train the dataset at each local data center, and from a few parameters sent by those local models, a centralized machine computes the final result. We show that the proposed model is secure in an adverse environment and use experimental evaluation to demonstrate its correctness and computation speed compared to other parallel SVM training models. Our proposed Distributed SVM (D-SVM) and Time-constrained Distributed SVM (TCD-SVM) algorithms improve the efficiency and speed of the learning network. We conduct experiments comparing our algorithms with traditional SVM and state-of-the-art algorithms using datasets collected from the UCI Machine Learning repository. We simulate these algorithms on Amazon Web Services (AWS) EC2 instances and show that Distributed SVM and Time-constrained Distributed SVM achieve improvements of 87.5% and 90.67% in task completion time, respectively, compared to the traditional SVM.
We also show that we can achieve accuracy very close to that of the traditional SVM while greatly improving task completion times. en_US
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering en_US
dc.subject Big data en_US
dc.title Scalable algorithm for multi-class support vector machine on geo-distributed datasets en_US
dc.type Thesis-MSc en_US
dc.contributor.id 0417052036 en_US
dc.identifier.accessionNumber 117209
dc.contributor.callno 005.7/TAS/2019 en_US
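The abstract describes a scheme in which each data center trains an SVM locally and sends only a few model parameters to a central machine that combines them. As a rough illustration of that idea (not the thesis's actual D-SVM/TCD-SVM algorithms, whose aggregation rule is not given in the record), the sketch below trains a Pegasos-style linear SVM on each of two hypothetical partitions and has a central aggregator average the local weight vectors; all data and functions here are invented for the example:

```python
import random

def train_local_svm(data, lam=0.01, epochs=200):
    """Pegasos-style subgradient descent for a linear SVM on one partition.
    `data` is a list of (feature_vector, label) pairs with labels in {+1, -1}."""
    dim = len(data[0][0])
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # step size schedule
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            if margin < 1:  # hinge loss is active: shrink w and step toward y*x
                w = [(1 - eta * lam) * wi + eta * y * xi for wi, xi in zip(w, x)]
            else:           # only the regularizer contributes to the subgradient
                w = [(1 - eta * lam) * wi for wi in w]
    return w

def aggregate(models):
    """Central machine: average the weight vectors sent by the data centers.
    Only these few parameters cross the network, never the raw records."""
    dim = len(models[0])
    return [sum(m[i] for m in models) / len(models) for i in range(dim)]

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

# Two hypothetical "data centers", each holding a linearly separable partition.
random.seed(0)
def make_partition(n=50):
    pos = [([random.gauss(2, 0.5), random.gauss(2, 0.5)], 1) for _ in range(n)]
    neg = [([random.gauss(-2, 0.5), random.gauss(-2, 0.5)], -1) for _ in range(n)]
    return pos + neg

part1, part2 = make_partition(), make_partition()

# Each center trains locally; only the weight vectors are shared.
w_global = aggregate([train_local_svm(part1), train_local_svm(part2)])
all_data = part1 + part2
acc = sum(predict(w_global, x) == y for x, y in all_data) / len(all_data)
print(f"accuracy of averaged model: {acc:.2f}")
```

Simple parameter averaging works here because the partitions are drawn from the same well-separated distribution; the thesis's algorithms would additionally need to handle multi-class labels and skewed partitions.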

