DSpace Repository

Scalable algorithm for multi-class support vector machine on geo-distributed datasets

Show simple item record

dc.contributor.advisor Adnan, Dr. Muhammad Abdullah
dc.contributor.author Kabir, Tasnim
dc.date.accessioned 2019-11-09T04:42:24Z
dc.date.available 2019-11-09T04:42:24Z
dc.date.issued 2019-07-23
dc.identifier.uri http://lib.buet.ac.bd:8080/xmlui/handle/123456789/5363
dc.description.abstract Training machine learning models on large-scale data to efficiently discover valuable information while maintaining the security and privacy of the data remains an important research issue. Many real-life applications, such as health-care systems or financial organizations, distribute data over many data centers, and these data centers may have different privacy policies. Making joint decisions over such datasets without sharing the local information of each data center is necessary and remains a challenging problem. State-of-the-art methods mainly rely on cryptographic techniques to ensure the privacy of data communication between data centers. These techniques alone are not suitable for large-scale geo-distributed datasets, as they are designed for small-scale systems. To solve these problems, we propose a novel approach to a privacy-preserving Support Vector Machine (SVM) algorithm in which the training set is distributed and each partition can contain large-scale data. We use the traditional SVM model to train the dataset at each local data center, and from a few parameters sent by those local models, a centralized machine computes the final result. We show that the proposed model is secure in an adverse environment and use experimental evaluation to demonstrate its correctness and computation speed compared to other parallel SVM training models. Our proposed Distributed SVM (D-SVM) and Time-constrained Distributed SVM (TCD-SVM) algorithms improve the efficiency and speed of the learning network. We conduct experiments comparing our algorithms with traditional SVM and state-of-the-art algorithms using datasets collected from the UCI Machine Learning repository. We simulate these algorithms on Amazon Web Services (AWS) EC2 instances and show that Distributed SVM and Time-constrained Distributed SVM achieve improvements of 87.5% and 90.67% in task completion time, respectively, compared to the traditional SVM.
We also show that we can achieve accuracy very close to that of the traditional SVM while greatly improving task completion times. en_US
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering en_US
dc.subject Big data en_US
dc.title Scalable algorithm for multi-class support vector machine on geo-distributed datasets en_US
dc.type Thesis-MSc en_US
dc.contributor.id 0417052036 en_US
dc.identifier.accessionNumber 117209
dc.contributor.callno 005.7/TAS/2019 en_US
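The abstract describes a scheme in which each data center trains an SVM locally and sends only a few model parameters to a central machine that combines them. As a rough illustration of that idea (not the thesis's actual D-SVM/TCD-SVM algorithms, whose aggregation rule is not given in the record), the sketch below trains a Pegasos-style linear SVM on each of two hypothetical partitions and has a central aggregator average the local weight vectors; all data and functions here are invented for the example:

```python
import random

def train_local_svm(data, lam=0.01, epochs=200):
    """Pegasos-style subgradient descent for a linear SVM on one partition.
    `data` is a list of (feature_vector, label) pairs with labels in {+1, -1}."""
    dim = len(data[0][0])
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # step size schedule
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            if margin < 1:  # hinge loss is active: shrink w and step toward y*x
                w = [(1 - eta * lam) * wi + eta * y * xi for wi, xi in zip(w, x)]
            else:           # only the regularizer contributes to the subgradient
                w = [(1 - eta * lam) * wi for wi in w]
    return w

def aggregate(models):
    """Central machine: average the weight vectors sent by the data centers.
    Only these few parameters cross the network, never the raw records."""
    dim = len(models[0])
    return [sum(m[i] for m in models) / len(models) for i in range(dim)]

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

# Two hypothetical "data centers", each holding a linearly separable partition.
random.seed(0)
def make_partition(n=50):
    pos = [([random.gauss(2, 0.5), random.gauss(2, 0.5)], 1) for _ in range(n)]
    neg = [([random.gauss(-2, 0.5), random.gauss(-2, 0.5)], -1) for _ in range(n)]
    return pos + neg

part1, part2 = make_partition(), make_partition()

# Each center trains locally; only the weight vectors are shared.
w_global = aggregate([train_local_svm(part1), train_local_svm(part2)])
all_data = part1 + part2
acc = sum(predict(w_global, x) == y for x, y in all_data) / len(all_data)
print(f"accuracy of averaged model: {acc:.2f}")
```

Simple parameter averaging works here because the partitions are drawn from the same well-separated distribution; the thesis's algorithms would additionally need to handle multi-class labels and skewed partitions.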

