dc.contributor.advisor |
Adnan, Dr. Muhammad Abdullah |
|
dc.contributor.author |
Kabir, Tasnim |
|
dc.date.accessioned |
2019-11-09T04:42:24Z |
|
dc.date.available |
2019-11-09T04:42:24Z |
|
dc.date.issued |
2019-07-23 |
|
dc.identifier.uri |
http://lib.buet.ac.bd:8080/xmlui/handle/123456789/5363 |
|
dc.description.abstract |
Training machine learning models on large-scale data to efficiently discover valuable information while maintaining data security and privacy remains an important research issue. Many real-life applications, such as health-care systems and financial organizations, distribute data over many data centers, and these data centers may have different privacy policies. Making joint decisions over such a dataset without sharing the data centers' local information is essential and becomes a challenging problem. State-of-the-art methods mainly rely on cryptographic techniques to ensure privacy during data communication between the data centers. These techniques alone are not suitable for large-scale geo-distributed datasets, as they were designed for small-scale systems.
To solve these problems, we propose a novel approach to a privacy-preserving Support Vector Machine (SVM) algorithm in which the training set is distributed and each partition can contain large-scale data. We use the traditional SVM model to train the datasets of the local data centers, and from a few parameters sent by those trained models, a centralized machine computes the final result. We show that the proposed model is secure in an adversarial environment and use experimental evaluation to demonstrate its correctness and computation speed compared to other parallel SVM training models.
Our proposed Distributed SVM (D-SVM) and Time-constrained Distributed SVM (TCD-SVM) algorithms scale the efficiency and speed of the learning network. We conduct experiments to compare our algorithms with the traditional SVM and state-of-the-art algorithms using datasets collected from the UCI Machine Learning Repository. We simulate these algorithms on Amazon Web Services (AWS) EC2 instances and show that Distributed SVM and Time-constrained Distributed SVM achieve improvements of 87.5% and 90.67% in task completion time, respectively, compared to the traditional SVM. We show that we can achieve accuracy very close to that of the traditional SVM while greatly improving task completion time. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
Department of Computer Science and Engineering |
en_US |
dc.subject |
Big data |
en_US |
dc.title |
Scalable algorithm for multi-class support vector machine on geo-distributed datasets |
en_US |
dc.type |
Thesis-MSc |
en_US |
dc.contributor.id |
0417052036 |
en_US |
dc.identifier.accessionNumber |
117209 |
|
dc.contributor.callno |
005.7/TAS/2019 |
en_US |