dc.description.abstract |
Although the best-performing supervised learning models are often ensembles of many base classifiers or single very large and complex classifiers, such models can be impractical on resource-constrained hardware such as smartphones and Internet of Things (IoT) devices. Model compression, or distillation, addresses this by turning a large and complex model or an ensemble of models into a smaller and faster model, usually without significant loss in performance, making it more suitable for deployment on resource-constrained devices. However, existing offline distillation methods rely on a strong pre-trained teacher model to solve complex problems, leading to a lengthy and complex multi-phase training procedure. Online counterparts, on the other hand, address this limitation by training the student and teacher models simultaneously, where peer learning provides additional teaching knowledge. Although online distillation sometimes outperforms teacher-based offline distillation, this simultaneous teacher-student learning strategy can degenerate into “the blind leading the blind” paradigm. To avoid these problems, we present a new single-stage training procedure named Mixture of Distillation (MoD), which introduces a form of independent-dependent group learning for both student and teacher models and exploits the complementary strengths of the offline and online distillation loss functions. The main objective of this hybrid approach is to improve accuracy and reduce training time. Extensive evaluations on the SVHN, MNIST, NumtaDB, CIFAR-10, and CIFAR-100 datasets substantiate that our proposed “Mixture of Distillation” improves generalization performance more significantly than existing distillation methods. |
en_US |
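
To illustrate the idea of combining offline and online distillation losses described in the abstract, below is a minimal PyTorch sketch of a hybrid objective: cross-entropy on ground-truth labels, a KL term against a fixed pre-trained teacher (offline), and a KL term against the averaged predictions of simultaneously trained peers (online). The function name hybrid_distillation_loss and the temperature and weighting values T, alpha, and beta are illustrative assumptions, not the exact formulation or hyperparameters used by the proposed MoD method.

import torch
import torch.nn.functional as F

def hybrid_distillation_loss(student_logits, teacher_logits, peer_logits_list,
                             targets, T=4.0, alpha=0.5, beta=0.5):
    """Illustrative hybrid loss: hard-label cross-entropy plus offline
    (pre-trained teacher) and online (peer ensemble) distillation terms.
    The weighting scheme here is an assumption for demonstration only."""
    # Standard supervised loss on the ground-truth labels.
    ce = F.cross_entropy(student_logits, targets)

    # Offline term: KL divergence to a fixed, pre-trained teacher.
    kd_offline = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits.detach() / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

    # Online term: KL divergence to the averaged softened predictions of
    # peer models trained simultaneously with the student.
    peer_probs = torch.stack(
        [F.softmax(p.detach() / T, dim=1) for p in peer_logits_list]
    ).mean(dim=0)
    kd_online = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        peer_probs,
        reduction="batchmean",
    ) * (T * T)

    return ce + alpha * kd_offline + beta * kd_online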