New mixture of distillation strategy for knowledge transfer


dc.contributor.advisor Islam, Dr. Md. Monirul
dc.contributor.author Mamunoor, Rashid
dc.date.accessioned 2024-01-21T06:44:54Z
dc.date.available 2024-01-21T06:44:54Z
dc.date.issued 2023-03-27
dc.identifier.uri http://lib.buet.ac.bd:8080/xmlui/handle/123456789/6573
dc.description.abstract Although the best-performing supervised learning models are often ensembles of many base classifiers or a single very large and complex classifier, such models can exceed the resources available on smartphones or Internet of Things (IoT) devices. Model compression, or knowledge distillation, addresses this by turning a large and complex model, or an ensemble of models, into a smaller and faster model, usually without significant loss in performance, making it more suitable for deployment on resource-constrained devices. However, existing offline distillation methods rely on a strong pre-trained teacher model, which leads to a lengthy and complex multi-phase training procedure. Online counterparts, on the other hand, address this limitation by training student and teacher models simultaneously, where peer learning provides extra teaching knowledge. Although online distillation sometimes outperforms teacher-based offline distillation, this simultaneous teacher-student learning strategy can fall into "the blind leading the blind" paradigm. To avoid these problems, we present a new single-stage training procedure named Mixture of Distillation (MoD), which introduces an independent-dependent group learning scheme for both student and teacher models and exploits the complementary strengths of the offline and online distillation loss functions. The main objective of this hybrid approach is to improve accuracy and reduce training time. Extensive evaluations on the SVHN, MNIST, NumtaDB, CIFAR-10 and CIFAR-100 datasets substantiate that the proposed Mixture of Distillation improves generalization performance more significantly than existing distillation methods. en_US
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering (CSE) en_US
dc.subject Computer programming en_US
dc.title New mixture of distillation strategy for knowledge transfer en_US
dc.type Thesis-MSc en_US
dc.contributor.id 1015052006 en_US
dc.identifier.accessionNumber 119491
dc.contributor.callno 001.6424/MAM/2023 en_US
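The abstract describes a loss that mixes offline distillation (soft targets from a frozen, pre-trained teacher) with online distillation (soft targets from a peer trained in parallel). The sketch below is only one plausible illustration of such a mixed loss in PyTorch, not the thesis implementation; the function name, the weights alpha and beta, and the temperature are assumptions introduced here for illustration.

import torch
import torch.nn.functional as F

def mixed_distillation_loss(student_logits, teacher_logits, peer_logits,
                            labels, alpha=0.5, beta=0.3, temperature=4.0):
    """Weighted sum of a hard-label loss, an offline KD term, and an online KD term.

    This is an illustrative sketch; the actual MoD weighting and grouping
    scheme are described in the thesis itself.
    """
    # Standard supervised loss on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)

    # Offline term: match the softened outputs of a frozen, pre-trained teacher.
    kd_offline = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits.detach() / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

    # Online term: match the softened outputs of a peer model trained simultaneously.
    kd_online = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(peer_logits.detach() / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

    return (1.0 - alpha - beta) * ce + alpha * kd_offline + beta * kd_online

# Illustrative usage with random tensors (batch of 8, 10 classes):
if __name__ == "__main__":
    logits_s = torch.randn(8, 10)
    logits_t = torch.randn(8, 10)
    logits_p = torch.randn(8, 10)
    y = torch.randint(0, 10, (8,))
    print(mixed_distillation_loss(logits_s, logits_t, logits_p, y))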

