Knowledge Distillation

1. Distilling the Knowledge in a Neural Network [Hinton et al., 2015] (a minimal sketch of the distillation loss follows the list)

2. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results [Tarvainen & Valpola, 2017] (an EMA-update sketch also follows the list)

3. Noise as a Resource for Learning in Knowledge Distillation [2019]

4. Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation [Xu et al., 2020]

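Paper 1's core recipe is to train the student on the teacher's temperature-softened output distribution alongside the usual hard-label loss. Below is a minimal PyTorch sketch of that loss; the temperature `T` and mixing weight `alpha` are illustrative defaults, not values tuned in the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style KD loss: temperature-softened KL term plus hard-label CE."""
    # Soften both distributions with temperature T.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # Scale the soft term by T^2 so its gradient magnitude stays
    # comparable to the hard-label term as T changes (noted in the paper).
    soft_loss = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```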
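Paper 2 needs no separately pretrained teacher: the teacher's weights are an exponential moving average (EMA) of the student's weights, and a consistency loss pulls the student's predictions toward the teacher's on perturbed inputs. A minimal sketch of the weight-averaging step, assuming two architecturally identical `nn.Module`s; the decay value is illustrative.

```python
import torch

@torch.no_grad()
def mean_teacher_update(teacher, student, decay=0.999):
    """After each optimizer step, move each teacher parameter toward
    an exponential moving average of the matching student parameter."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)
```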