A collection of papers on Knowledge Distillation. The PDF of each paper can be obtained by clicking its title.
- Spatial knowledge distillation to aid visual reasoning. Aditya, S., Saha, R., Yang, Y. & Baral, C. (2019). WACV.
- Knowledge distillation from internal representations. Aguilar, G., Ling, Y., Zhang, Y., Yao, B., Fan, X. & Guo, C. (2020). AAAI.
- Compressing gans using knowledge distillation. Aguinaldo, A., Chiang, P. Y., Gain, A., Patil, A., Pearson, K. & Feizi, S. (2019).
- Variational information distillation for knowledge transfer. Ahn, S., Hu, S., Damianou, A., Lawrence, N. D. & Dai, Z. (2019). CVPR.
- Emotion recognition in speech using crossmodal transfer in the wild. Albanie, S., Nagrani, A., Vedaldi, A. & Zisserman, A. (2018). ACM MM.
- Learning and generalization in overparameterized neural networks going beyond two layers. Allen-Zhu, Z., Li, Y., & Liang, Y. (2019). NeurIPS.
- Large scale distributed neural network training through online distillation. Anil, R., Pereyra, G., Passos, A., Ormandi, R., Dahl, G. E. & Hinton, G. E. (2018). ICLR.
- On the optimization of deep networks: Implicit acceleration by overparameterization. Arora, S., Cohen, N., & Hazan, E. (2018). ICML.
- On knowledge distillation from complex networks for response prediction. Arora, S., Khapra, M. M. & Ramaswamy, H. G. (2019). NAACL-HLT.
- Domain adaptation of dnn acoustic models using knowledge distillation. Asami, T., Masumura, R., Yamaguchi, Y., Masataki, H. & Aono, Y. (2017). ICASSP.
- N2N learning: Network to network compression via policy gradient reinforcement learning. Ashok, A., Rhinehart, N., Beainy, F. & Kitani, K. M. (2018). ICLR.
- Ensemble knowledge distillation for learning improved and efficient networks. Asif, U., Tang, J. & Harrer, S. (2020). ECAI.
- Do deep nets really need to be deep? Ba, J. & Caruana, R. (2014). NeurIPS.
- Label refinery: Improving imagenet classification through label progression. Bagherinezhad, H., Horton, M., Rastegari, M. & Farhadi, A. (2018).
- Few shot network compression via cross distillation. Bai, H., Wu, J., King, I. & Lyu, M. (2020). AAAI.
- Learn spelling from teachers: transferring knowledge from language models to sequence-to-sequence speech recognition. Bai, Y., Yi, J., Tao, J., Tian, Z. & Wen, Z. (2019). Interspeech.
- Teacher guided architecture search. Bashivan, P., Tensen, M. & DiCarlo, J. J. (2019). ICCV.
- Adversarial network compression. Belagiannis, V., Farshad, A. & Galasso, F. (2018). ECCV.
- Representation learning: A review and new perspectives. Bengio, Y., Courville, A., & Vincent, P. (2013). IEEE TPAMI 35(8): 1798–1828.
- Uninformed students: Student-teacher anomaly detection with discriminative latent embeddings. Bergmann, P., Fauser, M., Sattlegger, D., & Steger, C. (2020). CVPR.
- Efficient video classification using fewer frames. Bhardwaj, S., Srinivasan, M. & Khapra, M. M. (2019). CVPR.
- Distributed Distillation for On-Device Learning. Bistritz, I., Mann, A., & Bambos, N. (2020). NeurIPS.
- Flexible Dataset Distillation: Learn Labels Instead of Images. Bohdal, O., Yang, Y., & Hospedales, T. (2020).
- Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks. Boo, Y., Shin, S., Choi, J., & Sung, W. (2021). AAAI.
- Why do Larger Models Generalize Better? A Theoretical Perspective via the XOR Problem. Brutzkus, A., & Globerson, A. (2019). ICML.
- Model compression. Bucilua, C., Caruana, R. & Niculescu-Mizil, A. (2006). SIGKDD.
- Online Fast Adaptation and Knowledge Accumulation (OSAKA): a New Approach to Continual Learning. Caccia, M., Rodriguez, P., Ostapenko, O., Normandin, F., Lin, M., Caccia, L., Laradji, I., Rish, I., Lacoste, A., Vazquez D., & Charlin, L. (2020). NeurIPS.
- Transferring knowledge from a RNN to a DNN. Chan, W., Ke, N. R. & Lane, I. (2015).
- Data-Free Knowledge Distillation for Object Detection. Chawla, A., Yin, H., Molchanov, P., & Alvarez, J. (2021). WACV.
- Distilling knowledge from ensembles of neural networks for speech recognition. Chebotar, Y. & Waters, A. (2016). Interspeech.
- Online knowledge distillation with diverse peers. Chen, D., Mei, J. P., Wang, C., Feng, Y. & Chen, C. (2020a). AAAI.
- Cross-Layer Distillation with Semantic Calibration. Chen, D., Mei, J. P., Zhang, Y., Wang, C., Wang, Z., Feng, Y., & Chen, C. (2021). AAAI.
- Learning efficient object detection models with knowledge distillation. Chen, G., Choi, W., Yu, X., Han, T., & Chandraker, M. (2017). NeurIPS.
- Data-Free Learning of Student Networks. Chen, H., Wang, Y., Xu, C., Yang, Z., Liu, C., Shi, B., Xu, C., Xu, C., & Tian, Q. (2019a). ICCV.
- Learning student networks via feature embedding. Chen, H., Wang, Y., Xu, C., Xu, C. & Tao, D. (2021). IEEE TNNLS 32(1): 25-35.
- Net2Net: Accelerating learning via knowledge transfer. Chen, T., Goodfellow, I. & Shlens, J. (2016). ICLR.
- Knowledge distillation with feature maps for image classification. Chen, W. C., Chang, C. C. & Lee, C. R. (2018a). ACCV.
- Adversarial distillation for efficient recommendation with external knowledge. Chen, X., Zhang, Y., Xu, H., Qin, Z. & Zha, H. (2018b). ACM TOIS 37(1): 1–28.
- A two-teacher framework for knowledge distillation. Chen, X., Su, J. & Zhang, J. (2019b). ISNN.
- Darkrank: Accelerating deep metric learning via cross sample similarities transfer. Chen, Y., Wang, N. & Zhang, Z. (2018c). AAAI.
- Distilling knowledge learned in BERT for text generation. Chen, Y. C., Gan, Z., Cheng, Y., Liu, J., & Liu, J. (2020b). ACL.
- Crdoco: Pixel-level domain transfer with cross-domain consistency. Chen, Y. C., Lin, Y. Y., Yang, M. H., Huang, J. B. (2019c). CVPR.
- Lifelong Machine Learning, Second Edition. Chen, Z. & Liu, B. (2018). Synthesis Lectures on Artificial Intelligence and Machine Learning 12(3): 1–207.
- A Multi-task Mean Teacher for Semi-supervised Shadow Detection. Chen, Z., Zhu, L., Wan, L., Wang, S., Feng, W., & Heng, P. A. (2020c). CVPR.
- Model compression and acceleration for deep neural networks: The principles, progress, and challenges. Cheng, Y., Wang, D., Zhou, P. & Zhang, T. (2018). IEEE Signal Proc Mag 35(1): 126–136.
- Explaining Knowledge Distillation by Quantifying the Knowledge. Cheng, X., Rao, Z., Chen, Y., & Zhang, Q. (2020). CVPR.
- On the efficacy of knowledge distillation. Cho, J. H. & Hariharan, B. (2019). ICCV.
- Xception: Deep learning with depthwise separable convolutions. Chollet, F. (2017). CVPR.
- Feature-map-level online adversarial knowledge distillation. Chung, I., Park, S., Kim, J. & Kwak, N. (2020). ICML.
- Bam! born-again multitask networks for natural language understanding. Clark, K., Luong, M. T., Khandelwal, U., Manning, C. D. & Le, Q. V. (2019). ACL.
- Binaryconnect: Training deep neural networks with binary weights during propagations. Courbariaux, M., Bengio, Y. & David, J. P. (2015). NeurIPS.
- Moonshine: Distilling with cheap convolutions. Crowley, E. J., Gray, G. & Storkey, A. J. (2018). NeurIPS.
- Knowledge distillation across ensembles of multilingual models for low-resource languages. Cui, J., Kingsbury, B., Ramabhadran, B., Saon, G., Sercu, T., Audhkhasi, K. & et al. (2017). ICASSP.
- Knowledge Augmented Deep Neural Networks for Joint Facial Expression and Action Unit Recognition. Cui, Z., Song, T., Wang, Y., & Ji, Q. (2020). NeurIPS.
- Defocus Blur Detection via Depth Distillation. Cun, X., & Pun, C. M. (2020). ECCV.
- ImageNet: A large-scale hierarchical image database. Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). CVPR.
- Exploiting linear structure within convolutional networks for efficient evaluation. Denton, E. L., Zaremba, W., Bruna, J., LeCun, Y. & Fergus, R. (2014). NeurIPS.
- BERT: Pre-training of deep bidirectional transformers for language understanding. Devlin, J., Chang, M. W., Lee, K. & Toutanova, K. (2019). NAACL-HLT.
- Adaptive regularization of labels. Ding, Q., Wu, S., Sun, H., Guo, J. & Xia, S. T. (2019).
- Compact trilinear interaction for visual question answering. Do, T., Do, T. T., Tran, H., Tjiputra, E. & Tran, Q. D. (2019). ICCV.
- Teacher supervises students how to learn from partially labeled images for facial landmark detection. Dong, X. & Yang, Y. (2019). ICCV.
- Unpaired multi-modal segmentation via knowledge distillation. Dou, Q., Liu, Q., Heng, P. A., & Glocker, B. (2020). IEEE TMI.
- Agree to Disagree: Adaptive Ensemble Knowledge Distillation in Gradient Space. Du, S., You, S., Li, X., Wu, J., Wang, F., Qian, C., & Zhang, C. (2020). NeurIPS.
- ShrinkTeaNet: Million-scale lightweight face recognition via shrinking teacher-student networks. Duong, C. N., Luu, K., Quach, K. G. & Le, N. (2019).
- Fast, Accurate, and Simple Models for Tabular Data via Augmented Distillation. Fakoor, R., Mueller, J. W., Erickson, N., Chaudhari, P., & Smola, A. J. (2020). NeurIPS.
- Transferring knowledge across learning processes. Flennerhag, S., Moreno, P. G., Lawrence, N. D. & Damianou, A. (2019). ICLR.
- Ensemble distillation for neural machine translation. Freitag, M., Al-Onaizan, Y. & Sankaran, B. (2017).
- LRC-BERT: Latent representation Contrastive Knowledge Distillation for Natural Language Understanding. Fu, H., Zhou, S., Yang, Q., Tang, J., Liu, G., Liu, K., & Li, X. (2021). AAAI.
- Efficient knowledge distillation from an ensemble of teachers. Fukuda, T., Suzuki, M., Kurata, G., Thomas, S., Cui, J. & Ramabhadran, B. (2017). Interspeech.
- Born again neural networks. Furlanello, T., Lipton, Z., Tschannen, M., Itti, L. & Anandkumar, A. (2018). ICML.
- An adversarial feature distillation method for audio classification. Gao, L., Mi, H., Zhu, B., Feng, D., Li, Y. & Peng, Y. (2019). IEEE Access 7: 105319–105330.
- Residual Error Based Knowledge Distillation. Gao, M., Wang, Y., & Wan, L. (2021). Neurocomputing 433: 154-161.
- Privileged modality distillation for vessel border detection in intracoronary imaging. Gao, Z., Chung, J., Abdelrazek, M., Leung, S., Hau, W. K., Xian, Z., Zhang, H., & Li, S. (2020). IEEE TMI 39(5): 1524-1534.
- Modality distillation with multiple stream networks for action recognition. Garcia, N. C., Morerio, P. & Murino, V. (2018). ECCV.
- Low-resolution face recognition in the wild via selective knowledge distillation. Ge, S., Zhao, S., Li, C. & Li, J. (2018). IEEE TIP 28(4):2051–2062.
- Efficient Low-Resolution Face Recognition via Bridge Distillation. Ge, S., Zhao, S., Li, C., Zhang, Y., & Li, J. (2020). IEEE TIP 29: 6898-6908.
- Advancing multi-accented lstm-ctc speech recognition using a domain specific student-teacher learning paradigm. Ghorbani, S., Bulut, A. E. & Hansen, J. H. (2018). SLTW.
- White-to-black: Efficient distillation of black-box adversarial attacks. Gil, Y., Chai, Y., Gorodissky, O. & Berant, J. (2019). NAACL-HLT.
- Adversarially robust distillation. Goldblum, M., Fowl, L., Feizi, S. & Goldstein, T. (2020). AAAI.
- Teaching semi-supervised classifier via generalized distillation. Gong, C., Chang, X., Fang, M. & Yang, J. (2018). IJCAI.
- Label propagation via teaching-to-learn and learning-to-teach. Gong, C., Tao, D., Liu, W., Liu, L., & Yang, J. (2017). TNNLS 28(6): 1452–1465.
- Generative adversarial nets. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). NeurIPS.
- Explaining sequence-level knowledge distillation as data-augmentation for neural machine translation. Gordon, M. A. & Duh, K. (2019).
- Search for Better Students to Learn Distilled Knowledge. Gu, J., & Tresp, V. (2020). ECAI.
- Differentiable Feature Aggregation Search for Knowledge Distillation. Guan, Y., Zhao, P., Wang, B., Zhang, Y., Yao, C., Bian, K., & Tang, J. (2020). ECCV.
- Online Knowledge Distillation via Collaborative Learning. Guo, Q., Wang, X., Wu, Y., Yu, Z., Liang, D., Hu, X., & Luo, P. (2020). CVPR.
- Cross modal distillation for supervision transfer. Gupta, S., Hoffman, J. & Malik, J. (2016). CVPR.
- Self-knowledge distillation in natural language processing. Hahn, S. & Choi, H. (2019). RANLP.
- TextKD-GAN: Text generation using knowledge distillation and generative adversarial networks. Haidar, M. A. & Rezagholizadeh, M. (2019). Canadian Conference on Artificial Intelligence.
- Learning both weights and connections for efficient neural network. Han, S., Pool, J., Tran, J. & Dally, W. (2015). NeurIPS.
- Spatiotemporal distilled dense-connectivity network for video action recognition. Hao, W. & Zhang, Z. (2019). Pattern Recogn 92: 13–24.
- The knowledge within: Methods for data-free model compression. Haroush, M., Hubara, I., Hoffer, E., & Soudry, D. (2020). CVPR.
- Group Knowledge Transfer: Federated Learning of Large CNNs at the Edge. He, C., Annavaram, M., & Avestimehr, S. (2020a). NeurIPS.
- Why ResNet works? Residuals generalize. He, F., Liu, T., & Tao, D. (2020b). IEEE TNNLS 31(12): 5349–5362.
- Deep residual learning for image recognition. He, K., Zhang, X., Ren, S. & Sun, J. (2016). CVPR.
- Knowledge adaptation for efficient semantic segmentation. He, T., Shen, C., Tian, Z., Gong, D., Sun, C. & Yan, Y. (2019). CVPR.
- A comprehensive overhaul of feature distillation. Heo, B., Kim, J., Yun, S., Park, H., Kwak, N., & Choi, J. Y. (2019a). ICCV.
- Knowledge distillation with adversarial samples supporting decision boundary. Heo, B., Lee, M., Yun, S. & Choi, J. Y. (2019b). AAAI.
- Knowledge transfer via distillation of activation boundaries formed by hidden neurons. Heo, B., Lee, M., Yun, S. & Choi, J. Y. (2019c). AAAI.
- Distilling the knowledge in a neural network. Hinton, G., Vinyals, O. & Dean, J. (2015).
- Learning with Side Information through Modality Hallucination. Hoffman, J., Gupta, S. & Darrell, T. (2016). CVPR.
- GAN-Knowledge Distillation for one-stage Object Detection. Hong, W. & Yu, J. (2019).
- Learning lightweight lane detection cnns by self attention distillation. Hou, Y., Ma, Z., Liu, C. & Loy, C. C. (2019). ICCV.
- Inter-Region Affinity Distillation for Road Marking Segmentation. Hou, Y., Ma, Z., Liu, C., Hui, T. W., & Loy, C. C. (2020). CVPR.
- Mobilenets: Efficient convolutional neural networks for mobile vision applications. Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., & Adam, H. (2017).
- Creating Something from Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing. Hu, H., Xie, L., Hong, R., & Tian, Q. (2020). CVPR.
- Attention-guided answer distillation for machine reading comprehension. Hu, M., Peng, Y., Wei, F., Huang, Z., Li, D., Yang, N. & et al. (2018). EMNLP.
- Densely connected convolutional networks. Huang, G., Liu, Z., Van, Der Maaten, L. & Weinberger, K. Q. (2017). CVPR.
- Knowledge Distillation for Sequence Model. Huang, M., You, Y., Chen, Z., Qian, Y. & Yu, K. (2018). Interspeech.
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer. Huang, Z. & Wang, N. (2017).
- Comprehensive Attention Self-Distillation for Weakly-Supervised Object Detection. Huang, Z., Zou, Y., Bhagavatula, V., & Huang, D. (2020). NeurIPS.
- Batch normalization: Accelerating deep network training by reducing internal covariate shift. Ioffe, S., & Szegedy, C. (2015). ICML.
- Learning what and where to transfer. Jang, Y., Lee, H., Hwang, S. J. & Shin, J. (2019). ICML.
- Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher. Ji, G., & Zhu, Z. (2020). NeurIPS.
- Tinybert: Distilling bert for natural language understanding. Jiao, X., Yin, Y., Shang, L., Jiang, X., Chen, X., Li, L. & et al. (2020). EMNLP.
- Knowledge distillation via route constrained optimization. Jin, X., Peng, B., Wu, Y., Liu, Y., Liu, J., Liang, D., Yan, J. & Hu, X. (2019). ICCV.
- Towards oracle knowledge distillation with neural architecture search. Kang, M., Mun, J. & Han, B. (2020). AAAI.
- Paraphrasing Complex Network: Network Compression via Factor Transfer. Kim, J., Park, S. & Kwak, N. (2018). NeurIPS.
- QKD: Quantization-aware Knowledge Distillation. Kim, J., Bhalgat, Y., Lee, J., Patel, C., & Kwak, N. (2019a).
- Feature fusion for online mutual knowledge distillation. Kim, J., Hyun, M., Chung, I. & Kwak, N. (2019b). ICPR.
- Transferring knowledge to smaller network with class-distance loss. Kim, S. W. & Kim, H. E. (2017). ICLRW.
- Sequence-Level Knowledge Distillation. Kim, Y. & Rush, A. M. (2016). EMNLP.
- Few-shot learning of neural networks from scratch by pseudo example optimization. Kimura, A., Ghahramani, Z., Takeuchi, K., Iwata, T. & Ueda, N. (2018). BMVC.
- Adaptive knowledge distillation based on entropy. Kwon, K., Na, H., Lee, H., & Kim, N. S. (2020). ICASSP.
- Cross-Resolution Face Recognition via Prior-Aided Face Hallucination and Residual Knowledge Distillation. Kong, H., Zhao, J., Tu, X., Xing, J., Shen, S. & Feng, J. (2019).
- Learning multiple layers of features from tiny images. Krizhevsky, A. (2009).
- Imagenet classification with deep convolutional neural networks. Krizhevsky, A., Sutskever, I. & Hinton, G. E. (2012). NeurIPS.
- Distilling an Ensemble of Greedy Dependency Parsers into One MST Parser. Kuncoro, A., Ballesteros, M., Kong, L., Dyer, C. & Smith, N. A. (2016). EMNLP.
- Unsupervised multi-task adaptation using adversarial cross-task distillation. Kundu, J. N., Lakkakula, N. & Babu, R. V. (2019). CVPR.
- Dual Policy Distillation. Lai, K. H., Zha, D., Li, Y., & Hu, X. (2020). IJCAI.
- Self-Referenced Deep Learning. Lan, X., Zhu, X., & Gong, S. (2018). ACCV.
- Rethinking data augmentation: Self-supervision and self-distillation. Lee, H., Hwang, S. J. & Shin, J. (2019a).
- Overcoming catastrophic forgetting with unlabeled data in the wild. Lee, K., Lee, K., Shin, J. & Lee, H. (2019b). ICCV.
- Stochasticity and Skip Connection Improve Knowledge Transfer. Lee, K., Nguyen, L. T. & Shim, B. (2019c). AAAI.
- Graph-based knowledge distillation by multi-head attention network. Lee, S. & Song, B. (2019). BMVC.
- Self-supervised knowledge distillation using singular value decomposition. Lee, S. H., Kim, D. H. & Song, B. C. (2018). ECCV.
- Learning Light-Weight Translation Models from Deep Transformer. Li, B., Wang, Z., Liu, H., Du, Q., Xiao, T., Zhang, C., & Zhu, J. (2021). AAAI.
- Blockwisely Supervised Neural Architecture Search with Knowledge Distillation. Li, C., Peng, J., Yuan, L., Wang, G., Liang, X., Lin, L., & Chang, X. (2020a). CVPR.
- Residual Distillation: Towards Portable Deep Neural Networks without Shortcuts. Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., & Zhang, T. (2020b). NeurIPS.
- Spatiotemporal knowledge distillation for efficient estimation of aerial video saliency. Li, J., Fu, K., Zhao, S. & Ge, S. (2019). IEEE TIP 29: 1902–1914.
- Gan compression: Efficient architectures for interactive conditional gans. Li, M., Lin, J., Ding, Y., Liu, Z., Zhu, J. Y., & Han, S. (2020c). CVPR.
- Mimicking very efficient network for object detection. Li, Q., Jin, S. & Yan, J. (2017). CVPR.
- Few sample knowledge distillation for efficient network compression. Li, T., Li, J., Liu, Z., & Zhang, C. (2020d). CVPR.
- Local Correlation Consistency for Knowledge Distillation. Li, X., Wu, J., Fang, H., Liao, Y., Wang, F., & Qian, C. (2020e). ECCV.
- Learning without forgetting. Li, Z. & Hoiem, D. (2017). IEEE TPAMI 40(12): 2935–2947.
- Ensemble distillation for robust model fusion in federated learning. Lin, T., Kong, L., Stich, S. U., & Jaggi, M. (2020). NeurIPS.
- Knowledge flow: Improve upon your teachers. Liu, I. J., Peng, J. & Schwing, A. G. (2019a). ICLR.
- Exploiting the ground-truth: An adversarial imitation based knowledge distillation approach for event detection. Liu, J., Chen, Y. & Liu, K. (2019b). AAAI.
- Knowledge representing: Efficient, sparse representation of prior knowledge for knowledge distillation. Liu, J., Wen, D., Gao, H., Tao, W., Chen, T. W., Osa, K. & et al. (2019c). CVPRW.
- DDFlow: Learning optical flow with unlabeled data distillation. Liu, P., King, I., Lyu, M. R., & Xu, J. (2019d). AAAI.
- Ktan: knowledge transfer adversarial network. Liu, P., Liu, W., Ma, H., Mei, T. & Seok, M. (2020a). IJCNN.
- Semantic-aware knowledge preservation for zero-shot sketch-based image retrieval. Liu, Q., Xie, L., Wang, H. & Yuille, A. L. (2019e). ICCV.
- Model compression with generative adversarial networks. Liu, R., Fusi, N. & Mackey, L. (2018).
- FastBERT: a self-distilling BERT with Adaptive Inference Time. Liu, W., Zhou, P., Zhao, Z., Wang, Z., Deng, H., & Ju, Q. (2020b). ACL.
- Improving the interpretability of deep neural networks with knowledge distillation. Liu, X., Wang, X. & Matwin, S. (2018b). ICDMW.
- Improving multi-task deep neural networks via knowledge distillation for natural language understanding. Liu, X., He, P., Chen, W. & Gao, J. (2019f).
- Knowledge distillation via instance relationship graph. Liu, Y., Cao, J., Li, B., Yuan, C., Hu, W., Li, Y. & Duan, Y. (2019g). CVPR.
- Structured knowledge distillation for semantic segmentation. Liu, Y., Chen, K., Liu, C., Qin, Z., Luo, Z. & Wang, J. (2019h). CVPR.
- Search to distill: Pearls are everywhere but not the eyes. Liu, Y., Jia, X., Tan, M., Vemulapalli, R., Zhu, Y., Green, B. & et al. (2019i). CVPR.
- Adaptive multi-teacher multi-level knowledge distillation. Liu, Y., Zhang, W., & Wang, J. (2020c). Neurocomputing 415: 106-113.
- Data-free knowledge distillation for deep neural networks. Lopes, R. G., Fenu, S. & Starner, T. (2017). NeurIPS.
- Unifying distillation and privileged information. Lopez-Paz, D., Bottou, L., Schölkopf, B. & Vapnik, V. (2016). ICLR.
- Knowledge distillation for small-footprint highway networks. Lu, L., Guo, M. & Renals, S. (2017). ICASSP.
- Face model compression by distilling knowledge from neurons. Luo, P., Zhu, Z., Liu, Z., Wang, X. & Tang, X. (2016). AAAI.
- Collaboration by Competition: Self-coordinated Knowledge Amalgamation for Multi-talent Student Learning. Luo, S., Pan, W., Wang, X., Wang, D., Tang, H., & Song, M. (2020). ECCV.
- Knowledge amalgamation from heterogeneous networks by common feature learning. Luo, S., Wang, X., Fang, G., Hu, Y., Tao, D., & Song, M. (2019). IJCAI.
- Graph distillation for action detection with privileged modalities. Luo, Z., Hsieh, J. T., Jiang, L., Carlos Niebles, J. & Fei-Fei, L. (2018). ECCV.
- Improving neural architecture search image classifiers via ensemble learning. Macko, V., Weill, C., Mazzawi, H. & Gonzalvo, J. (2019). NeurIPS Workshop.
- Graph representation learning via multi-task knowledge distillation. Ma, J., & Mei, Q. (2019).
- Shufflenet v2: Practical guidelines for efficient cnn architecture design. Ma, N., Zhang, X., Zheng, H. T., & Sun, J. (2018). ECCV.
- Conditional teacher-student learning. Meng, Z., Li, J., Zhao, Y. & Gong, Y. (2019). ICASSP.
- Zero-shot Knowledge Transfer via Adversarial Belief Matching. Micaelli, P. & Storkey, A. J. (2019). NeurIPS.
- Knowledge transfer graph for deep collaborative learning. Minami, S., Hirakawa, T., Yamashita, T. & Fujiyoshi, H. (2019).
- Improved knowledge distillation via teacher assistant. Mirzadeh, S. I., Farajtabar,M., Li, A. & Ghasemzadeh, H. (2020). AAAI.
- Apprentice: Using knowledge distillation techniques to improve lowprecision network accuracy. Mishra, A. & Marr, D. (2018). ICLR.
- Self-distillation amplifies regularization in hilbert space. Mobahi, H., Farajtabar, M., & Bartlett, P. L. (2020). NeurIPS.
- Distilling word embeddings: An encoding approach. Mou, L., Jia, R., Xu, Y., Li, G., Zhang, L. & Jin, Z. (2016). CIKM.
- Cogni-net: Cognitive feature learning through deep visual perception. Mukherjee, P., Das, A., Bhunia, A. K. & Roy, P. P. (2019). ICIP.
- Online model distillation for efficient video inference. Mullapudi, R. T., Chen, S., Zhang, K., Ramanan, D. & Fatahalian, K. (2019). ICCV.
- When does label smoothing help? Muller, R., Kornblith, S. & Hinton, G. E. (2019). NeurIPS.
- Learning to specialize with knowledge distillation for visual question answering. Mun, J., Lee, K., Shin, J. & Han, B. (2018). NeurIPS.
- Knowledge distillation for end-to-end person search. Munjal, B., Galasso, F. & Amin, S. (2019). BMVC.
- Knowledge Distillation for Bilingual Dictionary Induction. Nakashole, N. & Flauger, R. (2017). EMNLP.
- Effectiveness of Arbitrary Transfer Sets for Data-free Knowledge Distillation. Nayak, G. K.,Mopuri, K. R., & Chakraborty, A. (2021). WACV.
- Zero-shot knowledge distillation in deep networks. Nayak, G. K., Mopuri, K. R., Shaj, V., Babu, R. V. & Chakraborty, A. (2019). ICML.
- Teacher-student training for text-independent speaker recognition. Ng, R. W., Liu, X. & Swietojanski, P. (2018). SLTW.
- Dynamic kernel distillation for efficient pose estimation in videos. Nie, X., Li, Y., Luo, L., Zhang, N. & Feng, J. (2019). ICCV.
- Boosting self-supervised learning via knowledge transfer. Noroozi, M., Vinjimoor, A., Favaro, P. & Pirsiavash, H. (2018). CVPR.
- Deep net triage: Analyzing the importance of network layers via structural compression. Nowak, T. S. & Corso, J. J. (2018).
- Parallel wavenet: Fast high-fidelity speech synthesis. Oord, A., Li, Y., Babuschkin, I., Simonyan, K., Vinyals, O., Kavukcuoglu, K. & et al. (2018). ICML.
- Spatio-Temporal Graph for Video Captioning with Knowledge Distillation. Pan, B., Cai, H., Huang, D. A., Lee, K. H., Gaidon, A., Adeli, E., & Niebles, J. C. (2020). CVPR.
- A novel enhanced collaborative autoencoder with knowledge distillation for top-n recommender systems. Pan, Y., He, F. & Yu, H. (2019). Neurocomputing 332: 137–148.
- Semi-supervised knowledge transfer for deep learning from private training data. Papernot, N., Abadi, M., Erlingsson, U., Goodfellow, I. & Talwar, K. (2017). ICLR.
- Distillation as a defense to adversarial perturbations against deep neural networks. Papernot, N., McDaniel, P., Wu, X., Jha, S. & Swami, A. (2016). IEEE SP.
- Feature-level Ensemble Knowledge Distillation for Aggregating Knowledge from Multiple Networks. Park, S. & Kwak, N. (2020). ECAI.
- Relational knowledge distillation. Park,W., Kim, D., Lu, Y. & Cho, M. (2019). CVPR.
- ALP-KD: Attention-Based Layer Projection for Knowledge Distillation. Passban, P., Wu, Y., Rezagholizadeh, M., & Liu, Q. (2021). AAAI.
- Learning deep representations with probabilistic knowledge transfer. Passalis, N. & Tefas, A. (2018). ECCV.
- Probabilistic Knowledge Transfer for Lightweight Deep Representation Learning. Passalis, N., Tzelepi, M., & Tefas, A. (2020a). TNNLS.
- Heterogeneous Knowledge Distillation using Information Flow Modeling. Passalis, N., Tzelepi, M., & Tefas, A. (2020b). CVPR.
- Correlation congruence for knowledge distillation. Peng, B., Jin, X., Liu, J., Li, D., Wu, Y., Liu, Y. & et al. (2019a). ICCV.
- Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search. Peng, H., Du, H., Yu, H., Li, Q., Liao, J., & Fu, J. (2020). NeurIPS.
- Few-shot image recognition with knowledge transfer. Peng, Z., Li, Z., Zhang, J., Li, Y., Qi, G. J. & Tang, J. (2019b). ICCV.
- Audio-visual model distillation using acoustic images. Perez, A., Sanguineti, V., Morerio, P. & Murino, V. (2020). WACV.
- Towards understanding knowledge distillation. Phuong, M. & Lampert, C. H. (2019a). ICML.
- Distillation-based training for multi-exit architectures. Phuong, M., & Lampert, C. H. (2019b). ICCV.
- Refine and distill: Exploiting cycle-inconsistency and knowledge distillation for unsupervised monocular depth estimation. Pilzer, A., Lathuiliere, S., Sebe, N. & Ricci, E. (2019). CVPR.
- Model compression via distillation and quantization. Polino, A., Pascanu, R. & Alistarh, D. (2018). ICLR.
- Wise teachers train better dnn acoustic models. Price, R., Iso, K. & Shinoda, K. (2016). EURASIP Journal on Audio, Speech, and Music Processing 2016(1): 10.
- Data distillation: Towards omnisupervised learning. Radosavovic, I., Dollar, P., Girshick, R., Gkioxari, G., & He, K. (2018). CVPR.
- Designing network design spaces. Radosavovic, I., Kosaraju, R. P., Girshick, R., He, K., & Dollar, P. (2020). CVPR.
- Cross-modality distillation: A case for conditional generative adversarial networks. Roheda, S., Riggan, B. S., Krim, H. & Dai, L. (2018). ICASSP.
- Fitnets: Hints for thin deep nets. Romero, A., Ballas, N., Kahou, S. E., Chassang, A., Gatta, C., & Bengio, Y. (2015). ICLR.
- Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. Ross, A. S. & Doshi-Velez, F. (2018). AAAI.
- Knowledge adaptation: Teaching to adapt. Ruder, S., Ghaffari, P. & Breslin, J. G. (2017).
- Mobilenetv2: Inverted residuals and linear bottlenecks. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). CVPR.
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Sanh, V., Debut, L., Chaumond, J. & Wolf, T. (2019).
- Distilling knowledge from a deep pose regressor network. Saputra, M. R. U., de Gusmao, P. P., Almalioglu, Y., Markham, A. & Trigoni, N. (2019). ICCV.
- Deep model compression: Distilling knowledge from noisy teachers. Sau, B. B. & Balasubramanian, V. N. (2016).
- Federated Knowledge Distillation. Seo, H., Park, J., Oh, S., Bennis, M., & Kim, S. L. (2020).
- Knowledge distillation in document retrieval. Shakeri, S., Sethy, A. & Cheng, C. (2019).
- Amalgamating knowledge towards comprehensive classification. Shen, C., Wang, X., Song, J., Sun, L., & Song, M. (2019a). AAAI.
- Progressive Network Grafting for Few-Shot Knowledge Distillation. Shen, C., Wang, X., Yin, Y., Song, J., Luo, S., & Song, M. (2021). AAAI.
- Customizing student networks from heterogeneous teachers via adaptive knowledge amalgamation. Shen, C., Xue, M., Wang, X., Song, J., Sun, L., & Song, M. (2019b). ICCV.
- In teacher we trust: Learning compressed models for pedestrian detection. Shen, J., Vesdapunt, N., Boddeti, V. N. & Kitani, K. M. (2016).
- Feature representation of short utterances based on knowledge distillation for spoken language identification. Shen, P., Lu, X., Li, S. & Kawai, H. (2018). Interspeech.
- Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification. Shen, P., Lu, X., Li, S., & Kawai, H. (2020). IEEE/ACM T AUDIO SPE 28: 2674-2683.
- Interactive learning of teacher-student model for short utterance spoken language identification. Shen, P., Lu, X., Li, S. & Kawai, H. (2019c). ICASSP.
- Meal: Multi-model ensemble via adversarial learning. Shen, Z., He, Z. & Xue, X. (2019d). AAAI.
- Compression of acoustic event detection models with quantized distillation. Shi, B., Sun, M., Kao, C. C., Rozgic, V., Matsoukas, S. & Wang, C. (2019a). Interspeech.
- Semi-supervised acoustic event detection based on tri-training. Shi, B., Sun, M., Kao, C. C., Rozgic, V., Matsoukas, S. & Wang, C. (2019b). ICASSP.
- Knowledge distillation for recurrent neural network language modeling with trust regularization. Shi, Y., Hwang, M. Y., Lei, X. & Sheng, H. (2019c). ICASSP.
- Empirical analysis of knowledge distillation technique for optimization of quantized deep neural networks. Shin, S., Boo, Y. & Sung, W. (2019).
- Incremental learning of object detectors without catastrophic forgetting. Shmelkov, K., Schmid, C. & Alahari, K. (2017). ICCV.
- Knowledge squeezed adversarial network compression. Shu, C., Li, P., Xie, Y., Qu, Y., Dai, L., & Ma, L. (2019).
- Video object segmentation using teacher-student adaptation in a human robot interaction (hri) setting. Siam, M., Jiang, C., Lu, S., Petrich, L., Gamal, M., Elhoseiny, M. & et al. (2019). ICRA.
- Structured transforms for small-footprint deep learning. Sindhwani, V., Sainath, T. & Kumar, S. (2015). NeurIPS.
- Mastering the game of Go with deep neural networks and tree search. Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., ... & Dieleman, S. (2016). Nature, 529(7587): 484–489.
- Neural compatibility modeling with attentive knowledge distillation. Song, X., Feng, F., Han, X., Yang, X., Liu, W. & Nie, L. (2018). SIGIR.
- Knowledge transfer with jacobian matching. Srinivas, S. & Fleuret, F. (2018). ICML.
- Adapting models to signal degradation using distillation. Su, J. C. & Maji, S. (2017). BMVC.
- Collaborative Teacher-Student Learning via Multiple Knowledge Transfer. Sun, L., Gou, J., Du, L., & Tao, D. (2021).
- Patient knowledge distillation for bert model compression. Sun, S., Cheng, Y., Gan, Z. & Liu, J. (2019). EMNLP-IJCNLP.
- Optimizing network performance for distributed dnn training on gpu clusters: Imagenet/alexnet training in 1.5 minutes. Sun, P., Feng, W., Han, R., Yan, S., & Wen, Y. (2019).
- An investigation of a knowledge distillation method for ctc acoustic models. Takashima, R., Li, S. & Kawai, H. (2018). ICASSP.
- KT-GAN: Knowledge-Transfer Generative Adversarial Network for Text-to-Image Synthesis. Tan, H., Liu, X., Liu, M., Yin, B., & Li, X. (2021). IEEE TIP 30: 1275-1290.
- Mnasnet: Platform-aware neural architecture search for mobile. Tan, M., Chen, B., Pang, R., Vasudevan, V., Sandler, M., Howard, A., & Le, Q. V. (2019). CVPR.
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. Tan, M., & Le, Q. (2019). ICML.
- Multilingual neural machine translation with knowledge distillation. Tan, X., Ren, Y., He, D., Qin, T., Zhao, Z. & Liu, T. Y. (2019). ICLR.
- Understanding and Improving Knowledge Distillation. Tang, J., Shivanna, R., Zhao, Z., Lin, D., Singh, A., Chi, E. H., & Jain, S. (2020).
- Ranking distillation: Learning compact ranking models with high performance for recommender system. Tang, J. & Wang, K. (2018). SIGKDD.
- Distilling task-specific knowledge from bert into simple neural networks. Tang, R., Lu, Y., Liu, L., Mou, L., Vechtomova, O. & Lin, J. (2019).
- Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Tarvainen, A., & Valpola, H. (2017). NeurIPS.
- Cross-modal knowledge distillation for action recognition. Thoker, F. M. & Gall, J. (2019). ICIP.
- Contrastive representation distillation. Tian, Y., Krishnan, D. & Isola, P. (2020). ICLR.
- Understanding Generalization in Recurrent Neural Networks. Tu, Z., He, F., & Tao, D. (2020). ICLR.
- Similarity-preserving knowledge distillation. Tung, F. & Mori, G. (2019). ICCV.
- Well-read students learn better: The impact of student initialization on knowledge distillation. Turc, I., Chang, M. W., Lee, K. & Toutanova, K.(2019).
- Access to unlabeled data can speed up prediction time. Urner, R., Shalev-Shwartz, S., Ben-David, S. (2011). ICML.
- Do deep convolutional nets really need to be deep and convolutional? Urban, G., Geras, K. J., Kahou, S. E., Aslan, O., Wang, S., Caruana, R. & et al. (2017). ICLR.
- Learning using privileged information: similarity control and knowledge transfer. Vapnik, V. & Izmailov, R. (2015). J Mach Learn Res 16(1): 2023-2049.
- Unifying heterogeneous classifiers with distillation. Vongkulbhisal, J., Vinayavekhin, P. & Visentini-Scarzanella, M. (2019). CVPR.
- Online Ensemble Model Compression using Knowledge Distillation. Walawalkar, D., Shen, Z., & Savvides, M. (2020). ECCV.
- Model distillation with knowledge transfer from face classification to alignment and verification. Wang, C., Lan, X. & Zhang, Y. (2017).
- Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks. Wang, L., & Yoon, K. J. (2020).
- Progressive blockwise knowledge distillation for neural network acceleration. Wang, H., Zhao, H., Li, X. & Tan, X. (2018a). IJCAI.
- Private model compression via knowledge distillation. Wang, J., Bao, W., Sun, L., Zhu, X., Cao, B. & Yu, P. S. (2019a). AAAI.
- Deepvid: Deep visual interpretation and diagnosis for image classifiers via knowledge distillation. Wang, J., Gou, L., Zhang, W., Yang, H. & Shen, H. W. (2019b). TVCG 25(6): 2168-2180.
- Discover the effective strategy for face recognition model compression by improved knowledge distillation. Wang, M., Liu, R., Abe, N., Uchida, H., Matsunami, T. & Yamada, S. (2018b). ICIP.
- Improved knowledge distillation for training fast low resolution face recognition model. Wang, M., Liu, R., Hajime, N., Narishige, A., Uchida, H. & Matsunami, T.(2019c). ICCVW.
- Distilling Object Detectors with Fine-grained Feature Imitation. Wang, T., Yuan, L., Zhang, X. & Feng, J. (2019d). CVPR.
- Dataset distillation. Wang, T., Zhu, J. Y., Torralba, A., & Efros, A. A. (2018c).
- Minilm: Deep self-attention distillation for task-agnostic compression of pretrained transformers. Wang, W., Wei, F., Dong, L., Bao, H., Yang, N., & Zhou, M. (2020a). NeurIPS.
- A teacher-student framework for maintainable dialog manager. Wang, W., Zhang, J., Zhang, H., Hwang, M. Y., Zong, C. & Li, Z. (2018d). EMNLP.
- Exclusivity-Consistency Regularized Knowledge Distillation for Face Recognition. Wang, X., Fu, T., Liao, S., Wang, S., Lei, Z., & Mei, T. (2020b). ECCV.
- Progressive teacher-student learning for early action prediction. Wang, X., Hu, J. F., Lai, J. H., Zhang, J. & Zheng, W. S. (2019e). CVPR.
- Kdgan: Knowledge distillation with generative adversarial networks. Wang, X., Zhang, R., Sun, Y. & Qi, J. (2018e). NeurIPS.
- Packing convolutional neural networks in the frequency domain. Wang, Y., Xu, C., Xu, C. & Tao, D. (2019f). IEEE TPAMI 41(10): 2495–2510.
- Adversarial learning of portable student networks. Wang, Y., Xu, C., Xu, C. & Tao, D. (2018f). AAAI.
- Joint architecture and knowledge distillation in CNN for Chinese text recognition. Wang, Z. R., & Du, J. (2021). Pattern Recognition 111: 107722.
- Student-teacher network learning with enhanced features. Watanabe, S., Hori, T., Le Roux, J. & Hershey, J. R. (2017). ICASSP.
- Online distilling from checkpoints for neural machine translation. Wei, H. R., Huang, S., Wang, R., Dai, X. & Chen, J. (2019). NAACL-HLT.
- Quantization mimic: Towards very tiny cnn for object detection. Wei, Y., Pan, X., Qin, H., Ouyang, W. & Yan, J. (2018). ECCV.
- Sequence student-teacher training of deep neural networks. Wong, J. H. & Gales, M. (2016). Interspeech.
- Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search. Wu, B., Dai, X., Zhang, P., Wang, Y., Sun, F., Wu, Y., ... & Keutzer, K. (2019). CVPR.
- Distilled person re-identification: Towards a more scalable system. Wu, A., Zheng, W. S., Guo, X. & Lai, J. H. (2019a). CVPR.
- Peer Collaborative Learning for Online Knowledge Distillation. Wu, G., & Gong, S. (2021). AAAI.
- Quantized convolutional neural networks for mobile devices. Wu, J., Leng, C., Wang, Y., Hu, Q. & Cheng, J. (2016). CVPR.
- Multi-teacher knowledge distillation for compressed video action recognition on deep neural networks. Wu, M. C., Chiu, C. T. & Wu, K. H. (2019b). ICASSP.
- Learning an evolutionary embedding via massive knowledge distillation. Wu, X., He, R., Hu, Y., & Sun, Z. (2020). International Journal of Computer Vision, 1-18.
- Complete random forest based class noise filtering learning for improving the generalizability of classifiers. Xia, S., Wang, G., Chen, Z., & Duan, Y. (2018). IEEE TKDE 31(11): 2063-2078.
- Training convolutional neural networks with cheap convolutions and online distillation. Xie, J., Lin, S., Zhang, Y. & Luo, L. (2019).
- Self-training with Noisy Student improves ImageNet classification. Xie, Q., Hovy, E., Luong, M. T., & Le, Q. V. (2020). CVPR.
- Knowledge Distillation Meets Self-Supervision. Xu, G., Liu, Z., Li, X., & Loy, C. C. (2020a). ECCV.
- Feature Normalized Knowledge Distillation for Image Classification. Xu, K., Rui, L., Li, Y., & Gu, L. (2020b). ECCV.
- Knowledge Transfer in Multi-Task Deep Reinforcement Learning for Continuous Control. Xu, Z., Wu, K., Che, Z., Tang, J., & Ye, J. (2020c). NeurIPS.
- Training shallow and thin networks for acceleration via knowledge distillation with conditional adversarial networks. Xu, Z., Hsu, Y. C. & Huang, J. (2018a). ICLR Workshop.
- Training student networks for acceleration with conditional adversarial networks. Xu, Z., Hsu, Y. C. & Huang, J. (2018b). BMVC.
- Data-distortion guided self-distillation for deep neural networks. Xu, T. B., & Liu, C. L. (2019). AAAI.
- VarGFaceNet: An efficient variable group convolutional neural network for lightweight face recognition. Yan, M., Zhao, M., Xu, Z., Zhang, Q., Wang, G. & Su, Z. (2019). ICCVW.
- Knowledge distillation in generations: More tolerant teachers educate better students. Yang, C., Xie, L., Qiao, S. & Yuille, A. (2019a). AAAI.
- Snapshot distillation: Teacher-student optimization in one generation. Yang, C., Xie, L., Su, C. & Yuille, A. L. (2019b). CVPR.
- Knowledge distillation via adaptive instance normalization. Yang, J., Martinez, B., Bulat, A., & Tzimiropoulos, G. (2020a). ECCV.
- Distilling Knowledge From Graph Convolutional Networks. Yang, Y., Qiu, J., Song, M., Tao, D. & Wang, X. (2020b). CVPR.
- TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing. Yang, Z., Cui, Y., Chen, Z., Che, W., Liu, T., Wang, S., & Hu, G. (2020c). ACL.
- Model compression with two-stage multi-teacher knowledge distillation for web question answering system. Yang, Z., Shou, L., Gong, M., Lin, W. & Jiang, D. (2020d). WSDM.
- Knowledge Transfer via Dense Cross-Layer Mutual-Distillation. Yao, A., & Sun, D. (2020). ECCV.
- Graph Few-shot Learning via Knowledge Transfer. Yao, H., Zhang, C., Wei, Y., Jiang, M., Wang, S., Huang, J., Chawla, N. V., & Li, Z. (2020). AAAI.
- Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN. Ye, J., Ji, Y., Wang, X., Gao, X., & Song, M. (2020). CVPR.
- Student becoming the master: Knowledge amalgamation for joint scene parsing, depth estimation, and more. Ye, J., Ji, Y., Wang, X., Ou, K., Tao, D. & Song, M. (2019). CVPR.
- A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. Yim, J., Joo, D., Bae, J. & Kim, J. (2017). CVPR.
- Dreaming to distill: Data-free knowledge transfer via DeepInversion. Yin, H., Molchanov, P., Alvarez, J. M., Li, Z., Mallya, A., Hoiem, D., Jha, N. K., & Kautz, J. (2020). CVPR.
- Knowledge extraction with no observable data. Yoo, J., Cho, M., Kim, T., & Kang, U. (2019). NeurIPS.
- Learning from multiple teacher networks. You, S., Xu, C., Xu, C. & Tao, D. (2017). SIGKDD.
- Learning with single-teacher multi-student. You, S., Xu, C., Xu, C. & Tao, D. (2018). AAAI.
- Large batch optimization for deep learning: Training bert in 76 minutes. You, Y., Li, J., Reddi, S., Hseu, J., Kumar, S., Bhojanapalli, S., ... & Hsieh, C. J. (2019). ICLR.
- Learning metrics from teachers: Compact networks for image embedding. Yu, L., Yazici, V. O., Liu, X., Weijer, J., Cheng, Y. & Ramisa, A. (2019). CVPR.
- On compressing deep models by low rank and sparse decomposition. Yu, X., Liu, T., Wang, X., & Tao, D. (2017). CVPR.
- Reinforced Multi-Teacher Selection for Knowledge Distillation. Yuan, F., Shou, L., Pei, J., Lin,W., Gong,M., Fu, Y., & Jiang, D. (2021). AAAI.
- Revisit knowledge distillation: a teacher-free framework. Yuan, L., Tay, F. E., Li, G., Wang, T. & Feng, J. (2020). CVPR.
- CKD: Cross-task knowledge distillation for text-to-image synthesis. Yuan, M., & Peng, Y. (2020). IEEE TMM 22(8): 1955-1968.
- Matching Guided Distillation. Yue, K., Deng, J., & Zhou, F. (2020). ECCV.
- Regularizing Class-wise Predictions via Self-knowledge Distillation. Yun, S., Park, J., Lee, K. & Shin, J. (2020). CVPR.
- Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. Zagoruyko, S. & Komodakis, N. (2017). ICLR.
- Lifelong gan: Continual learning for conditional image generation. Zhai, M., Chen, L., Tung, F., He, J., Nawhal, M. & Mori, G. (2019). ICCV.
- Doubly convolutional neural networks. Zhai, S., Cheng, Y., Zhang, Z. M. & Lu, W. (2016). NeurIPS.
- Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation. Zhao, C., & Hospedales, T. (2020). NeurIPS.
- Highlight every step: Knowledge distillation via collaborative teaching. Zhao, H., Sun, X., Dong, J., Chen, C., & Dong, Z. (2020a). IEEE TCYB.
- Knowledge as Priors: Cross-Modal Knowledge Generalization for Datasets without Superior Knowledge. Zhao, L., Peng, X., Chen, Y., Kapadia, M., & Metaxas, D. N. (2020b).
- Through-wall human pose estimation using radio signals. Zhao, M., Li, T., Abu Alsheikh, M., Tian, Y., Zhao, H., Torralba, A. & Katabi, D. (2018). CVPR.
- Better and faster: knowledge transfer from multiple self-supervised learning tasks via graph distillation for video classification. Zhang, C. & Peng, Y. (2018). IJCAI.
- Fast human pose estimation. Zhang, F., Zhu, X. & Ye, M. (2019a). CVPR.
- An information-theoretic view for deep learning. Zhang, J., Liu, T., & Tao, D. (2018).
- Adversarial co-distillation learning for image recognition. Zhang, H., Hu, Z., Qin, W., Xu, M., & Wang, M. (2021a). Pattern Recognition 111: 107659.
- Task-Oriented Feature Distillation. Zhang, L., Shi, Y., Shi, Z., Ma, K., & Bao, C. (2020a). NeurIPS.
- Be your own teacher: Improve the performance of convolutional neural networks via self distillation. Zhang, L., Song, J., Gao, A., Chen, J., Bao, C. & Ma, K. (2019b). ICCV.
- Discriminability distillation in group representation learning. Zhang, M., Song, G., Zhou, H., & Liu, Y. (2020b). ECCV.
- Future-Guided Incremental Transformer for Simultaneous Translation. Zhang, S., Feng, Y., & Li, L. (2021b). AAAI.
- Knowledge Integration Networks for Action Recognition. Zhang, S., Guo, S., Wang, L., Huang, W., & Scott, M. R. (2020c). AAAI.
- Reliable Data Distillation on Graph Convolutional Network. Zhang, W., Miao, X., Shao, Y., Jiang, J., Chen, L., Ruas, O., & Cui, B. (2020d). ACM SIGMOD.
- Diverse Knowledge Distillation for End-to-End Person Search. Zhang, X., Wang, X., Bian, J. W., Shen, C., & You, M. (2021c). AAAI.
- Shufflenet: An extremely efficient convolutional neural network for mobile devices. Zhang, X., Zhou, X., Lin, M. & Sun, J. (2018a). CVPR.
- Prime-Aware Adaptive Distillation. Zhang, Y., Lan, Z., Dai, Y., Zeng, F., Bai, Y., Chang, J., & Wei, Y. (2020e). ECCV.
- Deep mutual learning. Zhang, Y., Xiang, T., Hospedales, T. M. & Lu, H. (2018b). CVPR.
- Self-Distillation as Instance-Specific Label Smoothing. Zhang, Z., & Sabuncu, M. R. (2020). NeurIPS.
- Object Relational Graph with Teacher-Recommended Learning for Video Captioning. Zhang, Z., Shi, Y., Yuan, C., Li, B., Wang, P., Hu, W., & Zha, Z. J. (2020f). CVPR.
- Understanding knowledge distillation in non-autoregressive machine translation. Zhou, C., Neubig, G. & Gu, J. (2019a). ICLR.
- Rocket launching: A universal and efficient framework for training well-performing light net. Zhou, G., Fan, Y., Cui, R., Bian, W., Zhu, X. & Gai, K. (2018). AAAI.
- Two-stage image classification supervised by a single teacher single student model. Zhou, J., Zeng, S. & Zhang, B. (2019b). BMVC.
- M2KD: Incremental Learning via Multi-model and Multi-level Knowledge Distillation. Zhou, P., Mai, L., Zhang, J., Xu, N., Wu, Z. & Davis, L. S. (2020). BMVC.
- Low-resolution visual recognition via deep feature distillation. Zhu, M., Han, K., Zhang, C., Lin, J. & Wang, Y. (2019). ICASSP.
Gou, J., Yu, B., Maybank, S. J., & Tao, D. (2021). Knowledge Distillation: A Survey. IJCV 129(6): 1789-1819.