https://openaccess.thecvf.com/content/ICCV2021/papers/Wu_CvT_Introducing_Convolutions_to_Vision_Transformers_ICCV_2021_paper.pdf Wu, Haiping, et al.
https://proceedings.mlr.press/v139/touvron21a Touvron, Hugo, et al. "Training data-efficient image transformers & distillation through attention."
https://openaccess.thecvf.com/content/ICCV2021/papers/GrahamLeViTAVisionTransformerinConvNetsClothingforFasterInferenceICCV2021_paper.pdf
Dai, Zihang, et al. "Coatnet: Marrying convolution and attention for all data sizes." Advances in neural information processing systems 34 (2021): 396