Zhang, Yaping, et al. "Sequence-to-sequence domain adaptation network for robust text image recognition." ...
Kobayashi, Sosuke. "Contextual augmentation: Data augmentation by words with paradigmatic relations." arXiv preprint arXiv:1805.06201 (2018).
Sun, Jinxuan, et al. "Generative adversarial networks with mixture of t-distributions noise for diverse image generation." Neural Networks 122 (2020).
Xie, Qizhe, et al. "Unsupervised data augmentation for consistency training." arXiv preprint arXiv:1904.12848 (2019).
Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems. 2014.
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate."
Redmon, Joseph, et al. "You only look once: Unified, real-time object detection."
Rothe, Sascha, Sebastian Ebert, and Hinrich Schütze. "Ultradense word embeddings by orthogonal transformation."
Vaswani, Ashish, et al. "Attention is all you need." arXiv preprint arXiv:1706.03762 (2017).
Bai, Yancheng, et al. "Sod-mtgan: Small object detection via multi-task generative adversarial network."
Zhang, Han, et al. "Self-attention generative adversarial networks."
Zhou, Xingyi, Jiacheng Zhuo, and Philipp Krähenbühl. "Bottom-up object detection by grouping extreme and center points."
Law, H., & Deng, J. (2018). Cornernet: Detecting objects as paired keypoints.
Zhou, X., Wang, D., & Krähenbühl, P. (2019). Objects as points.
Lee, J., Hayashi, H., Ohyama, W., & Uchida, S. (2019). Page segmentation using a convolutional neural network with trainable co-occurrence features.
Wang, X., Girshick, R., Gupta, A., & He, K. (2018). Non-local neural networks.
Zhu, Z., Xu, M., Bai, S., Huang, T., & Bai, X. (2019). Asymmetric non-local neural networks for semantic segmentation.
Cao, Y., Xu, J., Lin, S., Wei, F., & Hu, H. (2019). Gcnet: Non-local networks meet squeeze-excitation networks and beyond.
Dosovitskiy, A., et al. (2020). An image is worth 16x16 words: Transformers for image recognition at scale.
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., ... & Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows.
Kim, S., Kim, D., Cho, M., & Kwak, S. (2020). Proxy anchor loss for deep metric learning.
Tan, M., & Le, Q. (2019, May). Efficientnet: Rethinking model scaling for convolutional neural networks.
Liao, M., Pang, G., Huang, J., Hassner, T., & Bai, X. (2020). Mask textspotter v3: Segmentation proposal network for robust scene text spotting.