Escaping the Big Data Paradigm with Compact Transformers, Part 1

이준석 · August 18, 2022


Abstract

With the rise of Transformers as the standard for language processing, and their advancements in computer vision, there has been a corresponding growth in parameter counts and in the amount of training data.

Because of this, many have come to believe that transformers are not suitable for small datasets.

This trend raises concerns such as the limited availability of data in certain scientific domains and the exclusion of those with limited resources from research in the field.

In this paper, we aim to present an approach for small-scale learning by introducing Compact Transformers.

We show for the first time that, with the right size and convolutional tokenization, transformers can avoid overfitting and outperform state-of-the-art CNNs on small datasets.
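To make "convolutional tokenization" a little more concrete, here is a minimal sketch of how many tokens a small convolution-plus-pooling tokenizer would hand to the transformer, compared with a plain patch tokenizer. The kernel, stride, padding, and patch sizes below are illustrative assumptions, not values stated in the abstract, and the helper names are hypothetical.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial size after one convolution or pooling layer (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_tokenizer_tokens(
    image_size: int,
    conv_kernel: int = 3,   # assumed value, for illustration only
    conv_stride: int = 1,   # assumed
    conv_padding: int = 1,  # assumed
    pool_kernel: int = 3,   # assumed
    pool_stride: int = 2,   # assumed
    pool_padding: int = 1,  # assumed
) -> int:
    """Sequence length produced by one conv layer followed by one pooling layer."""
    side = conv_output_size(image_size, conv_kernel, conv_stride, conv_padding)
    side = conv_output_size(side, pool_kernel, pool_stride, pool_padding)
    # Each remaining spatial position becomes one token.
    return side * side


def patch_tokenizer_tokens(image_size: int, patch_size: int) -> int:
    """Sequence length when the image is cut into non-overlapping square patches."""
    return (image_size // patch_size) ** 2


if __name__ == "__main__":
    # A 32x32 input, the image size used by CIFAR-10.
    print("conv tokenizer tokens:", conv_tokenizer_tokens(32))
    print("patch tokenizer tokens (4x4 patches):", patch_tokenizer_tokens(32, 4))
```

Under these assumed settings, neighbouring receptive fields overlap, so each token also sees pixels covered by adjacent tokens, whereas non-overlapping patches share nothing across their borders.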

Our models are flexible in terms of model size, and can have as few as 0.28M parameters while achieving competitive results.

Our best model can reach 98% accuracy when training from scratch on CIFAR-10 with only 3.7M parameters. This is a significant improvement in data efficiency over previous Transformer-based models: it is over 10x smaller than other transformers and 15% the size of ResNet50, while achieving similar performance.
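As a quick sanity check on the size comparison, the sketch below redoes the arithmetic. The 3.7M figure comes from the abstract. The ResNet50 parameter count of about 25.6M is an assumption taken from the commonly cited figure, not from this post, and the helper name is hypothetical.

```python
CCT_PARAMS_M = 3.7        # from the abstract: best CIFAR-10 model, in millions
RESNET50_PARAMS_M = 25.6  # assumption: commonly cited ResNet50 size, in millions


def size_ratio(small_params: float, large_params: float) -> float:
    """Fraction of the larger model's parameter count used by the smaller model."""
    return small_params / large_params


ratio = size_ratio(CCT_PARAMS_M, RESNET50_PARAMS_M)

# "Over 10x smaller" means the compared transformers have more than this many parameters.
min_other_transformer_params_m = 10 * CCT_PARAMS_M

if __name__ == "__main__":
    print(f"CCT / ResNet50 parameter ratio: {ratio:.1%}")
    print(f"Implied size of compared transformers: > {min_other_transformer_params_m:.0f}M")
```

With the assumed ResNet50 size, the ratio comes out a little under 15%, which is consistent with the rounded figure in the abstract.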

CCT also outperforms many modern CNN-based approaches, and even some recent NAS-based approaches.

Additionally, we obtain a new SOTA result on Flowers-102 with 99.76% top-1 accuracy, and improve upon the existing baseline on ImageNet (82.71% accuracy with 29% as many parameters as ViT), as well as on NLP tasks.

Our simple and compact design for transformers makes them more feasible to study for those with limited computing resources and/or small datasets, while extending existing research efforts in data-efficient transformers.
feasible: practical, achievable

Conclusion

Transformers have commonly been perceived as applicable only to large-scale or medium-scale training.

While their scalability is undeniable, we have shown in this paper that, with proper configuration, a transformer can also be used successfully in small data regimes, and can outperform convolutional models of equivalent, and even larger, sizes.
scalability: the ability to scale up; undeniable: impossible to deny; configuration: setup; regime: a range or setting (as in "small data regime")

Our method is simple and flexible in size, and the smallest of our variants can be easily loaded on even a minimal GPU, or even a CPU.
variants: alternative versions (here, models of different sizes)

While some research has focused on large-scale models and datasets, we focus on smaller scales, where much research on data efficiency remains to be done.

We show that CCT can outperform other transformer-based models on small datasets while also significantly reducing computational costs and memory requirements.

This work demonstrates that transformers do not require vast computational resources and can be applied in even the most modest of settings.
modest: humble, limited in scale

This type of research is important to many scientific domains where data is far more limited than the conventional machine learning datasets used in general research.

Continuing research in this direction will help open research up to more people and domains, extending machine learning research.
