EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Abstract
Proposes a way to adjust the scale of a network so that model accuracy and efficiency stay balanced
→ Compound Scaling
Introduction
Previously, ConvNets were scaled up along only one of depth, width, or resolution
- Why not all at once: arbitrary scaling requires tedious manual tuning and still often yields sub-optimal accuracy and efficiency
Scale width/depth/resolution with a constant ratio → Compound Scaling Method
- uniformly scales network width, depth and resolution with a set of fixed scaling coefficients
- if computational resources grow by $2^N$ → scale depth, width, image size by $\alpha^N$, $\beta^N$, $\gamma^N$
- α, β, γ are found by a grid search on the original small model
- As the input image gets larger, the network needs more layers and more channels
- More layers are needed to enlarge the receptive field
- The receptive field, the region each individual pixel "sees", has to grow along with the image
- More channels are needed to capture fine-grained patterns
- Larger images contain more small details and fine patterns
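The link between image size and depth can be made concrete with a rough calculation. This is a minimal sketch under a strong simplifying assumption that is not in the notes: a plain stack of stride-1 convolutions with no pooling or striding, for which the receptive field grows by `kernel_size - 1` pixels per layer. The function names and the resolutions in the loop are illustrative only.

```python
def receptive_field(num_layers: int, kernel_size: int = 3) -> int:
    """Receptive field (in input pixels) of a stack of stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


def layers_to_cover(resolution: int, kernel_size: int = 3) -> int:
    """Smallest number of stacked layers whose receptive field spans one image side."""
    num_layers = 0
    while receptive_field(num_layers, kernel_size) < resolution:
        num_layers += 1
    return num_layers


if __name__ == "__main__":
    # Illustrative input sizes; larger inputs need more layers to be covered.
    for resolution in (224, 380, 600):
        print(resolution, layers_to_cover(resolution))
```

Real ConvNets downsample between stages, so the absolute numbers are far smaller in practice, but the trend is the same: a larger input needs more depth before each output position can see the whole image.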
Compound Model Scaling
Problem Formulation
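$Y_i = F_i(X_i)$
- $Y_i$ : output tensor
- $F_i$ : operator
- $X_i$ : input tensor, shape $\langle H_i, W_i, C_i \rangle$ (height, width, channels)
ConvNet $\mathcal{N} = \bigodot_{i=1 \ldots s} F_i^{L_i}\left(X_{\langle H_i, W_i, C_i \rangle}\right)$
$F_i^{L_i}$ : layer $F_i$ is repeated $L_i$ times in stage $i$
Fix $F_i$ and expand $L_i$, $C_i$, $(H_i, W_i)$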
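The formulation above can be sketched as data: each stage keeps its operator fixed, and only the repeat count, channel count, and spatial size change. This is a minimal illustration, not the authors' implementation. The `Stage` class, `scale_stage`, and the rounding choices (ceil for layers, nearest integer for channels and resolution) are assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    layers: int    # L_i: how many times the operator F_i is repeated
    channels: int  # C_i
    height: int    # H_i
    width: int     # W_i


def scale_stage(stage: Stage, d: float, w: float, r: float) -> Stage:
    """Scale depth by d, width (channels) by w, and resolution by r; F_i is untouched."""
    return Stage(
        layers=math.ceil(d * stage.layers),         # assumed rounding rule
        channels=int(w * stage.channels + 0.5),     # assumed rounding rule
        height=int(r * stage.height + 0.5),
        width=int(r * stage.width + 0.5),
    )


if __name__ == "__main__":
    baseline = Stage(layers=2, channels=16, height=112, width=112)
    print(scale_stage(baseline, d=2.0, w=1.5, r=1.25))
```

Applying the same `(d, w, r)` to every stage is what makes the search space small: the per-stage design stays fixed and only three numbers are tuned.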
Scaling Dimensions
Depth d
- Deeper → capture richer and more complex features, generalize well on new tasks
- However, deeper networks run into the vanishing gradient problem
- Remedies: skip connections, batch normalization, etc., but these do not fully solve it
Width w
- Wider → capture more fine-grained features and are easier to train
- However, extremely wide but shallow networks tend to have difficulties in capturing higher level features
Resolution r
- capture more fine-grained patterns
- accuracy gain diminishes for very high resolutions
Compound Scaling
Intuitively
Higher-resolution images → increase depth and increase width
→ need to coordinate and balance different scaling dimensions rather than conventional single-dimension scaling
Observation 2
- balance all dimensions of network width, depth, and resolution during ConvNet scaling
- Scale all three dimensions uniformly with a single compound coefficient φ
- α, β, γ are determined by a small grid search
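As a rough sketch of this rule, the three multipliers can be computed as d = α^φ, w = β^φ, r = γ^φ. The default values α = 1.2, β = 1.1, γ = 1.15 are the ones the EfficientNet paper reports for its baseline network; they do not appear in these notes. The function names, the grid step, and the tolerance `tol` are hypothetical choices for illustration. `grid_candidates` only enumerates triples that satisfy the FLOPS constraint α · β² · γ² ≈ 2; picking the best triple would still require training and evaluating a model for each one, which is not shown.

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    return alpha ** phi, beta ** phi, gamma ** phi


def flops_multiplier(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Approximate FLOPS growth: depth is linear, width and resolution are quadratic."""
    return (alpha * beta ** 2 * gamma ** 2) ** phi


def satisfies_constraint(alpha, beta, gamma, target=2.0, tol=0.1):
    """Check alpha * beta^2 * gamma^2 ~= target with all coefficients >= 1 (tol is assumed)."""
    if min(alpha, beta, gamma) < 1.0:
        return False
    return abs(alpha * beta ** 2 * gamma ** 2 - target) <= tol


def grid_candidates(step=0.05, upper=1.5, tol=0.1):
    """Enumerate (alpha, beta, gamma) grid points that meet the FLOPS constraint."""
    count = int((upper - 1.0) / step + 0.5)
    values = [round(1.0 + i * step, 2) for i in range(count + 1)]
    return [
        (a, b, g)
        for a in values
        for b in values
        for g in values
        if satisfies_constraint(a, b, g, tol=tol)
    ]


if __name__ == "__main__":
    for phi in range(4):
        d, w, r = compound_scale(phi)
        print(phi, round(d, 3), round(w, 3), round(r, 3), round(flops_multiplier(phi), 3))
    print(len(grid_candidates()), "grid points satisfy the constraint")
```

With these defaults the per-step FLOPS factor is about 1.92 rather than exactly 2, which is why the constraint is stated as an approximation.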
depth ×2 → FLOPS ×2
width or resolution ×2 → FLOPS ×4
→ the paper constrains α · β² · γ² ≈ 2, so total FLOPS grow by roughly $2^\phi$
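The linear and quadratic factors above follow from the cost of a standard convolution, which is proportional to H · W · C_in · C_out · k². A minimal sketch, assuming a toy network of identical stride-1 convolution layers with equal input and output channels (the function names and sizes are made up for this example):

```python
def conv_flops(height, width, in_channels, out_channels, kernel_size=3):
    """Multiply-accumulate count of one stride-1, same-padded convolution layer."""
    return height * width * in_channels * out_channels * kernel_size ** 2


def network_flops(depth, channels, resolution):
    """Cost of `depth` identical conv layers on a square `resolution` input."""
    return depth * conv_flops(resolution, resolution, channels, channels)


if __name__ == "__main__":
    base = network_flops(depth=4, channels=32, resolution=64)
    print(network_flops(8, 32, 64) / base)    # depth x2      -> 2.0
    print(network_flops(4, 64, 64) / base)    # width x2      -> 4.0
    print(network_flops(4, 32, 128) / base)   # resolution x2 -> 4.0
```

Doubling width multiplies both C_in and C_out, and doubling resolution multiplies both H and W, which is where the factor of 4 comes from in each case.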
EfficientNet Architecture
- Architecture similar to MnasNet
- Uses MBConv blocks
- Applies squeeze-and-excitation optimization
Conclusion