EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Abstract
๋ชจ๋ธ์ ์ฑ๋ฅ๊ณผ ํจ์จ์ฑ์ ๊ท ํ์๊ฒ ์กฐ์ ํ๊ธฐ ์ํด ๋คํธ์ํฌ์ ๊ท๋ชจ(scale)๋ฅผ ์กฐ์ ํ๋ ๋ฐฉ๋ฒ์ ์ ์
โ Compound Scaling

Introduction
์ด์ ์ ConvNet์ scale up ํ ๋ depth or width or resolution ์ค ํ๋๋ง ์ฌ์ฉ
- ๋์์ ์ ํ ์ด์ : arbitrary scaling requires tedious manual tuning and still often yields sub-optimal accuracy and efficiency
width/depth/resolution๋ฅผ constant ratio๋ก scale โ Compound Scaling Method

- uniformly scales network width, depth and resolution with a set of fixed scaling coefficients
- if computational resources increase by 2^N → scale depth, width, and image size by α^N, β^N, γ^N respectively
- α, β, γ are found by a small grid search on the original small model
- ์
๋ ฅ ์ด๋ฏธ์ง์ ํฌ๊ธฐ๊ฐ ์ปค์ง์๋ก ๋คํธ์ํฌ๊ฐ ๋ ๋ง์ layer์ channel์ ํ์๋ก ํ๋ค
- receptive field๋ฅผ ์ฆ๊ฐ์ํค๊ธฐ ์ํด ๋ ๋ง์ layer ํ์
- ๊ฐ๋ณ ํฝ์
์ด "๋ณด์ด๋" ์์ญ์ธ receptive field ๋ ํจ๊ป ์ฆ๊ฐํด์ผ
- fine-grained pattern์ ์ฐพ๊ธฐ ์ํด ๋ ๋ง์ channel์ด ํ์ํ๋ค
- ํฐ ์ด๋ฏธ์ง์๋ ์์ ์ธ๋ถ ์ฌํญ๊ณผ ๋ฏธ์ธํ ํจํด๋ค์ด ๋ ๋ง์ด ํฌํจ
Compound Model Scaling
Problem Formulation
Y_i = F_i(X_i)
- Y_i: output tensor
- F_i: operator
- X_i: input tensor with shape ⟨H_i, W_i, C_i⟩ (height, width, channels)
ConvNet N = ⨀_{i=1…s} F_i^{L_i}(X_{⟨H_i, W_i, C_i⟩})

- F_i^{L_i}: operator F_i repeated L_i times in stage i

Fix F_i and expand L_i (depth), C_i (width), and (H_i, W_i) (resolution)
Scaling Dimensions
Depth d
- Deeper โ capture richer and more complex features, generalize well on new tasks
- ๊ทธ๋ฌ๋ Gradient Vanishing Problem
- ํด๊ฒฐ์ฑ
: Skip Connections, Batch Normalization ๋ฑ, ํ์ง๋ง ํด๊ฒฐ X
Width w
- Wider โ capture more fine-grained features and are easier to train
- ๊ทธ๋ฌ๋ extremely wide but shallow networks tend to have difficulties in capturing higher level features
Resolution ฮณ
- capture more fine-grained patterns
- accuracy gain diminishes for very high resolutions
Compound Scaling
์ง๊ด์ ์ผ๋ก
Higher Resolution images โ increase depth & increase Width
โ need to coordinate and balance different scaling dimensions rather than conventional single-dimension scaling

Observation 2
- For better accuracy and efficiency, it is critical to balance all dimensions of network width, depth, and resolution during ConvNet scaling

- ฯ๋ก uniformly scale ํ๋ค
- ์ด ๋ ฮฑ, ฮฒ, ฮณ ๋ small grid search๋ก ๊ฒฐ์
depth 2๋ฐฐ โ FLOPS 2๋ฐฐ
width, resolution 2๋ฐฐ โ FLOPS 4๋ฐฐ
โ ๋ณธ ๋
ผ๋ฌธ์์ total FLOPS 2ฯ ์ฆ๊ฐ
EfficientNet Architecture

- Mnas-Net๊ณผ ์ ์ฌํ ์ํคํ
์ฒ
- MBConv block์ ์ฌ์ฉ
- squeeze and excitation ์ต์ ํ๋ฅผ ์งํ
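The squeeze-and-excitation step inside MBConv can be sketched in NumPy. A minimal sketch with random stand-in weights; the global-pool → reduce → expand → sigmoid-gate structure follows the SENet design the paper adopts, and the reduction ratio r = 4 here is an illustrative assumption:

```python
import numpy as np

def squeeze_excite(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """SE block: global-average-pool -> FC/ReLU -> FC/sigmoid -> channel rescale.

    x: feature map of shape (H, W, C); w1: (C, C//r); w2: (C//r, C).
    """
    s = x.mean(axis=(0, 1))                  # squeeze: (C,) channel descriptor
    z = np.maximum(s @ w1, 0.0)              # excitation, reduce step: ReLU
    gate = 1.0 / (1.0 + np.exp(-(z @ w2)))   # expand step: sigmoid gate in (0, 1)
    return x * gate                          # reweight each channel

rng = np.random.default_rng(0)
H, W, C, r = 8, 8, 16, 4
x = rng.standard_normal((H, W, C))
y = squeeze_excite(x, rng.standard_normal((C, C // r)),
                      rng.standard_normal((C // r, C)))
print(y.shape)  # (8, 8, 16): same shape, channels rescaled
```

The block leaves the tensor shape unchanged and only re-weights channels, which is why it slots into MBConv without affecting the surrounding layer dimensions.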
Conclusion
