🔥 Paper Review - EfficientNet

esc247 · September 22, 2023

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks



Abstract

๋ชจ๋ธ์˜ ์„ฑ๋Šฅ๊ณผ ํšจ์œจ์„ฑ์„ ๊ท ํ˜•์žˆ๊ฒŒ ์กฐ์ •ํ•˜๊ธฐ ์œ„ํ•ด ๋„คํŠธ์›Œํฌ์˜ ๊ทœ๋ชจ(scale)๋ฅผ ์กฐ์ ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์ œ์‹œ

→ Compound Scaling

Introduction

Previously, when scaling up a ConvNet, only one of depth, width, or resolution was adjusted

  • Why not scale them together: arbitrary scaling requires tedious manual tuning and still often yields sub-optimal accuracy and efficiency

Scale width/depth/resolution at a constant ratio → the Compound Scaling Method

  • uniformly scales network width, depth and resolution with a set of fixed scaling coefficients
  • if computational resources increase by $2^N$ → increase depth, width, and image size by $\alpha^N, \beta^N, \gamma^N$
    • Here $\alpha, \beta, \gamma$ are found by a small grid search on the original small model
  • The larger the input image, the more layers and channels the network needs
    • More layers are needed to increase the receptive field
      • The receptive field, the region of the input each output unit "sees", must grow along with the image size (see the sketch after this list)
    • More channels are needed to capture fine-grained patterns
      • Larger images contain more small details and fine-grained patterns
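
As a rough illustration (my own sketch, not from the paper): the receptive field of a stack of 3×3 convolutions grows with the number of layers, so each output unit of a deeper network "sees" a larger patch of a larger input. The standard receptive-field arithmetic below shows why bigger images call for more layers.

```python
# Illustration only (not from the paper): receptive field of stacked convs.
def receptive_field(num_layers: int, kernel_size: int = 3, stride: int = 1) -> int:
    """Receptive field of one output unit after `num_layers` identical convs."""
    rf, jump = 1, 1          # current receptive field and cumulative stride
    for _ in range(num_layers):
        rf += (kernel_size - 1) * jump
        jump *= stride       # each stride multiplies the step between units
    return rf

for n in (5, 10, 20):
    print(f"{n} layers -> {receptive_field(n)}x{receptive_field(n)} px")
# 5 layers -> 11x11 px, 10 layers -> 21x21 px, 20 layers -> 41x41 px
```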

Compound Model Scaling

Problem Formulation

$Y_i = \mathcal{F}_i(X_i)$

  • $Y_i$ : output tensor
  • $\mathcal{F}_i$ : operator
  • $X_i$ : input tensor with shape $[H_i, W_i, C_i]$ (height, width, channels)

ConvNet $\mathcal{N} = \bigodot_{i=1 \ldots s} \mathcal{F}_i^{L_i}\big(X_{\langle H_i, W_i, C_i \rangle}\big)$

$\mathcal{F}_i^{L_i}$ : the operator $\mathcal{F}_i$ is repeated $L_i$ times in stage $i$

Scaling fixes $\mathcal{F}_i$ and expands $L_i$, $C_i$, and $(H_i, W_i)$; a sketch follows below
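
A minimal PyTorch sketch of this formulation, with hypothetical stage configurations (the `(L_i, C_i)` values below are illustrative, not the paper's): each stage repeats a fixed operator $\mathcal{F}_i$, and scaling only changes the repeat count $L_i$ and the channel width $C_i$.

```python
import torch
import torch.nn as nn

def conv_bn_act(c_in: int, c_out: int) -> nn.Module:
    """A fixed operator F_i: 3x3 conv + BN + ReLU (illustrative choice)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

def build_net(stages, c_in=3):
    """stages: list of (L_i, C_i) pairs; F_i is repeated L_i times at width C_i."""
    layers = []
    for L_i, C_i in stages:
        for _ in range(L_i):
            layers.append(conv_bn_act(c_in, C_i))
            c_in = C_i
    return nn.Sequential(*layers)

baseline = build_net([(2, 16), (3, 32)])        # hypothetical baseline (L_i, C_i)
scaled = build_net([(4, 24), (6, 48)])          # depth x2, width x1.5
print(scaled(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 48, 64, 64])
```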

Scaling Dimensions

Depth $d$

  • Deeper → capture richer and more complex features, generalize well on new tasks
  • However, deeper networks are harder to train because of the vanishing gradient problem
  • Mitigations: skip connections, batch normalization, etc.; even so, the accuracy gain of very deep networks diminishes

Width $w$

  • Wider → capture more fine-grained features and are easier to train
  • However, extremely wide but shallow networks tend to have difficulty capturing higher-level features

Resolution $r$

  • Higher resolution → capture more fine-grained patterns
  • However, the accuracy gain diminishes for very high resolutions

Compound Scaling

์ง๊ด€์ ์œผ๋กœ

Higher resolution images → increase depth & increase width

⇒ Need to coordinate and balance the different scaling dimensions rather than rely on conventional single-dimension scaling

Observation 2

  • balance all dimensions of network width, depth, and resolution during ConvNet scaling

  • Uniformly scale all three dimensions with a single compound coefficient $\phi$: depth $d = \alpha^\phi$, width $w = \beta^\phi$, resolution $r = \gamma^\phi$, subject to $\alpha \cdot \beta^2 \cdot \gamma^2 \approx 2$ and $\alpha, \beta, \gamma \ge 1$
  • $\alpha, \beta, \gamma$ are determined by a small grid search

Doubling depth → FLOPS ×2

Doubling width or resolution → FLOPS ×4

→ In this paper, total FLOPS increase by approximately $2^\phi$
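
A small numeric sketch of this rule. The coefficients $\alpha = 1.2$, $\beta = 1.1$, $\gamma = 1.15$ are the grid-search values the paper reports for EfficientNet-B0; since FLOPS scale with $d \cdot w^2 \cdot r^2$ and $\alpha \cdot \beta^2 \cdot \gamma^2 \approx 1.92 \approx 2$, total FLOPS grow by roughly $2^\phi$:

```python
# α, β, γ are the grid-search values the paper reports for EfficientNet-B0.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi: int):
    """Multipliers for depth, width, and resolution at compound coefficient φ."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

for phi in range(4):
    d, w, r = compound_scale(phi)
    flops = d * w ** 2 * r ** 2            # FLOPS scale with d · w² · r²
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, "
          f"FLOPS x{flops:.2f} (target 2^phi = {2 ** phi})")
```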

EfficientNet Architecture

  • Architecture similar to MnasNet
  • Uses MBConv blocks (mobile inverted bottleneck convolution); a sketch follows this list
  • Applies squeeze-and-excitation optimization
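
Below is a minimal PyTorch sketch of an MBConv block with squeeze-and-excitation, simplified from the MnasNet/MobileNetV2 design (no drop-connect, fixed 3×3 depthwise kernel; `expand=6` and `se_ratio=0.25` are common defaults, not tied to any specific EfficientNet stage):

```python
import torch
import torch.nn as nn

class MBConv(nn.Module):
    """Simplified MBConv: 1x1 expand -> depthwise conv -> SE -> 1x1 project,
    with a skip connection when the shape is preserved."""

    def __init__(self, c_in: int, c_out: int, expand: int = 6,
                 stride: int = 1, se_ratio: float = 0.25):
        super().__init__()
        c_mid = c_in * expand
        se_ch = max(1, int(c_in * se_ratio))
        self.use_skip = stride == 1 and c_in == c_out
        self.expand = nn.Sequential(        # 1x1 expansion conv
            nn.Conv2d(c_in, c_mid, 1, bias=False),
            nn.BatchNorm2d(c_mid), nn.SiLU(),
        )
        self.depthwise = nn.Sequential(     # 3x3 depthwise conv
            nn.Conv2d(c_mid, c_mid, 3, stride, 1, groups=c_mid, bias=False),
            nn.BatchNorm2d(c_mid), nn.SiLU(),
        )
        self.se = nn.Sequential(            # squeeze-and-excitation gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c_mid, se_ch, 1), nn.SiLU(),
            nn.Conv2d(se_ch, c_mid, 1), nn.Sigmoid(),
        )
        self.project = nn.Sequential(       # 1x1 linear projection conv
            nn.Conv2d(c_mid, c_out, 1, bias=False),
            nn.BatchNorm2d(c_out),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.depthwise(self.expand(x))
        h = h * self.se(h)                  # channel-wise reweighting
        h = self.project(h)
        return x + h if self.use_skip else h

print(MBConv(16, 16)(torch.randn(1, 16, 32, 32)).shape)  # [1, 16, 32, 32]
```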

Conclusion

Balancing network depth, width, and resolution with a single compound coefficient outperforms conventional single-dimension scaling: scaling up the EfficientNet-B0 baseline yields the EfficientNet-B1 through B7 family, with EfficientNet-B7 reaching state-of-the-art ImageNet top-1 accuracy while being 8.4× smaller and 6.1× faster at inference than the best existing ConvNet.
