[CV] U-Net: Convolutional Networks for Biomedical Image Segmentation

๊ฐ•๋™์—ฐยท2022๋…„ 3์›” 22์ผ

[Paper review]

๋ชฉ๋ก ๋ณด๊ธฐ
16/17

🎈 This review was written with reference to the U-Net paper and an existing review of it.

๐Ÿ‘ฉโ€๐Ÿ’ป ์˜ค๋Š˜์€ Semantic segmentation ์ค‘ ํ•˜๋‚˜์˜ U-Net ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ๋ฅผ ์ง„ํ–‰ํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. ์ œ๋ชฉ์—์„œ ๋ณผ ์ˆ˜ ์žˆ๋“ฏ์ด ๋ณธ ๋…ผ๋ฌธ์ด ์˜ํ•™๊ณ„์—ด๊ณผ ๋งŽ์€ ๊ด€๋ จ์ด ์žˆ๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ „ ์˜ํ•™์— ๋Œ€ํ•ด ๋ฌด์ง€ํ•˜๊ธฐ์—.... ๋ฐฉ๋ฒ•๋ก  ์œ„์ฃผ์˜ ๋ฆฌ๋ทฐ๋ฅผ ์ง„ํ–‰ํ•  ์˜ˆ์ •์ž…๋‹ˆ๋‹ค.

Keywords

🎈 U-Net: Contracting path and expansive path
🎈 Overlap-tile strategy
🎈 Elastic Deformation for Data Augmentation

Introduction

✔ Unlike image classification, semantic segmentation has to assign a class label to every pixel. To do this, the network first has to be able to localize. Second, in the patch-based approach the paper builds on, the training data counted in patches is far larger than the number of training images.

โœ” ๊ธฐ์กด์˜ ์ „๋žต๋“ค์—๋Š” ๋‘ ๊ฐ€์ง€ ์ƒ๊ฐํ•ด์•ผํ•  ๋ถ€๋ถ„์ด ์กด์žฌํ•ฉ๋‹ˆ๋‹ค. ๋จผ์ € ๊ธฐ์กด์˜ sliding window์—๋Š” ๋งŽ์€ ์ค‘๋ณต์ด ์กด์žฌํ•œ๋‹ค๋Š” ์ ์ž…๋‹ˆ๋‹ค. ๋‘๋ฒˆ์งธ๋กœ localization ์ •ํ™•๋„์™€ context๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์— trade-off ๊ด€๊ณ„๊ฐ€ ์กด์žฌํ•œ๋‹ค๋Š” ์ ์ž…๋‹ˆ๋‹ค. ๋” ํฐ Patch(context)๋ฅผ ์‚ฌ์šฉํ•œ ๊ฒƒ์€ ๋” ๋งŽ์€ max-pooling์ด ํ•„์š”ํ•˜๋ฉฐ ์ด๋Š” localization์˜ ์ •ํ™•๋„๋ฅผ ๋‚ฎ์ถฅ๋‹ˆ๋‹ค.

โœ” ๊ฒฐ๊ณผ์ ์œผ๋กœ U-Net์€ Fully Convolutional Network๋ฅผ ์‚ฌ์šฉํ•ด ์•„ํ‚คํ…์ฒ˜๋ฅผ ๊ตฌ์„ฑํ•ฉ๋‹ˆ๋‹ค. ๋˜ํ•œ FCN ๊ตฌ์กฐ๋ฅผ ์ˆ˜์ • ๋ฐ ํ™•์žฅ์„ ํ†ตํ•ด ์ ์€ training ์ด๋ฏธ์ง€๋กœ ๋” ์œ ์˜๋ฏธํ•œ segmentation๋ฅผ ํ•  ์ˆ˜ ์žˆ๋‹ค๊ณ  ๋งํ•ฉ๋‹ˆ๋‹ค.

โœ” ์œ„์˜ Overlap-tile strategy๋ฅผ ์‚ฌ์šฉํ•ด seamless segmentation์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํŒŒ๋ž€์ƒ‰ ์„ ์˜ ๋ฒ”์œ„๋กœ ๋…ธ๋ž€์ƒ‰ ์„ ์˜ ๋ฒ”์œ„๋ฅผ ์˜ˆ์ธกํ•˜๊ฒŒ ๋˜๋Š”๋ฐ, ํŒŒ๋ž€์ƒ‰ ์„ ์˜ ๋ฒ”์œ„ ์ผ๋ถ€๊ฐ€ ๊ธฐ์กด์—๋Š” ์ด๋ฏธ์ง€๊ฐ€ ์ผ๋ถ€๊ฐ€ ์•„๋‹ˆ์ง€๋งŒ, mirroring์„ ํ†ตํ•ด ์ด๋ฏธ์ง€๊ฐ€ ์—†๋Š” ๋ถ€๋ถ„ ์ฑ„์›Œ ํ•™์Šต์„ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค.

Network Architecture

โœ” U-Net์˜ ์ „์ฒด์ ์ธ ๊ตฌ์กฐ ํ˜•ํƒœ๋ฅผ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๊ฐ€์šด๋Œ€๋ฅผ ๊ธฐ์ค€์œผ๋กœ ๊ฑฐ์˜ ๋Œ€์นญ์ธ ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์ขŒ์ธก์€ contracting path ๊ทธ๋ฆฌ๊ณ  ์šฐ์ธก์€ expansive path๋ผ๊ณ  ๋ถ€๋ฆ…๋‹ˆ๋‹ค.

✔ The contracting path follows the typical structure of a convolutional network. Each block consists of two 3x3 convolutions, each followed by a ReLU, and then a 2x2 max pooling. The number of feature channels is doubled at every downsampling step.

✔ The expansive path upsamples with a 2x2 convolution ("up-convolution"). The upsampled result is then concatenated with the correspondingly cropped feature map from the contracting path. The crop is needed because border pixels are lost in every convolution. I think this is a direct consequence of not padding the convolutions.
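The crop-and-concatenate step can be sketched in NumPy as follows. The helper names `center_crop` and `crop_and_concat` are mine, and I am assuming a channel-first `(C, H, W)` layout. The shapes in the example are the ones at the deepest skip connection in the paper's figure, where a 64x64 feature map is cropped to match a 56x56 upsampled map.

```python
import numpy as np

def center_crop(feature_map, target_hw):
    """Crop a (C, H, W) array to (C, th, tw) around its center."""
    _, h, w = feature_map.shape
    th, tw = target_hw
    top, left = (h - th) // 2, (w - tw) // 2
    return feature_map[:, top:top + th, left:left + tw]

def crop_and_concat(skip, upsampled):
    """Crop the skip feature map to the upsampled size and stack channels."""
    cropped = center_crop(skip, upsampled.shape[1:])
    return np.concatenate([cropped, upsampled], axis=0)

skip = np.zeros((512, 64, 64), dtype=np.float32)       # from the contracting path
upsampled = np.ones((512, 56, 56), dtype=np.float32)   # after the up-convolution
merged = crop_and_concat(skip, upsampled)
print(merged.shape)  # (1024, 56, 56)
```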

โœ” ์ด 23๊ฐœ์˜ convolution layer๋กœ ์กด์žฌํ•˜๋ฉฐ, connected layer๋Š” ์กด์žฌํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ๊ฐœ์ธ์ ์œผ๋กœ๋Š” ์œ„์˜ ๋„คํŠธ์›Œํฌ ๊ทธ๋ฆผ์ด ์ง๊ด€์ ์œผ๋กœ ์ž˜ ์„ค๋ช…๋˜์–ด์žˆ๋‹ค๊ณ  ์ƒ๊ฐํ•ฉ๋‹ˆ๋‹ค.

Training

✔ Because the convolutions are unpadded, the output image is smaller than the input. For this reason the paper favors large input tiles over a large batch size, which reduces the batch to a single image. To compensate, it uses a high momentum (0.99) so that a large number of previously seen training samples influence the current update.
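To get a feel for what a momentum of 0.99 does, here is a small sketch of a standard SGD-with-momentum update. The function name `momentum_step` and the learning rate of 0.01 are placeholders I picked for illustration; only the momentum value comes from the paper. With a constant gradient the velocity settles at 1 / (1 - 0.99) = 100 times a single plain step, which is one way to read "many past samples contribute to each update".

```python
def momentum_step(weight, velocity, grad, lr=0.01, momentum=0.99):
    """One SGD-with-momentum update; returns (new_weight, new_velocity)."""
    velocity = momentum * velocity - lr * grad
    return weight + velocity, velocity

w, v = 0.0, 0.0
for _ in range(2000):
    # Constant gradient of 1.0, just to watch the velocity converge.
    w, v = momentum_step(w, v, grad=1.0)

# A single plain SGD step would move by -lr = -0.01.
# The converged velocity is about 100 times larger.
print(round(v, 4))  # -1.0
```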

โœ” U-Net์—์„œ ์‚ฌ์šฉํ•œ ์• ๋„ˆ์ง€ํ•จ์ˆ˜(์ € ๊ฐ™์€ ๊ฒฝ์šฐ์—๋Š” ์†์‹คํ•จ์ˆ˜๋ผ๊ณ  ์ดํ•ดํ–ˆ์Šต๋‹ˆ๋‹ค..)์˜ ๊ฒฝ์šฐ pixel-wise(ํ”ฝ์…€ ๋‹จ์œ„์˜) soft-max๋ฅผ ์ตœ์ข… feature map์—์„œ cross entropy์™€ ๊ฒฐํ•ฉ๋˜ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

✔ The soft-max is defined as $p_k(x) = \exp(a_k(x)) / \sum_{k'=1}^{K} \exp(a_{k'}(x))$. Here $a_k(x)$ is the activation in feature channel $k$ at pixel position $x$, $K$ is the number of classes, and $p_k(x)$ is the approximated maximum function: it is close to 1 for the $k$ with the largest activation and close to 0 for all other $k$.
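A pixel-wise soft-max is simply a soft-max taken over the class axis at every pixel. A minimal NumPy sketch, assuming the final feature map is stored as a `(K, H, W)` array (the layout and the name `pixelwise_softmax` are my choices):

```python
import numpy as np

def pixelwise_softmax(activations):
    """Soft-max over the class axis of a (K, H, W) activation map."""
    # Subtracting the per-pixel maximum does not change the result
    # but keeps np.exp from overflowing.
    shifted = activations - activations.max(axis=0, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)

activations = np.random.default_rng(0).normal(size=(2, 4, 4))
probs = pixelwise_softmax(activations)
print(probs.shape)  # (2, 4, 4)
```

At every pixel the `K` probabilities sum to 1, and the class with the largest activation gets the largest probability.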

โœ” ์ „์ฒด์ ์ธ ์• ๋„ˆ์ง€ํ•จ์ˆ˜์˜ ๊ตฌ์กฐ๋Š” ์œ„์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค.

โœ” ๋˜ํ•œ ์œ„ ๋…ผ๋ฌธ์€ ์˜ํ•™๊ด€๋ จ ์ฃผ์ œ๋ฅผ ๋‹ค๋ฃจ๋ฉฐ, ๊ฒฝ๊ณ„๋ถ€๋ถ„์„ ๋ถ„๋ฆฌํ•˜๊ธฐ ์œ„ํ•ด์„œ Weight map์„ pre-computeํ•œ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค. ๊ฐ๊ฐ์˜ training dataset์˜ ํŠน์ • ํด๋ž˜์Šค์—์„œ ํ”ฝ์…€์˜ ๋‹ค๋ฅธ ๋นˆ๋„๋ฅผ ๋ณด์ •ํ•˜๊ธฐ ์œ„ํ•ด ๊ฐ ์ง€์ƒ ์ง„์‹ค ๋ถ„ํ• ์— ๋Œ€ํ•œ ๊ฐ€์ค‘์น˜ ๋งต์„ ๋ฏธ๋ฆฌ ๊ณ„์‚ฐํ•œ๋‹ค. ๋˜ํ•œ ๋„คํŠธ์›Œํฌ์— ๊ฒฝ๊ณ„๊ฐ€ ์ ์€ ๋ถ€๋ถ„์„ ๋ถ„๋ฆฌํ•˜๊ธฐ ์œ„ํ•ด pre-compute๋ฅผ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค.

โœ” ์œ„์˜ ์‹์—์„œ ํ•ต์‹ฌ์€ d1d1๊ณผ d2d2๋ผ๊ณ  ๋งํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. d1d1์€ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด cell(์„ธํฌ)์˜ ๊ฑฐ๋ฆฌ, d2d2๋Š” ๋‘๋ฒˆ์งธ๋กœ ๊ฐ€๊นŒ์šด cell์˜ ๊ฑฐ๋ฆฌ๋ผ๊ณ  ๋งํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Data Augmentation

โœ” U-Net์—์„œ๋Š” ์˜ํ•™ ๋ฐ์ดํ„ฐ์— ๋งž๋Š” Elastic Deformation์„ ์ œ์‹œํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” invariacneํ•˜๊ณ  robustํ•˜๋‹ค๋Š” ํŠน์„ฑ์„ ๊ฐ€์ง€๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฅผ ์ œ์‹œํ•œ ์ด์œ ๋Š” ์ผ๋ฐ˜์ ์œผ๋กœ ์„ธํฌ ๋ฐ์ดํ„ฐ๋“ฑ๊ณผ ๊ฐ™์€ ๊ฒƒ์€ ๊ตฌํ•˜๊ธฐ๊ฐ€ ์–ด๋ ค์šฐ๋ฉฐ, ๊ฐ๋„์— ๋”ฐ๋ผ ๋‹ค๋ฅด๊ฒŒ ๋•Œ๋ฌธ์— ์‚ฌ์šฉํ•œ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ ๋…ผ๋ฌธ ๋ฐ ์ž๋ฃŒ๋ฅผ ์ฐพ์•„๋ณด์‹œ๋Š” ๊ฒƒ์„ ์ถ”์ฒœ๋“œ๋ฆฝ๋‹ˆ๋‹ค.
