U-Net: Convolutional Networks for Biomedical Image Segmentation

이은상 · June 24, 2024

📄 U-Net: Convolutional Networks for Biomedical Image Segmentation

written by Olaf Ronneberger, Philipp Fischer, and Thomas Brox


Introduction

In recent years, deep convolutional networks (DCNs) have achieved strong results on many visual recognition tasks. Their success was long limited by the size of the available training sets and of the networks themselves, until Krizhevsky et al. obtained a breakthrough with supervised training of a large network with 8 layers and millions of parameters on the ImageNet dataset.

A typical DCN is used for classification, where the output for an image is a single class label. In biomedical image processing, however, a label often has to be assigned to each pixel. Ciresan et al. trained a network in a sliding-window setup that predicts the class label of each pixel from the local region (patch) around that pixel. This approach is slow and highly redundant because neighboring patches overlap, and it forces a trade-off: larger patches provide more context but reduce localization accuracy.

This paper proposes a more elegant architecture built on the fully convolutional network. It works with very few training images and yields more precise segmentations. The main idea is to combine high-resolution features from the contracting path with the upsampled output, so that the following convolution layers can learn to assemble a more precise output. In addition, data augmentation is used to enlarge the training data and to let the network learn invariance to deformations.

Furthermore, a weighted loss function is used to separate touching objects. The proposed network is applicable to a variety of biomedical segmentation problems. It performs well on the segmentation of neuronal structures in EM stacks and in the ISBI 2015 cell tracking challenge.


Network Architecture


The network architecture consists of a contracting path (left side) and an expansive path (right side).

The contracting path follows the typical architecture of a convolutional network. It consists of the repeated application of two 3x3 convolutions (no padding), each followed by a rectified linear unit (ReLU), and a 2x2 max pooling operation for downsampling. The number of feature channels doubles at each downsampling step.

Each step in the expansive path upsamples the feature map, applies a 2x2 convolution (up-convolution) that halves the number of feature channels, and concatenates the result with the correspondingly cropped feature map from the contracting path. Two 3x3 convolutions follow, each with a ReLU. The cropping is necessary because border pixels are lost in every convolution. The final layer uses a 1x1 convolution to map each 64-component feature vector to the desired number of classes. In total the network has 23 convolutional layers.
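The layer count of 23 can be checked by enumerating the convolutions described above. The following is a minimal sketch, not the authors' code. The function name `unet_conv_layers`, the single input channel, and the two output classes are assumptions chosen for illustration.

```python
def unet_conv_layers(depth=4, base_channels=64, in_channels=1, n_classes=2):
    """List (kind, in_ch, out_ch) for every convolution in a U-Net-style net.

    in_channels and n_classes are illustrative assumptions.
    """
    layers = []
    in_ch, out_ch = in_channels, base_channels
    # Contracting path plus the bottom level: two 3x3 convolutions per level,
    # with the channel count doubling from one level to the next.
    for _ in range(depth + 1):
        layers.append(("conv3x3", in_ch, out_ch))
        layers.append(("conv3x3", out_ch, out_ch))
        in_ch, out_ch = out_ch, out_ch * 2
    ch = in_ch  # channel count at the bottom of the "U"
    # Expansive path: the up-convolution halves the channels, concatenation
    # with the cropped skip features restores them, then two 3x3 convolutions.
    for _ in range(depth):
        layers.append(("upconv2x2", ch, ch // 2))
        layers.append(("conv3x3", ch, ch // 2))
        layers.append(("conv3x3", ch // 2, ch // 2))
        ch //= 2
    # Final 1x1 convolution maps each feature vector to class scores.
    layers.append(("conv1x1", ch, n_classes))
    return layers


layers = unet_conv_layers()
print(len(layers))   # total number of convolution layers
print(layers[-1])    # the final 1x1 convolution
```

With the default depth of 4, the enumeration gives 10 convolutions on the way down, 12 on the way up (including the 4 up-convolutions), and the final 1x1 convolution.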

For seamless tiling of the output segmentation map, it is important to choose the input tile size so that every 2x2 max pooling operation is applied to a layer with even x and y size.
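The tile-size constraint is easy to check with a little arithmetic: each unpadded 3x3 convolution removes 2 pixels per dimension, each pooling halves the size, and each up-convolution doubles it. The sketch below assumes a depth of 4 and square tiles. `unet_output_size` is a hypothetical helper name, not something from the paper.

```python
def unet_output_size(input_size, depth=4):
    """Output side length of an unpadded U-Net, or None for an invalid tile.

    A tile is rejected when a 2x2 max pooling would see an odd size or when
    the feature map shrinks to nothing.
    """
    size = input_size
    for _ in range(depth):
        size -= 4                    # two unpadded 3x3 convolutions
        if size <= 0 or size % 2 != 0:
            return None              # pooling needs a positive, even size
        size //= 2                   # 2x2 max pooling with stride 2
    size -= 4                        # two convolutions at the bottom level
    if size <= 0:
        return None
    for _ in range(depth):
        size = size * 2 - 4          # up-convolution, then two convolutions
    return size if size > 0 else None


print(unet_output_size(572))  # 388, the tile sizes shown in the paper's figure
print(unet_output_size(570))  # None: a pooling step would see an odd size

# All valid input sizes in a range of interest:
valid = [s for s in range(500, 600) if unet_output_size(s) is not None]
print(valid)
```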


Training

The network is trained on the input images and their corresponding segmentation maps using the stochastic gradient descent (SGD) implementation of Caffe. Because the convolutions are unpadded, the output image is smaller than the input by a constant border width. To make the most of GPU memory, large input tiles are favored over a large batch size, so the batch is reduced to a single image. A high momentum (0.99) is used accordingly, so that a large number of previously seen training samples determine the update in the current optimization step.
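To see why a momentum of 0.99 compensates for a batch of one image, here is a minimal sketch of a standard momentum update. The learning rate is a hypothetical value, and the function names are illustrative rather than Caffe's API.

```python
def sgd_momentum_step(weight, velocity, grad, lr=0.01, momentum=0.99):
    """One SGD-with-momentum update; lr=0.01 is an illustrative assumption."""
    velocity = momentum * velocity - lr * grad
    return weight + velocity, velocity


def effective_window(momentum):
    """Sum of the geometric weights momentum**k over all past gradients.

    A gradient seen k steps ago still contributes with weight momentum**k,
    so this sum is a rough measure of how many past samples shape one update.
    """
    return 1.0 / (1.0 - momentum)


w, v = 1.0, 0.0
for grad in [2.0, 2.0, 2.0]:
    w, v = sgd_momentum_step(w, v, grad)
print(w, v)
print(effective_window(0.99))  # roughly 100 past samples
```

With single-image batches, each update therefore behaves roughly like an average over the last hundred or so images rather than over one noisy sample.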

The energy function is computed by a pixel-wise softmax over the final feature map combined with the cross entropy loss function. The softmax is defined at each pixel position from the activation in the k-th feature channel. The cross entropy then penalizes, at each position, the deviation of the predicted probability of the true class from 1.
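The pixel-wise softmax and weighted cross entropy can be sketched in plain Python as follows. This is an illustration under assumptions: the nested-list layout, the function names, and the sign convention (a negative log-likelihood to be minimized) are choices made here, not details taken from the summary above.

```python
import math


def softmax(activations):
    """Softmax over the K channel activations at one pixel."""
    peak = max(activations)                      # subtract max for stability
    exps = [math.exp(a - peak) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(activation_map, labels, weights):
    """Sum over pixels of -w(x) * log p_l(x)(x).

    activation_map[y][x] is a list of K activations, labels[y][x] is the true
    class index, and weights[y][x] is the per-pixel weight.
    """
    loss = 0.0
    for act_row, label_row, weight_row in zip(activation_map, labels, weights):
        for acts, label, w in zip(act_row, label_row, weight_row):
            loss -= w * math.log(softmax(acts)[label])
    return loss


# A 1x2 "image" with two classes: one uncertain pixel, one confident pixel.
acts = [[[0.0, 0.0], [5.0, -5.0]]]
labels = [[0, 0]]
weights = [[1.0, 1.0]]
print(weighted_cross_entropy(acts, labels, weights))
```

The uncertain pixel contributes log(2) to the loss, while the confident and correct pixel contributes almost nothing.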

A weight map is precomputed for each ground truth segmentation to compensate for the different pixel frequencies of the classes and to force the network to learn the small separation borders between touching cells. The separation borders are computed using morphological operations. The weight map is computed from w_c, a term that balances the class frequencies, and from the distances to the border of the nearest cell and to the border of the second nearest cell.
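In the original paper the weight map takes the form w(x) = w_c(x) + w_0 * exp(-(d_1(x) + d_2(x))^2 / (2 * sigma^2)), with w_0 = 10 and sigma of about 5 pixels. Those two constants come from the paper rather than from the summary above. The sketch below evaluates that formula. The brute-force distance helper and the dictionary layout for cells are assumptions made for illustration, and for background pixels the distance to the nearest pixel of a cell stands in for the distance to its border.

```python
import math


def separation_weight(d1, d2, class_weight=1.0, w0=10.0, sigma=5.0):
    """Weight at one pixel given the distances to the two nearest cells."""
    return class_weight + w0 * math.exp(-((d1 + d2) ** 2) / (2.0 * sigma ** 2))


def two_nearest_cell_distances(pixel, cells):
    """Distances from a background pixel to its two nearest cells.

    cells maps a cell id to the list of (row, col) pixels it covers, and
    must contain at least two cells. Brute force, for tiny examples only.
    """
    py, px = pixel
    dists = sorted(
        min(math.hypot(py - cy, px - cx) for cy, cx in pixels)
        for pixels in cells.values()
    )
    return dists[0], dists[1]


# Two one-pixel "cells" with a narrow gap between them.
cells = {1: [(0, 0)], 2: [(0, 4)]}
gap = separation_weight(*two_nearest_cell_distances((0, 2), cells))
far = separation_weight(*two_nearest_cell_distances((30, 2), cells))
print(gap, far)  # the gap pixel gets a much larger weight
```

Pixels squeezed between two cells have small d_1 + d_2 and so receive a large weight, which is what pushes the network to keep touching cells apart.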

In deep networks, weight initialization is extremely important. Otherwise, parts of the network might give excessive activations while other parts never contribute. The initial weights are set so that each feature map in the network has approximately unit variance. To achieve this, the weights are drawn from a Gaussian distribution with standard deviation sqrt(2/N), where N is the number of incoming nodes of one neuron. For example, for a 3x3 convolution and 64 feature channels in the previous layer, N = 9 * 64 = 576.
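The initialization rule can be written out directly. This is a minimal sketch using only the standard library. The helper names and the fixed random seed are illustrative assumptions.

```python
import math
import random


def he_std(kernel_size, in_channels):
    """Standard deviation sqrt(2/N), where N = kernel area * input channels."""
    n = kernel_size * kernel_size * in_channels
    return math.sqrt(2.0 / n)


def init_kernel(kernel_size, in_channels, rng=None):
    """Draw the weights of one output neuron from N(0, sqrt(2/N)**2)."""
    rng = rng or random.Random(0)   # fixed seed only for reproducibility
    std = he_std(kernel_size, in_channels)
    return [
        [[rng.gauss(0.0, std) for _ in range(kernel_size)]
         for _ in range(kernel_size)]
        for _ in range(in_channels)
    ]


print(he_std(3, 64))            # N = 576, so the std is about 0.0589
kernel = init_kernel(3, 64)
print(len(kernel), len(kernel[0]), len(kernel[0][0]))
```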

Data Augmentation

Data augmentation is essential for teaching the network the desired invariance and robustness when only a few training samples are available.

For microscopy images, the main requirements are shift and rotation invariance as well as robustness to deformations and gray value variations. In particular, random elastic deformations of the training samples are the key concept for training a segmentation network with very few annotated images. Smooth deformations are generated from random displacement vectors on a coarse 3x3 grid. The displacements are sampled from a Gaussian distribution with a standard deviation of 10 pixels, and per-pixel displacements are then computed using bicubic interpolation. Drop-out layers at the end of the contracting path perform further implicit data augmentation.
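The displacement-field construction can be sketched as follows. Note the swap: to keep the example short and dependency-free, this sketch interpolates the coarse grid bilinearly, not bicubically as described above, so the resulting field is less smooth than the paper's. The function names and the fixed seed are assumptions.

```python
import random


def coarse_displacements(grid=3, sigma=10.0, rng=None):
    """Random (dy, dx) vectors on a grid x grid lattice, std = sigma pixels."""
    rng = rng or random.Random(0)   # fixed seed only for reproducibility
    return [[(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
             for _ in range(grid)] for _ in range(grid)]


def dense_displacements(coarse, height, width):
    """Bilinearly interpolate the coarse vectors to every pixel.

    Assumes height >= 2, width >= 2, and a coarse grid of at least 2x2.
    """
    grid = len(coarse)
    field = []
    for y in range(height):
        gy = y * (grid - 1) / (height - 1)   # pixel row in grid coordinates
        y0 = min(int(gy), grid - 2)
        ty = gy - y0
        row = []
        for x in range(width):
            gx = x * (grid - 1) / (width - 1)
            x0 = min(int(gx), grid - 2)
            tx = gx - x0
            vec = []
            for c in range(2):               # c = 0 for dy, c = 1 for dx
                top = (1 - tx) * coarse[y0][x0][c] + tx * coarse[y0][x0 + 1][c]
                bottom = ((1 - tx) * coarse[y0 + 1][x0][c]
                          + tx * coarse[y0 + 1][x0 + 1][c])
                vec.append((1 - ty) * top + ty * bottom)
            row.append(tuple(vec))
        field.append(row)
    return field


coarse = coarse_displacements()
field = dense_displacements(coarse, 5, 5)
print(field[2][2], coarse[1][1])  # the center pixel sits on the center node
```

Applying the deformation then means sampling the image (and its segmentation map, with the same field) at each pixel position shifted by its displacement vector.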


Experiments

The paper applies U-Net to three different segmentation tasks.

  1. Segmentation of neuronal structures in electron microscopy recordings
    • The data set is provided by the EM segmentation challenge that started at ISBI 2012
      • It consists of 30 images (512x512 pixels) with the corresponding annotated segmentation maps for cells (white) and membranes (black)
    • Without any pre- or postprocessing, U-Net achieves a warping error of 0.0003529 (the new best score at the time) and a rand error of 0.0382
  2. Cell segmentation in light microscopy images
    • Part of the ISBI cell tracking challenge 2014 and 2015
    • First data set: PhC-U373
      • Glioblastoma-astrocytoma U373 cells recorded by phase contrast microscopy
      • U-Net achieves an average IOU of 92%, clearly better than the second-best algorithm (83%)
    • Second data set: DIC-HeLa
      • HeLa cells recorded by differential interference contrast (DIC) microscopy
      • U-Net achieves an average IOU of 77.5%, far better than the second-best algorithm (46%)
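The IOU (intersection over union) figures quoted above compare a predicted mask with the ground truth. The sketch below shows the basic metric on binary masks. It does not reproduce the challenge's exact evaluation protocol, and the function name and the convention for two empty masks are assumptions.

```python
def iou(pred, truth):
    """Intersection over union of two binary masks given as nested lists."""
    pairs = [(p, t) for p_row, t_row in zip(pred, truth)
             for p, t in zip(p_row, t_row)]
    intersection = sum(1 for p, t in pairs if p and t)
    union = sum(1 for p, t in pairs if p or t)
    # Two empty masks agree perfectly; treating that as 1.0 is a convention.
    return intersection / union if union else 1.0


pred = [[1, 1],
        [0, 0]]
truth = [[1, 0],
         [0, 0]]
print(iou(pred, truth))  # 0.5: one shared pixel out of two covered pixels
```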

Conclusion

The U-Net architecture achieves very good performance on a variety of biomedical segmentation applications. Thanks to data augmentation with elastic deformations, it can be trained with only a few annotated images, and training takes only about 10 hours on an NVidia Titan GPU (6 GB). The U-Net architecture should be easy to apply to many more tasks.
