Semantic Segmentation of Aerial Images Using U-Net

Speedwell🍀 · November 6, 2022

U-Net

These notes follow along with a DigitalSreeni YouTube tutorial!

1) Preparing the dataset

First, download the Semantic segmentation of aerial imagery dataset from Kaggle!

Information about the dataset is shown below.
The dataset folder structure is shown below.

2) Process

Preprocess: crop every image to a size divisible by 256, then extract 256x256 patches
- because the images come in many different sizes
➡ all patches then share one shape and can be stacked into NumPy arrays

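Convert HEX→RGB, then RGB labels→integer values, then apply one-hot encoding
- The masks are RGB images, while the class information is given as HEX color codes
- One-hot encoding is needed because this is a multi-class problem!
  - So the output layer will use Softmax, and the loss will be categorical cross entropy or something similar!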
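
The conversion chain above can be sketched with plain NumPy. This is only a minimal illustration: the class names and HEX codes in `CLASS_HEX` are placeholders, not the values shipped with the dataset, and all function names are hypothetical. Keras's `to_categorical` could be used in place of the `one_hot` helper.

```python
import numpy as np

# Hypothetical class table: names and HEX codes are placeholders,
# NOT the actual values from the dataset's class file.
CLASS_HEX = {"background": "#000000", "building": "#FF0000", "water": "#0000FF"}


def hex_to_rgb(code):
    """Convert '#RRGGBB' into an (R, G, B) tuple of ints."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


# One row per class; the row index is the integer label.
PALETTE = np.array([hex_to_rgb(c) for c in CLASS_HEX.values()], dtype=np.uint8)


def rgb_to_label(mask):
    """Map an (H, W, 3) RGB mask to an (H, W) array of integer labels.

    Pixels that match no palette color stay at label 0.
    """
    labels = np.zeros(mask.shape[:2], dtype=np.uint8)
    for idx, color in enumerate(PALETTE):
        labels[np.all(mask == color, axis=-1)] = idx
    return labels


def one_hot(labels, n_classes):
    """Turn (H, W) integer labels into an (H, W, n_classes) one-hot array."""
    return np.eye(n_classes, dtype=np.float32)[labels]


def label_to_rgb(labels):
    """Inverse mapping, used later to color predicted label maps."""
    return PALETTE[labels]
```

`label_to_rgb` is the same lookup run backwards, which is what turns a predicted label map (the argmax over the Softmax output) back into the original mask colors.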

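Once training is done, the predicted (segmented) images have to be converted back to the original RGB colors.
The predicted 256x256 tiles are then merged back together while minimizing blending artefacts (smooth blending)
- Cutting the image into tiles slightly larger than 256x256 creates overlap between neighbouring tiles
- Keep these overlaps, predict on them, and blend the predictions in a smooth way
  ➡ no edge effects after merging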
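
A rough sketch of the overlap-and-blend idea, assuming a triangular weighting window and a 64-pixel overlap. This is a simplified stand-in and not the blending routine used in the tutorial; `blend_predict` and `predict_fn` are hypothetical names, and `predict_fn` is assumed to map one `(patch, patch, C)` tile to `(patch, patch, n_classes)` probabilities.

```python
import numpy as np


def blend_predict(image, predict_fn, n_classes, patch=256, overlap=64):
    """Predict overlapping tiles and merge them with a weighted average.

    Assumes the image height and width are both at least `patch`.
    """
    h, w = image.shape[:2]
    step = patch - overlap

    # Triangular weights: largest at the tile centre, small but non-zero at the edges.
    ramp = 1.0 - np.abs(np.linspace(-1.0, 1.0, patch)) + 1e-3
    window = np.outer(ramp, ramp)[..., None]

    # Tile origins; the last one is clamped so the final tile ends on the border.
    ys = sorted(set(list(range(0, h - patch, step)) + [h - patch]))
    xs = sorted(set(list(range(0, w - patch, step)) + [w - patch]))

    probs = np.zeros((h, w, n_classes), dtype=np.float64)
    weight = np.zeros((h, w, 1), dtype=np.float64)
    for y in ys:
        for x in xs:
            tile = image[y:y + patch, x:x + patch]
            # Accumulate weighted predictions and the weights themselves.
            probs[y:y + patch, x:x + patch] += predict_fn(tile) * window
            weight[y:y + patch, x:x + patch] += window

    # Normalise so each pixel is a weighted mean of all tiles that cover it.
    return probs / weight
```

Because each tile contributes most near its centre and least near its borders, the seams between neighbouring tiles are averaged out instead of showing up as hard edges. Taking `argmax` over the last axis of the result gives the merged label map.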

3) Code

3-1) Divide images into smaller patches

Let's use patchify!

📌 For semantic segmentation, crop instead of resizing!!

For example, a 797x644 image is cropped to 768x512, the largest size divisible by 256 that still fits inside the image.
➡ The number of patches is 3 (768/256) * 2 (512/256) = 6.
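
A minimal NumPy sketch of the crop-then-split step for the example above. It is a stand-in for a non-overlapping patchify call (patch size equal to step), not the tutorial's code; `crop_and_patch` is a hypothetical helper, and the 797x644 size is assumed to be width x height.

```python
import numpy as np


def crop_and_patch(image, patch=256):
    """Crop from the top-left to a multiple of `patch`, then split into tiles.

    Returns an array of shape (n_patches, patch, patch, channels),
    ordered row by row.
    """
    h, w, c = image.shape
    rows, cols = h // patch, w // patch
    cropped = image[:rows * patch, :cols * patch]
    # Split height into (rows, patch) and width into (cols, patch),
    # then bring the two grid axes to the front.
    tiles = cropped.reshape(rows, patch, cols, patch, c).swapaxes(1, 2)
    return tiles.reshape(rows * cols, patch, patch, c)


# Assumption: 797 is the width and 644 the height of the example image.
image = np.zeros((644, 797, 3), dtype=np.uint8)
patches = crop_and_patch(image)
print(patches.shape)  # (6, 256, 256, 3)
```

The same function has to be applied to each image and its mask so that every image patch stays aligned with its label patch.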

Counting every image this way gives a total of 1305 patches.

Tile 1: 797 x 644 --> 768 x 512 --> 6
Tile 2: 509 x 544 --> 256 x 512 --> 2
Tile 3: 682 x 658 --> 512 x 512 --> 4
Tile 4: 1099 x 846 --> 1024 x 768 --> 12
Tile 5: 1126 x 1058 --> 1024 x 1024 --> 16
Tile 6: 859 x 838 --> 768 x 768 --> 9
Tile 7: 1817 x 2061 --> 1792 x 2048 --> 56
Tile 8: 2149 x 1479 --> 2048 x 1280 --> 40
Total: 9 images in each folder * 145 patches per image set = 1305
Total: 1305 patches of size 256x256
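
The arithmetic in this list can be double-checked with a few lines of Python. The tile sizes are copied from the list above; `patches_per_image` is just a hypothetical helper name.

```python
PATCH = 256

# Original image size for each of the 8 tiles, as listed above.
tile_sizes = [(797, 644), (509, 544), (682, 658), (1099, 846),
              (1126, 1058), (859, 838), (1817, 2061), (2149, 1479)]


def patches_per_image(width, height, patch=PATCH):
    """Number of whole patch x patch tiles that fit after cropping."""
    return (width // patch) * (height // patch)


per_tile = [patches_per_image(w, h) for w, h in tile_sizes]
patches_per_set = sum(per_tile)   # one image from each of the 8 tiles
total = 9 * patches_per_set       # 9 images in each folder

print(per_tile)                # [6, 2, 4, 12, 16, 9, 56, 40]
print(patches_per_set, total)  # 145 1305
```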
