[230721] PIDNet

FSA · July 21, 2023

Abstract

Two-branch network architecture
- Equivalent to a PI controller (and so inherits its overshoot issue).
  - P
    - current
    - all-pass filter
    - Captures detail (focuses on local information).
    - Preserves detailed information in the high-resolution feature maps.
  - I
    - all previous
    - low-pass (captures what changes slowly)
    - Captures context.
    - Acts like a large averaging filter.
    - Aggregates local and global context information to handle long-range dependencies.
  - D
    - derivative
    - high-pass (captures what changes quickly)
    - Extracts high-frequency features to capture boundaries.
  - PI
    - Tends to focus on the low-frequency part of the input and responds slowly to rapidly changing parts of the input.
  - Directly fusing high-resolution detail with low-frequency context means that
    - object boundaries are likely to be overwhelmed by the surrounding contextual information;
    - small objects may be overwhelmed by adjacent large objects.
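The filter analogy above can be made concrete on a 1-D step signal. This is only a toy sketch of the P / I / D terms as filters, not PIDNet code; the function names and the running-mean normalisation of the integral term are my own assumptions.

```python
def p_term(signal):
    # Proportional: passes the current sample unchanged (all-pass).
    return list(signal)

def i_term(signal):
    # Integral, normalised here to a running mean of all samples so far (low-pass).
    out, total = [], 0.0
    for n, x in enumerate(signal, start=1):
        total += x
        out.append(total / n)
    return out

def d_term(signal):
    # Derivative: first difference, non-zero only where the signal changes (high-pass).
    return [0.0] + [b - a for a, b in zip(signal, signal[1:])]

# A step edge: flat, then a sudden jump (think of an object boundary).
step = [0.0] * 4 + [1.0] * 4
print("P:", p_term(step))
print("I:", i_term(step))
print("D:", d_term(step))
```

The I output rises only gradually after the jump (slow response to fast changes), while the D output fires exactly at the edge, which is the role the boundary branch plays in the notes above.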

Method

Loss

l_0, l_2: semantic loss
l_1: boundary binary cross-entropy loss
l_3: boundary-aware semantic loss
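A rough sketch of how the four terms could be combined, working on flat per-pixel lists. It assumes l_0 comes from an auxiliary semantic head and l_2 from the final output, that l_3 is the semantic cross-entropy restricted to pixels whose predicted boundary probability exceeds a threshold, and that the total is a weighted sum. The `weights` and `threshold` values are placeholders, not values from these notes, and the plain (unweighted) BCE is a simplification.

```python
import math

def mean_cross_entropy(pixel_probs, labels, mask=None):
    """Mean cross-entropy over pixels; `mask` optionally selects which pixels count."""
    losses = [
        -math.log(probs[label])
        for idx, (probs, label) in enumerate(zip(pixel_probs, labels))
        if mask is None or mask[idx]
    ]
    return sum(losses) / len(losses) if losses else 0.0

def boundary_bce(boundary_probs, boundary_labels):
    """Plain (unweighted) binary cross-entropy on the boundary map."""
    losses = [
        -(y * math.log(b) + (1 - y) * math.log(1 - b))
        for b, y in zip(boundary_probs, boundary_labels)
    ]
    return sum(losses) / len(losses)

def total_loss(aux_probs, final_probs, boundary_probs, labels, boundary_labels,
               weights, threshold):
    l0 = mean_cross_entropy(aux_probs, labels)       # auxiliary semantic loss
    l1 = boundary_bce(boundary_probs, boundary_labels)
    l2 = mean_cross_entropy(final_probs, labels)     # main semantic loss
    # l_3: semantic loss counted only where the boundary head is confident.
    near_boundary = [b > threshold for b in boundary_probs]
    l3 = mean_cross_entropy(final_probs, labels, mask=near_boundary)
    return sum(w * l for w, l in zip(weights, (l0, l1, l2, l3)))

# Two pixels, two classes; the first pixel lies on a boundary.
probs = [[0.5, 0.5], [0.25, 0.75]]
loss = total_loss(probs, probs, [0.9, 0.1], [0, 1], [1, 0],
                  weights=(1.0, 1.0, 1.0, 1.0), threshold=0.8)
print(loss)
```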

PAG (Pixel-Attention-Guided) Fusion

Aims to fix the problems of the PI network:
- object boundaries are likely to be overwhelmed by the surrounding contextual information;
- small objects may be overwhelmed by adjacent large objects.
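As I understand the paper, PAG computes a per-pixel gate from the similarity between the P and I feature vectors and uses it to decide how much context to mix into the detail branch. A minimal single-pixel sketch follows; it leaves out the learned projections applied before the dot product, and `pag_fuse` is a hypothetical name.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pag_fuse(v_p, v_i):
    """Fuse one pixel's P-branch vector v_p with its I-branch vector v_i."""
    # Gate from the dot-product similarity of the two feature vectors.
    sigma = sigmoid(sum(p * i for p, i in zip(v_p, v_i)))
    # High similarity -> take more from the context (I) branch;
    # low similarity -> keep more of the detail (P) branch.
    return [sigma * i + (1.0 - sigma) * p for p, i in zip(v_p, v_i)]

print(pag_fuse([1.0, 0.0], [0.0, 1.0]))  # orthogonal features: equal mix
print(pag_fuse([3.0, 0.0], [3.0, 0.0]))  # matching features: gate close to 1
```

Because the gate is computed per pixel, context is injected selectively instead of being added uniformly over the whole feature map.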

BAG (Boundary-Attention-Guided) Fusion

Near boundaries, rely more on the P network; away from boundaries, rely more on the I network.
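The rule above can be sketched as a gate driven by the D (boundary) branch. This toy version treats each pixel as a single scalar feature and omits the convolution that follows the fusion in the real module; `bag_fuse` is a hypothetical name.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bag_fuse(v_p, v_i, v_d):
    """Blend detail (P) and context (I) features using the boundary (D) response."""
    fused = []
    for p, i, d in zip(v_p, v_i, v_d):
        sigma = sigmoid(d)  # close to 1 near a boundary, close to 0 elsewhere
        fused.append(sigma * p + (1.0 - sigma) * i)
    return fused

# Three pixels: strong boundary response, neutral, strong non-boundary response.
print(bag_fuse([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [20.0, 0.0, -20.0]))
```

With P features set to 1 and I features set to 0, the output reads directly as the share taken from the P branch at each pixel.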

PAPPM: Fast Aggregation of Contexts

A parallelizable version of DAPPM, with the channel count reduced from 128 to 96.
- PAPPM is used in PIDNet-M / PIDNet-S
- DAPPM is used in PIDNet-L

Experiment

Datasets

Cityscapes

Images collected from a vehicle
5000 images = 2975 / 500 / 1525 (train / val / test)
Image resolution: 2048 x 1024

CamVid

701 images = 367 / 101 / 233 (train / val / test)
Resolution: 960 x 720
32 categories

PASCAL Context

4998 training / 5105 validation images
59 or 60 categories

Implementation details

Pretraining

Pretrained on ImageNet

Training

Image sizes and preprocessing during training
- data image
  - 1024 x 2048
- data image resize
  - base_size (2048) * (0.5x to 2.1x) = roughly 1024 to 4300 (width)
    - In effect: 1024 x 2048 -> (512 x 1024) to roughly (2150 x 4300)
    - Zero padding is added so that the image is at least 1024 x 1024.
- data image crop
  - Randomly cropped to (1024, 1024).
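The resize, pad, and crop steps above can be traced with shape arithmetic alone. This sketch only computes sizes and crop coordinates (no image data); the 0.1 scale step and all function names are assumptions for illustration.

```python
import random

def sample_scale(rng):
    # One of 0.5, 0.6, ..., 2.1 (the 0.1 step is an assumption of this sketch).
    return 0.5 + rng.randint(0, 16) / 10.0

def scaled_size(height, width, scale):
    # Image size after random rescaling, rounded to the nearest pixel.
    return int(height * scale + 0.5), int(width * scale + 0.5)

def padded_size(height, width, crop=1024):
    # Zero padding guarantees each side is at least `crop` pixels.
    return max(height, crop), max(width, crop)

def random_crop_box(height, width, rng, crop=1024):
    # Pick a random crop x crop window: (top, left, bottom, right).
    top = rng.randint(0, height - crop)
    left = rng.randint(0, width - crop)
    return top, left, top + crop, left + crop

def sample_crop(rng, height=1024, width=2048, crop=1024):
    scale = sample_scale(rng)
    h, w = padded_size(*scaled_size(height, width, scale), crop)
    return scale, (h, w), random_crop_box(h, w, rng, crop)

rng = random.Random(0)
scale, size, box = sample_crop(rng)
print(scale, size, box)
```

At the smallest scale the image shrinks to 512 x 1024, so the height is padded back up to 1024 before cropping; at larger scales no padding is needed and the crop covers only part of the image.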

Conclusion

One drawback is that training takes rather a long time.
