Pose Detection

eric9687 · February 21, 2022

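Pose detection = keypoint detection = pose estimation
Similar to classification, but differs in that it upsamples
What the model learns
Person location (bbox)
Per-joint coordinates
Model taxonomy
- Single-person
  --- direct regression
  --- heatmap-based estimation
- Multi-person
  --- Top-down: person location -> joint locations
  --- Bottom-up: joint locations -> grouping into persons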

Papers

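DeepPose (paper): trains a model per joint -> training takes a long time and accuracy is lower
Input: RGB image
Output: joint coordinates (x, y), normalized relative to the person bbox
Extract the person (bbox), then resize the crop to a fixed image size
(stage 1)
AlexNet: extracts a representation vector
Produces 2k predicted values (x, y for each of k joints)
Loss: MSE
Gives a coarse location for each joint
(stage 2): re-estimates each coarse location, one joint at a time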
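Evaluation: PDJ (Percentage of Detected Joints) = joints predicted inside the circle around the ground truth / total number of joints. The radius of the circle is a fraction of the torso diameter.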
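As a rough illustration of the PDJ metric, here is a minimal sketch for a single person. The function name `pdj`, the `fraction` parameter, and the sample coordinates are all made up for this example; the only thing taken from the metric itself is that a joint counts as detected when the prediction falls inside a circle around the ground truth whose radius is a fraction of the torso diameter.

```python
import math

def pdj(pred, gt, torso_diameter, fraction=0.2):
    """Percentage of Detected Joints for one person.

    pred, gt: lists of (x, y) joint coordinates in the same order.
    A joint counts as detected when its prediction lies within
    fraction * torso_diameter of the ground truth.
    """
    radius = fraction * torso_diameter
    detected = sum(
        1
        for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) <= radius
    )
    return detected / len(gt)

# Hypothetical joints: the errors are 1, 3, 8 and 1 pixels.
gt = [(10.0, 10.0), (20.0, 30.0), (40.0, 50.0), (60.0, 20.0)]
pred = [(11.0, 10.0), (20.0, 33.0), (48.0, 50.0), (60.0, 21.0)]

# radius = 0.2 * 20 = 4, so 3 of the 4 joints are inside the circle
print(pdj(pred, gt, torso_diameter=20.0))  # 0.75
```

A smaller `fraction` makes the metric stricter, so PDJ is usually read as a curve over several fractions rather than a single number.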

Mask R-CNN (Top-down)
Module 1: Feature extractor (backbone) - ResNeXt
Module 2: RPN, proposes candidate regions where objects may be
Module 3: Bounding box regression and classification
Module 4: Binary class estimation
Module 5: Human pose estimation (using masks)
mask: joint = 1, background = 0
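To make the "joint = 1, background = 0" mask concrete, here is a small sketch that builds one such target for a single joint. The function name, the box format `(x0, y0, x1, y1)`, and the mask size `m` are assumptions for illustration, not the actual Mask R-CNN code.

```python
def keypoint_to_one_hot_mask(x, y, box, m):
    """Build an m x m mask with a single 1 at the joint location.

    x, y: joint position in image coordinates.
    box: (x0, y0, x1, y1) person box; the mask covers this box.
    """
    x0, y0, x1, y1 = box
    # Map the joint into mask coordinates and clamp it to the grid.
    col = min(m - 1, max(0, int((x - x0) / (x1 - x0) * m)))
    row = min(m - 1, max(0, int((y - y0) / (y1 - y0) * m)))
    mask = [[0] * m for _ in range(m)]
    mask[row][col] = 1
    return mask

# Hypothetical person box and joint position.
mask = keypoint_to_one_hot_mask(150, 100, box=(100, 50, 200, 250), m=8)
row, col = next(
    (r, c) for r, line in enumerate(mask) for c, v in enumerate(line) if v == 1
)
print(row, col)  # 2 4
```

One mask like this is built per joint, so a person with k joints gets k masks, each with exactly one foreground cell.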
Microsoft Common Objects in Context (MS COCO)
Dataset: x, y, v (visibility flag). v = 2: clearly visible in the image, v = 1: occluded, v = 0: not present in the image (not labeled)
Metric: Object Keypoint Similarity (OKS) based mean Average Precision
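The OKS score can be sketched as follows, using the usual definition OKS = sum_i exp(-d_i^2 / (2 s^2 k_i^2)) [v_i > 0] / sum_i [v_i > 0], where d_i is the distance between prediction and ground truth, s^2 is the object area, and k_i is a per-keypoint constant. The function names and the sample values for `k` and `area` are made up for this example; the real COCO constants differ per joint.

```python
import math

def parse_coco_keypoints(flat):
    """Split a flat [x1, y1, v1, x2, y2, v2, ...] list into (x, y, v) triples."""
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]

def oks(pred, gt, area, k):
    """Object Keypoint Similarity between one prediction and one ground truth.

    pred: list of (x, y); gt: list of (x, y, v) with visibility flags.
    area: object area (s^2 in the formula).
    k: per-keypoint constants controlling the falloff (illustrative values).
    """
    total, labeled = 0.0, 0
    for (px, py), (gx, gy, v), ki in zip(pred, gt, k):
        if v == 0:  # keypoint not labeled: excluded from the score
            continue
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        total += math.exp(-d2 / (2.0 * area * ki ** 2))
        labeled += 1
    return total / labeled if labeled else 0.0

gt = parse_coco_keypoints([10, 10, 2, 20, 20, 1, 0, 0, 0])
k = [0.1, 0.1, 0.1]  # hypothetical constants

perfect = [(10, 10), (20, 20), (5, 5)]
print(oks(perfect, gt, area=400.0, k=k))  # 1.0, the third joint has v = 0 and is ignored

shifted = [(12, 10), (20, 20), (5, 5)]
print(oks(shifted, gt, area=400.0, k=k))  # below 1.0 because the first joint is off by 2 px
```

OKS plays the role that IoU plays in object detection: a prediction counts as correct when its OKS exceeds a threshold, and mean Average Precision is computed over those thresholds.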

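The heatmap approach performs better than regression

regression: DeepPose; a CNN regressor directly estimates the keypoint coordinates (x, y)
heatmap: each location is scored with a relative frequency or probability; the target is a Gaussian-distributed heatmap; more than 2x the performance of regression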
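A minimal sketch of the heatmap idea: the training target for one joint is a 2D Gaussian centered on the joint, and at inference the joint location is read off as the highest-scoring cell. The function names, the grid size, and `sigma` are assumptions chosen for illustration.

```python
import math

def gaussian_heatmap(height, width, cx, cy, sigma=1.5):
    """Target heatmap for one joint: a Gaussian bump centered at (cx, cy)."""
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]

def decode_heatmap(heatmap):
    """Return the (x, y) of the highest-scoring cell."""
    best = max(
        ((value, x, y) for y, row in enumerate(heatmap) for x, value in enumerate(row)),
        key=lambda t: t[0],
    )
    return best[1], best[2]

hm = gaussian_heatmap(16, 12, cx=7, cy=4)
print(decode_heatmap(hm))  # (7, 4)
```

Compared with regressing (x, y) directly, the network here predicts a whole map per joint, which is why heatmap models need the upsampling stage to get back to a usable resolution.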

Trade-off: global information vs high resolution

Usually, enlarging the receptive field to capture more global information lowers the resolution, so more information is lost when upsampling
HRNet (High-Resolution Network)
Typical architecture
Compression (high to low): strided convolution, pooling -> recovery: upsampling, transposed convolution
(spatial information is lost)
Cause: the architecture is serial
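The spatial loss in the serial high-to-low-to-high design can be seen on a tiny 1D example. This is only a toy sketch: average pooling and nearest-neighbor upsampling stand in for the learned strided and transposed convolutions, and the function names are made up.

```python
def avg_pool_1d(signal, stride=2):
    """Downsample by averaging non-overlapping windows (length must divide evenly)."""
    return [sum(signal[i:i + stride]) / stride for i in range(0, len(signal), stride)]

def upsample_nearest_1d(signal, factor=2):
    """Upsample by repeating each value `factor` times."""
    return [v for v in signal for _ in range(factor)]

row = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # sharp peak at index 2

low = avg_pool_1d(row)
print(low)  # [0.0, 0.5, 0.0, 0.0]

restored = upsample_nearest_1d(low)
print(restored)  # [0.0, 0.0, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
```

After the round trip the peak is spread over indices 2 and 3, so the exact position can no longer be recovered from the low-resolution signal alone.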
Parallelization: parallel subnetworks keep the multi-scale resolutions as they are and learn spatial information at various scales
- fusion
  => fusion of multi-scale resolutions (4 resolutions)
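The fusion step can be sketched roughly like this: every branch receives the sum of all branches after each one is resampled to that branch's resolution. This is a simplified 1D toy under stated assumptions, not HRNet's implementation: fixed average pooling and nearest-neighbor repetition replace the learned convolutions, branch lengths are powers of two, and the names `resample` and `fuse` are hypothetical.

```python
def resample(signal, target_len):
    """Bring a 1D feature row to target_len by pooling or repetition."""
    n = len(signal)
    if n == target_len:
        return list(signal)
    if n > target_len:  # downsample by average pooling
        stride = n // target_len
        return [sum(signal[i:i + stride]) / stride for i in range(0, n, stride)]
    factor = target_len // n  # upsample by nearest-neighbor repetition
    return [v for v in signal for _ in range(factor)]

def fuse(branches):
    """Each output branch is the sum of all branches resampled to its resolution."""
    fused = []
    for target in branches:
        size = len(target)
        resampled = [resample(b, size) for b in branches]
        fused.append([sum(vals) for vals in zip(*resampled)])
    return fused

# Four parallel resolutions, from highest to lowest.
branches = [
    [1.0] * 8,
    [2.0] * 4,
    [3.0] * 2,
    [4.0] * 1,
]
out = fuse(branches)
print([len(b) for b in out])  # [8, 4, 2, 1]
print(out[0])  # every entry is 1 + 2 + 3 + 4 = 10.0
```

The point of the sketch is that no resolution is ever discarded: each branch keeps its own size and only exchanges information with the others.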
