[2021 Computers in Biology and Medicine] The impact of pre- and post-image processing techniques on deep learning frameworks: A comprehensive review for digital pathology image analysis

yellofi · March 18, 2022

Paper Review

0. Motivation

This paper gives a broad survey of the pre-processing and post-processing methods used alongside deep learning in pathology image analysis; I reviewed it in order to study them.

The impact of pre- and post-image processing techniques on deep learning frameworks: A comprehensive review for digital pathology image analysis, Computers in Biology and Medicine, Volume 128, 2021

1. Overview

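Deep learning has recently become the main methodology for medical image analysis. Typical deep-learning image analysis tasks include classification (e.g., healthy vs. cancerous tissue), detection (e.g., lymphocytes and mitosis counting), and segmentation (e.g., nuclei and glands segmentation), and most recent machine learning methods in digital pathology include pre- and post-processing steps combined with the DNN. The aim of this paper is to give an overview of the pre- and post-processing methods applied together with deep learning frameworks in digital pathology image analysis.

Analysis of tissue slides, which underlies cancer diagnosis and grading, is performed by expert pathologists. It is becoming more complex as cancer incidence and the number of patient-specific treatment options grow, and a complete diagnosis requires careful analysis of a large number of slides.

Pathologists have to extract the quantitative parameters commonly used for grading, such as cell counts, area, length, and the percentage of a specific cell type within a slide, and this leads to very high inter- and intra-observer variability. Quantitative assessment, such as the percentage of a specific cell type within a slide, is done only occasionally; in practice most assessment is qualitative and relies on the pathologist's expertise. (Inter-observer variability: grading differs between pathologists. Intra-observer variability: grading by the same pathologist differs depending on when the slide is read.)

In particular, the paper reviews the image processing methods that have developed alongside classification, object detection, and segmentation.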

2. Pre-processing

Tissue segmentation & artifact detection
Stain normalization
Patch selection

2.1. Tissue segmentation

Single-stage thresholding

Global thresholding (on HSV color space)
Hysteresis thresholding on grayscale image
Otsu thresholding
Thresholding on the optical density of the RGB channels
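
As a rough illustration of the single-stage approach, here is a minimal NumPy sketch of Otsu thresholding used to separate tissue from the bright slide background. The function names are my own, and the assumption that tissue is darker than the background is mine as well; this is not the implementation used in any of the reviewed studies.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold of a uint8 grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(prob)             # weight of the class with intensity <= t
    mu = np.cumsum(prob * levels)    # cumulative first moment up to t
    mu_total = mu[-1]
    w1 = 1.0 - w0
    valid = (w0 > 0) & (w1 > 0)      # skip thresholds that leave one class empty
    between = np.zeros(256)
    between[valid] = (mu_total * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    # The threshold maximizing the between-class variance
    return int(np.argmax(between))

def tissue_mask(gray):
    """Assumption: tissue pixels are darker than the slide background."""
    return gray <= otsu_threshold(gray)
```

The multi-stage variants listed next would wrap a step like this with smoothing beforehand and morphological clean-up afterwards.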

Multi-stage thresholding

Gaussian filtering and Otsu thresholding followed by morphological operators
RGB high-pass filter followed by Otsu thresholding and morphological operators

DL-based thresholding

Semantic segmentation using U-Net

2.2. Artifact detection

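(b) If this step is not done properly, water droplets may be observed
(d) Tissue fold: the folded region is visibly more saturated
(e) pH, solution concentration, and staining time can all have an effect; this sample shows overly deep staining caused by an excessively long staining time
(f) Dust, air bubbles, microbial contamination, and similar artifacts can arise at this step
(g) Color varies with the scanner and scanning method, and blurring can occur during this process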

Tissue fold detection using HSI color space and k-means clustering
Tissue fold segmentation
- through adaptive shifting of the RGB values
- using connectivity-based thresholding and color properties
Classification of sharp and blurry images through local histogram features
Detection of out-of-focus regions using AdaBoost classifier

2.3. Stain normalization

(1) Global color normalization

Color transfer using LAB
Histogram specification

=> If the cellular structures shown in the target and source images differ greatly, this image-histogram-based approach fails completely.
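
To make the global color-transfer idea concrete, here is a minimal sketch that matches the per-channel mean and standard deviation of a source image to those of a target image. It assumes both images have already been converted to the same color space (e.g., LAB) by some other tool; the function name and the `eps` parameter are hypothetical.

```python
import numpy as np

def match_channel_stats(source, target, eps=1e-8):
    """Shift and scale each channel of `source` to the statistics of `target`.

    source, target: float arrays of shape (H, W, C) in the same color space.
    """
    src_mean = source.mean(axis=(0, 1))
    src_std = source.std(axis=(0, 1))
    tgt_mean = target.mean(axis=(0, 1))
    tgt_std = target.std(axis=(0, 1))
    # Standardize the source, then rescale to the target statistics
    return (source - src_mean) / (src_std + eps) * tgt_std + tgt_mean
```

Because only global statistics are matched, a source and target with very different tissue composition get pulled toward each other regardless of content, which is the failure mode noted above.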

(2) Color normalization after stain separation

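In RGB space the relationship between each stain's concentration and the observed brightness is nonlinear, so the image is converted to OD (optical density) space before the stains are separated.

$V$ is the intensity in OD space, $I$ is the transmitted light intensity, and $I_0$ is the incident light intensity.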

$V=\log_{10}{\frac{I_{0}}{I}}=W\cdot H$

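Using the stain color appearance matrix $W$, the image can be decomposed into its individual stain components, with $H$ holding the corresponding stain concentrations at each pixel.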
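
A minimal sketch of this decomposition, assuming $W$ is already known: convert RGB to OD, then solve $V = W \cdot H$ for $H$ by least squares. The layout (one stain vector per column of $W$, one pixel per column of $H$) and the function names are my own choices. The H&E vectors below are approximate values often quoted in the color deconvolution literature and are illustrative only; they are not taken from the reviewed paper.

```python
import numpy as np

# Illustrative, approximate OD vectors: column 0 = hematoxylin, column 1 = eosin
HE_STAIN_MATRIX = np.array([
    [0.65, 0.07],
    [0.70, 0.99],
    [0.29, 0.11],
])

def rgb_to_od(rgb, i0=255.0):
    """V = log10(I0 / I), with I clipped to avoid log of zero."""
    rgb = np.maximum(rgb.astype(np.float64), 1.0)
    return np.log10(i0 / rgb)

def separate_stains(rgb, W, i0=255.0):
    """rgb: (N, 3) pixels. W: (3, k) stain matrix. Returns H with shape (k, N)."""
    V = rgb_to_od(rgb, i0).T                    # (3, N)
    H, *_ = np.linalg.lstsq(W, V, rcond=None)   # least-squares solution of W @ H = V
    return H
```

Normalization would then rescale $H$ and recombine it with a reference $W$; the unsupervised methods mentioned next estimate $W$ from the image instead of fixing it in advance.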

Various techniques for estimating and extracting $W$ have been developed. More recently, unsupervised techniques such as spectral matching, Non-negative Matrix Factorization (NMF), and Independent Component Analysis (ICA) have been applied to normalize tissue images by estimating both $W$ and $H$.

(3) Color transfer using deep networks

Generative learning and style transfer

Feature extraction and RGB color shifting using a deep neural network
Sparse AutoEncoders to standardize the color distribution of the image
Stain-style transfer learning through GANs
Cycle-GAN normalization

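In particular, stain normalization has had a positive effect on deep learning frameworks, improving the accuracy of CAD systems for prostate and breast cancer detection, colon gland segmentation and classification, nuclei segmentation, and mitosis detection.

It is also important as a strategy for helping model performance generalize across varying conditions such as institutions and scanners.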

2.4. Patch (Tile) selection

In particular, for deep learning to learn the discriminative information in a WSI effectively, the WSI-level result has to be derived from patch-level predictions.

Grid sampling

K-means-based sampling

K-means algorithm to extract relevant patches
Patch extraction using HSV color space and k-means to detect glandular areas

Image processing-based sampling

Color deconvolution to detect cell nuclei and perform patch extraction
Nucleus-guided patch extraction using color deconvolution and Gaussian filter

DNN-based sampling

CNN (AlexNet) to identify all the nuclear regions
Soft-attention network to select the relevant patches

Selecting patches suited to the targeted cellular structure excludes unnecessary patches, which reduces computation time while also improving model performance.
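
The simplest version of this idea is grid sampling combined with a tissue mask, so that mostly empty patches are dropped. The sketch below assumes a binary tissue mask is already available; the function name and the `min_tissue` parameter are hypothetical.

```python
import numpy as np

def grid_patches(mask, patch_size, stride, min_tissue=0.5):
    """Return (row, col) top-left corners of grid patches with enough tissue.

    mask: 2D boolean array, True where tissue is present.
    """
    coords = []
    height, width = mask.shape
    for r in range(0, height - patch_size + 1, stride):
        for c in range(0, width - patch_size + 1, stride):
            # Fraction of tissue pixels inside this patch
            tissue_fraction = mask[r:r + patch_size, c:c + patch_size].mean()
            if tissue_fraction >= min_tissue:
                coords.append((r, c))
    return coords
```

The k-means, image-processing, and DNN-based strategies listed above replace the plain tissue-fraction test with a more targeted relevance criterion.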

3. Post-processing

3.1. Classification

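Patch-level predictions are collected and combined through 1) majority voting or 2) a fusion algorithm to obtain the WSI-level prediction.

Rather than discarding the rest of the prediction information through voting, fusing the predictions can give a more reasonable prediction and improve performance.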
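
A small sketch of the difference, using mean-probability fusion as a stand-in for "a fusion algorithm". That choice is my assumption for illustration; the reviewed studies use a variety of fusion rules. The function names are hypothetical.

```python
import numpy as np

def majority_vote(patch_probs):
    """patch_probs: (n_patches, n_classes). Each patch votes for its argmax class."""
    votes = np.argmax(patch_probs, axis=1)
    counts = np.bincount(votes, minlength=patch_probs.shape[1])
    return int(np.argmax(counts))

def mean_fusion(patch_probs):
    """Average the class probabilities over patches, then take the argmax."""
    return int(np.argmax(patch_probs.mean(axis=0)))
```

With two weakly confident patches for class 0 and one strongly confident patch for class 1, voting returns class 0 because the confidence is thrown away, while mean fusion returns class 1.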

For a summary of the post-processing applied in the various studies, see Table 2 of the paper.

3.2. Detection

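Deep learning object detectors such as R-CNN, Faster R-CNN, and YOLO have been developed. The F1-score seems to be the main evaluation metric (?)

Either 1) an algorithm that finds the centers in the heatmap and maps them to detections is applied, or 2) an algorithm that selects suitable regions (bounding boxes) from the region proposals is applied, most notably non-maxima suppression (NMS).

In 1), the point with the highest probability can be taken as the center, or thresholding can be used.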
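
For approach 1), a minimal sketch that combines both options: keep pixels that are above a probability threshold and are also the maximum of their 3x3 neighborhood. The neighborhood size, the threshold value, and the function name are my assumptions.

```python
import numpy as np

def heatmap_peaks(heatmap, threshold=0.5):
    """Return (row, col) positions of thresholded local maxima in a 2D heatmap."""
    heatmap = np.asarray(heatmap, dtype=np.float64)
    height, width = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    neighborhood_max = np.full((height, width), -np.inf)
    # Maximum over the 3x3 window centered on each pixel
    for dr in range(3):
        for dc in range(3):
            window = padded[dr:dr + height, dc:dc + width]
            neighborhood_max = np.maximum(neighborhood_max, window)
    peaks = (heatmap >= neighborhood_max) & (heatmap > threshold)
    return [(int(r), int(c)) for r, c in zip(*np.nonzero(peaks))]
```

Note that a flat plateau above the threshold yields several adjacent peaks here; a real pipeline would usually merge those, for example by clustering.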

lymphocyte detection

The NMS algorithm is used most often, and detection performance improves by 8% compared with applying no post-processing.
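
For reference, a minimal greedy NMS sketch over bounding boxes in plain Python. The box format `(x1, y1, x2, y2)` and the default IoU threshold are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept, in order of decreasing score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

For example, two heavily overlapping boxes and one distant box reduce to the higher-scoring box of the overlapping pair plus the distant one.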

mitosis detection

As with lymphocyte detection, the clustering strategies applied to the CNN heatmap (NMS, majority voting, local maxima) are used most often, improving accuracy by up to 17% compared with a single CNN.

For a summary of the post-processing applied in the various studies, see Table 3 of the paper.

3.3. Segmentation

The goal is to find the exact boundary of the target, which makes it more challenging than detection; the F1-score and Dice score are the main evaluation metrics.
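
As a quick reference, the Dice score between a predicted and a ground-truth binary mask is $2|A \cap B| / (|A| + |B|)$. A minimal sketch follows; returning 1.0 when both masks are empty is my own convention.

```python
import numpy as np

def dice_score(pred, truth):
    """Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total
```

Computed on pixels, this is the same quantity as the pixel-wise F1-score.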

Region-based segmentation: finds regions, then predicts the labeled pixels of the ROI based on those regions
FCN-based segmentation: a mapping from pixels to pixels; no region proposals needed

Mask R-CNN and U-Net are the representative DL architectures for medical image segmentation

Mask R-CNN: an evolution of Faster R-CNN; three branches, multi-task learning
U-Net: similar in form to an FCN; to prevent information loss, the downsampled features are concatenated at the corresponding upsampling stage

1) nuclei segmentation

When a biopsy is analyzed for cancer diagnosis, the assessment is based on the morphology and spatial arrangement of the nuclei.

Separating overlapping nuclei is the main issue, and it is handled either with techniques such as watershed or with a three-class pipeline.

two-class pipeline (binary segmentation: nuclei/background)
three-class pipeline (nuclei/background/nuclear boundary)

The main post-processing method is to use the boundary mask labels to separate adjacent nuclei, which improved performance by up to 11%.
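
A minimal sketch of that boundary-based separation: remove the predicted boundary pixels from the nuclei mask, then label the remaining connected components so that each nucleus gets its own id. The 4-connectivity, the BFS labeling, and the function names are my assumptions; real pipelines typically also grow the labels back over the removed boundary pixels.

```python
from collections import deque

import numpy as np

def label_components(mask):
    """4-connected component labeling. Returns (labels, number_of_components)."""
    height, width = mask.shape
    labels = np.zeros((height, width), dtype=np.int32)
    current = 0
    for r in range(height):
        for c in range(width):
            if mask[r, c] and labels[r, c] == 0:
                current += 1
                labels[r, c] = current
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = current
                            queue.append((ny, nx))
    return labels, current

def split_touching_nuclei(nuclei_mask, boundary_mask):
    """Drop boundary pixels from the nuclei mask, then label what remains."""
    interior = np.logical_and(nuclei_mask, np.logical_not(boundary_mask))
    return label_components(interior)
```

Two touching nuclei that form a single blob in the binary mask come out as two labels once the boundary pixels between them are removed.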

2) Tubules and glands segmentation

Tubule/gland morphology is used to assess the malignancy of cancer in epithelial tissues such as prostate, breast, and colon.

Morphological operators (e.g., connected components) applied to the three-class pipeline are the post-processing methods most often used in tubule and gland segmentation, improving performance by up to 20%.

For a summary of the post-processing applied in the various studies, see Table 4 of the paper.

4. Discussion

Pre- and post-processing have become important enough in pathology image analysis that recent studies have packaged them as tools, so that they can be used robustly together with deep networks to raise performance.

Among the pre-processing steps for digital tissue images, stain normalization is the most common algorithm, and patch selection can be considered the next most important.

For classification, aggregation is recommended over voting, and for detection tasks NMS is used most often. For segmentation tasks, there is the two-class pipeline, which performs binary segmentation of background and target, but recently the three-class pipeline, which also segments the target's boundary, is being applied more actively.

Proper integration of deep learning with pre- and post-processing is expected to deliver reliable and robust pathology image analysis.

5. Conclusion

The paper gives a broad overview of pre- and post-processing in pathology image analysis, covering studies up to 2020, so it is a good reference.

yellofi

ML engineer, Pathology Image Analysis
