Bi-Directional Cascade Network for Perceptual Edge Detection, Part 2

이준석 · March 7, 2023

This work is related to edge detection, multi-scale representation learning, and network cascade structures. We briefly review each of these three lines of work in turn.

Edge Detection: Most edge detection methods fall into three groups: traditional edge operators, learning-based methods, and recent deep learning methods. Traditional edge operators [22, 4, 45, 34] detect edges by finding sudden changes in intensity, color, texture, etc.
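
To make the idea of "sudden changes in intensity" concrete, here is a minimal sketch in plain Python. It uses Sobel-style kernels as a familiar example of a gradient operator; this choice, the function names, and the threshold value are assumptions for illustration and are not taken from the cited works.

```python
import math

# 3x3 Sobel kernels: horizontal and vertical intensity derivatives.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradient_magnitude(image):
    """Return the Sobel gradient magnitude for the interior pixels of a 2-D list.

    Border pixels are left at 0.0 because the 3x3 window does not fit there.
    """
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = gy = 0.0
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * pixel
                    gy += SOBEL_Y[dr][dc] * pixel
            result[r][c] = math.hypot(gx, gy)
    return result


def detect_edges(image, threshold):
    """Mark pixels whose gradient magnitude exceeds a hand-picked threshold."""
    magnitude = gradient_magnitude(image)
    return [[1 if value > threshold else 0 for value in row] for row in magnitude]


# Toy image: dark left half, bright right half, so the edge is a vertical line.
toy_image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
edge_map = detect_edges(toy_image, threshold=20.0)
```

On this toy image, only the two interior columns that straddle the intensity step are marked as edge pixels. A fixed threshold like this is exactly the kind of hand-tuned decision that learning-based methods try to replace.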

Learning-based methods spot edges by combining supervised models with hand-crafted features. For example, Dollár et al. [10] propose structured edges, which jointly learn the clustering of ground-truth edges and the mapping from image patches to clustered tokens. Deep learning based methods use CNNs to extract multi-level hierarchical features. Bertasius et al. [2] employ a CNN to generate features of candidate contour points. Xie et al. [49] propose an end-to-end detection model that leverages the outputs of different intermediate layers through skip connections.

Liu et al. [30] further learn richer deep representations by concatenating features derived from all convolutional layers. Xu et al. [51] introduce a hierarchical deep model to extract multi-scale features and a gated conditional random field to fuse them.

Multi-Scale Representation Learning: The extraction and fusion of multi-scale features are fundamental and critical for many vision tasks, e.g., [19, 52, 6]. Multi-scale representations can be constructed from multiple re-scaled images [12, 38, 11], i.e., an image pyramid, either by computing features independently at each scale [12] or by using the output from one scale as the input to the next scale [38, 11].
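
The image pyramid idea can be sketched in a few lines of plain Python. The example below builds a pyramid by 2x2 average pooling and then computes a feature independently at each scale, which corresponds to the first of the two strategies mentioned above. The downsampling scheme and the `horizontal_contrast` feature are hypothetical stand-ins chosen for brevity, not what the cited works use.

```python
def downsample(image):
    """Halve a 2-D list image by averaging each 2x2 block (odd borders are dropped)."""
    height, width = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(width)
        ]
        for r in range(height)
    ]


def build_pyramid(image, levels):
    """Return [full resolution, 1/2, 1/4, ...] with `levels` entries."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def horizontal_contrast(image):
    """Hypothetical per-scale feature: mean absolute difference of horizontal neighbours."""
    diffs = [abs(row[c + 1] - row[c]) for row in image for c in range(len(row) - 1)]
    return sum(diffs) / len(diffs)


# 8x8 toy image with a vertical step edge between columns 3 and 4.
toy_image = [[0.0] * 4 + [8.0] * 4 for _ in range(8)]
pyramid = build_pyramid(toy_image, levels=3)

# Compute the feature independently at each scale of the pyramid.
features = [horizontal_contrast(level) for level in pyramid]
```

The second strategy would instead pass the result computed at one level into the computation at the next level. Either way, every level requires its own pass over an image, which is the cost that motivates learning multi-scale features inside a single network.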

Recently, DeepLab [5] and PSPNet [55] use dilated convolutions and pooling to achieve multi-scale feature learning in image segmentation. Chen et al. [6] propose an attention mechanism that softly weights the multi-scale features at each pixel location.
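
As a rough illustration of softly weighting multi-scale features at each pixel, the sketch below normalizes per-scale scores with a softmax and uses the result to blend the per-scale responses. The softmax, the function names, and the hand-written scores are assumptions for illustration; in a real model the scores would be predicted by a learned branch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(feature_maps, score_maps):
    """Fuse same-sized per-scale maps with per-pixel softmax weights.

    feature_maps[s][r][c] is the response of scale s at pixel (r, c).
    score_maps[s][r][c] is an unnormalised attention score for that response
    (hand-written here, learned in a real model).
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            weights = softmax([scores[r][c] for scores in score_maps])
            fused[r][c] = sum(
                w * features[r][c] for w, features in zip(weights, feature_maps)
            )
    return fused


# Two scales on a 1x2 image. Equal scores give a plain average at pixel 0;
# a much larger score for the second scale makes it dominate at pixel 1.
feature_maps = [[[1.0, 1.0]], [[3.0, 3.0]]]
score_maps = [[[0.0, 0.0]], [[0.0, 50.0]]]
fused = fuse_scales(feature_maps, score_maps)
```

Because the weights at each pixel sum to one, the fused value always lies between the smallest and largest per-scale response, and different pixels are free to favor different scales.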

Like other image patterns, edges vary dramatically in scale. Ren et al. [39] show that considering multi-scale cues improves edge detection performance. Multi-scale cues are also used in many other approaches [48, 39, 24, 50, 30, 34, 51]. Most of these approaches explore the scale-space of edges, e.g., by applying Gaussian smoothing at multiple scales [48] or by extracting features from differently scaled images [1]. Recent deep learning based methods employ image pyramids and hierarchical features. For example, Liu et al. [30] forward multiple re-scaled images through a CNN independently and then average the results. Our approach follows a similar intuition. However, we build a Scale Enhancement Module (SEM) to learn multi-scale representations efficiently, which avoids repeated computation on multiple input images.

Network Cascade: Network cascade [21, 37, 25, 46, 26] is an effective scheme for many vision applications such as classification [37], detection [25], pose estimation [46], and semantic segmentation [26]. For example, Murthy et al. [37] treat easy and hard samples with different networks to improve classification accuracy. Yuan et al. [54] ensemble a set of models of different complexity to process samples of different difficulty. Li et al. [26] propose to classify easy regions in a shallow network and train deeper networks to deal with hard regions. Lin et al. [29] propose a top-down architecture with lateral connections to propagate deep semantic features to shallow layers.
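
The common pattern behind several of these cascades, handling easy inputs with a cheap model and passing hard ones to a more expensive model, can be sketched as a confidence-gated loop. Everything in the example below (the stage functions, the confidence rule, and the threshold) is a hypothetical stand-in and does not reproduce any of the cited methods.

```python
def cascade_predict(sample, stages, confidence_threshold):
    """Run stages from cheapest to most expensive and stop at the first confident one.

    Each stage is a callable returning (label, confidence in [0, 1]).
    Returns (label, index of the stage that produced it).
    """
    label = None
    for index, stage in enumerate(stages):
        label, confidence = stage(sample)
        if confidence >= confidence_threshold:
            return label, index
    # No stage was confident enough: fall back to the last stage's answer.
    return label, len(stages) - 1


# Hypothetical stand-ins for a shallow and a deep network on scalar inputs.
def shallow_model(x):
    # Confident only when the input is far from the decision boundary at 0.
    return (1 if x > 0 else 0), min(1.0, abs(x))


def deep_model(x):
    # Pretend the deeper model is always confident.
    return (1 if x > 0 else 0), 1.0


stages = [shallow_model, deep_model]
easy = cascade_predict(2.0, stages, confidence_threshold=0.9)   # resolved by stage 0
hard = cascade_predict(-0.1, stages, confidence_threshold=0.9)  # handed on to stage 1
```

Here the input far from the boundary is resolved by the first stage, while the input near the boundary is handed on to the second, so the expensive model only runs when it is needed.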

Unlike previous network cascades, BDCN is a bidirectional pseudo-cascade structure, which makes it possible to supervise each layer individually for layer-specific edge detection. To the best of our knowledge, this is an early and original attempt to adopt a cascade architecture in edge detection.
