CNN

재구몬 · July 31, 2021

Convolutional Neural Networks

Zero padding

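Padding is added to keep the input from shrinking during convolution. This matters a great deal when building deep networks, and it is mainly used to build a same convolution, whose output keeps the same height and width as its input.
In addition, without padding a convolution has trouble extracting features from the edges of an image, because border pixels are covered by fewer filter positions than interior pixels. Adding padding lets the filter use the border pixels more evenly. When the input has shape (m, n_H, n_W, n_C), padding can be added with np.pad.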

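import numpy as np

def zero_pad(X, pad):
    # Pad only the height and width axes; the batch and channel axes stay unchanged.
    X_pad = np.pad(X, ((0,0), (pad,pad), (pad,pad), (0,0)), mode='constant', constant_values=(0,0))
    
    return X_pad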
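
As a quick sanity check, the sketch below pads a small random batch and confirms that only the height and width grow by 2 * pad. The input shape (4, 3, 3, 2) and pad = 2 are arbitrary values chosen for illustration, and zero_pad is restated so the snippet runs on its own.

```python
import numpy as np

def zero_pad(X, pad):
    # Pad only the height and width axes; batch and channel axes are untouched.
    return np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)),
                  mode='constant', constant_values=0)

np.random.seed(1)
x = np.random.randn(4, 3, 3, 2)   # (m, n_H, n_W, n_C), arbitrary example sizes
x_pad = zero_pad(x, 2)

print(x.shape)      # (4, 3, 3, 2)
print(x_pad.shape)  # (4, 7, 7, 2)
```

The original values sit untouched in the center of x_pad, surrounded by a border of zeros.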

Convolution

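For the dimensions of each element involved in a convolution, see the previous post. First, here is the function for a single convolution step, assuming the part of the input to be convolved with the filter has already been extracted as a_slice_prev.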

def conv_single_step(a_slice_prev, W, b):
    """
    Arguments:
    a_slice_prev -- slice of input data of shape (f, f, n_C_prev)
    W -- Weight parameters contained in a window - matrix of shape (f, f, n_C_prev)
    b -- Bias parameters contained in a window - matrix of shape (1, 1, 1)
    """
    # Element-wise product between a_slice_prev and W. Do not add the bias yet.
    s = W * a_slice_prev
    # Sum over all entries of the volume s.
    Z = np.sum(s)
    # Add bias b to Z. Squeeze b to a 0-d value and cast it to float so that Z is a scalar.
    Z = Z + float(np.squeeze(b))

    return Z
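
A tiny worked example may make the single step concrete. The values below are made up for illustration: a window of ones, weights of 0.5, and a bias of 1.0. The function is restated in compact form so the snippet is self-contained.

```python
import numpy as np

def conv_single_step(a_slice_prev, W, b):
    # Element-wise product, sum over the whole volume, then add the scalar bias.
    s = W * a_slice_prev
    return np.sum(s) + float(np.squeeze(b))

a_slice_prev = np.ones((2, 2, 3))      # (f, f, n_C_prev) with f = 2, n_C_prev = 3
W = np.full((2, 2, 3), 0.5)            # one filter of the same shape
b = np.array([[[1.0]]])                # bias of shape (1, 1, 1)

Z = conv_single_step(a_slice_prev, W, b)
print(Z)  # 12 entries * 0.5 + 1.0 = 7.0
```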

Using the function above, let's build the forward-propagation function for a convolution layer. The final step stores a cache for backpropagation.

def conv_forward(A_prev, W, b, hparameters): 
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    (f, f, n_C_prev, n_C) = W.shape
    
    stride = hparameters["stride"]
    pad = hparameters["pad"]
    
    n_H = int((n_H_prev - f + 2 * pad) / stride) + 1
    n_W = int((n_W_prev - f + 2 * pad) / stride) + 1
    
    Z = np.zeros(shape = (m, n_H, n_W, n_C))
    
    A_prev_pad = zero_pad(A_prev, pad)

    for i in range(m):
        # Select the i-th padded training example
        a_prev_pad = A_prev_pad[i]
        for h in range(n_H):
            # Vertical bounds of the current window
            vert_start = h * stride
            vert_end = vert_start + f
            
            for w in range(n_W):
                # Horizontal bounds of the current window
                horiz_start = w * stride
                horiz_end = horiz_start + f
                
                for c in range(n_C):
                    a_slice_prev = a_prev_pad[vert_start : vert_end, horiz_start : horiz_end, :]
                    
                    # Convolve the window with the c-th filter and its bias
                    weights = W[:, :, :, c]
                    biases = b[:, :, :, c]
                    Z[i, h, w, c] = conv_single_step(a_slice_prev, weights, biases)
                    
    # Save information in "cache" for the backprop
    cache = (A_prev, W, b, hparameters)
    
    return Z, cache
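
The output size used in conv_forward follows the formula floor((n_prev - f + 2 * pad) / stride) + 1. The helper below, conv_output_size, is a hypothetical name introduced only to try the formula on a few example settings.

```python
def conv_output_size(n_prev, f, pad, stride):
    # Floor division matches int((n_prev - f + 2 * pad) / stride) + 1
    # whenever the numerator is non-negative.
    return (n_prev - f + 2 * pad) // stride + 1

print(conv_output_size(5, 3, 0, 1))  # 3: without padding the output shrinks
print(conv_output_size(5, 3, 1, 1))  # 5: pad = (f - 1) / 2 with stride 1 gives a same convolution
print(conv_output_size(7, 3, 1, 2))  # 4: a larger stride shrinks the output further
```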

Pooling Layer

Besides acting as a feature detector, pooling shrinks the height and width of its input, which reduces the amount of computation. Pooling usually uses no padding, so the output is determined by the stride and the filter size alone.

def pool_forward(A_prev, hparameters, mode = "max"):
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    
    f = hparameters["f"]
    stride = hparameters["stride"]
    
    n_H = int(1 + (n_H_prev - f) / stride)
    n_W = int(1 + (n_W_prev - f) / stride)
    n_C = n_C_prev
    
    A = np.zeros((m, n_H, n_W, n_C))              
    
    for i in range(m): 
    
        for h in range(n_H):
            vert_start = h * stride
            vert_end = vert_start + f
            
            for w in range(n_W):  
                horiz_start = w * stride
                horiz_end = horiz_start + f
                
                for c in range (n_C):       
                    a_prev_slice = A_prev[i, vert_start:vert_end, horiz_start:horiz_end, c]
                    
                    if mode == "max":
                        A[i, h, w, c] = np.max(a_prev_slice)
                    elif mode == "average":
                        A[i, h, w, c] = np.average(a_prev_slice)
                        
    cache = (A_prev, hparameters)
    
    assert(A.shape == (m, n_H, n_W, n_C))
    
    return A, cache
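
To see the two modes side by side, the sketch below pools a single 4x4 channel with f = 2 and stride = 2. pool2d is a hypothetical single-channel helper that mirrors the inner loops of pool_forward; it is not part of the code above.

```python
import numpy as np

def pool2d(a, f, stride, mode="max"):
    # Hypothetical single-channel helper mirroring the loops of pool_forward.
    n_H = (a.shape[0] - f) // stride + 1
    n_W = (a.shape[1] - f) // stride + 1
    out = np.zeros((n_H, n_W))
    for h in range(n_H):
        for w in range(n_W):
            window = a[h * stride:h * stride + f, w * stride:w * stride + f]
            out[h, w] = window.max() if mode == "max" else window.mean()
    return out

a = np.arange(16, dtype=float).reshape(4, 4)   # rows 0..3, 4..7, 8..11, 12..15
max_out = pool2d(a, f=2, stride=2, mode="max")
avg_out = pool2d(a, f=2, stride=2, mode="average")

print(max_out)  # window maxima: 5, 7, 13, 15
print(avg_out)  # window means: 2.5, 4.5, 10.5, 12.5
```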

Backpropagation in CNN

Convolution Layer

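In most cases, computing the forward pass and the loss function is enough, since libraries and frameworks conveniently compute the backward pass for you. From an efficiency standpoint there is therefore no need to worry about backpropagation, but because convolutional networks have a distinctive structure, let's look at how backpropagation works in them.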

Computing dA

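$A$ is the result of the convolution and pooling steps, so to measure its change we propagate $dZ$, the gradient with respect to the convolution output, backwards through the convolution. The gradient of the activation, $dA$, is computed with the filter $W_c$ as follows.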

$$dA \mathrel{+}= \sum_{h=0}^{n_H}\sum_{w=0}^{n_W} W_c \times dZ_{hw}$$

Computing dW

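As above, computing the gradient of the filter also requires pushing $dZ$ back through the convolution, but this time it is multiplied by the corresponding slice of the fixed input activation.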

$$dW_c \mathrel{+}= \sum_{h=0}^{n_H}\sum_{w=0}^{n_W} a_{slice} \times dZ_{hw}$$

Here $a_{slice}$ denotes the slice of the activation extracted for the backward convolution, that is, the window that was used to compute $Z_{hw}$.

Computing db

The bias is added after the convolution, so its derivative has no factor multiplying $dZ$; the gradient is simply $dZ$ accumulated over all positions.

$$db \mathrel{+}= \sum_{h=0}^{n_H}\sum_{w=0}^{n_W} dZ_{hw}$$

conv_backward

def conv_backward(dZ, cache):
    (A_prev, W, b, hparameters) = cache
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    (f, f, n_C_prev, n_C) = W.shape
    
    stride = hparameters["stride"]
    pad = hparameters["pad"]
    
    (m, n_H, n_W, n_C) = dZ.shape
    
    # Initialize dA_prev, dW, db with the correct shapes
    dA_prev = np.zeros(shape = A_prev.shape)                          
    dW = np.zeros(shape = W.shape)
    db = np.zeros(shape = b.shape)
    
    # Pad A_prev and dA_prev
    A_prev_pad = zero_pad(A_prev, pad)
    dA_prev_pad = zero_pad(dA_prev, pad)
    
    for i in range(m):                       
        # select ith training example from A_prev_pad and dA_prev_pad
        a_prev_pad = A_prev_pad[i]
        da_prev_pad = dA_prev_pad[i]
        
        for h in range(n_H):
            for w in range(n_W):
                for c in range(n_C):
                    # Find the corners of the current "slice"
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f

                    a_slice = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]

                    da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:,:,:,c] * dZ[i, h, w, c]
                    dW[:,:,:,c] += a_slice * dZ[i, h, w, c]
                    db[:,:,:,c] += dZ[i, h, w, c]
                    
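        # Set the ith training example's dA_prev to the unpadded da_prev_pad.
        # Explicit end indices are used so that pad == 0 also works (pad:-pad would give an empty slice).
        dA_prev[i, :, :, :] = da_prev_pad[pad:pad + n_H_prev, pad:pad + n_W_prev, :]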
    
    # Making sure your output shape is correct
    assert(dA_prev.shape == (m, n_H_prev, n_W_prev, n_C_prev))
    
    return dA_prev, dW, db
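
One way to gain confidence in the dW formula is a finite-difference check. The sketch below is self-contained and makes several assumptions that are not in the original code: the forward pass is restated compactly with stride and pad as plain arguments, conv_dW is a hypothetical helper that implements only the dW accumulation, and the scalar loss is taken to be sum(Z * dZ) so that dZ plays the role of the upstream gradient.

```python
import numpy as np

def conv_forward(A_prev, W, b, stride, pad):
    # Compact restatement of the forward pass, used for this check only.
    m, n_H_prev, n_W_prev, _ = A_prev.shape
    f, _, _, n_C = W.shape
    n_H = (n_H_prev - f + 2 * pad) // stride + 1
    n_W = (n_W_prev - f + 2 * pad) // stride + 1
    A_pad = np.pad(A_prev, ((0, 0), (pad, pad), (pad, pad), (0, 0)))
    Z = np.zeros((m, n_H, n_W, n_C))
    for h in range(n_H):
        for w in range(n_W):
            vs, hs = h * stride, w * stride
            window = A_pad[:, vs:vs + f, hs:hs + f, :]   # (m, f, f, n_C_prev)
            # Contract the (f, f, n_C_prev) axes against every filter at once.
            Z[:, h, w, :] = np.tensordot(window, W, axes=3) + b.reshape(-1)
    return Z

def conv_dW(A_prev, dZ, f, stride, pad):
    # dW_c += a_slice * dZ[i, h, w, c], summed over examples and positions.
    A_pad = np.pad(A_prev, ((0, 0), (pad, pad), (pad, pad), (0, 0)))
    m, n_H, n_W, n_C = dZ.shape
    dW = np.zeros((f, f, A_prev.shape[3], n_C))
    for i in range(m):
        for h in range(n_H):
            for w in range(n_W):
                vs, hs = h * stride, w * stride
                a_slice = A_pad[i, vs:vs + f, hs:hs + f, :]
                for c in range(n_C):
                    dW[:, :, :, c] += a_slice * dZ[i, h, w, c]
    return dW

rng = np.random.default_rng(0)
A_prev = rng.standard_normal((2, 5, 5, 3))
W = rng.standard_normal((3, 3, 3, 4))
b = rng.standard_normal((1, 1, 1, 4))
stride, pad = 2, 1

Z = conv_forward(A_prev, W, b, stride, pad)
dZ = rng.standard_normal(Z.shape)        # assumed upstream gradient
dW = conv_dW(A_prev, dZ, 3, stride, pad)

# Central difference on one arbitrarily chosen weight entry.
idx, eps = (1, 2, 0, 3), 1e-5
W_plus, W_minus = W.copy(), W.copy()
W_plus[idx] += eps
W_minus[idx] -= eps
loss_plus = np.sum(conv_forward(A_prev, W_plus, b, stride, pad) * dZ)
loss_minus = np.sum(conv_forward(A_prev, W_minus, b, stride, pad) * dZ)
numeric = (loss_plus - loss_minus) / (2 * eps)

print(abs(numeric - dW[idx]))  # should be close to zero
```

Because this loss is linear in W, the central difference should agree with the analytic value up to floating-point rounding. The same pattern can be applied to dA_prev and db.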

Pooling Layer

The pooling layer has no parameters to learn, but the values passed back to $dA$ depend on which type of pooling was used.

Max Pooling

For max pooling, we need a way to record which element of each pooling window was selected as the result.

def create_mask_from_window(x):
    mask = x == np.max(x)

    return mask

The mask for a window is needed because, when backpropagating through a max-pooling layer, only the element that was selected by pooling affects the loss. create_mask_from_window() distinguishes the element that contributed from those that did not.
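
A small example, with made-up window values, shows how the mask routes a gradient. The function is restated so the snippet runs on its own; the upstream gradient of 5.0 is an arbitrary illustrative number.

```python
import numpy as np

def create_mask_from_window(x):
    # True where x equals its maximum, False elsewhere.
    return x == np.max(x)

x = np.array([[1.0, 3.0],
              [4.0, 2.0]])
mask = create_mask_from_window(x)
routed = 5.0 * mask            # the gradient reaches only the position of the maximum

print(mask)    # True only at the position of 4.0
print(routed)  # 5.0 at that position, 0.0 elsewhere

# Note: if the maximum is tied, every tied entry is marked.
tied = np.array([[4.0, 4.0],
                 [0.0, 1.0]])
n_marked = int(create_mask_from_window(tied).sum())
print(n_marked)  # 2
```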

Average Pooling

In average pooling, unlike max pooling, every element of the window affects the loss. The incoming gradient $dZ$ therefore has to be distributed evenly over all elements of the window.

def distribute_value(dz, shape):
    (n_H, n_W) = shape
    average = dz / (n_H * n_W)
    a = np.full(shape, average)
    return a
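
For instance, with an assumed incoming gradient of 2.0 and a 2x2 window, each element receives 0.5 and the total is preserved. The function is restated in compact form so the snippet is self-contained.

```python
import numpy as np

def distribute_value(dz, shape):
    # Spread dz evenly over a window of the given (n_H, n_W) shape.
    n_H, n_W = shape
    return np.full(shape, dz / (n_H * n_W))

a = distribute_value(2.0, (2, 2))
print(a)        # every entry is 0.5
print(a.sum())  # 2.0, the incoming gradient is preserved
```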

pool_backward

def pool_backward(dA, cache, mode = "max"):
    (A_prev, hparameters) = cache
    stride = hparameters["stride"]
    f = hparameters["f"]
    
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    (m, n_H, n_W, n_C) = dA.shape
    
    dA_prev = np.zeros(shape = A_prev.shape)
    
    for i in range(m): # loop over the training examples
        a_prev = A_prev[i]
        
        for h in range(n_H):
            for w in range(n_W):
                for c in range(n_C):
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    
                    # Compute the backward propagation in both modes.
                    if mode == "max":
                        a_prev_slice = a_prev[vert_start: vert_end, horiz_start: horiz_end, c]
                        mask = create_mask_from_window(a_prev_slice)

                        dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += dA[i, h, w, c] * mask
                        
                    elif mode == "average":
                        da = dA[i, h, w, c]
                        shape = (f, f)

                        dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += distribute_value(da, shape)

    assert(dA_prev.shape == A_prev.shape)
    
    return dA_prev