Computing CNN Input/Output Shapes and Parameter Counts

GRoovAllstar · November 3, 2022

How to compute each layer's output feature map from its input
- Input height: H
- Input width: W
- Filter height: FH
- Filter width: FW
- Stride: S
- Padding: P

Output size formulas (when the division leaves a remainder, round down):

$\text{Output Height} = \frac{(H+2P)-FH}{S}+1$

$\text{Output Width} = \frac{(W+2P)-FW}{S}+1$
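The output-size formula above can be written as a small helper. This is a minimal sketch, not code from the original post: the function names `conv_output_size` and `conv_output_shape` are made up for illustration, and the integer (floor) division is an assumption that matches how frameworks such as Keras treat a stride that does not divide evenly.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length along one axis: floor((size + 2*padding - kernel) / stride) + 1."""
    if size + 2 * padding < kernel:
        raise ValueError("kernel is larger than the padded input")
    # Floor division is an assumption for strides that do not divide evenly.
    return (size + 2 * padding - kernel) // stride + 1


def conv_output_shape(h, w, fh, fw, stride=(1, 1), padding=(0, 0)):
    """Return (output height, output width) for a 2D convolution."""
    return (
        conv_output_size(h, fh, stride[0], padding[0]),
        conv_output_size(w, fw, stride[1], padding[1]),
    )


# H = W = 7, FH = FW = 3, S = 1, P = 0
print(conv_output_shape(7, 7, 3, 3))  # (5, 5)
```

The number of output channels is not part of this calculation; it simply equals the number of filters.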

Input Shape: (7, 7, 3), Output Shape: (5, 5, 4), K: (3, 3), P: (0, 0), S: (1, 1), Filter: 4

$\text{Output Height} = \frac{(7+2 \times 0)-3}{1}+1 = 5$

$\text{Output Width} = \frac{(7+2 \times 0)-3}{1}+1 = 5$

(Source: https://miro.medium.com/max/1100/1*ubRrYAZJUlCcqg7WoKjLgQ.gif)

When the kernel width and height differ
Input Shape: (7, 9, 3), Output Shape: (3, 8, 2), K: (5, 2), P: (0, 0), S: (1, 1), Filter: 2

$\text{Output Height} = \frac{(7+2 \times 0)-5}{1}+1 = 3$

$\text{Output Width} = \frac{(9+2 \times 0)-2}{1}+1 = 8$

(Source: https://miro.medium.com/max/1100/1*EnIGiVTcIMQm9ujkOHPc5A.gif)

When the stride differs between the two axes
Input Shape: (9, 9, 3), Output Shape: (7, 3, 2), K: (3, 3), P: (0, 0), S: (1, 3), Filter: 2

$\text{Output Height} = \frac{(9+2 \times 0)-3}{1}+1 = 7$

$\text{Output Width} = \frac{(9+2 \times 0)-3}{3}+1 = 3$

(Source: https://miro.medium.com/max/1100/1*o9-Rq3QUC8IzTMfAJIbhLA.gif)

Input Shape: (7, 7, 2), Output Shape: (7, 7, 1), K: (3, 3), P: (1, 1), S: (1, 1), Filter: 1

$\text{Output Height} = \frac{(7+2 \times 1)-3}{1}+1 = 7$

$\text{Output Width} = \frac{(7+2 \times 1)-3}{1}+1 = 7$

(Source: https://miro.medium.com/max/1100/1*EnIGiVTcIMQm9ujkOHPc5A.gif)
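
One way to see where the formula comes from is to list every position at which the kernel fits inside the (padded) input and count them. The sketch below does that for the four worked examples above. It is illustrative only: `window_starts` is a hypothetical helper, and the stride example is taken to use a 3×3 kernel.

```python
def window_starts(size, kernel, stride=1, padding=0):
    """List every start index (in padded coordinates) where the kernel still fits."""
    padded = size + 2 * padding
    return list(range(0, padded - kernel + 1, stride))


# Each entry: (H, W), (FH, FW), (stride_h, stride_w), (pad_h, pad_w)
examples = [
    ((7, 7), (3, 3), (1, 1), (0, 0)),  # basic case
    ((7, 9), (5, 2), (1, 1), (0, 0)),  # kernel height and width differ
    ((9, 9), (3, 3), (1, 3), (0, 0)),  # strides differ (3x3 kernel assumed)
    ((7, 7), (3, 3), (1, 1), (1, 1)),  # padding of 1 keeps the size
]

shapes = []
for (h, w), (fh, fw), (sh, sw), (ph, pw) in examples:
    rows = window_starts(h, fh, sh, ph)
    cols = window_starts(w, fw, sw, pw)
    # One output value per (row start, column start) pair.
    shapes.append((len(rows), len(cols)))

print(shapes)  # [(5, 5), (3, 8), (7, 3), (7, 7)]
```

Counting the start positions gives the same numbers as the closed-form expression, including the rounding down when the stride does not divide the remaining length evenly.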

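Input/output and parameter calculation for CNN layers
- Parameter formula: $\text{Input Channels} \times \text{Kernel Width} \times \text{Kernel Height} \times \text{Output Channels} + \text{Bias (Filters)}$
- Output Channels equals the Filters value; there is one bias term per filter.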
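
The parameter formula above translates directly into code. A minimal sketch, with `conv2d_params` as a hypothetical helper name and a `use_bias` switch added only for illustration:

```python
def conv2d_params(in_channels, kernel_h, kernel_w, filters, use_bias=True):
    """Weights (in_channels * kernel_h * kernel_w * filters) plus one bias per filter."""
    weights = in_channels * kernel_h * kernel_w * filters
    bias = filters if use_bias else 0
    return weights + bias


# 3 input channels, 3x3 kernel, 64 filters
print(conv2d_params(3, 3, 3, 64))  # 1792
```

Note that the count does not depend on the spatial size of the input, only on the channel counts and the kernel size.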
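Pooling layer calculation (assuming the stride equals the pooling size, which is the Keras default)

$\text{Output Row Size} = \frac{\text{Input Row Size}}{\text{Pooling Size}}$

$\text{Output Column Size} = \frac{\text{Input Column Size}}{\text{Pooling Size}}$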
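
The pooling formula above is the special case of the convolution formula where the stride equals the pooling size and no padding is used. The sketch below makes that explicit; `pool_output_size` is a hypothetical helper, and the default of `stride=None` falling back to the pooling size mirrors the default behavior of Keras `MaxPooling2D`.

```python
def pool_output_size(size, pool_size, stride=None):
    """Output length along one axis for pooling without padding."""
    if stride is None:
        stride = pool_size  # same default as Keras MaxPooling2D(strides=None)
    return (size - pool_size) // stride + 1


print(pool_output_size(32, 2))  # 16
print(pool_output_size(7, 2))   # 3 (the last row/column is dropped)
```

When the input size is a multiple of the pooling size, this reduces to plain division; otherwise the leftover rows or columns are discarded.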
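BatchNormalization parameter calculation
- The current implementation keeps four parameters per channel: gamma, beta, the moving mean, and the moving variance (Keras stores the variance rather than the standard deviation).
- https://github.com/keras-team/keras/issues/1523
  - gamma: the scaling parameter; beta: the shift parameter.
  - The moving mean and moving variance are non-trainable params.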
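
As a quick sketch of the count described above, the helper below splits the four per-channel values into trainable and non-trainable parts. `batchnorm_params` is a hypothetical name, and the split assumes the layer's default settings (both scale and center enabled).

```python
def batchnorm_params(channels):
    """Return (trainable, non_trainable) parameter counts for one BatchNormalization layer."""
    trainable = 2 * channels      # gamma and beta
    non_trainable = 2 * channels  # moving mean and moving variance
    return trainable, non_trainable


trainable, non_trainable = batchnorm_params(64)
print(trainable + non_trainable)  # 256
```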
Model

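# Assumes the Keras bundled with TensorFlow 2.x; the original post does not show its imports.
from tensorflow.keras.layers import (
    Activation, BatchNormalization, Conv2D, Dense, Dropout,
    GlobalAveragePooling2D, Input, MaxPooling2D,
)
from tensorflow.keras.models import Model

IMAGE_SIZE = 32
input_tensor = Input(shape=(IMAGE_SIZE, IMAGE_SIZE, 3), name='input')

# Block 1: 32x32x3 -> conv 32x32x64 -> pool 16x16x64
x = Conv2D(filters=64, kernel_size=(3, 3), padding='same', name='conv2d_1')(input_tensor)
x = BatchNormalization(name='bn_1')(x)
x = Activation('relu', name='activation_1')(x)
x = MaxPooling2D(pool_size=2)(x)

# Block 2: 16x16x64 -> conv 16x16x128 -> pool 8x8x128
x = Conv2D(filters=128, kernel_size=3, padding='same', name='conv2d_2')(x)
x = BatchNormalization(name='bn_2')(x)
x = Activation('relu', name='activation_2')(x)
x = MaxPooling2D(pool_size=2)(x)

# Block 3: 8x8x128 -> stride-2 conv 4x4x256 -> pool 2x2x256
x = Conv2D(filters=256, kernel_size=3, strides=2, padding='same', name='conv2d_3')(x)
x = BatchNormalization(name='bn_3')(x)
x = Activation('relu', name='activation_3')(x)
x = MaxPooling2D(pool_size=2)(x)

# Block 4: 2x2x256 -> stride-2 conv 1x1x512 (no pooling)
x = Conv2D(filters=512, kernel_size=3, strides=2, padding='same', name='conv2d_4')(x)
x = BatchNormalization(name='bn_4')(x)
x = Activation('relu', name='activation_4')(x)

# Classifier head: average each channel to a 512-vector, then two dense layers
x = GlobalAveragePooling2D()(x)
x = Dropout(rate=0.5)(x)
x = Dense(50, activation='relu', name='fc')(x)
x = Dropout(rate=0.5)(x)
output = Dense(10, activation='softmax', name='output')(x)

model = Model(inputs=input_tensor, outputs=output)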

Calculation results

| Layer | Input Shape | Output Shape | Parameters |
|---|---|---|---|
| Convolution Layer 1 | 32, 32, 3 | (32+2×1-3)/1+1 = 32, (32+2×1-3)/1+1 = 32, 64 | 3×3×3×64+64 = 1792 |
| Batch Normalization Layer 1 | 32, 32, 64 | 32, 32, 64 | 64×4 = 256 |
| MaxPooling Layer 1 | 32, 32, 64 | 32/2 = 16, 32/2 = 16, 64 | 0 |
| Convolution Layer 2 | 16, 16, 64 | (16+2×1-3)/1+1 = 16, (16+2×1-3)/1+1 = 16, 128 | 64×3×3×128+128 = 73856 |
| Batch Normalization Layer 2 | 16, 16, 128 | 16, 16, 128 | 128×4 = 512 |
| MaxPooling Layer 2 | 16, 16, 128 | 16/2 = 8, 16/2 = 8, 128 | 0 |
| Convolution Layer 3 | 8, 8, 128 | ⌊(8+2×1-3)/2⌋+1 = 4, ⌊(8+2×1-3)/2⌋+1 = 4, 256 | 128×3×3×256+256 = 295168 |
| Batch Normalization Layer 3 | 4, 4, 256 | 4, 4, 256 | 256×4 = 1024 |
| MaxPooling Layer 3 | 4, 4, 256 | 4/2 = 2, 4/2 = 2, 256 | 0 |
| Convolution Layer 4 | 2, 2, 256 | ⌊(2+2×1-3)/2⌋+1 = 1, ⌊(2+2×1-3)/2⌋+1 = 1, 512 | 256×3×3×512+512 = 1180160 |
| Batch Normalization Layer 4 | 1, 1, 512 | 1, 1, 512 | 512×4 = 2048 |
| Dropout Layer 1 | None, 512 | None, 512 | 0 |
| Fully Connected Layer 1 | None, 512 | None, 50 | 512×50+50 = 25650 |
| Dropout Layer 2 | None, 50 | None, 50 | 0 |
| Output | None, 50 | None, 10 | 50×10+10 = 510 |
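
The table can also be reproduced without building the model, by walking the layers in order and applying the formulas from this post. The sketch below is one way to do that, under stated assumptions: the `LAYERS` list and the `summarize` helper are hypothetical, every convolution uses a 3×3 kernel with padding 1 (standing in for `padding='same'`), and every pooling layer halves the height and width. Activation and Dropout layers are left out because they change neither the shape nor the parameter count.

```python
KERNEL = 3  # every Conv2D in the model uses a 3x3 kernel
PAD = 1     # padding of 1 stands in for padding='same' with a 3x3 kernel

# Layers in the order the model applies them (Activation/Dropout omitted).
LAYERS = [
    ("conv", {"filters": 64, "stride": 1}), ("bn", {}), ("pool", {}),
    ("conv", {"filters": 128, "stride": 1}), ("bn", {}), ("pool", {}),
    ("conv", {"filters": 256, "stride": 2}), ("bn", {}), ("pool", {}),
    ("conv", {"filters": 512, "stride": 2}), ("bn", {}),
    ("gap", {}),
    ("dense", {"units": 50}),
    ("dense", {"units": 10}),
]


def conv_out(size, stride):
    """floor((size + 2*PAD - KERNEL) / stride) + 1"""
    return (size + 2 * PAD - KERNEL) // stride + 1


def summarize(layers, input_shape=(32, 32, 3)):
    """Return a list of (kind, output_shape, params), one entry per layer."""
    shape = input_shape
    rows = []
    for kind, cfg in layers:
        if kind == "conv":
            h, w, c = shape
            f, s = cfg["filters"], cfg["stride"]
            params = c * KERNEL * KERNEL * f + f
            shape = (conv_out(h, s), conv_out(w, s), f)
        elif kind == "bn":
            params = 4 * shape[-1]  # gamma, beta, moving mean, moving variance
        elif kind == "pool":
            h, w, c = shape
            params = 0
            shape = (h // 2, w // 2, c)  # pool_size=2, stride 2
        elif kind == "gap":
            params = 0
            shape = (shape[-1],)  # one average per channel
        elif kind == "dense":
            units = cfg["units"]
            params = shape[-1] * units + units
            shape = (units,)
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        rows.append((kind, shape, params))
    return rows


rows = summarize(LAYERS)
for kind, shape, params in rows:
    print(f"{kind:<6}{str(shape):<16}{params:>10,}")
total = sum(params for _, _, params in rows)
print("total", total)  # total 1580976
```

If Keras is available, the per-layer numbers and the total can be compared against the output of `model.summary()` for the model defined earlier.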

ref. https://towardsdatascience.com/conv2d-to-finally-understand-what-happens-in-the-forward-pass-1bbaafb0b148
ref. http://taewan.kim/post/cnn/
