[DEEP LEARNING with Python] Chapter 02. The Mathematical Building Blocks of Neural Networks - 2

ByungJik_Oh · April 21, 2025

AI, Deep Learning, machine learning, Deep Learning with the Creator of Keras


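📖 Chapter 02. The Mathematical Building Blocks of Neural Networks

🗝️ Key Points

Building a first neural network example

The concepts of tensors and tensor operations

How neural networks learn via backpropagation and gradient descent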

📑 The Engine of Neural Networks: Gradient-Based Optimization

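At the start, the weights W and the bias b are filled with random values, so relu(dot(W, input) + b) cannot be expected to produce useful representations yet. Through training, the weights are then adjusted gradually based on a feedback signal. This happens inside the following training loop:

Draw a batch of training samples x and corresponding targets y_true.
Run the model on x to obtain predictions y_pred.
Compute the model's loss on the batch, a measure of the mismatch between y_pred and y_true.
Update all weights of the model in a way that slightly reduces the loss on this batch.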

So how do we know in which direction to update the weights so that the loss decreases?

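The derivative of a tensor operation: the gradient

Gradient: the derivative of a tensor operation (a tensor function), i.e. the generalization of the derivative to functions that take tensors as inputs

loss_value = f(W)
1. If the current value of W is W0, the derivative of f at the point W0 is the tensor grad(loss_value, W0), which has the same shape as W.
2. Each element grad(loss_value, W0)[i, j] of this tensor indicates the direction and magnitude of the change in loss_value when W0[i, j] is modified.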
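To make point 2 concrete, here is a small NumPy sketch. It assumes a toy loss f(W) = sum(W ** 2), whose gradient is 2 * W (the function and the values of W0 are illustrative, not from the book). It nudges a single element W0[i, j] and checks that the loss changes by roughly grad[i, j] times the size of the nudge.

```python
import numpy as np

# Toy loss (assumption, for illustration only): f(W) = sum(W ** 2)
def f(W):
    return np.sum(W ** 2)

# Its analytic gradient has the same shape as W: 2 * W
def grad_f(W):
    return 2 * W

W0 = np.array([[1.0, -2.0],
               [0.5, 3.0]])
g = grad_f(W0)

# Nudge one element W0[i, j] by a tiny eps and measure the change in the loss
eps = 1e-6
i, j = 1, 0
W_nudged = W0.copy()
W_nudged[i, j] += eps
change = f(W_nudged) - f(W0)

# change / eps should be close to g[i, j] (here 2 * 0.5 = 1.0)
print(change / eps, g[i, j])
```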

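Using this property of the gradient, we can reduce f(x) little by little by moving x a small step in the direction opposite to the derivative. For W this means W1 = W0 - step * grad(f(W0), W0), where step is a small scaling factor: moving against the gradient moves W "downhill" on the loss surface.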
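A minimal sketch of this idea in plain Python, assuming a toy function f(x) = (x - 3) ** 2 with its minimum at x = 3 and a hand-picked step of 0.1 (both are illustrative choices): repeatedly moving x against the derivative lowers f(x) at every step.

```python
# Toy function (assumption): minimum at x = 3
def f(x):
    return (x - 3.0) ** 2

# Derivative of f
def df(x):
    return 2.0 * (x - 3.0)

x = 0.0
step = 0.1                  # small scaling factor (assumed value)
history = [f(x)]
for _ in range(50):
    x = x - step * df(x)    # move opposite to the derivative
    history.append(f(x))

print(round(x, 4))          # ends up very close to the minimum at x = 3
```

Each update multiplies the distance to the minimum by (1 - 2 * step), so with a step larger than 1.0 the updates would overshoot and diverge instead; this is why the step is kept small.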

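Stochastic gradient descent

In principle, the minimum of the loss function can be found analytically: a function's minimum lies at a point where its derivative is 0, so one could solve grad(f(W), W) = 0 for W. In a real neural network, however, the number of parameters is at least in the thousands and often far larger, so solving this equation analytically is intractable.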
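For a small enough model, the analytic route does work. A sketch, assuming a two-parameter linear model y = w * x + b fitted with mean squared error on four made-up points: setting the gradient to zero gives the normal equations (X^T X) p = X^T y, which NumPy solves in a single call. A network with many nonlinear layers generally has no such closed-form solution.

```python
import numpy as np

# Made-up data lying exactly on y = 2 * x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Design matrix: one column for w, a column of ones for b
X = np.stack([x, np.ones_like(x)], axis=1)

# Gradient of the MSE is (2 / n) * X^T (X p - y); setting it to 0 gives X^T X p = X^T y
params = np.linalg.solve(X.T @ X, X.T @ y)
w, b = params

# Check: the gradient at the solution is (numerically) zero
gradient = 2 / len(x) * X.T @ (X @ params - y)
print(w, b, gradient)
```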

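Instead, we can use mini-batch stochastic gradient descent (SGD), where "stochastic" means that each batch of data is drawn at random, and adjust the parameters little by little based on the current loss value:

Draw a batch of training samples x and corresponding targets y_true.
Run the model on x to obtain predictions y_pred (the forward pass).
Compute the model's loss on the batch, a measure of the mismatch between y_pred and y_true.
Compute the gradient of the loss with respect to the model's parameters (the backward pass).
Move the parameters a little in the direction opposite to the gradient, scaled by a user-chosen learning_rate (a scalar), e.g. W -= learning_rate * gradient.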
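These five steps can be sketched end to end in NumPy for a linear model y_pred = x @ W + b. Everything here is an assumption chosen for illustration: the synthetic data generated from y = 3x - 2, the mean squared error loss, learning_rate = 0.1, and a batch size of 32. The gradients are derived by hand because the model is tiny.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): y = 3 * x - 2 plus a little noise
x_all = rng.uniform(-1, 1, size=(1000, 1))
y_all = 3 * x_all - 2 + 0.05 * rng.normal(size=(1000, 1))

W = np.zeros((1, 1))
b = np.zeros((1,))
learning_rate = 0.1
batch_size = 32

for _ in range(2000):
    # 1. Draw a random batch of samples x and targets y_true
    idx = rng.integers(0, len(x_all), size=batch_size)
    x, y_true = x_all[idx], y_all[idx]
    # 2. Forward pass: predictions y_pred
    y_pred = x @ W + b
    # 3. Loss on this batch: mean squared error
    loss = np.mean((y_pred - y_true) ** 2)
    # 4. Backward pass: gradient of the loss with respect to W and b
    grad_y_pred = 2 * (y_pred - y_true) / batch_size
    grad_W = x.T @ grad_y_pred
    grad_b = grad_y_pred.sum(axis=0)
    # 5. Move the parameters a little against the gradient
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b

print(W.ravel(), b)  # should end up close to [3.] and [-2.]
```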

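There are also variants that take previous weight updates into account when computing the next update, rather than looking only at the current gradient value, such as SGD with momentum, Adagrad, and RMSProp. These methods are collectively called optimization methods (optimizers).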

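# A simple implementation of an optimizer with momentum
# Toy setup (assumption): minimize loss(w) = w ** 2 for a single scalar parameter w
learning_rate = 0.1
momentum = 0.9
past_velocity = 0.0
w = 5.0                  # initial parameter value
loss = w ** 2
while loss > 0.01:
    gradient = 2 * w     # gradient of w ** 2 at the current w
    velocity = momentum * past_velocity - learning_rate * gradient
    w = w + velocity
    past_velocity = velocity
    loss = w ** 2        # re-evaluate the loss after the update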

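The derivative of the loss function can be 0 at several points. A point that is lower than all nearby points but is not the lowest point overall (the global minimum) is called a local minimum. When optimizing with SGD and a small learning rate, the parameters can get stuck in a local minimum instead of reaching the global minimum. Momentum can help avoid this: because each update also carries part of the previous velocity, the parameter can keep moving past a shallow local minimum, much like a ball rolling down a slope.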
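Adagrad and RMSProp, mentioned above, both scale each step by a running statistic of past squared gradients. The sketch below uses simplified textbook forms of their update rules on an assumed toy loss w ** 2; the Keras implementations differ in details such as epsilon handling and initial accumulator values, so treat it as illustrative only.

```python
import numpy as np

# Toy problem (assumption): minimize loss(w) = w ** 2, whose gradient is 2 * w
def grad(w):
    return 2.0 * w

def run_adagrad(w, learning_rate=0.1, steps=100, eps=1e-7):
    cache = 0.0
    for _ in range(steps):
        g = grad(w)
        cache += g ** 2                                  # running sum of squared gradients
        w -= learning_rate * g / (np.sqrt(cache) + eps)  # effective step shrinks over time
    return w

def run_rmsprop(w, learning_rate=0.1, steps=100, rho=0.9, eps=1e-7):
    cache = 0.0
    for _ in range(steps):
        g = grad(w)
        cache = rho * cache + (1 - rho) * g ** 2         # moving average of squared gradients
        w -= learning_rate * g / (np.sqrt(cache) + eps)
    return w

print(run_adagrad(5.0), run_rmsprop(5.0))
```

The design difference: Adagrad's accumulator only grows, so its steps keep shrinking, while RMSProp's moving average lets old gradients fade out.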

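Chaining derivatives: the backpropagation algorithm

Backpropagation: a method that uses the derivatives of simple operations, combined through the chain rule, to easily compute the gradient of a complex operation built from those basic operations.

Backpropagation starts from the final loss value and works backward from the top (output) layer to the bottom (input) layer, computing how much each parameter contributed to the loss.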
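A hand-worked sketch in plain Python, assuming a tiny graph loss = (w * x + b - y_true) ** 2 with made-up values: the backward pass starts at the loss and multiplies local derivatives node by node (the chain rule), and a finite-difference estimate confirms the result. GradientTape automates this kind of bookkeeping for arbitrary graphs.

```python
# Tiny computation graph (assumption): loss = (w * x + b - y_true) ** 2
x, y_true = 2.0, 10.0
w, b = 1.5, 0.5

def forward(w, b):
    z = w * x + b          # linear node
    diff = z - y_true      # error node
    loss = diff ** 2       # squared-error node
    return z, diff, loss

z, diff, loss = forward(w, b)

# Backward pass: start from the loss and apply the chain rule one node at a time
grad_diff = 2 * diff       # d loss / d diff
grad_z = grad_diff * 1.0   # d diff / d z = 1
grad_w = grad_z * x        # d z / d w = x
grad_b = grad_z * 1.0      # d z / d b = 1

# Check with central finite differences
eps = 1e-6
num_grad_w = (forward(w + eps, b)[2] - forward(w - eps, b)[2]) / (2 * eps)
num_grad_b = (forward(w, b + eps)[2] - forward(w, b - eps)[2]) / (2 * eps)
print(grad_w, num_grad_w)
print(grad_b, num_grad_b)
```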

TensorFlow's GradientTape API lets you compute gradients easily:

import tensorflow as tf

x = tf.Variable(0.)
with tf.GradientTape() as tape:
    y = 2 * x + 3
grad_of_y_wrt_x = tape.gradient(y, x)
print(grad_of_y_wrt_x)
# tf.Tensor(2.0, shape=(), dtype=float32)

GradientTape also works with multidimensional tensors:

x = tf.Variable(tf.zeros((2, 2)))
with tf.GradientTape() as tape:
    y = 2 * x + 3
grad_of_y_wrt_x = tape.gradient(y, x)
print(grad_of_y_wrt_x)
# tf.Tensor(
# [[2. 2.]
#  [2. 2.]], shape=(2, 2), dtype=float32)

You can also compute gradients with respect to a list of variables:

W = tf.Variable(tf.random.uniform((2, 2)))
b = tf.Variable(tf.zeros((2,)))
x = tf.random.uniform((2, 2))
with tf.GradientTape() as tape:
    y = tf.matmul(x, W) + b
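# A list of two tensors shaped like W and b (y is not a scalar, so the gradient of sum(y) is computed)
grad_of_y_wrt_W_and_b = tape.gradient(y, [W, b])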

📑 Revisiting Our First Example

Let's go back to the first example of this chapter and review its code in detail using what we've learned so far.

A simple Dense class

# A simple Dense class
class NaiveDense:
    def __init__(self, input_size, output_size, activation):
        self.activation = activation

        # Weight matrix W, initialized with small random values in [0, 0.1)
        w_shape = (input_size, output_size)
        w_initial_value = tf.random.uniform(w_shape, minval=0, maxval=1e-1)
        self.W = tf.Variable(w_initial_value)

        # Bias vector b, initialized with zeros
        b_shape = (output_size,)
        b_initial_value = tf.zeros(b_shape)
        self.b = tf.Variable(b_initial_value)

    def __call__(self, inputs):
        # Forward pass: activation(dot(inputs, W) + b)
        return self.activation(tf.matmul(inputs, self.W) + self.b)

    @property
    def weights(self):
        # Convenience property for retrieving the layer's weights
        return [self.W, self.b]

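NaiveDense creates two TensorFlow variables, W and b, and its __call__() method applies the transformation described earlier, activation(dot(inputs, W) + b).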
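The same transformation can be mirrored in NumPy to make the shapes explicit. This is a sketch, not the book's code: the batch size of 3 and the small layer sizes are arbitrary assumptions, and relu is written by hand.

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(x):
    return np.maximum(x, 0.0)

batch_size, input_size, output_size = 3, 4, 2   # arbitrary sizes (assumption)
inputs = rng.uniform(size=(batch_size, input_size))
W = rng.uniform(0, 1e-1, size=(input_size, output_size))  # same init range as NaiveDense
b = np.zeros((output_size,))

# (3, 4) @ (4, 2) -> (3, 2); b of shape (2,) is broadcast over the batch
outputs = relu(inputs @ W + b)
print(outputs.shape)  # (3, 2)
```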

A simple Sequential class

# A simple Sequential class
class NaiveSequential:
    def __init__(self, layers):
        self.layers = layers

    def __call__(self, inputs):
        # Feed the output of each layer into the next one
        x = inputs
        for layer in self.layers:
            x = layer(x)
        return x

    @property
    def weights(self):
        # Collect the weights of all layers into a single list
        weights = []
        for layer in self.layers:
            weights += layer.weights
        return weights

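NaiveSequential takes a list of layers; its __call__() method calls the layers in order on the inputs, chaining them together, and its weights property collects the weights of all layers.

Creating a model with the NaiveDense and NaiveSequential classes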

model = NaiveSequential([
   NaiveDense(input_size=28 * 28, output_size=512, activation=tf.nn.relu),
   NaiveDense(input_size=512, output_size=10, activation=tf.nn.softmax)
])
assert len(model.weights) == 4
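The assertion checks that the model holds four weight tensors: W and b for each of the two layers. A quick plain-Python sketch of their shapes and of the total number of trainable parameters they add up to (the shapes follow from the NaiveDense constructor above):

```python
import math

# (input_size, output_size) of the two NaiveDense layers above
layer_sizes = [(28 * 28, 512), (512, 10)]

weight_shapes = []
for input_size, output_size in layer_sizes:
    weight_shapes.append((input_size, output_size))  # W
    weight_shapes.append((output_size,))             # b

total_params = sum(math.prod(shape) for shape in weight_shapes)
print(weight_shapes)
print(len(weight_shapes), total_params)  # 4 407050
```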

Batch generator

# Batch generator
import math

class BatchGenerator:
    def __init__(self, images, labels, batch_size=128):
        assert len(images) == len(labels)
        self.index = 0
        self.images = images
        self.labels = labels
        self.batch_size = batch_size
        # Number of batches needed to cover the whole dataset (the last one may be smaller)
        self.num_batchs = math.ceil(len(images) / batch_size)

    def next(self):
        # Return the next slice of images and labels, then advance the index
        images = self.images[self.index : self.index + self.batch_size]
        labels = self.labels[self.index : self.index + self.batch_size]
        self.index += self.batch_size
        return images, labels

The BatchGenerator class lets us iterate over the MNIST data in mini-batches.
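A small sketch of the slicing arithmetic BatchGenerator relies on, reimplemented as a standalone helper (batch_sizes is a hypothetical name, not part of the original code): ceil(num_samples / batch_size) batches are produced, and the last one holds whatever samples remain.

```python
import math
import numpy as np

def batch_sizes(num_samples, batch_size):
    # Hypothetical helper mirroring BatchGenerator's [index : index + batch_size] slicing
    data = np.arange(num_samples)
    sizes = []
    index = 0
    for _ in range(math.ceil(num_samples / batch_size)):
        sizes.append(len(data[index : index + batch_size]))
        index += batch_size
    return sizes

print(batch_sizes(10, 4))        # [4, 4, 2]
sizes = batch_sizes(60000, 128)  # the 60,000 MNIST training images
print(len(sizes), sizes[-1])     # 469 96
```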

Computing the gradients

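# Computing the gradients: one training step
def one_training_step(model, images_batch, labels_batch):
    with tf.GradientTape() as tape:
        # Forward pass: predictions for the batch
        predictions = model(images_batch)
        # Per-sample losses, then their average over the batch
        per_sample_losses = tf.keras.losses.sparse_categorical_crossentropy(
            labels_batch, predictions)
        average_loss = tf.reduce_mean(per_sample_losses)
    # Gradient of the loss with respect to every weight of the model
    gradients = tape.gradient(average_loss, model.weights)
    # Update the weights using the gradients (update_weights is defined below)
    update_weights(gradients, model.weights)
    return average_loss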

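one_training_step runs the model on a batch, computes the average loss, computes the gradient of that loss with respect to the model's weights, and then calls update_weights to update the weights.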
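What sparse_categorical_crossentropy computes per sample can be sketched in NumPy: the negative log of the probability the model assigns to the true class. The probabilities and labels below are made up, and a library implementation may add numerical safeguards (such as clipping) that this sketch omits.

```python
import numpy as np

# Hypothetical softmax outputs for a batch of 3 samples over 4 classes
predictions = np.array([[0.7, 0.1, 0.1, 0.1],
                        [0.2, 0.5, 0.2, 0.1],
                        [0.25, 0.25, 0.25, 0.25]])
labels = np.array([0, 1, 3])   # integer class indices ("sparse" labels)

# Pick each sample's probability for its true class, then take -log
per_sample_losses = -np.log(predictions[np.arange(len(labels)), labels])
average_loss = per_sample_losses.mean()   # what tf.reduce_mean does above
print(per_sample_losses, average_loss)
```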

Updating the weights

learning_rate = 1e-3

def update_weights(gradients, weights):
    for g, w in zip(gradients, weights):
        w.assign_sub(g * learning_rate)

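We set the learning rate and update each weight by subtracting gradient * learning_rate from it; assign_sub performs this subtraction in place on a tf.Variable.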
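The same update rule in NumPy, as a sketch with made-up weights and gradients: the in-place `-=` on an array plays the role that assign_sub plays for a tf.Variable.

```python
import numpy as np

learning_rate = 1e-3
# Made-up weights and gradients for two parameter tensors
weights = [np.array([1.0, 2.0]), np.array([0.0])]
gradients = [np.array([0.5, -1.0]), np.array([2.0])]

for g, w in zip(gradients, weights):
    w -= g * learning_rate   # in-place update, analogous to w.assign_sub(g * learning_rate)

print(weights)  # [array([0.9995, 2.001 ]), array([-0.002])]
```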

The full training loop

def fit(model, images, labels, epochs, batch_size=128):
    for epoch_counter in range(epochs):
        print(f'Epoch {epoch_counter}')
        # Pass batch_size through so that the argument actually takes effect
        batch_generator = BatchGenerator(images, labels, batch_size=batch_size)
        for batch_counter in range(batch_generator.num_batchs):
            images_batch, labels_batch = batch_generator.next()
            loss = one_training_step(model, images_batch, labels_batch)
            if batch_counter % 100 == 0:
                print(f'Loss at batch {batch_counter}: {loss:.2f}')

In each training epoch, one_training_step (a single training step) is repeated for every batch.

Testing the function

from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Flatten each 28x28 image into a 784-dimensional vector and scale pixels to [0, 1]
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

fit(model, train_images, train_labels, epochs=10, batch_size=128)
# Epoch 0
# Loss at batch 0: 5.07
# Loss at batch 100: 2.26
# Loss at batch 200: 2.22
# Loss at batch 300: 2.10
# Loss at batch 400: 2.23
# ...
# Epoch 9
# Loss at batch 0: 0.67
# Loss at batch 100: 0.71
# Loss at batch 200: 0.61
# Loss at batch 300: 0.67
# Loss at batch 400: 0.73

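Calling fit() with epochs=10 trains the model by computing the loss on each batch and updating the weights in every epoch. The printed batch losses drop from about 2.1 to 2.3 in epoch 0 (after the first batch) to about 0.6 to 0.7 in epoch 9.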

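Evaluating the model

# Evaluating the model
import numpy as np

predictions = model(test_images)
predictions = predictions.numpy()                  # convert the tf.Tensor to a NumPy array
predicted_labels = np.argmax(predictions, axis=1)  # most probable digit for each image
matches = predicted_labels == test_labels
print(f'Accuracy: {matches.mean():.2f}') # Accuracy: 0.82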

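To evaluate the model, we compute the probabilities of the digits 0 to 9 for each of the 10,000 test images, take the most probable digit with argmax, and compare it with the test labels, storing the result in matches. matches is a Boolean array holding True or False depending on whether each prediction was correct, and its mean gives an accuracy of about 82%.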
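The argmax-and-compare step can be checked on a tiny example. The four rows of "predictions" below are hypothetical outputs over 3 classes, not real MNIST results: argmax picks the most probable class per row, and the mean of the Boolean matches is the accuracy.

```python
import numpy as np

# Hypothetical model outputs for 4 samples over 3 classes (not real MNIST predictions)
predictions = np.array([[0.1, 0.8, 0.1],
                        [0.6, 0.3, 0.1],
                        [0.2, 0.2, 0.6],
                        [0.5, 0.4, 0.1]])
test_labels = np.array([1, 0, 2, 1])

predicted_labels = np.argmax(predictions, axis=1)  # [1, 0, 2, 0]
matches = predicted_labels == test_labels          # [True, True, True, False]
print(matches.mean())                              # 0.75 (True counts as 1, False as 0)
```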