cs231n 과제2 Q1- Fully Connected Network

이준학·2024년 7월 17일

cs231n 과제

목록 보기

6/15

이번 글에서는 cs231n 과제2 Q1에 대해 다뤄보려고 한다. 이전의 과제1의 Two Layer Net의 확장 버전이라고 생각하면 된다. 코딩할게 유난히 많고, 내가 잘못 생각한 부분들이 많아서 시간이 되게 오래 걸렸다. 과제의 파일들을 보면 batch normalization과 dropout과 관련된 파일들이 있는데, 이건 추후에 다룬다고 한다. 전과 마찬가지로 내가 헷갈렸거나 몰랐던 부분들 위주로 정리하겠다.

1. Training a three/five layer net

num_train = 50
small_data = {
  'X_train': data['X_train'][:num_train],
  'y_train': data['y_train'][:num_train],
  'X_val': data['X_val'],
  'y_val': data['y_val'],
}

weight_scale = 4e-2  # Experiment with this!
learning_rate = 9e-3 # Experiment with this!

model = FullyConnectedNet(
    [100, 100, 100, 100],
    weight_scale=weight_scale,
    dtype=np.float64
)
solver = Solver(
    model,
    small_data,
    print_every=10,
    num_epochs=20,
    batch_size=25,
    update_rule='sgd',
    optim_config={'learning_rate': learning_rate},
) 
solver.train()

plt.plot(solver.loss_history)
plt.title('Training loss history')
plt.xlabel('Iteration')
plt.ylabel('Training loss')
plt.grid(linestyle='--', linewidth=0.5)
plt.show()

위의 함수는 Three layer network를 훈련시키는 코드이다. 사실 각 파일에서 코딩은 다 해놨기 때문에 weight scale과 learning rate를 바꿔가면서 training data를 overfitting 시키면 된다. 그런데 weight_scale과 lr에 따라 굉장히 예민하게 정확도가 바뀌어서 overfitting시키기가 어려웠던 것 같다.

2. Optimization

1) SGD+Momentum

먼저 dictionary에서 사용할 수 있는 것 중의 하나로 dict.setdefault("key",value)가 있다. 말 그대로 딕셔너리의 특정 key값의 default value를 지정해주는 것이다. 일종의 초기화라고 받아들였다.

def sgd_momentum(w, dw, config=None):
   """
   Performs stochastic gradient descent with momentum.

   config format:
   - learning_rate: Scalar learning rate.
   - momentum: Scalar between 0 and 1 giving the momentum value.
     Setting momentum = 0 reduces to sgd.
   - velocity: A numpy array of the same shape as w and dw used to store a
     moving average of the gradients.
   """
   if config is None:
       config = {}
   config.setdefault("learning_rate", 1e-2)
   config.setdefault("momentum", 0.9)
   v = config.get("velocity", np.zeros_like(w))

   next_w = None
   ###########################################################################
   # TODO: Implement the momentum update formula. Store the updated value in #
   # the next_w variable. You should also use and update the velocity v.     #
   ###########################################################################
   # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
   v=config["momentum"]*v - config["learning_rate"]*dw
   next_w=w+v
   # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
   ###########################################################################
   #                             END OF YOUR CODE                            #
   ###########################################################################
   config["velocity"] = v

   return next_w, config

내가 의문이 들었던 부분은, SGD+Momentum의 실제 식과 코드로 구현한 식이 다른 이유가 무엇인지였다. 이걸 위주로 좀 정리해보려고 한다. 알아보니 SGD+Momentum의 방식이 두 가지라고 한다. 식으로 적혀있는 버전과 코드에 나와있는 버전 모두 사용할 수 있지만, hyperparameter가 구성된 방식이 다르기 때문에 두 개를 다 시도해보고 모델에 가장 잘 맞는 방식을 택하면 된다고 한다. 아마 강의에서는 error가 최대한 적게 나오는 방식으로 하도록 하기 위해 ppt에 나오는 식과 다른 버전을 알려준 것 같다. RMSProp과 Adam에서는 그냥 ppt에 나온 식을 그대로 구현하면 문제 없이 에러값이 작게 나왔다.

3. 정리

이렇게 보니까 별 내용이 없긴 하다. 그런데 이 과제가 워낙 코딩해야하는 파일들이 많기 때문에, 에러가 어디서 났는지 찾는 과정이 쉽지 않았던 것 같다. 좀 하다가 잘 안되니까 다른 블로그 코드도 좀 참고해봤는데, 나랑 구현 방식이 달라서 오히려 더 헷갈리게 된 것 같다. 풀이는 파일을 첨부해서 올려놓았다.

내 풀이 링크:

https://github.com/danlee0113/cs231n

이준학

AI/ Computer Vision

이전 포스트

cs231n 과제1 Q5- Higher Level Representations : Image Features

다음 포스트