[PyTorch] nn.CrossEntropyLoss()

olxtar · March 25, 2022

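While studying torch.autograd, I was looking into why the value you call .backward() on at the very end has to be a scalar tensor. We usually call loss.backward(), i.e. we call .backward() on the loss value, right?

But then I wondered why the loss comes out as a scalar in the first place, and that is what brought me here! (I ran a batch of 64 images, yet the loss is still a single scalar value?)

Let's understand this thoroughly, from the ground up...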

CrossEntropyLoss

- Math

$$l(S,t) = -\log\left[\frac{e^{S_t}}{\sum_{j=0}^{c-1} e^{S_j}}\right] = -S_t + \log\sum_{j=0}^{c-1} e^{S_j}$$
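The second equality follows from $\log\frac{a}{b} = \log a - \log b$ and $\log e^{x} = x$. Here $j$ is the summation index over the $c$ classes:

$$
\begin{aligned}
l(S,t) &= -\log\frac{e^{S_t}}{\sum_{j=0}^{c-1} e^{S_j}} \\
&= -\left(\log e^{S_t} - \log\sum_{j=0}^{c-1} e^{S_j}\right) \\
&= -S_t + \log\sum_{j=0}^{c-1} e^{S_j}
\end{aligned}
$$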

Don't be intimidated.
Take a handwritten-digit classifier as an example. The model outputs one score for each class 0~9.
The target holds the index of the correct class. If the correct answer is the digit 2, the target is 2.

$l$ : the loss function
$S$ : the final output of the network model. Note that nn.CrossEntropyLoss already includes nn.LogSoftmax. Pass it the raw output, not an output you have already run through (log-)softmax: the raw values, i.e. the logits or scores!
$t$ : the target, i.e. the index (number) of the correct class
$S_t$ : the predicted score of the correct class
$c$ : the number of classes, 10 here!
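The PyTorch documentation describes nn.CrossEntropyLoss as LogSoftmax followed by NLLLoss. The minimal check below uses random logits. The batch of 4, the seed, and the labels are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 10)          # raw scores S: 4 samples, c = 10 classes
target = torch.tensor([1, 0, 9, 3])  # class indices t, one per sample (arbitrary)

# CrossEntropyLoss applied directly to the raw logits
ce = nn.CrossEntropyLoss()(logits, target)

# The same loss built in two steps: LogSoftmax over the class dimension, then NLLLoss
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, target)

print(torch.allclose(ce, nll))  # True
```

Because the log-softmax is already inside the criterion, the model itself should end at the raw scores.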

From the formula above, just remember this!

$$-S_t + \log\sum_{j=0}^{c-1} e^{S_j}$$

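Each image produces 10 scores, one for each digit 0~9. These are raw scores, not yet probabilities:
[0.4, 0.9, ... , 0.1]
Suppose t=1, which means this image is the digit '1'.
Then S_t is the score of class '1', so out of the 10 scores above it is exactly 0.9!
Finally, here is the formula above step by step:

Apply torch.exp to the 10 output scores,
add them all up with torch.sum(),
apply torch.log,
and subtract the score of the correct class, i.e. the target!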
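The four steps fit in a small helper. This is a minimal sketch. The names `cross_entropy_naive` and `cross_entropy_stable` are hypothetical and are not PyTorch APIs.

The second variant replaces steps 1-3 with `torch.logsumexp`. It computes the same quantity but shifts by the largest score first, so a very large logit does not overflow `exp`.

```python
import torch
import torch.nn.functional as F


def cross_entropy_naive(score: torch.Tensor, target: int) -> torch.Tensor:
    """Loss for ONE sample, following the four steps literally.

    score  -- 1-D tensor of raw logits, shape (c,)
    target -- index of the correct class
    """
    exp_scores = torch.exp(score)     # 1. exponentiate every score
    total = torch.sum(exp_scores)     # 2. add them all up
    log_total = torch.log(total)      # 3. take the log
    return log_total - score[target]  # 4. subtract the target's score


def cross_entropy_stable(score: torch.Tensor, target: int) -> torch.Tensor:
    # torch.logsumexp does steps 1-3 in one call and subtracts the max score
    # internally, so exp() cannot overflow
    return torch.logsumexp(score, dim=0) - score[target]


big = torch.tensor([1000.0, 0.0, 0.0])  # deliberately huge logit
print(cross_entropy_naive(big, 0))      # tensor(inf): exp(1000) overflows float32
print(cross_entropy_stable(big, 0))     # finite, essentially 0
print(F.cross_entropy(big.unsqueeze(0), torch.tensor([0])))  # built-in is also finite
```

For ordinary scores such as those in Code #1, both helpers return the same value. The difference only appears when the logits get large.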

- Code #1

Computing CrossEntropyLoss from the scores of a single image

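import torch
import torch.nn as nn

# network output (raw scores / logits) for ONE image, one value per class 0~9
score = torch.tensor( [0.8982,
                       0.805,
                       0.6393,
                       0.9983,
                       0.5731,
                       0.0469,
                       0.556,
                       0.1476,
                       0.8404,
                       0.5544] )

# correct answer for this image: class '1'
target = torch.tensor( 1 )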

print(score.shape)
print(target.shape)

>>>
torch.Size([10])
torch.Size([])

First, create the network model's 10 outputs (logits, or scores) for a single input image.
Also create the correct answer for that image, i.e. its label or target.
Then compute CrossEntropyLoss by hand, following the steps listed above.

Apply torch.exp to the 10 output scores

torch.exp(score)

>>>
tensor([2.4552, 2.2367, 1.8952, 2.7137, 1.7738, 1.0480, 1.7437, 1.1590, 2.3173,
        1.7409])

Add them all up with torch.sum()!

torch.sum(torch.exp(score))

>>>
tensor(19.0834)

Apply torch.log

torch.log(torch.sum(torch.exp(score)))

>>>
tensor(2.9488)

Subtract the score of the correct class, i.e. the target!

torch.log(torch.sum(torch.exp(score))) - score[ target.item() ]

>>>
tensor(2.1438)

[+]
target is tensor(1), and
target.item() is 1, so
score[ target.item() ] = score[ 1 ]
$\therefore$ 0.805

Finally, let's check the hand-computed CrossEntropyLoss
against nn.CrossEntropyLoss()

criterion = nn.CrossEntropyLoss()
criterion(score,target)

>>>
tensor(2.1438)
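The call above passes an unbatched score of shape (10,) with a 0-dim target, which recent PyTorch versions accept. The sketch below reuses the Code #1 scores. It shows that wrapping the same image as a batch of one, with (1, 10) scores and a (1,) target, gives the same value.

```python
import torch
import torch.nn as nn

# Scores and target from Code #1
score = torch.tensor([0.8982, 0.805, 0.6393, 0.9983, 0.5731,
                      0.0469, 0.556, 0.1476, 0.8404, 0.5544])  # shape (10,)
target = torch.tensor(1)                                        # shape ()

criterion = nn.CrossEntropyLoss()
unbatched = criterion(score, target)                          # input (C,), target ()
batched = criterion(score.unsqueeze(0), target.unsqueeze(0))  # input (1, C), target (1,)

print(unbatched, batched)  # the same value for both calls
```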

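Now let's see why running with batch_size=64 still yields a single scalar!
The loss is simply the sum of the 64 per-image losses divided by the number of images, 64. This is because nn.CrossEntropyLoss uses reduction='mean' by default.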
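This ties back to the opening question. The sketch below uses random logits (an assumption standing in for a real model output). It compares `reduction='none'`, which keeps one loss per image, with the default mean. The plain mean holds when no `weight` argument is given.

It also shows why the scalar matters. `backward()` with no arguments only works on a single-element tensor. A 64-element loss would need an explicit gradient argument.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
score = torch.randn(64, 10, requires_grad=True)  # stands in for the model output
target = torch.randint(10, size=(64,))           # class indices 0~9

# reduction='none' keeps one loss per image
per_sample = nn.CrossEntropyLoss(reduction='none')(score, target)
# the default reduction='mean' averages them into a 0-dim tensor
mean_loss = nn.CrossEntropyLoss()(score, target)

print(per_sample.shape)  # torch.Size([64])
print(mean_loss.shape)   # torch.Size([])
print(torch.allclose(mean_loss, per_sample.mean()))  # True

# A scalar loss can start backpropagation without extra arguments ...
mean_loss.backward()
print(score.grad.shape)  # torch.Size([64, 10])

# ... but the 64-element loss cannot: autograd has no implicit starting gradient
raised = False
try:
    per_sample.backward()
except RuntimeError:
    raised = True
print(raised)  # True
```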

- Code #2

Computing CrossEntropyLoss from the scores of a batch of 64 images

score = torch.randn(64,10)
target = torch.randint(9, size=(64,))

print(score[:3])
print(score.shape)
print('------------------------------------------------------------------------------')
print(target)
print(target.shape)


>>>
tensor([[ 0.9978,  0.9063,  0.5765,  1.0108,  0.4485,  1.1336,  1.6341,  0.7848,
         -0.1417, -0.4199],
        [ 1.3955, -0.0629,  0.1729, -1.0462,  0.6561,  1.6969,  0.3144,  1.5188,
          1.4189, -0.1865],
        [-1.2740,  1.6582,  0.3553, -1.0935,  0.1199,  0.9832, -0.4927, -0.2499,
          1.0497, -0.5948]])
torch.Size([64, 10])
------------------------------------------------------------------------------
tensor([4, 7, 2, 2, 3, 8, 0, 7, 6, 0, 1, 3, 8, 7, 8, 4, 8, 1, 5, 6, 2, 7, 1, 4,
        6, 8, 2, 8, 6, 0, 2, 4, 2, 1, 7, 5, 5, 0, 6, 5, 1, 6, 5, 8, 1, 3, 8, 2,
        6, 4, 7, 4, 7, 2, 8, 7, 7, 8, 4, 7, 0, 2, 5, 4])
torch.Size([64])

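score = torch.randn(64,10) : creates a tensor of size (64,10) filled with random values from the standard normal distribution
target = torch.randint(9, size=(64,)) : creates a tensor of size (64,) filled with random integers from 0 up to, but not including, 9, i.e. 0~8. That is why class 9 never appears in the output above; torch.randint(10, size=(64,)) would cover all 10 classes.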

[+] As a reminder: the (64,10) score is the network's output for 64 input images. 64 is the number of images, and 10 is the number of classes, 0~9.

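# variable that accumulates the per-image losses
acc_loss = 0

for i in range(len(score)):
    # loss of image i: log(sum(exp(scores))) - score of the correct class
    loss = torch.log( torch.sum( torch.exp(score[i]) ) ) - score[i, target[i].item()]
    acc_loss += loss

# divide the summed loss by the number of images (64)
acc_loss / len(score)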

>>>
tensor(2.7268)

Now let's verify with nn.CrossEntropyLoss()

criterion(score, target)

>>>
tensor(2.7268)
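The Python loop above can also be written as tensor operations over the whole batch. This is a minimal sketch with random logits and labels as in Code #2. Here a fixed seed and `torch.randint(10, ...)` are used so that all 10 classes can appear.

`torch.logsumexp` computes log(sum(exp(...))) for each row. `gather` picks out each row's target score.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
score = torch.randn(64, 10)             # 64 images x 10 classes (random logits)
target = torch.randint(10, size=(64,))  # one class index per image

# log(sum(exp(S))) for every row at once, shape (64,)
log_sum_exp = torch.logsumexp(score, dim=1)
# score of the correct class in every row, shape (64,)
# (equivalent to score[torch.arange(len(score)), target])
target_score = score.gather(1, target.unsqueeze(1)).squeeze(1)

manual = (log_sum_exp - target_score).mean()  # average over the batch
builtin = nn.CrossEntropyLoss()(score, target)

print(torch.allclose(manual, builtin))  # True
```

This avoids calling `.item()` once per image, and it is the form that scales to large batches.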

Summary
In short:
score = [batch size x number of classes]
target = [batch size]

ex) batch size = 8, number of classes = 3 (A=0, B=1, C=2)

Score (model's output) $=\begin{bmatrix} S_{A1} & S_{B1} & S_{C1} \\ & & \\ \vdots & \vdots & \vdots \\ & & \\ S_{A8} & S_{B8} & S_{C8} \end{bmatrix}\rightarrow$ size = 8 x 3

Target (the correct answers, i.e. labels) $=\begin{bmatrix} 0 \\ 2 \\ \vdots \\ 1 \\ 2\end{bmatrix}\rightarrow$ size = 8. It is drawn as a column here, but it is a 1-D tensor of shape (8,), not 8 x 1.

criterion = nn.CrossEntropyLoss()
loss = criterion(score, target)

criterion(score, target) returns the average loss over the 8 inputs (images) $\rightarrow$ a scalar!
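Here is a small sketch of the 8-image, 3-class example. It uses random logits and random labels in place of the matrices above, so the values are made up for illustration. If the labels happen to be stored as an 8 x 1 column, `squeeze(1)` flattens them to the expected shape (8,) before the call.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
score = torch.randn(8, 3)             # model output: 8 images x 3 classes (A=0, B=1, C=2)
target = torch.randint(3, size=(8,))  # hypothetical labels, shape (8,)

criterion = nn.CrossEntropyLoss()
loss = criterion(score, target)
print(score.shape, target.shape)  # torch.Size([8, 3]) torch.Size([8])
print(loss.shape)                 # torch.Size([]) -> a scalar

# Labels kept as an (8, 1) column: flatten them to (8,) before the call
target_col = target.unsqueeze(1)  # shape (8, 1)
loss_from_col = criterion(score, target_col.squeeze(1))
print(torch.allclose(loss, loss_from_col))  # True
```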
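
"In other words, each of the 8 images has its own loss value, and nn.CrossEntropyLoss adds those 8 losses together and divides by 8."

"Whether the scores passed in belong to 1 input or to n inputs, it averages them all! (With a single input, the average is just that input's loss.)"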