Cross-Entropy gradient

d4r6j · September 24, 2023

Cross-Entropy gradient : $case$ (Hard distillation)

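Each case in the transfer set contributes a cross-entropy gradient, $\partial C / \partial z_i$, with respect to each logit $z_i$ of the distilled model.

If the cumbersome model has logits $v_i$ which produce soft target probabilities $p_i$, and the transfer training is done at a temperature of $T$, this gradient is given by: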

$$\frac{\partial C}{\partial z_i} = \frac{1}{T}(q_i-p_i)$$
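The formula can be sanity-checked numerically. The sketch below is not from the paper or the referenced blog. It assumes $q_i$ is the distilled model's softmax output at temperature $T$, that $p_i$ is the softmax of $v_i$ at the same temperature, and that $C = -\sum_i p_i \log q_i$. The function names (`softmax`, `soft_cross_entropy`, `analytic_grad`, `numeric_grad`) and the example logits are made up for illustration.

```python
import numpy as np

def softmax(x, T=1.0):
    """Softmax of logits x at temperature T (shifted for numerical stability)."""
    x = np.asarray(x, dtype=float) / T
    e = np.exp(x - x.max())
    return e / e.sum()

def soft_cross_entropy(z, v, T):
    """C = -sum_i p_i * log(q_i), with p = softmax(v/T) and q = softmax(z/T)."""
    p = softmax(v, T)
    q = softmax(z, T)
    return -np.sum(p * np.log(q))

def analytic_grad(z, v, T):
    """Closed form from the text: dC/dz_i = (q_i - p_i) / T."""
    return (softmax(z, T) - softmax(v, T)) / T

def numeric_grad(z, v, T, eps=1e-6):
    """Central finite-difference estimate of dC/dz_i, one logit at a time."""
    z = np.asarray(z, dtype=float)
    g = np.zeros_like(z)
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        g[i] = (soft_cross_entropy(z + dz, v, T)
                - soft_cross_entropy(z - dz, v, T)) / (2 * eps)
    return g

# Hypothetical logits for the distilled model (z) and the cumbersome model (v).
z = np.array([1.0, -0.5, 2.0])
v = np.array([0.3, 1.2, -0.7])
T = 4.0

# Largest disagreement between the closed form and the numerical estimate.
max_abs_diff = float(np.max(np.abs(analytic_grad(z, v, T) - numeric_grad(z, v, T))))
print(max_abs_diff)
```

If the assumptions hold, `max_abs_diff` should be tiny (limited only by finite-difference error). The analytic gradient also sums to zero over $i$, because both $q$ and $p$ sum to one.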

[REF]
paper : https://arxiv.org/pdf/1503.02531.pdf
blog : https://jmlb.github.io/ml/2017/12/26/Calculate_Gradient_Softmax/

Softmax, Cross-Entropy formula