CS231N - 4

Ho Jin Lee·2023년 4월 11일

CS231N

목록 보기

4/4

Backpropagation => using chain rule

we can do it like this

simpler to little be more complex group if you can derive local gradient.

max gate

one will have full gradient of before while other will get gradient of 0.
because in forward path, also max value was used!

Jacobian Matrix will be just a diagonal matrix

calculating how much does the number affect the final output

non linearity is important!
multiple layers
h is value of scores of each of templates on W1
W2 weights all of them

h is right after the non linearity

배 터져 죽을 때까지