RNN, Attention - Machine Learning for Visual Understanding 3

zzwon1212 · July 11, 2024

11. Recurrent Neural Networks

  • Sequential Data
    The label $y$ may be a single value or a whole sequence, depending on the task.

    • Image Captioning
    • Visual Question Answering (VQA)
    • Visual Dialog (Conversation about an Image)
    • Vision-and-Language Navigation
  • Types of Neural Networks

    • One-to-one
    • Many-to-one
    • One-to-many
    • Many-to-many
    • Sequence-to-sequence
  • Internal State
    At each step, the new internal state is determined by its old state as well as the input (feedback loop).

  • The same function ($f_\mathrm{W}$) and the same set of parameters ($\mathrm{W}$) are used at every time step (see the NumPy sketch after this list):

    $$h_t = f_\mathrm{W}(h_{t-1}, x_t) = \tanh(\mathrm{W}_{hh} h_{t-1} + \mathrm{W}_{xh} x_t)$$
    • For binary classification (many-to-many)
      $\hat{y}_t = \sigma(\mathrm{W}_{hy} h_t)$
    • For regression (many-to-many)
      $\hat{y}_t = \mathrm{W}_{hy} h_t$
  • Multi-layer RNN

  • LSTM (gate equations sketched after this list)

    • cell state
    • forget gate
    • input gate
    • output gate
  • GRU
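A minimal NumPy sketch of the recurrence above, reusing the same $f_\mathrm{W}$ and the same weights at every time step. The sizes and random initialization are illustrative assumptions, not values from the lecture:

```python
import numpy as np

def rnn_step(h_prev, x_t, W_hh, W_xh, W_hy):
    """One vanilla RNN step: h_t = tanh(W_hh @ h_prev + W_xh @ x_t)."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)
    y_hat = W_hy @ h_t  # regression head; wrap in a sigmoid for binary classification
    return h_t, y_hat

# Illustrative sizes (assumptions): 3-dim input, 4-dim hidden state, scalar output
rng = np.random.default_rng(0)
W_hh = rng.normal(size=(4, 4)) * 0.1
W_xh = rng.normal(size=(4, 3)) * 0.1
W_hy = rng.normal(size=(1, 4)) * 0.1

h = np.zeros(4)               # initial hidden state
xs = rng.normal(size=(5, 3))  # a toy sequence of 5 inputs
for x in xs:                  # the same f_W and W are applied at every step
    h, y_hat = rnn_step(h, x, W_hh, W_xh, W_hy)
```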

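For reference, a sketch of the standard LSTM update (the textbook formulation; the lecture's exact notation may differ). The forget, input, and output gates control how the cell state $c_t$ is updated and read out:

$$
\begin{aligned}
f_t &= \sigma(\mathrm{W}_f [h_{t-1}, x_t] + b_f) \quad \text{(forget gate)} \\
i_t &= \sigma(\mathrm{W}_i [h_{t-1}, x_t] + b_i) \quad \text{(input gate)} \\
o_t &= \sigma(\mathrm{W}_o [h_{t-1}, x_t] + b_o) \quad \text{(output gate)} \\
\tilde{c}_t &= \tanh(\mathrm{W}_c [h_{t-1}, x_t] + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \quad \text{(cell state)} \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$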

12. RNN-based Video Models

Attention Mechanism

  • RNNs suffer from an information loss problem: the whole input sequence must be compressed into a fixed-size hidden state, so information from early steps fades as the sequence grows. Attention addresses this by letting the decoder look at all encoder states directly.

  • Attention Summary (see the sketch after this list)

    • Query: decoder hidden state $s_0$
    • Key, Value: encoder hidden states $\{h_1, h_2, h_3, \dots\}$
    • Attention Value: weighted average of encoder hidden states
      • Weights: similarity to $s_0$ (attention coefficients)
  • Attention-based Video Models

    • MultiLSTM
      • Query: previous hidden state $h_{i-1}$ of LSTM
      • Key, Value: $N$ recent input frame features
      • Attention value: weighted sum of the recent $N$ frame features
    • Visual Attention
      • Spatial attention
        "Where should we focus on the 2D image space to classify the video correctly?"
        Spatial attention provides interpretability.
        • $\mathrm{l}_t$: spatial attention coefficients
        • $\mathrm{X}_t$: the last conv-layer representation of an input image
      • Query: previous hidden state of the last LSTM ($h_{t-1}$)
      • Key, Value: $K \times K$ regional features from input $\mathrm{X}_t$
      • Attention value: weighted sum of region features
        • Weights: proportional to relevance to $h_{t-1}$

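A minimal NumPy sketch of the attention pattern summarized above: score each key against the query, softmax the scores into attention coefficients, and take the weighted average of the values. Dot-product similarity and the toy shapes are assumptions, not the lecture's exact choices; MultiLSTM and spatial attention reuse the same pattern with frame features or $K \times K$ regional features as the keys/values:

```python
import numpy as np

def attention(query, keys, values):
    """Weighted average of values, weighted by query-key similarity.

    query:  (d,)   e.g. decoder state s_0 or previous LSTM state h_{t-1}
    keys:   (T, d) e.g. encoder hidden states {h_1, ..., h_T}
    values: (T, d) often the same states as the keys
    """
    scores = keys @ query                    # dot-product similarity (an assumed choice)
    weights = np.exp(scores - scores.max())  # softmax -> attention coefficients
    weights /= weights.sum()
    return weights @ values, weights         # attention value, coefficients

# Toy example (illustrative sizes): 6 encoder states of dimension 4
rng = np.random.default_rng(0)
h = rng.normal(size=(6, 4))  # encoder hidden states (keys and values)
s0 = rng.normal(size=4)      # decoder hidden state (query)
context, attn = attention(s0, h, h)
```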