Tensor: Represents Info; even after passing through the weights, it preserves the previous Info
The weights in any set sum to a fixed value, and each weight can be adjusted for its tensor
Inner Product: multiply the values of two vectors in the same dimension and add them up. The higher the Correlation, the higher the value of the product.
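As a small illustration of the inner product described above, here is a minimal sketch. It assumes NumPy (the notes name no library), and the vectors a, b, and c are made up for the example.

```python
import numpy as np

def inner_product(u, v):
    """Multiply values in the same dimension and add them up."""
    return float(np.sum(np.asarray(u) * np.asarray(v)))

# Hypothetical example vectors, chosen only for illustration.
a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 2.5])    # points in nearly the same direction as a
c = np.array([-1.0, 0.5, -1.0])  # points in a mostly opposite direction

similar = inner_product(a, b)     # 1 + 4 + 7.5
dissimilar = inner_product(a, c)  # -1 + 1 - 3
```

Note that the raw inner product also grows with vector length, so it is only a rough stand-in for correlation unless the vectors are normalized.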
Queries, Keys, Values
Wq-Wk-Wv
Wq-Wk (Inner Product) / How similar each Key is to the Query
Wq-Wk (Weighted Sum) / Exponential, Normalization
Wv -> Result Value
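The Queries, Keys, Values steps listed above could be sketched roughly as follows. This is a single-head sketch assuming NumPy, random stand-in weights for Wq, Wk, Wv, and the sqrt(d_k) scaling from the standard Transformer; none of these details come from the notes themselves.

```python
import numpy as np

def softmax(scores):
    """Exponentiate and normalize each row so its weights sum to 1."""
    shifted = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def attention(x, Wq, Wk, Wv):
    """Single-head attention over token vectors x of shape (tokens, d_model)."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    # Inner product of every query with every key. Scaling by sqrt(d_k) is an
    # assumption borrowed from the standard Transformer, not from these notes.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores)     # each row sums to 1
    return weights @ V, weights   # weighted sum of the values

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # 4 hypothetical tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))  # stand-in weights
output, weights = attention(x, Wq, Wk, Wv)
```

Each row of weights tells how much one token draws from every other token, and output holds the resulting value for each token.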
Cannot Distinguish HOMONYMS
Cannot Check Other Inner Products
Cannot Know Sequence and Context
Positional Encoding
Position Vector Layer
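One common way to build the position vectors is the sinusoidal scheme from the original Transformer. The notes do not say which scheme is meant, so the sketch below is an assumption, written with NumPy and an even d_model.

```python
import numpy as np

def sinusoidal_positions(num_positions, d_model):
    """Return a (num_positions, d_model) matrix of position vectors.

    Assumes d_model is even.
    """
    positions = np.arange(num_positions)[:, None]            # (P, 1)
    dims = np.arange(0, d_model, 2)[None, :]                 # (1, d_model / 2)
    angles = positions / np.power(10000.0, dims / d_model)   # (P, d_model / 2)
    encoding = np.zeros((num_positions, d_model))
    encoding[:, 0::2] = np.sin(angles)  # even dimensions
    encoding[:, 1::2] = np.cos(angles)  # odd dimensions
    return encoding

# Hypothetical token embeddings: 5 identical tokens, d_model = 8.
embeddings = np.ones((5, 8))
with_position = embeddings + sinusoidal_positions(5, 8)
```

After the addition, identical tokens at different positions no longer have identical vectors, which gives attention a way to tell the sequence order.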
Context
Self Attention
Context Encoder Layer
Basic Structure: encoder-decoder
BERT(Encoder-only)
GPT(Decoder-only)
Translation: Encoder output -> Key and Value of the Middle Attention in the Transformer Decoder
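The flow from encoder to decoder in that middle attention could look roughly like this. It is a sketch assuming NumPy, random stand-in weights, and sqrt(d_k) scaling; the function name cross_attention and the shapes are hypothetical.

```python
import numpy as np

def softmax(scores):
    """Exponentiate and normalize each row so its weights sum to 1."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def cross_attention(decoder_states, encoder_output, Wq, Wk, Wv):
    """Queries come from the decoder; keys and values come from the encoder."""
    Q = decoder_states @ Wq
    K = encoder_output @ Wk
    V = encoder_output @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # scaling is an assumption
    weights = softmax(scores)
    return weights @ V, weights

rng = np.random.default_rng(0)
encoder_output = rng.normal(size=(6, 8))  # 6 hypothetical source tokens
decoder_states = rng.normal(size=(3, 8))  # 3 hypothetical target tokens so far
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))  # stand-in weights
context, weights = cross_attention(decoder_states, encoder_output, Wq, Wk, Wv)
```

Here weights has one row per target token and one column per source token, so each target token picks which source tokens to read from.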
Inferring the Nth Answer in the Sequence by REFERRING to the values up to the (N-1)th
Every token in the Decoder knows every other token and the Context Info.
Masking the Following (Future) Words.
Gives an Extremely NEGATIVE Value to the Inner Product with those Keys.
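The masking step above can be sketched as follows, assuming NumPy, random stand-in queries and keys, and -1e9 as the "extremely negative" value (the notes give no specific number; the function name is hypothetical).

```python
import numpy as np

def softmax(scores):
    """Exponentiate and normalize each row so its weights sum to 1."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def causal_attention_weights(Q, K, mask_value=-1e9):
    """Attention weights where token i cannot look at tokens after i."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # scaling is an assumption
    n = scores.shape[0]
    # True strictly above the diagonal, i.e. for future positions.
    future = np.triu(np.ones((n, n), dtype=bool), k=1)
    # Overwrite the inner products with future keys by a very negative value,
    # so their exponentials become 0 after the softmax.
    scores = np.where(future, mask_value, scores)
    return softmax(scores)

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 hypothetical decoder tokens
K = rng.normal(size=(4, 8))
weights = causal_attention_weights(Q, K)
```

The first token can only attend to itself, so its row puts all weight on position 0, and every entry above the diagonal ends up as 0.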