Note: I'll write only about the things that I think are noteworthy.
"It was the development of language that makes human being invincible."
Word2vec (Mikolov et al., 2013) is a framework for learning word vectors.
1) We have a large corpus (pile of text)
2) Every word in a fixed vocabulary is represented by a vector (initialized with a random vector)
3) Go through each position t in the text, which has a center word c and context (outside) word o
4) Use the similarity of the word vectors for $c$ and $o$ to calculate $P(o \mid c)$ or $P(c \mid o)$
Note: $P(o \mid c)$ is for Skip-gram, and $P(c \mid o)$ is for CBOW (Continuous Bag of Words)
5) Keep adjusting the word vectors to maximize this probability
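To make steps 1)–3) concrete, here is a minimal Python sketch that builds a vocabulary from a toy sentence and collects (center, context) index pairs within a fixed window. The corpus, window size, and all variable names are my own illustrative choices, not from the lecture:

```python
# Toy setup for steps 1-3: vocabulary + (center, context) pairs.
corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
word2idx = {w: i for i, w in enumerate(vocab)}

window = 2  # window size m (arbitrary choice)
pairs = []  # (center word index, context word index)
for t, center in enumerate(corpus):
    for j in range(-window, window + 1):
        if j == 0 or not (0 <= t + j < len(corpus)):
            continue  # skip the center itself and out-of-range positions
        pairs.append((word2idx[center], word2idx[corpus[t + j]]))

print(pairs[:5])  # first few (center, context) pairs
```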
Likelihood (Maximize)

$$L(\theta) = \prod_{t=1}^{T} \prod_{\substack{-m \le j \le m \\ j \ne 0}} P(w_{t+j} \mid w_t; \theta)$$

For every given center word $w_t$, maximize the probability of occurrence of the context words $w_{t+j}$ ($m$ is the window size).
Objective Function (Minimize)

$$J(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-m \le j \le m \\ j \ne 0}} \log P(w_{t+j} \mid w_t; \theta)$$

* Minimizing the objective function <=> Maximizing predictive accuracy
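The log is what makes this objective convenient: it turns the product over positions into a sum, and the minus sign turns maximization into minimization. A tiny numeric check with made-up probabilities (my own toy numbers, not from the post):

```python
import numpy as np

probs = np.array([0.2, 0.5, 0.1])   # toy values of P(w_{t+j} | w_t)
likelihood = np.prod(probs)          # L(theta), the product form
J = -np.mean(np.log(probs))          # J(theta) = -(1/T) * sum of logs

# exp(-T * J) recovers the likelihood, so minimizing J maximizes L
assert np.isclose(np.exp(-len(probs) * J), likelihood)
```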
To calculate $P(o \mid c)$, we will use two vectors per word $w$ (for easier optimization):
- $v_w$ when $w$ is a center word
- $u_w$ when $w$ is a context word
- Then, $P(o \mid c) = \dfrac{\exp(u_o^{\top} v_c)}{\sum_{w \in V} \exp(u_w^{\top} v_c)}$ ... This is softmax!

$d$ ... size of a word vector, $V$ ... number of words in the vocabulary
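To see the softmax in action, here is a small numpy sketch. The matrix names (`V_in` holding the center vectors $v_w$ as rows, `U_out` holding the context vectors $u_w$) and the toy sizes are my own assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 8, 4                       # V words, d-dimensional vectors
V_in = rng.normal(size=(vocab_size, dim))    # v_w: center-word vectors
U_out = rng.normal(size=(vocab_size, dim))   # u_w: context-word vectors

def p_context_given_center(c):
    """Softmax distribution P(. | c) over the whole vocabulary."""
    scores = U_out @ V_in[c]        # u_w . v_c for every word w
    scores -= scores.max()          # shift for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

print(p_context_given_center(3).sum())  # 1.0: a valid probability distribution
```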
Algorithm (by SGD with learning rate $\alpha$)
For each sentence in the corpus,
For each word in the sentence,
calculate $\hat{y} = P(o \mid c)$ and the loss $J(\theta)$
calculate $\nabla_{\theta} J(\theta)$ ... for more details, check reference [4]
update $\theta \leftarrow \theta - \alpha \, \nabla_{\theta} J(\theta)$
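Putting the pieces together, here is a self-contained sketch of this SGD loop on a toy corpus. The gradients are the standard softmax cross-entropy gradients (derived in reference [4]); the learning rate, epoch count, and all sizes are arbitrary toy choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = "the quick brown fox jumps over the lazy dog".split()
w2i = {w: i for i, w in enumerate(sorted(set(corpus)))}
window, dim, lr, epochs = 2, 4, 0.05, 100

# (center, context) index pairs within the window
pairs = [(w2i[corpus[t]], w2i[corpus[t + j]])
         for t in range(len(corpus))
         for j in range(-window, window + 1)
         if j != 0 and 0 <= t + j < len(corpus)]

V_in = rng.normal(scale=0.1, size=(len(w2i), dim))   # v_w (center vectors)
U_out = rng.normal(scale=0.1, size=(len(w2i), dim))  # u_w (context vectors)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(epochs):
    for c, o in pairs:
        y_hat = softmax(U_out @ V_in[c])    # predicted P(. | c)
        d = y_hat.copy()
        d[o] -= 1.0                         # y_hat - y, with y one-hot at o
        grad_v = d @ U_out                  # dJ/dv_c
        grad_U = np.outer(d, V_in[c])       # dJ/du_w for every w at once
        V_in[c] -= lr * grad_v              # theta <- theta - alpha * grad
        U_out -= lr * grad_U
```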
This is another method of implementing Word2vec, which uses only one vector for each word.
Architecture (Skip-gram)
Note: The notation of the architecture below is different from what we learned so far.
$N$ ... size of a word vector, $V$ ... size of the vocabulary
The method above uses 2 vectors ($u_w$, $v_w$) for each word, while this architecture doesn't.
- The input $x$ is a one-hot vector of size $V$.
- $W^{\top} x$ gives us $h$, which is the word vector we want to learn.
- With $h$, calculate $\hat{y} = \mathrm{softmax}(W'^{\top} h)$ for each context word, and compare it with the real context vector (using cross-entropy).
- Update $W$ and $W'$, which also means updating the word vectors.
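As a sanity check of this forward pass, here is a minimal numpy sketch of one step: a one-hot input, the hidden word vector $h$, the softmax output, and the cross-entropy against one true context word. The concrete sizes and the `W_prime` name are my own toy choices:

```python
import numpy as np

rng = np.random.default_rng(1)
V, N = 8, 4                                   # vocabulary size, word-vector size
W = rng.normal(scale=0.1, size=(V, N))        # input -> hidden weights
W_prime = rng.normal(scale=0.1, size=(N, V))  # hidden -> output weights

x = np.zeros(V)
x[2] = 1.0                   # one-hot vector for the center word

h = W.T @ x                  # hidden layer = the word vector (row 2 of W)
scores = W_prime.T @ h       # one score per vocabulary word
y_hat = np.exp(scores - scores.max())
y_hat /= y_hat.sum()         # softmax over the vocabulary

context = 5                  # index of the true context word (toy choice)
loss = -np.log(y_hat[context])   # cross-entropy with the one-hot target
print(loss)
```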
Here you can find my work for HW1!
If there is something wrong in my writing or understanding, please leave a comment and correct me!
[References]
1. Stanford CS224n, Lecture 1 video: https://youtu.be/8rXD5-xhemo
2. Mikolov et al. (2013), "Efficient Estimation of Word Representations in Vector Space": https://arxiv.org/abs/1301.3781
3. Stanford CS224n (2019), Lecture 1 slides on word vectors: https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1194/slides/cs224n-2019-lecture01-wordvecs1.pdf
4. "Gradients for skipgram word2vec" on Cross Validated: https://stats.stackexchange.com/questions/253244/gradients-for-skipgram-word2vec
5. reniew's blog: https://reniew.github.io/22/