Let’s compare the two methods with the following example:
I will denote the cell at row 2, column 3 as r2c3 (in this figure).
What SARSA does is compute a TD target from the current action and the next action actually taken. After the first episode, r2c3 will be updated.
Then, in the second episode, the agent updates r2c2 with only negative terms: it receives -100 from the box below and a -100 multiplied by the discount factor propagated from the next box (this is because SARSA does not take the maximum value, as we learned).
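To make this concrete, here is a minimal sketch of the SARSA update in Python. The tabular Q (a dict of dicts), `alpha`, and `gamma` are illustrative names, not from the original example:

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """One SARSA update step for a tabular Q (dict of dicts)."""
    # The TD target uses the action actually taken in the next state (a_next),
    # so a -100 penalty experienced there propagates back into Q[s][a].
    td_target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (td_target - Q[s][a])
```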
SARSA has the advantage of accounting for the negative rewards it actually experiences. However, this caution can sometimes prevent it from finding the optimal solution.
In contrast, Q-learning always bootstraps from the maximum action value in the next state, so it learns the value of the optimal policy and can find the optimal solution.
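For comparison, a sketch of the Q-learning update under the same assumed names: the only change is that the TD target takes the maximum over the next state's action values instead of the action actually taken.

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One Q-learning update step for a tabular Q (dict of dicts)."""
    # The TD target bootstraps from the greedy (maximum) action value,
    # regardless of which action the agent actually takes next.
    td_target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (td_target - Q[s][a])
```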
Both methods eventually find the optimal solution, but SARSA takes more time because it prioritizes safety.