-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[W2D5] T2: equation for greedy policy #446
Comments
As of slack discussion with @marcelomattar, N stands for the number of actions, but there are inconsistencies both within equation and in the text below regarding whether the argmax-option is among the options to choose in the non-greedy case. What the solution code does is it chooses the greedy option with probability (1-epsilon) and between all options (including the greedy), otherwise. Suggested fix (in line with Sutton & Barto)
Alternatively (appears to also be used in practice; makes equation look slightly less complex):
|
After going through the slack discussion, I propose the following fix:
|
@willwayaka IMO, the first case should be 1 - \epsilon + 1/N, then the second condition should be an "else". I think it confusing to have the actions contained in the "if" and the "else" be overlapping. |
* Closes #387 and fixes RGB -> C1C2C0 * Closes #395 * Process tutorial notebooks * Closes #483 * Process tutorial notebooks * Closes #406 and fixes colors * Process tutorial notebooks * Closes #404 * Process tutorial notebooks * Closes #419 * Process tutorial notebooks * Closes #410 T3 comments * Closes #410 * Process tutorial notebooks * Partially addresses #435 for T1 * Process tutorial notebooks * Created using Colaboratory * Process tutorial notebooks * Closes #457 #446, partially addresses #455 * Closes #456 * Process tutorial notebooks * Closes #455 * Created using Colaboratory * Closes #460 * Process tutorial notebooks Co-authored-by: GitHub Action <[email protected]>
In the equation for greedy policy:
\begin{align}
P (a_{t} = a) =
\begin{cases}
1 - \epsilon & \quad \text{if } a_{t} = \text{argmax}{a} ; q{t} (a) \
\epsilon/N & \quad \text{else}
\end{cases}
\end{align}
(between sec2.1 and ex1) what does N stand for? There is no mention of it in the text.
The text was updated successfully, but these errors were encountered: