Creates quiz for unit 6 #429
Merged
Commits (6)
- a310438 Create quiz for unit 6 (josejuanmartinez)
- 57678da Update quiz.mdx (josejuanmartinez)
- 306d408 Update _toctree.yml (josejuanmartinez)
- 40cf768 Fixes typo and comma(s) (josejuanmartinez)
- f7c510a Adds newline after ### (josejuanmartinez)
- f41bf2c Fixes missing commas (josejuanmartinez)
@@ -0,0 +1,120 @@
# Quiz

The best way to learn and [to avoid the illusion of competence](https://www.coursera.org/lecture/learning-how-to-learn/illusions-of-competence-BuFzf) **is to test yourself.** This will help you find **where you need to reinforce your knowledge**.

### Q1: Which of the following interpretations of the bias-variance tradeoff is the most accurate in the field of Reinforcement Learning?

<Question
  choices={[
    {
      text: "The bias-variance tradeoff reflects how well my model is able to generalize from the previously tagged data we give to the model during training time.",
      explain: "This is the traditional bias-variance tradeoff in Machine Learning. In our specific case of Reinforcement Learning, we don't have previously tagged data, but only a reward signal.",
      correct: false,
    },
    {
      text: "The bias-variance tradeoff reflects how well the reinforcement signal captures the true reward the agent should get from the environment.",
      explain: "",
      correct: true,
    },
  ]}
/>

### Q2: Which of the following statements are true when talking about models with bias and/or variance in RL?

<Question
  choices={[
    {
      text: "An unbiased reward signal returns rewards similar to the real / expected ones from the environment",
      explain: "",
      correct: true,
    },
    {
      text: "A biased reward signal returns rewards similar to the real / expected ones from the environment",
      explain: "If a reward signal is biased, it means the reward signal we get differs from the real reward we should be getting from the environment.",
      correct: false,
    },
    {
      text: "A reward signal with high variance has a lot of noise in it and is affected by, for example, stochastic (non-constant) elements in the environment",
      explain: "",
      correct: true,
    },
    {
      text: "A reward signal with low variance has a lot of noise in it and is affected by, for example, stochastic (non-constant) elements in the environment",
      explain: "If a reward signal has low variance, then it is less affected by the noise of the environment and produces similar values regardless of the random elements in the environment.",
      correct: false,
    },
  ]}
/>

### Q3: Which of the following statements are true about the Monte Carlo method?

<Question
  choices={[
    {
      text: "It's a sampling mechanism, which means we don't analyze all the possible states, but only a sample of them",
      explain: "",
      correct: true,
    },
    {
      text: "It's very resistant to stochasticity (random elements in the trajectory)",
      explain: "Monte Carlo estimates a random sample of trajectories every time. However, even identical trajectories can have different reward values if they contain stochastic elements.",
      correct: false,
    },
    {
      text: "To reduce the impact of stochastic elements in Monte Carlo, we can take `n` strategies and average them, reducing their impact in case of noise",
      explain: "",
      correct: true,
    },
  ]}
/>
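
If you want to see that averaging idea in code, here is a minimal sketch (not part of the course code) that estimates a state's value by sampling `n` episodes and averaging their discounted returns. The `sample_episode` helper and its `env` / `policy` arguments are hypothetical placeholders, assuming an environment with a `reset()` method and a `step()` method that returns `(state, reward, done)`.

```python
def sample_episode(env, policy, max_steps=100):
    # Hypothetical helper: roll out one episode and collect its rewards.
    rewards = []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        rewards.append(reward)
        if done:
            break
    return rewards

def monte_carlo_value(env, policy, n=100, gamma=0.99):
    # Average the discounted return over n sampled episodes:
    # the noise of individual returns cancels out as n grows.
    returns = []
    for _ in range(n):
        rewards = sample_episode(env, policy)
        discounted_return = sum((gamma ** t) * r for t, r in enumerate(rewards))
        returns.append(discounted_return)
    return sum(returns) / len(returns)
```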

### Q4: What is the Advantage Actor-Critic method (A2C)?

<details>
<summary>Solution</summary>

The idea behind Actor-Critic is that we learn two function approximations:

1. A `policy` that controls how our agent acts (π)
2. A `value` function to assist the policy update by measuring how good the action taken is (q)

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/step2.jpg" alt="Actor-Critic, step 2"/>

</details>
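
As a rough illustration of these two function approximations, here is a sketch (not the course implementation) of an Actor-Critic module in PyTorch with a shared trunk, a policy head, and a value head. It assumes the common variant that learns a state-value critic V(s) rather than an action-value q(s, a).

```python
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        # Shared feature extractor
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        # Actor head: action logits that parameterize the policy
        self.actor = nn.Linear(hidden, n_actions)
        # Critic head: a scalar estimate of the state value
        self.critic = nn.Linear(hidden, 1)

    def forward(self, obs):
        features = self.trunk(obs)
        return self.actor(features), self.critic(features)
```

During training, the logits feed the action distribution the Actor samples from, while the Critic's value is used to build the target for the policy update.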

### Q5: Which of the following statements are true about the Actor-Critic method?

<Question
  choices={[
    {
      text: "The Critic does not learn from the training process",
      explain: "Both the Actor and the Critic function parameters are updated during training time.",
      correct: false,
    },
    {
      text: "The Actor learns a policy function, while the Critic learns a value function",
      explain: "",
      correct: true,
    },
    {
      text: "It adds resistance to stochasticity and reduces high variance",
      explain: "",
      correct: true,
    },
  ]}
/>

### Q6: What is the `Advantage` in the A2C method?

<details>
<summary>Solution</summary>

Instead of using the Action-Value function of the Critic directly, we can use an `Advantage` function. The idea behind an `Advantage` function is that we calculate the relative advantage of an action compared to the others possible at that state, by averaging over them.

In other words: how much better it is to take that action at a state, compared to the average value of that state.

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/advantage1.jpg" alt="Advantage in A2C"/>

</details>
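
As an illustrative sketch (assuming the common one-step TD form rather than quoting the course code), the advantage can be estimated from the Critic's state values alone; `estimate_advantage` below is a hypothetical helper, not part of any library.

```python
def estimate_advantage(reward, value_s, value_next_s, gamma=0.99, done=False):
    # A(s, a) = Q(s, a) - V(s). With one-step TD we approximate
    # Q(s, a) by r + gamma * V(s'), bootstrapping only if the episode continues.
    td_target = reward + gamma * value_next_s * (0.0 if done else 1.0)
    return td_target - value_s
```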

Congrats on finishing this Quiz 🥳! If you missed some elements, take time to read the chapter again to reinforce (😏) your knowledge.