Creates quiz for unit 6 #429

Merged 6 commits, Dec 6, 2023

Changes from 2 commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
120 changes: 120 additions & 0 deletions units/en/unit6/quiz.mdx
@@ -0,0 +1,120 @@
# Quiz

The best way to learn and [to avoid the illusion of competence](https://www.coursera.org/lecture/learning-how-to-learn/illusions-of-competence-BuFzf) **is to test yourself.** This will help you to find **where you need to reinforce your knowledge**.


### Q1: Which of the following interpretations of the bias-variance tradeoff is the most accurate in the field of Reinforcement Learning?

<Question
choices={[
{
text: "The bias-variance tradeoff reflects how my model is able to generalize the knowledge to previously tagged data we give to the model during training time.",
explain: "This is the traditional bias-variance tradeoff in Machine Learning. In our specific case of Reinforcement Learning, we don't have previously tagged data, but only a reward signal.",
correct: false,
},
{
text: "The bias-variance tradeoff reflects how well the reinforcement signal reflects the true reward the agent should get from the enviromment",
explain: "",
correct: true,
},
]}
/>

### Q2: Which of the following statements are true when talking about models with bias and/or variance in RL?
<Question
choices={[
{
text: "An unbiased reward signal returns rewards similar to the real / expected ones from the environment",
explain: "",
correct: true,
},
{
text: "A biased reward signal returns rewards similar to the real / expected ones from the environment",
explain: "If a reward signal is biased, it means the reward signal we get differs from the real reward we should be getting from an environment",
correct: false,
},
{
text: "A reward signal with high variance has much noise in it and gets affected by, for example, stochastic (non constant) elements in the environment"
explain: "",
correct: true,
},
{
text: "A reward signal with low variance has much noise in it and gets affected by, for example, stochastic (non constant) elements in the environment"
explain: "If a reward signal has low variance, then it's less affected by the noise of the environment and produce similar values regardless the random elements in the environment",
correct: false,
},
]}
/>
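
To make the difference concrete, here's a minimal sketch, assuming NumPy (the true reward and the noise levels are made-up illustration values, not from the course), that simulates an unbiased, a biased, and a high-variance reward signal:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
true_reward = 1.0  # hypothetical real reward the environment gives

# Unbiased, low variance: estimates cluster tightly around the true reward.
unbiased = true_reward + rng.normal(loc=0.0, scale=0.05, size=1000)

# Biased: estimates are systematically shifted away from the true reward.
biased = true_reward + 0.5 + rng.normal(loc=0.0, scale=0.05, size=1000)

# High variance: estimates are centered correctly but very noisy.
high_variance = true_reward + rng.normal(loc=0.0, scale=2.0, size=1000)

for name, signal in [("unbiased", unbiased), ("biased", biased), ("high variance", high_variance)]:
    print(f"{name:>13}: mean={signal.mean():.2f}, std={signal.std():.2f}")
```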


### Q3: Which of the following statements are true about the Monte Carlo method?
<Question
choices={[
{
text: "It's a sampling mechanism, which means we don't consider analyze all the possible states, but a sample of those",
explain: "",
correct: true,
},
{
text: "It's very resistant to stochasticity (random elements in the trajectory)",
explain: "Monte-carlo randomly estimates everytime a sample of trajectories. However, even same trajectories can have different reward values if they contain stochastic elements",
correct: false,
},
{
text: "To reduce the impact of stochastic elements in Monte-Carlo, we can take `n` strategies and average them, reducing their impact impact in case of noise"
explain: "",
correct: true,
},
]}
/>
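
To illustrate the last point, here's a minimal sketch, assuming a hypothetical `sample_trajectory_return()` that rolls out one trajectory in a stochastic environment, showing how averaging `n` sampled returns reduces the noise:

```python
import random

def sample_trajectory_return() -> float:
    """Hypothetical rollout: the true expected return is 10, plus trajectory noise."""
    return 10.0 + random.gauss(0.0, 3.0)

def monte_carlo_estimate(n: int) -> float:
    """Average the returns of n sampled trajectories."""
    return sum(sample_trajectory_return() for _ in range(n)) / n

random.seed(0)
# A single trajectory is noisy; averaging many gets close to the true value (10).
print("n=1:  ", monte_carlo_estimate(1))
print("n=100:", monte_carlo_estimate(100))
```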

### Q4: What is the Advantage Actor-Critic method (A2C)?
<details>
<summary>Solution</summary>

The idea behind Actor-Critic is that we learn two function approximations:
1. A `policy` that controls how our agent acts (π)
2. A `value` function to assist the policy update by measuring how good the action taken is (q)

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/step2.jpg" alt="Actor-Critic, step 2"/>

</details>
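
To make the two approximations concrete, here's a minimal sketch, assuming PyTorch (the layer sizes and class names are illustrative choices, not the course's reference implementation, and the Critic is written as a state-value network for brevity):

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """The policy π(a|s): maps a state to a probability distribution over actions."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.net(state), dim=-1)

class Critic(nn.Module):
    """The value function: scores how good it is to be in a given state."""
    def __init__(self, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

state = torch.randn(1, 4)          # e.g. a 4-dimensional observation
action_probs = Actor(4, 2)(state)  # the Actor proposes how to act
state_value = Critic(4)(state)     # the Critic evaluates the state
```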

### Q5: Which of the following statements are True about the Actor-Critic method?
<Question
choices={[
{
text: "The Critic does not learn from the training process",
explain: "Both the Actor and the Critic function parameters are updated during training time",
correct: false,
},
{
text: "The Actor learns a policy function, while the Critic learns a value function",
explain: "",
correct: true,
},
{
text: "It adds resistance to stochasticity and reduces high variance",
explain: "",
correct: true,
},
]}
/>



### Q6: What is the `Advantage` in the A2C method?
<details>
<summary>Solution</summary>

Instead of using the Action-Value function of the Critic directly, we can use an `Advantage` function. The idea behind the `Advantage` function is to calculate how much better an action is compared to the other actions possible at a state, by comparing it against the average value of that state.

In other words: how much better taking that action at a state is, compared to the average value of the state.

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/advantage1.jpg" alt="Advantage in A2C"/>

</details>
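
A minimal sketch of this idea, assuming we approximate the advantage with the TD error A(s, a) ≈ r + γV(s'), minus V(s), as A2C implementations commonly do (the numbers below are placeholders, not course values):

```python
def advantage(reward: float, value_s: float, value_next_s: float, gamma: float = 0.99) -> float:
    """TD-error estimate of the advantage: A(s, a) ≈ r + γ·V(s') − V(s)."""
    return reward + gamma * value_next_s - value_s

# A positive advantage means the action did better than the state's average value.
print(advantage(reward=1.0, value_s=0.5, value_next_s=0.6))   # ≈ 1.094 -> better than average
print(advantage(reward=-1.0, value_s=0.5, value_next_s=0.4))  # ≈ -1.104 -> worse than average
```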

Congrats on finishing this quiz 🥳! If you missed some elements, take time to read the chapter again to reinforce (😏) your knowledge.