
dqn vs mdqn #264

Merged — KohlerHECTOR merged 7 commits into main on Dec 20, 2022
Conversation

@KohlerHECTOR (Collaborator) commented Dec 17, 2022

Description

Adds a long test comparing DQN and M-DQN on MountainCar.

KohlerHECTOR added 3 commits December 17, 2022 12:47
pytest long_tests/torch_agent/ltest_dqn_vs_mdqn_montaincar.py
@KohlerHECTOR changed the title from "added mdqn in init" to "dqn vs mdqn" on Dec 17, 2022
@KohlerHECTOR (Collaborator, Author) commented Dec 18, 2022

I have now looked at the results of comparing DQN and M-DQN on MountainCar (4 fits of 1e5 steps each), and it seems M-DQN is not learning.
rewards.pdf
losses.pdf
eval.pdf
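
For reference, a comparison of this kind can be launched with rlberry's AgentManager. The sketch below is not the long test from this PR; it assumes the new agent is exported as MunchausenDQNAgent and that the writer logs a tag named "episode_rewards" (both are assumptions, not taken from this PR):

```python
# Minimal sketch: assumes rlberry's AgentManager API and that the M-DQN agent
# is exported as MunchausenDQNAgent (class name is an assumption).
from rlberry.envs import gym_make
from rlberry.manager import AgentManager, plot_writer_data
from rlberry.agents.torch import DQNAgent, MunchausenDQNAgent

env = (gym_make, dict(id="MountainCar-v0"))

# 4 independent fits of 1e5 steps each, as in the experiment above.
managers = [
    AgentManager(cls, env, fit_budget=100_000, n_fit=4, agent_name=name)
    for cls, name in [(DQNAgent, "DQN"), (MunchausenDQNAgent, "MDQN")]
]
for manager in managers:
    manager.fit()

# Compare training curves; the writer tag name may differ between versions.
plot_writer_data(managers, tag="episode_rewards")
```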

@AleShi94 (Collaborator)


Have you tried to run it on Acrobot? In the original paper they mostly show that M-DQN improves on DQN on Atari games.

@KohlerHECTOR (Collaborator, Author)


Hello, no. It would indeed be good to get a single good run on Acrobot. I will try it; it should not take too long.

@KohlerHECTOR (Collaborator, Author) commented Dec 19, 2022


Ok, it works well. I had to change the value of the hyperparameter "target_update_parameter" from 8000 to 0.005; 0.005 is the default value for this hyperparameter in base DQN, but the default in M-DQN is now 8000. Is that normal?
mdqn_acro_rewards.pdf
mdqn_acro_loss.pdf
mdqn_acro_eval.pdf

@KohlerHECTOR (Collaborator, Author)


Maybe we should set the default M-DQN hyperparameters to be the same as DQN's defaults?

@AleShi94 (Collaborator)

> Maybe we should set the default M-DQN hyperparameters to be the same as DQN's defaults?

I set the hyperparameters mentioned in their article as the defaults. But as I already mentioned, they mostly run experiments on Atari, so the right hyperparameters for smaller environments may well be different.

@AleShi94 (Collaborator)

Also, DQN uses TD(lambda) by default, while it is not implemented in M-DQN. According to what they claim in the article, it is not really required with the Munchausen trick. But still, maybe we should compare against DQN with lambda = 0?
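
For context, here is a minimal sketch of the two target types being discussed, in plain NumPy rather than rlberry's actual implementation (function names and shapes are illustrative); setting lam = 0 recovers the one-step target:

```python
import numpy as np

def one_step_targets(rewards, next_values, dones, gamma=0.99):
    """One-step TD targets: r_t + gamma * V(s_{t+1}), with V = max_a Q in DQN."""
    return rewards + gamma * (1.0 - dones) * next_values

def td_lambda_targets(rewards, next_values, dones, gamma=0.99, lam=0.5):
    """TD(lambda) targets computed backwards over a chunk of transitions.

    Uses G_t = r_t + gamma * [(1 - lam) * V(s_{t+1}) + lam * G_{t+1}],
    written below as the one-step target plus a lambda-weighted correction.
    """
    targets = np.zeros_like(rewards)
    next_return = next_values[-1]  # bootstrap for the last transition in the chunk
    for t in reversed(range(len(rewards))):
        one_step = rewards[t] + gamma * (1.0 - dones[t]) * next_values[t]
        targets[t] = one_step + gamma * lam * (1.0 - dones[t]) * (next_return - next_values[t])
        next_return = targets[t]
    return targets

# With lam = 0 both targets coincide.
r = np.array([0.0, 0.0, 1.0])
v = np.array([0.5, 0.7, 0.0])   # V(s_{t+1}) estimates
d = np.array([0.0, 0.0, 1.0])   # episode ends on the last transition
assert np.allclose(td_lambda_targets(r, v, d, lam=0.0), one_step_targets(r, v, d))
```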

@KohlerHECTOR (Collaborator, Author) commented Dec 19, 2022


Maybe, if we want to see how good this Munchausen trick is, we should indeed launch M-DQN vs DQN on Acrobot using DQN with lambda = 0, and DQN defaults where possible?

@TimotheeMathieu (Collaborator)


There are two views: either you say the honest thing to do is to compare both algorithms while changing only the Munchausen trick, or you compare the best tuning of both agents. I guess it depends who you ask.

@AleShi94 (Collaborator)

It could be an idea for another PR to compare the performance of DQN for different values of lambda; we don't really know how much non-zero lambdas add. But for this particular test I think it is enough to compare classical DQN (lambda = 0) against Munchausen DQN, with all other hyperparameters the same for both agents.

@KohlerHECTOR (Collaborator, Author) commented Dec 19, 2022


I have launched this just now.
Results:
mdqn_acro_loss.pdf

mdqn_acro_eval.pdf

mdqn_acro_rewards.pdf

I will modify the long test so that it repeats this experiment over multiple seeds, and then we should close this PR.
The Munchausen trick does not seem to improve much on Acrobot; in the next PR we should maybe try MinAtar.

@AleShi94 (Collaborator) left a review:

Agree for merging

@TimotheeMathieu (Collaborator) left a review:
A small comment, otherwise good for me, thanks!

@mmcenta (Collaborator) commented Dec 20, 2022

> Ok, it works well. I had to change the value of the hyperparameter "target_update_parameter" from 8000 to 0.005; 0.005 is the default value for this hyperparameter in base DQN, but the default in M-DQN is now 8000. Is that normal?

I think they are doing the same thing in two different ways. There are two main ways of updating the target network (a minimal sketch of both follows):

  1. Copy the weights periodically, i.e. every N timesteps copy the weights from the online network into the target network. In this case, it seems N = 8000.
  2. Do a soft update at every step, such that new_target_weights = (1 - c) * target_weights + c * online_weights, where c is a parameter functionally similar to N. In this case, c = 0.005.
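
A minimal sketch of the two schemes in PyTorch-style code (illustrative only, not rlberry's implementation; hard_update and soft_update are hypothetical helper names):

```python
import torch

def hard_update(online: torch.nn.Module, target: torch.nn.Module,
                step: int, period: int = 8000):
    """Periodic (hard) update: copy the online weights into the target every `period` steps."""
    if step % period == 0:
        target.load_state_dict(online.state_dict())

def soft_update(online: torch.nn.Module, target: torch.nn.Module, tau: float = 0.005):
    """Polyak (soft) update applied at every step:
    target <- (1 - tau) * target + tau * online."""
    with torch.no_grad():
        for p_target, p_online in zip(target.parameters(), online.parameters()):
            p_target.mul_(1.0 - tau).add_(tau * p_online)
```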

> Also, DQN uses TD(lambda) by default, while it is not implemented in M-DQN. [...] But still, maybe we should compare against DQN with lambda = 0?

Our DQN is a bit unusual because many implementations don't have a chunk_size (and consequently no TD(lambda) targets). I think both options are fine.

@mmcenta (Collaborator) left a review:
LGTM

@KohlerHECTOR KohlerHECTOR merged commit ddd9148 into main Dec 20, 2022
@KohlerHECTOR KohlerHECTOR deleted the mdqn_vs_dqn branch December 20, 2022 10:55