dqn vs mdqn #264
Conversation
pytest long_tests/torch_agent/ltest_dqn_vs_mdqn_montaincar.py
I have now looked at the results of comparing DQN and MDQN on MountainCar (4 fits of 1e5 steps each). It seems MDQN is not learning.
Have you tried to run it on Acrobot? In the original paper they mostly show that M-DQN improves on DQN on Atari games.
Hello, no. It would indeed be good to get a single good run on Acrobot. I will try; it should not take too long.
Ok, it works well. I had to change the value of the hyperparameter "target_update_parameter" from 8000 to 0.005. 0.005 is the default value for this hyperparameter in base DQN, but the default value in MDQN is currently 8000. Is that intended?
Maybe we should set MDQN's default hyperparameters to be the same as DQN's defaults?
I set the defaults to the hyperparameters mentioned in their article. But as I already mentioned, they run experiments mostly on Atari, so hyperparameters for smaller environments may need to be different.
Also, DQN by default uses TD(lambda), while it is not implemented in M-DQN. According to what they claim in the article, it is not really required with the Munchausen trick. But still, maybe we should compare against DQN with lambda = 0?
If we want to see how good this Munchausen trick is, maybe we should launch MDQN vs DQN on Acrobot using DQN with lambda = 0, and DQN defaults where possible?
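For reference, the "Munchausen trick" adds a scaled, clipped log-policy bonus to the standard target and bootstraps with a soft (entropy-regularized) value of the next state. A minimal NumPy sketch of a single-transition M-DQN target, following the formula from the Munchausen RL paper (function names and default values here are illustrative, not this repo's API):

```python
import numpy as np

def log_softmax(x):
    """Numerically stable log-softmax over a 1-D array."""
    z = x - x.max()
    return z - np.log(np.exp(z).sum())

def munchausen_target(q_next, q_cur, action, reward, done,
                      gamma=0.99, tau=0.03, alpha=0.9, l0=-1.0):
    """One-step Munchausen-DQN target for a single transition.

    q_next, q_cur: target-network Q-values at s' and s (1-D arrays over actions).
    The policy is the softmax of Q / tau; the log-policy bonus is clipped to [l0, 0].
    """
    # Munchausen bonus: alpha * [tau * log pi(a|s)] clipped from below at l0.
    log_pi_cur = tau * log_softmax(q_cur / tau)
    bonus = alpha * np.clip(log_pi_cur[action], l0, 0.0)
    # Soft value of the next state: E_pi[Q(s',a') - tau * log pi(a'|s')].
    log_pi_next = tau * log_softmax(q_next / tau)
    pi_next = np.exp(log_pi_next / tau)
    soft_v = (pi_next * (q_next - log_pi_next)).sum()
    return reward + bonus + (1.0 - done) * gamma * soft_v
```

With alpha = 0 and tau → 0 this reduces to the usual DQN target, which is what makes a lambda = 0 DQN baseline a clean ablation of the trick alone.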
There are two views: either you say that the honest thing to do is to compare both algorithms while only adding the Munchausen trick, or you compare the best tuning of both agents. I guess it depends who you ask.
It could be an idea for another PR to compare the performance of DQN for different values of lambda; we don't really know how much non-zero lambdas add. But for this particular test I think it is enough to do classical DQN (lambda = 0) vs Munchausen DQN. All other hyperparameters could be the same for both agents.
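The lambda = 0 special case is easy to see from the target recursion: a TD(lambda) target over a chunk interpolates between one-step bootstrapping and the full return, and setting lambda = 0 leaves only the one-step TD target. A small sketch of the backward recursion (a generic illustration, not this repo's implementation; names are hypothetical):

```python
import numpy as np

def td_lambda_targets(rewards, values_next, dones, gamma=0.99, lam=0.5):
    """TD(lambda) targets over a chunk of transitions, computed backwards.

    values_next[t] is the bootstrap value V(s_{t+1}).
    lam=0 recovers the one-step TD target r_t + gamma * V(s_{t+1});
    lam=1 recovers the discounted return bootstrapped at the chunk's end.
    """
    T = len(rewards)
    targets = np.zeros(T)
    next_target = values_next[-1]  # bootstrap beyond the chunk
    for t in reversed(range(T)):
        # Mix the one-step bootstrap with the lambda-return from t+1 onwards.
        boot = (1.0 - lam) * values_next[t] + lam * next_target
        targets[t] = rewards[t] + gamma * (1.0 - dones[t]) * boot
        next_target = targets[t]
    return targets
```

Running both agents on the same lambda = 0 targets then isolates the Munchausen term as the only difference.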
I have launched this just now. I guess I will modify the long test so that it repeats this experiment on multiple seeds. |
Agree for merging
A small comment, otherwise good for me, thanks!
I think that they do the same thing in two different ways. There are two major ways of doing target network updates: hard updates, where the online weights are copied into the target network every fixed number of steps (e.g. 8000), and soft updates, where the target weights are moved a small fraction (e.g. 0.005) toward the online weights at every step.
Our DQN is a bit weird because a lot of implementations don't have a chunk_size (and consequently TD(lambda) targets). I think both are fine.
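The two update styles can be sketched as follows (a generic illustration with hypothetical names, treating parameters as NumPy arrays; the 8000 and 0.005 defaults mirror the values discussed above, not this repo's code):

```python
import numpy as np

def hard_update(target_params, online_params, step, update_interval=8000):
    """Copy online weights into the target network every `update_interval` steps."""
    if step % update_interval == 0:
        for name in target_params:
            target_params[name] = online_params[name].copy()

def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: move target weights a small step toward online weights."""
    for name in target_params:
        target_params[name] = (1.0 - tau) * target_params[name] + tau * online_params[name]
```

A single `target_update_parameter` can thus mean either an interval (8000 steps) or a mixing coefficient (0.005), which explains why swapping the default between the two agents changes behavior so much.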
LGTM
Description
DQN vs MDQN on MountainCar as a long test.