
dqn vs mdqn #264

Merged — KohlerHECTOR merged 7 commits into main on Dec 20, 2022
Conversation

@KohlerHECTOR (Collaborator) commented Dec 17, 2022

Description

Adds a long test comparing DQN and M-DQN on MountainCar.

KohlerHECTOR added 3 commits December 17, 2022 12:47
pytest long_tests/torch_agent/ltest_dqn_vs_mdqn_montaincar.py
@KohlerHECTOR changed the title from "added mdqn in init" to "dqn vs mdqn" on Dec 17, 2022
@KohlerHECTOR (Collaborator, Author) commented Dec 18, 2022

I have now looked at the results of comparing DQN and M-DQN on MountainCar (4 fits of 1e5 steps each), and it seems M-DQN is not learning.
rewards.pdf
losses.pdf
eval.pdf
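
For reference, a comparison of this kind can be launched with rlberry's AgentManager. The sketch below is not the long test from this PR; it assumes the new agent is exported as MunchausenDQNAgent and that the writer logs a tag named "episode_rewards" (both are assumptions, not taken from this PR):

```python
# Minimal sketch: assumes rlberry's AgentManager API and that the M-DQN agent
# is exported as MunchausenDQNAgent (class name is an assumption).
from rlberry.envs import gym_make
from rlberry.manager import AgentManager, plot_writer_data
from rlberry.agents.torch import DQNAgent, MunchausenDQNAgent

env = (gym_make, dict(id="MountainCar-v0"))

# 4 independent fits of 1e5 steps each, as in the experiment above.
managers = [
    AgentManager(cls, env, fit_budget=100_000, n_fit=4, agent_name=name)
    for cls, name in [(DQNAgent, "DQN"), (MunchausenDQNAgent, "MDQN")]
]
for manager in managers:
    manager.fit()

# Compare training curves; the writer tag name may differ between versions.
plot_writer_data(managers, tag="episode_rewards")
```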

@AleShi94 (Collaborator)


Have you tried to run it on Acrobot? In the original paper they mostly show that M-DQN improves on DQN on Atari games.

@KohlerHECTOR (Collaborator, Author)


Hello, no. It would indeed be good to get a single good run on Acrobot. I will try it; it should not take too long.

@KohlerHECTOR (Collaborator, Author) commented Dec 19, 2022


Ok, it works well. I had to change the value of the hyperparameter "target_update_parameter" from 8000 to 0.005; 0.005 is the default value for this hyperparameter in base DQN, but the default in M-DQN is now 8000. Is that normal?
mdqn_acro_rewards.pdf
mdqn_acro_loss.pdf
mdqn_acro_eval.pdf

@KohlerHECTOR (Collaborator, Author)


Maybe we should set the default M-DQN hyperparameters to be the same as DQN's defaults?

@AleShi94 (Collaborator)

> Maybe we should set the default M-DQN hyperparameters to be the same as DQN's defaults?

I set the hyperparameters mentioned in their article as the defaults. But as I already mentioned, they mostly run experiments on Atari, so the right hyperparameters for smaller environments may well be different.

@AleShi94 (Collaborator)

Also, DQN uses TD(lambda) by default, while it is not implemented in M-DQN. According to what they claim in the article, it is not really required with the Munchausen trick. But still, maybe we should compare against DQN with lambda = 0?
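
For context, here is a minimal sketch of the two target types being discussed, in plain NumPy rather than rlberry's actual implementation (function names and shapes are illustrative); setting lam = 0 recovers the one-step target:

```python
import numpy as np

def one_step_targets(rewards, next_values, dones, gamma=0.99):
    """One-step TD targets: r_t + gamma * V(s_{t+1}), with V = max_a Q in DQN."""
    return rewards + gamma * (1.0 - dones) * next_values

def td_lambda_targets(rewards, next_values, dones, gamma=0.99, lam=0.5):
    """TD(lambda) targets computed backwards over a chunk of transitions.

    Uses G_t = r_t + gamma * [(1 - lam) * V(s_{t+1}) + lam * G_{t+1}],
    written below as the one-step target plus a lambda-weighted correction.
    """
    targets = np.zeros_like(rewards)
    next_return = next_values[-1]  # bootstrap for the last transition in the chunk
    for t in reversed(range(len(rewards))):
        one_step = rewards[t] + gamma * (1.0 - dones[t]) * next_values[t]
        targets[t] = one_step + gamma * lam * (1.0 - dones[t]) * (next_return - next_values[t])
        next_return = targets[t]
    return targets

# With lam = 0 both targets coincide.
r = np.array([0.0, 0.0, 1.0])
v = np.array([0.5, 0.7, 0.0])   # V(s_{t+1}) estimates
d = np.array([0.0, 0.0, 1.0])   # episode ends on the last transition
assert np.allclose(td_lambda_targets(r, v, d, lam=0.0), one_step_targets(r, v, d))
```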

@KohlerHECTOR (Collaborator, Author) commented Dec 19, 2022


Maybe, if we want to see how good this Munchausen trick is, we should indeed launch M-DQN vs DQN on Acrobot using DQN with lambda = 0, and DQN defaults where possible?

@TimotheeMathieu (Collaborator)


There are two views: either you say the honest thing to do is to compare both algorithms while changing only the Munchausen trick, or you compare the best tuning of both agents. I guess it depends who you ask.

@AleShi94 (Collaborator)

It could be an idea for another PR to compare the performance of DQN for different values of lambda; we don't really know how much non-zero lambdas add. But for this particular test I think it is enough to compare classical DQN (lambda = 0) against Munchausen DQN, with all other hyperparameters the same for both agents.

@KohlerHECTOR (Collaborator, Author) commented Dec 19, 2022


I have launched this just now.
Results:
mdqn_acro_loss.pdf

mdqn_acro_eval.pdf

mdqn_acro_rewards.pdf

I will modify the long test so that it repeats this experiment over multiple seeds, and then we should close this PR.
The Munchausen trick does not seem to improve much on Acrobot; in the next PR we should maybe try MinAtar.

@AleShi94 (Collaborator) left a review:

Agree for merging

@TimotheeMathieu (Collaborator) left a review:
A small comment, otherwise good for me, thanks!

@mmcenta (Collaborator) commented Dec 20, 2022

> Ok, it works well. I had to change the value of the hyperparameter "target_update_parameter" from 8000 to 0.005; 0.005 is the default value for this hyperparameter in base DQN, but the default in M-DQN is now 8000. Is that normal?

I think they are doing the same thing in two different ways. There are two main ways of updating the target network (a minimal sketch of both follows):

  1. Copy the weights periodically, i.e. every N timesteps copy the weights from the online network into the target network. In this case, it seems N = 8000.
  2. Do a soft update at every step, such that new_target_weights = (1 - c) * target_weights + c * online_weights, where c is a parameter functionally similar to N. In this case, c = 0.005.
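
A minimal sketch of the two schemes in PyTorch-style code (illustrative only, not rlberry's implementation; hard_update and soft_update are hypothetical helper names):

```python
import torch

def hard_update(online: torch.nn.Module, target: torch.nn.Module,
                step: int, period: int = 8000):
    """Periodic (hard) update: copy the online weights into the target every `period` steps."""
    if step % period == 0:
        target.load_state_dict(online.state_dict())

def soft_update(online: torch.nn.Module, target: torch.nn.Module, tau: float = 0.005):
    """Polyak (soft) update applied at every step:
    target <- (1 - tau) * target + tau * online."""
    with torch.no_grad():
        for p_target, p_online in zip(target.parameters(), online.parameters()):
            p_target.mul_(1.0 - tau).add_(tau * p_online)
```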

> Also, DQN uses TD(lambda) by default, while it is not implemented in M-DQN. [...] But still, maybe we should compare against DQN with lambda = 0?

Our DQN is a bit unusual because many implementations don't have a chunk_size (and consequently no TD(lambda) targets). I think both options are fine.

@mmcenta (Collaborator) left a review:
LGTM

@KohlerHECTOR KohlerHECTOR merged commit ddd9148 into main Dec 20, 2022
@KohlerHECTOR KohlerHECTOR deleted the mdqn_vs_dqn branch December 20, 2022 10:55