-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/Multiple trainers for MA-DDPG #253
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for this massive effort @DriesSmit 🙌 🏆 This is close to ready from my standpoint, I just had some comments and then we can do the benchmarks 🚀 🔥
) | ||
|
||
# Flush the writer. | ||
self._writer.flush(self._max_in_flight_items) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for certain adders, we previously flushed after every write , e.g. sequence adder (
Mava/mava/adders/reverb/sequence.py
Line 136 in cbdbf6a
self._writer.flush(self._max_in_flight_items) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great observation! I have not considered this. Flush seems to wait for all the data to be written. Please see here. So maybe its fine to do this only at the end of a trajectory step. Not sure. I am happy either way. I just don't want to make the writing even slower if we flush to often. But also if we need to flush after each write then we should change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I phrased my previous message incorrectly. The flush is after every create_item
, not write.
Let's do a flush after every create_item
. Acme did that recently across all writers (google-deepmind/acme@94711b1). Please check we do it for the case where it is not multiple trainers as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this back to how it was previously. I think doing it after every create_item
broke my training.
Trainer logging does not seem to be working for |
Fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice @DriesSmit! 🙌 Just see my few comments. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @DriesSmit!! 🔥
What?
Implements a scaled-up version of MADDPG where multiple trainers can now be used with multiple executors. A centralised variable server is also implemented that absorbs the responsibilities of the counter node, trainer checkpointing and trainer variable source. The trainers and executors now read and write to the centralised variable source directly. A multiple trainer example is included where 3 trainers and 2 executors are used to train 3 non-weight sharing agents on the debugging environment.
Why?
Multiple trainers allow for the parallelisation of the trainer's tasks. Just like is already done with executors. This also opens the door to hyperparameter tuning directly using Mava in future updates.
How?
Added a new Scaled MA-DDPG system that allows for the use of multiple trainers.
Extra
This PR uses changes proposed in updated-network-keys. Therefore that PR should be merged first. After that point, this PR can be moved out of the draft status.