feature: working version of importance sampling on feedforward madqn. #275
What?
Implemented importance sampling (IS) / prioritized experience replay for feedforward MADQN.
Why?
Prioritized replay with importance sampling replays high-error transitions more often, which will hopefully improve performance.
How?
After each learning step we compute new priorities for the samples we drew from the replay buffer. This is done using Q-value errors. The `mutate_priorities()` function is used to update the priorities in the Reverb table.
Extra
For now, IS only works in feedforward MADQN. I have also tested that the other systems that inherit from feedforward MADQN still work: recurrent MADQN, MADQN with comms, DIAL, VDN and QMIX all still run after these changes.
Because so many systems inherit from feedforward MADQN, it is quite hard to make changes without breaking the other systems. So my strategy moving forward is to make very incremental upgrades to MADQN and check at each step that none of the other systems break.
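For reviewers, the priority update described under "How?" can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names `td_error_priorities` and `update_priorities`, the `epsilon` offset, and the use of the absolute Q-value error as the priority are all assumptions. The only thing taken from the Reverb API is that the client exposes `mutate_priorities(table, updates)`, where `updates` maps sample keys to new priorities.

```python
from typing import Any, Dict, Sequence


def td_error_priorities(
    q_values: Sequence[float],
    targets: Sequence[float],
    epsilon: float = 1e-6,
) -> list:
    """Return one priority per sample: |target - q| plus a small offset.

    The offset (an assumption here) keeps every priority strictly positive,
    so no sample becomes impossible to draw again.
    """
    return [abs(t - q) + epsilon for q, t in zip(q_values, targets)]


def update_priorities(
    client: Any,
    table: str,
    keys: Sequence[int],
    q_values: Sequence[float],
    targets: Sequence[float],
) -> Dict[int, float]:
    """Recompute priorities for the sampled keys and push them to the table.

    `client` is assumed to behave like a reverb.Client, i.e. to provide
    mutate_priorities(table=..., updates={key: priority}).
    """
    priorities = td_error_priorities(q_values, targets)
    updates = {int(k): float(p) for k, p in zip(keys, priorities)}
    client.mutate_priorities(table=table, updates=updates)
    return updates
```

In the trainer this would be called once per learning step, with the keys of the sampled batch and the Q-values and targets that were already computed for the loss.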