You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is the Distributional Q-Value Actor currently fully supported? If so, are there any plans to integrate C51 and more importantly, Rainbow, to the list of sota-implementations?
The text was updated successfully, but these errors were encountered:
I have implemented a first version of Rainbow containing all tricks! (Dueling DQN, Distributional, Prioritized Experience, etc.) and I am now running some preliminary experiments to debug its performance and make sure it works well.
However, I had to change this line to Tz = reward + (1 - terminated.to(reward.dtype)) * discount.unsqueeze(-1) * support.repeat(batch_size, 1).
Otherwise I would get shape errors. Let me know if this makes sense!
Is the Distributional Q-Value Actor currently fully supported? If so, are there any plans to integrate C51 and more importantly, Rainbow, to the list of sota-implementations?
The text was updated successfully, but these errors were encountered: