-
-
Notifications
You must be signed in to change notification settings - Fork 411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorrect Results from get_last_reward() when Replaying LMP in Mode.PLAYER #412
Comments
Nice catch. Tested on Windows 10, Python 3.6.8 and ViZDoom 1.1.8pre and I am getting same results (does not work with PLAYER mode, but works with SPECTATOR mode). The sleep trick from #354 does not help here, although they might be related. |
This feels like a race condition and I think my assertion that "it works in Mode.SPECTATOR" is not correct. If I insert a sleep in the loop and print Example output with
However, if I do the same in Mode.player, the result is still wrong but it's "less wrong" since I get about the correct number of iterations (300) and Net, net, this feels like a race condition, but the effects are somewhat more limited in Mode.PLAYER. Though the living reward is effectively doubled. In terms of training this means that transitions from an expert trajectory replayed using this method actually look less appealing than those found randomly! |
Thanks for pointing this out! I first found it weird |
Well to be clear: I've experimented a bit and if I change ViZDoomGame.cpp as follows:
My belief is that for synchronous cases, it seems more correct for the DoomGame to get the last map tic executed by the user of the DoomGame using the value that the DoomController holds, rather than using DoomController::getMapTic which queries shared memory that may have been updated by Doom running in a separate thread. In testing of cases of human game play (SPECTATOR), .lmp playback (PLAYER) and agent play (PLAYER) I get the results I expect. However, given I don't really know the internals of ViZDoom, I'm not really sure whether this is a reasonable fix or not, but for now I will try this as a workaround. |
In the words of @mwydmuch , the whole project is a hack on top of ZDoom so solutions tend not to be too pretty. If this solution works reliably for you, it would be quite dandy if you could make a PR out of it :) |
Let me play with it a bit and if it works I will gladly submit a PR. Thanks! |
One more update: I think the below is an additional problem. The missing waitForDoomWork() causes the controller and VizDoom to become out of sync and subsequent TIC/UPDATE/TIC+UPDATE requests from the controller receive an acknowledgement that should have been paired with the prior message. The two changes I made seem to be working well for me but needs additional testing. diff --git a/src/lib/ViZDoomController.cpp b/src/lib/ViZDoomController.cpp
index daf6bba..faafbc8 100644
--- a/src/lib/ViZDoomController.cpp
+++ b/src/lib/ViZDoomController.cpp
@@ -441,6 +441,7 @@ namespace vizdoom {
// Workaround for some problems
this->sendCommand(std::string("map ") + this->map);
this->MQDoom->send(MSG_CODE_TIC);
+ this->waitForDoomWork();
this->sendCommand(std::string("playdemo ") + prepareLmpFilePath(demoPath)); |
Else DoomController becomes out of sync with Doom. See: Farama-Foundation#412
running synchronously. See Farama-Foundation#412.
Fixed by #415 |
Issue: Playback of .lmp using Mode.PLAYER results in game.get_last_reward() sometimes combining rewards of two time steps.
ViZDoom version 1.1.7
Platform: Ubuntu 18.04, Python 3.7.3
To reproduce:
This implies either a bug in the code, or that the
documentationexample which states that replay can be used in any mode should be updated to indicate Mode.SPECTATOR is required (and possibly a validation in the code that disallows replaying LMP files in Mode.PLAYER).Separately, the example also states that game.get_last_action() is not supported for replay, which it is now.
Thank you!
Example output when using Mode.PLAYER:
The text was updated successfully, but these errors were encountered: