-
Notifications
You must be signed in to change notification settings - Fork 724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Policy base invalid action mask #505
Conversation
…of complicating the info storage array
…ns, fix dimension of created mask in PPO2.
… variable for mb_action_mask
…Discrete([a,b,c]), fix distri test, cleaned up _train_step for supported algorithms
Squashed commits: [b661b29] Fix bug: action_space = Box crash [18bb1bb] fix action_mask_check bug [d69ca21] Use kwargs to pass the action_mask_ph parameter [370e031] Add default mask [10268b0] Fixed train_model action mask shape. [5ead377] Fix AttributeError: 'CategoricalProbabilityDistributionType' object has no attribute 'n_vec' [0187197] Restore ProbabilityDistributionType, overwrite proba_distribution_from_flat. [8dbb5ac] Bug fix: Environment does not return action mask, causing program crash [e06c4fe] Support action_space: Discrete [fb5ecea] support FeedForwardPolicy [9b2c2f5] Fix small amount invalid action problems [95bb5ca] Fixed neglogp is a nan error. But there will still be a small amount of invalid action. [2719987] why neglogp is nan? [1724767] Rename variable [47cc453] Override DQN action [5319f7b] Override actions
Although my code will merge with @H-Park in the future, I still hope that anyone can comment on me and tell me where I can improve. |
On my computer, all the tests passed. However, because I added a lot of action mask tests, Travis CI could not be completed smoothly. https://travis-ci.org/NTUT-SE-ST/stable-baselines/builds/596085033 |
In my honest opinion it does not make sense to have two PRs for exact same feature running concurrently. While I see your code does implement many of these things, so does #453 . I do not see how this could contribute to work being done there once it is finished. @araffin |
Travis CI has the message "The job exceeded the maximum time limit for jobs, and has been terminated." |
I totally agree with @Miffyli , one PR is enough. I would prefer that you help and contribute with @H-Park PR (#453 ) as we have been discussing with him for a while now.
You cannot do much with that, the test just took too long to run. Either run shorter test or split the test. |
OK, so I need close this pr? |
I merged the latest commit from master to my branch and the CI failed to pass. Therefore, I reopened this pr to check the code difference. |
# Conflicts: # stable_baselines/a2c/a2c.py # stable_baselines/acer/acer_simple.py # stable_baselines/acktr/acktr.py # stable_baselines/common/base_class.py # stable_baselines/common/misc_util.py # stable_baselines/common/runners.py # stable_baselines/ppo2/ppo2.py
# Conflicts: # stable_baselines/a2c/a2c.py # stable_baselines/acer/acer_simple.py # stable_baselines/acktr/acktr.py # stable_baselines/common/base_class.py # stable_baselines/common/misc_util.py # stable_baselines/common/runners.py # stable_baselines/ppo2/ppo2.py
352224f
to
073e394
Compare
proba_vals() takes 3 positional arguments but 4 were given
Currently support:
Algorithm: PPO1, PPO2, A2C, ACER, ACKTR, TRPO
Action_space: Discrete, MultiDiscrete
Policy Network: MlpPolicy, MlpLnLstmPolicy, MlpLstmPolicy
Policy Network(Theoretically supported, but not tested): CnnPolicy, CnnLnLstmPolicy, CnnLstmPolicy
Vectorized Environments: DummyVecEnv, SubprocVecEnv
How to use: Environment, Test