Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Policy base invalid action mask #505

Closed
wants to merge 80 commits into from

Conversation

ChengYen-Tang
Copy link

@ChengYen-Tang ChengYen-Tang commented Oct 10, 2019

Currently support:
Algorithm: PPO1, PPO2, A2C, ACER, ACKTR, TRPO
Action_space: Discrete, MultiDiscrete
Policy Network: MlpPolicy, MlpLnLstmPolicy, MlpLstmPolicy
Policy Network(Theoretically supported, but not tested): CnnPolicy, CnnLnLstmPolicy, CnnLstmPolicy
Vectorized Environments: DummyVecEnv, SubprocVecEnv

How to use: Environment, Test

H-Park and others added 22 commits August 23, 2019 15:31
…Discrete([a,b,c]), fix distri test, cleaned up _train_step for supported algorithms
Squashed commits:
[b661b29] Fix bug: action_space = Box crash
[18bb1bb] fix action_mask_check bug
[d69ca21] Use kwargs to pass the action_mask_ph parameter
[370e031] Add default mask
[10268b0] Fixed train_model action mask shape.
[5ead377] Fix AttributeError: 'CategoricalProbabilityDistributionType' object has no attribute 'n_vec'
[0187197] Restore ProbabilityDistributionType, overwrite proba_distribution_from_flat.
[8dbb5ac] Bug fix: Environment does not return action mask, causing program crash
[e06c4fe] Support action_space: Discrete
[fb5ecea] support FeedForwardPolicy
[9b2c2f5] Fix small amount invalid action problems
[95bb5ca] Fixed neglogp is a nan error. But there will still be a small amount of invalid action.
[2719987] why neglogp is nan?
[1724767] Rename variable
[47cc453] Override DQN action
[5319f7b] Override actions
@Miffyli
Copy link
Collaborator

Miffyli commented Oct 10, 2019

Is there something special here compared to #453 (Invalid action mask PR)? At a quick glance this is the code shared there as well, albeit slightly behind now since there has been new discussions and developments on #453.

@ChengYen-Tang
Copy link
Author

Although my code will merge with @H-Park in the future, I still hope that anyone can comment on me and tell me where I can improve.

@ChengYen-Tang
Copy link
Author

@Miffyli

On my computer, all the tests passed. However, because I added a lot of action mask tests, Travis CI could not be completed smoothly.
"The job exceeded the maximum time limit for jobs, and has been terminated."
Any suggestions?

https://travis-ci.org/NTUT-SE-ST/stable-baselines/builds/596085033

@Miffyli
Copy link
Collaborator

Miffyli commented Oct 10, 2019

In my honest opinion it does not make sense to have two PRs for exact same feature running concurrently. While I see your code does implement many of these things, so does #453 . I do not see how this could contribute to work being done there once it is finished.

@araffin
Care to throw your opinion here?

@ChengYen-Tang
Copy link
Author

ChengYen-Tang commented Oct 10, 2019

@Miffyli
Currently, I support the Lstm policy, #453 not yet supported. There is a slight speed advantage in my experimental environment (still more tests are needed).

@ChengYen-Tang
Copy link
Author

Travis CI has the message "The job exceeded the maximum time limit for jobs, and has been terminated."
Is there any solution?
https://travis-ci.org/hill-a/stable-baselines/jobs/596132109

@araffin
Copy link
Collaborator

araffin commented Oct 10, 2019

Care to throw your opinion here?

I totally agree with @Miffyli , one PR is enough. I would prefer that you help and contribute with @H-Park PR (#453 ) as we have been discussing with him for a while now.

"The job exceeded the maximum time limit for jobs, and has been terminated."
Any suggestions?

You cannot do much with that, the test just took too long to run. Either run shorter test or split the test.

@ChengYen-Tang
Copy link
Author

OK, so I need close this pr?

@ChengYen-Tang
Copy link
Author

I merged the latest commit from master to my branch and the CI failed to pass. Therefore, I reopened this pr to check the code difference.

@ChengYen-Tang ChengYen-Tang reopened this Nov 28, 2019
ChengYen-Tang and others added 14 commits December 4, 2019 23:46
# Conflicts:
# stable_baselines/a2c/a2c.py
# stable_baselines/acer/acer_simple.py
# stable_baselines/acktr/acktr.py
# stable_baselines/common/base_class.py
# stable_baselines/common/misc_util.py
# stable_baselines/common/runners.py
# stable_baselines/ppo2/ppo2.py
# Conflicts:
#	stable_baselines/a2c/a2c.py
#	stable_baselines/acer/acer_simple.py
#	stable_baselines/acktr/acktr.py
#	stable_baselines/common/base_class.py
#	stable_baselines/common/misc_util.py
#	stable_baselines/common/runners.py
#	stable_baselines/ppo2/ppo2.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants