Policy base invalid action mask #505

ChengYen-Tang · 2019-10-10T14:07:29Z

Currently supported:
Algorithms: PPO1, PPO2, A2C, ACER, ACKTR, TRPO
Action spaces: Discrete, MultiDiscrete
Policy networks: MlpPolicy, MlpLnLstmPolicy, MlpLstmPolicy
Policy networks (theoretically supported, but not tested): CnnPolicy, CnnLnLstmPolicy, CnnLstmPolicy
Vectorized environments: DummyVecEnv, SubprocVecEnv

How to use: Environment, Test
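
For readers new to the idea, the core of policy-based action masking can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code in this PR: the function name `masked_softmax` and the constant `NEG_FILL` are hypothetical, and the fill value of -999 is an assumption borrowed from the commit message "set the invalid logits to -inf(-999)". The penalty is added to the logits rather than multiplied in, because multiplying a logit by zero leaves it at 0, which still receives non-zero probability after the softmax.

```python
import math

# Assumption: a large negative constant stands in for -inf so that
# downstream log-probabilities stay finite.
NEG_FILL = -999.0


def masked_softmax(logits, action_mask):
    """Return action probabilities with invalid actions suppressed.

    logits      -- list of floats, one per discrete action
    action_mask -- list of 0/1 flags, 1 meaning the action is valid
    """
    if len(logits) != len(action_mask):
        raise ValueError("logits and action_mask must have the same length")
    if not any(action_mask):
        raise ValueError("at least one action must be valid")

    # Add a large negative value to every invalid logit.
    masked = [
        logit + (0.0 if valid else NEG_FILL)
        for logit, valid in zip(logits, action_mask)
    ]

    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(masked)
    exps = [math.exp(value - peak) for value in masked]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical usage: action 1 is invalid in the current state.
probs = masked_softmax([1.0, 2.0, 0.5], [1, 0, 1])
```

With this approach the masked action ends up with (numerically) zero probability, so sampling from `probs` should never select it, and the remaining probabilities are renormalized over the valid actions only.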

Squashed commits:
[b661b29] Fix bug: action_space = Box crash
[18bb1bb] fix action_mask_check bug
[d69ca21] Use kwargs to pass the action_mask_ph parameter
[370e031] Add default mask
[10268b0] Fixed train_model action mask shape.
[5ead377] Fix AttributeError: 'CategoricalProbabilityDistributionType' object has no attribute 'n_vec'
[0187197] Restore ProbabilityDistributionType, overwrite proba_distribution_from_flat.
[8dbb5ac] Bug fix: Environment does not return action mask, causing program crash
[e06c4fe] Support action_space: Discrete
[fb5ecea] support FeedForwardPolicy
[9b2c2f5] Fix small amount invalid action problems
[95bb5ca] Fixed neglogp is a nan error. But there will still be a small amount of invalid action.
[2719987] why neglogp is nan?
[1724767] Rename variable
[47cc453] Override DQN action
[5319f7b] Override actions

Miffyli · 2019-10-10T14:13:09Z

Is there something special here compared to #453 (Invalid action mask PR)? At a quick glance this is the code shared there as well, albeit slightly behind now since there has been new discussions and developments on #453.

ChengYen-Tang · 2019-10-10T14:17:45Z

Although my code will be merged with @H-Park's in the future, I still hope that anyone can comment on my code and tell me where I can improve it.

ChengYen-Tang · 2019-10-10T14:29:28Z

@Miffyli

On my computer, all the tests passed. However, because I added a lot of action mask tests, the Travis CI job could not finish:
"The job exceeded the maximum time limit for jobs, and has been terminated."
Any suggestions?

https://travis-ci.org/NTUT-SE-ST/stable-baselines/builds/596085033

Miffyli · 2019-10-10T15:02:20Z

In my honest opinion, it does not make sense to have two PRs for the exact same feature running concurrently. While I see your code does implement many of these things, so does #453. I do not see how this could contribute to the work being done there once it is finished.

@araffin
Care to throw your opinion here?

ChengYen-Tang · 2019-10-10T15:48:07Z

@Miffyli
Currently, my branch supports the LSTM policies, which #453 does not support yet. It also shows a slight speed advantage in my experimental environment (more tests are still needed).

ChengYen-Tang · 2019-10-10T16:20:32Z

Travis CI has the message "The job exceeded the maximum time limit for jobs, and has been terminated."
Is there any solution?
https://travis-ci.org/hill-a/stable-baselines/jobs/596132109

araffin · 2019-10-10T19:15:41Z

> Care to throw your opinion here?

I totally agree with @Miffyli, one PR is enough. I would prefer that you help and contribute to @H-Park's PR (#453), as we have been discussing it with him for a while now.

> "The job exceeded the maximum time limit for jobs, and has been terminated."
> Any suggestions?

You cannot do much about that; the tests simply took too long to run. Either run shorter tests or split the test suite.

ChengYen-Tang · 2019-11-12T10:58:04Z

OK, so do I need to close this PR?

ChengYen-Tang · 2019-11-28T16:12:54Z

I merged the latest commit from master into my branch and the CI failed. Therefore, I reopened this PR to check the code difference.

# Conflicts:
#   stable_baselines/a2c/a2c.py
#   stable_baselines/acer/acer_simple.py
#   stable_baselines/acktr/acktr.py
#   stable_baselines/common/base_class.py
#   stable_baselines/common/misc_util.py
#   stable_baselines/common/runners.py
#   stable_baselines/ppo2/ppo2.py

H-Park and others added 22 commits August 23, 2019 15:31

begin action mask

e9612c6

Remove print statement, added docstrings, use mb_action_mask instead of complicating the info storage array

7341a26

Make acer and acktr work

9ad9d6b

Merge branch 'master' into master

7ce9613

Remove redundant masking, pass around the mask to appropriate functions, fix dimension of created mask in PPO2.

8f1c0c8

Merge branch 'master' into master

3595fb3

Merge branch 'master' into master

57d3a9b

Merge branch 'master' into master

0dd3f50

Fix order of variables passed into reshape_action_mask, and use local variable for mb_action_mask

bd25101

tf.add instead of tf.multiply

41405bd

Support MultiDiscrete properly

1d895aa

Merge branch 'master' into master

57dd7d6

Test suite for action_mask

1e5e8f7

set the invalid logits to -inf(-999)

1457a9c

A2C, ACER, AKTOR support, cleaned up passing around action_mask

7dbd296

Declare action_mask_ph in common/distributions.py

1fead97

Finish moving action_mask_ph to common/distribution.py, support MultiDiscrete([a,b,c]), fix distri test, cleaned up _train_step for supported algorithms

1d7c142

Merge with hill-a:master, take advantage of placeholder_with_default

cd9da93

remove vestigial action mask parameter to proba_distribution_from_latent

03a0bb4

A2C, ACER, ACKTR support, add action mask test

61b811b

Restore dqn.py

74c902f

Update test method

15621cd

ChengYen-Tang added 3 commits November 13, 2019 00:53

neglogp use binary action masks

b77a8e0

Merge branch 'neglogp' into neglogp+entropy

980fd8b

Merge branch 'entropy' into neglogp+entropy

1a35e18

ChengYen-Tang closed this Nov 17, 2019

ChengYen-Tang added 2 commits November 28, 2019 18:58

Merge branch 'master' of https://github.com/hill-a/stable-baselines into hill-a-master

ee1c26f

Merge branch 'hill-a-master' into ActionMask

2875a61

ChengYen-Tang reopened this Nov 28, 2019

ChengYen-Tang and others added 14 commits December 4, 2019 23:46

fix load model bug: AttributeError: 'NoneType' object has no attribute 'action_space'

99c400d

Merge pull request #16 from hill-a/master

a0b7b18

Merge branch 'ActionMask' into neglogp

8b8f13f

Merge branch 'ActionMask' into entropy

7f21eda

Merge branch 'neglogp' into neglogp+entropy

1012025

Merge branch 'entropy' into neglogp+entropy

3fd974a

Merge pull request #18 from hill-a/master

a694404

Merge branch 'pr/19' into ActionMask

5bdf274

Fix CI 'Type Checking' failed

2910c7e

Merge pull request #22 from NTUT-SELab/ActionMask

1e3b7a9

Merge branch 'NeuralNetworkOutput' into neglogp+entropy

352224f

Merge branch 'NeuralNetworkOutput' into neglogp+entropy

073e394

ChengYen-Tang force-pushed the ActionMask branch from 352224f to 073e394 Compare April 9, 2020 15:47

ChengYen-Tang and others added 5 commits June 30, 2020 01:43

Update distributions.py comment

ef97e0a

Merge branch 'ActionMask' of https://github.com/NTUT-SELab/stable-baselines into ActionMask

d185e57

Merge branch 'master' into ActionMask

05431a1

Merge branch 'master' into ActionMask

cb4a24a

Fix test_log_prob.py FAILED

b2a6c65

proba_vals() takes 3 positional arguments but 4 were given

ChengYen-Tang closed this Oct 10, 2021
