Miscellaneous GAIL improvements and refactoring #133

Merged

shwang merged 23 commits into master from gail-reward-net-refactor

Jan 8, 2020

Member

qxcv commented Nov 30, 2019

This PR includes a set of changes that make GAIL more flexible & easier to use. The DAgger PR (#128) and the BC PR (#125) should be merged before this one, since the branch I'm pulling from is based on both of those. New changes here:

- Allows a custom discriminator model constructor to be passed to GAIL.
- Allows control over where `make_summary_writer` writes its logs, and makes `AdversarialTrainer` responsible for passing in that directory.
- Moves the unique output directory generation code that was in `make_summary_writer` into `init_trainer`, so that the resulting output directory can be used for logs and not just TB summaries.
- Removes the four `build_` methods from `DiscrimNet` (`build_{train_reward,test_reward,summaries,disc_loss}`, IIRC) and replaces them with one `build_graph` method. Also merges the two `build_` methods in `AdversarialTrainer`.
- Adds an SB `Logger` to `AdversarialTrainer` that records discriminator stats at each update. At the moment this only works for GAIL; the extension to AIRL is straightforward, but I don't have time to test it manually, so I haven't done it myself.
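
As a rough illustration of the first and fourth items, the sketch below shows the general pattern of passing a network constructor plus keyword arguments into a class that assembles everything in a single `build_graph` step. It is not the code from this PR: `DiscrimNetSketch`, `build_linear_discrim`, and `scale` are hypothetical names, plain Python lists stand in for TensorFlow tensors, and the reward sign convention is an arbitrary choice for the example.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical stand-in for a tensor: a plain list of floats.
Logits = List[float]
BuildFn = Callable[..., Logits]


def build_linear_discrim(
    obs: List[float], acts: List[float], *, scale: float = 1.0
) -> Logits:
    """Toy 'network': one logit per (observation, action) pair."""
    return [scale * (o + a) for o, a in zip(obs, acts)]


class DiscrimNetSketch:
    """Holds a constructor and its kwargs; builds everything in one place."""

    def __init__(
        self,
        build_discrim_net: BuildFn,
        build_discrim_net_kwargs: Optional[Dict[str, Any]] = None,
    ):
        self._build_discrim_net = build_discrim_net
        self._kwargs = dict(build_discrim_net_kwargs or {})

    def build_graph(self, obs: List[float], acts: List[float]) -> Dict[str, Logits]:
        # Single entry point, instead of several separate build_* methods.
        logits = self._build_discrim_net(obs, acts, **self._kwargs)
        # Arbitrary sign convention, chosen only for this sketch.
        rewards = [-x for x in logits]
        return {"logits": logits, "rewards": rewards}


net = DiscrimNetSketch(build_linear_discrim, {"scale": 2.0})
outputs = net.build_graph([1.0, 2.0], [0.5, -1.0])
```

Swapping in a different discriminator then only means passing a different callable and kwargs dict; the class itself does not change.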

Originally I was going to refactor DiscrimNet entirely so that it passes around Keras models instead of construction functions and kwargs, but I don't have time to do so at the moment (I've added it to the wishlist in #31).

qxcv added 12 commits

November 21, 2019 20:11


          Add ability to save policies with BC

0cc534d


          Also tests

d35f0fc


          Merge branch 'master' into save-bc-pols

4e0876c


          DAgger kind of works

bd50ce4


          Docstrings for DAgger

fbfc7e2


          Add tests (one still broken?)

03f6072


          Merge master

7cb2697


          Fix DAgger test

16c3ce8


          Fix type error in DAgger test

7a2150a


          Hopefully fix DAgger bug

67bbaf1


          Let DiscrimNet build custom network

32bae1b


          Merge branch 'dagger' into gail-reward-net-refactor

6fa152b

qxcv changed the title ~~GAIL reward net refactor~~ Custom reward nets for GAIL


          Re-structure code for generating TB log path

e39664a

qxcv changed the title ~~Custom reward nets for GAIL~~ Miscellaneous GAIL improvements and refactoring

codecov bot commented Dec 1, 2019

Codecov Report

Merging #133 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #133      +/-   ##
==========================================
- Coverage   87.14%   87.13%   -0.01%     
==========================================
  Files          60       60              
  Lines        4332     4329       -3     
==========================================
- Hits         3775     3772       -3     
  Misses        557      557

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/imitation/rewards/discrim_net.py | `97.9% <100%> (-0.05%)` | ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d2fd8cb...aac5457.

qxcv added 2 commits

December 1, 2019 15:49


          Refactor DiscrimNet build_* methods

6efa03a


          Log discriminator stats for GAIL

6ccbd4b

shwang suggested changes

Member

shwang left a comment

Thanks for the refactoring / ease-of-use changes, Sam. I have several questions about what to keep and what to drop when we merge with our custom logger in #135.

src/imitation/algorithms/adversarial.py

+                  mean_stats = {
+                    k: np.mean(v) for k, v in stat_dict_accum.items()
+                  }
+                  return mean_stats

Member

shwang Jan 1, 2020

Do you have any code that uses the mean_stats return value? I think that in a later commit I can cleanly merge all of the mean accumulating logic here with the mean-accumulating custom logger from #135.

The mean-accumulating logger gets to write the means to an SBLogger (whereas the means aren't logged here), so I'm wondering if you think we will still need to return mean_stats later on.

Member Author

qxcv Jan 1, 2020

I do, but I won't need it once #135 is merged. It's fine to take out.

Member Author

qxcv Jan 1, 2020

(I just added a comment saying you can remove it; I'll let you do the honours when you merge #135)
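
For readers following the thread, the accumulate-then-average logic under discussion can be sketched roughly as follows. This is a minimal standalone illustration, not the PR's code: `mean_of_stats` and the stat names `disc_loss` and `disc_acc` are made up, the numbers are invented, and `statistics.mean` stands in for `np.mean` to avoid a NumPy dependency.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List


def mean_of_stats(batches: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Accumulate per-update stats, then average each key."""
    stat_dict_accum: Dict[str, List[float]] = defaultdict(list)
    for stats in batches:
        for key, value in stats.items():
            stat_dict_accum[key].append(value)
    # Same shape as the reviewed snippet, with statistics.mean
    # used in place of np.mean.
    return {k: mean(v) for k, v in stat_dict_accum.items()}


# Invented values for two discriminator updates (hypothetical stat names).
updates = [
    {"disc_loss": 0.5, "disc_acc": 0.75},
    {"disc_loss": 0.25, "disc_acc": 1.0},
]
mean_stats = mean_of_stats(updates)
```

If a logger already records these means, as proposed for #135, the returned dict becomes redundant for callers, which is the point being made above.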

src/imitation/algorithms/adversarial.py

+                  log_fmts = [
+                    make_output_format(s, disc_log_dir) for s in log_fmt_strs
+                  ]
+                  self._disc_logger = Logger(disc_log_dir, log_fmts)

Member

shwang Jan 1, 2020

FYI, when I merge with #135 later, I probably won't use this second logger. Instead I'll use a "discrim" context.

This is fine by me for this PR though.

qxcv and others added 4 commits

January 1, 2020 14:17


          Update src/imitation/algorithms/adversarial.py

9745d4c

Co-Authored-By: Steven H. Wang


          Update src/imitation/rewards/discrim_net.py

b6a2807

Co-Authored-By: Steven H. Wang


          Merge branch 'master' into gail-reward-net-refactor

ad94c32


          Fix issues with GAIL improvement PR

e4be624

qxcv force-pushed the gail-reward-net-refactor branch from 3e49a29 to e4be624

January 1, 2020 23:22


          Pytype error suppression for dynamic attributes

4d3a2fd

qxcv requested a review from shwang

January 1, 2020 23:32

shwang reviewed

src/imitation/rewards/discrim_net.py Outdated

+                          construction of the discriminator network, and a `tf.Tensor`
+                          representing the desired discriminator logits.
+                      build_discrim_net_kwargs: optional extra keyword arguments for
+                          `build_discrim_net()`.

Member

shwang Jan 1, 2020

Now that we have the build_discrim_net_kwargs pattern, could we make build_mlp_discrim_net into a function rather than a class?

Not sure if pytype would be happy with Callable + arbitrary kwargs. (It looks like newer versions of Python will have a more precise Protocol type for defining callable types: https://stackoverflow.com/a/57840786/1091722)
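
A minimal sketch of the idea, assuming Python 3.8 or later where `typing.Protocol` is available: a callback protocol can describe "a callable taking two inputs plus arbitrary keyword arguments", and a plain function can satisfy it without a functor class. The signature given to `build_mlp_discrim_net` here, the `BuildDiscrimNet` protocol, `call_builder`, and the `bias` keyword are all hypothetical, and whether pytype accepts this particular pattern has not been checked.

```python
from typing import Any, List, Protocol


class BuildDiscrimNet(Protocol):
    """Callable type: two positional inputs plus arbitrary keyword arguments."""

    def __call__(
        self, obs: List[float], acts: List[float], **kwargs: Any
    ) -> List[float]:
        ...


def build_mlp_discrim_net(
    obs: List[float], acts: List[float], **kwargs: Any
) -> List[float]:
    """Plain function in place of a functor class; the body is a toy placeholder."""
    bias = kwargs.get("bias", 0.0)
    return [o * a + bias for o, a in zip(obs, acts)]


def call_builder(build: BuildDiscrimNet, **build_kwargs: Any) -> List[float]:
    """Forward the extra kwargs to whichever builder was supplied."""
    return build([1.0, 2.0], [3.0, 4.0], **build_kwargs)


logits = call_builder(build_mlp_discrim_net, bias=0.5)
```

Compared with `Callable[..., List[float]]`, the protocol keeps the two positional parameters typed while still allowing extra keyword arguments.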

shwang suggested changes

Member

shwang left a comment

I'd like to get rid of the functor class, but otherwise this is looking good.

shwang and others added 2 commits

January 1, 2020 16:11


          minor whitespace fixes

d2fd8cb


          Merge branch 'master' into gail-reward-net-refactor

b8e8a65


          Change functor to function

aac5457

Member Author

qxcv commented Jan 8, 2020

Done! Sorry for the delay, I forgot that I still had a change to make.

qxcv requested a review from shwang

January 8, 2020 00:47

shwang approved these changes

Member

shwang left a comment

LGTM, thanks

shwang merged commit 7f01e0e into master

shwang deleted the gail-reward-net-refactor branch

January 8, 2020 01:33

shwang mentioned this pull request

discrim: Share logging logic between GAIL & AIRL #156

Merged
