Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moving DataPipe buffers from __iter__ to instance (self) #388

Closed
wants to merge 8 commits into from

Conversation

NivekT
Copy link
Contributor

@NivekT NivekT commented May 6, 2022

Stack from ghstack:

This PR impacts ParagraphAggregator and IterKeyZipper as they previously have their buffer with the iterator rather than the instance.

This should be landed along with pytorch/pytorch#76999 and pytorch/pytorch#77775

Note that the CI is expected to fail until that PR goes into nightly

Differential Revision: D36519522

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 6, 2022
NivekT added a commit that referenced this pull request May 6, 2022
ghstack-source-id: 43ba829e0675da98fe2a8c4c73efd0dd6c66edaf
Pull Request resolved: #388
@NivekT NivekT marked this pull request as draft May 6, 2022 22:12
This PR impacts `ParagraphAggregator` and `IterKeyZipper` as they previously have their buffer with the iterator rather than the instance.

This should be landed along with pytorch/pytorch#76999

Note that the CI is expected to fail until that PR lands


[ghstack-poisoned]
NivekT added a commit that referenced this pull request May 6, 2022
ghstack-source-id: 0be96c0bb28e2959ae4fd015d16c419c3abd6015
Pull Request resolved: #388
This PR impacts `ParagraphAggregator` and `IterKeyZipper` as they previously have their buffer with the iterator rather than the instance.

This should be landed along with pytorch/pytorch#76999

Note that the CI is expected to fail until that PR lands


[ghstack-poisoned]
NivekT added a commit that referenced this pull request May 18, 2022
ghstack-source-id: 827de5290b332ab5171f044a9714450169333c74
Pull Request resolved: #388
@NivekT NivekT marked this pull request as ready for review May 18, 2022 21:35
@NivekT NivekT requested review from VitalyFedyunin and ejguan May 18, 2022 21:35
This PR impacts `ParagraphAggregator` and `IterKeyZipper` as they previously have their buffer with the iterator rather than the instance.

This should be landed along with pytorch/pytorch#76999 and pytorch/pytorch#77775 

Note that the CI is expected to fail until that PR goes into nightly


[ghstack-poisoned]
NivekT added a commit that referenced this pull request May 18, 2022
ghstack-source-id: 643d13c60d8c8fa21e68be23b371d5ba33df896e
Pull Request resolved: #388
Comment on lines 111 to 112
if self.buffer:
self.buffer.clear()
Copy link
Contributor Author

@NivekT NivekT May 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting enough the domain CI test for Vision started to fail for a previous commit of this PR. It is raising "ResourceWarning" as you can see in the link below.

https://github.com/pytorch/data/runs/6498262553?check_suite_focus=true

Perhaps we need clear the buffer when the iterator is exhausted? I am adding this line to see if the issue is resolved. Let me know if you have any other thoughts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tried and clearing the buffer at the end of __iter__ doesn't help.

@pmeier Do you think this is related to pytest or something that is introduced in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, here is the thread where we discussed a similar issue:
pytorch/vision#5801

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to ignore the warning and land this based on the previous investigation.

This PR impacts `ParagraphAggregator` and `IterKeyZipper` as they previously have their buffer with the iterator rather than the instance.

This should be landed along with pytorch/pytorch#76999 and pytorch/pytorch#77775 

Note that the CI is expected to fail until that PR goes into nightly


[ghstack-poisoned]
NivekT added a commit that referenced this pull request May 18, 2022
ghstack-source-id: 827de5290b332ab5171f044a9714450169333c74
Pull Request resolved: #388
This PR impacts `ParagraphAggregator` and `IterKeyZipper` as they previously have their buffer with the iterator rather than the instance.

This should be landed along with pytorch/pytorch#76999 and pytorch/pytorch#77775 

Note that the CI is expected to fail until that PR goes into nightly


[ghstack-poisoned]
NivekT added a commit that referenced this pull request May 19, 2022
ghstack-source-id: c4a55ad83139e8bb7d32881d2bc5675b0392197b
Pull Request resolved: #388
This PR impacts `ParagraphAggregator` and `IterKeyZipper` as they previously have their buffer with the iterator rather than the instance.

This should be landed along with pytorch/pytorch#76999 and pytorch/pytorch#77775 

Note that the CI is expected to fail until that PR goes into nightly


[ghstack-poisoned]
NivekT added a commit that referenced this pull request May 19, 2022
ghstack-source-id: 2c51e5c0253a2305705ae97e8710116f93fcd1c5
Pull Request resolved: #388
@NivekT
Copy link
Contributor Author

NivekT commented May 19, 2022

@NivekT has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@ejguan ejguan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
We can still discuss if we want to serialize/deserialize buffer in getstate and setstate later in your snapshotting design.

This PR impacts `ParagraphAggregator` and `IterKeyZipper` as they previously have their buffer with the iterator rather than the instance.

This should be landed along with pytorch/pytorch#76999 and pytorch/pytorch#77775 

Note that the CI is expected to fail until that PR goes into nightly

Differential Revision: [D36519522](https://our.internmc.facebook.com/intern/diff/D36519522)

[ghstack-poisoned]
NivekT added a commit that referenced this pull request May 19, 2022
ghstack-source-id: c74a7afe7ffbfb82ee99521d3144153ea29759b7
Pull Request resolved: #388
@NivekT
Copy link
Contributor Author

NivekT commented May 19, 2022

@NivekT has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot facebook-github-bot deleted the gh/NivekT/77/head branch May 23, 2022 14:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants