-
Notifications
You must be signed in to change notification settings - Fork 23.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DataLoader] Adding ability to use dill to pass DataPipes in mutiprocessing #77288
[DataLoader] Adding ability to use dill to pass DataPipes in mutiprocessing #77288
Conversation
…essing [ghstack-poisoned]
🔗 Helpful links
❌ 6 New Failures, 16 PendingAs of commit 6cd977e (more details on the Dr. CI page): Expand to see more
🕵️ 6 new failures recognized by patternsThe following CI failures do not appear to be due to upstream breakages: pull / linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge) (1/6)Step: "Test" (full log | diagnosis details | 🔁 rerun)
|
…in mutiprocessing" [ghstack-poisoned]
…in mutiprocessing" [ghstack-poisoned]
…in mutiprocessing" [ghstack-poisoned]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a couple comments below!
try: | ||
import dill | ||
# XXX: By default, dill writes the Pickler dispatch table to inject its | ||
# own logic there. This globally affects the behavior of the standard library | ||
# pickler for any user who transitively depends on this module! | ||
# Undo this extension to avoid altering the behavior of the pickler globally. | ||
dill.extend(use_dill=False) | ||
HAS_DILL = True | ||
except ImportError: | ||
HAS_DILL = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, we can reuse the code from here:
pytorch/torch/utils/data/_utils/serialization.py
Lines 5 to 15 in 50efa3a
try: | |
import dill | |
# XXX: By default, dill writes the Pickler dispatch table to inject its | |
# own logic there. This globally affects the behavior of the standard library | |
# pickler for any user who transitively depends on this module! | |
# Undo this extension to avoid altering the behavior of the pickler globally. | |
dill.extend(use_dill=False) | |
DILL_AVAILABLE = True | |
except ImportError: | |
DILL_AVAILABLE = False |
from torch.utils.data._utils.serialization import DILL_AVAILABLE
@@ -265,3 +278,43 @@ def __iter__(self) -> Iterator[T]: | |||
def raw_iterator(self) -> T: # type: ignore[misc] | |||
for i in self.items: | |||
yield i | |||
|
|||
class TravelingDataPipe: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we rename it to something like _DataPipeSerializationWrapper
? With DataPipe
at the end of this class, we might give users a wrong hint. And, make it private (underscore) might be even better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the renaming, we might want to rename TravelingIterDataPipe
and TravelingMapDataPipe
too, especially if these names show up in the DataPipe graph and users want to know what they are
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will rename, but names will not show in the graph (maximum in error trace) as they are applied inside DLv1 only.
@@ -265,3 +278,43 @@ def __iter__(self) -> Iterator[T]: | |||
def raw_iterator(self) -> T: # type: ignore[misc] | |||
for i in self.items: | |||
yield i | |||
|
|||
class TravelingDataPipe: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the renaming, we might want to rename TravelingIterDataPipe
and TravelingMapDataPipe
too, especially if these names show up in the DataPipe graph and users want to know what they are
…in mutiprocessing" Fixes part of pytorch/data#397 (only DataLoader v2 remaining) [ghstack-poisoned]
…essing ghstack-source-id: 825f016a89ee75157c1ab57a97dfe11fd1f1fff8 Pull Request resolved: #77288
…in mutiprocessing" Fixes part of pytorch/data#397 (only DataLoader v2 remaining) [ghstack-poisoned]
…in mutiprocessing" Fixes part of pytorch/data#397 (only DataLoader v2 remaining) [ghstack-poisoned]
…essing ghstack-source-id: e3d89fc416b67658474007f63d83312db20aa50c Pull Request resolved: #77288
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
…in mutiprocessing" Fixes part of pytorch/data#397 (only DataLoader v2 remaining) Differential Revision: [D36347737](https://our.internmc.facebook.com/intern/diff/D36347737) [ghstack-poisoned]
…in mutiprocessing" Fixes part of pytorch/data#397 (only DataLoader v2 remaining) Differential Revision: [D36347737](https://our.internmc.facebook.com/intern/diff/D36347737) [ghstack-poisoned]
I think users can also use the same pattern here if they ever want to have custom serialization logic? I am anticipating that they can follow this pattern, implement a DataPipe with custom |
…in mutiprocessing" Fixes part of pytorch/data#397 (only DataLoader v2 remaining) Differential Revision: [D36347737](https://our.internmc.facebook.com/intern/diff/D36347737) [ghstack-poisoned]
…essing ghstack-source-id: ac892a042d83b749beeb9bc55b643f1795f31462 Pull Request resolved: #77288
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
Summary: X-link: pytorch/pytorch#77288 Fixes part of #397 (only DataLoader v2 remaining) Reviewed By: NivekT, ejguan Differential Revision: D36347737 Pulled By: VitalyFedyunin fbshipit-source-id: 144d149f1da98b7108f54e9dcecda0f5740d1eb9
@pytorchbot merge this (Initiating merge automatically since Phabricator Diff has merged) |
…essing (#77288) Summary: Pull Request resolved: #77288 Fixes part of pytorch/data#397 (only DataLoader v2 remaining) Test Plan: Imported from OSS Reviewed By: NivekT, ejguan Differential Revision: D36347737 Pulled By: VitalyFedyunin fbshipit-source-id: 144d149f1da98b7108f54e9dcecda0f5740d1eb9
try: | ||
new_datapipe = dill.loads(dill.dumps(datapipe)) | ||
except Exception as de: | ||
Exception('Unable to dill DataPipe to make thread local copy', de) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this exception be raised?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I will follow-up with quick patch.
Stack from ghstack:
Fixes part of pytorch/data#397 (only DataLoader v2 remaining)
Differential Revision: D36347737