Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DataLoader] Adding ability to use dill to pass DataPipes in mutiprocessing #77288

Closed

Conversation

VitalyFedyunin
Copy link
Contributor

@VitalyFedyunin VitalyFedyunin commented May 11, 2022

Stack from ghstack:

Fixes part of pytorch/data#397 (only DataLoader v2 remaining)

Differential Revision: D36347737

@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented May 11, 2022

🔗 Helpful links

❌ 6 New Failures, 16 Pending

As of commit 6cd977e (more details on the Dr. CI page):

Expand to see more
  • 6/6 failures introduced in this PR

🕵️ 6 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build pull / linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge) (1/6)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-05-12T18:57:59.0619602Z FAIL [1.187s]: test_correct_module_names (__main__.TestPublicBindings)
2022-05-12T18:57:58.9526622Z   test_correct_module_names (__main__.TestPublicBindings)
2022-05-12T18:57:59.0042590Z An API is considered public, if  its  `__module__` starts with `torch.` ... FAIL (0.054s)
2022-05-12T18:57:59.0043159Z     test_correct_module_names failed - num_retries_left: 1
2022-05-12T18:57:59.0069187Z   test_correct_module_names (__main__.TestPublicBindings)
2022-05-12T18:57:59.0583087Z An API is considered public, if  its  `__module__` starts with `torch.` ... FAIL (0.054s)
2022-05-12T18:57:59.0583516Z     test_correct_module_names failed - num_retries_left: 0
2022-05-12T18:57:59.0612750Z   test_no_new_bindings (__main__.TestPublicBindings)
2022-05-12T18:57:59.0618738Z This test aims to stop the introduction of new JIT bindings into torch._C ... ok (0.003s)
2022-05-12T18:57:59.0619122Z 
2022-05-12T18:57:59.0619273Z ======================================================================
2022-05-12T18:57:59.0619602Z FAIL [1.187s]: test_correct_module_names (__main__.TestPublicBindings)
2022-05-12T18:57:59.0620081Z An API is considered public, if  its  `__module__` starts with `torch.`
2022-05-12T18:57:59.0620749Z ----------------------------------------------------------------------
2022-05-12T18:57:59.0621203Z Traceback (most recent call last):
2022-05-12T18:57:59.0621868Z   File "test_public_bindings.py", line 394, in test_correct_module_names
2022-05-12T18:57:59.0622344Z     self.assertTrue(not failure_list, msg)
2022-05-12T18:57:59.0623426Z AssertionError: False is not true : All the APIs below do not meet our guidelines for public API from https://github.com/pytorch/pytorch/wiki/Public-API-definition-and-documentation.
2022-05-12T18:57:59.0624172Z Make sure that everything that is public is expected (in particular that the module has a properly populated `__all__` attribute) and that everything that is supposed to be public does look public (it does not start with `_` and has a `__module__` that is properly populated).
2022-05-12T18:57:59.0624537Z 
2022-05-12T18:57:59.0624604Z Full list:
2022-05-12T18:57:59.0624849Z # torch.utils.data.IterDataPipeSerializationWrapper:

See GitHub Actions build pull / linux-bionic-py3.7-clang9 / test (crossref, 2, 2, linux.2xlarge) (2/6)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-05-12T19:07:36.5532279Z FAIL [1.384s]: test_correct_module_names (__main__.TestPublicBindings)
2022-05-12T19:07:36.4451458Z   test_correct_module_names (__main__.TestPublicBindings)
2022-05-12T19:07:36.4960550Z An API is considered public, if  its  `__module__` starts with `torch.` ... FAIL (0.053s)
2022-05-12T19:07:36.4961130Z     test_correct_module_names failed - num_retries_left: 1
2022-05-12T19:07:36.4985694Z   test_correct_module_names (__main__.TestPublicBindings)
2022-05-12T19:07:36.5494217Z An API is considered public, if  its  `__module__` starts with `torch.` ... FAIL (0.053s)
2022-05-12T19:07:36.5494942Z     test_correct_module_names failed - num_retries_left: 0
2022-05-12T19:07:36.5524654Z   test_no_new_bindings (__main__.TestPublicBindings)
2022-05-12T19:07:36.5530672Z This test aims to stop the introduction of new JIT bindings into torch._C ... ok (0.003s)
2022-05-12T19:07:36.5531663Z 
2022-05-12T19:07:36.5531932Z ======================================================================
2022-05-12T19:07:36.5532279Z FAIL [1.384s]: test_correct_module_names (__main__.TestPublicBindings)
2022-05-12T19:07:36.5532824Z An API is considered public, if  its  `__module__` starts with `torch.`
2022-05-12T19:07:36.5533512Z ----------------------------------------------------------------------
2022-05-12T19:07:36.5533774Z Traceback (most recent call last):
2022-05-12T19:07:36.5534179Z   File "test_public_bindings.py", line 394, in test_correct_module_names
2022-05-12T19:07:36.5534507Z     self.assertTrue(not failure_list, msg)
2022-05-12T19:07:36.5535433Z AssertionError: False is not true : All the APIs below do not meet our guidelines for public API from https://github.com/pytorch/pytorch/wiki/Public-API-definition-and-documentation.
2022-05-12T19:07:36.5536339Z Make sure that everything that is public is expected (in particular that the module has a properly populated `__all__` attribute) and that everything that is supposed to be public does look public (it does not start with `_` and has a `__module__` that is properly populated).
2022-05-12T19:07:36.5536713Z 
2022-05-12T19:07:36.5536782Z Full list:
2022-05-12T19:07:36.5537031Z # torch.utils.data.IterDataPipeSerializationWrapper:

See GitHub Actions build pull / linux-xenial-py3.7-clang7-asan / test (default, 4, 5, linux.2xlarge) (3/6)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-05-12T19:14:46.1457177Z FAIL [2.016s]: test_correct_module_names (__main__.TestPublicBindings)
2022-05-12T19:14:46.0016988Z   test_correct_module_names (__main__.TestPublicBindings)
2022-05-12T19:14:46.0648818Z An API is considered public, if  its  `__module__` starts with `torch.` ... FAIL (0.066s)
2022-05-12T19:14:46.0649324Z     test_correct_module_names failed - num_retries_left: 1
2022-05-12T19:14:46.0682205Z   test_correct_module_names (__main__.TestPublicBindings)
2022-05-12T19:14:46.1406500Z An API is considered public, if  its  `__module__` starts with `torch.` ... FAIL (0.076s)
2022-05-12T19:14:46.1406976Z     test_correct_module_names failed - num_retries_left: 0
2022-05-12T19:14:46.1446815Z   test_no_new_bindings (__main__.TestPublicBindings)
2022-05-12T19:14:46.1455381Z This test aims to stop the introduction of new JIT bindings into torch._C ... ok (0.005s)
2022-05-12T19:14:46.1456325Z 
2022-05-12T19:14:46.1456481Z ======================================================================
2022-05-12T19:14:46.1457177Z FAIL [2.016s]: test_correct_module_names (__main__.TestPublicBindings)
2022-05-12T19:14:46.1457734Z An API is considered public, if  its  `__module__` starts with `torch.`
2022-05-12T19:14:46.1459304Z ----------------------------------------------------------------------
2022-05-12T19:14:46.1459556Z Traceback (most recent call last):
2022-05-12T19:14:46.1459840Z   File "test_public_bindings.py", line 394, in test_correct_module_names
2022-05-12T19:14:46.1460112Z     self.assertTrue(not failure_list, msg)
2022-05-12T19:14:46.1460687Z AssertionError: False is not true : All the APIs below do not meet our guidelines for public API from https://github.com/pytorch/pytorch/wiki/Public-API-definition-and-documentation.
2022-05-12T19:14:46.1461340Z Make sure that everything that is public is expected (in particular that the module has a properly populated `__all__` attribute) and that everything that is supposed to be public does look public (it does not start with `_` and has a `__module__` that is properly populated).
2022-05-12T19:14:46.1461722Z 
2022-05-12T19:14:46.1461776Z Full list:
2022-05-12T19:14:46.1462034Z # torch.utils.data.IterDataPipeSerializationWrapper:

See GitHub Actions build pull / linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge) (4/6)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-05-12T18:58:49.2457356Z FAIL [1.350s]: test_correct_module_names (__main__.TestPublicBindings)
2022-05-12T18:58:49.1346542Z   test_correct_module_names (__main__.TestPublicBindings)
2022-05-12T18:58:49.1871032Z An API is considered public, if  its  `__module__` starts with `torch.` ... FAIL (0.055s)
2022-05-12T18:58:49.1871551Z     test_correct_module_names failed - num_retries_left: 1
2022-05-12T18:58:49.1895799Z   test_correct_module_names (__main__.TestPublicBindings)
2022-05-12T18:58:49.2418452Z An API is considered public, if  its  `__module__` starts with `torch.` ... FAIL (0.055s)
2022-05-12T18:58:49.2418962Z     test_correct_module_names failed - num_retries_left: 0
2022-05-12T18:58:49.2449678Z   test_no_new_bindings (__main__.TestPublicBindings)
2022-05-12T18:58:49.2456004Z This test aims to stop the introduction of new JIT bindings into torch._C ... ok (0.004s)
2022-05-12T18:58:49.2456471Z 
2022-05-12T18:58:49.2456957Z ======================================================================
2022-05-12T18:58:49.2457356Z FAIL [1.350s]: test_correct_module_names (__main__.TestPublicBindings)
2022-05-12T18:58:49.2458000Z An API is considered public, if  its  `__module__` starts with `torch.`
2022-05-12T18:58:49.2458670Z ----------------------------------------------------------------------
2022-05-12T18:58:49.2459074Z Traceback (most recent call last):
2022-05-12T18:58:49.2459393Z   File "test_public_bindings.py", line 394, in test_correct_module_names
2022-05-12T18:58:49.2459678Z     self.assertTrue(not failure_list, msg)
2022-05-12T18:58:49.2460410Z AssertionError: False is not true : All the APIs below do not meet our guidelines for public API from https://github.com/pytorch/pytorch/wiki/Public-API-definition-and-documentation.
2022-05-12T18:58:49.2461053Z Make sure that everything that is public is expected (in particular that the module has a properly populated `__all__` attribute) and that everything that is supposed to be public does look public (it does not start with `_` and has a `__module__` that is properly populated).
2022-05-12T18:58:49.2461424Z 
2022-05-12T18:58:49.2461491Z Full list:
2022-05-12T18:58:49.2461749Z # torch.utils.data.IterDataPipeSerializationWrapper:

See GitHub Actions build pull / pytorch-xla-linux-bionic-py3.7-clang8 / test (xla, 1, 1, linux.2xlarge) (5/6)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-05-12T18:21:49.7268700Z TypeError: run_gen...got an unexpected keyword argument 'get_device_fn'
2022-05-12T18:21:47.5128969Z + python setup.py install
2022-05-12T18:21:48.4646926Z No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
2022-05-12T18:21:48.4758551Z Building torch_xla version: 1.12
2022-05-12T18:21:48.4759005Z XLA Commit ID: 7a306610e754785a046d8ba58121701430b943bb
2022-05-12T18:21:48.4759286Z PyTorch Commit ID: 6cd977e20ba8081032784161c55c88ebddf53f2f
2022-05-12T18:21:48.4823663Z /var/lib/jenkins/workspace /var/lib/jenkins/workspace/xla
2022-05-12T18:21:49.6321914Z /var/lib/jenkins/workspace/xla
2022-05-12T18:21:49.7267090Z Traceback (most recent call last):
2022-05-12T18:21:49.7267581Z   File "/var/lib/jenkins/workspace/xla/scripts/gen_lazy_tensor.py", line 84, in <module>
2022-05-12T18:21:49.7268158Z     get_device_fn="torch_xla::bridge::GetXlaDevice")
2022-05-12T18:21:49.7268700Z TypeError: run_gen_lazy_tensor() got an unexpected keyword argument 'get_device_fn'
2022-05-12T18:21:49.7358204Z Failed to generate lazy files: ['python', '/var/lib/jenkins/workspace/xla/scripts/gen_lazy_tensor.py']
2022-05-12T18:21:49.9362197Z + cleanup
2022-05-12T18:21:49.9362577Z + retcode=1
2022-05-12T18:21:49.9362775Z + set +x
2022-05-12T18:21:49.9395327Z ##[error]Process completed with exit code 1.
2022-05-12T18:21:49.9492186Z ##[group]Run pytorch/pytorch/.github/actions/get-workflow-job-id@master
2022-05-12T18:21:49.9492443Z with:
2022-05-12T18:21:49.9492853Z   github-token: ***
2022-05-12T18:21:49.9493023Z env:
2022-05-12T18:21:49.9493164Z   IN_CI: 1

See GitHub Actions build pull / linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge) (6/6)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-05-12T18:55:14.4329653Z FAIL [1.323s]: test_correct_module_names (__main__.TestPublicBindings)
2022-05-12T18:55:14.3246017Z   test_correct_module_names (__main__.TestPublicBindings)
2022-05-12T18:55:14.3753003Z An API is considered public, if  its  `__module__` starts with `torch.` ... FAIL (0.053s)
2022-05-12T18:55:14.3753467Z     test_correct_module_names failed - num_retries_left: 1
2022-05-12T18:55:14.3778187Z   test_correct_module_names (__main__.TestPublicBindings)
2022-05-12T18:55:14.4291873Z An API is considered public, if  its  `__module__` starts with `torch.` ... FAIL (0.054s)
2022-05-12T18:55:14.4292381Z     test_correct_module_names failed - num_retries_left: 0
2022-05-12T18:55:14.4321972Z   test_no_new_bindings (__main__.TestPublicBindings)
2022-05-12T18:55:14.4328759Z This test aims to stop the introduction of new JIT bindings into torch._C ... ok (0.004s)
2022-05-12T18:55:14.4329080Z 
2022-05-12T18:55:14.4329230Z ======================================================================
2022-05-12T18:55:14.4329653Z FAIL [1.323s]: test_correct_module_names (__main__.TestPublicBindings)
2022-05-12T18:55:14.4330254Z An API is considered public, if  its  `__module__` starts with `torch.`
2022-05-12T18:55:14.4330887Z ----------------------------------------------------------------------
2022-05-12T18:55:14.4331159Z Traceback (most recent call last):
2022-05-12T18:55:14.4331413Z   File "test_public_bindings.py", line 394, in test_correct_module_names
2022-05-12T18:55:14.4331679Z     self.assertTrue(not failure_list, msg)
2022-05-12T18:55:14.4332267Z AssertionError: False is not true : All the APIs below do not meet our guidelines for public API from https://github.com/pytorch/pytorch/wiki/Public-API-definition-and-documentation.
2022-05-12T18:55:14.4333216Z Make sure that everything that is public is expected (in particular that the module has a properly populated `__all__` attribute) and that everything that is supposed to be public does look public (it does not start with `_` and has a `__module__` that is properly populated).
2022-05-12T18:55:14.4333595Z 
2022-05-12T18:55:14.4333667Z Full list:
2022-05-12T18:55:14.4333929Z # torch.utils.data.IterDataPipeSerializationWrapper:

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@VitalyFedyunin VitalyFedyunin requested review from NivekT and ejguan May 11, 2022 20:51
Copy link
Contributor

@ejguan ejguan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a couple comments below!

test/test_datapipe.py Show resolved Hide resolved
test/test_datapipe.py Show resolved Hide resolved
Comment on lines +7 to +16
try:
import dill
# XXX: By default, dill writes the Pickler dispatch table to inject its
# own logic there. This globally affects the behavior of the standard library
# pickler for any user who transitively depends on this module!
# Undo this extension to avoid altering the behavior of the pickler globally.
dill.extend(use_dill=False)
HAS_DILL = True
except ImportError:
HAS_DILL = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, we can reuse the code from here:

try:
import dill
# XXX: By default, dill writes the Pickler dispatch table to inject its
# own logic there. This globally affects the behavior of the standard library
# pickler for any user who transitively depends on this module!
# Undo this extension to avoid altering the behavior of the pickler globally.
dill.extend(use_dill=False)
DILL_AVAILABLE = True
except ImportError:
DILL_AVAILABLE = False

from torch.utils.data._utils.serialization import DILL_AVAILABLE

@@ -265,3 +278,43 @@ def __iter__(self) -> Iterator[T]:
def raw_iterator(self) -> T: # type: ignore[misc]
for i in self.items:
yield i

class TravelingDataPipe:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we rename it to something like _DataPipeSerializationWrapper? With DataPipe at the end of this class, we might give users a wrong hint. And, make it private (underscore) might be even better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the renaming, we might want to rename TravelingIterDataPipe and TravelingMapDataPipe too, especially if these names show up in the DataPipe graph and users want to know what they are

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will rename, but names will not show in the graph (maximum in error trace) as they are applied inside DLv1 only.

@@ -265,3 +278,43 @@ def __iter__(self) -> Iterator[T]:
def raw_iterator(self) -> T: # type: ignore[misc]
for i in self.items:
yield i

class TravelingDataPipe:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the renaming, we might want to rename TravelingIterDataPipe and TravelingMapDataPipe too, especially if these names show up in the DataPipe graph and users want to know what they are

@NivekT NivekT added module: dataloader Related to torch.utils.data.DataLoader and Sampler release notes: dataloader release notes category topic: improvements topic category labels May 11, 2022
…in mutiprocessing"


Fixes part of pytorch/data#397 (only DataLoader v2 remaining)

[ghstack-poisoned]
VitalyFedyunin added a commit that referenced this pull request May 11, 2022
…essing

ghstack-source-id: 825f016a89ee75157c1ab57a97dfe11fd1f1fff8
Pull Request resolved: #77288
…in mutiprocessing"


Fixes part of pytorch/data#397 (only DataLoader v2 remaining)

[ghstack-poisoned]
…in mutiprocessing"


Fixes part of pytorch/data#397 (only DataLoader v2 remaining)

[ghstack-poisoned]
VitalyFedyunin added a commit that referenced this pull request May 12, 2022
…essing

ghstack-source-id: e3d89fc416b67658474007f63d83312db20aa50c
Pull Request resolved: #77288
@VitalyFedyunin
Copy link
Contributor Author

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

…in mutiprocessing"


Fixes part of pytorch/data#397 (only DataLoader v2 remaining)

Differential Revision: [D36347737](https://our.internmc.facebook.com/intern/diff/D36347737)

[ghstack-poisoned]
…in mutiprocessing"


Fixes part of pytorch/data#397 (only DataLoader v2 remaining)

Differential Revision: [D36347737](https://our.internmc.facebook.com/intern/diff/D36347737)

[ghstack-poisoned]
@NivekT
Copy link
Contributor

NivekT commented May 12, 2022

I think users can also use the same pattern here if they ever want to have custom serialization logic? I am anticipating that they can follow this pattern, implement a DataPipe with custom __getstate__ and __setstate__, put that at the end of the pipeline, and the rest should work.

…in mutiprocessing"


Fixes part of pytorch/data#397 (only DataLoader v2 remaining)

Differential Revision: [D36347737](https://our.internmc.facebook.com/intern/diff/D36347737)

[ghstack-poisoned]
VitalyFedyunin added a commit that referenced this pull request May 12, 2022
…essing

ghstack-source-id: ac892a042d83b749beeb9bc55b643f1795f31462
Pull Request resolved: #77288
@VitalyFedyunin
Copy link
Contributor Author

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@VitalyFedyunin
Copy link
Contributor Author

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit to pytorch/data that referenced this pull request May 15, 2022
Summary:
X-link: pytorch/pytorch#77288

Fixes part of #397 (only DataLoader v2 remaining)

Reviewed By: NivekT, ejguan

Differential Revision: D36347737

Pulled By: VitalyFedyunin

fbshipit-source-id: 144d149f1da98b7108f54e9dcecda0f5740d1eb9
@facebook-github-bot
Copy link
Contributor

@pytorchbot merge this

(Initiating merge automatically since Phabricator Diff has merged)

facebook-github-bot pushed a commit that referenced this pull request May 15, 2022
…essing (#77288)

Summary:
Pull Request resolved: #77288

Fixes part of pytorch/data#397 (only DataLoader v2 remaining)

Test Plan: Imported from OSS

Reviewed By: NivekT, ejguan

Differential Revision: D36347737

Pulled By: VitalyFedyunin

fbshipit-source-id: 144d149f1da98b7108f54e9dcecda0f5740d1eb9
@facebook-github-bot facebook-github-bot deleted the gh/VitalyFedyunin/178/head branch May 19, 2022 14:16
try:
new_datapipe = dill.loads(dill.dumps(datapipe))
except Exception as de:
Exception('Unable to dill DataPipe to make thread local copy', de)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this exception be raised?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I will follow-up with quick patch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla signed Merged module: dataloader Related to torch.utils.data.DataLoader and Sampler release notes: dataloader release notes category topic: improvements topic category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants