Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2/n] Add setting function to set seeds to the graph #894

Closed
wants to merge 4 commits into from

Conversation

ejguan
Copy link
Contributor

@ejguan ejguan commented Nov 14, 2022

Changes

  • Update list_dps with a new argument (exclusive) to make sure all DataPipes will be excluded even though they are the predecessors of the non-excluded DataPipe.
  • Update list_dps to use BFS to return the list of DataPipes.
  • Add _set_worker_seed_for_dp_graph to set seeds for DataPipes before sharding_filter with the same seed generator and set different seeds for DataPipes after sharding_filter with the worker-local seed generator.

Step 2 for #885

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 14, 2022
@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

torchdata/dataloader2/graph/utils.py Outdated Show resolved Hide resolved
return datapipe


def _set_graph_seeds(datapipe: DataPipe, seed_generator: torch.Generator, worker_id: int = 0) -> DataPipe:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am using a private name because I don't like the name of this function. LMK if you have any better idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. _set_worker_seed
  2. _set_worker_seed_for_dp_graph

@ejguan ejguan requested a review from NivekT November 15, 2022 15:17
@ejguan ejguan added the topic: new feature topic category label Nov 15, 2022
Copy link
Contributor

@NivekT NivekT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! A few comments:

torchdata/dataloader2/graph/settings.py Show resolved Hide resolved
return datapipe


def _set_graph_seeds(datapipe: DataPipe, seed_generator: torch.Generator, worker_id: int = 0) -> DataPipe:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. _set_worker_seed
  2. _set_worker_seed_for_dp_graph

torchdata/dataloader2/graph/utils.py Outdated Show resolved Hide resolved
@ejguan ejguan force-pushed the shuffle_after_sharding branch from db08b88 to fd7808a Compare November 16, 2022 20:05
@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@NivekT NivekT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BFS looks good to me. Can we add a test that confirms the result's ordering? (It might already exist.)

@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. topic: new feature topic category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants