Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sharding for multi-GPU training #1634

Open
albertz opened this issue Oct 15, 2024 · 2 comments
Open

Sharding for multi-GPU training #1634

albertz opened this issue Oct 15, 2024 · 2 comments
Assignees

Comments

@albertz
Copy link
Member

albertz commented Oct 15, 2024

Now, as a follow-up of #1630: a very nice next step/feature would be if we can use this sharding feature in general for any kind of multi GPU training. Similar as the horovod_dataset_distribution="shard" option we had for TF (which was implemented very inefficiently though, by just selecting every Nth seq from the dataset, i.e. the dataset still iterated through all the data). So maybe, to distinguish or making it more explicit where the sharding is done, we should not just call this "shard", but "dataset_sharding" or so.

We should also not reuse the horovod_dataset_distribution option (which is intended only for Horovod), but maybe generically distributed_dataset_distribution or so? Or it could be part of torch_distributed, just dataset_distribution in it? (In principle, we could reuse the feature later also for TF or other backend engines. But having it currently in torch_distributed is also fine.)

The dataset_distribution default would be "random_seed_offset" (i.e. like horovod_dataset_distribution="random_seed_offset"), which is the current behavior of PyTorch distributed training. (We could change the default via a new behavior version if we want to...)

(Also note, similar comment as I made in #1612: There are some implicit assumptions here: That the worker index and rank is static. This might not always be the case. But it might be possible to just update the shard index / num shards dynamically for the next sub-epoch. Just to keep this in mind. I don't think we need to take care of this now.)

@NeoLegends
Copy link
Member

The global option dataset_distribution=shard would supersede the distrib_shard_files option from DistributeFilesDataset(doing exactly the same), right?

@albertz
Copy link
Member Author

albertz commented Jan 16, 2025

The global option dataset_distribution=shard would supersede the distrib_shard_files option from DistributeFilesDataset(doing exactly the same), right?

Yes I think so. In DistributeFilesDataset, instead of distrib_shard_files: bool = False, we should use distrib_shard_files: Optional[bool] = None, and throw an error if this clashes with the global option dataset_distribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants