
Please allow different batch sizes per gpu in ddp #15573

Open

mosheliv opened this issue Nov 7, 2022 · 8 comments
Labels: question (Further information is requested), won't fix (This will not be worked on)

Comments


mosheliv commented Nov 7, 2022

🚀 Feature

I propose accepting, instead of a single batch size, a dictionary mapping each GPU to its own batch size, for example {"cuda:0": 4, "cuda:1": 6}.
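A rough sketch of the proposed configuration (hypothetical, not an existing Lightning or PyTorch option):

```python
# Hypothetical API sketch -- not something Lightning supports today.
# Instead of a single integer, the batch size becomes a per-device mapping:
per_gpu_batch_size = {"cuda:0": 4, "cuda:1": 6}

# Each DDP process would then resolve its own entry, e.g. by local rank:
# batch_size = per_gpu_batch_size[f"cuda:{local_rank}"]
```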

Motivation

I have a GV100 (32 GB) and a 3090 (24 GB). Using the current multi-GPU strategies, I can only use 24 GB of the GV100's memory.

Pitch

Explained above.

Alternatives

Automatic batch-size tuning for multi-GPU setups, with a different batch size on each GPU, would also be really nice.

Additional context



@mosheliv mosheliv added the needs triage Waiting to be triaged by maintainers label Nov 7, 2022
@yichaoshen-MS

Can I ask a question: these days, if I set batch_size=1 and Trainer(strategy="ddp", gpus=4), will the total batch size be 4 or 1?

mosheliv (Author) commented Nov 8, 2022 via email

tchaton (Contributor) commented Nov 8, 2022

Hey @mosheliv,

Technically, nothing prevents you from doing this directly within your dataloader by providing a different batch_size.

However, you would need a custom distributed sampler to take this into account.

The distributed sampler would need to ensure that each batch is seen by only a single process and that the total number of batches across all machines is the same.

Best,
T.C
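A minimal sketch of the kind of sampler described above, assuming two ranks with batch sizes 4 and 6 (class and variable names are illustrative, not part of Lightning):

```python
import torch.distributed as dist
from torch.utils.data import DataLoader, Sampler

class UnevenBatchDistributedSampler(Sampler):
    """Gives each rank a different batch size while keeping the number of
    batches identical across ranks and never handing the same sample to
    two ranks. Assumes the process group is already initialized."""

    def __init__(self, dataset, batch_sizes):
        self.rank = dist.get_rank()
        self.batch_sizes = batch_sizes              # e.g. [4, 6] for 2 ranks
        self.samples_per_step = sum(batch_sizes)    # samples consumed per global step
        self.num_batches = len(dataset) // self.samples_per_step

    def __iter__(self):
        # Each rank takes its own contiguous slice of every "global step"
        # worth of indices, so slices never overlap between ranks.
        offset = sum(self.batch_sizes[: self.rank])
        bs = self.batch_sizes[self.rank]
        for step in range(self.num_batches):
            start = step * self.samples_per_step + offset
            yield from range(start, start + bs)

    def __len__(self):
        return self.num_batches * self.batch_sizes[self.rank]

# Illustrative usage: every rank yields the same number of batches, rank 0
# with batch size 4 and rank 1 with batch size 6.
# batch_sizes = [4, 6]
# sampler = UnevenBatchDistributedSampler(train_set, batch_sizes)
# loader = DataLoader(train_set,
#                     batch_size=batch_sizes[dist.get_rank()],
#                     sampler=sampler, shuffle=False)
```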

mosheliv (Author) commented Nov 8, 2022 via email

tchaton (Contributor) commented Nov 8, 2022

Hey @mosheliv,

When using DDP, each process is associated with its own GPU and loads its own batches, so the batch_size is local. The total batch size is batch_size * world_size. This is the default behaviour in PyTorch and we kept it this way.

When using DP, however, there is a single process: it loads a single batch and scatters it across all GPUs.

> one machine with two GPUs

Yes, for your own use case and using DDP, there are 2 processes running since you have 2 GPUs.

If you provide a different batch_size between ranks, the ranks are going to see duplicated data. It might not be a problem from a convergence point of view, though, if your dataset is large and you use the same batch size in validation and test.

I hope it helps a bit.
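For completeness, a minimal sketch of what the reply above describes for today's API (class and dict names are illustrative): pick the per-rank batch size inside the dataloader hook, accepting the duplicated-data caveat unless you also add a custom sampler like the one sketched earlier.

```python
import pytorch_lightning as pl
import torch.distributed as dist
from torch.utils.data import DataLoader

# Illustrative per-rank batch sizes, e.g. rank 0 on the GV100 (32 GB) and
# rank 1 on the 3090 (24 GB).
PER_RANK_BATCH_SIZE = {0: 6, 1: 4}

class UnevenBatchDataModule(pl.LightningDataModule):
    def __init__(self, train_set):
        super().__init__()
        self.train_set = train_set

    def train_dataloader(self):
        rank = dist.get_rank() if dist.is_initialized() else 0
        # Under DDP each process builds its own dataloader, so this batch
        # size is local to the rank; the global batch size is the sum over
        # ranks (batch_size * world_size in the usual single-integer case).
        # Note: Lightning still injects its default DistributedSampler here,
        # so with unequal batch sizes the ranks can end up seeing duplicated
        # data, as mentioned in the comment above.
        return DataLoader(self.train_set,
                          batch_size=PER_RANK_BATCH_SIZE[rank],
                          shuffle=True)
```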

mosheliv (Author) commented Nov 9, 2022 via email

mosheliv (Author) commented Nov 9, 2022 via email

@awaelchli awaelchli added question Further information is requested and removed needs triage Waiting to be triaged by maintainers labels Nov 10, 2022

stale bot commented Apr 14, 2023

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions - the Lightning Team!

@stale stale bot added the won't fix This will not be worked on label Apr 14, 2023