Add option to stagger uploads based on local rank (mosaicml#3275)
dakinggg authored May 10, 2024
1 parent d895d56 commit 9d8f1c0
Showing 1 changed file with 8 additions and 0 deletions.
composer/loggers/remote_uploader_downloader.py (8 additions, 0 deletions)
@@ -698,4 +698,12 @@ def upload_file():
file_queue.task_done()
completed_queue.put_nowait(remote_file_name)

# When encountering issues with too much concurrency in uploads, staggering the uploads can help.
# This stagger is intended for use when uploading model shards from every rank, and will effectively reduce
# the concurrency by a factor of num GPUs per node.
local_rank = dist.get_local_rank()
local_rank_stagger = int(os.environ.get('COMPOSER_LOCAL_RANK_STAGGER_SECONDS', 0))
log.debug(f'Staggering uploads by {local_rank * local_rank_stagger} seconds on {local_rank} local rank.')
time.sleep(local_rank * local_rank_stagger)

upload_file()
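
A minimal sketch, not Composer code, of the delay arithmetic the added lines perform: each process sleeps `local_rank * COMPOSER_LOCAL_RANK_STAGGER_SECONDS` seconds before its first upload attempt, and the variable defaults to 0 (no stagger). The helpers `stagger_delay_seconds`, `delay_schedule`, and `peak_concurrency` are hypothetical names for illustration. `peak_concurrency` assumes an idealized model in which every upload starts exactly at its delay and takes a fixed `duration`. Under that assumption, a stagger at least as long as one upload leaves at most one upload in flight per node, which is the "factor of num GPUs per node" reduction the comment describes.

```python
import os
from typing import List, Mapping


def stagger_delay_seconds(local_rank: int, env: Mapping[str, str] = os.environ) -> int:
    """Seconds a given local rank sleeps before uploading (mirrors the diff)."""
    # Same parsing as the diff: int() of the env var, defaulting to 0 when unset.
    stagger = int(env.get('COMPOSER_LOCAL_RANK_STAGGER_SECONDS', 0))
    return local_rank * stagger


def delay_schedule(gpus_per_node: int, stagger: int) -> List[int]:
    """Delays for every local rank on one node, for a given stagger value."""
    env = {'COMPOSER_LOCAL_RANK_STAGGER_SECONDS': str(stagger)}
    return [stagger_delay_seconds(rank, env) for rank in range(gpus_per_node)]


def peak_concurrency(starts: List[int], duration: int) -> int:
    """Max uploads in flight at once, assuming each takes `duration` seconds.

    Each upload occupies the half-open interval [start, start + duration).
    The in-flight count only increases at a start time, so checking those
    instants is enough to find the maximum.
    """
    if not starts:
        return 0
    return max(sum(1 for s in starts if s <= t < s + duration) for t in starts)


if __name__ == '__main__':
    # 8 GPUs per node, 5-second stagger: rank 0 starts at once, rank 7 waits 35 s.
    print(delay_schedule(8, 5))  # [0, 5, 10, 15, 20, 25, 30, 35]
    # Variable unset: no rank waits.
    print([stagger_delay_seconds(r, {}) for r in range(4)])  # [0, 0, 0, 0]
    # Idealized 10-second uploads on one node.
    print(peak_concurrency(delay_schedule(8, 0), 10))   # 8 (no stagger)
    print(peak_concurrency(delay_schedule(8, 5), 10))   # 2 (stagger < duration)
    print(peak_concurrency(delay_schedule(8, 10), 10))  # 1 (stagger >= duration)
```

Because the diff parses the variable with `int()`, a fractional value such as `2.5` raises `ValueError` rather than producing a sub-second stagger; whole seconds are the safe choice.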
