[Accelerate model loading] Fix meta device and super low memory usage #1016
Conversation
The tests are currently failing on main.

Also, this PR renames cuda_with_minimal_gpu_usage to enable_sequential_cpu_offload, as it's a more fitting name, and disentangles enable_attention_slicing from cpu_offload.

Related original PR: #850

@piEsposito does this work for you?
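For context, a minimal sketch of what sequential CPU offload can look like when built on accelerate's cpu_offload hook. The submodule names (unet, text_encoder, vae) and the exact wiring are assumptions for illustration, not necessarily what this PR ships:

    # Sketch only: sequential CPU offload via accelerate's cpu_offload hook.
    # Assumes a pipeline exposing `unet`, `text_encoder`, and `vae` submodules.
    import torch
    from accelerate import cpu_offload

    def enable_sequential_cpu_offload(self):
        device = torch.device("cuda")
        # Each submodule stays in CPU RAM and is copied to the GPU only for
        # the duration of its own forward pass, then offloaded again.
        for module in [self.unet, self.text_encoder, self.vae]:
            if module is not None:
                cpu_offload(module, execution_device=device)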
@@ -487,71 +483,3 @@ def test_ddpm_ddim_equality_batched(self):

        # the values aren't exactly equal, but the images look the same visually
        assert np.abs(ddpm_images - ddim_images).max() < 1e-1

    @require_torch_gpu
    def test_stable_diffusion_accelerate_load_works(self):
this test doesn't do anything so let's delete it
The documentation is not available anymore as the PR was closed or merged.
thanks for fixing this, looks good to me!
@@ -119,14 +119,13 @@ def disable_attention_slicing(self):
        # set slice_size = `None` to disable `attention slicing`
        self.enable_attention_slicing(None)

-    def cuda_with_minimal_gpu_usage(self):
+    def enable_sequential_cpu_offload(self):
Great name choice!
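For readers following the rename, a hedged usage sketch of the new API (the attention-slicing call illustrates the disentangling mentioned in the description; exact arguments are illustrative):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16
    )
    # Previously: pipe.cuda_with_minimal_gpu_usage()
    pipe.enable_sequential_cpu_offload()
    # Attention slicing is now an independent toggle:
    pipe.enable_attention_slicing()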
        pipeline_id = "CompVis/stable-diffusion-v1-4"

        start_time = time.time()
        pipeline_normal_load = StableDiffusionPipeline.from_pretrained(
            pipeline_id, revision="fp16", torch_dtype=torch.float16, use_auth_token=True
        )
        pipeline_normal_load.to(torch_device)
        normal_load_time = time.time() - start_time

        start_time = time.time()
        _ = StableDiffusionPipeline.from_pretrained(
            pipeline_id, revision="fp16", torch_dtype=torch.float16, use_auth_token=True, device_map="auto"
        )
        meta_device_load_time = time.time() - start_time

        assert 2 * meta_device_load_time < normal_load_time
Very cool!
@patrickvonplaten great naming choice, I love it!
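The speedup asserted in the test comes from meta-device initialization: with device_map="auto", accelerate first builds the model with parameters on the meta device (shapes and dtypes only, no allocation, no random init), then materializes real tensors straight from the checkpoint. A self-contained illustration, with an arbitrary layer size:

    import torch.nn as nn
    from accelerate import init_empty_weights

    # Under init_empty_weights, modules are created on the "meta" device:
    # no memory is allocated and no random initialization work is done.
    with init_empty_weights():
        big = nn.Linear(8192, 8192)

    print(big.weight.device)  # -> device(type='meta')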
…huggingface#1016) * [Accelerate model loading] Fix meta device and super low memory usage * better naming