
how to avoid model offloading/reloading #2311

Closed
1 task done
badayvedat opened this issue Feb 20, 2024 · 10 comments
Labels
question Further information is requested

@badayvedat

Read Troubleshoot

  • I confirm that I have read the Troubleshoot guide before making this issue.

Describe the problem
Fooocus reloads the model and offloads its clones on every inference request, even when the request is identical to the previous one. Notice the "[Fooocus Model Management] Moving model(s) has taken <x> seconds" line in the second and third inference logs.

Full Console Log

Total VRAM 81051 MB, total RAM 1814211 MB
Set vram state to: HIGH_VRAM
Device: cuda:0 NVIDIA A100-SXM4-80GB : native
VAE dtype: torch.bfloat16
Using pytorch cross attention
/root/.cache/isolate/virtualenv/e92a635dd941079698e1bee142ebe7517b06330053a4306442cc7f043ef113ae/lib/python3.11/site-packages/transformers/models/clip/feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
  warnings.warn(
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids'}
loaded straight to GPU
Requested to load SDXL
Loading 1 new model
Base model loaded: /data/fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/data/fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [/data/fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/data/fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.38 seconds


### First Request ###
Enter LCM mode.
[Fooocus] Downloading LCM components ...
[Parameters] Adaptive CFG = 1.0
[Parameters] Sharpness = 0.0
[Parameters] ADM Scale = 1.0 : 1.0 : 0.0
[Parameters] CFG = 1.0
[Parameters] Seed = 176400
[Fooocus] Downloading control models ...
[Fooocus] Loading control models ...
[Parameters] Sampler = lcm - lcm
[Parameters] Steps = 8 - 8
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0], ('sdxl_lcm_lora.safetensors', 1.0)] for model [/data/fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [/data/fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/data/fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Loaded LoRA [/data/fooocus/models/loras/sdxl_lcm_lora.safetensors] for UNet [/data/fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 1.0.
Requested to load SDXLClipModel
Loading 1 new model
unload clone 1
[Fooocus Model Management] Moving model(s) has taken 0.94 seconds
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] an astronaut in the jungle, cold color palette with butterflies in the background, highly detailed, 8k, professional composition, breathtaking, beautiful, sharp focus, cinematic, intricate, elegant, rich deep colors, perfect, aesthetic, very inspirational, sincere, cute, magical, fine detail, full strong creative, positive, vibrant, colorful, coherent, whole complex artistic, great thought
[Fooocus] Encoding positive #1 ...
[Fooocus] Image processing ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1024, 1024)
Preparation time: 5.48 seconds
Using lcm scheduler.
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.39970144629478455, sigma_max = 14.614640235900879
Requested to load SDXL
Loading 1 new model
unload clone 2
[Fooocus Model Management] Moving model(s) has taken 0.44 seconds
  0%|          | 0/8 [00:00<?, ?it/s]
 12%|█▎        | 1/8 [00:00<00:03,  1.82it/s]
 38%|███▊      | 3/8 [00:00<00:01,  4.99it/s]
 62%|██████▎   | 5/8 [00:00<00:00,  7.33it/s]
 88%|████████▊ | 7/8 [00:01<00:00,  9.03it/s]
100%|██████████| 8/8 [00:01<00:00,  7.37it/s]
Requested to load AutoencoderKL
Loading 1 new model
Image generated with private log at: /data/fooocus/outputs/2024-02-20/log.html
Generating and saving time: 3.03 seconds

### Second Request ###
Enter LCM mode.
[Fooocus] Downloading LCM components ...
[Parameters] Adaptive CFG = 1.0
[Parameters] Sharpness = 0.0
[Parameters] ADM Scale = 1.0 : 1.0 : 0.0
[Parameters] CFG = 1.0
[Parameters] Seed = 176400
[Fooocus] Downloading control models ...
[Fooocus] Loading control models ...
[Parameters] Sampler = lcm - lcm
[Parameters] Steps = 8 - 8
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] an astronaut in the jungle, cold color palette with butterflies in the background, highly detailed, 8k, professional composition, breathtaking, beautiful, sharp focus, cinematic, intricate, elegant, rich deep colors, perfect, aesthetic, very inspirational, sincere, cute, magical, fine detail, full strong creative, positive, vibrant, colorful, coherent, whole complex artistic, great thought
[Fooocus] Encoding positive #1 ...
[Fooocus] Image processing ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1024, 1024)
Preparation time: 0.63 seconds
Using lcm scheduler.
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.43672478199005127, sigma_max = 14.614640235900879
Requested to load SDXL
Loading 1 new model
unload clone 3
[Fooocus Model Management] Moving model(s) has taken 0.36 seconds    <----- Shouldn't have moved any models
  0%|          | 0/8 [00:00<?, ?it/s]
 25%|██▌       | 2/8 [00:00<00:00, 12.06it/s]
 50%|█████     | 4/8 [00:00<00:00, 12.65it/s]
 75%|███████▌  | 6/8 [00:00<00:00, 12.66it/s]
100%|██████████| 8/8 [00:00<00:00, 12.61it/s]
100%|██████████| 8/8 [00:00<00:00, 12.57it/s]
Image generated with private log at: /data/fooocus/outputs/2024-02-20/log.html
Generating and saving time: 2.25 seconds

### Third Request ###
Enter LCM mode.
[Fooocus] Downloading LCM components ...
[Parameters] Adaptive CFG = 1.0
[Parameters] Sharpness = 0.0
[Parameters] ADM Scale = 1.0 : 1.0 : 0.0
[Parameters] CFG = 1.0
[Parameters] Seed = 176400
[Fooocus] Downloading control models ...
[Fooocus] Loading control models ...
[Parameters] Sampler = lcm - lcm
[Parameters] Steps = 8 - 8
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] an astronaut in the jungle, cold color palette with butterflies in the background, highly detailed, 8k, professional composition, breathtaking, beautiful, sharp focus, cinematic, intricate, elegant, rich deep colors, perfect, aesthetic, very inspirational, sincere, cute, magical, fine detail, full strong creative, positive, vibrant, colorful, coherent, whole complex artistic, great thought
[Fooocus] Encoding positive #1 ...
[Fooocus] Image processing ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1024, 1024)
Preparation time: 0.68 seconds
Using lcm scheduler.
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.43672478199005127, sigma_max = 14.614640235900879
Requested to load SDXL
Loading 1 new model
unload clone 3
[Fooocus Model Management] Moving model(s) has taken 0.37 seconds    <----- Shouldn't have moved any models
  0%|          | 0/8 [00:00<?, ?it/s]
 25%|██▌       | 2/8 [00:00<00:00, 12.13it/s]
 50%|█████     | 4/8 [00:00<00:00, 12.33it/s]
 75%|███████▌  | 6/8 [00:00<00:00, 12.48it/s]
100%|██████████| 8/8 [00:00<00:00, 12.43it/s]
100%|██████████| 8/8 [00:00<00:00, 12.40it/s]
Image generated with private log at: /data/fooocus/outputs/2024-02-20/log.html
Generating and saving time: 2.26 seconds

cc: @isidentical

@mashb1t
Collaborator

mashb1t commented Feb 20, 2024

holy smokes, Total VRAM 81051 MB, total RAM 1814211 MB!

You can disable offloading with --disable-offload-from-vram or better in your case use always GPU with --always-gpu

@mashb1t mashb1t closed this as completed Feb 20, 2024
@mashb1t mashb1t added the question Further information is requested label Feb 20, 2024
@isidentical

@mashb1t we are already specifying those options, but the odd thing we noticed is that the LCM LoRA gets reloaded on every subsequent request (preparation time is never 0 seconds, always >=400ms). This is something we'd like to optimize, as it is very wasteful in its current state.

I have a feeling this is related to how Fooocus maintains "unet" copies and how it fuses LoRAs, but we weren't able to figure out a way to always keep an LCM-LoRA-fused model in memory to get rid of this. Any ideas/suggestions would be much appreciated!
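The clone/unload churn becomes clearer once you see what LoRA fusing does to the weights. A minimal sketch of the idea (names and shapes are illustrative, not Fooocus' actual ModelPatcher API):

```python
import torch

def fuse_lora(base_weight, lora_down, lora_up, alpha=1.0):
    """Fold a low-rank LoRA update into a base weight matrix.

    The patched weight is W' = W + alpha * (up @ down). This is why
    patchers typically keep a clone of the UNet around: the original W
    must survive so the LoRA can later be removed or re-weighted,
    and swapping between the two is what shows up as "Moving model(s)".
    """
    return base_weight + alpha * (lora_up @ lora_down)

# toy shapes: an 8x8 layer with a rank-2 LoRA
W = torch.zeros(8, 8)
down = torch.ones(2, 8)   # rank x in_features
up = torch.ones(8, 2)     # out_features x rank
W_fused = fuse_lora(W, down, up, alpha=0.5)
```

Each entry of `up @ down` here is 2 (sum over rank 2), so with alpha = 0.5 the fused weight is all ones.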

@mashb1t mashb1t reopened this Feb 20, 2024
@mashb1t
Collaborator

mashb1t commented Feb 20, 2024

@isidentical I'm sadly not that deep into the model loading and patching myself and I'm afraid I can't be of much help here.
As you've mentioned, I'd have also assumed that --always-gpu at least would keep the models (both checkpoints and LoRAs) loaded.
Maybe the fusing happens on the fly, as Juggernaut by default isn't shipped with an integrated LCM.
To dig deeper, please check https://github.com/lllyasviel/Fooocus/blob/main/ldm_patched/modules/model_patcher.py and / or https://github.com/lllyasviel/Fooocus/blob/main/ldm_patched/modules/model_management.py for offloading.
Happy to hear your results and thank you for the analysis!

@mashb1t
Collaborator

mashb1t commented Feb 22, 2024

@lllyasviel are you able to shed some light on the situation?

@isidentical

@mashb1t just to update everyone here: we were able to fuse the LCM LoRA into the original juggernautXL model and then removed this line. This brought generation timings from ~2 seconds down to ~1.4-1.6 seconds (400-600 ms shaved) for our own workflows. I'm not sure how much it's worth upstreaming, since it has its own pros/cons (e.g. you need a lot of VRAM, and Fooocus is generally a consumer-oriented app, so users might choose flexibility over a couple hundred milliseconds). So for our part, I think the issue can be closed.

loras += [(modules.config.downloading_sdxl_lcm_lora(), 1.0)]
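For anyone wanting to try the same approach, the baking step can be sketched as a one-time state-dict merge done at load time instead of per request. The key scheme below is purely hypothetical; real SDXL LoRA checkpoints use different (and more varied) key layouts, which Fooocus' own loader maps for you:

```python
import torch

def bake_lora(state_dict, lora_sd, strength=1.0):
    """Fold LoRA pairs into a checkpoint state dict, once, up front.

    Assumes an illustrative "<key>.lora_up"/"<key>.lora_down" naming;
    real SDXL LoRA files do NOT use this exact scheme.
    """
    merged = dict(state_dict)
    for key, w in state_dict.items():
        up = lora_sd.get(key + ".lora_up")
        down = lora_sd.get(key + ".lora_down")
        if up is not None and down is not None:
            # same math as on-the-fly patching: W' = W + strength * (up @ down)
            merged[key] = w + strength * (up @ down)
    return merged

# toy example: one 4x4 layer with a rank-1 LoRA
sd = {"unet.layer.weight": torch.zeros(4, 4)}
lora = {
    "unet.layer.weight.lora_up": torch.ones(4, 1),
    "unet.layer.weight.lora_down": torch.ones(1, 4),
}
baked = bake_lora(sd, lora, strength=1.0)
```

Because the merge happens once, later requests skip the clone/patch/offload cycle entirely, at the cost of losing the ability to change LoRA weights without reloading the checkpoint.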

@mashb1t
Collaborator

mashb1t commented Feb 22, 2024

@isidentical thank you for the update and for sharing your insights, much appreciated. As you've already hinted at, the goal of Fooocus is to lower entry barriers and allow as many users as possible to generate images, even with low hardware specs, knowledge, or internet bandwidth. I assume the average Fooocus user doesn't mind waiting an additional few hundred milliseconds, or even a second, when the alternative is a sacrifice in flexibility.
Thanks again, closing as solved.

@mashb1t mashb1t closed this as completed Feb 22, 2024
@isidentical

@mashb1t Fooocus is an application we really like, so thank you both for maintaining it!! I don't see a sponsor button on the repo or on any of the maintainers' profiles, but we'd love to at least contribute financially, if not with code/knowledge!

@poor7

poor7 commented Feb 23, 2024

@isidentical @mashb1t Interesting material on SDXL optimization has just been released; perhaps something in it might be useful to you. https://www.felixsanz.dev/articles/ultimate-guide-to-optimizing-stable-diffusion-xl

@mashb1t
Collaborator

mashb1t commented Feb 23, 2024

@poor7 great article! Fooocus already uses almost all of the mentioned optimisation mechanisms except pre-compilation, to keep ease of use high. This is why the VRAM footprint is only 4 GB (even lower than the min. 6 GB mentioned in the article), though a bit of swap is needed when offloading models for fast access.

@Moorer624

I have a brand new laptop with 32 GB RAM and an RTX 4070. No matter what I do with switches, I still get the model reloaded on every image created in a batch (like badayvedat). I have tinkered with --disable-offload-from-vram, --always-gpu, and --always-high-vram. Nothing seems to prevent the reloading on each image generation. Is there something I might be missing?
