Partial Loading PR4: Enable partial loading (behind config flag) #7505
Conversation
Is there a recommended setting for a 4090 to test/compare performance?

For a 4090, you could test with something like this:

enable_partial_loading: true
vram: 18
ram: 30

That should give enough working memory for most operations. You should expect a slowdown on large models >18GB, but anything smaller than that will run at full speed.
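For reference, those keys go in invokeai.yaml, and the units are GB (they match the cache limits printed in the debug logs below). A minimal sketch, assuming a default install layout:

# invokeai.yaml -- fragment, illustrative
enable_partial_loading: true
vram: 18  # model cache VRAM limit, in GB
ram: 30   # model cache RAM limit, in GB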
Testing on Windows w/ a 4070 Ti (12 GB VRAM), 32 GB RAM, with FLUX Dev (full). Fresh install.
Default config
Cache key error while handling the lock. I suppose this is actually an OOM.
Logs
[2025-01-06 22:36:34,894]::[InvokeAI]::INFO --> Executing queue item 1, session 0e92668f-cd7a-49b1-86b7-00fe3ea5a657
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 4.71it/s]
[2025-01-06 22:36:47,944]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2' (T5EncoderModel) onto cuda device in 5.20s. Total model size: 9083.39MB, VRAM: 9083.39MB (100.0%)
[2025-01-06 22:36:47,975]::[InvokeAI]::ERROR --> Error while invoking session 0e92668f-cd7a-49b1-86b7-00fe3ea5a657, invocation 560e66bc-beee-432d-b1f6-440bf001c1a2 (flux_text_encoder): '847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2'
[2025-01-06 22:36:47,975]::[InvokeAI]::ERROR --> Traceback (most recent call last):
File "C:\Users\Nelle\Documents\InvokeAI\invokeai\app\services\session_processor\session_processor_default.py", line 129, in run_node
output = invocation.invoke_internal(context=context, services=self._services)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nelle\Documents\InvokeAI\invokeai\app\invocations\baseinvocation.py", line 300, in invoke_internal
output = self.invoke(context)
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nelle\Documents\InvokeAI\.venv\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nelle\Documents\InvokeAI\invokeai\app\invocations\flux_text_encoder.py", line 60, in invoke
t5_embeddings = self._t5_encode(context)
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nelle\Documents\InvokeAI\invokeai\app\invocations\flux_text_encoder.py", line 77, in _t5_encode
with (
File "C:\Users\Nelle\Documents\InvokeAI\invokeai\backend\model_manager\load\load_base.py", line 60, in __enter__
self._cache.lock(self._cache_record.key)
File "C:\Users\Nelle\Documents\InvokeAI\invokeai\backend\model_manager\load\model_cache\model_cache.py", line 199, in lock
cache_entry = self._cached_models[key]
~~~~~~~~~~~~~~~~~~~^^^^^
KeyError: '847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2'
[2025-01-06 22:36:48,007]::[InvokeAI]::INFO --> Graph stats: 0e92668f-cd7a-49b1-86b7-00fe3ea5a657
Node Calls Seconds VRAM Used
flux_model_loader 1 0.008s 0.000G
flux_text_encoder 1 13.037s 9.118G
TOTAL GRAPH EXECUTION TIME: 13.045s
TOTAL GRAPH WALL TIME: 13.046s
RAM used by InvokeAI process: 9.97G (+9.212G)
RAM used to load models: 8.87G
VRAM in use: 8.871G
RAM cache statistics:
Model cache hits: 2
Model cache misses: 2
Models cached: 1
Models cleared from cache: 1
Cache high water mark: 8.87/0.00G
Add enable_partial_loading: true to config
Same cache key error while handling the lock.
Logs
[2025-01-06 22:39:58,611]::[InvokeAI]::INFO --> Executing queue item 2, session 6318f284-b0df-4e3e-bd72-76a5e8eb82f9
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 5.30it/s]
[2025-01-06 22:40:06,899]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2' (T5EncoderModel) onto cuda device in 0.16s. Total model size: 9334.39MB, VRAM: 251.39MB (2.7%)
[2025-01-06 22:40:06,899]::[InvokeAI]::ERROR --> Error while invoking session 6318f284-b0df-4e3e-bd72-76a5e8eb82f9, invocation f0aea7a9-1c93-4605-a503-de728f1df1be (flux_text_encoder): '847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2'
[2025-01-06 22:40:06,899]::[InvokeAI]::ERROR --> Traceback (most recent call last):
File "C:\Users\Nelle\Documents\InvokeAI\invokeai\app\services\session_processor\session_processor_default.py", line 129, in run_node
output = invocation.invoke_internal(context=context, services=self._services)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nelle\Documents\InvokeAI\invokeai\app\invocations\baseinvocation.py", line 300, in invoke_internal
output = self.invoke(context)
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nelle\Documents\InvokeAI\.venv\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nelle\Documents\InvokeAI\invokeai\app\invocations\flux_text_encoder.py", line 60, in invoke
t5_embeddings = self._t5_encode(context)
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nelle\Documents\InvokeAI\invokeai\app\invocations\flux_text_encoder.py", line 77, in _t5_encode
with (
File "C:\Users\Nelle\Documents\InvokeAI\invokeai\backend\model_manager\load\load_base.py", line 60, in __enter__
self._cache.lock(self._cache_record.key)
File "C:\Users\Nelle\Documents\InvokeAI\invokeai\backend\model_manager\load\model_cache\model_cache.py", line 199, in lock
cache_entry = self._cached_models[key]
~~~~~~~~~~~~~~~~~~~^^^^^
KeyError: '847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2'
[2025-01-06 22:40:06,941]::[InvokeAI]::INFO --> Graph stats: 6318f284-b0df-4e3e-bd72-76a5e8eb82f9
Node Calls Seconds VRAM Used
flux_model_loader 1 0.214s 0.000G
flux_text_encoder 1 8.070s 0.246G
TOTAL GRAPH EXECUTION TIME: 8.283s
TOTAL GRAPH WALL TIME: 8.283s
RAM used by InvokeAI process: 1.34G (+0.588G)
RAM used to load models: 9.12G
VRAM in use: 0.000G
RAM cache statistics:
Model cache hits: 2
Model cache misses: 2
Models cached: 1
Models cleared from cache: 1
Cache high water mark: 9.12/0.00G
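My guess at what's going on, as a toy sketch (illustrative names and numbers, not InvokeAI's actual ModelCache code): with a small RAM cache limit, loading the ~9 GB T5 encoder evicts the tiny tokenizer_2 record that was cached moments earlier, so the subsequent lock() no longer finds its key.

from collections import OrderedDict

class TinyModelCache:
    """Toy stand-in for the model RAM cache -- not the real implementation."""

    def __init__(self, ram_limit_mb: float) -> None:
        self.ram_limit_mb = ram_limit_mb
        self._cached_models: OrderedDict[str, float] = OrderedDict()  # key -> size in MB

    def put(self, key: str, size_mb: float) -> None:
        # Drop the oldest entries until the incoming model fits under the limit
        # (or the cache is empty -- the working model must be held regardless).
        while self._cached_models and sum(self._cached_models.values()) + size_mb > self.ram_limit_mb:
            self._cached_models.popitem(last=False)
        self._cached_models[key] = size_mb

    def lock(self, key: str) -> float:
        # Assumes the record is still resident; this mirrors the line in
        # model_cache.py that raises when the entry was evicted in between.
        return self._cached_models[key]

cache = TinyModelCache(ram_limit_mb=8000.0)
cache.put("tokenizer_2", 0.03)         # tokenizer record cached first
cache.put("text_encoder_2", 9083.39)   # big T5 encoder forces tokenizer_2 out
cache.lock("text_encoder_2")           # fine
cache.lock("tokenizer_2")              # KeyError: 'tokenizer_2'

If that's roughly right, it would also explain why raising the cache limits below makes the error go away.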
Add ram: 30, vram: 10 to config
It works! I also tested with FLUX Canny LoRA & IP Adapter. All work, with expected changes in perf.
Logs
l model size: 9334.39MB, VRAM: 9334.39MB (100.0%)
[2025-01-06 22:40:58,719]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2' (T5Tokenizer) onto cuda device in 0.00s. Total model size: 0.03MB, VRAM: 0.00MB (0.0%)
[2025-01-06 22:40:59,818]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder' (CLIPTextModel) onto cuda device in 0.04s. Total model size: 469.44MB, VRAM: 469.44MB (100.0%)
[2025-01-06 22:40:59,818]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer' (CLIPTokenizer) onto cuda device in 0.00s. Total model size: 0.00MB, VRAM: 0.00MB (0.0%)
C:\Users\Nelle\Documents\InvokeAI\.venv\Lib\site-packages\transformers\models\clip\modeling_clip.py:540: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
[2025-01-06 22:41:11,980]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer' (Flux) onto cuda device in 7.11s. Total model size: 22700.13MB, VRAM: 10226.13MB (45.0%)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [01:13<00:00, 2.45s/it]
[2025-01-06 22:42:25,968]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae' (AutoEncoder) onto cuda device in 0.04s. Total model size: 159.87MB, VRAM: 159.87MB (100.0%)
[2025-01-06 22:42:32,484]::[InvokeAI]::INFO --> Graph stats: 15ba6bbe-c65a-4774-b9b6-3e0921f5eaab
Node Calls Seconds VRAM Used
flux_model_loader 1 0.007s 0.000G
flux_text_encoder 1 11.438s 9.347G
collect 1 0.000s 9.343G
flux_denoise 1 85.507s 10.587G
core_metadata 1 0.016s 9.999G
flux_vae_decode 1 7.013s 12.098G
TOTAL GRAPH EXECUTION TIME: 103.981s
TOTAL GRAPH WALL TIME: 103.982s
RAM used by InvokeAI process: 21.04G (+20.285G)
RAM used to load models: 31.90G
VRAM in use: 9.970G
RAM cache statistics:
Model cache hits: 6
Model cache misses: 6
Models cached: 4
Models cleared from cache: 2
Cache high water mark: 22.78/0.00G
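Quick arithmetic on the partial load: vram: 10 maps to the 10240 MB compute-device limit in the debug logs, and the transformer load line shows 10226.13 MB of the 22700.13 MB model resident on the GPU, i.e. 10226.13 / 22700.13 ≈ 45%. The remaining ~12.5 GB (12474 MB per the debug log further down) stays in RAM and has to be streamed to the GPU during denoising, which is presumably where the slowdown relative to a fully loaded model comes from.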
3x consecutive txt2img generations, same settings, different seed, no aux models
But I noticed some inconsistency in generation time. I restarted and did 3 consecutive gens; the it/s changes substantially from generation to generation. In this test and all following tests, the settings, prompt, model, etc. are the same; only the seed is random.
Logs
(.venv) PS C:\Users\Nelle\Documents\InvokeAI\invokeai\frontend\web> invokeai-web
[2025-01-06 22:56:18,990]::[InvokeAI]::INFO --> Patchmatch initialized
[2025-01-06 22:56:19,701]::[InvokeAI]::INFO --> Using torch device: NVIDIA GeForce RTX 4070 Ti
[2025-01-06 22:56:20,829]::[InvokeAI]::INFO --> cuDNN version: 90100
[2025-01-06 22:56:20,864]::[InvokeAI]::INFO --> InvokeAI version 5.5.0
[2025-01-06 22:56:20,865]::[InvokeAI]::INFO --> Root directory = C:\Users\Nelle\Documents\InvokeAI
[2025-01-06 22:56:20,865]::[InvokeAI]::INFO --> Initializing database at C:\Users\Nelle\Documents\InvokeAI\databases\invokeai.db
[2025-01-06 22:56:20,867]::[ModelInstallService]::INFO --> Removing dangling temporary directory C:\Users\Nelle\Documents\InvokeAI\models\tmpinstall_m4qqxyz5
[2025-01-06 22:56:20,882]::[InvokeAI]::INFO --> Pruned 6 finished queue items
[2025-01-06 22:56:21,020]::[InvokeAI]::INFO --> Cleaned database (freed 0.13MB)
[2025-01-06 22:56:21,020]::[InvokeAI]::INFO --> Invoke running on http://127.0.0.1:9090 (Press CTRL+C to quit)
[2025-01-06 22:57:36,521]::[InvokeAI]::INFO --> Executing queue item 9, session ace01c97-fc44-4aa3-9fb8-f95b422a6e2f
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 4.97it/s]
[2025-01-06 22:57:51,164]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2' (T5EncoderModel) onto cuda device in 6.98s. Total model size: 9334.39MB, VRAM: 9334.39MB (100.0%)
[2025-01-06 22:57:51,164]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2' (T5Tokenizer) onto cuda device in 0.00s. Total model size: 0.03MB, VRAM: 0.00MB (0.0%)
[2025-01-06 22:57:51,916]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder' (CLIPTextModel) onto cuda device in 0.05s. Total model size: 469.44MB, VRAM: 469.44MB (100.0%)
[2025-01-06 22:57:51,916]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer' (CLIPTokenizer) onto cuda device in 0.00s. Total model size: 0.00MB, VRAM: 0.00MB (0.0%)
C:\Users\Nelle\Documents\InvokeAI\.venv\Lib\site-packages\transformers\models\clip\modeling_clip.py:540: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
[2025-01-06 22:58:00,815]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer' (Flux) onto cuda device in 6.81s. Total model size: 22700.13MB, VRAM: 10226.13MB (45.0%)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [01:06<00:00, 2.21s/it]
[2025-01-06 22:59:07,588]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae' (AutoEncoder) onto cuda device in 0.05s. Total model size: 159.87MB, VRAM: 159.87MB (100.0%)
[2025-01-06 22:59:12,658]::[InvokeAI]::INFO --> Graph stats: ace01c97-fc44-4aa3-9fb8-f95b422a6e2f
Node Calls Seconds VRAM Used
flux_model_loader 1 0.217s 0.000G
flux_text_encoder 1 15.242s 9.347G
collect 1 0.001s 9.343G
flux_denoise 1 75.147s 10.587G
core_metadata 1 0.000s 9.999G
flux_vae_decode 1 5.510s 12.098G
TOTAL GRAPH EXECUTION TIME: 96.117s
TOTAL GRAPH WALL TIME: 96.117s
RAM used by InvokeAI process: 23.53G (+22.772G)
RAM used to load models: 31.90G
VRAM in use: 9.970G
RAM cache statistics:
Model cache hits: 6
Model cache misses: 6
Models cached: 4
Models cleared from cache: 2
Cache high water mark: 22.78/0.00G
[2025-01-06 22:59:12,679]::[InvokeAI]::INFO --> Executing queue item 10, session 593146d4-67ae-4721-98c7-1ad1bdcee934
[2025-01-06 22:59:12,985]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer' (Flux) onto cuda device in 0.29s. Total model size: 22700.13MB, VRAM: 10226.13MB (45.0%)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [02:14<00:00, 4.49s/it]
[2025-01-06 23:01:27,672]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae' (AutoEncoder) onto cuda device in 0.05s. Total model size: 159.87MB, VRAM: 159.87MB (100.0%)
[2025-01-06 23:01:33,057]::[InvokeAI]::INFO --> Graph stats: 593146d4-67ae-4721-98c7-1ad1bdcee934
Node Calls Seconds VRAM Used
flux_model_loader 1 0.000s 9.970G
flux_text_encoder 1 0.000s 9.970G
collect 1 0.000s 9.970G
flux_denoise 1 134.927s 10.586G
core_metadata 1 0.000s 9.999G
flux_vae_decode 1 5.412s 12.098G
TOTAL GRAPH EXECUTION TIME: 140.339s
TOTAL GRAPH WALL TIME: 140.341s
RAM used by InvokeAI process: 22.70G (-0.824G)
RAM used to load models: 22.32G
VRAM in use: 9.970G
RAM cache statistics:
Model cache hits: 2
Model cache misses: 0
Models cached: 4
Models cleared from cache: 0
Cache high water mark: 22.78/0.00G
[2025-01-06 23:01:33,057]::[InvokeAI]::INFO --> Executing queue item 11, session 4a54712a-5718-4c7b-82e9-28acbc91fbc5
[2025-01-06 23:01:33,143]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer' (Flux) onto cuda device in 0.06s. Total model size: 22700.13MB, VRAM: 10226.13MB (45.0%)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [02:14<00:00, 4.47s/it]
[2025-01-06 23:03:47,308]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae' (AutoEncoder) onto cuda device in 0.05s. Total model size: 159.87MB, VRAM: 159.87MB (100.0%)
[2025-01-06 23:03:52,770]::[InvokeAI]::INFO --> Graph stats: 4a54712a-5718-4c7b-82e9-28acbc91fbc5
Node Calls Seconds VRAM Used
flux_model_loader 1 0.000s 9.970G
flux_text_encoder 1 0.000s 9.970G
collect 1 0.001s 9.970G
flux_denoise 1 134.172s 10.586G
core_metadata 1 0.001s 9.999G
flux_vae_decode 1 5.487s 12.098G
TOTAL GRAPH EXECUTION TIME: 139.661s
TOTAL GRAPH WALL TIME: 139.662s
RAM used by InvokeAI process: 22.68G (-0.027G)
RAM used to load models: 22.32G
VRAM in use: 9.970G
RAM cache statistics:
Model cache hits: 2
Model cache misses: 0
Models cached: 4
Models cleared from cache: 0
Cache high water mark: 22.78/0.00G
Fresh start and another 3 gens - time differs quite a bit.
Logs
(.venv) PS C:\Users\Nelle\Documents\InvokeAI\invokeai\frontend\web> invokeai-web
[2025-01-06 23:32:20,404]::[InvokeAI]::INFO --> Patchmatch initialized
[2025-01-06 23:32:21,137]::[InvokeAI]::INFO --> Using torch device: NVIDIA GeForce RTX 4070 Ti
[2025-01-06 23:32:22,289]::[InvokeAI]::INFO --> cuDNN version: 90100
[2025-01-06 23:32:22,304]::[InvokeAI]::INFO --> InvokeAI version 5.5.0
[2025-01-06 23:32:22,304]::[InvokeAI]::INFO --> Root directory = C:\Users\Nelle\Documents\InvokeAI
[2025-01-06 23:32:22,304]::[InvokeAI]::INFO --> Initializing database at C:\Users\Nelle\Documents\InvokeAI\databases\invokeai.db
[2025-01-06 23:32:22,336]::[InvokeAI]::INFO --> Pruned 6 finished queue items
[2025-01-06 23:32:22,476]::[InvokeAI]::INFO --> Cleaned database (freed 0.19MB)
[2025-01-06 23:32:22,476]::[InvokeAI]::INFO --> Invoke running on http://127.0.0.1:9090 (Press CTRL+C to quit)
[2025-01-06 23:32:28,658]::[InvokeAI]::INFO --> Executing queue item 29, session 5dc2e6f3-c32a-43d9-93bd-cfcaa338546a
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 4.96it/s]
[2025-01-06 23:32:45,515]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2' (T5EncoderModel) onto cuda device in 9.23s. Total model size: 9334.39MB, VRAM: 9334.39MB (100.0%)
[2025-01-06 23:32:45,515]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2' (T5Tokenizer) onto cuda device in 0.00s. Total model size: 0.03MB, VRAM: 0.00MB (0.0%)
[2025-01-06 23:32:46,570]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder' (CLIPTextModel) onto cuda device in 0.06s. Total model size: 469.44MB, VRAM: 469.44MB (100.0%)
[2025-01-06 23:32:46,570]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer' (CLIPTokenizer) onto cuda device in 0.00s. Total model size: 0.00MB, VRAM: 0.00MB (0.0%)
C:\Users\Nelle\Documents\InvokeAI\.venv\Lib\site-packages\transformers\models\clip\modeling_clip.py:540: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
[2025-01-06 23:32:55,675]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer' (Flux) onto cuda device in 6.99s. Total model size: 22700.13MB, VRAM: 10226.13MB (45.0%)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [01:48<00:00, 3.60s/it]
[2025-01-06 23:34:44,335]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae' (AutoEncoder) onto cuda device in 0.06s. Total model size: 159.87MB, VRAM: 159.87MB (100.0%)
[2025-01-06 23:34:52,779]::[InvokeAI]::INFO --> Graph stats: 5dc2e6f3-c32a-43d9-93bd-cfcaa338546a
Node Calls Seconds VRAM Used
flux_model_loader 1 0.007s 0.000G
flux_text_encoder 1 17.976s 9.347G
collect 1 0.000s 9.343G
flux_denoise 1 117.154s 10.587G
core_metadata 1 0.017s 9.999G
flux_vae_decode 1 8.928s 12.098G
TOTAL GRAPH EXECUTION TIME: 144.082s
TOTAL GRAPH WALL TIME: 144.086s
RAM used by InvokeAI process: 23.04G (+22.289G)
RAM used to load models: 31.90G
VRAM in use: 9.970G
RAM cache statistics:
Model cache hits: 6
Model cache misses: 6
Models cached: 4
Models cleared from cache: 2
Cache high water mark: 22.78/0.00G
[2025-01-06 23:34:52,815]::[InvokeAI]::INFO --> Executing queue item 30, session 552db5b1-7b7b-4f3c-aaf3-23efac352189
[2025-01-06 23:34:53,378]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer' (Flux) onto cuda device in 0.54s. Total model size: 22700.13MB, VRAM: 10226.13MB (45.0%)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [04:16<00:00, 8.55s/it]
[2025-01-06 23:39:09,915]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae' (AutoEncoder) onto cuda device in 0.07s. Total model size: 159.87MB, VRAM: 159.87MB (100.0%)
[2025-01-06 23:39:15,194]::[InvokeAI]::INFO --> Graph stats: 552db5b1-7b7b-4f3c-aaf3-23efac352189
Node Calls Seconds VRAM Used
flux_model_loader 1 0.001s 9.970G
flux_text_encoder 1 0.000s 9.970G
collect 1 0.000s 9.970G
flux_denoise 1 257.015s 10.586G
core_metadata 1 0.001s 9.999G
flux_vae_decode 1 5.321s 12.098G
TOTAL GRAPH EXECUTION TIME: 262.339s
TOTAL GRAPH WALL TIME: 262.341s
RAM used by InvokeAI process: 22.22G (-0.823G)
RAM used to load models: 22.32G
VRAM in use: 9.970G
RAM cache statistics:
Model cache hits: 2
Model cache misses: 0
Models cached: 4
Models cleared from cache: 0
Cache high water mark: 22.78/0.00G
[2025-01-06 23:39:15,210]::[InvokeAI]::INFO --> Executing queue item 31, session b41bcac9-6ccf-4c4c-b89e-f0a3d6638f27
[2025-01-06 23:39:15,307]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer' (Flux) onto cuda device in 0.07s. Total model size: 22700.13MB, VRAM: 10226.13MB (45.0%)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [02:43<00:00, 5.46s/it]
[2025-01-06 23:41:59,339]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae' (AutoEncoder) onto cuda device in 0.08s. Total model size: 159.87MB, VRAM: 159.87MB (100.0%)
[2025-01-06 23:42:04,699]::[InvokeAI]::INFO --> Graph stats: b41bcac9-6ccf-4c4c-b89e-f0a3d6638f27
Node Calls Seconds VRAM Used
flux_model_loader 1 0.000s 9.970G
flux_text_encoder 1 0.001s 9.970G
collect 1 0.001s 9.970G
flux_denoise 1 164.038s 10.586G
core_metadata 1 0.000s 9.999G
flux_vae_decode 1 5.429s 12.098G
TOTAL GRAPH EXECUTION TIME: 169.469s
TOTAL GRAPH WALL TIME: 169.469s
RAM used by InvokeAI process: 22.12G (-0.102G)
RAM used to load models: 22.32G
VRAM in use: 9.970G
RAM cache statistics:
Model cache hits: 2
Model cache misses: 0
Models cached: 4
Models cleared from cache: 0
Cache high water mark: 22.78/0.00G
Another fresh start and 3 gens, but this time I enabled memory logging and set the log level to debug. I manually commented out some thread-polling log statements that were polluting the log. Still inconsistent.
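(For reproducibility, that was roughly the following in invokeai.yaml; key names from memory, so double-check them against the config docs:)

# invokeai.yaml -- debug additions, illustrative
log_level: debug
log_memory_usage: true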
Logs
(.venv) PS C:\Users\Nelle\Documents\InvokeAI\invokeai\frontend\web> invokeai-web
[2025-01-06 23:46:05,679]::[InvokeAI]::INFO --> Patchmatch initialized
[2025-01-06 23:46:06,352]::[InvokeAI]::INFO --> Using torch device: NVIDIA GeForce RTX 4070 Ti
[2025-01-06 23:46:07,489]::[InvokeAI]::INFO --> cuDNN version: 90100
[2025-01-06 23:46:07,508]::[InvokeAI]::INFO --> InvokeAI version 5.5.0
[2025-01-06 23:46:07,508]::[InvokeAI]::INFO --> Root directory = C:\Users\Nelle\Documents\InvokeAI
[2025-01-06 23:46:07,509]::[InvokeAI]::INFO --> Initializing database at C:\Users\Nelle\Documents\InvokeAI\databases\invokeai.db
[2025-01-06 23:46:07,509]::[InvokeAI]::DEBUG --> Registered migration 0 -> 1
[2025-01-06 23:46:07,509]::[InvokeAI]::DEBUG --> Registered migration 1 -> 2
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 2 -> 3
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 3 -> 4
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 4 -> 5
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 5 -> 6
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 6 -> 7
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 7 -> 8
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 8 -> 9
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 9 -> 10
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 10 -> 11
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 11 -> 12
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 12 -> 13
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 13 -> 14
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 14 -> 15
[2025-01-06 23:46:07,511]::[InvokeAI]::DEBUG --> Database is up to date, no migrations to run
[2025-01-06 23:46:07,520]::[DownloadQueueService]::DEBUG --> Download queue worker thread Thread-2 (_download_next_item) starting.
[2025-01-06 23:46:07,520]::[DownloadQueueService]::DEBUG --> Download queue worker thread Thread-3 (_download_next_item) starting.
[2025-01-06 23:46:07,520]::[DownloadQueueService]::DEBUG --> Download queue worker thread Thread-4 (_download_next_item) starting.
[2025-01-06 23:46:07,520]::[DownloadQueueService]::DEBUG --> Download queue worker thread Thread-5 (_download_next_item) starting.
[2025-01-06 23:46:07,520]::[DownloadQueueService]::DEBUG --> Download queue worker thread Thread-6 (_download_next_item) starting.
[2025-01-06 23:46:07,659]::[InvokeAI]::INFO --> Invoke running on http://127.0.0.1:9090 (Press CTRL+C to quit)
[2025-01-06 23:46:15,836]::[InvokeAI]::INFO --> Executing queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c
[2025-01-06 23:46:15,836]::[InvokeAI]::DEBUG --> On before run session: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c
[2025-01-06 23:46:15,842]::[InvokeAI]::DEBUG --> On before run node: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c, node d78ff88b-71e4-4009-9ef6-aaa0442cf8c7 (flux_model_loader)
[2025-01-06 23:46:15,843]::[InvokeAI]::DEBUG --> Invocation cache miss for type "flux_model_loader": d78ff88b-71e4-4009-9ef6-aaa0442cf8c7
[2025-01-06 23:46:15,844]::[InvokeAI]::DEBUG --> On after run node: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c, node d78ff88b-71e4-4009-9ef6-aaa0442cf8c7 (flux_model_loader)
[2025-01-06 23:46:16,045]::[InvokeAI]::DEBUG --> On before run node: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c, node 156aa19c-8bfb-46d8-83dc-1759aecebbe2 (flux_text_encoder)
[2025-01-06 23:46:16,045]::[InvokeAI]::DEBUG --> Invocation cache miss for type "flux_text_encoder": 156aa19c-8bfb-46d8-83dc-1759aecebbe2
[2025-01-06 23:46:16,052]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache miss: 847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2
[2025-01-06 23:46:16,053]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 0.00MB of RAM.
[2025-01-06 23:46:16,053]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 0.0 MB ( 0.0%), Available: 30720.0 MB (100.0%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 0.0 MB ( 0.0%), Available: 10240.0 MB (100.0%)
CUDA Memory Allocated: 0.0 MB
Total models: 0
[2025-01-06 23:46:16,053]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 0 models to free 0.00MB of RAM.
[2025-01-06 23:46:16,054]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 0.0 MB ( 0.0%), Available: 30720.0 MB (100.0%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 0.0 MB ( 0.0%), Available: 10240.0 MB (100.0%)
CUDA Memory Allocated: 0.0 MB
Total models: 0
[2025-01-06 23:46:16,151]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 0.03MB of RAM.
[2025-01-06 23:46:16,151]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 0.0 MB ( 0.0%), Available: 30720.0 MB (100.0%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 0.0 MB ( 0.0%), Available: 10240.0 MB (100.0%)
CUDA Memory Allocated: 0.0 MB
Total models: 0
[2025-01-06 23:46:16,151]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 0 models to free 0.00MB of RAM.
[2025-01-06 23:46:16,154]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 0.0 MB ( 0.0%), Available: 30720.0 MB (100.0%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 0.0 MB ( 0.0%), Available: 10240.0 MB (100.0%)
CUDA Memory Allocated: 0.0 MB
Total models: 0
[2025-01-06 23:46:16,154]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Added model 847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (Type: T5Tokenizer, Wrap mode: CachedModelOnlyFullLoad, Model size: 0.03MB)
[2025-01-06 23:46:16,154]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache hit: 847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (Type: T5Tokenizer)
[2025-01-06 23:46:16,155]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache miss: 847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2
[2025-01-06 23:46:16,155]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 9083.39MB of RAM.
[2025-01-06 23:46:16,155]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 0.0 MB ( 0.0%), Available: 30720.0 MB (100.0%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 0.0 MB ( 0.0%), Available: 10240.0 MB (100.0%)
CUDA Memory Allocated: 0.0 MB
Total models: 1
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
[2025-01-06 23:46:16,155]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 0 models to free 0.00MB of RAM.
[2025-01-06 23:46:16,155]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 0.0 MB ( 0.0%), Available: 30720.0 MB (100.0%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 0.0 MB ( 0.0%), Available: 10240.0 MB (100.0%)
CUDA Memory Allocated: 0.0 MB
Total models: 1
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 5.59it/s]
[2025-01-06 23:46:23,438]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 9083.39MB of RAM.
[2025-01-06 23:46:23,438]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 0.0 MB ( 0.0%), Available: 30720.0 MB (100.0%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 0.0 MB ( 0.0%), Available: 10240.0 MB (100.0%)
CUDA Memory Allocated: 0.0 MB
Total models: 1
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
[2025-01-06 23:46:23,438]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 0 models to free 0.00MB of RAM.
[2025-01-06 23:46:23,438]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 0.0 MB ( 0.0%), Available: 30720.0 MB (100.0%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 0.0 MB ( 0.0%), Available: 10240.0 MB (100.0%)
CUDA Memory Allocated: 0.0 MB
Total models: 1
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
[2025-01-06 23:46:23,454]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Added model 847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (Type: T5EncoderModel, Wrap mode: CachedModelWithPartialLoad, Model size: 9083.39MB)
[2025-01-06 23:46:23,454]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache hit: 847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (Type: T5EncoderModel)
[2025-01-06 23:46:23,454]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Locking model 847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (Type: T5EncoderModel)
[2025-01-06 23:46:23,459]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before unloading: model_total=9334 MB, model_vram=0 MB (0.0% %), vram_available=10240 MB,
[2025-01-06 23:46:23,459]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Offloading unlocked models with goal of freeing 0.00MB of VRAM.
[2025-01-06 23:46:23,459]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded models (if necessary): vram_bytes_freed=0.00MB
[2025-01-06 23:46:23,459]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After unloading: model_total=9334 MB, model_vram=0 MB (0.0% %), vram_available=10240 MB,
[2025-01-06 23:46:26,376]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2' (T5EncoderModel) onto cuda device in 2.92s. Total model size: 9334.39MB, VRAM: 9334.39MB (100.0%)
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Loaded model onto execution device: model_bytes_loaded=9334.39MB,
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After loading: model_total=9334 MB, model_vram=9334 MB (100.0% %), vram_available=906 MB,
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Finished locking model 847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (Type: T5EncoderModel)
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Model cache state:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9334.4 MB (30.4%), Available: 21385.6 MB (69.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9334.4 MB (91.2%), Available: 905.6 MB ( 8.8%)
CUDA Memory Allocated: 9084.4 MB
Total models: 2
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=True
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Locking model 847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (Type: T5Tokenizer)
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before unloading: model_total=0 MB, model_vram=0 MB (0.0% %), vram_available=906 MB,
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Offloading unlocked models with goal of freeing 0.00MB of VRAM.
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded models (if necessary): vram_bytes_freed=0.00MB
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After unloading: model_total=0 MB, model_vram=0 MB (0.0% %), vram_available=906 MB,
[2025-01-06 23:46:26,376]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2' (T5Tokenizer) onto cuda device in 0.00s. Total model size: 0.03MB, VRAM: 0.00MB (0.0%)
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Loaded model onto execution device: model_bytes_loaded=0.00MB,
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After loading: model_total=0 MB, model_vram=0 MB (0.0% %), vram_available=906 MB,
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Finished locking model 847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (Type: T5Tokenizer)
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Model cache state:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9334.4 MB (30.4%), Available: 21385.6 MB (69.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9334.4 MB (91.2%), Available: 905.6 MB ( 8.8%)
CUDA Memory Allocated: 9084.4 MB
Total models: 2
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=True
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=True
[2025-01-06 23:46:26,575]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unlocked model 847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (Type: T5Tokenizer)
[2025-01-06 23:46:26,575]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unlocked model 847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (Type: T5EncoderModel)
[2025-01-06 23:46:26,579]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache miss: 5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer
[2025-01-06 23:46:26,579]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 0.00MB of RAM.
[2025-01-06 23:46:26,580]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9334.4 MB (30.4%), Available: 21385.6 MB (69.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9334.4 MB (91.2%), Available: 905.6 MB ( 8.8%)
CUDA Memory Allocated: 9096.5 MB
Total models: 2
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:26,671]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 0 models to free 0.00MB of RAM.
[2025-01-06 23:46:26,671]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9334.4 MB (30.4%), Available: 21385.6 MB (69.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9334.4 MB (91.2%), Available: 905.6 MB ( 8.8%)
CUDA Memory Allocated: 9096.5 MB
Total models: 2
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:26,721]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 0.00MB of RAM.
[2025-01-06 23:46:26,721]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9334.4 MB (30.4%), Available: 21385.6 MB (69.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9334.4 MB (91.2%), Available: 905.6 MB ( 8.8%)
CUDA Memory Allocated: 9096.5 MB
Total models: 2
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:26,721]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 0 models to free 0.00MB of RAM.
[2025-01-06 23:46:26,721]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9334.4 MB (30.4%), Available: 21385.6 MB (69.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9334.4 MB (91.2%), Available: 905.6 MB ( 8.8%)
CUDA Memory Allocated: 9096.5 MB
Total models: 2
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:26,721]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Added model 5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (Type: CLIPTokenizer, Wrap mode: CachedModelOnlyFullLoad, Model size: 0.00MB)
[2025-01-06 23:46:26,721]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache hit: 5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (Type: CLIPTokenizer)
[2025-01-06 23:46:26,725]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache miss: 5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder
[2025-01-06 23:46:26,725]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 234.74MB of RAM.
[2025-01-06 23:46:26,725]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9334.4 MB (30.4%), Available: 21385.6 MB (69.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9334.4 MB (91.2%), Available: 905.6 MB ( 8.8%)
CUDA Memory Allocated: 9096.5 MB
Total models: 3
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:26,726]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 0 models to free 0.00MB of RAM.
[2025-01-06 23:46:26,726]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9334.4 MB (30.4%), Available: 21385.6 MB (69.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9334.4 MB (91.2%), Available: 905.6 MB ( 8.8%)
CUDA Memory Allocated: 9096.5 MB
Total models: 3
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:26,846]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 469.44MB of RAM.
[2025-01-06 23:46:26,846]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9334.4 MB (30.4%), Available: 21385.6 MB (69.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9334.4 MB (91.2%), Available: 905.6 MB ( 8.8%)
CUDA Memory Allocated: 9096.5 MB
Total models: 3
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:26,846]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 0 models to free 0.00MB of RAM.
[2025-01-06 23:46:26,846]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9334.4 MB (30.4%), Available: 21385.6 MB (69.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9334.4 MB (91.2%), Available: 905.6 MB ( 8.8%)
CUDA Memory Allocated: 9096.5 MB
Total models: 3
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:26,846]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Added model 5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (Type: CLIPTextModel, Wrap mode: CachedModelWithPartialLoad, Model size: 469.44MB)
[2025-01-06 23:46:26,846]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache hit: 5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (Type: CLIPTextModel)
[2025-01-06 23:46:26,846]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Locking model 5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (Type: CLIPTextModel)
[2025-01-06 23:46:26,861]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before unloading: model_total=469 MB, model_vram=0 MB (0.0% %), vram_available=906 MB,
[2025-01-06 23:46:26,861]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Offloading unlocked models with goal of freeing 0.00MB of VRAM.
[2025-01-06 23:46:26,862]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded models (if necessary): vram_bytes_freed=0.00MB
[2025-01-06 23:46:26,862]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After unloading: model_total=469 MB, model_vram=0 MB (0.0% %), vram_available=906 MB,
[2025-01-06 23:46:26,906]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder' (CLIPTextModel) onto cuda device in 0.06s. Total model size: 469.44MB, VRAM: 469.44MB (100.0%)
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Loaded model onto execution device: model_bytes_loaded=469.44MB,
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After loading: model_total=469 MB, model_vram=469 MB (100.0% %), vram_available=436 MB,
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Finished locking model 5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (Type: CLIPTextModel)
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Model cache state:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9803.9 MB (31.9%), Available: 20916.1 MB (68.1%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9803.8 MB (95.7%), Available: 436.2 MB ( 4.3%)
CUDA Memory Allocated: 9567.3 MB
Total models: 4
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 469.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=True
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Locking model 5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (Type: CLIPTokenizer)
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before unloading: model_total=0 MB, model_vram=0 MB (0.0% %), vram_available=436 MB,
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Offloading unlocked models with goal of freeing 0.00MB of VRAM.
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded models (if necessary): vram_bytes_freed=0.00MB
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After unloading: model_total=0 MB, model_vram=0 MB (0.0% %), vram_available=436 MB,
[2025-01-06 23:46:26,906]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer' (CLIPTokenizer) onto cuda device in 0.00s. Total model size: 0.00MB, VRAM: 0.00MB (0.0%)
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Loaded model onto execution device: model_bytes_loaded=0.00MB,
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After loading: model_total=0 MB, model_vram=0 MB (0.0% %), vram_available=436 MB,
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Finished locking model 5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (Type: CLIPTokenizer)
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Model cache state:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9803.9 MB (31.9%), Available: 20916.1 MB (68.1%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9803.8 MB (95.7%), Available: 436.2 MB ( 4.3%)
CUDA Memory Allocated: 9567.3 MB
Total models: 4
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=True
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 469.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=True
C:\Users\Nelle\Documents\InvokeAI\.venv\Lib\site-packages\transformers\models\clip\modeling_clip.py:540: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
[2025-01-06 23:46:26,961]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unlocked model 5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (Type: CLIPTokenizer)
[2025-01-06 23:46:26,961]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unlocked model 5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (Type: CLIPTextModel)
[2025-01-06 23:46:26,974]::[InvokeAI]::DEBUG --> On after run node: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c, node 156aa19c-8bfb-46d8-83dc-1759aecebbe2 (flux_text_encoder)
[2025-01-06 23:46:26,974]::[InvokeAI]::DEBUG --> On before run node: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c, node 51833822-f3b9-4ad2-9932-40a107a95d67 (collect)
[2025-01-06 23:46:26,974]::[InvokeAI]::DEBUG --> Invocation cache miss for type "collect": 51833822-f3b9-4ad2-9932-40a107a95d67
[2025-01-06 23:46:26,974]::[InvokeAI]::DEBUG --> On after run node: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c, node 51833822-f3b9-4ad2-9932-40a107a95d67 (collect)
[2025-01-06 23:46:26,974]::[InvokeAI]::DEBUG --> On before run node: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c, node 17eec3f2-90c7-47f9-8759-e185424bddc6 (flux_denoise)
[2025-01-06 23:46:26,974]::[InvokeAI]::DEBUG --> Invocation cache miss for type "flux_denoise": 17eec3f2-90c7-47f9-8759-e185424bddc6
[2025-01-06 23:46:26,987]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache miss: b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer
[2025-01-06 23:46:26,988]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 22700.25MB of RAM.
[2025-01-06 23:46:26,988]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9803.9 MB (31.9%), Available: 20916.1 MB (68.1%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9803.8 MB (95.7%), Available: 436.2 MB ( 4.3%)
CUDA Memory Allocated: 9576.6 MB
Total models: 4
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 469.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:26,988]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropping 847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 from RAM cache to free 0.03MB.
[2025-01-06 23:46:26,991]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropping 847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 from RAM cache to free 9334.39MB.
[2025-01-06 23:46:27,759]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 2 models to free 9334.42MB of RAM.
[2025-01-06 23:46:27,759]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 469.4 MB ( 1.5%), Available: 30250.6 MB (98.5%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 469.4 MB ( 4.6%), Available: 9770.6 MB (95.4%)
CUDA Memory Allocated: 492.2 MB
Total models: 2
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 469.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:29,012]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 22700.13MB of RAM.
[2025-01-06 23:46:29,012]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 469.4 MB ( 1.5%), Available: 30250.6 MB (98.5%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 469.4 MB ( 4.6%), Available: 9770.6 MB (95.4%)
CUDA Memory Allocated: 492.2 MB
Total models: 2
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 469.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:29,012]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 0 models to free 0.00MB of RAM.
[2025-01-06 23:46:29,012]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 469.4 MB ( 1.5%), Available: 30250.6 MB (98.5%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 469.4 MB ( 4.6%), Available: 9770.6 MB (95.4%)
CUDA Memory Allocated: 492.2 MB
Total models: 2
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 469.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:29,012]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 22700.13MB of RAM.
[2025-01-06 23:46:29,012]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 469.4 MB ( 1.5%), Available: 30250.6 MB (98.5%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 469.4 MB ( 4.6%), Available: 9770.6 MB (95.4%)
CUDA Memory Allocated: 492.2 MB
Total models: 2
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 469.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:29,027]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 0 models to free 0.00MB of RAM.
[2025-01-06 23:46:29,027]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 469.4 MB ( 1.5%), Available: 30250.6 MB (98.5%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 469.4 MB ( 4.6%), Available: 9770.6 MB (95.4%)
CUDA Memory Allocated: 492.2 MB
Total models: 2
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 469.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:29,049]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Added model b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux, Wrap mode: CachedModelWithPartialLoad, Model size: 22700.13MB)
[2025-01-06 23:46:29,049]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache hit: b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux)
[2025-01-06 23:46:29,054]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Locking model b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux)
[2025-01-06 23:46:29,057]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before unloading: model_total=22700 MB, model_vram=0 MB (0.0% %), vram_available=9771 MB,
[2025-01-06 23:46:29,058]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Offloading unlocked models with goal of freeing 12929.57MB of VRAM.
[2025-01-06 23:46:29,062]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded 5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder from VRAM to free 469 MB.
[2025-01-06 23:46:29,064]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded models (if necessary): vram_bytes_freed=469.44MB
[2025-01-06 23:46:29,064]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After unloading: model_total=22700 MB, model_vram=0 MB (0.0% %), vram_available=10240 MB,
[2025-01-06 23:46:33,066]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer' (Flux) onto cuda device in 4.01s. Total model size: 22700.13MB, VRAM: 10226.13MB (45.0%)
[2025-01-06 23:46:33,066]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Loaded model onto execution device: model_bytes_loaded=10226.13MB,
[2025-01-06 23:46:33,066]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After loading: model_total=22700 MB, model_vram=10226 MB (45.0% %), vram_available=14 MB,
[2025-01-06 23:46:33,066]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Finished locking model b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux)
[2025-01-06 23:46:33,066]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Model cache state:
Storage Device (cpu) Limit: 30720.0 MB, Used: 23169.6 MB (75.4%), Available: 7550.4 MB (24.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 10226.1 MB (99.9%), Available: 13.9 MB ( 0.1%)
CUDA Memory Allocated: 10249.1 MB
Total models: 3
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 0.0 MB ( 0.0%), ram= 469.4 MB (100.0%), locked=False
b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Flux): total=22700.1 MB, vram=10226.1 MB (45.0%), ram=12474.0 MB (55.0%), locked=True
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [01:07<00:00, 2.25s/it]
[2025-01-06 23:47:40,449]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unlocked model b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux)
[2025-01-06 23:47:40,476]::[InvokeAI]::DEBUG --> On after run node: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c, node 17eec3f2-90c7-47f9-8759-e185424bddc6 (flux_denoise)
[2025-01-06 23:47:40,477]::[InvokeAI]::DEBUG --> On before run node: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c, node a89c8ad7-8a7f-4b93-bc4f-67fdf51a35ad (core_metadata)
[2025-01-06 23:47:40,478]::[InvokeAI]::DEBUG --> Invocation cache miss for type "core_metadata": a89c8ad7-8a7f-4b93-bc4f-67fdf51a35ad
[2025-01-06 23:47:40,488]::[InvokeAI]::DEBUG --> On after run node: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c, node a89c8ad7-8a7f-4b93-bc4f-67fdf51a35ad (core_metadata)
[2025-01-06 23:47:40,489]::[InvokeAI]::DEBUG --> On before run node: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c, node 80e3fba5-9fa1-498b-a4bd-06256d8efad9 (flux_vae_decode)
[2025-01-06 23:47:40,489]::[InvokeAI]::DEBUG --> Skipping invocation cache for "flux_vae_decode": 80e3fba5-9fa1-498b-a4bd-06256d8efad9
[2025-01-06 23:47:40,492]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache miss: ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae
[2025-01-06 23:47:40,492]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 319.77MB of RAM.
[2025-01-06 23:47:40,493]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 23169.6 MB (75.4%), Available: 7550.4 MB (24.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 10226.1 MB (99.9%), Available: 13.9 MB ( 0.1%)
CUDA Memory Allocated: 10239.3 MB
Total models: 3
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 0.0 MB ( 0.0%), ram= 469.4 MB (100.0%), locked=False
b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Flux): total=22700.1 MB, vram=10226.1 MB (45.0%), ram=12474.0 MB (55.0%), locked=False
[2025-01-06 23:47:40,498]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 0 models to free 0.00MB of RAM.
[2025-01-06 23:47:40,498]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 23169.6 MB (75.4%), Available: 7550.4 MB (24.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 10226.1 MB (99.9%), Available: 13.9 MB ( 0.1%)
CUDA Memory Allocated: 10239.3 MB
Total models: 3
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 0.0 MB ( 0.0%), ram= 469.4 MB (100.0%), locked=False
b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Flux): total=22700.1 MB, vram=10226.1 MB (45.0%), ram=12474.0 MB (55.0%), locked=False
[2025-01-06 23:47:40,894]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 159.87MB of RAM.
[2025-01-06 23:47:40,894]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 23169.6 MB (75.4%), Available: 7550.4 MB (24.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 10226.1 MB (99.9%), Available: 13.9 MB ( 0.1%)
CUDA Memory Allocated: 10239.3 MB
Total models: 3
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 0.0 MB ( 0.0%), ram= 469.4 MB (100.0%), locked=False
b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Flux): total=22700.1 MB, vram=10226.1 MB (45.0%), ram=12474.0 MB (55.0%), locked=False
[2025-01-06 23:47:40,894]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 0 models to free 0.00MB of RAM.
[2025-01-06 23:47:40,894]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 23169.6 MB (75.4%), Available: 7550.4 MB (24.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 10226.1 MB (99.9%), Available: 13.9 MB ( 0.1%)
CUDA Memory Allocated: 10239.3 MB
Total models: 3
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 0.0 MB ( 0.0%), ram= 469.4 MB (100.0%), locked=False
b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Flux): total=22700.1 MB, vram=10226.1 MB (45.0%), ram=12474.0 MB (55.0%), locked=False
[2025-01-06 23:47:40,894]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Added model ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder, Wrap mode: CachedModelWithPartialLoad, Model size: 159.87MB)
[2025-01-06 23:47:40,894]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache hit: ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder)
[2025-01-06 23:47:40,910]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Locking model ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder)
[2025-01-06 23:47:40,910]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before unloading: model_total=160 MB, model_vram=0 MB (0.0% %), vram_available=14 MB,
[2025-01-06 23:47:40,910]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Offloading unlocked models with goal of freeing 146.01MB of VRAM.
[2025-01-06 23:47:40,930]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer from VRAM to free 194 MB.
[2025-01-06 23:47:40,930]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded models (if necessary): vram_bytes_freed=193.92MB
[2025-01-06 23:47:40,930]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After unloading: model_total=160 MB, model_vram=0 MB (0.0% %), vram_available=208 MB,
[2025-01-06 23:47:40,960]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae' (AutoEncoder) onto cuda device in 0.05s. Total model size: 159.87MB, VRAM: 159.87MB (100.0%)
[2025-01-06 23:47:40,960]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Loaded model onto execution device: model_bytes_loaded=159.87MB,
[2025-01-06 23:47:40,960]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After loading: model_total=160 MB, model_vram=160 MB (100.0% %), vram_available=48 MB,
[2025-01-06 23:47:40,960]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Finished locking model ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder)
[2025-01-06 23:47:40,963]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Model cache state:
Storage Device (cpu) Limit: 30720.0 MB, Used: 23329.4 MB (75.9%), Available: 7390.6 MB (24.1%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 10192.1 MB (99.5%), Available: 47.9 MB ( 0.5%)
CUDA Memory Allocated: 10208.9 MB
Total models: 4
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 0.0 MB ( 0.0%), ram= 469.4 MB (100.0%), locked=False
b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Flux): total=22700.1 MB, vram=10032.2 MB (44.2%), ram=12667.9 MB (55.8%), locked=False
ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (AutoEncoder): total= 159.9 MB, vram= 159.9 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=True
[2025-01-06 23:47:46,576]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unlocked model ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder)
[2025-01-06 23:47:46,940]::[InvokeAI]::DEBUG --> On after run node: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c, node 80e3fba5-9fa1-498b-a4bd-06256d8efad9 (flux_vae_decode)
[2025-01-06 23:47:46,940]::[InvokeAI]::DEBUG --> On after run session: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c
[2025-01-06 23:47:46,962]::[InvokeAI]::INFO --> Graph stats: fc5be7a7-578e-4350-b32b-5c8ec6ba051c
Node Calls Seconds VRAM Used
flux_model_loader 1 0.203s 0.000G
flux_text_encoder 1 10.929s 9.347G
collect 1 0.000s 9.343G
flux_denoise 1 73.503s 10.587G
core_metadata 1 0.010s 9.999G
flux_vae_decode 1 6.451s 12.098G
TOTAL GRAPH EXECUTION TIME: 91.097s
TOTAL GRAPH WALL TIME: 91.098s
RAM used by InvokeAI process: 22.57G (+21.819G)
RAM used to load models: 31.90G
VRAM in use: 9.970G
RAM cache statistics:
Model cache hits: 6
Model cache misses: 6
Models cached: 4
Models cleared from cache: 2
Cache high water mark: 22.78/0.00G
[2025-01-06 23:47:46,982]::[InvokeAI]::INFO --> Executing queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2
[2025-01-06 23:47:46,982]::[InvokeAI]::DEBUG --> On before run session: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2
[2025-01-06 23:47:46,985]::[InvokeAI]::DEBUG --> On before run node: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2, node 9ae2f4aa-cf77-4125-ba77-cf4cc01a2309 (flux_model_loader)
[2025-01-06 23:47:46,986]::[InvokeAI]::DEBUG --> Invocation cache hit for type "flux_model_loader": 9ae2f4aa-cf77-4125-ba77-cf4cc01a2309
[2025-01-06 23:47:46,986]::[InvokeAI]::DEBUG --> On after run node: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2, node 9ae2f4aa-cf77-4125-ba77-cf4cc01a2309 (flux_model_loader)
[2025-01-06 23:47:46,988]::[InvokeAI]::DEBUG --> On before run node: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2, node b7c54fb9-b437-4f57-9260-4a236fe5d0be (flux_text_encoder)
[2025-01-06 23:47:46,988]::[InvokeAI]::DEBUG --> Invocation cache hit for type "flux_text_encoder": b7c54fb9-b437-4f57-9260-4a236fe5d0be
[2025-01-06 23:47:46,989]::[InvokeAI]::DEBUG --> On after run node: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2, node b7c54fb9-b437-4f57-9260-4a236fe5d0be (flux_text_encoder)
[2025-01-06 23:47:46,989]::[InvokeAI]::DEBUG --> On before run node: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2, node bbe083f0-bf0a-42e5-b6fb-6b60a32c9f84 (collect)
[2025-01-06 23:47:46,989]::[InvokeAI]::DEBUG --> Invocation cache hit for type "collect": bbe083f0-bf0a-42e5-b6fb-6b60a32c9f84
[2025-01-06 23:47:46,990]::[InvokeAI]::DEBUG --> On after run node: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2, node bbe083f0-bf0a-42e5-b6fb-6b60a32c9f84 (collect)
[2025-01-06 23:47:46,990]::[InvokeAI]::DEBUG --> On before run node: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2, node 1b141073-c5b2-4256-a47f-6ac429fde88e (flux_denoise)
[2025-01-06 23:47:46,991]::[InvokeAI]::DEBUG --> Invocation cache miss for type "flux_denoise": 1b141073-c5b2-4256-a47f-6ac429fde88e
[2025-01-06 23:47:47,006]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache hit: b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux)
[2025-01-06 23:47:47,008]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Locking model b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux)
[2025-01-06 23:47:47,008]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before unloading: model_total=22700 MB, model_vram=10032 MB (44.2% %), vram_available=48 MB,
[2025-01-06 23:47:47,008]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Offloading unlocked models with goal of freeing 12620.01MB of VRAM.
[2025-01-06 23:47:47,011]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae from VRAM to free 160 MB.
[2025-01-06 23:47:47,031]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded models (if necessary): vram_bytes_freed=159.87MB
[2025-01-06 23:47:47,032]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After unloading: model_total=22700 MB, model_vram=10032 MB (44.2% %), vram_available=208 MB,
[2025-01-06 23:47:47,384]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer' (Flux) onto cuda device in 0.38s. Total model size: 22700.13MB, VRAM: 10226.13MB (45.0%)
[2025-01-06 23:47:47,384]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Loaded model onto execution device: model_bytes_loaded=193.92MB,
[2025-01-06 23:47:47,384]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After loading: model_total=22700 MB, model_vram=10226 MB (45.0% %), vram_available=14 MB,
[2025-01-06 23:47:47,384]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Finished locking model b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux)
[2025-01-06 23:47:47,384]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Model cache state:
Storage Device (cpu) Limit: 30720.0 MB, Used: 23329.4 MB (75.9%), Available: 7390.6 MB (24.1%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 10226.1 MB (99.9%), Available: 13.9 MB ( 0.1%)
CUDA Memory Allocated: 10248.3 MB
Total models: 4
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 0.0 MB ( 0.0%), ram= 469.4 MB (100.0%), locked=False
b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Flux): total=22700.1 MB, vram=10226.1 MB (45.0%), ram=12474.0 MB (55.0%), locked=True
ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (AutoEncoder): total= 159.9 MB, vram= 0.0 MB ( 0.0%), ram= 159.9 MB (100.0%), locked=False
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [03:17<00:00, 6.58s/it]
[2025-01-06 23:51:04,731]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unlocked model b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux)
[2025-01-06 23:51:04,745]::[InvokeAI]::DEBUG --> On after run node: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2, node 1b141073-c5b2-4256-a47f-6ac429fde88e (flux_denoise)
[2025-01-06 23:51:04,746]::[InvokeAI]::DEBUG --> On before run node: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2, node 999c5bf6-d0c0-4ab3-ba26-cda2aae5c152 (core_metadata)
[2025-01-06 23:51:04,747]::[InvokeAI]::DEBUG --> Invocation cache miss for type "core_metadata": 999c5bf6-d0c0-4ab3-ba26-cda2aae5c152
[2025-01-06 23:51:04,747]::[InvokeAI]::DEBUG --> On after run node: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2, node 999c5bf6-d0c0-4ab3-ba26-cda2aae5c152 (core_metadata)
[2025-01-06 23:51:04,747]::[InvokeAI]::DEBUG --> On before run node: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2, node 0f3d6d5a-da92-40d5-bc54-851d1e2bf478 (flux_vae_decode)
[2025-01-06 23:51:04,748]::[InvokeAI]::DEBUG --> Skipping invocation cache for "flux_vae_decode": 0f3d6d5a-da92-40d5-bc54-851d1e2bf478
[2025-01-06 23:51:04,750]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache hit: ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder)
[2025-01-06 23:51:04,751]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Locking model ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder)
[2025-01-06 23:51:04,751]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before unloading: model_total=160 MB, model_vram=0 MB (0.0% %), vram_available=14 MB,
[2025-01-06 23:51:04,751]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Offloading unlocked models with goal of freeing 146.01MB of VRAM.
[2025-01-06 23:51:04,766]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer from VRAM to free 194 MB.
[2025-01-06 23:51:04,780]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded models (if necessary): vram_bytes_freed=193.92MB
[2025-01-06 23:51:04,780]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After unloading: model_total=160 MB, model_vram=0 MB (0.0% %), vram_available=208 MB,
[2025-01-06 23:51:04,820]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae' (AutoEncoder) onto cuda device in 0.07s. Total model size: 159.87MB, VRAM: 159.87MB (100.0%)
[2025-01-06 23:51:04,820]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Loaded model onto execution device: model_bytes_loaded=159.87MB,
[2025-01-06 23:51:04,820]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After loading: model_total=160 MB, model_vram=160 MB (100.0% %), vram_available=48 MB,
[2025-01-06 23:51:04,820]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Finished locking model ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder)
[2025-01-06 23:51:04,820]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Model cache state:
Storage Device (cpu) Limit: 30720.0 MB, Used: 23329.4 MB (75.9%), Available: 7390.6 MB (24.1%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 10192.1 MB (99.5%), Available: 47.9 MB ( 0.5%)
CUDA Memory Allocated: 10208.9 MB
Total models: 4
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 0.0 MB ( 0.0%), ram= 469.4 MB (100.0%), locked=False
b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Flux): total=22700.1 MB, vram=10032.2 MB (44.2%), ram=12667.9 MB (55.8%), locked=False
ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (AutoEncoder): total= 159.9 MB, vram= 159.9 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=True
[2025-01-06 23:51:07,266]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unlocked model ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder)
[2025-01-06 23:51:09,982]::[InvokeAI]::DEBUG --> On after run node: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2, node 0f3d6d5a-da92-40d5-bc54-851d1e2bf478 (flux_vae_decode)
[2025-01-06 23:51:09,982]::[InvokeAI]::DEBUG --> On after run session: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2
[2025-01-06 23:51:10,009]::[InvokeAI]::INFO --> Graph stats: dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2
Node Calls Seconds VRAM Used
flux_model_loader 1 0.002s 9.970G
flux_text_encoder 1 0.001s 9.970G
collect 1 0.001s 9.970G
flux_denoise 1 197.756s 10.586G
core_metadata 1 0.001s 9.999G
flux_vae_decode 1 5.235s 12.098G
TOTAL GRAPH EXECUTION TIME: 202.996s
TOTAL GRAPH WALL TIME: 202.996s
RAM used by InvokeAI process: 21.99G (-0.580G)
RAM used to load models: 22.32G
VRAM in use: 9.970G
RAM cache statistics:
Model cache hits: 2
Model cache misses: 0
Models cached: 4
Models cleared from cache: 0
Cache high water mark: 22.78/0.00G
[2025-01-06 23:51:10,027]::[InvokeAI]::INFO --> Executing queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963
[2025-01-06 23:51:10,027]::[InvokeAI]::DEBUG --> On before run session: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963
[2025-01-06 23:51:10,033]::[InvokeAI]::DEBUG --> On before run node: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963, node e53063be-01ae-43fa-b4b9-959a935ea65c (flux_model_loader)
[2025-01-06 23:51:10,033]::[InvokeAI]::DEBUG --> Invocation cache hit for type "flux_model_loader": e53063be-01ae-43fa-b4b9-959a935ea65c
[2025-01-06 23:51:10,033]::[InvokeAI]::DEBUG --> On after run node: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963, node e53063be-01ae-43fa-b4b9-959a935ea65c (flux_model_loader)
[2025-01-06 23:51:10,034]::[InvokeAI]::DEBUG --> On before run node: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963, node 8751e1a2-b4ca-47f4-81ec-3dde5189b918 (flux_text_encoder)
[2025-01-06 23:51:10,034]::[InvokeAI]::DEBUG --> Invocation cache hit for type "flux_text_encoder": 8751e1a2-b4ca-47f4-81ec-3dde5189b918
[2025-01-06 23:51:10,035]::[InvokeAI]::DEBUG --> On after run node: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963, node 8751e1a2-b4ca-47f4-81ec-3dde5189b918 (flux_text_encoder)
[2025-01-06 23:51:10,036]::[InvokeAI]::DEBUG --> On before run node: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963, node b32028a9-05f7-4af7-9a09-8593f577d6c0 (collect)
[2025-01-06 23:51:10,036]::[InvokeAI]::DEBUG --> Invocation cache hit for type "collect": b32028a9-05f7-4af7-9a09-8593f577d6c0
[2025-01-06 23:51:10,036]::[InvokeAI]::DEBUG --> On after run node: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963, node b32028a9-05f7-4af7-9a09-8593f577d6c0 (collect)
[2025-01-06 23:51:10,036]::[InvokeAI]::DEBUG --> On before run node: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963, node 87127432-67d8-4d6c-b27c-51fa5991da8f (flux_denoise)
[2025-01-06 23:51:10,036]::[InvokeAI]::DEBUG --> Invocation cache miss for type "flux_denoise": 87127432-67d8-4d6c-b27c-51fa5991da8f
[2025-01-06 23:51:10,053]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache hit: b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux)
[2025-01-06 23:51:10,054]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Locking model b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux)
[2025-01-06 23:51:10,054]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before unloading: model_total=22700 MB, model_vram=10032 MB (44.2% %), vram_available=48 MB,
[2025-01-06 23:51:10,054]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Offloading unlocked models with goal of freeing 12620.01MB of VRAM.
[2025-01-06 23:51:10,059]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae from VRAM to free 160 MB.
[2025-01-06 23:51:10,082]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded models (if necessary): vram_bytes_freed=159.87MB
[2025-01-06 23:51:10,082]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After unloading: model_total=22700 MB, model_vram=10032 MB (44.2% %), vram_available=208 MB,
[2025-01-06 23:51:10,138]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer' (Flux) onto cuda device in 0.08s. Total model size: 22700.13MB, VRAM: 10226.13MB (45.0%)
[2025-01-06 23:51:10,138]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Loaded model onto execution device: model_bytes_loaded=193.92MB,
[2025-01-06 23:51:10,138]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After loading: model_total=22700 MB, model_vram=10226 MB (45.0% %), vram_available=14 MB,
[2025-01-06 23:51:10,138]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Finished locking model b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux)
[2025-01-06 23:51:10,138]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Model cache state:
Storage Device (cpu) Limit: 30720.0 MB, Used: 23329.4 MB (75.9%), Available: 7390.6 MB (24.1%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 10226.1 MB (99.9%), Available: 13.9 MB ( 0.1%)
CUDA Memory Allocated: 10248.3 MB
Total models: 4
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 0.0 MB ( 0.0%), ram= 469.4 MB (100.0%), locked=False
b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Flux): total=22700.1 MB, vram=10226.1 MB (45.0%), ram=12474.0 MB (55.0%), locked=True
ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (AutoEncoder): total= 159.9 MB, vram= 0.0 MB ( 0.0%), ram= 159.9 MB (100.0%), locked=False
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [02:43<00:00, 5.44s/it]
[2025-01-06 23:53:53,199]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unlocked model b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux)
[2025-01-06 23:53:53,220]::[InvokeAI]::DEBUG --> On after run node: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963, node 87127432-67d8-4d6c-b27c-51fa5991da8f (flux_denoise)
[2025-01-06 23:53:53,220]::[InvokeAI]::DEBUG --> On before run node: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963, node 27a877df-d0b4-4417-a86c-66d32f411c4a (core_metadata)
[2025-01-06 23:53:53,221]::[InvokeAI]::DEBUG --> Invocation cache miss for type "core_metadata": 27a877df-d0b4-4417-a86c-66d32f411c4a
[2025-01-06 23:53:53,221]::[InvokeAI]::DEBUG --> On after run node: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963, node 27a877df-d0b4-4417-a86c-66d32f411c4a (core_metadata)
[2025-01-06 23:53:53,221]::[InvokeAI]::DEBUG --> On before run node: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963, node 5d199201-4727-409c-80ba-5fb5bae96c51 (flux_vae_decode)
[2025-01-06 23:53:53,222]::[InvokeAI]::DEBUG --> Skipping invocation cache for "flux_vae_decode": 5d199201-4727-409c-80ba-5fb5bae96c51
[2025-01-06 23:53:53,223]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache hit: ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder)
[2025-01-06 23:53:53,224]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Locking model ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder)
[2025-01-06 23:53:53,225]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before unloading: model_total=160 MB, model_vram=0 MB (0.0% %), vram_available=14 MB,
[2025-01-06 23:53:53,225]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Offloading unlocked models with goal of freeing 146.01MB of VRAM.
[2025-01-06 23:53:53,241]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer from VRAM to free 194 MB.
[2025-01-06 23:53:53,256]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded models (if necessary): vram_bytes_freed=193.92MB
[2025-01-06 23:53:53,256]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After unloading: model_total=160 MB, model_vram=0 MB (0.0% %), vram_available=208 MB,
[2025-01-06 23:53:53,297]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae' (AutoEncoder) onto cuda device in 0.07s. Total model size: 159.87MB, VRAM: 159.87MB (100.0%)
[2025-01-06 23:53:53,297]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Loaded model onto execution device: model_bytes_loaded=159.87MB,
[2025-01-06 23:53:53,297]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After loading: model_total=160 MB, model_vram=160 MB (100.0% %), vram_available=48 MB,
[2025-01-06 23:53:53,298]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Finished locking model ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder)
[2025-01-06 23:53:53,298]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Model cache state:
Storage Device (cpu) Limit: 30720.0 MB, Used: 23329.4 MB (75.9%), Available: 7390.6 MB (24.1%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 10192.1 MB (99.5%), Available: 47.9 MB ( 0.5%)
CUDA Memory Allocated: 10208.9 MB
Total models: 4
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 0.0 MB ( 0.0%), ram= 469.4 MB (100.0%), locked=False
b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Flux): total=22700.1 MB, vram=10032.2 MB (44.2%), ram=12667.9 MB (55.8%), locked=False
ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (AutoEncoder): total= 159.9 MB, vram= 159.9 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=True
[2025-01-06 23:53:57,792]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unlocked model ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder)
[2025-01-06 23:53:58,488]::[InvokeAI]::DEBUG --> On after run node: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963, node 5d199201-4727-409c-80ba-5fb5bae96c51 (flux_vae_decode)
[2025-01-06 23:53:58,488]::[InvokeAI]::DEBUG --> On after run session: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963
[2025-01-06 23:53:58,510]::[InvokeAI]::INFO --> Graph stats: bd5f7579-ae9d-4483-baf8-44be43dad963
Node Calls Seconds VRAM Used
flux_model_loader 1 0.001s 9.970G
flux_text_encoder 1 0.001s 9.970G
collect 1 0.000s 9.970G
flux_denoise 1 163.184s 10.586G
core_metadata 1 0.001s 9.999G
flux_vae_decode 1 5.267s 12.098G
TOTAL GRAPH EXECUTION TIME: 168.455s
TOTAL GRAPH WALL TIME: 168.455s
RAM used by InvokeAI process: 21.93G (-0.060G)
RAM used to load models: 22.32G
VRAM in use: 9.970G
RAM cache statistics:
Model cache hits: 2
Model cache misses: 0
Models cached: 4
Models cleared from cache: 0
Cache high water mark: 22.78/0.00G
Summary

This PR enables RAM/VRAM cache size limits to be determined dynamically based on availability.

Config Changes

This PR modifies the app configs in the following ways:

- A new `device_working_mem_gb` config was added. This is the amount of non-model working memory to keep available on the execution device (i.e. GPU) when using dynamic cache limits. It defaults to 3GB.
- The `ram` and `vram` configs now default to `None`. If these configs are set, they will take precedence over the dynamic limits. **Note: Some users may have previously overridden the `ram` and `vram` values in their `invokeai.yaml`. They will need to remove these configs to enable the new dynamic limit feature.**

Working Memory

In addition to the new `device_working_mem_gb` config described above, memory-intensive operations can estimate the amount of working memory that they will need and request it from the model cache. This is currently applied to the VAE decoding step for all models. In the future, we may apply this to other operations as we work out which ops tend to exceed the default working memory reservation.

Mitigations for #7513

This PR includes some mitigations for the issue described in #7513. Without these mitigations, it would occur with higher frequency when dynamic RAM limits are used and the RAM is close to maxed out.

Limitations / Future Work

- Only _models_ can be offloaded to RAM to conserve VRAM. I.e. if VAE decoding requires more working VRAM than is available, the best we can do is keep the full model on the CPU, but we will still hit an OOM error. In the future, we could detect this ahead of time and switch to running inference on the CPU for those ops.
- There is often a non-negligible amount of VRAM 'reserved' by the torch CUDA allocator, but not used by any allocated tensors. We may be able to tune the torch CUDA allocator to work better for our use case. Reference: https://pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf
- There may be some ops that require high working memory that haven't been updated to request extra memory yet. We will update these as we uncover them.
- If a model is 'locked' in VRAM, it won't be partially unloaded if a later model load requests extra working memory. This should be uncommon, but I can think of cases where it would matter.

Related Issues / Discussions

- #7492
- #7494
- #7500
- #7505

QA Instructions

Run a variety of models near the cache limits to ensure that model switching works properly for the following configurations:

- [x] CUDA, `enable_partial_loading=true`, all other configs default (i.e. dynamic memory limits)
- [x] CUDA, `enable_partial_loading=true`, CPU and CUDA memory reserved in another process so there is limited RAM/VRAM remaining, all other configs default (i.e. dynamic memory limits)
- [x] CUDA, `enable_partial_loading=false`, all other configs default (i.e. dynamic memory limits)
- [x] CUDA, ram/vram limits set (these should take precedence over the dynamic limits)
- [x] MPS, all other configs default (i.e. dynamic memory limits)
- [x] CPU, all other configs default (i.e. dynamic memory limits)

Merge Plan

- [x] Merge #7505 first and change target branch to main

Checklist

- [x] The PR has a short but descriptive title, suitable for a changelog
- [x] Tests added / updated (if applicable)
- [x] Documentation added / updated (if applicable)
- [ ] Updated `What's New` copy (if doing a release after this PR)
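To make the dynamic-limit idea above concrete, here is a minimal Python sketch of deriving a VRAM budget from current availability minus the working-memory reservation. The function name and the exact accounting are illustrative assumptions, not the actual implementation in this PR:

```python
import torch

# Assumed default, mirroring the `device_working_mem_gb` config described above.
DEVICE_WORKING_MEM_GB = 3.0

def dynamic_vram_budget_bytes(
    device: torch.device, working_mem_gb: float = DEVICE_WORKING_MEM_GB
) -> int:
    """Sketch: budget for model weights = free VRAM minus reserved working memory.

    torch.cuda.mem_get_info() returns (free_bytes, total_bytes) for the device.
    """
    free_bytes, _total_bytes = torch.cuda.mem_get_info(device)
    working_mem_bytes = int(working_mem_gb * 1024**3)
    # Clamp at zero so the budget never goes negative when free VRAM is scarce.
    return max(0, free_bytes - working_mem_bytes)

# A memory-intensive op (e.g. VAE decode) could shrink this budget further by
# requesting extra working memory before a model is locked into VRAM.
```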
Summary
This PR adds support for partial loading of models onto the GPU. This enables models to run with much lower peak VRAM requirements (e.g. full FLUX dev with 8GB of VRAM).
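As a rough illustration of the mechanism (a toy sketch under stated assumptions, not the PR's `CachedModelWithPartialLoad` wrapper): partial loading amounts to moving as many weights as fit within a byte budget onto the GPU and leaving the remainder on the CPU, with inference then streaming the CPU-resident weights in as needed.

```python
import torch

def greedy_partial_load(model: torch.nn.Module, vram_budget_bytes: int) -> int:
    """Toy sketch: move whole parameters to CUDA until the byte budget is spent,
    leaving the rest on the CPU. Running such a model additionally requires
    streaming the CPU-resident weights to the GPU layer by layer, which is the
    part the real cache wrapper handles."""
    used_bytes = 0
    for param in model.parameters():
        size = param.numel() * param.element_size()
        if used_bytes + size <= vram_budget_bytes:
            param.data = param.data.to("cuda")
            used_bytes += size
    return used_bytes
```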
The partial loading feature is enabled behind a new config flag: `enable_partial_loading=True`. This flag defaults to `False`.

Note about performance:

The `ram` and `vram` config limits are still applied when `enable_partial_loading=True` is set. This can result in significant slowdowns compared to the 'old' behaviour. Consider the case where the VRAM limit is set to `vram=0.75` (GB) and we are trying to run an 8GB model. When `enable_partial_loading=False`, we attempt to load the entire model into VRAM, and if it fits (no OOM error) then it will run at full speed. When `enable_partial_loading=True`, since we have the option to partially load the model, we will only load 0.75 GB into VRAM and leave the remaining 7.25 GB in RAM. This will cause inference to be much slower than before. To work around this, it is important that your `ram` and `vram` configs are carefully tuned. In a future PR, we will add the ability to dynamically set the RAM/VRAM limits based on the available memory / VRAM.
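A quick back-of-the-envelope check of the example above (illustrative arithmetic only, not InvokeAI's actual cache logic):

```python
def partial_load_split(model_size_gb: float, vram_limit_gb: float) -> tuple[float, float]:
    """How much of the model lands in VRAM vs. RAM under a fixed VRAM limit."""
    in_vram = min(model_size_gb, vram_limit_gb)
    return in_vram, model_size_gb - in_vram

# The worked example from the note: an 8GB model with vram=0.75.
in_vram, in_ram = partial_load_split(8.0, 0.75)
print(f"{in_vram:.2f} GB in VRAM, {in_ram:.2f} GB in RAM")  # 0.75 GB in VRAM, 7.25 GB in RAM
```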
Related Issues / Discussions

QA Instructions

Tests with `enable_partial_loading=True`, `vram=2`, on CUDA device:

For all tests, we expect model memory to stay below 2 GB. Peak working memory will be higher.
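One way to check the 'model memory vs. peak working memory' distinction during these tests is with torch's allocator counters; this is a generic measurement snippet, not part of the PR:

```python
import torch

torch.cuda.reset_peak_memory_stats()

# ... run one generation here ...

steady_mb = torch.cuda.memory_allocated() / 1024**2    # roughly the resident model weights
peak_mb = torch.cuda.max_memory_allocated() / 1024**2  # includes transient working memory
print(f"steady: {steady_mb:.0f} MB, peak: {peak_mb:.0f} MB")
```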
Tests with `enable_partial_loading=True`, and hack to force all models to load 10%, on CUDA device:

Tests with `enable_partial_loading=False`, `vram=30`:

We expect no change in behaviour when `enable_partial_loading=False`.

Other platforms:

- `enable_partial_loading=True`.
- `enable_partial_loading=True`.

Merge Plan
Checklist

- Updated `What's New` copy (if doing a release after this PR)