Partial Loading PR4: Enable partial loading (behind config flag) #7505
Conversation
Is there a recommended setting for a 4090 to test/compare performance?

For a 4090, you could test with something like this:

enable_partial_loading: true
vram: 18
ram: 30

That should give enough working memory for most operations. You should expect a slowdown on large models >18GB, but anything smaller than that will run at full speed.
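For reference, those keys go in invokeai.yaml, and the units are GB (they match the cache limits printed in the debug logs below). A minimal sketch, assuming a default install layout:

# invokeai.yaml -- fragment, illustrative
enable_partial_loading: true
vram: 18  # model cache VRAM limit, in GB
ram: 30   # model cache RAM limit, in GB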
Testing on Windows w/ a 4070 Ti (12 GB VRAM), 32 GB RAM, with FLUX Dev (full). Fresh install.
Default config
Cache key error while handling the lock. I suppose this is actually an OOM.
Logs
[2025-01-06 22:36:34,894]::[InvokeAI]::INFO --> Executing queue item 1, session 0e92668f-cd7a-49b1-86b7-00fe3ea5a657
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 4.71it/s]
[2025-01-06 22:36:47,944]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2' (T5EncoderModel) onto cuda device in 5.20s. Total model size: 9083.39MB, VRAM: 9083.39MB (100.0%)
[2025-01-06 22:36:47,975]::[InvokeAI]::ERROR --> Error while invoking session 0e92668f-cd7a-49b1-86b7-00fe3ea5a657, invocation 560e66bc-beee-432d-b1f6-440bf001c1a2 (flux_text_encoder): '847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2'
[2025-01-06 22:36:47,975]::[InvokeAI]::ERROR --> Traceback (most recent call last):
File "C:\Users\Nelle\Documents\InvokeAI\invokeai\app\services\session_processor\session_processor_default.py", line 129, in run_node
output = invocation.invoke_internal(context=context, services=self._services)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nelle\Documents\InvokeAI\invokeai\app\invocations\baseinvocation.py", line 300, in invoke_internal
output = self.invoke(context)
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nelle\Documents\InvokeAI\.venv\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nelle\Documents\InvokeAI\invokeai\app\invocations\flux_text_encoder.py", line 60, in invoke
t5_embeddings = self._t5_encode(context)
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nelle\Documents\InvokeAI\invokeai\app\invocations\flux_text_encoder.py", line 77, in _t5_encode
with (
File "C:\Users\Nelle\Documents\InvokeAI\invokeai\backend\model_manager\load\load_base.py", line 60, in __enter__
self._cache.lock(self._cache_record.key)
File "C:\Users\Nelle\Documents\InvokeAI\invokeai\backend\model_manager\load\model_cache\model_cache.py", line 199, in lock
cache_entry = self._cached_models[key]
~~~~~~~~~~~~~~~~~~~^^^^^
KeyError: '847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2'
[2025-01-06 22:36:48,007]::[InvokeAI]::INFO --> Graph stats: 0e92668f-cd7a-49b1-86b7-00fe3ea5a657
Node Calls Seconds VRAM Used
flux_model_loader 1 0.008s 0.000G
flux_text_encoder 1 13.037s 9.118G
TOTAL GRAPH EXECUTION TIME: 13.045s
TOTAL GRAPH WALL TIME: 13.046s
RAM used by InvokeAI process: 9.97G (+9.212G)
RAM used to load models: 8.87G
VRAM in use: 8.871G
RAM cache statistics:
Model cache hits: 2
Model cache misses: 2
Models cached: 1
Models cleared from cache: 1
Cache high water mark: 8.87/0.00G
Add enable_partial_loading: true to config
Same cache key error while handling the lock.
Logs
[2025-01-06 22:39:58,611]::[InvokeAI]::INFO --> Executing queue item 2, session 6318f284-b0df-4e3e-bd72-76a5e8eb82f9
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 5.30it/s]
[2025-01-06 22:40:06,899]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2' (T5EncoderModel) onto cuda device in 0.16s. Total model size: 9334.39MB, VRAM: 251.39MB (2.7%)
[2025-01-06 22:40:06,899]::[InvokeAI]::ERROR --> Error while invoking session 6318f284-b0df-4e3e-bd72-76a5e8eb82f9, invocation f0aea7a9-1c93-4605-a503-de728f1df1be (flux_text_encoder): '847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2'
[2025-01-06 22:40:06,899]::[InvokeAI]::ERROR --> Traceback (most recent call last):
File "C:\Users\Nelle\Documents\InvokeAI\invokeai\app\services\session_processor\session_processor_default.py", line 129, in run_node
output = invocation.invoke_internal(context=context, services=self._services)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nelle\Documents\InvokeAI\invokeai\app\invocations\baseinvocation.py", line 300, in invoke_internal
output = self.invoke(context)
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nelle\Documents\InvokeAI\.venv\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nelle\Documents\InvokeAI\invokeai\app\invocations\flux_text_encoder.py", line 60, in invoke
t5_embeddings = self._t5_encode(context)
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nelle\Documents\InvokeAI\invokeai\app\invocations\flux_text_encoder.py", line 77, in _t5_encode
with (
File "C:\Users\Nelle\Documents\InvokeAI\invokeai\backend\model_manager\load\load_base.py", line 60, in __enter__
self._cache.lock(self._cache_record.key)
File "C:\Users\Nelle\Documents\InvokeAI\invokeai\backend\model_manager\load\model_cache\model_cache.py", line 199, in lock
cache_entry = self._cached_models[key]
~~~~~~~~~~~~~~~~~~~^^^^^
KeyError: '847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2'
[2025-01-06 22:40:06,941]::[InvokeAI]::INFO --> Graph stats: 6318f284-b0df-4e3e-bd72-76a5e8eb82f9
Node Calls Seconds VRAM Used
flux_model_loader 1 0.214s 0.000G
flux_text_encoder 1 8.070s 0.246G
TOTAL GRAPH EXECUTION TIME: 8.283s
TOTAL GRAPH WALL TIME: 8.283s
RAM used by InvokeAI process: 1.34G (+0.588G)
RAM used to load models: 9.12G
VRAM in use: 0.000G
RAM cache statistics:
Model cache hits: 2
Model cache misses: 2
Models cached: 1
Models cleared from cache: 1
Cache high water mark: 9.12/0.00G
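My guess at what's going on, as a toy sketch (illustrative names and numbers, not InvokeAI's actual ModelCache code): with a small RAM cache limit, loading the ~9 GB T5 encoder evicts the tiny tokenizer_2 record that was cached moments earlier, so the subsequent lock() no longer finds its key.

from collections import OrderedDict

class TinyModelCache:
    """Toy stand-in for the model RAM cache -- not the real implementation."""

    def __init__(self, ram_limit_mb: float) -> None:
        self.ram_limit_mb = ram_limit_mb
        self._cached_models: OrderedDict[str, float] = OrderedDict()  # key -> size in MB

    def put(self, key: str, size_mb: float) -> None:
        # Drop the oldest entries until the incoming model fits under the limit
        # (or the cache is empty -- the working model must be held regardless).
        while self._cached_models and sum(self._cached_models.values()) + size_mb > self.ram_limit_mb:
            self._cached_models.popitem(last=False)
        self._cached_models[key] = size_mb

    def lock(self, key: str) -> float:
        # Assumes the record is still resident; this mirrors the line in
        # model_cache.py that raises when the entry was evicted in between.
        return self._cached_models[key]

cache = TinyModelCache(ram_limit_mb=8000.0)
cache.put("tokenizer_2", 0.03)         # tokenizer record cached first
cache.put("text_encoder_2", 9083.39)   # big T5 encoder forces tokenizer_2 out
cache.lock("text_encoder_2")           # fine
cache.lock("tokenizer_2")              # KeyError: 'tokenizer_2'

If that's roughly right, it would also explain why raising the cache limits below makes the error go away.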
Add ram: 30, vram: 10 to config
It works! I also tested with FLUX Canny LoRA & IP Adapter. All work, with expected changes in perf.
Logs
l model size: 9334.39MB, VRAM: 9334.39MB (100.0%)
[2025-01-06 22:40:58,719]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2' (T5Tokenizer) onto cuda device in 0.00s. Total model size: 0.03MB, VRAM: 0.00MB (0.0%)
[2025-01-06 22:40:59,818]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder' (CLIPTextModel) onto cuda device in 0.04s. Total model size: 469.44MB, VRAM: 469.44MB (100.0%)
[2025-01-06 22:40:59,818]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer' (CLIPTokenizer) onto cuda device in 0.00s. Total model size: 0.00MB, VRAM: 0.00MB (0.0%)
C:\Users\Nelle\Documents\InvokeAI\.venv\Lib\site-packages\transformers\models\clip\modeling_clip.py:540: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
[2025-01-06 22:41:11,980]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer' (Flux) onto cuda device in 7.11s. Total model size: 22700.13MB, VRAM: 10226.13MB (45.0%)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [01:13<00:00, 2.45s/it]
[2025-01-06 22:42:25,968]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae' (AutoEncoder) onto cuda device in 0.04s. Total model size: 159.87MB, VRAM: 159.87MB (100.0%)
[2025-01-06 22:42:32,484]::[InvokeAI]::INFO --> Graph stats: 15ba6bbe-c65a-4774-b9b6-3e0921f5eaab
Node Calls Seconds VRAM Used
flux_model_loader 1 0.007s 0.000G
flux_text_encoder 1 11.438s 9.347G
collect 1 0.000s 9.343G
flux_denoise 1 85.507s 10.587G
core_metadata 1 0.016s 9.999G
flux_vae_decode 1 7.013s 12.098G
TOTAL GRAPH EXECUTION TIME: 103.981s
TOTAL GRAPH WALL TIME: 103.982s
RAM used by InvokeAI process: 21.04G (+20.285G)
RAM used to load models: 31.90G
VRAM in use: 9.970G
RAM cache statistics:
Model cache hits: 6
Model cache misses: 6
Models cached: 4
Models cleared from cache: 2
Cache high water mark: 22.78/0.00G
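Quick arithmetic on the partial load: vram: 10 maps to the 10240 MB compute-device limit in the debug logs, and the transformer load line shows 10226.13 MB of the 22700.13 MB model resident on the GPU, i.e. 10226.13 / 22700.13 ≈ 45%. The remaining ~12.5 GB (12474 MB per the debug log further down) stays in RAM and has to be streamed to the GPU during denoising, which is presumably where the slowdown relative to a fully loaded model comes from.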
3x consecutive txt2img generations, same settings, different seed, no aux models
But I noticed some inconsistency in generation time. I restarted and did 3 consecutive gens; the it/s changes substantially from generation to generation. In this test and all following tests, the settings, prompt, model, etc. are the same; only the seed is random.
Logs
(.venv) PS C:\Users\Nelle\Documents\InvokeAI\invokeai\frontend\web> invokeai-web
[2025-01-06 22:56:18,990]::[InvokeAI]::INFO --> Patchmatch initialized
[2025-01-06 22:56:19,701]::[InvokeAI]::INFO --> Using torch device: NVIDIA GeForce RTX 4070 Ti
[2025-01-06 22:56:20,829]::[InvokeAI]::INFO --> cuDNN version: 90100
[2025-01-06 22:56:20,864]::[InvokeAI]::INFO --> InvokeAI version 5.5.0
[2025-01-06 22:56:20,865]::[InvokeAI]::INFO --> Root directory = C:\Users\Nelle\Documents\InvokeAI
[2025-01-06 22:56:20,865]::[InvokeAI]::INFO --> Initializing database at C:\Users\Nelle\Documents\InvokeAI\databases\invokeai.db
[2025-01-06 22:56:20,867]::[ModelInstallService]::INFO --> Removing dangling temporary directory C:\Users\Nelle\Documents\InvokeAI\models\tmpinstall_m4qqxyz5
[2025-01-06 22:56:20,882]::[InvokeAI]::INFO --> Pruned 6 finished queue items
[2025-01-06 22:56:21,020]::[InvokeAI]::INFO --> Cleaned database (freed 0.13MB)
[2025-01-06 22:56:21,020]::[InvokeAI]::INFO --> Invoke running on http://127.0.0.1:9090 (Press CTRL+C to quit)
[2025-01-06 22:57:36,521]::[InvokeAI]::INFO --> Executing queue item 9, session ace01c97-fc44-4aa3-9fb8-f95b422a6e2f
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 4.97it/s]
[2025-01-06 22:57:51,164]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2' (T5EncoderModel) onto cuda device in 6.98s. Total model size: 9334.39MB, VRAM: 9334.39MB (100.0%)
[2025-01-06 22:57:51,164]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2' (T5Tokenizer) onto cuda device in 0.00s. Total model size: 0.03MB, VRAM: 0.00MB (0.0%)
[2025-01-06 22:57:51,916]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder' (CLIPTextModel) onto cuda device in 0.05s. Total model size: 469.44MB, VRAM: 469.44MB (100.0%)
[2025-01-06 22:57:51,916]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer' (CLIPTokenizer) onto cuda device in 0.00s. Total model size: 0.00MB, VRAM: 0.00MB (0.0%)
C:\Users\Nelle\Documents\InvokeAI\.venv\Lib\site-packages\transformers\models\clip\modeling_clip.py:540: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
[2025-01-06 22:58:00,815]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer' (Flux) onto cuda device in 6.81s. Total model size: 22700.13MB, VRAM: 10226.13MB (45.0%)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [01:06<00:00, 2.21s/it]
[2025-01-06 22:59:07,588]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae' (AutoEncoder) onto cuda device in 0.05s. Total model size: 159.87MB, VRAM: 159.87MB (100.0%)
[2025-01-06 22:59:12,658]::[InvokeAI]::INFO --> Graph stats: ace01c97-fc44-4aa3-9fb8-f95b422a6e2f
Node Calls Seconds VRAM Used
flux_model_loader 1 0.217s 0.000G
flux_text_encoder 1 15.242s 9.347G
collect 1 0.001s 9.343G
flux_denoise 1 75.147s 10.587G
core_metadata 1 0.000s 9.999G
flux_vae_decode 1 5.510s 12.098G
TOTAL GRAPH EXECUTION TIME: 96.117s
TOTAL GRAPH WALL TIME: 96.117s
RAM used by InvokeAI process: 23.53G (+22.772G)
RAM used to load models: 31.90G
VRAM in use: 9.970G
RAM cache statistics:
Model cache hits: 6
Model cache misses: 6
Models cached: 4
Models cleared from cache: 2
Cache high water mark: 22.78/0.00G
[2025-01-06 22:59:12,679]::[InvokeAI]::INFO --> Executing queue item 10, session 593146d4-67ae-4721-98c7-1ad1bdcee934
[2025-01-06 22:59:12,985]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer' (Flux) onto cuda device in 0.29s. Total model size: 22700.13MB, VRAM: 10226.13MB (45.0%)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [02:14<00:00, 4.49s/it]
[2025-01-06 23:01:27,672]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae' (AutoEncoder) onto cuda device in 0.05s. Total model size: 159.87MB, VRAM: 159.87MB (100.0%)
[2025-01-06 23:01:33,057]::[InvokeAI]::INFO --> Graph stats: 593146d4-67ae-4721-98c7-1ad1bdcee934
Node Calls Seconds VRAM Used
flux_model_loader 1 0.000s 9.970G
flux_text_encoder 1 0.000s 9.970G
collect 1 0.000s 9.970G
flux_denoise 1 134.927s 10.586G
core_metadata 1 0.000s 9.999G
flux_vae_decode 1 5.412s 12.098G
TOTAL GRAPH EXECUTION TIME: 140.339s
TOTAL GRAPH WALL TIME: 140.341s
RAM used by InvokeAI process: 22.70G (-0.824G)
RAM used to load models: 22.32G
VRAM in use: 9.970G
RAM cache statistics:
Model cache hits: 2
Model cache misses: 0
Models cached: 4
Models cleared from cache: 0
Cache high water mark: 22.78/0.00G
[2025-01-06 23:01:33,057]::[InvokeAI]::INFO --> Executing queue item 11, session 4a54712a-5718-4c7b-82e9-28acbc91fbc5
[2025-01-06 23:01:33,143]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer' (Flux) onto cuda device in 0.06s. Total model size: 22700.13MB, VRAM: 10226.13MB (45.0%)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [02:14<00:00, 4.47s/it]
[2025-01-06 23:03:47,308]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae' (AutoEncoder) onto cuda device in 0.05s. Total model size: 159.87MB, VRAM: 159.87MB (100.0%)
[2025-01-06 23:03:52,770]::[InvokeAI]::INFO --> Graph stats: 4a54712a-5718-4c7b-82e9-28acbc91fbc5
Node Calls Seconds VRAM Used
flux_model_loader 1 0.000s 9.970G
flux_text_encoder 1 0.000s 9.970G
collect 1 0.001s 9.970G
flux_denoise 1 134.172s 10.586G
core_metadata 1 0.001s 9.999G
flux_vae_decode 1 5.487s 12.098G
TOTAL GRAPH EXECUTION TIME: 139.661s
TOTAL GRAPH WALL TIME: 139.662s
RAM used by InvokeAI process: 22.68G (-0.027G)
RAM used to load models: 22.32G
VRAM in use: 9.970G
RAM cache statistics:
Model cache hits: 2
Model cache misses: 0
Models cached: 4
Models cleared from cache: 0
Cache high water mark: 22.78/0.00G
Fresh start and another 3 gens - time differs quite a bit.
Logs
(.venv) PS C:\Users\Nelle\Documents\InvokeAI\invokeai\frontend\web> invokeai-web
[2025-01-06 23:32:20,404]::[InvokeAI]::INFO --> Patchmatch initialized
[2025-01-06 23:32:21,137]::[InvokeAI]::INFO --> Using torch device: NVIDIA GeForce RTX 4070 Ti
[2025-01-06 23:32:22,289]::[InvokeAI]::INFO --> cuDNN version: 90100
[2025-01-06 23:32:22,304]::[InvokeAI]::INFO --> InvokeAI version 5.5.0
[2025-01-06 23:32:22,304]::[InvokeAI]::INFO --> Root directory = C:\Users\Nelle\Documents\InvokeAI
[2025-01-06 23:32:22,304]::[InvokeAI]::INFO --> Initializing database at C:\Users\Nelle\Documents\InvokeAI\databases\invokeai.db
[2025-01-06 23:32:22,336]::[InvokeAI]::INFO --> Pruned 6 finished queue items
[2025-01-06 23:32:22,476]::[InvokeAI]::INFO --> Cleaned database (freed 0.19MB)
[2025-01-06 23:32:22,476]::[InvokeAI]::INFO --> Invoke running on http://127.0.0.1:9090 (Press CTRL+C to quit)
[2025-01-06 23:32:28,658]::[InvokeAI]::INFO --> Executing queue item 29, session 5dc2e6f3-c32a-43d9-93bd-cfcaa338546a
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 4.96it/s]
[2025-01-06 23:32:45,515]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2' (T5EncoderModel) onto cuda device in 9.23s. Total model size: 9334.39MB, VRAM: 9334.39MB (100.0%)
[2025-01-06 23:32:45,515]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2' (T5Tokenizer) onto cuda device in 0.00s. Total model size: 0.03MB, VRAM: 0.00MB (0.0%)
[2025-01-06 23:32:46,570]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder' (CLIPTextModel) onto cuda device in 0.06s. Total model size: 469.44MB, VRAM: 469.44MB (100.0%)
[2025-01-06 23:32:46,570]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer' (CLIPTokenizer) onto cuda device in 0.00s. Total model size: 0.00MB, VRAM: 0.00MB (0.0%)
C:\Users\Nelle\Documents\InvokeAI\.venv\Lib\site-packages\transformers\models\clip\modeling_clip.py:540: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
[2025-01-06 23:32:55,675]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer' (Flux) onto cuda device in 6.99s. Total model size: 22700.13MB, VRAM: 10226.13MB (45.0%)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [01:48<00:00, 3.60s/it]
[2025-01-06 23:34:44,335]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae' (AutoEncoder) onto cuda device in 0.06s. Total model size: 159.87MB, VRAM: 159.87MB (100.0%)
[2025-01-06 23:34:52,779]::[InvokeAI]::INFO --> Graph stats: 5dc2e6f3-c32a-43d9-93bd-cfcaa338546a
Node Calls Seconds VRAM Used
flux_model_loader 1 0.007s 0.000G
flux_text_encoder 1 17.976s 9.347G
collect 1 0.000s 9.343G
flux_denoise 1 117.154s 10.587G
core_metadata 1 0.017s 9.999G
flux_vae_decode 1 8.928s 12.098G
TOTAL GRAPH EXECUTION TIME: 144.082s
TOTAL GRAPH WALL TIME: 144.086s
RAM used by InvokeAI process: 23.04G (+22.289G)
RAM used to load models: 31.90G
VRAM in use: 9.970G
RAM cache statistics:
Model cache hits: 6
Model cache misses: 6
Models cached: 4
Models cleared from cache: 2
Cache high water mark: 22.78/0.00G
[2025-01-06 23:34:52,815]::[InvokeAI]::INFO --> Executing queue item 30, session 552db5b1-7b7b-4f3c-aaf3-23efac352189
[2025-01-06 23:34:53,378]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer' (Flux) onto cuda device in 0.54s. Total model size: 22700.13MB, VRAM: 10226.13MB (45.0%)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [04:16<00:00, 8.55s/it]
[2025-01-06 23:39:09,915]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae' (AutoEncoder) onto cuda device in 0.07s. Total model size: 159.87MB, VRAM: 159.87MB (100.0%)
[2025-01-06 23:39:15,194]::[InvokeAI]::INFO --> Graph stats: 552db5b1-7b7b-4f3c-aaf3-23efac352189
Node Calls Seconds VRAM Used
flux_model_loader 1 0.001s 9.970G
flux_text_encoder 1 0.000s 9.970G
collect 1 0.000s 9.970G
flux_denoise 1 257.015s 10.586G
core_metadata 1 0.001s 9.999G
flux_vae_decode 1 5.321s 12.098G
TOTAL GRAPH EXECUTION TIME: 262.339s
TOTAL GRAPH WALL TIME: 262.341s
RAM used by InvokeAI process: 22.22G (-0.823G)
RAM used to load models: 22.32G
VRAM in use: 9.970G
RAM cache statistics:
Model cache hits: 2
Model cache misses: 0
Models cached: 4
Models cleared from cache: 0
Cache high water mark: 22.78/0.00G
[2025-01-06 23:39:15,210]::[InvokeAI]::INFO --> Executing queue item 31, session b41bcac9-6ccf-4c4c-b89e-f0a3d6638f27
[2025-01-06 23:39:15,307]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer' (Flux) onto cuda device in 0.07s. Total model size: 22700.13MB, VRAM: 10226.13MB (45.0%)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [02:43<00:00, 5.46s/it]
[2025-01-06 23:41:59,339]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae' (AutoEncoder) onto cuda device in 0.08s. Total model size: 159.87MB, VRAM: 159.87MB (100.0%)
[2025-01-06 23:42:04,699]::[InvokeAI]::INFO --> Graph stats: b41bcac9-6ccf-4c4c-b89e-f0a3d6638f27
Node Calls Seconds VRAM Used
flux_model_loader 1 0.000s 9.970G
flux_text_encoder 1 0.001s 9.970G
collect 1 0.001s 9.970G
flux_denoise 1 164.038s 10.586G
core_metadata 1 0.000s 9.999G
flux_vae_decode 1 5.429s 12.098G
TOTAL GRAPH EXECUTION TIME: 169.469s
TOTAL GRAPH WALL TIME: 169.469s
RAM used by InvokeAI process: 22.12G (-0.102G)
RAM used to load models: 22.32G
VRAM in use: 9.970G
RAM cache statistics:
Model cache hits: 2
Model cache misses: 0
Models cached: 4
Models cleared from cache: 0
Cache high water mark: 22.78/0.00G
Another fresh start and 3 gens, but this time I enabled memory logging and set the log level to debug. I manually commented out some thread-polling log statements that were polluting the log. Still inconsistent.
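(For reproducibility, that was roughly the following in invokeai.yaml; key names from memory, so double-check them against the config docs:)

# invokeai.yaml -- debug additions, illustrative
log_level: debug
log_memory_usage: true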
Logs
(.venv) PS C:\Users\Nelle\Documents\InvokeAI\invokeai\frontend\web> invokeai-web
[2025-01-06 23:46:05,679]::[InvokeAI]::INFO --> Patchmatch initialized
[2025-01-06 23:46:06,352]::[InvokeAI]::INFO --> Using torch device: NVIDIA GeForce RTX 4070 Ti
[2025-01-06 23:46:07,489]::[InvokeAI]::INFO --> cuDNN version: 90100
[2025-01-06 23:46:07,508]::[InvokeAI]::INFO --> InvokeAI version 5.5.0
[2025-01-06 23:46:07,508]::[InvokeAI]::INFO --> Root directory = C:\Users\Nelle\Documents\InvokeAI
[2025-01-06 23:46:07,509]::[InvokeAI]::INFO --> Initializing database at C:\Users\Nelle\Documents\InvokeAI\databases\invokeai.db
[2025-01-06 23:46:07,509]::[InvokeAI]::DEBUG --> Registered migration 0 -> 1
[2025-01-06 23:46:07,509]::[InvokeAI]::DEBUG --> Registered migration 1 -> 2
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 2 -> 3
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 3 -> 4
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 4 -> 5
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 5 -> 6
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 6 -> 7
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 7 -> 8
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 8 -> 9
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 9 -> 10
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 10 -> 11
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 11 -> 12
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 12 -> 13
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 13 -> 14
[2025-01-06 23:46:07,510]::[InvokeAI]::DEBUG --> Registered migration 14 -> 15
[2025-01-06 23:46:07,511]::[InvokeAI]::DEBUG --> Database is up to date, no migrations to run
[2025-01-06 23:46:07,520]::[DownloadQueueService]::DEBUG --> Download queue worker thread Thread-2 (_download_next_item) starting.
[2025-01-06 23:46:07,520]::[DownloadQueueService]::DEBUG --> Download queue worker thread Thread-3 (_download_next_item) starting.
[2025-01-06 23:46:07,520]::[DownloadQueueService]::DEBUG --> Download queue worker thread Thread-4 (_download_next_item) starting.
[2025-01-06 23:46:07,520]::[DownloadQueueService]::DEBUG --> Download queue worker thread Thread-5 (_download_next_item) starting.
[2025-01-06 23:46:07,520]::[DownloadQueueService]::DEBUG --> Download queue worker thread Thread-6 (_download_next_item) starting.
[2025-01-06 23:46:07,659]::[InvokeAI]::INFO --> Invoke running on http://127.0.0.1:9090 (Press CTRL+C to quit)
[2025-01-06 23:46:15,836]::[InvokeAI]::INFO --> Executing queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c
[2025-01-06 23:46:15,836]::[InvokeAI]::DEBUG --> On before run session: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c
[2025-01-06 23:46:15,842]::[InvokeAI]::DEBUG --> On before run node: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c, node d78ff88b-71e4-4009-9ef6-aaa0442cf8c7 (flux_model_loader)
[2025-01-06 23:46:15,843]::[InvokeAI]::DEBUG --> Invocation cache miss for type "flux_model_loader": d78ff88b-71e4-4009-9ef6-aaa0442cf8c7
[2025-01-06 23:46:15,844]::[InvokeAI]::DEBUG --> On after run node: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c, node d78ff88b-71e4-4009-9ef6-aaa0442cf8c7 (flux_model_loader)
[2025-01-06 23:46:16,045]::[InvokeAI]::DEBUG --> On before run node: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c, node 156aa19c-8bfb-46d8-83dc-1759aecebbe2 (flux_text_encoder)
[2025-01-06 23:46:16,045]::[InvokeAI]::DEBUG --> Invocation cache miss for type "flux_text_encoder": 156aa19c-8bfb-46d8-83dc-1759aecebbe2
[2025-01-06 23:46:16,052]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache miss: 847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2
[2025-01-06 23:46:16,053]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 0.00MB of RAM.
[2025-01-06 23:46:16,053]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 0.0 MB ( 0.0%), Available: 30720.0 MB (100.0%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 0.0 MB ( 0.0%), Available: 10240.0 MB (100.0%)
CUDA Memory Allocated: 0.0 MB
Total models: 0
[2025-01-06 23:46:16,053]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 0 models to free 0.00MB of RAM.
[2025-01-06 23:46:16,054]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 0.0 MB ( 0.0%), Available: 30720.0 MB (100.0%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 0.0 MB ( 0.0%), Available: 10240.0 MB (100.0%)
CUDA Memory Allocated: 0.0 MB
Total models: 0
[2025-01-06 23:46:16,151]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 0.03MB of RAM.
[2025-01-06 23:46:16,151]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 0.0 MB ( 0.0%), Available: 30720.0 MB (100.0%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 0.0 MB ( 0.0%), Available: 10240.0 MB (100.0%)
CUDA Memory Allocated: 0.0 MB
Total models: 0
[2025-01-06 23:46:16,151]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 0 models to free 0.00MB of RAM.
[2025-01-06 23:46:16,154]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 0.0 MB ( 0.0%), Available: 30720.0 MB (100.0%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 0.0 MB ( 0.0%), Available: 10240.0 MB (100.0%)
CUDA Memory Allocated: 0.0 MB
Total models: 0
[2025-01-06 23:46:16,154]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Added model 847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (Type: T5Tokenizer, Wrap mode: CachedModelOnlyFullLoad, Model size: 0.03MB)
[2025-01-06 23:46:16,154]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache hit: 847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (Type: T5Tokenizer)
[2025-01-06 23:46:16,155]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache miss: 847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2
[2025-01-06 23:46:16,155]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 9083.39MB of RAM.
[2025-01-06 23:46:16,155]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 0.0 MB ( 0.0%), Available: 30720.0 MB (100.0%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 0.0 MB ( 0.0%), Available: 10240.0 MB (100.0%)
CUDA Memory Allocated: 0.0 MB
Total models: 1
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
[2025-01-06 23:46:16,155]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 0 models to free 0.00MB of RAM.
[2025-01-06 23:46:16,155]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 0.0 MB ( 0.0%), Available: 30720.0 MB (100.0%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 0.0 MB ( 0.0%), Available: 10240.0 MB (100.0%)
CUDA Memory Allocated: 0.0 MB
Total models: 1
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 5.59it/s]
[2025-01-06 23:46:23,438]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 9083.39MB of RAM.
[2025-01-06 23:46:23,438]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 0.0 MB ( 0.0%), Available: 30720.0 MB (100.0%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 0.0 MB ( 0.0%), Available: 10240.0 MB (100.0%)
CUDA Memory Allocated: 0.0 MB
Total models: 1
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
[2025-01-06 23:46:23,438]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 0 models to free 0.00MB of RAM.
[2025-01-06 23:46:23,438]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 0.0 MB ( 0.0%), Available: 30720.0 MB (100.0%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 0.0 MB ( 0.0%), Available: 10240.0 MB (100.0%)
CUDA Memory Allocated: 0.0 MB
Total models: 1
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
[2025-01-06 23:46:23,454]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Added model 847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (Type: T5EncoderModel, Wrap mode: CachedModelWithPartialLoad, Model size: 9083.39MB)
[2025-01-06 23:46:23,454]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache hit: 847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (Type: T5EncoderModel)
[2025-01-06 23:46:23,454]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Locking model 847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (Type: T5EncoderModel)
[2025-01-06 23:46:23,459]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before unloading: model_total=9334 MB, model_vram=0 MB (0.0% %), vram_available=10240 MB,
[2025-01-06 23:46:23,459]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Offloading unlocked models with goal of freeing 0.00MB of VRAM.
[2025-01-06 23:46:23,459]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded models (if necessary): vram_bytes_freed=0.00MB
[2025-01-06 23:46:23,459]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After unloading: model_total=9334 MB, model_vram=0 MB (0.0% %), vram_available=10240 MB,
[2025-01-06 23:46:26,376]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2' (T5EncoderModel) onto cuda device in 2.92s. Total model size: 9334.39MB, VRAM: 9334.39MB (100.0%)
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Loaded model onto execution device: model_bytes_loaded=9334.39MB,
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After loading: model_total=9334 MB, model_vram=9334 MB (100.0% %), vram_available=906 MB,
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Finished locking model 847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (Type: T5EncoderModel)
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Model cache state:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9334.4 MB (30.4%), Available: 21385.6 MB (69.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9334.4 MB (91.2%), Available: 905.6 MB ( 8.8%)
CUDA Memory Allocated: 9084.4 MB
Total models: 2
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=True
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Locking model 847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (Type: T5Tokenizer)
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before unloading: model_total=0 MB, model_vram=0 MB (0.0% %), vram_available=906 MB,
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Offloading unlocked models with goal of freeing 0.00MB of VRAM.
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded models (if necessary): vram_bytes_freed=0.00MB
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After unloading: model_total=0 MB, model_vram=0 MB (0.0% %), vram_available=906 MB,
[2025-01-06 23:46:26,376]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2' (T5Tokenizer) onto cuda device in 0.00s. Total model size: 0.03MB, VRAM: 0.00MB (0.0%)
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Loaded model onto execution device: model_bytes_loaded=0.00MB,
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After loading: model_total=0 MB, model_vram=0 MB (0.0% %), vram_available=906 MB,
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Finished locking model 847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (Type: T5Tokenizer)
[2025-01-06 23:46:26,376]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Model cache state:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9334.4 MB (30.4%), Available: 21385.6 MB (69.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9334.4 MB (91.2%), Available: 905.6 MB ( 8.8%)
CUDA Memory Allocated: 9084.4 MB
Total models: 2
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=True
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=True
[2025-01-06 23:46:26,575]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unlocked model 847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (Type: T5Tokenizer)
[2025-01-06 23:46:26,575]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unlocked model 847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (Type: T5EncoderModel)
[2025-01-06 23:46:26,579]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache miss: 5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer
[2025-01-06 23:46:26,579]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 0.00MB of RAM.
[2025-01-06 23:46:26,580]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9334.4 MB (30.4%), Available: 21385.6 MB (69.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9334.4 MB (91.2%), Available: 905.6 MB ( 8.8%)
CUDA Memory Allocated: 9096.5 MB
Total models: 2
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:26,671]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 0 models to free 0.00MB of RAM.
[2025-01-06 23:46:26,671]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9334.4 MB (30.4%), Available: 21385.6 MB (69.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9334.4 MB (91.2%), Available: 905.6 MB ( 8.8%)
CUDA Memory Allocated: 9096.5 MB
Total models: 2
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:26,721]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 0.00MB of RAM.
[2025-01-06 23:46:26,721]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9334.4 MB (30.4%), Available: 21385.6 MB (69.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9334.4 MB (91.2%), Available: 905.6 MB ( 8.8%)
CUDA Memory Allocated: 9096.5 MB
Total models: 2
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:26,721]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 0 models to free 0.00MB of RAM.
[2025-01-06 23:46:26,721]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9334.4 MB (30.4%), Available: 21385.6 MB (69.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9334.4 MB (91.2%), Available: 905.6 MB ( 8.8%)
CUDA Memory Allocated: 9096.5 MB
Total models: 2
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:26,721]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Added model 5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (Type: CLIPTokenizer, Wrap mode: CachedModelOnlyFullLoad, Model size: 0.00MB)
[2025-01-06 23:46:26,721]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache hit: 5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (Type: CLIPTokenizer)
[2025-01-06 23:46:26,725]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache miss: 5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder
[2025-01-06 23:46:26,725]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 234.74MB of RAM.
[2025-01-06 23:46:26,725]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9334.4 MB (30.4%), Available: 21385.6 MB (69.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9334.4 MB (91.2%), Available: 905.6 MB ( 8.8%)
CUDA Memory Allocated: 9096.5 MB
Total models: 3
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:26,726]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 0 models to free 0.00MB of RAM.
[2025-01-06 23:46:26,726]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9334.4 MB (30.4%), Available: 21385.6 MB (69.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9334.4 MB (91.2%), Available: 905.6 MB ( 8.8%)
CUDA Memory Allocated: 9096.5 MB
Total models: 3
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:26,846]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 469.44MB of RAM.
[2025-01-06 23:46:26,846]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9334.4 MB (30.4%), Available: 21385.6 MB (69.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9334.4 MB (91.2%), Available: 905.6 MB ( 8.8%)
CUDA Memory Allocated: 9096.5 MB
Total models: 3
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:26,846]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 0 models to free 0.00MB of RAM.
[2025-01-06 23:46:26,846]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9334.4 MB (30.4%), Available: 21385.6 MB (69.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9334.4 MB (91.2%), Available: 905.6 MB ( 8.8%)
CUDA Memory Allocated: 9096.5 MB
Total models: 3
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:26,846]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Added model 5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (Type: CLIPTextModel, Wrap mode: CachedModelWithPartialLoad, Model size: 469.44MB)
[2025-01-06 23:46:26,846]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache hit: 5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (Type: CLIPTextModel)
[2025-01-06 23:46:26,846]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Locking model 5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (Type: CLIPTextModel)
[2025-01-06 23:46:26,861]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before unloading: model_total=469 MB, model_vram=0 MB (0.0% %), vram_available=906 MB,
[2025-01-06 23:46:26,861]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Offloading unlocked models with goal of freeing 0.00MB of VRAM.
[2025-01-06 23:46:26,862]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded models (if necessary): vram_bytes_freed=0.00MB
[2025-01-06 23:46:26,862]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After unloading: model_total=469 MB, model_vram=0 MB (0.0% %), vram_available=906 MB,
[2025-01-06 23:46:26,906]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder' (CLIPTextModel) onto cuda device in 0.06s. Total model size: 469.44MB, VRAM: 469.44MB (100.0%)
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Loaded model onto execution device: model_bytes_loaded=469.44MB,
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After loading: model_total=469 MB, model_vram=469 MB (100.0% %), vram_available=436 MB,
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Finished locking model 5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (Type: CLIPTextModel)
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Model cache state:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9803.9 MB (31.9%), Available: 20916.1 MB (68.1%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9803.8 MB (95.7%), Available: 436.2 MB ( 4.3%)
CUDA Memory Allocated: 9567.3 MB
Total models: 4
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 469.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=True
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Locking model 5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (Type: CLIPTokenizer)
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before unloading: model_total=0 MB, model_vram=0 MB (0.0% %), vram_available=436 MB,
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Offloading unlocked models with goal of freeing 0.00MB of VRAM.
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded models (if necessary): vram_bytes_freed=0.00MB
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After unloading: model_total=0 MB, model_vram=0 MB (0.0% %), vram_available=436 MB,
[2025-01-06 23:46:26,906]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer' (CLIPTokenizer) onto cuda device in 0.00s. Total model size: 0.00MB, VRAM: 0.00MB (0.0%)
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Loaded model onto execution device: model_bytes_loaded=0.00MB,
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After loading: model_total=0 MB, model_vram=0 MB (0.0% %), vram_available=436 MB,
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Finished locking model 5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (Type: CLIPTokenizer)
[2025-01-06 23:46:26,906]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Model cache state:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9803.9 MB (31.9%), Available: 20916.1 MB (68.1%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9803.8 MB (95.7%), Available: 436.2 MB ( 4.3%)
CUDA Memory Allocated: 9567.3 MB
Total models: 4
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=True
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 469.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=True
C:\Users\Nelle\Documents\InvokeAI\.venv\Lib\site-packages\transformers\models\clip\modeling_clip.py:540: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
[2025-01-06 23:46:26,961]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unlocked model 5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (Type: CLIPTokenizer)
[2025-01-06 23:46:26,961]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unlocked model 5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (Type: CLIPTextModel)
[2025-01-06 23:46:26,974]::[InvokeAI]::DEBUG --> On after run node: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c, node 156aa19c-8bfb-46d8-83dc-1759aecebbe2 (flux_text_encoder)
[2025-01-06 23:46:26,974]::[InvokeAI]::DEBUG --> On before run node: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c, node 51833822-f3b9-4ad2-9932-40a107a95d67 (collect)
[2025-01-06 23:46:26,974]::[InvokeAI]::DEBUG --> Invocation cache miss for type "collect": 51833822-f3b9-4ad2-9932-40a107a95d67
[2025-01-06 23:46:26,974]::[InvokeAI]::DEBUG --> On after run node: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c, node 51833822-f3b9-4ad2-9932-40a107a95d67 (collect)
[2025-01-06 23:46:26,974]::[InvokeAI]::DEBUG --> On before run node: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c, node 17eec3f2-90c7-47f9-8759-e185424bddc6 (flux_denoise)
[2025-01-06 23:46:26,974]::[InvokeAI]::DEBUG --> Invocation cache miss for type "flux_denoise": 17eec3f2-90c7-47f9-8759-e185424bddc6
[2025-01-06 23:46:26,987]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache miss: b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer
[2025-01-06 23:46:26,988]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 22700.25MB of RAM.
[2025-01-06 23:46:26,988]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 9803.9 MB (31.9%), Available: 20916.1 MB (68.1%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 9803.8 MB (95.7%), Available: 436.2 MB ( 4.3%)
CUDA Memory Allocated: 9576.6 MB
Total models: 4
Models:
847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 (T5Tokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB (100.0%), locked=False
847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 (T5EncoderModel): total= 9334.4 MB, vram= 9334.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 469.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:26,988]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropping 847347b4-bae3-4d88-9937-7056497da4e9:tokenizer_2 from RAM cache to free 0.03MB.
[2025-01-06 23:46:26,991]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropping 847347b4-bae3-4d88-9937-7056497da4e9:text_encoder_2 from RAM cache to free 9334.39MB.
[2025-01-06 23:46:27,759]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 2 models to free 9334.42MB of RAM.
[2025-01-06 23:46:27,759]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 469.4 MB ( 1.5%), Available: 30250.6 MB (98.5%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 469.4 MB ( 4.6%), Available: 9770.6 MB (95.4%)
CUDA Memory Allocated: 492.2 MB
Total models: 2
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 469.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:29,012]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 22700.13MB of RAM.
[2025-01-06 23:46:29,012]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 469.4 MB ( 1.5%), Available: 30250.6 MB (98.5%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 469.4 MB ( 4.6%), Available: 9770.6 MB (95.4%)
CUDA Memory Allocated: 492.2 MB
Total models: 2
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 469.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:29,012]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 0 models to free 0.00MB of RAM.
[2025-01-06 23:46:29,012]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 469.4 MB ( 1.5%), Available: 30250.6 MB (98.5%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 469.4 MB ( 4.6%), Available: 9770.6 MB (95.4%)
CUDA Memory Allocated: 492.2 MB
Total models: 2
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 469.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:29,012]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 22700.13MB of RAM.
[2025-01-06 23:46:29,012]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 469.4 MB ( 1.5%), Available: 30250.6 MB (98.5%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 469.4 MB ( 4.6%), Available: 9770.6 MB (95.4%)
CUDA Memory Allocated: 492.2 MB
Total models: 2
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 469.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:29,027]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 0 models to free 0.00MB of RAM.
[2025-01-06 23:46:29,027]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 469.4 MB ( 1.5%), Available: 30250.6 MB (98.5%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 469.4 MB ( 4.6%), Available: 9770.6 MB (95.4%)
CUDA Memory Allocated: 492.2 MB
Total models: 2
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 469.4 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=False
[2025-01-06 23:46:29,049]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Added model b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux, Wrap mode: CachedModelWithPartialLoad, Model size: 22700.13MB)
[2025-01-06 23:46:29,049]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache hit: b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux)
[2025-01-06 23:46:29,054]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Locking model b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux)
[2025-01-06 23:46:29,057]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before unloading: model_total=22700 MB, model_vram=0 MB (0.0% %), vram_available=9771 MB,
[2025-01-06 23:46:29,058]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Offloading unlocked models with goal of freeing 12929.57MB of VRAM.
[2025-01-06 23:46:29,062]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded 5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder from VRAM to free 469 MB.
[2025-01-06 23:46:29,064]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded models (if necessary): vram_bytes_freed=469.44MB
[2025-01-06 23:46:29,064]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After unloading: model_total=22700 MB, model_vram=0 MB (0.0% %), vram_available=10240 MB,
[2025-01-06 23:46:33,066]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer' (Flux) onto cuda device in 4.01s. Total model size: 22700.13MB, VRAM: 10226.13MB (45.0%)
[2025-01-06 23:46:33,066]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Loaded model onto execution device: model_bytes_loaded=10226.13MB,
[2025-01-06 23:46:33,066]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After loading: model_total=22700 MB, model_vram=10226 MB (45.0% %), vram_available=14 MB,
[2025-01-06 23:46:33,066]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Finished locking model b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux)
[2025-01-06 23:46:33,066]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Model cache state:
Storage Device (cpu) Limit: 30720.0 MB, Used: 23169.6 MB (75.4%), Available: 7550.4 MB (24.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 10226.1 MB (99.9%), Available: 13.9 MB ( 0.1%)
CUDA Memory Allocated: 10249.1 MB
Total models: 3
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 0.0 MB ( 0.0%), ram= 469.4 MB (100.0%), locked=False
b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Flux): total=22700.1 MB, vram=10226.1 MB (45.0%), ram=12474.0 MB (55.0%), locked=True
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [01:07<00:00, 2.25s/it]
[2025-01-06 23:47:40,449]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unlocked model b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux)
[2025-01-06 23:47:40,476]::[InvokeAI]::DEBUG --> On after run node: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c, node 17eec3f2-90c7-47f9-8759-e185424bddc6 (flux_denoise)
[2025-01-06 23:47:40,477]::[InvokeAI]::DEBUG --> On before run node: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c, node a89c8ad7-8a7f-4b93-bc4f-67fdf51a35ad (core_metadata)
[2025-01-06 23:47:40,478]::[InvokeAI]::DEBUG --> Invocation cache miss for type "core_metadata": a89c8ad7-8a7f-4b93-bc4f-67fdf51a35ad
[2025-01-06 23:47:40,488]::[InvokeAI]::DEBUG --> On after run node: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c, node a89c8ad7-8a7f-4b93-bc4f-67fdf51a35ad (core_metadata)
[2025-01-06 23:47:40,489]::[InvokeAI]::DEBUG --> On before run node: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c, node 80e3fba5-9fa1-498b-a4bd-06256d8efad9 (flux_vae_decode)
[2025-01-06 23:47:40,489]::[InvokeAI]::DEBUG --> Skipping invocation cache for "flux_vae_decode": 80e3fba5-9fa1-498b-a4bd-06256d8efad9
[2025-01-06 23:47:40,492]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache miss: ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae
[2025-01-06 23:47:40,492]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 319.77MB of RAM.
[2025-01-06 23:47:40,493]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 23169.6 MB (75.4%), Available: 7550.4 MB (24.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 10226.1 MB (99.9%), Available: 13.9 MB ( 0.1%)
CUDA Memory Allocated: 10239.3 MB
Total models: 3
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 0.0 MB ( 0.0%), ram= 469.4 MB (100.0%), locked=False
b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Flux): total=22700.1 MB, vram=10226.1 MB (45.0%), ram=12474.0 MB (55.0%), locked=False
[2025-01-06 23:47:40,498]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 0 models to free 0.00MB of RAM.
[2025-01-06 23:47:40,498]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 23169.6 MB (75.4%), Available: 7550.4 MB (24.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 10226.1 MB (99.9%), Available: 13.9 MB ( 0.1%)
CUDA Memory Allocated: 10239.3 MB
Total models: 3
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 0.0 MB ( 0.0%), ram= 469.4 MB (100.0%), locked=False
b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Flux): total=22700.1 MB, vram=10226.1 MB (45.0%), ram=12474.0 MB (55.0%), locked=False
[2025-01-06 23:47:40,894]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Making room for 159.87MB of RAM.
[2025-01-06 23:47:40,894]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 23169.6 MB (75.4%), Available: 7550.4 MB (24.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 10226.1 MB (99.9%), Available: 13.9 MB ( 0.1%)
CUDA Memory Allocated: 10239.3 MB
Total models: 3
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 0.0 MB ( 0.0%), ram= 469.4 MB (100.0%), locked=False
b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Flux): total=22700.1 MB, vram=10226.1 MB (45.0%), ram=12474.0 MB (55.0%), locked=False
[2025-01-06 23:47:40,894]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Dropped 0 models to free 0.00MB of RAM.
[2025-01-06 23:47:40,894]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After dropping models:
Storage Device (cpu) Limit: 30720.0 MB, Used: 23169.6 MB (75.4%), Available: 7550.4 MB (24.6%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 10226.1 MB (99.9%), Available: 13.9 MB ( 0.1%)
CUDA Memory Allocated: 10239.3 MB
Total models: 3
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 0.0 MB ( 0.0%), ram= 469.4 MB (100.0%), locked=False
b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Flux): total=22700.1 MB, vram=10226.1 MB (45.0%), ram=12474.0 MB (55.0%), locked=False
[2025-01-06 23:47:40,894]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Added model ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder, Wrap mode: CachedModelWithPartialLoad, Model size: 159.87MB)
[2025-01-06 23:47:40,894]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache hit: ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder)
[2025-01-06 23:47:40,910]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Locking model ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder)
[2025-01-06 23:47:40,910]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before unloading: model_total=160 MB, model_vram=0 MB (0.0% %), vram_available=14 MB,
[2025-01-06 23:47:40,910]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Offloading unlocked models with goal of freeing 146.01MB of VRAM.
[2025-01-06 23:47:40,930]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer from VRAM to free 194 MB.
[2025-01-06 23:47:40,930]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded models (if necessary): vram_bytes_freed=193.92MB
[2025-01-06 23:47:40,930]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After unloading: model_total=160 MB, model_vram=0 MB (0.0% %), vram_available=208 MB,
[2025-01-06 23:47:40,960]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae' (AutoEncoder) onto cuda device in 0.05s. Total model size: 159.87MB, VRAM: 159.87MB (100.0%)
[2025-01-06 23:47:40,960]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Loaded model onto execution device: model_bytes_loaded=159.87MB,
[2025-01-06 23:47:40,960]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After loading: model_total=160 MB, model_vram=160 MB (100.0% %), vram_available=48 MB,
[2025-01-06 23:47:40,960]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Finished locking model ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder)
[2025-01-06 23:47:40,963]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Model cache state:
Storage Device (cpu) Limit: 30720.0 MB, Used: 23329.4 MB (75.9%), Available: 7390.6 MB (24.1%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 10192.1 MB (99.5%), Available: 47.9 MB ( 0.5%)
CUDA Memory Allocated: 10208.9 MB
Total models: 4
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 0.0 MB ( 0.0%), ram= 469.4 MB (100.0%), locked=False
b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Flux): total=22700.1 MB, vram=10032.2 MB (44.2%), ram=12667.9 MB (55.8%), locked=False
ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (AutoEncoder): total= 159.9 MB, vram= 159.9 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=True
[2025-01-06 23:47:46,576]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unlocked model ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder)
[2025-01-06 23:47:46,940]::[InvokeAI]::DEBUG --> On after run node: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c, node 80e3fba5-9fa1-498b-a4bd-06256d8efad9 (flux_vae_decode)
[2025-01-06 23:47:46,940]::[InvokeAI]::DEBUG --> On after run session: queue item 35, session fc5be7a7-578e-4350-b32b-5c8ec6ba051c
[2025-01-06 23:47:46,962]::[InvokeAI]::INFO --> Graph stats: fc5be7a7-578e-4350-b32b-5c8ec6ba051c
Node Calls Seconds VRAM Used
flux_model_loader 1 0.203s 0.000G
flux_text_encoder 1 10.929s 9.347G
collect 1 0.000s 9.343G
flux_denoise 1 73.503s 10.587G
core_metadata 1 0.010s 9.999G
flux_vae_decode 1 6.451s 12.098G
TOTAL GRAPH EXECUTION TIME: 91.097s
TOTAL GRAPH WALL TIME: 91.098s
RAM used by InvokeAI process: 22.57G (+21.819G)
RAM used to load models: 31.90G
VRAM in use: 9.970G
RAM cache statistics:
Model cache hits: 6
Model cache misses: 6
Models cached: 4
Models cleared from cache: 2
Cache high water mark: 22.78/0.00G
[2025-01-06 23:47:46,982]::[InvokeAI]::INFO --> Executing queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2
[2025-01-06 23:47:46,982]::[InvokeAI]::DEBUG --> On before run session: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2
[2025-01-06 23:47:46,985]::[InvokeAI]::DEBUG --> On before run node: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2, node 9ae2f4aa-cf77-4125-ba77-cf4cc01a2309 (flux_model_loader)
[2025-01-06 23:47:46,986]::[InvokeAI]::DEBUG --> Invocation cache hit for type "flux_model_loader": 9ae2f4aa-cf77-4125-ba77-cf4cc01a2309
[2025-01-06 23:47:46,986]::[InvokeAI]::DEBUG --> On after run node: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2, node 9ae2f4aa-cf77-4125-ba77-cf4cc01a2309 (flux_model_loader)
[2025-01-06 23:47:46,988]::[InvokeAI]::DEBUG --> On before run node: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2, node b7c54fb9-b437-4f57-9260-4a236fe5d0be (flux_text_encoder)
[2025-01-06 23:47:46,988]::[InvokeAI]::DEBUG --> Invocation cache hit for type "flux_text_encoder": b7c54fb9-b437-4f57-9260-4a236fe5d0be
[2025-01-06 23:47:46,989]::[InvokeAI]::DEBUG --> On after run node: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2, node b7c54fb9-b437-4f57-9260-4a236fe5d0be (flux_text_encoder)
[2025-01-06 23:47:46,989]::[InvokeAI]::DEBUG --> On before run node: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2, node bbe083f0-bf0a-42e5-b6fb-6b60a32c9f84 (collect)
[2025-01-06 23:47:46,989]::[InvokeAI]::DEBUG --> Invocation cache hit for type "collect": bbe083f0-bf0a-42e5-b6fb-6b60a32c9f84
[2025-01-06 23:47:46,990]::[InvokeAI]::DEBUG --> On after run node: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2, node bbe083f0-bf0a-42e5-b6fb-6b60a32c9f84 (collect)
[2025-01-06 23:47:46,990]::[InvokeAI]::DEBUG --> On before run node: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2, node 1b141073-c5b2-4256-a47f-6ac429fde88e (flux_denoise)
[2025-01-06 23:47:46,991]::[InvokeAI]::DEBUG --> Invocation cache miss for type "flux_denoise": 1b141073-c5b2-4256-a47f-6ac429fde88e
[2025-01-06 23:47:47,006]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache hit: b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux)
[2025-01-06 23:47:47,008]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Locking model b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux)
[2025-01-06 23:47:47,008]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before unloading: model_total=22700 MB, model_vram=10032 MB (44.2% %), vram_available=48 MB,
[2025-01-06 23:47:47,008]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Offloading unlocked models with goal of freeing 12620.01MB of VRAM.
[2025-01-06 23:47:47,011]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae from VRAM to free 160 MB.
[2025-01-06 23:47:47,031]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded models (if necessary): vram_bytes_freed=159.87MB
[2025-01-06 23:47:47,032]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After unloading: model_total=22700 MB, model_vram=10032 MB (44.2% %), vram_available=208 MB,
[2025-01-06 23:47:47,384]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer' (Flux) onto cuda device in 0.38s. Total model size: 22700.13MB, VRAM: 10226.13MB (45.0%)
[2025-01-06 23:47:47,384]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Loaded model onto execution device: model_bytes_loaded=193.92MB,
[2025-01-06 23:47:47,384]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After loading: model_total=22700 MB, model_vram=10226 MB (45.0% %), vram_available=14 MB,
[2025-01-06 23:47:47,384]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Finished locking model b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux)
[2025-01-06 23:47:47,384]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Model cache state:
Storage Device (cpu) Limit: 30720.0 MB, Used: 23329.4 MB (75.9%), Available: 7390.6 MB (24.1%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 10226.1 MB (99.9%), Available: 13.9 MB ( 0.1%)
CUDA Memory Allocated: 10248.3 MB
Total models: 4
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 0.0 MB ( 0.0%), ram= 469.4 MB (100.0%), locked=False
b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Flux): total=22700.1 MB, vram=10226.1 MB (45.0%), ram=12474.0 MB (55.0%), locked=True
ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (AutoEncoder): total= 159.9 MB, vram= 0.0 MB ( 0.0%), ram= 159.9 MB (100.0%), locked=False
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [03:17<00:00, 6.58s/it]
[2025-01-06 23:51:04,731]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unlocked model b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux)
[2025-01-06 23:51:04,745]::[InvokeAI]::DEBUG --> On after run node: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2, node 1b141073-c5b2-4256-a47f-6ac429fde88e (flux_denoise)
[2025-01-06 23:51:04,746]::[InvokeAI]::DEBUG --> On before run node: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2, node 999c5bf6-d0c0-4ab3-ba26-cda2aae5c152 (core_metadata)
[2025-01-06 23:51:04,747]::[InvokeAI]::DEBUG --> Invocation cache miss for type "core_metadata": 999c5bf6-d0c0-4ab3-ba26-cda2aae5c152
[2025-01-06 23:51:04,747]::[InvokeAI]::DEBUG --> On after run node: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2, node 999c5bf6-d0c0-4ab3-ba26-cda2aae5c152 (core_metadata)
[2025-01-06 23:51:04,747]::[InvokeAI]::DEBUG --> On before run node: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2, node 0f3d6d5a-da92-40d5-bc54-851d1e2bf478 (flux_vae_decode)
[2025-01-06 23:51:04,748]::[InvokeAI]::DEBUG --> Skipping invocation cache for "flux_vae_decode": 0f3d6d5a-da92-40d5-bc54-851d1e2bf478
[2025-01-06 23:51:04,750]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache hit: ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder)
[2025-01-06 23:51:04,751]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Locking model ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder)
[2025-01-06 23:51:04,751]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before unloading: model_total=160 MB, model_vram=0 MB (0.0% %), vram_available=14 MB,
[2025-01-06 23:51:04,751]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Offloading unlocked models with goal of freeing 146.01MB of VRAM.
[2025-01-06 23:51:04,766]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer from VRAM to free 194 MB.
[2025-01-06 23:51:04,780]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded models (if necessary): vram_bytes_freed=193.92MB
[2025-01-06 23:51:04,780]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After unloading: model_total=160 MB, model_vram=0 MB (0.0% %), vram_available=208 MB,
[2025-01-06 23:51:04,820]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae' (AutoEncoder) onto cuda device in 0.07s. Total model size: 159.87MB, VRAM: 159.87MB (100.0%)
[2025-01-06 23:51:04,820]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Loaded model onto execution device: model_bytes_loaded=159.87MB,
[2025-01-06 23:51:04,820]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After loading: model_total=160 MB, model_vram=160 MB (100.0% %), vram_available=48 MB,
[2025-01-06 23:51:04,820]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Finished locking model ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder)
[2025-01-06 23:51:04,820]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Model cache state:
Storage Device (cpu) Limit: 30720.0 MB, Used: 23329.4 MB (75.9%), Available: 7390.6 MB (24.1%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 10192.1 MB (99.5%), Available: 47.9 MB ( 0.5%)
CUDA Memory Allocated: 10208.9 MB
Total models: 4
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 0.0 MB ( 0.0%), ram= 469.4 MB (100.0%), locked=False
b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Flux): total=22700.1 MB, vram=10032.2 MB (44.2%), ram=12667.9 MB (55.8%), locked=False
ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (AutoEncoder): total= 159.9 MB, vram= 159.9 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=True
[2025-01-06 23:51:07,266]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unlocked model ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder)
[2025-01-06 23:51:09,982]::[InvokeAI]::DEBUG --> On after run node: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2, node 0f3d6d5a-da92-40d5-bc54-851d1e2bf478 (flux_vae_decode)
[2025-01-06 23:51:09,982]::[InvokeAI]::DEBUG --> On after run session: queue item 36, session dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2
[2025-01-06 23:51:10,009]::[InvokeAI]::INFO --> Graph stats: dd9ba27f-a4df-4c8f-abdc-80b6f2c1dff2
Node Calls Seconds VRAM Used
flux_model_loader 1 0.002s 9.970G
flux_text_encoder 1 0.001s 9.970G
collect 1 0.001s 9.970G
flux_denoise 1 197.756s 10.586G
core_metadata 1 0.001s 9.999G
flux_vae_decode 1 5.235s 12.098G
TOTAL GRAPH EXECUTION TIME: 202.996s
TOTAL GRAPH WALL TIME: 202.996s
RAM used by InvokeAI process: 21.99G (-0.580G)
RAM used to load models: 22.32G
VRAM in use: 9.970G
RAM cache statistics:
Model cache hits: 2
Model cache misses: 0
Models cached: 4
Models cleared from cache: 0
Cache high water mark: 22.78/0.00G
[2025-01-06 23:51:10,027]::[InvokeAI]::INFO --> Executing queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963
[2025-01-06 23:51:10,027]::[InvokeAI]::DEBUG --> On before run session: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963
[2025-01-06 23:51:10,033]::[InvokeAI]::DEBUG --> On before run node: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963, node e53063be-01ae-43fa-b4b9-959a935ea65c (flux_model_loader)
[2025-01-06 23:51:10,033]::[InvokeAI]::DEBUG --> Invocation cache hit for type "flux_model_loader": e53063be-01ae-43fa-b4b9-959a935ea65c
[2025-01-06 23:51:10,033]::[InvokeAI]::DEBUG --> On after run node: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963, node e53063be-01ae-43fa-b4b9-959a935ea65c (flux_model_loader)
[2025-01-06 23:51:10,034]::[InvokeAI]::DEBUG --> On before run node: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963, node 8751e1a2-b4ca-47f4-81ec-3dde5189b918 (flux_text_encoder)
[2025-01-06 23:51:10,034]::[InvokeAI]::DEBUG --> Invocation cache hit for type "flux_text_encoder": 8751e1a2-b4ca-47f4-81ec-3dde5189b918
[2025-01-06 23:51:10,035]::[InvokeAI]::DEBUG --> On after run node: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963, node 8751e1a2-b4ca-47f4-81ec-3dde5189b918 (flux_text_encoder)
[2025-01-06 23:51:10,036]::[InvokeAI]::DEBUG --> On before run node: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963, node b32028a9-05f7-4af7-9a09-8593f577d6c0 (collect)
[2025-01-06 23:51:10,036]::[InvokeAI]::DEBUG --> Invocation cache hit for type "collect": b32028a9-05f7-4af7-9a09-8593f577d6c0
[2025-01-06 23:51:10,036]::[InvokeAI]::DEBUG --> On after run node: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963, node b32028a9-05f7-4af7-9a09-8593f577d6c0 (collect)
[2025-01-06 23:51:10,036]::[InvokeAI]::DEBUG --> On before run node: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963, node 87127432-67d8-4d6c-b27c-51fa5991da8f (flux_denoise)
[2025-01-06 23:51:10,036]::[InvokeAI]::DEBUG --> Invocation cache miss for type "flux_denoise": 87127432-67d8-4d6c-b27c-51fa5991da8f
[2025-01-06 23:51:10,053]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache hit: b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux)
[2025-01-06 23:51:10,054]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Locking model b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux)
[2025-01-06 23:51:10,054]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before unloading: model_total=22700 MB, model_vram=10032 MB (44.2% %), vram_available=48 MB,
[2025-01-06 23:51:10,054]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Offloading unlocked models with goal of freeing 12620.01MB of VRAM.
[2025-01-06 23:51:10,059]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae from VRAM to free 160 MB.
[2025-01-06 23:51:10,082]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded models (if necessary): vram_bytes_freed=159.87MB
[2025-01-06 23:51:10,082]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After unloading: model_total=22700 MB, model_vram=10032 MB (44.2% %), vram_available=208 MB,
[2025-01-06 23:51:10,138]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer' (Flux) onto cuda device in 0.08s. Total model size: 22700.13MB, VRAM: 10226.13MB (45.0%)
[2025-01-06 23:51:10,138]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Loaded model onto execution device: model_bytes_loaded=193.92MB,
[2025-01-06 23:51:10,138]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After loading: model_total=22700 MB, model_vram=10226 MB (45.0% %), vram_available=14 MB,
[2025-01-06 23:51:10,138]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Finished locking model b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux)
[2025-01-06 23:51:10,138]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Model cache state:
Storage Device (cpu) Limit: 30720.0 MB, Used: 23329.4 MB (75.9%), Available: 7390.6 MB (24.1%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 10226.1 MB (99.9%), Available: 13.9 MB ( 0.1%)
CUDA Memory Allocated: 10248.3 MB
Total models: 4
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 0.0 MB ( 0.0%), ram= 469.4 MB (100.0%), locked=False
b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Flux): total=22700.1 MB, vram=10226.1 MB (45.0%), ram=12474.0 MB (55.0%), locked=True
ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (AutoEncoder): total= 159.9 MB, vram= 0.0 MB ( 0.0%), ram= 159.9 MB (100.0%), locked=False
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [02:43<00:00, 5.44s/it]
[2025-01-06 23:53:53,199]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unlocked model b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Type: Flux)
[2025-01-06 23:53:53,220]::[InvokeAI]::DEBUG --> On after run node: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963, node 87127432-67d8-4d6c-b27c-51fa5991da8f (flux_denoise)
[2025-01-06 23:53:53,220]::[InvokeAI]::DEBUG --> On before run node: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963, node 27a877df-d0b4-4417-a86c-66d32f411c4a (core_metadata)
[2025-01-06 23:53:53,221]::[InvokeAI]::DEBUG --> Invocation cache miss for type "core_metadata": 27a877df-d0b4-4417-a86c-66d32f411c4a
[2025-01-06 23:53:53,221]::[InvokeAI]::DEBUG --> On after run node: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963, node 27a877df-d0b4-4417-a86c-66d32f411c4a (core_metadata)
[2025-01-06 23:53:53,221]::[InvokeAI]::DEBUG --> On before run node: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963, node 5d199201-4727-409c-80ba-5fb5bae96c51 (flux_vae_decode)
[2025-01-06 23:53:53,222]::[InvokeAI]::DEBUG --> Skipping invocation cache for "flux_vae_decode": 5d199201-4727-409c-80ba-5fb5bae96c51
[2025-01-06 23:53:53,223]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Cache hit: ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder)
[2025-01-06 23:53:53,224]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Locking model ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder)
[2025-01-06 23:53:53,225]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Before unloading: model_total=160 MB, model_vram=0 MB (0.0% %), vram_available=14 MB,
[2025-01-06 23:53:53,225]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Offloading unlocked models with goal of freeing 146.01MB of VRAM.
[2025-01-06 23:53:53,241]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer from VRAM to free 194 MB.
[2025-01-06 23:53:53,256]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unloaded models (if necessary): vram_bytes_freed=193.92MB
[2025-01-06 23:53:53,256]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After unloading: model_total=160 MB, model_vram=0 MB (0.0% %), vram_available=208 MB,
[2025-01-06 23:53:53,297]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae' (AutoEncoder) onto cuda device in 0.07s. Total model size: 159.87MB, VRAM: 159.87MB (100.0%)
[2025-01-06 23:53:53,297]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Loaded model onto execution device: model_bytes_loaded=159.87MB,
[2025-01-06 23:53:53,297]::[ModelManagerService]::DEBUG --> [MODEL CACHE] After loading: model_total=160 MB, model_vram=160 MB (100.0% %), vram_available=48 MB,
[2025-01-06 23:53:53,298]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Finished locking model ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder)
[2025-01-06 23:53:53,298]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Model cache state:
Storage Device (cpu) Limit: 30720.0 MB, Used: 23329.4 MB (75.9%), Available: 7390.6 MB (24.1%)
Compute Device (cuda) Limit: 10240.0 MB, Used: 10192.1 MB (99.5%), Available: 47.9 MB ( 0.5%)
CUDA Memory Allocated: 10208.9 MB
Total models: 4
Models:
5c49bd84-17fc-4183-a386-559bc1feb71c:tokenizer (CLIPTokenizer): total= 0.0 MB, vram= 0.0 MB ( 0.0%), ram= 0.0 MB ( 0.0%), locked=False
5c49bd84-17fc-4183-a386-559bc1feb71c:text_encoder (CLIPTextModel): total= 469.4 MB, vram= 0.0 MB ( 0.0%), ram= 469.4 MB (100.0%), locked=False
b3b23163-97a2-492c-aa0b-c1ae8a1412df:transformer (Flux): total=22700.1 MB, vram=10032.2 MB (44.2%), ram=12667.9 MB (55.8%), locked=False
ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (AutoEncoder): total= 159.9 MB, vram= 159.9 MB (100.0%), ram= 0.0 MB ( 0.0%), locked=True
[2025-01-06 23:53:57,792]::[ModelManagerService]::DEBUG --> [MODEL CACHE] Unlocked model ca7c76fb-408d-42c0-8f8d-febdc81734f5:vae (Type: AutoEncoder)
[2025-01-06 23:53:58,488]::[InvokeAI]::DEBUG --> On after run node: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963, node 5d199201-4727-409c-80ba-5fb5bae96c51 (flux_vae_decode)
[2025-01-06 23:53:58,488]::[InvokeAI]::DEBUG --> On after run session: queue item 37, session bd5f7579-ae9d-4483-baf8-44be43dad963
[2025-01-06 23:53:58,510]::[InvokeAI]::INFO --> Graph stats: bd5f7579-ae9d-4483-baf8-44be43dad963
Node Calls Seconds VRAM Used
flux_model_loader 1 0.001s 9.970G
flux_text_encoder 1 0.001s 9.970G
collect 1 0.000s 9.970G
flux_denoise 1 163.184s 10.586G
core_metadata 1 0.001s 9.999G
flux_vae_decode 1 5.267s 12.098G
TOTAL GRAPH EXECUTION TIME: 168.455s
TOTAL GRAPH WALL TIME: 168.455s
RAM used by InvokeAI process: 21.93G (-0.060G)
RAM used to load models: 22.32G
VRAM in use: 9.970G
RAM cache statistics:
Model cache hits: 2
Model cache misses: 0
Models cached: 4
Models cleared from cache: 0
Cache high water mark: 22.78/0.00G
Summary

This PR enables RAM/VRAM cache size limits to be determined dynamically based on availability.

Config Changes

This PR modifies the app configs in the following ways:

- A new `device_working_mem_gb` config was added. This is the amount of non-model working memory to keep available on the execution device (i.e. GPU) when using dynamic cache limits. It defaults to 3GB.
- The `ram` and `vram` configs now default to `None`. If these configs are set, they will take precedence over the dynamic limits. **Note: Some users may have previously overridden the `ram` and `vram` values in their `invokeai.yaml`. They will need to remove these configs to enable the new dynamic limit feature.**

Working Memory

In addition to the new `device_working_mem_gb` config described above, memory-intensive operations can estimate the amount of working memory that they will need and request it from the model cache. This is currently applied to the VAE decoding step for all models. In the future, we may apply this to other operations as we work out which ops tend to exceed the default working memory reservation.

Mitigations for #7513

This PR includes some mitigations for the issue described in #7513. Without these mitigations, it would occur with higher frequency when dynamic RAM limits are used and the RAM is close to maxed out.

Limitations / Future Work

- Only _models_ can be offloaded to RAM to conserve VRAM. I.e. if VAE decoding requires more working VRAM than is available, the best we can do is keep the full model on the CPU, but we will still hit an OOM error. In the future, we could detect this ahead of time and switch to running inference on the CPU for those ops.
- There is often a non-negligible amount of VRAM 'reserved' by the torch CUDA allocator, but not used by any allocated tensors. We may be able to tune the torch CUDA allocator to work better for our use case. Reference: https://pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf
- There may be some ops that require high working memory that haven't been updated to request extra memory yet. We will update these as we uncover them.
- If a model is 'locked' in VRAM, it won't be partially unloaded if a later model load requests extra working memory. This should be uncommon, but I can think of cases where it would matter.

Related Issues / Discussions

- #7492
- #7494
- #7500
- #7505

QA Instructions

Run a variety of models near the cache limits to ensure that model switching works properly for the following configurations:

- [x] CUDA, `enable_partial_loading=true`, all other configs default (i.e. dynamic memory limits)
- [x] CUDA, `enable_partial_loading=true`, CPU and CUDA memory reserved in another process so there is limited RAM/VRAM remaining, all other configs default (i.e. dynamic memory limits)
- [x] CUDA, `enable_partial_loading=false`, all other configs default (i.e. dynamic memory limits)
- [x] CUDA, ram/vram limits set (these should take precedence over the dynamic limits)
- [x] MPS, all other configs default (i.e. dynamic memory limits)
- [x] CPU, all other configs default (i.e. dynamic memory limits)

Merge Plan

- [x] Merge #7505 first and change target branch to main

Checklist

- [x] The PR has a short but descriptive title, suitable for a changelog
- [x] Tests added / updated (if applicable)
- [x] Documentation added / updated (if applicable)
- [ ] Updated `What's New` copy (if doing a release after this PR)
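To make the dynamic-limit idea above concrete, here is a minimal Python sketch of deriving a VRAM budget from current availability minus the working-memory reservation. The function name and the exact accounting are illustrative assumptions, not the actual implementation in this PR:

```python
import torch

# Assumed default, mirroring the `device_working_mem_gb` config described above.
DEVICE_WORKING_MEM_GB = 3.0

def dynamic_vram_budget_bytes(
    device: torch.device, working_mem_gb: float = DEVICE_WORKING_MEM_GB
) -> int:
    """Sketch: budget for model weights = free VRAM minus reserved working memory.

    torch.cuda.mem_get_info() returns (free_bytes, total_bytes) for the device.
    """
    free_bytes, _total_bytes = torch.cuda.mem_get_info(device)
    working_mem_bytes = int(working_mem_gb * 1024**3)
    # Clamp at zero so the budget never goes negative when free VRAM is scarce.
    return max(0, free_bytes - working_mem_bytes)

# A memory-intensive op (e.g. VAE decode) could shrink this budget further by
# requesting extra working memory before a model is locked into VRAM.
```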
Summary
This PR adds support for partial loading of models onto the GPU. This enables models to run with much lower peak VRAM requirements (e.g. full FLUX dev with 8GB of VRAM).
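As a rough illustration of the mechanism (a toy sketch under stated assumptions, not the PR's `CachedModelWithPartialLoad` wrapper): partial loading amounts to moving as many weights as fit within a byte budget onto the GPU and leaving the remainder on the CPU, with inference then streaming the CPU-resident weights in as needed.

```python
import torch

def greedy_partial_load(model: torch.nn.Module, vram_budget_bytes: int) -> int:
    """Toy sketch: move whole parameters to CUDA until the byte budget is spent,
    leaving the rest on the CPU. Running such a model additionally requires
    streaming the CPU-resident weights to the GPU layer by layer, which is the
    part the real cache wrapper handles."""
    used_bytes = 0
    for param in model.parameters():
        size = param.numel() * param.element_size()
        if used_bytes + size <= vram_budget_bytes:
            param.data = param.data.to("cuda")
            used_bytes += size
    return used_bytes
```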
The partial loading feature is enabled behind a new config flag: `enable_partial_loading=True`. This flag defaults to `False`.

Note about performance:

The `ram` and `vram` config limits are still applied when `enable_partial_loading=True` is set. This can result in significant slowdowns compared to the 'old' behaviour. Consider the case where the VRAM limit is set to `vram=0.75` (GB) and we are trying to run an 8GB model. When `enable_partial_loading=False`, we attempt to load the entire model into VRAM, and if it fits (no OOM error) then it will run at full speed. When `enable_partial_loading=True`, since we have the option to partially load the model, we will only load 0.75 GB into VRAM and leave the remaining 7.25 GB in RAM. This will cause inference to be much slower than before. To work around this, it is important that your `ram` and `vram` configs are carefully tuned. In a future PR, we will add the ability to dynamically set the RAM/VRAM limits based on the available memory / VRAM.
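A quick back-of-the-envelope check of the example above (illustrative arithmetic only, not InvokeAI's actual cache logic):

```python
def partial_load_split(model_size_gb: float, vram_limit_gb: float) -> tuple[float, float]:
    """How much of the model lands in VRAM vs. RAM under a fixed VRAM limit."""
    in_vram = min(model_size_gb, vram_limit_gb)
    return in_vram, model_size_gb - in_vram

# The worked example from the note: an 8GB model with vram=0.75.
in_vram, in_ram = partial_load_split(8.0, 0.75)
print(f"{in_vram:.2f} GB in VRAM, {in_ram:.2f} GB in RAM")  # 0.75 GB in VRAM, 7.25 GB in RAM
```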
Related Issues / Discussions

QA Instructions

Tests with `enable_partial_loading=True`, `vram=2`, on CUDA device:

For all tests, we expect model memory to stay below 2 GB. Peak working memory will be higher.
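One way to check the 'model memory vs. peak working memory' distinction during these tests is with torch's allocator counters; this is a generic measurement snippet, not part of the PR:

```python
import torch

torch.cuda.reset_peak_memory_stats()

# ... run one generation here ...

steady_mb = torch.cuda.memory_allocated() / 1024**2    # roughly the resident model weights
peak_mb = torch.cuda.max_memory_allocated() / 1024**2  # includes transient working memory
print(f"steady: {steady_mb:.0f} MB, peak: {peak_mb:.0f} MB")
```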
Tests with `enable_partial_loading=True`, and hack to force all models to load 10%, on CUDA device:

Tests with `enable_partial_loading=False`, `vram=30`:

We expect no change in behaviour when `enable_partial_loading=False`.

Other platforms:

- `enable_partial_loading=True`.
- `enable_partial_loading=True`.

Merge Plan
Checklist

- Updated `What's New` copy (if doing a release after this PR)