Describe the bug
After upgrading from version 3.2.0 to either 3.3.0 or 3.3.1, I noticed a huge increase in memory usage. Running from_bytes() on a 25 MB file now uses almost 3 GB of memory.
To Reproduce
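Run the attached script memory_profile_test.py, placed inside the charset_normalizer folder, with the scalene memory profiler (Linux/WSL). The data file used (25 MB), memory_profile_test.txt, is placed in the data folder. The profiler result is attached as profile_charset_normalizer_3.3.1.html (download it and view it in a browser).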
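The reporter's reproduction script is attached to the issue rather than shown inline. As a stdlib-only cross-check of scalene's numbers, a sketch along these lines could report the peak memory of a single call. The helper name `measure_peak` is hypothetical, and `tracemalloc` only traces allocations made through Python's own allocators, so its figures are not expected to match scalene's exactly.

```python
import tracemalloc
from typing import Any, Callable, Tuple


def measure_peak(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, int]:
    """Call func(*args, **kwargs) and return (result, peak traced bytes).

    Hypothetical helper: only allocations made through Python's memory
    allocators are traced, so buffers allocated elsewhere are not counted.
    """
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) since start()
    finally:
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    # 5 MiB of ASCII as a stand-in for the real 25 MB data file.
    payload = b"x" * (5 * 1024 * 1024)
    _, peak = measure_peak(bytes.decode, payload, "latin-1")
    print(f"peak: {peak / 1024 / 1024:.1f} MiB")
```

With charset-normalizer installed, the same helper could wrap the call under investigation, for example `measure_peak(charset_normalizer.from_bytes, data)`, where `data` holds the raw bytes of the 25 MB file.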
Expected behaviour
I expected the function to use only slightly more memory than the size of the file passed to from_bytes().
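To put that expectation in numbers: under CPython's flexible string representation (PEP 393), a decoded `str` stores 1, 2 or 4 bytes per character, so a single decoded copy of a 25 MB payload should cost roughly one to at most four times the file size. The sketch below uses a hypothetical helper, `decoded_ratio`, to measure that ratio with `sys.getsizeof`. By contrast, almost 3 GB for a 25 MB file is well over 100 times the input, far more than one extra copy would explain.

```python
import sys


def decoded_ratio(payload: bytes, encoding: str) -> float:
    """Return (in-memory size of the decoded str) / (size of the raw bytes)."""
    text = payload.decode(encoding)
    # sys.getsizeof includes the fixed str header, negligible for large payloads.
    return sys.getsizeof(text) / len(payload)


if __name__ == "__main__":
    ascii_text = b"a" * 1_000_000     # decodes to 1 byte per character
    euro_signs = b"\x80" * 1_000_000  # 0x80 is U+20AC in cp1252: 2 bytes per character
    print(f"ascii/utf-8: {decoded_ratio(ascii_text, 'utf-8'):.3f}")
    print(f"euro/cp1252: {decoded_ratio(euro_signs, 'cp1252'):.3f}")
```

The two printed ratios come out close to 1 and 2 respectively, which is the "just a bit more than the file" range the report expects for one decoded copy.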
Testing Environment
OS: Ubuntu on WSL
Python version: 3.11.6
Package version: 3.3.0 and 3.3.1
Additional context
We use charset-normalizer in a program that runs in containers with strict memory limits. We noticed the change in behaviour after our pods were Out Of Memory (OOM) killed.
While debugging, it appears that the increased memory consumption comes from storing the decoded_payload in CharsetMatch().
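Assuming, as the debugging above suggests, that each match keeps its own decoded copy of the payload, memory grows with the number of candidate encodings that are retained. The sketch below contrasts two hypothetical classes, `EagerMatch` and `LazyMatch` (neither is charset-normalizer's actual `CharsetMatch`): one decodes up front, the other keeps a reference to the shared bytes and decodes on access. `peak_for` is likewise a hypothetical helper.

```python
import tracemalloc
from typing import List


class EagerMatch:
    """Hypothetical match that keeps its own decoded copy of the payload."""

    def __init__(self, payload: bytes, encoding: str) -> None:
        self.encoding = encoding
        self.decoded = payload.decode(encoding)  # one full str per candidate


class LazyMatch:
    """Hypothetical match that keeps only a reference to the shared bytes."""

    def __init__(self, payload: bytes, encoding: str) -> None:
        self.encoding = encoding
        self._payload = payload  # shared reference, not a copy

    @property
    def decoded(self) -> str:
        # Decode on demand; the str can be freed once the caller drops it.
        return self._payload.decode(self.encoding)


def peak_for(cls: type, payload: bytes, encodings: List[str]) -> int:
    """Peak traced bytes while building one match per candidate encoding."""
    tracemalloc.start()
    try:
        matches = [cls(payload, enc) for enc in encodings]
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    candidates = ["ascii", "utf-8", "latin-1", "cp1252"]
    payload = b"a" * 1_000_000
    print("eager peak:", peak_for(EagerMatch, payload, candidates))
    print("lazy peak: ", peak_for(LazyMatch, payload, candidates))
```

With four candidates and a 1 MB ASCII payload, the eager variant peaks at a little over four copies of the text, while the lazy one stays far below the size of a single copy. The trade-off is CPU time: the lazy variant repeats the decode on every access, and caching the result (for example with `functools.cached_property`) would bring the copy back for any match that is actually read.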
Finally
A big thank you to the authors and maintainers! This library is much needed, used and appreciated!