Memory leak: unlike json.dumps, orjson.dumps does not release memory, resulting in the continuously growing memory usage #483

Closed
alessio-locatelli opened this issue May 24, 2024 · 6 comments
Labels
invalid This doesn't seem right

Comments

@alessio-locatelli

alessio-locatelli commented May 24, 2024

Search you tried in the issue tracker

"memory"

There are a number of existing issues about memory usage, but all of them are either closed as "fixed" or not directly related to this one.

Describe your issue

orjson.dumps() uses significantly more memory than the Python built-in json library's json.dumps().

I noticed that the peak memory usage of my application is higher when I use orjson.dumps (11 GiB with json vs. 17 GiB with orjson).

Reproducible code example

import orjson
import json

# Create some fake input data.
d = {
    str(i): {
        str(i): [[i,i], [i,i]],
        str(i + i): [[i,i], [i,i]]
    }
    for i in range(500_000)
}

# Simulate a stream: pop items from the source dict and append the serialized values to a new list.
# Since items are moved rather than accumulated, memory usage should stay roughly constant rather than grow.
new = []
while True:
    try:
        _, mapping = d.popitem()
    except KeyError:
        break

    new.append({k: orjson.dumps(v) for k, v in mapping.items()})
    #new.append({k: json.dumps(v) for k, v in mapping.items()})

how to run

memray run --aggregate --native --trace-python-allocators benchmark.py

results

orjson

[memray report screenshots]

json

[memray report screenshots]

results summary

With orjson, the memory grows continuously and is never released.

With json, the memory does not grow: values from the initial dictionary are simply moved into the new list, so memory usage stays roughly constant. This is the expected behavior.

versions

  • Python 3.12.3
  • orjson 3.10.3, 3.10.4
  • Debian GNU/Linux 12 (bookworm)
@alessio-locatelli alessio-locatelli changed the title As opposite to json, with orjson memory is not released resulting in multiple time larger memory usage As opposite to json, with orjson memory is not released, resulting in a multiple times larger memory usage May 24, 2024
@alessio-locatelli alessio-locatelli changed the title As opposite to json, with orjson memory is not released, resulting in a multiple times larger memory usage Unlike json, orjson does not release memory, resulting in a much higher memory usage May 24, 2024
@alessio-locatelli alessio-locatelli changed the title Unlike json, orjson does not release memory, resulting in a much higher memory usage Unlike json.dumps, orjson.dumps does not release memory, resulting in a much higher memory usage May 24, 2024
@ZeroIntensity

Does orjson.dumps cache results? That could cause memray to report what looks like a leak.

@github-actions github-actions bot added the Stale label Jun 7, 2024
@alessio-locatelli

Not stale.

@github-actions github-actions bot removed the Stale label Jun 9, 2024
@alessio-locatelli alessio-locatelli changed the title Unlike json.dumps, orjson.dumps does not release memory, resulting in a much higher memory usage Memory leak: unlike json.dumps, orjson.dumps does not release memory, resulting in the continuously growing memory usage Jun 13, 2024
@godlygeek

godlygeek commented Jun 20, 2024

This is a true leak, in the sense that the returned bytes objects are much larger than they need to be. The bug is this line:

(*self.bytes.cast::<PyVarObject>()).ob_size = self.len as Py_ssize_t;

finish() calls self.resize(self.len), which calls _PyBytes_Resize to shrink the bytes object down to only the needed capacity (which does free memory back to the allocator, as long as the object shrinks by at least 25%). But _PyBytes_Resize compares the new size against the old size to decide whether it has any work to do, and bails out without doing anything if the two are exactly the same. Because the line of code linked above overwrites the "current size" field to be equal to the new size, the _PyBytes_Resize call made later from finish() always concludes that the bytes object is already the right size, and never shrinks its capacity.
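To make that control flow concrete, here is a minimal Python model of the behavior described above. It is not the real CPython or orjson code; Buffer, capacity, and ob_size are illustrative names standing in for the allocated block and the size field in the object header.

# A toy model: resize() skips the shrink when the recorded size already
# equals the requested size, mirroring the early exit in _PyBytes_Resize.
class Buffer:
    def __init__(self, capacity: int):
        self.capacity = capacity  # bytes actually allocated for the object
        self.ob_size = capacity   # size recorded in the object header

    def resize(self, new_size: int) -> None:
        if self.ob_size == new_size:
            return  # "already the right size" -- nothing to do
        self.ob_size = new_size
        self.capacity = new_size  # shrink-to-fit (a reallocation in the real code)

used = 10  # only 10 bytes of payload were actually written

# What happens today: ob_size is overwritten first, then the resize is requested.
buf = Buffer(capacity=1024)
buf.ob_size = used
buf.resize(used)
print(buf.capacity)  # 1024 -- the shrink is skipped, the over-allocation stays

# Without the direct ob_size write, the resize shrinks the allocation.
buf2 = Buffer(capacity=1024)
buf2.resize(used)
print(buf2.capacity)  # 10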

Here's a simpler reproducer:

import orjson
import resource

lst = [orjson.dumps(digit) for _ in range(20_000) for digit in "1234567890"]
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024, "MB")

That program prints the maximum resident set size it reached. With orjson as it exists today, it prints "214.5 MB" for me. If I comment out the direct write to the bytes object's ob_size field, that drops to "24.625 MB". The further the size of the bytes object is from its capacity, the more space is needlessly wasted.

@godlygeek

And, to be clear, it's not a "true leak" in the sense that memory can never be reclaimed by the allocator. When the bytes object dies, the memory is reclaimed. The impact is only that the bytes object is much larger than it needs to be while it is alive, because no shrink-to-fit operation is happening.
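As a possible workaround sketch until the shrink-to-fit behavior is fixed, callers that keep many results alive at once could force each result into a freshly sized allocation. The helper name dumps_compact is just illustrative; bytes(memoryview(...)) is used here only because it is guaranteed to copy the data into a new, exactly sized bytes object.

import orjson

def dumps_compact(obj) -> bytes:
    # Copy the (possibly over-allocated) result into a new bytes object whose
    # allocation matches its length. This trades one extra copy per call for a
    # tight allocation while the result stays alive.
    return bytes(memoryview(orjson.dumps(obj)))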

@github-actions github-actions bot added the Stale label Jun 27, 2024
@alessio-locatelli

Not stale.

@github-actions github-actions bot removed the Stale label Jun 28, 2024
@github-actions github-actions bot added the Stale label Jul 6, 2024
@alessio-locatelli

Not stale.

@github-actions github-actions bot removed the Stale label Jul 8, 2024
Repository owner deleted a comment from sanbei101 Jul 10, 2024
@ijl ijl added the invalid This doesn't seem right label Jul 10, 2024
@ijl ijl closed this as completed Jul 10, 2024
Repository owner locked as spam and limited conversation to collaborators Jul 10, 2024