Memory Not Released After High-Load Operations #2078
-
Summary: We have developed an inference server using FastAPI with batching. However, we've encountered a significant issue with memory management during high-load scenarios; in our case (described below) we send lists of base64-encoded images across many simultaneous requests. Although we initially implemented this with FastAPI, further tests showed that the issue also persists with a pure Starlette implementation, which is why we are submitting it under the Starlette repository.

[Figure: memory consumption over time with Starlette + Gunicorn while waiting for 1 hour after the test ends.]

Detailed Description: In our test scenario, the asynchronous server receives a request containing a list of base64-encoded images and simply returns "Hello World!".

```python
from starlette.applications import Starlette
from starlette.responses import JSONResponse
from starlette.routing import Route


async def infer(request):
    # Parse (and then discard) the incoming JSON payload of base64-encoded images.
    data = await request.json()
    return JSONResponse({"message": "Hello World!"})


routes = [
    Route('/infer', infer, methods=['GET', 'POST'])
]

app = Starlette(debug=True, routes=routes)
```

Current Behavior:
Expected Behavior:
Reproduction Steps: We have prepared a dedicated GitHub repository that showcases this issue in greater detail. This includes a TL;DR of what we did:
What We've Tried: Some of the experiments can be found in the shared repository above, but overall, here is what we tried:
Related issues: We found out that we are not the only ones seeing this behaviour:
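As a rough illustration of the kind of load described above, here is a minimal load-generation sketch. It is not the script from the linked repository; the endpoint URL, payload size, and concurrency level are placeholder assumptions.

```python
# Minimal concurrent load generator (a sketch, not the repository's script).
# Assumes the Starlette app above is running on http://localhost:8000 and that
# `httpx` is installed; payload size and concurrency are arbitrary placeholders.
import asyncio
import base64
import os

import httpx

URL = "http://localhost:8000/infer"  # placeholder endpoint
FAKE_IMAGE = base64.b64encode(os.urandom(512 * 1024)).decode()  # ~512 KiB of random "image" data
PAYLOAD = {"images": [FAKE_IMAGE] * 8}  # a list of base64-encoded images per request


async def send_requests(client: httpx.AsyncClient, n: int) -> None:
    for _ in range(n):
        response = await client.post(URL, json=PAYLOAD, timeout=30.0)
        response.raise_for_status()


async def main(concurrency: int = 50, requests_per_worker: int = 100) -> None:
    async with httpx.AsyncClient() as client:
        # Many workers posting large JSON bodies at the same time.
        await asyncio.gather(
            *(send_requests(client, requests_per_worker) for _ in range(concurrency))
        )


if __name__ == "__main__":
    asyncio.run(main())
```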
Replies: 8 comments 32 replies
-
Does it persist with another ASGI web framework? 👀
-
This list is not useful. There was a leak in the past that was solved.
-
Can you use the pipeline, and upload the artifact (image) from there?
-
Hi @Kludex, have you had a chance to check out the repository and reproduce the plots we've been working on? We're interested in digging deeper to understand the underlying issues better. One approach we're considering is enhanced memory profiling. However, we'd really value your insights on this or any other suggestions you might have for uncovering the root cause. Looking forward to your thoughts!
-
Hi @EBazarov and team, I've been closely following this thread as I'm currently facing similar memory issues with my app, especially during model inference. It's been quite a journey trying to pinpoint the exact cause. While exploring solutions, I came across the memray profiler. It offers a detailed breakdown of memory consumption by different lines of code. For instance, to profile a FastAPI application using memray, one can use: I thought it might be worth sharing, just in case it could provide some insights or be of help in your investigations. Looking forward to any updates on this issue, and I truly appreciate all the efforts you and others are putting into this!
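A rough sketch of the kind of invocation mentioned above, written against memray's Tracker API rather than the CLI; the module name, output path, and server settings are assumptions, not the commenter's original command.

```python
# A sketch of profiling the demo Starlette app with memray's Tracker API
# (an alternative to the memray CLI the comment refers to). The module name
# `app`, the output path, and the uvicorn settings are placeholder assumptions.
import uvicorn
from memray import Tracker

from app import app  # hypothetical module containing the Starlette `app` above

if __name__ == "__main__":
    # All allocations made while the server handles load are written to
    # memray-infer.bin; `memray flamegraph memray-infer.bin` then renders
    # a per-line breakdown as an HTML flame graph.
    with Tracker("memray-infer.bin"):
        uvicorn.run(app, host="127.0.0.1", port=8000)
```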
-
Hi everyone, any update on this? We're currently experiencing a similar issue.
-
@EBazarov apologies for the ping, but would you be able to follow up in the thread above, python/cpython#109534, as it seems with your tests you might already have something available to do so? Guido himself responded that the upstream team is hard-pressed to debug this, and if someone can produce a self-contained example that demonstrates the issue, it would go a long way toward getting it solved. I believe your repo goes quite a ways there. 🙏
-
I'm locking this thread to avoid misguidance. Since it reproduces on every ASGI server, it's not a Uvicorn issue.
Given everything stated in the replies above, I suspect there's not much to do about this.
Given the number of frameworks and servers tested, it seems the Python interpreter itself is the culprit here: even if the heap memory gets released, the RSS remains high.
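To make that last point concrete, here is a small, self-contained sketch (not taken from the thread's repository) for observing the gap between Python-level releases and OS-level RSS. It assumes `psutil` is installed, and the exact numbers vary with the platform and allocator.

```python
# Self-contained sketch for observing heap-vs-RSS behaviour (an illustration,
# not code from the linked repository). Assumes `psutil` is installed; results
# depend on the OS, the C allocator, and heap fragmentation.
import gc

import psutil


def rss_mib() -> float:
    """Resident set size of the current process, in MiB."""
    return psutil.Process().memory_info().rss / (1024 * 1024)


if __name__ == "__main__":
    print(f"baseline RSS:     {rss_mib():8.1f} MiB")

    # Simulate a burst of parsed request payloads: many small Python objects,
    # similar in shape to decoded JSON bodies.
    payloads = [{"image": "x" * 200, "index": i} for i in range(500_000)]
    print(f"after allocation: {rss_mib():8.1f} MiB")

    # Drop every reference and force a collection. The Python-level heap is
    # released, yet the RSS reported by the OS often stays well above the
    # baseline because freed pages are retained by the allocator for reuse.
    del payloads
    gc.collect()
    print(f"after release:    {rss_mib():8.1f} MiB")
```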