Memory Not Released After High-Load Operations #2078
-
Summary: We have developed an inference server using FastAPI with batching. However, we've encountered a significant issue with memory management during high-load scenarios; in our case (described below) we send lists of base64-encoded images across many simultaneous requests. Although we initially implemented this with FastAPI, further tests showed that the issue also persists with a pure Starlette implementation, which is why we are submitting it under the Starlette repository.

[Figure: memory consumption over time with Starlette + Gunicorn while waiting for 1 hour after the test ends.]

Detailed Description: In our test scenario, the asynchronous server receives a request containing a list of base64-encoded images and simply returns "Hello World!".

```python
from starlette.applications import Starlette
from starlette.responses import JSONResponse
from starlette.routing import Route


async def infer(request):
    # Parse (and then discard) the incoming JSON payload of base64-encoded images.
    data = await request.json()
    return JSONResponse({"message": "Hello World!"})


routes = [
    Route('/infer', infer, methods=['GET', 'POST'])
]

app = Starlette(debug=True, routes=routes)
```

Current Behavior:
Expected Behavior:
Reproduction Steps: We have prepared a dedicated GitHub repository that showcases this issue in greater detail. This includes a TL;DR of what we did:
What We've Tried: Some of the experiments can be found in the shared repository above, but overall, here is what we tried:
Related issues: We found out that we are not the only ones seeing this behaviour:
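As a rough illustration of the kind of load described above, here is a minimal load-generation sketch. It is not the script from the linked repository; the endpoint URL, payload size, and concurrency level are placeholder assumptions.

```python
# Minimal concurrent load generator (a sketch, not the repository's script).
# Assumes the Starlette app above is running on http://localhost:8000 and that
# `httpx` is installed; payload size and concurrency are arbitrary placeholders.
import asyncio
import base64
import os

import httpx

URL = "http://localhost:8000/infer"  # placeholder endpoint
FAKE_IMAGE = base64.b64encode(os.urandom(512 * 1024)).decode()  # ~512 KiB of random "image" data
PAYLOAD = {"images": [FAKE_IMAGE] * 8}  # a list of base64-encoded images per request


async def send_requests(client: httpx.AsyncClient, n: int) -> None:
    for _ in range(n):
        response = await client.post(URL, json=PAYLOAD, timeout=30.0)
        response.raise_for_status()


async def main(concurrency: int = 50, requests_per_worker: int = 100) -> None:
    async with httpx.AsyncClient() as client:
        # Many workers posting large JSON bodies at the same time.
        await asyncio.gather(
            *(send_requests(client, requests_per_worker) for _ in range(concurrency))
        )


if __name__ == "__main__":
    asyncio.run(main())
```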
Replies: 8 comments 32 replies
-
Does it persist with another ASGI web framework? 👀
-
This list is not useful. There was a leak in the past that was solved.
-
Can you use the pipeline, and upload the artifact (image) from there?
-
Hi @Kludex, have you had a chance to check out the repository and reproduce the plots we've been working on? We're interested in digging deeper to understand the underlying issues better. One approach we're considering is enhanced memory profiling. However, we'd really value your insights on this or any other suggestions you might have for uncovering the root cause. Looking forward to your thoughts!
-
Hi @EBazarov and team, I've been closely following this thread as I'm currently facing similar memory issues with my app, especially during model inference. It's been quite a journey trying to pinpoint the exact cause. While exploring solutions, I came across the memray profiler. It offers a detailed breakdown of memory consumption by different lines of code. For instance, to profile a FastAPI application using memray, one can use: I thought it might be worth sharing, just in case it could provide some insights or be of help in your investigations. Looking forward to any updates on this issue, and I truly appreciate all the efforts you and others are putting into this!
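A rough sketch of the kind of invocation mentioned above, written against memray's Tracker API rather than the CLI; the module name, output path, and server settings are assumptions, not the commenter's original command.

```python
# A sketch of profiling the demo Starlette app with memray's Tracker API
# (an alternative to the memray CLI the comment refers to). The module name
# `app`, the output path, and the uvicorn settings are placeholder assumptions.
import uvicorn
from memray import Tracker

from app import app  # hypothetical module containing the Starlette `app` above

if __name__ == "__main__":
    # All allocations made while the server handles load are written to
    # memray-infer.bin; `memray flamegraph memray-infer.bin` then renders
    # a per-line breakdown as an HTML flame graph.
    with Tracker("memray-infer.bin"):
        uvicorn.run(app, host="127.0.0.1", port=8000)
```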
-
Hi everyone, any update on this? We're currently experiencing a similar issue.
-
@EBazarov apologies for the ping, but would you be able to follow up in the thread above, python/cpython#109534, as it seems with your tests you might already have something available to do so? Guido himself responded that the upstream team is hard-pressed to debug this, and if someone can produce a self-contained example that demonstrates the issue, it would go a long way toward getting it solved. I believe your repo goes quite a ways there. 🙏
-
I'm locking this thread to avoid misguidance. Since it reproduces on every ASGI server, it's not a Uvicorn issue.
Given everything stated in the replies above, I suspect there's not much to do about this.
Given the number of frameworks and servers tested, it seems the Python interpreter itself is the culprit here: even if the heap memory gets released, the RSS remains high.
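To make that last point concrete, here is a small, self-contained sketch (not taken from the thread's repository) for observing the gap between Python-level releases and OS-level RSS. It assumes `psutil` is installed, and the exact numbers vary with the platform and allocator.

```python
# Self-contained sketch for observing heap-vs-RSS behaviour (an illustration,
# not code from the linked repository). Assumes `psutil` is installed; results
# depend on the OS, the C allocator, and heap fragmentation.
import gc

import psutil


def rss_mib() -> float:
    """Resident set size of the current process, in MiB."""
    return psutil.Process().memory_info().rss / (1024 * 1024)


if __name__ == "__main__":
    print(f"baseline RSS:     {rss_mib():8.1f} MiB")

    # Simulate a burst of parsed request payloads: many small Python objects,
    # similar in shape to decoded JSON bodies.
    payloads = [{"image": "x" * 200, "index": i} for i in range(500_000)]
    print(f"after allocation: {rss_mib():8.1f} MiB")

    # Drop every reference and force a collection. The Python-level heap is
    # released, yet the RSS reported by the OS often stays well above the
    # baseline because freed pages are retained by the allocator for reuse.
    del payloads
    gc.collect()
    print(f"after release:    {rss_mib():8.1f} MiB")
```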