Solver arrays not being cleared from memory. #680
If you do Another thing that's helpful is to use things like
Thanks. Just tried it out.
My guess is that it doesn't clear the memory for
Is there anything the developers of the package can do about this issue? I recently ran into this problem when I parallelized solving many large ODE models with multithreading. It kept crashing (out of memory on 370GB-RAM servers) and I spent a long time troubleshooting this memory leak until finding this post. I tried the solution Chris mentioned and it finally worked. But isn't it a little ridiculous that the solver arrays are not collected by the GC? Especially if they are already reassigned or are out of scope of the function?
That's not a memory leak. If you are solving many large ODE models simultaneously with multithreading, then they all have to be held in memory. Multithreading is by definition shared-memory multithreading, and the objects have to live in the memory pool. You have not described anything that resembles a memory leak, so please do not use that term. A memory leak is a case where memory that can be freed is not freed. But if you are multithreading, then you cannot safely remove the objects because they are still being computed on. Freeing those objects would result in incorrect computations and segfaults. This is true in any language: you cannot just free memory of things that are currently being computed!

If what you need is to reduce the memory of an ensemble, a good way to do this is to not save any solutions and instead write them to disk. In terms of the ensemble interface, https://docs.sciml.ai/DiffEqDocs/stable/features/ensemble/#Building-a-Problem, you can do something like define an `output_func` that writes each solution to disk and keeps only a small summary in memory.
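As a rough sketch of that pattern (the file format, file names, and the summary returned here are illustrative choices, not something prescribed in the thread), an `output_func` that persists each trajectory and keeps only a tiny result in memory could look like:

```julia
using OrdinaryDiffEq, JLD2

f!(du, u, p, t) = (du .= -u)
prob = ODEProblem(f!, rand(1000), (0.0, 1.0))

# Save each trajectory to its own file and keep only the return code in memory.
function output_func(sol, i)
    jldsave("trajectory_$(i).jld2"; sol)   # hypothetical file naming
    return (sol.retcode, false)            # (output, rerun)
end

ensemble_prob = EnsembleProblem(prob; output_func = output_func)
sim = solve(ensemble_prob, Tsit5(), EnsembleThreads(); trajectories = 100)
```

With that setup, `sim.u` holds only the 100 return codes while the full solutions live on disk.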
No. If it's in the REPL you can in theory still access it, so it would be incorrect to GC it while you can still access the values. Otherwise you could access a value that had already been GC'd, and thus you'd get incorrect junk. If you want something to be GC'd, you should remove all references. This is true for every single GC'd language, and if it weren't true, that would be a non-deterministic correctness bug in the language!

But this is out of the purview of this package. DifferentialEquations.jl doesn't do anything special with memory handling; it's just using the GC and getting standard GC behavior. If you think that there is an issue with the GC, please report it with a reproducible case to https://github.com/JuliaLang/julia. But I want to stress again that if you're running out of memory saving huge arrays, then don't save huge arrays. Let me know if you have any questions. But please, if you believe there is a memory leak somewhere, share an example and report it.
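To be concrete about "don't save huge arrays": the common solver options control how much the solver stores. A small sketch (the problem and time points here are placeholders):

```julia
using OrdinaryDiffEq

f!(du, u, p, t) = (du .= -u)
prob = ODEProblem(f!, rand(1_000_000), (0.0, 10.0))

# Keep only the final state instead of every accepted step:
sol_final = solve(prob, Tsit5(); save_everystep = false, save_start = false)

# Or keep a coarse, pre-chosen grid of time points instead of the full output:
sol_coarse = solve(prob, Tsit5(); saveat = 1.0, dense = false)
```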
I had to use the huge arrays because I was simulating a nonstiff model first, followed by a stiff one that depends on the solution (at each time step) of the nonstiff model, if that makes sense. But I was fine with the objects living in memory. The issue was that they should get deleted every time I finished an iteration and moved on to simulating a new ODE. Instead, they didn't get cleared and just piled up over the course of the whole simulation. For example, I expected each iteration of ODE solving to take 1GB of memory. With 64 CPU cores, I should have had steady 64GB memory usage throughout. Instead, it was 64GB at the start and slowly grew to the total RAM (370GB) of my server and crashed. Anyways, my problem was fully solved (steady 64GB) by doing this ⬇️
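(The snippet itself did not survive the copy. A minimal sketch of the pattern being described, dropping the solution reference and forcing a collection after each iteration, might look like the following; the model, solver, and sizes are all placeholders.)

```julia
using OrdinaryDiffEq

f!(du, u, p, t) = (du .= -p .* u)
tspan = (0.0, 1.0)
initial_conditions = [rand(100_000) for _ in 1:10]   # stand-ins for the real ICs

results = Vector{Float64}(undef, length(initial_conditions))
for (i, u0) in enumerate(initial_conditions)
    prob = ODEProblem(f!, u0, tspan, 0.5)
    sol = solve(prob, Tsit5(); save_everystep = false)
    results[i] = sol[end][1]   # keep only the small piece that is actually needed
    sol = nothing              # drop the reference so the solver arrays become collectable
    GC.gc()                    # force a collection instead of waiting for one
end
```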
And thanks for the suggestion about the ensemble!
Maybe there's a confusion as to what's going on here that I should describe in a bit more detail. This is something that can be seen without any ODE solvers involved. Say you have a machine with 16 GB of RAM, and you're making arrays of size 12 GB. If you do:

```julia
x = ... # 12 GB array
x = ... # 12 GB array
```

you will OOM your machine because that requires more than 12 GB. You could think that it should only require 12 GB total, since it could GC the first array while defining the second, but it cannot in general. For example:

```julia
x = ... # 12 GB array
x = [x[1], ...]
```

could have dependencies on the original array. If it did, then deleting the original array before creating the next one would be dangerous because you could have undefined behavior depending on what has been done with that memory. Likewise:

```julia
x = ... # 12 GB array
y = @view x[1:1]
x = [y[1], ...]
```

would have the same issue with an "arbitrarily" different array, so then you need some advanced aliasing analysis, and such would be difficult to have correct in the most general of cases. So the GC does not collect the first array until after the second is defined. In fact, it acts like it first creates the new array, then binds it to the name `x`, and only after that is the old array free to be collected. In order to prevent this, you just do:

```julia
x = ... # 12 GB array
x = nothing
x = ... # 12 GB array
```

because after the `x = nothing` line there are no references left to the first array, so it can be collected before the second one is allocated. So then:
My best guess as to what would be happening here is that it's related to GC behavior under multithreading, and likely due to late GC-ing because the mark and sweep passes are not multithreaded themselves. This is why Julia's GC currently has subpar performance in multithreaded contexts (though that's being worked on), and by forcing a collection manually you make it run earlier instead of waiting. But anyways, that would be some Julia Base issue with multithreading in GC contexts, and not something we would solve in the ODE solver.
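For illustration only (the problem definition and batch size are placeholders), forcing a collection between batches of threaded solves looks like:

```julia
using OrdinaryDiffEq, Base.Threads

f!(du, u, p, t) = (du .= -u)
ics = [rand(100_000) for _ in 1:64]
finals = Vector{Float64}(undef, length(ics))

for batch in Iterators.partition(eachindex(ics), 16)
    Threads.@threads for i in collect(batch)
        prob = ODEProblem(f!, ics[i], (0.0, 1.0))
        sol = solve(prob, Tsit5(); save_everystep = false)
        finals[i] = sol[end][1]
    end
    GC.gc()   # collect the now-unreferenced solver internals before the next batch
end
```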
I have a recurring issue where, since I'm running a simulation on a PDE with a wide range of initial conditions, after the solve() call is finished, large amounts of memory are still allocated and refuse to be garbage collected.
I'll run a block of code like this several times with different initial conditions.
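(The actual block did not come through in the issue text; a hypothetical stand-in for what such a repeated PDE solve might look like, with the method-of-lines discretization, solver, and sizes all guessed, is below.)

```julia
using OrdinaryDiffEq

# Hypothetical semidiscretized 1D diffusion as a stand-in for the real PDE.
function rhs!(du, u, p, t)
    D, N = p
    du[1] = D * (u[2] - 2u[1] + u[N])
    for i in 2:N-1
        du[i] = D * (u[i+1] - 2u[i] + u[i-1])
    end
    du[N] = D * (u[1] - 2u[N] + u[N-1])
end

N = 10_000
u0 = rand(N)                                   # one of many initial conditions
prob = ODEProblem(rhs!, u0, (0.0, 100.0), (0.1, N))
sol = solve(prob, Tsit5(); saveat = 1.0)       # rerun with a different u0 each time
```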
Running additional solves keeps compounding memory use. It's not an issue at this scale, but I need to be able to run solutions for 2 to 3 times longer in some cases, and running batches of those regularly crashes my kernel.
I'm still very green when it comes to Julia and numerical methods in general, so there's probably a big issue with the way I've laid out the problem, but I can't seem to figure it out.
Any commentary would be appreciated.