openPMD plugin: Flush data to disk within a step #4002
Conversation
@pnorbert suggested specifying
Note that in the above image the first graph reaches a surprisingly large peak memory consumption of 110 GB.
Here, the virtual memory usage is even higher than what Heaptrack reports, but the physical memory (RES) peaks at 55 GB. That said, I don't know whether SLURM or other batch systems distinguish the two, i.e. whether they go by physical or virtual memory when monitoring the memory usage of jobs.
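For reference, the distinction between the two numbers can be checked directly on Linux. A minimal sketch, not part of this PR, that prints both values for the current process:

```cpp
// Illustrative only: reads VmSize (virtual memory) and VmRSS (resident set
// size, what top reports as RES) from /proc/self/status on Linux.
#include <fstream>
#include <iostream>
#include <string>

int main()
{
    std::ifstream status("/proc/self/status");
    for (std::string line; std::getline(status, line);)
    {
        // VmSize: total virtual address space; VmRSS: physical memory in use
        if (line.rfind("VmSize:", 0) == 0 || line.rfind("VmRSS:", 0) == 0)
            std::cout << line << '\n';
    }
}
```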
This PR now has a working suggestion for how to handle different flush targets via JSON configuration.
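To illustrate the idea, a sketch of what such a configuration could look like from user code; the key names (in particular `preferred_flush_target`) are illustrative and may differ from the final form in this PR:

```cpp
// Sketch: select the ADIOS2 engine and a default flush target via the JSON
// options string accepted by the openPMD-api Series constructor.
#include <openPMD/openPMD.hpp>
#include <string>

int main()
{
    std::string const config = R"(
    {
      "adios2": {
        "engine": {
          "type": "bp5",
          "preferred_flush_target": "disk"
        }
      }
    })";
    openPMD::Series series("simData_%T.bp", openPMD::Access::CREATE, config);
    // ... define and write records as usual ...
}
```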
Thanks for working on this feature. This change is required to reduce the memory footprint of IO on ORNL Crusher/Frontier and other systems with little host memory compared to GPU memory. Sorry, I was not aware that you had pushed new changes to this PR; please ping me next time.
The upcoming BP5 engine in ADIOS2 has some features for saving memory compared to BP4.
BP5 will not replace BP4, because these memory optimizations come at a runtime cost; instead, users will be able to choose between runtime efficiency and memory efficiency.
One feature that we asked for, and that is now implemented, is the ability to flush data to disk within a single IO step. I'm currently working on exposing this functionality in openPMD. Together with that PR, this PR makes the feature available as a preview in PIConGPU (see the sketch below).
Pinging @psychocoderHPC because he asked for this feature.
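A rough sketch of what an in-step flush could look like from user code, assuming an openPMD-api version whose `Series::flush()` accepts a per-call JSON configuration; the `preferred_flush_target` key is illustrative and tied to the openPMD work mentioned above:

```cpp
#include <openPMD/openPMD.hpp>
#include <numeric>
#include <vector>

int main()
{
    using namespace openPMD;
    Series series(
        "simData_%T.bp",
        Access::CREATE,
        R"({"adios2": {"engine": {"type": "bp5"}}})");

    auto it = series.writeIterations()[0];
    auto E_x = it.meshes["E"]["x"];
    E_x.resetDataset({Datatype::DOUBLE, {1000}});

    std::vector<double> chunk(1000);
    std::iota(chunk.begin(), chunk.end(), 0.);
    E_x.storeChunk(chunk, {0}, {1000});

    // Move the buffered data to disk *now*, while the IO step is still open,
    // instead of keeping it in host memory until it.close().
    series.flush(
        R"({"adios2": {"engine": {"preferred_flush_target": "disk"}}})");

    // ... store further chunks of the same step, flushing as needed ...
    it.close();
}
```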
TODO:
First results
I ran 4 tests, each one writing 3 IO steps at a bit more than 15 GB per step:
The memory profiles of the runs are shown line by line in the following screenshot; note the different y-axis scales.
Further details:
Interpretation:
As it stands, the runtime of the BP5-based approaches is very long in these benchmarks. The parameters of the BP5 engine are not yet documented, so I have not yet had a chance to tune them.
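Once documented, such tuning knobs would presumably be passed through the same JSON configuration. An untested sketch under that assumption; `BufferChunkSize` and `NumAggregators` are existing ADIOS2 BP5 engine parameters, but the values below are placeholders, and whether they help is exactly the open tuning question:

```cpp
#include <openPMD/openPMD.hpp>
#include <string>

int main()
{
    // Engine parameters are forwarded to ADIOS2 as strings via
    // adios2.engine.parameters in the openPMD JSON configuration.
    std::string const config = R"(
    {
      "adios2": {
        "engine": {
          "type": "bp5",
          "parameters": {
            "BufferChunkSize": "2147381248",
            "NumAggregators": "8"
          }
        }
      }
    })";
    openPMD::Series series("simData_%T.bp", openPMD::Access::CREATE, config);
}
```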