Bus errors/"too many open files" errors in jobs with large numbers of Estimators using temporary files #631
Comments
It looks like the bus errors, while inconsistent in terms of which datasets trigger them, tend to occur about 4.5 hours into my jobs on the FIU HPC, and they happen specifically when writing MA values to the temporary file, at line 806 of the referenced snippet (lines 801 to 807 in 3675d96).
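For context, here is a minimal, illustrative sketch (not NiMARE's actual code; the array names and sizes are made up) of the kind of memmap write that can raise a bus error: if the filesystem backing the temporary file (e.g., a small node-local /tmp) cannot supply a page during the write, the process receives SIGBUS.

```python
import os
import tempfile

import numpy as np

# Illustrative only: write per-study MA values into a memory-mapped array
# backed by a temporary file. If the filesystem holding that file (e.g., /tmp
# on an HPC node) runs out of space, or the file is removed/truncated while
# still mapped, writes like the one in the loop fail with SIGBUS ("bus error").
temp_dir = tempfile.mkdtemp()
memmap_file = os.path.join(temp_dir, "ma_values.npy")

n_studies, n_voxels = 100, 10_000  # made-up sizes
ma_values = np.memmap(memmap_file, dtype=float, mode="w+", shape=(n_studies, n_voxels))

for i_study in range(n_studies):
    # Each assignment writes pages to the backing file; this is the kind of
    # line where the bus error would surface.
    ma_values[i_study, :] = np.random.random(n_voxels)
```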
As proposed by @JulioAPeraza, setting a different …
After ~21 hours, my profiling jobs failed with the following error:
I think that the issue is that NiMARE's memmaps are never closed, even though the associated files are deleted. I don't know if this is what was causing the bus error, but it seems like it could be related.
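A rough sketch of what explicit cleanup could look like (assuming the arrays are plain numpy memmaps; `_mmap` is a private numpy attribute, so this is a workaround rather than a public API):

```python
import os

import numpy as np

def close_memmap(arr):
    """Release the file handle behind a numpy memmap.

    Deleting the backing file with os.remove() does not close the descriptor
    held by the mapping, so the process's open-file count keeps growing.
    """
    if isinstance(arr, np.memmap):
        arr.flush()
        arr._mmap.close()  # private attribute; dropping all references also works

# Usage sketch (hypothetical names):
# close_memmap(ma_values)
# del ma_values
# os.remove(memmap_file)
```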
I still get a "Too many open files" error after merging #597, though it takes longer to happen.
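One way to see whether descriptors are still leaking (Linux-only, and not something NiMARE does itself) is to log the process's open-file count periodically inside the meta-analysis loop:

```python
import os

def count_open_fds():
    """Count file descriptors currently held by this process (Linux only)."""
    return len(os.listdir("/proc/self/fd"))

# Logging this every N meta-analyses makes a leak visible long before the
# per-process limit (ulimit -n) is reached and "Too many open files" is raised.
```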
I also get the bus error when using …
Now that we're not using memmaps anymore, I think I can close this.
Summary
When running a large number of meta-analyses on the FIU HPC, I end up with either a bus error or a "too many open files" error.
The bus error occurs after ~7 hours on the FIU HPC when the temporary directory (i.e., where temporary files are written) is set to `/tmp`. The "Too many open files" error occurs after at least 24 hours when the temporary directory is set to `/scratch`.

While I made progress on tracking and closing memmapped files in #597, it looks like there are still open files slipping through.
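For anyone hitting this, one workaround (assuming the temporary files are created through Python's `tempfile` module, which honors TMPDIR; the /scratch path below is just a placeholder) is to point temporary files at the larger filesystem before running the jobs:

```python
import os
import tempfile

# Placeholder path: substitute your own scratch directory.
os.environ["TMPDIR"] = "/scratch/my_job_tmp"
tempfile.tempdir = None  # clear the cached value so gettempdir() re-reads TMPDIR

print(tempfile.gettempdir())  # should now report the /scratch location
```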
Hopefully the impact of this bug is low, in that it should only arise when running large numbers of meta-analyses with memory limits in place.
Additional details
I originally noticed this problem when I started working on the NiMARE Jupyter book, which eventually led me to write temporary files to the NiMARE data directory instead of tmpdir in #460. However, that inevitably slowed down operations on those temporary files, so I switched back in #599. Since I never figured out the cause of the issue in the first place, my hope was simply that the problem would have resolved itself.