I tried a bit more. Calling …
-
I tested it again on another, much larger cluster, where you can easily use 100,000 CPU hours per day. I was able to run … I talked to the developer of …
-
I am running 120 daemons on a Slurm cluster. I have observed very unusual runtimes when `.f` uses functions from an external package. I use `mlr3misc::map()` in the example because there is an equivalent in the base package. I also tried completely different functions with the same result; the only thing they have in common is that they are not in the base package.

Most of the mirais are resolved after a few milliseconds. A few mirais need seconds to finish, and the last one takes 165 seconds.
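One way to see where the time goes is to record timestamps inside `.f` itself. Below is a minimal base-R sketch; `with_timing()` and `timing_summary()` are hypothetical helper names, not part of mirai. Wrapping the task function this way should also work for the `.f` of the mirai map call, but that is an assumption I have not verified on the cluster, and comparing start times across nodes assumes their clocks are in sync.

```r
# Hypothetical helper: wraps a task function so that each call also reports
# when it started and finished (wall-clock time on the worker).
with_timing <- function(f) {
  force(f)
  function(...) {
    start <- Sys.time()  # recorded before any work in f, including package loads
    value <- f(...)
    list(value = value, start = start, end = Sys.time())
  }
}

# Hypothetical helper: turns a list of results from a wrapped function into
# per-task start delays (relative to the earliest start) and runtimes, in seconds.
timing_summary <- function(results) {
  starts <- vapply(results, function(r) as.numeric(r$start), numeric(1))
  ends   <- vapply(results, function(r) as.numeric(r$end), numeric(1))
  data.frame(
    start_delay = starts - min(starts),
    runtime     = ends - starts
  )
}

# Local usage with base R's lapply:
res <- lapply(1:5, with_timing(function(x) x^2))
timing_summary(res)
```

A large `start_delay` with a small `runtime` points at scheduling or startup, while a large `runtime` for a trivial function points at something inside the call, such as loading the package.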
The exact runtime varies, of course, but this small example usually takes more than a minute. A few times the runtimes measured inside the mirais looked good, but `collect_mirai(m)` still took a minute. I have also observed that if `.f` is a longer-running function, about 90 mirais start immediately and then a new mirai starts only every few seconds. For this I looked at a timestamp that is recorded at the very beginning of `.f`. If we use `lapply` from the base package instead of `mlr3misc::map`, the example finishes in milliseconds.

I suspect that the shared file system of the cluster is overloaded when 120 mirais try to load a package at the same time. When I add `Sys.sleep(sample(10, 1))` to the beginning of `.f`, I always get a runtime of 10 seconds, so delaying the package loading seems to be enough. I also ran the same `.f` with another parallelization framework, `batchtools`, and got some expired jobs there as well. So I don't think the problem is caused by mirai, but maybe mirai can solve it?
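A more targeted variant of the `Sys.sleep(sample(10, 1))` workaround would be to sleep only when the package's namespace is not loaded yet on that worker, so only the first task per daemon pays the delay. Below is a minimal base-R sketch. `with_staggered_load()` is a hypothetical name, and it assumes the namespace stays loaded between tasks on the same daemon.

```r
# Hypothetical wrapper: staggers the *first* load of a package per worker.
with_staggered_load <- function(f, pkg, max_delay = 10) {
  force(f); force(pkg); force(max_delay)
  function(...) {
    # Only the first task on a worker should see an unloaded namespace;
    # later tasks skip the sleep (assuming the namespace stays loaded).
    if (!isNamespaceLoaded(pkg)) {
      # A random delay in [0, max_delay] seconds spreads the package loads of
      # many workers over time instead of hitting the file system at once.
      Sys.sleep(stats::runif(1, min = 0, max = max_delay))
      loadNamespace(pkg)
    }
    f(...)
  }
}
```

For the example above this would mean passing something like `with_staggered_load(.f, "mlr3misc")` as the task function. It only spreads the loads out over time; the total amount of file-system traffic stays the same.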