-
Notifications
You must be signed in to change notification settings - Fork 413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit number of simultaneously opened fds #578
Conversation
Non-optimal solution: only handles the common case.
Would wrapping up I know lazy IO has problems, but we aren't actually letting our handles escape. So we aren't pretending to be composable. Also, only the fd acquisition is lazy. Reading/Writing is still done as eagerly as ever. |
I had an earlier (flawed) implementation which did the deferring (there's no need here for
The problem with both those ideas is a slight flaw in the underlying design - once you start a |
This should be fixed by #586 |
It should be fine to use |
The implementation here is good, but the change still seems a little ad hoc. Do we still need this after the scheduling fix? I'd say no if it seems to address the problem for everyone. But if this PR makes the fd usage measurably more efficient, I suppose it should be included. |
@diml - using a @rgrinberg - does the scheduling change solve this? The issue here isn't the scheduler, it's the delay between opening the file for redirection and actually scheduling anything to it, but I'd only glanced at the scheduling change PR. |
Looking again at process.ml, using a lazy value is not going to be enough as in the So the only way to generalize this PR is to correctly set the |
I think we merge this as it and leave the generalization for later as it's a more invasive change |
Non-optimal solution: only handles the common case.
Non-optimal solution: only handles the common case.
Both @jpdeplaix and I have seen some Travis jobs failing with beta18 running out of fds (see, for example, this Travis log
I bisected the problem and found that it was #486 which introduced the issue. However, it's not a bug in the implementation, it's just that it reveals a problem. The
redirect
function inAction
opens the files which will be written to immediately, but the jobs which are going to write to them (in this case, lots of calls toocamldep.opt
) obviously won't be scheduled at once so the fds are "hogged" by the scheduler. This creates a problem for things like mccs in opam's vendored build which does a lot ofcopy_files
which obviously can proceed while ocamldep calls are going on, but eventually fail because of the number of open fds (I think @avsm has seen the same thing with RWO2?). Unusually, this is therefore a failure which becomes more apparently with-j 1
than less!The proper fix is quite invasive, as it involves threading the state of redirection through
Action
andProcess
- I couldn't see a really neat way of guaranteeing that the file is both only opened as late as possible and also that it's always written to. However, what I've done here captures the common case ofwith-*-to (run foo)