Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

running multiple instances of dune in parallel fails #236

Closed
c-cube opened this issue Aug 25, 2017 · 32 comments · Fixed by #6360 or ocaml/opam-repository#22498
Closed

running multiple instances of dune in parallel fails #236

c-cube opened this issue Aug 25, 2017 · 32 comments · Fixed by #6360 or ocaml/opam-repository#22498
Assignees
Milestone

Comments

@c-cube
Copy link

c-cube commented Aug 25, 2017

There should probably be a file lock somewhere to prevent that. It works in e.g., cargo. My use case is that I have a recompilation loop that inotify-watches the project in the background, and I sometimes run dune runtest manually, and both seem to collide.

@ghost
Copy link

ghost commented Aug 31, 2017

adding a lockfile seems fine to me

@jordwalke
Copy link
Contributor

I have many other use cases for this file lock. I actually think it would be good to form a convention that all kinds of developer tools can partake in to ensure that there is one large operation occurring at a time. For example, our build systems already tend to have file watchers in place, and we can make those watchers coordinate with Dune's locked file. Another use case is that large version control systems can make sure to lock that same file before doing a rebase so that Dune doesn't thrash while rebasing or try to build an inconsistent state, if it is currently watching.
I suppose my request is to just advertise whatever file path you were going to implement as a suitable file for other tools to lock so that everyone can partake in the same build process queue.

@ghost
Copy link

ghost commented May 14, 2019

That seems fine. Just one thing thought, the plan is that when dune will be running in polling mode, it will hold the lock continuously, meaning that no one else will be able to take it. Does that work with what you are describing?

@rgrinberg
Copy link
Member

The behavior I would expect in polling mode would be for dune to acquire the lock before it wants to start a build and release once the build is done.

@ghost
Copy link

ghost commented May 14, 2019

Maybe. Although I have a feeling that in general, what you really want is a way to temporarily pause the build system.

@ghost
Copy link

ghost commented May 14, 2019

It would help if we had a concrete use case. @jordwalke do you have one in mind?

@rgrinberg
Copy link
Member

Maybe. Although I have a feeling that in general, what you really want is a way to temporarily pause the build system.

I believe this feature would give us this ability:

$ touch .dune.lock

@ghost
Copy link

ghost commented May 14, 2019

Hmm, how would that work? If dune is currently doing a build, it would hold this lock. You'd have to wait for dune to finish before taking the lock.

@rgrinberg
Copy link
Member

Ah yeah, I guess you can't queue the operation here so that the lock file is created only after the current build finishes.

It would be quite nice to have something simple like this however. One naive scheme to try this out would be:

  1. Have every dune run generate a unique GUID
  2. Before a build (includes polling mode as well), echo $GUID > .dune.lock.
  3. Read the contents of the .dune.lock, if it matches the $GUID then remove the lock file. If not, then consider the current workspace as locked.

I'm sure this suffers from many races, and you can't really queue access to the workspace this way, so it's really not ideal. I guess the goal here is to have a simple command like $ dune lock. To make it as simple as possible for things like rebasing scripts to stop dune from building stuff.

@ghost
Copy link

ghost commented May 14, 2019

I guess. Though I'd still like to see a concrete use case before we go to far in the design :) I still don't know what this is for.

However, the simple version of a lock file to prevent two dune instances to build things in the same workspace seems fine.

@rgrinberg
Copy link
Member

Here is a very simple use case that I would want this feature for. I would like to do a rebase and resolve some merge conflicts. In this situation, the polling mode can be quite distracting because it makes my editor jump unpredictably. I would prefer to turn it off and go back to manual building until the rebasing is done. Currently, what I do is turn off the polling build and then turn it back on, but I do sometimes forget this step.

What I'd like is to create an alias for rebase that will automatically turn pause the polling builds. However, I just realized that having touch .dune.lock isn't good enough for me. I suppose what I'd like is $ dune pause which would block until the lock is acquired, and then call git rebase.

@jordwalke
Copy link
Contributor

jordwalke commented May 15, 2019

The behavior I would expect in polling mode would be for dune to acquire the lock before it wants to start a build and release once the build is done.

I'm not exactly sure how polling mode is implementing but if it is doing file system scans, I would also expect it to grab the lock while doing the file system scan too, as well as when actually building (if in fact the file system scan is a separate step).

@jordwalke
Copy link
Contributor

I think that having Dune grab the lock before each (scan+build) would allow more tools to partake in locking the build, even in polling mode, as opposed to grabbing the lock, and building forever without ever releasing it.

Some concrete use cases I have in mind.

  1. The rebasing scenario that @rgrinberg described. We currently integrate hg with other polling build systems, and for the same reasons we'd like the ability to integrate hg with dune.

  2. We might also be running our own watchers. We don't want to be scanning the sources when dune might be generating files (.merlin etc), and we don't want to be updating files using other non-dune tooling in-source when Dune is mid-build. That could result in a false positive failure to build. For example we have a tool syncproject that will fixup some Dune files and symlinks etc. What we would do is have syncproject first grab the project build lock, then update the dune files/symlinks, and then release the lock. That way the sync script and Dune itself sees a consistent state of the world.

We have more use cases as well.

@jordwalke
Copy link
Contributor

Another idea: Would it be possible to put the lockfile in the dune build directory, since the build directory will often reside out of source (in our use cases)? That also helps integration with other file watchers we may be running since those watchers won't get confused when it sees a lock file change mtime or appear/disappear.

@ghost
Copy link

ghost commented May 15, 2019

Yh, the lockfile should definitely be in the build directory. The false positive is not a worry: the guarantee is that when dune finishes building, you are in the same state as if you started a build from scratch. In particular, if you modify the source tree while dune is running, it will simply pick up the changes and adapt.

I admit that I am a bit sceptical about the vcs case. We have been in this situation for years at Jane Street and it has never been a big deal. The build system tends to blink a bit while you do vcs operations, but that's about it. Adding all this stateful stuff seems like it will complicate things. If the other tool forget to remove the lock, then Dune will never restart.

In any case, a plain lock file doesn't seem right to me. If you are building something big that takes 10 minutes to build, you are going to have to wait 10 minutes before you can do a vcs operation, which is not right. If we really wanted to serialise things, then it should be dune that pauses while the user performs the vcs operation. So instead, we should do this via a RPC.

@aalekseyev
Copy link
Collaborator

aalekseyev commented May 15, 2019

@diml, it's not quite true that vcs is not a big deal: it breaks pretty often and we have a bunch of fragile hacks to work around it.

I'm thinking that for two different things there ideally should be two different locks.

The two things are

  1. The source directory, shared with the editor, vcs, other tools and other dunes (writes can only happen due to auto-promotion)
  2. The build directory, shared with other dunes only

I think it would be nice if dune only acquired a lock on (1) when promoting. That makes dune never hold the lock for prolonged periods, so it makes all the other tools responsive.

I expect dune to hold the lock on (2) at least for the duration of one build, but possibly for the entire polling.

The argument for limiting the scope of the lock to one build is that it lets you run a standalone dune build on a different set of targets while another instance of dune is waiting for things to change.

Regarding the concern that some tool might never release the lock, that's a valid concern, but on unixes we could use a variant of the lock that's automatically released when the process dies to help with that (I'm thinking of flock).

@ghost
Copy link

ghost commented May 16, 2019

Ok, I wasn't aware of these hacks.

I can see the point of 1, but I admit I'm not thrilled with adding yet another file in the source tree :( That's one more thing users have to add to their .gitignore. Maybe it's possible to do that without a file in the file system though. For instance, creating a socket with an abstract path should have the same effect, right? Plus it should be deleted automatically if whoever created the socket dies.

The argument for limiting the scope of the lock to one build is that it lets you run a standalone dune build on a different set of targets while another instance of dune is waiting for things to change.

We could also achieve that via a RPC.

Regarding the concern that some tool might never release the lock, that's a valid concern, but on unixes we could use a variant of the lock that's automatically released when the process dies to help with that (I'm thinking of flock).

Ack

@aalekseyev
Copy link
Collaborator

adding yet another file in the source tree

The file doesn't have to live in the source tree itself. It could live in the .git/.hg if that's present, for example.

creating a socket with an abstract path should have the same effect, right?

I don't really know how abstract namespace works, but I imagine it might be a pain from sysadmin/security perspective because access to this namespace won't be controlled by the filesystem permissions.

One thing that seems to work on linux is just flocking the directory itself instead of choosing a specific file in it. I don't know how portable these things are.

Anyway, it seems to me that we should implement at least the build directory locking for sure in a dune-specific way, but the source directory locking needs more research of concrete use cases (how do existing tools do it? what's a locking protocol that strikes a good balance between all the various considerations (correctness, automatic release, existing conventions, simplicity of implementations and portability)).

@ghost
Copy link

ghost commented May 16, 2019

That seems reasonable to me.

BTW, @jordwalke do you have source directory locking at Facebook?

@jordwalke
Copy link
Contributor

@diml we have a use case for it because some of out other build tools / build systems generate source files as well. I'd be happy with a locked file that locks both build and source dir, or two locked files (one for build and one for source).
We do have some special logic for pausing various tools while rebasing since many things can go wrong during a rebase.

@ghost
Copy link

ghost commented May 20, 2019

Is this logic something you can share? It'd be great to see real use cases before jumping to one particular solution.

@jordwalke
Copy link
Contributor

I can't share that logic unfortunately. But the other logic for integrating esy builds/package upgrades/watcher integration is something that hasn't been written yet

@ghost
Copy link

ghost commented May 30, 2019

Ack. So to sum up the discussion we need two kinds of lock; one for _build and one for the source tree. The one for _build can be specific to dune and dune. The one for the source tree should be shared with other tools and requires a bit more research.

For now, we should simply add a lock file for _build. I sugget _build/.lock. We can implement it as a symlink that points to the pid of the process holding the lock.

For the source tree lock, we'll decide once we know a bit more about it.

@jordwalke
Copy link
Contributor

jordwalke commented May 31, 2019

one for the source tree. The one for _build can be specific to dune and dune.

You might have meant to write "dune and something-else"?

Would dune lock the source tree whenever it wants to write .merlin files (or promotes) into the source? That's another thing that trips up watchers. (and dune promote commands).

I'm not entirely sure why there would be two separate locks, because the act of _building the artifacts involves reading from sources, and I would want a way for dune to grab an exclusive lock on the source files before doing a build - for the case where our version control is going to do a rebase and change all the sources - we'd want to hold off on any builds as well even though dune is merely reading from the source tree, not writing to it.

For now, we should simply add a lock file for _build. I sugget _build/.lock. We can implement it as a symlink that points to the pid of the process holding the lock.

Sounds like that might be more nuanced on windows?

I like that a lock file should point to something representing the process ID of the process that locked it because then that acts as a way for all these tools to handle the case where other tools crash without releasing. I'm just not sure what the nuances of Windows would be.

@aalekseyev
Copy link
Collaborator

aalekseyev commented May 31, 2019

Would dune lock the source tree whenever it wants to write .merlin files (or promotes) into the source?
That's another thing that trips up watchers. (and dune promote commands).

That was the idea, yes. Dune would consult a lock on source dir any time it writes there.

the act of _building the artifacts involves reading from sources

When I proposed this, I was only considering dune in --watch mode. In watch mode, we don't care about dune reading something because it will re-run when there is a concurrent write.

Indeed, without --watch some sort of locking might be needed to guarantee an outcome that makes sense (consistent with a state of filesystem at a particular point in time).

I'm not entirely sure why there would be two separate locks

The use case I was thinking of is: dune building all the time (potentially a single build taking many minutes), and "hg update" happening in the middle of a build (a very common use case here in Jane Street).
That use case is awkward if you only have one lock: you'd have to have a way for dune to "give up" the lock in the middle of the build, so that the vcs can do its thing, which requires a rather complicated locking scheme.
Having a separate build dir lock lets us never hold the source dir lock for prolonged periods.

Now I agree that without --watch this seems maybe insufficient, but I still think it's mostly fine:

  1. It's common for synchronous one-shot commands to be sequenced by the user themselves, so it's not an unreasonable expectation (not so with --watch).
  2. It's possible to use a (rather simple, I would think) wrapper script to lock something for the duration of non-watching dune build. (equally true if you want a lock for the duration of --watch, yes, but not "for the duration of promote" and not "for the duration of one build of a watching dune")
  3. Things don't go wrong too badly: merely running dune again should fix your build. (not so with build directory: not locking that can lead to arbitrary corruption of persistent dune state)

I like that a lock file should point to something representing the process ID of the process that locked it because then that acts as a way for all these tools to handle the case where other tools crash without releasing.

I would like it better if the lock was reliably released when things crash, which would rule out symlink existence, file existence, directory existence, etc. because those aren't removed automatically. Advisory locks (preferably flock) are the only thing that comes to mind. (it's OK to write your pid to the file once you lock it too, to make debugging easier) (well, there's also TCP/IP ports, but that would be ridiculous) (and abstract socket namespace, but as I said I know nothing about those)

Sounds like that might be more nuanced on windows?

Sounds like it, yeah. :-)

@jordwalke
Copy link
Contributor

That use case is awkward if you only have one lock: you'd have to have a way for dune to "give up" the lock in the middle of the build, so that the vcs can do its thing, which requires a rather complicated locking scheme.

Why not have vcs integrate with the lock?

I would like it better if the lock was reliably released when things crash,

The idea behind this lockfile I was imagining is that it's something all tools can coordinate around (hence locking the source directory for reading even if dune isn't writing there - some other tool might be writing there during a dune build). And if it is a central point of coordinating all these tools, Dune might be able to guarantee it releases the locks, but we can't really say the same about all the other tools. If the universal "repo lock protocol" that tools standardize around uses some process ID then that might solve it for everyone?

@aalekseyev
Copy link
Collaborator

Why not have vcs integrate with the lock?

What does it mean to integrate with the lock and how does it help?

we can't really say [that they guarantee to release the locks] about all the other tools

Well, with flock when a process dies (or rather, when all the relevant file descriptors are closed) the lock is released. That you can guarantee by killing the stuck process. I don't really understand the point you're making in the rest of the paragraph.

@jordwalke
Copy link
Contributor

What does it mean to integrate with the lock and how does it help?

I imagined that you could use some hg hook/extension to ensure that rebases wait on a current build to finish (lock to be released) before performing the rebase, and then then it would grab the lock during the rebase, releasing it when it's done.

Well, with flock when a process dies (or rather, when all the relevant file descriptors are closed) the lock is released.

I wasn't totally sure this is possible to do reliably on windows, but you may have a fair point there.

@aalekseyev
Copy link
Collaborator

wait on a current build to finish

As I said above, we want hg operations to happen in the middle of a build.
It's not acceptable to wait for the current build to finish. We want rebase to happen immediately and the build system to start building the rebased version. That's why I'm saying you'd need a complicated locking mechanism where the high-priority writer (vcs) can come and ask the low-priority readers to please release the lock ASAP.

A much simpler solution from dune watching mode perspective (as a reader) is not to take the lock at all. (but then you'd still want a writer lock when promoting)

Blaisorblade added a commit to Blaisorblade/dune that referenced this issue Jul 7, 2020
The explanation is taken from https://coq.zulipchat.com/#narrow/stream/240550-Dune-devs.20.26.20users/topic/dune.20and.20parallel.20builds.

Also, if this is merged, please replace "jbuilder" by "dune" in the issue title, so that people who click the link aren't too confused.
Blaisorblade added a commit to Blaisorblade/dune that referenced this issue Jul 8, 2020
The explanation is taken from https://coq.zulipchat.com/#narrow/stream/240550-Dune-devs.20.26.20users/topic/dune.20and.20parallel.20builds.

Also, if this is merged, please replace "jbuilder" by "dune" in the issue title, so that people who click the link aren't too confused.

Signed-Off-By: Paolo G. Giarrusso <[email protected]>
Blaisorblade added a commit to Blaisorblade/dune that referenced this issue Jul 8, 2020
The explanation is taken from https://coq.zulipchat.com/#narrow/stream/240550-Dune-devs.20.26.20users/topic/dune.20and.20parallel.20builds.

Also, if this is merged, please replace "jbuilder" by "dune" in the issue title, so that people who click the link aren't too confused.

Signed-Off-By: Paolo G. Giarrusso <[email protected]>
@Blaisorblade
Copy link
Contributor

Please consider renaming jbuilder to dune in the OP — per #3617.

@ejgallego ejgallego changed the title running multiple instances of jbuilder in parallel fails running multiple instances of dune in parallel fails Jul 8, 2020
@ejgallego
Copy link
Collaborator

Please consider renaming jbuilder to dune in the OP — per #3617.

Done, thanks @Blaisorblade

Blaisorblade added a commit to Blaisorblade/dune that referenced this issue Jul 8, 2020
The explanation is taken from https://coq.zulipchat.com/#narrow/stream/240550-Dune-devs.20.26.20users/topic/dune.20and.20parallel.20builds.

Also, if this is merged, please replace "jbuilder" by "dune" in the issue title, so that people who click the link aren't too confused.

Signed-Off-By: Paolo G. Giarrusso <[email protected]>
Blaisorblade added a commit to Blaisorblade/dune that referenced this issue Jul 8, 2020
The explanation is taken from https://coq.zulipchat.com/#narrow/stream/240550-Dune-devs.20.26.20users/topic/dune.20and.20parallel.20builds.

Also, if this is merged, please replace "jbuilder" by "dune" in the issue title, so that people who click the link aren't too confused.

Signed-Off-By: Paolo G. Giarrusso <[email protected]>
Blaisorblade added a commit to Blaisorblade/dune that referenced this issue Jul 8, 2020
The explanation is taken from https://coq.zulipchat.com/#narrow/stream/240550-Dune-devs.20.26.20users/topic/dune.20and.20parallel.20builds.

Also, if this is merged, please replace "jbuilder" by "dune" in the issue title, so that people who click the link aren't too confused.

Signed-Off-By: Paolo G. Giarrusso <[email protected]>
ghost pushed a commit that referenced this issue Jul 9, 2020
The explanation is taken from https://coq.zulipchat.com/#narrow/stream/240550-Dune-devs.20.26.20users/topic/dune.20and.20parallel.20builds.

Also, if this is merged, please replace "jbuilder" by "dune" in the issue title, so that people who click the link aren't too confused.

Signed-Off-By: Paolo G. Giarrusso <[email protected]>
@rgrinberg rgrinberg added requires-team-discussion This topic requires a team discussion and removed help wanted labels May 2, 2022
@rgrinberg rgrinberg removed the requires-team-discussion This topic requires a team discussion label Jun 2, 2022
@rgrinberg rgrinberg linked a pull request Oct 30, 2022 that will close this issue
@rgrinberg rgrinberg added this to the 3.6.0 milestone Oct 30, 2022
@rgrinberg rgrinberg self-assigned this Nov 8, 2022
emillon added a commit to emillon/opam-repository that referenced this issue Nov 14, 2022
…ne-site, dune-rpc, dune-rpc-lwt, dune-private-libs, dune-glob, dune-configurator, dune-build-info, dune-action-plugin and chrome-trace (3.6.0)

CHANGES:

- Forbid multiple instances of dune running concurrently in the same workspace.
  (ocaml/dune#6360, fixes ocaml/dune#236, @rgrinberg)

- Allow promoting into source directories specified by `subdir` (ocaml/dune#6404, fixes
  ocaml/dune#3502, @rgrinberg)

- Make dune describe workspace return the correct root path
  (ocaml/dune#6380, fixes ocaml/dune#6379, @esope)

- Introduce a `$ dune ocaml top-module` subcommand to load modules directly
  without sealing them behind the signature. (ocaml/dune#5940, @rgrinberg)

- [ctypes] do not mangle user written names in the ctypes stanza (ocaml/dune#6374, fixes
  ocaml/dune#5561, @rgrinberg)

- Support `CLICOLOR` and `CLICOLOR_FORCE` to enable/disable/force ANSI
  colors. (ocaml/dune#6340, fixes ocaml/dune#6323, @MisterDA).

- Forbid private libraries with `(package ..)` set from depending on private
  libraries that don't belong to a package (ocaml/dune#6385, fixes ocaml/dune#6153, @rgrinberg)

- Allow `Byte_complete` binaries to be installable (ocaml/dune#4873, @AltGr, @rgrinberg)

- Revive `$ dune external-lib-deps` under `$ dune describe external-lib-deps`.
  (ocaml/dune#6045, @moyodiallo)

- Fix running inline tests in bytecode mode (ocaml/dune#5622, fixes ocaml/dune#5515, @dariusf)

- [ctypes] always re-run `pkg-config` because we aren't tracking its external
  dependencies (ocaml/dune#6052, @rgrinberg)

- [ctypes] remove dependency on configurator in the generated rules (ocaml/dune#6052,
  @rgrinberg)

- Build progress status now shows number of failed jobs (ocaml/dune#6242, @Alizter)

- Allow absolute build directories to find public executables. For example,
  those specified with `(deps %{bin:...})` (ocaml/dune#6326, @anmonteiro)

- Create a fake socket file `_build/.rpc/dune` on windows to allow rpc clients
  to connect using the build directory. (ocaml/dune#6329, @rgrinberg)

- Prevent crash if absolute paths are used in the install stanza and in
  recursive globs. These cases now result in a user error. (ocaml/dune#6331, @gridbugs)

- Add `(glob_files <glob>)` and `(glob_files_rec <glob>)` terms to the `files`
  field of the `install` stanza (ocaml/dune#6250, closes ocaml/dune#6018, @gridbugs)

- Allow `:standard` in the `(modules)` field of the `coq.pp` stanza (ocaml/dune#6229,
  fixes ocaml/dune#2414, @Alizter)

- Fix passing of flags to dune coq top (ocaml/dune#6369, fixes ocaml/dune#6366, @Alizter)

- Extend the promotion CLI to a `dune promotion` group: `dune promote` is moved
  to `dune promotion apply` (the former still works) and the new `dune promotion
  diff` command can be used to just display the promotion without applying it.
  (ocaml/dune#6160, fixes ocaml/dune#5368, @emillon)
@Alizter
Copy link
Collaborator

Alizter commented Mar 23, 2023

A quick comment here: Dune has currently been made to disallow multiple builds in the same _build/ directory by locking it. In the future there are plans for Dune processes to connect to an already running one which is tracked in #7308.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment