Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Io.copy_channels is called concurrently under Windows (at least), resulting in corrupted output #11193

Closed
nojb opened this issue Dec 12, 2024 · 1 comment · Fixed by #11194
Labels

Comments

@nojb
Copy link
Collaborator

nojb commented Dec 12, 2024

Recently at LexiFi we observed what looks like a bug in Dune. When building a medium-size workspace under Windows, sometimes the target of copy_files# will get corrupted. Concretely, the workspace includes Menhir, and the file src/stage2/lexmli.ml (which is an ocamllex-generated lexer and is copied using copy_files# from src/lexmli.ml) is sometimes corrupted:

image

For future reference, the symptom is that the compiler complained from an illegal backslash in this file:

image

Looking at the code (copy_files# ...) is implemented using Io.copy_channels from stdune which uses a global buffer:

let copy_channels =
let buf_len = 65536 in
let buf = Bytes.create buf_len in
let rec loop ic oc =
match input ic buf 0 buf_len with
| 0 -> ()
| n ->
output oc buf 0 n;
loop ic oc
in
loop
;;

Accordingly, the first suspicion is that this function is called concurrently. To test this conjecture I instrumented the code:

diff --git a/otherlibs/stdune/src/io.ml b/otherlibs/stdune/src/io.ml
index 576021ffb..0dbaae703 100644
--- a/otherlibs/stdune/src/io.ml
+++ b/otherlibs/stdune/src/io.ml
@@ -68,6 +68,7 @@ let input_zero_separated =
 let copy_channels =
   let buf_len = 65536 in
   let buf = Bytes.create buf_len in
+  let busy = ref false in
   let rec loop ic oc =
     match input ic buf 0 buf_len with
     | 0 -> ()
@@ -75,7 +76,14 @@ let copy_channels =
       output oc buf 0 n;
       loop ic oc
   in
-  loop
+  fun ic oc ->
+  if !busy then begin
+    Printf.printf "XXXXXXXX\n%!";
+    Printexc.print_raw_backtrace stdout (Printexc.get_callstack 20)
+  end;
+  busy := true;
+  loop ic oc;
+  busy := false
 ;;

and indeed, the function is called concurrently:

image

@rgrinberg: should we do the safe thing here and use a separate buffer for each call? (We could optimize that by using a global buffer but if the function is called again while the global buffer is being used, then use a fresh buffer.) What do you think?

@nojb
Copy link
Collaborator Author

nojb commented Dec 12, 2024

@rgrinberg: should we do the safe thing here and use a separate buffer for each call? (We could optimize that by using a global buffer but if the function is called again while the global buffer is being used, then use a fresh buffer.) What do you think?

See #11194 for this approach.

@nojb nojb added the bug label Dec 12, 2024
maiste added a commit to maiste/opam-repository that referenced this issue Dec 17, 2024
CHANGES:

### Fixed

- When a library declares `(no_dynlink)`, then the `.cmxs` file for it
  is no longer built. (ocaml/dune#11176, @nojb)

- Fix bug that could result in corrupted file copies by Dune, for example when
  using the `copy_files#` stanza or the `copy#` action. (@nojb, ocaml/dune#11194, fixes
  ocaml/dune#11193)

- Remove useless error message when running `$ dune subst` in empty projects.
  (@rgrinberg, ocaml/dune#11204, fixes ocaml/dune#11200)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant