Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix race condition in mv where the file might get removed after we checked its existance #42253

Merged
merged 1 commit into from
Sep 23, 2021

Conversation

KristofferC
Copy link
Member

@KristofferC KristofferC commented Sep 15, 2021

We've observed an error with the following stacktrace:

IOError: unlink("home/kc/.julia/logs/manifest_usage.toml"): no such file or directory (ENOENT)
Stacktrace:
  [1] uv_error
    @ ./libuv.jl:97 [inlined]
  [2] unlink
    @ ./file.jl:958
  [3] #rm#12
    @ ./file.jl:276
  [4] #checkfor_mv_cp_cptree#13
    @ ./file.jl:323
  [5] #mv#17
    @ ./file.jl:411 [inlined]
...

In short, the rm in the line changed in this PR errors because the file does not exist, even though we checked its existence a while earlier. This could be because some other process deleted it. Therefore, pass force=true to rm so that it is not an error if the file has already been deleted.

cc @Sacha0, @NHDaly

@KristofferC KristofferC added backport 1.6 Change should be backported to release-1.6 backport 1.7 labels Sep 15, 2021
Copy link
Member

@Sacha0 Sacha0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Kristoffer! :)

@Sacha0
Copy link
Member

Sacha0 commented Sep 15, 2021

(We shared the following via other channels as well, but reporting here for good measure and other folks' benefit.) With this patch the failure mode changes to:

      From worker 3:      Activating project at `~`
ERROR: LoadError: On worker 3:
SystemError: opening file "/[...]/.julia/logs/manifest_usage.toml": No such file or directory
Stacktrace:
  [1] #systemerror#69
    @ ./error.jl:174
  [2] #systemerror#68
    @ ./error.jl:173 [inlined]
  [3] systemerror
    @ ./error.jl:173 [inlined]
  [4] #open#636
    @ ./iostream.jl:293
  [5] open
    @ ./iostream.jl:282 [inlined]
  [6] #open#355
    @ ./io.jl:328
  [7] open
    @ ./io.jl:328 [inlined]
  [8] read
    @ ./io.jl:436 [inlined]
  [9] readstring
    @ /build/source/usr/share/julia/stdlib/v1.7/TOML/src/TOML.jl:21 [inlined]
 [10] parsefile
    @ /build/source/usr/share/julia/stdlib/v1.7/TOML/src/TOML.jl:43
 [11] write_env_usage
    @ /build/source/usr/share/julia/stdlib/v1.7/Pkg/src/Types.jl:499
 [12] EnvCache
    @ /build/source/usr/share/julia/stdlib/v1.7/Pkg/src/Types.jl:337
 [13] EnvCache
    @ /build/source/usr/share/julia/stdlib/v1.7/Pkg/src/Types.jl:317 [inlined]
 [14] add_snapshot_to_undo
    @ /build/source/usr/share/julia/stdlib/v1.7/Pkg/src/API.jl:1627
 [15] add_snapshot_to_undo
    @ /build/source/usr/share/julia/stdlib/v1.7/Pkg/src/API.jl:1623 [inlined]
 [16] #activate#282
    @ /build/source/usr/share/julia/stdlib/v1.7/Pkg/src/API.jl:1589
 [17] activate
    @ /build/source/usr/share/julia/stdlib/v1.7/Pkg/src/API.jl:1552
 [18] top-level scope
    @ none:1
 [19] eval
    @ ./boot.jl:373
 [20] #103
    @ /build/source/usr/share/julia/stdlib/v1.7/Distributed/src/process_messages.jl:274
 [21] run_work_thunk
    @ /build/source/usr/share/julia/stdlib/v1.7/Distributed/src/process_messages.jl:63
 [22] run_work_thunk
    @ /build/source/usr/share/julia/stdlib/v1.7/Distributed/src/process_messages.jl:72
 [23] #96
    @ ./task.jl:411
Stacktrace:
 [1] sync_end(c::Channel{Any})
   @ Base ./task.jl:369
 [2] macro expansion
   @ ./task.jl:388 [inlined]
 [3] remotecall_eval(m::Module, procs::Vector{Int64}, ex::Expr)
   @ Distributed /build/source/usr/share/julia/stdlib/v1.7/Distributed/src/macros.jl:223
 [4] top-level scope
   @ /build/source/usr/share/julia/stdlib/v1.7/Distributed/src/macros.jl:207
...

@Sacha0
Copy link
Member

Sacha0 commented Sep 15, 2021

We just managed to get our CI pipeline testing the 1.6.3 candidate, and hit an analog under 1.6.3 likely due to the Pkg bump in df3d351. Relevant part of the stacktrace:

    nested task error: SystemError: opening file "/[...]/.julia/logs/manifest_usage.toml": No such file or directory
    Stacktrace:
      [1] #systemerror#63
        @ ./error.jl:168
      [2] #systemerror#62
        @ ./error.jl:167 [inlined]
      [3] systemerror
        @ ./error.jl:167 [inlined]
      [4] #open#587
        @ ./iostream.jl:293
      [5] open
        @ ./iostream.jl:282 [inlined]
      [6] #open#317
        @ ./io.jl:328
      [7] open
        @ ./io.jl:328 [inlined]
      [8] read
        @ ./io.jl:434 [inlined]
      [9] readstring
        @ /build/source/usr/share/julia/stdlib/v1.6/TOML/src/TOML.jl:21 [inlined]
     [10] parsefile
        @ /build/source/usr/share/julia/stdlib/v1.6/TOML/src/TOML.jl:43
     [11] write_env_usage
        @ /build/source/usr/share/julia/stdlib/v1.6/Pkg/src/Types.jl:446
     [12] EnvCache
        @ /build/source/usr/share/julia/stdlib/v1.6/Pkg/src/Types.jl:301
     [13] EnvCache
        @ /build/source/usr/share/julia/stdlib/v1.6/Pkg/src/Types.jl:281 [inlined]
     [14] add_snapshot_to_undo
        @ /build/source/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:1534
     [15] add_snapshot_to_undo
        @ /build/source/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:1530 [inlined]
     [16] #activate#258
        @ /build/source/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:1496
     [17] activate
        @ /build/source/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:1459
     [18] top-level scope
        @ none:1
     [19] eval
        @ ./boot.jl:360 [inlined]
     [20] #155
        @ ./task.jl:411

@Sacha0
Copy link
Member

Sacha0 commented Sep 16, 2021

For those following along, the commit in Pkg.jl that introduced the issue has been reverted on Pkg.jl's release-1.6 and release-1.7 branches (via the two pull requests linked just above). #42282 brings the relevant Pkg.jl rev onto julialang/julia release-1.6 branch, and #42255 brings it onto the julialang/Julia release-1.7 branch.

I'm not sure where that this leaves this pull request. Thoughts Kristoffer? Close this and address the issue more precisely on Pkg.jl master I imagine? :)

@KristofferC
Copy link
Member Author

Thoughts Kristoffer? Close this and address the issue more precisely on Pkg.jl master I imagine? :)

This still looks like an improvement to me though? It turns a scenario from a fatal error to doing what is expected.

@Sacha0
Copy link
Member

Sacha0 commented Sep 17, 2021

This still looks like an improvement to me though? It turns a scenario from a fatal error to doing what is expected.

👍 I defer to your better judgement, not being familiar with the subtleties (if any) of force / this code :).

@KristofferC KristofferC merged commit 72ece74 into master Sep 23, 2021
@KristofferC KristofferC deleted the kc/mv_race branch September 23, 2021 11:29
@KristofferC KristofferC removed backport 1.6 Change should be backported to release-1.6 backport 1.7 labels Sep 23, 2021
@IanButterworth IanButterworth added backport 1.6 Change should be backported to release-1.6 backport 1.7 and removed backport 1.6 Change should be backported to release-1.6 backport 1.7 labels Oct 24, 2021
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants