Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calls curl_multi_assign with null #260

Closed
bevanjkay opened this issue Sep 11, 2024 · 19 comments · Fixed by #266
Closed

Calls curl_multi_assign with null #260

bevanjkay opened this issue Sep 11, 2024 · 19 comments · Fixed by #266

Comments

@bevanjkay
Copy link

When building the latest version of curl at Homebrew, we ran into a segfault when regression testing Julia as a curl dependent.
The segfault issue has been resolved in curl, however it likely points to an issue in the curl multi handler in Julia.

See curl comment for reference;
curl/curl#14860 (comment)

For our CI build;
Complete logs are available here. The referenced error starts here on macOS 14 arm64.

@an-Iceberg
Copy link

I am affected by this issue as well. I can't update any packages i've installed because a segmentation fault at curl_multi_assign happens.

@giordano
Copy link
Contributor

@bevanjkay do you have a self-contained reproducer?

@giordano
Copy link
Contributor

If I change

diff --git a/src/Curl/Curl.jl b/src/Curl/Curl.jl
index fd9b4c9..a2cc146 100644
--- a/src/Curl/Curl.jl
+++ b/src/Curl/Curl.jl
@@ -51,6 +51,7 @@ function curl_multi_socket_action(multi_handle, s, ev_bitmask, running_handles)
     ccall((:curl_multi_socket_action, libcurl), CURLMcode, (Ptr{CURLM}, curl_socket_t, Cint, Ptr{Cint}), multi_handle, s, ev_bitmask, running_handles)
 end
 function curl_multi_assign(multi_handle, sockfd, sockp)
+    multi_handle == C_NULL && error("multi_handle is NULL")
     ccall((:curl_multi_assign, libcurl), CURLMcode, (Ptr{CURLM}, curl_socket_t, Ptr{Cvoid}), multi_handle, sockfd, sockp)
 end
 

all tests pass for me. Please, provide a self-contained minimal reproducer, otherwise it looks complicated to address this issue.

@giordano
Copy link
Contributor

giordano commented Sep 18, 2024

And with

diff --git a/src/Curl/Curl.jl b/src/Curl/Curl.jl
index fd9b4c9..546ef21 100644
--- a/src/Curl/Curl.jl
+++ b/src/Curl/Curl.jl
@@ -51,6 +51,8 @@ function curl_multi_socket_action(multi_handle, s, ev_bitmask, running_handles)
     ccall((:curl_multi_socket_action, libcurl), CURLMcode, (Ptr{CURLM}, curl_socket_t, Cint, Ptr{Cint}), multi_handle, s, ev_bitmask, running_handles)
 end
 function curl_multi_assign(multi_handle, sockfd, sockp)
+    @info "Hello from curl_multi_assign"
+    multi_handle == C_NULL && error("multi_handle is NULL")
     ccall((:curl_multi_assign, libcurl), CURLMcode, (Ptr{CURLM}, curl_socket_t, Ptr{Cvoid}), multi_handle, sockfd, sockp)
 end
 

I tried to follow the (non-minimal) reproducer by @carlocab in curl/curl#14860 (comment) and I still can't see multi_handle being NULL:

% julia -p 3 --project=/tmp -e 'using Pkg; Pkg.add("Example")'
[ Info: Hello from curl_multi_assign
[ Info: Hello from curl_multi_assign
[ Info: Hello from curl_multi_assign
[ Info: Hello from curl_multi_assign
[ Info: Hello from curl_multi_assign
[ Info: Hello from curl_multi_assign
   Resolving package versions...
[ Info: Hello from curl_multi_assign
[ Info: Hello from curl_multi_assign
[ Info: Hello from curl_multi_assign
[ Info: Hello from curl_multi_assign
   Installed Example ─ v0.5.3
    Updating `/private/tmp/Project.toml`
  [7876af07] + Example v0.5.3
    Updating `/private/tmp/Manifest.toml`
  [7876af07] + Example v0.5.3
Precompiling packages finished.
  1 dependency successfully precompiled in 1 seconds. 25 already precompiled.

Note that Example is being installed (I had deleted the source from my ${JULIA_DEPOT_PATH}/packages directory), and the message Hello from curl_multi_assign to confirm my custom Downloads version is being used.

@giordano
Copy link
Contributor

Alright, I finally managed to hit a NULL handle with

% julia --project=/tmp -e 'using Downloads; Downloads.download("https://example.org", IOBuffer())'
┌ Error: socket_callback: unexpected error
│   err = multi.handle is NULL
└ @ Downloads.Curl ~/repo/Downloads.jl/src/Curl/Multi.jl:204

but only with libcurl 8.10.0+. This is happening at the exit of julia, so probably in the finalizer of the

# Close any Multis and their timers at exit that haven't been finalized by then
Base.atexit() do
while true
w = @lock MULTIS_LOCK (isempty(MULTIS) ? nothing : pop!(MULTIS))
w === nothing && break
w = w.value
w isa Multi && done!(w)
end
end

All my tests above were with Julia master (0073917331) and built-in libcurl 8.6.0. I also can't reproduce it with a local build of libcurl 8.9.0. What I did here was to use same version of Julia and Downloads.jl and link it to a local build of libcurl 8.10.0 or 8.10.1. But the thing is that the code path on the julia side changed only by changing libcurl version.

@mrufsvold
Copy link

I don't understand the linking between libcurl, libcurl_jll, etc. very well, so please forgive me if this isn't adding any new information.

I hit this same segfault using the Downloads.jl backend for AWS.jl. It ran continuously for 5 hours with no problem. Then crashed on a segfault. This did not occur why Julia was exiting. Unfortunately, since it ran the same loop for many hours before hand and there are several places in the code that interact with AWS services, it's hard for me to narrow down a reproducible example.

Here is the trace:

[1] signal (11.1): Segmentation fault
in expression starting at /usr/local/consolidator_script.jl:25
curl_multi_cleanup at /usr/local/julia-1.10.5/bin/../lib/julia/libcurl.so.4 (unknown line)
curl_multi_cleanup at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/LibCURL/src/lC_curl_h.jl:214 [inlined]
done! at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Downloads/src/Curl/Multi.jl:29 [inlined]
#36 at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Downloads/src/Curl/Multi.jl:81 [inlined]
lock at ./lock.jl:229
#35 at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Downloads/src/Curl/Multi.jl:78 [inlined]
#726 at ./asyncevent.jl:306
jfptr_YY.726_75903.1 at /usr/local/julia-1.10.5/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
start_task at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/task.c:1238
Allocations: 18104701385 (Pool: 18059901401; Big: 44799984); GC: 6844

I am running:
Julia v1.10.5
LibCURL_jll v8.4.0

On a linux docker container running on x86.

It's rather difficult to pull more details since this is a non-interactive app running in the cloud, but if there is anything I can provide that would be helpful, I'm happy to get it.

@uncomfyhalomacro
Copy link

uncomfyhalomacro commented Oct 4, 2024

I am also receiving the same thing since openSUSE ships the same libcurl version as well or at least greater than what Julia provides in its bundled deps.

Image

@uncomfyhalomacro
Copy link

I can reproduce with add DifferentialEquations but not add Example
Image

@uncomfyhalomacro
Copy link

uncomfyhalomacro commented Oct 5, 2024

Seems to be resolved with this patch curl/curl@48f61e7. I will test this on an isolated build environment. I should send patches to the curl maintainer of opensuse.

EDIT: seems they already pushed a patched version of curl and i didn't see that. just need to wait for image rebuild to happen in the coming weeks :)

@an-Iceberg
Copy link

an-Iceberg commented Oct 6, 2024

I managed to solve this problem by uninstalling the AUR version and installing it the official way according to the instructions on their website ¯\_(ツ)_/¯

@uncomfyhalomacro
Copy link

I managed to solve this problem by uninstalling the AUR version and installing it the official way according to the instructions on their website ¯_(ツ)_/¯

Welp. Good for you. What I did was use the leap variant of the builds using distrobox. seems it's still not fixed on tumbleweed with curl v8.10.0+ something

@inkydragon
Copy link
Member

inkydragon commented Oct 19, 2024

Calls curl_multi_assign with null

In julia side:

  1. do curl_multi_cleanup(handle) first, and then set handle = null
  2. we may check multi.handle==null in socket_callback, if true just return and do nothing.

In libcurl side:
3. We may want to backport patch (curl/curl@48f61e7) , for old libcurl version (< v8.10.1), to prevent Segmentation fault in libcurl.

Error: curl_multi_assign: 1

┌ Error: curl_multi_assign: 1
└ @ Downloads.Curl /mnt/d/jl/julia/usr/share/julia/stdlib/v1.12/Downloads/src/Curl/utils.jl:57

[34923] signal 6 (-6): Aborted
  1. This is a curl bug, we need patch curl/curl@461ce6c

    With patch 48f61e7, curl_multi_assign will check if multi.handle is good (handle.magic == SPECIAL_CONST).
    But curl_multi_cleanup will set handle.magic=0, and call socket_callback (and call curl_multi_assign)

    GOOD_MULTI_HANDLE(x): https://github.com/curl/curl/blob/6847733191f3a8d4c475d2a1a8439f6a7edfcd77/lib/multi.c#L87

patch

1. clean and set to null

function done!(multi::Multi)
stoptimer!(multi)
handle = multi.handle
handle == C_NULL && return
multi.handle = C_NULL
curl_multi_cleanup(handle)
nothing
end

-     multi.handle = C_NULL 
      curl_multi_cleanup(handle) 
+     multi.handle = C_NULL 

ref:

2. check null in socket_callback

multi = unsafe_pointer_to_objref(multi_p)::Multi
if watcher_p != C_NULL

         multi = unsafe_pointer_to_objref(multi_p)::Multi
+        multi.handle == C_NULL && return -1
+
         if watcher_p != C_NULL

With patch 1, this check may be unnecessary.


And I've found another issue:
we may forget to remove and cleanup all Easy handles before cleanup multi handles.

The order of cleaning up should be:

1 - curl_multi_remove_handle before any easy handles are cleaned up
2 - curl_easy_cleanup can now be called independently since the easy handle is no longer connected to the multi handle
3 - curl_multi_cleanup should be called when all easy handles are removed

https://curl.se/libcurl/c/curl_multi_cleanup.html

@giordano
Copy link
Contributor

Did you actually try the suggested patch? I seem to remember I tried already

-     multi.handle = C_NULL 
      curl_multi_cleanup(handle) 
+     multi.handle = C_NULL 

but that broke something else.

@giordano
Copy link
Contributor

Also, I suggested

         multi = unsafe_pointer_to_objref(multi_p)::Multi
+        multi.handle == C_NULL && return -1
+
         if watcher_p != C_NULL

at https://julialang.slack.com/archives/C01439FAFA7/p1726680870906299 but @vtjnash said that would introduce a data race. You can see that in that thread I discussed several solution, including adding a finalizing flag

diff --git a/src/Curl/Multi.jl b/src/Curl/Multi.jl
index d2be032..fad991f 100644
--- a/src/Curl/Multi.jl
+++ b/src/Curl/Multi.jl
@@ -4,9 +4,10 @@ mutable struct Multi
     timer  :: Union{Nothing,Timer}
     easies :: Vector{Easy}
     grace  :: UInt64
+    finalizing :: Bool
 
     function Multi(grace::Integer = typemax(UInt64))
-        multi = new(ReentrantLock(), C_NULL, nothing, Easy[], grace)
+        multi = new(ReentrantLock(), C_NULL, nothing, Easy[], grace, false)
         finalizer(done!, multi)
         @lock MULTIS_LOCK push!(filter!(m -> m.value isa Multi, MULTIS), WeakRef(multi))
         return multi
@@ -26,6 +27,7 @@ function done!(multi::Multi)
     handle = multi.handle
     handle == C_NULL && return
     multi.handle = C_NULL
+    multi.finalizing = true
     curl_multi_cleanup(handle)
     nothing
 end
@@ -162,6 +164,7 @@ function socket_callback(
             return -1
         end
         multi = unsafe_pointer_to_objref(multi_p)::Multi
+        multi.finalizing && return 0
         if watcher_p != C_NULL
             old_watcher = unsafe_pointer_to_objref(watcher_p)::FDWatcher
             @check curl_multi_assign(multi.handle, sock, C_NULL)

but none of that was accepted.

@inkydragon
Copy link
Member

inkydragon commented Oct 19, 2024

Did you actually try the suggested patch?

Yes, I've tried all those 4 patch with

  • Version 1.12.0-DEV.1436 (2024-10-19) (b0c1525f17)
  • Downloads.jl 1.6.0 (89d3c7d)
  • LibCURL source build v8.10.1
  • Ubuntu on Win64/WSL2

test with

using LibCURL

libcurl_ver = unsafe_string(LibCURL.curl_version())
println("libcurl_ver: $libcurl_ver")

using Downloads
Downloads.download("https://example.org", IOBuffer());

exit()

With LibCURL v8.10.1 (included patch 3):

  • Apply patch 4 (only update libcurl): Error: curl_multi_assign: 1, sice multi.handle = C_NULL is not good.
  • Apply patch 1+4: Good. So, maybe we dont need patch 2.
  • Apply patch 1+2+4: Good

I seem to remember I tried already
but that broke something else.

Only apply patch 2, Without patch 4, multi.handle != C_NULL, but is still not a GOOD_MULTI_HANDLE (See my upper reply).


In summary:

  • Update libcurl to v8.10.1, and apply patch curl/curl@461ce6c to make multi.handle looks "good".
  • Apply patch 1 to Downloads.jl
-     multi.handle = C_NULL 
      curl_multi_cleanup(handle) 
+     multi.handle = C_NULL 

observingClouds added a commit to observingClouds/xbitinfo that referenced this issue Oct 28, 2024
this restriction may be lifted after JuliaLang/Downloads.jl#260
is solved
observingClouds added a commit to observingClouds/xbitinfo that referenced this issue Oct 28, 2024
* fix ignored python version setting
* limit libcurl version to <3.10

this restriction may be lifted after JuliaLang/Downloads.jl#260
is solved
@giordano
Copy link
Contributor

giordano commented Nov 2, 2024

I seem to remember I tried already

-     multi.handle = C_NULL 
      curl_multi_cleanup(handle) 
+     multi.handle = C_NULL 

but that broke something else.

I remembered correctly: tests are completely broken with the change you suggest (and that I had tried already), they hang forever, besides the failure reported in #261.

For the record, julia still crashes with signal 6 at exit on curl/curl@9d32724.

@giordano
Copy link
Contributor

giordano commented Nov 2, 2024

tests are completely broken with the change you suggest (and that I had tried already), they hang forever

Actually, there's a testset which just takes a while, that's normal. However with curl 8.10.1 test do crash with the change you suggest (which what I was referring to in curl/curl#14860 (comment)):

[1323261] signal 6 (-6): Aborted
in expression starting at /home/mose/.julia/dev/Downloads/test/runtests.jl:3
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7fca2c46471a) at /lib/x86_64-linux-gnu/libc.so.6
__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
curl_multi_assign at /home/mose/repo/julia/deps/srccache/curl-7eb8c048470ed2cc14dca75be9c1cdae7ac8498b/lib/multi.c:3691
curl_multi_assign at /home/mose/.julia/dev/Downloads/src/Curl/Curl.jl:54 [inlined]
macro expansion at /home/mose/.julia/dev/Downloads/src/Curl/utils.jl:56 [inlined]
socket_callback at /home/mose/.julia/dev/Downloads/src/Curl/Multi.jl:176
jfptr_socket_callback_2728 at /home/mose/.julia/compiled/v1.12/Downloads/bvwXU_wQxEU.so (unknown line)
jlcapi_socket_callback_2676 at /home/mose/.julia/compiled/v1.12/Downloads/bvwXU_wQxEU.so (unknown line)
Curl_multi_pollset_ev at /home/mose/repo/julia/deps/srccache/curl-7eb8c048470ed2cc14dca75be9c1cdae7ac8498b/lib/multi.c:2927
cpool_update_shutdown_ev at /home/mose/repo/julia/deps/srccache/curl-7eb8c048470ed2cc14dca75be9c1cdae7ac8498b/lib/conncache.c:1086
cpool_discard_conn at /home/mose/repo/julia/deps/srccache/curl-7eb8c048470ed2cc14dca75be9c1cdae7ac8498b/lib/conncache.c:771
cpool_close_and_destroy_all at /home/mose/repo/julia/deps/srccache/curl-7eb8c048470ed2cc14dca75be9c1cdae7ac8498b/lib/conncache.c:661
Curl_cpool_destroy at /home/mose/repo/julia/deps/srccache/curl-7eb8c048470ed2cc14dca75be9c1cdae7ac8498b/lib/conncache.c:189
curl_multi_cleanup at /home/mose/repo/julia/deps/srccache/curl-7eb8c048470ed2cc14dca75be9c1cdae7ac8498b/lib/multi.c:2758
curl_multi_cleanup at /home/mose/repo/julia/usr/share/julia/stdlib/v1.12/LibCURL/src/lC_curl_h.jl:214 [inlined]
done! at /home/mose/.julia/dev/Downloads/src/Curl/Multi.jl:28 [inlined]
#remove_handle##6 at /home/mose/.julia/dev/Downloads/src/Curl/Multi.jl:72
unknown function (ip: 0x7fca01b8c68f) at (unknown file)
unknown function (ip: 0x7fca01b8c609) at (unknown file)
unknown function (ip: 0x7fca01b8c5a1) at (unknown file)
lock at ./lock.jl:253
#remove_handle##4 at /home/mose/.julia/dev/Downloads/src/Curl/Multi.jl:69
unknown function (ip: 0x7fca01b8c2b2) at (unknown file)
unknown function (ip: 0x7fca01b0f6d9) at (unknown file)
unknown function (ip: 0x7fca01b0f67c) at (unknown file)
#627 at ./asyncevent.jl:313
unknown function (ip: 0x7fca01b0f5cf) at (unknown file)
jl_apply at /home/mose/repo/julia/src/julia.h:2243 [inlined]
start_task at /home/mose/repo/julia/src/task.c:1263
Allocations: 8194245 (Pool: 8194087; Big: 158); GC: 9
ERROR: Package Downloads errored during testing (received signal: 6)

But with curl/curl@461ce6c (which didn't exist when I looked into this two months ago) tests finally do finish successfully, besides the failure in #261, at least there are no crashes nor segfaults. So moving

multi.handle = C_NULL

after the cleanup is probably a good solution, once we upgrade to whatever is the next version of curl.

@inkydragon
Copy link
Member

@valeriupredoi
Copy link

hi folks, curl==8.11 is currently marked as broken on conda-forge, so we'll have to wait until the fixed package gets uploaded; does this mean that this issue will be fixed with the new curl >=8.11, and we have to pin curl in our environment, or will Julia work again with any curl, including 8.10.x? Cheers muchly 🍺

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants