Segfault when using multithreaded instances of KNITRO #93

Closed
frapac opened this issue Dec 6, 2018 · 25 comments · Fixed by #138 or #282

Comments

@frapac
Collaborator

frapac commented Dec 6, 2018

For instance, if we set the number of threads to more than one in the example tuner.jl:
https://github.com/Artelys/KNITRO.jl/blob/fp/moi/examples/tuner.jl#L185

KNITRO crashes with a segfault.

@frapac frapac changed the title Segfault when using multithreaded versions of KNITRO Segfault when using multithreaded instances of KNITRO Dec 6, 2018
@odow
Member

odow commented Dec 6, 2018

CPLEX has similar issues: jump-dev/JuMP.jl#904

@frapac
Collaborator Author

frapac commented Oct 31, 2019

After further investigation, it appears that the problem arises only when we use Knitro's callbacks. We should figure out a thread-safe way to use them.

@jlperla

jlperla commented Oct 31, 2019

Great. So it sounds like this is likely a more classic thread safety issue rather than anything to do with pthreads?

@frapac
Collaborator Author

frapac commented Oct 31, 2019

Yes, which is good news! I think the problem arises because we embed a KNITRO.Model in each callback:
https://github.com/JuliaOpt/KNITRO.jl/blob/master/src/kn_callbacks.jl#L12

I am working on a fix to get rid of that.

@jlperla

jlperla commented Oct 31, 2019

Perfect. @arnavs is on vacation for a few weeks, but if anything is ready around that time, we can test it.

@frapac
Collaborator Author

frapac commented Nov 4, 2019

On further investigation, it seems that the issue arises because Knitro's callbacks do not respect this rule:
https://docs.julialang.org/en/v1.3-dev/manual/calling-c-and-fortran-code/#Thread-safety-1
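The rule linked above says that a callback invoked on a thread Julia did not create should do nothing except call `uv_async_send` on an `AsyncCondition`, which wakes a task running on a Julia thread that does the real work. A minimal sketch of that two-layer pattern, assuming a recent Julia release; the function name `evaluate_on_julia_thread` and the stand-in objective `sum(abs2, x)` are hypothetical, and `uv_async_send` is issued from Julia itself here purely to show the mechanics, whereas in the Knitro case the solver's own thread would make that call:

```julia
# Two-layer callback pattern from the Julia manual's "Thread-safety" section:
# the foreign thread only signals; a Julia task does the actual Julia work.
function evaluate_on_julia_thread(x::Vector{Float64})
    out = Ref(NaN)
    cond = Base.AsyncCondition()
    # Task on a Julia thread; runs the (hypothetical) objective once woken.
    worker = @async begin
        wait(cond)                 # returns once uv_async_send has fired
        out[] = sum(abs2, x)       # stand-in for a real objective evaluation
    end
    # The only call a foreign-thread callback would be allowed to make.
    # Issued from Julia here just to demonstrate the mechanism.
    ccall(:uv_async_send, Cint, (Ptr{Cvoid},), cond.handle)
    wait(worker)                   # let the event loop deliver the signal
    close(cond)
    return out[]
end

println(evaluate_on_julia_thread([1.0, 2.0, 3.0]))  # 14.0
```

Note that this only hands work over to Julia: it does not by itself return a value to the calling C thread, and the manual warns that several `uv_async_send` calls may be coalesced into a single wakeup.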

@jlperla

jlperla commented Nov 4, 2019

@frapac Great. Seems like a pain, but they provide a workaround?

But if I understand the workaround correctly, we would need to be careful not to end up executing all of the Julia callbacks on a single Julia thread, or else the advantages of parallel Knitro are limited. Perhaps we would need to start Julia with NUM_THREADS roughly equal to the number of Knitro threads, and then have the first layer of the callback schedule the execution using the new PARTR scheduler?

Regardless, I think that multithreading support in knitro should target Julia 1.3 and above. Too much changed in the threading model in the 1.2 to 1.3 release to make backwards compatibility worthwhile.
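The idea of having a first layer schedule the real Julia work on the partr scheduler can be sketched with `Threads.@spawn` (Julia 1.3 and later). This covers only the scheduling half and assumes the work is already on a Julia-owned thread; `rosenbrock` and `evaluate_all` are hypothetical stand-ins for per-start-point callback evaluations, and nothing here involves Knitro:

```julia
using Base.Threads: @spawn

# Hypothetical objective standing in for a Julia callback evaluated once per
# multistart point; it touches no shared mutable state, so tasks are independent.
rosenbrock(x) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2

function evaluate_all(f, points)
    # One task per point; the partr scheduler places tasks on the Julia
    # threads started with `julia --threads=N` (or JULIA_NUM_THREADS=N).
    tasks = [@spawn f(p) for p in points]
    # fetch waits for each task; results keep the order of `points`.
    return [fetch(t) for t in tasks]
end

points = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
println(evaluate_all(rosenbrock, points))  # [1.0, 0.0, 104.0]
```

With a single Julia thread the tasks still run, just one after another, which is why the Julia thread count would need to be roughly matched to the Knitro thread count for any speed-up.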

@frapac
Collaborator Author

frapac commented Nov 5, 2019

Even if I set par_concurrent_evals=0 (as done in the knitroampl interface, which is not thread safe), we get a segfault. This looks tricky.

Running julia-debug under gdb gives the following stack trace:

```
Multistart will generate 20 start points as follows:
      0 variables will vary within their upper and lower bounds
      2 variables will vary over a range of 1000
[New Thread 0x7fffd33f5700 (LWP 26317)]

Knitro parallel multistart will run with 2 threads.

 Solve #  ThreadID  Status     Objective     FeasError   OptError   Real Time
--------  --------  ------  --------------  ----------  ---------- ----------

Thread 4 "julia-debug" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd33f5700 (LWP 26317)]
0x00007ffff74b00a6 in jl_excstack_state ()
    at /home/fpacaud/dev/julia-1.3.0-rc4/src/rtutils.c:283
283	    jl_excstack_t *s = ptls->current_task->excstack;
```

So, Knitro creates a new thread (here with id 0x7fffd33f5700) and fails to evaluate the Julia callback on this particular thread.

@jlperla

jlperla commented Nov 5, 2019

Alas, I fear this is outside of my paygrade and I don't have any students who would be likely to know the answers. Should we beg help from the experts? @ChrisRackauckas do you know a good person we could ask about these sorts of multi-threading issues with external libraries? Have you run into this with DiffEq callbacks, for example?

@odow
Member

odow commented Dec 17, 2019

Looks like this was closed accidentally by #138.

@frapac frapac reopened this Dec 17, 2019
@frapac
Collaborator Author

frapac commented Dec 17, 2019

Right, thank you!

@ferrolho
Contributor

Hello @frapac! Do you have any updates on this? I was just now looking into computing finite difference gradient values in parallel with par_numthreads, but when I specify >1 threads it still starts the optimisation with Knitro performing finite-difference gradient computation with 1 thread.

@frapac
Collaborator Author

frapac commented May 29, 2020

Hi @ferrolho! There were indeed a few updates due to Knitro 12.2. Knitro now enables parallel multistart by default internally, which causes a segfault in Knitro.jl when parallel multistart is used. To handle this case, Knitro.jl now explicitly sets the number of threads to 1.

@ferrolho
Contributor

Okay, thanks! Just to make sure: finite-difference gradient computation with more than one thread is not possible at the moment, correct?

@frapac
Collaborator Author

frapac commented May 29, 2020

If you are using a callback function coded in Julia, that's correct. But if your problem has a pure linear/quadratic/conic structure (so you do not rely on any callback), parallel finite differences or multistart should work.

@ferrolho
Contributor

I see... Thank you for the clarification. Indeed, I was trying to do it for a general callback implemented in Julia. I will keep in mind that this would work for pure Knitro structures, though. Thanks!

@jlperla

jlperla commented May 29, 2020

Thanks @frapac. Are parallel nonlinear problems with Knitro hopeless in the short term? Is there anything Artelys can do about it?

@frapac
Collaborator Author

frapac commented May 30, 2020

Currently, I think parallel evaluations are hopeless in the near term. However, I see some hope in the long term.

From what we have investigated, the segfault occurs when we call a Julia callback from an external C thread (cf. the previous stack trace).

Two workarounds would be:

  • On the Knitro side: ensure that, when par_concurrent_evals is set to 0, Knitro calls all external callbacks on the same thread as the original one (this is currently not the case, hence the segfault). This should not be too difficult to implement in Knitro, but we would lose the benefit of truly concurrent evaluations. That said, concurrent evaluations currently work only when using Knitro from C, C++ and Fortran; even knitroampl explicitly sets par_concurrent_evals to 0 when using multithreaded multistart or finite differences.
  • On the Julia side, I see more of a glimmer of hope. I am closely following JuliaLang/julia#17573 (Calling a Julia function from a non-Julia thread), and a comment there introduces a good idea: Julia could provide an init_external_threads function, after which Julia callbacks could be called from different C threads seamlessly. If we are careful in Knitro.jl, we could even get concurrent evaluations in the end.

@james-atkins

Since Julia 1.9, threads started outside the Julia runtime (e.g. by Knitro) can call into Julia code by calling jl_adopt_thread, now that JuliaLang/julia#46609 has been merged. So I think concurrent evaluations are now possible in Julia 1.9 and above.
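C libraries usually enter Julia through a function pointer created with `@cfunction`, and the Julia 1.9 release notes state that a foreign thread entering Julia this way is adopted automatically (via `jl_adopt_thread`). A minimal sketch, assuming a simplified, hypothetical callback signature `double f(const double *x, int n)` that is not Knitro's real one; the pointer is called from Julia with `ccall` here only to check the wrapper, not from a foreign thread:

```julia
# Hypothetical C-style objective callback: double f(const double *x, int n).
# The signature is illustrative only; Knitro's actual callback signature differs.
function c_objective(x::Ptr{Cdouble}, n::Cint)::Cdouble
    v = unsafe_wrap(Array, x, Int(n))   # view the C buffer without copying
    return sum(abs2, v)
end

# Function pointer that could be handed to a C library. From Julia 1.9 on,
# a foreign thread calling through it is adopted by the runtime automatically.
const c_objective_ptr = @cfunction(c_objective, Cdouble, (Ptr{Cdouble}, Cint))

x = [1.0, 2.0, 3.0]
# Call through the raw pointer, the way C code would (here from a Julia thread).
val = ccall(c_objective_ptr, Cdouble, (Ptr{Cdouble}, Cint), x, length(x))
println(val)  # 14.0
```

Adoption only makes it legal to run Julia code on the solver's threads; whether the Julia code being called is itself safe to run concurrently is a separate question.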

@odow
Member

odow commented Nov 13, 2023

I haven't looked into the details, but to enable this it seems we'd need to remove:

KNITRO.jl/src/C_wrapper.jl, lines 514 to 528 in 33db39c:

```julia
function KN_solve(m::Model)
    # Check sanity. If model has Julia callbacks, we need to ensure
    # that Knitro is not multithreaded. Otherwise, the code will segfault
    # as we have trouble calling Julia code from multithreaded C
    # code. See issue #93 on https://github.com/jump-dev/KNITRO.jl.
    if has_callbacks(m)
        if KNITRO_VERSION >= v"13.0"
            KN_set_param(m, KN_PARAM_MS_NUMTHREADS, 1)
            KN_set_param(m, KN_PARAM_NUMTHREADS, 1)
            KN_set_param(m, KN_PARAM_MIP_NUMTHREADS, 1)
        else
            KN_set_param(m, "par_numthreads", 1)
            KN_set_param(m, "par_msnumthreads", 1)
        end
    end
```

@DarioSlaifsteinSk

DarioSlaifsteinSk commented Nov 27, 2023

Hi @frapac and @odow!
I'm trying to solve a MINLP, but my setting

set_optimizer_attributes(model, "mip_numthreads" => 3)

is being ignored anyway. I haven't defined any user callbacks, so why is it still happening?
Is it because the solver uses its own callbacks for branch-and-bound?
My version info is:

```
julia> versioninfo()
Julia Version 1.9.0
Commit 8e63055292 (2023-05-07 11:25 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, tigerlake)
  Threads: 1 on 8 virtual cores
```

@odow
Member

odow commented Nov 27, 2023

> Is it because the solver uses its own callbacks for branch-and-bound?

Yes. The has_callbacks check is about whether there are any callbacks in the model, not just user-defined ones. If you add a nonlinear constraint or objective, KNITRO.jl automatically creates a callback that evaluates the function, gradient, etc.
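The effect described here (a nonlinear term registers a callback, which in turn forces the thread parameters back to 1) can be illustrated with a toy model. Everything in this sketch (`MockModel`, `add_nonlinear_constraint!`, `solve!`, `THREAD_PARAMS`) is hypothetical and only mirrors the idea of the guard in `KN_solve` quoted earlier in the thread; it does not call KNITRO.jl:

```julia
# Toy stand-in for a solver model; not KNITRO.jl's actual types.
mutable struct MockModel
    callbacks::Vector{Function}   # evaluation callbacks registered so far
    params::Dict{String,Int}      # solver parameters set by the user
end
MockModel() = MockModel(Function[], Dict{String,Int}())

has_callbacks(m::MockModel) = !isempty(m.callbacks)

# A nonlinear constraint needs function/gradient evaluations, so it registers
# a callback even though the user never wrote one explicitly.
function add_nonlinear_constraint!(m::MockModel, f::Function)
    push!(m.callbacks, f)
    return m
end

# Illustrative thread-count parameter names.
const THREAD_PARAMS = ("par_numthreads", "par_msnumthreads", "mip_numthreads")

function solve!(m::MockModel)
    if has_callbacks(m)
        # Same idea as the KN_solve guard: force single-threaded evaluation.
        for p in THREAD_PARAMS
            m.params[p] = 1
        end
    end
    return m
end

m = MockModel()
m.params["mip_numthreads"] = 3                    # user asks for 3 threads
add_nonlinear_constraint!(m, x -> x[1]^2 + x[2]^2 - 1)
solve!(m)
println(m.params["mip_numthreads"])               # 1
```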

@odow
Member

odow commented Dec 5, 2023

I took a look here, #281, but the answer is fairly obvious in retrospect: JuMP/MOI cache data for the functions and gradients in a tape that is not thread safe, so this can never be a safe operation.

At the C level, you can opt out of this safety check by calling KN_solve(model.env) instead of KN_solve(model).
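Why a shared evaluation tape rules out concurrent calls can be seen with a hypothetical `CachedEvaluator` whose single scratch buffer plays the role of the tape (this is an illustration, not how MOI's evaluator is actually implemented). Guarding the buffer with a `ReentrantLock` makes concurrent calls correct, but it also serializes them, so the parallel speed-up is lost:

```julia
using Base.Threads: @spawn

# Hypothetical evaluator that reuses one scratch buffer ("tape") across calls,
# as AD backends often do to avoid allocations. Unsynchronized concurrent calls
# would overwrite each other's intermediate values.
struct CachedEvaluator
    tape::Vector{Float64}   # shared scratch storage
    guard::ReentrantLock    # protects `tape`
end
CachedEvaluator(n::Int) = CachedEvaluator(zeros(n), ReentrantLock())

# Gradient of f(x) = sum(x .^ 2) / 2, which is x itself, routed through the
# tape on purpose so that the shared state matters.
function gradient!(ev::CachedEvaluator, g::Vector{Float64}, x::Vector{Float64})
    lock(ev.guard) do
        ev.tape .= x      # "forward pass" writes intermediates
        g .= ev.tape      # "reverse pass" reads them back
    end
    return g
end

ev = CachedEvaluator(3)
xs = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
gs = [zeros(3) for _ in xs]
@sync for i in eachindex(xs)
    @spawn gradient!(ev, gs[i], xs[i])   # correct only because of the lock
end
println(gs == xs)  # true
```

The alternative to locking is one evaluator (one tape) per thread, which trades memory for real parallelism.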

@DarioSlaifsteinSk

DarioSlaifsteinSk commented Aug 13, 2024

Hi, how was this actually solved?
Can we use mip_numthreads for MINLP or NLP or is it always overwritten to 1?

@odow
Member

odow commented Aug 13, 2024

> how was this actually solved

It wasn't.

> Can we use mip_numthreads for MINLP or NLP or is it always overwritten to 1?

It is always overwritten to 1. The callbacks from C into Julia are not thread safe.

6 participants