NOMAD segfaults on Manjaro Linux when Julia is started with "-t n" with n>1 #39
Comments
Hi @simonp0420 ! |
I am careful to not attempt to modify the same value from distinct threads in my code. I do use a |
We compiled a sequential version of NOMAD with Yggdrasil (https://github.com/JuliaPackaging/Yggdrasil/blob/master/N/NOMAD/build_tarballs.jl) to avoid issues with Julia's |
Hi @simonp0420. If you are in a bit of a hurry, you can use the executable When I take a look at your logs, it is strange it does not find the |
I am sorry about the long delay in responding to your answers. For some reason I haven't received notifications of these and I just happened to glance back here. Anyway... @amontoison I can supply the code I tried to use, but it requires installing my PSSFSS package, so it is not a small, minimal example. Let me know if you would like me to do this or if I should try to come up with a different, much smaller example. @salomonl, I installed Julia by downloading it (actually using Jill.py on Linux and Chocolatey on Windows). I'm not in a hurry, as there are plenty of other optimizers available for my problem. I would like to see if NOMAD is more efficient or arrives at a better solution than, say, CMAEvolutionStrategy.jl. |
@simonp0420 I have access to a Linux machine (Fedora). If you want, I can take a look at your example. Otherwise, if you have a smaller example, that would be welcome. I tried with a silly @threads loop in a blackbox function, but it does not fail. |
@salomonl, thanks for your offer to look at my example. I haven't been able to generate an MWE that also exhibits the segfault, so I'm guessing that I'm doing something wrong with threading. I hope this isn't a waste of your time, but here is my failing example:
using PSSFSS, NOMAD
using Dates: now
let bestf = typemax(Float64)
    global bb
    """
    (success, counteval, [objective, c1, c2, c3, c4, c5, c6]) = bb(x)
    x = [period, wo, ho, wi, hi, wc, hc, t1, t2]
    ao = bo = ac = bc = period; ai = √2*period, bi = period/√2
    constraints to be held ≤ 0:
    c1 = ho - 0.99*bo
    c2 = hi - 0.99*bi
    c3 = hc - 0.99*bc
    c4 = 2.05*wo - ho
    c5 = 2.05*wi - hi
    c6 = 2.05*wc - hc
    """
    function bb(x)
        period, wo, ho, wi, hi, wc, hc, t1, t2 = x
        ao = bo = ai = bi = ac = bc = period
        ai *= √2
        bi /= √2
        c1 = ho - 0.99*bo
        c2 = hi - 0.99*bi
        c3 = hc - 0.99*bc
        c4 = 2.05*wo - ho
        c5 = 2.05*wi - hi
        c6 = 2.05*wc - hc
        returnval = [5000.0,c1,c2,c3,c4,c5,c6]
        any(returnval[2:end] .> 0) && (return (false, false, returnval))
        outer(rot) = meander(a=ao, b=bo, w1=wo, w2=wo, h=ho, units=mm, ntri=400, rot=rot)
        inner(rot) = meander(a=ai, b=bi, w1=wi, w2=wi, h=hi, units=mm, ntri=400, rot=rot)
        center(rot) = meander(a=ac, b=bc, w1=wc, w2=wc, h=hc, units=mm, ntri=400, rot=rot)
        substrate = Layer(width=0.1mm, epsr=2.6)
        foam(w) = Layer(width=w, epsr=1.05)
        rot0 = 0
        strata = [
            Layer()
            outer(rot0)
            substrate
            foam(t1*1mm)
            inner(rot0 - 45)
            substrate
            foam(t2*1mm)
            center(rot0 - 2*45)
            substrate
            foam(t2*1mm)
            inner(rot0 - 3*45)
            substrate
            foam(t1*1mm)
            outer(rot0 - 4*45)
            substrate
            Layer() ]
        steering = (θ=0, ϕ=0)
        flist = 11:0.25:19
        results = analyze(strata, flist, steering, showprogress=false)
        s11rr, s21ll, ar11db, ar21db = eachcol(extract_result(results,
            @outputs s11db(R,R) s21db(L,L) ar11db(R) ar21db(L)))
        RL = -s11rr
        IL = -s21ll
        obj = maximum(vcat(RL,IL,ar11db,ar21db))
        returnval[1] = obj
        if obj < bestf
            bestf = obj
            open("optimization_best.log", "a") do fid
                xround = map(t -> round(t, digits=4), x)
                println(fid, round(obj,digits=4), " at x = ", xround, " #", now())
            end
        end
        return (true, true, returnval)
    end
end
# x = [period, wo, ho, wi, hi, wc, hc, t1, t2]
xmin = [3.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 1.5, 1.5]
xmax = [5.5, 0.35,4.0, 0.35, 4.0, 0.35, 4.0, 6.0, 6.0]
x0 = 0.5 * (xmin + xmax)
nb_inputs = length(x0)
nb_outputs = 7
output_types = ["OBJ"; repeat(["EB"], nb_outputs-1)]
prob = NomadProblem(nb_inputs, nb_outputs, output_types, bb;
                    lower_bound = xmin,
                    upper_bound = xmax,
                    granularity = 1.e-3 * ones(nb_inputs))
isfile("optimization_best.log") && rm("optimization_best.log")
result = solve(prob, x0)
Thanks for looking at it. |
Whoops, the comment in my code was incorrect. c1, c2, ...c6 are all to be
held less than or equal to zero. I've edited the comment to correct this.
I believe the executable code is/was correct.
…On Tue, Jun 1, 2021 at 11:39 AM salomonl ***@***.***> wrote:
@simonp0420 <https://github.com/simonp0420> Thank you for the code. To be
sure, your constraints that you want to satisfy are of the form c(x) >= 0 ?
|
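The six constraints discussed above depend only on the design vector, so they can be checked without calling PSSFSS. The sketch below is an illustration, not code from the thread: `constraints` and `isfeasible` are hypothetical helper names, while the formulas and bounds are copied from the example above. Under those assumptions it checks that the midpoint starting point satisfies c(x) ≤ 0.

```julia
# Hypothetical helper (not part of PSSFSS or NOMAD.jl):
# evaluate c1..c6 exactly as written in the docstring of bb(x).
function constraints(x)
    period, wo, ho, wi, hi, wc, hc, t1, t2 = x
    bo = bc = period
    bi = period / √2
    return [ho - 0.99 * bo,
            hi - 0.99 * bi,
            hc - 0.99 * bc,
            2.05 * wo - ho,
            2.05 * wi - hi,
            2.05 * wc - hc]
end

# A point is feasible when every constraint is ≤ 0.
isfeasible(x) = all(constraints(x) .<= 0)

# Bounds and starting point taken from the example in this thread.
xmin = [3.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 1.5, 1.5]
xmax = [5.5, 0.35, 4.0, 0.35, 4.0, 0.35, 4.0, 6.0, 6.0]
x0 = 0.5 * (xmin + xmax)

println(isfeasible(x0))
```

This matches the early return in `bb`, which rejects a point as soon as any of `returnval[2:end]` is positive.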
@simonp0420 I apologize for the long delay. I can confirm with your example I am able to reproduce the bug, even if I use my own version of As an alternative, you can use the I rewrote your code as a blackbox
using ArgParse
using PSSFSS
using Dates: now
"""
(success, counteval, [objective, c1, c2, c3, c4, c5, c6]) = bb(x)
x = [period, wo, ho, wi, hi, wc, hc, t1, t2]
ao = bo = ac = bc = period; ai = √2*period, bi = period/√2
constraints to be held ≤ 0:
c1 = ho - 0.99*bo
c2 = hi - 0.99*bi
c3 = hc - 0.99*bc
c4 = 2.05*wo - ho
c5 = 2.05*wi - hi
c6 = 2.05*wc - hc
"""
function bb(x)
    period, wo, ho, wi, hi, wc, hc, t1, t2 = x
    ao = bo = ai = bi = ac = bc = period
    ai *= √2
    bi /= √2
    c1 = ho - 0.99*bo
    c2 = hi - 0.99*bi
    c3 = hc - 0.99*bc
    c4 = 2.05*wo - ho
    c5 = 2.05*wi - hi
    c6 = 2.05*wc - hc
    returnval = [5000.0,c1,c2,c3,c4,c5,c6]
    any(returnval[2:end] .> 0) && (return (false, false, returnval))
    outer(rot) = meander(a=ao, b=bo, w1=wo, w2=wo, h=ho, units=mm, ntri=400, rot=rot)
    inner(rot) = meander(a=ai, b=bi, w1=wi, w2=wi, h=hi, units=mm, ntri=400, rot=rot)
    center(rot) = meander(a=ac, b=bc, w1=wc, w2=wc, h=hc, units=mm, ntri=400, rot=rot)
    substrate = Layer(width=0.1mm, epsr=2.6)
    foam(w) = Layer(width=w, epsr=1.05)
    rot0 = 0
    strata = [
        Layer()
        outer(rot0)
        substrate
        foam(t1*1mm)
        inner(rot0 - 45)
        substrate
        foam(t2*1mm)
        center(rot0 - 2*45)
        substrate
        foam(t2*1mm)
        inner(rot0 - 3*45)
        substrate
        foam(t1*1mm)
        outer(rot0 - 4*45)
        substrate
        Layer() ]
    steering = (θ=0, ϕ=0)
    flist = 11:0.25:19
    results = analyze(strata, flist, steering, showprogress=false)
    s11rr, s21ll, ar11db, ar21db = eachcol(extract_result(results,
        @outputs s11db(R,R) s21db(L,L) ar11db(R) ar21db(L)))
    RL = -s11rr
    IL = -s21ll
    obj = maximum(vcat(RL,IL,ar11db,ar21db))
    returnval[1] = obj
    return (true, true, returnval)
end
# This blackbox takes as input a file containing the coordinates of the point you want to evaluate...
s = ArgParseSettings()
@add_arg_table s begin
    "filename"
        required = true
end
parsed_args = parse_args(ARGS, s)
input_values = begin
    open(parsed_args["filename"], "r") do file
        line = readline(file)
        # split on any whitespace so repeated spaces do not produce empty tokens
        [parse(Float64, elt) for elt in split(line)]
    end
end
# ... and returns on the standard output the outputs of the blackbox...
bb_outputs = bb(input_values)
for elt in bb_outputs[3]
    print(elt)
    print(" ")
end
println()
# ...with an exit code indicating whether the evaluation succeeded.
if bb_outputs[1]
    exit(0)
else
    exit(1)
end
In the same folder where
There exist other parameters, and you can get a history of your execution by adding them (see https://nomad-4-user-guide.readthedocs.io/en/latest/Appendix.html). After typing the following command
This is not a silver bullet, but I hope it will temporarily help you. |
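The blackbox script above reads one line of space-separated coordinates from a file and prints its outputs on one line. As a rough sketch of that exchange, a round trip through a temporary file could look like the following. The helper names `write_point` and `read_point` are made up for illustration and are not part of NOMAD or NOMAD.jl.

```julia
# Hypothetical helpers mirroring the file format the blackbox script reads:
# a single line of space-separated floating-point values.
function write_point(path, x)
    open(path, "w") do io
        println(io, join(x, " "))
    end
    return path
end

function read_point(path)
    line = open(readline, path)
    # split on any whitespace, as in the blackbox script's parsing step
    return [parse(Float64, s) for s in split(line)]
end

# Round trip through a temporary file.
path, io = mktemp()
close(io)
x = [4.25, 0.225, 2.05]
write_point(path, x)
println(read_point(path) == x)
rm(path)
```

Julia prints `Float64` values in a form that parses back to the same number, which is why the comparison can be exact here.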
Thanks for looking at this and providing a workaround. I'm glad you were able to confirm the issue, as this is the first time I've ever used multithreading in any language and I suspected that the problem might be with my use of multithreading. I've used ArgParse before and I know that it adds its own substantial overhead to the already significant startup time for Julia. These two considerations, plus the need to modify the source and create the parameters file, make your workaround less attractive to me than simply running the original code in a single-threaded Julia session. I plan on doing this until the threading interface settles down and NOMAD and Julia can multithread harmoniously. Thanks again for the significant effort you put into looking at this. |
Not exactly this, but for me nomad.jl segfaults when doing allocation in parallel loops ( |
@SobhanMP Thanks for the workaround tip. Perhaps the maintainers of NOMAD may have other ideas, but it sounds like you might be able to provide a simple minimal working example (MWE) of how using @threads causes a segfault with NOMAD. If so, would you consider posting it here? I was unable to generate a MWE. It may help the maintainers with debugging the problem. |
using Base.Threads
using NOMAD
using LinearAlgebra
n = 5
A = randn(n, n)
function f(x)
    y = fill(0.0, nthreads())
    @threads for i in eachindex(x)
        for j in 1:100
            g = rand(10, 10)
        end
        y[threadid()] = x[i]
    end
    (true, true, [sum(y)])
end
pb = NomadProblem(n,
                  1,
                  ["OBJ"],
                  f;
                  upper_bound=[100.0 for _ in 1:n],
                  lower_bound=[0.0 for _ in 1:n])
pb.options.max_bb_eval = 3
result = @time NOMAD.solve(pb, rand(n))
display(result)
gives
when run with
Commenting out the loop in lines 10-12, which causes allocation and garbage collection (I suspect this is the culprit), makes it work. I'm using Julia 1.6.1 from the Julia website on Gentoo! |
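Since allocation inside the threaded loop is the suspected trigger, one way to probe that hypothesis is a variant of `f` whose loop body does not allocate. The following is only a sketch: `f_prealloc` and `bufs` are hypothetical names, the code does not call NOMAD, and nothing in this thread confirms that it avoids the segfault. It also writes one result slot per index instead of indexing by `threadid()`, so the returned sum differs from the example above.

```julia
using Base.Threads
using Random

# Hypothetical variant of f: the scratch matrices are allocated once,
# outside the threaded loop, and refilled in place with rand!.
function f_prealloc(x, bufs)
    y = zeros(length(x))
    @threads for i in eachindex(x)
        g = bufs[i]              # one buffer per index, so tasks never share one
        for j in 1:100
            rand!(g)             # in-place fill, no new array is created
        end
        y[i] = x[i]              # one slot per index, no threadid() lookup
    end
    return (true, true, [sum(y)])
end

x = [1.0, 2.0, 3.0]
bufs = [Matrix{Float64}(undef, 10, 10) for _ in eachindex(x)]
println(f_prealloc(x, bufs))
```

To hand such a function to `NomadProblem`, it would need to be wrapped in a one-argument closure such as `x -> f_prealloc(x, bufs)`.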
@SobhanMP thanks for posting your example. Hopefully this will be easier for the developers to test and debug than my case. |
this should be closed, it's no longer an issue with julia/version-1.8 branch |
I'm still seeing the same behavior for my application, both on 1.7.3 and 1.8-rc1: Works fine with Julia multithreading under Windows, but segfaults on Manjaro if I start Julia with more than a single thread. |
the MWE I had no longer breaks down using the git version (the branch version-1.8, not 1.8-rc1)
my bad, seems like it just takes a bit longer to segfault |
@simonp0420 can you try my fork of NOMAD.jl (you can dev it by removing NOMAD.jl and running
|
It's working! So far it's completed four evaluations of the objective function, running with 8 threads! I will continue to let it run and report back tomorrow. But things are looking very good so far :-) |
: ) |
Still running (in the end game, I think). It has run without error for over 17 hours, with more than 3000 evaluations of the objective function. Congratulations! Great work! 👏 👏 👏 |
Fixed by PR 59. Many thanks! |
Firstly, thanks for making this great optimizer available to Julia users!
I have an expensive objective function that takes about 20 seconds to evaluate with threading enabled in Julia. When I try to optimize with NOMAD on Manjaro Linux, starting Julia with -t2, -t3, etc., on my 8-core machine, I get the following error (-t1 works fine, though slowly):
This error does not occur on my Windows machine. Here is my configuration:
I'm using NOMAD.jl v. 2.1.0.
Actually, from looking at the information my objective function writes out, it looks like the segfault is occurring in the objective function, presumably when the first Threads.@threads statement is encountered. But, as I noted previously, this error doesn't occur on my Windows machine, where I'm using 8 threads.