Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libjulia: update to 1.8.2 and latest 1.9-DEV #5506

Merged
merged 2 commits into from
Oct 30, 2022

Conversation

fingolfin
Copy link
Member

I am not sure if the ABI for 1.8 changed since we made that last
snapshot, but it sure seems sensible to update to a stable release.
Since we are already at 1.8.1 by now, I've picked that.

Also update 1.9-DEV to a fresh release, which needs LLVM 14.

@fingolfin fingolfin force-pushed the mh/libjulia branch 2 times, most recently from bd9e090 to 524d478 Compare September 15, 2022 23:05
@fingolfin
Copy link
Member Author

fingolfin commented Sep 16, 2022

There are errors with the Julia 1.9 snapshot on ARM with glibc. They are in code from JuliaLang/julia#45110 (see this commit) by @vtjnash . On a cursory check it seems to me that perhaps the preprocessor logic in stackwalk.c and julia_internal.h might not match up completely?

This is the error:

[23:16:44] /workspace/srcdir/julia/src/stackwalk.c: In function 'jl_rec_backtrace':
[23:16:44] /workspace/srcdir/julia/src/stackwalk.c:903:24: error: 'bt_context_t {aka struct unw_tdep_context}' has no member named 'uc_mcontext'
[23:16:44]      mcontext_t *mc = &c.uc_mcontext;
[23:16:44]                         ^

So it is in this code in stackwalk.c:

...
#elif defined(JL_HAVE_UNW_CONTEXT)
    context = &t->ctx.ctx;
#elif defined(JL_HAVE_UCONTEXT)
    context = jl_to_bt_context(&t->ctx.ctx);
#elif defined(JL_HAVE_ASM)
    bt_context_t c;
    memset(&c, 0, sizeof(c));
 #if defined(_OS_LINUX_) && defined(__GLIBC__)
    __jmp_buf *mctx = &t->ctx.ctx.uc_mcontext->__jmpbuf;
    mcontext_t *mc = &c.uc_mcontext;  // <- this is line 903 with the error
...

So this is GLIBC and linux, and also ARM (which is explicitly handled farther down), but the definition of bt_context_t is apparently not what the code expects here?

@giordano
Copy link
Member

For the record, that was reported in JuliaLang/julia#45400 (comment) (and this is what's currently preventing us from using -Werror in CI for aarch64 Linux)

@fingolfin
Copy link
Member Author

@giordano It may be related, but here it is worse: in the comment you link to, we see a warning about a mismatching type; but here it is an error because something "has no member named" ?

@fingolfin
Copy link
Member Author

When JL_HAVE_UCONTEXT is defined, the problematic code in jl_rec_backtrace invokes jl_to_bt_context, which looks like this:
From signals-unix.c

// helper function for returning the unw_context_t inside a ucontext_t
// (also used by stackwalk.c)
bt_context_t *jl_to_bt_context(void *sigctx)
{
#ifdef __APPLE__
    return (bt_context_t*)&((ucontext64_t*)sigctx)->uc_mcontext64->__ss;
#elif defined(_CPU_ARM_)
    // libunwind does not use `ucontext_t` on ARM.
    // `unw_context_t` is a struct of 16 `unsigned long` which should
    // have the same layout as the `arm_r0` to `arm_pc` fields in `sigcontext`
    ucontext_t *ctx = (ucontext_t*)sigctx;
    return (bt_context_t*)&ctx->uc_mcontext.arm_r0;
#else
    return (bt_context_t*)sigctx;
#endif
}

(What that code comment says matches with what I see in the libunwind.h headers.)

However, despite its name, JL_HAVE_UCONTEXT seems to only be defined on Windows (!?). This used to be different but was changed in 5327824ec8c452410e2a2f755921764df9344855 also by @vtjnash . (Assuming that change was correct, that would suggest that perhaps renaming JL_HAVE_UCONTEXT might be a good idea -- as it is, there is some dead code in task.c which is for cases when JL_HAVE_UCONTEXT is defined but _OS_WINDOWS_ is not, which as far as I can tell is currently never the case? But perhaps it is kept around for debugging?

Anyway, also in jl_rec_backtrace the code guarded behind defined(JL_HAVE_UCONTEXT is thus effectively dead because defined(_OS_WINDOWS_) is checked first.

Anyway, all in all I am now confused why this code would build on any Linux + glibc + arm (32bit) system? What's the difference in our build here that makes it fail?

@giordano
Copy link
Member

giordano commented Sep 19, 2022

You're assuming Julia v1.8 works on 32-bit arm, we as far as I can see in https://julialang.org/downloads/#current_stable_release we don't have a build for that version. Last version to be built for armv7l is 1.7.3: https://julialang.org/downloads/oldreleases/

@fingolfin
Copy link
Member Author

fingolfin commented Sep 19, 2022

Well, 1.8.1 does seem to build fine, at least here... But yeah, 1.9-DEV is broken and I kinda assumed that it "should work", and that confused me a lot. But if you are telling me that this is not actually being tested actively, well that explains quite a bit :-).

I guess I could just disable ARM 32 bit support for Julia 1.9 for now? Not great but I don't think I'll have time to debug this issue in Julia... Though I could at least open an issue for it on the main Julia repo (should I?)

@fingolfin
Copy link
Member Author

Oh and Julia 1.9-DEV also fails to compile on aarch64-linux-gnu ... it is fine with musl though, and other Julia versions and platforms...

[23:16:21] /workspace/srcdir/julia/src/stackwalk.c:959:5: error: unknown type name 'unw_fpsimd_context_t'; did you mean 'unw_tdep_context_t'?
[23:16:21]      unw_fpsimd_context_t *mcfp = (unw_fpsimd_context_t*)&mc->__reserved;
[23:16:21]      ^~~~~~~~~~~~~~~~~~~~
[23:16:21]      unw_tdep_context_t
[23:16:21] /workspace/srcdir/julia/src/stackwalk.c:959:35: error: 'unw_fpsimd_context_t' undeclared (first use in this function); did you mean 'unw_tdep_context_t'?
[23:16:21]      unw_fpsimd_context_t *mcfp = (unw_fpsimd_context_t*)&mc->__reserved;
[23:16:21]                                    ^~~~~~~~~~~~~~~~~~~~
[23:16:21]                                    unw_tdep_context_t
[23:16:21] /workspace/srcdir/julia/src/stackwalk.c:959:35: note: each undeclared identifier is reported only once for each function it appears in

@fingolfin fingolfin changed the title libjulia: update to 1.8.1 and latest 1.9-DEV libjulia: update to 1.8.2 and latest 1.9-DEV Oct 25, 2022
@fingolfin
Copy link
Member Author

I've rebased this now to use Julia 1.8.2, and the latest 1.9-DEV.

It would be really good to finish this somehow. If aarch64-linux-gnu keeps failing, should I just disable it for 1.9?

@giordano
Copy link
Member

Might be an ok-ish solution for the time being, but we need to sort this out before 1.9 is out. Is there an open issue already?

@fingolfin
Copy link
Member Author

Huh, I could have sworn I filed an issue about it, but I don't see one now -- and I assume I would have linked here, but GitHub shows no such link, so... I probably never filed it :-(.

I'll wait for the results and if it still fails, will file that issue.

@fingolfin
Copy link
Member Author

Huh, it seems to be building llvm
from scratch in 1.9?!? A build system regression?!?

(also seems it needs a different libuv. Will look more closely when I am back at a computer, not a smartphone)

@fingolfin
Copy link
Member Author

Julia 1.9 builds fail on all platforms with

/workspace/srcdir/julia/src/gc.c:3676: undefined reference to `uv_get_available_memory'

Turns out the builds here end up loading LibUV_jll v2.0.1+5 but the Julia master branch references v2.0.1+11 . Now despite the innocent identical version differing only in the build number, these are built from completely different git commits of libuv (side note: using the build version like that doesn't strike me as a great approach? Perhaps one should consider to adapt something like the "multiply all version digits by 100 to get some wriggling room for updates" for (some) stdlibs, too?)

@fingolfin
Copy link
Member Author

I tried to deal with it by explicitly using Dependency(get_addable_spec("LibUV_jll", v"2.0.1+11")), and that seems to have helped with a few 1.9 builds. But they still are building their own LLVM, and I don't know what changed in the build system to cause that...

Anyway, this also made me wonder if it's good that we use e.g. USE_SYSTEM_LLVM=1 with current Julia versions -- wouldn't it be easier to just let Julia fetch its own dependencies these days? (Perhaps except for the actual dependencies LibUnwind_jll and LibUV_jll.) I guess it bypasses download caching on the builders?

@benlorenz
Copy link
Contributor

@fingolfin please try merging my wip commit from here: benlorenz@502048c

@fingolfin
Copy link
Member Author

Thanks to @benlorenz for pointing out I had to add USE_SYSTEM_LLD=1

@fingolfin
Copy link
Member Author

Here is the ARM issue JuliaLang/julia#47345

@fingolfin
Copy link
Member Author

I see no immediate way to fix the ARM 32 bit issue (the workaround suggested by @vtjnash (or at least how I understood the suggestion) does not seem to work, with the patch, it crashes for me on a Raspberry Pi 4 -- of course the crash might also be due to another, unrelated issue; one could presumably find out by checking the commit before the breaking commit; I plan to do that when I find some time).

But in the meantime, Julia master just landed another ABI breaking change in JuliaLang/julia#47170 (see oscar-system/GAP.jl#836), so I have some urgent need to update libjulia_jll and then we could update all our other JLLs using it. Unfortunately that means disabling ARM 32 bit for the time being, but I see no other realistic option right now :-(

@giordano giordano merged commit 61cefde into JuliaPackaging:master Oct 30, 2022
@fingolfin fingolfin deleted the mh/libjulia branch October 30, 2022 09:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants