Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix stack pointer retrieval in jl_backtrace_from_here #42585

Merged
merged 29 commits into from
Nov 18, 2021
Merged

Conversation

tkf
Copy link
Member

@tkf tkf commented Oct 10, 2021

It seems jl_backtrace_from_here with the returnsp flag set was broken. This patch adds a few fixes and a test.

Comment on lines 261 to +263
if (returnsp) {
sp_ptr = (uintptr_t*)jl_array_data(sp) + offset;
jl_array_grow_end(sp, maxincr);
sp_ptr = (uintptr_t*)jl_array_data(sp) + offset;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix 1: Take the pointer after grow, to avoid invalidation.

Comment on lines 168 to 171
bt_entry->uintptr = call_ip;
if (sp)
sp[n] = thesp;
n++;
Copy link
Member Author

@tkf tkf Oct 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix 2: set sp[n] where bt_data[n] contains the call IP. It makes filtering out invalid stack pointers easy (see Base._reformat_sp). Previously, it was stored at the beginning of the extended entry.

All calls to jl_unw_stepn pass sp = NULL expect the one from jl_backtrace_from_here. So, I think this is a NFC.

$ git grep 'jl_unw_stepn('
src/stackwalk.c:static int jl_unw_stepn(bt_cursor_t *cursor, jl_bt_element_t *bt_data, size_t *bt_size,
src/stackwalk.c:    jl_unw_stepn(&cursor, bt_data, &bt_size, NULL, maxsize, 0, &pgcstack, 1);
src/stackwalk.c:    jl_unw_stepn(&cursor, bt_data, &bt_size, NULL, maxsize, skip + 1, &pgcstack, 0);
src/stackwalk.c:            have_more_frames = jl_unw_stepn(&cursor, (jl_bt_element_t*)jl_array_data(ip) + offset,
src/stackwalk.c:    jl_unw_stepn(&cursor, bt_data, &bt_size, NULL, maxsize, 0, &pgcstack, 1);

test/backtrace.jl Outdated Show resolved Hide resolved
test/backtrace.jl Outdated Show resolved Hide resolved
test/backtrace.jl Outdated Show resolved Hide resolved
@tkf
Copy link
Member Author

tkf commented Oct 11, 2021

After e4a1edd, tester_linuxaarch64 https://build.julialang.org/#/builders/7/builds/4452/steps/5/logs/stdio and tester_linux32 https://build.julialang.org/#/builders/55/builds/4450/steps/5/logs/stdio now complain

src/ccall.cpp:888: jl_cgval_t emit_llvmcall(jl_codectx_t&, jl_value_t**, size_t): Assertion `*it == f->getFunctionType()->getParamType(i)' failed.

from test/backtrace.jl

julia: /buildworker/worker/package_linuxaarch64/build/src/ccall.cpp:888: jl_cgval_t emit_llvmcall(jl_codectx_t&, jl_value_t**, size_t): Assertion `*it == f->getFunctionType()->getParamType(i)' failed.

signal (6): Aborted
in expression starting at /buildworker/worker/tester_linuxaarch64/build/share/julia/test/backtrace.jl:314

I'll just roll it back for now.

base/error.jl Outdated Show resolved Hide resolved
base/error.jl Outdated Show resolved Hide resolved
test/backtrace.jl Outdated Show resolved Hide resolved
@tkf
Copy link
Member Author

tkf commented Oct 14, 2021

Hmm... The failure in win32 seems to be a real one: https://build.julialang.org/#/builders/9/builds/4162/steps/5/logs/stdio

Test Failed at C:\buildbot\worker-tabularasa\tester_win32\build\share\julia\test\backtrace.jl:331
  Expression: ptr2 < sp[1] < ptr1
   Evaluated: Ptr{Nothing} @0x2a07e1b8 < Ptr{Nothing} @0x2a07e1b0 < Ptr{Nothing} @0x2a07e228

i.e., ptr2 < sp[1] doesn't hold because 0x2a07e1b8 - 0x2a07e1b0 == 8 > 0. But I have no idea why...

As a reference, the version without llvm.frameaddress actually worked in win32 (https://build.julialang.org/#/builders/90/builds/4644). Not sure if this is something fundamental or just that it's a heisenbug.

@tkf tkf removed the merge me PR is reviewed. Merge when all tests are passing label Nov 14, 2021
@tkf
Copy link
Member Author

tkf commented Nov 14, 2021

@vtjnash Don't we need something like 3f8868a to make it work on all platforms?

I guess the innermost noinlinecall is redundant though.

@tkf
Copy link
Member Author

tkf commented Nov 14, 2021

The failures in FreeBSD seem irrelevant https://build.julialang.org/#/builders/78/builds/5229. It passes the backtrace test.

@vtjnash
Copy link
Member

vtjnash commented Nov 15, 2021

I thought 3f8868a was simply needed if we want to ensure we the test would pass even if we were getting the wrong answer from the intermediate functions. Though either answer seems almost correct for what is expected of the stack pointer.

@tkf
Copy link
Member Author

tkf commented Nov 15, 2021

I was getting the failure in all platforms without 3f8868a.

In linux64 https://build.julialang.org/#/builders/69/builds/6083/steps/5/logs/stdio

Test Failed at /buildworker/worker/tester_linux64/build/share/julia/test/backtrace.jl:342
  Expression: ptr2 < sp[2] < ptr1
   Evaluated: Ptr{Nothing} @0x00007f6991e99f80 < Ptr{Nothing} @0x00007f6991e99ff0 < Ptr{Nothing} @0x00007f6991e99fe0

In win64 https://build.julialang.org/#/builders/65/builds/5392

Test Failed at C:\buildbot\worker-tabularasa\tester_win64\build\share\julia\test\backtrace.jl:342
  Expression: ptr2 < sp[2] < ptr1
   Evaluated: Ptr{Nothing} @0x0000000063c6cb20 < Ptr{Nothing} @0x0000000063c6cbe0 < Ptr{Nothing} @0x0000000063c6cbd0

In win32 https://build.julialang.org/#/builders/9/builds/5488

Test Failed at C:\buildbot\worker-tabularasa\tester_win32\build\share\julia\test\backtrace.jl:342
  Expression: ptr2 < sp[2] < ptr1
   Evaluated: Ptr{Nothing} @0x6b57e228 < Ptr{Nothing} @0x6b57e2a0 < Ptr{Nothing} @0x6b57e298

In these cases, ptr2 < sp[2] holds but sp[2] < ptr1 does not.

Since jl_backtrace_from_here seems to try to skip its own frame, I thought I'd makes sense (at least in non-Windows?) to insert the outer noinlinecall since sp[2] will be the frame in which we call @llvm.frameaddress. (Though I don't understand why we see sp[2] > ptr1... Is it possible that the definition of frameaddress is different in LLVM and libunwind...?)

Is it OK to pull this as-is or do you want to investigate this further?

@vtjnash
Copy link
Member

vtjnash commented Nov 15, 2021

Perhaps it would be ptr2 < sp[2] && sp[1] < ptr1? I am not sure, and am okay with this as an improvement either way

@tkf
Copy link
Member Author

tkf commented Nov 15, 2021

Ah, good point. Let's see if it works.

@tkf
Copy link
Member Author

tkf commented Nov 16, 2021

Hmm... win64 failed but I don't think if it's relevant https://build.julialang.org/#/builders/65/builds/5439 (but re-starting it to be sure)

FreeBSD failure seems irrelevant.

@vtjnash vtjnash added the merge me PR is reviewed. Merge when all tests are passing label Nov 16, 2021
@DilumAluthge
Copy link
Member

If you rebase this PR branch on the latest master, that will fix the FreeBSD test failures.

Also, I noticed that the master branch of your fork is behind the master branch of JuliaLang/julia. I'd recommend fast-forwarding the master branch of your fork so that it is up-to-date with the master branch of JuliaLang/julia.

@tkf
Copy link
Member Author

tkf commented Nov 16, 2021

Thanks, I didn't realize that this branch is rather behind the master. I just merged it.

I always start a branch from whatever the master of JuliaLang/julia is. So I don't think master of my GitHub fork is relevant here?

@DilumAluthge
Copy link
Member

I always start a branch from whatever the master of JuliaLang/julia is. So I don't think master of my GitHub fork is relevant here?

Ah, I thought you were starting feature branches off of the master of your fork. If not, yeah the master of your fork is not relevant.

@tkf tkf merged commit 455236e into JuliaLang:master Nov 18, 2021
@tkf tkf deleted the fix-sp branch November 18, 2021 02:18
@tkf tkf removed the merge me PR is reviewed. Merge when all tests are passing label Nov 18, 2021
KristofferC pushed a commit that referenced this pull request Nov 19, 2021
Co-authored-by: Jameson Nash <[email protected]>
(cherry picked from commit 455236e)
KristofferC pushed a commit that referenced this pull request Nov 26, 2021
Co-authored-by: Jameson Nash <[email protected]>
(cherry picked from commit 455236e)
@KristofferC
Copy link
Member

@tkf, the tests on #43084 fails due to this with LoadError: f(Ptr{Cvoid}(sp)) is not a function expression. Due you have a version of the tests which is backportable?

tkf added a commit to tkf/julia that referenced this pull request Nov 26, 2021
KristofferC pushed a commit that referenced this pull request Nov 26, 2021
…) (#43241)

* Fix stack pointer retrieval in jl_backtrace_from_here (#42585)

Co-authored-by: Jameson Nash <[email protected]>
(cherry picked from commit 455236e)

* Avoid using call-site `@noinline`
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants