-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect and warn on stack overflow #6928
Conversation
src/signal.cr
Outdated
# amount larger than a typical stack frame. | ||
stack_top = Pointer(Void).new([email protected] - 4096) | ||
if stack_top <= addr < Fiber.current.@stack_bottom | ||
LibC.dprintf 2, "Potential stack overflow (e.g., infinite or very deep recursion)\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just say "Stack overflow" (with or without that infinite/recursion hint), but I would totally remove the "Invalid memory access" message from above. The user should see just one of those messages, not both. And even if it's a potential stack overflow but not certainly a stack overflow, I think the changes of that a super low so I'd rather have the user see "Stack overflow" and not confuse him with potential things (i.e. "Potential stack overflow? Well, maybe that's not it...").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, I would raise a StackOverflowError
when this happen. Right now it's a bit slow to show the full trace for it, but we should improve that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea, but I don't think we can raise an exception from the SIGSEGV handler, because it runs on its own altstack, not the stack that segfaulted (it's full).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that's true... When I tried it I watched it print "Unhandled exception: ...", but I can't seem to rescue
it. So printing an error and exiting is fine. After all, a stack overflow is probably a bug in the app and it usually happens when you start it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgive my overly cautious error messages. I'll go out on a limb and claim (with certainty!) a stack overflow.
bc0203c
to
e60aefd
Compare
src/signal.cr
Outdated
stack_top = Pointer(Void).new([email protected] - 4096) | ||
|
||
if stack_top <= addr < Fiber.current.@stack_bottom | ||
LibC.dprintf 2, "Stack overflow (invalid memory access at address 0x%lx)\n", addr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps expand Stack overflow part into Stack overflow: infinite or too deep recursion (copied from previous PR by @ysbaddaden)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was there originally (edited a bit) but I took @asterite's comment
I would just say "Stack overflow" (with or without that infinite/recursion hint)
as subtext that he felt the hint was superfluous. Otherwise, why bring up "with or without"? So I nixed it. I can revert to the original language if there is a general preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, I wouldn't mix "stack ovefflow" with "invalid memory access" in the same message. Most developers don't know or don't care about this fact. Just "Stack overflow" is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. In @ysbaddaden 's original work he also reported the address of the seg fault being handled, even when it was a stack overflow. Easy enough to not do this and shorten the error message yet further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, don't listen to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. ;-)
But your original point is very well taken: don't bury the main point (stack overflow) on the second line of the error message. That's fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try to settle on something, then let's change the message once. My personal opinion:
- I'd keep the
infinite or too deep recursion
hint. It gives an explanation for the bluntstack overflow
, which otherwise can be a little intimidating when not familiar with it. - I'd drop the
invalid memory access at 0x...
since it's 99% likely to be a stack overflow, and debugging with gdb/lldb will give access to the actual address for the remaining 1%.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consensus seemed to be reached. The error message has the hint, but no mention of the memory address.
stack_size = rlim.rlim_cur | ||
@stack = Pointer(Void).new(@stack_bottom.address - stack_size) | ||
end | ||
{% end %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should work for the main thread, but is it correct for created threads? I don't know what's the default stack size for them. Maybe we should use pthread_getattr_np(3) and pthread_attr_getstacksize(3) for created thread?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not worry about that for now until after threaded GC works. Otherwise it's stalling the PR. I'd like this in 0.27.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but maybe we can add a comment about it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, a TODO:
would be nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO's with notes added.
b00a070
to
e578379
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code is good, just one more simple spec.
spec/std/kernel_spec.cr
Outdated
@@ -220,3 +220,22 @@ describe "at_exit" do | |||
OUTPUT | |||
end | |||
end | |||
|
|||
describe "seg fault" do | |||
it "detects stack overflow" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about a spec which stack overflows the main stack, not just a fiber stack?
How long does this spec run for (time)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each spec in kernel_spec.cr
costs about 3 seconds apiece to run for a total of 36 seconds on my laptop. The one I added is no different. So I was leery about adding too many of these.
In my actual testing on the various platforms I used the following:
- Random pointer deref does not trigger stack overflow
- Trivial infinite recursion on main stack triggers stack overflow
- Infinite recursion on main stack with modest stack allocations triggers stack overflow
- Trivial infinite recursion on fiber stack triggers stack overflow (this became the official spec)
- Infinite recursion on fiber with modest stack allocation triggers stack overflow
All of these take about the same amount of time, except on one of the BSDs (freebsd, I think) the default stack size is gargantuan and ulimit
didn't seem to work to restrict it. So testing for the main stack took an inordinate amount of time (a minute or two). This led to the choice of using the fibre stack for the official suite if there was going to be just one spec.
Please let me know which of these you'd like me to add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@damaxwell I'd go for 1, 3 and 5. Speeding up the stack overflow by allocating a 512 byte staticarray should help the speed of the test on the BSDs, and testing main and fiber stacks is a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new specs are added, @RX14. It was FreeBSD that seems to have a default stack size of 0.5G, but ulimit
works just fine to limit it; not sure what I had done wrong. It takes 2 minutes (in a VM) to exhaust a stack that big. I left a comment in the spec to note the potential trouble and remedy.
e578379
to
28da6f6
Compare
This spec fails systematically on Ubuntu 14.04 for me when running
|
Thanks. It was tested under Ubuntu 16.04 and passes there. I'll get 14.04 up and running and see what's different. |
@ysbaddaden: the failure happens because gmake 3.81 sets the stack rlimit to RLIM_INFINITY=-1. So our computation of the stack top is wrong. Later versions of gmake reset the stack rlimit to something sane before forking out child processes; see this discussion. I'm looking into a fix. |
@damaxwell @ysbaddaden Maybe a bit late, but thank you so much for this! Segfaults that are stack overflows are probably the most common errors reported here, and now that traffic will probably be reduced a lot. |
This PR fleshes out PoC #6376.
Closes #271.