Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIGINT crash during graceful shutdown with memory storage on macOS Mojave #31380

Closed
willbuckner opened this issue Oct 15, 2018 · 6 comments · Fixed by #31516
Closed

SIGINT crash during graceful shutdown with memory storage on macOS Mojave #31380

willbuckner opened this issue Oct 15, 2018 · 6 comments · Fixed by #31516
Assignees
Labels
A-build-system B-os-macos Issues specific to macOS. C-investigation Further steps needed to qualify. C-label will change. S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting.

Comments

@willbuckner
Copy link

When running cockroach start --store=type=mem,size=30% ..., pressing Ctrl+C/sending SIGINT results in a crash during shutdown. The log below contains both stdout and stderr output.

cockroach.log

Please describe the issue you observed, and any steps we can take to reproduce it:

To Reproduce

^C
*@spacecraft~⦕ tail cockroach.log
r11    0x213
r12    0x0
r13    0x3
r14    0x33
r15    0x0
rip    0x405ed1f
rflags 0x213
cs     0x2b
fs     0x0
gs     0x0
*@spacecraft~⦕

What did you do? Describe in your own words.

I started cockroach 2.0.5 using in-memory storage, and it crashes on SIGINT.

Expected behavior
Not crashing--graceful shutdown.

Additional data / screenshots
cockroach.log

Note that it crashes regardless of any SQL being run.

Environment:

*@spacecraft~/go/src/my_project⦕ cockroach version
Build Tag:    v2.0.5
Build Time:   2018/08/13 20:55:02
Distribution: CCL
Platform:     darwin amd64 (x86_64-apple-darwin17.7.0)
Go Version:   go1.10.3
C Compiler:   4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)
Build SHA-1:  3f2f0f5eea9b9d552e471eba8c37504a0595342f
Build Type:   development

*@spacecraft~⦕ uname -v
Darwin Kernel Version 18.0.0: Wed Aug 22 20:13:40 PDT 2018; root:xnu-4903.201.2~1/RELEASE_X86_64

Installed from brew install cockroach on macOS Mojave.

Additional context

@tbg
Copy link
Member

tbg commented Oct 15, 2018

Thanks @willbuckner. I've been running into something similar today, though for me it immediately crashes the process when it starts.

@benesch look what this is:

I181015 17:34:35.167704 574 server/updates.go:233  [n1] A new version is available: 2.0.6, details: https://www.cockroachlabs.com/docs/releases/v2.0.6.html
SIGTRAP: trace trap
PC=0x405ed1f m=0 sigcode=1

goroutine 0 [idle]:
invalid spdelta runtime.sigtramp 0x405ece0 0x405ed1f 0xea402 -1
runtime: unexpected return pc for runtime.sigtramp called from 0x500
stack: frame={sp:0xc420009a88, fp:0xc420009a8f} stack=[0x7ffeefb7edc0,0x7ffeefbfe240)

runtime.sigtramp(0xc420009ee000, 0xc420009f4800, 0x1e00, 0xc420009f4800, 0xc420009ab000, 0x0, 0x0, 0x0, 0x0, 0x300, ...)
	?:0 +0x3f

goroutine 53 [syscall]:
runtime.notetsleepg(0x7375460, 0xae3d6a8, 0x0)
	/usr/local/Cellar/go/1.10.3/libexec/src/runtime/lock_sema.go:280 +0x4b fp=0xc420489f60 sp=0xc420489f20 pc=0x4013b1b
runtime.timerproc(0x7375440)
	/usr/local/Cellar/go/1.10.3/libexec/src/runtime/time.go:261 +0x2e7 fp=0xc420489fd8 sp=0xc420489f60 pc=0x404ddc7
runtime.goexit()
	/usr/local/Cellar/go/1.10.3/libexec/src/runtime/asm_amd64.s:2361 +0x1 fp=0xc420489fe0 sp=0xc420489fd8 pc=0x405d9a1
created by runtime.(*timersBucket).addtimerLocked
	/usr/local/Cellar/go/1.10.3/libexec/src/runtime/time.go:160 +0x107

@tbg tbg self-assigned this Oct 15, 2018
@tbg
Copy link
Member

tbg commented Oct 15, 2018

Ok, I think I'm closing in on a small repro (not for this exact crash, but for the one I get locally via gopsutil which I hope amounts to the same thing at the end). I've trimmed down pkg/cmd/cockroach to the following. If I comment out the ccl import, the crash goes away. Looking there now.

package main

// NB: the C import doesn't make a difference, I just have it here so that
// I don't accidentally drop out of cgo mode.

// static int foo = 0;
import "C"

import (
	"context"

	_ "github.com/cockroachdb/cockroach/pkg/ccl" // ccl init hooks
	// _ "github.com/cockroachdb/cockroach/pkg/ccl/cliccl" // cliccl init hooks
	// _ "github.com/cockroachdb/cockroach/pkg/ui/distccl" // ccl web UI init hook
	gopsnet "github.com/shirou/gopsutil/net"
)

func main() {
	_, _ = gopsnet.IOCountersWithContext(context.TODO(), true)
}

@benesch
Copy link
Contributor

benesch commented Oct 15, 2018

Amazing! This comes to mind:

engine.SetRocksDBOpenHook(C.DBOpenHookCCL)

@tbg
Copy link
Member

tbg commented Oct 15, 2018

Yeah, that's what I tracked it down to! Currently trimming the fat.

@knz knz added S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting. C-investigation Further steps needed to qualify. C-label will change. labels Oct 15, 2018
@tbg
Copy link
Member

tbg commented Oct 15, 2018

No conclusion here yet but I doubt @benesch can resist taking this down tomorrow.

The last I know is that the crash only happens when CryptoPP:UsesAESNI is referenced (not called - referenced).

Essentially the repro boils down to

func main() {
	_ = C.UsesAESNI

	cmd := exec.Command("echo")
	cmd.Run()
}

Very 🍿 to learn what this is caused by.

@tbg tbg assigned benesch and unassigned tbg Oct 15, 2018
@tbg
Copy link
Member

tbg commented Oct 15, 2018

Branch is here: https://github.com/tschottdorf/cockroach/tree/delme-repro

make build && ./cockroach repros (not on the latest commit, that's an attempt at cutting out more code but I couldn't get it to build).

@tbg tbg added the B-os-macos Issues specific to macOS. label Oct 16, 2018
benesch added a commit to cockroachdb/cryptopp that referenced this issue Oct 16, 2018
As part of its CPU feature detection, CryptoPP installs a SIGILL signal
handler before issuing the cpuid instruction. The intent is to
gracefully degrade on CPUs that don't support the cpuid instruction.

The problem is that it is impossible to safely overwrite a signal
handler installed by the Go runtime in go1.10 on macOS
(golang/go#22805). This causes CockroachDB 2.0 to crash on macOS Mojave:
cockroachdb/cockroach#31380.

The situation has improved on the Go front, as go1.11 makes it possible
to safely save and restore signal handlers installed by the Go runtime
on macOS.

Still, we can do better and support go1.10. There is no need to bother
installing a SIGILL handler, as the cpuid instruction is supported by
every x86-64 CPU. We can instead use conditional compilation to make
sure that we never execute a cpuid instruction on a non x86-64 CPU.

Note that CPU feature detection is performed at executable load time
(see the attribute(constructor) on DetectX86Features); therefore any
reference to function which calls DetectX86Features (notably HasAESNI)
corrupts the signal handler. It's not entirely clear why this corruption
later leads to the SIGTRAP seen in cockroachdb/cockroach#31380--is
something in macOS or the Go runtime generating a SIGILL and trying to
handle it gracefully?--but regardless, not mucking with the signal
handler fixes the issue.
benesch added a commit to cockroachdb/cryptopp that referenced this issue Oct 16, 2018
As part of its CPU feature detection, CryptoPP installs a SIGILL signal
handler before issuing the cpuid instruction. The intent is to
gracefully degrade on CPUs that don't support the cpuid instruction.

The problem is that it is impossible to safely overwrite a signal
handler installed by the Go runtime in go1.10 on macOS
(golang/go#22805). This causes CockroachDB 2.0 to crash on macOS Mojave:
cockroachdb/cockroach#31380.

The situation has improved on the Go front, as go1.11 makes it possible
to safely save and restore signal handlers installed by the Go runtime
on macOS.

Still, we can do better and support go1.10. There is no need to bother
installing a SIGILL handler, as the cpuid instruction is supported by
every x86-64 CPU. We can instead use conditional compilation to make
sure that we never execute a cpuid instruction on a non x86-64 CPU.

Note that CPU feature detection is performed at executable load time
(see the attribute(constructor) on DetectX86Features); therefore any
reference to function which calls DetectX86Features (notably HasAESNI)
corrupts the signal handler. It's not entirely clear why this corruption
later leads to the SIGTRAP seen in cockroachdb/cockroach#31380--is
something in macOS or the Go runtime generating a SIGILL and trying to
handle it gracefully?--but regardless, not mucking with the signal
handler fixes the issue.
benesch added a commit to benesch/cockroach that referenced this issue Oct 16, 2018
Bump CryptoPP to pick up a fix for cockroachdb#31380.
Details reproduced below.

Fix cockroachdb#31380.

---

As part of its CPU feature detection, CryptoPP installs a SIGILL signal
handler before issuing the cpuid instruction. The intent is to
gracefully degrade on CPUs that don't support the cpuid instruction.

The problem is that it is impossible to safely overwrite a signal
handler installed by the Go runtime in go1.10 on macOS
(golang/go#22805). This causes CockroachDB 2.0 to crash on macOS Mojave:
cockroachdb#31380.

The situation has improved on the Go front, as go1.11 makes it possible
to safely save and restore signal handlers installed by the Go runtime
on macOS.

Still, we can do better and support go1.10. There is no need to bother
installing a SIGILL handler, as the cpuid instruction is supported by
every x86-64 CPU. We can instead use conditional compilation to make
sure that we never execute a cpuid instruction on a non x86-64 CPU.

Note that CPU feature detection is performed at executable load time
(see the attribute(constructor) on DetectX86Features); therefore any
reference to function which calls DetectX86Features (notably HasAESNI)
corrupts the signal handler. It's not entirely clear why this corruption
later leads to the SIGTRAP seen in cockroachdb#31380--is
something in macOS or the Go runtime generating a SIGILL and trying to
handle it gracefully?--but regardless, not mucking with the signal
handler fixes the issue.

Release note (bug fix): CockroachDB no longer crashes due to a SIGTRAP error
soon after startup on macOS Mojave (cockroachdb#31380).
craig bot pushed a commit that referenced this issue Oct 16, 2018
31516: c-deps: bump CryptoPP to avoid SIGTRAP on macOS r=mberhault a=benesch

Bump CryptoPP to pick up a fix for #31380.
Details reproduced below.

Fix #31380.

---

As part of its CPU feature detection, CryptoPP installs a SIGILL signal
handler before issuing the cpuid instruction. The intent is to
gracefully degrade on CPUs that don't support the cpuid instruction.

The problem is that it is impossible to safely overwrite a signal
handler installed by the Go runtime in go1.10 on macOS
(golang/go#22805). This causes CockroachDB 2.0 to crash on macOS Mojave:
#31380.

The situation has improved on the Go front, as go1.11 makes it possible
to safely save and restore signal handlers installed by the Go runtime
on macOS.

Still, we can do better and support go1.10. There is no need to bother
installing a SIGILL handler, as the cpuid instruction is supported by
every x86-64 CPU. We can instead use conditional compilation to make
sure that we never execute a cpuid instruction on a non x86-64 CPU.

Note that CPU feature detection is performed at executable load time
(see the attribute(constructor) on DetectX86Features); therefore any
reference to function which calls DetectX86Features (notably HasAESNI)
corrupts the signal handler. It's not entirely clear why this corruption
later leads to the SIGTRAP seen in #31380--is
something in macOS or the Go runtime generating a SIGILL and trying to
handle it gracefully?--but regardless, not mucking with the signal
handler fixes the issue.

Release note (bug fix): CockroachDB no longer crashes due to a SIGTRAP error
soon after startup on macOS Mojave (#31380).

31517: roachtest: deflake acceptance/bank/zerosum-splits r=andreimatei a=benesch

This test requires that the experimental_force_split_at session var be
set to force ALTER ... SPLIT AT to work even with the merge queue
enabled. gosql.DB's connection pool will occasionally open a new
connection which does not have the var set. Set the session var in the
same batch of statements as the ALTER ... SPLIT AT command so that the
session var is always set in the session that executes the ALTER ...
SPLIT AT command.

Fix #31510.

Release note: None

Co-authored-by: Nikhil Benesch <[email protected]>
@craig craig bot closed this as completed in #31516 Oct 16, 2018
benesch added a commit to benesch/cockroach that referenced this issue Oct 16, 2018
Bump CryptoPP to pick up a fix for cockroachdb#31380.
Details reproduced below.

Fix cockroachdb#31380.

---

As part of its CPU feature detection, CryptoPP installs a SIGILL signal
handler before issuing the cpuid instruction. The intent is to
gracefully degrade on CPUs that don't support the cpuid instruction.

The problem is that it is impossible to safely overwrite a signal
handler installed by the Go runtime in go1.10 on macOS
(golang/go#22805). This causes CockroachDB 2.0 to crash on macOS Mojave:
cockroachdb#31380.

The situation has improved on the Go front, as go1.11 makes it possible
to safely save and restore signal handlers installed by the Go runtime
on macOS.

Still, we can do better and support go1.10. There is no need to bother
installing a SIGILL handler, as the cpuid instruction is supported by
every x86-64 CPU. We can instead use conditional compilation to make
sure that we never execute a cpuid instruction on a non x86-64 CPU.

Note that CPU feature detection is performed at executable load time
(see the attribute(constructor) on DetectX86Features); therefore any
reference to function which calls DetectX86Features (notably HasAESNI)
corrupts the signal handler. It's not entirely clear why this corruption
later leads to the SIGTRAP seen in cockroachdb#31380--is
something in macOS or the Go runtime generating a SIGILL and trying to
handle it gracefully?--but regardless, not mucking with the signal
handler fixes the issue.

Release note (bug fix): CockroachDB no longer crashes due to a SIGTRAP error
soon after startup on macOS Mojave (cockroachdb#31380).
benesch added a commit to cockroachdb/cryptopp that referenced this issue Oct 16, 2018
As part of its CPU feature detection, CryptoPP installs a SIGILL signal
handler before issuing the cpuid instruction. The intent is to
gracefully degrade on CPUs that don't support the cpuid instruction.

The problem is that it is impossible to safely overwrite a signal
handler installed by the Go runtime in go1.10 on macOS
(golang/go#22805). This causes CockroachDB 2.0 to crash on macOS Mojave:
cockroachdb/cockroach#31380.

The situation has improved on the Go front, as go1.11 makes it possible
to safely save and restore signal handlers installed by the Go runtime
on macOS.

Still, we can do better and support go1.10. There is no need to bother
installing a SIGILL handler, as the cpuid instruction is supported by
every x86-64 CPU. We can instead use conditional compilation to make
sure that we never execute a cpuid instruction on a non x86-64 CPU.

Note that CPU feature detection is performed at executable load time
(see the attribute(constructor) on DetectX86Features); therefore any
reference to function which calls DetectX86Features (notably HasAESNI)
corrupts the signal handler. It's not entirely clear why this corruption
later leads to the SIGTRAP seen in cockroachdb/cockroach#31380--is
something in macOS or the Go runtime generating a SIGILL and trying to
handle it gracefully?--but regardless, not mucking with the signal
handler fixes the issue.
benesch added a commit to benesch/cockroach that referenced this issue Oct 16, 2018
Bump CryptoPP to pick up a fix for cockroachdb#31380.
Details reproduced below.

Fix cockroachdb#31380.

---

As part of its CPU feature detection, CryptoPP installs a SIGILL signal
handler before issuing the cpuid instruction. The intent is to
gracefully degrade on CPUs that don't support the cpuid instruction.

The problem is that it is impossible to safely overwrite a signal
handler installed by the Go runtime in go1.10 on macOS
(golang/go#22805). This causes CockroachDB 2.0 to crash on macOS Mojave:
cockroachdb#31380.

The situation has improved on the Go front, as go1.11 makes it possible
to safely save and restore signal handlers installed by the Go runtime
on macOS.

Still, we can do better and support go1.10. There is no need to bother
installing a SIGILL handler, as the cpuid instruction is supported by
every x86-64 CPU. We can instead use conditional compilation to make
sure that we never execute a cpuid instruction on a non x86-64 CPU.

Note that CPU feature detection is performed at executable load time
(see the attribute(constructor) on DetectX86Features); therefore any
reference to function which calls DetectX86Features (notably HasAESNI)
corrupts the signal handler. It's not entirely clear why this corruption
later leads to the SIGTRAP seen in cockroachdb#31380--is
something in macOS or the Go runtime generating a SIGILL and trying to
handle it gracefully?--but regardless, not mucking with the signal
handler fixes the issue.

Release note (bug fix): CockroachDB no longer crashes due to a SIGTRAP error
soon after startup on macOS Mojave (cockroachdb#31380).

diff --git a/c-deps/cryptopp b/c-deps/cryptopp
index c621ce0532..6d6064445d 160000
--- a/c-deps/cryptopp
+++ b/c-deps/cryptopp
@@ -1 +1 @@
-Subproject commit c621ce053298fafc1e59191079c33acd76045c26
+Subproject commit 6d6064445ded787c7d6bf0a021fb718fddb63345
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-build-system B-os-macos Issues specific to macOS. C-investigation Further steps needed to qualify. C-label will change. S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants