runtime: fatal error: SIGSEGV during C.getaddrinfo #30310

Closed
hawran opened this issue Feb 19, 2019 · 9 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

@hawran

hawran commented Feb 19, 2019

What version of Go are you using (go version)?

$ go version
go1.11.5 linux/amd64

Does this issue reproduce with the latest release?

This is the latest version, I presume.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/.../.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/.../go/workspace/"
GOPROXY=""
GORACE=""
GOROOT="/home/.../data/opt/go/current"
GOTMPDIR=""
GOTOOLDIR="/home/.../data/opt/go/current/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build917841504=/tmp/go-build -gno-record-gcc-switches"
System (Xubuntu)
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.2 LTS
Release:        18.04
Codename:       bionic
$ uname -a
Linux hawran-lin 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

What did you do?

I just ran my sets of unit tests as usual.

What did you expect to see?

All tests passed.

What did you see instead?

I came across these fatal errors while running tests that had passed before. (In the meantime I had upgraded my distribution.)
After discussing the issue with my colleagues, we narrowed the list of possible culprits down to the following line:
options edns0
It is a line in the /etc/resolv.conf file.
Unfortunately, I cannot say for certain that this option was not in use in the previous version.
However, when I comment that line out, all tests are OK again.

The most irritating symptom of the issue is that the SIGSEGV happens randomly: for instance, when a set of tests fails because one test crashed, the same test passes when run alone.

I spent some time trying to generate a core file, in vain.
So the only thing I can give you at the moment is a few lines from a stack trace:

stack trace (shortened)
...
=== RUN   TestLookupLib_LookupA
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x63 pc=0x7f43e539a448]

runtime stack:
runtime.throw(0xb35288, 0x2a)
/home/.../data/opt/go/current/src/runtime/panic.go:608 +0x72
runtime.sigpanic()
/home/.../data/opt/go/current/src/runtime/signal_unix.go:374 +0x2f2

goroutine 28 [syscall]:
runtime.cgocall(0x919b40, 0xc000040e00, 0x29)
/home/.../data/opt/go/current/src/runtime/cgocall.go:128 +0x5e fp=0xc000040dc8 sp=0xc000040d90 pc=0x403bae
net._C2func_getaddrinfo(0xc0001484a0, 0x0, 0xc0001418f0, 0xc0001440e8, 0x0, 0x0, 0x0)
_cgo_gotypes.go:91 +0x55 fp=0xc000040e00 sp=0xc000040dc8 pc=0x589275
net.cgoLookupIPCNAME.func1(0xc0001484a0, 0x0, 0xc0001418f0, 0xc0001440e8, 0xa, 0xa, 0x0)
/home/.../data/opt/go/current/src/net/cgo_unix.go:149 +0x131 fp=0xc000040e48 sp=0xc000040e00 pc=0x58e9a1
net.cgoLookupIPCNAME(0xb22da3, 0x9, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/.../data/opt/go/current/src/net/cgo_unix.go:149 +0x153 fp=0xc000040f38 sp=0xc000040e48 pc=0x58a833
net.cgoIPLookup(0xc000150840, 0xb22da3, 0x9)
/home/.../data/opt/go/current/src/net/cgo_unix.go:201 +0x4d fp=0xc000040fc8 sp=0xc000040f38 pc=0x58aeed
runtime.goexit()
/home/.../data/opt/go/current/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc000040fd0 sp=0xc000040fc8 pc=0x45cfd1
created by net.cgoLookupIP
/home/.../data/opt/go/current/src/net/cgo_unix.go:211 +0xad

goroutine 1 [chan receive]:
testing.(*T).Run(0xc0001ea400, 0xb2785a, 0x15, 0xb44b50, 0x47d301)
/home/.../data/opt/go/current/src/testing/testing.go:879 +0x383
testing.runTests.func1(0xc0001ea000)
/home/.../data/opt/go/current/src/testing/testing.go:1119 +0x78
testing.tRunner(0xc0001ea000, 0xc00009bda0)
/home/.../data/opt/go/current/src/testing/testing.go:827 +0xbf
testing.runTests(0xc00000cb00, 0x1212000, 0xe, 0xe, 0x40caff)
/home/.../data/opt/go/current/src/testing/testing.go:1117 +0x2aa
testing.(*M).Run(0xc00010e300, 0x0)
/home/.../data/opt/go/current/src/testing/testing.go:1034 +0x165
main.main()
_testmain.go:126 +0x205

goroutine 22 [syscall]:
os/signal.signal_recv(0xb46240)
/home/.../data/opt/go/current/src/runtime/sigqueue.go:139 +0x9c
os/signal.loop()
/home/.../data/opt/go/current/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
/home/.../data/opt/go/current/src/os/signal/signal_unix.go:29 +0x41

goroutine 9 [select]:
net.(*Resolver).LookupIPAddr(0x125cc60, 0xbbc520, 0xc000150720, 0xb22da3, 0x9, 0xb22dad, 0x4, 0x14ea, 0x0, 0x0)
/home/.../data/opt/go/current/src/net/lookup.go:227 +0x55f
net.(*Resolver).internetAddrList(0x125cc60, 0xbbc520, 0xc000150720, 0xb1b388, 0x3, 0xb22da3, 0xe, 0x0, 0x0, 0x0, ...)
/home/.../data/opt/go/current/src/net/ipsock.go:279 +0x614
net.(*Resolver).resolveAddrList(0x125cc60, 0xbbc520, 0xc000150720, 0xb1b902, 0x4, 0xb1b388, 0x3, 0xb22da3, 0xe, 0x0, ...)
/home/.../data/opt/go/current/src/net/dial.go:202 +0x4fb
net.(*Dialer).DialContext(0xc000097b90, 0xbbc4e0, 0xc000024148, 0xb1b388, 0x3, 0xb22da3, 0xe, 0x0, 0x0, 0x0, ...)
/home/.../data/opt/go/current/src/net/dial.go:384 +0x201
net.(*Dialer).Dial(0xc000052b90, 0xb1b388, 0x3, 0xb22da3, 0xe, 0x203000, 0x8, 0x7f43ea8cd148, 0x8)
/home/.../data/opt/go/current/src/net/dial.go:329 +0x75
.../goutils/vendor/github.com/miekg/dns.(*Client).Dial(0xc0001b5420, 0xb22da3, 0xe, 0xaa7aa0, 0xc00019b500, 0x2183ddc47ec6)
/home/.../go/workspace/src/.../goutils/vendor/github.com/miekg/dns/client.go:104 +0x2c0
.../goutils/vendor/github.com/miekg/dns.(*Client).exchange(0xc0001b5420, 0xc00020c000, 0xb22da3, 0xe, 0x0, 0x0, 0x0, 0x0)
/home/.../go/workspace/src/.../goutils/vendor/github.com/miekg/dns/client.go:152 +0x69
.../goutils/vendor/github.com/miekg/dns.(*Client).Exchange(0xc0001b5420, 0xc00020c000, 0xb22da3, 0xe, 0xc00020c000, 0x101, 0x110, 0xc0000bd560)
/home/.../go/workspace/src/.../goutils/vendor/github.com/miekg/dns/client.go:129 +0x2b7
.../goutils/resolver.(*LookupLib).lookup(0xc000046778, 0xb22c7d, 0xe, 0xb1acfa, 0x1, 0x0, 0x0, 0xb46360, 0x1261660, 0x1312)
/home/.../go/workspace/src/.../goutils/resolver/lookup.go:202 +0x279
.../goutils/resolver.(*LookupLib).lookupType(0xc000046778, 0xb22c7d, 0xe, 0xb1acfa, 0x1, 0xc000001b00, 0x1261660, 0x1261660)
/home/.../go/workspace/src/.../goutils/resolver/lookup.go:166 +0x6c
.../goutils/resolver.(*LookupLib).LookupA(0xc000097f78, 0xb22c7d, 0xe, 0xc0001ea500, 0xc00006cdc0, 0xf, 0xff43cd, 0x37, 0x44c9a8)
/home/.../go/workspace/src/.../goutils/resolver/lookup.go:78 +0x62
.../goutils/resolver.TestLookupLib_LookupA(0xc0001ea400)
/home/.../go/workspace/src/.../goutils/resolver/lookup_test.go:64 +0x188
testing.tRunner(0xc0001ea400, 0xb44b50)
/home/.../data/opt/go/current/src/testing/testing.go:827 +0xbf
created by testing.(*T).Run
/home/.../data/opt/go/current/src/testing/testing.go:878 +0x35c

goroutine 10 [IO wait]:
internal/poll.runtime_pollWait(0x7f43e6652f00, 0x72, 0x0)
/home/.../data/opt/go/current/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc00010e418, 0x72, 0xc00006cf00, 0x0, 0x0)
/home/.../data/opt/go/current/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc00010e418, 0xffffffffffffff00, 0x0, 0x0)
/home/.../data/opt/go/current/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Accept(0xc00010e400, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/.../data/opt/go/current/src/internal/poll/fd_unix.go:384 +0x1a0
net.(*netFD).accept(0xc00010e400, 0x0, 0xc000032000, 0x0)
/home/.../data/opt/go/current/src/net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0xc00000e108, 0xc000057e48, 0x6bade7, 0xc0001ea5bc)
/home/.../data/opt/go/current/src/net/tcpsock_posix.go:139 +0x2e
net.(*TCPListener).Accept(0xc00000e108, 0xb45201, 0xc000024cd0, 0xc0001ea500, 0x1)
/home/.../data/opt/go/current/src/net/tcpsock.go:260 +0x47
.../goutils/vendor/github.com/miekg/dns.(*Server).serveTCP(0xc0001ea500, 0xbbbaa0, 0xc00000e108, 0x0, 0x0)
/home/.../go/workspace/src/.../goutils/vendor/github.com/miekg/dns/server.go:487 +0xfd
.../goutils/vendor/github.com/miekg/dns.(*Server).ListenAndServe(0xc0001ea500, 0x0, 0x0)
/home/.../go/workspace/src/.../goutils/vendor/github.com/miekg/dns/server.go:342 +0x2a2
.../goutils/testutils.Serve(0xc0001ea400, 0xc0001ea500)
/home/.../go/workspace/src/.../goutils/testutils/simpledns.go:93 +0x2b
created by .../goutils/resolver.TestLookupLib_LookupA
/home/.../go/workspace/src/.../goutils/resolver/lookup_test.go:58 +0xd7

goroutine 27 [select]:
net.cgoLookupIP(0xbbc4a0, 0xc00014c6c0, 0xb22da3, 0x9, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/.../data/opt/go/current/src/net/cgo_unix.go:212 +0x17b
net.(*Resolver).lookupIP(0x125cc60, 0xbbc4a0, 0xc00014c6c0, 0xb22da3, 0x9, 0x0, 0xc00012a600, 0xc00014c680, 0x0, 0x0)
/home/.../data/opt/go/current/src/net/lookup_unix.go:95 +0x166
net.(*Resolver).lookupIP-fm(0xbbc4a0, 0xc00014c6c0, 0xb22da3, 0x9, 0x42c072, 0x8, 0xc00014c680, 0x0, 0xc000040ea0)
/home/.../data/opt/go/current/src/net/lookup.go:207 +0x56
net.glob..func1(0xbbc4a0, 0xc00014c6c0, 0xc00013ee50, 0xb22da3, 0x9, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/.../data/opt/go/current/src/net/hook.go:19 +0x52
net.(*Resolver).LookupIPAddr.func1(0x0, 0x0, 0x0, 0x0)
/home/.../data/opt/go/current/src/net/lookup.go:221 +0xd8
internal/singleflight.(*Group).doCall(0x125cc70, 0xc0001528c0, 0xb22da3, 0x9, 0xc000141860)
/home/.../data/opt/go/current/src/internal/singleflight/singleflight.go:95 +0x2e
created by internal/singleflight.(*Group).DoChan
/home/.../data/opt/go/current/src/internal/singleflight/singleflight.go:88 +0x2a0
FAIL .../goutils/resolver 1.013s

@agnivade changed the title from "fatal error: unexpected signal during runtime execution [signal SIGSEGV: segmentation violation]" to "runtime: fatal error: unexpected signal during runtime execution [signal SIGSEGV: segmentation violation]" on Feb 19, 2019
@agnivade added the NeedsInvestigation label on Feb 19, 2019
@agnivade
Contributor

/cc @ianlancetaylor for CGo issues.

It would also be helpful if you could try with 1.12rc1 and report back your findings. Thanks.

@ianlancetaylor
Member

Tracking the problem down to a change in resolv.conf is impressive.

Changing the line in resolv.conf is most likely changing whether you are using the Go or C DNS resolver code. It would be interesting to see whether the problem recurs if you set the environment variable GODEBUG=netdns=go (expected to work) or GODEBUG=netdns=cgo (expected to fail).

The stack trace shows that the C resolver code, which is part of glibc, is getting a segmentation violation. This is happening during a call to getaddrinfo. The call is pretty simple, and it's hard to understand why it would crash. I'm not sure how to make progress here without some way to reproduce the problem. You might possibly get a bit more information if you add, anywhere in your program,

import _ "github.com/ianlancetaylor/cgosymbolizer"

That might show us a stack trace in the C code.

@ianlancetaylor added this to the Go1.13 milestone on Feb 19, 2019
@ianlancetaylor changed the title from "runtime: fatal error: unexpected signal during runtime execution [signal SIGSEGV: segmentation violation]" to "runtime: fatal error: SIGSEGV during C.getaddrinfo" on Feb 19, 2019
@hawran
Author

hawran commented Feb 20, 2019

Hi guys,
my findings are as follows:

go1.11.5:

GODEBUG:            -         cgo       go
options edns0       SIGSEGV   SIGSEGV   ok
no options edns0    ok        SIGSEGV   ok

stack trace (cgo parts)
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x63 pc=0x7feb2e5ee1b9]

runtime stack:
runtime.throw(0xba064a, 0x2a)
/home/.../data/opt/go/current/src/runtime/panic.go:617 +0x72
runtime.sigpanic()
/home/.../data/opt/go/current/src/runtime/signal_unix.go:374 +0x4a9

goroutine 92 [syscall]:
non-Go function
pc=0x7feb2e5ee1b9
non-Go function
pc=0x7feb2e5ef179
non-Go function
pc=0x9dc144
non-Go function
pc=0x9de0d3
runtime.cgocall(0x975730, 0xc00021c5e0, 0xc0000103c8)
/home/.../data/opt/go/current/src/runtime/cgocall.go:128 +0x5b fp=0xc00021c5b0 sp=0xc00021c578 pc=0x403d4b
net._C2func_getaddrinfo(0xc000029840, 0x0, 0xc0002a08a0, 0xc0000103c8, 0x0, 0x0, 0x0)
_cgo_gotypes.go:92 +0x55 fp=0xc00021c5e0 sp=0xc00021c5b0 pc=0x664585
net.cgoLookupIPCNAME.func1(0xc000029840, 0xb, 0xb, 0xc0002a08a0, 0xc0000103c8, 0xa, 0x42bd5f, 0x8)
/home/.../data/opt/go/current/src/net/cgo_unix.go:154 +0x13e fp=0xc00021c628 sp=0xc00021c5e0 pc=0x669dbe
net.cgoLookupIPCNAME(0xb877aa, 0x3, 0xc000029830, 0xa, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/home/.../data/opt/go/current/src/net/cgo_unix.go:154 +0x176 fp=0xc00021c718 sp=0xc00021c628 pc=0x665a56
net.cgoIPLookup(0xc0002ae0c0, 0xb877aa, 0x3, 0xc000029830, 0xa)
/home/.../data/opt/go/current/src/net/cgo_unix.go:206 +0x67 fp=0xc00021c7b8 sp=0xc00021c718 pc=0x666147
runtime.goexit()
/home/.../data/opt/go/current/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc00021c7c0 sp=0xc00021c7b8 pc=0x459441
created by net.cgoLookupIP
/home/.../data/opt/go/current/src/net/cgo_unix.go:216 +0xc7

...

goroutine 91 [select]:
net.cgoLookupIP(0xc67640, 0xc0002a8140, 0xb877aa, 0x3, 0xc000029830, 0xa, 0xc000067b00, 0xc00021c7c8, 0x7806c3, 0xc000067b00, ...)
/home/.../data/opt/go/current/src/net/cgo_unix.go:217 +0x195
net.(*Resolver).lookupIP(0x135f080, 0xc67640, 0xc0002a8140, 0xb877aa, 0x3, 0xc000029830, 0xa, 0xc00021c5f8, 0x455f10, 0xc000217380, ...)
/home/.../data/opt/go/current/src/net/lookup_unix.go:96 +0x1a4
net.glob..func1(0xc67640, 0xc0002a8140, 0xc00028ca00, 0xb877aa, 0x3, 0xc000029830, 0xa, 0x405bc5, 0xc00007eba0, 0xc4cae0, ...)
/home/.../data/opt/go/current/src/net/hook.go:23 +0x72
net.(*Resolver).lookupIPAddr.func1(0x0, 0x0, 0x0, 0x0)
/home/.../data/opt/go/current/src/net/lookup.go:268 +0x116
internal/singleflight.(*Group).doCall(0x135f090, 0xc0000a9ae0, 0xc000029830, 0xa, 0xc0002a8180)
/home/.../data/opt/go/current/src/internal/singleflight/singleflight.go:95 +0x2e
created by internal/singleflight.(*Group).DoChan
/home/.../data/opt/go/current/src/internal/singleflight/singleflight.go:88 +0x29d

go1.12rc1:

GODEBUG:            -         cgo       go
options edns0       SIGSEGV   SIGSEGV   ok
no options edns0    ok        SIGSEGV   ok

stack trace (cgo parts)
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x63 pc=0x7fd8b41641b9]

runtime stack:
runtime.throw(0xb5ad63, 0x2a)
/home/.../data/opt/go/current/src/runtime/panic.go:608 +0x72
runtime.sigpanic()
/home/.../data/opt/go/current/src/runtime/signal_unix.go:374 +0x2f2

goroutine 41 [syscall]:
non-Go function
pc=0x7fd8b41641b9
non-Go function
pc=0x7fd8b4165179
non-Go function
pc=0x99db14
non-Go function
pc=0x99faa3
runtime.cgocall(0x937100, 0xc00022b600, 0x29)
/home/.../data/opt/go/current/src/runtime/cgocall.go:128 +0x5e fp=0xc00022b5c8 sp=0xc00022b590 pc=0x403bce
net._C2func_getaddrinfo(0xc000164080, 0x0, 0xc0001608d0, 0xc000150068, 0x0, 0x0, 0x0)
_cgo_gotypes.go:91 +0x55 fp=0xc00022b600 sp=0xc00022b5c8 pc=0x65e7d5
net.cgoLookupIPCNAME.func1(0xc000164080, 0x0, 0xc0001608d0, 0xc000150068, 0xb, 0xb, 0x0)
/home/.../data/opt/go/current/src/net/cgo_unix.go:149 +0x131 fp=0xc00022b648 sp=0xc00022b600 pc=0x663f01
net.cgoLookupIPCNAME(0xc000164070, 0xa, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/.../data/opt/go/current/src/net/cgo_unix.go:149 +0x153 fp=0xc00022b738 sp=0xc00022b648 pc=0x65fd93
net.cgoIPLookup(0xc000146900, 0xc000164070, 0xa)
/home/.../data/opt/go/current/src/net/cgo_unix.go:201 +0x4d fp=0xc00022b7c8 sp=0xc00022b738 pc=0x66044d
runtime.goexit()
/home/.../data/opt/go/current/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc00022b7d0 sp=0xc00022b7c8 pc=0x459181
created by net.cgoLookupIP
/home/.../data/opt/go/current/src/net/cgo_unix.go:211 +0xad

...

goroutine 40 [select]:
net.cgoLookupIP(0xbe7ee0, 0xc00015e2c0, 0xc000164070, 0xa, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/.../data/opt/go/current/src/net/cgo_unix.go:212 +0x17b
net.(*Resolver).lookupIP(0x129bec0, 0xbe7ee0, 0xc00015e2c0, 0xc000164070, 0xa, 0x0, 0xc0001c8780, 0xc00013ee00, 0x0, 0x0)
/home/.../data/opt/go/current/src/net/lookup_unix.go:95 +0x166
net.(*Resolver).lookupIP-fm(0xbe7ee0, 0xc00015e2c0, 0xc000164070, 0xa, 0x42af82, 0x8, 0xc00013ee00, 0x0, 0xc00022b6a0)
/home/.../data/opt/go/current/src/net/lookup.go:207 +0x56
net.glob..func1(0xbe7ee0, 0xc00015e2c0, 0xc0001663c0, 0xc000164070, 0xa, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/.../data/opt/go/current/src/net/hook.go:19 +0x52
net.(*Resolver).LookupIPAddr.func1(0x0, 0x0, 0x0, 0x0)
/home/.../data/opt/go/current/src/net/lookup.go:221 +0xd8
internal/singleflight.(*Group).doCall(0x129bed0, 0xc000162460, 0xc000164070, 0xa, 0xc000160840)
/home/.../data/opt/go/current/src/internal/singleflight/singleflight.go:95 +0x2e
created by internal/singleflight.(*Group).DoChan
/home/.../data/opt/go/current/src/internal/singleflight/singleflight.go:88 +0x2a0

(I cannot imagine how the "improved" traces help; both traces look the same to me.)


PS: As for "Tracking the problem down to a change in resolv.conf...": a colleague of mine recalled a similar issue on AWS, where he had been able to track down the critical change... (magic NixOS, :-))

@ianlancetaylor
Member

Thanks. You're right, of course: the extra stack traces didn't help. I guess that cgosymbolizer couldn't find the debug info for your C library.

It appears that on your system calls to getaddrinfo crash. I don't know why. Nobody else has been complaining about this so I don't think there is a systematic problem. Would it be possible for you to share a standalone program that demonstrates the problem? It sounds like it might be easy to demonstrate using GODEBUG=netdns=cgo.

@hawran
Author

hawran commented Feb 24, 2019

No problem.
I'll try to prepare something usable; please give me some time.

@hawran
Author

hawran commented Mar 11, 2019

Hi,
I've finally managed to extract a simple file that reproduces the issue with Go 1.12, regardless of whether options edns0 is present:

Dial.go file
package main

import (
	"fmt"
	"github.com/miekg/dns"
	"net"
	"time"
	_ "github.com/ianlancetaylor/cgosymbolizer"
)

// SIMPLE DNS
func HandleDnsRequest(w dns.ResponseWriter, r *dns.Msg) {
	fmt.Printf("\n>>> DEBUG: func HandleDnsRequest(w dns.ResponseWriter, r *dns.Msg) {: DO NOTHING\n")
}

func Serve(srv *dns.Server) {
	fmt.Printf("\n>>> DEBUG: func Serve(srv *dns.Server) {\n")
	_ = srv.ListenAndServe()
}

// A Conn represents a connection to a DNS server. see github.com/miekg/dns/client.go
type Conn struct {
	net.Conn // a net.Conn holding the connection
	UDPSize uint16 // minimum receive buffer for UDP messages
	TsigSecret map[string]string // secret(s) for Tsig map[], zonename must be in canonical form (lowercase, fqdn, see RFC 4034 Section 6.2)
	tsigRequestMAC string
}

// MAIN
func main() {
	fmt.Printf("\n>>> DEBUG: main(): START\n\n")

	// DNS SERVER
	dns.HandleFunc("testing.", HandleDnsRequest)
	dnsServer := &dns.Server{Addr: "localhost:5354", Net: "tcp"}
	go Serve(dnsServer)
	defer dnsServer.Shutdown()
	// ---------------------
	time.Sleep(time.Second * 1)

	fmt.Printf("\n>>> DEBUG: main(): BEFORE: d := net.Dialer{Timeout: 2 * time.Second}\n")
	d := net.Dialer{Timeout: 2 * time.Second}
	fmt.Printf("  DEBUG: d: [%+v]\n", d)

	fmt.Printf("\n>>> DEBUG: main(): BEFORE: conn := new(Conn)\n")
	conn := new(Conn)
	fmt.Printf("  DEBUG: conn: [%+v]\n", conn)

	network := "tcp"
	address := "localhost:5354"
	var err error

	fmt.Printf("\n>>> DEBUG: main(): BEFORE: conn.Conn, err = d.Dial(network['%s'], address['%s'])\n", network, address)
	conn.Conn, err = d.Dial(network, address)
	fmt.Printf("\n>>> main(): AFTER: conn.Conn, err = d.Dial(network['%s'], address['%s'])\n", network, address)
	fmt.Printf("  DEBUG: conn: [%+v], err: [%+v]\n", conn, err)

	fmt.Printf("\n>>> DEBUG: main(): END\n\n")

}

stack (a snippet)
>>> DEBUG: main(): BEFORE: conn.Conn, err = d.Dial(network['tcp'], address['localhost:5354'])
go package net: hostLookupOrder(localhost) = cgo
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x63 pc=0x7fe1b7df1448]

runtime stack:
runtime.throw(0x775ab7, 0x2a)
/home/.../data/opt/go/current/src/runtime/panic.go:617 +0x72
runtime.sigpanic()
/home/.../data/opt/go/current/src/runtime/signal_unix.go:374 +0x4a9

goroutine 10 [syscall]:
non-Go function
pc=0x7fe1b7df1448
non-Go function
pc=0x7fe1b7df28bc
non-Go function
pc=0x69a7d4
non-Go function
pc=0x69c763
runtime.cgocall(0x62da70, 0xc0000385e0, 0xc000012060)
/home/.../data/opt/go/current/src/runtime/cgocall.go:128 +0x5b fp=0xc0000385b0 sp=0xc000038578 pc=0x403cbb
net._C2func_getaddrinfo(0xc000018480, 0x0, 0xc000080d20, 0xc000012060, 0x0, 0x0, 0x0)
_cgo_gotypes.go:92 +0x55 fp=0xc0000385e0 sp=0xc0000385b0 pc=0x548785
net.cgoLookupIPCNAME.func1(0xc000018480, 0xa, 0xa, 0xc000080d20, 0xc000012060, 0xc0000386d0, 0x40604f, 0xc0000601e0)
/home/.../data/opt/go/current/src/net/cgo_unix.go:154 +0x13e fp=0xc000038628 sp=0xc0000385e0 pc=0x54c95e
net.cgoLookupIPCNAME(0x76af72, 0x3, 0x76d246, 0x9, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/home/.../data/opt/go/current/src/net/cgo_unix.go:154 +0x176 fp=0xc000038718 sp=0xc000038628 pc=0x549786
net.cgoIPLookup(0xc000060540, 0x76af72, 0x3, 0x76d246, 0x9)
/home/.../data/opt/go/current/src/net/cgo_unix.go:206 +0x67 fp=0xc0000387b8 sp=0xc000038718 pc=0x549e77
runtime.goexit()
/home/.../data/opt/go/current/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc0000387c0 sp=0xc0000387b8 pc=0x456bc1
created by net.cgoLookupIP
/home/.../data/opt/go/current/src/net/cgo_unix.go:216 +0xc7

goroutine 1 [select]:
net.(*Resolver).lookupIPAddr(0xb97a20, 0x7bb7e0, 0xc000060480, 0x76af72, 0x3, 0x76d246, 0x9, 0x14ea, 0x0, 0x0, ...)
/home/.../data/opt/go/current/src/net/lookup.go:274 +0x5e9
net.(*Resolver).internetAddrList(0xb97a20, 0x7bb7e0, 0xc000060480, 0x76af72, 0x3, 0x76d246, 0xe, 0x0, 0x0, 0x0, ...)
/home/.../data/opt/go/current/src/net/ipsock.go:280 +0x61f
net.(*Resolver).resolveAddrList(0xb97a20, 0x7bb7e0, 0xc000060480, 0x76b0c1, 0x4, 0x76af72, 0x3, 0x76d246, 0xe, 0x0, ...)
/home/.../data/opt/go/current/src/net/dial.go:213 +0x4ce
net.(*Dialer).DialContext(0xc00004bec8, 0x7bb7a0, 0xc000018130, 0x76af72, 0x3, 0x76d246, 0xe, 0x0, 0x0, 0x0, ...)
/home/.../data/opt/go/current/src/net/dial.go:395 +0x202
net.(*Dialer).Dial(...)
/home/.../data/opt/go/current/src/net/dial.go:340
main.main()
/home/.../go/workspace/src/hawran/190222-091732.github.golang.30310.SIGSEGV-during-C.getaddrinfo/Dial.go:54 +0x486

goroutine 5 [IO wait]:
internal/poll.runtime_pollWait(0x7fe1bc186f08, 0x72, 0x0)
/home/.../data/opt/go/current/src/runtime/netpoll.go:182 +0x56
internal/poll.(*pollDesc).wait(0xc0000b2098, 0x72, 0x0, 0x0, 0x76b811)
/home/.../data/opt/go/current/src/internal/poll/fd_poll_runtime.go:87 +0x9b
internal/poll.(*pollDesc).waitRead(...)
/home/.../data/opt/go/current/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Accept(0xc0000b2080, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/.../data/opt/go/current/src/internal/poll/fd_unix.go:384 +0x1ba
net.(*netFD).accept(0xc0000b2080, 0xc00006a000, 0x7fe1c0bf8008, 0x0)
/home/.../data/opt/go/current/src/net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0xc000012050, 0xc00004cde0, 0x5f05b1, 0xc0000b00bc)
/home/.../data/opt/go/current/src/net/tcpsock_posix.go:139 +0x32
net.(*TCPListener).Accept(0xc000012050, 0x77a001, 0xc0000183f0, 0xc0000b0000, 0xc000000001)
/home/.../data/opt/go/current/src/net/tcpsock.go:260 +0x48
github.com/miekg/dns.(*Server).serveTCP(0xc0000b0000, 0x7bb160, 0xc000012050, 0x0, 0x0)
/home/.../go/workspace/src/github.com/miekg/dns/server.go:487 +0x102
github.com/miekg/dns.(*Server).ListenAndServe(0xc0000b0000, 0x0, 0x0)
/home/.../go/workspace/src/github.com/miekg/dns/server.go:342 +0x2eb
main.Serve(0xc0000b0000)
/home/.../go/workspace/src/hawran/190222-091732.github.golang.30310.SIGSEGV-during-C.getaddrinfo/Dial.go:18 +0x6e
created by main.main
/home/.../go/workspace/src/hawran/190222-091732.github.golang.30310.SIGSEGV-during-C.getaddrinfo/Dial.go:36 +0x12a

goroutine 9 [select]:
net.cgoLookupIP(0x7bb760, 0xc00005e2c0, 0x76af72, 0x3, 0x76d246, 0x9, 0x0, 0x0, 0x0, 0x0, ...)
/home/.../data/opt/go/current/src/net/cgo_unix.go:217 +0x195
net.(*Resolver).lookupIP(0xb97a20, 0x7bb760, 0xc00005e2c0, 0x76af72, 0x3, 0x76d246, 0x9, 0x0, 0x0, 0xc000001b00, ...)
/home/.../data/opt/go/current/src/net/lookup_unix.go:96 +0x1a4
net.glob..func1(0x7bb760, 0xc00005e2c0, 0xc000014820, 0x76af72, 0x3, 0x76d246, 0x9, 0x0, 0x0, 0x0, ...)
/home/.../data/opt/go/current/src/net/hook.go:23 +0x72
net.(*Resolver).lookupIPAddr.func1(0x0, 0x0, 0x0, 0x0)
/home/.../data/opt/go/current/src/net/lookup.go:268 +0x116
internal/singleflight.(*Group).doCall(0xb97a30, 0xc000092190, 0x76d246, 0x9, 0xc00005e300)
/home/.../data/opt/go/current/src/internal/singleflight/singleflight.go:95 +0x2e
created by internal/singleflight.(*Group).DoChan
/home/.../data/opt/go/current/src/internal/singleflight/singleflight.go:88 +0x29d

Build/Run info:
go build -ldflags "-w -linkmode external -extldflags -static" -o Dial Dial.go && GODEBUG=netdns=cgo+2 ./Dial
ends up with SIGSEGV

go build -ldflags "-w -linkmode external -extldflags -static" -o Dial Dial.go && GODEBUG=netdns=go+2 ./Dial
ends up without SIGSEGV

I am very sorry for the delay.

@ianlancetaylor
Member

Thanks. I didn't realize that you were using -extldflags -static.

I cannot recreate the problem using GODEBUG=netdns=go+2. I can recreate it using GODEBUG=netdns=cgo+2 (changing go to cgo).

When using netdns=cgo, the crash is due to a bug in glibc: https://sourceware.org/bugzilla/show_bug.cgi?id=19341 . There is some discussion of its effect on Go in #13470. Basically, it seems that due to this glibc bug it is impossible to use -extldflags -static and use cgo code for net and os/user.

Please double check whether you are using GODEBUG=netdns=cgo (which unfortunately cannot work reliably when using -extldflags -static) or GODEBUG=netdns=go (which does work in my testing). Thanks.

In general if you want to use -extldflags -static the safest approach is to build with -tags netgo. This will ensure that your program only uses the Go DNS lookup routines.

@hawran
Author

hawran commented Mar 12, 2019

> Thanks. I didn't realize that you were using -extldflags -static.

I'm sorry, my mistake: I realised that the problem is actually with the built application when I was preparing that code sample.

> I cannot recreate the problem using GODEBUG=netdns=go+2. I can recreate it using GODEBUG=netdns=cgo+2 (changing go to cgo).

There might be a small misunderstanding here, so I'll restate my conclusion:
With GODEBUG=netdns=go+2 there is no SIGSEGV; the program ends normally.
With GODEBUG=netdns=cgo+2 the SIGSEGV happens; the program crashes.

> When using netdns=cgo, the crash is due to a bug in glibc: https://sourceware.org/bugzilla/show_bug.cgi?id=19341 . There is some discussion of its effect on Go in #13470. Basically, it seems that due to this glibc bug it is impossible to use -extldflags -static and use cgo code for net and os/user.

> Please double check whether you are using GODEBUG=netdns=cgo (which unfortunately cannot work reliably when using -extldflags -static) or GODEBUG=netdns=go (which does work in my testing). Thanks.

> In general if you want to use -extldflags -static the safest approach is to build with -tags netgo. This will ensure that your program only uses the Go DNS lookup routines.

OK, thank you.
However, I really need some time to get all of it right.

@ianlancetaylor
Member

Thanks. I'm going to close this issue because unfortunately I don't think there is anything that the Go project can do to fix it. It's a bug in glibc, even though the glibc maintainers don't seem to have any plans to fix it either. The workaround is to use GODEBUG=netdns=go or to build the program with -tags netgo.

@golang locked and limited conversation to collaborators on Mar 11, 2020