
store - runtime: program exceeds 10000-thread limit #5259

Closed
bennyboom38 opened this issue Mar 30, 2022 · 3 comments

Comments

@bennyboom38

Thanos, Prometheus and Golang version used:

thanos, version 0.23.1 (branch: HEAD, revision: 5327cd8)
build user: root@0acc901868e9
build date: 20211005-12:08:29
go version: go1.16.8
platform: linux/amd64

Object Storage Provider:

type: FILESYSTEM  # NFS mount point
config:
  directory: "/mnt/thanos_tsdb"

What happened:

When I query the last 45 days of history, Thanos Store crashes.

What you expected to happen:

Thanos Store stays up and returns the data for the time range I requested.

How to reproduce it (as minimally and precisely as possible):

Here is the configuration of thanos store:
--http-address=0.0.0.0:19091 --grpc-address=0.0.0.0:19092 --log.level=debug --chunk-pool-size=256MB --index-cache-size=256MB --data-dir=/opt/kelkoogroup/data/thanos-storegateway --objstore.config-file=/opt/thanos/bucket_config.yaml

Full logs to relevant components:

Mar 30 16:00:43 thanos[116952]: runtime: program exceeds 10000-thread limit
Mar 30 16:00:43 thanos[116952]: fatal error: thread exhaustion
Mar 30 16:00:43 thanos[116952]: runtime stack:
Mar 30 16:00:43 thanos[116952]: runtime.throw(0x1e1e320, 0x11)
Mar 30 16:00:43 thanos[116952]: /usr/local/go/src/runtime/panic.go:1117 +0x72
Mar 30 16:00:43 thanos[116952]: runtime.checkmcount()
Mar 30 16:00:43 thanos[116952]: /usr/local/go/src/runtime/proc.go:701 +0xac
Mar 30 16:00:43 thanos[116952]: runtime.mReserveID(0xc000053800)
Mar 30 16:00:43 thanos[116952]: /usr/local/go/src/runtime/proc.go:717 +0x3e
Mar 30 16:00:43 thanos[116952]: runtime.startm(0x0, 0xc08b7bff01)
Mar 30 16:00:43 thanos[116952]: /usr/local/go/src/runtime/proc.go:2370 +0x92
Mar 30 16:00:43 thanos[116952]: runtime.wakep()
Mar 30 16:00:43 thanos[116952]: /usr/local/go/src/runtime/proc.go:2477 +0x66
Mar 30 16:00:43 thanos[116952]: runtime.resetspinning()
Mar 30 16:00:43 thanos[116952]: /usr/local/go/src/runtime/proc.go:3020 +0x59
Mar 30 16:00:43 thanos[116952]: runtime.schedule()
Mar 30 16:00:43 thanos[116952]: /usr/local/go/src/runtime/proc.go:3176 +0x2b9
Mar 30 16:00:43 thanos[116952]: runtime.mstart1()
Mar 30 16:00:43 thanos[116952]: /usr/local/go/src/runtime/proc.go:1313 +0x93
Mar 30 16:00:43 thanos[116952]: runtime.mstart()
Mar 30 16:00:43 thanos[116952]: /usr/local/go/src/runtime/proc.go:1272 +0x6e
Mar 30 16:00:43 thanos[116952]: goroutine 1 [chan receive, 2 minutes]:
Mar 30 16:00:43 thanos[116952]: github.com/oklog/run.(*Group).Run(0xc00078e588, 0xc0007ef280, 0xc000415680)
Mar 30 16:00:43 thanos[116952]: /go/pkg/mod/github.com/oklog/[email protected]/group.go:43 +0xed
Mar 30 16:00:43 thanos[116952]: main.main()
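The `10000-thread limit` in the log refers to OS threads, not goroutines: it is the Go runtime's default cap, documented under `runtime/debug.SetMaxThreads`. The sketch below, assuming only the standard library, reads that cap and the number of OS threads the process has created so far; `currentMaxThreads` and `threadsCreated` are hypothetical helper names, not Thanos APIs.

```go
package main

import (
	"fmt"
	"math"
	"runtime/debug"
	"runtime/pprof"
)

// currentMaxThreads reports the runtime's cap on OS threads. The runtime only
// offers a setter that returns the previous value, so we raise the cap
// temporarily (raising it is harmless) and immediately restore it.
func currentMaxThreads() int {
	prev := debug.SetMaxThreads(math.MaxInt32)
	debug.SetMaxThreads(prev)
	return prev
}

// threadsCreated returns the number of OS threads the runtime has created,
// as recorded by the built-in "threadcreate" profile.
func threadsCreated() int {
	return pprof.Lookup("threadcreate").Count()
}

func main() {
	fmt.Printf("max OS threads: %d\n", currentMaxThreads())
	fmt.Printf("OS threads created so far: %d\n", threadsCreated())
}
```

In an unmodified program the first line prints `10000`. Per the `SetMaxThreads` documentation, the runtime creates a new thread only when a goroutine is ready to run but all existing threads are blocked in system calls, cgo calls, or locked to goroutines, so hitting the cap points at many simultaneously blocked threads rather than at the goroutine count itself.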

Anything else we need to know:

Environment:

  • OS (e.g. from /etc/os-release): CentOS Stream release 8
  • Kernel (e.g. uname -a): 4.18.0-365.el8.x86_64 #1 SMP Thu Feb 10 16:11:23 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • Others:
    24 vCPUs @ 2 GHz
    32 GB RAM
@GiedriusS
Member

GiedriusS commented Apr 11, 2022

Could you please dump the list of goroutines when this happens via the /debug/pprof HTTP endpoint? 👁️ It's impossible to tell what's happening right now :/
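The text-format goroutine dump requested here is served at `/debug/pprof/goroutine?debug=1` on the component's HTTP address (in this issue, `--http-address=0.0.0.0:19091`), so a browser or `curl` is enough. A minimal Go sketch, assuming the endpoint is answered by the standard `net/http/pprof` handlers (as the dump in the next comment suggests); `fetchGoroutineDump`, `looksLikeGoroutineProfile` and `startLocalPprof` are hypothetical helper names, and the local demo server only exists so the sketch runs without a Thanos instance.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/ handlers on http.DefaultServeMux
	"os"
	"strings"
)

// fetchGoroutineDump downloads the text-format goroutine profile (debug=1)
// from a server that exposes /debug/pprof, e.g. "http://localhost:19091".
func fetchGoroutineDump(baseURL string) (string, error) {
	resp, err := http.Get(baseURL + "/debug/pprof/goroutine?debug=1")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// looksLikeGoroutineProfile checks for the header line that debug=1 output starts with.
func looksLikeGoroutineProfile(dump string) bool {
	return strings.HasPrefix(dump, "goroutine profile: total ")
}

// startLocalPprof serves http.DefaultServeMux (which includes the pprof
// handlers) on a random loopback port, so the sketch runs without Thanos.
func startLocalPprof() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	go http.Serve(ln, nil) // demo server runs until the process exits
	return "http://" + ln.Addr().String(), nil
}

func main() {
	// Usage: go run . http://<thanos-host>:19091 > goroutines.txt
	// Without an argument, query a local demo server instead.
	var base string
	if len(os.Args) > 1 {
		base = os.Args[1]
	} else {
		local, err := startLocalPprof()
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		base = local
	}
	dump, err := fetchGoroutineDump(base)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(dump)
}
```

The `debug=1` format groups identical stacks and prefixes each group with its count, which is the shape of the dump posted below; `debug=2` instead prints every goroutine's full stack individually.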

@bennyboom38
Author

bennyboom38 commented May 19, 2022

Hello

Here is the list of goroutines when the Thanos Store Gateway crashes:

goroutine profile: total 27
4 @ 0x43af85 0x43351b 0x46c475 0x4e7185 0x4e8275 0x4e8257 0x59446f 0x5a8cf1 0x7b1879 0x5771e8 0x57736f 0x7b3a45 0x7b79a5 0x472121
#	0x46c474	internal/poll.runtime_pollWait+0x54		/usr/local/go/src/runtime/netpoll.go:222
#	0x4e7184	internal/poll.(*pollDesc).wait+0x44		/usr/local/go/src/internal/poll/fd_poll_runtime.go:87
#	0x4e8274	internal/poll.(*pollDesc).waitRead+0x1d4	/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
#	0x4e8256	internal/poll.(*FD).Read+0x1b6			/usr/local/go/src/internal/poll/fd_unix.go:166
#	0x59446e	net.(*netFD).Read+0x4e				/usr/local/go/src/net/fd_posix.go:55
#	0x5a8cf0	net.(*conn).Read+0x90				/usr/local/go/src/net/net.go:183
#	0x7b1878	net/http.(*connReader).Read+0x1b8		/usr/local/go/src/net/http/server.go:780
#	0x5771e7	bufio.(*Reader).fill+0x107			/usr/local/go/src/bufio/bufio.go:101
#	0x57736e	bufio.(*Reader).Peek+0x4e			/usr/local/go/src/bufio/bufio.go:139
#	0x7b3a44	net/http.(*conn).readRequest+0xec4		/usr/local/go/src/net/http/server.go:963
#	0x7b79a4	net/http.(*conn).serve+0x704			/usr/local/go/src/net/http/server.go:1858

2 @ 0x43af85 0x4068cf 0x40650b 0xc66f05 0x472121
#	0xc66f04	google.golang.org/grpc.(*addrConn).resetTransport+0x464	/go/pkg/mod/google.golang.org/[email protected]/clientconn.go:1156

2 @ 0x43af85 0x43351b 0x46c475 0x4e7185 0x4e8275 0x4e8257 0x59446f 0x5a8cf1 0x577882 0x4e0627 0x9b9109 0x9b90a2 0x9b9985 0xc3c705 0x472121
#	0x46c474	internal/poll.runtime_pollWait+0x54					/usr/local/go/src/runtime/netpoll.go:222
#	0x4e7184	internal/poll.(*pollDesc).wait+0x44					/usr/local/go/src/internal/poll/fd_poll_runtime.go:87
#	0x4e8274	internal/poll.(*pollDesc).waitRead+0x1d4				/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
#	0x4e8256	internal/poll.(*FD).Read+0x1b6						/usr/local/go/src/internal/poll/fd_unix.go:166
#	0x59446e	net.(*netFD).Read+0x4e							/usr/local/go/src/net/fd_posix.go:55
#	0x5a8cf0	net.(*conn).Read+0x90							/usr/local/go/src/net/net.go:183
#	0x577881	bufio.(*Reader).Read+0x221						/usr/local/go/src/bufio/bufio.go:227
#	0x4e0626	io.ReadAtLeast+0x86							/usr/local/go/src/io/io.go:328
#	0x9b9108	io.ReadFull+0x88							/usr/local/go/src/io/io.go:347
#	0x9b90a1	golang.org/x/net/http2.readFrameHeader+0x21				/go/pkg/mod/golang.org/x/[email protected]/http2/frame.go:237
#	0x9b9984	golang.org/x/net/http2.(*Framer).ReadFrame+0xa4				/go/pkg/mod/golang.org/x/[email protected]/http2/frame.go:492
#	0xc3c704	google.golang.org/grpc/internal/transport.(*http2Client).reader+0x184	/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:1273

2 @ 0x43af85 0x43351b 0x46c475 0x4e7185 0x4e8275 0x4e8257 0x59446f 0x5a8cf1 0x7b1879 0x5771e8 0x577f5d 0x5781b4 0x732716 0x7abaaa 0x7abaab 0x7b2d1d 0x7b79a5 0x472121
#	0x46c474	internal/poll.runtime_pollWait+0x54		/usr/local/go/src/runtime/netpoll.go:222
#	0x4e7184	internal/poll.(*pollDesc).wait+0x44		/usr/local/go/src/internal/poll/fd_poll_runtime.go:87
#	0x4e8274	internal/poll.(*pollDesc).waitRead+0x1d4	/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
#	0x4e8256	internal/poll.(*FD).Read+0x1b6			/usr/local/go/src/internal/poll/fd_unix.go:166
#	0x59446e	net.(*netFD).Read+0x4e				/usr/local/go/src/net/fd_posix.go:55
#	0x5a8cf0	net.(*conn).Read+0x90				/usr/local/go/src/net/net.go:183
#	0x7b1878	net/http.(*connReader).Read+0x1b8		/usr/local/go/src/net/http/server.go:780
#	0x5771e7	bufio.(*Reader).fill+0x107			/usr/local/go/src/bufio/bufio.go:101
#	0x577f5c	bufio.(*Reader).ReadSlice+0x3c			/usr/local/go/src/bufio/bufio.go:360
#	0x5781b3	bufio.(*Reader).ReadLine+0x33			/usr/local/go/src/bufio/bufio.go:389
#	0x732715	net/textproto.(*Reader).readLineSlice+0xd5	/usr/local/go/src/net/textproto/reader.go:57
#	0x7abaa9	net/textproto.(*Reader).ReadLine+0xa9		/usr/local/go/src/net/textproto/reader.go:38
#	0x7abaaa	net/http.readRequest+0xaa			/usr/local/go/src/net/http/request.go:1027
#	0x7b2d1c	net/http.(*conn).readRequest+0x19c		/usr/local/go/src/net/http/server.go:966
#	0x7b79a4	net/http.(*conn).serve+0x704			/usr/local/go/src/net/http/server.go:1858

2 @ 0x43af85 0x44c1f7 0xc2c2ff 0xc2cc5d 0xc4dfbb 0x472121
#	0xc2c2fe	google.golang.org/grpc/internal/transport.(*controlBuffer).get+0xfe	/go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:395
#	0xc2cc5c	google.golang.org/grpc/internal/transport.(*loopyWriter).run+0x1dc	/go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:513
#	0xc4dfba	google.golang.org/grpc/internal/transport.newHTTP2Client.func3+0x7a	/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:346

2 @ 0x43af85 0x44c1f7 0xc5f1ac 0x472121
#	0xc5f1ab	google.golang.org/grpc.(*ccBalancerWrapper).watcher+0xab	/go/pkg/mod/google.golang.org/[email protected]/balancer_conn_wrappers.go:69

1 @ 0x40c6f4 0x46e805 0x636045 0x472121
#	0x46e804	os/signal.signal_recv+0xa4	/usr/local/go/src/runtime/sigqueue.go:168
#	0x636044	os/signal.loop+0x24		/usr/local/go/src/os/signal/signal_unix.go:23

1 @ 0x43af85 0x4068cf 0x40650b 0x196bd10 0x636567 0x472121
#	0x196bd0f	main.main.func2+0x4f				/app/cmd/thanos/main.go:115
#	0x636566	github.com/oklog/run.(*Group).Run.func1+0x26	/go/pkg/mod/github.com/oklog/[email protected]/group.go:38

1 @ 0x43af85 0x4068cf 0x40650b 0x63642d 0x193c2c8 0x43ab56 0x472121
#	0x63642c	github.com/oklog/run.(*Group).Run+0xec	/go/pkg/mod/github.com/oklog/[email protected]/group.go:43
#	0x193c2c7	main.main+0xd67				/app/cmd/thanos/main.go:155
#	0x43ab55	runtime.main+0x255			/usr/local/go/src/runtime/proc.go:225

1 @ 0x43af85 0x4068cf 0x40650b 0xfe471a 0x472121
#	0xfe4719	github.com/baidubce/bce-sdk-go/util/log.NewLogger.func1+0x139	/go/pkg/mod/github.com/baidubce/[email protected]/util/log/logger.go:362

1 @ 0x43af85 0x43351b 0x46c475 0x4e7185 0x4e9ff2 0x4e9fd4 0x595aa5 0x5b3352 0x5b2145 0x7bca25 0xcff730 0xcff525 0xe2a4bb 0x196f506 0x636567 0x472121
#	0x46c474	internal/poll.runtime_pollWait+0x54						/usr/local/go/src/runtime/netpoll.go:222
#	0x4e7184	internal/poll.(*pollDesc).wait+0x44						/usr/local/go/src/internal/poll/fd_poll_runtime.go:87
#	0x4e9ff1	internal/poll.(*pollDesc).waitRead+0x211					/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
#	0x4e9fd3	internal/poll.(*FD).Accept+0x1f3						/usr/local/go/src/internal/poll/fd_unix.go:401
#	0x595aa4	net.(*netFD).accept+0x44							/usr/local/go/src/net/fd_unix.go:172
#	0x5b3351	net.(*TCPListener).accept+0x31							/usr/local/go/src/net/tcpsock_posix.go:139
#	0x5b2144	net.(*TCPListener).Accept+0x64							/usr/local/go/src/net/tcpsock.go:261
#	0x7bca24	net/http.(*Server).Serve+0x284							/usr/local/go/src/net/http/server.go:2961
#	0xcff72f	github.com/prometheus/exporter-toolkit/web.Serve+0x1af				/go/pkg/mod/github.com/prometheus/[email protected]/web/tls_config.go:192
#	0xcff524	github.com/prometheus/exporter-toolkit/web.ListenAndServe+0x104			/go/pkg/mod/github.com/prometheus/[email protected]/web/tls_config.go:184
#	0xe2a4ba	github.com/thanos-io/thanos/pkg/server/http.(*Server).ListenAndServe+0x25a	/app/pkg/server/http/http.go:68
#	0x196f505	main.runQuery.func13+0x45							/app/cmd/thanos/query.go:582
#	0x636566	github.com/oklog/run.(*Group).Run.func1+0x26					/go/pkg/mod/github.com/oklog/[email protected]/group.go:38

1 @ 0x43af85 0x43351b 0x46c475 0x4e7185 0x4e9ff2 0x4e9fd4 0x595aa5 0x5b3352 0x5b2145 0xc730ff 0x15cc9bd 0x196f6a6 0x636567 0x472121
#	0x46c474	internal/poll.runtime_pollWait+0x54						/usr/local/go/src/runtime/netpoll.go:222
#	0x4e7184	internal/poll.(*pollDesc).wait+0x44						/usr/local/go/src/internal/poll/fd_poll_runtime.go:87
#	0x4e9ff1	internal/poll.(*pollDesc).waitRead+0x211					/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
#	0x4e9fd3	internal/poll.(*FD).Accept+0x1f3						/usr/local/go/src/internal/poll/fd_unix.go:401
#	0x595aa4	net.(*netFD).accept+0x44							/usr/local/go/src/net/fd_unix.go:172
#	0x5b3351	net.(*TCPListener).accept+0x31							/usr/local/go/src/net/tcpsock_posix.go:139
#	0x5b2144	net.(*TCPListener).Accept+0x64							/usr/local/go/src/net/tcpsock.go:261
#	0xc730fe	google.golang.org/grpc.(*Server).Serve+0x27e					/go/pkg/mod/google.golang.org/[email protected]/server.go:621
#	0x15cc9bc	github.com/thanos-io/thanos/pkg/server/grpc.(*Server).ListenAndServe+0x21c	/app/pkg/server/grpc/grpc.go:128
#	0x196f6a5	main.runQuery.func15+0x45							/app/cmd/thanos/query.go:611
#	0x636566	github.com/oklog/run.(*Group).Run.func1+0x26					/go/pkg/mod/github.com/oklog/[email protected]/group.go:38

1 @ 0x43af85 0x44c1f7 0x1064c4d 0x472121
#	0x1064c4c	go.opencensus.io/stats/view.(*worker).start+0xcc	/go/pkg/mod/[email protected]/stats/view/worker.go:276

1 @ 0x43af85 0x44c1f7 0x193d375 0x196bf9c 0x636567 0x472121
#	0x193d374	main.interrupt+0x134				/app/cmd/thanos/main.go:166
#	0x196bf9b	main.main.func4+0x3b				/app/cmd/thanos/main.go:139
#	0x636566	github.com/oklog/run.(*Group).Run.func1+0x26	/go/pkg/mod/github.com/oklog/[email protected]/group.go:38

1 @ 0x43af85 0x44c1f7 0x193d6bb 0x196c065 0x636567 0x472121
#	0x193d6ba	main.reload+0x11a				/app/cmd/thanos/main.go:179
#	0x196c064	main.main.func6+0x44				/app/cmd/thanos/main.go:149
#	0x636566	github.com/oklog/run.(*Group).Run.func1+0x26	/go/pkg/mod/github.com/oklog/[email protected]/group.go:38

1 @ 0x43af85 0x44c1f7 0xe3b8ee 0x196de85 0x636567 0x472121
#	0xe3b8ed	github.com/thanos-io/thanos/pkg/runutil.Repeat+0xed	/app/pkg/runutil/runutil.go:78
#	0x196de84	main.runQuery.func3+0xa4				/app/cmd/thanos/query.go:431
#	0x636566	github.com/oklog/run.(*Group).Run.func1+0x26		/go/pkg/mod/github.com/oklog/[email protected]/group.go:38

1 @ 0x43af85 0x44c1f7 0xe3b8ee 0x196f2c5 0x636567 0x472121
#	0xe3b8ed	github.com/thanos-io/thanos/pkg/runutil.Repeat+0xed	/app/pkg/runutil/runutil.go:78
#	0x196f2c4	main.runQuery.func9+0x284				/app/cmd/thanos/query.go:484
#	0x636566	github.com/oklog/run.(*Group).Run.func1+0x26		/go/pkg/mod/github.com/oklog/[email protected]/group.go:38

1 @ 0x46c01d 0xccb86e 0xccb645 0xcc81d2 0xcd9c45 0xcdb497 0x7b9084 0x7baf0d 0x7bc643 0x7b7b6d 0x472121
#	0x46c01c	runtime/pprof.runtime_goroutineProfileWithLabels+0x5c	/usr/local/go/src/runtime/mprof.go:716
#	0xccb86d	runtime/pprof.writeRuntimeProfile+0xcd			/usr/local/go/src/runtime/pprof/pprof.go:724
#	0xccb644	runtime/pprof.writeGoroutine+0xa4			/usr/local/go/src/runtime/pprof/pprof.go:684
#	0xcc81d1	runtime/pprof.(*Profile).WriteTo+0x3f1			/usr/local/go/src/runtime/pprof/pprof.go:331
#	0xcd9c44	net/http/pprof.handler.ServeHTTP+0x384			/usr/local/go/src/net/http/pprof/pprof.go:253
#	0xcdb496	net/http/pprof.Index+0x8d6				/usr/local/go/src/net/http/pprof/pprof.go:371
#	0x7b9083	net/http.HandlerFunc.ServeHTTP+0x43			/usr/local/go/src/net/http/server.go:2049
#	0x7baf0c	net/http.(*ServeMux).ServeHTTP+0x1ac			/usr/local/go/src/net/http/server.go:2428
#	0x7bc642	net/http.serverHandler.ServeHTTP+0xa2			/usr/local/go/src/net/http/server.go:2867
#	0x7b7b6c	net/http.(*conn).serve+0x8cc				/usr/local/go/src/net/http/server.go:1932

1 @ 0x7b12a1 0x472121
#	0x7b12a0	net/http.(*connReader).backgroundRead+0x0	/usr/local/go/src/net/http/server.go:671

The Thanos Store Gateway is started with the following options:

thanos store --http-address=0.0.0.0:19091 --grpc-address=0.0.0.0:19092 --log.level=debug --store.grpc.series-max-concurrency=30 --chunk-pool-size=8GB --index-cache-size=256MB --data-dir=/mnt/thanos_tsdb/thanos-storegateway --objstore.config-file=/opt/thanos/bucket_config.yaml --tracing.config-file=/opt/thanos/backend_jaeger.yaml
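`--store.grpc.series-max-concurrency` sets the maximum number of concurrent Series calls the store serves (30 here, 60 in the later comment). As a generic illustration of that kind of limit, and not Thanos's actual implementation, the sketch below uses a buffered channel as a counting semaphore and reports the peak number of calls in flight; `runBounded` is a hypothetical name and `time.Sleep` stands in for a slow, blocking read.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runBounded runs `tasks` calls of work with at most `limit` in flight at once,
// using a buffered channel as a counting semaphore, and returns the peak
// number of calls observed running concurrently.
func runBounded(tasks, limit int, work func()) int {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var inFlight, peak int64
	for i := 0; i < tasks; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot; blocks while `limit` calls are running
			defer func() { <-sem }() // release the slot (runs before wg.Done)
			n := atomic.AddInt64(&inFlight, 1)
			// Record n as the new peak if it is higher than the current one.
			for {
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			work()
			atomic.AddInt64(&inFlight, -1)
		}()
	}
	wg.Wait()
	return int(atomic.LoadInt64(&peak))
}

func main() {
	// 100 simulated slow reads, at most 30 at a time (the first value in this issue).
	peak := runBounded(100, 30, func() { time.Sleep(10 * time.Millisecond) })
	fmt.Printf("peak concurrent calls: %d\n", peak)
}
```

Since a call that blocks in a system call (such as a file read) can occupy an OS thread while it waits, a limit like this also bounds how many threads that work can pin at once; the trade-off is extra queueing latency for calls waiting on a slot.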

@bennyboom38
Author

Hello!
We have changed the Thanos Store parameters to the following:
--store.grpc.series-max-concurrency=60 --chunk-pool-size=8GB --index-cache-size=8GB
It seems better now (Thanos Store no longer crashes when we query more than 7 days of history), but we have lost part of the data, so for now we only have 1 month of data.
