make a copy of KVs coming from boltdb which are only valid until the transaction is valid #2971

Merged: 2 commits, Aug 3, 2020


@sandeepsukhani sandeepsukhani commented Aug 1, 2020

What this PR does:
We noticed a panic when running Loki with boltdb-shipper at scale; the process dumped the stacks of all goroutines. The relevant stack trace follows:

unexpected fault address 0x7f28c4607156
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x7f28c4607156 pc=0x401fff]

goroutine 47451747 [running]:
runtime.throw(0x27e6f96, 0x5)
	/usr/local/go/src/runtime/panic.go:1116 +0x72 fp=0xc017e78fa8 sp=0xc017e78f78 pc=0x434ae2
runtime.sigpanic()
	/usr/local/go/src/runtime/signal_unix.go:702 +0x3cc fp=0xc017e78fd8 sp=0xc017e78fa8 pc=0x44b70c
cmpbody()
	/usr/local/go/src/internal/bytealg/compare_amd64.s:48 +0x3f fp=0xc017e78fe0 sp=0xc017e78fd8 pc=0x401fff
bytes.Compare(...)
	/usr/local/go/src/bytes/bytes.go:27
github.com/cortexproject/cortex/pkg/chunk/util.(*filteringBatchIter).Next(0xc00cd36c00, 0x2e0aa00)
	/src/loki/vendor/github.com/cortexproject/cortex/pkg/chunk/util/util.go:96 +0x16c fp=0xc017e79050 sp=0xc017e78fe0 pc=0x1ca01ec
github.com/cortexproject/cortex/pkg/chunk.(*baseStore).lookupEntriesByQueries.func1(0xc102781120, 0x14, 0xc1027812a0, 0x15, 0x0, 0x0, 0x0, 0xc00ef0ec00, 0x2c, 0x2c, ...)
	/src/loki/vendor/github.com/cortexproject/cortex/pkg/chunk/chunk_store.go:529 +0xa3 fp=0xc017e79128 sp=0xc017e79050 pc=0xd3f313
github.com/cortexproject/cortex/pkg/chunk/util.QueryFilter.func1(0xc102781120, 0x14, 0xc1027812a0, 0x15, 0x0, 0x0, 0x0, 0xc00ef0ec00, 0x2c, 0x2c, ...)
	/src/loki/vendor/github.com/cortexproject/cortex/pkg/chunk/util/util.go:114 +0xe5 fp=0xc017e791d0 sp=0xc017e79128 pc=0x1ca0eb5
github.com/cortexproject/cortex/pkg/chunk/storage.(*cachingIndexClient).QueryPages(0xc0007bca40, 0x2e2f2c0, 0xc0250af680, 0xc0191e8700, 0x10, 0x10, 0xc00a583160, 0x0, 0x0)
	/src/loki/vendor/github.com/cortexproject/cortex/pkg/chunk/storage/caching_index_client.go:176 +0xe7d fp=0xc017e79ad0 sp=0xc017e791d0 pc=0x1fa082d
github.com/cortexproject/cortex/pkg/chunk.(*baseStore).lookupEntriesByQueries(0xc000162600, 0x2e2f2c0, 0xc0250af680, 0xc0191e8700, 0x10, 0x10, 0x0, 0x0, 0x0, 0x0, ...)
	/src/loki/vendor/github.com/cortexproject/cortex/pkg/chunk/chunk_store.go:526 +0x1d6 fp=0xc017e79ba0 sp=0xc017e79ad0 pc=0xd11ad6
github.com/cortexproject/cortex/pkg/chunk.(*baseStore).lookupIdsByMetricNameMatcher(0xc000162600, 0x2e2f2c0, 0xc0250af410, 0x173a5c99b80, 0x173a5d422d0, 0xc000cbf3fc, 0x2, 0x27e5a57, 0x4, 0xc0250aede0, ...)
	/src/loki/vendor/github.com/cortexproject/cortex/pkg/chunk/chunk_store.go:486 +0x68c fp=0xc017e79e20 sp=0xc017e79ba0 pc=0xd105cc
github.com/cortexproject/cortex/pkg/chunk.(*seriesStore).lookupSeriesByMetricNameMatcher(0xc000162600, 0x2e2f2c0, 0xc0250af2c0, 0x173a5c99b80, 0x173a5d422d0, 0xc000cbf3fc, 0x2, 0x27e5a57, 0x4, 0xc0250aede0, ...)
	/src/loki/vendor/github.com/cortexproject/cortex/pkg/chunk/series_store.go:347 +0x102 fp=0xc017e79ec8 sp=0xc017e79e20 pc=0xd33c12
github.com/cortexproject/cortex/pkg/chunk.(*seriesStore).lookupSeriesByMetricNameMatchers.func1(0xc000162600, 0xc07351f2a0, 0x173a5c99b80, 0x173a5d422d0, 0xc000cbf3fc, 0x2, 0x27e5a57, 0x4, 0x0, 0xc0c0a5f680, ...)
	/src/loki/vendor/github.com/cortexproject/cortex/pkg/chunk/series_store.go:293 +0xe0 fp=0xc017e79f80 sp=0xc017e79ec8 pc=0xd40c20
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc017e79f88 sp=0xc017e79f80 pc=0x4677c1
created by github.com/cortexproject/cortex/pkg/chunk.(*seriesStore).lookupSeriesByMetricNameMatchers
	/src/loki/vendor/github.com/cortexproject/cortex/pkg/chunk/series_store.go:292 +0x4aa

Digging into it, I came across a relevant bug: golang/go#33047.
Keys and values returned by boltdb are only valid while the transaction that produced them is open. This PR fixes the panic by making a copy of them before they are used outside the transaction.
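
To illustrate the idea (this is not the code changed in this PR), here is a minimal, standard-library-only sketch. The names `cloneBytes`, `fakeTx` and `readKV` are hypothetical, and `fakeTx` only imitates a boltdb transaction: it overwrites its buffer on close, whereas with a real memory-mapped file the read can fault, as in the trace above.

```go
package main

import "fmt"

// cloneBytes returns a copy of b in freshly allocated memory, so the result
// stays valid after the buffer backing b is released or reused.
// (Hypothetical helper, not taken from the PR.)
func cloneBytes(b []byte) []byte {
	if b == nil {
		return nil
	}
	out := make([]byte, len(b))
	copy(out, b)
	return out
}

// fakeTx is a hypothetical stand-in for a boltdb read transaction. Its page
// buffer plays the role of the memory-mapped file that returned slices alias.
type fakeTx struct{ page []byte }

// get returns a key and value that alias the transaction's page buffer.
func (t *fakeTx) get() (key, value []byte) { return t.page[:3], t.page[3:] }

// close simulates the end of the transaction by overwriting the buffer.
func (t *fakeTx) close() {
	for i := range t.page {
		t.page[i] = 'x'
	}
}

// readKV reads one pair and returns it after the transaction has ended.
// When copyKV is true, the pair is cloned while the transaction is still open.
func readKV(copyKV bool) (key, value []byte) {
	tx := &fakeTx{page: []byte("foobar")}
	key, value = tx.get()
	if copyKV {
		key, value = cloneBytes(key), cloneBytes(value)
	}
	tx.close()
	return key, value
}

func main() {
	k, v := readKV(false)
	fmt.Printf("aliased: %s=%s\n", k, v) // prints "aliased: xxx=xxx"
	k, v = readKV(true)
	fmt.Printf("copied:  %s=%s\n", k, v) // prints "copied:  foo=bar"
}
```

The copy costs one allocation per key and value, which is the price of letting callers hold on to the data after the transaction ends.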

Checklist

  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Sandeep Sukhani <[email protected]>

@pracucci pracucci left a comment

Good job spotting it! LGTM


@jtlisi jtlisi left a comment

LGTM

@pracucci pracucci merged commit 4aefed8 into cortexproject:master Aug 3, 2020