Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

segfault in store binary index header feature #2147

Closed
brancz opened this issue Feb 18, 2020 · 5 comments · Fixed by #2151
Closed

segfault in store binary index header feature #2147

brancz opened this issue Feb 18, 2020 · 5 comments · Fixed by #2151

Comments

@brancz
Copy link
Member

brancz commented Feb 18, 2020

Thanos, Prometheus and Golang version used:

Thanos master-2020-02-13-adfef4b5

Object Storage Provider: AWS S3

What happened: Store errored with segfault.

What you expected to happen: Store does not segfault 🙂

How to reproduce it (as minimally and precisely as possible): Unclear, but the thing to note probably is that this store has the experimental binary index header feature enabled.

Full logs to relevant components:

Full log from start of the store:

https://gist.github.com/brancz/2cd7d982b844b521bf4301895bafb61b

Anything else we need to know:

n/a

@bwplotka
Copy link
Member

bwplotka commented Feb 18, 2020

Thanks for report.

Segfault seems to be here:

// removeElement is used to remove a given list element from the cache
func (c *LRU) removeElement(e *list.Element) {
	c.evictList.Remove(e)
	kv := e.Value.(*entry)
->	delete(c.items, kv.key)
	if c.onEvict != nil {
		c.onEvict(kv.key, kv.value)
	}
}

And particularly for cacheKeyPostings which is created here:

// StorePostings sets the postings identified by the ulid and label to the value v,
// if the postings already exists in the cache it is not mutated.
func (c *InMemoryIndexCache) StorePostings(ctx context.Context, blockID ulid.ULID, l labels.Label, v []byte) {
	c.set(cacheTypePostings, cacheKey{blockID, cacheKeyPostings(l)}, v)

It might be the case that l labels.Label are backed up by mmaped memory, so if:

  • Block is removed/unloaded
  • Mmap unloads for other reasons?

We will have SEGFault. Implementing a fix now.

@bwplotka
Copy link
Member

bwplotka commented Feb 18, 2020

The exact issue is when using hash key in ASM:

	// load data to be hashed
	MOVOU	(AX), X2

Bad news is that I cannot reproduce this locally with the test cases I thought they will trigger this...

@bwplotka
Copy link
Member

Trying to repro this here: ea6a26c

No luck yet.

bwplotka added a commit that referenced this issue Feb 18, 2020
Fixes: #2147

+ Test for Repro.

Signed-off-by: Bartlomiej Plotka <[email protected]>
@bwplotka
Copy link
Member

Ok, got it. Was actually pretty close with reproduction (:

#2151

Fix will be shipped tomorrow, hopefully before RC.

cc @metalmatze @brancz

bwplotka added a commit that referenced this issue Feb 19, 2020
Fixes: #2147

+ Test for Repro.

Signed-off-by: Bartlomiej Plotka <[email protected]>
bwplotka added a commit that referenced this issue Feb 19, 2020
@bwplotka
Copy link
Member

Fixed: #2151

bwplotka added a commit that referenced this issue Feb 19, 2020
vankop pushed a commit to monitoring-tools/thanos that referenced this issue Feb 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants