-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: use consistent iterators when needed, and when possible #58515
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 7 files at r1.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, @petermattis, and @sumeerbhola)
pkg/storage/engine.go, line 277 at r1 (raw file):
// TODO(sumeer): this partial consistency can be a source of bugs if future // code starts relying on it, but rarely uses a Reader that does not guarantee // it. Can we enumerate the current cases where KV uses Engine as a Reader?
Is Engine the only implementation that does not guarantee a consistent snapshot? It might be worth adding a ConsistentReads() bool
method to this interface, if only to force each implementation to consider the question and to serve as documentation.
pkg/storage/pebble.go, line 1204 at r1 (raw file):
// iterators that make separated locks/intents look as interleaved need to // use both simultaneously. // When the first iterator is initialized, the underlying *pebble.Iterator
So this is still grabbing the snapshot lazily? That will make an issue like #55461 difficult to address in its entirety because it means that we still have less control than we'd like of exactly when the snapshot it captured. How do we feel about pebbleReadOnly
grabbing an iterator immediately upon creation?
pkg/storage/pebble.go, line 1210 at r1 (raw file):
// which causes pebbleIterator.Close to not close iter. iter will be closed // when pebbleReadOnly.Close is called. prefixIter pebbleIterator
Not directly related to this PR, but I remember @petermattis mentioning that Pebble would give us the ability to switch an iterator into and out of prefix mode without needing to create two separate iterators. Is that something we'd like to pursue?
pkg/storage/pebble_iterator.go, line 88 at r1 (raw file):
// there are no timestamp hints, else it is created using handle. func (p *pebbleIterator) init( handle pebble.Reader, iterToClone *pebble.Iterator, opts IterOptions,
To prevent abuse and to make the intention abundantly clear in this param and in places where we hold on to this, we might want to wrap this iterator in some slim interface that only allows for cloning.
e4779bb
to
245a009
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @petermattis)
pkg/storage/engine.go, line 277 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is Engine the only implementation that does not guarantee a consistent snapshot? It might be worth adding a
ConsistentReads() bool
method to this interface, if only to force each implementation to consider the question and to serve as documentation.
Yes, Engine
is the only implementation that does not make that guarantee.
I've added ConsistentIterators
.
Can you shed any light on this question about what parts of KV use Engine
directly?
pkg/storage/pebble.go, line 1204 at r1 (raw file):
So this is still grabbing the snapshot lazily?
Yes
That will make an issue like #55461 difficult to address in its entirety because it means that we still have less control than we'd like of exactly when the snapshot it captured. How do we feel about pebbleReadOnly grabbing an iterator immediately upon creation?
I've added a TODO here. I am worried about introducing more semantic differences among different Reader
implementations. I would like to strengthen the calling-side first so we have a clearer understanding of when different Reader
implementations are being used. Once we have that, this would be a trivial change. What do you think?
Also, do we need this eager creation for pebbleBatch
too, as indicated in that issue? Unlike pebbleReadOnly
, I am guessing there are clients of pebbleBatch
who never create an iterator, so creating one eagerly may be wasteful -- are there any such cases?
pkg/storage/pebble.go, line 1210 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Not directly related to this PR, but I remember @petermattis mentioning that Pebble would give us the ability to switch an iterator into and out of prefix mode without needing to create two separate iterators. Is that something we'd like to pursue?
Do we never simultaneously need a prefix and non-prefix iterator?
If so, I can add a TODO here to remind us to improve this, since Pebble
already supports it.
pkg/storage/pebble_iterator.go, line 88 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
To prevent abuse and to make the intention abundantly clear in this param and in places where we hold on to this, we might want to wrap this iterator in some slim interface that only allows for cloning.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, @petermattis, and @sumeerbhola)
pkg/storage/pebble.go, line 1210 at r1 (raw file):
Do we never simultaneously need a prefix and non-prefix iterator?
I have a recollection that we do, though perhaps only in tests. When we initially introduced Pebble we tried to use a single iterator and switch it back and forth between prefix and non-prefix modes. Something broke, but I don't recall offhand what it was. You might be able to look back at old PRs and see if there is any commentary on the breakage.
pkg/storage/pebble.go, line 1210 at r1 (raw file): Previously, petermattis (Peter Mattis) wrote…
My memory is also hazy from late 2019 when this came up, but we did go with the single-iter path at first and came across one code path where we quickly spin up a prefixIter while having an open normal iter on a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 7 files at r1, 7 of 7 files at r2.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @itsbilal, @petermattis, and @sumeerbhola)
pkg/storage/engine.go, line 277 at r1 (raw file):
I've added ConsistentIterators.
You might want to mention in the comment that it is meant for documentation purposes, just so future readers don't think it's used for more than it is.
There are a number of users of Engine
, but there should be none in the request evaluation path. From what I'm aware, it's mostly used for server-level tasks, which access it through Store.Engine
and don't need consistency across multiple iterators.
pkg/storage/pebble.go, line 1204 at r1 (raw file):
I would like to strengthen the calling-side first so we have a clearer understanding of when different Reader implementations are being used.
By that, do you mean create an extension to the Reader
interface (i.e. ConsistentReader
) that some callers expect so that we can capture this property in the type system? If so, I'm on board with that.
Also, do we need this eager creation for pebbleBatch too, as indicated in that issue? Unlike pebbleReadOnly, I am guessing there are clients of pebbleBatch who never create an iterator, so creating one eagerly may be wasteful -- are there any such cases?
The only places where that might be the case are when a request results in a call to MVCCBlindPut/MVCCBlindConditionalPut/MVCCBlindInitPut
, but it's not clear to me that even in those cases, the pebbleBatch is never read from. It certainly is for transactional batches (see
checkIfTxnAborted) and I think it may also be read from
optimizePuts` even for non-transactional batches.
pkg/storage/pebble.go, line 1210 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
My memory is also hazy from late 2019 when this came up, but we did go with the single-iter path at first and came across one code path where we quickly spin up a prefixIter while having an open normal iter on a
pebbleReadOnly
. One of the unit tests surfaced that, but I don't remember which one it was, and I can't find the corresponding PR.
It sounds like we should open a TODO to track this then. Bilal, do you have thoughts about how we could try to surface the issue again so we can get a better feel for what the blocker is? Or how we could hunt down that old PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @sumeerbhola)
pkg/storage/pebble.go, line 1210 at r1 (raw file):
It sounds like we should open a TODO to track this then. Bilal, do you have thoughts about how we could try to surface the issue again so we can get a better feel for what the blocker is? Or how we could hunt down that old PR?
Agreed. Part of the intention behind Pebble's SeekPrefixGE
API was being able to switch an iterator from normal to prefix iteration and back. I'm hopeful we can get down to a single iterator. If we can't for some reason, it would be good to know why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 7 files at r1, 7 of 7 files at r2.
Reviewable status:complete! 2 of 0 LGTMs obtained (waiting on @nvanbenschoten, @petermattis, and @sumeerbhola)
pkg/storage/pebble.go, line 1210 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
It sounds like we should open a TODO to track this then. Bilal, do you have thoughts about how we could try to surface the issue again so we can get a better feel for what the blocker is? Or how we could hunt down that old PR?
Agreed. Part of the intention behind Pebble's
SeekPrefixGE
API was being able to switch an iterator from normal to prefix iteration and back. I'm hopeful we can get down to a single iterator. If we can't for some reason, it would be good to know why.
I can take this on as a follow-on task. Finding the test case could be as simple as panicking on the second concurrent iterator opening, and running the test suite. Let me try that and report back.
For pebbleBatch and pebbleReadOnly, all iterators without timestamp hints see the same underlying engine state, with no interface change for the caller. This is done by cloning the first created pebble.Iterator. intentInterleavingIter explicitly requests a clone, when it creates two iterators, which ensures the consistency guarantee applies to all Reader implementations. Informs cockroachdb#55461 Informs cockroachdb#41720 Release note: None
245a009
to
49d259b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs!
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @itsbilal, @nvanbenschoten, and @petermattis)
pkg/storage/engine.go, line 277 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I've added ConsistentIterators.
You might want to mention in the comment that it is meant for documentation purposes, just so future readers don't think it's used for more than it is.
There are a number of users of
Engine
, but there should be none in the request evaluation path. From what I'm aware, it's mostly used for server-level tasks, which access it throughStore.Engine
and don't need consistency across multiple iterators.
I've added panics in a couple of places in replica_read.go
and replica_write.go
so that this is more than documentation.
pkg/storage/pebble.go, line 1204 at r1 (raw file):
By that, do you mean create an extension to the Reader interface (i.e. ConsistentReader) that some callers expect so that we can capture this property in the type system? If so, I'm on board with that.
Something like that. I've added a link to this discussion to the TODO so that I don't forget.
pkg/storage/pebble.go, line 1210 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
I can take this on as a follow-on task. Finding the test case could be as simple as panicking on the second concurrent iterator opening, and running the test suite. Let me try that and report back.
Thanks Bilal!
bors r+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r3.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @itsbilal, @nvanbenschoten, and @petermattis)
Build succeeded: |
For pebbleBatch and pebbleReadOnly, all iterators without timestamp
hints see the same underlying engine state, with no interface
change for the caller. This is done by cloning the first created
pebble.Iterator.
intentInterleavingIter explicitly requests a clone, when it creates
two iterators, which ensures the consistency guarantee applies to
all Reader implementations.
Informs #55461
Informs #41720
Release note: None