Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: use consistent iterators when needed, and when possible #58515

Merged
merged 1 commit into from
Jan 11, 2021

Conversation

sumeerbhola
Copy link
Collaborator

For pebbleBatch and pebbleReadOnly, all iterators without timestamp
hints see the same underlying engine state, with no interface
change for the caller. This is done by cloning the first created
pebble.Iterator.

intentInterleavingIter explicitly requests a clone, when it creates
two iterators, which ensures the consistency guarantee applies to
all Reader implementations.

Informs #55461
Informs #41720

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 7 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, @petermattis, and @sumeerbhola)


pkg/storage/engine.go, line 277 at r1 (raw file):

// TODO(sumeer): this partial consistency can be a source of bugs if future
// code starts relying on it, but rarely uses a Reader that does not guarantee
// it. Can we enumerate the current cases where KV uses Engine as a Reader?

Is Engine the only implementation that does not guarantee a consistent snapshot? It might be worth adding a ConsistentReads() bool method to this interface, if only to force each implementation to consider the question and to serve as documentation.


pkg/storage/pebble.go, line 1204 at r1 (raw file):

	// iterators that make separated locks/intents look as interleaved need to
	// use both simultaneously.
	// When the first iterator is initialized, the underlying *pebble.Iterator

So this is still grabbing the snapshot lazily? That will make an issue like #55461 difficult to address in its entirety because it means that we still have less control than we'd like of exactly when the snapshot it captured. How do we feel about pebbleReadOnly grabbing an iterator immediately upon creation?


pkg/storage/pebble.go, line 1210 at r1 (raw file):

	// which causes pebbleIterator.Close to not close iter. iter will be closed
	// when pebbleReadOnly.Close is called.
	prefixIter       pebbleIterator

Not directly related to this PR, but I remember @petermattis mentioning that Pebble would give us the ability to switch an iterator into and out of prefix mode without needing to create two separate iterators. Is that something we'd like to pursue?


pkg/storage/pebble_iterator.go, line 88 at r1 (raw file):

// there are no timestamp hints, else it is created using handle.
func (p *pebbleIterator) init(
	handle pebble.Reader, iterToClone *pebble.Iterator, opts IterOptions,

To prevent abuse and to make the intention abundantly clear in this param and in places where we hold on to this, we might want to wrap this iterator in some slim interface that only allows for cloning.

Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @petermattis)


pkg/storage/engine.go, line 277 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is Engine the only implementation that does not guarantee a consistent snapshot? It might be worth adding a ConsistentReads() bool method to this interface, if only to force each implementation to consider the question and to serve as documentation.

Yes, Engine is the only implementation that does not make that guarantee.
I've added ConsistentIterators.
Can you shed any light on this question about what parts of KV use Engine directly?


pkg/storage/pebble.go, line 1204 at r1 (raw file):

So this is still grabbing the snapshot lazily?

Yes

That will make an issue like #55461 difficult to address in its entirety because it means that we still have less control than we'd like of exactly when the snapshot it captured. How do we feel about pebbleReadOnly grabbing an iterator immediately upon creation?

I've added a TODO here. I am worried about introducing more semantic differences among different Reader implementations. I would like to strengthen the calling-side first so we have a clearer understanding of when different Reader implementations are being used. Once we have that, this would be a trivial change. What do you think?

Also, do we need this eager creation for pebbleBatch too, as indicated in that issue? Unlike pebbleReadOnly, I am guessing there are clients of pebbleBatch who never create an iterator, so creating one eagerly may be wasteful -- are there any such cases?


pkg/storage/pebble.go, line 1210 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Not directly related to this PR, but I remember @petermattis mentioning that Pebble would give us the ability to switch an iterator into and out of prefix mode without needing to create two separate iterators. Is that something we'd like to pursue?

Do we never simultaneously need a prefix and non-prefix iterator?
If so, I can add a TODO here to remind us to improve this, since Pebble already supports it.


pkg/storage/pebble_iterator.go, line 88 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

To prevent abuse and to make the intention abundantly clear in this param and in places where we hold on to this, we might want to wrap this iterator in some slim interface that only allows for cloning.

Done

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, @petermattis, and @sumeerbhola)


pkg/storage/pebble.go, line 1210 at r1 (raw file):

Do we never simultaneously need a prefix and non-prefix iterator?

I have a recollection that we do, though perhaps only in tests. When we initially introduced Pebble we tried to use a single iterator and switch it back and forth between prefix and non-prefix modes. Something broke, but I don't recall offhand what it was. You might be able to look back at old PRs and see if there is any commentary on the breakage.

@itsbilal
Copy link
Contributor

itsbilal commented Jan 7, 2021


pkg/storage/pebble.go, line 1210 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Do we never simultaneously need a prefix and non-prefix iterator?

I have a recollection that we do, though perhaps only in tests. When we initially introduced Pebble we tried to use a single iterator and switch it back and forth between prefix and non-prefix modes. Something broke, but I don't recall offhand what it was. You might be able to look back at old PRs and see if there is any commentary on the breakage.

My memory is also hazy from late 2019 when this came up, but we did go with the single-iter path at first and came across one code path where we quickly spin up a prefixIter while having an open normal iter on a pebbleReadOnly. One of the unit tests surfaced that, but I don't remember which one it was, and I can't find the corresponding PR.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 7 files at r1, 7 of 7 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal, @petermattis, and @sumeerbhola)


pkg/storage/engine.go, line 277 at r1 (raw file):

I've added ConsistentIterators.

You might want to mention in the comment that it is meant for documentation purposes, just so future readers don't think it's used for more than it is.

There are a number of users of Engine, but there should be none in the request evaluation path. From what I'm aware, it's mostly used for server-level tasks, which access it through Store.Engine and don't need consistency across multiple iterators.


pkg/storage/pebble.go, line 1204 at r1 (raw file):

I would like to strengthen the calling-side first so we have a clearer understanding of when different Reader implementations are being used.

By that, do you mean create an extension to the Reader interface (i.e. ConsistentReader) that some callers expect so that we can capture this property in the type system? If so, I'm on board with that.

Also, do we need this eager creation for pebbleBatch too, as indicated in that issue? Unlike pebbleReadOnly, I am guessing there are clients of pebbleBatch who never create an iterator, so creating one eagerly may be wasteful -- are there any such cases?

The only places where that might be the case are when a request results in a call to MVCCBlindPut/MVCCBlindConditionalPut/MVCCBlindInitPut, but it's not clear to me that even in those cases, the pebbleBatch is never read from. It certainly is for transactional batches (see checkIfTxnAborted) and I think it may also be read from optimizePuts` even for non-transactional batches.


pkg/storage/pebble.go, line 1210 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

My memory is also hazy from late 2019 when this came up, but we did go with the single-iter path at first and came across one code path where we quickly spin up a prefixIter while having an open normal iter on a pebbleReadOnly. One of the unit tests surfaced that, but I don't remember which one it was, and I can't find the corresponding PR.

It sounds like we should open a TODO to track this then. Bilal, do you have thoughts about how we could try to surface the issue again so we can get a better feel for what the blocker is? Or how we could hunt down that old PR?

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @sumeerbhola)


pkg/storage/pebble.go, line 1210 at r1 (raw file):

It sounds like we should open a TODO to track this then. Bilal, do you have thoughts about how we could try to surface the issue again so we can get a better feel for what the blocker is? Or how we could hunt down that old PR?

Agreed. Part of the intention behind Pebble's SeekPrefixGE API was being able to switch an iterator from normal to prefix iteration and back. I'm hopeful we can get down to a single iterator. If we can't for some reason, it would be good to know why.

Copy link
Contributor

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 7 files at r1, 7 of 7 files at r2.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @nvanbenschoten, @petermattis, and @sumeerbhola)


pkg/storage/pebble.go, line 1210 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

It sounds like we should open a TODO to track this then. Bilal, do you have thoughts about how we could try to surface the issue again so we can get a better feel for what the blocker is? Or how we could hunt down that old PR?

Agreed. Part of the intention behind Pebble's SeekPrefixGE API was being able to switch an iterator from normal to prefix iteration and back. I'm hopeful we can get down to a single iterator. If we can't for some reason, it would be good to know why.

I can take this on as a follow-on task. Finding the test case could be as simple as panicking on the second concurrent iterator opening, and running the test suite. Let me try that and report back.

For pebbleBatch and pebbleReadOnly, all iterators without timestamp
hints see the same underlying engine state, with no interface
change for the caller. This is done by cloning the first created
pebble.Iterator.

intentInterleavingIter explicitly requests a clone, when it creates
two iterators, which ensures the consistency guarantee applies to
all Reader implementations.

Informs cockroachdb#55461
Informs cockroachdb#41720

Release note: None
Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @itsbilal, @nvanbenschoten, and @petermattis)


pkg/storage/engine.go, line 277 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I've added ConsistentIterators.

You might want to mention in the comment that it is meant for documentation purposes, just so future readers don't think it's used for more than it is.

There are a number of users of Engine, but there should be none in the request evaluation path. From what I'm aware, it's mostly used for server-level tasks, which access it through Store.Engine and don't need consistency across multiple iterators.

I've added panics in a couple of places in replica_read.go and replica_write.go so that this is more than documentation.


pkg/storage/pebble.go, line 1204 at r1 (raw file):

By that, do you mean create an extension to the Reader interface (i.e. ConsistentReader) that some callers expect so that we can capture this property in the type system? If so, I'm on board with that.

Something like that. I've added a link to this discussion to the TODO so that I don't forget.


pkg/storage/pebble.go, line 1210 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

I can take this on as a follow-on task. Finding the test case could be as simple as panicking on the second concurrent iterator opening, and running the test suite. Let me try that and report back.

Thanks Bilal!

@sumeerbhola
Copy link
Collaborator Author

bors r+

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @itsbilal, @nvanbenschoten, and @petermattis)

@craig
Copy link
Contributor

craig bot commented Jan 11, 2021

Build succeeded:

@craig craig bot merged commit f125293 into cockroachdb:master Jan 11, 2021
@sumeerbhola sumeerbhola deleted the iter_clone branch February 11, 2021 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants