-
Notifications
You must be signed in to change notification settings - Fork 271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Blockstore: Migrate ShredIndex type to more efficient data structure #3900
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good to me. I like the phased adoption strategy you've proposed.
Left a couple nits and potential perf optimizations.
My only real concern is how tied we are to the current shred limit. It should be easy to transition to a world where shred limit is >32k
This is @steviez's strategy for the record! Just my slightly tweaked (for my own clarity) summary of it. Almost all of the code should be portable across I believe the only thing that would require change when we increase this limit is the deserialization function. Right now it's explicitly checking |
1aaa647
to
cac0912
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will review this later today - just throwing this in here so I get a chance to do so before merging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, things are looking good ! I have a bunch of minor comments, and two more major items.
Endianness - Our current situation is kind of all over the place:
- We use big endian for keys
- We use little endian for things that are serialized (little is default for bincode)
If we use native to write these, that might break some existing guarantees/workflows. One such case would be downloading rocksdb backups from our warehouse node uploads. Our nodes push up compressed rocksdb archives, so thing would obviously break if the native endianness different from the warehouse node and downloading node. I need to think about this aspect a bit more.
Migration strategy - In the steps I originally wrote up, I mentioned targeting v2.1 as the version to land the write-legacy-format-but-can-read-new-format change. Given that tip of master is currently v2.2, this means we'd have to BP. If we want to BP, I think we need to slim this PR down as much as possible given where v2.1
is in its' release cycle. That would mean including only what is necessary to deserialize into the new type + convert it into the old type (plus a compatibility test)
That being said, we should probably ensure we have some amount of buy in for the v2.1 BP before you go restructure stuff. @bw-solana, what are your thoughts on this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest looks good to me. Would be good for others to weigh in before merging
00f2a09
to
e676815
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few minor things, this is looking good in general tho and I think this is pretty close to being ready
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great, just a couple of nits
452d12a
to
1126b20
Compare
1126b20
to
c6d6b80
Compare
Problem
The current blockstore index type, backed by a
BTreeSet
, suffers from performance issues in serialization / deserialization due to its dynamically allocated and balanced nature. See #3570 for context.Summary of Changes
Fixes #3570
This PR implements the
ShredIndex
behavior behind a bit vec (Vec<u8>
).Migration strategy
The general goal is to avoid the overhead of writing two separate columns while still supporting the downgrade path. While this can be accomplished with two distinct columns, for this proposal, rather than writing to a new column, I am proposing we add support for both formats in the same column. This will avoid an additional read to rocksdb, as we can attempt deserialization of both formats on the same rocksdb slice. In order to support the downgrade path, this initial PR will solely add support for reading the new format.
See release steps
The idea here is to split the column migration across three releases such that: