
Blockstore: Migrate ShredIndex type to more efficient data structure #3900

Open · wants to merge 52 commits into base: master

Conversation


@cpubot cpubot commented Dec 3, 2024

Problem

The current blockstore index type, backed by a BTreeSet, suffers from poor serialization/deserialization performance due to its dynamically allocated, self-balancing nature. See #3570 for context.

Summary of Changes

Fixes #3570

This PR reimplements the ShredIndex behavior on top of a bit vector (Vec<u8>).
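
To make the shape of this concrete, here is a minimal sketch of a bit-vector index. The type and method names are hypothetical, not the PR's exact implementation:

```rust
// Sketch of a bit-vector shred index (hypothetical names, not the PR's
// exact code): each potential shred index in a slot maps to a single
// bit, so inserts and membership checks are O(1) bit operations and the
// whole index serializes as a flat byte buffer.
pub struct BitVecShredIndex {
    bits: Vec<u8>,
    num_shreds: usize,
}

impl BitVecShredIndex {
    pub fn new(max_shreds_per_slot: usize) -> Self {
        Self {
            // Round up so a limit that isn't a multiple of 8 still fits.
            bits: vec![0u8; (max_shreds_per_slot + 7) / 8],
            num_shreds: 0,
        }
    }

    pub fn insert(&mut self, index: u64) {
        let (byte, bit) = (index as usize / 8, index as usize % 8);
        if self.bits[byte] & (1 << bit) == 0 {
            self.bits[byte] |= 1 << bit;
            self.num_shreds += 1;
        }
    }

    pub fn contains(&self, index: u64) -> bool {
        let (byte, bit) = (index as usize / 8, index as usize % 8);
        self.bits[byte] & (1 << bit) != 0
    }
}
```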

Migration strategy

The general goal is to avoid the overhead of writing two separate columns while still supporting the downgrade path. While this could be accomplished with two distinct columns, rather than writing to a new column, I am proposing that we support both formats in the same column. This avoids an additional read to rocksdb, as we can attempt deserialization of both formats on the same rocksdb slice (a rough sketch of this fallback read follows the release steps below). To support the downgrade path, this initial PR will solely add support for reading the new format.

Release steps

The idea here is to split the column migration across three releases such that:

  1. The initial release simply adds support for reading the new format as a fallback case, and does not write the new format.
    • This lays the foundation for a downgrade. For example, assume operators have upgraded to release 2/3 in the chain (step 2 below) and have therefore been writing solely to the new format. In the event of a downgrade, release 1/3 still understands how to read the new format, while continuing to read and write the legacy version.
    • This ensures release 1/3 doesn't incur the overhead of serializing and writing the new format, but can still understand and use it in the event of a downgrade.
  2. This release reads and writes the new format as its primary target, yet still understands the legacy format for fallback reads (i.e., we swap the deserialization attempt order). It does not write the legacy format.
    • This instantiates the migration. We can safely downgrade to release 1 because it understands how to read the new format written in release 2.
  3. Once the release is considered stable and we don't anticipate a downgrade, we can remove support for the legacy format and its associated fallback reads.
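
To illustrate step 1's fallback read, here is a rough sketch assuming bincode/serde. All type and function names here are hypothetical stand-ins, not the PR's actual code:

```rust
use serde::{Deserialize, Serialize};
use std::collections::BTreeSet;

// Hypothetical stand-ins for the two on-disk layouts.
#[derive(Serialize, Deserialize)]
struct LegacyShredIndex {
    data_index: BTreeSet<u64>,
}

#[derive(Serialize, Deserialize)]
struct NewShredIndex {
    bits: Vec<u8>,
}

/// Release-1/3 read path: try the legacy layout first, then fall back
/// to the new one so a downgrade from release 2/3 remains readable.
/// Release 2/3 simply swaps the attempt order. Note the two layouts
/// must be distinguishable on the same slice, e.g. via the fixed
/// serialized size discussed later in this thread.
fn read_index(slice: &[u8]) -> bincode::Result<LegacyShredIndex> {
    bincode::deserialize::<LegacyShredIndex>(slice)
        .or_else(|_| bincode::deserialize::<NewShredIndex>(slice).map(into_legacy))
}

// Hypothetical conversion for the fallback: rebuild the BTreeSet from
// the set bits so callers keep seeing the legacy type.
fn into_legacy(new: NewShredIndex) -> LegacyShredIndex {
    let mut data_index = BTreeSet::new();
    for (byte, &b) in new.bits.iter().enumerate() {
        for bit in 0..8 {
            if b & (1u8 << bit) != 0 {
                data_index.insert((byte * 8 + bit) as u64);
            }
        }
    }
    LegacyShredIndex { data_index }
}
```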

@cpubot cpubot changed the title from "Shred index next" to "Blockstore: Migrate ShredIndex type to more efficient data structure" on Dec 3, 2024
@bw-solana bw-solana left a comment

Looks pretty good to me. I like the phased adoption strategy you've proposed.

Left a couple nits and potential perf optimizations.

My only real concern is how tied we are to the current shred limit. It should be easy to transition to a world where the shred limit is >32k.

(Inline review threads on ledger/src/blockstore.rs and ledger/src/blockstore_meta.rs; resolved)
@cpubot cpubot (Author) commented Dec 6, 2024

> Looks pretty good to me. I like the phased adoption strategy you've proposed.
>
> Left a couple nits and potential perf optimizations.
>
> My only real concern is how tied we are to the current shred limit. It should be easy to transition to a world where the shred limit is >32k.

This is @steviez's strategy for the record! Just my slightly tweaked (for my own clarity) summary of it.

Almost all of the code should be portable to a >32k world, since MAX_U64S_PER_SLOT is a function of MAX_DATA_SHREDS_PER_SLOT, and everything is written against either MAX_U64S_PER_SLOT or MAX_DATA_SHREDS_PER_SLOT.
(Also just pushed a change to ensure we have an extra u64 slot available if MAX_DATA_SHREDS_PER_SLOT % 64 != 0.)

I believe the only thing that would require change when we increase this limit is the deserialization function. Right now it explicitly checks std::mem::size_of::<[u64; MAX_U64S_PER_SLOT]>(), but this could easily be made more flexible when the limit is increased.
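
For illustration, the relationship between the two constants can be written as a ceiling division. The concrete values below assume the current 32k limit mentioned in this thread; this is my sketch, not code from the PR:

```rust
// Ceiling division: enough u64 words to hold one bit per possible data
// shred, with an extra word when the limit isn't a multiple of 64.
// 32_768 reflects the current 32k limit discussed above (assumption).
const MAX_DATA_SHREDS_PER_SLOT: usize = 32_768;
const MAX_U64S_PER_SLOT: usize = (MAX_DATA_SHREDS_PER_SLOT + 63) / 64;

fn main() {
    // 32_768 / 64 = 512 words, i.e. a 4096-byte payload, which is the
    // size the deserialization function can check against.
    assert_eq!(MAX_U64S_PER_SLOT, 512);
    assert_eq!(std::mem::size_of::<[u64; MAX_U64S_PER_SLOT]>(), 4096);
}
```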

@cpubot cpubot requested a review from bw-solana December 6, 2024 09:04
bw-solana previously approved these changes Dec 6, 2024
@bw-solana bw-solana left a comment

LGTM

@steviez steviez left a comment

Will review this later today - just throwing this in here so I get a chance to do so before merging

@steviez steviez left a comment

Overall, things are looking good! I have a bunch of minor comments and two more major items.

Endianness - Our current situation is kind of all over the place:

  • We use big endian for keys
  • We use little endian for things that are serialized (little is default for bincode)

If we use native endianness to write these, that might break some existing guarantees/workflows. One such case would be downloading rocksdb backups from our warehouse node uploads. Our nodes push up compressed rocksdb archives, so things would obviously break if the native endianness differed between the warehouse node and the downloading node. I need to think about this aspect a bit more.
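
One conventional way to avoid the native-endianness hazard is to fix the byte order explicitly at (de)serialization time. A sketch of that idea, not the PR's code:

```rust
// Sketch: write index words with an explicit (little-endian) byte
// order so rocksdb archives stay portable across machines regardless
// of native endianness.
fn serialize_words(words: &[u64]) -> Vec<u8> {
    let mut out = Vec::with_capacity(words.len() * 8);
    for w in words {
        out.extend_from_slice(&w.to_le_bytes());
    }
    out
}

fn deserialize_words(bytes: &[u8]) -> Vec<u64> {
    bytes
        .chunks_exact(8)
        .map(|chunk| u64::from_le_bytes(chunk.try_into().unwrap()))
        .collect()
}
```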

Migration strategy - In the steps I originally wrote up, I mentioned targeting v2.1 as the version to land the write-legacy-format-but-can-read-new-format change. Given that the tip of master is currently v2.2, this means we'd have to backport (BP). If we want to BP, I think we need to slim this PR down as much as possible given where v2.1 is in its release cycle. That would mean including only what is necessary to deserialize into the new type + convert it into the old type (plus a compatibility test).

That being said, we should probably ensure we have some amount of buy-in for the v2.1 BP before you go restructure stuff. @bw-solana, what are your thoughts on this?

(Inline review threads on ledger/src/blockstore.rs, ledger/src/blockstore_db.rs, and ledger/src/blockstore_meta.rs; resolved)
@vadorovsky vadorovsky (Member) left a comment

Overall looks good

(Inline review thread on ledger/src/blockstore_meta.rs)
@cpubot cpubot requested a review from bw-solana January 7, 2025 20:13
@cpubot cpubot (Author) commented Jan 7, 2025

Modified version of the branch reading/writing the new format on mnb:
[screenshot: 2025-01-07-150223_hyprshot]

bw-solana previously approved these changes Jan 7, 2025

@bw-solana bw-solana left a comment

Latest looks good to me. Would be good for others to weigh in before merging

@steviez steviez left a comment

Just a few minor things; this is looking good in general, though, and I think it's pretty close to being ready.

(Inline review threads on ledger/src/blockstore_db.rs and ledger/src/blockstore_meta.rs)
@alessandrod alessandrod self-requested a review January 10, 2025 01:57
@alessandrod alessandrod left a comment

looks great, just a couple of nits

(Inline review threads on ledger/src/blockstore_meta.rs and ledger/src/blockstore_db.rs)
@cpubot cpubot requested review from steviez and alessandrod January 11, 2025 17:25
Labels: v2.1 (Backport to v2.1 branch)
8 participants