
Blockstore: Migrate ShredIndex type to more efficient data structure #3900

Open · wants to merge 52 commits into base: master

Conversation


@cpubot cpubot commented Dec 3, 2024

Problem

The current blockstore index type, backed by a BTreeSet, suffers from poor serialization/deserialization performance due to its dynamically allocated, self-balancing nature. See #3570 for context.

Summary of Changes

Fixes #3570

This PR reimplements the ShredIndex behavior on top of a bit vector (Vec<u8>).
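
To make the shape of this concrete, here is a minimal sketch of a bit-vector index. The type and method names are hypothetical, not the PR's exact implementation:

```rust
// Sketch of a bit-vector shred index (hypothetical names, not the PR's
// exact code): each potential shred index in a slot maps to a single
// bit, so inserts and membership checks are O(1) bit operations and the
// whole index serializes as a flat byte buffer.
pub struct BitVecShredIndex {
    bits: Vec<u8>,
    num_shreds: usize,
}

impl BitVecShredIndex {
    pub fn new(max_shreds_per_slot: usize) -> Self {
        Self {
            // Round up so a limit that isn't a multiple of 8 still fits.
            bits: vec![0u8; (max_shreds_per_slot + 7) / 8],
            num_shreds: 0,
        }
    }

    pub fn insert(&mut self, index: u64) {
        let (byte, bit) = (index as usize / 8, index as usize % 8);
        if self.bits[byte] & (1 << bit) == 0 {
            self.bits[byte] |= 1 << bit;
            self.num_shreds += 1;
        }
    }

    pub fn contains(&self, index: u64) -> bool {
        let (byte, bit) = (index as usize / 8, index as usize % 8);
        self.bits[byte] & (1 << bit) != 0
    }
}
```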

Migration strategy

The general goal is to avoid the overhead of writing two separate columns while still supporting the downgrade path. While this could be accomplished with two distinct columns, rather than writing to a new column, I am proposing that we support both formats in the same column. This avoids an additional read to rocksdb, as we can attempt deserialization of both formats on the same rocksdb slice (a rough sketch of this fallback read follows the release steps below). To support the downgrade path, this initial PR will solely add support for reading the new format.

Release steps

The idea here is to split the column migration across three releases such that:

  1. The initial release simply adds support for reading the new format as a fallback case, and does not write the new format.
    • This lays the foundation for a downgrade. For example, assume operators have upgraded to release 2/3 in the chain (step 2 below) and have therefore been writing solely to the new format. In the event of a downgrade, release 1/3 still understands how to read the new format, while continuing to read and write the legacy version.
    • This ensures release 1/3 doesn't incur the overhead of serializing and writing the new format, but can still understand and use it in the event of a downgrade.
  2. This release reads and writes the new format as its primary target, yet still understands the legacy format for fallback reads (i.e., we swap the deserialization attempt order). It does not write the legacy format.
    • This instantiates the migration. We can safely downgrade to release 1 because it understands how to read the new format written in release 2.
  3. Once the release is considered stable and we don't anticipate a downgrade, we can remove support for the legacy format and its associated fallback reads.
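
To illustrate step 1's fallback read, here is a rough sketch assuming bincode/serde. All type and function names here are hypothetical stand-ins, not the PR's actual code:

```rust
use serde::{Deserialize, Serialize};
use std::collections::BTreeSet;

// Hypothetical stand-ins for the two on-disk layouts.
#[derive(Serialize, Deserialize)]
struct LegacyShredIndex {
    data_index: BTreeSet<u64>,
}

#[derive(Serialize, Deserialize)]
struct NewShredIndex {
    bits: Vec<u8>,
}

/// Release-1/3 read path: try the legacy layout first, then fall back
/// to the new one so a downgrade from release 2/3 remains readable.
/// Release 2/3 simply swaps the attempt order. Note the two layouts
/// must be distinguishable on the same slice, e.g. via the fixed
/// serialized size discussed later in this thread.
fn read_index(slice: &[u8]) -> bincode::Result<LegacyShredIndex> {
    bincode::deserialize::<LegacyShredIndex>(slice)
        .or_else(|_| bincode::deserialize::<NewShredIndex>(slice).map(into_legacy))
}

// Hypothetical conversion for the fallback: rebuild the BTreeSet from
// the set bits so callers keep seeing the legacy type.
fn into_legacy(new: NewShredIndex) -> LegacyShredIndex {
    let mut data_index = BTreeSet::new();
    for (byte, &b) in new.bits.iter().enumerate() {
        for bit in 0..8 {
            if b & (1u8 << bit) != 0 {
                data_index.insert((byte * 8 + bit) as u64);
            }
        }
    }
    LegacyShredIndex { data_index }
}
```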

@cpubot cpubot changed the title from "Shred index next" to "Blockstore: Migrate ShredIndex type to more efficient data structure" on Dec 3, 2024
@bw-solana bw-solana left a comment

Looks pretty good to me. I like the phased adoption strategy you've proposed.

Left a couple nits and potential perf optimizations.

My only real concern is how tied we are to the current shred limit. It should be easy to transition to a world where the shred limit is >32k.

(Inline review threads on ledger/src/blockstore.rs and ledger/src/blockstore_meta.rs; resolved)
@cpubot cpubot (Author) commented Dec 6, 2024

> Looks pretty good to me. I like the phased adoption strategy you've proposed.
>
> Left a couple nits and potential perf optimizations.
>
> My only real concern is how tied we are to the current shred limit. It should be easy to transition to a world where the shred limit is >32k.

This is @steviez's strategy for the record! Just my slightly tweaked (for my own clarity) summary of it.

Almost all of the code should be portable to a >32k world, since MAX_U64S_PER_SLOT is a function of MAX_DATA_SHREDS_PER_SLOT, and everything is written against either MAX_U64S_PER_SLOT or MAX_DATA_SHREDS_PER_SLOT.
(Also just pushed a change to ensure we have an extra u64 slot available if MAX_DATA_SHREDS_PER_SLOT % 64 != 0.)

I believe the only thing that would require change when we increase this limit is the deserialization function. Right now it explicitly checks std::mem::size_of::<[u64; MAX_U64S_PER_SLOT]>(), but this could easily be made more flexible when the limit is increased.
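
For illustration, the relationship between the two constants can be written as a ceiling division. The concrete values below assume the current 32k limit mentioned in this thread; this is my sketch, not code from the PR:

```rust
// Ceiling division: enough u64 words to hold one bit per possible data
// shred, with an extra word when the limit isn't a multiple of 64.
// 32_768 reflects the current 32k limit discussed above (assumption).
const MAX_DATA_SHREDS_PER_SLOT: usize = 32_768;
const MAX_U64S_PER_SLOT: usize = (MAX_DATA_SHREDS_PER_SLOT + 63) / 64;

fn main() {
    // 32_768 / 64 = 512 words, i.e. a 4096-byte payload, which is the
    // size the deserialization function can check against.
    assert_eq!(MAX_U64S_PER_SLOT, 512);
    assert_eq!(std::mem::size_of::<[u64; MAX_U64S_PER_SLOT]>(), 4096);
}
```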

@cpubot cpubot requested a review from bw-solana December 6, 2024 09:04
bw-solana previously approved these changes Dec 6, 2024
@bw-solana bw-solana left a comment

LGTM

@steviez steviez left a comment

Will review this later today - just throwing this in here so I get a chance to do so before merging

@steviez steviez left a comment

Overall, things are looking good! I have a bunch of minor comments and two more major items.

Endianness - Our current situation is kind of all over the place:

  • We use big endian for keys
  • We use little endian for things that are serialized (little is default for bincode)

If we use native endianness to write these, that might break some existing guarantees/workflows. One such case would be downloading rocksdb backups from our warehouse node uploads. Our nodes push up compressed rocksdb archives, so things would obviously break if the native endianness differed between the warehouse node and the downloading node. I need to think about this aspect a bit more.
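
One conventional way to avoid the native-endianness hazard is to fix the byte order explicitly at (de)serialization time. A sketch of that idea, not the PR's code:

```rust
// Sketch: write index words with an explicit (little-endian) byte
// order so rocksdb archives stay portable across machines regardless
// of native endianness.
fn serialize_words(words: &[u64]) -> Vec<u8> {
    let mut out = Vec::with_capacity(words.len() * 8);
    for w in words {
        out.extend_from_slice(&w.to_le_bytes());
    }
    out
}

fn deserialize_words(bytes: &[u8]) -> Vec<u64> {
    bytes
        .chunks_exact(8)
        .map(|chunk| u64::from_le_bytes(chunk.try_into().unwrap()))
        .collect()
}
```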

Migration strategy - In the steps I originally wrote up, I mentioned targeting v2.1 as the version to land the write-legacy-format-but-can-read-new-format change. Given that the tip of master is currently v2.2, this means we'd have to backport (BP). If we want to BP, I think we need to slim this PR down as much as possible given where v2.1 is in its release cycle. That would mean including only what is necessary to deserialize into the new type + convert it into the old type (plus a compatibility test).

That being said, we should probably ensure we have some amount of buy-in for the v2.1 BP before you go restructure stuff. @bw-solana, what are your thoughts on this?

(Inline review threads on ledger/src/blockstore.rs, ledger/src/blockstore_db.rs, and ledger/src/blockstore_meta.rs; resolved)
@vadorovsky vadorovsky (Member) left a comment

Overall looks good

(Inline review thread on ledger/src/blockstore_meta.rs)
@cpubot cpubot requested a review from bw-solana January 7, 2025 20:13
@cpubot cpubot (Author) commented Jan 7, 2025

Modified version of the branch reading/writing the new format on mnb:
[screenshot: 2025-01-07-150223_hyprshot]

bw-solana previously approved these changes Jan 7, 2025

@bw-solana bw-solana left a comment

Latest looks good to me. Would be good for others to weigh in before merging

@steviez steviez left a comment

Just a few minor things; this is looking good in general, though, and I think it's pretty close to being ready.

(Inline review threads on ledger/src/blockstore_db.rs and ledger/src/blockstore_meta.rs)
@alessandrod alessandrod self-requested a review January 10, 2025 01:57
@alessandrod alessandrod left a comment

looks great, just a couple of nits

(Inline review threads on ledger/src/blockstore_meta.rs and ledger/src/blockstore_db.rs)
@cpubot cpubot requested review from steviez and alessandrod January 11, 2025 17:25
Labels: v2.1 (Backport to v2.1 branch)
8 participants