-
Notifications
You must be signed in to change notification settings - Fork 251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: adds list decode support for mini-block encoded data #3241
feat: adds list decode support for mini-block encoded data #3241
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3241 +/- ##
==========================================
+ Coverage 78.47% 78.96% +0.48%
==========================================
Files 245 246 +1
Lines 85088 86313 +1225
Branches 85088 86313 +1225
==========================================
+ Hits 66772 68153 +1381
+ Misses 15501 15341 -160
- Partials 2815 2819 +4
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I have a few minor questions.
debug_assert!( | ||
buf.len() | ||
<= bytes_rep | ||
+ bytes_def | ||
+ bytes_val | ||
+ 6 | ||
+ 1 // P1 | ||
+ (2 * MINIBLOCK_MAX_PADDING) // P2/P3 | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kinda funky formatting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's because of the inline comments. I can break it up but there is a good chance this particular code will be changing within the month.
vals_in_chunk: self.vals_in_chunk, | ||
ranges: self.ranges.clone(), | ||
vals_targeted: self.vals_targeted, | ||
// TODO: Add test cases for the all-preamble and all-trailer cases |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this TODO done? or a follow up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A follow-up. I'll go through and make follow-ups before I merge, I think there are 2 or 3 in this PR.
// | ||
// So if `ChunkInstructions` is "skip preamble, skip 10, take 50, take trailer" and we are decoding in | ||
// batches of size 10 we might have a `ChunkDrainInstructions` that targets that chunk and has its own | ||
// skip of 17 and take of 10. This would mean we decode the chunk, skip the preamble and 27 rows, and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe?
// skip of 17 and take of 10. This would mean we decode the chunk, skip the preamble and 27 rows, and | |
// skip of 17 and take of 10. This would mean we decode the chunk, skip the preamble and 17 rows, and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's additive:
// Our instructions tell us which rows we want to take from this chunk
let row_range_start =
instructions.rows_to_skip + instructions.chunk_instructions.rows_to_skip;
let row_range_end = row_range_start + instructions.rows_to_take;
We add instructions.rows_to_skip
(17 in the above example) and instructions.chunk_instructions.rows_to_skip
(10 in the above example) to get the actual range in the chunk (27..37
in the above example)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see. I was mixing up with the 10 from the take.
// "no preamble, skip 5, take 10, take trailer" and we are draining 20 rows then the | ||
// first drain instructions will have no preamble, skip 0, take 11 and the second chunk | ||
// instructions will have take preamble, skip 0, take 9 (assuming the second chunk has at | ||
// least 9 rows) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we ignore the skips? Also why 11 when the total is take 10? Is that because the trailer is counting as 1 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yes, the total is 11 because of the trailer. When we're acutally zoomed into the rep levels and mapping the ranges the trailer looks like a normal row (e.g. look at the middle chunk below):
1 0 0 0 1 0 | 0 0 1 0 0 1 0 0 1 0 0 1 0 0 | 0 0 1 0
So the logic was (at the time) simpler to just treat it as a row by the time we hit map_ranges. Maybe it would be simpler if we keep it a separate boolean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need a separate boolean, just wanted to clarify the logic here. Also TBH I'm confused about the concept of draining 20 rows from a take 10.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was meant to be draining 20 rows from a series of chunks starting with a chunk that only has 10 rows and a trailer. However, I cleaned up the wording and made the example more explicit:
// One very confusing bit is that `rows_to_take` includes the trailer. So if we have two chunks:
// -no preamble, skip 5, take 10, take trailer
// -take preamble, skip 0, take 50, no trailer
//
// and we are draining 20 rows then the drain instructions for the first batch will be:
// - no preamble, skip 0 (from chunk 0), take 11 (from chunk 0)
// - take preamble (from chunk 1), skip 0 (from chunk 1), take 9 (from chunk 1)
impl ChunkInstructions { | ||
// Given a repetition index and a set of user ranges we need to figure out how to read from the chunks | ||
// | ||
// We assume that `user_ranges` are in sorted order and non-overlapping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a debug_assert
for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe but we should probably put it higher up in decoder.rs
. This is an assumption we make throughout the decoders and many of them rely on it.
This comment was less "oh, in this spot we have this extra guarantee" and more "remember that we know this"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have something higher up that is fine.
@@ -1359,6 +1358,9 @@ impl StructuralFieldScheduler for StructuralListScheduler { | |||
} | |||
} | |||
|
|||
/// Scheduling job for list data | |||
/// | |||
/// It doesn't really do anything right now because list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like a unfinished sentence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, yes it is 🤦. Changed to:
/// Scheduling job for list data
///
/// Scheduling is handled by the primitive encoder and nothing special
/// happens here.
61ef307
to
d21781d
Compare
Lists are encoded using rep/def levels and a repetition index. At decode time we take all this information to be able to fetch individual ranges of lists.