feat: introduce concept of "storage class" with separate dataset for "blob" storage data #3064
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #3064 +/- ##
==========================================
- Coverage 77.57% 77.40% -0.18%
==========================================
Files 240 240
Lines 78683 79578 +895
Branches 78683 79578 +895
==========================================
+ Hits 61040 61596 +556
- Misses 14498 14820 +322
- Partials 3145 3162 +17
Flags with carried forward coverage won't be shown.
…class are kept in a separate (nested) blob dataset. This allows us to eventually make max_bytes_per_file much smaller. "Blob columns" will have relatively few rows per file, while "scalar" columns will have many rows per file. This lets us optimize both query performance and compaction times.
…rs (e.g. schema) as part of the 'data bytes'
Force-pushed from df10405 to 3d5505e
Seems good. I think I'm fine with "storage class", though I'm not sure what other values we will want there.
/// "" - The default storage class.
/// "blob" - The field is compacted into fewer rows per fragment.
Is the criterion for this based entirely on the size of row values, or do you imagine there will eventually be other considerations?
Size is the only thing I can think of, and I don't expect us to need more than two storage classes for the next year or two.
It's possible we have a special "foreign" storage class or something like that for values stored in remote datasets but I think that would be best solved in a different way.
At the moment the only other class I can really think of is for extremely large values where file-per-value makes more sense.
I also hadn't quite done the math yet. We want to balance:
- We can fit 8MiB worth of values into a file (min bound on num_values)
- We don't have more than 10GiB of values in a file (max bound on num_values)
For example, if we start with int32 we need at least 2Mi values (ideally more actually due to compression). Given 10GiB this gives us a surprisingly small max size of 5KiB (anything larger will occupy more than 10GiB for 2Mi values).
If we are generous and make the cutoff at 64KiB, then we need to make sure we have at least 128 values per file, which allows us up to 80MiB per value (10GiB / 128).
If we find that we can't be generous and we really do want the cutoff at...say...8KiB then we need at least 1024 values per file which means no more than 10MiB per value which is starting to get to the point where we might want three storage classes.
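The bounds above can be sanity-checked with a small sketch. This is purely illustrative arithmetic, not code from the PR; the constant and function names are assumptions.

```python
# Sanity-check the file sizing math from the comment above.
# Bounds from the discussion: a file should hold at least 8 MiB worth of
# values, and no more than 10 GiB worth of values.
MIN_FILE_BYTES = 8 * 1024**2    # min bound on bytes of values per file
MAX_FILE_BYTES = 10 * 1024**3   # max bound on bytes of values per file

def max_value_size(cutoff_bytes: int) -> int:
    """Given a storage-class cutoff (the smallest value routed to the blob
    dataset), return the largest value size that keeps a full file of such
    values under the 10 GiB bound."""
    min_values_per_file = MIN_FILE_BYTES // cutoff_bytes
    return MAX_FILE_BYTES // min_values_per_file

# int32 scalars: at least 2 Mi values per file -> max ~5 KiB per value
assert MIN_FILE_BYTES // 4 == 2 * 1024**2
assert max_value_size(4) == 5 * 1024            # 5 KiB

# 64 KiB cutoff: at least 128 values per file -> max 80 MiB per value
assert max_value_size(64 * 1024) == 80 * 1024**2

# 8 KiB cutoff: at least 1024 values per file -> max 10 MiB per value
assert max_value_size(8 * 1024) == 10 * 1024**2
```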
> If we find that we can't be generous and we really do want the cutoff at...say...8KiB
This logic is based on the 10GiB limit, which is a somewhat arbitrary "let's pretend we have at most 100 or so 'big' columns and we want to keep total file size under 1TiB" criterion, and that seems a little conservative. At 64KiB we're fine as long as we only have 1 or 2 big columns, or can tolerate files over 1TiB.
I'm open to terms other than "storage class" as well. I also think we might want to use something other than "blob", since it's easy to confuse the concepts of "blob encoding" (e.g. low page overhead, file-like API) and "blob storage class" (fewer rows per file). In fact, the two thresholds can even be different, with the blob encoding threshold being 8-16MB but the blob storage class threshold being closer to 128KB.
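To illustrate that distinction, here is a hypothetical routing sketch. The threshold values and the `classify` helper are assumptions made up for this example, not part of the PR:

```python
# Hypothetical sketch: "blob encoding" and "blob storage class" are
# independent decisions with different size thresholds.
BLOB_ENCODING_THRESHOLD = 8 * 1024**2   # ~8-16 MiB: low-overhead, file-like pages
BLOB_STORAGE_THRESHOLD = 128 * 1024     # ~128 KiB: route to the nested blob dataset

def classify(avg_value_bytes: int) -> tuple[str, str]:
    """Return (storage_class, encoding) for a column given its average
    value size. "" is the default storage class, per the PR's doc comment."""
    storage = "blob" if avg_value_bytes >= BLOB_STORAGE_THRESHOLD else ""
    encoding = "blob" if avg_value_bytes >= BLOB_ENCODING_THRESHOLD else "default"
    return storage, encoding

assert classify(1024) == ("", "default")            # small scalar column
assert classify(256 * 1024) == ("blob", "default")  # blob storage, normal encoding
assert classify(32 * 1024**2) == ("blob", "blob")   # large enough for both
```

The middle case is the interesting one: a 256KiB value is big enough to hurt scan performance in the main dataset (so it moves to the blob dataset) but not big enough to need the special blob encoding.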
Closes #3029