-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix 'auto' encoded longs + compression serializer #6045
Conversation
Fixes apache#6044 changes: * Fixes `VSizeLongSerde` serializers to treat 'close' as 'flush' when used with `BlockLayoutColumnarLongsSerializer`, allowing unwritten values to be flushed to the buffer when the block is compressed * Add exhaustive unit test that flexes a variety of value sizes, row counts, and compression strategies to catch issues such as these :
I don't think the test you wrote should be |
I think it's fine to enable by default if it just takes one or two minutes since this kind of bugs is difficult to catch without various testing. |
@clintropolis what do you think about adding |
@jihoonson I considered that but it seemed more invasive of a change and I'm trying to deprecate this code path anyway.. That said, it is strange that |
@jihoonson What if I just rename I think alternative involves adding a |
@clintropolis I'm fine with it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest change looks good to me.
Fixes #6044 by changing
BlockLayoutColumnarLongsSerializer
block flushing to callLongEncodingWriter.setBuffer
which will create new serializers.To help catch further issues like this, the only thing I could think of that would make me feel reasonably confident about stuff was to add a rather exhaustive test that tries a variety of 'bits per value', number of rows, and compression strategies, but it unfortunately adds a minute or two to overall test time.
It isn't even completely exhaustive, as the behavior of
IntermediateColumnarLongsSerializer
isn't directly controllable, but both table and delta encodings are exercised. I'm considering marking it@Ignore
.