Possible bug when loading multivalue+multipart String columns #7943
Comments
gianm added a commit to gianm/druid that referenced this issue on Jan 31, 2025
This patch fixes a class of bugs where various primitive column readers were not providing a SmooshedFileMapper to GenericIndexed, even though the corresponding writer could potentially write multi-file columns. For example, apache#7943 is an instance of this bug. This patch also includes a fix for an issue on the writer for compressed multi-value string columns, V3CompressedVSizeColumnarMultiIntsSerializer, where it would use the same base filename for both the offset and values sections. This bug would only be triggered for segments in excess of 500 million rows. When a segment has fewer rows than that, it could potentially have a values section that needs to be split over multiple files, but the offset is never more than 4 bytes per row. This bug was triggered by the new tests, which use a smaller fileSizeLimit.
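The 500 million row figure in the commit message can be checked with a little arithmetic. The sketch below assumes a per-file size limit of about 2 GiB (`Integer.MAX_VALUE` bytes); that constant, like the class and method names, is an illustrative assumption and not taken from the patch. The 4 bytes per row comes from the commit message.

```java
public class OffsetSplitThreshold {
    // Assumption: one column file is capped near Integer.MAX_VALUE bytes (about 2 GiB).
    static final long ASSUMED_FILE_SIZE_LIMIT = Integer.MAX_VALUE;

    /** Largest row count whose section still fits in one file at the given bytes per row. */
    static long maxRowsInOneFile(long fileSizeLimit, int bytesPerRow) {
        return fileSizeLimit / bytesPerRow;
    }

    public static void main(String[] args) {
        // The offsets section uses at most 4 bytes per row (per the commit message).
        long rows = maxRowsInOneFile(ASSUMED_FILE_SIZE_LIMIT, Integer.BYTES);
        System.out.println("Offsets fit in one file up to " + rows + " rows");
    }
}
```

Under that assumption the offsets section only spills into a second file past roughly 537 million rows, which matches the "in excess of 500 million rows" condition. With a smaller file size limit, as in the new tests, the threshold drops proportionally.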
gianm added a commit that referenced this issue on Feb 3, 2025
* Various fixes for large columns.

  This patch fixes a class of bugs where various primitive column readers were not providing a SmooshedFileMapper to GenericIndexed, even though the corresponding writer could potentially write multi-file columns. For example, #7943 is an instance of this bug.

  This patch also includes a fix for an issue on the writer for compressed multi-value string columns, V3CompressedVSizeColumnarMultiIntsSerializer, where it would use the same base filename for both the offset and values sections. This bug would only be triggered for segments in excess of 500 million rows. When a segment has fewer rows than that, it could potentially have a values section that needs to be split over multiple files, but the offset is never more than 4 bytes per row. This bug was triggered by the new tests, which use a smaller fileSizeLimit.
* Use a Random seed.
* Remove erroneous test code.
* Fix two compilation problems.
* Add javadocs.
* Another javadoc.
Fixed by #17691.
317brian pushed a commit to 317brian/druid that referenced this issue on Feb 3, 2025
Affected Version
0.13.0 and likely later versions, not sure what the earliest affected version is
Description
A user reported errors loading certain segments after upgrading from 0.11.0 -> 0.13.0: https://groups.google.com/forum/?pli=1#!topic/druid-user/m6IAMFLRrQM
The error and stack trace:
The segment in question is quite large (7GB+):
DataSegment{size=7112133889,
From that, it looks like `CompressedVSizeColumnarIntsSupplier.fromByteBuffer` may need to handle the multipart column case and sometimes call `public static <T> GenericIndexed<T> read(ByteBuffer buffer, ObjectStrategy<T> strategy, SmooshedFileMapper fileMapper)` with a `SmooshedFileMapper`.
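To illustrate why a reader needs the file mapper for multipart columns, here is a toy model, not Druid's actual code. The `FileMapper` interface, the version marker values, and the part name `column_part_0` are all hypothetical stand-ins. The point is only that a single-file column can be read from the buffer alone, while a multi-file column has to resolve its parts through a mapper and fails without one.

```java
import java.nio.ByteBuffer;

public class MultipartReadSketch {
    // Hypothetical marker bytes; the real on-disk format may differ.
    static final byte SINGLE_FILE = 1;
    static final byte MULTI_FILE = 2;

    /** Hypothetical stand-in for a smooshed file mapper: resolves a part name to its bytes. */
    interface FileMapper {
        ByteBuffer mapFile(String name);
    }

    /** Reads the first int of a column, going through the mapper when the column is multi-file. */
    static int readFirstValue(ByteBuffer buffer, FileMapper mapper) {
        byte version = buffer.get();
        if (version == SINGLE_FILE) {
            // All data follows the version byte in the same buffer.
            return buffer.getInt();
        }
        if (version == MULTI_FILE) {
            // Data lives in separate part files, so a mapper is required.
            if (mapper == null) {
                throw new IllegalStateException("multi-file column needs a file mapper");
            }
            ByteBuffer part = mapper.mapFile("column_part_0"); // hypothetical part name
            if (part == null) {
                throw new IllegalStateException("missing part file");
            }
            return part.duplicate().getInt();
        }
        throw new IllegalArgumentException("Unknown version: " + version);
    }
}
```

A reader that never passes a mapper works for every column that fits in one file and only breaks on very large segments, which would be consistent with the error showing up on a 7GB+ segment.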