use mmap for nested column value to dictionary id lookup for more chill heap usage during serialization #14919
Conversation
processing/src/main/java/org/apache/druid/segment/nested/DictionaryIdLookup.java
```java
        Assert.assertEquals(" index: " + i, LONGS[i - 1], writer.get(i));
      }
    } else {
      Assert.assertEquals(" index: " + i, LONGS[i], writer.get(i));
```
Check failure (Code scanning / CodeQL): Array index out of bounds
processing/src/main/java/org/apache/druid/segment/nested/NestedDataColumnSerializer.java
processing/src/main/java/org/apache/druid/segment/nested/VariantColumnSerializer.java
```java
    smoosher.close();
    stringBufferMapper = SmooshedFileMapper.load(stringSmoosh);
    final ByteBuffer stringBuffer = stringBufferMapper.mapFile(fileName);
    stringDictionary = StringEncodingStrategies.getStringDictionarySupplier(
```
Do we handle string dictionaries with encodings other than UTF-8?
Not sure what the question here is. We currently support two encoding strategies for string dictionaries: plain UTF-8, and front-coded, which also stores UTF-8 strings, just incrementally encoded. https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/column/StringEncodingStrategy.java#L40
Which strategy is used is controlled via the IndexSpec: https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/IndexSpec.java#L84
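For readers unfamiliar with front coding: each sorted value is stored as the length of the prefix it shares with the previous value plus the remaining UTF-8 suffix. A minimal, generic sketch of the idea (illustration only, not Druid's actual FrontCodedIndexedWriter):

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

public class FrontCodingSketch
{
  // Encode sorted values as (sharedPrefixLength, suffixBytes) pairs.
  public static void main(String[] args)
  {
    List<String> sorted = List.of("druid", "druidism", "drum", "drummer");
    String previous = "";
    for (String value : sorted) {
      int shared = commonPrefixLength(previous, value);
      byte[] suffix = value.substring(shared).getBytes(StandardCharsets.UTF_8);
      System.out.println(shared + " + \"" + new String(suffix, StandardCharsets.UTF_8) + "\"");
      previous = value;
    }
  }

  private static int commonPrefixLength(String a, String b)
  {
    int limit = Math.min(a.length(), b.length());
    int i = 0;
    while (i < limit && a.charAt(i) == b.charAt(i)) {
      i++;
    }
    return i;
  }
}
```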
```java
    doubleLookup.defaultReturnValue(-1);
    this.arrayLookup = new Object2IntAVLTreeMap<>(FrontCodedIntArrayIndexedWriter.ARRAY_COMPARATOR);
    this.arrayLookup.defaultReturnValue(-1);
    if (stringDictionary == null) {
```
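For context on the hunk above: the fastutil maps signal a missing key through defaultReturnValue, so a lookup miss comes back as -1 rather than throwing or returning null. A tiny generic sketch of that pattern (plain fastutil usage, not the Druid class):

```java
import it.unimi.dsi.fastutil.objects.Object2IntAVLTreeMap;

public class DefaultReturnValueSketch
{
  public static void main(String[] args)
  {
    // Heap-resident value -> dictionary id lookup; the PR moves the string/long/double
    // dictionaries out of structures like this and onto mmapped files.
    Object2IntAVLTreeMap<String> lookup = new Object2IntAVLTreeMap<>();
    lookup.defaultReturnValue(-1);       // returned by getInt() when the key is absent
    lookup.put("apple", 0);
    lookup.put("banana", 1);

    System.out.println(lookup.getInt("banana"));  // 1
    System.out.println(lookup.getInt("cherry"));  // -1
  }
}
```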
It would be a great idea to emit metrics on usage of the file smoosh so we can monitor dictionaries larger than 2GB; not sure if we can track that today.
What metric do you have in mind? Also, why 2GB instead of just a metric on the size of the components? And shouldn't such a metric really be much broader in scope and not just apply to nested columns? This doesn't feel like either a blocker or something that should be resolved in this PR, but I agree it does sound potentially nice to have a better breakdown of what is driving segment size.
Tangentially related, I'd still like to do #7124 someday to have another way to look inside segments and see what is driving their size.
We could emit a metric whenever we need to use the file smoosh for a large dictionary; that would give us some warning signs that a dictionary is getting bigger.
```java
    // for strings because of this. if other type dictionary writers could potentially use multiple internal files
    // in the future, we should transition them to using this approach as well (or build a combination smoosher and
    // mapper so that we can have a mutable smoosh)
    File stringSmoosh = FileUtils.createTempDir(name + "__stringTempSmoosh");
```
Can this dictionary name conflict with other dictionaries if we have multiple instances running on the same machine?
Behind FileUtils.createTempDir, this is using Files.createTempDirectory, which uses java.io.tmpdir. We use this in a lot of places when building segment files, so it should be safe.
To add more details: for a column named "nested", the temp dir looks something like /var/folders/8y/mhfmxp391pl9m2h103s_kn200000gn/T/nested__stringTempSmoosh14881555230535603507. The same is true for the other temp files I'm generating that use Files.createTempFile directly, e.g. /var/folders/8y/mhfmxp391pl9m2h103s_kn200000gn/T/nested__longDictionary8646593343499944990.tmp
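For reference, FileUtils.createTempDir and Files.createTempFile both append a random suffix under java.io.tmpdir, which is why concurrent instances on the same machine get distinct paths. A small JDK-only illustration (the prefixes are just examples):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TempPathSketch
{
  public static void main(String[] args) throws IOException
  {
    // Each call gets a unique random suffix, e.g. nested__stringTempSmoosh14881555230535603507
    Path smooshDir = Files.createTempDirectory("nested__stringTempSmoosh");
    Path longDictionary = Files.createTempFile("nested__longDictionary", ".tmp");
    System.out.println(smooshDir);
    System.out.println(longDictionary);
  }
}
```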
Looks good to me.
apache#15068: Fixes a bug caused by apache#14919, which was just using the column name as part of a temp file name, which isn't very cool, my bad. Switched to StringUtils.urlEncode so that ugly characters don't break things. The modified test fails without the changes in this PR.
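To illustrate the follow-up fix: a column name containing path separators or other special characters would otherwise land verbatim in the temp file prefix. A sketch using the JDK's URLEncoder as a stand-in for Druid's StringUtils.urlEncode (the column name here is made up):

```java
import java.io.IOException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SafeTempNameSketch
{
  public static void main(String[] args) throws IOException
  {
    String columnName = "nested/col:v2";   // hypothetical name with characters unsafe for file names
    String safePrefix = URLEncoder.encode(columnName, StandardCharsets.UTF_8);
    Path tempDir = Files.createTempDirectory(safePrefix + "__stringTempSmoosh");
    System.out.println(tempDir);           // e.g. .../nested%2Fcol%3Av2__stringTempSmoosh1234567890
  }
}
```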
Description
Switches DictionaryIdLookup to serialize the value dictionaries and then use mmap to load them into their respective Indexed implementations, instead of the {x}2IntMap heap collections we were previously using, resulting in dramatically more predictable heap usage at very minor performance cost.
Before (the peaks are merge/publish, 10m rows, 5m max segment size):
After (the dips are merge/publish with the peaks at the end being bitmap index creation):
It does add a bit of extra time to the task I used in my experiment:
but it seems worth the cost, and in practice it is probably faster than the heap-based approach when the task has a lot less room to spare and faces more expensive GC cycles. I will be looking for other areas to improve in exchange for this change.
I also refactored a few things to clean some stuff up (mainly consolidating code for reading string dictionaries that might be either front-coded or classic GenericIndexed-based, which was duplicated in a bunch of places).
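As a rough illustration of the general technique (plain JDK NIO, not the actual DictionaryIdLookup code): serialize the sorted value dictionary to a temp file, mmap it, and binary search the mapped buffer for value-to-id lookups, so the dictionary no longer has to live in a heap map during serialization.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.LongBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapDictionarySketch
{
  public static void main(String[] args) throws IOException
  {
    long[] sortedValues = {-10L, 0L, 7L, 42L, 1000L};

    // Serialize the sorted dictionary to a temp file instead of keeping a heap map.
    Path dictFile = Files.createTempFile("example__longDictionary", ".tmp");
    ByteBuffer scratch = ByteBuffer.allocate(sortedValues.length * Long.BYTES);
    for (long v : sortedValues) {
      scratch.putLong(v);
    }
    scratch.flip();
    try (FileChannel channel = FileChannel.open(dictFile, StandardOpenOption.WRITE)) {
      channel.write(scratch);
    }

    // mmap the file and binary search it to map a value back to its dictionary id.
    try (FileChannel channel = FileChannel.open(dictFile, StandardOpenOption.READ)) {
      LongBuffer dictionary = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size()).asLongBuffer();
      System.out.println(lookupId(dictionary, 42L));  // 3
      System.out.println(lookupId(dictionary, 5L));   // -1: not in the dictionary
    }
  }

  // Standard binary search over the mapped buffer; -1 means the value is absent.
  private static int lookupId(LongBuffer dictionary, long value)
  {
    int low = 0;
    int high = dictionary.limit() - 1;
    while (low <= high) {
      int mid = (low + high) >>> 1;
      long midVal = dictionary.get(mid);
      if (midVal < value) {
        low = mid + 1;
      } else if (midVal > value) {
        high = mid - 1;
      } else {
        return mid;
      }
    }
    return -1;
  }
}
```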
This PR has: