-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add reusable HistogramValue object #49799
Conversation
an object per document
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations) |
|
||
/** reset the value for the histogram */ | ||
void reset(BytesRef bytesRef) { | ||
streamInput = new ByteBufferStreamInput(ByteBuffer.wrap(bytesRef.bytes, bytesRef.offset, bytesRef.length)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is little value in reusing the histogram if you still create new inputs here. You might want to have a look at ByteArrayDataInput#reset
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was a bit annoyed with that. Still now I am encoding doubles as longs and using ByteArrayDataInput for deserialising. I found a bit weird I am using different family of Input/Output classes to read / write. Is that ok / safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I have a slight preference for updating the write logic to be symmetric, but could be convinced either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a bit a catch 22.
The ByteArrayDataOutput needs to be provided an array before hand so you need to know the size of the serialise object before hand. ByteBufferStreamOutput abstract out all that complexity.
The ByteBufferStreamInput is not reusable, ByteArrayDataInput is.
Maybe I am missing something but it seems something is missing. I am seeing this pattern where we are using more complex binary doc values and it seems logic to have a strategy to be reusing those wrappers, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of using ByteBuffersDataOutput.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jpountz
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a minor comment, LGTM otherwise. Feel free to push without further reviews.
streamOutput.writeVInt(count); | ||
streamOutput.writeDouble(values.get(i)); | ||
dataOutput.writeVInt(count); | ||
dataOutput.writeLong(NumericUtils.doubleToSortableLong(values.get(i))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we don't need ordering, let's just do Double.doubleToRawLongBits
, which is cheaper.
public boolean next() { | ||
if (streamInput.eof() == false) { | ||
count = streamInput.readVInt(); | ||
value = NumericUtils.sortableLongToDouble(streamInput.readLong()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and use Double.longBitsToDouble
here.
Adds a reusable implementation of HistogramValue so we do not create an object per document.
In #49683 a new field mapper was introduced which supports percentile aggregations via binary doc values. Those are complex values that are interface to the user via HistogramValue interface.
This field mapper generates the doc values and it currently creates an object per doc value of type HistogramValue. This PR adds a new class InternalHistogramValue that implements HistogramValue which can be reused so we create one object per segment instead of per document.