Add reusable HistogramValue object #49799

iverase · 2019-12-03T16:32:56Z

In #49683 a new field mapper was introduced that supports percentile aggregations via binary doc values. These are complex values that are exposed to the user through the HistogramValue interface.

This field mapper generates the doc values, and it currently creates a new HistogramValue object for every document. This PR adds a new class, InternalHistogramValue, that implements HistogramValue and can be reused, so we create one object per segment instead of one per document.

elasticmachine · 2019-12-03T16:32:58Z

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

jpountz · 2019-12-03T16:54:31Z

...n/analytics/src/main/java/org/elasticsearch/xpack/analytics/mapper/HistogramFieldMapper.java

+
+        /** reset the value for the histogram */
+        void reset(BytesRef bytesRef) {
+            streamInput = new ByteBufferStreamInput(ByteBuffer.wrap(bytesRef.bytes, bytesRef.offset, bytesRef.length));


There is little value in reusing the histogram if you still create new inputs here. You might want to have a look at ByteArrayDataInput#reset.

Yes, I was a bit annoyed with that. Still, I am now encoding doubles as longs and using ByteArrayDataInput for deserialising. I find it a bit weird that I am using different families of Input/Output classes to read and write. Is that ok / safe?

Good question. I have a slight preference for updating the write logic to be symmetric, but could be convinced either way.

I think this is a bit of a catch-22.

ByteArrayDataOutput needs to be given an array beforehand, so you need to know the size of the serialised object in advance. ByteBufferStreamOutput abstracts away all that complexity.

ByteBufferStreamInput is not reusable; ByteArrayDataInput is.

Maybe I am overlooking something, but it seems a piece is missing here. I am seeing this pattern where we use more complex binary doc values, and it seems logical to have a strategy for reusing those wrappers, wdyt?

I was thinking of using ByteBuffersDataOutput.

Thanks @jpountz
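To make the trade-off in this thread concrete, here is a minimal sketch of the pattern being discussed: a writer backed by a growable buffer, so the serialised size does not need to be known up front, and a reader that is re-pointed at new bytes with reset() instead of being re-allocated per document. It uses only JDK classes as stand-ins for the Lucene and Elasticsearch types named above. The class ReusableHistogramSketch, its Cursor, and the fixed-width int used for the count (the PR writes a VInt) are all assumptions made for illustration, not the code that was merged.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical, JDK-only sketch of a symmetric writer and a resettable reader. */
final class ReusableHistogramSketch {

    /** Encodes (count, value) pairs. The stream grows as needed, so no size is computed beforehand. */
    static byte[] encode(int[] counts, double[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < counts.length; i++) {
            // Assumption: fixed 4-byte count for simplicity; the PR itself uses a variable-length int.
            writeBigEndian(out, counts[i], Integer.BYTES);
            writeBigEndian(out, Double.doubleToRawLongBits(values[i]), Long.BYTES);
        }
        return out.toByteArray();
    }

    private static void writeBigEndian(ByteArrayOutputStream out, long v, int numBytes) {
        for (int shift = (numBytes - 1) * 8; shift >= 0; shift -= 8) {
            out.write((int) (v >>> shift)); // write(int) keeps only the low 8 bits
        }
    }

    /** One instance can be reused for every document: reset() only swaps the backing bytes. */
    static final class Cursor {
        private byte[] bytes = new byte[0];
        private int pos;
        private int end;
        private int count;
        private double value;

        void reset(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.pos = offset;
            this.end = offset + length;
        }

        /** Advances to the next (count, value) pair; returns false once the bytes are exhausted. */
        boolean next() {
            if (pos >= end) {
                return false;
            }
            count = (int) readBigEndian(Integer.BYTES);
            value = Double.longBitsToDouble(readBigEndian(Long.BYTES));
            return true;
        }

        int count() {
            return count;
        }

        double value() {
            return value;
        }

        private long readBigEndian(int numBytes) {
            long v = 0;
            for (int i = 0; i < numBytes; i++) {
                v = (v << 8) | (bytes[pos++] & 0xFFL);
            }
            return v;
        }
    }

    public static void main(String[] args) {
        byte[][] docs = {
            encode(new int[] {3, 7}, new double[] {0.5, -2.25}),
            encode(new int[] {1}, new double[] {10.0})
        };
        Cursor cursor = new Cursor(); // allocated once, reused for every document
        for (byte[] doc : docs) {
            cursor.reset(doc, 0, doc.length);
            while (cursor.next()) {
                System.out.println(cursor.count() + " x " + cursor.value());
            }
        }
    }
}
```

The point of the sketch is only the allocation pattern: the read loop allocates nothing per document, which is the benefit the review comment about ByteArrayDataInput#reset is after.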

jpountz

I left a minor comment, LGTM otherwise. Feel free to push without further reviews.

jpountz · 2019-12-04T08:40:45Z

...n/analytics/src/main/java/org/elasticsearch/xpack/analytics/mapper/HistogramFieldMapper.java

-                        streamOutput.writeVInt(count);
-                        streamOutput.writeDouble(values.get(i));
+                        dataOutput.writeVInt(count);
+                        dataOutput.writeLong(NumericUtils.doubleToSortableLong(values.get(i)));


Since we don't need ordering, let's just do Double.doubleToRawLongBits, which is cheaper.

jpountz · 2019-12-04T08:41:08Z

...n/analytics/src/main/java/org/elasticsearch/xpack/analytics/mapper/HistogramFieldMapper.java

+        public boolean next() {
+            if (streamInput.eof() == false) {
+                count = streamInput.readVInt();
+                value = NumericUtils.sortableLongToDouble(streamInput.readLong());


and use Double.longBitsToDouble here.
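As a small illustration of the suggestion in these two comments, the sketch below round-trips doubles through Double.doubleToRawLongBits and Double.longBitsToDouble, both standard JDK methods. It also shows why a sortable encoding exists at all: the raw bits of negative doubles do not compare in numeric order when treated as signed longs. Since the histogram values are only stored and read back, that ordering property is not needed here. The class name RawBitsSketch and its helper methods are hypothetical.

```java
/** Hypothetical sketch: raw long bits are enough when the encoded values never need to be compared. */
final class RawBitsSketch {

    static long encode(double value) {
        return Double.doubleToRawLongBits(value);
    }

    static double decode(long bits) {
        return Double.longBitsToDouble(bits);
    }

    /** True if the raw encodings, compared as signed longs, agree with numeric order of a and b. */
    static boolean rawBitsPreserveOrder(double a, double b) {
        return Integer.signum(Long.compare(encode(a), encode(b)))
            == Integer.signum(Double.compare(a, b));
    }

    public static void main(String[] args) {
        double[] samples = {0.0, -0.0, 1.5, -2.25, Double.MIN_VALUE, Double.MAX_VALUE};
        for (double sample : samples) {
            // The round trip is bit-exact for each sample, including the sign of -0.0.
            System.out.println(sample + " -> " + decode(encode(sample)));
        }
        System.out.println("order kept for 1.0 vs 2.0:   " + rawBitsPreserveOrder(1.0, 2.0));
        System.out.println("order kept for -2.0 vs -1.0: " + rawBitsPreserveOrder(-2.0, -1.0));
    }
}
```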

Adds a reusable implementation of HistogramValue so we do not create an object per document.

iverase added 2 commits December 3, 2019 16:59

Add a reusable implementation of HistogramValue so we do not create an object per document

76e0c15

checkStyle

990b3ba

iverase added >enhancement :Analytics/Aggregations Aggregations v8.0.0 v7.6.0 labels Dec 3, 2019

iverase requested review from jpountz and polyfractal December 3, 2019 16:32

jpountz reviewed Dec 3, 2019

iverase added 3 commits December 3, 2019 18:31

Deserialize using ByteArrayDataInput

b43a086

remove unused imports

dac0fb1

Use ByteBuffersDataOutput

876d97f

jpountz approved these changes Dec 4, 2019

iverase added 2 commits December 4, 2019 09:45

Use Double to bits methods instead of NumericUtils

85de8a0

rename streamInput to dataInput

56c8ca9

iverase merged commit d8b9b02 into elastic:master Dec 4, 2019

iverase mentioned this pull request Dec 4, 2019

[Backport] Add reusable HistogramValue object #49823

Merged

iverase deleted the InternalHistogramValue branch December 4, 2019 10:52

iverase added a commit that referenced this pull request Dec 4, 2019

Add reusable HistogramValue object (#49799) (#49823)

44e9455

Adds a reusable implementation of HistogramValue so we do not create an object per document.

SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020

Add reusable HistogramValue object (elastic#49799)

ad60476

Adds a reusable implementation of HistogramValue so we do not create an object per document.

This was referenced Feb 3, 2020

[meta] 7.6 release elastic/elasticsearch-net#4340

Closed

[meta] 7.6 release elastic/elasticsearch-net#4341

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
