Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add reusable HistogramValue object #49799

Merged
merged 7 commits into from
Dec 4, 2019
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -202,7 +202,9 @@ public AtomicHistogramFieldData load(LeafReaderContext context) {
public HistogramValues getHistogramValues() throws IOException {
try {
final BinaryDocValues values = DocValues.getBinary(context.reader(), fieldName);
final InternalHistogramValue value = new InternalHistogramValue();
return new HistogramValues() {

@Override
public boolean advanceExact(int doc) throws IOException {
return values.advanceExact(doc);
Expand All @@ -211,7 +213,8 @@ public boolean advanceExact(int doc) throws IOException {
@Override
public HistogramValue histogram() throws IOException {
try {
return getHistogramValue(values.binaryValue());
value.reset(values.binaryValue());
return value;
} catch (IOException e) {
throw new IOException("Cannot load doc value", e);
}
Expand All @@ -220,7 +223,6 @@ public HistogramValue histogram() throws IOException {
} catch (IOException e) {
throw new IOException("Cannot load doc values", e);
}

}

@Override
Expand Down Expand Up @@ -259,44 +261,6 @@ public SortField sortField(Object missingValue, MultiValueMode sortMode,
}
};
}

private HistogramValue getHistogramValue(final BytesRef bytesRef) throws IOException {
final ByteBufferStreamInput streamInput = new ByteBufferStreamInput(
ByteBuffer.wrap(bytesRef.bytes, bytesRef.offset, bytesRef.length));
return new HistogramValue() {
double value;
int count;
boolean isExhausted;

@Override
public boolean next() throws IOException {
if (streamInput.available() > 0) {
count = streamInput.readVInt();
value = streamInput.readDouble();
return true;
}
isExhausted = true;
return false;
}

@Override
public double value() {
if (isExhausted) {
throw new IllegalArgumentException("histogram already exhausted");
}
return value;
}

@Override
public int count() {
if (isExhausted) {
throw new IllegalArgumentException("histogram already exhausted");
}
return count;
}
};
}

};
}

Expand Down Expand Up @@ -439,4 +403,50 @@ protected void doXContentBody(XContentBuilder builder, boolean includeDefaults,
builder.field(Names.IGNORE_MALFORMED, ignoreMalformed.value());
}
}

/** re-usable {@link HistogramValue} implementation */
private static class InternalHistogramValue extends HistogramValue {
double value;
int count;
boolean isExhausted;
ByteBufferStreamInput streamInput;

InternalHistogramValue() {
}

/** reset the value for the histogram */
void reset(BytesRef bytesRef) {
streamInput = new ByteBufferStreamInput(ByteBuffer.wrap(bytesRef.bytes, bytesRef.offset, bytesRef.length));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is little value in reusing the histogram if you still create new inputs here. You might want to have a look at ByteArrayDataInput#reset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was a bit annoyed with that. Still now I am encoding doubles as longs and using ByteArrayDataInput for deserialising. I found a bit weird I am using different family of Input/Output classes to read / write. Is that ok / safe?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I have a slight preference for updating the write logic to be symmetric, but could be convinced either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a bit a catch 22.

The ByteArrayDataOutput needs to be provided an array before hand so you need to know the size of the serialise object before hand. ByteBufferStreamOutput abstract out all that complexity.

The ByteBufferStreamInput is not reusable, ByteArrayDataInput is.

Maybe I am missing something but it seems something is missing. I am seeing this pattern where we are using more complex binary doc values and it seems logic to have a strategy to be reusing those wrappers, wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of using ByteBuffersDataOutput.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jpountz

isExhausted = false;
value = 0;
count = 0;
}

@Override
public boolean next() throws IOException {
if (streamInput.available() > 0) {
count = streamInput.readVInt();
value = streamInput.readDouble();
return true;
}
isExhausted = true;
return false;
}

@Override
public double value() {
if (isExhausted) {
throw new IllegalArgumentException("histogram already exhausted");
}
return value;
}

@Override
public int count() {
if (isExhausted) {
throw new IllegalArgumentException("histogram already exhausted");
}
return count;
}
}
}