Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bytes from geogrid aggregation to request circuit-breaker #50720

Closed
wants to merge 11 commits into from

Conversation

talevy
Copy link
Contributor

@talevy talevy commented Jan 8, 2020

Since high precision values for geogrid aggregations can
result in single documents tripping the bytes limit for a request.
large shapes and high precisions can result in hundreds of thousands
of buckets kept track in a long[]. To protect from OOMs, it is important
to keep track of this memory usage. This array is re-used across documents
within each segment, so accounting is managed by the GeoGridAggregator.

relates #37206

Since high precision values for geogrid aggregations can
result in single documents tripping the `search.max_buckets`
limit, it is important to catch these as soon as possible.

This PR pushes a new check to the MultiBucketConsumerService to
allow stateless checking of new bucket counts against the limit.
@talevy talevy added WIP :Analytics/Geo Indexing, search aggregations of geo points and shapes labels Jan 8, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Geo)

@talevy talevy changed the title Add eager bucket count for geo_shape grid aggregation Add eager max-bucket-count check for geo_shape grid aggregation Jan 10, 2020
@talevy
Copy link
Contributor Author

talevy commented Jan 10, 2020

run elasticsearch-ci/1

@talevy talevy changed the title Add eager max-bucket-count check for geo_shape grid aggregation Add bytes from geogrid aggregation to request circuit-breaker Jan 14, 2020
@talevy talevy requested a review from polyfractal January 14, 2020 02:52
@talevy
Copy link
Contributor Author

talevy commented Jan 14, 2020

hey @polyfractal. I'd like to get some more explicit tests to verify the pre and post collection
ceremony, but I think I've captured all the necessary hooks to test the addition of
the circuit-breaker to the GeoGridAggregator. thanks for talking it over!

@talevy talevy removed the WIP label Jan 14, 2020
Copy link
Contributor

@polyfractal polyfractal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This turned out pretty clean! Nice work!

++ if we could get some kind of test for the pre-segment and post-collection hooks.

Do you know if there are any integration tests that use the geotile/geogrid agg? Not generally a fan of integration tests but since our breaker testing is spotty right now (e.g. all the None breakers that AggTestCase uses) it might make sense to ensure there is at least one integration test flexing the breaker. Not necessarily trying to trip it, just using it for routine tests.


public void testGeoShapeTrippedCircuitBreaker() throws IOException {
GeoGridTiler tiler = geoGridTiler();
int precision = randomIntBetween(1, 9); // does not go until MAX_ZOOM for performance reasons
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just verifying this won't nuke CI, since it has to run at least once to determine the number tiles? I'm assuming that's why it's limited to precision: 9 but thought I'd double check :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it is just that I am not interested in tiling a shape into 100s of thousands of tiles. I've run this test over 1000 iterations and things seem to run snappily

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

int count = GEOHASH.setValues(values, value, 7);
assertThat(count, equalTo(1024));
}
}

public void testGeoHashMaxBuckets() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rename these tests to reference the breaker instead of max buckets?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, good catch!

@talevy talevy added the WIP label Feb 14, 2020
@rjernst rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020
@talevy
Copy link
Contributor Author

talevy commented May 7, 2020

Closed in #37206

@talevy talevy closed this May 7, 2020
@talevy talevy deleted the gdv-circuitbreak branch May 7, 2020 16:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) WIP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants