
Implement rejections in WriteMemoryLimits #58885

Merged: 10 commits merged into elastic:master on Jul 8, 2020

Conversation

Tim-Brooks (Contributor) commented on Jul 2, 2020:

This commit adds rejections when the indexing memory limits are exceeded for primary or coordinating operations. The number of bytes allowed for indexing is controlled by a new setting, indices.write.limit.

Relates #59263
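
For illustration, as a node-level memory-size setting this would be configured in elasticsearch.yml; the value below is just an example, and note that the setting is renamed to indexing_limits.memory.limit later in this review:

    # Example only: cap in-flight indexing bytes at 10% of the heap (the PR's default).
    indices.write.limit: 10%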

@Tim-Brooks added labels >enhancement, :Distributed Indexing/CRUD, v8.0.0, v7.9.0 on Jul 2, 2020
elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed (:Distributed/CRUD)

@elasticmachine added the Team:Distributed (Obsolete) label on Jul 2, 2020
Tim-Brooks (Contributor, Author) commented:

@ywelsch - I set the rejection threshold to 30% because a decent number of tests index bulk requests large enough to trip it at 10% (reindex, searchable snapshot, rollover, etc.). Once we decide on a number, we might need to artificially inflate it in the test configurations due to the small heap sizes.

ywelsch (Contributor) left a review:

I've left some comments. The main remaining effort for this PR is validation.

An interesting effect of the change in #57573 (WRITE executor changed to SAME executor on write action), which Henning pointed out, is that bulk shard requests now have a task registered before execution on the WRITE thread pool, whereas previously they did not. This changes the output of _tasks. This is not necessarily bad, but we should probably make sure we can distinguish between tasks that have executed and are waiting on a replica and those that are just queued up (if we want to expose those). This can be handled in a separate PR; I just did not want to forget about it (i.e., I think it needs to be addressed before #57573 goes into 7.9).

@@ -697,6 +697,6 @@ public long getAutoGeneratedTimestamp() {

     @Override
     public long ramBytesUsed() {
-        return SHALLOW_SIZE + RamUsageEstimator.sizeOf(id) + (source == null ? 0 : source.ramBytesUsed());
+        return SHALLOW_SIZE + RamUsageEstimator.sizeOf(id) + (source == null ? 0 : source.length());
ywelsch (Contributor) commented:

While I think this should be OK for now, it's critical to confirm with some tests/benchmarks that our accounting is not completely off.

1. A first test, to ensure that we are not vastly underaccounting, should use a high indices.write.limit (e.g. 80%) and check that when we fill up the node (probably by blocking requests from being processed) we hit the indices.write.limit rather than the real-memory circuit breaker.
2. A second test, to ensure that we are not vastly overaccounting, should use a low indices.write.limit (e.g. 100MB) and show that we can still put a decently large number of requests into the system (i.e. send bulks that in total make up 80MB in input size) without reaching the limit. A sketch of this test follows the list.
3. Third, we need to check what the overhead of primaries waiting on replica responses is (i.e. memory consumed by listeners etc.). For that we should block processing on a replica, fill up a primary, and see how much memory is consumed.
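
For illustration, a minimal sketch of what the second test might look like as an internal cluster test (extending ESIntegTestCase). The test name, document sizes, and concurrency are assumptions rather than code from this PR, and it presumes the node was started with indices.write.limit set to 100mb:

    import java.util.ArrayList;
    import java.util.List;
    import org.elasticsearch.action.ActionFuture;
    import org.elasticsearch.action.bulk.BulkRequest;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.action.index.IndexRequest;

    // Sketch: ~80MB of concurrently in-flight bulk input should complete
    // without rejections under a 100MB limit, i.e. we do not vastly overaccount.
    public void testNoOveraccountingUnderLowLimit() {
        List<ActionFuture<BulkResponse>> futures = new ArrayList<>();
        for (int bulk = 0; bulk < 8; bulk++) {                // 8 bulks x ~10MB = ~80MB in flight
            BulkRequest bulkRequest = new BulkRequest();
            for (int doc = 0; doc < 10; doc++) {              // 10 docs x ~1MB per bulk
                bulkRequest.add(new IndexRequest("test").source("field", randomAlphaOfLength(1024 * 1024)));
            }
            futures.add(client().bulk(bulkRequest));          // send without waiting on responses
        }
        for (ActionFuture<BulkResponse> future : futures) {
            assertFalse(future.actionGet().hasFailures());    // no limit-based rejections expected
        }
    }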


import java.util.concurrent.atomic.AtomicLong;

public class WriteMemoryLimits {

// TODO: Adjust
public static final Setting<ByteSizeValue> MAX_INDEXING_BYTES =
Setting.memorySizeSetting("indices.write.limit", "10%", Setting.Property.NodeScope, Setting.Property.Dynamic);
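
For context, a hedged sketch of how such a class could account for in-flight bytes using the AtomicLong imported above. The method name, the Releasable-based release, and the exception type are assumptions, not necessarily the PR's implementation:

    import java.util.concurrent.atomic.AtomicLong;
    import org.elasticsearch.common.lease.Releasable;
    import org.elasticsearch.common.util.concurrent.EsRejectedExecutionException;

    // Illustrative sketch only, not the PR's actual implementation.
    public class WriteMemoryLimitsSketch {

        private final AtomicLong coordinatingBytes = new AtomicLong();
        private final long limitBytes; // resolved from MAX_INDEXING_BYTES at node startup

        public WriteMemoryLimitsSketch(long limitBytes) {
            this.limitBytes = limitBytes;
        }

        public Releasable markCoordinatingOperationStarted(long bytes) {
            long inFlight = coordinatingBytes.addAndGet(bytes);
            if (inFlight > limitBytes) {
                coordinatingBytes.addAndGet(-bytes); // undo the reservation before rejecting
                throw new EsRejectedExecutionException("in-flight indexing bytes [" + inFlight +
                    "] would exceed the limit of [" + limitBytes + "]", false);
            }
            // closed by the caller when the operation completes, releasing the bytes
            return () -> coordinatingBytes.addAndGet(-bytes);
        }
    }

Because the reservation and the check use a single atomic update, the request whose bytes push the total over the limit is the one that gets rejected, and its reservation is rolled back before the exception propagates.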
ywelsch (Contributor) commented:

10% can be quite a lot for a system with a big heap (i.e. over 3GB for a 32GB heap). Should we put an upper bound on this number? Can we run some benchmarks to determine what a good limit would look like (e.g. by checking the lowest limit we can get away with using our current Rally benchmarks)?

Tim-Brooks (Contributor, Author) replied:

I'm not sure which Rally benchmarks you are referring to. All of the Rally benchmarks will by default use very little indexing memory, since they don't use many clients.

I ran some benchmarks today with 10K clients per node and have some numbers, but those are pretty specific to the security use case; we can talk about them tomorrow. Generally the CPUs were saturated with indexing, and write queue latency was pretty high (ranging from 50ms to 2s through the benchmark). In these benchmarks the write limits tended to be 200-300MB and the replica limits tended to be 10-80MB.

Tim-Brooks (Contributor, Author) added:

I'm not exactly sure which specific benchmarks you want here. I have run this with some concurrent security benchmarks and have a good idea of the write limits under various loads, but our normal nightly Rally benchmarks use very few clients and will not consume significant indexing memory.

Tim-Brooks (Contributor, Author) commented:
I made some changes to this: I added a replica limit, and added a test to ensure we do not trip the write limits in unexpected places. But we should sync up to agree on what specific validation needs to be performed for this PR.

ywelsch (Contributor) left a review:

I've left one comment, on the setting name and dynamicity.


import java.util.concurrent.atomic.AtomicLong;

public class WriteMemoryLimits {

// TODO: Adjust
public static final Setting<ByteSizeValue> MAX_INDEXING_BYTES =
Setting.memorySizeSetting("indices.write.limit", "10%", Setting.Property.NodeScope, Setting.Property.Dynamic);
ywelsch (Contributor) commented:
As discussed, let's make this a non-dynamic node setting. We also need to discuss whether we should whitelist this setting in Cloud (so that users can change it there).

I think we should not pick a setting name under the "indices" namespace, but perhaps introduce something completely new, for example indexing_limits.memory.limit.

Tim-Brooks (Contributor, Author) replied:
I changed the name and made it non-dynamic.
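
Presumably the revised declaration then looks something like the following. The new name matches the commit message quoted further down, and dropping the Dynamic property follows from the review, but the exact code is an assumption:

    // Non-dynamic, node-scoped setting under the new namespace.
    public static final Setting<ByteSizeValue> MAX_INDEXING_BYTES =
        Setting.memorySizeSetting("indexing_limits.memory.limit", "10%", Setting.Property.NodeScope);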

primaryTransportService.clearAllRules();
}
}

public void testWriteCanBeRejectedAtCoordinatingLevel() throws Exception {
ywelsch (Contributor) commented:
As we discussed, we eventually want to have a test that also covers the REST layer.

Tim-Brooks (Contributor, Author) replied:
I added a comment.

@Tim-Brooks requested a review from ywelsch on July 8, 2020 at 14:11
ywelsch (Contributor) left a review:
LGTM

@Tim-Brooks merged commit 611fb03 into elastic:master on Jul 8, 2020
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this pull request on Jul 13, 2020:

This commit adds rejections when the indexing memory limits are exceeded for primary or coordinating operations. The number of bytes allowed for indexing is controlled by a new setting, indexing_limits.memory.limit.
Tim-Brooks (Contributor, Author) commented:
The meta issue #59263 describes this work from a release highlight perspective.

dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this pull request on Jul 27, 2020:

Data frame analytics jobs that work with very large datasets may produce bulk requests that are over the memory limit for indexing. This commit adds a helper class that bundles index requests into bulk requests that steer clear of the memory limit. We then use this class from both the results joiner and the inference runner, ensuring data frame analytics jobs do not generate bulk requests that are too large.

Note the limit was implemented in elastic#58885.
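
For illustration, a sketch of the size-aware bundling that commit describes. The class and method names are hypothetical, not the actual helper; the idea is simply to flush the current bulk before adding a request that would push its estimated size past a byte budget:

    import java.util.function.Consumer;
    import org.elasticsearch.action.bulk.BulkRequest;
    import org.elasticsearch.action.index.IndexRequest;

    // Hypothetical sketch of a size-limited bulk buffer.
    public class SizeLimitedBulkBuffer {

        private final long maxBytes;
        private final Consumer<BulkRequest> executor;
        private BulkRequest current = new BulkRequest();

        public SizeLimitedBulkBuffer(long maxBytes, Consumer<BulkRequest> executor) {
            this.maxBytes = maxBytes;
            this.executor = executor;
        }

        public void add(IndexRequest request) {
            long requestBytes = request.source() == null ? 0 : request.source().length();
            if (current.numberOfActions() > 0 && current.estimatedSizeInBytes() + requestBytes > maxBytes) {
                flush(); // keep each bulk safely under the indexing memory limit
            }
            current.add(request);
        }

        public void flush() { // callers must also flush once after the last add
            if (current.numberOfActions() > 0) {
                executor.accept(current);
                current = new BulkRequest();
            }
        }
    }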
dimitris-athanasiou added a commit that referenced this pull request on Jul 28, 2020 (same commit message as above).

dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this pull request on Jul 28, 2020 (same commit message as above; backport of elastic#60219).

dimitris-athanasiou added a commit (…0283) that referenced this pull request on Jul 28, 2020 (same commit message as above; backport of #60219).

dimitris-athanasiou added a commit (…0284) that referenced this pull request on Jul 28, 2020 (same commit message as above; backport of #60219).
Labels: :Distributed Indexing/CRUD, >enhancement, release highlight, Team:Distributed (Obsolete), v7.9.0, v8.0.0-alpha1

4 participants