Implement rejections in WriteMemoryLimits
#58885
Conversation
Pinging @elastic/es-distributed (:Distributed/CRUD)
@ywelsch - I set the rejection threshold to 30% because a decent number of tests index bulk requests large enough to trip it at 10% (reindex, searchable snapshot, rollover, etc.). Once we decide on a number, we might need to artificially inflate it in the test configurations due to the small heap sizes.
I've left some comments. The main remaining effort on this PR is that we will have to do some validation.
An interesting effect of the change in #57573 (WRITE executor changed to SAME executor on the write action), which Henning pointed out, is that bulk shard requests now have a task registered before execution on the WRITE thread pool, whereas they previously did not. This changes the output of _tasks. This is not necessarily bad, but we should probably make sure we can distinguish between tasks that have executed and are waiting on a replica and those that are just queued up (if we want to expose those). This can be handled in a separate PR; I just did not want to forget about it (i.e. I think it needs to be addressed before #57573 goes into 7.9).
```diff
@@ -697,6 +697,6 @@ public long getAutoGeneratedTimestamp() {

     @Override
     public long ramBytesUsed() {
-        return SHALLOW_SIZE + RamUsageEstimator.sizeOf(id) + (source == null ? 0 : source.ramBytesUsed());
+        return SHALLOW_SIZE + RamUsageEstimator.sizeOf(id) + (source == null ? 0 : source.length());
```
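To make the accounting pattern in this hunk concrete, here is a minimal, hypothetical sketch (not the actual IndexRequest code) of an Accountable-style estimate built from Lucene's RamUsageEstimator, assuming a plain byte[] stands in for the source payload:

```java
import org.apache.lucene.util.Accountable;
import org.apache.lucene.util.RamUsageEstimator;

// Hypothetical request-like class illustrating the estimate discussed above:
// shallow object size, plus the id string, plus the raw length of the source.
public class ExampleWriteRequest implements Accountable {

    private static final long SHALLOW_SIZE =
        RamUsageEstimator.shallowSizeOfInstance(ExampleWriteRequest.class);

    private final String id;
    private final byte[] source; // stand-in for the request's source bytes

    public ExampleWriteRequest(String id, byte[] source) {
        this.id = id;
        this.source = source;
    }

    @Override
    public long ramBytesUsed() {
        // Counting only the payload length (rather than deep-walking the source
        // structure) mirrors the change from ramBytesUsed() to length() above.
        return SHALLOW_SIZE
            + RamUsageEstimator.sizeOf(id)
            + (source == null ? 0 : source.length);
    }
}
```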
While I think that this should be ok for now, it's critical to confirm that our accounting is not completely off, using some tests/benchmarks.
A first test, to ensure that we are not vastly underaccounting, should use a high indices.write.limit (e.g. 80%) and check that when we are filling up the node (we should probably block requests from being processed to do so) we don't hit the real-memory circuit breaker, but rather the indices.write.limit instead.
A second test, to ensure that we are not vastly overaccounting, should use a low indices.write.limit (e.g. 100MB) and show that we can still send a decently large number of requests to the system (i.e. bulks that in total make up 80MB of input) without reaching the limit; a rough client-side sketch of this scenario follows below.
Third, we need to check what the overhead on primaries is while waiting on replica responses (i.e. memory consumed by listeners etc.). For that we should block processing on a replica, fill up a primary, and see how much memory is being consumed.
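As a rough, hypothetical illustration of the second scenario (this is not a test from the PR, and the 429/TOO_MANY_REQUESTS mapping for the new rejection is an assumption), a client-side smoke check against a node configured with a low limit might look like this, using the high-level REST client:

```java
import java.util.Arrays;

import org.apache.http.HttpHost;
import org.elasticsearch.ElasticsearchStatusException;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.rest.RestStatus;

public class WriteLimitSmokeCheck {

    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {

            // Build a bulk whose total payload stays well below a hypothetical
            // 100MB indices.write.limit on the target node.
            char[] filler = new char[1024];
            Arrays.fill(filler, 'x');
            String doc = "{\"field\":\"" + new String(filler) + "\"}";

            BulkRequest bulk = new BulkRequest();
            for (int i = 0; i < 1000; i++) {
                bulk.add(new IndexRequest("test-index").source(doc, XContentType.JSON));
            }

            try {
                client.bulk(bulk, RequestOptions.DEFAULT);
                System.out.println("bulk accepted: accounting stayed under the limit");
            } catch (ElasticsearchStatusException e) {
                // If the coordinating node rejects the bulk, we would expect a
                // too-many-requests style error here (assumed status mapping).
                if (e.status() == RestStatus.TOO_MANY_REQUESTS) {
                    System.out.println("bulk rejected by the write memory limit");
                } else {
                    throw e;
                }
            }
        }
    }
}
```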
server/src/main/java/org/elasticsearch/action/bulk/WriteMemoryLimits.java (outdated; resolved)
```java
import java.util.concurrent.atomic.AtomicLong;

public class WriteMemoryLimits {

    // TODO: Adjust
    public static final Setting<ByteSizeValue> MAX_INDEXING_BYTES =
        Setting.memorySizeSetting("indices.write.limit", "10%", Setting.Property.NodeScope, Setting.Property.Dynamic);
```
10% can be quite a lot for a system with a big heap (i.e. 3GB for a 32GB heap). Should we put an upper bound on this number? Can we run some benchmarks to determine what a good limit would look like (e.g. by checking the lowest limit we can get away with using our current Rally benchmarks).
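To make the heap-relative math concrete, here is a small, hypothetical sketch of how a percentage default for a memory-size setting resolves against the JVM heap (the setting key and default mirror the hunk above; the example class is not part of the PR):

```java
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.unit.ByteSizeValue;

public class WriteLimitResolutionExample {

    // Same shape as the setting quoted above.
    static final Setting<ByteSizeValue> MAX_INDEXING_BYTES =
        Setting.memorySizeSetting("indices.write.limit", "10%",
            Setting.Property.NodeScope, Setting.Property.Dynamic);

    public static void main(String[] args) {
        // A "10%" default resolves against the maximum heap of this JVM,
        // so on a 32GB heap it comes out around 3.2GB.
        ByteSizeValue resolved = MAX_INDEXING_BYTES.get(Settings.EMPTY);
        System.out.println("resolved limit: " + resolved);

        // An explicit absolute value overrides the percentage default.
        ByteSizeValue explicit = MAX_INDEXING_BYTES.get(
            Settings.builder().put("indices.write.limit", "256mb").build());
        System.out.println("explicit limit: " + explicit);
    }
}
```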
I'm not sure which Rally benchmarks you are referring to. All of the Rally benchmarks will by default use very little indexing memory since they don't use many clients.
I ran some benchmarks today with 10K clients per node and have some numbers, but those are pretty specific to the security use case. We can talk about them tomorrow. Generally the CPUs were saturated with indexing and write queue latency was pretty high (ranging from 50ms to 2s through the benchmark). In these benchmarks the write limits tended to be 200-300MB and the replica limits tended to be 10-80MB.
I'm not exactly sure what specific benchmarks you want here. I have run this with some concurrent security benchmarks and have a good idea about the write limits under various loads. But our normal nightly Rally benchmarks use very few clients and will not consume significant indexing memory.
I made some changes to this: added a replica limit, and added a test to ensure we do not trip the write limits in unexpected places. But we should sync up so that we can agree on what specific validation needs to be performed on this PR.
I've left one comment on the setting name and dynamicity
```java
import java.util.concurrent.atomic.AtomicLong;

public class WriteMemoryLimits {

    // TODO: Adjust
    public static final Setting<ByteSizeValue> MAX_INDEXING_BYTES =
        Setting.memorySizeSetting("indices.write.limit", "10%", Setting.Property.NodeScope, Setting.Property.Dynamic);
```
As discussed, let's make this a non-dynamic node setting. We also need to discuss whether we should white-list this setting in Cloud (so that users can change it there).
I think that we should not pick a setting name that is under the "indices" namespace, but perhaps introduce something completely new, for example indexing_limits.memory.limit.
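A minimal sketch of the suggested shape, assuming the proposed name and dropping the Dynamic property (the field and class names are illustrative, not the final implementation):

```java
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.unit.ByteSizeValue;

public class IndexingLimitSettings {

    // Node-scoped and non-dynamic: the limit is fixed for the lifetime of the node.
    public static final Setting<ByteSizeValue> MAX_INDEXING_BYTES =
        Setting.memorySizeSetting("indexing_limits.memory.limit", "10%",
            Setting.Property.NodeScope);
}
```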
I changed the name and made it non-dynamic.
```java
            primaryTransportService.clearAllRules();
        }
    }

    public void testWriteCanBeRejectedAtCoordinatingLevel() throws Exception {
```
As we discussed, we eventually want to have a test that also covers the REST layer.
I added a comment.
LGTM
This commit adds rejections when the indexing memory limits are exceeded for primary or coordinating operations. The amount of bytes allowed for indexing is controlled by a new setting, indexing_limits.memory.limit.
The meta issue #59263 describes this work from a release highlight perspective.
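For reference, a small hypothetical snippet showing how the new node setting could be supplied programmatically, for example in a test fixture that wants rejections to trigger easily (the builder API is standard; the specific value is illustrative):

```java
import org.elasticsearch.common.settings.Settings;

public class WriteLimitNodeSettingsExample {

    public static void main(String[] args) {
        // Lower the indexing memory limit so that moderately sized bulks
        // already exceed it; useful when exercising the rejection path.
        Settings nodeSettings = Settings.builder()
            .put("indexing_limits.memory.limit", "100mb")
            .build();

        System.out.println(nodeSettings.get("indexing_limits.memory.limit"));
    }
}
```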
Data frame analytics jobs that work with very large datasets may produce bulk requests that are over the memory limit for indexing. This commit adds a helper class that bundles index requests into bulk requests that stay below the memory limit. We then use this class from both the results joiner and the inference runner, ensuring data frame analytics jobs do not generate bulk requests that are too large. Note the limit was implemented in #58885. Backport of #60219.
This commit adds rejections when the indexing memory limits are exceeded for primary or coordinating operations. The amount of bytes allowed for indexing is controlled by a new setting, indices.write.limit. Relates #59263.