-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Force execution of finish shard bulk request #51957
Conversation
Currently the shard bulk request can be rejected by the write threadpool after a mapping update. This introduces a scenario where the mapping listener thread will attempt to finish the request and fsync. This thread can potentially be a transport thread. This commit fixes this issue by forcing the finish action to happen on the write threadpool. Fixes elastic#51904.
Pinging @elastic/es-distributed (:Distributed/CRUD) |
How far should we back port this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a comment for consideration.
|
||
@Override | ||
protected void doRun() { | ||
finishRequest(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that rather than scheduling only finishRequest
on the write thread pool, we should execute the entirety of onRejection
on the write thread pool (and still forcing execution, since rejecting execution here would be very bad):
elasticsearch/server/src/main/java/org/elasticsearch/action/bulk/TransportShardBulkAction.java
Line 160 in 3ad8aa6
ActionListener.wrap(v -> executor.execute(this), this::onRejection)) == false) { |
The reason I think this is because we're still doing work in onRejection
that is linear in the number of documents in the bulk request. It seems we'd want to get that off of the networking/cluster state applier thread too given that we're going to fork anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should instead let onRejection
force-push the onRejection
handling onto the queue of the requested executor. The current "direct" handling kind of prioritizes the onRejection
handling over everything else in the queue, which I think there is really no good reason for.
Looking at various onRejection
handlers, some call listener.onFailure
and verifying that none of those do bad things is tricky.
Also notice that onAfter
is also called in the caller thread when requests are rejected, this poses similar issues (not that I found a smoking gun there).
Finally, notice that AbstractRunnable.onRejection
by default calls onFailure
.
I doubt that we careful consider that onAfter
and onFailure
might run in the current thread when using AbstractRunnable
and executing the onRejection
handling on the target thread-pool would fix this, making it easier to reason about.
That said, I think this PR is good and I am not objecting to it going in. Following my suggestion is likely to surface a few additional things to resolve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made @jasontedor change.
I think maybe Henning's comment about maybe onRejection
should be executed on the thread pool anyway is beyond the scope of this PR? Or at least a larger discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tbrooks8, yes, that is fine, I just found it most natural to put it here. I will open a PR with that change so we can discuss based on that PR instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a comment, but this LGTM.
context, null); | ||
} | ||
finishRequest(); | ||
// Force the execution to finish the request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably worth a comment here why we fork to the executor (since it's not obvious from the code).
I think to the 7.6, 7.x, and master branches. |
Currently the shard bulk request can be rejected by the write threadpool after a mapping update. This introduces a scenario where the mapping listener thread will attempt to finish the request and fsync. This thread can potentially be a transport thread. This commit fixes this issue by forcing the finish action to happen on the write threadpool. Fixes elastic#51904.
Currently the shard bulk request can be rejected by the write threadpool after a mapping update. This introduces a scenario where the mapping listener thread will attempt to finish the request and fsync. This thread can potentially be a transport thread. This commit fixes this issue by forcing the finish action to happen on the write threadpool. Fixes #51904.
Currently the shard bulk request can be rejected by the write threadpool after a mapping update. This introduces a scenario where the mapping listener thread will attempt to finish the request and fsync. This thread can potentially be a transport thread. This commit fixes this issue by forcing the finish action to happen on the write threadpool. Fixes #51904.
Currently the shard bulk request can be rejected by the write threadpool
after a mapping update. This introduces a scenario where the mapping
listener thread will attempt to finish the request and fsync. This
thread can potentially be a transport thread. This commit fixes this
issue by forcing the finish action to happen on the write threadpool.
Fixes #51904.