-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exclude nested documents in LuceneChangesSnapshot #51279
Conversation
Pinging @elastic/es-distributed (:Distributed/Distributed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good to me, but I'd like another pair of eyes to look at it too in case there's more traps in this area of which I'm unaware. I left a comment on comments.
Can we do something similar to address the user's issue with not falling back on file-based recovery eagerly enough? It's a known issue:
elasticsearch/server/src/main/java/org/elasticsearch/index/seqno/ReplicationTracker.java
Lines 290 to 295 in d02afcc
// NB safeCommitInfo.docCount is a very low-level count of the docs in the index, and in particular if this shard contains nested | |
// docs then safeCommitInfo.docCount counts every child doc separately from the parent doc. However every part of a nested document | |
// has the same seqno, so we may be overestimating the cost of a file-based recovery when compared to an ops-based recovery and | |
// therefore preferring ops-based recoveries inappropriately in this case. Correctly accounting for nested docs seems difficult to | |
// do cheaply, and the circumstances in which this matters should be relatively rare, so we use this naive calculation regardless. | |
// TODO improve this measure for when nested docs are in use |
It would be good if we could more accurately estimate the "size" of the safe commit to compare it to the operation count in this method.
server/src/main/java/org/elasticsearch/index/engine/CombinedDocValues.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I adapted the version labels as this is a bugfix
server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java
Outdated
Show resolved
Hide resolved
Sure, I will work on a follow-up for this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some minor comments but otherwise LGTM.
server/src/main/java/org/elasticsearch/index/engine/CombinedDocValues.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/engine/CombinedDocValues.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/engine/CombinedDocValues.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/engine/LuceneChangesSnapshot.java
Show resolved
Hide resolved
Thanks everyone :) |
LuceneChangesSnapshot can be slow if nested documents are heavily used. Also, it estimates the number of operations to be recovered in peer recoveries inaccurately. With this change, we prefer excluding the nested non-root documents in a Lucene query instead.
LuceneChangesSnapshot can be slow if nested documents are heavily used. Also, it estimates the number of operations to be recovered in peer recoveries inaccurately. With this change, we prefer excluding the nested non-root documents in a Lucene query instead.
LuceneChangesSnapshot can be slow if nested documents are heavily used. Also, it estimates the number of operations to be recovered in peer recoveries inaccurately. With this change, we prefer excluding the nested non-root documents in a Lucene query instead.
Kudos to @DaveCTurner for finding this issue. See: https://discuss.elastic.co/t/es-7-5-translog-recovery-is-extremely-slow/215505/11.