[Dataset quality] using msearch to speed up degradedDocs query #183023
},
// total docs per dataset
{
  size: 0,
No, the first query only returns the total number of documents where _ignored is not null. A bucket in the first query will look like this:
{
"key": {
"dataset": "apm.error",
"namespace": "default"
},
"doc_count": 102
},
Notice that we are no longer doing the nested aggregation, which I suspect is the most expensive part. And yes, we do need the total number of documents in the time range to compute the ratio (percentages).
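To illustrate how the two result sets could be combined into a ratio, here is a minimal sketch. The `Bucket` shape mirrors the bucket shown above; `mergeDegradedAndTotal` and `DegradedDocsStat` are hypothetical names for illustration and are not taken from the PR.

```typescript
// Hypothetical types for illustration only; the real ones live in the PR.
interface Bucket {
  key: { dataset: string; namespace: string };
  doc_count: number;
}

interface DegradedDocsStat {
  dataset: string;
  namespace: string;
  count: number; // docs where _ignored is not null
  totalDocs: number; // all docs in the time range
  percentage: number; // count / totalDocs * 100
}

// Joins the degraded-docs buckets with the total-docs buckets by dataset and namespace.
function mergeDegradedAndTotal(degraded: Bucket[], total: Bucket[]): DegradedDocsStat[] {
  // JSON.stringify gives a collision-free lookup key for the (dataset, namespace) pair.
  const keyOf = (b: Bucket) => JSON.stringify([b.key.dataset, b.key.namespace]);

  const degradedByKey = new Map<string, number>();
  for (const bucket of degraded) {
    degradedByKey.set(keyOf(bucket), bucket.doc_count);
  }

  return total.map((bucket) => {
    // A dataset missing from the degraded result has zero degraded docs.
    const count = degradedByKey.get(keyOf(bucket)) ?? 0;
    return {
      dataset: bucket.key.dataset,
      namespace: bucket.key.namespace,
      count,
      totalDocs: bucket.doc_count,
      percentage: bucket.doc_count > 0 ? (count / bucket.doc_count) * 100 : 0,
    };
  });
}
```

Iterating over the total-docs buckets rather than the degraded ones means datasets with no degraded documents still appear in the output with a percentage of 0.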
x-pack/plugins/observability_solution/dataset_quality/common/api_types.ts
LGTM! Thanks for the quick change 🚀
💚 Build Succeeded
Relates to #179227.
After gathering some numbers around possible tweaks of the current degradedDocs query (more information), I decided to move forward and split the query to reduce the time Elasticsearch spends aggregating on data streams.
This PR contains the following changes:
- An mSearch method was added to DatasetQualityESClient to allow the usage of multi search.
- degradedDocsRt was changed to include not only the number of degradedDocs but also the total docs for the data streams within the selected time range.

Nothing visible has changed in terms of functionality.
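As a rough sketch of what the split could look like, the snippet below builds a multi search payload in the alternating header/body layout that the Elasticsearch multi search API expects. The function name, the composite aggregation, the page size, and the field names (`@timestamp`, `data_stream.dataset`, `data_stream.namespace`) are assumptions for illustration; the actual queries in this PR may differ.

```typescript
type MsearchLine = Record<string, unknown>;

// Hypothetical helper: returns [header, body, header, body] for two searches.
function buildDegradedDocsMsearch(index: string, start: string, end: string): MsearchLine[] {
  const timeRange = { range: { "@timestamp": { gte: start, lte: end } } };

  // Same per-dataset grouping for both searches, so their buckets line up by key.
  const aggs = {
    datasets: {
      composite: {
        size: 1000, // assumed page size
        sources: [
          { dataset: { terms: { field: "data_stream.dataset" } } },
          { namespace: { terms: { field: "data_stream.namespace" } } },
        ],
      },
    },
  };

  return [
    // degraded docs per dataset: only documents where _ignored exists
    { index },
    {
      size: 0,
      query: { bool: { filter: [timeRange, { exists: { field: "_ignored" } }] } },
      aggs,
    },
    // total docs per dataset
    { index },
    { size: 0, query: { bool: { filter: [timeRange] } }, aggs },
  ];
}
```

Both searches use `size: 0` because only the aggregation buckets are needed, not the matching documents.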
Screen.Recording.2024-05-09.at.12.33.53.mov