[Dataset quality] using msearch to speed up degradedDocs query #183023

yngrdyn · 2024-05-09T10:32:25Z

Relates to #179227.

After gathering some numbers around possible tweaks to the current degradedDocs query (more information), I decided to move forward and split the query to reduce the time Elasticsearch spends aggregating on data streams.

This PR contains the following changes:

An mSearch method was added to DatasetQualityESClient to allow the use of multi search.
degradedDocsRt now includes not only the number of degraded docs but also the total docs for the data streams within the selected time range.
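For readers unfamiliar with multi search, the split described above could look roughly like the sketch below: one request body counts only documents where `_ignored` exists, the other counts all documents, and both group by dataset and namespace. This is an illustration under assumptions, not the code in this PR: the helper name `buildDegradedDocsSearches`, the `TimeRange` type, the field names `@timestamp`, `data_stream.dataset` and `data_stream.namespace`, and the composite page size are all hypothetical.

```typescript
// Hypothetical sketch; names, fields and sizes are assumptions, not the PR's code.
interface TimeRange {
  start: string;
  end: string;
}

type SearchHeader = { index: string };
type SearchBody = Record<string, unknown>;

// Builds the alternating header/body list that a multi search request expects.
function buildDegradedDocsSearches(
  index: string,
  range: TimeRange
): Array<SearchHeader | SearchBody> {
  const rangeFilter = {
    range: { '@timestamp': { gte: range.start, lte: range.end } },
  };

  // Same grouping for both searches: one bucket per dataset/namespace pair.
  const datasets = {
    composite: {
      size: 1000, // assumed page size
      sources: [
        { dataset: { terms: { field: 'data_stream.dataset' } } },
        { namespace: { terms: { field: 'data_stream.namespace' } } },
      ],
    },
  };

  return [
    // degraded docs per dataset: only documents with _ignored set
    { index },
    {
      size: 0,
      query: { bool: { filter: [rangeFilter, { exists: { field: '_ignored' } }] } },
      aggs: { datasets },
    },
    // total docs per dataset: same time range, no _ignored filter
    { index },
    {
      size: 0,
      query: { bool: { filter: [rangeFilter] } },
      aggs: { datasets },
    },
  ];
}
```

Neither search uses a nested aggregation; each one only needs the bucket `doc_count` values, which keeps both requests flat.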

Nothing visible has changed in terms of functionality.

Screen.Recording.2024-05-09.at.12.33.53.mov

apmmachine · 2024-05-09T10:32:37Z

🤖 GitHub comments

Just comment with:

/oblt-deploy : Deploy a Kibana instance using the Observability test environments.
run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

mohamedhamed-ahmed · 2024-05-09T10:44:39Z

...ugins/observability_solution/dataset_quality/server/routes/data_streams/get_degraded_docs.ts

+    },
+    // total docs per dataset
+    {
+      size: 0,


Do we actually need this query to get the total docs? Don't you automatically get the total doc counts from bucket.doc_count?

My thought was that we just need to add one more prop and assign the value to it.

No, the first query only returns the number of documents where _ignored is not null. A bucket in the first query looks like this:

{ "key": { "dataset": "apm.error", "namespace": "default" }, "doc_count": 102 },

Notice that we are no longer doing the nested aggregation, which I suspect is the most expensive part. And yes, we do need the total documents in the time range to get the ratio (percentages).
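To make the ratio concrete, here is a minimal sketch of how buckets from the two queries could be combined. It assumes both responses yield buckets shaped like the example above; the function name `mergeDegradedDocs` and the output fields `count`, `totalDocs` and `percentage` are hypothetical and may not match what degradedDocsRt actually declares.

```typescript
// Hypothetical sketch; output field names are assumptions, not the PR's types.
interface Bucket {
  key: { dataset: string; namespace: string };
  doc_count: number;
}

interface DegradedDocsStat {
  dataset: string;
  namespace: string;
  count: number; // docs with _ignored set (first query)
  totalDocs: number; // all docs in the time range (second query)
  percentage: number; // count / totalDocs * 100
}

function mergeDegradedDocs(degraded: Bucket[], totals: Bucket[]): DegradedDocsStat[] {
  // Index degraded counts by dataset/namespace so each total can look up its match.
  const degradedByKey: Record<string, number> = {};
  for (const bucket of degraded) {
    degradedByKey[`${bucket.key.dataset}|${bucket.key.namespace}`] = bucket.doc_count;
  }

  return totals.map((bucket) => {
    const { dataset, namespace } = bucket.key;
    // A dataset with no degraded docs has no bucket in the first query.
    const count = degradedByKey[`${dataset}|${namespace}`] ?? 0;
    const totalDocs = bucket.doc_count;
    const percentage = totalDocs > 0 ? (count / totalDocs) * 100 : 0;
    return { dataset, namespace, count, totalDocs, percentage };
  });
}
```

The first query alone cannot produce the percentage, because its `doc_count` only covers degraded documents; the denominator has to come from the second query.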

x-pack/plugins/observability_solution/dataset_quality/common/api_types.ts

mohamedhamed-ahmed

LGTM! Thanks for the quick change 🚀

kibana-ci · 2024-05-09T11:53:27Z

💚 Build Succeeded

Buildkite Build
Commit: dd6ef15
Kibana Serverless Image: docker.elastic.co/kibana-ci/kibana-serverless:pr-183023-dd6ef1545cfb
Observability Deployment

Metrics [docs]

Canvas Sharable Runtime

The Canvas "shareable runtime" is a bundle produced to enable running Canvas workpads outside of Kibana. This bundle is included in third-party webpages that embed Canvas and therefore should be as slim as possible.

id	before	after	diff
`module count`	-	5407	+5407
`total size`	-	8.8MB	+8.8MB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb.

id	before	after	diff
`datasetQuality`	36.2KB	36.3KB	+19.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @yngrdyn

yngrdyn requested a review from a team as a code owner May 9, 2024 10:32

botelastic bot added the ci:project-deploy-observability Create an Observability project label May 9, 2024

yngrdyn self-assigned this May 9, 2024

yngrdyn added the release_note:skip Skip the PR/issue when compiling release notes label May 9, 2024

using msearch to speed up degradedDocs query

b81cf2e

yngrdyn force-pushed the 179227-dataset-quality-adjust-degraded-docs-query branch from 832eab2 to b81cf2e Compare May 9, 2024 10:37

Merge branch 'main' into 179227-dataset-quality-adjust-degraded-docs-…

12fdea6

…query

mohamedhamed-ahmed reviewed May 9, 2024

PR comments

dd6ef15

mohamedhamed-ahmed approved these changes May 9, 2024

yngrdyn merged commit f82d640 into elastic:main May 9, 2024
18 checks passed

kibanamachine added v8.15.0 backport:skip This commit does not require backporting labels May 9, 2024

yngrdyn mentioned this pull request May 9, 2024

[Dataset quality] Reactive Estimated Data Summary Panel #182873

Closed
