Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Monitoring][Alerting] Large shard alert #89410

Merged
merged 15 commits into from
Feb 3, 2021

Conversation

igoristic
Copy link
Contributor

@igoristic igoristic commented Jan 27, 2021

Resolves: #74820

This alert checks for large (primary) shards of the specified indices. It will only trigger if the shard is over the 55gb threshold. By default it will check all index patterns *, but you can use Elastic's "Multi-target syntax" pattern/standards to get a desired set of inclusion and negation.

To test it simply set the threshold value (in gigabytes) to something very low (decimal values are also possible)

UI/UX:
Screen Shot 2021-01-28 at 7 57 55 AM
Screen Shot 2021-01-28 at 7 58 09 AM
Screen Shot 2021-01-28 at 8 12 07 AM
Screen Shot 2021-01-28 at 7 53 10 AM

@igoristic igoristic marked this pull request as ready for review January 28, 2021 14:29
@igoristic igoristic requested a review from a team January 28, 2021 14:29
@elasticmachine
Copy link
Contributor

Pinging @elastic/stack-monitoring (Team:Monitoring)

Copy link
Contributor

@chrisronline chrisronline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking awesome! Great work here @igoristic!

I noted a few issues in comments!

const {
_index: monitoringIndexName,
_source: { source_node: sourceNode, index_stats: indexStats },
} = get(indexBucket, 'hits.hits.hits[0]');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using get, can we type this to the newly added interfaces for ES source fields? See https://github.com/elastic/kibana/blob/master/x-pack/plugins/monitoring/common/types/es.ts

shardIndex,
shardSize,
clusterUuid,
nodeName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing an error in the UI when I try and see the index listing page that I think is related to this:
Screen Shot 2021-01-28 at 12 54 40 PM

indexPattern: [],
};
if (!inputValues.indexPattern) {
errors.indexPattern.push(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to trigger right when I open the flyout. Should there be a default of * used?

Screen Shot 2021-01-28 at 11 11 45 AM

item: AlertData | null,
cluster: AlertCluster
) {
const shardIndices = alertStates
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to condense this a little or something. This is what I'm seeing:
Screen Shot 2021-01-29 at 10 33 45 AM

Maybe we need to cap it at some limit and put like ", and more" or something. WDYT?

@ravikesarwani
Copy link
Contributor

Thanks Igor, this looks great.
Few questions I have:
What is the default behavior (*) for hidden and/or system indices? I am assuming that they are excluded for the purpose of alerting. Can you confirm?
What happens when users explicitly configures index patterns that may contains hidden and/or system indices?
Are there any validations to make sure the user entered index patterns are syntactically correct?

@igoristic
Copy link
Contributor Author

@ravikesarwani

I'm not filtering out any "internal" type indices (although it would be pretty easy to add), however, I think this should be a post enhancement. The UI could be a checkbox, kinda what we have in SM indices list page.

I'm also not "strictly" validating the index name/patterns, although, I am escaping invalid characters based on the requirements noted here in the docs

These are great points and I was aware of them, but didn't really put much thought into it since they weren't part of the acceptance criteria (at least not for this first iteration)

@ravikesarwani
Copy link
Contributor

I feel that we should have some solution for the internal indices in our delivery.
Say an internal index is large (> 55GB), what can user really do about it? Not much.
We don't want the alert to generate so much false positives such that they get annoyed and disable the alert.

With our current solution can we just change the default value in index pattern to exclude system indices?
This way we can achieve the desired results with a simple change and still deliver an expected experience to the customers. Something that when they see an alert for they have control over it.

Copy link
Contributor

@chrisronline chrisronline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Great work here!

@igoristic igoristic requested a review from a team as a code owner February 3, 2021 04:02
Copy link
Contributor

@spalger spalger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

80k is quite a jump, but it's still under the implicit goal of 100k so I don't see any reason to object. If y'all would rather rely on the per-PR metrics to identify unexpectedly large PRs I think that's reasonable.

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
monitoring 617 619 +2

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
monitoring 796.8KB 800.2KB +3.4KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
monitoring 46.2KB 49.5KB +3.4KB

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@igoristic igoristic merged commit 0d54f07 into elastic:master Feb 3, 2021
@igoristic igoristic deleted the shards_alert branch February 3, 2021 14:19
igoristic added a commit that referenced this pull request Feb 3, 2021
* Shards alert draft

* Added index pattern validation

* Fixed ui/ux

* Optimizing the response

* CR feedback

* import fix

* Increased size limit
# Conflicts:
#	packages/kbn-optimizer/limits.yml
#	x-pack/plugins/monitoring/public/components/elasticsearch/index/advanced.js
@igoristic
Copy link
Contributor Author

Backport:
7.x: 1b94243

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Monitoring][Additional-Alerting] Shard Size
6 participants