Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Stack Monitoring] Enable OOTB alerts in RAC page and multiple rules of a rule type #106457

Merged
merged 9 commits into from
Aug 4, 2021

Conversation

neptunian
Copy link
Contributor

@neptunian neptunian commented Jul 21, 2021

Summary

Resolves #91145
Resolves #100142
Resolves #92128

  • Enables OOTB alerts to be managed in Stack Management -> RAC
  • Adds support for allowing multiples rules of a rule type

Test

  • Test alert menus with 2 rules of the same type
    • Create the OOTB default alerts first when prompted by the modal or use the dropdown
    • Edit one of the rules like CPU Usage to have something very low to trigger an alert. Select "notify me when alert is active" and to notify very 1 minute. Make sure an alert gets created.
    • Navigate to Stack Management -> Rules and Connectors
    • click "Create Rule". In the Flytout name the new rule (eg CPU USage 2) and select the same rule modified earlier like CPU Usage under Stack Monitoring and use the same settings as above. Wait for alerts to be created.
    • Click on orange "x alerts" badge. Make sure its set to "Group by Alert Type"
    • You should now see in the menu both rules of the same rule type listed, with each node with the alert nested under it
    • Screen Shot 2021-07-21 at 3 49 45 PM
    • Click on orange "x alerts" badge. Make sure its set to "Group by Node Type"
    • You should see a list of each node with the number of alerts it has in parenthesis
    • Clicking on the node you should see each rule that's being alerted for that node
    • Screen Shot 2021-07-21 at 3 53 34 PM
    • Assure that the count "x alerts" on the badge, matches the actual alerts in the menu
  • Test other pages
    • Clicking on any of the pages such as Nodes you should not have any errors. The alerts should be in a callout at the top and the alert count should match the number of alerts
    • Screen Shot 2021-07-21 at 3 55 22 PM
    • The detailed Node page has alerts for all the rules (note 2 CPU usage alerts)
      Screen Shot 2021-07-22 at 12 40 39 PM

@@ -566,7 +566,7 @@ export const ALERT_ACTION_TYPE_LOG = '.server-log';
/**
* To enable modifing of alerts in under actions
*/
export const ALERT_REQUIRES_APP_CONTEXT = true;
export const ALERT_REQUIRES_APP_CONTEXT = false;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allows rules to be managed in Stack Management -> RAC

CommonAlert,
CommonAlertState,
CommonAlertStatus,
} from '../../../common/types/alerts';
import { PanelItem } from '../types';
import { sortByNewestAlert } from './sort_by_newest_alert';
import { Legacy } from '../../legacy_shims';

Copy link
Contributor Author

@neptunian neptunian Jul 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This view creates the nested alerts menu when clicking on the badges. It's very confusing but I tried to get rid of some for loops and add some types. The logic change is that it's accommodating the fact that there can be more than one type of a rule and we should show them both in the menu.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that it's still very confusing. I would like to pull it apart more but not during this PR. For now, can we rename alerts on the incoming getAlertPanelsByCategory function, to something that is more descriptive? It's really hard to keep track of what "alerts" means at any given time.

Also, can we add a bunch of comments for now about what these functions are doing step by step?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was playing around with the code to see if I could understand it better. This is where I ended up: https://gist.github.com/jasonrhodes/a6ad54b808943d46f85082df2bbfb200

I don't think we should do any of that right now, or ever necessarily. I think what you've done here is a good step for what we need right now, but if you look that over it may help you see where some comments might be useful based on how badly I've misunderstood some things, etc.

menu.push({
...category,
alerts: alertsInCategory.map(({ alertName }) => {
const alertStatus = alertsContext.allAlerts[alertName];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the point in using the alertsContext which has a list of the OOTB alerts to get the instance there since we fetch for the same data at an interval and it exists in the component state.

alertTypeId: ALERT_NODES_CHANGED,
name: `${ALERT_NODES_CHANGED}_label_2`,
...mockAlert,
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a rule of the same rule type and except that rule to be shown.

count: 0,
};
alertsByNodes[state.nodeId][alertId].count++;
alertsByNodes[state.nodeId][alertId].states.push(alertState);
Copy link
Contributor Author

@neptunian neptunian Jul 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

groups by alertId (rule id - unique UUID) instead of alertTypeId (eg: monitoring_cpu_usage) which can now happen twice due to support of multiple types of rules allowed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thanks.

const alertExists = alertTypeIds.find(
(name) => alerts[name] && alerts[name].find((rule) => rule.states.length > 0)
);
return inSetupMode || alertExists;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

search in all types of rules

return cnt;
}, 0);
return cnt;
}, 0);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

count rules for all types of rules. flatten first, then reduce.

id: 1,
},
{
id: 2,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add test for returning multiple rules of a type instead of expecting only 1


const [rawAlert] = alertClientAlerts.data as [Alert];
return new alertCls(rawAlert) as BaseAlert;
return alertClientAlerts.data.map((alert) => new alertCls(alert as Alert) as BaseAlert);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return all the rules of a type instead of the first one

{
rawAlert: { id: 2 },
states: [],
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expect more than one rule per rule type to be returned

@neptunian neptunian marked this pull request as ready for review July 21, 2021 21:22
@neptunian neptunian requested a review from a team July 21, 2021 21:22
@elasticmachine
Copy link
Contributor

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)

@neptunian
Copy link
Contributor Author

@elasticmachine merge upstream

@jasonrhodes jasonrhodes self-assigned this Jul 27, 2021
@neptunian
Copy link
Contributor Author

@elasticmachine merge upstream

@estermv estermv self-requested a review August 2, 2021 16:29
@neptunian
Copy link
Contributor Author

@elasticmachine merge upstream

Copy link
Member

@jasonrhodes jasonrhodes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if we can add a few renames/comments in the bulky part of this code path.

CommonAlert,
CommonAlertState,
CommonAlertStatus,
} from '../../../common/types/alerts';
import { PanelItem } from '../types';
import { sortByNewestAlert } from './sort_by_newest_alert';
import { Legacy } from '../../legacy_shims';

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that it's still very confusing. I would like to pull it apart more but not during this PR. For now, can we rename alerts on the incoming getAlertPanelsByCategory function, to something that is more descriptive? It's really hard to keep track of what "alerts" means at any given time.

Also, can we add a bunch of comments for now about what these functions are doing step by step?

CommonAlert,
CommonAlertState,
CommonAlertStatus,
} from '../../../common/types/alerts';
import { PanelItem } from '../types';
import { sortByNewestAlert } from './sort_by_newest_alert';
import { Legacy } from '../../legacy_shims';

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was playing around with the code to see if I could understand it better. This is where I ended up: https://gist.github.com/jasonrhodes/a6ad54b808943d46f85082df2bbfb200

I don't think we should do any of that right now, or ever necessarily. I think what you've done here is a good step for what we need right now, but if you look that over it may help you see where some comments might be useful based on how badly I've misunderstood some things, etc.

count: 0,
};
alertsByNodes[state.nodeId][alertId].count++;
alertsByNodes[state.nodeId][alertId].states.push(alertState);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thanks.

@estermv estermv removed their request for review August 3, 2021 06:53
@neptunian neptunian force-pushed the 100142-multiple-rule-types branch from 595bf99 to d9b279a Compare August 3, 2021 15:52
@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
monitoring 730.0KB 729.4KB -654.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @jasonrhodes @neptunian

@neptunian neptunian merged commit b03d85a into elastic:master Aug 4, 2021
neptunian added a commit to neptunian/kibana that referenced this pull request Aug 4, 2021
…of a rule type (elastic#106457)

* allow rules to be managed in RAC page

* return all rules of a rule type instead of first one

* update UI to handle multiple rule types

* add comments about creating the menus by category for alerts and rules

* fix parsing of cluster alerts
neptunian added a commit that referenced this pull request Aug 4, 2021
…of a rule type (#106457) (#107643)

* allow rules to be managed in RAC page

* return all rules of a rule type instead of first one

* update UI to handle multiple rule types

* add comments about creating the menus by category for alerts and rules

* fix parsing of cluster alerts
streamich pushed a commit to vadimkibana/kibana that referenced this pull request Aug 8, 2021
…of a rule type (elastic#106457)

* allow rules to be managed in RAC page

* return all rules of a rule type instead of first one

* update UI to handle multiple rule types

* add comments about creating the menus by category for alerts and rules

* fix parsing of cluster alerts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment