[SIEM] [ML] Starting a job without enough memory doesn't always show Out of Memory error #54382

Closed
spong opened this issue Jan 9, 2020 · 4 comments
Labels
bug, Feature:ML Rule, fixed, impact:low, QA:Validated, Team:Detection Rule Management, Team:Detections and Resp, Team: SecuritySolution, Team:SIEM

Comments

@spong
Member

spong commented Jan 9, 2020

If a job fails to start due to a lack of memory, this error should be presented to the user by means of an error toast. While this behavior is present in some cases (see #45316), the error is not displayed consistently.

In the below gif, you'll notice that the force_start_datafeed request returns an empty object instead of an error. In this instance, when we next refresh the state of all jobs, we can show that the job was not able to start due to a lack of memory by putting the error message/current state as hover text on the job (or potentially a callout at the top saying we're at max utilization).

ml_no_oom_error

State stuck as 'opening' in ML UI:
image
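
The fallback described above (deriving the message from the job's state on the next refresh, because the start request itself returned an empty object) could look roughly like the sketch below. This is only an illustration under stated assumptions: the `JobStatus` shape, its field names, and the `startFailureHoverText` helper are hypothetical and are not taken from the Kibana or ML plugin code.

```typescript
// Hypothetical, simplified view of one job as seen on a state refresh.
// Field names are assumptions for illustration, not the real Kibana types.
interface JobStatus {
  jobId: string;
  state: 'closed' | 'opening' | 'opened' | 'closing' | 'failed';
  // Free-text reason the job has no node yet, if the backend reports one.
  assignmentExplanation?: string;
}

// Returns hover text for a job that was asked to start but did not open,
// or undefined when there is nothing to report.
function startFailureHoverText(job: JobStatus, startRequested: boolean): string | undefined {
  // Nothing to show if the user never started the job or it opened fine.
  if (!startRequested || job.state === 'opened') {
    return undefined;
  }
  // Prefer the backend's own explanation when one is available.
  if (job.assignmentExplanation) {
    return `Job "${job.jobId}" is ${job.state}: ${job.assignmentExplanation}`;
  }
  // No explanation, but the job is stuck or failed: show a generic hint.
  if (job.state === 'opening' || job.state === 'failed') {
    return `Job "${job.jobId}" is ${job.state} and has not started; the ML node may be out of memory.`;
  }
  return undefined;
}
```

Because the text is computed from refreshed state rather than from the start response, a job stuck in 'opening' would still get a message even when the start call reported no error.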

@spong spong added bug Fixes for quality problems that affect the customer experience Team:SIEM labels Jan 9, 2020
@elasticmachine
Contributor

Pinging @elastic/siem (Team:SIEM)

@spong spong self-assigned this Jan 10, 2020
@spong
Member Author

spong commented Jan 10, 2020

As introduced in #50766, looks like we can add even more details around the no node/OoM cases to better improve the UX here.

@spong
Member Author

spong commented Jun 25, 2020

@MadameSheema this is still relevant, and should be prioritized as we continue to add more ML jobs.

@MadameSheema MadameSheema added the Team:Detections and Resp Security Detection Response Team label Oct 1, 2020
@MindyRS MindyRS added the Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. label Oct 27, 2020
@peluja1012 peluja1012 added the impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. label Oct 28, 2020
@peluja1012 peluja1012 added the Feature:ML Rule Security Solution Machine Learning rule type label Nov 17, 2020
@peluja1012 peluja1012 added the Team:Detection Rule Management Security Detection Rule Management Team label Sep 15, 2021
@spong spong removed their assignment Nov 10, 2021
@cybersecdiva cybersecdiva added fixed QA:Validated Issue has been validated by QA labels Aug 17, 2023
@cybersecdiva

Tested in 8.9.0 BC5

Build Details:
VERSION: 8.9.0 BC5
BUILD: 64715
COMMIT: beb56356c5c037441f89264361302513ff5bd9f8

Preconditions:

  • Kibana must be running
  • ML node must be set up with the lowest memory (1 GB)

Describe the bug:
Starting an ML job without enough memory doesn't always show Out of Memory error

Steps to reproduce:

  1. Navigate to Security --> Manage Rules --> ML job settings

  2. Under ML job settings enable a minimum of 10 ML jobs

  3. Save the change when the pop-up notification asks you to confirm it

  4. Reload the page for the change to take effect

  5. Navigate to Security --> Manage --> Rules

Current behavior

No Out of Memory errors or warnings are shown for the started ML jobs (10+ jobs enabled)

Expected behavior:

No Out of Memory errors or warnings are shown for the started ML jobs (10+ jobs enabled)

Observations:

A review of ML job memory usage shows that with more than 10 jobs enabled on the cluster, and even with 20 or more, no Out of Memory errors are displayed

Screenshots of behavior:

Machine Learning Memory Usage
Screenshot 2023-08-17 at 6 39 48 PM

Conclusion:

Behavior appears to be as expected. Validated ✅ bug is fixed.

@MadameSheema and @spong FYI Updated observations

6 participants