[Security Solution] Extend Detection Engine Health API with top N rules by metrics #181169

Draft
wants to merge 2 commits into
base: main
from

maximpn
Contributor

@maximpn maximpn commented Apr 18, 2024

Relates to: #125642

Summary

This PR extends the Detection Engine Health API with the top N rules (10 by default) for each metric, such as execution duration or schedule delay.

Details

This PR is part of my OnWeek! project to investigate whether LLMs, for example ChatGPT from OpenAI, can automate rule monitoring by summarising problems in Detection Engine Health API responses and giving users instructions and advice for solving them.

Extending the Detection Engine Health API with top N rules is beneficial on its own, since it makes it easy to spot problematic rules and investigate them further manually. It could be especially helpful when working on SDHs.

The following API endpoints were extended:

  • Cluster health API endpoint /internal/detection_engine/health/_cluster
  • Space health API endpoint /internal/detection_engine/health/_space

The number of top rules returned is controlled by the num_of_top_rules body param. The default value is 10.
This param can only be set via an HTTP POST request (the same applies to interval). With an HTTP GET request, at most 10 top rules are returned for each metric.
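
As a rough illustration of the GET/POST difference described above, the sketch below builds the request for either endpoint. Only the two endpoint paths and the num_of_top_rules param come from this PR. The helper name buildHealthRequest, the type names, and the kbn-xsrf and elastic-api-version headers are assumptions made for the example, so check them against your Kibana version before relying on them.

```typescript
// Hypothetical helper, not part of this PR: builds a request description
// for the two health endpoints extended here.
type HealthScope = 'cluster' | 'space';

interface HealthRequestBody {
  // Body param introduced by this PR; the API defaults to 10 when omitted.
  num_of_top_rules?: number;
}

interface HealthRequest {
  method: 'GET' | 'POST';
  path: string;
  headers: Record<string, string>;
  body?: string;
}

const HEALTH_PATHS: Record<HealthScope, string> = {
  cluster: '/internal/detection_engine/health/_cluster',
  space: '/internal/detection_engine/health/_space',
};

function buildHealthRequest(scope: HealthScope, body?: HealthRequestBody): HealthRequest {
  // Assumption: header names and values typically expected by internal Kibana routes.
  const headers: Record<string, string> = {
    'kbn-xsrf': 'true',
    'elastic-api-version': '1',
  };

  // Without a body we fall back to GET, which cannot carry num_of_top_rules.
  if (body === undefined) {
    return { method: 'GET', path: HEALTH_PATHS[scope], headers };
  }

  // A body param can only be passed with POST.
  return {
    method: 'POST',
    path: HEALTH_PATHS[scope],
    headers: { ...headers, 'content-type': 'application/json' },
    body: JSON.stringify(body),
  };
}

// Ask the cluster health endpoint for the top 25 rules per metric.
const topRulesRequest = buildHealthRequest('cluster', { num_of_top_rules: 25 });
console.log(topRulesRequest.method, topRulesRequest.path, topRulesRequest.body);
```

The returned object can be handed to whatever HTTP client is in use; keeping the builder free of network calls makes it easy to check in isolation.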

Top N rules are now reported for each of the following metrics (measured in milliseconds):

  • Execution duration
  • Schedule delay
  • Search duration
  • Indexing duration
  • Enrichment duration

The following response parts were extended with a new section under the top_rules key:

  • Health stats over the specified "health interval" (stats_over_interval)
  • Health history over the specified "health interval" (history_over_interval)

Response example

Cluster health response (truncated)

{
    "timings": {
        "requested_at": "2024-04-18T14:55:43.075Z",
        "processed_at": "2024-04-18T14:55:44.027Z",
        "processing_time_ms": 952
    },
    ...
    "health": {
        ...
        "stats_over_interval": {
            "top_rules": {
                "by_execution_duration_ms": [
                    {
                        "id": "4a47dcad-08a7-4ef7-89ae-0a0c8a8efdc5",
                        "name": "Cobalt Strike Command and Control Beacon",
                        "category": "siem.queryRule",
                        "percentiles": {
                            "50.0": 303,
                            "95.0": 241134.59999999974,
                            "99.0": 323691.71999999986,
                            "99.9": 342267.0720000002
                        }
                    },
                   ...
                ],
                "by_schedule_delay_ms": [
                    {
                        "id": "4cf4a486-cfae-49fc-968b-5f60ea84c228",
                        "name": "Machine Learning Detected DGA activity using a known SUNBURST DNS domain",
                        "category": "siem.queryRule",
                        "percentiles": {
                            "50.0": 25265,
                            "95.0": 741284.1999999995,
                            "99.0": 881731.2399999998,
                            "99.9": 913331.8240000004
                        }
                    },
                    ...
                ],
                "by_search_duration_ms": [
                    {
                        "id": "e6447ef8-fdb9-41f6-9ddb-2f60213f6c15",
                        "name": "Agent Spoofing - Multiple Hosts Using Same Agent",
                        "category": "siem.thresholdRule",
                        "percentiles": {
                            "50.0": 11,
                            "95.0": 30.599999999999994,
                            "99.0": 32.519999999999996,
                            "99.9": 32.952000000000005
                        }
                    },
                   ...
                ],
                "by_indexing_duration_ms": [
                    {
                        "id": "00064efe-c3a3-449d-b1e4-db2fe1263a55",
                        "name": "Suspicious ScreenConnect Client Child Process",
                        "category": "siem.eqlRule",
                        "percentiles": {
                            "50.0": 0,
                            "95.0": 0,
                            "99.0": 0,
                            "99.9": 0
                        }
                    },
                    ...
                ],
                "by_enrichment_duration_ms": [
                    {
                        "id": "00064efe-c3a3-449d-b1e4-db2fe1263a55",
                        "name": "Suspicious ScreenConnect Client Child Process",
                        "category": "siem.eqlRule",
                        "percentiles": {
                            "50.0": 0,
                            "95.0": 0,
                            "99.0": 0,
                            "99.9": 0
                        }
                    },
                    ...
                ]
            },
            ...
        },
        "history_over_interval": {
            "buckets": [
                {
                    "timestamp": "2024-04-18T13:00:00.000Z",
                    "stats": {
                        "top_rules": {
                            "by_execution_duration_ms": [
                                {
                                    "id": "1cccaa08-2d35-40bf-a2c3-99b41de7e6b7",
                                    "name": "Container Workload Protection",
                                    "category": "siem.queryRule",
                                    "percentiles": {
                                        "50.0": 2595,
                                        "95.0": 2595,
                                        "99.0": 2595,
                                        "99.9": 2595
                                    }
                                },
                                ...
                            ],
                            "by_schedule_delay_ms": [
                                {
                                    "id": "db627248-3395-4bc1-85d6-dae20401fc49",
                                    "name": "Endpoint Security",
                                    "category": "siem.queryRule",
                                    "percentiles": {
                                        "50.0": 1664,
                                        "95.0": 1664,
                                        "99.0": 1664,
                                        "99.9": 1664
                                    }
                                },
                                ...
                            ],
                            ...
                        }
                    }
                },
               ...
            ]
        }
    }
}
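
To show how a consumer might use the new section, here is a minimal sketch that scans a top_rules object and lists rules whose chosen percentile exceeds a threshold. The field names (id, name, category, percentiles) and metric keys follow the response example above. The type names, the findRulesAbove helper, and the 60000 ms threshold are hypothetical and only serve the illustration.

```typescript
// Shape of one entry under top_rules, as seen in the response example.
interface TopRule {
  id: string;
  name: string;
  category: string;
  percentiles: Record<string, number>; // keys like "50.0", "95.0", "99.0", "99.9"
}

// Keyed by metric, e.g. "by_execution_duration_ms".
type TopRules = Record<string, TopRule[]>;

interface Finding {
  metric: string;
  ruleId: string;
  ruleName: string;
  valueMs: number;
}

// Hypothetical helper: returns rules above the threshold at the given
// percentile, worst first.
function findRulesAbove(topRules: TopRules, percentile: string, thresholdMs: number): Finding[] {
  const findings: Finding[] = [];
  for (const [metric, rules] of Object.entries(topRules)) {
    for (const rule of rules) {
      const valueMs = rule.percentiles[percentile];
      if (valueMs !== undefined && valueMs > thresholdMs) {
        findings.push({ metric, ruleId: rule.id, ruleName: rule.name, valueMs });
      }
    }
  }
  return findings.sort((a, b) => b.valueMs - a.valueMs);
}

// Sample data copied from the (truncated) cluster health response above.
const sampleTopRules: TopRules = {
  by_execution_duration_ms: [
    {
      id: '4a47dcad-08a7-4ef7-89ae-0a0c8a8efdc5',
      name: 'Cobalt Strike Command and Control Beacon',
      category: 'siem.queryRule',
      percentiles: { '50.0': 303, '95.0': 241134.59999999974 },
    },
  ],
  by_schedule_delay_ms: [
    {
      id: '4cf4a486-cfae-49fc-968b-5f60ea84c228',
      name: 'Machine Learning Detected DGA activity using a known SUNBURST DNS domain',
      category: 'siem.queryRule',
      percentiles: { '50.0': 25265, '95.0': 741284.1999999995 },
    },
  ],
  by_search_duration_ms: [
    {
      id: 'e6447ef8-fdb9-41f6-9ddb-2f60213f6c15',
      name: 'Agent Spoofing - Multiple Hosts Using Same Agent',
      category: 'siem.thresholdRule',
      percentiles: { '50.0': 11, '95.0': 30.599999999999994 },
    },
  ],
};

// Rules whose 95th percentile exceeds one minute (arbitrary example threshold).
const slowRules = findRulesAbove(sampleTopRules, '95.0', 60_000);
for (const f of slowRules) {
  console.log(`${f.metric}: ${f.ruleName} (${f.valueMs.toFixed(0)} ms)`);
}
```

The same helper can be applied to each bucket's stats.top_rules under history_over_interval to see when a rule started to degrade.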

@maximpn maximpn added the enhancement, release_note:skip, impact:low, Team:Detections and Resp, Team: SecuritySolution, Feature:Rule Monitoring, Team:Detection Rule Management, and v8.15.0 labels Apr 18, 2024
@maximpn maximpn self-assigned this Apr 18, 2024
@maximpn maximpn force-pushed the add-top-n-rules-to-health-api branch from 7637471 to be946aa Compare April 18, 2024 16:02
@maximpn maximpn marked this pull request as ready for review April 18, 2024 18:46
@maximpn maximpn requested a review from a team as a code owner April 18, 2024 18:46
@maximpn maximpn requested a review from jpdjere April 18, 2024 18:46
@elasticmachine
Contributor

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@elasticmachine
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@elasticmachine
Contributor

Pinging @elastic/security-detection-rule-management (Team:Detection Rule Management)

@maximpn maximpn requested a review from banderror April 18, 2024 18:46
@maximpn maximpn force-pushed the add-top-n-rules-to-health-api branch from 3d1d03a to 3bc6a79 Compare April 19, 2024 08:16
@kibana-ci
Collaborator

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
securitySolution 5451 5452 +1

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
securitySolution 14.6MB 14.6MB +208.0B

History

  • 💚 Build #204689 succeeded be946aa468d77f074cdad40501a70112b37c70e4
  • 💔 Build #204668 failed 7637471439c60544124b7cc009cb05cc99e4240d

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @maximpn

@banderror banderror removed the request for review from jpdjere April 19, 2024 13:28
@banderror banderror marked this pull request as draft June 25, 2024 09:33
@banderror banderror removed the v8.15.0 label Jul 1, 2024
Labels
enhancement, Feature:Rule Monitoring, impact:low, release_note:skip, Team:Detection Rule Management, Team:Detections and Resp, Team: SecuritySolution