-
Notifications
You must be signed in to change notification settings - Fork 466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds ML supervised model Problem child package #2115
Adds ML supervised model Problem child package #2115
Conversation
6a60abe
to
fc68b33
Compare
fc68b33
to
b83c6dd
Compare
packages/ml_problem_child/kibana/ml_module/problem-child-ml.json
Outdated
Show resolved
Hide resolved
packages/ml_problem_child/kibana/ml_module/problem-child-ml.json
Outdated
Show resolved
Hide resolved
packages/ml_problem_child/kibana/ml_module/problem-child-ml.json
Outdated
Show resolved
Hide resolved
packages/ml_problem_child/kibana/ml_module/problem-child-ml.json
Outdated
Show resolved
Hide resolved
packages/ml_problem_child/kibana/ml_module/problem-child-ml.json
Outdated
Show resolved
Hide resolved
8074733
to
d6d356e
Compare
3876651
to
0cac4a9
Compare
packages/ml_problem_child/kibana/ml_module/problem-child-ml.json
Outdated
Show resolved
Hide resolved
packages/ml_problem_child/kibana/ml_module/problem-child-ml.json
Outdated
Show resolved
Hide resolved
packages/ml_problem_child/kibana/ml_module/problem-child-ml.json
Outdated
Show resolved
Hide resolved
packages/ml_problem_child/kibana/ml_module/problem-child-ml.json
Outdated
Show resolved
Hide resolved
}, | ||
{ | ||
"by_field_name": "host.hostname", | ||
"detector_description": "high sum by host", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, could the description here be edited to indicate the field it is running on blocklist_label
? I don't have enough context on what information is held in blocklist_label
to advise here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@peteharverson So the way we classify processes as malicious is using either a supervised model or a blocklist for things the model may have missed. I think adding this in the description will introduce complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Thanks for explanation @ajosh0504
}, | ||
{ | ||
"by_field_name": "user.name", | ||
"detector_description": "high sum by host", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we include the field name in the description of this detector - what is it a high sum of?
"detectors": [ | ||
{ | ||
"by_field_name": "user.name", | ||
"detector_description": "high sum by user", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we include the field name in the description of this detector - what is it a high sum of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, I think introducing field names in the descriptions might introduce complexity, since users may not know what the fields exactly mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Apoorva, field names aren't always the most user-centric and don't add that much value (imo) for the complexity they could add
"detectors": [ | ||
{ | ||
"by_field_name": "process.parent.name", | ||
"detector_description": "high sum by parent process", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we include the field name in the description of this detector - what is it a high sum of?
}, | ||
{ | ||
"by_field_name": "process.parent.name", | ||
"detector_description": "high sum by host", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we include the field name in the description of this detector - what is it a high sum of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for getting this done! Super excited to see these models in packages. Just one question. So once a user installs a package, what's the process for them to update/change artifacts produced by the package?
@@ -0,0 +1,6 @@ | |||
# ML Problem Child |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we keep the name consistent everywhere? I'm seeing ProblemChild
, problem child
and ProblemChild
being used. We've been going with ProblemChild
so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use ProblemChild
everywhere in 7d33b9d
@@ -0,0 +1,6 @@ | |||
# ML Problem Child | |||
|
|||
The Problem child model package contains the [Problem child model and associated assets](https://www.elastic.co/blog/problemchild-generate-alerts-to-detect-living-off-the-land-attacks), which are used to detect living off the land (LOtl) activity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Problem child model package contains the [Problem child model and associated assets](https://www.elastic.co/blog/problemchild-generate-alerts-to-detect-living-off-the-land-attacks), which are used to detect living off the land (LOtl) activity. | |
The Problem child package contains the [Problem child model and associated assets](https://www.elastic.co/blog/problemchild-generate-alerts-to-detect-living-off-the-land-attacks), which are used to detect living off the land (LotL) activity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in 7d33b9d
packages/ml_problem_child/kibana/ml_module/problem-child-ml.json
Outdated
Show resolved
Hide resolved
}, | ||
{ | ||
"by_field_name": "host.hostname", | ||
"detector_description": "high sum by host", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@peteharverson So the way we classify processes as malicious is using either a supervised model or a blocklist for things the model may have missed. I think adding this in the description will introduce complexity.
"detectors": [ | ||
{ | ||
"by_field_name": "user.name", | ||
"detector_description": "high sum by user", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, I think introducing field names in the descriptions might introduce complexity, since users may not know what the fields exactly mean.
As with the DGA model, should the model file name here be something more understandable? It's currently @ajosh0504 - this file name is the name of the model id since the file name is used to determine the model id to send up to the put trained models api to install it to ES. Filenames being the asset id is consistent with the way fleet installs things. |
@@ -0,0 +1,10 @@ | |||
--- | |||
description: "A pipeline of pipelines for ProblemChild detection" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to make this description a bit clearer? I feel like "pipeline of pipelines" at first read could be a bit confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about just Pipelines for ProblemChild detection
? Is this label actually shown anywhere in the UI?
|
||
The Problem child model package contains the [Problem child model and associated assets](https://www.elastic.co/blog/problemchild-generate-alerts-to-detect-living-off-the-land-attacks), which are used to detect living off the land (LOtl) activity. | ||
|
||
To download the assets, click **Settings** > **Install ML Problem child assets**. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming this is within a cluster? Are there any steps needed before this? i.e. ProblemChild requires Platinum level capabilities, is that denoted anywhere (unless I missed it)? Maybe even a quick note in the README would be helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The package manifest file has the license type that is required - that is enforced in Fleet and the installation will fail due to failing to meet the required license type.
cc @lcawl for package/model description text in the readme and such. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So cool to see this finally going into the product (after being in an experimental state for so long). I left a few suggestions.
@spong FYSA, if I am not mistaken, this and #2352 and the first integrations outside of the prebuilt rules to add rules destined for the detection engine. Not sure if there were any assumptions or considerations to be made.
"author": [ | ||
"Elastic" | ||
], | ||
"description": "A supervised machine learning model (ProblemChild) or its blocklist has identified\na suspicious Windows process event to be malicious activity.\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the line break in the middle intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't need a trailing one either
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These lines breaks were part of the release and haven't been changed. I'd need confirmation that it's okay to remove them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"license": "Elastic License", | ||
"max_signals": 10000, | ||
"name": "Machine Learning Detected a Suspicious Windows Event with a High Malicious Probability Score", | ||
"query": "(problemchild.prediction:1 and problemchild.prediction_probability \u003e 0.98) or blocklist_label:1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was the \u003e
a unicode conversion error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep - fixed! thanks! 7d0d452
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, I'm still seeing this in the latest. The other changes from 7d0d452 appear to have made it though? (i.e. packages/problem_child/manifest.yml
title has been updated to ProblemChild
instead ML ProblemChild
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this - this should now be a >
instead of the encoded version 👍 Fixed in c3692a17d515ebdd31e95e03652f735d04db5748
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@spong - looks like running the elastic-package format
command is what's causing the change from >
to the encoded version. Is this expected?
cc @mtojek - also - running elastic-package check
locally is failing with a reference to the ml_model file name - though the update was made to package-spec to match that file name pattern and I've tested with the package-spec validator test so curious if this could be something going on locally? Might need to chat with you quickly on that one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow, looks like you are right
Line 16 in 1d18f55
"query": "/* Registry Path ends with backslash */\nregistry where /* length(registry.data.strings) \u003e 0 and */\n registry.path : (\"HKEY_USERS\\\\*\\\\Software\\\\Microsoft\\\\Windows\\\\CurrentVersion\\\\Run\\\\\",\n \"HKU\\\\*\\\\Software\\\\Microsoft\\\\Windows\\\\CurrentVersion\\\\Run\\\\\",\n \"HKLM\\\\Software\\\\Microsoft\\\\Windows\\\\CurrentVersion\\\\Run\\\\\", \n \"HKLM\\\\Software\\\\WOW6432Node\\\\Microsoft\\\\Windows\\\\CurrentVersion\\\\Run\\\\\", \n \"HKEY_USERS\\\\*\\\\Software\\\\Microsoft\\\\Windows\\\\CurrentVersion\\\\Policies\\\\Explorer\\\\Run\\\\\",\n \"HKU\\\\*\\\\Software\\\\Microsoft\\\\Windows\\\\CurrentVersion\\\\Policies\\\\Explorer\\\\Run\\\\\",\n \"HKLM\\\\Software\\\\Microsoft\\\\Windows\\\\CurrentVersion\\\\Policies\\\\Explorer\\\\Run\\\\\")\n", |
I guess it is ok to leave it converted then 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we can see in the CI output is:
[2022-03-18T15:59:14.998Z] Error: checking package failed: formatting the integration failed (path: /var/lib/jenkins/workspace/est-manager_integrations_PR-2115/src/github.com/elastic/integrations/packages/problem_child, failFast: true): walking through the integration files failed: formatting file failed (path: /var/lib/jenkins/workspace/est-manager_integrations_PR-2115/src/github.com/elastic/integrations/packages/problem_child/kibana/security_rule/9a2e372a-cbeb-4ad6-a288-017ef086324c.json): file is not formatted (path: /var/lib/jenkins/workspace/est-manager_integrations_PR-2115/src/github.com/elastic/integrations/packages/problem_child/kibana/security_rule/9a2e372a-cbeb-4ad6-a288-017ef086324c.json)
so it confirms that you have to post the formatted code. Sometimes the formatted code might be harder to read, but we don't have a choice here if we want to depend on standard JSON libraries. Please run elastic-package format
and post the formatted content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @mtojek - also - running elastic-package check locally is failing with a reference to the ml_model file name - though the update was made to package-spec to match that file name pattern and I've tested with the package-spec validator test so curious if this could be something going on locally? Might need to chat with you quickly on that one.
It depends on which elastic-package release you have locally and which one is in the go.mod
in the Integrations.
BTW we're struggling a bit with updating the dependency - here.
The ProblemChild package contains the [ProblemChild model and associated assets](https://www.elastic.co/blog/problemchild-generate-alerts-to-detect-living-off-the-land-attacks), which are used to detect living off the land (LotL) activity. | ||
|
||
To download the assets, click **Settings** > **Install ML ProblemChild assets**. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it worth elaborating on this to explain the relationship between the model, jobs, and rules. I believe the job and rules will be installed but in a disabled state. Also if the rule is enabled before the job, it will throw an error.
The schema for rules accepts defining enabled = true
, though I am not totally sure on the ml jobs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding something to the Configuration
section of the readme - after the to download the assets
bit
Something like Ingest data with the installed ingest pipeline to enrich your indices with inference data and run the provided anomaly detection jobs.
If rules are enabled in the same way as described in https://www.elastic.co/guide/en/security/master/rules-ui-management.html, worst case we can link there. For example, add a sentence like, "For more information about activating the detection rules, refer to .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal is to ultimately have a page in the docs that walks through the installation and use of this integration's assets, but since that's not available to link to yet, I agree with your suggestion for a couple of lines in the Configuration section of the readme as a stop-gap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in 11eb4a152ddadf13c75d6e469e97f01d6db9dd06
cc @lcawl, @peteharverson
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lcawl, @peteharverson, @Winterflower
The latest screenshots and README file have been updated with more detailed instructions on how to use the package - as discussed - in c3692a17d515ebdd31e95e03652f735d04db5748
Would appreciate confirmation that it's what we want when you get a chance! 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest changes LGTM
4fe4ad2
to
22cc5f4
Compare
} | ||
] | ||
}, | ||
"id": "problem_child", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alvarezmelissa87 What about this "id"? CI mentioned that field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks - missed this the first time! Updated in 38fd11b
Just commenting to update that we've come to consensus on how we'll handle the aforementioned issue around |
/test |
* adds ml_problem_child package * add ml_module to problem child package * ensure modules in right path. update query * fix jobs configs * move datafeeds into attributes * update ingest pipeline name and update job configs * remove hardcoded indices for datafeeds. update descriptions * adds job groups and security rules * update deprecated property and package manifest minimum version * format files * rename package folder. update logo * fix encoding error * update package name to match validation pattern. fix encoding in rule * Add (experimental) to job descriptions) * update README with more asset info * add license requirement to card and readme * add asset context to readme * update card title and description * update overview config section with more instructions * change back to basic license but add platinum subscription language and notice * update codeowners file * update codeowners and readme * update owners in manifest * ensure files formatted correctly * update ml_module asset id to match filename * rename problem_child directory to problemchild for consistency * update ml module id to match filename * fix module id
What does this PR do?
Adds the ML supervised model package for Problem child model.
Package includes:
Checklist
changelog.yml
file.manifest.yml
file to point to the latest Elastic stack release (e.g.^7.13.0
).Author's Checklist
How to test this PR locally
Related issues
Screenshots