Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds ML supervised model Problem child package #2115

Merged

Conversation

alvarezmelissa87
Copy link
Contributor

@alvarezmelissa87 alvarezmelissa87 commented Nov 3, 2021

What does this PR do?

Adds the ML supervised model package for Problem child model.

Package includes:

  • pipelines
  • ml_module
  • ml_model
  • security rules

ProblemChildCard2

ProblemChildReadme

problemChildOverview3

ProblemChildNotice

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • If I'm introducing a new feature, I have modified the Kibana version constraint in my package's manifest.yml file to point to the latest Elastic stack release (e.g. ^7.13.0).

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Screenshots

@alvarezmelissa87 alvarezmelissa87 added the enhancement New feature or request label Nov 3, 2021
@alvarezmelissa87 alvarezmelissa87 self-assigned this Nov 3, 2021
@elasticmachine
Copy link

elasticmachine commented Nov 3, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-03-28T14:39:27.986+0000

  • Duration: 21 min 46 sec

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@alvarezmelissa87 alvarezmelissa87 force-pushed the ml-problem-child-package branch 2 times, most recently from 6a60abe to fc68b33 Compare December 6, 2021 18:06
@alvarezmelissa87 alvarezmelissa87 force-pushed the ml-problem-child-package branch from fc68b33 to b83c6dd Compare December 8, 2021 23:02
@peteharverson
Copy link
Contributor

Is it possible to set a description field for the model, so that something like The Problem child model is used to detect living off the land (LOtl) activity. appears in the models list:

image

},
{
"by_field_name": "host.hostname",
"detector_description": "high sum by host",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, could the description here be edited to indicate the field it is running on blocklist_label? I don't have enough context on what information is held in blocklist_label to advise here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peteharverson So the way we classify processes as malicious is using either a supervised model or a blocklist for things the model may have missed. I think adding this in the description will introduce complexity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Thanks for explanation @ajosh0504

},
{
"by_field_name": "user.name",
"detector_description": "high sum by host",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we include the field name in the description of this detector - what is it a high sum of?

"detectors": [
{
"by_field_name": "user.name",
"detector_description": "high sum by user",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we include the field name in the description of this detector - what is it a high sum of?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, I think introducing field names in the descriptions might introduce complexity, since users may not know what the fields exactly mean.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Apoorva, field names aren't always the most user-centric and don't add that much value (imo) for the complexity they could add

"detectors": [
{
"by_field_name": "process.parent.name",
"detector_description": "high sum by parent process",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we include the field name in the description of this detector - what is it a high sum of?

},
{
"by_field_name": "process.parent.name",
"detector_description": "high sum by host",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we include the field name in the description of this detector - what is it a high sum of?

Copy link
Contributor

@ajosh0504 ajosh0504 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting this done! Super excited to see these models in packages. Just one question. So once a user installs a package, what's the process for them to update/change artifacts produced by the package?

packages/ml_problem_child/manifest.yml Outdated Show resolved Hide resolved
@@ -0,0 +1,6 @@
# ML Problem Child
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep the name consistent everywhere? I'm seeing ProblemChild, problem child and ProblemChild being used. We've been going with ProblemChild so far.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use ProblemChild everywhere in 7d33b9d

@@ -0,0 +1,6 @@
# ML Problem Child

The Problem child model package contains the [Problem child model and associated assets](https://www.elastic.co/blog/problemchild-generate-alerts-to-detect-living-off-the-land-attacks), which are used to detect living off the land (LOtl) activity.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The Problem child model package contains the [Problem child model and associated assets](https://www.elastic.co/blog/problemchild-generate-alerts-to-detect-living-off-the-land-attacks), which are used to detect living off the land (LOtl) activity.
The Problem child package contains the [Problem child model and associated assets](https://www.elastic.co/blog/problemchild-generate-alerts-to-detect-living-off-the-land-attacks), which are used to detect living off the land (LotL) activity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 7d33b9d

packages/ml_problem_child/manifest.yml Outdated Show resolved Hide resolved
packages/ml_problem_child/manifest.yml Outdated Show resolved Hide resolved
},
{
"by_field_name": "host.hostname",
"detector_description": "high sum by host",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peteharverson So the way we classify processes as malicious is using either a supervised model or a blocklist for things the model may have missed. I think adding this in the description will introduce complexity.

"detectors": [
{
"by_field_name": "user.name",
"detector_description": "high sum by user",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, I think introducing field names in the descriptions might introduce complexity, since users may not know what the fields exactly mean.

@ajosh0504
Copy link
Contributor

ajosh0504 commented Jan 31, 2022

As with the DGA model, should the model file name here be something more understandable? It's currently problemchild_20210526_1.0.json. This will also need to be changed in the inference pipeline.


@ajosh0504 - this file name is the name of the model id since the file name is used to determine the model id to send up to the put trained models api to install it to ES. Filenames being the asset id is consistent with the way fleet installs things.

@@ -0,0 +1,10 @@
---
description: "A pipeline of pipelines for ProblemChild detection"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to make this description a bit clearer? I feel like "pipeline of pipelines" at first read could be a bit confusing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just Pipelines for ProblemChild detection? Is this label actually shown anywhere in the UI?


The Problem child model package contains the [Problem child model and associated assets](https://www.elastic.co/blog/problemchild-generate-alerts-to-detect-living-off-the-land-attacks), which are used to detect living off the land (LOtl) activity.

To download the assets, click **Settings** > **Install ML Problem child assets**.
Copy link

@dishadasgupta dishadasgupta Jan 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this is within a cluster? Are there any steps needed before this? i.e. ProblemChild requires Platinum level capabilities, is that denoted anywhere (unless I missed it)? Maybe even a quick note in the README would be helpful

Copy link
Contributor Author

@alvarezmelissa87 alvarezmelissa87 Feb 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The package manifest file has the license type that is required - that is enforced in Fleet and the installation will fail due to failing to meet the required license type.

@alvarezmelissa87 alvarezmelissa87 marked this pull request as ready for review February 2, 2022 00:14
@alvarezmelissa87
Copy link
Contributor Author

cc @lcawl for package/model description text in the readme and such.

Copy link
Contributor

@brokensound77 brokensound77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So cool to see this finally going into the product (after being in an experimental state for so long). I left a few suggestions.

@spong FYSA, if I am not mistaken, this and #2352 and the first integrations outside of the prebuilt rules to add rules destined for the detection engine. Not sure if there were any assumptions or considerations to be made.

"author": [
"Elastic"
],
"description": "A supervised machine learning model (ProblemChild) or its blocklist has identified\na suspicious Windows process event to be malicious activity.\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the line break in the middle intentional?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't need a trailing one either

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines breaks were part of the release and haven't been changed. I'd need confirmation that it's okay to remove them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW there doesn't appear to be any issue with rendering (these newlines are ignored):

"license": "Elastic License",
"max_signals": 10000,
"name": "Machine Learning Detected a Suspicious Windows Event with a High Malicious Probability Score",
"query": "(problemchild.prediction:1 and problemchild.prediction_probability \u003e 0.98) or blocklist_label:1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was the \u003e a unicode conversion error?

Copy link
Contributor Author

@alvarezmelissa87 alvarezmelissa87 Feb 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep - fixed! thanks! 7d0d452

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I'm still seeing this in the latest. The other changes from 7d0d452 appear to have made it though? (i.e. packages/problem_child/manifest.yml title has been updated to ProblemChild instead ML ProblemChild)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this - this should now be a > instead of the encoded version 👍 Fixed in c3692a17d515ebdd31e95e03652f735d04db5748

Copy link
Contributor Author

@alvarezmelissa87 alvarezmelissa87 Mar 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spong - looks like running the elastic-package format command is what's causing the change from > to the encoded version. Is this expected?
cc @mtojek - also - running elastic-package check locally is failing with a reference to the ml_model file name - though the update was made to package-spec to match that file name pattern and I've tested with the package-spec validator test so curious if this could be something going on locally? Might need to chat with you quickly on that one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, looks like you are right

"query": "/* Registry Path ends with backslash */\nregistry where /* length(registry.data.strings) \u003e 0 and */\n registry.path : (\"HKEY_USERS\\\\*\\\\Software\\\\Microsoft\\\\Windows\\\\CurrentVersion\\\\Run\\\\\",\n \"HKU\\\\*\\\\Software\\\\Microsoft\\\\Windows\\\\CurrentVersion\\\\Run\\\\\",\n \"HKLM\\\\Software\\\\Microsoft\\\\Windows\\\\CurrentVersion\\\\Run\\\\\", \n \"HKLM\\\\Software\\\\WOW6432Node\\\\Microsoft\\\\Windows\\\\CurrentVersion\\\\Run\\\\\", \n \"HKEY_USERS\\\\*\\\\Software\\\\Microsoft\\\\Windows\\\\CurrentVersion\\\\Policies\\\\Explorer\\\\Run\\\\\",\n \"HKU\\\\*\\\\Software\\\\Microsoft\\\\Windows\\\\CurrentVersion\\\\Policies\\\\Explorer\\\\Run\\\\\",\n \"HKLM\\\\Software\\\\Microsoft\\\\Windows\\\\CurrentVersion\\\\Policies\\\\Explorer\\\\Run\\\\\")\n",

I guess it is ok to leave it converted then 🤷

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we can see in the CI output is:

[2022-03-18T15:59:14.998Z] Error: checking package failed: formatting the integration failed (path: /var/lib/jenkins/workspace/est-manager_integrations_PR-2115/src/github.com/elastic/integrations/packages/problem_child, failFast: true): walking through the integration files failed: formatting file failed (path: /var/lib/jenkins/workspace/est-manager_integrations_PR-2115/src/github.com/elastic/integrations/packages/problem_child/kibana/security_rule/9a2e372a-cbeb-4ad6-a288-017ef086324c.json): file is not formatted (path: /var/lib/jenkins/workspace/est-manager_integrations_PR-2115/src/github.com/elastic/integrations/packages/problem_child/kibana/security_rule/9a2e372a-cbeb-4ad6-a288-017ef086324c.json)

so it confirms that you have to post the formatted code. Sometimes the formatted code might be harder to read, but we don't have a choice here if we want to depend on standard JSON libraries. Please run elastic-package format and post the formatted content.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @mtojek - also - running elastic-package check locally is failing with a reference to the ml_model file name - though the update was made to package-spec to match that file name pattern and I've tested with the package-spec validator test so curious if this could be something going on locally? Might need to chat with you quickly on that one.

It depends on which elastic-package release you have locally and which one is in the go.mod in the Integrations.

BTW we're struggling a bit with updating the dependency - here.

The ProblemChild package contains the [ProblemChild model and associated assets](https://www.elastic.co/blog/problemchild-generate-alerts-to-detect-living-off-the-land-attacks), which are used to detect living off the land (LotL) activity.

To download the assets, click **Settings** > **Install ML ProblemChild assets**.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worth elaborating on this to explain the relationship between the model, jobs, and rules. I believe the job and rules will be installed but in a disabled state. Also if the rule is enabled before the job, it will throw an error.

The schema for rules accepts defining enabled = true, though I am not totally sure on the ml jobs

Copy link
Contributor Author

@alvarezmelissa87 alvarezmelissa87 Feb 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding something to the Configuration section of the readme - after the to download the assets bit
Something like Ingest data with the installed ingest pipeline to enrich your indices with inference data and run the provided anomaly detection jobs.

If rules are enabled in the same way as described in https://www.elastic.co/guide/en/security/master/rules-ui-management.html, worst case we can link there. For example, add a sentence like, "For more information about activating the detection rules, refer to .

cc @peteharverson, @lcawl, @brokensound77

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is to ultimately have a page in the docs that walks through the installation and use of this integration's assets, but since that's not available to link to yet, I agree with your suggestion for a couple of lines in the Configuration section of the readme as a stop-gap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 11eb4a152ddadf13c75d6e469e97f01d6db9dd06
cc @lcawl, @peteharverson

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lcawl, @peteharverson, @Winterflower
The latest screenshots and README file have been updated with more detailed instructions on how to use the package - as discussed - in c3692a17d515ebdd31e95e03652f735d04db5748
Would appreciate confirmation that it's what we want when you get a chance! 🙏

Copy link
Contributor

@peteharverson peteharverson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest changes LGTM

@alvarezmelissa87 alvarezmelissa87 force-pushed the ml-problem-child-package branch from 4fe4ad2 to 22cc5f4 Compare March 22, 2022 03:24
}
]
},
"id": "problem_child",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alvarezmelissa87 What about this "id"? CI mentioned that field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks - missed this the first time! Updated in 38fd11b

@spong
Copy link
Member

spong commented Mar 24, 2022

Just commenting to update that we've come to consensus on how we'll handle the aforementioned issue around rule_id uniqueness between packages, and that there is no reason to hold up the merging or publishing of this package from the Security Solution side. Please see this comment elastic/kibana#128202 (comment) for all the details, and thank you all for all your input here! 🙂

@mtojek
Copy link
Contributor

mtojek commented Mar 28, 2022

/test

@alvarezmelissa87 alvarezmelissa87 merged commit 99ce3f1 into elastic:main Mar 29, 2022
@alvarezmelissa87 alvarezmelissa87 deleted the ml-problem-child-package branch March 29, 2022 00:30
eyalkraft pushed a commit to build-security/integrations that referenced this pull request Mar 30, 2022
* adds ml_problem_child package

* add ml_module to problem child package

* ensure modules in right path. update query

* fix jobs configs

* move datafeeds into attributes

* update ingest pipeline name and update job configs

* remove hardcoded indices for datafeeds. update descriptions

* adds job groups and security rules

* update deprecated property and package manifest minimum version

* format files

* rename package folder. update logo

* fix encoding error

* update package name to match validation pattern. fix encoding in rule

* Add (experimental) to job descriptions)

* update README with more asset info

* add license requirement to card and readme

* add asset context to readme

* update card title and description

* update overview config section with more instructions

* change back to basic license but add platinum subscription language and notice

* update codeowners file

* update codeowners and readme

* update owners in manifest

* ensure files formatted correctly

* update ml_module asset id to match filename

* rename problem_child directory to problemchild for consistency

* update ml module id to match filename

* fix module id
@andrewkroh andrewkroh added Integration:problemchild Living off the Land Attack Detection New Integration Issue or pull request for creating a new integration package. labels Aug 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Integration:problemchild Living off the Land Attack Detection New Integration Issue or pull request for creating a new integration package.
Projects
None yet
Development

Successfully merging this pull request may close these issues.