Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ResponseOps] use Data Streams for AAD indices in serverless #160572

Merged
merged 29 commits into from
Aug 30, 2023

Conversation

pmuellr
Copy link
Member

@pmuellr pmuellr commented Jun 26, 2023

resolves #154266

Changes the way the alerts-as-data (AAD) indices are created and written to, to allow them to be built as they have been in the past (alias and backing indices created manually) OR as an ES Data Stream.

Serverless will use Data Streams, other environments will use the existing alias and backing indices. The determination is made by optionally including the serverless plugin, and determining if it's available.

The implementation is organized around a DataStreamAdapter object, which is instantiated with a "data stream" or "alias" flavor, and then it handles the specialized behavior. Currently, a lot of the smaller implementation bits, like setting property values in ES calls, is done via in-line boolean checks of that object, as to whether data streams or aliases are being used. This could probably be cleaned up some.

Existing non-serverless function tests are largely unchanged, as they can't test the new data stream path. Some tests have been added to the serverless function tests, to test basic reading / writing via updated alert documents.

DEFER

  • more serverless AaD tests

  • potentially re-org alerts_service tests; currently this module makes no direct ES calls, but it's jest tests are quite large and mock a LOT of ES calls. The calls are now indirect in helpers. Almost feels like the code was re-org'd to add the helpers, but the tests weren't. Would likely make the tests more understandable with a re-org - spent lots of time in this test module

    • x-pack/plugins/alerting/server/alerts_service/alerts_service.test.ts
    • x-pack/plugins/alerting/server/alerts_service/lib/create_or_update_ilm_policy.test.ts
    • x-pack/plugins/alerting/server/alerts_service/lib/create_or_update_index_template.test.ts
    • x-pack/plugins/alerting/server/alerts_service/lib/data_stream_adapter.ts
  • [Response Ops][Alerting] Discuss how to handle possible conflicts when updating alerts-as-data documents #158403 - this issue is more noticeable now that we HAVE to do OCC with data streams, so we get errors instead of simply overwriting documents (which is also bad)

Checklist

Delete any items that are not applicable to this PR.

@pmuellr pmuellr force-pushed the alerting/serverless-aad-to-datastream branch from 1722158 to 1591879 Compare July 10, 2023 23:11
@pmuellr pmuellr force-pushed the alerting/serverless-aad-to-datastream branch from 64625ec to 986774c Compare July 12, 2023 14:22
@pmuellr pmuellr force-pushed the alerting/serverless-aad-to-datastream branch from 695a5a2 to 0060185 Compare July 25, 2023 19:03
@pmuellr pmuellr force-pushed the alerting/serverless-aad-to-datastream branch from 0060185 to 259f517 Compare July 28, 2023 17:43
@pmuellr pmuellr force-pushed the alerting/serverless-aad-to-datastream branch 2 times, most recently from 4df6539 to 11d2667 Compare August 3, 2023 20:36
@pmuellr pmuellr force-pushed the alerting/serverless-aad-to-datastream branch 2 times, most recently from 74421c2 to c7c0886 Compare August 4, 2023 18:37
pmuellr added a commit to pmuellr/kibana that referenced this pull request Aug 9, 2023
Changes the way the alerts-as-data (AAD) indices are created and written
to, to allow them to be built as they have been in the past (alias and
backing indices created manually) OR as an ES Data Stream.

Serverless will use Data Streams, other environments will use the
existing alias and backing indices.

The configuration `xpack.alerting.useDataStreamForAlerts` determines
which to use, and is false by default.

This configuration is not intended to be used by customers, as it when
used in the wrong environment, will cause failures creating and writing
to the alert indices.

PR elastic#160572

- fix jest test
- start integrating bits of the dataStreamAdapter
- fix tests
- fix function tests
- fix FT
- change the rule registry writer
- fix FT
- start using the new ds adapter to create alias indices
- add create data stream logic
- use alias by default createConcreteWriteIndex() as it's being called by other plugins now
- start adapting jest tests to test datastream alerts as well
- extend create_concrete_write_index jest tests for data stream
- rebase, fix merge conflicts
- fix jest test after merge conflict
- convert the RR trial test to use data streams
- fix jest test, remove ILM for datastream, run serverless o11y test with ds, add rr test with ds
- fix jest and function tests
- fix jest tests
@pmuellr pmuellr force-pushed the alerting/serverless-aad-to-datastream branch 2 times, most recently from 8ad645e to fa31bf0 Compare August 9, 2023 18:44
@pmuellr pmuellr force-pushed the alerting/serverless-aad-to-datastream branch 2 times, most recently from 4007f39 to b7e0206 Compare August 10, 2023 18:18
@pmuellr pmuellr force-pushed the alerting/serverless-aad-to-datastream branch 2 times, most recently from d361034 to bf59bb3 Compare August 15, 2023 18:27
@pmuellr
Copy link
Member Author

pmuellr commented Aug 29, 2023

@elasticmachine merge upstream

I think this should clear the failing FTR test via PR 165103

Copy link
Member

@MadameSheema MadameSheema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-engineering-productivity changes LGTM

Copy link
Contributor

@PhilippeOberti PhilippeOberti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for the Threat Hunting Investigations team

patrykkopycinski added a commit to patrykkopycinski/kibana that referenced this pull request Aug 30, 2023
elastic#164988)

## Summary

Needed for
elastic#160572 (comment)

(cherry picked from commit f8ad13a)

# Conflicts:
#	x-pack/test/security_solution_cypress/cypress/e2e/exceptions/rule_details_flow/add_edit_exception.cy.ts
Copy link
Contributor

@e40pud e40pud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DE changes LGTM

// https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html#_refreshing_shards_2
// Note: Before we tried to use "refresh: wait_for" but I do not think that was available and instead it defaulted to "refresh: true"
// but the tests do not pass with "refresh: false". If at some point a "refresh: wait_for" is implemented, we should use that instead.
refresh: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is allowing refresh:true something we want to do?
I remember we did some work to avoid explicitly refreshing indices: https://github.com/elastic/security-team/issues/6561

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In similar work, done in #162508, I had the same issue, so had add retry calls to wait until item reindexed and only after this finish request.

It worth to note though, updates in lists/items are not available in UI, only through API calls, so we do not expect them to be called frequently, especially in Serverless. Only list delete available in UI, which is also not expected to be frequent operation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tested it locally and it seems false works fine, let's see if CI agrees 🤞

// https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html#_refreshing_shards_2
// Note: Before we tried to use "refresh: wait_for" but I do not think that was available and instead it defaulted to "refresh: true"
// but the tests do not pass with "refresh: false". If at some point a "refresh: wait_for" is implemented, we should use that instead.
refresh: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

is allowing refresh:true something we want to do?
I remember we did some work to avoid explicitly refreshing indices: https://github.com/elastic/security-team/issues/6561

@kibana-ci
Copy link
Collaborator

kibana-ci commented Aug 30, 2023

💔 Build Failed

Failed CI Steps

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
alerting 749 759 +10
ruleRegistry 235 238 +3
total +13

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
securitySolution 12.6MB 12.6MB +4.0B

Canvas Sharable Runtime

The Canvas "shareable runtime" is an bundle produced to enable running Canvas workpads outside of Kibana. This bundle is included in third-party webpages that embed canvas and therefor should be as slim as possible.

id before after diff
module count - 5580 +5580
total size - 6.0MB +6.0MB

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
alerting 46 49 +3
Unknown metric groups

API count

id before after diff
alerting 780 790 +10
ruleRegistry 264 267 +3
total +13

ESLint disabled line counts

id before after diff
alerting 90 89 -1

Total ESLint disabled count

id before after diff
alerting 92 91 -1

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@pmuellr pmuellr merged commit 95fa356 into elastic:main Aug 30, 2023
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Aug 30, 2023
patrykkopycinski added a commit that referenced this pull request Aug 31, 2023
…ver.load (#164988) (#165203)

# Backport

This will backport the following commits from `main` to `8.10`:
- [[security_solution_cypress] Add support for options in
EsArchiver.load
(#164988)](#164988)

<!--- Backport version: 8.9.8 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Patryk
Kopyciński","email":"[email protected]"},"sourceCommit":{"committedDate":"2023-08-28T18:21:43Z","message":"[security_solution_cypress]
Add support for options in EsArchiver.load (#164988)\n\n##
Summary\r\n\r\nNeeded
for\r\nhttps://github.com//pull/160572#issuecomment-1695775316","sha":"f8ad13ae1f078aa5f0cb6e1190e3c167dec168c8","branchLabelMapping":{"^v8.11.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","backport:prev-minor","v8.11.0"],"number":164988,"url":"https://github.com/elastic/kibana/pull/164988","mergeCommit":{"message":"[security_solution_cypress]
Add support for options in EsArchiver.load (#164988)\n\n##
Summary\r\n\r\nNeeded
for\r\nhttps://github.com//pull/160572#issuecomment-1695775316","sha":"f8ad13ae1f078aa5f0cb6e1190e3c167dec168c8"}},"sourceBranch":"main","suggestedTargetBranches":[],"targetPullRequestStates":[{"branch":"main","label":"v8.11.0","labelRegex":"^v8.11.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/164988","number":164988,"mergeCommit":{"message":"[security_solution_cypress]
Add support for options in EsArchiver.load (#164988)\n\n##
Summary\r\n\r\nNeeded
for\r\nhttps://github.com//pull/160572#issuecomment-1695775316","sha":"f8ad13ae1f078aa5f0cb6e1190e3c167dec168c8"}}]}]
BACKPORT-->
eokoneyo pushed a commit to eokoneyo/kibana that referenced this pull request Aug 31, 2023
…#160572)

resolves elastic#154266

Changes the way the alerts-as-data (AAD) indices are created and written
to, to allow them to be built as they have been in the past (alias and
backing indices created manually) OR as an ES Data Stream.

Serverless will use Data Streams, other environments will use the
existing alias and backing indices. The determination is made by
optionally including the `serverless` plugin, and determining if it's
available.

The implementation is organized around a `DataStreamAdapter` object,
which is instantiated with a "data stream" or "alias" flavor, and then
it handles the specialized behavior. Currently, a lot of the smaller
implementation bits, like setting property values in ES calls, is done
via in-line boolean checks of that object, as to whether data streams or
aliases are being used. This could probably be cleaned up some.

Existing non-serverless function tests are largely unchanged, as they
can't test the new data stream path. Some tests have been added to the
serverless function tests, to test basic reading / writing via updated
alert documents.

## DEFER

- more serverless AaD tests

- elastic#158403 - this issue is more
noticeable now that we HAVE to do OCC with data streams, so we get
errors instead of simply overwriting documents (which is also bad)

Co-authored-by: Patryk Kopycinski <[email protected]>
pmuellr added a commit that referenced this pull request Sep 26, 2024
resolves: #190376

In PR #160572, we changed from
using just the bulk op `index` to using `create` when new alerts are
being created.

Unfortunately, the code to handle the bulk responses didn't take into
account that the bulk responses for `create`s need different handling
than `index`s.  Specifically, conflicts for `create` were being treated
as errors.

This PR changes the processing to consider additional ops besides just
`index`.
pmuellr added a commit to pmuellr/kibana that referenced this pull request Oct 15, 2024
resolves: elastic#190376

In PR elastic#160572, we changed from
using just the bulk op `index` to using `create` when new alerts are
being created.

Unfortunately, the code to handle the bulk responses didn't take into
account that the bulk responses for `create`s need different handling
than `index`s.  Specifically, conflicts for `create` were being treated
as errors.

This PR changes the processing to consider additional ops besides just
`index`.
pmuellr added a commit to pmuellr/kibana that referenced this pull request Dec 10, 2024
resolves: elastic#190376

In PR elastic#160572, we changed from
using just the bulk op `index` to using `create` when new alerts are
being created.

Unfortunately, the code to handle the bulk responses didn't take into
account that the bulk responses for `create`s need different handling
than `index`s.  Specifically, conflicts for `create` were being treated
as errors.

This PR changes the processing to consider additional ops besides just
`index`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting ci:build-serverless-image ci:serverless-test-all Feature:Alerting release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.11.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Change .alerts to use datastream