Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Elasticsearch index behavior in case of missing settings #42426

Merged
merged 13 commits into from
Jan 29, 2025

Conversation

3kt
Copy link
Contributor

@3kt 3kt commented Jan 25, 2025

Proposed commit message

Fixes #42424

In more details:

  • reactivate test coverage for elasticsearch/index module
  • add a test case for missing routing table / index settings
  • add a fix to pass said test

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

Author's Checklist

  • Check in Elasticsearch source code what settings can be missing, and when this happens (creation_date and tier_preference) Irrelevant, we decided to tolerate index setting failure
  • Verifies that the integration that uses the generated data behaves correctly if the aforementioned settings are missing.

How to test this PR locally

Related issues

@3kt 3kt added the bug label Jan 25, 2025
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jan 25, 2025
@3kt 3kt requested a review from consulthys January 25, 2025 18:12
Copy link
Contributor

mergify bot commented Jan 25, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @3kt? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit

@mergify mergify bot assigned 3kt Jan 25, 2025
Copy link
Contributor

mergify bot commented Jan 25, 2025

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Jan 25, 2025
@3kt
Copy link
Contributor Author

3kt commented Jan 25, 2025

@consulthys I'm unsure how to approach the missing setting case: theoretically a race condition can cause tis any time, and with enough iterations it is bound to happen on every cluster.

Indeed, a measure from MB is 3 API calls:

  • _stats
  • _cluster/state
  • _settings

If something changes between these calls, it can be impossible for Metricbeat to reconciliate the data structures.

Now the positive thing is that the errors are neither unrecoverable nor blocking, I would expect Metricbeat to continue shipping while reporting errors.

We could silently ignore these issues, keep the current behavior, or refine when/how we want to raise these problems.

What do you think?

@3kt 3kt added the backport-8.17 Automated backport with mergify label Jan 25, 2025
@consulthys
Copy link
Contributor

@3kt Good question! I think the main purpose of this metric set is to report _stats, so if either or both of the other two calls fail for any reason, I think we should log it in WARN but still return the index stats data, but without the settings fields... which also means the new usage dashboard might have some holes from time to time or for specific indices. What do you think?

@3kt
Copy link
Contributor Author

3kt commented Jan 27, 2025

@consulthys I agree, but how would this differ from the current behavior? AFAICT the eventsMapping function returns an error and nothing else, and upon returning such a thing, the agent would show up as unhealthy (which is the very thing we would like to prevent).

@consulthys
Copy link
Contributor

how would this differ from the current behavior?

I would not add the error to errs, but just log it. Now, the problem is that for some indices we know it's going to fail every time, so we should probably log this as DEBUG in order not to deliberately increase the logs volume. Wdyt?

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jan 28, 2025
@TristanAdams03
Copy link

what is the best way to fix this at the moment I am getting 4 errors in regards to "key not found" for the elasticsearch integration module? Thank you.

@TristanAdams03
Copy link

Is this on master nodes only or the kibana node?

@TristanAdams03
Copy link

is there a way to fix the issue with "key not found" within elastic without messing with the code it is causing my stack to pop up as degrading because the indexes in question are throwing back that error.

@TristanAdams03
Copy link

i do not know where these files are located

@3kt 3kt marked this pull request as ready for review January 28, 2025 22:32
@3kt 3kt requested review from a team as code owners January 28, 2025 22:32
@3kt 3kt requested review from belimawr and rdner January 28, 2025 22:32
Copy link
Contributor

@consulthys consulthys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGT Stack Monitoring

Copy link
Member

@rdner rdner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of reformatting of Python code that does not relate to the fix but all in all looks good to me.

@3kt
Copy link
Contributor Author

3kt commented Jan 29, 2025

Lots of reformatting of Python code that does not relate to the fix but all in all looks good to me.

Yea, for some reason the make fmt triggered this - unsure why but let's not argue with the CI ;)

Copy link
Contributor

mergify bot commented Jan 29, 2025

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b fix_missing_settings upstream/fix_missing_settings
git merge upstream/main
git push upstream fix_missing_settings

@3kt 3kt enabled auto-merge (squash) January 29, 2025 14:39
Copy link
Contributor

@belimawr belimawr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Contributor

@nicholasberlin nicholasberlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

packetbeat changes LGTM

@3kt 3kt merged commit 074a201 into elastic:main Jan 29, 2025
142 checks passed
mergify bot pushed a commit that referenced this pull request Jan 29, 2025
* Unskip test

* Bold move: assumes we can drop test coverage for unsupported versions

* Added test case: index with no cluster state entry or settings

* Added comment about "race situation"

* Logged index settings error as debug, fixed test case

* Handling for multi version tests

* Added changelog entry

---------

Co-authored-by: Valentin Crettaz <[email protected]>
Co-authored-by: Valentin Crettaz <[email protected]>
(cherry picked from commit 074a201)
mergify bot pushed a commit that referenced this pull request Jan 29, 2025
* Unskip test

* Bold move: assumes we can drop test coverage for unsupported versions

* Added test case: index with no cluster state entry or settings

* Added comment about "race situation"

* Logged index settings error as debug, fixed test case

* Handling for multi version tests

* Added changelog entry

---------

Co-authored-by: Valentin Crettaz <[email protected]>
Co-authored-by: Valentin Crettaz <[email protected]>
(cherry picked from commit 074a201)
3kt added a commit that referenced this pull request Jan 29, 2025
…#42474)

* Unskip test

* Bold move: assumes we can drop test coverage for unsupported versions

* Added test case: index with no cluster state entry or settings

* Added comment about "race situation"

* Logged index settings error as debug, fixed test case

* Handling for multi version tests

* Added changelog entry

---------

Co-authored-by: Valentin Crettaz <[email protected]>
Co-authored-by: Valentin Crettaz <[email protected]>
(cherry picked from commit 074a201)

Co-authored-by: Alexis Charveriat <[email protected]>
consulthys pushed a commit that referenced this pull request Jan 30, 2025
…#42475)

* Unskip test

* Bold move: assumes we can drop test coverage for unsupported versions

* Added test case: index with no cluster state entry or settings

* Added comment about "race situation"

* Logged index settings error as debug, fixed test case

* Handling for multi version tests

* Added changelog entry

---------

Co-authored-by: Valentin Crettaz <[email protected]>
Co-authored-by: Valentin Crettaz <[email protected]>
(cherry picked from commit 074a201)

Co-authored-by: Alexis Charveriat <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport backport-8.x Automated backport to the 8.x branch with mergify backport-8.17 Automated backport with mergify bug Feature:Stack Monitoring Team:Monitoring Stack Monitoring team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Metricbeat] [Elasticsearch] Error in case of missing settings key
6 participants