tests: better resilience for concurrent node starts #3011

Merged
merged 1 commit into from
Aug 27, 2024


prasannavl
Member

Summary

  • Fixes most node start failures caused by high concurrency during normal operation.
  • Rationale: When concurrency is high, nodes can fail to start because the RPC startup check is overly strict and flaky. This change is not bulletproof (if the system runs out of resources, other failures can still occur), but it makes the on-start check significantly more resilient.
    • For most normal operation (e.g. MAKE_JOBS=<no-of-logical-CPUs>), this provides a good default and avoids most false failures caused by slightly delayed node starts.
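A minimal sketch of the kind of tolerant on-start check described above, written in Python on the assumption that the node tests are Python-based (the PR text does not show its diff, so this is not its implementation). The function name `wait_for_rpc_ready`, the `probe` callable, and the timeout and backoff values are all hypothetical. Instead of failing on the first refused connection, it treats connection errors and "not ready" answers as transient and keeps polling with capped exponential backoff until a deadline.

```python
import time
from typing import Callable, Optional


def wait_for_rpc_ready(
    probe: Callable[[], bool],
    timeout: float = 60.0,
    initial_delay: float = 0.05,
    max_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll a hypothetical RPC readiness `probe` until it returns True.

    Returns the number of attempts made. Raises TimeoutError if the node
    is still not ready once `timeout` seconds have elapsed on `clock`.
    """
    deadline = clock() + timeout
    delay = initial_delay
    attempts = 0
    last_error: Optional[OSError] = None
    while True:
        attempts += 1
        try:
            if probe():
                return attempts
        except OSError as exc:
            # OSError covers ConnectionRefusedError, ConnectionResetError and
            # TimeoutError: a node that is still booting is not a failure yet.
            last_error = exc
        if clock() >= deadline:
            raise TimeoutError(
                f"RPC not ready after {attempts} attempts"
            ) from last_error
        # Never sleep past the deadline; back off exponentially up to max_delay.
        sleep(min(delay, max(0.0, deadline - clock())))
        delay = min(delay * 2, max_delay)
```

Injecting `sleep` and `clock` lets a test drive the loop with a fake clock instead of real waiting; in real use the defaults apply and `probe` would wrap an actual RPC call.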

Implications

  • Storage

    • Database reindex required
    • Database reindex optional
    • Database reindex not required
    • None
  • Consensus

    • Network upgrade required
    • Includes backward compatible changes
    • Includes consensus workarounds
    • Includes consensus refactors
    • None

@prasannavl prasannavl merged commit 94d1aab into master Aug 27, 2024
19 of 26 checks passed
@prasannavl prasannavl deleted the pvl/fix-flaky-tests-cc branch August 27, 2024 04:25