Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set the puma persistent_timeout higher than nginx #2191

Merged
merged 1 commit into from
May 25, 2021

Conversation

ismarc
Copy link
Contributor

@ismarc ismarc commented May 21, 2021

If the load balancer/proxy/server in front of Conjur has
a longer timeout than Conjur, there will be 502s as Conjur will
close the connection even if there are pending requests. This
can lead to it being marked unavailable, errors returned to the
client and performance issues as nginx/load balancer in front of it

What does this PR do?

  • What's changed? Why were these changes made? This specifies the puma persistent_timeout value that should be higher than any load balancer or proxy in front of Conjur. The default value is 20 seconds, shorter than nginx and most load balancer default values.

What ticket does this PR close?

Resolves

Potentially solves symptoms associated with

Checklists

Change log

  • The CHANGELOG has been updated, or
  • This PR does not include user-facing changes and doesn't require a CHANGELOG update

Test coverage

  • This PR includes new unit and integration tests to go with the code changes, or
  • The changes in this PR do not require tests

Documentation

  • Docs (e.g. READMEs) were updated in this PR, and/or there is a follow-on issue to update docs, or
  • This PR does not require updating any documentation

API Changes

  • The OpenAPI spec has been updated to meet new API changes (or an issue has been opened), or
  • The changes in this PR do not affect the Conjur API

@ismarc ismarc requested a review from a team as a code owner May 21, 2021 21:41
@ismarc ismarc force-pushed the fix-keepalive-missmatch branch from 4026019 to a7dfed5 Compare May 21, 2021 21:51
CHANGELOG.md Outdated
@@ -26,6 +26,9 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
- Upgrade Puma to 4.3.8 to resolve
[CVE-2021-29509](https://nvd.nist.gov/vuln/detail/CVE-2021-29509).

### Fixed
- Specify keepalive timeout for puma to be longer than proxy and load balancers
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lists should be surrounded by blank lines

@ismarc ismarc force-pushed the fix-keepalive-missmatch branch from a7dfed5 to d1b5e1e Compare May 21, 2021 22:16
@@ -17,6 +17,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
- Fix bug where running `conjurctl server` or `conjurctl account create` with
passwords that contain `,`s sent via stdin raised an error.
[cyberark/conjur#2159](https://github.com/cyberark/conjur/issues/2159)
- Specify keepalive timeout for puma to be longer than proxy and load balancers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you link to an issue please?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this change affect the user?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it fix a known bug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added links to known-related issues in the description. The effect should be to prevent the random and spurious 502 errors frequently seen when running conjur behind a load balancer/nginx/proxy due to mismatched keepalive times, but otherwise users should not see behavioral changes. The issue was discovered for the ConjurOps v5 deployment, but is a symptom that has been seen pretty frequently in other contexts and always chalked up to transient issues.

@ismarc ismarc force-pushed the fix-keepalive-missmatch branch from d1b5e1e to 5da9c55 Compare May 24, 2021 17:24
jtuttle
jtuttle previously approved these changes May 25, 2021
If the load balancer/proxy/server in front of Conjur has
a longer timeout than Conjur, there will be 502s as Conjur will
close the connection even if there are pending requests.  This
can lead to it being marked unavailable, errors returned to the
client and performance issues as nginx/load balancer in front of it
@codeclimate
Copy link

codeclimate bot commented May 25, 2021

Code Climate has analyzed commit 8a2602f and detected 0 issues on this pull request.

The test coverage on the diff in this pull request is 100.0% (50% is the threshold).

This pull request will bring the total coverage in the repository to 89.2% (0.0% change).

View more on Code Climate.

@ismarc ismarc merged commit f0e9a25 into master May 25, 2021
@whip113
Copy link

whip113 commented May 27, 2021

@ismarc sent me this little tweak for some flakiness I observed while trying to use the seed-fetcher with an AWS auto-scaling group. Note that in my setup, I did NOT have a load balancer in front of my leader, my followers connected directly to it. I had regular failures with deployed followers where some wouldn't get the seed file at all. This seemed to get markedly worse after I exceeded 30 followers (with max_wal_senders increased appropriately). After making the change to puma.rb as suggested here, I was able to roll out 90 followers with zero failures (failure rate before was approx 20%). I suggest we publish this in a KBA as well, so that customers can start benefitting from it before 12.2 is released (or even without having to upgrade to 12.2).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants