Agent crashes with transport: error while dialing: x509: signature check attempts limit reached while verifying certificate chain #1004

Closed
mcpherrinm opened this issue Jul 11, 2019 · 2 comments

@mcpherrinm
Contributor

mcpherrinm commented Jul 11, 2019

time="2019-03-20T01:30:16Z" level=info msg="data directory: \"/data\""
time="2019-03-20T01:30:16Z" level=info msg="Starting plugin catalog" subsystem_name=catalog
time="2019-03-20T01:30:16Z" level=debug msg="WorkloadAttestor(k8s): configuring plugin" subsystem_name=catalog
time="2019-03-20T01:30:16Z" level=debug msg="WorkloadAttestor(unix): configuring plugin" subsystem_name=catalog
time="2019-03-20T01:30:16Z" level=debug msg="NodeAttestor(aws_iid): configuring plugin" subsystem_name=catalog
time="2019-03-20T01:30:16Z" level=debug msg="KeyManager(disk): configuring plugin" subsystem_name=catalog
time="2019-03-20T01:30:17Z" level=error msg="x509: signature check attempts limit reached while verifying certificate chain" subsystem_name=attestor
time="2019-03-20T01:30:17Z" level=info msg="Stopping plugin catalog" subsystem_name=catalog
time="2019-03-20T01:30:17Z" level=error msg="agent crashed: opening stream for attestation: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: error while dialing: x509: signature check attempts limit reached while verifying certificate chain\""

This was a while ago (as the timestamps suggest), but I didn't want to forget about it.

One thing we noticed is that the bundle has 108 entries in it. We suspect the crash is related to that, but I haven't made a minimal repro yet.

@mcpherrinm changed the title from "transport: error while dialing: x509: signature check attempts limit reached while verifying certificate chain" to "Agent crashes with transport: error while dialing: x509: signature check attempts limit reached while verifying certificate chain" on Jul 11, 2019
@evan2645
Member

I worked through this with @mweissbacher a couple of weeks ago. IIRC, the bundle grew to this size for two reasons: an aggressive ca_ttl (coupled with memory diskmanager and server restarts), and a version of SPIRE prior to #859.

Prior to #859 (and v0.8.0), SPIRE included intermediate servers in the bundle as a backwards compatibility measure for 0.6.x agents. In this particular case, it caused the bundle to grow large as short-lived intermediates rotated. Moving to 0.8.0 alleviates the situation, but management of a large bundle remains problematic (see #921).

@azdagron looked into this particular error further, I believe. The Go x509 library halts validation once it hits 100 signature checks. Even with 100+ members in the bundle, though, we still thought the validator should be able to use AKID/SKID to reduce the number of candidates and avoid this problem. I don't recall exactly what he found there, so I'll leave it to him to fill in the rest of the details.

In terms of managing bundle growth generally, I am not quite sure what the ideal behavior would be. We could certainly log warnings about it... but I don't think we can just stop rotation. Practically speaking, this would also place an upper limit on the horizontal scaling of SPIRE server.

@mweissbacher
Contributor

We adapted our configuration and haven't run into this issue since. The original configuration was purposefully aggressive - apparently too aggressive!

For context, here's the reasoning for the limit: golang/go#29233
