This repository has been archived by the owner on Nov 9, 2017. It is now read-only.

Unexpected EOF #42

Closed
danielflippance opened this issue Jan 30, 2017 · 13 comments

Comments

@danielflippance

Running Sabayon on one of my environments unexpectedly returns an EOF. Everything seems to be set up correctly, the same way I have it set up for another environment where Sabayon works well. The error I see is:

$ heroku run bin/sabayon -a sabayon-for-my-app

Running bin/sabayon on ⬢ sabayon-for-my-app... up, run.1834 (Hobby)
2017/01/30 17:14:22 cert.create email='[email protected]' domains='[www.domain1.com www.domain2.com]'
2017/01/30 17:14:23 [INFO] acme: Registering account for [email protected]
2017/01/30 17:14:23 [INFO][www.domain1.com, www.domain2.com] acme: Obtaining bundled SAN certificate
2017/01/30 17:14:24 EOF

$

Settings:

ACME_APP_NAME: my-app
ACME_DOMAIN: www.domain1.com,www.domain2.com
ACME_EMAIL: [email protected]
HEROKU_TOKEN: .....
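
For readers comparing these settings with the log line above: the bracketed, space-separated `domains='[...]'` value is how Go prints a string slice. The sketch below shows one way a comma-separated `ACME_DOMAIN` value could be split into such a slice. `splitDomains` is a hypothetical helper written for illustration, not code taken from sabayon.

```go
package main

import (
	"fmt"
	"strings"
)

// splitDomains turns a comma-separated ACME_DOMAIN value into a slice,
// trimming whitespace and dropping empty entries.
// Hypothetical helper for illustration; not taken from sabayon's source.
func splitDomains(raw string) []string {
	var domains []string
	for _, d := range strings.Split(raw, ",") {
		if d = strings.TrimSpace(d); d != "" {
			domains = append(domains, d)
		}
	}
	return domains
}

func main() {
	domains := splitDomains("www.domain1.com,www.domain2.com")
	// %v on a []string prints the elements space-separated inside brackets.
	fmt.Printf("domains='%v'\n", domains)
}
```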

sabayon-for-my-app is using Hobby dynos; my-app is using standard dynos.

@dmathieu
Owner

That looks like a connection that was unexpectedly lost while calling the letsencrypt API. You may want to try again in a couple of hours.

@danielflippance
Author

Yep, I thought that too but I tried it last night and this morning several times with the same result.

@dmathieu
Owner

This EOF is definitely an error in the connection, either with the letsencrypt API or the heroku one.
Judging by where it happens in the logs, it would be the letsencrypt one though.

Basically, an error is being returned by the API here.
There is nothing in this codebase I can change that will fix this kind of connectivity error.

@dmathieu
Owner

The only other way this could have failed is if letsencrypt introduced a breaking change in their API. I just tried regenerating a certificate in a test app using sabayon, which worked properly.

@danielflippance
Author

Looking back through the output of my various attempts I see that one of them returned a different error message:

acme: Error 400 - urn:acme:error:badNonce - JWS has invalid anti-replay nonce 7TarXZbO....

All of the other attempts returned the EOF error.

@danielflippance
Author

I wonder whether the problem arises because these domains are all production domains running on Heroku that already use a single Heroku SSL Endpoint with a valid UCC SAN certificate.

My Sabayon use is intended to replace the Heroku SSL Endpoint, but I can't remove it without incurring downtime for the clients of those domains.

I ran a test to get a cert using certbot and that failed too: Incorrect validation certificate for TLS-SNI-01 challenge.

@dmathieu
Owner

Have you tried raising that issue with letsencrypt support?

@danielflippance
Author

@danielflippance
Author

I got a cert successfully using certbot --manual, so the existing cert is unlikely to be causing the problem.

Perhaps the problem has to do with Heroku's Preboot feature. It is used to prevent downtime on dynos, but a side effect is that when you change an environment variable in a Heroku app, the change is not immediately available to the code running on the dyno. In my experience it takes 1-2 minutes before the change shows up in the code.

That would explain a failure on the first run of Sabayon (or the first few), but not if ACME_KEY_n and ACME_TOKEN_n keep their values every time Sabayon is run. From my quick tests they do keep their values, so Preboot is probably not the cause...

@dmathieu
Owner

I don't think this is an issue with preboot. The EOF happens before the app is updated with the new values.
By default, sabayon waits for 20 seconds to ack the restart. But you can increase that with the RESTART_WAIT_TIME config var.

@danielflippance
Author

It looks like this is a bug in https://github.com/xenolf/lego - go-acme/lego#338

@mtimofiiv

mtimofiiv commented Feb 14, 2017

In the interests of helping this along: I have had the same issue as the OP. I tried re-deploying fresh to Heroku and entered the same ENV variables as on the app where I had this error. Then I used heroku run bash to get into the app and ran sabayon --force to see what would happen.

I got an error message (quoted below). After this error happened once, I could not reproduce it and instead saw EOF every time.

Now I also posted this in the lego issue since the trace clearly points there, but I am also going to leave it here for the benefit of debugging.

My theory as to why I got this result: because the cron scheduler runs this job, the panic may have happened to all of us affected, but since it occurred during some scheduled run that nobody was watching, it was impossible to spot, and all we see now is the EOF that follows it.

I am not 100% sure how sabayon works, but is there any kind of state store involved that keeps track of nonces or something similar?

panic: runtime error: slice bounds out of range

goroutine 74 [running]:
panic(0x717ee0, 0xc4200100e0)
	/app/tmp/cache/go1.7.5/go/src/runtime/panic.go:500 +0x1a1
github.com/dmathieu/sabayon/vendor/github.com/xenolf/lego/acme.(*jws).Nonce(0xc42038eb00, 0xc4203a0500, 0x1, 0xc4201183f0, 0x1)
	/tmp/tmp.Q2359EFByl/.go/src/github.com/dmathieu/sabayon/vendor/github.com/xenolf/lego/acme/jws.go:105 +0xd4
github.com/dmathieu/sabayon/vendor/gopkg.in/square/go-jose%2ev1.(*genericSigner).Sign(0xc420308a50, 0xc4203a0460, 0x7d, 0x9d, 0x7f487012e1e8, 0xc420308a50, 0x0)
	/tmp/tmp.Q2359EFByl/.go/src/github.com/dmathieu/sabayon/vendor/gopkg.in/square/go-jose.v1/signing.go:157 +0x64f
github.com/dmathieu/sabayon/vendor/github.com/xenolf/lego/acme.(*jws).signContent(0xc42038eb00, 0xc4203a0460, 0x7d, 0x9d, 0xc42018acf0, 0x99, 0x100)
	/tmp/tmp.Q2359EFByl/.go/src/github.com/dmathieu/sabayon/vendor/github.com/xenolf/lego/acme/jws.go:70 +0x100
github.com/dmathieu/sabayon/vendor/github.com/xenolf/lego/acme.(*jws).post(0xc42038eb00, 0xc42038f380, 0x33, 0xc4203a0460, 0x7d, 0x9d, 0x0, 0x88, 0x88)
	/tmp/tmp.Q2359EFByl/.go/src/github.com/dmathieu/sabayon/vendor/github.com/xenolf/lego/acme/jws.go:35 +0x67
github.com/dmathieu/sabayon/vendor/github.com/xenolf/lego/acme.postJSON(0xc42038eb00, 0xc42038f380, 0x33, 0x7511a0, 0xc42018acf0, 0x6f1f80, 0xc42018ac60, 0x0, 0x0, 0x0)
	/tmp/tmp.Q2359EFByl/.go/src/github.com/dmathieu/sabayon/vendor/github.com/xenolf/lego/acme/http.go:96 +0x153
github.com/dmathieu/sabayon/vendor/github.com/xenolf/lego/acme.(*Client).getChallenges.func1(0xc4200821b0, 0xc4200e6360, 0xc4200e6060, 0xc42001415a, 0x1d)
	/tmp/tmp.Q2359EFByl/.go/src/github.com/dmathieu/sabayon/vendor/github.com/xenolf/lego/acme/client.go:408 +0x1c6
created by github.com/dmathieu/sabayon/vendor/github.com/xenolf/lego/acme.(*Client).getChallenges
	/tmp/tmp.Q2359EFByl/.go/src/github.com/dmathieu/sabayon/vendor/github.com/xenolf/lego/acme/client.go:421 +0x113

@mtimofiiv

mtimofiiv commented Feb 19, 2017

@dmathieu: @xenolf suggests upgrading to the latest version of lego:

Sabayon seems to use a version of lego which did not check for a certain condition in the Nonce() function. This has since been resolved and should no longer pop up - I don't think it is related to this issue though.
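
The trace above shows Nonce() being reached from goroutines started by getChallenges. As a generic illustration of guarding a shared nonce store, the sketch below checks for an empty pool and removes the last element under a single lock. `noncePool` and its methods are hypothetical; this is neither lego's code nor the fix that was applied there.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// noncePool is a hypothetical last-in, first-out store of anti-replay nonces.
type noncePool struct {
	mu     sync.Mutex
	nonces []string
}

// Push adds a nonce to the pool.
func (p *noncePool) Push(n string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.nonces = append(p.nonces, n)
}

// Pop returns the most recently added nonce, or an error when the pool is
// empty. The length check and the slicing happen under the same lock, so
// another goroutine cannot drain the pool between the two steps.
func (p *noncePool) Pop() (string, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.nonces) == 0 {
		return "", errors.New("nonce pool is empty")
	}
	last := len(p.nonces) - 1
	n := p.nonces[last]
	p.nonces = p.nonces[:last]
	return n, nil
}

func main() {
	pool := &noncePool{}
	pool.Push("example-nonce")

	// Four goroutines compete for a single nonce; the losers get an error
	// instead of triggering an out-of-range panic.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if n, err := pool.Pop(); err != nil {
				fmt.Println("no nonce:", err)
			} else {
				fmt.Println("got nonce:", n)
			}
		}()
	}
	wg.Wait()
}
```

A caller that receives the empty-pool error would typically fetch a new nonce from the server rather than fail.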

dmathieu added a commit that referenced this issue Feb 19, 2017