Fail to place allocations on clients running Consul 1.13.8 #17302
Comments
We're seeing the same thing. It occurred when we upgraded Nomad from 1.5.5 to 1.5.6 and Consul from 1.13.7 to 1.13.8. I suspect the issue here is Consul 1.13.8 rather than Nomad, given @josh-m-sharpe's versions; will test this shortly. My intuition is that hashicorp/consul#17270 is the source of the breakage, as the initial message is "unable to fingerprint consul: attribute=consul.grpc". Edit: opened an issue with Consul because this appears to be an issue on their side.
@erulabs thanks for the confirmation, but I found that consul 1.14.7 (latest version) works for us. There was a minor config change needed to launch that, and everything's working. I did our staging cluster this afternoon and am doing prod in the morning. So I guess I'm past whatever this issue is.
@josh-m-sharpe awesome - I'll look at going to consul 1.14 as well. Rolling back to 1.13.7 and keeping Nomad 1.5.6 works properly as well.
Hi @josh-m-sharpe and @erulabs 👋 Thank you for the report. Upon further investigation I found out that the problem was an API breaking change in Consul, where the version value returned by the API has a trailing line break:
$ curl -s http://localhost:8500/v1/agent/self | jq '.DebugConfig.Version'
"1.13.8\n"
I'm not sure why this happened, but I opened hashicorp/consul#17503 in the Consul repo. This extra line break causes the Nomad fingerprint to break when trying to parse the version in the GRPC and SKU detectors. I opened #17349 to prevent problems like this from affecting Nomad in the future. As far as I can tell this is the only version of Consul with this broken version return value, so other versions seem safe to upgrade to. Apologies for the headache during the upgrade process.
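For anyone curious why a single trailing line break in the version string is enough to break fingerprinting, here is a minimal sketch in Python. It is not Nomad's actual code (Nomad is written in Go), and the function names `parse_version_strict` and `parse_version_tolerant` are hypothetical; it only assumes that a strict parser expects exactly MAJOR.MINOR.PATCH and that trimming whitespace first is one way to tolerate the value Consul 1.13.8 returns.

```python
import re
from typing import Tuple

# Assumption for illustration: a strict parser accepts exactly
# MAJOR.MINOR.PATCH with nothing before or after it.
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_version_strict(raw: str) -> Tuple[int, ...]:
    """Parse a version string, rejecting any surrounding whitespace."""
    match = _VERSION_RE.fullmatch(raw)
    if match is None:
        raise ValueError(f"invalid version: {raw!r}")
    return tuple(int(part) for part in match.groups())


def parse_version_tolerant(raw: str) -> Tuple[int, ...]:
    """Trim surrounding whitespace (such as a trailing newline), then parse."""
    return parse_version_strict(raw.strip())


if __name__ == "__main__":
    reported = "1.13.8\n"  # value shown in the curl output in this thread
    try:
        parse_version_strict(reported)
    except ValueError as err:
        print("strict parse failed:", err)
    print("tolerant parse:", parse_version_tolerant(reported))
```

The strict variant rejects the value reported in this thread, while the tolerant variant parses it after trimming. Whether the real fix takes this exact shape is something to confirm in the linked pull request.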
Our stable versions:
consul: 1.13.2
nomad: 1.3.5
Decided to start upgrading, so I figured, sure, the last patch release of 1.13.x and 1.3.x respectively.
Attempted version combination:
consul: 1.13.8
nomad: 1.3.14 (I started this process before 1.3.15 dropped)
When I launch nomad with these versions, I see this output in /var/log/messages:
This seems to have the effect of preventing nomad from allocating containers there. This only happened once, and it took out my site, so I'm a bit reluctant to "test" that part again. I'm super interested in understanding and eliminating this warning though. It all seems odd: the consul agent joins the cluster and looks healthy. The nomad agent joins as well but just shows an empty client, and nothing gets provisioned there.
I can't find much on the internet about what those attributes are and/or what nomad is looking for. So I don't have much to go on.
By trial and error I've sorta determined that with consul 1.13.7 this issue doesn't show up, so by downgrading to:
Consul: 1.13.7
Nomad: 1.3.14
...I sorta have a new stable set of versions.
I saw this release: https://github.com/hashicorp/nomad/releases/tag/v1.3.10 which mentions
consul: add client configuration for grpc_ca_file
so I attempted nomad 1.3.9 with consul 1.13.8 and that still produces these warnings. Before I go try all the versions of nomad to figure out what's going on: is there anything I've missed? I didn't see much in any of the release notes between Nomad 1.3.5 and 1.3.15. Thanks!