Poor error message when secrets-encryption only enabled on primary server #3220

Closed
vrivellino opened this issue Apr 20, 2021 · 2 comments
Labels: kind/bug (Something isn't working)

@vrivellino

Environmental Info:
K3s Version:
k3s version v1.20.6+k3s1 (8d04328)
go version go1.15.10

Node(s) CPU architecture, OS, and Version:

Linux k3s-500-2 5.4.0-72-generic #80-Ubuntu SMP Mon Apr 12 17:35:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:

Bootstrapping High Availability with Embedded DB. The primary server is configured; attempting to add the second. A third server has yet to be configured.

Describe the bug:

The first server was bootstrapped with --secrets-encryption enabled.
When bootstrapping the second server, --secrets-encryption was omitted.

The only log messages produced are:

Apr 20 12:42:34 k3s-500-2 k3s[1535]: time="2021-04-20T12:42:34.674014114Z" level=info msg="Starting k3s v1.20.6+k3s1 (8d043282)"
Apr 20 12:42:34 k3s-500-2 k3s[1535]: time="2021-04-20T12:42:34.716012083Z" level=info msg="Managed etcd cluster not yet initialized"
Apr 20 12:42:34 k3s-500-2 k3s[1535]: time="2021-04-20T12:42:34.720594759Z" level=fatal msg="starting kubernetes: preparing server: failed to write to : open : no such file or directory"
Apr 20 12:42:34 k3s-500-2 systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE

Steps To Reproduce:

  • Install K3s on the primary: K3S_TOKEN=SECRET k3s server --cluster-init --secrets-encryption
  • After the primary is configured, install K3s on the secondary: K3S_TOKEN=SECRET k3s server --server https://<ip or hostname of server1>:6443 (omit --secrets-encryption)

Expected behavior:

A clearer error message describing the failure. The HA docs should also indicate which options must be replicated on all server nodes.
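
For illustration, a minimal Go sketch of what a self-explanatory error could look like (hypothetical code, not the actual k3s source; the writeBootstrapFile helper, the "encryption-config" key, and the flag hint are all assumptions):

```go
package main

import (
	"fmt"
	"os"
)

// writeBootstrapFile is a hypothetical helper showing how naming the
// bootstrap key and destination path in the error would make the failure
// self-explanatory, instead of printing "failed to write to : open : ...".
func writeBootstrapFile(key, path string, data []byte) error {
	if path == "" {
		return fmt.Errorf("no local path configured for bootstrap file %q; "+
			"the cluster has it enabled, but this server was started without "+
			"the corresponding flag (e.g. --secrets-encryption)", key)
	}
	if err := os.WriteFile(path, data, 0600); err != nil {
		return fmt.Errorf("failed to write bootstrap file %q to %s: %w", key, path, err)
	}
	return nil
}

func main() {
	// Simulate the mismatch: the cluster ships an encryption config, but the
	// joining server has no local path configured for it.
	if err := writeBootstrapFile("encryption-config", "", nil); err != nil {
		fmt.Println("starting kubernetes: preparing server:", err)
	}
}
```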

@brandond (Member)

Attaching the strace from our Slack conversation, in case it's useful later:

[pid  3469] newfstatat(AT_FDCWD, "/var/lib/rancher/k3s/server/tls/etcd", {st_mode=S_IFDIR|0700, st_size=4096, ...}, 0) = 0
[pid  3467] <... nanosleep resumed>NULL) = 0
[pid  3469] openat(AT_FDCWD, "/var/lib/rancher/k3s/server/tls/etcd/peer-ca.key", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0600 <unfinished ...>
[pid  3467] nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
[pid  3469] <... openat resumed>)       = 8
[pid  3469] epoll_ctl(3, EPOLL_CTL_ADD, 8, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=582465096, u64=139810357884488}}) = -1 EPERM (Operation not permitted)
[pid  3469] epoll_ctl(3, EPOLL_CTL_DEL, 8, 0xc0015ad864 <unfinished ...>
[pid  3467] <... nanosleep resumed>NULL) = 0
[pid  3469] <... epoll_ctl resumed>)    = -1 EPERM (Operation not permitted)
[pid  3467] nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
[pid  3469] write(8, "-----BEGIN EC PRIVATE KEY-----\nM"..., 227) = 227
[pid  3469] close(8 <unfinished ...>
[pid  3467] <... nanosleep resumed>NULL) = 0
[pid  3469] <... close resumed>)        = 0
[pid  3467] nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
[pid  3469] newfstatat(AT_FDCWD, ".", {st_mode=S_IFDIR|0700, st_size=4096, ...}, 0) = 0
[pid  3469] openat(AT_FDCWD, "", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0600) = -1 ENOENT (No such file or directory)
[pid  3467] <... nanosleep resumed>NULL) = 0
[pid  3467] nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
[pid  3469] write(2, "\33[31mFATA\33[0m[2021-04-20T02:03:4"..., 140FATA[2021-04-20T02:03:42.734753028Z] starting kubernetes: preparing server: failed to write to : open : no such file or directory
) = 140
[pid  3469] exit_group(1 <unfinished ...>
[pid  3473] <... futex resumed>)        = -1 (errno 18446744073709551385)
[pid  3474] <... futex resumed>)        = ?
[pid  3471] <... futex resumed>)        = ?
[pid  3462] <... futex resumed>)        = ?
[pid  3473] +++ exited with 1 +++
[pid  3474] +++ exited with 1 +++
[pid  3472] <... futex resumed>)        = ?
[pid  3470] <... futex resumed>)        = ?
[pid  3469] <... exit_group resumed>)   = ?
[pid  3468] <... epoll_pwait resumed> <unfinished ...>) = ?
[pid  3467] <... nanosleep resumed> <unfinished ...>) = ?
[pid  3472] +++ exited with 1 +++
[pid  3471] +++ exited with 1 +++
[pid  3470] +++ exited with 1 +++
[pid  3469] +++ exited with 1 +++
[pid  3467] +++ exited with 1 +++
[pid  3468] +++ exited with 1 +++
+++ exited with 1 +++
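
The openat(AT_FDCWD, "", ...) = -1 ENOENT near the end is the telling call: the joining server attempts to write a bootstrap file to an empty destination path. A minimal Go sketch (an assumption about the mechanism, not the actual k3s code) reproduces the exact log message:

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// With --secrets-encryption omitted, the local path for the encryption
	// config presumably ends up empty. Opening "" fails with ENOENT, and
	// interpolating the empty path into the wrapper yields exactly:
	//   failed to write to : open : no such file or directory
	path := ""
	_, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0600)
	if err != nil {
		fmt.Printf("failed to write to %s: %v\n", path, err)
	}
}
```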

@brandond brandond added the kind/bug Something isn't working label Apr 20, 2021
@brandond brandond added this to the v1.21.1+k3s1 milestone Apr 20, 2021
@dereknola dereknola self-assigned this Aug 31, 2022
@dereknola (Member)

This was fixed with #6409.
