k3s wipes its server token #11204

Closed
farcaller opened this issue Nov 1, 2024 · 6 comments

@farcaller
Contributor

farcaller commented Nov 1, 2024

Environmental Info:
K3s Version:

v1.31.1+k3s1 (452dbbc)

Node(s) CPU architecture, OS, and Version:

Linux horse 6.6.56 #1-NixOS SMP PREEMPT_DYNAMIC Thu Oct 10 10:50:06 UTC 2024 x86_64 GNU/Linux

Cluster Configuration:

1 server (single node cluster)

Describe the bug:

k3s zeroes its /var/lib/rancher/k3s/server/token and is unable to start up afterwards (a recurrence of #5345).

Steps To Reproduce:

What triggers it for me (reliably) is:

  • replace --kubelet-arg=node-ip=IPV4,IPV6 with --node-ip=IPV4,IPV6 and --node-external-ip=IPV6
  • receive the complaint that the service net isn't within the external node subnet
  • remove --node-external-ip=IPV6

Expected behavior:

the node boots up

Actual behavior:

the node wipes its token

@brandond
Member

brandond commented Nov 1, 2024

Wipes which token from where?

Please provide specific steps to reproduce, along with full details on your cluster config.

Are you using sqlite, embedded etcd, or an external database?

Where exactly are you replacing the args? In the config file? In the systemd unit? Are you doing that by hand, or by rerunning the install script?

@farcaller
Contributor Author

Made the file path fully qualified in the comment.

I change the flags in the NixOS configuration, which would be somewhat out of scope for this issue to post. It affects the systemd unit, and I can confirm it's reproducible if I run the k3s binary with the relevant flags by hand. The effect is that the token file is truncated.

@brandond
Member

brandond commented Nov 1, 2024

> I can confirm it's reproducible if I run the k3s binary with the relevant flags by hand.

OK, can you provide the sequence of k3s server invocations necessary to trigger this?

@farcaller
Contributor Author

farcaller commented Nov 1, 2024

Funnily enough, the previously guaranteed repro no longer triggers.

That said, the only place the token is written is writeToken, and that calls os.WriteFile, which can truncate the file first and then fail (see also golang/go#56173).

What's extremely curious is that I'd then expect the error to bubble up into tokenRotate and be logged inside tokenRequestHandler, and yet I don't see anything in the logs (I'd expect Sending HTTP 500 response, I suppose).
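The truncate-then-fail window described above can be illustrated with a small sketch. It spells out the open, write, close sequence that os.WriteFile performs (open with O_WRONLY|O_CREATE|O_TRUNC, then write) and injects a failure between the two steps. The names writeFileLike, failWrite, and demoTruncation are hypothetical and exist only for this illustration; this is not the k3s writeToken code, and it does not show what actually happened on the affected node.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// writeFileLike mirrors the open/write/close sequence of os.WriteFile.
// failWrite is a hypothetical hook that simulates a failure after the
// file has already been opened with O_TRUNC.
func writeFileLike(name string, data []byte, perm os.FileMode, failWrite bool) error {
	// O_TRUNC empties an existing file as part of the open call.
	f, err := os.OpenFile(name, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, perm)
	if err != nil {
		return err
	}
	if failWrite {
		f.Close()
		return errors.New("simulated write failure")
	}
	_, err = f.Write(data)
	if cerr := f.Close(); cerr != nil && err == nil {
		err = cerr
	}
	return err
}

// demoTruncation writes an initial token, then attempts a failing
// overwrite, and reports the size of the file that is left behind.
func demoTruncation() (int64, error) {
	dir, err := os.MkdirTemp("", "token-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "token")
	if err := os.WriteFile(path, []byte("old-token\n"), 0o600); err != nil {
		return 0, err
	}
	// The error is ignored on purpose: the point is the file's state afterwards.
	_ = writeFileLike(path, []byte("new-token\n"), 0o600, true)

	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

func main() {
	size, err := demoTruncation()
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Println("size after failed overwrite:", size)
}
```

In this sketch the old contents are gone even though the new contents were never written, which matches the "token file is truncated" symptom. It also shows why an error return would normally be expected in that case.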

@farcaller
Contributor Author

I've now run k3s token rotate a number of times, and every run works as intended. I'm somewhat dumbfounded by this, because it failed 4 times before I went to file a bug.

Still, the only way it could theoretically fail would be os.WriteFile doing the truncation and then returning an error. Why would that happen? I don't know right now. The system was running a backup when I triggered the issue, but I can hardly imagine a scenario where that would let the truncate go through and then make the write fail. There's also no log entry to confirm that this was the exact issue.

@brandond
Member

brandond commented Nov 1, 2024

Yeah, I can't really see how this would happen either unless the k3s process was interrupted at JUST the right time or there is some underlying weirdness with the host that caused an unsynced write to the file to get lost.

If you are able to reproduce this on demand on a stable node, please let us know how and we can reopen.

@brandond brandond closed this as completed Nov 1, 2024
@github-project-automation github-project-automation bot moved this from New to Done Issue in K3s Development Nov 1, 2024