Smee panic in proxy mode #482

Closed
waldner opened this issue Jul 17, 2024 · 6 comments · Fixed by #483
Labels
kind/bug Categorizes issue or PR as related to a bug.

waldner commented Jul 17, 2024

When running smee in proxy mode, the second PXE boot crashes it.

Current Behaviour

I'm running smee with -dhcp-mode=proxy. On first boot, the client boots fine, loads hooks, and runs through the provisioning template. After this, I set allowPXE: false and allowWorkflow: false in the corresponding hardware object. I then reboot the client (which is still configured to PXE boot), and the PXE request crashes smee:

{"level":"info","ts":1721220400.4172351,"caller":"smee/main.go:124","msg":"starting","version":"02731c4"}
{"level":"info","ts":1721220400.4172654,"caller":"smee/main.go:129","msg":"starting syslog server","bind_addr":"0.0.0.0:514"}
{"level":"info","ts":1721220400.4172966,"caller":"smee/main.go:158","msg":"starting tftp server","bind_addr":"0.0.0.0:69"}
{"level":"info","ts":1721220400.4176638,"logger":"github.com/tinkerbell/ipxedust","caller":"[email protected]/ipxedust.go:201","msg":"serving iPXE binaries via TFTP","service":"github.com/tinkerbell/smee","addr":"0.0.0.0:69","blocksize":512,"timeout":5,"singlePortEnabled":true}
{"level":"info","ts":1721220400.422946,"caller":"smee/main.go:220","msg":"serving http","addr":"0.0.0.0:7171","trusted_proxies":["10.1.0.0/16","10.0.1.0/16"]}
{"level":"info","ts":1721220400.4255617,"caller":"smee/main.go:233","msg":"starting dhcp server","bind_addr":"0.0.0.0:67"}
{"level":"info","ts":1721220400.4256585,"caller":"server/dhcp.go:35","msg":"Server listening on","addr":"0.0.0.0:67"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x158d2f8]

goroutine 91 [running]:
github.com/tinkerbell/smee/internal/dhcp/handler/proxy.(*Handler).Handle(0xc00050cb40, {0x1c1b440, 0xc00017e690}, 0xc000564190, {{0x1c074b0?, 0xc000630e10?}, 0xc00080c370?, 0xc00061c630?})
	/home/runner/work/smee/smee/internal/dhcp/handler/proxy/proxy.go:190 +0x1338
created by github.com/tinkerbell/smee/internal/dhcp/server.(*DHCP).Serve in goroutine 51
	/home/runner/work/smee/smee/internal/dhcp/server/dhcp.go:86 +0x6ef

Your Environment

Running smee in Kubernetes (microk8s) using the tinkerbell Helm chart 0.4.4; the smee image is v0.11.0.
This is consistently reproducible.

smee:
  additionalArgs: ["-dhcp-mode=proxy"]
  trustedProxies: ["10.1.0.0/16","10.0.1.0/16"]
  logLevel: debug
  hostNetwork: true
  publicIP: 10.110.0.12

EDIT: This does not happen in DHCP "normal" (i.e., reservation) mode. When the host with allowPXE: false reboots, smee does not crash and (from what I can see) serves the netboot-not-allowed file. This causes a PXE boot error on the client, which then proceeds to boot from the hard drive. A bit rough possibly, but it works, so I'd at least expect the same behavior on the second boot in proxy mode.

{"level":"info","ts":1721223489.1344376,"caller":"reservation/handler.go:141","msg":"sent DHCP response","mac":"58:11:22:32:83:0d","xid":"0xb9bded33","interface":"calic5d7ed3f76a","type":"ACK","bootFileName":"/netboot-not-allowed","nextServer":"0.0.0.0","ipAddress":"10.112.0.65","destination":"10.1.30.15:67"}
{"level":"error","ts":1721223489.1371915,"logger":"github.com/tinkerbell/ipxedust","caller":"itftp/itftp.go:91","msg":"file unknown","service":"github.com/tinkerbell/smee","event":"get","filename":"netboot-not-allowed","uri":"/netboot-not-allowed","client":{"IP":"10.1.30.15","Port":37262,"Zone":""},"macFromURI":"","error":"file [netboot-not-allowed] unknown: file does not exist","stacktrace":"github.com/tinkerbell/ipxedust/itftp.Handler.HandleRead\n\t/home/runner/go/pkg/mod/github.com/tinkerbell/[email protected]/itftp/itftp.go:91\ngithub.com/pin/tftp/v3.(*Server).handlePacket.func2\n\t/home/runner/go/pkg/mod/github.com/pin/tftp/[email protected]/server.go:455"}
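
For readers following along, the reservation-mode behavior described above (a client that is not allowed to netboot gets handed the `/netboot-not-allowed` boot file, which then fails to download) can be pictured with a minimal Go sketch. This is not Smee's code: `bootFileName` and the `ipxe.efi` argument are hypothetical names used only for illustration; only the `/netboot-not-allowed` value comes from the log above.

```go
package main

import "fmt"

// notAllowedFile is the boot file name seen in the reservation-mode log above.
const notAllowedFile = "/netboot-not-allowed"

// bootFileName is a hypothetical helper (not part of Smee) that picks the
// boot file to advertise in the DHCP response. When netboot is disallowed,
// it returns a file name that the TFTP server does not serve, so the
// client's PXE attempt fails and it falls through to its next boot device.
func bootFileName(allowNetboot bool, normalFile string) string {
	if !allowNetboot {
		return notAllowedFile
	}
	return normalFile
}

func main() {
	// "ipxe.efi" is only an illustrative value for the normal boot file.
	fmt.Println(bootFileName(true, "ipxe.efi"))
	fmt.Println(bootFileName(false, "ipxe.efi"))
}
```

The point of the sketch is that the "not allowed" path still produces a well-formed response instead of dereferencing anything that may be unset.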

jacobweinstock (Member) commented

Hey @waldner, thanks for reporting this! I have opened #483 to resolve this.

jacobweinstock added the kind/bug label Jul 17, 2024
jacobweinstock added a commit that referenced this issue Jul 17, 2024
Fix nil pointer error:

## Description

In proxyDHCP mode, if AllowNetboot is false, Smee would panic because of this log line. This change resolves the issue.

## Why is this needed

Fixes: #482

## How Has This Been Tested?


## How are existing users impacted? What migration steps/scripts do we need?



## Checklist:

I have:

- [ ] updated the documentation and/or roadmap (if required)
- [ ] added unit or e2e tests
- [ ] provided instructions on how to upgrade
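
The failure mode named in the commit message above (a log line that panics when AllowNetboot is false) can be illustrated with a small Go sketch. It assumes, purely for illustration, that the log statement read a field through a pointer that is only populated when netboot is allowed. The types `netboot` and `script` and the functions `describeUnsafe`, `describeSafe`, and `panics` are hypothetical and are not taken from Smee's source.

```go
package main

import "fmt"

// script is a hypothetical stand-in for data that only exists when
// netboot is allowed.
type script struct {
	Name string
	URL  string
}

// netboot is a hypothetical stand-in for per-client netboot settings.
type netboot struct {
	AllowNetboot bool
	Script       *script // assumed nil when AllowNetboot is false
}

// describeUnsafe mimics a log line that dereferences Script unconditionally.
// With a nil Script this triggers a nil pointer dereference panic.
func describeUnsafe(n netboot) string {
	return "script=" + n.Script.URL
}

// describeSafe checks the pointer before using it, so logging can never
// crash the handler.
func describeSafe(n netboot) string {
	if n.Script == nil {
		return "script=<none>"
	}
	return "script=" + n.Script.URL
}

// panics runs f and reports whether it panicked.
func panics(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	n := netboot{AllowNetboot: false} // Script left nil
	fmt.Println("unsafe log line panics:", panics(func() { _ = describeUnsafe(n) }))
	fmt.Println(describeSafe(n))
}
```

Whether the actual fix guards the pointer or removes the log line is not stated in this thread; the sketch only shows why an unguarded dereference in a log statement is enough to take the whole process down.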

waldner commented Jul 17, 2024

Thanks! Will this be included in a new smee image (eg v0.12.0)? And when will it be published?


jacobweinstock commented Jul 17, 2024

Yes, it will be in v0.12.0. I'm hoping to get that out by the end of next week. It is also available now as quay.io/tinkerbell/smee:sha-47170cde and quay.io/tinkerbell/smee:latest.


waldner commented Jul 18, 2024

Trying to use the new image (smee:sha-47170cde) with a 0.4.4 helm chart, I'm getting this error:

$ kubectl logs -n tink-system smee-69d7dcddc6-zqjmc
flag provided but not defined: -dhcp-http-ipxe-binary-url
Smee is the DHCP and Network boot service for use in the Tinkerbell stack.

USAGE
  smee [flags]
...

jacobweinstock (Member) commented

> Trying to use the new image (smee:sha-47170cde) with a 0.4.4 helm chart, I'm getting this error:
>
> $ kubectl logs -n tink-system smee-69d7dcddc6-zqjmc
> flag provided but not defined: -dhcp-http-ipxe-binary-url
> Smee is the DHCP and Network boot service for use in the Tinkerbell stack.
>
> USAGE
>   smee [flags]
> ...

Yeah, the top of tree for Smee has CLI flag changes. Here are the Helm chart updates that are needed: tinkerbell/charts#111
These will land in the Charts repo once Smee is released. You can also see the current CLI flags by running: docker run -it --rm quay.io/tinkerbell/smee:sha-47170cde -h
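
As background on the error quoted above: the text "flag provided but not defined" matches what Go's standard `flag` package reports when it is given a flag that was never registered, which is what happens when a chart passes a flag the binary no longer defines. The sketch below reproduces that message with a toy flag set. `parseArgs`, the flag set name, and the `reservation` default are assumptions made for this example and are not Smee's actual CLI setup.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseArgs parses args against a toy flag set that only registers
// -dhcp-mode. It is a hypothetical helper, not Smee's flag handling.
func parseArgs(args []string) error {
	fs := flag.NewFlagSet("smee-sketch", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // suppress the usage text the flag package prints on error
	fs.String("dhcp-mode", "reservation", "illustrative flag for this sketch")
	return fs.Parse(args)
}

func main() {
	// A flag the toy binary knows about parses cleanly.
	fmt.Println(parseArgs([]string{"-dhcp-mode=proxy"}))

	// A flag it does not know about yields the error seen in the pod logs.
	fmt.Println(parseArgs([]string{"-dhcp-http-ipxe-binary-url", "http://example.invalid/"}))
}
```

In practice this means the chart version and the image tag need to agree on the set of flags, which is why the chart update has to accompany the new image.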


waldner commented Jul 18, 2024

Thanks again.
