Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework the CLI to use gRPC #212

Merged
merged 41 commits into from
Nov 12, 2021
Merged

Rework the CLI to use gRPC #212

merged 41 commits into from
Nov 12, 2021

Conversation

kradalby
Copy link
Collaborator

@kradalby kradalby commented Nov 4, 2021

First, apologise for the massive PR, I got a bit carried away. (A lot of the code is generated, and is a separate commit)

This PR moves the rest of the user facing CLI (all commands but serve) to use gRPC to communicate with the server.

This means that the PR has "three layers".

  1. It adds protobuf definitions and generate types and rpc based on this
  2. Then it implements the "service" interface provided by gRPC
  3. And then we move all the commands to use the new interface.
  4. Bonus layer: I have added integration tests for the CLI (execute command and read out result) for every command.

In between these steps, there is a lot of code cleanup and streamlining of functions to better fit the new interface. I have also made an attempt on standardising and cleaning up where we had several different ways to get information.

This means that we now have:

  • A CLI that communicates over rpc
    • which means we can run it from everywhere when we add authentication
  • A Web API generated from the same spec, supporting the same as gRPC (currently disabled, due to missing auth)
  • Integration tests for the CLI, should help us find direct and detect underlying issues changes can cause.

What I still would like to tackle (before release):
The cli takes quite inconsistent parameters: database id, machine key, name + namespace. I think we should discuss to standardise on one main approach.

This commit adds proto rpc definitions for the communication needed for
the CLI interface.
This will allow us to move the rest of the CLI interface over to gRPC
and in the future allow remote access
This commit adds a debug command tree, intended to host commands used
for debugging and testing.

It adds a create node/machine command which will be used later to create
machines that can be used to test the registration command.
This commit moves the routes lookup functions to be subcommands of
Machine, making them a lot simpler and more specific/composable.

It also moves the register command from cli.go into machine, so we can
clear out the extra file.

Finally a toProto function has been added to convert between the machine
database model and the proto/rpc model.
This commit is getting rid of a bunch of returned list pointers.
This commit is a first in a series of commits migrating the command
interfaces to use the new gRPC client.

As a part of this commit, they have been streamlined and each command
_should_ be a bit more similar and use consistent output.

By using the new output function, we now make sure its always json
(errors and everything) if the user asks for JSON.
This PR adds a new part to the integration test suite which spins up a
new headscale and runs through a scenario of test cases for each
command.

The intent is to check that all commands work as intended and produce
the expected output.

I think they have been pretty well covered, but would appreciate
additional test cases if I have missed some.

Please note: headscale is set up, and teared down for _each_ "test
function" in this file, this means that its more suitable for specific
cases.
@kradalby kradalby marked this pull request as ready for review November 5, 2021 07:42
@kradalby kradalby requested review from juanfont and cure November 5, 2021 07:42
Copy link
Contributor

@cure cure left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few small things below. This is really great cleanup!

I'm going to do some testing now.

cmd/headscale/cli/namespaces.go Outdated Show resolved Hide resolved
cmd/headscale/cli/nodes.go Outdated Show resolved Hide resolved
cmd/headscale/cli/root.go Outdated Show resolved Hide resolved
@kradalby
Copy link
Collaborator Author

kradalby commented Nov 7, 2021

If I understand unix sockets and your implementation correct. For now to interact with the server, the server needs to be started to add nodes or whatever.
This is correct, previously you could use the CLI without the server because you were modifying the database directly.
While there can be use cases for that, it make it very hard to validate all input and in general would just open more opportunities for bad things to happen to your database.

This communication runs over the socket create on start in /var/run/headscale.socket. Headscale will need to run as root user to create this socket or even communicate over it. Therefore we implicitly make headscale only runable as root. Is that intended? I don't really know the inner workings of grpc

headscale will not need to run as root, but the default path will require permission to write that one file in /var/run/headscale.sock.
What you would typically do in this situation is to configure you init system (e.g. systemd) to run as a headscale user, and create the file and fix the permissions.

You can of course also change the socket path to somewhere else.

(the use of socket is not gRPC specific, it is as old as (unix)time it selves, consider gRPC a protocol)

I would suggest to rather not do that. As it is a great feature to run headscale without root and I also plan in future to make the docker setup rootless. More security. If possible.

Same above would apply in docker, it is just about permissions.

As I could see from the log the grpc server is also multiplexed on the listen_address. So we are exposing the API without any authentication? Edit: Over read A Web API generated from the same spec, supporting the same as gRPC (currently disabled, due to missing auth)

The WebAPI and gRPC over network is listening, but it is returning access denied on all requests as there is no way to authenticate. So it is not unauthenticated, it is impossible to use :)

For now I still in the middle of testing most use cases.
Greatly appreciated !

@cure
Copy link
Contributor

cure commented Nov 7, 2021

This happens for other commands too:
./headscale routes list -i 6
Cannot get nodes: rpc error: code = Unknown desc = record not found

Should be better now.

headscale namespaces rename x y
Cannot rename namespace: Namespace not found

Yes! That is much better thanks.

juanfont
juanfont previously approved these changes Nov 7, 2021
Copy link
Owner

@juanfont juanfont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is an amazing piece of software. Thank you so much @kradalby

if _, err := os.Stat(h.cfg.UnixSocket); errors.Is(err, os.ErrNotExist) {
return nil
}
return os.Remove(h.cfg.UnixSocket)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we have headscale already running?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will crash the other one. I am a bit conflicted trying to fix that, because at some point I think it must become the responsibility of a init system like systemd 🤔

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmmm. Kinda agreed!

@ohdearaugustin
Copy link
Collaborator

Just tested with docker seems to be fine.

Just realized. That the logging of grpc is leaking the preshared keys like:

2021-11-07T22:20:20Z INF unary dur=0.454557 md={":authority":"/var/run/headscale.sock","content-type":"application/grpc","user-agent":"grpc-go/1.42.0"} method=ListPreAuthKeys req={"namespace":"test"} resp={"preAuthKeys":[{"createdAt":"2021-11-07T22:12:38.805900338Z","expiration":"2021-11-08T22:12:38.804877152Z","id":"1","key":"b113f30858285f9dc1ae41d9c485d2db1e400becdfbb3945","namespace":"test","resuable":true},{"createdAt":"2021-11-07T22:12:54.439030863Z","expiration":"2021-11-08T22:12:54.437958889Z","id":"2","key":"8a6659bf225ef5436143fe2c6c66c68ffe6a4bc8c02ffe4a","namespace":"test","resuable":true,"used":true},{"createdAt":"2021-11-07T22:16:06.510883372Z","expiration":"2021-11-08T22:16:06.405761418Z","id":"3","key":"f137325200d09306ad72137603a49b36bf0128a2754f490e","namespace":"test","resuable":true}]} service=headscale.v1.HeadscaleService

@kradalby
Copy link
Collaborator Author

kradalby commented Nov 8, 2021

I am pushing a fix for the nil pointer in a few moments.

I'm not sure how it should work.

Thats a good point with reusable and ephemeral 🤦 , @juanfont empherial og reusable is mutual exclusive right?

@kradalby
Copy link
Collaborator Author

kradalby commented Nov 8, 2021

This should now have addressed all the concerns, please have another look @cure and @juanfont

@kradalby
Copy link
Collaborator Author

kradalby commented Nov 8, 2021

Just realized. That the logging of grpc is leaking the preshared keys like:

Should be fixed now.

Can you give the other issues you saw a go @ohdearaugustin ?

@ohdearaugustin
Copy link
Collaborator

Just realized. That the logging of grpc is leaking the preshared keys like:

Should be fixed now.

Can you give the other issues you saw a go @ohdearaugustin ?

Leak and panic in preauthkey create look fine now. As you mentioned above we still should be sure about how the preauthkeys should work. Otherwise look good. Thank you (=

@cure
Copy link
Contributor

cure commented Nov 9, 2021

Hmm, at commit 4989330 I am still seeing a segfault when doing headscale nodes list:

headscale[1347376]: panic: runtime error: invalid memory address or nil pointer dereference
headscale[1347376]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xea3c9b]
headscale[1347376]: goroutine 91 [running]:
headscale[1347376]: github.com/juanfont/headscale.(*PreAuthKey).toProto(0xc0002d3b80, 0xc0000d5080)
headscale[1347376]:         /usr/src/headscale/preauth_keys.go:149 +0x5b
headscale[1347376]: github.com/juanfont/headscale.(*Machine).toProto(0xc00099d6c8, 0xc0000d4fc0)
headscale[1347376]:         /usr/src/headscale/machine.go:544 +0x3f6
headscale[1347376]: github.com/juanfont/headscale.headscaleV1APIServer.ListMachines(0xc0003ef200, 0x13cd9f8, 0xc000943f80, 0xc00037b280, 0x1204e60, 0xc00030fa00, 0x13dbed0)
headscale[1347376]:         /usr/src/headscale/grpcv1.go:228 +0x465
headscale[1347376]: github.com/juanfont/headscale/gen/go/headscale/v1._HeadscaleService_ListMachines_Handler.func1(0x13cd9f8, 0xc000943f80, 0x114dda0, 0xc00037b280, 0x0, 0x203000, 0x7fe0a64e2e40, 0x18)
headscale[1347376]:         /usr/src/headscale/gen/go/headscale/v1/headscale_grpc.pb.go:530 +0x89
headscale[1347376]: github.com/philip-bui/grpc-zerolog.NewUnaryServerInterceptorWithLogger.func1(0x13cd9f8, 0xc000943f80, 0x114dda0, 0xc00037b280, 0xc00036f840, 0xc00094e540, 0xc000566ba0, 0x6441c6, 0x11973e0, 0xc000943f80)
headscale[1347376]:         /usr/src/pkg/mod/github.com/philip-bui/[email protected]/unary_interceptor.go:50 +0x9d
headscale[1347376]: github.com/juanfont/headscale/gen/go/headscale/v1._HeadscaleService_ListMachines_Handler(0x1204e60, 0xc000011218, 0x13cd9f8, 0xc000943f80, 0xc000948960, 0xc000135260, 0x13cd9f8, 0xc000943f80, 0x0, 0x0)
headscale[1347376]:         /usr/src/headscale/gen/go/headscale/v1/headscale_grpc.pb.go:532 +0x150
headscale[1347376]: google.golang.org/grpc.(*Server).processUnaryRPC(0xc000024700, 0x13d7978, 0xc0002cf380, 0xc000449d40, 0xc000290240, 0x19eca40, 0x0, 0x0, 0x0)
headscale[1347376]:         /usr/src/pkg/mod/google.golang.org/[email protected]/server.go:1282 +0x52b
headscale[1347376]: google.golang.org/grpc.(*Server).handleStream(0xc000024700, 0x13d7978, 0xc0002cf380, 0xc000449d40, 0x0) 
headscale[1347376]:         /usr/src/pkg/mod/google.golang.org/[email protected]/server.go:1616 +0xd0c
headscale[1347376]: google.golang.org/grpc.(*Server).serveStreams.func1.2(0xc0004347c0, 0xc000024700, 0x13d7978, 0xc0002cf380, 0xc000449d40)
headscale[1347376]:         /usr/src/pkg/mod/google.golang.org/[email protected]/server.go:921 +0xab
headscale[1347376]: created by google.golang.org/grpc.(*Server).serveStreams.func1
headscale[1347376]:         /usr/src/pkg/mod/google.golang.org/[email protected]/server.go:919 +0x1fd
systemd[1]: headscale.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
systemd[1]: headscale.service: Failed with result 'exit-code'.
systemd[1]: headscale.service: Scheduled restart job, restart counter is at 7.

Sorry, I haven't had time yet to dig into this.

@kradalby
Copy link
Collaborator Author

kradalby commented Nov 9, 2021

Sorry, I haven't had time yet to dig into this.

No worries, do you have any minimum steps to reproduce? I can't seem to get it, and I need a test case :)

@juanfont
Copy link
Owner

juanfont commented Nov 9, 2021

I am pushing a fix for the nil pointer in a few moments.

I'm not sure how it should work.

Thats a good point with reusable and ephemeral 🤦 , @juanfont empherial og reusable is mutual exclusive right?

Correct!

@kradalby kradalby mentioned this pull request Nov 12, 2021
@kradalby
Copy link
Collaborator Author

@cure, I am still quite puzzled about the issue, I cant replicate it, which I think means that I cant get my database into the same state as yours.

As far as I can see, you end up with a PreAuthKey object that is missing one (or more) of these fields:

https://github.com/kradalby/headscale/blob/cli-grpc/preauth_keys.go#L143-L148

	protoKey := v1.PreAuthKey{
		Namespace: key.Namespace.Name,
		Id:        strconv.FormatUint(key.ID, 10),
		Key:       key.Key,
		Ephemeral: key.Ephemeral,
		Reusable:  key.Reusable,
		Used:      key.Used,
	}

Since they are not pointers, they should not be nil, but thats not really a guarantee.

Could you try to run a version of the code with a print statement or something for the key variable in that function?

Maybe it could be some field missing in your DB that has not been migrated in the past?

Copy link
Contributor

@cure cure left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, apparently I was not on the tip of this branch, I was still on the dev20211107104005 version. I pulled in the latest changes and rebuilt, and the problem with headscale nodes list crashing is now gone.

I do indeed have some invalid data in my database - there are some preauthkeys that belong to a namespace that has been deleted (cf. #217). Maybe that had something to do with this.

In any case, I think this is good to merge now. Thanks for your patience!

@kradalby
Copy link
Collaborator Author

Thank you :), I think merging this and continuing some more work I have in the pipeline would make sense (try to avoid making such massive PRs).

I think @juanfont started to prepare it, but I think we should run this under 0.12.0-beta for a bit, maybe even some RCs.

And add a clear note that the server has to be running now to use the CLI.

🎉

@kradalby kradalby merged commit ba65092 into juanfont:main Nov 12, 2021
@kradalby kradalby deleted the cli-grpc branch March 2, 2022 08:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants