FR: add ability to delete devices #68

Open
rsyring opened this issue Feb 17, 2022 · 15 comments · May be fixed by #160
Labels
enhancement New feature or request

Comments

@rsyring

rsyring commented Feb 17, 2022

Is your feature request related to a problem? Please describe.

When destroying resources with Terraform, I'd like the ability to remove the device from Tailscale. For example, when an AWS instance is deleted, I want its associated Tailscale device to be deleted as well.

Describe the solution you'd like
Keep track of devices in Terraform state and, when the device is removed, use the DELETE device API to remove the device from the tailnet.

@davidsbond
Contributor

davidsbond commented Feb 21, 2022

Hey @rsyring, so deleting a device is a little bit complicated when it comes to the terraform provider, as devices cannot be created using it.

A terraform provider typically describes the lifecycle of a resource, but a device cannot be a resource because it can't be created purely via API calls.

One suggestion might be that when the device_authorization resource is deleted, the device is deleted also. How would you feel about this solution?

@pellegrino

I managed to get a POC for this working on my branch but I must say I don't love the solution either. As @davidsbond said, devices cannot be created using it, so the provider has to track the changes that are happening elsewhere.

I made the create part of the provider ignore if the device isn't present, effectively giving it an "unavailable" marker. Whenever terraform runs and finds the device created, it will update its state with the correct ID. When the resource is deleted, it will be cleaned up using the device delete method.

provider "tailscale" {
  #  api_key = "my-api-key"
  #  tailnet = "my-tailnet"
}

resource "tailscale_device" "example" {
  name = "example.my-tailnet"
}

If you want, I can submit a PR for it.

@davidsbond
Contributor

@pellegrino personally, I don't think this is a good way to go. If we can't actually create the device via API calls, we shouldn't provide a resource for it.

I'd be willing to consider deletion occurring from deleting a device_authorization but I don't think we should add faux resources that do nothing.

For now, my recommendation would be to use an ephemeral key, which will remove the device once the workload dies or restarts.
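The ephemeral-key recommendation above could look roughly like the following. This is a minimal sketch, assuming the provider's tailscale_tailnet_key resource and its ephemeral, reusable, preauthorized and expiry arguments are available in the provider version in use; the resource name, expiry value and description are illustrative and not taken from this thread.

```hcl
# Sketch only: an auth key whose devices are registered as ephemeral,
# so Tailscale removes them after they go offline.
resource "tailscale_tailnet_key" "ephemeral" {
  reusable      = true  # assumption: several hosts share this key
  ephemeral     = true  # devices joined with this key are ephemeral
  preauthorized = true  # assumption: skip manual device approval
  expiry        = 3600  # illustrative value, in seconds
  description   = "Ephemeral key for short-lived workloads"
}

# The generated key can then be handed to the host at provisioning time,
# for example through cloud-init or a provisioner.
output "tailscale_auth_key" {
  value     = tailscale_tailnet_key.ephemeral.key
  sensitive = true
}
```

Note that the key itself still ends up in Terraform state, so the state should be treated as sensitive.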

@rsyring
Author

rsyring commented Mar 2, 2022

Just making the implicit explicit: we have a chicken and egg problem here between Tailscale's operational model and Terraform's. Since the models are currently incompatible, some kind of workaround is going to be needed.

FWIW, I like the idea of a provider even though it can't actually create the record. Could take the "wait_for" logic from #72 and apply it here as well. Put BIG WARNINGS in the docs that the device isn't actually created through the API and point to an example of how to get Tailscale installed and running on a new host at creation time. Could also add the warning and explanation to the error message, and a link to the docs, if the provider times out waiting for the Tailscale device to come online.

While I agree that this violates the spirit of what a provider is in this case, the implementation feels closer to what it "should be", and if Tailscale ever gives the ability to actually create the resource through the API, only the provider implementation would change. The Terraform scripts would continue to work as-is, with potentially a deprecation notice in case they set the wait_for argument to a non-default value.

Having said all that, I wouldn't mind deleting through device_authorization. The delete could be optional, so that the case of de-authorizing but not deleting is still possible.
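For context, the existing building blocks referred to above (the wait_for logic from #72 and device_authorization) can be combined roughly as follows. This is a sketch under assumptions: the device name and timeout are placeholders, and the exact attribute used for the device ID may differ between provider versions. It only shows what exists today; nothing in it deletes the device on destroy, which is the behaviour being requested in this issue.

```hcl
# Look up a device that was registered outside of Terraform
# (for example by running `tailscale up` on the host during provisioning).
data "tailscale_device" "example" {
  name     = "example.my-tailnet" # placeholder device name
  wait_for = "60s"                # illustrative timeout while the device comes online
}

# Authorize the device once it has been found.
resource "tailscale_device_authorization" "example" {
  device_id  = data.tailscale_device.example.id
  authorized = true
}
```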

@defo89

defo89 commented Oct 10, 2022

I have a similar use case to @rsyring's. Deleting the device through device_authorization would also work for me. We would execute the de-auth step before re-deploying the device with the same name.

Using an ephemeral key does not really fit, since we install VMs (also Terraform-managed) that tend to reboot sometimes.

@defo89 defo89 linked a pull request Oct 10, 2022 that will close this issue
@eloop

eloop commented Nov 15, 2022

This workaround seems OK: we delete any existing devices in a remote-exec provisioner while creating a VM (OpenStack + Ubuntu 22.04 in my case), with appropriately defined vars for the keys.

provisioner "remote-exec" {
    inline = [
      "curl -fsSL https://pkgs.tailscale.com/stable/ubuntu/jammy.noarmor.gpg | sudo tee /usr/share/keyrings/tailscale-archive-keyring.gpg >/dev/null",
      "curl -fsSL https://pkgs.tailscale.com/stable/ubuntu/jammy.tailscale-keyring.list | sudo tee /etc/apt/sources.list.d/tailscale.list",
      "sudo apt-get update",
      "sudo apt-get -y install tailscale jq",
      # We want to make sure any Tailscale devices with this name have been deleted.
      <<EOT
      curl 'https://api.tailscale.com/api/v2/tailnet/[email protected]/devices' -u "${var.tailscale_api_key}:" |  \
         jq -r '.devices[] | select(.hostname == "${self.name}") | .nodeId' |  while read -r nodeid
         do
           #echo curl -x DELETE "https://api.tailscale.com/api/v2/device/$nodeid" -u "${var.tailscale_api_key}:"
           curl -X DELETE "https://api.tailscale.com/api/v2/device/$nodeid" -u "${var.tailscale_api_key}:" -v
         done
       EOT
      ,
      "sudo tailscale up --authkey=${var.tailscale_key}  --ssh"
    ]
  }

@ghost

ghost commented Sep 19, 2023

A behaviour that maps onto existing Terraform workflows and resource lifecycles would be to: delete all machines using a key created through tailscale_tailnet_key, when said Terraform resource gets deleted.

@evilhamsterman

I've been trying to think of a way to handle this and I just can't think of a clean solution as it stands right now. I really think it needs an architectural change from Tailscale. As @mlangenberg mentioned in #232, Tailscale needs the ability to create a one-time-use key that is associated with a nodeId.

Once that's possible, you can have a resource "tailscale_device" that creates a one-time-use key to be provided to something like cloud-init, with an associated nodeId that will be assigned to the node that uses the key. The key would be saved as part of the state and, from Terraform's perspective, it would be unchanging, so cloud-init configs won't be rebuilt, but it wouldn't be as sensitive because once it's used it is no longer active. The nodeId can then also be used elsewhere, like device authorization. And since the tailscale_device would be a dependency of, say, a VM, when that VM is deleted its dependencies will be deleted too, including the tailscale_device.

@gberenice

Hey Tailscale team, any updates on this one?

@Patricol

Patricol commented Oct 1, 2024

@davidsbond How amenable would you be towards allowing exposure of the provider's internally generated apiKey via a provider function, for use in cases like these?

I could write a few pages about how there are absolutely no valid options for handling common Terraform use cases with the current design without massive security AND convenience compromises. (Pass around a static API key in the only-encrypted-at-rest Terraform state and write extra mechanisms to try to check when it's out of date and manually recreate it every few months? No thanks.) But as you can likely already tell, I am rather frustrated, so I think it's not the best idea.

There are about a dozen better solutions to this than what I am suggesting above, but all the obvious ones have already been rejected by Tailscale devs in different threads across different repos. The API could leverage the identity of the node it is run from for authentication. Logout could actually allow re-use of a name. A flag could be added to tailscale up which causes re-use of the name of an existing machine, causing either its deletion or renaming. If that is too extreme, then it could only work on machines that have already logged out. The CLI could allow for deletion of machines in any way, shape, or form. Ephemeral keys could not expire, so all the infrastructure doesn't go down if one node happens to reboot after its tailnet_key has expired. This Terraform provider could be set up to import-on-creation, a pattern used in other major providers, and which would allow proper deletion.

Supporting extremely basic usage patterns should probably come before design-purity idealism, but that's just my opinion. My apologies for the rudeness.

@jaxxstorm
Contributor

@Patricol

A flag could be added to tailscale up which causes re-use of the name of an existing machine

This is already supported. Use tailscale up --state=arn:aws:ssm... with the path to an SSM parameter or Kubernetes secret, or an existing file attachment with the original state.

Ephemeral keys could not expire, so all the infrastructure doesn't go down if one node happens to reboot after its tailnet_key has expired

This is also already supported: use an OAuth client secret and specify ?ephemeral=true as a parameter to it.

@jaxxstorm
Contributor

I think the general pattern has been covered here - Terraform can't delete the device because it has no knowledge of the device upon creation. The device creation happens during machine provisioning, and as a result the provider state has no prior knowledge for the deletion event to occur.

The canonical way to handle device deletion is using ephemeral keys. For those folks who have pointed out this might be a concern due to tailnet keys expiring: note that you can authenticate a device using an OAuth client secret, which doesn't expire. You can also see that link for setting the device as ephemeral, which will delete it on client logout.

If you'd like to reuse device state, or reuse names (rather than deleting old devices), the solution there is to reuse the tailscaled state by storing it in a filesystem that can be reattached to the new device, or by using a Kubernetes secret or AWS SSM parameter.
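As a rough illustration of the OAuth client secret approach described above, the command passed to the host might be assembled in Terraform like this. It is a sketch under assumptions: the variable name and the tag are hypothetical, and the requirement to advertise a tag when joining with an OAuth client secret should be checked against the current Tailscale documentation.

```hcl
# Hypothetical variable holding an OAuth client secret created in the
# Tailscale admin console; it is not generated by this configuration.
variable "tailscale_oauth_client_secret" {
  type      = string
  sensitive = true
}

locals {
  # Appending ?ephemeral=true asks Tailscale to register the node as ephemeral,
  # so it is removed after it goes offline.
  # tag:example is a placeholder; replace it with a tag the OAuth client owns.
  tailscale_up_command = "sudo tailscale up --authkey='${var.tailscale_oauth_client_secret}?ephemeral=true' --advertise-tags=tag:example"
}
```

The resulting string could be passed to a provisioner or to instance user data. Because it embeds the secret, it should be handled as sensitive wherever it is stored.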

@gberenice

@jaxxstorm I don't see the --state flag in the docs: https://tailscale.com/kb/1241/tailscale-up
Could you please point me to the documentation about it? Thanks!

@jaxxstorm
Contributor

@gberenice - the flag is actually on tailscaled - which is the underlying daemon that tailscale up interacts with and configures.

You can find the flag here:

https://tailscale.com/kb/1278/tailscaled#flags-to-tailscaled

@gberenice

@jaxxstorm ah, right. Thanks for the clarification!

9 participants