Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate terraform-plugin-framework, and rewrite the tfe_variable resource #873

Merged
merged 18 commits into from
May 9, 2023

Conversation

nfagerlund
Copy link
Member

@nfagerlund nfagerlund commented May 4, 2023

Description

Fixes #839.

Wow, lots of sturm und drang for one little bug fix, huh? But anyway, this starts us on the path to adopting the shiny new plugin framework (which, by the way, is actually incredibly excellent to work with compared to the v2 sdk) and validates our theory that it grants the power to resolve that perennially obnoxious "can't tell absent from zero" family of defects.

TODO:

  • Minor docs tweak
  • Changelog

Testing plan

Sorry, this is outrageously convoluted, which is why no one apparently noticed the bug until the reporter on #839.

  1. Make an arbitrary terraform config that relies on a sensitive variable being set to a particular value. (Probably easiest is to just switch between two output values based on an equality check against a known value.)

  2. Make a terraform config that uses this provider to create a TFC workspace (which will use the config from step 1), and a sensitive variable in that workspace. The var should have the value set to an empty string, and should have ignore_changes = [value]. Like this:

    resource "tfe_variable" "affected" {
      key          = "my_var"
      value        = ""
      category     = "terraform"
      workspace_id = tfe_workspace.var-sensitive-nuke-repro.id
      description  = "not edited yet"
      sensitive    = true
      lifecycle {
        ignore_changes = [value]
      }
    }
  3. Apply the config from step 2.

  4. Manually, in the TFC UI, change the value of the sensitive variable in your new workspace to the Particular Value that your config from 1 relies on.

  5. Do an apply in the new workspace and verify the value is set as expected.

  6. Edit your config from step 2 to change the description of the variable, leaving value unchanged. Do an apply, and note that Terraform only reports a change to description as being necessary.

  7. Do an apply in the new workspace again.

    • In released versions of the provider, this now breaks because the value was forcibly, silently reset to an empty string, despite a description edit being the only planned change.
    • With this PR, it works fine.

Additionally: Slam on the tfe_variable resource as hard as you can and try to squeeze any kind of unexpected behavior out of it. It ought to behave.

Output from acceptance tests

TF_ACC=1 TF_LOG_SDK_PROTO=OFF go test $(go list ./... |grep -v 'vendor') -v -run TestAccTFEVariable_ -timeout 15m
?   	github.com/hashicorp/terraform-provider-tfe	[no test files]
=== RUN   TestAccTFEVariable_basic
--- PASS: TestAccTFEVariable_basic (37.70s)
=== RUN   TestAccTFEVariable_basic_variable_set
--- PASS: TestAccTFEVariable_basic_variable_set (32.52s)
=== RUN   TestAccTFEVariable_update
--- PASS: TestAccTFEVariable_update (55.84s)
=== RUN   TestAccTFEVariable_update_key_sensitive
--- PASS: TestAccTFEVariable_update_key_sensitive (55.95s)
=== RUN   TestAccTFEVariable_import
--- PASS: TestAccTFEVariable_import (37.09s)
=== RUN   TestAccTFEVariable_rewrite
--- PASS: TestAccTFEVariable_rewrite (56.65s)
PASS
ok  	github.com/hashicorp/terraform-provider-tfe/tfe	275.978s
?   	github.com/hashicorp/terraform-provider-tfe/version	[no test files]

nfagerlund added 10 commits May 3, 2023 18:04
This lets me develop the rewrite side-by-side with the existing implementation
for reference.
This commit adds a complete re-implementation of tfe_variable. It keeps most
existing behavior, but fixes a notable bug (#839) where we were unable to
respect `ignore_changes` for sensitive variable values. This was impossible to
address in the v2 SDK, but is incredibly straightforward with the new framework.

Although the effort required for this rewrite wouldn't necessarily be justified
for a single bug fix, it:

- Helps validate our hypothesis that the framework can indeed address the
recurring scourge of "bugs not fixable due to inability to distinguish zero from
absent".

- Gets the framework muxed in and ready to rock and roll, so we can immediately
start implementing new resource types in it.

- Establishes some initial patterns, practices, and examples for using the new
framework within this particular provider.

Anyway, here's some interesting sidenotes about this rewrite:

- The migration docs for the new framework were invaluable:
https://developer.hashicorp.com/terraform/plugin/framework/migrating

- Likewise the working code for the "HashiCups" teaching provider:
https://github.com/hashicorp/terraform-provider-hashicups-pf

- Note that I omitted some logic in most of the existing CRUD methods that
fetched the workspace prior to trying to manage the variable. This logic dated
from the time when the workspace_id argument used the `org/ws_name` format
instead of an external ID; it's no longer necessary now that we have a usable ID
right off the bat.

- Attributes that have a `Default` must also be `Computed`, apparently. I guess
I can see it.

- Most of the go-tfe struct types (update options, etc.) are only constructed by
one CRUD method, but most the methods need to convert a go-tfe value to a model
that we can use to update the state; thus, that's the logic I pulled out into
helper funcs. Conveniently, that's also the point where we need to take some
extra care to make sure we're preserving our last-known information about the
value of a sensitive variable, so we're able to keep that logic centralized too.
Note: This also blows away the old variableV0 schema migration tests, and I'm
not actually sure yet how to test the new style of state migrator. The only idea
I've had so far is to do something similar to the regression acceptance test
included in this PR, and use ExternalProviders to pull in the last version of
the provider before the `workspace_id` changed in the `tfe_variable` schema.
HOWEVER, said schema change happened in late 2019, and I'm not even sure a
provider that old would work properly with a modern TFC?

Perhaps we keep this around as a best-effort, but don't bother testing it
because of ever-diminishing likelihood anyone will invoke it. At any rate, at
least it provides an example of how to implement a state upgrader in the new
setup.
With this, the existing tests pass with no further changes. 🙌🏼
This should verify that the new resource implementation is a safe drop-in
replacement for the old one, causing no state changes when run against existing
resource state.

The utility of this test is expected to diminish over time, and we'll probably
need to delete it the next time we make a notable change to the workspace or
variable set resources. All things pass!
@nfagerlund nfagerlund requested a review from a team as a code owner May 4, 2023 02:04
nfagerlund added 4 commits May 3, 2023 19:08
This is an expected part of Terraform's behavior around write-only values, and
you're expected to prevent out-of-band modifications to such values if you care
about their consistency after their initial creation.
The old SDK v2 provider no longer implements tfe_variable, so any test that
needs to create variable resources now needs to use the muxed providers server
instead of a standalone instance of the SDK v2 provider.
I did in fact end up verifying the ExactlyOneOf validator behavior -- as the
docs say, it includes the surrounding attribute in the list of mutually
exclusive attrs, so if you have a pair, each one only needs to identify the
other one.
@nfagerlund
Copy link
Member Author

Unless someone finds a notable defect in resource behavior, I think the main open questions for review are:

  • How hard do we want to work for testing the state migrator? I actually DID get some good examples and suggestions for how to test those (mostly this stuff in the random provider), so I'm willing to take a swing at it. On the one hand, probably this code path will never be executed in anger ever again, but on the other hand, half the point of this PR is to lay down a track in the dirt for the next person to follow, so maybe we want a fully tested migrator around just for that reason. 🤷
  • Can you spot anything in the resource implementation that needs better explanation or should be organized in a more elegant way?
  • Do we want to change the physical layout of the files at this time, to prepare for growth in the number of framework resources and set them apart from the sdkv2 resources?

@uturunku1
Copy link
Collaborator

uturunku1 commented May 5, 2023

I am excited to test for these changes today!
I am looking at the instructions for Testing Plan, and I wanted to ask for some clarifications. As far as I understand the testing plan is suggesting to work with two separate terraform configs: config from step 1 and config from step 2.
Questions:

  1. Step 3 is "Apply the config from step 2", so at what time do we "Apply the config from step 1"?
  2. Step 2 says "create a TFC workspace which will use the config from step 1". Could you elaborate how is resource "tfe_workspace" "var-sensitive-nuke-repro" using the first config?

Copy link
Collaborator

@brandonc brandonc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't break it.

"token": schema.StringAttribute{
Optional: true,
Description: descriptions["token"],
// TODO: should be sensitive, but that's a breaking change.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hold up, this is interesting. I don't think it's possible to reference the value of a provider config (value = provider.tfe.token) so it should never show up in a diff. What is the practical effect?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is just an awkward side effect of using the same schema types as resource schema 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so! I just noticed that the muxed provider explodes if every internal provider doesn't have the exact same schema, so we'd have to change them all. And doing so might not actually be a "breaking" change per se, I haven't really tried yet. 🤔

tfe/provider_next.go Show resolved Hide resolved
@nfagerlund
Copy link
Member Author

nfagerlund commented May 5, 2023

@uturunku1

As far as I understand the testing plan is suggesting to work with two separate terraform configs: config from step 1 and config from step 2.

Yeah, that's right!

  1. Step 3 is "Apply the config from step 2", so at what time do we "Apply the config from step 1"?

In step 5, and in step 7.

  1. Step 2 says "create a TFC workspace which will use the config from step 1". Could you elaborate how is resource "tfe_workspace" "var-sensitive-nuke-repro" using the first config?

You'd have a vcs_repo block that grabs your config 1 from Github or whatever. Alternately, you could use CLI mode and just upload the config from a local workdir.

Sorry, I know it's confusing! Ultimately all we actually want to test is this: If Terraform didn't say it was gonna change the value of the sensitive variable, the value had better not be changed after apply. But because the API never allows you to inspect the value again, you have to do SOMETHING tricky and annoying to work around that! You could skip the whole "config 1" roundabout entirely by spying on TFC internals in your local dev — either dumping the entire request to logs whenever a variable gets mutated, or cracking into the rails console and grabbing and decrypting the variable value out of the DB. But I wanted to make sure I was also writing down a validation that external testers could manage. 😓

In other words, Config 1 only exists to indirectly inspect the sensitive value before and after apply. You can use whatever you want to do that inspection!

@uturunku1
Copy link
Collaborator

@uturunku1

As far as I understand the testing plan is suggesting to work with two separate terraform configs: config from step 1 and config from step 2.

Yeah, that's right!

  1. Step 3 is "Apply the config from step 2", so at what time do we "Apply the config from step 1"?

In step 5, and in step 7.

  1. Step 2 says "create a TFC workspace which will use the config from step 1". Could you elaborate how is resource "tfe_workspace" "var-sensitive-nuke-repro" using the first config?

You'd have a vcs_repo block that grabs your config 1 from Github or whatever. Alternately, you could use CLI mode and just upload the config from a local workdir.

Sorry, I know it's confusing! Ultimately all we actually want to test is this: If Terraform didn't say it was gonna change the value of the sensitive variable, the value had better not be changed after apply. But because the API never allows you to inspect the value again, you have to do SOMETHING tricky and annoying to work around that! You could skip the whole "config 1" roundabout entirely by spying on TFC internals in your local dev — either dumping the entire request to logs whenever a variable gets mutated, or cracking into the rails console and grabbing and decrypting the variable value out of the DB. But I wanted to make sure I was also writing down a validation that external testers could manage. 😓

In other words, Config 1 only exists to indirectly inspect the sensitive value before and after apply. You can use whatever you want to do that inspection!

Thanks for clarifying, that makes sense! I'll just my local dev to inspect the sensitive values. I'll try this on Monday morning and report back.

Copy link
Collaborator

@uturunku1 uturunku1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. I appreciate the individual commits, it helped me digesting all the changes included in this PR. To be honest, I didn't read the Migration docs because I felt that between the Confluence page you wrote and your commits, I had plenty information.

As far as your questions posted on this comment #873 (comment) ...

  1. I don't think that we need to test state migrator that hard because we'll continue adding changes that will put it to test
  2. No, I feel like there is a good amount of comments.
  3. Hmm, I am not sure about this one. It's a meaningful change. Part of me wants to say yes. Maybe this is something we can discuss in the team meeting and come out with a decision together?

@@ -211,6 +211,32 @@ func TestAccTFEVariable_import(t *testing.T) {
})
}

// Verify that the rewritten framework version of the resource results in no
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

resp.RequiresReplace = true
}
},
"Force replacement if sensitive argument changed from true to false.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The forced replacement is only when sensitive changes from true to false, what about when there is a reverse change, false -> true?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is: that's fine, and the API will let you do it without a blip. It's just that the API won't let you turn sensitive off, because it's not willing to reveal that prior value to you again once it's been contaminated with the sensitive flag.

tfe/resource_tfe_variable.go Outdated Show resolved Hide resolved
tfe/resource_tfe_variable.go Outdated Show resolved Hide resolved
tfe/resource_tfe_variable.go Outdated Show resolved Hide resolved
Co-authored-by: UKEME BASSEY <[email protected]>
nfagerlund added 2 commits May 9, 2023 09:31
…rset

These are logically identical, but aligning the attribute with the function
name is less indirect. 😅
Copy link
Contributor

@Uk1288 Uk1288 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nfagerlund
Copy link
Member Author

Recording for posterity: we discussed the prospect of reorganizing the provider at this time, and decided not to. The principle we're probably going to follow as we migrate resources is: if we nest it as a separate domain in our API docs, we should nest it in the provider's internal organization as well. That means basically

  • admin API
  • registry API

and nothing really else.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sensitive tfe_variable values are wiped upon description change
4 participants