Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

helper/schema: Introduce context-aware resource CRUD functions without default timeout #723

Merged

Conversation

bflad
Copy link
Contributor

@bflad bflad commented Mar 9, 2021

Closes #675

(Test changes may be easiest to view with the "hide whitespace changes" option in GitHub as they are just introducing looping to cover relevant Create/CreateContext/CreateWithoutTimeout method implementations.)

The Terraform Plugin SDK has always supported customizable timeouts, declared at the helper/schema.Resource.Timeouts level with the operation aware helper/schema.ResourceTimeout type. These were designed to allow practitioners to implement a timeout per resource, e.g.

resource "example_thing" "example" {
  timeouts {
    create = "60m"
  }
}

Where the resource logic could fetch the timeout value via (helper/schema.ResourceData).Timeout() function.

Recently, context-aware resource operation functions were introduced and the existing non-context-aware functions were deprecated. These new context-aware resource operations forcibly introduce an operation timeout (defaults to 20 minutes) for the called logic, e.g.

ctx, cancel := context.WithTimeout(ctx, d.Timeout(TimeoutCreate))
defer cancel()
return r.CreateContext(ctx, d, meta)

Previously, the non-context-aware resources could be implemented with helper/schema.Resource.Timeouts.Create defined and operation logic such as:

func resourceExampleThingCreate(d *schema.ResourceData, meta interface{}) error {
  mutexkv.Lock("somekey")
  defer mutexkv.Unlock("somekey")
  // ... reference to d.Timeout(schema.TimeoutCreate) ...
}

Where in this case, the create timeout could be referenced in the code via d.Timeout(schema.TimeoutCreate) after the synchronization coordination. The creation timeout only affected this single operation, even if multiple operations were going to occur.

In the case of the context-aware functionality however, the timeout is configured before the mutex. So when multiple operations are waiting on the mutex, they are now sharing the same resource operation timeout. Since the timeout cannot be extended within the resource logic, developers are stuck either setting an arbitrarily high timeout (losing its usefulness) or practitioners are arbitrarily hitting the timeout as they scale out their configuration.

To characterize how some remote systems can require that operations must be serialized or otherwise synchronized:

  • Only 1 resource type can be created at a time
  • Only 1 child resource type can be created at a time per parent resource
  • Only 10 concurrent updates of a resource type can be performed at a time
  • Remote system may support configurable synchronization limits (either by the customer or remote system operators)

This essentially means the semantics of the synchronization can be based off resource configuration, operation type, and potentially accordant to the limits set by a remote system.

Here we introduce separate CRUD methods that are invoked before the context is set with the timeout, so as to not "force" the resource timeout. If timeout behavior is still desired, developers must manually implement timeout handling in the function logic when using these.

Most developers should prefer the existing context-aware functions and this is documented as such. However, another minor side benefit of these functions is that it also allows developers to upgrade CRUD function signatures for context awareness without logically changing resource behavior, then slowly switch to the timeout functions over time.

…t default timeout

Reference: hashicorp#675

The Terraform Plugin SDK has always supported customizable timeouts, declared at the `helper/schema.Resource.Timeouts` level with the operation aware [`helper/schema.ResourceTimeout` type](https://pkg.go.dev/github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema#ResourceTimeout). These were designed to allow practitioners to implement a timeout per resource, e.g.

```terraform
resource "example_thing" "example" {
  timeouts {
    create = "60m"
  }
}
```

Where the resource logic could fetch the timeout value via [`(helper/schema.ResourceData).Timeout()` function](https://pkg.go.dev/github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema#ResourceData.Timeout).

Recently, context-aware resource operation functions were introduced and the existing non-context-aware functions were deprecated. These new context-aware resource operations forcibly introduce an operation timeout (defaults to 20 minutes) for the called logic, e.g.

https://github.com/hashicorp/terraform-plugin-sdk/blob/e21a5bbb73209dcb31f4c9dff3a8c0cec1ca65ac/helper/schema/resource.go#L283-L285

Previously, the non-context-aware resources could be implemented with `helper/schema.Resource.Timeouts.Create` defined and operation logic such as:

```go
func resourceExampleThingCreate(d *schema.ResourceData, meta interface{}) error {
  mutexkv.Lock("somekey")
  defer mutexkv.Unlock("somekey")
  // ... reference to d.Timeout(schema.TimeoutCreate) ...
}
```

Where in this case, the create timeout could be referenced in the code via `d.Timeout(schema.TimeoutCreate)` after the synchronization coordination. The creation timeout only affected this single operation, even if multiple operations were going to occur.

In the case of the context-aware functionality however, the timeout is configured before the mutex. So when multiple operations are waiting on the mutex, they are now sharing the same resource operation timeout. Since the timeout cannot be extended within the resource logic, developers are stuck either setting an arbitrarily high timeout (losing its usefulness) or practitioners are arbitrarily hitting the timeout as they scale out their configuration.

To characterize how some remote systems can require that operations must be serialized or otherwise synchronized:

* Only 1 resource type can be created at a time
* Only 1 child resource type can be created at a time per parent resource
* Only 10 concurrent updates of a resource type can be performed at a time
* Remote system may support configurable synchronization limits (either by the customer or remote system operators)

This essentially means the semantics of the synchronization can be based off resource configuration, operation type, and potentially accordant to the limits set by a remote system.

### Attempted Solutions

```go
&schema.Resource{
	Timeouts: &schema.ResourceTimeout{
		Create: schema.DefaultTimeout(3 * time.Hour),
	},
}
```

This introduces separate CRUD methods that are invoked before the context is set with the timeout, so as to not "force" the resource timeout. If timeout behavior is still desired, developers must manually implement timeout handling in the function logic when using these.

Most developers should prefer the existing context-aware functions and this is documented as such. However, another minor side benefit of these functions is that it also allows developers to upgrade CRUD function signatures for context awareness without logically changing resource behavior, then slowly switch to the timeout functions over time.
@paddycarver
Copy link
Contributor

This looks reasonable to me. As we discussed, I find the proliferation of CRUD functions alarming--we're up to 12--but I think this is a pragmatic tradeoff given the compatibility constraints (the ability to restore prior behavior using sed) and the fact that we accidentally opted people into behavior without realizing it matters when discussing what solution is appropriate. I think in the future a boolean on the schema.Resource struct to toggle this behavior would be the right answer, but this is a bit of a special case.

@paddycarver paddycarver merged commit 0e34772 into hashicorp:master Mar 18, 2021
@ghost
Copy link

ghost commented Apr 18, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked as resolved and limited conversation to collaborators Apr 18, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Schema Resource Timeout Affects Synchronized Resources Globally With Context Aware Functions
2 participants