Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor providers.Schemas and add a global schema cache #33482

Merged
merged 12 commits into from
Jul 6, 2023

Conversation

jbardin
Copy link
Member

@jbardin jbardin commented Jul 6, 2023

The primary goal of this PR is a structural refactoring of the types used to store provider schemas. Previously there was the deprecated *terraform.ProviderSchema, which was an alias for providers.Schemas, and providers.GetProviderSchemaResponse which contained the same data but was structurally distinct from providers.Schemas. All of these were equally used throughout the codebase, including tests, so unfortunately there was no way to get to a unified the type without a large diff.

The result here is that we now have a single type for all provider schema representations: providers.ProviderSchema. This happens to be an alias for the internal rpc response type providers.GetProviderSchemaResponse, which includes all schema data including request diagnostics. While the diagnostics are not applicable in most situations, multiple concurrent calls to the provider each need to access the schema and need to handle the error path, so the cache must include the diagnostics for use within the provider client. The ProviderSchema type does also use the Schema struct for all schemas which happens to have a Version field not applicable in all cases, however keeping the type consistent and reusable without messy conversions outweighs the extra cruft in the type signature while leaving room open for extension.

We also lift the caching layer for schemas from core to providers.SchemaCache as a global store of provider's schemas. The second copy of the schemas in core was redundant, and the goal of the global cache is to reduce memory footprint and allow multiple instances of providers to share the previously loaded schema. While the cache is transparently used through the provider, we still have core check the cache when looking up a schema, because we may not even need to load a provider instance at all in some cases.

Writing to the cache is temporarily disabled, because some releases of providers require that GetProviderSchema is called first to initialize the provider, and a forthcoming ProviderCapability flag will be set by those that can be effectively cached. Even though there will be providers that cannot be globally cached, there is still no need for the old core caching layer, as it didn't prevent the GetProviderSchema call on every provider instance, and if it did work it would have broken those providers which require that as the first call (these providers' clients will still cache the schema themselves per instance).

jbardin added 10 commits July 6, 2023 10:37
Add a single global schema cache for providers. This allows multiple
provider instances to share a single copy of the schema, and prevents
loading the schema multiple times for a given provider type during a
single command.

This does not currently work with some provider releases, which are
using GetProviderSchema to trigger certain initializations. A new server
capability will be introduced to trigger reloading their schemas, but
not store duplicate results.
Unify the struct used for Schemes and GetProviderSchemaResponse so that
we can have a single cache which handles all schema access.
Use the global providers.SchemaCache and update all schema access to the
providers.Schemas, except where the provider.GetProviderSchemaResponse
type name would be expected.

Some tests that reuse provider factories needed a little more careful
handling. Change the fixed func to only reset the provider on the first
call.
p.ReadDataSourceCalled = false
p.CloseCalled = false
}
if p, ok := rp.(*MockProvider); ok {
Copy link
Member Author

@jbardin jbardin Jul 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has always been a very delicate and unreliable aspect of the mock provider. Resetting these on each factory call assumed that the same instance was used for an entire operation which isn't always true, and the new external caching makes that all the more apparent. We settle on making the provider instance reusable, but requiring a new closure for each context if you want to use these fields.

@jbardin jbardin marked this pull request as draft July 6, 2023 15:33
@jbardin jbardin marked this pull request as ready for review July 6, 2023 15:53
@jbardin jbardin requested a review from a team July 6, 2023 15:54
Copy link
Contributor

@apparentlymart apparentlymart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is of course too large for a thorough review, but I tried to pay closest attention to the specifics you mentioned in the PR description and those parts all seemed reasonable to me. I think the best way to identify any unexpected implications of this is to merge it into main and let it soak in our alpha releases during the v1.6 development period.

I was initially a little nervous about using GetProviderSchemaResponse as the main representation given that it's quite tightly coupled to the current wire protocol -- e.g. if we decided to switch to a different approach of requesting individual resource types only when needed rather than fetching everything all at once then GetProviderSchemaResponse might cease to exist altogether. But on reflection it seems like we'd still want to have somewhere to flatten all of that down into a single type, and the fact that you used a type alias elsewhere rather than mentioning GetProviderSchemaResponse directly means that we could just rename type GetProviderSchemaResponse to type ProviderSchemas if later we have more separation between the protocol representation and the internal representation.

My only remaining feedback about that, then, is that I'd like to see a common on type GetProviderSchemaResponse to make it explicit that this name should be used only for the type in the signature of the providers.Interface method, and that type ProviderSchemas should be used everywhere else. (I might even consider swapping it so that GetProviderSchemaResponse is the alias, to make it more likely that a new maintainer will just "accidentally" find the right name to use, but I'll leave that up to you!)

@jbardin
Copy link
Member Author

jbardin commented Jul 6, 2023

Yes, I used the GetProviderSchemaResponse alias kind of as a compromise to reduce even more churn (because a huge amount of test code creates that structure directly), but once I started implementing it, it didn't feel like there was much of a reason to try and break that apart at all now. There is a still a bunch of test code which does leverage the fact that these are type aliases, but that would need to be refactored regardless of what approach we use, so there's no benefit to attacking it all at once, and the type system will catch all the cases for us if we ever separate the types. I'll add another comment better specifying the intentions of each type name though.

@jbardin jbardin merged commit 6be6f69 into main Jul 6, 2023
@jbardin jbardin deleted the jbardin/schema-cache branch July 6, 2023 20:06
@github-actions
Copy link
Contributor

github-actions bot commented Jul 6, 2023

Reminder for the merging maintainer: if this is a user-visible change, please update the changelog on the appropriate release branch.

bflad added a commit that referenced this pull request Aug 30, 2023
Reference: #33482

The intention of this logic is to use the global provider schema cache if it is available for a given provider address based on the `GetProviderSchemaOptional` server capability. Previously the logic was errantly checking the named return early, which was not populated yet with the server response. This flips the logic to properly use the server response to determine whether the cache should be set, thereby still gating the caching based on the server capability.

In a configuration with 19 `hashicorp/aws` provider instances and an associated data source, this drops the received `GetProviderSchema` calls from 39(?) to 1. Maximum RSS dropped from 1.22 GB to 189 MB.
Copy link
Contributor

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 13, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants