-
Notifications
You must be signed in to change notification settings - Fork 9.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor providers.Schemas and add a global schema cache #33482
Conversation
Add a single global schema cache for providers. This allows multiple provider instances to share a single copy of the schema, and prevents loading the schema multiple times for a given provider type during a single command. This does not currently work with some provider releases, which are using GetProviderSchema to trigger certain initializations. A new server capability will be introduced to trigger reloading their schemas, but not store duplicate results.
Unify the struct used for Schemes and GetProviderSchemaResponse so that we can have a single cache which handles all schema access.
Use the global providers.SchemaCache and update all schema access to the providers.Schemas, except where the provider.GetProviderSchemaResponse type name would be expected. Some tests that reuse provider factories needed a little more careful handling. Change the fixed func to only reset the provider on the first call.
p.ReadDataSourceCalled = false | ||
p.CloseCalled = false | ||
} | ||
if p, ok := rp.(*MockProvider); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has always been a very delicate and unreliable aspect of the mock provider. Resetting these on each factory call assumed that the same instance was used for an entire operation which isn't always true, and the new external caching makes that all the more apparent. We settle on making the provider instance reusable, but requiring a new closure for each context if you want to use these fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is of course too large for a thorough review, but I tried to pay closest attention to the specifics you mentioned in the PR description and those parts all seemed reasonable to me. I think the best way to identify any unexpected implications of this is to merge it into main
and let it soak in our alpha releases during the v1.6 development period.
I was initially a little nervous about using GetProviderSchemaResponse
as the main representation given that it's quite tightly coupled to the current wire protocol -- e.g. if we decided to switch to a different approach of requesting individual resource types only when needed rather than fetching everything all at once then GetProviderSchemaResponse
might cease to exist altogether. But on reflection it seems like we'd still want to have somewhere to flatten all of that down into a single type, and the fact that you used a type alias elsewhere rather than mentioning GetProviderSchemaResponse
directly means that we could just rename type GetProviderSchemaResponse
to type ProviderSchemas
if later we have more separation between the protocol representation and the internal representation.
My only remaining feedback about that, then, is that I'd like to see a common on type GetProviderSchemaResponse
to make it explicit that this name should be used only for the type in the signature of the providers.Interface
method, and that type ProviderSchemas
should be used everywhere else. (I might even consider swapping it so that GetProviderSchemaResponse
is the alias, to make it more likely that a new maintainer will just "accidentally" find the right name to use, but I'll leave that up to you!)
Yes, I used the |
Reminder for the merging maintainer: if this is a user-visible change, please update the changelog on the appropriate release branch. |
Reference: #33482 The intention of this logic is to use the global provider schema cache if it is available for a given provider address based on the `GetProviderSchemaOptional` server capability. Previously the logic was errantly checking the named return early, which was not populated yet with the server response. This flips the logic to properly use the server response to determine whether the cache should be set, thereby still gating the caching based on the server capability. In a configuration with 19 `hashicorp/aws` provider instances and an associated data source, this drops the received `GetProviderSchema` calls from 39(?) to 1. Maximum RSS dropped from 1.22 GB to 189 MB.
I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions. |
The primary goal of this PR is a structural refactoring of the types used to store provider schemas. Previously there was the deprecated
*terraform.ProviderSchema
, which was an alias forproviders.Schemas
, andproviders.GetProviderSchemaResponse
which contained the same data but was structurally distinct fromproviders.Schemas
. All of these were equally used throughout the codebase, including tests, so unfortunately there was no way to get to a unified the type without a large diff.The result here is that we now have a single type for all provider schema representations:
providers.ProviderSchema
. This happens to be an alias for the internal rpc response typeproviders.GetProviderSchemaResponse
, which includes all schema data including request diagnostics. While the diagnostics are not applicable in most situations, multiple concurrent calls to the provider each need to access the schema and need to handle the error path, so the cache must include the diagnostics for use within the provider client. TheProviderSchema
type does also use theSchema
struct for all schemas which happens to have aVersion
field not applicable in all cases, however keeping the type consistent and reusable without messy conversions outweighs the extra cruft in the type signature while leaving room open for extension.We also lift the caching layer for schemas from core to
providers.SchemaCache
as a global store of provider's schemas. The second copy of the schemas in core was redundant, and the goal of the global cache is to reduce memory footprint and allow multiple instances of providers to share the previously loaded schema. While the cache is transparently used through the provider, we still have core check the cache when looking up a schema, because we may not even need to load a provider instance at all in some cases.Writing to the cache is temporarily disabled, because some releases of providers require that
GetProviderSchema
is called first to initialize the provider, and a forthcomingProviderCapability
flag will be set by those that can be effectively cached. Even though there will be providers that cannot be globally cached, there is still no need for the old core caching layer, as it didn't prevent theGetProviderSchema
call on every provider instance, and if it did work it would have broken those providers which require that as the first call (these providers' clients will still cache the schema themselves per instance).