V3: shrink package size #3422

Open
thomas11 opened this issue Jul 8, 2024 · 8 comments
Labels
3.0 area/providers impact/usability Something that impacts users' ability to use the product easily and intuitively kind/enhancement Improvements or new features

Comments

@thomas11
Contributor

thomas11 commented Jul 8, 2024

This is an open-ended issue to capture ideas and investigations on how we could shrink the size of the provider and its SDK. The v3 major version bump might provide opportunities for breaking changes.

@thomas11 thomas11 added impact/usability Something that impacts users' ability to use the product easily and intuitively area/providers 3.0 labels Jul 8, 2024
@mikhailshilkov mikhailshilkov added the kind/enhancement Improvements or new features label Jul 9, 2024
@olafurnielsen

olafurnielsen commented Jul 9, 2024

Size in megabytes isn't the most concerning problem for this provider IMO – rather, it is the number of symbols/objects the IDE's language server has to analyze and index to provide a properly typed experience.

As mentioned in #3124, the sheer number of objects in this provider, due to the breadth of the Azure API and the autogenerated nature of this provider, causes the Python Language Server to run out of memory and blow up.

Now, with the introduction of #3400, the number of symbols in the provider seems to have increased by roughly 30k:

$ grep -Ri 'ArgsDict(TypedDict)' . | wc -l
29958
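To see which sub-packages contribute most of those classes, the same search can be broken down per sub-package. Below is a minimal Python sketch that roughly matches the grep above. It assumes the SDK is installed as plain .py files. Unlike grep -R, it reads only .py files and does not follow symlinks. The helper names iter_py_files and count_by_subpackage are hypothetical.

```python
import os
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Literal, case-insensitive match, like grep -i 'ArgsDict(TypedDict)'.
PATTERN = re.compile(re.escape("ArgsDict(TypedDict)"), re.IGNORECASE)


def iter_py_files(root: str) -> Iterator[Tuple[str, str]]:
    """Yield (path relative to root, file text) for every .py file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py"):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as f:
                    yield os.path.relpath(path, root), f.read()


def count_by_subpackage(files: Iterable[Tuple[str, str]]) -> Counter:
    """Count matching lines, grouped by the first path component (the sub-package)."""
    counts: Counter = Counter()
    for rel_path, text in files:
        parts = rel_path.split(os.sep)
        # Files directly in the package root are grouped under ".".
        top = parts[0] if len(parts) > 1 else "."
        counts[top] += sum(1 for line in text.splitlines() if PATTERN.search(line))
    return counts
```

For example, count_by_subpackage(iter_py_files("path/to/site-packages/pulumi_azure_native")).most_common(10) would list the ten heaviest sub-packages. The path is a placeholder.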

I really like the direction of TypedDicts and assume they will supersede the Args classes in the next major version, but this seems to have made matters even worse than before, causing more frequent crashes of the Python Language Server in VS Code.
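The sketch below illustrates why the symbol count grows. It shows the two parallel input shapes for one hypothetical input type: a class-style BlobArgs and a dict-style BlobArgsDict. The names and the single field are made up for illustration and are not copied from the generated SDK. The only point is that each input type now contributes two exported symbols that a language server has to index.

```python
from typing import TypedDict, Union


class BlobArgs:
    """Class-style input (hypothetical): one exported class per input type."""

    def __init__(self, container_name: str) -> None:
        self.container_name = container_name


class BlobArgsDict(TypedDict):
    """Dict-style input (hypothetical): a second exported symbol for the same shape."""

    container_name: str


def container_of(args: Union[BlobArgs, BlobArgsDict]) -> str:
    """Accept either shape, which is why both symbols must be exported and indexed."""
    if isinstance(args, BlobArgs):
        return args.container_name
    return args["container_name"]
```

Both container_of(BlobArgs("logs")) and container_of({"container_name": "logs"}) return "logs".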

@AaronFriel
Contributor

Is LSP performance improved or affected by how pulumi_azure_native is imported?

# import named types:
from pulumi_azure_native.storage import ...
# import submodule
from pulumi_azure_native import storage
# import package
import pulumi_azure_native

@olafurnielsen

Don't know if this is getting out of scope, but I guess these are, in the end, symptoms caused by the size/breadth of the SDK – @thomas11 feel free to comment if you want Python/IDE/typing-specific issues/symptoms to be tracked in a sub/separate issue.

Is LSP performance improved or affected by how pulumi_azure_native is imported?

For sure, import pulumi_azure_native will trigger the LSP to scan the whole library and should be avoided. I've made several attempts at optimizing the settings and import strategies, but importing a sub-package (... import storage) vs. specific objects (from [...].storage import X, Y, Z) doesn't seem to make much of a difference – I suspect the LSP will analyze __init__.py anyway and follow the trail to other modules in the same sub-package.
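The suspected "follow the trail" behaviour can be modelled with a small sketch. Starting from a sub-package's __init__.py, it collects every sibling module reached through relative imports such as from .blob import *. This is a simplified model written for illustration, not how Pyright or Pylance actually resolve imports. It only shows that when __init__.py uses star imports, resolving even a single name means opening every module the __init__ points to. The function name and the in-memory sources dict are assumptions.

```python
import ast
from typing import Dict, Set


def reachable_modules(sources: Dict[str, str], start: str = "__init__") -> Set[str]:
    """Follow single-dot relative imports transitively, starting at `start`.

    `sources` maps a module name inside one sub-package (e.g. "blob") to its source text.
    """
    seen: Set[str] = set()
    stack = [start]
    while stack:
        name = stack.pop()
        if name in seen or name not in sources:
            continue
        seen.add(name)
        for node in ast.walk(ast.parse(sources[name])):
            # level == 1 means one leading dot: a sibling in the same sub-package.
            if isinstance(node, ast.ImportFrom) and node.level == 1:
                if node.module:  # e.g. "from .blob import *"
                    stack.append(node.module.split(".")[0])
                else:  # e.g. "from . import blob, container"
                    stack.extend(alias.name for alias in node.names)
    return seen
```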

I took another shot at investigating the behaviour in a freshly created azure-native Python project. Even when being extra careful with imports, something like hovering over a module and navigating to its definition will kick off a library-wide scan.

After some unstructured browsing through issues of a similar nature tracked in Pyright and MyPy affecting other large Python libraries, I wanted to throw some ideas into the mix that might (?) improve the IDE experience in generated Python SDKs:

  • Could you start generating and including type stubs (*.pyi) in the package? Those contain only extracted type information and exclude the implementation, allowing more performant type analysis. Some Python libraries choose to distribute these in a separate -stubs package or via package extras.
  • Would generating a single __init__.pyi for a sub-package, with extracted type information from all of its members, improve performance compared to generating type stubs for each of the sub-package's modules? Take pulumi_azure_native.storage as an example: it has 44 Python files, most of which export only 2-4 symbols each (resource and args classes, or get/list/result functions/types).
  • The SDK lazy-loads sub-packages at runtime but imports them during type checking, which is clever. Scientific Python's lazy_loader implements this in a way that also supports attaching *.pyi stubs during type checking. Might this blog post and implementation provide some insights on possible improvements?
  • Sub-package __init__.py files in Pulumi Python SDKs seem to use * imports (e.g. from .blob import *) instead of explicit imports (from .blob import Blob, BlobArgs) with the exported interface defined via __all__. I don't know if this would improve performance, but most libraries I've investigated follow the latter pattern.
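For the last two suggestions, a generator could emit an explicit __init__.py (or an aggregated __init__.pyi) instead of star imports. Below is a minimal sketch. It assumes the generator already knows which public names each module defines. render_init and the example module names are hypothetical. Whether this actually speeds up a language server is exactly the open question above.

```python
from typing import Dict, List


def render_init(exports: Dict[str, List[str]]) -> str:
    """Render __init__ source with explicit imports and an __all__ list.

    `exports` maps a sibling module name (e.g. "blob") to the public names it defines.
    """
    lines: List[str] = []
    all_names: List[str] = []
    for module in sorted(exports):
        names = sorted(exports[module])
        # One explicit import per module instead of "from .module import *".
        lines.append(f"from .{module} import {', '.join(names)}")
        all_names.extend(names)
    # __all__ spells out the exported interface for readers and type checkers.
    lines.append("")
    lines.append("__all__ = [")
    lines.extend(f'    "{name}",' for name in sorted(all_names))
    lines.append("]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_init({"blob": ["Blob", "BlobArgs"], "get_blob": ["get_blob", "GetBlobResult"]}))
```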

@AaronFriel
Contributor

@olafurnielsen appreciate the insight and citations here. I think I can speak for Pulumi and say great feedback like this really helps us hone our product.

@cleverguy25

Added to epic https://github.com/pulumi/home/issues/3552

@1oglop1

1oglop1 commented Sep 24, 2024

I would like to add that this problem is not unique to azure-native; it is a problem for all packages. For example, in @pulumi/aws, node_modules/@pulumi/aws/types/input.d.ts is ~30 MB, which causes performance issues in IntelliJ IDEs.
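A quick way to check whether a handful of generated files dominate, as input.d.ts does here, is to list the largest files in an installed package. Below is a minimal Python sketch. The function name and the example path are assumptions, and file size is only a proxy for how much work an IDE has to do.

```python
import heapq
import os
from typing import Iterator, List, Tuple


def largest_files(root: str, n: int = 10) -> List[Tuple[int, str]]:
    """Return the n largest files under root as (size_in_bytes, path), largest first."""

    def sizes() -> Iterator[Tuple[int, str]]:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    yield os.path.getsize(path), path
                except OSError:
                    continue  # broken symlink or unreadable entry

    return heapq.nlargest(n, sizes())


if __name__ == "__main__":
    # Hypothetical location; point this at your own node_modules or site-packages.
    for size, path in largest_files("node_modules/@pulumi/aws", 5):
        print(f"{size / 1_000_000:8.1f} MB  {path}")
```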

@EronWright
Contributor

A perhaps obvious solution would be to produce a separate library for each Azure API, all tied to the "azure-native" plugin. This would force the user to have more fine-grained dependencies, with benefits felt across all languages.

A wilder idea is to make it easy to generate a project-specific SDK that is tailored to have specific Azure APIs. Imagine some sort of parameterized provider with a configuration block in Pulumi.yaml to select your preferred api versions on a per-API basis.
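A parameterized provider like this would need a selection step. The function below sketches it: it takes the API versions each module offers and a per-module choice, as might be read from a hypothetical configuration block in Pulumi.yaml. It returns only the (module, version) pairs to generate. The config shape, the "default" keyword and all names are assumptions for illustration; nothing here is an existing Pulumi feature.

```python
from typing import Dict, List, Tuple


def select_sdk_subset(
    available: Dict[str, List[str]],
    selection: Dict[str, str],
    defaults: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Resolve a per-module version selection into (module, version) pairs to generate.

    available: module -> API versions the provider supports
    selection: module -> requested version, or "default" (hypothetical keyword)
    defaults:  module -> the provider's default version for that module
    Modules absent from `selection` are left out of the generated SDK entirely.
    """
    result: List[Tuple[str, str]] = []
    for module, requested in sorted(selection.items()):
        if module not in available:
            raise ValueError(f"unknown module: {module}")
        version = defaults[module] if requested == "default" else requested
        if version not in available[module]:
            raise ValueError(f"{module} has no API version {version}")
        result.append((module, version))
    return result
```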

@joeduffy
Member

joeduffy commented Jan 9, 2025

This is an interesting approach: https://github.com/Ankvi/pulumi-azure-native. It splits out every sub-namespace into its own NPM package. Apparently, this is working well for them in practice ("night and day difference, perf-wise"). They said the usability hit of needing to add additional package references isn't bad, and there is precedent in other projects, like Angular.
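The per-namespace split essentially groups the provider's resource tokens by module. Below is a minimal sketch. It assumes Pulumi's package:module:type token format, with explicitly versioned modules written as module/vYYYYMMDD (e.g. storage/v20230101). The package naming scheme and the function name are hypothetical and not taken from the linked repository.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_namespace(tokens: Iterable[str], prefix: str = "azure-native-") -> Dict[str, List[str]]:
    """Map a per-namespace package name to the module:type entries it would contain."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for token in tokens:
        _package, module, type_name = token.split(":")
        # "storage/v20230101" and "storage" both belong to the storage namespace.
        namespace = module.split("/")[0]
        groups[prefix + namespace].append(f"{module}:{type_name}")
    return {name: sorted(members) for name, members in sorted(groups.items())}
```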

thomas11 added a commit that referenced this issue Jan 29, 2025
…3899)

Part of #3422. Determine API versions that are older than any _previous_
default version in a module. These are good candidates for removal.

---------

Co-authored-by: Daniel Bradley <[email protected]>
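The commit's selection rule can be sketched in a few lines. This is a minimal Python sketch, not the code from the commit. It assumes Azure API versions are dated strings like 2021-06-01 or 2021-06-01-preview. It reads "older than any previous default version" as strictly older than the oldest version that has ever been the module's default. The function names are hypothetical.

```python
from datetime import date
from typing import List


def parse_api_version(version: str) -> date:
    """Parse the date part of an API version such as '2021-06-01-preview'."""
    year, month, day = version.split("-")[:3]
    return date(int(year), int(month), int(day))


def removal_candidates(versions: List[str], previous_defaults: List[str]) -> List[str]:
    """Return versions strictly older than every version that was previously a default."""
    if not previous_defaults:
        return []  # no default history, so nothing can be ruled out
    cutoff = min(parse_api_version(v) for v in previous_defaults)
    return sorted(v for v in versions if parse_api_version(v) < cutoff)
```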
Projects
Status: No status
Development

No branches or pull requests

8 participants