Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(resolve): increase resolve speed via resolve result cache #12395

Closed
wants to merge 2 commits into from

Conversation

sun0day
Copy link
Member

@sun0day sun0day commented Mar 13, 2023

Description

refs #12363

I found that there are tons of duplicate bare package import requests sent to resolve.ts. So I guess we can optimize this via a Map.

I tested the reproduction mentioned in #6030 in my local macos, after this PR, the speed of resolving each module will increase about 7-10 times and the realpath's self time will decrease about 100 times.

After PR:
image
image


Before PR:

image

image

Additional context


What is the purpose of this pull request?

  • Bug fix
  • New Feature
  • Documentation update
  • Other

Before submitting the PR, please make sure you do the following

  • Read the Contributing Guidelines.
  • Read the Pull Request Guidelines and follow the PR Title Convention.
  • Check that there isn't already a PR that solves the problem the same way to avoid creating a duplicate.
  • Provide a description in this PR that addresses what the PR is solving, or reference the issue that it solves (e.g. fixes #123).
  • Ideally, include relevant tests that fail without this PR but pass with it.

@stackblitz
Copy link

stackblitz bot commented Mar 13, 2023

Review PR in StackBlitz Codeflow Run & review this pull request in StackBlitz Codeflow.

@sun0day sun0day changed the title chore(resolve): increase resolve speed via resolve result cache perf(resolve): increase resolve speed via resolve result cache Mar 13, 2023
@patak-dev patak-dev added the performance Performance related enhancement label Mar 13, 2023
@patak-dev
Copy link
Member

This is really interesting @sun0day! And it speaks volumes of how much we can win by optimizing the resolution of bare imports.

My first reaction to the cache at this level is that it may introduce new failure points. Something we need to do, for example, is to clean the cache when a new deps optimization is committed, as this invalidates the resolved optimized URLs. I'm trying to think if something else could outdated the resolution, and it may actually end up being safe if we are careful.

Another idea to avoid this caching could be to do the caching at the realpath level. See:

Maybe before moving forward with this PR, we should check @bluwy's idea of replacing fs.realpath.native with fs.realpath that already has caching inplace.

@sun0day
Copy link
Member Author

sun0day commented Mar 14, 2023

Yeah, I've thought about cache fs.realpath too. And I will test its performance later.

But if we can cache resolve calls at a high level(e.g resolve.ts), we are caching not only fs.realpath calculate result but also some extra function calls. What's more, we can maintain only one cache layer in this way, maybe we don't need the packageCache cache anymore and don't have to care about nodejs API (e,g fs.realpath ) bottom details.

On the other hand, I have to admit that this PR has some flaws like you said, but I think these flaws won't be a blocker.

@patak-dev
Copy link
Member

Yes, I agree. It is a compelling place for a cache. We have other caches already in place for bare imports, for example there is a cache for shouldExternalizeForSsr for an id. And it currently only depends on the resolved config. So it is expected that the resolution is kept stable by user plugins.

We are also caching the resolution of optimized deps (through the optimizer), so we are already halfway there. I'm interested in seeing what others think here.

I think we can review this idea and if we find it is safe, try it out during the next minor beta (we should avoid merging bigger changes now so we can release 4.2).

@patak-dev patak-dev added the p4-important Violate documented behavior or significantly improves performance (priority) label Mar 14, 2023
Copy link
Member

@ArnaudBarre ArnaudBarre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for investigating the resolve performances, much needed.

The issue is that with all the extension logic going on, any new file could invalidate the resolution. Maybe we could add some time to leave checks (like 5-10s) so we get the speed up on dev-server startup where probably new files aren't created but we don't get stale cache during HMR.

@@ -126,6 +126,7 @@ export function resolvePlugin(resolveOptions: InternalResolveOptions): Plugin {
} = resolveOptions

const { target: ssrTarget, noExternal: ssrNoExternal } = ssrConfig ?? {}
const resolvedCache = new Map()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to type the Map?

Copy link
Member

@patak-dev patak-dev Mar 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Map should be replaced by a WeakMap<ResolvedConfig,...> like we are doing with other caches

Copy link
Member Author

@sun0day sun0day Mar 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure whether ResolvedConfig's shallow value will change or not. My first reaction is using JSON.stringify(ResolvedConfig) as a cache key, something like Map<string, string> or WeakMap<string, string>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the current usage I think at least we should have something like Map<string, string>().
Maybe WeakMap is better for memory management? I don't know the current usages

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are already using the same pattern for all other caches (as the one linked in my last message). ResolvedConfig can be considered immutable once the dev server is running or the rollup build started (the only case where it could be modified is in the configResolved hooks in some plugins, but is more like a hack). If the ResolvedConfig changes, all values should be invalidated so having the Cache against it works well. It is also done like this because the plugins are cloned when the server restarts or there is a rebuild, so we need this general invalidation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I misunderstood you before. Now I get it. The resolvedCache here is always new every time resolvePlugin is called whenever the server restarts or there is a rebuild. You can think of it as WeakMap<ResolvedConfig, ResolvedCache>'s pair value. I am fine to use WeakMap to store it for better memory management.

@patak-dev
Copy link
Member

The issue is that with all the extension logic going on, any new file could invalidate the resolution. Maybe we could add some time to leave checks (like 5-10s) so we get the speed up on dev-server startup where probably new files aren't created but we don't get stale cache during HMR.

Even if a time-based cache sounds safer I think in this case we should only proceed if we can properly invalidate the cache when needed. If it is safe, then we don't need the time check. If it isn't safe, then we better avoid this optimization.

Do you have a good example of a new file invalidating the resolution of bare imports? I think it would be fine as far as we respect the current limitations. For example, the dev server watcher is ignoring '**/node_modules/**' so for any modifications we do inside node_modules, we need to restart the server manually. The users are already used to this because of deps optimization. If something changes in the config, the server is restarted and the cache will be clean.

@sun0day one thing I see is that the importer is missing in the cache key. If added, it may prevent this optimization from having such a big impact. In that case, maybe caches could be added deeper into the resolve algorithm.

@sun0day
Copy link
Member Author

sun0day commented Mar 15, 2023

@sun0day one thing I see is that the importer is missing in the cache key. If added, it may prevent this optimization from having such a big impact. In that case, maybe caches could be added deeper into the resolve algorithm.

Yeah. I also test the performance with importer too. The realpath's self-time will only decrease about 10 times. But I think there is still a lot of room for optimization even using importer as part of the cache key. The first thing we need to do is to determine the params on which the bare-import's resolution result depends.

@sun0day
Copy link
Member Author

sun0day commented Mar 15, 2023

Using importer as part of the cache key or not reminds me that with all the test cases passed in this PR, there are two possibilities:

  1. if the current PR without importer as part of the cache key is safe, then the resolve logic downstream may have duplicate logic
  2. if the current PR is unsafe, I think vite may miss some test cases

@sun0day
Copy link
Member Author

sun0day commented Mar 22, 2023

I tested both the performances in themain branch and mine after merging your great work @patak-dev @bluwy and I found the performance is good enough in main and mine is less help.

This pr is somewhat experimental and maybe unsafe, so feel free to close it.

@bluwy
Copy link
Member

bluwy commented Mar 22, 2023

Thanks for testing this again. A hunch I had is also that the JSON.stringify is slower compared to just running the resolve, so maybe that explains the difference. I'll let patak check this before deciding to close 🙂

@patak-dev
Copy link
Member

Doubling on the thanks to you @sun0day, this PR was important in pointing us in the right direction and was one of the ideas that ended up finally pushing the current optimization sprint. Let's close it as suggested. We can revisit later if caching at a different level could help in the future.

@patak-dev patak-dev closed this Mar 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
p4-important Violate documented behavior or significantly improves performance (priority) performance Performance related enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants