Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Node.getChildren speed regression in ts 5.5 when used at very large scale #59101

Closed
walkerdb opened this issue Jul 2, 2024 · 37 comments · Fixed by #59154
Closed

perf: Node.getChildren speed regression in ts 5.5 when used at very large scale #59101

walkerdb opened this issue Jul 2, 2024 · 37 comments · Fixed by #59154
Assignees
Labels
Domain: Performance Reports of unusually slow behavior Fix Available A PR has been opened for this issue Needs Investigation This issue needs a team member to investigate its status. Recent Regression This is a new regression just found in the last major/minor version of TypeScript.

Comments

@walkerdb
Copy link

walkerdb commented Jul 2, 2024

🔎 Search Terms

typescript 5.5 performance, slowdown, typescript-eslint

🕗 Version & Regression Information

⏯ Playground Link

No response

💻 Code

I've put together a repro in a forked version of the sentry monorepo here: walkerdb/sentry#1. It's using the sentry repo only because it's a large public repo that uses typescript-eslint.

🙁 Actual behavior

When running typescript-eslint's type-aware lint rules in a large monorepo with typescript 5.5, we observe lint times between 1.3x to 3x worse than when running the exact same rules and the same config under typescript 5.4. The slowdown seems to be entirely coming from the monomorphized objects change in the ts 5.5 release.

🙂 Expected behavior

There should be no performance impact when moving from typescript 5.4 to 5.5.

Additional information about the issue

We've submitted a bug report to typescript-eslint but we're also reporting here since the slowdown can be pinpointed to a specific typescript change.

Also attaching some before/after update perf traces via 0x that show the extra time is coming largely from typescript internals. The highlighted boxes show the specific part of the process that has ballooned in runtime. I'm happy to share the actual traces with maintainers if they'd be useful in debugging.

ts 5.4 ts 5.5
Screenshot 2024-07-01 at 5 48 38 PM Screenshot 2024-07-01 at 6 04 48 PM
@DanielRosenwasser
Copy link
Member

I didn't get that far on this today, but I believe this is from more time spent in the GC. If you try upping your --max-old-space-size to something way higher, let me know if that alleviates things.

As an aside, SO much time in your build is being spent on shims running within eslint-plugin-react.

Time dominated by calls in findVariableByName

While findVariableByName really should be optimized to not create an intermediate array with variablesInScope, is it possible that you are running on an undesirable input file?

@jakebailey
Copy link
Member

This is the opposite of expected! Which walk function is that? I'm not familiar with that "walk" function. Traces would be helpful, but walkerdb/sentry#1 is probably enough to give us a lead.

You may also consider running things via https://www.npmjs.com/package/pprof-it (what Daniel just used above) as it in my experience does a better job attributing time and memory allocations to specific lines of code.

@jakebailey
Copy link
Member

Hm, my trace doesn't seem to show that particular code as to blame:

image

@walkerdb What tool did you use to find your result?

@jakebailey
Copy link
Member

I timed this, and in 5.5, it's 100s, but 5.4 is 91s; not 2-3x slower from what I can tell...

@jakebailey
Copy link
Member

A second run of 5.4 was 100s too. Not sure what's going on here. What version of Node are you using?

@jakebailey
Copy link
Member

jakebailey commented Jul 2, 2024

I was using Node 22; switching down to Node 18 shows a worse regression:

// 5.4.5
total time:  160.75s
user time:   212.16s
system time: 4.96s
CPU percent: 135%
max memory:  3946 MB

// 5.5.3
total time:  217.70s
user time:   292.33s
system time: 6.36s
CPU percent: 137%
max memory:  3995 MB

Not 2-3x but very much farther apart than Node 22.

Unfortunately I'm now going to be gone until next week, so this will just have to eat me up inside until then...

@walkerdb
Copy link
Author

walkerdb commented Jul 2, 2024

Hi @jakebailey! Sorry, I should have clarified a few things:

  1. the traces above were generated using 0x, but on our private monorepo, which has a 2.5x slowdown when running our typed lints under ts 5.5 instead of 5.4. I used sentry as a public repro but its slowdown was only ~1.3x on my machine.
  2. in our private monorepo we run lint with essentially unlimited max-old-space and also with higher max-semi-space since we run into gc issues with young gen / scavenge as well at our size.
  3. we are indeed running node 18! Likely will update to 22 by end of year but that's a major undertaking for us.

to @DanielRosenwasser I'm not a part of sentry, I was just using that repo as an example repro since ours is private.

@walkerdb
Copy link
Author

walkerdb commented Jul 2, 2024

the types in Notion's private monorepo can be quite complex, and I suspect are close to a pathological case for some typescript perf behavior, which may help explain the 2.5x slowdown we were observing. I'll see if I can hack running the same checks under node 22 to see if the timings change there much.

@jakebailey
Copy link
Member

Ran Node 18, and the results are very different from #59101 (comment):

image

That is an incredible amount of GC time, along with the badness Daniel noted above. Given the difference between Node 18 and Node 22 on this being in that shim code, I'm not 100% certain that this repro is representative of the repo you're testing internally, unfortunately...

Would you be able to run pprof-it on your own codebase? Screenshots of flame graphs and such should not expose any details, but you can also run it with PPROF_SANITIZE=on to sanitize the output for sharing (as that will remove any paths from the profile).

@walkerdb
Copy link
Author

walkerdb commented Jul 2, 2024

@jakebailey I'll give pprof-it a try, ty!

@walkerdb
Copy link
Author

walkerdb commented Jul 2, 2024

here's my pprof result for typescript 5.5.2 on node 18.18. I'll also upload one for typescript 5.4 in a sec
pprof-time-typescript-55.pb.gz
pprof-heap-typescript-55.pb.gz

@walkerdb
Copy link
Author

walkerdb commented Jul 2, 2024

pprof results for the exact same run but using typescript 5.4.5 on node 18.18 . This is significantly faster, and matches our expected run time from prior to the 5.5 update
pprof-time-typescript-54.pb.gz
pprof-heap-typescript-54.pb.gz

@walkerdb
Copy link
Author

walkerdb commented Jul 2, 2024

last run, with typescript 5.5.2 and node 22.3.0. Running under a newer node version doesn't seem to actually be much faster in this case?
pprof-time-typescript-55-node-22.pb.gz
pprof-heap-typescript-55-node-22.pb.gz

@DanielRosenwasser DanielRosenwasser added Domain: Performance Reports of unusually slow behavior Recent Regression This is a new regression just found in the last major/minor version of TypeScript. Needs Investigation This issue needs a team member to investigate its status. labels Jul 2, 2024
@jakebailey
Copy link
Member

In all of those profiles, nearly half the time is spent just getting/setting the node children via the WeakMap. Yikes. I'm not sure why this isn't observable outside of your internal code unless WeakMaps get noticably slower when they get massive.

May have to tack that prop back on again, but the prop not being on all nodes directly was one of the things we had to do to not blow up perf too much.

@jakebailey
Copy link
Member

At least I assume it's the weak map itself; haven't quite figured out how to remap the paths in the profile to a real version. (I'm also on a plane, that doesn't help)

@walkerdb
Copy link
Author

walkerdb commented Jul 2, 2024

thanks so much @jakebailey and @DanielRosenwasser for digging in! Happy to help however I can, feel free to shoot me a message at [email protected] if there's any other info you could use / commands to run / code to share / etc that I can help with.

In the meantime though we can wait through the holiday week, hope you have a great vacation!

@walkerdb
Copy link
Author

walkerdb commented Jul 2, 2024

can confirm that the getChildren call in the trace maps to this code in the ts source

public getChildren(sourceFile?: SourceFileLike): readonly Node[] {

here also are some pprof diff screenshots between the 5.4.5 run and the 5.5.2 run, generated with go tool pprof -http=:8080 --diff_base=pprof-time-typescript-54.pb.gz pprof-time-typescript-55.pb.gz

Screenshot 2024-07-02 at 10 47 25 AM Screenshot 2024-07-02 at 10 46 46 AM

@walkerdb
Copy link
Author

walkerdb commented Jul 2, 2024

Interesting also that the slower ts 5.5 run uses significantly less heap, 23gb for 5.4 vs just 14gb on 5.5. Most of the items in the screenshots are typescript-eslint functions that themselves call ts public methods. (screenshots from a similar diff view comparing 5.4 heap usage to 5.5 heap usage, with 5.4 as the baseline)

Screenshot 2024-07-02 at 11 04 29 AM Screenshot 2024-07-02 at 11 04 16 AM

@DanielRosenwasser
Copy link
Member

DanielRosenwasser commented Jul 3, 2024

@walkerdb any chance you can iterate and reduce this repro to see if there's a root cause beyond just scale?

For example, if you cut out half of the files in your program - even if there are build errors, do you see the same surprising amount of time in set/getNodeChilren? If so, could you keep trying that until it stops? I'd be curious to see if there's a specific set of files that causes it, or if it's just fundamentally something about the move to WeakMaps.

@dmichon-msft
Copy link
Contributor

This is probably just scale and repeated reallocation of the WeakMap at larger and larger sizes. It might be worth providing a facility for these APIs to be handed a scoped cache object (or using the SourceFileLike ancestor as the cache object, where applicable), rather than using a single global WeakMap.

@walkerdb walkerdb changed the title perf: monomorphized objects change in ts 5.5 significantly slows down third-party tooling perf: Node.getChildren speed regression in ts 5.5 when used at very large scale Jul 5, 2024
@walkerdb
Copy link
Author

walkerdb commented Jul 5, 2024

@DanielRosenwasser I just gave a code bisect a go to see if a specific module or type is causing problems. As far as I can tell it's unrelated to the specific code being parsed, but as @dmichon-msft just mentioned it seems related to scale -- past a certain line-of-code count getChildren gets progressively slower and slower.

Specifically, I'm noticing the getChildren function showing up in perf traces once our typed-lint runs start passing 6000 files processed (~1M LOC), and is definitely prominent once we hit ~1.5M LOC. In reality this is probably more accurately tied to AST nodes processed but I don't have quick access to that number.

Running the same checks on smaller subsets of the same set of files results in expected fast performance.

Also for additional context, normal typecheck time with tsc is unchanged for us in ts 5.5, it's just this use of the Node.getChildren function through typescript-eslint that has significantly slowed down.

@walkerdb
Copy link
Author

walkerdb commented Jul 5, 2024

Even this much information is great though. I think my team now has a clear path forward for us to make the 5.5 bump, just split our typed lint job into smaller subsets under the LOC perf limit and overall runtime shouldn't be a long pole in our CI.

@DanielRosenwasser
Copy link
Member

DanielRosenwasser commented Jul 5, 2024

So I created #59154 to experiment a bit with @dmichon-msft's suggestion. If you want to take a profile with it, you can install @typescript-deploys/[email protected] and let us know how that works:

npm install -D @typescript-deploys/[email protected]

Note that this is an experiment and it's pretty risky perf-wise for the language service. This change means that populating the list of child nodes now needs a walk up to the source file itself. So no guarantees that we land it, but it could inform other work.

@DanielRosenwasser
Copy link
Member

DanielRosenwasser commented Jul 5, 2024

It could also be slower for you too in a different way. It means that simply requesting the children of the tree grows to something like $n \log(n)$ without knowing the the sourceFile (hopefully TS ESLint is passing that along).

@walkerdb
Copy link
Author

walkerdb commented Jul 5, 2024

@DanielRosenwasser looks like that change does resolve nearly all the perf difference for us between ts 5.4 and 5.5! Running our typed-lint suite with @typescript-deploys/[email protected] seems to match total runtime for our ts 5.4 runs. Profiles attached:

pprof-time-typescript-childrenCachedBySourceFile.pb.gz
pprof-heap-typescript-childrenCachedBySourceFile.pb.gz

@DanielRosenwasser
Copy link
Member

DanielRosenwasser commented Jul 5, 2024

@walkerdb that's really great! I think we will discuss this internally and try to figure out if we ship this specific change, and if we can back-port it. One thing that would help us a lot is if you could try that build as your daily driver. In VS Code, you can configure a workspace version of TypeScript by following the docs here or just setting the typescript.tsdk option:

{
  "typescript.tsdk": "./node_modules/typescript/lib"
}

@DanielRosenwasser DanielRosenwasser self-assigned this Jul 5, 2024
@DanielRosenwasser DanielRosenwasser added this to the TypeScript 5.5.4 milestone Jul 5, 2024
@walkerdb
Copy link
Author

walkerdb commented Jul 6, 2024

Can do, thank you for all your help! Is there any specific feedback that would help on the new version other than whether tsserver requests like getProgram, getSemanticDiagnostics, getCompletionsAtPosition etc still seem to work and are just as responsive as before? (we track perf metrics for each of them via a custom plugin so it should be pretty easy for us to tell whether there's been any major regression there)

@jakebailey
Copy link
Member

we track perf metrics for each of them via a custom plugin so it should be pretty easy for us to tell whether there's been any major regression there

This sounds amazing, and I absolutely want to hear more about this. This is the first we've heard of someone tracking this to this extent so anything is awesome.

@maschwenk
Copy link

Had a huge typescript-eslint regression just like this in our very large monolith. Had a similar idea of bumping --max-old-space-size but that didn't work:

https://discord.com/channels/1026804805894672454/1254556633556713553/1255160613441769683

Setting my typescript dependency to "typescript": "npm:@typescript-deploys/[email protected]", fixed the issue, and actually seems a bit faster than baseline!

@typescript-bot typescript-bot added the Fix Available A PR has been opened for this issue label Jul 8, 2024
@DanielRosenwasser
Copy link
Member

DanielRosenwasser commented Jul 8, 2024

Hey all, this is fixed in the main branch but I want it to prove itself out for a bit. If you use the nightly build tomorrow (especially in the editor https://marketplace.visualstudio.com/items?itemName=ms-vscode.vscode-typescript-next), that would go a long way.

If you are actually able to measure, we are specifically looking for differences in memory usage, changes in editor operation delays, as well as changes in variance in delays.

If things feel good, we can cherry-pick the change back into 5.5.

@walkerdb
Copy link
Author

walkerdb commented Jul 9, 2024

@DanielRosenwasser can do! I'll get a test group going internally tomorrow, will see if metrics move much. We do also track total tsserver memory usage, should be able to see if that moves much under the new version as well.

@jakebailey happy to chat more about our tsserver observability setup! Could talk sync / show a few dashboards if you're interested?

@walkerdb
Copy link
Author

walkerdb commented Jul 9, 2024

and thank you both again for moving so fast on this, much appreciated!

@JoshuaKGoldberg
Copy link
Contributor

+1 from the typescript-eslint side, we really appreciate it! Thanks so much for the fast follow and (very interesting) deep dive everyone!

@walkerdb
Copy link
Author

walkerdb commented Jul 12, 2024

following up to say we haven't noticed any serious perf regressions in IDE usage so far. We haven't had the participation numbers to say much more than that though.

@maschwenk
Copy link

Likewise on stability in my testing in very large monorepo (with a very large monolith). Curious if there are plans on a 5.5 backport? Or still targeting 5.6?

@jakebailey
Copy link
Member

The backport is open here: #59211

@DanielRosenwasser
Copy link
Member

TypeScript 5.5.4 should contain the fix - thanks for reporting everyone!

AlessioGr added a commit to payloadcms/payload that referenced this issue Jul 30, 2024
TypeScript 5.5.4 apparently speeds up linting by a lot:
microsoft/TypeScript#59101
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Domain: Performance Reports of unusually slow behavior Fix Available A PR has been opened for this issue Needs Investigation This issue needs a team member to investigate its status. Recent Regression This is a new regression just found in the last major/minor version of TypeScript.
Projects
None yet
7 participants