Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvements to CPU profile code #7917

Open
mkustermann opened this issue Jun 12, 2024 · 5 comments
Open

Performance improvements to CPU profile code #7917

mkustermann opened this issue Jun 12, 2024 · 5 comments
Labels
dart2wasm devtools app performance Related to the performance of the DevTools app (not the Performance page) P2 important to work on, but not at the top of the work list. product-quality Issues related to product quality. screen: cpu profiler Issues related to the CPU Profiler screen

Comments

@mkustermann
Copy link
Member

After having taken a look at the cpu profiling code in DevTools, we found:

  • It unnecessarily encodes json maps and decodes them again: devtools/pull/7916
  • It gets Stream</* String | List<int> */> from the websocket and then json decodes strings
    => It would be more efficient to avoid going from utf-8 to string and then to json and instead directly from utf8->json
    => This would probably require some refactoring in various packages, so maybe not easily doable?
  • Way too many conversions
    • Bytes->String in websocket layer
    • String->JSON in package:vm_service layer
    • JSON -> service objects in package:vm_service layer (e.g. vm_service.CpuSamples, vm_service.CpuSample)
    • service objects -> _CpuProfileTimelineTree tree structure
      => Uses expandos (!!) to map service objects back to tree nodes
    • _CpuProfileTimelineTree -> JSON "traceObject" (very expensive operation!)
    • Going from JSON "traceObject" back to normal objects in CpuProfileData.fromJson which builds json maps with strings and CpuStackFrame as values

It seems there's too many conversions from one data structure to another and the representations used for encoding this profiling information doesn't seem very efficient either.

@kenzieschmoll
Copy link
Member

CC @bkonyi

@kenzieschmoll kenzieschmoll added screen: cpu profiler Issues related to the CPU Profiler screen devtools app performance Related to the performance of the DevTools app (not the Performance page) labels Jun 12, 2024
@mkustermann
Copy link
Member Author

Another thing is that I believe loading CPU profiles can freeze the UI for a while - most likely due to all this work happening synchronously without any yields to the event loop. I have observed 80 MB of json to be loaded for cpu profile, which may result in quite a bit more of in-memory json (consumed by maps, lists, etc - which aren't as compact as in the encoded form) - possibly 100s of MBs it's churning through synchronously several times.

@kevmoo
Copy link
Contributor

kevmoo commented Jun 14, 2024

Could we make this function async? Then we can batch the processing and avoid janking frames...

@kenzieschmoll kenzieschmoll added P2 important to work on, but not at the top of the work list. dart2wasm labels Jul 10, 2024
@mkustermann
Copy link
Member Author

Friendly ping. @kenzieschmoll @bkonyi Are there plans on improving this in the short/mid term?

@kenzieschmoll
Copy link
Member

@mkustermann thanks for the performance analysis. It sounds like all of your recommendations would be great improvements. In practice, this isn't something we have the cycles to work on in the short term, but are more than happy to accept & review contributions if you'd like to take a stab at any of these.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dart2wasm devtools app performance Related to the performance of the DevTools app (not the Performance page) P2 important to work on, but not at the top of the work list. product-quality Issues related to product quality. screen: cpu profiler Issues related to the CPU Profiler screen
Projects
None yet
Development

No branches or pull requests

3 participants