Potential performance improvement. #1572
Comments
Hi, the stack bounds are already cached in Can you verify whether that is actually happening? Are you calling the runtime from a different thread every time?
Sorry, I don't understand what these screenshots are supposed to tell me.
That is a flame graph from a single thread. In those screenshots you can see that thread_stack_bounds is called multiple times. If it were cached in this flow, we would see it only once. worklets::ShareableWorklet::toJSValue calls another jsi::Function, which calls ScopedNativeCallFrame, which finally calls thread_stack_bounds. There are 2 such calls in the first screenshot and 3 in the second one. The last screenshot just shows that it all comes from the same thread :).
Hi! @Szymon20000 I was recently working on this problem in reanimated. This is something that happens when there are calls from different threads to the reanimated worklet runtime. I think you should also see some calls to the reanimated runtime on the JS thread happening at the same time in your trace. The problem is that Hermes goes through a slow path overflow check when it's called from a different thread. So if there are any interleaving JS thread and UI thread calls, then you will see a performance degradation. This issue is present since:
We are not yet sure how we want to address it in reanimated since we are not sure if this is a problem on our side or if this is a problem on the application side. Could you share with us the code that's causing this to happen, as this would help us decide our next steps?
I am not opposed to adding per-thread caching, but before we do it, I would like to understand the use case. So, is the case that we are getting calls to the same runtime from the same two (or more) threads, just switching between them? Also, when this happens, is the JS stack empty?
Ah, so currently Hermes only keeps the last thread cached? The calls I showed are only from the main thread, but it is possible that between them the runtime is accessed from other threads.
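To make the question above concrete, here is a minimal sketch of what a "last thread only" cache would look like. It is a simplified model, not Hermes code: `LastThreadCache`, `StackBounds`, and `querySlow` are hypothetical names, and `querySlow` stands in for an expensive OS query such as pthread_getattr_np.

```cpp
#include <cstddef>
#include <thread>

// Hypothetical placeholder for the stack bounds of a thread.
struct StackBounds {
  std::size_t size = 0;
};

// Simplified model (an assumption, not Hermes code): the cache remembers
// the bounds of the most recent calling thread only.
class LastThreadCache {
 public:
  // Returns bounds for the calling thread. Any switch to a different
  // thread invalidates the single cached entry and takes the slow path.
  StackBounds get() {
    const std::thread::id self = std::this_thread::get_id();
    if (owner_ != self) {
      bounds_ = querySlow();
      owner_ = self;
    }
    return bounds_;
  }

  // Number of times the slow path ran; used here to observe the effect.
  int slowCalls() const { return slowCalls_; }

 private:
  // Stand-in for the expensive OS query (e.g. pthread_getattr_np).
  StackBounds querySlow() {
    ++slowCalls_;
    return StackBounds{};
  }

  std::thread::id owner_;  // default-constructed id matches no running thread
  StackBounds bounds_;
  int slowCalls_ = 0;
};
```

Under this model, repeated calls from one thread hit the slow path once, while a call pattern such as main, other thread, main hits it three times, which would match seeing thread_stack_bounds several times in a trace of a single thread.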
If that's the case then it looks like a bug in Reanimated (@bartlomiejbloniarz maybe synchronous calls block the UI thread but access the runtime from the JS thread?). At the same time, the same thing can happen in Fabric when we (sometimes) render synchronously on the UI thread, but I would need to check whether that actually happens on the JS thread while we just block the UI thread, or whether it actually happens on the UI thread.
https://github.com/facebook/react-native/blob/1d909efa235ffae150de25a201efbbe752d0bc52/packages/react-native/ReactCommon/runtimeexecutor/ReactCommon/RuntimeExecutor.h#L35 It will actually run it on the UI thread, so I think we have the same case in Fabric. @tmikov But if there are no other use cases for caching it per thread, then it would be good to suggest to the RN team to always call the callback on the JS thread and just block the current thread.
@bartlomiejbloniarz I think it's enough to block the JS thread and schedule a call on the UI thread; this will probably fix the problem. Right now it takes a lock and uses the runtime on the JS thread. https://github.com/software-mansion/react-native-reanimated/blob/0162804a8ace2a8f6b77764894d8e4d87f94781a/packages/react-native-reanimated/Common/cpp/worklets/WorkletRuntime/WorkletRuntime.cpp#L84
software-mansion/react-native-reanimated#6770 @bartlomiejbloniarz This seems to fix it, but now we need to wait for the main thread even when Reanimated is not currently using it. So I think caching per thread would actually be better @tmikov. Another approach would be to make Reanimated use a separate thread and also call runSync from the UI thread, but I'm not sure how big the overhead would be.
@tmikov yes. We've seen this in one of our clients' apps, which made many calls to our Hermes runtime from both the UI and the JS thread simultaneously. They weren't able to give me a reproduction, but I was able to reproduce it "artificially," simply by:
I think a more natural scenario for this to happen would be if someone was running an animation on a component with an I think that usually it's an anti-pattern to do this in reanimated (but there might be some cases when it is necessary, I just haven't seen them yet). I think adding per-thread caching would be the best solution for us. @Szymon20000 I don't think that this change in reanimated is a good solution. I will give a more in-depth answer in the PR.
@tmikov In Reanimated we don't exactly couple additional runtimes to a given thread. They're considered resources which any thread could potentially obtain for a period of time. Sometimes, a runtime (let's call it Primary) might require synchronization of some data with another runtime (Secondary) at the user's request. Whether this request is reasonable from the perspective of the user's app design is a whole other topic. Because the runtimes aren't coupled to a thread, it might happen that Secondary isn't used by any thread at the moment and we can acquire it "freely" from Primary's thread. As a result we get a much faster sync. Secondary is released by Primary's thread after the sync and we are back where we were. At the time we preferred that solution to its alternative, runtime-thread coupling. In the coupling scenario, we'd need to schedule an operation from Primary's thread to Secondary's thread. This seems good on paper, except that in the majority of our cases Secondary's thread is the Main thread. Schedule the operation at an unlucky moment and you might need to wait for a very long time to get its result. @bartlomiejbloniarz Please correct me if I omitted/oversimplified some things here.
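The "runtime as a resource" model described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Reanimated's implementation: `Runtime` and `SharedRuntime` are hypothetical types, and `runSync` here merely borrows the name mentioned earlier in the thread.

```cpp
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for a JS runtime that is not thread-safe by itself.
struct Runtime {
  std::thread::id lastCaller;  // thread that most recently entered the runtime
  int calls = 0;               // how many jobs have run so far
};

// Sketch (an assumption): the runtime is not bound to any thread. Whoever
// holds the lock may use it, and the job runs on the caller's own thread.
class SharedRuntime {
 public:
  template <typename Job>
  auto runSync(Job &&job) {
    // Wait until no other thread is using the runtime, then take it.
    std::lock_guard<std::mutex> lock(mutex_);
    runtime_.lastCaller = std::this_thread::get_id();
    ++runtime_.calls;
    // The lock is released when this function returns.
    return std::forward<Job>(job)(runtime_);
  }

 private:
  std::mutex mutex_;
  Runtime runtime_;
};
```

With this design the same runtime is entered from whichever thread acquired it, so any state that depends on the calling thread, such as cached stack bounds, can change between consecutive calls.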
@tmikov Do you have enough information from the responses above or is there still uncertainty about the use cases? Just trying to figure out where it is & if there's something actionable to do to move this forward. Thanks!
We are adding this optimization. |
The fix landed in Static Hermes here: 4fc57f3 We are backporting it to Hermes as well. |
Backport landed in Hermes 568b1c9. |
Problem
I'm profiling an RN app and noticed huge performance problems when calling a JS function from the JSI level. I dug into it and it turned out that most of the time is spent in pthread_getattr_np (on Android). Here is a screenshot from the Android Studio sampling profiler
Solution
Would it be possible to just cache it? ChatGPT suggests something like this:
Additional Context