Kotlin coroutines with loops are 18 times slower under the Graal CI. #1330
Comments
@shelajev pointed out that it all comes from the fact that only structured loops are supported in the IR, and this is not likely to change in the near future. To understand where this irreducible loop comes from (and why we can't simply fix the Kotlin compiler here), one should understand what coroutines are. Functions like this:
we roughly (I'll ignore all exception handling here) translate it to the following form:
additionally, we generate a separate class
Don't hesitate to ask me for further explanations of this example because I am slightly biased about coroutines internals simplicity :) Now back to irreducible loops.
Applying the same transformation procedure, we end up in a situation where a code path can jump directly into the loop's body. And this is exactly an irreducible loop that Graal cannot compile :( |
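To make the shape concrete, here is a minimal hand-written sketch of such a state machine. It is not the Kotlin compiler's actual output: `LoopStateMachine`, `Block`, and the rule that every even `i` suspends are assumptions made up for illustration. Kotlin source has no `goto`, so the control-flow graph is emulated with an explicit block dispatch. The point is that `AFTER_SUSPEND` sits inside the loop yet is also reachable straight from the resume entry, so the loop has two entry points.

```kotlin
// Basic blocks of the hypothetical control-flow graph.
enum class Block { LOOP_HEAD, LOOP_BODY, AFTER_SUSPEND, EXIT }

// Hand-written stand-in for the state machine of something like:
//   suspend fun run(n: Int) { for (i in 0 until n) { maybeSuspend(i); work(i) } }
class LoopStateMachine(private val n: Int) {
    private var label = 0          // 0 = fresh start, 1 = resumed inside the loop
    private var i = 0              // loop variable spilled to a field
    val trace = mutableListOf<String>()

    // Returns true if the machine suspended, false if it ran to completion.
    fun resume(): Boolean {
        // Dispatch on the saved label: label 1 enters the loop in the middle,
        // which gives the loop a second entry besides LOOP_HEAD.
        var block = if (label == 0) Block.LOOP_HEAD else Block.AFTER_SUSPEND
        while (true) {
            when (block) {
                Block.LOOP_HEAD -> block = if (i < n) Block.LOOP_BODY else Block.EXIT
                Block.LOOP_BODY -> {
                    if (i % 2 == 0) {          // assumed rule: even iterations suspend
                        trace += "suspend($i)"
                        label = 1              // remember where to continue
                        return true
                    }
                    block = Block.AFTER_SUSPEND
                }
                Block.AFTER_SUSPEND -> {
                    trace += "work($i)"
                    i++
                    block = Block.LOOP_HEAD    // back-edge
                }
                Block.EXIT -> return false
            }
        }
    }
}

fun main() {
    val machine = LoopStateMachine(3)
    while (machine.resume()) { /* a real dispatcher would resume later */ }
    println(machine.trace) // [suspend(0), work(0), work(1), suspend(2), work(2)]
}
```

In real bytecode the dispatch is a switch at the top of the method that jumps directly to the instruction after the suspension point; the explicit `Block` variable here only stands in for those jumps.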
This is the same root cause as #366. As you diagnosed, this is caused by bytecode containing irreducible loops, which are not supported in the Graal compiler. Hopefully Kotlin only produces cases where there are "few" extra entries (or those extra entries jump towards the end of the loop, close to the back-edges); otherwise it will cause an explosion of the code size for large loops. |
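The code-size concern can be pictured with a rough sketch, assuming an extra loop entry is handled by duplicating the code between that entry and the back-edge. This is not Graal's implementation; `DuplicatedLoopMachine` and the rule that even iterations suspend are hypothetical. The resumed path runs its own copy of the loop tail once and then joins the loop through its header, so the loop itself has a single entry.

```kotlin
// Hand-written stand-in for a state machine whose loop had one extra entry,
// rewritten so that the loop is entered only through its header.
class DuplicatedLoopMachine(private val n: Int) {
    private var label = 0   // 0 = fresh start, 1 = resumed after a suspension
    private var i = 0
    val trace = mutableListOf<String>()

    // Returns true if the machine suspended, false if it ran to completion.
    fun resume(): Boolean {
        if (label == 1) {
            // Copy of the loop tail, run once on behalf of the extra entry.
            trace += "work($i)"
            i++
        }
        while (i < n) {
            if (i % 2 == 0) {              // assumed rule: even iterations suspend
                trace += "suspend($i)"
                label = 1
                return true
            }
            // Original loop tail.
            trace += "work($i)"
            i++
        }
        return false
    }
}

fun main() {
    val machine = DuplicatedLoopMachine(3)
    while (machine.resume()) { /* a real dispatcher would resume later */ }
    println(machine.trace) // [suspend(0), work(0), work(1), suspend(2), work(2)]
}
```

The duplicated part is everything from the entry point to the back-edge. An entry close to the end of the loop copies only a few statements, while an entry near the top of a large loop copies almost the whole body, and each additional entry adds its own copy.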
It is very unfortunate to hear as we had plans to build native images of coroutines-based applications as well.
It grows linearly with the count of suspension points (calls to another Feel free to ping me if you need help with evaluation/testing of the potential (?) change on the Graal side, we have a bunch of applications that exploit coroutines. |
Hi @qwwdfsad, I have implemented a simple duplication strategy (not merged yet), so to evaluate it I'd be interested in other workloads that run into this issue. FYI on 1.8.0_212
|
Nice! It is hard for me to extract such workloads into separate self-contained projects, but I can point to a couple of our specific benchmarks in different projects (with steps on how to configure and run them). Is that okay? For example: |
That would be great
I'll probably put that code behind a flag at first so you can do that but i wanted to do some basic testing on my side first to avoid too many round-trips.
I will start with that. |
@gilles-duboscq I bumped into the same issue when trying to get GraalVM to do native image generation for Ballerina. I would be more than happy to test your fix. In the meantime, you can reproduce this as follows:
|
Hi, could you please elaborate on the status of this fix? Do you need any additional help from me, e.g. new benchmarks or test suites? |
I had a first version but i noticed some issues while adding more tests. I have ideas about how to fix it and still plan to do it but i have no ETA. |
Hi @gilles-duboscq, any chance you could share a status update, if there is one? Thanks in advance. |
Hi, no, i have not been able to allocate any time to that. |
Is there any chance you could open a PR with your version so the community could potentially take it from there? Thanks in advance. |
As i said in #366, i'm planning to take a new look at this for 20.1.0 |
This should be fixed by 4662877. The fix is included in the latest 20.1 dev build (e.g., 20.1.0-dev-20200325_0537). Using the
|
Amazing! I've tested it on some of our workloads. When no suspension happens, it is on par with C2, but as soon as a benchmark has a hot-loop with a suspension, it is significantly faster. E.g.:
(I'd say that Great job! |
Glad to hear it helped your use-case. Thank you for the report. |
Reproducing project:
https://github.com/qwwdfsad/coroutines-graal
Overview:
FlowBenchmark is constructed to expose a non-standard pattern which Graal fails to compile. Flow is a very simplified version of kotlinx.coroutines.Flow, a suspension-based primitive for operating with reactive streams.
Benchmark results
How to run (from the root folder):
./gradlew --no-daemon cleanJmhJar jmhJar && java -jar benchmarks.jar
Results:
The dtraceasm profiler shows that all the time is spent in the interpreter, mostly in fast_aputfield (probably it is coroutine state machine spilling).
Native call stacks obtained via async-profiler are polluted with InterpreterRuntime::frequency_counter_overflow from the uppermost Java frame (flow.FlowBenchmark$numbers$$inlined$flow$1::collect), which is, by the way, compiled with C1.
Compilation log
The compilation log contains pretty suspicious statements about the target method: