-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update TROUBLESHOOTING.md with JFR recommendations #43092
Conversation
cc @radcortez |
Thanks @geoand dealing with the regression and I will come back to this |
|
||
# profile allocation startup | ||
java -agentpath:/path/to/async-profiler/build/libasyncProfiler.so=start,event=alloc,file=/tmp/startup-alloc-profile.html,interval=1000000,simple\ | ||
-jar my-application.jar | ||
java -agentpath:${PATH_TO_ASYNC_PROFILER}/lib/libasyncProfiler.so=start,alloc=1,total,event=alloc,file=startup-alloc-profile.jfr -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -XX:-UseTLAB -Xmx1G -Xms1G -XX:+AlwaysPreTouch -jar target/quarkus-app/quarkus-run.jar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need for -XX:+AlwaysPreTouch
and both -XX:+UseEpsilonGC
or -XX:+UseG1GC
- with the latter which improve tremendously the life of users, because doesn't risk to OOM because cannot recycle memory.
I suggest to use a duration, which is realistic enough considered the actual startup time with the tracing enabled (which can be of many seconds i.e. +12 seconds too!)
i.e.
-agentpath:${PATH_TO_ASYNC_PROFILER}/lib/libasyncProfiler.so=start,alloc=1,total,event=alloc,file=startup-alloc-profile.jfr,timeout=20s
This is required to not mix the stop of application allocation events with the startup only: it means too that users shouldn't stop the application before the expected timeout setup.
There's another option instead, using the async profiler executable and using the application's pid, to forcibly stop the profiler, without specifying any timeout
configuration.
I can explain that if you prefer this option, which is what we do in the QE startstop test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comment but I have some further comments:
Don't need for -XX:+AlwaysPreTouch and both -XX:+UseEpsilonGC or -XX:+UseG1GC - with the latter which improve tremendously the life of users, because doesn't risk to OOM because cannot recycle memory.
I don't understand what you're saying tbh. This command line was apparently given by you to Roberto. I'm just using it there.
Can you please provide an alternative command if you think it's not good?
Keep in mind that this command is used solely to profile the startup. So I'm not sure what improve tremendously the life of users
have to do with it? Here we want to get the best picture of the allocations. That's all we want. So if you have an alternative command that does exactly that, it's very welcome.
I suggest to use a duration, which is realistic enough considered the actual startup time with the tracing enabled (which can be of many seconds i.e. +12 seconds too!)
I don't think that's a good option for this particular purpose. I think we want the control of when we stop the app.
The shutdown of a Quarkus application can't be mixed really, it's very well isolated in the flamegraph so I would rather use something that is deterministic and you can stop when you want, rather than something with a timeout that might not be very practical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not let perfection be the enemy of progress on this. I think that we all agree that what @gsmet has added here is an improvement over the what exists in the guide, so best get it in and @franz1981 or anyone else can further improve when they have the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree on this @geoand but I'm just brain dumping before I forgot. If the changes here are not possible, I'll send myself a PR
The point about -XX:+AlwaysPreTouch
is that was used for a different purpose (i.e. paying upfront on the RSS side the full cost of a used heap) - and the -XX:+UseEpsilonGC
requires users to carefully size the heap or they will get OOM while profiling, which is a PITA.
I hope that explain why I've suggested to remove the first and and switch to G1 (no need for experimental options too, at this poin!) - while keeping -XX:-UseTLAB
.
So, TLDR @gsmet
Can you please provide an alternative command if you think it's not good?
Yep, is:
java -agentpath:${PATH_TO_ASYNC_PROFILER}/lib/libasyncProfiler.so=start,alloc=1,total,event=alloc=1,file=startup-alloc-profile.jfr -XX:+UseG1GC -XX:-UseTLAB -Xmx1G -Xms1G -jar target/quarkus-app/quarkus-run.jar
I don't think that's a good option for this particular purpose. I think we want the control of when we stop the app.
In that case, your choice: I personally prefer to stop the profiler manually i.e.
${PATH_TO_ASYNC_PROFILER}/bin/asprof stop -f startup-alloc-profile.jfr <pid of the java app>
But if you prefer instead to have the stop, as you rightly said, it should be clearly visible in the flamegraph, and could just be ignored.
|
||
# profile allocation startup | ||
java -agentpath:/path/to/async-profiler/build/libasyncProfiler.so=start,event=alloc,file=/tmp/startup-alloc-profile.html,interval=1000000,simple\ | ||
-jar my-application.jar | ||
java -agentpath:${PATH_TO_ASYNC_PROFILER}/lib/libasyncProfiler.so=start,alloc=1,total,event=alloc,file=startup-alloc-profile.jfr -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -XX:-UseTLAB -Xmx1G -Xms1G -XX:+AlwaysPreTouch -jar target/quarkus-app/quarkus-run.jar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add the reference to this comment from Andrei P. at async-profiler/async-profiler#583 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather keep it simple and not point to comments that will make things more complicated for people not familiar with this.
We just want a command that allows to troubleshoot for people who don't know how to do it.
|
||
# profile allocation startup | ||
mvn quarkus:dev -Djvm.args="-agentpath:/path/to/async-profiler/build/libasyncProfiler.so=start,event=alloc,file=/tmp/startup-alloc-profile.html,interval=1000000,simple" | ||
mvn quarkus:dev -Djvm.args="-agentpath:${PATH_TO_ASYNC_PROFILER}/lib/libasyncProfiler.so=start,alloc=1,total,event=alloc,file=startup-alloc-profile.jfr -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -XX:-UseTLAB -Xmx1G -Xms1G -XX:+AlwaysPreTouch" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto as before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM is up to @gsmet address the changes; I'm happy regardless, cause is an improvement nonetheless 👍
well done!
FWIW, I was open to suggestions but somehow forgot about this PR :). So feel free to create a follow-up PR if needed. Let's just make sure things are simple enough for people not accustomed to performance work to be able to collect data as it's the primary purpose of this document. |
@franz1981 I updated this doc as it's probably better to have it properly documented rather than "ask Franz or Roberto" :)
Note that I haven't updated the first part with the
profiler.sh
shell script as it's not something I use. But if you have the info, maybe it's worth updating it too?If you can provide me with a recommended command line, I can update it too.