Make -Z self-profile more efficient #58372
[self-profiler] Make the profiler faster/more efficient Related to #58372 r? @michaelwoerister
So here are some thoughts on making the part of profiling that happens in the compiler process as cheap as possible:
As a result of these thoughts I have the following proposal:
What I like about this approach is that it should be rather efficient while still being generic. I.e. the string table only knows about strings and references between them, but does not need to duplicate more logic about compiler-internal data structures like the … Any feedback is welcome!
cc #58967
Related to rust-lang#58372 Related to rust-lang#58967
@michaelwoerister I believe this issue can be closed now that we're using …
Yes, I think all things listed here are implemented via measureme. Closing.
The self-profiling feature is going to make profiling the compiler's performance a lot easier. However, a recent first stab at collecting more detailed information (see #58085) still has too much overhead.
Here are some of the things that could be improved:
- Move post-processing of the collected data out of the `rustc` process, as much as possible. `SelfProfiler::get_results()` does a lot of work to generate the statistics from the collected events. All of this should probably be moved to a separate tool that runs after profiling is done.
- Reduce the amount of dispatch and locking that needs to be done for each event. For each event we have to get exclusive access to the profiler (`RefCell`/`parking_lot` mutex) and then look up the event stream for the current thread in an `FxHashMap`. This should probably be solved via thread-local data somehow.
- Reduce the size of events. Events are quite big (32 bytes on `x86_64` would be my guess). The timestamp can be reduced to 64 bits if we just measure the time from process start. The `&str` containing the query name can be replaced by a 4-byte tag.
- Persist events to disk in a binary format. We should probably open a memory-mapped file per thread that we write events to directly. If events don't contain pointers, they can be written to disk verbatim. The post-processing tool can then convert them to something platform-independent.
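A minimal sketch of three of the ideas above, using only the Rust standard library and assuming a simplified event model: a 16-byte event that carries a 4-byte tag instead of a `&str` and a `u64` timestamp relative to a start instant; a `thread_local!` buffer, so recording an event needs no lock and no hash-map lookup; and verbatim little-endian encoding that a separate post-processing tool can decode. All names (`RawEvent`, `record`, `flush_thread`, `read_events`, the `phase` field) are hypothetical and are not rustc's `SelfProfiler` API. A `Vec<u8>` stands in for the per-thread memory-mapped file, since `std` has no mmap support.

```rust
use std::cell::RefCell;
use std::io::{self, Write};
use std::sync::OnceLock;
use std::time::Instant;

/// Hypothetical compact event: 16 bytes in total.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(C)]
pub struct RawEvent {
    /// 4-byte tag standing in for the `&str` query name.
    pub kind: u32,
    /// Hypothetical phase marker: 0 = start, 1 = end (fills what would be padding).
    pub phase: u32,
    /// Nanoseconds since the reference instant, as a plain `u64`.
    pub nanos: u64,
}

/// Reference instant for all timestamps. Assumption: initialized on first
/// use; a real profiler would set it once at process start.
fn process_start() -> Instant {
    static START: OnceLock<Instant> = OnceLock::new();
    *START.get_or_init(Instant::now)
}

thread_local! {
    // Per-thread buffer: recording needs no mutex and no hash-map lookup.
    static EVENTS: RefCell<Vec<RawEvent>> = RefCell::new(Vec::new());
}

/// Records one event into the current thread's buffer.
pub fn record(kind: u32, phase: u32) {
    let nanos = process_start().elapsed().as_nanos() as u64;
    EVENTS.with(|buf| buf.borrow_mut().push(RawEvent { kind, phase, nanos }));
}

/// Writes events verbatim as little-endian bytes (16 bytes per event).
/// This works because `RawEvent` contains no pointers.
pub fn write_events<W: Write>(events: &[RawEvent], out: &mut W) -> io::Result<()> {
    for e in events {
        out.write_all(&e.kind.to_le_bytes())?;
        out.write_all(&e.phase.to_le_bytes())?;
        out.write_all(&e.nanos.to_le_bytes())?;
    }
    Ok(())
}

/// Drains the current thread's buffer into `out`, a stand-in for the
/// per-thread memory-mapped file.
pub fn flush_thread<W: Write>(out: &mut W) -> io::Result<()> {
    EVENTS.with(|buf| {
        let events = std::mem::take(&mut *buf.borrow_mut());
        write_events(&events, out)
    })
}

/// Decoding side, as the separate post-processing tool would do it.
pub fn read_events(bytes: &[u8]) -> Vec<RawEvent> {
    bytes
        .chunks_exact(16)
        .map(|c| {
            let mut kind = [0u8; 4];
            let mut phase = [0u8; 4];
            let mut nanos = [0u8; 8];
            kind.copy_from_slice(&c[0..4]);
            phase.copy_from_slice(&c[4..8]);
            nanos.copy_from_slice(&c[8..16]);
            RawEvent {
                kind: u32::from_le_bytes(kind),
                phase: u32::from_le_bytes(phase),
                nanos: u64::from_le_bytes(nanos),
            }
        })
        .collect()
}

fn main() -> io::Result<()> {
    record(1, 0); // hypothetical query kind 1 starts
    record(1, 1); // ... and ends
    let mut file: Vec<u8> = Vec::new();
    flush_thread(&mut file)?;
    let decoded = read_events(&file);
    println!("{} events in {} bytes: {:?}", decoded.len(), file.len(), decoded);
    Ok(())
}
```

Fixing the byte order on write keeps the on-disk format independent of the host, so the conversion step in the post-processing tool stays trivial.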
Some time soon we also want to record query keys per event. This can already be done efficiently by storing the 32-bit `DepNodeIndex` that corresponds to a query (which also obviates the need to store the query name in each event). However, for the `DepNodeIndex` to be useful, we'll need to create and persist a mapping of `DepNodeIndex -> String` at some point before the `tcx` is destroyed (i.e. in the middle of the compilation process). I expect that creating this map will not be entirely cheap :/
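The index-plus-mapping scheme above could look roughly like the following sketch. It assumes a hypothetical `DepNodeIndex` newtype over `u32` (not rustc's actual definition), events that carry only that index, and a simple length-prefixed binary format for the index-to-string table; the query strings are made up. The per-event cost stays at 4 bytes, and the expensive part, formatting every key into a string, happens once when the table is written before `tcx` goes away.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for rustc's `DepNodeIndex`: a dense 32-bit id.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DepNodeIndex(pub u32);

/// An event that identifies its query key only by index (4 bytes).
#[derive(Clone, Copy, Debug)]
pub struct QueryEvent {
    pub dep_node: DepNodeIndex,
    pub nanos: u64,
}

/// Persists the `DepNodeIndex -> String` mapping as a sequence of records:
/// [index: u32 LE][len: u32 LE][len bytes of UTF-8].
pub fn write_mapping(entries: &[(DepNodeIndex, String)]) -> Vec<u8> {
    let mut out = Vec::new();
    for (idx, s) in entries {
        out.extend_from_slice(&idx.0.to_le_bytes());
        out.extend_from_slice(&(s.len() as u32).to_le_bytes());
        out.extend_from_slice(s.as_bytes());
    }
    out
}

fn read_u32(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Post-processing side: rebuild the mapping from the persisted bytes.
/// Stops at the first truncated record instead of panicking.
pub fn read_mapping(bytes: &[u8]) -> HashMap<DepNodeIndex, String> {
    let mut map = HashMap::new();
    let mut pos = 0;
    while pos + 8 <= bytes.len() {
        let idx = read_u32(&bytes[pos..]);
        let len = read_u32(&bytes[pos + 4..]) as usize;
        let start = pos + 8;
        if start + len > bytes.len() {
            break;
        }
        let s = String::from_utf8_lossy(&bytes[start..start + len]).into_owned();
        map.insert(DepNodeIndex(idx), s);
        pos = start + len;
    }
    map
}

/// Resolves each event's index to its string, if the mapping has it.
pub fn resolve(events: &[QueryEvent], map: &HashMap<DepNodeIndex, String>) -> Vec<String> {
    events
        .iter()
        .map(|e| match map.get(&e.dep_node) {
            Some(s) => s.clone(),
            None => format!("<unknown {}>", e.dep_node.0),
        })
        .collect()
}

fn main() {
    // During compilation, events carry only the 32-bit index.
    let events = vec![
        QueryEvent { dep_node: DepNodeIndex(0), nanos: 10 },
        QueryEvent { dep_node: DepNodeIndex(1), nanos: 25 },
    ];
    // Once, before `tcx` is destroyed: persist the mapping (strings are made up).
    let mapping = vec![
        (DepNodeIndex(0), "query_a(key_1)".to_string()),
        (DepNodeIndex(1), "query_b(key_2)".to_string()),
    ];
    let bytes = write_mapping(&mapping);
    // Later, in the post-processing tool:
    let map = read_mapping(&bytes);
    for (e, name) in events.iter().zip(resolve(&events, &map)) {
        println!("{:>4} ns  {}", e.nanos, name);
    }
}
```

Indices with no entry in the table are reported as `<unknown N>` rather than dropped, so gaps in the persisted mapping remain visible in the post-processed output.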
@Mark-Simulacrum, as you can see the whole workflow around self-profiling will change quite a bit, so I think it's too early to add infrastructure for it to perf.rlo just yet.
cc @wesleywiser @nnethercote (and @rust-lang/wg-compiler-performance for good measure)