Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate measureme's hardware performance counter support. #78781

Merged
merged 9 commits into from
Jun 14, 2022

Conversation

eddyb
Copy link
Member

@eddyb eddyb commented Nov 5, 2020

Note: this is a companion to rust-lang/measureme#143, and duplicates some information with it for convenience

(much later) EDIT: take any numbers with a grain of salt, they may have changed since initial PR open.

Credits

I'd like to start by thanking @alyssais, @cuviper, @edef1c, @glandium, @jix, @Mark-Simulacrum, @m-ou-se, @mystor, @nagisa, @puckipedia, and @yorickvP, for all of their help with testing, and valuable insight and suggestions.
Getting here wouldn't have been possible without you!

(If I've forgotten anyone please let me know, I'm going off memory here, plus some discussion logs)

Summary

This PR adds support to -Z self-profile for counting hardware events such as "instructions retired" (as opposed to being limited to time measurements), using the rdpmc instruction on x86_64 Linux.

While other OSes may eventually be supported, preliminary research suggests some kind of kernel extension/driver is required to enable this, whereas on Linux any user can profile (at least) their own threads.

Supporting Linux on architectures other than x86_64 should be much easier (provided the hardware supports such performance counters), and was mostly not done due to a lack of readily available test hardware.
That said, 32-bit x86 (aka i686) would be almost trivial to add and test once we land the initial x86_64 version (as all the CPU detection code can be reused).

A new flag -Z self-profile-counter was added, to control which of the named measureme counters is used, and which defaults to wall-time, in order to keep -Z self-profile's current functionality unchanged (at least for now).

The named counters so far are:

  • wall-time: the existing time measurement
    • name chosen for consistency with perf.rust-lang.org
    • continues to use std::time::Instant for a nanosecond-precision "monotonic clock"
  • instructions:u: the hardware performance counter usually referred to as "Instructions retired"
    • here "retired" (roughly) means "fully executed"
    • the :u suffix is from the Linux perf tool and indicates the counter only runs while userspace code is executing, and therefore counts no kernel instructions
      • see Caveats/Subtracting IRQs for why this isn't entirely true and why instructions-minus-irqs:u should be preferred instead
  • instructions-minus-irqs:u: same as instructions:u, except the count of hardware interrupts ("IRQs" here for brevity) is subtracted
  • instructions-minus-r0420:u: experimental counter, same as instructions-minus-irqs:u but subtracting an undocumented counter (r0420:u) instead of IRQs
    • the rXXXX notation is again from Linux perf, and indicates a "raw" counter, with a hex representation of the low-level counter configuration - this was picked because we still don't really know what it is
    • this only exists for (future) testing and isn't included/used in any comparisons/data we've put together so far
    • see Challenges/Zen's undocumented 420 counter for details on how this counter was found and what it does

There are also some additional commits:


(much later) EDIT: take any numbers with a grain of salt, they may have changed since initial PR open.

Write-up / report

Because of how extensive the full report ended up being, I've kept most of it on hackmd.io, but for convenient access, here are all the sections (with individual links):
(someone suggested I'd make a backup, so here it is on the wayback machine - I'll need to remember to update that if I have to edit the write-up)

@rust-highfive
Copy link
Collaborator

r? @davidtwco

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 5, 2020
@eddyb
Copy link
Member Author

eddyb commented Nov 5, 2020

@bors try @rust-timer queue (this won't use the new counters, it's just to check the overhead of the new hacks)

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Nov 5, 2020

⌛ Trying commit 5c0e03110a79a66356646ec3c02b3c0a4b6fb77a with merge 36688b7cf765fadab38d1c07a0e7c3111cac632d...

@eddyb
Copy link
Member Author

eddyb commented Nov 5, 2020

cc @Amanieu Looks like we generate LLVM inline assembly that's not compatible with older versions?
i.e. LLVM 8 on CI, specifically this run: https://github.com/rust-lang/rust/pull/78781/checks?check_run_id=1359732667

LLVM ERROR: Bad $ operand number in inline asm string: 'xor eax, eax
cpuid
mov ecx, ${4:k}
rdpmc'
error: could not compile `measureme`

@bjorn3
Copy link
Member

bjorn3 commented Nov 5, 2020

Intel syntax is not supported by older versions of LLVM. Only AT&T syntax is.

@bors
Copy link
Contributor

bors commented Nov 5, 2020

☀️ Try build successful - checks-actions
Build commit: 36688b7cf765fadab38d1c07a0e7c3111cac632d (36688b7cf765fadab38d1c07a0e7c3111cac632d)

@rust-timer
Copy link
Collaborator

Queued 36688b7cf765fadab38d1c07a0e7c3111cac632d with parent b1d9f31, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (36688b7cf765fadab38d1c07a0e7c3111cac632d): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot modify labels: +S-waiting-on-review -S-waiting-on-perf

@bjorn3
Copy link
Member

bjorn3 commented Nov 5, 2020

Regressions of up to 3%.

@eddyb
Copy link
Member Author

eddyb commented Nov 6, 2020

@Mark-Simulacrum Just checking, do we run the -Z self-profile for query details under perf stat, and if so, do we mix that into non--Z self-profile results? Because -Z self-profile will have more instructions now, as we have to match on the Counter.

@Mark-Simulacrum
Copy link
Member

Yes, but we take the minimum of all runs under perf stat (including the self profile one), and always have at least 2 (and IIRC at most 3 right now) runs, including one self profile run.

@Amanieu
Copy link
Member

Amanieu commented Nov 6, 2020

Intel syntax is not supported by older versions of LLVM. Only AT&T syntax is.

You can use options(att_syntax) and then use AT&T syntax for the assembly code and it will work on all LLVM versions.

Copy link
Member

@davidtwco davidtwco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation LGTM, r=me when ready

Comment on lines 1311 to 1315
// HACK(eddyb) remove instruction-counting noise from `-Z self-profile`.
#[cfg(target_arch = "x86_64")]
unsafe {
asm!("mfence", options(nostack));
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidtwco @nikomatsakis Just it doesn't get lost, I don't think we should merge with this, but I also am not sure what to do here. Do you think it would be reasonable to limit this to when -Z self-profile is enabled?

It's also not easy to test the impact of this fence.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too sure either. I personally wouldn't have a problem with this being limited to when -Z self-profile is enabled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking for -Zself-profile is probably more expensive than unconditionally running mfence.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine leaving it in as it is for now. We can analyze its impact once perftest multithreaded rustc

@eddyb
Copy link
Member Author

eddyb commented Nov 20, 2020

Given #78785, it might make sense to define getrandom in librustc_driver and have new flag for replacing getrandom with a deterministic PRNG (e.g. -Z getrandom-seed?) - note that this would not only affect proc macros (e.g. serde) using their own HashMaps, but also uses of randomness in the const_random proc macro, or for things like temp dir names in rustc itself.

@eddyb
Copy link
Member Author

eddyb commented Nov 23, 2020

Intel syntax is not supported by older versions of LLVM. Only AT&T syntax is.

I was confused about this, because I could've sworn I've used Intel syntax inline asm! many years ago.
LLVM documents inteldialect since 3.2: https://releases.llvm.org/3.2/docs/LangRef.html#inlineasm

The code emitting the error we're seeing is still around, but the ${ handling looks like it might be new.

And, there it is: operand modifiers for Intel syntax is new in LLVM 10 (llvm/llvm-project@dc5b614)

So the problem is the {...:e} modifiers in order to get 32-bit registers instead of the 64-bit ones.
Oh well, I guess I am going with AT&T syntax after all (after checking that the generated assembly is identical).
(EDIT: done, see rust-lang/measureme#147)

@eddyb
Copy link
Member Author

eddyb commented Nov 23, 2020

Looks like the CI build got much farther this time (before getting cancelled by the check failure), so that worked.
(EDIT: I went ahead and allowed the temporary git crate source in tidy, so CI can complete)

@bors
Copy link
Contributor

bors commented Dec 3, 2020

☔ The latest upstream changes (presumably #79586) made this pull request unmergeable. Please resolve the merge conflicts.

Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels:

@rustbot modify labels: +S-waiting-on-review -S-waiting-on-author

@davidtwco davidtwco added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 21, 2020
@crlf0710
Copy link
Member

Triage: there's merge conflicts.

@crlf0710 crlf0710 removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Jan 15, 2021
@eddyb
Copy link
Member Author

eddyb commented Jun 13, 2022

Found likely culprit, missing #[inline] on these:

impl DefaultHasher {
/// Creates a new `DefaultHasher`.
///
/// This hasher is not guaranteed to be the same as all other
/// `DefaultHasher` instances, but is the same as all other `DefaultHasher`
/// instances created through `new` or `default`.
#[stable(feature = "hashmap_default_hasher", since = "1.13.0")]
#[allow(deprecated)]
#[must_use]
pub fn new() -> DefaultHasher {
DefaultHasher(SipHasher13::new_with_keys(0, 0))
}
}
#[stable(feature = "hashmap_default_hasher", since = "1.13.0")]
impl Default for DefaultHasher {
/// Creates a new `DefaultHasher` using [`new`].
/// See its documentation for more.
///
/// [`new`]: DefaultHasher::new
fn default() -> DefaultHasher {
DefaultHasher::new()
}
}

Don't think anyone uses them so I won't bother with a separate benchmarking PR, just add a fix to this PR directly.

@eddyb
Copy link
Member Author

eddyb commented Jun 13, 2022

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 13, 2022
@bors
Copy link
Contributor

bors commented Jun 13, 2022

⌛ Trying commit a4f1331 with merge 37e95f0d4e5b87d4b88f8eb37c46b964462717c9...

@bors
Copy link
Contributor

bors commented Jun 13, 2022

☀️ Try build successful - checks-actions
Build commit: 37e95f0d4e5b87d4b88f8eb37c46b964462717c9 (37e95f0d4e5b87d4b88f8eb37c46b964462717c9)

@rust-timer
Copy link
Collaborator

Queued 37e95f0d4e5b87d4b88f8eb37c46b964462717c9 with parent 083721a, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (37e95f0d4e5b87d4b88f8eb37c46b964462717c9): comparison url.

Instruction count

  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: 😿 relevant regressions found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
0.3% 0.4% 2
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) N/A N/A 0

Max RSS (memory usage)

Results
  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: 😿 relevant regression found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
2.9% 2.9% 1
Improvements 🎉
(primary)
-1.4% -2.7% 2
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) -1.4% -2.7% 2

Cycles

Results
  • Primary benchmarks: 🎉 relevant improvement found
  • Secondary benchmarks: no relevant changes found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
N/A N/A 0
Improvements 🎉
(primary)
-2.3% -2.3% 1
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) -2.3% -2.3% 1

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

@rustbot rustbot removed S-waiting-on-perf Status: Waiting on a perf run to be completed. perf-regression Performance regression. labels Jun 13, 2022
@eddyb
Copy link
Member Author

eddyb commented Jun 13, 2022

@oli-obk Should be ready to merge now, the remaining tt-muncher thing feels like noise or getting unlucky with the choice of deterministic hash key.

@oli-obk
Copy link
Contributor

oli-obk commented Jun 14, 2022

@bors r+

@bors
Copy link
Contributor

bors commented Jun 14, 2022

📌 Commit a4f1331 has been approved by oli-obk

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 14, 2022
@bors
Copy link
Contributor

bors commented Jun 14, 2022

⌛ Testing commit a4f1331 with merge 872503d...

@bors
Copy link
Contributor

bors commented Jun 14, 2022

☀️ Test successful - checks-actions
Approved by: oli-obk
Pushing 872503d to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jun 14, 2022
@bors bors merged commit 872503d into rust-lang:master Jun 14, 2022
@rustbot rustbot added this to the 1.63.0 milestone Jun 14, 2022
let filename = format!("{}-{}.rustc_profile", crate_name, process::id());
// HACK(eddyb) we need to pad the PID, strange as it may seem, as its
// length can behave as a source of entropy for heap addresses, when
// ASLR is disabled and the heap is otherwise determinic.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*deterministic

@eddyb eddyb deleted the measureme-rdpmc branch June 14, 2022 17:58
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (872503d): comparison url.

Instruction count

  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: 😿 relevant regressions found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
0.8% 1.2% 3
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) N/A N/A 0

Max RSS (memory usage)

Results
  • Primary benchmarks: 🎉 relevant improvement found
  • Secondary benchmarks: no relevant changes found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
N/A N/A 0
Improvements 🎉
(primary)
-0.1% -0.1% 1
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) -0.1% -0.1% 1

Cycles

Results
  • Primary benchmarks: 😿 relevant regression found
  • Secondary benchmarks: 😿 relevant regressions found
mean1 max count2
Regressions 😿
(primary)
2.4% 2.4% 1
Regressions 😿
(secondary)
4.2% 5.0% 3
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) 2.4% 2.4% 1

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

@eddyb
Copy link
Member Author

eddyb commented Jun 24, 2022

Oops this is what I get for not testing it, only realized it now that this is actually broken:

$ echo 'fn main() {}' | rustc +nightly - -Z self-profile -Z self-profile-counter=instructions-minus-irqs:u
warning: failed to create profiler: only supported with measureme's "nightly" feature

I removed features = ["nightly"] because I thought that got removed from measureme, but I guess that's in a newer version than what rustc currently uses.

@eddyb
Copy link
Member Author

eddyb commented Dec 27, 2022

$ echo 'fn main() {}' | rustc +nightly-2022-07-23 - -Z self-profile -Z self-profile-counter=instructions-minus-irqs:u
warning: failed to create profiler: only supported with measureme's "nightly" feature

$ echo 'fn main() {}' | rustc +nightly-2022-07-24 - -Z self-profile -Z self-profile-counter=instructions-minus-irqs:u
thread 'opt hkqbnt908e7l8ng' panicked at 'assertion failed: end <= MAX_INTERVAL_VALUE', /cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/measureme-10.1.0/src/raw_event.rs:56:9

@jyn514 mentioned it never worked, to me, and... yeah, we added features = ["nightly"] but measureme 10.* is just broken, unlike measureme 9.*? (or did I never test multi-thread?)


So this is the cargo check equivalent (which is most of how I tested the counters):

$ echo 'fn main() {}' | rustc +nightly-2022-07-24 - -Z self-profile -Z self-profile-counter=instructions-minus-irqs:u --emit=metadata
$ summarize summarize unknown-crate-*.mm_profdata
Segmentation fault (core dumped)

That's a segfault in prettytable::TableSlice::get_all_column_width?? summarize summarize --json works tho.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.