-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AsmPrint] Correctly factor function entry count when dumping MBB frequencies #67826
Conversation
…quencies The goal in PR llvm#66818 was to capture function entry counts, but those are not the same as the frequency of the entry (machine) basic block. This fixes that, and adds explicit profiles to the test.
// were called. This helps, for example, in a type of integration tests | ||
// where we want to cross-validate the compiler's profile with a real | ||
// profile. | ||
// Using double precision because uint64 values used to encode mbb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to mention that parsing double is harder than uint64 and now all values have that burden. It depends on how you are consuming them; up to you to reconsider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Harder how? And what would be the alternative?
BTW, was a float until #66818
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double representations are slower to parse, particularly with standard library functions such as strtod, see https://github.com/fastfloat/fast_float.
BTW, was a float until #66818
If you haven't had any concerns before then it probably doesn't matter now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
// were called. This helps, for example, in a type of integration tests | ||
// where we want to cross-validate the compiler's profile with a real | ||
// profile. | ||
// Using double precision because uint64 values used to encode mbb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double representations are slower to parse, particularly with standard library functions such as strtod, see https://github.com/fastfloat/fast_float.
BTW, was a float until #66818
If you haven't had any concerns before then it probably doesn't matter now.
const double EntryCount = | ||
static_cast<double>(MF->getFunction().getEntryCount()->getCount()); | ||
const double EntryFrequency = | ||
static_cast<double>(MBFI.getBlockFreq(&*MF->begin()).getFrequency()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please avoid the BlockFrequency::getFrequency()
function if you can. It breaks the abstraction and I think we could do interesting experiments trying different formats for BlockFrequency
infos...
The code here can just do MachineBLockFrequencyInfo::getBlockFreqRelativeToEntryBlock(MBB)
(multiplied with the entry count) I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can use it, and I'd prefer double precision, so changed getBlockFreqRelativeToEntryBlock
to return double
(can factor in a separate nfc if preferred)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would produce the desired values and not break the abstraction, but it will change the precision of the result, returning a float
instead of a double
. Not sure how much that matters in practice, but the intention here seems to be to use a double
.
Also, if we want to preserve this abstraction, maybe we mark getFrequency()
as private and use friend classes where necessary to make the API easier to use, depending upon how the implementation here ends up going?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like I didn't submit my review fast enough.
I'm not sure if it matters in practice, but getBlockFreqRelativeToEntryBlock
is used in the register allocator for calculating spill weights (and in a couple other places, but mostly things like the ML register allocator stuff, PBQP). I don't think the compile time difference would be noticeable at all or if the increase in precision would end up impacting anything (maybe given how some cold branch weights are set by default), but it's something I think is worth considering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if we want to preserve this abstraction, maybe we mark getFrequency() as private and use friend classes where necessary to make the API easier to use, depending upon how the implementation here ends up going?
I don't think we are in a position to enforce this (there is a lot of getFrequency()
users around), but I'll try to ask for it in reviews of new code when I see it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good from my side.
Having the actual MBB frequencies (or as actual as they'll get after propagating through the compilation pipeine) will be nice for cost modelling applications. I've always found it interesting that MBFI doesn't return correct counts even relative to the function entry counts, but I guess that's a separate issue (or feature).
const double EntryCount = | ||
static_cast<double>(MF->getFunction().getEntryCount()->getCount()); | ||
const double EntryFrequency = | ||
static_cast<double>(MBFI.getBlockFreq(&*MF->begin()).getFrequency()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would produce the desired values and not break the abstraction, but it will change the precision of the result, returning a float
instead of a double
. Not sure how much that matters in practice, but the intention here seems to be to use a double
.
Also, if we want to preserve this abstraction, maybe we mark getFrequency()
as private and use friend classes where necessary to make the API easier to use, depending upon how the implementation here ends up going?
I suspect the terminology at use here is that "frequency" means a value relative to the entry block. While I think you are asking for execution "count"s here (like the value you get when you multiply a block frequency with the entry count). |
I don't think I have a strong opinion on |
Right, but I would expect the raw frequency ( |
@MatzeB are there any changes left to do (the review still says "changes requested", but not seeing any left - or am I missing smth) |
Ya, the bbs (incl. entry) are given uint64 values such that the rational number obtained when dividing a mbb's value to the entry's value can approximately map over the range of rational numbers representing various basic blocks' measured frequencies. So |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The goal in #66818 was to capture function entry counts, but those are not the same as the frequency of the entry (machine) basic block. This fixes that, and adds explicit profiles to the test.