Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AsmPrint] Correctly factor function entry count when dumping MBB frequencies #67826

Merged
merged 3 commits into from
Sep 30, 2023

Conversation

mtrofin
Copy link
Member

@mtrofin mtrofin commented Sep 29, 2023

The goal in #66818 was to capture function entry counts, but those are not the same as the frequency of the entry (machine) basic block. This fixes that, and adds explicit profiles to the test.

…quencies

The goal in PR llvm#66818 was to capture function entry counts, but those are not the same as the frequency of the entry (machine) basic block. This fixes that, and adds explicit profiles to the test.
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp Outdated Show resolved Hide resolved
// were called. This helps, for example, in a type of integration tests
// where we want to cross-validate the compiler's profile with a real
// profile.
// Using double precision because uint64 values used to encode mbb
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to mention that parsing double is harder than uint64 and now all values have that burden. It depends on how you are consuming them; up to you to reconsider.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Harder how? And what would be the alternative?

BTW, was a float until #66818

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double representations are slower to parse, particularly with standard library functions such as strtod, see https://github.com/fastfloat/fast_float.

BTW, was a float until #66818

If you haven't had any concerns before then it probably doesn't matter now.

Copy link
Contributor

@snehasish snehasish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

// were called. This helps, for example, in a type of integration tests
// where we want to cross-validate the compiler's profile with a real
// profile.
// Using double precision because uint64 values used to encode mbb
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double representations are slower to parse, particularly with standard library functions such as strtod, see https://github.com/fastfloat/fast_float.

BTW, was a float until #66818

If you haven't had any concerns before then it probably doesn't matter now.

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp Show resolved Hide resolved
const double EntryCount =
static_cast<double>(MF->getFunction().getEntryCount()->getCount());
const double EntryFrequency =
static_cast<double>(MBFI.getBlockFreq(&*MF->begin()).getFrequency());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid the BlockFrequency::getFrequency() function if you can. It breaks the abstraction and I think we could do interesting experiments trying different formats for BlockFrequency infos...

The code here can just do MachineBLockFrequencyInfo::getBlockFreqRelativeToEntryBlock(MBB) (multiplied with the entry count) I think.

Copy link
Member Author

@mtrofin mtrofin Sep 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can use it, and I'd prefer double precision, so changed getBlockFreqRelativeToEntryBlock to return double (can factor in a separate nfc if preferred)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would produce the desired values and not break the abstraction, but it will change the precision of the result, returning a float instead of a double. Not sure how much that matters in practice, but the intention here seems to be to use a double.

Also, if we want to preserve this abstraction, maybe we mark getFrequency() as private and use friend classes where necessary to make the API easier to use, depending upon how the implementation here ends up going?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like I didn't submit my review fast enough.

I'm not sure if it matters in practice, but getBlockFreqRelativeToEntryBlock is used in the register allocator for calculating spill weights (and in a couple other places, but mostly things like the ML register allocator stuff, PBQP). I don't think the compile time difference would be noticeable at all or if the increase in precision would end up impacting anything (maybe given how some cold branch weights are set by default), but it's something I think is worth considering.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if we want to preserve this abstraction, maybe we mark getFrequency() as private and use friend classes where necessary to make the API easier to use, depending upon how the implementation here ends up going?

I don't think we are in a position to enforce this (there is a lot of getFrequency() users around), but I'll try to ask for it in reviews of new code when I see it...

Copy link
Contributor

@boomanaiden154 boomanaiden154 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from my side.

Having the actual MBB frequencies (or as actual as they'll get after propagating through the compilation pipeine) will be nice for cost modelling applications. I've always found it interesting that MBFI doesn't return correct counts even relative to the function entry counts, but I guess that's a separate issue (or feature).

const double EntryCount =
static_cast<double>(MF->getFunction().getEntryCount()->getCount());
const double EntryFrequency =
static_cast<double>(MBFI.getBlockFreq(&*MF->begin()).getFrequency());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would produce the desired values and not break the abstraction, but it will change the precision of the result, returning a float instead of a double. Not sure how much that matters in practice, but the intention here seems to be to use a double.

Also, if we want to preserve this abstraction, maybe we mark getFrequency() as private and use friend classes where necessary to make the API easier to use, depending upon how the implementation here ends up going?

@MatzeB
Copy link
Contributor

MatzeB commented Sep 29, 2023

actual MBB frequencies

I suspect the terminology at use here is that "frequency" means a value relative to the entry block. While I think you are asking for execution "count"s here (like the value you get when you multiply a block frequency with the entry count).

@MatzeB
Copy link
Contributor

MatzeB commented Sep 29, 2023

I don't think I have a strong opinion on float vs double (though FWIW I also don't see a real indication that double would be strictly necessary). If performance is a concern then we really should cache the 1.0 / getEntryFreq() result in MachineBLockFrequencyInfo, but I also have no indication that perf is relevant here so I guess I'm good with switching the API to double.

@boomanaiden154
Copy link
Contributor

I suspect the terminology at use here is that "frequency" means a value relative to the entry block. While I think you are asking for execution "count"s here (like the value you get when you multiply a block frequency with the entry count).

Right, but I would expect the raw frequency (BlockFrequency::getFrequency()) of the entry block to match the function entry count rather than having to use something like FreqAdjustmentFactor like is done here. I'd assume this is some artifact of how the MBFI is computed, but maybe I'm missing something?

@mtrofin
Copy link
Member Author

mtrofin commented Sep 29, 2023

@MatzeB are there any changes left to do (the review still says "changes requested", but not seeing any left - or am I missing smth)

@mtrofin
Copy link
Member Author

mtrofin commented Sep 29, 2023

I suspect the terminology at use here is that "frequency" means a value relative to the entry block. While I think you are asking for execution "count"s here (like the value you get when you multiply a block frequency with the entry count).

Right, but I would expect the raw frequency (BlockFrequency::getFrequency()) of the entry block to match the function entry count rather than having to use something like FreqAdjustmentFactor like is done here. I'd assume this is some artifact of how the MBFI is computed, but maybe I'm missing something?

Ya, the bbs (incl. entry) are given uint64 values such that the rational number obtained when dividing a mbb's value to the entry's value can approximately map over the range of rational numbers representing various basic blocks' measured frequencies. So F.getEntryCount() is basically unrelated to the count value of the entry basic block.

Copy link
Contributor

@MatzeB MatzeB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@MatzeB MatzeB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mtrofin mtrofin merged commit f179486 into llvm:main Sep 30, 2023
mtrofin added a commit that referenced this pull request Sep 30, 2023
@mtrofin mtrofin deleted the bbdump branch September 30, 2023 01:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants