-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
speedup SiStripClusterizer(FromRaw) using ThreeThresholdAlgorithm #47061
speedup SiStripClusterizer(FromRaw) using ThreeThresholdAlgorithm #47061
Conversation
…efiting from .cc inlines
cms-bot internal usage |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47061/43220 |
A new Pull Request was created by @slava77 for master. It involves the following packages:
@atpathak, @cmsbuild, @consuegs, @jfernan2, @mandrenguyen, @perrotta can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
@cmsbuild please test |
type performance-improvements |
+1 Size: This PR adds an extra 32KB to repository Comparison SummarySummary:
|
these are apparently from non-reproducible workflows: .7 (all-mkfit), and phase-2 |
+1 |
+alca
|
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @antoniovilela, @rappoccio, @mandrenguyen (and backports should be raised in the release meeting by the corresponding L2) |
bool allBadBetween(uint16_t L, const uint16_t& R) const { | ||
while (++L < R && bad(L)) { | ||
}; | ||
return L == R; | ||
} | ||
static constexpr uint16_t kMaxStrips = 768; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could one think about sparing some empty entries by using vectors and resizing at construction time with the correct number of strips (4 APVs vs 6 APVs)? The size is known via the detid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mmusich
do you happen to know if the number of APVs is strictly connected to the already available variables in the SiStripClusterizerConditions::emplace_back
method: invGains
or connections
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think invGains.size()
should know about it:
cmssw/RecoLocalTracker/SiStripClusterizer/plugins/SiStripClusterizerConditionsESProducer.cc
Lines 68 to 71 in f0dc95b
std::vector<float> invGains; | |
invGains.reserve(6); | |
std::transform( | |
gainRange.first, gainRange.second, std::back_inserter(invGains), [](auto gain) { return 1.f / gain; }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using vectors
I expect that access to vector<bool>
is slower than array<bool
. I guess I can use vector<uint8_t>
or char
to have a matching performance
+1 |
I'm not sure why 16x; there are 14.5K strip modules. |
Could the following failure be related to this PR? |
nope, see #46853 (comment) |
Ah, thanks. Sorry for the noise. |
The primary goal was to speedup the full-unpacking configuration of
SiStripClusterizerFromRaw
.The following relatively straightforward updates are made:
ThreeThresholdAlgorithm
is passed using templates and most methods ofThreeThresholdAlgorithm
are inlinedSiStripClusterizerConditions
(the downside here is around 32MB of memory in conditions data and the cost of precomputation equal to around 10 events of strip unpacking cost)Overall
SiStripClusterizerFromRaw
is faster by 27% on ttbar relval Run3 MC, running with full unpacking (measured with callgrind on 300 events on the HLT config in CMSSW_14_1_4_patch5):No differences in physics results are expected.
In case backports are needed, the commits are made so that the backport is trivial down to 14_1_X.
@mmasciov