-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slowness in cmsRun starts #34633
Comments
assign core |
New categories assigned: core @Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks |
A new Issue was created by @makortel Matti Kortelainen. @Dr15Jones, @perrotta, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
I ran the step2 of 10024.0 with
@cms-sw/l1-l2 @cms-sw/csc-dpg-l2 Would you have any thoughts why these files are opened repeatedly (540 times each)? |
Testing in an earlier IB CMSSW_12_0_X_2021-07-18-2300 I get time stamp difference between the first CSC txt file open and the last open to be 45 s (the number of file open calls is the same). Hmm. |
I ran the 10024.0 step2 in IgProf in CMSSW_12_0_X_2021-07-26-1100 for 1 event, the profile is here The construction of 98.51 s (i.e. 85 % of the job startup) is spent in the constructor of |
@dildick please take a look at this (yours are the most recent PRs merged in https://github.com/cms-sw/cmssw/commits/master/L1Trigger/CSCTriggerPrimitives) |
"Would you have any thoughts why these files are opened repeatedly (540 times each)?" There are 540 CSC chambers, several LUT readers per chamber. I suppose this could be sped up by making a LUT managers class, instantiating them once, and then pass a shared pointer to each chamber. |
These LUTs are not actually being used by any WF
These should be only if a certain Run-3 flag is set
but that flag is still False. |
@dildick yes, that sounds like a better implementation - in particular, it seems like this info could be obtained via EventSetup. (Then you don't have to pass a shared pointer manually; the framework will give read-only access to whatever needs it.) Do you know why this behavior would have changed with the recent PRs? |
then something is wrong @dildick . I also ran 10024.0 step2 (
|
I recently modified the GEM-CSC trigger for data vs emulation studies. It relies heavily on LUTs, but they are all in This line needs to be in a How would such thing work via EventSetup? Is there an example somewhere I can look into? |
should only be loaded if a flag is set in the CSCALCTCrossCLCT class. |
A PR I can suggest in the mean time to speed up the code (until I have a proper solution in place):
|
@dildick here's one of my old PRs introducing an ESProducer and ESProduct for HCAL radiation damage, which also reads constants from txt files that are then accessed by different parts of the code: #18118 In particular, look at:
And for a simple usage example:
(though modern code should use |
#34638 would temporarily fix some of the problems. |
There is a record in Basically, I would need to define something like |
Thanks @dildick for the quick (partial) fix!
From the framework point of view it is perfectly fine to put many kinds of ES products into the same Record, as long as all those products have the same IOV (technically it is the Record's job denote the IOV). The ESSource for cmssw/CalibMuon/CSCCalibration/plugins/CSCL1TPParameters.cc Lines 33 to 37 in 4bbdac2
so in principle it would be possible to put both ESProducts into the same Record (assuming none of this stuff has ever been put into the CondDB, that imposes different constraints). Written that, there could be other reasons to prefer a separate Record class for the new ESProduct, and that would be fine as well. |
I could add that I'm not convinced that these CSC txt files would be the only cause for the overall slowness (e.g. in the RelVal-INPUT tests). We can always speculate with EOS, CVMFS, Frontier etc being slow, but some further investigation into other potential issues could be useful. |
If I understand correctly I would have to add files like
for a new record called CSCL1TPLookupTable that contains all the lookup tables for the CSC local trigger. At the moment, access to lookup tables is provided through So that needs to be replaced by the ESProduct It looks like The locations of the lookup tables go into |
@dildick Your overall plan looks reasonable. Assuming the |
True i also considered it and ive seen a lot of people had jammed repo's |
Hi @slava77, yes, I can attempt to convert - should be successful and should speed things up here. As for reducing the model size, possibly yes, I'll enquire. However, I am about to take vacations in August. What is the timeline? |
@bainbrid |
@dildick thanks for the quick turn around! I have a few comments const edm::ParameterSet& pset_; needs to be
as the lifetime of the object passed to the module's constructor is not guaranteed to be longer than the module. In your void set_cclutPosition(const t_lut& lut) { cclutPosition_ = lut; } pass the argument by value and then do a void set_cclutPosition(t_lut lut) { cclutPosition_ = std::move(lut); } Then when calling the lut-> set_cclutPosition(std::move(cclutPosition)); This will avoid a copy of the entire container. You can get rid of the static const std::vector<int> wg_min_hs_ME1a_; and, instead, in the .cc file just use an anonyous namespace with namespace {
constexpr std::array<int, 48> CSCALCTCrossCLCT::wg_min_hs_ME1a_ = {...};
...
} |
Unfortunately not... My vacation starts tomorrow, and the conversion will likely require some small dev and some careful validation. I'll get to it as soon as I can. |
I created #34707. Have a nice vacation. |
@Dr15Jones I implemented your suggestions. Testing the code now. Getting this in WF 11634.0.
Where/how should I insert the ES producer |
I think I need to inject it here: https://github.com/cms-sw/cmssw/blob/master/L1Trigger/L1TMuon/python/simDigis_cff.py#L16 and other trigger cff files that load the muon trigger. |
Seems reasonable. In principal, any configuration that directly names the module of type 'CSCTriggerPrimitivesProducer' should have the ESProducer added. I do such checks by doing > git grep CSCTriggerPrimitivesProducer | grep '\.py' In that case only one hit
so it looks like this is more useful > git grep cscTriggerPrimitiveDigis_cfi | grep '\.py' which yields
|
@Dr15Jones I have a running version of the CSC/GEM-CSC trigger with the lookup tables in eventsetup objects. Going to test it on a single muon relval and submit the PR later this week. I should check though if the HLT step is not crashing... I switched this off yesterday in my test. |
@francescobrivio I'm afraid @makortel is on vacation right now. It seems like if you did > runTheMatrix.py -l 10024.0 -t 8 --maxSteps=2 Then step2 would be the one to look at. I'd suggest running valgrind on the step2 process both before and after applying the pull request and then look at the relative timing for the constructor of the module in question. |
Thanks @Dr15Jones ! We'll try that then! (FYI @dildick) |
#34735 is only partial fix. |
Hi all, I will try to tackle this today. Can somebody please give me the commands used to generate the (IgProf?) profiling logs above so that I can test myself? Otherwise, I can just create the PR and let somebody else measure the effect? |
we can test it in the PR with "enable profiling" |
Here is the PR: #34908 |
Yes |
Last week @Dr15Jones and I noticed some slowness in
cmsRun
job startups (like up to 20 min). I noticed todayin #34621 (comment).
I then tested locally workflow 10024.0 in CMSSW_12_0_X_2021-07-26-1100, and noticed the step2 to take 2 minutes from
Successfully opened file
andBegin processing the 1st record
. The job really should not take 2 minutes to get there.The text was updated successfully, but these errors were encountered: