Cleaning up deprecated L1 O2O Tags from the CondDB #958

Open
panoskatsoulis opened this issue Nov 17, 2021 · 31 comments

Comments

@panoskatsoulis

This issue is a workspace for discussing the cleanup of old, deprecated, or unused L1Trigger Tags from the database.

FYI: @rekovic @hjkwon260

@hjkwon260

Here is the list of tags included in the Run3 offline GTs that need to be checked to see whether they can be dropped:

  • L1CaloEcalScale_CRAFT09_hlt
  • L1CaloGeometry_CRAFT09_hlt
  • L1CaloHcalScale_CRAFT09_hlt
  • L1EmEtScale_CRAFT09_hlt
  • L1GctChannelMask_CRAFT09v2_hlt
  • L1GctJetFinderParams_CRAFT09_hlt
  • L1GtBoardMaps_CRAFT09_hlt
  • L1GtParameters_CRAFT09_hlt
  • L1GtPrescaleFactorsAlgoTrig_CRAFT09v2_hlt
  • L1GtPrescaleFactorsTechTrig_CRAFT09v2_hlt
  • L1GtPsbSetup_CRAFT09v2_hlt
  • L1GtStableParameters_CRAFT09_hlt
  • L1GtTriggerMaskAlgoTrig_CRAFT09v2_hlt
  • L1GtTriggerMaskTechTrig_CRAFT09v2_hlt
  • L1GtTriggerMaskVetoAlgoTrig_CRAFT09_hlt
  • L1GtTriggerMaskVetoTechTrig_CRAFT09v2_hlt
  • L1GtTriggerMenu_CRAFT09_hlt
  • L1HfRingEtScale_CRAFT09_hlt
  • L1HtMissScale_CRAFT09_hlt
  • L1JetEtScale_CRAFT09_hlt
  • L1MuCSCPtLut_CRAFT09v2_hlt
  • L1MuCSCTFAlignment_CRAFT09_hlt
  • L1MuCSCTFConfiguration_CRAFT09v4_hlt
  • L1MuDTEtaPatternLut_CRAFT09_hlt
  • L1MuDTExtLut_CRAFT09_hlt
  • L1MuDTPhiLut_CRAFT09_hlt
  • L1MuDTPtaLut_CRAFT09_hlt
  • L1MuDTQualPatternLut_CRAFT09_hlt
  • L1MuDTTFMasks_CRAFT09_hlt
  • L1MuDTTFParameters_CRAFT09_hlt
  • L1MuGMTChannelMask_CRAFT09_hlt
  • L1MuGMTParameters_CRAFT09_hlt
  • L1MuGMTScales_CRAFT09_hlt
  • L1MuTriggerPtScale_CRAFT09_hlt
  • L1MuTriggerScales_CRAFT09_hlt
  • L1RCTChannelMask_CRAFT09v2_hlt
  • L1RCTNoisyChannelMask_CRAFT09_hlt
  • L1RCTParameters_CRAFT09_hlt
  • L1RPCBxOrConfig_CRAFT09v3_hlt
  • L1RPCConeDefinition_CRAFT09v2_hlt
  • L1RPCConfig_CRAFT09v2_hlt
  • L1RPCHsbConfig_CRAFT09v2_hlt
  • L1RPCHwConfig_v2_hlt
  • L1TriggerKeyList_CRAFT09_hlt
  • L1TriggerKey_CRAFT09v3_hlt

@panoskatsoulis
Author

Hello,
after searching the source code for these old Tags with the GitHub search function, I found no hardcoded reference to any of them (only some very old issues).

The next step is to check the Run3 data WFs using a custom-made GT [1] that drops the Tags mentioned above.
We think that if no WF crashes with this test tag, we should be safe to remove them (follow-up soon).

[1] 121X_dataRun3_L1TriggerTagRemoval_v1 - for diff press-here

FYI @hjkwon260 @tvami @rekovic

@tvami

tvami commented Feb 1, 2022

Hi @panoskatsoulis , do you have any news on this?

@panoskatsoulis
Author

Hi @tvami, not yet, I was taken up by other issues.
I will report more this week, before the next AlCa meeting.
Sorry for the delay.

@tvami

tvami commented Feb 2, 2022

Hi @panoskatsoulis
great thanks!
Could you please also have a look at cms-sw/cmssw#36806? That is actually a more urgent question.

Btw, I also tried to reach the L1T team on Mattermost here
https://mattermost.web.cern.ch/cms-l1t-dpg/channels/town-square/x1dwngcwkffkby4woft7qxi1ea
but got no answer.
What is the best channel to interact with you all?

Thanks!

@epalencia

Hi @tvami , you can reach us (Cécile and myself) at [email protected]

@tvami

tvami commented Feb 2, 2022

Hi @epalencia, great, thank you! I already tagged Cécile in cms-sw#36806.
I'll write an email next time if it's urgent.

@panoskatsoulis
Author

Hi all, getting back to this: can somebody point me to a general Run3 data WF that we want to test the above test GT against?

I am currently using this command for debugging:
cmsDriver.py test_data_run3 -s RAW2DIGI,RECO,ALCA --data --conditions 121X_dataRun3_L1TriggerTagRemoval_v1 --era Run3 --eventcontent RECO --nThreads=1 --dasquery="file run=346450 dataset=/MinimumBias9/Commissioning2021-v1/RAW"

@tvami

tvami commented Feb 7, 2022

138.4 is a prompt reco wf, that should probably work, so the same command I gave you on the other thread should suffice too

runTheMatrix.py --what standard -l 138.4 --command "--conditions 121X_dataRun3_L1TriggerTagRemoval_v1 -n 1000" -t 8

Please note that the GT was made in CMSSW 12_1_X, so it probably won't work in 12_3_X IBs. Let's try to converge in 12_1_X, and then we can produce a new test GT in 12_3_X that can go together with a possible PR that excludes the non-required tag consumption for Run-3.

@panoskatsoulis
Author

panoskatsoulis commented Feb 8, 2022

Hello guys,
so here is the first report on these crashes. I am using the command posted by @tvami

runTheMatrix.py --what standard -l 138.4 --command "--conditions 121X_dataRun3_L1TriggerTagRemoval_v1 -n 1000" -t 8

I found various modules crashing and I tried to eliminate them one-by-one to document them but for some it's not that easy because they produce items consumed by following modules. So I've thought to debug it step-by-step.

Here is a first list of the modules found crashing, the missing record, and the deprecated/removed Tag from 121X_dataRun3_v8 that was providing the Rcd.

| Possible Group | Module | Record | Removed Tag |
|---|---|---|---|
| DQM/JetMET | METAnalyzer | L1GtTriggerMenuRcd | L1GtTriggerMenu_CRAFT09_hlt |
| DQM/JetMET | JetAnalyzer | L1GtTriggerMenuRcd | L1GtTriggerMenu_CRAFT09_hlt |
| DQM/BPH | BPHMonitor | L1GtStableParametersRcd | L1GtStableParameters_CRAFT09_hlt |
| DQM/Strips | SiStripMonitorCluster | L1GtStableParametersRcd | L1GtStableParameters_CRAFT09_hlt |
| DQM/Muons | MuonMonitor | L1GtStableParametersRcd | L1GtStableParameters_CRAFT09_hlt |
| L1Trigger | L1GlobalTriggerRawToDigi | L1MuTriggerScalesRcd | L1MuTriggerScales_CRAFT09_hlt |
| L1Trigger | L1GlobalTriggerRawToDigi | L1MuTriggerPtScaleRcd | L1MuTriggerPtScale_CRAFT09_hlt |
| L1Trigger | L1GlobalTriggerRawToDigi | L1GtBoardMapsRcd | L1GtBoardMaps_CRAFT09_hlt |

For the last one I need to talk with @epalencia and we'll report more
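
To see at a glance which modules block the removal of each record, the rows of the table above can be regrouped by record. This is only an illustrative sketch: the rows are copied from the table, while the `CRASHES` list and the `modules_by_record` helper are hypothetical names introduced here, not part of CMSSW.

```python
from collections import defaultdict

# (possible group, module, missing record, removed tag), copied from the table above.
CRASHES = [
    ("DQM/JetMET", "METAnalyzer", "L1GtTriggerMenuRcd", "L1GtTriggerMenu_CRAFT09_hlt"),
    ("DQM/JetMET", "JetAnalyzer", "L1GtTriggerMenuRcd", "L1GtTriggerMenu_CRAFT09_hlt"),
    ("DQM/BPH", "BPHMonitor", "L1GtStableParametersRcd", "L1GtStableParameters_CRAFT09_hlt"),
    ("DQM/Strips", "SiStripMonitorCluster", "L1GtStableParametersRcd", "L1GtStableParameters_CRAFT09_hlt"),
    ("DQM/Muons", "MuonMonitor", "L1GtStableParametersRcd", "L1GtStableParameters_CRAFT09_hlt"),
    ("L1Trigger", "L1GlobalTriggerRawToDigi", "L1MuTriggerScalesRcd", "L1MuTriggerScales_CRAFT09_hlt"),
    ("L1Trigger", "L1GlobalTriggerRawToDigi", "L1MuTriggerPtScaleRcd", "L1MuTriggerPtScale_CRAFT09_hlt"),
    ("L1Trigger", "L1GlobalTriggerRawToDigi", "L1GtBoardMapsRcd", "L1GtBoardMaps_CRAFT09_hlt"),
]


def modules_by_record(rows):
    """Map each missing record to the list of modules that request it."""
    grouped = defaultdict(list)
    for _group, module, record, _tag in rows:
        grouped[record].append(module)
    return dict(grouped)


if __name__ == "__main__":
    for record, modules in modules_by_record(CRASHES).items():
        print(f"{record}: {', '.join(modules)}")
```

A record can only be dropped safely once every module listed next to it has been fixed or excluded from the Run3 workflow.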

@panoskatsoulis
Author

Once we get more info on these modules and why they use so many deprecated Tags, I can keep debugging; I need to get past these crashes to see whether other modules crash as well.
We need the contacts of those responsible.

@tvami

tvami commented Feb 8, 2022

Thanks @panoskatsoulis
Are these used in stage-1 or stage-2 L1 reco?

@panoskatsoulis
Author

panoskatsoulis commented Feb 9, 2022

Hi @tvami ,
regarding only the L1 module: given the name L1GlobalTriggerRawToDigi, this should be the GT unpacker, so I would say stage2, since it also runs with the Run3 command you gave me.
I will check and ask today in the SW meeting and will come back to you.

@panoskatsoulis
Author

I had a quick talk with one of the GT people, who confirms that this should be an old module.
Also, something that escaped my attention is that the naming scheme says L1 and not L1T, which is what the stage2 code is known to use.
This really seems to be an old module that is only getting initialized (so it tries to fetch the Tag), and we could try to deactivate it.
I will look into this and try to make a PR to keep the debugging going.

However, I'm not sure that I can help much with the other modules; we need people from those groups.
It's quite possible that it is the same story, though.

@tvami

tvami commented Feb 9, 2022

Hi @panoskatsoulis
thanks for tracking this down,
I suggested that Aaron use the modifier approach for another problem here:
https://github.com/cms-sw/cmssw/pull/36916/files#r802035761
I think you could do something similar, so that L1GlobalTriggerRawToDigi is excluded from Run-3 workflows?

@panoskatsoulis
Author

Possibly, I have to test it

@tvami

tvami commented Feb 16, 2022

@panoskatsoulis do you have any news on this?

@panoskatsoulis
Author

Hello, in preparation for the discussion today, I also performed the tests below to better pinpoint where the dependencies are.

Specifically, after modifying wf 138.4 by removing steps, here are the results:

  • Raw2Digi → Passed
  • Raw2Digi + L1Reco → Crashed (L1 modules)
  • Raw2Digi + RECO → Passed
  • Raw2Digi + DQM → Crashed (DQM modules)
  • Raw2Digi + RECO + AlCa → Passed

Raw2Digi + RECO + AlCa, plus:

  • +RECO_output → Crashed (fetches deprecated "L1" modules)
  • +MINIAOD_output → Crashed (fetches deprecated "L1" modules)
  • +DQM_output → Passed (!) dummy output file
  • +AlCa_output → Crashed (prefetch for ‘ALCARECOStreamTkAlMinBias’)
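
The step combinations in the list above could be turned into cmsDriver.py command lines with a small helper like the one below. This is a rough sketch under assumptions: it uses only the options that appear in the cmsDriver.py command posted earlier in this thread, it leaves out the input selection (--dasquery), and the `STEP_SETS` list and `cms_driver_command` function are hypothetical names. Real step specifications may need more detail than the bare step names used here.

```python
import shlex

# Step combinations from the list above, written with the cmsDriver step
# names that appear elsewhere in this thread.
STEP_SETS = [
    ["RAW2DIGI"],
    ["RAW2DIGI", "L1Reco"],
    ["RAW2DIGI", "RECO"],
    ["RAW2DIGI", "DQM"],
    ["RAW2DIGI", "RECO", "ALCA"],
]


def cms_driver_command(steps, conditions, name="test_data_run3"):
    """Build a cmsDriver.py command line for one step combination."""
    args = [
        "cmsDriver.py", name,
        "-s", ",".join(steps),
        "--data",
        "--conditions", conditions,
        "--era", "Run3",
        "--eventcontent", "RECO",
        "--nThreads=1",
    ]
    # Quote every argument so the string can be pasted into a shell as-is.
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    for steps in STEP_SETS:
        print(cms_driver_command(steps, "121X_dataRun3_L1TriggerTagRemoval_v1"))
```

Running one combination at a time keeps each crash attributable to the step that was added last.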

@tvami

tvami commented Feb 21, 2022

I decoupled the DQM modules in cms-sw#37016

@tvami

tvami commented Feb 21, 2022

@panoskatsoulis could you post the error message you got for

Raw2Digi + DQM → Crashed (DQM modules)

so that we can give the full list to the DQM experts?

@tvami

tvami commented Feb 22, 2022

@panoskatsoulis please note this PR cms-sw#37033

@tvami

tvami commented Feb 24, 2022

Hi @panoskatsoulis, now that cms-sw#37033 is about to be merged, you could use the next IB for testing. I made a queue for this testing, named 123X_dataRun3_Prompt_RemoveL1Tags_Queue. Please use it as a GT (only for now; in general one should never do this).

A queue is a live GT, so we can make changes dynamically. For now I've removed only L1GtTriggerMenuRcd. Please check the wf; for now I expect only L1GlobalTriggerRawToDigi to fail.

After that's fixed, we can remove the next record from the very same queue.

@tvami

tvami commented Mar 1, 2022

Hi @panoskatsoulis can you confirm that everything is fine with 123X_dataRun3_Prompt_RemoveL1Tags_Queue in for example CMSSW_12_3_X_2022-02-28-2300 ?

@panoskatsoulis
Author

Hello @tvami , sorry for the long pause on this
thanks a lot for the new 123X queue, I will test and report back today.

We will also have some chat in the L1 Weekly meeting in a while about this, so we will come back with more info and possibly also people who can resolve this if it's still crashing in the 123X

@panoskatsoulis
Author

Hi again,
I'm still getting METAnalyzer errors with the build that you suggested, CMSSW_12_3_X_2022-02-28-2300.
(I assume the new DQM PRs should already be merged in this 123X nightly build, correct?)

Here is my test command:

runTheMatrix.py --what standard -l 138.4 --command "--conditions 123X_dataRun3_Prompt_RemoveL1Tags_Queue -n 1000" -t 8

and here is my output on CMSSW_12_3_X_2022-02-28:

%MSG-w UnusedProductsForCanDeleteEarly:  AfterModDestruction  01-Mar-2022 14:27:36 CET pre-events
The following products in the 'canDeleteEarly' list are not used in this job and will be ignored.
 If possible, remove the producer from the job or add the product to the producer's own 'mightGet' list.
 recoIsoDepositedmValueMap_gedElPFIsoDepositChargedAll__reRECO
 recoIsoDepositedmValueMap_gedElPFIsoDepositCharged__reRECO
 recoIsoDepositedmValueMap_gedElPFIsoDepositGamma__reRECO
 recoIsoDepositedmValueMap_gedElPFIsoDepositNeutral__reRECO
 recoIsoDepositedmValueMap_gedElPFIsoDepositPU__reRECO
%MSG
PersistencyIO    INFO  +++ Set Streamer to dd4hep::OpaqueDataBlock
DD4hep           WARN  ++ Using globally Geant4 unit system (mm,ns,MeV)
DD4CMS           INFO  +++ Processing the CMS detector description xml-memory-buffer
Detector         INFO  *********** Created World volume with size: 101000 101000 450000
Detector         INFO  +++ Patching names of anonymous shapes....
DDDefinition     INFO  +++ Finished processing xml-memory-buffer
----- Begin Fatal Exception 01-Mar-2022 14:27:59 CET-----------------------
An exception of category 'NoRecord' occurred while
   [0] Processing global begin Run run: 346512
   [1] Calling method for module METAnalyzer/'pfMetDQMAnalyzer'
Exception Message:
No "L1GtTriggerMenuRcd" record found in the EventSetup.

 Please add an ESSource or ESProducer that delivers such a record.
----- End Fatal Exception -------------------------------------------------
01-Mar-2022 14:27:59 CET  Closed file root://eoscms.cern.ch//eos/cms/store/data/Commissioning2021/MinimumBias/RAW/v1/000/346/512/00000/be4e0e99-6d25-4b6f-8648-1adefb79c7bf.root
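
When several workflows have to be checked, the module and the missing record can be pulled out of a log like the one above automatically. The following is a minimal sketch that assumes the "Fatal Exception" block always has the layout shown in this log; the function name `missing_records` and the regular expressions are illustrative and not part of any CMSSW tool.

```python
import re

# Matches one "Fatal Exception" block as printed in the log above.
FATAL_BLOCK = re.compile(
    r"-+ Begin Fatal Exception.*?-+ End Fatal Exception",
    re.DOTALL,
)
# Module type and label, e.g. METAnalyzer/'pfMetDQMAnalyzer'.
MODULE = re.compile(r"Calling method for module (\w+)/'(\w+)'")
# Name of the record that the EventSetup could not deliver.
RECORD = re.compile(r'No "(\w+)" record found in the EventSetup')


def missing_records(log_text):
    """Return (module_type, module_label, record) for each NoRecord exception."""
    found = []
    for block in FATAL_BLOCK.findall(log_text):
        module = MODULE.search(block)
        record = RECORD.search(block)
        if module and record:
            found.append((module.group(1), module.group(2), record.group(1)))
    return found


# Excerpt of the log above, used as sample input.
SAMPLE = """\
----- Begin Fatal Exception 01-Mar-2022 14:27:59 CET-----------------------
An exception of category 'NoRecord' occurred while
   [0] Processing global begin Run run: 346512
   [1] Calling method for module METAnalyzer/'pfMetDQMAnalyzer'
Exception Message:
No "L1GtTriggerMenuRcd" record found in the EventSetup.
----- End Fatal Exception -------------------------------------------------
"""

if __name__ == "__main__":
    for module_type, label, record in missing_records(SAMPLE):
        print(f"{module_type} ({label}) needs {record}")
```

Since a job stops at the first fatal exception, each run yields at most one such entry, so the list of affected modules only grows as the earlier crashes are fixed or bypassed.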

@tvami

tvami commented Mar 1, 2022

OK, there is another PR that moves different clients to Run-3 eras. Let's see whether that fixes pfMetDQMAnalyzer and come back to this when it merges. Thanks for checking.

@panoskatsoulis
Author

Hi, returning to this, I ran the following test in release 12_3_0_pre6.
This time I deliberately did not run the DQM modules, by modifying the faulty step2 of the WF. The steps I followed are below.

  1. runTheMatrix for 138.4
  2. got the cmsDriver cmd (step2) and removed steps ALCA, DQM (so got rid of the above issue - still present in pre6)
  3. ran cmsDriver.py ... -s RAW2DIGI, L1Reco, RECO ...
    The resulting cfg runs "error-less".
    (Actually, the L1Reco sequence is entirely empty. @rekovic or @bundocka, is it possible that the entire L1Reco is a deprecated "L1" step that is no longer in use?)

Also, @tvami is there any change that I've missed from your side? (again sorry for the 20 days break on this)
Seems that from L1 this is good for pre6

(testing more 12_3_0 pre-releases atm)

@tvami

tvami commented Mar 30, 2022

Hi @panoskatsoulis sorry, I was on vacation.

There is no change in the queue, I removed a single tag in it, so if this works, we can remove the next one.
Although, if I understand correctly, it is still consumed in the DQM step, so it would be good if you could get in touch with the people responsible in the L1T DQM team about fixing the DQM code (I'm not sure who those people are).

@panoskatsoulis
Author

panoskatsoulis commented Mar 30, 2022

Hi @tvami no worries

the strange thing in 12_3_0 pre6 is that the entire "L1Reco" sequence was empty.
And it was this sequence, containing the old modules, that was crashing.
So even if you remove further Tags from the queue, I don't think it makes a difference in the result (excluding the DQM step).

Did you find L1 DQM modules that were requesting the old tags?
I don't think I saw any of them, but I didn't go into depth on the DQM modules (because I thought the crashing ones were not L1 modules). Do you have a list of the L1 modules?
I can look into these.

Also, @epalencia, can you add here L1 DQM people who might be able to help look at the L1 DQM side too?

@tvami

tvami commented Mar 30, 2022

I haven't tested this recently. Let me add our software coordinator, Yuan (@yuanchao), from our side; I asked him to look into this from the AlCa side.

@epalencia

I'm adding @vukasinmilosevic, who might know about the L1 DQM side
