-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Track classifier using TensorFlow - continuation #32128
Conversation
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32128/19791
Code check has found code style and quality issues which could be resolved by applying following patch(s)
|
Apparently
(maybe on the same go the API could be changed to return std::unique_ptr to make the ownership semantics clear)
FYI @riga @mialiu149 The public interface of EventSetup products must disallow modifications. |
You could if you would also address #31682 (comment), I.e.
Then |
Interesting, I wonder why. Can you show the stack trace (even if it probably won't help much)? |
Yes, we should add that method.
Sounds reasonable, but I'm not sure about the implications yet. I'll look into it. |
@minxiyang can you please address code checks? scram b code-checks |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32128/19847
|
A new Pull Request was created by @minxiyang (Minxi Yang) for master. It involves the following packages: RecoTracker/FinalTrackSelectors @perrotta, @jpata, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some reco comments.
Please address also the core team comment about simplifying/correcting how the GraphDef is loaded through the ES infrastructure: https://github.com/cms-sw/cmssw/pull/32128/files#issuecomment-726427600
Thanks!
@@ -0,0 +1,2 @@ | |||
11024.0_TTbar_13+2018PU+TTbar_13TeV_TuneCUETP8M1_GenSim+DigiPU+RecoFakeHLTPU+HARVESTFakeHLTPU+Nano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED Step4-PASSED - time date Tue Nov 10 17:34:43 2020-date Tue Nov 10 17:32:23 2020; exit: 0 0 0 0 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file should be removed from the PR
}; | ||
|
||
namespace { | ||
struct TfDnn { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the comment #31682 (comment) from the original PR should be addressed:
- if you want to describe a simple data structure with no constructors/destructors use
struct
- if constructors, destructors or nontrivial methods are needed, use
class
Here the reco recommendation is to use a class rather than a struct, unless there is a specific reason to use a struct.
float operator()(reco::Track const& trk, | ||
reco::BeamSpot const& beamSpot, | ||
reco::VertexCollection const& vertices) const { | ||
Point bestVertex = getBestVertex(trk, vertices); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here a copy is being made of the vertex, is this intended? If not, const auto& bestVertex = ...
should be fine.
@@ -0,0 +1,3 @@ | |||
#include "PhysicsTools/TensorFlow/interface/TensorFlow.h" | |||
#include "FWCore/Utilities/interface/typelookup.h" | |||
TYPELOOKUP_DATA_REG(tensorflow::Session); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per #31682 (comment), this should be unnecessary and should be removed.
@@ -0,0 +1,13 @@ | |||
qualityCutDictionary = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would ask to put a reference/indico URL to the slides where these numbers were presented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a reference URL here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be addressed for the final version
@@ -0,0 +1,22 @@ | |||
#ifndef TrackingTools_Records_TfGraphRecord_TfGraphRecord_h | |||
#define TrackingTools_Records_TfGraphRecord_TfGraphRecord_h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CMS convention for the header guard would be
TrackingTools_Records_TfGraphRecord_h
: tfDnnLabel_(cfg.getParameter<std::string>("tfDnnLabel")) | ||
|
||
{} | ||
TfDnnCache* cache_ = new TfDnnCache(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you do the initialization in the constructor instead of at variable declaration?
@cmsbuild please test |
The tests are being triggered in jenkins.
|
-1 Tested at: 9262727 CMSSW: CMSSW_11_2_X_2020-11-18-2300 I found follow errors while testing this PR Failed tests: RelVals AddOn
When I ran the RelVals I found an error in the following workflows: runTheMatrix-results/135.4_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS/step1_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS.log5.1 step1 runTheMatrix-results/5.1_TTbar+TTbarFS+HARVESTFS/step1_TTbar+TTbarFS+HARVESTFS.log140.53 step2 runTheMatrix-results/140.53_RunHI2011+RunHI2011+RECOHID11+HARVESTDHI/step2_RunHI2011+RunHI2011+RECOHID11+HARVESTDHI.log140.56 step2 runTheMatrix-results/140.56_RunHI2018+RunHI2018+RECOHID18+HARVESTDHI18/step2_RunHI2018+RunHI2018+RECOHID18+HARVESTDHI18.log4.22 step3 runTheMatrix-results/4.22_RunCosmics2011A+RunCosmics2011A+RECOCOSD+ALCACOSD+SKIMCOSD+HARVESTDC/step3_RunCosmics2011A+RunCosmics2011A+RECOCOSD+ALCACOSD+SKIMCOSD+HARVESTDC.log4.53 step3 runTheMatrix-results/4.53_RunPhoton2012B+RunPhoton2012B+HLTD+RECODR1reHLT+HARVESTDR1reHLT/step3_RunPhoton2012B+RunPhoton2012B+HLTD+RECODR1reHLT+HARVESTDR1reHLT.log136.731 step3 runTheMatrix-results/136.731_RunSinglePh2016B+RunSinglePh2016B+HLTDR2_2016+RECODR2_2016reHLT_skimSinglePh_HIPM+HARVESTDR2/step3_RunSinglePh2016B+RunSinglePh2016B+HLTDR2_2016+RECODR2_2016reHLT_skimSinglePh_HIPM+HARVESTDR2.log136.793 step3 runTheMatrix-results/136.793_RunDoubleEG2017C+RunDoubleEG2017C+HLTDR2_2017+RECODR2_2017reHLT_skimDoubleEG_Prompt+HARVEST2017/step3_RunDoubleEG2017C+RunDoubleEG2017C+HLTDR2_2017+RECODR2_2017reHLT_skimDoubleEG_Prompt+HARVEST2017.log136.874 step3 runTheMatrix-results/136.874_RunEGamma2018C+RunEGamma2018C+HLTDR2_2018+RECODR2_2018reHLT_skimEGamma_Offline_L1TEgDQM+HARVEST2018_L1TEgDQM/step3_RunEGamma2018C+RunEGamma2018C+HLTDR2_2018+RECODR2_2018reHLT_skimEGamma_Offline_L1TEgDQM+HARVEST2018_L1TEgDQM.log8.0 step4 runTheMatrix-results/8.0_BeamHalo+BeamHalo+DIGICOS+RECOCOS+ALCABH+HARVESTCOS/step4_BeamHalo+BeamHalo+DIGICOS+RECOCOS+ALCABH+HARVESTCOS.log101.0 step1 runTheMatrix-results/101.0_SingleElectronE120EHCAL+SingleElectronE120EHCAL/step1_SingleElectronE120EHCAL+SingleElectronE120EHCAL.log7.3 step3 runTheMatrix-results/7.3_CosmicsSPLoose_UP18+CosmicsSPLoose_UP18+DIGICOS_UP18+RECOCOS_UP18+ALCACOS_UP18+HARVESTCOS_UP18/step3_CosmicsSPLoose_UP18+CosmicsSPLoose_UP18+DIGICOS_UP18+RECOCOS_UP18+ALCACOS_UP18+HARVESTCOS_UP18.log1000.0 step2 runTheMatrix-results/1000.0_RunMinBias2011A+RunMinBias2011A+TIER0+SKIMD+HARVESTDfst2+ALCASPLIT/step2_RunMinBias2011A+RunMinBias2011A+TIER0+SKIMD+HARVESTDfst2+ALCASPLIT.log1001.0 step2 runTheMatrix-results/1001.0_RunMinBias2011A+RunMinBias2011A+TIER0EXP+ALCAEXP+ALCAHARVDSIPIXELCALRUN1+ALCAHARVD1+ALCAHARVD2+ALCAHARVD3+ALCAHARVD4+ALCAHARVD5/step2_RunMinBias2011A+RunMinBias2011A+TIER0EXP+ALCAEXP+ALCAHARVDSIPIXELCALRUN1+ALCAHARVD1+ALCAHARVD2+ALCAHARVD3+ALCAHARVD4+ALCAHARVD5.log1306.0 step3 runTheMatrix-results/1306.0_SingleMuPt1_UP15+SingleMuPt1_UP15+DIGIUP15+RECOUP15+HARVESTUP15/step3_SingleMuPt1_UP15+SingleMuPt1_UP15+DIGIUP15+RECOUP15+HARVESTUP15.log9.0 step3 runTheMatrix-results/9.0_Higgs200ChargedTaus+Higgs200ChargedTaus+DIGI+RECO+HARVEST/step3_Higgs200ChargedTaus+Higgs200ChargedTaus+DIGI+RECO+HARVEST.log1330.0 step3 runTheMatrix-results/1330.0_ZMM_13+ZMM_13+DIGIUP15+RECOUP15_L1TMuDQM+HARVESTUP15_L1TMuDQM+NANOUP15/step3_ZMM_13+ZMM_13+DIGIUP15+RECOUP15_L1TMuDQM+HARVESTUP15_L1TMuDQM+NANOUP15.log312.0 step3 runTheMatrix-results/312.0_Pyquen_ZeemumuJets_pt10_2760GeV_2021+Pyquen_ZeemumuJets_pt10_2760GeV_2021+DIGIHI2021MIX+RECOHI2021MIX+HARVESTHI2021PPRECO/step3_Pyquen_ZeemumuJets_pt10_2760GeV_2021+Pyquen_ZeemumuJets_pt10_2760GeV_2021+DIGIHI2021MIX+RECOHI2021MIX+HARVESTHI2021PPRECO.log25.0 step3 runTheMatrix-results/25.0_TTbar+TTbar+DIGI+RECOAlCaCalo+HARVEST+ALCATT/step3_TTbar+TTbar+DIGI+RECOAlCaCalo+HARVEST+ALCATT.log10042.0 step3 runTheMatrix-results/10042.0_ZMM_13+2017+ZMM_13TeV_TuneCUETP8M1_GenSim+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano/step3_ZMM_13+2017+ZMM_13TeV_TuneCUETP8M1_GenSim+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano.log10024.0 step3 runTheMatrix-results/10024.0_TTbar_13+2017+TTbar_13TeV_TuneCUETP8M1_GenSim+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano/step3_TTbar_13+2017+TTbar_13TeV_TuneCUETP8M1_GenSim+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano.log10824.0 step3 runTheMatrix-results/10824.0_TTbar_13+2018+TTbar_13TeV_TuneCUETP8M1_GenSim+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano/step3_TTbar_13+2018+TTbar_13TeV_TuneCUETP8M1_GenSim+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano.log11634.0 step3 runTheMatrix-results/11634.0_TTbar_14TeV+2021+TTbar_14TeV_TuneCP5_GenSim+Digi+Reco+HARVEST+ALCA/step3_TTbar_14TeV+2021+TTbar_14TeV_TuneCP5_GenSim+Digi+Reco+HARVEST+ALCA.log25202.0 step3 runTheMatrix-results/25202.0_TTbar_13+TTbar_13+DIGIUP15_PU25+RECOUP15_PU25+HARVESTUP15_PU25+NANOUP15_PU25/step3_TTbar_13+TTbar_13+DIGIUP15_PU25+RECOUP15_PU25+HARVESTUP15_PU25+NANOUP15_PU25.log12434.0 step3 runTheMatrix-results/12434.0_TTbar_14TeV+2023+TTbar_14TeV_TuneCP5_GenSim+Digi+Reco+HARVEST+ALCA/step3_TTbar_14TeV+2023+TTbar_14TeV_TuneCP5_GenSim+Digi+Reco+HARVEST+ALCA.log10224.0 step3 runTheMatrix-results/10224.0_TTbar_13+2017PU+TTbar_13TeV_TuneCUETP8M1_GenSim+DigiPU+RecoFakeHLTPU+HARVESTFakeHLTPU+Nano/step3_TTbar_13+2017PU+TTbar_13TeV_TuneCUETP8M1_GenSim+DigiPU+RecoFakeHLTPU+HARVESTFakeHLTPU+Nano.log23234.0 step3 runTheMatrix-results/23234.0_TTbar_14TeV+2026D49+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTrigger+RecoGlobal+HARVESTGlobal/step3_TTbar_14TeV+2026D49+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTrigger+RecoGlobal+HARVESTGlobal.log11642.911 step3 runTheMatrix-results/11642.911_ZMM_13+2021_DD4hep+ZMM_13TeV_TuneCUETP8M1_GenSim+Digi+Reco+HARVEST+ALCA/step3_ZMM_13+2021_DD4hep+ZMM_13TeV_TuneCUETP8M1_GenSim+Digi+Reco+HARVEST+ALCA.log28234.0 step3 runTheMatrix-results/28234.0_TTbar_14TeV+2026D60+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTrigger+RecoGlobal+HARVESTGlobal/step3_TTbar_14TeV+2026D60+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTrigger+RecoGlobal+HARVESTGlobal.log250202.181 step4 runTheMatrix-results/250202.181_TTbar_13UP18+TTbar_13UP18+PREMIXUP18_PU25+DIGIPRMXLOCALUP18_PU25+RECOPRMXUP18_PU25+HARVESTUP18_PU25/step4_TTbar_13UP18+TTbar_13UP18+PREMIXUP18_PU25+DIGIPRMXLOCALUP18_PU25+RECOPRMXUP18_PU25+HARVESTUP18_PU25.log11624.911 step3 runTheMatrix-results/11624.911_TTbar_13+2021_DD4hep+TTbar_13TeV_TuneCUETP8M1_GenSim+Digi+Reco+HARVEST+ALCA/step3_TTbar_13+2021_DD4hep+TTbar_13TeV_TuneCUETP8M1_GenSim+Digi+Reco+HARVEST+ALCA.log23434.999 step4 runTheMatrix-results/23434.999_TTbar_14TeV+2026D49PU_PMXS1S2PR+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+PREMIX_PremixHLBeamSpot14PU+DigiTriggerPU+RecoGlobalPU+HARVESTGlobalPU/step4_TTbar_14TeV+2026D49PU_PMXS1S2PR+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+PREMIX_PremixHLBeamSpot14PU+DigiTriggerPU+RecoGlobalPU+HARVESTGlobalPU.log
I found errors in the following addon tests: cmsDriver.py TTbar_8TeV_TuneCUETP8M1_cfi --conditions auto:run1_mc --fast -n 100 --eventcontent AODSIM,DQM --relval 100000,1000 -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,EI,VALIDATION --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --datatier GEN-SIM-DIGI-RECO,DQMIO --beamspot Realistic8TeVCollision : FAILED - time: date Thu Nov 19 11:21:27 2020-date Thu Nov 19 11:21:00 2020 s - exit: 23040 |
Comparison job queued. |
@cmsbuild please test |
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-88f171/12767/summary.html Comparison Summary@slava77 comparisons for the following workflows were not done due to missing matrix map:
Summary:
|
+reconstruction
|
+Upgrade |
kind ping @cms-sw/pdmv-l2 |
+1 |
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2) |
+1 |
|
||
// ------------ method called to produce the data ------------ | ||
std::unique_ptr<TfGraphDefWrapper> TfGraphDefProducer::produce(const TfGraphRecord& iRecord) { | ||
return std::make_unique<TfGraphDefWrapper>(tensorflow::createSession(tensorflow::loadGraphDef(filename_), 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this actually leak the GraphDef
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking into it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing it out! Per the recent CMSSW ML documentation, we need to indeed delete the graphDef by hand after closeSession (for some reason I was under the impression closeSession deletes the associated graphs). I will open a PR to fix this. I don't seem to observe any detectable effect on the memory, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix merged in #32945
PR description:
This is continuation of PR #31682, integrating the DNN for track selection. We addressed most of the code comments received on the previous PR.
Some comments we could not fully address because we ran into problems and would like to get some feedback on it:
@makortel About #31682 (comment). In class TfGraphDefWrapper, we cannot set variables to be constant, because in https://github.com/minxiyang/cmssw/blob/f98002ab54de04c615cf53140d4105f41911124b/RecoTracker/FinalTrackSelectors/plugins/TrackTfClassifier.cc#L49, createSession only take non-constant input.
We also cannot use unique_ptr in #31682 (comment), since graphDef_ will be unique and copy or transfer it would cause the implicit deletion of the member of TfGraphDefWrapper.
About #31682 (comment), not using
new TfDnnCache()
results in a crash at runtime, which we currently don't understand.@jpata About #31682 (comment). We don't see a particular reason to use struct instead of class. Is there any preference to use one or the other in CMSSW?
@makortel #31682 (comment). We haven't started to address the issues with how TensorFlow is used here yet, we will look at it once we address the question of sessions and memory usage.
We have tested the code after the changes made and there is no change in the output.