Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track classifier using TensorFlow - continuation #32128

Merged
merged 32 commits into from
Feb 16, 2021

Conversation

minxiyang
Copy link
Contributor

PR description:

This is continuation of PR #31682, integrating the DNN for track selection. We addressed most of the code comments received on the previous PR.

Some comments we could not fully address because we ran into problems and would like to get some feedback on it:

@makortel About #31682 (comment). In class TfGraphDefWrapper, we cannot set variables to be constant, because in https://github.com/minxiyang/cmssw/blob/f98002ab54de04c615cf53140d4105f41911124b/RecoTracker/FinalTrackSelectors/plugins/TrackTfClassifier.cc#L49, createSession only take non-constant input.
We also cannot use unique_ptr in #31682 (comment), since graphDef_ will be unique and copy or transfer it would cause the implicit deletion of the member of TfGraphDefWrapper.
About #31682 (comment), not using new TfDnnCache() results in a crash at runtime, which we currently don't understand.

@jpata About #31682 (comment). We don't see a particular reason to use struct instead of class. Is there any preference to use one or the other in CMSSW?

@makortel #31682 (comment). We haven't started to address the issues with how TensorFlow is used here yet, we will look at it once we address the question of sessions and memory usage.

We have tested the code after the changes made and there is no change in the output.

@cmsbuild
Copy link
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32128/19791

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@makortel
Copy link
Contributor

About #31682 (comment). In class TfGraphDefWrapper, we cannot set variables to be constant, because in https://github.com/minxiyang/cmssw/blob/f98002ab54de04c615cf53140d4105f41911124b/RecoTracker/FinalTrackSelectors/plugins/TrackTfClassifier.cc#L49, createSession only take non-constant input.

Apparently tensorflow::session() itself takes the GraphDef via const reference
https://github.com/cms-externals/tensorflow/blob/cms/v2.1.2/tensorflow/core/public/session.h#L100
and therefore our interface could be changed to take the GraphDef via const pointer as well

Session* createSession(GraphDef* graphDef, SessionOptions& sessionOptions);

(maybe on the same go the API could be changed to return std::unique_ptr to make the ownership semantics clear)

FYI @riga @mialiu149

The public interface of EventSetup products must disallow modifications.

@makortel
Copy link
Contributor

We also cannot use unique_ptr in #31682 (comment), since graphDef_ will be unique and copy or transfer it would cause the implicit deletion of the member of TfGraphDefWrapper.

You could if you would also address #31682 (comment), I.e.

  • remove the TfGraphDefProducer::wrapper_ member (it is unnecessary, and even if works here in practice, the pattern is harmful for concurrent IOVs)
  • move the call to tensorflow::loadGraphDef() to TfGraphDefProducer::produce() (which would allow skipping the graph construction if no module would actually ask for it)

Then TfGraphDefWrapper would be the sole owner the of the GraphDef, and could use std::unique_ptr to handle the ownership.

@makortel
Copy link
Contributor

About #31682 (comment), not using new TfDnnCache() results in a crash at runtime, which we currently don't understand.

Interesting, I wonder why. Can you show the stack trace (even if it probably won't help much)?

@riga
Copy link
Contributor

riga commented Nov 13, 2020

@makortel

our interface could be changed to take the GraphDef via const pointer as well

Yes, we should add that method.

(maybe on the same go the API could be changed to return std::unique_ptr to make the ownership semantics clear)

Sounds reasonable, but I'm not sure about the implications yet. I'll look into it.

@jpata
Copy link
Contributor

jpata commented Nov 16, 2020

@minxiyang can you please address code checks?

scram b code-checks

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32128/19847

  • This PR adds an extra 12KB to repository

  • Found files with invalid states:

    • DataFormats/TrackTfGraph/src/TfGraphDefWrapper.cc:
    • DataFormats/TrackTfGraph/plugins/TfGraphDefProducer.cc:
    • DataFormats/TrackTfGraph/src/ES_TfGraphDefWrapper.cc:
    • DataFormats/TrackTfGraph/BuildFile.xml:
    • DataFormats/TrackTfGraph/plugins/BuildFile.xml:
    • DataFormats/TrackTfGraph/interface/TfGraphDefWrapper.h:
  • There are other open Pull requests which might conflict with changes you have proposed:

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @minxiyang (Minxi Yang) for master.

It involves the following packages:

RecoTracker/FinalTrackSelectors
RecoTracker/IterativeTracking
TrackingTools/Records

@perrotta, @jpata, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks.
@makortel, @felicepantaleo, @GiacomoSguazzoni, @JanFSchulte, @rovere, @VinInn, @bellan, @mschrode, @lecriste, @gpetruc, @ebrondol, @mtosi, @dgulhan this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

Copy link
Contributor

@jpata jpata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some reco comments.

Please address also the core team comment about simplifying/correcting how the GraphDef is loaded through the ES infrastructure: https://github.com/cms-sw/cmssw/pull/32128/files#issuecomment-726427600

Thanks!

@@ -0,0 +1,2 @@
11024.0_TTbar_13+2018PU+TTbar_13TeV_TuneCUETP8M1_GenSim+DigiPU+RecoFakeHLTPU+HARVESTFakeHLTPU+Nano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED Step4-PASSED - time date Tue Nov 10 17:34:43 2020-date Tue Nov 10 17:32:23 2020; exit: 0 0 0 0 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file should be removed from the PR

};

namespace {
struct TfDnn {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment #31682 (comment) from the original PR should be addressed:

  • if you want to describe a simple data structure with no constructors/destructors use struct
  • if constructors, destructors or nontrivial methods are needed, use class

Here the reco recommendation is to use a class rather than a struct, unless there is a specific reason to use a struct.

float operator()(reco::Track const& trk,
reco::BeamSpot const& beamSpot,
reco::VertexCollection const& vertices) const {
Point bestVertex = getBestVertex(trk, vertices);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here a copy is being made of the vertex, is this intended? If not, const auto& bestVertex = ... should be fine.

@@ -0,0 +1,3 @@
#include "PhysicsTools/TensorFlow/interface/TensorFlow.h"
#include "FWCore/Utilities/interface/typelookup.h"
TYPELOOKUP_DATA_REG(tensorflow::Session);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per #31682 (comment), this should be unnecessary and should be removed.

@@ -0,0 +1,13 @@
qualityCutDictionary = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would ask to put a reference/indico URL to the slides where these numbers were presented.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a reference URL here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be addressed for the final version

@@ -0,0 +1,22 @@
#ifndef TrackingTools_Records_TfGraphRecord_TfGraphRecord_h
#define TrackingTools_Records_TfGraphRecord_TfGraphRecord_h
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CMS convention for the header guard would be
TrackingTools_Records_TfGraphRecord_h

: tfDnnLabel_(cfg.getParameter<std::string>("tfDnnLabel"))

{}
TfDnnCache* cache_ = new TfDnnCache();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do the initialization in the constructor instead of at variable declaration?

@jpata
Copy link
Contributor

jpata commented Nov 19, 2020

@cmsbuild please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 19, 2020

The tests are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

-1

Tested at: 9262727

CMSSW: CMSSW_11_2_X_2020-11-18-2300
SCRAM_ARCH: slc7_amd64_gcc820
You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-88f171/10851/summary.html

I found follow errors while testing this PR

Failed tests: RelVals AddOn

  • RelVals:

When I ran the RelVals I found an error in the following workflows:
135.4 step1

runTheMatrix-results/135.4_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS/step1_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS.log

5.1 step1
runTheMatrix-results/5.1_TTbar+TTbarFS+HARVESTFS/step1_TTbar+TTbarFS+HARVESTFS.log

140.53 step2
runTheMatrix-results/140.53_RunHI2011+RunHI2011+RECOHID11+HARVESTDHI/step2_RunHI2011+RunHI2011+RECOHID11+HARVESTDHI.log

140.56 step2
runTheMatrix-results/140.56_RunHI2018+RunHI2018+RECOHID18+HARVESTDHI18/step2_RunHI2018+RunHI2018+RECOHID18+HARVESTDHI18.log

4.22 step3
runTheMatrix-results/4.22_RunCosmics2011A+RunCosmics2011A+RECOCOSD+ALCACOSD+SKIMCOSD+HARVESTDC/step3_RunCosmics2011A+RunCosmics2011A+RECOCOSD+ALCACOSD+SKIMCOSD+HARVESTDC.log

4.53 step3
runTheMatrix-results/4.53_RunPhoton2012B+RunPhoton2012B+HLTD+RECODR1reHLT+HARVESTDR1reHLT/step3_RunPhoton2012B+RunPhoton2012B+HLTD+RECODR1reHLT+HARVESTDR1reHLT.log

136.731 step3
runTheMatrix-results/136.731_RunSinglePh2016B+RunSinglePh2016B+HLTDR2_2016+RECODR2_2016reHLT_skimSinglePh_HIPM+HARVESTDR2/step3_RunSinglePh2016B+RunSinglePh2016B+HLTDR2_2016+RECODR2_2016reHLT_skimSinglePh_HIPM+HARVESTDR2.log

136.793 step3
runTheMatrix-results/136.793_RunDoubleEG2017C+RunDoubleEG2017C+HLTDR2_2017+RECODR2_2017reHLT_skimDoubleEG_Prompt+HARVEST2017/step3_RunDoubleEG2017C+RunDoubleEG2017C+HLTDR2_2017+RECODR2_2017reHLT_skimDoubleEG_Prompt+HARVEST2017.log

136.874 step3
runTheMatrix-results/136.874_RunEGamma2018C+RunEGamma2018C+HLTDR2_2018+RECODR2_2018reHLT_skimEGamma_Offline_L1TEgDQM+HARVEST2018_L1TEgDQM/step3_RunEGamma2018C+RunEGamma2018C+HLTDR2_2018+RECODR2_2018reHLT_skimEGamma_Offline_L1TEgDQM+HARVEST2018_L1TEgDQM.log

8.0 step4
runTheMatrix-results/8.0_BeamHalo+BeamHalo+DIGICOS+RECOCOS+ALCABH+HARVESTCOS/step4_BeamHalo+BeamHalo+DIGICOS+RECOCOS+ALCABH+HARVESTCOS.log

101.0 step1
runTheMatrix-results/101.0_SingleElectronE120EHCAL+SingleElectronE120EHCAL/step1_SingleElectronE120EHCAL+SingleElectronE120EHCAL.log

7.3 step3
runTheMatrix-results/7.3_CosmicsSPLoose_UP18+CosmicsSPLoose_UP18+DIGICOS_UP18+RECOCOS_UP18+ALCACOS_UP18+HARVESTCOS_UP18/step3_CosmicsSPLoose_UP18+CosmicsSPLoose_UP18+DIGICOS_UP18+RECOCOS_UP18+ALCACOS_UP18+HARVESTCOS_UP18.log

1000.0 step2
runTheMatrix-results/1000.0_RunMinBias2011A+RunMinBias2011A+TIER0+SKIMD+HARVESTDfst2+ALCASPLIT/step2_RunMinBias2011A+RunMinBias2011A+TIER0+SKIMD+HARVESTDfst2+ALCASPLIT.log

1001.0 step2
runTheMatrix-results/1001.0_RunMinBias2011A+RunMinBias2011A+TIER0EXP+ALCAEXP+ALCAHARVDSIPIXELCALRUN1+ALCAHARVD1+ALCAHARVD2+ALCAHARVD3+ALCAHARVD4+ALCAHARVD5/step2_RunMinBias2011A+RunMinBias2011A+TIER0EXP+ALCAEXP+ALCAHARVDSIPIXELCALRUN1+ALCAHARVD1+ALCAHARVD2+ALCAHARVD3+ALCAHARVD4+ALCAHARVD5.log

1306.0 step3
runTheMatrix-results/1306.0_SingleMuPt1_UP15+SingleMuPt1_UP15+DIGIUP15+RECOUP15+HARVESTUP15/step3_SingleMuPt1_UP15+SingleMuPt1_UP15+DIGIUP15+RECOUP15+HARVESTUP15.log

9.0 step3
runTheMatrix-results/9.0_Higgs200ChargedTaus+Higgs200ChargedTaus+DIGI+RECO+HARVEST/step3_Higgs200ChargedTaus+Higgs200ChargedTaus+DIGI+RECO+HARVEST.log

1330.0 step3
runTheMatrix-results/1330.0_ZMM_13+ZMM_13+DIGIUP15+RECOUP15_L1TMuDQM+HARVESTUP15_L1TMuDQM+NANOUP15/step3_ZMM_13+ZMM_13+DIGIUP15+RECOUP15_L1TMuDQM+HARVESTUP15_L1TMuDQM+NANOUP15.log

312.0 step3
runTheMatrix-results/312.0_Pyquen_ZeemumuJets_pt10_2760GeV_2021+Pyquen_ZeemumuJets_pt10_2760GeV_2021+DIGIHI2021MIX+RECOHI2021MIX+HARVESTHI2021PPRECO/step3_Pyquen_ZeemumuJets_pt10_2760GeV_2021+Pyquen_ZeemumuJets_pt10_2760GeV_2021+DIGIHI2021MIX+RECOHI2021MIX+HARVESTHI2021PPRECO.log

25.0 step3
runTheMatrix-results/25.0_TTbar+TTbar+DIGI+RECOAlCaCalo+HARVEST+ALCATT/step3_TTbar+TTbar+DIGI+RECOAlCaCalo+HARVEST+ALCATT.log

10042.0 step3
runTheMatrix-results/10042.0_ZMM_13+2017+ZMM_13TeV_TuneCUETP8M1_GenSim+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano/step3_ZMM_13+2017+ZMM_13TeV_TuneCUETP8M1_GenSim+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano.log

10024.0 step3
runTheMatrix-results/10024.0_TTbar_13+2017+TTbar_13TeV_TuneCUETP8M1_GenSim+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano/step3_TTbar_13+2017+TTbar_13TeV_TuneCUETP8M1_GenSim+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano.log

10824.0 step3
runTheMatrix-results/10824.0_TTbar_13+2018+TTbar_13TeV_TuneCUETP8M1_GenSim+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano/step3_TTbar_13+2018+TTbar_13TeV_TuneCUETP8M1_GenSim+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano.log

11634.0 step3
runTheMatrix-results/11634.0_TTbar_14TeV+2021+TTbar_14TeV_TuneCP5_GenSim+Digi+Reco+HARVEST+ALCA/step3_TTbar_14TeV+2021+TTbar_14TeV_TuneCP5_GenSim+Digi+Reco+HARVEST+ALCA.log

25202.0 step3
runTheMatrix-results/25202.0_TTbar_13+TTbar_13+DIGIUP15_PU25+RECOUP15_PU25+HARVESTUP15_PU25+NANOUP15_PU25/step3_TTbar_13+TTbar_13+DIGIUP15_PU25+RECOUP15_PU25+HARVESTUP15_PU25+NANOUP15_PU25.log

12434.0 step3
runTheMatrix-results/12434.0_TTbar_14TeV+2023+TTbar_14TeV_TuneCP5_GenSim+Digi+Reco+HARVEST+ALCA/step3_TTbar_14TeV+2023+TTbar_14TeV_TuneCP5_GenSim+Digi+Reco+HARVEST+ALCA.log

10224.0 step3
runTheMatrix-results/10224.0_TTbar_13+2017PU+TTbar_13TeV_TuneCUETP8M1_GenSim+DigiPU+RecoFakeHLTPU+HARVESTFakeHLTPU+Nano/step3_TTbar_13+2017PU+TTbar_13TeV_TuneCUETP8M1_GenSim+DigiPU+RecoFakeHLTPU+HARVESTFakeHLTPU+Nano.log

23234.0 step3
runTheMatrix-results/23234.0_TTbar_14TeV+2026D49+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTrigger+RecoGlobal+HARVESTGlobal/step3_TTbar_14TeV+2026D49+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTrigger+RecoGlobal+HARVESTGlobal.log

11642.911 step3
runTheMatrix-results/11642.911_ZMM_13+2021_DD4hep+ZMM_13TeV_TuneCUETP8M1_GenSim+Digi+Reco+HARVEST+ALCA/step3_ZMM_13+2021_DD4hep+ZMM_13TeV_TuneCUETP8M1_GenSim+Digi+Reco+HARVEST+ALCA.log

28234.0 step3
runTheMatrix-results/28234.0_TTbar_14TeV+2026D60+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTrigger+RecoGlobal+HARVESTGlobal/step3_TTbar_14TeV+2026D60+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTrigger+RecoGlobal+HARVESTGlobal.log

250202.181 step4
runTheMatrix-results/250202.181_TTbar_13UP18+TTbar_13UP18+PREMIXUP18_PU25+DIGIPRMXLOCALUP18_PU25+RECOPRMXUP18_PU25+HARVESTUP18_PU25/step4_TTbar_13UP18+TTbar_13UP18+PREMIXUP18_PU25+DIGIPRMXLOCALUP18_PU25+RECOPRMXUP18_PU25+HARVESTUP18_PU25.log

11624.911 step3
runTheMatrix-results/11624.911_TTbar_13+2021_DD4hep+TTbar_13TeV_TuneCUETP8M1_GenSim+Digi+Reco+HARVEST+ALCA/step3_TTbar_13+2021_DD4hep+TTbar_13TeV_TuneCUETP8M1_GenSim+Digi+Reco+HARVEST+ALCA.log

23434.999 step4
runTheMatrix-results/23434.999_TTbar_14TeV+2026D49PU_PMXS1S2PR+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+PREMIX_PremixHLBeamSpot14PU+DigiTriggerPU+RecoGlobalPU+HARVESTGlobalPU/step4_TTbar_14TeV+2026D49PU_PMXS1S2PR+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+PREMIX_PremixHLBeamSpot14PU+DigiTriggerPU+RecoGlobalPU+HARVESTGlobalPU.log

  • AddOn:

I found errors in the following addon tests:

cmsDriver.py TTbar_8TeV_TuneCUETP8M1_cfi --conditions auto:run1_mc --fast -n 100 --eventcontent AODSIM,DQM --relval 100000,1000 -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,EI,VALIDATION --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --datatier GEN-SIM-DIGI-RECO,DQMIO --beamspot Realistic8TeVCollision : FAILED - time: date Thu Nov 19 11:21:27 2020-date Thu Nov 19 11:21:00 2020 s - exit: 23040
cmsDriver.py RelVal -s HLT:Fake2,RAW2DIGI,L1Reco,RECO --data --scenario=pp -n 10 --conditions auto:run2_data_Fake2 --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run2_2016 --processName=HLTRECO --filein file:RelVal_Raw_Fake2_DATA.root --fileout file:RelVal_Raw_Fake2_DATA_HLT_RECO.root : FAILED - time: date Thu Nov 19 11:24:20 2020-date Thu Nov 19 11:21:06 2020 s - exit: 23040
cmsDriver.py RelVal -s HLT:PRef,RAW2DIGI,L1Reco,RECO --data --scenario=pp -n 10 --conditions auto:run3_data_PRef --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run3 --processName=HLTRECO --filein file:RelVal_Raw_PRef_DATA.root --fileout file:RelVal_Raw_PRef_DATA_HLT_RECO.root : FAILED - time: date Thu Nov 19 11:26:22 2020-date Thu Nov 19 11:21:09 2020 s - exit: 23040
cmsDriver.py RelVal -s HLT:HIon,RAW2DIGI,L1Reco,RECO --data --scenario=pp -n 10 --conditions auto:run3_data_HIon --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run3_pp_on_PbPb --processName=HLTRECO --filein file:RelVal_Raw_HIon_DATA.root --fileout file:RelVal_Raw_HIon_DATA_HLT_RECO.root : FAILED - time: date Thu Nov 19 11:25:53 2020-date Thu Nov 19 11:21:11 2020 s - exit: 23040
cmsDriver.py RelVal -s HLT:GRun,RAW2DIGI,L1Reco,RECO --mc --scenario=pp -n 10 --conditions auto:run3_mc_GRun --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run3 --processName=HLTRECO --filein file:RelVal_Raw_GRun_MC.root --fileout file:RelVal_Raw_GRun_MC_HLT_RECO.root : FAILED - time: date Thu Nov 19 11:30:34 2020-date Thu Nov 19 11:21:12 2020 s - exit: 23040
cmsDriver.py TTbar_13TeV_TuneCUETP8M1_cfi --conditions auto:run2_mc_l1stage1 --fast -n 100 --eventcontent AODSIM,DQM --relval 100000,1000 -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,EI,VALIDATION --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --datatier GEN-SIM-DIGI-RECO,DQMIO --beamspot NominalCollision2015 --era Run2_25ns : FAILED - time: date Thu Nov 19 11:21:44 2020-date Thu Nov 19 11:21:13 2020 s - exit: 23040
cmsDriver.py RelVal -s HLT:PRef,RAW2DIGI,L1Reco,RECO --mc --scenario=pp -n 10 --conditions auto:run3_mc_PRef --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run3 --processName=HLTRECO --filein file:RelVal_Raw_PRef_MC.root --fileout file:RelVal_Raw_PRef_MC_HLT_RECO.root : FAILED - time: date Thu Nov 19 11:27:14 2020-date Thu Nov 19 11:21:15 2020 s - exit: 23040
cmsDriver.py TTbar_13TeV_TuneCUETP8M1_cfi --conditions auto:run2_mc --fast -n 100 --eventcontent AODSIM,DQM --relval 100000,1000 -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,EI,VALIDATION --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --datatier GEN-SIM-DIGI-RECO,DQMIO --beamspot NominalCollision2015 --era Run2_2016 : FAILED - time: date Thu Nov 19 11:21:49 2020-date Thu Nov 19 11:21:16 2020 s - exit: 23040
cmsDriver.py RelVal -s HLT:PIon,RAW2DIGI,L1Reco,RECO --data --scenario=pp -n 10 --conditions auto:run3_data_PIon --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run3 --processName=HLTRECO --filein file:RelVal_Raw_PIon_DATA.root --fileout file:RelVal_Raw_PIon_DATA_HLT_RECO.root : FAILED - time: date Thu Nov 19 11:25:38 2020-date Thu Nov 19 11:21:19 2020 s - exit: 23040
cmsDriver.py RelVal -s HLT:Fake,RAW2DIGI,L1Reco,RECO --mc --scenario=pp -n 10 --conditions auto:run1_mc_Fake --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --processName=HLTRECO --filein file:RelVal_Raw_Fake_MC.root --fileout file:RelVal_Raw_Fake_MC_HLT_RECO.root : FAILED - time: date Thu Nov 19 11:26:30 2020-date Thu Nov 19 11:21:21 2020 s - exit: 23040
cmsDriver.py RelVal -s HLT:HIon,RAW2DIGI,L1Reco,RECO --mc --scenario=pp -n 10 --conditions auto:run3_mc_HIon --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run3_pp_on_PbPb --processName=HLTRECO --filein file:RelVal_Raw_HIon_MC.root --fileout file:RelVal_Raw_HIon_MC_HLT_RECO.root : FAILED - time: date Thu Nov 19 11:28:02 2020-date Thu Nov 19 11:21:26 2020 s - exit: 23040
cmsDriver.py RelVal -s HLT:Fake2,RAW2DIGI,L1Reco,RECO --mc --scenario=pp -n 10 --conditions auto:run2_mc_Fake2 --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run2_2016 --processName=HLTRECO --filein file:RelVal_Raw_Fake2_MC.root --fileout file:RelVal_Raw_Fake2_MC_HLT_RECO.root : FAILED - time: date Thu Nov 19 11:26:49 2020-date Thu Nov 19 11:21:28 2020 s - exit: 23040
cmsDriver.py RelVal -s HLT:Fake1,RAW2DIGI,L1Reco,RECO --mc --scenario=pp -n 10 --conditions auto:run2_mc_Fake1 --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run2_25ns --processName=HLTRECO --filein file:RelVal_Raw_Fake1_MC.root --fileout file:RelVal_Raw_Fake1_MC_HLT_RECO.root : FAILED - time: date Thu Nov 19 11:26:59 2020-date Thu Nov 19 11:21:29 2020 s - exit: 23040
cmsDriver.py RelVal -s HLT:Fake,RAW2DIGI,L1Reco,RECO --data --scenario=pp -n 10 --conditions auto:run1_data_Fake --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --processName=HLTRECO --filein file:RelVal_Raw_Fake_DATA.root --fileout file:RelVal_Raw_Fake_DATA_HLT_RECO.root : FAILED - time: date Thu Nov 19 11:25:35 2020-date Thu Nov 19 11:21:37 2020 s - exit: 23040
cmsDriver.py RelVal -s HLT:GRun,RAW2DIGI,L1Reco,RECO --data --scenario=pp -n 10 --conditions auto:run3_data_GRun --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run3 --processName=HLTRECO --filein file:RelVal_Raw_GRun_DATA.root --fileout file:RelVal_Raw_GRun_DATA_HLT_RECO.root : FAILED - time: date Thu Nov 19 11:31:04 2020-date Thu Nov 19 11:21:39 2020 s - exit: 23040
cmsDriver.py RelVal -s HLT:Fake1,RAW2DIGI,L1Reco,RECO --data --scenario=pp -n 10 --conditions auto:run2_data_Fake1 --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run2_25ns --processName=HLTRECO --filein file:RelVal_Raw_Fake1_DATA.root --fileout file:RelVal_Raw_Fake1_DATA_HLT_RECO.root : FAILED - time: date Thu Nov 19 11:25:35 2020-date Thu Nov 19 11:21:44 2020 s - exit: 23040
cmsDriver.py RelVal -s HLT:PIon,RAW2DIGI,L1Reco,RECO --mc --scenario=pp -n 10 --conditions auto:run3_mc_PIon --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run3 --processName=HLTRECO --filein file:RelVal_Raw_PIon_MC.root --fileout file:RelVal_Raw_PIon_MC_HLT_RECO.root : FAILED - time: date Thu Nov 19 11:27:17 2020-date Thu Nov 19 11:21:47 2020 s - exit: 23040

@cmsbuild
Copy link
Contributor

Comparison job queued.

@jpata
Copy link
Contributor

jpata commented Feb 8, 2021

@cmsbuild please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 8, 2021

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-88f171/12767/summary.html
COMMIT: 26625ca
CMSSW: CMSSW_11_3_X_2021-02-08-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/32128/12767/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-88f171/11634.91_TTbar_14TeV+2021_trackdnn+TTbar_14TeV_TuneCP5_GenSim+Digi+Reco+HARVEST+ALCA

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 37
  • DQMHistoTests: Total histograms compared: 2751765
  • DQMHistoTests: Total failures: 15
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2751728
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 62.438 KiB( 36 files compared)
  • DQMHistoSizes: changed ( 1000.0,... ): 1.861 KiB Tracking/TrackParameters
  • DQMHistoSizes: changed ( 1306.0,... ): 4.977 KiB Tracking/TrackBuilding
  • DQMHistoSizes: changed ( 23234.0,... ): 5.563 KiB Tracking/TrackBuilding
  • DQMHistoSizes: changed ( 23234.0,... ): 1.991 KiB Tracking/TrackParameters
  • Checked 156 log files, 37 edm output root files, 37 DQM output files

@jpata
Copy link
Contributor

jpata commented Feb 9, 2021

+reconstruction

@srimanob
Copy link
Contributor

+Upgrade

@jpata
Copy link
Contributor

jpata commented Feb 15, 2021

kind ping @cms-sw/pdmv-l2

@chayanit
Copy link

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@qliphy
Copy link
Contributor

qliphy commented Feb 16, 2021

+1


// ------------ method called to produce the data ------------
std::unique_ptr<TfGraphDefWrapper> TfGraphDefProducer::produce(const TfGraphRecord& iRecord) {
return std::make_unique<TfGraphDefWrapper>(tensorflow::createSession(tensorflow::loadGraphDef(filename_), 1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this actually leak the GraphDef?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking into it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing it out! Per the recent CMSSW ML documentation, we need to indeed delete the graphDef by hand after closeSession (for some reason I was under the impression closeSession deletes the associated graphs). I will open a PR to fix this. I don't seem to observe any detectable effect on the memory, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix merged in #32945

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.