Candcloner bugfixes and follow up mitigation #102

slava77 · 2022-06-25T08:16:38Z

consistent handling of seeds sorting, which affected iterations that do not have seed merging (pixelLess)
Fixes in the candCloner:
- properly trace the overlap hits, to not be used in the Kalman updates (these were out of order and were used for Kalman updates)
- do not add overlap hit chi2 to the candidate score (the overlap hits are meant to be auxiliary during building)

Based on the validation plots http://uaf-10.t2.ucsd.edu/~legianni/check-clonerFIX2/ ,

out-of-the-box (OOB) there impact is rather minimal
- the efficiency is almost unchanged (small increase at very low pt)
- fakes are up a bit (likely via pixelLess); duplicates are a bit down
- tracks are slightly longer (more visible in the barrel)
the initialStep built tracks are almost unchanged ; average fakes are lower (–5% in some bins) at low pt, but the average vs eta has an increase in the barrel (and some decrease in the endcaps). Tracks are longer by about 0.25 hits
highPtTriplet built tracks have up to 5% fractional increase in the efficiency at lower pt, but fakes are up by a few % absolute scale this iteration comparison is using trackDNN candidate/builtTracks selection (need a check without it). Tracks are longer by 0.06 hits on average (the distribution has an increase in the shorter tracks)
detachedQuad built tracks are mostly unchanged. Tracks are longer by 0.15 hits
detachedTriplet built tracks have up to 10% more (fractional) efficiency below 1 GeV, the fake rate is up by about 5% at low pt as well (it is on average higher for eta < 2, and is lower for eta > 2); the fake rate increase is most visible for d0>0.1 cm. Tracks are longer by 0.13 hits, but there is an increased component of short tracks (5-7 hits).
pixelLess is the most affected this iteration comparison is using trackDNN candidate/builtTracks selection (need a check without it):
- the efficiency is up by a few % below 2 GeV; by about 15% for large displacements (this region is known to be depressed by the DNN selections )
- the fake rate is up by about 20%
- the track length is down by 0.2 on average; there is a small component of more longer tracks, but the average is overtaken by a factor of a few more short tracks

So, based on the observations, I think that this PR can not go as is

(already clear) we need some mitigation updates in the pixelLess selections to reduce the number of fakes.
MTV plots for running without candidate DNN are needed
need to get comparisons in the softQCD sample
extended comparisons are needed to see more details

1. Loop over all available hits, until max-cand output slots are filled. 2. Make sure Kalman update is done on the main selected hit (not overlap). 3. Do not bump candidate chi2 when adding overlap hits.

leonardogiannini · 2022-06-27T05:37:10Z

I'm attaching TTbar and softQCD comparisons without cand. level DNN (in high pT triplet and pixelLess)

blue is CKF, red is baseline without cand DNN and black is after the fix in cand cloner.

*chi2 is also propagated from mkFit to the cand level DNN https://github.com/cms-sw/cmssw/blob/master/RecoTracker/MkFit/plugins/MkFitOutputConverter.cc#L693.

slava77 · 2022-06-27T09:56:38Z

http://uaf-10.t2.ucsd.edu/~legianni/clonerFix/TTbarPU/

highPtTriplet (now without the cand DNN) built tracks have fairly small change, perhaps about 0.5-1% higher efficiency for pt< 2 GeV and no substantive change in the fake rate. The tracks are 0.23 hits longer and there is no component with increasing short hits.
- "byOriginalAlgo" is still almost unchanged
- recall that the changes were much larger with trackDNN, which points to a larger impact on the candidates (before the final fit)
pixelLess built tracks have
- no clear change in eff vs pt or eta; t
- he fake rate goes up by 1-2% for pt<2 GeV (the total number of fake tracks is up by about 5% in the same range; the largest increase is around eta 1.2 with 30% more fake tracks)
- tracks still show an increase in the longer tracks, but also in the short tracks (quality filters need review?)

…o the Kalman update plane in the barrel (can be controlled by KFO_Local_Cov)

slava77 · 2022-07-27T01:06:56Z

I pushed the "local" covariance updates to this PR f8f6ada...eecfaba

This effectively applies a restriction of the track state covariance to the plane where the Kalman update is made. The procedure is to rotate the space coordinates to have x orthogonal to the update plane (the same angle of rotation as the one used in the Kalman projection matrix), then zero-out the x component of the covariance and rotate back. The implementation in the code is after the explicit expansion and can be controlled with KFO_Local_Cov flag.

This affects only the barrel Kalman updates; in the endcap the update is already done properly in the xy plane.
In the barrel the propagated covariance is apparently constrained in the tangent plane (fixed path) instead of fixed R. The difference decreases with increasing pt.

In the following, the impact of these commits can be seen between black and orange:

Plots for ttbar are in https://slava77.web.cern.ch/slava77/mic/CMSSW_12_4_0-pixmkf/11834.0_TTbar_14TeV+2021/AVE_50_BX01_25ns/ckf_mkfit-6-nocdnn-orig-candcloner-3-fix-covLocal-noD0filt-28613e2/
there is a small increase in "out-of-the-box" tracks at low pt (close to 0.5% below 0.6 GeV; the contribution seems to be roughly split between the detachedTriplet and pixelLess iterations)

plots for the softQCD are in https://slava77.web.cern.ch/slava77/mic/CMSSW_12_4_0-pixmkf/SoftQCD_XiFilter/ckf_mkfit-6-nocdnn-orig-candcloner-3-fix-covLocal-noD0filt-28613e2/
The effect is more visible in iterations more dominated by lower-pt and more displaced tracks, e.g in SoftQCD in the pixelLess we get more hits both in the strips and pick up some pixel hits as well

slava77 · 2022-07-28T06:26:09Z

it turns out that while transferring from my debugging/testing setup in 12_4_0 to the mergeable in CMSSW version in 12_5_0_pre2, I missed a switch to enable the covariance matrix restriction on the backward fit.
(the summary plots were from a correctly working variant in 12_4_0; the longer story is that the initial validation plots were for a code where the constraint was accidentally enabled ignoring the control flag and then while making it properly controlled by the flag I missed the MkFinder calls for bw fit)

This is now pushed in 7f5d194 in this PR.

@mmasciov

slava77 · 2022-08-03T13:20:14Z

as noted on Monday, I have pushed a bugfix in the propToR Jacobian f71fb49.

Looking back, the effective flip of the sign happened in trackreco/mkFit#46 in trackreco/mkFit@a55225a#diff-12bbb3341241b082ec368e3bc565eddc0d144431034349ce475245bffd52ddebL453-R452

Effectively, only the first step of the 5-iteration propagation contributed to the Jacobian derivatives computation. After the fix, each iteration step contributes to the accumulation of affected derivatives.

One side effect is that cov used to compute chi2 (small step propagation from the nominal radius) and cov used in update (large step propagation from the previous hit layer) was inconsistent.

For one specific "friend" (pixelLess candidate with state params x y z 1/pt phi theta: -12.012802 41.732376 -14.565144 1.726228 2.292912 1.974368) prop errors xx, xy, yy:

before the fix (notice the quite significant differences in each step)
- during chi2 comp: 0.051847 0.018698 0.006746
- during Kalman update, before local projection: 0.0324688 0.0245153 0.0185123
- after local projection from e3f2c8c earlier in this PR: 4.101098e-02 1.168057e-02 3.326809e-03
after the fix:
- during chi2 comp: 5.755034e-02 1.638089e-02 4.662616e-03
- during Kalman update before local projection: 5.896209e-02 1.577486e-02 4.220664e-03
- during Kalman update after local projection: 5.842577e-02 1.664057e-02 4.739495e-03

After this fix the "friend" backward propagation has reasonable chi2 (max at 3.4 per hit), while before the fix some hits were going over chi2 of 200.

Even though the change seems significant, the changes in physics are fairly small, as seen in the incremental comparison with 6-iter mkFit (with DNN on candidates disabled) and (blue is before the fix)

ttbar PU50
- OOB all tracks relatively unchanged (tiny increase in eff also has a small increase in fakes):

- pixelLess built tracks look a bit better (more eff, lower fake)

are a bit longer

.. and probably because of increase in length the fake rate instead increases after DNN selection in allTracksByOriginalAlgo

- [SoftQCD_XiFilter](https://slava77.web.cern.ch/slava77/mic/CMSSW_12_4_0-pixmkf/SoftQCD_XiFilter/mkfit-6-nocdnn-candcloner-3-fix-covLocal-noD0filt-43b4019_5e0cebb/)

…oded

slava77 · 2022-08-04T16:15:07Z

one more commit 4b7c60f ( fix matched layer counting in nHitsByTypeEncoded and nLayersByTypeEncoded). in one example of a candidate in the barrel with valid hits on layers 4 5 6 7 8 9 10 11, which should be 5 matched layers, we counted before the fix only 3 layers and filtered out quite a few candidates.

This fix can technically affect the detachedTriplets, but there the effect seems almost invisible. Most of the visible change is in the pixelLess step and in particular in cases without attached pixel hits. Validation plots with candidate DNN disabled and using our 6-iter mkFit setup; comparing CKF (blue) with baseline as in CMSSW_12_4_0 (red), next-to-last commit (black), and the last commit (orange), without changes in the candidate filters:

ttbar PU50
- pixelLess built tracks vs displacement about a half of inefficiency is recovered without much increase in fakes
SoftQCD
- OOB all tracks show an improvement in efficiency even with the current DNN
- pixelLess built tracks are better at low pt, but still have more work to do
- pixelLess built tracks vs vtx position show very significant improvement for large displacement (black->orange)

…electHitIndices

slava77 · 2022-08-08T23:04:25Z

One more commit is added aad62ce, to account for dphi spread corresponding to the layer thickness. It is used in the selectHitIndices and is enabled only if no holes are allowed in the trajectory. This dphi is still clamped by the range defined with the MC truth "training".

I tried to apply this more generally, but could not find a positive effect:

in the barrel still with zero holes almost nothing changes (ttbar and SoftQCD blue: before changes, red endcap, black both barrel and endcap) I'm not sure if this is a feature of the tracks or something is out of place in the phi windows.
if the cut is enabled for all iterations, the performance degrades a bit: more fakes and even a bot lower efficiency (incremental diffs: blue before changes, red with zero holes [just pixelLess] and black enabled everywhere : ttbar SoftQCD )
- e.g. in ttbar detachedQuad

Plots for the last commit are (comparing ckf blue, mkfit in the release red, before last commit in black, and this commit in orange):

ttbar
- pixelLess builtTracks hit distributions show longer tracks
- pixelLess builtTracks eff and fake has a small increase in efficiency below 1 GeV and even some decrease in fakes in that same range
- pixelLess built track fakes at large d0 are a bit lower
SoftQCD
- in pixelLess builtTracks there is some noticeable increase of efficiency below 0.9 GeV

slava77 · 2022-08-13T20:37:11Z

I have merged the commit from @mmasciov
and changed this to no-draft.

slava77 · 2022-08-13T20:52:01Z

the last commits include:

[f229157 and 9f5b62e] relaxed requirement in the pixelLess trajectory filtering to allow tracks with >= 6 matched hits/layers. Soft QCD pixelLess built tracks show the effect of this change from black to orange (compared to CKF in blue and red for what we have without this PR)
[08be050] disable cand DNN selection in the pixelLess iteration. The justification is a bit biased by the physics performance and looking for a default setup that allows to monitor built tracks without this DNN in view of a retraining. The impact is most visible in the SoftQCD efficiency vs vtx displacement in the pixelLess iteration allTracks for sim tracks with pt<0.9 GeV pink has cand DNN, orange doesn't (red is before this PR and blue/black are with CKF pixelLess before/after this PR )

osschar added 4 commits June 22, 2022 12:52

Fix 3 bugs in CandCloner.

b132dae

1. Loop over all available hits, until max-cand output slots are filled. 2. Make sure Kalman update is done on the main selected hit (not overlap). 3. Do not bump candidate chi2 when adding overlap hits.

Make sure seed_cleaning flag is used consistently in runOneIteration().

e879b58

Remove MkFinder::m_LastHoT that is no longer needed.

d0e7503

Place hit data into proper Matriplex slot.

f8f6ada

slava77devel added 2 commits July 27, 2022 02:17

zero-out the predicted state covariance in the direction transverse t…

e3f2c8c

…o the Kalman update plane in the barrel (can be controlled by KFO_Local_Cov)

code-checks

eecfaba

apply KFO_Local_Cov in the backward fit

7f5d194

fix condition used for derivatives in prop to R iterative

f71fb49

fix matched layer counting in nHitsByTypeEncoded and nLayersByTypeEnc…

4b7c60f

…oded

account for layer thickness in the endcap dphi range computation in s…

aad62ce

…electHitIndices

Do not do seed hit merging for bulk of lowPtQuad seeds

2d86624

slava77 marked this pull request as ready for review August 13, 2022 19:49

slava77devel added 3 commits August 13, 2022 22:50

lower nLyers cut to 5 for low-pt forward filter in the pixelLess

f229157

lower nHits cut to 5 for low-pt forward filter in the pixelLess

9f5b62e

disable cand dnn filter in the pixelLess step

08be050

slava77 mentioned this pull request Aug 14, 2022

mkFit updates aiming to improve pixelLess iteration and related cms-sw/cmssw#39055

Merged

slava77 merged commit 2bf39e1 into master Aug 18, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Candcloner bugfixes and follow up mitigation #102

Candcloner bugfixes and follow up mitigation #102

slava77 commented Jun 25, 2022

leonardogiannini commented Jun 27, 2022

slava77 commented Jun 27, 2022

slava77 commented Jul 27, 2022

slava77 commented Jul 28, 2022

slava77 commented Aug 3, 2022

slava77 commented Aug 4, 2022

slava77 commented Aug 8, 2022

slava77 commented Aug 13, 2022

slava77 commented Aug 13, 2022

Candcloner bugfixes and follow up mitigation #102

Candcloner bugfixes and follow up mitigation #102

Conversation

slava77 commented Jun 25, 2022

leonardogiannini commented Jun 27, 2022

slava77 commented Jun 27, 2022

slava77 commented Jul 27, 2022

slava77 commented Jul 28, 2022

slava77 commented Aug 3, 2022

slava77 commented Aug 4, 2022

slava77 commented Aug 8, 2022

slava77 commented Aug 13, 2022

slava77 commented Aug 13, 2022