Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Print diagnostics when fetched invalid BeamSpotOnlineObjects #47171

Draft
wants to merge 8 commits into
base: CMSSW_14_0_X
Choose a base branch
from

Conversation

JanChyczynski
Copy link
Contributor

PR description:

Printing more diagnostics to help with investigating #45555
Continuation of #46395

For BeamSpotOnlineObjects, when fetching IOV sequence all the objects of the sequence are fetched and checked if they're valid. The validity check is implemented by:

  • validating error of VertexState created from the BeamSpot as done in
    VertexState beamVertexState(beamSpot);
    if ((beamVertexState.error().cxx() <= 0.) || (beamVertexState.error().cyy() <= 0.) ||
    (beamVertexState.error().czz() <= 0.)) {
    edm::LogError("UnusableBeamSpot") << "Beamspot with invalid errors " << beamVertexState.error().matrix();
    validBS = false;
    }
  • checking if the BeamSpot.covariance() (error) matrix is invertible (by running .Inverse)

PR validation:

There's a debug mode to test the diagnostics, turned on by #define BEAM_SPOT_DIAGNOSTICS_DEBUG.

Tested on CMSSW_14_0_11
Run with (taken from #45555) :

#!/bin/bash -ex

# originally on CMSSW_14_0_11_MULTIARCHS

hltGetConfiguration run:383631 \
--globaltag 140X_dataRun3_HLT_v3 \
--data \
--no-prescale \
--no-output \
--max-events -1 \
--input '/store/group/tsg/FOG/error_stream_root/run383631/run383631_ls0444_index000423_fu-c2b14-43-01_pid675389.root,/store/group/tsg/FOG/error_stream_root/run383631/run383631_ls0445_index000025_fu-c2b14-43-01_pid675389.root,/store/group/tsg/FOG/error_stream_root/run383631/run383631_ls0666_index000047_fu-c2b14-43-01_pid675067.root,/store/group/tsg/FOG/error_stream_root/run383631/run383631_ls0445_index000002_fu-c2b14-43-01_pid675389.root,/store/group/tsg/FOG/error_stream_root/run383631/run383631_ls0666_index000027_fu-c2b14-43-01_pid675067.root,/store/group/tsg/FOG/error_stream_root/run383631/run383631_ls0666_index000065_fu-c2b14-43-01_pid675067.root' > hlt.py

cat <<@EOF >> hlt.py
process.options.wantSummary = True
process.options.numberOfThreads = 1
process.options.numberOfStreams = 0
@EOF

cmsRun hlt.py &> hlt.log

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

Unsure how this should be handled. For now the draft PR is to the release I tested it with.

Before submitting your pull requests, make sure you followed this checklist:

@cmsbuild
Copy link
Contributor

Pull request #47171 was updated.

@cmsbuild
Copy link
Contributor

cmsbuild commented Jan 23, 2025

cms-bot internal usage

@JanChyczynski
Copy link
Contributor Author

JanChyczynski commented Jan 23, 2025

@Dr15Jones @mmusich Please take a look. I expressed some doubts by //TODO in the code, I'll also post some comments to some part of the code.

Please also advise me on how this PR should be handled regarding releases and backports. I've tested it on CMSSW_14_0_11 because the code to run the hlt was running on this release.

FYI @PonIlya

@cmsbuild
Copy link
Contributor

Pull request #47171 was updated.

Comment on lines +387 to +419
s << "\n--------------------- BeamSpot Diagnostics (hash: " << hash << ") ----------------\n"
<< "Fetched invalid beamspot obj of hash " << hash << "\n"
<< "Tag: " << iovProxyData->tagInfo.name << "\n"
<< "IOV: " << iov << "(" << cond::time::timeTypeName(iovProxyData->tagInfo.timeType) << ")\n"
<< "Request info: "
<< "' request interval [ " << lowerGroup << " , " << higherGroup << " ] "
<< "new range [ " << iovProxyData->groupLowerIov << " , " << iovProxyData->groupHigherIov << " ] "
<< "#entries " << iovProxyData->iovSequence.size() << "\n"
<< "\n"
<< "BeamSpotOnlineObjects:\n"
<< *beamspotObj
<< "\n"
//TODO probably no new info here, enough to preant only BeamSpotOnlineObjects
<< "BeamSpot:\n"
<< beamSpot << "\n"
<< "BeamSpot (BS) rotatedCovariance3D:\n"
<< beamSpot.rotatedCovariance3D() << "\n"
<< "BS position: " << beamSpot.position() << "\n"
<< "BS covariance Determinant: " << det << "\n"
<< "Of matrix: " << beamSpot.covariance() << "\n"
<< "is VertexState(BS).error() invalid? (0=valid): " << vertexFromBSInvalid << "\n"
<< "VertexState(BS).error().matrix(): \n"
<< beamVertexState.error().matrix() << "\n"
<< "err cxx() = " << beamVertexState.error().cxx() << "\n"
<< "err cyy() = " << beamVertexState.error().cyy() << "\n"
<< "err czz() = " << beamVertexState.error().czz() << "\n"
<< "did BS.covariance() inversion failed? (0=ok, 1=failed): " << inversionFail << "\n"
<< "BS.covariance().Inverse():\n"
<< errorInverse << "\n"
<< "\n-------------- end of BeamSpot Diagnostics (hash: " << hash << ") ----------------\n"
<< std::endl;
edm::LogSystem("NewIOV") << s.str();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know what diagnostics should I remove and if any info is missing.


double det; //TODO do we check BS validity by det?
beamSpot.covariance().Det2(det);
if (vertexFromBSInvalid || inversionFail) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please notice that the diagnostics are handled differently from the way that implementation by @Dr15Jones worked. Info is printed only when an invalid BeamSpotOnlineObjects payload is fetched and there is no check whether the fetching happened for the first time or not.

Should there be any info displayed even when fetch objects are valid?

Comment on lines +374 to +384
//Check validity of BeamSpot by validating error of VertexState created from it
VertexState beamVertexState(beamSpot);
bool vertexFromBSInvalid = (beamVertexState.error().cxx() <= 0.) || (beamVertexState.error().cyy() <= 0.) ||
(beamVertexState.error().czz() <= 0.);

//Check validity of BeamSpot by confirming it can be inverted
int inversionFail = 2; //the .Inverse(...) will set it to 0 or 1
reco::BeamSpot::CovarianceMatrix errorInverse = beamSpot.covariance().Inverse(inversionFail);

double det; //TODO do we check BS validity by det?
beamSpot.covariance().Det2(det);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that a good way to check the validity of beamSpotObjects? Wouldn't that be too big of an overhead for HLT? What would you do differently?

@cmsbuild
Copy link
Contributor

Pull request #47171 was updated.

@cmsbuild
Copy link
Contributor

Pull request #47171 was updated.

@Dr15Jones
Copy link
Contributor

I personally think this is way too intrusive on the base code (i.e. making the PoolDBESSource directly dependent upon beam spot object code). Plus fetching ALL the beam spot object each time the IOV sequence is fetched is going to be pulling lots of data potentially from the DB all the time.

Why not instead add a parameter to PoolDBESSource to turn on printing the IOV info (i.e. iov/hash) in the fetch request if the payloadType matches a certain name? Such a change would then potentially help other debugging jobs.

For this case, when the problem happens you can check what the IOV info was, see if it matches what was expected (and or seen by other worker nodes). If the hash doesn't match expectations then that identifies the in the IOV sequence request. If it does match it means the request to fetch using the 'hash' went wrong.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants