SONIC updates for site support #45182
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45182/40540
|
A new Pull Request was created by @kpedro88 for master. It involves the following packages:
@fwyzard, @makortel, @cmsbuild can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
test parameters: |
please test |
-1
Failed Tests: UnitTests
Unit Tests: I found 1 errors in the following unit tests:
---> test DRNTest had ERRORS
Comparison Summary:
|
please test |
+1
Size: This PR adds an extra 24KB to repository
Comparison Summary:
|
+heterogeneous |
@cms-sw/reconstruction-l2 please check - the only change in your area is propagating some interface changes to a unit test |
+reconstruction |
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @mandrenguyen, @sextonkennedy, @rappoccio (and backports should be raised in the release meeting by the corresponding L2) |
@mandrenguyen the comments were resolved in e24f362, now marked as such. |
+1 |
Ah right, the error in 14_0_X is indeed different (
Thanks! |
Thanks! |
PR description:
This PR includes several updates to support production-style tests at multiple sites, including T2_US_Purdue, other T2s, and NERSC. It also includes some important fixes for bugs found during testing.
New features/changes:
- `$SONIC_LOCAL_BALANCER_HOST` and `$SONIC_LOCAL_BALANCER_PORT`, set via `cmsset_local.sh` in `SITECONF`. (This is a temporary solution that is eventually expected to migrate to something more standardized, similar to `storage.json`. It is introduced here to facilitate tests at other T2 sites to gain operational experience before determining the final form of this configuration.)
- `TritonDiscovery` to enable tracking/debugging of production-style jobs without requiring full verbosity. (This can be seen as an interim step toward full provenance tracking.)
- `podman` and `podman-hpc` to launch Triton containers (useful at NERSC).

Bug fixes:
- The check for a local GPU now happens in the `cmsTriton` server launching script, rather than in Python. (Production-style jobs can dump the Python config on a local node and then reuse it on a worker node, so any system calls happen on the local node and may not give the right answer for the worker node as to the presence of a GPU.)

There are some interface changes (related to supporting more container engines and checking for local GPUs), which are documented.
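As an illustration of the environment-variable mechanism described above, here is a minimal Python sketch of reading `$SONIC_LOCAL_BALANCER_HOST` and `$SONIC_LOCAL_BALANCER_PORT`. The function name `local_balancer_address` and the validation and fallback behavior are assumptions made for this example. This PR description does not say how CMSSW actually consumes the variables.

```python
import os
from typing import Mapping, Optional, Tuple


def local_balancer_address(
    env: Mapping[str, str] = os.environ,
) -> Optional[Tuple[str, int]]:
    """Hypothetical helper: return (host, port) for a site-local balancer.

    Returns None when either variable is missing or the port is not a
    valid TCP port, so a caller could fall back to some other server.
    """
    host = env.get("SONIC_LOCAL_BALANCER_HOST", "").strip()
    port_text = env.get("SONIC_LOCAL_BALANCER_PORT", "").strip()
    if not host or not port_text:
        return None
    try:
        port = int(port_text)
    except ValueError:
        return None
    if not 0 < port < 65536:
        return None
    return host, port


if __name__ == "__main__":
    # Prints None unless the site setup (e.g. cmsset_local.sh) exported both variables.
    print(local_balancer_address())
```

Passing the environment as a mapping keeps the helper easy to exercise outside a real site setup, for example with a plain dictionary.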
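The reasoning behind the GPU bug fix can be illustrated with a toy model. This is not CMSSW or `cmsTriton` code: every name below is hypothetical, and JSON stands in for the dumped Python config. It only shows why a probe evaluated while building a config is frozen into the dump, whereas a probe deferred to the launcher reflects the node the job actually runs on.

```python
import json
from typing import Callable


def build_config_eager(gpu_probe: Callable[[], bool]) -> str:
    """Probe for a GPU while building the config, then serialize it.

    The probe result is frozen into the dumped text.
    """
    return json.dumps({"use_gpu": gpu_probe()})


def build_config_deferred() -> str:
    """Leave the GPU decision to whoever launches the server."""
    return json.dumps({"use_gpu": "auto"})


def launch_server(dumped: str, gpu_probe: Callable[[], bool]) -> bool:
    """Hypothetical launcher: report whether the server would use a GPU.

    An "auto" setting is resolved by probing the node the launcher runs on.
    """
    setting = json.loads(dumped)["use_gpu"]
    return gpu_probe() if setting == "auto" else bool(setting)


def local_node_has_gpu() -> bool:
    return False  # node where the config is dumped (assumed to have no GPU)


def worker_node_has_gpu() -> bool:
    return True  # node where the job runs (assumed to have a GPU)


# Config dumped on the local node, then reused on the worker node.
eager = launch_server(build_config_eager(local_node_has_gpu), worker_node_has_gpu)
deferred = launch_server(build_config_deferred(), worker_node_has_gpu)
print(eager, deferred)  # False True
```

In the eager variant the worker's GPU goes unused because the answer was computed on the wrong node. Deferring the probe to launch time avoids that mismatch.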
PR validation:
Unit tests succeed. Site-specific options have been tested successfully at the appropriate sites.
If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:
Not a backport and not intended to be backported.