Externals Survey
- Current: what's in ext-v2.1, i.e. the externals used by our current nightlies
- `NFDT_DEV_24<MMDD>_A9`: what's in the still-in-development ext-v2.2 and used by that particular test nightly
- Preferred (no checksum): the version labeled as "Preferred" in the latest `package.py`
- Latest from checksum: latest version if you run `spack checksum <package>`
Package | Current | NFDT_DEV_241129_A9 | NFDT_DEV_241214_A9 | NFDT_DEV_241216_A9 | NFDT_DEV_241230_A9 | NFDT_DEV_250104_A9 | NFDT_DEV_250109_A9 | NFDT_DEV_250115_A9 | Preferred (no checksum) | Latest from checksum |
---|---|---|---|---|---|---|---|---|---|---|
abseil-cpp | 20240116.2 | " | " | " | " | " | " | " | " | 20240722.0 |
boost | 1.77.0 | 1.85.0 | " | " | " | " | " | " | " | 1_87_0_b1 |
cetlib | 3.18.01 | " | " | " | " | " | " | " | " | " |
cli11 | 2.3.2 | " | " | " | " | " | " | " | " | 2.4.2 |
cmake | 3.26.3 | " | " | " | " | " | " | " | 3.27.9 | 3.31.1 |
cppzmq | 4.8.1 | 4.10.0 | " | " | " | " | " | " | " | " |
cyrus-sasl | 2.1.27 | X | X | X | X | X | 2.1.28 | " | " | " |
dpdk | 22.11 | " | " | " | " | " | " | " | 23.03 | " |
felix-software | dunedaq-v4.2.0 | " | fddaq-v5.3.0 | " | " | " | " | " | " | " |
fmt* | 8.1.1 | 10.2.1 | " | " | 8.1.1 | " | " | " | 10.2.1 | 11.0.2 |
folly* | 2021.12.13.00 | " | 2024.12.02.00 | " | " | " | " | " | 2021.05.24.00 | 2024.12.02.00 |
gcc | 12.1.0 | " | 13.2.0 | " | " | " | " | " | " | |
gdb | 13.1 | " | 14.1 | " | " | " | " | " | " | |
grpc | 1.65.1 | " | " | " | " | " | " | " | " | |
highfive | 2.7.1 | " | " | " | " | 2.9.0 | " | " | " | 3.0.0-beta1 |
intel-tbb | 2020.3 | " | 2021.9.0 | " | " | " | " | " | " | 2022.0.0 |
krb5 | 1.19.2 | X | X | X | X | X | 1.20.1 | X | 1.20.1 | 1.21.3 |
librdkafka | 1.7.0 | " | " | " | " | " | 2.2.0 | " | " | 2.6.1 |
msgpack-c | 3.3.0 | " | " | " | " | " | " | " | " | 7.0.0 |
ninja | 1.10.0 | " | 1.11.1? | 1.10.0 | " | " | " | " | | |
nlohmann-json | 3.9.0 | 3.11.2 | " | " | " | " | " | " | " | 3.11.3 |
numactl | 2.0.14 | " | " | " | " | " | " | " | " | |
openssh* | 8.7p1 | X | X | X | X | X | 9.7p1 | X | 9.7p1 | 9.9p1 |
openssl | 1.1.1t | " | " | " | " | " | " | X | 3.1.0 | |
pistache* | dunedaq-v2.8.0 | " | fddaq-v5.3.0 | dunedaq-v2.8.0* | fddaq-v5.3.0* | " | " | " | " | " |
pkgconf | 2.2.0 | " | " | " | " | " | " | " | " | 2.3.0 |
protobuf | 4.24.4 | " | " | " | " | " | " | " | " | 29.0 |
pugixml | 1.12.1 | " | " | " | " | " | " | " | 1.13 | |
py-moo | 0.6.7 | " | " | " | " | " | " | " | " | " |
py-pybind11 | 2.6.2 | 2.12.0 | " | " | " | " | " | " | " | 2.13.6 |
python* | 3.10.10 | " | " | " | " | " | " | " | 3.11.7 | |
qt | 5.15.12 | " | " | " | " | " | " | " | " | |
trace | 3.17.14 | " | " | " | " | " | " | " | " | |
uhal | 2.8.1 | " | " | " | " | " | " | " | " | " |
n.b. Pistache version `dunedaq-v2.8.0` between `NFDT_DEV_241129_A9` and `NFDT_DEV_241216_A9` had a patch added so it would build in gcc 13.2.0. Perhaps a bit confusingly, for `NFDT_DEV_241230_A9` this patched `dunedaq-v2.8.0` got rechristened `fddaq-v5.3.0`. I made this decision in order to be consistent with my treatment of `felix-software`, where I'd already bumped the version due to a simple gcc 13.2.0 compatibility patch.

n.b. Python `3.11.7` was used for the not-shown-above test build `NFDT_DEV_241213_A9`, but caused an immediate failure of `drunc`; see later in this document for more.

n.b. Between `NFDT_DEV_241216_A9` and `NFDT_DEV_241230_A9`, folly `2024.12.02.00` was rebuilt so that the `-mavx2` option was removed and the `FOLLY_F14_FORCE_FALLBACK` preprocessor `#define` was added.

n.b. `fmt` was reverted to its original version since there was a compatibility issue with `dpdklibs`, and its further use is now deprecated anyway.

n.b. For externals v2.2, `openssh` is a build-only dependency in one or two locations, but it's deleted in the build and unavailable in the nightlies, unlike externals v2.1 where it was a normal external.

Package | Current | NFDT_DEV_241129_A9 | NFDT_DEV_241214_A9 | NFDT_DEV_241216_A9 | NFDT_DEV_241230_A9 | NFDT_DEV_250104_A9 | NFDT_DEV_250109_A9 | NFDT_DEV_250115_A9 | Latest |
---|---|---|---|---|---|---|---|---|---|
anytree | 2.8.0 | 2.12.1 | " | " | " | " | X | X | |
click | 8.1.7 | " | " | " | " | " | " | " | " |
click-didyoumean | 0.3.0 | 0.3.1 | " | " | " | " | " | " | " |
click-shell | 2.1 | " | " | " | " | " | " | " | " |
colorama | 0.4.4 | 0.4.6 | " | " | " | " | " | " | " |
deepdiff | 6.3.1 | 8.0.1 | " | " | " | " | X | X | |
Flask | 2.1.1 | 3.1.0 | " | " | " | " | " | " | " |
Flask-Cors | 3.0.10 | 5.0.0 | " | " | " | " | X | X | |
Flask-Caching | X | 2.3.0 | " | " | " | " | " | " | " |
Flask-HTTPAuth | 4.6.0 | 4.8.0 | " | " | " | " | " | " | " |
Flask-RESTful | 0.3.9 | 0.3.10 | " | " | " | " | " | " | " |
Flask-SQLAlchemy | X | 3.1.1 | " | " | " | " | " | " | " |
graphviz | 0.16 | 0.20.3 | " | " | " | " | X | X | |
gunicorn | 20.1.0 | 23.0.0 | " | " | " | " | " | " | " |
h5py | 3.7.0 | 3.12.1 | " | " | " | " | " | " | " |
httpx | 0.23.3 | 0.27.2 | " | " | " | " | " | " | 0.28.0 |
kubernetes | 23.6.0 | 31.0.0 | " | " | " | " | " | " | " |
matplotlib | X | 3.9.2 | " | " | " | " | " | " | 3.9.3 |
numpy | 1.24 | 2.1.3 | " | " | " | " | " | " | " |
pandas | X | 2.2.3 | " | " | " | " | " | " | " |
pexpect | 4.8.0 | 4.9.0 | " | " | " | " | " | " | " |
psutil | 5.9.0 | 6.1.0 | " | " | " | " | " | " | " |
py | 1.10.0 | 1.11.0 | " | " | " | " | " | " | |
pytest | 8.3.3 | " | " | " | " | " | " | " | 8.3.4 |
python-ipmi | 0.5.1 | 0.5.7 | " | " | " | " | " | " | " |
rsa | 4.8 | 4.9 | " | " | " | " | " | " | " |
sh | 1.14.1 | 2.1.0 | " | " | " | " | " | " | " |
textual | 0.83.0 | 0.87.1 | " | " | " | " | " | " | 0.88.1 |
transitions | 0.8.10 | 0.9.2 | " | " | " | " | " | " | " |
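
Both tables above appear to use `"` as a ditto mark, i.e. "same version as the column to the left". That reading is an assumption, since the page doesn't spell it out. Under that assumption, a minimal sketch for expanding a row into explicit versions (the function name `expand_dittos` is made up for illustration):

```python
def expand_dittos(row: str) -> list[str]:
    """Split a pipe-delimited table row and replace each '"' cell
    with the value of the cell to its left (assumed ditto convention)."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    expanded: list[str] = []
    for cell in cells:
        if cell == '"' and expanded:
            # Ditto mark: repeat whatever the previous column resolved to.
            expanded.append(expanded[-1])
        else:
            expanded.append(cell)
    return expanded


# The cppzmq row from the externals table above.
row = 'cppzmq | 4.8.1 | 4.10.0 | " | " | " | " | " | " | " | " |'
print(expand_dittos(row))
```

Cells such as `X` or an empty cell are passed through unchanged, so a ditto mark after an `X` simply repeats the `X`.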
Notes on the `package.py`'s which have been vendored into daq-release over the years. Looking at the head of `develop` of daq-release, `d55d8cf3af4`, in `spack-repos/externals/packages`:
catch2:
Vendoring occurred in commit `cf73df3e2` from July 3 this year, and appears related to the update to Spack 0.22.0.
It's unclear why this bump required vendoring.
catch2 doesn't depend on anything.
cetlib-except depends on [email protected]
cetlib depends on catch2
hep-concurrency depends on catch2
cetmodules depends on [email protected] (build-only)
cetlib, cetlib-except, cetmodules:
Vendored because we have no choice
Only fixed dependency is, as described above, [email protected]
cmake:
Vendored so `cmake-findprotobuf.patch` can be applied to CMake versions 3.23.1 and up
Having said that, there's considerable difference beyond that between what's in daq-release and what's in builtin
cpr: Can be dropped, no longer needed in DUNE DAQ
cyrus-sasl: Can be dropped, apparently superfluous dependency
dpdk: The vendored package.py literally dates from 2021
A commit from May 11, 2022: "JCF: Issue #163: add support for a Spack installation of dpdk"
Can this be dropped? Need to keep in mind things like commit `8eb8dd7d` from earlier this year, where I deal with a libarchive dependency.
felix-software: Obviously has to be vendored. Need to modify so it works with gcc 13.2
fftw: Drop, was only used by dqm
folly: Needs to remain vendored since, incredibly, May 2021 is the latest version in builtin. Chesterton's fence? I was able to at least get it to December 2021. And furthermore, the December 2021 version doesn't build under gcc 13.1.0.
More details: whereas its dependency, glog, goes up to 0.7.0, it can only build against 0.6.0 and no later because of a complaint about how it includes headers. The December 2024 version also depends on `fast-float`, which isn't builtin in Spack, so I've vendored this as well.
grpc: Has to be vendored; builtin only goes up to 1.55.0 and its default build is C++11
hep-concurrency:
Not even used. cetlib used to depend on this. Or, it depends on it, but only if there's a `~lite` build. Are we ever going to have that?
highfive:
builtin goes to 2.9.0 while vendored goes to 2.7.1. OTOH, Pengfei added a patch. Also note you needed to add `+threadsafe` to the `hdf5` dependency. Edit this to include the later versions.
lcov: needed for my work
librdkafka:
vendored in March 2022 for unknown reasons (the classic `merge proto-spack` commit); openssl dependency added a year later (commit info is `add dependency of openssl`). builtin doesn't have the openssl dependency but does go up to 2.2.0; vendored only goes to 1.7.0.
libtorrent: vendored goes to 2.0.9; builtin to 0.13.8
libzmq:
vendored goes to 4.3.4, builtin to 4.3.5. Added entirely in one commit back in August 2022. Builtin has a patch, `Fix static assertion failure with gcc-13`, not in vendored. Candidate for removal?
msgpack-c:
vendored goes to 3.3.0, builtin to 3.1.1. Obvious keep; the only question is what further versions `spack checksum` would give us
openssh:
For externals v2.2 this will only be a dependency of a dependency of `git`, a dependency of `go`, which is a build-only dependency of rclone. But its history (on Slack, etc.) needs revisiting. Chesterton's Fence.
openssl: You vendored this in the switch to spack-0.22.0 back in the summer but it's unclear why.
perl-timedate: needed for my lcov work
pistache:
not a builtin, obviously needs to be vendored. Current version used doesn't build in gcc 13.2.0, however, because of missing headers (`<cstdint>`, IIRC)
protobuf: Vendored latest is 4.24.4, builtin latest is 3.25.3. Also some bespoke abseil-cpp version logic.
pugixml:
Added by Pengfei in 2022; it may not have existed in builtin. It does now, though note that builtin has 1.3, 1.11.4 and vendored has 1.12.1, 1.12, 1.11.4. Whether or not to remove it from being vendored depends on whether `spack spec` picks out 1.3 or not.
py-anyconfig: Added by Pengfei in the original March 2022 commit; doesn't exist in builtin
py-jsonnet: Added by Pengfei in the original March 2022 commit; doesn't exist in builtin
py-fastjsonschema: Added by Pengfei in the original March 2022 commit; identical in builtin
py-sphinxcontrib-moderncmakedomain: Added by Pengfei in the original March 2022 commit; identical in builtin
rclone:
The vendoring of this package appears to be related to the set of `rcloneConfig.cmake`-and-related files created for it. Pengfei, September 2023.
trace: Obviously needed
uhal: Not in builtin, obviously needed. Untouched since September 2022.
The `minimal_system_quick_test.py` works. However, there are two new messages which you don't see when you run using the regular nightly:
- `WARNING: All log messages before absl::InitializeLog() is called are written to STDERR` (really more informational than a warning, in my opinion)
- The snippet below is a subset of what you see; this `skipping fork() handlers` message actually appears for all controller and DAQ applications:
I0000 00:00:1734375559.316994 4109005 fork_posix.cc:75] Other threads are currently calling into gRPC, skipping fork() handlers
INFO ssh_process_manager.py:299 ssh-process-manager: Booted df-controller uid:
7500052e-7df4-4b65-a8fa-4d97662cb84f
'df-controller' (7500052e-7df4-4b65-a8fa-4d97662cb84f) process started
INFO ssh_process_manager.py:220 ssh-process-manager: Booting user: "jofreema"
session: "minimal"
name: "dfo-01"
tree_id: "0.0.0"
I0000 00:00:1734375559.333264 4109005 fork_posix.cc:75] Other threads are currently calling into gRPC, skipping fork() handlers
INFO ssh_process_manager.py:299 ssh-process-manager: Booted dfo-01 uid: 08b9cc5b-4d3b-4690-b8b3-18191e3d2075
'dfo-01' (08b9cc5b-4d3b-4690-b8b3-18191e3d2075) process started
INFO ssh_process_manager.py:220 ssh-process-manager: Booting user: "jofreema"
session: "minimal"
name: "df-01"
tree_id: "0.0.1"
Despite the messages, integration test performance is very much comparable with the regular nightlies: https://github.com/DUNE-DAQ/daq-release/actions/runs/12088527284 . One thing, however, that's appeared in the output of `listrev_test.py` (but not the tests from `integrationtest`) is SIGHUP messages (reproduced below). It's not clear this is actually a problem, since it always occurs after the data taking is complete and the system has been wound down:
Problem(s) found in logfile /tmp/pytest-of-dunedaq/pytest-1998/run4/log_dunedaq_lr-session_local-connection-server.txt:
Error: 2-16 19:35:37 -0600] [4080069] [ERROR] Worker (pid:4080077) was sent SIGHUP!
Followup: The `skipping fork() handlers` message appears since the `grpcio` Python package has been bumped from `1.63.0` to `1.68.0`. As can be seen from https://github.com/grpc/grpc, between those two versions the logging system used in `grpc` switched from an in-house one over to `abseil`; this also explains the `WARNING: All log messages ...` message.

Somewhat similarly, in the case of the messages appearing as `[ERROR] Worker (pid:4080077) was sent SIGHUP!`, this is also related to a change in the way information is logged. In this case, `gunicorn` was bumped from `20.1.0` to `23.0.0`, and in between those versions `./gunicorn/arbiter.py` was updated so that what would have previously been a warning (e.g. `[WARNING] Worker with pid 2381488 was terminated due to signal 1`) became an error (`[ERROR] Worker (pid:2371972) was sent SIGHUP!`).
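
Since the same event is reported at `WARNING` level by the older gunicorn and at `ERROR` level by the newer one, a log scan that wants to treat the two consistently could match both spellings. This is only a sketch: the two regular expressions are derived from the example messages quoted above rather than from gunicorn's source, and the helper name `sighup_worker_pid` is made up for illustration.

```python
import re

# Patterns inferred from the two example log lines quoted in the text;
# they are assumptions, not an exhaustive description of gunicorn's output.
OLD_STYLE = re.compile(r"\[WARNING\] Worker with pid (\d+) was terminated due to signal 1\b")
NEW_STYLE = re.compile(r"\[ERROR\] Worker \(pid:(\d+)\) was sent SIGHUP!")


def sighup_worker_pid(line: str):
    """Return the worker pid if the line reports a worker SIGHUP, else None."""
    for pattern in (OLD_STYLE, NEW_STYLE):
        match = pattern.search(line)
        if match:
            return int(match.group(1))
    return None


print(sighup_worker_pid("[ERROR] Worker (pid:4080077) was sent SIGHUP!"))
```

A logfile checker could then skip, or downgrade, any line for which this returns a pid, while still flagging every other `[ERROR]` line.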
Here, Python 3.11.7 was used. However, when running `minimal_system_quick_test.py` there's an immediate failure thanks to the `dataclasses` module:
(dbt) [jofreema@np04-srv-019 /nfs/sw/work_dirs/jcfree/NFDT_DEV_241213_A9]$ pytest -s -v $DAQSYSTEMTEST_SHARE/integtest/minimal_system_quick_test.py
======================================================= test session starts =======================================================
platform linux -- Python 3.11.7, pytest-8.3.3, pluggy-1.5.0 -- /nfs/sw/work_dirs/jcfree/NFDT_DEV_241213_A9/.venv/bin/python
cachedir: .pytest_cache
rootdir: /cvmfs/dunedaq-development.opensciencegrid.org/nightly/NFDT_DEV_241213_A9/spack-0.22.0
configfile: pytest.ini
plugins: anyio-4.6.2.post1, integrationtest-3.1.0
collected 0 items / 1 error
============================================================= ERRORS ==============================================================
_ ERROR collecting opt/spack/linux-almalinux9-x86_64/gcc-13.2.0/daqsystemtest-NFDT_DEV_241213_A9-5mfgaaoiauk5tv2cp65bc2fiwiy3vvtc/share/integtest/minimal_system_quick_test.py _
/cvmfs/dunedaq-development.opensciencegrid.org/nightly/NFDT_DEV_241213_A9/spack-0.22.0/opt/spack/linux-almalinux9-x86_64/gcc-13.2.0/daqsystemtest-NFDT_DEV_241213_A9-5mfgaaoiauk5tv2cp65bc2fiwiy3vvtc/share/integtest/minimal_system_quick_test.py:6: in <module>
import integrationtest.data_classes as data_classes
<frozen importlib._bootstrap>:1176: in _find_and_load
???
<frozen importlib._bootstrap>:1147: in _find_and_load_unlocked
???
<frozen importlib._bootstrap>:690: in _load_unlocked
???
.venv/lib/python3.11/site-packages/_pytest/assertion/rewrite.py:184: in exec_module
exec(co, module.__dict__)
.venv/lib/python3.11/site-packages/integrationtest/data_classes.py:21: in <module>
@dataclass
/cvmfs/dunedaq.opensciencegrid.org/spack/externals/ext-v2.2/spack-0.22.0/opt/spack/linux-almalinux9-x86_64/gcc-13.2.0/python-3.11.7-svtllxgwlzx3niutdi32fxz7wbaapnbs/lib/python3.11/dataclasses.py:1230: in dataclass
return wrap(cls)
/cvmfs/dunedaq.opensciencegrid.org/spack/externals/ext-v2.2/spack-0.22.0/opt/spack/linux-almalinux9-x86_64/gcc-13.2.0/python-3.11.7-svtllxgwlzx3niutdi32fxz7wbaapnbs/lib/python3.11/dataclasses.py:1220: in wrap
return _process_class(cls, init, repr, eq, order, unsafe_hash,
/cvmfs/dunedaq.opensciencegrid.org/spack/externals/ext-v2.2/spack-0.22.0/opt/spack/linux-almalinux9-x86_64/gcc-13.2.0/python-3.11.7-svtllxgwlzx3niutdi32fxz7wbaapnbs/lib/python3.11/dataclasses.py:958: in _process_class
cls_fields.append(_get_field(cls, name, type, kw_only))
/cvmfs/dunedaq.opensciencegrid.org/spack/externals/ext-v2.2/spack-0.22.0/opt/spack/linux-almalinux9-x86_64/gcc-13.2.0/python-3.11.7-svtllxgwlzx3niutdi32fxz7wbaapnbs/lib/python3.11/dataclasses.py:815: in _get_field
raise ValueError(f'mutable default {type(f.default)} for field '
E ValueError: mutable default <class 'integrationtest.data_classes.DROMap_config'> for field dro_map_config is not allowed: use default_factory
===================================================== short test summary info =====================================================
ERROR ../../../../../cvmfs/dunedaq-development.opensciencegrid.org/nightly/NFDT_DEV_241213_A9/spack-0.22.0/opt/spack/linux-almalinux9-x86_64/gcc-13.2.0/daqsystemtest-NFDT_DEV_241213_A9-5mfgaaoiauk5tv2cp65bc2fiwiy3vvtc/share/integtest/minimal_system_quick_test.py - ValueError: mutable default <class 'integrationtest.data_classes.DROMap_config'> for field dro_map_config is not allowed: use ...
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
======================================================== 1 error in 4.93s =========================================================
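
The `ValueError` in the traceback comes from a check in `dataclasses` which, as of Python 3.11, rejects any unhashable object as a field default; earlier versions only rejected `list`, `dict` and `set`. An instance of an ordinary `@dataclass` is unhashable by default, so it's accepted as a default under 3.10 but refused under 3.11. Below is a minimal sketch of the pattern and of the `default_factory` fix the error message suggests. `DROMapConfig` and `Config` are hypothetical stand-ins, not the actual `integrationtest` classes.

```python
from dataclasses import dataclass, field


@dataclass
class DROMapConfig:
    """Hypothetical stand-in for integrationtest's DROMap_config."""
    n_streams: int = 1


def define_with_instance_default():
    """Define a dataclass whose field default is a dataclass instance."""
    @dataclass
    class Config:
        dro_map_config: DROMapConfig = DROMapConfig()
    return Config


try:
    define_with_instance_default()
    instance_default_allowed = True    # what Python 3.10 and earlier do
except ValueError:
    instance_default_allowed = False   # what Python 3.11 and later do


@dataclass
class Config:
    # default_factory builds a fresh DROMapConfig for every Config instance,
    # which is accepted by both old and new Python versions.
    dro_map_config: DROMapConfig = field(default_factory=DROMapConfig)


a, b = Config(), Config()
print(instance_default_allowed, a.dro_map_config is b.dro_map_config)
```

Besides satisfying the newer check, `default_factory` also avoids the two `Config` objects silently sharing one mutable default.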
The major difference wrt the `NFDT_DEV_241213_A9` nightly is that Python gets reverted back to the "classic" `3.10.10` rather than the `3.11.7`, given the failures from `dataclasses`. The main sticking point this time is that when running `minimal_system_quick_test.py`, while the configuration transition goes off without a hitch, the start transition reliably hangs for every process, whether it's a controller process or DAQ process (e.g., mlt, df-01, etc.):
Running transition 'start' on controller 'root-controller'
[16:03:03] ERROR shell_utils.py:138 controller_driver: Command 'execute_fsm_command' failed
on 'mlt' (response flag 'DRUNC_EXCEPTION_THROWN')
and if you looked at an individual process log, whereas you'd see something like this for the configuration transition:
2024-Dec-14 15:51:47,687 LOG [void dunedaq::restcmd::RestEndpoint::handleResponseCommand(const dunedaq::restcmd::cmdobj_t&, dunedaq::cmdlib::cmd::CommandReply&) at /tmp/root/spack-stage/spack-stage-restcmd-NBT_DEV_241214_A9-oz47s7aon65lh4f2ncc6ku4dhn4msxgl/spack-src/src/RestEndpoint.cpp:102] Sending POST request to daq.fnal.gov:59469/response
2024-Dec-14 15:51:47,781 LOG [dunedaq::restcmd::RestEndpoint::handleResponseCommand(const dunedaq::restcmd::cmdobj_t&, dunedaq::cmdlib::cmd::CommandReply&)::<lambda(Pistache::Http::Response)> at /tmp/root/spack-stage/spack-stage-restcmd-NBT_DEV_241214_A9-oz47s7aon65lh4f2ncc6ku4dhn4msxgl/spack-src/src/RestEndpoint.cpp:109] Response code = OK
you'd always see a hang after `Sending POST request ...` for the start transition:
2024-Dec-14 15:51:49,809 LOG [void dunedaq::restcmd::RestEndpoint::handleResponseCommand(const dunedaq::restcmd::cmdobj_t&, dunedaq::cmdlib::cmd::CommandReply&) at /tmp/root/spack-stage/spack-stage-restcmd-NBT_DEV_241214_A9-oz47s7aon65lh4f2ncc6ku4dhn4msxgl/spack-src/src/RestEndpoint.cpp:102] Sending POST request to daq.fnal.gov:59469/response
This is what led me to hypothesize that the problem had to do with the Pistache package, which `restcmd` depends on and which I'd bumped up from its classic October 2020 commit to a December 2024 commit between `NFDT_DEV_241129_A9` and `NFDT_DEV_241214_A9`.
Now, another important point: whether it's because of the bump from gcc 12.1.0 to gcc 13.2.0 or the bump of folly from 2021.12.13.00 to 2024.12.02.00, it seems folly refuses to link to a translation unit if it can tell that the translation unit was built with a different set of flags (it uses `F14LinkCheck` for this). As a result, since `fdreadoutlibs` uses the `-mavx2` option to access Advanced Vector Extensions 2, I needed to move this flag into daq-cmake's `daq_setup_environment` so that all code would build with it.
As just mentioned, the major change here was to revert Pistache to its October 2020 commit, which we've been using for years. With one catch: I had to add a patch in order for it to build against gcc 13.2.0 rather than 12.1.0. It's the same change I've made in other packages, namely, adding `<cXXXXXX>` includes where needed (e.g., `<cstdlib>`).
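
When hunting for files that need this kind of patch, a rough textual scan can narrow things down before the compiler does. The sketch below only covers the `<cstdint>` case and is a heuristic under stated assumptions: it looks at a single file's text, ignores headers pulled in transitively, and the function name `needs_cstdint` is made up for illustration.

```python
import re

# Fixed-width integer type names such as uint8_t, int32_t, std::uint64_t.
FIXED_WIDTH = re.compile(r"\b(?:std::)?u?int(?:8|16|32|64)_t\b")
# A direct include of <cstdint> or its C counterpart <stdint.h>.
INCLUDES_CSTDINT = re.compile(
    r"^\s*#\s*include\s*<(?:cstdint|stdint\.h)>", re.MULTILINE
)


def needs_cstdint(source: str) -> bool:
    """True if the text uses fixed-width integer types but never
    includes <cstdint> (or <stdint.h>) directly."""
    uses_types = FIXED_WIDTH.search(source) is not None
    has_include = INCLUDES_CSTDINT.search(source) is not None
    return uses_types and not has_include


print(needs_cstdint("#include <string>\nuint32_t counter = 0;\n"))
```

A file flagged this way isn't necessarily broken, since another header may still provide the types; it's just a candidate for a closer look.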
With this change made, the hang went away. E.g., integration tests worked about as well as you'd expect, especially given the recent woes of `daq.fnal.gov`: https://github.com/DUNE-DAQ/daq-release/actions/runs/12364412829
Having said that, a couple of things to note:
- Specifically on `protodune-daq01.fnal.gov`, the integration tests (and builds, etc.) don't seem to work. This will be investigated, but I wonder if it's related to (1) the jump from gcc 12.1.0 to gcc 13.2.0 and/or (2) the expanded instruction set from the `-mavx2` option passed to gcc. On the np04 cluster and `daq.fnal.gov`, this doesn't show up.
The main change here is that now, rather than building `folly` + all the DUNE DAQ packages with the `-mavx2` option as was the case for `NFDT_DEV_241214_A9` and `NFDT_DEV_241216_A9`, I build it with the `FOLLY_F14_FORCE_FALLBACK` preprocessor symbol `#define`d and set to `1`. I also build the DUNE DAQ packages this way. This allows `folly` to link against translation units with different build flags.
Note that with this change, it's possible to build the full DUNE DAQ stack on `protodune-daq01.fnal.gov` whereas it wasn't before (in fact, this was done to create `NFDU_DEV_241230_A9`, same as `NFDT_DEV_241230_A9` except the choice of build machine). The integration tests continue to fail, but get further than under `NFDT_DEV_241216_A9`. Whereas with the `NFDT_DEV_241216_A9` integration tests on `protodune-daq01` nothing even booted (https://github.com/DUNE-DAQ/daq-release/actions/runs/12376009563), now with `NFDU_DEV_241216_A9` things boot, but all the applications then crash (https://github.com/DUNE-DAQ/daq-release/actions/runs/12376009563).
Digging down, what I know so far is that if, in a `NFDU_DEV_241230_A9`-based work area on `protodune-daq01`, I try this directly from the command line:
daq_application -s minimal --name mlt -c rest://localhost:0 --configurationService oksconflibs:/tmp/pytest-of-dunedaq/pytest-18/config0/integtest-session-resolved.data.xml
then there's a crash with a complaint about an Illegal Instruction (this is an example of a `daq_application` call done during integration tests). Will investigate some more, though it's important to note that the integration tests run fine on `np04` and `daq.fnal.gov`.
Followup: The Illegal Instruction complaint occurs in other programs as well when we run on `protodune-daq01`; e.g., `queue_IO_check` from `iomanager`. However, I'm really not going to lose sleep over this for reasons I'm about to explain. Whereas `NFDU_DEV_241230_A9` was built on `protodune-daq01.fnal.gov`, the externals it uses weren't; they were built as usual on `daq.fnal.gov`. And the fact is, `protodune-daq01` has old processors: Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz. If you go to the relevant Intel page you see that this was released in 2012 (!). On all other hosts I've tried (`lxplus960`, `np04-srv-019`, `daq.fnal.gov`) we don't have the Illegal Instruction complaint. Unless people clamor for our software to run on decade-old processors, I'm fine with it not running on a local, old computer at Fermilab.
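
One quick way to compare hosts is to look at the instruction-set flags each CPU advertises. The sketch below assumes a Linux host exposing `/proc/cpuinfo` with a `flags` line. The notes above don't identify which instruction actually triggers the fault, so the default of checking for `avx2` is only an illustrative guess, and the helper names are made up.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the feature flags from the first 'flags' line of
    /proc/cpuinfo-style text (empty set if there is none)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text: str, required=("avx2",)) -> list[str]:
    """Return the required flags that the CPU does not advertise."""
    have = cpu_flags(cpuinfo_text)
    return [flag for flag in required if flag not in have]


# Made-up cpuinfo excerpt for illustration, not taken from any real host.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 avx\n"
print(missing_flags(sample))
```

On a real machine the text would come from reading `/proc/cpuinfo`; running the same check on the build host and on the host where the crash occurs shows whether the two differ in any flag of interest.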
Essentially the only difference here with respect to `NFDT_DEV_241230_A9` is that I bumped `highfive` from version `2.7.1` to `2.9.0`. The version bump was largely opportunistic, a result of my needing to pore over how HighFive brings in HDF5 and adding a patch which accounts for a false warning message it provided claiming that HDF5 doesn't support parallel computing (it does). For more details on what I'm talking about, see the Issue I filed and the commit I used to address it. Note I haven't yet closed the Issue since the commit currently exists on the `johnfreeman/update_externals` branch of daq-release and not yet its `develop` branch.
One issue I've discovered - and that almost certainly extends back to earlier test releases as well - is that when programs are linked against `RdKafka::rdkafka` and/or `RdKafka::rdkafka++` (e.g., as happens in `erskafka`) we get this warning, which I'll need to investigate:
/cvmfs/dunedaq.opensciencegrid.org/spack/externals/ext-v2.2/spack-0.22.0/opt/spack/linux-almalinux9-x86_64/gcc-11.4.1/binutils-2.42-7o72ksi4dkzlabij67uk3bmzouupx4mh/bin/ld: warning: libcrypto.so.3, needed by /lib64/libgssapi_krb5.so.2, may conflict with libcrypto.so.1.1
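
To spot this kind of warning in a long build log, the three library names can be pulled out of the message with a regular expression. This is a sketch assuming the warning keeps the wording shown above; the helper name `parse_ld_conflict` is made up for illustration.

```python
import re

# Matches the tail of the ld warning shown above:
#   warning: <needed>, needed by <by>, may conflict with <other>
LD_CONFLICT = re.compile(
    r"warning: (?P<needed>\S+), needed by (?P<by>\S+), "
    r"may conflict with (?P<other>\S+)"
)


def parse_ld_conflict(line: str):
    """Return a dict with keys 'needed', 'by' and 'other' if the line is
    an ld shared-library conflict warning, else None."""
    match = LD_CONFLICT.search(line)
    return match.groupdict() if match else None


example = (
    "ld: warning: libcrypto.so.3, needed by /lib64/libgssapi_krb5.so.2, "
    "may conflict with libcrypto.so.1.1"
)
print(parse_ld_conflict(example))
```

In the warning above, the `by` field points into `/lib64`, i.e. the library pulling in the other `libcrypto` is a system one rather than something from the Spack-built stack.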
This test nightly uses what I consider to be the final version of externals v2.2, unless I get input requesting otherwise. Relative to `NFDT_DEV_250104_A9`, the changes are:
- To fix the linking warning I describe above in the entry for `NFDT_DEV_250104_A9`, `openssh` and its dependency `krb5`, which are in externals v2.1 but which I'd been leaving out of externals v2.2, had to be included. The reason is that the `librdkafka` shared object libraries use `libgssapi_krb5.so.*` libraries, found in `krb5` but also in a system's `/lib64` directory. The `libgssapi_krb5.so.*` previously used by `librdkafka` was the system one, which depended on one version of `libcrypto.so.*`, but `openssl` in our stack was using a different version, causing a link warning. After discussion with Alessandro and Pengfei on the Software Coordination channel on Jan-9-2024, I reincluded them. Note that dropping `openssl` from the stack wasn't an attractive option since dependencies on it appear in many different `package.py`'s in the `builtin` area which we make use of, and they would all have needed to be vendored.
- I took the opportunity to bump the version of `librdkafka` as well, plus make its dependence on `krb5` explicit in its (vendored) `package.py` file.
- A few Python packages were dropped, namely `nanorc` and the packages which were only used for `nanorc`.
This test nightly's build can be found here and its integration tests here.
Two changes after some discussion with Alessandro. One is that `librdkafka` now depends on `cyrus-sasl` rather than `krb5`; this is because what its `CMakeLists.txt` directly needs is the SASL ("Simple Authentication and Security Layer") package, not `krb5`. The other is that on his request I've stripped out `openssh`, `krb5`, and `openssl`; note that stripping out `openssl` required a bit of work.
The test nightly's build can be found here and its integration tests here.
- `NFDT_DEV_250111_DEV` contains the latest-greatest externals v2.2; I feel that it's ready for general use
- `gcc` is bumped from `12.1.0` to `13.2.0`. This means full support for C++20 (formerly, the `<format>` library wasn't available)
- `python` did not get bumped; as noted in the section on `NFDT_DEV_241213_A9`, `integrationtest` isn't compatible with the `dataclasses` module which came with the newer version of Python I tried
- `cmake` did not get bumped; my reasoning being that its version was bumped less than a year ago, plus a fear of rocking the boat with respect to the `daq-cmake` functions
- In general, if version bumps were available for externals packages, I took them. In some cases, however, I didn't - e.g., `dpdk` is such a specialist package that I didn't want to touch it, and a newer version of `fmt` broke `dpdklibs` (plus its use is now deprecated anyway)
- A few packages needed (trivial) patches to build against `gcc` `13.2.0`; this usually involved adding `<cstdint>` where it had been missing. So `pistache` and `felix-software` versions `fddaq-v5.3.0` are simply their previous versions, but with these patches added
- One phenomenon to be aware of is that as package versions are bumped, they tend to add new messages (info, warning, or error). Details on how I handled this can be found in the test-nightly-specific notes above
- A few of the DUNE DAQ repos needed very minor changes. All of them involved adding the header I mentioned above (`<cstdint>`), except for `daq-cmake` (needed to add a preprocessor definition to get `folly` to link correctly) and `cmdlib` (needed to modernize the way it referred to `intel-tbb` in its `CMakeLists.txt`). These changes are all on branches called `johnfreeman/build_against_externals_v2.2`.
- Processes from releases built with `gcc` `13.2.0` result in `Illegal Instruction` errors on `protodune-daq01.fnal.gov`. However, that computer uses processors which were released in 2012, and the code runs fine on computers with chips made in the last decade such as `np04`, `lxplus`, `daq.fnal.gov`, so I'm not too concerned.
- `openssh`, `krb5` and `openssl` were all removed as Spack packages; the system versions of the packages will now be used.