Externals Survey

John Freeman edited this page Jan 15, 2025 · 43 revisions

Spack Packages

  • Current: what's in ext-v2.1, i.e. the externals used by our current nightlies
  • NFDT_DEV_<YYMMDD>_A9: what's in the still-in-development ext-v2.2 and used by that particular test nightly
  • Preferred (no checksum): the version labeled as "Preferred" in the latest package.py
  • Latest from checksum: latest version if you run spack checksum <package>
Current NFDT_DEV_241129_A9 NFDT_DEV_241214_A9 NFDT_DEV_241216_A9 NFDT_DEV_241230_A9 NFDT_DEV_250104_A9 NFDT_DEV_250109_A9 NFDT_DEV_250115_A9 Preferred (no checksum) Latest from checksum
abseil-cpp 20240116.2 " " " " " " " " 20240722.0
boost 1.77.0 1.85.0 " " " " " " " 1_87_0_b1
cetlib 3.18.01 " " " " " " " " "
cli11 2.3.2 " " " " " " " " 2.4.2
cmake 3.26.3 " " " " " " " 3.27.9 3.31.1
cppzmq 4.8.1 4.10.0 " " " " " " " "
cyrus-sasl 2.1.27 X X X X X 2.1.28 " " "
dpdk 22.11 " " " " " " " 23.03 "
felix-software dunedaq-v4.2.0 " fddaq-v5.3.0 " " " " " " "
fmt* 8.1.1 10.2.1 " " 8.1.1 " " " 10.2.1 11.0.2
folly* 2021.12.13.00 " 2024.12.02.00 " " " " " 2021.05.24.00 2024.12.02.00
gcc 12.1.0 " 13.2.0 " " " " " "
gdb 13.1 " 14.1 " " " " " "
grpc 1.65.1 " " " " " " " "
highfive 2.7.1 " " " " 2.9.0 " " " 3.0.0-beta1
intel-tbb 2020.3 " 2021.9.0 " " " " " " 2022.0.0
krb5 1.19.2 X X X X X 1.20.1 X 1.20.1 1.21.3
librdkafka 1.7.0 " " " " " 2.2.0 " " 2.6.1
msgpack-c 3.3.0 " " " " " " " " 7.0.0
ninja 1.10.0 " 1.11.1? 1.10.0 " " " "
nlohmann-json 3.9.0 3.11.2 " " " " " " " 3.11.3
numactl 2.0.14 " " " " " " " "
openssh* 8.7p1 X X X X X 9.7p1 X 9.7p1 9.9p1
openssl 1.1.1t " " " " " " X 3.1.0
pistache* dunedaq-v2.8.0 " fddaq-v5.3.0 dunedaq-v2.8.0* fddaq-v5.3.0* " " " " "
pkgconf 2.2.0 " " " " " " " " 2.3.0
protobuf 4.24.4 " " " " " " " " 29.0
pugixml 1.12.1 " " " " " " " 1.13
py-moo 0.6.7 " " " " " " " " "
py-pybind11 2.6.2 2.12.0 " " " " " " " 2.13.6
python* 3.10.10 " " " " " " " 3.11.7
qt 5.15.12 " " " " " " " "
trace 3.17.14 " " " " " " " "
uhal 2.8.1 " " " " " " " " "

n.b. Pistache version dunedaq-v2.8.0 between NFDT_DEV_241129_A9 and NFDT_DEV_241216_A9 had a patch added so it would build in gcc 13.2.0. Perhaps a bit confusingly, for NFDT_DEV_241230_A9 this patched dunedaq-v2.8.0 got rechristened fddaq-v5.3.0. I made this decision in order to be consistent with my treatment of felix-software, where I'd already bumped the version due to a simple gcc 13.2.0 compatibility patch.

n.b. Python 3.11.7 was used for the not-shown-above test build NFDT_DEV_241213_A9, but caused an immediate failure of drunc; see later in this document for more.

n.b. Between NFDT_DEV_241216_A9 and NFDT_DEV_241230_A9, folly 2024.12.02.00 was rebuilt so that the -mavx2 option was removed and the FOLLY_F14_FORCE_FALLBACK preprocessor #define was added.

n.b. fmt was reverted to its original version since there was a compatibility issue with dpdklibs, and its further use is now deprecated anyway

n.b. For externals v2.2, openssh is a build-only dependency in one or two locations, but it's deleted in the build and unavailable in the nightlies, unlike externals v2.1 where it was a normal external

Python packages

current NFDT_DEV_241129_A9 NFDT_DEV_241214_A9 NFDT_DEV_241216_A9 NFDT_DEV_241230_A9 NFDT_DEV_250104_A9 NFDT_DEV_250109_A9 NFDT_DEV_250115_A9 Latest
anytree 2.8.0 2.12.1 " " " " X X
click 8.1.7 " " " " " " " "
click-didyoumean 0.3.0 0.3.1 " " " " " " "
click-shell 2.1 " " " " " " " "
colorama 0.4.4 0.4.6 " " " " " " "
deepdiff 6.3.1 8.0.1 " " " " X X
Flask 2.1.1 3.1.0 " " " " " " "
Flask-Cors 3.0.10 5.0.0 " " " " X X
Flask-Caching X 2.3.0 " " " " " " "
Flask-HTTPAuth 4.6.0 4.8.0 " " " " " " "
Flask-RESTful 0.3.9 0.3.10 " " " " " " "
Flask-SQLAlchemy X 3.1.1 " " " " " " "
graphviz 0.16 0.20.3 " " " " X X
gunicorn 20.1.0 23.0.0 " " " " " " "
h5py 3.7.0 3.12.1 " " " " " " "
httpx 0.23.3 0.27.2 " " " " " " 0.28.0
kubernetes 23.6.0 31.0.0 " " " " " " "
matplotlib X 3.9.2 " " " " " " 3.9.3
numpy 1.24 2.1.3 " " " " " " "
pandas X 2.2.3 " " " " " " "
pexpect 4.8.0 4.9.0 " " " " " " "
psutil 5.9.0 6.1.0 " " " " " " "
py 1.10.0 1.11.0 " " " " " "
pytest 8.3.3 " " " " " " " 8.3.4
python-ipmi 0.5.1 0.5.7 " " " " " " "
rsa 4.8 4.9 " " " " " " "
sh 1.14.1 2.1.0 " " " " " " "
textual 0.83.0 0.87.1 " " " " " " 0.88.1
transitions 0.8.10 0.9.2 " " " " " " "

Notes on vendored package.pys

Notes on the package.py's which have been vendored into daq-release over the years. Looking at the head of develop of daq-release, d55d8cf3af4, in spack-repos/externals/packages:

catch2:

Vendoring occurred in commit cf73df3e2 from July 3 this year, and appears related to the update to Spack 0.22.0. It's unclear why this bump required vendoring. catch2 doesn't depend on anything.

cetlib-except depends on [email protected]

cetlib depends on catch2

hep-concurrency depends on catch2

cetmodules depends on [email protected] (build-only)

cetlib, cetlib-except, cetmodules:

Vendored because we have no choice

Only fixed dependency is, as described above, [email protected]

cmake:

Vendored so cmake-findprotobuf.patch can be applied to CMake versions 3.23.1 and up

Having said that, there's considerable difference beyond that between what's in daq-release and what's in builtin

cpr: Can be dropped, no longer needed in DUNE DAQ

cyrus-sasl: Can be dropped, apparently superfluous dependency

dpdk: The vendored package.py literally dates from 2021

A commit from May 11, 2022: "JCF: Issue #163: add support for a Spack installation of dpdk"

Can this be dropped? Need to keep in mind things like commit 8eb8dd7d from earlier this year, where I deal with a libarchive dependency.

felix-software: Obviously has to be vendored. Need to modify so it works with gcc 13.2

fftw: Drop, was only used by dqm

folly: Needs to remain vendored since, incredibly, May 2021 is the latest version in builtin. Chesterton's fence? I was able to at least get it to December 2021. Furthermore, the December 2021 version doesn't build under gcc 13.1.0.

More details: whereas its dependency, glog, goes up to 0.7.0, it can only build against 0.6.0 and no later because of a complaint about how it includes headers. The December 2024 version also depends on fast-float, which isn't builtin in Spack, so I've vendored this as well.

grpc: Has to be vendored, builtin only goes up to 1.55.0 and the default build is C++11

hep-concurrency: Not even used. cetlib used to depend on this. Or, it depends on it, but only if there's a ~lite build. Are we ever going to have that?

highfive: builtin goes to 2.9.0 while vendored goes to 2.7.1. OTOH, Pengfei added a patch. Also note you needed to add +threadsafe to hdf5 dependency. Edit this to include the later versions.

lcov: needed for my work

librdkafka: vendored in March 2022 for unknown reasons (the classic merge proto-spack commit); openssl dependency added a year later (commit info is add dependency of openssl). builtin doesn't have the openssl dependency but does go up to 2.2.0, whereas vendored only goes to 1.7.0.

libtorrent: vendored goes to 2.0.9; builtin to 0.13.8

libzmq: vendored goes to 4.3.4, builtin to 4.3.5. Added entirely in one commit back in August 2022. Builtin has a patch Fix static assertion failure with gcc-13, not in vendored. Candidate for removal?

msgpack-c: vendored goes to 3.3.0, builtin to 3.1.1. Obvious keep, the only question is what further versions spack checksum would give us

openssh: For externals v2.2 this will only be a dependency of a dependency of git, a dependency of go which is a build-only dependency of rclone. But its history (on Slack, etc.) needs revisiting. Chesterton's Fence.

openssl: You vendored this in the switch to spack-0.22.0 back in the summer but it's unclear why.

perl-timedate: needed for my lcov work

pistache: not in builtin, obviously needs to be vendored. The version currently used doesn't build in gcc 13.2.0, however, because of missing headers (<cstdint>, IIRC)

protobuf: Vendored latest is 4.24.4, builtin latest is 3.25.3. Also some bespoke abseil-cpp version logic.

pugixml: Added by Pengfei in 2022; it may not have existed in builtin. It does now, though note that builtin has 1.13, 1.11.4 and vendored has 1.12.1, 1.12, 1.11.4. Whether or not to remove it from being vendored depends on whether spack spec picks out 1.13 or not.

py-anyconfig: Added by Pengfei in the original March 2022 commit; doesn't exist in builtin

py-jsonnet: Added by Pengfei in the original March 2022 commit; doesn't exist in builtin

py-fastjsonschema: Added by Pengfei in the original March 2022 commit; identical in builtin

py-sphinxcontrib-moderncmakedomain: Added by Pengfei in the original March 2022 commit; identical in builtin

rclone: The vendoring of this package appears to be related to the set of rcloneConfig.cmake-and-related files created for it. Pengfei, September 2023.

trace: Obviously needed

uhal: Not in builtin, obviously needed. Untouched since September 2022.

Notes on specific nightlies

NFDT_DEV_241129_A9

The minimal_system_quick_test.py works. However, there are two new messages which you don't see when you run using the regular nightly:

  1. WARNING: All log messages before absl::InitializeLog() is called are written to STDERR (really more informational than a warning, in my opinion)
  2. The snippet below is a subset of what you see; the skipping fork() handlers message actually appears for all controller and DAQ applications:
I0000 00:00:1734375559.316994 4109005 fork_posix.cc:75] Other threads are currently calling into gRPC, skipping fork() handlers
           INFO     ssh_process_manager.py:299      ssh-process-manager:    Booted df-controller uid:                              
                    7500052e-7df4-4b65-a8fa-4d97662cb84f                                                                           
'df-controller' (7500052e-7df4-4b65-a8fa-4d97662cb84f) process started
           INFO     ssh_process_manager.py:220      ssh-process-manager:    Booting user: "jofreema"                               
                    session: "minimal"                                                                                             
                    name: "dfo-01"                                                                                                 
                    tree_id: "0.0.0"                                                                                               
                                                                                                                                   
I0000 00:00:1734375559.333264 4109005 fork_posix.cc:75] Other threads are currently calling into gRPC, skipping fork() handlers
           INFO     ssh_process_manager.py:299      ssh-process-manager:    Booted dfo-01 uid: 08b9cc5b-4d3b-4690-b8b3-18191e3d2075
'dfo-01' (08b9cc5b-4d3b-4690-b8b3-18191e3d2075) process started
           INFO     ssh_process_manager.py:220      ssh-process-manager:    Booting user: "jofreema"                               
                    session: "minimal"                                                                                             
                    name: "df-01"                                                                                                  
                    tree_id: "0.0.1"                                                                                      

Despite the messages, integration test performance is very much comparable with the regular nightlies: https://github.com/DUNE-DAQ/daq-release/actions/runs/12088527284 . One thing, however, that's appeared in the output of listrev_test.py (but not in the tests from integrationtest) is SIGHUP messages (reproduced below). It's not clear this is actually a problem, since it always occurs after the data taking is complete and the system has been wound down.

Problem(s) found in logfile /tmp/pytest-of-dunedaq/pytest-1998/run4/log_dunedaq_lr-session_local-connection-server.txt:
Error: 2-16 19:35:37 -0600] [4080069] [ERROR] Worker (pid:4080077) was sent SIGHUP!

Followup: The skipping fork() handlers message appears because the grpcio Python package has been bumped from 1.63.0 to 1.68.0. As can be seen from https://github.com/grpc/grpc, between those two versions the logging system used in grpc switched from an in-house one over to abseil; this also explains the WARNING: All log messages ... message.

Somewhat similarly, in the case of the messages appearing as [ERROR] Worker (pid:4080077) was sent SIGHUP!, this is also related to a change in the way information is logged. In this case, gunicorn was bumped from 20.1.0 to 23.0.0, and in between those versions the ./gunicorn/arbiter.py was updated so that what would have previously been a warning (e.g. [WARNING] Worker with pid 2381488 was terminated due to signal 1) became an error ([ERROR] Worker (pid:2371972) was sent SIGHUP!)
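Since both kinds of message come from logging changes rather than new failures, one way to keep them from drowning out real problems when scanning logs is an explicit list of known-benign patterns. The sketch below is purely illustrative: the pattern list and the function name unexpected_problems are hypothetical, and this is not how integrationtest itself scans logfiles.

```python
import re

# Hypothetical allow-list: messages which showed up after the grpcio and
# gunicorn bumps but which only reflect changes in how things are logged.
KNOWN_BENIGN = [
    re.compile(r"All log messages before absl::InitializeLog\(\) is called"),
    re.compile(r"skipping fork\(\) handlers"),
    re.compile(r"\[ERROR\] Worker \(pid:\d+\) was sent SIGHUP!"),
]

# Very rough notion of "looks like a problem" for the purposes of this sketch.
PROBLEM = re.compile(r"\b(ERROR|WARNING)\b")


def unexpected_problems(lines):
    """Return lines which look like problems and match no known-benign pattern."""
    return [
        line
        for line in lines
        if PROBLEM.search(line) and not any(p.search(line) for p in KNOWN_BENIGN)
    ]


sample = [
    "WARNING: All log messages before absl::InitializeLog() is called are written to STDERR",
    "[ERROR] Worker (pid:4080077) was sent SIGHUP!",
    "ERROR    shell_utils.py:138      controller_driver:      Command 'execute_fsm_command' failed",
]
remaining = unexpected_problems(sample)
print(remaining)
```

Only the last sample line survives the filter; the first two are recognized as the relabelled messages discussed above.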

NFDT_DEV_241213_A9

Here, Python 3.11.7 was used. However, running minimal_system_quick_test.py there's an immediate failure thanks to the dataclasses module:

(dbt) [jofreema@np04-srv-019 /nfs/sw/work_dirs/jcfree/NFDT_DEV_241213_A9]$ pytest -s -v $DAQSYSTEMTEST_SHARE/integtest/minimal_system_quick_test.py
======================================================= test session starts =======================================================
platform linux -- Python 3.11.7, pytest-8.3.3, pluggy-1.5.0 -- /nfs/sw/work_dirs/jcfree/NFDT_DEV_241213_A9/.venv/bin/python
cachedir: .pytest_cache
rootdir: /cvmfs/dunedaq-development.opensciencegrid.org/nightly/NFDT_DEV_241213_A9/spack-0.22.0
configfile: pytest.ini
plugins: anyio-4.6.2.post1, integrationtest-3.1.0
collected 0 items / 1 error                                                                                                       

============================================================= ERRORS ==============================================================
_ ERROR collecting opt/spack/linux-almalinux9-x86_64/gcc-13.2.0/daqsystemtest-NFDT_DEV_241213_A9-5mfgaaoiauk5tv2cp65bc2fiwiy3vvtc/share/integtest/minimal_system_quick_test.py _
/cvmfs/dunedaq-development.opensciencegrid.org/nightly/NFDT_DEV_241213_A9/spack-0.22.0/opt/spack/linux-almalinux9-x86_64/gcc-13.2.0/daqsystemtest-NFDT_DEV_241213_A9-5mfgaaoiauk5tv2cp65bc2fiwiy3vvtc/share/integtest/minimal_system_quick_test.py:6: in <module>
    import integrationtest.data_classes as data_classes
<frozen importlib._bootstrap>:1176: in _find_and_load
    ???
<frozen importlib._bootstrap>:1147: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:690: in _load_unlocked
    ???
.venv/lib/python3.11/site-packages/_pytest/assertion/rewrite.py:184: in exec_module
    exec(co, module.__dict__)
.venv/lib/python3.11/site-packages/integrationtest/data_classes.py:21: in <module>
    @dataclass
/cvmfs/dunedaq.opensciencegrid.org/spack/externals/ext-v2.2/spack-0.22.0/opt/spack/linux-almalinux9-x86_64/gcc-13.2.0/python-3.11.7-svtllxgwlzx3niutdi32fxz7wbaapnbs/lib/python3.11/dataclasses.py:1230: in dataclass
    return wrap(cls)
/cvmfs/dunedaq.opensciencegrid.org/spack/externals/ext-v2.2/spack-0.22.0/opt/spack/linux-almalinux9-x86_64/gcc-13.2.0/python-3.11.7-svtllxgwlzx3niutdi32fxz7wbaapnbs/lib/python3.11/dataclasses.py:1220: in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
/cvmfs/dunedaq.opensciencegrid.org/spack/externals/ext-v2.2/spack-0.22.0/opt/spack/linux-almalinux9-x86_64/gcc-13.2.0/python-3.11.7-svtllxgwlzx3niutdi32fxz7wbaapnbs/lib/python3.11/dataclasses.py:958: in _process_class
    cls_fields.append(_get_field(cls, name, type, kw_only))
/cvmfs/dunedaq.opensciencegrid.org/spack/externals/ext-v2.2/spack-0.22.0/opt/spack/linux-almalinux9-x86_64/gcc-13.2.0/python-3.11.7-svtllxgwlzx3niutdi32fxz7wbaapnbs/lib/python3.11/dataclasses.py:815: in _get_field
    raise ValueError(f'mutable default {type(f.default)} for field '
E   ValueError: mutable default <class 'integrationtest.data_classes.DROMap_config'> for field dro_map_config is not allowed: use default_factory
===================================================== short test summary info =====================================================
ERROR ../../../../../cvmfs/dunedaq-development.opensciencegrid.org/nightly/NFDT_DEV_241213_A9/spack-0.22.0/opt/spack/linux-almalinux9-x86_64/gcc-13.2.0/daqsystemtest-NFDT_DEV_241213_A9-5mfgaaoiauk5tv2cp65bc2fiwiy3vvtc/share/integtest/minimal_system_quick_test.py - ValueError: mutable default <class 'integrationtest.data_classes.DROMap_config'> for field dro_map_config is not allowed: use ...
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
======================================================== 1 error in 4.93s =========================================================
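The ValueError comes from a behavior change in the standard library: up to Python 3.10, dataclasses only rejected list, dict and set instances as field defaults, whereas from 3.11 on any unhashable default is rejected, and an instance of an ordinary (non-frozen) dataclass is unhashable. A minimal sketch of the failure and of the default_factory fix which the error message suggests follows; DROMapConfig and its n_streams field are made-up stand-ins, not the actual integrationtest code.

```python
import sys
from dataclasses import dataclass, field


@dataclass
class DROMapConfig:
    # Stand-in for integrationtest's DROMap_config; the field is made up.
    n_streams: int = 1


def define_with_instance_default():
    """Define a dataclass whose field default is a dataclass instance."""

    @dataclass
    class Broken:
        # Accepted by Python <= 3.10, rejected with ValueError by >= 3.11.
        dro_map_config: DROMapConfig = DROMapConfig()

    return Broken


@dataclass
class Fixed:
    # default_factory builds a fresh DROMapConfig for every Fixed instance.
    dro_map_config: DROMapConfig = field(default_factory=DROMapConfig)


try:
    define_with_instance_default()
    error = None
except ValueError as exc:
    error = exc

print(sys.version_info[:2], "->", error)

a, b = Fixed(), Fixed()
print(a.dro_map_config is b.dro_map_config)
```

Besides satisfying Python 3.11, the default_factory form means instances no longer share a single default object, which is the hazard the stricter check is meant to catch.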

NFDT_DEV_241214_A9

The major difference wrt the NFDT_DEV_241213_A9 nightly is that Python gets reverted back to the "classic" 3.10.10 rather than the 3.11.7 given the failures from dataclasses. The main sticking point this time is that when running minimal_system_quick_test.py, while the configuration transition goes off without a hitch, the start transition reliably hangs for every process, whether it's a controller process or DAQ process (e.g., mlt, df-01, etc.):

Running transition 'start' on controller 'root-controller'
[16:03:03] ERROR    shell_utils.py:138      controller_driver:      Command 'execute_fsm_command' failed
                    on 'mlt' (response flag 'DRUNC_EXCEPTION_THROWN')

and if you looked at an individual process log, whereas you'd see something like this for the configuration transition:

2024-Dec-14 15:51:47,687 LOG [void dunedaq::restcmd::RestEndpoint::handleResponseCommand(const dunedaq::restcmd::cmdobj_t&, dunedaq::cmdlib::cmd::CommandReply&) at /tmp/root/spack-stage/spack-stage-restcmd-NBT_DEV_241214_A9-oz47s7aon65lh4f2ncc6ku4dhn4msxgl/spack-src/src/RestEndpoint.cpp:102] Sending POST request to daq.fnal.gov:59469/response
2024-Dec-14 15:51:47,781 LOG [dunedaq::restcmd::RestEndpoint::handleResponseCommand(const dunedaq::restcmd::cmdobj_t&, dunedaq::cmdlib::cmd::CommandReply&)::<lambda(Pistache::Http::Response)> at /tmp/root/spack-stage/spack-stage-restcmd-NBT_DEV_241214_A9-oz47s7aon65lh4f2ncc6ku4dhn4msxgl/spack-src/src/RestEndpoint.cpp:109] Response code = OK

you'd always see a hang after Sending POST request ... for the start transition:

2024-Dec-14 15:51:49,809 LOG [void dunedaq::restcmd::RestEndpoint::handleResponseCommand(const dunedaq::restcmd::cmdobj_t&, dunedaq::cmdlib::cmd::CommandReply&) at /tmp/root/spack-stage/spack-stage-restcmd-NBT_DEV_241214_A9-oz47s7aon65lh4f2ncc6ku4dhn4msxgl/spack-src/src/RestEndpoint.cpp:102] Sending POST request to daq.fnal.gov:59469/response

This is what led me to hypothesize that the problem had to do with the Pistache package which restcmd depends on and which I'd bumped up from its classic, October 2020 commit to a December 2024 commit between NFDT_DEV_241129_A9 and NFDT_DEV_241214_A9.

Now, another important point: whether it's because of the bump from gcc 12.1.0 to gcc 13.2.0 or the bump of folly from 2021.12.13.00 to 2024.12.02.00, it seems folly refuses to link to a translation unit if it can tell that the translation unit was built with a different set of flags (it uses F14LinkCheck for this). As a result, since fdreadoutlibs uses the -mavx2 option to access Advanced Vector Extensions 2, I needed to move this flag into daq-cmake's daq_setup_environment so that all code would build with it.

NFDT_DEV_241216_A9

As just mentioned, the major change here was to revert Pistache to its October 2020 commit, which we've been using for years. With one catch: I had to add a patch in order for it to build against gcc 13.2.0 rather than 12.1.0. It's the same change I've made in other packages, namely, adding <cXXXXXX> includes where needed (e.g., <cstdlib>).

With this change made, the hang went away. E.g., integration tests worked about as well as you'd expect, especially given the recent woes of daq.fnal.gov: https://github.com/DUNE-DAQ/daq-release/actions/runs/12364412829

Having said that, one thing to note: specifically on protodune-daq01.fnal.gov, the integration tests (and builds, etc.) don't seem to work. This will be investigated, but I wonder if it's related to (1) the jump from gcc 12.1.0 to gcc 13.2.0 and/or (2) the expanded instruction set from the -mavx2 option passed to gcc. On the np04 cluster and daq.fnal.gov, this doesn't show up.

NFDT_DEV_241230_A9

The main change here is that now, rather than building folly + all the DUNE DAQ packages with the -mavx2 option as was the case for NFDT_DEV_241214_A9 and NFDT_DEV_241216_A9, I build folly with the FOLLY_F14_FORCE_FALLBACK preprocessor symbol #defined and set to 1. I also build the DUNE DAQ packages this way. This allows folly to link against translation units with different build flags.

Note that with this change, it's possible to build the full DUNE DAQ stack on protodune-daq01.fnal.gov whereas it wasn't before (in fact, this was done to create NFDU_DEV_241230_A9, same as NFDT_DEV_241230_A9 except for the choice of build machine). The integration tests continue to fail, but get further than under NFDT_DEV_241216_A9. Whereas with the NFDT_DEV_241216_A9 integration tests on protodune-daq01 nothing even booted (https://github.com/DUNE-DAQ/daq-release/actions/runs/12376009563), now with NFDU_DEV_241230_A9 things boot, but all the applications then crash (https://github.com/DUNE-DAQ/daq-release/actions/runs/12376009563).

Digging down, what I know so far is that if, in a NFDU_DEV_241230_A9-based work area on protodune-daq01 I try this directly from the command line:

daq_application -s minimal --name mlt -c rest://localhost:0 --configurationService oksconflibs:/tmp/pytest-of-dunedaq/pytest-18/config0/integtest-session-resolved.data.xml

then there's a crash with a complaint about an Illegal Instruction (this is an example of a daq_application call done during integration tests). Will investigate some more, though it's important to note that the integration tests run fine on np04 and daq.fnal.gov.

Followup: The Illegal Instruction complaint occurs in other programs as well when we run on protodune-daq01; e.g., queue_IO_check from iomanager. However, I'm really not going to lose sleep over this, for two reasons. First, whereas NFDU_DEV_241230_A9 was built on protodune-daq01.fnal.gov, the externals it uses weren't; they were built as usual on daq.fnal.gov. Second, protodune-daq01 has old processors: Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz. If you go to the relevant Intel page you see that this was released in 2012 (!). On all other hosts I've tried (lxplus960, np04-srv-019, daq.fnal.gov) we don't have the Illegal Instruction complaint. Unless people clamor for our software to run on decade-old processors, I'm fine with it not running on a local, old computer at Fermilab.
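A quick way to compare hosts in a situation like this is to look at the instruction set extensions each CPU advertises in /proc/cpuinfo. The sketch below assumes a Linux host; the function names and the default list of extensions are illustrative choices, and nothing above establishes which instruction actually triggers the complaint.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # Every core reports the same flags, so the first entry is enough.
            return set(value.split())
    return set()


def missing_extensions(cpuinfo_text, wanted=("avx", "avx2", "bmi2", "fma")):
    """List the extensions in `wanted` which the CPU does not advertise."""
    flags = cpu_flags(cpuinfo_text)
    return [ext for ext in wanted if ext not in flags]


cpuinfo = Path("/proc/cpuinfo")  # Linux only; skipped elsewhere
if cpuinfo.exists():
    print("missing:", missing_extensions(cpuinfo.read_text()))
```

Running this on both the build host and the host showing the crash, and comparing the two lists, gives a first hint as to whether binaries built on one can be expected to run on the other.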

NFDT_DEV_250104_A9

Essentially the only difference here with respect to NFDT_DEV_241230_A9 is that I bumped highfive from version 2.7.1 to 2.9.0. The version bump was largely opportunistic, a result of my needing to pore over how HighFive brings in HDF5 and adding a patch which accounts for a false warning message it provided claiming that HDF5 doesn't support parallel computing (it does). For more details on what I'm talking about, see the Issue I filed and the commit I used to address it. Note I haven't yet closed the Issue since the commit currently exists on the johnfreeman/update_externals branch of daq-release and not yet its develop branch.

One issue I've discovered - and that almost certainly extends back to earlier test releases as well - is that when programs are linked against RdKafka::rdkafka and/or RdKafka::rdkafka++ (e.g., as happens in erskafka) we get this warning, which I'll need to investigate:

/cvmfs/dunedaq.opensciencegrid.org/spack/externals/ext-v2.2/spack-0.22.0/opt/spack/linux-almalinux9-x86_64/gcc-11.4.1/binutils-2.42-7o72ksi4dkzlabij67uk3bmzouupx4mh/bin/ld: warning: libcrypto.so.3, needed by /lib64/libgssapi_krb5.so.2, may conflict with libcrypto.so.1.1
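One way to track down a warning like this is to run ldd on the affected binary or shared object and look for a library which appears under more than one version. The sketch below parses ldd-style text; the function name soname_conflicts is hypothetical, and the sample input is made up for illustration rather than taken from a real run.

```python
import re
from collections import defaultdict

# Matches the library name at the start of an ldd output line, e.g.
# "libcrypto.so.3 => /lib64/libcrypto.so.3 (0x...)".
SONAME = re.compile(r"^\s*(lib[\w+.-]+?\.so)((?:\.\d+)*)\s")


def soname_conflicts(ldd_output):
    """Map each library base name to its versions when more than one appears."""
    versions = defaultdict(set)
    for line in ldd_output.splitlines():
        match = SONAME.match(line)
        if match:
            base, version = match.groups()
            versions[base].add(version.lstrip("."))
    return {base: sorted(v) for base, v in versions.items() if len(v) > 1}


# Made-up sample in the format ldd prints; the paths are placeholders.
sample = """\
\tlibrdkafka.so.1 => /path/to/librdkafka.so.1 (0x0)
\tlibcrypto.so.1.1 => /path/to/libcrypto.so.1.1 (0x0)
\tlibgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x0)
\tlibcrypto.so.3 => /lib64/libcrypto.so.3 (0x0)
"""
conflicts = soname_conflicts(sample)
print(conflicts)
```

The "=> path" part of each ldd line then shows whether a given copy comes from the Spack stack or from the system's /lib64.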

NFDT_DEV_250109_A9

This test nightly uses what I consider to be the final version of externals v2.2, unless I get input requesting otherwise. Relative to NFDT_DEV_250104_A9, the changes are:

  • To fix the linking warning I describe above in the entry for NFDT_DEV_250104_A9, openssh and its dependency krb5, which are in externals v2.1 but which I'd been leaving out of externals v2.2, had to be included. The reason is that the librdkafka shared object libraries use libgssapi_krb5.so.* libraries, found in krb5 but also in a system's /lib64 directory. The libgssapi_krb5.so.* previously used by librdkafka was the system one, which depended on one version of libcrypto.so.*, but openssl in our stack was using a different version, causing a link warning. After discussion with Alessandro and Pengfei on the Software Coordination channel on Jan-9-2025, I reincluded them. Note that dropping openssl from the stack wasn't an attractive option since dependencies on it appear in many different package.py's in the builtin area which we make use of, and they would all have needed to be vendored.

  • I took the opportunity to bump the version of librdkafka as well, plus make its dependence on krb5 explicit in its (vendored) package.py file.

  • A few Python packages were dropped, namely nanorc and the packages which were only used for nanorc.

This test nightly's build can be found here and its integration tests here.

NFDT_DEV_250115_A9

Two changes after some discussion with Alessandro. One is that librdkafka now depends on cyrus-sasl rather than krb5; this is because what its CMakeLists.txt directly needs is the SASL ("Simple Authentication and Security Layer") package, not krb5. The other is that on his request I've stripped out openssh, krb5, and openssl; note that stripping out openssl required a bit of work.

The test nightly's build can be found here and its integration tests here.

A very high-level overview of externals v2.2

  • NFDT_DEV_250111_DEV contains the latest-greatest externals v2.2; I feel that it's ready for general use
  • gcc is bumped from 12.1.0 to 13.2.0. This means full support for C++20 (formerly, the <format> library wasn't available)
  • python did not get bumped; as noted in the section on NFDT_DEV_241213_A9, integrationtest isn't compatible with the dataclasses module which came with the newer version of Python I tried
  • cmake did not get bumped; my reasoning being that its version was bumped less than a year ago, plus a fear of rocking the boat with respect to the daq-cmake functions
  • In general, if version bumps were available for externals packages, I took them. In some cases, however, I didn't - e.g., dpdk is such a specialist package that I didn't want to touch it, and a newer version of fmt broke dpdklibs (plus its use is now deprecated anyway)
  • A few packages needed (trivial) patches to build against gcc 13.2.0; this usually involved adding <cstdint> where they'd been missing. So pistache and felix-software versions fddaq-v5.3.0 are simply their previous versions, but with these patches added
  • One phenomenon to be aware of is that as package versions are bumped, they tend to add new messages (info, warning, or error). Details on how I handled this can be found in the test-nightly-specific notes above
  • A few of the DUNE DAQ repos needed very minor changes. All of them involved adding the header I mentioned above (<cstdint>), except for daq-cmake (needed to add a preprocessor definition to get folly to link correctly) and cmdlib (needed to modernize the way it referred to intel-tbb in its CMakeLists.txt). These changes are all on branches called johnfreeman/build_against_externals_v2.2.
  • Processes from releases built with gcc 13.2.0 result in Illegal Instruction errors on protodune-daq01.fnal.gov. However, that computer uses processors which were released in 2012, and the code runs fine on computers with chips made in the last decade, such as np04, lxplus and daq.fnal.gov, so I'm not too concerned.
  • openssh, krb5 and openssl were all removed as Spack packages; the system versions of the packages will now be used.