meeting 2025 01 09
- date & time: Thu 9 Jan 2025 - 14:00 CET (13:00 UTC)
- (every first Thursday of the month)
- venue: (online, see mail for meeting link, or ask in Slack)
- agenda:
- Quick introduction by new people
- EESSI-related meetings and events in last month(s)
- Progress update per EESSI layer
- Update on EESSI production repository software.eessi.io
- Update on EESSI test suite + build-and-deploy bot
- Integration of EESSI in EuroHPC Federation Platform
- AWS/Azure sponsorship update
- Upcoming/recent events
- Q&A
(by Bob/Pedro)
- Christian Kniep
- "One of the original gangsters"
- Used to work at AWS, Docker, and others
- Now working for MemVerge
- Containers for AI factories
- Interested in packaging parts of EESSI into containers
- E4 company
- HPC system integrator company from Italy
- Interested in EESSI because it allows them to provide customers with a software stack
- Jannetta Steyn
- Newcastle University
- Interested in running the EESSI stack on Raspberry Pis
(see slides)
- Thank you very much for voting on EESSI! 🙏
(see slides)
- Pull request that added capability of handling subdirectories for installed software, needed for dev.eessi.io
- Check dev.eessi.io documentation for more details
- New (minor) CernVM-FS release, major release expected soon
- Should/can we update it? Usually the procedure for upgrading to a new major release is carefully described.
- Clients wouldn't need to update, as older versions are still supported (up to a point). Something to look into: how old a CVMFS client version can we support?
(see slides)
- Upcoming new EESSI version. This includes a new version of the compatibility layer.
- Newer toolchains are tricky to install with the current compatibility layer, e.g. due to the OpenSSL version
- Bot can build new compatibility layer 🎉
- Probably will wait until Feb to include newer glibc
- Test new compat layer by building some software with it and the newer toolchain.
- Discussion on if/when EESSI versions should be archived.
(see slides)
- Bunch of new software has been added to the repository
- Slightly less than before due to the holidays and end-of-year deadlines
- Limited CPU + GPU combinations plus large reservations for GPU nodes make AWS and Azure trickier for GPU builds.
- GPU builds are moving to service accounts on partner systems, which has been well received by site system administrators. Set-up and configuration are in progress.
- Service account now available for A64FX in Deucalion (Portugal). This means multiple people can work on A64FX builds.
- BSC (Spain) RISC-V service account for RISC-V builds. Necessary to access hardware needed for builds.
(see slides)
- Zen4 support is nearly ready/complete
- Older toolchains cannot be installed, and warnings will be printed when you try to load modules depending on these toolchains
- Other than that, it's on par with other micro-architectures
(see slides)
- Support for environment variables that get set before the build starts. The main use case is setting whitelisted EasyBuild parameters that might be necessary for a given build. Variable-value pairs need to be whitelisted in the bot configuration for the bot to act on them.
- Discussion on handling build job submissions by bots: more concurrent bot instances are now configured for many architectures, so care is needed not to start/inject duplicate builds by accident.
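
The whitelisting of variable-value pairs described above could look roughly like the sketch below. This is a minimal illustration, not the bot's actual code or configuration format: the `ALLOWED_ENV` table, the `filter_env` helper, and the example values are all hypothetical.

```python
# Hypothetical sketch: NOT the EESSI bot's real implementation or config format.
# Assumed allowlist: variable name -> set of values the bot is willing to set.
ALLOWED_ENV = {
    "EASYBUILD_PARALLEL": {"4", "8", "16"},
    "EASYBUILD_ACCEPT_EULA_FOR": {"CUDA"},
}


def filter_env(requested, allowed):
    """Split requested variable-value pairs into (accepted, rejected).

    A pair is accepted only if both the variable name and the exact value
    appear in the allowlist; everything else is rejected.
    """
    accepted, rejected = {}, {}
    for name, value in requested.items():
        if value in allowed.get(name, set()):
            accepted[name] = value
        else:
            rejected[name] = value
    return accepted, rejected


if __name__ == "__main__":
    requested = {"EASYBUILD_PARALLEL": "8", "LD_PRELOAD": "/tmp/lib.so"}
    accepted, rejected = filter_env(requested, ALLOWED_ENV)
    print("accepted:", accepted)
    print("rejected:", rejected)
```

Matching on exact values rather than only on variable names keeps the set of things a build request can influence small and easy to review.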
(see slides)
- CVMFS infrastructure overview Grafana dashboard that gives a more zoomed-in view of the service status; useful for tracing back after Slack monitoring alerts
- Network traffic information
- Monitoring has already picked up short network outages in the Stratum 0 data center
- Dead Man's Snitch has already warned once about the alert manager not working, which was quickly acted upon.
- Suggestion: monitoring of Stratum 1 -> Stratum 0 connection. Might already be in place (the status page may already check for this)
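
One way the suggested Stratum 1 -> Stratum 0 check could be approached is to compare the repository revision published by each server. The sketch below assumes the manifest (`.cvmfspublished`) has already been downloaded from both servers and that the revision is the line starting with `S`. The function names and the threshold are hypothetical, and this is not how the existing status page is known to work.

```python
# Hypothetical sketch: compares repository revisions from two CernVM-FS manifests.
# Assumption: the manifest is a list of "<letter><value>" lines, followed by a
# "--" separator line and a (binary) signature, with the revision on the "S" line.


def parse_revision(manifest: bytes) -> int:
    """Return the revision number found in a .cvmfspublished manifest."""
    header = manifest.split(b"\n--\n", 1)[0]  # ignore the signature part
    for line in header.splitlines():
        if line.startswith(b"S"):
            return int(line[1:].decode("ascii"))
    raise ValueError("no revision field found in manifest")


def stratum1_lag(stratum0_manifest: bytes, stratum1_manifest: bytes) -> int:
    """Number of revisions the Stratum 1 is behind the Stratum 0."""
    return parse_revision(stratum0_manifest) - parse_revision(stratum1_manifest)


def is_in_sync(stratum0_manifest: bytes, stratum1_manifest: bytes, max_lag: int = 1) -> bool:
    """True if the Stratum 1 is at most max_lag revisions behind (assumed threshold)."""
    return stratum1_lag(stratum0_manifest, stratum1_manifest) <= max_lag
```

A lag that keeps growing over successive checks would suggest that replication from the Stratum 0 is failing, even while the Stratum 1 itself still serves clients.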
(see slides)
- Please let us know (by email / Slack / opening a PR) if you know about additional systems where EESSI is available!
(see slides)
- The eessi_mixin class should make it easy to write portable tests, documentation is now available
- It should be even easier now to run the EESSI test suite on top of a local module stack
(see slides)
- Started 1 Jan 2025. The Ghent University HPC team is a partner and is responsible for integrating EESSI in the Federation Platform. Other projects/tools are used for other components of the platform (see slides).
- Updates when relevant in these meetings and in other channels.
- Note: The EuroHPC Federation Platform is not funding EESSI or its development
(see slides)
- AWS block storage (EBS) is a big component of the consumed credits; this can probably be optimised.
- We should look into this and keep track of the disk usage
- The Azure December graph is misleading: because we redeployed the Slurm cluster for Zen4, the accounting is done in a different way. The graph needs to be corrected; results should be close to those in November.
(see slides)
- Suggestions / comments / ideas are welcome. Contact one of the members of the interim Steering Committee.
(see slides)
- next meetings: March 6, May 8 (!)