Sync meeting 2023-09-12
- Monthly, every 2nd Tuesday of the month at 10:00 CE(S)T
- Notes of previous meetings at https://github.com/multixscale/meetings/wiki
- attending: Alan, Kenneth, Jean-Noël, Susana, Bob, Xin, Xavier, Neja, Maxim, Thomas
- excused: Caspar
- overview of MultiXscale planning
- WP status updates
- [SURF] WP1 Developing a Central Platform for Scientific Software on Emerging Exascale Technologies
- [UGent] T1.1 Stable (EESSI) - due M12+M24
- new Stratum-0 hardware at RUG (for new software.eessi.io CernVM-FS repo) - see EESSI/filesystem-layer issue #151
- TODO: set up & distribute YubiKeys, give admin access to others, ingest the 2023.06 compat layer that was built for software.eessi.io, build the software layer (a client-side mount sketch follows at the end of the T1.1 notes)
- look into using a CDN to improve performance of the Stratum-1s
- The Alliance is using free Cloudflare?
- talk to Ryan about their setup
- via AWS/Azure sponsored credits?
- EasyBuild v4.8.1 was released yesterday
- includes several fixes related to EESSI, see https://docs.easybuild.io/release-notes
- Building EESSI 2023.06 software layer
- for ESPResSo, we need a CPU-only easyconfig
- cf. easyconfigs PR #17709
- UCX-CUDA dependency is missing in the CUDA-enabled ESPResSo easyconfig (see the easyconfig sketch below)
- would be good to have this in place by ESPResSo summer school (Oct'23) - https://www.cecam.org/workshop-details/1229
- incl. 15min talk on EESSI/MultiXscale on 12 Oct
- the GPU isn't really used in an HPC context, since only a single GPU can be used per simulation
- we should have a demo with ESPResSo in EESSI demo repo
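As a concrete illustration of the missing dependency noted above, here is a minimal sketch of the relevant fragment of a CUDA-enabled ESPResSo easyconfig; versions and version suffix are illustrative assumptions, not taken from the actual easyconfig in PR #17709:

```python
# Hypothetical fragment of a CUDA-enabled ESPResSo easyconfig;
# versions are illustrative, not taken from the actual PR.
name = 'ESPResSo'
version = '4.2.1'
versionsuffix = '-CUDA-%(cudaver)s'

dependencies = [
    ('CUDA', '11.7.0', '', SYSTEM),
    ('UCX-CUDA', '1.13.1', versionsuffix),  # the dependency reported as missing
]
```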
- GPU support
- https://github.com/multixscale/planning/issues/1
- dedicated meeting to figure out steps to take, who does what
- ship CUDA compat libraries: where (compat layer?), structure, etc.
- changes to script to launch build container to allow access to GPU
- no progress since last month
- planning to pick up on this again soon
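For context on the ingestion TODO above: once the new software.eessi.io repository is live, a client should be able to mount it with a standard CernVM-FS setup along these lines (a minimal sketch, assuming the cvmfs and cvmfs-config-eessi packages are installed; values are illustrative):

```
# /etc/cvmfs/default.local - minimal client configuration (sketch)
CVMFS_CLIENT_PROFILE=single       # standalone client, no dedicated proxy
CVMFS_REPOSITORIES=software.eessi.io
CVMFS_HTTP_PROXY=DIRECT           # use a real caching proxy on multi-node systems
```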
- [RUG] T1.2 Extending support (starts M9, due M30)
- other than some toy experiments with RISC-V, no action yet
- which systems are being used?
- StarFive VisionFive 2 SBC development board
- Kenneth has access to RISC-V Slurm clusters at BSC (via John Davis, now ex-BSC) + EPCC (https://riscv.epcc.ed.ac.uk/documentation/running_riscv/)
- effort on ARM Neoverse falls under this task
- [SURF] T1.3 Test suite - due M12+M24
- tests for GROMACS and TensorFlow
- looking into the OSU Micro-Benchmarks
- very close to v0.1 release
- see meeting notes via https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2023-09-06)
- starting to explore tests for ESPResSo (a minimal sketch follows after this task's notes)
- OpenMPI-related issues in the test suite (race condition in temporary folder creation) are being addressed by Caspar and Jean-Noël
- flaky tests remain an issue; they are difficult to reproduce and diagnose
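An ESPResSo check in the EESSI test suite would follow the same ReFrame pattern as the existing GROMACS and TensorFlow tests; a minimal sketch, in which the module name, benchmark script, and sanity pattern are all assumptions:

```python
# Minimal sketch of a ReFrame check in the style of the EESSI test suite;
# module name, benchmark script, and sanity pattern are assumptions.
import reframe as rfm
import reframe.utility.sanity as sn

@rfm.simple_test
class EESSI_ESPRESSO(rfm.RunOnlyRegressionTest):
    valid_systems = ['*']
    valid_prog_environs = ['default']
    modules = ['ESPResSo']                  # assumed EESSI module name
    executable = 'python'
    executable_opts = ['p3m_benchmark.py']  # hypothetical benchmark script

    @sanity_function
    def assert_run_completed(self):
        # assumed marker that a run finished successfully
        return sn.assert_found(r'Algorithm executed', self.stdout)
```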
- [BSC] T1.4 RISC-V (starts M13)
- effort to be planned
- Xavier is looking into who will be involved from BSC
- kickoff meeting in Jan'24
- building compat layer for RISC-V
- cf. Google Summer of Code 2022 project
- LLVM/Clang based toolchain in EasyBuild
- CernVM-FS has also been made to work on RISC-V at some point
- [SURF] T1.5 Consolidation (starts M25)
- [UGent] WP5 Building, Supporting and Maintaining a Central Shared Stack of Optimized Scientific Software Installations
- [UGent] T5.1 Support portal - due M12
- In progress
- the support portal @ https://gitlab.com/eessi/support
- templates updated
- labels updated - see https://gitlab.com/eessi/support/-/labels + https://gitlab.com/eessi/support/-/issues/3
- wiki updated - see https://gitlab.com/eessi/support/-/wikis/home
- if you find that something is missing, open an issue?
- READY to deploy?
- working on adding a support page to EESSI documentation
- working on deliverable D5.2 (due M12)
- [SURF] T5.2 Monitoring/testing (starts M9)
- [UiB] T5.3 community contributions (bot) - due M12
- working towards release v0.1 (https://github.com/multixscale/planning/issues/42)
- all major features / parts planned for v0.1 have been implemented
- doing one code polishing pass
- targeting mid/end of August for the release
- contribution policy isn't really part of the v0.1 bot release
- now only missing a cleanup in the README file
- plans for v0.2
- testing phase
- bot being used to build software stack for 2023.06
- bot has been working as intended for building/deploying software for EESSI (see the example trigger comment below)
- some ideas for improvements have popped up
- next release: add running the test suite (details of how and when tests are run still to be defined)
- refactoring of the bot code done
- infrastructure for running the bot
- maintaining Slurm cluster in AWS (set up with Cluster-in-the-Cloud) is becoming a bit of a pain
- we should set up new Magic Castle Slurm clusters in AWS + Azure to replace it
- may need to set up separate MC instances for x86_64 and aarch64
- deployment of compat layer not working yet with bot, but building works fine
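For reference, the bot is driven through pull request comments; a build is triggered with a comment along these lines (the repository and architecture identifiers depend on the bot's configuration, so these values are illustrative):

```
bot: build repo:eessi-2023.06-software arch:x86_64/generic
```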
- [UGent] T5.4 support/maintenance (starts M13)
- using EESSI support portal, monthly/bi-weekly/weekly rotation
- [UB] WP6 Community outreach, education, and training
- First pass of "Elevator Pitch" created, another revision under way
- High-level overview of MultiXscale goals, to sell the project to the user community and get them interested
- HPCNow! is working on a revision
- Ambassador program was outlined at the NCC/CoE Coffee Break on 7 Sept'23
- Some NCCs do seem to be interested in the concept
- Mostly useful for EESSI (which is generic), more difficult for scientific WPs (due to required domain expertise)
- Interest in a "public" tutorial on EESSI for EuroHPC NCCs
- Figuring out expected scope
- Probably 1-1.5h
- Maybe in Dec'23, would be interesting for MultiXscale project review
- Magic Castle tutorial at SC'23 accepted
- EESSI will get a mention as it is one of the available stacks in MC
- proposal for HUST'23 workshop at SC'23 was rejected
- short paper wasn't considered detailed enough, too general (vs open access EESSI paper)
- Second meeting with CernVM-FS developers regarding the Q4 workshop
- notes: see https://multixscale.github.io/cvmfs-tutorial-hpc-best-practices + https://github.com/multixscale/cvmfs-tutorial-hpc-best-practices
- currently targeting Mon 4 Dec'23 for the online tutorial
- deadline for ISC'24 tutorials is 8 Dec'23
- the performance aspect of this could be the start of a paper?
- Have offered to do a "Code of the Month" session with CASTIEL2
- Should this wait until the switch to eessi.io is done?
- May be of little value if there's a tutorial for NCCs
- [HPCNow] WP7 Dissemination, Exploitation & Communication
- MultiXscale poster sent to European corner at the "European Researchers' night" in Barcelona (https://lanitdelarecerca.cat/european-corner/)
- Planned an EESSI talk by Elisabeth for October 9th at BSC (WHPC "Southern Europe" chapter annual meeting)
- Supercomputing'23
- Magic Castle tutorial (Alan)
- booth talks on EESSI?
- AWS: no (no lightning talks)
- Azure: probably yes
- at DoItNow booth
- talks/poster not possible at EuroHPC booth, but we can get included in video loop?
- Susana has provided a video on MultiXscale, see the #communication channel on the MultiXscale Slack (should also be shared in #general)
- first MultiXscale newsletter was sent out
- https://www.multixscale.eu/wp-content/uploads/2023/07/Newsletter-Multixscale-Issue-1-2023.pdf
- maybe PDF is not the best format for newsletter?
- plan to have two per year, next one end of 2023
- we expect to get some criticism on the MultiXscale website, which is pretty basic and not frequently updated
- doing more frequent posts on website + collecting them in a newsletter could help
- should also add event calendar
- integrate Twitter/X feed in website? maybe also LinkedIn?
- series of posts, incl. MultiXscale kickoff meeting, events, etc.
- Other updates
- Should we start considering a new EESSI tutorial again, incl. adding software to EESSI
- Request has come for one from NCCs
- Rough date for EuroHPC review meeting has been given: 19-23 Feb'24 (in Luxembourg)
- Full-day meeting that week
- Can do presentations per WP separately, or grouped (training/dissemination, scientific, technical)
- More feedback expected on next quarterly reports
- Neja is starting full-time on MultiXscale on 1 Oct'23
- Should we consider GA meeting (well) before project review?
- Who is expected to attend this?
- Only WP leaders?
- All partners?
- Will ask the project officer for input on this
- Will be discussed at next SC meeting (25 Sept'23)
- Start fixing dates for next MultiXscale GA meeting
- Could be right after EuroHPC Summit in Belgium (Mon-Thu 18-21 March 2024)
- Thu+Fri 21+22 March 2024 in Ghent
- other option is to tie it to EuroHPC review
- check for opportunities to present at the EuroHPC Summit
- GA meeting is interesting both before/after EuroHPC project review
- CASTIEL2 social media plan
- MultiXscale scheduled at week of 23-27 Oct'23
- we should prepare some social media posts by then, and try to update the website
- a post on MultiXscale activity at SC23 would be interesting
- our events should get posted to https://hpc-portal.eu/upcoming-events-courses
- this data can be scraped (and filtered), see https://calendar.learnhpc.eu
- Initial request to provide EESSI on Meluxina was denied
- Initially due to lack of manpower to set up and maintain CernVM-FS
- Later also because of some ISO certification they need to adhere to
- Can be raised at next CASTIEL2 meeting
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-08-08
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-07-11
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-06-13
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-05-09
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-04-11
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-03-14
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-02-14
- https://github.com/multixscale/meetings/wiki/sync-meeting-2023-01-10
TO COPY-PASTE
- overview of MultiXscale planning
- WP status updates
- [SURF] WP1 Developing a Central Platform for Scientific Software on Emerging Exascale Technologies
- [UGent] T1.1 Stable (EESSI) - due M12+M24
- ...
- [RUG] T1.2 Extending support (starts M9, due M30)
- [SURF] T1.3 Test suite - due M12+M24
- ...
- [BSC] T1.4 RISC-V (starts M13)
- [SURF] T1.5 Consolidation (starts M25)
- [UGent] WP5 Building, Supporting and Maintaining a Central Shared Stack of Optimized Scientific Software Installations
- [UGent] T5.1 Support portal - due M12
- ...
- [SURF] T5.2 Monitoring/testing (starts M9)
- [UiB] T5.3 community contributions (bot) - due M12
- ...
- [UGent] T5.4 support/maintenance (starts M13)
- [UB] WP6 Community outreach, education, and training
- ...
- [HPCNow] WP7 Dissemination, Exploitation & Communication
- ...