
Sync meeting 2023-09-12

Kenneth Hoste edited this page Sep 19, 2023 · 2 revisions

MultiXscale WP1+WP5 sync meetings


Agenda/notes 2023-09-12

attending: Alan, Kenneth, Jean-Noël, Susana, Bob, Xin, Xavier, Neja, Maxim, Thomas; excused: Caspar

  • overview of MultiXscale planning
  • WP status updates
    • [SURF] WP1 Developing a Central Platform for Scientific Software on Emerging Exascale Technologies
      • [UGent] T1.1 Stable (EESSI) - due M12+M24
        • new Stratum-0 hardware at RUG (for new software.eessi.io CernVM-FS repo)
          • EESSI/filesystem-layer issue #151
          • TODO: set up & distribute YubiKeys, give admin access to others, ingest the 2023.06 compat layer that was built for software.eessi.io, build the software layer
          • look into using a CDN to improve performance of the Stratum-1s
            • The Alliance is using Cloudflare for free?
              • talk to Ryan about their setup
            • via AWS/Azure sponsored credits?
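For context on the CDN idea: a CernVM-FS client picks a Stratum-1 from the `CVMFS_SERVER_URL` list in its configuration, so a CDN endpoint would typically be slotted into (or put in front of) that list. A minimal sketch of such a client configuration; the hostnames below are made-up placeholders, not the real EESSI infrastructure:

```shell
# Hypothetical client config, e.g. /etc/cvmfs/domain.d/eessi.io.conf;
# hostnames are placeholders, not actual EESSI Stratum-1 servers.
# Clients probe and order these URLs; listing a CDN endpoint first would
# let most requests be served from edge caches instead of a Stratum-1.
CVMFS_SERVER_URL="http://cdn.example.org/cvmfs/@fqrn@;http://stratum1-a.example.org/cvmfs/@fqrn@"
CVMFS_HTTP_PROXY="DIRECT"
```

The `@fqrn@` macro is expanded by CernVM-FS to the fully qualified repository name (e.g. software.eessi.io), so one domain-level config covers all repositories under that domain.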
        • EasyBuild v4.8.1 was released yesterday
        • Building EESSI 2023.06 software layer
        • for ESPResSo, we need a CPU-only easyconfig
          • cf. easyconfigs PR #17709
          • the UCX-CUDA dependency is missing in the CUDA-enabled ESPResSo easyconfig
          • would be good to have this in place by ESPResSo summer school (Oct'23) - https://www.cecam.org/workshop-details/1229
            • incl. 15min talk on EESSI/MultiXscale on 12 Oct
          • GPUs aren't really used for ESPResSo in an HPC context, since only a single GPU can be used per simulation
          • we should have a demo with ESPResSo in EESSI demo repo
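A CPU-only easyconfig like the one requested above might look roughly as follows. This is only a sketch: the easyblock, versions, dependencies and URLs are illustrative placeholders, not the contents of the actual easyconfigs PR.

```python
# Hypothetical sketch of a CPU-only easyconfig for ESPResSo;
# all versions and dependencies below are placeholders.
easyblock = 'CMakeMake'

name = 'ESPResSo'
version = '4.2.1'

homepage = 'https://espressomd.org/'
description = "Software package for performing and analyzing molecular dynamics simulations"

toolchain = {'name': 'foss', 'version': '2023a'}

source_urls = ['https://github.com/espressomd/espresso/archive/']
sources = ['%(version)s.tar.gz']

builddependencies = [('CMake', '3.26.3')]
dependencies = [
    ('Python', '3.11.3'),
    ('Boost.MPI', '1.82.0'),
    # deliberately no CUDA / UCX-CUDA dependencies in this CPU-only variant
]

moduleclass = 'chem'
```

The point of the CPU-only variant is the dependency list: dropping CUDA (and thus the UCX-CUDA issue noted above) while keeping the MPI-enabled build.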
        • GPU support
          • https://github.com/multixscale/planning/issues/1
          • dedicated meeting to figure out steps to take, who does what
          • ship CUDA compat libraries: where (compat layer?), structure, etc.
          • changes to script to launch build container to allow access to GPU
          • no progress since last month
          • planning to pick up on this again soon
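The script change mentioned above (letting the build container access the GPU) could be sketched roughly like this, assuming the Apptainer-based build container is used; the function name, device-node check and image URL are illustrative assumptions, not the actual EESSI launch script:

```shell
#!/bin/sh
# Sketch: construct the command that launches the build container, adding
# Apptainer's --nv flag (which bind-mounts the host's NVIDIA driver
# libraries into the container) when a GPU device node is present.
# launch_cmd IMAGE [GPU_DEVICE] - hypothetical helper, prints the command
launch_cmd() {
    image="$1"
    gpu_dev="${2:-/dev/nvidia0}"
    if [ -e "$gpu_dev" ]; then
        echo "apptainer shell --nv $image"
    else
        echo "apptainer shell $image"
    fi
}

launch_cmd docker://ghcr.io/eessi/build-node:debian11
```

The `--nv` flag only exposes the host driver stack; the CUDA compat libraries discussed above would still need to be provided inside EESSI itself.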
      • [RUG] T1.2 Extending support (starts M9, due M30)
        • other than some toy experiments with RISC-V, no action yet
        • effort on ARM Neoverse falls under this task
      • [SURF] T1.3 Test suite - due M12+M24
        • tests for GROMACS and TensorFlow
        • looking into tests for the OSU Micro-Benchmarks
        • very close to v0.1 release
        • see meeting notes via https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2023-09-06)
        • starting to explore tests for ESPResSo
          • OpenMPI-related issues in the test suite (race condition in temporary folder creation) are being addressed by Caspar and Jean-Noël
          • flaky tests remain an issue, they are difficult to reproduce and diagnose
      • [BSC] T1.4 RISC-V (starts M13)
        • effort to be planned
        • Xavier is looking into who will be involved from BSC
        • kickoff meeting in Jan'24
        • building compat layer for RISC-V
          • cf. Google Summer of Code 2022 project
        • LLVM/Clang based toolchain in EasyBuild
        • CernVM-FS also worked on RISC-V at some point
      • [SURF] T1.5 Consolidation (starts M25)
    • [UGent] WP5 Building, Supporting and Maintaining a Central Shared Stack of Optimized Scientific Software Installations
      • [UGent] T5.1 Support portal - due M12
      • [SURF] T5.2 Monitoring/testing (starts M9)
      • [UiB] T5.3 community contributions (bot) - due M12
        • working towards release v0.1 (https://github.com/multixscale/planning/issues/42)
          • all major features / parts planned for v0.1 have been implemented
          • doing one code polishing pass
          • targeting mid/end of August for the release
          • contribution policy isn't really part of the v0.1 bot release
          • now only missing a cleanup in the README file
        • plans for v0.2
          • testing phase
        • bot being used to build software stack for 2023.06
          • bot has been working as intended for building/deploying software for EESSI
          • some ideas for improvements have popped up
        • next release: add use of the test suite (details of how and when tests are run to be defined)
        • refactoring of the bot code done
        • infrastructure for running the bot
          • maintaining Slurm cluster in AWS (set up with Cluster-in-the-Cloud) is becoming a bit of a pain
          • we should set up new Magic Castle Slurm clusters in AWS + Azure to replace it
            • may need to set up separate MC instances for x86_64 and aarch64
        • deployment of the compat layer doesn't work with the bot yet, but building works fine
      • [UGent] T5.4 support/maintenance (starts M13)
        • using EESSI support portal, monthly/bi-weekly/weekly rotation
    • [UB] WP6 Community outreach, education, and training
      • First pass of "Elevator Pitch" created, another revision under way
        • High-level overview of MultiXscale goals, sell it to the user community to get them interested
        • HPCNow! is working on a revision
      • Ambassador program to be outlined at NCC/CoE Coffee Break, 7th Sept'23
        • Some NCCs do seem to be interested in the concept
        • Mostly useful for EESSI (which is generic), more difficult for scientific WPs (due to required domain expertise)
        • Interest in a "public" tutorial on EESSI for EuroHPC NCCs
          • Figuring out expected scope
          • Probably 1-1.5h
          • Maybe in Dec'23, would be interesting for MultiXscale project review
      • Magic Castle tutorial at SC'23 accepted
        • EESSI will get a mention as it is one of the available stacks in MC
      • proposal for HUST'23 workshop at SC'23 was rejected
        • short paper wasn't considered detailed enough, too general (vs open access EESSI paper)
      • Second meeting with CernVM-FS developers regarding the Q4 workshop
      • Have offered to do a "Code of the Month" session with CASTIEL2
        • Should this wait until the switch to eessi.io is done?
        • May be of little value if there's a tutorial for NCCs
    • [HPCNow] WP7 Dissemination, Exploitation & Communication
      • MultiXscale poster sent to European corner at the "European Researchers' night" in Barcelona (https://lanitdelarecerca.cat/european-corner/)
      • Planned an EESSI talk by Elisabeth for October 9th at BSC (WHPC "Southern Europe" chapter annual meeting)
      • Supercomputing'23
        • Magic Castle tutorial (Alan)
        • booth talks on EESSI?
          • AWS: no (no lightning talks)
          • Azure: probably yes
          • at DoItNow booth
          • talks/poster not possible at EuroHPC booth, but we can get included in the video loop?
            • Susana has provided a video on MultiXscale, see MultiXscale Slack (see #communication channel, should also share in #general)
      • first MultiXscale newsletter was sent out
      • we expect to get some criticism on MultiXscale website, pretty basic and not frequently updated
        • doing more frequent posts on website + collecting them in a newsletter could help
        • should also add event calendar
        • integrate Twitter/X feed in website? maybe also LinkedIn?
        • series of posts, incl. MultiXscale kickoff meeting, events, etc.
  • Other updates
    • Should we start considering a new EESSI tutorial again, incl. adding software to EESSI?
      • A request for one has come from the NCCs
    • Rough dates for the EuroHPC review meeting have been given: 19-23 Feb'24 (in Luxembourg)
      • Full-day meeting that week
      • Presentations can be done per WP separately, or grouped (training/dissemination, scientific, technical)
      • More feedback expected on next quarterly reports
      • Neja is starting full time on MultiXscale 1 Oct'23
      • Should we consider GA meeting (well) before project review?
      • Who is expected to attend this?
        • Only WP leaders?
        • All partners?
        • Will ask input from project officer on this
      • Will be discussed at next SC meeting (25 Sept'23)
    • Start fixing dates for next MultiXscale GA meeting
    • CASTIEL2 social media plan
      • MultiXscale scheduled at week of 23-27 Oct'23
      • we should prepare some social media posts by then, try and update website
      • a post on MultiXscale activity at SC23 would be interesting
    • our events should get posted to https://hpc-portal.eu/upcoming-events-courses
    • Initial request to provide EESSI on Meluxina was denied
      • Initially due to lack of manpower to set up and maintain CernVM-FS
      • Later also because of an ISO certification they need to adhere to
      • Can be raised at next CASTIEL2 meeting

Notes of previous meetings


Template for sync meeting notes

TO COPY-PASTE

  • overview of MultiXscale planning
  • WP status updates
    • [SURF] WP1 Developing a Central Platform for Scientific Software on Emerging Exascale Technologies
      • [UGent] T1.1 Stable (EESSI) - due M12+M24
        • ...
      • [RUG] T1.2 Extending support (starts M9, due M30)
      • [SURF] T1.3 Test suite - due M12+M24
        • ...
      • [BSC] T1.4 RISC-V (starts M13)
      • [SURF] T1.5 Consolidation (starts M25)
    • [UGent] WP5 Building, Supporting and Maintaining a Central Shared Stack of Optimized Scientific Software Installations
      • [UGent] T5.1 Support portal - due M12
        • ...
      • [SURF] T5.2 Monitoring/testing (starts M9)
      • [UiB] T5.3 community contributions (bot) - due M12
        • ...
      • [UGent] T5.4 support/maintenance (starts M13)
    • [UB] WP6 Community outreach, education, and training
      • ...
    • [HPCNow] WP7 Dissemination, Exploitation & Communication
      • ...