Magic Castle EESSI 2023 10 11
- attendees: Thomas, Alan, Kenneth
- test Slurm cluster with Magic Castle running in AWS
- notes in magic-castle-cluster issue #5
- fully configured, working as expected
- 5TB disk space in /project
- EESSI bot is installed & configured in `bot` account by Thomas
  - works as expected, see test build in software-layer PR
- contributors need to use `repo:eessi-hpc.org-2023.06-software` in build commands (example below)
  - need to replace credentials
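A sketch of what such a build instruction in a PR comment could look like, assuming the bot's usual `bot: build` comment syntax; the `arch:` filter value is purely illustrative:

```
bot: build repo:eessi-hpc.org-2023.06-software arch:x86_64/generic
```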
- can add `nessibot` account
  - need to update Slurm there as soon as security update is available
    - depends on Slurm RPMs that need to be built by Félix-Antoine
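Once the rebuilt RPMs are available in the configured repository, one possible way to apply them manually is sketched below; Magic Castle's Puppet-managed setup may handle this differently, and the package glob and service restarts are assumptions:

```bash
# apply updated Slurm packages once the rebuilt RPMs are in the repo
sudo dnf update 'slurm*'
# restart the relevant daemons afterwards
sudo systemctl restart slurmctld    # on the controller node
sudo systemctl restart slurmd       # on compute nodes
```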
- `dnf update` on login1 + mgmt1
  - update node images for x86_64 + aarch64
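A minimal sketch of the manual update step, assuming SSH access to both nodes (hostnames taken from the notes):

```bash
# run on both login1 and mgmt1
sudo dnf update -y
# check whether a reboot is needed (e.g. after a kernel update),
# if the needs-restarting plugin is installed
sudo dnf needs-restarting -r
```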
- auto-update of packages is currently enabled (not using `skip_upgrade`)
  - updating of packages is only done on boot (so not very relevant for login nodes)
- who should get `sudo` access?
  - add pubkey to `public_keys` in `main.tf` in the right branch (sketch below)
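A sketch of the relevant part of `main.tf`, assuming Magic Castle's usual `public_keys` list; the key values and file path are placeholders:

```hcl
# main.tf excerpt: one entry per admin who should be able to log in
public_keys = [
  file("~/.ssh/id_ed25519.pub"),             # existing admin key (placeholder path)
  "ssh-ed25519 AAAAC3Nza... new-admin-key",  # newly added admin's public key (placeholder)
]
```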
- use DNS to make the move to a new cluster less painful (different IP), see example record below
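For illustration, a hypothetical zone entry that lets the cluster hostname follow the new login node; the hostname and IP below are made up:

```
; hypothetical DNS record: repoint the cluster name at the new login node's IP
login.example.org.   300   IN   A   203.0.113.42
```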
- not having protected branches is annoying
- need GitHub Teams to have protected branches in private repos
- Kenneth will ask Laura/Davide/Hugo if we can leverage Azure sponsored credits somehow
- Kenneth can also ask the GitHub open source community people
- requires GitHub Teams subscription (~$1k/year with current 25 members in EESSI org...)
- could look into a separate EESSI-admins org to reduce cost
- create accounts for active contributors once Slurm is updated
- make sure to keep track of email addresses as well!
- update README in branch with basic info, incl. IP address
- should add scripts for things like installing extra packages on the login node and in node images, setting up the bot, etc. (sketch below)
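As a starting point, a minimal sketch of such a helper script; the script name and package list are purely illustrative:

```bash
#!/bin/bash
# install-extra-packages.sh - hypothetical helper for the login node / node images
set -euo pipefail
EXTRA_PACKAGES=(vim tmux htop git)   # illustrative package list
sudo dnf install -y "${EXTRA_PACKAGES[@]}"
```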
- we'll spin up a new cluster once Magic Castle 13.x is out
- burn down test Magic Castle set up in AWS
  - OK for Alan & Thomas
  - need to check for Bob & Lara => empty accounts, so OK
  - nothing in `bot` account
    - good to burn down
- terminate CitC cluster in AWS
  - start with disabling bot there (kill screen sessions, see commands below)
  - remove all nodes so no new jobs can be started
  - set date to destroy cluster
  - inform everyone with an account
  - try and get confirmation from everyone that they're OK with having their data removed
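For the bot shutdown step above: assuming the bot runs in named `screen` sessions under the `bot` account, they can be listed and terminated as sketched here (the session name is illustrative):

```bash
# list running screen sessions of the bot account
screen -ls
# terminate a specific session by name (or use the PID shown by 'screen -ls')
screen -S eessi-bot -X quit
```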