AWS ParallelCluster v3.7.0
dreambeyondorange released this on 30 Aug, 12:11
We're excited to announce the release of AWS ParallelCluster Cookbook 3.7.0.
This is associated with AWS ParallelCluster v3.7.0.
ENHANCEMENTS
- Add support for Ubuntu 22. RSA keys are not supported by default. See this page.
- Add support for login nodes.
- Add support to mount existing Amazon File Cache as shared storage.
- Allow configuration of static and dynamic node priorities in Slurm compute resources via the ParallelCluster configuration YAML file.
- Add a queue-level parameter (`JobExclusiveAllocation`) to ensure nodes in the partition are exclusively allocated to a single job at any given time.
- Allow overriding the aws-parallelcluster-node package at cluster creation and update time (only on the head node during update). Useful for development purposes only.
- Allow memory-based scheduling when multiple instance types are specified for a Slurm Compute Resource.
- Avoid starting the NFS server on compute nodes.
- Forward SLURM_RESUME_FILE to ParallelCluster resume program.
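For readers writing tooling around the resume path, the following is a minimal Python sketch of how a resume program might consume `SLURM_RESUME_FILE`. It assumes the variable holds the path to a JSON file written by Slurm, and the key names used here (`jobs`, `job_id`, `nodes_resume`) are assumptions to check against the Slurm documentation for your version. The helper names `load_resume_data` and `nodes_by_job` are hypothetical and are not part of ParallelCluster.

```python
import json
import os


def load_resume_data(env=None):
    """Return the parsed JSON named by SLURM_RESUME_FILE, or None if it is unset."""
    env = os.environ if env is None else env
    path = env.get("SLURM_RESUME_FILE")
    if not path:
        return None
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def nodes_by_job(resume_data):
    """Map each job id to the nodes being resumed for it.

    The "jobs", "job_id" and "nodes_resume" keys are assumed, not guaranteed.
    """
    return {
        job.get("job_id"): job.get("nodes_resume")
        for job in resume_data.get("jobs", [])
    }
```

Passing the environment as an argument keeps the helper easy to exercise outside a real Slurm resume call.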
CHANGES
- Deprecate Ubuntu 18.
- Upgrade Slurm to version 23.02.4.
- Update the default root volume size to 40 GB to account for limits on CentOS 7.
- Upgrade NVIDIA driver to version 535.54.03.
- Upgrade CUDA library to version 12.2.0.
- Upgrade NVIDIA Fabric manager to nvidia-fabricmanager-535.
- Upgrade NICE DCV to version 2023.0-15487.
  - server: 2023.0.15487-1
  - xdcv: 2023.0.551-1
  - gl: 2023.0.1039-1
  - web_viewer: 2023.0.15487-1
- Upgrade EFA installer to 1.25.1.
  - Efa-driver: efa-2.5.0-1
  - Efa-config: efa-config-1.15-1
  - Efa-profile: efa-profile-1.5-1
  - Libfabric-aws: libfabric-aws-1.18.1-1
  - Rdma-core: rdma-core-46.0-1
  - Open MPI: openmpi40-aws-4.1.5-4
- Upgrade ARM PL to version 23.04.1 for Ubuntu 22.04 only.
- Upgrade third-party cookbook dependencies:
  - apt-7.5.14 (from apt-7.4.0)
  - line-4.5.13 (from line-4.5.2)
  - openssh-2.11.3 (from openssh-2.10.3)
  - pyenv-4.2.3 (from pyenv-3.5.1)
  - selinux-6.1.12 (from selinux-6.0.5)
  - yum-7.4.13 (from yum-7.4.0)
  - yum-epel-5.0.2 (from yum-epel-4.5.0)
- Assign Slurm dynamic nodes a priority (weight) of 1000 by default. This allows Slurm to prioritize idle static nodes over idle dynamic ones.
- Change the default value of `Imds/ImdsSupport` from v1.0 to v2.0.
- Make `aws-parallelcluster-node` daemons handle only ParallelCluster-managed Slurm partitions.
- Restrict permission on file `/tmp/wait_condition_handle.txt` within the head node so that only root can read it.
- Create a Slurm `partition-nodelist` mapping JSON file to be used by the node package daemons to recognize PC-managed Slurm partitions and nodelists.
- Increase EFS-utils watchdog poll interval to 10 seconds. Note: This change is meaningful only if EncryptionInTransit is set to true, because watchdog does not run otherwise.
BUG FIXES
- Add validation to the `ScaledownIdletime` value, to prevent setting a value lower than `-1`.
- Fix an issue causing dangling IAM policies to be created when creating the ParallelCluster CloudFormation custom resource provider with CustomLambdaRole.
- Fix an issue that was causing misalignment of compute node DNS names on instances with multiple network interfaces when `SlurmSettings/Dns/UseEc2Hostnames` is set to `True`.
- Fix cluster creation failure with Ubuntu Deep Learning AMI on GPU instances and DCV enabled.
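The `ScaledownIdletime` check listed above amounts to a lower-bound validation. A minimal Python sketch of that rule follows. It is not the ParallelCluster implementation: the function name `validate_scaledown_idletime` and the error messages are hypothetical, and only the lower bound of -1 comes from the release note.

```python
def validate_scaledown_idletime(value):
    """Return value unchanged if it is an integer >= -1, otherwise raise."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("ScaledownIdletime must be an integer")
    if value < -1:
        raise ValueError(f"ScaledownIdletime must be >= -1, got {value}")
    return value
```

Under this sketch a value of -1 or 10 is accepted, while -2 raises `ValueError`.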