From dfcb2c911d1e07cf19c71ef4ea512174f0f94815 Mon Sep 17 00:00:00 2001
From: MaximeVdB
Date: Fri, 6 Dec 2024 10:04:31 +0100
Subject: [PATCH 1/4] Improve phrasing/spelling/... in charge rate section

---
 source/leuven/credits.rst | 62 +++++++++++++++++++--------------------
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/source/leuven/credits.rst b/source/leuven/credits.rst
index 051c2b70e..0a4eb4d04 100644
--- a/source/leuven/credits.rst
+++ b/source/leuven/credits.rst
@@ -44,12 +44,12 @@ Please contact your VSC coordinator/contact or your :ref:`local support staff
 Job cost calculation
 ~~~~~~~~~~~~~~~~~~~~
 
-On Tier-2 clusters, we use Slurm for accounting purposes (on top of resource and 
+On Tier-2 clusters, we use Slurm for accounting purposes (on top of resource and
 job management). See the :ref:`Slurm accounting <slurm_accounting>` page for
 additional information.
 
 In Slurm terminology, the cost of a job depends on the trackable resources (TRES)
-it consumes. Two distinct TRES are the number of CPU cores and GPU devices. 
-Different types of CPU and GPU nodes are given different weights 
+it consumes. Two distinct TRES are the number of CPU cores and GPU devices.
+Different types of CPU and GPU nodes are given different weights
 (``TRESBillingWeights``) which you can retrieve as follows, e.g. for wICE::
 
     scontrol show partitions --clusters=wice
 
@@ -75,7 +75,7 @@ Where
 The following formula applies::
 
     (CPU TRESBillingWeights * num_cores + GPU TRESBillingWeights * num_gpus) * walltime
-    
+
 Where
 
 - ``CPU TRESBillingWeights`` is the applied weight for CPU resources (see above)
@@ -94,11 +94,11 @@ Where
 .. note::
 
    The Tier-2 cluster has several types of compute nodes.
-   Hence, different ``TRESBillingWeights`` apply to 
+   Hence, different ``TRESBillingWeights`` apply to
    different resources on different partitions of Genius and wICE.
    The difference in cost between different machines/processors reflects
    the performance difference between those types of nodes.
-   For additional information, you may refer to the 
+   For additional information, you may refer to the
    `HPC Service Catalog `_ (login required).
 
@@ -125,31 +125,31 @@ will be charged::
 
 Charge rates
 ------------
 
-The charge rate for the various node types of Genius and wICE are listed in the table
-below.
-The reported cost is the number of Slurm credits needed per core/GPU per minute.
-
-+---------+-----------------+------------------------+
-| Cluster | node type       | ``TRESBillingWeights`` |
-+=========+=================+========================+
-| Genius  | skylake         | 4.62963                |
-+         +-----------------+------------------------+
-|         | cascadelake     | 4.62963                |
-+         +-----------------+------------------------+
-|         | skylake bigmem  | 5.55556                |
-+         +-----------------+------------------------+
-|         | Nvidia P100 GPU | 41.6667                |
-+         +-----------------+------------------------+
-|         | Nvidia V100 GPU | 59.5833                |
-+         +-----------------+------------------------+
-|         | Superdome       | 18.7500                |
-+---------+-----------------+------------------------+
-| wICE    | icelake         | 2.54630                |
-+         +-----------------+------------------------+
-|         | icelake bigmem  | 4.39815                |
-+         +-----------------+------------------------+
-|         | Nvidia A100 GPU | 141.667                |
-+---------+-----------------+------------------------+
+The table below shows the charge rates for each CPU and GPU type on Genius
+and wICE. These values correspond to the number of Slurm credits needed
+to allocate one core or GPU for one minute.
+
++---------+---------------------+----------+------------------------+
+| Cluster | Resource            | Type     | ``TRESBillingWeights`` |
++=========+=====================+==========+========================+
+| Genius  | Skylake             | CPU core | 4.62963                |
++         +---------------------+----------+------------------------+
+|         | Skylake (bigmem)    | CPU core | 5.55556                |
++         +---------------------+----------+------------------------+
+|         | Skylake (superdome) | CPU core | 18.7500                |
++         +---------------------+----------+------------------------+
+|         | Cascadelake         | CPU core | 4.62963                |
++         +---------------------+----------+------------------------+
+|         | P100                | GPU      | 41.6667                |
++         +---------------------+----------+------------------------+
+|         | V100                | GPU      | 59.5833                |
++---------+---------------------+----------+------------------------+
+| wICE    | Icelake             | CPU core | 2.54630                |
++         +---------------------+----------+------------------------+
+|         | Icelake (bigmem)    | CPU core | 4.39815                |
++         +---------------------+----------+------------------------+
+|         | A100                | GPU      | 141.667                |
++---------+---------------------+----------+------------------------+
 
 .. _Geert Jan Bex: mailto:geertjan.bex@uhasselt.be
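The ``scontrol`` command shown in this patch prints full partition records,
which can be long. A quick way to see just the billing weights is to filter
the output; this is a sketch, assuming the usual ``scontrol`` format in which
every partition block contains ``PartitionName=`` and ``TRESBillingWeights=``
fields::

    $ scontrol show partitions --clusters=wice | grep -E 'PartitionName|TRESBillingWeights'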
From c4517e946bd4128cbf3fc36c23931654914adfa6 Mon Sep 17 00:00:00 2001
From: MaximeVdB
Date: Fri, 6 Dec 2024 10:12:48 +0100
Subject: [PATCH 2/4] Add wICE extension hardware to charge rate table

---
 source/leuven/credits.rst | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/source/leuven/credits.rst b/source/leuven/credits.rst
index 0a4eb4d04..aa0e05475 100644
--- a/source/leuven/credits.rst
+++ b/source/leuven/credits.rst
@@ -148,7 +148,15 @@ to allocate one core or GPU for one minute.
 +         +---------------------+----------+------------------------+
 |         | Icelake (bigmem)    | CPU core | 4.39815                |
 +         +---------------------+----------+------------------------+
+|         | Icelake (hugemem)   | CPU core | 4.39815                |
++         +---------------------+----------+------------------------+
+|         | Sapphire Rapids     | CPU core | 3.47222                |
++         +---------------------+----------+------------------------+
+|         | Zen4 Genoa          | CPU core | 3.47222                |
++         +---------------------+----------+------------------------+
 |         | A100                | GPU      | 141.667                |
++         +---------------------+----------+------------------------+
+|         | H100                | GPU      | 569.444                |
 +---------+---------------------+----------+------------------------+

From 77beabda42e0025b373d5a8386d6a45e48e85dfb Mon Sep 17 00:00:00 2001
From: MaximeVdB
Date: Fri, 6 Dec 2024 10:42:46 +0100
Subject: [PATCH 3/4] Update and correct the job credit cost example

Update the example so that it no longer mentions Skylake explicitly
(the default partition now consists only of Cascadelake nodes) and so
that it uses more generic-looking credit account and jobscript names.
Also correct the calculation: it involves a floor function.
---
 source/leuven/credits.rst | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/source/leuven/credits.rst b/source/leuven/credits.rst
index aa0e05475..6e1d46b6d 100644
--- a/source/leuven/credits.rst
+++ b/source/leuven/credits.rst
@@ -107,19 +107,16 @@ the price-performance difference between those types of nodes.
 The total cost of a job will be comparable on any compute node, but the
 walltime will be different, depending on the performance of the nodes.
 
-In the examples below, you run your jobs on a ``skylake`` node, for which
-we charge 10 000 Slurm credits per hour.
+As an example, consider a job running on two nodes of the default partition on
+Genius, where ``TRESBillingWeights=CPU=4.62963`` applies::
+
+    $ sbatch --account=lp_myproject --clusters=genius --nodes=2 \
+        --ntasks-per-node=36 myjobscript.slurm
 
-An example of a job running on multiple nodes and cores is given below::
-
-    $ sbatch --account=lp_astrophysics_014 --clusters=genius --nodes=2 \
-        --ntasks-per-node=36 simulation_3415.slurm
-
-For Genius thin nodes we have ``TRESBillingWeights=CPU=4.62963``.
 If this job finishes in 2.5 hours (i.e., walltime is 150 minutes), the user
 will be charged::
 
-    4.62963 * (2 * 36) * 150 = 50 000 credits
+    floor(4.62963 * (2 * 36)) * 150 = 49 950 credits
 
 
 Charge rates
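The corrected formula is easy to sanity-check, e.g. with a one-liner (this
just redoes the arithmetic of the example above; it is not an official
estimator)::

    $ python3 -c 'import math; print(math.floor(4.62963 * 2 * 36) * 150)'
    49950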
From 95365951b6a7f98171e1d8891652db42e99f7743 Mon Sep 17 00:00:00 2001
From: MaximeVdB
Date: Fri, 6 Dec 2024 11:43:12 +0100
Subject: [PATCH 4/4] Mention sam-quote tool for credit cost estimates

---
 source/leuven/credits.rst | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/source/leuven/credits.rst b/source/leuven/credits.rst
index 6e1d46b6d..85f9a3cfd 100644
--- a/source/leuven/credits.rst
+++ b/source/leuven/credits.rst
@@ -118,6 +118,16 @@ will be charged::
 
     floor(4.62963 * (2 * 36)) * 150 = 49 950 credits
 
+You can also get such estimates from the ``sam-quote`` tool by providing it
+with your job submission command::
+
+    $ sam-quote sbatch --account=lp_myproject --clusters=genius --nodes=2 \
+        --ntasks-per-node=36 --time=2:30:00 myjobscript.slurm
+    49950
+
+Note that ``sam-quote`` assumes a worst-case scenario in which the job does
+not stop before reaching its time limit.
+
 Charge rates
 ------------
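The same charging rule extends to jobs that combine CPU and GPU resources.
As an illustration with the wICE weights from the updated table, and assuming
the floor is applied to the combined billing value exactly as in the CPU-only
example (the mix of 18 Icelake cores plus one A100 GPU for 60 minutes is
purely hypothetical)::

    $ python3 -c 'import math; print(math.floor(2.54630 * 18 + 141.667 * 1) * 60)'
    11220

Since ``sam-quote`` bases its estimates on the requested time limit, the same
figure would be the worst-case quote for such a submission under the same
assumption.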