add setting for a script to customize build environment #302

Merged
2 commits merged into EESSI:develop on Feb 13, 2025

Conversation

@trz42 (Contributor) commented Feb 13, 2025

Adds an additional means to customize the environment in which a build job runs. This could already be done by writing a module file and then using the load_modules setting; however, doing that when one just wants to run something like

umask 0002

would require unnecessary effort.
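
For illustration, a minimal sketch of how the new setting could be used. The [site_config] section name and the sourcing snippet are assumptions inferred from the review comments below (which show the job script extracting a site_config_script value from cfg/job.cfg and sourcing it), not a verbatim copy of the implementation:

# hypothetical excerpt from cfg/job.cfg
[site_config]
site_config_script = /path/to/site_config_script.sh

# sketch of what the generated job script then does with the parsed value
if [[ -n "${site_config_script_value}" ]]; then
    echo "Sourcing site config script '${site_config_script_value}'"
    source "${site_config_script_value}"
fi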

@casparvl (Contributor) commented

Yes, and another use case I see is setting the REFRAME_SCALE_TAG for the test step, which could be made conditional on the partition the job has started in. I'm currently doing that in my .bashrc in a rather indirect way, by checking where our /sw/arch symlink points (this differs per node type):

SWDIR=$(basename "$(realpath /sw/arch)")
if [[ -n "${SWDIR}" ]]; then
    if [[ "${SWDIR}" == "AMD-ZEN2" ]]; then
        export REFRAME_SCALE_TAG='--tag 1_8_node'
    elif [[ "${SWDIR}" == "AMD-ZEN4" ]]; then
        export REFRAME_SCALE_TAG='--tag 1_8_node'
    elif [[ "${SWDIR}" == "INTEL-AVX512" ]]; then
        export REFRAME_SCALE_TAG='--tag 1_4_node'
    elif [[ "${SWDIR}" == "AMD-ZEN4-H100" ]]; then
        export REFRAME_SCALE_TAG='--tag 1_4_node'
    else
        export REFRAME_SCALE_TAG=UNKNOWN_SW_ARCH_DIR
    fi
else
    export REFRAME_SCALE_TAG=NO_SW_ARCH_DIR
fi

The problem is that the .bashrc gets sourced before SLURM_JOB_PARTITION is set. But if we have a site-customizable script that gets sourced, I could just do:

if [[ -n "${SLURM_JOB_PARTITION}" ]]; then
    if [[ "${SLURM_JOB_PARTITION}" == "rome" ]]; then
        export REFRAME_SCALE_TAG='--tag 1_8_node'
    elif [[ "${SLURM_JOB_PARTITION}" == "genoa" ]]; then
        export REFRAME_SCALE_TAG='--tag 1_8_node'
    elif [[ "${SLURM_JOB_PARTITION}" == "gpu_a100" ]]; then
        export REFRAME_SCALE_TAG='--tag 1_4_node'
    elif [[ "${SLURM_JOB_PARTITION}" == "gpu_h100" ]]; then
        export REFRAME_SCALE_TAG='--tag 1_4_node'
    else
        export REFRAME_SCALE_TAG=UNKNOWN_SW_ARCH_DIR
    fi
else
    export REFRAME_SCALE_TAG=NO_SW_ARCH_DIR
fi

This is much clearer and more direct.
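
A bash case statement would express the same partition-to-tag mapping more compactly; a purely illustrative sketch, reusing the partition names and fallback values from above:

case "${SLURM_JOB_PARTITION}" in
    rome|genoa)
        export REFRAME_SCALE_TAG='--tag 1_8_node' ;;
    gpu_a100|gpu_h100)
        export REFRAME_SCALE_TAG='--tag 1_4_node' ;;
    '')  # SLURM_JOB_PARTITION unset or empty
        export REFRAME_SCALE_TAG=NO_SW_ARCH_DIR ;;
    *)
        export REFRAME_SCALE_TAG=UNKNOWN_SW_ARCH_DIR ;;
esac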

@casparvl (Contributor) commented

This thing works brilliantly:

site_config_script = /home/casparl/EESSI/bot-instance/eessi-bot-configs/snellius/site_config_script.sh

which contains:

$ cat /home/casparl/EESSI/bot-instance/eessi-bot-configs/snellius/site_config_script.sh
#!/bin/bash
# This script will be sourced to customize the environment in which the EESSI build bot
# runs on Snellius

# Set umask, as the default is too restrictive
echo "Original umask: $(umask)"
umask 0002
echo "New umask: $(umask)"


# Set the REFRAME_SCALE_TAG for each node type
echo "Original REFRAME_SCALE_TAG: ${REFRAME_SCALE_TAG}"
if [[ -n "${SLURM_JOB_PARTITION}" ]]; then
    if [[ "${SLURM_JOB_PARTITION}" == "rome" ]]; then
        export REFRAME_SCALE_TAG='--tag 1_8_node'
    elif [[ "${SLURM_JOB_PARTITION}" == "genoa" ]]; then
        export REFRAME_SCALE_TAG='--tag 1_8_node'
    elif [[ "${SLURM_JOB_PARTITION}" == "gpu_a100" ]]; then
        export REFRAME_SCALE_TAG='--tag 1_4_node'
    elif [[ "${SLURM_JOB_PARTITION}" == "gpu_h100" ]]; then
        export REFRAME_SCALE_TAG='--tag 1_4_node'
    else
        export REFRAME_SCALE_TAG=UNKNOWN_SW_ARCH_DIR
    fi
else
    export REFRAME_SCALE_TAG=NO_SW_ARCH_DIR
fi
echo "New REFRAME_SCALE_TAG: ${REFRAME_SCALE_TAG}"

# Set APPTAINER_CONFIG_FILE to make sure `allow pid ns = yes` is set, which is needed for Apptainer's --fusemount option
# See https://github.com/apptainer/apptainer/issues/2762#issuecomment-2637803358
echo "Original APPTAINER_CONFIG_FILE: ${APPTAINER_CONFIG_FILE}"
export APPTAINER_CONFIG_FILE=$HOME/EESSI/bot-instance/eessi-bot-configs/snellius/apptainer.conf
echo "New APPTAINER_CONFIG_FILE: ${APPTAINER_CONFIG_FILE}"

And I now see the following output in the job:

Overwriting current TMPDIR '/scratch-local/casparl.9929733' with the value '/tmp/casparl/EESSI/eessi_job.FdenVQWnfj', as configured in cfg/job.cfg
Sourcing site config script '/home/casparl/EESSI/bot-instance/eessi-bot-configs/snellius/site_config_script.sh'
Original umask: 0027
New umask: 0002
Original REFRAME_SCALE_TAG: --tag 1_8_node
New REFRAME_SCALE_TAG: --tag 1_8_node
Original APPTAINER_CONFIG_FILE:
New APPTAINER_CONFIG_FILE: /home/casparl/EESSI/bot-instance/eessi-bot-configs/snellius/apptainer.conf

Great, this is what I need. I can now set the umask. Setting APPTAINER_CONFIG_FILE lets me avoid an issue I had with the config file in /etc/apptainer/apptainer.conf... And I can strip the setting of REFRAME_SCALE_TAG from my .bashrc.

Awesome!

@casparvl (Contributor) commented

Oh, one issue that I saw is:

/var/spool/slurm/slurmd/job9929733/slurm_script: line 67: -n: command not found
(the same line repeated 18 times in total)

in the output. I thought that was introduced by a previous PR, but it is this one: without the [[ ... ]] test syntax, bash tries to execute -n as a command. I'll suggest a fix.

if $inside_site_config && [[ $line =~ ^site_config_script\ *=\ *([^[:space:]]+) ]]; then
    site_config_script_value="${BASH_REMATCH[1]}"
fi
if -n "$local_tmp_value" && -n "$site_config_script_value"; then

Suggested change:
- if -n "$local_tmp_value" && -n "$site_config_script_value"; then
+ if [[ -n "${local_tmp_value}" ]] && [[ -n "${site_config_script_value}" ]]; then

@casparvl (Contributor) left a comment

Tiny bug in one of the bash if statements; see my suggestion. Otherwise, this looks and works great.

@casparvl (Contributor) left a comment

LGTM!

@casparvl merged commit 358ef01 into EESSI:develop on Feb 13, 2025
7 checks passed