> **Note:** This was developed on the NYU Greene HPC cluster and may require modifications to run on other setups.
### Setup the gcloud CLI (only needed if working with GCP Cloud Storage)

```bash
SCRATCH=/scratch/${USER}
cd $SCRATCH
curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-473.0.0-linux-x86_64.tar.gz
tar -xvf google-cloud-sdk-473.0.0-linux-x86_64.tar.gz
./google-cloud-sdk/install.sh
# init --> configure default project, region+zone = us-central2-b
./google-cloud-sdk/bin/gcloud init
# install the alpha components
./google-cloud-sdk/bin/gcloud components install alpha
```
### Install miniforge

```bash
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
```

**IMPORTANT:** select `/scratch/<USER>/miniforge3` as the installation path.
### Create an environment file (important)

Your `~/.bashrc` won't be sourced in Slurm jobs, so you need an environment file that the Slurm scripts can source to load the necessary modules and set up the environment. Create an env file in your scratch directory:

```bash
vim $SCRATCH/env.sh
```

Add the following lines to the `$SCRATCH/env.sh` file:
```bash
#!/bin/bash
# ensure SCRATCH is set (Slurm jobs may not inherit it)
SCRATCH=${SCRATCH:-/scratch/${USER}}

source $SCRATCH/miniforge3/etc/profile.d/conda.sh
# activate the conda environment
conda activate eval

# The next line updates PATH for the Google Cloud SDK.
if [ -f $SCRATCH/google-cloud-sdk/path.bash.inc ]; then . $SCRATCH/google-cloud-sdk/path.bash.inc; fi

# ensure you download HF files to /scratch instead of /home
export HF_HOME=$SCRATCH/.cache/huggingface
export HF_HUB_CACHE=$SCRATCH/.cache/huggingface/hub
export HF_DATASETS_CACHE=$SCRATCH/.cache/huggingface/datasets

export EVAL_DIR=$SCRATCH/mllm_eval_hpc  # path to this project directory
```
Alternatively, instead of setting the HF variables, you can create a directory in `$SCRATCH` and symlink the default cache directory to it:

```bash
mkdir -p $SCRATCH/.cache
ln -s $SCRATCH/.cache ~/.cache
```
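For reference, a Slurm script would then source the env file before running anything. A minimal skeleton (the job name and resource options here are illustrative, not the project's actual settings):

```shell
#!/bin/bash
#SBATCH --job-name=eval
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00

SCRATCH=/scratch/${USER}
# load the conda env, gcloud, and HF cache vars defined in env.sh
[ -f $SCRATCH/env.sh ] && source $SCRATCH/env.sh

echo "HF cache: $HF_HOME"
```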
### [From login node] Setup the overlay and download the singularity container

```bash
SCRATCH=/scratch/${USER}
mkdir -p ${SCRATCH}/overlay
scp greene-dtn:/scratch/work/public/overlay-fs-ext3/overlay-25GB-500K.ext3.gz ${SCRATCH}/overlay/cambrian.ext3.gz
gunzip -vvv ${SCRATCH}/overlay/cambrian.ext3.gz
scp -rp greene-dtn:/scratch/work/public/singularity/cuda12.1.1-cudnn8.9.0-devel-ubuntu22.04.2.sif ${SCRATCH}/overlay/cuda12.1.1-cudnn8.9.0-devel-ubuntu22.04.2.sif
```
### [From GPU compute node] Setup the environment

Request a GPU node:

```bash
srun --pty -c 6 --mem=16GB --gres=gpu:rtx8000:1 --time=04:00:00 /bin/bash
```

Load the singularity container in read-write mode:

```bash
SCRATCH=/scratch/${USER}
singularity exec --bind /scratch --nv --overlay $SCRATCH/overlay/cambrian.ext3:rw $SCRATCH/overlay/cuda12.1.1-cudnn8.9.0-devel-ubuntu22.04.2.sif /bin/bash
```

Install the environment:

```bash
conda create -n eval python=3.10 -y
conda activate eval
pip install --upgrade pip
pip install -r requirements.txt
```
From the root of this project directory, run the following command to launch a job that will:

- download a checkpoint from GCP using the `gcloud` CLI
- consolidate the checkpoint using `consolidate.py`
- convert the checkpoint to HF format using `convert_hf_model.py`

```bash
sbatch slurm/consolidate.slurm <path_to_checkpoint>
```

Example:

```bash
sbatch slurm/consolidate.slurm gs://us-central2-storage/cambrian/checkpoints/TPU-llava-v1.5-7b-finetune-6993k
```
This will save the consolidated checkpoint to `$SCRATCH/cambrian-TPU-llava-v1.5-7b-finetune-6993k`.

Note: an extra `cambrian-` prefix is added to the checkpoint name if it is not already present, to ensure the checkpoint can be loaded properly with the `cambrian` code.
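The prefix behavior can be sketched in a few lines of bash (an illustration of the naming rule, not the script's exact code):

```shell
# Derive the local checkpoint name from a GCP path, prepending
# "cambrian-" only when it is not already there.
gcs_path="gs://us-central2-storage/cambrian/checkpoints/TPU-llava-v1.5-7b-finetune-6993k"
name=$(basename "$gcs_path")
if [[ "$name" != cambrian-* ]]; then
    name="cambrian-${name}"
fi
echo "$name"   # cambrian-TPU-llava-v1.5-7b-finetune-6993k
```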
You can launch an `sbatch` job to evaluate a model checkpoint on a benchmark using `submit_eval.bash`:

```
Usage: bash slurm/submit_eval.bash --benchmark <benchmark> --ckpt <ckpt> [OPTIONS]

Submit a job to evaluate a model checkpoint on a benchmark.

Required Arguments:
  --benchmark <benchmark>    The benchmark to evaluate on.
  --ckpt <ckpt>              The path to the model checkpoint.

Optional Arguments:
  --conv_mode <conv_mode>    The conversation mode to use. (Default: vicuna_v1)
  --gpus <gpus>              The number of GPUs to request. (Default: 2)
  --constraint <constraint>  The gres constraint to use. (Default: a100|h100|rtx8000)
  --cpus <cpus>              The number of CPUs per task. (Default: 64)
  --mem <mem>                The amount of memory to use. (Default: 128GB)
  --time <time>              The time limit for the job. (Default: 03:00:00)
  --help                     Show this message.
```
Example:

```bash
bash slurm/submit_eval.bash --ckpt $SCRATCH/checkpoints/llava-yi-finetune-6993k/ --conv_mode chatml_direct --constraint "a100|h100" --gpus 2 --benchmark mmmu
```
### Under the hood

The `submit_eval.bash` script does the following:

- Parses the command-line arguments and validates them.
- Determines the appropriate Slurm script to use for the evaluation.
- Constructs the Slurm command to submit the evaluation job.
- Submits the evaluation job to the Slurm job scheduler.

The Slurm script sets up the environment, loads the necessary modules, and runs the `run_benchmark.sh` script with the provided arguments. See `eval_benchmark.slurm` for more details.
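The parse-and-validate step follows the standard bash `case` pattern. A simplified sketch (the real script accepts more flags than shown, and the invocation at the top is simulated for illustration):

```shell
#!/bin/bash
# Simplified sketch of argument parsing/validation in a submit script.
# Simulate an invocation for illustration:
set -- --benchmark mmmu --ckpt /tmp/ckpt --gpus 4

benchmark=""; ckpt=""; conv_mode="vicuna_v1"; gpus=2  # defaults

while [[ $# -gt 0 ]]; do
    case "$1" in
        --benchmark) benchmark="$2"; shift 2 ;;
        --ckpt)      ckpt="$2";      shift 2 ;;
        --conv_mode) conv_mode="$2"; shift 2 ;;
        --gpus)      gpus="$2";      shift 2 ;;
        *) echo "Unknown argument: $1" >&2; exit 1 ;;
    esac
done

# required-argument validation
if [[ -z "$benchmark" || -z "$ckpt" ]]; then
    echo "Error: --benchmark and --ckpt are required" >&2
    exit 1
fi

echo "benchmark=$benchmark ckpt=$ckpt conv_mode=$conv_mode gpus=$gpus"
```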
The `run_benchmark.sh` script does the following:

- Parses the command-line arguments.
- Validates the benchmark directory and required scripts.
- Handles the distribution of the evaluation workload across multiple GPUs using chunking.
- Runs the evaluation script for each chunk in parallel.
- Combines the results from all the chunks into a single answers file.
- Runs the testing script on the combined answers file to compute the evaluation metrics.
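The chunking strategy amounts to launching one worker per GPU, each processing 1/N of the data, then merging the per-chunk outputs. A self-contained sketch, with a stub `eval_chunk` function standing in for the real evaluation command:

```shell
#!/bin/bash
# One worker per GPU, each handling chunk i of NUM_GPUS; results merged at the end.
# eval_chunk is a stand-in; the real script pins each worker to a GPU, roughly:
#   CUDA_VISIBLE_DEVICES=$i python eval.py --num-chunks $NUM_GPUS --chunk-idx $i ...
eval_chunk() {  # args: num_chunks chunk_idx
    echo "{\"chunk\": $2, \"of\": $1}" > "answers_$2.jsonl"
}

NUM_GPUS=2
for ((i = 0; i < NUM_GPUS; i++)); do
    eval_chunk "$NUM_GPUS" "$i" &   # run chunks in parallel
done
wait  # block until every chunk has finished

cat answers_*.jsonl > answers.jsonl   # combine per-chunk answers
```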
The `submit_all_benchmarks_parallel.bash` script calls the `submit_eval.bash` script for each benchmark that has been implemented, for a given checkpoint.
```
Usage: bash slurm/submit_all_benchmarks_parallel.bash --ckpt <ckpt> [OPTIONS]

Submits jobs to evaluate a model checkpoint on each benchmark.

Required Arguments:
  --ckpt <ckpt>              The path to the model checkpoint.

Optional Arguments:
  --conv_mode <conv_mode>    The conversation mode to use. (Default: vicuna_v1)
  --gpus <gpus>              The number of GPUs to request. (Default: 2)
  --constraint <constraint>  The gres constraint to use. (Default: a100|h100|rtx8000)
  --cpus <cpus>              The number of CPUs per task. (Default: 64)
  --mem <mem>                The amount of memory to use. (Default: 128GB)
  --time <time>              The time limit for the job. (Default: 03:00:00)
  --help                     Show this message.
```
Example:

```bash
bash slurm/submit_all_benchmarks_parallel.bash --ckpt $SCRATCH/checkpoints/llava-TPU-llava-v1.5-7b-finetune-6993k
```

Or using the `nyu-visionx/cambrian-8b` HF model:

```bash
bash slurm/submit_all_benchmarks_parallel.bash --ckpt nyu-visionx/cambrian-8b
```
The `e2e.bash` script chains together the downloading, consolidation, conversion, and evaluation steps for a given checkpoint stored on GCP.
```
Usage: bash slurm/e2e.bash --ckpt <ckpt> [OPTIONS]

End-to-end script to consolidate a GCP checkpoint and submit eval jobs.

Required Arguments:
  --ckpt <ckpt>              The path to the model checkpoint.

Optional Arguments:
  --conv_mode <conv_mode>    The conversation mode to use. (Default: vicuna_v1)
  --gpus <gpus>              The number of GPUs to request. (Default: 1)
  --constraint <constraint>  The gres constraint to use. (Default: a100|h100|rtx8000)
  --cpus <cpus>              The number of CPUs per task. (Default: 18)
  --mem <mem>                The amount of memory to use. (Default: 32GB)
  --time <time>              The time limit for the job. (Default: 10:00:00)
  --help                     Show this message.
```
Example:

```bash
bash slurm/e2e.bash --ckpt gs://us-central2-storage/cambrian/checkpoints/TPU-llava-v1.5-7b-finetune-6993k
```
### Under the hood

The `e2e.bash` script performs the following steps:

1. **Consolidation Job Submission:**
   - Submits a Slurm job using `slurm/consolidate.slurm` to:
     - Download the checkpoint from GCP Cloud Storage.
     - Consolidate the checkpoint using `scripts/consolidate.py`.
     - Convert the checkpoint to HuggingFace format using `scripts/convert_hf_model.py`.
   - Captures the job ID of the consolidation job.
2. **Checkpoint Path Processing:**
   - Extracts the checkpoint name from the GCP path.
   - Prepends "cambrian-" to the checkpoint name if not already present.
   - Constructs the full local path where the consolidated checkpoint will be saved.
3. **Evaluation Jobs Submission:**
   - Calls `slurm/submit_all_benchmarks_parallel.bash` to submit evaluation jobs for all implemented benchmarks.
   - Passes along all relevant parameters (checkpoint path, conversation mode, Slurm job settings).
   - Sets up a dependency on the consolidation job, ensuring evaluations only start after consolidation is complete.

This end-to-end process automates the entire workflow from downloading a checkpoint from GCP to running all benchmark evaluations, making it easy to evaluate new checkpoints with a single command.
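The dependency handoff between the consolidation and evaluation steps can be sketched as follows. This is illustrative only: a stub replaces `sbatch` so the snippet runs outside a cluster, and the checkpoint path is hypothetical.

```shell
#!/bin/bash
# Stub sbatch so this sketch is self-contained; remove it on a real cluster.
sbatch() { echo "12345"; }  # real `sbatch --parsable` prints only the new job's ID

# 1) submit consolidation and capture its job ID
consolidate_job_id=$(sbatch --parsable slurm/consolidate.slurm \
    gs://bucket/checkpoints/my-ckpt)   # hypothetical GCP path

# 2) gate evaluation on successful consolidation:
#    afterok releases the job only if the named job exits cleanly
sbatch --dependency="afterok:${consolidate_job_id}" \
    slurm/eval_benchmark.slurm "$SCRATCH/cambrian-my-ckpt"
```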