Commit

update NeMo framework examples to 24.12
akiki-liang0 committed Jan 30, 2025
1 parent 16db3aa commit e4da724
Showing 9 changed files with 14 additions and 14 deletions.
@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

-ARG NEMOFW_VERSION=24.07
+ARG NEMOFW_VERSION=24.12
FROM nvcr.io/nvidia/nemo:${NEMOFW_VERSION}

ENV USE_TCPX=yes
@@ -3,7 +3,7 @@ README

1. Set up NeMo Framework Container

-This makes a few environment variable modifications to the [nvcr.io/nvidia/nemo:24.07](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo)
+This makes a few environment variable modifications to the [nvcr.io/nvidia/nemo:24.12](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo)
container, and submits a Slurm job to copy the framework launcher scripts and a
few other auxiliary files into your working directory.

@@ -45,7 +45,7 @@ README
launcher_scripts_path=${PWD} \
stages=[training] \
env_vars.TRANSFORMERS_OFFLINE=0 \
-container=../nemofw+tcpx-24.07.sqsh \
+container=../nemofw+tcpx-24.12.sqsh \
container_mounts='['${HOME}/.cache',"/var/lib/tcpx/lib64","/run/tcpx-\${SLURM_JOB_ID}:/run/tcpx"]' \
cluster.srun_args=["--container-writable"] \
training.model.data.data_impl=mock \
@@ -18,7 +18,7 @@
#SBATCH --partition=a3
#SBATCH --exclusive

-: "${NEMOFW_VERSION:=24.07}"
+: "${NEMOFW_VERSION:=24.12}"

srun docker build --build-arg="NEMOFW_VERSION=${NEMOFW_VERSION}" -t nemofw:tcpx-"${NEMOFW_VERSION}" .
srun rm -f nemofw+tcpx-"${NEMOFW_VERSION}".sqsh
@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

-ARG NEMOFW_VERSION=24.07
+ARG NEMOFW_VERSION=24.12
FROM nvcr.io/nvidia/nemo:$NEMOFW_VERSION

ENV NCCL_FASTRAK_CTRL_DEV=enp0s12
@@ -3,7 +3,7 @@ README

1. Set up NeMo Framework Container

-This makes a few environment variable modifications to the [nvcr.io/nvidia/nemo:24.07](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo)
+This makes a few environment variable modifications to the [nvcr.io/nvidia/nemo:24.12](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo)
container, and submits a Slurm job to copy the framework launcher scripts and a
few other auxiliary files into your working directory.

@@ -21,7 +21,7 @@ README
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt # Copied from the NeMo Framework Container earlier
-# This is needed to use 24.07 and python3.11, which is what is present on
+# This is needed to use 24.12 and python3.11, which is what is present on
# Debian 12
pip install -U hydra-core
```
@@ -53,7 +53,7 @@ README
stages=[training] \
training=gpt3/5b \
env_vars.TRANSFORMERS_OFFLINE=0 \
-container=../nemofw+tcpxo-24.07.sqsh \
+container=../nemofw+tcpxo-24.12.sqsh \
container_mounts=[${HOME}/.cache,/var/lib/tcpxo/lib64] \
cluster.srun_args=["--container-writable"] \
training.model.data.data_impl=mock \
@@ -18,7 +18,7 @@
#SBATCH --partition=a3mega
#SBATCH --exclusive

-: "${NEMOFW_VERSION:=24.07}"
+: "${NEMOFW_VERSION:=24.12}"

srun docker build --build-arg="NEMOFW_VERSION=${NEMOFW_VERSION}" -t nemofw:tcpxo-"${NEMOFW_VERSION}" .
srun rm -f nemofw+tcpxo-"${NEMOFW_VERSION}".sqsh
@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

-ARG NEMOFW_VERSION=24.07
+ARG NEMOFW_VERSION=24.12
FROM nvcr.io/nvidia/nemo:$NEMOFW_VERSION

ENV NCCL_DEBUG=INFO,WARN
@@ -3,7 +3,7 @@ README

1. Set up NeMo Framework Container

-This makes a few environment variable modifications to the [nvcr.io/nvidia/nemo:24.07](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo)
+This makes a few environment variable modifications to the [nvcr.io/nvidia/nemo:24.12](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo)
container, and submits a Slurm job to copy the framework launcher scripts and a
few other auxiliary files into your working directory.

@@ -21,7 +21,7 @@ README
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt # Copied from the NeMo Framework Container earlier
-# This is needed to use 24.07 and python3.11, which is what is present on
+# This is needed to use 24.12 and python3.11, which is what is present on
# Debian 12
pip install -U hydra-core
```
@@ -53,7 +53,7 @@ README
stages=[training] \
training=gpt3/5b \
env_vars.TRANSFORMERS_OFFLINE=0 \
-container=../nemo-24.07.sqsh \
+container=../nemo-24.12.sqsh \
container_mounts=[${HOME}/.cache,/usr/local/gib] \
cluster.srun_args=["--container-writable"] \
training.model.data.data_impl=mock \
@@ -18,7 +18,7 @@
#SBATCH --partition=a3ultra
#SBATCH --exclusive

-: "${NEMOFW_VERSION:=24.07}"
+: "${NEMOFW_VERSION:=24.12}"

srun docker build --build-arg="NEMOFW_VERSION=${NEMOFW_VERSION}" -t nemo-"${NEMOFW_VERSION}" .
srun rm -f nemo-"${NEMOFW_VERSION}".sqsh
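
Note on the version default: all three build scripts in this commit set the version with `: "${NEMOFW_VERSION:=24.12}"`, which assigns the default only when the variable is unset or empty, so a value already present in the environment should take precedence. The sketch below illustrates that behavior in isolation. `resolve_names` is a hypothetical helper that is not part of the repository, and the squash-file name is modeled on the a3mega script.

```shell
#!/usr/bin/env bash
# Sketch only: resolve_names is a hypothetical helper, not repository code.
resolve_names() {
  # Assign the default only when NEMOFW_VERSION is unset or empty.
  : "${NEMOFW_VERSION:=24.12}"
  echo "nvcr.io/nvidia/nemo:${NEMOFW_VERSION}"
  echo "nemofw+tcpxo-${NEMOFW_VERSION}.sqsh"
}

# Default case: nothing is set, so the script falls back to 24.12.
default_names="$(unset NEMOFW_VERSION; resolve_names)"

# Override case: a caller-supplied value wins over the default.
override_names="$(NEMOFW_VERSION=24.07; resolve_names)"

printf '%s\n' "$default_names" "$override_names"
```

Each call runs inside a command substitution, so the assignment made by `:=` stays in that subshell and does not leak into the calling shell.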