Connecting to a CS-2 cluster login node requires an MFA passcode for authentication: either an 8-digit passcode generated by an app on your mobile device (e.g. MobilePASS+) or a CRYPTOCard-generated passcode prefixed by a 4-digit PIN. This is the same passcode used to authenticate to other ALCF systems, such as Theta and Cooley.
To connect, ssh to one of the CS-2 login nodes.
Once logged in, create a Python virtual environment and install cerebras_pytorch 2.3.0:
mkdir -p ~/R_2.3.0
cd ~/R_2.3.0
# Note: "deactivate" is a shell function defined by "activate", so it does not work in a standalone script.
deactivate
rm -rf venv_cerebras_pt
/software/cerebras/python3.8/bin/python3.8 -m venv venv_cerebras_pt
source venv_cerebras_pt/bin/activate
pip install --upgrade pip
pip install cerebras_pytorch==2.3.0
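The note about `deactivate` matters if these steps are ever placed in a script: `deactivate` exists only in a shell that has sourced `activate`, so a fresh script process will report it as an unknown command. A minimal sketch of a guard for that case follows; the helper name `safe_deactivate` is hypothetical and not part of any Cerebras or ALCF tooling.

```shell
# Hypothetical helper: call deactivate only when it is defined in the
# current shell (that is, after `source .../bin/activate` has been run).
safe_deactivate() {
    if command -v deactivate >/dev/null 2>&1; then
        deactivate
    fi
    # Always succeed so a calling script can continue either way.
    return 0
}

safe_deactivate
```

With this guard the same setup steps can be rerun from either an interactive shell or a script without stopping at the `deactivate` line.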
We use an example from the Cerebras Modelzoo repository for this hands-on.
- Clone the modelzoo repository.
mkdir -p ~/R_2.3.0
cd ~/R_2.3.0
git clone https://github.com/Cerebras/modelzoo.git
cd modelzoo
git tag
git checkout Release_2.3.0
- Install the requirements for modelzoo.
cd ~/R_2.3.0/modelzoo
pip install -r requirements.txt
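The `pip install` above installs into whichever Python environment is active, so it is worth confirming that the virtual environment is still activated before running it. A minimal sketch, relying on the `VIRTUAL_ENV` variable that `activate` exports; the helper name `require_venv` is hypothetical.

```shell
# Hypothetical helper: succeed only when a virtual environment is active.
# `activate` exports VIRTUAL_ENV and `deactivate` unsets it.
require_venv() {
    if [ -z "${VIRTUAL_ENV:-}" ]; then
        echo "no virtual environment active; run: source ~/R_2.3.0/venv_cerebras_pt/bin/activate" >&2
        return 1
    fi
    echo "using virtual environment: $VIRTUAL_ENV"
}
```

For example, `require_venv && pip install -r requirements.txt` skips the install when no environment is active.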
The CS-2 cluster has its own Kubernetes-based system for job submission and queuing. Jobs are started automatically through the Python scripts.
Use the Cerebras cluster command-line tool, csctl, to get additional information about the jobs.
- Jobs that have not yet completed can be listed as follows:
(venv_cerebras_pt) $ csctl get jobs
- Jobs can be canceled as shown:
(venv_cerebras_pt) $ csctl cancel job wsjob-eyjapwgnycahq9tus4w7id
See csctl -h for more options.
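When several jobs are queued, it can help to pull just the job identifiers out of the listing before cancelling one. The sketch below makes no assumption about the column layout of the `csctl` output; it only assumes that identifiers look like the one in the cancel example (`wsjob-` followed by lowercase letters and digits). The helper name `extract_job_ids` is hypothetical.

```shell
# Hypothetical helper: print each distinct wsjob-... identifier found on stdin.
# Assumes IDs are "wsjob-" followed by lowercase letters and digits, as in
# the cancel example; the real csctl output layout is not assumed.
extract_job_ids() {
    grep -oE 'wsjob-[a-z0-9]+' | sort -u
}

# Usage on a login node (not run here):
#   csctl get jobs | extract_job_ids
```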
Run the BERT example with different batch sizes, such as 512 and 2048, and observe the difference in performance. Submit proof (the output printed to your terminal, the path to a logfile, or a screenshot) that you were able to follow the instructions and run the example successfully.
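One way to organize the comparison is to run once per batch size and keep a separate logfile for each run, which also provides the logfile path asked for above. The sketch below is only a scaffold: the actual BERT launch command and the way the batch size is passed to it are not given in this section, so `run_bert` is a placeholder you would replace with your own invocation, and both function names are hypothetical.

```shell
# Placeholder for the real BERT launch command, which is not given in this
# section. Replace the body with your own invocation; "$1" is the batch size.
run_bert() {
    echo "placeholder: would launch BERT with batch size $1"
}

# Run once per batch size and keep a separate log for each run.
# Arguments: log directory, then one or more batch sizes.
sweep_batch_sizes() {
    log_dir=$1
    shift
    mkdir -p "$log_dir"
    for bs in "$@"; do
        log_file="$log_dir/bert_bs${bs}.log"
        # tee shows the output on the terminal and saves a copy to the log.
        run_bert "$bs" 2>&1 | tee "$log_file"
        echo "log written to $log_file"
    done
}
```

Calling `sweep_batch_sizes ~/R_2.3.0/logs 512 2048` would then write `bert_bs512.log` and `bert_bs2048.log` under that directory.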
- ALCF Cerebras Documentation
- Cerebras Documentation
- Cerebras Modelzoo Repo
- Datasets Path:
/software/cerebras/dataset