[LLM] Adds Llama 3.1 reference implementation code #781
base: master
Conversation
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
FROM ${NEMO_BASE_IMAGE} AS nemo-base-image

RUN pip uninstall transformers -y
RUN pip install transformers blobfile
Please freeze the versions for reproducibility
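For example, the installs could be pinned like this (the version numbers below are illustrative placeholders, not the versions this PR was validated with):

```dockerfile
# Illustrative pinned versions only -- substitute the versions actually tested for this benchmark
RUN pip uninstall transformers -y
RUN pip install transformers==4.45.2 blobfile==2.1.1
```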
RUN pip uninstall transformers -y
RUN pip install transformers blobfile
RUN pip install prettytable
Same as above: please pin the `prettytable` version as well.
### Steps to run and time
To train Llama 3.1 405B, we need to fill out all fields in [config.sh](./config.sh). This file contains all the settings for Slurm cluster access and job submission, directory mappings, containers, and model configuration.
Can we have an example config.sh file with all these values filled out, so users know what the paths should point to and what the contents for these values should look like? For example, mention what the contents of the tokenizer path should look like and where it should be saved. Maybe also state that these paths need to be mounted (if that's necessary) during docker run.
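Something along these lines would already help; the variable names, paths, and values below are hypothetical placeholders rather than the actual fields in this PR's config.sh:

```bash
#!/bin/bash
# Hypothetical example config.sh -- every name, path, and value here is an illustrative placeholder.

# Slurm cluster access and job submission
export SBATCH_PARTITION="batch"      # partition to submit the job to
export SBATCH_ACCOUNT="my-account"   # Slurm account to charge
export NNODES=72                     # number of nodes for the run

# Directory mappings (these host paths would also need to be mounted into the container)
export DATADIR="/lustre/datasets/llama31_c4"              # preprocessed dataset
export TOKENIZER_PATH="/lustre/tokenizers/mixtral-8x22b"  # directory holding the five tokenizer files
export LOGDIR="/lustre/results/llama31_405b"              # logs and checkpoints

# Container
export CONT="nvcr.io/nvidia/nemo:xx.yy"   # training container image

# Model configuration
export SIZE="405b"   # benchmark size (8b/70b only for debugging)
```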
We use the Mixtral 8x22B tokenizer in this benchmark. Tokenizer files can be downloaded [here](https://huggingface.co/mistralai/Mixtral-8x22B-v0.1/tree/main). Only the five files containing tokenizer-related contents (`special_tokens_map.json`, `tokenizer.json`, `tokenizer.model`, `tokenizer.model.v1`, `tokenizer_config.json`) are needed.
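As one possible way to do this, the five files could be fetched with `huggingface-cli` (the target directory below is only an example, and you may need to log in first if the repository requires accepting its terms):

```bash
# Download only the five tokenizer-related files from the Mixtral-8x22B repository
huggingface-cli download mistralai/Mixtral-8x22B-v0.1 \
    special_tokens_map.json tokenizer.json tokenizer.model tokenizer.model.v1 tokenizer_config.json \
    --local-dir ./tokenizer
```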
### Data preprocessing
Once the processed data is uploaded to MLC, please move these sections to an appendix and just share instructions on how to download the processed data.
exp_name = size

if size == "8b":
Mention that 8b and 70b are there only for debugging and that the benchmark is actually Llama 3.1 405B, to avoid confusion.
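One way to surface this in the code as well (a sketch only, reusing `size` and `exp_name` from the snippet above; the warning text is purely illustrative):

```python
# Sketch: make it explicit that only 405b is the benchmark configuration
if size in ("8b", "70b"):
    print(f"WARNING: size={size} is intended for debugging only; "
          "the MLPerf benchmark configuration is 405b.")
exp_name = size
```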
model_group.add_argument(
    "--size",
    type=str,
    default="8b",
Change default to 405b
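For example, the argument definition might look like this after the change (the `choices` and help text are suggestions, not code from this PR):

```python
model_group.add_argument(
    "--size",
    type=str,
    default="405b",
    choices=["405b", "70b", "8b"],
    help="Model size; 405b is the benchmark configuration, 8b and 70b are for debugging only",
)
```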
Adds the initial Llama 3.1 reference implementation.