
[405B] Add performance data for 405B model #554

Merged: 6 commits merged into main on Aug 23, 2024


@fduwjj (Contributor) commented on Aug 21, 2024

In this PR, we mostly measured the performance and loss curves of the 405B model with several optimization techniques we recently developed. We also want to log the actual peak TFLOPs used in the MFU calculation so that it can be cross-validated. Finally, we should get device information from the system rather than from the device name, because the device name does not contain "NVL" or "SXM".
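The two ideas in this description (computing MFU against an explicit peak-FLOPs figure that is also logged, and determining the H100 variant from a system-reported name) can be sketched in Python as below. This is a hypothetical illustration, not the code in this PR: the helper names `query_gpu_name_from_system`, `h100_variant`, `mfu` and `format_mfu_log`, the use of `nvidia-smi --query-gpu=name --format=csv,noheader` as the system source, the "default to SXM" rule, and the 6 × parameters FLOPs-per-token estimate are all assumptions. Peak FLOPs is passed in by the caller rather than hard-coded per variant.

```python
import subprocess
from typing import Optional


def query_gpu_name_from_system() -> str:
    """Ask the driver for the GPU product name via nvidia-smi (hypothetical helper).

    Assumption: nvidia-smi is on PATH; returns the name of the first GPU listed.
    """
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return out.stdout.strip().splitlines()[0]


def h100_variant(name: str) -> Optional[str]:
    """Classify an H100 product name as 'NVL', 'PCIe' or 'SXM' (hypothetical helper)."""
    upper = name.upper()
    if "H100" not in upper:
        return None  # not an H100; the caller handles other GPUs
    if "NVL" in upper:
        return "NVL"
    if "PCIE" in upper:
        return "PCIe"
    # Assumption: an H100 name with no NVL/PCIe marker is treated as SXM.
    return "SXM"


def mfu(num_params: float, tokens_per_sec_per_gpu: float, peak_flops: float) -> float:
    """Model FLOPs utilization using the common 6 * N training FLOPs-per-token estimate.

    Attention FLOPs are ignored here for simplicity.
    """
    achieved_flops = 6 * num_params * tokens_per_sec_per_gpu
    return achieved_flops / peak_flops


def format_mfu_log(num_params: float, tokens_per_sec_per_gpu: float, peak_flops: float) -> str:
    """Build a log line that reports MFU together with the peak value it was divided by."""
    value = mfu(num_params, tokens_per_sec_per_gpu, peak_flops)
    return f"mfu: {value:.2%} (peak FLOPs used: {peak_flops / 1e12:.1f} TFLOPs)"
```

Logging the denominator next to the MFU value lets a reader check whether the intended peak figure (for example SXM versus NVL) was actually used.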


@facebook-github-bot added the CLA Signed label on Aug 21, 2024
@fduwjj mentioned this pull request on Aug 21, 2024
@tianyu-l (Contributor) left a comment:


Looks good to me, except that the peak flops section is still a bit mysterious to me.

@fduwjj added 2 commits on August 21, 2024 15:14
@fduwjj requested a review from tianyu-l on August 21, 2024 22:16
@fduwjj requested a review from tianyu-l on August 21, 2024 22:53
@tianyu-l (Contributor) left a comment:


let's remove and delay the changes on peak flops (as it's more of a separate topic which needs further discussion) and land the rest for now.

@fduwjj merged commit 9515a14 into main on Aug 23, 2024
3 of 5 checks passed
@fduwjj deleted the 405b_perf branch on August 23, 2024 05:15
Labels: CLA Signed
5 participants