Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardize on either GiB/MiB or GB/MB in logging #94

Closed
awgu opened this issue Feb 27, 2024 · 3 comments
Closed

Standardize on either GiB/MiB or GB/MB in logging #94

awgu opened this issue Feb 27, 2024 · 3 comments

Comments

@awgu
Copy link
Contributor

awgu commented Feb 27, 2024

In my understanding, gibibyte (GiB) is $1024^3$ bytes, and mebibyte (MiB) is $1024^2$ bytes. However, gigabyte (GB) is $1000^3$ bytes, and megabyte (MB) is $1000$ bytes.

I think we should standardize on one or the other, maybe GiB since the memory profiler went that direction: pytorch/pytorch#120172 In that case, I think we mainly need to change "GB" to "GiB" in our logging.

@lessw2020
Copy link
Contributor

good catch. Right now we are actually calculating GiB in the logging but calling it (incorrectly) GB, so we are 7% off.

That said, I think we should stay consistent with how Nvidia specs their GPU's and atm it's in GB rather than GiB, so I would vote for correcting the calcs to 1000^3 rather than current 1024^3.

Here's H100 specs for example:
Uploading Screenshot 2024-02-27 at 10.48.52 AM.png…

@awgu
Copy link
Contributor Author

awgu commented Feb 27, 2024

For some reason I cannot see the screenshot. Interestingly, in pytorch/pytorch#120172, the user suggested that Nvidia labels it as GB but actually means GiB 😆 (pytorch/pytorch#120172 (comment))

@lessw2020
Copy link
Contributor

lessw2020 commented Feb 27, 2024

oh I think the comment that Nvidia labels things as GB in specs but means GiB is correct - nvidia-smi reports in MiB/GiB actually:

Screenshot 2024-02-27 at 11 03 03 AM

So agree with your initial assessment, lets' update it to GiB labelling (and MiB) and then we'll be both correct label wise and in synch with Nividia-smi and mem profiler.

Thanks for flagging this - I'll make a very simple PR right now to fix.

lessw2020 added a commit that referenced this issue Feb 27, 2024
this PR updates the GPU metrics to labelling as GiB - we were
calculating GiB but calling it GB.
(credit to @awgu for flagging this - issue
#94)

function names and member vars in metrics.py have been updated to _gib
instead of _gb for clarity, and the logging output now labels as GiB:
<img width="851" alt="Screenshot 2024-02-27 at 11 28 23 AM"
src="https://github.com/pytorch/torchtrain/assets/46302957/85eb260a-77e9-4c49-be8a-b1aaa10dc3e2">
@awgu awgu closed this as completed Feb 29, 2024
lessw2020 added a commit that referenced this issue Apr 18, 2024
this PR updates the GPU metrics to labelling as GiB - we were
calculating GiB but calling it GB.
(credit to @awgu for flagging this - issue
#94)

function names and member vars in metrics.py have been updated to _gib
instead of _gb for clarity, and the logging output now labels as GiB:
<img width="851" alt="Screenshot 2024-02-27 at 11 28 23 AM"
src="https://github.com/pytorch/torchtrain/assets/46302957/85eb260a-77e9-4c49-be8a-b1aaa10dc3e2">
philippguevorguian pushed a commit to YerevaNN/YNNtitan that referenced this issue Aug 17, 2024
this PR updates the GPU metrics to labelling as GiB - we were
calculating GiB but calling it GB.
(credit to @awgu for flagging this - issue
pytorch#94)

function names and member vars in metrics.py have been updated to _gib
instead of _gb for clarity, and the logging output now labels as GiB:
<img width="851" alt="Screenshot 2024-02-27 at 11 28 23 AM"
src="https://github.com/pytorch/torchtrain/assets/46302957/85eb260a-77e9-4c49-be8a-b1aaa10dc3e2">
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants