GPU Memory Benchmark #30
Comments (excerpts):

- "You should compare it to full attention! (Just set the …"
- "Not sure how accurate these results are, but when I plot the memory usage with respect to the sequence length of a model with this setup [plot omitted]. Note that for shorter sequences, the transformer model (…"
- "@pabloppp yup, the actual Transformer would have probably ceased to work at around 2048, because the reversibility is still in play even if you turn on full attention. Other hyperparameters to play around with are …"
- "I'm gonna add here the trials I do myself with the corresponding memory usage: dim = 1024, seq_len = 8960, depth = 12, heads = 16, batch_size = 1: 8501 MiB"
- "hi lucidrains. …"
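For reference, the "full attention" comparison discussed in the excerpts above presumably just means flipping a constructor flag rather than building a separate model. This is only a sketch: it assumes reformer-pytorch's `ReformerLM` accepts `use_full_attn` along with the usual `n_hashes` / `bucket_size` knobs, and the other values are illustrative.

```python
from reformer_pytorch import ReformerLM

# Sketch only: assumes ReformerLM accepts `use_full_attn`, `n_hashes` and
# `bucket_size`. With use_full_attn=True the LSH attention is replaced by
# ordinary full attention, while the reversible layers stay in place
# (which is why a plain Transformer is not an exact comparison).
lsh_model = ReformerLM(num_tokens=256, dim=512, depth=1, heads=1,
                       max_seq_len=2048, n_hashes=4, bucket_size=64)
full_model = ReformerLM(num_tokens=256, dim=512, depth=1, heads=1,
                        max_seq_len=2048, use_full_attn=True)
```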
I did a few training runs of a simple Reformer module with different parameters and logged the GPU memory usage.
Of course, these values can vary depending on your machine and other factors, but I thought they might be useful as a visual guide (a sketch of how such a run can be measured is at the end of this post):
dim = 512, seq_len = 256, depth = 1, heads = 1, batch_size = 1: 452 MB
dim = 512, seq_len = 256, depth = 1, heads = 1, batch_size = 8: 992 MB
dim = 512, seq_len = 256, depth = 1, heads = 1, batch_size = 16: 1584 MB
dim = 512, seq_len = 256, depth = 1, heads = 1, batch_size = 32: 2866 MB
dim = 512, seq_len = 256, depth = 1, heads = 1, batch_size = 64: 4606 MB
dim = 512, seq_len = 256, depth = 1, heads = 1, batch_size = 128: 9788 MB

dim = 512, seq_len = 512, depth = 1, heads = 1, batch_size = 1: 538 MB
dim = 512, seq_len = 512, depth = 1, heads = 1, batch_size = 8: 1580 MB
dim = 512, seq_len = 512, depth = 1, heads = 1, batch_size = 16: 2870 MB
dim = 512, seq_len = 512, depth = 1, heads = 1, batch_size = 32: 4582 MB
dim = 512, seq_len = 512, depth = 1, heads = 1, batch_size = 64: 9276 MB

dim = 512, seq_len = 1024, depth = 1, heads = 1, batch_size = 1: 682 MB
dim = 512, seq_len = 1024, depth = 1, heads = 1, batch_size = 8: 2904 MB
dim = 512, seq_len = 1024, depth = 1, heads = 1, batch_size = 16: 4634 MB
dim = 512, seq_len = 1024, depth = 1, heads = 1, batch_size = 32: 9310 MB

dim = 512, seq_len = 2048, depth = 1, heads = 1, batch_size = 1: 992 MB
dim = 512, seq_len = 2048, depth = 1, heads = 1, batch_size = 8: 4644 MB
dim = 512, seq_len = 2048, depth = 1, heads = 1, batch_size = 16: 9256 MB

dim = 512, seq_len = 4096, depth = 1, heads = 1, batch_size = 1: 1602 MB
dim = 512, seq_len = 4096, depth = 1, heads = 1, batch_size = 8: 8810 MB
dim = 512, seq_len = 4096, depth = 1, heads = 1, batch_size = 10: 10976 MB

dim = 512, seq_len = 8192, depth = 1, heads = 1, batch_size = 1: 2884 MB
dim = 512, seq_len = 8192, depth = 1, heads = 1, batch_size = 5: 11396 MB

dim = 512, seq_len = 256, depth = 1, heads = 1, batch_size = 8: 992 MB
dim = 512, seq_len = 256, depth = 2, heads = 1, batch_size = 8: 1054 MB
dim = 512, seq_len = 256, depth = 4, heads = 1, batch_size = 8: 1142 MB
dim = 512, seq_len = 256, depth = 6, heads = 1, batch_size = 8: 1220 MB
dim = 512, seq_len = 256, depth = 12, heads = 1, batch_size = 8: 1512 MB
dim = 512, seq_len = 256, depth = 24, heads = 1, batch_size = 8: 2056 MB
dim = 512, seq_len = 256, depth = 24, heads = 1, batch_size = 16: 2680 MB

dim = 128, seq_len = 256, depth = 12, heads = 1, batch_size = 8: 566 MB
dim = 128, seq_len = 256, depth = 12, heads = 2, batch_size = 8: 576 MB
dim = 128, seq_len = 256, depth = 12, heads = 4, batch_size = 8: 616 MB
dim = 128, seq_len = 256, depth = 12, heads = 8, batch_size = 8: 732 MB
dim = 128, seq_len = 256, depth = 12, heads = 16, batch_size = 8: 1000 MB

dim = 32, seq_len = 256, depth = 12, heads = 8, batch_size = 8: 644 MB
dim = 64, seq_len = 256, depth = 12, heads = 8, batch_size = 8: 670 MB
dim = 128, seq_len = 256, depth = 12, heads = 8, batch_size = 8: 732 MB
dim = 256, seq_len = 256, depth = 12, heads = 8, batch_size = 8: 918 MB
dim = 512, seq_len = 256, depth = 12, heads = 8, batch_size = 8: 1516 MB
dim = 1024, seq_len = 256, depth = 12, heads = 8, batch_size = 8: 3552 MB

dim = 512, seq_len = 4096, depth = 6, heads = 8, batch_size = 8: 9672 MB
dim = 128, seq_len = 4096, depth = 12, heads = 8, batch_size = 8: 6270 MB
dim = 512, seq_len = 8192, depth = 12, heads = 8, batch_size = 1: 3628 MB
dim = 512, seq_len = 8192, depth = 12, heads = 8, batch_size = 4: 10048 MB
dim = 128, seq_len = 1024, depth = 6, heads = 4, batch_size = 32: 4608 MB
dim = 128, seq_len = 1024, depth = 6, heads = 4, batch_size = 64: 8052 MB
dim = 128, seq_len = 1024, depth = 6, heads = 4, batch_size = 80: 9990 MB
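Here is a minimal sketch of how numbers like these could be collected; it is not the author's exact script. It assumes a `ReformerLM` wrapper from reformer-pytorch with otherwise default settings (the original post only says "a simple Reformer module"), and `num_tokens=256` is an illustrative vocabulary size.

```python
import torch
from reformer_pytorch import ReformerLM

def peak_memory_mb(dim, seq_len, depth, heads, batch_size, num_tokens=256):
    # Build a small LM around the Reformer; num_tokens is illustrative,
    # not a value from the table above.
    model = ReformerLM(
        num_tokens=num_tokens,
        dim=dim,
        depth=depth,
        heads=heads,
        max_seq_len=seq_len,
    ).cuda()
    opt = torch.optim.Adam(model.parameters())

    x = torch.randint(0, num_tokens, (batch_size, seq_len), device="cuda")
    torch.cuda.reset_peak_memory_stats()

    # One forward/backward/step (dummy loss), then read the peak allocation.
    loss = model(x).sum()
    loss.backward()
    opt.step()
    return torch.cuda.max_memory_allocated() / 2**20

if __name__ == "__main__":
    print(f"dim=512, seq_len=256, depth=1, heads=1, batch=8: "
          f"{peak_memory_mb(512, 256, 1, 1, 8):.0f} MB")
```

Note that `torch.cuda.max_memory_allocated` only counts tensor allocations, so it will typically read lower than the `nvidia-smi` style figures in the table, which also include the CUDA context.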