GPU Memory Benchmark #30
Comments (excerpts):

- "You should compare it to full attention! (Just set the …"
- "Not sure how accurate these results are, but when I plot the memory usage with respect to the sequence length of a model with this setup [plot omitted]. Note that for shorter sequences, the transformer model (…"
- "@pabloppp yup, the actual Transformer would have probably ceased to work at around 2048, because the reversibility is still in play even if you turn on full attention. Other hyperparameters to play around with are …"
- "I'm gonna add here the trials I do myself with the corresponding memory usage: dim = 1024, seq_len = 8960, depth = 12, heads = 16, batch_size = 1: 8501 MiB"
- "hi lucidrains. …"
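For reference, the "full attention" comparison discussed in the excerpts above presumably just means flipping a constructor flag rather than building a separate model. This is only a sketch: it assumes reformer-pytorch's `ReformerLM` accepts `use_full_attn` along with the usual `n_hashes` / `bucket_size` knobs, and the other values are illustrative.

```python
from reformer_pytorch import ReformerLM

# Sketch only: assumes ReformerLM accepts `use_full_attn`, `n_hashes` and
# `bucket_size`. With use_full_attn=True the LSH attention is replaced by
# ordinary full attention, while the reversible layers stay in place
# (which is why a plain Transformer is not an exact comparison).
lsh_model = ReformerLM(num_tokens=256, dim=512, depth=1, heads=1,
                       max_seq_len=2048, n_hashes=4, bucket_size=64)
full_model = ReformerLM(num_tokens=256, dim=512, depth=1, heads=1,
                        max_seq_len=2048, use_full_attn=True)
```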
I did a few training runs of a simple Reformer module with different parameters and logged the GPU memory usage.
Of course, these values can vary depending on your machine and other factors, but I thought they might be useful as a visual guide (a sketch of how such a run can be measured is at the end of this post):
dim = 512, seq_len = 256, depth = 1, heads = 1, batch_size = 1: 452 MB
dim = 512, seq_len = 256, depth = 1, heads = 1, batch_size = 8: 992 MB
dim = 512, seq_len = 256, depth = 1, heads = 1, batch_size = 16: 1584 MB
dim = 512, seq_len = 256, depth = 1, heads = 1, batch_size = 32: 2866 MB
dim = 512, seq_len = 256, depth = 1, heads = 1, batch_size = 64: 4606 MB
dim = 512, seq_len = 256, depth = 1, heads = 1, batch_size = 128: 9788 MB

dim = 512, seq_len = 512, depth = 1, heads = 1, batch_size = 1: 538 MB
dim = 512, seq_len = 512, depth = 1, heads = 1, batch_size = 8: 1580 MB
dim = 512, seq_len = 512, depth = 1, heads = 1, batch_size = 16: 2870 MB
dim = 512, seq_len = 512, depth = 1, heads = 1, batch_size = 32: 4582 MB
dim = 512, seq_len = 512, depth = 1, heads = 1, batch_size = 64: 9276 MB

dim = 512, seq_len = 1024, depth = 1, heads = 1, batch_size = 1: 682 MB
dim = 512, seq_len = 1024, depth = 1, heads = 1, batch_size = 8: 2904 MB
dim = 512, seq_len = 1024, depth = 1, heads = 1, batch_size = 16: 4634 MB
dim = 512, seq_len = 1024, depth = 1, heads = 1, batch_size = 32: 9310 MB

dim = 512, seq_len = 2048, depth = 1, heads = 1, batch_size = 1: 992 MB
dim = 512, seq_len = 2048, depth = 1, heads = 1, batch_size = 8: 4644 MB
dim = 512, seq_len = 2048, depth = 1, heads = 1, batch_size = 16: 9256 MB

dim = 512, seq_len = 4096, depth = 1, heads = 1, batch_size = 1: 1602 MB
dim = 512, seq_len = 4096, depth = 1, heads = 1, batch_size = 8: 8810 MB
dim = 512, seq_len = 4096, depth = 1, heads = 1, batch_size = 10: 10976 MB

dim = 512, seq_len = 8192, depth = 1, heads = 1, batch_size = 1: 2884 MB
dim = 512, seq_len = 8192, depth = 1, heads = 1, batch_size = 5: 11396 MB

dim = 512, seq_len = 256, depth = 1, heads = 1, batch_size = 8: 992 MB
dim = 512, seq_len = 256, depth = 2, heads = 1, batch_size = 8: 1054 MB
dim = 512, seq_len = 256, depth = 4, heads = 1, batch_size = 8: 1142 MB
dim = 512, seq_len = 256, depth = 6, heads = 1, batch_size = 8: 1220 MB
dim = 512, seq_len = 256, depth = 12, heads = 1, batch_size = 8: 1512 MB
dim = 512, seq_len = 256, depth = 24, heads = 1, batch_size = 8: 2056 MB
dim = 512, seq_len = 256, depth = 24, heads = 1, batch_size = 16: 2680 MB

dim = 128, seq_len = 256, depth = 12, heads = 1, batch_size = 8: 566 MB
dim = 128, seq_len = 256, depth = 12, heads = 2, batch_size = 8: 576 MB
dim = 128, seq_len = 256, depth = 12, heads = 4, batch_size = 8: 616 MB
dim = 128, seq_len = 256, depth = 12, heads = 8, batch_size = 8: 732 MB
dim = 128, seq_len = 256, depth = 12, heads = 16, batch_size = 8: 1000 MB

dim = 32, seq_len = 256, depth = 12, heads = 8, batch_size = 8: 644 MB
dim = 64, seq_len = 256, depth = 12, heads = 8, batch_size = 8: 670 MB
dim = 128, seq_len = 256, depth = 12, heads = 8, batch_size = 8: 732 MB
dim = 256, seq_len = 256, depth = 12, heads = 8, batch_size = 8: 918 MB
dim = 512, seq_len = 256, depth = 12, heads = 8, batch_size = 8: 1516 MB
dim = 1024, seq_len = 256, depth = 12, heads = 8, batch_size = 8: 3552 MB

dim = 512, seq_len = 4096, depth = 6, heads = 8, batch_size = 8: 9672 MB
dim = 128, seq_len = 4096, depth = 12, heads = 8, batch_size = 8: 6270 MB
dim = 512, seq_len = 8192, depth = 12, heads = 8, batch_size = 1: 3628 MB
dim = 512, seq_len = 8192, depth = 12, heads = 8, batch_size = 4: 10048 MB
dim = 128, seq_len = 1024, depth = 6, heads = 4, batch_size = 32: 4608 MB
dim = 128, seq_len = 1024, depth = 6, heads = 4, batch_size = 64: 8052 MB
dim = 128, seq_len = 1024, depth = 6, heads = 4, batch_size = 80: 9990 MB
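Here is a minimal sketch of how numbers like these could be collected; it is not the author's exact script. It assumes a `ReformerLM` wrapper from reformer-pytorch with otherwise default settings (the original post only says "a simple Reformer module"), and `num_tokens=256` is an illustrative vocabulary size.

```python
import torch
from reformer_pytorch import ReformerLM

def peak_memory_mb(dim, seq_len, depth, heads, batch_size, num_tokens=256):
    # Build a small LM around the Reformer; num_tokens is illustrative,
    # not a value from the table above.
    model = ReformerLM(
        num_tokens=num_tokens,
        dim=dim,
        depth=depth,
        heads=heads,
        max_seq_len=seq_len,
    ).cuda()
    opt = torch.optim.Adam(model.parameters())

    x = torch.randint(0, num_tokens, (batch_size, seq_len), device="cuda")
    torch.cuda.reset_peak_memory_stats()

    # One forward/backward/step (dummy loss), then read the peak allocation.
    loss = model(x).sum()
    loss.backward()
    opt.step()
    return torch.cuda.max_memory_allocated() / 2**20

if __name__ == "__main__":
    print(f"dim=512, seq_len=256, depth=1, heads=1, batch=8: "
          f"{peak_memory_mb(512, 256, 1, 1, 8):.0f} MB")
```

Note that `torch.cuda.max_memory_allocated` only counts tensor allocations, so it will typically read lower than the `nvidia-smi` style figures in the table, which also include the CUDA context.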