This repository has been archived by the owner on Feb 25, 2024. It is now read-only.

Distributed Training Examples & Scalability Benchmarks #11

Open
avik-pal opened this issue Feb 2, 2022 · 3 comments


avik-pal (Owner) commented Feb 2, 2022

Currently, FluxMPI has only one example. It would be good to showcase training of more image models from Metalhead -- ViT (FluxML/Metalhead.jl#105), ResNets, etc. -- and also to benchmark their scaling across multiple GPUs.

@dnabanita7

I am not sure if this is the right place to raise it, and it is only vaguely related to this issue, but for benchmarks, can I suggest something along the lines of the mlpack benchmarks? I really like how they use valgrind for memory benchmarking and profiling, SQLite to store results, etc. Comparing against other ML libraries would give a better picture of Flux and why to use it over alternatives.
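
For illustration, the SQLite part could look something like the sketch below (using SQLite.jl and DBInterface.jl); the database file, table schema, and recorded values are invented for this example, not an agreed-upon format.

```julia
# Hypothetical sketch: append one benchmark result per run to a local
# SQLite database. Schema and values are placeholders.
using SQLite, DBInterface, Dates

db = SQLite.DB("benchmarks.db")
DBInterface.execute(db, """
    CREATE TABLE IF NOT EXISTS results (
        model        TEXT,
        workers      INTEGER,
        imgs_per_sec REAL,
        timestamp    TEXT
    )
""")

# Record one (made-up) run.
DBInterface.execute(db,
    "INSERT INTO results VALUES (?, ?, ?, ?)",
    ("ResNet18", 4, 1234.5, string(now())))
```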

@avik-pal changed the title from "Distributed Training Examples/Benchmarks" to "Distributed Training Examples & Scalability Benchmarks" on Feb 2, 2022

avik-pal (Owner, Author) commented Feb 2, 2022

I think that might be more relevant for FluxBench. Here I mainly want to test scalability across GPUs, along the lines of the Horovod benchmarks: https://horovod.readthedocs.io/en/stable/benchmarks.html
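
A benchmark in that style mostly amounts to timing synchronized training steps and reporting aggregate throughput as workers are added. A rough sketch with plain MPI.jl (the batch size, step count, and `step!` stand-in are placeholders, not an actual FluxMPI harness):

```julia
# Hedged sketch of a Horovod-style throughput measurement.
using MPI, Printf

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
nworkers = MPI.Comm_size(comm)

batch_size = 32          # placeholder values
nbatches = 100
step!() = sleep(0.01)    # stand-in for one forward/backward/update step

step!()                  # warm-up before timing
MPI.Barrier(comm)
t = @elapsed for _ in 1:nbatches
    step!()
end
imgs_per_sec = batch_size * nbatches / t

# Sum per-worker throughput on rank 0; scaling efficiency is this total
# divided by (single-worker throughput * nworkers).
total = MPI.Reduce(imgs_per_sec, +, 0, comm)
rank == 0 && @printf("%d workers: %.1f imgs/sec total\n", nworkers, total)
```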

@CarloLucibello

Can I ask for a minimal example without FastAI.jl? E.g., I'd like to see how this script would need to change for distributed training:
https://github.com/FluxML/model-zoo/blob/master/vision/vgg_cifar10/vgg_cifar10.jl
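
Not a definitive answer, but the changes usually reduce to three things: initialize MPI, shard the data by rank, and average gradients every step. A minimal sketch with plain MPI.jl and Flux; the model and data below are small stand-ins for the VGG/CIFAR-10 pieces of that script, and only the MPI-related lines are the point:

```julia
using Flux, MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
nworkers = MPI.Comm_size(comm)

# Stand-ins for the real dataset and model.
X = rand(Float32, 32, 32, 3, 1024)
Y = Flux.onehotbatch(rand(0:9, 1024), 0:9)
model = Chain(Flux.flatten, Dense(32 * 32 * 3 => 10))

# 1. Shard the data: each rank trains on every `nworkers`-th sample.
idx = (rank + 1):nworkers:size(X, 4)
loader = Flux.DataLoader((X[:, :, :, idx], Y[:, idx]); batchsize = 64)

# 2. Start all workers from identical parameters.
ps = Flux.params(model)
foreach(p -> MPI.Bcast!(p, 0, comm), ps)

# A common heuristic is to scale the learning rate with the worker count.
opt = Flux.ADAM(3.0f-4 * nworkers)

# 3. Average gradients across workers before each update.
for (x, y) in loader
    gs = gradient(() -> Flux.logitcrossentropy(model(x), y), ps)
    for p in ps
        g = gs[p]
        g === nothing && continue
        MPI.Allreduce!(g, +, comm)   # in-place sum across workers
        g ./= nworkers               # turn the sum into an average
    end
    Flux.Optimise.update!(opt, ps, gs)
end
```

Launched with `mpiexec -n 4 julia script.jl` (or MPI.jl's `mpiexecjl` wrapper), each rank then trains on its own shard while keeping parameters in sync.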
