Comments about Multi-GPU and Torch #9

Open
sirotenko opened this issue Dec 30, 2015 · 1 comment

Comments

@sirotenko

Thanks for creating this comparison page. I think it will be useful for many people.
A few comments:

  1. CNTK Multi-GPU. The paper you mentioned only presents results for fully-connected networks, so it cannot be compared to distributed training of CNNs, for example. It turns out that distributed training of CNNs is much harder in the sense of achieving a high scalability factor. Currently no one has demonstrated predictable and fast distributed training of CNNs.
  2. Model deployment. I think the Torch mark could be 0.5 higher. Since it is written in C and Lua, and Lua in turn is also written in C and is very compact and embeddable, you can actually make Torch models run on exotic processors or DSPs (and even FPGAs) which only have a C compiler available.
  3. Architecture. This might be subjective, but I would give Torch a lower mark. The nn module alone looks good, but Torch is more than just nn, so as a consequence of using Lua you have to use a lot of other modules.
@zer0n
Owner

zer0n commented Dec 30, 2015

Thanks @sirotenko

  1. Agreed that distributed training of CNNs is harder. However, keep in mind that: (a) although there isn't much empirical evidence, 1-bit (or 2/4-bit) quantization of the gradient is a generic and cool technique for distributed training overall; (b) we don't have good benchmarks even for single-node training, let alone distributed training; hence I don't provide ratings in the multi-GPU performance area (although I personally believe that CNTK would easily win).
  2. Model deployment. It's not about whether something can be done but whether it can be done easily and fits well with the rest of the production pipeline. As evidence, a friend of mine had trouble deploying a trained Torch model on Android.
  3. It's true that using Torch, you may need to use many modules. I personally like it because it keeps the architecture modular and compact. For example, most NN toolkits embed SGD and similar optimizers, but in Torch, SGD is contained in the optim package, and the expected scope of optim isn't limited to just NN or ML.
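
For readers unfamiliar with the 1-bit gradient quantization mentioned in point 1, here is a rough sketch of the general idea in plain Python. It is not CNTK's implementation. The function names `quantize_1bit` and `dequantize_1bit` are hypothetical, and using the mean of the non-negative and of the negative entries as the two reconstruction values is an assumption made for illustration. The part that matters is the error feedback: whatever is lost in quantization is carried over and added to the next gradient before it is quantized.

```python
def dequantize_1bit(bits, pos_value, neg_value):
    """Rebuild an approximate gradient from one bit per element."""
    return [pos_value if b else neg_value for b in bits]


def quantize_1bit(grad, residual):
    """Quantize a gradient vector to one bit per element, with error feedback.

    Hypothetical helper for illustration only.
    Returns (bits, pos_value, neg_value, new_residual).
    """
    # Add the quantization error carried over from the previous step.
    corrected = [g + r for g, r in zip(grad, residual)]

    # One bit per element: 1 for non-negative, 0 for negative.
    bits = [1 if c >= 0.0 else 0 for c in corrected]

    # Assumption: reconstruct each group by its mean value.
    pos = [c for c in corrected if c >= 0.0]
    neg = [c for c in corrected if c < 0.0]
    pos_value = sum(pos) / len(pos) if pos else 0.0
    neg_value = sum(neg) / len(neg) if neg else 0.0

    # What the receiver will see, and the error to carry into the next step.
    decoded = dequantize_1bit(bits, pos_value, neg_value)
    new_residual = [c - d for c, d in zip(corrected, decoded)]
    return bits, pos_value, neg_value, new_residual


grad = [0.5, -0.25, 1.5, -0.75]
residual = [0.0] * len(grad)
bits, pos_value, neg_value, residual = quantize_1bit(grad, residual)
decoded = dequantize_1bit(bits, pos_value, neg_value)
```

Each worker would then send only `bits` plus the two floats instead of the full gradient, and keep `residual` locally for its next minibatch.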
