Improvement in the documentation of the plugin interface. #309
The size needs to be set only when the request is complete. Indeed, the documentation is not as good as it could be: it is pretty much only the comments in nccl_net.h, with the rest living in the implementation. The discovery/topology detection functions are used by src/graph/topo.cc, while the connection/communication part only happens in src/transport/net.cc (all communication is inside the send/recv proxy functions). We never put much effort into documenting this, since the interface was not intended for end users but rather for vendors/partners, whom we usually work more closely with. Better documentation is always welcome, though, so clarifying things won't hurt. Also, please note that the API evolves from version to version. Make sure you check the v3 structure for NCCL 2.6 here: https://github.com/NVIDIA/nccl/blob/v2.6/src/include/nccl_net.h. We're interested in knowing who is developing and maintaining plugins so that we can give you a heads-up and ping you when we evolve the API (as in 2.6).
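To make the size rule concrete, here is a minimal C sketch of a plugin-side `test()` that leaves `size` untouched until the request completes, plus a simplified caller that reads `size` only after `done` is set. It assumes the v2/v3 shape `test(void* request, int* done, int* size)` from nccl_net.h. `demoRequest`, `demoTest`, `demoWait` and the 4096-byte simulated progress are hypothetical, and `ncclResult_t` is stubbed so the example builds without NCCL headers. `demoWait` is a blocking simplification, not how the proxy functions in src/transport/net.cc are written.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the result type from nccl.h, where ncclSuccess is 0. */
typedef int ncclResult_t;
#define ncclSuccess 0

/* Hypothetical receive request tracked by a plugin. */
struct demoRequest {
  int posted;    /* buffer size passed to irecv() */
  int incoming;  /* actual message size (<= posted), known only to the transport */
  int received;  /* bytes landed so far */
};

/* Same shape as test(void* request, int* done, int* size) in nccl_net.h (v2/v3).
 * Progress is simulated: each call "receives" at most 4096 more bytes. */
static ncclResult_t demoTest(void* request, int* done, int* size) {
  assert(request != NULL && done != NULL);
  struct demoRequest* r = request;
  if (r->received < r->incoming) {
    int chunk = r->incoming - r->received;
    if (chunk > 4096) chunk = 4096;
    r->received += chunk;
  }
  *done = (r->received == r->incoming);
  /* size is only meaningful once the request completes; leave it alone before that */
  if (*done && size != NULL) *size = r->received;
  return ncclSuccess;
}

/* Caller side: poll until done, then read size.
 * Returns the number of test() calls, or -1 on error. */
static int demoWait(void* request, int* size) {
  int done = 0, polls = 0;
  while (!done) {
    if (demoTest(request, &done, size) != ncclSuccess) return -1;
    polls++;
  }
  return polls;
}
```

Because the caller ignores `size` until `done` is non-zero, the plugin does not need to report partial progress through it.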
Hey Sylvain, thanks for the clarification! I'll reach out over email with some contact points on our side. Regarding the API, there are a couple more things that are not clear:
Thanks again!
See https://github.com/NVIDIA/nccl/blob/master/src/init.cc#L402. The out-of-band (OOB) communication happens inside bootstrapSend/bootstrapRecv, in bootstrap.cc.
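As a rough illustration of where that OOB step fits, under my understanding that the receiving side's `listen()` fills an opaque handle (up to `NCCL_NET_HANDLE_MAXSIZE` bytes, assumed to be 64 here), NCCL ships those bytes to the peer out of band, and the sending side passes them to `connect()`. The sketch below simulates this in one process. `oobSend`/`oobRecv` are hypothetical stand-ins for bootstrapSend/bootstrapRecv, and `demoHandle`, `demoListen` and `demoConnect` are illustrative, not the real plugin signatures.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HANDLE_MAXSIZE 64 /* assumed to match NCCL_NET_HANDLE_MAXSIZE */

/* Hypothetical handle contents: whatever connect() needs to reach the listener. */
struct demoHandle { uint32_t ipv4; uint16_t port; };
_Static_assert(sizeof(struct demoHandle) <= DEMO_HANDLE_MAXSIZE, "handle too large");

/* Hypothetical OOB channel standing in for bootstrapSend()/bootstrapRecv(). */
static unsigned char oobSlot[DEMO_HANDLE_MAXSIZE];
static void oobSend(const void* buf, size_t len) {
  assert(len <= sizeof oobSlot);
  memcpy(oobSlot, buf, len);
}
static void oobRecv(void* buf, size_t len) {
  assert(len <= sizeof oobSlot);
  memcpy(buf, oobSlot, len);
}

/* Receiver side: listen() writes an opaque, fixed-size handle. */
static void demoListen(void* handle, uint16_t port) {
  struct demoHandle h = { 0x7f000001u, port }; /* 127.0.0.1 */
  memset(handle, 0, DEMO_HANDLE_MAXSIZE);
  memcpy(handle, &h, sizeof h);
}

/* Sender side: connect() decodes the handle it received out of band. */
static struct demoHandle demoConnect(const void* handle) {
  struct demoHandle h;
  memcpy(&h, handle, sizeof h);
  return h;
}
```

NCCL only moves the handle around as raw bytes, so everything `connect()` needs (address, port, keys) has to fit inside it.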
I think that might help libfabric as well. It would provide an opportunity to perform fewer memory registrations and to reuse memory regions once they have been used.
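One way a plugin can do fewer registrations is a registration cache: check whether an already registered region covers the requested buffer and register only on a miss. The C sketch below assumes a fixed 16-slot cache and a hypothetical `expensiveRegister()` standing in for the provider call (for example `fi_mr_reg()` in libfabric). `demoRegMr`/`demoDeregMr` are illustrative names, not the nccl_net.h API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 16

/* Hypothetical cached registration covering one contiguous region. */
struct demoMr { uintptr_t base; size_t len; int refs; int valid; };

static struct demoMr cache[CACHE_SLOTS];
static int hwRegistrations = 0; /* counts calls into the expensive register path */

/* Stand-in for an expensive NIC registration (e.g. fi_mr_reg() in libfabric). */
static void expensiveRegister(struct demoMr* mr) { (void)mr; hwRegistrations++; }

/* Return a registration covering [data, data+size), reusing a cached one if possible. */
static struct demoMr* demoRegMr(void* data, size_t size) {
  uintptr_t a = (uintptr_t)data;
  struct demoMr* freeSlot = NULL;
  for (int i = 0; i < CACHE_SLOTS; i++) {
    struct demoMr* mr = &cache[i];
    if (mr->valid && a >= mr->base && a + size <= mr->base + mr->len) {
      mr->refs++;
      return mr;                     /* hit: no new registration */
    }
    if (!mr->valid && freeSlot == NULL) freeSlot = mr;
  }
  if (freeSlot == NULL) return NULL; /* cache full; a real plugin would evict */
  freeSlot->base = a;
  freeSlot->len = size;
  freeSlot->refs = 1;
  freeSlot->valid = 1;
  expensiveRegister(freeSlot);
  return freeSlot;
}

/* Drop a reference but keep the region registered so later calls can reuse it. */
static void demoDeregMr(struct demoMr* mr) {
  assert(mr != NULL);
  if (mr->refs > 0) mr->refs--;
}
```

Keeping regions registered after their last reference is what enables reuse, but a real cache also has to invalidate entries when the underlying memory is freed, and needs an eviction policy once the slots are full.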
Hello all,
I'm implementing a plugin using the nccl_net.h API and I found the documentation of the functions pretty hard to understand. If you folks don't mind, I'll start sending PRs to improve the documentation there, based on my current understanding of the interface.
Additionally, some things are not clear to me even after reading the code. For example, in `test()`, can the plugin disregard `size` if the request is not complete, or should it contain the size of the partially transferred data?

Let me know if you want me to work on this.