Improvement in the documentation of the plugin interface. #309

Open

capcah opened this issue Mar 23, 2020 · 6 comments

capcah commented Mar 23, 2020

Hello all,

I'm implementing a plugin using the nccl_net.h API and I found the documentation of the functions pretty hard to understand. If you folks don't mind, I'll start sending PRs to improve the documentation there, based on my current understanding of the interface.

Additionally, there are some things that are not clear to me even after reading the code. For example: in test(), can the plugin disregard size if the request is not complete, or should it contain the size of partially downloaded data?

Let me know if you want me to work on this.

sjeaugey (Member) commented Mar 23, 2020

The size only needs to be set when the request is complete.
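
A minimal sketch of what that looks like in a plugin's test(), assuming a hypothetical myRequest type (only the function signature comes from nccl_net.h):

```c
/* myRequest is a hypothetical request type; only the signature of myTest()
 * matches nccl_net.h: ncclResult_t (*test)(void* request, int* done, int* size). */
typedef struct {
  int completed;      /* set by the transport once the operation finishes */
  int bytesReceived;  /* total size of the completed transfer */
} myRequest;

int myTest(void* request, int* done, int* size) {
  myRequest* req = (myRequest*)request;
  *done = req->completed;
  /* size only has to be valid once the request is complete; it can be
   * left untouched while the request is still in flight. */
  if (req->completed && size != NULL) *size = req->bytesReceived;
  return 0; /* ncclSuccess */
}
```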

Indeed the documentation is not as good as it could be; it's pretty much only the comments in nccl_net.h, the rest being in the implementation. The discovery / topology detection functions are used by src/graph/topo.cc, while the connection/communication part only happens in src/transport/net.cc (communication is all inside send/recv proxy functions).

We never put a lot of effort into documenting this, since it was not intended to be used by end users but rather by vendors/partners, whom we usually work more closely with. Better documentation never hurts though, so clarifying things is welcome.

Also please note the API evolves a bit from version to version. Make sure you check the v3 structure here for NCCL 2.6: https://github.com/NVIDIA/nccl/blob/v2.6/src/include/nccl_net.h.

We're interested in knowing who is developing and maintaining plugins so that we can give a heads-up and ping you when we make the API evolve (like in 2.6).

capcah (Author) commented Mar 30, 2020

Hey Sylvain,

Thanks for the clarification! I'll reach out over email with some contact points on our side.

Regarding the API, there are a couple more things that are not clear:

  1. How exactly does the handle argument in connect() and listen() work? As I understand it, it is initialized during listen() and accessed during connect(), but that seems odd: how is the code calling connect() aware of what the handle is supposed to be?
  2. For RMA memory registration, AFAIK, NCCL calls regMr() with a block of memory to be registered for remote access. Is there any way of giving NCCL a pointer to a memory region to use for its buffers instead, or some way of controlling those addresses from the plugin?

Thanks again!

sjeaugey (Member) commented Mar 30, 2020

  1. Between listen() and connect(), NCCL exchanges the handle through a side channel (sockets); see the sketch after this list.
  2. Not currently, but not a bad idea. We already have some special code for Infiniband (to support fork()) that makes sure the memory we allocate is page-aligned. That said, that would only be used for CPU memory allocation; if we want to use GPU Direct RDMA, the buffer would be on the GPU, hence allocated through cudaMalloc. Also, any CPU memory we allocate would need to be registered on the GPU (the GPU needs to access it, writing data directly into it), so you would need to make sure cudaHostRegister works on that memory allocation. In any case, we could imagine adding a memAlloc() function to the net plugin. @rashikakheria would that help the libfabrics plugin?
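
To make the handle flow concrete, a rough sketch under the v3 API, with myListen/myConnect and the address struct being made-up illustration names (NCCL_NET_HANDLE_MAXSIZE is real and comes from nccl_net.h):

```c
#define NCCL_NET_HANDLE_MAXSIZE 64 /* from nccl_net.h */

/* Hypothetical address blob; it must fit inside the opaque handle. */
typedef struct { char data[NCCL_NET_HANDLE_MAXSIZE]; } myAddress;

/* listen() writes whatever connect() will need into the opaque handle;
 * NCCL then ships those bytes to the peer over its bootstrap sockets. */
int myListen(int dev, void* opaqueHandle, void** listenComm) {
  myAddress* addr = (myAddress*)opaqueHandle;
  /* ... bind a listening endpoint on device dev, store its address in *addr,
   * and return the listening state through *listenComm ... */
  (void)dev; (void)addr; (void)listenComm;
  return 0; /* ncclSuccess */
}

/* On the remote side, connect() receives the same bytes and dials back. */
int myConnect(int dev, void* opaqueHandle, void** sendComm) {
  myAddress* addr = (myAddress*)opaqueHandle;
  /* ... connect to the address the listener advertised ... */
  (void)dev; (void)addr; (void)sendComm;
  return 0; /* ncclSuccess */
}
```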

capcah (Author) commented Mar 30, 2020

  1. Thanks! That makes a lot more sense now. Could you point me towards the code that does the OOB transport?

  2. That seems reasonable: since the plugin is already able to tell whether or not it can handle GPU pointers, it would make sense to have it allocate the memory it needs and deal with any NIC-specific quirks. If this sounds good to you and Rashika, I'd be glad to give it a shot.

sjeaugey (Member) commented Mar 30, 2020

https://github.com/NVIDIA/nccl/blob/master/src/init.cc#L402
This is the function that connects peers, basically calling:
selectTransport .. transport->{send,recv}Setup .. net->listen
bootstrapSend/Recv
transport->{send,recv}Connect .. net->connect

The OOB communication happens inside bootstrapSend/bootstrapRecv, in bootstrap.cc.
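
As a commented sequence (my reading of the chain above, not actual NCCL code; the handle buffer is the same opaque blob as in the earlier sketch):

```c
/* Sequence sketch only; the real logic lives in src/init.cc,
 * src/transport/net.cc and bootstrap.cc. */
void connectPeersSketch(void) {
  char handle[64]; /* NCCL_NET_HANDLE_MAXSIZE-sized opaque blob */
  (void)handle;

  /* Setup: transport->recvSetup() reaches net->listen(), which writes
   * the plugin's connection info into handle. */

  /* Exchange: bootstrapSend()/bootstrapRecv() (bootstrap.cc) carry
   * handle to the peer over NCCL's own sockets. */

  /* Connect: transport->sendConnect() reaches net->connect(), which
   * reads handle and establishes the plugin-level connection. */
}
```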

rashikakheria (Contributor) commented

> In any case, we could imagine adding a memAlloc() function to the net plugin. @rashikakheria would that help the libfabrics plugin?

I think that might help libfabrics as well. It would provide an opportunity to do fewer memory registrations and to reuse memory regions once they have been registered.
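
For illustration, one purely hypothetical shape such a memAlloc() extension could take (nothing like this exists in nccl_net.h; the name and signature are made up):

```c
#include <stdlib.h>

/* Hypothetical extension: the plugin hands NCCL page-aligned CPU memory
 * it can register once with the NIC and keep reusing. */
int myMemAlloc(int dev, size_t size, void** ptr) {
  (void)dev;
  /* Page alignment keeps the allocation compatible with the existing
   * fork()-safe Infiniband path and with cudaHostRegister(). */
  if (posix_memalign(ptr, 4096, size) != 0) return 1;
  /* ... register the region with the NIC here, so later regMr() calls on
   * this buffer become cheap or unnecessary ... */
  return 0;
}
```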
