GPUDirect RDMA #18

omor1 · 2020-05-02T23:36:54Z

I'm having trouble getting GDR working. My current understanding is that, with GDR, the InfiniBand driver has a plugin that interacts with the CUDA driver and runtime, and the IB Verbs memory registration/deregistration functions are extended to be aware of GPU memory. From the application's perspective, nothing really needs to change: it can pass a CUDA device pointer to the memory registration function and then use that memory for RDMA.

I can't get this to work, though: ibv_post_send segfaults after the sendd/recvd rendezvous. Specifically, the segfault occurs in libc's memcpy implementation, which suggests to me that I'm either missing something or GDR isn't set up correctly.

Vu, do you have any idea what could be going on? I unfortunately only have access to Comet's GPU nodes, so I can't try another platform.


omor1 · 2020-05-03T00:12:47Z

Actually, I've discovered that Comet's OFED stack might not support GDR.

omor1 · 2020-05-03T07:26:24Z

Actually, scratch that; this is on me. Small messages are (rightly) sent inline, which means a memcpy, but that obviously doesn't work when the memory is on a CUDA device. I can add a check in my code to ensure the pointer isn't on a CUDA device.
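A simplified model of why the inline path crashes, assuming an inline send copies the payload into the work-queue entry (WQE) with a CPU memcpy at post time. That matches how IBV_SEND_INLINE is documented for ibv_post_send (the data goes into the WQE and the L_Key is not checked), but the struct and function names here (`model_wqe`, `MODEL_MAX_INLINE`, `model_post_inline`) are hypothetical and no real verbs types are used.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical inline capacity; the real limit is the QP's max_inline_data. */
#define MODEL_MAX_INLINE 64

/* Hypothetical stand-in for a send WQE carrying inline data. */
struct model_wqe {
    size_t  len;
    uint8_t inline_data[MODEL_MAX_INLINE];
};

/* Copies the payload into the WQE the way an inline send does.
 * Returns 0 on success, -1 if the payload is too large to inline.
 * The memcpy is the step that faults when src is a cudaMalloc'd device
 * pointer: on most platforms the CPU cannot dereference GPU memory. */
static int model_post_inline(struct model_wqe *wqe, const void *src, size_t len)
{
    if (len > MODEL_MAX_INLINE)
        return -1;
    memcpy(wqe->inline_data, src, len);
    wqe->len = len;
    return 0;
}
```

Note that no registration is involved on this path, which is why registering the GPU buffer alone does not help for small messages.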

To be honest, does it make sense to get rid of this check in lc_server_rma_rtr? I think that this path is only taken by direct-send, meaning that the user wants to do RDMA. If an inline send were wanted, the user would have used immediate-send...

danghvu · 2020-05-11T05:34:08Z

I don't completely understand the issue; can you answer these questions:

Have you tried bare-metal GPUDirect and confirmed that it works on the cluster?
Are you talking about this line for the inline check: https://github.com/uiuc-hpc/LC/blob/51ef5280a5cc5a8b7e23501d6fc273ce5f0d8b28/src/include/server/server_ibv.h#L367 ? This is purely a performance consideration, since registration is expensive. If this needs to work with GPUDirect, we may want to annotate the buffer somehow.

omor1 · 2020-05-11T06:28:58Z

Have you tried the bare-metal GPUDirect and confirm it works on the cluster?

As long as I make sure to send a buffer larger than the s->max_inline parameter above, GPUDirect RDMA works.

Are you talking about this line for inline check:

Yes. My thought is that if someone chooses the direct communication type, they are intentionally opting into RDMA, so we shouldn't hide this inline decision from them.

The workaround for GPUDirect RDMA is to (a) detect whether the buffer is in GPU memory and (b) if so, either skip the inline check or first copy the data to host memory.
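A sketch of that decision, assuming the caller has already classified the pointer (for example with the CUDA runtime's cudaPointerGetAttributes) and that the inline threshold is inclusive (len <= max_inline), consistent with buffers larger than max_inline working. `choose_send_path` and the `PATH_*` values are hypothetical, not part of LC.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical send paths for the workaround. */
enum send_path {
    PATH_INLINE,      /* CPU copies the payload into the WQE; host memory only */
    PATH_REGISTERED,  /* register the buffer (device memory via GDR) and post by lkey */
    PATH_STAGED       /* copy device data to a host bounce buffer, then send that copy */
};

/* on_device: result of the caller's pointer query (not modeled here).
 * prefer_staging: pick option "copy to host" instead of "skip the inline check". */
static enum send_path choose_send_path(size_t len, size_t max_inline,
                                       bool on_device, bool prefer_staging)
{
    if (len > max_inline)
        return PATH_REGISTERED;   /* too big to inline; GDR handles device memory */
    if (!on_device)
        return PATH_INLINE;       /* small host buffer: the existing fast path */
    /* Small device buffer: must never take the inline memcpy path. */
    return prefer_staging ? PATH_STAGED : PATH_REGISTERED;
}
```

Keeping the pointer query outside the function keeps the sketch free of CUDA headers. Since that query has its own cost, it is probably only worth doing when len <= max_inline, because larger sends are registered either way.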

danghvu · 2020-05-11T06:56:28Z

I get your point, though this is purely an implementation choice, since the interface does not (yet) say whether a registration should be performed. The buffer is small, so you don't need to register it before you send; you may, but it wastes cycles.

Maybe a better choice would be to require the user to register the buffer with the runtime first; then we just get the lkey from the user or from the registration table.

The fix can be as simple as adding a condition: if it is a GPU buffer, register it anyway.
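One way the registration-table idea could look, as a sketch: the user registers a region once, and the runtime finds the lkey for any address range inside it. The table type and functions below are hypothetical; in real code the lkey would be the `lkey` field of the `struct ibv_mr` returned by `ibv_reg_mr()`, which with GPUDirect RDMA set up can also cover device memory.

```c
#include <stddef.h>
#include <stdint.h>

#define REG_TABLE_CAP 16  /* hypothetical fixed capacity */

struct reg_entry {
    uintptr_t base;  /* start address of the registered region */
    size_t    len;   /* length of the region in bytes */
    uint32_t  lkey;  /* key the HCA needs to access the region */
};

struct reg_table {
    struct reg_entry e[REG_TABLE_CAP];
    size_t n;
};

/* Records a region the user registered up front; -1 if the table is full. */
static int reg_table_add(struct reg_table *t, const void *buf, size_t len,
                         uint32_t lkey)
{
    if (t->n == REG_TABLE_CAP)
        return -1;
    t->e[t->n++] = (struct reg_entry){ (uintptr_t)buf, len, lkey };
    return 0;
}

/* Returns 0 and writes *lkey if [buf, buf + len) lies entirely inside one
 * registered region; -1 otherwise (caller would register on demand). */
static int reg_table_lookup(const struct reg_table *t, const void *buf,
                            size_t len, uint32_t *lkey)
{
    uintptr_t p = (uintptr_t)buf;
    for (size_t i = 0; i < t->n; i++) {
        const struct reg_entry *r = &t->e[i];
        /* Overflow-safe containment check. */
        if (p >= r->base && len <= r->len && p - r->base <= r->len - len) {
            *lkey = r->lkey;
            return 0;
        }
    }
    return -1;
}
```

A linear scan is fine for a handful of regions; a runtime with many registrations would likely want a sorted structure or an interval tree instead.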
