Mapping an InfiniBand (libibverbs) Application's Memory Buffer into the RDMA Device's Address Space
Steps to Map Memory Buffer into RDMA Device's Address Space
- Memory Allocation:
- The application first allocates a memory buffer that it intends to use for RDMA operations. This buffer can be used for sending, receiving, or direct memory access operations.
- Protection Domain Creation:
- A protection domain is created to define a scope within which resources like memory regions, queue pairs, and completion queues are associated. It acts as a security boundary for RDMA operations.
- Memory Registration:
- The application registers the allocated memory buffer with the RDMA hardware. This is done using the Verbs API, which provides functions to interact with RDMA devices.
- The registration process involves creating a memory region (MR) object, specifying the buffer's virtual address, size, and access permissions (e.g., read, write, atomic).
- Pinning Memory:
- During registration, the kernel driver pins the buffer's pages so that they stay in physical memory and are neither swapped out nor moved by the operating system. This keeps the virtual-to-physical mapping stable, so the device does not hit a page fault in the middle of an RDMA operation.
- Assigning Memory Keys:
- The RDMA hardware assigns an lkey (local key) and an rkey (remote key) to the registered memory region. These keys are used to control access to the memory region during RDMA operations.
- The lkey is used by the local RDMA device to access the memory region, while the rkey is shared with remote devices that need access.
- Mapping into RDMA Device's Address Space:
- The registered memory region is mapped into the RDMA device's address space: the region's virtual-to-physical translations are loaded into the device's own address translation engine, which lets the hardware access the buffer directly by its virtual address.
- On each operation, the RDMA hardware uses the memory keys to find the region and verify access permissions before it reads or writes the memory.
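The key and permission check in the last step can be illustrated with a toy model in C. This is not driver or firmware code: `struct toy_mr`, the `TOY_ACCESS_*` flags, and `toy_check_remote()` are hypothetical names made up for this sketch, and a real HCA performs the equivalent checks in hardware against its own tables.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical access flags, loosely mirroring the permissions
 * chosen when a region is registered. */
enum {
    TOY_ACCESS_REMOTE_READ  = 1 << 0,
    TOY_ACCESS_REMOTE_WRITE = 1 << 1,
};

/* Toy stand-in for a registered memory region. */
struct toy_mr {
    uint64_t addr;    /* virtual address where the region starts */
    uint64_t length;  /* size of the region in bytes */
    uint32_t rkey;    /* key a remote peer must present */
    int      access;  /* bitmask of TOY_ACCESS_* flags */
};

/* Returns true if a remote request for [addr, addr + len) that presents
 * `key` and needs the `needed` permission bits would be accepted. */
bool toy_check_remote(const struct toy_mr *mr, uint32_t key,
                      uint64_t addr, uint64_t len, int needed)
{
    if (key != mr->rkey)
        return false;                     /* wrong key */
    if ((mr->access & needed) != needed)
        return false;                     /* permission was not granted */
    if (addr < mr->addr || len > mr->length)
        return false;                     /* starts too early or is too long */
    if (addr - mr->addr > mr->length - len)
        return false;                     /* runs past the end of the region */
    return true;
}
```

The bounds test is written with subtractions rather than `addr + len` so that a very large request cannot wrap around and slip through.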
Verbs API Functions
- ibv_reg_mr(): This function is used to register a memory region with the RDMA device. It takes the protection domain, buffer address, buffer length, and access permissions as parameters, and returns a pointer to a struct ibv_mr that describes the memory region and carries its lkey and rkey (or NULL on failure).
- ibv_dereg_mr(): This function is used to deregister a memory region, releasing the resources associated with it.
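Once registration has produced the keys, a remote peer needs the buffer's address, its length, and the rkey before it can target the region. How that information travels (for example over an ordinary socket) is up to the application. The sketch below shows one possible way to pack it into a fixed, byte-order-independent layout; `struct remote_buf_info`, the 20-byte layout, and all function names are assumptions made for illustration, not part of libibverbs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor a peer needs in order to target a region. */
struct remote_buf_info {
    uint64_t addr;    /* virtual address of the registered buffer */
    uint64_t length;  /* length of the buffer in bytes */
    uint32_t rkey;    /* remote key obtained at registration */
};

#define REMOTE_BUF_WIRE_SIZE 20  /* 8 + 8 + 4 bytes */

/* Write the low n bytes of v into out[0..n) in big-endian order. */
void put_be(uint8_t *out, uint64_t v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)(v >> (8 * (n - 1 - i)));
}

/* Read an n-byte big-endian integer from in[0..n). */
uint64_t get_be(const uint8_t *in, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | in[i];
    return v;
}

/* Serialize the descriptor into REMOTE_BUF_WIRE_SIZE bytes. */
void remote_buf_pack(const struct remote_buf_info *info, uint8_t *out)
{
    put_be(out, info->addr, 8);
    put_be(out + 8, info->length, 8);
    put_be(out + 16, info->rkey, 4);
}

/* Inverse of remote_buf_pack(). */
void remote_buf_unpack(const uint8_t *in, struct remote_buf_info *info)
{
    info->addr   = get_be(in, 8);
    info->length = get_be(in + 8, 8);
    info->rkey   = (uint32_t)get_be(in + 16, 4);
}
```

Packing field by field avoids sending the raw struct, whose padding and byte order may differ between the two hosts.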
Benefits of Memory Mapping
- Zero-Copy Data Transfer: By mapping memory directly into the RDMA device's address space, RDMA operations can be performed without intermediate buffering, reducing latency and CPU overhead.
- High Throughput and Low Latency: Direct memory access enables high-speed data transfers, making RDMA ideal for high-performance computing applications.
- Efficient Resource Utilization: Memory mapping minimizes context switches and data copying, ensuring efficient use of system resources.
Security Considerations
- Access Control: Proper access control mechanisms should be in place to ensure that only authorized entities can access the mapped memory regions.
- Secure Key Exchange: Ensure that rkey values are exchanged securely to prevent unauthorized access.
An InfiniBand software stack typically consists of several components:
- a vendor-specific kernel module (e.g. ib_mthca for Mellanox devices)
- a kernel module that allows verbs access from userspace (ib_uverbs)
- a user-space vendor driver library (e.g. libmthca)
- a glue component between the previous two (libibverbs)
InfiniBand generally supports two semantics: packet-based operation and remote DMA. In both modes, zero-copy is achieved by reading from and writing to the application buffer(s) directly. This is done by fixing the buffer in physical memory (also called registering), which prevents the virtual memory manager from swapping it out to disk or moving it around in physical RAM. A very nice feature of InfiniBand is that each HCA has its own virtual-to-physical address translation engine, which allows one to pass userspace pointers directly to the hardware.
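Because pinning works on whole pages, registering a buffer that is not page-aligned fixes every page the buffer touches. The sketch below computes that span. `struct page_span` and `pages_covering()` are hypothetical helper names, and the page size is passed in explicitly as an assumption (a real program would likely query it, for example with `sysconf(_SC_PAGESIZE)`).

```c
#include <stddef.h>
#include <stdint.h>

/* The whole pages covered by a buffer. */
struct page_span {
    uintptr_t first_page;  /* address of the first page the buffer touches */
    size_t    npages;      /* number of pages the buffer touches */
};

/* Compute which pages a buffer of `len` bytes at `addr` occupies.
 * Assumes page_size is a power of two and len is greater than zero. */
struct page_span pages_covering(uintptr_t addr, size_t len, size_t page_size)
{
    uintptr_t mask  = ~((uintptr_t)page_size - 1);
    uintptr_t first = addr & mask;              /* round start down */
    uintptr_t last  = (addr + len - 1) & mask;  /* page holding the last byte */
    struct page_span span;

    span.first_page = first;
    span.npages     = (size_t)((last - first) / page_size) + 1;
    return span;
}
```

For example, with 4 KiB pages a 32-byte buffer that starts 16 bytes before a page boundary touches two pages, so both would stay fixed in memory.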
The reason to have a user-level driver is that verbs exposes the HCA's hardware registers directly to userspace, and each HCA has a different set of registers, hence the need for an intermediate userspace layer. Of course, this could be implemented entirely in the kernel, with a single vendor-independent userspace library on top, but InfiniBand tries very hard to provide the lowest possible latency, and going through the kernel every time would be very expensive. Because RDMA devices can translate virtual addresses on their own, the userspace library does not have to go through the kernel to obtain the physical address of the buffer when creating entries in the work queues (part of the mechanism verbs uses to send and receive data).
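To make the point about virtual addresses concrete, here is a minimal sketch of how a work queue entry might describe a piece of a registered buffer. `struct toy_sge` is a hypothetical type modeled on the address, length, and local-key triple that a verbs work request carries, and `toy_sge_for()` is an invented helper. The sketch only shows that the userspace pointer is stored as-is, with no physical address lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scatter/gather element for a work queue entry. */
struct toy_sge {
    uint64_t addr;    /* userspace virtual address, not a physical one */
    uint32_t length;  /* number of bytes to transfer */
    uint32_t lkey;    /* local key of the region that contains addr */
};

/* Describe `len` bytes starting `offset` bytes into a registered buffer.
 * The caller is assumed to keep offset + len within the region. */
struct toy_sge toy_sge_for(const void *buf, size_t offset,
                           uint32_t len, uint32_t lkey)
{
    struct toy_sge sge;

    /* The pointer is stored unchanged; translation is left to the device. */
    sge.addr   = (uint64_t)(uintptr_t)((const char *)buf + offset);
    sge.length = len;
    sge.lkey   = lkey;
    return sge;
}
```

Since no system call is needed to fill in such an entry, building work requests can stay entirely in userspace.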
Note that there are basically two vendor libraries: one in the kernel and one in userspace. The former provides verbs functionality to other kernel modules like file systems (e.g. Lustre) or network protocol drivers (e.g. IP-over-InfiniBand), while the latter provides that functionality in userspace. Some operations cannot be done entirely in userspace, e.g. registering memory or opening/closing device contexts, and those are transparently passed to the kernel module by libibverbs.
Although technically RDMA over Converged Ethernet (RoCE, implemented in userspace as librxe) is not InfiniBand on the hardware level, the OpenFabrics stack is designed in such a way as to support RDMA-capable hardware other than InfiniBand HCAs, including RoCE and iWARP adapters.



