IBV_WR API(3)          Libibverbs Programmer’s Manual          IBV_WR API(3)

NAME
ibv_wr_abort, ibv_wr_complete, ibv_wr_start - Manage regions allowed to
post work

ibv_wr_atomic_cmp_swp, ibv_wr_atomic_fetch_add - Post remote atomic
operation work requests

ibv_wr_bind_mw, ibv_wr_local_inv - Post work requests for memory
windows

ibv_wr_rdma_read, ibv_wr_rdma_write, ibv_wr_rdma_write_imm,
ibv_wr_flush - Post RDMA work requests

ibv_wr_send, ibv_wr_send_imm, ibv_wr_send_inv - Post send work requests

ibv_wr_send_tso - Post segmentation offload work requests

ibv_wr_set_inline_data, ibv_wr_set_inline_data_list - Attach inline
data to the last work request

ibv_wr_set_sge, ibv_wr_set_sge_list - Attach data to the last work
request

ibv_wr_set_ud_addr - Attach UD addressing info to the last work request

ibv_wr_set_xrc_srqn - Attach an XRC SRQN to the last work request

SYNOPSIS
    #include <infiniband/verbs.h>

    void ibv_wr_abort(struct ibv_qp_ex *qp);
    int ibv_wr_complete(struct ibv_qp_ex *qp);
    void ibv_wr_start(struct ibv_qp_ex *qp);

    void ibv_wr_atomic_cmp_swp(struct ibv_qp_ex *qp, uint32_t rkey,
                               uint64_t remote_addr, uint64_t compare,
                               uint64_t swap);
    void ibv_wr_atomic_fetch_add(struct ibv_qp_ex *qp, uint32_t rkey,
                                 uint64_t remote_addr, uint64_t add);

    void ibv_wr_bind_mw(struct ibv_qp_ex *qp, struct ibv_mw *mw, uint32_t rkey,
                        const struct ibv_mw_bind_info *bind_info);
    void ibv_wr_local_inv(struct ibv_qp_ex *qp, uint32_t invalidate_rkey);

    void ibv_wr_rdma_read(struct ibv_qp_ex *qp, uint32_t rkey,
                          uint64_t remote_addr);
    void ibv_wr_rdma_write(struct ibv_qp_ex *qp, uint32_t rkey,
                           uint64_t remote_addr);
    void ibv_wr_rdma_write_imm(struct ibv_qp_ex *qp, uint32_t rkey,
                               uint64_t remote_addr, __be32 imm_data);

    void ibv_wr_send(struct ibv_qp_ex *qp);
    void ibv_wr_send_imm(struct ibv_qp_ex *qp, __be32 imm_data);
    void ibv_wr_send_inv(struct ibv_qp_ex *qp, uint32_t invalidate_rkey);
    void ibv_wr_send_tso(struct ibv_qp_ex *qp, void *hdr, uint16_t hdr_sz,
                         uint16_t mss);

    void ibv_wr_set_inline_data(struct ibv_qp_ex *qp, void *addr, size_t length);
    void ibv_wr_set_inline_data_list(struct ibv_qp_ex *qp, size_t num_buf,
                                     const struct ibv_data_buf *buf_list);
    void ibv_wr_set_sge(struct ibv_qp_ex *qp, uint32_t lkey, uint64_t addr,
                        uint32_t length);
    void ibv_wr_set_sge_list(struct ibv_qp_ex *qp, size_t num_sge,
                             const struct ibv_sge *sg_list);

    void ibv_wr_set_ud_addr(struct ibv_qp_ex *qp, struct ibv_ah *ah,
                            uint32_t remote_qpn, uint32_t remote_qkey);
    void ibv_wr_set_xrc_srqn(struct ibv_qp_ex *qp, uint32_t remote_srqn);
    void ibv_wr_flush(struct ibv_qp_ex *qp, uint32_t rkey, uint64_t remote_addr,
                      size_t len, uint8_t type, uint8_t level);

DESCRIPTION
The verbs work request API (ibv_wr_*) allows efficient posting of work
to a send queue using function calls instead of the struct based
ibv_post_send() scheme. This approach is designed to minimize CPU
branching and locking during the posting process.

This API is intended to be used to access additional functionality
beyond what is provided by ibv_post_send().

Batches of work requests posted with ibv_post_send() and batches posted
with this API may be interleaved on the same QP, provided that neither
is posted within the critical region of the other. (A critical region
in this API is formed by ibv_wr_start() and
ibv_wr_complete()/ibv_wr_abort().)

USAGE
To use these APIs the QP must be created using ibv_create_qp_ex() which
allows setting the IBV_QP_INIT_ATTR_SEND_OPS_FLAGS in comp_mask. The
send_ops_flags should be set to the OR of the work request types that
will be posted to the QP.

If the QP does not support all the requested work request types then QP
creation will fail.

Posting work requests to the QP is done within the critical region
formed by ibv_wr_start() and ibv_wr_complete()/ibv_wr_abort() (see
CONCURRENCY below).

Each work request is created by calling a WR builder function (see the
table column WR builder below) to start creating the work request,
followed by allowed/required setter functions described below.

The WR builder and setter combination can be called multiple times to
efficiently post multiple work requests within a single critical
region.

Each WR builder will use the wr_id member of struct ibv_qp_ex to set
the value to be returned in the completion. Some operations will also
use the wr_flags member to influence operation (see Flags below).
These values should be set before invoking the WR builder function.

For example, a simple send could be formed as follows:

    qpx->wr_id = 1;
    ibv_wr_send(qpx);
    /* addr is a uint64_t, so the pointer is converted explicitly */
    ibv_wr_set_sge(qpx, lkey, (uintptr_t)&data, sizeof(data));

The section WORK REQUESTS describes the various WR builders and setters
in detail.

Posting work is completed by calling ibv_wr_complete() or
ibv_wr_abort(). No work is submitted to the queue until
ibv_wr_complete() returns success. ibv_wr_abort() will discard all work
prepared since ibv_wr_start().

WORK REQUESTS
Many of the operations match the opcodes available for ibv_post_send().
Each operation has a WR builder function, a list of allowed setters,
and a flag bit to request the operation with send_ops_flags in struct
ibv_qp_init_attr_ex (see the EXAMPLE below).

Operation             WR builder                 Supported QP types                Setters
───────────────────────────────────────────────────────────────────────────────────────────
ATOMIC_CMP_AND_SWP    ibv_wr_atomic_cmp_swp()    RC, XRC_SEND                      DATA, QP
ATOMIC_FETCH_AND_ADD  ibv_wr_atomic_fetch_add()  RC, XRC_SEND                      DATA, QP
BIND_MW               ibv_wr_bind_mw()           UC, RC, XRC_SEND                  NONE
LOCAL_INV             ibv_wr_local_inv()         UC, RC, XRC_SEND                  NONE
RDMA_READ             ibv_wr_rdma_read()         RC, XRC_SEND                      DATA, QP
RDMA_WRITE            ibv_wr_rdma_write()        UC, RC, XRC_SEND                  DATA, QP
FLUSH                 ibv_wr_flush()             RC, RD, XRC_SEND                  DATA, QP
RDMA_WRITE_WITH_IMM   ibv_wr_rdma_write_imm()    UC, RC, XRC_SEND                  DATA, QP
SEND                  ibv_wr_send()              UD, UC, RC, XRC_SEND, RAW_PACKET  DATA, QP
SEND_WITH_IMM         ibv_wr_send_imm()          UD, UC, RC, XRC_SEND              DATA, QP
SEND_WITH_INV         ibv_wr_send_inv()          UC, RC, XRC_SEND                  DATA, QP
TSO                   ibv_wr_send_tso()          UD, RAW_PACKET                    DATA, QP

Atomic operations
Atomic operations are only atomic so long as all writes to memory go
only through the same RDMA hardware. They are not atomic with respect
to writes performed by the CPU, or by other RDMA hardware in the
system.

ibv_wr_atomic_cmp_swp()
    If the remote 64 bit memory location specified by rkey and
    remote_addr equals compare then set it to swap.

ibv_wr_atomic_fetch_add()
    Add add to the 64 bit memory location specified by rkey and
    remote_addr.

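The semantics of the two atomic operations can be modelled locally in
plain C. This is only a sketch: the helper names model_cmp_swp() and
model_fetch_add() are hypothetical and are not part of libibverbs, and
the local 64 bit variable stands in for the remote memory location. In
both cases the value held by the location before the operation is what
the responder hands back to the requester, which is why these work
requests take a DATA setter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local model of ATOMIC_CMP_AND_SWP: *loc stands in for the
 * remote 64 bit location named by rkey and remote_addr. */
static uint64_t model_cmp_swp(uint64_t *loc, uint64_t compare, uint64_t swap)
{
    uint64_t original;

    assert(loc != NULL);
    original = *loc;
    if (original == compare)
        *loc = swap;
    return original;            /* value before the operation */
}

/* Hypothetical local model of ATOMIC_FETCH_AND_ADD. */
static uint64_t model_fetch_add(uint64_t *loc, uint64_t add)
{
    uint64_t original;

    assert(loc != NULL);
    original = *loc;
    *loc = original + add;      /* unsigned arithmetic wraps modulo 2^64 */
    return original;            /* value before the operation */
}
```

A caller can tell whether a compare and swap took effect by comparing
the returned value with the compare argument it supplied.
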
Memory Windows
Memory window type 2 operations (see the man page for ibv_alloc_mw).

ibv_wr_bind_mw()
    Bind a MW type 2 specified by mw, set a new rkey and set its
    properties by bind_info.

ibv_wr_local_inv()
    Invalidate a MW type 2 which is associated with rkey.

RDMA
ibv_wr_rdma_read()
    Read from the remote memory location specified by rkey and
    remote_addr. The number of bytes to read, and the local location
    to store the data, is determined by the DATA buffers set after
    this call.

ibv_wr_rdma_write(), ibv_wr_rdma_write_imm()
    Write to the remote memory location specified by rkey and
    remote_addr. The number of bytes to write, and the local location
    to get the data, is determined by the DATA buffers set after
    this call.

    The _imm version causes the remote side to get an
    IBV_WC_RECV_RDMA_WITH_IMM containing the 32 bits of immediate data.

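The imm_data argument is declared as __be32 in the SYNOPSIS, that is, a
32 bit value in big endian byte order. A minimal sketch of the
conversion on each side follows. The helper names imm_encode(),
imm_decode() and imm_selfcheck() are hypothetical, and plain uint32_t is
used in place of __be32 so that the block builds without the verbs
headers.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Sender side: convert a host order value for the imm_data argument. */
static uint32_t imm_encode(uint32_t host_value)
{
    return htonl(host_value);
}

/* Receiver side: convert the received immediate data to host order. */
static uint32_t imm_decode(uint32_t wire_value)
{
    return ntohl(wire_value);
}

/* Self-check: a value must survive the encode/decode round trip. */
static void imm_selfcheck(uint32_t host_value)
{
    assert(imm_decode(imm_encode(host_value)) == host_value);
}
```

With these helpers a sender would pass imm_encode(0x1234) as imm_data,
and the receiver would apply imm_decode() to the immediate data found in
its completion.
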
Message Send
ibv_wr_send(), ibv_wr_send_imm()
    Send a message. The number of bytes to send, and the local
    location to get the data, is determined by the DATA buffers set
    after this call.

    The _imm version causes the remote side to get an IBV_WC_RECV
    completion with the IBV_WC_WITH_IMM flag set, containing the 32
    bits of immediate data.

ibv_wr_send_inv()
    The data transfer is the same as for ibv_wr_send(), however the
    remote side will invalidate the MR specified by invalidate_rkey
    before delivering a completion.

ibv_wr_send_tso()
    Produce multiple SEND messages using TCP Segmentation Offload.
    The SGE points to a TCP stream buffer which will be segmented
    into MSS size SENDs. The hdr includes the entire network headers
    up to and including the TCP header and is prepended to each
    segment.

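As a rough illustration of the segmentation arithmetic, the sketch below
computes how many SENDs a TSO work request would produce and the
resulting length of each one. The helpers tso_num_segments() and
tso_segment_len() are hypothetical names used only for this example.
They assume that the last segment carries whatever payload remains and
that every segment is prefixed with the full hdr_sz bytes of headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of MSS sized segments needed to carry payload_len bytes. */
static size_t tso_num_segments(size_t payload_len, uint16_t mss)
{
    assert(mss > 0);
    return (payload_len + mss - 1) / mss;
}

/* Total length (headers plus payload) of segment idx, counted from zero. */
static size_t tso_segment_len(size_t payload_len, uint16_t hdr_sz,
                              uint16_t mss, size_t idx)
{
    size_t offset = idx * (size_t)mss;
    size_t remaining;

    assert(mss > 0 && offset < payload_len);
    remaining = payload_len - offset;
    return hdr_sz + (remaining < mss ? remaining : mss);
}
```

For a 4000 byte stream buffer, 54 bytes of headers and an MSS of 1460,
this model yields three segments: two full ones and a shorter final one.
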
QP Specific setters
Certain QP types require each post to be accompanied by additional
setters. These setters are mandatory for any operation listing a QP
setter in the above table.

UD QPs
    ibv_wr_set_ud_addr() must be called to set the destination address
    of the work.

XRC_SEND QPs
    ibv_wr_set_xrc_srqn() must be called to set the destination SRQN
    field.

DATA transfer setters
For work that transfers data, one of the following setters should be
called once after the WR builder:

ibv_wr_set_sge()
    Transfer data to/from a single buffer given by the lkey, addr
    and length. This is equivalent to ibv_wr_set_sge_list() with a
    single element.

ibv_wr_set_sge_list()
    Transfer data to/from a list of buffers, logically concatenated
    together. Each buffer is specified by an element in an array of
    struct ibv_sge.

Inline setters copy the send data during the setter call, which allows
the caller to re-use the buffer immediately. This behavior is identical
to the IBV_SEND_INLINE flag. Generally this copy is done in a way that
optimizes SEND latency and is suitable for small messages. The
provider will limit the amount of data it can support in a single
operation. This limit is requested in the max_inline_data member of
struct ibv_qp_init_attr. Inline data is valid only for SEND and
RDMA_WRITE.

ibv_wr_set_inline_data()
    Copy send data from a single buffer given by the addr and
    length. This is equivalent to ibv_wr_set_inline_data_list()
    with a single element.

ibv_wr_set_inline_data_list()
    Copy send data from a list of buffers, logically concatenated
    together. Each buffer is specified by an element in an array of
    struct ibv_data_buf.

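The copy semantics of the inline setters can be pictured with a small
stand-alone model. The names below (struct model_data_buf and
model_inline_copy()) are hypothetical and only mirror the addr/length
shape of struct ibv_data_buf; the real copy is performed by the
provider. The sketch gathers a list of buffers into one contiguous
staging area and refuses a request larger than an assumed inline limit.
Returning 0 for an oversized request is a choice made for this model
only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with the same shape as struct ibv_data_buf. */
struct model_data_buf {
    void *addr;
    size_t length;
};

/*
 * Copy num_buf buffers, in order, into staging. Returns the number of
 * bytes copied, or 0 if the total exceeds max_inline (nothing is copied).
 */
static size_t model_inline_copy(void *staging, size_t max_inline,
                                size_t num_buf,
                                const struct model_data_buf *buf_list)
{
    size_t total = 0;
    size_t off = 0;
    size_t i;

    assert(staging != NULL && (num_buf == 0 || buf_list != NULL));

    for (i = 0; i < num_buf; i++)
        total += buf_list[i].length;
    if (total > max_inline)
        return 0;

    for (i = 0; i < num_buf; i++) {
        memcpy((char *)staging + off, buf_list[i].addr, buf_list[i].length);
        off += buf_list[i].length;
    }
    return total;
}
```

Because the bytes are copied before the function returns, the caller is
free to overwrite its source buffers straight away, which is the
property the inline setters provide.
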
Flags
A bit mask of flags may be specified in wr_flags to control the
behavior of the work request.

IBV_SEND_FENCE
    Do not start this work request until prior work has completed.

IBV_SEND_IP_CSUM
    Offload the IPv4 and TCP/UDP checksum calculation.

IBV_SEND_SIGNALED
    A completion will be generated in the completion queue for the
    operation.

IBV_SEND_SOLICITED
    Set the solicited bit in the RDMA packet. This informs the other
    side to generate a completion event upon receiving the RDMA
    operation.

CONCURRENCY
The provider will provide locking to ensure that ibv_wr_start() and
ibv_wr_complete()/ibv_wr_abort() form a per-QP critical section where
no other threads can enter.

If an ibv_td is provided during QP creation then no locking will be
performed and it is up to the caller to ensure that only one thread can
be within the critical region at a time.

ERROR HANDLING
Applications should use this API in a way that does not create
failures. The individual APIs do not return a failure indication to
avoid branching.

If a failure is detected during operation, for instance due to an
invalid argument, then ibv_wr_complete() will return failure and the
entire posting will be aborted.

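The deferred error reporting described above can be sketched with a toy
model. Everything here (struct toy_qp, TOY_MAX_WR and the toy_*
functions) is hypothetical and does not reflect how any provider is
implemented. It only shows the pattern in which a builder records a
sticky error without returning it, and the caller checks once when the
critical region is closed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_WR 4    /* assumed queue depth for this model */

/* Hypothetical send queue model: pending work plus a sticky error code. */
struct toy_qp {
    size_t posted;      /* work requests already handed to the queue */
    size_t pending;     /* work requests built since toy_wr_start() */
    int err;            /* first error seen inside the critical region */
};

static void toy_wr_start(struct toy_qp *qp)
{
    assert(qp != NULL);
    qp->pending = 0;
    qp->err = 0;
}

/* Builder: never reports failure directly, it only latches the error. */
static void toy_wr_send(struct toy_qp *qp)
{
    if (qp->posted + qp->pending >= TOY_MAX_WR) {
        if (!qp->err)
            qp->err = ENOMEM;
        return;
    }
    qp->pending++;
}

/* Publish all pending work, or discard all of it if any step failed. */
static int toy_wr_complete(struct toy_qp *qp)
{
    int err = qp->err;

    if (!err)
        qp->posted += qp->pending;
    qp->pending = 0;
    qp->err = 0;
    return err;
}

/* Discard everything built since toy_wr_start(). */
static void toy_wr_abort(struct toy_qp *qp)
{
    qp->pending = 0;
    qp->err = 0;
}
```

In this model a batch either becomes visible as a whole or not at all,
so the caller needs a single check of the value returned when the batch
is completed.
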
EXAMPLE
    /* create RC QP type and specify the required send opcodes */
    qp_init_attr_ex.qp_type = IBV_QPT_RC;
    qp_init_attr_ex.comp_mask |= IBV_QP_INIT_ATTR_SEND_OPS_FLAGS;
    qp_init_attr_ex.send_ops_flags |= IBV_QP_EX_WITH_RDMA_WRITE;
    qp_init_attr_ex.send_ops_flags |= IBV_QP_EX_WITH_RDMA_WRITE_WITH_IMM;

    struct ibv_qp *qp = ibv_create_qp_ex(ctx, &qp_init_attr_ex);
    struct ibv_qp_ex *qpx = ibv_qp_to_qp_ex(qp);

    ibv_wr_start(qpx);

    /* create 1st WRITE WR entry */
    qpx->wr_id = my_wr_id_1;
    ibv_wr_rdma_write(qpx, rkey, remote_addr_1);
    ibv_wr_set_sge(qpx, lkey, local_addr_1, length_1);

    /* create 2nd WRITE_WITH_IMM WR entry */
    qpx->wr_id = my_wr_id_2;
    qpx->wr_flags = IBV_SEND_SIGNALED;
    ibv_wr_rdma_write_imm(qpx, rkey, remote_addr_2, htonl(0x1234));
    ibv_wr_set_sge(qpx, lkey, local_addr_2, length_2);

    /* Begin processing WRs; on failure the entire posting is aborted */
    ret = ibv_wr_complete(qpx);

SEE ALSO
ibv_post_send(3), ibv_create_qp_ex(3).

AUTHOR
Jason Gunthorpe <jgg@mellanox.com> Guy Levi <guyle@mellanox.com>

libibverbs                       2018-11-27                    IBV_WR API(3)