BPF-HELPERS(7)          Linux Programmer's Manual          BPF-HELPERS(7)

BPF-HELPERS - list of eBPF helper functions
7
The extended Berkeley Packet Filter (eBPF) subsystem consists of
programs written in a pseudo-assembly language, then attached to one of
several kernel hooks and run in reaction to specific events. This
framework differs from the older, "classic" BPF (or "cBPF") in several
aspects, one of them being the ability to call special functions (or
"helpers") from within a program. These functions are restricted to a
whitelist of helpers defined in the kernel.
16
These helpers are used by eBPF programs to interact with the system, or
with the context in which they work. For instance, they can be used to
print debugging messages, to get the time since the system was booted,
to interact with eBPF maps, or to manipulate network packets. Because
there are several eBPF program types, which do not all run in the same
context, each program type can only call a subset of those helpers.
24
Due to eBPF conventions, a helper cannot have more than five arguments.
27
Internally, eBPF programs call directly into the compiled helper
functions without requiring any foreign-function interface. As a
result, calling a helper adds no overhead beyond that of a regular
function call, which keeps performance high.
32
This document is an attempt to list and document the helpers available
to eBPF developers. They are sorted in chronological order (the oldest
helpers in the kernel at the top).
36
38 void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
39
40 Description
Perform a lookup in map for an entry associated with key.

Return Map value associated with key, or NULL if no entry was
found.
45
46 int bpf_map_update_elem(struct bpf_map *map, const void *key, const
47 void *value, u64 flags)
48
49 Description
50 Add or update the value of the entry associated to key in
51 map with value. flags is one of:
52
53 BPF_NOEXIST
54 The entry for key must not exist in the map.
55
56 BPF_EXIST
57 The entry for key must already exist in the map.
58
59 BPF_ANY
60 No condition on the existence of the entry for
61 key.
62
Flag value BPF_NOEXIST cannot be used for maps of types
BPF_MAP_TYPE_ARRAY or BPF_MAP_TYPE_PERCPU_ARRAY (all elements always
exist); the helper would return an error.
66
67 Return 0 on success, or a negative error in case of failure.
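
The three flag values act as preconditions on the entry. The following
user-space sketch models that behaviour with a toy fixed-size table; it
is not kernel code. The flag values mirror those in the kernel UAPI
header (BPF_ANY is 0, BPF_NOEXIST is 1, BPF_EXIST is 2). The names
toy_map, toy_find() and toy_update() are hypothetical, and the -EEXIST
and -ENOENT codes are assumed here because hash maps report those
errors.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined in the kernel UAPI header <linux/bpf.h>. */
enum { BPF_ANY = 0, BPF_NOEXIST = 1, BPF_EXIST = 2 };

#define TOY_SLOTS 4

/* Hypothetical stand-in for a map: a handful of key/value slots. */
struct toy_map {
    bool     used[TOY_SLOTS];
    uint32_t key[TOY_SLOTS];
    uint64_t value[TOY_SLOTS];
};

/* Return the slot holding key, or -1 if the key is absent. */
static int toy_find(const struct toy_map *m, uint32_t key)
{
    for (int i = 0; i < TOY_SLOTS; i++)
        if (m->used[i] && m->key[i] == key)
            return i;
    return -1;
}

/* Model of the flag checks: 0 on success, negative errno on failure. */
static int toy_update(struct toy_map *m, uint32_t key, uint64_t value,
                      uint64_t flags)
{
    int slot = toy_find(m, key);

    if (flags > BPF_EXIST)
        return -EINVAL;                 /* unknown flag value */
    if (slot >= 0 && flags == BPF_NOEXIST)
        return -EEXIST;                 /* entry must not exist */
    if (slot < 0 && flags == BPF_EXIST)
        return -ENOENT;                 /* entry must already exist */

    if (slot < 0) {
        /* Insert: take the first free slot. */
        for (slot = 0; slot < TOY_SLOTS && m->used[slot]; slot++)
            ;
        if (slot == TOY_SLOTS)
            return -E2BIG;              /* toy table is full */
        m->used[slot] = true;
        m->key[slot] = key;
    }
    m->value[slot] = value;
    return 0;
}
```

With an empty table, an update with BPF_EXIST fails, the same update
with BPF_NOEXIST succeeds, and repeating it with BPF_NOEXIST then fails
because the entry is now present.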
68
69 int bpf_map_delete_elem(struct bpf_map *map, const void *key)
70
71 Description
72 Delete entry with key from map.
73
74 Return 0 on success, or a negative error in case of failure.
75
76 int bpf_map_push_elem(struct bpf_map *map, const void *value, u64
77 flags)
78
79 Description
80 Push an element value in map. flags is one of:
81
82 BPF_EXIST If the queue/stack is full, the oldest element
83 is removed to make room for this.
84
85 Return 0 on success, or a negative error in case of failure.
86
87 int bpf_probe_read(void *dst, u32 size, const void *src)
88
89 Description
90 For tracing programs, safely attempt to read size bytes
91 from address src and store the data in dst.
92
93 Return 0 on success, or a negative error in case of failure.
94
95 u64 bpf_ktime_get_ns(void)
96
97 Description
98 Return the time elapsed since system boot, in nanosec‐
99 onds.
100
101 Return Current ktime.
102
103 int bpf_trace_printk(const char *fmt, u32 fmt_size, ...)
104
105 Description
This helper is a "printk()-like" facility for debugging. It prints a
message defined by format fmt (of size fmt_size) to file
/sys/kernel/debug/tracing/trace from DebugFS, if available. It can take
up to three additional u64 arguments (as with any eBPF helper, the
total number of arguments is limited to five).
112
113 Each time the helper is called, it appends a line to the
114 trace. The format of the trace is customizable, and the
115 exact output one will get depends on the options set in
116 /sys/kernel/debug/tracing/trace_options (see also the
117 README file under the same directory). However, it usu‐
118 ally defaults to something like:
119
120 telnet-470 [001] .N.. 419421.045894: 0x00000001: <formatted msg>
121
122 In the above:
123
124 · telnet is the name of the current task.
125
126 · 470 is the PID of the current task.
127
128 · 001 is the CPU number on which the task is running.
129
130 · In .N.., each character refers to a set of options
131 (whether irqs are enabled, scheduling options,
132 whether hard/softirqs are running, level of pre‐
133 empt_disabled respectively). N means that
134 TIF_NEED_RESCHED and PREEMPT_NEED_RESCHED are set.
135
136 · 419421.045894 is a timestamp.
137
138 · 0x00000001 is a fake value used by BPF for the
139 instruction pointer register.
140
141 · <formatted msg> is the message formatted with fmt.
142
The conversion specifiers supported by fmt are similar to, but more
limited than, those of printk(). They are %d, %i, %u, %x, %ld, %li,
%lu, %lx, %lld, %lli, %llu, %llx, %p, %s. No modifier (size of field,
padding with zeroes, etc.) is available, and the helper will return
-EINVAL (but print nothing) if it encounters an unknown specifier.
149
Also, note that bpf_trace_printk() is slow, and should only be used for
debugging purposes. For this reason, a notice block (spanning several
lines) is printed to kernel logs the first time this helper is used
(or more precisely, when trace_printk() buffers are allocated), stating
that the helper should not be used "for production use". For passing
values to user space, perf events should be preferred.
158
159 Return The number of bytes written to the buffer, or a negative
160 error in case of failure.
161
162 u32 bpf_get_prandom_u32(void)
163
164 Description
165 Get a pseudo-random number.
166
167 From a security point of view, this helper uses its own
168 pseudo-random internal state, and cannot be used to infer
169 the seed of other random functions in the kernel. How‐
170 ever, it is essential to note that the generator used by
171 the helper is not cryptographically secure.
172
173 Return A random 32-bit unsigned value.
174
175 u32 bpf_get_smp_processor_id(void)
176
177 Description
Get the SMP (symmetric multiprocessing) processor id. Note that all
programs run with preemption disabled, which means that the SMP
processor id is stable throughout the execution of the program.
182
183 Return The SMP id of the processor running the program.
184
185 int bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void
186 *from, u32 len, u64 flags)
187
188 Description
189 Store len bytes from address from into the packet associ‐
190 ated to skb, at offset. flags are a combination of
191 BPF_F_RECOMPUTE_CSUM (automatically recompute the check‐
192 sum for the packet after storing the bytes) and
193 BPF_F_INVALIDATE_HASH (set skb->hash, skb->swhash and
194 skb->l4hash to 0).
195
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by the
verifier are invalidated and must be performed again if the helper is
used in combination with direct packet access.
201
202 Return 0 on success, or a negative error in case of failure.
203
204 int bpf_l3_csum_replace(struct sk_buff *skb, u32 offset, u64 from, u64
205 to, u64 size)
206
207 Description
208 Recompute the layer 3 (e.g. IP) checksum for the packet
209 associated to skb. Computation is incremental, so the
210 helper must know the former value of the header field
211 that was modified (from), the new value of this field
212 (to), and the number of bytes (2 or 4) for this field,
213 stored in size. Alternatively, it is possible to store
214 the difference between the previous and the new values of
215 the header field in to, by setting from and size to 0.
216 For both methods, offset indicates the location of the IP
217 checksum within the packet.
218
219 This helper works in combination with bpf_csum_diff(),
220 which does not update the checksum in-place, but offers
221 more flexibility and can handle sizes larger than 2 or 4
222 for the checksum to update.
223
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by the
verifier are invalidated and must be performed again if the helper is
used in combination with direct packet access.
229
230 Return 0 on success, or a negative error in case of failure.
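
The incremental computation can be illustrated in user space. The
sketch below is not the kernel implementation; it applies the
well-known incremental update rule from RFC 1624 (new = ~(~old + ~from
+ to), in ones' complement arithmetic) to a 2-byte field. The names
fold16(), csum_full() and csum_update16() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full Internet checksum over n 16-bit words (checksum field zeroed). */
static uint16_t csum_full(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += words[i];
    return (uint16_t)~fold16(sum);
}

/*
 * Incremental update for one 16-bit field changing from `from` to `to`:
 * new = ~(~old + ~from + to), following RFC 1624.
 */
static uint16_t csum_update16(uint16_t old_csum, uint16_t from, uint16_t to)
{
    uint32_t sum = (uint16_t)~old_csum;

    sum += (uint16_t)~from;   /* remove the old field value */
    sum += to;                /* add the new field value */
    return (uint16_t)~fold16(sum);
}
```

Updating the checksum this way gives the same result as recomputing it
over the whole header, but only needs the old checksum and the two
field values, which is what the from, to and size arguments convey.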
231
232 int bpf_l4_csum_replace(struct sk_buff *skb, u32 offset, u64 from, u64
233 to, u64 flags)
234
235 Description
Recompute the layer 4 (e.g. TCP, UDP or ICMP) checksum for the packet
associated with skb. Computation is incremental, so the helper must
know the former value of the header field that was modified (from), the
new value of this field (to), and the number of bytes (2 or 4) for this
field, stored in the lowest four bits of flags. Alternatively, it is
possible to store the difference between the previous and the new
values of the header field in to, by setting from and the four lowest
bits of flags to 0. For both methods, offset indicates the location of
the layer 4 checksum within the packet. In addition to the size of the
field, actual flags can be added to flags with a bitwise OR. With
BPF_F_MARK_MANGLED_0, a null checksum is left untouched (unless
BPF_F_MARK_ENFORCE is added as well), and for updates resulting in a
null checksum the value is set to CSUM_MANGLED_0 instead. Flag
BPF_F_PSEUDO_HDR indicates the checksum is to be computed against a
pseudo-header.
254
255 This helper works in combination with bpf_csum_diff(),
256 which does not update the checksum in-place, but offers
257 more flexibility and can handle sizes larger than 2 or 4
258 for the checksum to update.
259
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by the
verifier are invalidated and must be performed again if the helper is
used in combination with direct packet access.
265
266 Return 0 on success, or a negative error in case of failure.
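
Because flags carries both the field size and the actual flag bits, it
can help to see how the two are combined. The sketch below is plain
user-space C; the macro values are those of the kernel UAPI header,
while l4_csum_flags() and l4_csum_field_size() are hypothetical helper
names used only for illustration.

```c
#include <stdint.h>

/* Values as defined in the kernel UAPI header <linux/bpf.h>. */
#define BPF_F_HDR_FIELD_MASK   0xfULL
#define BPF_F_PSEUDO_HDR       (1ULL << 4)
#define BPF_F_MARK_MANGLED_0   (1ULL << 5)
#define BPF_F_MARK_ENFORCE     (1ULL << 6)

/* Combine the field size (0, 2 or 4) with the flag bits. */
static uint64_t l4_csum_flags(uint64_t field_size, uint64_t extra)
{
    return (field_size & BPF_F_HDR_FIELD_MASK) |
           (extra & ~BPF_F_HDR_FIELD_MASK);
}

/* Recover the field size from a flags word. */
static uint64_t l4_csum_field_size(uint64_t flags)
{
    return flags & BPF_F_HDR_FIELD_MASK;
}
```

For instance, rewriting a 4-byte IPv4 address that is covered by the
TCP pseudo-header would typically use a flags value of
BPF_F_PSEUDO_HDR | 4.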
267
268 int bpf_tail_call(void *ctx, struct bpf_map *prog_array_map, u32 index)
269
270 Description
271 This special helper is used to trigger a "tail call", or
272 in other words, to jump into another eBPF program. The
273 same stack frame is used (but values on stack and in reg‐
274 isters for the caller are not accessible to the callee).
275 This mechanism allows for program chaining, either for
276 raising the maximum number of available eBPF instruc‐
277 tions, or to execute given programs in conditional
278 blocks. For security reasons, there is an upper limit to
279 the number of successive tail calls that can be per‐
280 formed.
281
282 Upon call of this helper, the program attempts to jump
283 into a program referenced at index index in
284 prog_array_map, a special map of type
285 BPF_MAP_TYPE_PROG_ARRAY, and passes ctx, a pointer to the
286 context.
287
If the call succeeds, the kernel immediately runs the first instruction
of the new program. This is not a function call, and it never returns
to the previous program. If the call fails, then the helper has no
effect, and the caller continues to run its subsequent instructions. A
call can fail if the destination program for the jump does not exist
(i.e. index is greater than or equal to the number of entries in
prog_array_map), or if the maximum number of tail calls has been
reached for this chain of programs. This limit is defined in the kernel
by the macro MAX_TAIL_CALL_CNT (not accessible to user space), which is
currently set to 32.
300
301 Return 0 on success, or a negative error in case of failure.
302
303 int bpf_clone_redirect(struct sk_buff *skb, u32 ifindex, u64 flags)
304
305 Description
306 Clone and redirect the packet associated to skb to
307 another net device of index ifindex. Both ingress and
308 egress interfaces can be used for redirection. The
309 BPF_F_INGRESS value in flags is used to make the distinc‐
310 tion (ingress path is selected if the flag is present,
311 egress path otherwise). This is the only flag supported
312 for now.
313
In comparison with the bpf_redirect() helper, bpf_clone_redirect() has
the associated cost of duplicating the packet buffer, but the
redirection is carried out from within the eBPF program. Conversely,
bpf_redirect() is more efficient, but it is handled through an action
code where the redirection happens only after the eBPF program has
returned.
320
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by the
verifier are invalidated and must be performed again if the helper is
used in combination with direct packet access.
326
327 Return 0 on success, or a negative error in case of failure.
328
329 u64 bpf_get_current_pid_tgid(void)
330
331 Return A 64-bit integer containing the current tgid and pid, and
332 created as such: current_task->tgid << 32 | cur‐
333 rent_task->pid.
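
The two halves of the returned value can be separated with a shift and
a truncation. The sketch below is plain user-space C that follows the
layout given above; tgid_of(), pid_of() and pack_pid_tgid() are
hypothetical names.

```c
#include <stdint.h>

/* Upper 32 bits: thread group id (the process id seen from user space). */
static uint32_t tgid_of(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid >> 32);
}

/* Lower 32 bits: kernel pid (the thread id seen from user space). */
static uint32_t pid_of(uint64_t pid_tgid)
{
    return (uint32_t)pid_tgid;
}

/* Rebuild the packed value using the layout described above. */
static uint64_t pack_pid_tgid(uint32_t tgid, uint32_t pid)
{
    return ((uint64_t)tgid << 32) | pid;
}
```

In an eBPF program, the same shift and truncation would be applied
directly to the value returned by bpf_get_current_pid_tgid().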
334
335 u64 bpf_get_current_uid_gid(void)
336
337 Return A 64-bit integer containing the current GID and UID, and
338 created as such: current_gid << 32 | current_uid.
339
340 int bpf_get_current_comm(char *buf, u32 size_of_buf)
341
342 Description
343 Copy the comm attribute of the current task into buf of
344 size_of_buf. The comm attribute contains the name of the
345 executable (excluding the path) for the current task. The
346 size_of_buf must be strictly positive. On success, the
347 helper makes sure that the buf is NUL-terminated. On
348 failure, it is filled with zeroes.
349
350 Return 0 on success, or a negative error in case of failure.
351
352 u32 bpf_get_cgroup_classid(struct sk_buff *skb)
353
354 Description
355 Retrieve the classid for the current task, i.e. for the
356 net_cls cgroup to which skb belongs.
357
358 This helper can be used on TC egress path, but not on
359 ingress.
360
361 The net_cls cgroup provides an interface to tag network
362 packets based on a user-provided identifier for all traf‐
363 fic coming from the tasks belonging to the related
364 cgroup. See also the related kernel documentation, avail‐
365 able from the Linux sources in file Documenta‐
366 tion/cgroup-v1/net_cls.txt.
367
368 The Linux kernel has two versions for cgroups: there are
369 cgroups v1 and cgroups v2. Both are available to users,
370 who can use a mixture of them, but note that the net_cls
371 cgroup is for cgroup v1 only. This makes it incompatible
372 with BPF programs run on cgroups, which is a
373 cgroup-v2-only feature (a socket can only hold data for
374 one version of cgroups at a time).
375
This helper is only available if the kernel was compiled with the
CONFIG_CGROUP_NET_CLASSID configuration option set to "y" or to "m".
379
380 Return The classid, or 0 for the default unconfigured classid.
381
382 int bpf_skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16
383 vlan_tci)
384
385 Description
386 Push a vlan_tci (VLAN tag control information) of proto‐
387 col vlan_proto to the packet associated to skb, then
388 update the checksum. Note that if vlan_proto is different
389 from ETH_P_8021Q and ETH_P_8021AD, it is considered to be
390 ETH_P_8021Q.
391
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by the
verifier are invalidated and must be performed again if the helper is
used in combination with direct packet access.
397
398 Return 0 on success, or a negative error in case of failure.
399
400 int bpf_skb_vlan_pop(struct sk_buff *skb)
401
402 Description
403 Pop a VLAN header from the packet associated to skb.
404
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by the
verifier are invalidated and must be performed again if the helper is
used in combination with direct packet access.
410
411 Return 0 on success, or a negative error in case of failure.
412
413 int bpf_skb_get_tunnel_key(struct sk_buff *skb, struct bpf_tunnel_key
414 *key, u32 size, u64 flags)
415
416 Description
417 Get tunnel metadata. This helper takes a pointer key to
418 an empty struct bpf_tunnel_key of size, that will be
419 filled with tunnel metadata for the packet associated to
420 skb. The flags can be set to BPF_F_TUNINFO_IPV6, which
421 indicates that the tunnel is based on IPv6 protocol
422 instead of IPv4.
423
424 The struct bpf_tunnel_key is an object that generalizes
425 the principal parameters used by various tunneling proto‐
426 cols into a single struct. This way, it can be used to
427 easily make a decision based on the contents of the
428 encapsulation header, "summarized" in this struct. In
429 particular, it holds the IP address of the remote end
430 (IPv4 or IPv6, depending on the case) in key->remote_ipv4
431 or key->remote_ipv6. Also, this struct exposes the
432 key->tunnel_id, which is generally mapped to a VNI (Vir‐
433 tual Network Identifier), making it programmable together
434 with the bpf_skb_set_tunnel_key() helper.
435
436 Let's imagine that the following code is part of a pro‐
437 gram attached to the TC ingress interface, on one end of
438 a GRE tunnel, and is supposed to filter out all messages
439 coming from remote ends with IPv4 address other than
440 10.0.0.1:
441
442 int ret;
443 struct bpf_tunnel_key key = {};
444
445 ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
446 if (ret < 0)
447 return TC_ACT_SHOT; // drop packet
448
449 if (key.remote_ipv4 != 0x0a000001)
450 return TC_ACT_SHOT; // drop packet
451
452 return TC_ACT_OK; // accept packet
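
The constant 0x0a000001 in the comparison above is the address 10.0.0.1
written with its first octet in the most significant byte. A small
user-space sketch, assuming the field is compared in that form as the
example does, shows how such a constant can be derived; ipv4_value() is
a hypothetical name.

```c
#include <stdint.h>

/* Build the 32-bit value for a.b.c.d, first octet most significant. */
static uint32_t ipv4_value(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}
```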
453
454 This interface can also be used with all encapsulation
455 devices that can operate in "collect metadata" mode:
456 instead of having one network device per specific config‐
457 uration, the "collect metadata" mode only requires a sin‐
458 gle device where the configuration can be extracted from
459 this helper.
460
This can be used together with various tunnels such as VXLAN, Geneve,
GRE or IP in IP (IPIP).
463
464 Return 0 on success, or a negative error in case of failure.
465
466 int bpf_skb_set_tunnel_key(struct sk_buff *skb, struct bpf_tunnel_key
467 *key, u32 size, u64 flags)
468
469 Description
470 Populate tunnel metadata for packet associated to skb.
471 The tunnel metadata is set to the contents of key, of
472 size. The flags can be set to a combination of the fol‐
473 lowing values:
474
475 BPF_F_TUNINFO_IPV6
476 Indicate that the tunnel is based on IPv6 protocol
477 instead of IPv4.
478
479 BPF_F_ZERO_CSUM_TX
480 For IPv4 packets, add a flag to tunnel metadata
481 indicating that checksum computation should be
482 skipped and checksum set to zeroes.
483
484 BPF_F_DONT_FRAGMENT
485 Add a flag to tunnel metadata indicating that the
486 packet should not be fragmented.
487
488 BPF_F_SEQ_NUMBER
489 Add a flag to tunnel metadata indicating that a
490 sequence number should be added to tunnel header
491 before sending the packet. This flag was added for
492 GRE encapsulation, but might be used with other
493 protocols as well in the future.
494
495 Here is a typical usage on the transmit path:
496
struct bpf_tunnel_key key = {};
/* populate key ... */
bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0);
bpf_clone_redirect(skb, vxlan_dev_ifindex, 0);
501
502 See also the description of the bpf_skb_get_tunnel_key()
503 helper for additional information.
504
505 Return 0 on success, or a negative error in case of failure.
506
507 u64 bpf_perf_event_read(struct bpf_map *map, u64 flags)
508
509 Description
510 Read the value of a perf event counter. This helper
511 relies on a map of type BPF_MAP_TYPE_PERF_EVENT_ARRAY.
512 The nature of the perf event counter is selected when map
513 is updated with perf event file descriptors. The map is
514 an array whose size is the number of available CPUs, and
515 each cell contains a value relative to one CPU. The value
516 to retrieve is indicated by flags, that contains the
517 index of the CPU to look up, masked with
518 BPF_F_INDEX_MASK. Alternatively, flags can be set to
519 BPF_F_CURRENT_CPU to indicate that the value for the cur‐
520 rent CPU should be retrieved.
521
Note that before Linux 4.13, only hardware perf events can be
retrieved.
524
525 Also, be aware that the newer helper
526 bpf_perf_event_read_value() is recommended over
527 bpf_perf_event_read() in general. The latter has some ABI
528 quirks where error and counter value are used as a return
529 code (which is wrong to do since ranges may overlap).
530 This issue is fixed with bpf_perf_event_read_value(),
531 which at the same time provides more features over the
532 bpf_perf_event_read() interface. Please refer to the
533 description of bpf_perf_event_read_value() for details.
534
535 Return The value of the perf event counter read from the map, or
536 a negative error code in case of failure.
537
538 int bpf_redirect(u32 ifindex, u64 flags)
539
540 Description
541 Redirect the packet to another net device of index
542 ifindex. This helper is somewhat similar to
543 bpf_clone_redirect(), except that the packet is not
544 cloned, which provides increased performance.
545
546 Except for XDP, both ingress and egress interfaces can be
547 used for redirection. The BPF_F_INGRESS value in flags is
548 used to make the distinction (ingress path is selected if
549 the flag is present, egress path otherwise). Currently,
550 XDP only supports redirection to the egress interface,
551 and accepts no flag at all.
552
553 The same effect can be attained with the more generic
554 bpf_redirect_map(), which requires specific maps to be
555 used but offers better performance.
556
557 Return For XDP, the helper returns XDP_REDIRECT on success or
558 XDP_ABORTED on error. For other program types, the values
559 are TC_ACT_REDIRECT on success or TC_ACT_SHOT on error.
560
561 u32 bpf_get_route_realm(struct sk_buff *skb)
562
563 Description
Retrieve the realm of the route, that is to say the tclassid field of
the destination for the skb. The identifier retrieved is a
user-provided tag, similar to the one used with the net_cls cgroup (see
description for bpf_get_cgroup_classid() helper), but here this tag is
held by a route (a destination entry), not by a task.
570
571 Retrieving this identifier works with the clsact TC
572 egress hook (see also tc-bpf(8)), or alternatively on
573 conventional classful egress qdiscs, but not on TC
574 ingress path. In case of clsact TC egress hook, this has
575 the advantage that, internally, the destination entry has
576 not been dropped yet in the transmit path. Therefore, the
577 destination entry does not need to be artificially held
578 via netif_keep_dst() for a classful qdisc until the skb
579 is freed.
580
581 This helper is available only if the kernel was compiled
582 with CONFIG_IP_ROUTE_CLASSID configuration option.
583
584 Return The realm of the route for the packet associated to skb,
585 or 0 if none was found.
586
int bpf_perf_event_output(struct pt_regs *ctx, struct bpf_map *map, u64
flags, void *data, u64 size)
589
590 Description
591 Write raw data blob into a special BPF perf event held by
592 map of type BPF_MAP_TYPE_PERF_EVENT_ARRAY. This perf
593 event must have the following attributes: PERF_SAMPLE_RAW
594 as sample_type, PERF_TYPE_SOFTWARE as type, and
595 PERF_COUNT_SW_BPF_OUTPUT as config.
596
597 The flags are used to indicate the index in map for which
598 the value must be put, masked with BPF_F_INDEX_MASK.
599 Alternatively, flags can be set to BPF_F_CURRENT_CPU to
600 indicate that the index of the current CPU core should be
601 used.
602
The value to write, of size, is passed through the eBPF stack and
pointed to by data.

The context of the program, ctx, also needs to be passed to the helper.
608
In user space, a program willing to read the values needs to call
perf_event_open() on the perf event (either for one or for all CPUs)
and to store the file descriptor into the map. This must be done before
the eBPF program can send data into it. An example is available in file
samples/bpf/trace_output_user.c in the Linux kernel source tree (the
eBPF program counterpart is in samples/bpf/trace_output_kern.c).
617
bpf_perf_event_output() achieves better performance than
bpf_trace_printk() for sharing data with user space, and is much better
suited for streaming data from eBPF programs.
622
623 Note that this helper is not restricted to tracing use
624 cases and can be used with programs attached to TC or XDP
625 as well, where it allows for passing data to user space
626 listeners. Data can be:
627
628 · Only custom structs,
629
630 · Only the packet payload, or
631
632 · A combination of both.
633
634 Return 0 on success, or a negative error in case of failure.
635
636 int bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset, void *to,
637 u32 len)
638
639 Description
This helper was provided as an easy way to load data from a packet. It
can be used to load len bytes, starting at offset, from the packet
associated with skb into the buffer pointed to by to.
644
645 Since Linux 4.7, usage of this helper has mostly been
646 replaced by "direct packet access", enabling packet data
647 to be manipulated with skb->data and skb->data_end point‐
648 ing respectively to the first byte of packet data and to
649 the byte after the last byte of packet data. However, it
650 remains useful if one wishes to read large quantities of
651 data at once from a packet into the eBPF stack.
652
653 Return 0 on success, or a negative error in case of failure.
654
int bpf_get_stackid(struct pt_regs *ctx, struct bpf_map *map, u64 flags)
656
657 Description
658 Walk a user or a kernel stack and return its id. To
659 achieve this, the helper needs ctx, which is a pointer to
660 the context on which the tracing program is executed, and
661 a pointer to a map of type BPF_MAP_TYPE_STACK_TRACE.
662
663 The last argument, flags, holds the number of stack
664 frames to skip (from 0 to 255), masked with
665 BPF_F_SKIP_FIELD_MASK. The next bits can be used to set a
666 combination of the following flags:
667
668 BPF_F_USER_STACK
669 Collect a user space stack instead of a kernel
670 stack.
671
672 BPF_F_FAST_STACK_CMP
673 Compare stacks by hash only.
674
675 BPF_F_REUSE_STACKID
676 If two different stacks hash into the same
677 stackid, discard the old one.
678
The stack id retrieved is a 32-bit integer handle which can be further
combined with other data (including other stack ids) and used as a key
into maps. This can be useful for generating a variety of graphs (such
as flame graphs or off-cpu graphs).
684
For walking a stack, this helper is an improvement over
bpf_probe_read(), which can be used with unrolled loops but is not
efficient and consumes a lot of eBPF instructions. Instead,
bpf_get_stackid() can collect up to PERF_MAX_STACK_DEPTH frames, for
both kernel and user stacks. Note that this limit can be controlled
with the sysctl program, and that it should be manually increased in
order to profile long user stacks (such as stacks for Java programs).
To do so, use:
694
695 # sysctl kernel.perf_event_max_stack=<new value>
696
697 Return The positive or null stack id on success, or a negative
698 error in case of failure.
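
As described above, flags packs the number of frames to skip into its
low byte and the option bits above it. The following user-space sketch
shows one way to build such a value; the macro values are those of the
kernel UAPI header, and stackid_flags() is a hypothetical name.

```c
#include <stdint.h>

/* Values as defined in the kernel UAPI header <linux/bpf.h>. */
#define BPF_F_SKIP_FIELD_MASK  0xffULL
#define BPF_F_USER_STACK       (1ULL << 8)
#define BPF_F_FAST_STACK_CMP   (1ULL << 9)
#define BPF_F_REUSE_STACKID    (1ULL << 10)

/* Combine a frame skip count (0 to 255) with option bits. */
static uint64_t stackid_flags(uint64_t skip_frames, uint64_t options)
{
    return (skip_frames & BPF_F_SKIP_FIELD_MASK) |
           (options & ~BPF_F_SKIP_FIELD_MASK);
}
```

A tracing program that wants the user stack without its two innermost
frames would pass stackid_flags(2, BPF_F_USER_STACK) as the last
argument of bpf_get_stackid().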
699
700 s64 bpf_csum_diff(__be32 *from, u32 from_size, __be32 *to, u32 to_size,
701 __wsum seed)
702
703 Description
Compute a checksum difference, from the raw buffer pointed to by from,
of length from_size (that must be a multiple of 4), towards the raw
buffer pointed to by to, of size to_size (same remark). An optional
seed can be added to the value (this can be cascaded, the seed may come
from a previous call to the helper).
710
711 This is flexible enough to be used in several ways:
712
713 · With from_size == 0, to_size > 0 and seed set to check‐
714 sum, it can be used when pushing new data.
715
716 · With from_size > 0, to_size == 0 and seed set to check‐
717 sum, it can be used when removing data from a packet.
718
719 · With from_size > 0, to_size > 0 and seed set to 0, it
720 can be used to compute a diff. Note that from_size and
721 to_size do not need to be equal.
722
723 This helper can be used in combination with
724 bpf_l3_csum_replace() and bpf_l4_csum_replace(), to which
725 one can feed in the difference computed with
726 bpf_csum_diff().
727
728 Return The checksum result, or a negative error code in case of
729 failure.
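
The idea of a checksum difference can be modelled in user space. The
sketch below is a simplification, not the kernel code: it works in
ones' complement arithmetic, adds the complement of every word of from
and every word of to on top of seed, and folds the result to 16 bits
(the real helper returns a wider, unfolded value). The names fold16(),
csum_diff_model() and csum_apply_diff() are hypothetical, and the
buffer lengths are given in 32-bit words rather than bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold an accumulator into a 16-bit ones' complement sum. */
static uint16_t fold16(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Model: sum of `to`, minus sum of `from`, plus seed (ones' complement). */
static uint16_t csum_diff_model(const uint32_t *from, size_t from_words,
                                const uint32_t *to, size_t to_words,
                                uint16_t seed)
{
    uint64_t sum = seed;

    for (size_t i = 0; i < from_words; i++)
        sum += (uint32_t)~from[i];  /* subtracting is adding the complement */
    for (size_t i = 0; i < to_words; i++)
        sum += to[i];
    return fold16(sum);
}

/* Apply a difference to an existing 16-bit checksum. */
static uint16_t csum_apply_diff(uint16_t old_csum, uint16_t diff)
{
    return (uint16_t)~fold16((uint64_t)(uint16_t)~old_csum + diff);
}
```

Applying the difference between an old and a new 4-byte field to the
old checksum yields the checksum of the modified data, which is the
role the difference plays when it is fed to bpf_l3_csum_replace() or
bpf_l4_csum_replace().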
730
731 int bpf_skb_get_tunnel_opt(struct sk_buff *skb, u8 *opt, u32 size)
732
733 Description
734 Retrieve tunnel options metadata for the packet associ‐
735 ated to skb, and store the raw tunnel option data to the
736 buffer opt of size.
737
This helper can be used with encapsulation devices that can operate in
"collect metadata" mode (please refer to the related note in the
description of bpf_skb_get_tunnel_key() for more details). A particular
example where this can be used is in combination with the Geneve
encapsulation protocol, where it allows for pushing (with the
bpf_skb_set_tunnel_opt() helper) and retrieving arbitrary TLVs
(Type-Length-Value headers) from the eBPF program. This allows for full
customization of these headers.
747
748 Return The size of the option data retrieved.
749
750 int bpf_skb_set_tunnel_opt(struct sk_buff *skb, u8 *opt, u32 size)
751
752 Description
753 Set tunnel options metadata for the packet associated to
754 skb to the option data contained in the raw buffer opt of
755 size.
756
757 See also the description of the bpf_skb_get_tunnel_opt()
758 helper for additional information.
759
760 Return 0 on success, or a negative error in case of failure.
761
762 int bpf_skb_change_proto(struct sk_buff *skb, __be16 proto, u64 flags)
763
764 Description
Change the protocol of the skb to proto. Currently supported are
transition from IPv4 to IPv6, and from IPv6 to IPv4. The helper takes
care of the groundwork for the transition, including resizing the
socket buffer. The eBPF program is expected to fill the new headers, if
any, via bpf_skb_store_bytes() and to recompute the checksums with
bpf_l3_csum_replace() and bpf_l4_csum_replace(). The main case for this
helper is to perform NAT64 operations from within an eBPF program.
774
775 Internally, the GSO type is marked as dodgy so that head‐
776 ers are checked and segments are recalculated by the
777 GSO/GRO engine. The size for GSO target is adapted as
778 well.
779
780 All values for flags are reserved for future usage, and
781 must be left at zero.
782
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by the
verifier are invalidated and must be performed again if the helper is
used in combination with direct packet access.
788
789 Return 0 on success, or a negative error in case of failure.
790
791 int bpf_skb_change_type(struct sk_buff *skb, u32 type)
792
793 Description
794 Change the packet type for the packet associated to skb.
795 This comes down to setting skb->pkt_type to type, except
796 the eBPF program does not have a write access to
797 skb->pkt_type beside this helper. Using a helper here
798 allows for graceful handling of errors.
799
The major use case is to change incoming skbs to PACKET_HOST in a
programmatic way instead of having to recirculate via redirect(...,
BPF_F_INGRESS), for example.
804
805 Note that type only allows certain values. At this time,
806 they are:
807
808 PACKET_HOST
809 Packet is for us.
810
811 PACKET_BROADCAST
812 Send packet to all.
813
814 PACKET_MULTICAST
815 Send packet to group.
816
817 PACKET_OTHERHOST
818 Send packet to someone else.
819
820 Return 0 on success, or a negative error in case of failure.
821
822 int bpf_skb_under_cgroup(struct sk_buff *skb, struct bpf_map *map, u32
823 index)
824
825 Description
826 Check whether skb is a descendant of the cgroup2 held by
827 map of type BPF_MAP_TYPE_CGROUP_ARRAY, at index.
828
829 Return The return value depends on the result of the test, and
830 can be:
831
832 · 0, if the skb failed the cgroup2 descendant test.
833
834 · 1, if the skb passed the cgroup2 descendant test.
835
836 · A negative error code, if an error occurred.
837
838 u32 bpf_get_hash_recalc(struct sk_buff *skb)
839
840 Description
841 Retrieve the hash of the packet, skb->hash. If it is not
842 set, in particular if the hash was cleared due to man‐
843 gling, recompute this hash. Later accesses to the hash
844 can be done directly with skb->hash.
845
846 Calling bpf_set_hash_invalid(), changing a packet's proto‐
847 col with bpf_skb_change_proto(), or calling
848 bpf_skb_store_bytes() with the BPF_F_INVALIDATE_HASH flag
849 can clear the hash and trigger a
850 new computation on the next call to
851 bpf_get_hash_recalc().
852
853 Return The 32-bit hash.
854
855 u64 bpf_get_current_task(void)
856
857 Return A pointer to the current task struct.
858
859 int bpf_probe_write_user(void *dst, const void *src, u32 len)
860
861 Description
862 Attempt in a safe way to write len bytes from the buffer
863 src to dst in memory. It only works for threads that are
864 in user context, and dst must be a valid user space
865 address.
866
867 This helper should not be used to implement any kind of
868 security mechanism because of TOC-TOU attacks, but rather
869 to debug, divert, and manipulate execution of semi-coop‐
870 erative processes.
871
872 Keep in mind that this feature is meant for experiments,
873 and it has a risk of crashing the system and running pro‐
874 grams. Therefore, when an eBPF program using this helper
875 is attached, a warning including PID and process name is
876 printed to kernel logs.
877
878 Return 0 on success, or a negative error in case of failure.
879
880 int bpf_current_task_under_cgroup(struct bpf_map *map, u32 index)
881
882 Description
883 Check whether the probe is being run in the context of a
884 given subset of the cgroup2 hierarchy. The cgroup2 to
885 test is held by map of type BPF_MAP_TYPE_CGROUP_ARRAY, at
886 index.
887
888 Return The return value depends on the result of the test, and
889 can be:
890
891 · 0, if the current task does not belong to the cgroup2.
892
893 · 1, if the current task belongs to the cgroup2.
894
895 · A negative error code, if an error occurred.
896
897 int bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags)
898
899 Description
900 Resize (trim or grow) the packet associated to skb to the
901 new len. The flags are reserved for future usage, and
902 must be left at zero.
903
904 The basic idea is that the helper performs the needed
905 work to change the size of the packet, then the eBPF pro‐
906 gram rewrites the rest via helpers like
907 bpf_skb_store_bytes(), bpf_l3_csum_replace(),
908 bpf_l4_csum_replace() and others. This helper is a slow
909 path utility intended for replies with control messages.
910 And because it is targeted for slow path, the helper
911 itself can afford to be slow: it implicitly linearizes,
912 unclones and drops offloads from the skb.
913
914 A call to this helper may change the underlying packet
915 buffer. Therefore, at load time, all checks
916 on pointers previously done by the verifier are invali‐
917 dated and must be performed again, if the helper is used
918 in combination with direct packet access.
919
920 Return 0 on success, or a negative error in case of failure.
921
922 int bpf_skb_pull_data(struct sk_buff *skb, u32 len)
923
924 Description
925 Pull in non-linear data in case the skb is non-linear and
926 not all of len are part of the linear section. Make len
927 bytes from skb readable and writable. If a zero value is
928 passed for len, then the whole length of the skb is
929 pulled.
930
931 This helper is only needed for reading and writing with
932 direct packet access.
933
934 For direct packet access, testing that offsets to access
935 are within packet boundaries (test on skb->data_end) is
936 susceptible to fail if offsets are invalid, or if the
937 requested data is in non-linear parts of the skb. On
938 failure the program can just bail out, or in the case of
939 a non-linear buffer, use a helper to make the data avail‐
940 able. The bpf_skb_load_bytes() helper is a first solution
941 to access the data. Another one consists in using
942 bpf_skb_pull_data() to pull in the non-linear parts once,
943 then retesting and finally accessing the data.
944
945 At the same time, this also makes sure the skb is
946 uncloned, which is a necessary condition for direct
947 write. As this needs to be an invariant for the write
948 part only, the verifier detects writes and adds a pro‐
949 logue that is calling bpf_skb_pull_data() to effectively
950 unclone the skb from the very beginning in case it is
951 indeed cloned.
952
953 A call to this helper may change the underlying packet
954 buffer. Therefore, at load time, all checks
955 on pointers previously done by the verifier are invali‐
956 dated and must be performed again, if the helper is used
957 in combination with direct packet access.
958
959 Return 0 on success, or a negative error in case of failure.
960
961 s64 bpf_csum_update(struct sk_buff *skb, __wsum csum)
962
963 Description
964 Add the checksum csum into skb->csum in case the driver
965 has supplied a checksum for the entire packet into that
966 field. Return an error otherwise. This helper is intended
967 to be used in combination with bpf_csum_diff(), in par‐
968 ticular when the checksum needs to be updated after data
969 has been written into the packet through direct packet
970 access.
971
972 Return The checksum on success, or a negative error code in case
973 of failure.
974
975 void bpf_set_hash_invalid(struct sk_buff *skb)
976
977 Description
978 Invalidate the current skb->hash. It can be used after
979 mangling on headers through direct packet access, in
980 order to indicate that the hash is outdated and to trig‐
981 ger a recalculation the next time the kernel tries to
982 access this hash or when the bpf_get_hash_recalc() helper
983 is called.
984
985 int bpf_get_numa_node_id(void)
986
987 Description
988 Return the id of the current NUMA node. The primary use
989 case for this helper is the selection of sockets for the
990 local NUMA node, when the program is attached to sockets
991 using the SO_ATTACH_REUSEPORT_EBPF option (see also
992 socket(7)), but the helper is also available to other
993 eBPF program types, similarly to bpf_get_smp_proces‐
994 sor_id().
995
996 Return The id of current NUMA node.
997
998 int bpf_skb_change_head(struct sk_buff *skb, u32 len, u64 flags)
999
1000 Description
1001 Grows headroom of packet associated to skb and adjusts
1002 the offset of the MAC header accordingly, adding len
1003 bytes of space. It automatically extends and reallocates
1004 memory as required.
1005
1006 This helper can be used on a layer 3 skb to push a MAC
1007 header for redirection into a layer 2 device.
1008
1009 All values for flags are reserved for future usage, and
1010 must be left at zero.
1011
1012 A call to this helper may change the underlying packet
1013 buffer. Therefore, at load time, all checks
1014 on pointers previously done by the verifier are invali‐
1015 dated and must be performed again, if the helper is used
1016 in combination with direct packet access.
1017
1018 Return 0 on success, or a negative error in case of failure.
1019
1020 int bpf_xdp_adjust_head(struct xdp_buff *xdp_md, int delta)
1021
1022 Description
1023 Adjust (move) xdp_md->data by delta bytes. Note that it
1024 is possible to use a negative value for delta. This
1025 helper can be used to prepare the packet for pushing or
1026 popping headers.
1027
1028 A call to this helper may change the underlying packet
1029 buffer. Therefore, at load time, all checks
1030 on pointers previously done by the verifier are invali‐
1031 dated and must be performed again, if the helper is used
1032 in combination with direct packet access.
1033
1034 Return 0 on success, or a negative error in case of failure.
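
The need to redo bounds checks after moving the start of the packet can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct model_xdp, model_adjust_head() and model_can_access() are hypothetical names, offsets stand in for the real pointers, and the limit checks are simplified compared to whatever the kernel enforces.

```c
/* Hypothetical model: offsets into a buffer instead of real pointers. */
struct model_xdp {
    long data;      /* offset of the first packet byte */
    long data_end;  /* offset one past the last packet byte */
};

/*
 * Move the start of the packet by delta bytes (negative grows the
 * packet towards the headroom, positive shrinks it). Returns 0 on
 * success and -1 if the new start would leave the modelled buffer
 * or reach the end of the packet.
 */
static int model_adjust_head(struct model_xdp *x, int delta)
{
    long new_data = x->data + delta;

    if (new_data < 0 || new_data >= x->data_end)
        return -1;
    x->data = new_data;
    return 0;
}

/*
 * Bounds check in the style of a direct packet access test: is the
 * range [off, off + len) inside the packet as it stands right now?
 */
static int model_can_access(const struct model_xdp *x, long off, long len)
{
    return off >= 0 && len >= 0 && x->data + off + len <= x->data_end;
}
```

In this model, a range that was readable before a call to model_adjust_head() may no longer be readable afterwards, which is why a check done before the adjustment cannot be reused.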
1035
1036 int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr)
1037
1038 Description
1039 Copy a NUL terminated string from an unsafe address
1040 unsafe_ptr to dst. The size should include the terminat‐
1041 ing NUL byte. In case the string length is smaller than
1042 size, the target is not padded with further NUL bytes. If
1043 the string length is larger than size, just size-1 bytes
1044 are copied and the last byte is set to NUL.
1045
1046 On success, the length of the copied string is returned.
1047 This makes this helper useful in tracing programs for
1048 reading strings, and more importantly to get its length
1049 at runtime. See the following snippet:
1050
1051 SEC("kprobe/sys_open")
1052 void bpf_sys_open(struct pt_regs *ctx)
1053 {
1054     char buf[PATHLEN]; // PATHLEN is defined to 256
1055     int res = bpf_probe_read_str(buf, sizeof(buf), ctx->di);
1056
1057     // Consume buf, for example push it to userspace via
1058     // bpf_perf_event_output(); we can use res (the string
1059     // length) as event size, after checking its boundaries.
1060 }
1061
1062 In comparison, using bpf_probe_read() helper here instead
1063 to read the string would require to estimate the length
1064 at compile time, and would often result in copying more
1065 memory than necessary.
1066
1067 Another useful use case is when parsing individual
1068 process arguments or individual environment variables
1069 navigating current->mm->arg_start and cur‐
1070 rent->mm->env_start: using this helper and the return
1071 value, one can quickly iterate at the right offset of the
1072 memory area.
1073
1074 Return On success, the strictly positive length of the string,
1075 including the trailing NUL character. On error, a nega‐
1076 tive value.
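
The copy and return-value semantics described for this helper can be mimicked in plain userspace C. The sketch below is a model, not the kernel implementation: model_read_str() and model_count_strings() are hypothetical names, and treating a non-positive size as an error is an assumption of the model. The second function shows how the returned length can be used to step through a packed area of NUL terminated strings, as in the argument-parsing use case.

```c
#include <stddef.h>

/*
 * Copy at most size - 1 bytes from src, always NUL-terminate dst, and
 * return the number of bytes written including the trailing NUL.
 * No padding is added when the string is shorter than size.
 */
static int model_read_str(char *dst, int size, const char *src)
{
    int i;

    if (size <= 0)
        return -1;  /* model-only choice, stands in for an error */

    for (i = 0; i < size - 1 && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';

    return i + 1;
}

/*
 * Count the strings stored back to back in area, advancing by the
 * length returned for each one. Assumes every entry is NUL terminated
 * and shorter than the local buffer; a truncated entry would leave
 * the offset in the middle of a string.
 */
static int model_count_strings(const char *area, size_t area_len)
{
    char buf[64];
    size_t off = 0;
    int count = 0;

    while (off < area_len) {
        int res = model_read_str(buf, (int)sizeof(buf), area + off);

        if (res <= 0)
            break;
        off += (size_t)res;
        count++;
    }
    return count;
}
```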
1077
1078 u64 bpf_get_socket_cookie(struct sk_buff *skb)
1079
1080 Description
1081 If the struct sk_buff pointed by skb has a known socket,
1082 retrieve the cookie (generated by the kernel) of this
1083 socket. If no cookie has been set yet, generate a new
1084 cookie. Once generated, the socket cookie remains stable
1085 for the life of the socket. This helper can be useful for
1086 monitoring per socket networking traffic statistics as it
1087 provides a unique socket identifier per namespace.
1088
1089 Return A 8-byte long non-decreasing number on success, or 0 if
1090 the socket field is missing inside skb.
1091
1092 u64 bpf_get_socket_cookie(struct bpf_sock_addr *ctx)
1093
1094 Description
1095 Equivalent to bpf_get_socket_cookie() helper that accepts
1096 skb, but gets socket from struct bpf_sock_addr context.
1097
1098 Return An 8-byte long non-decreasing number.
1099
1100 u64 bpf_get_socket_cookie(struct bpf_sock_ops *ctx)
1101
1102 Description
1103 Equivalent to bpf_get_socket_cookie() helper that accepts
1104 skb, but gets socket from struct bpf_sock_ops context.
1105
1106 Return An 8-byte long non-decreasing number.
1107
1108 u32 bpf_get_socket_uid(struct sk_buff *skb)
1109
1110 Return The owner UID of the socket associated to skb. If the
1111 socket is NULL, or if it is not a full socket (i.e. if it
1112 is a time-wait or a request socket instead), overflowuid
1113 value is returned (note that overflowuid might also be
1114 the actual UID value for the socket).
1115
1116 u32 bpf_set_hash(struct sk_buff *skb, u32 hash)
1117
1118 Description
1119 Set the full hash for skb (set the field skb->hash) to
1120 value hash.
1121
1122 Return 0
1123
1124 int bpf_setsockopt(struct bpf_sock_ops *bpf_socket, int level, int opt‐
1125 name, char *optval, int optlen)
1126
1127 Description
1128 Emulate a call to setsockopt() on the socket associated
1129 to bpf_socket, which must be a full socket. The level at
1130 which the option resides and the name optname of the
1131 option must be specified, see setsockopt(2) for more
1132 information. The option value of length optlen is
1133 pointed by optval.
1134
1135 This helper actually implements a subset of setsockopt().
1136 It supports the following levels:
1137
1138 · SOL_SOCKET, which supports the following optnames:
1139 SO_RCVBUF, SO_SNDBUF, SO_MAX_PACING_RATE, SO_PRIORITY,
1140 SO_RCVLOWAT, SO_MARK.
1141
1142 · IPPROTO_TCP, which supports the following optnames:
1143 TCP_CONGESTION, TCP_BPF_IW, TCP_BPF_SNDCWND_CLAMP.
1144
1145 · IPPROTO_IP, which supports optname IP_TOS.
1146
1147 · IPPROTO_IPV6, which supports optname IPV6_TCLASS.
1148
1149 Return 0 on success, or a negative error in case of failure.
1150
1151 int bpf_skb_adjust_room(struct sk_buff *skb, s32 len_diff, u32 mode,
1152 u64 flags)
1153
1154 Description
1155 Grow or shrink the room for data in the packet associated
1156 to skb by len_diff, and according to the selected mode.
1157
1158 There is a single supported mode at this time:
1159
1160 · BPF_ADJ_ROOM_NET: Adjust room at the network layer
1161 (room space is added or removed below the layer 3
1162 header).
1163
1164 All values for flags are reserved for future usage, and
1165 must be left at zero.
1166
1167 A call to this helper may change the underlying packet
1168 buffer. Therefore, at load time, all checks
1169 on pointers previously done by the verifier are invali‐
1170 dated and must be performed again, if the helper is used
1171 in combination with direct packet access.
1172
1173 Return 0 on success, or a negative error in case of failure.
1174
1175 int bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
1176
1177 Description
1178 Redirect the packet to the endpoint referenced by map at
1179 index key. Depending on its type, this map can contain
1180 references to net devices (for forwarding packets through
1181 other ports), or to CPUs (for redirecting XDP frames to
1182 another CPU; but this is only implemented for native XDP
1183 (with driver support) as of this writing).
1184
1185 All values for flags are reserved for future usage, and
1186 must be left at zero.
1187
1188 When used to redirect packets to net devices, this helper
1189 provides a high performance increase over bpf_redirect().
1190 This is due to various implementation details of the
1191 underlying mechanisms, one of which is the fact that
1192 bpf_redirect_map() tries to send packet as a "bulk" to
1193 the device.
1194
1195 Return XDP_REDIRECT on success, or XDP_ABORTED on error.
1196
1197 int bpf_sk_redirect_map(struct bpf_map *map, u32 key, u64 flags)
1198
1199 Description
1200 Redirect the packet to the socket referenced by map (of
1201 type BPF_MAP_TYPE_SOCKMAP) at index key. Both ingress and
1202 egress interfaces can be used for redirection. The
1203 BPF_F_INGRESS value in flags is used to make the distinc‐
1204 tion (ingress path is selected if the flag is present,
1205 egress path otherwise). This is the only flag supported
1206 for now.
1207
1208 Return SK_PASS on success, or SK_DROP on error.
1209
1210 int bpf_sock_map_update(struct bpf_sock_ops *skops, struct bpf_map
1211 *map, void *key, u64 flags)
1212
1213 Description
1214 Add an entry to, or update a map referencing sockets. The
1215 skops is used as a new value for the entry associated to
1216 key. flags is one of:
1217
1218 BPF_NOEXIST
1219 The entry for key must not exist in the map.
1220
1221 BPF_EXIST
1222 The entry for key must already exist in the map.
1223
1224 BPF_ANY
1225 No condition on the existence of the entry for
1226 key.
1227
1228 If the map has eBPF programs (parser and verdict), those
1229 will be inherited by the socket being added. If the
1230 socket is already attached to eBPF programs, this results
1231 in an error.
1232
1233 Return 0 on success, or a negative error in case of failure.
1234
1235 int bpf_xdp_adjust_meta(struct xdp_buff *xdp_md, int delta)
1236
1237 Description
1238 Adjust the address pointed by xdp_md->data_meta by delta
1239 (which can be positive or negative). Note that this oper‐
1240 ation modifies the address stored in xdp_md->data, so the
1241 latter must be loaded only after the helper has been
1242 called.
1243
1244 The use of xdp_md->data_meta is optional and programs are
1245 not required to use it. The rationale is that when the
1246 packet is processed with XDP (e.g. as DoS filter), it is
1247 possible to push further meta data along with it before
1248 passing to the stack, and to give the guarantee that an
1249 ingress eBPF program attached as a TC classifier on the
1250 same device can pick this up for further post-processing.
1251 Since TC works with socket buffers, it remains possible
1252 to set from XDP the mark or priority pointers, or other
1253 pointers for the socket buffer. Having this scratch
1254 space generic and programmable allows for more flexibil‐
1255 ity as the user is free to store whatever meta data they
1256 need.
1257
1258 A call to this helper may change the underlying packet
1259 buffer. Therefore, at load time, all checks
1260 on pointers previously done by the verifier are invali‐
1261 dated and must be performed again, if the helper is used
1262 in combination with direct packet access.
1263
1264 Return 0 on success, or a negative error in case of failure.
1265
1266 int bpf_perf_event_read_value(struct bpf_map *map, u64 flags, struct
1267 bpf_perf_event_value *buf, u32 buf_size)
1268
1269 Description
1270 Read the value of a perf event counter, and store it into
1271 buf of size buf_size. This helper relies on a map of type
1272 BPF_MAP_TYPE_PERF_EVENT_ARRAY. The nature of the perf
1273 event counter is selected when map is updated with perf
1274 event file descriptors. The map is an array whose size is
1275 the number of available CPUs, and each cell contains a
1276 value relative to one CPU. The value to retrieve is indi‐
1277 cated by flags, that contains the index of the CPU to
1278 look up, masked with BPF_F_INDEX_MASK. Alternatively,
1279 flags can be set to BPF_F_CURRENT_CPU to indicate that
1280 the value for the current CPU should be retrieved.
1281
1282 This helper behaves in a way close to
1283 bpf_perf_event_read() helper, save that instead of just
1284 returning the value observed, it fills the buf structure.
1285 This allows for additional data to be retrieved: in par‐
1286 ticular, the enabled and running times (in buf->enabled
1287 and buf->running, respectively) are copied. In general,
1288 bpf_perf_event_read_value() is recommended over
1289 bpf_perf_event_read(), which has some ABI issues and pro‐
1290 vides fewer functionalities.
1291
1292 These values are interesting, because hardware PMU (Per‐
1293 formance Monitoring Unit) counters are limited resources.
1294 When there are more PMU based perf events opened than
1295 available counters, kernel will multiplex these events so
1296 each event gets certain percentage (but not all) of the
1297 PMU time. When multiplexing happens, the number of sam‐
1298 ples or the counter value does not match what would be
1299 observed without multiplexing. This makes com‐
1300 parison between different runs difficult. Typically, the
1301 counter value should be normalized before comparing to
1302 other experiments. The usual normalization is done as
1303 follows.
1304
1305 normalized_counter = counter * t_enabled / t_running
1306
1307 Where t_enabled is the time enabled for event and t_run‐
1308 ning is the time running for event since last normaliza‐
1309 tion. The enabled and running times are accumulated since
1310 the perf event open. To achieve scaling factor between
1311 two invocations of an eBPF program, users can use the CPU
1312 id as the key (which is typical for perf array usage
1313 model) to remember the previous value and do the calcula‐
1314 tion inside the eBPF program.
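
The scaling between two readings can be sketched in userspace C as follows. The struct perf_value_sample type and normalized_delta() are hypothetical names that only mirror the three values named in the text (counter, enabled, running). Floating point is used here for readability; an eBPF program would have to do the same calculation with integer arithmetic.

```c
#include <stdint.h>

/* Hypothetical local mirror of the values filled in by the helper. */
struct perf_value_sample {
    uint64_t counter;
    uint64_t enabled;  /* accumulated time enabled */
    uint64_t running;  /* accumulated time running */
};

/*
 * Apply normalized_counter = counter * t_enabled / t_running to the
 * differences between a previous and a current reading. Returns 0
 * when the event did not run at all during the interval.
 */
static double normalized_delta(const struct perf_value_sample *prev,
                               const struct perf_value_sample *cur)
{
    uint64_t d_counter = cur->counter - prev->counter;
    uint64_t d_enabled = cur->enabled - prev->enabled;
    uint64_t d_running = cur->running - prev->running;

    if (d_running == 0)
        return 0.0;

    return (double)d_counter * (double)d_enabled / (double)d_running;
}
```

For instance, a counter that advanced by 1000 while the event was enabled for 200 time units but running for only 100 of them is scaled to 2000.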
1315
1316 Return 0 on success, or a negative error in case of failure.
1317
1318 int bpf_perf_prog_read_value(struct bpf_perf_event_data *ctx, struct
1319 bpf_perf_event_value *buf, u32 buf_size)
1320
1321 Description
1322 For an eBPF program attached to a perf event, retrieve
1323 the value of the event counter associated to ctx and
1324 store it in the structure pointed by buf and of size
1325 buf_size. Enabled and running times are also stored in
1326 the structure (see description of helper
1327 bpf_perf_event_read_value() for more details).
1328
1329 Return 0 on success, or a negative error in case of failure.
1330
1331 int bpf_getsockopt(struct bpf_sock_ops *bpf_socket, int level, int opt‐
1332 name, char *optval, int optlen)
1333
1334 Description
1335 Emulate a call to getsockopt() on the socket associated
1336 to bpf_socket, which must be a full socket. The level at
1337 which the option resides and the name optname of the
1338 option must be specified, see getsockopt(2) for more
1339 information. The retrieved value is stored in the struc‐
1340 ture pointed by optval and of length optlen.
1341
1342 This helper actually implements a subset of getsockopt().
1343 It supports the following levels:
1344
1345 · IPPROTO_TCP, which supports optname TCP_CONGESTION.
1346
1347 · IPPROTO_IP, which supports optname IP_TOS.
1348
1349 · IPPROTO_IPV6, which supports optname IPV6_TCLASS.
1350
1351 Return 0 on success, or a negative error in case of failure.
1352
1353 int bpf_override_return(struct pt_regs *regs, u64 rc)
1354
1355 Description
1356 Used for error injection, this helper uses kprobes to
1357 override the return value of the probed function, and to
1358 set it to rc. The first argument is the context regs on
1359 which the kprobe works.
1360
1361 This helper works by setting the PC (program
1362 counter) to an override function which is run in place of
1363 the original probed function. This means the probed func‐
1364 tion is not run at all. The replacement function just
1365 returns with the required value.
1366
1367 This helper has security implications, and thus is sub‐
1368 ject to restrictions. It is only available if the kernel
1369 was compiled with the CONFIG_BPF_KPROBE_OVERRIDE configu‐
1370 ration option, and in this case it only works on func‐
1371 tions tagged with ALLOW_ERROR_INJECTION in the kernel
1372 code.
1373
1374 Also, the helper is only available for the architectures
1375 having the CONFIG_FUNCTION_ERROR_INJECTION option. As of
1376 this writing, x86 architecture is the only one to support
1377 this feature.
1378
1379 Return 0
1380
1381 int bpf_sock_ops_cb_flags_set(struct bpf_sock_ops *bpf_sock, int
1382 argval)
1383
1384 Description
1385 Attempt to set the value of the bpf_sock_ops_cb_flags
1386 field for the full TCP socket associated to bpf_sock
1387 to argval.
1388
1389 The primary use of this field is to determine if there
1390 should be calls to eBPF programs of type
1391 BPF_PROG_TYPE_SOCK_OPS at various points in the TCP code.
1392 A program of the same type can change its value, per con‐
1393 nection and as necessary, when the connection is estab‐
1394 lished. This field is directly accessible for reading,
1395 but this helper must be used for updates in order to
1396 return an error if an eBPF program tries to set a call‐
1397 back that is not supported in the current kernel.
1398
1399 The supported callback values that argval can combine
1400 are:
1401
1402 · BPF_SOCK_OPS_RTO_CB_FLAG (retransmission time out)
1403
1404 · BPF_SOCK_OPS_RETRANS_CB_FLAG (retransmission)
1405
1406 · BPF_SOCK_OPS_STATE_CB_FLAG (TCP state change)
1407
1408 Here are some examples of where one could call such eBPF
1409 program:
1410
1411 · When RTO fires.
1412
1413 · When a packet is retransmitted.
1414
1415 · When the connection terminates.
1416
1417 · When a packet is sent.
1418
1419 · When a packet is received.
1420
1421 Return Code -EINVAL if the socket is not a full TCP socket; oth‐
1422 erwise, a positive number containing the bits that could
1423 not be set is returned (which comes down to 0 if all bits
1424 were set as required).
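
The return convention (the bits that could not be set) can be modelled in a few lines of userspace C. This is a sketch, not the kernel code: the EX_* constants and model_cb_flags_set() are hypothetical, the bit positions are assumed to match the three flags listed above in that order, and the check for a full TCP socket is left out.

```c
#include <stdint.h>

/* Assumed bit positions for the three callback flags listed above. */
#define EX_RTO_CB_FLAG      (1U << 0)
#define EX_RETRANS_CB_FLAG  (1U << 1)
#define EX_STATE_CB_FLAG    (1U << 2)
#define EX_ALL_CB_FLAGS     (EX_RTO_CB_FLAG | EX_RETRANS_CB_FLAG | \
                             EX_STATE_CB_FLAG)

/*
 * Store the supported part of argval in *field and return the bits
 * that are not supported, so that 0 means every requested bit was set.
 */
static uint32_t model_cb_flags_set(uint32_t *field, uint32_t argval)
{
    *field = argval & EX_ALL_CB_FLAGS;
    return argval & ~EX_ALL_CB_FLAGS;
}
```

A caller can therefore detect a callback that the running kernel does not know about by testing the return value against zero, while the supported bits are still applied.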
1425
1426 int bpf_msg_redirect_map(struct sk_msg_buff *msg, struct bpf_map *map,
1427 u32 key, u64 flags)
1428
1429 Description
1430 This helper is used in programs implementing policies at
1431 the socket level. If the message msg is allowed to pass
1432 (i.e. if the verdict eBPF program returns SK_PASS), redi‐
1433 rect it to the socket referenced by map (of type
1434 BPF_MAP_TYPE_SOCKMAP) at index key. Both ingress and
1435 egress interfaces can be used for redirection. The
1436 BPF_F_INGRESS value in flags is used to make the distinc‐
1437 tion (ingress path is selected if the flag is present,
1438 egress path otherwise). This is the only flag supported
1439 for now.
1440
1441 Return SK_PASS on success, or SK_DROP on error.
1442
1443 int bpf_msg_apply_bytes(struct sk_msg_buff *msg, u32 bytes)
1444
1445 Description
1446 For socket policies, apply the verdict of the eBPF pro‐
1447 gram to the next bytes (number of bytes) of message msg.
1448
1449 For example, this helper can be used in the following
1450 cases:
1451
1452 · A single sendmsg() or sendfile() system call contains
1453 multiple logical messages that the eBPF program is sup‐
1454 posed to read and for which it should apply a verdict.
1455
1456 · An eBPF program only cares to read the first bytes of a
1457 msg. If the message has a large payload, then setting
1458 up and calling the eBPF program repeatedly for all
1459 bytes, even though the verdict is already known, would
1460 create unnecessary overhead.
1461
1462 When called from within an eBPF program, the helper sets
1463 a counter internal to the BPF infrastructure, that is
1464 used to apply the last verdict to the next bytes. If
1465 bytes is smaller than the current data being processed
1466 from a sendmsg() or sendfile() system call, the first
1467 bytes will be sent and the eBPF program will be re-run
1468 with the pointer for start of data pointing to byte num‐
1469 ber bytes + 1. If bytes is larger than the current data
1470 being processed, then the eBPF verdict will be applied to
1471 multiple sendmsg() or sendfile() calls until bytes are
1472 consumed.
1473
1474 Note that if a socket closes with the internal counter
1475 holding a non-zero value, this is not a problem because
1476 data is not being buffered for bytes and is sent as it is
1477 received.
1478
1479 Return 0
1480
1481 int bpf_msg_cork_bytes(struct sk_msg_buff *msg, u32 bytes)
1482
1483 Description
1484 For socket policies, prevent the execution of the verdict
1485 eBPF program for message msg until bytes (byte number)
1486 have been accumulated.
1487
1488 This can be used when one needs a specific number of
1489 bytes before a verdict can be assigned, even if the data
1490 spans multiple sendmsg() or sendfile() calls. The extreme
1491 case would be a user calling sendmsg() repeatedly with
1492 1-byte long message segments. Obviously, this is bad for
1493 performance, but it is still valid. If the eBPF program
1494 needs bytes bytes to validate a header, this helper can
1495 be used to prevent the eBPF program from being called again
1496 until bytes have been accumulated.
1497
1498 Return 0
1499
1500 int bpf_msg_pull_data(struct sk_msg_buff *msg, u32 start, u32 end, u64
1501 flags)
1502
1503 Description
1504 For socket policies, pull in non-linear data from user
1505 space for msg and set pointers msg->data and
1506 msg->data_end to start and end bytes offsets into msg,
1507 respectively.
1508
1509 If a program of type BPF_PROG_TYPE_SK_MSG is run on a msg
1510 it can only parse data that the (data, data_end) pointers
1511 have already consumed. For sendmsg() hooks this is likely
1512 the first scatterlist element. But for calls relying on
1513 the sendpage handler (e.g. sendfile()) this will be the
1514 range (0, 0) because the data is shared with user space
1515 and by default the objective is to avoid allowing user
1516 space to modify data while (or after) eBPF verdict is
1517 being decided. This helper can be used to pull in data
1518 and to set the start and end pointer to given values.
1519 Data will be copied if necessary (i.e. if data was not
1520 linear and if start and end pointers do not point to the
1521 same chunk).
1522
1523 A call to this helper may change the underlying packet
1524 buffer. Therefore, at load time, all checks
1525 on pointers previously done by the verifier are invali‐
1526 dated and must be performed again, if the helper is used
1527 in combination with direct packet access.
1528
1529 All values for flags are reserved for future usage, and
1530 must be left at zero.
1531
1532 Return 0 on success, or a negative error in case of failure.
1533
1534 int bpf_bind(struct bpf_sock_addr *ctx, struct sockaddr *addr, int
1535 addr_len)
1536
1537 Description
1538 Bind the socket associated to ctx to the address pointed
1539 by addr, of length addr_len. This allows for making out‐
1540 going connections from the desired IP address, which can
1541 be useful for example when all processes inside a cgroup
1542 should use one single IP address on a host that has mul‐
1543 tiple IP addresses configured.
1544
1545 This helper works for IPv4 and IPv6, TCP and UDP sockets.
1546 The domain (addr->sa_family) must be AF_INET (or
1547 AF_INET6). Looking for a free port to bind to can be
1548 expensive, therefore binding to a port is not permitted by
1549 the helper: addr->sin_port (or sin6_port, respectively)
1550 must be set to zero.
1551
1552 Return 0 on success, or a negative error in case of failure.
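
The constraints on the address (AF_INET family, port left at zero) can be shown with a small userspace sketch that prepares such an address. make_bind_addr() is a hypothetical helper name, and inet_pton() is a userspace libc call that is not available inside an eBPF program, where the address bytes would be filled in directly.

```c
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Fill out with an IPv4 address suitable for the bind described
 * above: family AF_INET, port zero, address parsed from ip.
 * Returns 0 on success, -1 if ip is not a valid IPv4 address.
 */
static int make_bind_addr(const char *ip, struct sockaddr_in *out)
{
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = 0;  /* binding to a port is not permitted */

    return inet_pton(AF_INET, ip, &out->sin_addr) == 1 ? 0 : -1;
}
```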
1553
1554 int bpf_xdp_adjust_tail(struct xdp_buff *xdp_md, int delta)
1555
1556 Description
1557 Adjust (move) xdp_md->data_end by delta bytes. It is only
1558 possible to shrink the packet as of this writing, there‐
1559 fore delta must be a negative integer.
1560
1561 A call to this helper may change the underlying packet
1562 buffer. Therefore, at load time, all checks
1563 on pointers previously done by the verifier are invali‐
1564 dated and must be performed again, if the helper is used
1565 in combination with direct packet access.
1566
1567 Return 0 on success, or a negative error in case of failure.
1568
1569 int bpf_skb_get_xfrm_state(struct sk_buff *skb, u32 index, struct
1570 bpf_xfrm_state *xfrm_state, u32 size, u64 flags)
1571
1572 Description
1573 Retrieve the XFRM state (IP transform framework, see also
1574 ip-xfrm(8)) at index in XFRM "security path" for skb.
1575
1576 The retrieved value is stored in the struct
1577 bpf_xfrm_state pointed by xfrm_state and of length size.
1578
1579 All values for flags are reserved for future usage, and
1580 must be left at zero.
1581
1582 This helper is available only if the kernel was compiled
1583 with CONFIG_XFRM configuration option.
1584
1585 Return 0 on success, or a negative error in case of failure.
1586
1587 int bpf_get_stack(struct pt_regs *regs, void *buf, u32 size, u64 flags)
1588
1589 Description
1590 Return a user or a kernel stack in a buffer provided by
1591 the bpf program. To achieve this, the helper needs regs, which is
1592 a pointer to the context on which the tracing program is
1593 executed. To store the stacktrace, the bpf program pro‐
1594 vides buf with a nonnegative size.
1595
1596 The last argument, flags, holds the number of stack
1597 frames to skip (from 0 to 255), masked with
1598 BPF_F_SKIP_FIELD_MASK. The next bits can be used to set
1599 the following flags:
1600
1601 BPF_F_USER_STACK
1602 Collect a user space stack instead of a kernel
1603 stack.
1604
1605 BPF_F_USER_BUILD_ID
1606 Collect buildid+offset instead of ips for user
1607 stack, only valid if BPF_F_USER_STACK is also
1608 specified.
1609
1610 bpf_get_stack() can collect up to PERF_MAX_STACK_DEPTH
1611 kernel and user frames, subject to a sufficiently large
1612 buffer size. Note that this limit can be controlled with
1613 the sysctl program, and that it should be manually
1614 increased in order to profile long user stacks (such as
1615 stacks for Java programs). To do so, use:
1616
1617 # sysctl kernel.perf_event_max_stack=<new value>
1618
1619 Return A non-negative value equal to or less than size on suc‐
1620 cess, or a negative error in case of failure.
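
The layout of flags described above (skip count in the low bits, option bits above it) can be sketched as follows. The constants repeat the values found in include/uapi/linux/bpf.h so that the sketch builds without kernel headers; stack_flags() is a hypothetical convenience function, not part of any API.

```c
#include <stdint.h>

/* Values mirror include/uapi/linux/bpf.h; redefined here so the
 * sketch builds without kernel headers. */
#define BPF_F_SKIP_FIELD_MASK	0xffULL
#define BPF_F_USER_STACK	(1ULL << 8)
#define BPF_F_USER_BUILD_ID	(1ULL << 11)

/* Hypothetical helper: build the flags argument for bpf_get_stack(). */
static uint64_t stack_flags(unsigned int skip, int user, int build_id)
{
	/* The number of frames to skip lives in the low 8 bits. */
	uint64_t flags = (uint64_t)skip & BPF_F_SKIP_FIELD_MASK;

	if (user)
		flags |= BPF_F_USER_STACK;
	if (user && build_id)	/* only valid with BPF_F_USER_STACK */
		flags |= BPF_F_USER_BUILD_ID;
	return flags;
}
```

For instance, stack_flags(2, 1, 0) requests a user space stack with the two innermost frames skipped.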
1621
1622 int bpf_skb_load_bytes_relative(const struct sk_buff *skb, u32 offset,
1623 void *to, u32 len, u32 start_header)
1624
1625 Description
1626 This helper is similar to bpf_skb_load_bytes() in that it
1627 provides an easy way to load len bytes from offset from
1628 the packet associated to skb, into the buffer pointed by
1629 to. The difference to bpf_skb_load_bytes() is that a
1630 fifth argument start_header exists in order to select a
1631 base offset to start from. start_header can be one of:
1632
1633 BPF_HDR_START_MAC
1634 Base offset to load data from is skb's mac header.
1635
1636 BPF_HDR_START_NET
1637 Base offset to load data from is skb's network
1638 header.
1639
1640 In general, "direct packet access" is the preferred
1641 method to access packet data, however, this helper is in
1642 particular useful in socket filters where skb->data does
1643 not always point to the start of the mac header and where
1644 "direct packet access" is not available.
1645
1646 Return 0 on success, or a negative error in case of failure.
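
The role of start_header can be illustrated with a userspace model in which the packet is a flat buffer and the two header offsets are known. struct skb_model and model_load_bytes_relative() are hypothetical, and the choice of -EFAULT for out-of-range requests is an assumption made for the sketch.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Values mirror enum bpf_hdr_start_off in include/uapi/linux/bpf.h. */
enum { BPF_HDR_START_MAC = 0, BPF_HDR_START_NET = 1 };

/* Hypothetical model of an skb: one flat buffer plus header offsets. */
struct skb_model {
	const unsigned char *buf;
	uint32_t len;
	uint32_t mac_off;	/* start of the mac header within buf */
	uint32_t net_off;	/* start of the network header within buf */
};

static int model_load_bytes_relative(const struct skb_model *skb,
				     uint32_t offset, void *to, uint32_t len,
				     uint32_t start_header)
{
	uint32_t base;

	/* start_header selects the base that offset is relative to. */
	switch (start_header) {
	case BPF_HDR_START_MAC: base = skb->mac_off; break;
	case BPF_HDR_START_NET: base = skb->net_off; break;
	default: return -EFAULT;
	}
	if (base > skb->len || offset > skb->len - base ||
	    len > skb->len - base - offset)
		return -EFAULT;	/* request runs past the packet */
	memcpy(to, skb->buf + base + offset, len);
	return 0;
}
```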
1647
1648 int bpf_fib_lookup(void *ctx, struct bpf_fib_lookup *params, int plen,
1649 u32 flags)
1650
1651 Description
1652 Do FIB lookup in kernel tables using parameters in
1653 params. If lookup is successful and result shows packet
1654 is to be forwarded, the neighbor tables are searched for
1655 the nexthop. If successful (i.e., FIB lookup shows for‐
1656 warding and nexthop is resolved), the nexthop address is
1657 returned in ipv4_dst or ipv6_dst based on family, smac is
1658 set to mac address of egress device, dmac is set to nex‐
1659 thop mac address, rt_metric is set to metric from route
1660 (IPv4/IPv6 only), and ifindex is set to the device index
1661 of the nexthop from the FIB lookup.
1662
1663 The plen argument is the size of the passed-in struct. The
1664 flags argument can be a combination of one or more of the
1665 following values:
1666
1667 BPF_FIB_LOOKUP_DIRECT
1668 Do a direct table lookup vs full lookup using FIB
1669 rules.
1670
1671 BPF_FIB_LOOKUP_OUTPUT
1672 Perform lookup from an egress perspective (default
1673 is ingress).
1674
1675 ctx is either struct xdp_md for XDP programs or struct
1676 sk_buff for tc cls_act programs.
1677
1678 Return
1679
1680 · < 0 if any input argument is invalid
1681
1682 · 0 on success (packet is forwarded, nexthop neighbor
1683 exists)
1684
1685 · > 0 one of BPF_FIB_LKUP_RET_ codes explaining why the
1686 packet is not forwarded or needs assist from full stack
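
A caller typically branches on the three ranges of the return value listed above. The following sketch shows that branching only; enum fib_outcome and classify_fib_ret() are hypothetical names introduced for illustration.

```c
/* Hypothetical classification of a bpf_fib_lookup() return value. */
enum fib_outcome {
	FIB_BAD_INPUT,		/* < 0: an input argument was invalid */
	FIB_FORWARD,		/* 0: forward, nexthop neighbor exists */
	FIB_NOT_FORWARDED,	/* > 0: one of the BPF_FIB_LKUP_RET_ codes */
};

static enum fib_outcome classify_fib_ret(int ret)
{
	if (ret < 0)
		return FIB_BAD_INPUT;
	if (ret == 0)
		return FIB_FORWARD;
	return FIB_NOT_FORWARDED;
}
```

Only in the FIB_FORWARD case are the output fields of params (smac, dmac, ifindex) meaningful for rewriting and redirecting the packet.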
1687
1688 int bpf_sock_hash_update(struct bpf_sock_ops_kern *skops, struct
1689 bpf_map *map, void *key, u64 flags)
1690
1691 Description
1692 Add an entry to, or update a sockhash map referencing
1693 sockets. The skops is used as a new value for the entry
1694 associated to key. flags is one of:
1695
1696 BPF_NOEXIST
1697 The entry for key must not exist in the map.
1698
1699 BPF_EXIST
1700 The entry for key must already exist in the map.
1701
1702 BPF_ANY
1703 No condition on the existence of the entry for
1704 key.
1705
1706 If the map has eBPF programs (parser and verdict), those
1707 will be inherited by the socket being added. If the
1708 socket is already attached to eBPF programs, this results
1709 in an error.
1710
1711 Return 0 on success, or a negative error in case of failure.
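
The meaning of the three flag values can be summarized in a few lines of C. The constants repeat the values from include/uapi/linux/bpf.h; check_update_flags() is a hypothetical function, and the specific error codes are an assumption modelled on the behaviour of ordinary hash maps.

```c
#include <errno.h>
#include <stdint.h>

/* Values mirror include/uapi/linux/bpf.h. */
#define BPF_ANY		0
#define BPF_NOEXIST	1
#define BPF_EXIST	2

/* Hypothetical check: may an update proceed, given whether an entry
 * for the key already exists? Error codes are assumptions. */
static int check_update_flags(int exists, uint64_t flags)
{
	switch (flags) {
	case BPF_ANY:
		return 0;			/* no condition */
	case BPF_NOEXIST:
		return exists ? -EEXIST : 0;	/* must not exist yet */
	case BPF_EXIST:
		return exists ? 0 : -ENOENT;	/* must already exist */
	default:
		return -EINVAL;			/* exactly one flag value */
	}
}
```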
1712
1713 int bpf_msg_redirect_hash(struct sk_msg_buff *msg, struct bpf_map *map,
1714 void *key, u64 flags)
1715
1716 Description
1717 This helper is used in programs implementing policies at
1718 the socket level. If the message msg is allowed to pass
1719 (i.e. if the verdict eBPF program returns SK_PASS), redi‐
1720 rect it to the socket referenced by map (of type
1721 BPF_MAP_TYPE_SOCKHASH) using hash key. Both ingress and
1722 egress interfaces can be used for redirection. The
1723 BPF_F_INGRESS value in flags is used to make the distinc‐
1724 tion (ingress path is selected if the flag is present,
1725 egress path otherwise). This is the only flag supported
1726 for now.
1727
1728 Return SK_PASS on success, or SK_DROP on error.
1729
1730 int bpf_sk_redirect_hash(struct sk_buff *skb, struct bpf_map *map, void
1731 *key, u64 flags)
1732
1733 Description
1734 This helper is used in programs implementing policies at
1735 the skb socket level. If the sk_buff skb is allowed to
1736 pass (i.e. if the verdict eBPF program returns
1737 SK_PASS), redirect it to the socket referenced by map (of
1738 type BPF_MAP_TYPE_SOCKHASH) using hash key. Both ingress
1739 and egress interfaces can be used for redirection. The
1740 BPF_F_INGRESS value in flags is used to make the distinc‐
1741 tion (ingress path is selected if the flag is present,
1742 egress otherwise). This is the only flag supported for
1743 now.
1744
1745 Return SK_PASS on success, or SK_DROP on error.
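
The handling of flags described for both redirect helpers can be sketched as below. BPF_F_INGRESS repeats the value from include/uapi/linux/bpf.h; enum redirect_path and select_path() are hypothetical, and rejecting unknown flag bits with SK_DROP is an assumption consistent with the documented return values.

```c
#include <stdint.h>

/* Value mirrors include/uapi/linux/bpf.h. */
#define BPF_F_INGRESS	(1ULL << 0)

enum sk_action { SK_DROP = 0, SK_PASS };
enum redirect_path { PATH_EGRESS, PATH_INGRESS };	/* hypothetical */

/* Hypothetical model: pick the redirection path from flags. */
static int select_path(uint64_t flags, enum redirect_path *path)
{
	if (flags & ~BPF_F_INGRESS)
		return SK_DROP;	/* BPF_F_INGRESS is the only supported flag */
	*path = (flags & BPF_F_INGRESS) ? PATH_INGRESS : PATH_EGRESS;
	return SK_PASS;
}
```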
1746
1747 int bpf_lwt_push_encap(struct sk_buff *skb, u32 type, void *hdr, u32
1748 len)
1749
1750 Description
1751 Encapsulate the packet associated to skb within a Layer 3
1752 protocol header. This header is provided in the buffer at
1753 address hdr, with len its size in bytes. type indicates
1754 the protocol of the header and can be one of:
1755
1756 BPF_LWT_ENCAP_SEG6
1757 IPv6 encapsulation with Segment Routing Header
1758 (struct ipv6_sr_hdr). hdr only contains the SRH,
1759 the IPv6 header is computed by the kernel.
1760
1761 BPF_LWT_ENCAP_SEG6_INLINE
1762 Only works if skb contains an IPv6 packet. Insert
1763 a Segment Routing Header (struct ipv6_sr_hdr)
1764 inside the IPv6 header.
1765
1766 A call to this helper may change the underlying packet
1767 buffer. Therefore, at load time, all checks on pointers
1768 previously done by the verifier are invalidated and must
1769 be performed again, if the helper is used in combination
1770 with direct packet access.
1771
1772 Return 0 on success, or a negative error in case of failure.
1773
1774 int bpf_lwt_seg6_store_bytes(struct sk_buff *skb, u32 offset, const
1775 void *from, u32 len)
1776
1777 Description
1778 Store len bytes from address from into the packet associ‐
1779 ated to skb, at offset. Only the flags, tag and TLVs
1780 inside the outermost IPv6 Segment Routing Header can be
1781 modified through this helper.
1782
1783 A call to this helper may change the underlying packet
1784 buffer. Therefore, at load time, all checks on pointers
1785 previously done by the verifier are invalidated and must
1786 be performed again, if the helper is used in combination
1787 with direct packet access.
1788
1789 Return 0 on success, or a negative error in case of failure.
1790
1791 int bpf_lwt_seg6_adjust_srh(struct sk_buff *skb, u32 offset, s32 delta)
1792
1793 Description
1794 Adjust the size allocated to TLVs in the outermost IPv6
1795 Segment Routing Header contained in the packet associated
1796 to skb, at position offset by delta bytes. Only offsets
1797 after the segments are accepted. delta can be either
1798 positive (growing) or negative (shrinking).
1799
1800 A call to this helper may change the underlying packet
1801 buffer. Therefore, at load time, all checks on pointers
1802 previously done by the verifier are invalidated and must
1803 be performed again, if the helper is used in combination
1804 with direct packet access.
1805
1806 Return 0 on success, or a negative error in case of failure.
1807
1808 int bpf_lwt_seg6_action(struct sk_buff *skb, u32 action, void *param,
1809 u32 param_len)
1810
1811 Description
1812 Apply an IPv6 Segment Routing action of type action to
1813 the packet associated to skb. Each action takes a parame‐
1814 ter contained at address param, and of length param_len
1815 bytes. action can be one of:
1816
1817 SEG6_LOCAL_ACTION_END_X
1818 End.X action: Endpoint with Layer-3 cross-connect.
1819 Type of param: struct in6_addr.
1820
1821 SEG6_LOCAL_ACTION_END_T
1822 End.T action: Endpoint with specific IPv6 table
1823 lookup. Type of param: int.
1824
1825 SEG6_LOCAL_ACTION_END_B6
1826 End.B6 action: Endpoint bound to an SRv6 policy.
1827 Type of param: struct ipv6_sr_hdr.
1828
1829 SEG6_LOCAL_ACTION_END_B6_ENCAP
1830 End.B6.Encap action: Endpoint bound to an SRv6
1831 encapsulation policy. Type of param: struct
1832 ipv6_sr_hdr.
1833
1834 A call to this helper may change the underlying packet
1835 buffer. Therefore, at load time, all checks on pointers
1836 previously done by the verifier are invalidated and must
1837 be performed again, if the helper is used in combination
1838 with direct packet access.
1839
1840 Return 0 on success, or a negative error in case of failure.
1841
1842 int bpf_rc_keydown(void *ctx, u32 protocol, u64 scancode, u32 toggle)
1843
1844 Description
1845 This helper is used in programs implementing IR decoding,
1846 to report a successfully decoded key press with scancode,
1847 toggle value in the given protocol. The scancode will be
1848 translated to a keycode using the rc keymap, and reported
1849 as an input key down event. After a period a key up event
1850 is generated. This period can be extended by calling
1851 either bpf_rc_keydown() again with the same values, or
1852 calling bpf_rc_repeat().
1853
1854 Some protocols include a toggle bit, in case the button
1855 was released and pressed again between consecutive scan‐
1856 codes.
1857
1858 The ctx should point to the lirc sample as passed into
1859 the program.
1860
1861 The protocol is the decoded protocol number (see enum
1862 rc_proto for some predefined values).
1863
1864 This helper is only available if the kernel was compiled
1865 with the CONFIG_BPF_LIRC_MODE2 configuration option set
1866 to "y".
1867
1868 Return 0
1869
1870 int bpf_rc_repeat(void *ctx)
1871
1872 Description
1873 This helper is used in programs implementing IR decoding,
1874 to report a successfully decoded repeat key message. This
1875 delays the generation of a key up event for previously
1876 generated key down event.
1877
1878 Some IR protocols like NEC have a special IR message for
1879 repeating last button, for when a button is held down.
1880
1881 The ctx should point to the lirc sample as passed into
1882 the program.
1883
1884 This helper is only available if the kernel was compiled
1885 with the CONFIG_BPF_LIRC_MODE2 configuration option set
1886 to "y".
1887
1888 Return 0
1889
1890 u64 bpf_skb_cgroup_id(struct sk_buff *skb)
1891
1892 Description
1893 Return the cgroup v2 id of the socket associated with the
1894 skb. This is roughly similar to the bpf_get_cgroup_clas‐
1895 sid() helper for cgroup v1 in that it provides a tag or
1896 identifier that can be matched on or used for map lookups,
1897 e.g. to implement policy. The cgroup v2 id of a given
1898 path in the hierarchy is exposed in user space through
1899 the f_handle API in order to get to the same 64-bit id.
1900
1901 This helper can be used on TC egress path, but not on
1902 ingress, and is available only if the kernel was compiled
1903 with the CONFIG_SOCK_CGROUP_DATA configuration option.
1904
1905 Return The id is returned or 0 in case the id could not be
1906 retrieved.
1907
1908 u64 bpf_skb_ancestor_cgroup_id(struct sk_buff *skb, int ancestor_level)
1909
1910 Description
1911 Return id of cgroup v2 that is ancestor of cgroup associ‐
1912 ated with the skb at the ancestor_level. The root cgroup
1913 is at ancestor_level zero and each step down the hierar‐
1914 chy increments the level. If ancestor_level == level of
1915 cgroup associated with skb, then return value will be
1916 same as that of bpf_skb_cgroup_id().
1917
1918 The helper is useful to implement policies based on
1919 cgroups that are upper in hierarchy than immediate cgroup
1920 associated with skb.
1921
1922 The format of returned id and helper limitations are same
1923 as in bpf_skb_cgroup_id().
1924
1925 Return The id is returned or 0 in case the id could not be
1926 retrieved.
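
The level numbering can be illustrated with a userspace model in which a cgroup is represented by the list of ids on its path from the root. struct cgroup_model and model_ancestor_id() are hypothetical, and returning 0 for a level deeper than the cgroup itself is an assumption based on the Return text above.

```c
#include <stdint.h>

/* Hypothetical model: ids[0] is the root cgroup, ids[level] is the
 * cgroup associated with the skb. */
struct cgroup_model {
	const uint64_t *ids;
	int level;
};

static uint64_t model_ancestor_id(const struct cgroup_model *cg,
				  int ancestor_level)
{
	if (ancestor_level < 0 || ancestor_level > cg->level)
		return 0;	/* no such ancestor: id not retrieved */
	return cg->ids[ancestor_level];
}
```

With ancestor_level equal to cg->level the model returns the id of the cgroup itself, matching the statement that the result is then the same as that of bpf_skb_cgroup_id().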
1927
1928 u64 bpf_get_current_cgroup_id(void)
1929
1930 Return A 64-bit integer containing the current cgroup id based
1931 on the cgroup within which the current task is running.
1932
1933 void *bpf_get_local_storage(void *map, u64 flags)
1934
1935 Description
1936 Get the pointer to the local storage area. The type and
1937 the size of the local storage is defined by the map argu‐
1938 ment. The flags meaning is specific for each map type,
1939 and has to be 0 for cgroup local storage.
1940
1941 Depending on the BPF program type, a local storage area
1942 can be shared between multiple instances of the BPF pro‐
1943 gram, running simultaneously.
1944
1945 The user is responsible for synchronizing access to the
1946 shared data, for example by using the BPF_STX_XADD
1947 instruction to alter it.
1948
1949 Return A pointer to the local storage area.
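
In C source, the atomic add mentioned above is usually written with a compiler builtin, which BPF compilers commonly translate to BPF_STX_XADD when the result is unused. The sketch below is plain userspace C; struct storage_model and count_packet() are hypothetical, and in a real program the pointer would come from the helper rather than from the caller.

```c
#include <stdint.h>

/* Hypothetical layout of a local storage area. */
struct storage_model {
	uint64_t packets;
};

static void count_packet(struct storage_model *s)
{
	/* Atomic increment: safe even if several program instances
	 * update the same storage area at the same time. */
	__sync_fetch_and_add(&s->packets, 1);
}
```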
1950
1951 int bpf_sk_select_reuseport(struct sk_reuseport_md *reuse, struct
1952 bpf_map *map, void *key, u64 flags)
1953
1954 Description
1955 Select a SO_REUSEPORT socket from a BPF_MAP_TYPE_REUSE‐
1956 PORT_ARRAY map. It checks that the selected socket
1957 matches the incoming request in the socket buffer.
1958
1959 Return 0 on success, or a negative error in case of failure.
1960
1961 struct bpf_sock *bpf_sk_lookup_tcp(void *ctx, struct bpf_sock_tuple
1962 *tuple, u32 tuple_size, u64 netns, u64 flags)
1963
1964 Description
1965 Look for TCP socket matching tuple, optionally in a child
1966 network namespace netns. The return value must be
1967 checked, and if non-NULL, released via bpf_sk_release().
1968
1969 The ctx should point to the context of the program, such
1970 as the skb or socket (depending on the hook in use). This
1971 is used to determine the base network namespace for the
1972 lookup.
1973
1974 tuple_size must be one of:
1975
1976 sizeof(tuple->ipv4)
1977 Look for an IPv4 socket.
1978
1979 sizeof(tuple->ipv6)
1980 Look for an IPv6 socket.
1981
1982 If the netns is a negative signed 32-bit integer, then
1983 the socket lookup table in the netns associated with the
1984 ctx will be used. For the TC hooks, this is the
1985 netns of the device in the skb. For socket hooks, this is
1986 the netns of the socket. If netns is any other signed
1987 32-bit value greater than or equal to zero then it speci‐
1988 fies the ID of the netns relative to the netns associated
1989 with the ctx. netns values beyond the range of 32-bit
1990 integers are reserved for future use.
1991
1992 All values for flags are reserved for future usage, and
1993 must be left at zero.
1994
1995 This helper is available only if the kernel was compiled
1996 with CONFIG_NET configuration option.
1997
1998 Return Pointer to struct bpf_sock, or NULL in case of failure.
1999 For sockets with reuseport option, the struct bpf_sock
2000 result is from reuse->socks[] using the hash of the
2001 tuple.
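
The three cases for netns described above can be written down as a small classifier. This sketch follows the wording of this page and is not the kernel's own check; enum netns_kind and classify_netns() are hypothetical names, and the u64 argument is reinterpreted as a signed 64-bit value to decide whether it fits in a signed 32-bit integer.

```c
#include <stdint.h>

enum netns_kind {		/* hypothetical */
	NETNS_CURRENT,		/* negative: use the netns of ctx */
	NETNS_BY_ID,		/* >= 0: netns ID relative to ctx's netns */
	NETNS_RESERVED,		/* outside the signed 32-bit range */
};

static enum netns_kind classify_netns(uint64_t netns)
{
	/* Assumes two's complement conversion, as on all Linux targets. */
	int64_t v = (int64_t)netns;

	if (v < INT32_MIN || v > INT32_MAX)
		return NETNS_RESERVED;
	return v < 0 ? NETNS_CURRENT : NETNS_BY_ID;
}
```

The same reading applies to bpf_sk_lookup_udp(), whose netns argument is described in identical terms.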
2002
2003 struct bpf_sock *bpf_sk_lookup_udp(void *ctx, struct bpf_sock_tuple
2004 *tuple, u32 tuple_size, u64 netns, u64 flags)
2005
2006 Description
2007 Look for UDP socket matching tuple, optionally in a child
2008 network namespace netns. The return value must be
2009 checked, and if non-NULL, released via bpf_sk_release().
2010
2011 The ctx should point to the context of the program, such
2012 as the skb or socket (depending on the hook in use). This
2013 is used to determine the base network namespace for the
2014 lookup.
2015
2016 tuple_size must be one of:
2017
2018 sizeof(tuple->ipv4)
2019 Look for an IPv4 socket.
2020
2021 sizeof(tuple->ipv6)
2022 Look for an IPv6 socket.
2023
2024 If the netns is a negative signed 32-bit integer, then
2025 the socket lookup table in the netns associated with the
2026 ctx will be used. For the TC hooks, this is the
2027 netns of the device in the skb. For socket hooks, this is
2028 the netns of the socket. If netns is any other signed
2029 32-bit value greater than or equal to zero then it speci‐
2030 fies the ID of the netns relative to the netns associated
2031 with the ctx. netns values beyond the range of 32-bit
2032 integers are reserved for future use.
2033
2034 All values for flags are reserved for future usage, and
2035 must be left at zero.
2036
2037 This helper is available only if the kernel was compiled
2038 with CONFIG_NET configuration option.
2039
2040 Return Pointer to struct bpf_sock, or NULL in case of failure.
2041 For sockets with reuseport option, the struct bpf_sock
2042 result is from reuse->socks[] using the hash of the
2043 tuple.
2044
2045 int bpf_sk_release(struct bpf_sock *sock)
2046
2047 Description
2048 Release the reference held by sock. sock must be a
2049 non-NULL pointer that was returned from
2050 bpf_sk_lookup_xxx().
2051
2052 Return 0 on success, or a negative error in case of failure.
2053
2054 int bpf_map_pop_elem(struct bpf_map *map, void *value)
2055
2056 Description
2057 Pop an element from map.
2058
2059 Return 0 on success, or a negative error in case of failure.
2060
2061 int bpf_map_peek_elem(struct bpf_map *map, void *value)
2062
2063 Description
2064 Get an element from map without removing it.
2065
2066 Return 0 on success, or a negative error in case of failure.
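
The difference between bpf_map_pop_elem() and bpf_map_peek_elem() can be shown with a userspace model of a small FIFO queue. Everything here is hypothetical (struct queue_model, the model_*() functions, the capacity), and the error codes for an empty or full queue are assumptions.

```c
#include <errno.h>
#include <stdint.h>

#define QCAP 4	/* hypothetical capacity */

struct queue_model {
	uint32_t items[QCAP];
	unsigned int head, count;
};

static int model_push(struct queue_model *q, uint32_t v)
{
	if (q->count == QCAP)
		return -E2BIG;
	q->items[(q->head + q->count) % QCAP] = v;
	q->count++;
	return 0;
}

/* Peek: copy the oldest element out, leave the queue unchanged. */
static int model_peek(const struct queue_model *q, uint32_t *value)
{
	if (!q->count)
		return -ENOENT;
	*value = q->items[q->head];
	return 0;
}

/* Pop: like peek, but also remove the element. */
static int model_pop(struct queue_model *q, uint32_t *value)
{
	int err = model_peek(q, value);

	if (err)
		return err;
	q->head = (q->head + 1) % QCAP;
	q->count--;
	return 0;
}
```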
2067
2068 int bpf_msg_push_data(struct sk_msg_buff *msg, u32 start, u32 len, u64
2069 flags)
2070
2071 Description
2072 For socket policies, insert len bytes into msg at offset
2073 start.
2074
2075 If a program of type BPF_PROG_TYPE_SK_MSG is run on a msg
2076 it may want to insert metadata or options into the msg.
2077 This can later be read and used by any of the lower layer
2078 BPF hooks.
2079
2080 This helper may fail under memory pressure (if a memory
2081 allocation fails). In these cases the BPF program gets an
2082 appropriate error, which it needs to handle.
2083
2084 Return 0 on success, or a negative error in case of failure.
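
The effect of inserting len bytes at offset start, including the allocation failure case, can be modelled in userspace as follows. struct msg_model and model_push_data() are hypothetical; zero-filling the inserted bytes and the error codes are assumptions made for the sketch.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a message: one heap buffer. */
struct msg_model {
	unsigned char *data;
	size_t len;
};

/* Open a gap of len bytes at offset start (zero-filled here). */
static int model_push_data(struct msg_model *m, size_t start, size_t len)
{
	unsigned char *grown;

	if (start > m->len)
		return -EINVAL;
	if (len == 0)
		return 0;
	grown = realloc(m->data, m->len + len);
	if (!grown)
		return -ENOMEM;	/* caller must handle this, data is intact */
	memmove(grown + start + len, grown + start, m->len - start);
	memset(grown + start, 0, len);
	m->data = grown;
	m->len += len;
	return 0;
}
```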
2085
2086 int bpf_msg_pop_data(struct sk_msg_buff *msg, u32 start, u32 pop, u64
2087 flags)
2088
2089 Description
2090 Remove pop bytes from a msg starting at byte start.
2091 This may result in ENOMEM errors under certain situations
2092 if an allocation and copy are required due to a full ring
2093 buffer. However, the helper will try to avoid doing the
2094 allocation if possible. Other errors can occur if input
2095 parameters are invalid, either because the start byte is
2096 not a valid part of the msg payload or because the pop
2097 value is too large.
2098
2099 Return 0 on success, or a negative error in case of failure.
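
The two input checks named above (start must lie within the payload, pop must not run past its end) are shown in this userspace model. model_pop_data() is hypothetical, operates on a flat buffer, and uses -EINVAL as an assumed error code.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Remove pop bytes starting at offset start from a flat buffer. */
static int model_pop_data(unsigned char *data, size_t *len,
			  size_t start, size_t pop)
{
	if (start > *len || pop > *len - start)
		return -EINVAL;	/* start outside payload or pop too large */
	memmove(data + start, data + start + pop, *len - start - pop);
	*len -= pop;
	return 0;
}
```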
2100
2101 int bpf_rc_pointer_rel(void *ctx, s32 rel_x, s32 rel_y)
2102
2103 Description
2104 This helper is used in programs implementing IR decoding,
2105 to report a successfully decoded pointer movement.
2106
2107 The ctx should point to the lirc sample as passed into
2108 the program.
2109
2110 This helper is only available if the kernel was compiled
2111 with the CONFIG_BPF_LIRC_MODE2 configuration option set
2112 to "y".
2113
2114 Return 0
2115
2117 Examples of usage of most of the eBPF helpers listed in this manual page
2118 are available within the Linux kernel sources, at the following loca‐
2119 tions:
2120
2121 · samples/bpf/
2122
2123 · tools/testing/selftests/bpf/
2124
2126 eBPF programs can have an associated license, passed along with the
2127 bytecode instructions to the kernel when the programs are loaded. The
2128 format for that string is identical to the one in use for kernel mod‐
2129 ules (Dual licenses, such as "Dual BSD/GPL", may be used). Some helper
2130 functions are only accessible to programs that are compatible with the
2131 GNU General Public License (GPL).
2132
2133 In order to use such helpers, the eBPF program must be loaded with the
2134 correct license string passed (via attr) to the bpf() system call, and
2135 this generally translates into the C source code of the program con‐
2136 taining a line similar to the following:
2137
2138 char ____license[] __attribute__((section("license"), used)) = "GPL";
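
Since the license string uses the same format as for kernel modules, a program loader can pre-check it in userspace. The sketch below is an illustration only: model_license_is_gpl_compatible() is a hypothetical name, and the list of strings is an assumption based on those accepted for kernel modules (see include/linux/license.h), which may differ between kernel versions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace check of a license string. The list is an
 * assumption modelled on the strings accepted for kernel modules. */
static int model_license_is_gpl_compatible(const char *license)
{
	static const char *const ok[] = {
		"GPL", "GPL v2", "GPL and additional rights",
		"Dual BSD/GPL", "Dual MIT/GPL", "Dual MPL/GPL",
	};
	size_t i;

	for (i = 0; i < sizeof(ok) / sizeof(ok[0]); i++)
		if (strcmp(license, ok[i]) == 0)
			return 1;
	return 0;
}
```

The comparison is exact, so a string such as "gpl" would not be treated as compatible in this model.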
2139
2141 This manual page is an effort to document the existing eBPF helper
2142 functions. But as of this writing, the BPF sub-system is under heavy
2143 development. New eBPF program or map types are added, along with new
2144 helper functions. Some helpers are occasionally made available for
2145 additional program types. So in spite of the efforts of the community,
2146 this page might not be up-to-date. If you want to check for yourself
2147 what helper functions exist in your kernel, or what types of programs
2148 they can support, here are some files in the kernel tree that you
2149 may be interested in:
2150
2151 · include/uapi/linux/bpf.h is the main BPF header. It contains the full
2152 list of all helper functions, as well as many other BPF definitions
2153 including most of the flags, structs or constants used by the
2154 helpers.
2155
2156 · net/core/filter.c contains the definition of most network-related
2157 helper functions, and the list of program types from which they can
2158 be used.
2159
2160 · kernel/trace/bpf_trace.c is the equivalent for most tracing pro‐
2161 gram-related helpers.
2162
2163 · kernel/bpf/verifier.c contains the functions used to check that valid
2164 types of eBPF maps are used with a given helper function.
2165
2166 · kernel/bpf/ directory contains other files in which additional
2167 helpers are defined (for cgroups, sockmaps, etc.).
2168
2169 Compatibility between helper functions and program types can generally
2170 be found in the files where helper functions are defined. Look for the
2171 struct bpf_func_proto objects and for functions returning them: these
2172 functions contain a list of helpers that a given program type can call.
2173 Note that the default: label of the switch ... case used to filter
2174 helpers can call other functions, themselves allowing access to addi‐
2175 tional helpers. The requirement for GPL license is also in those struct
2176 bpf_func_proto.
2177
2178 Compatibility between helper functions and map types can be found in
2179 the check_map_func_compatibility() function in file kernel/bpf/veri‐
2180 fier.c.
2181
2182 Helper functions that invalidate the checks on data and data_end point‐
2183 ers for network processing are listed in function
2184 bpf_helper_changes_pkt_data() in file net/core/filter.c.
2185
2187 bpf(2), cgroups(7), ip(8), perf_event_open(2), sendmsg(2), socket(7),
2188 tc-bpf(8)
2189
2191 This page is part of release 5.02 of the Linux man-pages project. A
2192 description of the project, information about reporting bugs, and the
2193 latest version of this page, can be found at
2194 https://www.kernel.org/doc/man-pages/.
2195
2196
2197
2198Linux 2019-03-06 BPF-HELPERS(7)