BPF-HELPERS(7)

NAME
BPF-HELPERS - list of eBPF helper functions

DESCRIPTION
The extended Berkeley Packet Filter (eBPF) subsystem consists of
programs written in a pseudo-assembly language, attached to one of
several kernel hooks, and run in reaction to specific events. This
framework differs from the older, "classic" BPF (or "cBPF") in several
aspects, one of them being the ability to call special functions (or
"helpers") from within a program. These functions are restricted to a
white-list of helpers defined in the kernel.
16
These helpers are used by eBPF programs to interact with the system, or
with the context in which they work. For instance, they can be used to
print debugging messages, to get the time since the system was booted,
to interact with eBPF maps, or to manipulate network packets. Since
there are several eBPF program types that do not all run in the same
context, each program type can only call a subset of those helpers.
24
Due to eBPF conventions, a helper cannot have more than five
arguments.

Internally, eBPF programs call directly into the compiled helper
functions without requiring any foreign-function interface. As a
result, calling a helper adds no foreign-function-interface overhead,
which keeps helper calls cheap.
32
33 This document is an attempt to list and document the helpers available
34 to eBPF developers. They are sorted by chronological order (the oldest
35 helpers in the kernel at the top).

FUNCTIONS

38 void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
39
40 Description
41 Perform a lookup in map for an entry associated to key.
42
43 Return Map value associated to key, or NULL if no entry was
44 found.
45
46 int bpf_map_update_elem(struct bpf_map *map, const void *key, const
47 void *value, u64 flags)
48
49 Description
50 Add or update the value of the entry associated to key in
51 map with value. flags is one of:
52
53 BPF_NOEXIST
54 The entry for key must not exist in the map.
55
56 BPF_EXIST
57 The entry for key must already exist in the map.
58
59 BPF_ANY
60 No condition on the existence of the entry for
61 key.
62
Flag value BPF_NOEXIST cannot be used for maps of types
BPF_MAP_TYPE_ARRAY or BPF_MAP_TYPE_PERCPU_ARRAY: all elements
always exist, so the helper would return an error.
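
The flag semantics described above can be illustrated with a small
user-space model. This is only a sketch under stated assumptions:
toy_map, toy_find() and toy_map_update() are hypothetical names, the
"map" is a fixed-size linear table rather than a real BPF map, and the
specific error codes (-EEXIST, -ENOENT, -E2BIG, -EINVAL) are an
assumption about typical hash-map behaviour, not something this page
specifies. The TOY_* constants mirror the numeric values of BPF_ANY,
BPF_NOEXIST and BPF_EXIST.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Same numeric values as BPF_ANY, BPF_NOEXIST, BPF_EXIST; redefined
 * locally so the sketch builds without kernel headers. */
enum { TOY_ANY = 0, TOY_NOEXIST = 1, TOY_EXIST = 2 };

#define TOY_MAX_ENTRIES 4

struct toy_entry { int used; uint32_t key; uint64_t value; };
struct toy_map { struct toy_entry slots[TOY_MAX_ENTRIES]; };

/* Return the slot holding key, or NULL if the key is absent. */
static struct toy_entry *toy_find(struct toy_map *m, uint32_t key)
{
    for (int i = 0; i < TOY_MAX_ENTRIES; i++)
        if (m->slots[i].used && m->slots[i].key == key)
            return &m->slots[i];
    return NULL;
}

/* Model of the update rules: 0 on success, negative errno on failure. */
static int toy_map_update(struct toy_map *m, uint32_t key,
                          uint64_t value, uint64_t flags)
{
    struct toy_entry *e;

    if (flags > TOY_EXIST)
        return -EINVAL;              /* unknown flag value */

    e = toy_find(m, key);
    if (e && flags == TOY_NOEXIST)
        return -EEXIST;              /* entry must not exist, but does */
    if (!e && flags == TOY_EXIST)
        return -ENOENT;              /* entry must exist, but does not */

    if (!e) {
        /* Claim the first free slot for the new key. */
        for (int i = 0; i < TOY_MAX_ENTRIES && !e; i++)
            if (!m->slots[i].used)
                e = &m->slots[i];
        if (!e)
            return -E2BIG;           /* table is full */
        e->used = 1;
        e->key = key;
    }
    e->value = value;
    return 0;
}
```

With an empty table, an update with TOY_EXIST fails, TOY_NOEXIST
creates the entry, a second TOY_NOEXIST fails, and TOY_ANY overwrites
the value.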
66
67 Return 0 on success, or a negative error in case of failure.
68
69 int bpf_map_delete_elem(struct bpf_map *map, const void *key)
70
71 Description
72 Delete entry with key from map.
73
74 Return 0 on success, or a negative error in case of failure.
75
76 int bpf_probe_read(void *dst, u32 size, const void *unsafe_ptr)
77
78 Description
79 For tracing programs, safely attempt to read size bytes
80 from kernel space address unsafe_ptr and store the data
81 in dst.
82
83 Generally, use bpf_probe_read_user() or
84 bpf_probe_read_kernel() instead.
85
86 Return 0 on success, or a negative error in case of failure.
87
88 u64 bpf_ktime_get_ns(void)
89
90 Description
91 Return the time elapsed since system boot, in nanosec‐
92 onds. Does not include time the system was suspended.
93 See: clock_gettime(CLOCK_MONOTONIC)
94
95 Return Current ktime.
96
97 int bpf_trace_printk(const char *fmt, u32 fmt_size, ...)
98
99 Description
This helper is a "printk()-like" facility for debugging. It prints
a message defined by format fmt (of size fmt_size) to file
/sys/kernel/debug/tracing/trace from DebugFS, if available. It can
take up to three additional u64 arguments (as with any eBPF helper,
the total number of arguments is limited to five).

Each time the helper is called, it appends a line to the trace.
Lines are discarded while /sys/kernel/debug/tracing/trace is open;
use /sys/kernel/debug/tracing/trace_pipe to avoid this. The format
of the trace is customizable, and the exact output one will get
depends on the options set in /sys/kernel/debug/tracing/trace_options
(see also the README file under the same directory). However, it
usually defaults to something like:

telnet-470 [001] .N.. 419421.045894: 0x00000001: <formatted msg>
118
119 In the above:
120
121 · telnet is the name of the current task.
122
123 · 470 is the PID of the current task.
124
125 · 001 is the CPU number on which the task is running.
126
127 · In .N.., each character refers to a set of options
128 (whether irqs are enabled, scheduling options,
129 whether hard/softirqs are running, level of pre‐
130 empt_disabled respectively). N means that
131 TIF_NEED_RESCHED and PREEMPT_NEED_RESCHED are set.
132
133 · 419421.045894 is a timestamp.
134
135 · 0x00000001 is a fake value used by BPF for the
136 instruction pointer register.
137
138 · <formatted msg> is the message formatted with fmt.
139
The conversion specifiers supported by fmt are similar to, but more
limited than, those of printk(). They are %d, %i, %u, %x, %ld, %li,
%lu, %lx, %lld, %lli, %llu, %llx, %p, %s. No modifier (size of
field, padding with zeroes, etc.) is available, and the helper will
return -EINVAL (but print nothing) if it encounters an unknown
specifier.
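
To make these restrictions concrete, here is a user-space sketch that
checks a format string against the list above. It is a model of the
rules as stated on this page, not the kernel's parser: toy_check_fmt()
is a hypothetical name, and treating anything not listed (including
"%%") as an unknown specifier is an assumption.

```c
#include <assert.h>
#include <errno.h>

/* Return the number of conversions in fmt (0 to 3), or -EINVAL if fmt
 * uses an unsupported specifier, a modifier, or more than three
 * conversions. */
static int toy_check_fmt(const char *fmt)
{
    int count = 0;

    for (const char *p = fmt; *p; p++) {
        int longs = 0;

        if (*p != '%')
            continue;
        p++;

        /* Optional length prefix: "l" or "ll". */
        while (*p == 'l' && longs < 2) {
            p++;
            longs++;
        }

        switch (*p) {
        case 'd': case 'i': case 'u': case 'x':
            break;                   /* valid with or without l/ll */
        case 'p': case 's':
            if (longs)
                return -EINVAL;      /* %lp, %ls etc. are not listed */
            break;
        default:
            return -EINVAL;          /* unknown specifier or end of string */
        }

        if (++count > 3)
            return -EINVAL;          /* at most three extra arguments */
    }
    return count;
}
```

For instance, "pid %d comm %s\n" is accepted with two conversions,
while "%05d" is rejected because of the padding modifier.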
146
Also, note that bpf_trace_printk() is slow, and should only be used
for debugging purposes. For this reason, a notice block (spanning
several lines) is printed to kernel logs and states that the helper
should not be used "for production use" the first time this helper
is used (or more precisely, when trace_printk() buffers are
allocated). For passing values to user space, perf events should be
preferred.
155
156 Return The number of bytes written to the buffer, or a negative
157 error in case of failure.
158
159 u32 bpf_get_prandom_u32(void)
160
161 Description
162 Get a pseudo-random number.
163
164 From a security point of view, this helper uses its own
165 pseudo-random internal state, and cannot be used to infer
166 the seed of other random functions in the kernel. How‐
167 ever, it is essential to note that the generator used by
168 the helper is not cryptographically secure.
169
170 Return A random 32-bit unsigned value.
171
172 u32 bpf_get_smp_processor_id(void)
173
174 Description
Get the SMP (symmetric multiprocessing) processor id. Note that all
programs run with preemption disabled, which means that the SMP
processor id is stable throughout the execution of the program.
179
180 Return The SMP id of the processor running the program.
181
182 int bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void
183 *from, u32 len, u64 flags)
184
185 Description
Store len bytes from address from into the packet associated with
skb, at offset. flags are a combination of BPF_F_RECOMPUTE_CSUM
(automatically recompute the checksum for the packet after storing
the bytes) and BPF_F_INVALIDATE_HASH (set skb->hash, skb->swhash and
skb->l4hash to 0).

A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by
the verifier are invalidated and must be performed again if the
helper is used in combination with direct packet access.
198
199 Return 0 on success, or a negative error in case of failure.
200
201 int bpf_l3_csum_replace(struct sk_buff *skb, u32 offset, u64 from, u64
202 to, u64 size)
203
204 Description
205 Recompute the layer 3 (e.g. IP) checksum for the packet
206 associated to skb. Computation is incremental, so the
207 helper must know the former value of the header field
208 that was modified (from), the new value of this field
209 (to), and the number of bytes (2 or 4) for this field,
210 stored in size. Alternatively, it is possible to store
211 the difference between the previous and the new values of
212 the header field in to, by setting from and size to 0.
213 For both methods, offset indicates the location of the IP
214 checksum within the packet.
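
The incremental computation described above can be sketched in user
space with the update rule from RFC 1624 (HC' = ~(~HC + ~m + m')).
This is an illustration of the arithmetic only, not the kernel's
implementation: fold16(), csum_full() and csum_replace2() are
hypothetical names, and the sketch only covers a 2-byte field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit ones'-complement accumulator down to 16 bits. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full Internet checksum over n 16-bit words (n is assumed small
 * enough that the 32-bit accumulator cannot overflow). */
static uint16_t csum_full(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += words[i];
    return (uint16_t)~fold16(sum);
}

/* Incremental update for one 2-byte field: "from" is the former value
 * of the field, "to" is its new value, "check" is the old checksum. */
static uint16_t csum_replace2(uint16_t check, uint16_t from, uint16_t to)
{
    uint32_t sum = (uint16_t)~check;

    sum += (uint16_t)~from;
    sum += to;
    return (uint16_t)~fold16(sum);
}
```

Changing one header word and calling csum_replace2() with the old
checksum, the old word and the new word should give the same result
as recomputing csum_full() over the modified header, without touching
the other words.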
215
216 This helper works in combination with bpf_csum_diff(),
217 which does not update the checksum in-place, but offers
218 more flexibility and can handle sizes larger than 2 or 4
219 for the checksum to update.
220
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by
the verifier are invalidated and must be performed again if the
helper is used in combination with direct packet access.
226
227 Return 0 on success, or a negative error in case of failure.
228
229 int bpf_l4_csum_replace(struct sk_buff *skb, u32 offset, u64 from, u64
230 to, u64 flags)
231
232 Description
233 Recompute the layer 4 (e.g. TCP, UDP or ICMP) checksum
234 for the packet associated to skb. Computation is incre‐
235 mental, so the helper must know the former value of the
236 header field that was modified (from), the new value of
237 this field (to), and the number of bytes (2 or 4) for
238 this field, stored on the lowest four bits of flags.
239 Alternatively, it is possible to store the difference
240 between the previous and the new values of the header
241 field in to, by setting from and the four lowest bits of
flags to 0. For both methods, offset indicates the location of the
layer 4 checksum within the packet. In addition to the size of the
field, actual flags can be combined into flags with a bitwise OR.
With BPF_F_MARK_MANGLED_0, a null checksum is left untouched (unless
BPF_F_MARK_ENFORCE is added as well), and for updates resulting in a
null checksum the value is set to CSUM_MANGLED_0 instead. Flag
BPF_F_PSEUDO_HDR indicates the checksum is to be computed against a
pseudo-header.
251
252 This helper works in combination with bpf_csum_diff(),
253 which does not update the checksum in-place, but offers
254 more flexibility and can handle sizes larger than 2 or 4
255 for the checksum to update.
256
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by
the verifier are invalidated and must be performed again if the
helper is used in combination with direct packet access.
262
263 Return 0 on success, or a negative error in case of failure.
264
265 int bpf_tail_call(void *ctx, struct bpf_map *prog_array_map, u32 index)
266
267 Description
268 This special helper is used to trigger a "tail call", or
269 in other words, to jump into another eBPF program. The
270 same stack frame is used (but values on stack and in reg‐
271 isters for the caller are not accessible to the callee).
272 This mechanism allows for program chaining, either for
273 raising the maximum number of available eBPF instruc‐
274 tions, or to execute given programs in conditional
275 blocks. For security reasons, there is an upper limit to
276 the number of successive tail calls that can be per‐
277 formed.
278
279 Upon call of this helper, the program attempts to jump
280 into a program referenced at index index in
281 prog_array_map, a special map of type
282 BPF_MAP_TYPE_PROG_ARRAY, and passes ctx, a pointer to the
283 context.
284
285 If the call succeeds, the kernel immediately runs the
286 first instruction of the new program. This is not a func‐
287 tion call, and it never returns to the previous program.
288 If the call fails, then the helper has no effect, and the
289 caller continues to run its subsequent instructions. A
call can fail if the destination program for the jump does not exist
(i.e. index is greater than or equal to the number of entries in
prog_array_map), or if the maximum number of tail calls has been
reached for this chain of programs.
294 This limit is defined in the kernel by the macro
295 MAX_TAIL_CALL_CNT (not accessible to user space), which
296 is currently set to 32.
297
298 Return 0 on success, or a negative error in case of failure.
299
300 int bpf_clone_redirect(struct sk_buff *skb, u32 ifindex, u64 flags)
301
302 Description
303 Clone and redirect the packet associated to skb to
304 another net device of index ifindex. Both ingress and
305 egress interfaces can be used for redirection. The
306 BPF_F_INGRESS value in flags is used to make the distinc‐
307 tion (ingress path is selected if the flag is present,
308 egress path otherwise). This is the only flag supported
309 for now.
310
In comparison with the bpf_redirect() helper, bpf_clone_redirect()
has the associated cost of duplicating the packet buffer, but the
redirection can be executed from within the eBPF program.
Conversely, bpf_redirect() is more efficient, but it is handled
through an action code where the redirection happens only after the
eBPF program has returned.

A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by
the verifier are invalidated and must be performed again if the
helper is used in combination with direct packet access.
323
324 Return 0 on success, or a negative error in case of failure.
325
326 u64 bpf_get_current_pid_tgid(void)
327
Return A 64-bit integer containing the current tgid and pid, and
created as such: current_task->tgid << 32 | current_task->pid.
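
The packing described in the return value can be undone with a shift
and a truncation. The following user-space sketch shows both
directions; pack_pid_tgid(), tgid_of() and pid_of() are hypothetical
names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Build the 64-bit value the way the return value is described:
 * tgid in the upper 32 bits, pid in the lower 32 bits. */
static uint64_t pack_pid_tgid(uint32_t tgid, uint32_t pid)
{
    return ((uint64_t)tgid << 32) | pid;
}

/* Upper half: the thread group id. */
static uint32_t tgid_of(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid >> 32);
}

/* Lower half: the kernel pid (thread id). */
static uint32_t pid_of(uint64_t pid_tgid)
{
    return (uint32_t)pid_tgid;
}
```

In an eBPF program the same shift and truncation would be applied to
the value returned by the helper.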
331
332 u64 bpf_get_current_uid_gid(void)
333
334 Return A 64-bit integer containing the current GID and UID, and
335 created as such: current_gid << 32 | current_uid.
336
337 int bpf_get_current_comm(void *buf, u32 size_of_buf)
338
339 Description
340 Copy the comm attribute of the current task into buf of
341 size_of_buf. The comm attribute contains the name of the
342 executable (excluding the path) for the current task. The
343 size_of_buf must be strictly positive. On success, the
344 helper makes sure that the buf is NUL-terminated. On
345 failure, it is filled with zeroes.
346
347 Return 0 on success, or a negative error in case of failure.
348
349 u32 bpf_get_cgroup_classid(struct sk_buff *skb)
350
351 Description
352 Retrieve the classid for the current task, i.e. for the
353 net_cls cgroup to which skb belongs.
354
355 This helper can be used on TC egress path, but not on
356 ingress.
357
358 The net_cls cgroup provides an interface to tag network
359 packets based on a user-provided identifier for all traf‐
360 fic coming from the tasks belonging to the related
361 cgroup. See also the related kernel documentation, avail‐
362 able from the Linux sources in file Documenta‐
363 tion/admin-guide/cgroup-v1/net_cls.rst.
364
365 The Linux kernel has two versions for cgroups: there are
366 cgroups v1 and cgroups v2. Both are available to users,
367 who can use a mixture of them, but note that the net_cls
368 cgroup is for cgroup v1 only. This makes it incompatible
369 with BPF programs run on cgroups, which is a
370 cgroup-v2-only feature (a socket can only hold data for
371 one version of cgroups at a time).
372
This helper is only available if the kernel was compiled with the
CONFIG_CGROUP_NET_CLASSID configuration option set to "y" or to "m".
376
377 Return The classid, or 0 for the default unconfigured classid.
378
379 int bpf_skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16
380 vlan_tci)
381
382 Description
383 Push a vlan_tci (VLAN tag control information) of proto‐
384 col vlan_proto to the packet associated to skb, then
385 update the checksum. Note that if vlan_proto is different
386 from ETH_P_8021Q and ETH_P_8021AD, it is considered to be
387 ETH_P_8021Q.
388
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by
the verifier are invalidated and must be performed again if the
helper is used in combination with direct packet access.
394
395 Return 0 on success, or a negative error in case of failure.
396
397 int bpf_skb_vlan_pop(struct sk_buff *skb)
398
399 Description
400 Pop a VLAN header from the packet associated to skb.
401
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by
the verifier are invalidated and must be performed again if the
helper is used in combination with direct packet access.
407
408 Return 0 on success, or a negative error in case of failure.
409
410 int bpf_skb_get_tunnel_key(struct sk_buff *skb, struct bpf_tunnel_key
411 *key, u32 size, u64 flags)
412
413 Description
414 Get tunnel metadata. This helper takes a pointer key to
415 an empty struct bpf_tunnel_key of size, that will be
416 filled with tunnel metadata for the packet associated to
417 skb. The flags can be set to BPF_F_TUNINFO_IPV6, which
418 indicates that the tunnel is based on IPv6 protocol
419 instead of IPv4.
420
421 The struct bpf_tunnel_key is an object that generalizes
422 the principal parameters used by various tunneling proto‐
423 cols into a single struct. This way, it can be used to
424 easily make a decision based on the contents of the
425 encapsulation header, "summarized" in this struct. In
426 particular, it holds the IP address of the remote end
427 (IPv4 or IPv6, depending on the case) in key->remote_ipv4
428 or key->remote_ipv6. Also, this struct exposes the
429 key->tunnel_id, which is generally mapped to a VNI (Vir‐
430 tual Network Identifier), making it programmable together
431 with the bpf_skb_set_tunnel_key() helper.
432
433 Let's imagine that the following code is part of a pro‐
434 gram attached to the TC ingress interface, on one end of
435 a GRE tunnel, and is supposed to filter out all messages
436 coming from remote ends with IPv4 address other than
437 10.0.0.1:
438
int ret;
struct bpf_tunnel_key key = {};

ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
if (ret < 0)
        return TC_ACT_SHOT;     // drop packet

if (key.remote_ipv4 != 0x0a000001)
        return TC_ACT_SHOT;     // drop packet

return TC_ACT_OK;               // accept packet
450
451 This interface can also be used with all encapsulation
452 devices that can operate in "collect metadata" mode:
453 instead of having one network device per specific config‐
454 uration, the "collect metadata" mode only requires a sin‐
455 gle device where the configuration can be extracted from
456 this helper.
457
This can be used together with various tunnels such as VXLAN,
Geneve, GRE or IP in IP (IPIP).
460
461 Return 0 on success, or a negative error in case of failure.
462
463 int bpf_skb_set_tunnel_key(struct sk_buff *skb, struct bpf_tunnel_key
464 *key, u32 size, u64 flags)
465
466 Description
467 Populate tunnel metadata for packet associated to skb.
468 The tunnel metadata is set to the contents of key, of
469 size. The flags can be set to a combination of the fol‐
470 lowing values:
471
472 BPF_F_TUNINFO_IPV6
473 Indicate that the tunnel is based on IPv6 protocol
474 instead of IPv4.
475
476 BPF_F_ZERO_CSUM_TX
477 For IPv4 packets, add a flag to tunnel metadata
478 indicating that checksum computation should be
479 skipped and checksum set to zeroes.
480
481 BPF_F_DONT_FRAGMENT
482 Add a flag to tunnel metadata indicating that the
483 packet should not be fragmented.
484
485 BPF_F_SEQ_NUMBER
486 Add a flag to tunnel metadata indicating that a
487 sequence number should be added to tunnel header
488 before sending the packet. This flag was added for
489 GRE encapsulation, but might be used with other
490 protocols as well in the future.
491
492 Here is a typical usage on the transmit path:
493
struct bpf_tunnel_key key = {};
/* populate key ... */
bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0);
bpf_clone_redirect(skb, vxlan_dev_ifindex, 0);
498
499 See also the description of the bpf_skb_get_tunnel_key()
500 helper for additional information.
501
502 Return 0 on success, or a negative error in case of failure.
503
504 u64 bpf_perf_event_read(struct bpf_map *map, u64 flags)
505
506 Description
507 Read the value of a perf event counter. This helper
508 relies on a map of type BPF_MAP_TYPE_PERF_EVENT_ARRAY.
509 The nature of the perf event counter is selected when map
510 is updated with perf event file descriptors. The map is
511 an array whose size is the number of available CPUs, and
512 each cell contains a value relative to one CPU. The value
513 to retrieve is indicated by flags, that contains the
514 index of the CPU to look up, masked with
515 BPF_F_INDEX_MASK. Alternatively, flags can be set to
516 BPF_F_CURRENT_CPU to indicate that the value for the cur‐
517 rent CPU should be retrieved.
518
Note that before Linux 4.13, only hardware perf events can be
retrieved.
521
522 Also, be aware that the newer helper
523 bpf_perf_event_read_value() is recommended over
524 bpf_perf_event_read() in general. The latter has some ABI
525 quirks where error and counter value are used as a return
526 code (which is wrong to do since ranges may overlap).
527 This issue is fixed with bpf_perf_event_read_value(),
528 which at the same time provides more features over the
529 bpf_perf_event_read() interface. Please refer to the
530 description of bpf_perf_event_read_value() for details.
531
532 Return The value of the perf event counter read from the map, or
533 a negative error code in case of failure.
534
535 int bpf_redirect(u32 ifindex, u64 flags)
536
537 Description
538 Redirect the packet to another net device of index
539 ifindex. This helper is somewhat similar to
540 bpf_clone_redirect(), except that the packet is not
541 cloned, which provides increased performance.
542
543 Except for XDP, both ingress and egress interfaces can be
544 used for redirection. The BPF_F_INGRESS value in flags is
545 used to make the distinction (ingress path is selected if
546 the flag is present, egress path otherwise). Currently,
547 XDP only supports redirection to the egress interface,
548 and accepts no flag at all.
549
550 The same effect can also be attained with the more
551 generic bpf_redirect_map(), which uses a BPF map to store
552 the redirect target instead of providing it directly to
553 the helper.
554
555 Return For XDP, the helper returns XDP_REDIRECT on success or
556 XDP_ABORTED on error. For other program types, the values
557 are TC_ACT_REDIRECT on success or TC_ACT_SHOT on error.
558
559 u32 bpf_get_route_realm(struct sk_buff *skb)
560
561 Description
Retrieve the realm of the route, that is to say the tclassid field
of the destination for the skb. The identifier retrieved is a
user-provided tag, similar to the one used with the net_cls cgroup
(see description for bpf_get_cgroup_classid() helper), but here this
tag is held by a route (a destination entry), not by a task.
568
569 Retrieving this identifier works with the clsact TC
570 egress hook (see also tc-bpf(8)), or alternatively on
571 conventional classful egress qdiscs, but not on TC
572 ingress path. In case of clsact TC egress hook, this has
573 the advantage that, internally, the destination entry has
574 not been dropped yet in the transmit path. Therefore, the
575 destination entry does not need to be artificially held
576 via netif_keep_dst() for a classful qdisc until the skb
577 is freed.
578
579 This helper is available only if the kernel was compiled
580 with CONFIG_IP_ROUTE_CLASSID configuration option.
581
582 Return The realm of the route for the packet associated to skb,
583 or 0 if none was found.
584
585 int bpf_perf_event_output(void *ctx, struct bpf_map *map, u64 flags,
586 void *data, u64 size)
587
588 Description
589 Write raw data blob into a special BPF perf event held by
590 map of type BPF_MAP_TYPE_PERF_EVENT_ARRAY. This perf
591 event must have the following attributes: PERF_SAMPLE_RAW
592 as sample_type, PERF_TYPE_SOFTWARE as type, and
593 PERF_COUNT_SW_BPF_OUTPUT as config.
594
595 The flags are used to indicate the index in map for which
596 the value must be put, masked with BPF_F_INDEX_MASK.
597 Alternatively, flags can be set to BPF_F_CURRENT_CPU to
598 indicate that the index of the current CPU core should be
599 used.
600
The value to write, of size, is passed through the eBPF stack and
pointed to by data.

The context of the program ctx must also be passed to the helper.

In user space, a program willing to read the values needs to call
perf_event_open() on the perf event (either for one or for all CPUs)
and to store the file descriptor into the map. This must be done
before the eBPF program can send data into it. An example is
available in file samples/bpf/trace_output_user.c in the Linux
kernel source tree (the eBPF program counterpart is in
samples/bpf/trace_output_kern.c).

bpf_perf_event_output() achieves better performance than
bpf_trace_printk() for sharing data with user space, and is much
better suited to streaming data from eBPF programs.
620
621 Note that this helper is not restricted to tracing use
622 cases and can be used with programs attached to TC or XDP
623 as well, where it allows for passing data to user space
624 listeners. Data can be:
625
626 · Only custom structs,
627
628 · Only the packet payload, or
629
630 · A combination of both.
631
632 Return 0 on success, or a negative error in case of failure.
633
634 int bpf_skb_load_bytes(const void *skb, u32 offset, void *to, u32 len)
635
636 Description
This helper was provided as an easy way to load data from a packet.
It can be used to load len bytes from offset from the packet
associated with skb, into the buffer pointed to by to.
641
642 Since Linux 4.7, usage of this helper has mostly been
643 replaced by "direct packet access", enabling packet data
644 to be manipulated with skb->data and skb->data_end point‐
645 ing respectively to the first byte of packet data and to
646 the byte after the last byte of packet data. However, it
647 remains useful if one wishes to read large quantities of
648 data at once from a packet into the eBPF stack.
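
The bounds check that both approaches rely on (never read past
data_end) can be sketched in user space as follows. This is a model,
not the kernel's code: toy_load_bytes() is a hypothetical name, and
returning -EFAULT and zeroing the destination on failure are
assumptions about the helper's behaviour that this page does not
state.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Copy len bytes starting at offset from the packet [data, data_end)
 * into "to". Returns 0 on success or -EFAULT if the range does not
 * fit inside the packet. */
static int toy_load_bytes(const uint8_t *data, const uint8_t *data_end,
                          uint32_t offset, void *to, uint32_t len)
{
    size_t avail = (size_t)(data_end - data);

    /* Written so that offset + len cannot overflow. */
    if (offset > avail || len > avail - offset) {
        memset(to, 0, len);          /* assumption: clear on failure */
        return -EFAULT;
    }
    memcpy(to, data + offset, len);
    return 0;
}
```

With direct packet access, an eBPF program performs the equivalent
comparison against skb->data_end itself before dereferencing a packet
pointer.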
649
650 Return 0 on success, or a negative error in case of failure.
651
652 int bpf_get_stackid(void *ctx, struct bpf_map *map, u64 flags)
653
654 Description
655 Walk a user or a kernel stack and return its id. To
656 achieve this, the helper needs ctx, which is a pointer to
657 the context on which the tracing program is executed, and
658 a pointer to a map of type BPF_MAP_TYPE_STACK_TRACE.
659
660 The last argument, flags, holds the number of stack
661 frames to skip (from 0 to 255), masked with
662 BPF_F_SKIP_FIELD_MASK. The next bits can be used to set a
663 combination of the following flags:
664
665 BPF_F_USER_STACK
666 Collect a user space stack instead of a kernel
667 stack.
668
669 BPF_F_FAST_STACK_CMP
670 Compare stacks by hash only.
671
672 BPF_F_REUSE_STACKID
673 If two different stacks hash into the same
674 stackid, discard the old one.
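
As an illustration of how the flags argument is laid out, the sketch
below combines a skip count with option bits. The TOY_* constants
mirror the numeric values of BPF_F_SKIP_FIELD_MASK, BPF_F_USER_STACK,
BPF_F_FAST_STACK_CMP and BPF_F_REUSE_STACKID from the kernel UAPI
header; they are redefined locally so the example builds without
kernel headers, and stackid_flags() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SKIP_FIELD_MASK 0xffULL       /* low 8 bits: frames to skip */
#define TOY_USER_STACK      (1ULL << 8)
#define TOY_FAST_STACK_CMP  (1ULL << 9)
#define TOY_REUSE_STACKID   (1ULL << 10)

/* Build a flags value: the skip count is masked into the low byte and
 * the option bits are kept out of that byte. */
static uint64_t stackid_flags(unsigned int skip, uint64_t options)
{
    return (skip & TOY_SKIP_FIELD_MASK) | (options & ~TOY_SKIP_FIELD_MASK);
}
```

For example, skipping two frames of a user space stack gives
stackid_flags(2, TOY_USER_STACK), that is 0x102. A skip count above
255 is silently truncated by the mask, so callers should validate it
beforehand.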
675
The stack id retrieved is a 32-bit integer handle which can be
further combined with other data (including other stack ids) and
used as a key into maps. This can be useful for generating a variety
of graphs (such as flame graphs or off-cpu graphs).

For walking a stack, this helper is an improvement over
bpf_probe_read(), which can be used with unrolled loops but is not
efficient and consumes a lot of eBPF instructions. Instead,
bpf_get_stackid() can collect up to PERF_MAX_STACK_DEPTH frames, for
both kernel and user stacks. Note that this limit can be controlled
with the sysctl program, and that it should be manually increased in
order to profile long user stacks (such as stacks for Java
programs). To do so, use:

# sysctl kernel.perf_event_max_stack=<new value>
693
694 Return The positive or null stack id on success, or a negative
695 error in case of failure.
696
697 s64 bpf_csum_diff(__be32 *from, u32 from_size, __be32 *to, u32 to_size,
698 __wsum seed)
699
700 Description
Compute a checksum difference, from the raw buffer pointed to by
from, of length from_size (that must be a multiple of 4), towards
the raw buffer pointed to by to, of size to_size (same remark). An
optional seed can be added to the value (this can be cascaded, the
seed may come from a previous call to the helper).
707
708 This is flexible enough to be used in several ways:
709
710 · With from_size == 0, to_size > 0 and seed set to check‐
711 sum, it can be used when pushing new data.
712
713 · With from_size > 0, to_size == 0 and seed set to check‐
714 sum, it can be used when removing data from a packet.
715
716 · With from_size > 0, to_size > 0 and seed set to 0, it
717 can be used to compute a diff. Note that from_size and
718 to_size do not need to be equal.
719
720 This helper can be used in combination with
721 bpf_l3_csum_replace() and bpf_l4_csum_replace(), to which
722 one can feed in the difference computed with
723 bpf_csum_diff().
724
725 Return The checksum result, or a negative error code in case of
726 failure.
727
728 int bpf_skb_get_tunnel_opt(struct sk_buff *skb, void *opt, u32 size)
729
730 Description
731 Retrieve tunnel options metadata for the packet associ‐
732 ated to skb, and store the raw tunnel option data to the
733 buffer opt of size.
734
This helper can be used with encapsulation devices that can operate
in "collect metadata" mode (please refer to the related note in the
description of bpf_skb_get_tunnel_key() for more details). A
particular example where this can be used is in combination with the
Geneve encapsulation protocol, where it allows for pushing (with the
bpf_skb_set_tunnel_opt() helper) and retrieving arbitrary TLVs
(Type-Length-Value headers) from the eBPF program. This allows for
full customization of these headers.
744
745 Return The size of the option data retrieved.
746
747 int bpf_skb_set_tunnel_opt(struct sk_buff *skb, void *opt, u32 size)
748
749 Description
750 Set tunnel options metadata for the packet associated to
751 skb to the option data contained in the raw buffer opt of
752 size.
753
754 See also the description of the bpf_skb_get_tunnel_opt()
755 helper for additional information.
756
757 Return 0 on success, or a negative error in case of failure.
758
759 int bpf_skb_change_proto(struct sk_buff *skb, __be16 proto, u64 flags)
760
761 Description
Change the protocol of the skb to proto. Currently supported are
transition from IPv4 to IPv6, and from IPv6 to IPv4. The helper
takes care of the groundwork for the transition, including resizing
the socket buffer. The eBPF program is expected to fill the new
headers, if any, via bpf_skb_store_bytes() and to recompute the
checksums with bpf_l3_csum_replace() and bpf_l4_csum_replace(). The
main case for this helper is to perform NAT64 operations from an
eBPF program.
771
772 Internally, the GSO type is marked as dodgy so that head‐
773 ers are checked and segments are recalculated by the
774 GSO/GRO engine. The size for GSO target is adapted as
775 well.
776
777 All values for flags are reserved for future usage, and
778 must be left at zero.
779
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by
the verifier are invalidated and must be performed again if the
helper is used in combination with direct packet access.
785
786 Return 0 on success, or a negative error in case of failure.
787
788 int bpf_skb_change_type(struct sk_buff *skb, u32 type)
789
790 Description
Change the packet type for the packet associated with skb. This
comes down to setting skb->pkt_type to type, except the eBPF program
has no write access to skb->pkt_type other than through this helper.
Using a helper here allows for graceful handling of errors.

The major use case is to change incoming skbs to PACKET_HOST in a
programmatic way instead of having to recirculate via redirect(...,
BPF_F_INGRESS), for example.
801
802 Note that type only allows certain values. At this time,
803 they are:
804
805 PACKET_HOST
806 Packet is for us.
807
808 PACKET_BROADCAST
809 Send packet to all.
810
811 PACKET_MULTICAST
812 Send packet to group.
813
814 PACKET_OTHERHOST
815 Send packet to someone else.
816
817 Return 0 on success, or a negative error in case of failure.
818
819 int bpf_skb_under_cgroup(struct sk_buff *skb, struct bpf_map *map, u32
820 index)
821
822 Description
823 Check whether skb is a descendant of the cgroup2 held by
824 map of type BPF_MAP_TYPE_CGROUP_ARRAY, at index.
825
826 Return The return value depends on the result of the test, and
827 can be:
828
829 · 0, if the skb failed the cgroup2 descendant test.
830
831 · 1, if the skb succeeded the cgroup2 descendant test.
832
833 · A negative error code, if an error occurred.
834
835 u32 bpf_get_hash_recalc(struct sk_buff *skb)
836
837 Description
838 Retrieve the hash of the packet, skb->hash. If it is not
839 set, in particular if the hash was cleared due to man‐
840 gling, recompute this hash. Later accesses to the hash
841 can be done directly with skb->hash.
842
843 Calling bpf_set_hash_invalid(), changing the packet pro‐
844 tocol with bpf_skb_change_proto(), or calling
845 bpf_skb_store_bytes() with the BPF_F_INVALIDATE_HASH flag
846 are actions likely to clear the hash and to trigger a
847 new computation on the next call to
848 bpf_get_hash_recalc().
849
850 Return The 32-bit hash.
851
852 u64 bpf_get_current_task(void)
853
854 Return A pointer to the current task struct.
855
856 int bpf_probe_write_user(void *dst, const void *src, u32 len)
857
858 Description
859 Attempt in a safe way to write len bytes from the buffer
860 src to dst in memory. It only works for threads that are
861 in user context, and dst must be a valid user space
862 address.
863
864 This helper should not be used to implement any kind of
865 security mechanism because of TOC-TOU attacks, but rather
866 to debug, divert, and manipulate execution of semi-coop‐
867 erative processes.
868
869 Keep in mind that this feature is meant for experiments,
870 and it has a risk of crashing the system and running pro‐
871 grams. Therefore, when an eBPF program using this helper
872 is attached, a warning including PID and process name is
873 printed to kernel logs.
874
875 Return 0 on success, or a negative error in case of failure.
876
877 int bpf_current_task_under_cgroup(struct bpf_map *map, u32 index)
878
879 Description
880 Check whether the probe is being run in the context of a
881 given subset of the cgroup2 hierarchy. The cgroup2 to
882 test is held by map of type BPF_MAP_TYPE_CGROUP_ARRAY, at
883 index.
884
885 Return The return value depends on the result of the test, and
886 can be:
887
888 · 0, if the current task does not belong to the cgroup2.
889
890 · 1, if the current task belongs to the cgroup2.
891
892 · A negative error code, if an error occurred.
893
894 int bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags)
895
896 Description
897 Resize (trim or grow) the packet associated to skb to the
898 new len. The flags are reserved for future usage, and
899 must be left at zero.
900
901 The basic idea is that the helper performs the needed
902 work to change the size of the packet, then the eBPF pro‐
903 gram rewrites the rest via helpers like
904 bpf_skb_store_bytes(), bpf_l3_csum_replace(),
905 bpf_l4_csum_replace() and others. This helper is a slow
906 path utility intended for replies with control messages.
907 And because it is targeted for slow path, the helper
908 itself can afford to be slow: it implicitly linearizes,
909 unclones and drops offloads from the skb.
910
911 A call to this helper is susceptible to change the under‐
912 lying packet buffer. Therefore, at load time, all checks
913 on pointers previously done by the verifier are invali‐
914 dated and must be performed again, if the helper is used
915 in combination with direct packet access.
916
917 Return 0 on success, or a negative error in case of failure.
918
919 int bpf_skb_pull_data(struct sk_buff *skb, u32 len)
920
921 Description
922 Pull in non-linear data in case the skb is non-linear and
923 not all of len are part of the linear section. Make len
924 bytes from skb readable and writable. If a zero value is
925 passed for len, then the whole length of the skb is
926 pulled.
927
928 This helper is only needed for reading and writing with
929 direct packet access.
930
931 For direct packet access, testing that offsets to access
932 are within packet boundaries (test on skb->data_end) is
933 susceptible to fail if offsets are invalid, or if the
934 requested data is in non-linear parts of the skb. On
935 failure the program can just bail out, or in the case of
936 a non-linear buffer, use a helper to make the data avail‐
937 able. The bpf_skb_load_bytes() helper is a first solution
938 to access the data. Another one consists in using
939 bpf_skb_pull_data() to pull in the non-linear parts once,
940 then retesting and finally accessing the data.
941
942 At the same time, this also makes sure the skb is
943 uncloned, which is a necessary condition for direct
944 write. As this needs to be an invariant for the write
945 part only, the verifier detects writes and adds a pro‐
946 logue that is calling bpf_skb_pull_data() to effectively
947 unclone the skb from the very beginning in case it is
948 indeed cloned.
949
950 A call to this helper is susceptible to change the under‐
951 lying packet buffer. Therefore, at load time, all checks
952 on pointers previously done by the verifier are invali‐
953 dated and must be performed again, if the helper is used
954 in combination with direct packet access.
955
956 Return 0 on success, or a negative error in case of failure.
957
958 s64 bpf_csum_update(struct sk_buff *skb, __wsum csum)
959
960 Description
961 Add the checksum csum into skb->csum in case the driver
962 has supplied a checksum for the entire packet into that
963 field. Return an error otherwise. This helper is intended
964 to be used in combination with bpf_csum_diff(), in par‐
965 ticular when the checksum needs to be updated after data
966 has been written into the packet through direct packet
967 access.
968
969 Return The checksum on success, or a negative error code in case
970 of failure.
971
972 void bpf_set_hash_invalid(struct sk_buff *skb)
973
974 Description
975 Invalidate the current skb->hash. It can be used after
976 mangling on headers through direct packet access, in
977 order to indicate that the hash is outdated and to trig‐
978 ger a recalculation the next time the kernel tries to
979 access this hash or when the bpf_get_hash_recalc() helper
980 is called.
981
982 int bpf_get_numa_node_id(void)
983
984 Description
985 Return the id of the current NUMA node. The primary use
986 case for this helper is the selection of sockets for the
987 local NUMA node, when the program is attached to sockets
988 using the SO_ATTACH_REUSEPORT_EBPF option (see also
989 socket(7)), but the helper is also available to other
990 eBPF program types, similarly to bpf_get_smp_proces‐
991 sor_id().
992
993 Return The id of current NUMA node.
994
995 int bpf_skb_change_head(struct sk_buff *skb, u32 len, u64 flags)
996
997 Description
998 Grows headroom of packet associated to skb and adjusts
999 the offset of the MAC header accordingly, adding len
1000 bytes of space. It automatically extends and reallocates
1001 memory as required.
1002
1003 This helper can be used on a layer 3 skb to push a MAC
1004 header for redirection into a layer 2 device.
1005
1006 All values for flags are reserved for future usage, and
1007 must be left at zero.
1008
1009 A call to this helper is susceptible to change the under‐
1010 lying packet buffer. Therefore, at load time, all checks
1011 on pointers previously done by the verifier are invali‐
1012 dated and must be performed again, if the helper is used
1013 in combination with direct packet access.
1014
1015 Return 0 on success, or a negative error in case of failure.
1016
1017 int bpf_xdp_adjust_head(struct xdp_buff *xdp_md, int delta)
1018
1019 Description
1020 Adjust (move) xdp_md->data by delta bytes. Note that it
1021 is possible to use a negative value for delta. This
1022 helper can be used to prepare the packet for pushing or
1023 popping headers.
1024
1025 A call to this helper is susceptible to change the under‐
1026 lying packet buffer. Therefore, at load time, all checks
1027 on pointers previously done by the verifier are invali‐
1028 dated and must be performed again, if the helper is used
1029 in combination with direct packet access.
1030
1031 Return 0 on success, or a negative error in case of failure.
1032
1033 int bpf_probe_read_str(void *dst, u32 size, const void *unsafe_ptr)
1034
1035 Description
1036 Copy a NUL terminated string from an unsafe kernel
1037 address unsafe_ptr to dst. See bpf_probe_read_ker‐
1038 nel_str() for more details.
1039
1040 Generally, use bpf_probe_read_user_str() or
1041 bpf_probe_read_kernel_str() instead.
1042
1043 Return On success, the strictly positive length of the string,
1044 including the trailing NUL character. On error, a nega‐
1045 tive value.
1046
1047 u64 bpf_get_socket_cookie(struct sk_buff *skb)
1048
1049 Description
1050 If the struct sk_buff pointed by skb has a known socket,
1051 retrieve the cookie (generated by the kernel) of this
1052 socket. If no cookie has been set yet, generate a new
1053 cookie. Once generated, the socket cookie remains stable
1054 for the life of the socket. This helper can be useful for
1055 monitoring per socket networking traffic statistics as it
1056 provides a global socket identifier that can be assumed
1057 unique.
1058
1059 Return An 8-byte long non-decreasing number on success, or 0 if
1060 the socket field is missing inside skb.
1061
1062 u64 bpf_get_socket_cookie(struct bpf_sock_addr *ctx)
1063
1064 Description
1065 Equivalent to bpf_get_socket_cookie() helper that accepts
1066 skb, but gets socket from struct bpf_sock_addr context.
1067
1068 Return An 8-byte long non-decreasing number.
1069
1070 u64 bpf_get_socket_cookie(struct bpf_sock_ops *ctx)
1071
1072 Description
1073 Equivalent to bpf_get_socket_cookie() helper that accepts
1074 skb, but gets socket from struct bpf_sock_ops context.
1075
1076 Return An 8-byte long non-decreasing number.
1077
1078 u32 bpf_get_socket_uid(struct sk_buff *skb)
1079
1080 Return The owner UID of the socket associated to skb. If the
1081 socket is NULL, or if it is not a full socket (i.e. if it
1082 is a time-wait or a request socket instead), overflowuid
1083 value is returned (note that overflowuid might also be
1084 the actual UID value for the socket).
1085
1086 u32 bpf_set_hash(struct sk_buff *skb, u32 hash)
1087
1088 Description
1089 Set the full hash for skb (set the field skb->hash) to
1090 value hash.
1091
1092 Return 0
1093
1094 int bpf_setsockopt(void *bpf_socket, int level, int optname, void *opt‐
1095 val, int optlen)
1096
1097 Description
1098 Emulate a call to setsockopt() on the socket associated
1099 to bpf_socket, which must be a full socket. The level at
1100 which the option resides and the name optname of the
1101 option must be specified, see setsockopt(2) for more
1102 information. The option value of length optlen is
1103 pointed by optval.
1104
1105 bpf_socket should be one of the following:
1106
1107 · struct bpf_sock_ops for BPF_PROG_TYPE_SOCK_OPS.
1108
1109 · struct bpf_sock_addr for BPF_CGROUP_INET4_CONNECT and
1110 BPF_CGROUP_INET6_CONNECT.
1111
1112 This helper actually implements a subset of setsockopt().
1113 It supports the following levels:
1114
1115 · SOL_SOCKET, which supports the following optnames:
1116 SO_RCVBUF, SO_SNDBUF, SO_MAX_PACING_RATE, SO_PRIORITY,
1117 SO_RCVLOWAT, SO_MARK.
1118
1119 · IPPROTO_TCP, which supports the following optnames:
1120 TCP_CONGESTION, TCP_BPF_IW, TCP_BPF_SNDCWND_CLAMP.
1121
1122 · IPPROTO_IP, which supports optname IP_TOS.
1123
1124 · IPPROTO_IPV6, which supports optname IPV6_TCLASS.
1125
1126 Return 0 on success, or a negative error in case of failure.
1127
1128 int bpf_skb_adjust_room(struct sk_buff *skb, s32 len_diff, u32 mode,
1129 u64 flags)
1130
1131 Description
1132 Grow or shrink the room for data in the packet associated
1133 to skb by len_diff, and according to the selected mode.
1134
1135 By default, the helper will reset any offloaded checksum
1136 indicator of the skb to CHECKSUM_NONE. This can be
1137 avoided by the following flag:
1138
1139 · BPF_F_ADJ_ROOM_NO_CSUM_RESET: Do not reset offloaded
1140 checksum data of the skb to CHECKSUM_NONE.
1141
1142 There are two supported modes at this time:
1143
1144 · BPF_ADJ_ROOM_MAC: Adjust room at the mac layer (room
1145 space is added or removed below the layer 2 header).
1146
1147 · BPF_ADJ_ROOM_NET: Adjust room at the network layer
1148 (room space is added or removed below the layer 3
1149 header).
1150
1151 The following flags are supported at this time:
1152
1153 · BPF_F_ADJ_ROOM_FIXED_GSO: Do not adjust gso_size.
1154 Adjusting mss in this way is not allowed for datagrams.
1155
1156 · BPF_F_ADJ_ROOM_ENCAP_L3_IPV4,
1157 BPF_F_ADJ_ROOM_ENCAP_L3_IPV6: Any new space is reserved
1158 to hold a tunnel header. Configure skb offsets and
1159 other fields accordingly.
1160
1161 · BPF_F_ADJ_ROOM_ENCAP_L4_GRE,
1162 BPF_F_ADJ_ROOM_ENCAP_L4_UDP: Use with ENCAP_L3 flags to
1163 further specify the tunnel type.
1164
1165 · BPF_F_ADJ_ROOM_ENCAP_L2(len): Use with ENCAP_L3/L4
1166 flags to further specify the tunnel type; len is the
1167 length of the inner MAC header.
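The encapsulation flags above are meant to be OR-ed together. The following user-space sketch shows one way to compose them for an IPv4/UDP tunnel that carries an inner MAC header. The numeric values are copied from the kernel UAPI header as an assumption and should be checked against the linux/bpf.h of the target kernel; udp_tunnel_flags() is a hypothetical name used only for illustration.

```c
#include <stdint.h>

/* Assumed values, mirroring linux/bpf.h; verify against your kernel headers. */
#define BPF_F_ADJ_ROOM_ENCAP_L3_IPV4  (1ULL << 1)
#define BPF_F_ADJ_ROOM_ENCAP_L4_UDP   (1ULL << 4)
#define BPF_ADJ_ROOM_ENCAP_L2_MASK    0xffULL
#define BPF_ADJ_ROOM_ENCAP_L2_SHIFT   56
#define BPF_F_ADJ_ROOM_ENCAP_L2(len) \
	(((uint64_t)(len) & BPF_ADJ_ROOM_ENCAP_L2_MASK) << BPF_ADJ_ROOM_ENCAP_L2_SHIFT)

/* Hypothetical helper: build the flags argument for an IPv4/UDP tunnel
 * whose payload starts with an inner MAC header of inner_mac_len bytes. */
static uint64_t udp_tunnel_flags(unsigned int inner_mac_len)
{
	return BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 |
	       BPF_F_ADJ_ROOM_ENCAP_L4_UDP |
	       BPF_F_ADJ_ROOM_ENCAP_L2(inner_mac_len);
}
```

In an eBPF program the resulting value would be passed as the flags argument of bpf_skb_adjust_room(), together with a positive len_diff covering the outer headers.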
1168
1169 A call to this helper is susceptible to change the under‐
1170 lying packet buffer. Therefore, at load time, all checks
1171 on pointers previously done by the verifier are invali‐
1172 dated and must be performed again, if the helper is used
1173 in combination with direct packet access.
1174
1175 Return 0 on success, or a negative error in case of failure.
1176
1177 int bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
1178
1179 Description
1180 Redirect the packet to the endpoint referenced by map at
1181 index key. Depending on its type, this map can contain
1182 references to net devices (for forwarding packets through
1183 other ports), or to CPUs (for redirecting XDP frames to
1184 another CPU; but this is only implemented for native XDP
1185 (with driver support) as of this writing).
1186
1187 The lower two bits of flags are used as the return code
1188 if the map lookup fails. This is so that the return value
1189 can be one of the XDP program return codes up to XDP_TX,
1190 as chosen by the caller. Any higher bits in the flags
1191 argument must be unset.
1192
1193 See also bpf_redirect(), which only supports redirecting
1194 to an ifindex, but doesn't require a map to do so.
1195
1196 Return XDP_REDIRECT on success, or the value of the two lower
1197 bits of the flags argument on error.
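The documented return convention can be modelled in plain C. This is a user-space sketch of the behaviour described above, not the kernel implementation: lookup_ok stands in for the outcome of the map lookup, flags is assumed to have no bits set above the lower two, and redirect_map_result() is a hypothetical name. The action values are assumed to match the XDP return codes in linux/bpf.h.

```c
#include <stdint.h>

/* Assumed to match enum xdp_action in linux/bpf.h. */
enum { XDP_ABORTED = 0, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT };

/* Model only: on a successful lookup the helper reports XDP_REDIRECT,
 * otherwise the caller-chosen fallback held in the two lower bits. */
static int redirect_map_result(int lookup_ok, uint64_t flags)
{
	if (lookup_ok)
		return XDP_REDIRECT;
	return (int)(flags & 3);
}
```

Passing XDP_PASS as flags therefore lets packets continue to the stack when the key is missing from the map, while passing 0 yields XDP_ABORTED.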
1198
1199 int bpf_sk_redirect_map(struct sk_buff *skb, struct bpf_map *map, u32
1200 key, u64 flags)
1201
1202 Description
1203 Redirect the packet to the socket referenced by map (of
1204 type BPF_MAP_TYPE_SOCKMAP) at index key. Both ingress and
1205 egress interfaces can be used for redirection. The
1206 BPF_F_INGRESS value in flags is used to make the distinc‐
1207 tion (ingress path is selected if the flag is present,
1208 egress path otherwise). This is the only flag supported
1209 for now.
1210
1211 Return SK_PASS on success, or SK_DROP on error.
1212
1213 int bpf_sock_map_update(struct bpf_sock_ops *skops, struct bpf_map
1214 *map, void *key, u64 flags)
1215
1216 Description
1217 Add an entry to, or update a map referencing sockets. The
1218 skops is used as a new value for the entry associated to
1219 key. flags is one of:
1220
1221 BPF_NOEXIST
1222 The entry for key must not exist in the map.
1223
1224 BPF_EXIST
1225 The entry for key must already exist in the map.
1226
1227 BPF_ANY
1228 No condition on the existence of the entry for
1229 key.
1230
1231 If the map has eBPF programs (parser and verdict), those
1232 will be inherited by the socket being added. If the
1233 socket is already attached to eBPF programs, this results
1234 in an error.
1235
1236 Return 0 on success, or a negative error in case of failure.
1237
1238 int bpf_xdp_adjust_meta(struct xdp_buff *xdp_md, int delta)
1239
1240 Description
1241 Adjust the address pointed by xdp_md->data_meta by delta
1242 (which can be positive or negative). Note that this oper‐
1243 ation modifies the address stored in xdp_md->data, so the
1244 latter must be loaded only after the helper has been
1245 called.
1246
1247 The use of xdp_md->data_meta is optional and programs are
1248 not required to use it. The rationale is that when the
1249 packet is processed with XDP (e.g. as DoS filter), it is
1250 possible to push further meta data along with it before
1251 passing to the stack, and to give the guarantee that an
1252 ingress eBPF program attached as a TC classifier on the
1253 same device can pick this up for further post-processing.
1254 Since TC works with socket buffers, it remains possible
1255 to set from XDP the mark or priority pointers, or other
1256 pointers for the socket buffer. Having this scratch
1257 space generic and programmable allows for more flexibil‐
1258 ity as the user is free to store whatever meta data they
1259 need.
1260
1261 A call to this helper is susceptible to change the under‐
1262 lying packet buffer. Therefore, at load time, all checks
1263 on pointers previously done by the verifier are invali‐
1264 dated and must be performed again, if the helper is used
1265 in combination with direct packet access.
1266
1267 Return 0 on success, or a negative error in case of failure.
1268
1269 int bpf_perf_event_read_value(struct bpf_map *map, u64 flags, struct
1270 bpf_perf_event_value *buf, u32 buf_size)
1271
1272 Description
1273 Read the value of a perf event counter, and store it into
1274 buf of size buf_size. This helper relies on a map of type
1275 BPF_MAP_TYPE_PERF_EVENT_ARRAY. The nature of the perf
1276 event counter is selected when map is updated with perf
1277 event file descriptors. The map is an array whose size is
1278 the number of available CPUs, and each cell contains a
1279 value relative to one CPU. The value to retrieve is indi‐
1280 cated by flags, that contains the index of the CPU to
1281 look up, masked with BPF_F_INDEX_MASK. Alternatively,
1282 flags can be set to BPF_F_CURRENT_CPU to indicate that
1283 the value for the current CPU should be retrieved.
1284
1285 This helper behaves in a way close to
1286 bpf_perf_event_read() helper, save that instead of just
1287 returning the value observed, it fills the buf structure.
1288 This allows for additional data to be retrieved: in par‐
1289 ticular, the enabled and running times (in buf->enabled
1290 and buf->running, respectively) are copied. In general,
1291 bpf_perf_event_read_value() is recommended over
1292 bpf_perf_event_read(), which has some ABI issues and pro‐
1293 vides fewer functionalities.
1294
1295 These values are interesting, because hardware PMU (Per‐
1296 formance Monitoring Unit) counters are limited resources.
1297 When there are more PMU based perf events opened than
1298 available counters, kernel will multiplex these events so
1299 each event gets certain percentage (but not all) of the
1300 PMU time. When multiplexing happens, the number of sam‐
1301 ples or the counter value will not match what would be
1302 observed without multiplexing. This makes com‐
1303 parison between different runs difficult. Typically, the
1304 counter value should be normalized before comparing to
1305 other experiments. The usual normalization is done as
1306 follows.
1307
1308 normalized_counter = counter * t_enabled / t_running
1309
1310 Where t_enabled is the time the event was enabled and
1311 t_running is the time it was running since the last normal‐
1312 ization. The enabled and running times are accumulated since
1313 the perf event was opened. To obtain a scaling factor between
1314 two invocations of an eBPF program, users can use CPU id
1315 as the key (which is typical for perf array usage model)
1316 to remember the previous value and do the calculation
1317 inside the eBPF program.
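The scaling between two readings can be sketched in user-space C as follows. struct perf_sample is a local stand-in with the same three fields as struct bpf_perf_event_value, and normalized_delta() is a hypothetical name; floating point is used here for brevity, which an eBPF program itself could not do.

```c
#include <stdint.h>

/* Local stand-in for struct bpf_perf_event_value (counter, enabled, running). */
struct perf_sample {
	uint64_t counter;
	uint64_t enabled;
	uint64_t running;
};

/* Scale the counter delta between two readings by t_enabled / t_running,
 * where both times are deltas over the same interval. */
static uint64_t normalized_delta(const struct perf_sample *prev,
				 const struct perf_sample *cur)
{
	uint64_t d_counter = cur->counter - prev->counter;
	uint64_t d_enabled = cur->enabled - prev->enabled;
	uint64_t d_running = cur->running - prev->running;

	if (d_running == 0)
		return 0; /* the event never ran in this interval */

	return (uint64_t)((double)d_counter * (double)d_enabled /
			  (double)d_running);
}
```

An event that ran for only half of the time it was enabled has its counter delta doubled by this calculation.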
1318
1319 Return 0 on success, or a negative error in case of failure.
1320
1321 int bpf_perf_prog_read_value(struct bpf_perf_event_data *ctx, struct
1322 bpf_perf_event_value *buf, u32 buf_size)
1323
1324 Description
1325 For an eBPF program attached to a perf event, retrieve
1326 the value of the event counter associated to ctx and
1327 store it in the structure pointed by buf and of size
1328 buf_size. Enabled and running times are also stored in
1329 the structure (see description of helper
1330 bpf_perf_event_read_value() for more details).
1331
1332 Return 0 on success, or a negative error in case of failure.
1333
1334 int bpf_getsockopt(void *bpf_socket, int level, int optname, void *opt‐
1335 val, int optlen)
1336
1337 Description
1338 Emulate a call to getsockopt() on the socket associated
1339 to bpf_socket, which must be a full socket. The level at
1340 which the option resides and the name optname of the
1341 option must be specified, see getsockopt(2) for more
1342 information. The retrieved value is stored in the struc‐
1343 ture pointed by optval and of length optlen.
1344
1345 bpf_socket should be one of the following:
1346
1347 · struct bpf_sock_ops for BPF_PROG_TYPE_SOCK_OPS.
1348
1349 · struct bpf_sock_addr for BPF_CGROUP_INET4_CONNECT and
1350 BPF_CGROUP_INET6_CONNECT.
1351
1352 This helper actually implements a subset of getsockopt().
1353 It supports the following levels:
1354
1355 · IPPROTO_TCP, which supports optname TCP_CONGESTION.
1356
1357 · IPPROTO_IP, which supports optname IP_TOS.
1358
1359 · IPPROTO_IPV6, which supports optname IPV6_TCLASS.
1360
1361 Return 0 on success, or a negative error in case of failure.
1362
1363 int bpf_override_return(struct pt_regs *regs, u64 rc)
1364
1365 Description
1366 Used for error injection, this helper uses kprobes to
1367 override the return value of the probed function, and to
1368 set it to rc. The first argument is the context regs on
1369 which the kprobe works.
1370
1371 This helper works by setting the PC (program counter) to
1372 an override function which is run in place of the origi‐
1373 nal probed function. This means the probed function is
1374 not run at all. The replacement function just returns
1375 with the required value.
1376
1377 This helper has security implications, and thus is sub‐
1378 ject to restrictions. It is only available if the kernel
1379 was compiled with the CONFIG_BPF_KPROBE_OVERRIDE configu‐
1380 ration option, and in this case it only works on func‐
1381 tions tagged with ALLOW_ERROR_INJECTION in the kernel
1382 code.
1383
1384 Also, the helper is only available for the architectures
1385 having the CONFIG_FUNCTION_ERROR_INJECTION option. As of
1386 this writing, x86 architecture is the only one to support
1387 this feature.
1388
1389 Return 0
1390
1391 int bpf_sock_ops_cb_flags_set(struct bpf_sock_ops *bpf_sock, int
1392 argval)
1393
1394 Description
1395 Attempt to set the value of the bpf_sock_ops_cb_flags
1396 field for the full TCP socket associated to bpf_sock_ops
1397 to argval.
1398
1399 The primary use of this field is to determine if there
1400 should be calls to eBPF programs of type
1401 BPF_PROG_TYPE_SOCK_OPS at various points in the TCP code.
1402 A program of the same type can change its value, per con‐
1403 nection and as necessary, when the connection is estab‐
1404 lished. This field is directly accessible for reading,
1405 but this helper must be used for updates in order to
1406 return an error if an eBPF program tries to set a call‐
1407 back that is not supported in the current kernel.
1408
1409 argval is a flag array which can combine these flags:
1410
1411 · BPF_SOCK_OPS_RTO_CB_FLAG (retransmission time out)
1412
1413 · BPF_SOCK_OPS_RETRANS_CB_FLAG (retransmission)
1414
1415 · BPF_SOCK_OPS_STATE_CB_FLAG (TCP state change)
1416
1417 · BPF_SOCK_OPS_RTT_CB_FLAG (every RTT)
1418
1419 Therefore, this function can be used to clear a callback
1420 flag by setting the appropriate bit to zero. For example, to
1421 disable the RTO callback:
1422
1423 bpf_sock_ops_cb_flags_set(bpf_sock,
1424 bpf_sock->bpf_sock_ops_cb_flags &
1425 ~BPF_SOCK_OPS_RTO_CB_FLAG)
1426
1427 Here are some examples of where one could call such eBPF
1428 program:
1429
1430 · When RTO fires.
1431
1432 · When a packet is retransmitted.
1433
1434 · When the connection terminates.
1435
1436 · When a packet is sent.
1437
1438 · When a packet is received.
1439
1440 Return Code -EINVAL if the socket is not a full TCP socket; oth‐
1441 erwise, a positive number containing the bits that could
1442 not be set is returned (which comes down to 0 if all bits
1443 were set as required).
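The return value described above can be modelled as a bit mask operation. This is a user-space sketch under assumptions, not the kernel code: the flag values are assumed to match linux/bpf.h, supported is a hypothetical mask of the callbacks a given kernel knows about, and both function names are illustrative.

```c
/* Assumed values, mirroring linux/bpf.h. */
#define BPF_SOCK_OPS_RTO_CB_FLAG     (1 << 0)
#define BPF_SOCK_OPS_RETRANS_CB_FLAG (1 << 1)
#define BPF_SOCK_OPS_STATE_CB_FLAG   (1 << 2)
#define BPF_SOCK_OPS_RTT_CB_FLAG     (1 << 3)

/* Model of the return value: the requested bits that the kernel
 * (represented by the hypothetical mask supported) cannot set. */
static int unsupported_bits(int argval, int supported)
{
	return argval & ~supported;
}

/* Compute the argval that clears one callback flag and keeps the others. */
static int clear_cb_flag(int current_flags, int flag)
{
	return current_flags & ~flag;
}
```

A program can compare the returned value with zero to detect that a requested callback is unavailable and fall back accordingly.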
1444
1445 int bpf_msg_redirect_map(struct sk_msg_buff *msg, struct bpf_map *map,
1446 u32 key, u64 flags)
1447
1448 Description
1449 This helper is used in programs implementing policies at
1450 the socket level. If the message msg is allowed to pass
1451 (i.e. if the verdict eBPF program returns SK_PASS), redi‐
1452 rect it to the socket referenced by map (of type
1453 BPF_MAP_TYPE_SOCKMAP) at index key. Both ingress and
1454 egress interfaces can be used for redirection. The
1455 BPF_F_INGRESS value in flags is used to make the distinc‐
1456 tion (ingress path is selected if the flag is present,
1457 egress path otherwise). This is the only flag supported
1458 for now.
1459
1460 Return SK_PASS on success, or SK_DROP on error.
1461
1462 int bpf_msg_apply_bytes(struct sk_msg_buff *msg, u32 bytes)
1463
1464 Description
1465 For socket policies, apply the verdict of the eBPF pro‐
1466 gram to the next bytes (number of bytes) of message msg.
1467
1468 For example, this helper can be used in the following
1469 cases:
1470
1471 · A single sendmsg() or sendfile() system call contains
1472 multiple logical messages that the eBPF program is sup‐
1473 posed to read and for which it should apply a verdict.
1474
1475 · An eBPF program only cares to read the first bytes of a
1476 msg. If the message has a large payload, then setting
1477 up and calling the eBPF program repeatedly for all
1478 bytes, even though the verdict is already known, would
1479 create unnecessary overhead.
1480
1481 When called from within an eBPF program, the helper sets
1482 a counter internal to the BPF infrastructure, that is
1483 used to apply the last verdict to the next bytes. If
1484 bytes is smaller than the current data being processed
1485 from a sendmsg() or sendfile() system call, the first
1486 bytes will be sent and the eBPF program will be re-run
1487 with the pointer for start of data pointing to byte num‐
1488 ber bytes + 1. If bytes is larger than the current data
1489 being processed, then the eBPF verdict will be applied to
1490 multiple sendmsg() or sendfile() calls until bytes are
1491 consumed.
1492
1493 Note that if a socket closes with the internal counter
1494 holding a non-zero value, this is not a problem because
1495 data is not being buffered for bytes and is sent as it is
1496 received.
1497
1498 Return 0
1499
1500 int bpf_msg_cork_bytes(struct sk_msg_buff *msg, u32 bytes)
1501
1502 Description
1503 For socket policies, prevent the execution of the verdict
1504 eBPF program for message msg until bytes (byte number)
1505 have been accumulated.
1506
1507 This can be used when one needs a specific number of
1508 bytes before a verdict can be assigned, even if the data
1509 spans multiple sendmsg() or sendfile() calls. The extreme
1510 case would be a user calling sendmsg() repeatedly with
1511 1-byte long message segments. Obviously, this is bad for
1512 performance, but it is still valid. If the eBPF program
1513 needs bytes bytes to validate a header, this helper can
1514 be used to prevent the eBPF program from being called again
1515 until bytes have been accumulated.
1516
1517 Return 0
1518
1519 int bpf_msg_pull_data(struct sk_msg_buff *msg, u32 start, u32 end, u64
1520 flags)
1521
1522 Description
1523 For socket policies, pull in non-linear data from user
1524 space for msg and set pointers msg->data and
1525 msg->data_end to start and end bytes offsets into msg,
1526 respectively.
1527
1528 If a program of type BPF_PROG_TYPE_SK_MSG is run on a msg
1529 it can only parse data that the (data, data_end) pointers
1530 have already consumed. For sendmsg() hooks this is likely
1531 the first scatterlist element. But for calls relying on
1532 the sendpage handler (e.g. sendfile()) this will be the
1533 range (0, 0) because the data is shared with user space
1534 and by default the objective is to avoid allowing user
1535 space to modify data while (or after) eBPF verdict is
1536 being decided. This helper can be used to pull in data
1537 and to set the start and end pointer to given values.
1538 Data will be copied if necessary (i.e. if data was not
1539 linear and if start and end pointers do not point to the
1540 same chunk).
1541
1542 A call to this helper is susceptible to change the under‐
1543 lying packet buffer. Therefore, at load time, all checks
1544 on pointers previously done by the verifier are invali‐
1545 dated and must be performed again, if the helper is used
1546 in combination with direct packet access.
1547
1548 All values for flags are reserved for future usage, and
1549 must be left at zero.
1550
1551 Return 0 on success, or a negative error in case of failure.
1552
1553 int bpf_bind(struct bpf_sock_addr *ctx, struct sockaddr *addr, int
1554 addr_len)
1555
1556 Description
1557 Bind the socket associated to ctx to the address pointed
1558 by addr, of length addr_len. This allows for making out‐
1559 going connections from the desired IP address, which can
1560 be useful for example when all processes inside a cgroup
1561 should use one single IP address on a host that has mul‐
1562 tiple IP addresses configured.
1563
1564 This helper works for IPv4 and IPv6, TCP and UDP sockets.
1565 The domain (addr->sa_family) must be AF_INET (or
1566 AF_INET6). It's advised to pass a zero port (sin_port or
1567 sin6_port), which triggers IP_BIND_ADDRESS_NO_PORT-like
1568 behavior and lets the kernel efficiently pick an
1569 unused port as long as the 4-tuple is unique. Passing
1570 a non-zero port might lead to degraded performance.
1571
1572 Return 0 on success, or a negative error in case of failure.
1573
1574 int bpf_xdp_adjust_tail(struct xdp_buff *xdp_md, int delta)
1575
1576 Description
1577 Adjust (move) xdp_md->data_end by delta bytes. It is pos‐
1578 sible to both shrink and grow the packet tail. Shrinking
1579 is done by passing a negative integer for delta.
1580
1581 A call to this helper is susceptible to change the under‐
1582 lying packet buffer. Therefore, at load time, all checks
1583 on pointers previously done by the verifier are invali‐
1584 dated and must be performed again, if the helper is used
1585 in combination with direct packet access.
1586
1587 Return 0 on success, or a negative error in case of failure.
1588
1589 int bpf_skb_get_xfrm_state(struct sk_buff *skb, u32 index, struct
1590 bpf_xfrm_state *xfrm_state, u32 size, u64 flags)
1591
1592 Description
1593 Retrieve the XFRM state (IP transform framework, see also
1594 ip-xfrm(8)) at index in XFRM "security path" for skb.
1595
1596 The retrieved value is stored in the struct
1597 bpf_xfrm_state pointed by xfrm_state and of length size.
1598
1599 All values for flags are reserved for future usage, and
1600 must be left at zero.
1601
1602 This helper is available only if the kernel was compiled
1603 with CONFIG_XFRM configuration option.
1604
1605 Return 0 on success, or a negative error in case of failure.
1606
1607 int bpf_get_stack(void *ctx, void *buf, u32 size, u64 flags)
1608
1609 Description
1610 Return a user or a kernel stack in a buffer provided by the
1611 bpf program. To achieve this, the helper needs ctx, which is
1612 a pointer to the context on which the tracing program is
1613 executed. To store the stacktrace, the bpf program pro‐
1614 vides buf with a nonnegative size.
1615
1616 The last argument, flags, holds the number of stack
1617 frames to skip (from 0 to 255), masked with
1618 BPF_F_SKIP_FIELD_MASK. The next bits can be used to set
1619 the following flags:
1620
1621 BPF_F_USER_STACK
1622 Collect a user space stack instead of a kernel
1623 stack.
1624
1625 BPF_F_USER_BUILD_ID
1626 Collect build ID and offset instead of instruction
1627 pointers for the user stack; only valid if
1628 BPF_F_USER_STACK is also specified.
1629
1630 bpf_get_stack() can collect up to PERF_MAX_STACK_DEPTH
1631 kernel and user frames, subject to a sufficiently large
1632 buffer size. Note that this limit can be controlled with
1633 the sysctl program, and that it should be manually
1634 increased in order to profile long user stacks (such as
1635 stacks for Java programs). To do so, use:
1636
1637 # sysctl kernel.perf_event_max_stack=<new value>
1638
1639 Return A non-negative value equal to or less than size on suc‐
1640 cess, or a negative error in case of failure.
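
As an illustration of how the flags argument of bpf_get_stack() is composed, the following user-space sketch packs a skip count and the two flag bits into one value. The constants are redefined locally with the values they are assumed to have in the kernel's linux/bpf.h header (check them against your kernel version), and make_stack_flags() is a hypothetical name, not part of any API.

```c
#include <stdint.h>

/* Local copies of the UAPI constants (assumed values; a real BPF
 * program would take them from <linux/bpf.h>). */
#define SKETCH_BPF_F_SKIP_FIELD_MASK 0xffULL
#define SKETCH_BPF_F_USER_STACK      (1ULL << 8)
#define SKETCH_BPF_F_USER_BUILD_ID   (1ULL << 11)

/* Hypothetical helper: build the flags argument for bpf_get_stack(). */
static uint64_t make_stack_flags(unsigned int skip, int user, int build_id)
{
    /* The low 8 bits hold the number of frames to skip (0 to 255). */
    uint64_t flags = (uint64_t)skip & SKETCH_BPF_F_SKIP_FIELD_MASK;

    if (user) {
        flags |= SKETCH_BPF_F_USER_STACK;
        /* Build IDs are only valid together with the user stack flag. */
        if (build_id)
            flags |= SKETCH_BPF_F_USER_BUILD_ID;
    }
    return flags;
}
```

In this sketch a skip count above 255 is silently truncated by the mask, and a build ID request without the user stack flag is dropped rather than reported as an error; a real program may prefer to reject such input.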
1641
1642 int bpf_skb_load_bytes_relative(const void *skb, u32 offset, void *to,
1643 u32 len, u32 start_header)
1644
1645 Description
1646 This helper is similar to bpf_skb_load_bytes() in that it
1647 provides an easy way to load len bytes from offset from
1648 the packet associated to skb, into the buffer pointed to by
1649 to. The difference to bpf_skb_load_bytes() is that a
1650 fifth argument start_header exists in order to select a
1651 base offset to start from. start_header can be one of:
1652
1653 BPF_HDR_START_MAC
1654 Base offset to load data from is skb's mac header.
1655
1656 BPF_HDR_START_NET
1657 Base offset to load data from is skb's network
1658 header.
1659
1660 In general, "direct packet access" is the preferred
1661 method to access packet data. However, this helper is
1662 particularly useful in socket filters, where skb->data does
1663 not always point to the start of the mac header and where
1664 "direct packet access" is not available.
1665
1666 Return 0 on success, or a negative error in case of failure.
1667
1668 int bpf_fib_lookup(void *ctx, struct bpf_fib_lookup *params, int plen,
1669 u32 flags)
1670
1671 Description
1672 Do a FIB lookup in kernel tables using parameters in
1673 params. If the lookup is successful and the result shows the
1674 packet is to be forwarded, the neighbor tables are searched
1675 for the nexthop. If successful (i.e., FIB lookup shows
1676 forwarding and nexthop is resolved), the nexthop address is
1677 returned in ipv4_dst or ipv6_dst based on family, smac is
1678 set to the mac address of the egress device, dmac is set to
1679 the nexthop mac address, rt_metric is set to the metric from
1680 the route (IPv4/IPv6 only), and ifindex is set to the device
1681 index of the nexthop from the FIB lookup.
1682
1683 The plen argument is the size of the struct passed in. The
1684 flags argument can be a combination of one or more of the
1685 following values:
1686
1687 BPF_FIB_LOOKUP_DIRECT
1688 Do a direct table lookup instead of a full lookup
1689 using FIB rules.
1690
1691 BPF_FIB_LOOKUP_OUTPUT
1692 Perform lookup from an egress perspective (default
1693 is ingress).
1694
1695 ctx is either struct xdp_md for XDP programs or struct
1696 sk_buff for tc cls_act programs.
1697
1698 Return
1699
1700 · < 0 if any input argument is invalid
1701
1702 · 0 on success (packet is forwarded, nexthop neighbor
1703 exists)
1704
1705 · > 0 one of the BPF_FIB_LKUP_RET_ codes explaining why the
1706 packet is not forwarded or needs help from the full stack
1707
1708 int bpf_sock_hash_update(struct bpf_sock_ops *skops, struct bpf_map
1709 *map, void *key, u64 flags)
1710
1711 Description
1712 Add an entry to, or update a sockhash map referencing
1713 sockets. The skops is used as a new value for the entry
1714 associated to key. flags is one of:
1715
1716 BPF_NOEXIST
1717 The entry for key must not exist in the map.
1718
1719 BPF_EXIST
1720 The entry for key must already exist in the map.
1721
1722 BPF_ANY
1723 No condition on the existence of the entry for
1724 key.
1725
1726 If the map has eBPF programs (parser and verdict), those
1727 will be inherited by the socket being added. If the
1728 socket is already attached to eBPF programs, this results
1729 in an error.
1730
1731 Return 0 on success, or a negative error in case of failure.
1732
1733 int bpf_msg_redirect_hash(struct sk_msg_buff *msg, struct bpf_map *map,
1734 void *key, u64 flags)
1735
1736 Description
1737 This helper is used in programs implementing policies at
1738 the socket level. If the message msg is allowed to pass
1739 (i.e. if the verdict eBPF program returns SK_PASS), redi‐
1740 rect it to the socket referenced by map (of type
1741 BPF_MAP_TYPE_SOCKHASH) using hash key. Both ingress and
1742 egress interfaces can be used for redirection. The
1743 BPF_F_INGRESS value in flags is used to make the distinc‐
1744 tion (ingress path is selected if the flag is present,
1745 egress path otherwise). This is the only flag supported
1746 for now.
1747
1748 Return SK_PASS on success, or SK_DROP on error.
1749
1750 int bpf_sk_redirect_hash(struct sk_buff *skb, struct bpf_map *map, void
1751 *key, u64 flags)
1752
1753 Description
1754 This helper is used in programs implementing policies at
1755 the skb socket level. If the sk_buff skb is allowed to
1756 pass (i.e. if the verdict eBPF program returns
1757 SK_PASS), redirect it to the socket referenced by map (of
1758 type BPF_MAP_TYPE_SOCKHASH) using hash key. Both ingress
1759 and egress interfaces can be used for redirection. The
1760 BPF_F_INGRESS value in flags is used to make the distinc‐
1761 tion (ingress path is selected if the flag is present,
1762 egress otherwise). This is the only flag supported for
1763 now.
1764
1765 Return SK_PASS on success, or SK_DROP on error.
1766
1767 int bpf_lwt_push_encap(struct sk_buff *skb, u32 type, void *hdr, u32
1768 len)
1769
1770 Description
1771 Encapsulate the packet associated to skb within a Layer 3
1772 protocol header. This header is provided in the buffer at
1773 address hdr, with len its size in bytes. type indicates
1774 the protocol of the header and can be one of:
1775
1776 BPF_LWT_ENCAP_SEG6
1777 IPv6 encapsulation with Segment Routing Header
1778 (struct ipv6_sr_hdr). hdr only contains the SRH,
1779 the IPv6 header is computed by the kernel.
1780
1781 BPF_LWT_ENCAP_SEG6_INLINE
1782 Only works if skb contains an IPv6 packet. Insert
1783 a Segment Routing Header (struct ipv6_sr_hdr)
1784 inside the IPv6 header.
1785
1786 BPF_LWT_ENCAP_IP
1787 IP encapsulation (GRE/GUE/IPIP/etc). The outer
1788 header must be IPv4 or IPv6, followed by zero or
1789 more additional headers, up to LWT_BPF_MAX_HEAD‐
1790 ROOM total bytes in all prepended headers. Please
1791 note that if skb_is_gso(skb) is true, no more than
1792 two headers can be prepended, and the inner
1793 header, if present, should be either GRE or
1794 UDP/GUE.
1795
1796 BPF_LWT_ENCAP_SEG6* types can be called by BPF programs
1797 of type BPF_PROG_TYPE_LWT_IN; BPF_LWT_ENCAP_IP type can
1798 be called by BPF programs of types BPF_PROG_TYPE_LWT_IN
1799 and BPF_PROG_TYPE_LWT_XMIT.
1800
1801 A call to this helper may change the underlying packet
1802 buffer. Therefore, at load time, all checks on pointers
1803 previously done by the verifier are invalidated and must
1804 be performed again if the helper is used in combination
1805 with direct packet access.
1806
1807 Return 0 on success, or a negative error in case of failure.
1808
1809 int bpf_lwt_seg6_store_bytes(struct sk_buff *skb, u32 offset, const
1810 void *from, u32 len)
1811
1812 Description
1813 Store len bytes from address from into the packet associ‐
1814 ated to skb, at offset. Only the flags, tag and TLVs
1815 inside the outermost IPv6 Segment Routing Header can be
1816 modified through this helper.
1817
1818 A call to this helper may change the underlying packet
1819 buffer. Therefore, at load time, all checks on pointers
1820 previously done by the verifier are invalidated and must
1821 be performed again if the helper is used in combination
1822 with direct packet access.
1823
1824 Return 0 on success, or a negative error in case of failure.
1825
1826 int bpf_lwt_seg6_adjust_srh(struct sk_buff *skb, u32 offset, s32 delta)
1827
1828 Description
1829 Adjust the size allocated to TLVs in the outermost IPv6
1830 Segment Routing Header contained in the packet associated
1831 to skb, at position offset by delta bytes. Only offsets
1832 after the segments are accepted. delta can be either
1833 positive (growing) or negative (shrinking).
1834
1835 A call to this helper may change the underlying packet
1836 buffer. Therefore, at load time, all checks on pointers
1837 previously done by the verifier are invalidated and must
1838 be performed again if the helper is used in combination
1839 with direct packet access.
1840
1841 Return 0 on success, or a negative error in case of failure.
1842
1843 int bpf_lwt_seg6_action(struct sk_buff *skb, u32 action, void *param,
1844 u32 param_len)
1845
1846 Description
1847 Apply an IPv6 Segment Routing action of type action to
1848 the packet associated to skb. Each action takes a parame‐
1849 ter contained at address param, and of length param_len
1850 bytes. action can be one of:
1851
1852 SEG6_LOCAL_ACTION_END_X
1853 End.X action: Endpoint with Layer-3 cross-connect.
1854 Type of param: struct in6_addr.
1855
1856 SEG6_LOCAL_ACTION_END_T
1857 End.T action: Endpoint with specific IPv6 table
1858 lookup. Type of param: int.
1859
1860 SEG6_LOCAL_ACTION_END_B6
1861 End.B6 action: Endpoint bound to an SRv6 policy.
1862 Type of param: struct ipv6_sr_hdr.
1863
1864 SEG6_LOCAL_ACTION_END_B6_ENCAP
1865 End.B6.Encap action: Endpoint bound to an SRv6
1866 encapsulation policy. Type of param: struct
1867 ipv6_sr_hdr.
1868
1869 A call to this helper may change the underlying packet
1870 buffer. Therefore, at load time, all checks on pointers
1871 previously done by the verifier are invalidated and must
1872 be performed again if the helper is used in combination
1873 with direct packet access.
1874
1875 Return 0 on success, or a negative error in case of failure.
1876
1877 int bpf_rc_repeat(void *ctx)
1878
1879 Description
1880 This helper is used in programs implementing IR decoding,
1881 to report a successfully decoded repeat key message. This
1882 delays the generation of a key up event for the previously
1883 generated key down event.
1884
1885 Some IR protocols like NEC have a special IR message for
1886 repeating the last button, for when a button is held down.
1887
1888 The ctx should point to the lirc sample as passed into
1889 the program.
1890
1891 This helper is only available if the kernel was compiled
1892 with the CONFIG_BPF_LIRC_MODE2 configuration option set
1893 to "y".
1894
1895 Return 0
1896
1897 int bpf_rc_keydown(void *ctx, u32 protocol, u64 scancode, u32 toggle)
1898
1899 Description
1900 This helper is used in programs implementing IR decoding,
1901 to report a successfully decoded key press with scancode and
1902 toggle value in the given protocol. The scancode will be
1903 translated to a keycode using the rc keymap, and reported
1904 as an input key down event. After a period a key up event
1905 is generated. This period can be extended by calling
1906 either bpf_rc_keydown() again with the same values, or
1907 calling bpf_rc_repeat().
1908
1909 Some protocols include a toggle bit, in case the button
1910 was released and pressed again between consecutive scan‐
1911 codes.
1912
1913 The ctx should point to the lirc sample as passed into
1914 the program.
1915
1916 The protocol is the decoded protocol number (see enum
1917 rc_proto for some predefined values).
1918
1919 This helper is only available if the kernel was compiled
1920 with the CONFIG_BPF_LIRC_MODE2 configuration option set
1921 to "y".
1922
1923 Return 0
1924
1925 u64 bpf_skb_cgroup_id(struct sk_buff *skb)
1926
1927 Description
1928 Return the cgroup v2 id of the socket associated with the
1929 skb. This is roughly similar to the bpf_get_cgroup_classid()
1930 helper for cgroup v1 in that it provides a tag or
1931 identifier that can be matched on or used for map lookups,
1932 e.g. to implement policy. The cgroup v2 id of a given
1933 path in the hierarchy is exposed in user space through
1934 the f_handle API in order to get to the same 64-bit id.
1935
1936 This helper can be used on TC egress path, but not on
1937 ingress, and is available only if the kernel was compiled
1938 with the CONFIG_SOCK_CGROUP_DATA configuration option.
1939
1940 Return The id is returned or 0 in case the id could not be
1941 retrieved.
1942
1943 u64 bpf_get_current_cgroup_id(void)
1944
1945 Return A 64-bit integer containing the current cgroup id based
1946 on the cgroup within which the current task is running.
1947
1948 void *bpf_get_local_storage(void *map, u64 flags)
1949
1950 Description
1951 Get the pointer to the local storage area. The type and
1952 the size of the local storage are defined by the map
1953 argument. The meaning of flags is specific to each map
1954 type, and flags has to be 0 for cgroup local storage.
1955
1956 Depending on the BPF program type, a local storage area
1957 can be shared between multiple instances of the BPF pro‐
1958 gram, running simultaneously.
1959
1960 The user is responsible for synchronization, for example
1961 by using the BPF_STX_XADD instruction to alter the shared
1962 data.
1963
1964 Return A pointer to the local storage area.
1965
1966 int bpf_sk_select_reuseport(struct sk_reuseport_md *reuse, struct
1967 bpf_map *map, void *key, u64 flags)
1968
1969 Description
1970 Select a SO_REUSEPORT socket from a
1971 BPF_MAP_TYPE_REUSEPORT_ARRAY map. It checks that the selected
1972 socket matches the incoming request in the socket buffer.
1973
1974 Return 0 on success, or a negative error in case of failure.
1975
1976 u64 bpf_skb_ancestor_cgroup_id(struct sk_buff *skb, int ancestor_level)
1977
1978 Description
1979 Return the id of the cgroup v2 that is the ancestor of the
1980 cgroup associated with the skb at the ancestor_level. The
1981 root cgroup is at ancestor_level zero and each step down
1982 the hierarchy increments the level. If ancestor_level ==
1983 level of cgroup associated with skb, then the return value
1984 will be the same as that of bpf_skb_cgroup_id().
1985
1986 The helper is useful to implement policies based on
1987 cgroups that are higher in the hierarchy than the immediate
1988 cgroup associated with skb.
1989
1990 The format of the returned id and the helper limitations
1991 are the same as in bpf_skb_cgroup_id().
1992
1993 Return The id is returned or 0 in case the id could not be
1994 retrieved.
1995
1996 struct bpf_sock *bpf_sk_lookup_tcp(void *ctx, struct bpf_sock_tuple
1997 *tuple, u32 tuple_size, u64 netns, u64 flags)
1998
1999 Description
2000 Look for a TCP socket matching tuple, optionally in a child
2001 network namespace netns. The return value must be
2002 checked, and if non-NULL, released via bpf_sk_release().
2003
2004 The ctx should point to the context of the program, such
2005 as the skb or socket (depending on the hook in use). This
2006 is used to determine the base network namespace for the
2007 lookup.
2008
2009 tuple_size must be one of:
2010
2011 sizeof(tuple->ipv4)
2012 Look for an IPv4 socket.
2013
2014 sizeof(tuple->ipv6)
2015 Look for an IPv6 socket.
2016
2017 If the netns is a negative signed 32-bit integer, then
2018 the socket lookup table in the netns associated with the
2019 ctx will be used. For the TC hooks, this is the
2020 netns of the device in the skb. For socket hooks, this is
2021 the netns of the socket. If netns is any other signed
2022 32-bit value greater than or equal to zero then it speci‐
2023 fies the ID of the netns relative to the netns associated
2024 with the ctx. netns values beyond the range of 32-bit
2025 integers are reserved for future use.
2026
2027 All values for flags are reserved for future usage, and
2028 must be left at zero.
2029
2030 This helper is available only if the kernel was compiled
2031 with the CONFIG_NET configuration option.
2032
2033 Return Pointer to struct bpf_sock, or NULL in case of failure.
2034 For sockets with reuseport option, the struct bpf_sock
2035 result is from reuse->socks[] using the hash of the
2036 tuple.
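
The rules for the netns argument of bpf_sk_lookup_tcp() can be sketched in user space as follows. This is one reading of the description above, in which the low 32 bits of netns are interpreted as a signed integer; it is not kernel code, and the enum and classify_netns() are hypothetical names introduced only for illustration.

```c
#include <stdint.h>

enum netns_kind {
    NETNS_CURRENT,   /* use the netns associated with ctx */
    NETNS_RELATIVE,  /* netns ID relative to the netns of ctx */
    NETNS_RESERVED   /* outside the accepted range: reserved */
};

/* Hypothetical helper mirroring the rules in the text above. */
static enum netns_kind classify_netns(uint64_t netns)
{
    /* Bit 31 set: the low 32 bits are negative when read as signed. */
    if ((uint32_t)netns & 0x80000000u)
        return NETNS_CURRENT;
    /* Zero or positive signed 32-bit value: an explicit netns ID. */
    if (netns <= (uint64_t)INT32_MAX)
        return NETNS_RELATIVE;
    /* Anything else is reserved for future use. */
    return NETNS_RESERVED;
}
```

Under this reading, a value of -1 converted to u64 selects the netns associated with ctx, while 0 and small positive values are treated as netns IDs.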
2037
2038 struct bpf_sock *bpf_sk_lookup_udp(void *ctx, struct bpf_sock_tuple
2039 *tuple, u32 tuple_size, u64 netns, u64 flags)
2040
2041 Description
2042 Look for a UDP socket matching tuple, optionally in a child
2043 network namespace netns. The return value must be
2044 checked, and if non-NULL, released via bpf_sk_release().
2045
2046 The ctx should point to the context of the program, such
2047 as the skb or socket (depending on the hook in use). This
2048 is used to determine the base network namespace for the
2049 lookup.
2050
2051 tuple_size must be one of:
2052
2053 sizeof(tuple->ipv4)
2054 Look for an IPv4 socket.
2055
2056 sizeof(tuple->ipv6)
2057 Look for an IPv6 socket.
2058
2059 If the netns is a negative signed 32-bit integer, then
2060 the socket lookup table in the netns associated with the
2061 ctx will be used. For the TC hooks, this is the
2062 netns of the device in the skb. For socket hooks, this is
2063 the netns of the socket. If netns is any other signed
2064 32-bit value greater than or equal to zero then it speci‐
2065 fies the ID of the netns relative to the netns associated
2066 with the ctx. netns values beyond the range of 32-bit
2067 integers are reserved for future use.
2068
2069 All values for flags are reserved for future usage, and
2070 must be left at zero.
2071
2072 This helper is available only if the kernel was compiled
2073 with the CONFIG_NET configuration option.
2074
2075 Return Pointer to struct bpf_sock, or NULL in case of failure.
2076 For sockets with reuseport option, the struct bpf_sock
2077 result is from reuse->socks[] using the hash of the
2078 tuple.
2079
2080 int bpf_sk_release(struct bpf_sock *sock)
2081
2082 Description
2083 Release the reference held by sock. sock must be a
2084 non-NULL pointer that was returned from
2085 bpf_sk_lookup_xxx().
2086
2087 Return 0 on success, or a negative error in case of failure.
2088
2089 int bpf_map_push_elem(struct bpf_map *map, const void *value, u64
2090 flags)
2091
2092 Description
2093 Push an element value in map. flags is one of:
2094
2095 BPF_EXIST
2096 If the queue/stack is full, the oldest element is
2097 removed to make room for this.
2098
2099 Return 0 on success, or a negative error in case of failure.
2100
2101 int bpf_map_pop_elem(struct bpf_map *map, void *value)
2102
2103 Description
2104 Pop an element from map.
2105
2106 Return 0 on success, or a negative error in case of failure.
2107
2108 int bpf_map_peek_elem(struct bpf_map *map, void *value)
2109
2110 Description
2111 Get an element from map without removing it.
2112
2113 Return 0 on success, or a negative error in case of failure.
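
To make the push, pop and peek semantics concrete, here is a small user-space model of a FIFO queue with the BPF_EXIST behavior described for bpf_map_push_elem(). It is only a sketch: the capacity, the sketch_* names, the numeric value of BPF_EXIST and the specific error codes (-E2BIG when full, -ENOENT when empty) are assumptions made for illustration, not a statement about the kernel implementation.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_BPF_EXIST 2ULL   /* assumed value of BPF_EXIST */
#define QUEUE_CAP 4             /* arbitrary capacity for the model */

struct sketch_queue {
    uint32_t slots[QUEUE_CAP];
    unsigned int head;   /* index of the oldest element */
    unsigned int count;  /* number of stored elements */
};

/* Push value; with SKETCH_BPF_EXIST a full queue evicts its oldest
 * element, otherwise the push fails. */
static int sketch_push(struct sketch_queue *q, uint32_t value, uint64_t flags)
{
    if (q->count == QUEUE_CAP) {
        if (!(flags & SKETCH_BPF_EXIST))
            return -E2BIG;
        q->head = (q->head + 1) % QUEUE_CAP;  /* drop the oldest */
        q->count--;
    }
    q->slots[(q->head + q->count) % QUEUE_CAP] = value;
    q->count++;
    return 0;
}

/* Copy the oldest element into *value without removing it. */
static int sketch_peek(const struct sketch_queue *q, uint32_t *value)
{
    if (q->count == 0)
        return -ENOENT;
    *value = q->slots[q->head];
    return 0;
}

/* Copy the oldest element into *value and remove it. */
static int sketch_pop(struct sketch_queue *q, uint32_t *value)
{
    int err = sketch_peek(q, value);

    if (err)
        return err;
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return 0;
}
```

A stack-type map would return the most recently pushed element from pop and peek instead; only the FIFO case is modelled here.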
2114
2115 int bpf_msg_push_data(struct sk_msg_buff *msg, u32 start, u32 len, u64
2116 flags)
2117
2118 Description
2119 For socket policies, insert len bytes into msg at offset
2120 start.
2121
2122 If a program of type BPF_PROG_TYPE_SK_MSG is run on a msg
2123 it may want to insert metadata or options into the msg.
2124 This can later be read and used by any of the lower layer
2125 BPF hooks.
2126
2127 This helper may fail under memory pressure (if a malloc
2128 fails). In these cases, BPF programs will get an appropriate
2129 error and will need to handle it.
2130
2131 Return 0 on success, or a negative error in case of failure.
2132
2133 int bpf_msg_pop_data(struct sk_msg_buff *msg, u32 start, u32 len, u64
2134 flags)
2135
2136 Description
2137 Remove len bytes from msg, starting at byte start. This
2138 may result in ENOMEM errors under certain situations if an
2139 allocation and copy are required due to a full ring
2140 buffer. However, the helper will try to avoid doing the
2141 allocation if possible. Other errors can occur if input
2142 parameters are invalid, either because the start byte is
2143 not a valid part of the msg payload and/or because the pop
2144 value is too large.
2145
2146 Return 0 on success, or a negative error in case of failure.
2147
2148 int bpf_rc_pointer_rel(void *ctx, s32 rel_x, s32 rel_y)
2149
2150 Description
2151 This helper is used in programs implementing IR decoding,
2152 to report a successfully decoded pointer movement.
2153
2154 The ctx should point to the lirc sample as passed into
2155 the program.
2156
2157 This helper is only available if the kernel was compiled
2158 with the CONFIG_BPF_LIRC_MODE2 configuration option set
2159 to "y".
2160
2161 Return 0
2162
2163 int bpf_spin_lock(struct bpf_spin_lock *lock)
2164
2165 Description
2166 Acquire a spinlock represented by the pointer lock, which
2167 is stored as part of a value of a map. Taking the lock
2168 allows the rest of the fields in that value to be updated
2169 safely. The spinlock can (and must) later be released with
2170 a call to bpf_spin_unlock(lock).
2171
2172 Spinlocks in BPF programs come with a number of restric‐
2173 tions and constraints:
2174
2175 · bpf_spin_lock objects are only allowed inside maps of
2176 types BPF_MAP_TYPE_HASH and BPF_MAP_TYPE_ARRAY (this
2177 list could be extended in the future).
2178
2179 · BTF description of the map is mandatory.
2180
2181 · The BPF program can take ONE lock at a time, since
2182 taking two or more could cause deadlocks.
2183
2184 · Only one struct bpf_spin_lock is allowed per map ele‐
2185 ment.
2186
2187 · When the lock is taken, calls (either BPF to BPF or
2188 helpers) are not allowed.
2189
2190 · The BPF_LD_ABS and BPF_LD_IND instructions are not
2191 allowed inside a spinlock-ed region.
2192
2193 · The BPF program MUST call bpf_spin_unlock() to release
2194 the lock, on all execution paths, before it returns.
2195
2196 · The BPF program can access struct bpf_spin_lock only
2197 via the bpf_spin_lock() and bpf_spin_unlock() helpers.
2198 Loading or storing data into the struct bpf_spin_lock
2199 lock; field of a map is not allowed.
2200
2201 · To use the bpf_spin_lock() helper, the BTF description
2202 of the map value must be a struct and have a struct
2203 bpf_spin_lock anyname; field at the top level. A lock
2204 nested inside another struct is not allowed.
2205
2206 · The struct bpf_spin_lock lock field in a map value must
2207 be aligned on a multiple of 4 bytes in that value.
2208
2209 · Syscall with command BPF_MAP_LOOKUP_ELEM does not copy
2210 the bpf_spin_lock field to user space.
2211
2212 · Syscall with command BPF_MAP_UPDATE_ELEM, or update
2213 from a BPF program, does not update the bpf_spin_lock
2214 field.
2215
2216 · bpf_spin_lock cannot be on the stack or inside a
2217 networking packet (it can only be inside map values).
2218
2219 · bpf_spin_lock is available to root only.
2220
2221 · Tracing programs and socket filter programs cannot use
2222 bpf_spin_lock() due to insufficient preemption checks
2223 (but this may change in the future).
2224
2225 · bpf_spin_lock is not allowed in inner maps of
2226 map-in-map.
2227
2228 Return 0
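
The layout constraints listed for bpf_spin_lock() (one lock per map element, at the top level of the value struct, aligned on a multiple of 4 bytes) can be illustrated with a plain C sketch. The struct sketch_spin_lock below is a stand-in that assumes the kernel's struct bpf_spin_lock is a single 32-bit word; struct counter_value and lock_offset_ok() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's struct bpf_spin_lock (assumed layout);
 * a BPF program would take the real one from <linux/bpf.h>. */
struct sketch_spin_lock {
    uint32_t val;
};

/* Hypothetical map value: exactly one lock, at the top level of the
 * struct, next to the fields it protects. */
struct counter_value {
    uint64_t packets;
    uint64_t bytes;
    struct sketch_spin_lock lock;
};

/* The lock field must be aligned on a multiple of 4 bytes. */
_Static_assert(offsetof(struct counter_value, lock) % 4 == 0,
               "lock must be 4-byte aligned within the map value");

/* Returns 1 if a lock at byte offset off satisfies the alignment rule. */
static int lock_offset_ok(size_t off)
{
    return off % 4 == 0;
}
```

In a BPF program, packets and bytes would then only be updated between bpf_spin_lock(&value->lock) and bpf_spin_unlock(&value->lock), with no other helper calls in between.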
2229
2230 int bpf_spin_unlock(struct bpf_spin_lock *lock)
2231
2232 Description
2233 Release the lock previously locked by a call to
2234 bpf_spin_lock(lock).
2235
2236 Return 0
2237
2238 struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)
2239
2240 Description
2241 This helper gets a struct bpf_sock pointer such that all
2242 the fields in this bpf_sock can be accessed.
2243
2244 Return A struct bpf_sock pointer on success, or NULL in case of
2245 failure.
2246
2247 struct bpf_tcp_sock *bpf_tcp_sock(struct bpf_sock *sk)
2248
2249 Description
2250 This helper gets a struct bpf_tcp_sock pointer from a
2251 struct bpf_sock pointer.
2252
2253 Return A struct bpf_tcp_sock pointer on success, or NULL in case
2254 of failure.
2255
2256 int bpf_skb_ecn_set_ce(struct sk_buff *skb)
2257
2258 Description
2259 Set ECN (Explicit Congestion Notification) field of IP
2260 header to CE (Congestion Encountered) if current value is
2261 ECT (ECN Capable Transport). Otherwise, do nothing. Works
2262 with IPv6 and IPv4.
2263
2264 Return 1 if the CE flag is set (either by the current helper
2265 call or because it was already present), 0 if it is not
2266 set.
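
The decision made by bpf_skb_ecn_set_ce() can be modelled on a single TOS (or traffic class) byte. The sketch below assumes the standard ECN codepoints in the two low bits (00 Not-ECT, 01 and 10 ECT, 11 CE); sketch_ecn_set_ce() is a hypothetical user-space function, and it leaves out everything else the kernel has to do, such as updating the IPv4 header checksum.

```c
#include <stdint.h>

#define ECN_MASK    0x03u  /* low two bits of the TOS / traffic class */
#define ECN_NOT_ECT 0x00u  /* sender is not ECN capable */
#define ECN_CE      0x03u  /* Congestion Encountered */

/* Hypothetical model of the helper's decision: returns 1 if CE is set
 * after the call, 0 otherwise. */
static int sketch_ecn_set_ce(uint8_t *tos)
{
    uint8_t ecn = *tos & ECN_MASK;

    if (ecn == ECN_NOT_ECT)
        return 0;        /* not ECN capable: leave the byte untouched */
    *tos |= ECN_CE;      /* ECT(0), ECT(1) and CE all end up as CE */
    return 1;
}
```

The upper six bits (the DSCP value) are preserved in every case, which is why the model uses a bitwise OR rather than overwriting the byte.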
2267
2268 struct bpf_sock *bpf_get_listener_sock(struct bpf_sock *sk)
2269
2270 Description
2271 Return a struct bpf_sock pointer in TCP_LISTEN state.
2272 bpf_sk_release() is unnecessary and not allowed.
2273
2274 Return A struct bpf_sock pointer on success, or NULL in case of
2275 failure.
2276
2277 struct bpf_sock *bpf_skc_lookup_tcp(void *ctx, struct bpf_sock_tuple
2278 *tuple, u32 tuple_size, u64 netns, u64 flags)
2279
2280 Description
2281 Look for a TCP socket matching tuple, optionally in a child
2282 network namespace netns. The return value must be
2283 checked, and if non-NULL, released via bpf_sk_release().
2284
2285 This function is identical to bpf_sk_lookup_tcp(), except
2286 that it also returns timewait or request sockets. Use
2287 bpf_sk_fullsock() or bpf_tcp_sock() to access the full
2288 structure.
2289
2290 This helper is available only if the kernel was compiled
2291 with the CONFIG_NET configuration option.
2292
2293 Return Pointer to struct bpf_sock, or NULL in case of failure.
2294 For sockets with reuseport option, the struct bpf_sock
2295 result is from reuse->socks[] using the hash of the
2296 tuple.
2297
2298 int bpf_tcp_check_syncookie(struct bpf_sock *sk, void *iph, u32
2299 iph_len, struct tcphdr *th, u32 th_len)
2300
2301 Description
2302 Check whether iph and th contain a valid SYN cookie ACK
2303 for the listening socket in sk.
2304
2305 iph points to the start of the IPv4 or IPv6 header, while
2306 iph_len contains sizeof(struct iphdr) or sizeof(struct
2307 ipv6hdr).
2308
2309 th points to the start of the TCP header, while th_len
2310 contains sizeof(struct tcphdr).
2311
2312 Return 0 if iph and th are a valid SYN cookie ACK, or a negative
2313 error otherwise.
2314
2315 int bpf_sysctl_get_name(struct bpf_sysctl *ctx, char *buf, size_t
2316 buf_len, u64 flags)
2317
2318 Description
2319 Get name of sysctl in /proc/sys/ and copy it into the
2320 program-provided buffer buf of size buf_len.
2321
2322 The buffer is always NUL terminated, unless it's
2323 zero-sized.
2324
2325 If flags is zero, full name (e.g. "net/ipv4/tcp_mem") is
2326 copied. Use BPF_F_SYSCTL_BASE_NAME flag to copy base name
2327 only (e.g. "tcp_mem").
2328
2329 Return Number of characters copied (not including the trailing
2330 NUL).
2331
2332 -E2BIG if the buffer wasn't big enough (buf will contain
2333 truncated name in this case).
2334
2335 int bpf_sysctl_get_current_value(struct bpf_sysctl *ctx, char *buf,
2336 size_t buf_len)
2337
2338 Description
2339 Get current value of sysctl as it is presented in
2340 /proc/sys (incl. newline, etc), and copy it as a string
2341 into the program-provided buffer buf of size buf_len.
2342
2343 The whole value is copied, no matter what file position
2344 user space issued e.g. sys_read at.
2345
2346 The buffer is always NUL terminated, unless it's
2347 zero-sized.
2348
2349 Return Number of characters copied (not including the trailing
2350 NUL).
2351
2352 -E2BIG if the buffer wasn't big enough (buf will contain
2353 truncated value in this case).
2354
2355 -EINVAL if current value was unavailable, e.g. because
2356 sysctl is uninitialized and read returns -EIO for it.
2357
2358 int bpf_sysctl_get_new_value(struct bpf_sysctl *ctx, char *buf, size_t
2359 buf_len)
2360
2361 Description
2362 Get new value being written by user space to sysctl
2363 (before the actual write happens) and copy it as a string
2364 into the program-provided buffer buf of size buf_len.
2365
2366 User space may write new value at file position > 0.
2367
2368 The buffer is always NUL terminated, unless it's
2369 zero-sized.
2370
2371 Return Number of characters copied (not including the trailing
2372 NUL).
2373
2374 -E2BIG if the buffer wasn't big enough (buf will contain
2375 truncated value in this case).
2376
2377 -EINVAL if sysctl is being read.
2378
2379 int bpf_sysctl_set_new_value(struct bpf_sysctl *ctx, const char *buf,
2380 size_t buf_len)
2381
2382 Description
2383 Override new value being written by user space to sysctl
2384 with value provided by program in buffer buf of size
2385 buf_len.
2386
2387 buf should contain a string in same form as provided by
2388 user space on sysctl write.
2389
2390 User space may write new value at file position > 0. To
2391 override the whole sysctl value, file position should be
2392 set to zero.
2393
2394 Return 0 on success.
2395
2396 -E2BIG if the buf_len is too big.
2397
2398 -EINVAL if sysctl is being read.
2399
2400 int bpf_strtol(const char *buf, size_t buf_len, u64 flags, long *res)
2401
2402 Description
2403 Convert the initial part of the string from buffer buf of
2404 size buf_len to a long integer according to the given
2405 base and save the result in res.
2406
2407 The string may begin with an arbitrary amount of white
2408 space (as determined by isspace(3)) followed by a single
2409 optional '-' sign.
2410
2411 Five least significant bits of flags encode base, other
2412 bits are currently unused.
2413
2414 Base must be either 8, 10, 16, or 0 to detect it
2415 automatically, similar to user space strtol(3).
2416
2417 Return Number of characters consumed on success. Must be posi‐
2418 tive but no more than buf_len.
2419
2420 -EINVAL if no valid digits were found or unsupported base
2421 was provided.
2422
2423 -ERANGE if resulting value was out of range.
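
The way bpf_strtol() reads the base from flags can be sketched in user space as follows. The mask covers the five least significant bits as stated above; base_from_flags() and SKETCH_BASE_MASK are hypothetical names, and how the kernel treats the unused upper bits is not modelled here.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_BASE_MASK 0x1fULL  /* five least significant bits */

/* Hypothetical helper: extract the base from flags and validate it.
 * Returns the base (0, 8, 10 or 16) or -EINVAL if unsupported. */
static int base_from_flags(uint64_t flags)
{
    unsigned int base = (unsigned int)(flags & SKETCH_BASE_MASK);

    switch (base) {
    case 0:   /* detect the base automatically */
    case 8:
    case 10:
    case 16:
        return (int)base;
    default:
        return -EINVAL;
    }
}
```

A caller that wants hexadecimal parsing would therefore pass 16 as flags, and 0 to let the prefix of the string decide.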
2424
2425 int bpf_strtoul(const char *buf, size_t buf_len, u64 flags, unsigned
2426 long *res)
2427
2428 Description
2429 Convert the initial part of the string from buffer buf of
2430 size buf_len to an unsigned long integer according to the
2431 given base and save the result in res.
2432
2433 The string may begin with an arbitrary amount of white
2434 space (as determined by isspace(3)).
2435
2436 Five least significant bits of flags encode base, other
2437 bits are currently unused.
2438
2439 Base must be either 8, 10, 16, or 0 to detect it
2440 automatically, similar to user space strtoul(3).
2441
2442 Return Number of characters consumed on success. Must be posi‐
2443 tive but no more than buf_len.
2444
2445 -EINVAL if no valid digits were found or unsupported base
2446 was provided.
2447
2448 -ERANGE if resulting value was out of range.
2449
2450 void *bpf_sk_storage_get(struct bpf_map *map, struct bpf_sock *sk, void
2451 *value, u64 flags)
2452
2453 Description
2454 Get a bpf-local-storage from a sk.
2455
2456 Logically, it can be thought of as getting the value from
2457 a map with sk as the key. From this perspective, the
2458 usage is not much different from bpf_map_lookup_elem(map,
2459 &sk) except this helper enforces that the key must be a
2460 full socket and the map must be a BPF_MAP_TYPE_SK_STORAGE
2461 also.
2462
2463 Underneath, the value is stored locally at sk instead of
2464 the map. The map is used as the bpf-local-storage
2465 "type". The bpf-local-storage "type" (i.e. the map) is
2466 searched against all bpf-local-storages residing at sk.
2467
2468 An optional flag (BPF_SK_STORAGE_GET_F_CREATE) can be
2469 used such that a new bpf-local-storage will be created if
2470 one does not exist. value can be used together with
2471 BPF_SK_STORAGE_GET_F_CREATE to specify the initial value
2472 of a bpf-local-storage. If value is NULL, the new
2473 bpf-local-storage will be zero initialized.
2474
2475 Return A bpf-local-storage pointer is returned on success.
2476
2477 NULL if not found or there was an error in adding a new
2478 bpf-local-storage.
2479
2480 int bpf_sk_storage_delete(struct bpf_map *map, struct bpf_sock *sk)
2481
2482 Description
2483 Delete a bpf-local-storage from a sk.
2484
2485 Return 0 on success.
2486
2487 -ENOENT if the bpf-local-storage cannot be found.
2488
2489 int bpf_send_signal(u32 sig)
2490
2491 Description
2492 Send signal sig to the process of the current task. The
2493 signal may be delivered to any of this process's threads.
2494
2495 Return 0 on success or successfully queued.
2496
2497 -EBUSY if work queue under nmi is full.
2498
2499 -EINVAL if sig is invalid.
2500
2501 -EPERM if no permission to send the sig.
2502
2503 -EAGAIN if bpf program can try again.
2504
2505 s64 bpf_tcp_gen_syncookie(struct bpf_sock *sk, void *iph, u32 iph_len,
2506 struct tcphdr *th, u32 th_len)
2507
2508 Description
2509 Try to issue a SYN cookie for the packet with correspond‐
2510 ing IP/TCP headers, iph and th, on the listening socket
2511 in sk.
2512
2513 iph points to the start of the IPv4 or IPv6 header, while
2514 iph_len contains sizeof(struct iphdr) or sizeof(struct
2515 ip6hdr).
2516
2517 th points to the start of the TCP header, while th_len
2518 contains the length of the TCP header.
2519
2520 Return On success, the lower 32 bits hold the generated SYN
2521 cookie, followed by 16 bits which hold the MSS value for
2522 that cookie; the top 16 bits are unused.
2523
2524 On failure, the returned value is one of the following:
2525
2526 -EINVAL SYN cookie cannot be issued due to error
2527
2528 -ENOENT SYN cookie should not be issued (no SYN flood)
2529
2530 -EOPNOTSUPP kernel configuration does not enable SYN
2531 cookies
2532
2533 -EPROTONOSUPPORT IP packet version is not 4 or 6
2534
2535 int bpf_skb_output(void *ctx, struct bpf_map *map, u64 flags, void
2536 *data, u64 size)
2537
2538 Description
2539 Write raw data blob into a special BPF perf event held by
2540 map of type BPF_MAP_TYPE_PERF_EVENT_ARRAY. This perf
2541 event must have the following attributes: PERF_SAMPLE_RAW
2542 as sample_type, PERF_TYPE_SOFTWARE as type, and
2543 PERF_COUNT_SW_BPF_OUTPUT as config.
2544
2545 The flags are used to indicate the index in map for which
2546 the value must be put, masked with BPF_F_INDEX_MASK.
2547 Alternatively, flags can be set to BPF_F_CURRENT_CPU to
2548 indicate that the index of the current CPU core should be
2549 used.
2550
2551 The value to write, size bytes long, is passed through the
2552 eBPF stack and pointed to by data.
2553
2554 ctx is a pointer to in-kernel struct sk_buff.
2555
2556 This helper is similar to bpf_perf_event_output() but
2557 restricted to raw_tracepoint bpf programs.
2558
2559 Return 0 on success, or a negative error in case of failure.
2560
2561 int bpf_probe_read_user(void *dst, u32 size, const void *unsafe_ptr)
2562
2563 Description
2564 Safely attempt to read size bytes from user space address
2565 unsafe_ptr and store the data in dst.
2566
2567 Return 0 on success, or a negative error in case of failure.
2568
2569 int bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)
2570
2571 Description
2572 Safely attempt to read size bytes from kernel space
2573 address unsafe_ptr and store the data in dst.
2574
2575 Return 0 on success, or a negative error in case of failure.
2576
2577 int bpf_probe_read_user_str(void *dst, u32 size, const void
2578 *unsafe_ptr)
2579
2580 Description
2581 Copy a NUL terminated string from an unsafe user address
2582 unsafe_ptr to dst. The size should include the terminat‐
2583 ing NUL byte. In case the string length is smaller than
2584 size, the target is not padded with further NUL bytes. If
2585 the string length is larger than size, just size-1 bytes
2586 are copied and the last byte is set to NUL.
2587
2588 On success, the length of the copied string is returned.
2589 This makes this helper useful in tracing programs for
2590 reading strings, and more importantly to get its length
2591 at runtime. See the following snippet:
2592
2593 SEC("kprobe/sys_open")
2594 void bpf_sys_open(struct pt_regs *ctx)
2595 {
2596 char buf[PATHLEN]; // PATHLEN is defined to 256
2597 int res = bpf_probe_read_user_str(buf, sizeof(buf),
2598 ctx->di);
2599
2600 // Consume buf, for example push it to
2601 // userspace via bpf_perf_event_output(); we
2602 // can use res (the string length) as event
2603 // size, after checking its boundaries.
2604 }
2605
2606 In comparison, using the bpf_probe_read_user() helper here
2607 instead to read the string would require estimating the
2608 length at compile time, and would often result in copying
2609 more memory than necessary.
2610
2611 Another useful use case is when parsing individual
2612 process arguments or individual environment variables
2613 navigating current->mm->arg_start and cur‐
2614 rent->mm->env_start: using this helper and the return
2615 value, one can quickly iterate at the right offset of the
2616 memory area.
2617
2618 Return On success, the strictly positive length of the string,
2619 including the trailing NUL character. On error, a nega‐
2620 tive value.
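
The copy and return-value semantics, and the way the returned length lets a caller step through consecutive NUL terminated strings, can be modelled in user space. This is a sketch under assumptions, not the helper itself: model_read_str() and count_strings() are hypothetical names, the error value for a zero size is invented, and the model assumes the memory area ends with a NUL byte and that every string fits in the 32-byte buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: copy at most size-1 bytes, always NUL terminate,
 * and return the copied length including the trailing NUL. */
static int model_read_str(char *dst, unsigned int size, const char *src)
{
	unsigned int i;

	assert(dst != NULL && src != NULL);
	if (size == 0)
		return -1; /* invented error value for this sketch */
	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';     /* no further padding beyond this byte */
	return (int)(i + 1);
}

/* Count the strings packed back to back in area (as process arguments
 * are), advancing by the returned length each time. */
static int count_strings(const char *area, size_t area_len)
{
	char buf[32];      /* assumes no string is truncated */
	size_t off = 0;
	int count = 0;

	while (off < area_len) {
		int n = model_read_str(buf, sizeof(buf), area + off);

		if (n <= 0)
			break;
		count++;
		off += (size_t)n; /* skip the string and its NUL */
	}
	return count;
}
```

If a string were longer than the buffer, the returned length would no longer match the distance to the next string, so a real program needs to handle truncation separately.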
2621
2622 int bpf_probe_read_kernel_str(void *dst, u32 size, const void
2623 *unsafe_ptr)
2624
2625 Description
2626 Copy a NUL terminated string from an unsafe kernel
2627 address unsafe_ptr to dst. Same semantics as with
2628 bpf_probe_read_user_str() apply.
2629
2630 Return On success, the strictly positive length of the string,
2631 including the trailing NUL character. On error, a nega‐
2632 tive value.
2633
2634 int bpf_tcp_send_ack(void *tp, u32 rcv_nxt)
2635
2636 Description
2637 Send out a tcp-ack. tp is the in-kernel struct tcp_sock.
2638 rcv_nxt is the ack_seq to be sent out.
2639
2640 Return 0 on success, or a negative error in case of failure.
2641
2642 int bpf_send_signal_thread(u32 sig)
2643
2644 Description
2645 Send signal sig to the thread corresponding to the cur‐
2646 rent task.
2647
2648 Return 0 on success or successfully queued.
2649
2650 -EBUSY if work queue under nmi is full.
2651
2652 -EINVAL if sig is invalid.
2653
2654 -EPERM if no permission to send the sig.
2655
2656 -EAGAIN if bpf program can try again.
2657
2658 u64 bpf_jiffies64(void)
2659
2660 Description
2661 Obtain the 64-bit jiffies value.
2662
2663 Return The 64-bit jiffies value.
2664
2665 int bpf_read_branch_records(struct bpf_perf_event_data *ctx, void *buf,
2666 u32 size, u64 flags)
2667
2668 Description
2669 For an eBPF program attached to a perf event, retrieve
2670 the branch records (struct perf_branch_entry) associated
2671 with ctx and store them in the buffer pointed to by buf,
2672 up to size bytes.
2673
2674 Return On success, number of bytes written to buf. On error, a
2675 negative value.
2676
2677 The flags can be set to BPF_F_GET_BRANCH_RECORDS_SIZE to
2678 instead return the number of bytes required to store all
2679 the branch entries. If this flag is set, buf may be NULL.
2680
2681 -EINVAL if arguments invalid or size not a multiple of
2682 sizeof(struct perf_branch_entry).
2683
2684 -ENOENT if architecture does not support branch records.
2685
2686 int bpf_get_ns_current_pid_tgid(u64 dev, u64 ino, struct bpf_pidns_info
2687 *nsdata, u32 size)
2688
2689 Description
2690 Store in nsdata the values for pid and tgid as seen from
2691 the current namespace.
2692
2693 Return 0 on success, or one of the following in case of failure:
2694
2695 -EINVAL if the supplied dev and ino do not match the dev_t
2696 and inode number of the nsfs of the current task, or if
2697 the conversion of dev to dev_t lost high bits.
2698
2699 -ENOENT if pidns does not exist for the current task.
2700
2701 int bpf_xdp_output(void *ctx, struct bpf_map *map, u64 flags, void
2702 *data, u64 size)
2703
2704 Description
2705 Write raw data blob into a special BPF perf event held by
2706 map of type BPF_MAP_TYPE_PERF_EVENT_ARRAY. This perf
2707 event must have the following attributes: PERF_SAMPLE_RAW
2708 as sample_type, PERF_TYPE_SOFTWARE as type, and
2709 PERF_COUNT_SW_BPF_OUTPUT as config.
2710
2711 The flags are used to indicate the index in map for which
2712 the value must be put, masked with BPF_F_INDEX_MASK.
2713 Alternatively, flags can be set to BPF_F_CURRENT_CPU to
2714 indicate that the index of the current CPU core should be
2715 used.
2716
2717 The value to write, size bytes long, is passed through the
2718 eBPF stack and pointed to by data.
2719
2720 ctx is a pointer to in-kernel struct xdp_buff.
2721
2722 This helper is similar to bpf_perf_event_output() but
2723 restricted to raw_tracepoint bpf programs.
2724
2725 Return 0 on success, or a negative error in case of failure.
2726
2727 u64 bpf_get_netns_cookie(void *ctx)
2728
2729 Description
2730 Retrieve the cookie (generated by the kernel) of the net‐
2731 work namespace the input ctx is associated with. The net‐
2732 work namespace cookie remains stable for its lifetime and
2733 provides a global identifier that can be assumed unique.
2734 If ctx is NULL, then the helper returns the cookie for
2735 the initial network namespace. The cookie itself is very
2736 similar to that of bpf_get_socket_cookie() helper, but
2737 for network namespaces instead of sockets.
2738
2739 Return An 8-byte long opaque number.
2740
2741 u64 bpf_get_current_ancestor_cgroup_id(int ancestor_level)
2742
2743 Description
2744 Return id of cgroup v2 that is ancestor of the cgroup
2745 associated with the current task at the ancestor_level.
2746 The root cgroup is at ancestor_level zero and each step
2747 down the hierarchy increments the level. If ances‐
2748 tor_level == level of cgroup associated with the current
2749 task, then return value will be the same as that of
2750 bpf_get_current_cgroup_id().
2751
2752 The helper is useful to implement policies based on
2753 cgroups that are upper in hierarchy than immediate cgroup
2754 associated with the current task.
2755
2756 The format of returned id and helper limitations are same
2757 as in bpf_get_current_cgroup_id().
2758
2759 Return The id is returned or 0 in case the id could not be
2760 retrieved.
2761
2762 int bpf_sk_assign(struct sk_buff *skb, struct bpf_sock *sk, u64 flags)
2763
2764 Description
2765 Assign the sk to the skb. When combined with appropriate
2766 routing configuration to receive the packet towards the
2767 socket, will cause skb to be delivered to the specified
2768 socket. Subsequent redirection of skb via bpf_redi‐
2769 rect(), bpf_clone_redirect() or other methods outside of
2770 BPF may interfere with successful delivery to the socket.
2771
2772 This operation is only valid from TC ingress path.
2773
2774 The flags argument must be zero.
2775
2776 Return 0 on success, or a negative error in case of failure:
2777
2778 -EINVAL if specified flags are not supported.
2779
2780 -ENOENT if the socket is unavailable for assignment.
2781
2782 -ENETUNREACH if the socket is unreachable (wrong netns).
2783
2784 -EOPNOTSUPP if the operation is not supported, for exam‐
2785 ple a call from outside of TC ingress.
2786
2787 -ESOCKTNOSUPPORT if the socket type is not supported
2788 (reuseport).
2789
2790 u64 bpf_ktime_get_boot_ns(void)
2791
2792 Description
2793 Return the time elapsed since system boot, in nanosec‐
2794 onds. Does include the time the system was suspended.
2795 See: clock_gettime(CLOCK_BOOTTIME)
2796
2797 Return Current ktime.
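
For comparison, the same clock can be read from user space through the clock_gettime(CLOCK_BOOTTIME) call referenced above. This is a minimal sketch; boottime_ns() is a hypothetical name, and returning 0 on failure is a choice made for the sketch only.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds since boot, including time spent suspended; 0 on failure. */
static uint64_t boottime_ns(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_BOOTTIME, &ts) != 0)
		return 0;
	assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
```

A user-space consumer could use a value obtained this way to relate timestamps recorded by an eBPF program to its own notion of time.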
2798
2799 int bpf_seq_printf(struct seq_file *m, const char *fmt, u32 fmt_size,
2800 const void *data, u32 data_len)
2801
2802 Description
2803 bpf_seq_printf() uses seq_file seq_printf() to print out
2804 the format string. The m represents the seq_file. The
2805 fmt and fmt_size are for the format string itself. The
2806 data and data_len are format string arguments. The data
2807 are a u64 array and corresponding format string values
2808 are stored in the array. For strings and pointers where
2809 pointees are accessed, only the pointer values are stored
2810 in the data array. The data_len is the size of data in
2811 bytes.
2812
2813 Formats %s and %p{i,I}{4,6} require reading kernel memory.
2814 Reading kernel memory may fail due to either invalid
2815 address or valid address but requiring a major memory
2816 fault. If reading kernel memory fails, the string for %s
2817 will be an empty string, and the ip address for
2818 %p{i,I}{4,6} will be 0. Not returning error to bpf pro‐
2819 gram is consistent with what bpf_trace_printk() does for
2820 now.
2821
2822 Return 0 on success, or a negative error in case of failure:
2823
2824 -EBUSY if per-CPU memory copy buffer is busy, can try
2825 again by returning 1 from bpf program.
2826
2827 -EINVAL if arguments are invalid, or if fmt is
2828 invalid/unsupported.
2829
2830 -E2BIG if fmt contains too many format specifiers.
2831
2832 -EOVERFLOW if an overflow happened: The same object will
2833 be tried again.
2834
2835 int bpf_seq_write(struct seq_file *m, const void *data, u32 len)
2836
2837 Description
2838 bpf_seq_write() uses seq_file seq_write() to write the
2839 data. The m represents the seq_file. The data and len
2840 represent the data to write in bytes.
2841
2842 Return 0 on success, or a negative error in case of failure:
2843
2844 -EOVERFLOW if an overflow happened: The same object will
2845 be tried again.
2846
2847 u64 bpf_sk_cgroup_id(struct bpf_sock *sk)
2848
2849 Description
2850 Return the cgroup v2 id of the socket sk.
2851
2852 sk must be a non-NULL pointer to a full socket, e.g. one
2853 returned from bpf_sk_lookup_xxx(), bpf_sk_fullsock(),
2854 etc. The format of returned id is same as in
2855 bpf_skb_cgroup_id().
2856
2857 This helper is available only if the kernel was compiled
2858 with the CONFIG_SOCK_CGROUP_DATA configuration option.
2859
2860 Return The id is returned or 0 in case the id could not be
2861 retrieved.
2862
2863 u64 bpf_sk_ancestor_cgroup_id(struct bpf_sock *sk, int ancestor_level)
2864
2865 Description
2866 Return id of cgroup v2 that is ancestor of cgroup associ‐
2867 ated with the sk at the ancestor_level. The root cgroup
2868 is at ancestor_level zero and each step down the hierar‐
2869 chy increments the level. If ancestor_level == level of
2870 cgroup associated with sk, then return value will be same
2871 as that of bpf_sk_cgroup_id().
2872
2873 The helper is useful to implement policies based on
2874 cgroups that are upper in hierarchy than immediate cgroup
2875 associated with sk.
2876
2877 The format of returned id and helper limitations are same
2878 as in bpf_sk_cgroup_id().
2879
2880 Return The id is returned or 0 in case the id could not be
2881 retrieved.
2882
2883 int bpf_ringbuf_output(void *ringbuf, void *data, u64 size, u64
2884 flags)
2885
2886 Description
2887 Copy size bytes from data into a ring buffer ringbuf. If
2888 BPF_RB_NO_WAKEUP is specified in flags, no notification
2889 of new data availability is sent. If BPF_RB_FORCE_WAKEUP
2890 is specified in flags, notification of new data avail‐
2891 ability is sent unconditionally.
2892
2893 Return 0, on success; < 0, on error.
2894
2895 void *bpf_ringbuf_reserve(void *ringbuf, u64 size, u64 flags)
2896
2897 Description
2898 Reserve size bytes of payload in a ring buffer ringbuf.
2899
2900 Return Valid pointer with size bytes of memory available; NULL,
2901 otherwise.
2902
2903 void bpf_ringbuf_submit(void *data, u64 flags)
2904
2905 Description
2906 Submit reserved ring buffer sample, pointed to by data.
2907 If BPF_RB_NO_WAKEUP is specified in flags, no notifica‐
2908 tion of new data availability is sent. If
2909 BPF_RB_FORCE_WAKEUP is specified in flags, notification
2910 of new data availability is sent unconditionally.
2911
2912 Return Nothing. Always succeeds.
2913
2914 void bpf_ringbuf_discard(void *data, u64 flags)
2915
2916 Description
2917 Discard reserved ring buffer sample, pointed to by data.
2918 If BPF_RB_NO_WAKEUP is specified in flags, no notifica‐
2919 tion of new data availability is sent. If
2920 BPF_RB_FORCE_WAKEUP is specified in flags, notification
2921 of new data availability is sent unconditionally.
2922
2923 Return Nothing. Always succeeds.
2924
2925 u64 bpf_ringbuf_query(void *ringbuf, u64 flags)
2926
2927 Description
2928 Query various characteristics of provided ring buffer.
2929 What exactly is queried is determined by flags:
2930
2934 · BPF_RB_AVAIL_DATA - amount of data not yet consumed;
2935
2936 · BPF_RB_RING_SIZE - the size of ring buffer;
2937
2938 · BPF_RB_CONS_POS - consumer position (can wrap
2939 around);
2940
2941 · BPF_RB_PROD_POS - producer(s) position (can wrap
2942 around);
2943
2948 Data returned is just a momentary snapshot of actual
2949 values and could be inaccurate, so this facility should
2950 be used to power heuristics and for reporting, not to
2951 make 100% correct calculations.
2952
2953 Return Requested value, or 0, if flags are not recognized.
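
As an example of the kind of heuristic this helper is meant for, the consumer and producer positions can be combined into a fill estimate. The sketch assumes that both positions are byte counters that only grow, so that their unsigned difference is the amount of unconsumed data even after wrap-around; unconsumed_bytes() and ring_mostly_full() are hypothetical names, and the three-quarters threshold is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned subtraction is modular, so this stays correct when the
 * positions wrap around (under the assumption stated above). */
static uint64_t unconsumed_bytes(uint64_t cons_pos, uint64_t prod_pos)
{
	return prod_pos - cons_pos;
}

/* Heuristic only: the snapshot may be stale by the time it is used. */
static int ring_mostly_full(uint64_t cons_pos, uint64_t prod_pos,
			    uint64_t ring_size)
{
	assert(ring_size > 0);
	return unconsumed_bytes(cons_pos, prod_pos) > ring_size / 4 * 3;
}
```

Because the values are momentary snapshots, a result like this is suitable for deciding whether to skip optional work, not for exact accounting.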
2954
2955 int bpf_csum_level(struct sk_buff *skb, u64 level)
2956
2957 Description
2958 Change the skb's checksum level by one layer up or down,
2959 or reset it entirely to none in order to have the stack
2960 perform checksum validation. The level is applicable to
2961 the following protocols: TCP, UDP, GRE, SCTP, FCOE. For
2962 example, a decap of | ETH | IP | UDP | GUE | IP | TCP |
2963 into | ETH | IP | TCP | through bpf_skb_adjust_room()
2964 helper with passing in BPF_F_ADJ_ROOM_NO_CSUM_RESET flag
2965 would require one call to bpf_csum_level() with
2966 BPF_CSUM_LEVEL_DEC since the UDP header is removed. Simi‐
2967 larly, an encap of the latter into the former could be
2968 accompanied by a helper call to bpf_csum_level() with
2969 BPF_CSUM_LEVEL_INC if the skb is still intended to be
2970 processed in higher layers of the stack instead of just
2971 egressing at tc.
2972
2973 There are four supported level settings at this time:
2974
2975 · BPF_CSUM_LEVEL_INC: Increases skb->csum_level for skbs
2976 with CHECKSUM_UNNECESSARY.
2977
2978 · BPF_CSUM_LEVEL_DEC: Decreases skb->csum_level for skbs
2979 with CHECKSUM_UNNECESSARY.
2980
2981 · BPF_CSUM_LEVEL_RESET: Resets skb->csum_level to 0 and
2982 sets CHECKSUM_NONE to force checksum validation by the
2983 stack.
2984
2985 · BPF_CSUM_LEVEL_QUERY: No-op, returns the current
2986 skb->csum_level.
2987
2988 Return 0 on success, or a negative error in case of failure. In
2989 the case of BPF_CSUM_LEVEL_QUERY, the current
2990 skb->csum_level is returned or the error code -EACCES in
2991 case the skb is not subject to CHECKSUM_UNNECESSARY.
2992
2994 Example usage for most of the eBPF helpers listed in this manual page
2995 is available within the Linux kernel sources, at the following loca‐
2996 tions:
2997
2998 · samples/bpf/
2999
3000 · tools/testing/selftests/bpf/
3001
3003 eBPF programs can have an associated license, passed along with the
3004 bytecode instructions to the kernel when the programs are loaded. The
3005 format for that string is identical to the one in use for kernel mod‐
3006 ules (Dual licenses, such as "Dual BSD/GPL", may be used). Some helper
3007 functions are only accessible to programs that are compatible with the
3008 GNU General Public License (GPL).
3009
3010 In order to use such helpers, the eBPF program must be loaded with the
3011 correct license string passed (via attr) to the bpf() system call, and
3012 this generally translates into the C source code of the program con‐
3013 taining a line similar to the following:
3014
3015 char ____license[] __attribute__((section("license"), used)) = "GPL";
3016
3018 This manual page is an effort to document the existing eBPF helper
3019 functions. But as of this writing, the BPF sub-system is under heavy
3020 development. New eBPF program or map types are added, along with new
3021 helper functions. Some helpers are occasionally made available for
3022 additional program types. So in spite of the efforts of the community,
3023 this page might not be up-to-date. If you want to check by yourself
3024 what helper functions exist in your kernel, or what types of programs
3025 they can support, here are some files in the kernel tree that you
3026 may be interested in:
3027
3028 · include/uapi/linux/bpf.h is the main BPF header. It contains the full
3029 list of all helper functions, as well as many other BPF definitions
3030 including most of the flags, structs or constants used by the
3031 helpers.
3032
3033 · net/core/filter.c contains the definition of most network-related
3034 helper functions, and the list of program types from which they can
3035 be used.
3036
3037 · kernel/trace/bpf_trace.c is the equivalent for most tracing pro‐
3038 gram-related helpers.
3039
3040 · kernel/bpf/verifier.c contains the functions used to check that valid
3041 types of eBPF maps are used with a given helper function.
3042
3043 · kernel/bpf/ directory contains other files in which additional
3044 helpers are defined (for cgroups, sockmaps, etc.).
3045
3046 · The bpftool utility can be used to probe the availability of helper
3047 functions on the system (as well as supported program and map types,
3048 and a number of other parameters). To do so, run bpftool feature
3049 probe (see bpftool-feature(8) for details). Add the unprivileged key‐
3050 word to list features available to unprivileged users.
3051
3052 Compatibility between helper functions and program types can generally
3053 be found in the files where helper functions are defined. Look for the
3054 struct bpf_func_proto objects and for functions returning them: these
3055 functions contain a list of helpers that a given program type can call.
3056 Note that the default: label of the switch ... case used to filter
3057 helpers can call other functions, themselves allowing access to addi‐
3058 tional helpers. The requirement for GPL license is also in those struct
3059 bpf_func_proto.
3060
3061 Compatibility between helper functions and map types can be found in
3062 the check_map_func_compatibility() function in file kernel/bpf/veri‐
3063 fier.c.
3064
3065 Helper functions that invalidate the checks on data and data_end point‐
3066 ers for network processing are listed in function
3067 bpf_helper_changes_pkt_data() in file net/core/filter.c.
3068
3070 bpf(2), bpftool(8), cgroups(7), ip(8), perf_event_open(2), sendmsg(2),
3071 socket(7), tc-bpf(8)
3072
3074 This page is part of release 5.07 of the Linux man-pages project. A
3075 description of the project, information about reporting bugs, and the
3076 latest version of this page, can be found at
3077 https://www.kernel.org/doc/man-pages/.
3078
3079
3080