BPF-HELPERS(7) Miscellaneous Information Manual BPF-HELPERS(7)

BPF-HELPERS - list of eBPF helper functions

The extended Berkeley Packet Filter (eBPF) subsystem consists of programs
written in a pseudo-assembly language, then attached to one of several
kernel hooks and run in reaction to specific events. This framework
differs from the older, "classic" BPF (or "cBPF") in several respects,
one of them being the ability to call special functions (or "helpers")
from within a program. These functions are restricted to a white-list of
helpers defined in the kernel.
16
17 These helpers are used by eBPF programs to interact with the system, or
18 with the context in which they work. For instance, they can be used to
19 print debugging messages, to get the time since the system was booted,
to interact with eBPF maps, or to manipulate network packets. Since
there are several eBPF program types, and they do not all run in the
same context, each program type can only call a subset of those
helpers.
24
Due to eBPF conventions, a helper cannot have more than five arguments.
27
Internally, eBPF programs call directly into the compiled helper
functions without requiring any foreign-function interface. As a result,
calling a helper costs no more than an ordinary function call, which
keeps overhead low.
32
33 This document is an attempt to list and document the helpers available
34 to eBPF developers. They are sorted by chronological order (the oldest
35 helpers in the kernel at the top).
36
38 void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
39
40 Description
41 Perform a lookup in map for an entry associated to key.
42
43 Return Map value associated to key, or NULL if no entry was
44 found.
45
46 long bpf_map_update_elem(struct bpf_map *map, const void *key, const
47 void *value, u64 flags)
48
49 Description
50 Add or update the value of the entry associated to key in
51 map with value. flags is one of:
52
53 BPF_NOEXIST
54 The entry for key must not exist in the map.
55
56 BPF_EXIST
57 The entry for key must already exist in the map.
58
59 BPF_ANY
60 No condition on the existence of the entry for
61 key.
62
63 Flag value BPF_NOEXIST cannot be used for maps of types
64 BPF_MAP_TYPE_ARRAY or BPF_MAP_TYPE_PERCPU_ARRAY (all el‐
65 ements always exist), the helper would return an error.
66
67 Return 0 on success, or a negative error in case of failure.
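
The effect of the three flag values can be pictured with a toy, user-space model of a single map slot. This is only a sketch under stated assumptions: toy_slot and toy_update() are hypothetical names, the flag values are redefined locally so that the block compiles without kernel headers (they are assumed to match <linux/bpf.h>), and -EEXIST and -ENOENT are assumed to be the errors reported when the existence condition is violated.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the flag values; assumed to mirror <linux/bpf.h>. */
enum { BPF_ANY = 0, BPF_NOEXIST = 1, BPF_EXIST = 2 };

/* Hypothetical one-entry "map": either holds a value or is empty. */
struct toy_slot {
    bool present;
    uint64_t value;
};

/* Apply the existence condition selected by flags, then store value. */
static long toy_update(struct toy_slot *slot, uint64_t value, uint64_t flags)
{
    if (flags > BPF_EXIST)
        return -EINVAL;            /* this toy model knows no other flags */
    if (flags == BPF_NOEXIST && slot->present)
        return -EEXIST;            /* entry must not exist, but it does */
    if (flags == BPF_EXIST && !slot->present)
        return -ENOENT;            /* entry must exist, but it does not */

    slot->present = true;
    slot->value = value;
    return 0;
}
```

In this model an array-like slot would always have present set, which is why BPF_NOEXIST could never succeed there.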
68
69 long bpf_map_delete_elem(struct bpf_map *map, const void *key)
70
71 Description
72 Delete entry with key from map.
73
74 Return 0 on success, or a negative error in case of failure.
75
76 long bpf_probe_read(void *dst, u32 size, const void *unsafe_ptr)
77
78 Description
79 For tracing programs, safely attempt to read size bytes
80 from kernel space address unsafe_ptr and store the data
81 in dst.
82
83 Generally, use bpf_probe_read_user() or
84 bpf_probe_read_kernel() instead.
85
86 Return 0 on success, or a negative error in case of failure.
87
88 u64 bpf_ktime_get_ns(void)
89
90 Description
91 Return the time elapsed since system boot, in nanosec‐
92 onds. Does not include time the system was suspended.
93 See: clock_gettime(CLOCK_MONOTONIC)
94
95 Return Current ktime.
96
97 long bpf_trace_printk(const char *fmt, u32 fmt_size, ...)
98
99 Description
100 This helper is a "printk()-like" facility for debugging.
101 It prints a message defined by format fmt (of size
102 fmt_size) to file /sys/kernel/debug/tracing/trace from
103 DebugFS, if available. It can take up to three additional
u64 arguments (as with any eBPF helper, the total number of
arguments is limited to five).
106
Each time the helper is called, it appends a line to the
trace. Lines are discarded while /sys/kernel/debug/tracing/trace
is open; use /sys/kernel/debug/tracing/trace_pipe
to avoid this. The format of the trace is
111 customizable, and the exact output one will get depends
112 on the options set in /sys/kernel/debug/tracing/trace_op‐
113 tions (see also the README file under the same direc‐
114 tory). However, it usually defaults to something like:
115
116 telnet-470 [001] .N.. 419421.045894: 0x00000001: <fmt>
117
118 In the above:
119
120 • telnet is the name of the current task.
121
122 • 470 is the PID of the current task.
123
124 • 001 is the CPU number on which the task is running.
125
126 • In .N.., each character refers to a set of options
127 (whether irqs are enabled, scheduling options,
128 whether hard/softirqs are running, level of pre‐
129 empt_disabled respectively). N means that
130 TIF_NEED_RESCHED and PREEMPT_NEED_RESCHED are set.
131
132 • 419421.045894 is a timestamp.
133
134 • 0x00000001 is a fake value used by BPF for the in‐
135 struction pointer register.
136
137 • <fmt> is the message formatted with fmt.
138
The conversion specifiers supported by fmt are similar to,
but more limited than, those of printk(). They are %d, %i, %u,
141 %x, %ld, %li, %lu, %lx, %lld, %lli, %llu, %llx, %p, %s.
142 No modifier (size of field, padding with zeroes, etc.) is
143 available, and the helper will return -EINVAL (but print
144 nothing) if it encounters an unknown specifier.
145
146 Also, note that bpf_trace_printk() is slow, and should
147 only be used for debugging purposes. For this reason, a
148 notice block (spanning several lines) is printed to ker‐
149 nel logs and states that the helper should not be used
150 "for production use" the first time this helper is used
151 (or more precisely, when trace_printk() buffers are allo‐
152 cated). For passing values to user space, perf events
153 should be preferred.
154
155 Return The number of bytes written to the buffer, or a negative
156 error in case of failure.
157
158 u32 bpf_get_prandom_u32(void)
159
160 Description
161 Get a pseudo-random number.
162
163 From a security point of view, this helper uses its own
164 pseudo-random internal state, and cannot be used to infer
165 the seed of other random functions in the kernel. How‐
166 ever, it is essential to note that the generator used by
167 the helper is not cryptographically secure.
168
169 Return A random 32-bit unsigned value.
170
171 u32 bpf_get_smp_processor_id(void)
172
173 Description
174 Get the SMP (symmetric multiprocessing) processor id.
175 Note that all programs run with migration disabled, which
176 means that the SMP processor id is stable during all the
177 execution of the program.
178
179 Return The SMP id of the processor running the program.
180
181 long bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void
182 *from, u32 len, u64 flags)
183
184 Description
185 Store len bytes from address from into the packet associ‐
186 ated to skb, at offset. flags are a combination of
187 BPF_F_RECOMPUTE_CSUM (automatically recompute the check‐
188 sum for the packet after storing the bytes) and BPF_F_IN‐
189 VALIDATE_HASH (set skb->hash, skb->swhash and skb->l4hash
190 to 0).
191
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done
by the verifier are invalidated and must be performed again if
the helper is used in combination with direct packet access.
197
198 Return 0 on success, or a negative error in case of failure.
199
200 long bpf_l3_csum_replace(struct sk_buff *skb, u32 offset, u64 from, u64
201 to, u64 size)
202
203 Description
204 Recompute the layer 3 (e.g. IP) checksum for the packet
205 associated to skb. Computation is incremental, so the
206 helper must know the former value of the header field
207 that was modified (from), the new value of this field
208 (to), and the number of bytes (2 or 4) for this field,
209 stored in size. Alternatively, it is possible to store
210 the difference between the previous and the new values of
211 the header field in to, by setting from and size to 0.
212 For both methods, offset indicates the location of the IP
213 checksum within the packet.
214
215 This helper works in combination with bpf_csum_diff(),
216 which does not update the checksum in-place, but offers
217 more flexibility and can handle sizes larger than 2 or 4
218 for the checksum to update.
219
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done
by the verifier are invalidated and must be performed again if
the helper is used in combination with direct packet access.
225
226 Return 0 on success, or a negative error in case of failure.
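
To make the word "incremental" concrete, the following user-space sketch updates a 16-bit Internet checksum after one 16-bit field changes, without touching the rest of the header. It is an illustration of the arithmetic only, not the kernel's implementation: csum_fold(), csum_full() and csum_update16() are hypothetical names, the update formula is the one given in RFC 1624, and values are handled as host-order 16-bit words for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full checksum over n 16-bit words (checksum field zeroed by the caller). */
static uint16_t csum_full(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += words[i];
    return (uint16_t)~csum_fold(sum);
}

/* Incremental update, RFC 1624 eq. 3: HC' = ~(~HC + ~m + m'),
 * where m is the old field value (from) and m' the new one (to). */
static uint16_t csum_update16(uint16_t check, uint16_t from, uint16_t to)
{
    uint32_t sum = (uint16_t)~check;

    sum += (uint16_t)~from;
    sum += to;
    return (uint16_t)~csum_fold(sum);
}
```

Only the old checksum and the old and new field values are needed, which matches the from, to and size arguments described above.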
227
228 long bpf_l4_csum_replace(struct sk_buff *skb, u32 offset, u64 from, u64
229 to, u64 flags)
230
231 Description
232 Recompute the layer 4 (e.g. TCP, UDP or ICMP) checksum
233 for the packet associated to skb. Computation is incre‐
234 mental, so the helper must know the former value of the
235 header field that was modified (from), the new value of
236 this field (to), and the number of bytes (2 or 4) for
237 this field, stored on the lowest four bits of flags. Al‐
238 ternatively, it is possible to store the difference be‐
239 tween the previous and the new values of the header field
240 in to, by setting from and the four lowest bits of flags
241 to 0. For both methods, offset indicates the location of
the checksum within the packet. In addition to the
size of the field, actual flags can be added to flags (bitwise OR).
With BPF_F_MARK_MANGLED_0, a null checksum is left
245 untouched (unless BPF_F_MARK_ENFORCE is added as well),
246 and for updates resulting in a null checksum the value is
247 set to CSUM_MANGLED_0 instead. Flag BPF_F_PSEUDO_HDR in‐
248 dicates the checksum is to be computed against a
249 pseudo-header.
250
251 This helper works in combination with bpf_csum_diff(),
252 which does not update the checksum in-place, but offers
253 more flexibility and can handle sizes larger than 2 or 4
254 for the checksum to update.
255
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done
by the verifier are invalidated and must be performed again if
the helper is used in combination with direct packet access.
261
262 Return 0 on success, or a negative error in case of failure.
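
Because the field size shares the flags argument with the real flags, it can help to see how the two are combined. The sketch below is a stand-alone illustration: l4_csum_flags() is a hypothetical name, and the constants are local copies that are assumed to mirror the values in <linux/bpf.h>.

```c
#include <stdint.h>

/* Local copies so the sketch builds without kernel headers;
 * assumed to match <linux/bpf.h>. */
#define BPF_F_HDR_FIELD_MASK  0xfULL
#define BPF_F_PSEUDO_HDR      (1ULL << 4)
#define BPF_F_MARK_MANGLED_0  (1ULL << 5)
#define BPF_F_MARK_ENFORCE    (1ULL << 6)

/* Put the field size (0, 2 or 4) in the low four bits and OR in
 * any of the flags above. */
static uint64_t l4_csum_flags(unsigned int field_size, uint64_t extra)
{
    return ((uint64_t)field_size & BPF_F_HDR_FIELD_MASK) | extra;
}
```

For instance, l4_csum_flags(4, BPF_F_PSEUDO_HDR) would describe a 4-byte field (such as an IPv4 address) that is part of the pseudo-header.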
263
264 long bpf_tail_call(void *ctx, struct bpf_map *prog_array_map, u32 in‐
265 dex)
266
267 Description
268 This special helper is used to trigger a "tail call", or
269 in other words, to jump into another eBPF program. The
270 same stack frame is used (but values on stack and in reg‐
271 isters for the caller are not accessible to the callee).
272 This mechanism allows for program chaining, either for
273 raising the maximum number of available eBPF instruc‐
274 tions, or to execute given programs in conditional
275 blocks. For security reasons, there is an upper limit to
276 the number of successive tail calls that can be per‐
277 formed.
278
279 Upon call of this helper, the program attempts to jump
280 into a program referenced at index index in prog_ar‐
281 ray_map, a special map of type BPF_MAP_TYPE_PROG_ARRAY,
282 and passes ctx, a pointer to the context.
283
284 If the call succeeds, the kernel immediately runs the
285 first instruction of the new program. This is not a func‐
286 tion call, and it never returns to the previous program.
287 If the call fails, then the helper has no effect, and the
288 caller continues to run its subsequent instructions. A
289 call can fail if the destination program for the jump
does not exist (e.g. index is greater than or equal to the number of
entries in prog_array_map), or if the maximum number of
292 tail calls has been reached for this chain of programs.
293 This limit is defined in the kernel by the macro
294 MAX_TAIL_CALL_CNT (not accessible to user space), which
295 is currently set to 33.
296
297 Return 0 on success, or a negative error in case of failure.
298
299 long bpf_clone_redirect(struct sk_buff *skb, u32 ifindex, u64 flags)
300
301 Description
302 Clone and redirect the packet associated to skb to an‐
303 other net device of index ifindex. Both ingress and
304 egress interfaces can be used for redirection. The
305 BPF_F_INGRESS value in flags is used to make the distinc‐
306 tion (ingress path is selected if the flag is present,
307 egress path otherwise). This is the only flag supported
308 for now.
309
In comparison with the bpf_redirect() helper, bpf_clone_redirect()
has the associated cost of duplicating the packet buffer, but the
redirection is performed from within the eBPF program. Conversely,
bpf_redirect() is more efficient, but it is handled through an
action code, so the redirection happens only after the eBPF program
has returned.
316
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done
by the verifier are invalidated and must be performed again if
the helper is used in combination with direct packet access.
322
323 Return 0 on success, or a negative error in case of failure.
324
325 u64 bpf_get_current_pid_tgid(void)
326
327 Description
328 Get the current pid and tgid.
329
330 Return A 64-bit integer containing the current tgid and pid, and
331 created as such: current_task->tgid << 32 | cur‐
332 rent_task->pid.
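
Given the layout stated above, the two halves can be separated with a shift and a truncating cast. A minimal sketch follows; tgid_from() and pid_from() are hypothetical names introduced here for illustration.

```c
#include <stdint.h>

/* Upper 32 bits: the thread group id (what user space usually calls
 * the process id). */
static inline uint32_t tgid_from(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid >> 32);
}

/* Lower 32 bits: the kernel pid of the task (the thread id as seen
 * from user space). */
static inline uint32_t pid_from(uint64_t pid_tgid)
{
    return (uint32_t)pid_tgid;
}
```

In an eBPF program the argument would be the value returned by bpf_get_current_pid_tgid().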
333
334 u64 bpf_get_current_uid_gid(void)
335
336 Description
337 Get the current uid and gid.
338
339 Return A 64-bit integer containing the current GID and UID, and
340 created as such: current_gid << 32 | current_uid.
341
342 long bpf_get_current_comm(void *buf, u32 size_of_buf)
343
344 Description
345 Copy the comm attribute of the current task into buf of
346 size_of_buf. The comm attribute contains the name of the
347 executable (excluding the path) for the current task. The
348 size_of_buf must be strictly positive. On success, the
349 helper makes sure that the buf is NUL-terminated. On
350 failure, it is filled with zeroes.
351
352 Return 0 on success, or a negative error in case of failure.
353
354 u32 bpf_get_cgroup_classid(struct sk_buff *skb)
355
356 Description
357 Retrieve the classid for the current task, i.e. for the
358 net_cls cgroup to which skb belongs.
359
360 This helper can be used on TC egress path, but not on
361 ingress.
362
363 The net_cls cgroup provides an interface to tag network
364 packets based on a user-provided identifier for all traf‐
365 fic coming from the tasks belonging to the related
366 cgroup. See also the related kernel documentation, avail‐
367 able from the Linux sources in file Documentation/ad‐
368 min-guide/cgroup-v1/net_cls.rst.
369
370 The Linux kernel has two versions for cgroups: there are
371 cgroups v1 and cgroups v2. Both are available to users,
372 who can use a mixture of them, but note that the net_cls
373 cgroup is for cgroup v1 only. This makes it incompatible
374 with BPF programs run on cgroups, which is a
375 cgroup-v2-only feature (a socket can only hold data for
376 one version of cgroups at a time).
377
This helper is only available if the kernel was compiled
379 with the CONFIG_CGROUP_NET_CLASSID configuration option
380 set to "y" or to "m".
381
382 Return The classid, or 0 for the default unconfigured classid.
383
384 long bpf_skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16
385 vlan_tci)
386
387 Description
388 Push a vlan_tci (VLAN tag control information) of proto‐
389 col vlan_proto to the packet associated to skb, then up‐
390 date the checksum. Note that if vlan_proto is different
391 from ETH_P_8021Q and ETH_P_8021AD, it is considered to be
392 ETH_P_8021Q.
393
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done
by the verifier are invalidated and must be performed again if
the helper is used in combination with direct packet access.
399
400 Return 0 on success, or a negative error in case of failure.
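
The vlan_tci argument packs three fields into 16 bits. The sketch below builds such a value from its parts, assuming the IEEE 802.1Q layout (3-bit priority, 1-bit drop-eligible indicator, 12-bit VLAN id); make_vlan_tci() is a hypothetical name and is not part of any kernel API.

```c
#include <stdint.h>

/* Assemble a tag control information value:
 *   bits 15..13  priority code point (PCP)
 *   bit  12      drop eligible indicator (DEI)
 *   bits 11..0   VLAN identifier (VID)
 * Out-of-range inputs are masked to their field width. */
static uint16_t make_vlan_tci(unsigned int pcp, unsigned int dei,
                              unsigned int vid)
{
    return (uint16_t)(((pcp & 0x7u) << 13) |
                      ((dei & 0x1u) << 12) |
                      (vid & 0xfffu));
}
```

Note that the prototype declares vlan_tci as a plain u16 but vlan_proto as __be16, so a protocol constant such as ETH_P_8021Q presumably needs a host-to-network conversion before being passed.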
401
402 long bpf_skb_vlan_pop(struct sk_buff *skb)
403
404 Description
405 Pop a VLAN header from the packet associated to skb.
406
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done
by the verifier are invalidated and must be performed again if
the helper is used in combination with direct packet access.
412
413 Return 0 on success, or a negative error in case of failure.
414
415 long bpf_skb_get_tunnel_key(struct sk_buff *skb, struct bpf_tunnel_key
416 *key, u32 size, u64 flags)
417
418 Description
419 Get tunnel metadata. This helper takes a pointer key to
420 an empty struct bpf_tunnel_key of size, that will be
421 filled with tunnel metadata for the packet associated to
422 skb. The flags can be set to BPF_F_TUNINFO_IPV6, which
423 indicates that the tunnel is based on IPv6 protocol in‐
424 stead of IPv4.
425
426 The struct bpf_tunnel_key is an object that generalizes
427 the principal parameters used by various tunneling proto‐
428 cols into a single struct. This way, it can be used to
429 easily make a decision based on the contents of the en‐
430 capsulation header, "summarized" in this struct. In par‐
431 ticular, it holds the IP address of the remote end (IPv4
432 or IPv6, depending on the case) in key->remote_ipv4 or
433 key->remote_ipv6. Also, this struct exposes the key->tun‐
434 nel_id, which is generally mapped to a VNI (Virtual Net‐
435 work Identifier), making it programmable together with
436 the bpf_skb_set_tunnel_key() helper.
437
438 Let's imagine that the following code is part of a pro‐
439 gram attached to the TC ingress interface, on one end of
440 a GRE tunnel, and is supposed to filter out all messages
441 coming from remote ends with IPv4 address other than
442 10.0.0.1:
443
444 int ret;
445 struct bpf_tunnel_key key = {};
446
447 ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
448 if (ret < 0)
449 return TC_ACT_SHOT; // drop packet
450
451 if (key.remote_ipv4 != 0x0a000001)
452 return TC_ACT_SHOT; // drop packet
453
454 return TC_ACT_OK; // accept packet
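
The constant 0x0a000001 in the comparison above is the address 10.0.0.1 written as a single integer with the first octet in the most significant byte, which suggests that key.remote_ipv4 is compared in host byte order. The following sketch shows how such a constant can be derived; ipv4_host_order() is a hypothetical helper introduced only for illustration.

```c
#include <stdint.h>

/* Build the integer form of a.b.c.d with a in the most significant byte. */
static uint32_t ipv4_host_order(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8)  |  (uint32_t)d;
}
```

With this helper, ipv4_host_order(10, 0, 0, 1) yields the value used in the filter.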
455
456 This interface can also be used with all encapsulation
457 devices that can operate in "collect metadata" mode: in‐
458 stead of having one network device per specific configu‐
459 ration, the "collect metadata" mode only requires a sin‐
460 gle device where the configuration can be extracted from
461 this helper.
462
463 This can be used together with various tunnels such as
VXLAN, Geneve, GRE or IP in IP (IPIP).
465
466 Return 0 on success, or a negative error in case of failure.
467
468 long bpf_skb_set_tunnel_key(struct sk_buff *skb, struct bpf_tunnel_key
469 *key, u32 size, u64 flags)
470
471 Description
472 Populate tunnel metadata for packet associated to skb.
473 The tunnel metadata is set to the contents of key, of
474 size. The flags can be set to a combination of the fol‐
475 lowing values:
476
477 BPF_F_TUNINFO_IPV6
478 Indicate that the tunnel is based on IPv6 protocol
479 instead of IPv4.
480
481 BPF_F_ZERO_CSUM_TX
482 For IPv4 packets, add a flag to tunnel metadata
483 indicating that checksum computation should be
484 skipped and checksum set to zeroes.
485
486 BPF_F_DONT_FRAGMENT
487 Add a flag to tunnel metadata indicating that the
488 packet should not be fragmented.
489
490 BPF_F_SEQ_NUMBER
491 Add a flag to tunnel metadata indicating that a
492 sequence number should be added to tunnel header
493 before sending the packet. This flag was added for
494 GRE encapsulation, but might be used with other
495 protocols as well in the future.
496
497 Here is a typical usage on the transmit path:
498
struct bpf_tunnel_key key = {};  /* start from a zeroed key */
/* populate key ... */
bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0);
bpf_clone_redirect(skb, vxlan_dev_ifindex, 0);
503
504 See also the description of the bpf_skb_get_tunnel_key()
505 helper for additional information.
506
507 Return 0 on success, or a negative error in case of failure.
508
509 u64 bpf_perf_event_read(struct bpf_map *map, u64 flags)
510
511 Description
512 Read the value of a perf event counter. This helper re‐
513 lies on a map of type BPF_MAP_TYPE_PERF_EVENT_ARRAY. The
514 nature of the perf event counter is selected when map is
515 updated with perf event file descriptors. The map is an
516 array whose size is the number of available CPUs, and
517 each cell contains a value relative to one CPU. The value
518 to retrieve is indicated by flags, that contains the in‐
519 dex of the CPU to look up, masked with BPF_F_INDEX_MASK.
520 Alternatively, flags can be set to BPF_F_CURRENT_CPU to
521 indicate that the value for the current CPU should be re‐
522 trieved.
523
Note that before Linux 4.13, only hardware perf events could
be retrieved.
526
527 Also, be aware that the newer helper
528 bpf_perf_event_read_value() is recommended over
529 bpf_perf_event_read() in general. The latter has some ABI
530 quirks where error and counter value are used as a return
531 code (which is wrong to do since ranges may overlap).
532 This issue is fixed with bpf_perf_event_read_value(),
533 which at the same time provides more features over the
534 bpf_perf_event_read() interface. Please refer to the de‐
535 scription of bpf_perf_event_read_value() for details.
536
537 Return The value of the perf event counter read from the map, or
538 a negative error code in case of failure.
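
The ABI quirk mentioned above comes from squeezing a counter and an error code into one unsigned 64-bit return value. The sketch below is a user-space model of that ambiguity, not kernel code: pack_return() and looks_like_error() are hypothetical names.

```c
#include <errno.h>
#include <stdint.h>

/* Model of a u64 return slot that carries either a counter value
 * or a negative errno. */
static uint64_t pack_return(int64_t counter_or_negative_errno)
{
    return (uint64_t)counter_or_negative_errno;
}

/* One test a caller might apply: is the sign bit set? */
static int looks_like_error(uint64_t ret)
{
    return (int64_t)ret < 0;
}
```

A negative errno and a very large counter produce the same kind of bit pattern, so any such test misclassifies one of them. A helper that reports the error in its return value and the counter through a separate buffer avoids the overlap.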
539
540 long bpf_redirect(u32 ifindex, u64 flags)
541
542 Description
543 Redirect the packet to another net device of index
544 ifindex. This helper is somewhat similar to
545 bpf_clone_redirect(), except that the packet is not
546 cloned, which provides increased performance.
547
548 Except for XDP, both ingress and egress interfaces can be
549 used for redirection. The BPF_F_INGRESS value in flags is
550 used to make the distinction (ingress path is selected if
551 the flag is present, egress path otherwise). Currently,
552 XDP only supports redirection to the egress interface,
553 and accepts no flag at all.
554
555 The same effect can also be attained with the more
556 generic bpf_redirect_map(), which uses a BPF map to store
557 the redirect target instead of providing it directly to
558 the helper.
559
560 Return For XDP, the helper returns XDP_REDIRECT on success or
561 XDP_ABORTED on error. For other program types, the values
562 are TC_ACT_REDIRECT on success or TC_ACT_SHOT on error.
563
564 u32 bpf_get_route_realm(struct sk_buff *skb)
565
566 Description
Retrieve the realm of the route, that is to say the
568 tclassid field of the destination for the skb. The iden‐
569 tifier retrieved is a user-provided tag, similar to the
570 one used with the net_cls cgroup (see description for
571 bpf_get_cgroup_classid() helper), but here this tag is
572 held by a route (a destination entry), not by a task.
573
574 Retrieving this identifier works with the clsact TC
575 egress hook (see also tc-bpf(8)), or alternatively on
576 conventional classful egress qdiscs, but not on TC
577 ingress path. In case of clsact TC egress hook, this has
578 the advantage that, internally, the destination entry has
579 not been dropped yet in the transmit path. Therefore, the
580 destination entry does not need to be artificially held
581 via netif_keep_dst() for a classful qdisc until the skb
582 is freed.
583
584 This helper is available only if the kernel was compiled
585 with CONFIG_IP_ROUTE_CLASSID configuration option.
586
587 Return The realm of the route for the packet associated to skb,
588 or 0 if none was found.
589
590 long bpf_perf_event_output(void *ctx, struct bpf_map *map, u64 flags,
591 void *data, u64 size)
592
593 Description
594 Write raw data blob into a special BPF perf event held by
595 map of type BPF_MAP_TYPE_PERF_EVENT_ARRAY. This perf
596 event must have the following attributes: PERF_SAMPLE_RAW
597 as sample_type, PERF_TYPE_SOFTWARE as type, and
598 PERF_COUNT_SW_BPF_OUTPUT as config.
599
600 The flags are used to indicate the index in map for which
601 the value must be put, masked with BPF_F_INDEX_MASK. Al‐
602 ternatively, flags can be set to BPF_F_CURRENT_CPU to in‐
603 dicate that the index of the current CPU core should be
604 used.
605
The value to write, of size, is passed through the eBPF stack
and pointed to by data.

The context of the program, ctx, must also be passed to
the helper.
611
In user space, a program willing to read the values needs
613 to call perf_event_open() on the perf event (either for
614 one or for all CPUs) and to store the file descriptor
615 into the map. This must be done before the eBPF program
616 can send data into it. An example is available in file
617 samples/bpf/trace_output_user.c in the Linux kernel
618 source tree (the eBPF program counterpart is in sam‐
619 ples/bpf/trace_output_kern.c).
620
bpf_perf_event_output() achieves better performance than
bpf_trace_printk() for sharing data with user space, and
is much better suited for streaming data from eBPF programs.
625
626 Note that this helper is not restricted to tracing use
627 cases and can be used with programs attached to TC or XDP
628 as well, where it allows for passing data to user space
629 listeners. Data can be:
630
631 • Only custom structs,
632
633 • Only the packet payload, or
634
635 • A combination of both.
636
637 Return 0 on success, or a negative error in case of failure.
638
639 long bpf_skb_load_bytes(const void *skb, u32 offset, void *to, u32 len)
640
641 Description
642 This helper was provided as an easy way to load data from
643 a packet. It can be used to load len bytes from offset
644 from the packet associated to skb, into the buffer
pointed to by to.
646
647 Since Linux 4.7, usage of this helper has mostly been re‐
648 placed by "direct packet access", enabling packet data to
649 be manipulated with skb->data and skb->data_end pointing
650 respectively to the first byte of packet data and to the
651 byte after the last byte of packet data. However, it re‐
652 mains useful if one wishes to read large quantities of
653 data at once from a packet into the eBPF stack.
654
655 Return 0 on success, or a negative error in case of failure.
656
657 long bpf_get_stackid(void *ctx, struct bpf_map *map, u64 flags)
658
659 Description
660 Walk a user or a kernel stack and return its id. To
661 achieve this, the helper needs ctx, which is a pointer to
662 the context on which the tracing program is executed, and
663 a pointer to a map of type BPF_MAP_TYPE_STACK_TRACE.
664
665 The last argument, flags, holds the number of stack
666 frames to skip (from 0 to 255), masked with
667 BPF_F_SKIP_FIELD_MASK. The next bits can be used to set a
668 combination of the following flags:
669
670 BPF_F_USER_STACK
671 Collect a user space stack instead of a kernel
672 stack.
673
674 BPF_F_FAST_STACK_CMP
675 Compare stacks by hash only.
676
677 BPF_F_REUSE_STACKID
678 If two different stacks hash into the same
679 stackid, discard the old one.
680
The stack id retrieved is a 32-bit integer handle
682 which can be further combined with other data (including
683 other stack ids) and used as a key into maps. This can be
684 useful for generating a variety of graphs (such as flame
685 graphs or off-cpu graphs).
686
For walking a stack, this helper is an improvement over
bpf_probe_read(), which can be used with unrolled loops
but is not efficient and consumes a lot of eBPF instructions.
Instead, bpf_get_stackid() can collect up to PERF_MAX_STACK_DEPTH
frames for both kernel and user stacks. Note that this limit
can be controlled with the sysctl program, and that it should be
manually increased in order to profile long user stacks (such as
stacks for Java programs). To do so, use:
696
697 # sysctl kernel.perf_event_max_stack=<new value>
698
699 Return The positive or null stack id on success, or a negative
700 error in case of failure.
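
The flags argument therefore carries two things at once: a frame count in its low byte and option bits above it. A small sketch of how such a value might be assembled follows; stackid_flags() is a hypothetical name, and the constants are local copies assumed to mirror <linux/bpf.h>.

```c
#include <stdint.h>

/* Local copies so the sketch builds without kernel headers;
 * assumed to match <linux/bpf.h>. */
#define BPF_F_SKIP_FIELD_MASK  0xffULL
#define BPF_F_USER_STACK       (1ULL << 8)
#define BPF_F_FAST_STACK_CMP   (1ULL << 9)
#define BPF_F_REUSE_STACKID    (1ULL << 10)

/* Low byte: number of frames to skip (0 to 255, larger values are
 * masked here). Remaining bits: any combination of the flags above. */
static uint64_t stackid_flags(unsigned int skip_frames, uint64_t extra)
{
    return ((uint64_t)skip_frames & BPF_F_SKIP_FIELD_MASK) | extra;
}
```

A tracing program could, for example, pass stackid_flags(0, BPF_F_USER_STACK) to request a user space stack with no frames skipped.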
701
702 s64 bpf_csum_diff(__be32 *from, u32 from_size, __be32 *to, u32 to_size,
703 __wsum seed)
704
705 Description
Compute a checksum difference, from the raw buffer
pointed to by from, of length from_size (that must be a multiple
of 4), towards the raw buffer pointed to by to, of
709 size to_size (same remark). An optional seed can be added
710 to the value (this can be cascaded, the seed may come
711 from a previous call to the helper).
712
713 This is flexible enough to be used in several ways:
714
715 • With from_size == 0, to_size > 0 and seed set to check‐
716 sum, it can be used when pushing new data.
717
718 • With from_size > 0, to_size == 0 and seed set to check‐
719 sum, it can be used when removing data from a packet.
720
721 • With from_size > 0, to_size > 0 and seed set to 0, it
722 can be used to compute a diff. Note that from_size and
723 to_size do not need to be equal.
724
725 This helper can be used in combination with
726 bpf_l3_csum_replace() and bpf_l4_csum_replace(), to which
727 one can feed in the difference computed with
728 bpf_csum_diff().
729
730 Return The checksum result, or a negative error code in case of
731 failure.
732
733 long bpf_skb_get_tunnel_opt(struct sk_buff *skb, void *opt, u32 size)
734
735 Description
736 Retrieve tunnel options metadata for the packet associ‐
737 ated to skb, and store the raw tunnel option data to the
738 buffer opt of size.
739
740 This helper can be used with encapsulation devices that
741 can operate in "collect metadata" mode (please refer to
742 the related note in the description of bpf_skb_get_tun‐
743 nel_key() for more details). A particular example where
this can be used is in combination with the Geneve encapsulation
protocol, where it allows for pushing (with the
bpf_skb_set_tunnel_opt() helper) and retrieving arbitrary
TLVs (Type-Length-Value headers) from the eBPF program.
748 This allows for full customization of these headers.
749
750 Return The size of the option data retrieved.
751
752 long bpf_skb_set_tunnel_opt(struct sk_buff *skb, void *opt, u32 size)
753
754 Description
755 Set tunnel options metadata for the packet associated to
756 skb to the option data contained in the raw buffer opt of
757 size.
758
759 See also the description of the bpf_skb_get_tunnel_opt()
760 helper for additional information.
761
762 Return 0 on success, or a negative error in case of failure.
763
764 long bpf_skb_change_proto(struct sk_buff *skb, __be16 proto, u64 flags)
765
766 Description
767 Change the protocol of the skb to proto. Currently sup‐
768 ported are transition from IPv4 to IPv6, and from IPv6 to
769 IPv4. The helper takes care of the groundwork for the
770 transition, including resizing the socket buffer. The
eBPF program is expected to fill the new headers, if any,
via bpf_skb_store_bytes() and to recompute the checksums with
bpf_l3_csum_replace() and bpf_l4_csum_replace(). The main
case for this helper is to perform NAT64 operations from
an eBPF program.
776
777 Internally, the GSO type is marked as dodgy so that head‐
778 ers are checked and segments are recalculated by the
779 GSO/GRO engine. The size for GSO target is adapted as
780 well.
781
782 All values for flags are reserved for future usage, and
783 must be left at zero.
784
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done
by the verifier are invalidated and must be performed again if
the helper is used in combination with direct packet access.
790
791 Return 0 on success, or a negative error in case of failure.
792
793 long bpf_skb_change_type(struct sk_buff *skb, u32 type)
794
795 Description
796 Change the packet type for the packet associated to skb.
797 This comes down to setting skb->pkt_type to type, except
798 the eBPF program does not have a write access to
799 skb->pkt_type beside this helper. Using a helper here al‐
800 lows for graceful handling of errors.
801
802 The major use case is to change incoming skbs to
803 PACKET_HOST in a programmatic way instead of having to
804 recirculate via redirect(..., BPF_F_INGRESS), for exam‐
805 ple.
806
807 Note that type only allows certain values. At this time,
808 they are:
809
810 PACKET_HOST
811 Packet is for us.
812
813 PACKET_BROADCAST
814 Send packet to all.
815
816 PACKET_MULTICAST
817 Send packet to group.
818
819 PACKET_OTHERHOST
820 Send packet to someone else.
821
822 Return 0 on success, or a negative error in case of failure.
823
824 long bpf_skb_under_cgroup(struct sk_buff *skb, struct bpf_map *map, u32
825 index)
826
827 Description
828 Check whether skb is a descendant of the cgroup2 held by
829 map of type BPF_MAP_TYPE_CGROUP_ARRAY, at index.
830
831 Return The return value depends on the result of the test, and
832 can be:
833
834 • 0, if the skb failed the cgroup2 descendant test.
835
836 • 1, if the skb passed the cgroup2 descendant test.
837
838 • A negative error code, if an error occurred.
839
840 u32 bpf_get_hash_recalc(struct sk_buff *skb)
841
842 Description
843 Retrieve the hash of the packet, skb->hash. If it is not
844 set, in particular if the hash was cleared due to man‐
845 gling, recompute this hash. Later accesses to the hash
846 can be done directly with skb->hash.
847
848 Calling bpf_set_hash_invalid(), changing a packet
849 protocol with bpf_skb_change_proto(), or calling
850 bpf_skb_store_bytes() with the BPF_F_INVALIDATE_HASH flag
851 are actions that can clear the hash and trigger a new
852 computation on the next call to
853 bpf_get_hash_recalc().
854
855 Return The 32-bit hash.
856
857 u64 bpf_get_current_task(void)
858
859 Description
860 Get the current task.
861
862 Return A pointer to the current task struct.
863
864 long bpf_probe_write_user(void *dst, const void *src, u32 len)
865
866 Description
867 Attempt in a safe way to write len bytes from the buffer
868 src to dst in memory. It only works for threads that are
869 in user context, and dst must be a valid user space ad‐
870 dress.
871
872 This helper should not be used to implement any kind of
873 security mechanism because of TOC-TOU attacks, but rather
874 to debug, divert, and manipulate execution of semi-coop‐
875 erative processes.
876
877 Keep in mind that this feature is meant for experiments,
878 and it has a risk of crashing the system and running pro‐
879 grams. Therefore, when an eBPF program using this helper
880 is attached, a warning including PID and process name is
881 printed to kernel logs.
882
883 Return 0 on success, or a negative error in case of failure.
884
885 long bpf_current_task_under_cgroup(struct bpf_map *map, u32 index)
886
887 Description
888 Check whether the probe is being run in the context of a
889 given subset of the cgroup2 hierarchy. The cgroup2 to
890 test is held by map of type BPF_MAP_TYPE_CGROUP_ARRAY, at
891 index.
892
893 Return The return value depends on the result of the test, and
894 can be:
895
896 • 1, if current task belongs to the cgroup2.
897
898 • 0, if current task does not belong to the cgroup2.
899
900 • A negative error code, if an error occurred.
901
902 long bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags)
903
904 Description
905 Resize (trim or grow) the packet associated to skb to the
906 new len. The flags are reserved for future usage, and
907 must be left at zero.
908
909 The basic idea is that the helper performs the needed
910 work to change the size of the packet, then the eBPF pro‐
911 gram rewrites the rest via helpers like
912 bpf_skb_store_bytes(), bpf_l3_csum_replace(),
913 bpf_l4_csum_replace() and others. This helper is a slow
914 path utility intended for replies with control messages.
915 And because it is targeted for slow path, the helper it‐
916 self can afford to be slow: it implicitly linearizes, un‐
917 clones and drops offloads from the skb.
918
919 A call to this helper may change the underlying packet
920 buffer. Therefore, at load time, all checks on pointers
921 previously done by the verifier are invalidated and must
922 be performed again, if the helper is used in combination
923 with direct packet access.
924
925 Return 0 on success, or a negative error in case of failure.
926
927 long bpf_skb_pull_data(struct sk_buff *skb, u32 len)
928
929 Description
930 Pull in non-linear data in case the skb is non-linear and
931 not all len bytes are part of the linear section. Make len
932 bytes from skb readable and writable. If a zero value is
933 passed for len, then all bytes in the linear part of skb
934 will be made readable and writable.
935
936 This helper is only needed for reading and writing with
937 direct packet access.
938
939 For direct packet access, testing that offsets to access
940 are within packet boundaries (test on skb->data_end) may
941 fail if offsets are invalid, or if the requested data is
942 in non-linear parts of the skb. On failure the program
943 can just bail out, or in the case of a non-linear
944 buffer, use a helper to make the data available. The
945 bpf_skb_load_bytes() helper is a first solution to
946 access the data. Another one consists in using
947 bpf_skb_pull_data() to pull in the non-linear parts once,
948 then retesting and finally accessing the data.
949
950 At the same time, this also makes sure the skb is un‐
951 cloned, which is a necessary condition for direct write.
952 As this needs to be an invariant for the write part only,
953 the verifier detects writes and adds a prologue that is
954 calling bpf_skb_pull_data() to effectively unclone the
955 skb from the very beginning in case it is indeed cloned.
956
957 A call to this helper may change the underlying packet
958 buffer. Therefore, at load time, all checks on pointers
959 previously done by the verifier are invalidated and must
960 be performed again, if the helper is used in combination
961 with direct packet access.
962
963 Return 0 on success, or a negative error in case of failure.
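
The test, pull, retest sequence described for bpf_skb_pull_data() can
be sketched in plain user-space C. This is only a model under stated
assumptions: struct model_skb and the model_* functions are
hypothetical stand-ins, not kernel types or helpers, and the special
case of a zero len is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb: only the first linear_len bytes are
 * directly addressable, the rest lives in non-linear fragments. */
struct model_skb {
    size_t linear_len; /* bytes reachable through direct access */
    size_t total_len;  /* full packet length */
};

/* Mirrors the bounds test against data_end done before a direct access. */
static int model_can_access(const struct model_skb *skb, size_t off, size_t len)
{
    return off <= skb->linear_len && len <= skb->linear_len - off;
}

/* Stand-in for bpf_skb_pull_data(): make len bytes linear, or fail. */
static int model_pull_data(struct model_skb *skb, size_t len)
{
    assert(skb->linear_len <= skb->total_len);
    if (len > skb->total_len)
        return -1;             /* the packet is shorter than len */
    if (len > skb->linear_len)
        skb->linear_len = len; /* a real skb would copy fragment data here */
    return 0;
}

/* Test, pull once on failure, then test again before touching the data. */
static int model_read_ok(struct model_skb *skb, size_t off, size_t len)
{
    if (model_can_access(skb, off, len))
        return 1;
    if (model_pull_data(skb, off + len) < 0)
        return 0;
    return model_can_access(skb, off, len);
}
```

In a real program the second test is mandatory, because the verifier
discards earlier bounds checks once the helper has been called.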
964
965 s64 bpf_csum_update(struct sk_buff *skb, __wsum csum)
966
967 Description
968 Add the checksum csum into skb->csum in case the driver
969 has supplied a checksum for the entire packet into that
970 field. Return an error otherwise. This helper is intended
971 to be used in combination with bpf_csum_diff(), in par‐
972 ticular when the checksum needs to be updated after data
973 has been written into the packet through direct packet
974 access.
975
976 Return The checksum on success, or a negative error code in case
977 of failure.
978
979 void bpf_set_hash_invalid(struct sk_buff *skb)
980
981 Description
982 Invalidate the current skb->hash. It can be used after
983 mangling on headers through direct packet access, in or‐
984 der to indicate that the hash is outdated and to trigger
985 a recalculation the next time the kernel tries to access
986 this hash or when the bpf_get_hash_recalc() helper is
987 called.
988
989 Return void.
990
991 long bpf_get_numa_node_id(void)
992
993 Description
994 Return the id of the current NUMA node. The primary use
995 case for this helper is the selection of sockets for the
996 local NUMA node, when the program is attached to sockets
997 using the SO_ATTACH_REUSEPORT_EBPF option (see also
998 socket(7)), but the helper is also available to other
999 eBPF program types, similarly to bpf_get_smp_proces‐
1000 sor_id().
1001
1002 Return The id of current NUMA node.
1003
1004 long bpf_skb_change_head(struct sk_buff *skb, u32 len, u64 flags)
1005
1006 Description
1007 Grow the headroom of the packet associated to skb and
1008 adjust the offset of the MAC header accordingly, adding
1009 len bytes of space. It automatically extends and
1010 reallocates memory as required.
1011
1012 This helper can be used on a layer 3 skb to push a MAC
1013 header for redirection into a layer 2 device.
1014
1015 All values for flags are reserved for future usage, and
1016 must be left at zero.
1017
1018 A call to this helper may change the underlying packet
1019 buffer. Therefore, at load time, all checks on pointers
1020 previously done by the verifier are invalidated and must
1021 be performed again, if the helper is used in combination
1022 with direct packet access.
1023
1024 Return 0 on success, or a negative error in case of failure.
1025
1026 long bpf_xdp_adjust_head(struct xdp_buff *xdp_md, int delta)
1027
1028 Description
1029 Adjust (move) xdp_md->data by delta bytes. Note that it
1030 is possible to use a negative value for delta. This
1031 helper can be used to prepare the packet for pushing or
1032 popping headers.
1033
1034 A call to this helper may change the underlying packet
1035 buffer. Therefore, at load time, all checks on pointers
1036 previously done by the verifier are invalidated and must
1037 be performed again, if the helper is used in combination
1038 with direct packet access.
1039
1040 Return 0 on success, or a negative error in case of failure.
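
The effect of a positive or negative delta on the packet start can be
illustrated with a small user-space model. The struct and functions
below are hypothetical and use plain offsets instead of pointers. The
bounds rules are a simplification chosen for the sketch; the kernel
applies additional checks that are not modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the XDP buffer, using offsets, not pointers. */
struct model_xdp {
    long hard_start; /* lowest offset usable as headroom */
    long data;       /* start of packet data */
    long data_end;   /* one past the last packet byte */
};

/* Stand-in for bpf_xdp_adjust_head(): move data by delta bytes. */
static int model_adjust_head(struct model_xdp *x, int delta)
{
    long new_data = x->data + delta;

    assert(x->hard_start <= x->data && x->data <= x->data_end);
    if (new_data < x->hard_start)
        return -1; /* not enough headroom to push a header */
    if (new_data >= x->data_end)
        return -1; /* popping this much would leave no packet */
    x->data = new_data;
    return 0;
}

/* Packet length as seen by the program after the adjustment. */
static long model_len(const struct model_xdp *x)
{
    return x->data_end - x->data;
}
```

A negative delta grows the packet at the front (room for a new header),
a positive delta shrinks it (an existing header is dropped).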
1041
1042 long bpf_probe_read_str(void *dst, u32 size, const void *unsafe_ptr)
1043
1044 Description
1045 Copy a NUL terminated string from an unsafe kernel ad‐
1046 dress unsafe_ptr to dst. See bpf_probe_read_kernel_str()
1047 for more details.
1048
1049 Generally, use bpf_probe_read_user_str() or
1050 bpf_probe_read_kernel_str() instead.
1051
1052 Return On success, the strictly positive length of the string,
1053 including the trailing NUL character. On error, a nega‐
1054 tive value.
1055
1056 u64 bpf_get_socket_cookie(struct sk_buff *skb)
1057
1058 Description
1059 If the struct sk_buff pointed to by skb has a known socket,
1060 retrieve the cookie (generated by the kernel) of this
1061 socket. If no cookie has been set yet, generate a new
1062 cookie. Once generated, the socket cookie remains stable
1063 for the life of the socket. This helper can be useful for
1064 monitoring per socket networking traffic statistics as it
1065 provides a global socket identifier that can be assumed
1066 unique.
1067
1068 Return An 8-byte long unique number on success, or 0 if the
1069 socket field is missing inside skb.
1070
1071 u64 bpf_get_socket_cookie(struct bpf_sock_addr *ctx)
1072
1073 Description
1074 Equivalent to bpf_get_socket_cookie() helper that accepts
1075 skb, but gets socket from struct bpf_sock_addr context.
1076
1077 Return An 8-byte long unique number.
1078
1079 u64 bpf_get_socket_cookie(struct bpf_sock_ops *ctx)
1080
1081 Description
1082 Equivalent to bpf_get_socket_cookie() helper that accepts
1083 skb, but gets socket from struct bpf_sock_ops context.
1084
1085 Return An 8-byte long unique number.
1086
1087 u64 bpf_get_socket_cookie(struct sock *sk)
1088
1089 Description
1090 Equivalent to bpf_get_socket_cookie() helper that accepts
1091 skb, but gets socket from a BTF struct sock. This helper
1092 also works for sleepable programs.
1093
1094 Return An 8-byte long unique number or 0 if sk is NULL.
1095
1096 u32 bpf_get_socket_uid(struct sk_buff *skb)
1097
1098 Description
1099 Get the owner UID of the socket associated to skb.
1100
1101 Return The owner UID of the socket associated to skb. If the
1102 socket is NULL, or if it is not a full socket (i.e. if it
1103 is a time-wait or a request socket instead), overflowuid
1104 value is returned (note that overflowuid might also be
1105 the actual UID value for the socket).
1106
1107 long bpf_set_hash(struct sk_buff *skb, u32 hash)
1108
1109 Description
1110 Set the full hash for skb (set the field skb->hash) to
1111 value hash.
1112
1113 Return 0
1114
1115 long bpf_setsockopt(void *bpf_socket, int level, int optname, void
1116 *optval, int optlen)
1117
1118 Description
1119 Emulate a call to setsockopt() on the socket associated
1120 to bpf_socket, which must be a full socket. The level at
1121 which the option resides and the name optname of the
1122 option must be specified, see setsockopt(2) for more
1123 information. The option value of length optlen is
1124 pointed to by optval.
1125
1126 bpf_socket should be one of the following:
1127
1128 • struct bpf_sock_ops for BPF_PROG_TYPE_SOCK_OPS.
1129
1130 • struct bpf_sock_addr for BPF_CGROUP_INET4_CONNECT and
1131 BPF_CGROUP_INET6_CONNECT.
1132
1133 This helper actually implements a subset of setsockopt().
1134 It supports the following levels:
1135
1136 • SOL_SOCKET, which supports the following optnames:
1137 SO_RCVBUF, SO_SNDBUF, SO_MAX_PACING_RATE, SO_PRIORITY,
1138 SO_RCVLOWAT, SO_MARK, SO_BINDTODEVICE, SO_KEEPALIVE,
1139 SO_REUSEADDR, SO_REUSEPORT, SO_BINDTOIFINDEX, SO_TXRE‐
1140 HASH.
1141
1142 • IPPROTO_TCP, which supports the following optnames:
1143 TCP_CONGESTION, TCP_BPF_IW, TCP_BPF_SNDCWND_CLAMP,
1144 TCP_SAVE_SYN, TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT,
1145 TCP_SYNCNT, TCP_USER_TIMEOUT, TCP_NOTSENT_LOWAT,
1146 TCP_NODELAY, TCP_MAXSEG, TCP_WINDOW_CLAMP,
1147 TCP_THIN_LINEAR_TIMEOUTS, TCP_BPF_DELACK_MAX,
1148 TCP_BPF_RTO_MIN.
1149
1150 • IPPROTO_IP, which supports optname IP_TOS.
1151
1152 • IPPROTO_IPV6, which supports the following optnames:
1153 IPV6_TCLASS, IPV6_AUTOFLOWLABEL.
1154
1155 Return 0 on success, or a negative error in case of failure.
1156
1157 long bpf_skb_adjust_room(struct sk_buff *skb, s32 len_diff, u32 mode,
1158 u64 flags)
1159
1160 Description
1161 Grow or shrink the room for data in the packet associated
1162 to skb by len_diff, and according to the selected mode.
1163
1164 By default, the helper will reset any offloaded checksum
1165 indicator of the skb to CHECKSUM_NONE. This can be
1166 avoided by the following flag:
1167
1168 • BPF_F_ADJ_ROOM_NO_CSUM_RESET: Do not reset offloaded
1169 checksum data of the skb to CHECKSUM_NONE.
1170
1171 There are two supported modes at this time:
1172
1173 • BPF_ADJ_ROOM_MAC: Adjust room at the mac layer (room
1174 space is added or removed between the layer 2 and layer
1175 3 headers).
1176
1177 • BPF_ADJ_ROOM_NET: Adjust room at the network layer
1178 (room space is added or removed between the layer 3 and
1179 layer 4 headers).
1180
1181 The following flags are supported at this time:
1182
1183 • BPF_F_ADJ_ROOM_FIXED_GSO: Do not adjust gso_size. Ad‐
1184 justing mss in this way is not allowed for datagrams.
1185
1186 • BPF_F_ADJ_ROOM_ENCAP_L3_IPV4, BPF_F_ADJ_ROOM_EN‐
1187 CAP_L3_IPV6: Any new space is reserved to hold a tunnel
1188 header. Configure skb offsets and other fields accord‐
1189 ingly.
1190
1191 • BPF_F_ADJ_ROOM_ENCAP_L4_GRE, BPF_F_ADJ_ROOM_EN‐
1192 CAP_L4_UDP: Use with ENCAP_L3 flags to further specify
1193 the tunnel type.
1194
1195 • BPF_F_ADJ_ROOM_ENCAP_L2(len): Use with ENCAP_L3/L4
1196 flags to further specify the tunnel type; len is the
1197 length of the inner MAC header.
1198
1199 • BPF_F_ADJ_ROOM_ENCAP_L2_ETH: Use with
1200 BPF_F_ADJ_ROOM_ENCAP_L2 flag to further specify the L2
1201 type as Ethernet.
1202
1203 A call to this helper may change the underlying packet
1204 buffer. Therefore, at load time, all checks on pointers
1205 previously done by the verifier are invalidated and must
1206 be performed again, if the helper is used in combination
1207 with direct packet access.
1208
1209 Return 0 on success, or a negative error in case of failure.
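
As an illustration of how the encapsulation flags combine, the sketch
below builds a flags value for an IPv4 + UDP tunnel carrying an inner
Ethernet header. The numeric flag values are local copies that are
assumed to match <linux/bpf.h> and should be checked against the
installed header. The 8-byte tunnel header and the choice to count the
inner MAC header in len_diff are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the flag values; assumed to match <linux/bpf.h>. */
#define BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 (1ULL << 1)
#define BPF_F_ADJ_ROOM_ENCAP_L4_UDP  (1ULL << 4)
#define BPF_ADJ_ROOM_ENCAP_L2_MASK   0xffULL
#define BPF_ADJ_ROOM_ENCAP_L2_SHIFT  56
#define BPF_F_ADJ_ROOM_ENCAP_L2(len) \
    (((uint64_t)(len) & BPF_ADJ_ROOM_ENCAP_L2_MASK) << BPF_ADJ_ROOM_ENCAP_L2_SHIFT)

/* Header sizes used by this example only. */
enum {
    OUTER_IPV4_LEN = 20, /* IPv4 header without options */
    OUTER_UDP_LEN  = 8,
    TUNNEL_HDR_LEN = 8,  /* hypothetical 8-byte tunnel header */
    INNER_ETH_LEN  = 14
};

/* Flags for an IPv4/UDP tunnel with an inner MAC header of the given size. */
static uint64_t encap_flags_ipv4_udp(unsigned int inner_mac_len)
{
    assert(inner_mac_len <= BPF_ADJ_ROOM_ENCAP_L2_MASK);
    return BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 |
           BPF_F_ADJ_ROOM_ENCAP_L4_UDP |
           BPF_F_ADJ_ROOM_ENCAP_L2(inner_mac_len);
}

/* Room to add: all new headers placed in front of the original payload. */
static int32_t encap_len_diff(void)
{
    return OUTER_IPV4_LEN + OUTER_UDP_LEN + TUNNEL_HDR_LEN + INNER_ETH_LEN;
}
```

An eBPF program would then pass these two values as len_diff and flags,
together with BPF_ADJ_ROOM_MAC as mode.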
1210
1211 long bpf_redirect_map(struct bpf_map *map, u64 key, u64 flags)
1212
1213 Description
1214 Redirect the packet to the endpoint referenced by map at
1215 index key. Depending on its type, this map can contain
1216 references to net devices (for forwarding packets through
1217 other ports), or to CPUs (for redirecting XDP frames to
1218 another CPU; but this is only implemented for native XDP
1219 (with driver support) as of this writing).
1220
1221 The lower two bits of flags are used as the return code
1222 if the map lookup fails. This is so that the return value
1223 can be one of the XDP program return codes up to XDP_TX,
1224 as chosen by the caller. The higher bits of flags can be
1225 set to BPF_F_BROADCAST or BPF_F_EXCLUDE_INGRESS as de‐
1226 fined below.
1227
1228 With BPF_F_BROADCAST the packet will be broadcast to all
1229 the interfaces in the map; with BPF_F_EXCLUDE_INGRESS the
1230 ingress interface will be excluded from the
1231 broadcast.
1232
1233 See also bpf_redirect(), which only supports redirecting
1234 to an ifindex, but doesn't require a map to do so.
1235
1236 Return XDP_REDIRECT on success, or the value of the two lower
1237 bits of the flags argument on error.
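
The role of the two lower bits of flags as a fallback return code can
be sketched in user-space C. The lookup table below is a hypothetical
stand-in for a device map. The XDP action values are local copies
assumed to match enum xdp_action in <linux/bpf.h>, and the broadcast
flags are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies; values assumed to match enum xdp_action. */
enum model_xdp_action {
    XDP_ABORTED = 0,
    XDP_DROP,
    XDP_PASS,
    XDP_TX,
    XDP_REDIRECT
};

/* Hypothetical device map: an array of ifindexes where 0 means "empty". */
static long model_redirect_map(const int *ifindex_by_key, uint64_t nslots,
                               uint64_t key, uint64_t flags)
{
    assert(ifindex_by_key != NULL);
    if (key < nslots && ifindex_by_key[key] != 0)
        return XDP_REDIRECT;    /* lookup succeeded */
    return (long)(flags & 3);   /* lookup failed: caller-chosen action */
}
```

Passing XDP_PASS as flags, for example, lets packets without a map
entry continue up the regular stack instead of being dropped.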
1238
1239 long bpf_sk_redirect_map(struct sk_buff *skb, struct bpf_map *map, u32
1240 key, u64 flags)
1241
1242 Description
1243 Redirect the packet to the socket referenced by map (of
1244 type BPF_MAP_TYPE_SOCKMAP) at index key. Both ingress and
1245 egress interfaces can be used for redirection. The
1246 BPF_F_INGRESS value in flags is used to make the distinc‐
1247 tion (ingress path is selected if the flag is present,
1248 egress path otherwise). This is the only flag supported
1249 for now.
1250
1251 Return SK_PASS on success, or SK_DROP on error.
1252
1253 long bpf_sock_map_update(struct bpf_sock_ops *skops, struct bpf_map
1254 *map, void *key, u64 flags)
1255
1256 Description
1257 Add an entry to, or update a map referencing sockets. The
1258 skops is used as a new value for the entry associated to
1259 key. flags is one of:
1260
1261 BPF_NOEXIST
1262 The entry for key must not exist in the map.
1263
1264 BPF_EXIST
1265 The entry for key must already exist in the map.
1266
1267 BPF_ANY
1268 No condition on the existence of the entry for
1269 key.
1270
1271 If the map has eBPF programs (parser and verdict), those
1272 will be inherited by the socket being added. If the
1273 socket is already attached to eBPF programs, this results
1274 in an error.
1275
1276 Return 0 on success, or a negative error in case of failure.
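
The three flags values act as a precondition on whether the entry
already exists. The sketch below models only that precondition. The
numeric values are local copies assumed to match <linux/bpf.h>. The
specific error codes (-EEXIST, -ENOENT, -EINVAL) are an assumption
based on the usual map update convention and are not stated in this
page.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local copies; assumed to match <linux/bpf.h>. */
enum { BPF_ANY = 0, BPF_NOEXIST = 1, BPF_EXIST = 2 };

/* Return 0 if the update may proceed, or a negative errno-style code. */
static int model_check_update(int entry_exists, uint64_t flags)
{
    assert(entry_exists == 0 || entry_exists == 1);
    switch (flags) {
    case BPF_ANY:
        return 0;                            /* create or overwrite */
    case BPF_NOEXIST:
        return entry_exists ? -EEXIST : 0;   /* create only */
    case BPF_EXIST:
        return entry_exists ? 0 : -ENOENT;   /* overwrite only */
    default:
        return -EINVAL;                      /* unknown flag value */
    }
}
```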
1277
1278 long bpf_xdp_adjust_meta(struct xdp_buff *xdp_md, int delta)
1279
1280 Description
1281 Adjust the address pointed to by xdp_md->data_meta by delta
1282 (which can be positive or negative). Note that this oper‐
1283 ation modifies the address stored in xdp_md->data, so the
1284 latter must be loaded only after the helper has been
1285 called.
1286
1287 The use of xdp_md->data_meta is optional and programs are
1288 not required to use it. The rationale is that when the
1289 packet is processed with XDP (e.g. as DoS filter), it is
1290 possible to push further meta data along with it before
1291 passing to the stack, and to give the guarantee that an
1292 ingress eBPF program attached as a TC classifier on the
1293 same device can pick this up for further post-processing.
1294 Since TC works with socket buffers, it remains possible
1295 to set from XDP the mark or priority pointers, or other
1296 pointers for the socket buffer. Having this scratch
1297 space generic and programmable allows for more flexibil‐
1298 ity as the user is free to store whatever meta data they
1299 need.
1300
1301 A call to this helper may change the underlying packet
1302 buffer. Therefore, at load time, all checks on pointers
1303 previously done by the verifier are invalidated and must
1304 be performed again, if the helper is used in combination
1305 with direct packet access.
1306
1307 Return 0 on success, or a negative error in case of failure.
1308
1309 long bpf_perf_event_read_value(struct bpf_map *map, u64 flags, struct
1310 bpf_perf_event_value *buf, u32 buf_size)
1311
1312 Description
1313 Read the value of a perf event counter, and store it into
1314 buf of size buf_size. This helper relies on a map of type
1315 BPF_MAP_TYPE_PERF_EVENT_ARRAY. The nature of the perf
1316 event counter is selected when map is updated with perf
1317 event file descriptors. The map is an array whose size is
1318 the number of available CPUs, and each cell contains a
1319 value relative to one CPU. The value to retrieve is indi‐
1320 cated by flags, that contains the index of the CPU to
1321 look up, masked with BPF_F_INDEX_MASK. Alternatively,
1322 flags can be set to BPF_F_CURRENT_CPU to indicate that
1323 the value for the current CPU should be retrieved.
1324
1325 This helper behaves in a way close to
1326 bpf_perf_event_read() helper, save that instead of just
1327 returning the value observed, it fills the buf structure.
1328 This allows for additional data to be retrieved: in par‐
1329 ticular, the enabled and running times (in buf->enabled
1330 and buf->running, respectively) are copied. In general,
1331 bpf_perf_event_read_value() is recommended over
1332 bpf_perf_event_read(), which has some ABI issues and pro‐
1333 vides fewer functionalities.
1334
1335 These values are interesting, because hardware PMU (Per‐
1336 formance Monitoring Unit) counters are limited resources.
1337 When there are more PMU based perf events opened than
1338 available counters, the kernel will multiplex these
1339 events so each event gets a certain percentage (but not
1340 all) of the PMU time. When multiplexing happens, the
1341 number of samples or the counter value will not match
1342 what would be observed without multiplexing. This makes
1343 comparison between different runs difficult. Typically, the
1344 counter value should be normalized before comparing to
1345 other experiments. The usual normalization is done as
1346 follows.
1347
1348 normalized_counter = counter * t_enabled / t_running
1349
1350 Where t_enabled is the time the event was enabled and
1351 t_running is the time the event was running since the
1352 last normalization. The enabled and running times are
1353 accumulated since the perf event was opened. To obtain a
1354 scaling factor between two invocations of an eBPF
1355 program, users can use the CPU id as the key (which is
1356 typical for the perf array usage model) to remember the
1357 previous value and do the calculation inside the eBPF program.
1358
1359 Return 0 on success, or a negative error in case of failure.
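
The normalization between two readings can be sketched as follows.
This is user-space C for illustration only. struct perf_value is a
local type with the same three fields as struct bpf_perf_event_value
(counter, enabled, running). Floating point is used for brevity, which
an eBPF program itself could not do.

```c
#include <assert.h>
#include <stdint.h>

/* Same three fields as struct bpf_perf_event_value. */
struct perf_value {
    uint64_t counter;
    uint64_t enabled;
    uint64_t running;
};

/* Scale the counter delta between two snapshots by enabled/running. */
static uint64_t normalized_delta(const struct perf_value *prev,
                                 const struct perf_value *cur)
{
    uint64_t d_counter = cur->counter - prev->counter;
    uint64_t d_enabled = cur->enabled - prev->enabled;
    uint64_t d_running = cur->running - prev->running;

    assert(cur->running >= prev->running);
    if (d_running == 0)
        return 0; /* the event was not scheduled in this interval */
    /* double keeps the sketch short; very large values lose precision */
    return (uint64_t)((double)d_counter * (double)d_enabled /
                      (double)d_running);
}
```

With one snapshot remembered per CPU, calling normalized_delta() on the
previous and current snapshot gives the scaled count for the interval
between two program invocations.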
1360
1361 long bpf_perf_prog_read_value(struct bpf_perf_event_data *ctx, struct
1362 bpf_perf_event_value *buf, u32 buf_size)
1363
1364 Description
1365 For an eBPF program attached to a perf event, retrieve
1366 the value of the event counter associated to ctx and
1367 store it in the structure pointed to by buf and of size
1368 buf_size. Enabled and running times are also stored in
1369 the structure (see description of helper
1370 bpf_perf_event_read_value() for more details).
1371
1372 Return 0 on success, or a negative error in case of failure.
1373
1374 long bpf_getsockopt(void *bpf_socket, int level, int optname, void
1375 *optval, int optlen)
1376
1377 Description
1378 Emulate a call to getsockopt() on the socket associated
1379 to bpf_socket, which must be a full socket. The level at
1380 which the option resides and the name optname of the op‐
1381 tion must be specified, see getsockopt(2) for more infor‐
1382 mation. The retrieved value is stored in the structure
1383 pointed to by optval and of length optlen.
1384
1385 bpf_socket should be one of the following:
1386
1387 • struct bpf_sock_ops for BPF_PROG_TYPE_SOCK_OPS.
1388
1389 • struct bpf_sock_addr for BPF_CGROUP_INET4_CONNECT and
1390 BPF_CGROUP_INET6_CONNECT.
1391
1392 This helper actually implements a subset of getsockopt().
1393 It supports the same set of optnames that is supported by
1394 the bpf_setsockopt() helper, with two exceptions:
1395 TCP_BPF_* is bpf_setsockopt() only and TCP_SAVED_SYN is
1396 bpf_getsockopt() only.
1397
1398 Return 0 on success, or a negative error in case of failure.
1399
1400 long bpf_override_return(struct pt_regs *regs, u64 rc)
1401
1402 Description
1403 Used for error injection, this helper uses kprobes to
1404 override the return value of the probed function, and to
1405 set it to rc. The first argument is the context regs on
1406 which the kprobe works.
1407
1408 This helper works by setting the PC (program counter) to
1409 an override function which is run in place of the origi‐
1410 nal probed function. This means the probed function is
1411 not run at all. The replacement function just returns
1412 with the required value.
1413
1414 This helper has security implications, and thus is sub‐
1415 ject to restrictions. It is only available if the kernel
1416 was compiled with the CONFIG_BPF_KPROBE_OVERRIDE configu‐
1417 ration option, and in this case it only works on func‐
1418 tions tagged with ALLOW_ERROR_INJECTION in the kernel
1419 code.
1420
1421 Also, the helper is only available for the architectures
1422 having the CONFIG_FUNCTION_ERROR_INJECTION option. As of
1423 this writing, x86 architecture is the only one to support
1424 this feature.
1425
1426 Return 0
1427
1428 long bpf_sock_ops_cb_flags_set(struct bpf_sock_ops *bpf_sock, int
1429 argval)
1430
1431 Description
1432 Attempt to set the value of the bpf_sock_ops_cb_flags
1433 field for the full TCP socket associated to bpf_sock to
1434 argval.
1435
1436 The primary use of this field is to determine if there
1437 should be calls to eBPF programs of type
1438 BPF_PROG_TYPE_SOCK_OPS at various points in the TCP code.
1439 A program of the same type can change its value, per con‐
1440 nection and as necessary, when the connection is estab‐
1441 lished. This field is directly accessible for reading,
1442 but this helper must be used for updates in order to re‐
1443 turn an error if an eBPF program tries to set a callback
1444 that is not supported in the current kernel.
1445
1446 argval is a flag array which can combine these flags:
1447
1448 • BPF_SOCK_OPS_RTO_CB_FLAG (retransmission time out)
1449
1450 • BPF_SOCK_OPS_RETRANS_CB_FLAG (retransmission)
1451
1452 • BPF_SOCK_OPS_STATE_CB_FLAG (TCP state change)
1453
1454 • BPF_SOCK_OPS_RTT_CB_FLAG (every RTT)
1455
1456 This function can also be used to clear a callback flag
1457 by setting the appropriate bit to zero. For example, to
1458 disable the RTO callback:
1459
1460 bpf_sock_ops_cb_flags_set(bpf_sock,
1461 bpf_sock->bpf_sock_ops_cb_flags &
1462 ~BPF_SOCK_OPS_RTO_CB_FLAG)
1463
1464 Here are some examples of where one could call such an
1465 eBPF program:
1466
1467 • When RTO fires.
1468
1469 • When a packet is retransmitted.
1470
1471 • When the connection terminates.
1472
1473 • When a packet is sent.
1474
1475 • When a packet is received.
1476
1477 Return Code -EINVAL if the socket is not a full TCP socket; oth‐
1478 erwise, a positive number containing the bits that could
1479 not be set is returned (which comes down to 0 if all bits
1480 were set as required).
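
The return convention (bits that could not be set) can be modelled in
a few lines of user-space C. The flag values are local copies assumed
to match <linux/bpf.h>. MODEL_SUPPORTED_CB_FLAGS and
model_cb_flags_set() are hypothetical names standing in for what a
given kernel supports and for the helper itself.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies; values assumed to match <linux/bpf.h>. */
#define BPF_SOCK_OPS_RTO_CB_FLAG     (1 << 0)
#define BPF_SOCK_OPS_RETRANS_CB_FLAG (1 << 1)
#define BPF_SOCK_OPS_STATE_CB_FLAG   (1 << 2)
#define BPF_SOCK_OPS_RTT_CB_FLAG     (1 << 3)

/* Hypothetical: the callbacks this modelled kernel knows about. */
#define MODEL_SUPPORTED_CB_FLAGS 0xF

/* Store the supported bits of argval and return the rejected ones. */
static int model_cb_flags_set(int *cb_flags, int argval)
{
    assert(cb_flags != NULL);
    *cb_flags = argval & MODEL_SUPPORTED_CB_FLAGS;
    return argval & ~MODEL_SUPPORTED_CB_FLAGS;
}
```

A caller can therefore treat any non-zero, non-negative return value as
"some requested callbacks are unavailable on this kernel".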
1481
1482 long bpf_msg_redirect_map(struct sk_msg_buff *msg, struct bpf_map *map,
1483 u32 key, u64 flags)
1484
1485 Description
1486 This helper is used in programs implementing policies at
1487 the socket level. If the message msg is allowed to pass
1488 (i.e. if the verdict eBPF program returns SK_PASS), redi‐
1489 rect it to the socket referenced by map (of type
1490 BPF_MAP_TYPE_SOCKMAP) at index key. Both ingress and
1491 egress interfaces can be used for redirection. The
1492 BPF_F_INGRESS value in flags is used to make the distinc‐
1493 tion (ingress path is selected if the flag is present,
1494 egress path otherwise). This is the only flag supported
1495 for now.
1496
1497 Return SK_PASS on success, or SK_DROP on error.
1498
1499 long bpf_msg_apply_bytes(struct sk_msg_buff *msg, u32 bytes)
1500
1501 Description
1502 For socket policies, apply the verdict of the eBPF pro‐
1503 gram to the next bytes (number of bytes) of message msg.
1504
1505 For example, this helper can be used in the following
1506 cases:
1507
1508 • A single sendmsg() or sendfile() system call contains
1509 multiple logical messages that the eBPF program is sup‐
1510 posed to read and for which it should apply a verdict.
1511
1512 • An eBPF program only cares to read the first bytes of a
1513 msg. If the message has a large payload, then setting
1514 up and calling the eBPF program repeatedly for all
1515 bytes, even though the verdict is already known, would
1516 create unnecessary overhead.
1517
1518 When called from within an eBPF program, the helper sets
1519 a counter internal to the BPF infrastructure, that is
1520 used to apply the last verdict to the next bytes. If
1521 bytes is smaller than the current data being processed
1522 from a sendmsg() or sendfile() system call, the first
1523 bytes will be sent and the eBPF program will be re-run
1524 with the pointer for start of data pointing to byte num‐
1525 ber bytes + 1. If bytes is larger than the current data
1526 being processed, then the eBPF verdict will be applied to
1527 multiple sendmsg() or sendfile() calls until bytes are
1528 consumed.
1529
1530 Note that if a socket closes with the internal counter
1531 holding a non-zero value, this is not a problem because
1532 data is not being buffered for bytes and is sent as it is
1533 received.
1534
1535 Return 0
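
The interaction between the byte counter and successive send calls can
be simulated in user space. The function below is a hypothetical model,
not kernel code. It assumes a verdict program that calls
bpf_msg_apply_bytes(msg, apply) with the same value on every run, and
it counts how often that program would be invoked.

```c
#include <assert.h>
#include <stddef.h>

/* Count verdict program runs for a sequence of send sizes, assuming the
 * program applies its verdict to the next `apply` bytes on every run. */
static unsigned int model_verdict_runs(const size_t *send_sizes,
                                       size_t nsends, size_t apply)
{
    size_t remaining = 0; /* bytes still covered by the last verdict */
    unsigned int runs = 0;

    assert(apply > 0);
    for (size_t i = 0; i < nsends; i++) {
        size_t off = 0;

        while (off < send_sizes[i]) {
            if (remaining == 0) { /* no verdict in force: run the program */
                runs++;
                remaining = apply;
            }
            size_t left = send_sizes[i] - off;
            size_t take = remaining < left ? remaining : left;

            off += take;          /* these bytes reuse the last verdict */
            remaining -= take;
        }
    }
    return runs;
}
```

A 10-byte send with apply set to 4 makes the program run three times,
while three 3-byte sends with apply set to 8 need only two runs.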
1536
1537 long bpf_msg_cork_bytes(struct sk_msg_buff *msg, u32 bytes)
1538
1539 Description
1540 For socket policies, prevent the execution of the verdict
1541 eBPF program for message msg until bytes (byte number)
1542 have been accumulated.
1543
1544 This can be used when one needs a specific number of
1545 bytes before a verdict can be assigned, even if the data
1546 spans multiple sendmsg() or sendfile() calls. The extreme
1547 case would be a user calling sendmsg() repeatedly with
1548 1-byte long message segments. Obviously, this is bad for
1549 performance, but it is still valid. If the eBPF program
1550 needs bytes bytes to validate a header, this helper can
1551 be used to prevent the eBPF program from being called
1552 again until bytes have been accumulated.
1553
1554 Return 0
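
The effect of corking on the number of program invocations can be
simulated the same way. This is a simplified, hypothetical model: the
program is assumed to call bpf_msg_cork_bytes(msg, need) whenever it
sees fewer than need bytes, and to issue a verdict for everything
queued once enough data is available.

```c
#include <assert.h>
#include <stddef.h>

/* Count verdict program runs when the program corks until `need` bytes
 * have been queued across successive sends. */
static unsigned int model_corked_runs(const size_t *send_sizes,
                                      size_t nsends, size_t need)
{
    size_t queued = 0;    /* bytes accumulated so far */
    int corked = 0;       /* set once the program asked to cork */
    unsigned int runs = 0;

    assert(need > 0);
    for (size_t i = 0; i < nsends; i++) {
        queued += send_sizes[i];
        if (corked && queued < need)
            continue;     /* still corked: the program is not run */
        runs++;
        if (queued < need) {
            corked = 1;   /* not enough data yet: cork */
        } else {
            corked = 0;   /* verdict issued for all queued bytes */
            queued = 0;
        }
    }
    return runs;
}
```

With eight 1-byte sends and need set to 4, the program runs four times
instead of eight.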
1555
1556 long bpf_msg_pull_data(struct sk_msg_buff *msg, u32 start, u32 end, u64
1557 flags)
1558
1559 Description
1560 For socket policies, pull in non-linear data from user
1561 space for msg and set pointers msg->data and
1562 msg->data_end to start and end byte offsets into msg,
1563 respectively.
1564
1565 If a program of type BPF_PROG_TYPE_SK_MSG is run on a msg
1566 it can only parse data that the (data, data_end) pointers
1567 have already consumed. For sendmsg() hooks this is likely
1568 the first scatterlist element. But for calls relying on
1569 the sendpage handler (e.g. sendfile()) this will be the
1570 range (0, 0) because the data is shared with user space
1571 and by default the objective is to avoid allowing user
1572 space to modify data while (or after) eBPF verdict is be‐
1573 ing decided. This helper can be used to pull in data and
1574 to set the start and end pointer to given values. Data
1575 will be copied if necessary (i.e. if data was not linear
1576 and if start and end pointers do not point to the same
1577 chunk).
1578
1579 A call to this helper may change the underlying packet
1580 buffer. Therefore, at load time, all checks on pointers
1581 previously done by the verifier are invalidated and must
1582 be performed again, if the helper is used in combination
1583 with direct packet access.
1584
1585 All values for flags are reserved for future usage, and
1586 must be left at zero.
1587
1588 Return 0 on success, or a negative error in case of failure.
1589
1590 long bpf_bind(struct bpf_sock_addr *ctx, struct sockaddr *addr, int
1591 addr_len)
1592
1593 Description
1594 Bind the socket associated to ctx to the address pointed
1595 to by addr, of length addr_len. This allows for making
1596 outgoing connections from the desired IP address, which
1597 can be useful for example when all processes inside a
1598 cgroup should use one single IP address on a host that
1599 has multiple IP addresses configured.
1600
1601 This helper works for IPv4 and IPv6, TCP and UDP sockets.
1602 The domain (addr->sa_family) must be AF_INET (or
1603 AF_INET6). It's advised to pass a zero port (sin_port or
1604 sin6_port), which triggers IP_BIND_ADDRESS_NO_PORT-like
1605 behavior and lets the kernel efficiently pick an unused
1606 port as long as the 4-tuple is unique. Passing a non-zero
1607 port might lead to degraded performance.
1608
1609 Return 0 on success, or a negative error in case of failure.
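
Preparing an address with a zero port, as advised above, might look
like the following user-space sketch. It uses only the standard POSIX
socket headers. The function name make_bind_addr_v4() is hypothetical.
An eBPF program would fill the same fields before passing the structure
to bpf_bind(), but could not call inet_pton().

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Fill an IPv4 source address for binding, leaving the port at zero so
 * that the kernel picks one. Returns 0 on success, -1 on a bad address. */
static int make_bind_addr_v4(const char *ip, struct sockaddr_in *out)
{
    assert(ip != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = 0; /* zero port: let the kernel choose */
    return inet_pton(AF_INET, ip, &out->sin_addr) == 1 ? 0 : -1;
}
```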
1610
1611 long bpf_xdp_adjust_tail(struct xdp_buff *xdp_md, int delta)
1612
1613 Description
1614 Adjust (move) xdp_md->data_end by delta bytes. It is
1615 possible to both shrink and grow the packet tail. To
1616 shrink it, pass a negative delta.
1617
1618 A call to this helper may change the underlying packet
1619 buffer. Therefore, at load time, all checks
1620 on pointers previously done by the verifier are invali‐
1621 dated and must be performed again, if the helper is used
1622 in combination with direct packet access.
1623
1624 Return 0 on success, or a negative error in case of failure.
1625
1626 long bpf_skb_get_xfrm_state(struct sk_buff *skb, u32 index, struct
1627 bpf_xfrm_state *xfrm_state, u32 size, u64 flags)
1628
1629 Description
1630 Retrieve the XFRM state (IP transform framework, see also
1631 ip-xfrm(8)) at index in XFRM "security path" for skb.
1632
1633 The retrieved value is stored in the struct
1634 bpf_xfrm_state pointed by xfrm_state and of length size.
1635
1636 All values for flags are reserved for future usage, and
1637 must be left at zero.
1638
1639 This helper is available only if the kernel was compiled
1640 with CONFIG_XFRM configuration option.
1641
1642 Return 0 on success, or a negative error in case of failure.
1643
1644 long bpf_get_stack(void *ctx, void *buf, u32 size, u64 flags)
1645
1646 Description
1647 Return a user or a kernel stack in a buffer provided by
1648 the BPF program. To achieve this, the helper needs ctx,
1649 which is a pointer to the context in which the tracing
1650 program is executed. To store the stack trace, the BPF
1651 program provides buf with a nonnegative size.
1652
1653 The last argument, flags, holds the number of stack
1654 frames to skip (from 0 to 255), masked with
1655 BPF_F_SKIP_FIELD_MASK. The next bits can be used to set
1656 the following flags:
1657
1658 BPF_F_USER_STACK
1659 Collect a user space stack instead of a kernel
1660 stack.
1661
1662 BPF_F_USER_BUILD_ID
1663 Collect (build_id, file_offset) instead of ips for
1664 user stack, only valid if BPF_F_USER_STACK is also
1665 specified.
1666
1667 file_offset is an offset relative to the beginning
1668 of the executable or shared object file backing
1669 the vma which the ip falls in. It is not an offset
1670 relative to that object's base address. Accord‐
1671 ingly, it must be adjusted by adding (sh_addr -
1672 sh_offset), where sh_{addr,offset} correspond to
1673 the executable section containing file_offset in
1674 the object, for comparisons to symbols' st_value
1675 to be valid.
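
The file_offset adjustment described above can be sketched in plain C as follows. This is only an illustration of the arithmetic: the function names are hypothetical, and sh_addr, sh_offset, st_value and st_size are assumed to have been read from the ELF object by the caller.

```c
#include <stdint.h>

/*
 * Convert a file_offset collected with BPF_F_USER_BUILD_ID into a value
 * that can be compared with ELF symbol st_value fields. sh_addr and
 * sh_offset belong to the executable section containing file_offset.
 */
uint64_t file_offset_to_symbol_addr(uint64_t file_offset,
                                    uint64_t sh_addr, uint64_t sh_offset)
{
    return file_offset + (sh_addr - sh_offset);
}

/* Return 1 if addr lies inside [st_value, st_value + st_size), else 0. */
int addr_in_symbol(uint64_t addr, uint64_t st_value, uint64_t st_size)
{
    return addr >= st_value && addr - st_value < st_size;
}
```

For instance, with a section mapped at sh_addr 0x401000 and stored at sh_offset 0x1000, a file_offset of 0x1234 corresponds to symbol address 0x401234.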
1676
1677 bpf_get_stack() can collect up to PERF_MAX_STACK_DEPTH
1678 kernel and user frames, provided the buffer is large
1679 enough. Note that this limit can be controlled with the
1680 sysctl program, and that it should be manually increased
1681 in order to profile long user stacks (such as stacks for
1682 Java programs). To do so, use:
1683
1684 # sysctl kernel.perf_event_max_stack=<new value>
1685
1686 Return The non-negative copied buf length equal to or less than
1687 size on success, or a negative error in case of failure.
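
As a minimal sketch, the flags argument of bpf_get_stack() could be assembled as shown below. The constant values are copied here on the assumption that they match include/uapi/linux/bpf.h and should be checked against the installed headers; get_stack_flags() is a hypothetical helper name.

```c
#include <stdint.h>

/* Assumed to mirror include/uapi/linux/bpf.h; verify against your headers. */
#define BPF_F_SKIP_FIELD_MASK 0xffULL
#define BPF_F_USER_STACK      (1ULL << 8)
#define BPF_F_USER_BUILD_ID   (1ULL << 11)

/* Build the flags word: frames to skip in the low byte, options above it. */
uint64_t get_stack_flags(unsigned int skip, int user_stack, int build_id)
{
    uint64_t flags = (uint64_t)skip & BPF_F_SKIP_FIELD_MASK;

    if (user_stack) {
        flags |= BPF_F_USER_STACK;
        /* BPF_F_USER_BUILD_ID is only valid with BPF_F_USER_STACK. */
        if (build_id)
            flags |= BPF_F_USER_BUILD_ID;
    }
    return flags;
}
```

The sketch silently drops a build_id request that is not paired with a user stack; a real program might prefer to treat that combination as a programming error.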
1688
1689 long bpf_skb_load_bytes_relative(const void *skb, u32 offset, void *to,
1690 u32 len, u32 start_header)
1691
1692 Description
1693 This helper is similar to bpf_skb_load_bytes() in that it
1694 provides an easy way to load len bytes at offset from
1695 the packet associated with skb into the buffer pointed to
1696 by to. The difference from bpf_skb_load_bytes() is that a
1697 fifth argument, start_header, selects the base offset to
1698 start from. start_header can be one of:
1699
1700 BPF_HDR_START_MAC
1701 Base offset to load data from is skb's mac header.
1702
1703 BPF_HDR_START_NET
1704 Base offset to load data from is skb's network
1705 header.
1706
1707 In general, "direct packet access" is the preferred
1708 method to access packet data. However, this helper is
1709 particularly useful in socket filters, where skb->data does
1710 not always point to the start of the mac header and where
1711 "direct packet access" is not available.
1712
1713 Return 0 on success, or a negative error in case of failure.
1714
1715 long bpf_fib_lookup(void *ctx, struct bpf_fib_lookup *params, int plen,
1716 u32 flags)
1717
1718 Description
1719 Do a FIB lookup in kernel tables using the parameters in
1720 params. If the lookup is successful and the result shows
1721 that the packet is to be forwarded, the neighbor tables are
1722 searched for the nexthop. If successful (i.e., the FIB lookup
1723 shows forwarding and the nexthop is resolved), the nexthop
1724 address is returned in ipv4_dst or ipv6_dst based on family,
1725 smac is set to the mac address of the egress device, dmac is
1726 set to the nexthop mac address, rt_metric is set to the metric
1727 from the route (IPv4/IPv6 only), and ifindex is set to the
1728 device index of the nexthop from the FIB lookup.
1729
1730 The plen argument is the size of the passed-in struct. The
1731 flags argument can be a combination of one or more of the
1732 following values:
1733
1734 BPF_FIB_LOOKUP_DIRECT
1735 Do a direct table lookup instead of a full lookup using
1736 FIB rules.
1737
1738 BPF_FIB_LOOKUP_OUTPUT
1739 Perform lookup from an egress perspective (default
1740 is ingress).
1741
1742 ctx is either struct xdp_md for XDP programs or struct
1743 sk_buff for tc cls_act programs.
1744
1745 Return
1746
1747 • < 0 if any input argument is invalid
1748
1749 • 0 on success (packet is forwarded, nexthop neighbor ex‐
1750 ists)
1751
1752 • > 0 one of the BPF_FIB_LKUP_RET_ codes explaining why the
1753 packet is not forwarded or needs assistance from the full stack
1754
1755 If lookup fails with BPF_FIB_LKUP_RET_FRAG_NEEDED, then
1756 the MTU was exceeded and output params->mtu_result con‐
1757 tains the MTU.
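
One possible way to act on the three classes of return values is sketched below. The BPF_FIB_LKUP_RET_ values are reproduced on the assumption that they match include/uapi/linux/bpf.h; the fwd_verdict type and the policy itself (drop on blackhole, unreachable or prohibit; hand anything else to the full stack) are illustrative choices, not something the helper mandates.

```c
/* Assumed to mirror include/uapi/linux/bpf.h; verify against your headers. */
enum {
    BPF_FIB_LKUP_RET_SUCCESS,
    BPF_FIB_LKUP_RET_BLACKHOLE,
    BPF_FIB_LKUP_RET_UNREACHABLE,
    BPF_FIB_LKUP_RET_PROHIBIT,
    BPF_FIB_LKUP_RET_NOT_FWDED,
    BPF_FIB_LKUP_RET_FWD_DISABLED,
    BPF_FIB_LKUP_RET_UNSUPP_LWT,
    BPF_FIB_LKUP_RET_NO_NEIGH,
    BPF_FIB_LKUP_RET_FRAG_NEEDED,
};

/* Hypothetical verdicts used only by this sketch. */
enum fwd_verdict { FWD_REDIRECT, FWD_TO_STACK, FWD_DROP, FWD_ERROR };

/* Map the return value of bpf_fib_lookup() to a forwarding decision. */
enum fwd_verdict classify_fib_result(long rc)
{
    if (rc < 0)
        return FWD_ERROR;       /* invalid input argument */

    switch (rc) {
    case BPF_FIB_LKUP_RET_SUCCESS:
        return FWD_REDIRECT;    /* nexthop resolved, params filled in */
    case BPF_FIB_LKUP_RET_BLACKHOLE:
    case BPF_FIB_LKUP_RET_UNREACHABLE:
    case BPF_FIB_LKUP_RET_PROHIBIT:
        return FWD_DROP;        /* the route itself says to discard */
    default:
        return FWD_TO_STACK;    /* let the full stack handle the packet */
    }
}
```

In an XDP program the verdicts would typically translate to a redirect, XDP_PASS, XDP_DROP and XDP_ABORTED respectively, although that mapping is likewise an assumption of this sketch.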
1758
1759 long bpf_sock_hash_update(struct bpf_sock_ops *skops, struct bpf_map
1760 *map, void *key, u64 flags)
1761
1762 Description
1763 Add an entry to, or update an entry in, a sockhash map
1764 referencing sockets. The skops is used as the new value for
1765 the entry associated with key. flags is one of:
1766
1767 BPF_NOEXIST
1768 The entry for key must not exist in the map.
1769
1770 BPF_EXIST
1771 The entry for key must already exist in the map.
1772
1773 BPF_ANY
1774 No condition on the existence of the entry for
1775 key.
1776
1777 If the map has eBPF programs (parser and verdict), those
1778 will be inherited by the socket being added. If the
1779 socket is already attached to eBPF programs, this results
1780 in an error.
1781
1782 Return 0 on success, or a negative error in case of failure.
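
The existence conditions attached to BPF_NOEXIST, BPF_EXIST and BPF_ANY can be modelled in user space as below. The numeric flag values and the -EEXIST/-ENOENT/-EINVAL error codes are assumptions based on how hash-type maps conventionally behave; the text above only promises "a negative error". check_update_flags() is a hypothetical name.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed to mirror include/uapi/linux/bpf.h; verify against your headers. */
#define BPF_ANY     0
#define BPF_NOEXIST 1
#define BPF_EXIST   2

/*
 * Decide whether an update may proceed, given whether an entry for the
 * key is already present. Returns 0 if allowed, a negative errno if not.
 */
int check_update_flags(int entry_exists, uint64_t flags)
{
    if (flags > BPF_EXIST)
        return -EINVAL;         /* unknown flag value */
    if (flags == BPF_NOEXIST && entry_exists)
        return -EEXIST;         /* caller required a fresh key */
    if (flags == BPF_EXIST && !entry_exists)
        return -ENOENT;         /* caller required an existing key */
    return 0;                   /* BPF_ANY, or the condition holds */
}
```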
1783
1784 long bpf_msg_redirect_hash(struct sk_msg_buff *msg, struct bpf_map
1785 *map, void *key, u64 flags)
1786
1787 Description
1788 This helper is used in programs implementing policies at
1789 the socket level. If the message msg is allowed to pass
1790 (i.e. if the verdict eBPF program returns SK_PASS), redi‐
1791 rect it to the socket referenced by map (of type
1792 BPF_MAP_TYPE_SOCKHASH) using hash key. Both ingress and
1793 egress interfaces can be used for redirection. The
1794 BPF_F_INGRESS value in flags is used to make the distinc‐
1795 tion (ingress path is selected if the flag is present,
1796 egress path otherwise). This is the only flag supported
1797 for now.
1798
1799 Return SK_PASS on success, or SK_DROP on error.
1800
1801 long bpf_sk_redirect_hash(struct sk_buff *skb, struct bpf_map *map,
1802 void *key, u64 flags)
1803
1804 Description
1805 This helper is used in programs implementing policies at
1806 the skb socket level. If the sk_buff skb is allowed to
1807 pass (i.e. if the verdict eBPF program returns SK_PASS),
1808 redirect it to the socket referenced by map (of type
1809 BPF_MAP_TYPE_SOCKHASH) using hash key. Both ingress and
1810 egress interfaces can be used for redirection. The
1811 BPF_F_INGRESS value in flags is used to make the distinc‐
1812 tion (ingress path is selected if the flag is present,
1813 egress otherwise). This is the only flag supported for
1814 now.
1815
1816 Return SK_PASS on success, or SK_DROP on error.
1817
1818 long bpf_lwt_push_encap(struct sk_buff *skb, u32 type, void *hdr, u32
1819 len)
1820
1821 Description
1822 Encapsulate the packet associated to skb within a Layer 3
1823 protocol header. This header is provided in the buffer at
1824 address hdr, with len its size in bytes. type indicates
1825 the protocol of the header and can be one of:
1826
1827 BPF_LWT_ENCAP_SEG6
1828 IPv6 encapsulation with Segment Routing Header
1829 (struct ipv6_sr_hdr). hdr only contains the SRH,
1830 the IPv6 header is computed by the kernel.
1831
1832 BPF_LWT_ENCAP_SEG6_INLINE
1833 Only works if skb contains an IPv6 packet. Insert
1834 a Segment Routing Header (struct ipv6_sr_hdr) in‐
1835 side the IPv6 header.
1836
1837 BPF_LWT_ENCAP_IP
1838 IP encapsulation (GRE/GUE/IPIP/etc). The outer
1839 header must be IPv4 or IPv6, followed by zero or
1840 more additional headers, up to LWT_BPF_MAX_HEAD‐
1841 ROOM total bytes in all prepended headers. Please
1842 note that if skb_is_gso(skb) is true, no more than
1843 two headers can be prepended, and the inner
1844 header, if present, should be either GRE or
1845 UDP/GUE.
1846
1847 BPF_LWT_ENCAP_SEG6* types can be called by BPF programs
1848 of type BPF_PROG_TYPE_LWT_IN; BPF_LWT_ENCAP_IP type can
1849 be called by bpf programs of types BPF_PROG_TYPE_LWT_IN
1850 and BPF_PROG_TYPE_LWT_XMIT.
1851
1852 A call to this helper may change the underlying packet
1853 buffer. Therefore, at load time, all checks
1854 on pointers previously done by the verifier are invali‐
1855 dated and must be performed again, if the helper is used
1856 in combination with direct packet access.
1857
1858 Return 0 on success, or a negative error in case of failure.
1859
1860 long bpf_lwt_seg6_store_bytes(struct sk_buff *skb, u32 offset, const
1861 void *from, u32 len)
1862
1863 Description
1864 Store len bytes from address from into the packet associ‐
1865 ated to skb, at offset. Only the flags, tag and TLVs in‐
1866 side the outermost IPv6 Segment Routing Header can be
1867 modified through this helper.
1868
1869 A call to this helper may change the underlying packet
1870 buffer. Therefore, at load time, all checks
1871 on pointers previously done by the verifier are invali‐
1872 dated and must be performed again, if the helper is used
1873 in combination with direct packet access.
1874
1875 Return 0 on success, or a negative error in case of failure.
1876
1877 long bpf_lwt_seg6_adjust_srh(struct sk_buff *skb, u32 offset, s32
1878 delta)
1879
1880 Description
1881 Adjust the size allocated to TLVs in the outermost IPv6
1882 Segment Routing Header contained in the packet associated
1883 to skb, at position offset by delta bytes. Only offsets
1884 after the segments are accepted. delta can be either
1885 positive (growing) or negative (shrinking).
1886
1887 A call to this helper may change the underlying packet
1888 buffer. Therefore, at load time, all checks
1889 on pointers previously done by the verifier are invali‐
1890 dated and must be performed again, if the helper is used
1891 in combination with direct packet access.
1892
1893 Return 0 on success, or a negative error in case of failure.
1894
1895 long bpf_lwt_seg6_action(struct sk_buff *skb, u32 action, void *param,
1896 u32 param_len)
1897
1898 Description
1899 Apply an IPv6 Segment Routing action of type action to
1900 the packet associated to skb. Each action takes a parame‐
1901 ter contained at address param, and of length param_len
1902 bytes. action can be one of:
1903
1904 SEG6_LOCAL_ACTION_END_X
1905 End.X action: Endpoint with Layer-3 cross-connect.
1906 Type of param: struct in6_addr.
1907
1908 SEG6_LOCAL_ACTION_END_T
1909 End.T action: Endpoint with specific IPv6 table
1910 lookup. Type of param: int.
1911
1912 SEG6_LOCAL_ACTION_END_B6
1913 End.B6 action: Endpoint bound to an SRv6 policy.
1914 Type of param: struct ipv6_sr_hdr.
1915
1916 SEG6_LOCAL_ACTION_END_B6_ENCAP
1917 End.B6.Encap action: Endpoint bound to an SRv6 en‐
1918 capsulation policy. Type of param: struct
1919 ipv6_sr_hdr.
1920
1921 A call to this helper may change the underlying packet
1922 buffer. Therefore, at load time, all checks
1923 on pointers previously done by the verifier are invali‐
1924 dated and must be performed again, if the helper is used
1925 in combination with direct packet access.
1926
1927 Return 0 on success, or a negative error in case of failure.
1928
1929 long bpf_rc_repeat(void *ctx)
1930
1931 Description
1932 This helper is used in programs implementing IR decoding,
1933 to report a successfully decoded repeat key message. This
1934 delays the generation of a key up event for the previously
1935 generated key down event.
1936
1937 Some IR protocols like NEC have a special IR message for
1938 repeating the last button, for when a button is held down.
1939
1940 The ctx should point to the lirc sample as passed into
1941 the program.
1942
1943 This helper is only available if the kernel was compiled
1944 with the CONFIG_BPF_LIRC_MODE2 configuration option set
1945 to "y".
1946
1947 Return 0
1948
1949 long bpf_rc_keydown(void *ctx, u32 protocol, u64 scancode, u32 toggle)
1950
1951 Description
1952 This helper is used in programs implementing IR decoding,
1953 to report a successfully decoded key press with scancode,
1954 toggle value in the given protocol. The scancode will be
1955 translated to a keycode using the rc keymap, and reported
1956 as an input key down event. After a period a key up event
1957 is generated. This period can be extended by calling ei‐
1958 ther bpf_rc_keydown() again with the same values, or
1959 calling bpf_rc_repeat().
1960
1961 Some protocols include a toggle bit, in case the button
1962 was released and pressed again between consecutive scan‐
1963 codes.
1964
1965 The ctx should point to the lirc sample as passed into
1966 the program.
1967
1968 The protocol is the decoded protocol number (see enum
1969 rc_proto for some predefined values).
1970
1971 This helper is only available if the kernel was compiled
1972 with the CONFIG_BPF_LIRC_MODE2 configuration option set
1973 to "y".
1974
1975 Return 0
1976
1977 u64 bpf_skb_cgroup_id(struct sk_buff *skb)
1978
1979 Description
1980 Return the cgroup v2 id of the socket associated with the
1981 skb. This is roughly similar to the bpf_get_cgroup_classid()
1982 helper for cgroup v1 in that it provides a tag or identifier
1983 that can be matched on or used for map lookups,
1984 e.g. to implement policy. The cgroup v2 id of a given
1985 path in the hierarchy is exposed in user space through
1986 the f_handle API in order to get to the same 64-bit id.
1987
1988 This helper can be used on TC egress path, but not on
1989 ingress, and is available only if the kernel was compiled
1990 with the CONFIG_SOCK_CGROUP_DATA configuration option.
1991
1992 Return The id is returned or 0 in case the id could not be re‐
1993 trieved.
1994
1995 u64 bpf_get_current_cgroup_id(void)
1996
1997 Description
1998 Get the current cgroup id based on the cgroup within
1999 which the current task is running.
2000
2001 Return A 64-bit integer containing the current cgroup id based
2002 on the cgroup within which the current task is running.
2003
2004 void *bpf_get_local_storage(void *map, u64 flags)
2005
2006 Description
2007 Get the pointer to the local storage area. The type and
2008 the size of the local storage are defined by the map
2009 argument. The meaning of flags is specific to each map
2010 type, and flags has to be 0 for cgroup local storage.
2011
2012 Depending on the BPF program type, a local storage area
2013 can be shared between multiple instances of the BPF pro‐
2014 gram, running simultaneously.
2015
2016 Users must take care of synchronization themselves, for
2017 example by using the BPF_ATOMIC instructions to alter
2018 the shared data.
2019
2020 Return A pointer to the local storage area.
2021
2022 long bpf_sk_select_reuseport(struct sk_reuseport_md *reuse, struct
2023 bpf_map *map, void *key, u64 flags)
2024
2025 Description
2026 Select a SO_REUSEPORT socket from a
2027 BPF_MAP_TYPE_REUSEPORT_SOCKARRAY map. It checks that the
2028 selected socket matches the incoming request in the socket buffer.
2029
2030 Return 0 on success, or a negative error in case of failure.
2031
2032 u64 bpf_skb_ancestor_cgroup_id(struct sk_buff *skb, int ancestor_level)
2033
2034 Description
2035 Return the id of the cgroup v2 that is the ancestor, at
2036 ancestor_level, of the cgroup associated with the skb. The
2037 root cgroup is at ancestor_level zero and each step down the
2038 hierarchy increments the level. If ancestor_level equals the
2039 level of the cgroup associated with skb, the return value is
2040 the same as that of bpf_skb_cgroup_id().
2041
2042 The helper is useful to implement policies based on
2043 cgroups that are higher in the hierarchy than the immediate
2044 cgroup associated with skb.
2045
2046 The format of the returned id and the helper limitations are
2047 the same as in bpf_skb_cgroup_id().
2048
2049 Return The id is returned or 0 in case the id could not be re‐
2050 trieved.
2051
2052 struct bpf_sock *bpf_sk_lookup_tcp(void *ctx, struct bpf_sock_tuple
2053 *tuple, u32 tuple_size, u64 netns, u64 flags)
2054
2055 Description
2056 Look for a TCP socket matching tuple, optionally in a child
2057 network namespace netns. The return value must be
2058 checked, and if non-NULL, released via bpf_sk_release().
2059
2060 The ctx should point to the context of the program, such
2061 as the skb or socket (depending on the hook in use). This
2062 is used to determine the base network namespace for the
2063 lookup.
2064
2065 tuple_size must be one of:
2066
2067 sizeof(tuple->ipv4)
2068 Look for an IPv4 socket.
2069
2070 sizeof(tuple->ipv6)
2071 Look for an IPv6 socket.
2072
2073 If the netns is a negative signed 32-bit integer, then
2074 the socket lookup table in the netns associated with the
2075 ctx will be used. For the TC hooks, this is the netns of
2076 the device in the skb. For socket hooks, this is the
2077 netns of the socket. If netns is any other signed 32-bit
2078 value greater than or equal to zero then it specifies the
2079 ID of the netns relative to the netns associated with the
2080 ctx. netns values beyond the range of 32-bit integers are
2081 reserved for future use.
2082
2083 All values for flags are reserved for future usage, and
2084 must be left at zero.
2085
2086 This helper is available only if the kernel was compiled
2087 with CONFIG_NET configuration option.
2088
2089 Return Pointer to struct bpf_sock, or NULL in case of failure.
2090 For sockets with reuseport option, the struct bpf_sock
2091 result is from reuse->socks[] using the hash of the tu‐
2092 ple.
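
The interpretation of the netns argument can be summarized with the following user space sketch, which follows a literal reading of the description above. The netns_kind type and classify_netns() are hypothetical names; the kernel's own check may accept additional bit patterns, so this is a conservative model rather than the kernel logic.

```c
#include <stdint.h>

/* Hypothetical classification used only by this sketch. */
enum netns_kind { NETNS_CURRENT, NETNS_BY_ID, NETNS_RESERVED };

/*
 * Classify the u64 netns argument of bpf_sk_lookup_tcp()/_udp():
 * negative 32-bit values select the netns associated with ctx,
 * values from 0 to INT32_MAX are netns IDs relative to it, and
 * anything outside the signed 32-bit range is reserved.
 */
enum netns_kind classify_netns(uint64_t netns)
{
    int64_t v = (int64_t)netns;

    if (v < INT32_MIN || v > INT32_MAX)
        return NETNS_RESERVED;
    return v < 0 ? NETNS_CURRENT : NETNS_BY_ID;
}
```

Passing -1 (which, as far as the author of this sketch is aware, the uapi header names BPF_F_CURRENT_NETNS) therefore falls into the first case.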
2093
2094 struct bpf_sock *bpf_sk_lookup_udp(void *ctx, struct bpf_sock_tuple
2095 *tuple, u32 tuple_size, u64 netns, u64 flags)
2096
2097 Description
2098 Look for a UDP socket matching tuple, optionally in a child
2099 network namespace netns. The return value must be
2100 checked, and if non-NULL, released via bpf_sk_release().
2101
2102 The ctx should point to the context of the program, such
2103 as the skb or socket (depending on the hook in use). This
2104 is used to determine the base network namespace for the
2105 lookup.
2106
2107 tuple_size must be one of:
2108
2109 sizeof(tuple->ipv4)
2110 Look for an IPv4 socket.
2111
2112 sizeof(tuple->ipv6)
2113 Look for an IPv6 socket.
2114
2115 If the netns is a negative signed 32-bit integer, then
2116 the socket lookup table in the netns associated with the
2117 ctx will be used. For the TC hooks, this is the netns of
2118 the device in the skb. For socket hooks, this is the
2119 netns of the socket. If netns is any other signed 32-bit
2120 value greater than or equal to zero then it specifies the
2121 ID of the netns relative to the netns associated with the
2122 ctx. netns values beyond the range of 32-bit integers are
2123 reserved for future use.
2124
2125 All values for flags are reserved for future usage, and
2126 must be left at zero.
2127
2128 This helper is available only if the kernel was compiled
2129 with CONFIG_NET configuration option.
2130
2131 Return Pointer to struct bpf_sock, or NULL in case of failure.
2132 For sockets with reuseport option, the struct bpf_sock
2133 result is from reuse->socks[] using the hash of the tu‐
2134 ple.
2135
2136 long bpf_sk_release(void *sock)
2137
2138 Description
2139 Release the reference held by sock. sock must be a
2140 non-NULL pointer that was returned from
2141 bpf_sk_lookup_xxx().
2142
2143 Return 0 on success, or a negative error in case of failure.
2144
2145 long bpf_map_push_elem(struct bpf_map *map, const void *value, u64
2146 flags)
2147
2148 Description
2149 Push an element value in map. flags is one of:
2150
2151 BPF_EXIST
2152 If the queue/stack is full, the oldest element is
2153 removed to make room for the new one.
2154
2155 Return 0 on success, or a negative error in case of failure.
2156
2157 long bpf_map_pop_elem(struct bpf_map *map, void *value)
2158
2159 Description
2160 Pop an element from map.
2161
2162 Return 0 on success, or a negative error in case of failure.
2163
2164 long bpf_map_peek_elem(struct bpf_map *map, void *value)
2165
2166 Description
2167 Get an element from map without removing it.
2168
2169 Return 0 on success, or a negative error in case of failure.
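
The push, pop and peek semantics can be illustrated with a small user space model of a queue map. Everything here is a sketch: the model_queue type, its fixed capacity and the function names are hypothetical, and the specific error codes (-E2BIG when full, -ENOENT when empty, -EINVAL for unknown flags) are assumptions about the kernel implementation rather than guarantees made by the text above.

```c
#include <errno.h>
#include <stdint.h>

#define BPF_EXIST 2   /* assumed to mirror include/uapi/linux/bpf.h */
#define QUEUE_CAP 4   /* hypothetical capacity, i.e. max_entries */

struct model_queue {
    uint32_t items[QUEUE_CAP];
    unsigned int head;    /* index of the oldest element */
    unsigned int count;   /* number of stored elements */
};

/* Append value; with BPF_EXIST a full queue drops its oldest element. */
long model_push(struct model_queue *q, uint32_t value, uint64_t flags)
{
    if (flags != 0 && flags != BPF_EXIST)
        return -EINVAL;
    if (q->count == QUEUE_CAP) {
        if (!(flags & BPF_EXIST))
            return -E2BIG;
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
    }
    q->items[(q->head + q->count) % QUEUE_CAP] = value;
    q->count++;
    return 0;
}

/* Copy the oldest element to *value without removing it. */
long model_peek(const struct model_queue *q, uint32_t *value)
{
    if (q->count == 0)
        return -ENOENT;
    *value = q->items[q->head];
    return 0;
}

/* Copy the oldest element to *value and remove it. */
long model_pop(struct model_queue *q, uint32_t *value)
{
    long err = model_peek(q, value);

    if (err)
        return err;
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return 0;
}
```

The model is first-in, first-out; a stack map would return the newest element from pop and peek instead.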
2170
2171 long bpf_msg_push_data(struct sk_msg_buff *msg, u32 start, u32 len, u64
2172 flags)
2173
2174 Description
2175 For socket policies, insert len bytes into msg at offset
2176 start.
2177
2178 If a program of type BPF_PROG_TYPE_SK_MSG is run on a msg
2179 it may want to insert metadata or options into the msg.
2180 This can later be read and used by any of the lower layer
2181 BPF hooks.
2182
2183 This helper may fail under memory pressure (if an
2184 allocation fails). In that case the BPF program gets an
2185 appropriate error, which it needs to handle.
2186
2187 Return 0 on success, or a negative error in case of failure.
2188
2189 long bpf_msg_pop_data(struct sk_msg_buff *msg, u32 start, u32 len, u64
2190 flags)
2191
2192 Description
2193 Remove len bytes from msg, starting at byte start.
2194 This may result in ENOMEM errors in certain situations
2195 if an allocation and copy are required due to a full ring
2196 buffer. However, the helper will try to avoid doing the
2197 allocation if possible. Other errors can occur if input
2198 parameters are invalid, either because the start byte is not
2199 a valid part of the msg payload and/or because the pop value
2200 is too large.
2201
2202 Return 0 on success, or a negative error in case of failure.
2203
2204 long bpf_rc_pointer_rel(void *ctx, s32 rel_x, s32 rel_y)
2205
2206 Description
2207 This helper is used in programs implementing IR decoding,
2208 to report a successfully decoded pointer movement.
2209
2210 The ctx should point to the lirc sample as passed into
2211 the program.
2212
2213 This helper is only available if the kernel was compiled
2214 with the CONFIG_BPF_LIRC_MODE2 configuration option set
2215 to "y".
2216
2217 Return 0
2218
2219 long bpf_spin_lock(struct bpf_spin_lock *lock)
2220
2221 Description
2222 Acquire a spinlock represented by the pointer lock, which
2223 is stored as part of a value of a map. Taking the lock
2224 makes it safe to update the rest of the fields in that
2225 value. The spinlock can (and must) later be released with
2226 a call to bpf_spin_unlock(lock).
2227
2228 Spinlocks in BPF programs come with a number of restric‐
2229 tions and constraints:
2230
2231 • bpf_spin_lock objects are only allowed inside maps of
2232 types BPF_MAP_TYPE_HASH and BPF_MAP_TYPE_ARRAY (this
2233 list could be extended in the future).
2234
2235 • BTF description of the map is mandatory.
2236
2237 • The BPF program can take only ONE lock at a time, since
2238 taking two or more could cause deadlocks.
2239
2240 • Only one struct bpf_spin_lock is allowed per map ele‐
2241 ment.
2242
2243 • When the lock is taken, calls (either BPF to BPF or
2244 helpers) are not allowed.
2245
2246 • The BPF_LD_ABS and BPF_LD_IND instructions are not al‐
2247 lowed inside a spinlock-ed region.
2248
2249 • The BPF program MUST call bpf_spin_unlock() to release
2250 the lock, on all execution paths, before it returns.
2251
2252 • The BPF program can access struct bpf_spin_lock only
2253 via the bpf_spin_lock() and bpf_spin_unlock() helpers.
2254 Loading or storing data into the struct bpf_spin_lock
2255 lock; field of a map is not allowed.
2256
2257 • To use the bpf_spin_lock() helper, the BTF description
2258 of the map value must be a struct and have a struct
2259 bpf_spin_lock anyname; field at the top level. A lock
2260 nested inside another struct is not allowed.
2261
2262 • The struct bpf_spin_lock lock field in a map value must
2263 be aligned on a multiple of 4 bytes in that value.
2264
2265 • Syscall with command BPF_MAP_LOOKUP_ELEM does not copy
2266 the bpf_spin_lock field to user space.
2267
2268 • A syscall with command BPF_MAP_UPDATE_ELEM, or an update
2269 from a BPF program, does not update the bpf_spin_lock
2270 field.
2271
2272 • bpf_spin_lock cannot be on the stack or inside a
2273 networking packet (it can only be inside a map value).
2274
2275 • bpf_spin_lock is available to root only.
2276
2277 • Tracing programs and socket filter programs cannot use
2278 bpf_spin_lock() due to insufficient preemption checks
2279 (but this may change in the future).
2280
2281 • bpf_spin_lock is not allowed in inner maps of
2282 map-in-map.
2283
2284 Return 0
2285
2286 long bpf_spin_unlock(struct bpf_spin_lock *lock)
2287
2288 Description
2289 Release the lock previously locked by a call to
2290 bpf_spin_lock(lock).
2291
2292 Return 0
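
A map value that satisfies the layout constraints listed for bpf_spin_lock() might look like the sketch below. The struct bpf_spin_lock definition is a user space stand-in assumed to match the uapi one (a single 32-bit word); counter_value and account_locked_section() are hypothetical names, and the function only shows what the body of a locked region is allowed to contain.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the uapi definition; verify against your headers. */
struct bpf_spin_lock {
    uint32_t val;
};

/* Hypothetical map value: exactly one lock, at the top level. */
struct counter_value {
    struct bpf_spin_lock lock;
    uint64_t packets;
    uint64_t bytes;
};

/* The lock field must sit on a multiple of 4 bytes within the value. */
_Static_assert(offsetof(struct counter_value, lock) % 4 == 0,
               "bpf_spin_lock must be 4-byte aligned in the map value");

/*
 * Body of the critical section. In a BPF program it would be wrapped as:
 *     bpf_spin_lock(&val->lock);
 *     ...field updates...
 *     bpf_spin_unlock(&val->lock);
 * Only plain loads and stores appear here, since calls are not allowed
 * while the lock is held.
 */
void account_locked_section(struct counter_value *val, uint64_t len)
{
    val->packets++;
    val->bytes += len;
}
```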
2293
2294 struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)
2295
2296 Description
2297 This helper gets a struct bpf_sock pointer such that all
2298 the fields in this bpf_sock can be accessed.
2299
2300 Return A struct bpf_sock pointer on success, or NULL in case of
2301 failure.
2302
2303 struct bpf_tcp_sock *bpf_tcp_sock(struct bpf_sock *sk)
2304
2305 Description
2306 This helper gets a struct bpf_tcp_sock pointer from a
2307 struct bpf_sock pointer.
2308
2309 Return A struct bpf_tcp_sock pointer on success, or NULL in case
2310 of failure.
2311
2312 long bpf_skb_ecn_set_ce(struct sk_buff *skb)
2313
2314 Description
2315 Set ECN (Explicit Congestion Notification) field of IP
2316 header to CE (Congestion Encountered) if current value is
2317 ECT (ECN Capable Transport). Otherwise, do nothing. Works
2318 with IPv6 and IPv4.
2319
2320 Return 1 if the CE flag is set (either by the current helper
2321 call or because it was already present), 0 if it is not
2322 set.
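
The return value convention of bpf_skb_ecn_set_ce() can be modelled on a bare IPv4 TOS (or IPv6 traffic class) byte, whose two low bits carry the ECN field. This is a user space illustration only: model_ecn_set_ce() is a hypothetical name, and the real helper works on the packet and must also keep the IPv4 header checksum consistent, which is omitted here.

```c
#include <stdint.h>

#define ECN_MASK    0x3  /* low two bits of TOS / traffic class */
#define ECN_NOT_ECT 0x0  /* sender is not ECN capable */
#define ECN_CE      0x3  /* Congestion Encountered */

/*
 * Mark CE if the byte carries ECT(0), ECT(1) or CE already.
 * Returns 1 if CE is set afterwards, 0 if the field was Not-ECT
 * and was therefore left untouched.
 */
int model_ecn_set_ce(uint8_t *tos)
{
    if ((*tos & ECN_MASK) == ECN_NOT_ECT)
        return 0;
    *tos |= ECN_CE;
    return 1;
}
```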
2323
2324 struct bpf_sock *bpf_get_listener_sock(struct bpf_sock *sk)
2325
2326 Description
2327 Return a struct bpf_sock pointer in TCP_LISTEN state.
2328 bpf_sk_release() is unnecessary and not allowed.
2329
2330 Return A struct bpf_sock pointer on success, or NULL in case of
2331 failure.
2332
2333 struct bpf_sock *bpf_skc_lookup_tcp(void *ctx, struct bpf_sock_tuple
2334 *tuple, u32 tuple_size, u64 netns, u64 flags)
2335
2336 Description
2337 Look for a TCP socket matching tuple, optionally in a child
2338 network namespace netns. The return value must be
2339 checked, and if non-NULL, released via bpf_sk_release().
2340
2341 This function is identical to bpf_sk_lookup_tcp(), except
2342 that it also returns timewait or request sockets. Use
2343 bpf_sk_fullsock() or bpf_tcp_sock() to access the full
2344 structure.
2345
2346 This helper is available only if the kernel was compiled
2347 with CONFIG_NET configuration option.
2348
2349 Return Pointer to struct bpf_sock, or NULL in case of failure.
2350 For sockets with reuseport option, the struct bpf_sock
2351 result is from reuse->socks[] using the hash of the tu‐
2352 ple.
2353
2354 long bpf_tcp_check_syncookie(void *sk, void *iph, u32 iph_len, struct
2355 tcphdr *th, u32 th_len)
2356
2357 Description
2358 Check whether iph and th contain a valid SYN cookie ACK
2359 for the listening socket in sk.
2360
2361 iph points to the start of the IPv4 or IPv6 header, while
2362 iph_len contains sizeof(struct iphdr) or sizeof(struct
2363 ipv6hdr).
2364
2365 th points to the start of the TCP header, while th_len
2366 contains the length of the TCP header (at least
2367 sizeof(struct tcphdr)).
2368
2369 Return 0 if iph and th are a valid SYN cookie ACK, or a negative
2370 error otherwise.
2371
2372 long bpf_sysctl_get_name(struct bpf_sysctl *ctx, char *buf, size_t
2373 buf_len, u64 flags)
2374
2375 Description
2376 Get the name of the sysctl in /proc/sys/ and copy it into
2377 the program-provided buffer buf of size buf_len.
2378
2379 The buffer is always NUL terminated, unless it's
2380 zero-sized.
2381
2382 If flags is zero, the full name (e.g. "net/ipv4/tcp_mem") is
2383 copied. Use the BPF_F_SYSCTL_BASE_NAME flag to copy the base
2384 name only (e.g. "tcp_mem").
2385
2386 Return Number of characters copied (not including the trailing
2387 NUL).
2388
2389 -E2BIG if the buffer wasn't big enough (buf will contain
2390 the truncated name in this case).
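
The copy and truncation rules of bpf_sysctl_get_name() can be modelled in user space as follows. model_sysctl_get_name() is a hypothetical function taking the full name as an extra argument; the value of BPF_F_SYSCTL_BASE_NAME is assumed to match the uapi header, and returning -E2BIG for a zero-sized buffer is an assumption of this sketch.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Assumed to mirror include/uapi/linux/bpf.h; verify against your headers. */
#define BPF_F_SYSCTL_BASE_NAME (1ULL << 0)

/*
 * Copy the full name, or only its last path component, into buf.
 * Returns the number of characters copied (without the NUL), or
 * -E2BIG if the name had to be truncated.
 */
long model_sysctl_get_name(const char *full, char *buf, size_t buf_len,
                           uint64_t flags)
{
    const char *name = full;
    size_t len, n;

    if (flags & BPF_F_SYSCTL_BASE_NAME) {
        const char *slash = strrchr(full, '/');

        if (slash)
            name = slash + 1;
    }
    if (buf_len == 0)
        return -E2BIG;          /* nothing can be stored, not even NUL */

    len = strlen(name);
    n = len < buf_len ? len : buf_len - 1;
    memcpy(buf, name, n);
    buf[n] = '\0';              /* always NUL terminated */
    return len < buf_len ? (long)len : -E2BIG;
}
```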
2391
2392 long bpf_sysctl_get_current_value(struct bpf_sysctl *ctx, char *buf,
2393 size_t buf_len)
2394
2395 Description
2396 Get the current value of the sysctl as it is presented in
2397 /proc/sys (incl. newline, etc.), and copy it as a string
2398 into the program-provided buffer buf of size buf_len.
2399
2400 The whole value is copied, regardless of the file position
2401 at which user space issued e.g. sys_read.
2402
2403 The buffer is always NUL terminated, unless it's
2404 zero-sized.
2405
2406 Return Number of characters copied (not including the trailing
2407 NUL).
2408
2409 -E2BIG if the buffer wasn't big enough (buf will contain
2410 the truncated value in this case).
2411
2412 -EINVAL if current value was unavailable, e.g. because
2413 sysctl is uninitialized and read returns -EIO for it.
2414
2415 long bpf_sysctl_get_new_value(struct bpf_sysctl *ctx, char *buf, size_t
2416 buf_len)
2417
2418 Description
2419 Get the new value being written by user space to the sysctl
2420 (before the actual write happens) and copy it as a string
2421 into the program-provided buffer buf of size buf_len.
2422
2423 User space may write the new value at file position > 0.
2424
2425 The buffer is always NUL terminated, unless it's
2426 zero-sized.
2427
2428 Return Number of characters copied (not including the trailing
2429 NUL).
2430
2431 -E2BIG if the buffer wasn't big enough (buf will contain
2432 the truncated value in this case).
2433
2434 -EINVAL if sysctl is being read.
2435
2436 long bpf_sysctl_set_new_value(struct bpf_sysctl *ctx, const char *buf,
2437 size_t buf_len)
2438
2439 Description
2440 Override the new value being written by user space to the
2441 sysctl with the value provided by the program in buffer buf
2442 of size buf_len.
2443
2444 buf should contain a string in the same form as provided by
2445 user space on sysctl write.
2446
2447 User space may write the new value at file position > 0. To
2448 override the whole sysctl value, the file position should be
2449 set to zero.
2450
2451 Return 0 on success.
2452
2453 -E2BIG if the buf_len is too big.
2454
2455 -EINVAL if sysctl is being read.
2456
2457 long bpf_strtol(const char *buf, size_t buf_len, u64 flags, long *res)
2458
2459 Description
2460 Convert the initial part of the string from buffer buf of
2461 size buf_len to a long integer according to the given
2462 base and save the result in res.
2463
2464 The string may begin with an arbitrary amount of white
2465 space (as determined by isspace(3)) followed by a single
2466 optional '-' sign.
2467
2468 Five least significant bits of flags encode base, other
2469 bits are currently unused.
2470
2471 Base must be either 8, 10, 16 or 0 to detect it automati‐
2472 cally similar to user space strtol(3).
2473
2474 Return Number of characters consumed on success. Must be posi‐
2475 tive but no more than buf_len.
2476
2477 -EINVAL if no valid digits were found or unsupported base
2478 was provided.
2479
2480 -ERANGE if resulting value was out of range.
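As a rough user-space illustration of the return convention described above, the following sketch wraps strtol(3) so that it reports the number of characters consumed and takes the base from the five least significant bits of flags. The name model_strtol() and the 64-byte scratch buffer are assumptions made for this example only, and strtol(3) is more permissive than the helper is documented to be (it accepts a leading '+', for instance), so this is an approximation rather than the kernel's behavior.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space model of the documented bpf_strtol() contract. */
static long model_strtol(const char *buf, size_t buf_len, uint64_t flags,
                         long *res)
{
    char tmp[64];                     /* arbitrary scratch size (assumption) */
    char *end;
    unsigned base = (unsigned)(flags & 0x1f);  /* base: low five bits */
    long val;

    assert(buf != NULL && res != NULL);

    if (base != 0 && base != 8 && base != 10 && base != 16)
        return -EINVAL;               /* unsupported base */

    /* The input need not be NUL terminated, so copy and terminate it. */
    if (buf_len >= sizeof(tmp))
        buf_len = sizeof(tmp) - 1;
    memcpy(tmp, buf, buf_len);
    tmp[buf_len] = '\0';

    errno = 0;
    val = strtol(tmp, &end, (int)base);
    if (end == tmp)
        return -EINVAL;               /* no valid digits found */
    if (errno == ERANGE)
        return -ERANGE;               /* value out of range */

    *res = val;
    return (long)(end - tmp);         /* characters consumed */
}
```

With the input "  -42abc" and base 10, the model consumes the two spaces, the sign and two digits, and stops at the first non-digit character.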
2481
2482 long bpf_strtoul(const char *buf, size_t buf_len, u64 flags, unsigned
2483 long *res)
2484
2485 Description
2486 Convert the initial part of the string from buffer buf of
2487 size buf_len to an unsigned long integer according to the
2488 given base and save the result in res.
2489
2490 The string may begin with an arbitrary amount of white
2491 space (as determined by isspace(3)).
2492
2493 Five least significant bits of flags encode base, other
2494 bits are currently unused.
2495
2496 Base must be either 8, 10, 16 or 0 to detect it automati‐
2497 cally similar to user space strtoul(3).
2498
2499 Return Number of characters consumed on success. Must be posi‐
2500 tive but no more than buf_len.
2501
2502 -EINVAL if no valid digits were found or unsupported base
2503 was provided.
2504
2505 -ERANGE if resulting value was out of range.
2506
2507 void *bpf_sk_storage_get(struct bpf_map *map, void *sk, void *value,
2508 u64 flags)
2509
2510 Description
2511 Get a bpf-local-storage from a sk.
2512
2513 Logically, it could be thought of as getting the value from
2514 a map with sk as the key. From this perspective, the
2515 usage is not much different from bpf_map_lookup_elem(map,
2516 &sk) except this helper enforces the key must be a full
2517 socket and the map must be a BPF_MAP_TYPE_SK_STORAGE
2518 also.
2519
2520 Underneath, the value is stored locally at sk instead of
2521 the map. The map is used as the bpf-local-storage
2522 "type". The bpf-local-storage "type" (i.e. the map) is
2523 searched against all bpf-local-storages residing at sk.
2524
2525 sk is a kernel struct sock pointer for LSM program. sk
2526 is a struct bpf_sock pointer for other program types.
2527
2528 An optional flags (BPF_SK_STORAGE_GET_F_CREATE) can be
2529 used such that a new bpf-local-storage will be created if
2530 one does not exist. value can be used together with
2531 BPF_SK_STORAGE_GET_F_CREATE to specify the initial value
2532 of a bpf-local-storage. If value is NULL, the new
2533 bpf-local-storage will be zero initialized.
2534
2535 Return A bpf-local-storage pointer is returned on success.
2536
2537 NULL if not found or there was an error in adding a new
2538 bpf-local-storage.
2539
2540 long bpf_sk_storage_delete(struct bpf_map *map, void *sk)
2541
2542 Description
2543 Delete a bpf-local-storage from a sk.
2544
2545 Return 0 on success.
2546
2547 -ENOENT if the bpf-local-storage cannot be found. -EIN‐
2548 VAL if sk is not a fullsock (e.g. a request_sock).
2549
2550 long bpf_send_signal(u32 sig)
2551
2552 Description
2553 Send signal sig to the process of the current task. The
2554 signal may be delivered to any of this process's threads.
2555
2556 Return 0 on success or successfully queued.
2557
2558 -EBUSY if work queue under nmi is full.
2559
2560 -EINVAL if sig is invalid.
2561
2562 -EPERM if no permission to send the sig.
2563
2564 -EAGAIN if bpf program can try again.
2565
2566 s64 bpf_tcp_gen_syncookie(void *sk, void *iph, u32 iph_len, struct
2567 tcphdr *th, u32 th_len)
2568
2569 Description
2570 Try to issue a SYN cookie for the packet with correspond‐
2571 ing IP/TCP headers, iph and th, on the listening socket
2572 in sk.
2573
2574 iph points to the start of the IPv4 or IPv6 header, while
2575 iph_len contains sizeof(struct iphdr) or sizeof(struct
2576 ipv6hdr).
2577
2578 th points to the start of the TCP header, while th_len
2579 contains the length of the TCP header with options (at
2580 least sizeof(struct tcphdr)).
2581
2582 Return On success, the lower 32 bits hold the generated SYN cookie,
2583 followed by 16 bits which hold the MSS value for that
2584 cookie, and the top 16 bits are unused.
2585
2586 On failure, the returned value is one of the following:
2587
2588 -EINVAL SYN cookie cannot be issued due to error
2589
2590 -ENOENT SYN cookie should not be issued (no SYN flood)
2591
2592 -EOPNOTSUPP kernel configuration does not enable SYN
2593 cookies
2594
2595 -EPROTONOSUPPORT IP packet version is not 4 or 6
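The layout of the return value described above can be unpacked as in the following sketch. The struct and function names are hypothetical and exist only for this illustration; the bit positions follow the description in the Return paragraph.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the two fields packed in the return value. */
struct syncookie_result {
    uint32_t cookie;  /* lower 32 bits */
    uint16_t mss;     /* next 16 bits */
};

/*
 * Split a bpf_tcp_gen_syncookie() style return value.
 * Returns 0 and fills *out on success, or passes the negative
 * error code through unchanged.
 */
static int decode_syncookie(int64_t ret, struct syncookie_result *out)
{
    assert(out != NULL);

    if (ret < 0)
        return (int)ret;

    out->cookie = (uint32_t)(ret & 0xffffffff);
    out->mss = (uint16_t)((ret >> 32) & 0xffff);
    return 0;
}
```

A caller would check for a negative result first and only then read the cookie and MSS fields.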
2596
2597 long bpf_skb_output(void *ctx, struct bpf_map *map, u64 flags, void
2598 *data, u64 size)
2599
2600 Description
2601 Write raw data blob into a special BPF perf event held by
2602 map of type BPF_MAP_TYPE_PERF_EVENT_ARRAY. This perf
2603 event must have the following attributes: PERF_SAMPLE_RAW
2604 as sample_type, PERF_TYPE_SOFTWARE as type, and
2605 PERF_COUNT_SW_BPF_OUTPUT as config.
2606
2607 The flags are used to indicate the index in map for which
2608 the value must be put, masked with BPF_F_INDEX_MASK. Al‐
2609 ternatively, flags can be set to BPF_F_CURRENT_CPU to in‐
2610 dicate that the index of the current CPU core should be
2611 used.
2612
2613 The value to write, of size bytes, is passed through the eBPF
2614 stack and pointed to by data.
2615
2616 ctx is a pointer to in-kernel struct sk_buff.
2617
2618 This helper is similar to bpf_perf_event_output() but re‐
2619 stricted to raw_tracepoint bpf programs.
2620
2621 Return 0 on success, or a negative error in case of failure.
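The two ways of selecting the map index through flags can be sketched as below. The SKETCH_ macros are local stand-ins defined for this example; their values are assumed to mirror BPF_F_INDEX_MASK and BPF_F_CURRENT_CPU from linux/bpf.h and should be checked against the installed headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match BPF_F_INDEX_MASK / BPF_F_CURRENT_CPU in linux/bpf.h. */
#define SKETCH_F_INDEX_MASK  0xffffffffULL
#define SKETCH_F_CURRENT_CPU SKETCH_F_INDEX_MASK

/* Build flags that select an explicit index in the perf event array. */
static uint64_t perf_flags_for_index(uint32_t index)
{
    /* The all-ones index is reserved as the "current CPU" marker. */
    assert(index != (uint32_t)SKETCH_F_CURRENT_CPU);
    return (uint64_t)index & SKETCH_F_INDEX_MASK;
}

/* Build flags that select the slot of the CPU running the program. */
static uint64_t perf_flags_current_cpu(void)
{
    return SKETCH_F_CURRENT_CPU;
}
```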
2622
2623 long bpf_probe_read_user(void *dst, u32 size, const void *unsafe_ptr)
2624
2625 Description
2626 Safely attempt to read size bytes from user space address
2627 unsafe_ptr and store the data in dst.
2628
2629 Return 0 on success, or a negative error in case of failure.
2630
2631 long bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)
2632
2633 Description
2634 Safely attempt to read size bytes from kernel space ad‐
2635 dress unsafe_ptr and store the data in dst.
2636
2637 Return 0 on success, or a negative error in case of failure.
2638
2639 long bpf_probe_read_user_str(void *dst, u32 size, const void *un‐
2640 safe_ptr)
2641
2642 Description
2643 Copy a NUL terminated string from an unsafe user address
2644 unsafe_ptr to dst. The size should include the terminat‐
2645 ing NUL byte. In case the string length is smaller than
2646 size, the target is not padded with further NUL bytes. If
2647 the string length is larger than size, just size-1 bytes
2648 are copied and the last byte is set to NUL.
2649
2650 On success, returns the number of bytes that were writ‐
2651 ten, including the terminal NUL. This makes this helper
2652 useful in tracing programs for reading strings, and more
2653 importantly to get its length at runtime. See the follow‐
2654 ing snippet:
2655
2656 SEC("kprobe/sys_open")
2657 void bpf_sys_open(struct pt_regs *ctx)
2658 {
2659 char buf[PATHLEN]; // PATHLEN is defined to 256
2660 int res;
2661
2662 res = bpf_probe_read_user_str(buf, sizeof(buf),
2663 ctx->di);
2664
2665 // Consume buf, for example push it to
2666 // userspace via bpf_perf_event_output(); we
2667 // can use res (the string length) as event
2668 // size, after checking its boundaries.
2669 }
2670
2671 In comparison, using bpf_probe_read_user() helper here
2672 instead to read the string would require to estimate the
2673 length at compile time, and would often result in copying
2674 more memory than necessary.
2675
2676 Another useful use case is when parsing individual
2677 process arguments or individual environment variables
2678 navigating current->mm->arg_start and cur‐
2679 rent->mm->env_start: using this helper and the return
2680 value, one can quickly iterate at the right offset of the
2681 memory area.
2682
2683 Return On success, the strictly positive length of the output
2684 string, including the trailing NUL character. On error, a
2685 negative value.
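The iteration over NUL-separated strings mentioned above can be modeled in user space as follows. model_read_str() is a hypothetical stand-in that only mimics the documented copy and return-value semantics; it is not the kernel implementation, and count_strings() assumes that every string fits in its scratch buffer.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy at most size-1 bytes of the string at src, always NUL terminate,
 * and return the number of bytes written including the NUL.
 */
static long model_read_str(char *dst, size_t size, const char *src)
{
    size_t i;

    assert(dst != NULL && src != NULL);
    if (size == 0)
        return 0;

    for (i = 0; i + 1 < size && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';
    return (long)(i + 1);
}

/*
 * Walk a memory area holding back-to-back NUL-terminated strings
 * (such as an argument area) and count them, using the returned
 * length to jump to the start of the next string.
 */
static int count_strings(const char *start, const char *end)
{
    char buf[16];   /* assumption: no string is longer than 15 bytes */
    int n = 0;

    while (start < end) {
        long len = model_read_str(buf, sizeof(buf), start);

        if (len <= 0)
            break;
        start += len;   /* skip the string and its NUL */
        n++;
    }
    return n;
}
```

Because the returned length includes the trailing NUL, adding it to the current offset lands exactly on the next string.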
2686
2687 long bpf_probe_read_kernel_str(void *dst, u32 size, const void *un‐
2688 safe_ptr)
2689
2690 Description
2691 Copy a NUL terminated string from an unsafe kernel ad‐
2692 dress unsafe_ptr to dst. Same semantics as with
2693 bpf_probe_read_user_str() apply.
2694
2695 Return On success, the strictly positive length of the string,
2696 including the trailing NUL character. On error, a nega‐
2697 tive value.
2698
2699 long bpf_tcp_send_ack(void *tp, u32 rcv_nxt)
2700
2701 Description
2702 Send out a tcp-ack. tp is the in-kernel struct tcp_sock.
2703 rcv_nxt is the ack_seq to be sent out.
2704
2705 Return 0 on success, or a negative error in case of failure.
2706
2707 long bpf_send_signal_thread(u32 sig)
2708
2709 Description
2710 Send signal sig to the thread corresponding to the cur‐
2711 rent task.
2712
2713 Return 0 on success or successfully queued.
2714
2715 -EBUSY if work queue under nmi is full.
2716
2717 -EINVAL if sig is invalid.
2718
2719 -EPERM if no permission to send the sig.
2720
2721 -EAGAIN if bpf program can try again.
2722
2723 u64 bpf_jiffies64(void)
2724
2725 Description
2726 Obtain the 64-bit jiffies value.
2727
2728 Return The 64-bit jiffies value.
2729
2730 long bpf_read_branch_records(struct bpf_perf_event_data *ctx, void
2731 *buf, u32 size, u64 flags)
2732
2733 Description
2734 For an eBPF program attached to a perf event, retrieve
2735 the branch records (struct perf_branch_entry) associated
2736 with ctx and store them in the buffer pointed to by buf, up to
2737 size bytes.
2738
2739 Return On success, number of bytes written to buf. On error, a
2740 negative value.
2741
2742 The flags can be set to BPF_F_GET_BRANCH_RECORDS_SIZE to
2743 instead return the number of bytes required to store all
2744 the branch entries. If this flag is set, buf may be NULL.
2745
2746 -EINVAL if arguments invalid or size not a multiple of
2747 sizeof(struct perf_branch_entry).
2748
2749 -ENOENT if architecture does not support branch records.
2750
2751 long bpf_get_ns_current_pid_tgid(u64 dev, u64 ino, struct
2752 bpf_pidns_info *nsdata, u32 size)
2753
2754 Description
2755 Retrieve the pid and tgid of the current task as seen from the
2756 current namespace, and store them in nsdata.
2757
2758 Return 0 on success, or one of the following in case of failure:
2759
2760 -EINVAL if dev and ino supplied don't match dev_t and
2761 inode number with nsfs of current task, or if dev conver‐
2762 sion to dev_t lost high bits.
2763
2764 -ENOENT if pidns does not exist for the current task.
2765
2766 long bpf_xdp_output(void *ctx, struct bpf_map *map, u64 flags, void
2767 *data, u64 size)
2768
2769 Description
2770 Write raw data blob into a special BPF perf event held by
2771 map of type BPF_MAP_TYPE_PERF_EVENT_ARRAY. This perf
2772 event must have the following attributes: PERF_SAMPLE_RAW
2773 as sample_type, PERF_TYPE_SOFTWARE as type, and
2774 PERF_COUNT_SW_BPF_OUTPUT as config.
2775
2776 The flags are used to indicate the index in map for which
2777 the value must be put, masked with BPF_F_INDEX_MASK. Al‐
2778 ternatively, flags can be set to BPF_F_CURRENT_CPU to in‐
2779 dicate that the index of the current CPU core should be
2780 used.
2781
2782 The value to write, of size bytes, is passed through the eBPF
2783 stack and pointed to by data.
2784
2785 ctx is a pointer to in-kernel struct xdp_buff.
2786
2787 This helper is similar to bpf_perf_event_output() but re‐
2788 stricted to raw_tracepoint bpf programs.
2789
2790 Return 0 on success, or a negative error in case of failure.
2791
2792 u64 bpf_get_netns_cookie(void *ctx)
2793
2794 Description
2795 Retrieve the cookie (generated by the kernel) of the net‐
2796 work namespace the input ctx is associated with. The net‐
2797 work namespace cookie remains stable for its lifetime and
2798 provides a global identifier that can be assumed unique.
2799 If ctx is NULL, then the helper returns the cookie for
2800 the initial network namespace. The cookie itself is very
2801 similar to that of bpf_get_socket_cookie() helper, but
2802 for network namespaces instead of sockets.
2803
2804 Return An 8-byte long opaque number.
2805
2806 u64 bpf_get_current_ancestor_cgroup_id(int ancestor_level)
2807
2808 Description
2809 Return id of cgroup v2 that is ancestor of the cgroup as‐
2810 sociated with the current task at the ancestor_level. The
2811 root cgroup is at ancestor_level zero and each step down
2812 the hierarchy increments the level. If ancestor_level ==
2813 level of cgroup associated with the current task, then
2814 return value will be the same as that of bpf_get_cur‐
2815 rent_cgroup_id().
2816
2817 The helper is useful to implement policies based on
2818 cgroups that are upper in hierarchy than immediate cgroup
2819 associated with the current task.
2820
2821 The format of returned id and helper limitations are same
2822 as in bpf_get_current_cgroup_id().
2823
2824 Return The id is returned or 0 in case the id could not be re‐
2825 trieved.
2826
2827 long bpf_sk_assign(struct sk_buff *skb, void *sk, u64 flags)
2828
2829 Description
2830 Helper is overloaded depending on BPF program type. This
2831 description applies to BPF_PROG_TYPE_SCHED_CLS and
2832 BPF_PROG_TYPE_SCHED_ACT programs.
2833
2834 Assign the sk to the skb. When combined with appropriate
2835 routing configuration to receive the packet towards the
2836 socket, will cause skb to be delivered to the specified
2837 socket. Subsequent redirection of skb via bpf_redi‐
2838 rect(), bpf_clone_redirect() or other methods outside of
2839 BPF may interfere with successful delivery to the socket.
2840
2841 This operation is only valid from TC ingress path.
2842
2843 The flags argument must be zero.
2844
2845 Return 0 on success, or a negative error in case of failure:
2846
2847 -EINVAL if specified flags are not supported.
2848
2849 -ENOENT if the socket is unavailable for assignment.
2850
2851 -ENETUNREACH if the socket is unreachable (wrong netns).
2852
2853 -EOPNOTSUPP if the operation is not supported, for exam‐
2854 ple a call from outside of TC ingress.
2855
2856 -ESOCKTNOSUPPORT if the socket type is not supported
2857 (reuseport).
2858
2859 long bpf_sk_assign(struct bpf_sk_lookup *ctx, struct bpf_sock *sk, u64
2860 flags)
2861
2862 Description
2863 Helper is overloaded depending on BPF program type. This
2864 description applies to BPF_PROG_TYPE_SK_LOOKUP programs.
2865
2866 Select the sk as a result of a socket lookup.
2867
2868 For the operation to succeed passed socket must be com‐
2869 patible with the packet description provided by the ctx
2870 object.
2871
2872 L4 protocol (IPPROTO_TCP or IPPROTO_UDP) must be an exact
2873 match. While IP family (AF_INET or AF_INET6) must be com‐
2874 patible, that is IPv6 sockets that are not v6-only can be
2875 selected for IPv4 packets.
2876
2877 Only TCP listeners and UDP unconnected sockets can be se‐
2878 lected. sk can also be NULL to reset any previous selec‐
2879 tion.
2880
2881 flags argument can be a combination of the following values:
2882
2883 • BPF_SK_LOOKUP_F_REPLACE to override the previous socket
2884 selection, potentially done by a BPF program that ran
2885 before us.
2886
2887 • BPF_SK_LOOKUP_F_NO_REUSEPORT to skip load-balancing
2888 within reuseport group for the socket being selected.
2889
2890 On success ctx->sk will point to the selected socket.
2891
2892 Return 0 on success, or a negative errno in case of failure.
2893
2894 • -EAFNOSUPPORT if socket family (sk->family) is not com‐
2895 patible with packet family (ctx->family).
2896
2897 • -EEXIST if socket has been already selected, poten‐
2898 tially by another program, and BPF_SK_LOOKUP_F_REPLACE
2899 flag was not specified.
2900
2901 • -EINVAL if unsupported flags were specified.
2902
2903 • -EPROTOTYPE if socket L4 protocol (sk->protocol)
2904 doesn't match packet protocol (ctx->protocol).
2905
2906 • -ESOCKTNOSUPPORT if socket is not in allowed state (TCP
2907 listening or UDP unconnected).
2908
2909 u64 bpf_ktime_get_boot_ns(void)
2910
2911 Description
2912 Return the time elapsed since system boot, in nanosec‐
2913 onds. This includes the time the system was suspended.
2914 See: clock_gettime(CLOCK_BOOTTIME)
2915
2916 Return Current ktime.
2917
2918 long bpf_seq_printf(struct seq_file *m, const char *fmt, u32 fmt_size,
2919 const void *data, u32 data_len)
2920
2921 Description
2922 bpf_seq_printf() uses seq_file seq_printf() to print out
2923 the format string. The m represents the seq_file. The
2924 fmt and fmt_size are for the format string itself. The
2925 data and data_len are format string arguments. The data
2926 are a u64 array and corresponding format string values
2927 are stored in the array. For strings and pointers where
2928 pointees are accessed, only the pointer values are stored
2929 in the data array. The data_len is the size of data in
2930 bytes - must be a multiple of 8.
2931
2932 Formats %s, %p{i,I}{4,6} require reading kernel memory.
2933 Reading kernel memory may fail due to either invalid ad‐
2934 dress or valid address but requiring a major memory
2935 fault. If reading kernel memory fails, the string for %s
2936 will be an empty string, and the ip address for
2937 %p{i,I}{4,6} will be 0. Not returning error to bpf pro‐
2938 gram is consistent with what bpf_trace_printk() does for
2939 now.
2940
2941 Return 0 on success, or a negative error in case of failure:
2942
2943 -EBUSY if per-CPU memory copy buffer is busy, can try
2944 again by returning 1 from bpf program.
2945
2946 -EINVAL if arguments are invalid, or if fmt is in‐
2947 valid/unsupported.
2948
2949 -E2BIG if fmt contains too many format specifiers.
2950
2951 -EOVERFLOW if an overflow happened: The same object will
2952 be tried again.
2953
2954 long bpf_seq_write(struct seq_file *m, const void *data, u32 len)
2955
2956 Description
2957 bpf_seq_write() uses seq_file seq_write() to write the
2958 data. The m represents the seq_file. The data and len
2959 represent the data to write in bytes.
2960
2961 Return 0 on success, or a negative error in case of failure:
2962
2963 -EOVERFLOW if an overflow happened: The same object will
2964 be tried again.
2965
2966 u64 bpf_sk_cgroup_id(void *sk)
2967
2968 Description
2969 Return the cgroup v2 id of the socket sk.
2970
2971 sk must be a non-NULL pointer to a socket, e.g. one re‐
2972 turned from bpf_sk_lookup_xxx(), bpf_sk_fullsock(), etc.
2973 The format of returned id is same as in
2974 bpf_skb_cgroup_id().
2975
2976 This helper is available only if the kernel was compiled
2977 with the CONFIG_SOCK_CGROUP_DATA configuration option.
2978
2979 Return The id is returned or 0 in case the id could not be re‐
2980 trieved.
2981
2982 u64 bpf_sk_ancestor_cgroup_id(void *sk, int ancestor_level)
2983
2984 Description
2985 Return id of cgroup v2 that is ancestor of cgroup associ‐
2986 ated with the sk at the ancestor_level. The root cgroup
2987 is at ancestor_level zero and each step down the hierar‐
2988 chy increments the level. If ancestor_level == level of
2989 cgroup associated with sk, then return value will be same
2990 as that of bpf_sk_cgroup_id().
2991
2992 The helper is useful to implement policies based on
2993 cgroups that are upper in hierarchy than immediate cgroup
2994 associated with sk.
2995
2996 The format of returned id and helper limitations are same
2997 as in bpf_sk_cgroup_id().
2998
2999 Return The id is returned or 0 in case the id could not be re‐
3000 trieved.
3001
3002 long bpf_ringbuf_output(void *ringbuf, void *data, u64 size, u64 flags)
3003
3004 Description
3005 Copy size bytes from data into a ring buffer ringbuf. If
3006 BPF_RB_NO_WAKEUP is specified in flags, no notification
3007 of new data availability is sent. If BPF_RB_FORCE_WAKEUP
3008 is specified in flags, notification of new data avail‐
3009 ability is sent unconditionally. If 0 is specified in
3010 flags, an adaptive notification of new data availability
3011 is sent.
3012
3013 An adaptive notification is a notification sent whenever
3014 the user-space process has caught up and consumed all
3015 available payloads. In case the user-space process is
3016 still processing a previous payload, then no notification
3017 is needed as it will process the newly added payload au‐
3018 tomatically.
3019
3020 Return 0 on success, or a negative error in case of failure.
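The notification rules described above can be condensed into a small decision function. This is a simplified model, not kernel code: the SKETCH_ macros are local stand-ins assumed to match BPF_RB_NO_WAKEUP and BPF_RB_FORCE_WAKEUP from linux/bpf.h, and consumer_caught_up is a hypothetical input standing for "user space has consumed all previously available payloads".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the BPF_RB_* wakeup flags in linux/bpf.h. */
#define SKETCH_RB_NO_WAKEUP    (1ULL << 0)
#define SKETCH_RB_FORCE_WAKEUP (1ULL << 1)

/* Decide whether a notification of new data would be sent. */
static bool ringbuf_should_notify(uint64_t flags, bool consumer_caught_up)
{
    /* Only the two wakeup flags are meaningful in this model. */
    assert((flags & ~(SKETCH_RB_NO_WAKEUP | SKETCH_RB_FORCE_WAKEUP)) == 0);

    if (flags & SKETCH_RB_FORCE_WAKEUP)
        return true;               /* unconditional notification */
    if (flags & SKETCH_RB_NO_WAKEUP)
        return false;              /* notification suppressed */
    return consumer_caught_up;     /* adaptive: only if consumer is idle */
}
```

With flags set to 0, a consumer that is still busy with an earlier payload is not notified again, since it will pick up the new payload on its own.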
3021
3022 void *bpf_ringbuf_reserve(void *ringbuf, u64 size, u64 flags)
3023
3024 Description
3025 Reserve size bytes of payload in a ring buffer ringbuf.
3026 flags must be 0.
3027
3028 Return Valid pointer with size bytes of memory available; NULL,
3029 otherwise.
3030
3031 void bpf_ringbuf_submit(void *data, u64 flags)
3032
3033 Description
3034 Submit reserved ring buffer sample, pointed to by data.
3035 If BPF_RB_NO_WAKEUP is specified in flags, no notifica‐
3036 tion of new data availability is sent. If
3037 BPF_RB_FORCE_WAKEUP is specified in flags, notification
3038 of new data availability is sent unconditionally. If 0
3039 is specified in flags, an adaptive notification of new
3040 data availability is sent.
3041
3042 See 'bpf_ringbuf_output()' for the definition of adaptive
3043 notification.
3044
3045 Return Nothing. Always succeeds.
3046
3047 void bpf_ringbuf_discard(void *data, u64 flags)
3048
3049 Description
3050 Discard reserved ring buffer sample, pointed to by data.
3051 If BPF_RB_NO_WAKEUP is specified in flags, no notifica‐
3052 tion of new data availability is sent. If
3053 BPF_RB_FORCE_WAKEUP is specified in flags, notification
3054 of new data availability is sent unconditionally. If 0
3055 is specified in flags, an adaptive notification of new
3056 data availability is sent.
3057
3058 See 'bpf_ringbuf_output()' for the definition of adaptive
3059 notification.
3060
3061 Return Nothing. Always succeeds.
3062
3063 u64 bpf_ringbuf_query(void *ringbuf, u64 flags)
3064
3065 Description
3066 Query various characteristics of provided ring buffer.
3067 What exactly is queried is determined by flags:
3068
3069 • BPF_RB_AVAIL_DATA: Amount of data not yet consumed.
3070
3071 • BPF_RB_RING_SIZE: The size of ring buffer.
3072
3073 • BPF_RB_CONS_POS: Consumer position (can wrap around).
3074
3075 • BPF_RB_PROD_POS: Producer(s) position (can wrap
3076 around).
3077
3078 Data returned is just a momentary snapshot of actual val‐
3079 ues and could be inaccurate, so this facility should be
3080 used to power heuristics and for reporting, not to make
3081 100% correct calculation.
3082
3083 Return Requested value, or 0, if flags are not recognized.
3084
3085 long bpf_csum_level(struct sk_buff *skb, u64 level)
3086
3087 Description
3088 Change the skb's checksum level by one layer up or down,
3089 or reset it entirely to none in order to have the stack
3090 perform checksum validation. The level is applicable to
3091 the following protocols: TCP, UDP, GRE, SCTP, FCOE. For
3092 example, a decap of | ETH | IP | UDP | GUE | IP | TCP |
3093 into | ETH | IP | TCP | through bpf_skb_adjust_room()
3094 helper with passing in BPF_F_ADJ_ROOM_NO_CSUM_RESET flag
3095 would require one call to bpf_csum_level() with
3096 BPF_CSUM_LEVEL_DEC since the UDP header is removed. Simi‐
3097 larly, an encap of the latter into the former could be
3098 accompanied by a helper call to bpf_csum_level() with
3099 BPF_CSUM_LEVEL_INC if the skb is still intended to be
3100 processed in higher layers of the stack instead of just
3101 egressing at tc.
3102
3103 There are four supported level settings at this time:
3104
3105 • BPF_CSUM_LEVEL_INC: Increases skb->csum_level for skbs
3106 with CHECKSUM_UNNECESSARY.
3107
3108 • BPF_CSUM_LEVEL_DEC: Decreases skb->csum_level for skbs
3109 with CHECKSUM_UNNECESSARY.
3110
3111 • BPF_CSUM_LEVEL_RESET: Resets skb->csum_level to 0 and
3112 sets CHECKSUM_NONE to force checksum validation by the
3113 stack.
3114
3115 • BPF_CSUM_LEVEL_QUERY: No-op, returns the current
3116 skb->csum_level.
3117
3118 Return 0 on success, or a negative error in case of failure. In
3119 the case of BPF_CSUM_LEVEL_QUERY, the current
3120 skb->csum_level is returned or the error code -EACCES in
3121 case the skb is not subject to CHECKSUM_UNNECESSARY.
3122
3123 struct tcp6_sock *bpf_skc_to_tcp6_sock(void *sk)
3124
3125 Description
3126 Dynamically cast a sk pointer to a tcp6_sock pointer.
3127
3128 Return sk if casting is valid, or NULL otherwise.
3129
3130 struct tcp_sock *bpf_skc_to_tcp_sock(void *sk)
3131
3132 Description
3133 Dynamically cast a sk pointer to a tcp_sock pointer.
3134
3135 Return sk if casting is valid, or NULL otherwise.
3136
3137 struct tcp_timewait_sock *bpf_skc_to_tcp_timewait_sock(void *sk)
3138
3139 Description
3140 Dynamically cast a sk pointer to a tcp_timewait_sock
3141 pointer.
3142
3143 Return sk if casting is valid, or NULL otherwise.
3144
3145 struct tcp_request_sock *bpf_skc_to_tcp_request_sock(void *sk)
3146
3147 Description
3148 Dynamically cast a sk pointer to a tcp_request_sock
3149 pointer.
3150
3151 Return sk if casting is valid, or NULL otherwise.
3152
3153 struct udp6_sock *bpf_skc_to_udp6_sock(void *sk)
3154
3155 Description
3156 Dynamically cast a sk pointer to a udp6_sock pointer.
3157
3158 Return sk if casting is valid, or NULL otherwise.
3159
3160 long bpf_get_task_stack(struct task_struct *task, void *buf, u32 size,
3161 u64 flags)
3162
3163 Description
3164 Return a user or a kernel stack in bpf program provided
3165 buffer. To achieve this, the helper needs task, which is
3166 a valid pointer to struct task_struct. To store the
3167 stacktrace, the bpf program provides buf with a nonnega‐
3168 tive size.
3169
3170 The last argument, flags, holds the number of stack
3171 frames to skip (from 0 to 255), masked with
3172 BPF_F_SKIP_FIELD_MASK. The next bits can be used to set
3173 the following flags:
3174
3175 BPF_F_USER_STACK
3176 Collect a user space stack instead of a kernel
3177 stack.
3178
3179 BPF_F_USER_BUILD_ID
3180 Collect buildid+offset instead of ips for user
3181 stack, only valid if BPF_F_USER_STACK is also
3182 specified.
3183
3184 bpf_get_task_stack() can collect up to
3185 PERF_MAX_STACK_DEPTH both kernel and user frames, subject
3186 to a sufficiently large buffer size. Note that this limit can
3187 be controlled with the sysctl program, and that it should
3188 be manually increased in order to profile long user
3189 stacks (such as stacks for Java programs). To do so, use:
3190
3191 # sysctl kernel.perf_event_max_stack=<new value>
3192
3193 Return The non-negative copied buf length equal to or less than
3194 size on success, or a negative error in case of failure.
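The flags argument described above combines a frame-skip count with optional flag bits, which the following sketch assembles. The SKETCH_ macros are local stand-ins whose values are assumed to mirror BPF_F_SKIP_FIELD_MASK, BPF_F_USER_STACK and BPF_F_USER_BUILD_ID in linux/bpf.h; task_stack_flags() is a hypothetical helper written for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the corresponding BPF_F_* values in linux/bpf.h. */
#define SKETCH_F_SKIP_FIELD_MASK 0xffULL
#define SKETCH_F_USER_STACK      (1ULL << 8)
#define SKETCH_F_USER_BUILD_ID   (1ULL << 11)

/* Assemble a flags value: skip count in the low byte, flag bits above. */
static uint64_t task_stack_flags(unsigned int skip, bool user, bool build_id)
{
    uint64_t flags;

    assert(skip <= 255);        /* skip count ranges from 0 to 255 */
    assert(!build_id || user);  /* build id is only valid for user stacks */

    flags = (uint64_t)skip & SKETCH_F_SKIP_FIELD_MASK;
    if (user)
        flags |= SKETCH_F_USER_STACK;
    if (build_id)
        flags |= SKETCH_F_USER_BUILD_ID;
    return flags;
}
```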
3195
3196 long bpf_load_hdr_opt(struct bpf_sock_ops *skops, void *searchby_res,
3197 u32 len, u64 flags)
3198
3199 Description
3200 Load header option. Support reading a particular TCP
3201 header option for bpf program (BPF_PROG_TYPE_SOCK_OPS).
3202
3203 If flags is 0, it will search the option from the
3204 skops->skb_data. The comment in struct bpf_sock_ops has
3205 details on what skb_data contains under different
3206 skops->op.
3207
3208 The first byte of the searchby_res specifies the kind
3209 that it wants to search.
3210
3211 If the searching kind is an experimental kind (i.e. 253
3212 or 254 according to RFC6994), it also needs to specify
3213 the "magic" which is either 2 bytes or 4 bytes. It then
3214 also needs to specify the size of the magic by using the
3215 2nd byte which is "kind-length" of a TCP header option
3216 and the "kind-length" also includes the first 2 bytes
3217 "kind" and "kind-length" itself as a normal TCP header
3218 option also does.
3219
3220 For example, to search experimental kind 254 with 2 byte
3221 magic 0xeB9F, the searchby_res should be [ 254, 4, 0xeB,
3222 0x9F, 0, 0, .... 0 ].
3223
3224 To search for the standard window scale option (3), the
3225 searchby_res should be [ 3, 0, 0, .... 0 ]. Note,
3226 kind-length must be 0 for regular option.
3227
3228 Searching for No-Op (0) and End-of-Option-List (1) are
3229 not supported.
3230
3231 len must be at least 2 bytes which is the minimal size of
3232 a header option.
3233
3234 Supported flags:
3235
3236 • BPF_LOAD_HDR_OPT_TCP_SYN to search from the saved_syn
3237 packet or the just-received syn packet.
3238
3239 Return > 0 when found, the header option is copied to
3240 searchby_res. The return value is the total length
3241 copied. On failure, a negative error code is returned:
3242
3243 -EINVAL if a parameter is invalid.
3244
3245 -ENOMSG if the option is not found.
3246
3247 -ENOENT if no syn packet is available when
3248 BPF_LOAD_HDR_OPT_TCP_SYN is used.
3249
3250 -ENOSPC if there is not enough space. Only len number of
3251 bytes are copied.
3252
3253 -EFAULT on failure to parse the header options in the
3254 packet.
3255
3256 -EPERM if the helper cannot be used under the current
3257 skops->op.
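The two searchby_res layouts given as examples above can be prepared as in this sketch. Both function names are hypothetical; the byte layout follows the description: kind first, then kind-length, then the magic for experimental kinds, with the remainder zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill buf to search an experimental kind (253 or 254) by 2-byte magic. */
static void hdr_opt_search_exp16(uint8_t *buf, size_t len, uint8_t kind,
                                 uint16_t magic)
{
    assert(buf != NULL && len >= 4);
    assert(kind == 253 || kind == 254);

    memset(buf, 0, len);
    buf[0] = kind;
    buf[1] = 4;                        /* kind + kind-length + 2-byte magic */
    buf[2] = (uint8_t)(magic >> 8);    /* magic, most significant byte first */
    buf[3] = (uint8_t)(magic & 0xff);
}

/* Fill buf to search a regular option kind; kind-length must stay 0. */
static void hdr_opt_search_regular(uint8_t *buf, size_t len, uint8_t kind)
{
    assert(buf != NULL && len >= 2);   /* 2 bytes is the minimal option size */

    memset(buf, 0, len);
    buf[0] = kind;
}
```

For kind 254 and magic 0xeB9F this yields [ 254, 4, 0xeB, 0x9F, 0, ... ], and for the window scale option it yields [ 3, 0, 0, ... ].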
3258
3259 long bpf_store_hdr_opt(struct bpf_sock_ops *skops, const void *from,
3260 u32 len, u64 flags)
3261
3262 Description
3263 Store header option. The data will be copied from buffer
3264 from with length len to the TCP header.
3265
3266 The buffer from should have the whole option that in‐
3267 cludes the kind, kind-length, and the actual option data.
3268 The len must be at least kind-length long. The
3269 kind-length does not have to be 4 byte aligned. The ker‐
3270 nel will take care of the padding and setting the 4 bytes
3271 aligned value to th->doff.
3272
3273 This helper will check for duplicated option by searching
3274 the same option in the outgoing skb.
3275
3276 This helper can only be called during
3277 BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
3278
3279 Return 0 on success, or negative error in case of failure:
3280
3281 -EINVAL if a parameter is invalid.
3282
3283 -ENOSPC if there is not enough space in the header.
3284 Nothing has been written.
3285
3286 -EEXIST if the option already exists.
3287
3288 -EFAULT on failure to parse the existing header options.
3289
3290 -EPERM if the helper cannot be used under the current
3291 skops->op.
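A buffer suitable for the from argument holds the kind, the kind-length and the option data back to back. The sketch below builds such a buffer and computes the size the option would occupy after padding to a 4-byte boundary. Both function names are hypothetical, and padded_opt_len() only illustrates the alignment arithmetic; the padding itself is described above as being done by the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Write [kind, kind-length, data...] into buf.
 * Returns the kind-length (total option size), or 0 if it does not fit.
 */
static size_t build_tcp_opt(uint8_t *buf, size_t buf_len, uint8_t kind,
                            const uint8_t *data, size_t data_len)
{
    size_t total = 2 + data_len;   /* kind and kind-length count too */

    assert(buf != NULL);
    assert(data != NULL || data_len == 0);

    if (total > buf_len || total > 255)
        return 0;

    buf[0] = kind;
    buf[1] = (uint8_t)total;
    if (data_len)
        memcpy(buf + 2, data, data_len);
    return total;
}

/* Round an option length up to the next multiple of 4 bytes. */
static size_t padded_opt_len(size_t kind_len)
{
    return (kind_len + 3) & ~(size_t)3;
}
```

The value returned by build_tcp_opt() is what a caller would pass as len, which must be at least kind-length long.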
3292
3293 long bpf_reserve_hdr_opt(struct bpf_sock_ops *skops, u32 len, u64
3294 flags)
3295
3296 Description
3297 Reserve len bytes for the bpf header option. The space
3298 will be used by bpf_store_hdr_opt() later in
3299 BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
3300
3301 If bpf_reserve_hdr_opt() is called multiple times, the
3302 total number of bytes will be reserved.
3303
3304 This helper can only be called during
3305 BPF_SOCK_OPS_HDR_OPT_LEN_CB.
3306
3307 Return 0 on success, or negative error in case of failure:
3308
3309 -EINVAL if a parameter is invalid.
3310
3311 -ENOSPC if there is not enough space in the header.
3312
3313 -EPERM if the helper cannot be used under the current
3314 skops->op.
3315
3316 void *bpf_inode_storage_get(struct bpf_map *map, void *inode, void
3317 *value, u64 flags)
3318
3319 Description
3320 Get a bpf_local_storage from an inode.
3321
3322 Logically, it could be thought of as getting the value
3323 from a map with inode as the key. From this perspective,
3324 the usage is not much different from
3325 bpf_map_lookup_elem(map, &inode) except this helper en‐
3326 forces the key must be an inode and the map must also be
3327 a BPF_MAP_TYPE_INODE_STORAGE.
3328
3329 Underneath, the value is stored locally at inode instead
3330 of the map. The map is used as the bpf-local-storage
3331 "type". The bpf-local-storage "type" (i.e. the map) is
3332 searched against all bpf_local_storage residing at inode.
3333
3334 An optional flags (BPF_LOCAL_STORAGE_GET_F_CREATE) can be
3335 used such that a new bpf_local_storage will be created if
3336 one does not exist. value can be used together with
3337 BPF_LOCAL_STORAGE_GET_F_CREATE to specify the initial
3338 value of a bpf_local_storage. If value is NULL, the new
3339 bpf_local_storage will be zero initialized.
3340
3341 Return A bpf_local_storage pointer is returned on success.
3342
3343 NULL if not found or there was an error in adding a new
3344 bpf_local_storage.
3345
3346 int bpf_inode_storage_delete(struct bpf_map *map, void *inode)
3347
3348 Description
3349 Delete a bpf_local_storage from an inode.
3350
3351 Return 0 on success.
3352
3353 -ENOENT if the bpf_local_storage cannot be found.
3354
3355 long bpf_d_path(struct path *path, char *buf, u32 sz)
3356
3357 Description
3358 Return full path for given struct path object, which
3359 needs to be the kernel BTF path object. The path is re‐
3360 turned in the provided buffer buf of size sz and is zero
3361 terminated.
3362
3363 Return On success, the strictly positive length of the string,
3364 including the trailing NUL character. On error, a nega‐
3365 tive value.
3366
3367 long bpf_copy_from_user(void *dst, u32 size, const void *user_ptr)
3368
3369 Description
3370 Read size bytes from user space address user_ptr and
3371 store the data in dst. This is a wrapper of
3372 copy_from_user().
3373
3374 Return 0 on success, or a negative error in case of failure.
3375
3376 long bpf_snprintf_btf(char *str, u32 str_size, struct btf_ptr *ptr, u32
3377 btf_ptr_size, u64 flags)
3378
3379 Description
3380 Use BTF to store a string representation of ptr->ptr in
3381 str, using ptr->type_id. This value should specify the
3382 type that ptr->ptr points to. LLVM
3383 __builtin_btf_type_id(type, 1) can be used to look up vm‐
3384 linux BTF type ids. Traversing the data structure using
3385 BTF, the type information and values are stored in the
3386 first str_size - 1 bytes of str. Safe copy of the
3387 pointer data is carried out to avoid kernel crashes dur‐
3388 ing operation. Smaller types can use string space on the
3389 stack; larger programs can use map data to store the
3390 string representation.
3391
3392 The string can be subsequently shared with userspace via
3393 bpf_perf_event_output() or ring buffer interfaces.
3394 bpf_trace_printk() is to be avoided as it places too
3395 small a limit on string size to be useful.
3396
3397 flags is a combination of
3398
3399 BTF_F_COMPACT
3400 no formatting around type information
3401
3402 BTF_F_NONAME
3403 no struct/union member names/types
3404
3405 BTF_F_PTR_RAW
3406 show raw (unobfuscated) pointer values; equivalent
3407 to printk specifier %px.
3408
3409 BTF_F_ZERO
3410 show zero-valued struct/union members; they are
3411 not displayed by default
3412
3413 Return The number of bytes that were written (or would have been
3414 written if output had to be truncated due to string
3415 size), or a negative error in cases of failure.
3416
3417 long bpf_seq_printf_btf(struct seq_file *m, struct btf_ptr *ptr, u32
3418 ptr_size, u64 flags)
3419
3420 Description
3421 Use BTF to write to seq_write a string representation of
3422 ptr->ptr, using ptr->type_id as per bpf_snprintf_btf().
3423 flags are identical to those used for bpf_snprintf_btf.
3424
3425 Return 0 on success or a negative error in case of failure.
3426
3427 u64 bpf_skb_cgroup_classid(struct sk_buff *skb)
3428
3429 Description
3430 See bpf_get_cgroup_classid() for the main description.
3431 This helper differs from bpf_get_cgroup_classid() in that
3432 the cgroup v1 net_cls class is retrieved only from the
3433 skb's associated socket instead of the current process.
3434
3435 Return The id is returned or 0 in case the id could not be re‐
3436 trieved.
3437
3438 long bpf_redirect_neigh(u32 ifindex, struct bpf_redir_neigh *params,
3439 int plen, u64 flags)
3440
3441 Description
3442 Redirect the packet to another net device of index
3443 ifindex and fill in L2 addresses from neighboring subsys‐
3444 tem. This helper is somewhat similar to bpf_redirect(),
3445 except that it populates L2 addresses as well, meaning,
3446 internally, the helper relies on the neighbor lookup for
3447 the L2 address of the nexthop.
3448
3449 The helper will perform a FIB lookup based on the skb's
3450 networking header to get the address of the next hop, un‐
3451 less this is supplied by the caller in the params argu‐
3452 ment. The plen argument indicates the len of params and
3453 should be set to 0 if params is NULL.
3454
3455 The flags argument is reserved and must be 0. The helper
3456 is currently only supported for tc BPF program types, and
3457 enabled for IPv4 and IPv6 protocols.
3458
3459 Return The helper returns TC_ACT_REDIRECT on success or
3460 TC_ACT_SHOT on error.
3461
3462 void *bpf_per_cpu_ptr(const void *percpu_ptr, u32 cpu)
3463
3464 Description
3465 Take a pointer to a percpu ksym, percpu_ptr, and return a
3466 pointer to the percpu kernel variable on cpu. A ksym is
3467 an extern variable decorated with '__ksym'. For each ksym,
3468 a variable of the same name (either static or global) is
3469 defined in the kernel. The ksym is percpu if that
3470 variable is percpu. The returned pointer points to the
3471 global percpu var on cpu.
3472
3473 bpf_per_cpu_ptr() has the same semantics as per_cpu_ptr()
3474 in the kernel, except that bpf_per_cpu_ptr() may return
3475 NULL. This happens if cpu is greater than or equal to
3476 nr_cpu_ids. The caller of bpf_per_cpu_ptr() must check the
3477 returned value.
3478
3479 Return A pointer pointing to the kernel percpu variable on cpu,
3480 or NULL, if cpu is invalid.
3481
3482 void *bpf_this_cpu_ptr(const void *percpu_ptr)
3483
3484 Description
3485 Take a pointer to a percpu ksym, percpu_ptr, and return a
3486 pointer to the percpu kernel variable on this cpu. See
3487 the description of 'ksym' in bpf_per_cpu_ptr().
3488
3489 bpf_this_cpu_ptr() has the same semantics as
3490 this_cpu_ptr() in the kernel. Unlike
3491 bpf_per_cpu_ptr(), it never returns NULL.
3492
3493 Return A pointer pointing to the kernel percpu variable on this
3494 cpu.
3495
3496 long bpf_redirect_peer(u32 ifindex, u64 flags)
3497
3498 Description
3499 Redirect the packet to another net device of index
3500 ifindex. This helper is somewhat similar to bpf_redi‐
3501 rect(), except that the packet is redirected to the peer
3502 device of ifindex and the netns switch takes place
3503 from ingress to ingress without going through the CPU's
3504 backlog queue.
3505
3506 The flags argument is reserved and must be 0. The helper
3507 is currently only supported for tc BPF program types at
3508 the ingress hook and for veth device types. The peer de‐
3509 vice must reside in a different network namespace.
3510
3511 Return The helper returns TC_ACT_REDIRECT on success or
3512 TC_ACT_SHOT on error.
3513
3514 void *bpf_task_storage_get(struct bpf_map *map, struct task_struct
3515 *task, void *value, u64 flags)
3516
3517 Description
3518 Get a bpf_local_storage from the task.
3519
3520 Logically, it could be thought of as getting the value
3521 from a map with task as the key. From this perspective,
3522 the usage is not much different from
3523 bpf_map_lookup_elem(map, &task), except that this helper
3524 enforces that the key must be a task_struct and the map
3525 must also be a BPF_MAP_TYPE_TASK_STORAGE.
3526
3527 Underneath, the value is stored locally at task instead
3528 of the map. The map is used as the bpf-local-storage
3529 "type". The bpf-local-storage "type" (i.e. the map) is
3530 searched against all bpf_local_storage residing at task.
3531
3532 The optional flag BPF_LOCAL_STORAGE_GET_F_CREATE can be
3533 used so that a new bpf_local_storage is created if
3534 one does not exist. value can be used together with
3535 BPF_LOCAL_STORAGE_GET_F_CREATE to specify the initial
3536 value of a bpf_local_storage. If value is NULL, the new
3537 bpf_local_storage will be zero initialized.
3538
3539 Return A bpf_local_storage pointer is returned on success.
3540
3541 NULL if not found or there was an error in adding a new
3542 bpf_local_storage.
3543
3544 long bpf_task_storage_delete(struct bpf_map *map, struct task_struct
3545 *task)
3546
3547 Description
3548 Delete a bpf_local_storage from a task.
3549
3550 Return 0 on success.
3551
3552 -ENOENT if the bpf_local_storage cannot be found.
3553
3554 struct task_struct *bpf_get_current_task_btf(void)
3555
3556 Description
3557 Return a BTF pointer to the "current" task. This pointer
3558 can also be used in helpers that accept an
3559 ARG_PTR_TO_BTF_ID of type task_struct.
3560
3561 Return Pointer to the current task.
3562
3563 long bpf_bprm_opts_set(struct linux_binprm *bprm, u64 flags)
3564
3565 Description
3566 Set or clear certain options on bprm:
3567
3568 BPF_F_BPRM_SECUREEXEC Set the secureexec bit which sets
3569 the AT_SECURE auxv for glibc. The bit is cleared if the
3570 flag is not specified.
3571
3572 Return -EINVAL if invalid flags are passed, zero otherwise.
3573
3574 u64 bpf_ktime_get_coarse_ns(void)
3575
3576 Description
3577 Return a coarse-grained version of the time elapsed since
3578 system boot, in nanoseconds. Does not include time the
3579 system was suspended.
3580
3581 See: clock_gettime(CLOCK_MONOTONIC_COARSE)
3582
3583 Return Current ktime.
3584
3585 long bpf_ima_inode_hash(struct inode *inode, void *dst, u32 size)
3586
3587 Description
3588 Returns the stored IMA hash of the inode (if it's avail‐
3589 able). If the hash is larger than size, then only size
3590 bytes will be copied to dst.
3591
3592 Return The hash_algo is returned on success, -EOPNOTSUPP if IMA
3593 is disabled or -EINVAL if invalid arguments are passed.
3594
3595 struct socket *bpf_sock_from_file(struct file *file)
3596
3597 Description
3598 If the given file represents a socket, returns the asso‐
3599 ciated socket.
3600
3601 Return A pointer to a struct socket on success or NULL if the
3602 file is not a socket.
3603
3604 long bpf_check_mtu(void *ctx, u32 ifindex, u32 *mtu_len, s32 len_diff,
3605 u64 flags)
3606
3607 Description
3608 Check whether the packet size exceeds the MTU of a net
3609 device (based on ifindex). This helper will likely be used
3610 in combination with helpers that adjust/change the packet
3611 size.
3612
3613 The argument len_diff can be used for querying with a
3614 planned size change. This allows checking the MTU prior to
3615 changing packet ctx. Providing a len_diff adjustment that
3616 is larger than the actual packet size (resulting in nega‐
3617 tive packet size) will in principle not exceed the MTU,
3618 which is why it is not considered a failure. Other BPF
3619 helpers are needed for performing the planned size
3620 change; therefore the responsibility for catching a nega‐
3621 tive packet size belongs in those helpers.
3622
3623 Specifying ifindex zero means the MTU check is performed
3624 against the current net device. This is practical if
3625 this isn't used prior to redirect.
3626
3627 On input, mtu_len must be a valid pointer, else the
3628 verifier will reject the BPF program. If the value mtu_len
3629 is initialized to zero, then the ctx packet size is used.
3630 When a value mtu_len is provided as input, it specifies the
3631 L3 length that the MTU check is done against. Remember that
3632 XDP and TC lengths operate at L2, but this value is L3, as
3633 it correlates to the MTU and IP-header tot_len values, which
3634 are L3 (similar behavior to bpf_fib_lookup).
3635
3636 The Linux kernel route table can configure MTUs on a more
3637 specific per route level, which is not provided by this
3638 helper. For route level MTU checks use the
3639 bpf_fib_lookup() helper.
3640
3641 ctx is either struct xdp_md for XDP programs or struct
3642 sk_buff for tc cls_act programs.
3643
3644 The flags argument can be a combination of one or more of
3645 the following values:
3646
3647 BPF_MTU_CHK_SEGS
3648 This flag only works for ctx struct sk_buff.
3649 If the packet context contains extra packet segment
3650 buffers (often known as a GSO skb), then the MTU is
3651 harder to check at this point, because in the
3652 transmit path it is possible for the skb packet to
3653 get re-segmented (depending on net device fea‐
3654 tures). This could still be an MTU violation, so
3655 this flag enables performing the MTU check against
3656 segments, with a different violation return code
3657 to tell it apart. This check cannot use len_diff.
3658
3659 On return, the mtu_len pointer contains the MTU value of the
3660 net device. Remember that the net device's configured MTU is
3661 the L3 size, which is returned here, whereas XDP and TC
3662 lengths operate at L2. The helper takes this into account for
3663 you, but keep it in mind when using the MTU value in BPF code.
3664
3665 Return
3666
3667 • 0 on success, and populate MTU value in mtu_len
3668 pointer.
3669
3670 • < 0 if any input argument is invalid (mtu_len not up‐
3671 dated)
3672
3673 MTU violations return positive values, but also populate the
3674 MTU value in the mtu_len pointer, as this can be needed for
3675 implementing PMTU handling:
3676
3677 • BPF_MTU_CHK_RET_FRAG_NEEDED
3678
3679 • BPF_MTU_CHK_RET_SEGS_TOOBIG
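
The L2/L3 bookkeeping described for bpf_check_mtu() can be sketched as a small user-space model. This is not the kernel implementation: the function name, the MODEL_* constants, and the fixed 14-byte Ethernet header are assumptions made for illustration, and segment (GSO) handling is left out.

```c
#include <stdint.h>

/* Hypothetical constants for this sketch only. */
#define MODEL_L2_HDR_LEN      14  /* assumed Ethernet header size */
#define MODEL_MTU_OK          0   /* stands in for the success return */
#define MODEL_MTU_FRAG_NEEDED 1   /* stands in for BPF_MTU_CHK_RET_FRAG_NEEDED */

/*
 * dev_mtu:    configured MTU of the net device (an L3 size)
 * pkt_l2_len: current packet length as seen by XDP/TC (an L2 size)
 * mtu_len:    in: 0 to use the packet size, else an L3 length to check;
 *             out: the device MTU
 * len_diff:   planned size change, may be negative
 */
int mtu_check_model(uint32_t dev_mtu, uint32_t pkt_l2_len,
                    uint32_t *mtu_len, int32_t len_diff)
{
    /* Compare at L2: device limit is the MTU plus the L2 header. */
    int64_t limit_l2 = (int64_t)dev_mtu + MODEL_L2_HDR_LEN;
    int64_t len_l2 = *mtu_len ? (int64_t)*mtu_len + MODEL_L2_HDR_LEN
                              : (int64_t)pkt_l2_len;

    len_l2 += len_diff;   /* a negative result simply does not exceed the MTU */
    *mtu_len = dev_mtu;   /* the MTU is reported back on success and violation */

    return len_l2 > limit_l2 ? MODEL_MTU_FRAG_NEEDED : MODEL_MTU_OK;
}
```

With a 1500-byte MTU, a 1514-byte Ethernet frame passes, while the same frame with a planned len_diff of 20 is reported as needing fragmentation.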
3680
3681 long bpf_for_each_map_elem(struct bpf_map *map, void *callback_fn, void
3682 *callback_ctx, u64 flags)
3683
3684 Description
3685 For each element in map, call callback_fn function with
3686 map, callback_ctx and other map-specific parameters. The
3687 callback_fn should be a static function and the call‐
3688 back_ctx should be a pointer to the stack. The flags is
3689 used to control certain aspects of the helper. Cur‐
3690 rently, the flags must be 0.
3691
3692 The following is a list of supported map types and their
3693 respective expected callback signatures:
3694
3695 BPF_MAP_TYPE_HASH, BPF_MAP_TYPE_PERCPU_HASH,
3696 BPF_MAP_TYPE_LRU_HASH, BPF_MAP_TYPE_LRU_PERCPU_HASH,
3697 BPF_MAP_TYPE_ARRAY, BPF_MAP_TYPE_PERCPU_ARRAY
3698
3699 long (*callback_fn)(struct bpf_map *map, const void *key,
3700 void *value, void *ctx);
3701
3702 For per_cpu maps, the map_value is the value on the cpu
3703 where the bpf_prog is running.
3704
3705 If callback_fn returns 0, the helper will continue to the
3706 next element. If return value is 1, the helper will skip
3707 the rest of elements and return. Other return values are
3708 not used now.
3709
3710 Return The number of traversed map elements for success, -EINVAL
3711 for invalid flags.
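
The callback contract of bpf_for_each_map_elem() (return 0 to continue, 1 to stop) can be illustrated with a plain C model that walks an array. All names here are hypothetical, the map argument of the real callback is omitted, and counting the element on which the callback stops as "traversed" is an assumption of this sketch.

```c
#include <stdint.h>

/* Simplified callback type: the real one also receives the map pointer. */
typedef long (*elem_cb)(const uint32_t *key, long *value, void *ctx);

/* Model of the traversal: returns the number of elements visited. */
long for_each_array_elem_model(long *values, uint32_t n, elem_cb cb, void *ctx)
{
    long visited = 0;

    for (uint32_t i = 0; i < n; i++) {
        uint32_t key = i;

        visited++;
        if (cb(&key, &values[i], ctx) != 0)
            break;              /* callback asked to skip the rest */
    }
    return visited;
}

/* Callback context, passed by pointer like callback_ctx. */
struct sum_ctx {
    long sum;
};

/* Example callback: add up values, stop at the first negative one. */
long sum_until_negative(const uint32_t *key, long *value, void *ctx)
{
    struct sum_ctx *s = ctx;

    (void)key;
    if (*value < 0)
        return 1;               /* stop the iteration */
    s->sum += *value;
    return 0;                   /* continue with the next element */
}
```

For the values {1, 2, -1, 4}, the model visits three elements and leaves a sum of 3 in the context.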
3712
3713 long bpf_snprintf(char *str, u32 str_size, const char *fmt, u64 *data,
3714 u32 data_len)
3715
3716 Description
3717 Outputs a string into the str buffer of size str_size
3718 based on a format string stored in a read-only map
3719 pointed to by fmt.
3720
3721 Each format specifier in fmt corresponds to one u64 ele‐
3722 ment in the data array. For strings and pointers where
3723 pointees are accessed, only the pointer values are stored
3724 in the data array. The data_len is the size of data in
3725 bytes - must be a multiple of 8.
3726
3727 Formats %s and %p{i,I}{4,6} require reading kernel memory.
3728 Reading kernel memory may fail due to either an invalid
3729 address or valid address but requiring a major memory
3730 fault. If reading kernel memory fails, the string for %s
3731 will be an empty string, and the ip address for
3732 %p{i,I}{4,6} will be 0. Not returning error to bpf pro‐
3733 gram is consistent with what bpf_trace_printk() does for
3734 now.
3735
3736 Return The strictly positive length of the formatted string, in‐
3737 cluding the trailing zero character. If the return value
3738 is greater than str_size, str contains a truncated
3739 string, guaranteed to be zero-terminated except when
3740 str_size is 0.
3741
3742 Or -EBUSY if the per-CPU memory copy buffer is busy.
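
The return-value convention of bpf_snprintf() (length including the trailing zero, possibly larger than str_size) can be mimicked with the C library's snprintf(). The sketch below is a user-space analogy only; the function names and the fixed "%s=%llu" format are assumptions for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format "name=count" into str and return the length of the full
 * formatted string including the trailing zero, even if it did not fit.
 */
long snprintf_len_model(char *str, uint32_t str_size,
                        const char *name, unsigned long long count)
{
    int n = snprintf(str, str_size, "%s=%llu", name, count);

    if (n < 0)
        return -1;          /* encoding error in the C library */
    return (long)n + 1;     /* C snprintf() excludes the trailing zero */
}

/* A return value greater than the buffer size signals truncation. */
int was_truncated(long ret, uint32_t str_size)
{
    return ret > (long)str_size;
}

/* data_len is given in bytes and must be a multiple of 8 (one u64 per arg). */
int data_len_ok(uint32_t data_len)
{
    return data_len % 8 == 0;
}
```

A caller would typically size data_len as sizeof() of a u64 array, which satisfies the multiple-of-8 rule by construction.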
3743
3744 long bpf_sys_bpf(u32 cmd, void *attr, u32 attr_size)
3745
3746 Description
3747 Execute bpf syscall with given arguments.
3748
3749 Return A syscall result.
3750
3751 long bpf_btf_find_by_name_kind(char *name, int name_sz, u32 kind, int
3752 flags)
3753
3754 Description
3755 Find BTF type with given name and kind in vmlinux BTF or
3756 in module's BTFs.
3757
3758 Return Returns btf_id and btf_obj_fd in lower and upper 32 bits.
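
Splitting the packed return value of bpf_btf_find_by_name_kind() could look like the following sketch. The struct and function names are hypothetical, and the assumption that a negative return denotes an error (so it must be checked before unpacking) follows the convention of the other helpers rather than an explicit statement in this entry.

```c
#include <stdint.h>

/* Hypothetical container for the two halves of the return value. */
struct btf_lookup {
    uint32_t btf_id;      /* lower 32 bits */
    uint32_t btf_obj_fd;  /* upper 32 bits */
};

/* Returns 0 and fills out on success, -1 if ret is a negative error. */
int split_btf_result(int64_t ret, struct btf_lookup *out)
{
    if (ret < 0)
        return -1;

    out->btf_id = (uint32_t)((uint64_t)ret & 0xffffffffULL);
    out->btf_obj_fd = (uint32_t)((uint64_t)ret >> 32);
    return 0;
}
```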
3759
3760 long bpf_sys_close(u32 fd)
3761
3762 Description
3763 Execute close syscall for given FD.
3764
3765 Return A syscall result.
3766
3767 long bpf_timer_init(struct bpf_timer *timer, struct bpf_map *map, u64
3768 flags)
3769
3770 Description
3771 Initialize the timer. First 4 bits of flags specify
3772 clockid. Only CLOCK_MONOTONIC, CLOCK_REALTIME,
3773 CLOCK_BOOTTIME are allowed. All other bits of flags are
3774 reserved. The verifier will reject the program if timer
3775 is not from the same map.
3776
3777 Return 0 on success. -EBUSY if timer is already initialized.
3778 -EINVAL if invalid flags are passed. -EPERM if timer is
3779 in a map that doesn't have any user references. The user
3780 space should either hold a file descriptor to a map with
3781 timers or pin such map in bpffs. When map is unpinned or
3782 file descriptor is closed all timers in the map will be
3783 cancelled and freed.
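
The flags layout described for bpf_timer_init() (clockid in the first 4 bits, all other bits reserved) can be sketched as a validation routine. This is a user-space model, not kernel code; the MODEL_CLOCK_* values are the usual Linux clock ids, copied here as assumptions so the example does not depend on system headers.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed Linux clock id values, redefined for a self-contained sketch. */
enum {
    MODEL_CLOCK_REALTIME  = 0,
    MODEL_CLOCK_MONOTONIC = 1,
    MODEL_CLOCK_BOOTTIME  = 7,
};

/* Returns 0 if flags encodes an allowed clockid and nothing else. */
int timer_flags_check_model(uint64_t flags)
{
    uint64_t clockid = flags & 0xf;   /* first 4 bits select the clock */

    if (flags & ~0xfULL)              /* reserved bits must stay clear */
        return -EINVAL;

    if (clockid != MODEL_CLOCK_REALTIME &&
        clockid != MODEL_CLOCK_MONOTONIC &&
        clockid != MODEL_CLOCK_BOOTTIME)
        return -EINVAL;

    return 0;
}
```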
3784
3785 long bpf_timer_set_callback(struct bpf_timer *timer, void *callback_fn)
3786
3787 Description
3788 Configure the timer to call callback_fn static function.
3789
3790 Return 0 on success. -EINVAL if timer was not initialized with
3791 bpf_timer_init() earlier. -EPERM if timer is in a map
3792 that doesn't have any user references. The user space
3793 should either hold a file descriptor to a map with timers
3794 or pin such map in bpffs. When map is unpinned or file
3795 descriptor is closed all timers in the map will be can‐
3796 celled and freed.
3797
3798 long bpf_timer_start(struct bpf_timer *timer, u64 nsecs, u64 flags)
3799
3800 Description
3801 Set timer expiration N nanoseconds from the current time.
3802 The configured callback will be invoked in soft irq con‐
3803 text on some cpu and will not repeat unless another call to
3804 bpf_timer_start() is made. In that case the next invoca‐
3805 tion can migrate to a different cpu. Since struct
3806 bpf_timer is a field inside map element the map owns the
3807 timer. The bpf_timer_set_callback() will increment refcnt
3808 of BPF program to make sure that callback_fn code stays
3809 valid. When user space reference to a map reaches zero
3810 all timers in a map are cancelled and corresponding pro‐
3811 gram's refcnts are decremented. This is done to make sure
3812 that Ctrl-C of a user process doesn't leave any timers
3813 running. If map is pinned in bpffs the callback_fn can
3814 re-arm itself indefinitely. bpf_map_update/delete_elem()
3815 helpers and user space sys_bpf commands cancel and free
3816 the timer in the given map element. The map can contain
3817 timers that invoke callback_fn-s from different programs.
3818 The same callback_fn can serve different timers from dif‐
3819 ferent maps if key/value layout matches across maps. Ev‐
3820 ery bpf_timer_set_callback() can have different call‐
3821 back_fn.
3822
3823 Return 0 on success. -EINVAL if timer was not initialized with
3824 bpf_timer_init() earlier or invalid flags are passed.
3825
3826 long bpf_timer_cancel(struct bpf_timer *timer)
3827
3828 Description
3829 Cancel the timer and wait for callback_fn to finish if it
3830 was running.
3831
3832 Return 0 if the timer was not active. 1 if the timer was ac‐
3833 tive. -EINVAL if timer was not initialized with
3834 bpf_timer_init() earlier. -EDEADLK if callback_fn tried
3835 to call bpf_timer_cancel() on its own timer which would
3836 have led to a deadlock otherwise.
3837
3838 u64 bpf_get_func_ip(void *ctx)
3839
3840 Description
3841 Get address of the traced function (for tracing and
3842 kprobe programs).
3843
3844 Return Address of the traced function. 0 for kprobes placed
3845 within the function (not at the entry).
3846
3847 u64 bpf_get_attach_cookie(void *ctx)
3848
3849 Description
3850 Get bpf_cookie value provided (optionally) during the
3851 program attachment. It might be different for each indi‐
3852 vidual attachment, even if BPF program itself is the
3853 same. Expects BPF program context ctx as a first argu‐
3854 ment.
3855
3856 Supported for the following program types:
3857
3858 • kprobe/uprobe;
3859
3860 • tracepoint;
3861
3862 • perf_event.
3863
3864 Return Value specified by user at BPF link creation/attachment
3865 time or 0, if it was not specified.
3866
3867 long bpf_task_pt_regs(struct task_struct *task)
3868
3869 Description
3870 Get the struct pt_regs associated with task.
3871
3872 Return A pointer to struct pt_regs.
3873
3874 long bpf_get_branch_snapshot(void *entries, u32 size, u64 flags)
3875
3876 Description
3877 Get branch trace from hardware engines like Intel LBR.
3878 The hardware engine is stopped shortly after the helper
3879 is called. Therefore, the user needs to filter branch en‐
3880 tries based on the actual use case. To capture branch
3881 trace before the trigger point of the BPF program, the
3882 helper should be called at the beginning of the BPF pro‐
3883 gram.
3884
3885 The data is stored as struct perf_branch_entry into out‐
3886 put buffer entries. size is the size of entries in bytes.
3887 flags is reserved for now and must be zero.
3888
3889 Return On success, number of bytes written to entries. On error, a
3890 negative value.
3891
3892 -EINVAL if flags is not zero.
3893
3894 -ENOENT if architecture does not support branch records.
3895
3896 long bpf_trace_vprintk(const char *fmt, u32 fmt_size, const void *data,
3897 u32 data_len)
3898
3899 Description
3900 Behaves like bpf_trace_printk() helper, but takes an ar‐
3901 ray of u64 to format and can handle more format args as a
3902 result.
3903
3904 Arguments are to be used as in bpf_seq_printf() helper.
3905
3906 Return The number of bytes written to the buffer, or a negative
3907 error in case of failure.
3908
3909 struct unix_sock *bpf_skc_to_unix_sock(void *sk)
3910
3911 Description
3912 Dynamically cast a sk pointer to a unix_sock pointer.
3913
3914 Return sk if casting is valid, or NULL otherwise.
3915
3916 long bpf_kallsyms_lookup_name(const char *name, int name_sz, int flags,
3917 u64 *res)
3918
3919 Description
3920 Get the address of a kernel symbol, returned in res. res
3921 is set to 0 if the symbol is not found.
3922
3923 Return On success, zero. On error, a negative value.
3924
3925 -EINVAL if flags is not zero.
3926
3927 -EINVAL if string name is not the same size as name_sz.
3928
3929 -ENOENT if symbol is not found.
3930
3931 -EPERM if caller does not have permission to obtain ker‐
3932 nel address.
3933
3934 long bpf_find_vma(struct task_struct *task, u64 addr, void *call‐
3935 back_fn, void *callback_ctx, u64 flags)
3936
3937 Description
3938 Find vma of task that contains addr, call callback_fn
3939 function with task, vma, and callback_ctx. The call‐
3940 back_fn should be a static function and the callback_ctx
3941 should be a pointer to the stack. The flags is used to
3942 control certain aspects of the helper. Currently, the
3943 flags must be 0.
3944
3945 The expected callback signature is
3946
3947 long (*callback_fn)(struct task_struct *task, struct
3948 vm_area_struct *vma, void *callback_ctx);
3949
3950 Return 0 on success. -ENOENT if task->mm is NULL, or no vma
3951 contains addr. -EBUSY if failed to try lock mmap_lock.
3952 -EINVAL for invalid flags.
3953
3954 long bpf_loop(u32 nr_loops, void *callback_fn, void *callback_ctx, u64
3955 flags)
3956
3957 Description
3958 For nr_loops, call callback_fn function with callback_ctx
3959 as the context parameter. The callback_fn should be a
3960 static function and the callback_ctx should be a pointer
3961 to the stack. The flags is used to control certain as‐
3962 pects of the helper. Currently, the flags must be 0.
3963 Currently, nr_loops is limited to 1 << 23 (~8 million)
3964 loops.
3965
3966 long (*callback_fn)(u32 index, void *ctx);
3967
3968 where index is the current index in the loop. The index
3969 is zero-indexed.
3970
3971 If callback_fn returns 0, the helper will continue to the
3972 next loop. If return value is 1, the helper will skip the
3973 rest of the loops and return. Other return values are not
3974 used now, and will be rejected by the verifier.
3975
3976 Return The number of loops performed, -EINVAL for invalid flags,
3977 -E2BIG if nr_loops exceeds the maximum number of loops.
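
The control flow of bpf_loop() can be modelled in ordinary C as below. It is a sketch of the documented behavior, not the kernel source: the names are hypothetical, and counting the iteration on which the callback returns 1 as "performed" is an assumption of this model.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_MAX_LOOPS (1u << 23)   /* limit quoted in the description */

typedef long (*loop_cb)(uint32_t index, void *ctx);

/* Returns the number of loops performed, or a negative error. */
long loop_model(uint32_t nr_loops, loop_cb cb, void *ctx, uint64_t flags)
{
    if (flags)
        return -EINVAL;              /* flags must currently be 0 */
    if (nr_loops > MODEL_MAX_LOOPS)
        return -E2BIG;

    for (uint32_t i = 0; i < nr_loops; i++) {
        if (cb(i, ctx) != 0)
            return (long)i + 1;      /* callback asked to stop early */
    }
    return (long)nr_loops;
}

/* Example callback: stop once the index equals the target in ctx. */
long stop_at_target(uint32_t index, void *ctx)
{
    const uint32_t *target = ctx;

    return index == *target ? 1 : 0;
}
```

With a target of 4 and nr_loops of 10, the model runs indices 0 through 4 and reports 5 loops.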
3978
3979 long bpf_strncmp(const char *s1, u32 s1_sz, const char *s2)
3980
3981 Description
3982 Do strncmp() between s1 and s2. s1 doesn't need to be
3983 null-terminated and s1_sz is the maximum storage size of
3984 s1. s2 must be a read-only string.
3985
3986 Return An integer less than, equal to, or greater than zero if
3987 the first s1_sz bytes of s1 are found to be less than, to
3988 match, or be greater than s2.
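
Because the comparison is bounded by s1_sz, the behavior of bpf_strncmp() can be approximated in user space with the C library's strncmp(). The wrapper name below is hypothetical; the sketch mainly shows that s1 need not be null-terminated and that only the first s1_sz bytes take part in the comparison.

```c
#include <stdint.h>
#include <string.h>

/* Compare at most s1_sz bytes of s1 against the string s2. */
long strncmp_model(const char *s1, uint32_t s1_sz, const char *s2)
{
    return strncmp(s1, s2, s1_sz);
}
```

One consequence worth keeping in mind: a 4-byte buffer holding "bash" without a terminator compares equal to both "bash" and "bashful", because bytes of s2 beyond s1_sz are never examined.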
3989
3990 long bpf_get_func_arg(void *ctx, u32 n, u64 *value)
3991
3992 Description
3993 Get n-th argument register (zero based) of the traced
3994 function (for tracing programs) returned in value.
3995
3996 Return 0 on success. -EINVAL if n >= argument register count of
3997 traced function.
3998
3999 long bpf_get_func_ret(void *ctx, u64 *value)
4000
4001 Description
4002 Get return value of the traced function (for tracing pro‐
4003 grams) in value.
4004
4005 Return 0 on success. -EOPNOTSUPP for tracing programs other
4006 than BPF_TRACE_FEXIT or BPF_MODIFY_RETURN.
4007
4008 long bpf_get_func_arg_cnt(void *ctx)
4009
4010 Description
4011 Get number of registers of the traced function (for trac‐
4012 ing programs) where function arguments are stored in
4013 these registers.
4014
4015 Return The number of argument registers of the traced function.
4016
4017 int bpf_get_retval(void)
4018
4019 Description
4020 Get the BPF program's return value that will be returned
4021 to the upper layers.
4022
4023 This helper is currently supported by cgroup programs and
4024 only by the hooks where BPF program's return value is re‐
4025 turned to the userspace via errno.
4026
4027 Return The BPF program's return value.
4028
4029 int bpf_set_retval(int retval)
4030
4031 Description
4032 Set the BPF program's return value that will be returned
4033 to the upper layers.
4034
4035 This helper is currently supported by cgroup programs and
4036 only by the hooks where BPF program's return value is re‐
4037 turned to the userspace via errno.
4038
4039 Note that there is the following corner case where the
4040 program exports an error via bpf_set_retval but signals
4041 success via 'return 1':
4042 bpf_set_retval(-EPERM); return 1;
4043
4044 In this case, the BPF program's return value will use
4045 helper's -EPERM. This still holds true for
4046 cgroup/bind{4,6} which supports extra 'return 3' success
4047 case.
4048
4049 Return 0 on success, or a negative error in case of failure.
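
The corner case above can be made concrete with a small user-space model of how a cgroup hook might combine the program's own return code with the value stored by bpf_set_retval(). Everything here is a hypothetical simplification: the struct and function names are invented, and the rule that a rejecting program ('return 0') without an explicit error defaults to -EPERM is an assumption of this sketch rather than something stated in this entry.

```c
#include <errno.h>

/* Hypothetical per-invocation state: starts at 0 (no error exported). */
struct cgroup_run_model {
    int retval;
};

/* Stands in for bpf_set_retval(): remember the value to export. */
void set_retval_model(struct cgroup_run_model *run, int retval)
{
    run->retval = retval;
}

/* Combine the program's return code with the stored retval. */
int finish_model(struct cgroup_run_model *run, int prog_ret)
{
    /* Assumed default: a rejecting program with no error set yields -EPERM. */
    if (prog_ret == 0 && run->retval >= 0)
        run->retval = -EPERM;

    /* 'return 1' does not clear an error exported earlier. */
    return run->retval;
}
```

In this model, calling set_retval_model(&run, -EPERM) and then finishing with prog_ret of 1 still yields -EPERM, which matches the corner case described above.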
4050
4051 u64 bpf_xdp_get_buff_len(struct xdp_buff *xdp_md)
4052
4053 Description
4054 Get the total size of a given xdp buff (linear and paged
4055 area).
4056
4057 Return The total size of a given xdp buffer.
4058
4059 long bpf_xdp_load_bytes(struct xdp_buff *xdp_md, u32 offset, void *buf,
4060 u32 len)
4061
4062 Description
4063 This helper is provided as an easy way to load data from
4064 an XDP buffer. It can be used to load len bytes from off‐
4065 set from the frame associated to xdp_md, into the buffer
4066 pointed to by buf.
4067
4068 Return 0 on success, or a negative error in case of failure.
4069
4070 long bpf_xdp_store_bytes(struct xdp_buff *xdp_md, u32 offset, void
4071 *buf, u32 len)
4072
4073 Description
4074 Store len bytes from buffer buf into the frame associated
4075 to xdp_md, at offset.
4076
4077 Return 0 on success, or a negative error in case of failure.
4078
4079 long bpf_copy_from_user_task(void *dst, u32 size, const void *user_ptr,
4080 struct task_struct *tsk, u64 flags)
4081
4082 Description
4083 Read size bytes from user space address user_ptr in tsk's
4084 address space, and store the data in dst. flags is not
4085 used yet and is provided for future extensibility. This
4086 helper can only be used by sleepable programs.
4087
4088 Return 0 on success, or a negative error in case of failure. On
4089 error dst buffer is zeroed out.
4090
4091 long bpf_skb_set_tstamp(struct sk_buff *skb, u64 tstamp, u32
4092 tstamp_type)
4093
4094 Description
4095 Change the __sk_buff->tstamp_type to tstamp_type and set
4096 __sk_buff->tstamp to tstamp at the same time.
4097
4098 If there is no need to change the __sk_buff->tstamp_type,
4099 the tstamp value can be directly written to
4100 __sk_buff->tstamp instead.
4101
4102 BPF_SKB_TSTAMP_DELIVERY_MONO is the only tstamp that will
4103 be kept during bpf_redirect_*(). A non zero tstamp must
4104 be used with the BPF_SKB_TSTAMP_DELIVERY_MONO
4105 tstamp_type.
4106
4107 A BPF_SKB_TSTAMP_UNSPEC tstamp_type can only be used with
4108 a zero tstamp.
4109
4110 Only IPv4 and IPv6 skb->protocol are supported.
4111
4112 This function is most useful when it needs to set a mono
4113 delivery time to __sk_buff->tstamp and then bpf_redi‐
4114 rect_*() to the egress of an iface. For example, chang‐
4115 ing the (rcv) timestamp in __sk_buff->tstamp at ingress
4116 to a mono delivery time and then bpf_redirect_*() to
4117 sch_fq@phy-dev.
4118
4119 Return 0 on success. -EINVAL for invalid input. -EOPNOTSUPP for
4120 unsupported protocol.
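
The pairing rules between tstamp and tstamp_type for bpf_skb_set_tstamp() can be summarised as a validation function. This is a user-space sketch: the enum and function names are hypothetical stand-ins for BPF_SKB_TSTAMP_UNSPEC and BPF_SKB_TSTAMP_DELIVERY_MONO, and rejecting any other type with -EINVAL is an assumption based on the "invalid input" return code.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two tstamp types named above. */
enum model_tstamp_type {
    MODEL_TSTAMP_UNSPEC,
    MODEL_TSTAMP_DELIVERY_MONO,
};

/* Returns 0 if the (tstamp, tstamp_type) pair is acceptable. */
int tstamp_args_check_model(uint64_t tstamp, uint32_t tstamp_type)
{
    switch (tstamp_type) {
    case MODEL_TSTAMP_DELIVERY_MONO:
        return tstamp ? 0 : -EINVAL;   /* needs a non-zero tstamp */
    case MODEL_TSTAMP_UNSPEC:
        return tstamp ? -EINVAL : 0;   /* only valid with a zero tstamp */
    default:
        return -EINVAL;                /* assumed: unknown types are invalid */
    }
}
```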
4121
4122 long bpf_ima_file_hash(struct file *file, void *dst, u32 size)
4123
4124 Description
4125 Returns a calculated IMA hash of the file. If the hash
4126 is larger than size, then only size bytes will be copied
4127 to dst.
4128
4129 Return The hash_algo is returned on success, -EOPNOTSUPP if the
4130 hash calculation failed or -EINVAL if invalid arguments
4131 are passed.
4132
4133 void *bpf_kptr_xchg(void *map_value, void *ptr)
4134
4135 Description
4136 Exchange kptr at pointer map_value with ptr, and return
4137 the old value. ptr can be NULL, otherwise it must be a
4138 referenced pointer which will be released when this
4139 helper is called.
4140
4141 Return The old value of kptr (which can be NULL). The returned
4142 pointer if not NULL, is a reference which must be re‐
4143 leased using its corresponding release function, or moved
4144 into a BPF map before program exit.
4145
4146 void *bpf_map_lookup_percpu_elem(struct bpf_map *map, const void *key,
4147 u32 cpu)
4148
4149 Description
4150 Perform a lookup in percpu map for an entry associated to
4151 key on cpu.
4152
4153 Return Map value associated to key on cpu, or NULL if no entry
4154 was found or cpu is invalid.
4155
4156 struct mptcp_sock *bpf_skc_to_mptcp_sock(void *sk)
4157
4158 Description
4159 Dynamically cast a sk pointer to a mptcp_sock pointer.
4160
4161 Return sk if casting is valid, or NULL otherwise.
4162
4163 long bpf_dynptr_from_mem(void *data, u32 size, u64 flags, struct
4164 bpf_dynptr *ptr)
4165
4166 Description
4167 Get a dynptr to local memory data.
4168
4169 data must be a ptr to a map value. The maximum size sup‐
4170 ported is DYNPTR_MAX_SIZE. flags is currently unused.
4171
4172 Return 0 on success, -E2BIG if the size exceeds DYNPTR_MAX_SIZE,
4173 -EINVAL if flags is not 0.
4174
4175 long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags,
4176 struct bpf_dynptr *ptr)
4177
4178 Description
4179 Reserve size bytes of payload in a ring buffer ringbuf
4180 through the dynptr interface. flags must be 0.
4181
4182 Please note that a corresponding bpf_ringbuf_sub‐
4183 mit_dynptr or bpf_ringbuf_discard_dynptr must be called
4184 on ptr, even if the reservation fails. This is enforced
4185 by the verifier.
4186
4187 Return 0 on success, or a negative error in case of failure.
4188
4189 void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags)
4190
4191 Description
4192 Submit reserved ring buffer sample, pointed to by ptr,
4193 through the dynptr interface. This is a no-op if the
4194 dynptr is invalid/null.
4195
4196 For more information on flags, please see 'bpf_ring‐
4197 buf_submit'.
4198
4199 Return Nothing. Always succeeds.
4200
4201 void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags)
4202
4203 Description
4204 Discard reserved ring buffer sample through the dynptr
4205 interface. This is a no-op if the dynptr is invalid/null.
4206
4207 For more information on flags, please see 'bpf_ring‐
4208 buf_discard'.
4209
4210 Return Nothing. Always succeeds.
4211
4212 long bpf_dynptr_read(void *dst, u32 len, const struct bpf_dynptr *src,
4213 u32 offset, u64 flags)
4214
4215 Description
4216 Read len bytes from src into dst, starting from offset
4217 into src. flags is currently unused.
4218
4219 Return 0 on success, -E2BIG if offset + len exceeds the length
4220 of src's data, -EINVAL if src is an invalid dynptr or if
4221 flags is not 0.
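
The bounds and flags checks listed for bpf_dynptr_read() can be pictured with a simplified user-space model. The struct below is a hypothetical stand-in for a dynptr (the real struct bpf_dynptr is opaque), and treating a NULL data pointer as "invalid dynptr" is an assumption of this sketch.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of a dynptr: backing memory and its size. */
struct dynptr_model {
    const unsigned char *data;   /* NULL models an invalid dynptr */
    uint32_t size;
};

/* Copy len bytes starting at offset in src into dst. */
long dynptr_read_model(void *dst, uint32_t len,
                       const struct dynptr_model *src,
                       uint32_t offset, uint64_t flags)
{
    if (!src->data || flags)
        return -EINVAL;

    /* Use 64-bit arithmetic so offset + len cannot wrap around. */
    if ((uint64_t)offset + len > src->size)
        return -E2BIG;

    memcpy(dst, src->data + offset, len);
    return 0;
}
```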
4222
4223 long bpf_dynptr_write(const struct bpf_dynptr *dst, u32 offset, void
4224 *src, u32 len, u64 flags)
4225
4226 Description
4227 Write len bytes from src into dst, starting from offset
4228 into dst. flags is currently unused.
4229
4230 Return 0 on success, -E2BIG if offset + len exceeds the length
4231 of dst's data, -EINVAL if dst is an invalid dynptr or if
4232 dst is a read-only dynptr or if flags is not 0.
4233
4234 void *bpf_dynptr_data(const struct bpf_dynptr *ptr, u32 offset, u32
4235 len)
4236
4237 Description
4238 Get a pointer to the underlying dynptr data.
4239
4240 len must be a statically known value. The returned data
4241 slice is invalidated whenever the dynptr is invalidated.
4242
4243 Return Pointer to the underlying dynptr data, or NULL if the
4244 dynptr is read-only, if the dynptr is invalid, or if the
4245 offset and length are out of bounds.
4246
4247 s64 bpf_tcp_raw_gen_syncookie_ipv4(struct iphdr *iph, struct tcphdr
4248 *th, u32 th_len)
4249
4250 Description
4251 Try to issue a SYN cookie for the packet with correspond‐
4252 ing IPv4/TCP headers, iph and th, without depending on a
4253 listening socket.
4254
4255 iph points to the IPv4 header.
4256
4257 th points to the start of the TCP header, while th_len
4258 contains the length of the TCP header (at least
4259 sizeof(struct tcphdr)).
4260
4261 Return On success, the lower 32 bits hold the generated SYN
4262 cookie, followed by 16 bits which hold the MSS value for
4263 that cookie; the top 16 bits are unused.
4264
4265 On failure, the returned value is one of the following:
4266
4267 -EINVAL if th_len is invalid.
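
The packed return value can be unpacked with plain integer arithmetic. The following is a minimal sketch, assuming the layout described above (cookie in bits 0-31, MSS in bits 32-47). The struct and function names are hypothetical and not part of any kernel API.

```c
#include <stdint.h>

/* Hypothetical container for the two fields packed into the return value. */
struct syncookie_result {
    uint32_t cookie; /* bits 0-31: generated SYN cookie */
    uint16_t mss;    /* bits 32-47: MSS value for that cookie */
};

/*
 * Unpack a return value of bpf_tcp_raw_gen_syncookie_ipv4() or
 * bpf_tcp_raw_gen_syncookie_ipv6(). Returns 0 and fills *out on success,
 * or passes the negative error code through unchanged on failure.
 */
static int syncookie_unpack(int64_t ret, struct syncookie_result *out)
{
    if (ret < 0)
        return (int)ret;

    out->cookie = (uint32_t)((uint64_t)ret & 0xffffffffu);
    out->mss = (uint16_t)(((uint64_t)ret >> 32) & 0xffffu);
    return 0;
}
```

Checking for a negative value first matters, because an error code would otherwise be misread as a cookie and an MSS.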
4268
4269 s64 bpf_tcp_raw_gen_syncookie_ipv6(struct ipv6hdr *iph, struct tcphdr
4270 *th, u32 th_len)
4271
4272 Description
4273 Try to issue a SYN cookie for the packet with correspond‐
4274 ing IPv6/TCP headers, iph and th, without depending on a
4275 listening socket.
4276
4277 iph points to the IPv6 header.
4278
4279 th points to the start of the TCP header, while th_len
4280 contains the length of the TCP header (at least
4281 sizeof(struct tcphdr)).
4282
4283 Return On success, the lower 32 bits hold the generated SYN
4284 cookie, followed by 16 bits which hold the MSS value for
4285 that cookie; the top 16 bits are unused.
4286
4287 On failure, the returned value is one of the following:
4288
4289 -EINVAL if th_len is invalid.
4290
4291 -EPROTONOSUPPORT if CONFIG_IPV6 is not built in.
4292
4293 long bpf_tcp_raw_check_syncookie_ipv4(struct iphdr *iph, struct tcphdr
4294 *th)
4295
4296 Description
4297 Check whether iph and th contain a valid SYN cookie ACK
4298 without depending on a listening socket.
4299
4300 iph points to the IPv4 header.
4301
4302 th points to the TCP header.
4303
4304 Return 0 if iph and th are a valid SYN cookie ACK.
4305
4306 On failure, the returned value is one of the following:
4307
4308 -EACCES if the SYN cookie is not valid.
4309
4310 long bpf_tcp_raw_check_syncookie_ipv6(struct ipv6hdr *iph, struct
4311 tcphdr *th)
4312
4313 Description
4314 Check whether iph and th contain a valid SYN cookie ACK
4315 without depending on a listening socket.
4316
4317 iph points to the IPv6 header.
4318
4319 th points to the TCP header.
4320
4321 Return 0 if iph and th are a valid SYN cookie ACK.
4322
4323 On failure, the returned value is one of the following:
4324
4325 -EACCES if the SYN cookie is not valid.
4326
4327 -EPROTONOSUPPORT if CONFIG_IPV6 is not built in.
4328
4329 u64 bpf_ktime_get_tai_ns(void)
4330
4331 Description
4332 A nonsettable system-wide clock derived from wall-clock
4333 time but ignoring leap seconds. This clock does not ex‐
4334 perience discontinuities and backwards jumps caused by
4335 NTP inserting leap seconds as CLOCK_REALTIME does.
4336
4337 See: clock_gettime(CLOCK_TAI)
4338
4339 Return Current ktime.
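
For comparison, user space can read the same clock through clock_gettime(2). The following is a minimal sketch, assuming a Linux system. The fallback definition of CLOCK_TAI uses the Linux clock ID 11 for C libraries that do not define it, and the function name tai_now_ns is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* expose clock_gettime() and CLOCK_TAI in strict modes */
#endif
#include <stdint.h>
#include <time.h>

#ifndef CLOCK_TAI
#define CLOCK_TAI 11 /* Linux clock ID, for C libraries that do not define it */
#endif

/*
 * User-space counterpart of bpf_ktime_get_tai_ns(): return the current
 * CLOCK_TAI time in nanoseconds, or 0 if the clock cannot be read.
 */
static uint64_t tai_now_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_TAI, &ts) != 0)
        return 0;

    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
```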
4340
4341 long bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn,
4342 void *ctx, u64 flags)
4343
4344 Description
4345 Drain samples from the specified user ring buffer, and
4346 invoke the provided callback for each such sample:
4347
4348 long (*callback_fn)(const struct bpf_dynptr *dynptr, void
4349 *ctx);
4350
4351 If callback_fn returns 0, the helper will continue to try
4352 and drain the next sample, up to a maximum of
4353 BPF_MAX_USER_RINGBUF_SAMPLES samples. If the return value
4354 is 1, the helper will skip the rest of the samples and
4355 return. Other return values are not used now, and will be
4356 rejected by the verifier.
4357
4358 Return The number of drained samples if no error was encountered
4359 while draining samples, or 0 if no samples were present
4360 in the ring buffer. If a user-space producer was
4361 epoll-waiting on this map, and at least one sample was
4362 drained, they will receive an event notification notify‐
4363 ing them of available space in the ring buffer. If the
4364 BPF_RB_NO_WAKEUP flag is passed to this function, no
4365 wakeup notification will be sent. If the
4366 BPF_RB_FORCE_WAKEUP flag is passed, a wakeup notification
4367 will be sent even if no sample was drained.
4368
4369 On failure, the returned value is one of the following:
4370
4371 -EBUSY if the ring buffer is contended, and another call‐
4372 ing context was concurrently draining the ring buffer.
4373
4374 -EINVAL if user-space is not properly tracking the ring
4375 buffer due to the producer position not being aligned to
4376 8 bytes, a sample not being aligned to 8 bytes, or the
4377 producer position not matching the advertised length of a
4378 sample.
4379
4380 -E2BIG if user-space has tried to publish a sample which
4381 is larger than the size of the ring buffer, or which can‐
4382 not fit within a struct bpf_dynptr.
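
The callback contract can be modeled in ordinary C. The sketch below is a user-space model only, not the kernel implementation: the int samples, the drain_model function, and the cap MODEL_MAX_SAMPLES are hypothetical stand-ins. The real limit is BPF_MAX_USER_RINGBUF_SAMPLES, and real callbacks receive a struct bpf_dynptr.

```c
#include <stddef.h>

/* Hypothetical stand-in for BPF_MAX_USER_RINGBUF_SAMPLES. */
#define MODEL_MAX_SAMPLES 4

/* Model of a drain callback: return 0 to continue, 1 to stop draining. */
typedef long (*drain_cb)(const int *sample, void *ctx);

/*
 * Model of the drain loop: invoke cb on each pending sample until the
 * samples run out, the cap is reached, or cb returns 1. In this model the
 * return value is the number of samples on which cb was invoked.
 */
static long drain_model(const int *samples, size_t n, drain_cb cb, void *ctx)
{
    long drained = 0;

    for (size_t i = 0; i < n && i < MODEL_MAX_SAMPLES; i++) {
        long ret = cb(&samples[i], ctx);

        drained++;
        if (ret == 1)
            break;
    }
    return drained;
}

/* Example callback: accumulate samples into *ctx, stop at the first zero. */
static long sum_until_zero(const int *sample, void *ctx)
{
    long *sum = ctx;

    if (*sample == 0)
        return 1;
    *sum += *sample;
    return 0;
}
```

With the samples {1, 2, 0, 5}, the model invokes the callback three times and leaves the last sample undrained. With more samples than the cap, it stops at the cap.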
4383
4384 void *bpf_cgrp_storage_get(struct bpf_map *map, struct cgroup *cgroup,
4385 void *value, u64 flags)
4386
4387 Description
4388 Get a bpf_local_storage from the cgroup.
4389
4390 Logically, it could be thought of as getting the value
4391 from a map with cgroup as the key. From this perspec‐
4392 tive, the usage is not much different from
4393 bpf_map_lookup_elem(map, &cgroup) except that this helper
4394 enforces that the key must be a cgroup struct and that the
4395 map must be a BPF_MAP_TYPE_CGRP_STORAGE.
4396
4397 In reality, the local-storage value is embedded directly
4398 inside of the cgroup object itself, rather than being lo‐
4399 cated in the BPF_MAP_TYPE_CGRP_STORAGE map. When the lo‐
4400 cal-storage value is queried for some map on a cgroup ob‐
4401 ject, the kernel will perform an O(n) iteration over all
4402 of the live local-storage values for that cgroup object
4403 until the local-storage value for the map is found.
4404
4405 The optional flag BPF_LOCAL_STORAGE_GET_F_CREATE can be
4406 passed in flags so that a new bpf_local_storage is created
4407 if one does not exist. value can be used together with
4408 BPF_LOCAL_STORAGE_GET_F_CREATE to specify the initial
4409 value of a bpf_local_storage. If value is NULL, the new
4410 bpf_local_storage will be zero initialized.
4411
4412 Return A bpf_local_storage pointer is returned on success.
4413
4414 NULL if not found or there was an error in adding a new
4415 bpf_local_storage.
4416
4417 long bpf_cgrp_storage_delete(struct bpf_map *map, struct cgroup
4418 *cgroup)
4419
4420 Description
4421 Delete a bpf_local_storage from a cgroup.
4422
4423 Return 0 on success.
4424
4425 -ENOENT if the bpf_local_storage cannot be found.
4426
4428 Examples of usage for most of the eBPF helpers listed in this manual
4429 page are available within the Linux kernel sources, at the following
4430 locations:
4431
4432 • samples/bpf/
4433
4434 • tools/testing/selftests/bpf/
4435
4437 eBPF programs can have an associated license, passed along with the
4438 bytecode instructions to the kernel when the programs are loaded. The
4439 format for that string is identical to the one in use for kernel mod‐
4440 ules (Dual licenses, such as "Dual BSD/GPL", may be used). Some helper
4441 functions are only accessible to programs that are compatible with the
4442 GNU General Public License (GPL).
4443
4444 In order to use such helpers, the eBPF program must be loaded with the
4445 correct license string passed (via attr) to the bpf() system call, and
4446 this generally translates into the C source code of the program con‐
4447 taining a line similar to the following:
4448
4449 char ____license[] __attribute__((section("license"), used)) = "GPL";
4450
4452 This manual page is an effort to document the existing eBPF helper
4453 functions. But as of this writing, the BPF sub-system is under heavy
4454 development. New eBPF program or map types are added, along with new
4455 helper functions. Some helpers are occasionally made available for ad‐
4456 ditional program types. So in spite of the efforts of the community,
4457 this page might not be up to date. If you want to check for yourself
4458 what helper functions exist in your kernel, or what types of programs
4459 they can support, here are some files in the kernel tree that you
4460 may be interested in:
4461
4462 • include/uapi/linux/bpf.h is the main BPF header. It contains the full
4463 list of all helper functions, as well as many other BPF definitions
4464 including most of the flags, structs or constants used by the
4465 helpers.
4466
4467 • net/core/filter.c contains the definition of most network-related
4468 helper functions, and the list of program types from which they can
4469 be used.
4470
4471 • kernel/trace/bpf_trace.c is the equivalent for most tracing pro‐
4472 gram-related helpers.
4473
4474 • kernel/bpf/verifier.c contains the functions used to check that valid
4475 types of eBPF maps are used with a given helper function.
4476
4477 • The kernel/bpf/ directory contains other files in which additional
4478 helpers are defined (for cgroups, sockmaps, etc.).
4479
4480 • The bpftool utility can be used to probe the availability of helper
4481 functions on the system (as well as supported program and map types,
4482 and a number of other parameters). To do so, run bpftool feature
4483 probe (see bpftool-feature(8) for details). Add the unprivileged key‐
4484 word to list features available to unprivileged users.
4485
4486 Compatibility between helper functions and program types can generally
4487 be found in the files where helper functions are defined. Look for the
4488 struct bpf_func_proto objects and for functions returning them: these
4489 functions contain a list of helpers that a given program type can call.
4490 Note that the default: label of the switch ... case used to filter
4491 helpers can call other functions, themselves allowing access to addi‐
4492 tional helpers. The requirement for GPL license is also in those struct
4493 bpf_func_proto.
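
The lookup pattern described above can be sketched in plain C. This is an illustrative model only: the enum, the struct layout, and the function names below are hypothetical and much simpler than the kernel's struct bpf_func_proto and its per-program-type lookup functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper IDs, standing in for the kernel's helper ID enum. */
enum model_func_id {
    MODEL_FUNC_map_lookup_elem,
    MODEL_FUNC_trace_printk,
    MODEL_FUNC_skb_store_bytes,
    MODEL_FUNC_probe_read,
};

/* Hypothetical, simplified stand-in for struct bpf_func_proto. */
struct model_func_proto {
    const char *name;
    bool gpl_only; /* the GPL requirement is recorded in the prototype */
};

static const struct model_func_proto map_lookup_elem_proto = { "map_lookup_elem", false };
static const struct model_func_proto trace_printk_proto = { "trace_printk", true };
static const struct model_func_proto skb_store_bytes_proto = { "skb_store_bytes", false };

/* Helpers available to every program type in this model. */
static const struct model_func_proto *model_base_func_proto(enum model_func_id id)
{
    switch (id) {
    case MODEL_FUNC_map_lookup_elem:
        return &map_lookup_elem_proto;
    case MODEL_FUNC_trace_printk:
        return &trace_printk_proto;
    default:
        return NULL; /* helper not available */
    }
}

/* Helpers for one hypothetical network program type. */
static const struct model_func_proto *model_net_func_proto(enum model_func_id id)
{
    switch (id) {
    case MODEL_FUNC_skb_store_bytes:
        return &skb_store_bytes_proto;
    default:
        /* The default: label delegates, granting the base helpers too. */
        return model_base_func_proto(id);
    }
}
```

In this model, a NULL result means the program type cannot call the helper. When reading the kernel sources, following the default: branch in the same way reveals the full set of helpers a program type can reach.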
4494
4495 Compatibility between helper functions and map types can be found in
4496 the check_map_func_compatibility() function in file kernel/bpf/veri‐
4497 fier.c.
4498
4499 Helper functions that invalidate the checks on data and data_end point‐
4500 ers for network processing are listed in function
4501 bpf_helper_changes_pkt_data() in file net/core/filter.c.
4502
4504 bpf(2), bpftool(8), cgroups(7), ip(8), perf_event_open(2), sendmsg(2),
4505 socket(7), tc-bpf(8)
4506
4507
4508
4509
4510Linux v6.2 2023-04-11 BPF-HELPERS(7)