BPF-HELPERS(7)      Miscellaneous Information Manual      BPF-HELPERS(7)

6 BPF-HELPERS - list of eBPF helper functions
7
The extended Berkeley Packet Filter (eBPF) subsystem consists of
programs written in a pseudo-assembly language, then attached to one of
the several kernel hooks and run in reaction to specific events. This
framework differs from the older, "classic" BPF (or "cBPF") in several
aspects, one of them being the ability to call special functions (or
"helpers") from within a program. These functions are restricted to a
white-list of helpers defined in the kernel.
16
These helpers are used by eBPF programs to interact with the system, or
with the context in which they work. For instance, they can be used to
print debugging messages, to get the time since the system was booted,
to interact with eBPF maps, or to manipulate network packets. Because
the several eBPF program types do not all run in the same context, each
program type can only call a subset of those helpers.
24
Due to eBPF conventions, a helper cannot have more than five arguments.
27
Internally, eBPF programs call directly into the compiled helper
functions without requiring any foreign-function interface. As a
result, calling a helper adds no overhead beyond that of the call
itself, which keeps performance excellent.
32
This document is an attempt to list and document the helpers available
to eBPF developers. They are listed in chronological order (the oldest
helpers in the kernel at the top).
36
38 void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
39
Description
Perform a lookup in map for an entry associated with key.

Return Map value associated with key, or NULL if no entry was found.
45
46 long bpf_map_update_elem(struct bpf_map *map, const void *key, const
47 void *value, u64 flags)
48
49 Description
50 Add or update the value of the entry associated to key in
51 map with value. flags is one of:
52
53 BPF_NOEXIST
54 The entry for key must not exist in the map.
55
56 BPF_EXIST
57 The entry for key must already exist in the map.
58
59 BPF_ANY
60 No condition on the existence of the entry for
61 key.
62
Flag value BPF_NOEXIST cannot be used for maps of types
BPF_MAP_TYPE_ARRAY or BPF_MAP_TYPE_PERCPU_ARRAY: all elements of such
maps always exist, so the helper would return an error.
66
67 Return 0 on success, or a negative error in case of failure.
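
The existence checks requested by the three flags can be pictured with a small user-space model. This is only a sketch: struct toy_map, toy_find() and toy_update() are hypothetical names that exist nowhere in the kernel, and the error codes are an assumption modelled on what hash maps commonly return. The numeric flag values are those of include/uapi/linux/bpf.h.

```c
#include <errno.h>
#include <stdint.h>

/* Flag values as defined in include/uapi/linux/bpf.h. */
enum { BPF_ANY = 0, BPF_NOEXIST = 1, BPF_EXIST = 2 };

#define TOY_SLOTS 8

/* Hypothetical stand-in for a map: a tiny table of key/value slots. */
struct toy_map {
    uint32_t keys[TOY_SLOTS];
    uint64_t values[TOY_SLOTS];
    int used[TOY_SLOTS];
};

/* Return the slot holding key, or -1 if the key is absent. */
static int toy_find(const struct toy_map *m, uint32_t key)
{
    for (int i = 0; i < TOY_SLOTS; i++)
        if (m->used[i] && m->keys[i] == key)
            return i;
    return -1;
}

/* Mimic the existence checks that the three flags request. */
static long toy_update(struct toy_map *m, uint32_t key, uint64_t value,
                       uint64_t flags)
{
    int slot = toy_find(m, key);

    if (flags > BPF_EXIST)
        return -EINVAL;     /* flag not handled by this toy model */
    if (flags == BPF_NOEXIST && slot >= 0)
        return -EEXIST;     /* entry must not exist, but it does */
    if (flags == BPF_EXIST && slot < 0)
        return -ENOENT;     /* entry must exist, but it does not */

    if (slot < 0) {         /* insert into the first free slot */
        for (slot = 0; slot < TOY_SLOTS && m->used[slot]; slot++)
            ;
        if (slot == TOY_SLOTS)
            return -E2BIG;  /* table is full */
        m->used[slot] = 1;
        m->keys[slot] = key;
    }
    m->values[slot] = value;
    return 0;
}
```

With this model, updating a missing key with BPF_EXIST fails, inserting it with BPF_NOEXIST succeeds once and fails the second time, and BPF_ANY succeeds in both situations.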
68
69 long bpf_map_delete_elem(struct bpf_map *map, const void *key)
70
71 Description
72 Delete entry with key from map.
73
74 Return 0 on success, or a negative error in case of failure.
75
76 long bpf_probe_read(void *dst, u32 size, const void *unsafe_ptr)
77
78 Description
79 For tracing programs, safely attempt to read size bytes
80 from kernel space address unsafe_ptr and store the data
81 in dst.
82
83 Generally, use bpf_probe_read_user() or
84 bpf_probe_read_kernel() instead.
85
86 Return 0 on success, or a negative error in case of failure.
87
88 u64 bpf_ktime_get_ns(void)
89
90 Description
91 Return the time elapsed since system boot, in nanosec‐
92 onds. Does not include time the system was suspended.
93 See: clock_gettime(CLOCK_MONOTONIC)
94
95 Return Current ktime.
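
For comparison, user space can read the clock that the description points to with clock_gettime(CLOCK_MONOTONIC). The sketch below converts the result to nanoseconds; monotonic_ns() is a hypothetical name, and whether its values line up exactly with those seen by an eBPF program is an assumption that should be checked on the target kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* User-space reading of CLOCK_MONOTONIC, converted to nanoseconds. */
static uint64_t monotonic_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0;   /* not expected to fail for CLOCK_MONOTONIC */
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
```

Subtracting two such readings gives an elapsed time in nanoseconds, which is the typical use of bpf_ktime_get_ns() inside a program as well.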
96
97 long bpf_trace_printk(const char *fmt, u32 fmt_size, ...)
98
99 Description
This helper is a "printk()-like" facility for debugging. It prints a
message defined by format fmt (of size fmt_size) to file
/sys/kernel/debug/tracing/trace from DebugFS, if available. It can take
up to three additional u64 arguments (as with any eBPF helper, the
total number of arguments is limited to five).
106
Each time the helper is called, it appends a line to the trace. Lines
are discarded while /sys/kernel/debug/tracing/trace is open; use
/sys/kernel/debug/tracing/trace_pipe to avoid this. The format of the
trace is customizable, and the exact output one will get depends on the
options set in /sys/kernel/debug/tracing/trace_options (see also the
README file under the same directory). However, it usually defaults to
something like:

    telnet-470 [001] .N.. 419421.045894: 0x00000001: <formatted msg>
117
118 In the above:
119
120 • telnet is the name of the current task.
121
122 • 470 is the PID of the current task.
123
124 • 001 is the CPU number on which the task is running.
125
126 • In .N.., each character refers to a set of options
127 (whether irqs are enabled, scheduling options,
128 whether hard/softirqs are running, level of pre‐
129 empt_disabled respectively). N means that
130 TIF_NEED_RESCHED and PREEMPT_NEED_RESCHED are set.
131
132 • 419421.045894 is a timestamp.
133
134 • 0x00000001 is a fake value used by BPF for the in‐
135 struction pointer register.
136
137 • <formatted msg> is the message formatted with fmt.
138
The conversion specifiers supported by fmt are similar to, but more
limited than, those of printk(). They are %d, %i, %u, %x, %ld, %li,
%lu, %lx, %lld, %lli, %llu, %llx, %p, %s. No modifier (size of field,
padding with zeroes, etc.) is available, and the helper will return
-EINVAL (but print nothing) if it encounters an unknown specifier.
145
146 Also, note that bpf_trace_printk() is slow, and should
147 only be used for debugging purposes. For this reason, a
148 notice block (spanning several lines) is printed to ker‐
149 nel logs and states that the helper should not be used
150 "for production use" the first time this helper is used
151 (or more precisely, when trace_printk() buffers are allo‐
152 cated). For passing values to user space, perf events
153 should be preferred.
154
155 Return The number of bytes written to the buffer, or a negative
156 error in case of failure.
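
A user-space reader of the trace can split a line in the default layout shown above into its fields. The following is a minimal sketch, assuming that default layout; struct trace_line and parse_trace_line() are hypothetical names, and a different trace_options setting may produce lines this parser rejects.

```c
#include <stdio.h>
#include <string.h>

struct trace_line {
    char comm[32];      /* task name */
    int pid;            /* PID of the task */
    int cpu;            /* CPU number */
    char flags[8];      /* option characters, e.g. ".N.." */
    double timestamp;   /* timestamp as printed */
};

/* Parse "<comm>-<pid> [<cpu>] <flags> <timestamp>: ..." into out.
 * Returns 0 on success, -1 if the line does not match. */
static int parse_trace_line(const char *line, struct trace_line *out)
{
    const char *bracket;
    const char *dash;
    size_t len;

    while (*line == ' ')        /* skip leading padding, if any */
        line++;
    bracket = strstr(line, " [");
    if (!bracket)
        return -1;
    /* The task name may itself contain '-', so take the last one
     * before the CPU field as the separator. */
    for (dash = bracket; dash > line && *dash != '-'; dash--)
        ;
    if (*dash != '-')
        return -1;

    len = (size_t)(dash - line);
    if (len >= sizeof(out->comm))
        len = sizeof(out->comm) - 1;
    memcpy(out->comm, line, len);
    out->comm[len] = '\0';

    if (sscanf(dash + 1, "%d [%d] %7s %lf:", &out->pid, &out->cpu,
               out->flags, &out->timestamp) != 4)
        return -1;
    return 0;
}
```

The message itself follows the second colon-terminated field (the fake instruction pointer) and is left untouched by this sketch.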
157
158 u32 bpf_get_prandom_u32(void)
159
160 Description
161 Get a pseudo-random number.
162
163 From a security point of view, this helper uses its own
164 pseudo-random internal state, and cannot be used to infer
165 the seed of other random functions in the kernel. How‐
166 ever, it is essential to note that the generator used by
167 the helper is not cryptographically secure.
168
169 Return A random 32-bit unsigned value.
170
171 u32 bpf_get_smp_processor_id(void)
172
173 Description
Get the SMP (symmetric multiprocessing) processor id. Note that all
programs run with migration disabled, which means that the SMP
processor id is stable throughout the execution of the program.
178
179 Return The SMP id of the processor running the program.
180
181 long bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void
182 *from, u32 len, u64 flags)
183
184 Description
185 Store len bytes from address from into the packet associ‐
186 ated to skb, at offset. flags are a combination of
187 BPF_F_RECOMPUTE_CSUM (automatically recompute the check‐
188 sum for the packet after storing the bytes) and BPF_F_IN‐
189 VALIDATE_HASH (set skb->hash, skb->swhash and skb->l4hash
190 to 0).
191
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by the
verifier are invalidated and must be performed again, if the helper is
used in combination with direct packet access.
197
198 Return 0 on success, or a negative error in case of failure.
199
200 long bpf_l3_csum_replace(struct sk_buff *skb, u32 offset, u64 from, u64
201 to, u64 size)
202
203 Description
204 Recompute the layer 3 (e.g. IP) checksum for the packet
205 associated to skb. Computation is incremental, so the
206 helper must know the former value of the header field
207 that was modified (from), the new value of this field
208 (to), and the number of bytes (2 or 4) for this field,
209 stored in size. Alternatively, it is possible to store
210 the difference between the previous and the new values of
211 the header field in to, by setting from and size to 0.
212 For both methods, offset indicates the location of the IP
213 checksum within the packet.
214
215 This helper works in combination with bpf_csum_diff(),
216 which does not update the checksum in-place, but offers
217 more flexibility and can handle sizes larger than 2 or 4
218 for the checksum to update.
219
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by the
verifier are invalidated and must be performed again, if the helper is
used in combination with direct packet access.
225
226 Return 0 on success, or a negative error in case of failure.
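
The incremental computation described above can be illustrated in user space with the update formula from RFC 1624, HC' = ~(~HC + ~m + m'). This is a sketch of the arithmetic only, not the kernel implementation; csum_full() and csum_replace2() are hypothetical names, and the 16-bit words are handled as plain integers.

```c
#include <stddef.h>
#include <stdint.h>

/* Full one's-complement checksum over 16-bit words (RFC 1071). */
static uint16_t csum_full(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += words[i];
    while (sum >> 16)                   /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Incremental update for one 2-byte field (RFC 1624, equation 3):
 * new checksum = ~(~check + ~from + to). */
static uint16_t csum_replace2(uint16_t check, uint16_t from, uint16_t to)
{
    uint32_t sum = (uint16_t)~check;

    sum += (uint16_t)~from;
    sum += to;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Changing a single header word and calling csum_replace2() with its old and new values yields the same checksum as recomputing over the whole header with csum_full(), without touching the other words.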
227
228 long bpf_l4_csum_replace(struct sk_buff *skb, u32 offset, u64 from, u64
229 to, u64 flags)
230
231 Description
Recompute the layer 4 (e.g. TCP, UDP or ICMP) checksum for the packet
associated with skb. Computation is incremental, so the helper must
know the former value of the header field that was modified (from), the
new value of this field (to), and the number of bytes (2 or 4) for this
field, stored in the lowest four bits of flags. Alternatively, it is
possible to store the difference between the previous and the new
values of the header field in to, by setting from and the four lowest
bits of flags to 0. For both methods, offset indicates the location of
the checksum within the packet. In addition to the size of the field,
actual flags can be combined into flags with a bitwise OR. With
BPF_F_MARK_MANGLED_0, a null checksum is left untouched (unless
BPF_F_MARK_ENFORCE is added as well), and for updates resulting in a
null checksum the value is set to CSUM_MANGLED_0 instead. Flag
BPF_F_PSEUDO_HDR indicates the checksum is to be computed against a
pseudo-header.
250
251 This helper works in combination with bpf_csum_diff(),
252 which does not update the checksum in-place, but offers
253 more flexibility and can handle sizes larger than 2 or 4
254 for the checksum to update.
255
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by the
verifier are invalidated and must be performed again, if the helper is
used in combination with direct packet access.
261
262 Return 0 on success, or a negative error in case of failure.
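
Since flags carries both the field size and the actual flags, a small sketch of how the two are combined may help. The constants below mirror include/uapi/linux/bpf.h at the time of writing and are redefined here only to keep the example self-contained; l4_csum_flags() is a hypothetical helper name.

```c
#include <stdint.h>

/* Values mirror include/uapi/linux/bpf.h; real programs should take
 * them from that header instead of redefining them. */
#define BPF_F_HDR_FIELD_MASK  0xfULL
#define BPF_F_PSEUDO_HDR      (1ULL << 4)
#define BPF_F_MARK_MANGLED_0  (1ULL << 5)
#define BPF_F_MARK_ENFORCE    (1ULL << 6)

/* Put the field size (0, 2 or 4) in the lowest four bits and OR in
 * the requested flags. */
static uint64_t l4_csum_flags(unsigned int field_size, uint64_t extra)
{
    return ((uint64_t)field_size & BPF_F_HDR_FIELD_MASK) | extra;
}
```

For instance, l4_csum_flags(4, BPF_F_PSEUDO_HDR) describes the replacement of a 4-byte field, such as an IPv4 address, that is covered by the pseudo-header.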
263
264 long bpf_tail_call(void *ctx, struct bpf_map *prog_array_map, u32 in‐
265 dex)
266
267 Description
268 This special helper is used to trigger a "tail call", or
269 in other words, to jump into another eBPF program. The
270 same stack frame is used (but values on stack and in reg‐
271 isters for the caller are not accessible to the callee).
272 This mechanism allows for program chaining, either for
273 raising the maximum number of available eBPF instruc‐
274 tions, or to execute given programs in conditional
275 blocks. For security reasons, there is an upper limit to
276 the number of successive tail calls that can be per‐
277 formed.
278
279 Upon call of this helper, the program attempts to jump
280 into a program referenced at index index in prog_ar‐
281 ray_map, a special map of type BPF_MAP_TYPE_PROG_ARRAY,
282 and passes ctx, a pointer to the context.
283
If the call succeeds, the kernel immediately runs the first instruction
of the new program. This is not a function call, and it never returns
to the previous program. If the call fails, then the helper has no
effect, and the caller continues to run its subsequent instructions. A
call can fail if the destination program for the jump does not exist
(i.e. index is greater than or equal to the number of entries in
prog_array_map), or if the maximum number of tail calls has been
reached for this chain of programs. This limit is defined in the kernel
by the macro MAX_TAIL_CALL_CNT (not accessible to user space), which is
currently set to 33.
296
297 Return 0 on success, or a negative error in case of failure.
298
299 long bpf_clone_redirect(struct sk_buff *skb, u32 ifindex, u64 flags)
300
301 Description
302 Clone and redirect the packet associated to skb to an‐
303 other net device of index ifindex. Both ingress and
304 egress interfaces can be used for redirection. The
305 BPF_F_INGRESS value in flags is used to make the distinc‐
306 tion (ingress path is selected if the flag is present,
307 egress path otherwise). This is the only flag supported
308 for now.
309
In comparison with the bpf_redirect() helper, bpf_clone_redirect() has
the associated cost of duplicating the packet buffer, but the
redirection is carried out from within the eBPF program. Conversely,
bpf_redirect() is more efficient, but it is handled through an action
code where the redirection happens only after the eBPF program has
returned.
316
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by the
verifier are invalidated and must be performed again, if the helper is
used in combination with direct packet access.
322
323 Return 0 on success, or a negative error in case of failure.
324
325 u64 bpf_get_current_pid_tgid(void)
326
327 Description
328 Get the current pid and tgid.
329
330 Return A 64-bit integer containing the current tgid and pid, and
331 created as such: current_task->tgid << 32 | cur‐
332 rent_task->pid.
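
The packing described in the return value can be undone with a shift and a truncation. The sketch below uses hypothetical helper names (pack_pid_tgid(), tgid_of(), pid_of()) and plain integers in place of the task struct. As a reminder, the tgid is what user space usually calls the process ID, while the pid identifies the individual thread.

```c
#include <stdint.h>

/* Build the 64-bit value the same way the helper does. */
static uint64_t pack_pid_tgid(uint32_t tgid, uint32_t pid)
{
    return (uint64_t)tgid << 32 | pid;
}

/* Upper 32 bits: thread group id (the process ID seen by user space). */
static uint32_t tgid_of(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid >> 32);
}

/* Lower 32 bits: id of the individual thread. */
static uint32_t pid_of(uint64_t pid_tgid)
{
    return (uint32_t)pid_tgid;
}
```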
333
334 u64 bpf_get_current_uid_gid(void)
335
336 Description
337 Get the current uid and gid.
338
339 Return A 64-bit integer containing the current GID and UID, and
340 created as such: current_gid << 32 | current_uid.
341
342 long bpf_get_current_comm(void *buf, u32 size_of_buf)
343
344 Description
345 Copy the comm attribute of the current task into buf of
346 size_of_buf. The comm attribute contains the name of the
347 executable (excluding the path) for the current task. The
348 size_of_buf must be strictly positive. On success, the
349 helper makes sure that the buf is NUL-terminated. On
350 failure, it is filled with zeroes.
351
352 Return 0 on success, or a negative error in case of failure.
353
354 u32 bpf_get_cgroup_classid(struct sk_buff *skb)
355
356 Description
357 Retrieve the classid for the current task, i.e. for the
358 net_cls cgroup to which skb belongs.
359
360 This helper can be used on TC egress path, but not on
361 ingress.
362
363 The net_cls cgroup provides an interface to tag network
364 packets based on a user-provided identifier for all traf‐
365 fic coming from the tasks belonging to the related
366 cgroup. See also the related kernel documentation, avail‐
367 able from the Linux sources in file Documentation/ad‐
368 min-guide/cgroup-v1/net_cls.rst.
369
370 The Linux kernel has two versions for cgroups: there are
371 cgroups v1 and cgroups v2. Both are available to users,
372 who can use a mixture of them, but note that the net_cls
373 cgroup is for cgroup v1 only. This makes it incompatible
374 with BPF programs run on cgroups, which is a
375 cgroup-v2-only feature (a socket can only hold data for
376 one version of cgroups at a time).
377
This helper is only available if the kernel was compiled with the
CONFIG_CGROUP_NET_CLASSID configuration option set to "y" or to "m".
381
382 Return The classid, or 0 for the default unconfigured classid.
383
384 long bpf_skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16
385 vlan_tci)
386
387 Description
388 Push a vlan_tci (VLAN tag control information) of proto‐
389 col vlan_proto to the packet associated to skb, then up‐
390 date the checksum. Note that if vlan_proto is different
391 from ETH_P_8021Q and ETH_P_8021AD, it is considered to be
392 ETH_P_8021Q.
393
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by the
verifier are invalidated and must be performed again, if the helper is
used in combination with direct packet access.
399
400 Return 0 on success, or a negative error in case of failure.
401
402 long bpf_skb_vlan_pop(struct sk_buff *skb)
403
404 Description
405 Pop a VLAN header from the packet associated to skb.
406
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by the
verifier are invalidated and must be performed again, if the helper is
used in combination with direct packet access.
412
413 Return 0 on success, or a negative error in case of failure.
414
415 long bpf_skb_get_tunnel_key(struct sk_buff *skb, struct bpf_tunnel_key
416 *key, u32 size, u64 flags)
417
418 Description
419 Get tunnel metadata. This helper takes a pointer key to
420 an empty struct bpf_tunnel_key of size, that will be
421 filled with tunnel metadata for the packet associated to
422 skb. The flags can be set to BPF_F_TUNINFO_IPV6, which
423 indicates that the tunnel is based on IPv6 protocol in‐
424 stead of IPv4.
425
426 The struct bpf_tunnel_key is an object that generalizes
427 the principal parameters used by various tunneling proto‐
428 cols into a single struct. This way, it can be used to
429 easily make a decision based on the contents of the en‐
430 capsulation header, "summarized" in this struct. In par‐
431 ticular, it holds the IP address of the remote end (IPv4
432 or IPv6, depending on the case) in key->remote_ipv4 or
433 key->remote_ipv6. Also, this struct exposes the key->tun‐
434 nel_id, which is generally mapped to a VNI (Virtual Net‐
435 work Identifier), making it programmable together with
436 the bpf_skb_set_tunnel_key() helper.
437
438 Let's imagine that the following code is part of a pro‐
439 gram attached to the TC ingress interface, on one end of
440 a GRE tunnel, and is supposed to filter out all messages
441 coming from remote ends with IPv4 address other than
442 10.0.0.1:
443
    int ret;
    struct bpf_tunnel_key key = {};

    ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
    if (ret < 0)
            return TC_ACT_SHOT;     // drop packet

    if (key.remote_ipv4 != 0x0a000001)
            return TC_ACT_SHOT;     // drop packet

    return TC_ACT_OK;               // accept packet
455
456 This interface can also be used with all encapsulation
457 devices that can operate in "collect metadata" mode: in‐
458 stead of having one network device per specific configu‐
459 ration, the "collect metadata" mode only requires a sin‐
460 gle device where the configuration can be extracted from
461 this helper.
462
This can be used together with various tunnels such as VXLAN, Geneve,
GRE or IP in IP (IPIP).
465
466 Return 0 on success, or a negative error in case of failure.
467
468 long bpf_skb_set_tunnel_key(struct sk_buff *skb, struct bpf_tunnel_key
469 *key, u32 size, u64 flags)
470
471 Description
472 Populate tunnel metadata for packet associated to skb.
473 The tunnel metadata is set to the contents of key, of
474 size. The flags can be set to a combination of the fol‐
475 lowing values:
476
477 BPF_F_TUNINFO_IPV6
478 Indicate that the tunnel is based on IPv6 protocol
479 instead of IPv4.
480
481 BPF_F_ZERO_CSUM_TX
482 For IPv4 packets, add a flag to tunnel metadata
483 indicating that checksum computation should be
484 skipped and checksum set to zeroes.
485
486 BPF_F_DONT_FRAGMENT
487 Add a flag to tunnel metadata indicating that the
488 packet should not be fragmented.
489
490 BPF_F_SEQ_NUMBER
491 Add a flag to tunnel metadata indicating that a
492 sequence number should be added to tunnel header
493 before sending the packet. This flag was added for
494 GRE encapsulation, but might be used with other
495 protocols as well in the future.
496
497 Here is a typical usage on the transmit path:
498
    struct bpf_tunnel_key key;
    /* populate key ... */
    bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0);
    bpf_clone_redirect(skb, vxlan_dev_ifindex, 0);
503
504 See also the description of the bpf_skb_get_tunnel_key()
505 helper for additional information.
506
507 Return 0 on success, or a negative error in case of failure.
508
509 u64 bpf_perf_event_read(struct bpf_map *map, u64 flags)
510
511 Description
Read the value of a perf event counter. This helper relies on a map of
type BPF_MAP_TYPE_PERF_EVENT_ARRAY. The nature of the perf event
counter is selected when map is updated with perf event file
descriptors. The map is an array whose size is the number of available
CPUs, and each cell contains a value relative to one CPU. The value to
retrieve is indicated by flags, which contains the index of the CPU to
look up, masked with BPF_F_INDEX_MASK. Alternatively, flags can be set
to BPF_F_CURRENT_CPU to indicate that the value for the current CPU
should be retrieved.
523
Note that before Linux 4.13, only hardware perf events can be
retrieved.
526
527 Also, be aware that the newer helper
528 bpf_perf_event_read_value() is recommended over
529 bpf_perf_event_read() in general. The latter has some ABI
530 quirks where error and counter value are used as a return
531 code (which is wrong to do since ranges may overlap).
532 This issue is fixed with bpf_perf_event_read_value(),
533 which at the same time provides more features over the
534 bpf_perf_event_read() interface. Please refer to the de‐
535 scription of bpf_perf_event_read_value() for details.
536
537 Return The value of the perf event counter read from the map, or
538 a negative error code in case of failure.
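
The overlap mentioned above can be made concrete with a few lines of user-space C. In this sketch, looks_like_error() is a hypothetical check that a caller might be tempted to write, and the bound of 4095 is an assumption borrowed from the range the kernel conventionally reserves for error codes. The conversion to a signed type relies on the two's-complement behavior of common compilers.

```c
#include <errno.h>
#include <stdint.h>

/* Treat the returned u64 as an error if, read as a signed value, it
 * falls into the range conventionally used for negative errno codes. */
static int looks_like_error(uint64_t ret)
{
    int64_t s = (int64_t)ret;

    return s < 0 && s >= -4095;
}
```

A failed read returning -EINVAL and a counter that genuinely holds 0xffffffffffffffea are the same 64-bit pattern, so no such check can tell them apart. bpf_perf_event_read_value() avoids the problem by reporting the counter through a separate buffer.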
539
540 long bpf_redirect(u32 ifindex, u64 flags)
541
542 Description
543 Redirect the packet to another net device of index
544 ifindex. This helper is somewhat similar to
545 bpf_clone_redirect(), except that the packet is not
546 cloned, which provides increased performance.
547
548 Except for XDP, both ingress and egress interfaces can be
549 used for redirection. The BPF_F_INGRESS value in flags is
550 used to make the distinction (ingress path is selected if
551 the flag is present, egress path otherwise). Currently,
552 XDP only supports redirection to the egress interface,
553 and accepts no flag at all.
554
555 The same effect can also be attained with the more
556 generic bpf_redirect_map(), which uses a BPF map to store
557 the redirect target instead of providing it directly to
558 the helper.
559
560 Return For XDP, the helper returns XDP_REDIRECT on success or
561 XDP_ABORTED on error. For other program types, the values
562 are TC_ACT_REDIRECT on success or TC_ACT_SHOT on error.
563
564 u32 bpf_get_route_realm(struct sk_buff *skb)
565
566 Description
Retrieve the realm of the route, that is to say the tclassid field of
the destination for the skb. The identifier retrieved is a
user-provided tag, similar to the one used with the net_cls cgroup (see
description for bpf_get_cgroup_classid() helper), but here this tag is
held by a route (a destination entry), not by a task.
573
574 Retrieving this identifier works with the clsact TC
575 egress hook (see also tc-bpf(8)), or alternatively on
576 conventional classful egress qdiscs, but not on TC
577 ingress path. In case of clsact TC egress hook, this has
578 the advantage that, internally, the destination entry has
579 not been dropped yet in the transmit path. Therefore, the
580 destination entry does not need to be artificially held
581 via netif_keep_dst() for a classful qdisc until the skb
582 is freed.
583
584 This helper is available only if the kernel was compiled
585 with CONFIG_IP_ROUTE_CLASSID configuration option.
586
587 Return The realm of the route for the packet associated to skb,
588 or 0 if none was found.
589
590 long bpf_perf_event_output(void *ctx, struct bpf_map *map, u64 flags,
591 void *data, u64 size)
592
593 Description
594 Write raw data blob into a special BPF perf event held by
595 map of type BPF_MAP_TYPE_PERF_EVENT_ARRAY. This perf
596 event must have the following attributes: PERF_SAMPLE_RAW
597 as sample_type, PERF_TYPE_SOFTWARE as type, and
598 PERF_COUNT_SW_BPF_OUTPUT as config.
599
The flags are used to indicate the index in map at which the value must
be written, masked with BPF_F_INDEX_MASK. Alternatively, flags can be
set to BPF_F_CURRENT_CPU to indicate that the index of the current CPU
core should be used.

The value to write, of size, is passed through the eBPF stack and
pointed to by data.

The context of the program ctx must also be passed to the helper.

In user space, a program willing to read the values needs to call
perf_event_open() on the perf event (either for one or for all CPUs)
and to store the file descriptor into the map. This must be done before
the eBPF program can send data into it. An example is available in file
samples/bpf/trace_output_user.c in the Linux kernel source tree (the
eBPF program counterpart is in samples/bpf/trace_output_kern.c).
620
bpf_perf_event_output() achieves better performance than
bpf_trace_printk() for sharing data with user space, and is much better
suited to streaming data from eBPF programs.
625
626 Note that this helper is not restricted to tracing use
627 cases and can be used with programs attached to TC or XDP
628 as well, where it allows for passing data to user space
629 listeners. Data can be:
630
631 • Only custom structs,
632
633 • Only the packet payload, or
634
635 • A combination of both.
636
637 Return 0 on success, or a negative error in case of failure.
638
639 long bpf_skb_load_bytes(const void *skb, u32 offset, void *to, u32 len)
640
641 Description
This helper was provided as an easy way to load data from a packet. It
can be used to load len bytes from offset from the packet associated
with skb, into the buffer that to points to.
646
647 Since Linux 4.7, usage of this helper has mostly been re‐
648 placed by "direct packet access", enabling packet data to
649 be manipulated with skb->data and skb->data_end pointing
650 respectively to the first byte of packet data and to the
651 byte after the last byte of packet data. However, it re‐
652 mains useful if one wishes to read large quantities of
653 data at once from a packet into the eBPF stack.
654
655 Return 0 on success, or a negative error in case of failure.
656
657 long bpf_get_stackid(void *ctx, struct bpf_map *map, u64 flags)
658
659 Description
660 Walk a user or a kernel stack and return its id. To
661 achieve this, the helper needs ctx, which is a pointer to
662 the context on which the tracing program is executed, and
663 a pointer to a map of type BPF_MAP_TYPE_STACK_TRACE.
664
665 The last argument, flags, holds the number of stack
666 frames to skip (from 0 to 255), masked with
667 BPF_F_SKIP_FIELD_MASK. The next bits can be used to set a
668 combination of the following flags:
669
670 BPF_F_USER_STACK
671 Collect a user space stack instead of a kernel
672 stack.
673
674 BPF_F_FAST_STACK_CMP
675 Compare stacks by hash only.
676
677 BPF_F_REUSE_STACKID
678 If two different stacks hash into the same
679 stackid, discard the old one.
680
The stack id retrieved is a 32-bit integer handle which can be further
combined with other data (including other stack ids) and used as a key
into maps. This can be useful for generating a variety of graphs (such
as flame graphs or off-cpu graphs).
686
For walking a stack, this helper is an improvement over
bpf_probe_read(), which can be used with unrolled loops but is not
efficient and consumes a lot of eBPF instructions. Instead,
bpf_get_stackid() can collect up to PERF_MAX_STACK_DEPTH frames, both
kernel and user. Note that this limit can be controlled with the sysctl
program, and that it should be manually increased in order to profile
long user stacks (such as stacks for Java programs). To do so, use:

    # sysctl kernel.perf_event_max_stack=<new value>
698
699 Return The positive or null stack id on success, or a negative
700 error in case of failure.
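
The layout of the flags argument, with the skip count in the low byte and the option bits above it, can be sketched as follows. The constants mirror include/uapi/linux/bpf.h at the time of writing and are redefined only to keep the example self-contained; stackid_flags() is a hypothetical helper name.

```c
#include <stdint.h>

/* Values mirror include/uapi/linux/bpf.h; real programs should take
 * them from that header instead of redefining them. */
#define BPF_F_SKIP_FIELD_MASK 0xffULL
#define BPF_F_USER_STACK      (1ULL << 8)
#define BPF_F_FAST_STACK_CMP  (1ULL << 9)
#define BPF_F_REUSE_STACKID   (1ULL << 10)

/* Keep the number of frames to skip within its 8-bit field and OR in
 * the requested option bits. */
static uint64_t stackid_flags(unsigned int skip, uint64_t options)
{
    return ((uint64_t)skip & BPF_F_SKIP_FIELD_MASK) | options;
}
```

For example, stackid_flags(2, BPF_F_USER_STACK) asks for a user space stack with the two innermost frames skipped. A skip count above 255 is silently truncated by the mask in this sketch.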
701
702 s64 bpf_csum_diff(__be32 *from, u32 from_size, __be32 *to, u32 to_size,
703 __wsum seed)
704
705 Description
Compute a checksum difference, from the raw buffer pointed to by from,
of length from_size (that must be a multiple of 4), towards the raw
buffer pointed to by to, of size to_size (same remark). An optional
seed can be added to the value (this can be cascaded, the seed may come
from a previous call to the helper).
712
713 This is flexible enough to be used in several ways:
714
715 • With from_size == 0, to_size > 0 and seed set to check‐
716 sum, it can be used when pushing new data.
717
718 • With from_size > 0, to_size == 0 and seed set to check‐
719 sum, it can be used when removing data from a packet.
720
721 • With from_size > 0, to_size > 0 and seed set to 0, it
722 can be used to compute a diff. Note that from_size and
723 to_size do not need to be equal.
724
725 This helper can be used in combination with
726 bpf_l3_csum_replace() and bpf_l4_csum_replace(), to which
727 one can feed in the difference computed with
728 bpf_csum_diff().
729
730 Return The checksum result, or a negative error code in case of
731 failure.
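
The idea of a checksum difference can be modelled in user space as the one's-complement sum of the inverted old words, the new words, and the seed. The sketch below is a simplification and not the kernel code: it works on 16-bit words and returns a folded 16-bit value, whereas the helper takes __be32 buffers. All function names (fold16(), toy_csum_diff(), toy_csum_apply(), toy_csum_full()) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a wide one's-complement sum down to 16 bits. */
static uint16_t fold16(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Difference = seed + sum of ~from[i] + sum of to[i]. */
static uint16_t toy_csum_diff(const uint16_t *from, size_t from_words,
                              const uint16_t *to, size_t to_words,
                              uint16_t seed)
{
    uint64_t sum = seed;

    for (size_t i = 0; i < from_words; i++)
        sum += (uint16_t)~from[i];
    for (size_t i = 0; i < to_words; i++)
        sum += to[i];
    return fold16(sum);
}

/* Apply a difference to an existing checksum. */
static uint16_t toy_csum_apply(uint16_t check, uint16_t diff)
{
    return (uint16_t)~fold16((uint64_t)(uint16_t)~check + diff);
}

/* Reference: full checksum over all words, used for comparison. */
static uint16_t toy_csum_full(const uint16_t *words, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += words[i];
    return (uint16_t)~fold16(sum);
}
```

Rewriting a two-word address and applying the difference between the old and new words gives the same result as recomputing the checksum over the whole header, which is what feeding the output of bpf_csum_diff() to bpf_l3_csum_replace() or bpf_l4_csum_replace() achieves.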
732
733 long bpf_skb_get_tunnel_opt(struct sk_buff *skb, void *opt, u32 size)
734
735 Description
736 Retrieve tunnel options metadata for the packet associ‐
737 ated to skb, and store the raw tunnel option data to the
738 buffer opt of size.
739
This helper can be used with encapsulation devices that can operate in
"collect metadata" mode (please refer to the related note in the
description of bpf_skb_get_tunnel_key() for more details). A particular
example where this can be used is in combination with the Geneve
encapsulation protocol, where it allows for pushing (with the
bpf_skb_set_tunnel_opt() helper) and retrieving arbitrary TLVs
(Type-Length-Value headers) from the eBPF program. This allows for full
customization of these headers.
749
750 Return The size of the option data retrieved.
751
752 long bpf_skb_set_tunnel_opt(struct sk_buff *skb, void *opt, u32 size)
753
754 Description
755 Set tunnel options metadata for the packet associated to
756 skb to the option data contained in the raw buffer opt of
757 size.
758
759 See also the description of the bpf_skb_get_tunnel_opt()
760 helper for additional information.
761
762 Return 0 on success, or a negative error in case of failure.
763
764 long bpf_skb_change_proto(struct sk_buff *skb, __be16 proto, u64 flags)
765
766 Description
Change the protocol of the skb to proto. Currently supported are
transition from IPv4 to IPv6, and from IPv6 to IPv4. The helper takes
care of the groundwork for the transition, including resizing the
socket buffer. The eBPF program is expected to fill the new headers, if
any, via bpf_skb_store_bytes() and to recompute the checksums with
bpf_l3_csum_replace() and bpf_l4_csum_replace(). The main case for this
helper is to perform NAT64 operations from an eBPF program.
776
777 Internally, the GSO type is marked as dodgy so that head‐
778 ers are checked and segments are recalculated by the
779 GSO/GRO engine. The size for GSO target is adapted as
780 well.
781
782 All values for flags are reserved for future usage, and
783 must be left at zero.
784
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by the
verifier are invalidated and must be performed again, if the helper is
used in combination with direct packet access.
790
791 Return 0 on success, or a negative error in case of failure.
792
793 long bpf_skb_change_type(struct sk_buff *skb, u32 type)
794
795 Description
Change the packet type for the packet associated with skb. This comes
down to setting skb->pkt_type to type, except the eBPF program does not
have write access to skb->pkt_type other than through this helper.
Using a helper here allows for graceful handling of errors.
801
802 The major use case is to change incoming skbs to
803 PACKET_HOST in a programmatic way instead of having to
804 recirculate via redirect(..., BPF_F_INGRESS), for exam‐
805 ple.
806
807 Note that type only allows certain values. At this time,
808 they are:
809
810 PACKET_HOST
811 Packet is for us.
812
813 PACKET_BROADCAST
814 Send packet to all.
815
816 PACKET_MULTICAST
817 Send packet to group.
818
819 PACKET_OTHERHOST
820 Send packet to someone else.
821
822 Return 0 on success, or a negative error in case of failure.
823
824 long bpf_skb_under_cgroup(struct sk_buff *skb, struct bpf_map *map, u32
825 index)
826
827 Description
828 Check whether skb is a descendant of the cgroup2 held by
829 map of type BPF_MAP_TYPE_CGROUP_ARRAY, at index.
830
831 Return The return value depends on the result of the test, and
832 can be:
833
834 • 0, if the skb failed the cgroup2 descendant test.
835
836 • 1, if the skb passed the cgroup2 descendant test.
837
838 • A negative error code, if an error occurred.
839
840 u32 bpf_get_hash_recalc(struct sk_buff *skb)
841
842 Description
843 Retrieve the hash of the packet, skb->hash. If it is not
844 set, in particular if the hash was cleared due to man‐
845 gling, recompute this hash. Later accesses to the hash
846 can be done directly with skb->hash.
847
848 Calling bpf_set_hash_invalid(), changing a packet proto‐
849 col with bpf_skb_change_proto(), or calling
850 bpf_skb_store_bytes() with the BPF_F_INVALIDATE_HASH flag
851 are actions susceptible to clear the hash and to trigger
852 a new computation for the next call to
853 bpf_get_hash_recalc().
854
855 Return The 32-bit hash.
856
857 u64 bpf_get_current_task(void)
858
859 Description
860 Get the current task.
861
862 Return A pointer to the current task struct.
863
864 long bpf_probe_write_user(void *dst, const void *src, u32 len)
865
866 Description
867 Attempt in a safe way to write len bytes from the buffer
868 src to dst in memory. It only works for threads that are
869 in user context, and dst must be a valid user space ad‐
870 dress.
871
872 This helper should not be used to implement any kind of
873 security mechanism because of TOC-TOU attacks, but rather
874 to debug, divert, and manipulate execution of semi-coop‐
875 erative processes.
876
877 Keep in mind that this feature is meant for experiments,
878 and it has a risk of crashing the system and running pro‐
879 grams. Therefore, when an eBPF program using this helper
880 is attached, a warning including PID and process name is
881 printed to kernel logs.
882
883 Return 0 on success, or a negative error in case of failure.
884
885 long bpf_current_task_under_cgroup(struct bpf_map *map, u32 index)
886
887 Description
888 Check whether the probe is being run in the context of a
889 given subset of the cgroup2 hierarchy. The cgroup2 to
890 test is held by map of type BPF_MAP_TYPE_CGROUP_ARRAY, at
891 index.
892
893 Return The return value depends on the result of the test, and
894 can be:
895
896 • 1, if current task belongs to the cgroup2.
897
898 • 0, if current task does not belong to the cgroup2.
899
900 • A negative error code, if an error occurred.
901
902 long bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags)
903
904 Description
905 Resize (trim or grow) the packet associated to skb to the
906 new len. The flags are reserved for future usage, and
907 must be left at zero.
908
909 The basic idea is that the helper performs the needed
910 work to change the size of the packet, then the eBPF pro‐
911 gram rewrites the rest via helpers like
912 bpf_skb_store_bytes(), bpf_l3_csum_replace(),
913 bpf_l4_csum_replace() and others. This helper is a slow
914 path utility intended for replies with control messages.
915 And because it is targeted for slow path, the helper it‐
916 self can afford to be slow: it implicitly linearizes, un‐
917 clones and drops offloads from the skb.
918
919 A call to this helper is susceptible to change the under‐
920 lying packet buffer. Therefore, at load time, all checks
921 on pointers previously done by the verifier are invali‐
922 dated and must be performed again, if the helper is used
923 in combination with direct packet access.
924
925 Return 0 on success, or a negative error in case of failure.
926
927 long bpf_skb_pull_data(struct sk_buff *skb, u32 len)
928
929 Description
930 Pull in non-linear data in case the skb is non-linear and
931 not all of len are part of the linear section. Make len
932 bytes from skb readable and writable. If a zero value is
933 passed for len, then all bytes in the linear part of skb
934 will be made readable and writable.
935
936 This helper is only needed for reading and writing with
937 direct packet access.
938
939 For direct packet access, testing that offsets to access
940 are within packet boundaries (test on skb->data_end) is
941 susceptible to fail if offsets are invalid, or if the re‐
942 quested data is in non-linear parts of the skb. On fail‐
943 ure the program can just bail out, or in the case of a
944 non-linear buffer, use a helper to make the data avail‐
945 able. The bpf_skb_load_bytes() helper is a first solution
946 to access the data. Another one consists in using
947 bpf_skb_pull_data() to pull in the non-linear parts once,
948 then retesting and finally accessing the data.
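
The test, pull, retest sequence described above can be sketched as a small userspace model. This is not eBPF code: struct ex_pkt and ex_range_ok() are hypothetical names that stand in for the skb->data / skb->data_end pair and for the bounds test the verifier expects before a direct access.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the (data, data_end) pair of a packet. */
struct ex_pkt {
    const uint8_t *data;
    const uint8_t *data_end;
};

/* Return 1 when the range [off, off + len) lies inside the linear
 * area delimited by data and data_end, 0 otherwise. */
static int ex_range_ok(const struct ex_pkt *p, size_t off, size_t len)
{
    size_t avail = (size_t)(p->data_end - p->data);

    return off <= avail && len <= avail - off;
}
```

In a real program, a failed test would be followed by a call to bpf_skb_pull_data(), after which data and data_end must be reloaded and the test repeated before any access.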
949
950 At the same time, this also makes sure the skb is un‐
951 cloned, which is a necessary condition for direct write.
952 As this needs to be an invariant for the write part only,
953 the verifier detects writes and adds a prologue that is
954 calling bpf_skb_pull_data() to effectively unclone the
955 skb from the very beginning in case it is indeed cloned.
956
957 A call to this helper is susceptible to change the under‐
958 lying packet buffer. Therefore, at load time, all checks
959 on pointers previously done by the verifier are invali‐
960 dated and must be performed again, if the helper is used
961 in combination with direct packet access.
962
963 Return 0 on success, or a negative error in case of failure.
964
965 s64 bpf_csum_update(struct sk_buff *skb, __wsum csum)
966
967 Description
968 Add the checksum csum into skb->csum in case the driver
969 has supplied a checksum for the entire packet into that
970 field. Return an error otherwise. This helper is intended
971 to be used in combination with bpf_csum_diff(), in par‐
972 ticular when the checksum needs to be updated after data
973 has been written into the packet through direct packet
974 access.
975
976 Return The checksum on success, or a negative error code in case
977 of failure.
978
979 void bpf_set_hash_invalid(struct sk_buff *skb)
980
981 Description
982 Invalidate the current skb->hash. It can be used after
983 mangling on headers through direct packet access, in or‐
984 der to indicate that the hash is outdated and to trigger
985 a recalculation the next time the kernel tries to access
986 this hash or when the bpf_get_hash_recalc() helper is
987 called.
988
989 Return void.
990
991 long bpf_get_numa_node_id(void)
992
993 Description
994 Return the id of the current NUMA node. The primary use
995 case for this helper is the selection of sockets for the
996 local NUMA node, when the program is attached to sockets
997 using the SO_ATTACH_REUSEPORT_EBPF option (see also
998 socket(7)), but the helper is also available to other
999 eBPF program types, similarly to bpf_get_smp_proces‐
1000 sor_id().
1001
1002 Return The id of current NUMA node.
1003
1004 long bpf_skb_change_head(struct sk_buff *skb, u32 len, u64 flags)
1005
1006 Description
1007 Grows headroom of packet associated to skb and adjusts
1008 the offset of the MAC header accordingly, adding len
1009 bytes of space. It automatically extends and reallocates
1010 memory as required.
1011
1012 This helper can be used on a layer 3 skb to push a MAC
1013 header for redirection into a layer 2 device.
1014
1015 All values for flags are reserved for future usage, and
1016 must be left at zero.
1017
1018 A call to this helper is susceptible to change the under‐
1019 lying packet buffer. Therefore, at load time, all checks
1020 on pointers previously done by the verifier are invali‐
1021 dated and must be performed again, if the helper is used
1022 in combination with direct packet access.
1023
1024 Return 0 on success, or a negative error in case of failure.
1025
1026 long bpf_xdp_adjust_head(struct xdp_buff *xdp_md, int delta)
1027
1028 Description
1029 Adjust (move) xdp_md->data by delta bytes. Note that it
1030 is possible to use a negative value for delta. This
1031 helper can be used to prepare the packet for pushing or
1032 popping headers.
1033
1034 A call to this helper is susceptible to change the under‐
1035 lying packet buffer. Therefore, at load time, all checks
1036 on pointers previously done by the verifier are invali‐
1037 dated and must be performed again, if the helper is used
1038 in combination with direct packet access.
1039
1040 Return 0 on success, or a negative error in case of failure.
1041
1042 long bpf_probe_read_str(void *dst, u32 size, const void *unsafe_ptr)
1043
1044 Description
1045 Copy a NUL terminated string from an unsafe kernel ad‐
1046 dress unsafe_ptr to dst. See bpf_probe_read_kernel_str()
1047 for more details.
1048
1049 Generally, use bpf_probe_read_user_str() or
1050 bpf_probe_read_kernel_str() instead.
1051
1052 Return On success, the strictly positive length of the string,
1053 including the trailing NUL character. On error, a nega‐
1054 tive value.
1055
1056 u64 bpf_get_socket_cookie(struct sk_buff *skb)
1057
1058 Description
1059 If the struct sk_buff pointed by skb has a known socket,
1060 retrieve the cookie (generated by the kernel) of this
1061 socket. If no cookie has been set yet, generate a new
1062 cookie. Once generated, the socket cookie remains stable
1063 for the life of the socket. This helper can be useful for
1064 monitoring per socket networking traffic statistics as it
1065 provides a global socket identifier that can be assumed
1066 unique.
1067
1068 Return An 8-byte long unique number on success, or 0 if the
1069 socket field is missing inside skb.
1070
1071 u64 bpf_get_socket_cookie(struct bpf_sock_addr *ctx)
1072
1073 Description
1074 Equivalent to bpf_get_socket_cookie() helper that accepts
1075 skb, but gets socket from struct bpf_sock_addr context.
1076
1077 Return An 8-byte long unique number.
1078
1079 u64 bpf_get_socket_cookie(struct bpf_sock_ops *ctx)
1080
1081 Description
1082 Equivalent to bpf_get_socket_cookie() helper that accepts
1083 skb, but gets socket from struct bpf_sock_ops context.
1084
1085 Return An 8-byte long unique number.
1086
1087 u64 bpf_get_socket_cookie(struct sock *sk)
1088
1089 Description
1090 Equivalent to bpf_get_socket_cookie() helper that accepts
1091 skb, but gets socket from a BTF struct sock. This helper
1092 also works for sleepable programs.
1093
1094 Return An 8-byte long unique number, or 0 if sk is NULL.
1095
1096 u32 bpf_get_socket_uid(struct sk_buff *skb)
1097
1098 Description
1099 Get the owner UID of the socket associated to skb.
1100
1101 Return The owner UID of the socket associated to skb. If the
1102 socket is NULL, or if it is not a full socket (i.e. if it
1103 is a time-wait or a request socket instead), overflowuid
1104 value is returned (note that overflowuid might also be
1105 the actual UID value for the socket).
1106
1107 long bpf_set_hash(struct sk_buff *skb, u32 hash)
1108
1109 Description
1110 Set the full hash for skb (set the field skb->hash) to
1111 value hash.
1112
1113 Return 0
1114
1115 long bpf_setsockopt(void *bpf_socket, int level, int optname, void
1116 *optval, int optlen)
1117
1118 Description
1119 Emulate a call to setsockopt() on the socket associated
1120 to bpf_socket, which must be a full socket. The level at
1121 which the option resides and the name optname of the op‐
1122 tion must be specified, see setsockopt(2) for more infor‐
1123 mation. The option value of length optlen is pointed by
1124 optval.
1125
1126 bpf_socket should be one of the following:
1127
1128 • struct bpf_sock_ops for BPF_PROG_TYPE_SOCK_OPS.
1129
1130 • struct bpf_sock_addr for BPF_CGROUP_INET4_CONNECT and
1131 BPF_CGROUP_INET6_CONNECT.
1132
1133 This helper actually implements a subset of setsockopt().
1134 It supports the following levels:
1135
1136 • SOL_SOCKET, which supports the following optnames:
1137 SO_RCVBUF, SO_SNDBUF, SO_MAX_PACING_RATE, SO_PRIORITY,
1138 SO_RCVLOWAT, SO_MARK, SO_BINDTODEVICE, SO_KEEPALIVE.
1139
1140 • IPPROTO_TCP, which supports the following optnames:
1141 TCP_CONGESTION, TCP_BPF_IW, TCP_BPF_SNDCWND_CLAMP,
1142 TCP_SAVE_SYN, TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT,
1143 TCP_SYNCNT, TCP_USER_TIMEOUT, TCP_NOTSENT_LOWAT.
1144
1145 • IPPROTO_IP, which supports optname IP_TOS.
1146
1147 • IPPROTO_IPV6, which supports optname IPV6_TCLASS.
1148
1149 Return 0 on success, or a negative error in case of failure.
1150
1151 long bpf_skb_adjust_room(struct sk_buff *skb, s32 len_diff, u32 mode,
1152 u64 flags)
1153
1154 Description
1155 Grow or shrink the room for data in the packet associated
1156 to skb by len_diff, and according to the selected mode.
1157
1158 By default, the helper will reset any offloaded checksum
1159 indicator of the skb to CHECKSUM_NONE. This can be
1160 avoided by the following flag:
1161
1162 • BPF_F_ADJ_ROOM_NO_CSUM_RESET: Do not reset offloaded
1163 checksum data of the skb to CHECKSUM_NONE.
1164
1165 There are two supported modes at this time:
1166
1167 • BPF_ADJ_ROOM_MAC: Adjust room at the mac layer (room
1168 space is added or removed between the layer 2 and layer
1169 3 headers).
1170
1171 • BPF_ADJ_ROOM_NET: Adjust room at the network layer
1172 (room space is added or removed between the layer 3 and
1173 layer 4 headers).
1174
1175 The following flags are supported at this time:
1176
1177 • BPF_F_ADJ_ROOM_FIXED_GSO: Do not adjust gso_size. Ad‐
1178 justing mss in this way is not allowed for datagrams.
1179
1180 • BPF_F_ADJ_ROOM_ENCAP_L3_IPV4, BPF_F_ADJ_ROOM_EN‐
1181 CAP_L3_IPV6: Any new space is reserved to hold a tunnel
1182 header. Configure skb offsets and other fields accord‐
1183 ingly.
1184
1185 • BPF_F_ADJ_ROOM_ENCAP_L4_GRE, BPF_F_ADJ_ROOM_EN‐
1186 CAP_L4_UDP: Use with ENCAP_L3 flags to further specify
1187 the tunnel type.
1188
1189 • BPF_F_ADJ_ROOM_ENCAP_L2(len): Use with ENCAP_L3/L4
1190 flags to further specify the tunnel type; len is the
1191 length of the inner MAC header.
1192
1193 • BPF_F_ADJ_ROOM_ENCAP_L2_ETH: Use with
1194 BPF_F_ADJ_ROOM_ENCAP_L2 flag to further specify the L2
1195 type as Ethernet.
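
As an illustration of how the encapsulation flags combine, here is a minimal sketch in plain C. The EX_ constants are local copies of the values believed to be defined in the kernel UAPI header linux/bpf.h; they are an assumption of this example, and real programs should use the BPF_F_ADJ_ROOM_* definitions from that header. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed mirrors of the BPF_F_ADJ_ROOM_* values from <linux/bpf.h>. */
#define EX_ADJ_ROOM_ENCAP_L3_IPV4  (1ULL << 1)
#define EX_ADJ_ROOM_ENCAP_L4_UDP   (1ULL << 4)
#define EX_ADJ_ROOM_ENCAP_L2_MASK  0xffULL
#define EX_ADJ_ROOM_ENCAP_L2_SHIFT 56
#define EX_ADJ_ROOM_ENCAP_L2(len) \
    (((uint64_t)(len) & EX_ADJ_ROOM_ENCAP_L2_MASK) << EX_ADJ_ROOM_ENCAP_L2_SHIFT)

/* Flags for an IPv4 + UDP tunnel that carries an inner MAC header
 * of inner_mac_len bytes. */
static uint64_t ex_udp4_l2_tunnel_flags(uint8_t inner_mac_len)
{
    return EX_ADJ_ROOM_ENCAP_L3_IPV4 |
           EX_ADJ_ROOM_ENCAP_L4_UDP |
           EX_ADJ_ROOM_ENCAP_L2(inner_mac_len);
}

/* Recover the inner MAC header length encoded in flags. */
static unsigned int ex_encap_l2_len(uint64_t flags)
{
    return (unsigned int)((flags >> EX_ADJ_ROOM_ENCAP_L2_SHIFT) &
                          EX_ADJ_ROOM_ENCAP_L2_MASK);
}
```

With an inner Ethernet header, inner_mac_len would be 14, and the resulting value would be passed as the flags argument of bpf_skb_adjust_room().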
1196
1197 A call to this helper is susceptible to change the under‐
1198 lying packet buffer. Therefore, at load time, all checks
1199 on pointers previously done by the verifier are invali‐
1200 dated and must be performed again, if the helper is used
1201 in combination with direct packet access.
1202
1203 Return 0 on success, or a negative error in case of failure.
1204
1205 long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
1206
1207 Description
1208 Redirect the packet to the endpoint referenced by map at
1209 index key. Depending on its type, this map can contain
1210 references to net devices (for forwarding packets through
1211 other ports), or to CPUs (for redirecting XDP frames to
1212 another CPU; but this is only implemented for native XDP
1213 (with driver support) as of this writing).
1214
1215 The lower two bits of flags are used as the return code
1216 if the map lookup fails. This is so that the return value
1217 can be one of the XDP program return codes up to XDP_TX,
1218 as chosen by the caller. The higher bits of flags can be
1219 set to BPF_F_BROADCAST or BPF_F_EXCLUDE_INGRESS as de‐
1220 fined below.
1221
1222 With BPF_F_BROADCAST the packet will be broadcast to
1223 all the interfaces in the map; with BPF_F_EXCLUDE_INGRESS
1224 the ingress interface will be excluded when broadcast‐
1225 ing.
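
The role of the two lower bits of flags can be modelled in plain C as follows. This is a sketch of the documented return-value rule only, not of the kernel implementation; the EX_ names are local stand-ins whose values are assumed to match the XDP action codes and flag bits in linux/bpf.h.

```c
#include <stdint.h>

/* Assumed mirrors of the XDP action codes from <linux/bpf.h>. */
enum ex_xdp_action {
    EX_XDP_ABORTED = 0,
    EX_XDP_DROP,
    EX_XDP_PASS,
    EX_XDP_TX,
    EX_XDP_REDIRECT,
};

/* Assumed mirror of BPF_F_BROADCAST. */
#define EX_F_BROADCAST (1ULL << 3)

/* Return code seen by the caller: XDP_REDIRECT when the map lookup
 * succeeds, otherwise the two lower bits of flags. */
static long ex_redirect_map_rc(uint64_t flags, int lookup_ok)
{
    if (lookup_ok)
        return EX_XDP_REDIRECT;
    return (long)(flags & 0x3);
}
```

Under this rule, leaving the lower bits at zero yields XDP_ABORTED when the lookup fails, so a caller that prefers to fall back to the regular stack would pass XDP_PASS in those bits.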
1226
1227 See also bpf_redirect(), which only supports redirecting
1228 to an ifindex, but doesn't require a map to do so.
1229
1230 Return XDP_REDIRECT on success, or the value of the two lower
1231 bits of the flags argument on error.
1232
1233 long bpf_sk_redirect_map(struct sk_buff *skb, struct bpf_map *map, u32
1234 key, u64 flags)
1235
1236 Description
1237 Redirect the packet to the socket referenced by map (of
1238 type BPF_MAP_TYPE_SOCKMAP) at index key. Both ingress and
1239 egress interfaces can be used for redirection. The
1240 BPF_F_INGRESS value in flags is used to make the distinc‐
1241 tion (ingress path is selected if the flag is present,
1242 egress path otherwise). This is the only flag supported
1243 for now.
1244
1245 Return SK_PASS on success, or SK_DROP on error.
1246
1247 long bpf_sock_map_update(struct bpf_sock_ops *skops, struct bpf_map
1248 *map, void *key, u64 flags)
1249
1250 Description
1251 Add an entry to, or update a map referencing sockets. The
1252 skops is used as a new value for the entry associated to
1253 key. flags is one of:
1254
1255 BPF_NOEXIST
1256 The entry for key must not exist in the map.
1257
1258 BPF_EXIST
1259 The entry for key must already exist in the map.
1260
1261 BPF_ANY
1262 No condition on the existence of the entry for
1263 key.
1264
1265 If the map has eBPF programs (parser and verdict), those
1266 will be inherited by the socket being added. If the
1267 socket is already attached to eBPF programs, this results
1268 in an error.
1269
1270 Return 0 on success, or a negative error in case of failure.
1271
1272 long bpf_xdp_adjust_meta(struct xdp_buff *xdp_md, int delta)
1273
1274 Description
1275 Adjust the address pointed by xdp_md->data_meta by delta
1276 (which can be positive or negative). Note that this oper‐
1277 ation modifies the address stored in xdp_md->data, so the
1278 latter must be loaded only after the helper has been
1279 called.
1280
1281 The use of xdp_md->data_meta is optional and programs are
1282 not required to use it. The rationale is that when the
1283 packet is processed with XDP (e.g. as DoS filter), it is
1284 possible to push further meta data along with it before
1285 passing to the stack, and to give the guarantee that an
1286 ingress eBPF program attached as a TC classifier on the
1287 same device can pick this up for further post-processing.
1288 Since TC works with socket buffers, it remains possible
1289 to set from XDP the mark or priority pointers, or other
1290 pointers for the socket buffer. Having this scratch
1291 space generic and programmable allows for more flexibil‐
1292 ity as the user is free to store whatever meta data they
1293 need.
1294
1295 A call to this helper is susceptible to change the under‐
1296 lying packet buffer. Therefore, at load time, all checks
1297 on pointers previously done by the verifier are invali‐
1298 dated and must be performed again, if the helper is used
1299 in combination with direct packet access.
1300
1301 Return 0 on success, or a negative error in case of failure.
1302
1303 long bpf_perf_event_read_value(struct bpf_map *map, u64 flags, struct
1304 bpf_perf_event_value *buf, u32 buf_size)
1305
1306 Description
1307 Read the value of a perf event counter, and store it into
1308 buf of size buf_size. This helper relies on a map of type
1309 BPF_MAP_TYPE_PERF_EVENT_ARRAY. The nature of the perf
1310 event counter is selected when map is updated with perf
1311 event file descriptors. The map is an array whose size is
1312 the number of available CPUs, and each cell contains a
1313 value relative to one CPU. The value to retrieve is indi‐
1314 cated by flags, that contains the index of the CPU to
1315 look up, masked with BPF_F_INDEX_MASK. Alternatively,
1316 flags can be set to BPF_F_CURRENT_CPU to indicate that
1317 the value for the current CPU should be retrieved.
1318
1319 This helper behaves in a way close to
1320 bpf_perf_event_read() helper, save that instead of just
1321 returning the value observed, it fills the buf structure.
1322 This allows for additional data to be retrieved: in par‐
1323 ticular, the enabled and running times (in buf->enabled
1324 and buf->running, respectively) are copied. In general,
1325 bpf_perf_event_read_value() is recommended over
1326 bpf_perf_event_read(), which has some ABI issues and pro‐
1327 vides fewer functionalities.
1328
1329 These values are interesting, because hardware PMU (Per‐
1330 formance Monitoring Unit) counters are limited resources.
1331 When there are more PMU based perf events opened than
1332 available counters, kernel will multiplex these events so
1333 each event gets certain percentage (but not all) of the
1334 PMU time. When multiplexing happens, the number of
1335 samples or the counter value will not match what would
1336 be observed without multiplexing. This makes comparison
1337 between different runs difficult. Typically, the
1338 counter value should be normalized before comparing to
1339 other experiments. The usual normalization is done as
1340 follows.
1341
1342 normalized_counter = counter * t_enabled / t_running
1343
1344 Where t_enabled is the time enabled for event and t_run‐
1345 ning is the time running for event since last normaliza‐
1346 tion. The enabled and running times are accumulated since
1347 the perf event open. To obtain a scaling factor between
1348 two invocations of an eBPF program, users can use CPU id
1349 as the key (which is typical for perf array usage model)
1350 to remember the previous value and do the calculation in‐
1351 side the eBPF program.
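
The normalization between two invocations can be sketched as follows, in plain C. struct ex_perf_value is a hypothetical stand-in with the three fields named in the text; the real struct bpf_perf_event_value comes from linux/bpf.h. In an eBPF program, the previous reading would typically be kept in a map keyed by CPU id, which this sketch leaves out.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct bpf_perf_event_value. */
struct ex_perf_value {
    uint64_t counter;
    uint64_t enabled;
    uint64_t running;
};

/* Scale the counter increase between two readings by
 * t_enabled / t_running, both taken over the same interval.
 * Returns 0 when the event did not run during the interval.
 * Note: the multiplication can overflow 64 bits for large values;
 * a production version would need wider arithmetic. */
static uint64_t ex_normalized_delta(const struct ex_perf_value *prev,
                                    const struct ex_perf_value *cur)
{
    uint64_t d_counter = cur->counter - prev->counter;
    uint64_t d_enabled = cur->enabled - prev->enabled;
    uint64_t d_running = cur->running - prev->running;

    if (d_running == 0)
        return 0;
    return d_counter * d_enabled / d_running;
}
```

For instance, a counter that grew by 500 while the event was enabled for 2000 time units but running for only 1000 is scaled to 1000.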
1352
1353 Return 0 on success, or a negative error in case of failure.
1354
1355 long bpf_perf_prog_read_value(struct bpf_perf_event_data *ctx, struct
1356 bpf_perf_event_value *buf, u32 buf_size)
1357
1358 Description
1359 For an eBPF program attached to a perf event, retrieve
1360 the value of the event counter associated to ctx and
1361 store it in the structure pointed by buf and of size
1362 buf_size. Enabled and running times are also stored in
1363 the structure (see description of helper
1364 bpf_perf_event_read_value() for more details).
1365
1366 Return 0 on success, or a negative error in case of failure.
1367
1368 long bpf_getsockopt(void *bpf_socket, int level, int optname, void
1369 *optval, int optlen)
1370
1371 Description
1372 Emulate a call to getsockopt() on the socket associated
1373 to bpf_socket, which must be a full socket. The level at
1374 which the option resides and the name optname of the op‐
1375 tion must be specified, see getsockopt(2) for more infor‐
1376 mation. The retrieved value is stored in the structure
1377 pointed to by optval and of length optlen.
1378
1379 bpf_socket should be one of the following:
1380
1381 • struct bpf_sock_ops for BPF_PROG_TYPE_SOCK_OPS.
1382
1383 • struct bpf_sock_addr for BPF_CGROUP_INET4_CONNECT and
1384 BPF_CGROUP_INET6_CONNECT.
1385
1386 This helper actually implements a subset of getsockopt().
1387 It supports the following levels:
1388
1389 • IPPROTO_TCP, which supports optname TCP_CONGESTION.
1390
1391 • IPPROTO_IP, which supports optname IP_TOS.
1392
1393 • IPPROTO_IPV6, which supports optname IPV6_TCLASS.
1394
1395 Return 0 on success, or a negative error in case of failure.
1396
1397 long bpf_override_return(struct pt_regs *regs, u64 rc)
1398
1399 Description
1400 Used for error injection, this helper uses kprobes to
1401 override the return value of the probed function, and to
1402 set it to rc. The first argument is the context regs on
1403 which the kprobe works.
1404
1405 This helper works by setting the PC (program counter) to
1406 an override function which is run in place of the origi‐
1407 nal probed function. This means the probed function is
1408 not run at all. The replacement function just returns
1409 with the required value.
1410
1411 This helper has security implications, and thus is sub‐
1412 ject to restrictions. It is only available if the kernel
1413 was compiled with the CONFIG_BPF_KPROBE_OVERRIDE configu‐
1414 ration option, and in this case it only works on func‐
1415 tions tagged with ALLOW_ERROR_INJECTION in the kernel
1416 code.
1417
1418 Also, the helper is only available for the architectures
1419 having the CONFIG_FUNCTION_ERROR_INJECTION option. As of
1420 this writing, x86 architecture is the only one to support
1421 this feature.
1422
1423 Return 0
1424
1425 long bpf_sock_ops_cb_flags_set(struct bpf_sock_ops *bpf_sock, int
1426 argval)
1427
1428 Description
1429 Attempt to set the value of the bpf_sock_ops_cb_flags
1430 field for the full TCP socket associated to bpf_sock
1431 to argval.
1432
1433 The primary use of this field is to determine if there
1434 should be calls to eBPF programs of type
1435 BPF_PROG_TYPE_SOCK_OPS at various points in the TCP code.
1436 A program of the same type can change its value, per con‐
1437 nection and as necessary, when the connection is estab‐
1438 lished. This field is directly accessible for reading,
1439 but this helper must be used for updates in order to re‐
1440 turn an error if an eBPF program tries to set a callback
1441 that is not supported in the current kernel.
1442
1443 argval is a flag array which can combine these flags:
1444
1445 • BPF_SOCK_OPS_RTO_CB_FLAG (retransmission time out)
1446
1447 • BPF_SOCK_OPS_RETRANS_CB_FLAG (retransmission)
1448
1449 • BPF_SOCK_OPS_STATE_CB_FLAG (TCP state change)
1450
1451 • BPF_SOCK_OPS_RTT_CB_FLAG (every RTT)
1452
1453 Therefore, this function can be used to clear a callback
1454 flag by setting the appropriate bit to zero. For example,
1455 to disable the RTO callback:
1456
1457 bpf_sock_ops_cb_flags_set(bpf_sock,
1458 bpf_sock->bpf_sock_ops_cb_flags &
1459 ~BPF_SOCK_OPS_RTO_CB_FLAG)
1460
1461 Here are some examples of where one could call such an
1462 eBPF program:
1463
1464 • When RTO fires.
1465
1466 • When a packet is retransmitted.
1467
1468 • When the connection terminates.
1469
1470 • When a packet is sent.
1471
1472 • When a packet is received.
1473
1474 Return Code -EINVAL if the socket is not a full TCP socket; oth‐
1475 erwise, a positive number containing the bits that could
1476 not be set is returned (which comes down to 0 if all bits
1477 were set as required).
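
The return value convention can be illustrated with a small model in plain C. ex_cb_flags_set() is a hypothetical function, not the kernel helper: it assumes that supported bits are stored even when other bits are rejected, and it takes the set of supported callbacks as a parameter. The EX_ flag values are assumed to match the BPF_SOCK_OPS_*_CB_FLAG definitions in linux/bpf.h.

```c
/* Assumed mirrors of the BPF_SOCK_OPS_*_CB_FLAG values. */
#define EX_RTO_CB_FLAG     (1 << 0)
#define EX_RETRANS_CB_FLAG (1 << 1)
#define EX_STATE_CB_FLAG   (1 << 2)
#define EX_RTT_CB_FLAG     (1 << 3)

/* Store the supported bits of argval in *cb_flags and return the
 * bits that could not be set (0 if all were accepted). */
static int ex_cb_flags_set(int *cb_flags, int argval, int supported)
{
    *cb_flags = argval & supported;
    return argval & ~supported;
}
```

A caller can therefore compare the result with zero to detect a kernel that lacks one of the requested callbacks, and inspect the returned bits to see which one.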
1478
1479 long bpf_msg_redirect_map(struct sk_msg_buff *msg, struct bpf_map *map,
1480 u32 key, u64 flags)
1481
1482 Description
1483 This helper is used in programs implementing policies at
1484 the socket level. If the message msg is allowed to pass
1485 (i.e. if the verdict eBPF program returns SK_PASS), redi‐
1486 rect it to the socket referenced by map (of type
1487 BPF_MAP_TYPE_SOCKMAP) at index key. Both ingress and
1488 egress interfaces can be used for redirection. The
1489 BPF_F_INGRESS value in flags is used to make the distinc‐
1490 tion (ingress path is selected if the flag is present,
1491 egress path otherwise). This is the only flag supported
1492 for now.
1493
1494 Return SK_PASS on success, or SK_DROP on error.
1495
1496 long bpf_msg_apply_bytes(struct sk_msg_buff *msg, u32 bytes)
1497
1498 Description
1499 For socket policies, apply the verdict of the eBPF pro‐
1500 gram to the next bytes (number of bytes) of message msg.
1501
1502 For example, this helper can be used in the following
1503 cases:
1504
1505 • A single sendmsg() or sendfile() system call contains
1506 multiple logical messages that the eBPF program is sup‐
1507 posed to read and for which it should apply a verdict.
1508
1509 • An eBPF program only cares to read the first bytes of a
1510 msg. If the message has a large payload, then setting
1511 up and calling the eBPF program repeatedly for all
1512 bytes, even though the verdict is already known, would
1513 create unnecessary overhead.
1514
1515 When called from within an eBPF program, the helper sets
1516 a counter internal to the BPF infrastructure, that is
1517 used to apply the last verdict to the next bytes. If
1518 bytes is smaller than the current data being processed
1519 from a sendmsg() or sendfile() system call, the first
1520 bytes will be sent and the eBPF program will be re-run
1521 with the pointer for start of data pointing to byte num‐
1522 ber bytes + 1. If bytes is larger than the current data
1523 being processed, then the eBPF verdict will be applied to
1524 multiple sendmsg() or sendfile() calls until bytes are
1525 consumed.
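
To make the two cases above concrete, the following plain C sketch counts how many times a verdict program would run for a sequence of send sizes, assuming the program calls bpf_msg_apply_bytes() with the same value on every run. It is a simplified model of the documented behaviour, not kernel code, and ex_count_verdict_runs() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Count verdict program runs for nsends sends of the given sizes,
 * when each run applies its verdict to the next `apply` bytes.
 * An `apply` of 0 models a program that does not call the helper:
 * the verdict then covers the data currently being processed. */
static unsigned int ex_count_verdict_runs(const uint32_t *sends,
                                          size_t nsends, uint32_t apply)
{
    unsigned int runs = 0;
    uint32_t remaining = 0; /* bytes still covered by the last verdict */

    for (size_t i = 0; i < nsends; i++) {
        uint32_t left = sends[i];

        while (left > 0) {
            if (remaining == 0) {
                runs++; /* program runs and sets the counter */
                remaining = apply;
                if (remaining == 0) {
                    left = 0;
                    break;
                }
            }
            uint32_t n = left < remaining ? left : remaining;
            left -= n;
            remaining -= n;
        }
    }
    return runs;
}
```

In this model, a single 100-byte send with a value of 40 leads to three runs, while three 10-byte sends with a value of 25 lead to two.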
1526
1527 Note that if a socket closes with the internal counter
1528 holding a non-zero value, this is not a problem because
1529 data is not being buffered for bytes and is sent as it is
1530 received.
1531
1532 Return 0
1533
1534 long bpf_msg_cork_bytes(struct sk_msg_buff *msg, u32 bytes)
1535
1536 Description
1537 For socket policies, prevent the execution of the verdict
1538 eBPF program for message msg until bytes (byte number)
1539 have been accumulated.
1540
1541 This can be used when one needs a specific number of
1542 bytes before a verdict can be assigned, even if the data
1543 spans multiple sendmsg() or sendfile() calls. The extreme
1544 case would be a user calling sendmsg() repeatedly with
1545 1-byte long message segments. Obviously, this is bad for
1546 performance, but it is still valid. If the eBPF program
1547 needs bytes bytes to validate a header, this helper can
1548 be used to prevent the eBPF program from being called again
1549 until bytes have been accumulated.
1550
1551 Return 0
1552
1553 long bpf_msg_pull_data(struct sk_msg_buff *msg, u32 start, u32 end, u64
1554 flags)
1555
1556 Description
1557 For socket policies, pull in non-linear data from user
1558 space for msg and set pointers msg->data and
1559 msg->data_end to start and end bytes offsets into msg,
1560 respectively.
1561
1562 If a program of type BPF_PROG_TYPE_SK_MSG is run on a msg
1563 it can only parse data that the (data, data_end) pointers
1564 have already consumed. For sendmsg() hooks this is likely
1565 the first scatterlist element. But for calls relying on
1566 the sendpage handler (e.g. sendfile()) this will be the
1567 range (0, 0) because the data is shared with user space
1568 and by default the objective is to avoid allowing user
1569 space to modify data while (or after) eBPF verdict is be‐
1570 ing decided. This helper can be used to pull in data and
1571 to set the start and end pointer to given values. Data
1572 will be copied if necessary (i.e. if data was not linear
1573 and if start and end pointers do not point to the same
1574 chunk).
1575
1576 A call to this helper is susceptible to change the under‐
1577 lying packet buffer. Therefore, at load time, all checks
1578 on pointers previously done by the verifier are invali‐
1579 dated and must be performed again, if the helper is used
1580 in combination with direct packet access.
1581
1582 All values for flags are reserved for future usage, and
1583 must be left at zero.
1584
1585 Return 0 on success, or a negative error in case of failure.
1586
1587 long bpf_bind(struct bpf_sock_addr *ctx, struct sockaddr *addr, int
1588 addr_len)
1589
1590 Description
1591 Bind the socket associated to ctx to the address pointed
1592 by addr, of length addr_len. This allows for making out‐
1593 going connections from the desired IP address, which can
1594 be useful for example when all processes inside a cgroup
1595 should use one single IP address on a host that has mul‐
1596 tiple IP addresses configured.
1597
1598 This helper works for IPv4 and IPv6, TCP and UDP sockets.
1599 The domain (addr->sa_family) must be AF_INET (or
1600 AF_INET6). It's advised to pass zero port (sin_port or
1601 sin6_port) which triggers IP_BIND_ADDRESS_NO_PORT-like
1602 behavior and lets the kernel efficiently pick an unused
1603 port as long as the 4-tuple is unique. Passing a non-zero
1604 port might lead to degraded performance.
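
The address preparation advised above can be sketched in plain C. ex_fill_bind_addr4() is a hypothetical helper for this example; in an eBPF program attached to a connect hook, the resulting structure would then be passed as bpf_bind(ctx, (struct sockaddr *)&sa, sizeof(sa)), a call which is not part of this block.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Fill sa with the IPv4 source address src_be (network byte order)
 * and a zero port, so that the kernel picks an unused port. */
static void ex_fill_bind_addr4(struct sockaddr_in *sa, uint32_t src_be)
{
    memset(sa, 0, sizeof(*sa));
    sa->sin_family = AF_INET;
    sa->sin_addr.s_addr = src_be;
    sa->sin_port = 0;
}
```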
1605
1606 Return 0 on success, or a negative error in case of failure.
1607
1608 long bpf_xdp_adjust_tail(struct xdp_buff *xdp_md, int delta)
1609
1610 Description
1611 Adjust (move) xdp_md->data_end by delta bytes. It is pos‐
1612 sible to both shrink and grow the packet tail. Shrinking
1613 is done by passing a negative integer as delta.
1614
A call to this helper may change the underlying packet
buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again if the helper is used in combination with
direct packet access.
1620
1621 Return 0 on success, or a negative error in case of failure.
1622
1623 long bpf_skb_get_xfrm_state(struct sk_buff *skb, u32 index, struct
1624 bpf_xfrm_state *xfrm_state, u32 size, u64 flags)
1625
1626 Description
1627 Retrieve the XFRM state (IP transform framework, see also
1628 ip-xfrm(8)) at index in XFRM "security path" for skb.
1629
1630 The retrieved value is stored in the struct
1631 bpf_xfrm_state pointed by xfrm_state and of length size.
1632
1633 All values for flags are reserved for future usage, and
1634 must be left at zero.
1635
1636 This helper is available only if the kernel was compiled
1637 with CONFIG_XFRM configuration option.
1638
1639 Return 0 on success, or a negative error in case of failure.
1640
1641 long bpf_get_stack(void *ctx, void *buf, u32 size, u64 flags)
1642
1643 Description
Return a user or a kernel stack in a buffer provided by the
BPF program. To achieve this, the helper needs ctx, which
is a pointer to the context on which the tracing program is
executed. To store the stack trace, the BPF program
provides buf with a nonnegative size.
1649
1650 The last argument, flags, holds the number of stack
1651 frames to skip (from 0 to 255), masked with
1652 BPF_F_SKIP_FIELD_MASK. The next bits can be used to set
1653 the following flags:
1654
1655 BPF_F_USER_STACK
1656 Collect a user space stack instead of a kernel
1657 stack.
1658
1659 BPF_F_USER_BUILD_ID
1660 Collect (build_id, file_offset) instead of ips for
1661 user stack, only valid if BPF_F_USER_STACK is also
1662 specified.
1663
1664 file_offset is an offset relative to the beginning
1665 of the executable or shared object file backing
1666 the vma which the ip falls in. It is not an offset
1667 relative to that object's base address. Accord‐
1668 ingly, it must be adjusted by adding (sh_addr -
1669 sh_offset), where sh_{addr,offset} correspond to
1670 the executable section containing file_offset in
1671 the object, for comparisons to symbols' st_value
1672 to be valid.
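
The file_offset adjustment described above amounts to a single
addition. The following is a minimal sketch; the function name
is hypothetical, and sh_addr and sh_offset are assumed to have
been read from the ELF section header by other means.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: turn a file_offset collected with
 * BPF_F_USER_BUILD_ID into a value that can be compared with a
 * symbol's st_value. sh_addr and sh_offset belong to the executable
 * section that contains file_offset. */
static uint64_t file_offset_to_vaddr(uint64_t file_offset,
                                     uint64_t sh_addr, uint64_t sh_offset)
{
    return file_offset + (sh_addr - sh_offset);
}
```

For a section with sh_addr 0x401000 and sh_offset 0x1000, a
file_offset of 0x1234 maps to 0x401234.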
1673
bpf_get_stack() can collect up to PERF_MAX_STACK_DEPTH
frames, for both kernel and user stacks, provided the
buffer is large enough. Note that this limit can be
controlled with the sysctl program, and that it should be
manually increased in order to profile long user stacks
(such as stacks for Java programs). To do so, use:
1680
1681 # sysctl kernel.perf_event_max_stack=<new value>
1682
1683 Return The non-negative copied buf length equal to or less than
1684 size on success, or a negative error in case of failure.
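
The layout of the flags argument described above (skip count in
the low bits, option flags above it) can be sketched as follows.
stack_flags() is a hypothetical helper; the three constants are
repeated from linux/bpf.h only so that the sketch compiles on its
own, and should be checked against the header in use.

```c
#include <assert.h>
#include <stdint.h>

/* Repeated from the UAPI header for self-containment (assumed to
 * match linux/bpf.h). */
#define BPF_F_SKIP_FIELD_MASK 0xffULL
#define BPF_F_USER_STACK      (1ULL << 8)
#define BPF_F_USER_BUILD_ID   (1ULL << 11)

/* Hypothetical helper: build the flags argument for bpf_get_stack(). */
static uint64_t stack_flags(unsigned int skip, int user, int build_id)
{
    /* The number of frames to skip lives in the low eight bits. */
    uint64_t flags = (uint64_t)skip & BPF_F_SKIP_FIELD_MASK;

    if (user)
        flags |= BPF_F_USER_STACK;
    /* Build IDs are only valid together with BPF_F_USER_STACK. */
    if (user && build_id)
        flags |= BPF_F_USER_BUILD_ID;
    return flags;
}
```

A program would then call, for example,
bpf_get_stack(ctx, buf, size, stack_flags(2, 1, 0)) to skip two
frames of a user stack.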
1685
1686 long bpf_skb_load_bytes_relative(const void *skb, u32 offset, void *to,
1687 u32 len, u32 start_header)
1688
1689 Description
This helper is similar to bpf_skb_load_bytes() in that it
provides an easy way to load len bytes from offset from the
packet associated to skb, into the buffer pointed to by to.
The difference from bpf_skb_load_bytes() is that a fifth
argument, start_header, selects the base offset to start
from. start_header can be one of:
1696
1697 BPF_HDR_START_MAC
1698 Base offset to load data from is skb's mac header.
1699
1700 BPF_HDR_START_NET
1701 Base offset to load data from is skb's network
1702 header.
1703
In general, "direct packet access" is the preferred method
to access packet data. However, this helper is particularly
useful in socket filters, where skb->data does not always
point to the start of the MAC header and where "direct
packet access" is not available.
1709
1710 Return 0 on success, or a negative error in case of failure.
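
To illustrate how start_header changes the base of the copy, here
is a user-space model, not the kernel implementation. struct
pkt_model and load_bytes_relative() are hypothetical, and the
error code is only illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same numeric values as enum bpf_hdr_start_off in linux/bpf.h. */
enum { BPF_HDR_START_MAC = 0, BPF_HDR_START_NET = 1 };

/* Hypothetical stand-in for an skb: a flat buffer plus header offsets. */
struct pkt_model {
    const uint8_t *buf;
    size_t len;
    size_t mac_off;   /* start of the MAC header within buf */
    size_t net_off;   /* start of the network header within buf */
};

static int load_bytes_relative(const struct pkt_model *p, uint32_t offset,
                               void *to, uint32_t len, uint32_t start_header)
{
    size_t base;

    switch (start_header) {
    case BPF_HDR_START_MAC: base = p->mac_off; break;
    case BPF_HDR_START_NET: base = p->net_off; break;
    default: return -EFAULT;   /* illustrative error code */
    }
    /* Reject any read that would run past the end of the packet. */
    if (base > p->len || offset > p->len - base ||
        len > p->len - base - offset)
        return -EFAULT;
    memcpy(to, p->buf + base + offset, len);
    return 0;
}
```

With an Ethernet frame, offset 12 relative to BPF_HDR_START_MAC
reads the EtherType, while offset 0 relative to BPF_HDR_START_NET
reads the first byte of the IP header.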
1711
1712 long bpf_fib_lookup(void *ctx, struct bpf_fib_lookup *params, int plen,
1713 u32 flags)
1714
1715 Description
Do a FIB lookup in kernel tables using parameters in
params. If the lookup is successful and the result shows
the packet is to be forwarded, the neighbor tables are
searched for the nexthop. If successful (i.e., FIB lookup
shows forwarding and nexthop is resolved), the nexthop
address is returned in ipv4_dst or ipv6_dst based on
family, smac is set to the MAC address of the egress
device, dmac is set to the nexthop MAC address, rt_metric
is set to the metric from the route (IPv4/IPv6 only), and
ifindex is set to the device index of the nexthop from the
FIB lookup.
1726
1727 plen argument is the size of the passed in struct. flags
1728 argument can be a combination of one or more of the fol‐
1729 lowing values:
1730
1731 BPF_FIB_LOOKUP_DIRECT
1732 Do a direct table lookup vs full lookup using FIB
1733 rules.
1734
1735 BPF_FIB_LOOKUP_OUTPUT
1736 Perform lookup from an egress perspective (default
1737 is ingress).
1738
ctx is either struct xdp_md for XDP programs or struct
sk_buff for tc cls_act programs.
1741
1742 Return
1743
1744 • < 0 if any input argument is invalid
1745
1746 • 0 on success (packet is forwarded, nexthop neighbor ex‐
1747 ists)
1748
1749 • > 0 one of BPF_FIB_LKUP_RET_ codes explaining why the
1750 packet is not forwarded or needs assist from full stack
1751
1752 If lookup fails with BPF_FIB_LKUP_RET_FRAG_NEEDED, then
1753 the MTU was exceeded and output params->mtu_result con‐
1754 tains the MTU.
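
The three classes of return value listed above can be told apart
by sign alone. A minimal sketch follows; the enum and function
names are hypothetical and not part of the kernel API.

```c
#include <assert.h>

/* Hypothetical outcome names for the three documented cases. */
enum fib_outcome { FIB_BAD_ARGS, FIB_FORWARD, FIB_NOT_FORWARDED };

static enum fib_outcome classify_fib_result(long rc)
{
    if (rc < 0)
        return FIB_BAD_ARGS;      /* an input argument was invalid */
    if (rc == 0)
        return FIB_FORWARD;       /* nexthop resolved, params filled in */
    return FIB_NOT_FORWARDED;     /* rc is a BPF_FIB_LKUP_RET_* code */
}
```

A caller would typically redirect the packet only in the
FIB_FORWARD case and hand it to the regular stack otherwise.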
1755
1756 long bpf_sock_hash_update(struct bpf_sock_ops *skops, struct bpf_map
1757 *map, void *key, u64 flags)
1758
1759 Description
1760 Add an entry to, or update a sockhash map referencing
1761 sockets. The skops is used as a new value for the entry
1762 associated to key. flags is one of:
1763
1764 BPF_NOEXIST
1765 The entry for key must not exist in the map.
1766
1767 BPF_EXIST
1768 The entry for key must already exist in the map.
1769
1770 BPF_ANY
1771 No condition on the existence of the entry for
1772 key.
1773
1774 If the map has eBPF programs (parser and verdict), those
1775 will be inherited by the socket being added. If the
1776 socket is already attached to eBPF programs, this results
1777 in an error.
1778
1779 Return 0 on success, or a negative error in case of failure.
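
The effect of the three flags values can be modeled as a check on
whether the entry already exists. This is a user-space sketch
with a hypothetical function name; the errno values are an
assumption borrowed from common map update semantics.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same numeric values as in linux/bpf.h. */
enum { BPF_ANY = 0, BPF_NOEXIST = 1, BPF_EXIST = 2 };

/* Hypothetical model: may an update proceed, given whether an entry
 * for the key is already present? */
static int check_update_flags(int entry_exists, uint64_t flags)
{
    switch (flags) {
    case BPF_ANY:
        return 0;
    case BPF_NOEXIST:
        return entry_exists ? -EEXIST : 0;
    case BPF_EXIST:
        return entry_exists ? 0 : -ENOENT;
    default:
        return -EINVAL;   /* flags is one of the three values only */
    }
}
```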
1780
1781 long bpf_msg_redirect_hash(struct sk_msg_buff *msg, struct bpf_map
1782 *map, void *key, u64 flags)
1783
1784 Description
1785 This helper is used in programs implementing policies at
1786 the socket level. If the message msg is allowed to pass
1787 (i.e. if the verdict eBPF program returns SK_PASS), redi‐
1788 rect it to the socket referenced by map (of type
1789 BPF_MAP_TYPE_SOCKHASH) using hash key. Both ingress and
1790 egress interfaces can be used for redirection. The
1791 BPF_F_INGRESS value in flags is used to make the distinc‐
1792 tion (ingress path is selected if the flag is present,
1793 egress path otherwise). This is the only flag supported
1794 for now.
1795
1796 Return SK_PASS on success, or SK_DROP on error.
1797
1798 long bpf_sk_redirect_hash(struct sk_buff *skb, struct bpf_map *map,
1799 void *key, u64 flags)
1800
1801 Description
1802 This helper is used in programs implementing policies at
1803 the skb socket level. If the sk_buff skb is allowed to
1804 pass (i.e. if the verdict eBPF program returns SK_PASS),
1805 redirect it to the socket referenced by map (of type
1806 BPF_MAP_TYPE_SOCKHASH) using hash key. Both ingress and
1807 egress interfaces can be used for redirection. The
1808 BPF_F_INGRESS value in flags is used to make the distinc‐
1809 tion (ingress path is selected if the flag is present,
1810 egress otherwise). This is the only flag supported for
1811 now.
1812
1813 Return SK_PASS on success, or SK_DROP on error.
1814
1815 long bpf_lwt_push_encap(struct sk_buff *skb, u32 type, void *hdr, u32
1816 len)
1817
1818 Description
1819 Encapsulate the packet associated to skb within a Layer 3
1820 protocol header. This header is provided in the buffer at
1821 address hdr, with len its size in bytes. type indicates
1822 the protocol of the header and can be one of:
1823
1824 BPF_LWT_ENCAP_SEG6
1825 IPv6 encapsulation with Segment Routing Header
1826 (struct ipv6_sr_hdr). hdr only contains the SRH,
1827 the IPv6 header is computed by the kernel.
1828
1829 BPF_LWT_ENCAP_SEG6_INLINE
1830 Only works if skb contains an IPv6 packet. Insert
1831 a Segment Routing Header (struct ipv6_sr_hdr) in‐
1832 side the IPv6 header.
1833
1834 BPF_LWT_ENCAP_IP
1835 IP encapsulation (GRE/GUE/IPIP/etc). The outer
1836 header must be IPv4 or IPv6, followed by zero or
1837 more additional headers, up to LWT_BPF_MAX_HEAD‐
1838 ROOM total bytes in all prepended headers. Please
1839 note that if skb_is_gso(skb) is true, no more than
1840 two headers can be prepended, and the inner
1841 header, if present, should be either GRE or
1842 UDP/GUE.
1843
1844 BPF_LWT_ENCAP_SEG6* types can be called by BPF programs
1845 of type BPF_PROG_TYPE_LWT_IN; BPF_LWT_ENCAP_IP type can
1846 be called by bpf programs of types BPF_PROG_TYPE_LWT_IN
1847 and BPF_PROG_TYPE_LWT_XMIT.
1848
A call to this helper may change the underlying packet
buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again if the helper is used in combination with
direct packet access.
1854
1855 Return 0 on success, or a negative error in case of failure.
1856
1857 long bpf_lwt_seg6_store_bytes(struct sk_buff *skb, u32 offset, const
1858 void *from, u32 len)
1859
1860 Description
1861 Store len bytes from address from into the packet associ‐
1862 ated to skb, at offset. Only the flags, tag and TLVs in‐
1863 side the outermost IPv6 Segment Routing Header can be
1864 modified through this helper.
1865
A call to this helper may change the underlying packet
buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again if the helper is used in combination with
direct packet access.
1871
1872 Return 0 on success, or a negative error in case of failure.
1873
1874 long bpf_lwt_seg6_adjust_srh(struct sk_buff *skb, u32 offset, s32
1875 delta)
1876
1877 Description
Adjust the size allocated to TLVs in the outermost IPv6
Segment Routing Header contained in the packet associated
to skb, at position offset, by delta bytes. Only offsets
after the segments are accepted. delta can be either
positive (growing) or negative (shrinking).
1883
A call to this helper may change the underlying packet
buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again if the helper is used in combination with
direct packet access.
1889
1890 Return 0 on success, or a negative error in case of failure.
1891
1892 long bpf_lwt_seg6_action(struct sk_buff *skb, u32 action, void *param,
1893 u32 param_len)
1894
1895 Description
1896 Apply an IPv6 Segment Routing action of type action to
1897 the packet associated to skb. Each action takes a parame‐
1898 ter contained at address param, and of length param_len
1899 bytes. action can be one of:
1900
1901 SEG6_LOCAL_ACTION_END_X
1902 End.X action: Endpoint with Layer-3 cross-connect.
1903 Type of param: struct in6_addr.
1904
1905 SEG6_LOCAL_ACTION_END_T
1906 End.T action: Endpoint with specific IPv6 table
1907 lookup. Type of param: int.
1908
1909 SEG6_LOCAL_ACTION_END_B6
1910 End.B6 action: Endpoint bound to an SRv6 policy.
1911 Type of param: struct ipv6_sr_hdr.
1912
1913 SEG6_LOCAL_ACTION_END_B6_ENCAP
1914 End.B6.Encap action: Endpoint bound to an SRv6 en‐
1915 capsulation policy. Type of param: struct
1916 ipv6_sr_hdr.
1917
A call to this helper may change the underlying packet
buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again if the helper is used in combination with
direct packet access.
1923
1924 Return 0 on success, or a negative error in case of failure.
1925
1926 long bpf_rc_repeat(void *ctx)
1927
1928 Description
This helper is used in programs implementing IR decoding,
to report a successfully decoded repeat key message. This
delays the generation of a key up event for the previously
generated key down event.

Some IR protocols, such as NEC, have a special IR message
for repeating the last button, sent while a button is held
down.
1936
1937 The ctx should point to the lirc sample as passed into
1938 the program.
1939
This helper is only available if the kernel was compiled
with the CONFIG_BPF_LIRC_MODE2 configuration option set to
"y".
1943
1944 Return 0
1945
1946 long bpf_rc_keydown(void *ctx, u32 protocol, u64 scancode, u32 toggle)
1947
1948 Description
1949 This helper is used in programs implementing IR decoding,
1950 to report a successfully decoded key press with scancode,
1951 toggle value in the given protocol. The scancode will be
1952 translated to a keycode using the rc keymap, and reported
1953 as an input key down event. After a period a key up event
1954 is generated. This period can be extended by calling ei‐
1955 ther bpf_rc_keydown() again with the same values, or
1956 calling bpf_rc_repeat().
1957
1958 Some protocols include a toggle bit, in case the button
1959 was released and pressed again between consecutive scan‐
1960 codes.
1961
1962 The ctx should point to the lirc sample as passed into
1963 the program.
1964
1965 The protocol is the decoded protocol number (see enum
1966 rc_proto for some predefined values).
1967
This helper is only available if the kernel was compiled
with the CONFIG_BPF_LIRC_MODE2 configuration option set to
"y".
1971
1972 Return 0
1973
1974 u64 bpf_skb_cgroup_id(struct sk_buff *skb)
1975
1976 Description
Return the cgroup v2 id of the socket associated with the
skb. This is roughly similar to the
bpf_get_cgroup_classid() helper for cgroup v1 in that it
provides a tag or identifier that can be matched on or used
for map lookups, e.g. to implement policy. The cgroup v2 id
of a given path in the hierarchy is exposed in user space
through the f_handle API in order to get to the same 64-bit
id.
1984
1985 This helper can be used on TC egress path, but not on
1986 ingress, and is available only if the kernel was compiled
1987 with the CONFIG_SOCK_CGROUP_DATA configuration option.
1988
1989 Return The id is returned or 0 in case the id could not be re‐
1990 trieved.
1991
1992 u64 bpf_get_current_cgroup_id(void)
1993
1994 Description
1995 Get the current cgroup id based on the cgroup within
1996 which the current task is running.
1997
1998 Return A 64-bit integer containing the current cgroup id based
1999 on the cgroup within which the current task is running.
2000
2001 void *bpf_get_local_storage(void *map, u64 flags)
2002
2003 Description
Get the pointer to the local storage area. The type and
the size of the local storage are defined by the map
argument. The meaning of flags is specific to each map
type, and flags has to be 0 for cgroup local storage.
2008
2009 Depending on the BPF program type, a local storage area
2010 can be shared between multiple instances of the BPF pro‐
2011 gram, running simultaneously.
2012
The user is responsible for synchronization, for example
by using the BPF_ATOMIC instructions to alter the shared
data.
2016
2017 Return A pointer to the local storage area.
2018
2019 long bpf_sk_select_reuseport(struct sk_reuseport_md *reuse, struct
2020 bpf_map *map, void *key, u64 flags)
2021
2022 Description
Select a SO_REUSEPORT socket from a
BPF_MAP_TYPE_REUSEPORT_SOCKARRAY map. It checks that the
selected socket matches the incoming request in the socket
buffer.
2026
2027 Return 0 on success, or a negative error in case of failure.
2028
2029 u64 bpf_skb_ancestor_cgroup_id(struct sk_buff *skb, int ancestor_level)
2030
2031 Description
Return the id of the cgroup v2 that is the ancestor, at
ancestor_level, of the cgroup associated with the skb. The
root cgroup is at ancestor_level zero and each step down
the hierarchy increments the level. If ancestor_level
equals the level of the cgroup associated with skb, the
return value is the same as that of bpf_skb_cgroup_id().

The helper is useful to implement policies based on cgroups
that are higher in the hierarchy than the immediate cgroup
associated with skb.

The format of the returned id and the helper limitations
are the same as in bpf_skb_cgroup_id().
2045
2046 Return The id is returned or 0 in case the id could not be re‐
2047 trieved.
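
The level numbering described above can be modeled with an array
holding the ids along the path from the root to the cgroup of
the skb. This is a user-space illustration; the function name
and the ids used with it are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: path[0] is the id of the root cgroup and
 * path[depth] the id of the cgroup associated with the skb.
 * Returns 0 when no ancestor exists at the requested level. */
static uint64_t ancestor_cgroup_id(const uint64_t *path, int depth,
                                   int ancestor_level)
{
    if (ancestor_level < 0 || ancestor_level > depth)
        return 0;
    return path[ancestor_level];
}
```

Asking for the level of the skb's own cgroup (depth) yields the
same id that bpf_skb_cgroup_id() would report in this model.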
2048
2049 struct bpf_sock *bpf_sk_lookup_tcp(void *ctx, struct bpf_sock_tuple
2050 *tuple, u32 tuple_size, u64 netns, u64 flags)
2051
2052 Description
2053 Look for TCP socket matching tuple, optionally in a child
2054 network namespace netns. The return value must be
2055 checked, and if non-NULL, released via bpf_sk_release().
2056
2057 The ctx should point to the context of the program, such
2058 as the skb or socket (depending on the hook in use). This
2059 is used to determine the base network namespace for the
2060 lookup.
2061
2062 tuple_size must be one of:
2063
2064 sizeof(tuple->ipv4)
2065 Look for an IPv4 socket.
2066
2067 sizeof(tuple->ipv6)
2068 Look for an IPv6 socket.
2069
2070 If the netns is a negative signed 32-bit integer, then
2071 the socket lookup table in the netns associated with the
2072 ctx will be used. For the TC hooks, this is the netns of
2073 the device in the skb. For socket hooks, this is the
2074 netns of the socket. If netns is any other signed 32-bit
2075 value greater than or equal to zero then it specifies the
2076 ID of the netns relative to the netns associated with the
2077 ctx. netns values beyond the range of 32-bit integers are
2078 reserved for future use.
2079
2080 All values for flags are reserved for future usage, and
2081 must be left at zero.
2082
2083 This helper is available only if the kernel was compiled
2084 with CONFIG_NET configuration option.
2085
2086 Return Pointer to struct bpf_sock, or NULL in case of failure.
2087 For sockets with reuseport option, the struct bpf_sock
2088 result is from reuse->socks[] using the hash of the tu‐
2089 ple.
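
The three ranges of the netns argument can be sketched as a small
classifier. This is one reading of the rules above, not the
kernel's code: names are hypothetical, and treating only the low
32 bits as the signed value (so that an all-ones 64-bit value
counts as negative) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the three documented cases. */
enum netns_kind { NETNS_OF_CTX, NETNS_BY_ID, NETNS_RESERVED };

static enum netns_kind classify_netns(uint64_t netns)
{
    /* Sign bit of the low 32 bits set: negative as a signed 32-bit
     * integer, so the netns associated with ctx is used. */
    if (netns & 0x80000000ULL)
        return NETNS_OF_CTX;
    /* Zero or positive signed 32-bit value: a netns ID. */
    if (netns <= INT32_MAX)
        return NETNS_BY_ID;
    /* Anything else is reserved for future use. */
    return NETNS_RESERVED;
}
```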
2090
2091 struct bpf_sock *bpf_sk_lookup_udp(void *ctx, struct bpf_sock_tuple
2092 *tuple, u32 tuple_size, u64 netns, u64 flags)
2093
2094 Description
2095 Look for UDP socket matching tuple, optionally in a child
2096 network namespace netns. The return value must be
2097 checked, and if non-NULL, released via bpf_sk_release().
2098
2099 The ctx should point to the context of the program, such
2100 as the skb or socket (depending on the hook in use). This
2101 is used to determine the base network namespace for the
2102 lookup.
2103
2104 tuple_size must be one of:
2105
2106 sizeof(tuple->ipv4)
2107 Look for an IPv4 socket.
2108
2109 sizeof(tuple->ipv6)
2110 Look for an IPv6 socket.
2111
2112 If the netns is a negative signed 32-bit integer, then
2113 the socket lookup table in the netns associated with the
2114 ctx will be used. For the TC hooks, this is the netns of
2115 the device in the skb. For socket hooks, this is the
2116 netns of the socket. If netns is any other signed 32-bit
2117 value greater than or equal to zero then it specifies the
2118 ID of the netns relative to the netns associated with the
2119 ctx. netns values beyond the range of 32-bit integers are
2120 reserved for future use.
2121
2122 All values for flags are reserved for future usage, and
2123 must be left at zero.
2124
2125 This helper is available only if the kernel was compiled
2126 with CONFIG_NET configuration option.
2127
2128 Return Pointer to struct bpf_sock, or NULL in case of failure.
2129 For sockets with reuseport option, the struct bpf_sock
2130 result is from reuse->socks[] using the hash of the tu‐
2131 ple.
2132
2133 long bpf_sk_release(void *sock)
2134
2135 Description
2136 Release the reference held by sock. sock must be a
2137 non-NULL pointer that was returned from
2138 bpf_sk_lookup_xxx().
2139
2140 Return 0 on success, or a negative error in case of failure.
2141
2142 long bpf_map_push_elem(struct bpf_map *map, const void *value, u64
2143 flags)
2144
2145 Description
2146 Push an element value in map. flags is one of:
2147
2148 BPF_EXIST
If the queue/stack is full, the oldest element is removed
to make room for the new one.
2151
2152 Return 0 on success, or a negative error in case of failure.
2153
2154 long bpf_map_pop_elem(struct bpf_map *map, void *value)
2155
2156 Description
2157 Pop an element from map.
2158
2159 Return 0 on success, or a negative error in case of failure.
2160
2161 long bpf_map_peek_elem(struct bpf_map *map, void *value)
2162
2163 Description
2164 Get an element from map without removing it.
2165
2166 Return 0 on success, or a negative error in case of failure.
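
The push, pop and peek semantics above, including the effect of
BPF_EXIST on a full queue, can be modeled with a small ring
buffer. This is a user-space sketch of a queue map only; the
struct, the capacity and the errno values are assumptions made
for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define BPF_EXIST 2   /* same value as in linux/bpf.h */
#define QUEUE_CAP 4   /* hypothetical max_entries */

struct queue_model {
    uint32_t slot[QUEUE_CAP];
    unsigned int head;    /* index of the oldest element */
    unsigned int count;   /* number of stored elements */
};

static int queue_push(struct queue_model *q, uint32_t value, uint64_t flags)
{
    if (flags & ~(uint64_t)BPF_EXIST)
        return -EINVAL;
    if (q->count == QUEUE_CAP) {
        if (!(flags & BPF_EXIST))
            return -E2BIG;
        /* Full and BPF_EXIST set: drop the oldest element. */
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
    }
    q->slot[(q->head + q->count) % QUEUE_CAP] = value;
    q->count++;
    return 0;
}

/* Read the oldest element without removing it. */
static int queue_peek(const struct queue_model *q, uint32_t *value)
{
    if (q->count == 0)
        return -ENOENT;
    *value = q->slot[q->head];
    return 0;
}

/* Read and remove the oldest element. */
static int queue_pop(struct queue_model *q, uint32_t *value)
{
    int err = queue_peek(q, value);

    if (err)
        return err;
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return 0;
}
```

A stack map would differ only in that pop and peek return the
most recently pushed element instead of the oldest one.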
2167
2168 long bpf_msg_push_data(struct sk_msg_buff *msg, u32 start, u32 len, u64
2169 flags)
2170
2171 Description
2172 For socket policies, insert len bytes into msg at offset
2173 start.
2174
2175 If a program of type BPF_PROG_TYPE_SK_MSG is run on a msg
2176 it may want to insert metadata or options into the msg.
2177 This can later be read and used by any of the lower layer
2178 BPF hooks.
2179
This helper may fail under memory pressure (if an
allocation fails). In such cases the BPF program receives
an appropriate error, which it needs to handle.
2183
2184 Return 0 on success, or a negative error in case of failure.
2185
2186 long bpf_msg_pop_data(struct sk_msg_buff *msg, u32 start, u32 len, u64
2187 flags)
2188
2189 Description
Remove len bytes from msg, starting at byte start. This may
result in ENOMEM errors under certain situations if an
allocation and copy are required due to a full ring buffer.
However, the helper will try to avoid doing the allocation
if possible. Other errors can occur if the input parameters
are invalid, either because the start byte is not a valid
part of the msg payload or because the pop value is too
large.
2198
2199 Return 0 on success, or a negative error in case of failure.
2200
2201 long bpf_rc_pointer_rel(void *ctx, s32 rel_x, s32 rel_y)
2202
2203 Description
2204 This helper is used in programs implementing IR decoding,
2205 to report a successfully decoded pointer movement.
2206
2207 The ctx should point to the lirc sample as passed into
2208 the program.
2209
This helper is only available if the kernel was compiled
with the CONFIG_BPF_LIRC_MODE2 configuration option set to
"y".
2213
2214 Return 0
2215
2216 long bpf_spin_lock(struct bpf_spin_lock *lock)
2217
2218 Description
Acquire a spinlock represented by the pointer lock, which
is stored as part of a value of a map. Taking the lock
makes it possible to safely update the rest of the fields
in that value. The spinlock can (and must) later be
released with a call to bpf_spin_unlock(lock).
2224
2225 Spinlocks in BPF programs come with a number of restric‐
2226 tions and constraints:
2227
2228 • bpf_spin_lock objects are only allowed inside maps of
2229 types BPF_MAP_TYPE_HASH and BPF_MAP_TYPE_ARRAY (this
2230 list could be extended in the future).
2231
2232 • BTF description of the map is mandatory.
2233
• The BPF program can take ONE lock at a time, since taking
  two or more could cause deadlocks.
2236
2237 • Only one struct bpf_spin_lock is allowed per map ele‐
2238 ment.
2239
• While the lock is held, calls (either BPF to BPF or
  helpers) are not allowed.
2242
2243 • The BPF_LD_ABS and BPF_LD_IND instructions are not al‐
2244 lowed inside a spinlock-ed region.
2245
2246 • The BPF program MUST call bpf_spin_unlock() to release
2247 the lock, on all execution paths, before it returns.
2248
2249 • The BPF program can access struct bpf_spin_lock only
2250 via the bpf_spin_lock() and bpf_spin_unlock() helpers.
2251 Loading or storing data into the struct bpf_spin_lock
2252 lock; field of a map is not allowed.
2253
• To use the bpf_spin_lock() helper, the BTF description of
  the map value must be a struct and have a struct
  bpf_spin_lock anyname; field at the top level. A lock
  nested inside another struct is not allowed.
2258
2259 • The struct bpf_spin_lock lock field in a map value must
2260 be aligned on a multiple of 4 bytes in that value.
2261
• A syscall with command BPF_MAP_LOOKUP_ELEM does not copy
  the bpf_spin_lock field to user space.

• A syscall with command BPF_MAP_UPDATE_ELEM, or an update
  from a BPF program, does not update the bpf_spin_lock
  field.

• bpf_spin_lock cannot be on the stack or inside a
  networking packet (it can only be inside a map value).
2271
2272 • bpf_spin_lock is available to root only.
2273
2274 • Tracing programs and socket filter programs cannot use
2275 bpf_spin_lock() due to insufficient preemption checks
2276 (but this may change in the future).
2277
2278 • bpf_spin_lock is not allowed in inner maps of
2279 map-in-map.
2280
2281 Return 0
2282
2283 long bpf_spin_unlock(struct bpf_spin_lock *lock)
2284
2285 Description
2286 Release the lock previously locked by a call to
2287 bpf_spin_lock(lock).
2288
2289 Return 0
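
The layout constraints listed for bpf_spin_lock() (one lock per
element, at the top level of the value, aligned on 4 bytes) can
be checked at compile time. The sketch below is user-space C;
struct counter_value is hypothetical, and struct bpf_spin_lock is
repeated from linux/bpf.h only so the example stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match the UAPI definition in linux/bpf.h. */
struct bpf_spin_lock {
    uint32_t val;
};

/* Hypothetical map value: exactly one lock, at the top level. */
struct counter_value {
    uint64_t packets;
    uint64_t bytes;
    struct bpf_spin_lock lock;
};

/* The lock field must sit on a multiple of 4 bytes in the value. */
_Static_assert(offsetof(struct counter_value, lock) % 4 == 0,
               "bpf_spin_lock must be 4-byte aligned in the map value");

/* In a BPF program, with v obtained from bpf_map_lookup_elem():
 *
 *     bpf_spin_lock(&v->lock);
 *     v->packets++;           (no helper or BPF-to-BPF calls here)
 *     v->bytes += len;
 *     bpf_spin_unlock(&v->lock);
 */
```

Every path between the two calls has to reach bpf_spin_unlock()
before the program returns.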
2290
2291 struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)
2292
2293 Description
2294 This helper gets a struct bpf_sock pointer such that all
2295 the fields in this bpf_sock can be accessed.
2296
2297 Return A struct bpf_sock pointer on success, or NULL in case of
2298 failure.
2299
2300 struct bpf_tcp_sock *bpf_tcp_sock(struct bpf_sock *sk)
2301
2302 Description
2303 This helper gets a struct bpf_tcp_sock pointer from a
2304 struct bpf_sock pointer.
2305
2306 Return A struct bpf_tcp_sock pointer on success, or NULL in case
2307 of failure.
2308
2309 long bpf_skb_ecn_set_ce(struct sk_buff *skb)
2310
2311 Description
2312 Set ECN (Explicit Congestion Notification) field of IP
2313 header to CE (Congestion Encountered) if current value is
2314 ECT (ECN Capable Transport). Otherwise, do nothing. Works
2315 with IPv6 and IPv4.
2316
2317 Return 1 if the CE flag is set (either by the current helper
2318 call or because it was already present), 0 if it is not
2319 set.
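
The ECN rule above can be modeled on the IPv4 TOS (or IPv6
traffic class) byte, whose two low bits carry the ECN field. This
is a user-space sketch with hypothetical names; the real helper
works on the packet itself, and for IPv4 the header checksum also
has to stay consistent, which is not modeled here.

```c
#include <assert.h>
#include <stdint.h>

#define ECN_MASK    0x03   /* two low bits of TOS / traffic class */
#define ECN_NOT_ECT 0x00   /* sender is not ECN-capable */
#define ECN_CE      0x03   /* Congestion Encountered */

/* Hypothetical model: returns 1 if CE is set after the call,
 * 0 if the field was left as Not-ECT. */
static int ecn_set_ce(uint8_t *tos)
{
    if ((*tos & ECN_MASK) == ECN_NOT_ECT)
        return 0;          /* not ECT: do nothing */
    *tos |= ECN_CE;        /* ECT(0), ECT(1) and CE all end up as CE */
    return 1;
}
```

The upper six bits (DSCP) are left untouched in every case.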
2320
2321 struct bpf_sock *bpf_get_listener_sock(struct bpf_sock *sk)
2322
2323 Description
2324 Return a struct bpf_sock pointer in TCP_LISTEN state.
2325 bpf_sk_release() is unnecessary and not allowed.
2326
2327 Return A struct bpf_sock pointer on success, or NULL in case of
2328 failure.
2329
2330 struct bpf_sock *bpf_skc_lookup_tcp(void *ctx, struct bpf_sock_tuple
2331 *tuple, u32 tuple_size, u64 netns, u64 flags)
2332
2333 Description
2334 Look for TCP socket matching tuple, optionally in a child
2335 network namespace netns. The return value must be
2336 checked, and if non-NULL, released via bpf_sk_release().
2337
2338 This function is identical to bpf_sk_lookup_tcp(), except
2339 that it also returns timewait or request sockets. Use
2340 bpf_sk_fullsock() or bpf_tcp_sock() to access the full
2341 structure.
2342
2343 This helper is available only if the kernel was compiled
2344 with CONFIG_NET configuration option.
2345
2346 Return Pointer to struct bpf_sock, or NULL in case of failure.
2347 For sockets with reuseport option, the struct bpf_sock
2348 result is from reuse->socks[] using the hash of the tu‐
2349 ple.
2350
2351 long bpf_tcp_check_syncookie(void *sk, void *iph, u32 iph_len, struct
2352 tcphdr *th, u32 th_len)
2353
2354 Description
2355 Check whether iph and th contain a valid SYN cookie ACK
2356 for the listening socket in sk.
2357
2358 iph points to the start of the IPv4 or IPv6 header, while
2359 iph_len contains sizeof(struct iphdr) or sizeof(struct
2360 ipv6hdr).
2361
2362 th points to the start of the TCP header, while th_len
2363 contains the length of the TCP header (at least
2364 sizeof(struct tcphdr)).
2365
2366 Return 0 if iph and th are a valid SYN cookie ACK, or a negative
2367 error otherwise.
2368
2369 long bpf_sysctl_get_name(struct bpf_sysctl *ctx, char *buf, size_t
2370 buf_len, u64 flags)
2371
2372 Description
Get the name of the sysctl in /proc/sys/ and copy it into
the program-provided buffer buf of size buf_len.

The buffer is always NUL terminated, unless it is
zero-sized.

If flags is zero, the full name (e.g. "net/ipv4/tcp_mem")
is copied. Use the BPF_F_SYSCTL_BASE_NAME flag to copy the
base name only (e.g. "tcp_mem").
2382
Return Number of characters copied (not including the
trailing NUL).
2385
2386 -E2BIG if the buffer wasn't big enough (buf will contain
2387 truncated name in this case).
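
The copy and truncation rules above can be modeled in user space
as follows. copy_sysctl_name() is a hypothetical function, the
flag value is assumed to match linux/bpf.h, and the result for a
zero-sized buffer is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define BPF_F_SYSCTL_BASE_NAME 1ULL   /* assumed to match linux/bpf.h */

/* Hypothetical model of the name copy: returns the number of
 * characters copied, or -E2BIG after storing a truncated name. */
static long copy_sysctl_name(const char *full, char *buf, size_t buf_len,
                             unsigned long long flags)
{
    const char *name = full;
    size_t len;

    if (flags & BPF_F_SYSCTL_BASE_NAME) {
        const char *slash = strrchr(full, '/');

        if (slash)
            name = slash + 1;   /* keep only the last path component */
    }
    if (buf_len == 0)
        return -E2BIG;          /* assumption: nothing can be stored */
    len = strlen(name);
    if (len >= buf_len) {
        memcpy(buf, name, buf_len - 1);
        buf[buf_len - 1] = '\0';   /* truncated but NUL terminated */
        return -E2BIG;
    }
    memcpy(buf, name, len + 1);
    return (long)len;
}
```

Callers that can work with a truncated name still need to check
for -E2BIG explicitly, since the buffer is filled in either case.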
2388
2389 long bpf_sysctl_get_current_value(struct bpf_sysctl *ctx, char *buf,
2390 size_t buf_len)
2391
2392 Description
Get the current value of the sysctl as it is presented in
/proc/sys (including newline, etc.), and copy it as a
string into the program-provided buffer buf of size
buf_len.

The whole value is copied, regardless of the file position
at which user space issued, e.g., sys_read.
2399
2400 The buffer is always NUL terminated, unless it's
2401 zero-sized.
2402
Return Number of characters copied (not including the
trailing NUL).

-E2BIG if the buffer wasn't big enough (buf will contain
the truncated value in this case).
2408
2409 -EINVAL if current value was unavailable, e.g. because
2410 sysctl is uninitialized and read returns -EIO for it.
2411
2412 long bpf_sysctl_get_new_value(struct bpf_sysctl *ctx, char *buf, size_t
2413 buf_len)
2414
2415 Description
Get the new value being written by user space to the sysctl
(before the actual write happens) and copy it as a string
into the program-provided buffer buf of size buf_len.
2419
2420 User space may write new value at file position > 0.
2421
2422 The buffer is always NUL terminated, unless it's
2423 zero-sized.
2424
Return Number of characters copied (not including the
trailing NUL).

-E2BIG if the buffer wasn't big enough (buf will contain
the truncated value in this case).
2430
2431 -EINVAL if sysctl is being read.
2432
2433 long bpf_sysctl_set_new_value(struct bpf_sysctl *ctx, const char *buf,
2434 size_t buf_len)
2435
2436 Description
2437 Override new value being written by user space to sysctl
2438 with value provided by program in buffer buf of size
2439 buf_len.
2440
2441 buf should contain a string in same form as provided by
2442 user space on sysctl write.
2443
User space may write a new value at file position > 0. To
override the whole sysctl value, the file position should
be set to zero.
2447
2448 Return 0 on success.
2449
2450 -E2BIG if the buf_len is too big.
2451
2452 -EINVAL if sysctl is being read.
2453
2454 long bpf_strtol(const char *buf, size_t buf_len, u64 flags, long *res)
2455
2456 Description
2457 Convert the initial part of the string from buffer buf of
2458 size buf_len to a long integer according to the given
2459 base and save the result in res.
2460
2461 The string may begin with an arbitrary amount of white
2462 space (as determined by isspace(3)) followed by a single
2463 optional '-' sign.
2464
The five least significant bits of flags encode the base;
other bits are currently unused.

The base must be 8, 10, 16, or 0. A base of 0 means the
base is detected automatically, similar to user space
strtol(3).
2470
2471 Return Number of characters consumed on success. Must be posi‐
2472 tive but no more than buf_len.
2473
2474 -EINVAL if no valid digits were found or unsupported base
2475 was provided.
2476
2477 -ERANGE if resulting value was out of range.
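
The calling convention above (bounded buffer, base in the low
five bits of flags, consumed length as return value) can be
approximated in user space on top of strtol(3). strtol_sketch()
is hypothetical and differs from the helper in details: it also
accepts a leading '+', it ignores the unused bits of flags, and
it looks at no more than 63 characters.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space approximation of bpf_strtol(). */
static long strtol_sketch(const char *buf, size_t buf_len,
                          unsigned long long flags, long *res)
{
    int base = (int)(flags & 0x1f);   /* five least significant bits */
    char tmp[64];
    char *end;
    long val;

    assert(res != NULL);
    if (base != 0 && base != 8 && base != 10 && base != 16)
        return -EINVAL;
    /* buf need not be NUL terminated, so work on a bounded copy. */
    if (buf_len >= sizeof(tmp))
        buf_len = sizeof(tmp) - 1;
    memcpy(tmp, buf, buf_len);
    tmp[buf_len] = '\0';

    errno = 0;
    val = strtol(tmp, &end, base);
    if (end == tmp)
        return -EINVAL;               /* no valid digits found */
    if (errno == ERANGE)
        return -ERANGE;               /* value out of range */
    *res = val;
    return (long)(end - tmp);         /* characters consumed */
}
```

The return value counts leading white space and the sign, so the
caller can continue parsing at buf plus that many characters.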
2478
2479 long bpf_strtoul(const char *buf, size_t buf_len, u64 flags, unsigned
2480 long *res)
2481
2482 Description
2483 Convert the initial part of the string from buffer buf of
2484 size buf_len to an unsigned long integer according to the
2485 given base and save the result in res.
2486
2487 The string may begin with an arbitrary amount of white
2488 space (as determined by isspace(3)).
2489
The five least significant bits of flags encode the base;
the other bits are currently unused.

The base must be 8, 10, 16, or 0. A base of 0 requests
automatic detection, similar to user space strtoul(3).
2495
2496 Return Number of characters consumed on success. Must be posi‐
2497 tive but no more than buf_len.
2498
2499 -EINVAL if no valid digits were found or unsupported base
2500 was provided.
2501
2502 -ERANGE if resulting value was out of range.
2503
2504 void *bpf_sk_storage_get(struct bpf_map *map, void *sk, void *value,
2505 u64 flags)
2506
2507 Description
2508 Get a bpf-local-storage from a sk.
2509
Logically, it could be thought of as getting the value
from a map with sk as the key. From this perspective, the
usage is not much different from bpf_map_lookup_elem(map,
&sk), except that this helper enforces that the key must
be a full socket and that the map must be a
BPF_MAP_TYPE_SK_STORAGE.
2516
2517 Underneath, the value is stored locally at sk instead of
2518 the map. The map is used as the bpf-local-storage
2519 "type". The bpf-local-storage "type" (i.e. the map) is
2520 searched against all bpf-local-storages residing at sk.
2521
sk is a kernel struct sock pointer for LSM programs, and
a struct bpf_sock pointer for other program types.

The optional flag BPF_SK_STORAGE_GET_F_CREATE can be set
in flags so that a new bpf-local-storage is created if
one does not exist. value can be used together with
BPF_SK_STORAGE_GET_F_CREATE to specify the initial value
of a bpf-local-storage. If value is NULL, the new
bpf-local-storage will be zero initialized.
2531
2532 Return A bpf-local-storage pointer is returned on success.
2533
2534 NULL if not found or there was an error in adding a new
2535 bpf-local-storage.
2536
2537 long bpf_sk_storage_delete(struct bpf_map *map, void *sk)
2538
2539 Description
2540 Delete a bpf-local-storage from a sk.
2541
2542 Return 0 on success.
2543
-ENOENT if the bpf-local-storage cannot be found.

-EINVAL if sk is not a fullsock (e.g. a request_sock).
2546
2547 long bpf_send_signal(u32 sig)
2548
2549 Description
2550 Send signal sig to the process of the current task. The
2551 signal may be delivered to any of this process's threads.
2552
2553 Return 0 on success or successfully queued.
2554
-EBUSY if the work queue under NMI is full.
2556
2557 -EINVAL if sig is invalid.
2558
2559 -EPERM if no permission to send the sig.
2560
2561 -EAGAIN if bpf program can try again.
2562
2563 s64 bpf_tcp_gen_syncookie(void *sk, void *iph, u32 iph_len, struct
2564 tcphdr *th, u32 th_len)
2565
2566 Description
2567 Try to issue a SYN cookie for the packet with correspond‐
2568 ing IP/TCP headers, iph and th, on the listening socket
2569 in sk.
2570
2571 iph points to the start of the IPv4 or IPv6 header, while
2572 iph_len contains sizeof(struct iphdr) or sizeof(struct
2573 ipv6hdr).
2574
2575 th points to the start of the TCP header, while th_len
2576 contains the length of the TCP header with options (at
2577 least sizeof(struct tcphdr)).
2578
Return On success, the lower 32 bits hold the generated SYN
cookie, followed by 16 bits which hold the MSS value for
that cookie; the top 16 bits are unused.
2582
2583 On failure, the returned value is one of the following:
2584
2585 -EINVAL SYN cookie cannot be issued due to error
2586
2587 -ENOENT SYN cookie should not be issued (no SYN flood)
2588
2589 -EOPNOTSUPP kernel configuration does not enable SYN
2590 cookies
2591
2592 -EPROTONOSUPPORT IP packet version is not 4 or 6
2593
2594 long bpf_skb_output(void *ctx, struct bpf_map *map, u64 flags, void
2595 *data, u64 size)
2596
2597 Description
2598 Write raw data blob into a special BPF perf event held by
2599 map of type BPF_MAP_TYPE_PERF_EVENT_ARRAY. This perf
2600 event must have the following attributes: PERF_SAMPLE_RAW
2601 as sample_type, PERF_TYPE_SOFTWARE as type, and
2602 PERF_COUNT_SW_BPF_OUTPUT as config.
2603
2604 The flags are used to indicate the index in map for which
2605 the value must be put, masked with BPF_F_INDEX_MASK. Al‐
2606 ternatively, flags can be set to BPF_F_CURRENT_CPU to in‐
2607 dicate that the index of the current CPU core should be
2608 used.
2609
The value to write, of size bytes, is passed through the
eBPF stack and pointed to by data.
2612
2613 ctx is a pointer to in-kernel struct sk_buff.
2614
2615 This helper is similar to bpf_perf_event_output() but re‐
2616 stricted to raw_tracepoint bpf programs.
2617
2618 Return 0 on success, or a negative error in case of failure.
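
How the flags argument selects a slot in map can be sketched as
follows. The SKETCH_ constants are local stand-ins; their values
mirror BPF_F_INDEX_MASK and BPF_F_CURRENT_CPU as commonly found
in the kernel's UAPI header, which should be treated as an
assumption here. perf_output_flags() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins; assumed to match the UAPI values. */
#define SKETCH_F_INDEX_MASK  0xffffffffULL
#define SKETCH_F_CURRENT_CPU SKETCH_F_INDEX_MASK

/* Build the flags value: either "current CPU" or an explicit map index. */
static uint64_t perf_output_flags(int use_current_cpu, uint32_t index)
{
    if (use_current_cpu)
        return SKETCH_F_CURRENT_CPU;
    /* An index equal to the mask would be read as "current CPU". */
    assert(index != (uint32_t)SKETCH_F_INDEX_MASK);
    return (uint64_t)index & SKETCH_F_INDEX_MASK;
}
```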
2619
2620 long bpf_probe_read_user(void *dst, u32 size, const void *unsafe_ptr)
2621
2622 Description
2623 Safely attempt to read size bytes from user space address
2624 unsafe_ptr and store the data in dst.
2625
2626 Return 0 on success, or a negative error in case of failure.
2627
2628 long bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)
2629
2630 Description
2631 Safely attempt to read size bytes from kernel space ad‐
2632 dress unsafe_ptr and store the data in dst.
2633
2634 Return 0 on success, or a negative error in case of failure.
2635
2636 long bpf_probe_read_user_str(void *dst, u32 size, const void *un‐
2637 safe_ptr)
2638
2639 Description
2640 Copy a NUL terminated string from an unsafe user address
2641 unsafe_ptr to dst. The size should include the terminat‐
2642 ing NUL byte. In case the string length is smaller than
2643 size, the target is not padded with further NUL bytes. If
2644 the string length is larger than size, just size-1 bytes
2645 are copied and the last byte is set to NUL.
2646
On success, returns the number of bytes that were
written, including the terminal NUL. This makes this
helper useful in tracing programs for reading strings
and, more importantly, for getting their length at
runtime. See the following snippet:
2652
SEC("kprobe/sys_open")
int bpf_sys_open(struct pt_regs *ctx)
{
        char buf[PATHLEN]; // PATHLEN is defined to 256
        // ctx->di carries the first argument (the filename
        // pointer) on x86-64; cast it to a pointer type.
        int res = bpf_probe_read_user_str(buf, sizeof(buf),
                                          (const void *)ctx->di);

        // Consume buf, for example push it to
        // userspace via bpf_perf_event_output(); we
        // can use res (the string length) as event
        // size, after checking its boundaries.
        return 0;
}
2665
In comparison, using the bpf_probe_read_user() helper here
instead to read the string would require estimating the
length at compile time, and would often result in copying
more memory than necessary.

Another useful use case is parsing individual process
arguments or individual environment variables, starting
from current->mm->arg_start and current->mm->env_start:
using this helper and its return value, one can quickly
iterate at the right offset of the memory area.
2677
2678 Return On success, the strictly positive length of the output
2679 string, including the trailing NUL character. On error, a
2680 negative value.
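
The copy semantics and the "iterate using the return value"
idea can be modelled in user space. This is a sketch under
stated assumptions, not the kernel implementation:
read_str_model() and count_strings() are hypothetical names, the
zero-size behaviour is an assumption, and the walk assumes every
string fits in the scratch buffer and that the area ends with a
NUL byte.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Model of the documented semantics: copy at most size-1 bytes,
 * always NUL-terminate, return bytes written including the NUL.
 */
static long read_str_model(char *dst, size_t size, const char *src)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL);
    if (size == 0)
        return 0;  /* assumption made for this sketch */
    while (i + 1 < size && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return (long)(i + 1);
}

/* Walk a NUL-separated area (like an argument block), counting strings. */
static int count_strings(const char *area, size_t area_len)
{
    char buf[32];  /* assumes each string is shorter than this */
    size_t off = 0;
    int n = 0;

    while (off < area_len) {
        long len = read_str_model(buf, sizeof(buf), area + off);
        off += (size_t)len;  /* the return value includes the NUL */
        n++;
    }
    return n;
}
```

Because the returned length includes the terminator, adding it
to the current offset lands exactly on the next string.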
2681
2682 long bpf_probe_read_kernel_str(void *dst, u32 size, const void *un‐
2683 safe_ptr)
2684
2685 Description
2686 Copy a NUL terminated string from an unsafe kernel ad‐
2687 dress unsafe_ptr to dst. Same semantics as with
2688 bpf_probe_read_user_str() apply.
2689
2690 Return On success, the strictly positive length of the string,
2691 including the trailing NUL character. On error, a nega‐
2692 tive value.
2693
2694 long bpf_tcp_send_ack(void *tp, u32 rcv_nxt)
2695
2696 Description
Send out a TCP ACK. tp is the in-kernel struct tcp_sock.
rcv_nxt is the ack_seq to be sent out.
2699
2700 Return 0 on success, or a negative error in case of failure.
2701
2702 long bpf_send_signal_thread(u32 sig)
2703
2704 Description
2705 Send signal sig to the thread corresponding to the cur‐
2706 rent task.
2707
2708 Return 0 on success or successfully queued.
2709
-EBUSY if the work queue under NMI is full.
2711
2712 -EINVAL if sig is invalid.
2713
2714 -EPERM if no permission to send the sig.
2715
2716 -EAGAIN if bpf program can try again.
2717
2718 u64 bpf_jiffies64(void)
2719
2720 Description
Obtain the 64-bit jiffies value.

Return The 64-bit jiffies value.
2724
2725 long bpf_read_branch_records(struct bpf_perf_event_data *ctx, void
2726 *buf, u32 size, u64 flags)
2727
2728 Description
For an eBPF program attached to a perf event, retrieve
the branch records (struct perf_branch_entry) associated
with ctx and store them in the buffer pointed to by buf,
up to size bytes.
2733
2734 Return On success, number of bytes written to buf. On error, a
2735 negative value.
2736
2737 The flags can be set to BPF_F_GET_BRANCH_RECORDS_SIZE to
2738 instead return the number of bytes required to store all
2739 the branch entries. If this flag is set, buf may be NULL.
2740
-EINVAL if the arguments are invalid or size is not a
multiple of sizeof(struct perf_branch_entry).
2743
2744 -ENOENT if architecture does not support branch records.
2745
2746 long bpf_get_ns_current_pid_tgid(u64 dev, u64 ino, struct
2747 bpf_pidns_info *nsdata, u32 size)
2748
2749 Description
On success, the pid and tgid values as seen from the
current namespace are returned in nsdata.
2752
2753 Return 0 on success, or one of the following in case of failure:
2754
-EINVAL if the supplied dev and ino don't match the dev_t
and inode number of the nsfs of the current task, or if
the conversion of dev to dev_t lost high bits.

-ENOENT if the pidns does not exist for the current task.
2760
2761 long bpf_xdp_output(void *ctx, struct bpf_map *map, u64 flags, void
2762 *data, u64 size)
2763
2764 Description
2765 Write raw data blob into a special BPF perf event held by
2766 map of type BPF_MAP_TYPE_PERF_EVENT_ARRAY. This perf
2767 event must have the following attributes: PERF_SAMPLE_RAW
2768 as sample_type, PERF_TYPE_SOFTWARE as type, and
2769 PERF_COUNT_SW_BPF_OUTPUT as config.
2770
2771 The flags are used to indicate the index in map for which
2772 the value must be put, masked with BPF_F_INDEX_MASK. Al‐
2773 ternatively, flags can be set to BPF_F_CURRENT_CPU to in‐
2774 dicate that the index of the current CPU core should be
2775 used.
2776
The value to write, of size bytes, is passed through the
eBPF stack and pointed to by data.
2779
2780 ctx is a pointer to in-kernel struct xdp_buff.
2781
This helper is similar to bpf_perf_event_output() but
restricted to raw_tracepoint bpf programs.
2784
2785 Return 0 on success, or a negative error in case of failure.
2786
2787 u64 bpf_get_netns_cookie(void *ctx)
2788
2789 Description
2790 Retrieve the cookie (generated by the kernel) of the net‐
2791 work namespace the input ctx is associated with. The net‐
2792 work namespace cookie remains stable for its lifetime and
2793 provides a global identifier that can be assumed unique.
2794 If ctx is NULL, then the helper returns the cookie for
2795 the initial network namespace. The cookie itself is very
2796 similar to that of bpf_get_socket_cookie() helper, but
2797 for network namespaces instead of sockets.
2798
Return An 8-byte long opaque number.
2800
2801 u64 bpf_get_current_ancestor_cgroup_id(int ancestor_level)
2802
2803 Description
2804 Return id of cgroup v2 that is ancestor of the cgroup as‐
2805 sociated with the current task at the ancestor_level. The
2806 root cgroup is at ancestor_level zero and each step down
2807 the hierarchy increments the level. If ancestor_level ==
2808 level of cgroup associated with the current task, then
2809 return value will be the same as that of bpf_get_cur‐
2810 rent_cgroup_id().
2811
The helper is useful to implement policies based on
cgroups that are higher in the hierarchy than the
immediate cgroup associated with the current task.
2815
2816 The format of returned id and helper limitations are same
2817 as in bpf_get_current_cgroup_id().
2818
2819 Return The id is returned or 0 in case the id could not be re‐
2820 trieved.
2821
2822 long bpf_sk_assign(struct sk_buff *skb, void *sk, u64 flags)
2823
2824 Description
2825 Helper is overloaded depending on BPF program type. This
2826 description applies to BPF_PROG_TYPE_SCHED_CLS and
2827 BPF_PROG_TYPE_SCHED_ACT programs.
2828
Assign the sk to the skb. When combined with an
appropriate routing configuration to receive the packet
towards the socket, this will cause skb to be delivered
to the specified socket. Subsequent redirection of skb
via bpf_redirect(), bpf_clone_redirect() or other methods
outside of BPF may interfere with successful delivery to
the socket.
2835
2836 This operation is only valid from TC ingress path.
2837
2838 The flags argument must be zero.
2839
2840 Return 0 on success, or a negative error in case of failure:
2841
2842 -EINVAL if specified flags are not supported.
2843
2844 -ENOENT if the socket is unavailable for assignment.
2845
2846 -ENETUNREACH if the socket is unreachable (wrong netns).
2847
2848 -EOPNOTSUPP if the operation is not supported, for exam‐
2849 ple a call from outside of TC ingress.
2850
2851 -ESOCKTNOSUPPORT if the socket type is not supported
2852 (reuseport).
2853
2854 long bpf_sk_assign(struct bpf_sk_lookup *ctx, struct bpf_sock *sk, u64
2855 flags)
2856
2857 Description
2858 Helper is overloaded depending on BPF program type. This
2859 description applies to BPF_PROG_TYPE_SK_LOOKUP programs.
2860
2861 Select the sk as a result of a socket lookup.
2862
For the operation to succeed, the passed socket must be
compatible with the packet description provided by the
ctx object.

The L4 protocol (IPPROTO_TCP or IPPROTO_UDP) must be an
exact match, while the IP family (AF_INET or AF_INET6)
must be compatible; that is, IPv6 sockets that are not
v6-only can be selected for IPv4 packets.
2871
2872 Only TCP listeners and UDP unconnected sockets can be se‐
2873 lected. sk can also be NULL to reset any previous selec‐
2874 tion.
2875
The flags argument can be a combination of the following
values:
2877
2878 • BPF_SK_LOOKUP_F_REPLACE to override the previous socket
2879 selection, potentially done by a BPF program that ran
2880 before us.
2881
2882 • BPF_SK_LOOKUP_F_NO_REUSEPORT to skip load-balancing
2883 within reuseport group for the socket being selected.
2884
2885 On success ctx->sk will point to the selected socket.
2886
2887 Return 0 on success, or a negative errno in case of failure.
2888
2889 • -EAFNOSUPPORT if socket family (sk->family) is not com‐
2890 patible with packet family (ctx->family).
2891
2892 • -EEXIST if socket has been already selected, poten‐
2893 tially by another program, and BPF_SK_LOOKUP_F_REPLACE
2894 flag was not specified.
2895
2896 • -EINVAL if unsupported flags were specified.
2897
2898 • -EPROTOTYPE if socket L4 protocol (sk->protocol)
2899 doesn't match packet protocol (ctx->protocol).
2900
2901 • -ESOCKTNOSUPPORT if socket is not in allowed state (TCP
2902 listening or UDP unconnected).
2903
2904 u64 bpf_ktime_get_boot_ns(void)
2905
2906 Description
Return the time elapsed since system boot, in
nanoseconds. This includes the time the system was
suspended. See: clock_gettime(CLOCK_BOOTTIME)
2910
2911 Return Current ktime.
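
To correlate helper timestamps with user space, the same clock
can be read through clock_gettime(2). This is a minimal sketch
assuming Linux; boot_ns() is a hypothetical name, and the
fallback definition of CLOCK_BOOTTIME is only used when the libc
headers do not expose it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <time.h>

#ifndef CLOCK_BOOTTIME
#define CLOCK_BOOTTIME 7  /* Linux value, assumed when headers hide it */
#endif

/* Nanoseconds since boot, including suspend time; 0 if the clock is unavailable. */
static uint64_t boot_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_BOOTTIME, &ts) != 0)
        return 0;
    assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
```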
2912
2913 long bpf_seq_printf(struct seq_file *m, const char *fmt, u32 fmt_size,
2914 const void *data, u32 data_len)
2915
2916 Description
2917 bpf_seq_printf() uses seq_file seq_printf() to print out
2918 the format string. The m represents the seq_file. The
2919 fmt and fmt_size are for the format string itself. The
2920 data and data_len are format string arguments. The data
2921 are a u64 array and corresponding format string values
2922 are stored in the array. For strings and pointers where
2923 pointees are accessed, only the pointer values are stored
2924 in the data array. The data_len is the size of data in
2925 bytes - must be a multiple of 8.
2926
The formats %s and %p{i,I}{4,6} require reading kernel
memory. Reading kernel memory may fail due to either an
invalid address or a valid address that requires a major
memory fault. If reading kernel memory fails, the string
for %s will be an empty string, and the IP address for
%p{i,I}{4,6} will be 0. Not returning an error to the bpf
program is consistent with what bpf_trace_printk() does
for now.
2935
2936 Return 0 on success, or a negative error in case of failure:
2937
2938 -EBUSY if per-CPU memory copy buffer is busy, can try
2939 again by returning 1 from bpf program.
2940
2941 -EINVAL if arguments are invalid, or if fmt is in‐
2942 valid/unsupported.
2943
2944 -E2BIG if fmt contains too many format specifiers.
2945
2946 -EOVERFLOW if an overflow happened: The same object will
2947 be tried again.
2948
2949 long bpf_seq_write(struct seq_file *m, const void *data, u32 len)
2950
2951 Description
2952 bpf_seq_write() uses seq_file seq_write() to write the
2953 data. The m represents the seq_file. The data and len
2954 represent the data to write in bytes.
2955
2956 Return 0 on success, or a negative error in case of failure:
2957
2958 -EOVERFLOW if an overflow happened: The same object will
2959 be tried again.
2960
2961 u64 bpf_sk_cgroup_id(void *sk)
2962
2963 Description
2964 Return the cgroup v2 id of the socket sk.
2965
2966 sk must be a non-NULL pointer to a socket, e.g. one re‐
2967 turned from bpf_sk_lookup_xxx(), bpf_sk_fullsock(), etc.
2968 The format of returned id is same as in
2969 bpf_skb_cgroup_id().
2970
2971 This helper is available only if the kernel was compiled
2972 with the CONFIG_SOCK_CGROUP_DATA configuration option.
2973
2974 Return The id is returned or 0 in case the id could not be re‐
2975 trieved.
2976
2977 u64 bpf_sk_ancestor_cgroup_id(void *sk, int ancestor_level)
2978
2979 Description
2980 Return id of cgroup v2 that is ancestor of cgroup associ‐
2981 ated with the sk at the ancestor_level. The root cgroup
2982 is at ancestor_level zero and each step down the hierar‐
2983 chy increments the level. If ancestor_level == level of
2984 cgroup associated with sk, then return value will be same
2985 as that of bpf_sk_cgroup_id().
2986
The helper is useful to implement policies based on
cgroups that are higher in the hierarchy than the
immediate cgroup associated with sk.
2990
2991 The format of returned id and helper limitations are same
2992 as in bpf_sk_cgroup_id().
2993
2994 Return The id is returned or 0 in case the id could not be re‐
2995 trieved.
2996
2997 long bpf_ringbuf_output(void *ringbuf, void *data, u64 size, u64 flags)
2998
2999 Description
3000 Copy size bytes from data into a ring buffer ringbuf. If
3001 BPF_RB_NO_WAKEUP is specified in flags, no notification
3002 of new data availability is sent. If BPF_RB_FORCE_WAKEUP
3003 is specified in flags, notification of new data avail‐
3004 ability is sent unconditionally. If 0 is specified in
3005 flags, an adaptive notification of new data availability
3006 is sent.
3007
3008 An adaptive notification is a notification sent whenever
3009 the user-space process has caught up and consumed all
3010 available payloads. In case the user-space process is
3011 still processing a previous payload, then no notification
3012 is needed as it will process the newly added payload au‐
3013 tomatically.
3014
3015 Return 0 on success, or a negative error in case of failure.
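
The notification rules above can be condensed into a small
decision function. This is a user-space model, not kernel code:
the SKETCH_ constants are local stand-ins assumed to match
BPF_RB_NO_WAKEUP and BPF_RB_FORCE_WAKEUP, should_notify() is a
hypothetical name, and letting the force flag win when both
flags are set is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins; assumed to match the UAPI flag values. */
#define SKETCH_RB_NO_WAKEUP    (1ULL << 0)
#define SKETCH_RB_FORCE_WAKEUP (1ULL << 1)

/*
 * consumer_caught_up: true when user space had consumed every
 * payload submitted before the current one.
 */
static bool should_notify(uint64_t flags, bool consumer_caught_up)
{
    assert((flags & ~(SKETCH_RB_NO_WAKEUP | SKETCH_RB_FORCE_WAKEUP)) == 0);
    if (flags & SKETCH_RB_FORCE_WAKEUP)
        return true;            /* unconditional notification */
    if (flags & SKETCH_RB_NO_WAKEUP)
        return false;           /* notification suppressed */
    return consumer_caught_up;  /* adaptive: only if consumer is idle */
}
```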
3016
3017 void *bpf_ringbuf_reserve(void *ringbuf, u64 size, u64 flags)
3018
3019 Description
3020 Reserve size bytes of payload in a ring buffer ringbuf.
3021 flags must be 0.
3022
3023 Return Valid pointer with size bytes of memory available; NULL,
3024 otherwise.
3025
3026 void bpf_ringbuf_submit(void *data, u64 flags)
3027
3028 Description
3029 Submit reserved ring buffer sample, pointed to by data.
3030 If BPF_RB_NO_WAKEUP is specified in flags, no notifica‐
3031 tion of new data availability is sent. If
3032 BPF_RB_FORCE_WAKEUP is specified in flags, notification
3033 of new data availability is sent unconditionally. If 0
3034 is specified in flags, an adaptive notification of new
3035 data availability is sent.
3036
3037 See 'bpf_ringbuf_output()' for the definition of adaptive
3038 notification.
3039
3040 Return Nothing. Always succeeds.
3041
3042 void bpf_ringbuf_discard(void *data, u64 flags)
3043
3044 Description
3045 Discard reserved ring buffer sample, pointed to by data.
3046 If BPF_RB_NO_WAKEUP is specified in flags, no notifica‐
3047 tion of new data availability is sent. If
3048 BPF_RB_FORCE_WAKEUP is specified in flags, notification
3049 of new data availability is sent unconditionally. If 0
3050 is specified in flags, an adaptive notification of new
3051 data availability is sent.
3052
3053 See 'bpf_ringbuf_output()' for the definition of adaptive
3054 notification.
3055
3056 Return Nothing. Always succeeds.
3057
3058 u64 bpf_ringbuf_query(void *ringbuf, u64 flags)
3059
3060 Description
3061 Query various characteristics of provided ring buffer.
What exactly is queried is determined by flags:
3063
3064 • BPF_RB_AVAIL_DATA: Amount of data not yet consumed.
3065
3066 • BPF_RB_RING_SIZE: The size of ring buffer.
3067
3068 • BPF_RB_CONS_POS: Consumer position (can wrap around).
3069
3070 • BPF_RB_PROD_POS: Producer(s) position (can wrap
3071 around).
3072
3073 Data returned is just a momentary snapshot of actual val‐
3074 ues and could be inaccurate, so this facility should be
3075 used to power heuristics and for reporting, not to make
3076 100% correct calculation.
3077
3078 Return Requested value, or 0, if flags are not recognized.
3079
3080 long bpf_csum_level(struct sk_buff *skb, u64 level)
3081
3082 Description
Change the skb's checksum level by one layer up or down,
3084 or reset it entirely to none in order to have the stack
3085 perform checksum validation. The level is applicable to
3086 the following protocols: TCP, UDP, GRE, SCTP, FCOE. For
3087 example, a decap of | ETH | IP | UDP | GUE | IP | TCP |
3088 into | ETH | IP | TCP | through bpf_skb_adjust_room()
3089 helper with passing in BPF_F_ADJ_ROOM_NO_CSUM_RESET flag
3090 would require one call to bpf_csum_level() with
3091 BPF_CSUM_LEVEL_DEC since the UDP header is removed. Simi‐
3092 larly, an encap of the latter into the former could be
3093 accompanied by a helper call to bpf_csum_level() with
3094 BPF_CSUM_LEVEL_INC if the skb is still intended to be
3095 processed in higher layers of the stack instead of just
3096 egressing at tc.
3097
There are four supported level settings at this time:
3099
3100 • BPF_CSUM_LEVEL_INC: Increases skb->csum_level for skbs
3101 with CHECKSUM_UNNECESSARY.
3102
3103 • BPF_CSUM_LEVEL_DEC: Decreases skb->csum_level for skbs
3104 with CHECKSUM_UNNECESSARY.
3105
3106 • BPF_CSUM_LEVEL_RESET: Resets skb->csum_level to 0 and
3107 sets CHECKSUM_NONE to force checksum validation by the
3108 stack.
3109
3110 • BPF_CSUM_LEVEL_QUERY: No-op, returns the current
3111 skb->csum_level.
3112
3113 Return 0 on success, or a negative error in case of failure. In
3114 the case of BPF_CSUM_LEVEL_QUERY, the current
3115 skb->csum_level is returned or the error code -EACCES in
3116 case the skb is not subject to CHECKSUM_UNNECESSARY.
3117
3118 struct tcp6_sock *bpf_skc_to_tcp6_sock(void *sk)
3119
3120 Description
3121 Dynamically cast a sk pointer to a tcp6_sock pointer.
3122
3123 Return sk if casting is valid, or NULL otherwise.
3124
3125 struct tcp_sock *bpf_skc_to_tcp_sock(void *sk)
3126
3127 Description
3128 Dynamically cast a sk pointer to a tcp_sock pointer.
3129
3130 Return sk if casting is valid, or NULL otherwise.
3131
3132 struct tcp_timewait_sock *bpf_skc_to_tcp_timewait_sock(void *sk)
3133
3134 Description
3135 Dynamically cast a sk pointer to a tcp_timewait_sock
3136 pointer.
3137
3138 Return sk if casting is valid, or NULL otherwise.
3139
3140 struct tcp_request_sock *bpf_skc_to_tcp_request_sock(void *sk)
3141
3142 Description
3143 Dynamically cast a sk pointer to a tcp_request_sock
3144 pointer.
3145
3146 Return sk if casting is valid, or NULL otherwise.
3147
3148 struct udp6_sock *bpf_skc_to_udp6_sock(void *sk)
3149
3150 Description
3151 Dynamically cast a sk pointer to a udp6_sock pointer.
3152
3153 Return sk if casting is valid, or NULL otherwise.
3154
3155 long bpf_get_task_stack(struct task_struct *task, void *buf, u32 size,
3156 u64 flags)
3157
3158 Description
3159 Return a user or a kernel stack in bpf program provided
3160 buffer. To achieve this, the helper needs task, which is
3161 a valid pointer to struct task_struct. To store the
3162 stacktrace, the bpf program provides buf with a nonnega‐
3163 tive size.
3164
3165 The last argument, flags, holds the number of stack
3166 frames to skip (from 0 to 255), masked with
3167 BPF_F_SKIP_FIELD_MASK. The next bits can be used to set
3168 the following flags:
3169
3170 BPF_F_USER_STACK
3171 Collect a user space stack instead of a kernel
3172 stack.
3173
3174 BPF_F_USER_BUILD_ID
3175 Collect buildid+offset instead of ips for user
3176 stack, only valid if BPF_F_USER_STACK is also
3177 specified.
3178
bpf_get_task_stack() can collect up to
PERF_MAX_STACK_DEPTH kernel and user frames, subject to a
sufficiently large buffer size. Note that this limit can
be controlled with the sysctl program, and that it should
be manually increased in order to profile long user
stacks (such as stacks for Java programs). To do so, use:
3185
3186 # sysctl kernel.perf_event_max_stack=<new value>
3187
3188 Return The non-negative copied buf length equal to or less than
3189 size on success, or a negative error in case of failure.
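
Composing the flags argument can be sketched as below. The
SKETCH_ constants are local stand-ins whose values are assumed
to match BPF_F_SKIP_FIELD_MASK, BPF_F_USER_STACK and
BPF_F_USER_BUILD_ID in the kernel's UAPI header; stack_flags()
is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins; values assumed to match the UAPI header. */
#define SKETCH_F_SKIP_FIELD_MASK 0xffULL
#define SKETCH_F_USER_STACK      (1ULL << 8)
#define SKETCH_F_USER_BUILD_ID   (1ULL << 11)

/* Low byte: frames to skip; higher bits: option flags. */
static uint64_t stack_flags(unsigned int skip, bool user, bool build_id)
{
    uint64_t flags;

    assert(skip <= 255);        /* must fit in the skip field */
    assert(!build_id || user);  /* build IDs only apply to user stacks */
    flags = (uint64_t)skip & SKETCH_F_SKIP_FIELD_MASK;
    if (user)
        flags |= SKETCH_F_USER_STACK;
    if (build_id)
        flags |= SKETCH_F_USER_BUILD_ID;
    return flags;
}
```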
3190
3191 long bpf_load_hdr_opt(struct bpf_sock_ops *skops, void *searchby_res,
3192 u32 len, u64 flags)
3193
3194 Description
3195 Load header option. Support reading a particular TCP
3196 header option for bpf program (BPF_PROG_TYPE_SOCK_OPS).
3197
3198 If flags is 0, it will search the option from the
3199 skops->skb_data. The comment in struct bpf_sock_ops has
3200 details on what skb_data contains under different
3201 skops->op.
3202
3203 The first byte of the searchby_res specifies the kind
3204 that it wants to search.
3205
If the kind being searched for is an experimental kind
(i.e. 253 or 254 according to RFC6994), the caller also
needs to specify the "magic", which is either 2 bytes or
4 bytes. The size of the magic is specified through the
2nd byte, which is the "kind-length" of a TCP header
option. As for a normal TCP header option, the
"kind-length" also counts the first 2 bytes, "kind" and
"kind-length" themselves.
3214
3215 For example, to search experimental kind 254 with 2 byte
3216 magic 0xeB9F, the searchby_res should be [ 254, 4, 0xeB,
3217 0x9F, 0, 0, .... 0 ].
3218
3219 To search for the standard window scale option (3), the
3220 searchby_res should be [ 3, 0, 0, .... 0 ]. Note,
3221 kind-length must be 0 for regular option.
3222
Searching for No-Op (1) and End-of-Option-List (0) is
not supported.
3225
3226 len must be at least 2 bytes which is the minimal size of
3227 a header option.
3228
3229 Supported flags:
3230
3231 • BPF_LOAD_HDR_OPT_TCP_SYN to search from the saved_syn
3232 packet or the just-received syn packet.
3233
3234 Return > 0 when found, the header option is copied to
3235 searchby_res. The return value is the total length
3236 copied. On failure, a negative error code is returned:
3237
3238 -EINVAL if a parameter is invalid.
3239
3240 -ENOMSG if the option is not found.
3241
3242 -ENOENT if no syn packet is available when
3243 BPF_LOAD_HDR_OPT_TCP_SYN is used.
3244
3245 -ENOSPC if there is not enough space. Only len number of
3246 bytes are copied.
3247
3248 -EFAULT on failure to parse the header options in the
3249 packet.
3250
3251 -EPERM if the helper cannot be used under the current
3252 skops->op.
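
Preparing searchby_res for the two cases described above
(regular kind, experimental kind with magic) can be sketched as
follows. build_search_key() is a hypothetical name; the function
only fills the buffer the way the text describes and does not
call the helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Fill res (res_len bytes) with the search key for 'kind'.
 * For experimental kinds (253, 254) a 2- or 4-byte magic is
 * required and kind-length becomes 2 + magic_len; for regular
 * kinds kind-length stays 0.
 */
static void build_search_key(uint8_t *res, size_t res_len, uint8_t kind,
                             const uint8_t *magic, size_t magic_len)
{
    assert(res != NULL && res_len >= 2);
    memset(res, 0, res_len);
    res[0] = kind;
    if (kind == 253 || kind == 254) {
        assert(magic != NULL && (magic_len == 2 || magic_len == 4));
        assert(res_len >= 2 + magic_len);
        res[1] = (uint8_t)(2 + magic_len);
        memcpy(res + 2, magic, magic_len);
    }
}
```

With kind 254 and the 2-byte magic 0xeB9F this yields
[ 254, 4, 0xeB, 0x9F, 0, ... ], matching the example in the
description.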
3253
3254 long bpf_store_hdr_opt(struct bpf_sock_ops *skops, const void *from,
3255 u32 len, u64 flags)
3256
3257 Description
3258 Store header option. The data will be copied from buffer
3259 from with length len to the TCP header.
3260
3261 The buffer from should have the whole option that in‐
3262 cludes the kind, kind-length, and the actual option data.
3263 The len must be at least kind-length long. The
3264 kind-length does not have to be 4 byte aligned. The ker‐
3265 nel will take care of the padding and setting the 4 bytes
3266 aligned value to th->doff.
3267
3268 This helper will check for duplicated option by searching
3269 the same option in the outgoing skb.
3270
3271 This helper can only be called during
3272 BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
3273
3274 Return 0 on success, or negative error in case of failure:
3275
-EINVAL if a parameter is invalid.

-ENOSPC if there is not enough space in the header.
Nothing has been written.
3280
3281 -EEXIST if the option already exists.
3282
3283 -EFAULT on failure to parse the existing header options.
3284
3285 -EPERM if the helper cannot be used under the current
3286 skops->op.
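
The 4-byte alignment mentioned above is ordinary round-up
arithmetic, and th->doff counts the TCP header in 32-bit words.
The sketch below uses hypothetical names and assumes the 20-byte
base header and 40-byte option limit of standard TCP; how the
kernel distributes padding between several options is not
modelled.

```c
#include <assert.h>

/* Round an option area length up to the next multiple of 4 bytes. */
static unsigned int padded_opt_len(unsigned int len)
{
    return (len + 3u) & ~3u;
}

/* Data offset in 32-bit words for a header with the given option bytes. */
static unsigned int tcp_doff(unsigned int total_opt_len)
{
    assert(total_opt_len % 4 == 0 && total_opt_len <= 40);
    return (20u + total_opt_len) / 4u;
}
```

For instance, a single 6-byte option occupies 8 bytes once
padded, giving a data offset of 7 words.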
3287
3288 long bpf_reserve_hdr_opt(struct bpf_sock_ops *skops, u32 len, u64
3289 flags)
3290
3291 Description
3292 Reserve len bytes for the bpf header option. The space
3293 will be used by bpf_store_hdr_opt() later in
3294 BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
3295
3296 If bpf_reserve_hdr_opt() is called multiple times, the
3297 total number of bytes will be reserved.
3298
3299 This helper can only be called during
3300 BPF_SOCK_OPS_HDR_OPT_LEN_CB.
3301
3302 Return 0 on success, or negative error in case of failure:
3303
3304 -EINVAL if a parameter is invalid.
3305
3306 -ENOSPC if there is not enough space in the header.
3307
3308 -EPERM if the helper cannot be used under the current
3309 skops->op.
3310
3311 void *bpf_inode_storage_get(struct bpf_map *map, void *inode, void
3312 *value, u64 flags)
3313
3314 Description
3315 Get a bpf_local_storage from an inode.
3316
Logically, it could be thought of as getting the value
from a map with inode as the key. From this perspective,
the usage is not much different from
bpf_map_lookup_elem(map, &inode), except that this helper
enforces that the key must be an inode and that the map
must be a BPF_MAP_TYPE_INODE_STORAGE.
3323
3324 Underneath, the value is stored locally at inode instead
3325 of the map. The map is used as the bpf-local-storage
3326 "type". The bpf-local-storage "type" (i.e. the map) is
3327 searched against all bpf_local_storage residing at inode.
3328
The optional flag BPF_LOCAL_STORAGE_GET_F_CREATE can be
set in flags so that a new bpf_local_storage is created
if one does not exist. value can be used together with
BPF_LOCAL_STORAGE_GET_F_CREATE to specify the initial
value of a bpf_local_storage. If value is NULL, the new
bpf_local_storage will be zero initialized.
3335
3336 Return A bpf_local_storage pointer is returned on success.
3337
3338 NULL if not found or there was an error in adding a new
3339 bpf_local_storage.
3340
3341 int bpf_inode_storage_delete(struct bpf_map *map, void *inode)
3342
3343 Description
3344 Delete a bpf_local_storage from an inode.
3345
3346 Return 0 on success.
3347
3348 -ENOENT if the bpf_local_storage cannot be found.
3349
3350 long bpf_d_path(struct path *path, char *buf, u32 sz)
3351
3352 Description
3353 Return full path for given struct path object, which
3354 needs to be the kernel BTF path object. The path is re‐
3355 turned in the provided buffer buf of size sz and is zero
3356 terminated.
3357
3358 Return On success, the strictly positive length of the string,
3359 including the trailing NUL character. On error, a nega‐
3360 tive value.
3361
3362 long bpf_copy_from_user(void *dst, u32 size, const void *user_ptr)
3363
3364 Description
3365 Read size bytes from user space address user_ptr and
3366 store the data in dst. This is a wrapper of
3367 copy_from_user().
3368
3369 Return 0 on success, or a negative error in case of failure.
3370
3371 long bpf_snprintf_btf(char *str, u32 str_size, struct btf_ptr *ptr, u32
3372 btf_ptr_size, u64 flags)
3373
3374 Description
3375 Use BTF to store a string representation of ptr->ptr in
3376 str, using ptr->type_id. This value should specify the
3377 type that ptr->ptr points to. LLVM
3378 __builtin_btf_type_id(type, 1) can be used to look up vm‐
3379 linux BTF type ids. Traversing the data structure using
3380 BTF, the type information and values are stored in the
3381 first str_size - 1 bytes of str. Safe copy of the
3382 pointer data is carried out to avoid kernel crashes dur‐
3383 ing operation. Smaller types can use string space on the
3384 stack; larger programs can use map data to store the
3385 string representation.
3386
3387 The string can be subsequently shared with userspace via
3388 bpf_perf_event_output() or ring buffer interfaces.
3389 bpf_trace_printk() is to be avoided as it places too
3390 small a limit on string size to be useful.
3391
3392 flags is a combination of
3393
3394 BTF_F_COMPACT
3395 no formatting around type information
3396
3397 BTF_F_NONAME
3398 no struct/union member names/types
3399
3400 BTF_F_PTR_RAW
3401 show raw (unobfuscated) pointer values; equivalent
3402 to printk specifier %px.
3403
3404 BTF_F_ZERO
3405 show zero-valued struct/union members; they are
3406 not displayed by default
3407
3408 Return The number of bytes that were written (or would have been
3409 written if output had to be truncated due to string
3410 size), or a negative error in cases of failure.
3411
3412 long bpf_seq_printf_btf(struct seq_file *m, struct btf_ptr *ptr, u32
3413 ptr_size, u64 flags)
3414
3415 Description
3416 Use BTF to write to seq_write a string representation of
3417 ptr->ptr, using ptr->type_id as per bpf_snprintf_btf().
3418 flags are identical to those used for bpf_snprintf_btf.
3419
3420 Return 0 on success or a negative error in case of failure.
3421
3422 u64 bpf_skb_cgroup_classid(struct sk_buff *skb)
3423
3424 Description
3425 See bpf_get_cgroup_classid() for the main description.
3426 This helper differs from bpf_get_cgroup_classid() in that
3427 the cgroup v1 net_cls class is retrieved only from the
3428 skb's associated socket instead of the current process.
3429
3430 Return The id is returned or 0 in case the id could not be re‐
3431 trieved.
3432
3433 long bpf_redirect_neigh(u32 ifindex, struct bpf_redir_neigh *params,
3434 int plen, u64 flags)
3435
3436 Description
3437 Redirect the packet to another net device of index
3438 ifindex and fill in L2 addresses from neighboring subsys‐
3439 tem. This helper is somewhat similar to bpf_redirect(),
3440 except that it populates L2 addresses as well, meaning,
3441 internally, the helper relies on the neighbor lookup for
3442 the L2 address of the nexthop.
3443
The helper will perform a FIB lookup based on the skb's networking
header to get the address of the next hop, unless this is supplied
by the caller in the params argument. The plen argument indicates
the length of params and should be set to 0 if params is NULL.
3449
3450 The flags argument is reserved and must be 0. The helper
3451 is currently only supported for tc BPF program types, and
3452 enabled for IPv4 and IPv6 protocols.
3453
3454 Return The helper returns TC_ACT_REDIRECT on success or
3455 TC_ACT_SHOT on error.
3456
3457 void *bpf_per_cpu_ptr(const void *percpu_ptr, u32 cpu)
3458
3459 Description
Take a pointer to a percpu ksym, percpu_ptr, and return a pointer
to the percpu kernel variable on cpu. A ksym is an extern variable
decorated with '__ksym'. For each ksym, there is a variable (either
static or global) of the same name defined in the kernel. The ksym
is percpu if that kernel variable is percpu. The returned pointer
points to the kernel percpu variable on cpu.
3467
bpf_per_cpu_ptr() has the same semantics as per_cpu_ptr() in the
kernel, except that bpf_per_cpu_ptr() may return NULL. This happens
if cpu is greater than or equal to nr_cpu_ids. The caller of
bpf_per_cpu_ptr() must check the returned value.
3473
3474 Return A pointer pointing to the kernel percpu variable on cpu,
3475 or NULL, if cpu is invalid.
3476
3477 void *bpf_this_cpu_ptr(const void *percpu_ptr)
3478
3479 Description
3480 Take a pointer to a percpu ksym, percpu_ptr, and return a
3481 pointer to the percpu kernel variable on this cpu. See
3482 the description of 'ksym' in bpf_per_cpu_ptr().
3483
bpf_this_cpu_ptr() has the same semantics as this_cpu_ptr() in the
kernel. Unlike bpf_per_cpu_ptr(), it never returns NULL.
3487
3488 Return A pointer pointing to the kernel percpu variable on this
3489 cpu.
3490
3491 long bpf_redirect_peer(u32 ifindex, u64 flags)
3492
3493 Description
Redirect the packet to another net device of index ifindex. This
helper is somewhat similar to bpf_redirect(), except that the
redirection happens to the peer device of ifindex, and the netns
switch takes place from ingress to ingress without going through
the CPU's backlog queue.
3500
3501 The flags argument is reserved and must be 0. The helper
3502 is currently only supported for tc BPF program types at
3503 the ingress hook and for veth device types. The peer de‐
3504 vice must reside in a different network namespace.
3505
3506 Return The helper returns TC_ACT_REDIRECT on success or
3507 TC_ACT_SHOT on error.
3508
3509 void *bpf_task_storage_get(struct bpf_map *map, struct task_struct
3510 *task, void *value, u64 flags)
3511
3512 Description
3513 Get a bpf_local_storage from the task.
3514
Logically, it can be thought of as getting the value from a map
with task as the key. From this perspective, the usage is not much
different from bpf_map_lookup_elem(map, &task), except that this
helper enforces that the key is a task_struct and that the map is
a BPF_MAP_TYPE_TASK_STORAGE.
3521
3522 Underneath, the value is stored locally at task instead
3523 of the map. The map is used as the bpf-local-storage
3524 "type". The bpf-local-storage "type" (i.e. the map) is
3525 searched against all bpf_local_storage residing at task.
3526
An optional flag (BPF_LOCAL_STORAGE_GET_F_CREATE) can be passed in
flags so that a new bpf_local_storage is created if one does not
exist. value can be used together with
BPF_LOCAL_STORAGE_GET_F_CREATE to specify the initial value of a
bpf_local_storage. If value is NULL, the new bpf_local_storage
will be zero initialized.
3533
3534 Return A bpf_local_storage pointer is returned on success.
3535
3536 NULL if not found or there was an error in adding a new
3537 bpf_local_storage.
3538
3539 long bpf_task_storage_delete(struct bpf_map *map, struct task_struct
3540 *task)
3541
3542 Description
3543 Delete a bpf_local_storage from a task.
3544
3545 Return 0 on success.
3546
3547 -ENOENT if the bpf_local_storage cannot be found.
3548
3549 struct task_struct *bpf_get_current_task_btf(void)
3550
3551 Description
3552 Return a BTF pointer to the "current" task. This pointer
3553 can also be used in helpers that accept an
3554 ARG_PTR_TO_BTF_ID of type task_struct.
3555
3556 Return Pointer to the current task.
3557
3558 long bpf_bprm_opts_set(struct linux_binprm *bprm, u64 flags)
3559
3560 Description
3561 Set or clear certain options on bprm:
3562
3563 BPF_F_BPRM_SECUREEXEC Set the secureexec bit which sets
3564 the AT_SECURE auxv for glibc. The bit is cleared if the
3565 flag is not specified.
3566
3567 Return -EINVAL if invalid flags are passed, zero otherwise.
3568
3569 u64 bpf_ktime_get_coarse_ns(void)
3570
3571 Description
3572 Return a coarse-grained version of the time elapsed since
3573 system boot, in nanoseconds. Does not include time the
3574 system was suspended.
3575
3576 See: clock_gettime(CLOCK_MONOTONIC_COARSE)
3577
3578 Return Current ktime.
3579
3580 long bpf_ima_inode_hash(struct inode *inode, void *dst, u32 size)
3581
3582 Description
Returns the stored IMA hash of the inode (if it's available). If
the hash is larger than size, then only size bytes will be copied
to dst.

Return The hash_algo is returned on success, -EOPNOTSUPP if IMA is
disabled or -EINVAL if invalid arguments are passed.
3589
3590 struct socket *bpf_sock_from_file(struct file *file)
3591
3592 Description
3593 If the given file represents a socket, returns the asso‐
3594 ciated socket.
3595
3596 Return A pointer to a struct socket on success or NULL if the
3597 file is not a socket.
3598
3599 long bpf_check_mtu(void *ctx, u32 ifindex, u32 *mtu_len, s32 len_diff,
3600 u64 flags)
3601
3602 Description
Check whether the packet size exceeds the MTU of a net device
(based on ifindex). This helper will likely be used in combination
with helpers that adjust/change the packet size.
3607
The argument len_diff can be used for querying with a planned size
change. This allows checking the MTU prior to changing the packet
ctx. Providing a len_diff adjustment that is larger than the actual
packet size (resulting in a negative packet size) will in principle
not exceed the MTU, which is why it is not considered a failure.
Other BPF helpers are needed for performing the planned size
change; therefore the responsibility for catching a negative packet
size belongs in those helpers.
3617
Specifying ifindex zero means the MTU check is performed against
the current net device. This is practical if the helper is not
used prior to a redirect.
3621
On input, mtu_len must be a valid pointer, otherwise the verifier
will reject the BPF program. If the value of mtu_len is initialized
to zero, then the ctx packet size is used. When a non-zero value of
mtu_len is provided as input, it specifies the L3 length that the
MTU check is done against. Remember that XDP and TC lengths operate
at L2, but this value is L3, as it correlates to the MTU and
IP-header tot_len values, which are L3 (similar behavior to
bpf_fib_lookup()).
3630
3631 The Linux kernel route table can configure MTUs on a more
3632 specific per route level, which is not provided by this
3633 helper. For route level MTU checks use the
3634 bpf_fib_lookup() helper.
3635
3636 ctx is either struct xdp_md for XDP programs or struct
3637 sk_buff for tc cls_act programs.
3638
3639 The flags argument can be a combination of one or more of
3640 the following values:
3641
3642 BPF_MTU_CHK_SEGS
This flag only works for ctx struct sk_buff. If the packet context
contains extra packet segment buffers (often known as a GSO skb),
then the MTU is harder to check at this point, because in the
transmit path it is possible for the skb packet to get re-segmented
(depending on net device features). This could still be an MTU
violation, so this flag enables performing the MTU check against
segments, with a different violation return code to tell it apart.
This check cannot use len_diff.
3653
On return, the mtu_len pointer contains the MTU value of the net
device. Remember that the net device's configured MTU is the L3
size, which is what is returned here, while XDP and TC lengths
operate at L2. The helper takes this into account for you, but keep
it in mind when using the MTU value in your BPF code.
3659
3660 Return
3661
3662 • 0 on success, and populate MTU value in mtu_len
3663 pointer.
3664
3665 • < 0 if any input argument is invalid (mtu_len not up‐
3666 dated)
3667
MTU violations return positive values, but also populate the MTU
value in the mtu_len pointer, as this can be needed for
implementing PMTU handling:
3671
3672 • BPF_MTU_CHK_RET_FRAG_NEEDED
3673
3674 • BPF_MTU_CHK_RET_SEGS_TOOBIG
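
The L2 versus L3 arithmetic described above can be illustrated with
a small user-space model. It is a sketch of the documented
behaviour only, under the assumption that the L3 length is the L2
length minus the L2 header; model_check_mtu(), its parameters and
the MODEL_MTU_* codes are hypothetical names, and segment checking
(BPF_MTU_CHK_SEGS) and argument validation are left out.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the success and violation return codes. */
enum model_mtu_ret {
    MODEL_MTU_OK = 0,
    MODEL_MTU_FRAG_NEEDED = 1,
};

/*
 * l2_len:   packet length as seen by XDP/tc, including the L2 header
 * l2_hdr:   size of the L2 header (for example 14 for Ethernet)
 * dev_mtu:  device MTU, which is an L3 size
 * mtu_len:  in: L3 length to check, or 0 to use the packet length;
 *           out: the device MTU
 * len_diff: planned size change
 */
int model_check_mtu(uint32_t l2_len, uint32_t l2_hdr, uint32_t dev_mtu,
                    uint32_t *mtu_len, int32_t len_diff)
{
    /* L3 length under test: caller-supplied, or derived from the packet. */
    int64_t l3_len = *mtu_len ? (int64_t)*mtu_len
                              : (int64_t)l2_len - (int64_t)l2_hdr;

    l3_len += len_diff;     /* apply the planned size change */
    *mtu_len = dev_mtu;     /* report the device MTU in both outcomes */

    /* A negative length does not exceed the MTU, so it is not a failure. */
    return l3_len > (int64_t)dev_mtu ? MODEL_MTU_FRAG_NEEDED : MODEL_MTU_OK;
}
```

With a 1514-byte Ethernet frame and a device MTU of 1500, the model
reports success for len_diff 0 and a violation for len_diff 4,
writing 1500 to mtu_len in both cases.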
3675
3676 long bpf_for_each_map_elem(struct bpf_map *map, void *callback_fn, void
3677 *callback_ctx, u64 flags)
3678
3679 Description
3680 For each element in map, call callback_fn function with
3681 map, callback_ctx and other map-specific parameters. The
3682 callback_fn should be a static function and the call‐
3683 back_ctx should be a pointer to the stack. The flags is
3684 used to control certain aspects of the helper. Cur‐
3685 rently, the flags must be 0.
3686
The following is a list of supported map types and their
respective expected callback signatures:
3689
3690 BPF_MAP_TYPE_HASH, BPF_MAP_TYPE_PERCPU_HASH,
3691 BPF_MAP_TYPE_LRU_HASH, BPF_MAP_TYPE_LRU_PERCPU_HASH,
3692 BPF_MAP_TYPE_ARRAY, BPF_MAP_TYPE_PERCPU_ARRAY
3693
3694 long (*callback_fn)(struct bpf_map *map, const void *key,
3695 void *value, void *ctx);
3696
3697 For per_cpu maps, the map_value is the value on the cpu
3698 where the bpf_prog is running.
3699
If callback_fn returns 0, the helper will continue to the next
element. If the return value is 1, the helper will skip the rest of
the elements and return. Other return values are not used now.
3704
3705 Return The number of traversed map elements for success, -EINVAL
3706 for invalid flags.
3707
3708 long bpf_snprintf(char *str, u32 str_size, const char *fmt, u64 *data,
3709 u32 data_len)
3710
3711 Description
Outputs a string into the str buffer of size str_size based on a
format string stored in a read-only map pointed to by fmt.
3715
3716 Each format specifier in fmt corresponds to one u64 ele‐
3717 ment in the data array. For strings and pointers where
3718 pointees are accessed, only the pointer values are stored
3719 in the data array. The data_len is the size of data in
3720 bytes - must be a multiple of 8.
3721
Formats %s and %p{i,I}{4,6} require reading kernel memory. Reading
kernel memory may fail due to either an invalid address or a valid
address that requires a major memory fault. If reading kernel
memory fails, the string for %s will be an empty string, and the ip
address for %p{i,I}{4,6} will be 0. Not returning an error to the
bpf program is consistent with what bpf_trace_printk() does for
now.
3730
3731 Return The strictly positive length of the formatted string, in‐
3732 cluding the trailing zero character. If the return value
3733 is greater than str_size, str contains a truncated
3734 string, guaranteed to be zero-terminated except when
3735 str_size is 0.
3736
3737 Or -EBUSY if the per-CPU memory copy buffer is busy.
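
The return-value convention above (length including the trailing
zero character, truncation when the value exceeds str_size) can be
checked with a user-space stand-in. This is a sketch only:
model_snprintf() and model_truncated() are hypothetical names, and
the model takes C varargs through vsnprintf() instead of the u64
data array and data_len that the real helper expects.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * User-space stand-in: format with vsnprintf(), then report the length
 * the way the Return text describes it, counting the trailing zero byte.
 */
long model_snprintf(char *str, unsigned int str_size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(str, str_size, fmt, ap);
    va_end(ap);
    if (n < 0)
        return n;           /* formatting error */
    return (long)n + 1;     /* include the trailing zero character */
}

/* Per the Return text: a value greater than str_size means truncation. */
int model_truncated(long ret, unsigned int str_size)
{
    return ret > (long)str_size;
}
```

In this model, formatting "hello" into a 4-byte buffer returns 6 and
leaves "hel" in the buffer, while a 6-byte buffer holds the whole
string and the same return value of 6 is not treated as truncation.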
3738
3739 long bpf_sys_bpf(u32 cmd, void *attr, u32 attr_size)
3740
3741 Description
3742 Execute bpf syscall with given arguments.
3743
3744 Return A syscall result.
3745
3746 long bpf_btf_find_by_name_kind(char *name, int name_sz, u32 kind, int
3747 flags)
3748
3749 Description
3750 Find BTF type with given name and kind in vmlinux BTF or
3751 in module's BTFs.
3752
3753 Return Returns btf_id and btf_obj_fd in lower and upper 32 bits.
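
A caller has to split the returned value into its two halves. The
following is a minimal sketch of that step, assuming that negative
(error) returns have already been handled; struct model_btf_ref and
model_split_btf_ret() are hypothetical names used for illustration.

```c
#include <stdint.h>

struct model_btf_ref {
    uint32_t btf_id;      /* taken from the lower 32 bits */
    uint32_t btf_obj_fd;  /* taken from the upper 32 bits */
};

/* Split a non-negative return value into btf_id and btf_obj_fd. */
struct model_btf_ref model_split_btf_ret(uint64_t ret)
{
    struct model_btf_ref ref = {
        .btf_id = (uint32_t)(ret & 0xffffffffu),
        .btf_obj_fd = (uint32_t)(ret >> 32),
    };
    return ref;
}
```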
3754
3755 long bpf_sys_close(u32 fd)
3756
3757 Description
3758 Execute close syscall for given FD.
3759
3760 Return A syscall result.
3761
3762 long bpf_timer_init(struct bpf_timer *timer, struct bpf_map *map, u64
3763 flags)
3764
3765 Description
3766 Initialize the timer. First 4 bits of flags specify
3767 clockid. Only CLOCK_MONOTONIC, CLOCK_REALTIME,
3768 CLOCK_BOOTTIME are allowed. All other bits of flags are
3769 reserved. The verifier will reject the program if timer
3770 is not from the same map.
3771
3772 Return 0 on success. -EBUSY if timer is already initialized.
3773 -EINVAL if invalid flags are passed. -EPERM if timer is
3774 in a map that doesn't have any user references. The user
3775 space should either hold a file descriptor to a map with
3776 timers or pin such map in bpffs. When map is unpinned or
3777 file descriptor is closed all timers in the map will be
3778 cancelled and freed.
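
The layout of flags described above can be sketched as a small
validity check. This is a user-space illustration, not the kernel
implementation: model_timer_flags_check() and the MODEL_* macros
are hypothetical names, the clock ids are the usual Linux values
written out by hand, and rejecting set reserved bits is an
assumption based on the -EINVAL case in the Return text.

```c
#include <errno.h>
#include <stdint.h>

/* Linux clock ids, spelled out so the sketch does not need <time.h>. */
#define MODEL_CLOCK_REALTIME  0
#define MODEL_CLOCK_MONOTONIC 1
#define MODEL_CLOCK_BOOTTIME  7

#define MODEL_CLOCKID_MASK 0xfULL   /* first 4 bits of flags */

/* Return 0 if flags looks acceptable under the rules above, else -EINVAL. */
int model_timer_flags_check(uint64_t flags)
{
    uint64_t clockid = flags & MODEL_CLOCKID_MASK;

    if (flags & ~MODEL_CLOCKID_MASK)
        return -EINVAL;     /* assumed: reserved bits must stay clear */

    switch (clockid) {
    case MODEL_CLOCK_REALTIME:
    case MODEL_CLOCK_MONOTONIC:
    case MODEL_CLOCK_BOOTTIME:
        return 0;
    default:
        return -EINVAL;     /* any other clock id is not allowed */
    }
}
```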
3779
3780 long bpf_timer_set_callback(struct bpf_timer *timer, void *callback_fn)
3781
3782 Description
3783 Configure the timer to call callback_fn static function.
3784
3785 Return 0 on success. -EINVAL if timer was not initialized with
3786 bpf_timer_init() earlier. -EPERM if timer is in a map
3787 that doesn't have any user references. The user space
3788 should either hold a file descriptor to a map with timers
3789 or pin such map in bpffs. When map is unpinned or file
3790 descriptor is closed all timers in the map will be can‐
3791 celled and freed.
3792
3793 long bpf_timer_start(struct bpf_timer *timer, u64 nsecs, u64 flags)
3794
3795 Description
Set the timer expiration nsecs nanoseconds from the current time.
The configured callback will be invoked in soft irq context on some
cpu and will not repeat unless another bpf_timer_start() call is
made. In such a case the next invocation can migrate to a different
cpu.

Since struct bpf_timer is a field inside a map element, the map
owns the timer. bpf_timer_set_callback() will increment the refcnt
of the BPF program to make sure that the callback_fn code stays
valid. When the user space reference to a map reaches zero, all
timers in the map are cancelled and the corresponding programs'
refcnts are decremented. This is done to make sure that Ctrl-C of a
user process doesn't leave any timers running. If the map is pinned
in bpffs, the callback_fn can re-arm itself indefinitely.

The bpf_map_update_elem() and bpf_map_delete_elem() helpers and
user space sys_bpf commands cancel and free the timer in the given
map element. The map can contain timers that invoke callback_fn-s
from different programs. The same callback_fn can serve different
timers from different maps if the key/value layout matches across
maps. Every bpf_timer_set_callback() can have a different
callback_fn.
3817
3818 Return 0 on success. -EINVAL if timer was not initialized with
3819 bpf_timer_init() earlier or invalid flags are passed.
3820
3821 long bpf_timer_cancel(struct bpf_timer *timer)
3822
3823 Description
3824 Cancel the timer and wait for callback_fn to finish if it
3825 was running.
3826
3827 Return 0 if the timer was not active. 1 if the timer was ac‐
3828 tive. -EINVAL if timer was not initialized with
3829 bpf_timer_init() earlier. -EDEADLK if callback_fn tried
3830 to call bpf_timer_cancel() on its own timer which would
3831 have led to a deadlock otherwise.
3832
3833 u64 bpf_get_func_ip(void *ctx)
3834
3835 Description
3836 Get address of the traced function (for tracing and
3837 kprobe programs).
3838
3839 Return Address of the traced function. 0 for kprobes placed
3840 within the function (not at the entry).
3841
3842 u64 bpf_get_attach_cookie(void *ctx)
3843
3844 Description
3845 Get bpf_cookie value provided (optionally) during the
3846 program attachment. It might be different for each indi‐
3847 vidual attachment, even if BPF program itself is the
3848 same. Expects BPF program context ctx as a first argu‐
3849 ment.
3850
3851 Supported for the following program types:
3852
3853 • kprobe/uprobe;
3854
3855 • tracepoint;
3856
3857 • perf_event.
3858
3859 Return Value specified by user at BPF link creation/attachment
3860 time or 0, if it was not specified.
3861
3862 long bpf_task_pt_regs(struct task_struct *task)
3863
3864 Description
3865 Get the struct pt_regs associated with task.
3866
3867 Return A pointer to struct pt_regs.
3868
3869 long bpf_get_branch_snapshot(void *entries, u32 size, u64 flags)
3870
3871 Description
Get branch trace from hardware engines like Intel LBR. The hardware
engine is stopped shortly after the helper is called. Therefore,
the user needs to filter branch entries based on the actual use
case. To capture the branch trace before the trigger point of the
BPF program, the helper should be called at the beginning of the
BPF program.
3879
3880 The data is stored as struct perf_branch_entry into out‐
3881 put buffer entries. size is the size of entries in bytes.
3882 flags is reserved for now and must be zero.
3883
Return On success, number of bytes written to entries. On error, a
negative value.
3886
3887 -EINVAL if flags is not zero.
3888
3889 -ENOENT if architecture does not support branch records.
3890
3891 long bpf_trace_vprintk(const char *fmt, u32 fmt_size, const void *data,
3892 u32 data_len)
3893
3894 Description
3895 Behaves like bpf_trace_printk() helper, but takes an ar‐
3896 ray of u64 to format and can handle more format args as a
3897 result.
3898
3899 Arguments are to be used as in bpf_seq_printf() helper.
3900
3901 Return The number of bytes written to the buffer, or a negative
3902 error in case of failure.
3903
3904 struct unix_sock *bpf_skc_to_unix_sock(void *sk)
3905
3906 Description
3907 Dynamically cast a sk pointer to a unix_sock pointer.
3908
3909 Return sk if casting is valid, or NULL otherwise.
3910
3911 long bpf_kallsyms_lookup_name(const char *name, int name_sz, int flags,
3912 u64 *res)
3913
3914 Description
3915 Get the address of a kernel symbol, returned in res. res
3916 is set to 0 if the symbol is not found.
3917
3918 Return On success, zero. On error, a negative value.
3919
3920 -EINVAL if flags is not zero.
3921
3922 -EINVAL if string name is not the same size as name_sz.
3923
3924 -ENOENT if symbol is not found.
3925
3926 -EPERM if caller does not have permission to obtain ker‐
3927 nel address.
3928
3929 long bpf_find_vma(struct task_struct *task, u64 addr, void *call‐
3930 back_fn, void *callback_ctx, u64 flags)
3931
3932 Description
3933 Find vma of task that contains addr, call callback_fn
3934 function with task, vma, and callback_ctx. The call‐
3935 back_fn should be a static function and the callback_ctx
3936 should be a pointer to the stack. The flags is used to
3937 control certain aspects of the helper. Currently, the
3938 flags must be 0.
3939
3940 The expected callback signature is
3941
3942 long (*callback_fn)(struct task_struct *task, struct
3943 vm_area_struct *vma, void *callback_ctx);
3944
3945 Return 0 on success. -ENOENT if task->mm is NULL, or no vma
3946 contains addr. -EBUSY if failed to try lock mmap_lock.
3947 -EINVAL for invalid flags.
3948
3949 long bpf_loop(u32 nr_loops, void *callback_fn, void *callback_ctx, u64
3950 flags)
3951
3952 Description
Call callback_fn nr_loops times, with callback_ctx as the context
parameter. The callback_fn should be a static function and the
callback_ctx should be a pointer to the stack. The flags argument
is used to control certain aspects of the helper. Currently, flags
must be 0, and nr_loops is limited to 1 << 23 (~8 million) loops.
The expected callback signature is:
3960
3961 long (*callback_fn)(u32 index, void *ctx);
3962
3963 where index is the current index in the loop. The index
3964 is zero-indexed.
3965
3966 If callback_fn returns 0, the helper will continue to the
3967 next loop. If return value is 1, the helper will skip the
3968 rest of the loops and return. Other return values are not
3969 used now, and will be rejected by the verifier.
3970
3971 Return The number of loops performed, -EINVAL for invalid flags,
3972 -E2BIG if nr_loops exceeds the maximum number of loops.
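
The control flow described above can be modelled in user space as
follows. This is a sketch, not the kernel implementation:
model_loop(), model_loop_cb and sum_until_ten() are hypothetical
names, and counting the iteration whose callback returned 1 as a
performed loop is an assumption about how "the number of loops
performed" is meant.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_MAX_LOOPS (1u << 23)   /* limit quoted in the description */

typedef long (*model_loop_cb)(uint32_t index, void *ctx);

/* Call cb up to nr_loops times; stop early when cb returns non-zero. */
long model_loop(uint32_t nr_loops, model_loop_cb cb, void *ctx, uint64_t flags)
{
    uint32_t i;

    if (flags)
        return -EINVAL;
    if (nr_loops > MODEL_MAX_LOOPS)
        return -E2BIG;

    for (i = 0; i < nr_loops; i++) {
        if (cb(i, ctx) != 0)
            return (long)i + 1;   /* assumed: the stopping call counts */
    }
    return (long)i;
}

/* Example callback: add index to a running sum, stop once it reaches 10. */
static long sum_until_ten(uint32_t index, void *ctx)
{
    long *sum = ctx;

    *sum += index;
    return *sum >= 10 ? 1 : 0;
}
```

Running model_loop(100, sum_until_ten, &sum, 0) with sum starting at
0 visits indices 0 through 4, at which point the sum is 10 and the
callback asks to stop, so the model returns 5.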
3973
3974 long bpf_strncmp(const char *s1, u32 s1_sz, const char *s2)
3975
3976 Description
3977 Do strncmp() between s1 and s2. s1 doesn't need to be
3978 null-terminated and s1_sz is the maximum storage size of
3979 s1. s2 must be a read-only string.
3980
Return An integer less than, equal to, or greater than zero if the
first s1_sz bytes of s1 are found to be less than, to match, or to
be greater than s2.
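
Because only the first s1_sz bytes are compared, the result behaves
like the C library strncmp(). The following user-space stand-in is
a sketch under that assumption; model_strncmp() is a hypothetical
name and simply forwards to strncmp().

```c
#include <string.h>

/* User-space stand-in: compare at most s1_sz bytes, as strncmp() does. */
long model_strncmp(const char *s1, unsigned int s1_sz, const char *s2)
{
    return strncmp(s1, s2, s1_sz);
}
```

One consequence in this model is that a longer s2 whose first s1_sz
bytes equal s1 compares as a match: a 3-byte, non-terminated s1
holding "abc" compares equal to both "abc" and "abcd".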
3984
3985 long bpf_get_func_arg(void *ctx, u32 n, u64 *value)
3986
3987 Description
3988 Get n-th argument register (zero based) of the traced
3989 function (for tracing programs) returned in value.
3990
3991 Return 0 on success. -EINVAL if n >= argument register count of
3992 traced function.
3993
3994 long bpf_get_func_ret(void *ctx, u64 *value)
3995
3996 Description
3997 Get return value of the traced function (for tracing pro‐
3998 grams) in value.
3999
4000 Return 0 on success. -EOPNOTSUPP for tracing programs other
4001 than BPF_TRACE_FEXIT or BPF_MODIFY_RETURN.
4002
4003 long bpf_get_func_arg_cnt(void *ctx)
4004
4005 Description
4006 Get number of registers of the traced function (for trac‐
4007 ing programs) where function arguments are stored in
4008 these registers.
4009
4010 Return The number of argument registers of the traced function.
4011
4012 int bpf_get_retval(void)
4013
4014 Description
4015 Get the BPF program's return value that will be returned
4016 to the upper layers.
4017
4018 This helper is currently supported by cgroup programs and
4019 only by the hooks where BPF program's return value is re‐
4020 turned to the userspace via errno.
4021
4022 Return The BPF program's return value.
4023
4024 int bpf_set_retval(int retval)
4025
4026 Description
4027 Set the BPF program's return value that will be returned
4028 to the upper layers.
4029
4030 This helper is currently supported by cgroup programs and
4031 only by the hooks where BPF program's return value is re‐
4032 turned to the userspace via errno.
4033
4034 Note that there is the following corner case where the
4035 program exports an error via bpf_set_retval but signals
4036 success via 'return 1':
4037 bpf_set_retval(-EPERM); return 1;
4038
4039 In this case, the BPF program's return value will use
4040 helper's -EPERM. This still holds true for
4041 cgroup/bind{4,6} which supports extra 'return 3' success
4042 case.
4043
4044 Return 0 on success, or a negative error in case of failure.
4045
4046 u64 bpf_xdp_get_buff_len(struct xdp_buff *xdp_md)
4047
4048 Description
Get the total size of a given xdp buffer (linear and paged area).
4051
4052 Return The total size of a given xdp buffer.
4053
4054 long bpf_xdp_load_bytes(struct xdp_buff *xdp_md, u32 offset, void *buf,
4055 u32 len)
4056
4057 Description
This helper is provided as an easy way to load data from an xdp
buffer. It can be used to load len bytes from offset from the frame
associated to xdp_md, into the buffer pointed to by buf.
4062
4063 Return 0 on success, or a negative error in case of failure.
4064
4065 long bpf_xdp_store_bytes(struct xdp_buff *xdp_md, u32 offset, void
4066 *buf, u32 len)
4067
4068 Description
4069 Store len bytes from buffer buf into the frame associated
4070 to xdp_md, at offset.
4071
4072 Return 0 on success, or a negative error in case of failure.
4073
4074 long bpf_copy_from_user_task(void *dst, u32 size, const void *user_ptr,
4075 struct task_struct *tsk, u64 flags)
4076
4077 Description
Read size bytes from user space address user_ptr in tsk's address
space, and store the data in dst. flags is not used yet and is
provided for future extensibility. This helper can only be used by
sleepable programs.
4082
4083 Return 0 on success, or a negative error in case of failure. On
4084 error dst buffer is zeroed out.
4085
4086 long bpf_skb_set_tstamp(struct sk_buff *skb, u64 tstamp, u32
4087 tstamp_type)
4088
4089 Description
Change the __sk_buff->tstamp_type to tstamp_type and set
__sk_buff->tstamp to tstamp at the same time.
4092
4093 If there is no need to change the __sk_buff->tstamp_type,
4094 the tstamp value can be directly written to
4095 __sk_buff->tstamp instead.
4096
4097 BPF_SKB_TSTAMP_DELIVERY_MONO is the only tstamp that will
4098 be kept during bpf_redirect_*(). A non zero tstamp must
4099 be used with the BPF_SKB_TSTAMP_DELIVERY_MONO
4100 tstamp_type.
4101
4102 A BPF_SKB_TSTAMP_UNSPEC tstamp_type can only be used with
4103 a zero tstamp.
4104
4105 Only IPv4 and IPv6 skb->protocol are supported.
4106
This helper is most useful when a program needs to set a mono
delivery time in __sk_buff->tstamp and then bpf_redirect_*() to the
egress of an iface. For example, changing the (rcv) timestamp in
__sk_buff->tstamp at ingress to a mono delivery time and then
bpf_redirect_*() to sch_fq@phy-dev.
4113
Return 0 on success. -EINVAL for invalid input. -EOPNOTSUPP for
unsupported protocol.
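
The pairing rules between tstamp and tstamp_type stated above can
be written down as a small check. This is a user-space sketch only:
model_tstamp_pair_ok() and enum model_tstamp_type are hypothetical
names mirroring the two types named in the description, and
treating any other type as invalid is an assumption. The protocol
check is not modelled.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical mirrors of the two tstamp_type values named above. */
enum model_tstamp_type {
    MODEL_TSTAMP_UNSPEC,
    MODEL_TSTAMP_DELIVERY_MONO,
};

/* Return 0 if the (tstamp, tstamp_type) pair is allowed, else -EINVAL. */
int model_tstamp_pair_ok(uint64_t tstamp, uint32_t tstamp_type)
{
    switch (tstamp_type) {
    case MODEL_TSTAMP_DELIVERY_MONO:
        return tstamp ? 0 : -EINVAL;   /* mono delivery time: non zero */
    case MODEL_TSTAMP_UNSPEC:
        return tstamp ? -EINVAL : 0;   /* unspec: only with a zero tstamp */
    default:
        return -EINVAL;                /* assumed: unknown types rejected */
    }
}
```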
4116
4117 long bpf_ima_file_hash(struct file *file, void *dst, u32 size)
4118
4119 Description
Returns a calculated IMA hash of the file. If the hash is larger
than size, then only size bytes will be copied to dst.

Return The hash_algo is returned on success, -EOPNOTSUPP if the
hash calculation failed or -EINVAL if invalid arguments are passed.
4127
4128 void *bpf_kptr_xchg(void *map_value, void *ptr)
4129
4130 Description
4131 Exchange kptr at pointer map_value with ptr, and return
4132 the old value. ptr can be NULL, otherwise it must be a
4133 referenced pointer which will be released when this
4134 helper is called.
4135
4136 Return The old value of kptr (which can be NULL). The returned
4137 pointer if not NULL, is a reference which must be re‐
4138 leased using its corresponding release function, or moved
4139 into a BPF map before program exit.
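
The exchange semantics can be pictured with a user-space model
built on C11 atomics. This is a sketch only: struct model_map_value
and model_kptr_xchg() are hypothetical names, and the use of an
atomic exchange is an assumption made for the illustration. The
reference tracking that the verifier performs is not modelled.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical map value holding one pointer slot. */
struct model_map_value {
    _Atomic(void *) kptr;
};

/* Store ptr in the slot and hand the previous occupant to the caller. */
void *model_kptr_xchg(struct model_map_value *v, void *ptr)
{
    return atomic_exchange(&v->kptr, ptr);
}
```

In this model, whoever receives a non-NULL old value becomes
responsible for it, and passing NULL empties the slot.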
4140
4141 void *bpf_map_lookup_percpu_elem(struct bpf_map *map, const void *key,
4142 u32 cpu)
4143
4144 Description
4145 Perform a lookup in percpu map for an entry associated to
4146 key on cpu.
4147
4148 Return Map value associated to key on cpu, or NULL if no entry
4149 was found or cpu is invalid.
4150
4151 struct mptcp_sock *bpf_skc_to_mptcp_sock(void *sk)
4152
4153 Description
4154 Dynamically cast a sk pointer to a mptcp_sock pointer.
4155
4156 Return sk if casting is valid, or NULL otherwise.
4157
4158 long bpf_dynptr_from_mem(void *data, u32 size, u64 flags, struct
4159 bpf_dynptr *ptr)
4160
4161 Description
4162 Get a dynptr to local memory data.
4163
4164 data must be a ptr to a map value. The maximum size sup‐
4165 ported is DYNPTR_MAX_SIZE. flags is currently unused.
4166
4167 Return 0 on success, -E2BIG if the size exceeds DYNPTR_MAX_SIZE,
4168 -EINVAL if flags is not 0.
4169
4170 long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags,
4171 struct bpf_dynptr *ptr)
4172
4173 Description
4174 Reserve size bytes of payload in a ring buffer ringbuf
4175 through the dynptr interface. flags must be 0.
4176
4177 Please note that a corresponding bpf_ringbuf_sub‐
4178 mit_dynptr or bpf_ringbuf_discard_dynptr must be called
4179 on ptr, even if the reservation fails. This is enforced
4180 by the verifier.
4181
4182 Return 0 on success, or a negative error in case of failure.
4183
4184 void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags)
4185
4186 Description
Submit reserved ring buffer sample, pointed to by ptr, through the
dynptr interface. This is a no-op if the dynptr is invalid/null.
4190
4191 For more information on flags, please see 'bpf_ring‐
4192 buf_submit'.
4193
4194 Return Nothing. Always succeeds.
4195
4196 void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags)
4197
4198 Description
4199 Discard reserved ring buffer sample through the dynptr
4200 interface. This is a no-op if the dynptr is invalid/null.
4201
4202 For more information on flags, please see 'bpf_ring‐
4203 buf_discard'.
4204
4205 Return Nothing. Always succeeds.
4206
4207 long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32
4208 offset, u64 flags)
4209
4210 Description
4211 Read len bytes from src into dst, starting from offset
4212 into src. flags is currently unused.
4213
4214 Return 0 on success, -E2BIG if offset + len exceeds the length
4215 of src's data, -EINVAL if src is an invalid dynptr or if
4216 flags is not 0.
4217
4218 long bpf_dynptr_write(struct bpf_dynptr *dst, u32 offset, void *src,
4219 u32 len, u64 flags)
4220
4221 Description
4222 Write len bytes from src into dst, starting from offset
4223 into dst. flags is currently unused.
4224
4225 Return 0 on success, -E2BIG if offset + len exceeds the length
4226 of dst's data, -EINVAL if dst is an invalid dynptr or if
4227 dst is a read-only dynptr or if flags is not 0.
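
The error rules for bpf_dynptr_read() and bpf_dynptr_write() can be
summarised with a user-space model. This is a sketch under stated
assumptions, not kernel code: struct model_dynptr (the real struct
bpf_dynptr is opaque), model_dynptr_read(), model_dynptr_write()
and model_in_bounds() are hypothetical names, a NULL data pointer
stands for an invalid dynptr, and the order in which the checks run
is chosen arbitrarily.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a dynptr. */
struct model_dynptr {
    unsigned char *data;   /* NULL marks an invalid dynptr */
    uint32_t size;         /* length of the underlying data */
    int rdonly;            /* non-zero for a read-only dynptr */
};

/* True when [offset, offset + len) fits in size, without u32 overflow. */
static int model_in_bounds(uint32_t size, uint32_t offset, uint32_t len)
{
    return (uint64_t)offset + len <= size;
}

long model_dynptr_read(void *dst, uint32_t len, const struct model_dynptr *src,
                       uint32_t offset, uint64_t flags)
{
    if (!src->data || flags)
        return -EINVAL;
    if (!model_in_bounds(src->size, offset, len))
        return -E2BIG;
    memcpy(dst, src->data + offset, len);
    return 0;
}

long model_dynptr_write(const struct model_dynptr *dst, uint32_t offset,
                        const void *src, uint32_t len, uint64_t flags)
{
    if (!dst->data || dst->rdonly || flags)
        return -EINVAL;
    if (!model_in_bounds(dst->size, offset, len))
        return -E2BIG;
    memcpy(dst->data + offset, src, len);
    return 0;
}
```

The bounds check widens offset + len to 64 bits first, so a large
offset cannot wrap around and slip past the comparison.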
4228
4229 void *bpf_dynptr_data(struct bpf_dynptr *ptr, u32 offset, u32 len)
4230
4231 Description
4232 Get a pointer to the underlying dynptr data.
4233
4234 len must be a statically known value. The returned data
4235 slice is invalidated whenever the dynptr is invalidated.
4236
4237 Return Pointer to the underlying dynptr data, NULL if the dynptr
4238 is read-only, if the dynptr is invalid, or if the offset
4239 and length are out of bounds.
4240
4241 s64 bpf_tcp_raw_gen_syncookie_ipv4(struct iphdr *iph, struct tcphdr
4242 *th, u32 th_len)
4243
4244 Description
4245 Try to issue a SYN cookie for the packet with correspond‐
4246 ing IPv4/TCP headers, iph and th, without depending on a
4247 listening socket.
4248
4249 iph points to the IPv4 header.
4250
4251 th points to the start of the TCP header, while th_len
4252 contains the length of the TCP header (at least
4253 sizeof(struct tcphdr)).
4254
4255 Return On success, the lower 32 bits hold the generated SYN
4256 cookie, followed by 16 bits which hold the MSS value for
4257 that cookie; the top 16 bits are unused.
4258
4259 On failure, the returned value is one of the following:
4260
4261 -EINVAL if th_len is invalid.
4262
4263 s64 bpf_tcp_raw_gen_syncookie_ipv6(struct ipv6hdr *iph, struct tcphdr
4264 *th, u32 th_len)
4265
4266 Description
4267 Try to issue a SYN cookie for the packet with correspond‐
4268 ing IPv6/TCP headers, iph and th, without depending on a
4269 listening socket.
4270
4271 iph points to the IPv6 header.
4272
4273 th points to the start of the TCP header, while th_len
4274 contains the length of the TCP header (at least
4275 sizeof(struct tcphdr)).
4276
4277 Return On success, the lower 32 bits hold the generated SYN
4278 cookie, followed by 16 bits which hold the MSS value for
4279 that cookie; the top 16 bits are unused.
4280
4281 On failure, the returned value is one of the following:
4282
4283 -EINVAL if th_len is invalid.
4284
4285 -EPROTONOSUPPORT if CONFIG_IPV6 is not builtin.
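
The s64 value returned by the two bpf_tcp_raw_gen_syncookie helpers packs the cookie and the MSS into one integer. A caller might unpack it as sketched below. The struct syncookie_result type and the decode_syncookie_ret function are hypothetical names used only for illustration; the bit layout is the one described in the Return text above.

```c
#include <stdint.h>

/* Hypothetical container for the two fields packed into the return value. */
struct syncookie_result {
    uint32_t cookie;  /* lower 32 bits of the return value */
    uint16_t mss;     /* next 16 bits of the return value */
};

/*
 * Split a return value into cookie and MSS.
 * Returns 0 and fills *out on success; a negative return value
 * (an error such as -EINVAL) is passed through and *out is untouched.
 */
int decode_syncookie_ret(int64_t ret, struct syncookie_result *out)
{
    if (ret < 0)
        return (int)ret;
    out->cookie = (uint32_t)(ret & 0xffffffff);
    out->mss = (uint16_t)((ret >> 32) & 0xffff);
    return 0;
}
```

Checking for a negative value first matters, because the error codes would otherwise be misread as a cookie and an MSS.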
4286
4287 long bpf_tcp_raw_check_syncookie_ipv4(struct iphdr *iph, struct tcphdr
4288 *th)
4289
4290 Description
4291 Check whether iph and th contain a valid SYN cookie ACK
4292 without depending on a listening socket.
4293
4294 iph points to the IPv4 header.
4295
4296 th points to the TCP header.
4297
4298 Return 0 if iph and th are a valid SYN cookie ACK.
4299
4300 On failure, the returned value is one of the following:
4301
4302 -EACCES if the SYN cookie is not valid.
4303
4304 long bpf_tcp_raw_check_syncookie_ipv6(struct ipv6hdr *iph, struct
4305 tcphdr *th)
4306
4307 Description
4308 Check whether iph and th contain a valid SYN cookie ACK
4309 without depending on a listening socket.
4310
4311 iph points to the IPv6 header.
4312
4313 th points to the TCP header.
4314
4315 Return 0 if iph and th are a valid SYN cookie ACK.
4316
4317 On failure, the returned value is one of the following:
4318
4319 -EACCES if the SYN cookie is not valid.
4320
4321 -EPROTONOSUPPORT if CONFIG_IPV6 is not builtin.
4322
4323 u64 bpf_ktime_get_tai_ns(void)
4324
4325 Description
4326 A nonsettable system-wide clock derived from wall-clock
4327 time but ignoring leap seconds. Unlike CLOCK_REALTIME,
4328 this clock does not experience discontinuities or back‐
4329 wards jumps caused by NTP inserting leap seconds.
4330
4331 See: clock_gettime(CLOCK_TAI)
4332
4333 Return Current TAI time, in nanoseconds.
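
For comparison, the user-space counterpart referenced above, clock_gettime(CLOCK_TAI), can be read as in the following sketch. The function name tai_now_ns is hypothetical. The fallback value 11 for CLOCK_TAI is the Linux clock id and is only used if the libc headers do not define it. Whether a user-space reading and a value from the helper are directly comparable on a given system is an assumption of this example.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

#ifndef CLOCK_TAI
#define CLOCK_TAI 11  /* Linux clock id, for libc headers that lack it */
#endif

/*
 * Read CLOCK_TAI and convert it to nanoseconds.
 * Returns 0 on success and stores the value in *ns,
 * or -1 if the clock is not available (*ns is left untouched).
 */
int tai_now_ns(uint64_t *ns)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_TAI, &ts) != 0)
        return -1;
    *ns = (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
    return 0;
}
```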
4334
4335 long bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn,
4336 void *ctx, u64 flags)
4337
4338 Description
4339 Drain samples from the specified user ring buffer, and
4340 invoke the provided callback for each such sample:
4341
4342 long (*callback_fn)(struct bpf_dynptr *dynptr, void
4343 *ctx);
4344
4345 If callback_fn returns 0, the helper will continue to try
4346 to drain the next sample, up to a maximum of
4347 BPF_MAX_USER_RINGBUF_SAMPLES samples. If the return value
4348 is 1, the helper will skip the rest of the samples and
4349 return. Other return values are currently unused and will
4350 be rejected by the verifier.
4351
4352 Return The number of drained samples if no error was encountered
4353 while draining samples, or 0 if no samples were present
4354 in the ring buffer. If a user-space producer was
4355 epoll-waiting on this map, and at least one sample was
4356 drained, they will receive an event notification notify‐
4357 ing them of available space in the ring buffer. If the
4358 BPF_RB_NO_WAKEUP flag is passed to this function, no
4359 wakeup notification will be sent. If the
4360 BPF_RB_FORCE_WAKEUP flag is passed, a wakeup notification
4361 will be sent even if no sample was drained.
4362
4363 On failure, the returned value is one of the following:
4364
4365 -EBUSY if the ring buffer is contended, and another call‐
4366 ing context was concurrently draining the ring buffer.
4367
4368 -EINVAL if user-space is not properly tracking the ring
4369 buffer due to the producer position not being aligned to
4370 8 bytes, a sample not being aligned to 8 bytes, or the
4371 producer position not matching the advertised length of a
4372 sample.
4373
4374 -E2BIG if user-space has tried to publish a sample which
4375 is larger than the size of the ring buffer, or which can‐
4376 not fit within a struct bpf_dynptr.
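
The callback contract of bpf_user_ringbuf_drain (return 0 to continue, 1 to stop, with an upper bound on samples per call) can be pictured with the user-space model below. Everything in it is hypothetical: model_drain, model_cb, sum_until_negative and MODEL_MAX_SAMPLES are names made up for this sketch, samples are plain integers rather than dynptrs, and the cap of 4 merely stands in for BPF_MAX_USER_RINGBUF_SAMPLES. The model also assumes that the sample on which the callback returns 1 still counts as drained.

```c
#include <stddef.h>

/* Hypothetical cap standing in for BPF_MAX_USER_RINGBUF_SAMPLES. */
#define MODEL_MAX_SAMPLES 4

/* Model callback type: return 0 to continue, 1 to stop draining. */
typedef long (*model_cb)(const int *sample, void *ctx);

/* Drain up to MODEL_MAX_SAMPLES samples; returns how many were drained. */
long model_drain(const int *samples, size_t n, model_cb cb, void *ctx)
{
    long drained = 0;
    size_t i;

    for (i = 0; i < n && drained < MODEL_MAX_SAMPLES; i++) {
        long ret = cb(&samples[i], ctx);

        /* Assumption: a sample is consumed even if the callback asks to stop. */
        drained++;
        if (ret == 1)
            break;
    }
    return drained;
}

/* Example callback: accumulate samples into *ctx, stop at a negative one. */
long sum_until_negative(const int *sample, void *ctx)
{
    long *sum = ctx;

    if (*sample < 0)
        return 1;
    *sum += *sample;
    return 0;
}
```

With the samples 1, 2, -1, 5, this model drains three samples and leaves a sum of 3, because the callback stops at the negative value.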
4377
4379 Examples of usage for most of the eBPF helpers listed in this manual
4380 page are available within the Linux kernel sources, at the following
4381 locations:
4382
4383 • samples/bpf/
4384
4385 • tools/testing/selftests/bpf/
4386
4388 eBPF programs can have an associated license, passed along with the
4389 bytecode instructions to the kernel when the programs are loaded. The
4390 format for that string is identical to the one in use for kernel mod‐
4391 ules (dual licenses, such as "Dual BSD/GPL", may be used). Some helper
4392 functions are only accessible to programs that are compatible with the
4393 GNU General Public License (GPL).
4394
4395 In order to use such helpers, the eBPF program must be loaded with the
4396 correct license string passed (via attr) to the bpf() system call, and
4397 this generally translates into the C source code of the program con‐
4398 taining a line similar to the following:
4399
4400 char ____license[] __attribute__((section("license"), used)) = "GPL";
4401
4403 This manual page is an effort to document the existing eBPF helper
4404 functions. But as of this writing, the BPF sub-system is under heavy
4405 development. New eBPF program or map types are added, along with new
4406 helper functions. Some helpers are occasionally made available for ad‐
4407 ditional program types. So in spite of the efforts of the community,
4408 this page might not be up-to-date. If you want to check for yourself
4409 what helper functions exist in your kernel, or what types of programs
4410 they can support, here are some files in the kernel tree that you
4411 may be interested in:
4412
4413 • include/uapi/linux/bpf.h is the main BPF header. It contains the full
4414 list of all helper functions, as well as many other BPF definitions
4415 including most of the flags, structs or constants used by the
4416 helpers.
4417
4418 • net/core/filter.c contains the definition of most network-related
4419 helper functions, and the list of program types from which they can
4420 be used.
4421
4422 • kernel/trace/bpf_trace.c is the equivalent for most tracing pro‐
4423 gram-related helpers.
4424
4425 • kernel/bpf/verifier.c contains the functions used to check that valid
4426 types of eBPF maps are used with a given helper function.
4427
4428 • The kernel/bpf/ directory contains other files in which additional
4429 helpers are defined (for cgroups, sockmaps, etc.).
4430
4431 • The bpftool utility can be used to probe the availability of helper
4432 functions on the system (as well as supported program and map types,
4433 and a number of other parameters). To do so, run bpftool feature
4434 probe (see bpftool-feature(8) for details). Add the unprivileged key‐
4435 word to list features available to unprivileged users.
4436
4437 Compatibility between helper functions and program types can generally
4438 be found in the files where helper functions are defined. Look for the
4439 struct bpf_func_proto objects and for functions returning them: these
4440 functions contain a list of helpers that a given program type can call.
4441 Note that the default: label of the switch ... case used to filter
4442 helpers can call other functions, themselves allowing access to addi‐
4443 tional helpers. The requirement for a GPL license is also recorded in
4444 those struct bpf_func_proto objects.
4445
4446 Compatibility between helper functions and map types can be found in
4447 the check_map_func_compatibility() function in file kernel/bpf/veri‐
4448 fier.c.
4449
4450 Helper functions that invalidate the checks on data and data_end point‐
4451 ers for network processing are listed in function
4452 bpf_helper_changes_pkt_data() in file net/core/filter.c.
4453
4455 bpf(2), bpftool(8), cgroups(7), ip(8), perf_event_open(2), sendmsg(2),
4456 socket(7), tc-bpf(8)
4457
4458
4459
4460
4461Linux v6.1 2022-09-26 BPF-HELPERS(7)