bpf-helpers(7)

BPF-HELPERS(7)                                                  BPF-HELPERS(7)

NAME

       BPF-HELPERS - list of eBPF helper functions

DESCRIPTION

       The extended Berkeley Packet Filter (eBPF) subsystem consists of
       programs written in a pseudo-assembly language, then attached to one
       of several kernel hooks and run in reaction to specific events. This
       framework differs from the older, "classic" BPF (or "cBPF") in several
       aspects, one of them being the ability to call special functions (or
       "helpers") from within a program. These functions are restricted to a
       white-list of helpers defined in the kernel.

       These helpers are used by eBPF programs to interact with the system, or
       with the context in which they work. For instance, they can be used to
       print debugging messages, to get the time since the system was booted,
       to interact with eBPF maps, or to manipulate network packets. Since
       there are several eBPF program types, and they do not all run in the
       same context, each program type can only call a subset of those
       helpers.

       Due to eBPF conventions, a helper cannot have more than five
       arguments.

       Internally, eBPF programs call directly into the compiled helper
       functions without requiring any foreign-function interface. As a
       result, calling helpers introduces no additional overhead, thus
       offering excellent performance.

       This document is an attempt to list and document the helpers available
       to eBPF developers. They are sorted in chronological order (the oldest
       helpers in the kernel at the top).

HELPERS

38       void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
39
40              Description
41                     Perform a lookup in map for an entry associated to key.
42
43              Return Map  value  associated  to  key,  or NULL if no entry was
44                     found.
45
46       long bpf_map_update_elem(struct bpf_map *map, const  void  *key,  const
47       void *value, u64 flags)
48
49              Description
50                     Add or update the value of the entry associated to key in
51                     map with value. flags is one of:
52
53                     BPF_NOEXIST
54                            The entry for key must not exist in the map.
55
56                     BPF_EXIST
57                            The entry for key must already exist in the map.
58
59                     BPF_ANY
60                            No condition on the existence  of  the  entry  for
61                            key.
62
                     Flag value BPF_NOEXIST cannot be used for maps of types
                     BPF_MAP_TYPE_ARRAY or BPF_MAP_TYPE_PERCPU_ARRAY (all
                     elements always exist); the helper would return an error.
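
                     The existence conditions attached to the three flags can
                     be sketched in plain user-space C. This is only a model
                     of the checks described above, not the kernel
                     implementation: struct toy_slot and toy_update() are
                     hypothetical names, the flag values are assumed to match
                     those in <linux/bpf.h>, and the error codes (-EEXIST,
                     -ENOENT) are the ones hash maps are understood to return.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values, assumed to mirror the kernel UAPI header <linux/bpf.h>. */
#define BPF_ANY     0ULL /* create a new entry or update an existing one */
#define BPF_NOEXIST 1ULL /* create a new entry only if it did not exist */
#define BPF_EXIST   2ULL /* update an existing entry only */

/* Hypothetical one-slot "map", used only to illustrate the flag checks. */
struct toy_slot {
	bool present;
	uint64_t value;
};

/* Model of the existence checks; not the kernel implementation. */
static long toy_update(struct toy_slot *slot, uint64_t value, uint64_t flags)
{
	if (flags > BPF_EXIST)
		return -EINVAL;         /* unknown flag */
	if (flags == BPF_NOEXIST && slot->present)
		return -EEXIST;         /* entry must not exist yet */
	if (flags == BPF_EXIST && !slot->present)
		return -ENOENT;         /* entry must already exist */
	slot->present = true;
	slot->value = value;
	return 0;
}
```

                     In this model an array map corresponds to a slot whose
                     present field is always true, which is why BPF_NOEXIST
                     can never succeed there.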
66
67              Return 0 on success, or a negative error in case of failure.
68
69       long bpf_map_delete_elem(struct bpf_map *map, const void *key)
70
71              Description
72                     Delete entry with key from map.
73
74              Return 0 on success, or a negative error in case of failure.
75
76       long bpf_probe_read(void *dst, u32 size, const void *unsafe_ptr)
77
78              Description
79                     For  tracing  programs, safely attempt to read size bytes
80                     from kernel space address unsafe_ptr and store  the  data
81                     in dst.
82
83                     Generally,       use       bpf_probe_read_user()       or
84                     bpf_probe_read_kernel() instead.
85
86              Return 0 on success, or a negative error in case of failure.
87
88       u64 bpf_ktime_get_ns(void)
89
90              Description
91                     Return the time elapsed since system  boot,  in  nanosec‐
92                     onds.   Does  not  include time the system was suspended.
93                     See: clock_gettime(CLOCK_MONOTONIC)
94
95              Return Current ktime.
96
97       long bpf_trace_printk(const char *fmt, u32 fmt_size, ...)
98
99              Description
                     This helper is a "printk()-like" facility for debugging.
                     It prints a message defined by format fmt (of size
                     fmt_size) to file /sys/kernel/debug/tracing/trace from
                     DebugFS, if available. It can take up to three additional
                     u64 arguments (as with any eBPF helper, the total number
                     of arguments is limited to five).

                     Each time the helper is called, it appends a line to the
                     trace. Lines are discarded while
                     /sys/kernel/debug/tracing/trace is open; use
                     /sys/kernel/debug/tracing/trace_pipe to avoid this. The
                     format of the trace is customizable, and the exact output
                     one will get depends on the options set in
                     /sys/kernel/debug/tracing/trace_options (see also the
                     README file under the same directory). However, it
                     usually defaults to something like:
115
116                        telnet-470   [001] .N.. 419421.045894: 0x00000001: <formatted msg>
117
118                     In the above:
119
120                        • telnet is the name of the current task.
121
122                        • 470 is the PID of the current task.
123
124                        • 001 is the CPU number on which the task is running.
125
126                        • In .N.., each character refers to a set  of  options
127                          (whether   irqs  are  enabled,  scheduling  options,
128                          whether hard/softirqs are  running,  level  of  pre‐
129                          empt_disabled    respectively).    N    means   that
130                          TIF_NEED_RESCHED and PREEMPT_NEED_RESCHED are set.
131
132                        • 419421.045894 is a timestamp.
133
134                        • 0x00000001 is a fake value used by BPF for  the  in‐
135                          struction pointer register.
136
137                        • <formatted msg> is the message formatted with fmt.
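
                     The field layout listed above can be checked with a
                     small user-space parser. This is a sketch that assumes
                     the default line format shown here; struct trace_line
                     and parse_trace_line() are hypothetical names, and the
                     parser would misread a task name that itself contains a
                     "-" character.

```c
#include <stdio.h>

/* Fields of one trace line; the struct and its names are hypothetical. */
struct trace_line {
	char comm[32];    /* name of the current task */
	int pid;          /* PID of the current task */
	int cpu;          /* CPU number */
	char flags[8];    /* option characters, e.g. ".N.." */
	double timestamp; /* seconds, with fractional part */
};

/* Returns 0 if all five leading fields were parsed, -1 otherwise. */
static int parse_trace_line(const char *line, struct trace_line *out)
{
	int n = sscanf(line, " %31[^-]-%d [%d] %7s %lf:",
		       out->comm, &out->pid, &out->cpu,
		       out->flags, &out->timestamp);

	return n == 5 ? 0 : -1;
}
```

                     Since the output format depends on trace_options, a
                     reader of trace_pipe should treat a parse failure as a
                     normal condition rather than as an error.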
138
                     The conversion specifiers supported by fmt are similar
                     to, but more limited than, those of printk(). They are
                     %d, %i, %u, %x, %ld, %li, %lu, %lx, %lld, %lli, %llu,
                     %llx, %p, %s. No modifier (size of field, padding with
                     zeroes, etc.) is available, and the helper will return
                     -EINVAL (but print nothing) if it encounters an unknown
                     specifier.
145
146                     Also, note that bpf_trace_printk() is  slow,  and  should
147                     only  be  used for debugging purposes. For this reason, a
148                     notice block (spanning several lines) is printed to  ker‐
149                     nel  logs  and  states that the helper should not be used
150                     "for production use" the first time this helper  is  used
151                     (or more precisely, when trace_printk() buffers are allo‐
152                     cated). For passing values to  user  space,  perf  events
153                     should be preferred.
154
155              Return The  number of bytes written to the buffer, or a negative
156                     error in case of failure.
157
158       u32 bpf_get_prandom_u32(void)
159
160              Description
161                     Get a pseudo-random number.
162
163                     From a security point of view, this helper uses  its  own
164                     pseudo-random internal state, and cannot be used to infer
165                     the seed of other random functions in  the  kernel.  How‐
166                     ever,  it is essential to note that the generator used by
167                     the helper is not cryptographically secure.
168
169              Return A random 32-bit unsigned value.
170
171       u32 bpf_get_smp_processor_id(void)
172
173              Description
174                     Get the SMP  (symmetric  multiprocessing)  processor  id.
175                     Note  that  all  programs  run  with preemption disabled,
176                     which means that the SMP processor id  is  stable  during
177                     all the execution of the program.
178
179              Return The SMP id of the processor running the program.
180
181       long  bpf_skb_store_bytes(struct  sk_buff  *skb, u32 offset, const void
182       *from, u32 len, u64 flags)
183
184              Description
185                     Store len bytes from address from into the packet associ‐
186                     ated  to  skb,  at  offset.  flags  are  a combination of
187                     BPF_F_RECOMPUTE_CSUM (automatically recompute the  check‐
188                     sum for the packet after storing the bytes) and BPF_F_IN‐
189                     VALIDATE_HASH (set skb->hash, skb->swhash and skb->l4hash
190                     to 0).
191
192                     A call to this helper is susceptible to change the under‐
193                     lying packet buffer. Therefore, at load time, all  checks
194                     on  pointers  previously done by the verifier are invali‐
195                     dated and must be performed again, if the helper is  used
196                     in combination with direct packet access.
197
198              Return 0 on success, or a negative error in case of failure.
199
200       long bpf_l3_csum_replace(struct sk_buff *skb, u32 offset, u64 from, u64
201       to, u64 size)
202
203              Description
204                     Recompute the layer 3 (e.g. IP) checksum for  the  packet
205                     associated  to  skb.  Computation  is incremental, so the
206                     helper must know the former value  of  the  header  field
207                     that  was  modified  (from),  the new value of this field
208                     (to), and the number of bytes (2 or 4)  for  this  field,
209                     stored  in  size.  Alternatively, it is possible to store
210                     the difference between the previous and the new values of
211                     the  header  field  in to, by setting from and size to 0.
212                     For both methods, offset indicates the location of the IP
213                     checksum within the packet.
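
                     To see why the former and new field values are enough,
                     the arithmetic can be reproduced in user space. The
                     sketch below follows the incremental update of RFC 1624
                     for a single 2-byte field and is not the kernel code;
                     fold(), csum_full() and csum_replace2() are hypothetical
                     names, and the 16-bit words are treated as plain values,
                     ignoring byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit sum into 16 bits with end-around carry. */
static uint16_t fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full Internet checksum over nwords 16-bit words. */
static uint16_t csum_full(const uint16_t *words, size_t nwords)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < nwords; i++)
		sum += words[i];
	return (uint16_t)~fold(sum);
}

/* Incremental update for one 16-bit field: ~(~check + ~from + to). */
static uint16_t csum_replace2(uint16_t check, uint16_t from, uint16_t to)
{
	uint32_t sum = (uint16_t)~check;

	sum += (uint16_t)~from;
	sum += to;
	return (uint16_t)~fold(sum);
}
```

                     For a header whose checksum was computed with
                     csum_full(), changing one word and calling
                     csum_replace2() with the old and new values is expected
                     to give the same result as recomputing the checksum over
                     the whole header.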
214
215                     This  helper  works  in combination with bpf_csum_diff(),
216                     which does not update the checksum in-place,  but  offers
217                     more  flexibility and can handle sizes larger than 2 or 4
218                     for the checksum to update.
219
220                     A call to this helper is susceptible to change the under‐
221                     lying  packet buffer. Therefore, at load time, all checks
222                     on pointers previously done by the verifier  are  invali‐
223                     dated  and must be performed again, if the helper is used
224                     in combination with direct packet access.
225
226              Return 0 on success, or a negative error in case of failure.
227
228       long bpf_l4_csum_replace(struct sk_buff *skb, u32 offset, u64 from, u64
229       to, u64 flags)
230
231              Description
                     Recompute the layer 4 (e.g. TCP, UDP or ICMP) checksum
                     for the packet associated to skb. Computation is
                     incremental, so the helper must know the former value of
                     the header field that was modified (from), the new value
                     of this field (to), and the number of bytes (2 or 4) for
                     this field, stored in the lowest four bits of flags.
                     Alternatively, it is possible to store the difference
                     between the previous and the new values of the header
                     field in to, by setting from and the four lowest bits of
                     flags to 0. For both methods, offset indicates the
                     location of the layer 4 checksum within the packet. In
                     addition to the size of the field, actual flags can be
                     added (bitwise OR) to flags. With BPF_F_MARK_MANGLED_0, a
                     null checksum is left untouched (unless
                     BPF_F_MARK_ENFORCE is added as well), and for updates
                     resulting in a null checksum the value is set to
                     CSUM_MANGLED_0 instead. Flag BPF_F_PSEUDO_HDR indicates
                     the checksum is to be computed against a pseudo-header.
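
                     Because flags carries both the field size and the flag
                     bits, building and decoding it is a matter of masking.
                     The following user-space sketch assumes the constant
                     values match those in <linux/bpf.h>; l4_csum_flags() and
                     l4_csum_field_size() are hypothetical names introduced
                     for illustration.

```c
#include <stdint.h>

/* Values assumed to mirror the kernel UAPI header <linux/bpf.h>. */
#define BPF_F_HDR_FIELD_MASK 0xfULL
#define BPF_F_PSEUDO_HDR     (1ULL << 4)
#define BPF_F_MARK_MANGLED_0 (1ULL << 5)
#define BPF_F_MARK_ENFORCE   (1ULL << 6)

/* Hypothetical helper: combine the field size (0, 2 or 4) with flag bits. */
static uint64_t l4_csum_flags(uint64_t field_size, uint64_t extra)
{
	return (field_size & BPF_F_HDR_FIELD_MASK) | extra;
}

/* Hypothetical helper: recover the field size from a flags value. */
static uint64_t l4_csum_field_size(uint64_t flags)
{
	return flags & BPF_F_HDR_FIELD_MASK;
}
```

                     For instance, rewriting a 4-byte IPv4 address that is
                     covered by the TCP pseudo-header would correspond to
                     l4_csum_flags(4, BPF_F_PSEUDO_HDR).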
250
251                     This helper works in  combination  with  bpf_csum_diff(),
252                     which  does  not update the checksum in-place, but offers
253                     more flexibility and can handle sizes larger than 2 or  4
254                     for the checksum to update.
255
256                     A call to this helper is susceptible to change the under‐
257                     lying packet buffer. Therefore, at load time, all  checks
258                     on  pointers  previously done by the verifier are invali‐
259                     dated and must be performed again, if the helper is  used
260                     in combination with direct packet access.
261
262              Return 0 on success, or a negative error in case of failure.
263
264       long  bpf_tail_call(void  *ctx, struct bpf_map *prog_array_map, u32 in‐
265       dex)
266
267              Description
268                     This special helper is used to trigger a "tail call",  or
269                     in  other  words,  to jump into another eBPF program. The
270                     same stack frame is used (but values on stack and in reg‐
271                     isters  for the caller are not accessible to the callee).
272                     This mechanism allows for program  chaining,  either  for
273                     raising  the  maximum  number  of available eBPF instruc‐
274                     tions,  or  to  execute  given  programs  in  conditional
275                     blocks.  For security reasons, there is an upper limit to
276                     the number of successive tail  calls  that  can  be  per‐
277                     formed.
278
279                     Upon  call  of  this helper, the program attempts to jump
280                     into a program referenced  at  index  index  in  prog_ar‐
281                     ray_map,  a  special map of type BPF_MAP_TYPE_PROG_ARRAY,
282                     and passes ctx, a pointer to the context.
283
                     If the call succeeds, the kernel immediately runs the
                     first instruction of the new program. This is not a
                     function call, and it never returns to the previous
                     program. If the call fails, then the helper has no
                     effect, and the caller continues to run its subsequent
                     instructions. A call can fail if the destination program
                     for the jump does not exist (for example, index is not
                     lower than the number of entries in prog_array_map), or
                     if the maximum number of tail calls has been reached for
                     this chain of programs. This limit is defined in the
                     kernel by the macro MAX_TAIL_CALL_CNT (not accessible to
                     user space), which is currently set to 32.
296
297              Return 0 on success, or a negative error in case of failure.
298
299       long bpf_clone_redirect(struct sk_buff *skb, u32 ifindex, u64 flags)
300
301              Description
302                     Clone  and  redirect  the packet associated to skb to an‐
303                     other net device  of  index  ifindex.  Both  ingress  and
304                     egress  interfaces  can  be  used  for  redirection.  The
305                     BPF_F_INGRESS value in flags is used to make the distinc‐
306                     tion  (ingress  path  is selected if the flag is present,
307                     egress path otherwise).  This is the only flag  supported
308                     for now.
309
310                     In comparison with bpf_redirect() helper, bpf_clone_redi‐
311                     rect() has the associated cost of duplicating the  packet
312                     buffer, but this can be executed out of the eBPF program.
313                     Conversely, bpf_redirect() is more efficient, but  it  is
314                     handled through an action code where the redirection hap‐
315                     pens only after the eBPF program has returned.
316
317                     A call to this helper is susceptible to change the under‐
318                     lying  packet buffer. Therefore, at load time, all checks
319                     on pointers previously done by the verifier  are  invali‐
320                     dated  and must be performed again, if the helper is used
321                     in combination with direct packet access.
322
323              Return 0 on success, or a negative error in case of failure.
324
325       u64 bpf_get_current_pid_tgid(void)
326
              Return A 64-bit integer containing the current tgid and pid, and
                     created as such:
                     current_task->tgid << 32 | current_task->pid.
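
                     Splitting the returned value back into its two halves is
                     a shift and a truncation. The sketch below is plain
                     user-space C; tgid_of() and pid_of() are hypothetical
                     names. Note that the kernel tgid is what user space
                     calls the process ID, while the kernel pid is what user
                     space calls the thread ID.

```c
#include <stdint.h>

/* Upper 32 bits: thread group ID (the process ID seen from user space). */
static uint32_t tgid_of(uint64_t pid_tgid)
{
	return (uint32_t)(pid_tgid >> 32);
}

/* Lower 32 bits: kernel pid (the thread ID seen from user space). */
static uint32_t pid_of(uint64_t pid_tgid)
{
	return (uint32_t)pid_tgid;
}
```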
330
331       u64 bpf_get_current_uid_gid(void)
332
333              Return A 64-bit integer containing the current GID and UID,  and
334                     created as such: current_gid << 32 | current_uid.
335
336       long bpf_get_current_comm(void *buf, u32 size_of_buf)
337
338              Description
339                     Copy  the  comm attribute of the current task into buf of
340                     size_of_buf. The comm attribute contains the name of  the
341                     executable (excluding the path) for the current task. The
342                     size_of_buf must be strictly positive.  On  success,  the
343                     helper  makes  sure  that  the  buf is NUL-terminated. On
344                     failure, it is filled with zeroes.
345
346              Return 0 on success, or a negative error in case of failure.
347
348       u32 bpf_get_cgroup_classid(struct sk_buff *skb)
349
350              Description
351                     Retrieve the classid for the current task, i.e.  for  the
352                     net_cls cgroup to which skb belongs.
353
354                     This  helper  can  be  used on TC egress path, but not on
355                     ingress.
356
357                     The net_cls cgroup provides an interface to  tag  network
358                     packets based on a user-provided identifier for all traf‐
359                     fic coming  from  the  tasks  belonging  to  the  related
360                     cgroup. See also the related kernel documentation, avail‐
361                     able from the Linux  sources  in  file  Documentation/ad‐
362                     min-guide/cgroup-v1/net_cls.rst.
363
364                     The  Linux kernel has two versions for cgroups: there are
365                     cgroups v1 and cgroups v2. Both are available  to  users,
366                     who  can use a mixture of them, but note that the net_cls
367                     cgroup is for cgroup v1 only. This makes it  incompatible
368                     with   BPF   programs   run   on   cgroups,  which  is  a
369                     cgroup-v2-only feature (a socket can only hold  data  for
370                     one version of cgroups at a time).
371
                     This helper is only available if the kernel was compiled
                     with the CONFIG_CGROUP_NET_CLASSID configuration option
                     set to "y" or to "m".
375
376              Return The classid, or 0 for the default unconfigured classid.
377
378       long  bpf_skb_vlan_push(struct  sk_buff  *skb,  __be16  vlan_proto, u16
379       vlan_tci)
380
381              Description
382                     Push a vlan_tci (VLAN tag control information) of  proto‐
383                     col  vlan_proto to the packet associated to skb, then up‐
384                     date the checksum. Note that if vlan_proto  is  different
385                     from ETH_P_8021Q and ETH_P_8021AD, it is considered to be
386                     ETH_P_8021Q.
387
388                     A call to this helper is susceptible to change the under‐
389                     lying  packet buffer. Therefore, at load time, all checks
390                     on pointers previously done by the verifier  are  invali‐
391                     dated  and must be performed again, if the helper is used
392                     in combination with direct packet access.
393
394              Return 0 on success, or a negative error in case of failure.
395
396       long bpf_skb_vlan_pop(struct sk_buff *skb)
397
398              Description
399                     Pop a VLAN header from the packet associated to skb.
400
401                     A call to this helper is susceptible to change the under‐
402                     lying  packet buffer. Therefore, at load time, all checks
403                     on pointers previously done by the verifier  are  invali‐
404                     dated  and must be performed again, if the helper is used
405                     in combination with direct packet access.
406
407              Return 0 on success, or a negative error in case of failure.
408
409       long bpf_skb_get_tunnel_key(struct sk_buff *skb, struct  bpf_tunnel_key
410       *key, u32 size, u64 flags)
411
412              Description
413                     Get  tunnel  metadata. This helper takes a pointer key to
414                     an empty struct bpf_tunnel_key  of  size,  that  will  be
415                     filled  with tunnel metadata for the packet associated to
416                     skb.  The flags can be set to  BPF_F_TUNINFO_IPV6,  which
417                     indicates  that  the tunnel is based on IPv6 protocol in‐
418                     stead of IPv4.
419
420                     The struct bpf_tunnel_key is an object  that  generalizes
421                     the principal parameters used by various tunneling proto‐
422                     cols into a single struct. This way, it can  be  used  to
423                     easily  make  a decision based on the contents of the en‐
424                     capsulation header, "summarized" in this struct. In  par‐
425                     ticular,  it holds the IP address of the remote end (IPv4
426                     or IPv6, depending on the case)  in  key->remote_ipv4  or
427                     key->remote_ipv6. Also, this struct exposes the key->tun‐
428                     nel_id, which is generally mapped to a VNI (Virtual  Net‐
429                     work  Identifier),  making  it programmable together with
430                     the bpf_skb_set_tunnel_key() helper.
431
432                     Let's imagine that the following code is part of  a  pro‐
433                     gram  attached to the TC ingress interface, on one end of
434                     a GRE tunnel, and is supposed to filter out all  messages
435                     coming  from  remote  ends  with  IPv4 address other than
436                     10.0.0.1:
437
438                        int ret;
439                        struct bpf_tunnel_key key = {};
440
441                        ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
442                        if (ret < 0)
443                                return TC_ACT_SHOT;     // drop packet
444
445                        if (key.remote_ipv4 != 0x0a000001)
446                                return TC_ACT_SHOT;     // drop packet
447
448                        return TC_ACT_OK;               // accept packet
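
                     The constant 0x0a000001 in the snippet above is the
                     address 10.0.0.1 written as a 32-bit value with the
                     first octet in the most significant byte, which assumes
                     that key.remote_ipv4 is held in host byte order, as the
                     comparison implies. A minimal sketch of that conversion,
                     with the hypothetical name ipv4_value():

```c
#include <stdint.h>

/* Hypothetical helper: pack a.b.c.d with "a" in the most significant byte. */
static uint32_t ipv4_value(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}
```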
449
450                     This interface can also be used  with  all  encapsulation
451                     devices  that can operate in "collect metadata" mode: in‐
452                     stead of having one network device per specific  configu‐
453                     ration,  the "collect metadata" mode only requires a sin‐
454                     gle device where the configuration can be extracted  from
455                     this helper.
456
                     This can be used together with various tunnels such as
                     VXLAN, Geneve, GRE or IP in IP (IPIP).
459
460              Return 0 on success, or a negative error in case of failure.
461
462       long bpf_skb_set_tunnel_key(struct sk_buff *skb, struct  bpf_tunnel_key
463       *key, u32 size, u64 flags)
464
465              Description
466                     Populate  tunnel  metadata  for packet associated to skb.
467                     The tunnel metadata is set to the  contents  of  key,  of
468                     size.  The  flags can be set to a combination of the fol‐
469                     lowing values:
470
471                     BPF_F_TUNINFO_IPV6
472                            Indicate that the tunnel is based on IPv6 protocol
473                            instead of IPv4.
474
475                     BPF_F_ZERO_CSUM_TX
476                            For  IPv4  packets,  add a flag to tunnel metadata
477                            indicating that  checksum  computation  should  be
478                            skipped and checksum set to zeroes.
479
480                     BPF_F_DONT_FRAGMENT
481                            Add  a flag to tunnel metadata indicating that the
482                            packet should not be fragmented.
483
484                     BPF_F_SEQ_NUMBER
485                            Add a flag to tunnel metadata  indicating  that  a
486                            sequence  number  should be added to tunnel header
487                            before sending the packet. This flag was added for
488                            GRE  encapsulation,  but  might be used with other
489                            protocols as well in the future.
490
491                     Here is a typical usage on the transmit path:
492
                        struct bpf_tunnel_key key = {};

                        /* populate key ... */
                        bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0);
                        bpf_clone_redirect(skb, vxlan_dev_ifindex, 0);
497
498                     See also the description of the  bpf_skb_get_tunnel_key()
499                     helper for additional information.
500
501              Return 0 on success, or a negative error in case of failure.
502
503       u64 bpf_perf_event_read(struct bpf_map *map, u64 flags)
504
505              Description
506                     Read  the  value of a perf event counter. This helper re‐
507                     lies on a map of type BPF_MAP_TYPE_PERF_EVENT_ARRAY.  The
508                     nature  of the perf event counter is selected when map is
509                     updated with perf event file descriptors. The map  is  an
510                     array  whose  size  is  the number of available CPUs, and
511                     each cell contains a value relative to one CPU. The value
512                     to  retrieve is indicated by flags, that contains the in‐
513                     dex of the CPU to look up, masked with  BPF_F_INDEX_MASK.
514                     Alternatively,  flags  can be set to BPF_F_CURRENT_CPU to
515                     indicate that the value for the current CPU should be re‐
516                     trieved.
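
                     The two ways of setting flags can be sketched as
                     follows. The constant values are assumed to match those
                     in <linux/bpf.h>; perf_flags_for_cpu() and
                     perf_flags_is_current_cpu() are hypothetical names
                     introduced for illustration.

```c
#include <stdint.h>

/* Values assumed to mirror the kernel UAPI header <linux/bpf.h>. */
#define BPF_F_INDEX_MASK  0xffffffffULL
#define BPF_F_CURRENT_CPU BPF_F_INDEX_MASK

/* Hypothetical helper: flags value selecting the map entry of one CPU. */
static uint64_t perf_flags_for_cpu(uint32_t cpu)
{
	return (uint64_t)cpu & BPF_F_INDEX_MASK;
}

/* Hypothetical helper: does this flags value ask for the current CPU? */
static int perf_flags_is_current_cpu(uint64_t flags)
{
	return (flags & BPF_F_INDEX_MASK) == BPF_F_CURRENT_CPU;
}
```

                     With these definitions, BPF_F_CURRENT_CPU is simply the
                     index value with all mask bits set, so it cannot collide
                     with a real CPU index.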
517
                     Note that before Linux 4.13, only hardware perf events
                     could be retrieved.
520
521                     Also,    be    aware    that     the     newer     helper
522                     bpf_perf_event_read_value()     is    recommended    over
523                     bpf_perf_event_read() in general. The latter has some ABI
524                     quirks where error and counter value are used as a return
525                     code (which is wrong to do  since  ranges  may  overlap).
526                     This  issue  is  fixed  with bpf_perf_event_read_value(),
527                     which at the same time provides more  features  over  the
528                     bpf_perf_event_read()  interface. Please refer to the de‐
529                     scription of bpf_perf_event_read_value() for details.
530
531              Return The value of the perf event counter read from the map, or
532                     a negative error code in case of failure.
533
534       long bpf_redirect(u32 ifindex, u64 flags)
535
536              Description
537                     Redirect  the  packet  to  another  net  device  of index
538                     ifindex.    This   helper   is   somewhat   similar    to
539                     bpf_clone_redirect(),  except  that  the  packet  is  not
540                     cloned, which provides increased performance.
541
542                     Except for XDP, both ingress and egress interfaces can be
543                     used for redirection. The BPF_F_INGRESS value in flags is
544                     used to make the distinction (ingress path is selected if
545                     the  flag  is present, egress path otherwise). Currently,
546                     XDP only supports redirection to  the  egress  interface,
547                     and accepts no flag at all.
548
549                     The  same  effect  can  also  be  attained  with the more
550                     generic bpf_redirect_map(), which uses a BPF map to store
551                     the  redirect  target instead of providing it directly to
552                     the helper.
553
554              Return For XDP, the helper returns XDP_REDIRECT  on  success  or
555                     XDP_ABORTED on error. For other program types, the values
556                     are TC_ACT_REDIRECT on success or TC_ACT_SHOT on error.
557
558       u32 bpf_get_route_realm(struct sk_buff *skb)
559
560              Description
                     Retrieve the realm of the route, that is to say the
                     tclassid field of the destination for the skb. The
                     identifier retrieved is a user-provided tag, similar to
                     the one used with the net_cls cgroup (see description for
                     bpf_get_cgroup_classid() helper), but here this tag is
                     held by a route (a destination entry), not by a task.
567
568                     Retrieving  this  identifier  works  with  the  clsact TC
569                     egress hook (see also  tc-bpf(8)),  or  alternatively  on
570                     conventional  classful  egress  qdiscs,  but  not  on  TC
571                     ingress path. In case of clsact TC egress hook, this  has
572                     the advantage that, internally, the destination entry has
573                     not been dropped yet in the transmit path. Therefore, the
574                     destination  entry  does not need to be artificially held
575                     via netif_keep_dst() for a classful qdisc until  the  skb
576                     is freed.
577
578                     This  helper is available only if the kernel was compiled
579                     with CONFIG_IP_ROUTE_CLASSID configuration option.
580
581              Return The realm of the route for the packet associated to  skb,
582                     or 0 if none was found.
583
584       long  bpf_perf_event_output(void  *ctx, struct bpf_map *map, u64 flags,
585       void *data, u64 size)
586
587              Description
588                     Write raw data blob into a special BPF perf event held by
589                     map  of  type  BPF_MAP_TYPE_PERF_EVENT_ARRAY.  This  perf
590                     event must have the following attributes: PERF_SAMPLE_RAW
591                     as   sample_type,   PERF_TYPE_SOFTWARE   as   type,   and
592                     PERF_COUNT_SW_BPF_OUTPUT as config.
593
594                     The flags are used to indicate the index in map for which
595                     the value must be put, masked with BPF_F_INDEX_MASK.  Al‐
596                     ternatively, flags can be set to BPF_F_CURRENT_CPU to in‐
597                     dicate  that  the index of the current CPU core should be
598                     used.
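
                        The two ways of choosing the target slot can be sketched
                        in plain C. This is a minimal user-space illustration,
                        not kernel code: the SKETCH_ constants are local copies
                        of values believed to match linux/bpf.h (check your own
                        headers), and the two function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies for illustration; assumed to mirror linux/bpf.h. */
#define SKETCH_BPF_F_INDEX_MASK  0xffffffffULL
#define SKETCH_BPF_F_CURRENT_CPU SKETCH_BPF_F_INDEX_MASK

/* Build a flags value that selects an explicit index in the map. */
uint64_t perf_output_flags_for_index(uint32_t index)
{
    /* The all-ones index is reserved: it means "current CPU". */
    assert(index != (uint32_t)SKETCH_BPF_F_CURRENT_CPU);
    return (uint64_t)index & SKETCH_BPF_F_INDEX_MASK;
}

/* Build a flags value that selects the slot of the current CPU. */
uint64_t perf_output_flags_current_cpu(void)
{
    return SKETCH_BPF_F_CURRENT_CPU;
}
```

                        In an eBPF program, the resulting value would be passed
                        as the flags argument of bpf_perf_event_output().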
599
600                     The value to write, of length size, is passed through the
601                     eBPF stack and pointed to by data.
602
603                     The context of the program, ctx, must also be  passed  to
604                     the helper.
605
606                     In user space, a program wishing to read the values  needs
607                     to  call  perf_event_open() on the perf event (either for
608                     one or for all CPUs) and to  store  the  file  descriptor
609                     into  the  map. This must be done before the eBPF program
610                     can send data into it. An example is  available  in  file
611                     samples/bpf/trace_output_user.c   in   the  Linux  kernel
612                     source tree (the eBPF  program  counterpart  is  in  sam‐
613                     ples/bpf/trace_output_kern.c).
614
615                     bpf_perf_event_output()  achieves better performance than
616                     bpf_trace_printk() for sharing data with user space,  and
617                     is  much better suited for streaming data from eBPF  pro‐
618                     grams.
619
620                     Note that this helper is not restricted  to  tracing  use
621                     cases and can be used with programs attached to TC or XDP
622                     as well, where it allows for passing data to  user  space
623                     listeners. Data can be:
624
625                     • Only custom structs,
626
627                     • Only the packet payload, or
628
629                     • A combination of both.
630
631              Return 0 on success, or a negative error in case of failure.
632
633       long bpf_skb_load_bytes(const void *skb, u32 offset, void *to, u32 len)
634
635              Description
636                     This helper was provided as an easy way to load data from
637                     a packet. It can be used to load len  bytes  from  offset
638                     from  the  packet  associated  to  skb,  into  the buffer
639                     pointed by to.
640
641                     Since Linux 4.7, usage of this helper has mostly been re‐
642                     placed by "direct packet access", enabling packet data to
643                     be manipulated with skb->data and skb->data_end  pointing
644                     respectively  to the first byte of packet data and to the
645                     byte after the last byte of packet data. However, it  re‐
646                     mains  useful  if  one wishes to read large quantities of
647                     data at once from a packet into the eBPF stack.
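
                        The bounds reasoning behind direct packet access can be
                        modeled in user space. The sketch below is an assumption
                        made for illustration, not the kernel implementation of
                        bpf_skb_load_bytes(): struct pkt_view and pkt_load_bytes()
                        are hypothetical names, with data and data_end playing
                        the role of skb->data and skb->data_end.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the packet window seen by a program. */
struct pkt_view {
    const uint8_t *data;     /* first byte of packet data */
    const uint8_t *data_end; /* one past the last byte of packet data */
};

/* Copy len bytes starting at offset into to; return -1 if out of bounds. */
int pkt_load_bytes(const struct pkt_view *pkt, size_t offset,
                   void *to, size_t len)
{
    size_t avail = (size_t)(pkt->data_end - pkt->data);

    /* Written so that offset + len cannot overflow. */
    if (offset > avail || len > avail - offset)
        return -1;
    memcpy(to, pkt->data + offset, len);
    return 0;
}
```

                        The same comparison against data_end is what a program
                        using direct packet access has to perform before each
                        read.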
648
649              Return 0 on success, or a negative error in case of failure.
650
651       long bpf_get_stackid(void *ctx, struct bpf_map *map, u64 flags)
652
653              Description
654                     Walk a user or a kernel  stack  and  return  its  id.  To
655                     achieve this, the helper needs ctx, which is a pointer to
656                     the context on which the tracing program is executed, and
657                     a pointer to a map of type BPF_MAP_TYPE_STACK_TRACE.
658
659                     The  last  argument,  flags,  holds  the  number of stack
660                     frames  to  skip   (from   0   to   255),   masked   with
661                     BPF_F_SKIP_FIELD_MASK. The next bits can be used to set a
662                     combination of the following flags:
663
664                     BPF_F_USER_STACK
665                            Collect a user space stack  instead  of  a  kernel
666                            stack.
667
668                     BPF_F_FAST_STACK_CMP
669                            Compare stacks by hash only.
670
671                     BPF_F_REUSE_STACKID
672                            If   two  different  stacks  hash  into  the  same
673                            stackid, discard the old one.
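
                        How the skip count and the option bits share the flags
                        argument can be sketched as follows. This is a user-space
                        illustration only: the SKETCH_ constants are local copies
                        of values believed to match linux/bpf.h, and
                        stackid_flags() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies for illustration; assumed to mirror linux/bpf.h. */
#define SKETCH_BPF_F_SKIP_FIELD_MASK 0xffULL
#define SKETCH_BPF_F_USER_STACK      (1ULL << 8)
#define SKETCH_BPF_F_FAST_STACK_CMP  (1ULL << 9)
#define SKETCH_BPF_F_REUSE_STACKID   (1ULL << 10)

/* Combine a frame skip count (0 to 255) with option bits. */
uint64_t stackid_flags(unsigned int skip_frames, uint64_t option_bits)
{
    assert(skip_frames <= 255);
    /* Option bits must not overlap the skip field. */
    assert((option_bits & SKETCH_BPF_F_SKIP_FIELD_MASK) == 0);
    return ((uint64_t)skip_frames & SKETCH_BPF_F_SKIP_FIELD_MASK) | option_bits;
}
```

                        For instance, skipping two frames of a user space stack
                        while reusing colliding ids would combine the value 2
                        with BPF_F_USER_STACK and BPF_F_REUSE_STACKID.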
674
675                     The  stack  id  retrieved  is  a  32-bit  integer  handle
676                     which  can be further combined with other data (including
677                     other stack ids) and used as a key into maps. This can be
678                     useful  for generating a variety of graphs (such as flame
679                     graphs or off-cpu graphs).
680
681                     For walking a stack, this helper is an  improvement  over
682                     bpf_probe_read(),  which  can be used with unrolled loops
683                     but is not efficient and consumes a lot of eBPF  instruc‐
684                     tions.   Instead,  bpf_get_stackid()  can  collect  up to
685                     PERF_MAX_STACK_DEPTH both kernel and  user  frames.  Note
686                     that  this  limit  can be controlled with the sysctl pro‐
687                     gram, and that it should be manually increased  in  order
688                     to profile long user stacks (such as stacks for Java pro‐
689                     grams). To do so, use:
690
691                        # sysctl kernel.perf_event_max_stack=<new value>
692
693              Return The non-negative stack id on success, or  a  negative
694                     error in case of failure.
695
696       s64 bpf_csum_diff(__be32 *from, u32 from_size, __be32 *to, u32 to_size,
697       __wsum seed)
698
699              Description
700                     Compute  a  checksum  difference,  from  the  raw  buffer
701                     pointed by from, of length from_size (that must be a mul‐
702                     tiple of 4), towards the raw buffer  pointed  by  to,  of
703                     size to_size (same remark). An optional seed can be added
704                     to the value (this can be cascaded,  the  seed  may  come
705                     from a previous call to the helper).
706
707                     This is flexible enough to be used in several ways:
708
709                     • With from_size == 0, to_size > 0 and seed set to check‐
710                       sum, it can be used when pushing new data.
711
712                     • With from_size > 0, to_size == 0 and seed set to check‐
713                       sum, it can be used when removing data from a packet.
714
715                     • With  from_size  > 0, to_size > 0 and seed set to 0, it
716                       can be used to compute a diff. Note that from_size  and
717                       to_size do not need to be equal.
718
719                     This   helper   can   be   used   in   combination   with
720                     bpf_l3_csum_replace() and bpf_l4_csum_replace(), to which
721                     one   can   feed   in   the   difference   computed  with
722                     bpf_csum_diff().
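
                        The arithmetic behind a checksum difference can be
                        illustrated in user space. The sketch below is a model
                        written for this purpose, not the kernel implementation:
                        all function names are hypothetical, sizes are counted in
                        32-bit words instead of bytes, and the 32-bit value it
                        produces is only guaranteed to agree with other
                        implementations after folding to 16 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones' complement addition of two 32-bit sums (end-around carry). */
uint32_t csum_add32(uint32_t a, uint32_t b)
{
    uint32_t s = a + b;
    return s + (s < b);
}

/* Fold a 32-bit sum down to the 16 bits used by Internet checksums. */
uint16_t csum_fold16(uint32_t sum)
{
    sum = (sum & 0xffffu) + (sum >> 16);
    sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * Model of the difference: remove every word of from[] (by adding its
 * complement) and add every word of to[], starting from seed.
 */
uint32_t model_csum_diff(const uint32_t *from, size_t from_words,
                         const uint32_t *to, size_t to_words, uint32_t seed)
{
    uint32_t acc = seed;
    size_t i;

    assert(from != NULL || from_words == 0);
    assert(to != NULL || to_words == 0);
    for (i = 0; i < from_words; i++)
        acc = csum_add32(acc, (uint32_t)~from[i]);
    for (i = 0; i < to_words; i++)
        acc = csum_add32(acc, to[i]);
    return acc;
}

/* Returns 1 if patching a sum with a diff matches a full recomputation. */
int incremental_update_matches(void)
{
    const uint32_t old_pkt[3] = { 0x45000054u, 0x0a000001u, 0x0a000002u };
    const uint32_t new_pkt[3] = { 0x45000054u, 0xc0a80001u, 0x0a000002u };
    uint32_t full_old = model_csum_diff(NULL, 0, old_pkt, 3, 0);
    uint32_t full_new = model_csum_diff(NULL, 0, new_pkt, 3, 0);
    /* Only the middle word changed, so diff just that word. */
    uint32_t diff = model_csum_diff(&old_pkt[1], 1, &new_pkt[1], 1, 0);

    return csum_fold16(csum_add32(full_old, diff)) == csum_fold16(full_new);
}
```

                        The last function mirrors the third use listed above:
                        only the changed word is summed, and the result is
                        applied to the existing sum instead of recomputing it.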
723
724              Return The checksum result, or a negative error code in case  of
725                     failure.
726
727       long bpf_skb_get_tunnel_opt(struct sk_buff *skb, void *opt, u32 size)
728
729              Description
730                     Retrieve  tunnel  options metadata for the packet associ‐
731                     ated to skb, and store the raw tunnel option data to  the
732                     buffer opt of size.
733
734                     This  helper  can be used with encapsulation devices that
735                     can operate in "collect metadata" mode (please  refer  to
736                     the  related  note in the description of bpf_skb_get_tun‐
737                     nel_key() for more details). A particular  example  where
738                     this can be used is in combination with the Geneve encap‐
739                     sulation protocol, where  it  allows  for  pushing  (with
740                     bpf_skb_set_tunnel_opt() helper) and retrieving arbitrary
741                     TLVs (Type-Length-Value headers) from the  eBPF  program.
742                     This allows for full customization of these headers.
743
744              Return The size of the option data retrieved.
745
746       long bpf_skb_set_tunnel_opt(struct sk_buff *skb, void *opt, u32 size)
747
748              Description
749                     Set  tunnel options metadata for the packet associated to
750                     skb to the option data contained in the raw buffer opt of
751                     size.
752
753                     See  also the description of the bpf_skb_get_tunnel_opt()
754                     helper for additional information.
755
756              Return 0 on success, or a negative error in case of failure.
757
758       long bpf_skb_change_proto(struct sk_buff *skb, __be16 proto, u64 flags)
759
760              Description
761                     Change the protocol of the skb to proto.  Currently  sup‐
762                     ported are transition from IPv4 to IPv6, and from IPv6 to
763                     IPv4. The helper takes care of  the  groundwork  for  the
764                     transition,  including  resizing  the  socket buffer. The
765                     eBPF program is expected to fill the new headers, if any,
766                     via bpf_skb_store_bytes() and to recompute checksums with
767                     bpf_l3_csum_replace() and bpf_l4_csum_replace(). The main
768                     case  for  this helper is to perform NAT64 operations out
769                     of an eBPF program.
770
771                     Internally, the GSO type is marked as dodgy so that head‐
772                     ers  are  checked  and  segments  are recalculated by the
773                     GSO/GRO engine.  The size for GSO target  is  adapted  as
774                     well.
775
776                     All  values  for flags are reserved for future usage, and
777                     must be left at zero.
778
779                     A call to this helper is susceptible to change the under‐
780                     lying  packet buffer. Therefore, at load time, all checks
781                     on pointers previously done by the verifier  are  invali‐
782                     dated  and must be performed again, if the helper is used
783                     in combination with direct packet access.
784
785              Return 0 on success, or a negative error in case of failure.
786
787       long bpf_skb_change_type(struct sk_buff *skb, u32 type)
788
789              Description
790                     Change the packet type for the packet associated to  skb.
791                     This  comes down to setting skb->pkt_type to type, except
792                     that the eBPF program has no write access to skb->pkt_type
793                     other than through this helper.  Using  a  helper  here  al‐
794                     lows for graceful handling of errors.
795
796                     The  major  use  case  is  to  change  incoming  skbs  to
797                     PACKET_HOST in a programmatic way instead  of  having  to
798                     recirculate  via  bpf_redirect(...,  BPF_F_INGRESS),  for
799                     example.
800
801                     Note  that type only allows certain values. At this time,
802                     they are:
803
804                     PACKET_HOST
805                            Packet is for us.
806
807                     PACKET_BROADCAST
808                            Send packet to all.
809
810                     PACKET_MULTICAST
811                            Send packet to group.
812
813                     PACKET_OTHERHOST
814                            Send packet to someone else.
815
816              Return 0 on success, or a negative error in case of failure.
817
818       long bpf_skb_under_cgroup(struct sk_buff *skb, struct bpf_map *map, u32
819       index)
820
821              Description
822                     Check  whether skb is a descendant of the cgroup2 held by
823                     map of type BPF_MAP_TYPE_CGROUP_ARRAY, at index.
824
825              Return The return value depends on the result of the  test,  and
826                     can be:
827
828                     • 0, if the skb failed the cgroup2 descendant test.
829
830                     • 1, if the skb passed the cgroup2 descendant test.
831
832                     • A negative error code, if an error occurred.
833
834       u32 bpf_get_hash_recalc(struct sk_buff *skb)
835
836              Description
837                     Retrieve  the hash of the packet, skb->hash. If it is not
838                     set, in particular if the hash was cleared  due  to  man‐
839                     gling,  recompute  this  hash. Later accesses to the hash
840                     can be done directly with skb->hash.
841
842                     Calling  bpf_set_hash_invalid(),  changing   the   packet
843                     protocol   with   bpf_skb_change_proto(),   or    calling
844                     bpf_skb_store_bytes() with the BPF_F_INVALIDATE_HASH flag
845                     are  actions  that  may  clear  the  hash  and  trigger a
846                     new computation on the next call to
847                     bpf_get_hash_recalc().
848
849              Return The 32-bit hash.
850
851       u64 bpf_get_current_task(void)
852
853              Return A pointer to the current task struct.
854
855       long bpf_probe_write_user(void *dst, const void *src, u32 len)
856
857              Description
858                     Attempt  in a safe way to write len bytes from the buffer
859                     src to dst in memory. It only works for threads that  are
860                     in  user  context, and dst must be a valid user space ad‐
861                     dress.
862
863                     This helper should not be used to implement any  kind  of
864                     security mechanism because of TOC-TOU attacks, but rather
865                     to debug, divert, and manipulate execution of  semi-coop‐
866                     erative processes.
867
868                     Keep  in mind that this feature is meant for experiments,
869                     and it has a risk of crashing the system and running pro‐
870                     grams.  Therefore, when an eBPF program using this helper
871                     is attached, a warning including PID and process name  is
872                     printed to kernel logs.
873
874              Return 0 on success, or a negative error in case of failure.
875
876       long bpf_current_task_under_cgroup(struct bpf_map *map, u32 index)
877
878              Description
879                     Check  whether the probe is being run in the context of a
880                     given subset of the cgroup2  hierarchy.  The  cgroup2  to
881                     test is held by map of type BPF_MAP_TYPE_CGROUP_ARRAY, at
882                     index.
883
884              Return The return value depends on the result of the  test,  and
885                     can be:
886
887                     • 0, if the current task does not belong to the cgroup2.
888
889                     • 1, if the current task belongs to the cgroup2.
890
891                     • A negative error code, if an error occurred.
892
893       long bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags)
894
895              Description
896                     Resize (trim or grow) the packet associated to skb to the
897                     new len. The flags are reserved  for  future  usage,  and
898                     must be left at zero.
899
900                     The  basic  idea  is  that the helper performs the needed
901                     work to change the size of the packet, then the eBPF pro‐
902                     gram    rewrites    the    rest    via    helpers    like
903                     bpf_skb_store_bytes(),             bpf_l3_csum_replace(),
904                     bpf_l4_csum_replace()  and  others. This helper is a slow
905                     path utility intended for replies with control  messages.
906                     And  because it is targeted for slow path, the helper it‐
907                     self can afford to be slow: it implicitly linearizes, un‐
908                     clones and drops offloads from the skb.
909
910                     A call to this helper is susceptible to change the under‐
911                     lying packet buffer. Therefore, at load time, all  checks
912                     on  pointers  previously done by the verifier are invali‐
913                     dated and must be performed again, if the helper is  used
914                     in combination with direct packet access.
915
916              Return 0 on success, or a negative error in case of failure.
917
918       long bpf_skb_pull_data(struct sk_buff *skb, u32 len)
919
920              Description
921                     Pull in non-linear data in case the skb is non-linear and
922                     not all len bytes are part of the linear  section.  Make  len
923                     bytes  from skb readable and writable. If a zero value is
924                     passed for len, then the  whole  length  of  the  skb  is
925                     pulled.
926
927                     This  helper  is only needed for reading and writing with
928                     direct packet access.
929
930                     For direct packet access, testing that offsets to  access
931                     are  within  packet boundaries (test on skb->data_end) is
932                     susceptible to fail if offsets are invalid, or if the re‐
933                     quested  data is in non-linear parts of the skb. On fail‐
934                     ure the program can just bail out, or in the  case  of  a
935                     non-linear  buffer,  use a helper to make the data avail‐
936                     able.  The  bpf_skb_load_bytes()  helper  is  one  way to
937                     access the data. Another is to  call  bpf_skb_pull_data()
938                     to pull in the non-linear parts once, then to retest  and
939                     finally access the data.
940
941                     At  the  same  time,  this also makes sure the skb is un‐
942                     cloned, which is a necessary condition for direct  write.
943                     As this needs to be an invariant for the write part only,
944                     the verifier detects writes and adds a prologue  that  is
945                     calling  bpf_skb_pull_data()  to  effectively unclone the
946                     skb from the very beginning in case it is indeed cloned.
947
948                     A call to this helper is susceptible to change the under‐
949                     lying  packet buffer. Therefore, at load time, all checks
950                     on pointers previously done by the verifier  are  invali‐
951                     dated  and must be performed again, if the helper is used
952                     in combination with direct packet access.
953
954              Return 0 on success, or a negative error in case of failure.
955
956       s64 bpf_csum_update(struct sk_buff *skb, __wsum csum)
957
958              Description
959                     Add the checksum csum into skb->csum in case  the  driver
960                     has  supplied  a checksum for the entire packet into that
961                     field. Return an error otherwise. This helper is intended
962                     to  be  used in combination with bpf_csum_diff(), in par‐
963                     ticular when the checksum needs to be updated after  data
964                     has  been  written  into the packet through direct packet
965                     access.
966
967              Return The checksum on success, or a negative error code in case
968                     of failure.
969
970       void bpf_set_hash_invalid(struct sk_buff *skb)
971
972              Description
973                     Invalidate  the  current  skb->hash. It can be used after
974                     mangling on headers through direct packet access, in  or‐
975                     der  to indicate that the hash is outdated and to trigger
976                     a recalculation the next time the kernel tries to  access
977                     this  hash  or  when  the bpf_get_hash_recalc() helper is
978                     called.
979
980       long bpf_get_numa_node_id(void)
981
982              Description
983                     Return the id of the current NUMA node. The  primary  use
984                     case  for this helper is the selection of sockets for the
985                     local NUMA node, when the program is attached to  sockets
986                     using   the  SO_ATTACH_REUSEPORT_EBPF  option  (see  also
987                     socket(7)), but the helper is  also  available  to  other
988                     eBPF  program  types,  similarly  to  bpf_get_smp_proces‐
989                     sor_id().
990
991              Return The id of current NUMA node.
992
993       long bpf_skb_change_head(struct sk_buff *skb, u32 len, u64 flags)
994
995              Description
996                     Grow the headroom of the packet  associated  to  skb  and
997                     adjust the offset of the MAC header accordingly, adding len
998                     bytes of space. It automatically extends and  reallocates
999                     memory as required.
1000
1001                     This  helper  can  be used on a layer 3 skb to push a MAC
1002                     header for redirection into a layer 2 device.
1003
1004                     All values for flags are reserved for future  usage,  and
1005                     must be left at zero.
1006
1007                     A call to this helper is susceptible to change the under‐
1008                     lying packet buffer. Therefore, at load time, all  checks
1009                     on  pointers  previously done by the verifier are invali‐
1010                     dated and must be performed again, if the helper is  used
1011                     in combination with direct packet access.
1012
1013              Return 0 on success, or a negative error in case of failure.
1014
1015       long bpf_xdp_adjust_head(struct xdp_buff *xdp_md, int delta)
1016
1017              Description
1018                     Adjust  (move)  xdp_md->data by delta bytes. Note that it
1019                     is possible to use  a  negative  value  for  delta.  This
1020                     helper  can  be used to prepare the packet for pushing or
1021                     popping headers.
1022
1023                     A call to this helper is susceptible to change the under‐
1024                     lying  packet buffer. Therefore, at load time, all checks
1025                     on pointers previously done by the verifier  are  invali‐
1026                     dated  and must be performed again, if the helper is used
1027                     in combination with direct packet access.
1028
1029              Return 0 on success, or a negative error in case of failure.
1030
1031       long bpf_probe_read_str(void *dst, u32 size, const void *unsafe_ptr)
1032
1033              Description
1034                     Copy a NUL terminated string from an  unsafe  kernel  ad‐
1035                     dress  unsafe_ptr to dst. See bpf_probe_read_kernel_str()
1036                     for more details.
1037
1038                     Generally,     use      bpf_probe_read_user_str()      or
1039                     bpf_probe_read_kernel_str() instead.
1040
1041              Return On  success,  the strictly positive length of the string,
1042                     including the trailing NUL character. On error,  a  nega‐
1043                     tive value.
1044
1045       u64 bpf_get_socket_cookie(struct sk_buff *skb)
1046
1047              Description
1048                     If the struct sk_buff pointed to by skb has a known socket,
1049                     retrieve the cookie (generated by  the  kernel)  of  this
1050                     socket.   If  no  cookie has been set yet, generate a new
1051                     cookie. Once generated, the socket cookie remains  stable
1052                     for the life of the socket. This helper can be useful for
1053                     monitoring per socket networking traffic statistics as it
1054                     provides  a  global socket identifier that can be assumed
1055                     unique.
1056
1057              Return An 8-byte long non-decreasing number on success, or 0  if
1058                     the socket field is missing inside skb.
1059
1060       u64 bpf_get_socket_cookie(struct bpf_sock_addr *ctx)
1061
1062              Description
1063                     Equivalent to bpf_get_socket_cookie() helper that accepts
1064                     skb, but gets socket from struct bpf_sock_addr context.
1065
1066              Return An 8-byte long non-decreasing number.
1067
1068       u64 bpf_get_socket_cookie(struct bpf_sock_ops *ctx)
1069
1070              Description
1071                     Equivalent to bpf_get_socket_cookie() helper that accepts
1072                     skb, but gets socket from struct bpf_sock_ops context.
1073
1074              Return An 8-byte long non-decreasing number.
1075
1076       u32 bpf_get_socket_uid(struct sk_buff *skb)
1077
1078              Return The  owner  UID  of  the socket associated to skb. If the
1079                     socket is NULL, or if it is not a full socket (i.e. if it
1080                     is  a time-wait or a request socket instead), overflowuid
1081                     value is returned (note that overflowuid  might  also  be
1082                     the actual UID value for the socket).
1083
1084       long bpf_set_hash(struct sk_buff *skb, u32 hash)
1085
1086              Description
1087                     Set  the  full  hash for skb (set the field skb->hash) to
1088                     value hash.
1089
1090              Return 0
1091
1092       long bpf_setsockopt(void *bpf_socket,  int  level,  int  optname,  void
1093       *optval, int optlen)
1094
1095              Description
1096                     Emulate  a  call to setsockopt() on the socket associated
1097                     to bpf_socket, which must be a full socket. The level  at
1098                     which  the option resides and the name optname of the op‐
1099                     tion must be specified, see setsockopt(2) for more infor‐
1100                     mation.  The option value of length optlen is pointed  to
1101                     by optval.
1102
1103                     bpf_socket should be one of the following:
1104
1105                     • struct bpf_sock_ops for BPF_PROG_TYPE_SOCK_OPS.
1106
1107                     • struct bpf_sock_addr for  BPF_CGROUP_INET4_CONNECT  and
1108                       BPF_CGROUP_INET6_CONNECT.
1109
1110                     This helper actually implements a subset of setsockopt().
1111                     It supports the following levels:
1112
1113                     • SOL_SOCKET,  which  supports  the  following  optnames:
1114                       SO_RCVBUF,  SO_SNDBUF, SO_MAX_PACING_RATE, SO_PRIORITY,
1115                       SO_RCVLOWAT, SO_MARK, SO_BINDTODEVICE, SO_KEEPALIVE.
1116
1117                     • IPPROTO_TCP, which  supports  the  following  optnames:
1118                       TCP_CONGESTION,    TCP_BPF_IW,   TCP_BPF_SNDCWND_CLAMP,
1119                       TCP_SAVE_SYN, TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT,
1120                       TCP_SYNCNT, TCP_USER_TIMEOUT.
1121
1122                     • IPPROTO_IP, which supports optname IP_TOS.
1123
1124                     • IPPROTO_IPV6, which supports optname IPV6_TCLASS.
1125
1126              Return 0 on success, or a negative error in case of failure.
1127
1128       long  bpf_skb_adjust_room(struct  sk_buff *skb, s32 len_diff, u32 mode,
1129       u64 flags)
1130
1131              Description
1132                     Grow or shrink the room for data in the packet associated
1133                     to skb by len_diff, and according to the selected mode.
1134
1135                     By  default, the helper will reset any offloaded checksum
1136                     indicator of  the  skb  to  CHECKSUM_NONE.  This  can  be
1137                     avoided by the following flag:
1138
1139                     • BPF_F_ADJ_ROOM_NO_CSUM_RESET:  Do  not  reset offloaded
1140                       checksum data of the skb to CHECKSUM_NONE.
1141
1142                     There are two supported modes at this time:
1143
1144                     • BPF_ADJ_ROOM_MAC: Adjust room at the  mac  layer  (room
1145                       space is added or removed below the layer 2 header).
1146
1147                     • BPF_ADJ_ROOM_NET:  Adjust  room  at  the  network layer
1148                       (room space is added  or  removed  below  the  layer  3
1149                       header).
1150
1151                     The following flags are supported at this time:
1152
1153                     • BPF_F_ADJ_ROOM_FIXED_GSO:  Do not adjust gso_size.  Ad‐
1154                       justing mss in this way is not allowed for datagrams.
1155
1156                     • BPF_F_ADJ_ROOM_ENCAP_L3_IPV4,        BPF_F_ADJ_ROOM_EN‐
1157                       CAP_L3_IPV6: Any new space is reserved to hold a tunnel
1158                       header.  Configure skb offsets and other fields accord‐
1159                       ingly.
1160
1161                     • BPF_F_ADJ_ROOM_ENCAP_L4_GRE,         BPF_F_ADJ_ROOM_EN‐
1162                       CAP_L4_UDP: Use with ENCAP_L3 flags to further  specify
1163                       the tunnel type.
1164
1165                     • BPF_F_ADJ_ROOM_ENCAP_L2(len):   Use   with  ENCAP_L3/L4
1166                       flags to further specify the tunnel type;  len  is  the
1167                       length of the inner MAC header.
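
                         How the encapsulation flags listed above combine can be
                         sketched in user space. The SKETCH_ constants are local
                         copies of values believed to match linux/bpf.h (verify
                         against your own headers), and adjust_room_encap_flags()
                         is a hypothetical helper name, not part of any API.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies for illustration; assumed to mirror linux/bpf.h. */
#define SKETCH_ADJ_ROOM_ENCAP_L3_IPV4 (1ULL << 1)
#define SKETCH_ADJ_ROOM_ENCAP_L3_IPV6 (1ULL << 2)
#define SKETCH_ADJ_ROOM_ENCAP_L4_GRE  (1ULL << 3)
#define SKETCH_ADJ_ROOM_ENCAP_L4_UDP  (1ULL << 4)
#define SKETCH_ADJ_ROOM_ENCAP_L2_MASK  0xffULL
#define SKETCH_ADJ_ROOM_ENCAP_L2_SHIFT 56
#define SKETCH_ADJ_ROOM_ENCAP_L2(len) \
    (((uint64_t)(len) & SKETCH_ADJ_ROOM_ENCAP_L2_MASK) \
     << SKETCH_ADJ_ROOM_ENCAP_L2_SHIFT)

/* Describe a tunnel header: L3 type, L4 type, and inner MAC length. */
uint64_t adjust_room_encap_flags(int tunnel_is_ipv6, int l4_is_udp,
                                 unsigned int inner_mac_len)
{
    uint64_t flags;

    assert(inner_mac_len <= 0xff);
    flags = tunnel_is_ipv6 ? SKETCH_ADJ_ROOM_ENCAP_L3_IPV6
                           : SKETCH_ADJ_ROOM_ENCAP_L3_IPV4;
    flags |= l4_is_udp ? SKETCH_ADJ_ROOM_ENCAP_L4_UDP
                       : SKETCH_ADJ_ROOM_ENCAP_L4_GRE;
    flags |= SKETCH_ADJ_ROOM_ENCAP_L2(inner_mac_len);
    return flags;
}
```

                         An IPv4 and UDP tunnel carrying a 14-byte inner Ethernet
                         header would thus set the L3_IPV4 and L4_UDP bits and
                         store 14 in the top byte of flags.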
1168
1169                     A call to this helper is susceptible to change the under‐
1170                     lying packet buffer. Therefore, at load time, all  checks
1171                     on  pointers  previously done by the verifier are invali‐
1172                     dated and must be performed again, if the helper is  used
1173                     in combination with direct packet access.
1174
1175              Return 0 on success, or a negative error in case of failure.
1176
1177       long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
1178
1179              Description
1180                     Redirect  the packet to the endpoint referenced by map at
1181                     index key. Depending on its type, this  map  can  contain
1182                     references to net devices (for forwarding packets through
1183                     other ports), or to CPUs (for redirecting XDP  frames  to
1184                     another  CPU; but this is only implemented for native XDP
1185                     (with driver support) as of this writing).
1186
1187                     The lower two bits of flags are used as the  return  code
1188                     if the map lookup fails. This is so that the return value
1189                     can be one of the XDP program return codes up to  XDP_TX,
1190                     as chosen by the caller. Any higher bits in the flags ar‐
1191                     gument must be unset.
1192
1193                     See also bpf_redirect(), which only supports  redirecting
1194                     to an ifindex, but doesn't require a map to do so.
1195
1196              Return XDP_REDIRECT  on  success,  or the value of the two lower
1197                     bits of the flags argument on error.
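
                         The return value rule can be modelled in a few lines.
                         This is a user-space sketch of the documented
                         contract only, not the kernel implementation; the XDP
                         action values are mirrored from <linux/bpf.h> as an
                         assumption, and redirect_map_result() is a
                         hypothetical name.

```c
#include <stdint.h>

/* XDP action codes, mirrored from <linux/bpf.h> as an assumption. */
enum {
	XDP_ABORTED = 0,
	XDP_DROP = 1,
	XDP_PASS = 2,
	XDP_TX = 3,
	XDP_REDIRECT = 4,
};

/* Hypothetical model of the documented return value: XDP_REDIRECT when
 * the map lookup succeeds, otherwise the two lower bits of flags. */
static long redirect_map_result(int lookup_ok, uint64_t flags)
{
	if (lookup_ok)
		return XDP_REDIRECT;
	return (long)(flags & 0x3);
}
```

                         Passing XDP_PASS as flags, for instance, lets packets
                         continue to the stack when no entry exists at key.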
1198
1199       long bpf_sk_redirect_map(struct sk_buff *skb, struct bpf_map *map,  u32
1200       key, u64 flags)
1201
1202              Description
1203                     Redirect  the  packet to the socket referenced by map (of
1204                     type BPF_MAP_TYPE_SOCKMAP) at index key. Both ingress and
1205                     egress  interfaces  can  be  used  for  redirection.  The
1206                     BPF_F_INGRESS value in flags is used to make the distinc‐
1207                     tion  (ingress  path  is selected if the flag is present,
1208                     egress path otherwise). This is the only  flag  supported
1209                     for now.
1210
1211              Return SK_PASS on success, or SK_DROP on error.
1212
1213       long  bpf_sock_map_update(struct  bpf_sock_ops  *skops,  struct bpf_map
1214       *map, void *key, u64 flags)
1215
1216              Description
1217                     Add an entry to, or update, a map referencing sockets.
1218                     The skops is used as a new value for the entry associated
1219                     to key. flags is one of:
1220
1221                     BPF_NOEXIST
1222                            The entry for key must not exist in the map.
1223
1224                     BPF_EXIST
1225                            The entry for key must already exist in the map.
1226
1227                     BPF_ANY
1228                            No condition on the existence  of  the  entry  for
1229                            key.
1230
1231                     If  the map has eBPF programs (parser and verdict), those
1232                     will be inherited by  the  socket  being  added.  If  the
1233                     socket is already attached to eBPF programs, this results
1234                     in an error.
1235
1236              Return 0 on success, or a negative error in case of failure.
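
                         The three flag values express a precondition on the
                         entry for key. A minimal user-space sketch of that
                         check follows. The flag values are mirrored from
                         <linux/bpf.h> as an assumption, check_update_flags()
                         is a hypothetical name, and the specific error codes
                         are illustrative, since the text above only promises
                         a negative error.

```c
#include <errno.h>
#include <stdint.h>

/* Flag values mirrored from <linux/bpf.h> as an assumption. */
#define BPF_ANY     0ULL
#define BPF_NOEXIST 1ULL
#define BPF_EXIST   2ULL

/* Hypothetical model: return 0 if an update with these flags may proceed,
 * given whether key already has an entry. Error codes are illustrative. */
static int check_update_flags(int entry_exists, uint64_t flags)
{
	if (flags > BPF_EXIST)
		return -EINVAL;		/* unknown flag value */
	if (flags == BPF_NOEXIST && entry_exists)
		return -EEXIST;		/* entry must not exist */
	if (flags == BPF_EXIST && !entry_exists)
		return -ENOENT;		/* entry must already exist */
	return 0;			/* BPF_ANY, or precondition met */
}
```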
1237
1238       long bpf_xdp_adjust_meta(struct xdp_buff *xdp_md, int delta)
1239
1240              Description
1241                     Adjust the address pointed by xdp_md->data_meta by  delta
1242                     (which can be positive or negative). Note that this oper‐
1243                     ation modifies the address stored in xdp_md->data, so the
1244                     latter  must  be  loaded  only  after the helper has been
1245                     called.
1246
1247                     The use of xdp_md->data_meta is optional and programs are
1248                     not  required  to  use it. The rationale is that when the
1249                     packet is processed with XDP (e.g. as DoS filter), it  is
1250                     possible  to  push further meta data along with it before
1251                     passing to the stack, and to give the guarantee  that  an
1252                     ingress  eBPF  program attached as a TC classifier on the
1253                     same device can pick this up for further post-processing.
1254                     Since  TC  works with socket buffers, it remains possible
1255                     to set from XDP the mark or priority pointers,  or  other
1256                     pointers  for  the  socket  buffer.   Having this scratch
1257                     space generic and programmable allows for more  flexibil‐
1258                     ity  as the user is free to store whatever meta data they
1259                     need.
1260
1261                     A call to this helper may change the underlying packet
1262                     buffer. Therefore, at load time, all checks on pointers
1263                     previously done by the verifier are invalidated and must
1264                     be performed again, if the helper is used in combination
1265                     with direct packet access.
1266
1267              Return 0 on success, or a negative error in case of failure.
1268
1269       long bpf_perf_event_read_value(struct bpf_map *map, u64  flags,  struct
1270       bpf_perf_event_value *buf, u32 buf_size)
1271
1272              Description
1273                     Read the value of a perf event counter, and store it into
1274                     buf of size buf_size. This helper relies on a map of type
1275                     BPF_MAP_TYPE_PERF_EVENT_ARRAY.  The  nature  of  the perf
1276                     event counter is selected when map is updated  with  perf
1277                     event file descriptors. The map is an array whose size is
1278                     the number of available CPUs, and each  cell  contains  a
1279                     value relative to one CPU. The value to retrieve is indi‐
1280                     cated by flags, that contains the index  of  the  CPU  to
1281                     look  up,  masked  with  BPF_F_INDEX_MASK. Alternatively,
1282                     flags can be set to BPF_F_CURRENT_CPU  to  indicate  that
1283                     the value for the current CPU should be retrieved.
1284
1285                     This    helper    behaves    in    a    way    close   to
1286                     bpf_perf_event_read() helper, save that instead  of  just
1287                     returning the value observed, it fills the buf structure.
1288                     This allows for additional data to be retrieved: in  par‐
1289                     ticular,  the  enabled and running times (in buf->enabled
1290                     and buf->running, respectively) are copied.  In  general,
1291                     bpf_perf_event_read_value()     is    recommended    over
1292                     bpf_perf_event_read(), which has some ABI issues and pro‐
1293                     vides fewer functionalities.
1294
1295                     These  values are interesting, because hardware PMU (Per‐
1296                     formance Monitoring Unit) counters are limited resources.
1297                     When there are more PMU-based perf events opened than
1298                     available counters, the kernel will multiplex these
1299                     events so each event gets a certain percentage (but not
1300                     all) of the PMU time. When multiplexing happens, the
1301                     number of samples or the counter value will not match
1302                     what would be observed without multiplexing. This makes
1303                     comparison between different runs difficult. Typically,
1304                     the counter value should be normalized before comparing
1305                     it to other experiments. The usual normalization is done
1306                     as follows.
1307
1308                        normalized_counter = counter * t_enabled / t_running
1309
1310                     Where t_enabled is the time enabled for the event and
1311                     t_running is the time running for the event since the
1312                     last normalization. Both times are accumulated since the
1313                     perf event was opened. To get a scaling factor between
1314                     two invocations of an eBPF program, users can use the CPU
1315                     id as the key (which is typical for the perf array usage
1316                     model) to remember the previous value and do the
1317                     calculation inside the eBPF program.
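
                         The normalization between two readings can be
                         sketched as below. This is a user-space illustration
                         under stated assumptions: struct perf_value is a
                         local stand-in with the same three fields as struct
                         bpf_perf_event_value, and normalized_delta() is a
                         hypothetical name. Integer arithmetic is used because
                         eBPF programs cannot use floating point.

```c
#include <stdint.h>

/* Local stand-in with the same three fields as struct
 * bpf_perf_event_value, so the sketch compiles without kernel headers. */
struct perf_value {
	uint64_t counter;
	uint64_t enabled;
	uint64_t running;
};

/* Normalized counter increase between two readings of the same event.
 * Returns 0 if the event did not run at all during the interval. */
static uint64_t normalized_delta(const struct perf_value *prev,
				 const struct perf_value *cur)
{
	uint64_t counter = cur->counter - prev->counter;
	uint64_t t_enabled = cur->enabled - prev->enabled;
	uint64_t t_running = cur->running - prev->running;

	if (t_running == 0)
		return 0;
	/* Note: the product can overflow 64 bits for very large values. */
	return counter * t_enabled / t_running;
}
```

                         When the event was scheduled for the whole interval,
                         t_enabled equals t_running and the raw increase is
                         returned unchanged.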
1318
1319              Return 0 on success, or a negative error in case of failure.
1320
1321       long  bpf_perf_prog_read_value(struct  bpf_perf_event_data *ctx, struct
1322       bpf_perf_event_value *buf, u32 buf_size)
1323
1324              Description
1325                     For an eBPF program attached to a  perf  event,  retrieve
1326                     the  value  of  the  event  counter associated to ctx and
1327                     store it in the structure pointed  by  buf  and  of  size
1328                     buf_size.  Enabled  and  running times are also stored in
1329                     the    structure    (see    description     of     helper
1330                     bpf_perf_event_read_value() for more details).
1331
1332              Return 0 on success, or a negative error in case of failure.
1333
1334       long  bpf_getsockopt(void  *bpf_socket,  int  level,  int optname, void
1335       *optval, int optlen)
1336
1337              Description
1338                     Emulate a call to getsockopt() on the  socket  associated
1339                     to  bpf_socket, which must be a full socket. The level at
1340                     which the option resides and the name optname of the  op‐
1341                     tion must be specified, see getsockopt(2) for more infor‐
1342                     mation.  The retrieved value is stored in  the  structure
1343                     pointed to by optval and of length optlen.
1344
1345                     bpf_socket should be one of the following:
1346
1347                     • struct bpf_sock_ops for BPF_PROG_TYPE_SOCK_OPS.
1348
1349                     • struct  bpf_sock_addr  for BPF_CGROUP_INET4_CONNECT and
1350                       BPF_CGROUP_INET6_CONNECT.
1351
1352                     This helper actually implements a subset of getsockopt().
1353                     It supports the following levels:
1354
1355                     • IPPROTO_TCP, which supports optname TCP_CONGESTION.
1356
1357                     • IPPROTO_IP, which supports optname IP_TOS.
1358
1359                     • IPPROTO_IPV6, which supports optname IPV6_TCLASS.
1360
1361              Return 0 on success, or a negative error in case of failure.
1362
1363       long bpf_override_return(struct pt_regs *regs, u64 rc)
1364
1365              Description
1366                     Used  for  error  injection,  this helper uses kprobes to
1367                     override the return value of the probed function, and  to
1368                     set  it to rc.  The first argument is the context regs on
1369                     which the kprobe works.
1370
1371                     This helper works by setting the PC (program counter)  to
1372                     an  override function which is run in place of the origi‐
1373                     nal probed function. This means the  probed  function  is
1374                     not  run  at  all.  The replacement function just returns
1375                     with the required value.
1376
1377                     This helper has security implications, and thus  is  sub‐
1378                     ject  to restrictions. It is only available if the kernel
1379                     was compiled with the CONFIG_BPF_KPROBE_OVERRIDE configu‐
1380                     ration  option,  and  in this case it only works on func‐
1381                     tions tagged with  ALLOW_ERROR_INJECTION  in  the  kernel
1382                     code.
1383
1384                     Also,  the helper is only available for the architectures
1385                     having the CONFIG_FUNCTION_ERROR_INJECTION option. As  of
1386                     this writing, x86 architecture is the only one to support
1387                     this feature.
1388
1389              Return 0
1390
1391       long  bpf_sock_ops_cb_flags_set(struct  bpf_sock_ops   *bpf_sock,   int
1392       argval)
1393
1394              Description
1395                     Attempt  to  set  the  value of the bpf_sock_ops_cb_flags
1396                     field for the full TCP socket  associated  to  bpf_sock
1397                     to argval.
1398
1399                     The  primary  use  of this field is to determine if there
1400                     should   be   calls   to   eBPF    programs    of    type
1401                     BPF_PROG_TYPE_SOCK_OPS at various points in the TCP code.
1402                     A program of the same type can change its value, per con‐
1403                     nection  and  as necessary, when the connection is estab‐
1404                     lished. This field is directly  accessible  for  reading,
1405                     but  this helper must be used for updates in order to re‐
1406                     turn an error if an eBPF program tries to set a  callback
1407                     that is not supported in the current kernel.
1408
1409                     argval is a flag array which can combine these flags:
1410
1411                     • BPF_SOCK_OPS_RTO_CB_FLAG (retransmission time out)
1412
1413                     • BPF_SOCK_OPS_RETRANS_CB_FLAG (retransmission)
1414
1415                     • BPF_SOCK_OPS_STATE_CB_FLAG (TCP state change)
1416
1417                     • BPF_SOCK_OPS_RTT_CB_FLAG (every RTT)
1418
1419                     Therefore,  this function can be used to clear a callback
1420                     flag by setting the appropriate bit to zero. For example,
1421                     to disable the RTO callback:
1422
1423                     bpf_sock_ops_cb_flags_set(bpf_sock,
1424                            bpf_sock->bpf_sock_ops_cb_flags &
1425                            ~BPF_SOCK_OPS_RTO_CB_FLAG)
1426
1427                     Here are some examples of where one could call such an
1428                     eBPF program:
1429
1430                     • When RTO fires.
1431
1432                     • When a packet is retransmitted.
1433
1434                     • When the connection terminates.
1435
1436                     • When a packet is sent.
1437
1438                     • When a packet is received.
1439
1440              Return Code -EINVAL if the socket is not a full TCP socket; oth‐
1441                     erwise, a positive number containing the bits that  could
1442                     not be set is returned (which comes down to 0 if all bits
1443                     were set as required).
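
                         The return value semantics can be illustrated with a
                         small user-space model. This is a sketch of the
                         documented contract only: the flag values are
                         mirrored from <linux/bpf.h> as an assumption, and
                         SUPPORTED_CB_FLAGS and cb_flags_set() are
                         hypothetical names standing in for a kernel that
                         lacks the RTT callback.

```c
#include <stdint.h>

/* Flag values mirrored from <linux/bpf.h> as an assumption. */
#define BPF_SOCK_OPS_RTO_CB_FLAG      (1 << 0)
#define BPF_SOCK_OPS_RETRANS_CB_FLAG  (1 << 1)
#define BPF_SOCK_OPS_STATE_CB_FLAG    (1 << 2)
#define BPF_SOCK_OPS_RTT_CB_FLAG      (1 << 3)

/* Hypothetical: callbacks supported by the imaginary kernel modelled here. */
#define SUPPORTED_CB_FLAGS (BPF_SOCK_OPS_RTO_CB_FLAG | \
			    BPF_SOCK_OPS_RETRANS_CB_FLAG | \
			    BPF_SOCK_OPS_STATE_CB_FLAG)

/* Model of the documented contract: store the supported bits of argval
 * in *cb_flags and return the bits that could not be set (0 if none). */
static int cb_flags_set(int *cb_flags, int argval)
{
	*cb_flags = argval & SUPPORTED_CB_FLAGS;
	return argval & ~SUPPORTED_CB_FLAGS;
}
```

                         A non-zero return therefore tells the program which
                         requested callbacks it should not rely on.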
1444
1445       long bpf_msg_redirect_map(struct sk_msg_buff *msg, struct bpf_map *map,
1446       u32 key, u64 flags)
1447
1448              Description
1449                     This  helper is used in programs implementing policies at
1450                     the socket level. If the message msg is allowed  to  pass
1451                     (i.e. if the verdict eBPF program returns SK_PASS), redi‐
1452                     rect  it  to  the  socket  referenced  by  map  (of  type
1453                     BPF_MAP_TYPE_SOCKMAP)  at  index  key.  Both  ingress and
1454                     egress  interfaces  can  be  used  for  redirection.  The
1455                     BPF_F_INGRESS value in flags is used to make the distinc‐
1456                     tion (ingress path is selected if the  flag  is  present,
1457                     egress  path  otherwise). This is the only flag supported
1458                     for now.
1459
1460              Return SK_PASS on success, or SK_DROP on error.
1461
1462       long bpf_msg_apply_bytes(struct sk_msg_buff *msg, u32 bytes)
1463
1464              Description
1465                     For socket policies, apply the verdict of the  eBPF  pro‐
1466                     gram to the next bytes (number of bytes) of message msg.
1467
1468                     For  example,  this  helper  can be used in the following
1469                     cases:
1470
1471                     • A single sendmsg() or sendfile() system  call  contains
1472                       multiple logical messages that the eBPF program is sup‐
1473                       posed to read and for which it should apply a verdict.
1474
1475                     • An eBPF program only cares to read the first bytes of a
1476                       msg.  If  the message has a large payload, then setting
1477                       up and calling the  eBPF  program  repeatedly  for  all
1478                       bytes,  even though the verdict is already known, would
1479                       create unnecessary overhead.
1480
1481                     When called from within an eBPF program, the helper  sets
1482                     a  counter  internal  to  the BPF infrastructure, that is
1483                     used to apply the last verdict  to  the  next  bytes.  If
1484                     bytes  is  smaller  than the current data being processed
1485                     from a sendmsg() or sendfile()  system  call,  the  first
1486                     bytes  will  be  sent and the eBPF program will be re-run
1487                     with the pointer for start of data pointing to byte  num‐
1488                     ber  bytes  + 1. If bytes is larger than the current data
1489                     being processed, then the eBPF verdict will be applied to
1490                     multiple  sendmsg()  or  sendfile() calls until bytes are
1491                     consumed.
1492
1493                     Note that if a socket closes with  the  internal  counter
1494                     holding  a  non-zero value, this is not a problem because
1495                     data is not being buffered for bytes and is sent as it is
1496                     received.
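
                         The counter behaviour described above can be
                         simulated in user space. The sketch below is a
                         simplified model, not the kernel logic: it assumes
                         the verdict program calls bpf_msg_apply_bytes() with
                         the same bytes value on every run, and
                         count_prog_runs() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: count how often the verdict program runs when it
 * calls bpf_msg_apply_bytes(msg, bytes) on every run. msg_sizes[] holds
 * the amount of data handed over by each sendmsg()/sendfile() call. */
static unsigned int count_prog_runs(const uint32_t *msg_sizes, size_t n,
				    uint32_t bytes)
{
	uint32_t apply = 0;	/* bytes still covered by the last verdict */
	unsigned int runs = 0;

	for (size_t i = 0; i < n; i++) {
		uint32_t remaining = msg_sizes[i];

		while (remaining > 0) {
			if (apply == 0) {
				runs++;
				/* Assumption: bytes == 0 means the verdict
				 * covers the rest of the current call. */
				apply = bytes ? bytes : remaining;
			}
			uint32_t take = apply < remaining ? apply : remaining;

			apply -= take;
			remaining -= take;
		}
	}
	return runs;
}
```

                         With bytes smaller than one call's data the program
                         runs several times per call; with bytes larger, one
                         verdict spans several calls.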
1497
1498              Return 0
1499
1500       long bpf_msg_cork_bytes(struct sk_msg_buff *msg, u32 bytes)
1501
1502              Description
1503                     For socket policies, prevent the execution of the verdict
1504                     eBPF program for message msg until  bytes  (byte  number)
1505                     have been accumulated.
1506
1507                     This  can  be  used  when  one needs a specific number of
1508                     bytes before a verdict can be assigned, even if the  data
1509                     spans multiple sendmsg() or sendfile() calls. The extreme
1510                     case would be a user calling  sendmsg()  repeatedly  with
1511                     1-byte  long message segments. Obviously, this is bad for
1512                     performance, but it is still valid. If the  eBPF  program
1513                     needs  bytes  bytes to validate a header, this helper can
1514                     be used to prevent the eBPF program from being called
1515                     again until bytes have been accumulated.
1516
1517              Return 0
1518
1519       long bpf_msg_pull_data(struct sk_msg_buff *msg, u32 start, u32 end, u64
1520       flags)
1521
1522              Description
1523                     For socket policies, pull in non-linear  data  from  user
1524                     space   for   msg   and   set   pointers   msg->data  and
1525                     msg->data_end to start and end bytes  offsets  into  msg,
1526                     respectively.
1527
1528                     If a program of type BPF_PROG_TYPE_SK_MSG is run on a msg
1529                     it can only parse data that the (data, data_end) pointers
1530                     have already consumed. For sendmsg() hooks this is likely
1531                     the first scatterlist element. But for calls  relying  on
1532                     the  sendpage  handler (e.g. sendfile()) this will be the
1533                     range (0, 0) because the data is shared with  user  space
1534                     and  by  default  the objective is to avoid allowing user
1535                     space to modify data while (or after) eBPF verdict is be‐
1536                     ing  decided. This helper can be used to pull in data and
1537                     to set the start and end pointer to  given  values.  Data
1538                     will  be copied if necessary (i.e. if data was not linear
1539                     and if start and end pointers do not point  to  the  same
1540                     chunk).
1541
1542                     A call to this helper may change the underlying packet
1543                     buffer. Therefore, at load time, all checks on pointers
1544                     previously done by the verifier are invalidated and must
1545                     be performed again, if the helper is used in combination
1546                     with direct packet access.
1547
1548                     All  values  for flags are reserved for future usage, and
1549                     must be left at zero.
1550
1551              Return 0 on success, or a negative error in case of failure.
1552
1553       long bpf_bind(struct bpf_sock_addr *ctx,  struct  sockaddr  *addr,  int
1554       addr_len)
1555
1556              Description
1557                     Bind  the socket associated to ctx to the address pointed
1558                     by addr, of length addr_len. This allows for making
1559                     outgoing connections from the desired IP address, which
1560                     can be useful, for example, when all processes inside a
1561                     cgroup should use a single IP address on a host that has
1562                     multiple IP addresses configured.
1563
1564                     This helper works for IPv4 and IPv6, TCP and UDP sockets.
1565                     The   domain   (addr->sa_family)   must  be  AF_INET  (or
1566                     AF_INET6). It is advised to pass a zero port (sin_port or
1567                     sin6_port), which triggers IP_BIND_ADDRESS_NO_PORT-like
1568                     behavior and lets the kernel efficiently pick an unused
1569                     port as long as the 4-tuple is unique. Passing a non-zero
1570                     port might lead to degraded performance.
1571
1572              Return 0 on success, or a negative error in case of failure.
1573
1574       long bpf_xdp_adjust_tail(struct xdp_buff *xdp_md, int delta)
1575
1576              Description
1577                     Adjust (move) xdp_md->data_end by delta bytes. It is pos‐
1578                     sible to both shrink and grow the packet tail. Shrinking
1579                     is done by passing a negative integer as delta.
1580
1581                     A call to this helper may change the underlying packet
1582                     buffer. Therefore, at load time, all checks on pointers
1583                     previously done by the verifier are invalidated and must
1584                     be performed again, if the helper is used in combination
1585                     with direct packet access.
1586
1587              Return 0 on success, or a negative error in case of failure.
1588
1589       long bpf_skb_get_xfrm_state(struct  sk_buff  *skb,  u32  index,  struct
1590       bpf_xfrm_state *xfrm_state, u32 size, u64 flags)
1591
1592              Description
1593                     Retrieve the XFRM state (IP transform framework, see also
1594                     ip-xfrm(8)) at index in XFRM "security path" for skb.
1595
1596                     The   retrieved   value   is   stored   in   the   struct
1597                     bpf_xfrm_state pointed by xfrm_state and of length size.
1598
1599                     All  values  for flags are reserved for future usage, and
1600                     must be left at zero.
1601
1602                     This helper is available only if the kernel was  compiled
1603                     with CONFIG_XFRM configuration option.
1604
1605              Return 0 on success, or a negative error in case of failure.
1606
1607       long bpf_get_stack(void *ctx, void *buf, u32 size, u64 flags)
1608
1609              Description
1610                     Return a user or a kernel stack in a buffer provided by
1611                     the bpf program. To achieve this, the helper needs ctx,
1612                     which is a pointer to the context on which the tracing
1613                     program is executed. To store the stack trace, the bpf
1614                     program provides buf with a nonnegative size.
1615
1616                     The  last  argument,  flags,  holds  the  number of stack
1617                     frames  to  skip   (from   0   to   255),   masked   with
1618                     BPF_F_SKIP_FIELD_MASK.  The  next bits can be used to set
1619                     the following flags:
1620
1621                     BPF_F_USER_STACK
1622                            Collect a user space stack  instead  of  a  kernel
1623                            stack.
1624
1625                     BPF_F_USER_BUILD_ID
1626                            Collect  buildid+offset  instead  of  ips for user
1627                            stack, only  valid  if  BPF_F_USER_STACK  is  also
1628                            specified.
1629
1630                     bpf_get_stack() can collect up to PERF_MAX_STACK_DEPTH
1631                     both kernel and user frames, subject to a sufficiently
1632                     large buffer size. Note that this limit can be controlled
1633                     with the sysctl program, and that it should be manually
1634                     increased in order to profile long user stacks (such as
1635                     stacks for Java programs). To do so, use:
1636
1637                        # sysctl kernel.perf_event_max_stack=<new value>
1638
1639              Return A non-negative value equal to or less than size  on  suc‐
1640                     cess, or a negative error in case of failure.
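
                         The layout of the flags argument (skip count in the
                         low bits, option bits above it) can be sketched as
                         follows. The constants are mirrored from
                         <linux/bpf.h> as an assumption, and
                         get_stack_flags() is a hypothetical user-space helper
                         that also rejects the combinations the text rules
                         out.

```c
#include <stdint.h>

/* Values mirrored from <linux/bpf.h> as an assumption. */
#define BPF_F_SKIP_FIELD_MASK 0xffULL
#define BPF_F_USER_STACK      (1ULL << 8)
#define BPF_F_USER_BUILD_ID   (1ULL << 11)

/* Hypothetical helper: build the flags value for bpf_get_stack().
 * Returns 0 and fills *flags, or -1 if the request is inconsistent. */
static int get_stack_flags(uint32_t skip, int user_stack, int build_id,
			   uint64_t *flags)
{
	if (skip > BPF_F_SKIP_FIELD_MASK)
		return -1;	/* only 0 to 255 frames can be skipped */
	if (build_id && !user_stack)
		return -1;	/* BPF_F_USER_BUILD_ID needs BPF_F_USER_STACK */

	*flags = (uint64_t)skip & BPF_F_SKIP_FIELD_MASK;
	if (user_stack)
		*flags |= BPF_F_USER_STACK;
	if (build_id)
		*flags |= BPF_F_USER_BUILD_ID;
	return 0;
}
```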
1641
1642       long bpf_skb_load_bytes_relative(const void *skb, u32 offset, void *to,
1643       u32 len, u32 start_header)
1644
1645              Description
1646                     This helper is similar to bpf_skb_load_bytes() in that it
1647                     provides  an  easy way to load len bytes from offset from
1648                     the packet associated to skb, into the buffer pointed  by
1649                     to.  The  difference  to  bpf_skb_load_bytes()  is that a
1650                     fifth argument start_header exists in order to  select  a
1651                     base offset to start from. start_header can be one of:
1652
1653                     BPF_HDR_START_MAC
1654                            Base offset to load data from is skb's mac header.
1655
1656                     BPF_HDR_START_NET
1657                            Base  offset  to  load  data from is skb's network
1658                            header.
1659
1660                     In general,  "direct  packet  access"  is  the  preferred
1661                     method to access packet data. However, this helper is
1662                     particularly useful in socket filters, where skb->data
1663                     does not always point to the start of the mac header and
1664                     where "direct packet access" is not available.
1665
1666              Return 0 on success, or a negative error in case of failure.
1667
1668       long bpf_fib_lookup(void *ctx, struct bpf_fib_lookup *params, int plen,
1669       u32 flags)
1670
1671              Description
1672                     Do  FIB  lookup  in  kernel  tables  using  parameters in
1673                     params.  If lookup is successful and result shows  packet
1674                     is  to be forwarded, the neighbor tables are searched for
1675                     the nexthop.  If successful (i.e., FIB lookup shows  for‐
1676                     warding  and nexthop is resolved), the nexthop address is
1677                     returned in ipv4_dst or ipv6_dst based on family, smac is
1678                     set  to mac address of egress device, dmac is set to nex‐
1679                     thop mac address, rt_metric is set to metric  from  route
1680                     (IPv4/IPv6  only), and ifindex is set to the device index
1681                     of the nexthop from the FIB lookup.
1682
1683                     The plen argument is the size of the passed-in struct.
1684                     The flags argument can be a combination of one or more of
1685                     the following values:
1686
1687                     BPF_FIB_LOOKUP_DIRECT
1688                            Do a direct table lookup instead of a full lookup
1689                            using FIB rules.
1690
1691                     BPF_FIB_LOOKUP_OUTPUT
1692                            Perform lookup from an egress perspective (default
1693                            is ingress).
1694
1695                     ctx is either struct xdp_md for XDP  programs  or  struct
1696                     sk_buff for tc cls_act programs.
1697
1698              Return
1699
1700                     • < 0 if any input argument is invalid
1701
1702                     • 0 on success (packet is forwarded, nexthop neighbor ex‐
1703                       ists)
1704
1705                     • > 0 one of BPF_FIB_LKUP_RET_ codes explaining  why  the
1706                       packet is not forwarded or needs assist from full stack
1707
1708       long  bpf_sock_hash_update(struct  bpf_sock_ops  *skops, struct bpf_map
1709       *map, void *key, u64 flags)
1710
1711              Description
1712                     Add an entry to, or update, a sockhash map referencing
1713                     sockets. The skops is used as a new value for the entry
1714                     associated to key. flags is one of:
1715
1716                     BPF_NOEXIST
1717                            The entry for key must not exist in the map.
1718
1719                     BPF_EXIST
1720                            The entry for key must already exist in the map.
1721
1722                     BPF_ANY
1723                            No condition on the existence  of  the  entry  for
1724                            key.
1725
1726                     If  the map has eBPF programs (parser and verdict), those
1727                     will be inherited by  the  socket  being  added.  If  the
1728                     socket is already attached to eBPF programs, this results
1729                     in an error.
1730
1731              Return 0 on success, or a negative error in case of failure.
1732
1733       long  bpf_msg_redirect_hash(struct  sk_msg_buff  *msg,  struct  bpf_map
1734       *map, void *key, u64 flags)
1735
1736              Description
1737                     This  helper is used in programs implementing policies at
1738                     the socket level. If the message msg is allowed  to  pass
1739                     (i.e. if the verdict eBPF program returns SK_PASS), redi‐
1740                     rect  it  to  the  socket  referenced  by  map  (of  type
1741                     BPF_MAP_TYPE_SOCKHASH)  using  hash key. Both ingress and
1742                     egress  interfaces  can  be  used  for  redirection.  The
1743                     BPF_F_INGRESS value in flags is used to make the distinc‐
1744                     tion (ingress path is selected if the  flag  is  present,
1745                     egress  path  otherwise). This is the only flag supported
1746                     for now.
1747
1748              Return SK_PASS on success, or SK_DROP on error.
1749
1750       long bpf_sk_redirect_hash(struct sk_buff  *skb,  struct  bpf_map  *map,
1751       void *key, u64 flags)
1752
1753              Description
1754                     This  helper is used in programs implementing policies at
1755                     the skb socket level. If the sk_buff skb  is  allowed  to
1756                     pass (i.e.  if the verdict eBPF program returns SK_PASS),
1757                     redirect it to the socket  referenced  by  map  (of  type
1758                     BPF_MAP_TYPE_SOCKHASH)  using  hash key. Both ingress and
1759                     egress  interfaces  can  be  used  for  redirection.  The
1760                     BPF_F_INGRESS value in flags is used to make the distinc‐
1761                     tion (ingress path is selected if the  flag  is  present,
1762                     egress  otherwise).  This  is the only flag supported for
1763                     now.
1764
1765              Return SK_PASS on success, or SK_DROP on error.
1766
1767       long bpf_lwt_push_encap(struct sk_buff *skb, u32 type, void  *hdr,  u32
1768       len)
1769
1770              Description
1771                     Encapsulate the packet associated to skb within a Layer 3
1772                     protocol header. This header is provided in the buffer at
1773                     address  hdr,  with len its size in bytes. type indicates
1774                     the protocol of the header and can be one of:
1775
1776                     BPF_LWT_ENCAP_SEG6
1777                            IPv6 encapsulation  with  Segment  Routing  Header
1778                            (struct  ipv6_sr_hdr).  hdr only contains the SRH,
1779                            the IPv6 header is computed by the kernel.
1780
1781                     BPF_LWT_ENCAP_SEG6_INLINE
1782                            Only works if skb contains an IPv6 packet.  Insert
1783                            a  Segment Routing Header (struct ipv6_sr_hdr) in‐
1784                            side the IPv6 header.
1785
1786                     BPF_LWT_ENCAP_IP
1787                            IP  encapsulation  (GRE/GUE/IPIP/etc).  The  outer
1788                            header  must  be IPv4 or IPv6, followed by zero or
1789                            more additional headers, up  to  LWT_BPF_MAX_HEAD‐
1790                            ROOM  total bytes in all prepended headers. Please
1791                            note that if skb_is_gso(skb) is true, no more than
1792                            two  headers  can  be  prepended,  and  the  inner
1793                            header,  if  present,  should  be  either  GRE  or
1794                            UDP/GUE.
1795
1796                     BPF_LWT_ENCAP_SEG6*  types  can be called by BPF programs
1797                     of type BPF_PROG_TYPE_LWT_IN; BPF_LWT_ENCAP_IP  type  can
1798                     be  called  by BPF programs of types BPF_PROG_TYPE_LWT_IN
1799                     and BPF_PROG_TYPE_LWT_XMIT.
1800
1801                     A call to this helper may change the underlying packet
1802                     buffer. Therefore, at load time, all checks on pointers
1803                     previously done by the verifier are invalidated and must
1804                     be performed again if the helper is used in combination
1805                     with direct packet access.
1806
1807              Return 0 on success, or a negative error in case of failure.
1808
1809       long bpf_lwt_seg6_store_bytes(struct sk_buff *skb,  u32  offset,  const
1810       void *from, u32 len)
1811
1812              Description
1813                     Store len bytes from address from into the packet associ‐
1814                     ated to skb, at offset. Only the flags, tag and TLVs  in‐
1815                     side  the  outermost  IPv6  Segment Routing Header can be
1816                     modified through this helper.
1817
1818                     A call to this helper may change the underlying packet
1819                     buffer. Therefore, at load time, all checks on pointers
1820                     previously done by the verifier are invalidated and must
1821                     be performed again if the helper is used in combination
1822                     with direct packet access.
1823
1824              Return 0 on success, or a negative error in case of failure.
1825
1826       long  bpf_lwt_seg6_adjust_srh(struct  sk_buff  *skb,  u32  offset,  s32
1827       delta)
1828
1829              Description
1830                     Adjust  the  size allocated to TLVs in the outermost IPv6
1831                     Segment Routing Header contained in the packet associated
1832                     to  skb,  at position offset by delta bytes. Only offsets
1833                     after the segments are accepted. delta can be either
1834                     positive (growing) or negative (shrinking).
1835
1836                     A call to this helper may change the underlying packet
1837                     buffer. Therefore, at load time, all checks on pointers
1838                     previously done by the verifier are invalidated and must
1839                     be performed again if the helper is used in combination
1840                     with direct packet access.
1841
1842              Return 0 on success, or a negative error in case of failure.
1843
1844       long  bpf_lwt_seg6_action(struct sk_buff *skb, u32 action, void *param,
1845       u32 param_len)
1846
1847              Description
1848                     Apply an IPv6 Segment Routing action of  type  action  to
1849                     the packet associated to skb. Each action takes a parame‐
1850                     ter contained at address param, and of  length  param_len
1851                     bytes.  action can be one of:
1852
1853                     SEG6_LOCAL_ACTION_END_X
1854                            End.X action: Endpoint with Layer-3 cross-connect.
1855                            Type of param: struct in6_addr.
1856
1857                     SEG6_LOCAL_ACTION_END_T
1858                            End.T action: Endpoint with  specific  IPv6  table
1859                            lookup.  Type of param: int.
1860
1861                     SEG6_LOCAL_ACTION_END_B6
1862                            End.B6  action:  Endpoint bound to an SRv6 policy.
1863                            Type of param: struct ipv6_sr_hdr.
1864
1865                     SEG6_LOCAL_ACTION_END_B6_ENCAP
1866                            End.B6.Encap action: Endpoint bound to an SRv6 en‐
1867                            capsulation   policy.    Type   of  param:  struct
1868                            ipv6_sr_hdr.
1869
1870                     A call to this helper may change the underlying packet
1871                     buffer. Therefore, at load time, all checks on pointers
1872                     previously done by the verifier are invalidated and must
1873                     be performed again if the helper is used in combination
1874                     with direct packet access.
1875
1876              Return 0 on success, or a negative error in case of failure.
1877
1878       long bpf_rc_repeat(void *ctx)
1879
1880              Description
1881                     This helper is used in programs implementing IR decoding,
1882                     to report a successfully decoded repeat key message. This
1883                     delays the generation of a key up event for the previously
1884                     generated key down event.
1885
1886                     Some IR protocols like NEC have a special IR message for
1887                     repeating the last button, for when a button is held down.
1888
1889                     The ctx should point to the lirc sample  as  passed  into
1890                     the program.
1891
1892                     This  helper is only available if the kernel was compiled
1893                     with the CONFIG_BPF_LIRC_MODE2 configuration  option  set
1894                     to "y".
1895
1896              Return 0
1897
1898       long bpf_rc_keydown(void *ctx, u32 protocol, u64 scancode, u32 toggle)
1899
1900              Description
1901                     This helper is used in programs implementing IR decoding,
1902                     to report a successfully decoded key press with scancode
1903                     and toggle value in the given protocol. The scancode will be
1904                     translated to a keycode using the rc keymap, and reported
1905                     as an input key down event. After a period a key up event
1906                     is generated. This period can be extended by calling  ei‐
1907                     ther  bpf_rc_keydown()  again  with  the  same values, or
1908                     calling bpf_rc_repeat().
1909
1910                     Some protocols include a toggle bit, in case  the  button
1911                     was  released and pressed again between consecutive scan‐
1912                     codes.
1913
1914                     The ctx should point to the lirc sample  as  passed  into
1915                     the program.
1916
1917                     The  protocol  is  the  decoded protocol number (see enum
1918                     rc_proto for some predefined values).
1919
1920                     This helper is only available if the kernel was  compiled
1921                     with  the  CONFIG_BPF_LIRC_MODE2 configuration option set
1922                     to "y".
1923
1924              Return 0
1925
1926       u64 bpf_skb_cgroup_id(struct sk_buff *skb)
1927
1928              Description
1929                     Return the cgroup v2 id of the socket associated with the
1930                     skb. This is roughly similar to bpf_get_cgroup_classid()
1931                     for cgroup v1: it provides a tag or identifier that can be
1932                     matched on or used for map lookups, e.g. to implement
1933                     policy. The cgroup v2 id of a given path in the hierarchy
1934                     is exposed in user space through the f_handle API in order
1935                     to get to the same 64-bit id.
1936
1937                     This helper can be used on TC egress  path,  but  not  on
1938                     ingress, and is available only if the kernel was compiled
1939                     with the CONFIG_SOCK_CGROUP_DATA configuration option.
1940
1941              Return The id is returned or 0 in case the id could not  be  re‐
1942                     trieved.
1943
1944       u64 bpf_get_current_cgroup_id(void)
1945
1946              Return A  64-bit  integer containing the current cgroup id based
1947                     on the cgroup within which the current task is running.
1948
1949       void *bpf_get_local_storage(void *map, u64 flags)
1950
1951              Description
1952                     Get the pointer to the local storage area. The type and
1953                     the size of the local storage are defined by the map
1954                     argument. The meaning of flags is specific to each map
1955                     type, and flags must be 0 for cgroup local storage.
1956
1957                     Depending  on  the BPF program type, a local storage area
1958                     can be shared between multiple instances of the BPF  pro‐
1959                     gram, running simultaneously.
1960
1961                     The user is responsible for synchronization, for example
1962                     by using the BPF_STX_XADD instruction to alter the shared
1963                     data.
1964
1965              Return A pointer to the local storage area.
1966
1967       long   bpf_sk_select_reuseport(struct  sk_reuseport_md  *reuse,  struct
1968       bpf_map *map, void *key, u64 flags)
1969
1970              Description
1971                     Select a SO_REUSEPORT socket from a BPF_MAP_TYPE_REUSE‐
1972                     PORT_ARRAY map. It checks that the selected socket matches
1973                     the incoming request in the socket buffer.
1974
1975              Return 0 on success, or a negative error in case of failure.
1976
1977       u64 bpf_skb_ancestor_cgroup_id(struct sk_buff *skb, int ancestor_level)
1978
1979              Description
1980                     Return the id of the cgroup v2 that is the ancestor, at
1981                     ancestor_level, of the cgroup associated with the skb. The
1982                     root cgroup is at ancestor_level zero and each step down
1983                     the hierarchy increments the level. If ancestor_level
1984                     equals the level of the cgroup associated with skb, the
1985                     return value is the same as that of bpf_skb_cgroup_id().
1986
1987                     The helper is useful to implement policies based on
1988                     cgroups higher in the hierarchy than the immediate cgroup
1989                     associated with skb.
1990
1991                     The format of the returned id and the helper limitations
1992                     are the same as in bpf_skb_cgroup_id().
1993
1994              Return The id is returned or 0 in case the id could not  be  re‐
1995                     trieved.
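
                     The level rule described above can be pictured with a
                     small user-space model. This is only a sketch, not the
                     kernel code: struct model_cgroup, its ids[] path array
                     and model_ancestor_cgroup_id() are hypothetical names,
                     and returning 0 for a level deeper than the cgroup
                     itself (or a negative level) is an assumption.

```c
#include <stdint.h>

/* Hypothetical user-space model of a cgroup: ids[] holds the 64-bit ids on
 * the path from the root (index 0) down to the cgroup itself (index level). */
struct model_cgroup {
    const uint64_t *ids;
    int level;
};

/* Model of the lookup; 0 stands for "the id could not be retrieved". */
uint64_t model_ancestor_cgroup_id(const struct model_cgroup *cg,
                                  int ancestor_level)
{
    /* Assumption: levels outside [0, cg->level] have no ancestor. */
    if (ancestor_level < 0 || ancestor_level > cg->level)
        return 0;
    return cg->ids[ancestor_level];
}
```

                     With a path of three cgroups, level 0 yields the root
                     id and level 2 yields the same id the model would give
                     for the cgroup itself.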
1996
1997       struct  bpf_sock  *bpf_sk_lookup_tcp(void  *ctx,  struct bpf_sock_tuple
1998       *tuple, u32 tuple_size, u64 netns, u64 flags)
1999
2000              Description
2001                     Look for a TCP socket matching tuple, optionally in a
2002                     child network namespace netns. The return value must be
2003                     checked, and if non-NULL, released via bpf_sk_release().
2004
2005                     The ctx should point to the context of the program,  such
2006                     as the skb or socket (depending on the hook in use). This
2007                     is used to determine the base network namespace  for  the
2008                     lookup.
2009
2010                     tuple_size must be one of:
2011
2012                     sizeof(tuple->ipv4)
2013                            Look for an IPv4 socket.
2014
2015                     sizeof(tuple->ipv6)
2016                            Look for an IPv6 socket.
2017
2018                     If  the  netns  is a negative signed 32-bit integer, then
2019                     the socket lookup table in the netns associated with  the
2020                     ctx  will be used. For the TC hooks, this is the netns of
2021                     the device in the skb. For  socket  hooks,  this  is  the
2022                     netns of the socket.  If netns is any other signed 32-bit
2023                     value greater than or equal to zero then it specifies the
2024                     ID of the netns relative to the netns associated with the
2025                     ctx. netns values beyond the range of 32-bit integers are
2026                     reserved for future use.
2027
2028                     All  values  for flags are reserved for future usage, and
2029                     must be left at zero.
2030
2031                     This helper is available only if the kernel was compiled
2032                     with the CONFIG_NET configuration option.
2033
2034              Return Pointer  to  struct bpf_sock, or NULL in case of failure.
2035                     For sockets with reuseport option,  the  struct  bpf_sock
2036                     result  is  from reuse->socks[] using the hash of the tu‐
2037                     ple.
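
                     The three ranges of the netns argument can be summarized
                     with a small user-space model. This is a sketch under
                     stated assumptions, not the kernel implementation: the
                     enum and model_classify_netns() are hypothetical names,
                     and the model assumes that only the low 32 bits are
                     inspected to decide whether the value is a negative
                     signed 32-bit integer.

```c
#include <stdint.h>

enum model_netns_kind {
    MODEL_NETNS_CURRENT,   /* use the netns associated with ctx */
    MODEL_NETNS_BY_ID,     /* netns is an ID relative to that netns */
    MODEL_NETNS_RESERVED   /* beyond the 32-bit range: reserved */
};

enum model_netns_kind model_classify_netns(uint64_t netns)
{
    /* Assumption: a negative s32, sign-extended or not, has bit 31 set. */
    if (netns & UINT64_C(0x80000000))
        return MODEL_NETNS_CURRENT;
    /* Any other signed 32-bit value greater than or equal to zero. */
    if (netns <= (uint64_t)INT32_MAX)
        return MODEL_NETNS_BY_ID;
    return MODEL_NETNS_RESERVED;
}
```

                     Passing -1 converted to u64 selects the netns of the
                     ctx, while small non-negative values are treated as
                     relative netns IDs.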
2038
2039       struct bpf_sock  *bpf_sk_lookup_udp(void  *ctx,  struct  bpf_sock_tuple
2040       *tuple, u32 tuple_size, u64 netns, u64 flags)
2041
2042              Description
2043                     Look for a UDP socket matching tuple, optionally in a
2044                     child network namespace netns. The return value must be
2045                     checked, and if non-NULL, released via bpf_sk_release().
2046
2047                     The  ctx should point to the context of the program, such
2048                     as the skb or socket (depending on the hook in use). This
2049                     is  used  to determine the base network namespace for the
2050                     lookup.
2051
2052                     tuple_size must be one of:
2053
2054                     sizeof(tuple->ipv4)
2055                            Look for an IPv4 socket.
2056
2057                     sizeof(tuple->ipv6)
2058                            Look for an IPv6 socket.
2059
2060                     If the netns is a negative signed  32-bit  integer,  then
2061                     the  socket lookup table in the netns associated with the
2062                     ctx will be used. For the TC hooks, this is the netns  of
2063                     the  device  in  the  skb.  For socket hooks, this is the
2064                     netns of the socket.  If netns is any other signed 32-bit
2065                     value greater than or equal to zero then it specifies the
2066                     ID of the netns relative to the netns associated with the
2067                     ctx. netns values beyond the range of 32-bit integers are
2068                     reserved for future use.
2069
2070                     All values for flags are reserved for future  usage,  and
2071                     must be left at zero.
2072
2073                     This helper is available only if the kernel was compiled
2074                     with the CONFIG_NET configuration option.
2075
2076              Return Pointer to struct bpf_sock, or NULL in case  of  failure.
2077                     For  sockets  with  reuseport option, the struct bpf_sock
2078                     result is from reuse->socks[] using the hash of  the  tu‐
2079                     ple.
2080
2081       long bpf_sk_release(struct bpf_sock *sock)
2082
2083              Description
2084                     Release  the  reference  held  by  sock.  sock  must be a
2085                     non-NULL    pointer    that     was     returned     from
2086                     bpf_sk_lookup_xxx().
2087
2088              Return 0 on success, or a negative error in case of failure.
2089
2090       long  bpf_map_push_elem(struct  bpf_map  *map,  const  void *value, u64
2091       flags)
2092
2093              Description
2094                     Push an element value in map. flags is one of:
2095
2096                     BPF_EXIST
2097                            If the queue/stack is full, the oldest element  is
2098                            removed to make room for the new element.
2099
2100              Return 0 on success, or a negative error in case of failure.
2101
2102       long bpf_map_pop_elem(struct bpf_map *map, void *value)
2103
2104              Description
2105                     Pop an element from map.
2106
2107              Return 0 on success, or a negative error in case of failure.
2108
2109       long bpf_map_peek_elem(struct bpf_map *map, void *value)
2110
2111              Description
2112                     Get an element from map without removing it.
2113
2114              Return 0 on success, or a negative error in case of failure.
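
                     The push, pop and peek semantics can be illustrated
                     with a user-space model of a bounded FIFO queue. This
                     is a sketch only: all model_* names, the capacity of 3,
                     and the error codes chosen for the full and empty cases
                     are assumptions, and a stack-type map would return the
                     newest element instead of the oldest.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_BPF_EXIST 2u   /* stand-in for the BPF_EXIST flag */
#define MODEL_CAPACITY  3u   /* hypothetical number of entries */

struct model_queue {
    uint32_t slots[MODEL_CAPACITY];
    unsigned int head;       /* index of the oldest element */
    unsigned int count;      /* number of stored elements */
};

int model_push(struct model_queue *q, uint32_t value, unsigned int flags)
{
    if (q->count == MODEL_CAPACITY) {
        if (!(flags & MODEL_BPF_EXIST))
            return -E2BIG;   /* assumed error code when full */
        /* Remove the oldest element to make room for the new one. */
        q->head = (q->head + 1) % MODEL_CAPACITY;
        q->count--;
    }
    q->slots[(q->head + q->count) % MODEL_CAPACITY] = value;
    q->count++;
    return 0;
}

/* Read the oldest element without removing it. */
int model_peek(const struct model_queue *q, uint32_t *value)
{
    if (q->count == 0)
        return -ENOENT;      /* assumed error code when empty */
    *value = q->slots[q->head];
    return 0;
}

/* Read the oldest element and remove it. */
int model_pop(struct model_queue *q, uint32_t *value)
{
    int err = model_peek(q, value);

    if (err)
        return err;
    q->head = (q->head + 1) % MODEL_CAPACITY;
    q->count--;
    return 0;
}
```

                     In this model, pushing a fourth value into a full queue
                     fails unless MODEL_BPF_EXIST is passed, in which case
                     the oldest value is dropped first.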
2115
2116       long bpf_msg_push_data(struct sk_msg_buff *msg, u32 start, u32 len, u64
2117       flags)
2118
2119              Description
2120                     For socket policies, insert len bytes into msg at  offset
2121                     start.
2122
2123                     If a program of type BPF_PROG_TYPE_SK_MSG is run on a msg
2124                     it may want to insert metadata or options into  the  msg.
2125                     This can later be read and used by any of the lower layer
2126                     BPF hooks.
2127
2128                     This helper may fail under memory pressure (a malloc
2129                     fails). In these cases BPF programs get an appropriate
2130                     error and need to handle it.
2131
2132              Return 0 on success, or a negative error in case of failure.
2133
2134       long bpf_msg_pop_data(struct sk_msg_buff *msg, u32 start, u32 len,  u64
2135       flags)
2136
2137              Description
2138                     Remove len bytes from msg, starting at byte start. This
2139                     may result in ENOMEM errors under certain situations if an
2140                     allocation and copy are required due to a full ring
2141                     buffer. However, the helper will try to avoid doing the
2142                     allocation if possible. Other errors can occur if input
2143                     parameters are invalid, because the start byte is not a
2144                     valid part of the msg payload and/or because the pop value
2145                     is too large.
2146
2147              Return 0 on success, or a negative error in case of failure.
2148
2149       long bpf_rc_pointer_rel(void *ctx, s32 rel_x, s32 rel_y)
2150
2151              Description
2152                     This helper is used in programs implementing IR decoding,
2153                     to report a successfully decoded pointer movement.
2154
2155                     The  ctx  should  point to the lirc sample as passed into
2156                     the program.
2157
2158                     This helper is only available if the kernel was  compiled
2159                     with  the  CONFIG_BPF_LIRC_MODE2 configuration option set
2160                     to "y".
2161
2162              Return 0
2163
2164       long bpf_spin_lock(struct bpf_spin_lock *lock)
2165
2166              Description
2167                     Acquire a spinlock represented by the pointer lock, which
2168                     is stored as part of a value of a map. Taking the lock
2169                     allows the rest of the fields in that value to be updated
2170                     safely. The spinlock can (and must) later be released with
2171                     a call to bpf_spin_unlock(lock).
2172
2173                     Spinlocks in BPF programs come with a number of  restric‐
2174                     tions and constraints:
2175
2176                     • bpf_spin_lock  objects  are only allowed inside maps of
2177                       types BPF_MAP_TYPE_HASH  and  BPF_MAP_TYPE_ARRAY  (this
2178                       list could be extended in the future).
2179
2180                     • BTF description of the map is mandatory.
2181
2182                     • The BPF program can take ONE lock at a time, since tak‐
2183                       ing two or more could cause deadlocks.
2184
2185                     • Only one struct bpf_spin_lock is allowed per  map  ele‐
2186                       ment.
2187
2188                     • When  the  lock  is  taken, calls (either BPF to BPF or
2189                       helpers) are not allowed.
2190
2191                     • The BPF_LD_ABS and BPF_LD_IND instructions are not  al‐
2192                       lowed inside a spinlock-ed region.
2193
2194                     • The  BPF program MUST call bpf_spin_unlock() to release
2195                       the lock, on all execution paths, before it returns.
2196
2197                     • The BPF program can access  struct  bpf_spin_lock  only
2198                       via  the bpf_spin_lock() and bpf_spin_unlock() helpers.
2199                       Loading or storing data into the  struct  bpf_spin_lock
2200                       lock; field of a map is not allowed.
2201
2202                     • To use the bpf_spin_lock() helper, the BTF description
2203                       of the map value must be a struct and have a struct
2204                       bpf_spin_lock anyname; field at the top level. A lock
2205                       nested inside another struct is not allowed.
2206
2207                     • The struct bpf_spin_lock lock field in a map value must
2208                       be aligned on a multiple of 4 bytes in that value.
2209
2210                     • Syscall  with command BPF_MAP_LOOKUP_ELEM does not copy
2211                       the bpf_spin_lock field to user space.
2212
2213                     • Syscall with  command  BPF_MAP_UPDATE_ELEM,  or  update
2214                       from  a  BPF  program,  do not update the bpf_spin_lock
2215                       field.
2216
2217                     • bpf_spin_lock cannot be on the stack or inside  a  net‐
2218                       working packet (it can only be inside a map value).
2219
2220                     • bpf_spin_lock is available to root only.
2221
2222                     • Tracing  programs and socket filter programs cannot use
2223                       bpf_spin_lock() due to insufficient  preemption  checks
2224                       (but this may change in the future).
2225
2226                     • bpf_spin_lock   is   not   allowed  in  inner  maps  of
2227                       map-in-map.
2228
2229              Return 0
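
                     The layout constraints listed above (one lock per map
                     element, at the top level of the value struct, aligned
                     on a multiple of 4 bytes) can be checked with a small
                     user-space sketch. The types below are stand-ins, not
                     the kernel definitions: struct model_bpf_spin_lock is
                     assumed to be 4 bytes wide, and struct
                     model_counter_value is a hypothetical map value.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct bpf_spin_lock (assumed to be 4 bytes wide). */
struct model_bpf_spin_lock {
    uint32_t val;
};

/* Hypothetical map value: exactly one lock, declared at the top level. */
struct model_counter_value {
    uint64_t packets;
    uint64_t bytes;
    struct model_bpf_spin_lock lock;
};

/* Returns 1 if the lock field sits on a multiple of 4 bytes in the value. */
int model_lock_is_aligned(void)
{
    return offsetof(struct model_counter_value, lock) % 4 == 0;
}
```

                     In a BPF program, a value of this shape would be
                     obtained with bpf_map_lookup_elem(), and updates to
                     packets and bytes would be placed between
                     bpf_spin_lock(&value->lock) and
                     bpf_spin_unlock(&value->lock).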
2230
2231       long bpf_spin_unlock(struct bpf_spin_lock *lock)
2232
2233              Description
2234                     Release  the  lock  previously  locked  by  a   call   to
2235                     bpf_spin_lock(lock).
2236
2237              Return 0
2238
2239       struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)
2240
2241              Description
2242                     This  helper gets a struct bpf_sock pointer such that all
2243                     the fields in this bpf_sock can be accessed.
2244
2245              Return A struct bpf_sock pointer on success, or NULL in case  of
2246                     failure.
2247
2248       struct bpf_tcp_sock *bpf_tcp_sock(struct bpf_sock *sk)
2249
2250              Description
2251                     This  helper  gets  a  struct bpf_tcp_sock pointer from a
2252                     struct bpf_sock pointer.
2253
2254              Return A struct bpf_tcp_sock pointer on success, or NULL in case
2255                     of failure.
2256
2257       long bpf_skb_ecn_set_ce(struct sk_buff *skb)
2258
2259              Description
2260                     Set the ECN (Explicit Congestion Notification) field of
2261                     the IP header to CE (Congestion Encountered) if the
2262                     current value is ECT (ECN Capable Transport). Otherwise,
2263                     do nothing. Works with IPv6 and IPv4.
2264
2265              Return 1 if the CE flag is set (either  by  the  current  helper
2266                     call  or  because it was already present), 0 if it is not
2267                     set.
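
                     The return value rule can be illustrated on the two ECN
                     bits alone. The following is a user-space sketch, not
                     the kernel code: model_ecn_set_ce() is a hypothetical
                     name, it operates on a bare TOS or traffic class byte,
                     and it does not update the IPv4 header checksum as a
                     real implementation would have to. The codepoints are
                     those defined by RFC 3168.

```c
#include <stdint.h>

#define MODEL_ECN_MASK    0x3u  /* low two bits of TOS / traffic class */
#define MODEL_ECN_NOT_ECT 0x0u  /* not ECN capable */
#define MODEL_ECN_ECT_1   0x1u  /* ECT(1) */
#define MODEL_ECN_ECT_0   0x2u  /* ECT(0) */
#define MODEL_ECN_CE      0x3u  /* Congestion Encountered */

/* Returns 1 if CE is set after the call, 0 if the byte is left Not-ECT. */
int model_ecn_set_ce(uint8_t *tos)
{
    uint8_t ecn = *tos & MODEL_ECN_MASK;

    if (ecn == MODEL_ECN_NOT_ECT)
        return 0;
    /* ECT(0), ECT(1) and CE all end up as CE; other bits are untouched. */
    *tos |= MODEL_ECN_CE;
    return 1;
}
```

                     A byte already marked CE is left unchanged and still
                     reports 1, matching the "already present" case in the
                     return description.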
2268
2269       struct bpf_sock *bpf_get_listener_sock(struct bpf_sock *sk)
2270
2271              Description
2272                     Return a struct bpf_sock  pointer  in  TCP_LISTEN  state.
2273                     bpf_sk_release() is unnecessary and not allowed.
2274
2275              Return A  struct bpf_sock pointer on success, or NULL in case of
2276                     failure.
2277
2278       struct bpf_sock *bpf_skc_lookup_tcp(void  *ctx,  struct  bpf_sock_tuple
2279       *tuple, u32 tuple_size, u64 netns, u64 flags)
2280
2281              Description
2282                     Look for a TCP socket matching tuple, optionally in a
2283                     child network namespace netns. The return value must be
2284                     checked, and if non-NULL, released via bpf_sk_release().
2285
2286                     This function is identical to bpf_sk_lookup_tcp(), except
2287                     that it also returns timewait  or  request  sockets.  Use
2288                     bpf_sk_fullsock()  or  bpf_tcp_sock()  to access the full
2289                     structure.
2290
2291                     This helper is available only if the kernel was compiled
2292                     with the CONFIG_NET configuration option.
2293
2294              Return Pointer  to  struct bpf_sock, or NULL in case of failure.
2295                     For sockets with reuseport option,  the  struct  bpf_sock
2296                     result  is  from reuse->socks[] using the hash of the tu‐
2297                     ple.
2298
2299       long  bpf_tcp_check_syncookie(struct  bpf_sock  *sk,  void  *iph,   u32
2300       iph_len, struct tcphdr *th, u32 th_len)
2301
2302              Description
2303                     Check  whether  iph and th contain a valid SYN cookie ACK
2304                     for the listening socket in sk.
2305
2306                     iph points to the start of the IPv4 or IPv6 header, while
2307                     iph_len  contains  sizeof(struct  iphdr) or sizeof(struct
2308                     ipv6hdr).
2309
2310                     th points to the start of the TCP  header,  while  th_len
2311                     contains sizeof(struct tcphdr).
2312
2313              Return 0 if iph and th are a valid SYN cookie ACK, or a negative
2314                     error otherwise.
2315
2316       long bpf_sysctl_get_name(struct  bpf_sysctl  *ctx,  char  *buf,  size_t
2317       buf_len, u64 flags)
2318
2319              Description
2320                     Get the name of the sysctl in /proc/sys/ and copy it into
2321                     the program-provided buffer buf of size buf_len.
2322
2323                     The  buffer  is  always  NUL  terminated,   unless   it's
2324                     zero-sized.
2325
2326                     If flags is zero, the full name (e.g. "net/ipv4/tcp_mem")
2327                     is copied. Use the BPF_F_SYSCTL_BASE_NAME flag to copy the
2328                     base name only (e.g. "tcp_mem").
2329
2330              Return Number of characters copied (not including the trailing
2331                     NUL).
2332
2333                     -E2BIG if the buffer wasn't big enough (buf will  contain
2334                     the truncated name in this case).
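
                     The copy, truncation and return value rules can be
                     sketched in user space. This is a model, not the kernel
                     implementation: model_sysctl_get_name() is a
                     hypothetical name, the base_only parameter stands in
                     for the BPF_F_SYSCTL_BASE_NAME flag, and returning
                     -E2BIG for a zero-sized buffer is an assumption.

```c
#include <errno.h>
#include <string.h>

/* Copy the full sysctl name, or only its last path component. */
long model_sysctl_get_name(const char *full_name, char *buf,
                           size_t buf_len, int base_only)
{
    const char *name = full_name;
    size_t len;

    if (base_only) {
        const char *slash = strrchr(full_name, '/');

        if (slash)
            name = slash + 1;
    }
    if (buf_len == 0)
        return -E2BIG;          /* assumption: nothing can be stored */
    len = strlen(name);
    if (len >= buf_len) {
        /* Too small: keep a NUL-terminated, truncated copy. */
        memcpy(buf, name, buf_len - 1);
        buf[buf_len - 1] = '\0';
        return -E2BIG;
    }
    memcpy(buf, name, len + 1); /* includes the trailing NUL */
    return (long)len;           /* the NUL is not counted */
}
```

                     With a 4-byte buffer and base_only set, the model
                     stores "tcp" for "net/ipv4/tcp_mem" and reports
                     -E2BIG.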
2335
2336       long  bpf_sysctl_get_current_value(struct  bpf_sysctl  *ctx, char *buf,
2337       size_t buf_len)
2338
2339              Description
2340                     Get the current value of the sysctl as it is presented in
2341                     /proc/sys (incl. newline, etc.), and copy it as a string
2342                     into the program-provided buffer buf of size buf_len.
2343
2344                     The whole value is copied, regardless of the file position
2345                     at which user space issued e.g. sys_read.
2346
2347                     The   buffer   is  always  NUL  terminated,  unless  it's
2348                     zero-sized.
2349
2350              Return Number of characters copied (not including the trailing
2351                     NUL).
2352
2353                     -E2BIG  if the buffer wasn't big enough (buf will contain
2354                     the truncated value in this case).
2355
2356                     -EINVAL if current value was  unavailable,  e.g.  because
2357                     sysctl is uninitialized and read returns -EIO for it.
2358
2359       long bpf_sysctl_get_new_value(struct bpf_sysctl *ctx, char *buf, size_t
2360       buf_len)
2361
2362              Description
2363                     Get the new value being written by user space to sysctl
2364                     (before the actual write happens) and copy it as a string
2365                     into the program-provided buffer buf of size buf_len.
2366
2367                     User space may write a new value at file position > 0.
2368
2369                     The  buffer  is  always  NUL  terminated,   unless   it's
2370                     zero-sized.
2371
2372              Return Number of characters copied (not including the trailing
2373                     NUL).
2374
2375                     -E2BIG if the buffer wasn't big enough (buf will  contain
2376                     the truncated value in this case).
2377
2378                     -EINVAL if sysctl is being read.
2379
2380       long  bpf_sysctl_set_new_value(struct bpf_sysctl *ctx, const char *buf,
2381       size_t buf_len)
2382
2383              Description
2384                     Override the new value being written by user space to
2385                     sysctl with the value provided by the program in buffer
2386                     buf of size buf_len.
2387
2388                     buf should contain a string in same form as  provided  by
2389                     user space on sysctl write.
2390
2391                     User  space  may write new value at file position > 0. To
2392                     override the whole sysctl value file position  should  be
2393                     set to zero.
2394
2395              Return 0 on success.
2396
                     -E2BIG if buf_len is too big.
2398
2399                     -EINVAL if sysctl is being read.
2400
2401       long bpf_strtol(const char *buf, size_t buf_len, u64 flags, long *res)
2402
2403              Description
2404                     Convert the initial part of the string from buffer buf of
2405                     size buf_len to a long integer  according  to  the  given
2406                     base and save the result in res.
2407
2408                     The  string  may  begin with an arbitrary amount of white
2409                     space (as determined by isspace(3)) followed by a  single
2410                     optional '-' sign.
2411
                     The five least significant bits of flags encode the base;
                     other bits are currently unused.

                     The base must be 8, 10, or 16, or 0 to detect it
                     automatically, similar to user space strtol(3).
2417
2418              Return Number  of  characters consumed on success. Must be posi‐
2419                     tive but no more than buf_len.
2420
2421                     -EINVAL if no valid digits were found or unsupported base
2422                     was provided.
2423
2424                     -ERANGE if resulting value was out of range.
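
                     The return-value contract above can be approximated in
                     user space. The following is only a sketch built on
                     strtol(3), not the kernel implementation: model_strtol()
                     and MODEL_BASE_MASK are hypothetical names, the 64-byte
                     scratch buffer is an arbitrary limit of the sketch, and
                     strtol(3) also accepts a leading '+' sign, which the
                     description above does not mention.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* The five least significant bits of flags encode the base. */
#define MODEL_BASE_MASK 0x1fULL

/*
 * Hypothetical user-space model of the documented contract: returns the
 * number of characters consumed, -EINVAL for an unsupported base or when
 * no digits are found, and -ERANGE when the value does not fit in a long.
 */
static long model_strtol(const char *buf, size_t buf_len, uint64_t flags,
                         long *res)
{
    char tmp[64];   /* scratch copy so that strtol(3) sees a NUL terminator */
    char *end;
    unsigned int base = (unsigned int)(flags & MODEL_BASE_MASK);
    long val;

    if (base != 0 && base != 8 && base != 10 && base != 16)
        return -EINVAL;
    if (buf_len == 0)
        return -EINVAL;
    if (buf_len >= sizeof(tmp))
        buf_len = sizeof(tmp) - 1;   /* limit of this sketch only */
    memcpy(tmp, buf, buf_len);
    tmp[buf_len] = '\0';

    errno = 0;
    val = strtol(tmp, &end, (int)base);
    if (end == tmp)
        return -EINVAL;              /* no valid digits were found */
    if (errno == ERANGE)
        return -ERANGE;              /* value out of range for long */
    *res = val;                      /* res is only written on success */
    return (long)(end - tmp);        /* characters consumed */
}
```

                     A caller would typically treat any negative return as an
                     error and use a positive return to advance past the
                     parsed number, for example when a sysctl value holds
                     several integers separated by white space.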
2425
2426       long  bpf_strtoul(const  char *buf, size_t buf_len, u64 flags, unsigned
2427       long *res)
2428
2429              Description
2430                     Convert the initial part of the string from buffer buf of
2431                     size buf_len to an unsigned long integer according to the
2432                     given base and save the result in res.
2433
2434                     The string may begin with an arbitrary  amount  of  white
2435                     space (as determined by isspace(3)).
2436
                     The five least significant bits of flags encode the base;
                     other bits are currently unused.

                     The base must be 8, 10, or 16, or 0 to detect it
                     automatically, similar to user space strtoul(3).
2442
2443              Return Number  of  characters consumed on success. Must be posi‐
2444                     tive but no more than buf_len.
2445
2446                     -EINVAL if no valid digits were found or unsupported base
2447                     was provided.
2448
2449                     -ERANGE if resulting value was out of range.
2450
2451       void *bpf_sk_storage_get(struct bpf_map *map, struct bpf_sock *sk, void
2452       *value, u64 flags)
2453
2454              Description
2455                     Get a bpf-local-storage from a sk.
2456
                     Logically, it could be thought of as getting the value
                     from a map with sk as the key. From this perspective, the
                     usage is not much different from bpf_map_lookup_elem(map,
                     &sk), except that this helper enforces that the key must
                     be a full socket and that the map must also be a
                     BPF_MAP_TYPE_SK_STORAGE.
2463
2464                     Underneath,  the value is stored locally at sk instead of
2465                     the map.   The  map  is  used  as  the  bpf-local-storage
2466                     "type".  The  bpf-local-storage  "type" (i.e. the map) is
2467                     searched against all bpf-local-storages residing at sk.
2468
                     An optional flag (BPF_SK_STORAGE_GET_F_CREATE) can be used
                     so that a new bpf-local-storage is created if one does not
                     exist. value can be used together with
                     BPF_SK_STORAGE_GET_F_CREATE to specify the initial value
                     of a bpf-local-storage. If value is NULL, the new
                     bpf-local-storage will be zero initialized.
2475
2476              Return A bpf-local-storage pointer is returned on success.
2477
2478                     NULL  if  not found or there was an error in adding a new
2479                     bpf-local-storage.
2480
2481       long bpf_sk_storage_delete(struct bpf_map *map, struct bpf_sock *sk)
2482
2483              Description
2484                     Delete a bpf-local-storage from a sk.
2485
2486              Return 0 on success.
2487
2488                     -ENOENT if the bpf-local-storage cannot be found.
2489
2490       long bpf_send_signal(u32 sig)
2491
2492              Description
2493                     Send signal sig to the process of the current task.   The
2494                     signal may be delivered to any of this process's threads.
2495
2496              Return 0 on success or successfully queued.
2497
                     -EBUSY if the work queue under NMI is full.
2499
2500                     -EINVAL if sig is invalid.
2501
2502                     -EPERM if no permission to send the sig.
2503
2504                     -EAGAIN if bpf program can try again.
2505
2506       s64  bpf_tcp_gen_syncookie(struct bpf_sock *sk, void *iph, u32 iph_len,
2507       struct tcphdr *th, u32 th_len)
2508
2509              Description
2510                     Try to issue a SYN cookie for the packet with correspond‐
2511                     ing  IP/TCP  headers, iph and th, on the listening socket
2512                     in sk.
2513
                     iph points to the start of the IPv4 or IPv6 header, while
                     iph_len contains sizeof(struct iphdr) or sizeof(struct
                     ipv6hdr).
2517
2518                     th points to the start of the TCP  header,  while  th_len
2519                     contains the length of the TCP header.
2520
              Return On success, the lower 32 bits hold the generated SYN
                     cookie, followed by 16 bits which hold the MSS value for
                     that cookie; the top 16 bits are unused.
2524
2525                     On failure, the returned value is one of the following:
2526
2527                     -EINVAL SYN cookie cannot be issued due to error
2528
2529                     -ENOENT SYN cookie should not be issued (no SYN flood)
2530
2531                     -EOPNOTSUPP  kernel  configuration  does  not  enable SYN
2532                     cookies
2533
2534                     -EPROTONOSUPPORT IP packet version is not 4 or 6
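
                     The packed return value described above can be unpacked
                     as in the following sketch. struct syncookie_fields and
                     decode_syncookie() are hypothetical names used only for
                     illustration; ret stands for the value returned by
                     bpf_tcp_gen_syncookie().

```c
#include <stdint.h>

/* Hypothetical container for the two fields packed into the return value. */
struct syncookie_fields {
    uint32_t cookie;   /* lower 32 bits */
    uint16_t mss;      /* following 16 bits */
};

/*
 * Returns 0 and fills *out on success. A negative helper return is an
 * error code and is passed through unchanged; *out is left untouched.
 */
static int decode_syncookie(int64_t ret, struct syncookie_fields *out)
{
    if (ret < 0)
        return (int)ret;
    out->cookie = (uint32_t)((uint64_t)ret & 0xffffffffULL);
    out->mss = (uint16_t)(((uint64_t)ret >> 32) & 0xffffULL);
    return 0;
}
```

                     Checking the sign before masking matters: masking a
                     negative error code would otherwise yield a value that
                     looks like a cookie.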
2535
2536       long bpf_skb_output(void *ctx, struct bpf_map  *map,  u64  flags,  void
2537       *data, u64 size)
2538
2539              Description
2540                     Write raw data blob into a special BPF perf event held by
2541                     map  of  type  BPF_MAP_TYPE_PERF_EVENT_ARRAY.  This  perf
2542                     event must have the following attributes: PERF_SAMPLE_RAW
2543                     as   sample_type,   PERF_TYPE_SOFTWARE   as   type,   and
2544                     PERF_COUNT_SW_BPF_OUTPUT as config.
2545
2546                     The flags are used to indicate the index in map for which
2547                     the value must be put, masked with BPF_F_INDEX_MASK.  Al‐
2548                     ternatively, flags can be set to BPF_F_CURRENT_CPU to in‐
2549                     dicate that the index of the current CPU core  should  be
2550                     used.
2551
                     The value to write, of size bytes, is passed through the
                     eBPF stack and pointed to by data.
2554
2555                     ctx is a pointer to in-kernel struct sk_buff.
2556
2557                     This helper is similar to bpf_perf_event_output() but re‐
2558                     stricted to raw_tracepoint bpf programs.
2559
2560              Return 0 on success, or a negative error in case of failure.
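
                     As a sketch of how the flags argument described above is
                     composed, the following helpers build a flags value for
                     an explicit map index and test whether a flags value
                     selects the current CPU. The SKETCH_ macros mirror the
                     values of BPF_F_INDEX_MASK and BPF_F_CURRENT_CPU from the
                     kernel's UAPI header; the function names are
                     hypothetical. A real BPF program would use the UAPI
                     constants directly.

```c
#include <stdint.h>

#define SKETCH_F_INDEX_MASK  0xffffffffULL        /* BPF_F_INDEX_MASK */
#define SKETCH_F_CURRENT_CPU SKETCH_F_INDEX_MASK  /* BPF_F_CURRENT_CPU */

/* Build a flags value that selects an explicit index in the map. */
static uint64_t output_flags_for_index(uint32_t index)
{
    return (uint64_t)index & SKETCH_F_INDEX_MASK;
}

/* Non-zero when the index bits of flags request the current CPU. */
static int output_flags_use_current_cpu(uint64_t flags)
{
    return (flags & SKETCH_F_INDEX_MASK) == SKETCH_F_CURRENT_CPU;
}
```

                     Because BPF_F_CURRENT_CPU occupies the whole index mask,
                     an explicit index of 0xffffffff cannot be told apart from
                     a request for the current CPU.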
2561
2562       long bpf_probe_read_user(void *dst, u32 size, const void *unsafe_ptr)
2563
2564              Description
2565                     Safely attempt to read size bytes from user space address
2566                     unsafe_ptr and store the data in dst.
2567
2568              Return 0 on success, or a negative error in case of failure.
2569
2570       long bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)
2571
2572              Description
2573                     Safely attempt to read size bytes from kernel  space  ad‐
2574                     dress unsafe_ptr and store the data in dst.
2575
2576              Return 0 on success, or a negative error in case of failure.
2577
2578       long  bpf_probe_read_user_str(void  *dst,  u32  size,  const  void *un‐
2579       safe_ptr)
2580
2581              Description
2582                     Copy a NUL terminated string from an unsafe user  address
2583                     unsafe_ptr  to dst. The size should include the terminat‐
2584                     ing NUL byte. In case the string length is  smaller  than
2585                     size, the target is not padded with further NUL bytes. If
2586                     the string length is larger than size, just size-1  bytes
2587                     are copied and the last byte is set to NUL.
2588
                     On success, the length of the copied string is returned.
                     This makes this helper useful in tracing programs for
                     reading strings and, more importantly, for getting their
                     length at runtime. See the following snippet:
2593
                        SEC("kprobe/sys_open")
                        int bpf_sys_open(struct pt_regs *ctx)
                        {
                                char buf[PATHLEN]; // PATHLEN is defined to 256
                                // ctx->di holds the first argument on x86-64.
                                long res = bpf_probe_read_user_str(buf, sizeof(buf),
                                                                   (const void *)ctx->di);

                                // Consume buf, for example push it to
                                // userspace via bpf_perf_event_output(); we
                                // can use res (the string length) as event
                                // size, after checking its boundaries.
                                return 0;
                        }
2606
                     In comparison, using the bpf_probe_read_user() helper here
                     instead to read the string would require estimating the
                     length at compile time, and would often result in copying
                     more memory than necessary.
2611
                     Another useful use case is parsing individual process
                     arguments or individual environment variables by
                     navigating current->mm->arg_start and
                     current->mm->env_start: using this helper and its return
                     value, one can quickly iterate at the right offset of the
                     memory area.
2618
2619              Return On success, the strictly positive length of  the  string,
2620                     including  the  trailing NUL character. On error, a nega‐
2621                     tive value.
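
                     The iteration over a memory area of NUL-separated
                     strings mentioned above can be sketched in user space.
                     model_read_str() is a hypothetical stand-in that only
                     reproduces the documented copy and return semantics (it
                     performs no fault handling), and count_strings() is a
                     hypothetical caller. The sketch assumes the area ends
                     with a NUL byte.

```c
#include <stddef.h>

/*
 * Stand-in for the documented semantics: copy at most size - 1 bytes,
 * always NUL terminate, and return the copied length including the
 * trailing NUL. Rejecting size == 0 is a choice of this model.
 */
static long model_read_str(char *dst, size_t size, const char *src)
{
    size_t n = 0;

    if (size == 0)
        return -1;
    while (n < size - 1 && src[n] != '\0') {
        dst[n] = src[n];
        n++;
    }
    dst[n] = '\0';
    return (long)(n + 1);
}

/*
 * Count the NUL-separated strings in area[0..area_len). The return value
 * of each read tells how far to advance to reach the next string.
 */
static int count_strings(const char *area, size_t area_len)
{
    char buf[32];
    size_t off = 0;
    int count = 0;

    while (off < area_len) {
        long n = model_read_str(buf, sizeof(buf), area + off);
        size_t copied;

        if (n <= 0)
            break;
        copied = (size_t)n - 1;        /* bytes copied, without the NUL */
        if (area[off + copied] == '\0') {
            count++;                   /* reached the end of one string */
            off += copied + 1;         /* skip the string and its NUL */
        } else {
            off += copied;             /* truncated: keep reading it */
        }
    }
    return count;
}
```

                     The truncation branch is needed because, when a string
                     is longer than the buffer, the returned length counts a
                     NUL that was added by the copy and is not present in the
                     source area.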
2622
2623       long bpf_probe_read_kernel_str(void *dst, u32  size,  const  void  *un‐
2624       safe_ptr)
2625
2626              Description
2627                     Copy  a  NUL  terminated string from an unsafe kernel ad‐
2628                     dress  unsafe_ptr  to  dst.  Same   semantics   as   with
2629                     bpf_probe_read_user_str() apply.
2630
2631              Return On  success,  the strictly positive length of the string,
2632                     including the trailing NUL character. On error,  a  nega‐
2633                     tive value.
2634
2635       long bpf_tcp_send_ack(void *tp, u32 rcv_nxt)
2636
2637              Description
2638                     Send  out a tcp-ack. tp is the in-kernel struct tcp_sock.
2639                     rcv_nxt is the ack_seq to be sent out.
2640
2641              Return 0 on success, or a negative error in case of failure.
2642
2643       long bpf_send_signal_thread(u32 sig)
2644
2645              Description
2646                     Send signal sig to the thread corresponding to  the  cur‐
2647                     rent task.
2648
2649              Return 0 on success or successfully queued.
2650
                     -EBUSY if the work queue under NMI is full.
2652
2653                     -EINVAL if sig is invalid.
2654
2655                     -EPERM if no permission to send the sig.
2656
2657                     -EAGAIN if bpf program can try again.
2658
2659       u64 bpf_jiffies64(void)
2660
2661              Description
                     Obtain the 64-bit jiffies value.

              Return The 64-bit jiffies value.
2665
2666       long   bpf_read_branch_records(struct  bpf_perf_event_data  *ctx,  void
2667       *buf, u32 size, u64 flags)
2668
2669              Description
                     For an eBPF program attached to a perf event, retrieve
                     the branch records (struct perf_branch_entry) associated
                     with ctx and store them in the buffer pointed to by buf,
                     up to size bytes.
2674
2675              Return On  success,  number of bytes written to buf. On error, a
2676                     negative value.
2677
2678                     The flags can be set to BPF_F_GET_BRANCH_RECORDS_SIZE  to
2679                     instead  return the number of bytes required to store all
2680                     the branch entries. If this flag is set, buf may be NULL.
2681
2682                     -EINVAL if arguments invalid or size not  a  multiple  of
2683                     sizeof(struct perf_branch_entry).
2684
2685                     -ENOENT if architecture does not support branch records.
2686
2687       long    bpf_get_ns_current_pid_tgid(u64    dev,    u64    ino,   struct
2688       bpf_pidns_info *nsdata, u32 size)
2689
2690              Description
                     On success, the values for pid and tgid as seen from the
                     current namespace are returned in nsdata.
2693
2694              Return 0 on success, or one of the following in case of failure:
2695
                     -EINVAL if the dev and ino supplied don't match the dev_t
                     and inode number of the nsfs of the current task, or if
                     the conversion of dev to dev_t lost high bits.

                     -ENOENT if pidns does not exist for the current task.
2701
2702       long  bpf_xdp_output(void  *ctx,  struct  bpf_map *map, u64 flags, void
2703       *data, u64 size)
2704
2705              Description
2706                     Write raw data blob into a special BPF perf event held by
2707                     map  of  type  BPF_MAP_TYPE_PERF_EVENT_ARRAY.  This  perf
2708                     event must have the following attributes: PERF_SAMPLE_RAW
2709                     as   sample_type,   PERF_TYPE_SOFTWARE   as   type,   and
2710                     PERF_COUNT_SW_BPF_OUTPUT as config.
2711
2712                     The flags are used to indicate the index in map for which
2713                     the value must be put, masked with BPF_F_INDEX_MASK.  Al‐
2714                     ternatively, flags can be set to BPF_F_CURRENT_CPU to in‐
2715                     dicate  that  the index of the current CPU core should be
2716                     used.
2717
                     The value to write, of size bytes, is passed through the
                     eBPF stack and pointed to by data.
2720
2721                     ctx is a pointer to in-kernel struct xdp_buff.
2722
                     This helper is similar to bpf_perf_event_output() but
                     restricted to raw_tracepoint bpf programs.
2725
2726              Return 0 on success, or a negative error in case of failure.
2727
2728       u64 bpf_get_netns_cookie(void *ctx)
2729
2730              Description
2731                     Retrieve the cookie (generated by the kernel) of the net‐
2732                     work namespace the input ctx is associated with. The net‐
2733                     work namespace cookie remains stable for its lifetime and
2734                     provides  a global identifier that can be assumed unique.
2735                     If ctx is NULL, then the helper returns  the  cookie  for
2736                     the  initial network namespace. The cookie itself is very
2737                     similar to that of  bpf_get_socket_cookie()  helper,  but
2738                     for network namespaces instead of sockets.
2739
              Return An 8-byte long opaque number.
2741
2742       u64 bpf_get_current_ancestor_cgroup_id(int ancestor_level)
2743
2744              Description
2745                     Return id of cgroup v2 that is ancestor of the cgroup as‐
2746                     sociated with the current task at the ancestor_level. The
2747                     root  cgroup is at ancestor_level zero and each step down
2748                     the hierarchy increments the level. If ancestor_level  ==
2749                     level  of  cgroup  associated with the current task, then
2750                     return value will be the same  as  that  of  bpf_get_cur‐
2751                     rent_cgroup_id().
2752
                     The helper is useful to implement policies based on
                     cgroups that are higher in the hierarchy than the
                     immediate cgroup associated with the current task.

                     The format of the returned id and the helper limitations
                     are the same as in bpf_get_current_cgroup_id().
2759
2760              Return The id is returned or 0 in case the id could not  be  re‐
2761                     trieved.
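
                     The level numbering described above can be illustrated
                     with a small user-space model. It is not how the kernel
                     stores cgroups: path is a hypothetical array holding the
                     ids from the root (index 0) down to the cgroup of the
                     current task (index depth), and
                     model_ancestor_cgroup_id() is a hypothetical name.

```c
#include <stdint.h>

/*
 * path[0] is the id of the root cgroup and path[depth] the id of the
 * cgroup the task belongs to. Returns 0 when the requested level cannot
 * be resolved, mirroring the documented "0 on failure" convention.
 */
static uint64_t model_ancestor_cgroup_id(const uint64_t *path, int depth,
                                         int ancestor_level)
{
    if (ancestor_level < 0 || ancestor_level > depth)
        return 0;
    return path[ancestor_level];
}
```

                     With a task in a cgroup two levels below the root,
                     level 0 yields the root id, level 2 yields the same id
                     as the task's own cgroup, and level 3 yields 0.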
2762
2763       long bpf_sk_assign(struct sk_buff *skb, struct bpf_sock *sk, u64 flags)
2764
2765              Description
2766                     Helper  is overloaded depending on BPF program type. This
2767                     description  applies   to   BPF_PROG_TYPE_SCHED_CLS   and
2768                     BPF_PROG_TYPE_SCHED_ACT programs.
2769
                     Assign the sk to the skb. When combined with appropriate
                     routing configuration to receive the packet towards the
                     socket, this will cause skb to be delivered to the
                     specified socket. Subsequent redirection of skb via
                     bpf_redirect(), bpf_clone_redirect() or other methods
                     outside of BPF may interfere with successful delivery to
                     the socket.
2776
2777                     This operation is only valid from TC ingress path.
2778
2779                     The flags argument must be zero.
2780
2781              Return 0 on success, or a negative error in case of failure:
2782
2783                     -EINVAL if specified flags are not supported.
2784
2785                     -ENOENT if the socket is unavailable for assignment.
2786
2787                     -ENETUNREACH if the socket is unreachable (wrong netns).
2788
2789                     -EOPNOTSUPP if the operation is not supported, for  exam‐
2790                     ple a call from outside of TC ingress.
2791
2792                     -ESOCKTNOSUPPORT  if  the  socket  type  is not supported
2793                     (reuseport).
2794
2795       long bpf_sk_assign(struct bpf_sk_lookup *ctx, struct bpf_sock *sk,  u64
2796       flags)
2797
2798              Description
2799                     Helper  is overloaded depending on BPF program type. This
2800                     description applies to BPF_PROG_TYPE_SK_LOOKUP programs.
2801
2802                     Select the sk as a result of a socket lookup.
2803
                     For the operation to succeed, the passed socket must be
                     compatible with the packet description provided by the
                     ctx object.

                     The L4 protocol (IPPROTO_TCP or IPPROTO_UDP) must be an
                     exact match, while the IP family (AF_INET or AF_INET6)
                     must be compatible; that is, IPv6 sockets that are not
                     v6-only can be selected for IPv4 packets.
2812
2813                     Only TCP listeners and UDP unconnected sockets can be se‐
2814                     lected. sk can also be NULL to reset any previous  selec‐
2815                     tion.
2816
                     The flags argument can be a combination of the following
                     values:
2818
2819                     • BPF_SK_LOOKUP_F_REPLACE to override the previous socket
2820                       selection, potentially done by a BPF program  that  ran
2821                       before us.
2822
2823                     • BPF_SK_LOOKUP_F_NO_REUSEPORT   to  skip  load-balancing
2824                       within reuseport group for the socket being selected.
2825
2826                     On success ctx->sk will point to the selected socket.
2827
2828              Return 0 on success, or a negative errno in case of failure.
2829
2830                     • -EAFNOSUPPORT if socket family (sk->family) is not com‐
2831                       patible with packet family (ctx->family).
2832
2833                     • -EEXIST  if  socket  has  been already selected, poten‐
2834                       tially by another program, and  BPF_SK_LOOKUP_F_REPLACE
2835                       flag was not specified.
2836
2837                     • -EINVAL if unsupported flags were specified.
2838
2839                     • -EPROTOTYPE   if   socket  L4  protocol  (sk->protocol)
2840                       doesn't match packet protocol (ctx->protocol).
2841
2842                     • -ESOCKTNOSUPPORT if socket is not in allowed state (TCP
2843                       listening or UDP unconnected).
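
                     The compatibility rules and error codes listed above can
                     be summarized by the following user-space model. It is a
                     sketch, not the kernel code: the sketch_ structures and
                     function are hypothetical, the SKETCH_ macros mirror the
                     values of the corresponding BPF_SK_LOOKUP_F_ flags, and
                     the order in which the checks are made is an assumption
                     of the sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define SKETCH_F_REPLACE      (1ULL << 0)  /* BPF_SK_LOOKUP_F_REPLACE */
#define SKETCH_F_NO_REUSEPORT (1ULL << 1)  /* BPF_SK_LOOKUP_F_NO_REUSEPORT */

struct sketch_sock {
    int family;       /* AF_INET or AF_INET6 */
    int protocol;     /* IPPROTO_TCP or IPPROTO_UDP */
    bool v6only;      /* AF_INET6 socket restricted to IPv6 */
    bool selectable;  /* TCP listener or unconnected UDP socket */
};

struct sketch_lookup {
    int family;                          /* family of the packet */
    int protocol;                        /* L4 protocol of the packet */
    const struct sketch_sock *selected;  /* current selection, or NULL */
};

static int sketch_sk_assign(struct sketch_lookup *ctx,
                            const struct sketch_sock *sk, uint64_t flags)
{
    if (flags & ~(SKETCH_F_REPLACE | SKETCH_F_NO_REUSEPORT))
        return -EINVAL;
    if (sk) {
        if (!sk->selectable)
            return -ESOCKTNOSUPPORT;
        if (sk->protocol != ctx->protocol)
            return -EPROTOTYPE;          /* L4 protocol must match exactly */
        /* Only a non-v6-only IPv6 socket may take an IPv4 packet. */
        if (sk->family != ctx->family &&
            !(sk->family == AF_INET6 && ctx->family == AF_INET &&
              !sk->v6only))
            return -EAFNOSUPPORT;
    }
    if (ctx->selected && !(flags & SKETCH_F_REPLACE))
        return -EEXIST;
    ctx->selected = sk;                  /* NULL resets the selection */
    return 0;
}
```

                     The model ignores the effect of
                     BPF_SK_LOOKUP_F_NO_REUSEPORT beyond accepting it as a
                     valid flag, since reuseport load-balancing happens
                     outside the helper.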
2844
2845       u64 bpf_ktime_get_boot_ns(void)
2846
2847              Description
                     Return the time elapsed since system boot, in
                     nanoseconds. This includes the time the system was
                     suspended. See: clock_gettime(CLOCK_BOOTTIME)
2851
2852              Return Current ktime.
2853
2854       long  bpf_seq_printf(struct seq_file *m, const char *fmt, u32 fmt_size,
2855       const void *data, u32 data_len)
2856
2857              Description
2858                     bpf_seq_printf() uses seq_file seq_printf() to print  out
2859                     the  format  string.   The m represents the seq_file. The
2860                     fmt and fmt_size are for the format  string  itself.  The
2861                     data  and  data_len are format string arguments. The data
2862                     are a u64 array and corresponding  format  string  values
2863                     are  stored  in the array. For strings and pointers where
2864                     pointees are accessed, only the pointer values are stored
2865                     in  the  data array.  The data_len is the size of data in
2866                     bytes.
2867
                     Formats %s and %p{i,I}{4,6} require reading kernel memory.
                     Reading kernel memory may fail due to either an invalid
                     address or a valid address that requires a major memory
                     fault. If reading kernel memory fails, the string for %s
                     will be an empty string, and the ip address for
                     %p{i,I}{4,6} will be 0. Not returning an error to the bpf
                     program is consistent with what bpf_trace_printk() does
                     for now.
2876
2877              Return 0 on success, or a negative error in case of failure:
2878
2879                     -EBUSY  if  per-CPU  memory  copy buffer is busy, can try
2880                     again by returning 1 from bpf program.
2881
2882                     -EINVAL if arguments  are  invalid,  or  if  fmt  is  in‐
2883                     valid/unsupported.
2884
2885                     -E2BIG if fmt contains too many format specifiers.
2886
2887                     -EOVERFLOW  if an overflow happened: The same object will
2888                     be tried again.
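
                     The layout of the data argument described above can be
                     sketched as follows for a format string with one integer
                     and one string conversion. pack_two_args() is a
                     hypothetical helper written for illustration; it only
                     shows that every argument occupies one u64 slot, that a
                     string contributes its pointer value, and that data_len
                     is counted in bytes.

```c
#include <stdint.h>

/*
 * Fill a two-element u64 array for a format such as "%u %s" and return
 * the matching data_len in bytes.
 */
static uint32_t pack_two_args(uint64_t data[2], uint32_t pid,
                              const char *comm)
{
    data[0] = pid;                         /* integer: the value itself */
    data[1] = (uint64_t)(uintptr_t)comm;   /* string: only the pointer */
    return (uint32_t)(2 * sizeof(uint64_t));
}
```

                     Passing the element count instead of the byte size as
                     data_len is an easy mistake to make with this interface,
                     which is why the sketch returns the size explicitly.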
2889
2890       long bpf_seq_write(struct seq_file *m, const void *data, u32 len)
2891
2892              Description
2893                     bpf_seq_write() uses seq_file seq_write()  to  write  the
2894                     data.   The  m  represents the seq_file. The data and len
2895                     represent the data to write in bytes.
2896
2897              Return 0 on success, or a negative error in case of failure:
2898
2899                     -EOVERFLOW if an overflow happened: The same object  will
2900                     be tried again.
2901
2902       u64 bpf_sk_cgroup_id(struct bpf_sock *sk)
2903
2904              Description
2905                     Return the cgroup v2 id of the socket sk.
2906
                     sk must be a non-NULL pointer to a full socket, e.g. one
                     returned from bpf_sk_lookup_xxx(), bpf_sk_fullsock(),
                     etc. The format of the returned id is the same as in
                     bpf_skb_cgroup_id().
2911
2912                     This helper is available only if the kernel was  compiled
2913                     with the CONFIG_SOCK_CGROUP_DATA configuration option.
2914
2915              Return The  id  is returned or 0 in case the id could not be re‐
2916                     trieved.
2917
2918       u64 bpf_sk_ancestor_cgroup_id(struct bpf_sock *sk, int ancestor_level)
2919
2920              Description
                     Return id of cgroup v2 that is ancestor of the cgroup
                     associated with the sk at the ancestor_level. The root
                     cgroup is at ancestor_level zero and each step down the
                     hierarchy increments the level. If ancestor_level ==
                     level of cgroup associated with sk, then the return value
                     will be the same as that of bpf_sk_cgroup_id().
2927
                     The helper is useful to implement policies based on
                     cgroups that are higher in the hierarchy than the
                     immediate cgroup associated with sk.

                     The format of the returned id and the helper limitations
                     are the same as in bpf_sk_cgroup_id().
2934
2935              Return The id is returned or 0 in case the id could not  be  re‐
2936                     trieved.
2937
2938       long bpf_ringbuf_output(void *ringbuf, void *data, u64 size, u64 flags)
2939
2940              Description
2941                     Copy size bytes from data into a ring buffer ringbuf.  If
2942                     BPF_RB_NO_WAKEUP is specified in flags,  no  notification
2943                     of new data availability is sent.  If BPF_RB_FORCE_WAKEUP
2944                     is specified in flags, notification of  new  data  avail‐
2945                     ability is sent unconditionally.
2946
2947              Return 0 on success, or a negative error in case of failure.
2948
2949       void *bpf_ringbuf_reserve(void *ringbuf, u64 size, u64 flags)
2950
2951              Description
2952                     Reserve size bytes of payload in a ring buffer ringbuf.
2953
2954              Return Valid  pointer with size bytes of memory available; NULL,
2955                     otherwise.
2956
2957       void bpf_ringbuf_submit(void *data, u64 flags)
2958
2959              Description
2960                     Submit reserved ring buffer sample, pointed to  by  data.
2961                     If  BPF_RB_NO_WAKEUP  is specified in flags, no notifica‐
2962                     tion   of   new   data   availability   is   sent.     If
2963                     BPF_RB_FORCE_WAKEUP  is  specified in flags, notification
2964                     of new data availability is sent unconditionally.
2965
2966              Return Nothing. Always succeeds.
2967
2968       void bpf_ringbuf_discard(void *data, u64 flags)
2969
2970              Description
2971                     Discard reserved ring buffer sample, pointed to by  data.
2972                     If  BPF_RB_NO_WAKEUP  is specified in flags, no notifica‐
2973                     tion   of   new   data   availability   is   sent.     If
2974                     BPF_RB_FORCE_WAKEUP  is  specified in flags, notification
2975                     of new data availability is sent unconditionally.
2976
2977              Return Nothing. Always succeeds.
2978
2979       u64 bpf_ringbuf_query(void *ringbuf, u64 flags)
2980
2981              Description
                     Query various characteristics of the provided ring buffer.
                     What exactly is queried is determined by flags:
2984
2985                     • BPF_RB_AVAIL_DATA: Amount of data not yet consumed.
2986
2987                     • BPF_RB_RING_SIZE: The size of ring buffer.
2988
2989                     • BPF_RB_CONS_POS: Consumer position (can wrap around).
2990
2991                     • BPF_RB_PROD_POS:   Producer(s)   position   (can   wrap
2992                       around).
2993
                     The data returned is just a momentary snapshot of the
                     actual values and could be inaccurate, so this facility
                     should be used to power heuristics and for reporting, not
                     to make 100% correct calculations.
2998
2999              Return Requested value, or 0, if flags are not recognized.
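
                     One heuristic of the kind mentioned above is to batch
                     consumer notifications: suppress the wakeup while little
                     data is pending and force it once enough has
                     accumulated. The sketch below assumes that avail_data is
                     the value returned by bpf_ringbuf_query() with
                     BPF_RB_AVAIL_DATA and that the result is passed as flags
                     to bpf_ringbuf_submit() or bpf_ringbuf_output().
                     pick_wakeup_flags() and batch_bytes are hypothetical,
                     and the SKETCH_ macros mirror the values of
                     BPF_RB_NO_WAKEUP and BPF_RB_FORCE_WAKEUP.

```c
#include <stdint.h>

#define SKETCH_RB_NO_WAKEUP    (1ULL << 0)  /* BPF_RB_NO_WAKEUP */
#define SKETCH_RB_FORCE_WAKEUP (1ULL << 1)  /* BPF_RB_FORCE_WAKEUP */

/*
 * Choose the notification flag from a snapshot of the unconsumed data:
 * force a wakeup once at least batch_bytes are pending, otherwise send
 * no notification.
 */
static uint64_t pick_wakeup_flags(uint64_t avail_data, uint64_t batch_bytes)
{
    return avail_data >= batch_bytes ? SKETCH_RB_FORCE_WAKEUP
                                     : SKETCH_RB_NO_WAKEUP;
}
```

                     Since the queried value is only a snapshot, such a
                     policy can delay a notification slightly; it is meant
                     for throughput tuning, not for exact accounting.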
3000
3001       long bpf_csum_level(struct sk_buff *skb, u64 level)
3002
3003              Description
                     Change the skb's checksum level by one layer up or down,
                     or reset it entirely to none in order to have the stack
                     perform checksum validation. The level is applicable to
                     the following protocols: TCP, UDP, GRE, SCTP, FCOE. For
                     example, a decap of | ETH | IP | UDP | GUE | IP | TCP |
                     into | ETH | IP | TCP | through the bpf_skb_adjust_room()
                     helper, passing in the BPF_F_ADJ_ROOM_NO_CSUM_RESET flag,
                     would require one call to bpf_csum_level() with
                     BPF_CSUM_LEVEL_DEC since the UDP header is removed.
                     Similarly, an encap of the latter into the former could
                     be accompanied by a helper call to bpf_csum_level() with
                     BPF_CSUM_LEVEL_INC if the skb is still intended to be
                     processed in higher layers of the stack instead of just
                     egressing at tc.
3018
                     There are four supported level settings at this time:
3020
3021                     • BPF_CSUM_LEVEL_INC: Increases skb->csum_level for  skbs
3022                       with CHECKSUM_UNNECESSARY.
3023
3024                     • BPF_CSUM_LEVEL_DEC:  Decreases skb->csum_level for skbs
3025                       with CHECKSUM_UNNECESSARY.
3026
3027                     • BPF_CSUM_LEVEL_RESET: Resets skb->csum_level to  0  and
3028                       sets  CHECKSUM_NONE to force checksum validation by the
3029                       stack.
3030
3031                     • BPF_CSUM_LEVEL_QUERY:  No-op,   returns   the   current
3032                       skb->csum_level.
3033
3034              Return 0  on success, or a negative error in case of failure. In
3035                     the   case   of   BPF_CSUM_LEVEL_QUERY,    the    current
3036                     skb->csum_level  is returned or the error code -EACCES in
3037                     case the skb is not subject to CHECKSUM_UNNECESSARY.
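
                         As a rough illustration of the four level settings,
                         the sketch below models them in plain userspace C. It
                         is not kernel code: the model_ names and the two-field
                         struct are hypothetical stand-ins, the -EINVAL result
                         for unknown selectors is an assumption, and cases the
                         description does not cover (decrementing at level 0,
                         or any upper bound on the level) are left out.

```c
#include <errno.h>

/* Hypothetical stand-ins for the two checksum states named above. */
enum model_summed { MODEL_CHECKSUM_NONE, MODEL_CHECKSUM_UNNECESSARY };

/* Hypothetical stand-ins for the BPF_CSUM_LEVEL_* selectors. */
enum model_level {
	MODEL_LEVEL_QUERY,
	MODEL_LEVEL_INC,
	MODEL_LEVEL_DEC,
	MODEL_LEVEL_RESET,
};

/* Only the two fields that the description talks about. */
struct model_skb {
	enum model_summed ip_summed;
	unsigned int csum_level;
};

/* Userspace model of the documented semantics, not the kernel helper. */
static long model_csum_level(struct model_skb *skb, enum model_level level)
{
	int unnecessary = skb->ip_summed == MODEL_CHECKSUM_UNNECESSARY;

	switch (level) {
	case MODEL_LEVEL_INC:
		/* One more validated layer, e.g. after an encap. */
		if (unnecessary)
			skb->csum_level++;
		return 0;
	case MODEL_LEVEL_DEC:
		/* One validated layer removed, e.g. after a decap.
		 * Level 0 is left alone: that case is not modeled. */
		if (unnecessary && skb->csum_level > 0)
			skb->csum_level--;
		return 0;
	case MODEL_LEVEL_RESET:
		/* Force the stack to validate checksums again. */
		skb->csum_level = 0;
		skb->ip_summed = MODEL_CHECKSUM_NONE;
		return 0;
	case MODEL_LEVEL_QUERY:
		return unnecessary ? (long)skb->csum_level : -EACCES;
	default:
		/* Assumption: unknown selectors are rejected. */
		return -EINVAL;
	}
}
```

                         In the decap example above, a model skb that starts at
                         level 1 would report level 0 after one
                         MODEL_LEVEL_DEC call, and a query after
                         MODEL_LEVEL_RESET would yield -EACCES.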
3038
3039       struct tcp6_sock *bpf_skc_to_tcp6_sock(void *sk)
3040
3041              Description
3042                     Dynamically cast a sk pointer to a tcp6_sock pointer.
3043
3044              Return sk if casting is valid, or NULL otherwise.
3045
3046       struct tcp_sock *bpf_skc_to_tcp_sock(void *sk)
3047
3048              Description
3049                     Dynamically cast a sk pointer to a tcp_sock pointer.
3050
3051              Return sk if casting is valid, or NULL otherwise.
3052
3053       struct tcp_timewait_sock *bpf_skc_to_tcp_timewait_sock(void *sk)
3054
3055              Description
3056                     Dynamically cast a  sk  pointer  to  a  tcp_timewait_sock
3057                     pointer.
3058
3059              Return sk if casting is valid, or NULL otherwise.
3060
3061       struct tcp_request_sock *bpf_skc_to_tcp_request_sock(void *sk)
3062
3063              Description
3064                     Dynamically  cast  a  sk  pointer  to  a tcp_request_sock
3065                     pointer.
3066
3067              Return sk if casting is valid, or NULL otherwise.
3068
3069       struct udp6_sock *bpf_skc_to_udp6_sock(void *sk)
3070
3071              Description
3072                     Dynamically cast a sk pointer to a udp6_sock pointer.
3073
3074              Return sk if casting is valid, or NULL otherwise.
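
                         The bpf_skc_to_*() helpers above share one contract:
                         the result is either the same pointer, typed more
                         specifically, or NULL. The userspace sketch below
                         mimics that contract to show the caller-side NULL
                         check. How the kernel decides whether a cast is valid
                         is not covered here; the kind tag and all model_ names
                         are hypothetical.

```c
#include <stddef.h>

/* Hypothetical tag standing in for whatever the kernel inspects. */
enum model_kind { MODEL_TCP, MODEL_TCP6, MODEL_UDP6 };

/* Generic socket: the common first member of every specific type. */
struct model_sock {
	enum model_kind kind;
};

/* A more specific socket type that embeds the generic one first. */
struct model_tcp6_sock {
	struct model_sock sk;
	int placeholder_field;
};

/* Dynamic cast: the same pointer if the kind matches, NULL otherwise. */
static struct model_tcp6_sock *model_skc_to_tcp6_sock(void *sk)
{
	struct model_sock *base = sk;

	if (!base || base->kind != MODEL_TCP6)
		return NULL;
	return (struct model_tcp6_sock *)base;
}

/* Caller-side pattern: test the result before dereferencing it. */
static int model_read_field(void *sk, int fallback)
{
	struct model_tcp6_sock *tp6 = model_skc_to_tcp6_sock(sk);

	return tp6 ? tp6->placeholder_field : fallback;
}
```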
3075
3076       long bpf_get_task_stack(struct task_struct *task, void *buf, u32  size,
3077       u64 flags)
3078
3079              Description
3080                     Return a user or a kernel stack in a buffer provided by
3081                     the bpf program. To achieve this, the helper needs task,
3082                     which is a valid pointer to struct task_struct. To store
3083                     the stack trace, the bpf program provides buf with a
3084                     nonnegative size.
3085
3086                     The  last  argument,  flags,  holds  the  number of stack
3087                     frames  to  skip   (from   0   to   255),   masked   with
3088                     BPF_F_SKIP_FIELD_MASK.  The  next bits can be used to set
3089                     the following flags:
3090
3091                     BPF_F_USER_STACK
3092                            Collect a user space stack  instead  of  a  kernel
3093                            stack.
3094
3095                     BPF_F_USER_BUILD_ID
3096                            Collect  buildid+offset  instead  of  ips for user
3097                            stack, only  valid  if  BPF_F_USER_STACK  is  also
3098                            specified.
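
                         The layout of flags described above can be sketched as
                         a small userspace function. The MODEL_F_ constants are
                         local stand-ins whose values are assumed to mirror
                         include/uapi/linux/bpf.h; check them against your
                         kernel headers. The function name and its validation
                         policy are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed to mirror include/uapi/linux/bpf.h; verify before relying on them. */
#define MODEL_F_SKIP_FIELD_MASK 0xffULL
#define MODEL_F_USER_STACK      (1ULL << 8)
#define MODEL_F_USER_BUILD_ID   (1ULL << 11)

/* Build a flags word. Returns 0 and stores it in *flags, or -EINVAL. */
static int model_stack_flags(unsigned int skip, int user, int build_id,
			     uint64_t *flags)
{
	/* The skip count must fit in the low bits (0 to 255). */
	if (skip > MODEL_F_SKIP_FIELD_MASK)
		return -EINVAL;
	/* Build IDs are only valid together with a user stack. */
	if (build_id && !user)
		return -EINVAL;

	*flags = skip;
	if (user)
		*flags |= MODEL_F_USER_STACK;
	if (build_id)
		*flags |= MODEL_F_USER_BUILD_ID;
	return 0;
}
```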
3099
3100                     bpf_get_task_stack() can collect up to
3101                     PERF_MAX_STACK_DEPTH kernel and user frames, provided the
3102                     buffer is large enough. Note that this limit can be
3103                     controlled with the sysctl program, and that it should be
3104                     manually increased in order to profile long user stacks
3105                     (such as stacks for Java programs). To do so, use:
3106
3107                        # sysctl kernel.perf_event_max_stack=<new value>
3108
3109              Return A non-negative value equal to or less than size  on  suc‐
3110                     cess, or a negative error in case of failure.
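
                         A caller typically turns the return value into a frame
                         count before walking buf. The minimal sketch below
                         assumes the return value is a byte count and that each
                         frame is one 64-bit instruction pointer, that is, that
                         BPF_F_USER_BUILD_ID was not set; the function name is
                         hypothetical.

```c
#include <stdint.h>

/*
 * Convert the helper's return value into a number of frames.
 * Negative values are errors and are passed through unchanged.
 * Assumes one 64-bit instruction pointer per frame.
 */
static long model_frame_count(long ret)
{
	if (ret < 0)
		return ret;
	return ret / (long)sizeof(uint64_t);
}
```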
3111

EXAMPLES

3113       Examples of usage for most of the eBPF helpers listed in this manual
3114       page are available within the Linux kernel sources, at the following
3115       locations:
3116
3117       • samples/bpf/
3118
3119       • tools/testing/selftests/bpf/
3120

LICENSE

3122       eBPF  programs  can  have  an associated license, passed along with the
3123       bytecode instructions to the kernel when the programs are  loaded.  The
3124       format for that string is identical to the one in use for kernel
3125       modules (dual licenses, such as "Dual BSD/GPL", may be used). Some
3126       helper functions are only accessible to programs that are compatible
3127       with the GNU General Public License (GPL).
3128
3129       In order to use such helpers, the eBPF program must be loaded with the
3130       correct license string passed (via attr) to the bpf() system call. In
3131       practice, this means that the C source code of the program contains a
3132       line similar to the following:
3133
3134          char ____license[] __attribute__((section("license"), used)) = "GPL";
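
           To make the GPL requirement concrete, the userspace sketch below
           decides whether a program with a given license string may call a
           GPL-only helper. The list of accepted strings is an assumption
           modelled on the kernel's module license check; verify it against
           include/linux/license.h in your tree. All model_ names are
           hypothetical.

```c
#include <string.h>

/* Assumed list of GPL-compatible license strings; verify against the kernel. */
static const char *const model_gpl_compatible[] = {
	"GPL",
	"GPL v2",
	"GPL and additional rights",
	"Dual BSD/GPL",
	"Dual MIT/GPL",
	"Dual MPL/GPL",
};

/* Returns 1 if the license string is in the list above, 0 otherwise. */
static int model_license_is_gpl_compatible(const char *license)
{
	size_t n = sizeof(model_gpl_compatible) / sizeof(model_gpl_compatible[0]);
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(license, model_gpl_compatible[i]) == 0)
			return 1;
	return 0;
}

/* A GPL-only helper is usable only by a program with a compatible license. */
static int model_helper_allowed(int helper_gpl_only, const char *license)
{
	return !helper_gpl_only || model_license_is_gpl_compatible(license);
}
```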
3135

IMPLEMENTATION

3137       This manual page is an effort to document the existing eBPF helper
3138       functions. As of this writing, however, the BPF subsystem is under
3139       heavy development: new eBPF program and map types are added, along
3140       with new helper functions, and some helpers are occasionally made
3141       available to additional program types. So in spite of the efforts of
3142       the community, this page might not be up to date. To check for
3143       yourself which helper functions exist in your kernel, or which
3144       program types they support, the following files in the kernel tree
3145       are a good starting point:
3146
3147       • include/uapi/linux/bpf.h is the main BPF header. It contains the full
3148         list of all helper functions, as well as many other  BPF  definitions
3149         including  most  of  the  flags,  structs  or  constants  used by the
3150         helpers.
3151
3152       • net/core/filter.c contains the  definition  of  most  network-related
3153         helper  functions,  and the list of program types from which they can
3154         be used.
3155
3156       • kernel/trace/bpf_trace.c is the  equivalent  for  most  tracing  pro‐
3157         gram-related helpers.
3158
3159       • kernel/bpf/verifier.c contains the functions used to check that valid
3160         types of eBPF maps are used with a given helper function.
3161
3162       • The kernel/bpf/ directory contains other files in which additional
3163         helpers are defined (for cgroups, sockmaps, etc.).
3164
3165       • The  bpftool  utility can be used to probe the availability of helper
3166         functions on the system (as well as supported program and map  types,
3167         and  a  number  of  other  parameters). To do so, run bpftool feature
3168         probe (see bpftool-feature(8) for details). Add the unprivileged key‐
3169         word to list features available to unprivileged users.
3170
3171       Compatibility  between helper functions and program types can generally
3172       be found in the files where helper functions are defined. Look for  the
3173       struct  bpf_func_proto  objects and for functions returning them: these
3174       functions contain a list of helpers that a given program type can call.
3175       Note that the default: label of the switch ... case  used  to  filter
3176       helpers can call other functions, which themselves allow access to
3177       additional helpers. The GPL license requirement is also recorded in
3178       those struct bpf_func_proto objects.
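
           The lookup pattern described in this paragraph can be sketched in
           userspace C as follows. This is a reduced model, not kernel code:
           struct model_func_proto keeps only a name and a GPL marker, and the
           helper identifiers, the function names, and the gpl_only values are
           illustrative assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper identifiers for this sketch. */
enum model_func_id {
	MODEL_FUNC_MAP_LOOKUP,
	MODEL_FUNC_TRACE_PRINTK,
	MODEL_FUNC_CSUM_LEVEL,
	MODEL_FUNC_GET_TASK_STACK,
};

/* Reduced stand-in for struct bpf_func_proto. */
struct model_func_proto {
	const char *name;
	bool gpl_only; /* illustrative values only */
};

static const struct model_func_proto model_map_lookup_proto = {
	"map_lookup_elem", false
};
static const struct model_func_proto model_trace_printk_proto = {
	"trace_printk", true
};
static const struct model_func_proto model_csum_level_proto = {
	"csum_level", false
};

/* Helpers shared by several program types. */
static const struct model_func_proto *
model_base_func_proto(enum model_func_id id)
{
	switch (id) {
	case MODEL_FUNC_MAP_LOOKUP:
		return &model_map_lookup_proto;
	case MODEL_FUNC_TRACE_PRINTK:
		return &model_trace_printk_proto;
	default:
		return NULL; /* not callable from this program type */
	}
}

/* Helpers for one hypothetical networking program type. */
static const struct model_func_proto *
model_net_func_proto(enum model_func_id id)
{
	switch (id) {
	case MODEL_FUNC_CSUM_LEVEL:
		return &model_csum_level_proto;
	default:
		/* The default: label widens the set by delegating. */
		return model_base_func_proto(id);
	}
}
```

           Reading only the explicit case labels of model_net_func_proto()
           would miss the two helpers reached through its default: label,
           which is why the delegated function has to be inspected as well.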
3179
3180       Compatibility between helper functions and map types can  be  found  in
3181       the  check_map_func_compatibility()  function  in file kernel/bpf/veri‐
3182       fier.c.
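
           A compatibility check of this kind usually works in both
           directions: some map types accept only certain helpers, and some
           helpers accept only certain map types. The userspace sketch below
           shows that shape with a single pairing (tail calls and program
           arrays). It is a simplified model with hypothetical names, not a
           copy of the kernel function.

```c
#include <errno.h>

/* Hypothetical, reduced sets of map types and helpers. */
enum model_map_type { MODEL_MAP_HASH, MODEL_MAP_PROG_ARRAY };
enum model_helper { MODEL_HELPER_MAP_LOOKUP, MODEL_HELPER_TAIL_CALL };

/* Returns 0 if the helper may be used with the map type, -EINVAL if not. */
static int model_check_map_func_compat(enum model_map_type map,
				       enum model_helper func)
{
	/* Direction 1: a program array may only be used for tail calls. */
	if (map == MODEL_MAP_PROG_ARRAY && func != MODEL_HELPER_TAIL_CALL)
		return -EINVAL;

	/* Direction 2: a tail call only accepts a program array. */
	if (func == MODEL_HELPER_TAIL_CALL && map != MODEL_MAP_PROG_ARRAY)
		return -EINVAL;

	return 0;
}
```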
3183
3184       Helper functions that invalidate the checks on data and data_end point‐
3185       ers     for    network    processing    are    listed    in    function
3186       bpf_helper_changes_pkt_data() in file net/core/filter.c.
3187

COLOPHON

3193       This  page  is  part of release 5.10 of the Linux man-pages project.  A
3194       description of the project, information about reporting bugs,  and  the
3195       latest     version     of     this    page,    can    be    found    at
3196       https://www.kernel.org/doc/man-pages/.
3197
3198
3199
3200                                                                BPF-HELPERS(7)