bpf-helpers(7)

BPF-HELPERS(7)             Linux Programmer's Manual            BPF-HELPERS(7)

NAME

       BPF-HELPERS - list of eBPF helper functions

DESCRIPTION

       The extended Berkeley Packet Filter (eBPF) subsystem consists of
       programs written in a pseudo-assembly language, then attached to one
       of the several kernel hooks and run in reaction to specific events.
       This framework differs from the older, "classic" BPF (or "cBPF") in
       several aspects, one of them being the ability to call special
       functions (or "helpers") from within a program. These functions are
       restricted to a white-list of helpers defined in the kernel.

       These helpers are used by eBPF programs to interact with the system,
       or with the context in which they work. For instance, they can be
       used to print debugging messages, to get the time since the system
       was booted, to interact with eBPF maps, or to manipulate network
       packets. Since there are several eBPF program types, and they do not
       run in the same context, each program type can only call a subset of
       those helpers.

       Due to eBPF conventions, a helper cannot have more than five
       arguments.

       Internally, eBPF programs call directly into the compiled helper
       functions without requiring any foreign-function interface. As a
       result, calling helpers introduces no additional overhead, thus
       offering excellent performance.

       This document is an attempt to list and document the helpers
       available to eBPF developers. They are sorted in chronological order
       (the oldest helpers in the kernel at the top).

HELPERS

       void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)

              Description
                     Perform a lookup in map for an entry associated with key.

              Return Map value associated with key, or NULL if no entry was
                     found.

       int bpf_map_update_elem(struct bpf_map *map, const void *key, const
       void *value, u64 flags)

              Description
                     Add or update the value of the entry associated with key
                     in map with value. flags is one of:

                     BPF_NOEXIST
                            The entry for key must not exist in the map.

                     BPF_EXIST
                            The entry for key must already exist in the map.

                     BPF_ANY
                            No condition on the existence of the entry for
                            key.

                     Flag value BPF_NOEXIST cannot be used for maps of types
                     BPF_MAP_TYPE_ARRAY or BPF_MAP_TYPE_PERCPU_ARRAY (all
                     elements always exist); the helper would return an error.

              Return 0 on success, or a negative error in case of failure.

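                     The existence checks behind the three flag values of
                     bpf_map_update_elem() can be sketched in plain user-space
                     C. This is a toy model, not kernel code: struct toy_map
                     and toy_map_update() are hypothetical names, the "map"
                     holds a single entry, and the specific error codes
                     returned are an assumption of this sketch.

```c
#include <errno.h>
#include <stdint.h>

/* Flag values as defined in the kernel UAPI header <linux/bpf.h>. */
#define BPF_ANY     0 /* create a new entry or update an existing one */
#define BPF_NOEXIST 1 /* create a new entry only if it does not exist */
#define BPF_EXIST   2 /* update an existing entry only */

/* Hypothetical one-slot "map", used only to illustrate the flag checks. */
struct toy_map {
        int      present;   /* non-zero once an entry has been stored */
        uint32_t key;
        uint64_t value;
};

/* Mimics the existence checks described for bpf_map_update_elem(). */
static int toy_map_update(struct toy_map *m, uint32_t key, uint64_t value,
                          uint64_t flags)
{
        int exists = m->present && m->key == key;

        if (flags > BPF_EXIST)
                return -EINVAL;         /* unknown flag value */
        if (flags == BPF_NOEXIST && exists)
                return -EEXIST;         /* entry must not exist yet */
        if (flags == BPF_EXIST && !exists)
                return -ENOENT;         /* entry must already exist */

        /* Toy behaviour: the single slot is simply overwritten. */
        m->present = 1;
        m->key = key;
        m->value = value;
        return 0;
}
```

                     With an empty toy map, an update with BPF_EXIST fails,
                     a first update with BPF_NOEXIST succeeds, and a second
                     one for the same key fails.
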
       int bpf_map_delete_elem(struct bpf_map *map, const void *key)

              Description
                     Delete entry with key from map.

              Return 0 on success, or a negative error in case of failure.

       int bpf_probe_read(void *dst, u32 size, const void *src)

              Description
                     For tracing programs, safely attempt to read size bytes
                     from address src and store the data in dst.

              Return 0 on success, or a negative error in case of failure.

       u64 bpf_ktime_get_ns(void)

              Description
                     Return the time elapsed since system boot, in
                     nanoseconds.

              Return Current ktime.

       int bpf_trace_printk(const char *fmt, u32 fmt_size, ...)

              Description
                     This helper is a "printk()-like" facility for debugging.
                     It prints a message defined by format fmt (of size
                     fmt_size) to file /sys/kernel/debug/tracing/trace from
                     DebugFS, if available. It can take up to three additional
                     u64 arguments (as with all eBPF helpers, the total number
                     of arguments is limited to five).

                     Each time the helper is called, it appends a line to the
                     trace. Lines are discarded while
                     /sys/kernel/debug/tracing/trace is open; use
                     /sys/kernel/debug/tracing/trace_pipe to avoid this. The
                     format of the trace is customizable, and the exact output
                     one will get depends on the options set in
                     /sys/kernel/debug/tracing/trace_options (see also the
                     README file under the same directory). However, it
                     usually defaults to something like:

                        telnet-470   [001] .N.. 419421.045894: 0x00000001: <formatted msg>

                     In the above:

                        · telnet is the name of the current task.

                        · 470 is the PID of the current task.

                        · 001 is the CPU number on which the task is running.

                        · In .N.., each character refers to a set of options
                          (whether irqs are enabled, scheduling options,
                          whether hard/softirqs are running, level of
                          preempt_disabled respectively). N means that
                          TIF_NEED_RESCHED and PREEMPT_NEED_RESCHED are set.

                        · 419421.045894 is a timestamp.

                        · 0x00000001 is a fake value used by BPF for the
                          instruction pointer register.

                        · <formatted msg> is the message formatted with fmt.

                     The conversion specifiers supported by fmt are similar
                     to, but more limited than, those of printk(). They are
                     %d, %i, %u, %x, %ld, %li, %lu, %lx, %lld, %lli, %llu,
                     %llx, %p, %s. No modifier (size of field, padding with
                     zeroes, etc.) is available, and the helper will return
                     -EINVAL (but print nothing) if it encounters an unknown
                     specifier.

                     Also, note that bpf_trace_printk() is slow, and should
                     only be used for debugging purposes. For this reason, a
                     notice block (spanning several lines) is printed to
                     kernel logs and states that the helper should not be used
                     "for production use" the first time this helper is used
                     (or more precisely, when trace_printk() buffers are
                     allocated). For passing values to user space, perf events
                     should be preferred.

              Return The number of bytes written to the buffer, or a negative
                     error in case of failure.

       u32 bpf_get_prandom_u32(void)

              Description
                     Get a pseudo-random number.

                     From a security point of view, this helper uses its own
                     pseudo-random internal state, and cannot be used to infer
                     the seed of other random functions in the kernel.
                     However, it is essential to note that the generator used
                     by the helper is not cryptographically secure.

              Return A random 32-bit unsigned value.

       u32 bpf_get_smp_processor_id(void)

              Description
                     Get the SMP (symmetric multiprocessing) processor id.
                     Note that all programs run with preemption disabled,
                     which means that the SMP processor id is stable
                     throughout the execution of the program.

              Return The SMP id of the processor running the program.

       int bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void
       *from, u32 len, u64 flags)

              Description
                     Store len bytes from address from into the packet
                     associated with skb, at offset. flags are a combination
                     of BPF_F_RECOMPUTE_CSUM (automatically recompute the
                     checksum for the packet after storing the bytes) and
                     BPF_F_INVALIDATE_HASH (set skb->hash, skb->swhash and
                     skb->l4hash to 0).

                     A call to this helper may change the underlying packet
                     buffer. Therefore, at load time, all checks on pointers
                     previously done by the verifier are invalidated and must
                     be performed again if the helper is used in combination
                     with direct packet access.

              Return 0 on success, or a negative error in case of failure.

       int bpf_l3_csum_replace(struct sk_buff *skb, u32 offset, u64 from, u64
       to, u64 size)

              Description
                     Recompute the layer 3 (e.g. IP) checksum for the packet
                     associated with skb. Computation is incremental, so the
                     helper must know the former value of the header field
                     that was modified (from), the new value of this field
                     (to), and the number of bytes (2 or 4) for this field,
                     stored in size. Alternatively, it is possible to store
                     the difference between the previous and the new values of
                     the header field in to, by setting from and size to 0.
                     For both methods, offset indicates the location of the IP
                     checksum within the packet.

                     This helper works in combination with bpf_csum_diff(),
                     which does not update the checksum in-place, but offers
                     more flexibility and can handle sizes larger than 2 or 4
                     for the checksum to update.

                     A call to this helper may change the underlying packet
                     buffer. Therefore, at load time, all checks on pointers
                     previously done by the verifier are invalidated and must
                     be performed again if the helper is used in combination
                     with direct packet access.

              Return 0 on success, or a negative error in case of failure.

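                     The idea of an incremental checksum update, as used by
                     bpf_l3_csum_replace() with a 2-byte field, can be
                     illustrated in user-space C. The sketch below follows the
                     arithmetic of RFC 1624 (HC' = ~(~HC + ~m + m')); it is
                     not the kernel implementation, and the function names
                     csum_fold(), csum_full() and csum_replace2() are chosen
                     for this example only.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit sum into 16 bits, adding carries back in (one's complement). */
static uint16_t csum_fold(uint32_t sum)
{
        while (sum >> 16)
                sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)sum;
}

/* Full Internet checksum over an array of 16-bit words. */
static uint16_t csum_full(const uint16_t *words, size_t n)
{
        uint32_t sum = 0;

        for (size_t i = 0; i < n; i++)
                sum += words[i];
        return (uint16_t)~csum_fold(sum);
}

/*
 * Incremental update for one 16-bit field changing from "from" to "to",
 * given the previous checksum "check" (RFC 1624, equation 3).
 */
static uint16_t csum_replace2(uint16_t check, uint16_t from, uint16_t to)
{
        uint32_t sum = (uint16_t)~check;

        sum += (uint16_t)~from;
        sum += to;
        return (uint16_t)~csum_fold(sum);
}
```

                     Only the old checksum and the old and new field values
                     are needed; the rest of the header is not read again,
                     which is why the helper asks for from and to.
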
       int bpf_l4_csum_replace(struct sk_buff *skb, u32 offset, u64 from, u64
       to, u64 flags)

              Description
                     Recompute the layer 4 (e.g. TCP, UDP or ICMP) checksum
                     for the packet associated with skb. Computation is
                     incremental, so the helper must know the former value of
                     the header field that was modified (from), the new value
                     of this field (to), and the number of bytes (2 or 4) for
                     this field, stored in the lowest four bits of flags.
                     Alternatively, it is possible to store the difference
                     between the previous and the new values of the header
                     field in to, by setting from and the four lowest bits of
                     flags to 0. For both methods, offset indicates the
                     location of the layer 4 checksum within the packet. In
                     addition to the size of the field, actual flags can be
                     added to flags with a bitwise OR. With
                     BPF_F_MARK_MANGLED_0, a null checksum is left untouched
                     (unless BPF_F_MARK_ENFORCE is added as well), and for
                     updates resulting in a null checksum the value is set to
                     CSUM_MANGLED_0 instead. Flag BPF_F_PSEUDO_HDR indicates
                     the checksum is to be computed against a pseudo-header.

                     This helper works in combination with bpf_csum_diff(),
                     which does not update the checksum in-place, but offers
                     more flexibility and can handle sizes larger than 2 or 4
                     for the checksum to update.

                     A call to this helper may change the underlying packet
                     buffer. Therefore, at load time, all checks on pointers
                     previously done by the verifier are invalidated and must
                     be performed again if the helper is used in combination
                     with direct packet access.

              Return 0 on success, or a negative error in case of failure.

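                     Because the flags argument of bpf_l4_csum_replace()
                     carries both the field size and the actual flags, it can
                     help to see the bit layout spelled out. The sketch below
                     is user-space C; the constants mirror, to the best of our
                     knowledge, the values in the kernel UAPI header
                     <linux/bpf.h>, while l4_csum_flags() and
                     l4_csum_field_size() are hypothetical helper names
                     introduced for illustration.

```c
#include <stdint.h>

/* Values mirrored from the kernel UAPI header <linux/bpf.h>. */
#define BPF_F_HDR_FIELD_MASK  0xfULL       /* lowest four bits: field size */
#define BPF_F_PSEUDO_HDR      (1ULL << 4)
#define BPF_F_MARK_MANGLED_0  (1ULL << 5)
#define BPF_F_MARK_ENFORCE    (1ULL << 6)

/* Hypothetical helper: combine the field size (0, 2 or 4) with flag bits. */
static uint64_t l4_csum_flags(uint64_t field_size, uint64_t extra_flags)
{
        return (field_size & BPF_F_HDR_FIELD_MASK) | extra_flags;
}

/* Hypothetical helper: extract the field size from a flags value. */
static uint64_t l4_csum_field_size(uint64_t flags)
{
        return flags & BPF_F_HDR_FIELD_MASK;
}
```

                     For instance, l4_csum_flags(2, BPF_F_PSEUDO_HDR) would
                     describe a 2-byte field whose checksum covers a
                     pseudo-header.
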
       int bpf_tail_call(void *ctx, struct bpf_map *prog_array_map, u32 index)

              Description
                     This special helper is used to trigger a "tail call", or
                     in other words, to jump into another eBPF program. The
                     same stack frame is used (but values on stack and in
                     registers for the caller are not accessible to the
                     callee). This mechanism allows for program chaining,
                     either for raising the maximum number of available eBPF
                     instructions, or to execute given programs in conditional
                     blocks. For security reasons, there is an upper limit to
                     the number of successive tail calls that can be
                     performed.

                     Upon call of this helper, the program attempts to jump
                     into a program referenced at index index in
                     prog_array_map, a special map of type
                     BPF_MAP_TYPE_PROG_ARRAY, and passes ctx, a pointer to the
                     context.

                     If the call succeeds, the kernel immediately runs the
                     first instruction of the new program. This is not a
                     function call, and it never returns to the previous
                     program. If the call fails, then the helper has no
                     effect, and the caller continues to run its subsequent
                     instructions. A call can fail if the destination program
                     for the jump does not exist (i.e. index is beyond the
                     last entry of prog_array_map), or if the maximum number
                     of tail calls has been reached for this chain of
                     programs. This limit is defined in the kernel by the
                     macro MAX_TAIL_CALL_CNT (not accessible to user space),
                     which is currently set to 32.

              Return 0 on success, or a negative error in case of failure.

       int bpf_clone_redirect(struct sk_buff *skb, u32 ifindex, u64 flags)

              Description
                     Clone and redirect the packet associated with skb to
                     another net device of index ifindex. Both ingress and
                     egress interfaces can be used for redirection. The
                     BPF_F_INGRESS value in flags is used to make the
                     distinction (ingress path is selected if the flag is
                     present, egress path otherwise). This is the only flag
                     supported for now.

                     In comparison with the bpf_redirect() helper,
                     bpf_clone_redirect() has the associated cost of
                     duplicating the packet buffer, but this can be executed
                     out of the eBPF program. Conversely, bpf_redirect() is
                     more efficient, but it is handled through an action code
                     where the redirection happens only after the eBPF program
                     has returned.

                     A call to this helper may change the underlying packet
                     buffer. Therefore, at load time, all checks on pointers
                     previously done by the verifier are invalidated and must
                     be performed again if the helper is used in combination
                     with direct packet access.

              Return 0 on success, or a negative error in case of failure.

       u64 bpf_get_current_pid_tgid(void)

              Return A 64-bit integer containing the current tgid and pid, and
                     created as such: current_task->tgid << 32 |
                     current_task->pid.

       u64 bpf_get_current_uid_gid(void)

              Return A 64-bit integer containing the current GID and UID, and
                     created as such: current_gid << 32 | current_uid.

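                     Both bpf_get_current_pid_tgid() and
                     bpf_get_current_uid_gid() pack two 32-bit values into one
                     u64, so callers typically split the result. The
                     user-space sketch below shows the split; upper32(),
                     lower32() and pack64() are hypothetical names used only
                     for this illustration.

```c
#include <stdint.h>

/* Upper half: tgid for the pid/tgid helper, GID for the uid/gid helper. */
static uint32_t upper32(uint64_t v)
{
        return (uint32_t)(v >> 32);
}

/* Lower half: pid for the pid/tgid helper, UID for the uid/gid helper. */
static uint32_t lower32(uint64_t v)
{
        return (uint32_t)v;
}

/* Rebuilds the packed form "upper << 32 | lower" described above. */
static uint64_t pack64(uint32_t upper, uint32_t lower)
{
        return ((uint64_t)upper << 32) | lower;
}
```

                     In an eBPF program, upper32() applied to the return value
                     of bpf_get_current_pid_tgid() would yield the tgid, and
                     lower32() the pid.
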
       int bpf_get_current_comm(char *buf, u32 size_of_buf)

              Description
                     Copy the comm attribute of the current task into buf of
                     size_of_buf. The comm attribute contains the name of the
                     executable (excluding the path) for the current task. The
                     size_of_buf must be strictly positive. On success, the
                     helper makes sure that the buf is NUL-terminated. On
                     failure, it is filled with zeroes.

              Return 0 on success, or a negative error in case of failure.

       u32 bpf_get_cgroup_classid(struct sk_buff *skb)

              Description
                     Retrieve the classid for the current task, i.e. for the
                     net_cls cgroup to which skb belongs.

                     This helper can be used on TC egress path, but not on
                     ingress.

                     The net_cls cgroup provides an interface to tag network
                     packets based on a user-provided identifier for all
                     traffic coming from the tasks belonging to the related
                     cgroup. See also the related kernel documentation,
                     available from the Linux sources in file
                     Documentation/admin-guide/cgroup-v1/net_cls.rst.

                     The Linux kernel has two versions for cgroups: there are
                     cgroups v1 and cgroups v2. Both are available to users,
                     who can use a mixture of them, but note that the net_cls
                     cgroup is for cgroup v1 only. This makes it incompatible
                     with BPF programs run on cgroups, which is a
                     cgroup-v2-only feature (a socket can only hold data for
                     one version of cgroups at a time).

                     This helper is only available if the kernel was compiled
                     with the CONFIG_CGROUP_NET_CLASSID configuration option
                     set to "y" or to "m".

              Return The classid, or 0 for the default unconfigured classid.

       int bpf_skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16
       vlan_tci)

              Description
                     Push a vlan_tci (VLAN tag control information) of
                     protocol vlan_proto to the packet associated with skb,
                     then update the checksum. Note that if vlan_proto is
                     different from ETH_P_8021Q and ETH_P_8021AD, it is
                     considered to be ETH_P_8021Q.

                     A call to this helper may change the underlying packet
                     buffer. Therefore, at load time, all checks on pointers
                     previously done by the verifier are invalidated and must
                     be performed again if the helper is used in combination
                     with direct packet access.

              Return 0 on success, or a negative error in case of failure.

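                     The vlan_tci argument of bpf_skb_vlan_push() is a 16-bit
                     tag control information value. As a reminder of the IEEE
                     802.1Q layout (priority in the top three bits, one
                     drop-eligible bit, VLAN ID in the low twelve bits), the
                     user-space sketch below builds such a value.
                     vlan_tci_make() is a hypothetical name, not a kernel or
                     libbpf function.

```c
#include <stdint.h>

/*
 * Build an 802.1Q TCI: PCP (priority) in bits 15-13, DEI in bit 12,
 * VID (VLAN identifier) in bits 11-0. Out-of-range inputs are masked.
 */
static uint16_t vlan_tci_make(uint8_t pcp, uint8_t dei, uint16_t vid)
{
        return (uint16_t)(((pcp & 0x7u) << 13) |
                          ((dei & 0x1u) << 12) |
                          (vid & 0x0fffu));
}
```

                     A plain VLAN ID with default priority is therefore just
                     the ID itself, e.g. vlan_tci_make(0, 0, 100).
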
       int bpf_skb_vlan_pop(struct sk_buff *skb)

              Description
                     Pop a VLAN header from the packet associated with skb.

                     A call to this helper may change the underlying packet
                     buffer. Therefore, at load time, all checks on pointers
                     previously done by the verifier are invalidated and must
                     be performed again if the helper is used in combination
                     with direct packet access.

              Return 0 on success, or a negative error in case of failure.

       int bpf_skb_get_tunnel_key(struct sk_buff *skb, struct bpf_tunnel_key
       *key, u32 size, u64 flags)

              Description
                     Get tunnel metadata. This helper takes a pointer key to
                     an empty struct bpf_tunnel_key of size, which will be
                     filled with tunnel metadata for the packet associated
                     with skb. The flags can be set to BPF_F_TUNINFO_IPV6,
                     which indicates that the tunnel is based on IPv6 protocol
                     instead of IPv4.

                     The struct bpf_tunnel_key is an object that generalizes
                     the principal parameters used by various tunneling
                     protocols into a single struct. This way, it can be used
                     to easily make a decision based on the contents of the
                     encapsulation header, "summarized" in this struct. In
                     particular, it holds the IP address of the remote end
                     (IPv4 or IPv6, depending on the case) in key->remote_ipv4
                     or key->remote_ipv6. Also, this struct exposes the
                     key->tunnel_id, which is generally mapped to a VNI
                     (Virtual Network Identifier), making it programmable
                     together with the bpf_skb_set_tunnel_key() helper.

                     Let's imagine that the following code is part of a
                     program attached to the TC ingress interface, on one end
                     of a GRE tunnel, and is supposed to filter out all
                     messages coming from remote ends with IPv4 address other
                     than 10.0.0.1:

                        int ret;
                        struct bpf_tunnel_key key = {};

                        ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
                        if (ret < 0)
                                return TC_ACT_SHOT;     // drop packet

                        if (key.remote_ipv4 != 0x0a000001)
                                return TC_ACT_SHOT;     // drop packet

                        return TC_ACT_OK;               // accept packet

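                     The constant 0x0a000001 in the comparison above is the
                     address 10.0.0.1 written as a 32-bit integer with the
                     first octet in the most significant byte. The user-space
                     sketch below makes that mapping explicit;
                     ipv4_host_order() is a hypothetical name introduced for
                     illustration, and the assumption that key.remote_ipv4 is
                     compared in this form is taken from the example itself.

```c
#include <stdint.h>

/* Build the 32-bit integer a.b.c.d with "a" in the most significant byte. */
static uint32_t ipv4_host_order(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
        return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
               ((uint32_t)c << 8) | (uint32_t)d;
}
```

                     Here ipv4_host_order(10, 0, 0, 1) gives the value used in
                     the filter.
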
                     This interface can also be used with all encapsulation
                     devices that can operate in "collect metadata" mode:
                     instead of having one network device per specific
                     configuration, the "collect metadata" mode only requires
                     a single device where the configuration can be extracted
                     from this helper.

                     This can be used together with various tunnels such as
                     VXLAN, Geneve, GRE or IP in IP (IPIP).

              Return 0 on success, or a negative error in case of failure.

       int bpf_skb_set_tunnel_key(struct sk_buff *skb, struct bpf_tunnel_key
       *key, u32 size, u64 flags)

              Description
                     Populate tunnel metadata for the packet associated with
                     skb. The tunnel metadata is set to the contents of key,
                     of size. The flags can be set to a combination of the
                     following values:

                     BPF_F_TUNINFO_IPV6
                            Indicate that the tunnel is based on IPv6 protocol
                            instead of IPv4.

                     BPF_F_ZERO_CSUM_TX
                            For IPv4 packets, add a flag to tunnel metadata
                            indicating that checksum computation should be
                            skipped and checksum set to zeroes.

                     BPF_F_DONT_FRAGMENT
                            Add a flag to tunnel metadata indicating that the
                            packet should not be fragmented.

                     BPF_F_SEQ_NUMBER
                            Add a flag to tunnel metadata indicating that a
                            sequence number should be added to tunnel header
                            before sending the packet. This flag was added for
                            GRE encapsulation, but might be used with other
                            protocols as well in the future.

                     Here is a typical usage on the transmit path:

                        struct bpf_tunnel_key key;
                        /* populate key ... */
                        bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0);
                        bpf_clone_redirect(skb, vxlan_dev_ifindex, 0);

                     See also the description of the bpf_skb_get_tunnel_key()
                     helper for additional information.

              Return 0 on success, or a negative error in case of failure.

       u64 bpf_perf_event_read(struct bpf_map *map, u64 flags)

              Description
                     Read the value of a perf event counter. This helper
                     relies on a map of type BPF_MAP_TYPE_PERF_EVENT_ARRAY.
                     The nature of the perf event counter is selected when map
                     is updated with perf event file descriptors. The map is
                     an array whose size is the number of available CPUs, and
                     each cell contains a value relative to one CPU. The value
                     to retrieve is indicated by flags, which contains the
                     index of the CPU to look up, masked with
                     BPF_F_INDEX_MASK. Alternatively, flags can be set to
                     BPF_F_CURRENT_CPU to indicate that the value for the
                     current CPU should be retrieved.

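                     The way a CPU index is carried in flags can be sketched
                     in user-space C. The two constants mirror, to the best of
                     our knowledge, the kernel UAPI header <linux/bpf.h>;
                     perf_flags_for_cpu() and perf_flags_is_current_cpu() are
                     hypothetical names used only for this illustration.

```c
#include <stdint.h>

/* Values mirrored from the kernel UAPI header <linux/bpf.h>. */
#define BPF_F_INDEX_MASK   0xffffffffULL
#define BPF_F_CURRENT_CPU  BPF_F_INDEX_MASK

/* Hypothetical helper: flags value selecting an explicit CPU index. */
static uint64_t perf_flags_for_cpu(uint32_t cpu)
{
        return (uint64_t)cpu & BPF_F_INDEX_MASK;
}

/* Hypothetical helper: does this flags value request the current CPU? */
static int perf_flags_is_current_cpu(uint64_t flags)
{
        return (flags & BPF_F_INDEX_MASK) == BPF_F_CURRENT_CPU;
}
```

                     Since BPF_F_CURRENT_CPU equals the mask itself, the
                     all-ones index is reserved and cannot name a real CPU.
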
                     Note that before Linux 4.13, only hardware perf events
                     can be retrieved.

                     Also, be aware that the newer helper
                     bpf_perf_event_read_value() is recommended over
                     bpf_perf_event_read() in general. The latter has some ABI
                     quirks where error and counter value are used as a return
                     code (which is wrong to do since ranges may overlap).
                     This issue is fixed with bpf_perf_event_read_value(),
                     which at the same time provides more features over the
                     bpf_perf_event_read() interface. Please refer to the
                     description of bpf_perf_event_read_value() for details.

              Return The value of the perf event counter read from the map, or
                     a negative error code in case of failure.

       int bpf_redirect(u32 ifindex, u64 flags)

              Description
                     Redirect the packet to another net device of index
                     ifindex. This helper is somewhat similar to
                     bpf_clone_redirect(), except that the packet is not
                     cloned, which provides increased performance.

                     Except for XDP, both ingress and egress interfaces can be
                     used for redirection. The BPF_F_INGRESS value in flags is
                     used to make the distinction (ingress path is selected if
                     the flag is present, egress path otherwise). Currently,
                     XDP only supports redirection to the egress interface,
                     and accepts no flag at all.

                     The same effect can be attained with the more generic
                     bpf_redirect_map(), which requires specific maps to be
                     used but offers better performance.

              Return For XDP, the helper returns XDP_REDIRECT on success or
                     XDP_ABORTED on error. For other program types, the values
                     are TC_ACT_REDIRECT on success or TC_ACT_SHOT on error.

       u32 bpf_get_route_realm(struct sk_buff *skb)

              Description
                     Retrieve the realm of the route, that is to say the
                     tclassid field of the destination for the skb. The
                     identifier retrieved is a user-provided tag, similar to
                     the one used with the net_cls cgroup (see description for
                     bpf_get_cgroup_classid() helper), but here this tag is
                     held by a route (a destination entry), not by a task.

                     Retrieving this identifier works with the clsact TC
                     egress hook (see also tc-bpf(8)), or alternatively on
                     conventional classful egress qdiscs, but not on TC
                     ingress path. In case of clsact TC egress hook, this has
                     the advantage that, internally, the destination entry has
                     not been dropped yet in the transmit path. Therefore, the
                     destination entry does not need to be artificially held
                     via netif_keep_dst() for a classful qdisc until the skb
                     is freed.

                     This helper is available only if the kernel was compiled
                     with the CONFIG_IP_ROUTE_CLASSID configuration option.

              Return The realm of the route for the packet associated with
                     skb, or 0 if none was found.

579       int bpf_perf_event_output(struct pt_regs *ctx, struct bpf_map *map, u64
580       flags, void *data, u64 size)
581
582              Description
583                     Write raw data blob into a special BPF perf event held by
584                     map  of  type  BPF_MAP_TYPE_PERF_EVENT_ARRAY.  This  perf
585                     event must have the following attributes: PERF_SAMPLE_RAW
586                     as   sample_type,   PERF_TYPE_SOFTWARE   as   type,   and
587                     PERF_COUNT_SW_BPF_OUTPUT as config.
588
589                     The flags are used to indicate the index in map for which
590                     the value must  be  put,  masked  with  BPF_F_INDEX_MASK.
591                     Alternatively,  flags  can be set to BPF_F_CURRENT_CPU to
592                     indicate that the index of the current CPU core should be
593                     used.
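
                        As a rough illustration of the flags encoding described
                        above, the following user-space C sketch builds a flags
                        value either from an explicit map index or for the cur‐
                        rent CPU. The constant values are assumed to mirror
                        include/uapi/linux/bpf.h and should be checked against
                        your kernel headers; perf_output_flags() is a hypotheti‐
                        cal name used only for this example.

```c
#include <stdint.h>

/* Assumed to match include/uapi/linux/bpf.h; verify against your headers. */
#define BPF_F_INDEX_MASK   0xffffffffULL
#define BPF_F_CURRENT_CPU  BPF_F_INDEX_MASK

/* Hypothetical helper: build the flags argument for bpf_perf_event_output(). */
uint64_t perf_output_flags(int use_current_cpu, uint32_t index)
{
	if (use_current_cpu)
		return BPF_F_CURRENT_CPU;
	/* Only the bits covered by BPF_F_INDEX_MASK carry the map index. */
	return (uint64_t)index & BPF_F_INDEX_MASK;
}
```

                        Under these assumptions, an index equal to the mask
                        value itself cannot be told apart from
                        BPF_F_CURRENT_CPU.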
594
595                     The value to write, of size, is passed through eBPF stack
596                     and pointed by data.
597
598                     The context of the program ctx must also be passed to the
599                     helper.
600
601                     In user space, a program willing to read the values needs
602                     to call perf_event_open() on the perf event  (either  for
603                     one  or  for  all  CPUs) and to store the file descriptor
604                     into the map. This must be done before the  eBPF  program
605                     can  send  data  into it. An example is available in file
606                     samples/bpf/trace_output_user.c  in  the   Linux   kernel
607                     source  tree  (the  eBPF  program  counterpart is in sam‐
608                     ples/bpf/trace_output_kern.c).
609
610                     bpf_perf_event_output() achieves better performance  than
611                     bpf_trace_printk()  for sharing data with user space, and
612                     is much better suited to streaming data from  eBPF  pro‐
613                     grams.
614
615                     Note  that  this  helper is not restricted to tracing use
616                     cases and can be used with programs attached to TC or XDP
617                     as  well,  where it allows for passing data to user space
618                     listeners. Data can be:
619
620                     · Only custom structs,
621
622                     · Only the packet payload, or
623
624                     · A combination of both.
625
626              Return 0 on success, or a negative error in case of failure.
627
628       int bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset, void *to,
629       u32 len)
630
631              Description
632                     This helper was provided as an easy way to load data from
633                     a packet. It can be used to load len  bytes  from  offset
634                     from  the  packet  associated  to  skb,  into  the buffer
635                     pointed by to.
636
637                     Since Linux 4.7, usage of this  helper  has  mostly  been
638                     replaced  by "direct packet access", enabling packet data
639                     to be manipulated with skb->data and skb->data_end point‐
640                     ing  respectively to the first byte of packet data and to
641                     the byte after the last byte of packet data. However,  it
642                     remains  useful if one wishes to read large quantities of
643                     data at once from a packet into the eBPF stack.
644
645              Return 0 on success, or a negative error in case of failure.
646
647       int bpf_get_stackid(struct  pt_regs  *ctx,  struct  bpf_map  *map,  u64
648       flags)
649
650              Description
651                     Walk  a  user  or  a  kernel  stack and return its id. To
652                     achieve this, the helper needs ctx, which is a pointer to
653                     the context on which the tracing program is executed, and
654                     a pointer to a map of type BPF_MAP_TYPE_STACK_TRACE.
655
656                     The last argument,  flags,  holds  the  number  of  stack
657                     frames   to   skip   (from   0   to   255),  masked  with
658                     BPF_F_SKIP_FIELD_MASK. The next bits can be used to set a
659                     combination of the following flags:
660
661                     BPF_F_USER_STACK
662                            Collect  a  user  space  stack instead of a kernel
663                            stack.
664
665                     BPF_F_FAST_STACK_CMP
666                            Compare stacks by hash only.
667
668                     BPF_F_REUSE_STACKID
669                            If  two  different  stacks  hash  into  the   same
670                            stackid, discard the old one.
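
                        The layout of flags described above (skip count in the
                        low bits, option flags above it) can be sketched in
                        plain C as follows. The numeric values are assumed to
                        match include/uapi/linux/bpf.h and should be verified
                        there; stackid_flags() is a hypothetical name introduced
                        for illustration only.

```c
#include <stdint.h>

/* Assumed to match include/uapi/linux/bpf.h; verify against your headers. */
#define BPF_F_SKIP_FIELD_MASK  0xffULL
#define BPF_F_USER_STACK       (1ULL << 8)
#define BPF_F_FAST_STACK_CMP   (1ULL << 9)
#define BPF_F_REUSE_STACKID    (1ULL << 10)

/* Hypothetical helper: combine a frame skip count with option flags. */
uint64_t stackid_flags(unsigned int skip, uint64_t option_bits)
{
	/* The skip count only occupies the bits under BPF_F_SKIP_FIELD_MASK. */
	uint64_t low = (uint64_t)skip & BPF_F_SKIP_FIELD_MASK;

	/* Option bits must not spill into the skip field. */
	return low | (option_bits & ~BPF_F_SKIP_FIELD_MASK);
}
```

                        For instance, skipping two frames of a user space stack
                        would be expressed as stackid_flags(2,
                        BPF_F_USER_STACK) in this sketch.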
671
672                     The  stack  id  retrieved is a 32-bit long integer handle
673                     which can be further combined with other data  (including
674                     other stack ids) and used as a key into maps. This can be
675                     useful for generating a variety of graphs (such as  flame
676                     graphs or off-cpu graphs).
677
678                     For  walking  a stack, this helper is an improvement over
679                     bpf_probe_read(), which can be used with  unrolled  loops
680                     but  is not efficient and consumes a lot of eBPF instruc‐
681                     tions.  Instead,  bpf_get_stackid()  can  collect  up  to
682                     PERF_MAX_STACK_DEPTH  both  kernel  and user frames. Note
683                     that this limit can be controlled with  the  sysctl  pro‐
684                     gram,  and  that it should be manually increased in order
685                     to profile long user stacks (such as stacks for Java pro‐
686                     grams). To do so, use:
687
688                        # sysctl kernel.perf_event_max_stack=<new value>
689
690              Return The  positive  or null stack id on success, or a negative
691                     error in case of failure.
692
693       s64 bpf_csum_diff(__be32 *from, u32 from_size, __be32 *to, u32 to_size,
694       __wsum seed)
695
696              Description
697                     Compute  a  checksum  difference,  from  the  raw  buffer
698                     pointed by from, of length from_size (that must be a mul‐
699                     tiple  of  4),  towards  the raw buffer pointed by to, of
700                     size to_size (same remark). An optional seed can be added
701                     to  the  value  (this  can be cascaded, the seed may come
702                     from a previous call to the helper).
703
704                     This is flexible enough to be used in several ways:
705
706                     · With from_size == 0, to_size > 0 and seed set to check‐
707                       sum, it can be used when pushing new data.
708
709                     · With from_size > 0, to_size == 0 and seed set to check‐
710                       sum, it can be used when removing data from a packet.
711
712                     · With from_size > 0, to_size > 0 and seed set to  0,  it
713                       can  be used to compute a diff. Note that from_size and
714                       to_size do not need to be equal.
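
                        To make the three usages above more concrete, here is a
                        simplified user-space model of a checksum difference
                        using ones' complement arithmetic. It is not the kernel
                        implementation: it works on host-order 32-bit words,
                        takes lengths in words rather than bytes, and all func‐
                        tion names (csum_add32(), csum_fold16(),
                        csum_diff_model()) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones' complement addition: add with end-around carry. */
static uint32_t csum_add32(uint32_t a, uint32_t b)
{
	uint64_t s = (uint64_t)a + b;

	return (uint32_t)(s & 0xffffffffu) + (uint32_t)(s >> 32);
}

/* Fold a 32-bit running sum down to 16 bits. */
uint16_t csum_fold16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Model of a checksum difference: subtract the "from" words (by adding
 * their complement), add the "to" words, starting from seed.
 */
uint32_t csum_diff_model(const uint32_t *from, size_t from_words,
			 const uint32_t *to, size_t to_words, uint32_t seed)
{
	uint32_t sum = seed;
	size_t i;

	for (i = 0; i < from_words; i++)
		sum = csum_add32(sum, ~from[i]);
	for (i = 0; i < to_words; i++)
		sum = csum_add32(sum, to[i]);
	return sum;
}
```

                        In this model, feeding the sum of the old data as seed
                        while replacing one word gives the same folded result as
                        summing the new data from scratch, which is the property
                        that makes incremental updates possible.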
715
716                     This   helper   can   be   used   in   combination   with
717                     bpf_l3_csum_replace() and bpf_l4_csum_replace(), to which
718                     one  can   feed   in   the   difference   computed   with
719                     bpf_csum_diff().
720
721              Return The  checksum result, or a negative error code in case of
722                     failure.
723
724       int bpf_skb_get_tunnel_opt(struct sk_buff *skb, u8 *opt, u32 size)
725
726              Description
727                     Retrieve tunnel options metadata for the  packet  associ‐
728                     ated  to skb, and store the raw tunnel option data to the
729                     buffer opt of size.
730
731                     This helper can be used with encapsulation  devices  that
732                     can  operate  in "collect metadata" mode (please refer to
733                     the related note in the description  of  bpf_skb_get_tun‐
734                     nel_key()  for  more details). A particular example where
735                     this can be used is in combination with the Geneve encap‐
736                     sulation  protocol,  where  it  allows  for pushing (with
737                     bpf_skb_set_tunnel_opt() helper) and retrieving arbitrary
738                     TLVs  (Type-Length-Value  headers) from the eBPF program.
739                     This allows for full customization of these headers.
740
741              Return The size of the option data retrieved.
742
743       int bpf_skb_set_tunnel_opt(struct sk_buff *skb, u8 *opt, u32 size)
744
745              Description
746                     Set tunnel options metadata for the packet associated  to
747                     skb to the option data contained in the raw buffer opt of
748                     size.
749
750                     See also the description of the  bpf_skb_get_tunnel_opt()
751                     helper for additional information.
752
753              Return 0 on success, or a negative error in case of failure.
754
755       int bpf_skb_change_proto(struct sk_buff *skb, __be16 proto, u64 flags)
756
757              Description
758                     Change  the  protocol of the skb to proto. Currently sup‐
759                     ported are transition from IPv4 to IPv6, and from IPv6 to
760                     IPv4.  The  helper  takes  care of the groundwork for the
761                     transition, including resizing  the  socket  buffer.  The
762                     eBPF program is expected to fill the new headers, if any,
763                     via bpf_skb_store_bytes() and to recompute checksums with
764                     bpf_l3_csum_replace() and bpf_l4_csum_replace(). The main
765                     case for this helper is to perform NAT64  operations  out
766                     of an eBPF program.
767
768                     Internally, the GSO type is marked as dodgy so that head‐
769                     ers are checked and  segments  are  recalculated  by  the
770                     GSO/GRO  engine.   The  size for GSO target is adapted as
771                     well.
772
773                     All values for flags are reserved for future  usage,  and
774                     must be left at zero.
775
776                     A call to this helper is susceptible to change the under‐
777                     lying packet buffer. Therefore, at load time, all  checks
778                     on  pointers  previously done by the verifier are invali‐
779                     dated and must be performed again, if the helper is  used
780                     in combination with direct packet access.
781
782              Return 0 on success, or a negative error in case of failure.
783
784       int bpf_skb_change_type(struct sk_buff *skb, u32 type)
785
786              Description
787                     Change  the packet type for the packet associated to skb.
788                     This comes down to setting skb->pkt_type to type,  except
789                     the  eBPF  program  does  not  have  a  write  access  to
790                     skb->pkt_type beside this helper.  Using  a  helper  here
791                     allows for graceful handling of errors.
792
793                     The  major  use  case  is  to  change  incoming  skbs  to
794                     PACKET_HOST in a programmatic way instead  of  having  to
795                     recirculate  via  bpf_redirect(...,  BPF_F_INGRESS),  for
796                     example.
797
798                     Note that type only allows certain values. At this  time,
799                     they are:
800
801                     PACKET_HOST
802                            Packet is for us.
803
804                     PACKET_BROADCAST
805                            Send packet to all.
806
807                     PACKET_MULTICAST
808                            Send packet to group.
809
810                     PACKET_OTHERHOST
811                            Send packet to someone else.
812
813              Return 0 on success, or a negative error in case of failure.
814
815       int  bpf_skb_under_cgroup(struct sk_buff *skb, struct bpf_map *map, u32
816       index)
817
818              Description
819                     Check whether skb is a descendant of the cgroup2 held  by
820                     map of type BPF_MAP_TYPE_CGROUP_ARRAY, at index.
821
822              Return The  return  value depends on the result of the test, and
823                     can be:
824
825                     · 0, if the skb failed the cgroup2 descendant test.
826
827                     · 1, if the skb succeeded the cgroup2 descendant test.
828
829                     · A negative error code, if an error occurred.
830
831       u32 bpf_get_hash_recalc(struct sk_buff *skb)
832
833              Description
834                     Retrieve the hash of the packet, skb->hash. If it is  not
835                     set,  in  particular  if the hash was cleared due to man‐
836                     gling, recompute this hash. Later accesses  to  the  hash
837                     can be done directly with skb->hash.
838
839                     Calling  bpf_set_hash_invalid(), changing the packet pro‐
840                     tocol   with    bpf_skb_change_proto(),    or     calling
841                     bpf_skb_store_bytes() with the BPF_F_INVALIDATE_HASH flag
842                     are actions that may clear the hash and  trigger  a  new
843                     computation      on      the     next     call     to
844                     bpf_get_hash_recalc().
845
846              Return The 32-bit hash.
847
848       u64 bpf_get_current_task(void)
849
850              Return A pointer to the current task struct.
851
852       int bpf_probe_write_user(void *dst, const void *src, u32 len)
853
854              Description
855                     Attempt in a safe way to write len bytes from the  buffer
856                     src  to dst in memory. It only works for threads that are
857                     in user context, and dst  must  be  a  valid  user  space
858                     address.
859
860                     This  helper  should not be used to implement any kind of
861                     security mechanism because of TOC-TOU attacks, but rather
862                     to  debug, divert, and manipulate execution of semi-coop‐
863                     erative processes.
864
865                     Keep in mind that this feature is meant for  experiments,
866                     and it has a risk of crashing the system and running pro‐
867                     grams.  Therefore, when an eBPF program using this helper
868                     is  attached, a warning including PID and process name is
869                     printed to kernel logs.
870
871              Return 0 on success, or a negative error in case of failure.
872
873       int bpf_current_task_under_cgroup(struct bpf_map *map, u32 index)
874
875              Description
876                     Check whether the probe is being run in the context of  a
877                     given  subset  of  the  cgroup2 hierarchy. The cgroup2 to
878                     test is held by map of type BPF_MAP_TYPE_CGROUP_ARRAY, at
879                     index.
880
881              Return The  return  value depends on the result of the test, and
882                     can be:
883
884                     · 0, if current task does not belong to the cgroup2.
885
886                     · 1, if current task belongs to the cgroup2.
887
888                     · A negative error code, if an error occurred.
889
890       int bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags)
891
892              Description
893                     Resize (trim or grow) the packet associated to skb to the
894                     new  len.  The  flags  are reserved for future usage, and
895                     must be left at zero.
896
897                     The basic idea is that the  helper  performs  the  needed
898                     work to change the size of the packet, then the eBPF pro‐
899                     gram    rewrites    the    rest    via    helpers    like
900                     bpf_skb_store_bytes(),             bpf_l3_csum_replace(),
901                     bpf_l4_csum_replace() and others. This helper is  a  slow
902                     path  utility intended for replies with control messages.
903                     And because it is targeted  for  slow  path,  the  helper
904                     itself  can  afford to be slow: it implicitly linearizes,
905                     unclones and drops offloads from the skb.
906
907                     A call to this helper is susceptible to change the under‐
908                     lying  packet buffer. Therefore, at load time, all checks
909                     on pointers previously done by the verifier  are  invali‐
910                     dated  and must be performed again, if the helper is used
911                     in combination with direct packet access.
912
913              Return 0 on success, or a negative error in case of failure.
914
915       int bpf_skb_pull_data(struct sk_buff *skb, u32 len)
916
917              Description
918                     Pull in non-linear data in case the skb is non-linear and
919                     not  all  of len are part of the linear section. Make len
920                     bytes from skb readable and writable. If a zero value  is
921                     passed  for  len,  then  the  whole  length of the skb is
922                     pulled.
923
924                     This helper is only needed for reading and  writing  with
925                     direct packet access.
926
927                     For  direct packet access, testing that offsets to access
928                     are within packet boundaries (test on  skb->data_end)  is
929                     susceptible  to  fail  if  offsets are invalid, or if the
930                     requested data is in non-linear  parts  of  the  skb.  On
931                     failure  the program can just bail out, or in the case of
932                     a non-linear buffer, use a helper to make the data avail‐
933                     able. The bpf_skb_load_bytes() helper is a first solution
934                     to  access  the  data.  Another  one  consists  in  using
935                     bpf_skb_pull_data() to pull in the non-linear parts once,
936                     then retesting and finally accessing the data.
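
                        The "test, pull, retest" pattern described above can be
                        modelled in user space as follows. This is only a sketch
                        under stated assumptions: struct pkt_model stands in for
                        the skb, model_pull_data() is a hypothetical stand-in
                        for bpf_skb_pull_data(), and buf_end is an invented
                        field marking the end of all packet data, non-linear
                        part included.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the parts of an skb relevant to this pattern. */
struct pkt_model {
	uint8_t *data;      /* start of packet data */
	uint8_t *data_end;  /* one past the last directly accessible byte */
	uint8_t *buf_end;   /* one past the last byte, non-linear part included */
};

/* Bounds test, in the spirit of the check against skb->data_end. */
static int in_linear_bounds(const struct pkt_model *p, size_t off, size_t len)
{
	size_t avail = (size_t)(p->data_end - p->data);

	return off <= avail && len <= avail - off;
}

/* Model of pulling data: make the first len bytes directly accessible. */
static int model_pull_data(struct pkt_model *p, size_t len)
{
	size_t total = (size_t)(p->buf_end - p->data);

	if (len > total)
		return -1;
	if ((size_t)(p->data_end - p->data) < len)
		p->data_end = p->data + len;
	return 0;
}

/* Read one byte: test, pull on failure, then retest before the access. */
int read_byte_checked(struct pkt_model *p, size_t off, uint8_t *out)
{
	if (!in_linear_bounds(p, off, 1)) {
		if (model_pull_data(p, off + 1) != 0)
			return -1;
		/* Earlier checks no longer hold after a pull: test again. */
		if (!in_linear_bounds(p, off, 1))
			return -1;
	}
	*out = p->data[off];
	return 0;
}
```

                        In a real eBPF program the retest is not optional, since
                        the verifier discards previous pointer checks after the
                        helper call.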
937
938                     At the same  time,  this  also  makes  sure  the  skb  is
939                     uncloned,  which  is  a  necessary  condition  for direct
940                     write. As this needs to be an  invariant  for  the  write
941                     part  only,  the  verifier detects writes and adds a pro‐
942                     logue that is calling bpf_skb_pull_data() to  effectively
943                     unclone  the  skb  from  the very beginning in case it is
944                     indeed cloned.
945
946                     A call to this helper is susceptible to change the under‐
947                     lying  packet buffer. Therefore, at load time, all checks
948                     on pointers previously done by the verifier  are  invali‐
949                     dated  and must be performed again, if the helper is used
950                     in combination with direct packet access.
951
952              Return 0 on success, or a negative error in case of failure.
953
954       s64 bpf_csum_update(struct sk_buff *skb, __wsum csum)
955
956              Description
957                     Add the checksum csum into skb->csum in case  the  driver
958                     has  supplied  a checksum for the entire packet into that
959                     field. Return an error otherwise. This helper is intended
960                     to  be  used in combination with bpf_csum_diff(), in par‐
961                     ticular when the checksum needs to be updated after  data
962                     has  been  written  into the packet through direct packet
963                     access.
964
965              Return The checksum on success, or a negative error code in case
966                     of failure.
967
968       void bpf_set_hash_invalid(struct sk_buff *skb)
969
970              Description
971                     Invalidate  the  current  skb->hash. It can be used after
972                     mangling on headers  through  direct  packet  access,  in
973                     order  to indicate that the hash is outdated and to trig‐
974                     ger a recalculation the next time  the  kernel  tries  to
975                     access this hash or when the bpf_get_hash_recalc() helper
976                     is called.
977
978       int bpf_get_numa_node_id(void)
979
980              Description
981                     Return the id of the current NUMA node. The  primary  use
982                     case  for this helper is the selection of sockets for the
983                     local NUMA node, when the program is attached to  sockets
984                     using   the  SO_ATTACH_REUSEPORT_EBPF  option  (see  also
985                     socket(7)), but the helper is  also  available  to  other
986                     eBPF  program  types,  similarly  to  bpf_get_smp_proces‐
987                     sor_id().
988
989              Return The id of current NUMA node.
990
991       int bpf_skb_change_head(struct sk_buff *skb, u32 len, u64 flags)
992
993              Description
994                     Grows headroom of packet associated to  skb  and  adjusts
995                     the  offset  of  the  MAC  header accordingly, adding len
996                     bytes of space. It automatically extends and  reallocates
997                     memory as required.
998
999                     This  helper  can  be used on a layer 3 skb to push a MAC
1000                     header for redirection into a layer 2 device.
1001
1002                     All values for flags are reserved for future  usage,  and
1003                     must be left at zero.
1004
1005                     A call to this helper is susceptible to change the under‐
1006                     lying packet buffer. Therefore, at load time, all  checks
1007                     on  pointers  previously done by the verifier are invali‐
1008                     dated and must be performed again, if the helper is  used
1009                     in combination with direct packet access.
1010
1011              Return 0 on success, or a negative error in case of failure.
1012
1013       int bpf_xdp_adjust_head(struct xdp_buff *xdp_md, int delta)
1014
1015              Description
1016                     Adjust  (move)  xdp_md->data by delta bytes. Note that it
1017                     is possible to use  a  negative  value  for  delta.  This
1018                     helper  can  be used to prepare the packet for pushing or
1019                     popping headers.
1020
1021                     A call to this helper is susceptible to change the under‐
1022                     lying  packet buffer. Therefore, at load time, all checks
1023                     on pointers previously done by the verifier  are  invali‐
1024                     dated  and must be performed again, if the helper is used
1025                     in combination with direct packet access.
1026
1027              Return 0 on success, or a negative error in case of failure.
1028
1029       int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr)
1030
1031              Description
1032                     Copy a NUL  terminated  string  from  an  unsafe  address
1033                     unsafe_ptr  to dst. The size should include the terminat‐
1034                     ing NUL byte. In case the string length is  smaller  than
1035                     size, the target is not padded with further NUL bytes. If
1036                     the string length is larger than size, just size-1  bytes
1037                     are copied and the last byte is set to NUL.
1038
1039                     On  success, the length of the copied string is returned.
1040                     This makes this helper useful  in  tracing  programs  for
1041                     reading strings, and more importantly to get their length
1042                     at runtime. See the following snippet:
1043
                            SEC("kprobe/sys_open")
                            int bpf_sys_open(struct pt_regs *ctx)
                            {
                                    char buf[PATHLEN]; // PATHLEN is defined to 256
                                    // ctx->di holds the first argument on x86-64.
                                    int res = bpf_probe_read_str(buf, sizeof(buf),
                                                                 (void *)ctx->di);

                                    // Consume buf, for example push it to
                                    // userspace via bpf_perf_event_output(); we
                                    // can use res (the string length) as event
                                    // size, after checking its boundaries.
                                    return 0;
                            }
1056
1057                     In comparison, using bpf_probe_read() helper here instead
1058                     to  read  the string would require to estimate the length
1059                     at compile time, and would often result in  copying  more
1060                     memory than necessary.
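
                         The copy and return-value semantics described above can
                         be modelled in ordinary user-space C. The function
                         below, probe_read_str_model(), is a hypothetical name
                         and not a kernel API; it reads from a normal pointer
                         and does not reproduce the fault handling of the real
                         helper. Returning -1 for a non-positive size is a
                         choice made for this sketch only.

```c
/*
 * Hypothetical model: copy at most size - 1 bytes of src into dst, always
 * NUL-terminate, and return the copied length including the NUL byte.
 */
long probe_read_str_model(char *dst, int size, const char *src)
{
	int i = 0;

	if (size <= 0)
		return -1; /* placeholder for this sketch, not kernel behavior */

	while (i < size - 1 && src[i] != '\0') {
		dst[i] = src[i];
		i++;
	}
	dst[i] = '\0'; /* no further padding beyond this single NUL */
	return i + 1;
}
```

                         With an 8-byte buffer and the string "abc", this model
                         returns 4; with a 4-byte buffer and a longer string, it
                         copies three bytes, terminates them, and also returns
                         4.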
1061
1062                     Another  useful  use  case  is  when  parsing  individual
1063                     process arguments  or  individual  environment  variables
1064                     navigating      current->mm->arg_start      and      cur‐
1065                     rent->mm->env_start: using this  helper  and  the  return
1066                     value, one can quickly iterate at the right offset of the
1067                     memory area.
1068
1069              Return On success, the strictly positive length of  the  string,
1070                     including  the  trailing NUL character. On error, a nega‐
1071                     tive value.
1072
1073       u64 bpf_get_socket_cookie(struct sk_buff *skb)
1074
1075              Description
1076                     If the struct sk_buff pointed by skb has a known  socket,
1077                     retrieve  the  cookie  (generated  by the kernel) of this
1078                     socket.  If no cookie has been set yet,  generate  a  new
1079                     cookie.  Once generated, the socket cookie remains stable
1080                     for the life of the socket. This helper can be useful for
1081                     monitoring per socket networking traffic statistics as it
1082                     provides a global socket identifier that can  be  assumed
1083                     unique.
1084
1085              Return An 8-byte long non-decreasing number on success, or 0  if
1086                     the socket field is missing inside skb.
1087
1088       u64 bpf_get_socket_cookie(struct bpf_sock_addr *ctx)
1089
1090              Description
1091                     Equivalent to bpf_get_socket_cookie() helper that accepts
1092                     skb, but gets socket from struct bpf_sock_addr context.
1093
1094              Return An 8-byte long non-decreasing number.
1095
1096       u64 bpf_get_socket_cookie(struct bpf_sock_ops *ctx)
1097
1098              Description
1099                     Equivalent to bpf_get_socket_cookie() helper that accepts
1100                     skb, but gets socket from struct bpf_sock_ops context.
1101
1102              Return An 8-byte long non-decreasing number.
1103
1104       u32 bpf_get_socket_uid(struct sk_buff *skb)
1105
1106              Return The owner UID of the socket associated  to  skb.  If  the
1107                     socket is NULL, or if it is not a full socket (i.e. if it
1108                     is a time-wait or a request socket instead),  overflowuid
1109                     value  is  returned  (note that overflowuid might also be
1110                     the actual UID value for the socket).
1111
1112       u32 bpf_set_hash(struct sk_buff *skb, u32 hash)
1113
1114              Description
1115                     Set the full hash for skb (set the  field  skb->hash)  to
1116                     value hash.
1117
1118              Return 0
1119
1120       int bpf_setsockopt(struct bpf_sock_ops *bpf_socket, int level, int opt‐
1121       name, char *optval, int optlen)
1122
1123              Description
1124                     Emulate a call to setsockopt() on the  socket  associated
1125                     to  bpf_socket, which must be a full socket. The level at
1126                     which the option resides and  the  name  optname  of  the
1127                     option  must  be  specified,  see  setsockopt(2) for more
1128                     information.   The  option  value  of  length  optlen  is
1129                     pointed by optval.
1130
1131                     This helper actually implements a subset of setsockopt().
1132                     It supports the following levels:
1133
1134                     · SOL_SOCKET,  which  supports  the  following  optnames:
1135                       SO_RCVBUF,  SO_SNDBUF, SO_MAX_PACING_RATE, SO_PRIORITY,
1136                       SO_RCVLOWAT, SO_MARK.
1137
1138                     · IPPROTO_TCP, which  supports  the  following  optnames:
1139                       TCP_CONGESTION, TCP_BPF_IW, TCP_BPF_SNDCWND_CLAMP.
1140
1141                     · IPPROTO_IP, which supports optname IP_TOS.
1142
1143                     · IPPROTO_IPV6, which supports optname IPV6_TCLASS.
1144
1145              Return 0 on success, or a negative error in case of failure.
1146
1147       int  bpf_skb_adjust_room(struct  sk_buff  *skb, s32 len_diff, u32 mode,
1148       u64 flags)
1149
1150              Description
1151                     Grow or shrink the room for data in the packet associated
1152                     to skb by len_diff, and according to the selected mode.
1153
1154                     There are two supported modes at this time:
1155
1156                     · BPF_ADJ_ROOM_MAC:  Adjust  room  at the mac layer (room
1157                       space is added or removed below the layer 2 header).
1158
1159                     · BPF_ADJ_ROOM_NET: Adjust  room  at  the  network  layer
1160                       (room  space  is  added  or  removed  below the layer 3
1161                       header).
1162
1163                     The following flags are supported at this time:
1164
1165                     · BPF_F_ADJ_ROOM_FIXED_GSO:  Do  not   adjust   gso_size.
1166                       Adjusting mss in this way is not allowed for datagrams.
1167
1168                     · BPF_F_ADJ_ROOM_ENCAP_L3_IPV4,
1169                       BPF_F_ADJ_ROOM_ENCAP_L3_IPV6: Any new space is reserved
1170                       to  hold  a  tunnel  header.  Configure skb offsets and
1171                       other fields accordingly.
1172
1173                     · BPF_F_ADJ_ROOM_ENCAP_L4_GRE,
1174                       BPF_F_ADJ_ROOM_ENCAP_L4_UDP: Use with ENCAP_L3 flags to
1175                       further specify the tunnel type.
1176
1177                     · BPF_F_ADJ_ROOM_ENCAP_L2(len):  Use   with   ENCAP_L3/L4
1178                       flags  to  further  specify the tunnel type; len is the
1179                       length of the inner MAC header.
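
                         As a rough illustration of how the flags above
                         combine, the following user-space sketch builds the
                         flags word for an IPv4 + UDP tunnel with an inner MAC
                         header. The DEMO_ names are hypothetical; their
                         values are assumed to mirror the corresponding
                         definitions in linux/bpf.h and should be checked
                         against the installed headers.

```c
#include <stdint.h>

/* Assumed values, mirroring linux/bpf.h; verify against your headers. */
#define DEMO_ADJ_ROOM_FIXED_GSO      (1ULL << 0)
#define DEMO_ADJ_ROOM_ENCAP_L3_IPV4  (1ULL << 1)
#define DEMO_ADJ_ROOM_ENCAP_L3_IPV6  (1ULL << 2)
#define DEMO_ADJ_ROOM_ENCAP_L4_GRE   (1ULL << 3)
#define DEMO_ADJ_ROOM_ENCAP_L4_UDP   (1ULL << 4)
#define DEMO_ADJ_ROOM_ENCAP_L2_MASK  0xffULL
#define DEMO_ADJ_ROOM_ENCAP_L2_SHIFT 56
#define DEMO_ADJ_ROOM_ENCAP_L2(len) \
        (((uint64_t)(len) & DEMO_ADJ_ROOM_ENCAP_L2_MASK) \
         << DEMO_ADJ_ROOM_ENCAP_L2_SHIFT)

/* Flags for an IPv4 + UDP tunnel carrying an inner MAC header of
 * inner_mac_len bytes (for example 14 for plain Ethernet). */
static uint64_t demo_udp_l2_tunnel_flags(uint8_t inner_mac_len)
{
        return DEMO_ADJ_ROOM_ENCAP_L3_IPV4 |
               DEMO_ADJ_ROOM_ENCAP_L4_UDP |
               DEMO_ADJ_ROOM_ENCAP_L2(inner_mac_len);
}
```

                         The inner MAC length occupies the top byte of the
                         word, so it does not collide with the single-bit
                         flags in the low bits.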
1180
1181                     A call to this helper may change  the  underlying  packet
1182                     buffer.  Therefore, at load time, all checks on pointers
1183                     previously done by the verifier are invalidated and  must
1184                     be  performed  again if the helper is used in combination
1185                     with direct packet access.
1186
1187              Return 0 on success, or a negative error in case of failure.
1188
1189       int bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
1190
1191              Description
1192                     Redirect the packet to the endpoint referenced by map  at
1193                     index  key.  Depending  on its type, this map can contain
1194                     references to net devices (for forwarding packets through
1195                     other ports), or to CPUs (for redirecting XDP  frames  to
1196                     another CPU). As of this writing, CPU redirection is only
1197                     implemented for native XDP, that is, with driver support.
1198
1199                     The  lower  two bits of flags are used as the return code
1200                     if the map lookup fails. This is so that the return value
1201                     can  be one of the XDP program return codes up to XDP_TX,
1202                     as chosen by the caller. Any higher  bits  in  the  flags
1203                     argument must be unset.
1204
1205                     When used to redirect packets to net devices, this helper
1206                     performs significantly better than  bpf_redirect().  This
1207                     is   due   to   various  implementation  details  of  the
1208                     underlying mechanisms, one of  which  is  the  fact  that
1209                     bpf_redirect_map() tries to send packets to the device in
1210                     bulk.
1211
1212              Return XDP_REDIRECT on success, or the lower two bits of flags on error.
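
                         The role of flags as a fallback return code can be
                         modelled in plain C. This is a simplified sketch
                         following the description above, not kernel code: the
                         DEMO_ names are hypothetical, entry_found stands in
                         for the map lookup, and returning XDP_ABORTED when
                         higher bits of flags are set is an assumption.

```c
#include <stdint.h>

/* XDP return codes, numbered as in linux/bpf.h. */
enum demo_xdp_action {
        DEMO_XDP_ABORTED = 0,
        DEMO_XDP_DROP,
        DEMO_XDP_PASS,
        DEMO_XDP_TX,
        DEMO_XDP_REDIRECT,
};

/* Simplified model: entry_found is the outcome of the map lookup. */
static int demo_redirect_map_result(int entry_found, uint64_t flags)
{
        if (flags > DEMO_XDP_TX)        /* bits above the lower two are set */
                return DEMO_XDP_ABORTED;
        if (!entry_found)
                return (int)flags;      /* caller-chosen fallback code */
        return DEMO_XDP_REDIRECT;
}
```

                         A caller that wants unmatched packets to continue up
                         the stack would therefore pass XDP_PASS as flags.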
1213
1214       int bpf_sk_redirect_map(struct bpf_map *map, u32 key, u64 flags)
1215
1216              Description
1217                     Redirect the packet to the socket referenced by  map  (of
1218                     type BPF_MAP_TYPE_SOCKMAP) at index key. Both ingress and
1219                     egress  interfaces  can  be  used  for  redirection.  The
1220                     BPF_F_INGRESS value in flags is used to make the distinc‐
1221                     tion (ingress path is selected if the  flag  is  present,
1222                     egress  path  otherwise). This is the only flag supported
1223                     for now.
1224
1225              Return SK_PASS on success, or SK_DROP on error.
1226
1227       int  bpf_sock_map_update(struct  bpf_sock_ops  *skops,  struct  bpf_map
1228       *map, void *key, u64 flags)
1229
1230              Description
1231                     Add an entry to, or update a map referencing sockets. The
1232                     skops is used as a new value for the entry associated  to
1233                     key. flags is one of:
1234
1235                     BPF_NOEXIST
1236                            The entry for key must not exist in the map.
1237
1238                     BPF_EXIST
1239                            The entry for key must already exist in the map.
1240
1241                     BPF_ANY
1242                            No  condition  on  the  existence of the entry for
1243                            key.
1244
1245                     If the map has eBPF programs (parser and verdict),  those
1246                     will  be  inherited  by  the  socket  being added. If the
1247                     socket is already attached to eBPF programs, this results
1248                     in an error.
1249
1250              Return 0 on success, or a negative error in case of failure.
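
                         The three flag values amount to a precondition on
                         whether the key is already present. A minimal
                         user-space sketch of that check follows; the DEMO_
                         names are hypothetical, and the specific error codes
                         are an assumption, since the text above only promises
                         a negative error.

```c
#include <errno.h>
#include <stdint.h>

/* Update flags, numbered as in linux/bpf.h. */
#define DEMO_BPF_ANY     0ULL
#define DEMO_BPF_NOEXIST 1ULL
#define DEMO_BPF_EXIST   2ULL

/* Returns 0 if an update with these flags may proceed, given whether an
 * entry for the key is already present; a negative errno otherwise. */
static int demo_check_update_flags(int entry_present, uint64_t flags)
{
        if (flags > DEMO_BPF_EXIST)
                return -EINVAL;         /* not one of the three values */
        if (flags == DEMO_BPF_NOEXIST && entry_present)
                return -EEXIST;
        if (flags == DEMO_BPF_EXIST && !entry_present)
                return -ENOENT;
        return 0;
}
```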
1251
1252       int bpf_xdp_adjust_meta(struct xdp_buff *xdp_md, int delta)
1253
1254              Description
1255                     Adjust the address pointed to by xdp_md->data_meta by delta
1256                     (which can be positive or negative). Note that this oper‐
1257                     ation modifies the address stored in xdp_md->data, so the
1258                     latter must be loaded only  after  the  helper  has  been
1259                     called.
1260
1261                     The use of xdp_md->data_meta is optional and programs are
1262                     not required to use it. The rationale is  that  when  the
1263                     packet is processed with XDP (e.g. as a DoS filter), it is
1264                     possible to push further metadata along  with  it  before
1265                     passing  to  the stack, and to give the guarantee that an
1266                     ingress eBPF program attached as a TC classifier  on  the
1267                     same device can pick this up for further post-processing.
1268                     Since TC works with socket buffers, it  remains  possible
1269                     to  set  from XDP the mark or priority pointers, or other
1270                     pointers for the  socket  buffer.   Having  this  scratch
1271                     space  generic and programmable allows for more flexibil‐
1272                     ity as the user is free to store whatever  metadata  they
1273                     need.
1274
1275                     A call to this helper may change  the  underlying  packet
1276                     buffer.  Therefore, at load time, all checks on pointers
1277                     previously done by the verifier are invalidated and  must
1278                     be  performed  again if the helper is used in combination
1279                     with direct packet access.
1280
1281              Return 0 on success, or a negative error in case of failure.
1282
1283       int  bpf_perf_event_read_value(struct  bpf_map  *map, u64 flags, struct
1284       bpf_perf_event_value *buf, u32 buf_size)
1285
1286              Description
1287                     Read the value of a perf event counter, and store it into
1288                     buf of size buf_size. This helper relies on a map of type
1289                     BPF_MAP_TYPE_PERF_EVENT_ARRAY. The  nature  of  the  perf
1290                     event  counter  is selected when map is updated with perf
1291                     event file descriptors. The map is an array whose size is
1292                     the  number  of  available CPUs, and each cell contains a
1293                     value relative to one CPU. The value to retrieve is indi‐
1294                     cated by flags, which contains the index of  the  CPU  to
1295                     look up,  masked  with  BPF_F_INDEX_MASK.  Alternatively,
1296                     flags  can  be  set to BPF_F_CURRENT_CPU to indicate that
1297                     the value for the current CPU should be retrieved.
1298
1299                     This    helper    behaves    much     like     the
1300                     bpf_perf_event_read() helper, except that instead of just
1301                     returning the value observed, it fills the buf structure.
1302                     This  allows for additional data to be retrieved: in par‐
1303                     ticular, the enabled and running times  (in  buf->enabled
1304                     and  buf->running,  respectively) are copied. In general,
1305                     bpf_perf_event_read_value()    is    recommended     over
1306                     bpf_perf_event_read(), which has some ABI issues and pro‐
1307                     vides less functionality.
1308
1309                     These values are interesting, because hardware PMU  (Per‐
1310                     formance Monitoring Unit) counters are limited resources.
1311                     When there are more PMU-based perf  events  opened  than
1312                     available counters, the kernel multiplexes these  events
1313                     so that each event gets a certain percentage (but not all)
1314                     of the PMU time. When multiplexing happens, the number of
1315                     samples or the counter value is not directly comparable to
1316                     what would be observed without multiplexing.  This  makes
1317                     comparison between different runs  difficult.  Typically,
1318                     the counter value should be normalized before comparing to
1319                     other experiments. The usual  normalization  is  done  as
1320                     follows.
1321
1322                        normalized_counter = counter * t_enabled / t_running
1323
1324                     Here, t_enabled is the time the event  was  enabled,  and
1325                     t_running is the time it was actually running, since  the
1326                     last normalization. The enabled and  running  times  are
1327                     accumulated since the perf event was opened. To obtain  a
1328                     scaling factor between two invocations of an eBPF program,
1329                     users can use the CPU id as the key (which is typical for
1330                     the perf array usage model) to remember the previous value
1331                     and do the calculation inside the eBPF program.
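
                         The normalization between two readings can be
                         sketched as follows. This is a user-space
                         illustration only: struct demo_perf_value is a
                         hypothetical stand-in with the same three fields as
                         struct bpf_perf_event_value, and returning 0 when the
                         event did not run is a choice made for this sketch.

```c
#include <stdint.h>

/* Same three fields as struct bpf_perf_event_value. */
struct demo_perf_value {
        uint64_t counter;
        uint64_t enabled;
        uint64_t running;
};

/* Scale the counter delta between two readings by
 * delta_enabled / delta_running. Returns 0 if the event did not run at
 * all in the interval, since no estimate is possible then. */
static uint64_t demo_normalized_delta(const struct demo_perf_value *prev,
                                      const struct demo_perf_value *cur)
{
        uint64_t d_counter = cur->counter - prev->counter;
        uint64_t d_enabled = cur->enabled - prev->enabled;
        uint64_t d_running = cur->running - prev->running;

        if (d_running == 0)
                return 0;
        /* The product may overflow for very large values; a real
         * program would need to guard against that. */
        return d_counter * d_enabled / d_running;
}
```

                         For instance, a delta of 200 events observed while
                         the event ran for half of its enabled time is scaled
                         to an estimate of 400.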
1332
1333              Return 0 on success, or a negative error in case of failure.
1334
1335       int bpf_perf_prog_read_value(struct  bpf_perf_event_data  *ctx,  struct
1336       bpf_perf_event_value *buf, u32 buf_size)
1337
1338              Description
1339                     For an eBPF program attached to a  perf  event,  retrieve
1340                     the value of the event  counter  associated  to  ctx  and
1341                     store it in the structure pointed to by buf and  of  size
1342                     buf_size. Enabled and running times are  also  stored  in
1343                     the     structure     (see    description    of    helper
1344                     bpf_perf_event_read_value() for more details).
1345
1346              Return 0 on success, or a negative error in case of failure.
1347
1348       int bpf_getsockopt(struct bpf_sock_ops *bpf_socket, int level, int opt‐
1349       name, char *optval, int optlen)
1350
1351              Description
1352                     Emulate  a  call to getsockopt() on the socket associated
1353                     to bpf_socket, which must be a full socket. The level  at
1354                     which  the  option  resides  and  the name optname of the
1355                     option must be  specified,  see  getsockopt(2)  for  more
1356                     information.  The retrieved value is stored in the struc‐
1357                     ture pointed to by optval and of length optlen.
1358
1359                     This helper actually implements a subset of getsockopt().
1360                     It supports the following levels:
1361
1362                     · IPPROTO_TCP, which supports optname TCP_CONGESTION.
1363
1364                     · IPPROTO_IP, which supports optname IP_TOS.
1365
1366                     · IPPROTO_IPV6, which supports optname IPV6_TCLASS.
1367
1368              Return 0 on success, or a negative error in case of failure.
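
                         The supported subset listed above can be written as
                         a small lookup, for instance to validate arguments
                         before they reach the helper. This is a user-space
                         sketch: the function name is hypothetical, and it
                         encodes only the combinations named in this entry.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Returns 1 if (level, optname) is listed as supported above, else 0. */
static int demo_bpf_getsockopt_supported(int level, int optname)
{
        switch (level) {
        case IPPROTO_TCP:
                return optname == TCP_CONGESTION;
        case IPPROTO_IP:
                return optname == IP_TOS;
        case IPPROTO_IPV6:
                return optname == IPV6_TCLASS;
        default:
                return 0;
        }
}
```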
1369
1370       int bpf_override_return(struct pt_regs *regs, u64 rc)
1371
1372              Description
1373                     Used  for  error  injection,  this helper uses kprobes to
1374                     override the return value of the probed function, and  to
1375                     set  it to rc.  The first argument is the context regs on
1376                     which the kprobe works.
1377
1378                     This helper works by setting the PC (program
1379                     counter) to an override function which is run in place of
1380                     the original probed function. This means the probed func‐
1381                     tion  is  not  run  at all. The replacement function just
1382                     returns with the required value.
1383
1384                     This helper has security implications, and thus  is  sub‐
1385                     ject  to restrictions. It is only available if the kernel
1386                     was compiled with the CONFIG_BPF_KPROBE_OVERRIDE configu‐
1387                     ration  option,  and  in this case it only works on func‐
1388                     tions tagged with  ALLOW_ERROR_INJECTION  in  the  kernel
1389                     code.
1390
1391                     Also,  the helper is only available for the architectures
1392                     having the CONFIG_FUNCTION_ERROR_INJECTION option. As  of
1393                     this  writing,  the x86 architecture is the only one that
1394                     supports this feature.
1395
1396              Return 0
1397
1398       int  bpf_sock_ops_cb_flags_set(struct   bpf_sock_ops   *bpf_sock,   int
1399       argval)
1400
1401              Description
1402                     Attempt  to  set  the  value of the bpf_sock_ops_cb_flags
1403                     field for the full TCP socket associated to  bpf_sock  to
1404                     argval.
1405
1406                     The  primary  use  of this field is to determine if there
1407                     should   be   calls   to   eBPF    programs    of    type
1408                     BPF_PROG_TYPE_SOCK_OPS at various points in the TCP code.
1409                     A program of the same type can change its value, per con‐
1410                     nection  and  as necessary, when the connection is estab‐
1411                     lished. This field is directly  accessible  for  reading,
1412                     but  this  helper  must  be  used for updates in order to
1413                     return an error if an eBPF program tries to set  a  call‐
1414                     back that is not supported in the current kernel.
1415
1416                     argval is a flag array which can combine these flags:
1417
1418                     · BPF_SOCK_OPS_RTO_CB_FLAG (retransmission time out)
1419
1420                     · BPF_SOCK_OPS_RETRANS_CB_FLAG (retransmission)
1421
1422                     · BPF_SOCK_OPS_STATE_CB_FLAG (TCP state change)
1423
1424                     · BPF_SOCK_OPS_RTT_CB_FLAG (every RTT)
1425
1426                     This function can also be used to clear a callback flag by
1427                     setting the appropriate bit to zero, for example to  dis‐
1428                     able the RTO callback:
1429
1430                     bpf_sock_ops_cb_flags_set(bpf_sock,
1431                            bpf_sock->bpf_sock_ops_cb_flags &
1432                            ~BPF_SOCK_OPS_RTO_CB_FLAG)
1433
1434                     Here are some examples of where one could  call  such  an
1435                     eBPF program:
1436
1437                     · When RTO fires.
1438
1439                     · When a packet is retransmitted.
1440
1441                     · When the connection terminates.
1442
1443                     · When a packet is sent.
1444
1445                     · When a packet is received.
1446
1447              Return Code -EINVAL if the socket is not a full TCP socket; oth‐
1448                     erwise, a positive number containing the bits that  could
1449                     not be set is returned (which comes down to 0 if all bits
1450                     were set as required).
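
                         The return convention can be modelled in user space
                         as below. This is a sketch under stated assumptions,
                         not kernel code: struct demo_sock and the DEMO_ names
                         are hypothetical, and the set of supported bits
                         (0xF here, covering the four flags listed above)
                         depends on the running kernel.

```c
#include <errno.h>
#include <stdint.h>

/* Flag bits, assumed to mirror linux/bpf.h at the time of writing. */
#define DEMO_RTO_CB_FLAG     (1 << 0)
#define DEMO_RETRANS_CB_FLAG (1 << 1)
#define DEMO_STATE_CB_FLAG   (1 << 2)
#define DEMO_RTT_CB_FLAG     (1 << 3)
#define DEMO_ALL_CB_FLAGS    0xF

/* Stand-in for the per-socket state; not a kernel structure. */
struct demo_sock {
        int is_full_tcp_sock;
        int cb_flags;
};

static int demo_cb_flags_set(struct demo_sock *sk, int argval)
{
        if (!sk->is_full_tcp_sock)
                return -EINVAL;
        sk->cb_flags = argval & DEMO_ALL_CB_FLAGS;  /* keep supported bits */
        return argval & ~DEMO_ALL_CB_FLAGS;         /* report the rest */
}
```

                         A non-zero positive result thus tells the caller
                         which requested callbacks the kernel does not know.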
1451
1452       int bpf_msg_redirect_map(struct sk_msg_buff *msg, struct bpf_map  *map,
1453       u32 key, u64 flags)
1454
1455              Description
1456                     This  helper is used in programs implementing policies at
1457                     the socket level. If the message msg is allowed  to  pass
1458                     (i.e. if the verdict eBPF program returns SK_PASS), redi‐
1459                     rect  it  to  the  socket  referenced  by  map  (of  type
1460                     BPF_MAP_TYPE_SOCKMAP)  at  index  key.  Both  ingress and
1461                     egress  interfaces  can  be  used  for  redirection.  The
1462                     BPF_F_INGRESS value in flags is used to make the distinc‐
1463                     tion (ingress path is selected if the  flag  is  present,
1464                     egress  path  otherwise). This is the only flag supported
1465                     for now.
1466
1467              Return SK_PASS on success, or SK_DROP on error.
1468
1469       int bpf_msg_apply_bytes(struct sk_msg_buff *msg, u32 bytes)
1470
1471              Description
1472                     For socket policies, apply the verdict of the  eBPF  pro‐
1473                     gram to the next bytes (number of bytes) of message msg.
1474
1475                     For  example,  this  helper  can be used in the following
1476                     cases:
1477
1478                     · A single sendmsg() or sendfile() system  call  contains
1479                       multiple logical messages that the eBPF program is sup‐
1480                       posed to read and for which it should apply a verdict.
1481
1482                     · An eBPF program only cares to read the first bytes of a
1483                       msg.  If  the message has a large payload, then setting
1484                       up and calling the  eBPF  program  repeatedly  for  all
1485                       bytes,  even though the verdict is already known, would
1486                       create unnecessary overhead.
1487
1488                     When called from within an eBPF program, the helper  sets
1489                     a  counter  internal  to  the BPF infrastructure, that is
1490                     used to apply the last verdict  to  the  next  bytes.  If
1491                     bytes  is  smaller  than the current data being processed
1492                     from a sendmsg() or sendfile()  system  call,  the  first
1493                     bytes  will  be  sent and the eBPF program will be re-run
1494                     with the pointer for start of data pointing to byte  num‐
1495                     ber  bytes  + 1. If bytes is larger than the current data
1496                     being processed, then the eBPF verdict will be applied to
1497                     multiple  sendmsg()  or  sendfile() calls until bytes are
1498                     consumed.
1499
1500                     Note that if a socket closes with  the  internal  counter
1501                     holding  a  non-zero value, this is not a problem because
1502                     data is not being buffered for bytes and is sent as it is
1503                     received.
1504
1505              Return 0
1506
1507       int bpf_msg_cork_bytes(struct sk_msg_buff *msg, u32 bytes)
1508
1509              Description
1510                     For socket policies, prevent the execution of the verdict
1511                     eBPF program for message msg until  bytes  (byte  number)
1512                     have been accumulated.
1513
1514                     This  can  be  used  when  one needs a specific number of
1515                     bytes before a verdict can be assigned, even if the  data
1516                     spans multiple sendmsg() or sendfile() calls. The extreme
1517                     case would be a user calling  sendmsg()  repeatedly  with
1518                     1-byte  long message segments. Obviously, this is bad for
1519                     performance, but it is still valid. If the  eBPF  program
1520                     needs  bytes  bytes to validate a header, this helper can
1521                     be used to prevent the eBPF program  from  being  called
1522                     again until bytes have been accumulated.
1523
1524              Return 0
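
                         The accumulation described above can be pictured
                         with a toy user-space model. It is only a sketch:
                         struct demo_cork and demo_cork_feed() are
                         hypothetical names, and the real bookkeeping inside
                         the kernel is more involved.

```c
#include <stdint.h>

struct demo_cork {
        uint32_t needed;   /* value passed as bytes */
        uint32_t held;     /* bytes accumulated so far */
};

/* Account for len newly sent bytes. Returns 1 once enough data has been
 * gathered for the verdict program to run, 0 while data is still held. */
static int demo_cork_feed(struct demo_cork *c, uint32_t len)
{
        c->held += len;
        if (c->held < c->needed)
                return 0;
        c->held = 0;       /* threshold reached: release and reset */
        c->needed = 0;
        return 1;
}
```

                         With needed set to 3, three 1-byte sends yield 0, 0
                         and then 1, matching the extreme case mentioned in
                         the description.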
1525
1526       int  bpf_msg_pull_data(struct sk_msg_buff *msg, u32 start, u32 end, u64
1527       flags)
1528
1529              Description
1530                     For socket policies, pull in non-linear  data  from  user
1531                     space   for   msg   and   set   pointers   msg->data  and
1532                     msg->data_end to start and end byte  offsets  into  msg,
1533                     respectively.
1534
1535                     If a program of type BPF_PROG_TYPE_SK_MSG is run on a msg
1536                     it can only parse data that the (data, data_end) pointers
1537                     have already consumed. For sendmsg() hooks this is likely
1538                     the first scatterlist element. But for calls  relying  on
1539                     the  sendpage  handler (e.g. sendfile()) this will be the
1540                     range (0, 0) because the data is shared with  user  space
1541                     and  by  default  the objective is to avoid allowing user
1542                     space to modify data while (or  after)  eBPF  verdict  is
1543                     being  decided.  This  helper can be used to pull in data
1544                     and to set the start and end  pointer  to  given  values.
1545                     Data  will  be  copied if necessary (i.e. if data was not
1546                     linear and if start and end pointers do not point to  the
1547                     same chunk).
1548
1549                     A call to this helper may change  the  underlying  packet
1550                     buffer.  Therefore, at load time, all checks on pointers
1551                     previously done by the verifier are invalidated and  must
1552                     be  performed  again if the helper is used in combination
1553                     with direct packet access.
1554
1555                     All  values  for flags are reserved for future usage, and
1556                     must be left at zero.
1557
1558              Return 0 on success, or a negative error in case of failure.
1559
1560       int bpf_bind(struct bpf_sock_addr  *ctx,  struct  sockaddr  *addr,  int
1561       addr_len)
1562
1563              Description
1564                     Bind  the socket associated to ctx to the address pointed
1565                     by addr, of length addr_len. This allows for making  out‐
1566                     going connections from the desired IP address, which  can
1567                     be useful for example when all processes inside a  cgroup
1568                     should  use one single IP address on a host that has mul‐
1569                     tiple IP addresses configured.
1570
1571                     This helper works for IPv4 and IPv6, TCP and UDP sockets.
1572                     The   domain   (addr->sa_family)   must  be  AF_INET  (or
1573                     AF_INET6). Looking for a free port  to  bind  to  can  be
1574                     expensive, therefore binding to a port is  not  permitted
1575                     by the helper: addr->sin_port (or sin6_port, respectively)
1576                     must be set to zero.
1577
1578              Return 0 on success, or a negative error in case of failure.
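
                         The constraints on addr can be illustrated by
                         preparing an IPv4 address with the port left at
                         zero. This is a user-space sketch with a hypothetical
                         function name; inet_pton() is not available inside an
                         eBPF program, where the address would instead be
                         written into the structure directly.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Fill *sa with an IPv4 address suitable for the helper: family AF_INET
 * and port left at zero. Returns 0 on success, -1 if ip does not parse. */
static int demo_make_bind_addr(const char *ip, struct sockaddr_in *sa)
{
        memset(sa, 0, sizeof(*sa));
        sa->sin_family = AF_INET;
        sa->sin_port = 0;   /* binding to a port is not permitted */
        return inet_pton(AF_INET, ip, &sa->sin_addr) == 1 ? 0 : -1;
}
```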
1579
1580       int bpf_xdp_adjust_tail(struct xdp_buff *xdp_md, int delta)
1581
1582              Description
1583                     Adjust (move) xdp_md->data_end by delta bytes. It is only
1584                     possible to shrink the packet as of this writing,  there‐
1585                     fore delta must be a negative integer.
1586
1587                     A call to this helper may change  the  underlying  packet
1588                     buffer.  Therefore, at load time, all checks on pointers
1589                     previously done by the verifier are invalidated and  must
1590                     be  performed  again if the helper is used in combination
1591                     with direct packet access.
1592
1593              Return 0 on success, or a negative error in case of failure.
1594
1595       int  bpf_skb_get_xfrm_state(struct  sk_buff  *skb,  u32  index,  struct
1596       bpf_xfrm_state *xfrm_state, u32 size, u64 flags)
1597
1598              Description
1599                     Retrieve the XFRM state (IP transform framework, see also
1600                     ip-xfrm(8)) at index in XFRM "security path" for skb.
1601
1602                     The   retrieved   value   is   stored   in   the   struct
1603                     bpf_xfrm_state pointed by xfrm_state and of length size.
1604
1605                     All values for flags are reserved for future  usage,  and
1606                     must be left at zero.
1607
1608                     This  helper is available only if the kernel was compiled
1609                     with the CONFIG_XFRM configuration option.
1610
1611              Return 0 on success, or a negative error in case of failure.
1612
1613       int bpf_get_stack(struct pt_regs *regs, void *buf, u32 size, u64 flags)
1614
1615              Description
1616                     Return a user or a kernel stack in a buffer  provided  by
1617                     the bpf program. To achieve this, the helper needs  regs,
1618                     which is a pointer to the context on  which  the  tracing
1619                     program  is  executed.  To store the stacktrace, the bpf
1620                     program provides buf with a nonnegative size.
1621
1622                     The last argument,  flags,  holds  the  number  of  stack
1623                     frames   to   skip   (from   0   to   255),  masked  with
1624                     BPF_F_SKIP_FIELD_MASK. The next bits can be used  to  set
1625                     the following flags:
1626
1627                     BPF_F_USER_STACK
1628                            Collect  a  user  space  stack instead of a kernel
1629                            stack.
1630
1631                     BPF_F_USER_BUILD_ID
1632                            Collect buildid+offset instead  of  ips  for  user
1633                            stack,  only  valid  if  BPF_F_USER_STACK  is also
1634                            specified.
1635
1636                     bpf_get_stack() can collect  up  to  PERF_MAX_STACK_DEPTH
1637                     kernel and user frames, subject to a  sufficiently  large
1638                     buffer size. Note that this limit can be controlled  with
1639                     the  sysctl  program,  and  that  it  should  be manually
1640                     increased in order to profile long user stacks  (such  as
1641                     stacks for Java programs). To do so, use:
1642
1643                        # sysctl kernel.perf_event_max_stack=<new value>
1644
1645              Return A  non-negative  value equal to or less than size on suc‐
1646                     cess, or a negative error in case of failure.
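
                         How the skip count and the flag bits share the flags
                         argument can be sketched in user space as follows.
                         The DEMO_ names are hypothetical; their values are
                         assumed to mirror BPF_F_SKIP_FIELD_MASK,
                         BPF_F_USER_STACK and BPF_F_USER_BUILD_ID in
                         linux/bpf.h.

```c
#include <stdint.h>

/* Assumed values, mirroring linux/bpf.h; verify against your headers. */
#define DEMO_SKIP_FIELD_MASK 0xffULL
#define DEMO_USER_STACK      (1ULL << 8)
#define DEMO_USER_BUILD_ID   (1ULL << 11)

/* Build the flags word: the low 8 bits hold the number of frames to
 * skip, the higher bits select the kind of stack to collect. */
static uint64_t demo_stack_flags(uint64_t skip, int user, int build_id)
{
        uint64_t flags = skip & DEMO_SKIP_FIELD_MASK;

        if (user)
                flags |= DEMO_USER_STACK;
        if (user && build_id)           /* only valid with a user stack */
                flags |= DEMO_USER_BUILD_ID;
        return flags;
}
```

                         Note that the mask silently truncates a skip count
                         above 255; a caller may prefer to reject such values
                         instead.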
1647
1648       int bpf_skb_load_bytes_relative(const struct sk_buff *skb, u32  offset,
1649       void *to, u32 len, u32 start_header)
1650
1651              Description
1652                     This helper is similar to bpf_skb_load_bytes() in that it
1653                     provides an easy way to load len bytes from  offset  from
1654                     the  packet associated to skb, into the buffer pointed by
1655                     to. The difference  to  bpf_skb_load_bytes()  is  that  a
1656                     fifth  argument  start_header exists in order to select a
1657                     base offset to start from. start_header can be one of:
1658
1659                     BPF_HDR_START_MAC
1660                            Base offset to load data from is skb's mac header.
1661
1662                     BPF_HDR_START_NET
1663                            Base offset to load data  from  is  skb's  network
1664                            header.
1665
1666                     In  general,  "direct  packet  access"  is  the preferred
1667                     method to access packet data.  However,  this  helper  is
1668                     particularly useful in socket filters where skb->data does
1669                     not always point to the start of the mac header and where
1670                     "direct packet access" is not available.
1671
1672              Return 0 on success, or a negative error in case of failure.
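
                         The effect of start_header can be pictured with a toy
                         model over a flat byte buffer. This is a user-space
                         sketch, not the kernel implementation: struct
                         demo_pkt and the DEMO_ names are hypothetical, and
                         the specific error codes are an assumption.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HDR_START_MAC 0
#define DEMO_HDR_START_NET 1

struct demo_pkt {
        const uint8_t *buf;   /* starts at the MAC header */
        uint32_t len;         /* total bytes in buf */
        uint32_t net_off;     /* offset of the network header in buf */
};

static int demo_load_bytes_relative(const struct demo_pkt *p, uint32_t offset,
                                    void *to, uint32_t len,
                                    uint32_t start_header)
{
        uint32_t base;

        switch (start_header) {
        case DEMO_HDR_START_MAC: base = 0; break;
        case DEMO_HDR_START_NET: base = p->net_off; break;
        default: return -EINVAL;
        }
        /* Bounds checks ordered so that no subtraction can wrap. */
        if (base > p->len || offset > p->len - base ||
            len > p->len - base - offset)
                return -EFAULT;
        memcpy(to, p->buf + base + offset, len);
        return 0;
}
```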
1673
1674       int  bpf_fib_lookup(void *ctx, struct bpf_fib_lookup *params, int plen,
1675       u32 flags)
1676
1677              Description
1678                     Do FIB  lookup  in  kernel  tables  using  parameters  in
1679                     params.   If lookup is successful and result shows packet
1680                     is to be forwarded, the neighbor tables are searched  for
1681                     the nexthop.  If successful (i.e. FIB lookup shows  for‐
1682                     warding and nexthop is resolved), the nexthop address  is
1683                     returned in ipv4_dst or ipv6_dst based on family, smac is
1684                     set to mac address of egress device, dmac is set to  nex‐
1685                     thop  mac  address, rt_metric is set to metric from route
1686                     (IPv4/IPv6 only), and ifindex is set to the device  index
1687                     of the nexthop from the FIB lookup.
1688
1689                     The plen argument is the size of the passed-in struct. The
1690                     flags argument can be a combination of one or more of the
1691                     following values:
1692
1693                     BPF_FIB_LOOKUP_DIRECT
1694                            Do  a direct table lookup vs full lookup using FIB
1695                            rules.
1696
1697                     BPF_FIB_LOOKUP_OUTPUT
1698                            Perform lookup from an egress perspective (default
1699                            is ingress).
1700
1701                     ctx  is  either  struct xdp_md for XDP programs or struct
1702                     sk_buff for tc cls_act programs.
1703
1704              Return
1705
1706                     · < 0 if any input argument is invalid
1707
1708                     · 0 on success (packet  is  forwarded,  nexthop  neighbor
1709                       exists)
1710
1711                     · >  0  one of BPF_FIB_LKUP_RET_ codes explaining why the
1712                       packet is not forwarded or needs assist from full stack
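
                         The three classes of return value suggest a simple
                         dispatch in the caller. The sketch below, with
                         hypothetical names, only classifies the value by its
                         sign; what a program does in each case (redirect,
                         pass to the stack, drop) is left to the program.

```c
enum demo_fib_outcome {
        DEMO_FIB_BAD_ARGS,       /* return value < 0 */
        DEMO_FIB_FORWARD,        /* return value == 0 */
        DEMO_FIB_NOT_FORWARDED,  /* return value > 0 */
};

/* Map the helper's return value onto the three cases listed above. */
static enum demo_fib_outcome demo_classify_fib_rc(int rc)
{
        if (rc < 0)
                return DEMO_FIB_BAD_ARGS;
        if (rc == 0)
                return DEMO_FIB_FORWARD;
        return DEMO_FIB_NOT_FORWARDED;
}
```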
1713
1714       int  bpf_sock_hash_update(struct   bpf_sock_ops_kern   *skops,   struct
1715       bpf_map *map, void *key, u64 flags)
1716
1717              Description
1718                     Add  an  entry  to,  or update a sockhash map referencing
1719                     sockets.  The skops is used as a new value for the  entry
1720                     associated to key. flags is one of:
1721
1722                     BPF_NOEXIST
1723                            The entry for key must not exist in the map.
1724
1725                     BPF_EXIST
1726                            The entry for key must already exist in the map.
1727
1728                     BPF_ANY
1729                            No  condition  on  the  existence of the entry for
1730                            key.
1731
1732                     If the map has eBPF programs (parser and verdict),  those
1733                     will  be  inherited  by  the  socket  being added. If the
1734                     socket is already attached to eBPF programs, this results
1735                     in an error.
1736
1737              Return 0 on success, or a negative error in case of failure.
1738
1739       int bpf_msg_redirect_hash(struct sk_msg_buff *msg, struct bpf_map *map,
1740       void *key, u64 flags)
1741
1742              Description
1743                     This helper is used in programs implementing policies  at
1744                     the  socket  level. If the message msg is allowed to pass
1745                     (i.e. if the verdict eBPF program returns SK_PASS), redi‐
1746                     rect  it  to  the  socket  referenced  by  map  (of  type
1747                     BPF_MAP_TYPE_SOCKHASH) using hash key. Both  ingress  and
1748                     egress  interfaces  can  be  used  for  redirection.  The
1749                     BPF_F_INGRESS value in flags is used to make the distinc‐
1750                     tion  (ingress  path  is selected if the flag is present,
1751                     egress path otherwise). This is the only  flag  supported
1752                     for now.
1753
1754              Return SK_PASS on success, or SK_DROP on error.
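
The role of BPF_F_INGRESS can be sketched as a small user-space model. The names model_redirect() and enum model_path are hypothetical, and dropping on any unsupported flag bit is an assumption based on the statement that BPF_F_INGRESS is the only flag supported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BPF_F_INGRESS (1ULL << 0)              /* value from <linux/bpf.h> */

enum sk_action { SK_DROP = 0, SK_PASS = 1 };   /* values from <linux/bpf.h> */
enum model_path { MODEL_EGRESS, MODEL_INGRESS }; /* hypothetical */

/* Pick the redirection path from flags; reject unknown flag bits. */
static int model_redirect(uint64_t flags, enum model_path *path)
{
	assert(path != NULL);
	if (flags & ~BPF_F_INGRESS)
		return SK_DROP;
	*path = (flags & BPF_F_INGRESS) ? MODEL_INGRESS : MODEL_EGRESS;
	return SK_PASS;
}
```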
1755
1756       int bpf_sk_redirect_hash(struct sk_buff *skb, struct bpf_map *map, void
1757       *key, u64 flags)
1758
1759              Description
1760                     This helper is used in programs implementing policies  at
1761                     the  skb  socket  level. If the sk_buff skb is allowed to
1762                     pass  (i.e.   if   the  verdict  eBPF   program   returns
1763                     SK_PASS), redirect it to the socket referenced by map (of
1764                     type BPF_MAP_TYPE_SOCKHASH) using hash key. Both  ingress
1765                     and  egress  interfaces  can be used for redirection. The
1766                     BPF_F_INGRESS value in flags is used to make the distinc‐
1767                     tion  (ingress  path  is selected if the flag is present,
1768                     egress otherwise). This is the only  flag  supported  for
1769                     now.
1770
1771              Return SK_PASS on success, or SK_DROP on error.
1772
1773       int  bpf_lwt_push_encap(struct  sk_buff  *skb, u32 type, void *hdr, u32
1774       len)
1775
1776              Description
1777                     Encapsulate the packet associated to skb within a Layer 3
1778                     protocol header. This header is provided in the buffer at
1779                     address hdr, with len its size in bytes.  type  indicates
1780                     the protocol of the header and can be one of:
1781
1782                     BPF_LWT_ENCAP_SEG6
1783                            IPv6  encapsulation  with  Segment  Routing Header
1784                            (struct ipv6_sr_hdr). hdr only contains  the  SRH,
1785                            the IPv6 header is computed by the kernel.
1786
1787                     BPF_LWT_ENCAP_SEG6_INLINE
1788                            Only  works if skb contains an IPv6 packet. Insert
1789                            a  Segment  Routing  Header  (struct  ipv6_sr_hdr)
1790                            inside the IPv6 header.
1791
1792                     BPF_LWT_ENCAP_IP
1793                            IP  encapsulation  (GRE/GUE/IPIP/etc).  The  outer
1794                            header must be IPv4 or IPv6, followed by  zero  or
1795                            more  additional  headers, up to LWT_BPF_MAX_HEAD‐
1796                            ROOM total bytes in all prepended headers.  Please
1797                            note that if skb_is_gso(skb) is true, no more than
1798                            two  headers  can  be  prepended,  and  the  inner
1799                            header,  if  present,  should  be  either  GRE  or
1800                            UDP/GUE.
1801
1802                     BPF_LWT_ENCAP_SEG6* types can be called by  BPF  programs
1803                     of  type  BPF_PROG_TYPE_LWT_IN; BPF_LWT_ENCAP_IP type can
1804                     be called by BPF programs of  types  BPF_PROG_TYPE_LWT_IN
1805                     and BPF_PROG_TYPE_LWT_XMIT.
1806
1807                     A call to this helper may  change  the  underlying  packet
1808                     buffer.  Therefore, at load time, all checks on pointers
1809                     previously done by the verifier are invalidated and  must
1810                     be  performed again if the helper is used in combination
1811                     with direct packet access.
1812
1813              Return 0 on success, or a negative error in case of failure.
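
The rule about which program types may request which encapsulation type can be written down as a lookup. This is a user-space sketch of the paragraph above; all enum and function names are hypothetical stand-ins and do not reproduce the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the encapsulation types listed above. */
enum model_encap { MODEL_ENCAP_SEG6, MODEL_ENCAP_SEG6_INLINE, MODEL_ENCAP_IP };

/* Hypothetical stand-ins for the LWT program types. */
enum model_prog { MODEL_LWT_IN, MODEL_LWT_OUT, MODEL_LWT_XMIT };

/* Return true if a program of type prog may use encapsulation type encap. */
static bool model_encap_allowed(enum model_prog prog, enum model_encap encap)
{
	switch (encap) {
	case MODEL_ENCAP_SEG6:
	case MODEL_ENCAP_SEG6_INLINE:
		return prog == MODEL_LWT_IN;
	case MODEL_ENCAP_IP:
		return prog == MODEL_LWT_IN || prog == MODEL_LWT_XMIT;
	}
	return false;
}
```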
1814
1815       int  bpf_lwt_seg6_store_bytes(struct  sk_buff  *skb,  u32 offset, const
1816       void *from, u32 len)
1817
1818              Description
1819                     Store len bytes from address from into the packet associ‐
1820                     ated  to  skb,  at  offset.  Only the flags, tag and TLVs
1821                     inside the outermost IPv6 Segment Routing Header  can  be
1822                     modified through this helper.
1823
1824                     A call to this helper may  change  the  underlying  packet
1825                     buffer.  Therefore, at load time, all checks on pointers
1826                     previously done by the verifier are invalidated and  must
1827                     be  performed again if the helper is used in combination
1828                     with direct packet access.
1829
1830              Return 0 on success, or a negative error in case of failure.
1831
1832       int bpf_lwt_seg6_adjust_srh(struct sk_buff *skb, u32 offset, s32 delta)
1833
1834              Description
1835                     Adjust  the  size allocated to TLVs in the outermost IPv6
1836                     Segment Routing Header contained in the packet associated
1837                     to skb, at position offset, by delta bytes. Only  offsets
1838                     after  the  segments  are  accepted.  delta can be either
1839                     positive (growing) or negative (shrinking).
1840
1841                     A call to this helper may  change  the  underlying  packet
1842                     buffer.  Therefore, at load time, all checks on pointers
1843                     previously done by the verifier are invalidated and  must
1844                     be  performed again if the helper is used in combination
1845                     with direct packet access.
1846
1847              Return 0 on success, or a negative error in case of failure.
1848
1849       int  bpf_lwt_seg6_action(struct  sk_buff *skb, u32 action, void *param,
1850       u32 param_len)
1851
1852              Description
1853                     Apply an IPv6 Segment Routing action of  type  action  to
1854                     the packet associated to skb. Each action takes a parame‐
1855                     ter contained at address param, and of  length  param_len
1856                     bytes.  action can be one of:
1857
1858                     SEG6_LOCAL_ACTION_END_X
1859                            End.X action: Endpoint with Layer-3 cross-connect.
1860                            Type of param: struct in6_addr.
1861
1862                     SEG6_LOCAL_ACTION_END_T
1863                            End.T action: Endpoint with  specific  IPv6  table
1864                            lookup.  Type of param: int.
1865
1866                     SEG6_LOCAL_ACTION_END_B6
1867                            End.B6  action:  Endpoint bound to an SRv6 policy.
1868                            Type of param: struct ipv6_sr_hdr.
1869
1870                     SEG6_LOCAL_ACTION_END_B6_ENCAP
1871                            End.B6.Encap action: Endpoint  bound  to  an  SRv6
1872                            encapsulation   policy.   Type  of  param:  struct
1873                            ipv6_sr_hdr.
1874
1875                     A call to this helper may  change  the  underlying  packet
1876                     buffer.  Therefore, at load time, all checks on pointers
1877                     previously done by the verifier are invalidated and  must
1878                     be  performed again if the helper is used in combination
1879                     with direct packet access.
1880
1881              Return 0 on success, or a negative error in case of failure.
1882
1883       int bpf_rc_repeat(void *ctx)
1884
1885              Description
1886                     This helper is used in programs implementing IR decoding,
1887                     to report a successfully decoded repeat key message. This
1888                     delays the generation of a key up event for the previously
1889                     generated key down event.
1890
1891                     Some IR protocols, like NEC, have a special IR message for
1892                     repeating the last button while a button is held down.
1893
1894                     The ctx should point to the lirc sample  as  passed  into
1895                     the program.
1896
1897                     This  helper is only available if the kernel was compiled
1898                     with the CONFIG_BPF_LIRC_MODE2 configuration  option  set
1899                     to "y".
1900
1901              Return 0
1902
1903       int bpf_rc_keydown(void *ctx, u32 protocol, u64 scancode, u32 toggle)
1904
1905              Description
1906                     This helper is used in programs implementing IR decoding,
1907                     to report a successfully decoded key press with scancode,
1908                     toggle  value in the given protocol. The scancode will be
1909                     translated to a keycode using the rc keymap, and reported
1910                     as an input key down event. After a period a key up event
1911                     is generated. This period  can  be  extended  by  calling
1912                     either  bpf_rc_keydown()  again  with the same values, or
1913                     calling bpf_rc_repeat().
1914
1915                     Some protocols include a toggle bit, in case  the  button
1916                     was  released and pressed again between consecutive scan‐
1917                     codes.
1918
1919                     The ctx should point to the lirc sample  as  passed  into
1920                     the program.
1921
1922                     The  protocol  is  the  decoded protocol number (see enum
1923                     rc_proto for some predefined values).
1924
1925                     This helper is only available if the kernel was  compiled
1926                     with  the  CONFIG_BPF_LIRC_MODE2 configuration option set
1927                     to "y".
1928
1929              Return 0
1930
1931       u64 bpf_skb_cgroup_id(struct sk_buff *skb)
1932
1933              Description
1934                     Return the cgroup v2 id of the socket associated with the
1935                     skb.     This     is     roughly    similar    to    the
1936                     bpf_get_cgroup_classid() helper for cgroup v1: it provides
1937                     a  tag  or  identifier that can be matched on or used for
1938                     map lookups, e.g. to implement policy. The cgroup  v2  id
1939                     of  a  given  path  in the hierarchy is exposed in user
1940                     space through the f_handle API in order  to  get  to  the
                     same 64-bit id.
1941
1942                     This helper can be used on TC egress  path,  but  not  on
1943                     ingress, and is available only if the kernel was compiled
1944                     with the CONFIG_SOCK_CGROUP_DATA configuration option.
1945
1946              Return The id is returned or 0 in  case  the  id  could  not  be
1947                     retrieved.
1948
1949       u64 bpf_get_current_cgroup_id(void)
1950
1951              Return A  64-bit  integer containing the current cgroup id based
1952                     on the cgroup within which the current task is running.
1953
1954       void *bpf_get_local_storage(void *map, u64 flags)
1955
1956              Description
1957                     Get the pointer to the local storage area.  The type  and
1966                     The user is responsible for synchronization, for  example
1967                     by  using  the  BPF_STX_XADD  instruction  to  alter  the
1968                     shared data.
1961
1962                     Depending  on  the BPF program type, a local storage area
1963                     can be shared between multiple instances of the BPF  pro‐
1964                     gram, running simultaneously.
1965
1966                     A  user should care about the synchronization by himself.
1967                     For example, by using  the  BPF_STX_XADD  instruction  to
1968                     alter the shared data.
1969
1970              Return A pointer to the local storage area.
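
The synchronization advice above can be illustrated with an atomic add on shared counters. This is a user-space sketch: struct model_storage and model_account() are hypothetical names, and the claim that the __sync_fetch_and_add() compiler builtin is how BPF C code typically obtains an atomic add is an assumption not stated on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of a storage area shared by program instances. */
struct model_storage {
	uint64_t packets;
	uint64_t bytes;
};

/*
 * Update the shared counters with atomic adds, so that concurrent
 * instances do not lose each other's increments.
 */
static void model_account(struct model_storage *st, uint64_t len)
{
	assert(st != NULL);
	__sync_fetch_and_add(&st->packets, 1);
	__sync_fetch_and_add(&st->bytes, len);
}
```

Each counter is updated atomically on its own; the two fields are not updated together as a single transaction.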
1971
1972       int   bpf_sk_select_reuseport(struct   sk_reuseport_md  *reuse,  struct
1973       bpf_map *map, void *key, u64 flags)
1974
1975              Description
1976                     Select       a       SO_REUSEPORT       socket      from      a
1977                     BPF_MAP_TYPE_REUSEPORT_ARRAY  map.  It  checks  that  the
1978                     selected socket matches the incoming request in the skb.
1979
1980              Return 0 on success, or a negative error in case of failure.
1981
1982       u64 bpf_skb_ancestor_cgroup_id(struct sk_buff *skb, int ancestor_level)
1983
1984              Description
1985                     Return the id of the cgroup v2 that is the  ancestor,  at
1986                     ancestor_level,  of  the  cgroup associated with the skb.
1987                     The root cgroup is at ancestor_level zero and  each  step
1988                     down   the  hierarchy  increments  the  level.  If
1989                     ancestor_level equals the level of the cgroup  associated
1990                     with skb, the return value is the same as that of
                     bpf_skb_cgroup_id().
1991
1992                     The  helper  is  useful  to  implement  policies based on
1993                     cgroups that are higher in the hierarchy than the immediate
1994                     cgroup associated with skb.
1995
1996                     The  format  of the returned id and the helper limitations
1997                     are the same as in bpf_skb_cgroup_id().
1998
1999              Return The id is returned or 0 in  case  the  id  could  not  be
2000                     retrieved.
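
The level numbering can be sketched with an array of ids ordered from the root downwards. This is a user-space model only; model_ancestor_id() and the path array are hypothetical, and the ids used with it are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * path[] holds hypothetical cgroup ids from the root (index 0) down to
 * the cgroup associated with the skb (index level). Returns 0 when no
 * ancestor exists at the requested level.
 */
static uint64_t model_ancestor_id(const uint64_t *path, int level,
				  int ancestor_level)
{
	assert(path != NULL && level >= 0);
	if (ancestor_level < 0 || ancestor_level > level)
		return 0;
	return path[ancestor_level];
}
```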
2001
2002       struct  bpf_sock  *bpf_sk_lookup_tcp(void  *ctx,  struct bpf_sock_tuple
2003       *tuple, u32 tuple_size, u64 netns, u64 flags)
2004
2005              Description
2006                     Look for TCP socket matching tuple, optionally in a child
2007                     network   namespace  netns.  The  return  value  must  be
2008                     checked, and if non-NULL, released via bpf_sk_release().
2009
2010                     The ctx should point to the context of the program,  such
2011                     as the skb or socket (depending on the hook in use). This
2012                     is used to determine the base network namespace  for  the
2013                     lookup.
2014
2015                     tuple_size must be one of:
2016
2017                     sizeof(tuple->ipv4)
2018                            Look for an IPv4 socket.
2019
2020                     sizeof(tuple->ipv6)
2021                            Look for an IPv6 socket.
2022
2023                     If  the  netns  is a negative signed 32-bit integer, then
2024                     the socket lookup table in the netns associated with  the
2025                     ctx  will  be  used.  For  the  TC hooks, this is the
2026                     netns of the device in the skb. For socket hooks, this is
2027                     the  netns  of  the socket.  If netns is any other signed
2028                     32-bit value greater than or equal to zero then it speci‐
2029                     fies the ID of the netns relative to the netns associated
2030                     with the ctx. netns values beyond  the  range  of  32-bit
2031                     integers are reserved for future use.
2032
2033                     All  values  for flags are reserved for future usage, and
2034                     must be left at zero.
2035
2036                     This helper is available only if the kernel was  compiled
2037                     with CONFIG_NET configuration option.
2038
2039              Return Pointer  to  struct bpf_sock, or NULL in case of failure.
2040                     For sockets with reuseport option,  the  struct  bpf_sock
2041                     result  is  from  reuse->socks[]  using  the  hash of the
2042                     tuple.
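
The interpretation of tuple_size and netns can be sketched in user space as follows. All names here are hypothetical, struct model_sock_tuple is a local mirror whose layout is assumed, and reading "negative signed 32-bit integer" as a test on the low 32 bits of netns is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the IPv4 and IPv6 tuple members (assumed layout). */
struct model_sock_tuple {
	union {
		struct { uint32_t saddr, daddr; uint16_t sport, dport; } ipv4;
		struct { uint32_t saddr[4], daddr[4]; uint16_t sport, dport; } ipv6;
	};
};

enum model_family { MODEL_AF_INVALID, MODEL_AF_INET, MODEL_AF_INET6 };
enum model_netns { MODEL_NETNS_CURRENT, MODEL_NETNS_BY_ID, MODEL_NETNS_RESERVED };

/* Map tuple_size to the address family of the lookup. */
static enum model_family model_tuple_family(uint32_t tuple_size)
{
	if (tuple_size == sizeof(((struct model_sock_tuple *)0)->ipv4))
		return MODEL_AF_INET;
	if (tuple_size == sizeof(((struct model_sock_tuple *)0)->ipv6))
		return MODEL_AF_INET6;
	return MODEL_AF_INVALID;
}

/* Classify the netns argument according to the rules described above. */
static enum model_netns model_classify_netns(uint64_t netns)
{
	/* Negative as a signed 32-bit integer: bit 31 of the low word set. */
	if ((uint32_t)netns & 0x80000000u)
		return MODEL_NETNS_CURRENT;
	if (netns <= INT32_MAX)
		return MODEL_NETNS_BY_ID;
	return MODEL_NETNS_RESERVED;
}
```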
2043
2044       struct bpf_sock  *bpf_sk_lookup_udp(void  *ctx,  struct  bpf_sock_tuple
2045       *tuple, u32 tuple_size, u64 netns, u64 flags)
2046
2047              Description
2048                     Look for UDP socket matching tuple, optionally in a child
2049                     network  namespace  netns.  The  return  value  must   be
2050                     checked, and if non-NULL, released via bpf_sk_release().
2051
2052                     The  ctx should point to the context of the program, such
2053                     as the skb or socket (depending on the hook in use). This
2054                     is  used  to determine the base network namespace for the
2055                     lookup.
2056
2057                     tuple_size must be one of:
2058
2059                     sizeof(tuple->ipv4)
2060                            Look for an IPv4 socket.
2061
2062                     sizeof(tuple->ipv6)
2063                            Look for an IPv6 socket.
2064
2065                     If the netns is a negative signed  32-bit  integer,  then
2066                     the  socket lookup table in the netns associated with the
2067                     ctx will be used. For the  TC  hooks,  this  is  the
2068                     netns of the device in the skb. For socket hooks, this is
2069                     the netns of the socket.  If netns is  any  other  signed
2070                     32-bit value greater than or equal to zero then it speci‐
2071                     fies the ID of the netns relative to the netns associated
2072                     with  the  ctx.  netns  values beyond the range of 32-bit
2073                     integers are reserved for future use.
2074
2075                     All values for flags are reserved for future  usage,  and
2076                     must be left at zero.
2077
2078                     This  helper is available only if the kernel was compiled
2079                     with CONFIG_NET configuration option.
2080
2081              Return Pointer to struct bpf_sock, or NULL in case  of  failure.
2082                     For  sockets  with  reuseport option, the struct bpf_sock
2083                     result is from  reuse->socks[]  using  the  hash  of  the
2084                     tuple.
2085
2086       int bpf_sk_release(struct bpf_sock *sock)
2087
2088              Description
2089                     Release  the  reference  held  by  sock.  sock  must be a
2090                     non-NULL    pointer    that     was     returned     from
2091                     bpf_sk_lookup_xxx().
2092
2093              Return 0 on success, or a negative error in case of failure.
2094
2095       int  bpf_map_push_elem(struct  bpf_map  *map,  const  void  *value, u64
2096       flags)
2097
2098              Description
2099                     Push an element value in map. flags is one of:
2100
2101                     BPF_EXIST
2102                            If the queue/stack is full, the oldest element  is
2103                            removed to make room for this.
2104
2105              Return 0 on success, or a negative error in case of failure.
2106
2107       int bpf_map_pop_elem(struct bpf_map *map, void *value)
2108
2109              Description
2110                     Pop an element from map.
2111
2112              Return 0 on success, or a negative error in case of failure.
2113
2114       int bpf_map_peek_elem(struct bpf_map *map, void *value)
2115
2116              Description
2117                     Get an element from map without removing it.
2118
2119              Return 0 on success, or a negative error in case of failure.
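
The push, pop and peek semantics above can be modelled with a small ring buffer acting as a queue. This is a user-space sketch, not the kernel implementation: the model_* names and MODEL_CAPACITY are hypothetical, and the error codes for a full or an empty queue are assumptions not stated on this page.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define BPF_EXIST      2   /* value from <linux/bpf.h> */
#define MODEL_CAPACITY 2   /* hypothetical maximum number of entries */

struct model_queue {
	uint32_t items[MODEL_CAPACITY];
	size_t head;   /* index of the oldest element */
	size_t count;  /* number of stored elements */
};

/* Push value; with BPF_EXIST, a full queue drops its oldest element. */
static int model_push(struct model_queue *q, uint32_t value, uint64_t flags)
{
	assert(q != NULL);
	if (flags & ~(uint64_t)BPF_EXIST)
		return -EINVAL;
	if (q->count == MODEL_CAPACITY) {
		if (!(flags & BPF_EXIST))
			return -E2BIG;
		q->head = (q->head + 1) % MODEL_CAPACITY;
		q->count--;
	}
	q->items[(q->head + q->count) % MODEL_CAPACITY] = value;
	q->count++;
	return 0;
}

/* Copy the oldest element into *value without removing it. */
static int model_peek(const struct model_queue *q, uint32_t *value)
{
	assert(q != NULL && value != NULL);
	if (q->count == 0)
		return -ENOENT;
	*value = q->items[q->head];
	return 0;
}

/* Copy the oldest element into *value and remove it. */
static int model_pop(struct model_queue *q, uint32_t *value)
{
	int err = model_peek(q, value);

	if (err)
		return err;
	q->head = (q->head + 1) % MODEL_CAPACITY;
	q->count--;
	return 0;
}
```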
2120
2121       int  bpf_msg_push_data(struct  sk_msg_buff  *msg, u32 start, u32 len, u64
2122       flags)
2123
2124              Description
2125                     For socket policies, insert len bytes into msg at  offset
2126                     start.
2127
2128                     If a program of type BPF_PROG_TYPE_SK_MSG is run on a msg
2129                     it may want to insert metadata or options into  the  msg.
2130                     This can later be read and used by any of the lower layer
2131                     BPF hooks.
2132
2133                     This helper may fail under memory pressure (if an alloca‐
2134                     tion fails). In such cases, the BPF program  receives  an
2135                     appropriate error and needs to handle it.
2136
2137              Return 0 on success, or a negative error in case of failure.
2138
2139       int bpf_msg_pop_data(struct sk_msg_buff *msg, u32 start, u32  pop,  u64
2140       flags)
2141
2142              Description
2143                     Remove  pop  bytes  from  msg, starting at byte start.
2144                     This may result in ENOMEM errors under certain situations
2145                     if an allocation and copy are required due to a full ring
2146                     buffer.  However, the helper will try to avoid doing  the
2147                     allocation  if  possible. Other errors can occur if input
2148                     parameters are invalid, either because  start  is  not  a
2149                     valid  part of the msg payload or because the pop value is
2150                     too large.
2151
2152              Return 0 on success, or a negative error in case of failure.
2153
2154       int bpf_rc_pointer_rel(void *ctx, s32 rel_x, s32 rel_y)
2155
2156              Description
2157                     This helper is used in programs implementing IR decoding,
2158                     to report a successfully decoded pointer movement.
2159
2160                     The  ctx  should  point to the lirc sample as passed into
2161                     the program.
2162
2163                     This helper is only available if the kernel was  compiled
2164                     with  the  CONFIG_BPF_LIRC_MODE2 configuration option set
2165                     to "y".
2166
2167              Return 0
2168
2169       int bpf_spin_lock(struct bpf_spin_lock *lock)
2170
2171              Description
2172                     Acquire a spinlock represented by the pointer lock, which
2173                     is  stored  as  part of a value of a map. Taking the lock
2174                     allows the program to safely update the rest of the fields
2175                     in  that  value. The spinlock can (and must) later be
2176                     released with a call to bpf_spin_unlock(lock).
2177
2178                     Spinlocks in BPF programs come with a number of  restric‐
2179                     tions and constraints:
2180
2181                     · bpf_spin_lock  objects  are only allowed inside maps of
2182                       types BPF_MAP_TYPE_HASH  and  BPF_MAP_TYPE_ARRAY  (this
2183                       list could be extended in the future).
2184
2185                     · BTF description of the map is mandatory.
2186
2187                     · The BPF program can take ONE lock at a time, since tak‐
2188                       ing two or more could cause deadlocks.
2189
2190                     · Only one struct bpf_spin_lock is allowed per  map  ele‐
2191                       ment.
2192
2193                     · When  the  lock  is  taken, calls (either BPF to BPF or
2194                       helpers) are not allowed.
2195
2196                     · The BPF_LD_ABS  and  BPF_LD_IND  instructions  are  not
2197                       allowed inside a spinlock-ed region.
2198
2199                     · The  BPF program MUST call bpf_spin_unlock() to release
2200                       the lock, on all execution paths, before it returns.
2201
2202                     · The BPF program can access  struct  bpf_spin_lock  only
2203                       via  the bpf_spin_lock() and bpf_spin_unlock() helpers.
2204                       Loading or storing data into the  struct  bpf_spin_lock
2205                       lock; field of a map is not allowed.
2206
2207                     · To  use the bpf_spin_lock() helper, the BTF description
2208                       of the map value must  be  a  struct  and  have  struct
2209                       bpf_spin_lock  anyname; field at the top level.  Nested
2210                       lock inside another struct is not allowed.
2211
2212                     · The struct bpf_spin_lock lock field in a map value must
2213                       be aligned on a multiple of 4 bytes in that value.
2214
2215                     · Syscall  with command BPF_MAP_LOOKUP_ELEM does not copy
2216                       the bpf_spin_lock field to user space.
2217
2218                     · Syscall with  command  BPF_MAP_UPDATE_ELEM,  or  update
2219                       from  a  BPF  program,  do not update the bpf_spin_lock
2220                       field.
2221
2222                     · bpf_spin_lock cannot be on the stack or inside  a  net‐
2223                       working packet (it can only be inside a map value).
2224
2225                     · bpf_spin_lock is available to root only.
2226
2227                     · Tracing  programs and socket filter programs cannot use
2228                       bpf_spin_lock() due to insufficient  preemption  checks
2229                       (but this may change in the future).
2230
2231                     · bpf_spin_lock   is   not   allowed  in  inner  maps  of
2232                       map-in-map.
2233
2234              Return 0
2235
2236       int bpf_spin_unlock(struct bpf_spin_lock *lock)
2237
2238              Description
2239                     Release  the  lock  previously  locked  by  a   call   to
2240                     bpf_spin_lock(lock).
2241
2242              Return 0
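
The layout and usage constraints listed for bpf_spin_lock() can be illustrated with a user-space analogue. This is a sketch built on C11 atomics, not BPF code: struct model_spin_lock stands in for struct bpf_spin_lock, and every model_* name is hypothetical. It shows one lock per value, placed at the top level of the value struct, aligned on 4 bytes, and released on every path before returning.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the opaque struct bpf_spin_lock. */
struct model_spin_lock {
	atomic_uint val;
};

/* Hypothetical map value: exactly one lock, at the top level. */
struct model_value {
	struct model_spin_lock lock;
	uint32_t packets;
	uint64_t bytes;
};

_Static_assert(offsetof(struct model_value, lock) % 4 == 0,
	       "lock field must be aligned on a multiple of 4 bytes");

static void model_lock(struct model_spin_lock *l)
{
	/* Spin until the previous state was "unlocked" (0). */
	while (atomic_exchange_explicit(&l->val, 1u, memory_order_acquire))
		;
}

static void model_unlock(struct model_spin_lock *l)
{
	atomic_store_explicit(&l->val, 0u, memory_order_release);
}

/* Update both fields consistently; no other calls inside the region. */
static void model_update(struct model_value *v, uint32_t len)
{
	assert(v != NULL);
	model_lock(&v->lock);
	v->packets++;
	v->bytes += len;
	model_unlock(&v->lock);
}
```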
2243
2244       struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)
2245
2246              Description
2247                     This  helper gets a struct bpf_sock pointer such that all
2248                     the fields in this bpf_sock can be accessed.
2249
2250              Return A struct bpf_sock pointer on success, or NULL in case  of
2251                     failure.
2252
2253       struct bpf_tcp_sock *bpf_tcp_sock(struct bpf_sock *sk)
2254
2255              Description
2256                     This  helper  gets  a  struct bpf_tcp_sock pointer from a
2257                     struct bpf_sock pointer.
2258
2259              Return A struct bpf_tcp_sock pointer on success, or NULL in case
2260                     of failure.
2261
2262       int bpf_skb_ecn_set_ce(struct sk_buff *skb)
2263
2264              Description
2265                     Set  ECN  (Explicit  Congestion Notification) field of IP
2266                     header to CE (Congestion Encountered) if current value is
2267                     ECT (ECN Capable Transport). Otherwise, do nothing. Works
2268                     with IPv6 and IPv4.
2269
2270              Return 1 if the CE flag is set (either  by  the  current  helper
2271                     call  or  because it was already present), 0 if it is not
2272                     set.
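
The effect on the ECN field and the resulting return value can be sketched on a bare IPv4 TOS byte. This user-space model uses the ECN codepoints from RFC 3168. The function name model_ecn_set_ce() and the macro names are hypothetical, and the model deliberately ignores everything else a real implementation has to do, such as IPv4 checksum updates and IPv6 handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ECN codepoints: the two low-order bits of the TOS byte (RFC 3168). */
#define MODEL_ECN_NOT_ECT 0x0
#define MODEL_ECN_ECT_1   0x1
#define MODEL_ECN_ECT_0   0x2
#define MODEL_ECN_CE      0x3
#define MODEL_ECN_MASK    0x3

/* Returns 1 if CE is set after the call, 0 if the packet is not ECT. */
static int model_ecn_set_ce(uint8_t *tos)
{
	assert(tos != NULL);
	uint8_t ecn = *tos & MODEL_ECN_MASK;

	if (ecn == MODEL_ECN_CE)
		return 1;               /* already marked */
	if (ecn == MODEL_ECN_NOT_ECT)
		return 0;               /* not ECN capable: do nothing */
	*tos |= MODEL_ECN_CE;           /* ECT(0) or ECT(1) becomes CE */
	return 1;
}
```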
2273
2274       struct bpf_sock *bpf_get_listener_sock(struct bpf_sock *sk)
2275
2276              Description
2277                     Return a struct bpf_sock  pointer  in  TCP_LISTEN  state.
2278                     bpf_sk_release() is unnecessary and not allowed.
2279
2280              Return A  struct bpf_sock pointer on success, or NULL in case of
2281                     failure.
2282
2283       struct bpf_sock *bpf_skc_lookup_tcp(void  *ctx,  struct  bpf_sock_tuple
2284       *tuple, u32 tuple_size, u64 netns, u64 flags)
2285
2286              Description
2287                     Look for TCP socket matching tuple, optionally in a child
2288                     network  namespace  netns.  The  return  value  must   be
2289                     checked, and if non-NULL, released via bpf_sk_release().
2290
2291                     This function is identical to bpf_sk_lookup_tcp(), except
2292                     that it also returns timewait  or  request  sockets.  Use
2293                     bpf_sk_fullsock()  or  bpf_tcp_sock()  to access the full
2294                     structure.
2295
2296                     This helper is available only if the kernel was  compiled
2297                     with CONFIG_NET configuration option.
2298
2299              Return Pointer  to  struct bpf_sock, or NULL in case of failure.
2300                     For sockets with reuseport option,  the  struct  bpf_sock
2301                     result  is  from  reuse->socks[]  using  the  hash of the
2302                     tuple.
2303
2304       int  bpf_tcp_check_syncookie(struct  bpf_sock  *sk,  void   *iph,   u32
2305       iph_len, struct tcphdr *th, u32 th_len)
2306
2307              Description
2308                     Check  whether  iph and th contain a valid SYN cookie ACK
2309                     for the listening socket in sk.
2310
2311                     iph points to the start of the IPv4 or IPv6 header, while
2312                     iph_len  contains  sizeof(struct  iphdr) or sizeof(struct
2313                     ipv6hdr).
2314
2315                     th points to the start of the TCP  header,  while  th_len
2316                     contains sizeof(struct tcphdr).
2317
2318              Return 0 if iph and th are a valid SYN cookie ACK, or a negative
2319                     error otherwise.
2320
2321       int  bpf_sysctl_get_name(struct  bpf_sysctl  *ctx,  char  *buf,  size_t
2322       buf_len, u64 flags)
2323
2324              Description
2325                     Get  the name of the sysctl in /proc/sys/ and copy it into
2326                     the buffer buf, of size buf_len, provided by the program.
2327
2328                     The  buffer  is  always  NUL  terminated,   unless   it's
2329                     zero-sized.
2330
2331                     If  flags is zero, full name (e.g. "net/ipv4/tcp_mem") is
2332                     copied. Use BPF_F_SYSCTL_BASE_NAME flag to copy base name
2333                     only (e.g. "tcp_mem").
2334
2335              Return Number  of  characters copied (not including the trailing
2336                     NUL).
2337
2338                     -E2BIG if the buffer wasn't big enough (buf will  contain
2339                     truncated name in this case).
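
The copy, truncation and return-value rules above can be sketched as a user-space function. This is a model only: model_sysctl_get_name() is a hypothetical name, the full name is passed in as a plain string, and returning -E2BIG for a zero-sized buffer is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define BPF_F_SYSCTL_BASE_NAME (1ULL << 0)   /* value from <linux/bpf.h> */

/*
 * Copy the full name, or only its last path component when
 * BPF_F_SYSCTL_BASE_NAME is set, into buf. The result is NUL terminated
 * unless buf_len is zero. Returns the number of characters copied, or
 * -E2BIG when the name had to be truncated.
 */
static int model_sysctl_get_name(const char *full_name, char *buf,
				 size_t buf_len, uint64_t flags)
{
	assert(full_name != NULL && buf != NULL);
	const char *src = full_name;

	if (flags & BPF_F_SYSCTL_BASE_NAME) {
		const char *slash = strrchr(full_name, '/');

		if (slash)
			src = slash + 1;
	}

	size_t len = strlen(src);

	if (buf_len == 0)
		return -E2BIG;
	if (len >= buf_len) {
		memcpy(buf, src, buf_len - 1);
		buf[buf_len - 1] = '\0';
		return -E2BIG;
	}
	memcpy(buf, src, len + 1);
	return (int)len;
}
```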
2340
2341       int  bpf_sysctl_get_current_value(struct  bpf_sysctl  *ctx,  char *buf,
2342       size_t buf_len)
2343
2344              Description
2345                     Get the current value of the sysctl as it is presented in
2346                     /proc/sys (incl. newline, etc.), and copy it as a  string
2347                     into the buffer buf, of size buf_len, provided by the
                     program.
2348
2349                     The whole value is copied, no matter what  file  position
2350                     user space issued e.g. sys_read at.
2351
2352                     The   buffer   is  always  NUL  terminated,  unless  it's
2353                     zero-sized.
2354
2355              Return Number of characters copied (not including  the  trailing
2356                     NUL).
2357
2358                     -E2BIG  if the buffer wasn't big enough (buf will contain
2359                     the truncated value in this case).
2360
2361                     -EINVAL if current value was  unavailable,  e.g.  because
2362                     sysctl is uninitialized and read returns -EIO for it.
2363
2364       int  bpf_sysctl_get_new_value(struct bpf_sysctl *ctx, char *buf, size_t
2365       buf_len)
2366
2367              Description
2368                     Get the new value being written by user space to the sysctl
2369                     (before the actual write happens) and copy it as a string
2370                     into the buffer buf, of size buf_len, provided by the
                     program.
2371
2372                     User space may write new value at file position > 0.
2373
2374                     The  buffer  is  always  NUL  terminated,   unless   it's
2375                     zero-sized.
2376
2377              Return Number of characters copied (not including the trailing
2378                     NUL).
2379
2380                     -E2BIG if the buffer wasn't big enough (buf will contain
2381                     the truncated value in this case).
2382
2383                     -EINVAL if sysctl is being read.
2384
2385       int  bpf_sysctl_set_new_value(struct  bpf_sysctl *ctx, const char *buf,
2386       size_t buf_len)
2387
2388              Description
2389                     Override new value being written by user space to  sysctl
2390                     with  value  provided  by  program  in buffer buf of size
2391                     buf_len.
2392
2393                     buf should contain a string in the same form as provided
2394                     by user space on sysctl write.
2395
2396                     User space may write the new value at file position > 0.
2397                     To override the whole sysctl value, the file position
2398                     should be set to zero.
2399
2400              Return 0 on success.
2401
2402                     -E2BIG if the buf_len is too big.
2403
2404                     -EINVAL if sysctl is being read.
2405
2406       int bpf_strtol(const char *buf, size_t buf_len, u64 flags, long *res)
2407
2408              Description
2409                     Convert the initial part of the string from buffer buf of
2410                     size buf_len to a long integer  according  to  the  given
2411                     base and save the result in res.
2412
2413                     The  string  may  begin with an arbitrary amount of white
2414                     space (as determined by isspace(3)) followed by a  single
2415                     optional '-' sign.
2416
2417                     The five least significant bits of flags encode the base;
2418                     the other bits are currently unused.
2419
2420                     The base must be 8, 10, 16, or 0; with 0 it is detected
2421                     automatically, as with user space strtol(3).
2422
2423              Return Number  of  characters consumed on success. Must be posi‐
2424                     tive but no more than buf_len.
2425
2426                     -EINVAL if no valid digits were found or unsupported base
2427                     was provided.
2428
2429                     -ERANGE if resulting value was out of range.
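
                     The flag decoding and return values described above can
                     be approximated in user space with strtol(3). The sketch
                     below is a model under stated assumptions, not the kernel
                     implementation: model_strtol() is a hypothetical name,
                     the 63-character input bound exists only in the model,
                     and a leading '+' is rejected because the description
                     mentions only an optional '-'.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static int model_strtol(const char *buf, size_t buf_len, uint64_t flags,
                        long *res)
{
    unsigned int base = (unsigned int)(flags & 0x1f); /* five low bits */
    char tmp[64];                 /* model-only bound on the input length */
    char *end;
    size_t n, i = 0;
    long val;

    assert(buf != NULL && res != NULL);

    if (base != 0 && base != 8 && base != 10 && base != 16)
        return -EINVAL;           /* unsupported base */

    /* strtol(3) needs a NUL-terminated string; buf need not be one. */
    n = buf_len < sizeof(tmp) - 1 ? buf_len : sizeof(tmp) - 1;
    memcpy(tmp, buf, n);
    tmp[n] = '\0';

    /* Skip leading white space, then refuse the '+' that strtol(3) allows. */
    while (isspace((unsigned char)tmp[i]))
        i++;
    if (tmp[i] == '+')
        return -EINVAL;

    errno = 0;
    val = strtol(tmp, &end, (int)base);
    if (end == tmp)
        return -EINVAL;           /* no valid digits were found */
    if (errno == ERANGE)
        return -ERANGE;           /* value out of range for long */

    *res = val;
    return (int)(end - tmp);      /* number of characters consumed */
}
```

                     Because the return value counts consumed characters,
                     a caller can parse several numbers from one buffer by
                     advancing buf by that count after each successful call.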
2430
2431       int  bpf_strtoul(const  char  *buf, size_t buf_len, u64 flags, unsigned
2432       long *res)
2433
2434              Description
2435                     Convert the initial part of the string from buffer buf of
2436                     size buf_len to an unsigned long integer according to the
2437                     given base and save the result in res.
2438
2439                     The string may begin with an arbitrary  amount  of  white
2440                     space (as determined by isspace(3)).
2441
2442                     The five least significant bits of flags encode the base;
2443                     the other bits are currently unused.
2444
2445                     The base must be 8, 10, 16, or 0; with 0 it is detected
2446                     automatically, as with user space strtoul(3).
2447
2448              Return Number  of  characters consumed on success. Must be posi‐
2449                     tive but no more than buf_len.
2450
2451                     -EINVAL if no valid digits were found or unsupported base
2452                     was provided.
2453
2454                     -ERANGE if resulting value was out of range.
2455
2456       void *bpf_sk_storage_get(struct bpf_map *map, struct bpf_sock *sk, void
2457       *value, u64 flags)
2458
2459              Description
2460                     Get a bpf-local-storage from a sk.
2461
2462                     Logically, it can be thought of as getting the value from
2463                     a map with sk as the key. From this perspective, the
2464                     usage is not much different from bpf_map_lookup_elem(map,
2465                     &sk), except that this helper enforces that the key must
2466                     be a full socket and that the map must also be a
2467                     BPF_MAP_TYPE_SK_STORAGE.
2468
2469                     Underneath,  the value is stored locally at sk instead of
2470                     the map.   The  map  is  used  as  the  bpf-local-storage
2471                     "type".  The  bpf-local-storage  "type" (i.e. the map) is
2472                     searched against all bpf-local-storages residing at sk.
2473
2474                     The optional flag BPF_SK_STORAGE_GET_F_CREATE can be set
2475                     in flags so that a new bpf-local-storage is created if
2476                     one does not exist. value can be used together with
2477                     BPF_SK_STORAGE_GET_F_CREATE to specify the initial value
2478                     of a bpf-local-storage. If value is NULL, the new
2479                     bpf-local-storage will be zero initialized.
2480
2481              Return A bpf-local-storage pointer is returned on success.
2482
2483                     NULL  if  not found or there was an error in adding a new
2484                     bpf-local-storage.
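
                     The lookup-or-create behaviour described above can be
                     pictured with a small user-space model in which each
                     socket carries its own storages and the map pointer acts
                     as the "type". Everything named model_* or MODEL_* is
                     hypothetical, and the fixed slot count and value size are
                     simplifications of this sketch, not kernel limits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_F_CREATE   1ULL  /* stand-in for BPF_SK_STORAGE_GET_F_CREATE */
#define MODEL_MAX_SLOTS  4     /* arbitrary per-socket limit for the model */
#define MODEL_VALUE_SIZE 8     /* arbitrary fixed value size for the model */

struct model_map { int id; };  /* plays the role of the storage "type" */

struct model_slot {
    const struct model_map *map;            /* NULL means the slot is free */
    unsigned char value[MODEL_VALUE_SIZE];
};

struct model_sock {
    struct model_slot slots[MODEL_MAX_SLOTS]; /* storages live at the socket */
};

static void *model_sk_storage_get(const struct model_map *map,
                                  struct model_sock *sk,
                                  const void *value, uint64_t flags)
{
    struct model_slot *free_slot = NULL;
    size_t i;

    assert(map != NULL && sk != NULL);

    /* Search the storages residing at sk for one of this "type". */
    for (i = 0; i < MODEL_MAX_SLOTS; i++) {
        if (sk->slots[i].map == map)
            return sk->slots[i].value;
        if (!sk->slots[i].map && !free_slot)
            free_slot = &sk->slots[i];
    }

    if (!(flags & MODEL_F_CREATE) || !free_slot)
        return NULL;    /* not found, or no room to add a new storage */

    free_slot->map = map;
    if (value)
        memcpy(free_slot->value, value, MODEL_VALUE_SIZE);
    else
        memset(free_slot->value, 0, MODEL_VALUE_SIZE); /* zero initialized */
    return free_slot->value;
}
```

                     In the model, as in the description, a second call with
                     the same map and socket returns the storage created by
                     the first call rather than a new one.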
2485
2486       int bpf_sk_storage_delete(struct bpf_map *map, struct bpf_sock *sk)
2487
2488              Description
2489                     Delete a bpf-local-storage from a sk.
2490
2491              Return 0 on success.
2492
2493                     -ENOENT if the bpf-local-storage cannot be found.
2494
2495       int bpf_send_signal(u32 sig)
2496
2497              Description
2498                     Send signal sig to the current task.
2499
2500              Return 0 on success or successfully queued.
2501
2502                     -EBUSY if the work queue under NMI is full.
2503
2504                     -EINVAL if sig is invalid.
2505
2506                     -EPERM if no permission to send the sig.
2507
2508                     -EAGAIN if bpf program can try again.
2509
2510       s64 bpf_tcp_gen_syncookie(struct bpf_sock *sk, void *iph, u32  iph_len,
2511       struct tcphdr *th, u32 th_len)
2512
2513              Description
2514                     Try to issue a SYN cookie for the packet with correspond‐
2515                     ing IP/TCP headers, iph and th, on the  listening  socket
2516                     in sk.
2517
2518                     iph points to the start of the IPv4 or IPv6 header, while
2519                     iph_len contains sizeof(struct iphdr) or sizeof(struct
2520                     ipv6hdr).
2521
2522                     th  points  to  the start of the TCP header, while th_len
2523                     contains the length of the TCP header.
2524
2525              Return On success, the lower 32 bits hold the generated SYN
2526                     cookie, followed by 16 bits which hold the MSS value for
2527                     that cookie; the top 16 bits are unused.
2528
2529                     On failure, the returned value is one of the following:
2530
2531                     -EINVAL SYN cookie cannot be issued due to error
2532
2533                     -ENOENT SYN cookie should not be issued (no SYN flood)
2534
2535                     -EOPNOTSUPP kernel  configuration  does  not  enable  SYN
2536                     cookies
2537
2538                     -EPROTONOSUPPORT IP packet version is not 4 or 6
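
                     The layout of a successful return value can be unpacked
                     with two small accessors. This is a sketch in plain C;
                     the function names are hypothetical, and the caller is
                     assumed to have checked for a negative error code first.

```c
#include <assert.h>
#include <stdint.h>

/* Lower 32 bits of a successful return value: the SYN cookie. */
static uint32_t syncookie_from_ret(int64_t ret)
{
    assert(ret >= 0);   /* negative values are error codes, not cookies */
    return (uint32_t)((uint64_t)ret & 0xffffffffULL);
}

/* Bits 32 to 47 of a successful return value: the MSS for that cookie. */
static uint16_t mss_from_ret(int64_t ret)
{
    assert(ret >= 0);
    return (uint16_t)(((uint64_t)ret >> 32) & 0xffffULL);
}
```

                     Testing ret < 0 before calling either accessor keeps
                     error codes such as -ENOENT from being misread as a
                     cookie.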
2539

EXAMPLES

2541       Example usage for most of the eBPF helpers listed in this manual page
2542       is available within the Linux kernel sources, at the following
2543       locations:
2544
2545       · samples/bpf/
2546
2547       · tools/testing/selftests/bpf/
2548

LICENSE

2550       eBPF programs can have an associated license, passed along with the
2551       bytecode instructions to the kernel when the programs are loaded. The
2552       format for that string is identical to the one in use for kernel
2553       modules (dual licenses, such as "Dual BSD/GPL", may be used). Some
2554       helper functions are only accessible to programs that are compatible
2555       with the GNU General Public License (GPL).
2556
2557       In order to use such helpers, the eBPF program must be loaded with  the
2558       correct  license string passed (via attr) to the bpf() system call, and
2559       this generally translates into the C source code of  the  program  con‐
2560       taining a line similar to the following:
2561
2562          char ____license[] __attribute__((section("license"), used)) = "GPL";
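
       Which license strings count as GPL-compatible can be sketched as a
       simple string comparison. The list below is believed to match the
       kernel's license_is_gpl_compatible() in include/linux/license.h, but
       that is an assumption to verify against your own kernel tree; the
       function name model_license_is_gpl_compatible() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the license string is in the (assumed) list of
 * GPL-compatible module license strings, 0 otherwise. The comparison
 * is exact and case-sensitive.
 */
static int model_license_is_gpl_compatible(const char *license)
{
    static const char *const compatible[] = {
        "GPL", "GPL v2", "GPL and additional rights",
        "Dual BSD/GPL", "Dual MIT/GPL", "Dual MPL/GPL",
    };
    size_t i;

    assert(license != NULL);

    for (i = 0; i < sizeof(compatible) / sizeof(compatible[0]); i++)
        if (strcmp(license, compatible[i]) == 0)
            return 1;
    return 0;
}
```

       Under this model, a program declaring "Dual BSD/GPL" could use
       GPL-only helpers, while one declaring "Proprietary" could not.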
2563

IMPLEMENTATION

2565       This  manual  page  is  an  effort to document the existing eBPF helper
2566       functions.  But as of this writing, the BPF sub-system is  under  heavy
2567       development.  New  eBPF  program or map types are added, along with new
2568       helper functions. Some helpers  are  occasionally  made  available  for
2569       additional  program types. So in spite of the efforts of the community,
2570       this page might not be up-to-date. If you want to check for yourself
2571       which helper functions exist in your kernel, or which types of
2572       programs they can support, here are some files in the kernel tree
2573       that you may be interested in:
2574
2575       · include/uapi/linux/bpf.h is the main BPF header. It contains the full
2576         list of all helper functions, as well as many other  BPF  definitions
2577         including  most  of  the  flags,  structs  or  constants  used by the
2578         helpers.
2579
2580       · net/core/filter.c contains the  definition  of  most  network-related
2581         helper  functions,  and the list of program types from which they can
2582         be used.
2583
2584       · kernel/trace/bpf_trace.c is the  equivalent  for  most  tracing  pro‐
2585         gram-related helpers.
2586
2587       · kernel/bpf/verifier.c contains the functions used to check that valid
2588         types of eBPF maps are used with a given helper function.
2589
2590       · The kernel/bpf/ directory contains other files in which additional
2591         helpers are defined (for cgroups, sockmaps, etc.).
2592
2593       Compatibility  between helper functions and program types can generally
2594       be found in the files where helper functions are defined. Look for  the
2595       struct  bpf_func_proto  objects and for functions returning them: these
2596       functions contain a list of helpers that a given program type can call.
2597       Note that the default: label of the switch ... case used to filter
2598       helpers can call other functions, themselves allowing access to
2599       additional helpers. The GPL license requirement is also recorded in
2600       those struct bpf_func_proto objects.
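
       The filtering pattern described above, including the way a default:
       label widens the set of helpers, can be illustrated with a user-space
       model. All names prefixed with model_ or MODEL_ are hypothetical, the
       helper set is invented for the example, and the gpl_only values are
       illustrative rather than taken from the kernel sources.

```c
#include <stddef.h>

enum model_func_id {
    MODEL_MAP_LOOKUP,
    MODEL_TRACE_PRINTK,
    MODEL_SKB_LOAD_BYTES,
    MODEL_SEND_SIGNAL,
};

struct model_func_proto {
    const char *name;
    int gpl_only;   /* nonzero: caller needs a GPL-compatible license */
};

static const struct model_func_proto map_lookup_proto = { "map_lookup_elem", 0 };
static const struct model_func_proto trace_printk_proto = { "trace_printk", 1 };
static const struct model_func_proto skb_load_bytes_proto = { "skb_load_bytes", 0 };

/* Helpers shared by several program types. */
static const struct model_func_proto *
model_base_func_proto(enum model_func_id id)
{
    switch (id) {
    case MODEL_MAP_LOOKUP:
        return &map_lookup_proto;
    case MODEL_TRACE_PRINTK:
        return &trace_printk_proto;
    default:
        return NULL;    /* helper not available to the program */
    }
}

/* Helpers for one hypothetical network program type. */
static const struct model_func_proto *
model_net_func_proto(enum model_func_id id)
{
    switch (id) {
    case MODEL_SKB_LOAD_BYTES:
        return &skb_load_bytes_proto;
    default:
        /* Falling through to another function widens the allowed set. */
        return model_base_func_proto(id);
    }
}
```

       Reading only the case labels of model_net_func_proto() would miss the
       two helpers reachable through its default: label, which is why the
       called functions need to be followed as well.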
2601
2602       Compatibility between helper functions and map types can  be  found  in
2603       the  check_map_func_compatibility()  function  in file kernel/bpf/veri‐
2604       fier.c.
2605
2606       Helper functions that invalidate the checks on data and data_end point‐
2607       ers     for    network    processing    are    listed    in    function
2608       bpf_helper_changes_pkt_data() in file net/core/filter.c.
2609

COLOPHON

2615       This  page  is  part of release 5.04 of the Linux man-pages project.  A
2616       description of the project, information about reporting bugs,  and  the
2617       latest     version     of     this    page,    can    be    found    at
2618       https://www.kernel.org/doc/man-pages/.
2619
2620
2621
2622Linux                             2019-11-19                    BPF-HELPERS(7)