BPF(2)                     Linux Programmer's Manual                    BPF(2)

NAME

       bpf - perform a command on an extended BPF map or program

SYNOPSIS

       #include <linux/bpf.h>

       int bpf(int cmd, union bpf_attr *attr, unsigned int size);

DESCRIPTION

       The bpf() system call performs a range of operations related to
       extended Berkeley Packet Filters.  Extended BPF (or eBPF) is similar to
       the original ("classic") BPF (cBPF) used to filter network packets.
       For both cBPF and eBPF programs, the kernel statically analyzes the
       programs before loading them, in order to ensure that they cannot harm
       the running system.

       eBPF extends cBPF in multiple ways, including the ability to call a
       fixed set of in-kernel helper functions (via the BPF_CALL opcode
       extension provided by eBPF) and access shared data structures such as
       eBPF maps.

   Extended BPF Design/Architecture
       eBPF maps are a generic data structure for storage of different data
       types.  Data types are generally treated as binary blobs, so a user
       just specifies the size of the key and the size of the value at
       map-creation time.  In other words, a key/value for a given map can
       have an arbitrary structure.

       A user process can create multiple maps (with key/value pairs being
       opaque bytes of data) and access them via file descriptors.  Different
       eBPF programs can access the same maps in parallel.  It's up to the
       user process and eBPF program to decide what they store inside maps.

       There's one special map type, called a program array.  This type of map
       stores file descriptors referring to other eBPF programs.  When a
       lookup in the map is performed, the program flow is redirected in-place
       to the beginning of another eBPF program and does not return to the
       calling program.  The level of nesting has a fixed limit of 32, so
       that infinite loops cannot be crafted.  At run time, the program file
       descriptors stored in the map can be modified, so program functionality
       can be altered based on specific requirements.  All programs referred
       to in a program-array map must have been previously loaded into the
       kernel via bpf().  If a map lookup fails, the current program continues
       its execution.  See BPF_MAP_TYPE_PROG_ARRAY below for further details.

       Generally, eBPF programs are loaded by the user process and
       automatically unloaded when the process exits.  In some cases, for
       example with tc-bpf(8), the program will continue to stay alive inside
       the kernel even after the process that loaded the program exits.  In
       that case, the tc subsystem holds a reference to the eBPF program after
       the file descriptor has been closed by the user-space program.  Thus,
       whether a specific program continues to live inside the kernel depends
       on how it is further attached to a given kernel subsystem after it was
       loaded via bpf().

       Each eBPF program is a set of instructions that is safe to run until
       its completion.  An in-kernel verifier statically determines that the
       eBPF program terminates and is safe to execute.  During verification,
       the kernel increments reference counts for each of the maps that the
       eBPF program uses, so that the attached maps can't be removed until the
       program is unloaded.

       eBPF programs can be attached to different events.  These events can be
       the arrival of network packets, tracing events, classification events
       by network queueing disciplines (for eBPF programs attached to a tc(8)
       classifier), and other types that may be added in the future.  A new
       event triggers execution of the eBPF program, which may store
       information about the event in eBPF maps.  Beyond storing data, eBPF
       programs may call a fixed set of in-kernel helper functions.

       The same eBPF program can be attached to multiple events and different
       eBPF programs can access the same map:

           tracing     tracing    tracing    packet      packet     packet
           event A     event B    event C    on eth0     on eth1    on eth2
            |             |         |          |           |          ^
            |             |         |          |           v          |
            --> tracing <--     tracing      socket    tc ingress   tc egress
                 prog_1          prog_2      prog_3    classifier    action
                 |  |              |           |         prog_4      prog_5
              |---  -----|  |------|          map_3        |           |
            map_1       map_2                              --| map_4 |--

   Arguments
       The operation to be performed by the bpf() system call is determined by
       the cmd argument.  Each operation takes an accompanying argument,
       provided via attr, which is a pointer to a union of type bpf_attr (see
       below).  The size argument is the size of the union pointed to by attr.

       The value provided in cmd is one of the following:

       BPF_MAP_CREATE
              Create a map and return a file descriptor that refers to the
              map.  The close-on-exec file descriptor flag (see fcntl(2)) is
              automatically enabled for the new file descriptor.

       BPF_MAP_LOOKUP_ELEM
              Look up an element by key in a specified map and return its
              value.

       BPF_MAP_UPDATE_ELEM
              Create or update an element (key/value pair) in a specified map.

       BPF_MAP_DELETE_ELEM
              Look up and delete an element by key in a specified map.

       BPF_MAP_GET_NEXT_KEY
              Look up an element by key in a specified map and return the key
              of the next element.

       BPF_PROG_LOAD
              Verify and load an eBPF program, returning a new file descriptor
              associated with the program.  The close-on-exec file descriptor
              flag (see fcntl(2)) is automatically enabled for the new file
              descriptor.

       The bpf_attr union consists of various anonymous structures that are
       used by different bpf() commands:

           union bpf_attr {
               struct {    /* Used by BPF_MAP_CREATE */
                   __u32         map_type;
                   __u32         key_size;    /* size of key in bytes */
                   __u32         value_size;  /* size of value in bytes */
                   __u32         max_entries; /* maximum number of entries
                                                 in a map */
               };

               struct {    /* Used by BPF_MAP_*_ELEM and BPF_MAP_GET_NEXT_KEY
                              commands */
                   __u32         map_fd;
                   __aligned_u64 key;
                   union {
                       __aligned_u64 value;
                       __aligned_u64 next_key;
                   };
                   __u64         flags;
               };

               struct {    /* Used by BPF_PROG_LOAD */
                   __u32         prog_type;
                   __u32         insn_cnt;
                   __aligned_u64 insns;      /* 'const struct bpf_insn *' */
                   __aligned_u64 license;    /* 'const char *' */
                   __u32         log_level;  /* verbosity level of verifier */
                   __u32         log_size;   /* size of user buffer */
                   __aligned_u64 log_buf;    /* user supplied 'char *'
                                                buffer */
                   __u32         kern_version;
                                             /* checked when prog_type=kprobe
                                                (since Linux 4.1) */
               };
           } __attribute__((aligned(8)));

   eBPF maps
       Maps are a generic data structure for storage of different types of
       data.  They allow sharing of data between eBPF kernel programs, and
       also between kernel and user-space applications.

       Each map type has the following attributes:

       *  type

       *  maximum number of elements

       *  key size in bytes

       *  value size in bytes

       The following wrapper functions demonstrate how various bpf() commands
       can be used to access the maps.  The functions use the cmd argument to
       invoke different operations.
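
       The wrapper functions in this section call bpf() and a ptr_to_u64()
       helper that this page does not define.  A minimal sketch of both
       follows.  It assumes that the C library offers no bpf() wrapper, so
       the call goes through syscall(2), and that __NR_bpf is provided by
       <sys/syscall.h>.  The definitions are illustrative rather than taken
       from this page.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Widen a user-space pointer to the 64-bit integer format used by the
   __aligned_u64 fields of union bpf_attr (assumed helper). */
static inline uint64_t
ptr_to_u64(const void *ptr)
{
    return (uint64_t)(uintptr_t)ptr;
}

/* Invoke the raw system call; returns -1 and sets errno on error
   (assumed wrapper, needed only if libc does not supply one). */
static inline int
bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
    return (int)syscall(__NR_bpf, cmd, attr, size);
}
```

       Going through uintptr_t keeps the conversion well defined on 32-bit
       systems, where a pointer is narrower than the 64-bit attr fields.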

       BPF_MAP_CREATE
              The BPF_MAP_CREATE command creates a new map, returning a new
              file descriptor that refers to the map.

                  int
                  bpf_create_map(enum bpf_map_type map_type,
                                 unsigned int key_size,
                                 unsigned int value_size,
                                 unsigned int max_entries)
                  {
                      union bpf_attr attr = {
                          .map_type    = map_type,
                          .key_size    = key_size,
                          .value_size  = value_size,
                          .max_entries = max_entries
                      };

                      return bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
                  }

              The new map has the type specified by map_type, and attributes
              as specified in key_size, value_size, and max_entries.  On
              success, this operation returns a file descriptor.  On error, -1
              is returned and errno is set to EINVAL, EPERM, or ENOMEM.

              The key_size and value_size attributes will be used by the
              verifier during program loading to check that the program is
              calling bpf_map_*_elem() helper functions with a correctly
              initialized key and to check that the program doesn't access the
              map element value beyond the specified value_size.  For example,
              when a map is created with a key_size of 8 and the eBPF program
              calls

                  bpf_map_lookup_elem(map_fd, fp - 4)

              the program will be rejected, since the in-kernel helper
              function

                  bpf_map_lookup_elem(map_fd, void *key)

              expects to read 8 bytes from the location pointed to by key, but
              the fp - 4 (where fp is the top of the stack) starting address
              will cause out-of-bounds stack access.

              Similarly, when a map is created with a value_size of 1 and the
              eBPF program contains

                  value = bpf_map_lookup_elem(...);
                  *(u32 *) value = 1;

              the program will be rejected, since it accesses the value
              pointer beyond the specified 1 byte value_size limit.
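
              The two rejections above come down to simple range checks.
              The sketch below is a simplified user-space model of those
              checks, not the kernel verifier's code; the function names
              and the stack_size parameter are hypothetical.  It ignores
              the other conditions the verifier enforces, such as the key
              bytes being initialized.

```c
#include <stdbool.h>

/* Model: may a helper read len bytes starting at fp + off?
   fp is the top of the stack, so valid offsets are negative and the
   read must end at or below fp. */
static inline bool
stack_read_ok(int off, unsigned int len, unsigned int stack_size)
{
    long long o = off;

    if (o >= 0 || o < -(long long)stack_size)
        return false;
    return len <= (unsigned long long)(-o);
}

/* Model: may the program access len bytes at byte offset off inside a
   map value of value_size bytes? */
static inline bool
value_access_ok(unsigned int off, unsigned int len,
                unsigned int value_size)
{
    return off <= value_size && len <= value_size - off;
}
```

              With these definitions, stack_read_ok(-4, 8, 512) is false
              (the key_size 8 case above) while stack_read_ok(-8, 8, 512)
              is true, and value_access_ok(0, 4, 1) is false (the 4-byte
              store into a 1-byte value).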

              Currently, the following values are supported for map_type:

                  enum bpf_map_type {
                      BPF_MAP_TYPE_UNSPEC,  /* Reserve 0 as invalid map type */
                      BPF_MAP_TYPE_HASH,
                      BPF_MAP_TYPE_ARRAY,
                      BPF_MAP_TYPE_PROG_ARRAY,
                      BPF_MAP_TYPE_PERF_EVENT_ARRAY,
                      BPF_MAP_TYPE_PERCPU_HASH,
                      BPF_MAP_TYPE_PERCPU_ARRAY,
                      BPF_MAP_TYPE_STACK_TRACE,
                      BPF_MAP_TYPE_CGROUP_ARRAY,
                      BPF_MAP_TYPE_LRU_HASH,
                      BPF_MAP_TYPE_LRU_PERCPU_HASH,
                      BPF_MAP_TYPE_LPM_TRIE,
                      BPF_MAP_TYPE_ARRAY_OF_MAPS,
                      BPF_MAP_TYPE_HASH_OF_MAPS,
                      BPF_MAP_TYPE_DEVMAP,
                      BPF_MAP_TYPE_SOCKMAP,
                      BPF_MAP_TYPE_CPUMAP,
                  };

              map_type selects one of the available map implementations in the
              kernel.  For all map types, eBPF programs access maps with the
              same bpf_map_lookup_elem() and bpf_map_update_elem() helper
              functions.  Further details of the various map types are given
              below.

       BPF_MAP_LOOKUP_ELEM
              The BPF_MAP_LOOKUP_ELEM command looks up an element with a given
              key in the map referred to by the file descriptor fd.

                  int
                  bpf_lookup_elem(int fd, const void *key, void *value)
                  {
                      union bpf_attr attr = {
                          .map_fd = fd,
                          .key    = ptr_to_u64(key),
                          .value  = ptr_to_u64(value),
                      };

                      return bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
                  }

              If an element is found, the operation returns zero and stores
              the element's value into value, which must point to a buffer of
              value_size bytes.

              If no element is found, the operation returns -1 and sets errno
              to ENOENT.

       BPF_MAP_UPDATE_ELEM
              The BPF_MAP_UPDATE_ELEM command creates or updates an element
              with a given key/value in the map referred to by the file
              descriptor fd.

                  int
                  bpf_update_elem(int fd, const void *key, const void *value,
                                  uint64_t flags)
                  {
                      union bpf_attr attr = {
                          .map_fd = fd,
                          .key    = ptr_to_u64(key),
                          .value  = ptr_to_u64(value),
                          .flags  = flags,
                      };

                      return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
                  }

              The flags argument should be specified as one of the following:

              BPF_ANY
                     Create a new element or update an existing element.

              BPF_NOEXIST
                     Create a new element only if it did not exist.

              BPF_EXIST
                     Update an existing element.

              On success, the operation returns zero.  On error, -1 is
              returned and errno is set to EINVAL, EPERM, ENOMEM, E2BIG,
              EEXIST, or ENOENT.  E2BIG indicates that the number of elements
              in the map reached the max_entries limit specified at map
              creation time.  EEXIST will be returned if flags specifies
              BPF_NOEXIST and the element with key already exists in the map.
              ENOENT will be returned if flags specifies BPF_EXIST and the
              element with key doesn't exist in the map.
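
              The interaction between flags, element existence, and the
              max_entries limit can be summarized in a small decision
              function.  This is a user-space model of the rules described
              above, not kernel code: the MODEL_ constants mirror the
              values of BPF_ANY, BPF_NOEXIST, and BPF_EXIST, the function
              name is hypothetical, and the order in which the kernel
              evaluates the checks is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Same numeric values as BPF_ANY, BPF_NOEXIST, and BPF_EXIST. */
enum {
    MODEL_BPF_ANY     = 0,
    MODEL_BPF_NOEXIST = 1,
    MODEL_BPF_EXIST   = 2
};

/* Return 0 if the update would succeed, otherwise the errno value
   that the text above associates with the failure. */
static inline int
model_update_result(bool exists, unsigned long long flags,
                    unsigned int count, unsigned int max_entries)
{
    if (flags > MODEL_BPF_EXIST)
        return EINVAL;              /* unknown flag value */
    if (flags == MODEL_BPF_NOEXIST && exists)
        return EEXIST;              /* must create, but key is present */
    if (flags == MODEL_BPF_EXIST && !exists)
        return ENOENT;              /* must update, but key is absent */
    if (!exists && count >= max_entries)
        return E2BIG;               /* inserting would exceed the limit */
    return 0;
}
```

              Note that in this model replacing an existing element never
              yields E2BIG, since the element count does not grow.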

       BPF_MAP_DELETE_ELEM
              The BPF_MAP_DELETE_ELEM command deletes the element whose key is
              key from the map referred to by the file descriptor fd.

                  int
                  bpf_delete_elem(int fd, const void *key)
                  {
                      union bpf_attr attr = {
                          .map_fd = fd,
                          .key    = ptr_to_u64(key),
                      };

                      return bpf(BPF_MAP_DELETE_ELEM, &attr, sizeof(attr));
                  }

              On success, zero is returned.  If the element is not found, -1
              is returned and errno is set to ENOENT.

       BPF_MAP_GET_NEXT_KEY
              The BPF_MAP_GET_NEXT_KEY command looks up an element by key in
              the map referred to by the file descriptor fd and sets the
              next_key pointer to the key of the next element.

                  int
                  bpf_get_next_key(int fd, const void *key, void *next_key)
                  {
                      union bpf_attr attr = {
                          .map_fd   = fd,
                          .key      = ptr_to_u64(key),
                          .next_key = ptr_to_u64(next_key),
                      };

                      return bpf(BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr));
                  }

              If key is found, the operation returns zero and sets the
              next_key pointer to the key of the next element.  If key is not
              found, the operation returns zero and sets the next_key pointer
              to the key of the first element.  If key is the last element, -1
              is returned and errno is set to ENOENT.  Other possible errno
              values are ENOMEM, EFAULT, EPERM, and EINVAL.  This method can
              be used to iterate over all elements in the map.
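
              The iteration idiom described above can be sketched as
              follows, assuming 4-byte keys.  So that the loop can be run
              without a kernel map, the lookup function is passed in as a
              pointer; mock_get_next_key() is a hypothetical stand-in that
              models a map holding the keys 0 to 3, and in real use
              bpf_get_next_key() would be passed instead.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

typedef int (*next_key_fn)(int fd, const void *key, void *next_key);

/* Stand-in for bpf_get_next_key(): a map containing keys 0, 1, 2, 3. */
static inline int
mock_get_next_key(int fd, const void *key, void *next_key)
{
    uint32_t k, next;

    (void)fd;
    memcpy(&k, key, sizeof(k));
    if (k > 3) {
        next = 0;                   /* key not found: first key */
    } else if (k == 3) {
        errno = ENOENT;             /* key is the last element */
        return -1;
    } else {
        next = k + 1;
    }
    memcpy(next_key, &next, sizeof(next));
    return 0;
}

/* Count the elements of a map by walking its keys.  missing_key must
   be a key that is not in the map, so that the first call yields the
   first key.  Returns the count, or -1 on an unexpected error. */
static inline int
count_elements(int fd, next_key_fn get_next, uint32_t missing_key)
{
    uint32_t key = missing_key, next;
    int n = 0;

    while (get_next(fd, &key, &next) == 0) {
        n++;                        /* next holds a valid key here */
        key = next;
    }
    return errno == ENOENT ? n : -1;
}
```

              If other programs modify the map during the walk, the keys
              seen may not form a consistent snapshot.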

       close(map_fd)
              Delete the map referred to by the file descriptor map_fd.  When
              the user-space program that created a map exits, all maps will
              be deleted automatically (but see NOTES).

   eBPF map types
       The following map types are supported:

       BPF_MAP_TYPE_HASH
              Hash-table maps have the following characteristics:

              *  Maps are created and destroyed by user-space programs.  Both
                 user-space and eBPF programs can perform lookup, update, and
                 delete operations.

              *  The kernel takes care of allocating and freeing key/value
                 pairs.

              *  The map_update_elem() helper will fail to insert a new
                 element when the max_entries limit is reached.  (This ensures
                 that eBPF programs cannot exhaust memory.)

              *  map_update_elem() replaces existing elements atomically.

              Hash-table maps are optimized for speed of lookup.

       BPF_MAP_TYPE_ARRAY
              Array maps have the following characteristics:

              *  Optimized for fastest possible lookup.  In the future the
                 verifier/JIT compiler may recognize lookup() operations that
                 employ a constant key and optimize them into a constant
                 pointer.  It is possible to optimize a non-constant key into
                 direct pointer arithmetic as well, since pointers and
                 value_size are constant for the life of the eBPF program.  In
                 other words, array_map_lookup_elem() may be 'inlined' by the
                 verifier/JIT compiler while preserving concurrent access to
                 this map from user space.

              *  All array elements are preallocated and zero-initialized at
                 init time.

              *  The key is an array index, and must be exactly four bytes.

              *  map_delete_elem() fails with the error EINVAL, since elements
                 cannot be deleted.

              *  map_update_elem() replaces elements in a nonatomic fashion;
                 for atomic updates, a hash-table map should be used instead.
                 There is however one special case that can also be used with
                 arrays: the atomic built-in __sync_fetch_and_add() can be
                 used on 32-bit and 64-bit atomic counters.  For example, it
                 can be applied on the whole value itself if it represents a
                 single counter, or in case of a structure containing multiple
                 counters, it could be used on individual counters.  This is
                 quite often useful for aggregation and accounting of events.
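
              As an illustration of the last point, the sketch below
              applies __sync_fetch_and_add() to the individual counters of
              a structure.  It is written as ordinary user-space C so that
              it can be compiled and run anywhere; the struct pkt_stats
              layout and the account_packet() name are hypothetical, and
              in an eBPF program the pointer would instead come from a map
              lookup.

```c
#include <stdint.h>

/* Hypothetical array-map value holding two independent counters. */
struct pkt_stats {
    uint64_t packets;
    uint64_t bytes;
};

/* Account for one packet of len bytes.  Each counter is incremented
   atomically on its own. */
static inline void
account_packet(struct pkt_stats *s, uint64_t len)
{
    __sync_fetch_and_add(&s->packets, 1);
    __sync_fetch_and_add(&s->bytes, len);
}
```

              The two increments are individually atomic, but the pair is
              not updated as a unit, so a concurrent reader may observe
              the packet count without the matching byte count.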

              Among the uses for array maps are the following:

              *  As "global" eBPF variables: an array of 1 element whose key
                 is (index) 0 and where the value is a collection of 'global'
                 variables which eBPF programs can use to keep state between
                 events.

              *  Aggregation of tracing events into a fixed set of buckets.

              *  Accounting of networking events, for example, number of
                 packets and packet sizes.

       BPF_MAP_TYPE_PROG_ARRAY (since Linux 4.2)
              A program array map is a special kind of array map whose map
              values contain only file descriptors referring to other eBPF
              programs.  Thus, both the key_size and value_size must be
              exactly four bytes.  This map is used in conjunction with the
              bpf_tail_call() helper.

              This means that an eBPF program with a program array map
              attached to it can call from kernel side into

                  void bpf_tail_call(void *context, void *prog_map,
                                     unsigned int index);

              and therefore replace its own program flow with the one from the
              program at the given program array slot, if present.  This can
              be regarded as a kind of jump table to a different eBPF program.
              The invoked program will then reuse the same stack.  When a jump
              into the new program has been performed, it won't return to the
              old program anymore.

              If no eBPF program is found at the given index of the program
              array (because the map slot doesn't contain a valid program file
              descriptor, the specified lookup index/key is out of bounds, or
              the limit of 32 nested calls has been exceeded), execution
              continues with the current eBPF program.  This can be used as a
              fall-through for default cases.

              A program array map is useful, for example, in tracing or
              networking, to handle individual system calls or protocols in
              their own subprograms and use their identifiers as an individual
              map index.  This approach may result in performance benefits,
              and also makes it possible to overcome the maximum instruction
              limit of a single eBPF program.  In dynamic environments, a
              user-space daemon might atomically replace individual
              subprograms at run-time with newer versions to alter overall
              program behavior, for instance, if global policies change.

   eBPF programs
       The BPF_PROG_LOAD command is used to load an eBPF program into the
       kernel.  The return value for this command is a new file descriptor
       associated with this eBPF program.

           char bpf_log_buf[LOG_BUF_SIZE];

           int
           bpf_prog_load(enum bpf_prog_type type,
                         const struct bpf_insn *insns, int insn_cnt,
                         const char *license)
           {
               union bpf_attr attr = {
                   .prog_type = type,
                   .insns     = ptr_to_u64(insns),
                   .insn_cnt  = insn_cnt,
                   .license   = ptr_to_u64(license),
                   .log_buf   = ptr_to_u64(bpf_log_buf),
                   .log_size  = LOG_BUF_SIZE,
                   .log_level = 1,
               };

               return bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
           }

       prog_type is one of the available program types:

                  enum bpf_prog_type {
                      BPF_PROG_TYPE_UNSPEC,        /* Reserve 0 as invalid
                                                      program type */
                      BPF_PROG_TYPE_SOCKET_FILTER,
                      BPF_PROG_TYPE_KPROBE,
                      BPF_PROG_TYPE_SCHED_CLS,
                      BPF_PROG_TYPE_SCHED_ACT,
                  };

       For further details of eBPF program types, see below.

       The remaining fields of bpf_attr are set as follows:

       *  insns is an array of struct bpf_insn instructions.

       *  insn_cnt is the number of instructions in the program referred to by
          insns.

       *  license is a license string, which must be GPL compatible to call
          helper functions marked gpl_only.  (The licensing rules are the same
          as for kernel modules, so that also dual licenses, such as "Dual
          BSD/GPL", may be used.)

       *  log_buf is a pointer to a caller-allocated buffer in which the
          in-kernel verifier can store the verification log.  This log is a
          multi-line string that can be checked by the program author in order
          to understand how the verifier came to the conclusion that the eBPF
          program is unsafe.  The format of the output can change at any time
          as the verifier evolves.

       *  log_size is the size of the buffer pointed to by log_buf.  If the
          size of the buffer is not large enough to store all verifier
          messages, -1 is returned and errno is set to ENOSPC.

       *  log_level is the verbosity level of the verifier.  A value of zero
          means that the verifier will not provide a log; in this case,
          log_buf must be a NULL pointer, and log_size must be zero.

       Applying close(2) to the file descriptor returned by BPF_PROG_LOAD will
       unload the eBPF program (but see NOTES).

       Maps are accessible from eBPF programs and are used to exchange data
       between eBPF programs and between eBPF programs and user-space
       programs.  For example, eBPF programs can process various events (such
       as kprobes and packets) and store their data into a map, and user-space
       programs can then fetch data from the map.  Conversely, user-space
       programs can use a map as a configuration mechanism, populating the map
       with values checked by the eBPF program, which then modifies its
       behavior on the fly according to those values.

   eBPF program types
       The eBPF program type (prog_type) determines the subset of kernel
       helper functions that the program may call.  The program type also
       determines the program input (context), that is, the format of struct
       bpf_context (which is the data blob passed into the eBPF program as
       the first argument).

       For example, a tracing program does not have the exact same subset of
       helper functions as a socket filter program (though they may have some
       helpers in common).  Similarly, the input (context) for a tracing
       program is a set of register values, while for a socket filter it is a
       network packet.

       The set of functions available to eBPF programs of a given type may
       increase in the future.

       The following program types are supported:

       BPF_PROG_TYPE_SOCKET_FILTER (since Linux 3.19)
              Currently, the set of functions for BPF_PROG_TYPE_SOCKET_FILTER
              is:

                  bpf_map_lookup_elem(map_fd, void *key)
                                      /* look up key in a map_fd */
                  bpf_map_update_elem(map_fd, void *key, void *value)
                                      /* update key/value */
                  bpf_map_delete_elem(map_fd, void *key)
                                      /* delete key in a map_fd */

              The bpf_context argument is a pointer to a struct __sk_buff.

       BPF_PROG_TYPE_KPROBE (since Linux 4.1)
              [To be documented]

       BPF_PROG_TYPE_SCHED_CLS (since Linux 4.1)
              [To be documented]

       BPF_PROG_TYPE_SCHED_ACT (since Linux 4.1)
              [To be documented]

   Events
       Once a program is loaded, it can be attached to an event.  Various
       kernel subsystems have different ways to do so.

       Since Linux 3.19, the following call will attach the program prog_fd to
       the socket sockfd, which was created by an earlier call to socket(2):

           setsockopt(sockfd, SOL_SOCKET, SO_ATTACH_BPF,
                      &prog_fd, sizeof(prog_fd));

       Since Linux 4.1, the following call may be used to attach the eBPF
       program referred to by the file descriptor prog_fd to a perf event file
       descriptor, event_fd, that was created by a previous call to
       perf_event_open(2):

           ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);

EXAMPLES

       /* bpf+sockets example:
        * 1. create array map of 256 elements
        * 2. load program that counts number of packets received
        *    r0 = skb->data[ETH_HLEN + offsetof(struct iphdr, protocol)]
        *    map[r0]++
        * 3. attach prog_fd to raw socket via setsockopt()
        * 4. print number of received TCP/UDP packets every second
        */
       int
       main(int argc, char **argv)
       {
           int sock, map_fd, prog_fd, key;
           long long value = 0, tcp_cnt, udp_cnt;

           map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(key),
                                   sizeof(value), 256);
           if (map_fd < 0) {
               printf("failed to create map '%s'\n", strerror(errno));
               /* likely not run as root */
               return 1;
           }

           struct bpf_insn prog[] = {
               BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),        /* r6 = r1 */
               BPF_LD_ABS(BPF_B, ETH_HLEN + offsetof(struct iphdr, protocol)),
                                       /* r0 = ip->proto */
               BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4),
                                       /* *(u32 *)(fp - 4) = r0 */
               BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),       /* r2 = fp */
               BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),      /* r2 = r2 - 4 */
               BPF_LD_MAP_FD(BPF_REG_1, map_fd),           /* r1 = map_fd */
               BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem),
                                       /* r0 = map_lookup(r1, r2) */
               BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
                                       /* if (r0 == 0) goto pc+2 */
               BPF_MOV64_IMM(BPF_REG_1, 1),                /* r1 = 1 */
               BPF_XADD(BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0),
                                       /* lock *(u64 *) r0 += r1 */
               BPF_MOV64_IMM(BPF_REG_0, 0),                /* r0 = 0 */
               BPF_EXIT_INSN(),                            /* return r0 */
           };

           /* insn_cnt is a number of instructions, not a size in bytes */
           prog_fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, prog,
                                   sizeof(prog) / sizeof(prog[0]), "GPL");
           if (prog_fd < 0) {
               printf("failed to load prog '%s'\n", strerror(errno));
               return 1;
           }

           sock = open_raw_sock("lo");

           assert(setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd,
                             sizeof(prog_fd)) == 0);

           for (;;) {
               key = IPPROTO_TCP;
               assert(bpf_lookup_elem(map_fd, &key, &tcp_cnt) == 0);
               key = IPPROTO_UDP;
               assert(bpf_lookup_elem(map_fd, &key, &udp_cnt) == 0);
               printf("TCP %lld UDP %lld packets\n", tcp_cnt, udp_cnt);
               sleep(1);
           }

           return 0;
       }

       Some complete working code can be found in the samples/bpf directory in
       the kernel source tree.
668

RETURN VALUE

       For a successful call, the return value depends on the operation:

       BPF_MAP_CREATE
              The new file descriptor associated with the eBPF map.

       BPF_PROG_LOAD
              The new file descriptor associated with the eBPF program.

       All other commands
              Zero.

       On error, -1 is returned, and errno is set appropriately.

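       The return convention above can be wrapped in a small helper.  The
       following is a minimal sketch rather than an established interface:
       the names sys_bpf() and bpf_checked() are hypothetical, and the code
       assumes that the C library headers define __NR_bpf.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Invoke the system call directly through syscall(2). */
static int
sys_bpf(int cmd, void *attr, unsigned int size)
{
    return (int) syscall(__NR_bpf, cmd, attr, size);
}

/*
 * Hypothetical helper: returns what bpf() returned (a new file descriptor
 * or zero on success, -1 on error).  On error it prints a diagnostic and
 * leaves errno as set by the failed call.
 */
static int
bpf_checked(int cmd, void *attr, unsigned int size, const char *what)
{
    int ret = sys_bpf(cmd, attr, size);

    if (ret == -1) {
        int saved_errno = errno;    /* fprintf() may overwrite errno */

        fprintf(stderr, "%s: %s\n", what, strerror(saved_errno));
        errno = saved_errno;
    }
    return ret;
}
```

       A caller would treat a nonnegative result from BPF_MAP_CREATE or
       BPF_PROG_LOAD as a file descriptor, and a zero result from any other
       command as plain success.
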

ERRORS

       E2BIG  The eBPF program is too large or a map reached the max_entries
              limit (maximum number of elements).

       EACCES For BPF_PROG_LOAD, even though all program instructions are
              valid, the program has been rejected because it was deemed
              unsafe.  Possible reasons include an access to a disallowed
              memory region or to an uninitialized stack slot or register,
              function constraints that do not match the actual types, or a
              misaligned memory access.  In this case, it is recommended to
              call bpf() again with log_level = 1 and examine log_buf for the
              specific reason provided by the verifier.

       EBADF  fd is not an open file descriptor.

       EFAULT One of the pointers (key, value, log_buf, or insns) is outside
              the accessible address space.

       EINVAL The value specified in cmd is not recognized by this kernel.

       EINVAL For BPF_MAP_CREATE, either map_type or attributes are invalid.

       EINVAL For BPF_MAP_*_ELEM commands, some of the fields of union
              bpf_attr that are not used by this command are not set to zero.

       EINVAL For BPF_PROG_LOAD, indicates an attempt to load an invalid
              program.  eBPF programs can be deemed invalid due to
              unrecognized instructions, the use of reserved fields, jumps
              out of range, infinite loops or calls of unknown functions.

       ENOENT For BPF_MAP_LOOKUP_ELEM or BPF_MAP_DELETE_ELEM, indicates that
              the element with the given key was not found.

       ENOMEM Cannot allocate sufficient memory.

       EPERM  The call was made without sufficient privilege (without the
              CAP_SYS_ADMIN capability).

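       The advice given under EACCES can be sketched as follows.  This is an
       illustration under stated assumptions, not the method used by any
       particular loader: fill_prog_load() and load_with_log_on_eacces() are
       hypothetical names, and the code assumes that <linux/bpf.h> from the
       kernel headers is installed.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fill 'attr' for BPF_PROG_LOAD; pass log == NULL to leave logging off. */
static void
fill_prog_load(union bpf_attr *attr, enum bpf_prog_type type,
               const struct bpf_insn *insns, size_t insn_cnt,
               const char *license, char *log, size_t log_size)
{
    memset(attr, 0, sizeof(*attr));     /* zero the fields that are unused */
    attr->prog_type = type;
    attr->insns = (uint64_t) (uintptr_t) insns;
    attr->insn_cnt = insn_cnt;
    attr->license = (uint64_t) (uintptr_t) license;
    if (log != NULL) {
        attr->log_buf = (uint64_t) (uintptr_t) log;
        attr->log_size = log_size;
        attr->log_level = 1;            /* ask the verifier for its reasons */
    }
}

/* Load without a log first; on EACCES, repeat with the log enabled. */
static int
load_with_log_on_eacces(enum bpf_prog_type type,
                        const struct bpf_insn *insns, size_t insn_cnt,
                        const char *license, char *log, size_t log_size)
{
    union bpf_attr attr;
    int fd;

    fill_prog_load(&attr, type, insns, insn_cnt, license, NULL, 0);
    fd = (int) syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
    if (fd == -1 && errno == EACCES) {
        fill_prog_load(&attr, type, insns, insn_cnt, license,
                       log, log_size);
        fd = (int) syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
    }
    return fd;
}
```

       If the second attempt also fails, the caller can print the contents
       of the log buffer to see why the verifier rejected the program.
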

VERSIONS

       The bpf() system call first appeared in Linux 3.18.


CONFORMING TO

       The bpf() system call is Linux-specific.


NOTES

       In the current implementation, all bpf() commands require the caller to
       have the CAP_SYS_ADMIN capability.

       eBPF objects (maps and programs) can be shared between processes.  For
       example, after fork(2), the child inherits file descriptors referring
       to the same eBPF objects.  In addition, file descriptors referring to
       eBPF objects can be transferred over UNIX domain sockets.  File
       descriptors referring to eBPF objects can be duplicated in the usual
       way, using dup(2) and similar calls.  An eBPF object is deallocated
       only after all file descriptors referring to the object have been
       closed.

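       Transferring a file descriptor over a UNIX domain socket uses an
       SCM_RIGHTS control message.  The sketch below is generic (it works
       for any file descriptor, so it should apply equally to one that
       refers to an eBPF map or program); the names send_fd() and recv_fd()
       are hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Properly aligned buffer for one control message carrying one int. */
union fd_cmsg {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send 'fd' over the connected UNIX domain socket 'sock'; 0 on success. */
static int
send_fd(int sock, int fd)
{
    char dummy = 'x';               /* carry one byte of ordinary data */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd(); returns the new fd, or -1. */
static int
recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

       The descriptor returned by recv_fd() is a new descriptor in the
       receiving process that refers to the same underlying object, so the
       object stays allocated until both sides have closed their copies.
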
       eBPF programs can be written in a restricted C that is compiled (using
       the clang compiler) into eBPF bytecode.  Various features are omitted
       from this restricted C, such as loops, global variables, variadic
       functions, floating-point numbers, and passing structures as function
       arguments.  Some examples can be found in the samples/bpf/*_kern.c
       files in the kernel source tree.

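       As an illustration of the restricted C described above, the following
       minimal sketch shows a socket filter that uses none of the omitted
       features.  The SEC() macro is defined locally here, modeled on the
       convention used by the kernel samples, and the function name
       trim_to_64() is hypothetical.  A command along the lines of
       "clang -O2 -target bpf -c prog.c -o prog.o" is assumed for producing
       eBPF bytecode; the license string is supplied separately at load
       time.

```c
#include <linux/bpf.h>

/*
 * Locally defined helper macro (an assumption modeled on the kernel
 * samples): place the function in a named ELF section so that a loader
 * can locate it in the object file.
 */
#define SEC(name) __attribute__((section(name), used))

/*
 * Socket filter: the return value is the number of bytes of the packet
 * to keep.  No loops, global variables, or floating point are used.
 */
SEC("socket")
int
trim_to_64(struct __sk_buff *skb)
{
    return skb->len < 64 ? skb->len : 64;
}
```

       Because the function body is ordinary C, it can also be compiled
       natively and called with a hand-built struct __sk_buff to check its
       logic before it is loaded into the kernel.
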
       The kernel contains a just-in-time (JIT) compiler that translates eBPF
       bytecode into native machine code for better performance.  In kernels
       before Linux 4.15, the JIT compiler is disabled by default, but its
       operation can be controlled by writing one of the following integer
       strings to the file /proc/sys/net/core/bpf_jit_enable:

       0  Disable JIT compilation (default).

       1  Normal compilation.

       2  Debugging mode.  The generated opcodes are dumped in hexadecimal
          into the kernel log.  These opcodes can then be disassembled using
          the program tools/net/bpf_jit_disasm.c provided in the kernel source
          tree.

       Since Linux 4.15, the kernel may be configured with the
       CONFIG_BPF_JIT_ALWAYS_ON option.  In this case, the JIT compiler is
       always enabled, and bpf_jit_enable is initialized to 1 and is
       immutable.  (This kernel configuration option was provided as a
       mitigation for one of the Spectre attacks against the BPF
       interpreter.)

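       A program can inspect the current JIT setting by reading the file
       named above.  The following is a minimal sketch; the function names
       read_jit_mode() and jit_mode_name() are hypothetical, and only the
       three values documented above are recognized.

```c
#include <stdio.h>

/* Describe a bpf_jit_enable value; returns NULL for an unknown value. */
static const char *
jit_mode_name(int mode)
{
    switch (mode) {
    case 0:
        return "JIT disabled";
    case 1:
        return "JIT enabled";
    case 2:
        return "JIT enabled, opcodes dumped to the kernel log";
    default:
        return NULL;
    }
}

/*
 * Read the integer stored in 'path' (normally
 * /proc/sys/net/core/bpf_jit_enable).  Returns -1 if the file cannot be
 * opened or does not start with an integer.
 */
static int
read_jit_mode(const char *path)
{
    FILE *fp = fopen(path, "r");
    int mode = -1;

    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%d", &mode) != 1)
        mode = -1;
    fclose(fp);
    return mode;
}
```

       Reading the file needs no special handling; changing the value
       requires write access to the file, and on a kernel built with
       CONFIG_BPF_JIT_ALWAYS_ON the value cannot be changed.
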
       The JIT compiler for eBPF is currently available for the following
       architectures:

       *  x86-64 (since Linux 3.18);
       *  ARM-64 (since Linux 3.18);
       *  s390 (since Linux 4.1);
       *  PowerPC 64 (since Linux 4.8);
       *  SPARC 64 (since Linux 4.12);
       *  MIPS (since Linux 4.13);
       *  ARM32 (since Linux 4.14).


SEE ALSO

       seccomp(2), socket(7), tc(8), tc-bpf(8)

       Both classic and extended BPF are explained in the kernel source file
       Documentation/networking/filter.txt.


COLOPHON

       This page is part of release 4.16 of the Linux man-pages project.  A
       description of the project, information about reporting bugs, and the
       latest version of this page, can be found at
       https://www.kernel.org/doc/man-pages/.

Linux                             2018-02-02                            BPF(2)