bpf(2)                     System Calls Manual                    bpf(2)

NAME
       bpf - perform a command on an extended BPF map or program

SYNOPSIS
       #include <linux/bpf.h>

       int bpf(int cmd, union bpf_attr *attr, unsigned int size);

DESCRIPTION
       The bpf() system call performs a range of operations related to
       extended Berkeley Packet Filters. Extended BPF (or eBPF) is similar
       to the original ("classic") BPF (cBPF) used to filter network
       packets. For both cBPF and eBPF programs, the kernel statically
       analyzes the programs before loading them, in order to ensure that
       they cannot harm the running system.

       eBPF extends cBPF in multiple ways, including the ability to call a
       fixed set of in-kernel helper functions (via the BPF_CALL opcode
       extension provided by eBPF) and access shared data structures such
       as eBPF maps.

   Extended BPF Design/Architecture
       eBPF maps are a generic data structure for storage of different data
       types. Data types are generally treated as binary blobs, so a user
       just specifies the size of the key and the size of the value at
       map-creation time. In other words, a key/value for a given map can
       have an arbitrary structure.
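
       Because the kernel sees only bytes, two keys in a hash map match
       only if all key_size bytes are equal, including any padding the
       compiler inserts. The following sketch (struct flow_key, struct
       flow_stats, and make_flow_key() are hypothetical names) uses an
       explicit pad member so that no byte of the key is compiler padding,
       clears the whole key before filling it, and derives key_size and
       value_size from sizeof:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical key for a hash map keyed by network flow. The explicit
   pad[] member makes all 16 bytes of the layout visible. */
struct flow_key {
    uint32_t saddr;
    uint32_t daddr;
    uint16_t sport;
    uint16_t dport;
    uint8_t  proto;
    uint8_t  pad[3];
};

/* Hypothetical value: the kernel never interprets these bytes. */
struct flow_stats {
    uint64_t packets;
    uint64_t bytes;
};

/* Fill a key so that every byte is defined, padding included. */
static void
make_flow_key(struct flow_key *k, uint32_t saddr, uint32_t daddr,
              uint16_t sport, uint16_t dport, uint8_t proto)
{
    memset(k, 0, sizeof(*k));
    k->saddr = saddr;
    k->daddr = daddr;
    k->sport = sport;
    k->dport = dport;
    k->proto = proto;
}

/* The values that would be passed as key_size and value_size to
   BPF_MAP_CREATE. */
static const unsigned int flow_key_size = sizeof(struct flow_key);
static const unsigned int flow_stats_size = sizeof(struct flow_stats);
```

       The eBPF program that uses the map must declare the same layout, or
       lookups from the two sides will not find each other's keys.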

       A user process can create multiple maps (with key/value pairs being
       opaque bytes of data) and access them via file descriptors.
       Different eBPF programs can access the same maps in parallel. It's
       up to the user process and eBPF program to decide what they store
       inside maps.

       There's one special map type, called a program array. This type of
       map stores file descriptors referring to other eBPF programs. When
       a lookup in the map is performed, the program flow is redirected
       in-place to the beginning of another eBPF program and does not
       return to the calling program. The level of nesting has a fixed
       limit of 32, so that infinite loops cannot be crafted. At run time,
       the program file descriptors stored in the map can be modified, so
       program functionality can be altered based on specific
       requirements. All programs referred to in a program-array map must
       have been previously loaded into the kernel via bpf(). If a map
       lookup fails, the current program continues its execution. See
       BPF_MAP_TYPE_PROG_ARRAY below for further details.

       Generally, eBPF programs are loaded by the user process and
       automatically unloaded when the process exits. In some cases, for
       example, tc-bpf(8), the program will continue to stay alive inside
       the kernel even after the process that loaded the program exits. In
       that case, the tc subsystem holds a reference to the eBPF program
       after the file descriptor has been closed by the user-space program.
       Thus, whether a specific program continues to live inside the kernel
       depends on how it is further attached to a given kernel subsystem
       after it was loaded via bpf().

       Each eBPF program is a set of instructions that is safe to run until
       its completion. An in-kernel verifier statically determines that
       the eBPF program terminates and is safe to execute. During
       verification, the kernel increments reference counts for each of
       the maps that the eBPF program uses, so that the attached maps can't
       be removed until the program is unloaded.

       eBPF programs can be attached to different events. These events can
       be the arrival of network packets, tracing events, classification
       events by network queueing disciplines (for eBPF programs attached
       to a tc(8) classifier), and other types that may be added in the
       future. A new event triggers execution of the eBPF program, which
       may store information about the event in eBPF maps. Beyond storing
       data, eBPF programs may call a fixed set of in-kernel helper
       functions.

       The same eBPF program can be attached to multiple events and
       different eBPF programs can access the same map:

           tracing     tracing    tracing    packet      packet      packet
           event A     event B    event C    on eth0     on eth1     on eth2
            |           |          |          |           |           ^
            |           |          |          |           v           |
            --> tracing <--     tracing     socket   tc ingress   tc egress
                prog_1          prog_2      prog_3   classifier     action
                 |  |              |          |        prog_4       prog_5
                 |  +------+-------+          |           |           |
               map_1     map_2              map_3         +-- map_4 --+

   Arguments
       The operation to be performed by the bpf() system call is determined
       by the cmd argument. Each operation takes an accompanying argument,
       provided via attr, which is a pointer to a union of type bpf_attr
       (see below). The unused fields and padding must be zeroed out
       before the call. The size argument is the size of the union pointed
       to by attr.
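
       The wrappers in this page rely on designated initializers to clear
       the unused members of bpf_attr. A more defensive pattern, sketched
       below, clears the whole union with memset(3) before setting the
       fields a command uses, so that padding and the members belonging to
       other commands are zero regardless of how the compiler treats bytes
       outside the initialized union member. The helper name
       create_map_zeroed() is hypothetical, and the call goes through
       syscall(2) with SYS_bpf, which assumes Linux headers that define
       that constant.

```c
#define _GNU_SOURCE
#include <linux/bpf.h>   /* union bpf_attr, BPF_MAP_CREATE */
#include <string.h>      /* memset() */
#include <sys/syscall.h> /* SYS_bpf */
#include <unistd.h>      /* syscall() */

/* Hypothetical helper: BPF_MAP_CREATE with every byte of attr cleared
   first, as required for unused fields and padding. */
static int
create_map_zeroed(enum bpf_map_type map_type, unsigned int key_size,
                  unsigned int value_size, unsigned int max_entries)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));   /* padding and unused fields = 0 */
    attr.map_type = map_type;
    attr.key_size = key_size;
    attr.value_size = value_size;
    attr.max_entries = max_entries;

    /* size is the size of the union as compiled into this program. */
    return (int) syscall(SYS_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}
```

       For example, create_map_zeroed(BPF_MAP_TYPE_HASH, sizeof(int),
       sizeof(long long), 1024) returns a map file descriptor on success,
       or -1 with errno set as described for BPF_MAP_CREATE.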

       The value provided in cmd is one of the following:

       BPF_MAP_CREATE
              Create a map and return a file descriptor that refers to the
              map. The close-on-exec file descriptor flag (see fcntl(2)) is
              automatically enabled for the new file descriptor.

       BPF_MAP_LOOKUP_ELEM
              Look up an element by key in a specified map and return its
              value.

       BPF_MAP_UPDATE_ELEM
              Create or update an element (key/value pair) in a specified
              map.

       BPF_MAP_DELETE_ELEM
              Look up and delete an element by key in a specified map.

       BPF_MAP_GET_NEXT_KEY
              Look up an element by key in a specified map and return the
              key of the next element.

       BPF_PROG_LOAD
              Verify and load an eBPF program, returning a new file
              descriptor associated with the program. The close-on-exec
              file descriptor flag (see fcntl(2)) is automatically enabled
              for the new file descriptor.

       The bpf_attr union consists of various anonymous structures that are
       used by different bpf() commands:

           union bpf_attr {
               struct {    /* Used by BPF_MAP_CREATE */
                   __u32         map_type;
                   __u32         key_size;    /* size of key in bytes */
                   __u32         value_size;  /* size of value in bytes */
                   __u32         max_entries; /* maximum number of entries
                                                 in a map */
               };

               struct {    /* Used by BPF_MAP_*_ELEM and BPF_MAP_GET_NEXT_KEY
                              commands */
                   __u32         map_fd;
                   __aligned_u64 key;
                   union {
                       __aligned_u64 value;
                       __aligned_u64 next_key;
                   };
                   __u64         flags;
               };

               struct {    /* Used by BPF_PROG_LOAD */
                   __u32         prog_type;
                   __u32         insn_cnt;
                   __aligned_u64 insns;      /* 'const struct bpf_insn *' */
                   __aligned_u64 license;    /* 'const char *' */
                   __u32         log_level;  /* verbosity level of verifier */
                   __u32         log_size;   /* size of user buffer */
                   __aligned_u64 log_buf;    /* user supplied 'char *'
                                                buffer */
                   __u32         kern_version;
                                             /* checked when prog_type=kprobe
                                                (since Linux 4.1) */
               };
           } __attribute__((aligned(8)));

   eBPF maps
       Maps are a generic data structure for storage of different types of
       data. They allow sharing of data between eBPF kernel programs, and
       also between kernel and user-space applications.

       Each map type has the following attributes:

       •  type

       •  maximum number of elements

       •  key size in bytes

       •  value size in bytes

       The following wrapper functions demonstrate how various bpf()
       commands can be used to access the maps. The functions use the cmd
       argument to invoke different operations.
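
       The wrappers below call two helpers that this page does not define:
       bpf() itself, for which glibc provides no wrapper, and ptr_to_u64(),
       which converts a pointer into the 64-bit integer form used by the
       __aligned_u64 members of bpf_attr. A minimal sketch of both,
       assuming Linux headers that define SYS_bpf:

```c
#define _GNU_SOURCE
#include <linux/bpf.h>   /* union bpf_attr */
#include <stdint.h>      /* uint64_t, uintptr_t */
#include <sys/syscall.h> /* SYS_bpf */
#include <unistd.h>      /* syscall() */

/* bpf(): thin wrapper around the raw system call; like other system
   call wrappers it returns -1 and sets errno on failure. */
static int
bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
    return (int) syscall(SYS_bpf, cmd, attr, size);
}

/* ptr_to_u64(): bpf_attr carries user-space pointers as 64-bit
   integers, even on 32-bit systems. */
static inline uint64_t
ptr_to_u64(const void *ptr)
{
    return (uint64_t) (uintptr_t) ptr;
}
```

       With these two definitions (and <stdint.h> for uint64_t) in scope,
       the wrapper functions in this section compile as shown.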

       BPF_MAP_CREATE
              The BPF_MAP_CREATE command creates a new map, returning a new
              file descriptor that refers to the map.

                  int
                  bpf_create_map(enum bpf_map_type map_type,
                                 unsigned int key_size,
                                 unsigned int value_size,
                                 unsigned int max_entries)
                  {
                      union bpf_attr attr = {
                          .map_type    = map_type,
                          .key_size    = key_size,
                          .value_size  = value_size,
                          .max_entries = max_entries
                      };

                      return bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
                  }

              The new map has the type specified by map_type, and
              attributes as specified in key_size, value_size, and
              max_entries. On success, this operation returns a file
              descriptor. On error, -1 is returned and errno is set to
              EINVAL, EPERM, or ENOMEM.

              The key_size and value_size attributes will be used by the
              verifier during program loading to check that the program is
              calling bpf_map_*_elem() helper functions with a correctly
              initialized key and to check that the program doesn't access
              the map element value beyond the specified value_size. For
              example, when a map is created with a key_size of 8 and the
              eBPF program calls

                  bpf_map_lookup_elem(map_fd, fp - 4)

              the program will be rejected, since the in-kernel helper
              function

                  bpf_map_lookup_elem(map_fd, void *key)

              expects to read 8 bytes from the location pointed to by key,
              but the fp - 4 (where fp is the top of the stack) starting
              address will cause out-of-bounds stack access.

              Similarly, when a map is created with a value_size of 1 and
              the eBPF program contains

                  value = bpf_map_lookup_elem(...);
                  *(u32 *) value = 1;

              the program will be rejected, since it accesses the value
              pointer beyond the specified 1-byte value_size limit.

              Currently, the following values are supported for map_type:

                  enum bpf_map_type {
                      BPF_MAP_TYPE_UNSPEC,  /* Reserve 0 as invalid map type */
                      BPF_MAP_TYPE_HASH,
                      BPF_MAP_TYPE_ARRAY,
                      BPF_MAP_TYPE_PROG_ARRAY,
                      BPF_MAP_TYPE_PERF_EVENT_ARRAY,
                      BPF_MAP_TYPE_PERCPU_HASH,
                      BPF_MAP_TYPE_PERCPU_ARRAY,
                      BPF_MAP_TYPE_STACK_TRACE,
                      BPF_MAP_TYPE_CGROUP_ARRAY,
                      BPF_MAP_TYPE_LRU_HASH,
                      BPF_MAP_TYPE_LRU_PERCPU_HASH,
                      BPF_MAP_TYPE_LPM_TRIE,
                      BPF_MAP_TYPE_ARRAY_OF_MAPS,
                      BPF_MAP_TYPE_HASH_OF_MAPS,
                      BPF_MAP_TYPE_DEVMAP,
                      BPF_MAP_TYPE_SOCKMAP,
                      BPF_MAP_TYPE_CPUMAP,
                      BPF_MAP_TYPE_XSKMAP,
                      BPF_MAP_TYPE_SOCKHASH,
                      BPF_MAP_TYPE_CGROUP_STORAGE,
                      BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
                      BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
                      BPF_MAP_TYPE_QUEUE,
                      BPF_MAP_TYPE_STACK,
                      /* See /usr/include/linux/bpf.h for the full list. */
                  };

              map_type selects one of the available map implementations in
              the kernel. For all map types, eBPF programs access maps with
              the same bpf_map_lookup_elem() and bpf_map_update_elem()
              helper functions. Further details of the various map types
              are given below.

       BPF_MAP_LOOKUP_ELEM
              The BPF_MAP_LOOKUP_ELEM command looks up an element with a
              given key in the map referred to by the file descriptor fd.

                  int
                  bpf_lookup_elem(int fd, const void *key, void *value)
                  {
                      union bpf_attr attr = {
                          .map_fd = fd,
                          .key    = ptr_to_u64(key),
                          .value  = ptr_to_u64(value),
                      };

                      return bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
                  }

              If an element is found, the operation returns zero and stores
              the element's value into value, which must point to a buffer
              of value_size bytes.

              If no element is found, the operation returns -1 and sets
              errno to ENOENT.
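
              Because ENOENT is an expected outcome rather than a failure,
              callers usually separate "not found" from real errors. A
              sketch of such a helper, with the hypothetical name
              map_lookup_status(), that returns 1 if the key was found, 0
              if it was not, and -1 (with errno set) on any other error:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: BPF_MAP_LOOKUP_ELEM with a three-way result.
   value must point to a buffer of the map's value_size bytes. */
static int
map_lookup_status(int map_fd, const void *key, void *value)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.map_fd = map_fd;
    attr.key = (uint64_t) (uintptr_t) key;
    attr.value = (uint64_t) (uintptr_t) value;

    if (syscall(SYS_bpf, BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr)) == 0)
        return 1;           /* found: value now holds the element */
    if (errno == ENOENT)
        return 0;           /* no element with this key */
    return -1;              /* EBADF, EFAULT, EPERM, ... */
}
```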

       BPF_MAP_UPDATE_ELEM
              The BPF_MAP_UPDATE_ELEM command creates or updates an element
              with a given key/value in the map referred to by the file
              descriptor fd.

                  int
                  bpf_update_elem(int fd, const void *key, const void *value,
                                  uint64_t flags)
                  {
                      union bpf_attr attr = {
                          .map_fd = fd,
                          .key    = ptr_to_u64(key),
                          .value  = ptr_to_u64(value),
                          .flags  = flags,
                      };

                      return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
                  }

              The flags argument should be specified as one of the
              following:

              BPF_ANY
                     Create a new element or update an existing element.

              BPF_NOEXIST
                     Create a new element only if it does not already
                     exist.

              BPF_EXIST
                     Update an existing element.

              On success, the operation returns zero. On error, -1 is
              returned and errno is set to EINVAL, EPERM, ENOMEM, or E2BIG.
              E2BIG indicates that the number of elements in the map reached
              the max_entries limit specified at map creation time. EEXIST
              will be returned if flags specifies BPF_NOEXIST and the
              element with key already exists in the map. ENOENT will be
              returned if flags specifies BPF_EXIST and the element with key
              doesn't exist in the map.
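
              BPF_NOEXIST turns "insert only if absent" into a single call,
              instead of a lookup followed by a separate update that could
              race with another writer. A sketch, using the hypothetical
              name map_insert_if_absent(), that reports whether the key was
              newly inserted:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if key/value was inserted, 0 if an
   element with this key already existed (it is left unchanged), and
   -1 with errno set on any other error (for example, E2BIG when the
   map is full). */
static int
map_insert_if_absent(int map_fd, const void *key, const void *value)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.map_fd = map_fd;
    attr.key = (uint64_t) (uintptr_t) key;
    attr.value = (uint64_t) (uintptr_t) value;
    attr.flags = BPF_NOEXIST;

    if (syscall(SYS_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr)) == 0)
        return 1;
    if (errno == EEXIST)
        return 0;
    return -1;
}
```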

       BPF_MAP_DELETE_ELEM
              The BPF_MAP_DELETE_ELEM command deletes the element whose key
              is key from the map referred to by the file descriptor fd.

                  int
                  bpf_delete_elem(int fd, const void *key)
                  {
                      union bpf_attr attr = {
                          .map_fd = fd,
                          .key    = ptr_to_u64(key),
                      };

                      return bpf(BPF_MAP_DELETE_ELEM, &attr, sizeof(attr));
                  }

              On success, zero is returned. If the element is not found, -1
              is returned and errno is set to ENOENT.

       BPF_MAP_GET_NEXT_KEY
              The BPF_MAP_GET_NEXT_KEY command looks up an element by key in
              the map referred to by the file descriptor fd and stores the
              key of the next element in the buffer pointed to by next_key.

                  int
                  bpf_get_next_key(int fd, const void *key, void *next_key)
                  {
                      union bpf_attr attr = {
                          .map_fd   = fd,
                          .key      = ptr_to_u64(key),
                          .next_key = ptr_to_u64(next_key),
                      };

                      return bpf(BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr));
                  }

              If key is found, the operation returns zero and stores the key
              of the next element in next_key. If key is not found, the
              operation returns zero and stores the key of the first element
              in next_key. If key is the last element, -1 is returned and
              errno is set to ENOENT. Other possible errno values are
              ENOMEM, EFAULT, EPERM, and EINVAL. This method can be used to
              iterate over all elements in the map.
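
              The following sketch iterates over every key of a map whose
              keys are 32-bit integers and counts them; the names
              get_next_key_u32() and count_map_keys() are hypothetical. It
              assumes a kernel that accepts a NULL key on the first call
              and then returns the first key; where that is not supported,
              start instead from a key known not to be in the map, as
              described above. Because a key that is no longer present
              makes the kernel return the first key again, concurrent
              deletions can cause keys to be visited more than once.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: BPF_MAP_GET_NEXT_KEY for 4-byte keys.
   key may be NULL to ask for the first key (assumption, see above). */
static int
get_next_key_u32(int map_fd, const uint32_t *key, uint32_t *next_key)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.map_fd = map_fd;
    attr.key = (uint64_t) (uintptr_t) key;
    attr.next_key = (uint64_t) (uintptr_t) next_key;
    return (int) syscall(SYS_bpf, BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr));
}

/* Walk the map and return the number of keys seen, or -1 on error. */
static long
count_map_keys(int map_fd)
{
    uint32_t cur, next;
    const uint32_t *prev = NULL;    /* NULL: ask for the first key */
    long count = 0;

    while (get_next_key_u32(map_fd, prev, &next) == 0) {
        count++;
        cur = next;                 /* the next call continues after cur */
        prev = &cur;
    }
    return errno == ENOENT ? count : -1;   /* ENOENT: past the last key */
}
```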

       close(map_fd)
              Delete the map referred to by the file descriptor map_fd.
              When the user-space program that created a map exits, all
              maps will be deleted automatically (but see NOTES).

   eBPF map types
       The following map types are supported:

       BPF_MAP_TYPE_HASH
              Hash-table maps have the following characteristics:

              •  Maps are created and destroyed by user-space programs.
                 Both user-space and eBPF programs can perform lookup,
                 update, and delete operations.

              •  The kernel takes care of allocating and freeing key/value
                 pairs.

              •  The map_update_elem() helper will fail to insert a new
                 element when the max_entries limit is reached. (This
                 ensures that eBPF programs cannot exhaust memory.)

              •  map_update_elem() replaces existing elements atomically.

              Hash-table maps are optimized for speed of lookup.

       BPF_MAP_TYPE_ARRAY
              Array maps have the following characteristics:

              •  Optimized for fastest possible lookup. In the future the
                 verifier/JIT compiler may recognize lookup() operations
                 that employ a constant key and optimize them into a
                 constant pointer. It is possible to optimize a
                 non-constant key into direct pointer arithmetic as well,
                 since pointers and value_size are constant for the life of
                 the eBPF program. In other words, array_map_lookup_elem()
                 may be 'inlined' by the verifier/JIT compiler while
                 preserving concurrent access to this map from user space.

              •  All array elements are preallocated and zero-initialized at
                 init time.

              •  The key is an array index, and must be exactly four bytes.

              •  map_delete_elem() fails with the error EINVAL, since
                 elements cannot be deleted.

              •  map_update_elem() replaces elements in a nonatomic fashion;
                 for atomic updates, a hash-table map should be used
                 instead. There is, however, one special case that can also
                 be used with arrays: the atomic built-in
                 __sync_fetch_and_add() can be used on 32-bit and 64-bit
                 atomic counters. For example, it can be applied to the
                 whole value itself if it represents a single counter, or,
                 in the case of a structure containing multiple counters, to
                 individual counters. This is quite often useful for
                 aggregation and accounting of events.

              Among the uses for array maps are the following:

              •  As "global" eBPF variables: an array of 1 element whose
                 key is (index) 0 and where the value is a collection of
                 'global' variables which eBPF programs can use to keep
                 state between events.

              •  Aggregation of tracing events into a fixed set of buckets.

              •  Accounting of networking events, for example, number of
                 packets and packet sizes.
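
              The __sync_fetch_and_add() built-in mentioned above is a GCC
              and Clang extension that also works in ordinary user-space C.
              The following sketch applies it, in user space, to a
              hypothetical value structure holding two counters and to a
              single-counter value, to show both patterns; inside an eBPF
              program the same call would operate on the pointer returned
              by bpf_map_lookup_elem().

```c
#include <stdint.h>

/* Hypothetical array-map value: several counters in one element. */
struct pkt_stats {
    uint64_t packets;
    uint64_t bytes;
};

/* Account one packet of len bytes. Each counter gets its own atomic
   add; the pair as a whole is not updated atomically, so a concurrent
   reader may see packets and bytes from slightly different moments. */
static void
account_packet(struct pkt_stats *s, uint32_t len)
{
    __sync_fetch_and_add(&s->packets, 1);
    __sync_fetch_and_add(&s->bytes, len);
}

/* Single-counter value: the whole value is the counter. Returns the
   value the counter had before the increment. */
static uint32_t
bump_counter(uint32_t *counter)
{
    return __sync_fetch_and_add(counter, 1);
}
```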

       BPF_MAP_TYPE_PROG_ARRAY (since Linux 4.2)
              A program array map is a special kind of array map whose map
              values contain only file descriptors referring to other eBPF
              programs. Thus, both the key_size and value_size must be
              exactly four bytes. This map is used in conjunction with the
              bpf_tail_call() helper.

              This means that an eBPF program with a program array map
              attached to it can call from kernel side into

                  void bpf_tail_call(void *context, void *prog_map,
                                     unsigned int index);

              and therefore replace its own program flow with the one from
              the program at the given program array slot, if present.
              This can be regarded as a kind of jump table to a different
              eBPF program. The invoked program will then reuse the same
              stack. When a jump into the new program has been performed,
              it won't return to the old program anymore.

              If no eBPF program is found at the given index of the program
              array (because the map slot doesn't contain a valid program
              file descriptor, the specified lookup index/key is out of
              bounds, or the limit of 32 nested calls has been exceeded),
              execution continues with the current eBPF program. This can
              be used as a fall-through for default cases.

              A program array map is useful, for example, in tracing or
              networking, to handle individual system calls or protocols in
              their own subprograms and use their identifiers as an
              individual map index. This approach may result in
              performance benefits, and also makes it possible to overcome
              the maximum instruction limit of a single eBPF program. In
              dynamic environments, a user-space daemon might atomically
              replace individual subprograms at run time with newer
              versions to alter overall program behavior, for instance, if
              global policies change.
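
              From user space, a program array slot is filled by storing a
              program file descriptor as the 4-byte value under a 4-byte
              index key, and emptied by deleting that key. A sketch with
              the hypothetical helper names prog_array_set() and
              prog_array_clear(); prog_fd is assumed to come from an
              earlier BPF_PROG_LOAD, and the update uses BPF_ANY.

```c
#define _GNU_SOURCE
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Issue a map command with a 4-byte key and an optional 4-byte value. */
static int
prog_array_cmd(int cmd, int map_fd, uint32_t slot, const uint32_t *value)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.map_fd = map_fd;
    attr.key = (uint64_t) (uintptr_t) &slot;
    attr.value = (uint64_t) (uintptr_t) value;    /* 0 for delete */
    attr.flags = (cmd == BPF_MAP_UPDATE_ELEM) ? BPF_ANY : 0;
    return (int) syscall(SYS_bpf, cmd, &attr, sizeof(attr));
}

/* Point slot at the program prog_fd, so that a later
   bpf_tail_call(ctx, map, slot) jumps to that program. */
static int
prog_array_set(int map_fd, uint32_t slot, int prog_fd)
{
    uint32_t fd = (uint32_t) prog_fd;

    return prog_array_cmd(BPF_MAP_UPDATE_ELEM, map_fd, slot, &fd);
}

/* Empty the slot, so that bpf_tail_call() on it falls through and the
   current program continues. */
static int
prog_array_clear(int map_fd, uint32_t slot)
{
    return prog_array_cmd(BPF_MAP_DELETE_ELEM, map_fd, slot, NULL);
}
```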

   eBPF programs
       The BPF_PROG_LOAD command is used to load an eBPF program into the
       kernel. The return value for this command is a new file descriptor
       associated with this eBPF program.

           char bpf_log_buf[LOG_BUF_SIZE];

           int
           bpf_prog_load(enum bpf_prog_type type,
                         const struct bpf_insn *insns, int insn_cnt,
                         const char *license)
           {
               union bpf_attr attr = {
                   .prog_type = type,
                   .insns     = ptr_to_u64(insns),
                   .insn_cnt  = insn_cnt,
                   .license   = ptr_to_u64(license),
                   .log_buf   = ptr_to_u64(bpf_log_buf),
                   .log_size  = LOG_BUF_SIZE,
                   .log_level = 1,
               };

               return bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
           }
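
       The insns array can also be written without helper macros by
       filling in struct bpf_insn directly. The following sketch builds
       the smallest complete program, "r0 = 0; exit"; for a socket filter,
       a return value of zero means that no bytes of the packet are kept.
       The names min_prog and min_prog_cnt are illustrative; the opcode
       constants come from <linux/bpf.h>.

```c
#include <linux/bpf.h>   /* struct bpf_insn, BPF_* opcode constants */

/* "r0 = 0; exit": the program's return value is taken from r0. */
static const struct bpf_insn min_prog[] = {
    {   /* r0 = 0 (64-bit move of an immediate) */
        .code    = BPF_ALU64 | BPF_MOV | BPF_K,
        .dst_reg = BPF_REG_0,
        .src_reg = 0,
        .off     = 0,
        .imm     = 0,
    },
    {   /* return r0 */
        .code    = BPF_JMP | BPF_EXIT,
    },
};

/* Value for the insn_cnt field of bpf_attr. */
static const unsigned int min_prog_cnt =
    sizeof(min_prog) / sizeof(min_prog[0]);
```

       Passing min_prog and min_prog_cnt to the bpf_prog_load() wrapper
       above with BPF_PROG_TYPE_SOCKET_FILTER should yield a program file
       descriptor, provided the caller is permitted to load socket filters.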

       prog_type is one of the available program types:

           enum bpf_prog_type {
               BPF_PROG_TYPE_UNSPEC,        /* Reserve 0 as invalid
                                               program type */
               BPF_PROG_TYPE_SOCKET_FILTER,
               BPF_PROG_TYPE_KPROBE,
               BPF_PROG_TYPE_SCHED_CLS,
               BPF_PROG_TYPE_SCHED_ACT,
               BPF_PROG_TYPE_TRACEPOINT,
               BPF_PROG_TYPE_XDP,
               BPF_PROG_TYPE_PERF_EVENT,
               BPF_PROG_TYPE_CGROUP_SKB,
               BPF_PROG_TYPE_CGROUP_SOCK,
               BPF_PROG_TYPE_LWT_IN,
               BPF_PROG_TYPE_LWT_OUT,
               BPF_PROG_TYPE_LWT_XMIT,
               BPF_PROG_TYPE_SOCK_OPS,
               BPF_PROG_TYPE_SK_SKB,
               BPF_PROG_TYPE_CGROUP_DEVICE,
               BPF_PROG_TYPE_SK_MSG,
               BPF_PROG_TYPE_RAW_TRACEPOINT,
               BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
               BPF_PROG_TYPE_LWT_SEG6LOCAL,
               BPF_PROG_TYPE_LIRC_MODE2,
               BPF_PROG_TYPE_SK_REUSEPORT,
               BPF_PROG_TYPE_FLOW_DISSECTOR,
               /* See /usr/include/linux/bpf.h for the full list. */
           };

       For further details of eBPF program types, see below.

       The remaining fields of bpf_attr are set as follows:

       •  insns is an array of struct bpf_insn instructions.

       •  insn_cnt is the number of instructions in the program referred to
          by insns.

       •  license is a license string, which must be GPL compatible to call
          helper functions marked gpl_only. (The licensing rules are the
          same as for kernel modules, so dual licenses, such as "Dual
          BSD/GPL", may also be used.)

       •  log_buf is a pointer to a caller-allocated buffer in which the
          in-kernel verifier can store the verification log. This log is a
          multi-line string that can be checked by the program author in
          order to understand how the verifier came to the conclusion that
          the eBPF program is unsafe. The format of the output can change
          at any time as the verifier evolves.

       •  log_size is the size of the buffer pointed to by log_buf. If the
          buffer is not large enough to store all verifier messages, -1 is
          returned and errno is set to ENOSPC.

       •  log_level is the verbosity level of the verifier. A value of zero
          means that the verifier will not provide a log; in this case,
          log_buf must be a NULL pointer, and log_size must be zero.
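
       Because log_level = 0 requires a NULL log_buf and a zero log_size, a
       common pattern is to load first without a log and, only if loading
       fails, load again with log_level = 1 to collect the verifier's
       explanation. A sketch with the hypothetical names
       prog_load_attempt() and prog_load_with_log(); log_size is assumed to
       be large enough for the kernel to accept, since very small log
       buffers may be rejected.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static int
prog_load_attempt(enum bpf_prog_type type, const struct bpf_insn *insns,
                  unsigned int insn_cnt, const char *license,
                  char *log_buf, unsigned int log_size, unsigned int level)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.prog_type = type;
    attr.insns = (uint64_t) (uintptr_t) insns;
    attr.insn_cnt = insn_cnt;
    attr.license = (uint64_t) (uintptr_t) license;
    attr.log_level = level;
    /* log_level 0: log_buf must be NULL and log_size must be zero. */
    attr.log_buf = (uint64_t) (uintptr_t) (level ? log_buf : NULL);
    attr.log_size = level ? log_size : 0;
    return (int) syscall(SYS_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
}

/* Hypothetical helper: returns a program fd, or -1 with errno from the
   first (unlogged) attempt and the verifier log, if any, in log_buf. */
static int
prog_load_with_log(enum bpf_prog_type type, const struct bpf_insn *insns,
                   unsigned int insn_cnt, const char *license,
                   char *log_buf, unsigned int log_size)
{
    int fd, saved_errno;

    fd = prog_load_attempt(type, insns, insn_cnt, license, NULL, 0, 0);
    if (fd >= 0)
        return fd;
    saved_errno = errno;

    if (log_buf != NULL && log_size > 0) {
        log_buf[0] = '\0';
        fd = prog_load_attempt(type, insns, insn_cnt, license,
                               log_buf, log_size, 1);
        if (fd >= 0)
            return fd;      /* succeeded on retry (e.g., after EAGAIN) */
    }
    errno = saved_errno;
    return -1;
}
```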

       Applying close(2) to the file descriptor returned by BPF_PROG_LOAD
       will unload the eBPF program (but see NOTES).

       Maps are accessible from eBPF programs and are used to exchange data
       between eBPF programs and between eBPF programs and user-space
       programs. For example, eBPF programs can process various events
       (such as kprobes and packets) and store their data into a map, and
       user-space programs can then fetch data from the map. Conversely,
       user-space programs can use a map as a configuration mechanism,
       populating the map with values checked by the eBPF program, which
       then modifies its behavior on the fly according to those values.

   eBPF program types
       The eBPF program type (prog_type) determines the subset of kernel
       helper functions that the program may call. The program type also
       determines the program input (context), that is, the format of
       struct bpf_context (which is the data blob passed into the eBPF
       program as the first argument).

       For example, a tracing program does not have the exact same subset
       of helper functions as a socket filter program (though they may have
       some helpers in common). Similarly, the input (context) for a
       tracing program is a set of register values, while for a socket
       filter it is a network packet.

       The set of functions available to eBPF programs of a given type may
       increase in the future.

       The following program types are supported:

       BPF_PROG_TYPE_SOCKET_FILTER (since Linux 3.19)
              Currently, the set of functions for
              BPF_PROG_TYPE_SOCKET_FILTER is:

                  bpf_map_lookup_elem(map_fd, void *key)
                                      /* look up key in a map_fd */
                  bpf_map_update_elem(map_fd, void *key, void *value)
                                      /* update key/value */
                  bpf_map_delete_elem(map_fd, void *key)
                                      /* delete key in a map_fd */

              The bpf_context argument is a pointer to a struct __sk_buff.

       BPF_PROG_TYPE_KPROBE (since Linux 4.1)
              [To be documented]

       BPF_PROG_TYPE_SCHED_CLS (since Linux 4.1)
              [To be documented]

       BPF_PROG_TYPE_SCHED_ACT (since Linux 4.1)
              [To be documented]

   Events
       Once a program is loaded, it can be attached to an event. Various
       kernel subsystems have different ways to do so.

       Since Linux 3.19, the following call will attach the program prog_fd
       to the socket sockfd, which was created by an earlier call to
       socket(2):

           setsockopt(sockfd, SOL_SOCKET, SO_ATTACH_BPF,
                      &prog_fd, sizeof(prog_fd));

       Since Linux 4.1, the following call may be used to attach the eBPF
       program referred to by the file descriptor prog_fd to a perf event
       file descriptor, event_fd, that was created by a previous call to
       perf_event_open(2):

           ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
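
       Attaching can fail, for example with EBADF if prog_fd is not an open
       file descriptor, so the setsockopt(2) result is worth checking. A
       sketch with the hypothetical name socket_with_prog(), which creates
       a socket and attaches a previously loaded socket-filter program to
       it, closing the socket again if the attach fails:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/socket.h>  /* socket(), setsockopt(), SO_ATTACH_BPF */
#include <unistd.h>      /* close() */

/* Hypothetical helper: returns the new socket with prog_fd attached,
   or -1 with errno set. */
static int
socket_with_prog(int domain, int type, int prog_fd)
{
    int sockfd, saved_errno;

    sockfd = socket(domain, type, 0);
    if (sockfd < 0)
        return -1;

    if (setsockopt(sockfd, SOL_SOCKET, SO_ATTACH_BPF,
                   &prog_fd, sizeof(prog_fd)) != 0) {
        saved_errno = errno;
        close(sockfd);          /* don't leak the socket */
        errno = saved_errno;    /* report the attach error */
        return -1;
    }
    return sockfd;
}
```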

RETURN VALUE
       For a successful call, the return value depends on the operation:

       BPF_MAP_CREATE
              The new file descriptor associated with the eBPF map.

       BPF_PROG_LOAD
              The new file descriptor associated with the eBPF program.

       All other commands
              Zero.

       On error, -1 is returned, and errno is set to indicate the error.

ERRORS
       E2BIG  The eBPF program is too large or a map reached the
              max_entries limit (maximum number of elements).

       EACCES For BPF_PROG_LOAD, even though all program instructions are
              valid, the program has been rejected because it was deemed
              unsafe. This may be because it accessed a disallowed memory
              region or an uninitialized stack/register, because the
              function constraints don't match the actual types, or
              because there was a misaligned memory access. In this case,
              it is recommended to call bpf() again with log_level = 1 and
              examine log_buf for the specific reason provided by the
              verifier.

       EAGAIN For BPF_PROG_LOAD, indicates that needed resources are
              blocked. This happens when the verifier detects pending
              signals while it is checking the validity of the eBPF
              program. In this case, just call bpf() again with the same
              parameters.

       EBADF  fd is not an open file descriptor.

       EFAULT One of the pointers (key or value or log_buf or insns) is
              outside the accessible address space.

       EINVAL The value specified in cmd is not recognized by this kernel.

       EINVAL For BPF_MAP_CREATE, either map_type or attributes are invalid.

       EINVAL For BPF_MAP_*_ELEM commands, some of the fields of union
              bpf_attr that are not used by this command are not set to
              zero.

       EINVAL For BPF_PROG_LOAD, indicates an attempt to load an invalid
              program. eBPF programs can be deemed invalid due to
              unrecognized instructions, the use of reserved fields, jumps
              out of range, infinite loops, or calls of unknown functions.

       ENOENT For BPF_MAP_LOOKUP_ELEM or BPF_MAP_DELETE_ELEM, indicates that
              the element with the given key was not found.

       ENOMEM Cannot allocate sufficient memory.

       EPERM  The call was made without sufficient privilege (without the
              CAP_SYS_ADMIN capability).
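
       The advice for EAGAIN above amounts to a retry loop around bpf(). A
       sketch with the hypothetical name bpf_retry_eagain(); the attempt
       limit is a caller-chosen bound so that a persistent condition cannot
       spin forever, not a kernel requirement.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/bpf.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: call bpf() and, on EAGAIN, repeat the call with
   the same parameters, up to max_attempts (>= 1) calls in total. */
static int
bpf_retry_eagain(int cmd, union bpf_attr *attr, unsigned int size,
                 int max_attempts)
{
    int ret = -1;

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        ret = (int) syscall(SYS_bpf, cmd, attr, size);
        if (ret >= 0 || errno != EAGAIN)
            break;          /* success, or an error other than EAGAIN */
    }
    return ret;
}
```

       For example, fd = bpf_retry_eagain(BPF_PROG_LOAD, &attr,
       sizeof(attr), 5) loads a program described by a fully initialized
       attr, retrying up to four more times on EAGAIN.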

STANDARDS
       Linux.

HISTORY
       Linux 3.18.

NOTES
       Prior to Linux 4.4, all bpf() commands required the caller to have
       the CAP_SYS_ADMIN capability. From Linux 4.4 onwards, an
       unprivileged user may create limited programs of type
       BPF_PROG_TYPE_SOCKET_FILTER and associated maps. However, they may
       not store kernel pointers within the maps and are presently limited
       to the following helper functions:

       •  get_random
       •  get_smp_processor_id
       •  tail_call
       •  ktime_get_ns

       Unprivileged access may be blocked by writing the value 1 to the file
       /proc/sys/kernel/unprivileged_bpf_disabled.

       eBPF objects (maps and programs) can be shared between processes.
       For example, after fork(2), the child inherits file descriptors
       referring to the same eBPF objects. In addition, file descriptors
       referring to eBPF objects can be transferred over UNIX domain
       sockets. File descriptors referring to eBPF objects can be
       duplicated in the usual way, using dup(2) and similar calls. An
       eBPF object is deallocated only after all file descriptors referring
       to the object have been closed.
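
       Transferring a file descriptor over a UNIX domain socket uses the
       SCM_RIGHTS ancillary message described in unix(7); nothing about it
       is specific to eBPF, so the same code moves map, program, or any
       other file descriptors. A sketch with the hypothetical names
       send_fd() and recv_fd(); the receiver gets a new descriptor number
       referring to the same object and should close it when done.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send fd over the connected UNIX domain socket sock. */
static int
send_fd(int sock, int fd)
{
    char byte = 0;    /* at least one byte of normal data is sent */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {           /* correctly aligned control buffer */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a file descriptor sent with send_fd(); returns it or -1. */
static int
recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg;
    int fd;

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```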

       eBPF programs can be written in a restricted C that is compiled
       (using the clang compiler) into eBPF bytecode. Various features are
       omitted from this restricted C, such as loops, global variables,
       variadic functions, floating-point numbers, and passing structures
       as function arguments. Some examples can be found in the
       samples/bpf/*_kern.c files in the kernel source tree.

       The kernel contains a just-in-time (JIT) compiler that translates
       eBPF bytecode into native machine code for better performance.
       Before Linux 4.15, the JIT compiler was disabled by default, but its
       operation can be controlled by writing one of the following integer
       strings to the file /proc/sys/net/core/bpf_jit_enable:

       0  Disable JIT compilation (default).

       1  Normal compilation.

       2  Debugging mode. The generated opcodes are dumped in hexadecimal
          into the kernel log. These opcodes can then be disassembled using
          the program tools/net/bpf_jit_disasm.c provided in the kernel
          source tree.

       Since Linux 4.15, the kernel may be configured with the
       CONFIG_BPF_JIT_ALWAYS_ON option. In this case, the JIT compiler is
       always enabled, and bpf_jit_enable is initialized to 1 and is
       immutable. (This kernel configuration option was provided as a
       mitigation for one of the Spectre attacks against the BPF
       interpreter.)

       The JIT compiler for eBPF is currently available for the following
       architectures:

       •  x86-64 (since Linux 3.18; cBPF since Linux 3.0);
       •  ARM32 (since Linux 3.18; cBPF since Linux 3.4);
       •  SPARC 32 (since Linux 3.18; cBPF since Linux 3.5);
       •  ARM-64 (since Linux 3.18);
       •  s390 (since Linux 4.1; cBPF since Linux 3.7);
       •  PowerPC 64 (since Linux 4.8; cBPF since Linux 3.1);
       •  SPARC 64 (since Linux 4.12);
       •  x86-32 (since Linux 4.18);
       •  MIPS 64 (since Linux 4.18; cBPF since Linux 3.16);
       •  riscv (since Linux 5.1).

EXAMPLES
       /* bpf+sockets example:
        * 1. create array map of 256 elements
        * 2. load program that counts number of packets received
        *    r0 = skb->data[ETH_HLEN + offsetof(struct iphdr, protocol)]
        *    map[r0]++
        * 3. attach prog_fd to raw socket via setsockopt()
        * 4. print number of received TCP/UDP packets every second
        *
        * Assumes the wrapper functions shown earlier in this page
        * (including bpf_log_buf), plus the BPF_*() instruction macros and
        * open_raw_sock() from the samples/bpf directory of the kernel
        * source tree; the macro names there vary between kernel versions.
        */
       #include <errno.h>
       #include <stddef.h>
       #include <stdio.h>
       #include <string.h>
       #include <unistd.h>
       #include <linux/bpf.h>
       #include <linux/if_ether.h>
       #include <linux/ip.h>
       #include <netinet/in.h>
       #include <sys/socket.h>

       int
       main(int argc, char *argv[])
       {
           int sock, map_fd, prog_fd, key;
           long long value = 0, tcp_cnt, udp_cnt;

           map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(key),
                                   sizeof(value), 256);
           if (map_fd < 0) {
               printf("failed to create map '%s'\n", strerror(errno));
               /* likely not run as root */
               return 1;
           }

           struct bpf_insn prog[] = {
               BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),        /* r6 = r1 */
               BPF_LD_ABS(BPF_B, ETH_HLEN + offsetof(struct iphdr, protocol)),
                                       /* r0 = ip->proto */
               BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4),
                                       /* *(u32 *)(fp - 4) = r0 */
               BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),       /* r2 = fp */
               BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),      /* r2 = r2 - 4 */
               BPF_LD_MAP_FD(BPF_REG_1, map_fd),           /* r1 = map_fd */
               BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem),
                                       /* r0 = map_lookup(r1, r2) */
               BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
                                       /* if (r0 == 0) goto pc+2 */
               BPF_MOV64_IMM(BPF_REG_1, 1),                /* r1 = 1 */
               BPF_XADD(BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0),
                                       /* lock *(u64 *) r0 += r1 */
               BPF_MOV64_IMM(BPF_REG_0, 0),                /* r0 = 0 */
               BPF_EXIT_INSN(),                            /* return r0 */
           };

           prog_fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, prog,
                                   sizeof(prog) / sizeof(prog[0]), "GPL");
           if (prog_fd < 0) {
               /* The verifier log explains why the program was rejected. */
               printf("failed to load program '%s'\n%s\n",
                      strerror(errno), bpf_log_buf);
               return 1;
           }

           sock = open_raw_sock("lo");
           if (sock < 0) {
               printf("failed to open raw socket on lo\n");
               return 1;
           }

           /* Calls with side effects must not be placed inside assert():
              they disappear when NDEBUG is defined. */
           if (setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd,
                          sizeof(prog_fd)) != 0) {
               printf("failed to attach program '%s'\n", strerror(errno));
               return 1;
           }

           for (;;) {
               key = IPPROTO_TCP;
               if (bpf_lookup_elem(map_fd, &key, &tcp_cnt) != 0)
                   break;
               key = IPPROTO_UDP;
               if (bpf_lookup_elem(map_fd, &key, &udp_cnt) != 0)
                   break;
               printf("TCP %lld UDP %lld packets\n", tcp_cnt, udp_cnt);
               sleep(1);
           }

           printf("map lookup failed '%s'\n", strerror(errno));
           return 1;
       }

       Some complete working code can be found in the samples/bpf directory
       in the kernel source tree.

SEE ALSO
       seccomp(2), bpf-helpers(7), socket(7), tc(8), tc-bpf(8)

       Both classic and extended BPF are explained in the kernel source file
       Documentation/networking/filter.txt.

Linux man-pages 6.05               2023-07-28                         bpf(2)