BPF(2)                     Linux Programmer's Manual                    BPF(2)

NAME
       bpf - perform a command on an extended BPF map or program

SYNOPSIS
       #include <linux/bpf.h>

       int bpf(int cmd, union bpf_attr *attr, unsigned int size);

DESCRIPTION
       The bpf() system call performs a range of operations related to
       extended Berkeley Packet Filters. Extended BPF (or eBPF) is similar to
       the original ("classic") BPF (cBPF) used to filter network packets.
       For both cBPF and eBPF programs, the kernel statically analyzes the
       programs before loading them, in order to ensure that they cannot harm
       the running system.

       eBPF extends cBPF in multiple ways, including the ability to call a
       fixed set of in-kernel helper functions (via the BPF_CALL opcode
       extension provided by eBPF) and access shared data structures such as
       eBPF maps.

   Extended BPF Design/Architecture
       eBPF maps are a generic data structure for storage of different data
       types. Data types are generally treated as binary blobs, so a user
       just specifies the size of the key and the size of the value at
       map-creation time. In other words, a key/value for a given map can
       have an arbitrary structure.

       A user process can create multiple maps (with key/value-pairs being
       opaque bytes of data) and access them via file descriptors. Different
       eBPF programs can access the same maps in parallel. It's up to the
       user process and eBPF program to decide what they store inside maps.

       There's one special map type, called a program array. This type of map
       stores file descriptors referring to other eBPF programs. When a
       lookup in the map is performed, the program flow is redirected in-place
       to the beginning of another eBPF program and does not return to the
       calling program. The level of nesting has a fixed limit of 32, so
       that infinite loops cannot be crafted. At run time, the program file
       descriptors stored in the map can be modified, so program functionality
       can be altered based on specific requirements. All programs referred
       to in a program-array map must have been previously loaded into the
       kernel via bpf(). If a map lookup fails, the current program continues
       its execution. See BPF_MAP_TYPE_PROG_ARRAY below for further details.

       Generally, eBPF programs are loaded by the user process and
       automatically unloaded when the process exits. In some cases, for
       example, tc-bpf(8), the program will continue to stay alive inside the
       kernel even after the process that loaded the program exits. In that
       case, the tc subsystem holds a reference to the eBPF program after the
       file descriptor has been closed by the user-space program. Thus,
       whether a specific program continues to live inside the kernel depends
       on how it is further attached to a given kernel subsystem after it was
       loaded via bpf().

       Each eBPF program is a set of instructions that is safe to run until
       its completion. An in-kernel verifier statically determines that the
       eBPF program terminates and is safe to execute. During verification,
       the kernel increments reference counts for each of the maps that the
       eBPF program uses, so that the attached maps can't be removed until the
       program is unloaded.

       eBPF programs can be attached to different events. These events can be
       the arrival of network packets, tracing events, classification events
       by network queueing disciplines (for eBPF programs attached to a tc(8)
       classifier), and other types that may be added in the future. A new
       event triggers execution of the eBPF program, which may store
       information about the event in eBPF maps. Beyond storing data, eBPF
       programs may call a fixed set of in-kernel helper functions.

       The same eBPF program can be attached to multiple events and different
       eBPF programs can access the same map:

           tracing     tracing    tracing    packet      packet     packet
           event A     event B    event C    on eth0     on eth1    on eth2
            |             |         |          |           |          ^
            |             |         |          |           v          |
            --> tracing <--     tracing      socket    tc ingress   tc egress
                 prog_1          prog_2      prog_3    classifier    action
                 |  |              |           |         prog_4      prog_5
              |---  -----|  |------|          map_3        |           |
            map_1       map_2                              --| map_4 |--

   Arguments
       The operation to be performed by the bpf() system call is determined by
       the cmd argument. Each operation takes an accompanying argument,
       provided via attr, which is a pointer to a union of type bpf_attr (see
       below). The size argument is the size of the union pointed to by attr.

       The value provided in cmd is one of the following:

       BPF_MAP_CREATE
              Create a map and return a file descriptor that refers to the
              map. The close-on-exec file descriptor flag (see fcntl(2)) is
              automatically enabled for the new file descriptor.

       BPF_MAP_LOOKUP_ELEM
              Look up an element by key in a specified map and return its
              value.

       BPF_MAP_UPDATE_ELEM
              Create or update an element (key/value pair) in a specified map.

       BPF_MAP_DELETE_ELEM
              Look up and delete an element by key in a specified map.

       BPF_MAP_GET_NEXT_KEY
              Look up an element by key in a specified map and return the key
              of the next element.

       BPF_PROG_LOAD
              Verify and load an eBPF program, returning a new file descriptor
              associated with the program. The close-on-exec file descriptor
              flag (see fcntl(2)) is automatically enabled for the new file
              descriptor.

       The bpf_attr union consists of various anonymous structures that are
       used by different bpf() commands:

           union bpf_attr {
               struct {    /* Used by BPF_MAP_CREATE */
                   __u32         map_type;
                   __u32         key_size;    /* size of key in bytes */
                   __u32         value_size;  /* size of value in bytes */
                   __u32         max_entries; /* maximum number of entries
                                                 in a map */
               };

               struct {    /* Used by BPF_MAP_*_ELEM and BPF_MAP_GET_NEXT_KEY
                              commands */
                   __u32         map_fd;
                   __aligned_u64 key;
                   union {
                       __aligned_u64 value;
                       __aligned_u64 next_key;
                   };
                   __u64         flags;
               };

               struct {    /* Used by BPF_PROG_LOAD */
                   __u32         prog_type;
                   __u32         insn_cnt;
                   __aligned_u64 insns;      /* 'const struct bpf_insn *' */
                   __aligned_u64 license;    /* 'const char *' */
                   __u32         log_level;  /* verbosity level of verifier */
                   __u32         log_size;   /* size of user buffer */
                   __aligned_u64 log_buf;    /* user supplied 'char *'
                                                buffer */
                   __u32         kern_version;
                                             /* checked when prog_type=kprobe
                                                (since Linux 4.1) */
               };
           } __attribute__((aligned(8)));

   eBPF maps
       Maps are a generic data structure for storage of different types of
       data. They allow sharing of data between eBPF kernel programs, and
       also between kernel and user-space applications.

       Each map type has the following attributes:

       *  type

       *  maximum number of elements

       *  key size in bytes

       *  value size in bytes

       The following wrapper functions demonstrate how various bpf() commands
       can be used to access the maps. The functions use the cmd argument to
       invoke different operations.
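
       The wrappers that follow call bpf() and ptr_to_u64(), neither of which
       is defined on this page. A minimal sketch of both, assuming that the C
       library provides no bpf() wrapper (so the call goes through syscall(2)
       with __NR_bpf) and that ptr_to_u64() is nothing more than a
       pointer-to-integer cast. Both definitions are illustrative, not part
       of the kernel API:

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Assumed helper: widen a user-space pointer to the 64-bit integer
   form used by the __aligned_u64 members of union bpf_attr. */
uint64_t
ptr_to_u64(const void *ptr)
{
    return (uint64_t) (uintptr_t) ptr;
}

/* Assumed wrapper: invoke the raw system call. Returns -1 and sets
   errno on error, like any other syscall(2) invocation. */
int
bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
    return syscall(__NR_bpf, cmd, attr, size);
}
```

       The wrappers on this page rely on a designated initializer to zero
       the members they do not name. Callers that prefer not to depend on how
       the compiler initializes the rest of the union can memset(3) attr to
       zero before filling it in.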

       BPF_MAP_CREATE
              The BPF_MAP_CREATE command creates a new map, returning a new
              file descriptor that refers to the map.

                  int
                  bpf_create_map(enum bpf_map_type map_type,
                                 unsigned int key_size,
                                 unsigned int value_size,
                                 unsigned int max_entries)
                  {
                      union bpf_attr attr = {
                          .map_type    = map_type,
                          .key_size    = key_size,
                          .value_size  = value_size,
                          .max_entries = max_entries
                      };

                      return bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
                  }

              The new map has the type specified by map_type, and attributes
              as specified in key_size, value_size, and max_entries. On
              success, this operation returns a file descriptor. On error, -1
              is returned and errno is set to EINVAL, EPERM, or ENOMEM.

              The key_size and value_size attributes will be used by the
              verifier during program loading to check that the program is
              calling bpf_map_*_elem() helper functions with a correctly
              initialized key and to check that the program doesn't access
              the map element value beyond the specified value_size. For
              example, when a map is created with a key_size of 8 and the
              eBPF program calls

                  bpf_map_lookup_elem(map_fd, fp - 4)

              the program will be rejected, since the in-kernel helper
              function

                  bpf_map_lookup_elem(map_fd, void *key)

              expects to read 8 bytes from the location pointed to by key, but
              the fp - 4 (where fp is the top of the stack) starting address
              will cause out-of-bounds stack access.

              Similarly, when a map is created with a value_size of 1 and the
              eBPF program contains

                  value = bpf_map_lookup_elem(...);
                  *(u32 *) value = 1;

              the program will be rejected, since it accesses the value
              pointer beyond the specified 1 byte value_size limit.

              Currently, the following values are supported for map_type:

                  enum bpf_map_type {
                      BPF_MAP_TYPE_UNSPEC,  /* Reserve 0 as invalid map type */
                      BPF_MAP_TYPE_HASH,
                      BPF_MAP_TYPE_ARRAY,
                      BPF_MAP_TYPE_PROG_ARRAY,
                      BPF_MAP_TYPE_PERF_EVENT_ARRAY,
                      BPF_MAP_TYPE_PERCPU_HASH,
                      BPF_MAP_TYPE_PERCPU_ARRAY,
                      BPF_MAP_TYPE_STACK_TRACE,
                      BPF_MAP_TYPE_CGROUP_ARRAY,
                      BPF_MAP_TYPE_LRU_HASH,
                      BPF_MAP_TYPE_LRU_PERCPU_HASH,
                      BPF_MAP_TYPE_LPM_TRIE,
                      BPF_MAP_TYPE_ARRAY_OF_MAPS,
                      BPF_MAP_TYPE_HASH_OF_MAPS,
                      BPF_MAP_TYPE_DEVMAP,
                      BPF_MAP_TYPE_SOCKMAP,
                      BPF_MAP_TYPE_CPUMAP,
                  };

              map_type selects one of the available map implementations in the
              kernel. For all map types, eBPF programs access maps with the
              same bpf_map_lookup_elem() and bpf_map_update_elem() helper
              functions. Further details of the various map types are given
              below.

       BPF_MAP_LOOKUP_ELEM
              The BPF_MAP_LOOKUP_ELEM command looks up an element with a given
              key in the map referred to by the file descriptor fd.

                  int
                  bpf_lookup_elem(int fd, const void *key, void *value)
                  {
                      union bpf_attr attr = {
                          .map_fd = fd,
                          .key    = ptr_to_u64(key),
                          .value  = ptr_to_u64(value),
                      };

                      return bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
                  }

              If an element is found, the operation returns zero and stores
              the element's value into value, which must point to a buffer of
              value_size bytes.

              If no element is found, the operation returns -1 and sets errno
              to ENOENT.

       BPF_MAP_UPDATE_ELEM
              The BPF_MAP_UPDATE_ELEM command creates or updates an element
              with a given key/value in the map referred to by the file
              descriptor fd.

                  int
                  bpf_update_elem(int fd, const void *key, const void *value,
                                  uint64_t flags)
                  {
                      union bpf_attr attr = {
                          .map_fd = fd,
                          .key    = ptr_to_u64(key),
                          .value  = ptr_to_u64(value),
                          .flags  = flags,
                      };

                      return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
                  }

              The flags argument should be specified as one of the following:

              BPF_ANY
                     Create a new element or update an existing element.

              BPF_NOEXIST
                     Create a new element only if it did not exist.

              BPF_EXIST
                     Update an existing element.

              On success, the operation returns zero. On error, -1 is
              returned and errno is set to EINVAL, EPERM, ENOMEM, or E2BIG.
              E2BIG indicates that the number of elements in the map reached
              the max_entries limit specified at map creation time. EEXIST
              will be returned if flags specifies BPF_NOEXIST and the element
              with key already exists in the map. ENOENT will be returned if
              flags specifies BPF_EXIST and the element with key doesn't exist
              in the map.

       BPF_MAP_DELETE_ELEM
              The BPF_MAP_DELETE_ELEM command deletes the element whose key is
              key from the map referred to by the file descriptor fd.

                  int
                  bpf_delete_elem(int fd, const void *key)
                  {
                      union bpf_attr attr = {
                          .map_fd = fd,
                          .key    = ptr_to_u64(key),
                      };

                      return bpf(BPF_MAP_DELETE_ELEM, &attr, sizeof(attr));
                  }

              On success, zero is returned. If the element is not found, -1
              is returned and errno is set to ENOENT.

       BPF_MAP_GET_NEXT_KEY
              The BPF_MAP_GET_NEXT_KEY command looks up an element by key in
              the map referred to by the file descriptor fd and sets the
              next_key pointer to the key of the next element.

                  int
                  bpf_get_next_key(int fd, const void *key, void *next_key)
                  {
                      union bpf_attr attr = {
                          .map_fd   = fd,
                          .key      = ptr_to_u64(key),
                          .next_key = ptr_to_u64(next_key),
                      };

                      return bpf(BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr));
                  }

              If key is found, the operation returns zero and sets the
              next_key pointer to the key of the next element. If key is not
              found, the operation returns zero and sets the next_key pointer
              to the key of the first element. If key is the last element, -1
              is returned and errno is set to ENOENT. Other possible errno
              values are ENOMEM, EFAULT, EPERM, and EINVAL. This method can
              be used to iterate over all elements in the map.
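
              The iteration described above can be sketched as follows. This
              is an illustration only: count_map_entries() and sys_bpf() are
              hypothetical names, the map is assumed to use four-byte keys,
              and the kernel is assumed to accept a NULL key as "start from
              the first element" (believed to hold since Linux 4.12). On
              older kernels, start instead from a key known not to be in the
              map, as described above.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Assumed helper (not a libc function): raw bpf() invocation. */
static int
sys_bpf(int cmd, union bpf_attr *attr)
{
    return syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/* Hypothetical helper: count the elements of a map with 4-byte keys.
   Returns the count, or -1 with errno set on an unexpected error. */
int
count_map_entries(int map_fd)
{
    union bpf_attr attr;
    uint32_t key, next_key;
    const uint32_t *cur = NULL;     /* NULL: begin at the first key
                                       (assumption, see lead-in) */
    int count = 0;

    for (;;) {
        memset(&attr, 0, sizeof(attr));
        attr.map_fd = map_fd;
        attr.key = (uint64_t) (uintptr_t) cur;
        attr.next_key = (uint64_t) (uintptr_t) &next_key;

        /* ENOENT marks the end of the map; anything else is an error. */
        if (sys_bpf(BPF_MAP_GET_NEXT_KEY, &attr) == -1)
            return (errno == ENOENT) ? count : -1;

        key = next_key;             /* continue from the key just seen */
        cur = &key;
        count++;
    }
}
```

              If another process or an eBPF program deletes the current key
              while the loop runs, the next call no longer finds it and the
              walk restarts from the first element, so a count taken on a
              busy map is approximate.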

       close(map_fd)
              Delete the map referred to by the file descriptor map_fd. When
              the user-space program that created a map exits, all maps will
              be deleted automatically (but see NOTES).

   eBPF map types
       The following map types are supported:

       BPF_MAP_TYPE_HASH
              Hash-table maps have the following characteristics:

              *  Maps are created and destroyed by user-space programs. Both
                 user-space and eBPF programs can perform lookup, update, and
                 delete operations.

              *  The kernel takes care of allocating and freeing key/value
                 pairs.

              *  The map_update_elem() helper will fail to insert a new
                 element when the max_entries limit is reached. (This ensures
                 that eBPF programs cannot exhaust memory.)

              *  map_update_elem() replaces existing elements atomically.

              Hash-table maps are optimized for speed of lookup.

       BPF_MAP_TYPE_ARRAY
              Array maps have the following characteristics:

              *  Optimized for fastest possible lookup. In the future the
                 verifier/JIT compiler may recognize lookup() operations that
                 employ a constant key and optimize them into a constant
                 pointer. It is possible to optimize a non-constant key into
                 direct pointer arithmetic as well, since pointers and
                 value_size are constant for the life of the eBPF program. In
                 other words, array_map_lookup_elem() may be 'inlined' by the
                 verifier/JIT compiler while preserving concurrent access to
                 this map from user space.

              *  All array elements are preallocated and zero-initialized at
                 init time.

              *  The key is an array index, and must be exactly four bytes.

              *  map_delete_elem() fails with the error EINVAL, since elements
                 cannot be deleted.

              *  map_update_elem() replaces elements in a nonatomic fashion;
                 for atomic updates, a hash-table map should be used instead.
                 There is however one special case that can also be used with
                 arrays: the atomic built-in __sync_fetch_and_add() can be
                 used on 32 and 64 bit atomic counters. For example, it can
                 be applied on the whole value itself if it represents a
                 single counter, or in case of a structure containing multiple
                 counters, it could be used on individual counters. This is
                 quite often useful for aggregation and accounting of events.
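
              The per-counter use of __sync_fetch_and_add() mentioned in the
              last item can be sketched in ordinary C, since the same
              built-in is available in user space. The structure and
              function names are hypothetical. In an eBPF program, the stats
              pointer would be the value pointer returned by
              bpf_map_lookup_elem().

```c
#include <stdint.h>

/* Hypothetical map value holding two independent counters. */
struct pkt_stats {
    uint64_t packets;
    uint64_t bytes;
};

/* Atomically bump each counter. Each addition is atomic on its own;
   the pair of counters is not updated as a single unit. */
void
account_packet(struct pkt_stats *stats, uint64_t len)
{
    __sync_fetch_and_add(&stats->packets, 1);
    __sync_fetch_and_add(&stats->bytes, len);
}
```

              A reader may therefore briefly observe the packet count of a
              new event together with the byte count from before it, which is
              usually acceptable for accounting purposes.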

              Among the uses for array maps are the following:

              *  As "global" eBPF variables: an array of 1 element whose key
                 is (index) 0 and where the value is a collection of 'global'
                 variables which eBPF programs can use to keep state between
                 events.

              *  Aggregation of tracing events into a fixed set of buckets.

              *  Accounting of networking events, for example, number of
                 packets and packet sizes.

       BPF_MAP_TYPE_PROG_ARRAY (since Linux 4.2)
              A program array map is a special kind of array map whose map
              values contain only file descriptors referring to other eBPF
              programs. Thus, both the key_size and value_size must be
              exactly four bytes. This map is used in conjunction with the
              bpf_tail_call() helper.

              This means that an eBPF program with a program array map
              attached to it can call from kernel side into

                  void bpf_tail_call(void *context, void *prog_map,
                                     unsigned int index);

              and therefore replace its own program flow with the one from the
              program at the given program array slot, if present. This can
              be regarded as a kind of jump table to a different eBPF program.
              The invoked program will then reuse the same stack. When a jump
              into the new program has been performed, it won't return to the
              old program anymore.

              If no eBPF program is found at the given index of the program
              array (because the map slot doesn't contain a valid program file
              descriptor, the specified lookup index/key is out of bounds, or
              the limit of 32 nested calls has been exceeded), execution
              continues with the current eBPF program. This can be used as a
              fall-through for default cases.

              A program array map is useful, for example, in tracing or
              networking, to handle individual system calls or protocols in
              their own subprograms and use their identifiers as an individual
              map index. This approach may result in performance benefits, and
              also makes it possible to overcome the maximum instruction limit
              of a single eBPF program. In dynamic environments, a user-space
              daemon might atomically replace individual subprograms at
              run-time with newer versions to alter overall program behavior,
              for instance, if global policies change.
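
              The user-space side of such a replacement can be sketched as a
              BPF_MAP_UPDATE_ELEM call whose key is the slot index and whose
              value is the file descriptor of an already loaded program.
              prog_array_set() is a hypothetical name, and the sketch assumes
              that bpf() is reached through syscall(2).

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Hypothetical helper: store prog_fd in slot 'index' of the program
   array referred to by prog_array_fd. Returns 0 on success, or -1
   with errno set. */
int
prog_array_set(int prog_array_fd, uint32_t index, int prog_fd)
{
    union bpf_attr attr;
    uint32_t value = prog_fd;       /* value_size is four bytes */

    memset(&attr, 0, sizeof(attr));
    attr.map_fd = prog_array_fd;
    attr.key = (uint64_t) (uintptr_t) &index;
    attr.value = (uint64_t) (uintptr_t) &value;
    attr.flags = BPF_ANY;

    return syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}
```

              A dispatcher program that tail-calls into slot index then picks
              up the new subprogram on its next invocation, without being
              reloaded itself.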

   eBPF programs
       The BPF_PROG_LOAD command is used to load an eBPF program into the
       kernel. The return value for this command is a new file descriptor
       associated with this eBPF program.

           char bpf_log_buf[LOG_BUF_SIZE];

           int
           bpf_prog_load(enum bpf_prog_type type,
                         const struct bpf_insn *insns, int insn_cnt,
                         const char *license)
           {
               union bpf_attr attr = {
                   .prog_type = type,
                   .insns     = ptr_to_u64(insns),
                   .insn_cnt  = insn_cnt,
                   .license   = ptr_to_u64(license),
                   .log_buf   = ptr_to_u64(bpf_log_buf),
                   .log_size  = LOG_BUF_SIZE,
                   .log_level = 1,
               };

               return bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
           }

       prog_type is one of the available program types:

           enum bpf_prog_type {
               BPF_PROG_TYPE_UNSPEC,        /* Reserve 0 as invalid
                                               program type */
               BPF_PROG_TYPE_SOCKET_FILTER,
               BPF_PROG_TYPE_KPROBE,
               BPF_PROG_TYPE_SCHED_CLS,
               BPF_PROG_TYPE_SCHED_ACT,
           };

       For further details of eBPF program types, see below.

       The remaining fields of bpf_attr are set as follows:

       *  insns is an array of struct bpf_insn instructions.

       *  insn_cnt is the number of instructions in the program referred to
          by insns.

       *  license is a license string, which must be GPL compatible to call
          helper functions marked gpl_only. (The licensing rules are the same
          as for kernel modules, so that dual licenses, such as "Dual
          BSD/GPL", may also be used.)

       *  log_buf is a pointer to a caller-allocated buffer in which the
          in-kernel verifier can store the verification log. This log is a
          multi-line string that can be checked by the program author in
          order to understand how the verifier came to the conclusion that
          the eBPF program is unsafe. The format of the output can change at
          any time as the verifier evolves.

       *  log_size is the size of the buffer pointed to by log_buf. If the
          size of the buffer is not large enough to store all verifier
          messages, -1 is returned and errno is set to ENOSPC.

       *  log_level is the verbosity level of the verifier. A value of zero
          means that the verifier will not provide a log; in this case,
          log_buf must be a NULL pointer, and log_size must be zero.
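
       How these fields fit together can be sketched with the smallest
       useful program, "r0 = 0; exit". Everything named here is
       hypothetical: load_with_log(), trivial_prog, and the 64 kB log
       buffer size are choices made for illustration, and bpf() is
       assumed to be reached through syscall(2). The instructions are
       written as raw struct bpf_insn initializers rather than with the
       BPF_* macros used in the kernel samples.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* "r0 = 0; exit": two instructions. */
const struct bpf_insn trivial_prog[] = {
    { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 0 },
    { .code = BPF_JMP | BPF_EXIT },
};
const unsigned int trivial_prog_cnt =
    sizeof(trivial_prog) / sizeof(trivial_prog[0]);

/* Hypothetical helper: load a socket-filter program and, on failure,
   print whatever the verifier wrote to the log buffer. Returns the
   program file descriptor, or -1 with errno set. */
int
load_with_log(const struct bpf_insn *insns, unsigned int insn_cnt)
{
    static char log_buf[65536];     /* size chosen arbitrarily */
    union bpf_attr attr;
    int fd;

    log_buf[0] = '\0';
    memset(&attr, 0, sizeof(attr));
    attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
    attr.insns = (uint64_t) (uintptr_t) insns;
    attr.insn_cnt = insn_cnt;       /* instructions, not bytes */
    attr.license = (uint64_t) (uintptr_t) "GPL";
    attr.log_buf = (uint64_t) (uintptr_t) log_buf;
    attr.log_size = sizeof(log_buf);
    attr.log_level = 1;

    fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
    if (fd == -1 && log_buf[0] != '\0')
        fprintf(stderr, "verifier log:\n%s\n", log_buf);
    return fd;
}
```

       A caller would typically invoke load_with_log(trivial_prog,
       trivial_prog_cnt) and check the result against -1. Without sufficient
       privilege the call fails with EPERM before the verifier runs, so the
       log stays empty in that case.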

       Applying close(2) to the file descriptor returned by BPF_PROG_LOAD will
       unload the eBPF program (but see NOTES).

       Maps are accessible from eBPF programs and are used to exchange data
       between eBPF programs and between eBPF programs and user-space
       programs. For example, eBPF programs can process various events (like
       kprobe, packets) and store their data into a map, and user-space
       programs can then fetch data from the map. Conversely, user-space
       programs can use a map as a configuration mechanism, populating the
       map with values checked by the eBPF program, which then modifies its
       behavior on the fly according to those values.

   eBPF program types
       The eBPF program type (prog_type) determines the subset of kernel
       helper functions that the program may call. The program type also
       determines the program input (context), that is, the format of struct
       bpf_context (which is the data blob passed into the eBPF program as
       the first argument).

       For example, a tracing program does not have the exact same subset of
       helper functions as a socket filter program (though they may have some
       helpers in common). Similarly, the input (context) for a tracing
       program is a set of register values, while for a socket filter it is a
       network packet.

       The set of functions available to eBPF programs of a given type may
       increase in the future.

       The following program types are supported:

       BPF_PROG_TYPE_SOCKET_FILTER (since Linux 3.19)
              Currently, the set of functions for BPF_PROG_TYPE_SOCKET_FILTER
              is:

                  bpf_map_lookup_elem(map_fd, void *key)
                                      /* look up key in a map_fd */
                  bpf_map_update_elem(map_fd, void *key, void *value)
                                      /* update key/value */
                  bpf_map_delete_elem(map_fd, void *key)
                                      /* delete key in a map_fd */

              The bpf_context argument is a pointer to a struct __sk_buff.

       BPF_PROG_TYPE_KPROBE (since Linux 4.1)
              [To be documented]

       BPF_PROG_TYPE_SCHED_CLS (since Linux 4.1)
              [To be documented]

       BPF_PROG_TYPE_SCHED_ACT (since Linux 4.1)
              [To be documented]

   Events
       Once a program is loaded, it can be attached to an event. Various
       kernel subsystems have different ways to do so.

       Since Linux 3.19, the following call will attach the program prog_fd to
       the socket sockfd, which was created by an earlier call to socket(2):

           setsockopt(sockfd, SOL_SOCKET, SO_ATTACH_BPF,
                      &prog_fd, sizeof(prog_fd));

       Since Linux 4.1, the following call may be used to attach the eBPF
       program referred to by the file descriptor prog_fd to a perf event
       file descriptor, event_fd, that was created by a previous call to
       perf_event_open(2):

           ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);

EXAMPLE
       /* bpf+sockets example:
        * 1. create array map of 256 elements
        * 2. load program that counts number of packets received
        *    r0 = skb->data[ETH_HLEN + offsetof(struct iphdr, protocol)]
        *    map[r0]++
        * 3. attach prog_fd to raw socket via setsockopt()
        * 4. print number of received TCP/UDP packets every second
        */
       int
       main(int argc, char **argv)
       {
           int sock, map_fd, prog_fd, key;
           long long value = 0, tcp_cnt, udp_cnt;

           map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(key),
                                   sizeof(value), 256);
           if (map_fd < 0) {
               printf("failed to create map '%s'\n", strerror(errno));
               /* likely not run as root */
               return 1;
           }

           struct bpf_insn prog[] = {
               BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),        /* r6 = r1 */
               BPF_LD_ABS(BPF_B, ETH_HLEN + offsetof(struct iphdr, protocol)),
                                       /* r0 = ip->proto */
               BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4),
                                       /* *(u32 *)(fp - 4) = r0 */
               BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),       /* r2 = fp */
               BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),      /* r2 = r2 - 4 */
               BPF_LD_MAP_FD(BPF_REG_1, map_fd),           /* r1 = map_fd */
               BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem),
                                       /* r0 = map_lookup(r1, r2) */
               BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
                                       /* if (r0 == 0) goto pc+2 */
               BPF_MOV64_IMM(BPF_REG_1, 1),                /* r1 = 1 */
               BPF_XADD(BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0),
                                       /* lock *(u64 *) r0 += r1 */
               BPF_MOV64_IMM(BPF_REG_0, 0),                /* r0 = 0 */
               BPF_EXIT_INSN(),                            /* return r0 */
           };

           /* insn_cnt is a number of instructions, not a size in bytes */
           prog_fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, prog,
                                   sizeof(prog) / sizeof(prog[0]), "GPL");

           sock = open_raw_sock("lo");

           assert(setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd,
                             sizeof(prog_fd)) == 0);

           for (;;) {
               key = IPPROTO_TCP;
               assert(bpf_lookup_elem(map_fd, &key, &tcp_cnt) == 0);
               key = IPPROTO_UDP;
               assert(bpf_lookup_elem(map_fd, &key, &udp_cnt) == 0);
               printf("TCP %lld UDP %lld packets\n", tcp_cnt, udp_cnt);
               sleep(1);
           }

           return 0;
       }

       Some complete working code can be found in the samples/bpf directory in
       the kernel source tree.

RETURN VALUE
       For a successful call, the return value depends on the operation:

       BPF_MAP_CREATE
              The new file descriptor associated with the eBPF map.

       BPF_PROG_LOAD
              The new file descriptor associated with the eBPF program.

       All other commands
              Zero.

       On error, -1 is returned, and errno is set appropriately.

ERRORS
       E2BIG  The eBPF program is too large or a map reached the max_entries
              limit (maximum number of elements).

       EACCES For BPF_PROG_LOAD, even though all program instructions are
              valid, the program has been rejected because it was deemed
              unsafe. This may be because it may have accessed a disallowed
              memory region or an uninitialized stack/register or because the
              function constraints don't match the actual types or because
              there was a misaligned memory access. In this case, it is
              recommended to call bpf() again with log_level = 1 and examine
              log_buf for the specific reason provided by the verifier.

       EBADF  fd is not an open file descriptor.

       EFAULT One of the pointers (key or value or log_buf or insns) is
              outside the accessible address space.

       EINVAL The value specified in cmd is not recognized by this kernel.

       EINVAL For BPF_MAP_CREATE, either map_type or attributes are invalid.

       EINVAL For BPF_MAP_*_ELEM commands, some of the fields of union
              bpf_attr that are not used by this command are not set to zero.

       EINVAL For BPF_PROG_LOAD, indicates an attempt to load an invalid
              program. eBPF programs can be deemed invalid due to
              unrecognized instructions, the use of reserved fields, jumps
              out of range, infinite loops or calls of unknown functions.

       ENOENT For BPF_MAP_LOOKUP_ELEM or BPF_MAP_DELETE_ELEM, indicates that
              the element with the given key was not found.

       ENOMEM Cannot allocate sufficient memory.

       EPERM  The call was made without sufficient privilege (without the
              CAP_SYS_ADMIN capability).

VERSIONS
       The bpf() system call first appeared in Linux 3.18.

CONFORMING TO
       The bpf() system call is Linux-specific.

NOTES
       In the current implementation, all bpf() commands require the caller to
       have the CAP_SYS_ADMIN capability.

       eBPF objects (maps and programs) can be shared between processes. For
       example, after fork(2), the child inherits file descriptors referring
       to the same eBPF objects. In addition, file descriptors referring to
       eBPF objects can be transferred over UNIX domain sockets. File
       descriptors referring to eBPF objects can be duplicated in the usual
       way, using dup(2) and similar calls. An eBPF object is deallocated
       only after all file descriptors referring to the object have been
       closed.
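
       Transferring a file descriptor over a UNIX domain socket uses the
       generic SCM_RIGHTS ancillary message described in unix(7); nothing
       about it is specific to eBPF. A minimal sketch with two hypothetical
       helpers, send_fd() and recv_fd(), that would work equally for a map
       or program file descriptor:

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control buffer sized and aligned for one file descriptor. */
union fd_cmsg {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send 'fd' over the UNIX domain socket 'sock'; 0 on success. */
int
send_fd(int sock, int fd)
{
    char dummy = 'x';               /* at least one data byte is sent */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return (sendmsg(sock, &msg, 0) == 1) ? 0 : -1;
}

/* Receive a file descriptor from 'sock'; returns it, or -1. */
int
recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

       The receiver obtains a new descriptor number that refers to the same
       underlying object, so the object stays alive until both sides have
       closed their descriptors.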

       eBPF programs can be written in a restricted C that is compiled (using
       the clang compiler) into eBPF bytecode. Various features are omitted
       from this restricted C, such as loops, global variables, variadic
       functions, floating-point numbers, and passing structures as function
       arguments. Some examples can be found in the samples/bpf/*_kern.c
       files in the kernel source tree.

       The kernel contains a just-in-time (JIT) compiler that translates eBPF
       bytecode into native machine code for better performance. In kernels
       before Linux 4.15, the JIT compiler is disabled by default, but its
       operation can be controlled by writing one of the following integer
       strings to the file /proc/sys/net/core/bpf_jit_enable:

       0  Disable JIT compilation (default).

       1  Normal compilation.

       2  Debugging mode. The generated opcodes are dumped in hexadecimal
          into the kernel log. These opcodes can then be disassembled using
          the program tools/net/bpf_jit_disasm.c provided in the kernel
          source tree.

       Since Linux 4.15, the kernel may be configured with the
       CONFIG_BPF_JIT_ALWAYS_ON option. In this case, the JIT compiler is
       always enabled, and bpf_jit_enable is initialized to 1 and is
       immutable. (This kernel configuration option was provided as a
       mitigation for one of the Spectre attacks against the BPF
       interpreter.)
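
       A program can find out which mode is in effect by reading the same
       file. A minimal sketch, in which read_bpf_jit_enable() is a
       hypothetical helper name:

```c
#include <stdio.h>

/* Hypothetical helper: return the current bpf_jit_enable value
   (0, 1, or 2), or -1 if the file cannot be read or parsed. */
int
read_bpf_jit_enable(void)
{
    FILE *fp = fopen("/proc/sys/net/core/bpf_jit_enable", "r");
    int value = -1;

    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%d", &value) != 1)
        value = -1;
    fclose(fp);
    return value;
}
```

       A return value of -1 usually means that the file does not exist or is
       not readable, for instance in a restricted container.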

       The JIT compiler for eBPF is currently available for the following
       architectures:

       *  x86-64 (since Linux 3.18);
       *  ARM-64 (since Linux 3.18);
       *  s390 (since Linux 4.1);
       *  PowerPC 64 (since Linux 4.8);
       *  SPARC 64 (since Linux 4.12);
       *  MIPS (since Linux 4.13);
       *  ARM32 (since Linux 4.14).

SEE ALSO
       seccomp(2), socket(7), tc(8), tc-bpf(8)

       Both classic and extended BPF are explained in the kernel source file
       Documentation/networking/filter.txt.

COLOPHON
       This page is part of release 4.16 of the Linux man-pages project. A
       description of the project, information about reporting bugs, and the
       latest version of this page, can be found at
       https://www.kernel.org/doc/man-pages/.

Linux                             2018-02-02                            BPF(2)