BPF classifier and actions in tc(8)    Linux    BPF classifier and actions in tc(8)

NAME
BPF - BPF programmable classifier and actions for ingress/egress
queueing disciplines

SYNOPSIS
eBPF classifier (filter) or action:
tc filter ... bpf [ object-file OBJ_FILE ] [ section CLS_NAME ]
        [ export UDS_FILE ] [ verbose ] [ skip_hw | skip_sw ]
        [ police POLICE_SPEC ] [ action ACTION_SPEC ] [ classid CLASSID ]
tc action ... bpf [ object-file OBJ_FILE ] [ section CLS_NAME ]
        [ export UDS_FILE ] [ verbose ]

cBPF classifier (filter) or action:
tc filter ... bpf [ bytecode-file BPF_FILE | bytecode BPF_BYTECODE ]
        [ police POLICE_SPEC ] [ action ACTION_SPEC ] [ classid CLASSID ]
tc action ... bpf [ bytecode-file BPF_FILE | bytecode BPF_BYTECODE ]

DESCRIPTION
Extended Berkeley Packet Filter (eBPF) and classic Berkeley Packet
Filter (originally known as BPF, referred to as cBPF here for better
distinction) are both available as a fully programmable and highly
efficient classifier and actions. Both offer a minimal instruction set
for implementing small programs that can safely be loaded into the
kernel and executed in a tiny virtual machine in kernel space. An
in-kernel verifier guarantees that a specified program always
terminates and neither crashes nor leaks data from the kernel.

In Linux, eBPF is generally considered the successor of cBPF. The
kernel internally transforms cBPF expressions into eBPF expressions and
executes the latter. They can either be executed in an interpreter or,
at setup time, be just-in-time compiled (JIT'ed) to run as native
machine code. Currently, the x86_64, ARM64 and s390 architectures have
eBPF JIT support, whereas PPC, SPARC, ARM and MIPS have cBPF JIT
support but have not (yet) switched to eBPF JIT support.

eBPF's instruction set follows similar underlying principles as the
cBPF instruction set; however, it is modelled closer to the underlying
architecture to better mimic native instruction sets, with the aim of
achieving better run-time performance. It is designed to be JIT'ed with
a one-to-one mapping, which also opens up the possibility for compilers
to generate optimized eBPF code through an eBPF backend that performs
almost as fast as natively compiled code. Given that LLVM provides such
an eBPF backend, eBPF programs can easily be written in a subset of the
C language. In addition, the eBPF infrastructure comes with a construct
called "maps". eBPF maps are key/value stores that are shared between
multiple eBPF programs, but also between eBPF programs and user space
applications.

For the traffic control subsystem, classifier and actions that can be
attached to ingress and egress qdiscs can be written in eBPF or cBPF.
The advantage over other classifiers and actions is that eBPF/cBPF
provides the generic framework, while users can implement their highly
specialized use cases efficiently. This means that a classifier or
action written that way does not suffer from feature bloat and can
therefore execute its task highly efficiently. It allows for non-linear
classification and even merging the action part into the
classification. Combined with efficient eBPF map data structures, user
space can push new policies like classids into the kernel without
reloading a classifier, or it can gather statistics that are pushed
into one map and use another one for dynamically load balancing traffic
based on the determined load, just to provide a few examples.

PARAMETERS
object-file
points to an object file in the executable and linkable format (ELF)
that contains eBPF opcodes and eBPF map definitions. The LLVM compiler
infrastructure with clang(1) as a C language front end is one project
that supports emitting eBPF object files that can be passed to the eBPF
classifier (more details in the EXAMPLES section). This option is
mandatory when an eBPF classifier or action is to be loaded.

section
is the name of the ELF section in the object file where the eBPF
classifier or action resides. By default, the section name for the
classifier is "classifier", and for the action "action". Given that a
single object file can contain multiple classifiers and actions, the
corresponding section name needs to be specified if it differs from the
defaults.

export
points to a Unix domain socket file. In case the eBPF object file also
contains a section named "maps" with eBPF map specifications, the map
file descriptors can be handed off via the Unix domain socket to an
eBPF "agent" that keeps all descriptors beyond the lifetime of tc. This
can be some third party application implementing the IPC counterpart
for the import, which uses them for calling into the bpf(2) system call
to read out or update eBPF map data from user space, for example, for
monitoring purposes or to push down new policies.

verbose
if set, dumps the eBPF verifier output even if loading the eBPF program
was successful. By default, the verifier log is only emitted to the
user on error.

skip_hw | skip_sw
hardware offload control flags. By default, tc will try to offload
filters to hardware if possible. skip_hw explicitly disables the
attempt to offload. skip_sw forces the offload and disables running the
eBPF program in the kernel. If hardware offload is not possible and
skip_sw was set, the kernel will report an error and the filter will
not be installed at all.

police
is an optional parameter for an eBPF/cBPF classifier that specifies a
policer in tc(8) which is attached to the classifier, for example, on
an ingress qdisc.

action
is an optional parameter for an eBPF/cBPF classifier that specifies a
subsequent action in tc(8) which is attached to a classifier.

classid
flowid
provides the default traffic control class identifier for this
eBPF/cBPF classifier. The default class identifier can also be
overwritten by the return code of the eBPF/cBPF program. A return code
of -1 selects the default class identifier provided here. A return code
of 0 implies that no match took place, and any other return code
overrides the default classid. This allows for efficient, non-linear
classification with only a single eBPF/cBPF program, as opposed to
having multiple individual programs for various class identifiers which
would each need to reparse packet contents.


bytecode
is used for loading cBPF classifier and actions only. The cBPF bytecode
is passed directly as a text string in the form of
's,c t f k,c t f k,c t f k,...', where s denotes the number of
subsequent 4-tuples. Each 4-tuple consists of the decimals c t f k,
where c represents the cBPF opcode, t the jump true offset target, f
the jump false offset target and k the immediate constant/literal.
There are various tools that generate code in this loadable format, for
example, bpf_asm that ships with the Linux kernel source tree under
tools/net/, so it is not expected that this is written by hand. The
bytecode or bytecode-file option is mandatory when a cBPF classifier or
action is to be loaded.


bytecode-file
is also used to load a cBPF classifier or action. It is effectively the
same as bytecode, except that the cBPF bytecode is not passed directly
via the command line, but rather resides in a text file.

EXAMPLES
eBPF TOOLING
A full blown example including eBPF agent code can be found inside the
iproute2 source package under: examples/bpf/

As prerequisites, the kernel needs to have the eBPF system call, namely
bpf(2), enabled and needs to ship with the cls_bpf and act_bpf kernel
modules for the traffic control subsystem. To enable cBPF/eBPF JIT
support, depending on which of the two the given architecture supports:

    echo 1 > /proc/sys/net/core/bpf_jit_enable

A given restricted C file can be compiled via LLVM as:

    clang -O2 -emit-llvm -c bpf.c -o - | \
        llc -march=bpf -filetype=obj -o bpf.o

The compiler invocation might still be simplified in the future, so for
now, it's quite handy to wrap this construct in one way or another, for
example:

    __bcc() {
        clang -O2 -emit-llvm -c "$1" -o - | \
            llc -march=bpf -filetype=obj -o "$(basename "$1" .c).o"
    }

    alias bcc=__bcc

A minimal, stand-alone unit, which matches on all traffic with the
default classid (return code of -1), looks like:

    #include <linux/bpf.h>

    #ifndef __section
    # define __section(x)  __attribute__((section(x), used))
    #endif

    __section("classifier") int cls_main(struct __sk_buff *skb)
    {
        return -1;
    }

    char __license[] __section("license") = "GPL";

More examples can be found further below in subsection eBPF
PROGRAMMING, as the focus here is on tooling.

There can be various other sections, for example, also for actions.
Thus, an eBPF object file can contain multiple entry points. However, a
specific entry point must always be specified when configuring with tc.
A license must be part of the restricted C code, and the license string
syntax is the same as with Linux kernel modules. The kernel reserves
the right to restrict some eBPF helper functions to GPL compatible
licenses only, and thus may reject a program from loading into the
kernel when such a license mismatch occurs.

The resulting object file from the compilation can be inspected with
the usual set of tools that also operate on normal object files, for
example objdump(1) for inspecting ELF section headers:

    objdump -h bpf.o
    [...]
      3 classifier    000007f8  0000000000000000  0000000000000000  00000040  2**3
                      CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
      4 action-mark   00000088  0000000000000000  0000000000000000  00000838  2**3
                      CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
      5 action-rand   00000098  0000000000000000  0000000000000000  000008c0  2**3
                      CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
      6 maps          00000030  0000000000000000  0000000000000000  00000958  2**2
                      CONTENTS, ALLOC, LOAD, DATA
      7 license       00000004  0000000000000000  0000000000000000  00000988  2**0
                      CONTENTS, ALLOC, LOAD, DATA
    [...]

Adding an eBPF classifier from an object file that contains a
classifier in the default ELF section is trivial (note that instead of
"object-file", shortcuts such as "obj" can also be used):

    bcc bpf.c
    tc filter add dev em1 parent 1: bpf obj bpf.o flowid 1:1

In case the classifier resides in ELF section "mycls", that same
command needs to be invoked as:

    tc filter add dev em1 parent 1: bpf obj bpf.o sec mycls flowid 1:1

Dumping the classifier configuration will tell the location of the
classifier, in other words that it's from object file "bpf.o" under
section "mycls":

    tc filter show dev em1
    filter parent 1: protocol all pref 49152 bpf
    filter parent 1: protocol all pref 49152 bpf handle 0x1 flowid 1:1 bpf.o:[mycls]

The same program can also be installed on the ingress qdisc side as
opposed to egress ...

    tc qdisc add dev em1 handle ffff: ingress
    tc filter add dev em1 parent ffff: bpf obj bpf.o sec mycls \
        flowid ffff:1

... and again dumped from there:

    tc filter show dev em1 parent ffff:
    filter protocol all pref 49152 bpf
    filter protocol all pref 49152 bpf handle 0x1 flowid ffff:1 bpf.o:[mycls]

Attaching a classifier and action on ingress has the restriction that
ingress doesn't have an actual underlying queueing discipline. What
ingress can do is classify, mangle, redirect or drop packets. When
queueing is required on the ingress side, ingress must redirect packets
to the ifb device; otherwise, policing can be used. Moreover, ingress
can be used as an early drop point for unwanted packets before they hit
upper layers of the networking stack, to perform network accounting
with eBPF maps that could be shared with egress, or as an early mangle
and/or redirection point to different networking devices.

Multiple eBPF actions and classifiers can be placed into a single
object file within various sections. In that case, non-default section
names must be provided, which is the case for both actions in this
example:

    tc filter add dev em1 parent 1: bpf obj bpf.o flowid 1:1 \
        action bpf obj bpf.o sec action-mark \
        action bpf obj bpf.o sec action-rand ok

The advantage of this is that the classifier and the two actions can
then share eBPF maps with each other, if implemented in the programs.

In order to access eBPF maps from user space beyond the tc(8) setup
lifetime, the ownership can be transferred to an eBPF agent via Unix
domain sockets. There are two possibilities for implementing this:

1) Implement your own eBPF agent that takes care of setting up the Unix
domain socket and implementing the protocol that tc(8) dictates. A code
example of this can be found inside the iproute2 source package under:
examples/bpf/

2) Use tc exec for transferring the eBPF map file descriptors through a
Unix domain socket, and spawning an application such as sh(1). The
advantage of this approach is that tc will place the file descriptors
into the environment and thus make them available just like the stdin,
stdout and stderr file descriptors. This means that user applications
run from within this fd-owner shell can terminate and restart without
losing the eBPF map file descriptors. Example invocation with the
previous classifier and action mixture:

    tc exec bpf imp /tmp/bpf
    tc filter add dev em1 parent 1: bpf obj bpf.o exp /tmp/bpf flowid 1:1 \
        action bpf obj bpf.o sec action-mark \
        action bpf obj bpf.o sec action-rand ok

Assuming that eBPF maps are shared between classifier and actions, it's
enough to export them once, for example, from within the classifier or
action command. tc will set up all eBPF map file descriptors at the
time when the object file is first parsed.

When a shell has been spawned, the environment will have a couple of
eBPF related variables. BPF_NUM_MAPS provides the total number of maps
that have been transferred over the Unix domain socket. The value of
BPF_MAP<X> is the file descriptor number that can be accessed in eBPF
agent applications; in other words, it can directly be used as the file
descriptor value for the bpf(2) system call to retrieve or alter eBPF
map values. <X> denotes the identifier of the eBPF map. It corresponds
to the id member of struct bpf_elf_map from the tc eBPF map
specification.

The environment in this example looks as follows:

    sh# env | grep BPF
        BPF_NUM_MAPS=3
        BPF_MAP1=6
        BPF_MAP0=5
        BPF_MAP2=7
    sh# ls -la /proc/self/fd
        [...]
        lrwx------. 1 root root 64 Apr 14 16:46 5 -> anon_inode:bpf-map
        lrwx------. 1 root root 64 Apr 14 16:46 6 -> anon_inode:bpf-map
        lrwx------. 1 root root 64 Apr 14 16:46 7 -> anon_inode:bpf-map
    sh# my_bpf_agent

eBPF agents are very useful in that they can prepopulate eBPF maps from
user space, monitor statistics via maps and, based on that feedback,
for example, rewrite classids in eBPF map values during runtime. Given
that eBPF agents are implemented as normal applications, they can also
dynamically receive traffic control policies from external controllers
and push them down into eBPF maps to dynamically adapt to network
conditions. Moreover, eBPF maps can also be shared with other eBPF
program types (e.g. tracing), so very powerful combinations can be
implemented.

eBPF PROGRAMMING
eBPF classifier and actions are implemented in restricted C syntax (in
the future, additional language frontends could be supported).

The header file linux/bpf.h provides eBPF helper functions that can be
called from an eBPF program. This man page only provides two minimal,
stand-alone examples. Have a look at examples/bpf from the iproute2
source package for a fully fledged flow dissector example that better
demonstrates some of the possibilities with eBPF.

Supported 32 bit classifier return codes from the C program and their
meanings:
    0     denotes a mismatch
    -1    denotes the default classid configured from the command line
    else  everything else will override the default classid to provide
          a facility for non-linear matching

Supported 32 bit action return codes from the C program and their
meanings (linux/pkt_cls.h):
    TC_ACT_OK (0)          will terminate the packet processing
                           pipeline and allow the packet to proceed
    TC_ACT_SHOT (2)        will terminate the packet processing
                           pipeline and drop the packet
    TC_ACT_UNSPEC (-1)     will use the default action configured from
                           tc (similar to returning -1 from a
                           classifier)
    TC_ACT_PIPE (3)        will iterate to the next action, if available
    TC_ACT_RECLASSIFY (1)  will terminate the packet processing
                           pipeline and start classification from the
                           beginning
    else                   everything else is an unspecified return code

Both classifier and action return codes are supported in eBPF and cBPF
programs.

To demonstrate restricted C syntax, a minimal toy classifier example is
provided, which assumes that egress packets, for instance originating
from a container, have previously been marked in the interval [0, 255].
The program keeps statistics on different marks for user space and maps
the classid to the root qdisc with the marking itself as the minor
handle:

    #include <stdint.h>
    #include <asm/types.h>

    #include <linux/bpf.h>
    #include <linux/pkt_sched.h>

    #include "helpers.h"

    struct tuple {
        long packets;
        long bytes;
    };

    #define BPF_MAP_ID_STATS        1 /* agent's map identifier */
    #define BPF_MAX_MARK            256

    struct bpf_elf_map __section("maps") map_stats = {
        .type           = BPF_MAP_TYPE_ARRAY,
        .id             = BPF_MAP_ID_STATS,
        .size_key       = sizeof(uint32_t),
        .size_value     = sizeof(struct tuple),
        .max_elem       = BPF_MAX_MARK,
    };

    static inline void cls_update_stats(const struct __sk_buff *skb,
                                        uint32_t mark)
    {
        struct tuple *tu;

        tu = bpf_map_lookup_elem(&map_stats, &mark);
        if (likely(tu)) {
            __sync_fetch_and_add(&tu->packets, 1);
            __sync_fetch_and_add(&tu->bytes, skb->len);
        }
    }

    __section("cls") int cls_main(struct __sk_buff *skb)
    {
        uint32_t mark = skb->mark;

        if (unlikely(mark >= BPF_MAX_MARK))
            return 0;

        cls_update_stats(skb, mark);

        return TC_H_MAKE(TC_H_ROOT, mark);
    }

    char __license[] __section("license") = "GPL";

Another small example is a port redirector which demuxes destination
port 80 into the interval [8080, 8087], steered by RSS, and which can
then be attached to the ingress qdisc. The exercise of adding the
egress counterpart and IPv6 support is left to the reader:

    #include <asm/types.h>
    #include <asm/byteorder.h>

    #include <linux/bpf.h>
    #include <linux/filter.h>
    #include <linux/in.h>
    #include <linux/if_ether.h>
    #include <linux/ip.h>
    #include <linux/tcp.h>

    #include "helpers.h"

    static inline void set_tcp_dport(struct __sk_buff *skb, int nh_off,
                                     __u16 old_port, __u16 new_port)
    {
        bpf_l4_csum_replace(skb, nh_off + offsetof(struct tcphdr, check),
                            old_port, new_port, sizeof(new_port));
        bpf_skb_store_bytes(skb, nh_off + offsetof(struct tcphdr, dest),
                            &new_port, sizeof(new_port), 0);
    }

    static inline int lb_do_ipv4(struct __sk_buff *skb, int nh_off)
    {
        __u16 dport, dport_new = 8080, off;
        __u8 ip_proto, ip_vl;

        ip_proto = load_byte(skb, nh_off +
                             offsetof(struct iphdr, protocol));
        if (ip_proto != IPPROTO_TCP)
            return 0;

        ip_vl = load_byte(skb, nh_off);
        if (likely(ip_vl == 0x45))
            nh_off += sizeof(struct iphdr);
        else
            nh_off += (ip_vl & 0xF) << 2;

        dport = load_half(skb, nh_off + offsetof(struct tcphdr, dest));
        if (dport != 80)
            return 0;

        off = skb->queue_mapping & 7;
        set_tcp_dport(skb, nh_off - BPF_LL_OFF, __constant_htons(80),
                      __cpu_to_be16(dport_new + off));
        return -1;
    }

    __section("lb") int lb_main(struct __sk_buff *skb)
    {
        int ret = 0, nh_off = BPF_LL_OFF + ETH_HLEN;

        if (likely(skb->protocol == __constant_htons(ETH_P_IP)))
            ret = lb_do_ipv4(skb, nh_off);

        return ret;
    }

    char __license[] __section("license") = "GPL";

The helper header file helpers.h used in both examples is:

    /* Misc helper macros. */
    #define __section(x) __attribute__((section(x), used))
    #define offsetof(x, y) __builtin_offsetof(x, y)
    #define likely(x) __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

    /* Used map structure */
    struct bpf_elf_map {
        __u32 type;
        __u32 size_key;
        __u32 size_value;
        __u32 max_elem;
        __u32 id;
    };

    /* Some used BPF function calls. */
    static int (*bpf_skb_store_bytes)(void *ctx, int off, void *from,
                                      int len, int flags) =
        (void *) BPF_FUNC_skb_store_bytes;
    static int (*bpf_l4_csum_replace)(void *ctx, int off, int from,
                                      int to, int flags) =
        (void *) BPF_FUNC_l4_csum_replace;
    static void *(*bpf_map_lookup_elem)(void *map, void *key) =
        (void *) BPF_FUNC_map_lookup_elem;

    /* Some used BPF intrinsics. */
    unsigned long long load_byte(void *skb, unsigned long long off)
        asm ("llvm.bpf.load.byte");
    unsigned long long load_half(void *skb, unsigned long long off)
        asm ("llvm.bpf.load.half");

As a best practice, we recommend having only a single eBPF classifier
loaded in tc and performing all necessary matching and mangling from
there, instead of a list of individual classifiers and separate
actions. A single classifier tailored to a given use case will be the
most efficient to run.

eBPF DEBUGGING
Both tc filter and action commands for bpf support an optional verbose
parameter that can be used to inspect the eBPF verifier log. It is
dumped by default in case of an error.

In case the eBPF/cBPF JIT compiler has been enabled, it can also be
instructed to emit a debug output of the resulting opcode image into
the kernel log, which can be read via dmesg(1):

    echo 2 > /proc/sys/net/core/bpf_jit_enable

The Linux kernel source tree additionally ships under tools/net/ a
small helper called bpf_jit_disasm that reads out the opcode image dump
from the kernel log and dumps the resulting disassembly:

    bpf_jit_disasm -o

Other than that, the Linux kernel also contains an extensive eBPF/cBPF
test suite module called test_bpf. Upon ...

    modprobe test_bpf

... it runs a variety of test cases and dumps the results into the
kernel log, which can be inspected with dmesg(1). The results can
differ depending on whether the JIT compiler is enabled or not. In case
of failed test cases, the module will fail to load. In such cases, we
urge you to file a bug report to the related JIT authors and to the
Linux kernel and networking mailing lists.

cBPF
Although we generally recommend switching to implementing eBPF
classifier and actions, for the sake of completeness, a few words on
how to program in cBPF follow here.

As with eBPF, the bpf_jit_enable switch can be enabled as mentioned
already. Tooling such as bpf_jit_disasm is also independent of whether
eBPF or cBPF code is being loaded.

Unlike in eBPF, classifier and action are not implemented in restricted
C, but rather in a minimal assembler-like language or with the help of
other tooling.

The raw interface with tc takes opcodes directly. For example, the most
minimal classifier matching on every packet resulting in the default
classid of 1:1 looks like:

    tc filter add dev em1 parent 1: bpf bytecode '1,6 0 0 4294967295,' \
        flowid 1:1

The first decimal of the bytecode sequence denotes the number of
subsequent 4-tuples of cBPF opcodes. As mentioned, such a 4-tuple
consists of the decimals c t f k, where c represents the cBPF opcode, t
the jump true offset target, f the jump false offset target and k the
immediate constant/literal. Here, this denotes an unconditional return
from the program with an immediate value of -1.

For egress classification, Willem de Bruijn implemented a minimal
stand-alone helper tool under the GNU General Public License version 2
for the iptables(8) BPF extension, which abuses the libpcap internal
classic BPF compiler. His code, adapted here for usage with tc(8):

    #include <pcap.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        struct bpf_program prog;
        struct bpf_insn *ins;
        int i, ret, dlt = DLT_RAW;

        /* usage: <prog> [link-type-name] 'filter expression' */
        if (argc < 2 || argc > 3)
            return 1;
        if (argc == 3) {
            dlt = pcap_datalink_name_to_val(argv[1]);
            if (dlt == -1)
                return 1;
        }

        /* compile the filter expression into cBPF instructions */
        ret = pcap_compile_nopcap(-1, dlt, &prog, argv[argc - 1],
                                  1, PCAP_NETMASK_UNKNOWN);
        if (ret)
            return 1;

        /* emit the number of instructions, then each 4-tuple */
        printf("%d,", prog.bf_len);
        ins = prog.bf_insns;

        for (i = 0; i < prog.bf_len - 1; ++ins, ++i)
            printf("%u %u %u %u,", ins->code,
                   ins->jt, ins->jf, ins->k);
        printf("%u %u %u %u",
               ins->code, ins->jt, ins->jf, ins->k);

        pcap_freecode(&prog);
        return 0;
    }

Given this small helper, any tcpdump(8) filter expression can be abused
as a classifier where a match will result in the default classid:

    bpftool EN10MB 'tcp[tcpflags] & tcp-syn != 0' > /var/bpf/tcp-syn
    tc filter add dev em1 parent 1: bpf bytecode-file /var/bpf/tcp-syn \
        flowid 1:1

Basically, such a minimal generator is equivalent to:

    tcpdump -iem1 -ddd 'tcp[tcpflags] & tcp-syn != 0' | \
        tr '\n' ',' > /var/bpf/tcp-syn

Since libpcap does not support all Linux specific cBPF extensions in
its compiler, the Linux kernel also ships under tools/net/ a minimal
BPF assembler called bpf_asm for providing full control. For detailed
syntax and semantics on implementing such programs by hand, see the
references under FURTHER READING.

Trivial toy example in bpf_asm for classifying IPv4/TCP packets, saved
in a text file called foobar:

    ldh [12]
    jne #0x800, drop
    ldb [23]
    jneq #6, drop
    ret #-1
    drop: ret #0

Similarly, such a classifier can be loaded as:

    bpf_asm foobar > /var/bpf/tcp-syn
    tc filter add dev em1 parent 1: bpf bytecode-file /var/bpf/tcp-syn \
        flowid 1:1

For BPF classifiers, the Linux kernel additionally provides under
tools/net/ a small BPF debugger called bpf_dbg, which can be used to
test a classifier against pcap files, single-step or add various
breakpoints into the classifier program and dump register contents
during runtime.

Implementing an action in classic BPF is rather limited in the sense
that packet mangling is not supported. Therefore, it's generally
recommended to make the switch to eBPF whenever possible.

FURTHER READING
Further and more technical details about the BPF architecture can be
found in the Linux kernel source tree under
Documentation/networking/filter.txt.

Further details on eBPF tc(8) examples can be found in the iproute2
source tree under examples/bpf/.

SEE ALSO
tc(8), tc-ematch(8), bpf(2), bpf(4)

AUTHORS
Manpage written by Daniel Borkmann.

Please report corrections or improvements to the Linux kernel
networking mailing list: <netdev@vger.kernel.org>

iproute2    18 May 2015    BPF classifier and actions in tc(8)