BPF classifier and actions in tc(8)    Linux    BPF classifier and actions in tc(8)

NAME

       BPF - BPF programmable classifier and actions for ingress/egress
       queueing disciplines

SYNOPSIS

   eBPF classifier (filter) or action:
       tc filter ... bpf [ object-file OBJ_FILE ] [ section CLS_NAME ]
               [ export UDS_FILE ] [ verbose ] [ skip_hw | skip_sw ]
               [ police POLICE_SPEC ] [ action ACTION_SPEC ]
               [ classid CLASSID ]
       tc action ... bpf [ object-file OBJ_FILE ] [ section CLS_NAME ]
               [ export UDS_FILE ] [ verbose ]

   cBPF classifier (filter) or action:
       tc filter ... bpf [ bytecode-file BPF_FILE | bytecode BPF_BYTECODE ]
               [ police POLICE_SPEC ] [ action ACTION_SPEC ]
               [ classid CLASSID ]
       tc action ... bpf [ bytecode-file BPF_FILE | bytecode BPF_BYTECODE ]

DESCRIPTION

       Extended Berkeley Packet Filter (eBPF) and classic Berkeley Packet
       Filter (originally known as BPF, referred to as cBPF here for better
       distinction) are both available as fully programmable and highly
       efficient classifiers and actions. Both offer a minimal instruction
       set for implementing small programs that can safely be loaded into the
       kernel and are executed there in a tiny virtual machine. An in-kernel
       verifier guarantees that a given program always terminates and neither
       crashes nor leaks data from the kernel.

       In Linux, eBPF is generally considered the successor of cBPF. The
       kernel internally transforms cBPF expressions into eBPF expressions
       and executes the latter. Programs are either run by an interpreter or,
       at setup time, just-in-time compiled (JIT'ed) to run as native machine
       code. Currently, the x86_64, ARM64 and s390 architectures have eBPF
       JIT support, whereas PPC, SPARC, ARM and MIPS have a cBPF JIT but have
       not (yet) switched to eBPF JIT support.

       eBPF's instruction set follows similar underlying principles to the
       cBPF instruction set, but it is modelled more closely on the
       underlying architecture to better mimic native instruction sets, with
       the aim of achieving better run-time performance. It is designed to be
       JIT'ed with a one-to-one mapping, which also makes it possible for
       compilers to generate optimized eBPF code through an eBPF backend that
       performs almost as fast as natively compiled code. Given that LLVM
       provides such an eBPF backend, eBPF programs can easily be written in
       a subset of the C language. In addition, the eBPF infrastructure comes
       with a construct called "maps". eBPF maps are key/value stores that
       are shared between multiple eBPF programs, and also between eBPF
       programs and user space applications.

       For the traffic control subsystem, classifiers and actions that can be
       attached to ingress and egress qdiscs can be written in eBPF or cBPF.
       The advantage over other classifiers and actions is that eBPF/cBPF
       provides the generic framework, while users implement their highly
       specialized use cases on top of it. A classifier or action written
       that way does not suffer from feature bloat and can therefore execute
       its task efficiently. It allows for non-linear classification and even
       for merging the action part into the classification. Combined with
       efficient eBPF map data structures, user space can push new policies
       such as classids into the kernel without reloading a classifier, or it
       can gather statistics that are pushed into one map and use another map
       for dynamically load balancing traffic based on the determined load,
       to give just a few examples.

PARAMETERS

   object-file
       points to an object file in executable and linkable format (ELF) that
       contains eBPF opcodes and eBPF map definitions. The LLVM compiler
       infrastructure with clang(1) as a C language front end is one project
       that supports emitting eBPF object files that can be passed to the
       eBPF classifier (more details in the EXAMPLES section). This option is
       mandatory when an eBPF classifier or action is to be loaded.

   section
       is the name of the ELF section in the object file where the eBPF
       classifier or action resides. By default, the section name is
       "classifier" for the classifier and "action" for the action. Given
       that a single object file can contain multiple classifiers and
       actions, the corresponding section name needs to be specified if it
       differs from the defaults.

   export
       points to a Unix domain socket file. If the eBPF object file also
       contains a section named "maps" with eBPF map specifications, the map
       file descriptors can be handed off via the Unix domain socket to an
       eBPF "agent" that holds on to all descriptors after tc has exited.
       This can be some third-party application implementing the IPC
       counterpart for the import, which uses the descriptors to call into
       the bpf(2) system call to read out or update eBPF map data from user
       space, for example, for monitoring purposes or to push down new
       policies.

   verbose
       if set, dumps the eBPF verifier output even if loading the eBPF
       program was successful. By default, the verifier log is only emitted
       on error.

   skip_hw | skip_sw
       hardware offload control flags. By default, tc will try to offload
       filters to hardware if possible. skip_hw explicitly disables the
       attempt to offload. skip_sw forces the offload and disables running
       the eBPF program in the kernel. If hardware offload is not possible
       and skip_sw was set, the kernel will report an error and the filter
       will not be installed at all.

   police
       is an optional parameter for an eBPF/cBPF classifier that specifies a
       policer in tc(8) which is attached to the classifier, for example, on
       an ingress qdisc.

   action
       is an optional parameter for an eBPF/cBPF classifier that specifies a
       subsequent action in tc(8) which is attached to the classifier.

   classid
   flowid
       provides the default traffic control class identifier for this
       eBPF/cBPF classifier. The default class identifier can also be
       overridden by the return code of the eBPF/cBPF program. A return code
       of -1 selects the default class identifier provided here. A return
       code of 0 implies that no match took place, and any other return code
       overrides the default classid. This allows for efficient, non-linear
       classification with only a single eBPF/cBPF program, as opposed to
       having multiple individual programs for various class identifiers,
       each of which would need to reparse the packet contents.
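
       The return code rules stated above can be sketched in plain user
       space C. This is only an illustration of those rules, not the
       kernel's implementation; the function name resolve_classid and the
       convention of returning 0 for "no match" are assumptions made for
       this example.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of tc or the kernel): map the 32 bit
 * return code of an eBPF/cBPF classifier to the resulting classid.
 * A result of 0 stands for "no match".
 */
uint32_t resolve_classid(uint32_t ret, uint32_t default_classid)
{
        if (ret == 0)
                return 0;               /* no match took place */
        if (ret == (uint32_t)-1)
                return default_classid; /* classid/flowid given to tc */
        return ret;                     /* program overrides the default */
}
```

       With a default of 1:1 (0x00010001), a program returning -1 yields
       1:1, while a program returning 0x00010002 steers the packet to 1:2.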

   bytecode
       is used for loading cBPF classifiers and actions only. The cBPF
       bytecode is passed directly as a text string of the form
       's,c t f k,c t f k,c t f k,...', where s denotes the number of
       subsequent 4-tuples. Each 4-tuple consists of the decimals c t f k,
       where c represents the cBPF opcode, t the jump true offset target, f
       the jump false offset target and k the immediate constant/literal.
       Various tools generate code in this loadable format, for example,
       bpf_asm, which ships with the Linux kernel source tree under
       tools/net/, so the bytecode is not expected to be written by hand. The
       bytecode or bytecode-file option is mandatory when a cBPF classifier
       or action is to be loaded.
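
       To make the string format concrete, the following sketch serializes
       an array of instructions into the 's,c t f k,...' form. The struct
       cbpf_insn type and the cbpf_format function are hypothetical names
       introduced for this example; real tools such as bpf_asm produce this
       format for you.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for one cBPF instruction (the 4-tuple). */
struct cbpf_insn {
        uint16_t code;  /* c: cBPF opcode */
        uint8_t  jt;    /* t: jump true offset target */
        uint8_t  jf;    /* f: jump false offset target */
        uint32_t k;     /* k: immediate constant/literal */
};

/*
 * Write "s,c t f k,c t f k,..." into buf. Returns the number of
 * characters written, or -1 if buf is too small.
 */
int cbpf_format(const struct cbpf_insn *insns, size_t n,
                char *buf, size_t size)
{
        size_t used, i;
        int len;

        len = snprintf(buf, size, "%zu", n);
        if (len < 0 || (size_t)len >= size)
                return -1;
        used = (size_t)len;

        for (i = 0; i < n; i++) {
                len = snprintf(buf + used, size - used, ",%u %u %u %u",
                               (unsigned int)insns[i].code,
                               (unsigned int)insns[i].jt,
                               (unsigned int)insns[i].jf,
                               (unsigned int)insns[i].k);
                if (len < 0 || (size_t)len >= size - used)
                        return -1;
                used += (size_t)len;
        }
        return (int)used;
}
```

       A single instruction with c=6, t=0, f=0 and k=4294967295 is
       serialized as 1,6 0 0 4294967295.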

   bytecode-file
       is also used to load a cBPF classifier or action. It is effectively
       the same as bytecode, except that the cBPF bytecode is not passed
       directly on the command line but resides in a text file.

EXAMPLES

   eBPF TOOLING
       A full-blown example including eBPF agent code can be found inside the
       iproute2 source package under: examples/bpf/

       As prerequisites, the kernel needs to have the eBPF system call,
       namely bpf(2), enabled and needs to ship with the cls_bpf and act_bpf
       kernel modules for the traffic control subsystem. To enable eBPF/cBPF
       JIT support, depending on which of the two the given architecture
       supports:

           echo 1 > /proc/sys/net/core/bpf_jit_enable

       A given restricted C file can be compiled via LLVM as:

           clang -O2 -emit-llvm -c bpf.c -o - | \
                   llc -march=bpf -filetype=obj -o bpf.o

       The compiler invocation might still be simplified in the future, so
       for now it is quite handy to wrap this construct in one way or
       another, for example:

           __bcc() {
                   clang -O2 -emit-llvm -c "$1" -o - | \
                   llc -march=bpf -filetype=obj -o "$(basename "$1" .c).o"
           }

           alias bcc=__bcc

       A minimal, stand-alone unit that matches all traffic with the default
       classid (return code of -1) looks like:

           #include <linux/bpf.h>

           #ifndef __section
           # define __section(x)  __attribute__((section(x), used))
           #endif

           __section("classifier") int cls_main(struct __sk_buff *skb)
           {
                   return -1;
           }

           char __license[] __section("license") = "GPL";

       More examples can be found further below in subsection eBPF
       PROGRAMMING, as the focus here is on tooling.

       There can be various other sections, for example, for actions. Thus,
       an eBPF object file can contain multiple entry points. However, a
       specific entry point must always be specified when configuring with
       tc. A license must be part of the restricted C code, and the license
       string syntax is the same as for Linux kernel modules. The kernel
       reserves the right to restrict some eBPF helper functions to
       GPL-compatible licenses only, and may thus reject loading a program
       into the kernel when such a license mismatch occurs.

       The resulting object file from the compilation can be inspected with
       the usual set of tools that also operate on normal object files, for
       example objdump(1) for inspecting ELF section headers:

           objdump -h bpf.o
           [...]
           3 classifier    000007f8  0000000000000000  0000000000000000  00000040  2**3
                           CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
           4 action-mark   00000088  0000000000000000  0000000000000000  00000838  2**3
                           CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
           5 action-rand   00000098  0000000000000000  0000000000000000  000008c0  2**3
                           CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
           6 maps          00000030  0000000000000000  0000000000000000  00000958  2**2
                           CONTENTS, ALLOC, LOAD, DATA
           7 license       00000004  0000000000000000  0000000000000000  00000988  2**0
                           CONTENTS, ALLOC, LOAD, DATA
           [...]

       Adding an eBPF classifier from an object file that contains a
       classifier in the default ELF section is trivial (note that instead of
       "object-file", shortcuts such as "obj" can also be used):

           bcc bpf.c
           tc filter add dev em1 parent 1: bpf obj bpf.o flowid 1:1

       If the classifier resides in ELF section "mycls", the same command
       needs to be invoked as:

           tc filter add dev em1 parent 1: bpf obj bpf.o sec mycls flowid 1:1

       Dumping the classifier configuration shows the location of the
       classifier, in other words, that it comes from object file "bpf.o",
       section "mycls":

           tc filter show dev em1
           filter parent 1: protocol all pref 49152 bpf
           filter parent 1: protocol all pref 49152 bpf handle 0x1 flowid 1:1 bpf.o:[mycls]

       The same program can also be installed on the ingress qdisc side as
       opposed to egress ...

           tc qdisc add dev em1 handle ffff: ingress
           tc filter add dev em1 parent ffff: bpf obj bpf.o sec mycls \
                   flowid ffff:1

       ... and again dumped from there:

           tc filter show dev em1 parent ffff:
           filter protocol all pref 49152 bpf
           filter protocol all pref 49152 bpf handle 0x1 flowid ffff:1 bpf.o:[mycls]

       Attaching a classifier and action on ingress comes with the
       restriction that ingress has no actual underlying queueing discipline.
       What ingress can do is classify, mangle, redirect or drop packets.
       When queueing is required on the ingress side, ingress must redirect
       packets to the ifb device; otherwise, policing can be used. Moreover,
       ingress can serve as an early drop point for unwanted packets before
       they hit upper layers of the networking stack, perform network
       accounting with eBPF maps that could be shared with egress, or act as
       an early mangle and/or redirection point to different networking
       devices.

       Multiple eBPF actions and classifiers can be placed into a single
       object file within various sections. In that case, non-default section
       names must be provided, which is the case for both actions in this
       example:

           tc filter add dev em1 parent 1: bpf obj bpf.o flowid 1:1 \
                                    action bpf obj bpf.o sec action-mark \
                                    action bpf obj bpf.o sec action-rand ok

       The advantage of this is that the classifier and the two actions can
       then share eBPF maps with each other, if implemented in the programs.

       In order to access eBPF maps from user space beyond the tc(8) setup
       lifetime, ownership can be transferred to an eBPF agent via Unix
       domain sockets. There are two possibilities for implementing this:

       1) Implement your own eBPF agent that takes care of setting up the
       Unix domain socket and implements the protocol that tc(8) dictates. A
       code example of this can be found inside the iproute2 source package
       under: examples/bpf/

       2) Use tc exec for transferring the eBPF map file descriptors through
       a Unix domain socket and spawning an application such as sh(1). The
       advantage of this approach is that tc places the file descriptors into
       the environment and thus makes them available just like the stdin,
       stdout and stderr file descriptors. User applications run from within
       this fd-owner shell can therefore terminate and restart without losing
       the eBPF map file descriptors. Example invocation with the previous
       classifier and action mixture:

           tc exec bpf imp /tmp/bpf
           tc filter add dev em1 parent 1: bpf obj bpf.o exp /tmp/bpf flowid 1:1 \
                                    action bpf obj bpf.o sec action-mark \
                                    action bpf obj bpf.o sec action-rand ok

       Assuming that eBPF maps are shared between classifier and actions, it
       is enough to export them once, for example, from within the classifier
       or action command. tc sets up all eBPF map file descriptors at the
       time the object file is first parsed.

       When a shell has been spawned, the environment contains a couple of
       eBPF-related variables. BPF_NUM_MAPS provides the total number of maps
       that have been transferred over the Unix domain socket. The value of
       BPF_MAP<X> is the file descriptor number that can be accessed in eBPF
       agent applications; in other words, it can be used directly as the
       file descriptor value for the bpf(2) system call to retrieve or alter
       eBPF map values. <X> denotes the identifier of the eBPF map. It
       corresponds to the id member of struct bpf_elf_map from the tc eBPF
       map specification.

       The environment in this example looks as follows:

           sh# env | grep BPF
               BPF_NUM_MAPS=3
               BPF_MAP1=6
               BPF_MAP0=5
               BPF_MAP2=7
           sh# ls -la /proc/self/fd
               [...]
               lrwx------. 1 root root 64 Apr 14 16:46 5 -> anon_inode:bpf-map
               lrwx------. 1 root root 64 Apr 14 16:46 6 -> anon_inode:bpf-map
               lrwx------. 1 root root 64 Apr 14 16:46 7 -> anon_inode:bpf-map
           sh# my_bpf_agent

       eBPF agents are very useful in that they can prepopulate eBPF maps
       from user space, monitor statistics via maps and, based on that
       feedback, for example rewrite classids in eBPF map values at runtime.
       Given that eBPF agents are implemented as normal applications, they
       can also dynamically receive traffic control policies from external
       controllers and push them down into eBPF maps to adapt to network
       conditions. Moreover, eBPF maps can also be shared with other eBPF
       program types (e.g. tracing), so very powerful combinations can be
       implemented.
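
       As a minimal sketch of how an agent started from such a shell might
       pick up the descriptors, the following helpers read the BPF_NUM_MAPS
       and BPF_MAP<X> variables described above. The function names
       bpf_env_num_maps and bpf_env_map_fd are hypothetical and not part of
       iproute2; error handling is kept deliberately simple.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Number of maps handed over by tc, or 0 if BPF_NUM_MAPS is not set. */
int bpf_env_num_maps(void)
{
        const char *val = getenv("BPF_NUM_MAPS");

        return val ? atoi(val) : 0;
}

/*
 * Hypothetical agent helper: return the file descriptor exported for
 * the eBPF map with identifier id, or -1 if BPF_MAP<id> is not set or
 * does not hold a valid non-negative number.
 */
int bpf_env_map_fd(unsigned int id)
{
        char name[32], *end;
        const char *val;
        long fd;

        snprintf(name, sizeof(name), "BPF_MAP%u", id);
        val = getenv(name);
        if (!val || !*val)
                return -1;

        fd = strtol(val, &end, 10);
        if (*end != '\0' || fd < 0 || fd > INT_MAX)
                return -1;
        return (int)fd;
}
```

       The returned descriptor would then be used as the map file
       descriptor in bpf(2) calls for looking up or updating map elements.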

   eBPF PROGRAMMING
       eBPF classifiers and actions are implemented in restricted C syntax
       (in the future, additional language front ends could be supported).

       The header file linux/bpf.h provides eBPF helper functions that can be
       called from an eBPF program. This man page only provides two minimal,
       stand-alone examples; have a look at examples/bpf from the iproute2
       source package for a fully fledged flow dissector example that better
       demonstrates some of the possibilities of eBPF.

       Supported 32 bit classifier return codes from the C program and their
       meanings:
           0, denotes a mismatch
           -1, denotes the default classid configured from the command line
           else, everything else overrides the default classid to provide
           a facility for non-linear matching

       Supported 32 bit action return codes from the C program and their
       meanings (linux/pkt_cls.h):
           TC_ACT_OK (0), terminates the packet processing pipeline and
           allows the packet to proceed
           TC_ACT_SHOT (2), terminates the packet processing pipeline and
           drops the packet
           TC_ACT_UNSPEC (-1), uses the default action configured from tc
           (similar to returning -1 from a classifier)
           TC_ACT_PIPE (3), iterates to the next action, if available
           TC_ACT_RECLASSIFY (1), terminates the packet processing pipeline
           and restarts classification from the beginning
           else, everything else is an unspecified return code

       Both classifier and action return codes are supported in eBPF and cBPF
       programs.

       To demonstrate restricted C syntax, a minimal toy classifier example
       is provided. It assumes that egress packets, for instance originating
       from a container, have previously been marked with a value in the
       interval [0, 255]. The program keeps per-mark statistics for user
       space and maps the classid to the root qdisc with the mark itself as
       the minor handle:

           #include <stdint.h>
           #include <asm/types.h>

           #include <linux/bpf.h>
           #include <linux/pkt_sched.h>

           #include "helpers.h"

           struct tuple {
                   long packets;
                   long bytes;
           };

           #define BPF_MAP_ID_STATS        1 /* agent's map identifier */
           #define BPF_MAX_MARK            256

           /* One statistics slot per possible mark value. */
           struct bpf_elf_map __section("maps") map_stats = {
                   .type           =       BPF_MAP_TYPE_ARRAY,
                   .id             =       BPF_MAP_ID_STATS,
                   .size_key       =       sizeof(uint32_t),
                   .size_value     =       sizeof(struct tuple),
                   .max_elem       =       BPF_MAX_MARK,
           };

           static inline void cls_update_stats(const struct __sk_buff *skb,
                                               uint32_t mark)
           {
                   struct tuple *tu;

                   tu = bpf_map_lookup_elem(&map_stats, &mark);
                   if (likely(tu)) {
                           /* Atomic adds, programs may run concurrently. */
                           __sync_fetch_and_add(&tu->packets, 1);
                           __sync_fetch_and_add(&tu->bytes, skb->len);
                   }
           }

           __section("cls") int cls_main(struct __sk_buff *skb)
           {
                   uint32_t mark = skb->mark;

                   /* Marks outside [0, 255] are reported as a mismatch. */
                   if (unlikely(mark >= BPF_MAX_MARK))
                           return 0;

                   cls_update_stats(skb, mark);

                   return TC_H_MAKE(TC_H_ROOT, mark);
           }

           char __license[] __section("license") = "GPL";
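
       The classid returned through TC_H_MAKE in the example above packs a
       major and a minor handle into one 32 bit value. The following user
       space sketch mirrors that packing under hypothetical names
       (H_MAJ_MASK, H_MIN_MASK, handle_make, classid_from_parts) so that it
       builds without kernel headers; the mask values are the same as
       TC_H_MAJ_MASK and TC_H_MIN_MASK in linux/pkt_sched.h.

```c
#include <stdint.h>

/* Same values as TC_H_MAJ_MASK / TC_H_MIN_MASK; local names are
 * hypothetical so the sketch does not depend on kernel headers. */
#define H_MAJ_MASK 0xFFFF0000U
#define H_MIN_MASK 0x0000FFFFU

/* Equivalent of TC_H_MAKE(): upper 16 bits major, lower 16 bits minor. */
uint32_t handle_make(uint32_t major, uint32_t minor)
{
        return (major & H_MAJ_MASK) | (minor & H_MIN_MASK);
}

/* Build the 32 bit handle for a MAJOR:MINOR pair such as 1:1. */
uint32_t classid_from_parts(uint16_t major, uint16_t minor)
{
        return handle_make((uint32_t)major << 16, minor);
}
```

       For a packet with mark 42, handle_make(0xFFFFFFFF, 42) gives
       0xFFFF002A, that is, major ffff with the mark as the minor handle.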

       Another small example is a port redirector that demuxes destination
       port 80 into the interval [8080, 8087], steered by RSS, and can be
       attached to an ingress qdisc. Adding the egress counterpart and IPv6
       support is left as an exercise to the reader:

           #include <asm/types.h>
           #include <asm/byteorder.h>

           #include <linux/bpf.h>
           #include <linux/filter.h>
           #include <linux/in.h>
           #include <linux/if_ether.h>
           #include <linux/ip.h>
           #include <linux/tcp.h>

           #include "helpers.h"

           /* Fix up the TCP checksum, then rewrite the destination port. */
           static inline void set_tcp_dport(struct __sk_buff *skb, int nh_off,
                                            __u16 old_port, __u16 new_port)
           {
                   bpf_l4_csum_replace(skb, nh_off + offsetof(struct tcphdr, check),
                                       old_port, new_port, sizeof(new_port));
                   bpf_skb_store_bytes(skb, nh_off + offsetof(struct tcphdr, dest),
                                       &new_port, sizeof(new_port), 0);
           }

           static inline int lb_do_ipv4(struct __sk_buff *skb, int nh_off)
           {
                   __u16 dport, dport_new = 8080, off;
                   __u8 ip_proto, ip_vl;

                   ip_proto = load_byte(skb, nh_off +
                                        offsetof(struct iphdr, protocol));
                   if (ip_proto != IPPROTO_TCP)
                           return 0;

                   /* Fast path: IPv4 header without options (IHL == 5). */
                   ip_vl = load_byte(skb, nh_off);
                   if (likely(ip_vl == 0x45))
                           nh_off += sizeof(struct iphdr);
                   else
                           nh_off += (ip_vl & 0xF) << 2;

                   dport = load_half(skb, nh_off + offsetof(struct tcphdr, dest));
                   if (dport != 80)
                           return 0;

                   /* Spread over 8 ports based on the receive queue. */
                   off = skb->queue_mapping & 7;
                   set_tcp_dport(skb, nh_off - BPF_LL_OFF, __constant_htons(80),
                                 __cpu_to_be16(dport_new + off));
                   return -1;
           }

           __section("lb") int lb_main(struct __sk_buff *skb)
           {
                   int ret = 0, nh_off = BPF_LL_OFF + ETH_HLEN;

                   if (likely(skb->protocol == __constant_htons(ETH_P_IP)))
                           ret = lb_do_ipv4(skb, nh_off);

                   return ret;
           }

           char __license[] __section("license") = "GPL";
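
       The two pieces of arithmetic in the redirector, the IPv4 header
       length taken from the first header byte and the mapping of the
       receive queue to a port in [8080, 8087], can be checked in isolation
       in user space. The helper names ipv4_hdr_len and lb_pick_port are
       hypothetical and only restate the expressions used above.

```c
#include <stdint.h>

/* IPv4 header length in bytes: low nibble (IHL) counts 32 bit words. */
unsigned int ipv4_hdr_len(uint8_t ver_ihl)
{
        return (unsigned int)(ver_ihl & 0xF) << 2;
}

/* New destination port: 8080 plus the low 3 bits of the queue index. */
uint16_t lb_pick_port(uint32_t queue_mapping)
{
        return (uint16_t)(8080 + (queue_mapping & 7));
}
```

       A first byte of 0x45 gives a 20 byte header, and queue indices 0
       through 7 map to ports 8080 through 8087, wrapping around for higher
       queue indices.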

       The related helper header file helpers.h used in both examples is:

           /* Misc helper macros. */
           #define __section(x) __attribute__((section(x), used))
           #define offsetof(x, y) __builtin_offsetof(x, y)
           #define likely(x) __builtin_expect(!!(x), 1)
           #define unlikely(x) __builtin_expect(!!(x), 0)

           /* Used map structure */
           struct bpf_elf_map {
               __u32 type;
               __u32 size_key;
               __u32 size_value;
               __u32 max_elem;
               __u32 id;
           };

           /* Some used BPF function calls. */
           static int (*bpf_skb_store_bytes)(void *ctx, int off, void *from,
                                             int len, int flags) =
                 (void *) BPF_FUNC_skb_store_bytes;
           static int (*bpf_l4_csum_replace)(void *ctx, int off, int from,
                                             int to, int flags) =
                 (void *) BPF_FUNC_l4_csum_replace;
           static void *(*bpf_map_lookup_elem)(void *map, void *key) =
                 (void *) BPF_FUNC_map_lookup_elem;

           /* Some used BPF intrinsics. */
           unsigned long long load_byte(void *skb, unsigned long long off)
               asm ("llvm.bpf.load.byte");
           unsigned long long load_half(void *skb, unsigned long long off)
               asm ("llvm.bpf.load.half");

       As a best practice, we recommend loading only a single eBPF classifier
       in tc and performing all necessary matching and mangling from there,
       instead of using a list of individual classifiers and separate
       actions. A single classifier tailored to a given use case will be the
       most efficient to run.

   eBPF DEBUGGING
       Both tc filter and action commands for bpf support an optional verbose
       parameter that can be used to inspect the eBPF verifier log. The log
       is dumped by default in case of an error.

       If the eBPF/cBPF JIT compiler has been enabled, it can also be
       instructed to emit a debug dump of the resulting opcode image into the
       kernel log, which can be read via dmesg(1):

           echo 2 > /proc/sys/net/core/bpf_jit_enable

       The Linux kernel source tree additionally ships a small helper under
       tools/net/ called bpf_jit_disasm, which reads the opcode image dump
       from the kernel log and prints the resulting disassembly:

           bpf_jit_disasm -o

       Other than that, the Linux kernel also contains an extensive eBPF/cBPF
       test suite module called test_bpf. Upon ...

           modprobe test_bpf

       ... it runs a variety of test cases and dumps the results into the
       kernel log, which can be inspected with dmesg(1). The results can
       differ depending on whether the JIT compiler is enabled or not. If
       test cases fail, the module will fail to load. In such cases, we urge
       you to file a bug report with the related JIT authors and the Linux
       kernel and networking mailing lists.

   cBPF
       Although we generally recommend switching to eBPF classifiers and
       actions, a few words on how to program in cBPF are included here for
       the sake of completeness.

       As with eBPF, the bpf_jit_enable switch can be enabled as mentioned
       already. Tooling such as bpf_jit_disasm also works independently of
       whether eBPF or cBPF code is being loaded.

       Unlike in eBPF, classifiers and actions are not implemented in
       restricted C, but rather in a minimal assembler-like language or with
       the help of other tooling.

       The raw interface with tc takes opcodes directly. For example, the
       most minimal classifier, which matches every packet and results in the
       default classid of 1:1, looks like:

           tc filter add dev em1 parent 1: bpf bytecode '1,6 0 0 4294967295,' \
                   flowid 1:1

       The first decimal of the bytecode sequence denotes the number of
       subsequent 4-tuples of cBPF opcodes. As mentioned, such a 4-tuple
       consists of c t f k decimals, where c represents the cBPF opcode, t
       the jump true offset target, f the jump false offset target and k the
       immediate constant/literal. Here, the single 4-tuple denotes an
       unconditional return from the program with an immediate value of -1.
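
       Why the tuple "6 0 0 4294967295" means "return -1" can be verified
       with a few lines of C. The macro and function names below are
       hypothetical; the numeric values follow the classic BPF opcode
       encoding (instruction class in the low three bits, return value
       source in bits 3 and 4), so treat this as an illustrative sketch
       rather than a replacement for the kernel headers.

```c
#include <stdint.h>

/* Local, hypothetical names for classic BPF opcode fields. */
#define CBPF_CLASS(code)  ((code) & 0x07)  /* instruction class */
#define CBPF_RVAL(code)   ((code) & 0x18)  /* source of return value */
#define CBPF_RET          0x06             /* class: return */
#define CBPF_K            0x00             /* operand: immediate k */

/* 1 if the opcode is "return immediate k", 0 otherwise. */
int cbpf_is_ret_imm(uint16_t code)
{
        return CBPF_CLASS(code) == CBPF_RET && CBPF_RVAL(code) == CBPF_K;
}

/* The k field is unsigned 32 bit, so -1 is stored as 4294967295. */
uint32_t cbpf_imm_from_signed(int32_t value)
{
        return (uint32_t)value;
}
```

       Opcode 6 is therefore a return of the immediate k, and
       cbpf_imm_from_signed(-1) yields the 4294967295 seen in the bytecode
       string.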

       For egress classification, Willem de Bruijn implemented a minimal
       stand-alone helper tool under the GNU General Public License version 2
       for the iptables(8) BPF extension, which abuses the libpcap internal
       classic BPF compiler. His code is adapted here for use with tc(8):

           #include <pcap.h>
           #include <stdio.h>

           int main(int argc, char **argv)
           {
                   struct bpf_program prog;
                   struct bpf_insn *ins;
                   int i, ret, dlt = DLT_RAW;

                   /* Usage: [link type name] 'filter expression' */
                   if (argc < 2 || argc > 3)
                           return 1;
                   if (argc == 3) {
                           dlt = pcap_datalink_name_to_val(argv[1]);
                           if (dlt == -1)
                                   return 1;
                   }

                   ret = pcap_compile_nopcap(-1, dlt, &prog, argv[argc - 1],
                                             1, PCAP_NETMASK_UNKNOWN);
                   if (ret)
                           return 1;

                   /* Instruction count first, then one tuple each. */
                   printf("%d,", prog.bf_len);
                   ins = prog.bf_insns;

                   for (i = 0; i < prog.bf_len - 1; ++ins, ++i)
                           printf("%u %u %u %u,", ins->code,
                                  ins->jt, ins->jf, ins->k);
                   printf("%u %u %u %u",
                          ins->code, ins->jt, ins->jf, ins->k);

                   pcap_freecode(&prog);
                   return 0;
           }

       Given this small helper, any tcpdump(8) filter expression can be
       abused as a classifier where a match will result in the default
       classid:

           bpftool EN10MB 'tcp[tcpflags] & tcp-syn != 0' > /var/bpf/tcp-syn
           tc filter add dev em1 parent 1: bpf bytecode-file /var/bpf/tcp-syn \
                   flowid 1:1

       Basically, such a minimal generator is equivalent to:

           tcpdump -iem1 -ddd 'tcp[tcpflags] & tcp-syn != 0' | tr '\n' ',' > \
                   /var/bpf/tcp-syn

       Since libpcap does not support all Linux-specific cBPF extensions in
       its compiler, the Linux kernel also ships a minimal BPF assembler
       called bpf_asm under tools/net/ for full control. For detailed syntax
       and semantics on implementing such programs by hand, see the
       references under FURTHER READING.

       A trivial toy example in bpf_asm for classifying IPv4/TCP packets,
       saved in a text file called foobar:

           ldh [12]
           jne #0x800, drop
           ldb [23]
           jneq #6, drop
           ret #-1
           drop: ret #0

       Similarly, such a classifier can be loaded as:

           bpf_asm foobar > /var/bpf/tcp-syn
           tc filter add dev em1 parent 1: bpf bytecode-file /var/bpf/tcp-syn \
                   flowid 1:1

       For BPF classifiers, the Linux kernel additionally provides a small
       BPF debugger called bpf_dbg under tools/net/, which can be used to
       test a classifier against pcap files, single-step through it, add
       breakpoints to the classifier program and dump register contents
       during runtime.

       Implementing an action in classic BPF is rather limited in the sense
       that packet mangling is not supported. Therefore, it is generally
       recommended to switch to eBPF whenever possible.

FURTHER READING

       Further and more technical details about the BPF architecture can be
       found in the Linux kernel source tree under
       Documentation/networking/filter.txt.

       Further details on eBPF tc(8) examples can be found in the iproute2
       source tree under examples/bpf/.

SEE ALSO

       tc(8), tc-ematch(8), bpf(2), bpf(4)

AUTHORS

       Manpage written by Daniel Borkmann.

       Please report corrections or improvements to the Linux kernel
       networking mailing list: <netdev@vger.kernel.org>

iproute2                          18 May 2015  BPF classifier and actions in tc(8)