1 ovn-northd(8) OVN Manual ovn-northd(8)
2
3
4
5
6
7 NAME
8 ovn-northd - Open Virtual Network central control daemon
9
10 SYNOPSIS
11 ovn-northd [options]
12
13 DESCRIPTION
14 ovn-northd is a centralized daemon responsible for translating the
15 high-level OVN configuration into logical configuration consumable by
16 daemons such as ovn-controller. It translates the logical network con‐
17 figuration in terms of conventional network concepts, taken from the
18 OVN Northbound Database (see ovn-nb(5)), into logical datapath flows in
19 the OVN Southbound Database (see ovn-sb(5)) below it.
20
21 OPTIONS
22 --ovnnb-db=database
23 The OVSDB database containing the OVN Northbound Database. If
24 the OVN_NB_DB environment variable is set, its value is used as
25 the default. Otherwise, the default is unix:/ovnnb_db.sock.
26
27 --ovnsb-db=database
28 The OVSDB database containing the OVN Southbound Database. If
29 the OVN_SB_DB environment variable is set, its value is used as
30 the default. Otherwise, the default is unix:/ovnsb_db.sock.
31
32 database in the above options must be an OVSDB active or passive con‐
33 nection method, as described in ovsdb(7).
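
       As an illustrative sketch (addresses and ports hypothetical), any
       ovsdb(7) connection method may be used for either database:

              # Local unix sockets (the defaults, under the run directory):
              ovn-northd --ovnnb-db=unix:/ovnnb_db.sock \
                         --ovnsb-db=unix:/ovnsb_db.sock

              # Active TCP connections to remote database servers:
              ovn-northd --ovnnb-db=tcp:192.0.2.10:6641 \
                         --ovnsb-db=tcp:192.0.2.10:6642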
34
35 Daemon Options
36 --pidfile[=pidfile]
37 Causes a file (by default, program.pid) to be created indicating
38 the PID of the running process. If the pidfile argument is not
39 specified, or if it does not begin with /, then it is created in
40 the configured OVS_RUNDIR directory.
41
42 If --pidfile is not specified, no pidfile is created.
43
44 --overwrite-pidfile
45 By default, when --pidfile is specified and the specified pid‐
46 file already exists and is locked by a running process, the dae‐
47 mon refuses to start. Specify --overwrite-pidfile to cause it to
48 instead overwrite the pidfile.
49
50 When --pidfile is not specified, this option has no effect.
51
52 --detach
53 Runs this program as a background process. The process forks,
54 and in the child it starts a new session, closes the standard
55 file descriptors (which has the side effect of disabling logging
56 to the console), and changes its current directory to the root
57 (unless --no-chdir is specified). After the child completes its
58 initialization, the parent exits.
59
60 --monitor
61 Creates an additional process to monitor this program. If it
62 dies due to a signal that indicates a programming error (SIGA‐
63 BRT, SIGALRM, SIGBUS, SIGFPE, SIGILL, SIGPIPE, SIGSEGV, SIGXCPU,
64 or SIGXFSZ) then the monitor process starts a new copy of it. If
65 the daemon dies or exits for another reason, the monitor process
66 exits.
67
68 This option is normally used with --detach, but it also func‐
69 tions without it.
70
71 --no-chdir
72 By default, when --detach is specified, the daemon changes its
73 current working directory to the root directory after it de‐
74 taches. Otherwise, invoking the daemon from a carelessly chosen
75 directory would prevent the administrator from unmounting the
76 file system that holds that directory.
77
78 Specifying --no-chdir suppresses this behavior, preventing the
79 daemon from changing its current working directory. This may be
80 useful for collecting core files, since it is common behavior to
81 write core dumps into the current working directory and the root
82 directory is not a good directory to use.
83
84 This option has no effect when --detach is not specified.
85
86 --no-self-confinement
87 By default this daemon will try to self-confine itself to work
88 with files under well-known directories determined at build
89 time. It is better to stick with this default behavior and not to
90 use this flag unless some other access control mechanism is used
91 to confine the daemon. Note that in contrast to other access control
92 implementations that are typically enforced from kernel-space
93 (e.g. DAC or MAC), self-confinement is imposed from the user-
94 space daemon itself and hence should not be considered as a full
95 confinement strategy, but instead should be viewed as an addi‐
96 tional layer of security.
97
98 --user=user:group
99 Causes this program to run as a different user specified in
100 user:group, thus dropping most of the root privileges. Short
101 forms user and :group are also allowed, with current user or
102 group assumed, respectively. Only daemons started by the root
103 user accept this argument.
104
105 On Linux, daemons will be granted CAP_IPC_LOCK and
106 CAP_NET_BIND_SERVICE before dropping root privileges. Daemons
107 that interact with a datapath, such as ovs-vswitchd, will be
108 granted three additional capabilities, namely CAP_NET_ADMIN,
109 CAP_NET_BROADCAST and CAP_NET_RAW. The capability change will
110 apply even if the new user is root.
111
112 On Windows, this option is not currently supported. For security
113 reasons, specifying this option will cause the daemon process
114 not to start.
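
       As a sketch combining the daemon options above (a typical production
       invocation; the log file path is hypothetical, and whether to detach
       and monitor is deployment policy rather than a requirement):

              ovn-northd --detach --monitor --pidfile \
                         --log-file=/var/log/ovn/ovn-northd.log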
115
116 Logging Options
117 -v[spec]
118 --verbose=[spec]
119 Sets logging levels. Without any spec, sets the log level for ev‐
120 ery module and destination to dbg. Otherwise, spec is a list of
121 words separated by spaces or commas or colons, up to one from each
122 category below:
123
124 • A valid module name, as displayed by the vlog/list command
125 on ovs-appctl(8), limits the log level change to the speci‐
126 fied module.
127
128 • syslog, console, or file, to limit the log level change to
129 only to the system log, to the console, or to a file, re‐
130 spectively. (If --detach is specified, the daemon closes
131 its standard file descriptors, so logging to the console
132 will have no effect.)
133
134 On the Windows platform, syslog is accepted as a word and is
135 only useful along with the --syslog-target option (the word
136 has no effect otherwise).
137
138 • off, emer, err, warn, info, or dbg, to control the log
139 level. Messages of the given severity or higher will be
140 logged, and messages of lower severity will be filtered
141 out. off filters out all messages. See ovs-appctl(8) for a
142 definition of each log level.
143
144 Case is not significant within spec.
145
146 Regardless of the log levels set for file, logging to a file will
147 not take place unless --log-file is also specified (see below).
148
149 For compatibility with older versions of OVS, any is accepted as a
150 word but has no effect.
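
       For example, the following hypothetical combination logs info and
       higher to the console, logs full debug detail to the log file, and
       turns the system log off:

              ovn-northd -vconsole:info -vfile:dbg -vsyslog:off --log-file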
151
152 -v
153 --verbose
154 Sets the maximum logging verbosity level, equivalent to --ver‐
155 bose=dbg.
156
157 -vPATTERN:destination:pattern
158 --verbose=PATTERN:destination:pattern
159 Sets the log pattern for destination to pattern. Refer to ovs-ap‐
160 pctl(8) for a description of the valid syntax for pattern.
161
162 -vFACILITY:facility
163 --verbose=FACILITY:facility
164 Sets the RFC5424 facility of the log message. facility can be one
165 of kern, user, mail, daemon, auth, syslog, lpr, news, uucp, clock,
166 ftp, ntp, audit, alert, clock2, local0, local1, local2, local3,
167 local4, local5, local6 or local7. If this option is not specified,
168 daemon is used as the default for the local system syslog and lo‐
169 cal0 is used while sending a message to the target provided via
170 the --syslog-target option.
171
172 --log-file[=file]
173 Enables logging to a file. If file is specified, then it is used
174 as the exact name for the log file. The default log file name used
175 if file is omitted is /var/log/ovn/program.log.
176
177 --syslog-target=host:port
178 Send syslog messages to UDP port on host, in addition to the sys‐
179 tem syslog. The host must be a numerical IP address, not a host‐
180 name.
181
182 --syslog-method=method
183 Specify method as how syslog messages should be sent to syslog
184 daemon. The following forms are supported:
185
186 • libc, to use the libc syslog() function. A downside of
187 using this option is that libc adds a fixed prefix to every
188 message before it is actually sent to the syslog daemon
189 over the /dev/log UNIX domain socket.
190
191 • unix:file, to use a UNIX domain socket directly. It is
192 possible to specify an arbitrary message format with this
193 option. However, rsyslogd 8.9 and older versions use a
194 hard-coded parser function anyway that limits UNIX domain
195 socket use. If you want to use an arbitrary message format
196 with older rsyslogd versions, then use a UDP socket to the
197 localhost IP address instead.
198
199 • udp:ip:port, to use a UDP socket. With this method it is
200 possible to use an arbitrary message format also with older
201 rsyslogd. When sending syslog messages over a UDP socket,
202 extra precautions need to be taken: for example, the syslog
203 daemon needs to be configured to listen on the specified
204 UDP port, accidental iptables rules could interfere with
205 local syslog traffic, and there are some security consid‐
206 erations that apply to UDP sockets but do not apply to
207 UNIX domain sockets.
208
209 • null, to discard all messages logged to syslog.
210
211 The default is taken from the OVS_SYSLOG_METHOD environment vari‐
212 able; if it is unset, the default is libc.
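
       For example, to hand messages to a local rsyslogd over UDP (so that
       an arbitrary log pattern survives the hard-coded parser mentioned
       above), one might use something like:

              ovn-northd --syslog-method=udp:127.0.0.1:514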
213
214 PKI Options
215 PKI configuration is required in order to use SSL for the connections
216 to the Northbound and Southbound databases.
217
218 -p privkey.pem
219 --private-key=privkey.pem
220 Specifies a PEM file containing the private key used as
221 identity for outgoing SSL connections.
222
223 -c cert.pem
224 --certificate=cert.pem
225 Specifies a PEM file containing a certificate that certi‐
226 fies the private key specified on -p or --private-key to be
227 trustworthy. The certificate must be signed by the certifi‐
228 cate authority (CA) that the peer in SSL connections will
229 use to verify it.
230
231 -C cacert.pem
232 --ca-cert=cacert.pem
233 Specifies a PEM file containing the CA certificate for ver‐
234 ifying certificates presented to this program by SSL peers.
235 (This may be the same certificate that SSL peers use to
236 verify the certificate specified on -c or --certificate, or
237 it may be a different one, depending on the PKI design in
238 use.)
239
240 -C none
241 --ca-cert=none
242 Disables verification of certificates presented by SSL
243 peers. This introduces a security risk, because it means
244 that certificates cannot be verified to be those of known
245 trusted hosts.
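
       Taken together, an SSL-enabled deployment might look like the fol‐
       lowing sketch (addresses and file paths hypothetical):

              ovn-northd --ovnnb-db=ssl:192.0.2.10:6641 \
                         --ovnsb-db=ssl:192.0.2.10:6642 \
                         -p /etc/ovn/privkey.pem \
                         -c /etc/ovn/cert.pem \
                         -C /etc/ovn/cacert.pem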
246
247 Other Options
248 --unixctl=socket
249 Sets the name of the control socket on which program listens for
250 runtime management commands (see RUNTIME MANAGEMENT COMMANDS,
251 below). If socket does not begin with /, it is interpreted as rela‐
252 tive to the configured OVS_RUNDIR directory. If --unixctl is not used
253 at all, the default socket is program.pid.ctl in that directory, where pid is the program’s process ID.
254
255 On Windows a local named pipe is used to listen for runtime man‐
256 agement commands. A file is created at the absolute path pointed
257 to by socket, or, if --unixctl is not used at all, a file named
258 program is created in the configured OVS_RUNDIR directory. The
259 file exists just to mimic the behavior of a Unix domain socket.
260
261 Specifying none for socket disables the control socket feature.
262
263
264
265 -h
266 --help
267 Prints a brief help message to the console.
268
269 -V
270 --version
271 Prints version information to the console.
272
273 RUNTIME MANAGEMENT COMMANDS
274 ovs-appctl can send commands to a running ovn-northd process. The cur‐
275 rently supported commands are described below.
276
277 exit Causes ovn-northd to gracefully terminate.
278
279 pause Pauses ovn-northd so that it stops processing any
280 Northbound and Southbound database changes. This will
281 also instruct ovn-northd to drop any lock on the SB DB.
282
283 resume Resumes ovn-northd so that it processes Northbound
284 and Southbound database contents and generates logical
285 flows. This will also instruct ovn-northd to attempt to
286 reacquire the lock on the SB DB.
287
288 is-paused
289 Returns "true" if ovn-northd is currently paused, "false"
290 otherwise.
291
292 status Prints this server’s status. Status will be "active" if
293 ovn-northd has acquired the OVSDB lock on the SB DB, "standby"
294 if it has not, or "paused" if this instance is paused.
295
296 sb-cluster-state-reset
297 Reset southbound database cluster status when databases
298 are destroyed and rebuilt.
299
300 If all databases in a clustered southbound database are
301 removed from disk, then the stored index of all databases
302 will be reset to zero. This will cause ovn-northd to be
303 unable to read or write to the southbound database, be‐
304 cause it will always detect the data as stale. In such a
305 case, run this command so that ovn-northd will reset its
306 local index so that it can interact with the southbound
307 database again.
308
309 nb-cluster-state-reset
310 Reset northbound database cluster status when databases
311 are destroyed and rebuilt.
312
313 This performs the same task as sb-cluster-state-reset ex‐
314 cept for the northbound database client.
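
       These commands are issued with ovs-appctl(8). A short sketch, as‐
       suming the default control socket so that the target can be named
       by program:

              ovs-appctl -t ovn-northd status
              ovs-appctl -t ovn-northd is-paused
              ovs-appctl -t ovn-northd sb-cluster-state-reset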
315
316 ACTIVE-STANDBY AND HIGH AVAILABILITY
317 You may run ovn-northd more than once in an OVN deployment. When con‐
318 nected to a standalone or clustered DB setup, OVN will automatically
319 ensure that only one of them is active at a time. If multiple instances
320 of ovn-northd are running and the active ovn-northd fails, one of the
321 hot standby instances of ovn-northd will automatically take over.
322
323 Active-Standby with multiple OVN DB servers
324 You may run multiple OVN DB servers in an OVN deployment with:
325
326 • OVN DB servers deployed in active/passive mode with one
327 active and multiple passive ovsdb-servers.
328
329 • ovn-northd also deployed on all these nodes, using local
330 unix domain sockets to connect to the local OVN DB servers.
331
332 In such deployments, the ovn-northds on the passive nodes will process
333 the DB changes and compute logical flows that are thrown away later, be‐
334 cause write transactions are not allowed by the passive ovsdb-servers.
335 This results in unnecessary CPU usage.
336
337 With the help of the runtime management command pause, you can pause
338 ovn-northd on these nodes. When a passive node becomes master, you can
339 use the runtime management command resume so that ovn-northd resumes
340 processing the DB changes.
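
       For example, a deployment script might pause the standbys and re‐
       sume on failover with (sketch):

              # On each passive node:
              ovs-appctl -t ovn-northd pause

              # On the node that has become master:
              ovs-appctl -t ovn-northd resume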
341
342 LOGICAL FLOW TABLE STRUCTURE
343 One of the main purposes of ovn-northd is to populate the Logical_Flow
344 table in the OVN_Southbound database. This section describes how
345 ovn-northd does this for switch and router logical datapaths.
346
347 Logical Switch Datapaths
348 Ingress Table 0: Admission Control and Ingress Port Security - L2
349
350 Ingress table 0 contains these logical flows:
351
352 • Priority 100 flows to drop packets with VLAN tags or mul‐
353 ticast Ethernet source addresses.
354
355 • Priority 50 flows that implement ingress port security
356 for each enabled logical port. For logical ports on which
357 port security is enabled, these match the inport and the
358 valid eth.src address(es) and advance only those packets
359 to the next flow table. For logical ports on which port
360 security is not enabled, these advance all packets that
361 match the inport.
362
363 There are no flows for disabled logical ports because the default-drop
364 behavior of logical flow tables causes packets that ingress from them
365 to be dropped.
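
       As an illustration, the priority-50 port security flow for a
       hypothetical logical port sw0-p1 might appear in ovn-sbctl
       lflow-list output roughly as:

              table=0 (ls_in_port_sec_l2), priority=50,
                match=(inport == "sw0-p1" && eth.src == {50:54:00:00:00:01}),
                action=(next;)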
366
367 Ingress Table 1: Ingress Port Security - IP
368
369 Ingress table 1 contains these logical flows:
370
371 • For each element in the port security set having one or
372 more IPv4 or IPv6 addresses (or both),
373
374 • Priority 90 flow to allow IPv4 traffic if it has
375 IPv4 addresses which match the inport, valid
376 eth.src and valid ip4.src address(es).
377
378 • Priority 90 flow to allow IPv4 DHCP discovery
379 traffic if it has a valid eth.src. This is neces‐
380 sary since DHCP discovery messages are sent from
381 the unspecified IPv4 address (0.0.0.0) because an
382 IPv4 address has not yet been assigned.
383
384 • Priority 90 flow to allow IPv6 traffic if it has
385 IPv6 addresses which match the inport, valid
386 eth.src and valid ip6.src address(es).
387
388 • Priority 90 flow to allow IPv6 DAD (Duplicate Ad‐
389 dress Detection) traffic if it has a valid
390 eth.src. This is necessary since DAD requires
391 joining a multicast group and sending neighbor
392 solicitations for the newly assigned ad‐
393 dress. Since no address is yet assigned, these are
394 sent from the unspecified IPv6 address (::).
395
396 • Priority 80 flow to drop IP (both IPv4 and IPv6)
397 traffic which matches the inport and valid eth.src.
398
399 • One priority-0 fallback flow that matches all packets and
400 advances to the next table.
401
402 Ingress Table 2: Ingress Port Security - Neighbor discovery
403
404 Ingress table 2 contains these logical flows:
405
406 • For each element in the port security set,
407
408 • Priority 90 flow to allow ARP traffic which matches
409 the inport and valid eth.src and arp.sha. If the
410 element has one or more IPv4 addresses, then it
411 also matches the valid arp.spa.
412
413 • Priority 90 flow to allow IPv6 Neighbor Solicita‐
414 tion and Advertisement traffic which matches the in‐
415 port, valid eth.src and nd.sll/nd.tll. If the ele‐
416 ment has one or more IPv6 addresses, then it also
417 matches the valid nd.target address(es) for Neigh‐
418 bor Advertisement traffic.
419
420 • Priority 80 flow to drop ARP and IPv6 Neighbor So‐
421 licitation and Advertisement traffic which matches
422 the inport and valid eth.src.
423
424 • One priority-0 fallback flow that matches all packets and
425 advances to the next table.
426
427 Ingress Table 3: Lookup MAC address learning table
428
429 This table looks up the MAC learning table of the logical switch data‐
430 path to check if the port-mac pair is present or not. A MAC is learnt
431 only for logical switch VIF ports whose port security is disabled and
432 which have the ’unknown’ address set.
433
434 • For each such logical port p whose port security is dis‐
435 abled and ’unknown’ address set, the following flow is added:
436
437 • Priority 100 flow with the match inport == p and
438 action reg0[11] = lookup_fdb(inport, eth.src);
439 next;
440
441 • One priority-0 fallback flow that matches all packets and
442 advances to the next table.
443
444 Ingress Table 4: Learn MAC of ’unknown’ ports.
445
446 This table learns the MAC addresses seen on the logical ports whose
447 port security is disabled and ’unknown’ address set if the lookup_fdb
448 action returned false in the previous table.
449
450 • For each such logical port p whose port security is dis‐
451 abled and ’unknown’ address set, the following flow is added:
452
453 • Priority 100 flow with the match inport == p &&
454 reg0[11] == 0 and action put_fdb(inport, eth.src);
455 next; which stores the port-mac in the mac learn‐
456 ing table of the logical switch datapath and ad‐
457 vances the packet to the next table.
458
459 • One priority-0 fallback flow that matches all packets and
460 advances to the next table.
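
       A port is put into this MAC-learning mode by disabling port secu‐
       rity and including unknown in its addresses, for example (port name
       hypothetical):

              ovn-nbctl lsp-set-port-security sw0-p1   # no addresses clears it
              ovn-nbctl lsp-set-addresses sw0-p1 unknown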
461
462 Ingress Table 5: from-lport Pre-ACLs
463
464 This table prepares flows for possible stateful ACL processing in
465 ingress table ACLs. It contains a priority-0 flow that simply moves
466 traffic to the next table. If stateful ACLs are used in the logical
467 datapath, a priority-100 flow is added that sets a hint (with reg0[0] =
468 1; next;) for table Pre-stateful to send IP packets to the connection
469 tracker before eventually advancing to ingress table ACLs. If special
470 ports such as route ports or localnet ports can’t use ct(), a prior‐
471 ity-110 flow is added to skip over stateful ACLs. IPv6 Neighbor Discov‐
472 ery and MLD traffic also skips stateful ACLs.
473
474 This table also has a priority-110 flow with the match eth.dst == E for
475 all logical switch datapaths to move traffic to the next table, where E
476 is the service monitor mac defined in the options:svc_monitor_mac column
477 of the NB_Global table.
478
479 Ingress Table 6: Pre-LB
480
481 This table prepares flows for possible stateful load balancing process‐
482 ing in ingress table LB and Stateful. It contains a priority-0 flow
483 that simply moves traffic to the next table. Moreover it contains a
484 priority-110 flow to move IPv6 Neighbor Discovery and MLD traffic to
485 the next table. If load balancing rules with virtual IP addresses (and
486 ports) are configured in OVN_Northbound database for a logical switch
487 datapath, a priority-100 flow is added with the match ip to match on IP
488 packets and sets the action reg0[2] = 1; next; to act as a hint for ta‐
489 ble Pre-stateful to send IP packets to the connection tracker for
490 packet de-fragmentation (and to possibly do DNAT for already estab‐
491 lished load balanced traffic) before eventually advancing to ingress
492 table Stateful. If controller_event has been enabled and load balancing
493 rules with empty backends have been added in OVN_Northbound, a
494 priority-130 flow is added to trigger ovn-controller events whenever
495 the chassis receives a packet for that particular VIP. If the event-elb
496 meter has been previously created, it will be associated with the empty_lb logical flow.
497
498 Prior to OVN 20.09 we were setting reg0[0] = 1 only if the IP des‐
499 tination matched the load balancer VIP. However, this had issues in
500 cases where a logical switch doesn’t have any ACLs with the
501 allow-related action. To understand the issue, let’s take a TCP load
502 balancer - 10.0.0.10:80=10.0.0.3:80. If a logical port p1 with IP
503 10.0.0.5 opens a TCP connection with the VIP 10.0.0.10, then the
504 packet in the ingress pipeline of ’p1’ is sent to p1’s conntrack zone
505 id and the packet is load balanced to the backend 10.0.0.3. The reply
506 packet from the backend lport is not sent to the conntrack of the
507 backend lport’s zone id. This is fine as long as the packet is valid.
508 Suppose the backend lport sends an invalid TCP packet (like an
509 incorrect sequence number); the packet gets delivered to the lport
510 ’p1’ without unDNATing the packet to the VIP 10.0.0.10, and this
511 causes the connection to be reset by the lport p1’s VIF.
512
513 We can’t fix this issue by adding a logical flow to drop ct.inv packets
514 in the egress pipeline, since that would drop all other connections not
515 destined to the load balancers. To fix this issue, we send all the
516 packets to conntrack in the ingress pipeline if a load balancer is
517 configured. We can now add a logical flow to drop ct.inv packets.
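
       The load balancer used in the example above could be created with
       something like the following sketch (names hypothetical):

              ovn-nbctl lb-add lb0 10.0.0.10:80 10.0.0.3:80 tcp
              ovn-nbctl ls-lb-add sw0 lb0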
518
519 This table also has a priority-110 flow with the match eth.dst == E for
520 all logical switch datapaths to move traffic to the next table, where E
521 is the service monitor mac defined in the options:svc_monitor_mac column
522 of the NB_Global table.
523
524 This table also has a priority-110 flow with the match inport == I for
525 all logical switch datapaths to move traffic to the next table, where I
526 is the peer of a logical router port. This flow is added to skip the
527 connection tracking of packets which enter from a logical router datapath
528 to a logical switch datapath.
529
530 Ingress Table 7: Pre-stateful
531
532 This table prepares flows for all possible stateful processing in next
533 tables. It contains a priority-0 flow that simply moves traffic to the
534 next table.
535
536 • Priority-120 flows that send the packets to connection
537 tracker using ct_lb; as the action so that the already
538 established traffic destined to the load balancer VIP
539 gets DNATted based on a hint provided by the previous ta‐
540 bles (with a match for reg0[2] == 1 and on supported load
541 balancer protocols and address families). For IPv4 traf‐
542 fic the flows also load the original destination IP and
543 transport port in registers reg1 and reg2. For IPv6 traf‐
544 fic the flows also load the original destination IP and
545 transport port in registers xxreg1 and reg2.
546
547 • A priority-110 flow sends the packets to connection
548 tracker based on a hint provided by the previous tables
549 (with a match for reg0[2] == 1) by using the ct_lb; ac‐
550 tion. This flow is added to handle the traffic for load
551 balancer VIPs whose protocol is not defined (mainly for
552 ICMP traffic).
553
554 • A priority-100 flow sends the packets to connection
555 tracker based on a hint provided by the previous tables
556 (with a match for reg0[0] == 1) by using the ct_next; ac‐
557 tion.
558
559 Ingress Table 8: from-lport ACL hints
560
561 This table consists of logical flows that set hints (reg0 bits) to be
562 used in the next stage, in the ACL processing table, if stateful ACLs
563 or load balancers are configured. Multiple hints can be set for the
564 same packet. The possible hints are:
565
566 • reg0[7]: the packet might match an allow-related ACL and
567 might have to commit the connection to conntrack.
568
569 • reg0[8]: the packet might match an allow-related ACL but
570 there will be no need to commit the connection to con‐
571 ntrack because it already exists.
572
573 • reg0[9]: the packet might match a drop/reject ACL.
574
575 • reg0[10]: the packet might match a drop/reject ACL but
576 the connection was previously allowed so it might have to
577 be committed again with ct_label=1/1.
578
579 The table contains the following flows:
580
581 • A priority-65535 flow to advance to the next table if the
582 logical switch has no ACLs configured, otherwise a prior‐
583 ity-0 flow to advance to the next table.
584
585 • A priority-7 flow that matches on packets that initiate a
586 new session. This flow sets reg0[7] and reg0[9] and then
587 advances to the next table.
588
589 • A priority-6 flow that matches on packets that are in the
590 request direction of an already existing session that has
591 been marked as blocked. This flow sets reg0[7] and
592 reg0[9] and then advances to the next table.
593
594 • A priority-5 flow that matches untracked packets. This
595 flow sets reg0[8] and reg0[9] and then advances to the
596 next table.
597
598 • A priority-4 flow that matches on packets that are in the
599 request direction of an already existing session that has
600 not been marked as blocked. This flow sets reg0[8] and
601 reg0[10] and then advances to the next table.
602
603 • A priority-3 flow that matches on packets that are not
604 part of established sessions. This flow sets reg0[9] and
605 then advances to the next table.
606
607 • A priority-2 flow that matches on packets that are part
608 of an established session that has been marked as
609 blocked. This flow sets reg0[9] and then advances to the
610 next table.
611
612 • A priority-1 flow that matches on packets that are part
613 of an established session that has not been marked as
614 blocked. This flow sets reg0[10] and then advances to the
615 next table.
616
617 Ingress Table 9: from-lport ACLs
618
619 Logical flows in this table closely reproduce those in the ACL table in
620 the OVN_Northbound database for the from-lport direction. The priority
621 values from the ACL table have a limited range and have 1000 added to
622 them to leave room for OVN default flows at both higher and lower pri‐
623 orities.
624
625 • allow ACLs translate into logical flows with the next;
626 action. If there are any stateful ACLs on this datapath,
627 then allow ACLs translate to ct_commit; next; (which acts
628 as a hint for the next tables to commit the connection to
629 conntrack),
630
631 • allow-related ACLs translate into logical flows with the
632 ct_commit(ct_label=0/1); next; actions for new connec‐
633 tions and reg0[1] = 1; next; for existing connections.
634
635 • reject ACLs translate into logical flows with the tcp_re‐
636 set { output <-> inport; next(pipeline=egress,table=5); }
637 action for TCP connections, icmp4/icmp6 action for UDP
638 connections, and sctp_abort { output <-> inport;
639 next(pipeline=egress,table=5); } action for SCTP associa‐
640 tions.
641
642 • Other ACLs translate to drop; for new or untracked con‐
643 nections and ct_commit(ct_label=1/1); for known connec‐
644 tions. Setting ct_label marks a connection as one that
645 was previously allowed, but should no longer be allowed
646 due to a policy change.
647
648 This table contains a priority-65535 flow to advance to the next table
649 if the logical switch has no ACLs configured, otherwise a priority-0
650 flow to advance to the next table so that ACLs allow packets by de‐
651 fault.
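
       To illustrate the priority offset, a hypothetical ACL added with

              ovn-nbctl acl-add sw0 from-lport 1002 "ip4.src == 10.0.0.5" allow-related

       would appear in this table as a logical flow with priority
       1002 + 1000 = 2002.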
652
653 If the logical datapath has a stateful ACL or a load balancer with VIP
654 configured, the following flows will also be added:
655
656 • A priority-1 flow that sets the hint to commit IP traffic
657 to the connection tracker (with action reg0[1] = 1;
658 next;). This is needed for the default allow policy be‐
659 cause, while the initiator’s direction may not have any
660 stateful rules, the server’s may and then its return
661 traffic would not be known and marked as invalid.
662
663 • A priority-65532 flow that allows any traffic in the re‐
664 ply direction for a connection that has been committed to
665 the connection tracker (i.e., established flows), as long
666 as the committed flow does not have ct_label.blocked set.
667 We only handle traffic in the reply direction here be‐
668 cause we want all packets going in the request direction
669 to still go through the flows that implement the cur‐
670 rently defined policy based on ACLs. If a connection is
671 no longer allowed by policy, ct_label.blocked will get
672 set and packets in the reply direction will no longer be
673 allowed, either.
674
675 • A priority-65532 flow that allows any traffic that is
676 considered related to a committed flow in the connection
677 tracker (e.g., an ICMP Port Unreachable from a non-lis‐
678 tening UDP port), as long as the committed flow does not
679 have ct_label.blocked set.
680
681 • A priority-65532 flow that drops all traffic marked by
682 the connection tracker as invalid.
683
684 • A priority-65532 flow that drops all traffic in the reply
685 direction with ct_label.blocked set meaning that the con‐
686 nection should no longer be allowed due to a policy
687 change. Packets in the request direction are skipped here
688 to let a newly created ACL re-allow this connection.
689
690 • A priority-65532 flow that allows IPv6 Neighbor Solicita‐
691 tion, Neighbor Advertisement, Router Solicitation, Router
692 Advertisement and MLD packets.
693
694 If the logical datapath has any ACL or a load balancer with VIP config‐
695 ured, the following flow will also be added:
696
697 • A priority 34000 logical flow is added for each logical
698 switch datapath with the match eth.dst = E to allow the
699 service monitor reply packet destined to ovn-controller
700 with the action next, where E is the service monitor mac
701 defined in the options:svc_monitor_mac column of the NB_Global
702 table.
703
704 Ingress Table 10: from-lport QoS Marking
705
706 Logical flows in this table closely reproduce those in the QoS table
707 with the action column set in the OVN_Northbound database for the
708 from-lport direction.
709
710 • For every qos_rules entry in a logical switch with DSCP
711 marking enabled, a flow will be added at the priority
712 mentioned in the QoS table.
713
714 • One priority-0 fallback flow that matches all packets and
715 advances to the next table.
716
717 Ingress Table 11: from-lport QoS Meter
718
719 Logical flows in this table closely reproduce those in the QoS table
720 with the bandwidth column set in the OVN_Northbound database for the
721 from-lport direction.
722
723 • For every qos_rules entry in a logical switch with meter‐
724 ing enabled, a flow will be added at the priority men‐
725 tioned in the QoS table.
726
727 • One priority-0 fallback flow that matches all packets and
728 advances to the next table.
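
       The QoS entries feeding both of these tables can be created with
       ovn-nbctl, along these lines (a sketch, assuming the qos-add syntax
       of ovn-nbctl(8); match and values hypothetical):

              # DSCP marking (Table 10):
              ovn-nbctl qos-add sw0 from-lport 100 "ip4.src == 10.0.0.5" dscp=12

              # Rate limiting (Table 11):
              ovn-nbctl qos-add sw0 from-lport 100 "ip4.src == 10.0.0.5" rate=10000 burst=1000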
729
730 Ingress Table 12: Stateful
731
732 • For all the configured load balancing rules for a switch
733 in OVN_Northbound database that includes a L4 port PORT
734 of protocol P and IP address VIP, a priority-120 flow is
735 added. For IPv4 VIPs, the flow matches ct.new && ip &&
736 ip4.dst == VIP && P && P.dst == PORT. For IPv6 VIPs, the
737 flow matches ct.new && ip && ip6.dst == VIP && P && P.dst
738 == PORT. The flow’s action is ct_lb(args), where args
739 contains comma separated IP addresses (and optional port
740 numbers) to load balance to. The address family of the IP
741 addresses of args is the same as the address family of
742 VIP. If health check is enabled, then args will only con‐
743 tain those endpoints whose service monitor status entry
744 in OVN_Southbound db is either online or empty. For IPv4
745 traffic the flow also loads the original destination IP
746 and transport port in registers reg1 and reg2. For IPv6
747 traffic the flow also loads the original destination IP
748 and transport port in registers xxreg1 and reg2.
749
750 • For all the configured load balancing rules for a switch
751 in OVN_Northbound database that includes just an IP ad‐
752 dress VIP to match on, OVN adds a priority-110 flow. For
753 IPv4 VIPs, the flow matches ct.new && ip && ip4.dst ==
754 VIP. For IPv6 VIPs, the flow matches ct.new && ip &&
755 ip6.dst == VIP. The action on this flow is ct_lb(args),
756 where args contains comma separated IP addresses of the
757 same address family as VIP. For IPv4 traffic the flow
758 also loads the original destination IP and transport port
759 in registers reg1 and reg2. For IPv6 traffic the flow
760 also loads the original destination IP and transport port
761 in registers xxreg1 and reg2.
762
763 • If the load balancer is created with the --reject option and
764 it has no active backends, a TCP reset segment (for tcp)
765 or an ICMP port unreachable packet (for all other kinds of
766 traffic) will be sent whenever an incoming packet is re‐
767 ceived for this load balancer. Please note that using the
768 --reject option will disable the empty_lb SB controller event
769 for this load balancer.
770
771 • A priority-100 flow commits packets to connection tracker
772 using ct_commit; next; action based on a hint provided by
773 the previous tables (with a match for reg0[1] == 1).
774
775 • A priority-0 flow that simply moves traffic to the next
776 table.
777
778 Ingress Table 13: Pre-Hairpin
779
780 • If the logical switch has load balancer(s) configured,
781 then a priority-100 flow is added with the match ip &&
782 ct.trk to check if the packet needs to be hairpinned (if
783 after load balancing the destination IP matches the
784 source IP) or not by executing the actions reg0[6] =
785 chk_lb_hairpin(); and reg0[12] = chk_lb_hairpin_reply();
786 and advances the packet to the next table.
787
788 • A priority-0 flow that simply moves traffic to the next
789 table.
790
791 Ingress Table 14: Nat-Hairpin
792
793 • If the logical switch has load balancer(s) configured,
794 then a priority-100 flow is added with the match ip &&
795 ct.new && ct.trk && reg0[6] == 1 which hairpins the traf‐
796 fic by NATting source IP to the load balancer VIP by exe‐
797 cuting the action ct_snat_to_vip and advances the packet
798 to the next table.
799
800 • If the logical switch has load balancer(s) configured,
801 then a priority-100 flow is added with the match ip &&
802 ct.est && ct.trk && reg0[6] == 1 which hairpins the traf‐
803 fic by NATting source IP to the load balancer VIP by exe‐
804 cuting the action ct_snat and advances the packet to the
805 next table.
806
807 • If the logical switch has load balancer(s) configured,
808 then a priority-90 flow is added with the match ip &&
809 reg0[12] == 1 which matches on the replies of hairpinned
810 traffic (i.e., destination IP is VIP, source IP is the
811 backend IP and source L4 port is backend port for L4 load
812 balancers) and executes ct_snat and advances the packet
813 to the next table.
814
815 • A priority-0 flow that simply moves traffic to the next
816 table.
817
818 Ingress Table 15: Hairpin
819
820 • A priority-1 flow that hairpins traffic matched by non-
821 default flows in the Pre-Hairpin table. Hairpinning is
822 done at L2: Ethernet addresses are swapped and the pack‐
823 ets are looped back on the input port.
824
825 • A priority-0 flow that simply moves traffic to the next
826 table.
827
828 Ingress Table 16: ARP/ND responder
829
830 This table implements ARP/ND responder in a logical switch for known
831 IPs. The advantage of the ARP responder flow is to limit ARP broadcasts
832 by locally responding to ARP requests without the need to send to other
833 hypervisors. One common case is when the inport is a logical port asso‐
834 ciated with a VIF and the broadcast is responded to on the local hyper‐
835 visor rather than broadcast across the whole network and responded to
836 by the destination VM. This behavior is proxy ARP.
837
838 ARP requests arrive from VMs from a logical switch inport of type de‐
839 fault. For this case, the logical switch proxy ARP rules can be for
840 other VMs or logical router ports. Logical switch proxy ARP rules may
841 be programmed both for mac binding of IP addresses on other logical
842 switch VIF ports (which are of the default logical switch port type,
843 representing connectivity to VMs or containers), and for mac binding of
844 IP addresses on logical switch router type ports, representing their
845 logical router port peers. In order to support proxy ARP for logical
846 router ports, an IP address must be configured on the logical switch
847 router type port, with the same value as the peer logical router port.
848 The configured MAC addresses must match as well. When a VM sends an ARP
849 request for a distributed logical router port and if the peer router
850 type port of the attached logical switch does not have an IP address
851 configured, the ARP request will be broadcast on the logical switch.
852 One of the copies of the ARP request will go through the logical switch
853 router type port to the logical router datapath, where the logical
854 router ARP responder will generate a reply. The MAC binding of a dis‐
855 tributed logical router, once learned by an associated VM, is used for
856 all that VM’s communication needing routing. Hence, the action of a VM
857 re-arping for the mac binding of the logical router port should be
858 rare.
859
860 Logical switch ARP responder proxy ARP rules can also be hit when re‐
861 ceiving ARP requests externally on a L2 gateway port. In this case, the
862 hypervisor acting as an L2 gateway, responds to the ARP request on be‐
863 half of a destination VM.
864
865 Note that ARP requests received from localnet or vtep logical inports
866 can either go directly to VMs, in which case the VM responds or can hit
867 an ARP responder for a logical router port if the packet is used to re‐
868 solve a logical router port next hop address. In either case, logical
869 switch ARP responder rules will not be hit. It contains these logical
870 flows:
871
872 • Priority-100 flows to skip the ARP responder if inport is
873 of type localnet or vtep and advances directly to the
874 next table. ARP requests sent to localnet or vtep ports
875 can be received by multiple hypervisors. Now, because the
876 same mac binding rules are downloaded to all hypervisors,
877 each of the multiple hypervisors will respond. This will
878 confuse L2 learning on the source of the ARP requests.
879 ARP requests received on an inport of type router are not
880 expected to hit any logical switch ARP responder flows.
881 However, no skip flows are installed for these packets,
882 as there would be some additional flow cost for this and
883 the value appears limited.
884
885 • If inport V is of type virtual, a priority-100 logical flow
886 is added for each P configured in the options:virtual-parents
887 column (a configuration sketch appears at the end of this section) with the match
888
889 inport == P && ((arp.op == 1 && arp.spa == VIP && arp.tpa == VIP) || (arp.op == 2 && arp.spa == VIP))
890
891
892 and applies the action
893
894 bind_vport(V, inport);
895
896
897 and advances the packet to the next table.
898
899 Where VIP is the virtual ip configured in the column op‐
900 tions:virtual-ip.
901
902 • Priority-50 flows that match ARP requests to each known
903 IP address A of every logical switch port, and respond
904 with ARP replies directly with corresponding Ethernet ad‐
905 dress E:
906
907 eth.dst = eth.src;
908 eth.src = E;
909 arp.op = 2; /* ARP reply. */
910 arp.tha = arp.sha;
911 arp.sha = E;
912 arp.tpa = arp.spa;
913 arp.spa = A;
914 outport = inport;
915 flags.loopback = 1;
916 output;
917
918
919 These flows are omitted for logical ports (other than
920 router ports or localport ports) that are down (unless
921 ignore_lsp_down is configured as true in options column
922 of NB_Global table of the Northbound database), for logi‐
923 cal ports of type virtual and for logical ports with ’un‐
924 known’ address set.
925
926 • Priority-50 flows that match IPv6 ND neighbor solicita‐
927 tions to each known IP address A (and A’s solicited node
928 address) of every logical switch port except of type
929 router, and respond with neighbor advertisements directly
930 with corresponding Ethernet address E:
931
932 nd_na {
933 eth.src = E;
934 ip6.src = A;
935 nd.target = A;
936 nd.tll = E;
937 outport = inport;
938 flags.loopback = 1;
939 output;
940 };
941
942
943 Priority-50 flows that match IPv6 ND neighbor solicita‐
944 tions to each known IP address A (and A’s solicited node
945 address) of logical switch port of type router, and re‐
946 spond with neighbor advertisements directly with corre‐
947 sponding Ethernet address E:
948
949 nd_na_router {
950 eth.src = E;
951 ip6.src = A;
952 nd.target = A;
953 nd.tll = E;
954 outport = inport;
955 flags.loopback = 1;
956 output;
957 };
958
959
960 These flows are omitted for logical ports (other than
961 router ports or localport ports) that are down (unless
962 ignore_lsp_down is configured as true in options column
963 of NB_Global table of the Northbound database), for logi‐
964 cal ports of type virtual and for logical ports with ’un‐
965 known’ address set.
966
967 • Priority-100 flows with match criteria like the ARP and
968 ND flows above, except that they only match packets from
969 the inport that owns the IP addresses in question, with
970 action next;. These flows prevent OVN from replying to,
971 for example, an ARP request emitted by a VM for its own
972 IP address. A VM only makes this kind of request to at‐
973 tempt to detect a duplicate IP address assignment, so
974 sending a reply will prevent the VM from accepting the IP
975 address that it owns.
976
977 In place of next;, it would be reasonable to use drop;
978 for the flows’ actions. If everything is working as it is
979 configured, then this would produce equivalent results,
980 since no host should reply to the request. But ARPing for
981 one’s own IP address is intended to detect situations
982 where the network is not working as configured, so drop‐
983 ping the request would frustrate that intent.
984
985 • For each SVC_MON_SRC_IP defined in the value of the
986 ip_port_mappings:ENDPOINT_IP column of the Load_Balancer ta‐
987 ble, a priority-110 logical flow is added with the match
988 arp.tpa == SVC_MON_SRC_IP && arp.op == 1 and applies
989 the action
990
991 eth.dst = eth.src;
992 eth.src = E;
993 arp.op = 2; /* ARP reply. */
994 arp.tha = arp.sha;
995 arp.sha = E;
996 arp.tpa = arp.spa;
997 arp.spa = A;
998 outport = inport;
999 flags.loopback = 1;
1000 output;
1001
1002
1003 where E is the service monitor source mac defined in the
1004 options:svc_monitor_mac column in the NB_Global table.
1005 This mac is used as the source mac in the service monitor
1006 packets for the load balancer endpoint IP health checks.
1007
1008 SVC_MON_SRC_IP is used as the source ip in the service
1009 monitor IPv4 packets for the load balancer endpoint IP
1010 health checks.
1011
1012 These flows are required if an ARP request is sent for
1013 the IP SVC_MON_SRC_IP.
1014
1015 • For each VIP configured in the table Forwarding_Group a
1016 priority-50 logical flow is added with the match arp.tpa
1017 == vip && arp.op == 1
1018 and applies the action
1019
1020 eth.dst = eth.src;
1021 eth.src = E;
1022 arp.op = 2; /* ARP reply. */
1023 arp.tha = arp.sha;
1024 arp.sha = E;
1025 arp.tpa = arp.spa;
1026 arp.spa = A;
1027 outport = inport;
1028 flags.loopback = 1;
1029 output;
1030
1031
1032 where E is the forwarding group’s mac defined in the
1033 vmac column.
1034
1035 A is used as either the destination ip for load balancing
1036 traffic to child ports or as nexthop to hosts behind the
1037 child ports.
1038
1039 These flows are required to respond to an ARP request if
1040 an ARP request is sent for the IP vip.
1041
1042 • One priority-0 fallback flow that matches all packets and
1043 advances to the next table.
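
       As referenced earlier in this section, a sketch of the virtual port
       configuration that produces the bind_vport flows (names and ad‐
       dresses hypothetical):

              ovn-nbctl lsp-add sw0 sw0-vip
              ovn-nbctl set Logical_Switch_Port sw0-vip type=virtual \
                      options:virtual-ip=10.0.0.10 \
                      options:virtual-parents=sw0-p1,sw0-p2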
1044
1045 Ingress Table 17: DHCP option processing
1046
1047 This table adds the DHCPv4 options to a DHCPv4 packet from the logical
1048 ports configured with IPv4 address(es) and DHCPv4 options, and simi‐
1049 larly for DHCPv6 options. This table also adds flows for the logical
1050 ports of type external.
1051
1052 • A priority-100 logical flow is added for these logical
1053 ports which matches the IPv4 packet with udp.src = 68 and
1054 udp.dst = 67 and applies the action put_dhcp_opts and ad‐
1055 vances the packet to the next table.
1056
1057 reg0[3] = put_dhcp_opts(offer_ip = ip, options...);
1058 next;
1059
1060
1061 For DHCPDISCOVER and DHCPREQUEST, this transforms the
1062 packet into a DHCP reply, adds the DHCP offer IP ip and
1063 options to the packet, and stores 1 into reg0[3]. For
1064 other kinds of packets, it just stores 0 into reg0[3].
1065 Either way, it continues to the next table.
1066
1067 • A priority-100 logical flow is added for these logical
1068 ports which matches the IPv6 packet with udp.src = 546
1069 and udp.dst = 547 and applies the action put_dhcpv6_opts
1070 and advances the packet to the next table.
1071
1072 reg0[3] = put_dhcpv6_opts(ia_addr = ip, options...);
1073 next;
1074
1075
1076 For DHCPv6 Solicit/Request/Confirm packets, this trans‐
1077 forms the packet into a DHCPv6 Advertise/Reply, adds the
1078 DHCPv6 offer IP ip and options to the packet, and stores
1079 1 into reg0[3]. For other kinds of packets, it just
1080 stores 0 into reg0[3]. Either way, it continues to the
1081 next table.
1082
1083 • A priority-0 flow that matches all packets and advances to
1084 the next table.
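
       A sketch of creating the DHCPv4 options that drive this table, as‐
       suming direct row creation with ovn-nbctl create (addresses and op‐
       tion values hypothetical):

              uuid=$(ovn-nbctl create DHCP_Options cidr=10.0.0.0/24 \
                      options='"server_id"="10.0.0.1" "server_mac"="c0:ff:ee:00:00:01" "lease_time"="3600" "router"="10.0.0.1"')
              ovn-nbctl lsp-set-dhcpv4-options sw0-p1 $uuid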
1085
1086 Ingress Table 18: DHCP responses
1087
1088 This table implements DHCP responder for the DHCP replies generated by
1089 the previous table.
1090
1091 • A priority 100 logical flow is added for the logical
1092 ports configured with DHCPv4 options which matches IPv4
1093 packets with udp.src == 68 && udp.dst == 67 && reg0[3] ==
1094 1 and responds back to the inport after applying these
1095 actions. If reg0[3] is set to 1, it means that the action
1096 put_dhcp_opts was successful.
1097
1098 eth.dst = eth.src;
1099 eth.src = E;
1100 ip4.src = S;
1101 udp.src = 67;
1102 udp.dst = 68;
1103 outport = P;
1104 flags.loopback = 1;
1105 output;
1106
1107
1108 where E is the server MAC address and S is the server
1109 IPv4 address defined in the DHCPv4 options. Note that
1110 ip4.dst field is handled by put_dhcp_opts.
1111
1112 (This terminates ingress packet processing; the packet
1113 does not go to the next ingress table.)
1114
1115 • A priority 100 logical flow is added for the logical
1116 ports configured with DHCPv6 options which matches IPv6
1117 packets with udp.src == 546 && udp.dst == 547 && reg0[3]
1118 == 1 and responds back to the inport after applying these
1119 actions. If reg0[3] is set to 1, it means that the action
1120 put_dhcpv6_opts was successful.
1121
1122 eth.dst = eth.src;
1123 eth.src = E;
1124 ip6.dst = A;
1125 ip6.src = S;
1126 udp.src = 547;
1127 udp.dst = 546;
1128 outport = P;
1129 flags.loopback = 1;
1130 output;
1131
1132
1133 where E is the server MAC address and S is the server
1134 IPv6 LLA address generated from the server_id defined in
1135 the DHCPv6 options and A is the IPv6 address defined in
1136 the logical port’s addresses column.
1137
1138 (This terminates packet processing; the packet does not
1139 go on the next ingress table.)
1140
1141 • A priority-0 flow that matches all packets and advances to
1142 the next table.
1143
1144 Ingress Table 19: DNS Lookup
1145
1146 This table looks up and resolves the DNS names to the corresponding
1147 configured IP address(es).
1148
1149 • A priority-100 logical flow for each logical switch data‐
1150 path if it is configured with DNS records, which matches
1151 the IPv4 and IPv6 packets with udp.dst = 53 and applies
1152 the action dns_lookup and advances the packet to the next
1153 table.
1154
1155 reg0[4] = dns_lookup(); next;
1156
1157
1158 For valid DNS packets, this transforms the packet into a
1159 DNS reply if the DNS name can be resolved, and stores 1
1160 into reg0[4]. For failed DNS resolution or other kinds of
1161 packets, it just stores 0 into reg0[4]. Either way, it
1162 continues to the next table.
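
       The DNS records that dns_lookup consults are configured in the
       Northbound database, along the lines of this sketch (names hypo‐
       thetical):

              ovn-nbctl -- --id=@d create DNS \
                      records='"vm1.example.org"="10.0.0.3"' \
                      -- add Logical_Switch sw0 dns_records @d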
1163
1164 Ingress Table 20: DNS Responses
1165
1166 This table implements DNS responder for the DNS replies generated by
1167 the previous table.
1168
1169 • A priority-100 logical flow for each logical switch data‐
1170 path if it is configured with DNS records, which matches
1171 the IPv4 and IPv6 packets with udp.dst = 53 && reg0[4] ==
1172 1 and responds back to the inport after applying these
1173 actions. If reg0[4] is set to 1, it means that the action
1174 dns_lookup was successful.
1175
1176 eth.dst <-> eth.src;
1177 ip4.src <-> ip4.dst;
1178 udp.dst = udp.src;
1179 udp.src = 53;
1180 outport = P;
1181 flags.loopback = 1;
1182 output;
1183
1184
1185 (This terminates ingress packet processing; the packet
1186 does not go to the next ingress table.)
1187
1188 Ingress Table 21: External ports
1189
1190 Traffic from the external logical ports enters the ingress datapath
1191 pipeline via the localnet port. This table adds the below logical flows
1192 to handle the traffic from these ports.
1193
1194 • A priority-100 flow is added for each external logical
1195 port which doesn’t reside on a chassis to drop the
1196 ARP/IPv6 NS request to the router IP(s) (of the logical
1197 switch) which matches on the inport of the external logi‐
1198 cal port and the valid eth.src address(es) of the exter‐
1199 nal logical port.
1200
1201 This flow guarantees that the ARP/NS request to the
1202 router IP address from the external ports is responded to
1203 only by the chassis which has claimed these external ports.
1204 All the other chassis drop these packets.
1205
1206 A priority-100 flow is added for each external logical
1207 port which doesn’t reside on a chassis to drop any packet
1208 destined to the router mac - with the match inport == ex‐
1209 ternal && eth.src == E && eth.dst == R && !is_chas‐
1210 sis_resident("external") where E is the external port mac
1211 and R is the router port mac.
1212
1213 • A priority-0 flow that matches all packets and advances to
1214 the next table.
1215
1216 Ingress Table 22: Destination Lookup
1217
1218 This table implements switching behavior. It contains these logical
1219 flows:
1220
1221 • A priority-110 flow with the match eth.src == E for all
1222 logical switch datapaths that applies the action han‐
1223 dle_svc_check(inport), where E is the service monitor mac
1224 defined in the options:svc_monitor_mac column of the NB_Global
1225 table.
1226
1227 • A priority-100 flow that punts all IGMP/MLD packets to
1228 ovn-controller if multicast snooping is enabled on the
1229 logical switch. The flow also forwards the IGMP/MLD pack‐
1230 ets to the MC_MROUTER_STATIC multicast group, which
1231 ovn-northd populates with all the logical ports that have
1232 options :mcast_flood_reports=’true’.
1233
1234 • Priority-90 flows that forward registered IP multicast
1235 traffic to their corresponding multicast group, which
1236 ovn-northd creates based on learnt IGMP_Group entries.
1237 The flows also forward packets to the MC_MROUTER_FLOOD
1238 multicast group, which ovn-northd populates with all the
1239 logical ports that are connected to logical routers with
1240 options:mcast_relay=’true’.
1241
1242 • A priority-85 flow that forwards all IP multicast traffic
1243 destined to 224.0.0.X to the MC_FLOOD multicast group,
1244 which ovn-northd populates with all enabled logical
1245 ports.
1246
1247 • A priority-85 flow that forwards all IP multicast traffic
1248 destined to reserved multicast IPv6 addresses (RFC 4291,
1249 2.7.1, e.g., Solicited-Node multicast) to the MC_FLOOD
1250 multicast group, which ovn-northd populates with all en‐
1251 abled logical ports.
1252
1253 • A priority-80 flow that forwards all unregistered IP mul‐
1254 ticast traffic to the MC_STATIC multicast group, which
1255 ovn-northd populates with all the logical ports that have
1256 options :mcast_flood=’true’. The flow also forwards un‐
1257 registered IP multicast traffic to the MC_MROUTER_FLOOD
1258 multicast group, which ovn-northd populates with all the
1259 logical ports connected to logical routers that have op‐
1260 tions :mcast_relay=’true’.
1261
1262 • A priority-80 flow that drops all unregistered IP multi‐
1263 cast traffic if other_config :mcast_snoop=’true’ and
1264 other_config :mcast_flood_unregistered=’false’ and the
1265 switch is not connected to a logical router that has op‐
1266 tions :mcast_relay=’true’ and the switch doesn’t have any
1267 logical port with options :mcast_flood=’true’.
1268
1269 • Priority-80 flows for each IP address/VIP/NAT address
1270 owned by a router port connected to the switch. These
1271 flows match ARP requests and ND packets for the specific
1272 IP addresses. Matched packets are forwarded only to the
1273 router that owns the IP address and to the MC_FLOOD_L2
1274 multicast group which contains all non-router logical
1275 ports.
1276
1277 • Priority-75 flows for each port connected to a logical
1278 router matching self originated ARP request/ND packets.
1279 These packets are flooded to the MC_FLOOD_L2 which con‐
1280 tains all non-router logical ports.
1281
1282 • A priority-70 flow that outputs all packets with an Eth‐
1283 ernet broadcast or multicast eth.dst to the MC_FLOOD mul‐
1284 ticast group.
1285
1286 • One priority-50 flow that matches each known Ethernet ad‐
1287 dress against eth.dst and outputs the packet to the sin‐
1288 gle associated output port.
1289
1290 For the Ethernet address on a logical switch port of type
1291 router, when that logical switch port’s addresses column
1292 is set to router and the connected logical router port
1293 has a gateway chassis:
1294
1295 • The flow for the connected logical router port’s
1296 Ethernet address is only programmed on the gateway
1297 chassis.
1298
1299 • If the logical router has rules specified in nat
1300 with external_mac, then those addresses are also
1301 used to populate the switch’s destination lookup
1302 on the chassis where logical_port is resident.
1303
1304 For the Ethernet address on a logical switch port of type
1305 router, when that logical switch port’s addresses column
1306 is set to router and the connected logical router port
1307 specifies a reside-on-redirect-chassis and the logical
1308 router to which the connected logical router port belongs
1309 has a distributed gateway LRP:
1310
1311 • The flow for the connected logical router port’s
1312 Ethernet address is only programmed on the gateway
1313 chassis.
1314
1315 For each forwarding group configured on the logical
1316 switch datapath, a priority-50 flow that matches on
1317 eth.dst == VIP
1318 with an action of fwd_group(childports=args), where
1319 args contains comma separated logical switch child ports
1320 to load balance to. If liveness is enabled, then the action
1321 also includes liveness=true. (A configuration sketch appears after this list.)
1322
1323 • One priority-0 fallback flow that matches all packets
1324 with the action outport = get_fdb(eth.dst); next;. The
1325 action get_fdb gets the port for the eth.dst in the MAC
1326 learning table of the logical switch datapath. If there
1327 is no entry for eth.dst in the MAC learning table, then
1328 it stores none in the outport.
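
       As referenced in the forwarding group item above, such a group
       might be configured along these lines (assuming the fwd-group-add
       command of ovn-nbctl(8); names and addresses hypothetical):

              ovn-nbctl --liveness fwd-group-add fg0 sw0 10.0.0.100 \
                      00:11:22:33:44:55 sw0-p1 sw0-p2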
1329
1330 Ingress Table 23: Destination unknown
1331
1332 This table handles the packets whose destination was not found by the
1333 lookup in the MAC learning table of the logical switch datapath. It
1334 contains the following flows.
1335
              • If the logical switch has logical ports with ’unknown’
                addresses set, then the following logical flow is
                added:

                • A priority-50 flow with the match outport == none
                  that outputs the packet to the MC_UNKNOWN multicast
                  group, which ovn-northd populates with all enabled
                  logical ports that accept unknown destination
                  packets. As a small optimization, if no logical
                  ports accept unknown destination packets,
                  ovn-northd omits this multicast group and logical
                  flow. (A configuration example follows this list.)

                If the logical switch has no logical ports with
                ’unknown’ addresses set, then the following logical
                flow is added:

                • A priority-50 flow with the match outport == none
                  that drops the packets.
1353
              • One priority-0 fallback flow that outputs the packet
                to the egress stage with the outport learned from the
                get_fdb action.
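
       The ’unknown’ behavior above is enabled per port via
       lsp-set-addresses. A minimal sketch, assuming a switch port
       named lsp0 (the MAC and IP are illustrative):

              # Accept packets for unknown destination MACs in
              # addition to the port’s own address.
              $ ovn-nbctl lsp-set-addresses lsp0 \
                    "00:00:00:00:00:01 10.0.0.11" unknown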
1357
1358 Egress Table 0: Pre-LB
1359
       This table is similar to ingress table Pre-LB. It contains a
       priority-0 flow that simply moves traffic to the next table.
       Moreover, it contains a priority-110 flow to move IPv6
       Neighbor Discovery traffic to the next table. If any load
       balancing rules exist for the datapath, a priority-100 flow is
       added with a match of ip and action of reg0[2] = 1; next; to
       act as a hint for table Pre-stateful to send IP packets to the
       connection tracker for packet de-fragmentation and to possibly
       DNAT the destination VIP to one of the selected backends for
       already committed load-balanced traffic.
1369
       This table also has a priority-110 flow with the match eth.src
       == E for all logical switch datapaths to move traffic to the
       next table, where E is the service monitor MAC defined in the
       options:svc_monitor_mac column of the NB_Global table.
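
       For example, the service monitor MAC can be set as follows (a
       sketch; the MAC value is illustrative):

              $ ovn-nbctl set NB_Global . \
                    options:svc_monitor_mac='"ea:8e:a0:4a:75:cb"'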
1374
   Egress Table 1: to-lport Pre-ACLs
1376
1377 This is similar to ingress table Pre-ACLs except for to-lport traffic.
1378
       This table also has a priority-110 flow with the match eth.src
       == E for all logical switch datapaths to move traffic to the
       next table, where E is the service monitor MAC defined in the
       options:svc_monitor_mac column of the NB_Global table.
1383
       This table also has a priority-110 flow with the match outport
       == I for all logical switch datapaths to move traffic to the
       next table, where I is the peer of a logical router port. This
       flow is added to skip connection tracking of packets that will
       enter the logical router datapath from the logical switch
       datapath for routing.
1389
1390 Egress Table 2: Pre-stateful
1391
       This is similar to ingress table Pre-stateful. This table adds
       the following three logical flows.
1394
              • A priority-120 flow that sends the packets to the
                connection tracker using ct_lb; as the action, so
                that already established traffic gets unDNATted from
                the backend IP to the load balancer VIP, based on a
                hint provided by the previous tables with a match for
                reg0[2] == 1. If the packet was not DNATted earlier,
                then ct_lb functions like ct_next.

              • A priority-100 flow that sends the packets to the
                connection tracker based on a hint provided by the
                previous tables (with a match for reg0[0] == 1),
                using the ct_next; action.
1407
1408 • A priority-0 flow that matches all packets to advance to
1409 the next table.
1410
   Egress Table 3: from-lport ACL hints
1412
1413 This is similar to ingress table ACL hints.
1414
   Egress Table 4: to-lport ACLs
1416
1417 This is similar to ingress table ACLs except for to-lport ACLs.
1418
1419 In addition, the following flows are added.
1420
              • A priority-34000 logical flow is added for each
                logical port which has DHCPv4 options defined, to
                allow the DHCPv4 reply packet, and for each logical
                port which has DHCPv6 options defined, to allow the
                DHCPv6 reply packet, from the Ingress Table 16: DHCP
                responses.

              • A priority-34000 logical flow is added for each
                logical switch datapath configured with DNS records
                with the match udp.dst == 53 to allow the DNS reply
                packet from the Ingress Table 18: DNS responses.

              • A priority-34000 logical flow is added for each
                logical switch datapath with the match eth.src == E
                to allow the service monitor request packet generated
                by ovn-controller with the action next, where E is
                the service monitor MAC defined in the
                options:svc_monitor_mac column of the NB_Global
                table.
1438
   Egress Table 5: to-lport QoS Marking

       This is similar to ingress table QoS marking except that it
       applies to to-lport QoS rules.
1443
   Egress Table 6: to-lport QoS Meter

       This is similar to ingress table QoS meter except that it
       applies to to-lport QoS rules.
1448
1449 Egress Table 7: Stateful
1450
1451 This is similar to ingress table Stateful except that there are no
1452 rules added for load balancing new connections.
1453
1454 Egress Table 8: Egress Port Security - IP
1455
       This is similar to the port security logic in table Ingress
       Port Security - IP except that outport, eth.dst, ip4.dst and
       ip6.dst are checked instead of inport, eth.src, ip4.src and
       ip6.src.
1459
1460 Egress Table 9: Egress Port Security - L2
1461
       This is similar to the ingress port security logic in ingress
       table Admission Control and Ingress Port Security - L2, but
       with important differences. Most obviously, outport and
       eth.dst are checked instead of inport and eth.src. Second,
       packets directed to broadcast or multicast eth.dst are always
       accepted instead of being subject to the port security rules;
       this is implemented through a priority-100 flow that matches
       on eth.mcast with action output;. Moreover, to ensure that
       even broadcast and multicast packets are not delivered to
       disabled logical ports, a priority-150 flow for each disabled
       logical outport overrides the priority-100 flow with a drop;
       action. Finally, if egress QoS has been enabled on a localnet
       port, the outgoing queue id is set through the set_queue
       action. Please remember to mark the corresponding physical
       interface with ovn-egress-iface set to true in external_ids.
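
       For example, on the chassis hosting the physical interface
       that backs the localnet port (a sketch; the interface name is
       illustrative):

              $ ovs-vsctl set Interface eth1 \
                    external-ids:ovn-egress-iface=true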
1475
1476 Logical Router Datapaths
       Logical router datapaths will only exist for Logical_Router
       rows in the OVN_Northbound database that do not have enabled
       set to false.
1479
1480 Ingress Table 0: L2 Admission Control
1481
1482 This table drops packets that the router shouldn’t see at all based on
1483 their Ethernet headers. It contains the following flows:
1484
1485 • Priority-100 flows to drop packets with VLAN tags or mul‐
1486 ticast Ethernet source addresses.
1487
              • For each enabled router port P with Ethernet address
                E, a priority-50 flow that matches inport == P &&
                (eth.mcast || eth.dst == E), stores the router port
                Ethernet address and advances to the next table, with
                action xreg0[0..47]=E; next;.
1493
1494 For the gateway port on a distributed logical router
1495 (where one of the logical router ports specifies a gate‐
1496 way chassis), the above flow matching eth.dst == E is
1497 only programmed on the gateway port instance on the gate‐
1498 way chassis.
1499
1500 • For each dnat_and_snat NAT rule on a distributed router
1501 that specifies an external Ethernet address E, a prior‐
1502 ity-50 flow that matches inport == GW && eth.dst == E,
1503 where GW is the logical router gateway port, with action
1504 xreg0[0..47]=E; next;.
1505
1506 This flow is only programmed on the gateway port instance
1507 on the chassis where the logical_port specified in the
1508 NAT rule resides.
1509
1510 Other packets are implicitly dropped.
1511
1512 Ingress Table 1: Neighbor lookup
1513
       For ARP and IPv6 Neighbor Discovery packets, this table looks
       into the MAC_Binding records to determine if OVN needs to
       learn the MAC bindings. The following flows are added:
1517
1518 • For each router port P that owns IP address A, which be‐
1519 longs to subnet S with prefix length L, if the option al‐
1520 ways_learn_from_arp_request is true for this router, a
1521 priority-100 flow is added which matches inport == P &&
1522 arp.spa == S/L && arp.op == 1 (ARP request) with the fol‐
1523 lowing actions:
1524
1525 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1526 next;
1527
1528
1529 If the option always_learn_from_arp_request is false, the
1530 following two flows are added.
1531
1532 A priority-110 flow is added which matches inport == P &&
1533 arp.spa == S/L && arp.tpa == A && arp.op == 1 (ARP re‐
1534 quest) with the following actions:
1535
1536 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1537 reg9[3] = 1;
1538 next;
1539
1540
1541 A priority-100 flow is added which matches inport == P &&
1542 arp.spa == S/L && arp.op == 1 (ARP request) with the fol‐
1543 lowing actions:
1544
1545 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1546 reg9[3] = lookup_arp_ip(inport, arp.spa);
1547 next;
1548
1549
                If the logical router port P is a distributed gateway
                router port, an additional match
                is_chassis_resident(cr-P) is added to all these flows
                (a configuration example follows this list).
1553
1554 • A priority-100 flow which matches on ARP reply packets
1555 and applies the actions if the option al‐
1556 ways_learn_from_arp_request is true:
1557
1558 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1559 next;
1560
1561
1562 If the option always_learn_from_arp_request is false, the
1563 above actions will be:
1564
1565 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1566 reg9[3] = 1;
1567 next;
1568
1569
1570 • A priority-100 flow which matches on IPv6 Neighbor Dis‐
1571 covery advertisement packet and applies the actions if
1572 the option always_learn_from_arp_request is true:
1573
1574 reg9[2] = lookup_nd(inport, nd.target, nd.tll);
1575 next;
1576
1577
1578 If the option always_learn_from_arp_request is false, the
1579 above actions will be:
1580
1581 reg9[2] = lookup_nd(inport, nd.target, nd.tll);
1582 reg9[3] = 1;
1583 next;
1584
1585
1586 • A priority-100 flow which matches on IPv6 Neighbor Dis‐
1587 covery solicitation packet and applies the actions if the
1588 option always_learn_from_arp_request is true:
1589
1590 reg9[2] = lookup_nd(inport, ip6.src, nd.sll);
1591 next;
1592
1593
1594 If the option always_learn_from_arp_request is false, the
1595 above actions will be:
1596
1597 reg9[2] = lookup_nd(inport, ip6.src, nd.sll);
1598 reg9[3] = lookup_nd_ip(inport, ip6.src);
1599 next;
1600
1601
1602 • A priority-0 fallback flow that matches all packets and
1603 applies the action reg9[2] = 1; next; advancing the
1604 packet to the next table.
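
       The always_learn_from_arp_request behavior used by the flows
       above is controlled per logical router. A minimal sketch,
       assuming a router named lr0:

              $ ovn-nbctl set Logical_Router lr0 \
                    options:always_learn_from_arp_request=false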
1605
1606 Ingress Table 2: Neighbor learning
1607
1608 This table adds flows to learn the mac bindings from the ARP and IPv6
1609 Neighbor Solicitation/Advertisement packets if it is needed according
1610 to the lookup results from the previous stage.
1611
       reg9[2] will be 1 if the lookup_arp/lookup_nd in the previous
       table was successful or skipped, meaning there is no need to
       learn the MAC binding from the packet.

       reg9[3] will be 1 if the lookup_arp_ip/lookup_nd_ip in the
       previous table was successful or skipped, meaning it is OK to
       learn the MAC binding from the packet (if reg9[2] is 0).
1619
              • A priority-100 flow with the match reg9[2] == 1 ||
                reg9[3] == 0 that advances the packet to the next
                table, as there is no need to learn the neighbor.
1623
1624 • A priority-90 flow with the match arp and applies the ac‐
1625 tion put_arp(inport, arp.spa, arp.sha); next;
1626
1627 • A priority-90 flow with the match nd_na and applies the
1628 action put_nd(inport, nd.target, nd.tll); next;
1629
1630 • A priority-90 flow with the match nd_ns and applies the
1631 action put_nd(inport, ip6.src, nd.sll); next;
1632
1633 Ingress Table 3: IP Input
1634
1635 This table is the core of the logical router datapath functionality. It
1636 contains the following flows to implement very basic IP host function‐
1637 ality.
1638
1639 • For each NAT entry of a distributed logical router (with
1640 distributed gateway router port) of type snat, a prior‐
1641 ity-120 flow with the match inport == P && ip4.src == A
1642 advances the packet to the next pipeline, where P is the
1643 distributed logical router port and A is the external_ip
1644 set in the NAT entry. If A is an IPv6 address, then
1645 ip6.src is used for the match.
1646
                The above flow is required to handle the routing of
                east/west NAT traffic.
1649
1650 • For each BFD port the two following priority-110 flows
1651 are added to manage BFD traffic:
1652
                • if ip4.src or ip6.src is any IP address owned by
                  the router port and udp.dst == 3784, the packet is
                  advanced to the next pipeline stage.

                • if ip4.dst or ip6.dst is any IP address owned by
                  the router port and udp.dst == 3784, the
                  handle_bfd_msg action is executed.
1660
1661 • L3 admission control: A priority-100 flow drops packets
1662 that match any of the following:
1663
1664 • ip4.src[28..31] == 0xe (multicast source)
1665
1666 • ip4.src == 255.255.255.255 (broadcast source)
1667
1668 • ip4.src == 127.0.0.0/8 || ip4.dst == 127.0.0.0/8
1669 (localhost source or destination)
1670
1671 • ip4.src == 0.0.0.0/8 || ip4.dst == 0.0.0.0/8 (zero
1672 network source or destination)
1673
1674 • ip4.src or ip6.src is any IP address owned by the
1675 router, unless the packet was recirculated due to
1676 egress loopback as indicated by REG‐
1677 BIT_EGRESS_LOOPBACK.
1678
1679 • ip4.src is the broadcast address of any IP network
1680 known to the router.
1681
1682 • A priority-100 flow parses DHCPv6 replies from IPv6 pre‐
1683 fix delegation routers (udp.src == 547 && udp.dst ==
                546). The handle_dhcpv6_reply action is used to send
                IPv6 prefix delegation messages to the delegation
                router.
1686
1687 • ICMP echo reply. These flows reply to ICMP echo requests
1688 received for the router’s IP address. Let A be an IP ad‐
1689 dress owned by a router port. Then, for each A that is an
1690 IPv4 address, a priority-90 flow matches on ip4.dst == A
1691 and icmp4.type == 8 && icmp4.code == 0 (ICMP echo re‐
1692 quest). For each A that is an IPv6 address, a priority-90
1693 flow matches on ip6.dst == A and icmp6.type == 128 &&
1694 icmp6.code == 0 (ICMPv6 echo request). The port of the
1695 router that receives the echo request does not matter.
1696 Also, the ip.ttl of the echo request packet is not
1697 checked, so it complies with RFC 1812, section 4.2.2.9.
1698 Flows for ICMPv4 echo requests use the following actions:
1699
1700 ip4.dst <-> ip4.src;
1701 ip.ttl = 255;
1702 icmp4.type = 0;
1703 flags.loopback = 1;
1704 next;
1705
1706
1707 Flows for ICMPv6 echo requests use the following actions:
1708
1709 ip6.dst <-> ip6.src;
1710 ip.ttl = 255;
1711 icmp6.type = 129;
1712 flags.loopback = 1;
1713 next;
1714
1715
1716 • Reply to ARP requests.
1717
1718 These flows reply to ARP requests for the router’s own IP
                address. ARP requests are handled only if the
                requestor’s IP belongs to one of the subnets of the
                logical router port. For each router port P that owns
                IP address
1722 A, which belongs to subnet S with prefix length L, and
1723 Ethernet address E, a priority-90 flow matches inport ==
1724 P && arp.spa == S/L && arp.op == 1 && arp.tpa == A (ARP
1725 request) with the following actions:
1726
1727 eth.dst = eth.src;
1728 eth.src = xreg0[0..47];
1729 arp.op = 2; /* ARP reply. */
1730 arp.tha = arp.sha;
1731 arp.sha = xreg0[0..47];
1732 arp.tpa = arp.spa;
1733 arp.spa = A;
1734 outport = inport;
1735 flags.loopback = 1;
1736 output;
1737
1738
1739 For the gateway port on a distributed logical router
1740 (where one of the logical router ports specifies a gate‐
1741 way chassis), the above flows are only programmed on the
1742 gateway port instance on the gateway chassis. This behav‐
1743 ior avoids generation of multiple ARP responses from dif‐
1744 ferent chassis, and allows upstream MAC learning to point
1745 to the gateway chassis.
1746
1747 For the logical router port with the option reside-on-re‐
1748 direct-chassis set (which is centralized), the above
1749 flows are only programmed on the gateway port instance on
1750 the gateway chassis (if the logical router has a distrib‐
1751 uted gateway port). This behavior avoids generation of
1752 multiple ARP responses from different chassis, and allows
1753 upstream MAC learning to point to the gateway chassis.
1754
1755 • Reply to IPv6 Neighbor Solicitations. These flows reply
1756 to Neighbor Solicitation requests for the router’s own
1757 IPv6 address and populate the logical router’s mac bind‐
1758 ing table.
1759
1760 For each router port P that owns IPv6 address A, so‐
1761 licited node address S, and Ethernet address E, a prior‐
1762 ity-90 flow matches inport == P && nd_ns && ip6.dst ==
1763 {A, E} && nd.target == A with the following actions:
1764
1765 nd_na_router {
1766 eth.src = xreg0[0..47];
1767 ip6.src = A;
1768 nd.target = A;
1769 nd.tll = xreg0[0..47];
1770 outport = inport;
1771 flags.loopback = 1;
1772 output;
1773 };
1774
1775
1776 For the gateway port on a distributed logical router
1777 (where one of the logical router ports specifies a gate‐
1778 way chassis), the above flows replying to IPv6 Neighbor
1779 Solicitations are only programmed on the gateway port in‐
1780 stance on the gateway chassis. This behavior avoids gen‐
1781 eration of multiple replies from different chassis, and
1782 allows upstream MAC learning to point to the gateway
1783 chassis.
1784
1785 • These flows reply to ARP requests or IPv6 neighbor solic‐
1786 itation for the virtual IP addresses configured in the
1787 router for NAT (both DNAT and SNAT) or load balancing.
1788
1789 IPv4: For a configured NAT (both DNAT and SNAT) IP ad‐
1790 dress or a load balancer IPv4 VIP A, for each router port
1791 P with Ethernet address E, a priority-90 flow matches
1792 arp.op == 1 && arp.tpa == A (ARP request) with the fol‐
1793 lowing actions:
1794
1795 eth.dst = eth.src;
1796 eth.src = xreg0[0..47];
1797 arp.op = 2; /* ARP reply. */
1798 arp.tha = arp.sha;
1799 arp.sha = xreg0[0..47];
1800 arp.tpa = arp.spa;
1801 arp.spa = A;
1802 outport = inport;
1803 flags.loopback = 1;
1804 output;
1805
1806
1807 IPv4: For a configured load balancer IPv4 VIP, a similar
1808 flow is added with the additional match inport == P.
1809
                If the router port P is a distributed gateway router
                port, then is_chassis_resident(P) is also added to
                the match condition for the load balancer IPv4 VIP A.
1813
1814 IPv6: For a configured NAT (both DNAT and SNAT) IP ad‐
1815 dress or a load balancer IPv6 VIP A, solicited node ad‐
1816 dress S, for each router port P with Ethernet address E,
1817 a priority-90 flow matches inport == P && nd_ns &&
1818 ip6.dst == {A, S} && nd.target == A with the following
1819 actions:
1820
1821 eth.dst = eth.src;
1822 nd_na {
1823 eth.src = xreg0[0..47];
1824 nd.tll = xreg0[0..47];
1825 ip6.src = A;
1826 nd.target = A;
1827 outport = inport;
1828 flags.loopback = 1;
1829 output;
1830 }
1831
1832
                If the router port P is a distributed gateway router
                port, then is_chassis_resident(P) is also added to
                the match condition for the load balancer IPv6 VIP A.
1836
1837 For the gateway port on a distributed logical router with
1838 NAT (where one of the logical router ports specifies a
1839 gateway chassis):
1840
1841 • If the corresponding NAT rule cannot be handled in
1842 a distributed manner, then a priority-92 flow is
1843 programmed on the gateway port instance on the
1844 gateway chassis. A priority-91 drop flow is pro‐
1845 grammed on the other chassis when ARP requests/NS
1846 packets are received on the gateway port. This be‐
1847 havior avoids generation of multiple ARP responses
1848 from different chassis, and allows upstream MAC
1849 learning to point to the gateway chassis.
1850
1851 • If the corresponding NAT rule can be handled in a
1852 distributed manner, then this flow is only pro‐
1853 grammed on the gateway port instance where the
1854 logical_port specified in the NAT rule resides.
1855
1856 Some of the actions are different for this case,
1857 using the external_mac specified in the NAT rule
1858 rather than the gateway port’s Ethernet address E:
1859
1860 eth.src = external_mac;
1861 arp.sha = external_mac;
1862
1863
                or in the case of IPv6 neighbor solicitation:
1865
1866 eth.src = external_mac;
1867 nd.tll = external_mac;
1868
1869
1870 This behavior avoids generation of multiple ARP
1871 responses from different chassis, and allows up‐
1872 stream MAC learning to point to the correct chas‐
1873 sis.
1874
              • Priority-85 flows which drop the ARP and IPv6
                Neighbor Discovery packets.
1877
1878 • A priority-84 flow explicitly allows IPv6 multicast traf‐
1879 fic that is supposed to reach the router pipeline (i.e.,
1880 router solicitation and router advertisement packets).
1881
1882 • A priority-83 flow explicitly drops IPv6 multicast traf‐
1883 fic that is destined to reserved multicast groups.
1884
1885 • A priority-82 flow allows IP multicast traffic if op‐
1886 tions:mcast_relay=’true’, otherwise drops it.
1887
1888 • UDP port unreachable. Priority-80 flows generate ICMP
1889 port unreachable messages in reply to UDP datagrams di‐
1890 rected to the router’s IP address, except in the special
1891 case of gateways, which accept traffic directed to a
1892 router IP for load balancing and NAT purposes.
1893
1894 These flows should not match IP fragments with nonzero
1895 offset.
1896
1897 • TCP reset. Priority-80 flows generate TCP reset messages
1898 in reply to TCP datagrams directed to the router’s IP ad‐
1899 dress, except in the special case of gateways, which ac‐
1900 cept traffic directed to a router IP for load balancing
1901 and NAT purposes.
1902
1903 These flows should not match IP fragments with nonzero
1904 offset.
1905
1906 • Protocol or address unreachable. Priority-70 flows gener‐
1907 ate ICMP protocol or address unreachable messages for
1908 IPv4 and IPv6 respectively in reply to packets directed
1909 to the router’s IP address on IP protocols other than
1910 UDP, TCP, and ICMP, except in the special case of gate‐
1911 ways, which accept traffic directed to a router IP for
1912 load balancing purposes.
1913
1914 These flows should not match IP fragments with nonzero
1915 offset.
1916
1917 • Drop other IP traffic to this router. These flows drop
1918 any other traffic destined to an IP address of this
1919 router that is not already handled by one of the flows
1920 above, which amounts to ICMP (other than echo requests)
1921 and fragments with nonzero offsets. For each IP address A
1922 owned by the router, a priority-60 flow matches ip4.dst
1923 == A or ip6.dst == A and drops the traffic. An exception
1924 is made and the above flow is not added if the router
1925 port’s own IP address is used to SNAT packets passing
1926 through that router.
1927
1928 The flows above handle all of the traffic that might be directed to the
1929 router itself. The following flows (with lower priorities) handle the
1930 remaining traffic, potentially for forwarding:
1931
1932 • Drop Ethernet local broadcast. A priority-50 flow with
1933 match eth.bcast drops traffic destined to the local Eth‐
1934 ernet broadcast address. By definition this traffic
1935 should not be forwarded.
1936
1937 • ICMP time exceeded. For each router port P, whose IP ad‐
1938 dress is A, a priority-40 flow with match inport == P &&
1939 ip.ttl == {0, 1} && !ip.later_frag matches packets whose
1940 TTL has expired, with the following actions to send an
1941 ICMP time exceeded reply for IPv4 and IPv6 respectively:
1942
1943 icmp4 {
1944 icmp4.type = 11; /* Time exceeded. */
1945 icmp4.code = 0; /* TTL exceeded in transit. */
1946 ip4.dst = ip4.src;
1947 ip4.src = A;
1948 ip.ttl = 255;
1949 next;
1950 };
1951 icmp6 {
1952 icmp6.type = 3; /* Time exceeded. */
1953 icmp6.code = 0; /* TTL exceeded in transit. */
1954 ip6.dst = ip6.src;
1955 ip6.src = A;
1956 ip.ttl = 255;
1957 next;
1958 };
1959
1960
              • TTL discard. A priority-30 flow with match ip.ttl ==
                {0, 1} and actions drop; drops other packets whose
                TTL has expired and that should not receive an ICMP
                error reply (i.e., fragments with nonzero offset).
1965
              • Next table. A priority-0 flow matches all packets
                that aren’t already handled and uses action next; to
                feed them to the next table.
1969
1970 Ingress Table 4: DEFRAG
1971
       This table sends packets to the connection tracker for
       tracking and defragmentation. It contains a priority-0 flow
       that simply moves traffic to the next table.
1975
1976 If load balancing rules with virtual IP addresses (and ports) are con‐
1977 figured in OVN_Northbound database for a Gateway router, a priority-100
1978 flow is added for each configured virtual IP address VIP. For IPv4 VIPs
1979 the flow matches ip && ip4.dst == VIP. For IPv6 VIPs, the flow matches
1980 ip && ip6.dst == VIP. The flow uses the action ct_next; to send IP
1981 packets to the connection tracker for packet de-fragmentation and
1982 tracking before sending it to the next table.
1983
1984 If ECMP routes with symmetric reply are configured in the OVN_North‐
1985 bound database for a gateway router, a priority-100 flow is added for
1986 each router port on which symmetric replies are configured. The match‐
1987 ing logic for these ports essentially reverses the configured logic of
1988 the ECMP route. So for instance, a route with a destination routing
1989 policy will instead match if the source IP address matches the static
1990 route’s prefix. The flow uses the action ct_next to send IP packets to
1991 the connection tracker for packet de-fragmentation and tracking before
1992 sending it to the next table.
1993
1994 Ingress Table 5: UNSNAT
1995
       This is for already established connections’ reverse traffic,
       i.e., SNAT has already been done in the egress pipeline and
       now the packet has entered the ingress pipeline as part of a
       reply. It is unSNATted here.
1999
2000 Ingress Table 5: UNSNAT on Gateway and Distributed Routers
2001
              • If the router (Gateway or Distributed) is configured
                with load balancers, then the following logical flows
                are added:

                For each IPv4 address A defined as a load balancer
                VIP with protocol P (and protocol port T, if defined)
                that is also present as an external_ip in the NAT
                table, a priority-120 logical flow is added with the
                match ip4 && ip4.dst == A && P with the action next;
                to advance the packet to the next table. If the load
                balancer has protocol port T defined, then the match
                also has P.dst == T.
2012
2013 The above flows are also added for IPv6 load balancers.
2014
2015 Ingress Table 5: UNSNAT on Gateway Routers
2016
2017 • If the Gateway router has been configured to force SNAT
2018 any previously DNATted packets to B, a priority-110 flow
2019 matches ip && ip4.dst == B or ip && ip6.dst == B with an
                action ct_snat;.
2021
2022 If the Gateway router is configured with
2023 lb_force_snat_ip=router_ip then for every logical router
2024 port P attached to the Gateway router with the router ip
2025 B, a priority-110 flow is added with the match inport ==
2026 P && ip4.dst == B or inport == P && ip6.dst == B with an
                action ct_snat;.
2028
2029 If the Gateway router has been configured to force SNAT
2030 any previously load-balanced packets to B, a priority-100
2031 flow matches ip && ip4.dst == B or ip && ip6.dst == B
                with an action ct_snat;.
2033
2034 For each NAT configuration in the OVN Northbound data‐
2035 base, that asks to change the source IP address of a
2036 packet from A to B, a priority-90 flow matches ip &&
2037 ip4.dst == B or ip && ip6.dst == B with an action
                ct_snat;. If the NAT rule is of type dnat_and_snat
                and has stateless=true in the options, then the
                action would be ip4/6.dst=(B).
2041
2042 A priority-0 logical flow with match 1 has actions next;.
2043
2044 Ingress Table 5: UNSNAT on Distributed Routers
2045
2046 • For each configuration in the OVN Northbound database,
2047 that asks to change the source IP address of a packet
2048 from A to B, a priority-100 flow matches ip && ip4.dst ==
2049 B && inport == GW or ip && ip6.dst == B && inport == GW
2050 where GW is the logical router gateway port, with an ac‐
2051 tion ct_snat;. If the NAT rule is of type dnat_and_snat
2052 and has stateless=true in the options, then the action
                would be ip4/6.dst=(B).
2054
2055 If the NAT rule cannot be handled in a distributed man‐
2056 ner, then the priority-100 flow above is only programmed
2057 on the gateway chassis.
2058
2059 A priority-0 logical flow with match 1 has actions next;.
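
       The NAT entries handled by this table are created with
       ovn-nbctl lr-nat-add. A minimal sketch, with the router name
       and addresses chosen for illustration:

              # SNAT traffic from 10.0.0.0/24 (A) to the external
              # IP 172.16.1.1 (B).
              $ ovn-nbctl lr-nat-add lr0 snat 172.16.1.1 10.0.0.0/24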
2060
2061 Ingress Table 6: DNAT
2062
       Packets enter the pipeline with a destination IP address that
       needs to be DNATted from a virtual IP address to a real IP
       address. Packets in the reverse direction need to be
       unDNATted.
2066
2067 Ingress Table 6: Load balancing DNAT rules
2068
       The following load balancing DNAT flows are added for a
       Gateway router or a router with a gateway port. These flows
       are programmed only on the gateway chassis. These flows do not
       get programmed for load balancers with IPv6 VIPs.
2073
              • If controller_event has been enabled for all the
                configured load balancing rules for a Gateway router
                or router with a gateway port in the OVN_Northbound
                database that do not have configured backends, a
                priority-130 flow is added to trigger ovn-controller
                events whenever the chassis receives a packet for
                that particular VIP. If the event-elb meter has been
                previously created, it will be associated with the
                empty_lb logical flow.
2082
2083 • For all the configured load balancing rules for a Gateway
2084 router or Router with gateway port in OVN_Northbound
2085 database that includes a L4 port PORT of protocol P and
2086 IPv4 or IPv6 address VIP, a priority-120 flow that
2087 matches on ct.new && ip && ip4.dst == VIP && P && P.dst
2088 == PORT
2089 (ip6.dst == VIP in the IPv6 case) with an action of
2090 ct_lb(args), where args contains comma separated IPv4 or
2091 IPv6 addresses (and optional port numbers) to load bal‐
2092 ance to. If the router is configured to force SNAT any
2093 load-balanced packets, the above action will be replaced
2094 by flags.force_snat_for_lb = 1; ct_lb(args);. If the load
2095 balancing rule is configured with skip_snat set to true,
2096 the above action will be replaced by
2097 flags.skip_snat_for_lb = 1; ct_lb(args);. If health check
2098 is enabled, then args will only contain those endpoints
2099 whose service monitor status entry in OVN_Southbound db
2100 is either online or empty.
2101
2102 • For all the configured load balancing rules for a router
2103 in OVN_Northbound database that includes a L4 port PORT
2104 of protocol P and IPv4 or IPv6 address VIP, a prior‐
2105 ity-120 flow that matches on ct.est && ip && ip4.dst ==
2106 VIP && P && P.dst == PORT
2107 (ip6.dst == VIP in the IPv6 case) with an action of
2108 ct_dnat;. If the router is configured to force SNAT any
2109 load-balanced packets, the above action will be replaced
2110 by flags.force_snat_for_lb = 1; ct_dnat;. If the load
2111 balancing rule is configured with skip_snat set to true,
2112 the above action will be replaced by
2113 flags.skip_snat_for_lb = 1; ct_dnat;.
2114
2115 • For all the configured load balancing rules for a router
2116 in OVN_Northbound database that includes just an IP ad‐
2117 dress VIP to match on, a priority-110 flow that matches
2118 on ct.new && ip && ip4.dst == VIP (ip6.dst == VIP in the
2119 IPv6 case) with an action of ct_lb(args), where args con‐
2120 tains comma separated IPv4 or IPv6 addresses. If the
2121 router is configured to force SNAT any load-balanced
2122 packets, the above action will be replaced by
2123 flags.force_snat_for_lb = 1; ct_lb(args);. If the load
2124 balancing rule is configured with skip_snat set to true,
2125 the above action will be replaced by
2126 flags.skip_snat_for_lb = 1; ct_lb(args);.
2127
2128 • For all the configured load balancing rules for a router
2129 in OVN_Northbound database that includes just an IP ad‐
2130 dress VIP to match on, a priority-110 flow that matches
2131 on ct.est && ip && ip4.dst == VIP (or ip6.dst == VIP)
2132 with an action of ct_dnat;. If the router is configured
2133 to force SNAT any load-balanced packets, the above action
2134 will be replaced by flags.force_snat_for_lb = 1;
2135 ct_dnat;. If the load balancing rule is configured with
2136 skip_snat set to true, the above action will be replaced
2137 by flags.skip_snat_for_lb = 1; ct_dnat;.
2138
              • If the load balancer is created with the --reject
                option and it has no active backends, a TCP reset
                segment (for tcp) or an ICMP port unreachable packet
                (for all other kinds of traffic) will be sent
                whenever an incoming packet is received for this load
                balancer. Please note that using the --reject option
                will disable the empty_lb SB controller event for
                this load balancer.
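
       The flows in this table correspond to northbound load
       balancer configuration along these lines (a sketch; names,
       VIP, and backends are illustrative):

              # VIP 172.16.1.100:80 with two TCP backends.
              $ ovn-nbctl lb-add lb0 172.16.1.100:80 \
                    "10.0.0.10:80,10.0.0.11:80" tcp
              # Attach the load balancer to the router.
              $ ovn-nbctl lr-lb-add lr0 lb0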
2146
2147 Ingress Table 6: DNAT on Gateway Routers
2148
2149 • For each configuration in the OVN Northbound database,
2150 that asks to change the destination IP address of a
2151 packet from A to B, a priority-100 flow matches ip &&
2152 ip4.dst == A or ip && ip6.dst == A with an action
2153 flags.loopback = 1; ct_dnat(B);. If the Gateway router is
2154 configured to force SNAT any DNATed packet, the above ac‐
2155 tion will be replaced by flags.force_snat_for_dnat = 1;
2156 flags.loopback = 1; ct_dnat(B);. If the NAT rule is of
2157 type dnat_and_snat and has stateless=true in the options,
2158 then the action would be ip4/6.dst= (B).
2159
                If the NAT rule has allowed_ext_ips configured, then
                there is an additional match ip4.src ==
                allowed_ext_ips. Similarly, for IPv6, the match would
                be ip6.src == allowed_ext_ips.

                If the NAT rule has exempted_ext_ips set, then there
                is an additional flow configured at priority 101. The
                flow matches if the source IP is an exempted_ext_ip
                and the action is next;. This flow is used to bypass
                the ct_dnat action for a packet originating from
                exempted_ext_ips.
2170
2171 • For all IP packets of a Gateway router, a priority-50
2172 flow with an action flags.loopback = 1; ct_dnat;.
2173
2174 • A priority-0 logical flow with match 1 has actions next;.
2175
2176 Ingress Table 6: DNAT on Distributed Routers
2177
2178 On distributed routers, the DNAT table only handles packets with desti‐
2179 nation IP address that needs to be DNATted from a virtual IP address to
2180 a real IP address. The unDNAT processing in the reverse direction is
2181 handled in a separate table in the egress pipeline.
2182
2183 • For each configuration in the OVN Northbound database,
2184 that asks to change the destination IP address of a
2185 packet from A to B, a priority-100 flow matches ip &&
2186 ip4.dst == B && inport == GW, where GW is the logical
2187 router gateway port, with an action ct_dnat(B);. The
2188 match will include ip6.dst == B in the IPv6 case. If the
2189 NAT rule is of type dnat_and_snat and has stateless=true
2190 in the options, then the action would be ip4/6.dst=(B).
2191
2192 If the NAT rule cannot be handled in a distributed man‐
2193 ner, then the priority-100 flow above is only programmed
2194 on the gateway chassis.
2195
                If the NAT rule has allowed_ext_ips configured, then
                there is an additional match ip4.src ==
                allowed_ext_ips. Similarly, for IPv6, the match would
                be ip6.src == allowed_ext_ips.

                If the NAT rule has exempted_ext_ips set, then there
                is an additional flow configured at priority 101. The
                flow matches if the source IP is an exempted_ext_ip
                and the action is next;. This flow is used to bypass
                the ct_dnat action for a packet originating from
                exempted_ext_ips.
2206
2207 A priority-0 logical flow with match 1 has actions next;.
2208
2209 Ingress Table 7: ECMP symmetric reply processing
2210
2211 • If ECMP routes with symmetric reply are configured in the
2212 OVN_Northbound database for a gateway router, a prior‐
2213 ity-100 flow is added for each router port on which sym‐
2214 metric replies are configured. The matching logic for
2215 these ports essentially reverses the configured logic of
2216 the ECMP route. So for instance, a route with a destina‐
2217 tion routing policy will instead match if the source IP
                address matches the static route’s prefix. The flow
                uses the action ct_commit { ct_label.ecmp_reply_eth =
                eth.src; ct_label.ecmp_reply_port = K; }; next; to
                commit the connection and store eth.src and the ECMP
                reply port binding tunnel key K in the ct_label.
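
       An ECMP route with symmetric replies can be configured as in
       this sketch, assuming the ovn-nbctl in use supports the
       --ecmp-symmetric-reply flag (router, prefix, and nexthops are
       illustrative):

              $ ovn-nbctl --ecmp-symmetric-reply lr-route-add lr0 \
                    192.168.100.0/24 172.16.1.2
              $ ovn-nbctl --ecmp-symmetric-reply lr-route-add lr0 \
                    192.168.100.0/24 172.16.1.3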
2223
2224 Ingress Table 8: IPv6 ND RA option processing
2225
2226 • A priority-50 logical flow is added for each logical
2227 router port configured with IPv6 ND RA options which
2228 matches IPv6 ND Router Solicitation packet and applies
2229 the action put_nd_ra_opts and advances the packet to the
2230 next table.
2231
                reg0[5] = put_nd_ra_opts(options); next;
2233
2234
                For a valid IPv6 ND RS packet, this transforms the
                packet into an IPv6 ND RA reply, sets the RA options
                on the packet, and stores 1 into reg0[5]. For other
                kinds of packets, it just stores 0 into reg0[5].
                Either way, it continues to the next table.
2240
2241 • A priority-0 logical flow with match 1 has actions next;.
2242
2243 Ingress Table 9: IPv6 ND RA responder
2244
2245 This table implements IPv6 ND RA responder for the IPv6 ND RA replies
2246 generated by the previous table.
2247
2248 • A priority-50 logical flow is added for each logical
2249 router port configured with IPv6 ND RA options which
2250 matches IPv6 ND RA packets and reg0[5] == 1 and responds
2251 back to the inport after applying these actions. If
2252 reg0[5] is set to 1, it means that the action
2253 put_nd_ra_opts was successful.
2254
2255 eth.dst = eth.src;
2256 eth.src = E;
2257 ip6.dst = ip6.src;
2258 ip6.src = I;
2259 outport = P;
2260 flags.loopback = 1;
2261 output;
2262
2263
2264 where E is the MAC address and I is the IPv6 link local
2265 address of the logical router port.
2266
2267 (This terminates packet processing in ingress pipeline;
2268 the packet does not go to the next ingress table.)
2269
2270 • A priority-0 logical flow with match 1 has actions next;.
2271
2272 Ingress Table 10: IP Routing
2273
2274 A packet that arrives at this table is an IP packet that should be
2275 routed to the address in ip4.dst or ip6.dst. This table implements IP
2276 routing, setting reg0 (or xxreg0 for IPv6) to the next-hop IP address
2277 (leaving ip4.dst or ip6.dst, the packet’s final destination, unchanged)
2278 and advances to the next table for ARP resolution. It also sets reg1
2279 (or xxreg1) to the IP address owned by the selected router port
2280 (ingress table ARP Request will generate an ARP request, if needed,
2281 with reg0 as the target protocol address and reg1 as the source proto‐
2282 col address).
2283
       For ECMP routes, i.e., multiple static routes with the same
       policy and prefix but different nexthops, the above actions
       are deferred to the next table. This table, instead, is
       responsible for determining the ECMP group id and selecting a
       member id within the group based on 5-tuple hashing. It
2288 stores group id in reg8[0..15] and member id in reg8[16..31]. This step
2289 is skipped if the traffic going out the ECMP route is reply traffic,
2290 and the ECMP route was configured to use symmetric replies. Instead,
2291 the stored ct_label value is used to choose the destination. The least
2292 significant 48 bits of the ct_label tell the destination MAC address to
2293 which the packet should be sent. The next 16 bits tell the logical
2294 router port on which the packet should be sent. These values in the
2295 ct_label are set when the initial ingress traffic is received over the
2296 ECMP route.
2297
2298 This table contains the following logical flows:
2299
2300 • Priority-550 flow that drops IPv6 Router Solicitation/Ad‐
2301 vertisement packets that were not processed in previous
2302 tables.
2303
2304 • Priority-500 flows that match IP multicast traffic des‐
2305 tined to groups registered on any of the attached
2306 switches and sets outport to the associated multicast
2307 group that will eventually flood the traffic to all in‐
2308 terested attached logical switches. The flows also decre‐
2309 ment TTL.
2310
2311 • Priority-450 flow that matches unregistered IP multicast
2312 traffic and sets outport to the MC_STATIC multicast
2313 group, which ovn-northd populates with the logical ports
                that have options:mcast_flood=’true’. If no router ports
2315 are configured to flood multicast traffic the packets are
2316 dropped.
2317
2318 • IPv4 routing table. For each route to IPv4 network N with
2319 netmask M, on router port P with IP address A and Ether‐
2320 net address E, a logical flow with match ip4.dst == N/M,
2321 whose priority is the number of 1-bits in M, has the fol‐
2322 lowing actions:
2323
2324 ip.ttl--;
2325 reg8[0..15] = 0;
2326 reg0 = G;
2327 reg1 = A;
2328 eth.src = E;
2329 outport = P;
2330 flags.loopback = 1;
2331 next;
2332
2333
2334 (Ingress table 1 already verified that ip.ttl--; will not
2335 yield a TTL exceeded error.)
2336
                If the route has a gateway, G is the gateway IP
                address. Otherwise, if the route is from a configured
                static route, G is the next hop IP address. Else it
                is ip4.dst.
2340
2341 • IPv6 routing table. For each route to IPv6 network N with
2342 netmask M, on router port P with IP address A and Ether‐
2343 net address E, a logical flow with match in CIDR notation
2344 ip6.dst == N/M, whose priority is the integer value of M,
2345 has the following actions:
2346
2347 ip.ttl--;
2348 reg8[0..15] = 0;
2349 xxreg0 = G;
2350 xxreg1 = A;
2351 eth.src = E;
                outport = P;
2353 flags.loopback = 1;
2354 next;
2355
2356
2357 (Ingress table 1 already verified that ip.ttl--; will not
2358 yield a TTL exceeded error.)
2359
                If the route has a gateway, G is the gateway IP
                address. Otherwise, if the route is from a configured
                static route, G is the next hop IP address. Else it
                is ip6.dst.
2363
2364 If the address A is in the link-local scope, the route
2365 will be limited to sending on the ingress port.
2366
              • ECMP routes are grouped by policy and prefix. A
                unique id (non-zero) is assigned to each group, and
                each member is also assigned a unique id (non-zero)
                within its group.
2371
2372 For each IPv4/IPv6 ECMP group with group id GID and mem‐
2373 ber ids MID1, MID2, ..., a logical flow with match in
2374 CIDR notation ip4.dst == N/M, or ip6.dst == N/M, whose
2375 priority is the integer value of M, has the following ac‐
2376 tions:
2377
2378 ip.ttl--;
2379 flags.loopback = 1;
2380 reg8[0..15] = GID;
2381 select(reg8[16..31], MID1, MID2, ...);
2382
2383
2384 Ingress Table 11: IP_ROUTING_ECMP
2385
2386 This table implements the second part of IP routing for ECMP routes
       following the previous table. If a packet matched an ECMP group in the
2388 previous table, this table matches the group id and member id stored
2389 from the previous table, setting reg0 (or xxreg0 for IPv6) to the next-
2390 hop IP address (leaving ip4.dst or ip6.dst, the packet’s final destina‐
2391 tion, unchanged) and advances to the next table for ARP resolution. It
2392 also sets reg1 (or xxreg1) to the IP address owned by the selected
2393 router port (ingress table ARP Request will generate an ARP request, if
2394 needed, with reg0 as the target protocol address and reg1 as the source
2395 protocol address).
2396
2397 This processing is skipped for reply traffic being sent out of an ECMP
2398 route if the route was configured to use symmetric replies.
2399
2400 This table contains the following logical flows:
2401
              • A priority-150 flow that matches reg8[0..15] == 0
                with action next;, which directly bypasses packets of
                non-ECMP routes.
2405
2406 • For each member with ID MID in each ECMP group with ID
2407 GID, a priority-100 flow with match reg8[0..15] == GID &&
       reg8[16..31] == MID has the following actions:
2409
2410 [xx]reg0 = G;
2411 [xx]reg1 = A;
2412 eth.src = E;
2413 outport = P;
2414
2415
2416 Ingress Table 12: Router policies
2417
2418 This table adds flows for the logical router policies configured on the
2419 logical router. Please see the OVN_Northbound database Logi‐
2420 cal_Router_Policy table documentation in ovn-nb for supported actions.
2421
2422 • For each router policy configured on the logical router,
2423 a logical flow is added with specified priority, match
2424 and actions.
2425
2426 • If the policy action is reroute with 2 or more nexthops
2427 defined, then the logical flow is added with the follow‐
2428 ing actions:
2429
                reg8[0..15] = GID;
                select(reg8[16..31], 1, ..., n);
2432
2433
                where GID is the ECMP group id generated by
                ovn-northd for this policy and n is the number of
                nexthops. The select action selects one of the
                nexthop member ids, stores it in register
                reg8[16..31], and advances the packet to the next
                stage.
2439
              • If the policy action is reroute with just one nexthop,
2441 then the logical flow is added with the following ac‐
2442 tions:
2443
2444 [xx]reg0 = H;
2445 eth.src = E;
2446 outport = P;
2447 reg8[0..15] = 0;
2448 flags.loopback = 1;
2449 next;
2450
2451
                where H is the nexthop defined in the router policy,
                E is the Ethernet address of the logical router port
                from which the nexthop is reachable, and P is that
                logical router port.
2456
2457 • If a router policy has the option pkt_mark=m set and if
2458 the action is not drop, then the action also includes
2459 pkt.mark = m to mark the packet with the marker m.
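
       A reroute policy with multiple nexthops, which produces the
       ECMP group flows described above, might be configured as in
       this sketch (router, priority, match, and nexthops are
       illustrative):

              $ ovn-nbctl lr-policy-add lr0 1000 \
                    "ip4.src == 10.0.0.0/24" reroute \
                    172.16.1.2,172.16.1.3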
2460
2461 Ingress Table 13: ECMP handling for router policies
2462
2463 This table handles the ECMP for the router policies configured with
2464 multiple nexthops.
2465
2466 • A priority-150 flow is added to advance the packet to the
2467 next stage if the ECMP group id register reg8[0..15] is
2468 0.
2469
2470 • For each ECMP reroute router policy with multiple nex‐
2471 thops, a priority-100 flow is added for each nexthop H
2472 with the match reg8[0..15] == GID && reg8[16..31] == M
2473 where GID is the router policy group id generated by
2474 ovn-northd and M is the member id of the nexthop H gener‐
2475 ated by ovn-northd. The following actions are added to
2476 the flow:
2477
2478 [xx]reg0 = H;
2479 eth.src = E;
2480 outport = P
2481 "flags.loopback = 1; "
2482 "next;"
2483
2484
                where H is the nexthop defined in the router policy,
                E is the Ethernet address of the logical router port
                from which the nexthop is reachable, and P is that
                logical router port.
2489
2490 Ingress Table 14: ARP/ND Resolution
2491
2492 Any packet that reaches this table is an IP packet whose next-hop IPv4
2493 address is in reg0 or IPv6 address is in xxreg0. (ip4.dst or ip6.dst
2494 contains the final destination.) This table resolves the IP address in
2495 reg0 (or xxreg0) into an output port in outport and an Ethernet address
2496 in eth.dst, using the following flows:
2497
2498 • A priority-500 flow that matches IP multicast traffic
2499 that was allowed in the routing pipeline. For this kind
2500 of traffic the outport was already set so the flow just
2501 advances to the next table.
2502
2503 • Static MAC bindings. MAC bindings can be known statically
2504 based on data in the OVN_Northbound database. For router
2505 ports connected to logical switches, MAC bindings can be
2506 known statically from the addresses column in the Logi‐
2507 cal_Switch_Port table. For router ports connected to
2508 other logical routers, MAC bindings can be known stati‐
2509 cally from the mac and networks column in the Logi‐
2510 cal_Router_Port table. (Note: the flow is NOT installed
2511 for the IP addresses that belong to a neighbor logical
                router port if the current router has
                options:dynamic_neigh_routers set to true.)
2514
2515 For each IPv4 address A whose host is known to have Eth‐
2516 ernet address E on router port P, a priority-100 flow
                with match outport == P && reg0 == A has actions eth.dst
2518 = E; next;.
2519
                For each virtual IP A configured on a logical port of
                type virtual with its virtual parent set in its
                corresponding Port_Binding record, where the virtual
                parent has Ethernet address E and the virtual IP is
                reachable via the router port P, a priority-100 flow
                with match outport == P && reg0 == A has actions
                eth.dst = E; next;.

                For each virtual IP A configured on a logical port of
                type virtual with its virtual parent not set in its
                corresponding Port_Binding record, where the virtual
                IP A is reachable via the router port P, a
                priority-100 flow with match outport == P && reg0 ==
                A has actions eth.dst = 00:00:00:00:00:00; next;.
                This flow is added so that the ARP is always resolved
                for the virtual IP A by generating an ARP request,
                without consulting the MAC_Binding table, as that
                table can have an incorrect value for the virtual IP
                A.
2537
2538 For each IPv6 address A whose host is known to have Eth‐
2539 ernet address E on router port P, a priority-100 flow
                with match outport == P && xxreg0 == A has actions
2541 eth.dst = E; next;.
2542
2543 For each logical router port with an IPv4 address A and a
2544 mac address of E that is reachable via a different logi‐
2545 cal router port P, a priority-100 flow with match outport
                == P && reg0 == A has actions eth.dst = E; next;.
2547
2548 For each logical router port with an IPv6 address A and a
2549 mac address of E that is reachable via a different logi‐
2550 cal router port P, a priority-100 flow with match outport
                == P && xxreg0 == A has actions eth.dst = E; next;.
2552
              • Static MAC bindings from NAT entries. MAC bindings
                can also be known for the entries in the NAT table.
                The following flows are programmed for distributed
                logical routers, i.e., those with a distributed
                router port.

                For each row in the NAT table with IPv4 address A in
                the external_ip column, a priority-100 flow with the
                match outport == P && reg0 == A has actions eth.dst =
                E; next;, where P is the distributed logical router
                port and E is the Ethernet address if set in the
                external_mac column of the NAT table for rules of
                type dnat_and_snat, otherwise the Ethernet address of
                the distributed logical router port.
2566
                For IPv6 NAT entries, the same flows are added, but
                using the register xxreg0 for the match.
2569
              • Traffic with an IP destination address owned by the
                router should be dropped. Such traffic is normally
                dropped in ingress table IP Input, except for IPs
                that are also shared with SNAT rules. However, if no
                unSNAT operation happened successfully up to this
                point in the pipeline and the destination IP of the
                packet is still a router-owned IP, the packets can be
                safely dropped.

                A priority-1 logical flow with match ip4.dst == {..}
                matches on traffic destined to router-owned IPv4
                addresses which are also SNAT IPs. This flow has
                action drop;.

                A priority-1 logical flow with match ip6.dst == {..}
                matches on traffic destined to router-owned IPv6
                addresses which are also SNAT IPs. This flow has
                action drop;.
2588
2589 • Dynamic MAC bindings. These flows resolve MAC-to-IP bind‐
2590 ings that have become known dynamically through ARP or
2591 neighbor discovery. (The ingress table ARP Request will
2592 issue an ARP or neighbor solicitation request for cases
2593 where the binding is not yet known.)
2594
2595 A priority-0 logical flow with match ip4 has actions
2596 get_arp(outport, reg0); next;.
2597
2598 A priority-0 logical flow with match ip6 has actions
2599 get_nd(outport, xxreg0); next;.
2600
              • For a distributed gateway LRP with redirect-type set
                to bridged, a priority-50 flow with match outport ==
                "ROUTER_PORT" &&
                !is_chassis_resident("cr-ROUTER_PORT") has actions
                eth.dst = E; next;, where E is the Ethernet address
                of the logical router port.
2606
2607 Ingress Table 15: Check packet length
2608
       For distributed logical routers with the distributed gateway
       port configured with options:gateway_mtu set to a valid
       integer value, this table adds a priority-50 logical flow with
       the match ip4 && outport == GW_PORT, where GW_PORT is the
       distributed gateway router port, and applies the
       check_pkt_larger action, advancing the packet to the next
       table.
2614
2615 REGBIT_PKT_LARGER = check_pkt_larger(L); next;
2616
2617
       where L is the packet length to check for. If the packet is
       larger than L, it stores 1 in the register bit
       REGBIT_PKT_LARGER. The value of L is taken from the
       options:gateway_mtu column of the Logical_Router_Port row.
2621
2622 This table adds one priority-0 fallback flow that matches all packets
2623 and advances to the next table.
2624
2625 Ingress Table 16: Handle larger packets
2626
       For distributed logical routers with the distributed gateway
       port configured with options:gateway_mtu set to a valid
       integer value, this table adds the following priority-50
       logical flow for each logical router port with the match
       inport == LRP && outport == GW_PORT && REGBIT_PKT_LARGER,
       where LRP is the logical router port and GW_PORT is the
       distributed gateway router port, and applies the following
       action for IPv4 and IPv6 respectively:
2634
2635 icmp4 {
2636 icmp4.type = 3; /* Destination Unreachable. */
2637 icmp4.code = 4; /* Frag Needed and DF was Set. */
2638 icmp4.frag_mtu = M;
2639 eth.dst = E;
2640 ip4.dst = ip4.src;
2641 ip4.src = I;
2642 ip.ttl = 255;
2643 REGBIT_EGRESS_LOOPBACK = 1;
2644 next(pipeline=ingress, table=0);
2645 };
2646 icmp6 {
2647 icmp6.type = 2;
2648 icmp6.code = 0;
2649 icmp6.frag_mtu = M;
2650 eth.dst = E;
2651 ip6.dst = ip6.src;
2652 ip6.src = I;
2653 ip.ttl = 255;
2654 REGBIT_EGRESS_LOOPBACK = 1;
2655 next(pipeline=ingress, table=0);
2656 };
2657
2658
              • Where M is the fragment MTU, i.e., the value of the
                options:gateway_mtu column of the Logical_Router_Port
                row minus 58.
2662
2663 • E is the Ethernet address of the logical router port.
2664
2665 • I is the IPv4/IPv6 address of the logical router port.
2666
2667 This table adds one priority-0 fallback flow that matches all packets
2668 and advances to the next table.
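
       For example, with the sketch below (port name and MTU value
       are illustrative), the fragment MTU M advertised in the ICMP
       errors above would be 1500 - 58 = 1442:

              $ ovn-nbctl set Logical_Router_Port lr0-public \
                    options:gateway_mtu=1500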
2669
2670 Ingress Table 17: Gateway Redirect
2671
2672 For distributed logical routers where one of the logical router ports
2673 specifies a gateway chassis, this table redirects certain packets to
2674 the distributed gateway port instance on the gateway chassis. This ta‐
2675 ble has the following flows:
2676
2677 • For each NAT rule in the OVN Northbound database that can
2678 be handled in a distributed manner, a priority-100 logi‐
2679 cal flow with match ip4.src == B && outport == GW &&
2680 is_chassis_resident(P), where GW is the logical router
2681 distributed gateway port and P is the NAT logical port.
                IP traffic matching the above rule will be managed
                locally, setting reg1 to C and eth.src to D, where C
                is the NAT external IP and D is the NAT external MAC.
2685
2686 • A priority-50 logical flow with match outport == GW has
2687 actions outport = CR; next;, where GW is the logical
2688 router distributed gateway port and CR is the chas‐
2689 sisredirect port representing the instance of the logical
2690 router distributed gateway port on the gateway chassis.
2691
2692 • A priority-0 logical flow with match 1 has actions next;.
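
       The chassisredirect port CR exists because a gateway chassis
       was assigned to the distributed gateway port, for example (a
       sketch; the port, chassis name, and priority are
       illustrative):

              $ ovn-nbctl lrp-set-gateway-chassis lr0-public \
                    chassis-1 20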
2693
2694 Ingress Table 18: ARP Request
2695
2696 In the common case where the Ethernet destination has been resolved,
2697 this table outputs the packet. Otherwise, it composes and sends an ARP
2698 or IPv6 Neighbor Solicitation request. It holds the following flows:
2699
2700 • Unknown MAC address. A priority-100 flow for IPv4 packets
2701 with match eth.dst == 00:00:00:00:00:00 has the following
2702 actions:
2703
2704 arp {
2705 eth.dst = ff:ff:ff:ff:ff:ff;
2706 arp.spa = reg1;
2707 arp.tpa = reg0;
2708 arp.op = 1; /* ARP request. */
2709 output;
2710 };
2711
2712
                Unknown MAC address. For each IPv6 static route
                associated with the router with nexthop IP G, a
                priority-200 flow for IPv6 packets with match eth.dst
                == 00:00:00:00:00:00 && xxreg0 == G with the
                following actions is added:
2718
2719 nd_ns {
2720 eth.dst = E;
                    ip6.dst = I;
2722 nd.target = G;
2723 output;
2724 };
2725
2726
                Where E is the multicast MAC address derived from the
                gateway IP, and I is the solicited-node multicast
                address corresponding to the target address G.
2730
2731 Unknown MAC address. A priority-100 flow for IPv6 packets
2732 with match eth.dst == 00:00:00:00:00:00 has the following
2733 actions:
2734
2735 nd_ns {
2736 nd.target = xxreg0;
2737 output;
2738 };
2739
2740
                (Ingress table IP Routing initialized reg1 with the
                IP address owned by outport and (xx)reg0 with the
                next-hop IP address.)
2744
2745 The IP packet that triggers the ARP/IPv6 NS request is
2746 dropped.
2747
2748 • Known MAC address. A priority-0 flow with match 1 has ac‐
2749 tions output;.
2750
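       The priority-200 IPv6 flows above are created per static route.
       A minimal sketch, assuming a router lr0 (the prefix and nexthop
       are illustrative):

              # Adding an IPv6 static route with nexthop G causes a
              # matching nd_ns flow for G in this table.
              ovn-nbctl lr-route-add lr0 2001:db8:1::/64 2001:db8::1
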
   Egress Table 0: UNDNAT

       This table handles the reverse traffic of already established
       connections: DNAT has already been done in the ingress pipeline,
       and the packet has now entered the egress pipeline as part of a
       reply. For NAT on a distributed router, it is unDNATted here.
       For Gateway routers, the unDNAT processing is carried out in the
       ingress DNAT table.

           • For all the configured load balancing rules for a router
             with a gateway port in the OVN_Northbound database that
             include an IPv4 address VIP, for every backend IPv4
             address B defined for the VIP, a priority-120 flow is
             programmed on the gateway chassis that matches ip &&
             ip4.src == B && outport == GW, where GW is the logical
             router gateway port, with an action ct_dnat;. If the
             backend IPv4 address B is also configured with L4 port
             PORT of protocol P, then the match also includes P.src ==
             PORT. These flows are not added for load balancers with
             IPv6 VIPs.

             If the router is configured to force SNAT any
             load-balanced packets, the above action is replaced by
             flags.force_snat_for_lb = 1; ct_dnat;.

           • For each configuration in the OVN Northbound database that
             asks to change the destination IP address of a packet from
             an IP address of A to B, a priority-100 flow matches ip &&
             ip4.src == B && outport == GW, where GW is the logical
             router gateway port, with an action ct_dnat;. If the NAT
             rule is of type dnat_and_snat and has stateless=true in
             the options, then the action would be ip4/6.src=(A),
             rewriting the source back to the external address A.

             If the NAT rule cannot be handled in a distributed manner,
             then the priority-100 flow above is only programmed on the
             gateway chassis.

             If the NAT rule can be handled in a distributed manner,
             then there is an additional action eth.src = EA;, where EA
             is the Ethernet address associated with the IP address A
             in the NAT rule. This allows upstream MAC learning to
             point to the correct chassis.

           • A priority-0 logical flow with match 1 has actions next;.

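       The load balancer flows above come from a Load_Balancer row
       attached to the router. A minimal sketch, assuming a router lr0
       with a gateway port (the VIP and backend addresses are
       illustrative):

              # VIP 172.16.0.100:80 load balances to two backends.
              ovn-nbctl lb-add lb0 172.16.0.100:80 \
                  10.0.0.2:80,10.0.0.3:80 tcp

              # Attach the load balancer to the router; reply traffic
              # from the backends is then unDNATted in this table.
              ovn-nbctl lr-lb-add lr0 lb0
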
   Egress Table 1: SNAT

       Packets that are configured to be SNATed get their source IP
       address changed based on the configuration in the OVN Northbound
       database.

           • A priority-120 flow advances IPv6 Neighbor Solicitation
             packets to the next table, skipping SNAT. When
             ovn-controller injects an IPv6 Neighbor Solicitation
             packet (for the nd_ns action), we do not want the packet
             to go through conntrack.

   Egress Table 1: SNAT on Gateway Routers

           • If the Gateway router in the OVN Northbound database has
             been configured to force SNAT a packet (that has been
             previously DNATted) to B, a priority-100 flow matches
             flags.force_snat_for_dnat == 1 && ip with an action
             ct_snat(B);.

           • If a load balancer configured to skip SNAT has been
             applied to the Gateway router pipeline, a priority-120
             flow matches flags.skip_snat_for_lb == 1 && ip with an
             action next;.

           • If the Gateway router in the OVN Northbound database has
             been configured to force SNAT a packet (that has been
             previously load-balanced) using the router IP (i.e.,
             options:lb_force_snat_ip=router_ip), then for each logical
             router port P attached to the Gateway router, a
             priority-110 flow matches flags.force_snat_for_lb == 1 &&
             outport == P with an action ct_snat(R);, where R is the IP
             configured on the router port. If R is an IPv4 address,
             the match also includes ip4; if it is an IPv6 address, the
             match also includes ip6.

             If the logical router port P is configured with multiple
             IPv4 and multiple IPv6 addresses, only the first IPv4 and
             the first IPv6 address are considered.

           • If the Gateway router in the OVN Northbound database has
             been configured to force SNAT a packet (that has been
             previously load-balanced) to B, a priority-100 flow
             matches flags.force_snat_for_lb == 1 && ip with an action
             ct_snat(B);.

           • For each configuration in the OVN Northbound database that
             asks to change the source IP address of a packet from an
             IP address of A, or to change the source IP address of a
             packet that belongs to network A, to B, a flow matches ip
             && ip4.src == A with an action ct_snat(B);. The priority
             of the flow is calculated based on the mask of A, with
             matches having larger masks getting higher priorities. If
             the NAT rule is of type dnat_and_snat and has
             stateless=true in the options, then the action would be
             ip4/6.src=(B).

           • If the NAT rule has allowed_ext_ips configured, then there
             is an additional match ip4.dst == allowed_ext_ips.
             Similarly, for IPv6, the match would be ip6.dst ==
             allowed_ext_ips.

           • If the NAT rule has exempted_ext_ips set, then there is an
             additional flow configured at priority + 1 of the
             corresponding NAT rule. The flow matches if the
             destination IP is an exempted_ext_ip, and the action is
             next;. This flow is used to bypass the ct_snat action for
             a packet which is destined to exempted_ext_ips.

           • A priority-0 logical flow with match 1 has actions next;.

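       The force-SNAT flows above are driven by Gateway router options.
       A minimal sketch, assuming a Gateway router lr0 already pinned to
       a chassis (the address is illustrative):

              # SNAT load-balanced traffic using the egress router
              # port's own IP (the priority-110 flows above).
              ovn-nbctl set Logical_Router lr0 \
                  options:lb_force_snat_ip=router_ip

              # Or force SNAT previously DNATted traffic to a fixed
              # address (the priority-100 flow above).
              ovn-nbctl set Logical_Router lr0 \
                  options:dnat_force_snat_ip=172.16.0.1
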
   Egress Table 1: SNAT on Distributed Routers

           • For each configuration in the OVN Northbound database that
             asks to change the source IP address of a packet from an
             IP address of A, or to change the source IP address of a
             packet that belongs to network A, to B, a flow matches ip
             && ip4.src == A && outport == GW, where GW is the logical
             router gateway port, with an action ct_snat(B);. The
             priority of the flow is calculated based on the mask of A,
             with matches having larger masks getting higher
             priorities. If the NAT rule is of type dnat_and_snat and
             has stateless=true in the options, then the action would
             be ip4/6.src=(B).

             If the NAT rule cannot be handled in a distributed manner,
             then the flow above is only programmed on the gateway
             chassis, with the flow priority increased by 128 so that
             it is evaluated first.

             If the NAT rule can be handled in a distributed manner,
             then there is an additional action eth.src = EA;, where EA
             is the Ethernet address associated with the IP address A
             in the NAT rule. This allows upstream MAC learning to
             point to the correct chassis.

             If the NAT rule has allowed_ext_ips configured, then there
             is an additional match ip4.dst == allowed_ext_ips.
             Similarly, for IPv6, the match would be ip6.dst ==
             allowed_ext_ips.

             If the NAT rule has exempted_ext_ips set, then there is an
             additional flow configured at priority + 1 of the
             corresponding NAT rule. The flow matches if the
             destination IP is an exempted_ext_ip, and the action is
             next;. This flow is used to bypass the ct_snat action for
             a flow which is destined to exempted_ext_ips.

           • A priority-0 logical flow with match 1 has actions next;.

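       On a distributed router, these SNAT flows require a distributed
       gateway port. A minimal sketch, assuming a router lr0 whose port
       lrp0 already has a gateway chassis (the addresses are
       illustrative):

              # SNAT the 10.0.0.0/24 network to 172.16.0.1; the
              # resulting flow matches outport == lrp0 as described
              # above.
              ovn-nbctl lr-nat-add lr0 snat 172.16.0.1 10.0.0.0/24
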
   Egress Table 2: Egress Loopback

       This table applies to distributed logical routers where one of
       the logical router ports specifies a gateway chassis.

       While UNDNAT and SNAT processing have already occurred by this
       point, this traffic needs to be forced through egress loopback
       on this distributed gateway port instance in order for UNSNAT
       and DNAT processing to be applied, and also for IP routing and
       ARP resolution after all of the NAT processing, so that the
       packet can be forwarded to the destination.

       This table has the following flows:

           • For each NAT rule in the OVN Northbound database on a
             distributed router, a priority-100 logical flow with match
             ip4.dst == E && outport == GW && is_chassis_resident(P),
             where E is the external IP address specified in the NAT
             rule and GW is the logical router distributed gateway
             port. For a dnat_and_snat NAT rule, P is the logical port
             specified in the NAT rule; if the logical_port column of
             the NAT table is not set, then P is the chassisredirect
             port of GW. The flow has the following actions:

                 clone {
                     ct_clear;
                     inport = outport;
                     outport = "";
                     flags = 0;
                     flags.loopback = 1;
                     reg0 = 0;
                     reg1 = 0;
                     ...
                     reg9 = 0;
                     REGBIT_EGRESS_LOOPBACK = 1;
                     next(pipeline=ingress, table=0);
                 };

             flags.loopback is set since in_port is unchanged and the
             packet may return to that port after NAT processing.
             REGBIT_EGRESS_LOOPBACK is set to indicate that egress
             loopback has occurred, in order to skip the source IP
             address check against the router address.

           • A priority-0 logical flow with match 1 has actions next;.

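       The resulting flows can be inspected from the southbound
       database. A minimal sketch, assuming a router named lr0 and
       assuming lr_out_egr_loop is the stage name this OVN version uses
       for this table:

              # Show the egress loopback flows programmed for lr0.
              ovn-sbctl lflow-list lr0 | grep lr_out_egr_loop
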
   Egress Table 3: Delivery

       Packets that reach this table are ready for delivery. It
       contains:

           • Priority-110 logical flows that match IP multicast packets
             on each enabled logical router port and modify the
             Ethernet source address of the packets to the Ethernet
             address of the port and then execute action output;.

           • Priority-100 logical flows that match packets on each
             enabled logical router port, with action output;.


OVN 21.03.1                      ovn-northd                      ovn-northd(8)