ovn-northd(8)                 Open vSwitch Manual                ovn-northd(8)

NAME
       ovn-northd - Open Virtual Network central control daemon

SYNOPSIS
       ovn-northd [options]

DESCRIPTION
       ovn-northd is a centralized daemon responsible for translating the
       high-level OVN configuration into logical configuration consumable by
       daemons such as ovn-controller. It translates the logical network
       configuration in terms of conventional network concepts, taken from
       the OVN Northbound Database (see ovn-nb(5)), into logical datapath
       flows in the OVN Southbound Database (see ovn-sb(5)) below it.

OPTIONS
       --ovnnb-db=database
              The OVSDB database containing the OVN Northbound Database. If
              the OVN_NB_DB environment variable is set, its value is used
              as the default. Otherwise, the default is
              unix:/var/run/openvswitch/ovnnb_db.sock.

       --ovnsb-db=database
              The OVSDB database containing the OVN Southbound Database. If
              the OVN_SB_DB environment variable is set, its value is used
              as the default. Otherwise, the default is
              unix:/var/run/openvswitch/ovnsb_db.sock.

       database in the above options must be an OVSDB active or passive
       connection method, as described in ovsdb(7).

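       The default-resolution rule described above (environment variable if
       set, otherwise a built-in unix socket path) can be sketched in
       Python; the helper default_db below is a hypothetical illustration,
       not part of OVN.

```python
# Hypothetical sketch of the documented default resolution for
# --ovnnb-db / --ovnsb-db: the environment variable, if set, supplies
# the default; otherwise the well-known unix socket path is used.
import os

def default_db(var, fallback, env=os.environ):
    """Return the default database connection method."""
    return env.get(var) or fallback

nb = default_db("OVN_NB_DB", "unix:/var/run/openvswitch/ovnnb_db.sock",
                env={})  # variable unset -> built-in default
sb = default_db("OVN_SB_DB", "unix:/var/run/openvswitch/ovnsb_db.sock",
                env={"OVN_SB_DB": "tcp:127.0.0.1:6642"})
```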
   Daemon Options
       --pidfile[=pidfile]
              Causes a file (by default, program.pid) to be created
              indicating the PID of the running process. If the pidfile
              argument is not specified, or if it does not begin with /,
              then it is created in /var/run/openvswitch.

              If --pidfile is not specified, no pidfile is created.

       --overwrite-pidfile
              By default, when --pidfile is specified and the specified
              pidfile already exists and is locked by a running process, the
              daemon refuses to start. Specify --overwrite-pidfile to cause
              it to instead overwrite the pidfile.

              When --pidfile is not specified, this option has no effect.

       --detach
              Runs this program as a background process. The process forks,
              and in the child it starts a new session, closes the standard
              file descriptors (which has the side effect of disabling
              logging to the console), and changes its current directory to
              the root (unless --no-chdir is specified). After the child
              completes its initialization, the parent exits.

       --monitor
              Creates an additional process to monitor this program. If it
              dies due to a signal that indicates a programming error
              (SIGABRT, SIGALRM, SIGBUS, SIGFPE, SIGILL, SIGPIPE, SIGSEGV,
              SIGXCPU, or SIGXFSZ) then the monitor process starts a new
              copy of it. If the daemon dies or exits for another reason,
              the monitor process exits.

              This option is normally used with --detach, but it also
              functions without it.

       --no-chdir
              By default, when --detach is specified, the daemon changes its
              current working directory to the root directory after it
              detaches. Otherwise, invoking the daemon from a carelessly
              chosen directory would prevent the administrator from
              unmounting the file system that holds that directory.

              Specifying --no-chdir suppresses this behavior, preventing the
              daemon from changing its current working directory. This may
              be useful for collecting core files, since it is common
              behavior to write core dumps into the current working
              directory and the root directory is not a good directory to
              use.

              This option has no effect when --detach is not specified.

       --no-self-confinement
              By default this daemon tries to confine itself to working with
              files under well-known directories whitelisted at build time.
              It is better to stick with this default behavior and not to
              use this flag unless some other access control mechanism is
              used to confine the daemon. Note that in contrast to other
              access control implementations that are typically enforced
              from kernel space (e.g. DAC or MAC), self-confinement is
              imposed by the user-space daemon itself and hence should not
              be considered a full confinement strategy, but instead should
              be viewed as an additional layer of security.

       --user=user:group
              Causes this program to run as a different user specified in
              user:group, thus dropping most of the root privileges. Short
              forms user and :group are also allowed, with the current user
              or group assumed, respectively. Only daemons started by the
              root user accept this argument.

              On Linux, daemons will be granted CAP_IPC_LOCK and
              CAP_NET_BIND_SERVICE before dropping root privileges. Daemons
              that interact with a datapath, such as ovs-vswitchd, will be
              granted three additional capabilities, namely CAP_NET_ADMIN,
              CAP_NET_BROADCAST and CAP_NET_RAW. The capability change will
              apply even if the new user is root.

              On Windows, this option is not currently supported. For
              security reasons, specifying this option will cause the daemon
              process not to start.

   Logging Options
       -v[spec]
       --verbose=[spec]
            Sets logging levels. Without any spec, sets the log level for
            every module and destination to dbg. Otherwise, spec is a list
            of words separated by spaces, commas, or colons, up to one from
            each category below:

            ·    A valid module name, as displayed by the vlog/list command
                 on ovs-appctl(8), limits the log level change to the
                 specified module.

            ·    syslog, console, or file, to limit the log level change to
                 only the system log, to the console, or to a file,
                 respectively. (If --detach is specified, the daemon closes
                 its standard file descriptors, so logging to the console
                 will have no effect.)

                 On the Windows platform, syslog is accepted as a word and
                 is only useful along with the --syslog-target option (the
                 word has no effect otherwise).

            ·    off, emer, err, warn, info, or dbg, to control the log
                 level. Messages of the given severity or higher will be
                 logged, and messages of lower severity will be filtered
                 out. off filters out all messages. See ovs-appctl(8) for a
                 definition of each log level.

            Case is not significant within spec.

            Regardless of the log levels set for file, logging to a file
            will not take place unless --log-file is also specified (see
            below).

            For compatibility with older versions of OVS, any is accepted
            as a word but has no effect.

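       As an illustration of how a spec is split into at most one word per
       category, here is a small Python sketch; classify_spec and its fixed
       word sets are hypothetical simplifications, not the actual vlog
       parser (real module names come from vlog/list).

```python
import re

# Hypothetical sketch of splitting a --verbose spec into categories.
# Words may be separated by spaces, commas, or colons; at most one word
# is expected from each category, and case is not significant.
DESTINATIONS = {"syslog", "console", "file"}
LEVELS = {"off", "emer", "err", "warn", "info", "dbg"}

def classify_spec(spec):
    result = {"module": None, "destination": None, "level": None}
    for word in re.split(r"[ ,:]+", spec.lower()):
        if not word or word == "any":   # "any" is accepted but ignored
            continue
        if word in DESTINATIONS:
            result["destination"] = word
        elif word in LEVELS:
            result["level"] = word
        else:
            result["module"] = word     # assume a module name
    return result
```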
       -v
       --verbose
            Sets the maximum logging verbosity level, equivalent to
            --verbose=dbg.

       -vPATTERN:destination:pattern
       --verbose=PATTERN:destination:pattern
            Sets the log pattern for destination to pattern. Refer to
            ovs-appctl(8) for a description of the valid syntax for pattern.

       -vFACILITY:facility
       --verbose=FACILITY:facility
            Sets the RFC5424 facility of the log message. facility can be
            one of kern, user, mail, daemon, auth, syslog, lpr, news, uucp,
            clock, ftp, ntp, audit, alert, clock2, local0, local1, local2,
            local3, local4, local5, local6 or local7. If this option is not
            specified, daemon is used as the default for the local system
            syslog and local0 is used while sending a message to the target
            provided via the --syslog-target option.

       --log-file[=file]
            Enables logging to a file. If file is specified, then it is
            used as the exact name for the log file. The default log file
            name used if file is omitted is
            /var/log/openvswitch/program.log.

       --syslog-target=host:port
            Send syslog messages to UDP port on host, in addition to the
            system syslog. The host must be a numerical IP address, not a
            hostname.

       --syslog-method=method
            Specify method as how syslog messages should be sent to the
            syslog daemon. The following forms are supported:

            ·    libc, to use the libc syslog() function. This is the
                 default behavior. The downside of using this option is
                 that libc adds a fixed prefix to every message before it
                 is actually sent to the syslog daemon over the /dev/log
                 UNIX domain socket.

            ·    unix:file, to use a UNIX domain socket directly. It is
                 possible to specify an arbitrary message format with this
                 option. However, rsyslogd 8.9 and older versions use a
                 hard-coded parser function anyway that limits UNIX domain
                 socket use. If you want to use an arbitrary message format
                 with older rsyslogd versions, then use a UDP socket to a
                 localhost IP address instead.

            ·    udp:ip:port, to use a UDP socket. With this method it is
                 possible to use an arbitrary message format also with
                 older rsyslogd. When sending syslog messages over a UDP
                 socket, extra precautions need to be taken: for example,
                 the syslog daemon needs to be configured to listen on the
                 specified UDP port, accidental iptables rules could
                 interfere with local syslog traffic, and there are some
                 security considerations that apply to UDP sockets, but do
                 not apply to UNIX domain sockets.

   PKI Options
       PKI configuration is required in order to use SSL for the
       connections to the Northbound and Southbound databases.

       -p privkey.pem
       --private-key=privkey.pem
            Specifies a PEM file containing the private key used as
            identity for outgoing SSL connections.

       -c cert.pem
       --certificate=cert.pem
            Specifies a PEM file containing a certificate that certifies
            the private key specified on -p or --private-key to be
            trustworthy. The certificate must be signed by the certificate
            authority (CA) that the peer in SSL connections will use to
            verify it.

       -C cacert.pem
       --ca-cert=cacert.pem
            Specifies a PEM file containing the CA certificate for
            verifying certificates presented to this program by SSL peers.
            (This may be the same certificate that SSL peers use to verify
            the certificate specified on -c or --certificate, or it may be
            a different one, depending on the PKI design in use.)

       -C none
       --ca-cert=none
            Disables verification of certificates presented by SSL peers.
            This introduces a security risk, because it means that
            certificates cannot be verified to be those of known trusted
            hosts.

   Other Options
       --unixctl=socket
              Sets the name of the control socket on which program listens
              for runtime management commands (see RUNTIME MANAGEMENT
              COMMANDS, below). If socket does not begin with /, it is
              interpreted as relative to /var/run/openvswitch. If --unixctl
              is not used at all, the default socket is
              /var/run/openvswitch/program.pid.ctl, where pid is program’s
              process ID.

              On Windows a local named pipe is used to listen for runtime
              management commands. A file is created at the absolute path
              given by socket or, if --unixctl is not used at all, a file
              named program is created in the configured OVS_RUNDIR
              directory. The file exists just to mimic the behavior of a
              Unix domain socket.

              Specifying none for socket disables the control socket
              feature.

       -h
       --help
              Prints a brief help message to the console.

       -V
       --version
              Prints version information to the console.

RUNTIME MANAGEMENT COMMANDS
       ovs-appctl can send commands to a running ovn-northd process. The
       currently supported commands are described below.

       exit   Causes ovn-northd to gracefully terminate.

ACTIVE-STANDBY FOR HIGH AVAILABILITY
       You may run ovn-northd more than once in an OVN deployment. OVN will
       automatically ensure that only one of them is active at a time. If
       multiple instances of ovn-northd are running and the active
       ovn-northd fails, one of the hot standby instances of ovn-northd
       will automatically take over.

LOGICAL FLOW TABLE STRUCTURE
       One of the main purposes of ovn-northd is to populate the
       Logical_Flow table in the OVN_Southbound database. This section
       describes how ovn-northd does this for switch and router logical
       datapaths.

   Logical Switch Datapaths
       Ingress Table 0: Admission Control and Ingress Port Security - L2

       Ingress table 0 contains these logical flows:

              ·      Priority 100 flows to drop packets with VLAN tags or
                     multicast Ethernet source addresses.

              ·      Priority 50 flows that implement ingress port security
                     for each enabled logical port. For logical ports on
                     which port security is enabled, these match the inport
                     and the valid eth.src address(es) and advance only
                     those packets to the next flow table. For logical
                     ports on which port security is not enabled, these
                     advance all packets that match the inport.

       There are no flows for disabled logical ports because the
       default-drop behavior of logical flow tables causes packets that
       ingress from them to be dropped.

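       The priority-50 behavior above can be sketched as the match strings
       such flows would carry; build_l2_match and its inputs are
       hypothetical illustrations, not ovn-northd’s actual code.

```python
# Hypothetical sketch of the priority-50 ingress port security matches
# described above: with port security enabled, the flow matches the
# inport plus the allowed eth.src addresses; without it, only the inport.
def build_l2_match(port_name, allowed_macs=None):
    match = 'inport == "%s"' % port_name
    if allowed_macs:  # port security enabled on this logical port
        match += " && eth.src == {%s}" % ", ".join(allowed_macs)
    return match

secured = build_l2_match("vm1", ["00:00:00:00:00:01"])
open_port = build_l2_match("vm2")
```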
       Ingress Table 1: Ingress Port Security - IP

       Ingress table 1 contains these logical flows:

              ·      For each element in the port security set having one
                     or more IPv4 or IPv6 addresses (or both),

                     ·      Priority 90 flow to allow IPv4 traffic if it
                            has IPv4 addresses which match the inport,
                            valid eth.src and valid ip4.src address(es).

                     ·      Priority 90 flow to allow IPv4 DHCP discovery
                            traffic if it has a valid eth.src. This is
                            necessary since DHCP discovery messages are
                            sent from the unspecified IPv4 address
                            (0.0.0.0) because the IPv4 address has not yet
                            been assigned.

                     ·      Priority 90 flow to allow IPv6 traffic if it
                            has IPv6 addresses which match the inport,
                            valid eth.src and valid ip6.src address(es).

                     ·      Priority 90 flow to allow IPv6 DAD (Duplicate
                            Address Detection) traffic if it has a valid
                            eth.src. This is necessary since DAD requires
                            joining a multicast group and sending neighbor
                            solicitations for the newly assigned address.
                            Since no address is yet assigned, these are
                            sent from the unspecified IPv6 address (::).

                     ·      Priority 80 flow to drop IP (both IPv4 and
                            IPv6) traffic which matches the inport and
                            valid eth.src.

              ·      One priority-0 fallback flow that matches all packets
                     and advances to the next table.

       Ingress Table 2: Ingress Port Security - Neighbor discovery

       Ingress table 2 contains these logical flows:

              ·      For each element in the port security set,

                     ·      Priority 90 flow to allow ARP traffic which
                            matches the inport and valid eth.src and
                            arp.sha. If the element has one or more IPv4
                            addresses, then it also matches the valid
                            arp.spa.

                     ·      Priority 90 flow to allow IPv6 Neighbor
                            Solicitation and Advertisement traffic which
                            matches the inport, valid eth.src and
                            nd.sll/nd.tll. If the element has one or more
                            IPv6 addresses, then it also matches the valid
                            nd.target address(es) for Neighbor
                            Advertisement traffic.

                     ·      Priority 80 flow to drop ARP and IPv6 Neighbor
                            Solicitation and Advertisement traffic which
                            matches the inport and valid eth.src.

              ·      One priority-0 fallback flow that matches all packets
                     and advances to the next table.

       Ingress Table 3: from-lport Pre-ACLs

       This table prepares flows for possible stateful ACL processing in
       ingress table ACLs. It contains a priority-0 flow that simply moves
       traffic to the next table. If stateful ACLs are used in the logical
       datapath, a priority-100 flow is added that sets a hint (with
       reg0[0] = 1; next;) for table Pre-stateful to send IP packets to the
       connection tracker before eventually advancing to ingress table
       ACLs. If special ports such as router ports or localnet ports can’t
       use ct(), a priority-110 flow is added to skip over stateful ACLs.

       Ingress Table 4: Pre-LB

       This table prepares flows for possible stateful load balancing
       processing in ingress table LB and Stateful. It contains a
       priority-0 flow that simply moves traffic to the next table.
       Moreover, it contains a priority-110 flow to move IPv6 Neighbor
       Discovery traffic to the next table. If load balancing rules with
       virtual IP addresses (and ports) are configured in the
       OVN_Northbound database for a logical switch datapath, a
       priority-100 flow is added for each configured virtual IP address
       VIP. For IPv4 VIPs, the match is ip && ip4.dst == VIP. For IPv6
       VIPs, the match is ip && ip6.dst == VIP. The flow sets an action
       reg0[0] = 1; next; to act as a hint for table Pre-stateful to send
       IP packets to the connection tracker for packet de-fragmentation
       before eventually advancing to ingress table LB.

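       The per-VIP match construction above can be sketched in Python;
       build_pre_lb_match is a hypothetical helper, not ovn-northd’s actual
       code, and simply picks ip4.dst or ip6.dst by address family.

```python
import ipaddress

# Hypothetical sketch of the priority-100 Pre-LB match for a configured
# virtual IP: ip4.dst for IPv4 VIPs, ip6.dst for IPv6 VIPs.
def build_pre_lb_match(vip):
    addr = ipaddress.ip_address(vip)
    field = "ip4.dst" if addr.version == 4 else "ip6.dst"
    return "ip && %s == %s" % (field, vip)
```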
       Ingress Table 5: Pre-stateful

       This table prepares flows for all possible stateful processing in
       the next tables. It contains a priority-0 flow that simply moves
       traffic to the next table. A priority-100 flow sends the packets to
       the connection tracker based on a hint provided by the previous
       tables (with a match for reg0[0] == 1) by using the ct_next; action.

       Ingress Table 6: from-lport ACLs

       Logical flows in this table closely reproduce those in the ACL table
       in the OVN_Northbound database for the from-lport direction. The
       priority values from the ACL table have a limited range and have
       1000 added to them to leave room for OVN default flows at both
       higher and lower priorities.

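       For instance, the offset rule above maps a northbound ACL priority
       straight into a logical-flow priority; acl_flow_priority is a
       hypothetical name for that arithmetic.

```python
# Hypothetical illustration of the documented offset: ovn-northd adds
# 1000 to each northbound ACL priority when generating logical flows,
# leaving room above and below for OVN's own default flows.
def acl_flow_priority(nb_priority):
    return nb_priority + 1000
```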
              ·      allow ACLs translate into logical flows with the next;
                     action. If there are any stateful ACLs on this
                     datapath, then allow ACLs translate to ct_commit;
                     next; (which acts as a hint for the next tables to
                     commit the connection to conntrack).

              ·      allow-related ACLs translate into logical flows with
                     the ct_commit(ct_label=0/1); next; actions for new
                     connections and reg0[1] = 1; next; for existing
                     connections.

              ·      Other ACLs translate to drop; for new or untracked
                     connections and ct_commit(ct_label=1/1); for known
                     connections. Setting ct_label marks a connection as
                     one that was previously allowed, but should no longer
                     be allowed due to a policy change.

       This table also contains a priority 0 flow with action next;, so
       that ACLs allow packets by default. If the logical datapath has a
       stateful ACL, the following flows will also be added:

              ·      A priority-1 flow that sets the hint to commit IP
                     traffic to the connection tracker (with action reg0[1]
                     = 1; next;). This is needed for the default allow
                     policy because, while the initiator’s direction may
                     not have any stateful rules, the server’s may and then
                     its return traffic would not be known and marked as
                     invalid.

              ·      A priority-65535 flow that allows any traffic in the
                     reply direction for a connection that has been
                     committed to the connection tracker (i.e., established
                     flows), as long as the committed flow does not have
                     ct_label.blocked set. We only handle traffic in the
                     reply direction here because we want all packets going
                     in the request direction to still go through the flows
                     that implement the currently defined policy based on
                     ACLs. If a connection is no longer allowed by policy,
                     ct_label.blocked will get set and packets in the reply
                     direction will no longer be allowed, either.

              ·      A priority-65535 flow that allows any traffic that is
                     considered related to a committed flow in the
                     connection tracker (e.g., an ICMP Port Unreachable
                     from a non-listening UDP port), as long as the
                     committed flow does not have ct_label.blocked set.

              ·      A priority-65535 flow that drops all traffic marked by
                     the connection tracker as invalid.

              ·      A priority-65535 flow that drops all traffic in the
                     reply direction with ct_label.blocked set, meaning
                     that the connection should no longer be allowed due to
                     a policy change. Packets in the request direction are
                     skipped here to let a newly created ACL re-allow this
                     connection.

       Ingress Table 7: from-lport QoS Marking

       Logical flows in this table closely reproduce those in the QoS table
       with the action column set in the OVN_Northbound database for the
       from-lport direction.

              ·      For every qos_rules entry in a logical switch with
                     DSCP marking enabled, a flow will be added at the
                     priority mentioned in the QoS table.

              ·      One priority-0 fallback flow that matches all packets
                     and advances to the next table.

       Ingress Table 8: from-lport QoS Meter

       Logical flows in this table closely reproduce those in the QoS table
       with the bandwidth column set in the OVN_Northbound database for the
       from-lport direction.

              ·      For every qos_rules entry in a logical switch with
                     metering enabled, a flow will be added at the priority
                     mentioned in the QoS table.

              ·      One priority-0 fallback flow that matches all packets
                     and advances to the next table.

       Ingress Table 9: LB

       It contains a priority-0 flow that simply moves traffic to the next
       table. For established connections a priority 100 flow matches on
       ct.est && !ct.rel && !ct.new && !ct.inv and sets an action reg0[2] =
       1; next; to act as a hint for table Stateful to send packets through
       the connection tracker to NAT the packets. (The packet will
       automatically get DNATed to the same IP address as the first packet
       in that connection.)

       Ingress Table 10: Stateful

              ·      For all the configured load balancing rules for a
                     switch in the OVN_Northbound database that include an
                     L4 port PORT of protocol P and IP address VIP, a
                     priority-120 flow is added. For IPv4 VIPs, the flow
                     matches ct.new && ip && ip4.dst == VIP && P && P.dst
                     == PORT. For IPv6 VIPs, the flow matches ct.new && ip
                     && ip6.dst == VIP && P && P.dst == PORT. The flow’s
                     action is ct_lb(args), where args contains comma
                     separated IP addresses (and optional port numbers) to
                     load balance to. The address family of the IP
                     addresses of args is the same as the address family of
                     VIP.

              ·      For all the configured load balancing rules for a
                     switch in the OVN_Northbound database that include
                     just an IP address VIP to match on, OVN adds a
                     priority-110 flow. For IPv4 VIPs, the flow matches
                     ct.new && ip && ip4.dst == VIP. For IPv6 VIPs, the
                     flow matches ct.new && ip && ip6.dst == VIP. The
                     action on this flow is ct_lb(args), where args
                     contains comma separated IP addresses of the same
                     address family as VIP.

              ·      A priority-100 flow commits packets to the connection
                     tracker using the ct_commit; next; action based on a
                     hint provided by the previous tables (with a match for
                     reg0[1] == 1).

              ·      A priority-100 flow sends the packets to the
                     connection tracker using ct_lb; as the action based on
                     a hint provided by the previous tables (with a match
                     for reg0[2] == 1).

              ·      A priority-0 flow that simply moves traffic to the
                     next table.

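       Putting the two load-balancer cases above together, here is a
       hypothetical sketch of the match and action strings; build_lb_flow
       is not ovn-northd’s real code, and the VIP and backend values are
       example assumptions.

```python
import ipaddress

# Hypothetical sketch of the Stateful-table load balancer flows
# described above: priority 120 when an L4 port is given, 110 when
# matching only on the VIP. Backends keep the VIP's address family.
def build_lb_flow(vip, backends, protocol=None, port=None):
    fam = "ip4" if ipaddress.ip_address(vip).version == 4 else "ip6"
    match = "ct.new && ip && %s.dst == %s" % (fam, vip)
    if protocol and port:
        match += " && %s && %s.dst == %d" % (protocol, protocol, port)
        priority = 120
    else:
        priority = 110
    action = "ct_lb(%s);" % ",".join(backends)
    return priority, match, action

with_port = build_lb_flow("10.0.0.10", ["10.0.0.2:80", "10.0.0.3:80"],
                          protocol="tcp", port=80)
vip_only = build_lb_flow("fd00::10", ["fd00::2"])
```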
       Ingress Table 11: ARP/ND responder

       This table implements an ARP/ND responder in a logical switch for
       known IPs. The advantage of the ARP responder flow is to limit ARP
       broadcasts by locally responding to ARP requests without the need to
       send to other hypervisors. One common case is when the inport is a
       logical port associated with a VIF and the broadcast is responded to
       on the local hypervisor rather than broadcast across the whole
       network and responded to by the destination VM. This behavior is
       proxy ARP.

       ARP requests arrive from VMs from a logical switch inport of type
       default. For this case, the logical switch proxy ARP rules can be
       for other VMs or logical router ports. Logical switch proxy ARP
       rules may be programmed both for mac binding of IP addresses on
       other logical switch VIF ports (which are of the default logical
       switch port type, representing connectivity to VMs or containers),
       and for mac binding of IP addresses on logical switch router type
       ports, representing their logical router port peers. In order to
       support proxy ARP for logical router ports, an IP address must be
       configured on the logical switch router type port, with the same
       value as the peer logical router port. The configured MAC addresses
       must match as well. When a VM sends an ARP request for a distributed
       logical router port and the peer router type port of the attached
       logical switch does not have an IP address configured, the ARP
       request will be broadcast on the logical switch. One of the copies
       of the ARP request will go through the logical switch router type
       port to the logical router datapath, where the logical router ARP
       responder will generate a reply. The MAC binding of a distributed
       logical router, once learned by an associated VM, is used for all
       that VM’s communication needing routing. Hence, the action of a VM
       re-arping for the mac binding of the logical router port should be
       rare.

       Logical switch ARP responder proxy ARP rules can also be hit when
       receiving ARP requests externally on an L2 gateway port. In this
       case, the hypervisor acting as an L2 gateway responds to the ARP
       request on behalf of a destination VM.

       Note that ARP requests received from localnet or vtep logical
       inports can either go directly to VMs, in which case the VM
       responds, or can hit an ARP responder for a logical router port if
       the packet is used to resolve a logical router port next hop
       address. In either case, logical switch ARP responder rules will not
       be hit. This table contains these logical flows:

              ·      Priority-100 flows to skip the ARP responder if inport
                     is of type localnet or vtep and advance directly to
                     the next table. ARP requests sent to localnet or vtep
                     ports can be received by multiple hypervisors.
                     Because the same mac binding rules are downloaded to
                     all hypervisors, each of them would respond, which
                     would confuse L2 learning on the source of the ARP
                     requests. ARP requests received on an inport of type
                     router are not expected to hit any logical switch ARP
                     responder flows. However, no skip flows are installed
                     for these packets, as there would be some additional
                     flow cost for this and the value appears limited.

              ·      Priority-50 flows that match ARP requests to each
                     known IP address A of every logical switch port, and
                     respond with ARP replies directly with corresponding
                     Ethernet address E:

                     eth.dst = eth.src;
                     eth.src = E;
                     arp.op = 2; /* ARP reply. */
                     arp.tha = arp.sha;
                     arp.sha = E;
                     arp.tpa = arp.spa;
                     arp.spa = A;
                     outport = inport;
                     flags.loopback = 1;
                     output;

                     These flows are omitted for logical ports (other than
                     router ports or localport ports) that are down.

              ·      Priority-50 flows that match IPv6 ND neighbor
                     solicitations to each known IP address A (and A’s
                     solicited node address) of every logical switch port
                     except of type router, and respond with neighbor
                     advertisements directly with corresponding Ethernet
                     address E:

                     nd_na {
                         eth.src = E;
                         ip6.src = A;
                         nd.target = A;
                         nd.tll = E;
                         outport = inport;
                         flags.loopback = 1;
                         output;
                     };

                     Priority-50 flows that match IPv6 ND neighbor
                     solicitations to each known IP address A (and A’s
                     solicited node address) of logical switch ports of
                     type router, and respond with neighbor advertisements
                     directly with corresponding Ethernet address E:

                     nd_na_router {
                         eth.src = E;
                         ip6.src = A;
                         nd.target = A;
                         nd.tll = E;
                         outport = inport;
                         flags.loopback = 1;
                         output;
                     };

                     These flows are omitted for logical ports (other than
                     router ports or localport ports) that are down.

              ·      Priority-100 flows with match criteria like the ARP
                     and ND flows above, except that they only match
                     packets from the inport that owns the IP addresses in
                     question, with action next;. These flows prevent OVN
                     from replying to, for example, an ARP request emitted
                     by a VM for its own IP address. A VM only makes this
                     kind of request to attempt to detect a duplicate IP
                     address assignment, so sending a reply will prevent
                     the VM from accepting the IP address that it owns.

                     In place of next;, it would be reasonable to use drop;
                     for the flows’ actions. If everything is working as it
                     is configured, then this would produce equivalent
                     results, since no host should reply to the request.
                     But ARPing for one’s own IP address is intended to
                     detect situations where the network is not working as
                     configured, so dropping the request would frustrate
                     that intent.

              ·      One priority-0 fallback flow that matches all packets
                     and advances to the next table.

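       The priority-50 ARP reply action shown above can be rendered as a
       plain string template; arp_reply_action is a hypothetical helper for
       illustration only, with E and A supplied as example values.

```python
# Hypothetical rendering of the ARP responder action shown above, with
# E (the port's Ethernet address) and A (its IPv4 address) filled in.
def arp_reply_action(eth_addr, ip_addr):
    return ("eth.dst = eth.src; eth.src = %(E)s; "
            "arp.op = 2; arp.tha = arp.sha; arp.sha = %(E)s; "
            "arp.tpa = arp.spa; arp.spa = %(A)s; "
            "outport = inport; flags.loopback = 1; output;"
            % {"E": eth_addr, "A": ip_addr})

action = arp_reply_action("00:00:00:00:00:01", "192.168.0.10")
```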
       Ingress Table 12: DHCP option processing

       This table adds the DHCPv4 options to a DHCPv4 packet from the
       logical ports configured with IPv4 address(es) and DHCPv4 options,
       and similarly for DHCPv6 options.

              ·      A priority-100 logical flow is added for these logical
                     ports which matches the IPv4 packet with udp.src = 68
                     and udp.dst = 67 and applies the action put_dhcp_opts
                     and advances the packet to the next table.

                     reg0[3] = put_dhcp_opts(offer_ip = ip, options...);
                     next;

                     For DHCPDISCOVER and DHCPREQUEST, this transforms the
                     packet into a DHCP reply, adds the DHCP offer IP ip
                     and options to the packet, and stores 1 into reg0[3].
                     For other kinds of packets, it just stores 0 into
                     reg0[3]. Either way, it continues to the next table.

              ·      A priority-100 logical flow is added for these logical
                     ports which matches the IPv6 packet with udp.src = 546
                     and udp.dst = 547 and applies the action
                     put_dhcpv6_opts and advances the packet to the next
                     table.

                     reg0[3] = put_dhcpv6_opts(ia_addr = ip, options...);
                     next;

                     For DHCPv6 Solicit/Request/Confirm packets, this
                     transforms the packet into a DHCPv6 Advertise/Reply,
                     adds the DHCPv6 offer IP ip and options to the packet,
                     and stores 1 into reg0[3]. For other kinds of packets,
                     it just stores 0 into reg0[3]. Either way, it
                     continues to the next table.

              ·      A priority-0 flow that matches all packets and
                     advances them to the next table.

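       As a sketch, the UDP port pairs that distinguish the two DHCP flows
       above can be expressed as follows; dhcp_match is a hypothetical
       helper, not part of ovn-northd.

```python
# Hypothetical sketch of the DHCP option-processing matches described
# above: DHCPv4 requests use UDP 68 -> 67, DHCPv6 uses UDP 546 -> 547.
def dhcp_match(version):
    src, dst = (68, 67) if version == 4 else (546, 547)
    return "udp.src == %d && udp.dst == %d" % (src, dst)
```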
       Ingress Table 13: DHCP responses

       This table implements a DHCP responder for the DHCP replies
       generated by the previous table.

              ·      A priority 100 logical flow is added for the logical
                     ports configured with DHCPv4 options which matches
                     IPv4 packets with udp.src == 68 && udp.dst == 67 &&
                     reg0[3] == 1 and responds back to the inport after
                     applying these actions. If reg0[3] is set to 1, it
                     means that the action put_dhcp_opts was successful.

                     eth.dst = eth.src;
                     eth.src = E;
                     ip4.dst = A;
                     ip4.src = S;
                     udp.src = 67;
                     udp.dst = 68;
                     outport = P;
                     flags.loopback = 1;
                     output;

                     where E is the server MAC address and S is the server
                     IPv4 address defined in the DHCPv4 options and A is
                     the IPv4 address defined in the logical port’s
                     addresses column.

                     (This terminates ingress packet processing; the packet
                     does not go to the next ingress table.)

              ·      A priority 100 logical flow is added for the logical
                     ports configured with DHCPv6 options which matches
                     IPv6 packets with udp.src == 546 && udp.dst == 547 &&
                     reg0[3] == 1 and responds back to the inport after
                     applying these actions. If reg0[3] is set to 1, it
                     means that the action put_dhcpv6_opts was successful.

                     eth.dst = eth.src;
                     eth.src = E;
                     ip6.dst = A;
                     ip6.src = S;
                     udp.src = 547;
                     udp.dst = 546;
                     outport = P;
                     flags.loopback = 1;
                     output;

                     where E is the server MAC address and S is the server
                     IPv6 LLA address generated from the server_id defined
                     in the DHCPv6 options and A is the IPv6 address
                     defined in the logical port’s addresses column.

                     (This terminates ingress packet processing; the packet
                     does not go to the next ingress table.)

              ·      A priority-0 flow that matches all packets and
                     advances them to the next table.

   Ingress Table 14: DNS Lookup
770
771 This table looks up and resolves the DNS names to the corresponding
772 configured IP address(es).
773
           ·  A priority-100 logical flow for each logical switch
              datapath if it is configured with DNS records, which
              matches the IPv4 and IPv6 packets with udp.dst == 53,
              applies the action dns_lookup, and advances the packet to
              the next table.
779
780 reg0[4] = dns_lookup(); next;
781
782
783 For valid DNS packets, this transforms the packet into a
784 DNS reply if the DNS name can be resolved, and stores 1
785 into reg0[4]. For failed DNS resolution or other kinds of
786 packets, it just stores 0 into reg0[4]. Either way, it
787 continues to the next table.
788
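The observable behavior of dns_lookup can be sketched as follows (dns_lookup_sketch and the records mapping are hypothetical; the real action parses the DNS question from the packet and consults the DNS records configured in the database):

```python
def dns_lookup_sketch(records, qname):
    """Returns (reg0[4], answer): 1 and the resolved address when the
    name is known, else 0 and no answer.  Either way, processing
    continues to the next table."""
    if qname in records:
        return 1, records[qname]
    return 0, None
```
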
   Ingress Table 15: DNS Responses
790
791 This table implements DNS responder for the DNS replies generated by
792 the previous table.
793
           ·  A priority-100 logical flow for each logical switch
              datapath if it is configured with DNS records, which
              matches the IPv4 and IPv6 packets with udp.dst == 53 &&
              reg0[4] == 1 and responds back to the inport after
              applying these actions. If reg0[4] is set to 1, it means
              that the action dns_lookup was successful.
800
801 eth.dst <-> eth.src;
802 ip4.src <-> ip4.dst;
803 udp.dst = udp.src;
804 udp.src = 53;
805 outport = P;
806 flags.loopback = 1;
807 output;
808
809
810 (This terminates ingress packet processing; the packet
811 does not go to the next ingress table.)
812
   Ingress Table 16: Destination Lookup
814
815 This table implements switching behavior. It contains these logical
816 flows:
817
818 · A priority-100 flow that outputs all packets with an Eth‐
819 ernet broadcast or multicast eth.dst to the MC_FLOOD mul‐
820 ticast group, which ovn-northd populates with all enabled
821 logical ports.
822
823 · One priority-50 flow that matches each known Ethernet
824 address against eth.dst and outputs the packet to the
825 single associated output port.
826
827 For the Ethernet address on a logical switch port of type
828 router, when that logical switch port’s addresses column
829 is set to router and the connected logical router port
830 specifies a redirect-chassis:
831
832 · The flow for the connected logical router port’s
833 Ethernet address is only programmed on the redi‐
834 rect-chassis.
835
836 · If the logical router has rules specified in nat
837 with external_mac, then those addresses are also
838 used to populate the switch’s destination lookup
839 on the chassis where logical_port is resident.
840
841 · One priority-0 fallback flow that matches all packets and
842 outputs them to the MC_UNKNOWN multicast group, which
843 ovn-northd populates with all enabled logical ports that
844 accept unknown destination packets. As a small optimiza‐
845 tion, if no logical ports accept unknown destination
846 packets, ovn-northd omits this multicast group and logi‐
847 cal flow.
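
The priority ordering of these three kinds of flows can be sketched in Python (a hypothetical model, with the forwarding database as a plain dict rather than logical flows):

```python
def destination_lookup(eth_dst, fdb, have_unknown_ports):
    """Sketch of Ingress Table 16: multicast/broadcast floods to
    MC_FLOOD, known unicast goes to its single port, and unknown
    unicast falls back to MC_UNKNOWN (or is dropped when no port
    accepts unknown destinations and the group is omitted)."""
    if int(eth_dst.split(":")[0], 16) & 1:    # priority 100: eth.mcast
        return "MC_FLOOD"
    if eth_dst in fdb:                        # priority 50: known eth.dst
        return fdb[eth_dst]
    return "MC_UNKNOWN" if have_unknown_ports else None   # priority 0
```
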
848
849 Egress Table 0: Pre-LB
850
851 This table is similar to ingress table Pre-LB. It contains a priority-0
       flow that simply moves traffic to the next table. Moreover, it contains
853 a priority-110 flow to move IPv6 Neighbor Discovery traffic to the next
854 table. If any load balancing rules exist for the datapath, a prior‐
855 ity-100 flow is added with a match of ip and action of reg0[0] = 1;
856 next; to act as a hint for table Pre-stateful to send IP packets to the
857 connection tracker for packet de-fragmentation.
858
859 Egress Table 1: to-lport Pre-ACLs
860
861 This is similar to ingress table Pre-ACLs except for to-lport traffic.
862
863 Egress Table 2: Pre-stateful
864
865 This is similar to ingress table Pre-stateful.
866
867 Egress Table 3: LB
868
869 This is similar to ingress table LB.
870
871 Egress Table 4: to-lport ACLs
872
873 This is similar to ingress table ACLs except for to-lport ACLs.
874
875 In addition, the following flows are added.
876
           ·  A priority-34000 logical flow is added for each logical
              port that has DHCPv4 options defined, to allow the DHCPv4
              reply packet, and for each logical port that has DHCPv6
              options defined, to allow the DHCPv6 reply packet, from
              Ingress Table 13: DHCP responses.

           ·  A priority-34000 logical flow is added for each logical
              switch datapath configured with DNS records, with the
              match udp.dst == 53, to allow the DNS reply packet from
              Ingress Table 15: DNS Responses.
887
888 Egress Table 5: to-lport QoS Marking
889
       This is similar to ingress table QoS marking except that it
       applies to to-lport QoS rules.
892
893 Egress Table 6: to-lport QoS Meter
894
       This is similar to ingress table QoS meter except that it
       applies to to-lport QoS rules.
897
898 Egress Table 7: Stateful
899
900 This is similar to ingress table Stateful except that there are no
901 rules added for load balancing new connections.
902
903 Egress Table 8: Egress Port Security - IP
904
       This is similar to the port security logic in table Ingress Port
       Security - IP except that outport, eth.dst, ip4.dst and ip6.dst
       are checked instead of inport, eth.src, ip4.src and ip6.src.
908
909 Egress Table 9: Egress Port Security - L2
910
911 This is similar to the ingress port security logic in ingress table
912 Admission Control and Ingress Port Security - L2, but with important
913 differences. Most obviously, outport and eth.dst are checked instead of
914 inport and eth.src. Second, packets directed to broadcast or multicast
915 eth.dst are always accepted instead of being subject to the port secu‐
916 rity rules; this is implemented through a priority-100 flow that
917 matches on eth.mcast with action output;. Finally, to ensure that even
918 broadcast and multicast packets are not delivered to disabled logical
919 ports, a priority-150 flow for each disabled logical outport overrides
920 the priority-100 flow with a drop; action.
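
The precedence described above can be sketched as follows (a hypothetical model; egress_l2_check and its arguments are illustrative, and the final check stands in for the ingress-style port security flows):

```python
def egress_l2_check(outport, eth_dst, disabled_ports, allowed_macs):
    """Sketch of Egress Port Security - L2 precedence: the priority-150
    drop for disabled ports beats the priority-100 multicast accept,
    which beats the ordinary port security check."""
    if outport in disabled_ports:                  # priority 150: drop;
        return "drop"
    if int(eth_dst.split(":")[0], 16) & 1:         # priority 100: eth.mcast
        return "output"
    if eth_dst in allowed_macs.get(outport, ()):   # port security match
        return "output"
    return "drop"
```
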
921
922 Logical Router Datapaths
       Logical router datapaths will only exist for Logical_Router rows
       in the OVN_Northbound database that do not have enabled set to
       false.
925
926 Ingress Table 0: L2 Admission Control
927
928 This table drops packets that the router shouldn’t see at all based on
929 their Ethernet headers. It contains the following flows:
930
931 · Priority-100 flows to drop packets with VLAN tags or mul‐
932 ticast Ethernet source addresses.
933
934 · For each enabled router port P with Ethernet address E, a
935 priority-50 flow that matches inport == P && (eth.mcast
936 || eth.dst == E), with action next;.
937
938 For the gateway port on a distributed logical router
939 (where one of the logical router ports specifies a redi‐
940 rect-chassis), the above flow matching eth.dst == E is
941 only programmed on the gateway port instance on the redi‐
942 rect-chassis.
943
944 · For each dnat_and_snat NAT rule on a distributed router
945 that specifies an external Ethernet address E, a prior‐
946 ity-50 flow that matches inport == GW && eth.dst == E,
947 where GW is the logical router gateway port, with action
948 next;.
949
950 This flow is only programmed on the gateway port instance
951 on the chassis where the logical_port specified in the
952 NAT rule resides.
953
954 Other packets are implicitly dropped.
955
956 Ingress Table 1: IP Input
957
958 This table is the core of the logical router datapath functionality. It
959 contains the following flows to implement very basic IP host function‐
960 ality.
961
962 · L3 admission control: A priority-100 flow drops packets
963 that match any of the following:
964
965 · ip4.src[28..31] == 0xe (multicast source)
966
967 · ip4.src == 255.255.255.255 (broadcast source)
968
969 · ip4.src == 127.0.0.0/8 || ip4.dst == 127.0.0.0/8
970 (localhost source or destination)
971
972 · ip4.src == 0.0.0.0/8 || ip4.dst == 0.0.0.0/8 (zero
973 network source or destination)
974
975 · ip4.src or ip6.src is any IP address owned by the
976 router, unless the packet was recirculated due to
977 egress loopback as indicated by REG‐
978 BIT_EGRESS_LOOPBACK.
979
980 · ip4.src is the broadcast address of any IP network
981 known to the router.
982
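These checks map closely onto Python’s ipaddress classifications; a minimal sketch of the IPv4 cases (l3_admit is hypothetical, and the broadcast-of-known-network check is omitted for brevity):

```python
import ipaddress

def l3_admit(ip_src, ip_dst, router_ips, egress_loopback=False):
    """Sketch of the priority-100 L3 admission drops; returns False
    when the router would drop the packet."""
    src, dst = ipaddress.ip_address(ip_src), ipaddress.ip_address(ip_dst)
    zero_net = ipaddress.ip_network("0.0.0.0/8")
    if src.is_multicast:                      # ip4.src[28..31] == 0xe
        return False
    if str(src) == "255.255.255.255":         # broadcast source
        return False
    if src.is_loopback or dst.is_loopback:    # 127.0.0.0/8
        return False
    if src in zero_net or dst in zero_net:    # 0.0.0.0/8
        return False
    if str(src) in router_ips and not egress_loopback:
        return False                          # router-owned source
    return True
```
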
983 · ICMP echo reply. These flows reply to ICMP echo requests
984 received for the router’s IP address. Let A be an IP
985 address owned by a router port. Then, for each A that is
986 an IPv4 address, a priority-90 flow matches on ip4.dst ==
987 A and icmp4.type == 8 && icmp4.code == 0 (ICMP echo
988 request). For each A that is an IPv6 address, a prior‐
989 ity-90 flow matches on ip6.dst == A and icmp6.type == 128
990 && icmp6.code == 0 (ICMPv6 echo request). The port of the
991 router that receives the echo request does not matter.
992 Also, the ip.ttl of the echo request packet is not
993 checked, so it complies with RFC 1812, section 4.2.2.9.
994 Flows for ICMPv4 echo requests use the following actions:
995
996 ip4.dst <-> ip4.src;
997 ip.ttl = 255;
998 icmp4.type = 0;
999 flags.loopback = 1;
1000 next;
1001
1002
1003 Flows for ICMPv6 echo requests use the following actions:
1004
1005 ip6.dst <-> ip6.src;
1006 ip.ttl = 255;
1007 icmp6.type = 129;
1008 flags.loopback = 1;
1009 next;
1010
1011
1012 · Reply to ARP requests.
1013
              These flows reply to ARP requests for the router’s own IP
              address and populate the mac binding table of the logical
              router port. An ARP request is handled only if the
              requestor’s IP belongs to a subnet of the logical router
              port. For each router port P that owns IP address A, which
              belongs to subnet S with prefix length L, and Ethernet
              address E, a priority-90 flow matches inport == P &&
              arp.spa == S/L && arp.op == 1 && arp.tpa == A (ARP
              request) with the following actions:
1023
1024 put_arp(inport, arp.spa, arp.sha);
1025 eth.dst = eth.src;
1026 eth.src = E;
1027 arp.op = 2; /* ARP reply. */
1028 arp.tha = arp.sha;
1029 arp.sha = E;
1030 arp.tpa = arp.spa;
1031 arp.spa = A;
1032 outport = P;
1033 flags.loopback = 1;
1034 output;
1035
1036
1037 For the gateway port on a distributed logical router
1038 (where one of the logical router ports specifies a redi‐
1039 rect-chassis), the above flows are only programmed on the
1040 gateway port instance on the redirect-chassis. This
1041 behavior avoids generation of multiple ARP responses from
1042 different chassis, and allows upstream MAC learning to
1043 point to the redirect-chassis.
1044
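The reply construction can be sketched the same way, as a field rewrite (arp_reply is a hypothetical stand-in for the actions listed above, not OVN code):

```python
def arp_reply(req, port_mac, port_ip, inport):
    """Sketch of the ARP reply actions: E is the router port's MAC,
    A the router-owned target address from the request."""
    rep = dict(req)
    rep["eth.dst"] = req["eth.src"]
    rep["eth.src"] = port_mac           # E
    rep["arp.op"] = 2                   # ARP reply
    rep["arp.tha"] = req["arp.sha"]
    rep["arp.sha"] = port_mac           # E
    rep["arp.tpa"] = req["arp.spa"]
    rep["arp.spa"] = port_ip            # A
    rep["outport"] = inport             # sent back out the ingress port
    return rep                          # (flags.loopback = 1 permits this)
```
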
           ·  These flows handle ARP requests that are not for the
              router’s own IP address. They use the SPA and SHA to
              populate the logical router port’s mac binding table, with
              priority 80. The typical use case for these flows is
              handling gratuitous ARP (GARP) requests. For the gateway
              port on a distributed logical router, these flows are only
              programmed on the gateway port instance on the
              redirect-chassis.
1052
1053 · These flows reply to ARP requests for the virtual IP
1054 addresses configured in the router for DNAT or load bal‐
1055 ancing. For a configured DNAT IP address or a load bal‐
1056 ancer IPv4 VIP A, for each router port P with Ethernet
1057 address E, a priority-90 flow matches inport == P &&
1058 arp.op == 1 && arp.tpa == A (ARP request) with the fol‐
1059 lowing actions:
1060
1061 eth.dst = eth.src;
1062 eth.src = E;
1063 arp.op = 2; /* ARP reply. */
1064 arp.tha = arp.sha;
1065 arp.sha = E;
1066 arp.tpa = arp.spa;
1067 arp.spa = A;
1068 outport = P;
1069 flags.loopback = 1;
1070 output;
1071
1072
1073 For the gateway port on a distributed logical router with
1074 NAT (where one of the logical router ports specifies a
1075 redirect-chassis):
1076
1077 · If the corresponding NAT rule cannot be handled in
1078 a distributed manner, then this flow is only pro‐
1079 grammed on the gateway port instance on the redi‐
1080 rect-chassis. This behavior avoids generation of
1081 multiple ARP responses from different chassis, and
1082 allows upstream MAC learning to point to the redi‐
1083 rect-chassis.
1084
1085 · If the corresponding NAT rule can be handled in a
1086 distributed manner, then this flow is only pro‐
1087 grammed on the gateway port instance where the
1088 logical_port specified in the NAT rule resides.
1089
1090 Some of the actions are different for this case,
1091 using the external_mac specified in the NAT rule
1092 rather than the gateway port’s Ethernet address E:
1093
1094 eth.src = external_mac;
1095 arp.sha = external_mac;
1096
1097
1098 This behavior avoids generation of multiple ARP
1099 responses from different chassis, and allows
1100 upstream MAC learning to point to the correct
1101 chassis.
1102
1103 · ARP reply handling. This flow uses ARP replies to popu‐
1104 late the logical router’s ARP table. A priority-90 flow
1105 with match arp.op == 2 has actions put_arp(inport,
1106 arp.spa, arp.sha);.
1107
1108 · Reply to IPv6 Neighbor Solicitations. These flows reply
1109 to Neighbor Solicitation requests for the router’s own
1110 IPv6 address and load balancing IPv6 VIPs and populate
1111 the logical router’s mac binding table.
1112
1113 For each router port P that owns IPv6 address A,
1114 solicited node address S, and Ethernet address E, a pri‐
1115 ority-90 flow matches inport == P && nd_ns && ip6.dst ==
              {A, S} && nd.target == A with the following actions:
1117
1118 put_nd(inport, ip6.src, nd.sll);
1119 nd_na_router {
1120 eth.src = E;
1121 ip6.src = A;
1122 nd.target = A;
1123 nd.tll = E;
1124 outport = inport;
1125 flags.loopback = 1;
1126 output;
1127 };
1128
1129
1130 For each router port P that has load balancing VIP A,
1131 solicited node address S, and Ethernet address E, a pri‐
1132 ority-90 flow matches inport == P && nd_ns && ip6.dst ==
              {A, S} && nd.target == A with the following actions:
1134
1135 put_nd(inport, ip6.src, nd.sll);
1136 nd_na {
1137 eth.src = E;
1138 ip6.src = A;
1139 nd.target = A;
1140 nd.tll = E;
1141 outport = inport;
1142 flags.loopback = 1;
1143 output;
1144 };
1145
1146
1147 For the gateway port on a distributed logical router
1148 (where one of the logical router ports specifies a redi‐
1149 rect-chassis), the above flows replying to IPv6 Neighbor
1150 Solicitations are only programmed on the gateway port
1151 instance on the redirect-chassis. This behavior avoids
1152 generation of multiple replies from different chassis,
1153 and allows upstream MAC learning to point to the redi‐
1154 rect-chassis.
1155
1156 · IPv6 neighbor advertisement handling. This flow uses
1157 neighbor advertisements to populate the logical router’s
1158 mac binding table. A priority-90 flow with match nd_na
1159 has actions put_nd(inport, nd.target, nd.tll);.
1160
           ·  Handling of IPv6 neighbor solicitations for non-hosted
              addresses. This flow uses neighbor solicitations to
              populate the logical router’s mac binding table (ones that
              were directed at the logical router would have matched the
              priority-90 neighbor solicitation flow already). A
              priority-80 flow with match nd_ns has actions
              put_nd(inport, ip6.src, nd.sll);.
1168
1169 · UDP port unreachable. Priority-80 flows generate ICMP
1170 port unreachable messages in reply to UDP datagrams
1171 directed to the router’s IP address, except in the spe‐
1172 cial case of gateways, which accept traffic directed to a
1173 router IP for load balancing and NAT purposes.
1174
1175 These flows should not match IP fragments with nonzero
1176 offset.
1177
           ·  TCP reset. Priority-80 flows generate TCP reset messages
              in reply to TCP segments directed to the router’s IP
              address, except in the special case of gateways, which
              accept traffic directed to a router IP for load balancing
              and NAT purposes.
1183
1184 These flows should not match IP fragments with nonzero
1185 offset.
1186
1187 · Protocol or address unreachable. Priority-70 flows gener‐
1188 ate ICMP protocol or address unreachable messages for
1189 IPv4 and IPv6 respectively in reply to packets directed
1190 to the router’s IP address on IP protocols other than
1191 UDP, TCP, and ICMP, except in the special case of gate‐
1192 ways, which accept traffic directed to a router IP for
1193 load balancing purposes.
1194
1195 These flows should not match IP fragments with nonzero
1196 offset.
1197
1198 · Drop other IP traffic to this router. These flows drop
1199 any other traffic destined to an IP address of this
1200 router that is not already handled by one of the flows
1201 above, which amounts to ICMP (other than echo requests)
1202 and fragments with nonzero offsets. For each IP address A
1203 owned by the router, a priority-60 flow matches ip4.dst
1204 == A and drops the traffic. An exception is made and the
1205 above flow is not added if the router port’s own IP
1206 address is used to SNAT packets passing through that
1207 router.
1208
1209 The flows above handle all of the traffic that might be directed to the
1210 router itself. The following flows (with lower priorities) handle the
1211 remaining traffic, potentially for forwarding:
1212
1213 · Drop Ethernet local broadcast. A priority-50 flow with
1214 match eth.bcast drops traffic destined to the local Eth‐
1215 ernet broadcast address. By definition this traffic
1216 should not be forwarded.
1217
1218 · ICMP time exceeded. For each router port P, whose IP
1219 address is A, a priority-40 flow with match inport == P
1220 && ip.ttl == {0, 1} && !ip.later_frag matches packets
1221 whose TTL has expired, with the following actions to send
1222 an ICMP time exceeded reply for IPv4 and IPv6 respec‐
1223 tively:
1224
1225 icmp4 {
1226 icmp4.type = 11; /* Time exceeded. */
1227 icmp4.code = 0; /* TTL exceeded in transit. */
1228 ip4.dst = ip4.src;
1229 ip4.src = A;
1230 ip.ttl = 255;
1231 next;
1232 };
1233 icmp6 {
1234 icmp6.type = 3; /* Time exceeded. */
1235 icmp6.code = 0; /* TTL exceeded in transit. */
1236 ip6.dst = ip6.src;
1237 ip6.src = A;
1238 ip.ttl = 255;
1239 next;
1240 };
1241
1242
           ·  TTL discard. A priority-30 flow with match ip.ttl == {0,
              1} and actions drop; drops other packets whose TTL has
              expired and that should not receive an ICMP error reply
              (i.e., fragments with nonzero offset).
1247
           ·  Next table. A priority-0 flow matches all packets that
              aren’t already handled and uses the action next; to feed
              them to the next table.
1251
1252 Ingress Table 2: DEFRAG
1253
       This table sends packets to the connection tracker for tracking
       and defragmentation. It contains a priority-0 flow that simply
       moves traffic to the next table. If load balancing rules with
       virtual IP addresses (and ports) are configured in the
       OVN_Northbound database for a Gateway router, a priority-100
       flow is added for each configured virtual IP address VIP. For
       IPv4 VIPs the flow matches ip && ip4.dst == VIP. For IPv6 VIPs,
       the flow matches ip && ip6.dst == VIP. The flow uses the action
       ct_next; to send IP packets to the connection tracker for packet
       defragmentation and tracking before sending them to the next
       table.
1263
1264 Ingress Table 3: UNSNAT
1265
       This table handles the reverse traffic of already established
       connections: SNAT has already been done in the egress pipeline,
       and the packet has now entered the ingress pipeline as part of a
       reply. It is unSNATted here.
1269
1270 Ingress Table 3: UNSNAT on Gateway Routers
1271
1272 · If the Gateway router has been configured to force SNAT
1273 any previously DNATted packets to B, a priority-110 flow
              matches ip && ip4.dst == B with an action ct_snat;.
1275
1276 If the Gateway router has been configured to force SNAT
1277 any previously load-balanced packets to B, a priority-100
              flow matches ip && ip4.dst == B with an action ct_snat;.
1279
              For each NAT configuration in the OVN Northbound database
              that asks to change the source IP address of a
1282 packet from A to B, a priority-90 flow matches ip &&
              ip4.dst == B with an action ct_snat;.
1284
1285 A priority-0 logical flow with match 1 has actions next;.
1286
1287 Ingress Table 3: UNSNAT on Distributed Routers
1288
           ·  For each configuration in the OVN Northbound database
              that asks to change the source IP address of a packet
1291 from A to B, a priority-100 flow matches ip && ip4.dst ==
1292 B && inport == GW, where GW is the logical router gateway
1293 port, with an action ct_snat;.
1294
1295 If the NAT rule cannot be handled in a distributed man‐
1296 ner, then the priority-100 flow above is only programmed
1297 on the redirect-chassis.
1298
              For each configuration in the OVN Northbound database
              that asks to change the source IP address of a packet
1301 from A to B, a priority-50 flow matches ip && ip4.dst ==
1302 B with an action REGBIT_NAT_REDIRECT = 1; next;. This
1303 flow is for east/west traffic to a NAT destination IPv4
1304 address. By setting the REGBIT_NAT_REDIRECT flag, in the
1305 ingress table Gateway Redirect this will trigger a redi‐
1306 rect to the instance of the gateway port on the redi‐
1307 rect-chassis.
1308
1309 A priority-0 logical flow with match 1 has actions next;.
1310
1311 Ingress Table 4: DNAT
1312
       Packets enter this pipeline with a destination IP address that
       needs to be DNATted from a virtual IP address to a real IP
       address. Packets in the reverse direction need to be unDNATted.
1316
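The DNAT and unDNAT directions can be sketched as one address mapping applied in both directions (apply_dnat is hypothetical; the real processing is stateful, via ct_dnat and the connection tracker):

```python
def apply_dnat(pkt, rules):
    """Sketch of DNAT: forward traffic to a virtual address A is
    rewritten to the real address B; replies from B are rewritten
    back to A (the unDNAT direction)."""
    fwd = dict(rules)                     # A -> B
    rev = {b: a for a, b in rules}        # B -> A, for reply traffic
    if pkt["ip4.dst"] in fwd:
        return {**pkt, "ip4.dst": fwd[pkt["ip4.dst"]]}
    if pkt["ip4.src"] in rev:
        return {**pkt, "ip4.src": rev[pkt["ip4.src"]]}
    return pkt
```
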
1317 Ingress Table 4: Load balancing DNAT rules
1318
       The following load balancing DNAT flows are added for a Gateway
       router or a router with a gateway port. These flows are
       programmed only on the redirect-chassis. These flows are not
       programmed for load balancers with IPv6 VIPs.
1323
1324 · For all the configured load balancing rules for a Gateway
1325 router or Router with gateway port in OVN_Northbound
1326 database that includes a L4 port PORT of protocol P and
1327 IPv4 address VIP, a priority-120 flow that matches on
1328 ct.new && ip && ip4.dst == VIP && P && P.dst == PORT
1329 with an action of ct_lb(args), where args contains comma
1330 separated IPv4 addresses (and optional port numbers) to
1331 load balance to. If the router is configured to force
1332 SNAT any load-balanced packets, the above action will be
1333 replaced by flags.force_snat_for_lb = 1; ct_lb(args);.
1334
1335 · For all the configured load balancing rules for a router
1336 in OVN_Northbound database that includes a L4 port PORT
1337 of protocol P and IPv4 address VIP, a priority-120 flow
1338 that matches on ct.est && ip && ip4.dst == VIP && P &&
1339 P.dst == PORT
1340 with an action of ct_dnat;. If the router is configured
1341 to force SNAT any load-balanced packets, the above action
1342 will be replaced by flags.force_snat_for_lb = 1;
1343 ct_dnat;.
1344
1345 · For all the configured load balancing rules for a router
1346 in OVN_Northbound database that includes just an IP
1347 address VIP to match on, a priority-110 flow that matches
1348 on ct.new && ip && ip4.dst == VIP with an action of
1349 ct_lb(args), where args contains comma separated IPv4
1350 addresses. If the router is configured to force SNAT any
1351 load-balanced packets, the above action will be replaced
1352 by flags.force_snat_for_lb = 1; ct_lb(args);.
1353
1354 · For all the configured load balancing rules for a router
1355 in OVN_Northbound database that includes just an IP
1356 address VIP to match on, a priority-110 flow that matches
1357 on ct.est && ip && ip4.dst == VIP with an action of
1358 ct_dnat;. If the router is configured to force SNAT any
1359 load-balanced packets, the above action will be replaced
1360 by flags.force_snat_for_lb = 1; ct_dnat;.
1361
1362 Ingress Table 4: DNAT on Gateway Routers
1363
           ·  For each configuration in the OVN Northbound database
              that asks to change the destination IP address of a
1366 packet from A to B, a priority-100 flow matches ip &&
1367 ip4.dst == A with an action flags.loopback = 1;
1368 ct_dnat(B);. If the Gateway router is configured to force
1369 SNAT any DNATed packet, the above action will be replaced
1370 by flags.force_snat_for_dnat = 1; flags.loopback = 1;
1371 ct_dnat(B);.
1372
1373 · For all IP packets of a Gateway router, a priority-50
1374 flow with an action flags.loopback = 1; ct_dnat;.
1375
1376 · A priority-0 logical flow with match 1 has actions next;.
1377
1378 Ingress Table 4: DNAT on Distributed Routers
1379
1380 On distributed routers, the DNAT table only handles packets with desti‐
1381 nation IP address that needs to be DNATted from a virtual IP address to
1382 a real IP address. The unDNAT processing in the reverse direction is
1383 handled in a separate table in the egress pipeline.
1384
           ·  For each configuration in the OVN Northbound database
              that asks to change the destination IP address of a
1387 packet from A to B, a priority-100 flow matches ip &&
1388 ip4.dst == B && inport == GW, where GW is the logical
1389 router gateway port, with an action ct_dnat(B);.
1390
1391 If the NAT rule cannot be handled in a distributed man‐
1392 ner, then the priority-100 flow above is only programmed
1393 on the redirect-chassis.
1394
              For each configuration in the OVN Northbound database
              that asks to change the destination IP address of a
1397 packet from A to B, a priority-50 flow matches ip &&
1398 ip4.dst == B with an action REGBIT_NAT_REDIRECT = 1;
1399 next;. This flow is for east/west traffic to a NAT desti‐
1400 nation IPv4 address. By setting the REGBIT_NAT_REDIRECT
1401 flag, in the ingress table Gateway Redirect this will
1402 trigger a redirect to the instance of the gateway port on
1403 the redirect-chassis.
1404
1405 A priority-0 logical flow with match 1 has actions next;.
1406
1407 Ingress Table 5: IPv6 ND RA option processing
1408
           ·  A priority-50 logical flow is added for each logical
              router port configured with IPv6 ND RA options, which
              matches IPv6 ND Router Solicitation packets, applies the
              action put_nd_ra_opts, and advances the packet to the
              next table.

                  reg0[5] = put_nd_ra_opts(options); next;
1416
1417
1418 For a valid IPv6 ND RS packet, this transforms the packet
1419 into an IPv6 ND RA reply and sets the RA options to the
1420 packet and stores 1 into reg0[5]. For other kinds of
1421 packets, it just stores 0 into reg0[5]. Either way, it
1422 continues to the next table.
1423
1424 · A priority-0 logical flow with match 1 has actions next;.
1425
1426 Ingress Table 6: IPv6 ND RA responder
1427
1428 This table implements IPv6 ND RA responder for the IPv6 ND RA replies
1429 generated by the previous table.
1430
1431 · A priority-50 logical flow is added for each logical
1432 router port configured with IPv6 ND RA options which
1433 matches IPv6 ND RA packets and reg0[5] == 1 and responds
1434 back to the inport after applying these actions. If
1435 reg0[5] is set to 1, it means that the action
1436 put_nd_ra_opts was successful.
1437
1438 eth.dst = eth.src;
1439 eth.src = E;
1440 ip6.dst = ip6.src;
1441 ip6.src = I;
1442 outport = P;
1443 flags.loopback = 1;
1444 output;
1445
1446
1447 where E is the MAC address and I is the IPv6 link local
1448 address of the logical router port.
1449
1450 (This terminates packet processing in ingress pipeline;
1451 the packet does not go to the next ingress table.)
1452
1453 · A priority-0 logical flow with match 1 has actions next;.
1454
1455 Ingress Table 7: IP Routing
1456
1457 A packet that arrives at this table is an IP packet that should be
1458 routed to the address in ip4.dst or ip6.dst. This table implements IP
1459 routing, setting reg0 (or xxreg0 for IPv6) to the next-hop IP address
1460 (leaving ip4.dst or ip6.dst, the packet’s final destination, unchanged)
1461 and advances to the next table for ARP resolution. It also sets reg1
1462 (or xxreg1) to the IP address owned by the selected router port
1463 (ingress table ARP Request will generate an ARP request, if needed,
1464 with reg0 as the target protocol address and reg1 as the source proto‐
1465 col address).
1466
1467 This table contains the following logical flows:
1468
1469 · For distributed logical routers where one of the logical
1470 router ports specifies a redirect-chassis, a priority-300
1471 logical flow with match REGBIT_NAT_REDIRECT == 1 has
1472 actions ip.ttl--; next;. The outport will be set later in
1473 the Gateway Redirect table.
1474
1475 · IPv4 routing table. For each route to IPv4 network N with
1476 netmask M, on router port P with IP address A and Ether‐
1477 net address E, a logical flow with match ip4.dst == N/M,
1478 whose priority is the number of 1-bits in M, has the fol‐
1479 lowing actions:
1480
1481 ip.ttl--;
1482 reg0 = G;
1483 reg1 = A;
1484 eth.src = E;
1485 outport = P;
1486 flags.loopback = 1;
1487 next;
1488
1489
1490 (Ingress table 1 already verified that ip.ttl--; will not
1491 yield a TTL exceeded error.)
1492
              If the route has a gateway, G is the gateway IP address;
              if the route is a configured static route, G is its next
              hop IP address. Otherwise, G is ip4.dst.
1496
1497 · IPv6 routing table. For each route to IPv6 network N with
1498 netmask M, on router port P with IP address A and Ether‐
1499 net address E, a logical flow with match in CIDR notation
1500 ip6.dst == N/M, whose priority is the integer value of M,
1501 has the following actions:
1502
1503 ip.ttl--;
1504 xxreg0 = G;
1505 xxreg1 = A;
1506 eth.src = E;
1507 outport = P;
1508 flags.loopback = 1;
1509 next;
1510
1511
1512 (Ingress table 1 already verified that ip.ttl--; will not
1513 yield a TTL exceeded error.)
1514
              If the route has a gateway, G is the gateway IP address;
              if the route is a configured static route, G is its next
              hop IP address. Otherwise, G is ip6.dst.
1518
1519 If the address A is in the link-local scope, the route
1520 will be limited to sending on the ingress port.
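
Because each route’s logical-flow priority equals its prefix length, flow selection implements longest-prefix-match routing. A sketch under that interpretation (route_lookup and the route tuples are hypothetical):

```python
import ipaddress

def route_lookup(routes, ip_dst):
    """Pick the route whose network contains ip_dst with the longest
    prefix (i.e., the highest flow priority).  Each route is
    (network, gateway or None, outport); G falls back to ip_dst."""
    dst = ipaddress.ip_address(ip_dst)
    best = None
    for net, gw, port in routes:
        n = ipaddress.ip_network(net)
        if dst in n and (best is None or n.prefixlen > best[0]):
            best = (n.prefixlen, gw, port)
    if best is None:
        return None
    _, gw, port = best
    return (gw if gw is not None else ip_dst), port   # (G, outport)
```
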
1521
1522 Ingress Table 8: ARP/ND Resolution
1523
1524 Any packet that reaches this table is an IP packet whose next-hop IPv4
1525 address is in reg0 or IPv6 address is in xxreg0. (ip4.dst or ip6.dst
1526 contains the final destination.) This table resolves the IP address in
1527 reg0 (or xxreg0) into an output port in outport and an Ethernet address
1528 in eth.dst, using the following flows:
1529
1530 · For distributed logical routers where one of the logical
1531 router ports specifies a redirect-chassis, a priority-200
1532 logical flow with match REGBIT_NAT_REDIRECT == 1 has
1533 actions eth.dst = E; next;, where E is the ethernet
1534 address of the router’s distributed gateway port.
1535
1536 · Static MAC bindings. MAC bindings can be known statically
1537 based on data in the OVN_Northbound database. For router
1538 ports connected to logical switches, MAC bindings can be
1539 known statically from the addresses column in the Logi‐
1540 cal_Switch_Port table. For router ports connected to
              other logical routers, MAC bindings can be known
              statically from the mac and networks columns in the
              Logical_Router_Port table.
1544
1545 For each IPv4 address A whose host is known to have Eth‐
1546 ernet address E on router port P, a priority-100 flow
              with match outport == P && reg0 == A has actions eth.dst
1548 = E; next;.
1549
1550 For each IPv6 address A whose host is known to have Eth‐
1551 ernet address E on router port P, a priority-100 flow
              with match outport == P && xxreg0 == A has actions
1553 eth.dst = E; next;.
1554
              For each logical router port with an IPv4 address A and a
              MAC address of E that is reachable via a different
              logical router port P, a priority-100 flow with match
              outport == P && reg0 == A has actions eth.dst = E; next;.
1559
              For each logical router port with an IPv6 address A and a
              MAC address of E that is reachable via a different
              logical router port P, a priority-100 flow with match
              outport == P && xxreg0 == A has actions eth.dst = E;
              next;.
1564
1565 · Dynamic MAC bindings. These flows resolve MAC-to-IP bind‐
1566 ings that have become known dynamically through ARP or
1567 neighbor discovery. (The ingress table ARP Request will
1568 issue an ARP or neighbor solicitation request for cases
1569 where the binding is not yet known.)
1570
1571 A priority-0 logical flow with match ip4 has actions
1572 get_arp(outport, reg0); next;.
1573
1574 A priority-0 logical flow with match ip6 has actions
1575 get_nd(outport, xxreg0); next;.
1576
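       The resolution order above can be modeled as a priority-ordered
       lookup: a static binding at priority 100 wins over the dynamic
       get_arp()/get_nd() fallback at priority 0. A minimal sketch (the port
       name, addresses, and table contents are hypothetical, not taken from
       any real database):

```python
# Model the ARP/ND Resolution stage as a priority-ordered flow table.
# Static MAC bindings sit at priority 100; the dynamic get_arp()
# fallback sits at priority 0, so it fires only when nothing else does.

def resolve(flows, outport, reg0):
    """Return the action of the highest-priority matching flow."""
    for _priority, match, action in sorted(flows, key=lambda f: -f[0]):
        if match(outport, reg0):
            return action
    raise LookupError("no matching flow")

# Hypothetical static binding: host 10.0.0.5 behind port "lrp1".
flows = [
    (100, lambda p, a: p == "lrp1" and a == "10.0.0.5",
     "eth.dst = 00:00:00:00:00:01; next;"),
    (0, lambda p, a: True, "get_arp(outport, reg0); next;"),
]

print(resolve(flows, "lrp1", "10.0.0.5"))   # static binding wins
print(resolve(flows, "lrp1", "10.0.0.99"))  # falls through to get_arp
```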
   Ingress Table 9: Gateway Redirect

       For distributed logical routers where one of the logical router ports
       specifies a redirect-chassis, this table redirects certain packets to
       the distributed gateway port instance on the redirect-chassis. This
       table has the following flows:

       ·   A priority-200 logical flow with match REGBIT_NAT_REDIRECT == 1
           has actions outport = CR; next;, where CR is the chassisredirect
           port representing the instance of the logical router distributed
           gateway port on the redirect-chassis.

       ·   A priority-150 logical flow with match outport == GW && eth.dst
           == 00:00:00:00:00:00 has actions outport = CR; next;, where GW is
           the logical router distributed gateway port and CR is the
           chassisredirect port representing the instance of the logical
           router distributed gateway port on the redirect-chassis.

       ·   For each NAT rule in the OVN Northbound database that can be
           handled in a distributed manner, a priority-100 logical flow with
           match ip4.src == B && outport == GW, where GW is the logical
           router distributed gateway port, with actions next;.

       ·   A priority-50 logical flow with match outport == GW has actions
           outport = CR; next;, where GW is the logical router distributed
           gateway port and CR is the chassisredirect port representing the
           instance of the logical router distributed gateway port on the
           redirect-chassis.

       ·   A priority-0 logical flow with match 1 has actions next;.

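       The five priority levels above cooperate: traffic covered by a
       distributed NAT rule hits the priority-100 flow, whose action next;
       leaves it on the local chassis, while all other traffic bound for the
       gateway port falls through to priority 50 and is redirected. A sketch
       of that ladder (the port names and source address are invented for
       illustration):

```python
# Sketch of the Gateway Redirect priority ladder; the highest matching
# priority wins. Port names and the NAT source address are hypothetical.

def gateway_redirect(pkt):
    GW, CR = "gw-port", "cr-gw-port"  # hypothetical port names
    table = [  # already sorted by descending priority
        (200, pkt.get("nat_redirect") == 1, "outport = CR; next;"),
        (150, pkt.get("outport") == GW
              and pkt.get("eth_dst") == "00:00:00:00:00:00",
              "outport = CR; next;"),
        # Distributed-NAT traffic is left alone so it egresses locally.
        (100, pkt.get("ip4_src") == "172.16.1.2"
              and pkt.get("outport") == GW, "next;"),
        (50, pkt.get("outport") == GW, "outport = CR; next;"),
        (0, True, "next;"),
    ]
    for _prio, matched, action in table:
        if matched:
            return action

# Distributed-NAT source bypasses redirection; other GW traffic does not.
print(gateway_redirect({"ip4_src": "172.16.1.2", "outport": "gw-port"}))
print(gateway_redirect({"ip4_src": "10.0.0.9", "outport": "gw-port"}))
```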
   Ingress Table 10: ARP Request

       In the common case where the Ethernet destination has been resolved,
       this table outputs the packet. Otherwise, it composes and sends an
       ARP or IPv6 Neighbor Solicitation request. It holds the following
       flows:

       ·   Unknown MAC address. A priority-100 flow for IPv4 packets with
           match eth.dst == 00:00:00:00:00:00 has the following actions:

               arp {
                   eth.dst = ff:ff:ff:ff:ff:ff;
                   arp.spa = reg1;
                   arp.tpa = reg0;
                   arp.op = 1; /* ARP request. */
                   output;
               };

       ·   Unknown MAC address. A priority-100 flow for IPv6 packets with
           match eth.dst == 00:00:00:00:00:00 has the following actions:

               nd_ns {
                   nd.target = xxreg0;
                   output;
               };

           (The ingress table IP Routing initialized reg1 with the IP
           address owned by outport and (xx)reg0 with the next-hop IP
           address.)

           The IP packet that triggers the ARP/IPv6 NS request is dropped.

       ·   Known MAC address. A priority-0 flow with match 1 has actions
           output;.

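       The arp { ... } action above composes a broadcast ARP request whose
       sender and target protocol addresses come from reg1 and reg0. The
       standard on-wire layout of such a request can be sketched as follows
       (the MAC and IP values are made up for illustration):

```python
import socket
import struct

def arp_request(src_mac, spa, tpa):
    """Build an Ethernet frame carrying an ARP request (arp.op = 1),
    broadcast to ff:ff:ff:ff:ff:ff, as the arp{} action does."""
    bcast = b"\xff" * 6
    eth = bcast + src_mac + struct.pack("!H", 0x0806)  # EtherType = ARP
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                              # htype: Ethernet
        0x0800,                         # ptype: IPv4
        6, 4,                           # hlen, plen
        1,                              # oper: request (arp.op = 1)
        src_mac,                        # sender hardware address
        socket.inet_aton(spa),          # sender protocol address (reg1)
        b"\x00" * 6,                    # target hardware address: unknown
        socket.inet_aton(tpa),          # target protocol address (reg0)
    )
    return eth + arp

frame = arp_request(b"\x00\x00\x00\x00\x00\x01", "10.0.0.1", "10.0.0.5")
print(len(frame))  # 14-byte Ethernet header + 28-byte ARP payload = 42
```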
   Egress Table 0: UNDNAT

       This table handles reverse traffic for already established
       connections; that is, DNAT has already been done in the ingress
       pipeline and the packet has entered the egress pipeline as part of a
       reply. For NAT on a distributed router, the packet is unDNATted here.
       For Gateway routers, the unDNAT processing is carried out in the
       ingress DNAT table.

       ·   For all the configured load balancing rules for a router with a
           gateway port in the OVN_Northbound database that include an IPv4
           address VIP, for every backend IPv4 address B defined for the
           VIP, a priority-120 flow is programmed on the redirect-chassis
           that matches ip && ip4.src == B && outport == GW, where GW is the
           logical router gateway port, with an action ct_dnat;. If the
           backend IPv4 address B is also configured with an L4 port PORT of
           protocol P, then the match also includes P.src == PORT. These
           flows are not added for load balancers with IPv6 VIPs.

           If the router is configured to force SNAT any load-balanced
           packets, the above action is replaced by flags.force_snat_for_lb
           = 1; ct_dnat;.

       ·   For each configuration in the OVN Northbound database that asks
           to change the destination IP address of a packet from an IP
           address of A to B, a priority-100 flow matches ip && ip4.src == B
           && outport == GW, where GW is the logical router gateway port,
           with an action ct_dnat;.

           If the NAT rule cannot be handled in a distributed manner, then
           the priority-100 flow above is only programmed on the
           redirect-chassis.

           If the NAT rule can be handled in a distributed manner, then
           there is an additional action eth.src = EA;, where EA is the
           Ethernet address associated with the IP address A in the NAT
           rule. This allows upstream MAC learning to point to the correct
           chassis.

       ·   A priority-0 logical flow with match 1 has actions next;.

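       The load-balancer flows above amount to generating one match/action
       pair per backend. A sketch of that generation step, using invented
       VIP backend addresses, port values, and a hypothetical gateway port
       name:

```python
# Sketch: generate UNDNAT match/action pairs for one load balancer's
# backends, per the description above. All concrete values are
# illustrative, not real database contents.

def undnat_lb_flows(backends, gw="gw-port", proto="tcp",
                    force_snat=False):
    """backends: list of (ipv4_address, l4_port_or_None) tuples."""
    action = ("flags.force_snat_for_lb = 1; ct_dnat;"
              if force_snat else "ct_dnat;")
    flows = []
    for ip, port in backends:
        match = f"ip && ip4.src == {ip} && outport == {gw}"
        if port is not None:
            # Backend configured with an L4 port: match it too.
            match += f" && {proto}.src == {port}"
        flows.append((120, match, action))
    return flows

for flow in undnat_lb_flows([("10.0.0.10", 8080), ("10.0.0.11", None)]):
    print(flow)
```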
   Egress Table 1: SNAT

       Packets that are configured to be SNATed get their source IP address
       changed based on the configuration in the OVN Northbound database.

   Egress Table 1: SNAT on Gateway Routers

       ·   If the Gateway router in the OVN Northbound database has been
           configured to force SNAT a packet (that has been previously
           DNATted) to B, a priority-100 flow matches
           flags.force_snat_for_dnat == 1 && ip with an action ct_snat(B);.

           If the Gateway router in the OVN Northbound database has been
           configured to force SNAT a packet (that has been previously
           load-balanced) to B, a priority-100 flow matches
           flags.force_snat_for_lb == 1 && ip with an action ct_snat(B);.

           For each configuration in the OVN Northbound database that asks
           to change the source IP address of a packet from an IP address of
           A, or of a packet that belongs to network A, to B, a flow matches
           ip && ip4.src == A with an action ct_snat(B);. The priority of
           the flow is calculated based on the mask of A, with matches
           having larger masks getting higher priorities.

           A priority-0 logical flow with match 1 has actions next;.

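       "Larger masks getting higher priorities" means the flow priority
       tracks the prefix length of A, so a host address (/32) beats a subnet
       rule, which beats a broader one. A sketch of that longest-prefix
       ordering (the networks and translated addresses are invented; the
       exact priority formula ovn-northd uses is not reproduced here):

```python
import ipaddress

# Hypothetical SNAT rules: sources in network A are rewritten to B.
# Larger masks (longer prefixes) get higher priorities, so the most
# specific matching rule wins.
rules = [
    (ipaddress.ip_network("10.0.0.0/16"), "172.16.0.1"),
    (ipaddress.ip_network("10.0.1.0/24"), "172.16.0.2"),
    (ipaddress.ip_network("10.0.1.7/32"), "172.16.0.3"),
]

def snat_lookup(src):
    """Pick the matching rule with the largest mask."""
    addr = ipaddress.ip_address(src)
    matches = [(net.prefixlen, b) for net, b in rules if addr in net]
    if not matches:
        return None
    return max(matches)[1]   # highest prefix length = highest priority

print(snat_lookup("10.0.1.7"))   # /32 rule wins
print(snat_lookup("10.0.1.9"))   # /24 rule wins
print(snat_lookup("10.0.5.1"))   # only the /16 rule matches
```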
   Egress Table 1: SNAT on Distributed Routers

       ·   For each configuration in the OVN Northbound database that asks
           to change the source IP address of a packet from an IP address of
           A, or of a packet that belongs to network A, to B, a flow matches
           ip && ip4.src == A && outport == GW, where GW is the logical
           router gateway port, with an action ct_snat(B);. The priority of
           the flow is calculated based on the mask of A, with matches
           having larger masks getting higher priorities.

           If the NAT rule cannot be handled in a distributed manner, then
           the flow above is only programmed on the redirect-chassis.

           If the NAT rule can be handled in a distributed manner, then
           there is an additional action eth.src = EA;, where EA is the
           Ethernet address associated with the IP address A in the NAT
           rule. This allows upstream MAC learning to point to the correct
           chassis.

       ·   A priority-0 logical flow with match 1 has actions next;.

   Egress Table 2: Egress Loopback

       This table applies to distributed logical routers where one of the
       logical router ports specifies a redirect-chassis.

       Earlier in the ingress pipeline, some east-west traffic was
       redirected to the chassisredirect port, based on flows in the UNSNAT
       and DNAT ingress tables setting the REGBIT_NAT_REDIRECT flag, which
       then triggered a match to a flow in the Gateway Redirect ingress
       table. The intention was not to actually send traffic out the
       distributed gateway port instance on the redirect-chassis. This
       traffic was sent to the distributed gateway port instance in order
       for DNAT and/or SNAT processing to be applied.

       While UNDNAT and SNAT processing have already occurred by this point,
       this traffic needs to be forced through egress loopback on this
       distributed gateway port instance, in order for UNSNAT and DNAT
       processing to be applied, and also for IP routing and ARP resolution
       after all of the NAT processing, so that the packet can be forwarded
       to the destination.

       This table has the following flows:

       ·   For each NAT rule in the OVN Northbound database on a distributed
           router, a priority-100 logical flow with match ip4.dst == E &&
           outport == GW, where E is the external IP address specified in
           the NAT rule, and GW is the logical router distributed gateway
           port, with the following actions:

               clone {
                   ct_clear;
                   inport = outport;
                   outport = "";
                   flags = 0;
                   flags.loopback = 1;
                   reg0 = 0;
                   reg1 = 0;
                   ...
                   reg9 = 0;
                   REGBIT_EGRESS_LOOPBACK = 1;
                   next(pipeline=ingress, table=0);
               };

           flags.loopback is set since inport is unchanged and the packet
           may return back to that port after NAT processing.
           REGBIT_EGRESS_LOOPBACK is set to indicate that egress loopback
           has occurred, in order to skip the source IP address check
           against the router address.

       ·   A priority-0 logical flow with match 1 has actions next;.

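       The net effect of the clone action is to re-inject a copy of the
       packet at the top of the ingress pipeline with clean state: the port
       is swapped, registers and flags are cleared, and the loopback bits
       are set. A toy model of that reset (the packet representation is
       invented; field names follow the action text above):

```python
# Toy model of the egress-loopback clone action: swap the port, clear
# flags and registers, set the loopback bits, and restart at ingress
# table 0. The original packet is left untouched (it is a clone).

def egress_loopback(pkt):
    clone = dict(pkt)
    clone["inport"] = clone["outport"]    # inport = outport;
    clone["outport"] = ""                 # outport = "";
    clone["flags"] = 0                    # flags = 0;
    clone["flags.loopback"] = 1
    for i in range(10):                   # reg0 = 0; ... reg9 = 0;
        clone[f"reg{i}"] = 0
    clone["REGBIT_EGRESS_LOOPBACK"] = 1   # skip source-IP router check
    clone["pipeline"], clone["table"] = "ingress", 0
    return clone

pkt = {"inport": "lsp1", "outport": "gw-port",
       "reg0": 0x0A000005, "flags": 3}
print(egress_loopback(pkt)["inport"])  # gw-port
```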
   Egress Table 3: Delivery

       Packets that reach this table are ready for delivery. It contains
       priority-100 logical flows that match packets on each enabled logical
       router port, with action output;.



Open vSwitch 2.10.1               ovn-northd                    ovn-northd(8)