ovn-northd(8)                 Open vSwitch Manual                ovn-northd(8)

NAME
       ovn-northd - Open Virtual Network central control daemon

SYNOPSIS
       ovn-northd [options]

DESCRIPTION
       ovn-northd is a centralized daemon responsible for translating the
       high-level OVN configuration into logical configuration consumable by
       daemons such as ovn-controller. It translates the logical network
       configuration, expressed in terms of conventional network concepts and
       taken from the OVN Northbound Database (see ovn-nb(5)), into logical
       datapath flows in the OVN Southbound Database (see ovn-sb(5)) below
       it.

OPTIONS
       --ovnnb-db=database
              The OVSDB database containing the OVN Northbound Database. If
              the OVN_NB_DB environment variable is set, its value is used
              as the default. Otherwise, the default is
              unix:/var/run/openvswitch/ovnnb_db.sock.

       --ovnsb-db=database
              The OVSDB database containing the OVN Southbound Database. If
              the OVN_SB_DB environment variable is set, its value is used
              as the default. Otherwise, the default is
              unix:/var/run/openvswitch/ovnsb_db.sock.

       database in the above options must be an OVSDB active or passive
       connection method, as described in ovsdb(7).

   Daemon Options
       --pidfile[=pidfile]
              Causes a file (by default, program.pid) to be created
              indicating the PID of the running process. If the pidfile
              argument is not specified, or if it does not begin with /,
              then it is created in /var/run/openvswitch.

              If --pidfile is not specified, no pidfile is created.

       --overwrite-pidfile
              By default, when --pidfile is specified and the specified
              pidfile already exists and is locked by a running process, the
              daemon refuses to start. Specify --overwrite-pidfile to cause
              it to instead overwrite the pidfile.

              When --pidfile is not specified, this option has no effect.

       --detach
              Runs this program as a background process. The process forks,
              and in the child it starts a new session, closes the standard
              file descriptors (which has the side effect of disabling
              logging to the console), and changes its current directory to
              the root (unless --no-chdir is specified). After the child
              completes its initialization, the parent exits.

       --monitor
              Creates an additional process to monitor this program. If it
              dies due to a signal that indicates a programming error
              (SIGABRT, SIGALRM, SIGBUS, SIGFPE, SIGILL, SIGPIPE, SIGSEGV,
              SIGXCPU, or SIGXFSZ) then the monitor process starts a new
              copy of it. If the daemon dies or exits for another reason,
              the monitor process exits.

              This option is normally used with --detach, but it also
              functions without it.

       --no-chdir
              By default, when --detach is specified, the daemon changes its
              current working directory to the root directory after it
              detaches. Otherwise, invoking the daemon from a carelessly
              chosen directory would prevent the administrator from
              unmounting the file system that holds that directory.

              Specifying --no-chdir suppresses this behavior, preventing the
              daemon from changing its current working directory. This may
              be useful for collecting core files, since it is common
              behavior to write core dumps into the current working
              directory and the root directory is not a good directory to
              use.

              This option has no effect when --detach is not specified.

       --no-self-confinement
              By default this daemon will try to self-confine itself to work
              with files under well-known directories whitelisted at build
              time. It is better to stick with this default behavior and not
              to use this flag unless some other access control is used to
              confine the daemon. Note that in contrast to other access
              control implementations that are typically enforced from
              kernel space (e.g. DAC or MAC), self-confinement is imposed
              from the user-space daemon itself and hence should not be
              considered a full confinement strategy, but instead should be
              viewed as an additional layer of security.

       --user=user:group
              Causes this program to run as a different user specified in
              user:group, thus dropping most of the root privileges. Short
              forms user and :group are also allowed, with the current user
              or group assumed, respectively. Only daemons started by the
              root user accept this argument.

              On Linux, daemons will be granted CAP_IPC_LOCK and
              CAP_NET_BIND_SERVICE before dropping root privileges. Daemons
              that interact with a datapath, such as ovs-vswitchd, will be
              granted three additional capabilities, namely CAP_NET_ADMIN,
              CAP_NET_BROADCAST and CAP_NET_RAW. The capability change will
              apply even if the new user is root.

              On Windows, this option is not currently supported. For
              security reasons, specifying this option will cause the daemon
              process not to start.

   Logging Options
       -v[spec]
       --verbose=[spec]
              Sets logging levels. Without any spec, sets the log level for
              every module and destination to dbg. Otherwise, spec is a list
              of words separated by spaces or commas or colons, up to one
              from each category below:

              ·      A valid module name, as displayed by the vlog/list
                     command on ovs-appctl(8), limits the log level change
                     to the specified module.

              ·      syslog, console, or file, to limit the log level change
                     to only the system log, the console, or a file,
                     respectively. (If --detach is specified, the daemon
                     closes its standard file descriptors, so logging to the
                     console will have no effect.)

                     On Windows, syslog is accepted as a word and is only
                     useful along with the --syslog-target option (the word
                     has no effect otherwise).

              ·      off, emer, err, warn, info, or dbg, to control the log
                     level. Messages of the given severity or higher will be
                     logged, and messages of lower severity will be filtered
                     out. off filters out all messages. See ovs-appctl(8)
                     for a definition of each log level.

              Case is not significant within spec.

              Regardless of the log levels set for file, logging to a file
              will not take place unless --log-file is also specified (see
              below).

              For compatibility with older versions of OVS, any is accepted
              as a word but has no effect.
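       The spec grammar above can be illustrated with a small parser sketch.
       This is a hypothetical helper, not OVS code; it only shows how the
       three word categories (module, destination, level) are distinguished:

```python
# Hypothetical sketch of how a "-v" spec string is interpreted: words
# separated by spaces, commas, or colons, up to one per category.
import re

DESTINATIONS = {"syslog", "console", "file"}
LEVELS = {"off", "emer", "err", "warn", "info", "dbg"}

def parse_vlog_spec(spec):
    """Return (module, destination, level); None means "all"/default."""
    module = destination = level = None
    for word in [w for w in re.split(r"[ ,:]+", spec.lower()) if w]:
        if word == "any":          # accepted for compatibility, no effect
            continue
        elif word in DESTINATIONS:
            destination = word
        elif word in LEVELS:
            level = word
        else:                      # any other word is taken as a module name
            module = word
    return module, destination, level
```

       For example, a spec of vconn:console:info limits the change to the
       vconn module, the console destination, and the info level.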

       -v
       --verbose
              Sets the maximum logging verbosity level, equivalent to
              --verbose=dbg.

       -vPATTERN:destination:pattern
       --verbose=PATTERN:destination:pattern
              Sets the log pattern for destination to pattern. Refer to
              ovs-appctl(8) for a description of the valid syntax for
              pattern.

       -vFACILITY:facility
       --verbose=FACILITY:facility
              Sets the RFC5424 facility of the log message. facility can be
              one of kern, user, mail, daemon, auth, syslog, lpr, news,
              uucp, clock, ftp, ntp, audit, alert, clock2, local0, local1,
              local2, local3, local4, local5, local6 or local7. If this
              option is not specified, daemon is used as the default for the
              local system syslog and local0 is used while sending a message
              to the target provided via the --syslog-target option.

       --log-file[=file]
              Enables logging to a file. If file is specified, then it is
              used as the exact name for the log file. The default log file
              name used if file is omitted is
              /var/log/openvswitch/program.log.

       --syslog-target=host:port
              Send syslog messages to UDP port on host, in addition to the
              system syslog. The host must be a numerical IP address, not a
              hostname.

       --syslog-method=method
              Specify method as how syslog messages should be sent to the
              syslog daemon. The following forms are supported:

              ·      libc, to use the libc syslog() function. This is the
                     default behavior. The downside of this option is that
                     libc adds a fixed prefix to every message before it is
                     actually sent to the syslog daemon over the /dev/log
                     UNIX domain socket.

              ·      unix:file, to use a UNIX domain socket directly. It is
                     possible to specify an arbitrary message format with
                     this option. However, rsyslogd 8.9 and older versions
                     use a hard-coded parser function that limits UNIX
                     domain socket use. If you want to use an arbitrary
                     message format with older rsyslogd versions, then use a
                     UDP socket to a localhost IP address instead.

              ·      udp:ip:port, to use a UDP socket. With this method it
                     is possible to use an arbitrary message format also
                     with older rsyslogd. When sending syslog messages over
                     a UDP socket, extra precautions need to be taken: for
                     example, the syslog daemon needs to be configured to
                     listen on the specified UDP port, accidental iptables
                     rules could interfere with local syslog traffic, and
                     there are some security considerations that apply to
                     UDP sockets but do not apply to UNIX domain sockets.

   PKI Options
       PKI configuration is required in order to use SSL for the connections
       to the Northbound and Southbound databases.

       -p privkey.pem
       --private-key=privkey.pem
              Specifies a PEM file containing the private key used as
              identity for outgoing SSL connections.

       -c cert.pem
       --certificate=cert.pem
              Specifies a PEM file containing a certificate that certifies
              the private key specified on -p or --private-key to be
              trustworthy. The certificate must be signed by the certificate
              authority (CA) that the peer in SSL connections will use to
              verify it.

       -C cacert.pem
       --ca-cert=cacert.pem
              Specifies a PEM file containing the CA certificate for
              verifying certificates presented to this program by SSL peers.
              (This may be the same certificate that SSL peers use to verify
              the certificate specified on -c or --certificate, or it may be
              a different one, depending on the PKI design in use.)

       -C none
       --ca-cert=none
              Disables verification of certificates presented by SSL peers.
              This introduces a security risk, because it means that
              certificates cannot be verified to be those of known trusted
              hosts.

   Other Options
       --unixctl=socket
              Sets the name of the control socket on which program listens
              for runtime management commands (see RUNTIME MANAGEMENT
              COMMANDS, below). If socket does not begin with /, it is
              interpreted as relative to /var/run/openvswitch. If --unixctl
              is not used at all, the default socket is
              /var/run/openvswitch/program.pid.ctl, where pid is program’s
              process ID.

              On Windows, a local named pipe is used to listen for runtime
              management commands. A file is created at the absolute path
              given by socket or, if --unixctl is not used at all, a file
              named program is created in the configured OVS_RUNDIR
              directory. The file exists just to mimic the behavior of a
              Unix domain socket.

              Specifying none for socket disables the control socket
              feature.

       -h
       --help Prints a brief help message to the console.

       -V
       --version
              Prints version information to the console.

RUNTIME MANAGEMENT COMMANDS
       ovs-appctl can send commands to a running ovn-northd process. The
       currently supported commands are described below.

       exit   Causes ovn-northd to gracefully terminate.

ACTIVE-STANDBY FOR HIGH AVAILABILITY
       You may run ovn-northd more than once in an OVN deployment. OVN will
       automatically ensure that only one of them is active at a time. If
       multiple instances of ovn-northd are running and the active
       ovn-northd fails, one of the hot standby instances of ovn-northd will
       automatically take over.

LOGICAL FLOW TABLE STRUCTURE
       One of the main purposes of ovn-northd is to populate the
       Logical_Flow table in the OVN_Southbound database. This section
       describes how ovn-northd does this for switch and router logical
       datapaths.

   Logical Switch Datapaths
       Ingress Table 0: Admission Control and Ingress Port Security - L2

       Ingress table 0 contains these logical flows:

              ·      Priority 100 flows to drop packets with VLAN tags or
                     multicast Ethernet source addresses.

              ·      Priority 50 flows that implement ingress port security
                     for each enabled logical port. For logical ports on
                     which port security is enabled, these match the inport
                     and the valid eth.src address(es) and advance only
                     those packets to the next flow table. For logical ports
                     on which port security is not enabled, these advance
                     all packets that match the inport.

       There are no flows for disabled logical ports because the
       default-drop behavior of logical flow tables causes packets that
       ingress from them to be dropped.
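       As an illustration, the table-0 flow set described above can be
       sketched in Python. The helper below is hypothetical, not ovn-northd
       code; the match strings mirror the ovn-sb(5) logical flow expression
       syntax:

```python
def build_table0_flows(ports):
    """ports: list of dicts with "name", optional "enabled" (default True),
    and optional "port_security" (list of allowed eth.src addresses).
    Returns (priority, match, action) tuples mirroring ingress table 0."""
    flows = [
        (100, "vlan.present", "drop;"),  # drop VLAN-tagged packets
        (100, "eth.src[40]", "drop;"),   # drop multicast Ethernet source
    ]
    for port in ports:
        if not port.get("enabled", True):
            continue  # no flow: default-drop covers disabled ports
        match = 'inport == "%s"' % port["name"]
        macs = port.get("port_security")
        if macs:
            # port security enabled: also require a valid eth.src
            match += " && eth.src == {%s}" % ", ".join(macs)
        flows.append((50, match, "next;"))
    return flows
```

       Note how a disabled port simply produces no priority-50 flow, so its
       packets fall to the table's default-drop behavior.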

       Ingress Table 1: Ingress Port Security - IP

       Ingress table 1 contains these logical flows:

              ·      For each element in the port security set having one or
                     more IPv4 or IPv6 addresses (or both),

                     ·      Priority 90 flow to allow IPv4 traffic if it has
                            IPv4 addresses which match the inport, valid
                            eth.src and valid ip4.src address(es).

                     ·      Priority 90 flow to allow IPv4 DHCP discovery
                            traffic if it has a valid eth.src. This is
                            necessary since DHCP discovery messages are sent
                            from the unspecified IPv4 address (0.0.0.0)
                            because the IPv4 address has not yet been
                            assigned.

                     ·      Priority 90 flow to allow IPv6 traffic if it has
                            IPv6 addresses which match the inport, valid
                            eth.src and valid ip6.src address(es).

                     ·      Priority 90 flow to allow IPv6 DAD (Duplicate
                            Address Detection) traffic if it has a valid
                            eth.src. This is necessary since DAD requires
                            joining a multicast group and sending neighbor
                            solicitations for the newly assigned address.
                            Since no address is yet assigned, these are sent
                            from the unspecified IPv6 address (::).

                     ·      Priority 80 flow to drop IP (both IPv4 and IPv6)
                            traffic which matches the inport and valid
                            eth.src.

              ·      One priority-0 fallback flow that matches all packets
                     and advances to the next table.

       Ingress Table 2: Ingress Port Security - Neighbor discovery

       Ingress table 2 contains these logical flows:

              ·      For each element in the port security set,

                     ·      Priority 90 flow to allow ARP traffic which
                            matches the inport and valid eth.src and
                            arp.sha. If the element has one or more IPv4
                            addresses, then it also matches the valid
                            arp.spa.

                     ·      Priority 90 flow to allow IPv6 Neighbor
                            Solicitation and Advertisement traffic which
                            matches the inport, valid eth.src and
                            nd.sll/nd.tll. If the element has one or more
                            IPv6 addresses, then it also matches the valid
                            nd.target address(es) for Neighbor Advertisement
                            traffic.

                     ·      Priority 80 flow to drop ARP and IPv6 Neighbor
                            Solicitation and Advertisement traffic which
                            matches the inport and valid eth.src.

              ·      One priority-0 fallback flow that matches all packets
                     and advances to the next table.

       Ingress Table 3: from-lport Pre-ACLs

       This table prepares flows for possible stateful ACL processing in
       ingress table ACLs. It contains a priority-0 flow that simply moves
       traffic to the next table. If stateful ACLs are used in the logical
       datapath, a priority-100 flow is added that sets a hint (with
       reg0[0] = 1; next;) for table Pre-stateful to send IP packets to the
       connection tracker before eventually advancing to ingress table ACLs.
       If special ports such as router ports or localnet ports can’t use
       ct(), a priority-110 flow is added to skip over stateful ACLs.

       Ingress Table 4: Pre-LB

       This table prepares flows for possible stateful load balancing
       processing in ingress tables LB and Stateful. It contains a
       priority-0 flow that simply moves traffic to the next table. Moreover
       it contains a priority-110 flow to move IPv6 Neighbor Discovery
       traffic to the next table. If load balancing rules with virtual IP
       addresses (and ports) are configured in the OVN_Northbound database
       for a logical switch datapath, a priority-100 flow is added for each
       configured virtual IP address VIP. For IPv4 VIPs, the match is ip &&
       ip4.dst == VIP. For IPv6 VIPs, the match is ip && ip6.dst == VIP. The
       flow sets an action reg0[0] = 1; next; to act as a hint for table
       Pre-stateful to send IP packets to the connection tracker for packet
       de-fragmentation before eventually advancing to ingress table LB.
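       The per-VIP Pre-LB flows above can be sketched as follows. This is a
       minimal Python sketch, not ovn-northd code; it assumes the VIPs are
       given as a plain list of address strings:

```python
import ipaddress

def pre_lb_flows(vips):
    """Sketch of the Pre-LB table: one priority-100 flow per configured
    VIP, matching on ip4.dst or ip6.dst and setting the conntrack hint,
    plus the ND bypass and default flows described in the text."""
    flows = [
        (110, "nd", "next;"),  # IPv6 Neighbor Discovery bypasses the hint
        (0, "1", "next;"),     # default: advance to the next table
    ]
    for vip in vips:
        # pick the match field from the VIP's address family
        field = "ip6.dst" if ipaddress.ip_address(vip).version == 6 else "ip4.dst"
        flows.append((100, "ip && %s == %s" % (field, vip),
                      "reg0[0] = 1; next;"))
    return flows
```

       The reg0[0] = 1 hint is what table Pre-stateful later matches on to
       send the packet through ct_next.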

       Ingress Table 5: Pre-stateful

       This table prepares flows for all possible stateful processing in the
       next tables. It contains a priority-0 flow that simply moves traffic
       to the next table. A priority-100 flow sends the packets to the
       connection tracker based on a hint provided by the previous tables
       (with a match for reg0[0] == 1) by using the ct_next; action.

       Ingress Table 6: from-lport ACLs

       Logical flows in this table closely reproduce those in the ACL table
       in the OVN_Northbound database for the from-lport direction. The
       priority values from the ACL table have a limited range and have 1000
       added to them to leave room for OVN default flows at both higher and
       lower priorities.
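       The priority shift described above can be illustrated with a small
       sketch. The 0..32767 bound is taken from the ovn-nb schema's ACL
       priority constraint; the helper name and range check are illustrative,
       not ovn-northd code:

```python
# Northbound ACL priorities occupy a limited range; shifting them by 1000
# leaves room for OVN default flows below (e.g. priority 0) and above
# (e.g. priority 65535) all ACL-derived flows.
OVN_ACL_PRI_OFFSET = 1000

def acl_flow_priority(nb_priority):
    """Map a northbound ACL priority to its logical flow priority."""
    if not 0 <= nb_priority <= 32767:
        raise ValueError("northbound ACL priority out of range")
    return nb_priority + OVN_ACL_PRI_OFFSET
```

       Even the highest ACL priority (32767 + 1000 = 33767) stays below the
       priority-65535 conntrack flows described below.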

              ·      allow ACLs translate into logical flows with the next;
                     action. If there are any stateful ACLs on this
                     datapath, then allow ACLs translate to ct_commit;
                     next; (which acts as a hint for the next tables to
                     commit the connection to conntrack).

              ·      allow-related ACLs translate into logical flows with
                     the ct_commit(ct_label=0/1); next; actions for new
                     connections and reg0[1] = 1; next; for existing
                     connections.

              ·      Other ACLs translate to drop; for new or untracked
                     connections and ct_commit(ct_label=1/1); for known
                     connections. Setting ct_label marks a connection as one
                     that was previously allowed, but should no longer be
                     allowed due to a policy change.

       This table also contains a priority-0 flow with action next;, so that
       ACLs allow packets by default. If the logical datapath has a stateful
       ACL, the following flows will also be added:

              ·      A priority-1 flow that sets the hint to commit IP
                     traffic to the connection tracker (with action
                     reg0[1] = 1; next;). This is needed for the default
                     allow policy because, while the initiator’s direction
                     may not have any stateful rules, the server’s may, and
                     then its return traffic would not be known and would be
                     marked as invalid.

              ·      A priority-65535 flow that allows any traffic in the
                     reply direction for a connection that has been
                     committed to the connection tracker (i.e., established
                     flows), as long as the committed flow does not have
                     ct_label.blocked set. We only handle traffic in the
                     reply direction here because we want all packets going
                     in the request direction to still go through the flows
                     that implement the currently defined policy based on
                     ACLs. If a connection is no longer allowed by policy,
                     ct_label.blocked will get set and packets in the reply
                     direction will no longer be allowed, either.

              ·      A priority-65535 flow that allows any traffic that is
                     considered related to a committed flow in the
                     connection tracker (e.g., an ICMP Port Unreachable from
                     a non-listening UDP port), as long as the committed
                     flow does not have ct_label.blocked set.

              ·      A priority-65535 flow that drops all traffic marked by
                     the connection tracker as invalid.

              ·      A priority-65535 flow that drops all traffic in the
                     reply direction with ct_label.blocked set, meaning that
                     the connection should no longer be allowed due to a
                     policy change. Packets in the request direction are
                     skipped here to let a newly created ACL re-allow this
                     connection.
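       The priority-65535 flows above can be collapsed into one simplified
       decision sketch for the reply direction. This is an illustration of
       the policy, not ovn-northd code, and it deliberately ignores
       per-direction subtleties:

```python
def reply_direction_verdict(ct_est, ct_rel, ct_inv, blocked):
    """Simplified verdict for tracked reply-direction traffic:
    invalid traffic is dropped, blocked connections (ct_label.blocked
    set by a policy change) are dropped, and established or related
    traffic is otherwise allowed."""
    if ct_inv:
        return "drop"            # conntrack marked the packet invalid
    if blocked:
        return "drop"            # connection disallowed by policy change
    if ct_est or ct_rel:
        return "allow"           # established/related reply traffic
    return "evaluate-acls"       # otherwise fall through to the ACL flows
```

       Request-direction traffic always falls through to the ACL-derived
       flows, which is what lets a newly created ACL re-allow a connection.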

       Ingress Table 7: from-lport QoS Marking

       Logical flows in this table closely reproduce those in the QoS table
       with the action column set in the OVN_Northbound database for the
       from-lport direction.

              ·      For every qos_rules entry in a logical switch with DSCP
                     marking enabled, a flow will be added at the priority
                     mentioned in the QoS table.

              ·      One priority-0 fallback flow that matches all packets
                     and advances to the next table.

       Ingress Table 8: from-lport QoS Meter

       Logical flows in this table closely reproduce those in the QoS table
       with the bandwidth column set in the OVN_Northbound database for the
       from-lport direction.

              ·      For every qos_rules entry in a logical switch with
                     metering enabled, a flow will be added at the priority
                     mentioned in the QoS table.

              ·      One priority-0 fallback flow that matches all packets
                     and advances to the next table.

       Ingress Table 9: LB

       It contains a priority-0 flow that simply moves traffic to the next
       table. For established connections a priority-100 flow matches on
       ct.est && !ct.rel && !ct.new && !ct.inv and sets an action
       reg0[2] = 1; next; to act as a hint for table Stateful to send
       packets through the connection tracker to NAT the packets. (The
       packet will automatically get DNATed to the same IP address as the
       first packet in that connection.)

       Ingress Table 10: Stateful

              ·      For all the configured load balancing rules for a
                     switch in the OVN_Northbound database that include an
                     L4 port PORT of protocol P and IP address VIP, a
                     priority-120 flow is added. For IPv4 VIPs, the flow
                     matches ct.new && ip && ip4.dst == VIP && P && P.dst ==
                     PORT. For IPv6 VIPs, the flow matches ct.new && ip &&
                     ip6.dst == VIP && P && P.dst == PORT. The flow’s action
                     is ct_lb(args), where args contains comma separated IP
                     addresses (and optional port numbers) to load balance
                     to. The address family of the IP addresses of args is
                     the same as the address family of VIP.

              ·      For all the configured load balancing rules for a
                     switch in the OVN_Northbound database that include just
                     an IP address VIP to match on, OVN adds a priority-110
                     flow. For IPv4 VIPs, the flow matches ct.new && ip &&
                     ip4.dst == VIP. For IPv6 VIPs, the flow matches ct.new
                     && ip && ip6.dst == VIP. The action on this flow is
                     ct_lb(args), where args contains comma separated IP
                     addresses of the same address family as VIP.

              ·      A priority-100 flow commits packets to the connection
                     tracker using the ct_commit; next; action based on a
                     hint provided by the previous tables (with a match for
                     reg0[1] == 1).

              ·      A priority-100 flow sends the packets to the connection
                     tracker using ct_lb; as the action based on a hint
                     provided by the previous tables (with a match for
                     reg0[2] == 1).

              ·      A priority-0 flow that simply moves traffic to the next
                     table.
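       The two load-balancer flow shapes above can be sketched together.
       This is a hypothetical helper, not ovn-northd code; the address
       family detection is naive and the ct_lb argument formatting is
       illustrative:

```python
def stateful_lb_flow(vip, backends, protocol=None, port=None):
    """Sketch of the Table 10 load-balancer flows: priority 120 when an
    L4 protocol and port are configured, priority 110 for address-only
    VIPs. backends is a list of "ip" or "ip:port" strings in the same
    address family as vip."""
    # naive family check: a bare IPv6 address contains ":"
    field = "ip6.dst" if ":" in vip else "ip4.dst"
    action = "ct_lb(%s);" % ",".join(backends)
    if protocol and port:
        match = "ct.new && ip && %s == %s && %s && %s.dst == %d" % (
            field, vip, protocol, protocol, port)
        return (120, match, action)
    return (110, "ct.new && ip && %s == %s" % (field, vip), action)
```

       The ct.new match ensures only the first packet of a connection picks
       a backend; later packets are NATed by the established-connection flow
       in table LB.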

       Ingress Table 11: ARP/ND responder

       This table implements an ARP/ND responder in a logical switch for
       known IPs. The advantage of the ARP responder flow is to limit ARP
       broadcasts by locally responding to ARP requests without the need to
       send to other hypervisors. One common case is when the inport is a
       logical port associated with a VIF and the broadcast is responded to
       on the local hypervisor rather than broadcast across the whole
       network and responded to by the destination VM. This behavior is
       proxy ARP.

       ARP requests arrive from VMs from a logical switch inport of type
       default. For this case, the logical switch proxy ARP rules can be for
       other VMs or logical router ports. Logical switch proxy ARP rules may
       be programmed both for mac binding of IP addresses on other logical
       switch VIF ports (which are of the default logical switch port type,
       representing connectivity to VMs or containers), and for mac binding
       of IP addresses on logical switch router type ports, representing
       their logical router port peers. In order to support proxy ARP for
       logical router ports, an IP address must be configured on the logical
       switch router type port, with the same value as the peer logical
       router port. The configured MAC addresses must match as well. When a
       VM sends an ARP request for a distributed logical router port and the
       peer router type port of the attached logical switch does not have an
       IP address configured, the ARP request will be broadcast on the
       logical switch. One of the copies of the ARP request will go through
       the logical switch router type port to the logical router datapath,
       where the logical router ARP responder will generate a reply. The MAC
       binding of a distributed logical router, once learned by an
       associated VM, is used for all that VM’s communication needing
       routing. Hence, a VM re-arping for the mac binding of the logical
       router port should be rare.

       Logical switch ARP responder proxy ARP rules can also be hit when
       receiving ARP requests externally on an L2 gateway port. In this
       case, the hypervisor acting as an L2 gateway responds to the ARP
       request on behalf of a destination VM.

       Note that ARP requests received from localnet or vtep logical inports
       can either go directly to VMs, in which case the VM responds, or can
       hit an ARP responder for a logical router port if the packet is used
       to resolve a logical router port next hop address. In either case,
       logical switch ARP responder rules will not be hit. This table
       contains these logical flows:

              ·      Priority-100 flows to skip the ARP responder if inport
                     is of type localnet or vtep and advance directly to the
                     next table. ARP requests sent to localnet or vtep ports
                     can be received by multiple hypervisors. Because the
                     same mac binding rules are downloaded to all
                     hypervisors, each of them would respond, which would
                     confuse L2 learning on the source of the ARP requests.
                     ARP requests received on an inport of type router are
                     not expected to hit any logical switch ARP responder
                     flows. However, no skip flows are installed for these
                     packets, as there would be some additional flow cost
                     for this and the value appears limited.

              ·      Priority-50 flows that match ARP requests to each known
                     IP address A of every logical switch port, and respond
                     with ARP replies directly with corresponding Ethernet
                     address E:

                     eth.dst = eth.src;
                     eth.src = E;
                     arp.op = 2; /* ARP reply. */
                     arp.tha = arp.sha;
                     arp.sha = E;
                     arp.tpa = arp.spa;
                     arp.spa = A;
                     outport = inport;
                     flags.loopback = 1;
                     output;

                     These flows are omitted for logical ports (other than
                     router ports or localport ports) that are down.

              ·      Priority-50 flows that match IPv6 ND neighbor
                     solicitations to each known IP address A (and A’s
                     solicited node address) of every logical switch port,
                     and respond with neighbor advertisements directly with
                     corresponding Ethernet address E:

                     nd_na {
                         eth.src = E;
                         ip6.src = A;
                         nd.target = A;
                         nd.tll = E;
                         outport = inport;
                         flags.loopback = 1;
                         output;
                     };

                     These flows are omitted for logical ports (other than
                     router ports or localport ports) that are down.

              ·      Priority-100 flows with match criteria like the ARP and
                     ND flows above, except that they only match packets
                     from the inport that owns the IP addresses in question,
                     with action next;. These flows prevent OVN from
                     replying to, for example, an ARP request emitted by a
                     VM for its own IP address. A VM only makes this kind of
                     request to attempt to detect a duplicate IP address
                     assignment, so sending a reply will prevent the VM from
                     accepting the IP address that it owns.

                     In place of next;, it would be reasonable to use drop;
                     for the flows’ actions. If everything is working as it
                     is configured, then this would produce equivalent
                     results, since no host should reply to the request. But
                     ARPing for one’s own IP address is intended to detect
                     situations where the network is not working as
                     configured, so dropping the request would frustrate
                     that intent.

              ·      One priority-0 fallback flow that matches all packets
                     and advances to the next table.
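       The priority-50 ARP responder actions listed above can be modeled
       directly. The sketch below applies the same field assignments, in the
       same order, to a packet represented as a dict of header fields (not
       ovn-northd code):

```python
def arp_responder_reply(pkt, E, A):
    """Rewrite an ARP request in place into a reply from Ethernet
    address E / IPv4 address A and send it back out the ingress port."""
    pkt["eth.dst"] = pkt["eth.src"]   # reply goes back to the requester
    pkt["eth.src"] = E
    pkt["arp.op"] = 2                 # ARP reply
    pkt["arp.tha"] = pkt["arp.sha"]
    pkt["arp.sha"] = E
    pkt["arp.tpa"] = pkt["arp.spa"]
    pkt["arp.spa"] = A
    pkt["outport"] = pkt["inport"]    # hairpin out the same port...
    pkt["flags.loopback"] = 1         # ...which requires the loopback flag
    return pkt
```

       The assignment order matters: eth.dst must capture the requester's
       eth.src before eth.src is overwritten with E, and likewise for the
       arp.sha/arp.spa fields.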

       Ingress Table 12: DHCP option processing

       This table adds the DHCPv4 options to a DHCPv4 packet from the
       logical ports configured with IPv4 address(es) and DHCPv4 options,
       and similarly for DHCPv6 options.

              ·      A priority-100 logical flow is added for these logical
                     ports which matches the IPv4 packet with udp.src == 68
                     and udp.dst == 67 and applies the action put_dhcp_opts
                     and advances the packet to the next table.

                     reg0[3] = put_dhcp_opts(offer_ip = ip, options...);
                     next;

                     For DHCPDISCOVER and DHCPREQUEST, this transforms the
                     packet into a DHCP reply, adds the DHCP offer IP ip and
                     options to the packet, and stores 1 into reg0[3]. For
                     other kinds of packets, it just stores 0 into reg0[3].
                     Either way, it continues to the next table.

              ·      A priority-100 logical flow is added for these logical
                     ports which matches the IPv6 packet with udp.src == 546
                     and udp.dst == 547 and applies the action
                     put_dhcpv6_opts and advances the packet to the next
                     table.

                     reg0[3] = put_dhcpv6_opts(ia_addr = ip, options...);
                     next;

                     For DHCPv6 Solicit/Request/Confirm packets, this
                     transforms the packet into a DHCPv6 Advertise/Reply,
                     adds the DHCPv6 offer IP ip and options to the packet,
                     and stores 1 into reg0[3]. For other kinds of packets,
                     it just stores 0 into reg0[3]. Either way, it continues
                     to the next table.

              ·      A priority-0 flow that matches all packets and advances
                     to the next table.
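       The contract of put_dhcp_opts can be summarized with a small sketch.
       This is an illustration only: the packet is a dict, and field names
       such as dhcp.msg_type and dhcp.yiaddr are hypothetical stand-ins, not
       OVN logical flow fields:

```python
def put_dhcp_opts(pkt, offer_ip, options):
    """Sketch of the put_dhcp_opts contract: DHCPDISCOVER/DHCPREQUEST
    packets become replies carrying offer_ip and the configured options;
    the return value models what gets stored into reg0[3]."""
    if pkt.get("dhcp.msg_type") in ("DHCPDISCOVER", "DHCPREQUEST"):
        pkt["dhcp.yiaddr"] = offer_ip        # hypothetical field name
        pkt["dhcp.options"] = dict(options)  # hypothetical field name
        return 1  # reg0[3] = 1: packet was transformed into a reply
    return 0      # reg0[3] = 0: packet left unchanged
```

       Either way the packet continues to the next table, where the reg0[3]
       result decides whether a DHCP response is actually sent.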

       Ingress Table 13: DHCP responses

       This table implements a DHCP responder for the DHCP replies generated
       by the previous table.

              ·      A priority-100 logical flow is added for the logical
                     ports configured with DHCPv4 options which matches IPv4
                     packets with udp.src == 68 && udp.dst == 67 && reg0[3]
                     == 1 and responds back to the inport after applying
                     these actions. If reg0[3] is set to 1, it means that
                     the action put_dhcp_opts was successful.

                     eth.dst = eth.src;
                     eth.src = E;
                     ip4.dst = A;
                     ip4.src = S;
                     udp.src = 67;
                     udp.dst = 68;
                     outport = P;
                     flags.loopback = 1;
                     output;

                     where E is the server MAC address and S is the server
                     IPv4 address defined in the DHCPv4 options, and A is
                     the IPv4 address defined in the logical port’s
                     addresses column.

                     (This terminates ingress packet processing; the packet
                     does not go to the next ingress table.)

              ·      A priority-100 logical flow is added for the logical
                     ports configured with DHCPv6 options which matches IPv6
                     packets with udp.src == 546 && udp.dst == 547 &&
                     reg0[3] == 1 and responds back to the inport after
                     applying these actions. If reg0[3] is set to 1, it
                     means that the action put_dhcpv6_opts was successful.

                     eth.dst = eth.src;
                     eth.src = E;
                     ip6.dst = A;
                     ip6.src = S;
                     udp.src = 547;
                     udp.dst = 546;
                     outport = P;
                     flags.loopback = 1;
                     output;

                     where E is the server MAC address and S is the server
                     IPv6 LLA address generated from the server_id defined
                     in the DHCPv6 options, and A is the IPv6 address
                     defined in the logical port’s addresses column.

                     (This terminates ingress packet processing; the packet
                     does not go to the next ingress table.)

              ·      A priority-0 flow that matches all packets and advances
                     to the next table.

       Ingress Table 14: DNS Lookup

       This table looks up and resolves the DNS names to the corresponding
       configured IP address(es).

              ·      A priority-100 logical flow for each logical switch
                     datapath if it is configured with DNS records, which
                     matches the IPv4 and IPv6 packets with udp.dst == 53
                     and applies the action dns_lookup and advances the
                     packet to the next table.

                     reg0[4] = dns_lookup(); next;

                     For valid DNS packets, this transforms the packet into
                     a DNS reply if the DNS name can be resolved, and stores
                     1 into reg0[4]. For failed DNS resolution or other
                     kinds of packets, it just stores 0 into reg0[4]. Either
                     way, it continues to the next table.
771
    Ingress Table 15: DNS Responses

       This table implements a DNS responder for the DNS replies
       generated by the previous table.

       ·      A priority-100 logical flow for each logical switch
              datapath if it is configured with DNS records, which
              matches the IPv4 and IPv6 packets with udp.dst == 53 &&
              reg0[4] == 1 and responds back to the inport after
              applying these actions. If reg0[4] is set to 1, it means
              that the action dns_lookup was successful.

                   eth.dst <-> eth.src;
                   ip4.src <-> ip4.dst;
                   udp.dst = udp.src;
                   udp.src = 53;
                   outport = P;
                   flags.loopback = 1;
                   output;

              (This terminates ingress packet processing; the packet
              does not go to the next ingress table.)

    Ingress Table 16: Destination Lookup

       This table implements switching behavior. It contains these
       logical flows:

       ·      A priority-100 flow that outputs all packets with an
              Ethernet broadcast or multicast eth.dst to the MC_FLOOD
              multicast group, which ovn-northd populates with all
              enabled logical ports.

       ·      One priority-50 flow that matches each known Ethernet
              address against eth.dst and outputs the packet to the
              single associated output port.

              For the Ethernet address on a logical switch port of type
              router, when that logical switch port’s addresses column
              is set to router and the connected logical router port
              specifies a redirect-chassis:

              ·      The flow for the connected logical router port’s
                     Ethernet address is only programmed on the
                     redirect-chassis.

              ·      If the logical router has rules specified in nat
                     with external_mac, then those addresses are also
                     used to populate the switch’s destination lookup
                     on the chassis where logical_port is resident.

       ·      One priority-0 fallback flow that matches all packets and
              outputs them to the MC_UNKNOWN multicast group, which
              ovn-northd populates with all enabled logical ports that
              accept unknown destination packets. As a small
              optimization, if no logical ports accept unknown
              destination packets, ovn-northd omits this multicast
              group and logical flow.

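The lookup order above (flood for broadcast/multicast, unicast for known MACs, MC_UNKNOWN otherwise) can be sketched, purely as an illustration, as:

```python
def is_multicast(mac):
    # The Ethernet multicast bit is the least significant bit of the
    # first octet; the broadcast address also has it set.
    return int(mac.split(":")[0], 16) & 1 == 1

def destination_lookup(eth_dst, known_macs, enabled_ports, unknown_ports):
    """Pick the output port(s) the way the flows above do (sketch only)."""
    if is_multicast(eth_dst):
        return sorted(enabled_ports)        # priority-100: MC_FLOOD
    if eth_dst in known_macs:
        return [known_macs[eth_dst]]        # priority-50: known unicast
    return sorted(unknown_ports)            # priority-0: MC_UNKNOWN
```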
    Egress Table 0: Pre-LB

       This table is similar to ingress table Pre-LB. It contains a
       priority-0 flow that simply moves traffic to the next table.
       Moreover, it contains a priority-110 flow to move IPv6 Neighbor
       Discovery traffic to the next table. If any load balancing rules
       exist for the datapath, a priority-100 flow is added with a
       match of ip and action of reg0[0] = 1; next; to act as a hint
       for table Pre-stateful to send IP packets to the connection
       tracker for packet de-fragmentation.

    Egress Table 1: to-lport Pre-ACLs

       This is similar to ingress table Pre-ACLs except for to-lport
       traffic.

    Egress Table 2: Pre-stateful

       This is similar to ingress table Pre-stateful.

    Egress Table 3: LB

       This is similar to ingress table LB.

    Egress Table 4: to-lport ACLs

       This is similar to ingress table ACLs except for to-lport ACLs.

       In addition, the following flows are added.

       ·      A priority-34000 logical flow is added for each logical
              port that has DHCPv4 options defined, to allow the DHCPv4
              reply packet, and for each logical port that has DHCPv6
              options defined, to allow the DHCPv6 reply packet, from
              Ingress Table 13: DHCP responses.

       ·      A priority-34000 logical flow is added for each logical
              switch datapath configured with DNS records, with the
              match udp.dst == 53, to allow the DNS reply packet from
              Ingress Table 15: DNS responses.

    Egress Table 5: to-lport QoS Marking

       This is similar to ingress table QoS Marking except that it
       applies to to-lport QoS rules.

    Egress Table 6: to-lport QoS Meter

       This is similar to ingress table QoS Meter except that it
       applies to to-lport QoS rules.

    Egress Table 7: Stateful

       This is similar to ingress table Stateful except that there are
       no rules added for load balancing new connections.

    Egress Table 8: Egress Port Security - IP

       This is similar to the port security logic in table Ingress Port
       Security - IP except that outport, eth.dst, ip4.dst and ip6.dst
       are checked instead of inport, eth.src, ip4.src and ip6.src.

    Egress Table 9: Egress Port Security - L2

       This is similar to the ingress port security logic in ingress
       table Admission Control and Ingress Port Security - L2, but with
       important differences. Most obviously, outport and eth.dst are
       checked instead of inport and eth.src. Second, packets directed
       to broadcast or multicast eth.dst are always accepted instead of
       being subject to the port security rules; this is implemented
       through a priority-100 flow that matches on eth.mcast with
       action output;. Finally, to ensure that even broadcast and
       multicast packets are not delivered to disabled logical ports, a
       priority-150 flow for each disabled logical outport overrides
       the priority-100 flow with a drop; action.

   Logical Router Datapaths
       Logical router datapaths will only exist for Logical_Router rows
       in the OVN_Northbound database that do not have enabled set to
       false.

    Ingress Table 0: L2 Admission Control

       This table drops packets that the router shouldn’t see at all
       based on their Ethernet headers. It contains the following
       flows:

       ·      Priority-100 flows to drop packets with VLAN tags or
              multicast Ethernet source addresses.

       ·      For each enabled router port P with Ethernet address E, a
              priority-50 flow that matches inport == P && (eth.mcast
              || eth.dst == E), with action next;.

              For the gateway port on a distributed logical router
              (where one of the logical router ports specifies a
              redirect-chassis), the above flow matching eth.dst == E
              is only programmed on the gateway port instance on the
              redirect-chassis.

       ·      For each dnat_and_snat NAT rule on a distributed router
              that specifies an external Ethernet address E, a
              priority-50 flow that matches inport == GW && eth.dst ==
              E, where GW is the logical router gateway port, with
              action next;.

              This flow is only programmed on the gateway port instance
              on the chassis where the logical_port specified in the
              NAT rule resides.

       Other packets are implicitly dropped.

    Ingress Table 1: IP Input

       This table is the core of the logical router datapath
       functionality. It contains the following flows to implement very
       basic IP host functionality.

       ·      L3 admission control: A priority-100 flow drops packets
              that match any of the following:

              ·      ip4.src[28..31] == 0xe (multicast source)

              ·      ip4.src == 255.255.255.255 (broadcast source)

              ·      ip4.src == 127.0.0.0/8 || ip4.dst == 127.0.0.0/8
                     (localhost source or destination)

              ·      ip4.src == 0.0.0.0/8 || ip4.dst == 0.0.0.0/8 (zero
                     network source or destination)

              ·      ip4.src or ip6.src is any IP address owned by the
                     router, unless the packet was recirculated due to
                     egress loopback as indicated by
                     REGBIT_EGRESS_LOOPBACK.

              ·      ip4.src is the broadcast address of any IP network
                     known to the router.

       ·      ICMP echo reply. These flows reply to ICMP echo requests
              received for the router’s IP address. Let A be an IP
              address owned by a router port. Then, for each A that is
              an IPv4 address, a priority-90 flow matches on ip4.dst ==
              A and icmp4.type == 8 && icmp4.code == 0 (ICMP echo
              request). For each A that is an IPv6 address, a
              priority-90 flow matches on ip6.dst == A and icmp6.type
              == 128 && icmp6.code == 0 (ICMPv6 echo request). The port
              of the router that receives the echo request does not
              matter. Also, the ip.ttl of the echo request packet is
              not checked, so it complies with RFC 1812, section
              4.2.2.9. Flows for ICMPv4 echo requests use the following
              actions:

                   ip4.dst <-> ip4.src;
                   ip.ttl = 255;
                   icmp4.type = 0;
                   flags.loopback = 1;
                   next;

              Flows for ICMPv6 echo requests use the following actions:

                   ip6.dst <-> ip6.src;
                   ip.ttl = 255;
                   icmp6.type = 129;
                   flags.loopback = 1;
                   next;

       ·      Reply to ARP requests.

              These flows reply to ARP requests for the router’s own IP
              address. For each router port P that owns IP address A
              and Ethernet address E, a priority-90 flow matches inport
              == P && arp.op == 1 && arp.tpa == A (ARP request) with
              the following actions:

                   eth.dst = eth.src;
                   eth.src = E;
                   arp.op = 2; /* ARP reply. */
                   arp.tha = arp.sha;
                   arp.sha = E;
                   arp.tpa = arp.spa;
                   arp.spa = A;
                   outport = P;
                   flags.loopback = 1;
                   output;

              For the gateway port on a distributed logical router
              (where one of the logical router ports specifies a
              redirect-chassis), the above flows are only programmed on
              the gateway port instance on the redirect-chassis. This
              behavior avoids generation of multiple ARP responses from
              different chassis, and allows upstream MAC learning to
              point to the redirect-chassis.

       ·      These flows reply to ARP requests for the virtual IP
              addresses configured in the router for DNAT or load
              balancing. For a configured DNAT IP address or a load
              balancer IPv4 VIP A, for each router port P with Ethernet
              address E, a priority-90 flow matches inport == P &&
              arp.op == 1 && arp.tpa == A (ARP request) with the
              following actions:

                   eth.dst = eth.src;
                   eth.src = E;
                   arp.op = 2; /* ARP reply. */
                   arp.tha = arp.sha;
                   arp.sha = E;
                   arp.tpa = arp.spa;
                   arp.spa = A;
                   outport = P;
                   flags.loopback = 1;
                   output;

              For the gateway port on a distributed logical router with
              NAT (where one of the logical router ports specifies a
              redirect-chassis):

              ·      If the corresponding NAT rule cannot be handled in
                     a distributed manner, then this flow is only
                     programmed on the gateway port instance on the
                     redirect-chassis. This behavior avoids generation
                     of multiple ARP responses from different chassis,
                     and allows upstream MAC learning to point to the
                     redirect-chassis.

              ·      If the corresponding NAT rule can be handled in a
                     distributed manner, then this flow is only
                     programmed on the gateway port instance where the
                     logical_port specified in the NAT rule resides.

                     Some of the actions are different for this case,
                     using the external_mac specified in the NAT rule
                     rather than the gateway port’s Ethernet address E:

                          eth.src = external_mac;
                          arp.sha = external_mac;

                     This behavior avoids generation of multiple ARP
                     responses from different chassis, and allows
                     upstream MAC learning to point to the correct
                     chassis.

       ·      ARP reply handling. This flow uses ARP replies to
              populate the logical router’s ARP table. A priority-90
              flow with match arp.op == 2 has actions put_arp(inport,
              arp.spa, arp.sha);.

       ·      Reply to IPv6 Neighbor Solicitations. These flows reply
              to Neighbor Solicitation requests for the router’s own
              IPv6 address and load balancing IPv6 VIPs and populate
              the logical router’s mac binding table.

              For each router port P that owns IPv6 address A,
              solicited node address S, and Ethernet address E, a
              priority-90 flow matches inport == P && nd_ns && ip6.dst
              == {A, E} && nd.target == A with the following actions:

                   put_nd(inport, ip6.src, nd.sll);
                   nd_na_router {
                       eth.src = E;
                       ip6.src = A;
                       nd.target = A;
                       nd.tll = E;
                       outport = inport;
                       flags.loopback = 1;
                       output;
                   };

              For each router port P that has load balancing VIP A,
              solicited node address S, and Ethernet address E, a
              priority-90 flow matches inport == P && nd_ns && ip6.dst
              == {A, E} && nd.target == A with the following actions:

                   put_nd(inport, ip6.src, nd.sll);
                   nd_na {
                       eth.src = E;
                       ip6.src = A;
                       nd.target = A;
                       nd.tll = E;
                       outport = inport;
                       flags.loopback = 1;
                       output;
                   };

              For the gateway port on a distributed logical router
              (where one of the logical router ports specifies a
              redirect-chassis), the above flows replying to IPv6
              Neighbor Solicitations are only programmed on the gateway
              port instance on the redirect-chassis. This behavior
              avoids generation of multiple replies from different
              chassis, and allows upstream MAC learning to point to the
              redirect-chassis.

       ·      IPv6 neighbor advertisement handling. This flow uses
              neighbor advertisements to populate the logical router’s
              mac binding table. A priority-90 flow with match nd_na
              has actions put_nd(inport, nd.target, nd.tll);.

       ·      IPv6 neighbor solicitation for non-hosted addresses
              handling. This flow uses neighbor solicitations to
              populate the logical router’s mac binding table (ones
              that were directed at the logical router would have
              matched the priority-90 neighbor solicitation flow
              already). A priority-80 flow with match nd_ns has actions
              put_nd(inport, ip6.src, nd.sll);.

       ·      UDP port unreachable. Priority-80 flows generate ICMP
              port unreachable messages in reply to UDP datagrams
              directed to the router’s IP address, except in the
              special case of gateways, which accept traffic directed
              to a router IP for load balancing and NAT purposes.

              These flows should not match IP fragments with nonzero
              offset.

       ·      TCP reset. Priority-80 flows generate TCP reset messages
              in reply to TCP segments directed to the router’s IP
              address, except in the special case of gateways, which
              accept traffic directed to a router IP for load balancing
              and NAT purposes.

              These flows should not match IP fragments with nonzero
              offset.

       ·      Protocol or address unreachable. Priority-70 flows
              generate ICMP protocol or address unreachable messages
              for IPv4 and IPv6 respectively in reply to packets
              directed to the router’s IP address on IP protocols other
              than UDP, TCP, and ICMP, except in the special case of
              gateways, which accept traffic directed to a router IP
              for load balancing purposes.

              These flows should not match IP fragments with nonzero
              offset.

       ·      Drop other IP traffic to this router. These flows drop
              any other traffic destined to an IP address of this
              router that is not already handled by one of the flows
              above, which amounts to ICMP (other than echo requests)
              and fragments with nonzero offsets. For each IP address A
              owned by the router, a priority-60 flow matches ip4.dst
              == A and drops the traffic. An exception is made and the
              above flow is not added if the router port’s own IP
              address is used to SNAT packets passing through that
              router.

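Purely as an illustration (not the actual implementation), the IPv4 L3 admission checks at the top of this table can be sketched with Python’s ipaddress module:

```python
import ipaddress

def l3_admission_drop(ip4_src, ip4_dst):
    """True if the priority-100 IPv4 checks above would drop the packet.
    The router-owned-address and broadcast-of-known-network checks are
    omitted here, since they depend on router configuration."""
    src = ipaddress.ip_address(ip4_src)
    dst = ipaddress.ip_address(ip4_dst)
    zero_net = ipaddress.ip_network("0.0.0.0/8")
    return (src.is_multicast                           # ip4.src[28..31] == 0xe
            or src == ipaddress.ip_address("255.255.255.255")
            or src.is_loopback or dst.is_loopback      # 127.0.0.0/8
            or src in zero_net or dst in zero_net)     # 0.0.0.0/8
```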
       The flows above handle all of the traffic that might be directed
       to the router itself. The following flows (with lower
       priorities) handle the remaining traffic, potentially for
       forwarding:

       ·      Drop Ethernet local broadcast. A priority-50 flow with
              match eth.bcast drops traffic destined to the local
              Ethernet broadcast address. By definition this traffic
              should not be forwarded.

       ·      ICMP time exceeded. For each router port P, whose IP
              address is A, a priority-40 flow with match inport == P
              && ip.ttl == {0, 1} && !ip.later_frag matches packets
              whose TTL has expired, with the following actions to send
              an ICMP time exceeded reply for IPv4 and IPv6
              respectively:

                   icmp4 {
                       icmp4.type = 11; /* Time exceeded. */
                       icmp4.code = 0; /* TTL exceeded in transit. */
                       ip4.dst = ip4.src;
                       ip4.src = A;
                       ip.ttl = 255;
                       next;
                   };
                   icmp6 {
                       icmp6.type = 3; /* Time exceeded. */
                       icmp6.code = 0; /* TTL exceeded in transit. */
                       ip6.dst = ip6.src;
                       ip6.src = A;
                       ip.ttl = 255;
                       next;
                   };

       ·      TTL discard. A priority-30 flow with match ip.ttl == {0,
              1} and actions drop; drops other packets whose TTL has
              expired, that should not receive an ICMP error reply
              (i.e. fragments with nonzero offset).

       ·      Next table. A priority-0 flow matches all packets that
              aren’t already handled and uses actions next; to feed
              them to the next table.

    Ingress Table 2: DEFRAG

       This table sends packets to the connection tracker for tracking
       and defragmentation. It contains a priority-0 flow that simply
       moves traffic to the next table. If load balancing rules with
       virtual IP addresses (and ports) are configured in the
       OVN_Northbound database for a Gateway router, a priority-100
       flow is added for each configured virtual IP address VIP. For
       IPv4 VIPs the flow matches ip && ip4.dst == VIP. For IPv6 VIPs,
       the flow matches ip && ip6.dst == VIP. The flow uses the action
       ct_next; to send IP packets to the connection tracker for packet
       defragmentation and tracking before sending them to the next
       table.

    Ingress Table 3: UNSNAT

       This is for already established connections’ reverse traffic;
       i.e., SNAT has already been done in the egress pipeline and now
       the packet has entered the ingress pipeline as part of a reply.
       It is unSNATted here.

    Ingress Table 3: UNSNAT on Gateway Routers

       ·      If the Gateway router has been configured to force SNAT
              any previously DNATted packets to B, a priority-110 flow
              matches ip && ip4.dst == B with an action ct_snat;.

              If the Gateway router has been configured to force SNAT
              any previously load-balanced packets to B, a priority-100
              flow matches ip && ip4.dst == B with an action ct_snat;.

              For each NAT configuration in the OVN Northbound database
              that asks to change the source IP address of a packet
              from A to B, a priority-90 flow matches ip && ip4.dst ==
              B with an action ct_snat;.

              A priority-0 logical flow with match 1 has actions next;.

    Ingress Table 3: UNSNAT on Distributed Routers

       ·      For each configuration in the OVN Northbound database
              that asks to change the source IP address of a packet
              from A to B, a priority-100 flow matches ip && ip4.dst ==
              B && inport == GW, where GW is the logical router gateway
              port, with an action ct_snat;.

              If the NAT rule cannot be handled in a distributed
              manner, then the priority-100 flow above is only
              programmed on the redirect-chassis.

              For each configuration in the OVN Northbound database
              that asks to change the source IP address of a packet
              from A to B, a priority-50 flow matches ip && ip4.dst ==
              B with an action REGBIT_NAT_REDIRECT = 1; next;. This
              flow is for east/west traffic to a NAT destination IPv4
              address. By setting the REGBIT_NAT_REDIRECT flag, in the
              ingress table Gateway Redirect this will trigger a
              redirect to the instance of the gateway port on the
              redirect-chassis.

              A priority-0 logical flow with match 1 has actions next;.

    Ingress Table 4: DNAT

       Packets enter the pipeline with a destination IP address that
       needs to be DNATted from a virtual IP address to a real IP
       address. Packets in the reverse direction need to be unDNATted.

    Ingress Table 4: Load balancing DNAT rules

       The following load balancing DNAT flows are added for a Gateway
       router or a router with a gateway port. These flows are
       programmed only on the redirect-chassis. These flows do not get
       programmed for load balancers with IPv6 VIPs.

       ·      For all the configured load balancing rules for a Gateway
              router or router with a gateway port in the
              OVN_Northbound database that include an L4 port PORT of
              protocol P and IPv4 address VIP, a priority-120 flow that
              matches on ct.new && ip && ip4.dst == VIP && P && P.dst
              == PORT with an action of ct_lb(args), where args
              contains comma separated IPv4 addresses (and optional
              port numbers) to load balance to. If the router is
              configured to force SNAT any load-balanced packets, the
              above action will be replaced by flags.force_snat_for_lb
              = 1; ct_lb(args);.

       ·      For all the configured load balancing rules for a router
              in the OVN_Northbound database that include an L4 port
              PORT of protocol P and IPv4 address VIP, a priority-120
              flow that matches on ct.est && ip && ip4.dst == VIP && P
              && P.dst == PORT with an action of ct_dnat;. If the
              router is configured to force SNAT any load-balanced
              packets, the above action will be replaced by
              flags.force_snat_for_lb = 1; ct_dnat;.

       ·      For all the configured load balancing rules for a router
              in the OVN_Northbound database that include just an IP
              address VIP to match on, a priority-110 flow that matches
              on ct.new && ip && ip4.dst == VIP with an action of
              ct_lb(args), where args contains comma separated IPv4
              addresses. If the router is configured to force SNAT any
              load-balanced packets, the above action will be replaced
              by flags.force_snat_for_lb = 1; ct_lb(args);.

       ·      For all the configured load balancing rules for a router
              in the OVN_Northbound database that include just an IP
              address VIP to match on, a priority-110 flow that matches
              on ct.est && ip && ip4.dst == VIP with an action of
              ct_dnat;. If the router is configured to force SNAT any
              load-balanced packets, the above action will be replaced
              by flags.force_snat_for_lb = 1; ct_dnat;.

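What ct_lb(args) accomplishes for a new connection, i.e. picking one backend from args and DNATting to it, can be illustrated roughly. Real OVN delegates the choice to the connection tracker; the hash used here is only for the sketch:

```python
import hashlib

def ct_lb_select(args, five_tuple):
    """Pick a backend for a new connection, as ct_lb(args) conceptually
    does.  args is the comma separated backend list from the flow; the
    selection hash here is illustrative, not OVN's actual algorithm."""
    backends = [b.strip() for b in args.split(",")]
    digest = hashlib.sha256(repr(five_tuple).encode()).hexdigest()
    return backends[int(digest, 16) % len(backends)]
```

The connection tracker then remembers the choice, so the ct.est flows above can simply apply ct_dnat; to later packets of the same connection.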
    Ingress Table 4: DNAT on Gateway Routers

       ·      For each configuration in the OVN Northbound database
              that asks to change the destination IP address of a
              packet from A to B, a priority-100 flow matches ip &&
              ip4.dst == A with an action flags.loopback = 1;
              ct_dnat(B);. If the Gateway router is configured to force
              SNAT any DNATted packet, the above action will be
              replaced by flags.force_snat_for_dnat = 1; flags.loopback
              = 1; ct_dnat(B);.

       ·      For all IP packets of a Gateway router, a priority-50
              flow with an action flags.loopback = 1; ct_dnat;.

       ·      A priority-0 logical flow with match 1 has actions next;.

    Ingress Table 4: DNAT on Distributed Routers

       On distributed routers, the DNAT table only handles packets with
       a destination IP address that needs to be DNATted from a virtual
       IP address to a real IP address. The unDNAT processing in the
       reverse direction is handled in a separate table in the egress
       pipeline.

       ·      For each configuration in the OVN Northbound database
              that asks to change the destination IP address of a
              packet from A to B, a priority-100 flow matches ip &&
              ip4.dst == B && inport == GW, where GW is the logical
              router gateway port, with an action ct_dnat(B);.

              If the NAT rule cannot be handled in a distributed
              manner, then the priority-100 flow above is only
              programmed on the redirect-chassis.

              For each configuration in the OVN Northbound database
              that asks to change the destination IP address of a
              packet from A to B, a priority-50 flow matches ip &&
              ip4.dst == B with an action REGBIT_NAT_REDIRECT = 1;
              next;. This flow is for east/west traffic to a NAT
              destination IPv4 address. By setting the
              REGBIT_NAT_REDIRECT flag, in the ingress table Gateway
              Redirect this will trigger a redirect to the instance of
              the gateway port on the redirect-chassis.

              A priority-0 logical flow with match 1 has actions next;.

    Ingress Table 5: IPv6 ND RA option processing

       ·      A priority-50 logical flow is added for each logical
              router port configured with IPv6 ND RA options which
              matches IPv6 ND Router Solicitation packets and applies
              the action put_nd_ra_opts and advances the packet to the
              next table.

                   reg0[5] = put_nd_ra_opts(options); next;

              For a valid IPv6 ND RS packet, this transforms the packet
              into an IPv6 ND RA reply, sets the RA options in the
              packet, and stores 1 into reg0[5]. For other kinds of
              packets, it just stores 0 into reg0[5]. Either way, it
              continues to the next table.

       ·      A priority-0 logical flow with match 1 has actions next;.

    Ingress Table 6: IPv6 ND RA responder

       This table implements an IPv6 ND RA responder for the IPv6 ND RA
       replies generated by the previous table.

       ·      A priority-50 logical flow is added for each logical
              router port configured with IPv6 ND RA options which
              matches IPv6 ND RA packets and reg0[5] == 1 and responds
              back to the inport after applying these actions. If
              reg0[5] is set to 1, it means that the action
              put_nd_ra_opts was successful.

                   eth.dst = eth.src;
                   eth.src = E;
                   ip6.dst = ip6.src;
                   ip6.src = I;
                   outport = P;
                   flags.loopback = 1;
                   output;

              where E is the MAC address and I is the IPv6 link local
              address of the logical router port.

              (This terminates packet processing in the ingress
              pipeline; the packet does not go to the next ingress
              table.)

       ·      A priority-0 logical flow with match 1 has actions next;.

    Ingress Table 7: IP Routing

       A packet that arrives at this table is an IP packet that should
       be routed to the address in ip4.dst or ip6.dst. This table
       implements IP routing, setting reg0 (or xxreg0 for IPv6) to the
       next-hop IP address (leaving ip4.dst or ip6.dst, the packet’s
       final destination, unchanged) and advances to the next table for
       ARP resolution. It also sets reg1 (or xxreg1) to the IP address
       owned by the selected router port (ingress table ARP Request
       will generate an ARP request, if needed, with reg0 as the target
       protocol address and reg1 as the source protocol address).

       This table contains the following logical flows:

       ·      For distributed logical routers where one of the logical
              router ports specifies a redirect-chassis, a priority-300
              logical flow with match REGBIT_NAT_REDIRECT == 1 has
              actions ip.ttl--; next;. The outport will be set later in
              the Gateway Redirect table.

       ·      IPv4 routing table. For each route to IPv4 network N with
              netmask M, on router port P with IP address A and
              Ethernet address E, a logical flow with match ip4.dst ==
              N/M, whose priority is the number of 1-bits in M, has the
              following actions:

                   ip.ttl--;
                   reg0 = G;
                   reg1 = A;
                   eth.src = E;
                   outport = P;
                   flags.loopback = 1;
                   next;

              (Ingress table 1 already verified that ip.ttl--; will not
              yield a TTL exceeded error.)

              If the route has a gateway, G is the gateway IP address;
              if the route is a configured static route, G is its next
              hop IP address; otherwise, G is ip4.dst.

       ·      IPv6 routing table. For each route to IPv6 network N with
              netmask M, on router port P with IP address A and
              Ethernet address E, a logical flow with match (in CIDR
              notation) ip6.dst == N/M, whose priority is the integer
              value of M, has the following actions:

                   ip.ttl--;
                   xxreg0 = G;
                   xxreg1 = A;
                   eth.src = E;
                   outport = P;
                   flags.loopback = 1;
                   next;

              (Ingress table 1 already verified that ip.ttl--; will not
              yield a TTL exceeded error.)

              If the route has a gateway, G is the gateway IP address;
              if the route is a configured static route, G is its next
              hop IP address; otherwise, G is ip6.dst.

              If the address A is in the link-local scope, the route
              will be limited to sending on the ingress port.

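The priority rules above (number of 1-bits in an IPv4 netmask, integer value of an IPv6 prefix) amount to longest-prefix match: more specific routes get higher-priority flows. A small illustrative helper:

```python
import ipaddress

def route_priority(network_cidr):
    """Flow priority for a route: the prefix length, i.e. the number of
    1-bits in an IPv4 netmask or the integer value of an IPv6 prefix.
    Longer (more specific) prefixes therefore win (sketch only)."""
    return ipaddress.ip_network(network_cidr).prefixlen
```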
    Ingress Table 8: ARP/ND Resolution

       Any packet that reaches this table is an IP packet whose
       next-hop IPv4 address is in reg0 or IPv6 address is in xxreg0.
       (ip4.dst or ip6.dst contains the final destination.) This table
       resolves the IP address in reg0 (or xxreg0) into an output port
       in outport and an Ethernet address in eth.dst, using the
       following flows:

       ·      For distributed logical routers where one of the logical
              router ports specifies a redirect-chassis, a priority-200
              logical flow with match REGBIT_NAT_REDIRECT == 1 has
              actions eth.dst = E; next;, where E is the Ethernet
              address of the router’s distributed gateway port.

       ·      Static MAC bindings. MAC bindings can be known statically
              based on data in the OVN_Northbound database. For router
              ports connected to logical switches, MAC bindings can be
              known statically from the addresses column in the
              Logical_Switch_Port table. For router ports connected to
              other logical routers, MAC bindings can be known
              statically from the mac and networks columns in the
              Logical_Router_Port table.

              For each IPv4 address A whose host is known to have
              Ethernet address E on router port P, a priority-100 flow
              with match outport == P && reg0 == A has actions eth.dst
              = E; next;.

              For each IPv6 address A whose host is known to have
              Ethernet address E on router port P, a priority-100 flow
              with match outport == P && xxreg0 == A has actions
              eth.dst = E; next;.

              For each logical router port with an IPv4 address A and a
              MAC address of E that is reachable via a different
              logical router port P, a priority-100 flow with match
              outport == P && reg0 == A has actions eth.dst = E; next;.

              For each logical router port with an IPv6 address A and a
              MAC address of E that is reachable via a different
              logical router port P, a priority-100 flow with match
              outport == P && xxreg0 == A has actions eth.dst = E;
              next;.

       ·      Dynamic MAC bindings. These flows resolve MAC-to-IP
              bindings that have become known dynamically through ARP
              or neighbor discovery. (The ingress table ARP Request
              will issue an ARP or neighbor solicitation request for
              cases where the binding is not yet known.)

              A priority-0 logical flow with match ip4 has actions
              get_arp(outport, reg0); next;.

              A priority-0 logical flow with match ip6 has actions
              get_nd(outport, xxreg0); next;.

   Ingress Table 9: Gateway Redirect

       For distributed logical routers where one of the logical router
       ports specifies a redirect-chassis, this table redirects certain
       packets to the distributed gateway port instance on the
       redirect-chassis. This table has the following flows:

              ·      A priority-200 logical flow with match
                     REGBIT_NAT_REDIRECT == 1 has actions outport = CR;
                     next;, where CR is the chassisredirect port
                     representing the instance of the logical router
                     distributed gateway port on the redirect-chassis.

              ·      A priority-150 logical flow with match outport ==
                     GW && eth.dst == 00:00:00:00:00:00 has actions
                     outport = CR; next;, where GW is the logical
                     router distributed gateway port and CR is the
                     chassisredirect port representing the instance of
                     the logical router distributed gateway port on the
                     redirect-chassis.

              ·      For each NAT rule in the OVN Northbound database
                     that can be handled in a distributed manner, a
                     priority-100 logical flow with match ip4.src == B
                     && outport == GW, where GW is the logical router
                     distributed gateway port, with actions next;.

              ·      A priority-50 logical flow with match outport ==
                     GW has actions outport = CR; next;, where GW is
                     the logical router distributed gateway port and CR
                     is the chassisredirect port representing the
                     instance of the logical router distributed gateway
                     port on the redirect-chassis.

              ·      A priority-0 logical flow with match 1 has actions
                     next;.
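
       The redirect-chassis that drives this table is configured as an
       option on the distributed gateway port in the northbound
       database. A hypothetical setup (the port and chassis names are
       invented) might look like:

              ovn-nbctl set Logical_Router_Port lr0-public \
                  options:redirect-chassis=chassis-1

       ovn-northd then creates the corresponding chassisredirect port
       (here, cr-lr0-public) in the southbound database, and the flows
       above steer matching packets to it.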

   Ingress Table 10: ARP Request

       In the common case where the Ethernet destination has been
       resolved, this table outputs the packet. Otherwise, it composes
       and sends an ARP or IPv6 Neighbor Solicitation request. It holds
       the following flows:

              ·      Unknown MAC address. A priority-100 flow for IPv4
                     packets with match eth.dst == 00:00:00:00:00:00
                     has the following actions:

                     arp {
                         eth.dst = ff:ff:ff:ff:ff:ff;
                         arp.spa = reg1;
                         arp.tpa = reg0;
                         arp.op = 1; /* ARP request. */
                         output;
                     };

                     Unknown MAC address. A priority-100 flow for IPv6
                     packets with match eth.dst == 00:00:00:00:00:00
                     has the following actions:

                     nd_ns {
                         nd.target = xxreg0;
                         output;
                     };

                     (The ingress table IP Routing initialized reg1
                     with the IP address owned by outport and (xx)reg0
                     with the next-hop IP address.)

                     The IP packet that triggers the ARP/IPv6 NS
                     request is dropped.

              ·      Known MAC address. A priority-0 flow with match 1
                     has actions output;.

   Egress Table 0: UNDNAT

       This table handles reverse traffic for already established
       connections: DNAT has already been performed in the ingress
       pipeline, and the packet has now entered the egress pipeline as
       part of a reply. For NAT on a distributed router, the packet is
       unDNATted here. For Gateway routers, unDNAT processing is
       carried out in the ingress DNAT table.

              ·      For all the configured load balancing rules for a
                     router with a gateway port in the OVN_Northbound
                     database that include an IPv4 address VIP, for
                     every backend IPv4 address B defined for the VIP,
                     a priority-120 flow is programmed on the
                     redirect-chassis that matches ip && ip4.src == B
                     && outport == GW, where GW is the logical router
                     gateway port, with an action ct_dnat;. If the
                     backend IPv4 address B is also configured with an
                     L4 port PORT of protocol P, then the match also
                     includes P.src == PORT. These flows are not added
                     for load balancers with IPv6 VIPs.

                     If the router is configured to force SNAT any
                     load-balanced packets, the above action is
                     replaced by flags.force_snat_for_lb = 1; ct_dnat;.

              ·      For each configuration in the OVN Northbound
                     database that asks to change the destination IP
                     address of a packet from an IP address A to B, a
                     priority-100 flow matches ip && ip4.src == B &&
                     outport == GW, where GW is the logical router
                     gateway port, with an action ct_dnat;.

                     If the NAT rule cannot be handled in a distributed
                     manner, then the priority-100 flow above is only
                     programmed on the redirect-chassis.

                     If the NAT rule can be handled in a distributed
                     manner, then there is an additional action eth.src
                     = EA;, where EA is the Ethernet address associated
                     with the IP address A in the NAT rule. This allows
                     upstream MAC learning to point to the correct
                     chassis.

              ·      A priority-0 logical flow with match 1 has actions
                     next;.
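
       As a hypothetical illustration, a NAT rule that translates
       destination address 172.16.0.10 to 10.0.0.2 produces an UNDNAT
       flow along these lines (the addresses and port name are
       invented):

              priority=100,
                match=(ip && ip4.src == 10.0.0.2 &&
                       outport == "lr0-public"),
                action=(ct_dnat;)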

   Egress Table 1: SNAT

       Packets that are configured to be SNATed get their source IP
       address changed based on the configuration in the OVN Northbound
       database.

   Egress Table 1: SNAT on Gateway Routers

              ·      If the Gateway router in the OVN Northbound
                     database has been configured to force SNAT a
                     packet (that has been previously DNATted) to B, a
                     priority-100 flow matches
                     flags.force_snat_for_dnat == 1 && ip with an
                     action ct_snat(B);.

              ·      If the Gateway router in the OVN Northbound
                     database has been configured to force SNAT a
                     packet (that has been previously load-balanced) to
                     B, a priority-100 flow matches
                     flags.force_snat_for_lb == 1 && ip with an action
                     ct_snat(B);.

              ·      For each configuration in the OVN Northbound
                     database that asks to change the source IP address
                     of a packet from A, or from an address within
                     network A, to B, a flow matches ip && ip4.src == A
                     with an action ct_snat(B);. The priority of the
                     flow is calculated based on the mask of A, with
                     matches having larger masks getting higher
                     priorities.

              ·      A priority-0 logical flow with match 1 has actions
                     next;.
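
       For example, given two hypothetical SNAT rules, one for the
       network 10.0.0.0/24 and one for the broader 10.0.0.0/16, the /24
       flow is installed with the higher priority, so the more specific
       rule wins (addresses invented; the exact priority values are an
       implementation detail):

              priority=24, match=(ip && ip4.src == 10.0.0.0/24),
                action=(ct_snat(172.16.0.10);)
              priority=16, match=(ip && ip4.src == 10.0.0.0/16),
                action=(ct_snat(172.16.0.11);)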

   Egress Table 1: SNAT on Distributed Routers

              ·      For each configuration in the OVN Northbound
                     database that asks to change the source IP address
                     of a packet from A, or from an address within
                     network A, to B, a flow matches ip && ip4.src == A
                     && outport == GW, where GW is the logical router
                     gateway port, with an action ct_snat(B);. The
                     priority of the flow is calculated based on the
                     mask of A, with matches having larger masks
                     getting higher priorities.

                     If the NAT rule cannot be handled in a distributed
                     manner, then the flow above is only programmed on
                     the redirect-chassis.

                     If the NAT rule can be handled in a distributed
                     manner, then there is an additional action eth.src
                     = EA;, where EA is the Ethernet address associated
                     with the IP address A in the NAT rule. This allows
                     upstream MAC learning to point to the correct
                     chassis.

              ·      A priority-0 logical flow with match 1 has actions
                     next;.

   Egress Table 2: Egress Loopback

       This table is used for distributed logical routers where one of
       the logical router ports specifies a redirect-chassis.

       Earlier in the ingress pipeline, some east-west traffic was
       redirected to the chassisredirect port, based on flows in the
       UNSNAT and DNAT ingress tables setting the REGBIT_NAT_REDIRECT
       flag, which then triggered a match to a flow in the Gateway
       Redirect ingress table. The intention was not to actually send
       traffic out the distributed gateway port instance on the
       redirect-chassis; the traffic was sent to the distributed
       gateway port instance in order for DNAT and/or SNAT processing
       to be applied.

       While UNDNAT and SNAT processing have already occurred by this
       point, this traffic needs to be forced through egress loopback
       on this distributed gateway port instance in order for UNSNAT
       and DNAT processing to be applied, and also for IP routing and
       ARP resolution after all of the NAT processing, so that the
       packet can be forwarded to the destination.

       This table has the following flows:

              ·      For each NAT rule in the OVN Northbound database
                     on a distributed router, a priority-100 logical
                     flow with match ip4.dst == E && outport == GW,
                     where E is the external IP address specified in
                     the NAT rule and GW is the logical router
                     distributed gateway port, with the following
                     actions:

                     clone {
                         ct_clear;
                         inport = outport;
                         outport = "";
                         flags = 0;
                         flags.loopback = 1;
                         reg0 = 0;
                         reg1 = 0;
                         ...
                         reg9 = 0;
                         REGBIT_EGRESS_LOOPBACK = 1;
                         next(pipeline=ingress, table=0);
                     };

                     flags.loopback is set since in_port is unchanged
                     and the packet may return back to that port after
                     NAT processing. REGBIT_EGRESS_LOOPBACK is set to
                     indicate that egress loopback has occurred, in
                     order to skip the source IP address check against
                     the router address.

              ·      A priority-0 logical flow with match 1 has actions
                     next;.

   Egress Table 3: Delivery

       Packets that reach this table are ready for delivery. It
       contains priority-100 logical flows that match packets on each
       enabled logical router port, with action output;.
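
       The logical flows described above can be inspected in a running
       deployment with ovn-sbctl(8). For example, assuming a logical
       router named lr0:

              ovn-sbctl lflow-list lr0

       This lists every logical flow that ovn-northd has programmed for
       that datapath, grouped by the ingress and egress tables
       described in this section.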


Open vSwitch 2.10.0                ovn-northd                    ovn-northd(8)