1ovn-northd(8) OVN Manual ovn-northd(8)
2
3
4
5build/.PP
6
8 ovn-northd - Open Virtual Network central control daemon
9
11 ovn-northd [options]
12
14 ovn-northd is a centralized daemon responsible for translating the
15 high-level OVN configuration into logical configuration consumable by
16 daemons such as ovn-controller. It translates the logical network con‐
17 figuration in terms of conventional network concepts, taken from the
18 OVN Northbound Database (see ovn-nb(5)), into logical datapath flows in
19 the OVN Southbound Database (see ovn-sb(5)) below it.
20
22 --ovnnb-db=database
23 The OVSDB database containing the OVN Northbound Database. If
24 the OVN_NB_DB environment variable is set, its value is used as
25 the default. Otherwise, the default is unix:/ovnnb_db.sock.
26
27 --ovnsb-db=database
28 The OVSDB database containing the OVN Southbound Database. If
29 the OVN_SB_DB environment variable is set, its value is used as
30 the default. Otherwise, the default is unix:/ovnsb_db.sock.
31
32 database in the above options must be an OVSDB active or passive con‐
33 nection method, as described in ovsdb(7).
34
35 Daemon Options
36 --pidfile[=pidfile]
37 Causes a file (by default, program.pid) to be created indicating
38 the PID of the running process. If the pidfile argument is not
39 specified, or if it does not begin with /, then it is created in
40 .
41
42 If --pidfile is not specified, no pidfile is created.
43
44 --overwrite-pidfile
45 By default, when --pidfile is specified and the specified pid‐
46 file already exists and is locked by a running process, the dae‐
47 mon refuses to start. Specify --overwrite-pidfile to cause it to
48 instead overwrite the pidfile.
49
50 When --pidfile is not specified, this option has no effect.
51
52 --detach
53 Runs this program as a background process. The process forks,
54 and in the child it starts a new session, closes the standard
55 file descriptors (which has the side effect of disabling logging
56 to the console), and changes its current directory to the root
57 (unless --no-chdir is specified). After the child completes its
58 initialization, the parent exits.
59
60 --monitor
61 Creates an additional process to monitor this program. If it
62 dies due to a signal that indicates a programming error (SIGA‐
63 BRT, SIGALRM, SIGBUS, SIGFPE, SIGILL, SIGPIPE, SIGSEGV, SIGXCPU,
64 or SIGXFSZ) then the monitor process starts a new copy of it. If
65 the daemon dies or exits for another reason, the monitor process
66 exits.
67
68 This option is normally used with --detach, but it also func‐
69 tions without it.
70
71 --no-chdir
72 By default, when --detach is specified, the daemon changes its
73 current working directory to the root directory after it
74 detaches. Otherwise, invoking the daemon from a carelessly cho‐
75 sen directory would prevent the administrator from unmounting
76 the file system that holds that directory.
77
78 Specifying --no-chdir suppresses this behavior, preventing the
79 daemon from changing its current working directory. This may be
80 useful for collecting core files, since it is common behavior to
81 write core dumps into the current working directory and the root
82 directory is not a good directory to use.
83
84 This option has no effect when --detach is not specified.
85
86 --no-self-confinement
87 By default this daemon will try to self-confine itself to work
88 with files under well-known directories whitelisted at build
89 time. It is better to stick with this default behavior and not
90 to use this flag unless some other Access Control is used to
91 confine daemon. Note that in contrast to other access control
92 implementations that are typically enforced from kernel-space
93 (e.g. DAC or MAC), self-confinement is imposed from the user-
94 space daemon itself and hence should not be considered as a full
95 confinement strategy, but instead should be viewed as an addi‐
96 tional layer of security.
97
98 --user=user:group
99 Causes this program to run as a different user specified in
100 user:group, thus dropping most of the root privileges. Short
101 forms user and :group are also allowed, with current user or
102 group assumed, respectively. Only daemons started by the root
103 user accepts this argument.
104
105 On Linux, daemons will be granted CAP_IPC_LOCK and
106 CAP_NET_BIND_SERVICES before dropping root privileges. Daemons
107 that interact with a datapath, such as ovs-vswitchd, will be
108 granted three additional capabilities, namely CAP_NET_ADMIN,
109 CAP_NET_BROADCAST and CAP_NET_RAW. The capability change will
110 apply even if the new user is root.
111
112 On Windows, this option is not currently supported. For security
113 reasons, specifying this option will cause the daemon process
114 not to start.
115
116 Logging Options
117 -v[spec]
118 --verbose=[spec]
119 Sets logging levels. Without any spec, sets the log level for
120 every module and destination to dbg. Otherwise, spec is a list of
121 words separated by spaces or commas or colons, up to one from each
122 category below:
123
124 · A valid module name, as displayed by the vlog/list command
125 on ovs-appctl(8), limits the log level change to the speci‐
126 fied module.
127
128 · syslog, console, or file, to limit the log level change to
129 only to the system log, to the console, or to a file,
130 respectively. (If --detach is specified, the daemon closes
131 its standard file descriptors, so logging to the console
132 will have no effect.)
133
134 On Windows platform, syslog is accepted as a word and is
135 only useful along with the --syslog-target option (the word
136 has no effect otherwise).
137
138 · off, emer, err, warn, info, or dbg, to control the log
139 level. Messages of the given severity or higher will be
140 logged, and messages of lower severity will be filtered
141 out. off filters out all messages. See ovs-appctl(8) for a
142 definition of each log level.
143
144 Case is not significant within spec.
145
146 Regardless of the log levels set for file, logging to a file will
147 not take place unless --log-file is also specified (see below).
148
149 For compatibility with older versions of OVS, any is accepted as a
150 word but has no effect.
151
152 -v
153 --verbose
154 Sets the maximum logging verbosity level, equivalent to --ver‐
155 bose=dbg.
156
157 -vPATTERN:destination:pattern
158 --verbose=PATTERN:destination:pattern
159 Sets the log pattern for destination to pattern. Refer to
160 ovs-appctl(8) for a description of the valid syntax for pattern.
161
162 -vFACILITY:facility
163 --verbose=FACILITY:facility
164 Sets the RFC5424 facility of the log message. facility can be one
165 of kern, user, mail, daemon, auth, syslog, lpr, news, uucp, clock,
166 ftp, ntp, audit, alert, clock2, local0, local1, local2, local3,
167 local4, local5, local6 or local7. If this option is not specified,
168 daemon is used as the default for the local system syslog and
169 local0 is used while sending a message to the target provided via
170 the --syslog-target option.
171
172 --log-file[=file]
173 Enables logging to a file. If file is specified, then it is used
174 as the exact name for the log file. The default log file name used
175 if file is omitted is /var/log/ovn/program.log.
176
177 --syslog-target=host:port
178 Send syslog messages to UDP port on host, in addition to the sys‐
179 tem syslog. The host must be a numerical IP address, not a host‐
180 name.
181
182 --syslog-method=method
183 Specify method as how syslog messages should be sent to syslog
184 daemon. The following forms are supported:
185
186 · libc, to use the libc syslog() function. Downside of using
187 this options is that libc adds fixed prefix to every mes‐
188 sage before it is actually sent to the syslog daemon over
189 /dev/log UNIX domain socket.
190
191 · unix:file, to use a UNIX domain socket directly. It is pos‐
192 sible to specify arbitrary message format with this option.
193 However, rsyslogd 8.9 and older versions use hard coded
194 parser function anyway that limits UNIX domain socket use.
195 If you want to use arbitrary message format with older
196 rsyslogd versions, then use UDP socket to localhost IP
197 address instead.
198
199 · udp:ip:port, to use a UDP socket. With this method it is
200 possible to use arbitrary message format also with older
201 rsyslogd. When sending syslog messages over UDP socket
202 extra precaution needs to be taken into account, for exam‐
203 ple, syslog daemon needs to be configured to listen on the
204 specified UDP port, accidental iptables rules could be
205 interfering with local syslog traffic and there are some
206 security considerations that apply to UDP sockets, but do
207 not apply to UNIX domain sockets.
208
209 · null, to discard all messages logged to syslog.
210
211 The default is taken from the OVS_SYSLOG_METHOD environment vari‐
212 able; if it is unset, the default is libc.
213
214 PKI Options
215 PKI configuration is required in order to use SSL for the connections
216 to the Northbound and Southbound databases.
217
218 -p privkey.pem
219 --private-key=privkey.pem
220 Specifies a PEM file containing the private key used as
221 identity for outgoing SSL connections.
222
223 -c cert.pem
224 --certificate=cert.pem
225 Specifies a PEM file containing a certificate that certi‐
226 fies the private key specified on -p or --private-key to be
227 trustworthy. The certificate must be signed by the certifi‐
228 cate authority (CA) that the peer in SSL connections will
229 use to verify it.
230
231 -C cacert.pem
232 --ca-cert=cacert.pem
233 Specifies a PEM file containing the CA certificate for ver‐
234 ifying certificates presented to this program by SSL peers.
235 (This may be the same certificate that SSL peers use to
236 verify the certificate specified on -c or --certificate, or
237 it may be a different one, depending on the PKI design in
238 use.)
239
240 -C none
241 --ca-cert=none
242 Disables verification of certificates presented by SSL
243 peers. This introduces a security risk, because it means
244 that certificates cannot be verified to be those of known
245 trusted hosts.
246
247 Other Options
248 --unixctl=socket
249 Sets the name of the control socket on which program listens for
250 runtime management commands (see RUNTIME MANAGEMENT COMMANDS,
251 below). If socket does not begin with /, it is interpreted as
252 relative to . If --unixctl is not used at all, the default
253 socket is /program.pid.ctl, where pid is program’s process ID.
254
255 On Windows a local named pipe is used to listen for runtime man‐
256 agement commands. A file is created in the absolute path as
257 pointed by socket or if --unixctl is not used at all, a file is
258 created as program in the configured OVS_RUNDIR directory. The
259 file exists just to mimic the behavior of a Unix domain socket.
260
261 Specifying none for socket disables the control socket feature.
262
263
264
265 -h
266 --help
267 Prints a brief help message to the console.
268
269 -V
270 --version
271 Prints version information to the console.
272
274 ovs-appctl can send commands to a running ovn-northd process. The cur‐
275 rently supported commands are described below.
276
277 exit Causes ovn-northd to gracefully terminate.
278
279 pause Pauses the ovn-northd operation from processing any
280 Northbound and Southbound database changes. This will
281 also instruct ovn-northd to drop any lock on SB DB.
282
283 resume Resumes the ovn-northd operation to process Northbound
284 and Southbound database contents and generate logical
285 flows. This will also instruct ovn-northd to aspire for
286 the lock on SB DB.
287
288 is-paused
289 Returns "true" if ovn-northd is currently paused, "false"
290 otherwise.
291
292 status Prints this server’s status. Status will be "active" if
293 ovn-northd has acquired OVSDB lock on SB DB, "standby" if
294 it has not or "paused" if this instance is paused.
295
297 You may run ovn-northd more than once in an OVN deployment. When con‐
298 nected to a standalone or clustered DB setup, OVN will automatically
299 ensure that only one of them is active at a time. If multiple instances
300 of ovn-northd are running and the active ovn-northd fails, one of the
301 hot standby instances of ovn-northd will automatically take over.
302
303 Active-Standby with multiple OVN DB servers
304 You may run multiple OVN DB servers in an OVN deployment with:
305
306 · OVN DB servers deployed in active/passive mode with one
307 active and multiple passive ovsdb-servers.
308
309 · ovn-northd also deployed on all these nodes, using unix
310 ctl sockets to connect to the local OVN DB servers.
311
312 In such deployments, the ovn-northds on the passive nodes will process
313 the DB changes and compute logical flows to be thrown out later,
314 because write transactions are not allowed by the passive ovsdb-
315 servers. It results in unnecessary CPU usage.
316
317 With the help of runtime management command pause, you can pause
318 ovn-northd on these nodes. When a passive node becomes master, you can
319 use the runtime management command resume to resume the ovn-northd to
320 process the DB changes.
321
323 One of the main purposes of ovn-northd is to populate the Logical_Flow
324 table in the OVN_Southbound database. This section describes how
325 ovn-northd does this for switch and router logical datapaths.
326
327 Logical Switch Datapaths
328 Ingress Table 0: Admission Control and Ingress Port Security - L2
329
330 Ingress table 0 contains these logical flows:
331
332 · Priority 100 flows to drop packets with VLAN tags or mul‐
333 ticast Ethernet source addresses.
334
335 · Priority 50 flows that implement ingress port security
336 for each enabled logical port. For logical ports on which
337 port security is enabled, these match the inport and the
338 valid eth.src address(es) and advance only those packets
339 to the next flow table. For logical ports on which port
340 security is not enabled, these advance all packets that
341 match the inport.
342
343 There are no flows for disabled logical ports because the default-drop
344 behavior of logical flow tables causes packets that ingress from them
345 to be dropped.
346
347 Ingress Table 1: Ingress Port Security - IP
348
349 Ingress table 1 contains these logical flows:
350
351 · For each element in the port security set having one or
352 more IPv4 or IPv6 addresses (or both),
353
354 · Priority 90 flow to allow IPv4 traffic if it has
355 IPv4 addresses which match the inport, valid
356 eth.src and valid ip4.src address(es).
357
358 · Priority 90 flow to allow IPv4 DHCP discovery
359 traffic if it has a valid eth.src. This is neces‐
360 sary since DHCP discovery messages are sent from
361 the unspecified IPv4 address (0.0.0.0) since the
362 IPv4 address has not yet been assigned.
363
364 · Priority 90 flow to allow IPv6 traffic if it has
365 IPv6 addresses which match the inport, valid
366 eth.src and valid ip6.src address(es).
367
368 · Priority 90 flow to allow IPv6 DAD (Duplicate
369 Address Detection) traffic if it has a valid
370 eth.src. This is is necessary since DAD include
371 requires joining an multicast group and sending
372 neighbor solicitations for the newly assigned
373 address. Since no address is yet assigned, these
374 are sent from the unspecified IPv6 address (::).
375
376 · Priority 80 flow to drop IP (both IPv4 and IPv6)
377 traffic which match the inport and valid eth.src.
378
379 · One priority-0 fallback flow that matches all packets and
380 advances to the next table.
381
382 Ingress Table 2: Ingress Port Security - Neighbor discovery
383
384 Ingress table 2 contains these logical flows:
385
386 · For each element in the port security set,
387
388 · Priority 90 flow to allow ARP traffic which match
389 the inport and valid eth.src and arp.sha. If the
390 element has one or more IPv4 addresses, then it
391 also matches the valid arp.spa.
392
393 · Priority 90 flow to allow IPv6 Neighbor Solicita‐
394 tion and Advertisement traffic which match the
395 inport, valid eth.src and nd.sll/nd.tll. If the
396 element has one or more IPv6 addresses, then it
397 also matches the valid nd.target address(es) for
398 Neighbor Advertisement traffic.
399
400 · Priority 80 flow to drop ARP and IPv6 Neighbor
401 Solicitation and Advertisement traffic which match
402 the inport and valid eth.src.
403
404 · One priority-0 fallback flow that matches all packets and
405 advances to the next table.
406
407 Ingress Table 3:from-lportPre-ACLs
408
409 This table prepares flows for possible stateful ACL processing in
410 ingress table ACLs. It contains a priority-0 flow that simply moves
411 traffic to the next table. If stateful ACLs are used in the logical
412 datapath, a priority-100 flow is added that sets a hint (with reg0[0] =
413 1; next;) for table Pre-stateful to send IP packets to the connection
414 tracker before eventually advancing to ingress table ACLs. If special
415 ports such as route ports or localnet ports can’t use ct(), a prior‐
416 ity-110 flow is added to skip over stateful ACLs.
417
418 This table also has a priority-110 flow with the match eth.dst == E for
419 all logical switch datapaths to move traffic to the next table. Where E
420 is the service monitor mac defined in the options:svc_monitor_mac colum
421 of NB_Global table.
422
423 Ingress Table 4: Pre-LB
424
425 This table prepares flows for possible stateful load balancing process‐
426 ing in ingress table LB and Stateful. It contains a priority-0 flow
427 that simply moves traffic to the next table. Moreover it contains a
428 priority-110 flow to move IPv6 Neighbor Discovery traffic to the next
429 table. If load balancing rules with virtual IP addresses (and ports)
430 are configured in OVN_Northbound database for a logical switch data‐
431 path, a priority-100 flow is added for each configured virtual IP
432 address VIP. For IPv4 VIPs, the match is ip && ip4.dst == VIP. For IPv6
433 VIPs, the match is ip && ip6.dst == VIP. The flow sets an action
434 reg0[0] = 1; next; to act as a hint for table Pre-stateful to send IP
435 packets to the connection tracker for packet de-fragmentation before
436 eventually advancing to ingress table LB. If controller_event has been
437 enabled and load balancing rules with empty backends have been added in
438 OVN_Northbound, a 130 flow is added to trigger ovn-controller events
439 whenever the chassis receives a packet for that particular VIP. If
440 event-elb meter has been previously created, it will be associated to
441 the empty_lb logical flow
442
443 This table also has a priority-110 flow with the match eth.dst == E for
444 all logical switch datapaths to move traffic to the next table. Where E
445 is the service monitor mac defined in the options:svc_monitor_mac colum
446 of NB_Global table.
447
448 Ingress Table 5: Pre-stateful
449
450 This table prepares flows for all possible stateful processing in next
451 tables. It contains a priority-0 flow that simply moves traffic to the
452 next table. A priority-100 flow sends the packets to connection tracker
453 based on a hint provided by the previous tables (with a match for
454 reg0[0] == 1) by using the ct_next; action.
455
456 Ingress table 6:from-lportACLs
457
458 Logical flows in this table closely reproduce those in the ACL table in
459 the OVN_Northbound database for the from-lport direction. The priority
460 values from the ACL table have a limited range and have 1000 added to
461 them to leave room for OVN default flows at both higher and lower pri‐
462 orities.
463
464 · allow ACLs translate into logical flows with the next;
465 action. If there are any stateful ACLs on this datapath,
466 then allow ACLs translate to ct_commit; next; (which acts
467 as a hint for the next tables to commit the connection to
468 conntrack),
469
470 · allow-related ACLs translate into logical flows with the
471 ct_commit(ct_label=0/1); next; actions for new connec‐
472 tions and reg0[1] = 1; next; for existing connections.
473
474 · reject ACLs translate into logical flows with the
475 tcp_reset { output <-> inport; next(pipeline=egress,ta‐
476 ble=5);} action for TCP connections and icmp4/icmp6
477 action for UDP connections.
478
479 · Other ACLs translate to drop; for new or untracked con‐
480 nections and ct_commit(ct_label=1/1); for known connec‐
481 tions. Setting ct_label marks a connection as one that
482 was previously allowed, but should no longer be allowed
483 due to a policy change.
484
485 This table also contains a priority 0 flow with action next;, so that
486 ACLs allow packets by default. If the logical datapath has a statetful
487 ACL, the following flows will also be added:
488
489 · A priority-1 flow that sets the hint to commit IP traffic
490 to the connection tracker (with action reg0[1] = 1;
491 next;). This is needed for the default allow policy
492 because, while the initiator’s direction may not have any
493 stateful rules, the server’s may and then its return
494 traffic would not be known and marked as invalid.
495
496 · A priority-65535 flow that allows any traffic in the
497 reply direction for a connection that has been committed
498 to the connection tracker (i.e., established flows), as
499 long as the committed flow does not have ct_label.blocked
500 set. We only handle traffic in the reply direction here
501 because we want all packets going in the request direc‐
502 tion to still go through the flows that implement the
503 currently defined policy based on ACLs. If a connection
504 is no longer allowed by policy, ct_label.blocked will get
505 set and packets in the reply direction will no longer be
506 allowed, either.
507
508 · A priority-65535 flow that allows any traffic that is
509 considered related to a committed flow in the connection
510 tracker (e.g., an ICMP Port Unreachable from a non-lis‐
511 tening UDP port), as long as the committed flow does not
512 have ct_label.blocked set.
513
514 · A priority-65535 flow that drops all traffic marked by
515 the connection tracker as invalid.
516
517 · A priority-65535 flow that drops all traffic in the reply
518 direction with ct_label.blocked set meaning that the con‐
519 nection should no longer be allowed due to a policy
520 change. Packets in the request direction are skipped here
521 to let a newly created ACL re-allow this connection.
522
523 · A priority 34000 logical flow is added for each logical
524 switch datapath with the match eth.dst = E to allow the
525 service monitor reply packet destined to ovn-controller
526 with the action next, where E is the service monitor mac
527 defined in the options:svc_monitor_mac colum of NB_Global
528 table.
529
530 Ingress Table 7:from-lportQoS Marking
531
532 Logical flows in this table closely reproduce those in the QoS table
533 with the action column set in the OVN_Northbound database for the
534 from-lport direction.
535
536 · For every qos_rules entry in a logical switch with DSCP
537 marking enabled, a flow will be added at the priority
538 mentioned in the QoS table.
539
540 · One priority-0 fallback flow that matches all packets and
541 advances to the next table.
542
543 Ingress Table 8:from-lportQoS Meter
544
545 Logical flows in this table closely reproduce those in the QoS table
546 with the bandwidth column set in the OVN_Northbound database for the
547 from-lport direction.
548
549 · For every qos_rules entry in a logical switch with meter‐
550 ing enabled, a flow will be added at the priorirty men‐
551 tioned in the QoS table.
552
553 · One priority-0 fallback flow that matches all packets and
554 advances to the next table.
555
556 Ingress Table 9: LB
557
558 It contains a priority-0 flow that simply moves traffic to the next ta‐
559 ble. For established connections a priority 100 flow matches on ct.est
560 && !ct.rel && !ct.new && !ct.inv and sets an action reg0[2] = 1; next;
561 to act as a hint for table Stateful to send packets through connection
562 tracker to NAT the packets. (The packet will automatically get DNATed
563 to the same IP address as the first packet in that connection.)
564
565 Ingress Table 10: Stateful
566
567 · For all the configured load balancing rules for a switch
568 in OVN_Northbound database that includes a L4 port PORT
569 of protocol P and IP address VIP, a priority-120 flow is
570 added. For IPv4 VIPs , the flow matches ct.new && ip &&
571 ip4.dst == VIP && P && P.dst == PORT. For IPv6 VIPs, the
572 flow matches ct.new && ip && ip6.dst == VIP && P && P.dst
573 == PORT. The flow’s action is ct_lb(args) , where args
574 contains comma separated IP addresses (and optional port
575 numbers) to load balance to. The address family of the IP
576 addresses of args is the same as the address family of
577 VIP. If health check is enabled, then args will only con‐
578 tain those endpoints whose service monitor status entry
579 in OVN_Southbound db is either online or empty.
580
581 · For all the configured load balancing rules for a switch
582 in OVN_Northbound database that includes just an IP
583 address VIP to match on, OVN adds a priority-110 flow.
584 For IPv4 VIPs, the flow matches ct.new && ip && ip4.dst
585 == VIP. For IPv6 VIPs, the flow matches ct.new && ip &&
586 ip6.dst == VIP. The action on this flow is ct_lb(args),
587 where args contains comma separated IP addresses of the
588 same address family as VIP.
589
590 · A priority-100 flow commits packets to connection tracker
591 using ct_commit; next; action based on a hint provided by
592 the previous tables (with a match for reg0[1] == 1).
593
594 · A priority-100 flow sends the packets to connection
595 tracker using ct_lb; as the action based on a hint pro‐
596 vided by the previous tables (with a match for reg0[2] ==
597 1).
598
599 · A priority-0 flow that simply moves traffic to the next
600 table.
601
602 Ingress Table 11: Pre-Hairpin
603
604 · For all configured load balancer backends a priority-2
605 flow that matches on traffic that needs to be hairpinned,
606 i.e., after load balancing the destination IP matches the
607 source IP, which sets reg0[6] = 1 and executes
608 ct_snat(VIP) to force replies to these packets to come
609 back through OVN.
610
611 · For all configured load balancer backends a priority-1
612 flow that matches on replies to hairpinned traffic, i.e.,
613 destination IP is VIP, source IP is the backend IP and
614 source L4 port is backend port, which sets reg0[6] = 1
615 and executes ct_snat;.
616
617 · A priority-0 flow that simply moves traffic to the next
618 table.
619
620 Ingress Table 12: Hairpin
621
622 · A priority-1 flow that hairpins traffic matched by non-
623 default flows in the Pre-Hairpin table. Hairpinning is
624 done at L2, Ethernet addresses are swapped and the pack‐
625 ets are looped back on the input port.
626
627 · A priority-0 flow that simply moves traffic to the next
628 table.
629
630 Ingress Table 13: ARP/ND responder
631
632 This table implements ARP/ND responder in a logical switch for known
633 IPs. The advantage of the ARP responder flow is to limit ARP broadcasts
634 by locally responding to ARP requests without the need to send to other
635 hypervisors. One common case is when the inport is a logical port asso‐
636 ciated with a VIF and the broadcast is responded to on the local hyper‐
637 visor rather than broadcast across the whole network and responded to
638 by the destination VM. This behavior is proxy ARP.
639
640 ARP requests arrive from VMs from a logical switch inport of type
641 default. For this case, the logical switch proxy ARP rules can be for
642 other VMs or logical router ports. Logical switch proxy ARP rules may
643 be programmed both for mac binding of IP addresses on other logical
644 switch VIF ports (which are of the default logical switch port type,
645 representing connectivity to VMs or containers), and for mac binding of
646 IP addresses on logical switch router type ports, representing their
647 logical router port peers. In order to support proxy ARP for logical
648 router ports, an IP address must be configured on the logical switch
649 router type port, with the same value as the peer logical router port.
650 The configured MAC addresses must match as well. When a VM sends an ARP
651 request for a distributed logical router port and if the peer router
652 type port of the attached logical switch does not have an IP address
653 configured, the ARP request will be broadcast on the logical switch.
654 One of the copies of the ARP request will go through the logical switch
655 router type port to the logical router datapath, where the logical
656 router ARP responder will generate a reply. The MAC binding of a dis‐
657 tributed logical router, once learned by an associated VM, is used for
658 all that VM’s communication needing routing. Hence, the action of a VM
659 re-arping for the mac binding of the logical router port should be
660 rare.
661
662 Logical switch ARP responder proxy ARP rules can also be hit when
663 receiving ARP requests externally on a L2 gateway port. In this case,
664 the hypervisor acting as an L2 gateway, responds to the ARP request on
665 behalf of a destination VM.
666
667 Note that ARP requests received from localnet or vtep logical inports
668 can either go directly to VMs, in which case the VM responds or can hit
669 an ARP responder for a logical router port if the packet is used to
670 resolve a logical router port next hop address. In either case, logical
671 switch ARP responder rules will not be hit. It contains these logical
672 flows:
673
674 · Priority-100 flows to skip the ARP responder if inport is
675 of type localnet or vtep and advances directly to the
676 next table. ARP requests sent to localnet or vtep ports
677 can be received by multiple hypervisors. Now, because the
678 same mac binding rules are downloaded to all hypervisors,
679 each of the multiple hypervisors will respond. This will
680 confuse L2 learning on the source of the ARP requests.
681 ARP requests received on an inport of type router are not
682 expected to hit any logical switch ARP responder flows.
683 However, no skip flows are installed for these packets,
684 as there would be some additional flow cost for this and
685 the value appears limited.
686
687 · If inport V is of type virtual adds a priority-100 logi‐
688 cal flow for each P configured in the options:virtual-
689 parents column with the match
690
691 inport == P && && ((arp.op == 1 && arp.spa == VIP && arp.tpa == VIP) || (arp.op == 2 && arp.spa == VIP))
692
693
694 and applies the action
695
696 bind_vport(V, inport);
697
698
699 and advances the packet to the next table.
700
701 Where VIP is the virtual ip configured in the column
702 options:virtual-ip.
703
704 · Priority-50 flows that match ARP requests to each known
705 IP address A of every logical switch port, and respond
706 with ARP replies directly with corresponding Ethernet
707 address E:
708
709 eth.dst = eth.src;
710 eth.src = E;
711 arp.op = 2; /* ARP reply. */
712 arp.tha = arp.sha;
713 arp.sha = E;
714 arp.tpa = arp.spa;
715 arp.spa = A;
716 outport = inport;
717 flags.loopback = 1;
718 output;
719
720
721 These flows are omitted for logical ports (other than
722 router ports or localport ports) that are down, for logi‐
723 cal ports of type virtual and for logical ports with
724 ’unknown’ address set.
725
726 · Priority-50 flows that match IPv6 ND neighbor solicita‐
727 tions to each known IP address A (and A’s solicited node
728 address) of every logical switch port except of type
729 router, and respond with neighbor advertisements directly
730 with corresponding Ethernet address E:
731
732 nd_na {
733 eth.src = E;
734 ip6.src = A;
735 nd.target = A;
736 nd.tll = E;
737 outport = inport;
738 flags.loopback = 1;
739 output;
740 };
741
742
743 Priority-50 flows that match IPv6 ND neighbor solicita‐
744 tions to each known IP address A (and A’s solicited node
745 address) of logical switch port of type router, and
746 respond with neighbor advertisements directly with corre‐
747 sponding Ethernet address E:
748
749 nd_na_router {
750 eth.src = E;
751 ip6.src = A;
752 nd.target = A;
753 nd.tll = E;
754 outport = inport;
755 flags.loopback = 1;
756 output;
757 };
758
759
760 These flows are omitted for logical ports (other than
761 router ports or localport ports) that are down and for
762 logical ports of type virtual.
763
764 · Priority-100 flows with match criteria like the ARP and
765 ND flows above, except that they only match packets from
766 the inport that owns the IP addresses in question, with
767 action next;. These flows prevent OVN from replying to,
768 for example, an ARP request emitted by a VM for its own
769 IP address. A VM only makes this kind of request to
770 attempt to detect a duplicate IP address assignment, so
771 sending a reply will prevent the VM from accepting the IP
772 address that it owns.
773
774 In place of next;, it would be reasonable to use drop;
775 for the flows’ actions. If everything is working as it is
776 configured, then this would produce equivalent results,
777 since no host should reply to the request. But ARPing for
778 one’s own IP address is intended to detect situations
779 where the network is not working as configured, so drop‐
780 ping the request would frustrate that intent.
781
782 · For each SVC_MON_SRC_IP defined in the value of the
783 ip_port_mappings:ENDPOINT_IP column of Load_Balancer ta‐
784 ble, priority-110 logical flow is added with the match
785 arp.tpa == SVC_MON_SRC_IP && && arp.op == 1 and applies
786 the action
787
788 eth.dst = eth.src;
789 eth.src = E;
790 arp.op = 2; /* ARP reply. */
791 arp.tha = arp.sha;
792 arp.sha = E;
793 arp.tpa = arp.spa;
794 arp.spa = A;
795 outport = inport;
796 flags.loopback = 1;
797 output;
798
799
800 where E is the service monitor source mac defined in the
801 options:svc_monitor_mac column in the NB_Global table.
802 This mac is used as the source mac in the service monitor
803 packets for the load balancer endpoint IP health checks.
804
805 SVC_MON_SRC_IP is used as the source ip in the service
806 monitor IPv4 packets for the load balancer endpoint IP
807 health checks.
808
809 These flows are required if an ARP request is sent for
810 the IP SVC_MON_SRC_IP.
811
812 · For each VIP configured in the table Forwarding_Group a
813 priority-50 logical flow is added with the match arp.tpa
814 == vip && && arp.op == 1
815 and applies the action
816
817 eth.dst = eth.src;
818 eth.src = E;
819 arp.op = 2; /* ARP reply. */
820 arp.tha = arp.sha;
821 arp.sha = E;
822 arp.tpa = arp.spa;
823 arp.spa = A;
824 outport = inport;
825 flags.loopback = 1;
826 output;
827
828
829 where E is the forwarding group’s mac defined in the
830 vmac.
831
832 A is used as either the destination ip for load balancing
833 traffic to child ports or as nexthop to hosts behind the
834 child ports.
835
836 These flows are required to respond to an ARP request if
837 an ARP request is sent for the IP vip.
838
839 · One priority-0 fallback flow that matches all packets and
840 advances to the next table.
841
842 Ingress Table 14: DHCP option processing
843
844 This table adds the DHCPv4 options to a DHCPv4 packet from the logical
845 ports configured with IPv4 address(es) and DHCPv4 options, and simi‐
846 larly for DHCPv6 options. This table also adds flows for the logical
847 ports of type external.
848
849 · A priority-100 logical flow is added for these logical
850 ports which matches the IPv4 packet with udp.src = 68 and
851 udp.dst = 67 and applies the action put_dhcp_opts and
852 advances the packet to the next table.
853
854 reg0[3] = put_dhcp_opts(offer_ip = ip, options...);
855 next;
856
857
858 For DHCPDISCOVER and DHCPREQUEST, this transforms the
859 packet into a DHCP reply, adds the DHCP offer IP ip and
860 options to the packet, and stores 1 into reg0[3]. For
861 other kinds of packets, it just stores 0 into reg0[3].
862 Either way, it continues to the next table.
863
864 · A priority-100 logical flow is added for these logical
865 ports which matches the IPv6 packet with udp.src = 546
866 and udp.dst = 547 and applies the action put_dhcpv6_opts
867 and advances the packet to the next table.
868
869 reg0[3] = put_dhcpv6_opts(ia_addr = ip, options...);
870 next;
871
872
873 For DHCPv6 Solicit/Request/Confirm packets, this trans‐
874 forms the packet into a DHCPv6 Advertise/Reply, adds the
875 DHCPv6 offer IP ip and options to the packet, and stores
876 1 into reg0[3]. For other kinds of packets, it just
877 stores 0 into reg0[3]. Either way, it continues to the
878 next table.
879
880 · A priority-0 flow that matches all packets to advances to
881 table 15.
882
883 Ingress Table 15: DHCP responses
884
885 This table implements DHCP responder for the DHCP replies generated by
886 the previous table.
887
888 · A priority 100 logical flow is added for the logical
889 ports configured with DHCPv4 options which matches IPv4
890 packets with udp.src == 68 && udp.dst == 67 && reg0[3] ==
891 1 and responds back to the inport after applying these
892 actions. If reg0[3] is set to 1, it means that the action
893 put_dhcp_opts was successful.
894
895 eth.dst = eth.src;
896 eth.src = E;
897 ip4.src = S;
898 udp.src = 67;
899 udp.dst = 68;
900 outport = P;
901 flags.loopback = 1;
902 output;
903
904
905 where E is the server MAC address and S is the server
906 IPv4 address defined in the DHCPv4 options. Note that
907 ip4.dst field is handled by put_dhcp_opts.
908
909 (This terminates ingress packet processing; the packet
910 does not go to the next ingress table.)
911
912 · A priority 100 logical flow is added for the logical
913 ports configured with DHCPv6 options which matches IPv6
914 packets with udp.src == 546 && udp.dst == 547 && reg0[3]
915 == 1 and responds back to the inport after applying these
916 actions. If reg0[3] is set to 1, it means that the action
917 put_dhcpv6_opts was successful.
918
919 eth.dst = eth.src;
920 eth.src = E;
921 ip6.dst = A;
922 ip6.src = S;
923 udp.src = 547;
924 udp.dst = 546;
925 outport = P;
926 flags.loopback = 1;
927 output;
928
929
930 where E is the server MAC address and S is the server
931 IPv6 LLA address generated from the server_id defined in
932 the DHCPv6 options and A is the IPv6 address defined in
933 the logical port’s addresses column.
934
935 (This terminates packet processing; the packet does not
936 go on the next ingress table.)
937
938 · A priority-0 flow that matches all packets to advances to
939 table 16.
940
941 Ingress Table 16 DNS Lookup
942
943 This table looks up and resolves the DNS names to the corresponding
944 configured IP address(es).
945
946 · A priority-100 logical flow for each logical switch data‐
947 path if it is configured with DNS records, which matches
948 the IPv4 and IPv6 packets with udp.dst = 53 and applies
949 the action dns_lookup and advances the packet to the next
950 table.
951
952 reg0[4] = dns_lookup(); next;
953
954
955 For valid DNS packets, this transforms the packet into a
956 DNS reply if the DNS name can be resolved, and stores 1
957 into reg0[4]. For failed DNS resolution or other kinds of
958 packets, it just stores 0 into reg0[4]. Either way, it
959 continues to the next table.
960
961 Ingress Table 17 DNS Responses
962
963 This table implements DNS responder for the DNS replies generated by
964 the previous table.
965
966 · A priority-100 logical flow for each logical switch data‐
967 path if it is configured with DNS records, which matches
968 the IPv4 and IPv6 packets with udp.dst = 53 && reg0[4] ==
969 1 and responds back to the inport after applying these
970 actions. If reg0[4] is set to 1, it means that the action
971 dns_lookup was successful.
972
973 eth.dst <-> eth.src;
974 ip4.src <-> ip4.dst;
975 udp.dst = udp.src;
976 udp.src = 53;
977 outport = P;
978 flags.loopback = 1;
979 output;
980
981
982 (This terminates ingress packet processing; the packet
983 does not go to the next ingress table.)
984
985 Ingress table 18 External ports
986
987 Traffic from the external logical ports enter the ingress datapath
988 pipeline via the localnet port. This table adds the below logical flows
989 to handle the traffic from these ports.
990
991 · A priority-100 flow is added for each external logical
992 port which doesn’t reside on a chassis to drop the
993 ARP/IPv6 NS request to the router IP(s) (of the logical
994 switch) which matches on the inport of the external logi‐
995 cal port and the valid eth.src address(es) of the exter‐
996 nal logical port.
997
998 This flow guarantees that the ARP/NS request to the
999 router IP address from the external ports is responded by
1000 only the chassis which has claimed these external ports.
1001 All the other chassis, drops these packets.
1002
1003 · A priority-0 flow that matches all packets to advances to
1004 table 19.
1005
1006 Ingress Table 19 Destination Lookup
1007
1008 This table implements switching behavior. It contains these logical
1009 flows:
1010
1011 · A priorirty-110 flow with the match eth.src == E for all
1012 logical switch datapaths and applies the action han‐
1013 dle_svc_check(inport). Where E is the service monitor mac
1014 defined in the options:svc_monitor_mac colum of NB_Global
1015 table.
1016
1017 · A priority-100 flow that punts all IGMP/MLD packets to
1018 ovn-controller if multicast snooping is enabled on the
1019 logical switch. The flow also forwards the IGMP/MLD pack‐
1020 ets to the MC_MROUTER_STATIC multicast group, which
1021 ovn-northd populates with all the logical ports that have
1022 options :mcast_flood_reports=’true’.
1023
1024 · Priority-90 flows that forward registered IP multicast
1025 traffic to their corresponding multicast group, which
1026 ovn-northd creates based on learnt IGMP_Group entries.
1027 The flows also forward packets to the MC_MROUTER_FLOOD
1028 multicast group, which ovn-nortdh populates with all the
1029 logical ports that are connected to logical routers with
1030 options:mcast_relay=’true’.
1031
1032 · A priority-85 flow that forwards all IP multicast traffic
1033 destined to 224.0.0.X to the MC_FLOOD multicast group,
1034 which ovn-northd populates with all enabled logical
1035 ports.
1036
1037 · A priority-85 flow that forwards all IP multicast traffic
1038 destined to reserved multicast IPv6 addresses (RFC 4291,
1039 2.7.1, e.g., Solicited-Node multicast) to the MC_FLOOD
1040 multicast group, which ovn-northd populates with all
1041 enabled logical ports.
1042
1043 · A priority-80 flow that forwards all unregistered IP mul‐
1044 ticast traffic to the MC_STATIC multicast group, which
1045 ovn-northd populates with all the logical ports that have
1046 options :mcast_flood=’true’. The flow also forwards
1047 unregistered IP multicast traffic to the MC_MROUTER_FLOOD
1048 multicast group, which ovn-northd populates with all the
1049 logical ports connected to logical routers that have
1050 options :mcast_relay=’true’.
1051
1052 · A priority-80 flow that drops all unregistered IP multi‐
1053 cast traffic if other_config :mcast_snoop=’true’ and
1054 other_config :mcast_flood_unregistered=’false’ and the
1055 switch is not connected to a logical router that has
1056 options :mcast_relay=’true’ and the switch doesn’t have
1057 any logical port with options :mcast_flood=’true’.
1058
1059 · Priority-80 flows for each port connected to a logical
1060 router matching self originated GARP/ARP request/ND pack‐
1061 ets. These packets are flooded to the MC_FLOOD which con‐
1062 tains all logical ports.
1063
1064 · Priority-75 flows for each IP address/VIP/NAT address
1065 owned by a router port connected to the switch. These
1066 flows match ARP requests and ND packets for the specific
1067 IP addresses. Matched packets are forwarded only to the
1068 router that owns the IP address and, if present, to the
1069 localnet port of the logical switch.
1070
1071 · A priority-70 flow that outputs all packets with an Eth‐
1072 ernet broadcast or multicast eth.dst to the MC_FLOOD mul‐
1073 ticast group.
1074
1075 · One priority-50 flow that matches each known Ethernet
1076 address against eth.dst and outputs the packet to the
1077 single associated output port.
1078
1079 For the Ethernet address on a logical switch port of type
1080 router, when that logical switch port’s addresses column
1081 is set to router and the connected logical router port
1082 specifies a redirect-chassis:
1083
1084 · The flow for the connected logical router port’s
1085 Ethernet address is only programmed on the redi‐
1086 rect-chassis.
1087
1088 · If the logical router has rules specified in nat
1089 with external_mac, then those addresses are also
1090 used to populate the switch’s destination lookup
1091 on the chassis where logical_port is resident.
1092
1093 For the Ethernet address on a logical switch port of type
1094 router, when that logical switch port’s addresses column
1095 is set to router and the connected logical router port
1096 specifies a reside-on-redirect-chassis and the logical
1097 router to which the connected logical router port belongs
1098 to has a redirect-chassis distributed gateway logical
1099 router port:
1100
1101 · The flow for the connected logical router port’s
1102 Ethernet address is only programmed on the redi‐
1103 rect-chassis.
1104
1105 For each forwarding group configured on the logical
1106 switch datapath, a priority-50 flow that matches on
1107 eth.dst == VIP
1108 with an action of fwd_group(childports=args ), where
1109 args contains comma separated logical switch child ports
1110 to load balance to. If liveness is enabled, then action
1111 also includes liveness=true.
1112
1113 · One priority-0 fallback flow that matches all packets and
1114 outputs them to the MC_UNKNOWN multicast group, which
1115 ovn-northd populates with all enabled logical ports that
1116 accept unknown destination packets. As a small optimiza‐
1117 tion, if no logical ports accept unknown destination
1118 packets, ovn-northd omits this multicast group and logi‐
1119 cal flow.
1120
1121 Egress Table 0: Pre-LB
1122
1123 This table is similar to ingress table Pre-LB. It contains a priority-0
1124 flow that simply moves traffic to the next table. Moreover it contains
1125 a priority-110 flow to move IPv6 Neighbor Discovery traffic to the next
1126 table. If any load balancing rules exist for the datapath, a prior‐
1127 ity-100 flow is added with a match of ip and action of reg0[0] = 1;
1128 next; to act as a hint for table Pre-stateful to send IP packets to the
1129 connection tracker for packet de-fragmentation.
1130
1131 This table also has a priority-110 flow with the match eth.src == E for
1132 all logical switch datapaths to move traffic to the next table. Where E
1133 is the service monitor mac defined in the options:svc_monitor_mac colum
1134 of NB_Global table.
1135
1136 Egress Table 1:to-lportPre-ACLs
1137
1138 This is similar to ingress table Pre-ACLs except for to-lport traffic.
1139
1140 This table also has a priority-110 flow with the match eth.src == E for
1141 all logical switch datapaths to move traffic to the next table. Where E
1142 is the service monitor mac defined in the options:svc_monitor_mac colum
1143 of NB_Global table.
1144
1145 Egress Table 2: Pre-stateful
1146
1147 This is similar to ingress table Pre-stateful.
1148
1149 Egress Table 3: LB
1150
1151 This is similar to ingress table LB.
1152
1153 Egress Table 4:to-lportACLs
1154
1155 This is similar to ingress table ACLs except for to-lport ACLs.
1156
1157 In addition, the following flows are added.
1158
1159 · A priority 34000 logical flow is added for each logical
1160 port which has DHCPv4 options defined to allow the DHCPv4
1161 reply packet and which has DHCPv6 options defined to
1162 allow the DHCPv6 reply packet from the Ingress Table 15:
1163 DHCP responses.
1164
1165 · A priority 34000 logical flow is added for each logical
1166 switch datapath configured with DNS records with the
1167 match udp.dst = 53 to allow the DNS reply packet from the
1168 Ingress Table 17: DNS responses.
1169
1170 · A priority 34000 logical flow is added for each logical
1171 switch datapath with the match eth.src = E to allow the
1172 service monitor request packet generated by ovn-con‐
1173 troller with the action next, where E is the service mon‐
1174 itor mac defined in the options:svc_monitor_mac colum of
1175 NB_Global table.
1176
1177 Egress Table 5:to-lportQoS Marking
1178
1179 This is similar to ingress table QoS marking except they apply to
1180 to-lport QoS rules.
1181
1182 Egress Table 6:to-lportQoS Meter
1183
1184 This is similar to ingress table QoS meter except they apply to
1185 to-lport QoS rules.
1186
1187 Egress Table 7: Stateful
1188
1189 This is similar to ingress table Stateful except that there are no
1190 rules added for load balancing new connections.
1191
1192 Egress Table 8: Egress Port Security - IP
1193
1194 This is similar to the port security logic in table Ingress Port Secu‐
1195 rity - IP except that outport, eth.dst, ip4.dst and ip6.dst are checked
1196 instead of inport, eth.src, ip4.src and ip6.src
1197
1198 Egress Table 9: Egress Port Security - L2
1199
1200 This is similar to the ingress port security logic in ingress table
1201 Admission Control and Ingress Port Security - L2, but with important
1202 differences. Most obviously, outport and eth.dst are checked instead of
1203 inport and eth.src. Second, packets directed to broadcast or multicast
1204 eth.dst are always accepted instead of being subject to the port secu‐
1205 rity rules; this is implemented through a priority-100 flow that
1206 matches on eth.mcast with action output;. Moreover, to ensure that even
1207 broadcast and multicast packets are not delivered to disabled logical
1208 ports, a priority-150 flow for each disabled logical outport overrides
1209 the priority-100 flow with a drop; action. Finally if egress qos has
1210 been enabled on a localnet port, the outgoing queue id is set through
1211 set_queue action. Please remember to mark the corresponding physical
1212 interface with ovn-egress-iface set to true in external_ids
1213
1214 Logical Router Datapaths
1215 Logical router datapaths will only exist for Logical_Router rows in the
1216 OVN_Northbound database that do not have enabled set to false
1217
1218 Ingress Table 0: L2 Admission Control
1219
1220 This table drops packets that the router shouldn’t see at all based on
1221 their Ethernet headers. It contains the following flows:
1222
1223 · Priority-100 flows to drop packets with VLAN tags or mul‐
1224 ticast Ethernet source addresses.
1225
1226 · For each enabled router port P with Ethernet address E, a
1227 priority-50 flow that matches inport == P && (eth.mcast
1228 || eth.dst == E), with action next;.
1229
1230 For the gateway port on a distributed logical router
1231 (where one of the logical router ports specifies a redi‐
1232 rect-chassis), the above flow matching eth.dst == E is
1233 only programmed on the gateway port instance on the redi‐
1234 rect-chassis.
1235
1236 · For each dnat_and_snat NAT rule on a distributed router
1237 that specifies an external Ethernet address E, a prior‐
1238 ity-50 flow that matches inport == GW && eth.dst == E,
1239 where GW is the logical router gateway port, with action
1240 next;.
1241
1242 This flow is only programmed on the gateway port instance
1243 on the chassis where the logical_port specified in the
1244 NAT rule resides.
1245
1246 Other packets are implicitly dropped.
1247
1248 Ingress Table 1: Neighbor lookup
1249
1250 For ARP and IPv6 Neighbor Discovery packets, this table looks into the
1251 MAC_Binding records to determine if OVN needs to learn the mac bind‐
1252 ings. Following flows are added:
1253
1254 · For each router port P that owns IP address A, which
1255 belongs to subnet S with prefix length L, a priority-100
1256 flow is added which matches inport == P && arp.spa == S/L
1257 && arp.op == 1 (ARP request) with the following actions:
1258
1259 reg9[4] = lookup_arp(inport, arp.spa, arp.sha);
1260 next;
1261
1262
1263 If the logical router port P is a distributed gateway
1264 router port, additional match is_chassis_resident(cr-P)
1265 is added so that the resident gateway chassis handles the
1266 neighbor lookup.
1267
1268 · A priority-100 flow which matches on ARP reply packets
1269 and applies the actions:
1270
1271 reg9[4] = lookup_arp(inport, arp.spa, arp.sha);
1272 next;
1273
1274
1275 · A priority-100 flow which matches on IPv6 Neighbor Dis‐
1276 covery advertisement packet and applies the actions:
1277
1278 reg9[4] = lookup_nd(inport, nd.target, nd.tll);
1279 next;
1280
1281
1282 · A priority-100 flow which matches on IPv6 Neighbor Dis‐
1283 covery solicitation packet and applies the actions:
1284
1285 reg9[4] = lookup_nd(inport, ip6.src, nd.sll);
1286 next;
1287
1288
1289 · A priority-0 fallback flow that matches all packets and
1290 applies the action reg9[5] = 1; next; advancing the
1291 packet to the next table.
1292
1293 Ingress Table 2: Neighbor learning
1294
1295 This table adds flows to learn the mac bindings from the ARP and IPv6
1296 Neighbor Solicitation/Advertisement packets if ARP/ND lookup failed in
1297 the previous table.
1298
1299 reg9[4] will be 1 if the lookup_arp/lookup_nd in the previous table was
1300 successful.
1301
1302 reg9[5] will be 1 if there was no need to do the lookup.
1303
1304 · A priority-100 flow with the match reg9[4] == 1 ||
1305 reg9[5] == 1 and advances the packet to the next table as
1306 there is no need to learn the neighbor.
1307
1308 · A priority-90 flow with the match arp and applies the
1309 action put_arp(inport, arp.spa, arp.sha); next;
1310
1311 · A priority-90 flow with the match nd_na and applies the
1312 action put_nd(inport, nd.target, nd.tll); next;
1313
1314 · A priority-90 flow with the match nd_ns and applies the
1315 action put_nd(inport, ip6.src, nd.sll); next;
1316
1317 Ingress Table 3: IP Input
1318
1319 This table is the core of the logical router datapath functionality. It
1320 contains the following flows to implement very basic IP host function‐
1321 ality.
1322
1323 · For each NAT entry of a distributed logical router (with
1324 distributed gateway router port) of type snat, a pri‐
1325 orirty-120 flow with the match inport == P && ip4.src ==
1326 A advances the packet to the next pipeline, where P is
1327 the distributed logical router port and A is the exter‐
1328 nal_ip set in the NAT entry. If A is an IPv6 address,
1329 then ip6.src is used for the match.
1330
1331 The above flow is required to handle the routing of the
1332 East/west NAT traffic.
1333
1334 · L3 admission control: A priority-100 flow drops packets
1335 that match any of the following:
1336
1337 · ip4.src[28..31] == 0xe (multicast source)
1338
1339 · ip4.src == 255.255.255.255 (broadcast source)
1340
1341 · ip4.src == 127.0.0.0/8 || ip4.dst == 127.0.0.0/8
1342 (localhost source or destination)
1343
1344 · ip4.src == 0.0.0.0/8 || ip4.dst == 0.0.0.0/8 (zero
1345 network source or destination)
1346
1347 · ip4.src or ip6.src is any IP address owned by the
1348 router, unless the packet was recirculated due to
1349 egress loopback as indicated by REG‐
1350 BIT_EGRESS_LOOPBACK.
1351
1352 · ip4.src is the broadcast address of any IP network
1353 known to the router.
1354
1355 · A priority-100 flow parses DHCPv6 replies from IPv6 pre‐
1356 fix delegation routers (udp.src == 547 && udp.dst ==
1357 546). The handle_dhcpv6_reply is used to send IPv6 prefix
1358 delegation messages to the delegation router.
1359
1360 · ICMP echo reply. These flows reply to ICMP echo requests
1361 received for the router’s IP address. Let A be an IP
1362 address owned by a router port. Then, for each A that is
1363 an IPv4 address, a priority-90 flow matches on ip4.dst ==
1364 A and icmp4.type == 8 && icmp4.code == 0 (ICMP echo
1365 request). For each A that is an IPv6 address, a prior‐
1366 ity-90 flow matches on ip6.dst == A and icmp6.type == 128
1367 && icmp6.code == 0 (ICMPv6 echo request). The port of the
1368 router that receives the echo request does not matter.
1369 Also, the ip.ttl of the echo request packet is not
1370 checked, so it complies with RFC 1812, section 4.2.2.9.
1371 Flows for ICMPv4 echo requests use the following actions:
1372
1373 ip4.dst <-> ip4.src;
1374 ip.ttl = 255;
1375 icmp4.type = 0;
1376 flags.loopback = 1;
1377 next;
1378
1379
1380 Flows for ICMPv6 echo requests use the following actions:
1381
1382 ip6.dst <-> ip6.src;
1383 ip.ttl = 255;
1384 icmp6.type = 129;
1385 flags.loopback = 1;
1386 next;
1387
1388
1389 · Reply to ARP requests.
1390
1391 These flows reply to ARP requests for the router’s own IP
1392 address. The ARP requests are handled only if the
1393 requestor’s IP belongs to the same subnets of the logical
1394 router port. For each router port P that owns IP address
1395 A, which belongs to subnet S with prefix length L, and
1396 Ethernet address E, a priority-90 flow matches inport ==
1397 P && arp.spa == S/L && arp.op == 1 && arp.tpa == A (ARP
1398 request) with the following actions:
1399
1400 eth.dst = eth.src;
1401 eth.src = E;
1402 arp.op = 2; /* ARP reply. */
1403 arp.tha = arp.sha;
1404 arp.sha = E;
1405 arp.tpa = arp.spa;
1406 arp.spa = A;
1407 outport = P;
1408 flags.loopback = 1;
1409 output;
1410
1411
1412 For the gateway port on a distributed logical router
1413 (where one of the logical router ports specifies a redi‐
1414 rect-chassis), the above flows are only programmed on the
1415 gateway port instance on the redirect-chassis. This
1416 behavior avoids generation of multiple ARP responses from
1417 different chassis, and allows upstream MAC learning to
1418 point to the redirect-chassis.
1419
1420 For the logical router port with the option reside-on-re‐
1421 direct-chassis set (which is centralized), the above
1422 flows are only programmed on the gateway port instance on
1423 the redirect-chassis (if the logical router has a dis‐
1424 tributed gateway port). This behavior avoids generation
1425 of multiple ARP responses from different chassis, and
1426 allows upstream MAC learning to point to the redi‐
1427 rect-chassis.
1428
1429 · Reply to IPv6 Neighbor Solicitations. These flows reply
1430 to Neighbor Solicitation requests for the router’s own
1431 IPv6 address and populate the logical router’s mac bind‐
1432 ing table.
1433
1434 For each router port P that owns IPv6 address A,
1435 solicited node address S, and Ethernet address E, a pri‐
1436 ority-90 flow matches inport == P && nd_ns && ip6.dst ==
1437 {A, E} && nd.target == A with the following actions:
1438
1439 nd_na_router {
1440 eth.src = E;
1441 ip6.src = A;
1442 nd.target = A;
1443 nd.tll = E;
1444 outport = inport;
1445 flags.loopback = 1;
1446 output;
1447 };
1448
1449
1450 For the gateway port on a distributed logical router
1451 (where one of the logical router ports specifies a redi‐
1452 rect-chassis), the above flows replying to IPv6 Neighbor
1453 Solicitations are only programmed on the gateway port
1454 instance on the redirect-chassis. This behavior avoids
1455 generation of multiple replies from different chassis,
1456 and allows upstream MAC learning to point to the redi‐
1457 rect-chassis.
1458
1459 · These flows reply to ARP requests or IPv6 neighbor solic‐
1460 itation for the virtual IP addresses configured in the
1461 router for DNAT or load balancing.
1462
1463 IPv4: For a configured DNAT IP address or a load balancer
1464 IPv4 VIP A, for each router port P with Ethernet address
1465 E, a priority-90 flow matches inport == P && arp.op == 1
1466 && arp.tpa == A (ARP request) with the following actions:
1467
1468 eth.dst = eth.src;
1469 eth.src = E;
1470 arp.op = 2; /* ARP reply. */
1471 arp.tha = arp.sha;
1472 arp.sha = E;
1473 arp.tpa = arp.spa;
1474 arp.spa = A;
1475 outport = P;
1476 flags.loopback = 1;
1477 output;
1478
1479
1480 If the router port P is a distributed gateway router
1481 port, then the is_chassis_resident(P) is also added in
1482 the match condition for the load balancer IPv4 VIP A.
1483
1484 IPv6: For a configured DNAT IP address or a load balancer
1485 IPv6 VIP A, solicited node address S, for each router
1486 port P with Ethernet address E, a priority-90 flow
1487 matches inport == P && nd_ns && ip6.dst == {A, S} &&
1488 nd.target == A with the following actions:
1489
1490 eth.dst = eth.src;
1491 nd_na {
1492 eth.src = E;
1493 nd.tll = E;
1494 ip6.src = A;
1495 nd.target = A;
1496 outport = P;
1497 flags.loopback = 1;
1498 output;
1499 }
1500
1501
1502 If the router port P is a distributed gateway router
1503 port, then the is_chassis_resident(P) is also added in
1504 the match condition for the load balancer IPv6 VIP A.
1505
1506 For the gateway port on a distributed logical router with
1507 NAT (where one of the logical router ports specifies a
1508 redirect-chassis):
1509
1510 · If the corresponding NAT rule cannot be handled in
1511 a distributed manner, then this flow is only pro‐
1512 grammed on the gateway port instance on the redi‐
1513 rect-chassis. This behavior avoids generation of
1514 multiple ARP responses from different chassis, and
1515 allows upstream MAC learning to point to the redi‐
1516 rect-chassis.
1517
1518 · If the corresponding NAT rule can be handled in a
1519 distributed manner, then this flow is only pro‐
1520 grammed on the gateway port instance where the
1521 logical_port specified in the NAT rule resides.
1522
1523 Some of the actions are different for this case,
1524 using the external_mac specified in the NAT rule
1525 rather than the gateway port’s Ethernet address E:
1526
1527 eth.src = external_mac;
1528 arp.sha = external_mac;
1529
1530
1531 or in the case of IPv6 neighbor solicition:
1532
1533 eth.src = external_mac;
1534 nd.tll = external_mac;
1535
1536
1537 This behavior avoids generation of multiple ARP
1538 responses from different chassis, and allows
1539 upstream MAC learning to point to the correct
1540 chassis.
1541
1542 · Priority-85 flows which drops the ARP and IPv6 Neighbor
1543 Discovery packets.
1544
1545 · A priority-84 flow explicitly allows IPv6 multicast traf‐
1546 fic that is supposed to reach the router pipeline (i.e.,
1547 router solicitation and router advertisement packets).
1548
1549 · A priority-83 flow explicitly drops IPv6 multicast traf‐
1550 fic that is destined to reserved multicast groups.
1551
1552 · A priority-82 flow allows IP multicast traffic if
1553 options:mcast_relay=’true’, otherwise drops it.
1554
1555 · UDP port unreachable. Priority-80 flows generate ICMP
1556 port unreachable messages in reply to UDP datagrams
1557 directed to the router’s IP address, except in the spe‐
1558 cial case of gateways, which accept traffic directed to a
1559 router IP for load balancing and NAT purposes.
1560
1561 These flows should not match IP fragments with nonzero
1562 offset.
1563
1564 · TCP reset. Priority-80 flows generate TCP reset messages
1565 in reply to TCP datagrams directed to the router’s IP
1566 address, except in the special case of gateways, which
1567 accept traffic directed to a router IP for load balancing
1568 and NAT purposes.
1569
1570 These flows should not match IP fragments with nonzero
1571 offset.
1572
1573 · Protocol or address unreachable. Priority-70 flows gener‐
1574 ate ICMP protocol or address unreachable messages for
1575 IPv4 and IPv6 respectively in reply to packets directed
1576 to the router’s IP address on IP protocols other than
1577 UDP, TCP, and ICMP, except in the special case of gate‐
1578 ways, which accept traffic directed to a router IP for
1579 load balancing purposes.
1580
1581 These flows should not match IP fragments with nonzero
1582 offset.
1583
1584 · Drop other IP traffic to this router. These flows drop
1585 any other traffic destined to an IP address of this
1586 router that is not already handled by one of the flows
1587 above, which amounts to ICMP (other than echo requests)
1588 and fragments with nonzero offsets. For each IP address A
1589 owned by the router, a priority-60 flow matches ip4.dst
1590 == A or ip6.dst == A and drops the traffic. An exception
1591 is made and the above flow is not added if the router
1592 port’s own IP address is used to SNAT packets passing
1593 through that router.
1594
1595 The flows above handle all of the traffic that might be directed to the
1596 router itself. The following flows (with lower priorities) handle the
1597 remaining traffic, potentially for forwarding:
1598
1599 · Drop Ethernet local broadcast. A priority-50 flow with
1600 match eth.bcast drops traffic destined to the local Eth‐
1601 ernet broadcast address. By definition this traffic
1602 should not be forwarded.
1603
1604 · ICMP time exceeded. For each router port P, whose IP
1605 address is A, a priority-40 flow with match inport == P
1606 && ip.ttl == {0, 1} && !ip.later_frag matches packets
1607 whose TTL has expired, with the following actions to send
1608 an ICMP time exceeded reply for IPv4 and IPv6 respec‐
1609 tively:
1610
1611 icmp4 {
1612 icmp4.type = 11; /* Time exceeded. */
1613 icmp4.code = 0; /* TTL exceeded in transit. */
1614 ip4.dst = ip4.src;
1615 ip4.src = A;
1616 ip.ttl = 255;
1617 next;
1618 };
1619 icmp6 {
1620 icmp6.type = 3; /* Time exceeded. */
1621 icmp6.code = 0; /* TTL exceeded in transit. */
1622 ip6.dst = ip6.src;
1623 ip6.src = A;
1624 ip.ttl = 255;
1625 next;
1626 };
1627
1628
1629 · TTL discard. A priority-30 flow with match ip.ttl == {0,
1630 1} and actions drop; drops other packets whose TTL has
1631 expired, that should not receive a ICMP error reply (i.e.
1632 fragments with nonzero offset).
1633
1634 · Next table. A priority-0 flows match all packets that
1635 aren’t already handled and uses actions next; to feed
1636 them to the next table.
1637
1638 Ingress Table 4: DEFRAG
1639
1640 This is to send packets to connection tracker for tracking and defrag‐
1641 mentation. It contains a priority-0 flow that simply moves traffic to
1642 the next table. If load balancing rules with virtual IP addresses (and
1643 ports) are configured in OVN_Northbound database for a Gateway router,
1644 a priority-100 flow is added for each configured virtual IP address
1645 VIP. For IPv4 VIPs the flow matches ip && ip4.dst == VIP. For IPv6
1646 VIPs, the flow matches ip && ip6.dst == VIP. The flow uses the action
1647 ct_next; to send IP packets to the connection tracker for packet de-
1648 fragmentation and tracking before sending it to the next table.
1649
1650 Ingress Table 5: UNSNAT
1651
1652 This is for already established connections’ reverse traffic. i.e.,
1653 SNAT has already been done in egress pipeline and now the packet has
1654 entered the ingress pipeline as part of a reply. It is unSNATted here.
1655
1656 Ingress Table 5: UNSNAT on Gateway and Distributed Routers
1657
1658 · If the Router (Gateway or Distributed) is configured with
1659 load balancers, then below lflows are added:
1660
1661 For each IPv4 address A defined as load balancer VIP with
1662 the protocol P (and the protocol port T if defined) is
1663 also present as an external_ip in the NAT table, a prior‐
1664 ity-120 logical flow is added with the match ip4 &&
1665 ip4.dst == A && P with the action next; to advance the
1666 packet to the next table. If the load balancer has proto‐
1667 col port B defined, then the match also has P.dst == B.
1668
1669 The above flows are also added for IPv6 load balancers.
1670
1671 Ingress Table 5: UNSNAT on Gateway Routers
1672
1673 · If the Gateway router has been configured to force SNAT
1674 any previously DNATted packets to B, a priority-110 flow
1675 matches ip && ip4.dst == B or ip && ip6.dst == B with an
1676 action ct_snat; .
1677
1678 If the Gateway router has been configured to force SNAT
1679 any previously load-balanced packets to B, a priority-100
1680 flow matches ip && ip4.dst == B or ip && ip6.dst == B
1681 with an action ct_snat; .
1682
1683 For each NAT configuration in the OVN Northbound data‐
1684 base, that asks to change the source IP address of a
1685 packet from A to B, a priority-90 flow matches ip &&
1686 ip4.dst == B or ip && ip6.dst == B with an action
1687 ct_snat; . If the NAT rule is of type dnat_and_snat and
1688 has stateless=true in the options, then the action would
1689 be ip4/6.dst= (B).
1690
1691 A priority-0 logical flow with match 1 has actions next;.
1692
1693 Ingress Table 5: UNSNAT on Distributed Routers
1694
1695 · For each configuration in the OVN Northbound database,
1696 that asks to change the source IP address of a packet
1697 from A to B, a priority-100 flow matches ip && ip4.dst ==
1698 B && inport == GW or ip && ip6.dst == B && inport == GW
1699 where GW is the logical router gateway port, with an
1700 action ct_snat;. If the NAT rule is of type dnat_and_snat
1701 and has stateless=true in the options, then the action
1702 would be ip4/6.dst= (B).
1703
1704 If the NAT rule cannot be handled in a distributed man‐
1705 ner, then the priority-100 flow above is only programmed
1706 on the redirect-chassis.
1707
1708 A priority-0 logical flow with match 1 has actions next;.
1709
1710 Ingress Table 6: DNAT
1711
1712 Packets enter the pipeline with destination IP address that needs to be
1713 DNATted from a virtual IP address to a real IP address. Packets in the
1714 reverse direction needs to be unDNATed.
1715
1716 Ingress Table 6: Load balancing DNAT rules
1717
1718 Following load balancing DNAT flows are added for Gateway router or
1719 Router with gateway port. These flows are programmed only on the redi‐
1720 rect-chassis. These flows do not get programmed for load balancers with
1721 IPv6 VIPs.
1722
1723 · If controller_event has been enabled for all the config‐
1724 ured load balancing rules for a Gateway router or Router
1725 with gateway port in OVN_Northbound database that does
1726 not have configured backends, a priority-130 flow is
1727 added to trigger ovn-controller events whenever the chas‐
1728 sis receives a packet for that particular VIP. If
1729 event-elb meter has been previously created, it will be
1730 associated to the empty_lb logical flow
1731
1732 · For all the configured load balancing rules for a Gateway
1733 router or Router with gateway port in OVN_Northbound
1734 database that includes a L4 port PORT of protocol P and
1735 IPv4 or IPv6 address VIP, a priority-120 flow that
1736 matches on ct.new && ip && ip4.dst == VIP && P && P.dst
1737 == PORT
1738 (ip6.dst == VIP in the IPv6 case) with an action of
1739 ct_lb(args), where args contains comma separated IPv4 or
1740 IPv6 addresses (and optional port numbers) to load bal‐
1741 ance to. If the router is configured to force SNAT any
1742 load-balanced packets, the above action will be replaced
1743 by flags.force_snat_for_lb = 1; ct_lb(args);. If health
1744 check is enabled, then args will only contain those end‐
1745 points whose service monitor status entry in OVN_South‐
1746 bound db is either online or empty.
1747
1748 · For all the configured load balancing rules for a router
1749 in OVN_Northbound database that includes a L4 port PORT
1750 of protocol P and IPv4 or IPv6 address VIP, a prior‐
1751 ity-120 flow that matches on ct.est && ip && ip4.dst ==
1752 VIP && P && P.dst == PORT
1753 (ip6.dst == VIP in the IPv6 case) with an action of
1754 ct_dnat;. If the router is configured to force SNAT any
1755 load-balanced packets, the above action will be replaced
1756 by flags.force_snat_for_lb = 1; ct_dnat;.
1757
1758 · For all the configured load balancing rules for a router
1759 in OVN_Northbound database that includes just an IP
1760 address VIP to match on, a priority-110 flow that matches
1761 on ct.new && ip && ip4.dst == VIP (ip6.dst == VIP in the
1762 IPv6 case) with an action of ct_lb(args), where args con‐
1763 tains comma separated IPv4 or IPv6 addresses. If the
1764 router is configured to force SNAT any load-balanced
1765 packets, the above action will be replaced by
1766 flags.force_snat_for_lb = 1; ct_lb(args);.
1767
1768 · For all the configured load balancing rules for a router
1769 in OVN_Northbound database that includes just an IP
1770 address VIP to match on, a priority-110 flow that matches
1771 on ct.est && ip && ip4.dst == VIP (or ip6.dst == VIP)
1772 with an action of ct_dnat;. If the router is configured
1773 to force SNAT any load-balanced packets, the above action
1774 will be replaced by flags.force_snat_for_lb = 1;
1775 ct_dnat;.
1776
1777 Ingress Table 6: DNAT on Gateway Routers
1778
1779 · For each configuration in the OVN Northbound database,
1780 that asks to change the destination IP address of a
1781 packet from A to B, a priority-100 flow matches ip &&
1782 ip4.dst == A or ip && ip6.dst == A with an action
1783 flags.loopback = 1; ct_dnat(B);. If the Gateway router is
1784 configured to force SNAT any DNATed packet, the above
1785 action will be replaced by flags.force_snat_for_dnat = 1;
1786 flags.loopback = 1; ct_dnat(B);. If the NAT rule is of
1787 type dnat_and_snat and has stateless=true in the options,
1788 then the action would be ip4/6.dst= (B).
1789
1790 · For all IP packets of a Gateway router, a priority-50
1791 flow with an action flags.loopback = 1; ct_dnat;.
1792
1793 · A priority-0 logical flow with match 1 has actions next;.
1794
1795 Ingress Table 6: DNAT on Distributed Routers
1796
1797 On distributed routers, the DNAT table only handles packets with desti‐
1798 nation IP address that needs to be DNATted from a virtual IP address to
1799 a real IP address. The unDNAT processing in the reverse direction is
1800 handled in a separate table in the egress pipeline.
1801
1802 · For each configuration in the OVN Northbound database,
1803 that asks to change the destination IP address of a
1804 packet from A to B, a priority-100 flow matches ip &&
1805 ip4.dst == B && inport == GW, where GW is the logical
1806 router gateway port, with an action ct_dnat(B);. The
1807 match will include ip6.dst == B in the IPv6 case. If the
1808 NAT rule is of type dnat_and_snat and has stateless=true
1809 in the options, then the action would be ip4/6.dst=(B).
1810
1811 If the NAT rule cannot be handled in a distributed man‐
1812 ner, then the priority-100 flow above is only programmed
1813 on the redirect-chassis.
1814
1815 A priority-0 logical flow with match 1 has actions next;.
1816
1817 Ingress Table 7: IPv6 ND RA option processing
1818
1819 · A priority-50 logical flow is added for each logical
1820 router port configured with IPv6 ND RA options which
1821 matches IPv6 ND Router Solicitation packet and applies
1822 the action put_nd_ra_opts and advances the packet to the
1823 next table.
1824
1825 reg0[5] = put_nd_ra_opts(options);next;
1826
1827
1828 For a valid IPv6 ND RS packet, this transforms the packet
1829 into an IPv6 ND RA reply and sets the RA options to the
1830 packet and stores 1 into reg0[5]. For other kinds of
1831 packets, it just stores 0 into reg0[5]. Either way, it
1832 continues to the next table.
1833
1834 · A priority-0 logical flow with match 1 has actions next;.
1835
1836 Ingress Table 8: IPv6 ND RA responder
1837
1838 This table implements IPv6 ND RA responder for the IPv6 ND RA replies
1839 generated by the previous table.
1840
1841 · A priority-50 logical flow is added for each logical
1842 router port configured with IPv6 ND RA options which
1843 matches IPv6 ND RA packets and reg0[5] == 1 and responds
1844 back to the inport after applying these actions. If
1845 reg0[5] is set to 1, it means that the action
1846 put_nd_ra_opts was successful.
1847
1848 eth.dst = eth.src;
1849 eth.src = E;
1850 ip6.dst = ip6.src;
1851 ip6.src = I;
1852 outport = P;
1853 flags.loopback = 1;
1854 output;
1855
1856
1857 where E is the MAC address and I is the IPv6 link local
1858 address of the logical router port.
1859
1860 (This terminates packet processing in ingress pipeline;
1861 the packet does not go to the next ingress table.)
1862
1863 · A priority-0 logical flow with match 1 has actions next;.
1864
1865 Ingress Table 9: IP Routing
1866
1867 A packet that arrives at this table is an IP packet that should be
1868 routed to the address in ip4.dst or ip6.dst. This table implements IP
1869 routing, setting reg0 (or xxreg0 for IPv6) to the next-hop IP address
1870 (leaving ip4.dst or ip6.dst, the packet’s final destination, unchanged)
1871 and advances to the next table for ARP resolution. It also sets reg1
1872 (or xxreg1) to the IP address owned by the selected router port
1873 (ingress table ARP Request will generate an ARP request, if needed,
1874 with reg0 as the target protocol address and reg1 as the source proto‐
1875 col address).
1876
1877 For ECMP routes, i.e. multiple static routes with same policy and pre‐
1878 fix but different nexthops, the above actions are deferred to next ta‐
1879 ble. This table, instead, is responsible for determine the ECMP group
1880 id and select a member id within the group based on 5-tuple hashing. It
1881 stores group id in reg8[0..15] and member id in reg8[16..31].
1882
1883 This table contains the following logical flows:
1884
1885 · Priority-550 flow that drops IPv6 Router Solicita‐
1886 tion/Advertisement packets that were not processed in
1887 previous tables.
1888
1889 · Priority-500 flows that match IP multicast traffic des‐
1890 tined to groups registered on any of the attached
1891 switches and sets outport to the associated multicast
1892 group that will eventually flood the traffic to all
1893 interested attached logical switches. The flows also
1894 decrement TTL.
1895
1896 · Priority-450 flow that matches unregistered IP multicast
1897 traffic and sets outport to the MC_STATIC multicast
1898 group, which ovn-northd populates with the logical ports
1899 that have options :mcast_flood=’true’. If no router ports
1900 are configured to flood multicast traffic the packets are
1901 dropped.
1902
1903 · For distributed logical routers where one of the logical
1904 router ports specifies a redirect-chassis, a priority-400
1905 logical flow for each dnat_and_snat NAT rules configured.
1906 These flows will allow to properly forward traffic to the
1907 external connections if available and avoid sending it
1908 through the tunnel. Assuming the following NAT rule has
1909 been configured:
1910
1911 external_ip = A;
1912 external_mac = B;
1913 logical_ip = C;
1914
1915
1916 the following action will be applied:
1917
1918 ip.ttl--;
1919 reg0 = ip.dst;
1920 reg1 = A;
1921 eth.src = B;
1922 outport = router-port;
1923 next;
1924
1925
1926 · IPv4 routing table. For each route to IPv4 network N with
1927 netmask M, on router port P with IP address A and Ether‐
1928 net address E, a logical flow with match ip4.dst == N/M,
1929 whose priority is 400 + the number of 1-bits in M if the
1930 router port is not a distributed gateway port, else the
1931 priority is the number of 1-bits in M, has the following
1932 actions:
1933
1934 ip.ttl--;
1935 reg8[0..15] = 0;
1936 reg0 = G;
1937 reg1 = A;
1938 eth.src = E;
1939 outport = P;
1940 flags.loopback = 1;
1941 next;
1942
1943
1944 (Ingress table 1 already verified that ip.ttl--; will not
1945 yield a TTL exceeded error.)
1946
1947 If the route has a gateway, G is the gateway IP address.
1948 Instead, if the route is from a configured static route,
1949 G is the next hop IP address. Else it is ip4.dst.
1950
1951 · IPv6 routing table. For each route to IPv6 network N with
1952 netmask M, on router port P with IP address A and Ether‐
1953 net address E, a logical flow with match in CIDR notation
1954 ip6.dst == N/M, whose priority is the integer value of M,
1955 has the following actions:
1956
1957 ip.ttl--;
1958 reg8[0..15] = 0;
1959 xxreg0 = G;
1960 xxreg1 = A;
1961 eth.src = E;
1962 outport = P;
1963 flags.loopback = 1;
1964 next;
1965
1966
1967 (Ingress table 1 already verified that ip.ttl--; will not
1968 yield a TTL exceeded error.)
1969
1970 If the route has a gateway, G is the gateway IP address.
1971 Instead, if the route is from a configured static route,
1972 G is the next hop IP address. Else it is ip6.dst.
1973
1974 If the address A is in the link-local scope, the route
1975 will be limited to sending on the ingress port.
1976
1977 · For ECMP routes, they are grouped by policy and prefix.
1978 An unique id (non-zero) is assigned to each group, and
1979 each member is also assigned an unique id (non-zero)
1980 within each group.
1981
1982 For each IPv4/IPv6 ECMP group with group id GID and mem‐
1983 ber ids MID1, MID2, ..., a logical flow with match in
1984 CIDR notation ip4.dst == N/M, or ip6.dst == N/M, whose
1985 priority is the integer value of M, has the following
1986 actions:
1987
1988 ip.ttl--;
1989 flags.loopback = 1;
1990 reg8[0..15] = GID;
1991 select(reg8[16..31], MID1, MID2, ...);
1992
1993
1994 Ingress Table 10: IP_ROUTING_ECMP
1995
1996 This table implements the second part of IP routing for ECMP routes
1997 following the previous table. If a packet matched a ECMP group in the
1998 previous table, this table matches the group id and member id stored
1999 from the previous table, setting reg0 (or xxreg0 for IPv6) to the next-
2000 hop IP address (leaving ip4.dst or ip6.dst, the packet’s final destina‐
2001 tion, unchanged) and advances to the next table for ARP resolution. It
2002 also sets reg1 (or xxreg1) to the IP address owned by the selected
2003 router port (ingress table ARP Request will generate an ARP request, if
2004 needed, with reg0 as the target protocol address and reg1 as the source
2005 protocol address).
2006
2007 This table contains the following logical flows:
2008
2009 · A priority-150 flow that matches reg8[0..15] == 0 with
2010 action next; directly bypasses packets of non-ECMP
2011 routes.
2012
2013 · For each member with ID MID in each ECMP group with ID
2014 GID, a priority-100 flow with match reg8[0..15] == GID &&
2015 reg8[16..31] == MID has following actions:
2016
2017 [xx]reg0 = G;
2018 [xx]reg1 = A;
2019 eth.src = E;
2020 outport = P;
2021
2022
2023 Ingress Table 12: ARP/ND Resolution
2024
2025 Any packet that reaches this table is an IP packet whose next-hop IPv4
2026 address is in reg0 or IPv6 address is in xxreg0. (ip4.dst or ip6.dst
2027 contains the final destination.) This table resolves the IP address in
2028 reg0 (or xxreg0) into an output port in outport and an Ethernet address
2029 in eth.dst, using the following flows:
2030
2031 · A priority-500 flow that matches IP multicast traffic
2032 that was allowed in the routing pipeline. For this kind
2033 of traffic the outport was already set so the flow just
2034 advances to the next table.
2035
2036 · Static MAC bindings. MAC bindings can be known statically
2037 based on data in the OVN_Northbound database. For router
2038 ports connected to logical switches, MAC bindings can be
2039 known statically from the addresses column in the Logi‐
2040 cal_Switch_Port table. For router ports connected to
2041 other logical routers, MAC bindings can be known stati‐
2042 cally from the mac and networks column in the Logi‐
2043 cal_Router_Port table.
2044
2045 For each IPv4 address A whose host is known to have Eth‐
2046 ernet address E on router port P, a priority-100 flow
2047 with match outport === P && reg0 == A has actions eth.dst
2048 = E; next;.
2049
2050 For each virtual ip A configured on a logical port of
2051 type virtual and its virtual parent set in its corre‐
2052 sponding Port_Binding record and the virtual parent with
2053 the Ethernet address E and the virtual ip is reachable
2054 via the router port P, a priority-100 flow with match
2055 outport === P && reg0 == A has actions eth.dst = E;
2056 next;.
2057
2058 For each virtual ip A configured on a logical port of
2059 type virtual and its virtual parent not set in its corre‐
2060 sponding Port_Binding record and the virtual ip A is
2061 reachable via the router port P, a priority-100 flow with
2062 match outport === P && reg0 == A has actions eth.dst =
2063 00:00:00:00:00:00; next;. This flow is added so that the
2064 ARP is always resolved for the virtual ip A by generating
2065 ARP request and not consulting the MAC_Binding table as
2066 it can have incorrect value for the virtual ip A.
2067
2068 For each IPv6 address A whose host is known to have Eth‐
2069 ernet address E on router port P, a priority-100 flow
2070 with match outport === P && xxreg0 == A has actions
2071 eth.dst = E; next;.
2072
2073 For each logical router port with an IPv4 address A and a
2074 mac address of E that is reachable via a different logi‐
2075 cal router port P, a priority-100 flow with match outport
2076 === P && reg0 == A has actions eth.dst = E; next;.
2077
2078 For each logical router port with an IPv6 address A and a
2079 mac address of E that is reachable via a different logi‐
2080 cal router port P, a priority-100 flow with match outport
2081 === P && xxreg0 == A has actions eth.dst = E; next;.
2082
2083 · Static MAC bindings from NAT entries. MAC bindings can
2084 also be known for the entries in the NAT table. Below
2085 flows are programmed for distributed logical routers i.e
2086 with a distributed router port.
2087
2088 For each row in the NAT table with IPv4 address A in the
2089 external_ip column of NAT table, a priority-100 flow with
2090 the match outport === P && reg0 == A has actions eth.dst
2091 = E; next;, where P is the distributed logical router
2092 port, E is the Ethernet address if set in the exter‐
2093 nal_mac column of NAT table for of type dnat_and_snat,
2094 otherwise the Ethernet address of the distributed logical
2095 router port.
2096
2097 For IPv6 NAT entries, same flows are added, but using the
2098 register xxreg0 for the match.
2099
2100 · Dynamic MAC bindings. These flows resolve MAC-to-IP bind‐
2101 ings that have become known dynamically through ARP or
2102 neighbor discovery. (The ingress table ARP Request will
2103 issue an ARP or neighbor solicitation request for cases
2104 where the binding is not yet known.)
2105
2106 A priority-0 logical flow with match ip4 has actions
2107 get_arp(outport, reg0); next;.
2108
2109 A priority-0 logical flow with match ip6 has actions
2110 get_nd(outport, xxreg0); next;.
2111
2112 · For logical router port with redirect-chassis and redi‐
2113 rect-type being set as bridged, a priority-50 flow will
2114 match outport == "ROUTER_PORT" and !is_chassis_resident
2115 ("cr-ROUTER_PORT") has actions eth.dst = E; next;, where
2116 E is the ethernet address of the logical router port.
2117
2118 Ingress Table 13: Check packet length
2119
2120 For distributed logical routers with distributed gateway port config‐
2121 ured with options:gateway_mtu to a valid integer value, this table adds
2122 a priority-50 logical flow with the match ip4 && outport == GW_PORT
2123 where GW_PORT is the distributed gateway router port and applies the
2124 action check_pkt_larger and advances the packet to the next table.
2125
2126 REGBIT_PKT_LARGER = check_pkt_larger(L); next;
2127
2128
2129 where L is the packet length to check for. If the packet is larger than
2130 L, it stores 1 in the register bit REGBIT_PKT_LARGER. The value of L is
2131 taken from options:gateway_mtu column of Logical_Router_Port row.
2132
2133 This table adds one priority-0 fallback flow that matches all packets
2134 and advances to the next table.
2135
2136 Ingress Table 14: Handle larger packets
2137
2138 For distributed logical routers with distributed gateway port config‐
2139 ured with options:gateway_mtu to a valid integer value, this table adds
2140 the following priority-50 logical flow for each logical router port
2141 with the match ip4 && inport == LRP && outport == GW_PORT && REG‐
2142 BIT_PKT_LARGER, where LRP is the logical router port and GW_PORT is the
2143 distributed gateway router port and applies the following action
2144
2145 icmp4 {
2146 icmp4.type = 3; /* Destination Unreachable. */
2147 icmp4.code = 4; /* Frag Needed and DF was Set. */
2148 icmp4.frag_mtu = M;
2149 eth.dst = E;
2150 ip4.dst = ip4.src;
2151 ip4.src = I;
2152 ip.ttl = 255;
2153 REGBIT_EGRESS_LOOPBACK = 1;
2154 next(pipeline=ingress, table=0);
2155 };
2156
2157
2158 · Where M is the (fragment MTU - 58) whose value is taken
2159 from options:gateway_mtu column of Logical_Router_Port
2160 row.
2161
2162 · E is the Ethernet address of the logical router port.
2163
2164 · I is the IPv4 address of the logical router port.
2165
2166 This table adds one priority-0 fallback flow that matches all packets
2167 and advances to the next table.
2168
2169 Ingress Table 15: Gateway Redirect
2170
2171 For distributed logical routers where one of the logical router ports
2172 specifies a redirect-chassis, this table redirects certain packets to
2173 the distributed gateway port instance on the redirect-chassis. This ta‐
2174 ble has the following flows:
2175
2176 · A priority-150 logical flow with match outport == GW &&
2177 eth.dst == 00:00:00:00:00:00 has actions outport = CR;
2178 next;, where GW is the logical router distributed gateway
2179 port and CR is the chassisredirect port representing the
2180 instance of the logical router distributed gateway port
2181 on the redirect-chassis.
2182
2183 · For each NAT rule in the OVN Northbound database that can
2184 be handled in a distributed manner, a priority-200 logi‐
2185 cal flow with match ip4.src == B && outport == GW, where
2186 GW is the logical router distributed gateway port, with
2187 actions next;.
2188
2189 · A priority-50 logical flow with match outport == GW has
2190 actions outport = CR; next;, where GW is the logical
2191 router distributed gateway port and CR is the chas‐
2192 sisredirect port representing the instance of the logical
2193 router distributed gateway port on the redirect-chassis.
2194
2195 · A priority-0 logical flow with match 1 has actions next;.
2196
2197 Ingress Table 16: ARP Request
2198
2199 In the common case where the Ethernet destination has been resolved,
2200 this table outputs the packet. Otherwise, it composes and sends an ARP
2201 or IPv6 Neighbor Solicitation request. It holds the following flows:
2202
2203 · Unknown MAC address. A priority-100 flow for IPv4 packets
2204 with match eth.dst == 00:00:00:00:00:00 has the following
2205 actions:
2206
2207 arp {
2208 eth.dst = ff:ff:ff:ff:ff:ff;
2209 arp.spa = reg1;
2210 arp.tpa = reg0;
2211 arp.op = 1; /* ARP request. */
2212 output;
2213 };
2214
2215
2216 Unknown MAC address. For each IPv6 static route associ‐
2217 ated with the router with the nexthop IP: G, a prior‐
2218 ity-200 flow for IPv6 packets with match eth.dst ==
2219 00:00:00:00:00:00 && xxreg0 == G with the following
2220 actions is added:
2221
2222 nd_ns {
2223 eth.dst = E;
2224 ip6.dst = I
2225 nd.target = G;
2226 output;
2227 };
2228
2229
2230 Where E is the multicast mac derived from the Gateway IP,
2231 I is the solicited-node multicast address corresponding
2232 to the target address G.
2233
2234 Unknown MAC address. A priority-100 flow for IPv6 packets
2235 with match eth.dst == 00:00:00:00:00:00 has the following
2236 actions:
2237
2238 nd_ns {
2239 nd.target = xxreg0;
2240 output;
2241 };
2242
2243
2244 (Ingress table IP Routing initialized reg1 with the IP
2245 address owned by outport and (xx)reg0 with the next-hop
2246 IP address)
2247
2248 The IP packet that triggers the ARP/IPv6 NS request is
2249 dropped.
2250
2251 · Known MAC address. A priority-0 flow with match 1 has
2252 actions output;.
2253
2254 Egress Table 0: UNDNAT
2255
2256 This is for already established connections’ reverse traffic. i.e.,
2257 DNAT has already been done in ingress pipeline and now the packet has
2258 entered the egress pipeline as part of a reply. For NAT on a distrib‐
2259 uted router, it is unDNATted here. For Gateway routers, the unDNAT pro‐
2260 cessing is carried out in the ingress DNAT table.
2261
2262 · For all the configured load balancing rules for a router
2263 with gateway port in OVN_Northbound database that
2264 includes an IPv4 address VIP, for every backend IPv4
2265 address B defined for the VIP a priority-120 flow is pro‐
2266 grammed on redirect-chassis that matches ip && ip4.src ==
2267 B && outport == GW, where GW is the logical router gate‐
2268 way port with an action ct_dnat;. If the backend IPv4
2269 address B is also configured with L4 port PORT of proto‐
2270 col P, then the match also includes P.src == PORT. These
2271 flows are not added for load balancers with IPv6 VIPs.
2272
2273 If the router is configured to force SNAT any load-bal‐
2274 anced packets, above action will be replaced by
2275 flags.force_snat_for_lb = 1; ct_dnat;.
2276
2277 · For each configuration in the OVN Northbound database
2278 that asks to change the destination IP address of a
2279 packet from an IP address of A to B, a priority-100 flow
2280 matches ip && ip4.src == B && outport == GW, where GW is
2281 the logical router gateway port, with an action ct_dnat;.
2282 If the NAT rule is of type dnat_and_snat and has state‐
2283 less=true in the options, then the action would be
2284 ip4/6.src= (B).
2285
2286 If the NAT rule cannot be handled in a distributed man‐
2287 ner, then the priority-100 flow above is only programmed
2288 on the redirect-chassis.
2289
2290 If the NAT rule can be handled in a distributed manner,
2291 then there is an additional action eth.src = EA;, where
2292 EA is the ethernet address associated with the IP address
2293 A in the NAT rule. This allows upstream MAC learning to
2294 point to the correct chassis.
2295
2296 · A priority-0 logical flow with match 1 has actions next;.
2297
2298 Egress Table 1: SNAT
2299
2300 Packets that are configured to be SNATed get their source IP address
2301 changed based on the configuration in the OVN Northbound database.
2302
2303 · A priority-120 flow to advance the IPv6 Neighbor solici‐
2304 tation packet to next table to skip SNAT. In the case
2305 where ovn-controller injects an IPv6 Neighbor Solicita‐
2306 tion packet (for nd_ns action) we don’t want the packet
2307 to go throught conntrack.
2308
2309 Egress Table 1: SNAT on Gateway Routers
2310
2311 · If the Gateway router in the OVN Northbound database has
2312 been configured to force SNAT a packet (that has been
2313 previously DNATted) to B, a priority-100 flow matches
2314 flags.force_snat_for_dnat == 1 && ip with an action
2315 ct_snat(B);.
2316
2317 If the Gateway router in the OVN Northbound database has
2318 been configured to force SNAT a packet (that has been
2319 previously load-balanced) to B, a priority-100 flow
2320 matches flags.force_snat_for_lb == 1 && ip with an action
2321 ct_snat(B);.
2322
2323 For each configuration in the OVN Northbound database,
2324 that asks to change the source IP address of a packet
2325 from an IP address of A or to change the source IP
2326 address of a packet that belongs to network A to B, a
2327 flow matches ip && ip4.src == A with an action
2328 ct_snat(B);. The priority of the flow is calculated based
2329 on the mask of A, with matches having larger masks get‐
2330 ting higher priorities. If the NAT rule is of type
2331 dnat_and_snat and has stateless=true in the options, then
2332 the action would be ip4/6.src= (B).
2333
2334 A priority-0 logical flow with match 1 has actions next;.
2335
2336 Egress Table 1: SNAT on Distributed Routers
2337
2338 · For each configuration in the OVN Northbound database,
2339 that asks to change the source IP address of a packet
2340 from an IP address of A or to change the source IP
2341 address of a packet that belongs to network A to B, a
2342 flow matches ip && ip4.src == A && outport == GW, where
2343 GW is the logical router gateway port, with an action
2344 ct_snat(B);. The priority of the flow is calculated based
2345 on the mask of A, with matches having larger masks get‐
2346 ting higher priorities. If the NAT rule is of type
2347 dnat_and_snat and has stateless=true in the options, then
2348 the action would be ip4/6.src= (B).
2349
2350 If the NAT rule cannot be handled in a distributed man‐
2351 ner, then the flow above is only programmed on the redi‐
2352 rect-chassis increasing flow priority by 128 in order to
2353 be run first
2354
2355 If the NAT rule can be handled in a distributed manner,
2356 then there is an additional action eth.src = EA;, where
2357 EA is the ethernet address associated with the IP address
2358 A in the NAT rule. This allows upstream MAC learning to
2359 point to the correct chassis.
2360
2361 · A priority-0 logical flow with match 1 has actions next;.
2362
2363 Egress Table 2: Egress Loopback
2364
2365 For distributed logical routers where one of the logical router ports
2366 specifies a redirect-chassis.
2367
2368 While UNDNAT and SNAT processing have already occurred by this point,
2369 this traffic needs to be forced through egress loopback on this dis‐
2370 tributed gateway port instance, in order for UNSNAT and DNAT processing
2371 to be applied, and also for IP routing and ARP resolution after all of
2372 the NAT processing, so that the packet can be forwarded to the destina‐
2373 tion.
2374
2375 This table has the following flows:
2376
2377 · For each NAT rule in the OVN Northbound database on a
2378 distributed router, a priority-100 logical flow with
2379 match ip4.dst == E && outport == GW && is_chassis_resi‐
2380 dent(P), where E is the external IP address specified in
2381 the NAT rule, GW is the logical router distributed gate‐
2382 way port. For dnat_and_snat NAT rule, P is the logical
2383 port specified in the NAT rule. If logical_port column of
2384 NAT table is NOT set, then P is the chassisredirect port
2385 of GW with the following actions:
2386
2387 clone {
2388 ct_clear;
2389 inport = outport;
2390 outport = "";
2391 flags = 0;
2392 flags.loopback = 1;
2393 reg0 = 0;
2394 reg1 = 0;
2395 ...
2396 reg9 = 0;
2397 REGBIT_EGRESS_LOOPBACK = 1;
2398 next(pipeline=ingress, table=0);
2399 };
2400
2401
2402 flags.loopback is set since in_port is unchanged and the
2403 packet may return back to that port after NAT processing.
2404 REGBIT_EGRESS_LOOPBACK is set to indicate that egress
2405 loopback has occurred, in order to skip the source IP
2406 address check against the router address.
2407
2408 · A priority-0 logical flow with match 1 has actions next;.
2409
2410 Egress Table 3: Delivery
2411
2412 Packets that reach this table are ready for delivery. It contains:
2413
2414 · Priority-110 logical flows that match IP multicast pack‐
2415 ets on each enabled logical router port and modify the
2416 Ethernet source address of the packets to the Ethernet
2417 address of the port and then execute action output;.
2418
2419 · Priority-100 logical flows that match packets on each
2420 enabled logical router port, with action output;.
2421
2422
2423
2424OVN 20.03.0 ovn-northd ovn-northd(8)