ovn-northd(8)                     OVN Manual                    ovn-northd(8)



NAME
       ovn-northd - Open Virtual Network central control daemon

SYNOPSIS
       ovn-northd [options]

DESCRIPTION
       ovn-northd is a centralized daemon responsible for translating the
       high-level OVN configuration into logical configuration consumable
       by daemons such as ovn-controller. It translates the logical
       network configuration in terms of conventional network concepts,
       taken from the OVN Northbound Database (see ovn-nb(5)), into
       logical datapath flows in the OVN Southbound Database (see
       ovn-sb(5)) below it.

OPTIONS
       --ovnnb-db=database
              The OVSDB database containing the OVN Northbound Database.
              If the OVN_NB_DB environment variable is set, its value is
              used as the default. Otherwise, the default is
              unix:/ovnnb_db.sock.

       --ovnsb-db=database
              The OVSDB database containing the OVN Southbound Database.
              If the OVN_SB_DB environment variable is set, its value is
              used as the default. Otherwise, the default is
              unix:/ovnsb_db.sock.

       database in the above options must be an OVSDB active or passive
       connection method, as described in ovsdb(7).
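
       For example, to run the daemon against databases served over TCP
       from a central node (a sketch; the address is illustrative, and
       6641 and 6642 are the conventional northbound and southbound
       ports):

              ovn-northd --ovnnb-db=tcp:192.168.0.10:6641 \
                         --ovnsb-db=tcp:192.168.0.10:6642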

   Daemon Options
       --pidfile[=pidfile]
              Causes a file (by default, program.pid) to be created
              indicating the PID of the running process. If the pidfile
              argument is not specified, or if it does not begin with /,
              then it is created in .

              If --pidfile is not specified, no pidfile is created.

       --overwrite-pidfile
              By default, when --pidfile is specified and the specified
              pidfile already exists and is locked by a running process,
              the daemon refuses to start. Specify --overwrite-pidfile
              to cause it to instead overwrite the pidfile.

              When --pidfile is not specified, this option has no
              effect.

       --detach
              Runs this program as a background process. The process
              forks, and in the child it starts a new session, closes
              the standard file descriptors (which has the side effect
              of disabling logging to the console), and changes its
              current directory to the root (unless --no-chdir is
              specified). After the child completes its initialization,
              the parent exits.

       --monitor
              Creates an additional process to monitor this program. If
              it dies due to a signal that indicates a programming error
              (SIGABRT, SIGALRM, SIGBUS, SIGFPE, SIGILL, SIGPIPE,
              SIGSEGV, SIGXCPU, or SIGXFSZ) then the monitor process
              starts a new copy of it. If the daemon dies or exits for
              another reason, the monitor process exits.

              This option is normally used with --detach, but it also
              functions without it.
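
       For example, a typical way to run ovn-northd as a supervised
       background daemon (a sketch; the paths are illustrative):

              ovn-northd --detach --monitor \
                         --pidfile=/var/run/ovn/ovn-northd.pid \
                         --log-file=/var/log/ovn/ovn-northd.log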

       --no-chdir
              By default, when --detach is specified, the daemon changes
              its current working directory to the root directory after
              it detaches. Otherwise, invoking the daemon from a
              carelessly chosen directory would prevent the
              administrator from unmounting the file system that holds
              that directory.

              Specifying --no-chdir suppresses this behavior, preventing
              the daemon from changing its current working directory.
              This may be useful for collecting core files, since it is
              common behavior to write core dumps into the current
              working directory and the root directory is not a good
              directory to use.

              This option has no effect when --detach is not specified.

       --no-self-confinement
              By default this daemon will try to confine itself to work
              with files under well-known directories determined at
              build time. It is better to stick with this default
              behavior and not to use this flag unless some other access
              control mechanism is used to confine the daemon. Note that
              in contrast to other access control implementations that
              are typically enforced from kernel-space (e.g. DAC or
              MAC), self-confinement is imposed from the user-space
              daemon itself and hence should not be considered as a full
              confinement strategy, but instead should be viewed as an
              additional layer of security.

       --user=user:group
              Causes this program to run as a different user specified
              in user:group, thus dropping most of the root privileges.
              Short forms user and :group are also allowed, with the
              current user or group assumed, respectively. Only daemons
              started by the root user accept this argument.

              On Linux, daemons will be granted CAP_IPC_LOCK and
              CAP_NET_BIND_SERVICE before dropping root privileges.
              Daemons that interact with a datapath, such as
              ovs-vswitchd, will be granted three additional
              capabilities, namely CAP_NET_ADMIN, CAP_NET_BROADCAST and
              CAP_NET_RAW. The capability change will apply even if the
              new user is root.

              On Windows, this option is not currently supported. For
              security reasons, specifying this option will cause the
              daemon process not to start.

   Logging Options
       -v[spec]
       --verbose=[spec]
              Sets logging levels. Without any spec, sets the log level
              for every module and destination to dbg. Otherwise, spec
              is a list of words separated by spaces or commas or
              colons, up to one from each category below:

              ·      A valid module name, as displayed by the vlog/list
                     command on ovs-appctl(8), limits the log level
                     change to the specified module.

              ·      syslog, console, or file, to limit the log level
                     change to only the system log, to the console, or
                     to a file, respectively. (If --detach is specified,
                     the daemon closes its standard file descriptors, so
                     logging to the console will have no effect.)

                     On the Windows platform, syslog is accepted as a
                     word and is only useful along with the
                     --syslog-target option (the word has no effect
                     otherwise).

              ·      off, emer, err, warn, info, or dbg, to control the
                     log level. Messages of the given severity or higher
                     will be logged, and messages of lower severity will
                     be filtered out. off filters out all messages. See
                     ovs-appctl(8) for a definition of each log level.

              Case is not significant within spec.

              Regardless of the log levels set for file, logging to a
              file will not take place unless --log-file is also
              specified (see below).

              For compatibility with older versions of OVS, any is
              accepted as a word but has no effect.
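
       For example, to log debug messages to a file while keeping the
       console quiet (a sketch; the log file path is illustrative):

              ovn-northd --log-file=/var/log/ovn/ovn-northd.log \
                         -vfile:dbg -vconsole:off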

       -v
       --verbose
              Sets the maximum logging verbosity level, equivalent to
              --verbose=dbg.

       -vPATTERN:destination:pattern
       --verbose=PATTERN:destination:pattern
              Sets the log pattern for destination to pattern. Refer to
              ovs-appctl(8) for a description of the valid syntax for
              pattern.

       -vFACILITY:facility
       --verbose=FACILITY:facility
              Sets the RFC5424 facility of the log message. facility can
              be one of kern, user, mail, daemon, auth, syslog, lpr,
              news, uucp, clock, ftp, ntp, audit, alert, clock2, local0,
              local1, local2, local3, local4, local5, local6 or local7.
              If this option is not specified, daemon is used as the
              default for the local system syslog and local0 is used
              while sending a message to the target provided via the
              --syslog-target option.

       --log-file[=file]
              Enables logging to a file. If file is specified, then it
              is used as the exact name for the log file. The default
              log file name used if file is omitted is
              /var/log/ovn/program.log.

       --syslog-target=host:port
              Send syslog messages to UDP port on host, in addition to
              the system syslog. The host must be a numerical IP
              address, not a hostname.

       --syslog-method=method
              Specify method as how syslog messages should be sent to
              the syslog daemon. The following forms are supported:

              ·      libc, to use the libc syslog() function. The
                     downside of using this option is that libc adds a
                     fixed prefix to every message before it is actually
                     sent to the syslog daemon over the /dev/log UNIX
                     domain socket.

              ·      unix:file, to use a UNIX domain socket directly. It
                     is possible to specify an arbitrary message format
                     with this option. However, rsyslogd 8.9 and older
                     versions use a hard-coded parser function that
                     limits UNIX domain socket use. If you want to use
                     an arbitrary message format with older rsyslogd
                     versions, then use a UDP socket to the localhost IP
                     address instead.

              ·      udp:ip:port, to use a UDP socket. With this method
                     it is possible to use an arbitrary message format
                     also with older rsyslogd. When sending syslog
                     messages over a UDP socket, extra precautions need
                     to be taken: the syslog daemon needs to be
                     configured to listen on the specified UDP port,
                     accidental iptables rules could interfere with
                     local syslog traffic, and there are some security
                     considerations that apply to UDP sockets but not to
                     UNIX domain sockets.

              ·      null, to discard all messages logged to syslog.

              The default is taken from the OVS_SYSLOG_METHOD
              environment variable; if it is unset, the default is libc.
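
       For example, to forward logs over UDP to a local rsyslogd that
       listens on the default syslog port (a sketch; the address and
       port are illustrative):

              ovn-northd --syslog-method=udp:127.0.0.1:514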

   PKI Options
       PKI configuration is required in order to use SSL for the
       connections to the Northbound and Southbound databases.

       -p privkey.pem
       --private-key=privkey.pem
              Specifies a PEM file containing the private key used as
              identity for outgoing SSL connections.

       -c cert.pem
       --certificate=cert.pem
              Specifies a PEM file containing a certificate that
              certifies the private key specified on -p or --private-key
              to be trustworthy. The certificate must be signed by the
              certificate authority (CA) that the peer in SSL
              connections will use to verify it.

       -C cacert.pem
       --ca-cert=cacert.pem
              Specifies a PEM file containing the CA certificate for
              verifying certificates presented to this program by SSL
              peers. (This may be the same certificate that SSL peers
              use to verify the certificate specified on -c or
              --certificate, or it may be a different one, depending on
              the PKI design in use.)

       -C none
       --ca-cert=none
              Disables verification of certificates presented by SSL
              peers. This introduces a security risk, because it means
              that certificates cannot be verified to be those of known
              trusted hosts.
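
       For example, to connect to SSL-enabled databases (a sketch; the
       address and file paths are illustrative):

              ovn-northd --ovnnb-db=ssl:192.168.0.10:6641 \
                         --ovnsb-db=ssl:192.168.0.10:6642 \
                         -p /etc/ovn/key.pem -c /etc/ovn/cert.pem \
                         -C /etc/ovn/cacert.pem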

   Other Options
       --unixctl=socket
              Sets the name of the control socket on which program
              listens for runtime management commands (see RUNTIME
              MANAGEMENT COMMANDS, below). If socket does not begin with
              /, it is interpreted as relative to . If --unixctl is not
              used at all, the default socket is /program.pid.ctl, where
              pid is the program’s process ID.

              On Windows a local named pipe is used to listen for
              runtime management commands. A file is created at the
              absolute path pointed to by socket or, if --unixctl is not
              used at all, a file is created as program in the
              configured OVS_RUNDIR directory. The file exists just to
              mimic the behavior of a Unix domain socket.

              Specifying none for socket disables the control socket
              feature.

       -h
       --help
              Prints a brief help message to the console.

       -V
       --version
              Prints version information to the console.

RUNTIME MANAGEMENT COMMANDS
       ovs-appctl can send commands to a running ovn-northd process. The
       currently supported commands are described below.

       exit   Causes ovn-northd to gracefully terminate.

       pause  Pauses ovn-northd so that it stops processing any
              Northbound and Southbound database changes. This will also
              instruct ovn-northd to drop any lock on the SB DB.

       resume Resumes ovn-northd so that it processes the Northbound and
              Southbound database contents and generates logical flows.
              This will also instruct ovn-northd to try to acquire the
              lock on the SB DB.

       is-paused
              Returns "true" if ovn-northd is currently paused, "false"
              otherwise.

       status Prints this server’s status. Status will be "active" if
              ovn-northd has acquired the OVSDB lock on the SB DB,
              "standby" if it has not, or "paused" if this instance is
              paused.

       sb-cluster-state-reset
              Reset southbound database cluster status when databases
              are destroyed and rebuilt.

              If all databases in a clustered southbound database are
              removed from disk, then the stored index of all databases
              will be reset to zero. This will cause ovn-northd to be
              unable to read or write to the southbound database,
              because it will always detect the data as stale. In such a
              case, run this command so that ovn-northd will reset its
              local index so that it can interact with the southbound
              database again.

       nb-cluster-state-reset
              Reset northbound database cluster status when databases
              are destroyed and rebuilt.

              This performs the same task as sb-cluster-state-reset
              except for the northbound database client.
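
       For example, to check whether an instance is active and then
       pause it (a sketch; this assumes the default control socket, so
       the target can be given simply by daemon name):

              ovs-appctl -t ovn-northd status
              ovs-appctl -t ovn-northd pause
              ovs-appctl -t ovn-northd is-paused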

ACTIVE-STANDBY AND LOAD BALANCING
       You may run ovn-northd more than once in an OVN deployment. When
       connected to a standalone or clustered DB setup, OVN will
       automatically ensure that only one of them is active at a time.
       If multiple instances of ovn-northd are running and the active
       ovn-northd fails, one of the hot standby instances of ovn-northd
       will automatically take over.

   Active-Standby with multiple OVN DB servers
       You may run multiple OVN DB servers in an OVN deployment with:

              ·      OVN DB servers deployed in active/passive mode with
                     one active and multiple passive ovsdb-servers.

              ·      ovn-northd also deployed on all these nodes, using
                     unix sockets to connect to the local OVN DB
                     servers.

       In such deployments, the ovn-northds on the passive nodes will
       process the DB changes and compute logical flows that will only
       be thrown away, because the passive ovsdb-servers do not allow
       write transactions. This results in unnecessary CPU usage.

       With the help of the runtime management command pause, you can
       pause ovn-northd on these nodes. When a passive node becomes
       master, you can use the runtime management command resume to
       resume the ovn-northd to process the DB changes.
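
       For example, a failover hook on a node that has just become
       master might run (a sketch, assuming the default control socket):

              if [ "$(ovs-appctl -t ovn-northd is-paused)" = "true" ]; then
                  ovs-appctl -t ovn-northd resume
              fi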

LOGICAL FLOW TABLE STRUCTURE
       One of the main purposes of ovn-northd is to populate the
       Logical_Flow table in the OVN_Southbound database. This section
       describes how ovn-northd does this for switch and router logical
       datapaths.
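
       The flows that ovn-northd produces can be inspected in the
       southbound database with ovn-sbctl. For example, to list the
       logical flows of one datapath (sw0 is an illustrative logical
       switch name):

              ovn-sbctl lflow-list sw0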

   Logical Switch Datapaths
       Ingress Table 0: Admission Control and Ingress Port Security - L2

       Ingress table 0 contains these logical flows:

              ·      Priority 100 flows to drop packets with VLAN tags
                     or multicast Ethernet source addresses.

              ·      Priority 50 flows that implement ingress port
                     security for each enabled logical port. For logical
                     ports on which port security is enabled, these
                     match the inport and the valid eth.src address(es)
                     and advance only those packets to the next flow
                     table. For logical ports on which port security is
                     not enabled, these advance all packets that match
                     the inport.

       There are no flows for disabled logical ports because the
       default-drop behavior of logical flow tables causes packets that
       ingress from them to be dropped.

       Ingress Table 1: Ingress Port Security - IP

       Ingress table 1 contains these logical flows:

              ·      For each element in the port security set having
                     one or more IPv4 or IPv6 addresses (or both),

                     ·      Priority 90 flow to allow IPv4 traffic if it
                            has IPv4 addresses which match the inport,
                            valid eth.src and valid ip4.src address(es).

                     ·      Priority 90 flow to allow IPv4 DHCP
                            discovery traffic if it has a valid eth.src.
                            This is necessary since DHCP discovery
                            messages are sent from the unspecified IPv4
                            address (0.0.0.0) because the IPv4 address
                            has not yet been assigned.

                     ·      Priority 90 flow to allow IPv6 traffic if it
                            has IPv6 addresses which match the inport,
                            valid eth.src and valid ip6.src address(es).

                     ·      Priority 90 flow to allow IPv6 DAD
                            (Duplicate Address Detection) traffic if it
                            has a valid eth.src. This is necessary since
                            DAD requires joining a multicast group and
                            sending neighbor solicitations for the newly
                            assigned address. Since no address is yet
                            assigned, these are sent from the
                            unspecified IPv6 address (::).

                     ·      Priority 80 flow to drop IP (both IPv4 and
                            IPv6) traffic which matches the inport and
                            valid eth.src.

              ·      One priority-0 fallback flow that matches all
                     packets and advances to the next table.

       Ingress Table 2: Ingress Port Security - Neighbor discovery

       Ingress table 2 contains these logical flows:

              ·      For each element in the port security set,

                     ·      Priority 90 flow to allow ARP traffic which
                            matches the inport and valid eth.src and
                            arp.sha. If the element has one or more IPv4
                            addresses, then it also matches the valid
                            arp.spa.

                     ·      Priority 90 flow to allow IPv6 Neighbor
                            Solicitation and Advertisement traffic which
                            matches the inport, valid eth.src and
                            nd.sll/nd.tll. If the element has one or
                            more IPv6 addresses, then it also matches
                            the valid nd.target address(es) for Neighbor
                            Advertisement traffic.

                     ·      Priority 80 flow to drop ARP and IPv6
                            Neighbor Solicitation and Advertisement
                            traffic which matches the inport and valid
                            eth.src.

              ·      One priority-0 fallback flow that matches all
                     packets and advances to the next table.

       Ingress Table 3: from-lport Pre-ACLs

       This table prepares flows for possible stateful ACL processing in
       ingress table ACLs. It contains a priority-0 flow that simply
       moves traffic to the next table. If stateful ACLs are used in the
       logical datapath, a priority-100 flow is added that sets a hint
       (with reg0[0] = 1; next;) for table Pre-stateful to send IP
       packets to the connection tracker before eventually advancing to
       ingress table ACLs. If special ports such as router ports or
       localnet ports can’t use ct(), a priority-110 flow is added to
       skip over stateful ACLs. IPv6 Neighbor Discovery and MLD traffic
       also skips stateful ACLs.

       This table also has a priority-110 flow with the match eth.dst ==
       E for all logical switch datapaths to move traffic to the next
       table, where E is the service monitor MAC defined in the
       options:svc_monitor_mac column of the NB_Global table.

       Ingress Table 4: Pre-LB

       This table prepares flows for possible stateful load balancing
       processing in ingress table LB and Stateful. It contains a
       priority-0 flow that simply moves traffic to the next table.
       Moreover it contains a priority-110 flow to move IPv6 Neighbor
       Discovery and MLD traffic to the next table. If load balancing
       rules with virtual IP addresses (and ports) are configured in the
       OVN_Northbound database for a logical switch datapath, a
       priority-100 flow is added with the match ip to match on IP
       packets and sets the action reg0[0] = 1; next; to act as a hint
       for table Pre-stateful to send IP packets to the connection
       tracker for packet de-fragmentation before eventually advancing
       to ingress table LB. If controller_event has been enabled and
       load balancing rules with empty backends have been added in
       OVN_Northbound, a priority-130 flow is added to trigger
       ovn-controller events whenever the chassis receives a packet for
       that particular VIP. If an event-elb meter has been previously
       created, it will be associated with the empty_lb logical flow.

       Prior to OVN 20.09 we were setting the reg0[0] = 1 only if the IP
       destination matches the load balancer VIP. However, this caused
       issues in cases where a logical switch doesn’t have any ACLs with
       allow-related action. To understand the issue, let’s take a TCP
       load balancer 10.0.0.10:80=10.0.0.3:80. If a logical port p1 with
       IP 10.0.0.5 opens a TCP connection with the VIP 10.0.0.10, then
       the packet in the ingress pipeline of p1 is sent to p1’s
       conntrack zone id and the packet is load balanced to the backend
       10.0.0.3. For the reply packet from the backend lport, it is not
       sent to the conntrack of the backend lport’s zone id. This is
       fine as long as the packet is valid. Suppose the backend lport
       sends an invalid TCP packet (like an incorrect sequence number);
       the packet gets delivered to the lport p1 without unDNATing the
       packet to the VIP 10.0.0.10, and this causes the connection to be
       reset by the lport p1’s VIF.

       We can’t fix this issue by adding a logical flow to drop ct.inv
       packets in the egress pipeline since it will drop all other
       connections not destined to the load balancers. To fix this
       issue, we send all the packets to the conntrack in the ingress
       pipeline if a load balancer is configured. We can now add an
       lflow to drop ct.inv packets.

       This table also has a priority-110 flow with the match eth.dst ==
       E for all logical switch datapaths to move traffic to the next
       table, where E is the service monitor MAC defined in the
       options:svc_monitor_mac column of the NB_Global table.

       This table also has a priority-110 flow with the match inport ==
       I for all logical switch datapaths to move traffic to the next
       table, where I is the peer of a logical router port. This flow is
       added to skip the connection tracking of packets which enter from
       the logical router datapath to the logical switch datapath.

       Ingress Table 5: Pre-stateful

       This table prepares flows for all possible stateful processing in
       the next tables. It contains a priority-0 flow that simply moves
       traffic to the next table. A priority-100 flow sends the packets
       to the connection tracker based on a hint provided by the
       previous tables (with a match for reg0[0] == 1) by using the
       ct_next; action.

       Ingress Table 6: from-lport ACL hints

       This table consists of logical flows that set hints (reg0 bits)
       to be used in the next stage, in the ACL processing table, if
       stateful ACLs or load balancers are configured. Multiple hints
       can be set for the same packet. The possible hints are:

              ·      reg0[7]: the packet might match an allow-related
                     ACL and might have to commit the connection to
                     conntrack.

              ·      reg0[8]: the packet might match an allow-related
                     ACL but there will be no need to commit the
                     connection to conntrack because it already exists.

              ·      reg0[9]: the packet might match a drop/reject ACL.

              ·      reg0[10]: the packet might match a drop/reject ACL
                     but the connection was previously allowed so it
                     might have to be committed again with ct_label=1/1.

       The table contains the following flows:

              ·      A priority-7 flow that matches on packets that
                     initiate a new session. This flow sets reg0[7] and
                     reg0[9] and then advances to the next table.

              ·      A priority-6 flow that matches on packets that are
                     in the request direction of an already existing
                     session that has been marked as blocked. This flow
                     sets reg0[7] and reg0[9] and then advances to the
                     next table.

              ·      A priority-5 flow that matches untracked packets.
                     This flow sets reg0[8] and reg0[9] and then
                     advances to the next table.

              ·      A priority-4 flow that matches on packets that are
                     in the request direction of an already existing
                     session that has not been marked as blocked. This
                     flow sets reg0[8] and reg0[10] and then advances to
                     the next table.

              ·      A priority-3 flow that matches on packets that are
                     not part of established sessions. This flow sets
                     reg0[9] and then advances to the next table.

              ·      A priority-2 flow that matches on packets that are
                     part of an established session that has been marked
                     as blocked. This flow sets reg0[9] and then
                     advances to the next table.

              ·      A priority-1 flow that matches on packets that are
                     part of an established session that has not been
                     marked as blocked. This flow sets reg0[10] and then
                     advances to the next table.

              ·      A priority-0 flow to advance to the next table.

       Ingress Table 7: from-lport ACLs

       Logical flows in this table closely reproduce those in the ACL
       table in the OVN_Northbound database for the from-lport
       direction. The priority values from the ACL table have a limited
       range and have 1000 added to them to leave room for OVN default
       flows at both higher and lower priorities.

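       The ACLs themselves come from the northbound database and are
       usually managed with ovn-nbctl. For example, a minimal sketch of
       a stateful ACL that allows HTTP traffic (the switch name, source
       address and priority are illustrative):

              ovn-nbctl acl-add sw0 from-lport 1002 \
                  'ip4.src == 10.0.0.5 && tcp.dst == 80' allow-related
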
              ·      allow ACLs translate into logical flows with the
                     next; action. If there are any stateful ACLs on
                     this datapath, then allow ACLs translate to
                     ct_commit; next; (which acts as a hint for the next
                     tables to commit the connection to conntrack).

              ·      allow-related ACLs translate into logical flows
                     with the ct_commit(ct_label=0/1); next; actions for
                     new connections and reg0[1] = 1; next; for existing
                     connections.

              ·      reject ACLs translate into logical flows with the
                     tcp_reset { output <-> inport;
                     next(pipeline=egress,table=5);} action for TCP
                     connections and icmp4/icmp6 action for UDP
                     connections.

              ·      Other ACLs translate to drop; for new or untracked
                     connections and ct_commit(ct_label=1/1); for known
                     connections. Setting ct_label marks a connection as
                     one that was previously allowed, but should no
                     longer be allowed due to a policy change.

       This table also contains a priority 0 flow with action next;, so
       that ACLs allow packets by default. If the logical datapath has a
       stateful ACL or a load balancer with a VIP configured, the
       following flows will also be added:

              ·      A priority-1 flow that sets the hint to commit IP
                     traffic to the connection tracker (with action
                     reg0[1] = 1; next;). This is needed for the default
                     allow policy because, while the initiator’s
                     direction may not have any stateful rules, the
                     server’s may and then its return traffic would not
                     be known and marked as invalid.

              ·      A priority-65535 flow that allows any traffic in
                     the reply direction for a connection that has been
                     committed to the connection tracker (i.e.,
                     established flows), as long as the committed flow
                     does not have ct_label.blocked set. We only handle
                     traffic in the reply direction here because we want
                     all packets going in the request direction to still
                     go through the flows that implement the currently
                     defined policy based on ACLs. If a connection is no
                     longer allowed by policy, ct_label.blocked will get
                     set and packets in the reply direction will no
                     longer be allowed, either.

              ·      A priority-65535 flow that allows any traffic that
                     is considered related to a committed flow in the
                     connection tracker (e.g., an ICMP Port Unreachable
                     from a non-listening UDP port), as long as the
                     committed flow does not have ct_label.blocked set.

              ·      A priority-65535 flow that drops all traffic marked
                     by the connection tracker as invalid.

              ·      A priority-65535 flow that drops all traffic in the
                     reply direction with ct_label.blocked set, meaning
                     that the connection should no longer be allowed due
                     to a policy change. Packets in the request
                     direction are skipped here to let a newly created
                     ACL re-allow this connection.

              ·      A priority-65535 flow that allows IPv6 Neighbor
                     solicitation, Neighbor advertisement, Router
                     solicitation, Router advertisement and MLD packets.

              ·      A priority 34000 logical flow is added for each
                     logical switch datapath with the match eth.dst == E
                     to allow the service monitor reply packet destined
                     to ovn-controller with the action next, where E is
                     the service monitor MAC defined in the
                     options:svc_monitor_mac column of the NB_Global
                     table.

       Ingress Table 8: from-lport QoS Marking

       Logical flows in this table closely reproduce those in the QoS
       table with the action column set in the OVN_Northbound database
       for the from-lport direction.

              ·      For every qos_rules entry in a logical switch with
                     DSCP marking enabled, a flow will be added at the
                     priority mentioned in the QoS table.

              ·      One priority-0 fallback flow that matches all
                     packets and advances to the next table.

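       For example, a sketch of a DSCP marking rule added with ovn-nbctl
       (the switch name, port name and DSCP value are illustrative):

              ovn-nbctl qos-add sw0 from-lport 100 \
                  'inport == "lp1" && ip4' dscp=12
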
       Ingress Table 9: from-lport QoS Meter

       Logical flows in this table closely reproduce those in the QoS
       table with the bandwidth column set in the OVN_Northbound
       database for the from-lport direction.

              ·      For every qos_rules entry in a logical switch with
                     metering enabled, a flow will be added at the
                     priority mentioned in the QoS table.

              ·      One priority-0 fallback flow that matches all
                     packets and advances to the next table.

       Ingress Table 10: LB

       It contains a priority-0 flow that simply moves traffic to the
       next table.

       A priority-65535 flow with the match inport == I is added for all
       logical switch datapaths to move traffic to the next table, where
       I is the peer of a logical router port. This flow is added to
       skip the connection tracking of packets which enter from the
       logical router datapath to the logical switch datapath.

       For established connections a priority 65534 flow matches on
       ct.est && !ct.rel && !ct.new && !ct.inv and sets an action
       reg0[2] = 1; next; to act as a hint for table Stateful to send
       packets through the connection tracker to NAT the packets. (The
       packet will automatically get DNATed to the same IP address as
       the first packet in that connection.)

       Ingress Table 11: Stateful

              ·      For all the configured load balancing rules for a
                     switch in the OVN_Northbound database that include
                     an L4 port PORT of protocol P and IP address VIP, a
                     priority-120 flow is added. For IPv4 VIPs, the flow
                     matches ct.new && ip && ip4.dst == VIP && P &&
                     P.dst == PORT. For IPv6 VIPs, the flow matches
                     ct.new && ip && ip6.dst == VIP && P && P.dst ==
                     PORT. The flow’s action is ct_lb(args), where args
                     contains comma separated IP addresses (and optional
                     port numbers) to load balance to. The address
                     family of the IP addresses of args is the same as
                     the address family of VIP. If a health check is
                     enabled, then args will only contain those
                     endpoints whose service monitor status entry in the
                     OVN_Southbound db is either online or empty. (See
                     the configuration example after this list.)

              ·      For all the configured load balancing rules for a
                     switch in the OVN_Northbound database that include
                     just an IP address VIP to match on, OVN adds a
                     priority-110 flow. For IPv4 VIPs, the flow matches
                     ct.new && ip && ip4.dst == VIP. For IPv6 VIPs, the
                     flow matches ct.new && ip && ip6.dst == VIP. The
                     action on this flow is ct_lb(args), where args
                     contains comma separated IP addresses of the same
                     address family as VIP.

              ·      If the load balancer is created with the --reject
                     option and it has no active backends, a TCP reset
                     segment (for tcp) or an ICMP port unreachable
                     packet (for all other kinds of traffic) will be
                     sent whenever an incoming packet is received for
                     this load-balancer. Please note that using the
                     --reject option will disable the empty_lb SB
                     controller event for this load balancer.

              ·      A priority-100 flow commits packets to the
                     connection tracker using the ct_commit; next;
                     action based on a hint provided by the previous
                     tables (with a match for reg0[1] == 1).

              ·      A priority-100 flow sends the packets to the
                     connection tracker using ct_lb; as the action based
                     on a hint provided by the previous tables (with a
                     match for reg0[2] == 1).

              ·      A priority-0 flow that simply moves traffic to the
                     next table.

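       The load balancers that produce these flows are configured in the
       northbound database, typically with ovn-nbctl. For example, a
       sketch that creates a TCP load balancer and attaches it to a
       switch (the names and addresses are illustrative):

              ovn-nbctl lb-add lb0 10.0.0.10:80 10.0.0.3:80,10.0.0.4:80 tcp
              ovn-nbctl ls-lb-add sw0 lb0
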
       Ingress Table 12: Pre-Hairpin

              ·      If the logical switch has load balancer(s)
                     configured, then a priority-100 flow is added with
                     the match ip && ct.trk && ct.dnat to check if the
                     packet needs to be hairpinned (if after load
                     balancing the destination IP matches the source IP)
                     or not by executing the action reg0[6] =
                     chk_lb_hairpin(); and advances the packet to the
                     next table.

              ·      If the logical switch has load balancer(s)
                     configured, then a priority-90 flow is added with
                     the match ip to check if the packet is a reply for
                     a hairpinned connection or not by executing the
                     action reg0[6] = chk_lb_hairpin_reply(); and
                     advances the packet to the next table.

              ·      A priority-0 flow that simply moves traffic to the
                     next table.

       Ingress Table 13: Nat-Hairpin

              ·      If the logical switch has load balancer(s)
                     configured, then a priority-100 flow is added with
                     the match ip && (ct.new || ct.est) && ct.trk &&
                     ct.dnat && reg0[6] == 1 which hairpins the traffic
                     by NATting the source IP to the load balancer VIP
                     by executing the action ct_snat_to_vip and advances
                     the packet to the next table.

              ·      If the logical switch has load balancer(s)
                     configured, then a priority-90 flow is added with
                     the match ip && reg0[6] == 1 which matches on the
                     replies of hairpinned traffic (i.e., the
                     destination IP is the VIP, the source IP is the
                     backend IP and the source L4 port is the backend
                     port for L4 load balancers) and executes ct_snat
                     and advances the packet to the next table.

              ·      A priority-0 flow that simply moves traffic to the
                     next table.

       Ingress Table 14: Hairpin

              ·      A priority-1 flow that hairpins traffic matched by
                     non-default flows in the Pre-Hairpin table.
                     Hairpinning is done at L2: Ethernet addresses are
                     swapped and the packets are looped back on the
                     input port.

              ·      A priority-0 flow that simply moves traffic to the
                     next table.

       Ingress Table 15: ARP/ND responder

       This table implements ARP/ND responder in a logical switch for
       known IPs. The advantage of the ARP responder flow is to limit
       ARP broadcasts by locally responding to ARP requests without the
       need to send to other hypervisors. One common case is when the
       inport is a logical port associated with a VIF and the broadcast
       is responded to on the local hypervisor rather than being
       broadcast across the whole network and responded to by the
       destination VM. This behavior is proxy ARP.

       ARP requests arrive from VMs from a logical switch inport of type
       default. For this case, the logical switch proxy ARP rules can be
       for other VMs or logical router ports. Logical switch proxy ARP
       rules may be programmed both for mac binding of IP addresses on
       other logical switch VIF ports (which are of the default logical
       switch port type, representing connectivity to VMs or
       containers), and for mac binding of IP addresses on logical
       switch router type ports, representing their logical router port
       peers. In order to support proxy ARP for logical router ports, an
       IP address must be configured on the logical switch router type
       port, with the same value as the peer logical router port. The
       configured MAC addresses must match as well. When a VM sends an
       ARP request for a distributed logical router port and if the peer
       router type port of the attached logical switch does not have an
       IP address configured, the ARP request will be broadcast on the
       logical switch. One of the copies of the ARP request will go
       through the logical switch router type port to the logical router
       datapath, where the logical router ARP responder will generate a
       reply. The MAC binding of a distributed logical router, once
       learned by an associated VM, is used for all that VM’s
       communication needing routing. Hence, the action of a VM
       re-arping for the mac binding of the logical router port should
       be rare.

       Logical switch ARP responder proxy ARP rules can also be hit when
       receiving ARP requests externally on a L2 gateway port. In this
       case, the hypervisor acting as an L2 gateway responds to the ARP
       request on behalf of a destination VM.

       Note that ARP requests received from localnet or vtep logical
       inports can either go directly to VMs, in which case the VM
       responds, or can hit an ARP responder for a logical router port
       if the packet is used to resolve a logical router port next hop
       address. In either case, logical switch ARP responder rules will
       not be hit. It contains these logical flows:

              ·      Priority-100 flows to skip the ARP responder if
                     inport is of type localnet or vtep and advance
                     directly to the next table. ARP requests sent to
                     localnet or vtep ports can be received by multiple
                     hypervisors. Now, because the same mac binding
                     rules are downloaded to all hypervisors, each of
                     the multiple hypervisors will respond. This will
                     confuse L2 learning on the source of the ARP
                     requests. ARP requests received on an inport of
                     type router are not expected to hit any logical
                     switch ARP responder flows. However, no skip flows
                     are installed for these packets, as there would be
                     some additional flow cost for this and the value
                     appears limited.

              ·      If inport V is of type virtual, a priority-100
                     logical flow is added for each P configured in the
                     options:virtual-parents column with the match

                            inport == P && ((arp.op == 1 && arp.spa == VIP && arp.tpa == VIP) || (arp.op == 2 && arp.spa == VIP))

                     and applies the action

                            bind_vport(V, inport);

                     and advances the packet to the next table, where
                     VIP is the virtual IP configured in the column
                     options:virtual-ip.

              ·      Priority-50 flows that match ARP requests to each
                     known IP address A of every logical switch port,
                     and respond with ARP replies directly with the
                     corresponding Ethernet address E:

                            eth.dst = eth.src;
                            eth.src = E;
                            arp.op = 2; /* ARP reply. */
                            arp.tha = arp.sha;
                            arp.sha = E;
                            arp.tpa = arp.spa;
                            arp.spa = A;
                            outport = inport;
                            flags.loopback = 1;
                            output;

                     These flows are omitted for logical ports (other
                     than router ports or localport ports) that are down
                     (unless ignore_lsp_down is configured as true in
                     the options column of the NB_Global table of the
                     Northbound database), for logical ports of type
                     virtual and for logical ports with ’unknown’
                     address set.

              ·      Priority-50 flows that match IPv6 ND neighbor
                     solicitations to each known IP address A (and A’s
                     solicited node address) of every logical switch
                     port except of type router, and respond with
                     neighbor advertisements directly with the
                     corresponding Ethernet address E:

                            nd_na {
                                eth.src = E;
                                ip6.src = A;
                                nd.target = A;
                                nd.tll = E;
                                outport = inport;
                                flags.loopback = 1;
                                output;
                            };

                     Priority-50 flows that match IPv6 ND neighbor
                     solicitations to each known IP address A (and A’s
                     solicited node address) of logical switch ports of
                     type router, and respond with neighbor
                     advertisements directly with the corresponding
                     Ethernet address E:

                            nd_na_router {
                                eth.src = E;
                                ip6.src = A;
                                nd.target = A;
                                nd.tll = E;
                                outport = inport;
                                flags.loopback = 1;
                                output;
                            };

                     These flows are omitted for logical ports (other
                     than router ports or localport ports) that are down
                     (unless ignore_lsp_down is configured as true in
                     the options column of the NB_Global table of the
                     Northbound database), for logical ports of type
                     virtual and for logical ports with ’unknown’
                     address set.

              ·      Priority-100 flows with match criteria like the ARP
                     and ND flows above, except that they only match
                     packets from the inport that owns the IP addresses
                     in question, with action next;. These flows prevent
                     OVN from replying to, for example, an ARP request
                     emitted by a VM for its own IP address. A VM only
                     makes this kind of request to attempt to detect a
                     duplicate IP address assignment, so sending a reply
                     will prevent the VM from accepting the IP address
                     that it owns.

                     In place of next;, it would be reasonable to use
                     drop; for the flows’ actions. If everything is
                     working as it is configured, then this would
                     produce equivalent results, since no host should
                     reply to the request. But ARPing for one’s own IP
                     address is intended to detect situations where the
                     network is not working as configured, so dropping
                     the request would frustrate that intent.

              ·      For each SVC_MON_SRC_IP defined in the value of the
                     ip_port_mappings:ENDPOINT_IP column of the
                     Load_Balancer table, a priority-110 logical flow is
                     added with the match arp.tpa == SVC_MON_SRC_IP &&
                     arp.op == 1 and applies the action

                            eth.dst = eth.src;
                            eth.src = E;
                            arp.op = 2; /* ARP reply. */
                            arp.tha = arp.sha;
                            arp.sha = E;
                            arp.tpa = arp.spa;
                            arp.spa = A;
                            outport = inport;
                            flags.loopback = 1;
                            output;

                     where E is the service monitor source mac defined
                     in the options:svc_monitor_mac column in the
                     NB_Global table. This mac is used as the source mac
                     in the service monitor packets for the load
                     balancer endpoint IP health checks.

                     SVC_MON_SRC_IP is used as the source ip in the
                     service monitor IPv4 packets for the load balancer
                     endpoint IP health checks.

                     These flows are required if an ARP request is sent
                     for the IP SVC_MON_SRC_IP.

              ·      For each VIP configured in the table
                     Forwarding_Group a priority-50 logical flow is
                     added with the match arp.tpa == vip && arp.op == 1
                     and applies the action

                            eth.dst = eth.src;
                            eth.src = E;
                            arp.op = 2; /* ARP reply. */
                            arp.tha = arp.sha;
                            arp.sha = E;
                            arp.tpa = arp.spa;
                            arp.spa = A;
                            outport = inport;
                            flags.loopback = 1;
                            output;

                     where E is the forwarding group’s mac defined in
                     the vmac.

                     A is used as either the destination ip for load
                     balancing traffic to child ports or as nexthop to
                     hosts behind the child ports.

                     These flows are required to respond to an ARP
                     request if an ARP request is sent for the IP vip.

              ·      One priority-0 fallback flow that matches all
                     packets and advances to the next table.

       Ingress Table 16: DHCP option processing

       This table adds the DHCPv4 options to a DHCPv4 packet from the
       logical ports configured with IPv4 address(es) and DHCPv4
       options, and similarly for DHCPv6 options. This table also adds
       flows for the logical ports of type external.

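       The options themselves come from the northbound DHCP_Options
       table. For example, a sketch of configuring DHCPv4 options with
       ovn-nbctl (the addresses and port name are illustrative, and UUID
       stands for the id that dhcp-options-create returns):

              ovn-nbctl dhcp-options-create 10.0.0.0/24
              ovn-nbctl dhcp-options-set-options UUID \
                  lease_time=3600 router=10.0.0.1 \
                  server_id=10.0.0.1 server_mac=c0:ff:ee:00:00:01
              ovn-nbctl lsp-set-dhcpv4-options lp1 UUID
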
              ·      A priority-100 logical flow is added for these
                     logical ports which matches the IPv4 packet with
                     udp.src == 68 and udp.dst == 67 and applies the
                     action put_dhcp_opts and advances the packet to the
                     next table.

                            reg0[3] = put_dhcp_opts(offer_ip = ip, options...);
                            next;

                     For DHCPDISCOVER and DHCPREQUEST, this transforms
                     the packet into a DHCP reply, adds the DHCP offer
                     IP ip and options to the packet, and stores 1 into
                     reg0[3]. For other kinds of packets, it just stores
                     0 into reg0[3]. Either way, it continues to the
                     next table.

              ·      A priority-100 logical flow is added for these
                     logical ports which matches the IPv6 packet with
                     udp.src == 546 and udp.dst == 547 and applies the
                     action put_dhcpv6_opts and advances the packet to
                     the next table.

                            reg0[3] = put_dhcpv6_opts(ia_addr = ip, options...);
                            next;

                     For DHCPv6 Solicit/Request/Confirm packets, this
                     transforms the packet into a DHCPv6
                     Advertise/Reply, adds the DHCPv6 offer IP ip and
                     options to the packet, and stores 1 into reg0[3].
                     For other kinds of packets, it just stores 0 into
                     reg0[3]. Either way, it continues to the next
                     table.

              ·      A priority-0 flow that matches all packets and
                     advances to table 17.

       Ingress Table 17: DHCP responses

       This table implements DHCP responder for the DHCP replies
       generated by the previous table.

              ·      A priority 100 logical flow is added for the
                     logical ports configured with DHCPv4 options which
                     matches IPv4 packets with udp.src == 68 && udp.dst
                     == 67 && reg0[3] == 1 and responds back to the
                     inport after applying these actions. If reg0[3] is
                     set to 1, it means that the action put_dhcp_opts
                     was successful.

                            eth.dst = eth.src;
                            eth.src = E;
                            ip4.src = S;
                            udp.src = 67;
                            udp.dst = 68;
                            outport = P;
                            flags.loopback = 1;
                            output;

                     where E is the server MAC address and S is the
                     server IPv4 address defined in the DHCPv4 options.
                     Note that the ip4.dst field is handled by
                     put_dhcp_opts.

                     (This terminates ingress packet processing; the
                     packet does not go to the next ingress table.)

              ·      A priority 100 logical flow is added for the
                     logical ports configured with DHCPv6 options which
                     matches IPv6 packets with udp.src == 546 && udp.dst
                     == 547 && reg0[3] == 1 and responds back to the
                     inport after applying these actions. If reg0[3] is
                     set to 1, it means that the action put_dhcpv6_opts
                     was successful.

                            eth.dst = eth.src;
                            eth.src = E;
                            ip6.dst = A;
                            ip6.src = S;
                            udp.src = 547;
                            udp.dst = 546;
                            outport = P;
                            flags.loopback = 1;
                            output;

                     where E is the server MAC address and S is the
                     server IPv6 LLA address generated from the
                     server_id defined in the DHCPv6 options and A is
                     the IPv6 address defined in the logical port’s
                     addresses column.

                     (This terminates packet processing; the packet does
                     not go to the next ingress table.)

              ·      A priority-0 flow that matches all packets and
                     advances to table 18.

       Ingress Table 18: DNS Lookup

       This table looks up and resolves the DNS names to the
       corresponding configured IP address(es).

              ·      A priority-100 logical flow for each logical switch
                     datapath if it is configured with DNS records,
                     which matches the IPv4 and IPv6 packets with
                     udp.dst == 53 and applies the action dns_lookup and
                     advances the packet to the next table.

                            reg0[4] = dns_lookup(); next;

                     For valid DNS packets, this transforms the packet
                     into a DNS reply if the DNS name can be resolved,
                     and stores 1 into reg0[4]. For failed DNS
                     resolution or other kinds of packets, it just
                     stores 0 into reg0[4]. Either way, it continues to
                     the next table.

       Ingress Table 19: DNS Responses

       This table implements DNS responder for the DNS replies generated
       by the previous table.

              ·      A priority-100 logical flow for each logical switch
                     datapath if it is configured with DNS records,
                     which matches the IPv4 and IPv6 packets with
                     udp.dst == 53 && reg0[4] == 1 and responds back to
                     the inport after applying these actions. If reg0[4]
                     is set to 1, it means that the action dns_lookup
                     was successful.

                            eth.dst <-> eth.src;
                            ip4.src <-> ip4.dst;
                            udp.dst = udp.src;
                            udp.src = 53;
                            outport = P;
                            flags.loopback = 1;
                            output;

                     (This terminates ingress packet processing; the
                     packet does not go to the next ingress table.)

       Ingress Table 20: External ports

       Traffic from the external logical ports enters the ingress
       datapath pipeline via the localnet port. This table adds the
       below logical flows to handle the traffic from these ports.

              ·      A priority-100 flow is added for each external
                     logical port which doesn’t reside on a chassis to
                     drop the ARP/IPv6 NS request to the router IP(s)
                     (of the logical switch) which matches on the inport
                     of the external logical port and the valid eth.src
                     address(es) of the external logical port.

                     This flow guarantees that the ARP/NS request to the
                     router IP address from the external ports is
                     responded to only by the chassis which has claimed
                     these external ports. All the other chassis drop
                     these packets.

              ·      A priority-100 flow is added for each external
                     logical port which doesn’t reside on a chassis to
                     drop any packet destined to the router mac, with
                     the match inport == external && eth.src == E &&
                     eth.dst == R && !is_chassis_resident("external"),
                     where E is the external port mac and R is the
                     router port mac.

              ·      A priority-0 flow that matches all packets and
                     advances to table 21.

       Ingress Table 21: Destination Lookup

       This table implements switching behavior. It contains these
       logical flows:

              ·      A priority-110 flow with the match eth.src == E for
                     all logical switch datapaths that applies the
                     action handle_svc_check(inport), where E is the
                     service monitor MAC defined in the
                     options:svc_monitor_mac column of the NB_Global
                     table.

              ·      A priority-100 flow that punts all IGMP/MLD packets
                     to ovn-controller if multicast snooping is enabled
                     on the logical switch. The flow also forwards the
                     IGMP/MLD packets to the MC_MROUTER_STATIC multicast
                     group, which ovn-northd populates with all the
                     logical ports that have
                     options:mcast_flood_reports=’true’.

              ·      Priority-90 flows that forward registered IP
                     multicast traffic to their corresponding multicast
                     group, which ovn-northd creates based on learnt
                     IGMP_Group entries. The flows also forward packets
                     to the MC_MROUTER_FLOOD multicast group, which
                     ovn-northd populates with all the logical ports
                     that are connected to logical routers with
                     options:mcast_relay=’true’.

              ·      A priority-85 flow that forwards all IP multicast
                     traffic destined to 224.0.0.X to the MC_FLOOD
                     multicast group, which ovn-northd populates with
                     all enabled logical ports.

              ·      A priority-85 flow that forwards all IP multicast
                     traffic destined to reserved multicast IPv6
                     addresses (RFC 4291, 2.7.1, e.g., Solicited-Node
                     multicast) to the MC_FLOOD multicast group, which
                     ovn-northd populates with all enabled logical
                     ports.

              ·      A priority-80 flow that forwards all unregistered
                     IP multicast traffic to the MC_STATIC multicast
                     group, which ovn-northd populates with all the
                     logical ports that have options:mcast_flood=’true’.
                     The flow also forwards unregistered IP multicast
                     traffic to the MC_MROUTER_FLOOD multicast group,
                     which ovn-northd populates with all the logical
                     ports connected to logical routers that have
                     options:mcast_relay=’true’.

              ·      A priority-80 flow that drops all unregistered IP
                     multicast traffic if
                     other_config:mcast_snoop=’true’ and
                     other_config:mcast_flood_unregistered=’false’ and
                     the switch is not connected to a logical router
                     that has options:mcast_relay=’true’ and the switch
                     doesn’t have any logical port with
                     options:mcast_flood=’true’.

              ·      Priority-80 flows for each IP address/VIP/NAT
                     address owned by a router port connected to the
                     switch. These flows match ARP requests and ND
                     packets for the specific IP addresses. Matched
                     packets are forwarded only to the router that owns
                     the IP address and to the MC_FLOOD_L2 multicast
                     group which contains all non-router logical ports.

              ·      Priority-75 flows for each port connected to a
                     logical router matching self originated ARP
                     request/ND packets. These packets are flooded to
                     the MC_FLOOD_L2 multicast group which contains all
                     non-router logical ports.

              ·      A priority-70 flow that outputs all packets with an
                     Ethernet broadcast or multicast eth.dst to the
                     MC_FLOOD multicast group.

              ·      One priority-50 flow that matches each known
                     Ethernet address against eth.dst and outputs the
                     packet to the single associated output port.

                     For the Ethernet address on a logical switch port
                     of type router, when that logical switch port’s
                     addresses column is set to router and the connected
                     logical router port has a gateway chassis:

                     ·      The flow for the connected logical router
                            port’s Ethernet address is only programmed
                            on the gateway chassis.

                     ·      If the logical router has rules specified in
                            nat with external_mac, then those addresses
                            are also used to populate the switch’s
                            destination lookup on the chassis where
                            logical_port is resident.

                     For the Ethernet address on a logical switch port
                     of type router, when that logical switch port’s
                     addresses column is set to router and the connected
                     logical router port specifies a
                     reside-on-redirect-chassis and the logical router
                     to which the connected logical router port belongs
                     has a distributed gateway LRP:

                     ·      The flow for the connected logical router
                            port’s Ethernet address is only programmed
                            on the gateway chassis.

                     For each forwarding group configured on the logical
                     switch datapath, a priority-50 flow that matches on
                     eth.dst == VIP with an action of
                     fwd_group(childports=args), where args contains
                     comma separated logical switch child ports to load
                     balance to. If liveness is enabled, then the action
                     also includes liveness=true. (See the configuration
                     example after this list.)

              ·      One priority-0 fallback flow that matches all
                     packets and outputs them to the MC_UNKNOWN
                     multicast group, which ovn-northd populates with
                     all enabled logical ports that accept unknown
                     destination packets. As a small optimization, if no
                     logical ports accept unknown destination packets,
                     ovn-northd omits this multicast group and logical
                     flow.

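       As a sketch, a forwarding group such as the one above might be
       configured with ovn-nbctl’s fwd-group-add command (the group,
       switch and port names, the VIP and the virtual MAC are all
       illustrative):

              ovn-nbctl --liveness fwd-group-add fg0 sw0 \
                  10.0.0.100 00:00:00:00:00:aa lp1 lp2
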
       Egress Table 0: Pre-LB

       This table is similar to ingress table Pre-LB. It contains a
       priority-0 flow that simply moves traffic to the next table.
       Moreover it contains a priority-110 flow to move IPv6 Neighbor
       Discovery traffic to the next table. If any load balancing rules
       exist for the datapath, a priority-100 flow is added with a match
       of ip and action of reg0[0] = 1; next; to act as a hint for table
       Pre-stateful to send IP packets to the connection tracker for
       packet de-fragmentation.

       This table also has a priority-110 flow with the match eth.src ==
       E for all logical switch datapaths to move traffic to the next
       table, where E is the service monitor MAC defined in the
       options:svc_monitor_mac column of the NB_Global table.

       Egress Table 1: to-lport Pre-ACLs

       This is similar to ingress table Pre-ACLs except for to-lport
       traffic.

       This table also has a priority-110 flow with the match eth.src ==
       E for all logical switch datapaths to move traffic to the next
       table, where E is the service monitor MAC defined in the
       options:svc_monitor_mac column of the NB_Global table.

       This table also has a priority-110 flow with the match outport ==
       I for all logical switch datapaths to move traffic to the next
       table, where I is the peer of a logical router port. This flow is
       added to skip the connection tracking of packets which will be
       entering the logical router datapath from the logical switch
       datapath for routing.

       Egress Table 2: Pre-stateful

       This is similar to ingress table Pre-stateful.

       Egress Table 3: LB

       This is similar to ingress table LB.

       Egress Table 4: to-lport ACL hints

       This is similar to ingress table ACL hints.

       Egress Table 5: to-lport ACLs

       This is similar to ingress table ACLs except for to-lport ACLs.

       In addition, the following flows are added.

              ·      A priority 34000 logical flow is added for each
                     logical port which has DHCPv4 options defined to
                     allow the DHCPv4 reply packet and which has DHCPv6
                     options defined to allow the DHCPv6 reply packet
                     from Ingress Table 17: DHCP responses.

              ·      A priority 34000 logical flow is added for each
                     logical switch datapath configured with DNS records
                     with the match udp.dst == 53 to allow the DNS reply
                     packet from Ingress Table 19: DNS Responses.

              ·      A priority 34000 logical flow is added for each
                     logical switch datapath with the match eth.src == E
                     to allow the service monitor request packet
                     generated by ovn-controller with the action next,
                     where E is the service monitor MAC defined in the
                     options:svc_monitor_mac column of the NB_Global
                     table.

1344 Egress Table 6:to-lportQoS Marking
1345
1346 This is similar to ingress table QoS marking except they apply to
1347 to-lport QoS rules.
1348
1349 Egress Table 7:to-lportQoS Meter
1350
1351 This is similar to ingress table QoS meter except they apply to
1352 to-lport QoS rules.
1353
1354 Egress Table 8: Stateful
1355
1356 This is similar to ingress table Stateful except that there are no
1357 rules added for load balancing new connections.
1358
1359 Egress Table 9: Egress Port Security - IP
1360
1361 This is similar to the port security logic in table Ingress Port Secu‐
1362 rity - IP except that outport, eth.dst, ip4.dst and ip6.dst are checked
1363 instead of inport, eth.src, ip4.src and ip6.src.
1364
1365 Egress Table 10: Egress Port Security - L2
1366
1367 This is similar to the ingress port security logic in ingress table
1368 Admission Control and Ingress Port Security - L2, but with important
1369 differences. Most obviously, outport and eth.dst are checked instead of
1370 inport and eth.src. Second, packets directed to broadcast or multicast
1371 eth.dst are always accepted instead of being subject to the port secu‐
1372 rity rules; this is implemented through a priority-100 flow that
1373 matches on eth.mcast with action output;. Moreover, to ensure that even
1374 broadcast and multicast packets are not delivered to disabled logical
1375 ports, a priority-150 flow for each disabled logical outport overrides
1376 the priority-100 flow with a drop; action. Finally, if egress QoS has
1377 been enabled on a localnet port, the outgoing queue id is set through
1378 the set_queue action. Please remember to mark the corresponding physical
1379 interface with ovn-egress-iface set to true in external_ids.
1380
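For example, the physical interface used by the localnet port can be
marked on the chassis with a command of this form (the interface name
is illustrative):

# Tag the egress interface so its QoS queues can be managed.
ovs-vsctl set Interface eth1 external-ids:ovn-egress-iface=true
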
1381 Logical Router Datapaths
1382 Logical router datapaths will only exist for Logical_Router rows in the
1383 OVN_Northbound database that do not have enabled set to false.
1384
1385 Ingress Table 0: L2 Admission Control
1386
1387 This table drops packets that the router shouldn’t see at all based on
1388 their Ethernet headers. It contains the following flows:
1389
1390 · Priority-100 flows to drop packets with VLAN tags or mul‐
1391 ticast Ethernet source addresses.
1392
1393 · For each enabled router port P with Ethernet address E, a
1394 priority-50 flow that matches inport == P && (eth.mcast
1395 || eth.dst == E), stores the router port ethernet address
1396 and advances to next table, with action xreg0[0..47]=E;
1397 next;.
1398
1399 For the gateway port on a distributed logical router
1400 (where one of the logical router ports specifies a gate‐
1401 way chassis), the above flow matching eth.dst == E is
1402 only programmed on the gateway port instance on the gate‐
1403 way chassis.
1404
1405 · For each dnat_and_snat NAT rule on a distributed router
1406 that specifies an external Ethernet address E, a prior‐
1407 ity-50 flow that matches inport == GW && eth.dst == E,
1408 where GW is the logical router gateway port, with action
1409 xreg0[0..47]=E; next;.
1410
1411 This flow is only programmed on the gateway port instance
1412 on the chassis where the logical_port specified in the
1413 NAT rule resides.
1414
1415 Other packets are implicitly dropped.
1416
1417 Ingress Table 1: Neighbor lookup
1418
1419 For ARP and IPv6 Neighbor Discovery packets, this table looks into the
1420 MAC_Binding records to determine if OVN needs to learn the mac bind‐
1421 ings. The following flows are added:
1422
1423 · For each router port P that owns IP address A, which
1424 belongs to subnet S with prefix length L, if the option
1425 always_learn_from_arp_request is true for this router, a
1426 priority-100 flow is added which matches inport == P &&
1427 arp.spa == S/L && arp.op == 1 (ARP request) with the fol‐
1428 lowing actions:
1429
1430 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1431 next;
1432
1433
1434 If the option always_learn_from_arp_request is false, the
1435 following two flows are added.
1436
1437 A priority-110 flow is added which matches inport == P &&
1438 arp.spa == S/L && arp.tpa == A && arp.op == 1 (ARP
1439 request) with the following actions:
1440
1441 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1442 reg9[3] = 1;
1443 next;
1444
1445
1446 A priority-100 flow is added which matches inport == P &&
1447 arp.spa == S/L && arp.op == 1 (ARP request) with the fol‐
1448 lowing actions:
1449
1450 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1451 reg9[3] = lookup_arp_ip(inport, arp.spa);
1452 next;
1453
1454
1455 If the logical router port P is a distributed gateway
1456 router port, an additional match is_chassis_resident(cr-P)
1457 is added for all these flows.
1458
1459 · A priority-100 flow which matches on ARP reply packets
1460 and applies the actions if the option
1461 always_learn_from_arp_request is true:
1462
1463 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1464 next;
1465
1466
1467 If the option always_learn_from_arp_request is false, the
1468 above actions will be:
1469
1470 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1471 reg9[3] = 1;
1472 next;
1473
1474
1475 · A priority-100 flow which matches on IPv6 Neighbor Dis‐
1476 covery advertisement packet and applies the actions if
1477 the option always_learn_from_arp_request is true:
1478
1479 reg9[2] = lookup_nd(inport, nd.target, nd.tll);
1480 next;
1481
1482
1483 If the option always_learn_from_arp_request is false, the
1484 above actions will be:
1485
1486 reg9[2] = lookup_nd(inport, nd.target, nd.tll);
1487 reg9[3] = 1;
1488 next;
1489
1490
1491 · A priority-100 flow which matches on IPv6 Neighbor Dis‐
1492 covery solicitation packet and applies the actions if the
1493 option always_learn_from_arp_request is true:
1494
1495 reg9[2] = lookup_nd(inport, ip6.src, nd.sll);
1496 next;
1497
1498
1499 If the option always_learn_from_arp_request is false, the
1500 above actions will be:
1501
1502 reg9[2] = lookup_nd(inport, ip6.src, nd.sll);
1503 reg9[3] = lookup_nd_ip(inport, ip6.src);
1504 next;
1505
1506
1507 · A priority-0 fallback flow that matches all packets and
1508 applies the action reg9[2] = 1; next; advancing the
1509 packet to the next table.
1510
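The always_learn_from_arp_request option used by the flows above is a
per-router option; a minimal sketch of disabling it (the router name is
illustrative):

# Learn mac bindings only for addresses the router asked about.
ovn-nbctl set Logical_Router lr0 options:always_learn_from_arp_request=false
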
1511 Ingress Table 2: Neighbor learning
1512
1513 This table adds flows to learn the mac bindings from the ARP and IPv6
1514 Neighbor Solicitation/Advertisement packets if it is needed according
1515 to the lookup results from the previous stage.
1516
1517 reg9[2] will be 1 if the lookup_arp/lookup_nd in the previous table was
1518 successful or skipped, meaning no need to learn mac binding from the
1519 packet.
1520
1521 reg9[3] will be 1 if the lookup_arp_ip/lookup_nd_ip in the previous ta‐
1522 ble was successful or skipped, meaning it is ok to learn mac binding
1523 from the packet (if reg9[2] is 0).
1524
1525 · A priority-100 flow with the match reg9[2] == 1 ||
1526 reg9[3] == 0 that advances the packet to the next table,
1527 as there is no need to learn the neighbor.
1528
1529 · A priority-90 flow with the match arp that applies the
1530 action put_arp(inport, arp.spa, arp.sha); next;
1531
1532 · A priority-90 flow with the match nd_na that applies the
1533 action put_nd(inport, nd.target, nd.tll); next;
1534
1535 · A priority-90 flow with the match nd_ns that applies the
1536 action put_nd(inport, ip6.src, nd.sll); next;
1537
1538 Ingress Table 3: IP Input
1539
1540 This table is the core of the logical router datapath functionality. It
1541 contains the following flows to implement very basic IP host function‐
1542 ality.
1543
1544 · For each NAT entry of a distributed logical router (with
1545 distributed gateway router port) of type snat, a prior‐
1546 ity-120 flow with the match inport == P && ip4.src ==
1547 A advances the packet to the next pipeline, where P is
1548 the distributed logical router port and A is the exter‐
1549 nal_ip set in the NAT entry. If A is an IPv6 address,
1550 then ip6.src is used for the match.
1551
1552 The above flow is required to handle the routing of the
1553 east/west NAT traffic.
1554
1555 · For each BFD port the two following priority-110 flows
1556 are added to manage BFD traffic:
1557
1558 · if ip4.src or ip6.src is any IP address owned by
1559 the router port and udp.dst == 3784, the packet
1560 is advanced to the next pipeline stage.
1561
1562 · if ip4.dst or ip6.dst is any IP address owned by
1563 the router port and udp.dst == 3784, the han‐
1564 dle_bfd_msg action is executed.
1565
1566 · L3 admission control: A priority-100 flow drops packets
1567 that match any of the following:
1568
1569 · ip4.src[28..31] == 0xe (multicast source)
1570
1571 · ip4.src == 255.255.255.255 (broadcast source)
1572
1573 · ip4.src == 127.0.0.0/8 || ip4.dst == 127.0.0.0/8
1574 (localhost source or destination)
1575
1576 · ip4.src == 0.0.0.0/8 || ip4.dst == 0.0.0.0/8 (zero
1577 network source or destination)
1578
1579 · ip4.src or ip6.src is any IP address owned by the
1580 router, unless the packet was recirculated due to
1581 egress loopback as indicated by REG‐
1582 BIT_EGRESS_LOOPBACK.
1583
1584 · ip4.src is the broadcast address of any IP network
1585 known to the router.
1586
1587 · A priority-100 flow parses DHCPv6 replies from IPv6 pre‐
1588 fix delegation routers (udp.src == 547 && udp.dst ==
1589 546). The handle_dhcpv6_reply action is used to send IPv6
1590 prefix delegation messages to the delegation router.
1591
1592 · ICMP echo reply. These flows reply to ICMP echo requests
1593 received for the router’s IP address. Let A be an IP
1594 address owned by a router port. Then, for each A that is
1595 an IPv4 address, a priority-90 flow matches on ip4.dst ==
1596 A and icmp4.type == 8 && icmp4.code == 0 (ICMP echo
1597 request). For each A that is an IPv6 address, a prior‐
1598 ity-90 flow matches on ip6.dst == A and icmp6.type == 128
1599 && icmp6.code == 0 (ICMPv6 echo request). The port of the
1600 router that receives the echo request does not matter.
1601 Also, the ip.ttl of the echo request packet is not
1602 checked, so it complies with RFC 1812, section 4.2.2.9.
1603 Flows for ICMPv4 echo requests use the following actions:
1604
1605 ip4.dst <-> ip4.src;
1606 ip.ttl = 255;
1607 icmp4.type = 0;
1608 flags.loopback = 1;
1609 next;
1610
1611
1612 Flows for ICMPv6 echo requests use the following actions:
1613
1614 ip6.dst <-> ip6.src;
1615 ip.ttl = 255;
1616 icmp6.type = 129;
1617 flags.loopback = 1;
1618 next;
1619
1620
1621 · Reply to ARP requests.
1622
1623 These flows reply to ARP requests for the router’s own IP
1624 address. The ARP requests are handled only if the
1625 requestor’s IP belongs to the same subnet as the logical
1626 router port. For each router port P that owns IP address
1627 A, which belongs to subnet S with prefix length L, and
1628 Ethernet address E, a priority-90 flow matches inport ==
1629 P && arp.spa == S/L && arp.op == 1 && arp.tpa == A (ARP
1630 request) with the following actions:
1631
1632 eth.dst = eth.src;
1633 eth.src = xreg0[0..47];
1634 arp.op = 2; /* ARP reply. */
1635 arp.tha = arp.sha;
1636 arp.sha = xreg0[0..47];
1637 arp.tpa = arp.spa;
1638 arp.spa = A;
1639 outport = inport;
1640 flags.loopback = 1;
1641 output;
1642
1643
1644 For the gateway port on a distributed logical router
1645 (where one of the logical router ports specifies a gate‐
1646 way chassis), the above flows are only programmed on the
1647 gateway port instance on the gateway chassis. This behav‐
1648 ior avoids generation of multiple ARP responses from dif‐
1649 ferent chassis, and allows upstream MAC learning to point
1650 to the gateway chassis.
1651
1652 For the logical router port with the option reside-on-re‐
1653 direct-chassis set (which is centralized), the above
1654 flows are only programmed on the gateway port instance on
1655 the gateway chassis (if the logical router has a distrib‐
1656 uted gateway port). This behavior avoids generation of
1657 multiple ARP responses from different chassis, and allows
1658 upstream MAC learning to point to the gateway chassis.
1659
1660 · Reply to IPv6 Neighbor Solicitations. These flows reply
1661 to Neighbor Solicitation requests for the router’s own
1662 IPv6 address and populate the logical router’s mac bind‐
1663 ing table.
1664
1665 For each router port P that owns IPv6 address A,
1666 solicited node address S, and Ethernet address E, a pri‐
1667 ority-90 flow matches inport == P && nd_ns && ip6.dst ==
1668 {A, E} && nd.target == A with the following actions:
1669
1670 nd_na_router {
1671 eth.src = xreg0[0..47];
1672 ip6.src = A;
1673 nd.target = A;
1674 nd.tll = xreg0[0..47];
1675 outport = inport;
1676 flags.loopback = 1;
1677 output;
1678 };
1679
1680
1681 For the gateway port on a distributed logical router
1682 (where one of the logical router ports specifies a gate‐
1683 way chassis), the above flows replying to IPv6 Neighbor
1684 Solicitations are only programmed on the gateway port
1685 instance on the gateway chassis. This behavior avoids
1686 generation of multiple replies from different chassis,
1687 and allows upstream MAC learning to point to the gateway
1688 chassis.
1689
1690 · These flows reply to ARP requests or IPv6 neighbor solic‐
1691 itation for the virtual IP addresses configured in the
1692 router for NAT (both DNAT and SNAT) or load balancing.
1693
1694 IPv4: For a configured NAT (both DNAT and SNAT) IP
1695 address or a load balancer IPv4 VIP A, for each router
1696 port P with Ethernet address E, a priority-90 flow
1697 matches arp.op == 1 && arp.tpa == A (ARP request) with
1698 the following actions:
1699
1700 eth.dst = eth.src;
1701 eth.src = xreg0[0..47];
1702 arp.op = 2; /* ARP reply. */
1703 arp.tha = arp.sha;
1704 arp.sha = xreg0[0..47];
1705 arp.tpa = arp.spa;
1706 arp.spa = A;
1707 outport = inport;
1708 flags.loopback = 1;
1709 output;
1710
1711
1712 IPv4: For a configured load balancer IPv4 VIP, a similar
1713 flow is added with the additional match inport == P.
1714
1715 If the router port P is a distributed gateway router
1716 port, then the is_chassis_resident(P) is also added in
1717 the match condition for the load balancer IPv4 VIP A.
1718
1719 IPv6: For a configured NAT (both DNAT and SNAT) IP
1720 address or a load balancer IPv6 VIP A, solicited node
1721 address S, for each router port P with Ethernet address
1722 E, a priority-90 flow matches inport == P && nd_ns &&
1723 ip6.dst == {A, S} && nd.target == A with the following
1724 actions:
1725
1726 eth.dst = eth.src;
1727 nd_na {
1728 eth.src = xreg0[0..47];
1729 nd.tll = xreg0[0..47];
1730 ip6.src = A;
1731 nd.target = A;
1732 outport = inport;
1733 flags.loopback = 1;
1734 output;
1735 }
1736
1737
1738 If the router port P is a distributed gateway router
1739 port, then the is_chassis_resident(P) is also added in
1740 the match condition for the load balancer IPv6 VIP A.
1741
1742 For the gateway port on a distributed logical router with
1743 NAT (where one of the logical router ports specifies a
1744 gateway chassis):
1745
1746 · If the corresponding NAT rule cannot be handled in
1747 a distributed manner, then a priority-92 flow is
1748 programmed on the gateway port instance on the
1749 gateway chassis. A priority-91 drop flow is pro‐
1750 grammed on the other chassis when ARP requests/NS
1751 packets are received on the gateway port. This
1752 behavior avoids generation of multiple ARP
1753 responses from different chassis, and allows
1754 upstream MAC learning to point to the gateway
1755 chassis.
1756
1757 · If the corresponding NAT rule can be handled in a
1758 distributed manner, then this flow is only pro‐
1759 grammed on the gateway port instance where the
1760 logical_port specified in the NAT rule resides.
1761
1762 Some of the actions are different for this case,
1763 using the external_mac specified in the NAT rule
1764 rather than the gateway port’s Ethernet address E:
1765
1766 eth.src = external_mac;
1767 arp.sha = external_mac;
1768
1769
1770 or in the case of IPv6 neighbor solicitation:
1771
1772 eth.src = external_mac;
1773 nd.tll = external_mac;
1774
1775
1776 This behavior avoids generation of multiple ARP
1777 responses from different chassis, and allows
1778 upstream MAC learning to point to the correct
1779 chassis.
1780
1781 · Priority-85 flows which drop the ARP and IPv6 Neighbor
1782 Discovery packets.
1783
1784 · A priority-84 flow explicitly allows IPv6 multicast traf‐
1785 fic that is supposed to reach the router pipeline (i.e.,
1786 router solicitation and router advertisement packets).
1787
1788 · A priority-83 flow explicitly drops IPv6 multicast traf‐
1789 fic that is destined to reserved multicast groups.
1790
1791 · A priority-82 flow allows IP multicast traffic if
1792 options:mcast_relay=’true’, otherwise drops it.
1793
1794 · UDP port unreachable. Priority-80 flows generate ICMP
1795 port unreachable messages in reply to UDP datagrams
1796 directed to the router’s IP address, except in the spe‐
1797 cial case of gateways, which accept traffic directed to a
1798 router IP for load balancing and NAT purposes.
1799
1800 These flows should not match IP fragments with nonzero
1801 offset.
1802
1803 · TCP reset. Priority-80 flows generate TCP reset messages
1804 in reply to TCP datagrams directed to the router’s IP
1805 address, except in the special case of gateways, which
1806 accept traffic directed to a router IP for load balancing
1807 and NAT purposes.
1808
1809 These flows should not match IP fragments with nonzero
1810 offset.
1811
1812 · Protocol or address unreachable. Priority-70 flows gener‐
1813 ate ICMP protocol or address unreachable messages for
1814 IPv4 and IPv6 respectively in reply to packets directed
1815 to the router’s IP address on IP protocols other than
1816 UDP, TCP, and ICMP, except in the special case of gate‐
1817 ways, which accept traffic directed to a router IP for
1818 load balancing purposes.
1819
1820 These flows should not match IP fragments with nonzero
1821 offset.
1822
1823 · Drop other IP traffic to this router. These flows drop
1824 any other traffic destined to an IP address of this
1825 router that is not already handled by one of the flows
1826 above, which amounts to ICMP (other than echo requests)
1827 and fragments with nonzero offsets. For each IP address A
1828 owned by the router, a priority-60 flow matches ip4.dst
1829 == A or ip6.dst == A and drops the traffic. An exception
1830 is made and the above flow is not added if the router
1831 port’s own IP address is used to SNAT packets passing
1832 through that router.
1833
1834 The flows above handle all of the traffic that might be directed to the
1835 router itself. The following flows (with lower priorities) handle the
1836 remaining traffic, potentially for forwarding:
1837
1838 · Drop Ethernet local broadcast. A priority-50 flow with
1839 match eth.bcast drops traffic destined to the local Eth‐
1840 ernet broadcast address. By definition this traffic
1841 should not be forwarded.
1842
1843 · ICMP time exceeded. For each router port P, whose IP
1844 address is A, a priority-40 flow with match inport == P
1845 && ip.ttl == {0, 1} && !ip.later_frag matches packets
1846 whose TTL has expired, with the following actions to send
1847 an ICMP time exceeded reply for IPv4 and IPv6 respec‐
1848 tively:
1849
1850 icmp4 {
1851 icmp4.type = 11; /* Time exceeded. */
1852 icmp4.code = 0; /* TTL exceeded in transit. */
1853 ip4.dst = ip4.src;
1854 ip4.src = A;
1855 ip.ttl = 255;
1856 next;
1857 };
1858 icmp6 {
1859 icmp6.type = 3; /* Time exceeded. */
1860 icmp6.code = 0; /* TTL exceeded in transit. */
1861 ip6.dst = ip6.src;
1862 ip6.src = A;
1863 ip.ttl = 255;
1864 next;
1865 };
1866
1867
1868 · TTL discard. A priority-30 flow with match ip.ttl == {0,
1869 1} and actions drop; drops other packets whose TTL has
1870 expired, that should not receive an ICMP error reply (i.e.
1871 fragments with nonzero offset).
1872
1873 · Next table. A priority-0 flow matches all packets that
1874 aren’t already handled and uses the action next; to feed
1875 them to the next table.
1876
1877 Ingress Table 4: DEFRAG
1878
1879 This table sends packets to the connection tracker for tracking and
1880 defragmentation. It contains a priority-0 flow that simply moves traffic to
1881 the next table.
1882
1883 If load balancing rules with virtual IP addresses (and ports) are con‐
1884 figured in OVN_Northbound database for a Gateway router, a priority-100
1885 flow is added for each configured virtual IP address VIP. For IPv4 VIPs
1886 the flow matches ip && ip4.dst == VIP. For IPv6 VIPs, the flow matches
1887 ip && ip6.dst == VIP. The flow uses the action ct_next; to send IP
1888 packets to the connection tracker for packet de-fragmentation and
1889 tracking before sending them to the next table.
1890
1891 If ECMP routes with symmetric reply are configured in the OVN_North‐
1892 bound database for a gateway router, a priority-100 flow is added for
1893 each router port on which symmetric replies are configured. The match‐
1894 ing logic for these ports essentially reverses the configured logic of
1895 the ECMP route. So for instance, a route with a destination routing
1896 policy will instead match if the source IP address matches the static
1897 route’s prefix. The flow uses the action ct_next; to send IP packets to
1898 the connection tracker for packet de-fragmentation and tracking before
1899 sending them to the next table.
1900
1901 Ingress Table 5: UNSNAT
1902
1903 This is for already established connections’ reverse traffic, i.e.,
1904 SNAT has already been done in the egress pipeline and now the packet has
1905 entered the ingress pipeline as part of a reply. It is unSNATted here.
1906
1907 Ingress Table 5: UNSNAT on Gateway and Distributed Routers
1908
1909 · If the Router (Gateway or Distributed) is configured with
1910 load balancers, then the below logical flows are added:
1911
1912 For each IPv4 address A defined as a load balancer VIP
1913 with the protocol P (and the protocol port T if defined)
1914 that is also present as an external_ip in the NAT table,
1915 a priority-120 logical flow is added with the match ip4
1916 && ip4.dst == A && P with the action next; to advance the
1917 packet to the next table. If the load balancer has the
1918 protocol port T defined, then the match also has P.dst == T.
1919
1920 The above flows are also added for IPv6 load balancers.
1921
1922 Ingress Table 5: UNSNAT on Gateway Routers
1923
1924 · If the Gateway router has been configured to force SNAT
1925 any previously DNATted packets to B, a priority-110 flow
1926 matches ip && ip4.dst == B or ip && ip6.dst == B with an
1927 action ct_snat;.
1928
1929 If the Gateway router has been configured to force SNAT
1930 any previously load-balanced packets to B, a priority-100
1931 flow matches ip && ip4.dst == B or ip && ip6.dst == B
1932 with an action ct_snat;.
1933
1934 For each NAT configuration in the OVN Northbound data‐
1935 base, that asks to change the source IP address of a
1936 packet from A to B, a priority-90 flow matches ip &&
1937 ip4.dst == B or ip && ip6.dst == B with an action
1938 ct_snat;. If the NAT rule is of type dnat_and_snat and
1939 has stateless=true in the options, then the action would
1940 be ip4/6.dst=(B).
1941
1942 A priority-0 logical flow with match 1 has actions next;.
1943
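The NAT entries referenced above can be created with ovn-nbctl; a
minimal sketch of an SNAT entry (router name and addresses are
illustrative):

# SNAT: rewrite sources in 10.0.0.0/24 to 172.16.1.10.
ovn-nbctl lr-nat-add lr0 snat 172.16.1.10 10.0.0.0/24
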
1944 Ingress Table 5: UNSNAT on Distributed Routers
1945
1946 · For each configuration in the OVN Northbound database,
1947 that asks to change the source IP address of a packet
1948 from A to B, a priority-100 flow matches ip && ip4.dst ==
1949 B && inport == GW or ip && ip6.dst == B && inport == GW
1950 where GW is the logical router gateway port, with an
1951 action ct_snat;. If the NAT rule is of type dnat_and_snat
1952 and has stateless=true in the options, then the action
1953 would be ip4/6.dst=(B).
1954
1955 If the NAT rule cannot be handled in a distributed man‐
1956 ner, then the priority-100 flow above is only programmed
1957 on the gateway chassis.
1958
1959 A priority-0 logical flow with match 1 has actions next;.
1960
1961 Ingress Table 6: DNAT
1962
1963 Packets enter the pipeline with a destination IP address that needs to
1964 be DNATted from a virtual IP address to a real IP address. Packets in
1965 the reverse direction need to be unDNATted.
1966
1967 Ingress Table 6: Load balancing DNAT rules
1968
1969 The following load balancing DNAT flows are added for a Gateway router
1970 or a Router with a gateway port; an example load balancer configuration
1971 is sketched after this list. These flows are programmed only on the
1972 gateway chassis and do not get programmed for load balancers with IPv6 VIPs.
1973
1974 · If controller_event has been enabled for all the config‐
1975 ured load balancing rules for a Gateway router or Router
1976 with gateway port in OVN_Northbound database that does
1977 not have configured backends, a priority-130 flow is
1978 added to trigger ovn-controller events whenever the chas‐
1979 sis receives a packet for that particular VIP. If the
1980 event-elb meter has been previously created, it will be
1981 associated with the empty_lb logical flow.
1982
1983 · For all the configured load balancing rules for a Gateway
1984 router or Router with gateway port in OVN_Northbound
1985 database that includes a L4 port PORT of protocol P and
1986 IPv4 or IPv6 address VIP, a priority-120 flow that
1987 matches on ct.new && ip && ip4.dst == VIP && P && P.dst
1988 == PORT
1989 (ip6.dst == VIP in the IPv6 case) with an action of
1990 ct_lb(args), where args contains comma separated IPv4 or
1991 IPv6 addresses (and optional port numbers) to load bal‐
1992 ance to. If the router is configured to force SNAT any
1993 load-balanced packets, the above action will be replaced
1994 by flags.force_snat_for_lb = 1; ct_lb(args);. If health
1995 check is enabled, then args will only contain those end‐
1996 points whose service monitor status entry in OVN_South‐
1997 bound db is either online or empty.
1998
1999 · For all the configured load balancing rules for a router
2000 in OVN_Northbound database that includes a L4 port PORT
2001 of protocol P and IPv4 or IPv6 address VIP, a prior‐
2002 ity-120 flow that matches on ct.est && ip && ip4.dst ==
2003 VIP && P && P.dst == PORT
2004 (ip6.dst == VIP in the IPv6 case) with an action of
2005 ct_dnat;. If the router is configured to force SNAT any
2006 load-balanced packets, the above action will be replaced
2007 by flags.force_snat_for_lb = 1; ct_dnat;.
2008
2009 · For all the configured load balancing rules for a router
2010 in OVN_Northbound database that includes just an IP
2011 address VIP to match on, a priority-110 flow that matches
2012 on ct.new && ip && ip4.dst == VIP (ip6.dst == VIP in the
2013 IPv6 case) with an action of ct_lb(args), where args con‐
2014 tains comma separated IPv4 or IPv6 addresses. If the
2015 router is configured to force SNAT any load-balanced
2016 packets, the above action will be replaced by
2017 flags.force_snat_for_lb = 1; ct_lb(args);.
2018
2019 · For all the configured load balancing rules for a router
2020 in OVN_Northbound database that includes just an IP
2021 address VIP to match on, a priority-110 flow that matches
2022 on ct.est && ip && ip4.dst == VIP (or ip6.dst == VIP)
2023 with an action of ct_dnat;. If the router is configured
2024 to force SNAT any load-balanced packets, the above action
2025 will be replaced by flags.force_snat_for_lb = 1;
2026 ct_dnat;.
2027
2028 · If the load balancer is created with the --reject option
2029 and it has no active backends, a TCP reset segment (for
2030 tcp) or an ICMP port unreachable packet (for all other
2031 kinds of traffic) will be sent whenever an incoming
2032 packet is received for this load balancer. Please note
2033 that using the --reject option will disable the empty_lb
2034 SB controller event for this load balancer.
2035
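A minimal sketch of the kind of load balancer configuration these
flows are derived from (names, addresses and ports are illustrative):

# VIP 172.16.1.100:80 balanced across two TCP backends.
ovn-nbctl lb-add lb0 172.16.1.100:80 10.0.0.2:8080,10.0.0.3:8080 tcp
# Attach the load balancer to the gateway router.
ovn-nbctl lr-lb-add lr0 lb0
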
2036 Ingress Table 6: DNAT on Gateway Routers
2037
2038 · For each configuration in the OVN Northbound database,
2039 that asks to change the destination IP address of a
2040 packet from A to B, a priority-100 flow matches ip &&
2041 ip4.dst == A or ip && ip6.dst == A with an action
2042 flags.loopback = 1; ct_dnat(B);. If the Gateway router is
2043 configured to force SNAT any DNATed packet, the above
2044 action will be replaced by flags.force_snat_for_dnat = 1;
2045 flags.loopback = 1; ct_dnat(B);. If the NAT rule is of
2046 type dnat_and_snat and has stateless=true in the options,
2047 then the action would be ip4/6.dst=(B).
2048
2049 If the NAT rule has allowed_ext_ips configured, then
2050 there is an additional match ip4.src == allowed_ext_ips.
2051 Similarly, for IPv6, the match would be ip6.src ==
2052 allowed_ext_ips.
2053
2054 If the NAT rule has exempted_ext_ips set, then there is
2055 an additional flow configured at priority 101. The flow
2056 matches if the source IP is an exempted_ext_ip and the
2057 action is next;. This flow is used to bypass the ct_dnat
2058 action for a packet originating from exempted_ext_ips.
2059
2060 · For all IP packets of a Gateway router, a priority-50
2061 flow with an action flags.loopback = 1; ct_dnat;.
2062
2063 · A priority-0 logical flow with match 1 has actions next;.
2064
2065 Ingress Table 6: DNAT on Distributed Routers
2066
2067 On distributed routers, the DNAT table only handles packets with desti‐
2068 nation IP address that needs to be DNATted from a virtual IP address to
2069 a real IP address. The unDNAT processing in the reverse direction is
2070 handled in a separate table in the egress pipeline.
2071
2072 · For each configuration in the OVN Northbound database,
2073 that asks to change the destination IP address of a
2074 packet from A to B, a priority-100 flow matches ip &&
2075 ip4.dst == B && inport == GW, where GW is the logical
2076 router gateway port, with an action ct_dnat(B);. The
2077 match will include ip6.dst == B in the IPv6 case. If the
2078 NAT rule is of type dnat_and_snat and has stateless=true
2079 in the options, then the action would be ip4/6.dst=(B).
2080
2081 If the NAT rule cannot be handled in a distributed man‐
2082 ner, then the priority-100 flow above is only programmed
2083 on the gateway chassis.
2084
2085 If the NAT rule has allowed_ext_ips configured, then
2086 there is an additional match ip4.src == allowed_ext_ips.
2087 Similarly, for IPv6, the match would be ip6.src ==
2088 allowed_ext_ips.
2089
2090 If the NAT rule has exempted_ext_ips set, then there is
2091 an additional flow configured at priority 101. The flow
2092 matches if the source IP is an exempted_ext_ip and the
2093 action is next;. This flow is used to bypass the ct_dnat
2094 action for a packet originating from exempted_ext_ips.
2095
2096 A priority-0 logical flow with match 1 has actions next;.
2097
2098 Ingress Table 7: ECMP symmetric reply processing
2099
2100 · If ECMP routes with symmetric reply are configured in the
2101 OVN_Northbound database for a gateway router, a prior‐
2102 ity-100 flow is added for each router port on which sym‐
2103 metric replies are configured. The matching logic for
2104 these ports essentially reverses the configured logic of
2105 the ECMP route. So for instance, a route with a destina‐
2106 tion routing policy will instead match if the source IP
2107 address matches the static route’s prefix. The flow uses
2108 the action ct_commit { ct_label.ecmp_reply_eth =
2109 eth.src; ct_label.ecmp_reply_port = K; }; next; to commit
2110 the connection and store eth.src and the ECMP reply port
2111 binding tunnel key K in the ct_label.
2112
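A minimal sketch of configuring ECMP routes with symmetric replies,
assuming an ovn-nbctl build whose lr-route-add supports the
--ecmp-symmetric-reply option (names and addresses are illustrative):

# Two routes to the same prefix form an ECMP group; replies are
# committed to conntrack so they return through the same path.
ovn-nbctl --ecmp-symmetric-reply lr-route-add lr0 192.168.1.0/24 172.16.1.2
ovn-nbctl --ecmp-symmetric-reply lr-route-add lr0 192.168.1.0/24 172.16.1.3
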
2113 Ingress Table 8: IPv6 ND RA option processing
2114
2115 · A priority-50 logical flow is added for each logical
2116 router port configured with IPv6 ND RA options which
2117 matches IPv6 ND Router Solicitation packet and applies
2118 the action put_nd_ra_opts and advances the packet to the
2119 next table.
2120
2121 reg0[5] = put_nd_ra_opts(options); next;
2122
2123
2124 For a valid IPv6 ND RS packet, this transforms the packet
2125 into an IPv6 ND RA reply and sets the RA options to the
2126 packet and stores 1 into reg0[5]. For other kinds of
2127 packets, it just stores 0 into reg0[5]. Either way, it
2128 continues to the next table.
2129
2130 · A priority-0 logical flow with match 1 has actions next;.
2131
2132 Ingress Table 9: IPv6 ND RA responder
2133
2134 This table implements IPv6 ND RA responder for the IPv6 ND RA replies
2135 generated by the previous table.
2136
2137 · A priority-50 logical flow is added for each logical
2138 router port configured with IPv6 ND RA options which
2139 matches IPv6 ND RA packets and reg0[5] == 1 and responds
2140 back to the inport after applying these actions. If
2141 reg0[5] is set to 1, it means that the action
2142 put_nd_ra_opts was successful.
2143
2144 eth.dst = eth.src;
2145 eth.src = E;
2146 ip6.dst = ip6.src;
2147 ip6.src = I;
2148 outport = P;
2149 flags.loopback = 1;
2150 output;
2151
2152
2153 where E is the MAC address and I is the IPv6 link local
2154 address of the logical router port.
2155
2156 (This terminates packet processing in ingress pipeline;
2157 the packet does not go to the next ingress table.)
2158
2159 · A priority-0 logical flow with match 1 has actions next;.
2160
2161 Ingress Table 10: IP Routing
2162
2163 A packet that arrives at this table is an IP packet that should be
2164 routed to the address in ip4.dst or ip6.dst. This table implements IP
2165 routing, setting reg0 (or xxreg0 for IPv6) to the next-hop IP address
2166 (leaving ip4.dst or ip6.dst, the packet’s final destination, unchanged)
2167 and advances to the next table for ARP resolution. It also sets reg1
2168 (or xxreg1) to the IP address owned by the selected router port
2169 (ingress table ARP Request will generate an ARP request, if needed,
2170 with reg0 as the target protocol address and reg1 as the source proto‐
2171 col address).
2172
2173 For ECMP routes, i.e. multiple static routes with the same policy and
2174 prefix but different nexthops, the above actions are deferred to the
2175 next table. This table, instead, determines the ECMP group id and se‐
2176 lects a member id within the group based on 5-tuple hashing. It
2177 stores group id in reg8[0..15] and member id in reg8[16..31]. This step
2178 is skipped if the traffic going out the ECMP route is reply traffic,
2179 and the ECMP route was configured to use symmetric replies. Instead,
2180 the stored ct_label value is used to choose the destination. The least
2181 significant 48 bits of the ct_label tell the destination MAC address to
2182 which the packet should be sent. The next 16 bits tell the logical
2183 router port on which the packet should be sent. These values in the
2184 ct_label are set when the initial ingress traffic is received over the
2185 ECMP route.
2186
2187 This table contains the following logical flows:
2188
2189 · Priority-550 flow that drops IPv6 Router Solicita‐
2190 tion/Advertisement packets that were not processed in
2191 previous tables.
2192
2193 · Priority-500 flows that match IP multicast traffic des‐
2194 tined to groups registered on any of the attached
2195 switches and sets outport to the associated multicast
2196 group that will eventually flood the traffic to all
2197 interested attached logical switches. The flows also
2198 decrement TTL.
2199
2200 · Priority-450 flow that matches unregistered IP multicast
2201 traffic and sets outport to the MC_STATIC multicast
2202 group, which ovn-northd populates with the logical ports
2203 that have options:mcast_flood=’true’. If no router ports
2204 are configured to flood multicast traffic the packets are
2205 dropped.
2206
2207 · IPv4 routing table. For each route to IPv4 network N with
2208 netmask M, on router port P with IP address A and Ether‐
2209 net address E, a logical flow with match ip4.dst == N/M,
2210 whose priority is the number of 1-bits in M, has the fol‐
2211 lowing actions:
2212
2213 ip.ttl--;
2214 reg8[0..15] = 0;
2215 reg0 = G;
2216 reg1 = A;
2217 eth.src = E;
2218 outport = P;
2219 flags.loopback = 1;
2220 next;
2221
2222
2223 (Ingress table IP Input already verified that ip.ttl--;
2224 will not yield a TTL exceeded error.)
2225
2226 If the route has a gateway, G is the gateway IP address.
2227 Instead, if the route is from a configured static route,
2228 G is the next hop IP address. Else it is ip4.dst.
2229
2230 · IPv6 routing table. For each route to IPv6 network N with
2231 netmask M, on router port P with IP address A and Ether‐
2232 net address E, a logical flow with match in CIDR notation
2233 ip6.dst == N/M, whose priority is the integer value of M,
2234 has the following actions:
2235
2236 ip.ttl--;
2237 reg8[0..15] = 0;
2238 xxreg0 = G;
2239 xxreg1 = A;
2240 eth.src = E;
2241 outport = P;
2242 flags.loopback = 1;
2243 next;
2244
2245
2246 (Ingress table IP Input already verified that ip.ttl--;
2247 will not yield a TTL exceeded error.)
2248
2249 If the route has a gateway, G is the gateway IP address.
2250 Instead, if the route is from a configured static route,
2251 G is the next hop IP address. Else it is ip6.dst.
2252
2253 If the address A is in the link-local scope, the route
2254 will be limited to sending on the ingress port.
2255
2256 · ECMP routes are grouped by policy and prefix. A unique
2257 id (non-zero) is assigned to each group, and each member
2258 is also assigned a unique id (non-zero) within each
2259 group.
2260
2261 For each IPv4/IPv6 ECMP group with group id GID and mem‐
2262 ber ids MID1, MID2, ..., a logical flow with match in
2263 CIDR notation ip4.dst == N/M, or ip6.dst == N/M, whose
2264 priority is the integer value of M, has the following
2265 actions:
2266
2267 ip.ttl--;
2268 flags.loopback = 1;
2269 reg8[0..15] = GID;
2270 select(reg8[16..31], MID1, MID2, ...);
2271
2272
2273 Ingress Table 11: IP_ROUTING_ECMP
2274
2275 This table implements the second part of IP routing for ECMP routes
2276 following the previous table. If a packet matched an ECMP group in the
2277 previous table, this table matches the group id and member id stored
2278 from the previous table, setting reg0 (or xxreg0 for IPv6) to the next-
2279 hop IP address (leaving ip4.dst or ip6.dst, the packet’s final destina‐
2280 tion, unchanged) and advances to the next table for ARP resolution. It
2281 also sets reg1 (or xxreg1) to the IP address owned by the selected
2282 router port (ingress table ARP Request will generate an ARP request, if
2283 needed, with reg0 as the target protocol address and reg1 as the source
2284 protocol address).
2285
2286 This processing is skipped for reply traffic being sent out of an ECMP
2287 route if the route was configured to use symmetric replies.
2288
2289 This table contains the following logical flows:
2290
2291 · A priority-150 flow that matches reg8[0..15] == 0 with
2292 action next;, directly bypassing packets of non-ECMP
2293 routes.
2294
2295 · For each member with ID MID in each ECMP group with ID
2296 GID, a priority-100 flow with match reg8[0..15] == GID &&
2297 reg8[16..31] == MID has the following actions:
2298
2299 [xx]reg0 = G;
2300 [xx]reg1 = A;
2301 eth.src = E;
2302 outport = P;
2303
2304
2305 Ingress Table 12: Router policies
2306
2307 This table adds flows for the logical router policies configured on the
2308 logical router. Please see the OVN_Northbound database Logi‐
2309 cal_Router_Policy table documentation in ovn-nb for supported actions.
2310
2311 · For each router policy configured on the logical router,
2312 a logical flow is added with specified priority, match
2313 and actions.
2314
2315 · If the policy action is reroute with 2 or more nexthops
2316 defined, then the logical flow is added with the follow‐
2317 ing actions:
2318
2319 reg8[0..15] = GID;
2320 reg8[16..31] = select(1,..n);
2321
2322
2323 where GID is the ECMP group id generated by ovn-northd
2324 for this policy and n is the number of nexthops. The
2325 select action selects one of the nexthop member ids, stores it
2326 in the register reg8[16..31] and advances the packet to the
2327 next stage.
2328
2329 · If the policy action is reroute with just one nexthop,
2330 then the logical flow is added with the following
2331 actions:
2332
2333 [xx]reg0 = H;
2334 eth.src = E;
2335 outport = P;
2336 reg8[0..15] = 0;
2337 flags.loopback = 1;
2338 next;
2339
2340
2341 where H is the nexthop defined in the router policy, E
2342 is the ethernet address of the logical router port from
2343 which the nexthop is reachable and P is the logical
2344 router port from which the nexthop is reachable.
2345
2346 · If a router policy has the option pkt_mark=m set and if
2347 the action is not drop, then the action also includes
2348 pkt.mark = m to mark the packet with the marker m.
2349
2350 Ingress Table 13: ECMP handling for router policies
2351
2352 This table handles ECMP for the router policies configured with multi‐
2353 ple nexthops; an example configuration is sketched after the flows below.
2354
2355 · A priority-150 flow is added to advance the packet to the
2356 next stage if the ECMP group id register reg8[0..15] is
2357 0.
2358
2359 · For each ECMP reroute router policy with multiple nex‐
2360 thops, a priority-100 flow is added for each nexthop H
2361 with the match reg8[0..15] == GID && reg8[16..31] == M
2362 where GID is the router policy group id generated by
2363 ovn-northd and M is the member id of the nexthop H gener‐
2364 ated by ovn-northd. The following actions are added to
2365 the flow:
2366
2367 [xx]reg0 = H;
2368 eth.src = E;
2369 outport = P;
2370 flags.loopback = 1;
2371 next;
2372
2373
2374 where H is the nexthop defined in the router policy, E
2375 is the ethernet address of the logical router port from
2376 which the nexthop is reachable and P is the logical
2377 router port from which the nexthop is reachable.
2378
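A minimal sketch of a multi-nexthop reroute policy, assuming an
ovn-nbctl build whose lr-policy-add accepts a comma-separated nexthop
list (names and addresses are illustrative):

# Reroute matching traffic across two nexthops (ECMP).
ovn-nbctl lr-policy-add lr0 100 "ip4.src == 10.0.0.0/24" reroute 172.16.1.2,172.16.1.3
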
2379 Ingress Table 14: ARP/ND Resolution
2380
2381 Any packet that reaches this table is an IP packet whose next-hop IPv4
2382 address is in reg0 or IPv6 address is in xxreg0. (ip4.dst or ip6.dst
2383 contains the final destination.) This table resolves the IP address in
2384 reg0 (or xxreg0) into an output port in outport and an Ethernet address
2385 in eth.dst, using the following flows:
2386
2387 · A priority-500 flow that matches IP multicast traffic
2388 that was allowed in the routing pipeline. For this kind
2389 of traffic the outport was already set so the flow just
2390 advances to the next table.
2391
2392 · Static MAC bindings. MAC bindings can be known statically
2393 based on data in the OVN_Northbound database. For router
2394 ports connected to logical switches, MAC bindings can be
2395 known statically from the addresses column in the Logi‐
2396 cal_Switch_Port table. For router ports connected to
2397 other logical routers, MAC bindings can be known stati‐
2398 cally from the mac and networks column in the Logi‐
2399 cal_Router_Port table. (Note: the flow is NOT installed
2400 for the IP addresses that belong to a neighbor logical
2401 router port if the current router has the
2402 options:dynamic_neigh_routers set to true.)
2403
2404 For each IPv4 address A whose host is known to have Eth‐
2405 ernet address E on router port P, a priority-100 flow
2406 with match outport == P && reg0 == A has actions eth.dst
2407 = E; next;.
2408
2409 For each virtual IP A configured on a logical port of
2410 type virtual, whose virtual parent is set in its corre‐
2411 sponding Port_Binding record and that virtual parent has
2412 the Ethernet address E, and where the virtual IP is
2413 reachable via the router port P, a priority-100 flow
2414 with match outport == P && reg0 == A has actions eth.dst
2415 = E; next;.
2416
2417 For each virtual IP A configured on a logical port of
2418 type virtual, whose virtual parent is not set in its cor‐
2419 responding Port_Binding record, and where the virtual IP
2420 A is reachable via the router port P, a priority-100 flow
2421 with match outport == P && reg0 == A has actions eth.dst
2422 = 00:00:00:00:00:00; next;. This flow is added so that
2423 ARP is always resolved for the virtual IP A by generating
2424 an ARP request and not consulting the MAC_Binding table,
2425 as it can have an incorrect value for the virtual IP A.
2426
2427 For each IPv6 address A whose host is known to have Eth‐
2428 ernet address E on router port P, a priority-100 flow
2429 with match outport == P && xxreg0 == A has actions
2430 eth.dst = E; next;.
2431
2432 For each logical router port with an IPv4 address A and a
2433 mac address of E that is reachable via a different logi‐
2434 cal router port P, a priority-100 flow with match outport
2435 == P && reg0 == A has actions eth.dst = E; next;.
2436
2437 For each logical router port with an IPv6 address A and a
2438 mac address of E that is reachable via a different logi‐
2439 cal router port P, a priority-100 flow with match outport
2440 == P && xxreg0 == A has actions eth.dst = E; next;.
2441
2442 · Static MAC bindings from NAT entries. MAC bindings can
2443 also be known for the entries in the NAT table. The below
2444 flows are programmed for distributed logical routers,
2445 i.e. those with a distributed router port.
2446
2447 For each row in the NAT table with IPv4 address A in the
2448 external_ip column of the NAT table, a priority-100 flow
2449 with the match outport == P && reg0 == A has actions
2450 eth.dst = E; next;, where P is the distributed logical
2451 router port, E is the Ethernet address if set in the
2452 external_mac column of the NAT table for NAT rules of
2453 type dnat_and_snat, otherwise the Ethernet address of the
2454 distributed logical router port.
2455
2456 For IPv6 NAT entries, the same flows are added, but using
2457 the register xxreg0 for the match.
2458
2459 · Traffic with an IP destination address owned by the
2460 router should be dropped. Such traffic is normally
2461 dropped in ingress table IP Input except for IPs that are
2462 also shared with SNAT rules. However, if there was no
2463 unSNAT operation that happened successfully until this
2464 point in the pipeline and the destination IP of the
2465 packet is still a router owned IP, the packets can be
2466 safely dropped.
2467
2468 A priority-1 logical flow with match ip4.dst == {..}
2469 matches on traffic destined to router owned IPv4
2470 addresses which are also SNAT IPs. This flow has action
2471 drop;.
2472
2473 A priority-1 logical flow with match ip6.dst == {..}
2474 matches on traffic destined to router owned IPv6
2475 addresses which are also SNAT IPs. This flow has action
2476 drop;.
2477
2478 · Dynamic MAC bindings. These flows resolve MAC-to-IP bind‐
2479 ings that have become known dynamically through ARP or
2480 neighbor discovery. (The ingress table ARP Request will
2481 issue an ARP or neighbor solicitation request for cases
2482 where the binding is not yet known.)
2483
2484 A priority-0 logical flow with match ip4 has actions
2485 get_arp(outport, reg0); next;.
2486
2487 A priority-0 logical flow with match ip6 has actions
2488 get_nd(outport, xxreg0); next;.
2489
2490 · For a distributed gateway LRP with redirect-type set to
2491 bridged (see the example below), a priority-50 flow that
2492 matches outport == "ROUTER_PORT" && !is_chassis_resident
2493 ("cr-ROUTER_PORT") has actions eth.dst = E; next;, where
2494 E is the ethernet address of the logical router port.
2495
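A minimal sketch of the redirect-type option mentioned above (the port
name is illustrative):

# Redirect gateway traffic via the bridged network instead of a tunnel.
ovn-nbctl set Logical_Router_Port lrp0 options:redirect-type=bridged
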
2496 Ingress Table 15: Check packet length
2497
2498 For distributed logical routers with distributed gateway port config‐
2499 ured with options:gateway_mtu set to a valid integer value, this table
2500 adds a priority-50 logical flow with the match ip4 && outport == GW_PORT
2501 where GW_PORT is the distributed gateway router port and applies the
2502 action check_pkt_larger and advances the packet to the next table.
2503
2504 REGBIT_PKT_LARGER = check_pkt_larger(L); next;
2505
2506
2507 where L is the packet length to check for. If the packet is larger than
2508 L, it stores 1 in the register bit REGBIT_PKT_LARGER. The value of L is
2509 taken from options:gateway_mtu column of Logical_Router_Port row.
2510
2511 This table adds one priority-0 fallback flow that matches all packets
2512 and advances to the next table.
2513
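A minimal sketch of enabling this check (the port name and MTU are
illustrative):

# Check packets leaving through the gateway port against a 1500-byte MTU.
ovn-nbctl set Logical_Router_Port lrp0 options:gateway_mtu=1500
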
2514 Ingress Table 16: Handle larger packets
2515
2516 For distributed logical routers with distributed gateway port config‐
2517 ured with options:gateway_mtu set to a valid integer value, this table
2518 adds the following priority-50 logical flow for each logical router port
2519 with the match inport == LRP && outport == GW_PORT && REG‐
2520 BIT_PKT_LARGER, where LRP is the logical router port and GW_PORT is the
2521 distributed gateway router port and applies the following action for
2522 ipv4 and ipv6 respectively:
2523
2524 icmp4 {
2525 icmp4.type = 3; /* Destination Unreachable. */
2526 icmp4.code = 4; /* Frag Needed and DF was Set. */
2527 icmp4.frag_mtu = M;
2528 eth.dst = E;
2529 ip4.dst = ip4.src;
2530 ip4.src = I;
2531 ip.ttl = 255;
2532 REGBIT_EGRESS_LOOPBACK = 1;
2533 next(pipeline=ingress, table=0);
2534 };
2535 icmp6 {
2536 icmp6.type = 2;
2537 icmp6.code = 0;
2538 icmp6.frag_mtu = M;
2539 eth.dst = E;
2540 ip6.dst = ip6.src;
2541 ip6.src = I;
2542 ip.ttl = 255;
2543 REGBIT_EGRESS_LOOPBACK = 1;
2544 next(pipeline=ingress, table=0);
2545 };
2546
2547
2548 · Where M is the (fragment MTU - 58) whose value is taken
2549 from options:gateway_mtu column of Logical_Router_Port
2550 row.
2551
2552 · E is the Ethernet address of the logical router port.
2553
2554 · I is the IPv4/IPv6 address of the logical router port.
2555
2556 This table adds one priority-0 fallback flow that matches all packets
2557 and advances to the next table.
2558
2559 Ingress Table 17: Gateway Redirect
2560
2561 For distributed logical routers where one of the logical router ports
2562 specifies a gateway chassis, this table redirects certain packets to
2563 the distributed gateway port instance on the gateway chassis. This ta‐
2564 ble has the following flows:
2565
2566 · For each NAT rule in the OVN Northbound database that can
2567 be handled in a distributed manner, a priority-100 logi‐
2568 cal flow with match ip4.src == B && outport == GW &&
2569 is_chassis_resident(P), where GW is the logical router
2570 distributed gateway port and P is the NAT logical port.
2571 IP traffic matching the above rule will be managed
2572 locally, setting reg1 to C and eth.src to D, where C is
2573 the NAT external IP and D is the NAT external MAC.
2574
2575 · A priority-50 logical flow with match outport == GW has
2576 actions outport = CR; next;, where GW is the logical
2577 router distributed gateway port and CR is the chas‐
2578 sisredirect port representing the instance of the logical
2579 router distributed gateway port on the gateway chassis.
2580
2581 · A priority-0 logical flow with match 1 has actions next;.
2582
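A gateway chassis is what makes a logical router port a distributed
gateway port; a minimal sketch (port and chassis names are
illustrative):

# Schedule the gateway port on chassis-1 with priority 20.
ovn-nbctl lrp-set-gateway-chassis lrp0 chassis-1 20
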
2583 Ingress Table 18: ARP Request
2584
2585 In the common case where the Ethernet destination has been resolved,
2586 this table outputs the packet. Otherwise, it composes and sends an ARP
2587 or IPv6 Neighbor Solicitation request. It holds the following flows:
2588
2589 · Unknown MAC address. A priority-100 flow for IPv4 packets
2590 with match eth.dst == 00:00:00:00:00:00 has the following
2591 actions:
2592
2593 arp {
2594 eth.dst = ff:ff:ff:ff:ff:ff;
2595 arp.spa = reg1;
2596 arp.tpa = reg0;
2597 arp.op = 1; /* ARP request. */
2598 output;
2599 };
2600
2601
2602 Unknown MAC address. For each IPv6 static route associ‐
2603 ated with the router with nexthop IP G, a prior‐
2604 ity-200 flow for IPv6 packets with match eth.dst ==
2605 00:00:00:00:00:00 && xxreg0 == G with the following
2606 actions is added:
2607
2608 nd_ns {
2609 eth.dst = E;
2610 ip6.dst = I;
2611 nd.target = G;
2612 output;
2613 };
2614
2615
2616 Where E is the multicast mac derived from the Gateway IP,
2617 I is the solicited-node multicast address corresponding
2618 to the target address G.
2619
2620 Unknown MAC address. A priority-100 flow for IPv6 packets
2621 with match eth.dst == 00:00:00:00:00:00 has the following
2622 actions:
2623
2624 nd_ns {
2625 nd.target = xxreg0;
2626 output;
2627 };
2628
2629
2630 (Ingress table IP Routing initialized reg1 with the IP
2631 address owned by outport and (xx)reg0 with the next-hop
2632 IP address)
2633
2634 The IP packet that triggers the ARP/IPv6 NS request is
2635 dropped.
2636
2637 · Known MAC address. A priority-0 flow with match 1 has
2638 actions output;.
2639
2640 Egress Table 0: UNDNAT
2641
2642 This is for already established connections’ reverse traffic, i.e.,
2643 DNAT has already been done in ingress pipeline and now the packet has
2644 entered the egress pipeline as part of a reply. For NAT on a distrib‐
2645 uted router, it is unDNATted here. For Gateway routers, the unDNAT pro‐
2646 cessing is carried out in the ingress DNAT table.
2647
2648 · For all the configured load balancing rules for a router
2649 with gateway port in OVN_Northbound database that
2650 includes an IPv4 address VIP, for every backend IPv4
2651 address B defined for the VIP, a priority-120 flow is
2652 programmed on the gateway chassis that matches ip && ip4.src ==
2653 B && outport == GW, where GW is the logical router gate‐
2654 way port with an action ct_dnat;. If the backend IPv4
2655 address B is also configured with L4 port PORT of proto‐
2656 col P, then the match also includes P.src == PORT. These
2657 flows are not added for load balancers with IPv6 VIPs.
2658
2659 If the router is configured to force SNAT any load-bal‐
2660 anced packets, the above action will be replaced by
2661 flags.force_snat_for_lb = 1; ct_dnat;.
2662
2663 · For each configuration in the OVN Northbound database
2664 that asks to change the destination IP address of a
2665 packet from an IP address of A to B, a priority-100 flow
2666 matches ip && ip4.src == B && outport == GW, where GW is
2667 the logical router gateway port, with an action ct_dnat;.
2668 If the NAT rule is of type dnat_and_snat and has state‐
2669 less=true in the options, then the action would be
2670 ip4/6.src=(B).
2671
2672 If the NAT rule cannot be handled in a distributed man‐
2673 ner, then the priority-100 flow above is only programmed
2674 on the gateway chassis.
2675
2676 If the NAT rule can be handled in a distributed manner,
2677 then there is an additional action eth.src = EA;, where
2678 EA is the ethernet address associated with the IP address
2679 A in the NAT rule. This allows upstream MAC learning to
2680 point to the correct chassis.
2681
2682 · A priority-0 logical flow with match 1 has actions next;.
2683
2684 Egress Table 1: SNAT
2685
2686 Packets that are configured to be SNATed get their source IP address
2687 changed based on the configuration in the OVN Northbound database.
2688
2689 · A priority-120 flow to advance the IPv6 Neighbor solici‐
2690 tation packet to the next table to skip SNAT. In the case
2691 where ovn-controller injects an IPv6 Neighbor Solicita‐
2692 tion packet (for the nd_ns action) we don’t want the
2693 packet to go through conntrack.
2694
2695 Egress Table 1: SNAT on Gateway Routers
2696
2697 · If the Gateway router in the OVN Northbound database has
2698 been configured to force SNAT a packet (that has been
2699 previously DNATted) to B, a priority-100 flow matches
2700 flags.force_snat_for_dnat == 1 && ip with an action
2701 ct_snat(B);.
2702
2703 If the Gateway router in the OVN Northbound database has
2704 been configured to force SNAT a packet (that has been
2705 previously load-balanced) to B, a priority-100 flow
2706 matches flags.force_snat_for_lb == 1 && ip with an action
2707 ct_snat(B);.
2708
2709 For each configuration in the OVN Northbound database,
2710 that asks to change the source IP address of a packet
2711 from an IP address of A or to change the source IP
2712 address of a packet that belongs to network A to B, a
2713 flow matches ip && ip4.src == A with an action
2714 ct_snat(B);. The priority of the flow is calculated based
2715 on the mask of A, with matches having larger masks get‐
2716 ting higher priorities. If the NAT rule is of type
2717 dnat_and_snat and has stateless=true in the options, then
2718 the action would be ip4/6.src=(B).
2719
2720 If the NAT rule has allowed_ext_ips configured, then
2721 there is an additional match ip4.dst == allowed_ext_ips.
2722 Similarly, for IPv6, the match would be ip6.dst ==
2723 allowed_ext_ips.
2724
2725 If the NAT rule has exempted_ext_ips set, then there is
2726 an additional flow configured at priority + 1 of the cor‐
2727 responding NAT rule. The flow matches if the destination
2728 IP is an exempted_ext_ip and the action is next;. This
2729 flow is used to bypass the ct_snat action for a packet
2730 which is destined to exempted_ext_ips.
2731
2732 A priority-0 logical flow with match 1 has actions next;.
2733
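The force SNAT behaviors referenced above are configured through
Logical_Router options; a minimal sketch (router name and address are
illustrative):

# Force SNAT of load-balanced and previously DNATted traffic
# to 172.16.1.1.
ovn-nbctl set Logical_Router lr0 options:lb_force_snat_ip="172.16.1.1"
ovn-nbctl set Logical_Router lr0 options:dnat_force_snat_ip="172.16.1.1"
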
2734 Egress Table 1: SNAT on Distributed Routers
2735
2736 · For each configuration in the OVN Northbound database,
2737 that asks to change the source IP address of a packet
2738 from an IP address of A or to change the source IP
2739 address of a packet that belongs to network A to B, a
2740 flow matches ip && ip4.src == A && outport == GW, where
2741 GW is the logical router gateway port, with an action
2742 ct_snat(B);. The priority of the flow is calculated based
2743 on the mask of A, with matches having larger masks get‐
2744 ting higher priorities. If the NAT rule is of type
2745 dnat_and_snat and has stateless=true in the options, then
2746 the action would be ip4/6.src=(B).
2747
2748 If the NAT rule cannot be handled in a distributed man‐
2749 ner, then the flow above is only programmed on the gate‐
2750 way chassis, increasing the flow priority by 128 in order
2751 to be run first.
2752
2753 If the NAT rule can be handled in a distributed manner,
2754 then there is an additional action eth.src = EA;, where
2755 EA is the ethernet address associated with the IP address
2756 A in the NAT rule. This allows upstream MAC learning to
2757 point to the correct chassis.
2758
2759 If the NAT rule has allowed_ext_ips configured, then
2760 there is an additional match ip4.dst == allowed_ext_ips.
2761 Similarly, for IPv6, the match would be ip6.dst ==
2762 allowed_ext_ips.
2763
2764 If the NAT rule has exempted_ext_ips set, then there is
2765 an additional flow configured at priority + 1 of the cor‐
2766 responding NAT rule. The flow matches if the destination
2767 IP is an exempted_ext_ip and the action is next;. This
2768 flow is used to bypass the ct_snat action for a packet
2769 which is destined to exempted_ext_ips.
2770
2771 · A priority-0 logical flow with match 1 has actions next;.
2772
2773 Egress Table 2: Egress Loopback
2774
2775 This table applies to distributed logical routers where one of the log‐
2776 ical router ports specifies a gateway chassis.
2777
2778 While UNDNAT and SNAT processing have already occurred by this point,
2779 this traffic needs to be forced through egress loopback on this dis‐
2780 tributed gateway port instance, in order for UNSNAT and DNAT processing
2781 to be applied, and also for IP routing and ARP resolution after all of
2782 the NAT processing, so that the packet can be forwarded to the destina‐
2783 tion.
2784
2785 This table has the following flows:
2786
2787 · For each NAT rule in the OVN Northbound database on a
2788 distributed router, a priority-100 logical flow with
2789 match ip4.dst == E && outport == GW && is_chassis_resi‐
2790 dent(P), where E is the external IP address specified in
2791 the NAT rule, GW is the logical router distributed gate‐
2792 way port. For dnat_and_snat NAT rule, P is the logical
2793 port specified in the NAT rule. If logical_port column of
2794 NAT table is NOT set, then P is the chassisredirect port
2795 of GW with the following actions:
2796
2797 clone {
2798 ct_clear;
2799 inport = outport;
2800 outport = "";
2801 flags = 0;
2802 flags.loopback = 1;
2803 reg0 = 0;
2804 reg1 = 0;
2805 ...
2806 reg9 = 0;
2807 REGBIT_EGRESS_LOOPBACK = 1;
2808 next(pipeline=ingress, table=0);
2809 };
2810
2811
2812 flags.loopback is set since in_port is unchanged and the
2813 packet may return back to that port after NAT processing.
2814 REGBIT_EGRESS_LOOPBACK is set to indicate that egress
2815 loopback has occurred, in order to skip the source IP
2816 address check against the router address.
2817
2818 · A priority-0 logical flow with match 1 has actions next;.
2819
2820 Egress Table 3: Delivery
2821
2822 Packets that reach this table are ready for delivery. It contains:
2823
2824 · Priority-110 logical flows that match IP multicast pack‐
2825 ets on each enabled logical router port and modify the
2826 Ethernet source address of the packets to the Ethernet
2827 address of the port and then execute action output;.
2828
2829 · Priority-100 logical flows that match packets on each
2830 enabled logical router port, with action output;.
2831
2832
2833
2834OVN 20.12.0 ovn-northd ovn-northd(8)