1ovn-northd(8) OVN Manual ovn-northd(8)
2
3
4
NAME
       ovn-northd and ovn-northd-ddlog - Open Virtual Network central control
       daemon
8
SYNOPSIS
       ovn-northd [options]
11
DESCRIPTION
       ovn-northd is a centralized daemon responsible for translating the
14 high-level OVN configuration into logical configuration consumable by
15 daemons such as ovn-controller. It translates the logical network con‐
16 figuration in terms of conventional network concepts, taken from the
17 OVN Northbound Database (see ovn-nb(5)), into logical datapath flows in
18 the OVN Southbound Database (see ovn-sb(5)) below it.
19
20 ovn-northd is implemented in C. ovn-northd-ddlog is a compatible imple‐
21 mentation written in DDlog, a language for incremental database pro‐
22 cessing. This documentation applies to both implementations, with dif‐
23 ferences indicated where relevant.
24
OPTIONS
       --ovnnb-db=database
27 The OVSDB database containing the OVN Northbound Database. If
28 the OVN_NB_DB environment variable is set, its value is used as
29 the default. Otherwise, the default is unix:/ovnnb_db.sock.
30
31 --ovnsb-db=database
32 The OVSDB database containing the OVN Southbound Database. If
33 the OVN_SB_DB environment variable is set, its value is used as
34 the default. Otherwise, the default is unix:/ovnsb_db.sock.
35
36 --ddlog-record=file
This option is for ovn-northd-ddlog only. It causes the daemon to
38 record the initial database state and later changes to file in
39 the text-based DDlog command format. The ovn_northd_cli program
40 can later replay these changes for debugging purposes. This op‐
41 tion has a performance impact. See debugging-ddlog.rst in the
42 OVN documentation for more details.
43
44 --dry-run
45 Causes ovn-northd to start paused. In the paused state,
46 ovn-northd does not apply any changes to the databases, although
47 it continues to monitor them. For more information, see the
48 pause command, under Runtime Management Commands below.
49
50 For ovn-northd-ddlog, one could use this option with
51 --ddlog-record to generate a replay log without restarting a
52 process or disturbing a running system.
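For example, the following sketch starts a paused ovn-northd-ddlog that
records changes for later replay (the file name is illustrative, and the
socket paths assume the default OVN run directory):

       ovn-northd-ddlog --dry-run --ddlog-record=replay.txt \
           --ovnnb-db=unix:/var/run/ovn/ovnnb_db.sock \
           --ovnsb-db=unix:/var/run/ovn/ovnsb_db.sock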
53
54 database in the above options must be an OVSDB active or passive con‐
55 nection method, as described in ovsdb(7).
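For example, a typical startup pointing at remote databases over TCP (the
address is illustrative; 6641 and 6642 are the conventional northbound and
southbound ports):

       ovn-northd --ovnnb-db=tcp:192.168.0.10:6641 \
           --ovnsb-db=tcp:192.168.0.10:6642 \
           --pidfile --detach --log-file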
56
57 Daemon Options
--pidfile[=pidfile]
       Causes a file (by default, program.pid) to be created indicating
       the PID of the running process. If the pidfile argument is not
       specified, or if it does not begin with /, then it is created in
       the daemon's run directory.
63
64 If --pidfile is not specified, no pidfile is created.
65
66 --overwrite-pidfile
67 By default, when --pidfile is specified and the specified pid‐
68 file already exists and is locked by a running process, the dae‐
69 mon refuses to start. Specify --overwrite-pidfile to cause it to
70 instead overwrite the pidfile.
71
72 When --pidfile is not specified, this option has no effect.
73
74 --detach
75 Runs this program as a background process. The process forks,
76 and in the child it starts a new session, closes the standard
77 file descriptors (which has the side effect of disabling logging
78 to the console), and changes its current directory to the root
79 (unless --no-chdir is specified). After the child completes its
80 initialization, the parent exits.
81
82 --monitor
83 Creates an additional process to monitor this program. If it
84 dies due to a signal that indicates a programming error (SIGA‐
85 BRT, SIGALRM, SIGBUS, SIGFPE, SIGILL, SIGPIPE, SIGSEGV, SIGXCPU,
86 or SIGXFSZ) then the monitor process starts a new copy of it. If
87 the daemon dies or exits for another reason, the monitor process
88 exits.
89
90 This option is normally used with --detach, but it also func‐
91 tions without it.
92
93 --no-chdir
94 By default, when --detach is specified, the daemon changes its
95 current working directory to the root directory after it de‐
96 taches. Otherwise, invoking the daemon from a carelessly chosen
97 directory would prevent the administrator from unmounting the
98 file system that holds that directory.
99
100 Specifying --no-chdir suppresses this behavior, preventing the
101 daemon from changing its current working directory. This may be
102 useful for collecting core files, since it is common behavior to
103 write core dumps into the current working directory and the root
104 directory is not a good directory to use.
105
106 This option has no effect when --detach is not specified.
107
--no-self-confinement
       By default this daemon will try to confine itself to work with
       files under well-known directories determined at build time. It
       is better to stick with this default behavior and not use this
       flag unless some other access control mechanism is used to
       confine the daemon. Note that in contrast to other access
       control implementations that are typically enforced from kernel
       space (e.g. DAC or MAC), self-confinement is imposed by the
       user-space daemon itself and hence should not be considered a
       full confinement strategy, but instead should be viewed as an
       additional layer of security.
119
--user=user:group
       Causes this program to run as a different user specified in
       user:group, thus dropping most of the root privileges. Short
       forms user and :group are also allowed, with the current user or
       group assumed, respectively. Only daemons started by the root
       user accept this argument.
126
On Linux, daemons will be granted CAP_IPC_LOCK and
CAP_NET_BIND_SERVICE before dropping root privileges. Daemons
129 that interact with a datapath, such as ovs-vswitchd, will be
130 granted three additional capabilities, namely CAP_NET_ADMIN,
131 CAP_NET_BROADCAST and CAP_NET_RAW. The capability change will
132 apply even if the new user is root.
133
134 On Windows, this option is not currently supported. For security
135 reasons, specifying this option will cause the daemon process
136 not to start.
137
138 Logging Options
139 -v[spec]
140 --verbose=[spec]
141 Sets logging levels. Without any spec, sets the log level for ev‐
142 ery module and destination to dbg. Otherwise, spec is a list of
143 words separated by spaces or commas or colons, up to one from each
144 category below:
145
146 • A valid module name, as displayed by the vlog/list command
147 on ovs-appctl(8), limits the log level change to the speci‐
148 fied module.
149
• syslog, console, or file, to limit the log level change
  only to the system log, to the console, or to a file,
  respectively. (If --detach is specified, the daemon closes
  its standard file descriptors, so logging to the console
  will have no effect.)
155
156 On Windows platform, syslog is accepted as a word and is
157 only useful along with the --syslog-target option (the word
158 has no effect otherwise).
159
160 • off, emer, err, warn, info, or dbg, to control the log
161 level. Messages of the given severity or higher will be
162 logged, and messages of lower severity will be filtered
163 out. off filters out all messages. See ovs-appctl(8) for a
164 definition of each log level.
165
166 Case is not significant within spec.
167
168 Regardless of the log levels set for file, logging to a file will
169 not take place unless --log-file is also specified (see below).
170
171 For compatibility with older versions of OVS, any is accepted as a
172 word but has no effect.
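For example, the following invocation logs debug messages to the log file
while keeping the console quiet (the chosen levels are arbitrary):

       ovn-northd --log-file -vfile:dbg -vconsole:off

The same settings can be changed on a running daemon with ovs-appctl(8),
e.g. ovs-appctl -t ovn-northd vlog/set file:dbg.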
173
174 -v
175 --verbose
176 Sets the maximum logging verbosity level, equivalent to --ver‐
177 bose=dbg.
178
179 -vPATTERN:destination:pattern
180 --verbose=PATTERN:destination:pattern
181 Sets the log pattern for destination to pattern. Refer to ovs-ap‐
182 pctl(8) for a description of the valid syntax for pattern.
183
184 -vFACILITY:facility
185 --verbose=FACILITY:facility
186 Sets the RFC5424 facility of the log message. facility can be one
187 of kern, user, mail, daemon, auth, syslog, lpr, news, uucp, clock,
188 ftp, ntp, audit, alert, clock2, local0, local1, local2, local3,
189 local4, local5, local6 or local7. If this option is not specified,
190 daemon is used as the default for the local system syslog and lo‐
191 cal0 is used while sending a message to the target provided via
192 the --syslog-target option.
193
194 --log-file[=file]
195 Enables logging to a file. If file is specified, then it is used
196 as the exact name for the log file. The default log file name used
197 if file is omitted is /var/log/ovn/program.log.
198
199 --syslog-target=host:port
200 Send syslog messages to UDP port on host, in addition to the sys‐
201 tem syslog. The host must be a numerical IP address, not a host‐
202 name.
203
--syslog-method=method
       Specify method for how syslog messages should be sent to the
       syslog daemon. The following forms are supported:

       • libc, to use the libc syslog() function. The downside of
         using this option is that libc adds a fixed prefix to every
         message before it is actually sent to the syslog daemon over
         the /dev/log UNIX domain socket.
212
• unix:file, to use a UNIX domain socket directly. It is
  possible to specify an arbitrary message format with this
  option. However, rsyslogd 8.9 and older versions use a
  hard-coded parser function anyway that limits UNIX domain
  socket use. If you want to use an arbitrary message format
  with older rsyslogd versions, then use a UDP socket to the
  localhost IP address instead.
220
• udp:ip:port, to use a UDP socket. With this method it is
  possible to use an arbitrary message format also with older
  rsyslogd. When sending syslog messages over a UDP socket,
  extra precautions apply: the syslog daemon needs to be
  configured to listen on the specified UDP port, accidental
  iptables rules could interfere with local syslog traffic,
  and there are some security considerations that apply to
  UDP sockets but do not apply to UNIX domain sockets.
230
231 • null, to discard all messages logged to syslog.
232
233 The default is taken from the OVS_SYSLOG_METHOD environment vari‐
234 able; if it is unset, the default is libc.
235
236 PKI Options
237 PKI configuration is required in order to use SSL for the connections
238 to the Northbound and Southbound databases.
239
240 -p privkey.pem
241 --private-key=privkey.pem
242 Specifies a PEM file containing the private key used as
243 identity for outgoing SSL connections.
244
245 -c cert.pem
246 --certificate=cert.pem
247 Specifies a PEM file containing a certificate that certi‐
248 fies the private key specified on -p or --private-key to be
249 trustworthy. The certificate must be signed by the certifi‐
250 cate authority (CA) that the peer in SSL connections will
251 use to verify it.
252
253 -C cacert.pem
254 --ca-cert=cacert.pem
255 Specifies a PEM file containing the CA certificate for ver‐
256 ifying certificates presented to this program by SSL peers.
257 (This may be the same certificate that SSL peers use to
258 verify the certificate specified on -c or --certificate, or
259 it may be a different one, depending on the PKI design in
260 use.)
261
262 -C none
263 --ca-cert=none
264 Disables verification of certificates presented by SSL
265 peers. This introduces a security risk, because it means
266 that certificates cannot be verified to be those of known
267 trusted hosts.
268
269 Other Options
270 --unixctl=socket
Sets the name of the control socket on which program listens
for runtime management commands (see RUNTIME MANAGEMENT
COMMANDS, below). If socket does not begin with /, it is
interpreted as relative to the daemon’s run directory. If
--unixctl is not used at all, the default socket is
program.pid.ctl in that directory, where pid is program’s
process ID.
276
On Windows a local named pipe is used to listen for runtime
management commands. A file is created at the absolute path
pointed to by socket or, if --unixctl is not used at all, a
file named program is created in the configured OVS_RUNDIR
directory. The file exists just to mimic the behavior of a
Unix domain socket.
282
283 Specifying none for socket disables the control socket feature.
284
285
286
287 -h
288 --help
289 Prints a brief help message to the console.
290
291 -V
292 --version
293 Prints version information to the console.
294
RUNTIME MANAGEMENT COMMANDS
       ovs-appctl can send commands to a running ovn-northd process. The
       currently supported commands are described below.
298
299 exit Causes ovn-northd to gracefully terminate.
300
pause  Pauses ovn-northd. When it is paused, ovn-northd receives
       changes from the Northbound and Southbound databases as
       usual, but it does not send any updates. A paused
       ovn-northd also drops database locks, which allows any
       other non-paused instance of ovn-northd to take over.
306
resume Resumes ovn-northd operation, processing Northbound and
       Southbound database contents and generating logical
       flows. This also instructs ovn-northd to try to acquire
       the lock on the southbound database.
311
312 is-paused
313 Returns "true" if ovn-northd is currently paused, "false"
314 otherwise.
315
status Prints this server’s status. Status will be "active" if
       ovn-northd has acquired the OVSDB lock on the southbound
       database, "standby" if it has not, or "paused" if this
       instance is paused. (See the example after this command
       list.)
319
320 sb-cluster-state-reset
321 Reset southbound database cluster status when databases
322 are destroyed and rebuilt.
323
324 If all databases in a clustered southbound database are
325 removed from disk, then the stored index of all databases
326 will be reset to zero. This will cause ovn-northd to be
327 unable to read or write to the southbound database, be‐
328 cause it will always detect the data as stale. In such a
329 case, run this command so that ovn-northd will reset its
330 local index so that it can interact with the southbound
331 database again.
332
333 nb-cluster-state-reset
334 Reset northbound database cluster status when databases
335 are destroyed and rebuilt.
336
337 This performs the same task as sb-cluster-state-reset ex‐
338 cept for the northbound database client.
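A typical interaction with these commands looks like the following (a
sketch: the -t ovn-northd target assumes the default control socket name,
and the status output is illustrative):

       $ ovs-appctl -t ovn-northd pause
       $ ovs-appctl -t ovn-northd is-paused
       true
       $ ovs-appctl -t ovn-northd resume
       $ ovs-appctl -t ovn-northd status
       Status: active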
339
340 Only ovn-northd-ddlog supports the following commands:
341
342 enable-cpu-profiling
343 disable-cpu-profiling
344 Enables or disables profiling of CPU time used by the DDlog
345 engine. When CPU profiling is enabled, the profile command
346 (see below) will include DDlog CPU usage statistics in its
347 output. Enabling CPU profiling will slow ovn-northd-ddlog.
348 Disabling CPU profiling does not clear any previously
349 recorded statistics.
350
351 profile
352 Outputs a profile of the current and peak sizes of arrange‐
353 ments inside DDlog. This profiling data can be useful for
354 optimizing DDlog code. If CPU profiling was previously en‐
355 abled (even if it was later disabled), the output also in‐
356 cludes a CPU time profile. See Profiling inside the tuto‐
357 rial in the DDlog repository for an introduction to profil‐
358 ing DDlog.
359
ACTIVE-STANDBY FOR HIGH AVAILABILITY
       You may run ovn-northd more than once in an OVN deployment. When con‐
362 nected to a standalone or clustered DB setup, OVN will automatically
363 ensure that only one of them is active at a time. If multiple instances
364 of ovn-northd are running and the active ovn-northd fails, one of the
365 hot standby instances of ovn-northd will automatically take over.
366
367 Active-Standby with multiple OVN DB servers
368 You may run multiple OVN DB servers in an OVN deployment with:
369
370 • OVN DB servers deployed in active/passive mode with one
371 active and multiple passive ovsdb-servers.
372
• ovn-northd also deployed on all these nodes, using Unix
  domain control sockets to connect to the local OVN DB
  servers.
375
In such deployments, the ovn-northd instances on the passive nodes
process the DB changes and compute logical flows that are later thrown
away, because the passive ovsdb-servers do not allow write
transactions. This results in unnecessary CPU usage.

With the help of the runtime management command pause, you can pause
ovn-northd on these nodes. When a passive node becomes master, you can
use the runtime management command resume so that ovn-northd resumes
processing the DB changes.
385
LOGICAL FLOW TABLE STRUCTURE
       One of the main purposes of ovn-northd is to populate the Logical_Flow
388 table in the OVN_Southbound database. This section describes how
389 ovn-northd does this for switch and router logical datapaths.
390
391 Logical Switch Datapaths
392 Ingress Table 0: Admission Control and Ingress Port Security - L2
393
394 Ingress table 0 contains these logical flows:
395
396 • Priority 100 flows to drop packets with VLAN tags or mul‐
397 ticast Ethernet source addresses.
398
399 • Priority 50 flows that implement ingress port security
400 for each enabled logical port. For logical ports on which
401 port security is enabled, these match the inport and the
402 valid eth.src address(es) and advance only those packets
403 to the next flow table. For logical ports on which port
404 security is not enabled, these advance all packets that
405 match the inport.
406
407 • For logical ports of type vtep, the above logical flow
408 will also apply the action REGBIT_FROM_RAMP = 1; to indi‐
409 cate that the packet is coming from a RAMP (controller-
410 vtep) device. Later pipelines will use this information
411 to skip sending the packet to the conntrack. Packets from
vtep logical ports should go through the ingress pipeline only
413 to determine the output port and they should not be sub‐
414 jected to any ACL checks. Egress pipeline will do the ACL
415 checks.
416
417 There are no flows for disabled logical ports because the default-drop
418 behavior of logical flow tables causes packets that ingress from them
419 to be dropped.
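To see the flows this table produces for a concrete switch, ovn-sbctl
lflow-list can be used; the output looks roughly like the following
(trimmed and re-wrapped here; the stage name, port, and MAC are
illustrative):

       table=0 (ls_in_port_sec_l2), priority=100,
           match=(vlan.present), action=(drop;)
       table=0 (ls_in_port_sec_l2), priority=50,
           match=(inport == "lsp1" && eth.src == {00:00:00:00:00:01}),
           action=(next;)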
420
421 Ingress Table 1: Ingress Port Security - IP
422
423 Ingress table 1 contains these logical flows:
424
425 • For each element in the port security set having one or
426 more IPv4 or IPv6 addresses (or both),
427
428 • Priority 90 flow to allow IPv4 traffic if it has
429 IPv4 addresses which match the inport, valid
430 eth.src and valid ip4.src address(es).
431
• Priority 90 flow to allow IPv4 DHCP discovery traffic if
  it has a valid eth.src. This is necessary because DHCP
  discovery messages are sent from the unspecified IPv4
  address (0.0.0.0), as no IPv4 address has yet been
  assigned.
437
438 • Priority 90 flow to allow IPv6 traffic if it has
439 IPv6 addresses which match the inport, valid
440 eth.src and valid ip6.src address(es).
441
• Priority 90 flow to allow IPv6 DAD (Duplicate Address
  Detection) traffic if it has a valid eth.src. This is
  necessary because DAD requires joining a multicast group
  and sending neighbor solicitations for the newly assigned
  address. Since no address is yet assigned, these are sent
  from the unspecified IPv6 address (::).
449
• Priority 80 flow to drop IP (both IPv4 and IPv6) traffic
  that matches the inport and a valid eth.src.
452
453 • One priority-0 fallback flow that matches all packets and
454 advances to the next table.
455
456 Ingress Table 2: Ingress Port Security - Neighbor discovery
457
458 Ingress table 2 contains these logical flows:
459
460 • For each element in the port security set,
461
462 • Priority 90 flow to allow ARP traffic which match
463 the inport and valid eth.src and arp.sha. If the
464 element has one or more IPv4 addresses, then it
465 also matches the valid arp.spa.
466
467 • Priority 90 flow to allow IPv6 Neighbor Solicita‐
468 tion and Advertisement traffic which match the in‐
469 port, valid eth.src and nd.sll/nd.tll. If the ele‐
470 ment has one or more IPv6 addresses, then it also
471 matches the valid nd.target address(es) for Neigh‐
472 bor Advertisement traffic.
473
474 • Priority 80 flow to drop ARP and IPv6 Neighbor So‐
475 licitation and Advertisement traffic which match
476 the inport and valid eth.src.
477
478 • One priority-0 fallback flow that matches all packets and
479 advances to the next table.
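The eth.src, ip.src, and arp.spa values matched by the flows in tables 1
and 2 come from the logical port’s port security configuration. A minimal
sketch of enabling it with ovn-nbctl (the port name and addresses are
illustrative):

       ovn-nbctl lsp-set-port-security lsp1 \
           "00:00:00:00:00:01 192.168.0.11"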
480
481 Ingress Table 3: Lookup MAC address learning table
482
This table looks up the MAC learning table of the logical switch
datapath to check whether the port-MAC pair is present or not. A MAC
is learnt only for logical switch VIF ports whose port security is
disabled and which have the ’unknown’ address set.
487
• For each such logical port p whose port security is
  disabled and which has the ’unknown’ address set, the
  following flow is added.
490
491 • Priority 100 flow with the match inport == p and
492 action reg0[11] = lookup_fdb(inport, eth.src);
493 next;
494
495 • One priority-0 fallback flow that matches all packets and
496 advances to the next table.
497
498 Ingress Table 4: Learn MAC of ’unknown’ ports.
499
This table learns the MAC addresses seen on the logical ports whose
port security is disabled and which have the ’unknown’ address set,
if the lookup_fdb action returned false in the previous table.

       • For each such logical port p whose port security is
         disabled and which has the ’unknown’ address set, the
         following flow is added.
506
507 • Priority 100 flow with the match inport == p &&
508 reg0[11] == 0 and action put_fdb(inport, eth.src);
509 next; which stores the port-mac in the mac learn‐
510 ing table of the logical switch datapath and ad‐
511 vances the packet to the next table.
512
513 • One priority-0 fallback flow that matches all packets and
514 advances to the next table.
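Tables 3 and 4 therefore only act on ports that carry the special
’unknown’ address and have no port security. A minimal sketch with
ovn-nbctl (the port name is illustrative):

       ovn-nbctl lsp-set-addresses lsp1 unknown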
515
516 Ingress Table 5: from-lport Pre-ACLs
517
518 This table prepares flows for possible stateful ACL processing in
519 ingress table ACLs. It contains a priority-0 flow that simply moves
520 traffic to the next table. If stateful ACLs are used in the logical
521 datapath, a priority-100 flow is added that sets a hint (with reg0[0] =
522 1; next;) for table Pre-stateful to send IP packets to the connection
523 tracker before eventually advancing to ingress table ACLs. If special
ports such as router ports or localnet ports can’t use ct(), a prior‐
525 ity-110 flow is added to skip over stateful ACLs. IPv6 Neighbor Discov‐
526 ery and MLD traffic also skips stateful ACLs. For "allow-stateless"
527 ACLs, a flow is added to bypass setting the hint for connection tracker
528 processing.
529
530 This table has a priority-110 flow with the match REGBIT_FROM_RAMP == 1
531 for all logical switch datapaths to resubmit traffic to the next table.
532 REGBIT_FROM_RAMP indicates that packet was received from vtep logical
533 ports and it can be skipped from the stateful ACL processing in the
534 ingress pipeline.
535
This table also has a priority-110 flow with the match eth.dst == E
for all logical switch datapaths to move traffic to the next table,
where E is the service monitor MAC defined in the
options:svc_monitor_mac column of the NB_Global table.
540
541 Ingress Table 6: Pre-LB
542
543 This table prepares flows for possible stateful load balancing process‐
544 ing in ingress table LB and Stateful. It contains a priority-0 flow
545 that simply moves traffic to the next table. Moreover it contains a
546 priority-110 flow to move IPv6 Neighbor Discovery and MLD traffic to
547 the next table. If load balancing rules with virtual IP addresses (and
548 ports) are configured in OVN_Northbound database for a logical switch
549 datapath, a priority-100 flow is added with the match ip to match on IP
550 packets and sets the action reg0[2] = 1; next; to act as a hint for ta‐
551 ble Pre-stateful to send IP packets to the connection tracker for
552 packet de-fragmentation (and to possibly do DNAT for already estab‐
553 lished load balanced traffic) before eventually advancing to ingress
table Stateful. If controller_event has been enabled and load balancing
rules with empty backends have been added in OVN_Northbound, a
priority-130 flow is added to trigger ovn-controller events whenever
the chassis receives a packet for that particular VIP. If the
event-elb meter has been previously created, it will be associated
with the empty_lb logical flow.
559
Prior to OVN 20.09 we were setting the reg0[0] = 1 only if the IP
destination matches the load balancer VIP. However this had a few
issues in cases where a logical switch doesn’t have any ACLs with
allow-related action. To understand the issue, let’s take a TCP load
balancer 10.0.0.10:80=10.0.0.3:80. If a logical port p1 with IP
10.0.0.5 opens a TCP connection with the VIP 10.0.0.10, then the
packet in the ingress pipeline of p1 is sent to p1’s conntrack zone
id and the packet is load balanced to the backend 10.0.0.3. The
reply packet from the backend lport is not sent to the conntrack of
the backend lport’s zone id. This is fine as long as the packet is
valid. But if the backend lport sends an invalid TCP packet (such as
one with an incorrect sequence number), the packet gets delivered to
the lport p1 without being unDNATed to the VIP 10.0.0.10, and this
causes the connection to be reset by the lport p1’s VIF.
574
575 We can’t fix this issue by adding a logical flow to drop ct.inv packets
576 in the egress pipeline since it will drop all other connections not
577 destined to the load balancers. To fix this issue, we send all the
578 packets to the conntrack in the ingress pipeline if a load balancer is
579 configured. We can now add a lflow to drop ct.inv packets.
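The load balancer used in the example above could be created and attached
with ovn-nbctl roughly as follows (the load balancer and switch names are
illustrative):

       ovn-nbctl lb-add lb0 10.0.0.10:80 10.0.0.3:80 tcp
       ovn-nbctl ls-lb-add sw0 lb0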
580
581 This table has a priority-110 flow with the match REGBIT_FROM_RAMP == 1
582 for all logical switch datapaths to resubmit traffic to the next table.
583 REGBIT_FROM_RAMP indicates that packet was received from vtep logical
584 ports and it can be skipped from the load balancer processing in the
585 ingress pipeline.
586
This table also has a priority-110 flow with the match REGBIT_FROM_RAMP
== 1 for all logical switch datapaths to resubmit traffic to the next
table. This table additionally has a priority-110 flow with the match
eth.dst == E for all logical switch datapaths to move traffic to the
next table, where E is the service monitor MAC defined in the
options:svc_monitor_mac column of the NB_Global table.
591
This table also has a priority-110 flow with the match inport == I
for all logical switch datapaths to move traffic to the next table,
where I is the peer of a logical router port. This flow is added to
skip the connection tracking of packets which enter from a logical
router datapath to a logical switch datapath.
597
598 Ingress Table 7: Pre-stateful
599
600 This table prepares flows for all possible stateful processing in next
601 tables. It contains a priority-0 flow that simply moves traffic to the
602 next table.
603
604 • Priority-120 flows that send the packets to connection
605 tracker using ct_lb; as the action so that the already
606 established traffic destined to the load balancer VIP
607 gets DNATted based on a hint provided by the previous ta‐
608 bles (with a match for reg0[2] == 1 and on supported load
609 balancer protocols and address families). For IPv4 traf‐
610 fic the flows also load the original destination IP and
611 transport port in registers reg1 and reg2. For IPv6 traf‐
612 fic the flows also load the original destination IP and
613 transport port in registers xxreg1 and reg2.
614
615 • A priority-110 flow sends the packets to connection
616 tracker based on a hint provided by the previous tables
617 (with a match for reg0[2] == 1) by using the ct_lb; ac‐
618 tion. This flow is added to handle the traffic for load
619 balancer VIPs whose protocol is not defined (mainly for
620 ICMP traffic).
621
622 • A priority-100 flow sends the packets to connection
623 tracker based on a hint provided by the previous tables
624 (with a match for reg0[0] == 1) by using the ct_next; ac‐
625 tion.
626
627 Ingress Table 8: from-lport ACL hints
628
629 This table consists of logical flows that set hints (reg0 bits) to be
630 used in the next stage, in the ACL processing table, if stateful ACLs
631 or load balancers are configured. Multiple hints can be set for the
632 same packet. The possible hints are:
633
634 • reg0[7]: the packet might match an allow-related ACL and
635 might have to commit the connection to conntrack.
636
637 • reg0[8]: the packet might match an allow-related ACL but
638 there will be no need to commit the connection to con‐
639 ntrack because it already exists.
640
• reg0[9]: the packet might match a drop/reject ACL.
642
643 • reg0[10]: the packet might match a drop/reject ACL but
644 the connection was previously allowed so it might have to
645 be committed again with ct_label=1/1.
646
647 The table contains the following flows:
648
649 • A priority-65535 flow to advance to the next table if the
650 logical switch has no ACLs configured, otherwise a prior‐
651 ity-0 flow to advance to the next table.
652
653 • A priority-7 flow that matches on packets that initiate a
654 new session. This flow sets reg0[7] and reg0[9] and then
655 advances to the next table.
656
657 • A priority-6 flow that matches on packets that are in the
658 request direction of an already existing session that has
659 been marked as blocked. This flow sets reg0[7] and
660 reg0[9] and then advances to the next table.
661
662 • A priority-5 flow that matches untracked packets. This
663 flow sets reg0[8] and reg0[9] and then advances to the
664 next table.
665
666 • A priority-4 flow that matches on packets that are in the
667 request direction of an already existing session that has
668 not been marked as blocked. This flow sets reg0[8] and
669 reg0[10] and then advances to the next table.
670
• A priority-3 flow that matches on packets that are not
  part of established sessions. This flow sets reg0[9] and
  then advances to the next table.
674
675 • A priority-2 flow that matches on packets that are part
676 of an established session that has been marked as
677 blocked. This flow sets reg0[9] and then advances to the
678 next table.
679
680 • A priority-1 flow that matches on packets that are part
681 of an established session that has not been marked as
682 blocked. This flow sets reg0[10] and then advances to the
683 next table.
684
Ingress Table 9: from-lport ACLs
686
687 Logical flows in this table closely reproduce those in the ACL table in
688 the OVN_Northbound database for the from-lport direction. The priority
689 values from the ACL table have a limited range and have 1000 added to
690 them to leave room for OVN default flows at both higher and lower pri‐
691 orities.
692
693 • allow ACLs translate into logical flows with the next;
694 action. If there are any stateful ACLs on this datapath,
695 then allow ACLs translate to ct_commit; next; (which acts
696 as a hint for the next tables to commit the connection to
697 conntrack). In case the ACL has a label then reg3 is
698 loaded with the label value and reg0[13] bit is set to 1
699 (which acts as a hint for the next tables to commit the
700 label to conntrack).
701
702 • allow-related ACLs translate into logical flows with the
703 ct_commit(ct_label=0/1); next; actions for new connec‐
704 tions and reg0[1] = 1; next; for existing connections. In
705 case the ACL has a label then reg3 is loaded with the la‐
706 bel value and reg0[13] bit is set to 1 (which acts as a
707 hint for the next tables to commit the label to con‐
708 ntrack).
709
710 • allow-stateless ACLs translate into logical flows with
711 the next; action.
712
• reject ACLs translate into logical flows with the
  tcp_reset { output <-> inport; next(pipeline=egress,
  table=5); } action for TCP connections, icmp4/icmp6 action
  for UDP connections, and sctp_abort { output <-> inport;
  next(pipeline=egress,table=5); } action for SCTP
  associations.
719
720 • Other ACLs translate to drop; for new or untracked con‐
721 nections and ct_commit(ct_label=1/1); for known connec‐
722 tions. Setting ct_label marks a connection as one that
723 was previously allowed, but should no longer be allowed
724 due to a policy change.
725
726 This table contains a priority-65535 flow to advance to the next table
727 if the logical switch has no ACLs configured, otherwise a priority-0
728 flow to advance to the next table so that ACLs allow packets by de‐
729 fault.
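As an illustration of the priority offset described above, an ACL added
at priority 1002 (the switch, port, and match are illustrative) appears
in this table at priority 2002:

       ovn-nbctl acl-add sw0 from-lport 1002 \
           "inport == \"lsp1\" && ip4 && tcp.dst == 22" allow-related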
730
731 If the logical datapath has a stateful ACL or a load balancer with VIP
732 configured, the following flows will also be added:
733
• A priority-1 flow that sets the hint to commit IP traffic
  to the connection tracker (with action reg0[1] = 1;
  next;). This is needed for the default allow policy
  because, while the initiator’s direction may not have any
  stateful rules, the server’s side may, and its return
  traffic would then not be known and would be marked as
  invalid.
740
741 • A priority-65532 flow that allows any traffic in the re‐
742 ply direction for a connection that has been committed to
743 the connection tracker (i.e., established flows), as long
744 as the committed flow does not have ct_label.blocked set.
745 We only handle traffic in the reply direction here be‐
746 cause we want all packets going in the request direction
747 to still go through the flows that implement the cur‐
748 rently defined policy based on ACLs. If a connection is
749 no longer allowed by policy, ct_label.blocked will get
750 set and packets in the reply direction will no longer be
751 allowed, either.
752
753 • A priority-65532 flow that allows any traffic that is
754 considered related to a committed flow in the connection
755 tracker (e.g., an ICMP Port Unreachable from a non-lis‐
756 tening UDP port), as long as the committed flow does not
757 have ct_label.blocked set.
758
759 • A priority-65532 flow that drops all traffic marked by
760 the connection tracker as invalid.
761
762 • A priority-65532 flow that drops all traffic in the reply
763 direction with ct_label.blocked set meaning that the con‐
764 nection should no longer be allowed due to a policy
765 change. Packets in the request direction are skipped here
766 to let a newly created ACL re-allow this connection.
767
• A priority-65532 flow that allows IPv6 Neighbor
  solicitation, Neighbor discovery, Router solicitation,
  Router advertisement and MLD packets.
771
772 If the logical datapath has any ACL or a load balancer with VIP config‐
773 ured, the following flow will also be added:
774
• A priority 34000 logical flow is added for each logical
  switch datapath with the match eth.dst == E to allow the
  service monitor reply packet destined to ovn-controller
  with the action next, where E is the service monitor MAC
  defined in the options:svc_monitor_mac column of the
  NB_Global table.
781
782 Ingress Table 10: from-lport QoS Marking
783
784 Logical flows in this table closely reproduce those in the QoS table
785 with the action column set in the OVN_Northbound database for the
786 from-lport direction.
787
788 • For every qos_rules entry in a logical switch with DSCP
789 marking enabled, a flow will be added at the priority
790 mentioned in the QoS table.
791
792 • One priority-0 fallback flow that matches all packets and
793 advances to the next table.
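Such a DSCP-marking entry can be sketched with ovn-nbctl as follows (the
switch, match, and DSCP value are illustrative):

       ovn-nbctl qos-add sw0 from-lport 200 "ip4.src == 10.0.0.5" dscp=46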
794
795 Ingress Table 11: from-lport QoS Meter
796
797 Logical flows in this table closely reproduce those in the QoS table
798 with the bandwidth column set in the OVN_Northbound database for the
799 from-lport direction.
800
801 • For every qos_rules entry in a logical switch with meter‐
802 ing enabled, a flow will be added at the priority men‐
803 tioned in the QoS table.
804
805 • One priority-0 fallback flow that matches all packets and
806 advances to the next table.
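A metering entry can be sketched similarly (rate is in kbps; the switch,
match, and values are illustrative):

       ovn-nbctl qos-add sw0 from-lport 200 "ip4.src == 10.0.0.5" \
           rate=10000 burst=1000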
807
808 Ingress Table 12: Stateful
809
810 • For all the configured load balancing rules for a switch
811 in OVN_Northbound database that includes a L4 port PORT
812 of protocol P and IP address VIP, a priority-120 flow is
added. For IPv4 VIPs, the flow matches ct.new && ip &&
814 ip4.dst == VIP && P && P.dst == PORT. For IPv6 VIPs, the
815 flow matches ct.new && ip && ip6.dst == VIP && P && P.dst
== PORT. The flow’s action is ct_lb(args), where args
817 contains comma separated IP addresses (and optional port
818 numbers) to load balance to. The address family of the IP
819 addresses of args is the same as the address family of
820 VIP. If health check is enabled, then args will only con‐
821 tain those endpoints whose service monitor status entry
822 in OVN_Southbound db is either online or empty. For IPv4
823 traffic the flow also loads the original destination IP
824 and transport port in registers reg1 and reg2. For IPv6
825 traffic the flow also loads the original destination IP
826 and transport port in registers xxreg1 and reg2.
827
828 • For all the configured load balancing rules for a switch
829 in OVN_Northbound database that includes just an IP ad‐
830 dress VIP to match on, OVN adds a priority-110 flow. For
831 IPv4 VIPs, the flow matches ct.new && ip && ip4.dst ==
832 VIP. For IPv6 VIPs, the flow matches ct.new && ip &&
833 ip6.dst == VIP. The action on this flow is ct_lb(args),
834 where args contains comma separated IP addresses of the
835 same address family as VIP. For IPv4 traffic the flow
836 also loads the original destination IP and transport port
837 in registers reg1 and reg2. For IPv6 traffic the flow
838 also loads the original destination IP and transport port
839 in registers xxreg1 and reg2.
840
• If the load balancer is created with the --reject option
  and it has no active backends, a TCP reset segment (for
  tcp) or an ICMP port unreachable packet (for all other
  kinds of traffic) will be sent whenever an incoming packet
  is received for this load balancer. Please note that using
  the --reject option will disable the empty_lb SB controller
  event for this load balancer.
848
849 • A priority 100 flow is added which commits the packet to
850 the conntrack and sets the most significant 32-bits of
851 ct_label with the reg3 value based on the hint provided
852 by previous tables (with a match for reg0[1] == 1 &&
853 reg0[13] == 1). This is used by the ACLs with label to
854 commit the label value to conntrack.
855
856 • For ACLs without label, a second priority-100 flow com‐
857 mits packets to connection tracker using ct_commit; next;
858 action based on a hint provided by the previous tables
859 (with a match for reg0[1] == 1 && reg0[13] == 0).
860
861 • A priority-0 flow that simply moves traffic to the next
862 table.
863
864 Ingress Table 13: Pre-Hairpin
865
866 • If the logical switch has load balancer(s) configured,
867 then a priority-100 flow is added with the match ip &&
868 ct.trk to check if the packet needs to be hairpinned (if
869 after load balancing the destination IP matches the
870 source IP) or not by executing the actions reg0[6] =
871 chk_lb_hairpin(); and reg0[12] = chk_lb_hairpin_reply();
872 and advances the packet to the next table.
873
874 • A priority-0 flow that simply moves traffic to the next
875 table.
876
877 Ingress Table 14: Nat-Hairpin
878
879 • If the logical switch has load balancer(s) configured,
880 then a priority-100 flow is added with the match ip &&
881 ct.new && ct.trk && reg0[6] == 1 which hairpins the traf‐
882 fic by NATting source IP to the load balancer VIP by exe‐
883 cuting the action ct_snat_to_vip and advances the packet
884 to the next table.
885
886 • If the logical switch has load balancer(s) configured,
887 then a priority-100 flow is added with the match ip &&
888 ct.est && ct.trk && reg0[6] == 1 which hairpins the traf‐
889 fic by NATting source IP to the load balancer VIP by exe‐
890 cuting the action ct_snat and advances the packet to the
891 next table.
892
893 • If the logical switch has load balancer(s) configured,
894 then a priority-90 flow is added with the match ip &&
895 reg0[12] == 1 which matches on the replies of hairpinned
896 traffic (i.e., destination IP is VIP, source IP is the
897 backend IP and source L4 port is backend port for L4 load
898 balancers) and executes ct_snat and advances the packet
899 to the next table.
900
901 • A priority-0 flow that simply moves traffic to the next
902 table.
903
904 Ingress Table 15: Hairpin
905
• A priority-1 flow that hairpins traffic matched by non-
  default flows in the Pre-Hairpin table. Hairpinning is
  done at L2: Ethernet addresses are swapped and the packets
  are looped back on the input port.
910
911 • A priority-0 flow that simply moves traffic to the next
912 table.
913
914 Ingress Table 16: ARP/ND responder
915
916 This table implements ARP/ND responder in a logical switch for known
917 IPs. The advantage of the ARP responder flow is to limit ARP broadcasts
918 by locally responding to ARP requests without the need to send to other
919 hypervisors. One common case is when the inport is a logical port asso‐
920 ciated with a VIF and the broadcast is responded to on the local hyper‐
921 visor rather than broadcast across the whole network and responded to
922 by the destination VM. This behavior is proxy ARP.
923
924 ARP requests arrive from VMs from a logical switch inport of type de‐
925 fault. For this case, the logical switch proxy ARP rules can be for
926 other VMs or logical router ports. Logical switch proxy ARP rules may
927 be programmed both for mac binding of IP addresses on other logical
928 switch VIF ports (which are of the default logical switch port type,
929 representing connectivity to VMs or containers), and for mac binding of
930 IP addresses on logical switch router type ports, representing their
931 logical router port peers. In order to support proxy ARP for logical
932 router ports, an IP address must be configured on the logical switch
933 router type port, with the same value as the peer logical router port.
934 The configured MAC addresses must match as well. When a VM sends an ARP
935 request for a distributed logical router port and if the peer router
936 type port of the attached logical switch does not have an IP address
937 configured, the ARP request will be broadcast on the logical switch.
938 One of the copies of the ARP request will go through the logical switch
939 router type port to the logical router datapath, where the logical
940 router ARP responder will generate a reply. The MAC binding of a dis‐
941 tributed logical router, once learned by an associated VM, is used for
942 all that VM’s communication needing routing. Hence, the action of a VM
943 re-arping for the mac binding of the logical router port should be
944 rare.
945
946 Logical switch ARP responder proxy ARP rules can also be hit when re‐
947 ceiving ARP requests externally on a L2 gateway port. In this case, the
948 hypervisor acting as an L2 gateway, responds to the ARP request on be‐
949 half of a destination VM.
950
951 Note that ARP requests received from localnet or vtep logical inports
952 can either go directly to VMs, in which case the VM responds or can hit
953 an ARP responder for a logical router port if the packet is used to re‐
954 solve a logical router port next hop address. In either case, logical
955 switch ARP responder rules will not be hit. It contains these logical
956 flows:
957
958 • Priority-100 flows to skip the ARP responder if inport is
959 of type localnet or vtep and advances directly to the
960 next table. ARP requests sent to localnet or vtep ports
961 can be received by multiple hypervisors. Now, because the
962 same mac binding rules are downloaded to all hypervisors,
963 each of the multiple hypervisors will respond. This will
964 confuse L2 learning on the source of the ARP requests.
965 ARP requests received on an inport of type router are not
966 expected to hit any logical switch ARP responder flows.
967 However, no skip flows are installed for these packets,
968 as there would be some additional flow cost for this and
969 the value appears limited.
970
• If inport V is of type virtual, a priority-100 logical
  flow is added for each P configured in the
  options:virtual-parents column with the match

  inport == P && ((arp.op == 1 && arp.spa == VIP && arp.tpa == VIP) || (arp.op == 2 && arp.spa == VIP))


  and applies the action

  bind_vport(V, inport);


  and advances the packet to the next table.

  Where VIP is the virtual ip configured in the column
  options:virtual-ip. (See the configuration sketch after
  this list.)
987
988 • Priority-50 flows that match ARP requests to each known
989 IP address A of every logical switch port, and respond
990 with ARP replies directly with corresponding Ethernet ad‐
991 dress E:
992
993 eth.dst = eth.src;
994 eth.src = E;
995 arp.op = 2; /* ARP reply. */
996 arp.tha = arp.sha;
997 arp.sha = E;
998 arp.tpa = arp.spa;
999 arp.spa = A;
1000 outport = inport;
1001 flags.loopback = 1;
1002 output;
1003
1004
1005 These flows are omitted for logical ports (other than
1006 router ports or localport ports) that are down (unless
1007 ignore_lsp_down is configured as true in options column
1008 of NB_Global table of the Northbound database), for logi‐
1009 cal ports of type virtual, for logical ports with ’un‐
1010 known’ address set and for logical ports of a logical
1011 switch configured with other_config:vlan-passthru=true.
1012
1013 The above ARP responder flows are added for the list of
1014 IPv4 addresses if defined in options:arp_proxy column of
1015 Logical_Switch_Port table for logical switch ports of
1016 type router.
1017
1018 • Priority-50 flows that match IPv6 ND neighbor solicita‐
1019 tions to each known IP address A (and A’s solicited node
1020 address) of every logical switch port except of type
1021 router, and respond with neighbor advertisements directly
1022 with corresponding Ethernet address E:
1023
1024 nd_na {
1025 eth.src = E;
1026 ip6.src = A;
1027 nd.target = A;
1028 nd.tll = E;
1029 outport = inport;
1030 flags.loopback = 1;
1031 output;
1032 };
1033
1034
1035 Priority-50 flows that match IPv6 ND neighbor solicita‐
1036 tions to each known IP address A (and A’s solicited node
1037 address) of logical switch port of type router, and re‐
1038 spond with neighbor advertisements directly with corre‐
1039 sponding Ethernet address E:
1040
1041 nd_na_router {
1042 eth.src = E;
1043 ip6.src = A;
1044 nd.target = A;
1045 nd.tll = E;
1046 outport = inport;
1047 flags.loopback = 1;
1048 output;
1049 };
1050
1051
1052 These flows are omitted for logical ports (other than
1053 router ports or localport ports) that are down (unless
1054 ignore_lsp_down is configured as true in options column
1055 of NB_Global table of the Northbound database), for logi‐
1056 cal ports of type virtual and for logical ports with ’un‐
1057 known’ address set.
1058
1059 • Priority-100 flows with match criteria like the ARP and
1060 ND flows above, except that they only match packets from
1061 the inport that owns the IP addresses in question, with
1062 action next;. These flows prevent OVN from replying to,
1063 for example, an ARP request emitted by a VM for its own
1064 IP address. A VM only makes this kind of request to at‐
1065 tempt to detect a duplicate IP address assignment, so
1066 sending a reply will prevent the VM from accepting the IP
1067 address that it owns.
1068
1069 In place of next;, it would be reasonable to use drop;
1070 for the flows’ actions. If everything is working as it is
1071 configured, then this would produce equivalent results,
1072 since no host should reply to the request. But ARPing for
1073 one’s own IP address is intended to detect situations
1074 where the network is not working as configured, so drop‐
1075 ping the request would frustrate that intent.
1076
1077 • For each SVC_MON_SRC_IP defined in the value of the
1078 ip_port_mappings:ENDPOINT_IP column of Load_Balancer ta‐
ble, a priority-110 logical flow is added with the match
arp.tpa == SVC_MON_SRC_IP && arp.op == 1 and applies
the action
1082
1083 eth.dst = eth.src;
1084 eth.src = E;
1085 arp.op = 2; /* ARP reply. */
1086 arp.tha = arp.sha;
1087 arp.sha = E;
1088 arp.tpa = arp.spa;
1089 arp.spa = A;
1090 outport = inport;
1091 flags.loopback = 1;
1092 output;
1093
1094
1095 where E is the service monitor source mac defined in the
1096 options:svc_monitor_mac column in the NB_Global table.
1097 This mac is used as the source mac in the service monitor
1098 packets for the load balancer endpoint IP health checks.
1099
1100 SVC_MON_SRC_IP is used as the source ip in the service
1101 monitor IPv4 packets for the load balancer endpoint IP
1102 health checks.
1103
1104 These flows are required if an ARP request is sent for
1105 the IP SVC_MON_SRC_IP.
1106
• For each VIP configured in the table Forwarding_Group a
  priority-50 logical flow is added with the match arp.tpa
  == vip && arp.op == 1 and applies the action
1111
1112 eth.dst = eth.src;
1113 eth.src = E;
1114 arp.op = 2; /* ARP reply. */
1115 arp.tha = arp.sha;
1116 arp.sha = E;
1117 arp.tpa = arp.spa;
1118 arp.spa = A;
1119 outport = inport;
1120 flags.loopback = 1;
1121 output;
1122
1123
where E is the forwarding group’s MAC address defined in
its vmac column.
1126
1127 A is used as either the destination ip for load balancing
1128 traffic to child ports or as nexthop to hosts behind the
1129 child ports.
1130
1131 These flows are required to respond to an ARP request if
1132 an ARP request is sent for the IP vip.
1133
1134 • One priority-0 fallback flow that matches all packets and
1135 advances to the next table.
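The virtual port setup referenced earlier in this list can be sketched
with ovn-nbctl as follows (the port names and the virtual IP are
illustrative):

       ovn-nbctl lsp-set-type vip-port virtual
       ovn-nbctl set Logical_Switch_Port vip-port \
           options:virtual-ip=10.0.0.10 \
           options:virtual-parents=lsp1,lsp2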
1136
1137 Ingress Table 17: DHCP option processing
1138
1139 This table adds the DHCPv4 options to a DHCPv4 packet from the logical
1140 ports configured with IPv4 address(es) and DHCPv4 options, and simi‐
1141 larly for DHCPv6 options. This table also adds flows for the logical
1142 ports of type external.
1143
1144 • A priority-100 logical flow is added for these logical
1145 ports which matches the IPv4 packet with udp.src = 68 and
1146 udp.dst = 67 and applies the action put_dhcp_opts and ad‐
1147 vances the packet to the next table.
1148
1149 reg0[3] = put_dhcp_opts(offer_ip = ip, options...);
1150 next;
1151
1152
1153 For DHCPDISCOVER and DHCPREQUEST, this transforms the
1154 packet into a DHCP reply, adds the DHCP offer IP ip and
1155 options to the packet, and stores 1 into reg0[3]. For
1156 other kinds of packets, it just stores 0 into reg0[3].
1157 Either way, it continues to the next table.
1158
1159 • A priority-100 logical flow is added for these logical
1160 ports which matches the IPv6 packet with udp.src = 546
1161 and udp.dst = 547 and applies the action put_dhcpv6_opts
1162 and advances the packet to the next table.
1163
1164 reg0[3] = put_dhcpv6_opts(ia_addr = ip, options...);
1165 next;
1166
1167
1168 For DHCPv6 Solicit/Request/Confirm packets, this trans‐
1169 forms the packet into a DHCPv6 Advertise/Reply, adds the
1170 DHCPv6 offer IP ip and options to the packet, and stores
1171 1 into reg0[3]. For other kinds of packets, it just
1172 stores 0 into reg0[3]. Either way, it continues to the
1173 next table.
1174
• A priority-0 flow that matches all packets and advances
  to the next table.
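The DHCPv4 options used by put_dhcp_opts come from the DHCP_Options row
referenced by the logical port. A minimal sketch with ovn-nbctl (the
CIDR, addresses, and port name are illustrative; UUID stands for the
DHCP_Options row created by the first command):

       ovn-nbctl dhcp-options-create 192.168.0.0/24
       ovn-nbctl dhcp-options-set-options UUID \
           lease_time=3600 router=192.168.0.1 \
           server_id=192.168.0.1 server_mac=00:00:00:00:00:ff
       ovn-nbctl lsp-set-dhcpv4-options lsp1 UUID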
1177
1178 Ingress Table 18: DHCP responses
1179
1180 This table implements DHCP responder for the DHCP replies generated by
1181 the previous table.
1182
1183 • A priority 100 logical flow is added for the logical
1184 ports configured with DHCPv4 options which matches IPv4
1185 packets with udp.src == 68 && udp.dst == 67 && reg0[3] ==
1186 1 and responds back to the inport after applying these
1187 actions. If reg0[3] is set to 1, it means that the action
1188 put_dhcp_opts was successful.
1189
1190 eth.dst = eth.src;
1191 eth.src = E;
1192 ip4.src = S;
1193 udp.src = 67;
1194 udp.dst = 68;
1195 outport = P;
1196 flags.loopback = 1;
1197 output;
1198
1199
1200 where E is the server MAC address and S is the server
1201 IPv4 address defined in the DHCPv4 options. Note that
1202 ip4.dst field is handled by put_dhcp_opts.
1203
1204 (This terminates ingress packet processing; the packet
1205 does not go to the next ingress table.)
1206
1207 • A priority 100 logical flow is added for the logical
1208 ports configured with DHCPv6 options which matches IPv6
1209 packets with udp.src == 546 && udp.dst == 547 && reg0[3]
1210 == 1 and responds back to the inport after applying these
1211 actions. If reg0[3] is set to 1, it means that the action
1212 put_dhcpv6_opts was successful.
1213
1214 eth.dst = eth.src;
1215 eth.src = E;
1216 ip6.dst = A;
1217 ip6.src = S;
1218 udp.src = 547;
1219 udp.dst = 546;
1220 outport = P;
1221 flags.loopback = 1;
1222 output;
1223
1224
1225 where E is the server MAC address and S is the server
1226 IPv6 LLA address generated from the server_id defined in
1227 the DHCPv6 options and A is the IPv6 address defined in
1228 the logical port’s addresses column.
1229
1230 (This terminates packet processing; the packet does not
1231 go on the next ingress table.)
1232
• A priority-0 flow that matches all packets and advances
  to the next table.
1235
Ingress Table 19: DNS Lookup
1237
1238 This table looks up and resolves the DNS names to the corresponding
1239 configured IP address(es).
1240
1241 • A priority-100 logical flow for each logical switch data‐
1242 path if it is configured with DNS records, which matches
1243 the IPv4 and IPv6 packets with udp.dst = 53 and applies
1244 the action dns_lookup and advances the packet to the next
1245 table.
1246
1247 reg0[4] = dns_lookup(); next;
1248
1249
1250 For valid DNS packets, this transforms the packet into a
1251 DNS reply if the DNS name can be resolved, and stores 1
1252 into reg0[4]. For failed DNS resolution or other kinds of
1253 packets, it just stores 0 into reg0[4]. Either way, it
1254 continues to the next table.
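The DNS records consulted by dns_lookup live in the DNS table of the
northbound database. A rough sketch of configuring one record with
ovn-nbctl’s generic database commands (the name, address, and switch are
illustrative; see ovn-nbctl(8) for the exact map syntax):

       uuid=$(ovn-nbctl create DNS records='{"vm1.example.org"="192.168.0.11"}')
       ovn-nbctl add Logical_Switch sw0 dns_records $uuid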
1255
Ingress Table 20: DNS Responses
1257
1258 This table implements DNS responder for the DNS replies generated by
1259 the previous table.
1260
1261 • A priority-100 logical flow for each logical switch data‐
1262 path if it is configured with DNS records, which matches
1263 the IPv4 and IPv6 packets with udp.dst = 53 && reg0[4] ==
1264 1 and responds back to the inport after applying these
1265 actions. If reg0[4] is set to 1, it means that the action
1266 dns_lookup was successful.
1267
1268 eth.dst <-> eth.src;
1269 ip4.src <-> ip4.dst;
1270 udp.dst = udp.src;
1271 udp.src = 53;
1272 outport = P;
1273 flags.loopback = 1;
1274 output;
1275
1276
1277 (This terminates ingress packet processing; the packet
1278 does not go to the next ingress table.)
1279
Ingress Table 21: External ports
1281
Traffic from the external logical ports enters the ingress datapath
pipeline via the localnet port. This table adds the below logical
flows to handle the traffic from these ports.
1285
1286 • A priority-100 flow is added for each external logical
1287 port which doesn’t reside on a chassis to drop the
1288 ARP/IPv6 NS request to the router IP(s) (of the logical
1289 switch) which matches on the inport of the external logi‐
1290 cal port and the valid eth.src address(es) of the exter‐
1291 nal logical port.
1292
This flow guarantees that the ARP/NS request to the
router IP address from the external ports is responded to
only by the chassis which has claimed these external
ports. All the other chassis drop these packets.
1297
1298 A priority-100 flow is added for each external logical
1299 port which doesn’t reside on a chassis to drop any packet
1300 destined to the router mac - with the match inport == ex‐
1301 ternal && eth.src == E && eth.dst == R && !is_chas‐
1302 sis_resident("external") where E is the external port mac
1303 and R is the router port mac.
1304
• A priority-0 flow that matches all packets and advances
  to the next table.
1307
Ingress Table 22: Destination Lookup
1309
1310 This table implements switching behavior. It contains these logical
1311 flows:
1312
• A priority-110 flow with the match eth.src == E for all
  logical switch datapaths that applies the action
  handle_svc_check(inport), where E is the service monitor
  MAC defined in the options:svc_monitor_mac column of the
  NB_Global table.
1318
1319 • A priority-100 flow that punts all IGMP/MLD packets to
1320 ovn-controller if multicast snooping is enabled on the
1321 logical switch. The flow also forwards the IGMP/MLD pack‐
1322 ets to the MC_MROUTER_STATIC multicast group, which
1323 ovn-northd populates with all the logical ports that have
1324 options :mcast_flood_reports=’true’.
1325
1326 • Priority-90 flows that forward registered IP multicast
1327 traffic to their corresponding multicast group, which
1328 ovn-northd creates based on learnt IGMP_Group entries.
The flows also forward packets to the MC_MROUTER_FLOOD
multicast group, which ovn-northd populates with all the
logical ports that are connected to logical routers with
options:mcast_relay=’true’.
1333
1334 • A priority-85 flow that forwards all IP multicast traffic
1335 destined to 224.0.0.X to the MC_FLOOD multicast group,
1336 which ovn-northd populates with all enabled logical
1337 ports.
1338
1339 • A priority-85 flow that forwards all IP multicast traffic
1340 destined to reserved multicast IPv6 addresses (RFC 4291,
1341 2.7.1, e.g., Solicited-Node multicast) to the MC_FLOOD
1342 multicast group, which ovn-northd populates with all en‐
1343 abled logical ports.
1344
1345 • A priority-80 flow that forwards all unregistered IP mul‐
1346 ticast traffic to the MC_STATIC multicast group, which
1347 ovn-northd populates with all the logical ports that have
1348 options :mcast_flood=’true’. The flow also forwards un‐
1349 registered IP multicast traffic to the MC_MROUTER_FLOOD
1350 multicast group, which ovn-northd populates with all the
1351 logical ports connected to logical routers that have op‐
1352 tions :mcast_relay=’true’.
1353
1354 • A priority-80 flow that drops all unregistered IP multi‐
1355 cast traffic if other_config :mcast_snoop=’true’ and
1356 other_config :mcast_flood_unregistered=’false’ and the
1357 switch is not connected to a logical router that has op‐
1358 tions :mcast_relay=’true’ and the switch doesn’t have any
1359 logical port with options :mcast_flood=’true’.
1360
1361 • Priority-80 flows for each IP address/VIP/NAT address
1362 owned by a router port connected to the switch. These
1363 flows match ARP requests and ND packets for the specific
1364 IP addresses. Matched packets are forwarded only to the
1365 router that owns the IP address and to the MC_FLOOD_L2
1366 multicast group which contains all non-router logical
1367 ports.
1368
1369 • Priority-90 flows for each IP address/VIP/NAT address
1370 configured outside its owning router port’s subnet. These
1371 flows match ARP requests and ND packets for the specific
1372 IP addresses. Matched packets are forwarded to the
1373 MC_FLOOD multicast group which contains all connected
1374 logical ports.
1375
              •      Priority-75 flows for each port connected to a logical
                     router matching self-originated ARP request/ND packets.
                     These packets are flooded to the MC_FLOOD_L2 multicast
                     group, which contains all non-router logical ports.
1380
1381 • A priority-70 flow that outputs all packets with an Eth‐
1382 ernet broadcast or multicast eth.dst to the MC_FLOOD mul‐
1383 ticast group.
1384
              •      One priority-50 flow that matches each known Ethernet
                     address against eth.dst and outputs the packet to the
                     single associated output port (a concrete example
                     appears after this list).
1388
1389 For the Ethernet address on a logical switch port of type
1390 router, when that logical switch port’s addresses column
1391 is set to router and the connected logical router port
1392 has a gateway chassis:
1393
1394 • The flow for the connected logical router port’s
1395 Ethernet address is only programmed on the gateway
1396 chassis.
1397
1398 • If the logical router has rules specified in nat
1399 with external_mac, then those addresses are also
1400 used to populate the switch’s destination lookup
1401 on the chassis where logical_port is resident.
1402
1403 For the Ethernet address on a logical switch port of type
1404 router, when that logical switch port’s addresses column
1405 is set to router and the connected logical router port
                     specifies a reside-on-redirect-chassis and the logical
                     router to which the connected logical router port
                     belongs has a distributed gateway LRP:
1409
1410 • The flow for the connected logical router port’s
1411 Ethernet address is only programmed on the gateway
1412 chassis.
1413
              •      For each forwarding group configured on the logical
                     switch datapath, a priority-50 flow that matches on
                     eth.dst == VIP with an action of
                     fwd_group(childports=args), where args contains
                     comma-separated logical switch child ports to load
                     balance to. If liveness is enabled, then the action also
                     includes liveness=true.
1421
1422 • One priority-0 fallback flow that matches all packets
1423 with the action outport = get_fdb(eth.dst); next;. The
1424 action get_fdb gets the port for the eth.dst in the MAC
1425 learning table of the logical switch datapath. If there
1426 is no entry for eth.dst in the MAC learning table, then
1427 it stores none in the outport.
1428
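       As an example of the priority-50 unicast lookup flow, assume a
       hypothetical logical switch port lsp1 whose addresses column contains
       00:00:00:00:00:05. The resulting logical flow would be along these
       lines (shown in a style similar to ovn-sbctl lflow-list output):

              priority=50, match=(eth.dst == 00:00:00:00:00:05),
              action=(outport = "lsp1"; output;)
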
   Ingress Table 23 Destination unknown
1430
       This table handles the packets whose destination was either not found
       or was looked up in the MAC learning table of the logical switch
       datapath. It contains the following flows.
1434
              •      If the logical switch has logical ports with ’unknown’
                     addresses set, then the below logical flow is added:

                     •      A priority-50 flow with the match outport == none
                            that outputs the packet to the MC_UNKNOWN
                            multicast group, which ovn-northd populates with
                            all enabled logical ports that accept unknown
                            destination packets. As a small optimization, if
                            no logical ports accept unknown destination
                            packets, ovn-northd omits this multicast group
                            and logical flow.

                     If the logical switch has no logical ports with
                     ’unknown’ address set, then the below logical flow is
                     added:

                     •      A priority-50 flow with the match outport == none
                            that drops the packets.
1452
1453 • One priority-0 fallback flow that outputs the packet to
1454 the egress stage with the outport learnt from get_fdb ac‐
1455 tion.
1456
1457 Egress Table 0: Pre-LB
1458
       This table is similar to ingress table Pre-LB. It contains a
       priority-0 flow that simply moves traffic to the next table. Moreover,
       it contains a priority-110 flow to move IPv6 Neighbor Discovery
       traffic to the next table. If any load balancing rules exist for the
       datapath, a priority-100 flow is added with a match of ip and action
       of reg0[2] = 1; next; to act as a hint for table Pre-stateful to send
       IP packets to the connection tracker for packet de-fragmentation and
       to possibly DNAT the destination VIP to one of the selected backends
       for already committed load-balanced traffic.
1468
       This table also has a priority-110 flow with the match eth.src == E
       for all logical switch datapaths to move traffic to the next table,
       where E is the service monitor mac defined in the
       options:svc_monitor_mac column of the NB_Global table.
1473
1474 Egress Table 1: to-lport Pre-ACLs
1475
1476 This is similar to ingress table Pre-ACLs except for to-lport traffic.
1477
       This table also has a priority-110 flow with the match eth.src == E
       for all logical switch datapaths to move traffic to the next table,
       where E is the service monitor mac defined in the
       options:svc_monitor_mac column of the NB_Global table.
1482
       This table also has a priority-110 flow with the match outport == I
       for all logical switch datapaths to move traffic to the next table,
       where I is the peer of a logical router port. This flow is added to
       skip the connection tracking of packets which will be entering the
       logical router datapath from the logical switch datapath for routing.
1488
1489 Egress Table 2: Pre-stateful
1490
       This is similar to ingress table Pre-stateful. This table adds the
       three logical flows below.
1493
              •      A priority-120 flow that sends the packets to the
                     connection tracker using ct_lb; as the action, so that
                     the already established traffic gets unDNATted from the
                     backend IP to the load balancer VIP based on a hint
                     provided by the previous tables with a match for reg0[2]
                     == 1. If the packet was not DNATted earlier, then ct_lb
                     functions like ct_next.
1501
              •      A priority-100 flow that sends the packets to the
                     connection tracker based on a hint provided by the
                     previous tables (with a match for reg0[0] == 1) by using
                     the ct_next; action.
1506
1507 • A priority-0 flow that matches all packets to advance to
1508 the next table.
1509
1510 Egress Table 3: from-lport ACL hints
1511
1512 This is similar to ingress table ACL hints.
1513
1514 Egress Table 4: to-lport ACLs
1515
1516 This is similar to ingress table ACLs except for to-lport ACLs.
1517
1518 In addition, the following flows are added.
1519
              •      A priority 34000 logical flow is added for each logical
                     port which has DHCPv4 options defined, to allow the
                     DHCPv4 reply packet, and for each logical port which has
                     DHCPv6 options defined, to allow the DHCPv6 reply
                     packet, from Ingress Table 18: DHCP responses.
1525
              •      A priority 34000 logical flow is added for each logical
                     switch datapath configured with DNS records with the
                     match udp.src == 53 to allow the DNS reply packet from
                     Ingress Table 20: DNS responses (see the example after
                     this list).
1530
              •      A priority 34000 logical flow is added for each logical
                     switch datapath with the match eth.src == E to allow the
                     service monitor request packet generated by
                     ovn-controller with the action next, where E is the
                     service monitor mac defined in the
                     options:svc_monitor_mac column of the NB_Global table.
1537
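       For illustration, on a hypothetical logical switch with DNS records
       configured, the DNS allow flow described above would be roughly:

              priority=34000, match=(udp.src == 53), action=(next;)

       (The action may also commit the connection first when stateful ACLs
       are in use on the datapath.)
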
1538 Egress Table 5: to-lport QoS Marking
1539
       This is similar to ingress table QoS marking except that it applies to
       to-lport QoS rules.
1542
1543 Egress Table 6: to-lport QoS Meter
1544
       This is similar to ingress table QoS meter except that it applies to
       to-lport QoS rules.
1547
1548 Egress Table 7: Stateful
1549
1550 This is similar to ingress table Stateful except that there are no
1551 rules added for load balancing new connections.
1552
1553 Egress Table 8: Egress Port Security - IP
1554
1555 This is similar to the port security logic in table Ingress Port Secu‐
1556 rity - IP except that outport, eth.dst, ip4.dst and ip6.dst are checked
       instead of inport, eth.src, ip4.src and ip6.src.
1558
1559 Egress Table 9: Egress Port Security - L2
1560
1561 This is similar to the ingress port security logic in ingress table Ad‐
1562 mission Control and Ingress Port Security - L2, but with important dif‐
1563 ferences. Most obviously, outport and eth.dst are checked instead of
1564 inport and eth.src. Second, packets directed to broadcast or multicast
1565 eth.dst are always accepted instead of being subject to the port secu‐
1566 rity rules; this is implemented through a priority-100 flow that
1567 matches on eth.mcast with action output;. Moreover, to ensure that even
1568 broadcast and multicast packets are not delivered to disabled logical
1569 ports, a priority-150 flow for each disabled logical outport overrides
       the priority-100 flow with a drop; action. Finally, if egress QoS has
       been enabled on a localnet port, the outgoing queue id is set through
       the set_queue action. Remember to mark the corresponding physical
       interface with ovn-egress-iface set to true in external_ids.
1574
1575 Logical Router Datapaths
1576 Logical router datapaths will only exist for Logical_Router rows in the
       OVN_Northbound database that do not have enabled set to false.
1578
1579 Ingress Table 0: L2 Admission Control
1580
1581 This table drops packets that the router shouldn’t see at all based on
1582 their Ethernet headers. It contains the following flows:
1583
1584 • Priority-100 flows to drop packets with VLAN tags or mul‐
1585 ticast Ethernet source addresses.
1586
              •      For each enabled router port P with Ethernet address E,
                     a priority-50 flow that matches inport == P &&
                     (eth.mcast || eth.dst == E), stores the router port
                     Ethernet address, and advances to the next table, with
                     action xreg0[0..47] = E; next; (a concrete example
                     appears at the end of this section).
1592
1593 For the gateway port on a distributed logical router
1594 (where one of the logical router ports specifies a gate‐
1595 way chassis), the above flow matching eth.dst == E is
1596 only programmed on the gateway port instance on the gate‐
1597 way chassis.
1598
                     For a distributed logical router or a gateway router
                     where the port is configured with options:gateway_mtu,
                     the action of the above flow is modified to add
                     check_pkt_larger, which sets REGBIT_PKT_LARGER if the
                     packet size is greater than the MTU.
1604
1605 • For each dnat_and_snat NAT rule on a distributed router
1606 that specifies an external Ethernet address E, a prior‐
1607 ity-50 flow that matches inport == GW && eth.dst == E,
1608 where GW is the logical router gateway port, with action
1609 xreg0[0..47]=E; next;.
1610
1611 This flow is only programmed on the gateway port instance
1612 on the chassis where the logical_port specified in the
1613 NAT rule resides.
1614
1615 Other packets are implicitly dropped.
1616
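       To illustrate the priority-50 admission flow, assume a hypothetical
       enabled router port lrp1 with Ethernet address 00:00:00:00:ff:01. The
       flow would be roughly:

              priority=50,
              match=(inport == "lrp1" &&
                     (eth.mcast || eth.dst == 00:00:00:00:ff:01)),
              action=(xreg0[0..47] = 00:00:00:00:ff:01; next;)
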
1617 Ingress Table 1: Neighbor lookup
1618
1619 For ARP and IPv6 Neighbor Discovery packets, this table looks into the
1620 MAC_Binding records to determine if OVN needs to learn the mac bind‐
       ings. The following flows are added:
1622
1623 • For each router port P that owns IP address A, which be‐
1624 longs to subnet S with prefix length L, if the option al‐
1625 ways_learn_from_arp_request is true for this router, a
1626 priority-100 flow is added which matches inport == P &&
1627 arp.spa == S/L && arp.op == 1 (ARP request) with the fol‐
1628 lowing actions:
1629
1630 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1631 next;
1632
1633
1634 If the option always_learn_from_arp_request is false, the
1635 following two flows are added.
1636
1637 A priority-110 flow is added which matches inport == P &&
1638 arp.spa == S/L && arp.tpa == A && arp.op == 1 (ARP re‐
1639 quest) with the following actions:
1640
1641 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1642 reg9[3] = 1;
1643 next;
1644
1645
1646 A priority-100 flow is added which matches inport == P &&
1647 arp.spa == S/L && arp.op == 1 (ARP request) with the fol‐
1648 lowing actions:
1649
1650 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1651 reg9[3] = lookup_arp_ip(inport, arp.spa);
1652 next;
1653
1654
                     If the logical router port P is a distributed gateway
                     router port, an additional match
                     is_chassis_resident(cr-P) is added to all these flows.
1658
1659 • A priority-100 flow which matches on ARP reply packets
1660 and applies the actions if the option al‐
1661 ways_learn_from_arp_request is true:
1662
1663 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1664 next;
1665
1666
1667 If the option always_learn_from_arp_request is false, the
1668 above actions will be:
1669
1670 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1671 reg9[3] = 1;
1672 next;
1673
1674
1675 • A priority-100 flow which matches on IPv6 Neighbor Dis‐
1676 covery advertisement packet and applies the actions if
1677 the option always_learn_from_arp_request is true:
1678
1679 reg9[2] = lookup_nd(inport, nd.target, nd.tll);
1680 next;
1681
1682
1683 If the option always_learn_from_arp_request is false, the
1684 above actions will be:
1685
1686 reg9[2] = lookup_nd(inport, nd.target, nd.tll);
1687 reg9[3] = 1;
1688 next;
1689
1690
1691 • A priority-100 flow which matches on IPv6 Neighbor Dis‐
1692 covery solicitation packet and applies the actions if the
1693 option always_learn_from_arp_request is true:
1694
1695 reg9[2] = lookup_nd(inport, ip6.src, nd.sll);
1696 next;
1697
1698
1699 If the option always_learn_from_arp_request is false, the
1700 above actions will be:
1701
1702 reg9[2] = lookup_nd(inport, ip6.src, nd.sll);
1703 reg9[3] = lookup_nd_ip(inport, ip6.src);
1704 next;
1705
1706
1707 • A priority-0 fallback flow that matches all packets and
1708 applies the action reg9[2] = 1; next; advancing the
1709 packet to the next table.
1710
1711 Ingress Table 2: Neighbor learning
1712
1713 This table adds flows to learn the mac bindings from the ARP and IPv6
1714 Neighbor Solicitation/Advertisement packets if it is needed according
1715 to the lookup results from the previous stage.
1716
1717 reg9[2] will be 1 if the lookup_arp/lookup_nd in the previous table was
1718 successful or skipped, meaning no need to learn mac binding from the
1719 packet.
1720
1721 reg9[3] will be 1 if the lookup_arp_ip/lookup_nd_ip in the previous ta‐
1722 ble was successful or skipped, meaning it is ok to learn mac binding
1723 from the packet (if reg9[2] is 0).
1724
              •      A priority-100 flow with the match reg9[2] == 1 ||
                     reg9[3] == 0 that advances the packet to the next table,
                     as there is no need to learn the neighbor.
1728
1729 • A priority-90 flow with the match arp and applies the ac‐
1730 tion put_arp(inport, arp.spa, arp.sha); next;
1731
1732 • A priority-90 flow with the match nd_na and applies the
1733 action put_nd(inport, nd.target, nd.tll); next;
1734
1735 • A priority-90 flow with the match nd_ns and applies the
1736 action put_nd(inport, ip6.src, nd.sll); next;
1737
1738 Ingress Table 3: IP Input
1739
1740 This table is the core of the logical router datapath functionality. It
1741 contains the following flows to implement very basic IP host function‐
1742 ality.
1743
              •      For distributed logical routers or gateway routers with
                     a gateway port configured with options:gateway_mtu set
                     to a valid integer value, a priority-150 flow with the
                     match inport == LRP && REGBIT_PKT_LARGER &&
                     REGBIT_EGRESS_LOOPBACK == 0, where LRP is the logical
                     router port, which applies the following action for ipv4
                     and ipv6 respectively:
1751
1752 icmp4 {
1753 icmp4.type = 3; /* Destination Unreachable. */
1754 icmp4.code = 4; /* Frag Needed and DF was Set. */
1755 icmp4.frag_mtu = M;
1756 eth.dst = E;
1757 ip4.dst = ip4.src;
1758 ip4.src = I;
1759 ip.ttl = 255;
1760 REGBIT_EGRESS_LOOPBACK = 1;
                         REGBIT_PKT_LARGER = 0;
1762 next(pipeline=ingress, table=0);
1763 };
1764 icmp6 {
1765 icmp6.type = 2;
1766 icmp6.code = 0;
1767 icmp6.frag_mtu = M;
1768 eth.dst = E;
1769 ip6.dst = ip6.src;
1770 ip6.src = I;
1771 ip.ttl = 255;
1772 REGBIT_EGRESS_LOOPBACK = 1;
                         REGBIT_PKT_LARGER = 0;
1774 next(pipeline=ingress, table=0);
1775 };
1776
1777
1778 • For each NAT entry of a distributed logical router (with
1779 distributed gateway router port) of type snat, a prior‐
1780 ity-120 flow with the match inport == P && ip4.src == A
1781 advances the packet to the next pipeline, where P is the
1782 distributed logical router port and A is the external_ip
1783 set in the NAT entry. If A is an IPv6 address, then
1784 ip6.src is used for the match.
1785
                     The above flow is required to handle the routing of the
                     east/west NAT traffic.
1788
1789 • For each BFD port the two following priority-110 flows
1790 are added to manage BFD traffic:
1791
1792 • if ip4.src or ip6.src is any IP address owned by
                            the router port and udp.dst == 3784, the packet
1794 is advanced to the next pipeline stage.
1795
1796 • if ip4.dst or ip6.dst is any IP address owned by
                            the router port and udp.dst == 3784, the han‐
1798 dle_bfd_msg action is executed.
1799
1800 • L3 admission control: A priority-100 flow drops packets
1801 that match any of the following:
1802
1803 • ip4.src[28..31] == 0xe (multicast source)
1804
1805 • ip4.src == 255.255.255.255 (broadcast source)
1806
1807 • ip4.src == 127.0.0.0/8 || ip4.dst == 127.0.0.0/8
1808 (localhost source or destination)
1809
1810 • ip4.src == 0.0.0.0/8 || ip4.dst == 0.0.0.0/8 (zero
1811 network source or destination)
1812
1813 • ip4.src or ip6.src is any IP address owned by the
1814 router, unless the packet was recirculated due to
1815 egress loopback as indicated by REG‐
1816 BIT_EGRESS_LOOPBACK.
1817
1818 • ip4.src is the broadcast address of any IP network
1819 known to the router.
1820
1821 • A priority-100 flow parses DHCPv6 replies from IPv6 pre‐
1822 fix delegation routers (udp.src == 547 && udp.dst ==
                     546). The handle_dhcpv6_reply action is used to send
                     IPv6 prefix delegation messages to the delegation
                     router.
1825
1826 • ICMP echo reply. These flows reply to ICMP echo requests
1827 received for the router’s IP address. Let A be an IP ad‐
1828 dress owned by a router port. Then, for each A that is an
1829 IPv4 address, a priority-90 flow matches on ip4.dst == A
1830 and icmp4.type == 8 && icmp4.code == 0 (ICMP echo re‐
1831 quest). For each A that is an IPv6 address, a priority-90
1832 flow matches on ip6.dst == A and icmp6.type == 128 &&
1833 icmp6.code == 0 (ICMPv6 echo request). The port of the
1834 router that receives the echo request does not matter.
1835 Also, the ip.ttl of the echo request packet is not
1836 checked, so it complies with RFC 1812, section 4.2.2.9.
1837 Flows for ICMPv4 echo requests use the following actions:
1838
1839 ip4.dst <-> ip4.src;
1840 ip.ttl = 255;
1841 icmp4.type = 0;
1842 flags.loopback = 1;
1843 next;
1844
1845
1846 Flows for ICMPv6 echo requests use the following actions:
1847
1848 ip6.dst <-> ip6.src;
1849 ip.ttl = 255;
1850 icmp6.type = 129;
1851 flags.loopback = 1;
1852 next;
1853
1854
1855 • Reply to ARP requests.
1856
1857 These flows reply to ARP requests for the router’s own IP
1858 address. The ARP requests are handled only if the re‐
                     questor’s IP belongs to the same subnet as the logical
1860 router port. For each router port P that owns IP address
1861 A, which belongs to subnet S with prefix length L, and
1862 Ethernet address E, a priority-90 flow matches inport ==
1863 P && arp.spa == S/L && arp.op == 1 && arp.tpa == A (ARP
1864 request) with the following actions:
1865
1866 eth.dst = eth.src;
1867 eth.src = xreg0[0..47];
1868 arp.op = 2; /* ARP reply. */
1869 arp.tha = arp.sha;
1870 arp.sha = xreg0[0..47];
1871 arp.tpa = arp.spa;
1872 arp.spa = A;
1873 outport = inport;
1874 flags.loopback = 1;
1875 output;
1876
1877
1878 For the gateway port on a distributed logical router
1879 (where one of the logical router ports specifies a gate‐
1880 way chassis), the above flows are only programmed on the
1881 gateway port instance on the gateway chassis. This behav‐
1882 ior avoids generation of multiple ARP responses from dif‐
1883 ferent chassis, and allows upstream MAC learning to point
1884 to the gateway chassis.
1885
1886 For the logical router port with the option reside-on-re‐
1887 direct-chassis set (which is centralized), the above
1888 flows are only programmed on the gateway port instance on
1889 the gateway chassis (if the logical router has a distrib‐
1890 uted gateway port). This behavior avoids generation of
1891 multiple ARP responses from different chassis, and allows
1892 upstream MAC learning to point to the gateway chassis.
1893
1894 • Reply to IPv6 Neighbor Solicitations. These flows reply
1895 to Neighbor Solicitation requests for the router’s own
1896 IPv6 address and populate the logical router’s mac bind‐
1897 ing table.
1898
1899 For each router port P that owns IPv6 address A, so‐
1900 licited node address S, and Ethernet address E, a prior‐
                     ity-90 flow matches inport == P && nd_ns && ip6.dst ==
                     {A, S} && nd.target == A with the following actions:
1903
1904 nd_na_router {
1905 eth.src = xreg0[0..47];
1906 ip6.src = A;
1907 nd.target = A;
1908 nd.tll = xreg0[0..47];
1909 outport = inport;
1910 flags.loopback = 1;
1911 output;
1912 };
1913
1914
1915 For the gateway port on a distributed logical router
1916 (where one of the logical router ports specifies a gate‐
1917 way chassis), the above flows replying to IPv6 Neighbor
1918 Solicitations are only programmed on the gateway port in‐
1919 stance on the gateway chassis. This behavior avoids gen‐
1920 eration of multiple replies from different chassis, and
1921 allows upstream MAC learning to point to the gateway
1922 chassis.
1923
1924 • These flows reply to ARP requests or IPv6 neighbor solic‐
1925 itation for the virtual IP addresses configured in the
1926 router for NAT (both DNAT and SNAT) or load balancing.
1927
1928 IPv4: For a configured NAT (both DNAT and SNAT) IP ad‐
1929 dress or a load balancer IPv4 VIP A, for each router port
1930 P with Ethernet address E, a priority-90 flow matches
1931 arp.op == 1 && arp.tpa == A (ARP request) with the fol‐
1932 lowing actions:
1933
1934 eth.dst = eth.src;
1935 eth.src = xreg0[0..47];
1936 arp.op = 2; /* ARP reply. */
1937 arp.tha = arp.sha;
1938 arp.sha = xreg0[0..47];
1939 arp.tpa <-> arp.spa;
1940 outport = inport;
1941 flags.loopback = 1;
1942 output;
1943
1944
1945 IPv4: For a configured load balancer IPv4 VIP, a similar
1946 flow is added with the additional match inport == P.
1947
1948 If the router port P is a distributed gateway router
1949 port, then the is_chassis_resident(P) is also added in
1950 the match condition for the load balancer IPv4 VIP A.
1951
1952 IPv6: For a configured NAT (both DNAT and SNAT) IP ad‐
1953 dress or a load balancer IPv6 VIP A, solicited node ad‐
1954 dress S, for each router port P with Ethernet address E,
1955 a priority-90 flow matches inport == P && nd_ns &&
1956 ip6.dst == {A, S} && nd.target == A with the following
1957 actions:
1958
1959 eth.dst = eth.src;
1960 nd_na {
1961 eth.src = xreg0[0..47];
1962 nd.tll = xreg0[0..47];
1963 ip6.src = A;
1964 nd.target = A;
1965 outport = inport;
1966 flags.loopback = 1;
1967 output;
1968 }
1969
1970
1971 If the router port P is a distributed gateway router
1972 port, then the is_chassis_resident(P) is also added in
1973 the match condition for the load balancer IPv6 VIP A.
1974
1975 For the gateway port on a distributed logical router with
1976 NAT (where one of the logical router ports specifies a
1977 gateway chassis):
1978
1979 • If the corresponding NAT rule cannot be handled in
1980 a distributed manner, then a priority-92 flow is
1981 programmed on the gateway port instance on the
1982 gateway chassis. A priority-91 drop flow is pro‐
1983 grammed on the other chassis when ARP requests/NS
1984 packets are received on the gateway port. This be‐
1985 havior avoids generation of multiple ARP responses
1986 from different chassis, and allows upstream MAC
1987 learning to point to the gateway chassis.
1988
1989 • If the corresponding NAT rule can be handled in a
1990 distributed manner, then this flow is only pro‐
1991 grammed on the gateway port instance where the
1992 logical_port specified in the NAT rule resides.
1993
1994 Some of the actions are different for this case,
1995 using the external_mac specified in the NAT rule
1996 rather than the gateway port’s Ethernet address E:
1997
1998 eth.src = external_mac;
1999 arp.sha = external_mac;
2000
2001
                            or in the case of IPv6 neighbor solicitation:
2003
2004 eth.src = external_mac;
2005 nd.tll = external_mac;
2006
2007
2008 This behavior avoids generation of multiple ARP
2009 responses from different chassis, and allows up‐
2010 stream MAC learning to point to the correct chas‐
2011 sis.
2012
              •      Priority-85 flows which drop the ARP and IPv6 Neighbor
                     Discovery packets.
2015
2016 • A priority-84 flow explicitly allows IPv6 multicast traf‐
2017 fic that is supposed to reach the router pipeline (i.e.,
2018 router solicitation and router advertisement packets).
2019
2020 • A priority-83 flow explicitly drops IPv6 multicast traf‐
2021 fic that is destined to reserved multicast groups.
2022
2023 • A priority-82 flow allows IP multicast traffic if op‐
2024 tions:mcast_relay=’true’, otherwise drops it.
2025
2026 • UDP port unreachable. Priority-80 flows generate ICMP
2027 port unreachable messages in reply to UDP datagrams di‐
2028 rected to the router’s IP address, except in the special
2029 case of gateways, which accept traffic directed to a
2030 router IP for load balancing and NAT purposes.
2031
2032 These flows should not match IP fragments with nonzero
2033 offset.
2034
2035 • TCP reset. Priority-80 flows generate TCP reset messages
2036 in reply to TCP datagrams directed to the router’s IP ad‐
2037 dress, except in the special case of gateways, which ac‐
2038 cept traffic directed to a router IP for load balancing
2039 and NAT purposes.
2040
2041 These flows should not match IP fragments with nonzero
2042 offset.
2043
2044 • Protocol or address unreachable. Priority-70 flows gener‐
2045 ate ICMP protocol or address unreachable messages for
2046 IPv4 and IPv6 respectively in reply to packets directed
2047 to the router’s IP address on IP protocols other than
2048 UDP, TCP, and ICMP, except in the special case of gate‐
2049 ways, which accept traffic directed to a router IP for
2050 load balancing purposes.
2051
2052 These flows should not match IP fragments with nonzero
2053 offset.
2054
2055 • Drop other IP traffic to this router. These flows drop
2056 any other traffic destined to an IP address of this
2057 router that is not already handled by one of the flows
2058 above, which amounts to ICMP (other than echo requests)
2059 and fragments with nonzero offsets. For each IP address A
2060 owned by the router, a priority-60 flow matches ip4.dst
2061 == A or ip6.dst == A and drops the traffic. An exception
2062 is made and the above flow is not added if the router
2063 port’s own IP address is used to SNAT packets passing
2064 through that router.
2065
2066 The flows above handle all of the traffic that might be directed to the
2067 router itself. The following flows (with lower priorities) handle the
2068 remaining traffic, potentially for forwarding:
2069
2070 • Drop Ethernet local broadcast. A priority-50 flow with
2071 match eth.bcast drops traffic destined to the local Eth‐
2072 ernet broadcast address. By definition this traffic
2073 should not be forwarded.
2074
2075 • ICMP time exceeded. For each router port P, whose IP ad‐
2076 dress is A, a priority-40 flow with match inport == P &&
2077 ip.ttl == {0, 1} && !ip.later_frag matches packets whose
2078 TTL has expired, with the following actions to send an
2079 ICMP time exceeded reply for IPv4 and IPv6 respectively:
2080
2081 icmp4 {
2082 icmp4.type = 11; /* Time exceeded. */
2083 icmp4.code = 0; /* TTL exceeded in transit. */
2084 ip4.dst = ip4.src;
2085 ip4.src = A;
2086 ip.ttl = 255;
2087 next;
2088 };
2089 icmp6 {
2090 icmp6.type = 3; /* Time exceeded. */
2091 icmp6.code = 0; /* TTL exceeded in transit. */
2092 ip6.dst = ip6.src;
2093 ip6.src = A;
2094 ip.ttl = 255;
2095 next;
2096 };
2097
2098
              •      TTL discard. A priority-30 flow with match ip.ttl == {0,
                     1} and actions drop; drops other packets whose TTL has
                     expired and that should not receive an ICMP error reply
                     (i.e., fragments with nonzero offset).
2103
              •      Next table. A priority-0 flow matches all packets that
                     aren’t already handled and uses actions next; to feed
                     them to the next table.
2107
2108 Ingress Table 4: UNSNAT
2109
       This is for already established connections’ reverse traffic, i.e.,
       SNAT has already been done in the egress pipeline and now the packet
       has entered the ingress pipeline as part of a reply. It is unSNATted
       here.
2113
2114 Ingress Table 4: UNSNAT on Gateway and Distributed Routers
2115
              •      If the Router (Gateway or Distributed) is configured
                     with load balancers, then the below logical flows are
                     added:

                     For each IPv4 address A, defined as a load balancer VIP
                     with protocol P (and protocol port T if defined), that
                     is also present as an external_ip in the NAT table, a
                     priority-120 logical flow is added with the match ip4 &&
                     ip4.dst == A && P with the action next; to advance the
                     packet to the next table. If the load balancer has
                     protocol port T defined, then the match also has P.dst
                     == T.
2126
2127 The above flows are also added for IPv6 load balancers.
2128
2129 Ingress Table 4: UNSNAT on Gateway Routers
2130
2131 • If the Gateway router has been configured to force SNAT
2132 any previously DNATted packets to B, a priority-110 flow
2133 matches ip && ip4.dst == B or ip && ip6.dst == B with an
                     action ct_snat;.
2135
                     If the Gateway router is configured with
                     lb_force_snat_ip=router_ip, then for every logical
                     router port P attached to the Gateway router with the
                     router IP B, a priority-110 flow is added with the match
                     inport == P && ip4.dst == B or inport == P && ip6.dst ==
                     B with an action ct_snat;.
2142
2143 If the Gateway router has been configured to force SNAT
2144 any previously load-balanced packets to B, a priority-100
2145 flow matches ip && ip4.dst == B or ip && ip6.dst == B
                     with an action ct_snat;.
2147
                     For each NAT configuration in the OVN Northbound
                     database that asks to change the source IP address of a
                     packet from A to B, a priority-90 flow matches ip &&
                     ip4.dst == B or ip && ip6.dst == B with an action
                     ct_snat;. If the NAT rule is of type dnat_and_snat and
                     has stateless=true in the options, then the action would
                     be ip4/6.dst=(B). (A concrete example appears at the end
                     of this subsection.)
2155
2156 A priority-0 logical flow with match 1 has actions next;.
2157
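       As an example of the priority-90 flow above, assume a hypothetical
       NAT rule of type snat that translates the source network 10.0.0.0/24
       to the external IP 172.16.0.100. The UNSNAT flow that handles reply
       traffic would be roughly:

              priority=90, match=(ip && ip4.dst == 172.16.0.100),
              action=(ct_snat;)
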
2158 Ingress Table 4: UNSNAT on Distributed Routers
2159
              •      For each NAT configuration in the OVN Northbound
                     database that asks to change the source IP address of a
                     packet from A to B, a priority-100 flow matches ip &&
                     ip4.dst == B && inport == GW or ip && ip6.dst == B &&
                     inport == GW, where GW is the logical router gateway
                     port, with an action ct_snat;. If the NAT rule is of
                     type dnat_and_snat and has stateless=true in the
                     options, then the action would be ip4/6.dst=(B).
2168
2169 If the NAT rule cannot be handled in a distributed man‐
2170 ner, then the priority-100 flow above is only programmed
2171 on the gateway chassis.
2172
2173 A priority-0 logical flow with match 1 has actions next;.
2174
2175 Ingress Table 5: DEFRAG
2176
       This table sends packets to the connection tracker for tracking and
       defragmentation. It contains a priority-0 flow that simply moves
       traffic to
2179 the next table.
2180
2181 If load balancing rules with only virtual IP addresses are configured
2182 in OVN_Northbound database for a Gateway router, a priority-100 flow is
2183 added for each configured virtual IP address VIP. For IPv4 VIPs the
2184 flow matches ip && ip4.dst == VIP. For IPv6 VIPs, the flow matches ip
2185 && ip6.dst == VIP. The flow applies the action reg0 = VIP; ct_dnat; (or
2186 xxreg0 for IPv6) to send IP packets to the connection tracker for
2187 packet de-fragmentation and to dnat the destination IP for the commit‐
2188 ted connection before sending it to the next table.
2189
2190 If load balancing rules with virtual IP addresses and ports are config‐
2191 ured in OVN_Northbound database for a Gateway router, a priority-110
2192 flow is added for each configured virtual IP address VIP, protocol
2193 PROTO and port PORT. For IPv4 VIPs the flow matches ip && ip4.dst ==
2194 VIP && PROTO && PROTO.dst == PORT. For IPv6 VIPs, the flow matches ip
2195 && ip6.dst == VIP && PROTO && PROTO.dst == PORT. The flow applies the
2196 action reg0 = VIP; reg9[16..31] = PROTO.dst; ct_dnat; (or xxreg0 for
2197 IPv6) to send IP packets to the connection tracker for packet de-frag‐
2198 mentation and to dnat the destination IP for the committed connection
2199 before sending it to the next table.
2200
2201 If ECMP routes with symmetric reply are configured in the OVN_North‐
2202 bound database for a gateway router, a priority-300 flow is added for
2203 each router port on which symmetric replies are configured. The match‐
2204 ing logic for these ports essentially reverses the configured logic of
2205 the ECMP route. So for instance, a route with a destination routing
2206 policy will instead match if the source IP address matches the static
2207 route’s prefix. The flow uses the action ct_next to send IP packets to
2208 the connection tracker for packet de-fragmentation and tracking before
2209 sending it to the next table.
2210
2211 Ingress Table 6: DNAT
2212
       Packets enter the pipeline with a destination IP address that needs to
       be DNATted from a virtual IP address to a real IP address. Packets in
       the reverse direction need to be unDNATted.
2216
2217 Ingress Table 6: Load balancing DNAT rules
2218
       The following load balancing DNAT flows are added for a Gateway router
       or a router with a gateway port. These flows are programmed only on
       the gateway chassis. These flows do not get programmed for load
       balancers with IPv6 VIPs.
2223
              •      If controller_event has been enabled for all the
                     configured load balancing rules for a Gateway router or
                     a router with a gateway port in the OVN_Northbound
                     database that do not have configured backends, a
                     priority-130 flow is added to trigger ovn-controller
                     events whenever the chassis receives a packet for that
                     particular VIP. If the event-elb meter has been
                     previously created, it will be associated with the
                     empty_lb logical flow.
2232
2233 • For all the configured load balancing rules for a Gateway
2234 router or Router with gateway port in OVN_Northbound
2235 database that includes a L4 port PORT of protocol P and
2236 IPv4 or IPv6 address VIP, a priority-120 flow that
2237 matches on ct.new && ip && reg0 == VIP && P &&
2238 reg9[16..31] == PORT (xxreg0 == VIP in the IPv6 case)
2239 with an action of ct_lb(args), where args contains comma
2240 separated IPv4 or IPv6 addresses (and optional port num‐
2241 bers) to load balance to. If the router is configured to
2242 force SNAT any load-balanced packets, the above action
2243 will be replaced by flags.force_snat_for_lb = 1;
2244 ct_lb(args);. If the load balancing rule is configured
2245 with skip_snat set to true, the above action will be re‐
2246 placed by flags.skip_snat_for_lb = 1; ct_lb(args);. If
2247 health check is enabled, then args will only contain
2248 those endpoints whose service monitor status entry in
2249 OVN_Southbound db is either online or empty.
2250
2251 The previous table lr_in_defrag sets the register reg0
2252 (or xxreg0 for IPv6) and does ct_dnat. Hence for estab‐
2253 lished traffic, this table just advances the packet to
2254 the next stage.
2255
2256 • For all the configured load balancing rules for a router
2257 in OVN_Northbound database that includes a L4 port PORT
2258 of protocol P and IPv4 or IPv6 address VIP, a prior‐
2259 ity-120 flow that matches on ct.est && ip4 && reg0 == VIP
2260 && P && reg9[16..31] == PORT (ip6 and xxreg0 == VIP in
2261 the IPv6 case) with an action of next;. If the router is
2262 configured to force SNAT any load-balanced packets, the
2263 above action will be replaced by flags.force_snat_for_lb
2264 = 1; next;. If the load balancing rule is configured with
2265 skip_snat set to true, the above action will be replaced
2266 by flags.skip_snat_for_lb = 1; next;.
2267
2268 The previous table lr_in_defrag sets the register reg0
2269 (or xxreg0 for IPv6) and does ct_dnat. Hence for estab‐
2270 lished traffic, this table just advances the packet to
2271 the next stage.
2272
2273 • For all the configured load balancing rules for a router
2274 in OVN_Northbound database that includes just an IP ad‐
2275 dress VIP to match on, a priority-110 flow that matches
2276 on ct.new && ip4 && reg0 == VIP (ip6 and xxreg0 == VIP in
2277 the IPv6 case) with an action of ct_lb(args), where args
2278 contains comma separated IPv4 or IPv6 addresses. If the
2279 router is configured to force SNAT any load-balanced
2280 packets, the above action will be replaced by
2281 flags.force_snat_for_lb = 1; ct_lb(args);. If the load
2282 balancing rule is configured with skip_snat set to true,
2283 the above action will be replaced by
2284 flags.skip_snat_for_lb = 1; ct_lb(args);.
2285
2286 The previous table lr_in_defrag sets the register reg0
2287 (or xxreg0 for IPv6) and does ct_dnat. Hence for estab‐
2288 lished traffic, this table just advances the packet to
2289 the next stage.
2290
2291 • For all the configured load balancing rules for a router
2292 in OVN_Northbound database that includes just an IP ad‐
2293 dress VIP to match on, a priority-110 flow that matches
2294 on ct.est && ip4 && reg0 == VIP (or ip6 and xxreg0 ==
2295 VIP) with an action of next;. If the router is configured
2296 to force SNAT any load-balanced packets, the above action
2297 will be replaced by flags.force_snat_for_lb = 1; next;.
2298 If the load balancing rule is configured with skip_snat
2299 set to true, the above action will be replaced by
2300 flags.skip_snat_for_lb = 1; next;.
2301
2302 The previous table lr_in_defrag sets the register reg0
2303 (or xxreg0 for IPv6) and does ct_dnat. Hence for estab‐
2304 lished traffic, this table just advances the packet to
2305 the next stage.
2306
              •      If the load balancer is created with the --reject option
                     and it has no active backends, a TCP reset segment (for
                     tcp) or an ICMP port unreachable packet (for all other
                     kinds of traffic) will be sent whenever an incoming
                     packet is received for this load balancer. Note that
                     using the --reject option will disable the empty_lb SB
                     controller event for this load balancer.
2314
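       As a concrete example of the priority-120 flow above, assume the same
       hypothetical load balancer VIP 172.16.0.10, protocol tcp and port 80,
       with backends 10.0.0.2:80 and 10.0.0.3:80. The flow handling new
       connections would be roughly:

              priority=120,
              match=(ct.new && ip && reg0 == 172.16.0.10 && tcp &&
                     reg9[16..31] == 80),
              action=(ct_lb(10.0.0.2:80,10.0.0.3:80);)
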
2315 Ingress Table 6: DNAT on Gateway Routers
2316
              •      For each NAT configuration in the OVN Northbound
                     database that asks to change the destination IP address
                     of a packet from A to B, a priority-100 flow matches ip
                     && ip4.dst == A or ip && ip6.dst == A with an action
                     flags.loopback = 1; ct_dnat(B); (a concrete example
                     appears at the end of this subsection). If the Gateway
                     router is configured to force SNAT any DNATed packet,
                     the above action will be replaced by
                     flags.force_snat_for_dnat = 1; flags.loopback = 1;
                     ct_dnat(B);. If the NAT rule is of type dnat_and_snat
                     and has stateless=true in the options, then the action
                     would be ip4/6.dst=(B).
2327
                     If the NAT rule has allowed_ext_ips configured, then
                     there is an additional match ip4.src == allowed_ext_ips.
                     Similarly, for IPv6, the match would be ip6.src ==
                     allowed_ext_ips.
2332
                     If the NAT rule has exempted_ext_ips set, then there is
                     an additional flow configured at priority 101. The flow
                     matches if the source IP is an exempted_ext_ip and the
                     action is next;. This flow is used to bypass the ct_dnat
                     action for a packet originating from exempted_ext_ips.
2338
2339 • A priority-0 logical flow with match 1 has actions next;.
2340
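       As an example of the priority-100 flow above, assume a hypothetical
       dnat_and_snat rule that maps the external IP 172.16.0.100 to the
       logical IP 10.0.0.5. The DNAT flow on the Gateway router would be
       roughly:

              priority=100, match=(ip && ip4.dst == 172.16.0.100),
              action=(flags.loopback = 1; ct_dnat(10.0.0.5);)
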
2341 Ingress Table 6: DNAT on Distributed Routers
2342
2343 On distributed routers, the DNAT table only handles packets with desti‐
2344 nation IP address that needs to be DNATted from a virtual IP address to
2345 a real IP address. The unDNAT processing in the reverse direction is
2346 handled in a separate table in the egress pipeline.
2347
2348 • For each configuration in the OVN Northbound database,
2349 that asks to change the destination IP address of a
2350 packet from A to B, a priority-100 flow matches ip &&
2351 ip4.dst == B && inport == GW, where GW is the logical
2352 router gateway port, with an action ct_dnat(B);. The
2353 match will include ip6.dst == B in the IPv6 case. If the
2354 NAT rule is of type dnat_and_snat and has stateless=true
2355 in the options, then the action would be ip4/6.dst=(B).
2356
2357 If the NAT rule cannot be handled in a distributed man‐
2358 ner, then the priority-100 flow above is only programmed
2359 on the gateway chassis.
2360
                     If the NAT rule has allowed_ext_ips configured, then
                     there is an additional match ip4.src == allowed_ext_ips.
                     Similarly, for IPv6, the match would be ip6.src ==
                     allowed_ext_ips.
2365
                     If the NAT rule has exempted_ext_ips set, then there is
                     an additional flow configured at priority 101. The flow
                     matches if the source IP is an exempted_ext_ip and the
                     action is next;. This flow is used to bypass the ct_dnat
                     action for a packet originating from exempted_ext_ips.
2371
2372 A priority-0 logical flow with match 1 has actions next;.
2373
2374 Ingress Table 7: ECMP symmetric reply processing
2375
2376 • If ECMP routes with symmetric reply are configured in the
2377 OVN_Northbound database for a gateway router, a prior‐
2378 ity-100 flow is added for each router port on which sym‐
2379 metric replies are configured. The matching logic for
2380 these ports essentially reverses the configured logic of
2381 the ECMP route. So for instance, a route with a destina‐
2382 tion routing policy will instead match if the source IP
2383 address matches the static route’s prefix. The flow uses
                     the action ct_commit { ct_label.ecmp_reply_eth =
                     eth.src; ct_label.ecmp_reply_port = K; }; next; to
                     commit the connection and store eth.src and the ECMP
                     reply port binding tunnel key K in the ct_label.
2388
2389 Ingress Table 8: IPv6 ND RA option processing
2390
2391 • A priority-50 logical flow is added for each logical
2392 router port configured with IPv6 ND RA options which
2393 matches IPv6 ND Router Solicitation packet and applies
2394 the action put_nd_ra_opts and advances the packet to the
2395 next table.
2396
                         reg0[5] = put_nd_ra_opts(options); next;
2398
2399
2400 For a valid IPv6 ND RS packet, this transforms the packet
2401 into an IPv6 ND RA reply and sets the RA options to the
2402 packet and stores 1 into reg0[5]. For other kinds of
2403 packets, it just stores 0 into reg0[5]. Either way, it
2404 continues to the next table.
2405
2406 • A priority-0 logical flow with match 1 has actions next;.
2407
2408 Ingress Table 9: IPv6 ND RA responder
2409
       This table implements an IPv6 ND RA responder for the IPv6 ND RA
       replies generated by the previous table.
2412
2413 • A priority-50 logical flow is added for each logical
2414 router port configured with IPv6 ND RA options which
2415 matches IPv6 ND RA packets and reg0[5] == 1 and responds
2416 back to the inport after applying these actions. If
2417 reg0[5] is set to 1, it means that the action
2418 put_nd_ra_opts was successful.
2419
2420 eth.dst = eth.src;
2421 eth.src = E;
2422 ip6.dst = ip6.src;
2423 ip6.src = I;
2424 outport = P;
2425 flags.loopback = 1;
2426 output;
2427
2428
2429 where E is the MAC address and I is the IPv6 link local
2430 address of the logical router port.
2431
2432 (This terminates packet processing in ingress pipeline;
2433 the packet does not go to the next ingress table.)
2434
2435 • A priority-0 logical flow with match 1 has actions next;.
2436
2437 Ingress Table 10: IP Routing
2438
2439 A packet that arrives at this table is an IP packet that should be
2440 routed to the address in ip4.dst or ip6.dst. This table implements IP
2441 routing, setting reg0 (or xxreg0 for IPv6) to the next-hop IP address
2442 (leaving ip4.dst or ip6.dst, the packet’s final destination, unchanged)
2443 and advances to the next table for ARP resolution. It also sets reg1
2444 (or xxreg1) to the IP address owned by the selected router port
2445 (ingress table ARP Request will generate an ARP request, if needed,
2446 with reg0 as the target protocol address and reg1 as the source proto‐
2447 col address).
2448
       For ECMP routes, i.e., multiple static routes with the same policy and
       prefix but different nexthops, the above actions are deferred to the
       next table. This table, instead, is responsible for determining the
       ECMP group id and selecting a member id within the group based on
       5-tuple hashing. It
2453 stores group id in reg8[0..15] and member id in reg8[16..31]. This step
2454 is skipped if the traffic going out the ECMP route is reply traffic,
2455 and the ECMP route was configured to use symmetric replies. Instead,
2456 the stored ct_label value is used to choose the destination. The least
2457 significant 48 bits of the ct_label tell the destination MAC address to
2458 which the packet should be sent. The next 16 bits tell the logical
2459 router port on which the packet should be sent. These values in the
2460 ct_label are set when the initial ingress traffic is received over the
2461 ECMP route.
2462
2463 This table contains the following logical flows:
2464
2465 • Priority-550 flow that drops IPv6 Router Solicitation/Ad‐
2466 vertisement packets that were not processed in previous
2467 tables.
2468
2469 • Priority-500 flows that match IP multicast traffic des‐
2470 tined to groups registered on any of the attached
                     switches and set outport to the associated multicast
2472 group that will eventually flood the traffic to all in‐
2473 terested attached logical switches. The flows also decre‐
2474 ment TTL.
2475
2476 • Priority-450 flow that matches unregistered IP multicast
2477 traffic and sets outport to the MC_STATIC multicast
2478 group, which ovn-northd populates with the logical ports
2479 that have options :mcast_flood=’true’. If no router ports
2480 are configured to flood multicast traffic the packets are
2481 dropped.
2482
2483 • IPv4 routing table. For each route to IPv4 network N with
2484 netmask M, on router port P with IP address A and Ether‐
2485 net address E, a logical flow with match ip4.dst == N/M,
2486 whose priority is the number of 1-bits in M, has the fol‐
2487 lowing actions:
2488
2489 ip.ttl--;
2490 reg8[0..15] = 0;
2491 reg0 = G;
2492 reg1 = A;
2493 eth.src = E;
2494 outport = P;
2495 flags.loopback = 1;
2496 next;
2497
2498
                     (The IP Input table already verified that ip.ttl--; will
                     not yield a TTL exceeded error.)

                     If the route has a gateway, G is the gateway IP address.
                     Instead, if the route is from a configured static route,
                     G is the next hop IP address. Else it is ip4.dst. (A
                     concrete example appears after this list.)
2505
2506 • IPv6 routing table. For each route to IPv6 network N with
2507 netmask M, on router port P with IP address A and Ether‐
2508 net address E, a logical flow with match in CIDR notation
2509 ip6.dst == N/M, whose priority is the integer value of M,
2510 has the following actions:
2511
2512 ip.ttl--;
2513 reg8[0..15] = 0;
2514 xxreg0 = G;
2515 xxreg1 = A;
2516 eth.src = E;
                         outport = P;
2518 flags.loopback = 1;
2519 next;
2520
2521
                     (The IP Input table already verified that ip.ttl--; will
                     not yield a TTL exceeded error.)
2524
2525 If the route has a gateway, G is the gateway IP address.
2526 Instead, if the route is from a configured static route,
2527 G is the next hop IP address. Else it is ip6.dst.
2528
2529 If the address A is in the link-local scope, the route
2530 will be limited to sending on the ingress port.
2531
              •      ECMP routes are grouped by policy and prefix. A unique
                     non-zero id is assigned to each group, and each member
                     is also assigned a unique non-zero id within each group.
2536
2537 For each IPv4/IPv6 ECMP group with group id GID and mem‐
2538 ber ids MID1, MID2, ..., a logical flow with match in
2539 CIDR notation ip4.dst == N/M, or ip6.dst == N/M, whose
2540 priority is the integer value of M, has the following ac‐
2541 tions:
2542
2543 ip.ttl--;
2544 flags.loopback = 1;
2545 reg8[0..15] = GID;
2546 select(reg8[16..31], MID1, MID2, ...);
2547
2548
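       As an example of the IPv4 routing table entries above, assume a
       hypothetical router port lrp1 with IP address 192.168.1.1/24 and
       Ethernet address 00:00:00:00:ff:01. The directly connected route
       would produce roughly this priority-24 flow, where G is ip4.dst
       because the route has no gateway:

              priority=24, match=(ip4.dst == 192.168.1.0/24),
              action=(ip.ttl--; reg8[0..15] = 0; reg0 = ip4.dst;
                      reg1 = 192.168.1.1; eth.src = 00:00:00:00:ff:01;
                      outport = "lrp1"; flags.loopback = 1; next;)
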
2549 Ingress Table 11: IP_ROUTING_ECMP
2550
2551 This table implements the second part of IP routing for ECMP routes
       following the previous table. If a packet matched an ECMP group in the
2553 previous table, this table matches the group id and member id stored
2554 from the previous table, setting reg0 (or xxreg0 for IPv6) to the next-
2555 hop IP address (leaving ip4.dst or ip6.dst, the packet’s final destina‐
2556 tion, unchanged) and advances to the next table for ARP resolution. It
2557 also sets reg1 (or xxreg1) to the IP address owned by the selected
2558 router port (ingress table ARP Request will generate an ARP request, if
2559 needed, with reg0 as the target protocol address and reg1 as the source
2560 protocol address).
2561
2562 This processing is skipped for reply traffic being sent out of an ECMP
2563 route if the route was configured to use symmetric replies.
2564
2565 This table contains the following logical flows:
2566
              •      A priority-150 flow that matches reg8[0..15] == 0 with
                     action next;, so that packets of non-ECMP routes
                     directly bypass the ECMP processing.
2570
2571 • For each member with ID MID in each ECMP group with ID
2572 GID, a priority-100 flow with match reg8[0..15] == GID &&
                     reg8[16..31] == MID has the following actions:
2574
2575 [xx]reg0 = G;
2576 [xx]reg1 = A;
2577 eth.src = E;
2578 outport = P;
2579
2580
2581 Ingress Table 12: Router policies
2582
2583 This table adds flows for the logical router policies configured on the
2584 logical router. Please see the OVN_Northbound database Logi‐
2585 cal_Router_Policy table documentation in ovn-nb for supported actions.
2586
2587 • For each router policy configured on the logical router,
2588 a logical flow is added with specified priority, match
2589 and actions.
2590
2591 • If the policy action is reroute with 2 or more nexthops
2592 defined, then the logical flow is added with the follow‐
2593 ing actions:
2594
2595 reg8[0..15] = GID;
                         select(reg8[16..31], 1, ..., n);
2597
2598
                     where GID is the ECMP group id generated by ovn-northd
                     for this policy and n is the number of nexthops. The
                     select action selects one of the nexthop member ids,
                     stores it in the register reg8[16..31], and advances the
                     packet to the next stage.
2604
              •      If the policy action is reroute with just one nexthop,
                     then the logical flow is added with the following
                     actions (a concrete example appears at the end of this
                     section):
2608
2609 [xx]reg0 = H;
2610 eth.src = E;
2611 outport = P;
2612 reg8[0..15] = 0;
2613 flags.loopback = 1;
2614 next;
2615
2616
2617 where H is the nexthop defined in the router policy, E
2618 is the ethernet address of the logical router port from
2619 which the nexthop is reachable and P is the logical
2620 router port from which the nexthop is reachable.
2621
2622 • If a router policy has the option pkt_mark=m set and if
2623 the action is not drop, then the action also includes
2624 pkt.mark = m to mark the packet with the marker m.
2625
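       To illustrate the single-nexthop reroute case above, assume a
       hypothetical router policy at priority 100 that matches ip4.src ==
       10.0.0.0/24 and reroutes to the nexthop 172.16.0.2, reachable via the
       router port lrp-ext with Ethernet address 00:00:00:00:ff:02. The
       resulting flow would be roughly:

              priority=100, match=(ip4.src == 10.0.0.0/24),
              action=(reg0 = 172.16.0.2; eth.src = 00:00:00:00:ff:02;
                      outport = "lrp-ext"; reg8[0..15] = 0;
                      flags.loopback = 1; next;)
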
2626 Ingress Table 13: ECMP handling for router policies
2627
2628 This table handles the ECMP for the router policies configured with
2629 multiple nexthops.
2630
2631 • A priority-150 flow is added to advance the packet to the
2632 next stage if the ECMP group id register reg8[0..15] is
2633 0.
2634
2635 • For each ECMP reroute router policy with multiple nex‐
2636 thops, a priority-100 flow is added for each nexthop H
2637 with the match reg8[0..15] == GID && reg8[16..31] == M
2638 where GID is the router policy group id generated by
2639 ovn-northd and M is the member id of the nexthop H gener‐
2640 ated by ovn-northd. The following actions are added to
2641 the flow:
2642
                         [xx]reg0 = H;
                         eth.src = E;
                         outport = P;
                         flags.loopback = 1;
                         next;
2648
2649
2650 where H is the nexthop defined in the router policy, E
2651 is the ethernet address of the logical router port from
2652 which the nexthop is reachable and P is the logical
2653 router port from which the nexthop is reachable.
2654
2655 Ingress Table 14: ARP/ND Resolution
2656
2657 Any packet that reaches this table is an IP packet whose next-hop IPv4
2658 address is in reg0 or IPv6 address is in xxreg0. (ip4.dst or ip6.dst
2659 contains the final destination.) This table resolves the IP address in
2660 reg0 (or xxreg0) into an output port in outport and an Ethernet address
2661 in eth.dst, using the following flows:
2662
2663 • A priority-500 flow that matches IP multicast traffic
2664 that was allowed in the routing pipeline. For this kind
2665 of traffic the outport was already set so the flow just
2666 advances to the next table.
2667
2668 • Static MAC bindings. MAC bindings can be known statically
2669 based on data in the OVN_Northbound database. For router
2670 ports connected to logical switches, MAC bindings can be
2671 known statically from the addresses column in the Logi‐
2672 cal_Switch_Port table. For router ports connected to
2673 other logical routers, MAC bindings can be known stati‐
2674 cally from the mac and networks column in the Logi‐
                     cal_Router_Port table. (Note: the flow is NOT installed
                     for the IP addresses that belong to a neighbor logical
                     router port if the current router has
                     options:dynamic_neigh_routers set to true.)
2679
                     For each IPv4 address A whose host is known to have
                     Ethernet address E on router port P, a priority-100 flow
                     with match outport == P && reg0 == A has actions eth.dst
                     = E; next; (a concrete example appears at the end of
                     this list).
2684
                     For each virtual IP A configured on a logical port of
                     type virtual with its virtual parent set in its
                     corresponding Port_Binding record, where the virtual
                     parent has Ethernet address E and the virtual IP is
                     reachable via the router port P, a priority-100 flow
                     with match outport == P && reg0 == A has actions eth.dst
                     = E; next;.
2692
                     For each virtual IP A configured on a logical port of
                     type virtual with its virtual parent not set in its
                     corresponding Port_Binding record, where the virtual IP
                     A is reachable via the router port P, a priority-100
                     flow with match outport == P && reg0 == A has actions
                     eth.dst = 00:00:00:00:00:00; next;. This flow is added
                     so that the ARP is always resolved for the virtual IP A
                     by generating an ARP request, rather than by consulting
                     the MAC_Binding table, which can have an incorrect value
                     for the virtual IP A.
2702
                     For each IPv6 address A whose host is known to have
                     Ethernet address E on router port P, a priority-100 flow
                     with match outport == P && xxreg0 == A has actions
                     eth.dst = E; next;.
2707
                     For each logical router port with an IPv4 address A and
                     a mac address of E that is reachable via a different
                     logical router port P, a priority-100 flow with match
                     outport == P && reg0 == A has actions eth.dst = E;
                     next;.
2712
2713 For each logical router port with an IPv6 address A and a
2714 mac address of E that is reachable via a different logi‐
2715 cal router port P, a priority-100 flow with match outport
2716 == P && xxreg0 == A has actions eth.dst = E; next;.
2717
2718 • Static MAC bindings from NAT entries. MAC bindings can
2719 also be known for the entries in the NAT table. The fol‐
2720 lowing flows are programmed for distributed logical rout‐
2721 ers, i.e., those with a distributed router port.
2722
2723 For each row in the NAT table with IPv4 address A in the
2724 external_ip column, a priority-100 flow with the match
2725 outport == P && reg0 == A has actions eth.dst = E; next;,
2726 where P is the distributed logical router port and E is
2727 the Ethernet address set in the external_mac column of
2728 the NAT table for rules of type dnat_and_snat, or other‐
2729 wise the Ethernet address of the distributed logical
2730 router port. Note that if the external_ip is not within a
2731 subnet on the owning logical router, then OVN will only
2732 create ARP resolution flows if options:add_route is set
2733 to true. Otherwise, no ARP resolution flows will be
2734 added.
2735
2736 For IPv6 NAT entries, the same flows are added, but using
2737 the register xxreg0 for the match.
2738
2739 • Traffic whose IP destination is an address owned by the
2740 router should be dropped. Such traffic is normally
2741 dropped in ingress table IP Input, except for IPs that
2742 are also shared with SNAT rules. However, if no un-SNAT
2743 operation has happened successfully by this point in the
2744 pipeline and the destination IP of the packet is still a
2745 router owned IP, then the packets can be safely dropped
2746 here.
2747
2748 A priority-1 logical flow with match ip4.dst == {..}
2749 matches on traffic destined to router owned IPv4 ad‐
2750 dresses which are also SNAT IPs. This flow has action
2751 drop;.
2752
2753 A priority-1 logical flow with match ip6.dst == {..}
2754 matches on traffic destined to router owned IPv6 ad‐
2755 dresses which are also SNAT IPs. This flow has action
2756 drop;.
2757
2758 • Dynamic MAC bindings. These flows resolve MAC-to-IP bind‐
2759 ings that have become known dynamically through ARP or
2760 neighbor discovery. (The ingress table ARP Request will
2761 issue an ARP or neighbor solicitation request for cases
2762 where the binding is not yet known.)
2763
2764 A priority-0 logical flow with match ip4 has actions
2765 get_arp(outport, reg0); next;.
2766
2767 A priority-0 logical flow with match ip6 has actions
2768 get_nd(outport, xxreg0); next;.
2769
2770 • For a distributed gateway LRP with redirect-type set to
2771 bridged, a priority-50 flow with match outport ==
2772 "ROUTER_PORT" && !is_chassis_resident("cr-ROUTER_PORT")
2773 has actions eth.dst = E; next;, where E is the Ethernet
2774 address of the logical router port.
2775
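As an example of the static MAC bindings above (hypothetical
names and addresses): if a logical switch port whose addresses
column contains "00:00:00:00:00:05 10.0.0.5" is reachable via
router port lr0-ls0, the corresponding flow would be roughly:

    match:   outport == "lr0-ls0" && reg0 == 10.0.0.5
    actions: eth.dst = 00:00:00:00:00:05; next;
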
2776 Ingress Table 15: Check packet length
2777
2778 For distributed logical routers or gateway routers whose gateway port
2779 has options:gateway_mtu set to a valid integer value, this table adds
2780 a priority-50 logical flow with the match outport == GW_PORT, where
2781 GW_PORT is the gateway router port. The flow applies the action
2782 check_pkt_larger and advances the packet to the next table:
2783
2784 REGBIT_PKT_LARGER = check_pkt_larger(L); next;
2785
2786
2787 where L is the packet length to check for. If the packet is larger than
2788 L, it stores 1 in the register bit REGBIT_PKT_LARGER. The value of L is
2789 taken from options:gateway_mtu column of Logical_Router_Port row.
2790
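For example, with options:gateway_mtu=1500 set on a
hypothetical gateway port lr0-public, the flow would be
roughly:

    match:   outport == "lr0-public"
    actions: REGBIT_PKT_LARGER = check_pkt_larger(1500); next;
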
2791 This table adds one priority-0 fallback flow that matches all packets
2792 and advances to the next table.
2793
2794 Ingress Table 16: Handle larger packets
2795
2796 For distributed logical routers or gateway routers whose gateway port
2797 has options:gateway_mtu set to a valid integer value, this table adds
2798 the following priority-150 logical flow for each logical router port
2799 with the match inport == LRP && outport == GW_PORT && REG‐
2800 BIT_PKT_LARGER && !REGBIT_EGRESS_LOOPBACK, where LRP is the logical
2801 router port and GW_PORT is the gateway port, and applies the following
2802 action for IPv4 and IPv6 respectively:
2803
2804 icmp4 {
2805 icmp4.type = 3; /* Destination Unreachable. */
2806 icmp4.code = 4; /* Frag Needed and DF was Set. */
2807 icmp4.frag_mtu = M;
2808 eth.dst = E;
2809 ip4.dst = ip4.src;
2810 ip4.src = I;
2811 ip.ttl = 255;
2812 REGBIT_EGRESS_LOOPBACK = 1;
2813 REGBIT_PKT_LARGER = 0;
2814 next(pipeline=ingress, table=0);
2815 };
2816 icmp6 {
2817 icmp6.type = 2;
2818 icmp6.code = 0;
2819 icmp6.frag_mtu = M;
2820 eth.dst = E;
2821 ip6.dst = ip6.src;
2822 ip6.src = I;
2823 ip.ttl = 255;
2824 REGBIT_EGRESS_LOOPBACK = 1;
2825 REGBIT_PKT_LARGER = 0;
2826 next(pipeline=ingress, table=0);
2827 };
2828
2829
2830 • Where M is the fragment MTU: the value taken from the
2831 options:gateway_mtu column of the Logical_Router_Port
2832 row, minus 58.
2833
2834 • E is the Ethernet address of the logical router port.
2835
2836 • I is the IPv4/IPv6 address of the logical router port.
2837
2838 This table adds one priority-0 fallback flow that matches all packets
2839 and advances to the next table.
2840
2841 Ingress Table 17: Gateway Redirect
2842
2843 For distributed logical routers where one or more of the logical router
2844 ports specifies a gateway chassis, this table redirects certain packets
2845 to the distributed gateway port instances on the gateway chassis.
2846 This table has the following flows:
2847
2848 • For each NAT rule in the OVN Northbound database that can
2849 be handled in a distributed manner, a priority-100 logi‐
2850 cal flow is added with match ip4.src == B && outport ==
2851 GW && is_chassis_resident(P), where GW is the logical
2852 router distributed gateway port and P is the NAT logical
2853 port. IP traffic matching the above rule is handled lo‐
2854 cally, setting reg1 to C and eth.src to D, where C is the
2855 NAT external IP and D is the NAT external MAC.
2856
2857 • For each NAT rule in the OVN Northbound database that can
2858 be handled in a distributed manner, a priority-80 logical
2859 flow with a drop action is added if the NAT logical port
2860 is a virtual port not yet claimed by any chassis.
2861
2862 • A priority-50 logical flow with match outport == GW has
2863 actions outport = CR; next;, where GW is the logical
2864 router distributed gateway port and CR is the chassisre‐
2865 direct port representing its instance on the gateway
2866 chassis. (An example appears after this list.)
2867
2868 • A priority-0 logical flow with match 1 has actions next;.
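
For example, for a hypothetical distributed gateway port
lr0-public whose chassisredirect port is cr-lr0-public, the
priority-50 flow above would be roughly:

    match:   outport == "lr0-public"
    actions: outport = "cr-lr0-public"; next;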
2869
2870 Ingress Table 18: ARP Request
2871
2872 In the common case where the Ethernet destination has been resolved,
2873 this table outputs the packet. Otherwise, it composes and sends an ARP
2874 or IPv6 Neighbor Solicitation request. It holds the following flows:
2875
2876 • Unknown MAC address. A priority-100 flow for IPv4 packets
2877 with match eth.dst == 00:00:00:00:00:00 has the following
2878 actions:
2879
2880 arp {
2881 eth.dst = ff:ff:ff:ff:ff:ff;
2882 arp.spa = reg1;
2883 arp.tpa = reg0;
2884 arp.op = 1; /* ARP request. */
2885 output;
2886 };
2887
2888
2889 Unknown MAC address. For each IPv6 static route associ‐
2890 ated with the router whose nexthop IP is G, a prior‐
2891 ity-200 flow for IPv6 packets with match eth.dst ==
2892 00:00:00:00:00:00 && xxreg0 == G with the following ac‐
2893 tions is added:
2894
2895 nd_ns {
2896 eth.dst = E;
2897 ip6.dst = I;
2898 nd.target = G;
2899 output;
2900 };
2901
2902
2903 Where E is the multicast MAC derived from the Gateway IP
2904 and I is the solicited-node multicast address correspond‐
2905 ing to G. (A worked example appears after this list.)
2906
2907 Unknown MAC address. A priority-100 flow for IPv6 packets
2908 with match eth.dst == 00:00:00:00:00:00 has the following
2909 actions:
2910
2911 nd_ns {
2912 nd.target = xxreg0;
2913 output;
2914 };
2915
2916
2917 (Ingress table IP Routing initialized reg1 with the IP
2918 address owned by outport and (xx)reg0 with the next-hop
2919 IP address.)
2920
2921 The IP packet that triggers the ARP/IPv6 NS request is
2922 dropped.
2923
2924 • Known MAC address. A priority-0 flow with match 1 has ac‐
2925 tions output;.
2926
2927 Egress Table 0: UNDNAT
2928
2929 This table handles reverse traffic for already established connec‐
2930 tions; i.e., DNAT has already been done in the ingress pipeline and the
2931 packet has now entered the egress pipeline as part of a reply. This
2932 traffic is unDNATed here.
2933
2934 • For all the configured load balancing rules for a router
2935 with gateway port in OVN_Northbound database that in‐
2936 cludes an IPv4 address VIP, for every backend IPv4 ad‐
2937 dress B defined for the VIP a priority-120 flow is pro‐
2938 grammed on the gateway chassis, matching ip && ip4.src ==
2939 B && outport == GW, where GW is the logical router gate‐
2940 way port, with an action ct_dnat;. If backend B is also
2941 configured with L4 port PORT of protocol P, the match
2942 also includes P.src == PORT. These flows are not added
2943 for load balancers with IPv6 VIPs. (Example below.)
2944
2945 If the router is configured to force SNAT any load-bal‐
2946 anced packets, the above action will be replaced by
2947 flags.force_snat_for_lb = 1; ct_dnat;.
2948
2949 • For each configuration in the OVN Northbound database
2950 that asks to change the destination IP address of a
2951 packet from an IP address of A to B, a priority-100 flow
2952 matches ip && ip4.src == B && outport == GW, where GW is
2953 the logical router gateway port, with an action ct_dnat;.
2954 If the NAT rule is of type dnat_and_snat and has state‐
2955 less=true in the options, then the action would be
2956 ip4/6.src= (A).
2957
2958 If the NAT rule cannot be handled in a distributed man‐
2959 ner, then the priority-100 flow above is only programmed
2960 on the gateway chassis.
2961
2962 If the NAT rule can be handled in a distributed manner,
2963 then there is an additional action eth.src = EA;, where
2964 EA is the ethernet address associated with the IP address
2965 A in the NAT rule. This allows upstream MAC learning to
2966 point to the correct chassis.
2967
2968 • For all IP packets, a priority-50 flow with an action
2969 flags.loopback = 1; ct_dnat;.
2970
2971 • A priority-0 logical flow with match 1 has actions next;.
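
For example, for a hypothetical load balancer backend
10.0.0.5 listening on TCP port 8080, behind gateway port
lr0-public, the priority-120 flow would be roughly:

    match:   ip && ip4.src == 10.0.0.5 && tcp.src == 8080
             && outport == "lr0-public"
    actions: ct_dnat;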
2972
2973 Egress Table 1: Post UNDNAT
2974
2975 • A priority-50 logical flow is added that commits any un‐
2976 tracked flows from the previous table lr_out_undnat. This
2977 flow matches on ct.new && ip with action ct_commit { };
2978 next;.
2979
2980 • A priority-0 logical flow with match 1 has actions next;.
2981
2982 Egress Table 2: SNAT
2983
2984 Packets that are configured to be SNATed get their source IP address
2985 changed based on the configuration in the OVN Northbound database.
2986
2987 • A priority-120 flow to advance the IPv6 Neighbor Solici‐
2988 tation packet to the next table, skipping SNAT. In the
2989 case where ovn-controller injects an IPv6 Neighbor Solic‐
2990 itation packet (for the nd_ns action), we don’t want the
2991 packet to go through conntrack.
2992
2993 Egress Table 2: SNAT on Gateway Routers
2994
2995 • If the Gateway router in the OVN Northbound database has
2996 been configured to force SNAT a packet (that has been
2997 previously DNATted) to B, a priority-100 flow matches
2998 flags.force_snat_for_dnat == 1 && ip with an action
2999 ct_snat(B);.
3000
3001 • If a load balancer configured to skip snat has been ap‐
3002 plied to the Gateway router pipeline, a priority-120 flow
3003 matches flags.skip_snat_for_lb == 1 && ip with an action
3004 next;.
3005
3006 • If the Gateway router in the OVN Northbound database has
3007 been configured to force SNAT a packet (that has been
3008 previously load-balanced) using a router IP (i.e., op‐
3009 tions:lb_force_snat_ip=router_ip), then for each logical
3010 router port P attached to the Gateway router, a prior‐
3011 ity-110 flow matches flags.force_snat_for_lb == 1 && out‐
3012 port == P with an action ct_snat(R);, where R is the IP
3013 address configured on the logical router port. If R is an
3014 IPv4 address, then the match will also include ip4, and
3015 if it is an IPv6 address, then the match will also in‐
3016 clude ip6.
3017
3018 If the logical router port P is configured with multiple
3019 IPv4 and multiple IPv6 addresses, only the first IPv4 and
3020 first IPv6 address are considered.
3021
3022 • If the Gateway router in the OVN Northbound database has
3023 been configured to force SNAT a packet (that has been
3024 previously load-balanced) to B, a priority-100 flow
3025 matches flags.force_snat_for_lb == 1 && ip with an action
3026 ct_snat(B);.
3027
3028 • For each configuration in the OVN Northbound database
3029 that asks to change the source IP address of a packet
3030 from an IP address A, or of a packet that belongs to net‐
3031 work A, to B, a flow matches ip && ip4.src == A with an
3032 action ct_snat(B);. The priority of the flow is calcu‐
3033 lated based on the mask of A, with matches having larger
3034 masks getting higher priorities. If the NAT rule is of
3035 type dnat_and_snat and has stateless=true in the options,
3036 then the action would be ip4/6.src= (B). (An example ap‐
3037 pears after this list.)
3038
3039 • If the NAT rule has allowed_ext_ips configured, then
3040 there is an additional match ip4.dst == allowed_ext_ips.
3041 Similarly, for IPv6, the match would be ip6.dst == al‐
3042 lowed_ext_ips.
3043
3044 • If the NAT rule has exempted_ext_ips set, then there is
3045 an additional flow configured at priority + 1 of the cor‐
3046 responding NAT rule. The flow matches if the destination
3047 IP is an exempted_ext_ip and the action is next;. This
3048 flow is used to bypass the ct_snat action for a packet
3049 which is destined to exempted_ext_ips.
3050
3051 • A priority-0 logical flow with match 1 has actions next;.
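
For example, a NAT rule that SNATs traffic from the network
10.0.0.0/24 to the external address 172.16.0.10 (hypothetical
addresses) would produce roughly:

    match:   ip && ip4.src == 10.0.0.0/24
    actions: ct_snat(172.16.0.10);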
3052
3053 Egress Table 2: SNAT on Distributed Routers
3054
3055 • For each configuration in the OVN Northbound database
3056 that asks to change the source IP address of a packet
3057 from an IP address A, or of a packet that belongs to
3058 network A, to B, a flow matches ip && ip4.src == A &&
3059 outport == GW, where GW is the logical router gateway
3060 port, with an action ct_snat(B);. The priority of the
3061 flow is calculated based on the mask of A, with matches
3062 having larger masks getting higher priorities. If the NAT
3063 rule is of type dnat_and_snat and has stateless=true in
3064 the options, then the action would be ip4/6.src= (B).
3065 (An example appears after this list.)
3066
3067 If the NAT rule cannot be handled in a distributed man‐
3068 ner, then the flow above is only programmed on the gate‐
3069 way chassis, with the flow priority increased by 128 so
3070 that it is evaluated first.
3071
3072 If the NAT rule can be handled in a distributed manner,
3073 then there is an additional action eth.src = EA;, where
3074 EA is the ethernet address associated with the IP address
3075 A in the NAT rule. This allows upstream MAC learning to
3076 point to the correct chassis.
3077
3078 If the NAT rule has allowed_ext_ips configured, then
3079 there is an additional match ip4.dst == allowed_ext_ips.
3080 Similarly, for IPv6, the match would be ip6.dst == al‐
3081 lowed_ext_ips.
3082
3083 If the NAT rule has exempted_ext_ips set, then there is
3084 an additional flow configured at priority + 1 of the cor‐
3085 responding NAT rule. The flow matches if the destination
3086 IP is an exempted_ext_ip and the action is next;. This
3087 flow is used to bypass the ct_snat action for a flow
3088 which is destined to exempted_ext_ips.
3089
3090 • A priority-0 logical flow with match 1 has actions next;.
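
For example, the same hypothetical NAT rule on a distributed
router, handled in a distributed manner via gateway port
lr0-public with external MAC 00:00:00:00:02:01, would produce
roughly:

    match:   ip && ip4.src == 10.0.0.0/24 && outport == "lr0-public"
    actions: eth.src = 00:00:00:00:02:01; ct_snat(172.16.0.10);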
3091
3092 Egress Table 3: Egress Loopback
3093
3094 This table applies to distributed logical routers where one of the
3095 logical router ports specifies a gateway chassis.
3096
3097 While UNDNAT and SNAT processing have already occurred by this point,
3098 this traffic needs to be forced through egress loopback on this dis‐
3099 tributed gateway port instance, in order for UNSNAT and DNAT processing
3100 to be applied, and also for IP routing and ARP resolution after all of
3101 the NAT processing, so that the packet can be forwarded to the destina‐
3102 tion.
3103
3104 This table has the following flows:
3105
3106 • For each NAT rule in the OVN Northbound database on a
3107 distributed router, a priority-100 logical flow is added
3108 with match ip4.dst == E && outport == GW && is_chas‐
3109 sis_resident(P), where E is the external IP address spec‐
3110 ified in the NAT rule and GW is the logical router dis‐
3111 tributed gateway port. For a dnat_and_snat NAT rule, P is
3112 the logical port specified in the NAT rule; if the logi‐
3113 cal_port column of the NAT table is NOT set, P is the
3114 chassisredirect port of GW. The flow has these actions:
3115
3116 clone {
3117 ct_clear;
3118 inport = outport;
3119 outport = "";
3120 flags = 0;
3121 flags.loopback = 1;
3122 reg0 = 0;
3123 reg1 = 0;
3124 ...
3125 reg9 = 0;
3126 REGBIT_EGRESS_LOOPBACK = 1;
3127 next(pipeline=ingress, table=0);
3128 };
3129
3130
3131 flags.loopback is set since in_port is unchanged and the
3132 packet may return to that port after NAT processing.
3133 REGBIT_EGRESS_LOOPBACK is set to indicate that egress
3134 loopback has occurred, in order to skip the source IP ad‐
3135 dress check against the router address.
3136
3137 • A priority-0 logical flow with match 1 has actions next;.
3138
3139 Egress Table 4: Delivery
3140
3141 Packets that reach this table are ready for delivery. It contains:
3142
3143 • Priority-110 logical flows that match IP multicast pack‐
3144 ets on each enabled logical router port and modify the
3145 Ethernet source address of the packets to the Ethernet
3146 address of the port and then execute action output;.
3147
3148 • Priority-100 logical flows that match packets on each en‐
3149 abled logical router port, with action output;.
3150
3151
3152
3153OVN 21.09.0 ovn-northd ovn-northd(8)