ovn-northd(8)                     OVN Manual                     ovn-northd(8)

NAME
       ovn-northd and ovn-northd-ddlog - Open Virtual Network central
       control daemon

SYNOPSIS
       ovn-northd [options]

DESCRIPTION
       ovn-northd is a centralized daemon responsible for translating the
       high-level OVN configuration into logical configuration consumable
       by daemons such as ovn-controller. It translates the logical network
       configuration in terms of conventional network concepts, taken from
       the OVN Northbound Database (see ovn-nb(5)), into logical datapath
       flows in the OVN Southbound Database (see ovn-sb(5)) below it.

       ovn-northd is implemented in C. ovn-northd-ddlog is a compatible
       implementation written in DDlog, a language for incremental database
       processing. This documentation applies to both implementations, with
       differences indicated where relevant.

OPTIONS
       --ovnnb-db=database
              The OVSDB database containing the OVN Northbound Database. If
              the OVN_NB_DB environment variable is set, its value is used
              as the default. Otherwise, the default is
              unix:/ovnnb_db.sock.

       --ovnsb-db=database
              The OVSDB database containing the OVN Southbound Database. If
              the OVN_SB_DB environment variable is set, its value is used
              as the default. Otherwise, the default is
              unix:/ovnsb_db.sock.

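       For example, to point ovn-northd at databases reachable over TCP
       rather than the default unix sockets (the address below is
       illustrative; 6641 and 6642 are the conventional OVN NB and SB
       ports):

           ovn-northd --ovnnb-db=tcp:192.0.2.10:6641 \
                      --ovnsb-db=tcp:192.0.2.10:6642
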
       --ddlog-record=file
              This option is for ovn-northd-ddlog only. It causes the
              daemon to record the initial database state and later changes
              to file in the text-based DDlog command format. The
              ovn_northd_cli program can later replay these changes for
              debugging purposes. This option has a performance impact. See
              debugging-ddlog.rst in the OVN documentation for more
              details.

       --dry-run
              Causes ovn-northd to start paused. In the paused state,
              ovn-northd does not apply any changes to the databases,
              although it continues to monitor them. For more information,
              see the pause command, under Runtime Management Commands
              below.

              For ovn-northd-ddlog, this option can be combined with
              --ddlog-record to generate a replay log without restarting a
              process or disturbing a running system.

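              For example, a sketch combining the two options to capture a
              replay log from a paused instance (the log file name is
              illustrative):

                  ovn-northd-ddlog --dry-run --ddlog-record=replay.dat
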
       --n-threads N
              In certain situations, it may be desirable to enable
              parallelization on a system to decrease latency (at the
              potential cost of increasing CPU usage).

              This option causes ovn-northd to use N threads when building
              logical flows, when N is within [2-256]. If N is 1,
              parallelization is disabled (the default behavior). If N is
              less than 1, then N is set to 1, parallelization is disabled
              and a warning is logged. If N is more than 256, then N is set
              to 256, parallelization is enabled (with 256 threads) and a
              warning is logged.

              ovn-northd-ddlog does not support this option.

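              For example, to build logical flows with four threads:

                  ovn-northd --n-threads 4
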
       database in the above options must be an OVSDB active or passive
       connection method, as described in ovsdb(7).

   Daemon Options
       --pidfile[=pidfile]
              Causes a file (by default, program.pid) to be created
              indicating the PID of the running process. If the pidfile
              argument is not specified, or if it does not begin with /,
              then it is created in the configured OVS_RUNDIR directory.

              If --pidfile is not specified, no pidfile is created.

       --overwrite-pidfile
              By default, when --pidfile is specified and the specified
              pidfile already exists and is locked by a running process,
              the daemon refuses to start. Specify --overwrite-pidfile to
              cause it to instead overwrite the pidfile.

              When --pidfile is not specified, this option has no effect.

       --detach
              Runs this program as a background process. The process forks,
              and in the child it starts a new session, closes the standard
              file descriptors (which has the side effect of disabling
              logging to the console), and changes its current directory to
              the root (unless --no-chdir is specified). After the child
              completes its initialization, the parent exits.

       --monitor
              Creates an additional process to monitor this program. If it
              dies due to a signal that indicates a programming error
              (SIGABRT, SIGALRM, SIGBUS, SIGFPE, SIGILL, SIGPIPE, SIGSEGV,
              SIGXCPU, or SIGXFSZ) then the monitor process starts a new
              copy of it. If the daemon dies or exits for another reason,
              the monitor process exits.

              This option is normally used with --detach, but it also
              functions without it.

       --no-chdir
              By default, when --detach is specified, the daemon changes
              its current working directory to the root directory after it
              detaches. Otherwise, invoking the daemon from a carelessly
              chosen directory would prevent the administrator from
              unmounting the file system that holds that directory.

              Specifying --no-chdir suppresses this behavior, preventing
              the daemon from changing its current working directory. This
              may be useful for collecting core files, since it is common
              behavior to write core dumps into the current working
              directory and the root directory is not a good directory to
              use.

              This option has no effect when --detach is not specified.

       --no-self-confinement
              By default this daemon will try to self-confine itself to
              work with files under well-known directories determined at
              build time. It is better to stick with this default behavior
              and not to use this flag unless some other access control is
              used to confine the daemon. Note that in contrast to other
              access control implementations that are typically enforced
              from kernel-space (e.g. DAC or MAC), self-confinement is
              imposed by the user-space daemon itself and hence should not
              be considered a full confinement strategy, but instead should
              be viewed as an additional layer of security.

       --user=user:group
              Causes this program to run as a different user specified in
              user:group, thus dropping most of the root privileges. Short
              forms user and :group are also allowed, with the current user
              or group assumed, respectively. Only daemons started by the
              root user accept this argument.

              On Linux, daemons will be granted CAP_IPC_LOCK and
              CAP_NET_BIND_SERVICE before dropping root privileges. Daemons
              that interact with a datapath, such as ovs-vswitchd, will be
              granted three additional capabilities, namely CAP_NET_ADMIN,
              CAP_NET_BROADCAST and CAP_NET_RAW. The capability change will
              apply even if the new user is root.

              On Windows, this option is not currently supported. For
              security reasons, specifying this option will cause the
              daemon process not to start.

   Logging Options
       -v[spec]
       --verbose=[spec]
              Sets logging levels. Without any spec, sets the log level for
              every module and destination to dbg. Otherwise, spec is a
              list of words separated by spaces or commas or colons, up to
              one from each category below:

              •  A valid module name, as displayed by the vlog/list command
                 on ovs-appctl(8), limits the log level change to the
                 specified module.

              •  syslog, console, or file, to limit the log level change to
                 only the system log, the console, or a file, respectively.
                 (If --detach is specified, the daemon closes its standard
                 file descriptors, so logging to the console will have no
                 effect.)

                 On Windows, syslog is accepted as a word and is only
                 useful along with the --syslog-target option (the word has
                 no effect otherwise).

              •  off, emer, err, warn, info, or dbg, to control the log
                 level. Messages of the given severity or higher will be
                 logged, and messages of lower severity will be filtered
                 out. off filters out all messages. See ovs-appctl(8) for a
                 definition of each log level.

              Case is not significant within spec.

              Regardless of the log levels set for file, logging to a file
              will not take place unless --log-file is also specified (see
              below).

              For compatibility with older versions of OVS, any is accepted
              as a word but has no effect.

       -v
       --verbose
              Sets the maximum logging verbosity level, equivalent to
              --verbose=dbg.

       -vPATTERN:destination:pattern
       --verbose=PATTERN:destination:pattern
              Sets the log pattern for destination to pattern. Refer to
              ovs-appctl(8) for a description of the valid syntax for
              pattern.

       -vFACILITY:facility
       --verbose=FACILITY:facility
              Sets the RFC5424 facility of the log message. facility can be
              one of kern, user, mail, daemon, auth, syslog, lpr, news,
              uucp, clock, ftp, ntp, audit, alert, clock2, local0, local1,
              local2, local3, local4, local5, local6 or local7. If this
              option is not specified, daemon is used as the default for
              the local system syslog and local0 is used while sending a
              message to the target provided via the --syslog-target
              option.

       --log-file[=file]
              Enables logging to a file. If file is specified, then it is
              used as the exact name for the log file. The default log file
              name used if file is omitted is /var/log/ovn/program.log.

       --syslog-target=host:port
              Send syslog messages to UDP port on host, in addition to the
              system syslog. The host must be a numerical IP address, not a
              hostname.

       --syslog-method=method
              Specify method as how syslog messages should be sent to the
              syslog daemon. The following forms are supported:

              •  libc, to use the libc syslog() function. The downside of
                 this option is that libc adds a fixed prefix to every
                 message before it is actually sent to the syslog daemon
                 over the /dev/log UNIX domain socket.

              •  unix:file, to use a UNIX domain socket directly. It is
                 possible to specify an arbitrary message format with this
                 option. However, rsyslogd 8.9 and older versions use a
                 hard coded parser function anyway that limits UNIX domain
                 socket use. If you want to use an arbitrary message format
                 with older rsyslogd versions, then use a UDP socket to a
                 localhost IP address instead.

              •  udp:ip:port, to use a UDP socket. With this method it is
                 possible to use an arbitrary message format also with
                 older rsyslogd. When sending syslog messages over a UDP
                 socket, extra precautions need to be taken: for example,
                 the syslog daemon needs to be configured to listen on the
                 specified UDP port, accidental iptables rules could
                 interfere with local syslog traffic, and there are some
                 security considerations that apply to UDP sockets but do
                 not apply to UNIX domain sockets.

              •  null, to discard all messages logged to syslog.

              The default is taken from the OVS_SYSLOG_METHOD environment
              variable; if it is unset, the default is libc.
   PKI Options
       PKI configuration is required in order to use SSL for the
       connections to the Northbound and Southbound databases.

       -p privkey.pem
       --private-key=privkey.pem
              Specifies a PEM file containing the private key used as
              identity for outgoing SSL connections.

       -c cert.pem
       --certificate=cert.pem
              Specifies a PEM file containing a certificate that certifies
              the private key specified on -p or --private-key to be
              trustworthy. The certificate must be signed by the
              certificate authority (CA) that the peer in SSL connections
              will use to verify it.

       -C cacert.pem
       --ca-cert=cacert.pem
              Specifies a PEM file containing the CA certificate for
              verifying certificates presented to this program by SSL
              peers. (This may be the same certificate that SSL peers use
              to verify the certificate specified on -c or --certificate,
              or it may be a different one, depending on the PKI design in
              use.)

       -C none
       --ca-cert=none
              Disables verification of certificates presented by SSL
              peers. This introduces a security risk, because it means that
              certificates cannot be verified to be those of known trusted
              hosts.

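       For example, a typical SSL setup combines these options with ssl:
       connection methods (the file paths and address are illustrative):

           ovn-northd --ovnnb-db=ssl:192.0.2.10:6641 \
                      --ovnsb-db=ssl:192.0.2.10:6642 \
                      -p /etc/ovn/ovn-privkey.pem \
                      -c /etc/ovn/ovn-cert.pem \
                      -C /etc/ovn/cacert.pem
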
   Other Options
       --unixctl=socket
              Sets the name of the control socket on which program listens
              for runtime management commands (see RUNTIME MANAGEMENT
              COMMANDS, below). If socket does not begin with /, it is
              interpreted as relative to the configured OVS_RUNDIR
              directory. If --unixctl is not used at all, the default
              socket is program.pid.ctl in that directory, where pid is the
              program's process ID.

              On Windows a local named pipe is used to listen for runtime
              management commands. A file is created at the absolute path
              given by socket or, if --unixctl is not used at all, a file
              named program is created in the configured OVS_RUNDIR
              directory. The file exists just to mimic the behavior of a
              Unix domain socket.

              Specifying none for socket disables the control socket
              feature.

       -h
       --help
              Prints a brief help message to the console.

       -V
       --version
              Prints version information to the console.

RUNTIME MANAGEMENT COMMANDS
       ovs-appctl can send commands to a running ovn-northd process. The
       currently supported commands are described below.

       exit   Causes ovn-northd to gracefully terminate.

       pause  Pauses ovn-northd. When it is paused, ovn-northd receives
              changes from the Northbound and Southbound databases as
              usual, but it does not send any updates. A paused ovn-northd
              also drops database locks, which allows any other non-paused
              instance of ovn-northd to take over.

       resume Resumes ovn-northd, so that it once again processes
              Northbound and Southbound database contents and generates
              logical flows. This also instructs ovn-northd to contend for
              the lock on the SB DB.

       is-paused
              Returns "true" if ovn-northd is currently paused, "false"
              otherwise.

       status Prints this server's status. Status will be "active" if
              ovn-northd has acquired the OVSDB lock on the SB DB,
              "standby" if it has not, or "paused" if this instance is
              paused.

       sb-cluster-state-reset
              Reset southbound database cluster status when databases are
              destroyed and rebuilt.

              If all databases in a clustered southbound database are
              removed from disk, then the stored index of all databases
              will be reset to zero. This will cause ovn-northd to be
              unable to read or write to the southbound database, because
              it will always detect the data as stale. In such a case, run
              this command so that ovn-northd will reset its local index,
              so that it can interact with the southbound database again.

       nb-cluster-state-reset
              Reset northbound database cluster status when databases are
              destroyed and rebuilt.

              This performs the same task as sb-cluster-state-reset except
              for the northbound database client.

       set-n-threads N
              Set the number of threads used for building logical flows.
              When N is within [2-256], parallelization is enabled. When N
              is 1, parallelization is disabled. When N is less than 1 or
              more than 256, an error is returned. If ovn-northd fails to
              start parallelization (e.g. it fails to set up semaphores),
              parallelization is disabled and an error is returned.

       get-n-threads
              Return the number of threads used for building logical
              flows.

       Only ovn-northd-ddlog supports the following commands:

       enable-cpu-profiling
       disable-cpu-profiling
              Enables or disables profiling of CPU time used by the DDlog
              engine. When CPU profiling is enabled, the profile command
              (see below) will include DDlog CPU usage statistics in its
              output. Enabling CPU profiling will slow ovn-northd-ddlog.
              Disabling CPU profiling does not clear any previously
              recorded statistics.

       profile
              Outputs a profile of the current and peak sizes of
              arrangements inside DDlog. This profiling data can be useful
              for optimizing DDlog code. If CPU profiling was previously
              enabled (even if it was later disabled), the output also
              includes a CPU time profile. See Profiling inside the
              tutorial in the DDlog repository for an introduction to
              profiling DDlog.

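       A typical interaction with a running instance uses ovs-appctl (or
       its OVN counterpart, ovn-appctl) against the default control socket,
       for example:

           ovs-appctl -t ovn-northd is-paused
           ovs-appctl -t ovn-northd set-n-threads 4
           ovs-appctl -t ovn-northd get-n-threads
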
ACTIVE-STANDBY FOR HIGH AVAILABILITY
       You may run ovn-northd more than once in an OVN deployment. When
       connected to a standalone or clustered DB setup, OVN will
       automatically ensure that only one of them is active at a time. If
       multiple instances of ovn-northd are running and the active
       ovn-northd fails, one of the hot standby instances of ovn-northd
       will automatically take over.

   Active-Standby with multiple OVN DB servers
       You may run multiple OVN DB servers in an OVN deployment with:

           •  OVN DB servers deployed in active/passive mode with one
              active and multiple passive ovsdb-servers.

           •  ovn-northd also deployed on all these nodes, using unix
              sockets to connect to the local OVN DB servers.

       In such deployments, the ovn-northds on the passive nodes will
       process the DB changes and compute logical flows that are ultimately
       discarded, because the passive ovsdb-servers do not allow write
       transactions. This results in unnecessary CPU usage.

       With the help of the runtime management command pause, you can pause
       ovn-northd on these nodes. When a passive node becomes master, you
       can use the runtime management command resume to resume ovn-northd's
       processing of DB changes.

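       For example, on a passive node you might pause the local instance,
       and later, once the node becomes master, resume it and confirm its
       state (assuming the default control socket):

           ovs-appctl -t ovn-northd pause
           ...
           ovs-appctl -t ovn-northd resume
           ovs-appctl -t ovn-northd status
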
LOGICAL FLOW TABLE STRUCTURE
       One of the main purposes of ovn-northd is to populate the
       Logical_Flow table in the OVN_Southbound database. This section
       describes how ovn-northd does this for switch and router logical
       datapaths.

   Logical Switch Datapaths
       Ingress Table 0: Admission Control and Ingress Port Security check

       Ingress table 0 contains these logical flows:

           •  Priority 100 flows to drop packets with VLAN tags or
              multicast Ethernet source addresses.

           •  For each disabled logical port, a priority 100 flow is added
              which matches on all packets and applies the action
              REGBIT_PORT_SEC_DROP = 1; next; so that the packets are
              dropped in the next stage.

           •  For each (enabled) vtep logical port, a priority 70 flow is
              added which matches on all packets and applies the action
              next(pipeline=ingress, table=S_SWITCH_IN_L2_LKUP); to skip
              most stages of the ingress pipeline and go directly to the
              ingress L2 lookup table to determine the output port. Packets
              from a VTEP (RAMP) switch should not be subjected to any ACL
              checks; the egress pipeline will do the ACL checks.

           •  For each enabled logical port configured with a qdisc queue
              id in the options:qdisc_queue_id column of
              Logical_Switch_Port, a priority 70 flow is added which
              matches on all packets and applies the action set_queue(id);
              REGBIT_PORT_SEC_DROP = check_in_port_sec(); next;.

           •  A priority 1 flow is added which matches on all packets for
              all the logical ports and applies the action
              REGBIT_PORT_SEC_DROP = check_in_port_sec(); next; to
              evaluate the port security. The action check_in_port_sec
              applies the port security rules defined in the port_security
              column of the Logical_Switch_Port table.

       Ingress Table 1: Ingress Port Security - Apply

       This table drops the packets if the port security check failed in
       the previous stage, i.e., the register bit REGBIT_PORT_SEC_DROP is
       set to 1.

       Ingress table 1 contains these logical flows:

           •  A priority-50 fallback flow that drops the packet if the
              register bit REGBIT_PORT_SEC_DROP is set to 1.

           •  One priority-0 fallback flow that matches all packets and
              advances to the next table.

       Ingress Table 2: Lookup MAC address learning table

       This table looks up the MAC learning table of the logical switch
       datapath to check whether the port-MAC pair is present. A MAC is
       learnt only for logical switch VIF ports whose port security is
       disabled and that have the ’unknown’ address set.

           •  For each such logical port p whose port security is disabled
              and that has the ’unknown’ address set, the following flow is
              added.

              •  A priority 100 flow with the match inport == p and action
                 reg0[11] = lookup_fdb(inport, eth.src); next;

           •  One priority-0 fallback flow that matches all packets and
              advances to the next table.

       Ingress Table 3: Learn MAC of ’unknown’ ports.

       This table learns the MAC addresses seen on the logical ports whose
       port security is disabled and that have the ’unknown’ address set,
       if the lookup_fdb action returned false in the previous table.

           •  For each such logical port p whose port security is disabled
              and that has the ’unknown’ address set, the following flow is
              added.

              •  A priority 100 flow with the match inport == p && reg0[11]
                 == 0 and action put_fdb(inport, eth.src); next; which
                 stores the port-MAC pair in the MAC learning table of the
                 logical switch datapath and advances the packet to the
                 next table.

           •  One priority-0 fallback flow that matches all packets and
              advances to the next table.

       Ingress Table 4: from-lport Pre-ACLs

       This table prepares flows for possible stateful ACL processing in
       ingress table ACLs. It contains a priority-0 flow that simply moves
       traffic to the next table. If stateful ACLs are used in the logical
       datapath, a priority-100 flow is added that sets a hint (with
       reg0[0] = 1; next;) for table Pre-stateful to send IP packets to the
       connection tracker before eventually advancing to ingress table
       ACLs. If special ports such as router ports or localnet ports can’t
       use ct(), a priority-110 flow is added to skip over stateful ACLs.
       Multicast, IPv6 Neighbor Discovery and MLD traffic also skips
       stateful ACLs. For "allow-stateless" ACLs, a flow is added to bypass
       setting the hint for connection tracker processing.

       This table also has a priority-110 flow with the match eth.dst == E
       for all logical switch datapaths to move traffic to the next table,
       where E is the service monitor MAC defined in the
       options:svc_monitor_mac column of the NB_Global table.

       Ingress Table 5: Pre-LB

       This table prepares flows for possible stateful load balancing
       processing in ingress table LB and Stateful. It contains a
       priority-0 flow that simply moves traffic to the next table.
       Moreover it contains two priority-110 flows to move multicast, IPv6
       Neighbor Discovery and MLD traffic to the next table. If load
       balancing rules with virtual IP addresses (and ports) are configured
       in the OVN_Northbound database for a logical switch datapath, a
       priority-100 flow is added with the match ip to match on IP packets,
       with the action reg0[2] = 1; next; to act as a hint for table
       Pre-stateful to send IP packets to the connection tracker for packet
       de-fragmentation (and to possibly do DNAT for already established
       load balanced traffic) before eventually advancing to ingress table
       Stateful. If controller_event has been enabled and load balancing
       rules with empty backends have been added in OVN_Northbound, a
       priority-130 flow is added to trigger ovn-controller events whenever
       the chassis receives a packet for that particular VIP. If the
       event-elb meter has been previously created, it will be associated
       with the empty_lb logical flow.

       Prior to OVN 20.09 we were setting the reg0[0] = 1 only if the IP
       destination matches the load balancer VIP. However this had issues
       in cases where a logical switch doesn’t have any ACLs with the
       allow-related action. To understand the issue, let’s take a TCP load
       balancer - 10.0.0.10:80=10.0.0.3:80 - as an example. If a logical
       port p1 with IP 10.0.0.5 opens a TCP connection with the VIP
       10.0.0.10, then the packet in the ingress pipeline of p1 is sent to
       p1’s conntrack zone id and the packet is load balanced to the
       backend 10.0.0.3. The reply packet from the backend lport is not
       sent to the conntrack of the backend lport’s zone id. This is fine
       as long as the packet is valid. But suppose the backend lport sends
       an invalid TCP packet (such as an incorrect sequence number); the
       packet then gets delivered to the lport p1 without unDNATing the
       packet to the VIP 10.0.0.10, and this causes the connection to be
       reset by the lport p1’s VIF.

       We can’t fix this issue by adding a logical flow to drop ct.inv
       packets in the egress pipeline, since it would drop all other
       connections not destined to the load balancers. To fix this issue,
       we send all the packets to the conntrack in the ingress pipeline if
       a load balancer is configured. We can then add an lflow to drop
       ct.inv packets.

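       For reference, a load balancer like the one in the example above
       could be created with ovn-nbctl along these lines (the load balancer
       and switch names are illustrative):

           ovn-nbctl lb-add lb0 10.0.0.10:80 10.0.0.3:80 tcp
           ovn-nbctl ls-lb-add sw0 lb0
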
       This table also has priority-120 flows that punt all IGMP/MLD
       packets to ovn-controller if the switch is an interconnect switch
       with multicast snooping enabled.

       This table also has a priority-110 flow with the match eth.dst == E
       for all logical switch datapaths to move traffic to the next table,
       where E is the service monitor MAC defined in the
       options:svc_monitor_mac column of the NB_Global table.

       This table also has a priority-110 flow with the match inport == I
       for all logical switch datapaths to move traffic to the next table,
       where I is the peer of a logical router port. This flow is added to
       skip the connection tracking of packets which enter from a logical
       router datapath to a logical switch datapath.

       Ingress Table 6: Pre-stateful

       This table prepares flows for all possible stateful processing in
       the next tables. It contains a priority-0 flow that simply moves
       traffic to the next table.

           •  Priority-120 flows that send the packets to the connection
              tracker using ct_lb_mark; as the action, so that the already
              established traffic destined to the load balancer VIP gets
              DNATted based on a hint provided by the previous tables (with
              a match for reg0[2] == 1 and on supported load balancer
              protocols and address families). For IPv4 traffic the flows
              also load the original destination IP and transport port into
              registers reg1 and reg2. For IPv6 traffic the flows also load
              the original destination IP and transport port into registers
              xxreg1 and reg2.

           •  A priority-110 flow sends the packets to the connection
              tracker based on a hint provided by the previous tables (with
              a match for reg0[2] == 1) by using the ct_lb_mark; action.
              This flow is added to handle the traffic for load balancer
              VIPs whose protocol is not defined (mainly for ICMP traffic).

           •  A priority-100 flow sends the packets to the connection
              tracker based on a hint provided by the previous tables (with
              a match for reg0[0] == 1) by using the ct_next; action.

       Ingress Table 7: from-lport ACL hints

       This table consists of logical flows that set hints (reg0 bits) to
       be used in the next stage, in the ACL processing table, if stateful
       ACLs or load balancers are configured. Multiple hints can be set for
       the same packet. The possible hints are:

           •  reg0[7]: the packet might match an allow-related ACL and
              might have to commit the connection to conntrack.

           •  reg0[8]: the packet might match an allow-related ACL but
              there will be no need to commit the connection to conntrack
              because it already exists.

           •  reg0[9]: the packet might match a drop/reject ACL.

           •  reg0[10]: the packet might match a drop/reject ACL but the
              connection was previously allowed so it might have to be
              committed again with ct_label=1/1.

       The table contains the following flows:

           •  A priority-65535 flow to advance to the next table if the
              logical switch has no ACLs configured, otherwise a priority-0
              flow to advance to the next table.

           •  A priority-7 flow that matches on packets that initiate a new
              session. This flow sets reg0[7] and reg0[9] and then advances
              to the next table.

           •  A priority-6 flow that matches on packets that are in the
              request direction of an already existing session that has
              been marked as blocked. This flow sets reg0[7] and reg0[9]
              and then advances to the next table.

           •  A priority-5 flow that matches untracked packets. This flow
              sets reg0[8] and reg0[9] and then advances to the next table.

           •  A priority-4 flow that matches on packets that are in the
              request direction of an already existing session that has
              not been marked as blocked. This flow sets reg0[8] and
              reg0[10] and then advances to the next table.

           •  A priority-3 flow that matches on packets that are not part
              of established sessions. This flow sets reg0[9] and then
              advances to the next table.

           •  A priority-2 flow that matches on packets that are part of
              an established session that has been marked as blocked. This
              flow sets reg0[9] and then advances to the next table.

           •  A priority-1 flow that matches on packets that are part of
              an established session that has not been marked as blocked.
              This flow sets reg0[10] and then advances to the next table.

       Ingress table 8: from-lport ACLs before LB

       Logical flows in this table closely reproduce those in the ACL
       table in the OVN_Northbound database for the from-lport direction
       without the option apply-after-lb set, or with it set to false. The
       priority values from the ACL table have a limited range and have
       1000 added to them to leave room for OVN default flows at both
       higher and lower priorities.

           •  allow ACLs translate into logical flows with the next;
              action. If there are any stateful ACLs on this datapath,
              then allow ACLs translate to ct_commit; next; (which acts as
              a hint for the next tables to commit the connection to
              conntrack). In case the ACL has a label then reg3 is loaded
              with the label value and the reg0[13] bit is set to 1 (which
              acts as a hint for the next tables to commit the label to
              conntrack).

           •  allow-related ACLs translate into logical flows with the
              ct_commit(ct_label=0/1); next; actions for new connections
              and reg0[1] = 1; next; for existing connections. In case the
              ACL has a label then reg3 is loaded with the label value and
              the reg0[13] bit is set to 1 (which acts as a hint for the
              next tables to commit the label to conntrack).

           •  allow-stateless ACLs translate into logical flows with the
              next; action.

           •  reject ACLs translate into logical flows with the tcp_reset
              { output <-> inport; next(pipeline=egress,table=5); } action
              for TCP connections, the icmp4/icmp6 action for UDP
              connections, and the sctp_abort { output <-> inport;
              next(pipeline=egress,table=5); } action for SCTP
              associations.

           •  Other ACLs translate to drop; for new or untracked
              connections and ct_commit(ct_label=1/1); for known
              connections. Setting ct_label marks a connection as one that
              was previously allowed, but should no longer be allowed due
              to a policy change.

       This table contains a priority-65535 flow to advance to the next
       table if the logical switch has no ACLs configured, otherwise a
       priority-0 flow to advance to the next table so that ACLs allow
       packets by default if the options:default_acl_drop column of
       NB_Global is false or not set. Otherwise the flow action is set to
       drop; to implement a default drop behavior.

       If the logical datapath has a stateful ACL or a load balancer with a
       VIP configured, the following flows will also be added:

           •  If the options:default_acl_drop column of NB_Global is false
              or not set, a priority-1 flow that sets the hint to commit
              IP traffic that is not part of established sessions to the
              connection tracker (with action reg0[1] = 1; next;). This is
              needed for the default allow policy because, while the
              initiator’s direction may not have any stateful rules, the
              server’s may and then its return traffic would not be known
              and would be marked as invalid.

           •  If the options:default_acl_drop column of NB_Global is true,
              a priority-1 flow that drops IP traffic that is not part of
              established sessions.

           •  A priority-1 flow that sets the hint to commit IP traffic to
              the connection tracker (with action reg0[1] = 1; next;).
              This is needed for the default allow policy because, while
              the initiator’s direction may not have any stateful rules,
              the server’s may and then its return traffic would not be
              known and would be marked as invalid.

           •  A priority-65532 flow that allows any traffic in the reply
              direction for a connection that has been committed to the
              connection tracker (i.e., established flows), as long as the
              committed flow does not have ct_mark.blocked set. We only
              handle traffic in the reply direction here because we want
              all packets going in the request direction to still go
              through the flows that implement the currently defined
              policy based on ACLs. If a connection is no longer allowed
              by policy, ct_mark.blocked will get set and packets in the
              reply direction will no longer be allowed, either. This flow
              also clears the register bits reg0[9] and reg0[10]. If ACL
              logging and logging of related packets is enabled, then a
              companion priority-65533 flow will be installed that
              accomplishes the same thing but also logs the traffic.

           •  A priority-65532 flow that allows any traffic that is
              considered related to a committed flow in the connection
              tracker (e.g., an ICMP Port Unreachable from a non-listening
              UDP port), as long as the committed flow does not have
              ct_mark.blocked set. If ACL logging and logging of related
              packets is enabled, then a companion priority-65533 flow
              will be installed that accomplishes the same thing but also
              logs the traffic.

           •  A priority-65532 flow that drops all traffic marked by the
              connection tracker as invalid.

           •  A priority-65532 flow that drops all traffic in the reply
              direction with ct_mark.blocked set, meaning that the
              connection should no longer be allowed due to a policy
              change. Packets in the request direction are skipped here to
              let a newly created ACL re-allow this connection.

           •  A priority-65532 flow that allows IPv6 Neighbor
              Solicitation, Neighbor Discovery, Router Solicitation,
              Router Advertisement and MLD packets.

       If the logical datapath has any ACL or a load balancer with a VIP
       configured, the following flow will also be added:

           •  A priority 34000 logical flow is added for each logical
              switch datapath with the match eth.dst == E to allow the
              service monitor reply packet destined to ovn-controller,
              with the action next;, where E is the service monitor MAC
              defined in the options:svc_monitor_mac column of the
              NB_Global table.

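       As an illustration of the priority offset and the translations
       above, an ACL created with (hypothetical switch name and match):

           ovn-nbctl acl-add sw0 from-lport 1002 "ip4 && tcp.dst == 22" \
               allow-related

       would, on a datapath with stateful ACLs, roughly translate into a
       priority-2002 logical flow matching ip4 && tcp.dst == 22 with the
       action ct_commit(ct_label=0/1); next; for new connections.
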
       Ingress Table 9: from-lport QoS Marking

       Logical flows in this table closely reproduce those in the QoS
       table with the action column set in the OVN_Northbound database for
       the from-lport direction.

           •  For every qos_rules entry in a logical switch with DSCP
              marking enabled, a flow will be added at the priority
              mentioned in the QoS table.

           •  One priority-0 fallback flow that matches all packets and
              advances to the next table.

       Ingress Table 10: from-lport QoS Meter

       Logical flows in this table closely reproduce those in the QoS
       table with the bandwidth column set in the OVN_Northbound database
       for the from-lport direction.

           •  For every qos_rules entry in a logical switch with metering
              enabled, a flow will be added at the priority mentioned in
              the QoS table.

           •  One priority-0 fallback flow that matches all packets and
              advances to the next table.

       Ingress Table 11: LB

           •  For all the configured load balancing rules for a switch in
              the OVN_Northbound database that include an L4 port PORT of
              protocol P and IP address VIP, a priority-120 flow is added.
              For IPv4 VIPs, the flow matches ct.new && ip && ip4.dst ==
              VIP && P && P.dst == PORT. For IPv6 VIPs, the flow matches
              ct.new && ip && ip6.dst == VIP && P && P.dst == PORT. The
              flow’s action is ct_lb_mark(args), where args contains comma
              separated IP addresses (and optional port numbers) to load
              balance to. The address family of the IP addresses of args
              is the same as the address family of VIP. If a health check
              is enabled, then args will only contain those endpoints
              whose service monitor status entry in the OVN_Southbound db
              is either online or empty. For IPv4 traffic the flow also
              loads the original destination IP and transport port into
              registers reg1 and reg2. For IPv6 traffic the flow also
              loads the original destination IP and transport port into
              registers xxreg1 and reg2.

           •  For all the configured load balancing rules for a switch in
              the OVN_Northbound database that include just an IP address
              VIP to match on, OVN adds a priority-110 flow. For IPv4
              VIPs, the flow matches ct.new && ip && ip4.dst == VIP. For
              IPv6 VIPs, the flow matches ct.new && ip && ip6.dst == VIP.
              The action on this flow is ct_lb_mark(args), where args
              contains comma separated IP addresses of the same address
              family as VIP. For IPv4 traffic the flow also loads the
              original destination IP and transport port into registers
              reg1 and reg2. For IPv6 traffic the flow also loads the
              original destination IP and transport port into registers
              xxreg1 and reg2.

           •  If the load balancer is created with the --reject option and
              it has no active backends, a TCP reset segment (for tcp) or
              an ICMP port unreachable packet (for all other kinds of
              traffic) will be sent whenever an incoming packet is
              received for this load-balancer. Note that using the
              --reject option disables the empty_lb SB controller event
              for this load balancer.

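       For example, with the TCP load balancer 10.0.0.10:80=10.0.0.3:80
       from the earlier example, this table would roughly contain a
       priority-120 flow matching

           ct.new && ip && ip4.dst == 10.0.0.10 && tcp && tcp.dst == 80

       with an action along the lines of ct_lb_mark(backends=10.0.0.3:80);
       (a sketch; the exact generated action may carry additional register
       assignments for the original destination IP and port).
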
       Ingress table 12: from-lport ACLs after LB

       Logical flows in this table closely reproduce those in the ACL
       table in the OVN_Northbound database for the from-lport direction
       with the option apply-after-lb set to true. The priority values
       from the ACL table have a limited range and have 1000 added to them
       to leave room for OVN default flows at both higher and lower
       priorities.

           •  allow apply-after-lb ACLs translate into logical flows with
              the next; action. If there are any stateful ACLs (including
              both before-lb and after-lb ACLs) on this datapath, then
              allow ACLs translate to ct_commit; next; (which acts as a
              hint for the next tables to commit the connection to
              conntrack). In case the ACL has a label then reg3 is loaded
              with the label value and the reg0[13] bit is set to 1 (which
              acts as a hint for the next tables to commit the label to
              conntrack).

           •  allow-related apply-after-lb ACLs translate into logical
              flows with the ct_commit(ct_label=0/1); next; actions for
              new connections and reg0[1] = 1; next; for existing
              connections. In case the ACL has a label then reg3 is loaded
              with the label value and the reg0[13] bit is set to 1 (which
              acts as a hint for the next tables to commit the label to
              conntrack).

           •  allow-stateless apply-after-lb ACLs translate into logical
              flows with the next; action.

           •  reject apply-after-lb ACLs translate into logical flows with
              the tcp_reset { output <-> inport;
              next(pipeline=egress,table=5); } action for TCP connections,
              the icmp4/icmp6 action for UDP connections, and the
              sctp_abort { output <-> inport;
              next(pipeline=egress,table=5); } action for SCTP
              associations.

           •  Other apply-after-lb ACLs translate to drop; for new or
              untracked connections and ct_commit(ct_label=1/1); for known
              connections. Setting ct_label marks a connection as one that
              was previously allowed, but should no longer be allowed due
              to a policy change.

           •  One priority-0 fallback flow that matches all packets and
              advances to the next table.

       Ingress Table 13: Stateful

           •  A priority 100 flow is added which commits the packet to
              conntrack and sets the most significant 32 bits of ct_label
              with the reg3 value based on the hint provided by previous
              tables (with a match for reg0[1] == 1 && reg0[13] == 1).
              This is used by the ACLs with a label to commit the label
              value to conntrack.

           •  For ACLs without a label, a second priority-100 flow commits
              packets to the connection tracker using the ct_commit; next;
              action based on a hint provided by the previous tables (with
              a match for reg0[1] == 1 && reg0[13] == 0).

           •  A priority-0 flow that simply moves traffic to the next
              table.

       Ingress Table 14: Pre-Hairpin

           •  If the logical switch has load balancer(s) configured, then
              a priority-100 flow is added with the match ip && ct.trk to
              check whether the packet needs to be hairpinned (if, after
              load balancing, the destination IP matches the source IP),
              by executing the actions reg0[6] = chk_lb_hairpin(); and
              reg0[12] = chk_lb_hairpin_reply(); and advancing the packet
              to the next table.

           •  A priority-0 flow that simply moves traffic to the next
              table.

       Ingress Table 15: Nat-Hairpin

           •  If the logical switch has load balancer(s) configured, then
              a priority-100 flow is added with the match ip && ct.new &&
              ct.trk && reg0[6] == 1 which hairpins the traffic by NATting
              the source IP to the load balancer VIP by executing the
              action ct_snat_to_vip and advances the packet to the next
              table.

           •  If the logical switch has load balancer(s) configured, then
              a priority-100 flow is added with the match ip && ct.est &&
              ct.trk && reg0[6] == 1 which hairpins the traffic by NATting
              the source IP to the load balancer VIP by executing the
              action ct_snat and advances the packet to the next table.

           •  If the logical switch has load balancer(s) configured, then
              a priority-90 flow is added with the match ip && reg0[12] ==
              1 which matches on the replies of hairpinned traffic (i.e.,
              the destination IP is the VIP, the source IP is the backend
              IP and the source L4 port is the backend port for L4 load
              balancers) and executes ct_snat and advances the packet to
              the next table.

           •  A priority-0 flow that simply moves traffic to the next
              table.

       Ingress Table 16: Hairpin

           •  For each distributed gateway router port RP attached to the
              logical switch, a priority-2000 flow is added with the match
              reg0[14] == 1 && is_chassis_resident(RP) and action next; to
              pass the traffic to the next table to respond to the ARP
              requests for the router port IPs.

              The reg0[14] register bit is set in the ingress L2 port
              security check table for traffic received from HW VTEP
              (ramp) ports.

           •  A priority-1000 flow that matches on the reg0[14] register
              bit for the traffic received from HW VTEP (ramp) ports. This
              traffic is passed to ingress table ls_in_l2_lkup.

           •  A priority-1 flow that hairpins traffic matched by
              non-default flows in the Pre-Hairpin table. Hairpinning is
              done at L2: Ethernet addresses are swapped and the packets
              are looped back on the input port.

           •  A priority-0 flow that simply moves traffic to the next
              table.

       Ingress Table 17: ARP/ND responder

       This table implements the ARP/ND responder in a logical switch for
       known IPs. The advantage of the ARP responder flow is to limit ARP
       broadcasts by locally responding to ARP requests without the need
       to send them to other hypervisors. One common case is when the
       inport is a logical port associated with a VIF and the broadcast is
       responded to on the local hypervisor rather than broadcast across
       the whole network and responded to by the destination VM. This
       behavior is proxy ARP.

       ARP requests arrive from VMs from a logical switch inport of type
       default. For this case, the logical switch proxy ARP rules can be
       for other VMs or logical router ports. Logical switch proxy ARP
       rules may be programmed both for mac binding of IP addresses on
       other logical switch VIF ports (which are of the default logical
       switch port type, representing connectivity to VMs or containers),
       and for mac binding of IP addresses on logical switch router type
       ports, representing their logical router port peers. In order to
       support proxy ARP for logical router ports, an IP address must be
       configured on the logical switch router type port, with the same
       value as the peer logical router port. The configured MAC addresses
       must match as well. When a VM sends an ARP request for a
       distributed logical router port and if the peer router type port of
       the attached logical switch does not have an IP address configured,
       the ARP request will be broadcast on the logical switch. One of the
       copies of the ARP request will go through the logical switch router
       type port to the logical router datapath, where the logical router
       ARP responder will generate a reply. The MAC binding of a
       distributed logical router, once learned by an associated VM, is
       used for all that VM’s communication needing routing. Hence, the
       action of a VM re-arping for the mac binding of the logical router
       port should be rare.

       Logical switch ARP responder proxy ARP rules can also be hit when
       receiving ARP requests externally on an L2 gateway port. In this
       case, the hypervisor acting as an L2 gateway responds to the ARP
       request on behalf of a destination VM.

       Note that ARP requests received from localnet logical inports can
       either go directly to VMs, in which case the VM responds, or can
       hit an ARP responder for a logical router port if the packet is
       used to resolve a logical router port next hop address. In either
       case, logical switch ARP responder rules will not be hit. The table
       contains these logical flows:

           •  Priority-100 flows that skip the ARP responder if the inport
              is of type localnet, and advance directly to the next table.
              ARP requests sent to localnet ports can be received by
              multiple hypervisors. Because the same mac binding rules are
              downloaded to all hypervisors, each of the multiple
              hypervisors would respond, which would confuse L2 learning
              on the source of the ARP requests. ARP requests received on
              an inport of type router are not expected to hit any logical
              switch ARP responder flows. However, no skip flows are
              installed for these packets, as there would be some
              additional flow cost for this and the value appears limited.

           •  If the inport V is of type virtual, a priority-100 logical
              flow is added for each P configured in the
              options:virtual-parents column with the match

                  inport == P && ((arp.op == 1 && arp.spa == VIP && arp.tpa == VIP) || (arp.op == 2 && arp.spa == VIP))
                  inport == P && ((nd_ns && ip6.dst == {VIP, NS_MULTICAST_ADDR} && nd.target == VIP) || (nd_na && nd.target == VIP))

              and applies the action

                  bind_vport(V, inport);

              and advances the packet to the next table.

              Here VIP is the virtual ip configured in the column
              options:virtual-ip and NS_MULTICAST_ADDR is the
              solicited-node multicast address corresponding to the VIP.

           •  Priority-50 flows that match ARP requests to each known IP
              address A of every logical switch port, and respond with ARP
              replies directly with the corresponding Ethernet address E:

                  eth.dst = eth.src;
                  eth.src = E;
                  arp.op = 2; /* ARP reply. */
                  arp.tha = arp.sha;
                  arp.sha = E;
                  arp.tpa = arp.spa;
                  arp.spa = A;
                  outport = inport;
                  flags.loopback = 1;
                  output;

              These flows are omitted for logical ports (other than router
              ports or localport ports) that are down (unless
              ignore_lsp_down is configured as true in the options column
              of the NB_Global table of the Northbound database), for
              logical ports of type virtual, for logical ports with the
              ’unknown’ address set and for logical ports of a logical
              switch configured with other_config:vlan-passthru=true.

              The above ARP responder flows are added for the list of IPv4
              addresses if defined in the options:arp_proxy column of the
              Logical_Switch_Port table for logical switch ports of type
              router (see the configuration example after this list).

           •  Priority-50 flows that match IPv6 ND neighbor solicitations
              to each known IP address A (and A’s solicited node address)
              of every logical switch port except those of type router,
              and respond with neighbor advertisements directly with the
              corresponding Ethernet address E:

                  nd_na {
                      eth.src = E;
                      ip6.src = A;
                      nd.target = A;
                      nd.tll = E;
                      outport = inport;
                      flags.loopback = 1;
                      output;
                  };

              Priority-50 flows that match IPv6 ND neighbor solicitations
              to each known IP address A (and A’s solicited node address)
              of logical switch ports of type router, and respond with
              neighbor advertisements directly with the corresponding
              Ethernet address E:

                  nd_na_router {
                      eth.src = E;
                      ip6.src = A;
                      nd.target = A;
                      nd.tll = E;
                      outport = inport;
                      flags.loopback = 1;
                      output;
                  };

              These flows are omitted for logical ports (other than router
              ports or localport ports) that are down (unless
              ignore_lsp_down is configured as true in the options column
              of the NB_Global table of the Northbound database), for
              logical ports of type virtual and for logical ports with the
              ’unknown’ address set.

           •  Priority-100 flows with match criteria like the ARP and ND
              flows above, except that they only match packets from the
              inport that owns the IP addresses in question, with action
              next;. These flows prevent OVN from replying to, for
              example, an ARP request emitted by a VM for its own IP
              address. A VM only makes this kind of request to attempt to
              detect a duplicate IP address assignment, so sending a reply
              will prevent the VM from accepting the IP address that it
              owns.

              In place of next;, it would be reasonable to use drop; for
              the flows’ actions. If everything is working as it is
              configured, then this would produce equivalent results,
              since no host should reply to the request. But ARPing for
              one’s own IP address is intended to detect situations where
              the network is not working as configured, so dropping the
              request would frustrate that intent.

           •  For each SVC_MON_SRC_IP defined in the value of the
              ip_port_mappings:ENDPOINT_IP column of the Load_Balancer
              table, a priority-110 logical flow is added with the match
              arp.tpa == SVC_MON_SRC_IP && arp.op == 1 and applies the
              action

                  eth.dst = eth.src;
                  eth.src = E;
                  arp.op = 2; /* ARP reply. */
                  arp.tha = arp.sha;
                  arp.sha = E;
                  arp.tpa = arp.spa;
                  arp.spa = A;
                  outport = inport;
                  flags.loopback = 1;
                  output;

              where E is the service monitor source mac defined in the
              options:svc_monitor_mac column in the NB_Global table. This
              mac is used as the source mac in the service monitor packets
              for the load balancer endpoint IP health checks.

              SVC_MON_SRC_IP is used as the source ip in the service
              monitor IPv4 packets for the load balancer endpoint IP
              health checks.

              These flows are required if an ARP request is sent for the
              IP SVC_MON_SRC_IP.

           •  For each VIP configured in the table Forwarding_Group, a
              priority-50 logical flow is added with the match arp.tpa ==
              vip && arp.op == 1 and applies the action

                  eth.dst = eth.src;
                  eth.src = E;
                  arp.op = 2; /* ARP reply. */
                  arp.tha = arp.sha;
                  arp.sha = E;
                  arp.tpa = arp.spa;
                  arp.spa = A;
                  outport = inport;
                  flags.loopback = 1;
                  output;

              where E is the forwarding group’s mac defined in the vmac
              column.

              A is used as either the destination ip for load balancing
              traffic to child ports or as the nexthop to hosts behind the
              child ports.

              These flows are required to respond to an ARP request if an
              ARP request is sent for the IP vip.

           •  One priority-0 fallback flow that matches all packets and
              advances to the next table.

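       As referenced above, the options:arp_proxy list that triggers the
       extra ARP responder flows could be configured with something like
       the following (the port name and address are illustrative):

           ovn-nbctl lsp-set-options sw0-rtr arp_proxy="10.0.0.100"
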
       Ingress Table 18: DHCP option processing

       This table adds the DHCPv4 options to a DHCPv4 packet from the
       logical ports configured with IPv4 address(es) and DHCPv4 options,
       and similarly for DHCPv6 options. This table also adds flows for
       the logical ports of type external.

           •  A priority-100 logical flow is added for these logical ports
              which matches the IPv4 packet with udp.src == 68 and udp.dst
              == 67 and applies the action put_dhcp_opts and advances the
              packet to the next table.

                  reg0[3] = put_dhcp_opts(offer_ip = ip, options...);
                  next;

              For DHCPDISCOVER and DHCPREQUEST, this transforms the packet
              into a DHCP reply, adds the DHCP offer IP ip and options to
              the packet, and stores 1 into reg0[3]. For other kinds of
              packets, it just stores 0 into reg0[3]. Either way, it
              continues to the next table.

           •  A priority-100 logical flow is added for these logical ports
              which matches the IPv6 packet with udp.src == 546 and
              udp.dst == 547 and applies the action put_dhcpv6_opts and
              advances the packet to the next table.

                  reg0[3] = put_dhcpv6_opts(ia_addr = ip, options...);
                  next;

              For DHCPv6 Solicit/Request/Confirm packets, this transforms
              the packet into a DHCPv6 Advertise/Reply, adds the DHCPv6
              offer IP ip and options to the packet, and stores 1 into
              reg0[3]. For other kinds of packets, it just stores 0 into
              reg0[3]. Either way, it continues to the next table.

           •  A priority-0 flow that matches all packets and advances to
              the next table.

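       For context, DHCPv4 options of the kind consumed by this table are
       configured through the Northbound database, along these lines (the
       names, addresses and shell variable are illustrative):

           uuid=$(ovn-nbctl create dhcp_options cidr=10.0.0.0/24 \
               options='"server_id"="10.0.0.1" "server_mac"="c0:ff:ee:00:00:01" "lease_time"="3600" "router"="10.0.0.1"')
           ovn-nbctl lsp-set-dhcpv4-options lsp1 $uuid
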
       Ingress Table 19: DHCP responses

       This table implements the DHCP responder for the DHCP replies
       generated by the previous table.

           •  A priority 100 logical flow is added for the logical ports
              configured with DHCPv4 options which matches IPv4 packets
              with udp.src == 68 && udp.dst == 67 && reg0[3] == 1 and
              responds back to the inport after applying these actions. If
              reg0[3] is set to 1, it means that the action put_dhcp_opts
              was successful.

                  eth.dst = eth.src;
                  eth.src = E;
                  ip4.src = S;
                  udp.src = 67;
                  udp.dst = 68;
                  outport = P;
                  flags.loopback = 1;
                  output;

              where E is the server MAC address and S is the server IPv4
              address defined in the DHCPv4 options. Note that the ip4.dst
              field is handled by put_dhcp_opts.

              (This terminates ingress packet processing; the packet does
              not go to the next ingress table.)

           •  A priority 100 logical flow is added for the logical ports
              configured with DHCPv6 options which matches IPv6 packets
              with udp.src == 546 && udp.dst == 547 && reg0[3] == 1 and
              responds back to the inport after applying these actions. If
              reg0[3] is set to 1, it means that the action
              put_dhcpv6_opts was successful.

                  eth.dst = eth.src;
                  eth.src = E;
                  ip6.dst = A;
                  ip6.src = S;
                  udp.src = 547;
                  udp.dst = 546;
                  outport = P;
                  flags.loopback = 1;
                  output;

              where E is the server MAC address and S is the server IPv6
              LLA address generated from the server_id defined in the
              DHCPv6 options and A is the IPv6 address defined in the
              logical port’s addresses column.

              (This terminates packet processing; the packet does not go
              on to the next ingress table.)

           •  A priority-0 flow that matches all packets and advances to
              the next table.

1301 Ingress Table 20 DNS Lookup
1302
1303 This table looks up and resolves the DNS names to the corresponding
1304 configured IP address(es).
1305
1306 • A priority-100 logical flow for each logical switch data‐
1307 path if it is configured with DNS records, which matches
1308 the IPv4 and IPv6 packets with udp.dst = 53 and applies
1309 the action dns_lookup and advances the packet to the next
1310 table.
1311
1312 reg0[4] = dns_lookup(); next;
1313
1314
1315 For valid DNS packets, this transforms the packet into a
1316 DNS reply if the DNS name can be resolved, and stores 1
1317 into reg0[4]. For failed DNS resolution or other kinds of
1318 packets, it just stores 0 into reg0[4]. Either way, it
1319 continues to the next table.
1320
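       As an illustration, the dns_lookup flow above appears once a DNS
       record set is attached to the logical switch. A minimal sketch,
       using hypothetical names (sw0, vm1.example.org):

              # Create a DNS row and attach it to the switch; UUID below
              # stands for the row created by the first command.
              ovn-nbctl create DNS records:"vm1.example.org"="10.0.0.4"
              ovn-nbctl set Logical_Switch sw0 dns_records=UUID
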
   Ingress Table 21: DNS Responses
1322
       This table implements the DNS responder for the DNS replies
       generated by the previous table.
1325
              • A priority-100 logical flow for each logical switch
                datapath that is configured with DNS records, which
                matches IPv4 and IPv6 packets with udp.dst == 53 &&
                reg0[4] == 1 and responds back to the inport after
                applying these actions. If reg0[4] is set to 1, it means
                that the action dns_lookup was successful.
1332
1333 eth.dst <-> eth.src;
1334 ip4.src <-> ip4.dst;
1335 udp.dst = udp.src;
1336 udp.src = 53;
1337 outport = P;
1338 flags.loopback = 1;
1339 output;
1340
1341
1342 (This terminates ingress packet processing; the packet
1343 does not go to the next ingress table.)
1344
   Ingress Table 22: External ports
1346
       Traffic from the external logical ports enters the ingress
       datapath pipeline via the localnet port. This table adds the
       below logical flows to handle the traffic from these ports.
1350
1351 • A priority-100 flow is added for each external logical
1352 port which doesn’t reside on a chassis to drop the
1353 ARP/IPv6 NS request to the router IP(s) (of the logical
1354 switch) which matches on the inport of the external logi‐
1355 cal port and the valid eth.src address(es) of the exter‐
1356 nal logical port.
1357
                This flow guarantees that the ARP/NS request to the
                router IP address from the external ports is answered
                only by the chassis which has claimed these external
                ports. All other chassis drop these packets.
1362
              • A priority-100 flow is added for each external logical
                port which doesn’t reside on a chassis to drop any
                packet destined to the router mac, with the match
                inport == external && eth.src == E && eth.dst == R &&
                !is_chassis_resident("external"), where E is the
                external port mac and R is the router port mac.
1369
              • A priority-0 flow that matches all packets and advances
                them to the next table.
1372
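       As an illustration, an external port is typically created by
       setting the port type and binding it to an HA chassis group. A
       minimal sketch, using hypothetical names (sw0-ext1, hagrp1, hv1):

              ovn-nbctl lsp-set-type sw0-ext1 external
              ovn-nbctl ha-chassis-group-add hagrp1
              ovn-nbctl ha-chassis-group-add-chassis hagrp1 hv1 30
              # UUID below stands for the HA_Chassis_Group row created
              # above.
              ovn-nbctl set Logical_Switch_Port sw0-ext1 \
                  ha_chassis_group=UUID
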
   Ingress Table 23: Destination Lookup
1374
1375 This table implements switching behavior. It contains these logical
1376 flows:
1377
              • A priority-110 flow with the match eth.src == E for all
                logical switch datapaths that applies the action
                handle_svc_check(inport), where E is the service monitor
                mac defined in the options:svc_monitor_mac column of the
                NB_Global table.
1383
1384 • A priority-100 flow that punts all IGMP/MLD packets to
1385 ovn-controller if multicast snooping is enabled on the
1386 logical switch. The flow also forwards the IGMP/MLD pack‐
1387 ets to the MC_MROUTER_STATIC multicast group, which
1388 ovn-northd populates with all the logical ports that have
1389 options :mcast_flood_reports=’true’.
1390
1391 • Priority-90 flows that forward registered IP multicast
1392 traffic to their corresponding multicast group, which
1393 ovn-northd creates based on learnt IGMP_Group entries.
1394 The flows also forward packets to the MC_MROUTER_FLOOD
                multicast group, which ovn-northd populates with all the
1396 logical ports that are connected to logical routers with
1397 options:mcast_relay=’true’.
1398
1399 • A priority-85 flow that forwards all IP multicast traffic
1400 destined to 224.0.0.X to the MC_FLOOD multicast group,
1401 which ovn-northd populates with all enabled logical
1402 ports.
1403
1404 • A priority-85 flow that forwards all IP multicast traffic
1405 destined to reserved multicast IPv6 addresses (RFC 4291,
1406 2.7.1, e.g., Solicited-Node multicast) to the MC_FLOOD
1407 multicast group, which ovn-northd populates with all en‐
1408 abled logical ports.
1409
1410 • A priority-80 flow that forwards all unregistered IP mul‐
1411 ticast traffic to the MC_STATIC multicast group, which
1412 ovn-northd populates with all the logical ports that have
1413 options :mcast_flood=’true’. The flow also forwards un‐
1414 registered IP multicast traffic to the MC_MROUTER_FLOOD
1415 multicast group, which ovn-northd populates with all the
1416 logical ports connected to logical routers that have op‐
1417 tions :mcast_relay=’true’.
1418
1419 • A priority-80 flow that drops all unregistered IP multi‐
1420 cast traffic if other_config :mcast_snoop=’true’ and
1421 other_config :mcast_flood_unregistered=’false’ and the
1422 switch is not connected to a logical router that has op‐
1423 tions :mcast_relay=’true’ and the switch doesn’t have any
1424 logical port with options :mcast_flood=’true’.
1425
1426 • Priority-80 flows for each IP address/VIP/NAT address
1427 owned by a router port connected to the switch. These
1428 flows match ARP requests and ND packets for the specific
1429 IP addresses. Matched packets are forwarded only to the
1430 router that owns the IP address and to the MC_FLOOD_L2
1431 multicast group which contains all non-router logical
1432 ports.
1433
1434 • Priority-75 flows for each port connected to a logical
1435 router matching self originated ARP request/ND packets.
1436 These packets are flooded to the MC_FLOOD_L2 which con‐
1437 tains all non-router logical ports.
1438
1439 • A priority-70 flow that outputs all packets with an Eth‐
1440 ernet broadcast or multicast eth.dst to the MC_FLOOD mul‐
1441 ticast group.
1442
1443 • One priority-50 flow that matches each known Ethernet ad‐
1444 dress against eth.dst and outputs the packet to the sin‐
1445 gle associated output port.
1446
1447 For the Ethernet address on a logical switch port of type
1448 router, when that logical switch port’s addresses column
1449 is set to router and the connected logical router port
1450 has a gateway chassis:
1451
1452 • The flow for the connected logical router port’s
1453 Ethernet address is only programmed on the gateway
1454 chassis.
1455
1456 • If the logical router has rules specified in nat
1457 with external_mac, then those addresses are also
1458 used to populate the switch’s destination lookup
1459 on the chassis where logical_port is resident.
1460
1461 For the Ethernet address on a logical switch port of type
1462 router, when that logical switch port’s addresses column
1463 is set to router and the connected logical router port
1464 specifies a reside-on-redirect-chassis and the logical
1465 router to which the connected logical router port belongs
1466 to has a distributed gateway LRP:
1467
1468 • The flow for the connected logical router port’s
1469 Ethernet address is only programmed on the gateway
1470 chassis.
1471
              • For each forwarding group configured on the logical
                switch datapath, a priority-50 flow that matches on
                eth.dst == VIP with an action of
                fwd_group(childports=args), where args contains
                comma-separated logical switch child ports to load
                balance to. If liveness is enabled, then the action also
                includes liveness=true. (See the configuration sketch
                after this list.)
1479
1480 • One priority-0 fallback flow that matches all packets
1481 with the action outport = get_fdb(eth.dst); next;. The
1482 action get_fdb gets the port for the eth.dst in the MAC
1483 learning table of the logical switch datapath. If there
1484 is no entry for eth.dst in the MAC learning table, then
1485 it stores none in the outport.
1486
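       As an illustration, the forwarding group flow above can be
       produced with the fwd-group-add command. A minimal sketch, using
       hypothetical names (fwd1, sw0, the VIP/VMAC and child ports):

              ovn-nbctl --liveness fwd-group-add fwd1 sw0 \
                  10.0.0.100 00:00:00:00:01:00 sw0-p1 sw0-p2
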
   Ingress Table 24: Destination unknown
1488
       This table handles the packets whose destination was not found or
       was looked up in the MAC learning table of the logical switch
       datapath. It contains the following flows.
1492
1493 • If the logical switch has logical ports with ’unknown’
1494 addresses set, then the below logical flow is added
1495
              • Priority 50 flow with the match outport == none that
                outputs the packets to the MC_UNKNOWN multicast
1498 group, which ovn-northd populates with all enabled
1499 logical ports that accept unknown destination
1500 packets. As a small optimization, if no logical
1501 ports accept unknown destination packets,
1502 ovn-northd omits this multicast group and logical
1503 flow.
1504
1505 If the logical switch has no logical ports with ’unknown’
1506 address set, then the below logical flow is added
1507
              • Priority 50 flow with the match outport == none that
                drops the packets.
1510
              • One priority-0 fallback flow that outputs the packet to
                the egress stage with the outport learnt from the
                get_fdb action.
1514
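       As an illustration, the MC_UNKNOWN behavior above is driven by
       the addresses column of the port. A minimal sketch, using a
       hypothetical port name and addresses (sw0-port1):

              ovn-nbctl lsp-set-addresses sw0-port1 \
                  "50:54:00:00:00:01 10.0.0.11" unknown
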
1515 Egress Table 0: Pre-LB
1516
1517 This table is similar to ingress table Pre-LB. It contains a priority-0
1518 flow that simply moves traffic to the next table. Moreover it contains
1519 two priority-110 flows to move multicast, IPv6 Neighbor Discovery and
1520 MLD traffic to the next table. If any load balancing rules exist for
1521 the datapath, a priority-100 flow is added with a match of ip and ac‐
1522 tion of reg0[2] = 1; next; to act as a hint for table Pre-stateful to
1523 send IP packets to the connection tracker for packet de-fragmentation
       and possibly DNAT the destination VIP to one of the selected
       backends for already committed load balanced traffic.
1526
1527 This table also has a priority-110 flow with the match eth.src == E for
       all logical switch datapaths to move traffic to the next table,
       where E is the service monitor mac defined in the
       options:svc_monitor_mac column of the NB_Global table.
1531
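       As an illustration, the service monitor MAC used by the
       priority-110 flows above is configured globally; the value below
       is only an example:

              ovn-nbctl set NB_Global . \
                  options:svc_monitor_mac="ee:01:02:03:04:05"
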
1532 Egress Table 1: to-lport Pre-ACLs
1533
1534 This is similar to ingress table Pre-ACLs except for to-lport traffic.
1535
1536 This table also has a priority-110 flow with the match eth.src == E for
       all logical switch datapaths to move traffic to the next table,
       where E is the service monitor mac defined in the
       options:svc_monitor_mac column of the NB_Global table.
1540
1541 This table also has a priority-110 flow with the match outport == I for
       all logical switch datapaths to move traffic to the next table,
       where I is the peer of a logical router port. This flow is added
       to skip the
1544 connection tracking of packets which will be entering logical router
1545 datapath from logical switch datapath for routing.
1546
1547 Egress Table 2: Pre-stateful
1548
1549 This is similar to ingress table Pre-stateful. This table adds the be‐
1550 low 3 logical flows.
1551
              • A priority-120 flow that sends the packets to the
                connection tracker using ct_lb_mark; as the action, so
                that the already established traffic gets unDNATted from
                the backend IP to the load balancer VIP based on a hint
                provided by the previous tables with a match for
                reg0[2] == 1. If the packet was not DNATted earlier,
                then ct_lb_mark functions like ct_next.
1559
              • A priority-100 flow that sends the packets to the
                connection tracker based on a hint provided by the
                previous tables (with a match for reg0[0] == 1) by using
                the ct_next; action.
1564
1565 • A priority-0 flow that matches all packets to advance to
1566 the next table.
1567
1568 Egress Table 3: from-lport ACL hints
1569
1570 This is similar to ingress table ACL hints.
1571
1572 Egress Table 4: to-lport ACLs
1573
1574 This is similar to ingress table ACLs except for to-lport ACLs.
1575
1576 In addition, the following flows are added.
1577
              • A priority 34000 logical flow is added for each logical
                port which has DHCPv4 options defined to allow the
                DHCPv4 reply packet and which has DHCPv6 options defined
                to allow the DHCPv6 reply packet from the Ingress Table
                19: DHCP responses.
1583
              • A priority 34000 logical flow is added for each logical
                switch datapath configured with DNS records with the
                match udp.dst == 53 to allow the DNS reply packet from
                the Ingress Table 21: DNS responses.
1588
              • A priority 34000 logical flow is added for each logical
                switch datapath with the match eth.src == E to allow the
                service monitor request packet generated by
                ovn-controller with the action next, where E is the
                service monitor mac defined in the
                options:svc_monitor_mac column of the NB_Global table.
1595
1596 Egress Table 5: to-lport QoS Marking
1597
1598 This is similar to ingress table QoS marking except they apply to
1599 to-lport QoS rules.
1600
1601 Egress Table 6: to-lport QoS Meter
1602
1603 This is similar to ingress table QoS meter except they apply to
1604 to-lport QoS rules.
1605
1606 Egress Table 7: Stateful
1607
1608 This is similar to ingress table Stateful except that there are no
1609 rules added for load balancing new connections.
1610
1611 Egress Table 8: Egress Port Security - check
1612
       This is similar to the port security logic in the ingress table
       Ingress Port Security - check, except that the action
       check_out_port_sec is used to check the port security rules. This
       table adds the below logical flows.
1616
              • A priority 100 flow which matches on the multicast
                traffic and applies the action REGBIT_PORT_SEC_DROP = 0;
                next; to skip the out port security checks.
1620
              • For each disabled logical port, a priority 150 flow is
                added which matches on all packets and applies the
                action REGBIT_PORT_SEC_DROP = 1; next; so that the
                packets are dropped in the next stage.
1625
              • A priority 0 logical flow is added which matches on all
                the packets and applies the action REGBIT_PORT_SEC_DROP
                = check_out_port_sec(); next;. The action
                check_out_port_sec applies the port security rules based
                on the addresses defined in the port_security column of
                the Logical_Switch_Port table before delivering the
                packet to the outport.
1633
1634 Egress Table 9: Egress Port Security - Apply
1635
       This is similar to the ingress port security logic in the ingress
       table Ingress Port Security - Apply. This table drops the packets
       if the port security check failed in the previous stage, i.e.,
       the register bit REGBIT_PORT_SEC_DROP is set to 1.
1640
1641 The following flows are added.
1642
              • For each localnet port configured with egress qos in the
                options:qdisc_queue_id column of Logical_Switch_Port, a
                priority 100 flow is added which matches on the localnet
                outport and applies the action set_queue(id); output;.

                Please remember to mark the corresponding physical
                interface with ovn-egress-iface set to true in
                external_ids (see the sketch after this list).
1650
1651 • A priority-50 flow that drops the packet if the register
1652 bit REGBIT_PORT_SEC_DROP is set to 1.
1653
1654 • A priority-0 flow that outputs the packet to the outport.
1655
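       As an illustration, the egress qos flow above combines a
       northbound option with a per-chassis Open vSwitch setting. A
       minimal sketch, using hypothetical names (ln-port, eth1):

              ovn-nbctl set Logical_Switch_Port ln-port \
                  options:qdisc_queue_id=10
              # On the chassis where the localnet port attaches:
              ovs-vsctl set Interface eth1 \
                  external_ids:ovn-egress-iface=true
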
1656 Logical Router Datapaths
       Logical router datapaths will only exist for Logical_Router rows
       in the OVN_Northbound database that do not have enabled set to
       false.
1659
1660 Ingress Table 0: L2 Admission Control
1661
1662 This table drops packets that the router shouldn’t see at all based on
1663 their Ethernet headers. It contains the following flows:
1664
1665 • Priority-100 flows to drop packets with VLAN tags or mul‐
1666 ticast Ethernet source addresses.
1667
              • For each enabled router port P with Ethernet address E,
                a priority-50 flow that matches inport == P &&
                (eth.mcast || eth.dst == E), stores the router port
                Ethernet address, and advances to the next table, with
                the action xreg0[0..47] = E; next;.
1673
1674 For the gateway port on a distributed logical router
1675 (where one of the logical router ports specifies a gate‐
1676 way chassis), the above flow matching eth.dst == E is
1677 only programmed on the gateway port instance on the gate‐
1678 way chassis.
1679
                For a distributed logical router or for a gateway router
                where the port is configured with options:gateway_mtu,
                the action of the above flow is modified to add
                check_pkt_larger, which sets REGBIT_PKT_LARGER if the
                packet size is greater than the MTU. If the port is also
                configured with options:gateway_mtu_bypass, then another
                flow is added, with priority 55, to bypass the
                check_pkt_larger flow. This is useful for traffic that
                normally doesn’t need to be fragmented and for which
                check_pkt_larger, which might not be offloadable, is not
                really needed. One such example is TCP traffic. (A
                configuration sketch follows this list.)
1691
1692 • For each dnat_and_snat NAT rule on a distributed router
1693 that specifies an external Ethernet address E, a prior‐
1694 ity-50 flow that matches inport == GW && eth.dst == E,
1695 where GW is the logical router gateway port, with action
1696 xreg0[0..47]=E; next;.
1697
1698 This flow is only programmed on the gateway port instance
1699 on the chassis where the logical_port specified in the
1700 NAT rule resides.
1701
1702 Other packets are implicitly dropped.
1703
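       As an illustration, the gateway MTU behavior above is enabled per
       logical router port; the names and values below are only
       examples:

              ovn-nbctl set Logical_Router_Port lrp0 \
                  options:gateway_mtu=1500
              ovn-nbctl set Logical_Router_Port lrp0 \
                  options:gateway_mtu_bypass="tcp"
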
1704 Ingress Table 1: Neighbor lookup
1705
       For ARP and IPv6 Neighbor Discovery packets, this table looks
       into the MAC_Binding records to determine if OVN needs to learn
       the mac bindings. The following flows are added (a configuration
       sketch for the always_learn_from_arp_request option follows the
       list):
1709
1710 • For each router port P that owns IP address A, which be‐
1711 longs to subnet S with prefix length L, if the option al‐
1712 ways_learn_from_arp_request is true for this router, a
1713 priority-100 flow is added which matches inport == P &&
1714 arp.spa == S/L && arp.op == 1 (ARP request) with the fol‐
1715 lowing actions:
1716
1717 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1718 next;
1719
1720
1721 If the option always_learn_from_arp_request is false, the
1722 following two flows are added.
1723
1724 A priority-110 flow is added which matches inport == P &&
1725 arp.spa == S/L && arp.tpa == A && arp.op == 1 (ARP re‐
1726 quest) with the following actions:
1727
1728 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1729 reg9[3] = 1;
1730 next;
1731
1732
1733 A priority-100 flow is added which matches inport == P &&
1734 arp.spa == S/L && arp.op == 1 (ARP request) with the fol‐
1735 lowing actions:
1736
1737 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1738 reg9[3] = lookup_arp_ip(inport, arp.spa);
1739 next;
1740
1741
                If the logical router port P is a distributed gateway
                router port, an additional match
                is_chassis_resident(cr-P) is added for all these flows.
1745
1746 • A priority-100 flow which matches on ARP reply packets
1747 and applies the actions if the option al‐
1748 ways_learn_from_arp_request is true:
1749
1750 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1751 next;
1752
1753
1754 If the option always_learn_from_arp_request is false, the
1755 above actions will be:
1756
1757 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1758 reg9[3] = 1;
1759 next;
1760
1761
1762 • A priority-100 flow which matches on IPv6 Neighbor Dis‐
1763 covery advertisement packet and applies the actions if
1764 the option always_learn_from_arp_request is true:
1765
1766 reg9[2] = lookup_nd(inport, nd.target, nd.tll);
1767 next;
1768
1769
1770 If the option always_learn_from_arp_request is false, the
1771 above actions will be:
1772
1773 reg9[2] = lookup_nd(inport, nd.target, nd.tll);
1774 reg9[3] = 1;
1775 next;
1776
1777
1778 • A priority-100 flow which matches on IPv6 Neighbor Dis‐
1779 covery solicitation packet and applies the actions if the
1780 option always_learn_from_arp_request is true:
1781
1782 reg9[2] = lookup_nd(inport, ip6.src, nd.sll);
1783 next;
1784
1785
1786 If the option always_learn_from_arp_request is false, the
1787 above actions will be:
1788
1789 reg9[2] = lookup_nd(inport, ip6.src, nd.sll);
1790 reg9[3] = lookup_nd_ip(inport, ip6.src);
1791 next;
1792
1793
1794 • A priority-0 fallback flow that matches all packets and
1795 applies the action reg9[2] = 1; next; advancing the
1796 packet to the next table.
1797
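       As an illustration, the lookup behavior above is controlled by a
       logical router option; r0 is a hypothetical router name:

              ovn-nbctl set Logical_Router r0 \
                  options:always_learn_from_arp_request=false
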
1798 Ingress Table 2: Neighbor learning
1799
1800 This table adds flows to learn the mac bindings from the ARP and IPv6
1801 Neighbor Solicitation/Advertisement packets if it is needed according
1802 to the lookup results from the previous stage.
1803
1804 reg9[2] will be 1 if the lookup_arp/lookup_nd in the previous table was
1805 successful or skipped, meaning no need to learn mac binding from the
1806 packet.
1807
1808 reg9[3] will be 1 if the lookup_arp_ip/lookup_nd_ip in the previous ta‐
1809 ble was successful or skipped, meaning it is ok to learn mac binding
1810 from the packet (if reg9[2] is 0).
1811
              • A priority-100 flow with the match reg9[2] == 1 ||
                reg9[3] == 0 that advances the packet to the next table,
                as there is no need to learn the neighbor.
1815
1816 • A priority-90 flow with the match arp and applies the ac‐
1817 tion put_arp(inport, arp.spa, arp.sha); next;
1818
1819 • A priority-95 flow with the match nd_na && nd.tll == 0
1820 and applies the action put_nd(inport, nd.target,
1821 eth.src); next;
1822
1823 • A priority-90 flow with the match nd_na and applies the
1824 action put_nd(inport, nd.target, nd.tll); next;
1825
1826 • A priority-90 flow with the match nd_ns and applies the
1827 action put_nd(inport, ip6.src, nd.sll); next;
1828
1829 Ingress Table 3: IP Input
1830
1831 This table is the core of the logical router datapath functionality. It
1832 contains the following flows to implement very basic IP host function‐
1833 ality.
1834
              • For each dnat_and_snat NAT rule on distributed logical
                routers or gateway routers with the gateway port
                configured with options:gateway_mtu set to a valid
                integer value M, a priority-160 flow with the match
                inport == LRP && REGBIT_PKT_LARGER &&
                REGBIT_EGRESS_LOOPBACK == 0, where LRP is the logical
                router port, applies the following action for IPv4 and
                IPv6 respectively:
1842
1843 icmp4_error {
1844 icmp4.type = 3; /* Destination Unreachable. */
1845 icmp4.code = 4; /* Frag Needed and DF was Set. */
1846 icmp4.frag_mtu = M;
1847 eth.dst = eth.src;
1848 eth.src = E;
1849 ip4.dst = ip4.src;
1850 ip4.src = I;
1851 ip.ttl = 255;
1852 REGBIT_EGRESS_LOOPBACK = 1;
                        REGBIT_PKT_LARGER = 0;
1854 outport = LRP;
1855 flags.loopback = 1;
1856 output;
1857 };
1858 icmp6_error {
1859 icmp6.type = 2;
1860 icmp6.code = 0;
1861 icmp6.frag_mtu = M;
1862 eth.dst = eth.src;
1863 eth.src = E;
1864 ip6.dst = ip6.src;
1865 ip6.src = I;
1866 ip.ttl = 255;
1867 REGBIT_EGRESS_LOOPBACK = 1;
                        REGBIT_PKT_LARGER = 0;
1869 outport = LRP;
1870 flags.loopback = 1;
1871 output;
1872 };
1873
1874
1875 where E and I are the NAT rule external mac and IP re‐
1876 spectively.
1877
              • For distributed logical routers or gateway routers with
                the gateway port configured with options:gateway_mtu set
                to a valid integer value M, a priority-150 flow with the
                match inport == LRP && REGBIT_PKT_LARGER &&
                REGBIT_EGRESS_LOOPBACK == 0, where LRP is the logical
                router port, applies the following action for IPv4 and
                IPv6 respectively:
1885
1886 icmp4_error {
1887 icmp4.type = 3; /* Destination Unreachable. */
1888 icmp4.code = 4; /* Frag Needed and DF was Set. */
1889 icmp4.frag_mtu = M;
1890 eth.dst = E;
1891 ip4.dst = ip4.src;
1892 ip4.src = I;
1893 ip.ttl = 255;
1894 REGBIT_EGRESS_LOOPBACK = 1;
                        REGBIT_PKT_LARGER = 0;
1896 next(pipeline=ingress, table=0);
1897 };
1898 icmp6_error {
1899 icmp6.type = 2;
1900 icmp6.code = 0;
1901 icmp6.frag_mtu = M;
1902 eth.dst = E;
1903 ip6.dst = ip6.src;
1904 ip6.src = I;
1905 ip.ttl = 255;
1906 REGBIT_EGRESS_LOOPBACK = 1;
                        REGBIT_PKT_LARGER = 0;
1908 next(pipeline=ingress, table=0);
1909 };
1910
1911
1912 • For each NAT entry of a distributed logical router (with
1913 distributed gateway router port) of type snat, a prior‐
1914 ity-120 flow with the match inport == P && ip4.src == A
1915 advances the packet to the next pipeline, where P is the
1916 distributed logical router port and A is the external_ip
1917 set in the NAT entry. If A is an IPv6 address, then
1918 ip6.src is used for the match.
1919
                The above flow is required to handle the routing of
                east/west NAT traffic.
1922
              • For each BFD port, the following two priority-110 flows
                are added to manage BFD traffic:

                • if ip4.src or ip6.src is any IP address owned by
                  the router port and udp.dst == 3784, the packet
                  is advanced to the next pipeline stage.

                • if ip4.dst or ip6.dst is any IP address owned by
                  the router port and udp.dst == 3784, the
                  handle_bfd_msg action is executed.
1933
              • L3 admission control: Priority-120 flows allow IGMP and
                MLD packets if the router has logical ports that have
                options:mcast_flood=’true’.
1937
1938 • L3 admission control: A priority-100 flow drops packets
1939 that match any of the following:
1940
1941 • ip4.src[28..31] == 0xe (multicast source)
1942
1943 • ip4.src == 255.255.255.255 (broadcast source)
1944
1945 • ip4.src == 127.0.0.0/8 || ip4.dst == 127.0.0.0/8
1946 (localhost source or destination)
1947
1948 • ip4.src == 0.0.0.0/8 || ip4.dst == 0.0.0.0/8 (zero
1949 network source or destination)
1950
1951 • ip4.src or ip6.src is any IP address owned by the
1952 router, unless the packet was recirculated due to
1953 egress loopback as indicated by REG‐
1954 BIT_EGRESS_LOOPBACK.
1955
1956 • ip4.src is the broadcast address of any IP network
1957 known to the router.
1958
              • A priority-100 flow parses DHCPv6 replies from IPv6
                prefix delegation routers (udp.src == 547 && udp.dst ==
                546). The handle_dhcpv6_reply action is used to send
                IPv6 prefix delegation messages to the delegation
                router.
1963
1964 • ICMP echo reply. These flows reply to ICMP echo requests
1965 received for the router’s IP address. Let A be an IP ad‐
1966 dress owned by a router port. Then, for each A that is an
1967 IPv4 address, a priority-90 flow matches on ip4.dst == A
1968 and icmp4.type == 8 && icmp4.code == 0 (ICMP echo re‐
1969 quest). For each A that is an IPv6 address, a priority-90
1970 flow matches on ip6.dst == A and icmp6.type == 128 &&
1971 icmp6.code == 0 (ICMPv6 echo request). The port of the
1972 router that receives the echo request does not matter.
1973 Also, the ip.ttl of the echo request packet is not
1974 checked, so it complies with RFC 1812, section 4.2.2.9.
1975 Flows for ICMPv4 echo requests use the following actions:
1976
1977 ip4.dst <-> ip4.src;
1978 ip.ttl = 255;
1979 icmp4.type = 0;
1980 flags.loopback = 1;
1981 next;
1982
1983
1984 Flows for ICMPv6 echo requests use the following actions:
1985
1986 ip6.dst <-> ip6.src;
1987 ip.ttl = 255;
1988 icmp6.type = 129;
1989 flags.loopback = 1;
1990 next;
1991
1992
1993 • Reply to ARP requests.
1994
1995 These flows reply to ARP requests for the router’s own IP
                address. The ARP requests are handled only if the
                requestor’s IP belongs to the same subnet as the logical
                router port. For each router port P that owns IP address
1999 A, which belongs to subnet S with prefix length L, and
2000 Ethernet address E, a priority-90 flow matches inport ==
2001 P && arp.spa == S/L && arp.op == 1 && arp.tpa == A (ARP
2002 request) with the following actions:
2003
2004 eth.dst = eth.src;
2005 eth.src = xreg0[0..47];
2006 arp.op = 2; /* ARP reply. */
2007 arp.tha = arp.sha;
2008 arp.sha = xreg0[0..47];
2009 arp.tpa = arp.spa;
2010 arp.spa = A;
2011 outport = inport;
2012 flags.loopback = 1;
2013 output;
2014
2015
2016 For the gateway port on a distributed logical router
2017 (where one of the logical router ports specifies a gate‐
2018 way chassis), the above flows are only programmed on the
2019 gateway port instance on the gateway chassis. This behav‐
2020 ior avoids generation of multiple ARP responses from dif‐
2021 ferent chassis, and allows upstream MAC learning to point
2022 to the gateway chassis.
2023
2024 For the logical router port with the option reside-on-re‐
2025 direct-chassis set (which is centralized), the above
2026 flows are only programmed on the gateway port instance on
2027 the gateway chassis (if the logical router has a distrib‐
2028 uted gateway port). This behavior avoids generation of
2029 multiple ARP responses from different chassis, and allows
2030 upstream MAC learning to point to the gateway chassis.
2031
2032 • Reply to IPv6 Neighbor Solicitations. These flows reply
2033 to Neighbor Solicitation requests for the router’s own
2034 IPv6 address and populate the logical router’s mac bind‐
2035 ing table.
2036
2037 For each router port P that owns IPv6 address A, so‐
2038 licited node address S, and Ethernet address E, a prior‐
2039 ity-90 flow matches inport == P && nd_ns && ip6.dst ==
2040 {A, E} && nd.target == A with the following actions:
2041
2042 nd_na_router {
2043 eth.src = xreg0[0..47];
2044 ip6.src = A;
2045 nd.target = A;
2046 nd.tll = xreg0[0..47];
2047 outport = inport;
2048 flags.loopback = 1;
2049 output;
2050 };
2051
2052
2053 For the gateway port on a distributed logical router
2054 (where one of the logical router ports specifies a gate‐
2055 way chassis), the above flows replying to IPv6 Neighbor
2056 Solicitations are only programmed on the gateway port in‐
2057 stance on the gateway chassis. This behavior avoids gen‐
2058 eration of multiple replies from different chassis, and
2059 allows upstream MAC learning to point to the gateway
2060 chassis.
2061
2062 • These flows reply to ARP requests or IPv6 neighbor solic‐
2063 itation for the virtual IP addresses configured in the
2064 router for NAT (both DNAT and SNAT) or load balancing.
2065
2066 IPv4: For a configured NAT (both DNAT and SNAT) IP ad‐
2067 dress or a load balancer IPv4 VIP A, for each router port
2068 P with Ethernet address E, a priority-90 flow matches
2069 arp.op == 1 && arp.tpa == A (ARP request) with the fol‐
2070 lowing actions:
2071
2072 eth.dst = eth.src;
2073 eth.src = xreg0[0..47];
2074 arp.op = 2; /* ARP reply. */
2075 arp.tha = arp.sha;
2076 arp.sha = xreg0[0..47];
2077 arp.tpa <-> arp.spa;
2078 outport = inport;
2079 flags.loopback = 1;
2080 output;
2081
2082
2083 IPv4: For a configured load balancer IPv4 VIP, a similar
2084 flow is added with the additional match inport == P if
2085 the VIP is reachable from any logical router port of the
2086 logical router.
2087
2088 If the router port P is a distributed gateway router
2089 port, then the is_chassis_resident(P) is also added in
2090 the match condition for the load balancer IPv4 VIP A.
2091
2092 IPv6: For a configured NAT (both DNAT and SNAT) IP ad‐
2093 dress or a load balancer IPv6 VIP A (if the VIP is reach‐
2094 able from any logical router port of the logical router),
2095 solicited node address S, for each router port P with
2096 Ethernet address E, a priority-90 flow matches inport ==
2097 P && nd_ns && ip6.dst == {A, S} && nd.target == A with
2098 the following actions:
2099
2100 eth.dst = eth.src;
2101 nd_na {
2102 eth.src = xreg0[0..47];
2103 nd.tll = xreg0[0..47];
2104 ip6.src = A;
2105 nd.target = A;
2106 outport = inport;
2107 flags.loopback = 1;
2108 output;
2109 }
2110
2111
2112 If the router port P is a distributed gateway router
2113 port, then the is_chassis_resident(P) is also added in
2114 the match condition for the load balancer IPv6 VIP A.
2115
2116 For the gateway port on a distributed logical router with
2117 NAT (where one of the logical router ports specifies a
2118 gateway chassis):
2119
2120 • If the corresponding NAT rule cannot be handled in
2121 a distributed manner, then a priority-92 flow is
2122 programmed on the gateway port instance on the
2123 gateway chassis. A priority-91 drop flow is pro‐
2124 grammed on the other chassis when ARP requests/NS
2125 packets are received on the gateway port. This be‐
2126 havior avoids generation of multiple ARP responses
2127 from different chassis, and allows upstream MAC
2128 learning to point to the gateway chassis.
2129
2130 • If the corresponding NAT rule can be handled in a
2131 distributed manner, then this flow is only pro‐
2132 grammed on the gateway port instance where the
2133 logical_port specified in the NAT rule resides.
2134
2135 Some of the actions are different for this case,
2136 using the external_mac specified in the NAT rule
2137 rather than the gateway port’s Ethernet address E:
2138
2139 eth.src = external_mac;
2140 arp.sha = external_mac;
2141
2142
                  or in the case of IPv6 neighbor solicitation:
2144
2145 eth.src = external_mac;
2146 nd.tll = external_mac;
2147
2148
2149 This behavior avoids generation of multiple ARP
2150 responses from different chassis, and allows up‐
2151 stream MAC learning to point to the correct chas‐
2152 sis.
2153
              • Priority-85 flows which drop the ARP and IPv6 Neighbor
                Discovery packets.
2156
2157 • A priority-84 flow explicitly allows IPv6 multicast traf‐
2158 fic that is supposed to reach the router pipeline (i.e.,
2159 router solicitation and router advertisement packets).
2160
2161 • A priority-83 flow explicitly drops IPv6 multicast traf‐
2162 fic that is destined to reserved multicast groups.
2163
2164 • A priority-82 flow allows IP multicast traffic if op‐
2165 tions:mcast_relay=’true’, otherwise drops it.
2166
2167 • UDP port unreachable. Priority-80 flows generate ICMP
2168 port unreachable messages in reply to UDP datagrams di‐
2169 rected to the router’s IP address, except in the special
2170 case of gateways, which accept traffic directed to a
2171 router IP for load balancing and NAT purposes.
2172
2173 These flows should not match IP fragments with nonzero
2174 offset.
2175
2176 • TCP reset. Priority-80 flows generate TCP reset messages
2177 in reply to TCP datagrams directed to the router’s IP ad‐
2178 dress, except in the special case of gateways, which ac‐
2179 cept traffic directed to a router IP for load balancing
2180 and NAT purposes.
2181
2182 These flows should not match IP fragments with nonzero
2183 offset.
2184
2185 • Protocol or address unreachable. Priority-70 flows gener‐
2186 ate ICMP protocol or address unreachable messages for
2187 IPv4 and IPv6 respectively in reply to packets directed
2188 to the router’s IP address on IP protocols other than
2189 UDP, TCP, and ICMP, except in the special case of gate‐
2190 ways, which accept traffic directed to a router IP for
2191 load balancing purposes.
2192
2193 These flows should not match IP fragments with nonzero
2194 offset.
2195
2196 • Drop other IP traffic to this router. These flows drop
2197 any other traffic destined to an IP address of this
2198 router that is not already handled by one of the flows
2199 above, which amounts to ICMP (other than echo requests)
2200 and fragments with nonzero offsets. For each IP address A
2201 owned by the router, a priority-60 flow matches ip4.dst
2202 == A or ip6.dst == A and drops the traffic. An exception
2203 is made and the above flow is not added if the router
2204 port’s own IP address is used to SNAT packets passing
2205 through that router.
2206
2207 The flows above handle all of the traffic that might be directed to the
2208 router itself. The following flows (with lower priorities) handle the
2209 remaining traffic, potentially for forwarding:
2210
2211 • Drop Ethernet local broadcast. A priority-50 flow with
2212 match eth.bcast drops traffic destined to the local Eth‐
2213 ernet broadcast address. By definition this traffic
2214 should not be forwarded.
2215
2216 • ICMP time exceeded. For each router port P, whose IP ad‐
2217 dress is A, a priority-100 flow with match inport == P &&
2218 ip.ttl == {0, 1} && !ip.later_frag matches packets whose
2219 TTL has expired, with the following actions to send an
2220 ICMP time exceeded reply for IPv4 and IPv6 respectively:
2221
2222 icmp4 {
2223 icmp4.type = 11; /* Time exceeded. */
2224 icmp4.code = 0; /* TTL exceeded in transit. */
2225 ip4.dst = ip4.src;
2226 ip4.src = A;
2227 ip.ttl = 254;
2228 next;
2229 };
2230 icmp6 {
2231 icmp6.type = 3; /* Time exceeded. */
2232 icmp6.code = 0; /* TTL exceeded in transit. */
2233 ip6.dst = ip6.src;
2234 ip6.src = A;
2235 ip.ttl = 254;
2236 next;
2237 };
2238
2239
              • TTL discard. A priority-30 flow with match ip.ttl == {0,
                1} and actions drop; drops other packets whose TTL has
                expired and that should not receive an ICMP error reply
                (i.e., fragments with nonzero offset).
2244
              • Next table. A priority-0 flow matches all packets that
                aren’t already handled and uses the action next; to feed
                them to the next table.
2248
2249 Ingress Table 4: UNSNAT
2250
       This is for already established connections’ reverse traffic,
       i.e., SNAT has already been done in the egress pipeline and now
       the packet has entered the ingress pipeline as part of a reply.
       It is unSNATted here.
2254
2255 Ingress Table 4: UNSNAT on Gateway and Distributed Routers
2256
              • If the Router (Gateway or Distributed) is configured
                with load balancers, then the below logical flows are
                added:

                For each IPv4 address A defined as a load balancer VIP
                with protocol P (and protocol port T if defined) that is
                also present as an external_ip in the NAT table, a
                priority-120 logical flow is added with the match ip4 &&
                ip4.dst == A && P with the action next; to advance the
                packet to the next table. If the load balancer has
                protocol port T defined, then the match also has
                P.dst == T.
2267
2268 The above flows are also added for IPv6 load balancers.
2269
2270 Ingress Table 4: UNSNAT on Gateway Routers
2271
2272 • If the Gateway router has been configured to force SNAT
2273 any previously DNATted packets to B, a priority-110 flow
                matches ip && ip4.dst == B or ip && ip6.dst == B with an
                action ct_snat;.
2276
2277 If the Gateway router is configured with
2278 lb_force_snat_ip=router_ip then for every logical router
2279 port P attached to the Gateway router with the router ip
2280 B, a priority-110 flow is added with the match inport ==
                P && ip4.dst == B or inport == P && ip6.dst == B with an
                action ct_snat;.
2283
2284 If the Gateway router has been configured to force SNAT
2285 any previously load-balanced packets to B, a priority-100
                flow matches ip && ip4.dst == B or ip && ip6.dst == B
                with an action ct_snat;.
2288
2289 For each NAT configuration in the OVN Northbound data‐
2290 base, that asks to change the source IP address of a
2291 packet from A to B, a priority-90 flow matches ip &&
                ip4.dst == B or ip && ip6.dst == B with an action
                ct_snat;. If the NAT rule is of type dnat_and_snat and
                has stateless=true in the options, then the action would
                be ip4/6.dst=(B).
2296
2297 A priority-0 logical flow with match 1 has actions next;.
2298
2299 Ingress Table 4: UNSNAT on Distributed Routers
2300
2301 • For each configuration in the OVN Northbound database,
2302 that asks to change the source IP address of a packet
2303 from A to B, two priority-100 flows are added.
2304
2305 If the NAT rule cannot be handled in a distributed man‐
2306 ner, then the below priority-100 flows are only pro‐
2307 grammed on the gateway chassis.
2308
2309 • The first flow matches ip && ip4.dst == B && in‐
2310 port == GW && flags.loopback == 0 or ip && ip6.dst
2311 == B && inport == GW && flags.loopback == 0 where
2312 GW is the distributed gateway port specified in
2313 the NAT rule, with an action ct_snat_in_czone; to
2314 unSNAT in the common zone. If the NAT rule is of
2315 type dnat_and_snat and has stateless=true in the
2316 options, then the action would be ip4/6.dst=(B).
2317
2318 If the NAT entry is of type snat, then there is an
2319 additional match is_chassis_resident(cr-GW)
2320 where cr-GW is the chassis resident port of GW.
2321
                • The second flow matches ip && ip4.dst == B && in‐
                  port == GW && flags.loopback == 1 &&
                  flags.use_snat_zone == 1 or ip && ip6.dst == B &&
                  inport == GW && flags.loopback == 1 &&
                  flags.use_snat_zone == 1 where GW is the distrib‐
2327 uted gateway port specified in the NAT rule, with
2328 an action ct_snat; to unSNAT in the snat zone. If
2329 the NAT rule is of type dnat_and_snat and has
2330 stateless=true in the options, then the action
2331 would be ip4/6.dst=(B).
2332
2333 If the NAT entry is of type snat, then there is an
2334 additional match is_chassis_resident(cr-GW)
2335 where cr-GW is the chassis resident port of GW.
2336
2337 A priority-0 logical flow with match 1 has actions next;.
2338
2339 Ingress Table 5: DEFRAG
2340
2341 This is to send packets to connection tracker for tracking and defrag‐
2342 mentation. It contains a priority-0 flow that simply moves traffic to
2343 the next table.
2344
2345 If load balancing rules with only virtual IP addresses are configured
2346 in OVN_Northbound database for a Gateway router, a priority-100 flow is
2347 added for each configured virtual IP address VIP. For IPv4 VIPs the
2348 flow matches ip && ip4.dst == VIP. For IPv6 VIPs, the flow matches ip
2349 && ip6.dst == VIP. The flow applies the action reg0 = VIP; ct_dnat; (or
2350 xxreg0 for IPv6) to send IP packets to the connection tracker for
2351 packet de-fragmentation and to dnat the destination IP for the commit‐
2352 ted connection before sending it to the next table.
2353
2354 If load balancing rules with virtual IP addresses and ports are config‐
2355 ured in OVN_Northbound database for a Gateway router, a priority-110
2356 flow is added for each configured virtual IP address VIP, protocol
2357 PROTO and port PORT. For IPv4 VIPs the flow matches ip && ip4.dst ==
2358 VIP && PROTO && PROTO.dst == PORT. For IPv6 VIPs, the flow matches ip
2359 && ip6.dst == VIP && PROTO && PROTO.dst == PORT. The flow applies the
2360 action reg0 = VIP; reg9[16..31] = PROTO.dst; ct_dnat; (or xxreg0 for
2361 IPv6) to send IP packets to the connection tracker for packet de-frag‐
2362 mentation and to dnat the destination IP for the committed connection
2363 before sending it to the next table.
2364
2365 If ECMP routes with symmetric reply are configured in the OVN_North‐
2366 bound database for a gateway router, a priority-100 flow is added for
2367 each router port on which symmetric replies are configured. The match‐
2368 ing logic for these ports essentially reverses the configured logic of
2369 the ECMP route. So for instance, a route with a destination routing
2370 policy will instead match if the source IP address matches the static
2371 route’s prefix. The flow uses the action ct_next to send IP packets to
2372 the connection tracker for packet de-fragmentation and tracking before
2373 sending it to the next table.
2374
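       As an illustration, the VIP flows above appear once a load
       balancer is attached to a gateway router. A minimal sketch, using
       hypothetical names (lb0, r0) and example addresses:

              ovn-nbctl lb-add lb0 10.0.0.100:80 \
                  "192.168.1.2:80,192.168.1.3:80" tcp
              ovn-nbctl lr-lb-add r0 lb0
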
2375 Ingress Table 6: DNAT
2376
       Packets enter the pipeline with a destination IP address that
       needs to be DNATted from a virtual IP address to a real IP
       address. Packets in the reverse direction need to be unDNATted.
2380
2381 Ingress Table 6: Load balancing DNAT rules
2382
       The following load balancing DNAT flows are added for a Gateway
       router or a router with a gateway port. These flows are
       programmed only on the gateway chassis. These flows do not get
       programmed for load balancers with IPv6 VIPs.
2387
2388 • If controller_event has been enabled for all the config‐
2389 ured load balancing rules for a Gateway router or Router
2390 with gateway port in OVN_Northbound database that does
2391 not have configured backends, a priority-130 flow is
2392 added to trigger ovn-controller events whenever the chas‐
                sis receives a packet for that particular VIP. If the
                event-elb meter has been previously created, it will be
                associated with the empty_lb logical flow.
2396
2397 • For all the configured load balancing rules for a Gateway
2398 router or Router with gateway port in OVN_Northbound
2399 database that includes a L4 port PORT of protocol P and
2400 IPv4 or IPv6 address VIP, a priority-120 flow that
2401 matches on ct.new && ip && reg0 == VIP && P &&
2402 reg9[16..31] == PORT (xxreg0 == VIP in the IPv6 case)
2403 with an action of ct_lb_mark(args), where args contains
2404 comma separated IPv4 or IPv6 addresses (and optional port
2405 numbers) to load balance to. If the router is configured
2406 to force SNAT any load-balanced packets, the above action
2407 will be replaced by flags.force_snat_for_lb = 1;
2408 ct_lb_mark(args);. If the load balancing rule is config‐
2409 ured with skip_snat set to true, the above action will be
2410 replaced by flags.skip_snat_for_lb = 1;
2411 ct_lb_mark(args);. If health check is enabled, then args
2412 will only contain those endpoints whose service monitor
2413 status entry in OVN_Southbound db is either online or
2414 empty.
2415
2416 The previous table lr_in_defrag sets the register reg0
2417 (or xxreg0 for IPv6) and does ct_dnat. Hence for estab‐
2418 lished traffic, this table just advances the packet to
2419 the next stage.
2420
2421 • For all the configured load balancing rules for a router
2422 in OVN_Northbound database that includes a L4 port PORT
2423 of protocol P and IPv4 or IPv6 address VIP, a prior‐
2424 ity-120 flow that matches on ct.est && ip4 && reg0 == VIP
2425 && P && reg9[16..31] == PORT (ip6 and xxreg0 == VIP in
2426 the IPv6 case) with an action of next;. If the router is
2427 configured to force SNAT any load-balanced packets, the
2428 above action will be replaced by flags.force_snat_for_lb
2429 = 1; next;. If the load balancing rule is configured with
2430 skip_snat set to true, the above action will be replaced
2431 by flags.skip_snat_for_lb = 1; next;.
2432
2433 The previous table lr_in_defrag sets the register reg0
2434 (or xxreg0 for IPv6) and does ct_dnat. Hence for estab‐
2435 lished traffic, this table just advances the packet to
2436 the next stage.
2437
2438 • For all the configured load balancing rules for a router
2439 in OVN_Northbound database that includes just an IP ad‐
2440 dress VIP to match on, a priority-110 flow that matches
2441 on ct.new && ip4 && reg0 == VIP (ip6 and xxreg0 == VIP in
2442 the IPv6 case) with an action of ct_lb_mark(args), where
2443 args contains comma separated IPv4 or IPv6 addresses. If
2444 the router is configured to force SNAT any load-balanced
2445 packets, the above action will be replaced by
2446 flags.force_snat_for_lb = 1; ct_lb_mark(args);. If the
2447 load balancing rule is configured with skip_snat set to
2448 true, the above action will be replaced by
2449 flags.skip_snat_for_lb = 1; ct_lb_mark(args);.
2450
2451 The previous table lr_in_defrag sets the register reg0
2452 (or xxreg0 for IPv6) and does ct_dnat. Hence for estab‐
2453 lished traffic, this table just advances the packet to
2454 the next stage.
2455
2456 • For all the configured load balancing rules for a router
2457 in OVN_Northbound database that includes just an IP ad‐
2458 dress VIP to match on, a priority-110 flow that matches
2459 on ct.est && ip4 && reg0 == VIP (or ip6 and xxreg0 ==
2460 VIP) with an action of next;. If the router is configured
2461 to force SNAT any load-balanced packets, the above action
2462 will be replaced by flags.force_snat_for_lb = 1; next;.
2463 If the load balancing rule is configured with skip_snat
2464 set to true, the above action will be replaced by
2465 flags.skip_snat_for_lb = 1; next;.
2466
2467 The previous table lr_in_defrag sets the register reg0
2468 (or xxreg0 for IPv6) and does ct_dnat. Hence for estab‐
2469 lished traffic, this table just advances the packet to
2470 the next stage.
2471
              • If the load balancer is created with the --reject option
                and it has no active backends, a TCP reset segment (for
                tcp) or an ICMP port unreachable packet (for all other
                kinds of traffic) will be sent whenever an incoming
                packet is received for this load balancer. Note that
                using the --reject option disables the empty_lb SB
                controller event for this load balancer. (A
                configuration sketch follows this list.)
2479
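       As an illustration, the reject behavior above corresponds to the
       --reject flag of lb-add; lb0 and the addresses are only examples:

              ovn-nbctl --reject lb-add lb0 10.0.0.200:80 \
                  192.168.1.10:80 tcp
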
2480 Ingress Table 6: DNAT on Gateway Routers
2481
2482 • For each configuration in the OVN Northbound database,
2483 that asks to change the destination IP address of a
2484 packet from A to B, a priority-100 flow matches ip &&
2485 ip4.dst == A or ip && ip6.dst == A with an action
2486 flags.loopback = 1; ct_dnat(B);. If the Gateway router is
2487 configured to force SNAT any DNATed packet, the above ac‐
2488 tion will be replaced by flags.force_snat_for_dnat = 1;
2489 flags.loopback = 1; ct_dnat(B);. If the NAT rule is of
2490 type dnat_and_snat and has stateless=true in the options,
2491 then the action would be ip4/6.dst= (B).
2492
                If the NAT rule has allowed_ext_ips configured, then
                there is an additional match ip4.src == allowed_ext_ips.
                Similarly, for IPv6, the match would be ip6.src ==
                allowed_ext_ips.
2497
                If the NAT rule has exempted_ext_ips set, then there is
                an additional flow configured at priority 101. The flow
                matches if the source IP is an exempted_ext_ip and the
                action is next;. This flow is used to bypass the ct_dnat
                action for a packet originating from exempted_ext_ips.
2503
2504 • A priority-0 logical flow with match 1 has actions next;.
2505
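       As an illustration, the NAT flows above are created with
       lr-nat-add; the names and addresses below are only examples:

              ovn-nbctl lr-nat-add r0 dnat_and_snat \
                  172.16.1.10 192.168.1.5
              # Stateless variant (ip4/6.dst=(B) style actions):
              ovn-nbctl --stateless lr-nat-add r0 dnat_and_snat \
                  172.16.1.11 192.168.1.6
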
2506 Ingress Table 6: DNAT on Distributed Routers
2507
2508 On distributed routers, the DNAT table only handles packets with desti‐
2509 nation IP address that needs to be DNATted from a virtual IP address to
2510 a real IP address. The unDNAT processing in the reverse direction is
2511 handled in a separate table in the egress pipeline.
2512
2513 • For each configuration in the OVN Northbound database,
2514 that asks to change the destination IP address of a
2515 packet from A to B, a priority-100 flow matches ip &&
2516 ip4.dst == B && inport == GW, where GW is the logical
2517 router gateway port configured for the NAT rule, with an
2518 action ct_dnat(B);. The match will include ip6.dst == B
2519 in the IPv6 case. If the NAT rule is of type
2520 dnat_and_snat and has stateless=true in the options, then
2521 the action would be ip4/6.dst=(B).
2522
2523 If the NAT rule cannot be handled in a distributed man‐
2524 ner, then the priority-100 flow above is only programmed
2525 on the gateway chassis.
2526
                If the NAT rule has allowed_ext_ips configured, then
                there is an additional match ip4.src == allowed_ext_ips.
                Similarly, for IPv6, the match would be ip6.src ==
                allowed_ext_ips.
2531
                If the NAT rule has exempted_ext_ips set, then there is
                an additional flow configured at priority 101. The flow
                matches if the source IP is an exempted_ext_ip and the
                action is next;. This flow is used to bypass the ct_dnat
                action for a packet originating from exempted_ext_ips.
2537
2538 A priority-0 logical flow with match 1 has actions next;.
2539
2540 Ingress Table 7: ECMP symmetric reply processing
2541
              • If ECMP routes with symmetric reply are configured in
                the OVN_Northbound database for a gateway router, a
                priority-100 flow is added for each router port on which
                symmetric replies are configured. The matching logic for
                these ports essentially reverses the configured logic of
                the ECMP route. So for instance, a route with a
                destination routing policy will instead match if the
                source IP address matches the static route’s prefix. The
                flow uses the action ct_commit {
                ct_label.ecmp_reply_eth = eth.src;
                ct_mark.ecmp_reply_port = K; }; next; to commit the
                connection and store eth.src and the ECMP reply port
                binding tunnel key K in the conntrack entry. (A
                configuration sketch follows this list.)
2554
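       As an illustration, such a route is created with the
       --ecmp-symmetric-reply flag; r0 and the addresses are only
       examples:

              ovn-nbctl --ecmp-symmetric-reply lr-route-add r0 \
                  10.10.0.0/24 192.168.0.1
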
2555 Ingress Table 8: IPv6 ND RA option processing
2556
              • A priority-50 logical flow is added for each logical
                router port configured with IPv6 ND RA options, which
                matches IPv6 ND Router Solicitation packets, applies the
                action put_nd_ra_opts, and advances the packet to the
                next table.

                  reg0[5] = put_nd_ra_opts(options); next;
2564
2565
2566 For a valid IPv6 ND RS packet, this transforms the packet
2567 into an IPv6 ND RA reply and sets the RA options to the
2568 packet and stores 1 into reg0[5]. For other kinds of
2569 packets, it just stores 0 into reg0[5]. Either way, it
2570 continues to the next table.
2571
2572 • A priority-0 logical flow with match 1 has actions next;.
2573
2574 Ingress Table 9: IPv6 ND RA responder
2575
2576 This table implements IPv6 ND RA responder for the IPv6 ND RA replies
2577 generated by the previous table.
2578
2579 • A priority-50 logical flow is added for each logical
2580 router port configured with IPv6 ND RA options which
2581 matches IPv6 ND RA packets and reg0[5] == 1 and responds
2582 back to the inport after applying these actions. If
2583 reg0[5] is set to 1, it means that the action
2584 put_nd_ra_opts was successful.
2585
2586 eth.dst = eth.src;
2587 eth.src = E;
2588 ip6.dst = ip6.src;
2589 ip6.src = I;
2590 outport = P;
2591 flags.loopback = 1;
2592 output;
2593
2594
2595 where E is the MAC address and I is the IPv6 link local
2596 address of the logical router port.
2597
2598 (This terminates packet processing in ingress pipeline;
2599 the packet does not go to the next ingress table.)
2600
2601 • A priority-0 logical flow with match 1 has actions next;.
2602
2603 Ingress Table 10: IP Routing Pre
2604
       If a packet arrives at this table from a logical router port P
       which has an options:route_table value set, a priority-100
       logical flow with the match inport == "P" sets a unique,
       per-datapath, non-zero 32-bit value in OVS register 7. This
       register’s value is checked in the next table. If the packet
       didn’t match any configured inport (the <main> route table), the
       register 7 value is set to 0. (A configuration sketch follows the
       list below.)
2611
2612 This table contains the following logical flows:
2613
              • A priority-100 flow with the match inport == "LRP_NAME"
                and an action which sets the route table identifier in
                reg7.

                A priority-0 logical flow with match 1 has actions
                reg7 = 0; next;.
2619
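       As an illustration, route tables are assigned per logical router
       port and per static route; the names below are only examples:

              ovn-nbctl set Logical_Router_Port lrp0 \
                  options:route_table=rtb1
              ovn-nbctl --route-table=rtb1 lr-route-add r0 \
                  10.20.0.0/24 192.168.0.20
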
2620 Ingress Table 11: IP Routing
2621
2622 A packet that arrives at this table is an IP packet that should be
2623 routed to the address in ip4.dst or ip6.dst. This table implements IP
2624 routing, setting reg0 (or xxreg0 for IPv6) to the next-hop IP address
2625 (leaving ip4.dst or ip6.dst, the packet’s final destination, unchanged)
2626 and advances to the next table for ARP resolution. It also sets reg1
2627 (or xxreg1) to the IP address owned by the selected router port
2628 (ingress table ARP Request will generate an ARP request, if needed,
2629 with reg0 as the target protocol address and reg1 as the source proto‐
2630 col address).
2631
       For ECMP routes, i.e., multiple static routes with the same
       policy and prefix but different nexthops, the above actions are
       deferred to the next table. This table, instead, is responsible
       for determining the ECMP group id and selecting a member id
       within the group based on 5-tuple hashing. It stores the group id
       in reg8[0..15] and the member id in reg8[16..31]. This step is
       skipped with a priority-10300 rule if the traffic going out the
       ECMP route is reply traffic, and the ECMP route was configured to
       use symmetric replies. Instead, the values stored in conntrack
       are used to choose the destination. The ct_label.ecmp_reply_eth
       tells the destination MAC address to which the packet should be
       sent. The ct_mark.ecmp_reply_port tells the logical router port
       on which the packet should be sent. These values are saved to the
       conntrack fields when the initial ingress traffic is received
       over the ECMP route and committed to conntrack. The
       priority-10300 flows in this stage set the outport, while the
       eth.dst is set by flows at the ARP/ND Resolution stage.
2647
2648 This table contains the following logical flows:
2649
2650 • Priority-10550 flow that drops IPv6 Router Solicita‐
2651 tion/Advertisement packets that were not processed in
2652 previous tables.
2653
2654 • Priority-10550 flows that drop IGMP and MLD packets with
2655 source MAC address owned by the router. These are used to
2656 prevent looping statically forwarded IGMP and MLD packets
2657 for which TTL is not decremented (it is always 1).
2658
2659 • Priority-10500 flows that match IP multicast traffic des‐
2660 tined to groups registered on any of the attached
2661 switches and sets outport to the associated multicast
2662 group that will eventually flood the traffic to all in‐
2663 terested attached logical switches. The flows also decre‐
2664 ment TTL.
2665
2666 • Priority-10460 flows that match IGMP and MLD control
2667 packets, set outport to the MC_STATIC multicast group,
2668 which ovn-northd populates with the logical ports that
2669 have options :mcast_flood=’true’. If no router ports are
2670 configured to flood multicast traffic the packets are
2671 dropped.
2672
2673 • Priority-10450 flow that matches unregistered IP multi‐
2674 cast traffic decrements TTL and sets outport to the
2675 MC_STATIC multicast group, which ovn-northd populates
2676 with the logical ports that have options
2677 :mcast_flood=’true’. If no router ports are configured to
2678 flood multicast traffic the packets are dropped.
2679
2680 • IPv4 routing table. For each route to IPv4 network N with
2681 netmask M, on router port P with IP address A and Ether‐
2682 net address E, a logical flow with match ip4.dst == N/M,
2683 whose priority is the number of 1-bits in M, has the fol‐
2684 lowing actions:
2685
2686 ip.ttl--;
2687 reg8[0..15] = 0;
2688 reg0 = G;
2689 reg1 = A;
2690 eth.src = E;
2691 outport = P;
2692 flags.loopback = 1;
2693 next;
2694
2695
2696 (Ingress table 1 already verified that ip.ttl--; will not
2697 yield a TTL exceeded error.)
2698
            If the route has a gateway, G is the gateway IP address; if
            instead the route comes from a configured static route, G is
            the next hop IP address; otherwise it is ip4.dst.
2702
2703 • IPv6 routing table. For each route to IPv6 network N with
2704 netmask M, on router port P with IP address A and Ether‐
2705 net address E, a logical flow with match in CIDR notation
2706 ip6.dst == N/M, whose priority is the integer value of M,
2707 has the following actions:
2708
                ip.ttl--;
                reg8[0..15] = 0;
                xxreg0 = G;
                xxreg1 = A;
                eth.src = E;
                outport = P;
                flags.loopback = 1;
                next;
2717
2718
2719 (Ingress table 1 already verified that ip.ttl--; will not
2720 yield a TTL exceeded error.)
2721
            If the route has a gateway, G is the gateway IP address; if
            instead the route comes from a configured static route, G is
            the next hop IP address; otherwise it is ip6.dst.
2725
2726 If the address A is in the link-local scope, the route
2727 will be limited to sending on the ingress port.
2728
            For each static route, reg7 == id && is prefixed to the logical
            flow match. For routes with a route_table value set, a unique
            non-zero id is used. For routes within the <main> route table
            (no route table set), this id value is 0 (see the example after
            this list).

            For each connected route (a route to the LRP’s subnet CIDR),
            the logical flow match has no reg7 == id && prefix, so that
            routes to the LRP’s subnets are present in all routing tables.
2738
          • ECMP routes are grouped by policy and prefix. A unique non-zero
            id is assigned to each group, and each member is also assigned
            a unique non-zero id within its group.
2743
2744 For each IPv4/IPv6 ECMP group with group id GID and mem‐
2745 ber ids MID1, MID2, ..., a logical flow with match in
2746 CIDR notation ip4.dst == N/M, or ip6.dst == N/M, whose
2747 priority is the integer value of M, has the following ac‐
2748 tions:
2749
2750 ip.ttl--;
2751 flags.loopback = 1;
2752 reg8[0..15] = GID;
2753 select(reg8[16..31], MID1, MID2, ...);
2754
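       As a concrete illustration of the flows above (all addresses, names,
       and priorities here are hypothetical), a static route to
       198.51.100.0/24 via nexthop 203.0.113.1, reachable through router
       port lrp0 (IP address 203.0.113.254, Ethernet address
       00:00:00:00:01:01) in the <main> route table, would be rendered
       roughly as a priority-24 flow matching

           reg7 == 0 && ip4.dst == 198.51.100.0/24

       with actions

           ip.ttl--;
           reg8[0..15] = 0;
           reg0 = 203.0.113.1;
           reg1 = 203.0.113.254;
           eth.src = 00:00:00:00:01:01;
           outport = "lrp0";
           flags.loopback = 1;
           next;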
2755
2756 Ingress Table 12: IP_ROUTING_ECMP
2757
2758 This table implements the second part of IP routing for ECMP routes
       following the previous table. If a packet matched an ECMP group in the
2760 previous table, this table matches the group id and member id stored
2761 from the previous table, setting reg0 (or xxreg0 for IPv6) to the next-
2762 hop IP address (leaving ip4.dst or ip6.dst, the packet’s final destina‐
2763 tion, unchanged) and advances to the next table for ARP resolution. It
2764 also sets reg1 (or xxreg1) to the IP address owned by the selected
2765 router port (ingress table ARP Request will generate an ARP request, if
2766 needed, with reg0 as the target protocol address and reg1 as the source
2767 protocol address).
2768
2769 This processing is skipped for reply traffic being sent out of an ECMP
2770 route if the route was configured to use symmetric replies.
2771
2772 This table contains the following logical flows:
2773
          • A priority-150 flow that matches reg8[0..15] == 0 with action
            next;, which lets packets of non-ECMP routes bypass this stage.
2777
          • For each member with ID MID in each ECMP group with ID GID, a
            priority-100 flow with match reg8[0..15] == GID && reg8[16..31]
            == MID has the following actions (see the example after this
            list):
2781
2782 [xx]reg0 = G;
2783 [xx]reg1 = A;
2784 eth.src = E;
2785 outport = P;
2786
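       For example (hypothetical values), an ECMP group with id 1 and two
       members over nexthops 203.0.113.1 and 203.0.113.2, both reachable
       through router port lrp0 (IP address 203.0.113.254, Ethernet address
       00:00:00:00:01:01), might produce two priority-100 flows:

           reg8[0..15] == 1 && reg8[16..31] == 1
               reg0 = 203.0.113.1; reg1 = 203.0.113.254;
               eth.src = 00:00:00:00:01:01; outport = "lrp0";

           reg8[0..15] == 1 && reg8[16..31] == 2
               reg0 = 203.0.113.2; reg1 = 203.0.113.254;
               eth.src = 00:00:00:00:01:01; outport = "lrp0";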
2787
2788 Ingress Table 13: Router policies
2789
2790 This table adds flows for the logical router policies configured on the
2791 logical router. Please see the OVN_Northbound database Logi‐
2792 cal_Router_Policy table documentation in ovn-nb for supported actions.
2793
2794 • For each router policy configured on the logical router,
2795 a logical flow is added with specified priority, match
2796 and actions.
2797
2798 • If the policy action is reroute with 2 or more nexthops
2799 defined, then the logical flow is added with the follow‐
2800 ing actions:
2801
2802 reg8[0..15] = GID;
2803 reg8[16..31] = select(1,..n);
2804
2805
            where GID is the ECMP group id generated by ovn-northd for this
            policy and n is the number of nexthops. The select action
            selects one of the nexthop member ids, stores it in register
            reg8[16..31], and advances the packet to the next stage.
2811
          • If the policy action is reroute with just one nexthop, then the
            logical flow is added with the following actions (see the
            example after this list):
2815
2816 [xx]reg0 = H;
2817 eth.src = E;
2818 outport = P;
2819 reg8[0..15] = 0;
2820 flags.loopback = 1;
2821 next;
2822
2823
2824 where H is the nexthop defined in the router policy, E
2825 is the ethernet address of the logical router port from
2826 which the nexthop is reachable and P is the logical
2827 router port from which the nexthop is reachable.
2828
2829 • If a router policy has the option pkt_mark=m set and if
2830 the action is not drop, then the action also includes
2831 pkt.mark = m to mark the packet with the marker m.
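       For instance (hypothetical values), a reroute policy with priority
       100, match ip4.src == 10.0.0.0/24, and single nexthop 203.0.113.5,
       reachable via router port lrp0 (Ethernet address 00:00:00:00:01:01),
       might yield a priority-100 flow matching

           ip4.src == 10.0.0.0/24

       with actions

           reg0 = 203.0.113.5;
           eth.src = 00:00:00:00:01:01;
           outport = "lrp0";
           reg8[0..15] = 0;
           flags.loopback = 1;
           next;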
2832
2833 Ingress Table 14: ECMP handling for router policies
2834
       This table handles ECMP for the router policies configured with
       multiple nexthops.
2837
2838 • A priority-150 flow is added to advance the packet to the
2839 next stage if the ECMP group id register reg8[0..15] is
2840 0.
2841
2842 • For each ECMP reroute router policy with multiple nex‐
2843 thops, a priority-100 flow is added for each nexthop H
2844 with the match reg8[0..15] == GID && reg8[16..31] == M
2845 where GID is the router policy group id generated by
2846 ovn-northd and M is the member id of the nexthop H gener‐
2847 ated by ovn-northd. The following actions are added to
2848 the flow:
2849
                [xx]reg0 = H;
                eth.src = E;
                outport = P;
                flags.loopback = 1;
                next;
2855
2856
2857 where H is the nexthop defined in the router policy, E
2858 is the ethernet address of the logical router port from
2859 which the nexthop is reachable and P is the logical
2860 router port from which the nexthop is reachable.
2861
2862 Ingress Table 15: ARP/ND Resolution
2863
2864 Any packet that reaches this table is an IP packet whose next-hop IPv4
2865 address is in reg0 or IPv6 address is in xxreg0. (ip4.dst or ip6.dst
2866 contains the final destination.) This table resolves the IP address in
2867 reg0 (or xxreg0) into an output port in outport and an Ethernet address
2868 in eth.dst, using the following flows:
2869
2870 • A priority-500 flow that matches IP multicast traffic
2871 that was allowed in the routing pipeline. For this kind
2872 of traffic the outport was already set so the flow just
2873 advances to the next table.
2874
2875 • Priority-200 flows that match ECMP reply traffic for the
2876 routes configured to use symmetric replies, with actions
2877 push(xxreg1); xxreg1 = ct_label; eth.dst =
2878 xxreg1[32..79]; pop(xxreg1); next;. xxreg1 is used here
2879 to avoid masked access to ct_label, to make the flow HW-
2880 offloading friendly.
2881
          • Static MAC bindings. MAC bindings can be known statically based
            on data in the OVN_Northbound database. For router ports
            connected to logical switches, MAC bindings can be known
            statically from the addresses column in the Logical_Switch_Port
            table. For router ports connected to other logical routers, MAC
            bindings can be known statically from the mac and networks
            columns in the Logical_Router_Port table. (Note: the flow is
            NOT installed for the IP addresses that belong to a neighbor
            logical router port if the current router has
            options:dynamic_neigh_routers set to true.)

            For each IPv4 address A whose host is known to have Ethernet
            address E on router port P, a priority-100 flow with match
            outport == P && reg0 == A has actions eth.dst = E; next; (a
            concrete example appears after this list).

            For each virtual IP address A configured on a logical port of
            type virtual whose virtual parent is set in its corresponding
            Port_Binding record, where the virtual parent has Ethernet
            address E and the virtual IP is reachable via router port P, a
            priority-100 flow with match outport == P && xxreg0/reg0 == A
            has actions eth.dst = E; next;.

            For each virtual IP address A configured on a logical port of
            type virtual whose virtual parent is not set in its
            corresponding Port_Binding record, where A is reachable via
            router port P, a priority-100 flow with match outport == P &&
            xxreg0/reg0 == A has actions eth.dst = 00:00:00:00:00:00;
            next;. This flow is added so that the ARP for the virtual IP A
            is always resolved by generating an ARP request rather than by
            consulting the MAC_Binding table, which can hold an incorrect
            value for the virtual IP A.

            For each IPv6 address A whose host is known to have Ethernet
            address E on router port P, a priority-100 flow with match
            outport == P && xxreg0 == A has actions eth.dst = E; next;.

            For each logical router port with an IPv4 address A and a MAC
            address of E that is reachable via a different logical router
            port P, a priority-100 flow with match outport == P && reg0 ==
            A has actions eth.dst = E; next;.

            For each logical router port with an IPv6 address A and a MAC
            address of E that is reachable via a different logical router
            port P, a priority-100 flow with match outport == P && xxreg0
            == A has actions eth.dst = E; next;.
2932
          • Static MAC bindings from NAT entries. MAC bindings can also be
            known for the entries in the NAT table. The flows below are
            programmed for distributed logical routers, i.e. routers with a
            distributed router port.

            For each row in the NAT table with IPv4 address A in the
            external_ip column, a priority-100 flow with match outport == P
            && reg0 == A has actions eth.dst = E; next;, where P is the
            distributed logical router port and E is the Ethernet address
            set in the external_mac column of the NAT table for rules of
            type dnat_and_snat, otherwise the Ethernet address of the
            distributed logical router port. Note that if the external_ip
            is not within a subnet on the owning logical router, then OVN
            will only create ARP resolution flows if options:add_route is
            set to true. Otherwise, no ARP resolution flows will be added.

            For IPv6 NAT entries, the same flows are added, but using
            register xxreg0 for the match.
2953
          • Traffic with an IP destination that is an address owned by the
            router should be dropped. Such traffic is normally dropped in
            the ingress table IP Input, except for IPs that are also shared
            with SNAT rules. However, if no unSNAT operation has happened
            successfully up to this point in the pipeline and the
            destination IP of the packet is still a router-owned IP, the
            packets can be safely dropped.

            A priority-1 logical flow with match ip4.dst == {..} matches on
            traffic destined to router-owned IPv4 addresses which are also
            SNAT IPs. This flow has action drop;.

            A priority-1 logical flow with match ip6.dst == {..} matches on
            traffic destined to router-owned IPv6 addresses which are also
            SNAT IPs. This flow has action drop;.
2972
2973 • Dynamic MAC bindings. These flows resolve MAC-to-IP bind‐
2974 ings that have become known dynamically through ARP or
2975 neighbor discovery. (The ingress table ARP Request will
2976 issue an ARP or neighbor solicitation request for cases
2977 where the binding is not yet known.)
2978
2979 A priority-0 logical flow with match ip4 has actions
2980 get_arp(outport, reg0); next;.
2981
2982 A priority-0 logical flow with match ip6 has actions
2983 get_nd(outport, xxreg0); next;.
2984
          • For a distributed gateway LRP with redirect-type set to
            bridged, a priority-50 flow matches outport == "ROUTER_PORT" &&
            !is_chassis_resident("cr-ROUTER_PORT") and has actions eth.dst
            = E; next;, where E is the Ethernet address of the logical
            router port.
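       As a hypothetical example of the static MAC binding flows above, a
       logical switch port with addresses "00:00:00:00:00:10 192.168.0.10"
       attached behind router port lrp0 would produce a priority-100 flow
       matching

           outport == "lrp0" && reg0 == 192.168.0.10

       with actions

           eth.dst = 00:00:00:00:00:10;
           next;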
2990
2991 Ingress Table 16: Check packet length
2992
       For distributed logical routers, or gateway routers with a gateway
       port whose options:gateway_mtu is set to a valid integer value, this
       table adds a priority-50 logical flow with match outport == GW_PORT,
       where GW_PORT is the gateway router port, that applies the
       check_pkt_larger action and advances the packet to the next table:
2998
2999 REGBIT_PKT_LARGER = check_pkt_larger(L); next;
3000
3001
       where L is the packet length to check for. If the packet is larger
       than L, the action stores 1 in the register bit REGBIT_PKT_LARGER.
       The value of L is taken from the options:gateway_mtu column of the
       Logical_Router_Port row.
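       For example (hypothetical names), with options:gateway_mtu=1500 set
       on gateway port lr0-public, the priority-50 flow would look roughly
       like

           outport == "lr0-public"

       with actions

           REGBIT_PKT_LARGER = check_pkt_larger(1500); next;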
3005
       If the port is also configured with options:gateway_mtu_bypass, then
       another flow is added, at priority 55, to bypass the
       check_pkt_larger flow.
3009
3010 This table adds one priority-0 fallback flow that matches all packets
3011 and advances to the next table.
3012
3013 Ingress Table 17: Handle larger packets
3014
       For distributed logical routers, or gateway routers with a gateway
       port whose options:gateway_mtu is set to a valid integer value, this
       table adds the following priority-150 logical flow for each logical
       router port with match inport == LRP && outport == GW_PORT &&
       REGBIT_PKT_LARGER && !REGBIT_EGRESS_LOOPBACK, where LRP is the
       logical router port and GW_PORT is the gateway port, and applies the
       following action for IPv4 and IPv6 respectively:
3022
3023 icmp4 {
3024 icmp4.type = 3; /* Destination Unreachable. */
3025 icmp4.code = 4; /* Frag Needed and DF was Set. */
3026 icmp4.frag_mtu = M;
3027 eth.dst = E;
3028 ip4.dst = ip4.src;
3029 ip4.src = I;
3030 ip.ttl = 255;
3031 REGBIT_EGRESS_LOOPBACK = 1;
3032 REGBIT_PKT_LARGER = 0;
3033 next(pipeline=ingress, table=0);
3034 };
           icmp6 {
               icmp6.type = 2; /* Packet Too Big. */
               icmp6.code = 0; /* Set to 0, per RFC 4443. */
               icmp6.frag_mtu = M;
               eth.dst = E;
               ip6.dst = ip6.src;
               ip6.src = I;
               ip.ttl = 255;
               REGBIT_EGRESS_LOOPBACK = 1;
               REGBIT_PKT_LARGER = 0;
               next(pipeline=ingress, table=0);
           };
3047
3048
          • Where M is the fragment MTU, computed as the value of the
            options:gateway_mtu column of the Logical_Router_Port row minus
            58 (for example, options:gateway_mtu=1558 would yield
            icmp4.frag_mtu = 1500).
3052
3053 • E is the Ethernet address of the logical router port.
3054
3055 • I is the IPv4/IPv6 address of the logical router port.
3056
3057 This table adds one priority-0 fallback flow that matches all packets
3058 and advances to the next table.
3059
3060 Ingress Table 18: Gateway Redirect
3061
       For distributed logical routers where one or more of the logical
       router ports specifies a gateway chassis, this table redirects
       certain packets to the distributed gateway port instances on the
       gateway chassis. This table has the following flows:
3066
          • For each NAT rule in the OVN Northbound database that can be
            handled in a distributed manner, a priority-100 logical flow
            with match ip4.src == B && outport == GW &&
            is_chassis_resident(P), where B is the logical IP address in
            the NAT rule, GW is the distributed gateway port specified in
            the NAT rule, and P is the NAT logical port. IP traffic
            matching this rule is handled locally, setting reg1 to C and
            eth.src to D, where C is the NAT external IP and D is the NAT
            external MAC.
3075
3076 • For each NAT rule in the OVN Northbound database that can
3077 be handled in a distributed manner, a priority-80 logical
3078 flow with drop action if the NAT logical port is a vir‐
3079 tual port not claimed by any chassis yet.
3080
          • A priority-50 logical flow with match outport == GW has actions
            outport = CR; next;, where GW is the logical router distributed
            gateway port and CR is the chassisredirect port representing
            the instance of the logical router distributed gateway port on
            the gateway chassis (see the example after this list).
3086
3087 • A priority-0 logical flow with match 1 has actions next;.
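       For example (hypothetical port names), for a distributed gateway
       port lr0-public, the priority-50 redirect flow would look like

           outport == "lr0-public"

       with actions

           outport = "cr-lr0-public"; next;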
3088
3089 Ingress Table 19: ARP Request
3090
3091 In the common case where the Ethernet destination has been resolved,
3092 this table outputs the packet. Otherwise, it composes and sends an ARP
3093 or IPv6 Neighbor Solicitation request. It holds the following flows:
3094
3095 • Unknown MAC address. A priority-100 flow for IPv4 packets
3096 with match eth.dst == 00:00:00:00:00:00 has the following
3097 actions:
3098
3099 arp {
3100 eth.dst = ff:ff:ff:ff:ff:ff;
3101 arp.spa = reg1;
3102 arp.tpa = reg0;
3103 arp.op = 1; /* ARP request. */
3104 output;
3105 };
3106
3107
            Unknown MAC address. For each IPv6 static route associated with
            the router with nexthop IP G, a priority-200 flow for IPv6
            packets with match eth.dst == 00:00:00:00:00:00 && xxreg0 == G
            is added with the following actions:

                nd_ns {
                    eth.dst = E;
                    ip6.dst = I;
                    nd.target = G;
                    output;
                };

            where E is the multicast MAC address derived from the gateway
            IP, and I is the solicited-node multicast address corresponding
            to the target address G.
3125
3126 Unknown MAC address. A priority-100 flow for IPv6 packets
3127 with match eth.dst == 00:00:00:00:00:00 has the following
3128 actions:
3129
3130 nd_ns {
3131 nd.target = xxreg0;
3132 output;
3133 };
3134
3135
            (The ingress table IP Routing initialized reg1 with the IP
            address owned by outport and (xx)reg0 with the next-hop IP
            address.)
3139
3140 The IP packet that triggers the ARP/IPv6 NS request is
3141 dropped.
3142
3143 • Known MAC address. A priority-0 flow with match 1 has ac‐
3144 tions output;.
3145
3146 Egress Table 0: Check DNAT local
3147
       This table checks if the packet needs to be DNATed in the router
       ingress table lr_in_dnat after it is SNATed and looped back to the
       ingress pipeline. This check is done only for routers configured
       with distributed gateway ports and NAT entries. It ensures that SNAT
       and DNAT are done in different zones instead of a common zone.
3153
          • For each NAT rule in the OVN Northbound database on a
            distributed router, a priority-50 logical flow with match
            ip4.dst == E && is_chassis_resident(P), where E is the external
            IP address specified in the NAT rule and GW is the logical
            router distributed gateway port. For a dnat_and_snat NAT rule,
            P is the logical port specified in the NAT rule; if the
            logical_port column of the NAT table is not set, then P is the
            chassisredirect port of GW. The flow has actions
            REGBIT_DST_NAT_IP_LOCAL = 1; next; (see the example after this
            list).
3163
3164 • A priority-0 logical flow with match 1 has actions REG‐
3165 BIT_DST_NAT_IP_LOCAL = 0; next;.
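       For example (hypothetical values), a NAT rule with external IP
       172.16.0.10 on a distributed router whose gateway port is lr0-public
       would produce a priority-50 flow matching

           ip4.dst == 172.16.0.10 && is_chassis_resident("cr-lr0-public")

       with actions

           REGBIT_DST_NAT_IP_LOCAL = 1; next;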
3166
3167 Egress Table 1: UNDNAT
3168
       This table handles reverse traffic for already established
       connections: DNAT has already been done in the ingress pipeline, and
       the packet has now entered the egress pipeline as part of a reply.
       Such traffic is unDNATed here.
3173
3174 • A priority-0 logical flow with match 1 has actions next;.
3175
3176 Egress Table 1: UNDNAT on Gateway Routers
3177
3178 • For all IP packets, a priority-50 flow with an action
3179 flags.loopback = 1; ct_dnat;.
3180
3181 Egress Table 1: UNDNAT on Distributed Routers
3182
          • For each configured load balancer rule that includes an IPv4
            VIP, on a router with a gateway port in the OVN_Northbound
            database: for every backend IPv4 address B defined for the VIP,
            a priority-120 flow is programmed on the gateway chassis that
            matches ip && ip4.src == B && outport == GW, where GW is the
            logical router gateway port, with action ct_dnat_in_czone;. If
            the backend IPv4 address B is also configured with L4 port PORT
            of protocol P, then the match also includes P.src == PORT.
            These flows are not added for load balancers with IPv6 VIPs.

            If the router is configured to force SNAT any load-balanced
            packets, the above action will be replaced by
            flags.force_snat_for_lb = 1; ct_dnat;.
3198
          • For each configuration in the OVN Northbound database that asks
            to change the destination IP address of a packet from an IP
            address A to B, a priority-100 flow matches ip && ip4.src == B
            && outport == GW, where GW is the logical router gateway port,
            with action ct_dnat_in_czone;. If the NAT rule is of type
            dnat_and_snat and has stateless=true in the options, then the
            action would be ip4/6.src=(B).

            If the NAT rule cannot be handled in a distributed manner, then
            the priority-100 flow above is only programmed on the gateway
            chassis with the action ct_dnat_in_czone;.

            If the NAT rule can be handled in a distributed manner, then
            there is an additional action eth.src = EA;, where EA is the
            Ethernet address associated with the IP address A in the NAT
            rule. This allows upstream MAC learning to point to the correct
            chassis (see the example after this list).
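       For example (hypothetical values), a distributed dnat_and_snat rule
       translating external IP 172.16.0.10 (external MAC 00:00:00:00:02:02)
       to logical IP 10.0.0.5, on a router whose gateway port is
       lr0-public, would program a flow matching

           ip && ip4.src == 10.0.0.5 && outport == "lr0-public"

       with actions

           eth.src = 00:00:00:00:02:02; ct_dnat_in_czone;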
3217
3218 Egress Table 2: Post UNDNAT
3219
          • A priority-50 logical flow is added that commits any untracked
            flows from the previous table lr_out_undnat for Gateway
            routers. This flow matches on ct.new && ip with action
            ct_commit { }; next;.
3224
3225 • A priority-0 logical flow with match 1 has actions next;.
3226
3227 Egress Table 3: SNAT
3228
3229 Packets that are configured to be SNATed get their source IP address
3230 changed based on the configuration in the OVN Northbound database.
3231
          • A priority-120 flow to advance the IPv6 Neighbor Solicitation
            packet to the next table to skip SNAT. In the case where
            ovn-controller injects an IPv6 Neighbor Solicitation packet
            (for the nd_ns action), we don’t want the packet to go through
            conntrack.
3237
3238 Egress Table 3: SNAT on Gateway Routers
3239
3240 • If the Gateway router in the OVN Northbound database has
3241 been configured to force SNAT a packet (that has been
3242 previously DNATted) to B, a priority-100 flow matches
3243 flags.force_snat_for_dnat == 1 && ip with an action
3244 ct_snat(B);.
3245
3246 • If a load balancer configured to skip snat has been ap‐
3247 plied to the Gateway router pipeline, a priority-120 flow
3248 matches flags.skip_snat_for_lb == 1 && ip with an action
3249 next;.
3250
          • If the Gateway router in the OVN Northbound database has been
            configured to force SNAT a packet (that has been previously
            load-balanced) using the router IP (i.e.
            options:lb_force_snat_ip=router_ip), then for each logical
            router port P attached to the Gateway router, a priority-110
            flow matches flags.force_snat_for_lb == 1 && outport == P with
            action ct_snat(R);, where R is the IP configured on the router
            port. If R is an IPv4 address, then the match will also include
            ip4, and if it is an IPv6 address, then the match will also
            include ip6.

            If the logical router port P is configured with multiple IPv4
            and multiple IPv6 addresses, only the first IPv4 and first IPv6
            address is considered.
3266
3267 • If the Gateway router in the OVN Northbound database has
3268 been configured to force SNAT a packet (that has been
3269 previously load-balanced) to B, a priority-100 flow
3270 matches flags.force_snat_for_lb == 1 && ip with an action
3271 ct_snat(B);.
3272
          • For each configuration in the OVN Northbound database that asks
            to change the source IP address of a packet from an IP address
            A, or from an IP address that belongs to network A, to B, a
            flow matches ip && ip4.src == A && (!ct.trk || !ct.rpl) with
            action ct_snat(B); (see the example after this list). The
            priority of the flow is calculated based on the mask of A, with
            matches having larger masks getting higher priorities. If the
            NAT rule is of type dnat_and_snat and has stateless=true in the
            options, then the action would be ip4/6.src=(B).
3283
          • If the NAT rule has allowed_ext_ips configured, then there is
            an additional match ip4.dst == allowed_ext_ips. Similarly, for
            IPv6, the match would be ip6.dst == allowed_ext_ips.

          • If the NAT rule has exempted_ext_ips set, then there is an
            additional flow configured at priority + 1 of the corresponding
            NAT rule. The flow matches if the destination IP is an
            exempted_ext_ip, and the action is next;. This flow is used to
            bypass the ct_snat action for a packet which is destined to
            exempted_ext_ips.
3295
3296 • A priority-0 logical flow with match 1 has actions next;.
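       For example (hypothetical addresses), an SNAT rule on a Gateway
       router translating source network 10.0.0.0/24 to 172.16.0.1 would
       install a flow matching

           ip && ip4.src == 10.0.0.0/24 && (!ct.trk || !ct.rpl)

       with action

           ct_snat(172.16.0.1);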
3297
3298 Egress Table 3: SNAT on Distributed Routers
3299
          • For each configuration in the OVN Northbound database that asks
            to change the source IP address of a packet from an IP address
            A, or from an IP address that belongs to network A, to B, two
            flows are added. The priority P of these flows is calculated
            based on the mask of A, with matches having larger masks
            getting higher priorities (see the example after this list).

            If the NAT rule cannot be handled in a distributed manner, then
            the two flows below are only programmed on the gateway chassis,
            with flow priority increased by 128 in order to be run first.

            • The first flow is added with the calculated priority P and
              match ip && ip4.src == A && outport == GW, where GW is the
              logical router gateway port, with action
              ct_snat_in_czone(B); to SNAT in the common zone. If the NAT
              rule is of type dnat_and_snat and has stateless=true in the
              options, then the action would be ip4/6.src=(B).
3320
3321 • The second flow is added with the calculated pri‐
3322 ority P + 1 and match ip && ip4.src == A && out‐
3323 port == GW && REGBIT_DST_NAT_IP_LOCAL == 0, where
3324 GW is the logical router gateway port, with an ac‐
3325 tion ct_snat(B); to SNAT in the snat zone. If the
3326 NAT rule is of type dnat_and_snat and has state‐
3327 less=true in the options, then the action would be
3328 ip4/6.src=(B).
3329
3330 If the NAT rule can be handled in a distributed manner,
3331 then there is an additional action (for both the flows)
3332 eth.src = EA;, where EA is the ethernet address associ‐
3333 ated with the IP address A in the NAT rule. This allows
3334 upstream MAC learning to point to the correct chassis.
3335
            If the NAT rule has allowed_ext_ips configured, then there is
            an additional match ip4.dst == allowed_ext_ips. Similarly, for
            IPv6, the match would be ip6.dst == allowed_ext_ips.

            If the NAT rule has exempted_ext_ips set, then there is an
            additional flow configured at priority P + 2 of the
            corresponding NAT rule. The flow matches if the destination IP
            is an exempted_ext_ip, and the action is next;. This flow is
            used to bypass the ct_snat action for a flow which is destined
            to exempted_ext_ips.
3347
3348 • A priority-0 logical flow with match 1 has actions next;.
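       Continuing the hypothetical rule above (source network 10.0.0.0/24
       translated to 172.16.0.1) on a distributed router with gateway port
       lr0-public, the two flows would look roughly like

           ip && ip4.src == 10.0.0.0/24 && outport == "lr0-public"
               ct_snat_in_czone(172.16.0.1);

       at priority P, and

           ip && ip4.src == 10.0.0.0/24 && outport == "lr0-public" &&
           REGBIT_DST_NAT_IP_LOCAL == 0
               ct_snat(172.16.0.1);

       at priority P + 1.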
3349
3350 Egress Table 4: Egress Loopback
3351
       This table applies to distributed logical routers where one of the
       logical router ports specifies a gateway chassis.
3354
3355 While UNDNAT and SNAT processing have already occurred by this point,
3356 this traffic needs to be forced through egress loopback on this dis‐
3357 tributed gateway port instance, in order for UNSNAT and DNAT processing
3358 to be applied, and also for IP routing and ARP resolution after all of
3359 the NAT processing, so that the packet can be forwarded to the destina‐
3360 tion.
3361
3362 This table has the following flows:
3363
          • For each NAT rule in the OVN Northbound database on a
            distributed router, a priority-100 logical flow with match
            ip4.dst == E && outport == GW && is_chassis_resident(P), where
            E is the external IP address specified in the NAT rule and GW
            is the distributed gateway port specified in the NAT rule. For
            a dnat_and_snat NAT rule, P is the logical port specified in
            the NAT rule; if the logical_port column of the NAT table is
            not set, then P is the chassisredirect port of GW. The flow has
            the following actions:
3373
3374 clone {
3375 ct_clear;
3376 inport = outport;
3377 outport = "";
3378 flags = 0;
3379 flags.loopback = 1;
3380 flags.use_snat_zone = REGBIT_DST_NAT_IP_LOCAL;
3381 reg0 = 0;
3382 reg1 = 0;
3383 ...
3384 reg9 = 0;
3385 REGBIT_EGRESS_LOOPBACK = 1;
3386 next(pipeline=ingress, table=0);
3387 };
3388
3389
            flags.loopback is set since inport is unchanged and the packet
            may return to that port after NAT processing.
3392 REGBIT_EGRESS_LOOPBACK is set to indicate that egress
3393 loopback has occurred, in order to skip the source IP ad‐
3394 dress check against the router address.
3395
3396 • A priority-0 logical flow with match 1 has actions next;.
3397
3398 Egress Table 5: Delivery
3399
3400 Packets that reach this table are ready for delivery. It contains:
3401
3402 • Priority-110 logical flows that match IP multicast pack‐
3403 ets on each enabled logical router port and modify the
3404 Ethernet source address of the packets to the Ethernet
3405 address of the port and then execute action output;.
3406
3407 • Priority-100 logical flows that match packets on each en‐
3408 abled logical router port, with action output;.
3409
3410
3411
OVN 22.06.1                      ovn-northd                      ovn-northd(8)