ovn-northd(8)                     OVN Manual                     ovn-northd(8)

NAME
       ovn-northd and ovn-northd-ddlog - Open Virtual Network central
       control daemon

SYNOPSIS
       ovn-northd [options]

DESCRIPTION
       ovn-northd is a centralized daemon responsible for translating the
       high-level OVN configuration into logical configuration consumable
       by daemons such as ovn-controller. It translates the logical network
       configuration in terms of conventional network concepts, taken from
       the OVN Northbound Database (see ovn-nb(5)), into logical datapath
       flows in the OVN Southbound Database (see ovn-sb(5)) below it.

       ovn-northd is implemented in C. ovn-northd-ddlog is a compatible
       implementation written in DDlog, a language for incremental database
       processing. This documentation applies to both implementations, with
       differences indicated where relevant.

OPTIONS
       --ovnnb-db=database
              The OVSDB database containing the OVN Northbound Database. If
              the OVN_NB_DB environment variable is set, its value is used
              as the default. Otherwise, the default is unix:/ovnnb_db.sock.

       --ovnsb-db=database
              The OVSDB database containing the OVN Southbound Database. If
              the OVN_SB_DB environment variable is set, its value is used
              as the default. Otherwise, the default is unix:/ovnsb_db.sock.

       --ddlog-record=file
              This option is for ovn-northd-ddlog only. It causes the
              daemon to record the initial database state and later changes
              to file in the text-based DDlog command format. The
              ovn_northd_cli program can later replay these changes for
              debugging purposes. This option has a performance impact. See
              debugging-ddlog.rst in the OVN documentation for more
              details.

       --dry-run
              Causes ovn-northd to start paused. In the paused state,
              ovn-northd does not apply any changes to the databases,
              although it continues to monitor them. For more information,
              see the pause command, under Runtime Management Commands
              below.

              For ovn-northd-ddlog, this option can be used together with
              --ddlog-record to generate a replay log without restarting a
              process or disturbing a running system.

       --n-threads N
              In certain situations, it may be desirable to enable
              parallelization on a system to decrease latency (at the
              potential cost of increasing CPU usage).

              This option causes ovn-northd to use N threads when building
              logical flows, when N is within [2-256]. If N is 1,
              parallelization is disabled (the default behavior). If N is
              less than 1, then N is set to 1, parallelization is disabled,
              and a warning is logged. If N is more than 256, then N is set
              to 256, parallelization is enabled (with 256 threads), and a
              warning is logged.

              ovn-northd-ddlog does not support this option.

       database in the above options must be an OVSDB active or passive
       connection method, as described in ovsdb(7).

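       For example, a deployment that keeps both databases on a central
       node might start the daemon as follows (a sketch only; the address
       and port numbers are examples):

              ovn-northd --detach --pidfile --log-file \
                  --ovnnb-db=tcp:192.0.2.10:6641 \
                  --ovnsb-db=tcp:192.0.2.10:6642
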
   Daemon Options
       --pidfile[=pidfile]
              Causes a file (by default, program.pid) to be created
              indicating the PID of the running process. If the pidfile
              argument is not specified, or if it does not begin with /,
              then it is created in .

              If --pidfile is not specified, no pidfile is created.

       --overwrite-pidfile
              By default, when --pidfile is specified and the specified
              pidfile already exists and is locked by a running process,
              the daemon refuses to start. Specify --overwrite-pidfile to
              cause it to instead overwrite the pidfile.

              When --pidfile is not specified, this option has no effect.

       --detach
              Runs this program as a background process. The process forks,
              and in the child it starts a new session, closes the standard
              file descriptors (which has the side effect of disabling
              logging to the console), and changes its current directory to
              the root (unless --no-chdir is specified). After the child
              completes its initialization, the parent exits.

       --monitor
              Creates an additional process to monitor this program. If it
              dies due to a signal that indicates a programming error
              (SIGABRT, SIGALRM, SIGBUS, SIGFPE, SIGILL, SIGPIPE, SIGSEGV,
              SIGXCPU, or SIGXFSZ) then the monitor process starts a new
              copy of it. If the daemon dies or exits for another reason,
              the monitor process exits.

              This option is normally used with --detach, but it also
              functions without it.

       --no-chdir
              By default, when --detach is specified, the daemon changes
              its current working directory to the root directory after it
              detaches. Otherwise, invoking the daemon from a carelessly
              chosen directory would prevent the administrator from
              unmounting the file system that holds that directory.

              Specifying --no-chdir suppresses this behavior, preventing
              the daemon from changing its current working directory. This
              may be useful for collecting core files, since it is common
              behavior to write core dumps into the current working
              directory and the root directory is not a good directory to
              use.

              This option has no effect when --detach is not specified.

       --no-self-confinement
              By default this daemon will try to self-confine itself to
              work with files under well-known directories determined at
              build time. It is better to stick with this default behavior
              and not to use this flag unless some other access control is
              used to confine the daemon. Note that in contrast to other
              access control implementations that are typically enforced
              from kernel space (e.g. DAC or MAC), self-confinement is
              imposed from the user-space daemon itself and hence should
              not be considered a full confinement strategy, but instead
              should be viewed as an additional layer of security.

       --user=user:group
              Causes this program to run as a different user specified in
              user:group, thus dropping most of the root privileges. Short
              forms user and :group are also allowed, with the current user
              or group assumed, respectively. Only daemons started by the
              root user accept this argument.

              On Linux, daemons will be granted CAP_IPC_LOCK and
              CAP_NET_BIND_SERVICE before dropping root privileges. Daemons
              that interact with a datapath, such as ovs-vswitchd, will be
              granted three additional capabilities, namely CAP_NET_ADMIN,
              CAP_NET_BROADCAST and CAP_NET_RAW. The capability change will
              apply even if the new user is root.

              On Windows, this option is not currently supported. For
              security reasons, specifying this option will cause the
              daemon process not to start.

   Logging Options
       -v[spec]
       --verbose=[spec]
              Sets logging levels. Without any spec, sets the log level for
              every module and destination to dbg. Otherwise, spec is a
              list of words separated by spaces or commas or colons, up to
              one from each category below:

              •  A valid module name, as displayed by the vlog/list command
                 on ovs-appctl(8), limits the log level change to the
                 specified module.

              •  syslog, console, or file, to limit the log level change to
                 the system log, the console, or a file, respectively. (If
                 --detach is specified, the daemon closes its standard file
                 descriptors, so logging to the console will have no
                 effect.)

                 On Windows platform, syslog is accepted as a word and is
                 only useful along with the --syslog-target option (the
                 word has no effect otherwise).

              •  off, emer, err, warn, info, or dbg, to control the log
                 level. Messages of the given severity or higher will be
                 logged, and messages of lower severity will be filtered
                 out. off filters out all messages. See ovs-appctl(8) for a
                 definition of each log level.

              Case is not significant within spec.

              Regardless of the log levels set for file, logging to a file
              will not take place unless --log-file is also specified (see
              below).

              For compatibility with older versions of OVS, any is accepted
              as a word but has no effect.

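       For instance, to log info-level messages to the console while
       keeping debug-level logging in a file (the path and the levels are
       illustrative):

              ovn-northd -vconsole:info -vfile:dbg \
                  --log-file=/var/log/ovn/ovn-northd.log
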
       -v
       --verbose
              Sets the maximum logging verbosity level, equivalent to
              --verbose=dbg.

       -vPATTERN:destination:pattern
       --verbose=PATTERN:destination:pattern
              Sets the log pattern for destination to pattern. Refer to
              ovs-appctl(8) for a description of the valid syntax for
              pattern.

       -vFACILITY:facility
       --verbose=FACILITY:facility
              Sets the RFC5424 facility of the log message. facility can be
              one of kern, user, mail, daemon, auth, syslog, lpr, news,
              uucp, clock, ftp, ntp, audit, alert, clock2, local0, local1,
              local2, local3, local4, local5, local6 or local7. If this
              option is not specified, daemon is used as the default for
              the local system syslog and local0 is used while sending a
              message to the target provided via the --syslog-target
              option.

       --log-file[=file]
              Enables logging to a file. If file is specified, then it is
              used as the exact name for the log file. The default log file
              name used if file is omitted is /var/log/ovn/program.log.

       --syslog-target=host:port
              Send syslog messages to UDP port on host, in addition to the
              system syslog. The host must be a numerical IP address, not a
              hostname.

       --syslog-method=method
              Specify method as how syslog messages should be sent to the
              syslog daemon. The following forms are supported:

              •  libc, to use the libc syslog() function. The downside of
                 using this option is that libc adds a fixed prefix to
                 every message before it is actually sent to the syslog
                 daemon over the /dev/log UNIX domain socket.

              •  unix:file, to use a UNIX domain socket directly. It is
                 possible to specify an arbitrary message format with this
                 option. However, rsyslogd 8.9 and older versions use a
                 hard-coded parser function that limits UNIX domain socket
                 use. If you want to use an arbitrary message format with
                 older rsyslogd versions, then use a UDP socket to the
                 localhost IP address instead.

              •  udp:ip:port, to use a UDP socket. With this method it is
                 possible to use an arbitrary message format also with
                 older rsyslogd. When sending syslog messages over a UDP
                 socket, extra precautions need to be taken: for example,
                 the syslog daemon needs to be configured to listen on the
                 specified UDP port, accidental iptables rules could
                 interfere with local syslog traffic, and there are some
                 security considerations that apply to UDP sockets but do
                 not apply to UNIX domain sockets.

              •  null, to discard all messages logged to syslog.

              The default is taken from the OVS_SYSLOG_METHOD environment
              variable; if it is unset, the default is libc.

   PKI Options
       PKI configuration is required in order to use SSL for the
       connections to the Northbound and Southbound databases.

       -p privkey.pem
       --private-key=privkey.pem
              Specifies a PEM file containing the private key used as
              identity for outgoing SSL connections.

       -c cert.pem
       --certificate=cert.pem
              Specifies a PEM file containing a certificate that certifies
              the private key specified on -p or --private-key to be
              trustworthy. The certificate must be signed by the
              certificate authority (CA) that the peer in SSL connections
              will use to verify it.

       -C cacert.pem
       --ca-cert=cacert.pem
              Specifies a PEM file containing the CA certificate for
              verifying certificates presented to this program by SSL
              peers. (This may be the same certificate that SSL peers use
              to verify the certificate specified on -c or --certificate,
              or it may be a different one, depending on the PKI design in
              use.)

       -C none
       --ca-cert=none
              Disables verification of certificates presented by SSL
              peers. This introduces a security risk, because it means that
              certificates cannot be verified to be those of known trusted
              hosts.

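       The PKI options are typically combined with ssl connection methods
       for the database options, as in this sketch (the file names and
       address are examples):

              ovn-northd --ovnnb-db=ssl:192.0.2.10:6641 \
                  --ovnsb-db=ssl:192.0.2.10:6642 \
                  -p privkey.pem -c cert.pem -C cacert.pem
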
   Other Options
       --unixctl=socket
              Sets the name of the control socket on which program listens
              for runtime management commands (see RUNTIME MANAGEMENT
              COMMANDS, below). If socket does not begin with /, it is
              interpreted as relative to . If --unixctl is not used at all,
              the default socket is /program.pid.ctl, where pid is
              program’s process ID.

              On Windows a local named pipe is used to listen for runtime
              management commands. A file is created at the absolute path
              pointed to by socket or, if --unixctl is not used at all, a
              file named program is created in the configured OVS_RUNDIR
              directory. The file exists just to mimic the behavior of a
              Unix domain socket.

              Specifying none for socket disables the control socket
              feature.

       -h
       --help
              Prints a brief help message to the console.

       -V
       --version
              Prints version information to the console.

RUNTIME MANAGEMENT COMMANDS
       ovs-appctl can send commands to a running ovn-northd process. The
       currently supported commands are described below.

       exit   Causes ovn-northd to gracefully terminate.

       pause  Pauses ovn-northd. When it is paused, ovn-northd receives
              changes from the Northbound and Southbound databases as
              usual, but it does not send any updates. A paused ovn-northd
              also drops database locks, which allows any other non-paused
              instance of ovn-northd to take over.

       resume Resumes the ovn-northd operation to process Northbound and
              Southbound database contents and generate logical flows.
              This will also instruct ovn-northd to try to acquire the
              lock on the SB DB.

       is-paused
              Returns "true" if ovn-northd is currently paused, "false"
              otherwise.

       status Prints this server’s status. Status will be "active" if
              ovn-northd has acquired the OVSDB lock on the SB DB,
              "standby" if it has not, or "paused" if this instance is
              paused.

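       For example, assuming the default control socket location, an
       administrator could pause an instance and inspect its state as
       follows (the output shown is illustrative):

              $ ovs-appctl -t ovn-northd pause
              $ ovs-appctl -t ovn-northd is-paused
              true
              $ ovs-appctl -t ovn-northd resume
              $ ovs-appctl -t ovn-northd status
              Status: active
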
       sb-cluster-state-reset
              Reset southbound database cluster status when databases are
              destroyed and rebuilt.

              If all databases in a clustered southbound database are
              removed from disk, then the stored index of all databases
              will be reset to zero. This will cause ovn-northd to be
              unable to read or write to the southbound database, because
              it will always detect the data as stale. In such a case, run
              this command so that ovn-northd resets its local index and
              can interact with the southbound database again.

       nb-cluster-state-reset
              Reset northbound database cluster status when databases are
              destroyed and rebuilt.

              This performs the same task as sb-cluster-state-reset except
              for the northbound database client.

       set-n-threads N
              Set the number of threads used for building logical flows.
              When N is within [2-256], parallelization is enabled. When N
              is 1, parallelization is disabled. When N is less than 1 or
              more than 256, an error is returned. If ovn-northd fails to
              start parallelization (e.g., fails to set up semaphores),
              parallelization is disabled and an error is returned.

       get-n-threads
              Return the number of threads used for building logical
              flows.

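       For example (the thread count is an example and the output is
       illustrative):

              $ ovs-appctl -t ovn-northd set-n-threads 4
              $ ovs-appctl -t ovn-northd get-n-threads
              4
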
       inc-engine/show-stats
              Display ovn-northd engine counters. For each engine node the
              following counters are displayed:

              •  recompute

              •  compute

              •  abort

       inc-engine/show-stats engine_node_name counter_name
              Display the ovn-northd engine counter(s) for the specified
              engine_node_name. counter_name is optional and can be one of
              recompute, compute or abort.

       inc-engine/clear-stats
              Reset ovn-northd engine counters.

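       For example, to display only the recompute counter of a single
       engine node (the node name lflow is an example; the available node
       names depend on the ovn-northd version):

              $ ovs-appctl -t ovn-northd inc-engine/show-stats lflow recompute
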
       Only ovn-northd-ddlog supports the following commands:

       enable-cpu-profiling
       disable-cpu-profiling
              Enables or disables profiling of CPU time used by the DDlog
              engine. When CPU profiling is enabled, the profile command
              (see below) will include DDlog CPU usage statistics in its
              output. Enabling CPU profiling will slow ovn-northd-ddlog.
              Disabling CPU profiling does not clear any previously
              recorded statistics.

       profile
              Outputs a profile of the current and peak sizes of
              arrangements inside DDlog. This profiling data can be useful
              for optimizing DDlog code. If CPU profiling was previously
              enabled (even if it was later disabled), the output also
              includes a CPU time profile. See Profiling inside the
              tutorial in the DDlog repository for an introduction to
              profiling DDlog.

ACTIVE-STANDBY FOR HIGH AVAILABILITY
       You may run ovn-northd more than once in an OVN deployment. When
       connected to a standalone or clustered DB setup, OVN will
       automatically ensure that only one of them is active at a time. If
       multiple instances of ovn-northd are running and the active
       ovn-northd fails, one of the hot standby instances of ovn-northd
       will automatically take over.

   Active-Standby with multiple OVN DB servers
       You may run multiple OVN DB servers in an OVN deployment with:

              •  OVN DB servers deployed in active/passive mode with one
                 active and multiple passive ovsdb-servers.

              •  ovn-northd also deployed on all these nodes, using unix
                 ctl sockets to connect to the local OVN DB servers.

       In such deployments, the ovn-northds on the passive nodes will
       process the DB changes and compute logical flows that are then
       thrown away, because the passive ovsdb-servers do not allow write
       transactions. This results in unnecessary CPU usage.

       With the help of the runtime management command pause, you can
       pause ovn-northd on these nodes. When a passive node becomes
       master, you can use the runtime management command resume to resume
       ovn-northd so that it processes the DB changes.

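       For instance, a deployment script could pause the local instance on
       a node whose DB servers are passive and resume it when the node is
       promoted (a sketch only):

              # On a node whose local ovsdb-servers are passive:
              $ ovs-appctl -t ovn-northd pause

              # On the node that was promoted to master:
              $ ovs-appctl -t ovn-northd resume
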
LOGICAL FLOW TABLE STRUCTURE
       One of the main purposes of ovn-northd is to populate the
       Logical_Flow table in the OVN_Southbound database. This section
       describes how ovn-northd does this for switch and router logical
       datapaths.

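       The logical flows described in the sections below can be inspected
       on a live system with ovn-sbctl; for example, for a logical switch
       named sw0 (the name is an example):

              $ ovn-sbctl lflow-list sw0
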
   Logical Switch Datapaths
       Ingress Table 0: Admission Control and Ingress Port Security check

       Ingress table 0 contains these logical flows:

              •  Priority 100 flows to drop packets with VLAN tags or
                 multicast Ethernet source addresses.

              •  For each disabled logical port, a priority 100 flow is
                 added which matches on all packets and applies the action
                 REGBIT_PORT_SEC_DROP = 1; next; so that the packets are
                 dropped in the next stage.

              •  For each (enabled) vtep logical port, a priority 70 flow
                 is added which matches on all packets and applies the
                 action reg0[14] = 1; next(pipeline=ingress,
                 table=S_SWITCH_IN_L2_LKUP); to skip most stages of the
                 ingress pipeline and go directly to the ingress L2 lookup
                 table to determine the output port. Packets from a VTEP
                 (RAMP) switch should not be subjected to any ACL checks.
                 The egress pipeline will do the ACL checks.

              •  For each enabled logical port configured with a qdisc
                 queue id in the options:qdisc_queue_id column of
                 Logical_Switch_Port, a priority 70 flow is added which
                 matches on all packets and applies the action
                 set_queue(id); REGBIT_PORT_SEC_DROP =
                 check_in_port_sec(); next;.

              •  A priority 1 flow is added which matches on all packets
                 for all the logical ports and applies the action
                 REGBIT_PORT_SEC_DROP = check_in_port_sec(); next; to
                 evaluate the port security. The action check_in_port_sec
                 applies the port security rules defined in the
                 port_security column of the Logical_Switch_Port table.

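       The port security rules evaluated by check_in_port_sec come from
       the port_security column, which can be set with ovn-nbctl; a
       sketch, in which the port name, MAC and IP are examples:

              $ ovn-nbctl lsp-set-port-security lsp1 "00:00:00:00:00:01 10.0.0.5"
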
       Ingress Table 1: Ingress Port Security - Apply

       This table drops the packets if the port security check failed in
       the previous stage, i.e., the register bit REGBIT_PORT_SEC_DROP is
       set to 1.

       Ingress table 1 contains these logical flows:

              •  A priority-50 fallback flow that drops the packet if the
                 register bit REGBIT_PORT_SEC_DROP is set to 1.

              •  One priority-0 fallback flow that matches all packets and
                 advances to the next table.

       Ingress Table 2: Lookup MAC address learning table

       This table looks up the MAC learning table of the logical switch
       datapath to check if the port-MAC pair is present or not. MACs are
       learnt for logical switch VIF ports whose port security is disabled
       and ’unknown’ address set, as well as for localnet ports with the
       option localnet_learn_fdb. A localnet port entry does not overwrite
       a VIF port entry.

              •  For each such VIF logical port p whose port security is
                 disabled and ’unknown’ address set, the following flow is
                 added:

                 •  Priority 100 flow with the match inport == p and
                    action reg0[11] = lookup_fdb(inport, eth.src); next;

              •  For each such localnet logical port p, the following flow
                 is added:

                 •  Priority 100 flow with the match inport == p and
                    action flags.localnet = 1; reg0[11] =
                    lookup_fdb(inport, eth.src); next;

              •  One priority-0 fallback flow that matches all packets and
                 advances to the next table.

       Ingress Table 3: Learn MAC of ’unknown’ ports.

       This table learns the MAC addresses seen on the VIF logical ports
       whose port security is disabled and ’unknown’ address set, as well
       as on localnet ports with the localnet_learn_fdb option set, if the
       lookup_fdb action returned false in the previous table. For
       localnet ports (with flags.localnet = 1), lookup_fdb returns true
       if (port, mac) is found or if a MAC is found for a port of type
       vif.

              •  For each such VIF logical port p whose port security is
                 disabled and ’unknown’ address set, and for each such
                 localnet port, the following flow is added:

                 •  Priority 100 flow with the match inport == p &&
                    reg0[11] == 0 and action put_fdb(inport, eth.src);
                    next; which stores the port-MAC pair in the MAC
                    learning table of the logical switch datapath and
                    advances the packet to the next table.

              •  One priority-0 fallback flow that matches all packets and
                 advances to the next table.

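       A sketch of enabling FDB learning on a localnet port with
       ovn-nbctl (the port name is an example):

              $ ovn-nbctl set logical_switch_port ln-port options:localnet_learn_fdb=true
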
       Ingress Table 4: from-lport Pre-ACLs

       This table prepares flows for possible stateful ACL processing in
       ingress table ACLs. It contains a priority-0 flow that simply moves
       traffic to the next table. If stateful ACLs are used in the logical
       datapath, a priority-100 flow is added that sets a hint (with
       reg0[0] = 1; next;) for table Pre-stateful to send IP packets to
       the connection tracker before eventually advancing to ingress table
       ACLs. If special ports such as route ports or localnet ports can’t
       use ct(), a priority-110 flow is added to skip over stateful ACLs.
       Multicast, IPv6 Neighbor Discovery and MLD traffic also skips
       stateful ACLs. For "allow-stateless" ACLs, a flow is added to
       bypass setting the hint for connection tracker processing when
       there are stateful ACLs or LB rules; REGBIT_ACL_STATELESS is set
       for traffic matching stateless ACL flows. A sketch of creating a
       stateful ACL with ovn-nbctl appears below.

       This table also has a priority-110 flow with the match eth.dst == E
       for all logical switch datapaths to move traffic to the next table,
       where E is the service monitor MAC defined in the
       options:svc_monitor_mac column of the NB_Global table.

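       A stateful ACL that triggers this processing can be created with
       ovn-nbctl, for example (the switch name, priority and match are
       examples):

              $ ovn-nbctl acl-add sw0 from-lport 1002 "ip4 && tcp.dst == 80" allow-related
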
       Ingress Table 5: Pre-LB

       This table prepares flows for possible stateful load balancing
       processing in ingress table LB and Stateful. It contains a
       priority-0 flow that simply moves traffic to the next table.
       Moreover it contains two priority-110 flows to move multicast, IPv6
       Neighbor Discovery and MLD traffic to the next table. It also
       contains two priority-110 flows to move stateless traffic, i.e.,
       traffic for which REGBIT_ACL_STATELESS is set, to the next table.
       If load balancing rules with virtual IP addresses (and ports) are
       configured in the OVN_Northbound database for a logical switch
       datapath, a priority-100 flow is added with the match ip to match
       on IP packets, with the action reg0[2] = 1; next; to act as a hint
       for table Pre-stateful to send IP packets to the connection tracker
       for packet de-fragmentation (and to possibly do DNAT for already
       established load balanced traffic) before eventually advancing to
       ingress table Stateful. If controller_event has been enabled and
       load balancing rules with empty backends have been added in
       OVN_Northbound, a priority-130 flow is added to trigger
       ovn-controller events whenever the chassis receives a packet for
       that particular VIP. If an event-elb meter has been previously
       created, it will be associated with the empty_lb logical flow.

       Prior to OVN 20.09 we were setting the reg0[0] = 1 only if the IP
       destination matches the load balancer VIP. However, this had issues
       in cases where a logical switch doesn’t have any ACLs with the
       allow-related action. To understand the issue, let’s take a TCP
       load balancer 10.0.0.10:80=10.0.0.3:80. If a logical port p1 with
       IP 10.0.0.5 opens a TCP connection with the VIP 10.0.0.10, then the
       packet in the ingress pipeline of p1 is sent to p1’s conntrack zone
       id and the packet is load balanced to the backend 10.0.0.3. For the
       reply packet from the backend lport, it is not sent to the
       conntrack of the backend lport’s zone id. This is fine as long as
       the packet is valid. Suppose the backend lport sends an invalid TCP
       packet (such as one with an incorrect sequence number); the packet
       gets delivered to the lport p1 without unDNATing the packet to the
       VIP 10.0.0.10, and this causes the connection to be reset by the
       lport p1’s VIF.

       We can’t fix this issue by adding a logical flow to drop ct.inv
       packets in the egress pipeline, since it would drop all other
       connections not destined to the load balancers. To fix this issue,
       we send all the packets to the conntrack in the ingress pipeline if
       a load balancer is configured. We can now add an lflow to drop
       ct.inv packets.

       This table also has priority-120 flows that punt all IGMP/MLD
       packets to ovn-controller if the switch is an interconnect switch
       with multicast snooping enabled.

       This table also has a priority-110 flow with the match eth.dst == E
       for all logical switch datapaths to move traffic to the next table,
       where E is the service monitor MAC defined in the
       options:svc_monitor_mac column of the NB_Global table.

       This table also has a priority-110 flow with the match inport == I
       for all logical switch datapaths to move traffic to the next table,
       where I is the peer of a logical router port. This flow is added to
       skip the connection tracking of packets which enter from the
       logical router datapath to the logical switch datapath.

       Ingress Table 6: Pre-stateful

       This table prepares flows for all possible stateful processing in
       the next tables. It contains a priority-0 flow that simply moves
       traffic to the next table.

              •  Priority-120 flows that send the packets to the
                 connection tracker using ct_lb_mark; as the action so
                 that the already established traffic destined to the load
                 balancer VIP gets DNATted. These flows match each VIP’s
                 IP and port. For IPv4 traffic the flows also load the
                 original destination IP and transport port in registers
                 reg1 and reg2. For IPv6 traffic the flows also load the
                 original destination IP and transport port in registers
                 xxreg1 and reg2.

              •  A priority-110 flow sends the packets that don’t match
                 the above flows to the connection tracker based on a hint
                 provided by the previous tables (with a match for reg0[2]
                 == 1) by using the ct_lb_mark; action.

              •  A priority-100 flow sends the packets to the connection
                 tracker based on a hint provided by the previous tables
                 (with a match for reg0[0] == 1) by using the ct_next;
                 action.

       Ingress Table 7: from-lport ACL hints

       This table consists of logical flows that set hints (reg0 bits) to
       be used in the next stage, in the ACL processing table, if stateful
       ACLs or load balancers are configured. Multiple hints can be set
       for the same packet. The possible hints are:

              •  reg0[7]: the packet might match an allow-related ACL and
                 might have to commit the connection to conntrack.

              •  reg0[8]: the packet might match an allow-related ACL but
                 there will be no need to commit the connection to
                 conntrack because it already exists.

              •  reg0[9]: the packet might match a drop/reject ACL.

              •  reg0[10]: the packet might match a drop/reject ACL but
                 the connection was previously allowed so it might have to
                 be committed again with ct_label=1/1.

       The table contains the following flows:

              •  A priority-65535 flow to advance to the next table if the
                 logical switch has no ACLs configured, otherwise a
                 priority-0 flow to advance to the next table.

              •  A priority-7 flow that matches on packets that initiate a
                 new session. This flow sets reg0[7] and reg0[9] and then
                 advances to the next table.

              •  A priority-6 flow that matches on packets that are in the
                 request direction of an already existing session that has
                 been marked as blocked. This flow sets reg0[7] and
                 reg0[9] and then advances to the next table.

              •  A priority-5 flow that matches untracked packets. This
                 flow sets reg0[8] and reg0[9] and then advances to the
                 next table.

              •  A priority-4 flow that matches on packets that are in the
                 request direction of an already existing session that has
                 not been marked as blocked. This flow sets reg0[8] and
                 reg0[10] and then advances to the next table.

              •  A priority-3 flow that matches on packets that are not
                 part of established sessions. This flow sets reg0[9] and
                 then advances to the next table.

              •  A priority-2 flow that matches on packets that are part
                 of an established session that has been marked as
                 blocked. This flow sets reg0[9] and then advances to the
                 next table.

              •  A priority-1 flow that matches on packets that are part
                 of an established session that has not been marked as
                 blocked. This flow sets reg0[10] and then advances to the
                 next table.

       Ingress table 8: from-lport ACL evaluation before LB

       Logical flows in this table closely reproduce those in the ACL
       table in the OVN_Northbound database for the from-lport direction
       without the option apply-after-lb set or set to false. The priority
       values from the ACL table have a limited range and have 1000 added
       to them to leave room for OVN default flows at both higher and
       lower priorities.

              •  This table is responsible for evaluating ACLs, and
                 setting a register bit to indicate whether the ACL
                 decided to allow, drop, or reject the traffic. The allow
                 bit is reg8[16]. The drop bit is reg8[17]. All flows in
                 this table advance the packet to the next table, where
                 the bits from before are evaluated to determine what to
                 do with the packet. Any flows in this table that intend
                 for the packet to pass will set reg8[16] to 1, even if an
                 ACL with an allow-type action was not matched. This lets
                 the next table know to allow the traffic to pass. These
                 bits will be referred to as the "allow", "drop", and
                 "reject" bits in the upcoming paragraphs.

              •  If the tier column has been configured on the ACL, then
                 OVN will also match the current tier counter against the
                 configured ACL tier. OVN keeps count of the current tier
                 in reg8[30..31].

              •  allow ACLs translate into logical flows that set the
                 allow bit to 1 and advance the packet to the next table.
                 If there are any stateful ACLs on this datapath, then
                 allow ACLs set the allow bit to one and in addition
                 perform ct_commit; (which acts as a hint for future
                 tables to commit the connection to conntrack). In case
                 the ACL has a label then reg3 is loaded with the label
                 value and the reg0[13] bit is set to 1 (which acts as a
                 hint for the next tables to commit the label to
                 conntrack).

              •  allow-related ACLs translate into logical flows that set
                 the allow bit and additionally have
                 ct_commit(ct_label=0/1); next; actions for new
                 connections and reg0[1] = 1; next; for existing
                 connections. In case the ACL has a label then reg3 is
                 loaded with the label value and the reg0[13] bit is set
                 to 1 (which acts as a hint for the next tables to commit
                 the label to conntrack).

              •  allow-stateless ACLs translate into logical flows that
                 set the allow bit and advance to the next table.

              •  reject ACLs translate into logical flows that set the
                 reject bit and advance to the next table.

              •  pass ACLs translate into logical flows that do not set
                 the allow, drop, or reject bit and advance to the next
                 table.

              •  Other ACLs set the drop bit and advance to the next table
                 for new or untracked connections. For known connections,
                 they set the drop bit, as well as running the
                 ct_commit(ct_label=1/1); action. Setting ct_label marks a
                 connection as one that was previously allowed, but should
                 no longer be allowed due to a policy change.

       This table contains a priority-65535 flow to set the allow bit and
       advance to the next table if the logical switch has no ACLs
       configured, otherwise a priority-0 flow to advance to the next
       table is added. This flow does not set the allow bit, so that the
       next table can decide whether to allow or drop the packet based on
       the value of the options:default_acl_drop column of the NB_Global
       table.

       A priority-65532 flow is added that sets the allow bit for IPv6
       Neighbor Solicitation, Neighbor Discovery, Router Solicitation,
       Router Advertisement and MLD packets regardless of other ACLs
       defined.

       If the logical datapath has a stateful ACL or a load balancer with
       a VIP configured, the following flows will also be added:

              •  If the options:default_acl_drop column of NB_Global is
                 false or not set, a priority-1 flow that sets the hint to
                 commit IP traffic that is not part of established
                 sessions to the connection tracker (with action reg0[1] =
                 1; next;). This is needed for the default allow policy
                 because, while the initiator’s direction may not have any
                 stateful rules, the server’s may and then its return
                 traffic would not be known and marked as invalid.

              •  A priority-1 flow that sets the allow bit and sets the
                 hint to commit IP traffic to the connection tracker (with
                 action reg0[1] = 1; next;). This is needed for the
                 default allow policy because, while the initiator’s
                 direction may not have any stateful rules, the server’s
                 may and then its return traffic would not be known and
                 marked as invalid.

              •  A priority-65532 flow that sets the allow bit for any
                 traffic in the reply direction for a connection that has
                 been committed to the connection tracker (i.e.,
                 established flows), as long as the committed flow does
                 not have ct_mark.blocked set. We only handle traffic in
                 the reply direction here because we want all packets
                 going in the request direction to still go through the
                 flows that implement the currently defined policy based
                 on ACLs. If a connection is no longer allowed by policy,
                 ct_mark.blocked will get set and packets in the reply
                 direction will no longer be allowed, either. This flow
                 also clears the register bits reg0[9] and reg0[10] and
                 sets register bit reg0[17]. If ACL logging and logging of
                 related packets is enabled, then a companion
                 priority-65533 flow will be installed that accomplishes
                 the same thing but also logs the traffic.

              •  A priority-65532 flow that sets the allow bit for any
                 traffic that is considered related to a committed flow in
                 the connection tracker (e.g., an ICMP Port Unreachable
                 from a non-listening UDP port), as long as the committed
                 flow does not have ct_mark.blocked set. This flow also
                 applies NAT to the related traffic so that ICMP headers
                 and the inner packet have correct addresses. If ACL
                 logging and logging of related packets is enabled, then a
                 companion priority-65533 flow will be installed that
                 accomplishes the same thing but also logs the traffic.

              •  A priority-65532 flow that sets the drop bit for all
                 traffic marked by the connection tracker as invalid.

              •  A priority-65532 flow that sets the drop bit for all
                 traffic in the reply direction with ct_mark.blocked set,
                 meaning that the connection should no longer be allowed
                 due to a policy change. Packets in the request direction
                 are skipped here to let a newly created ACL re-allow this
                 connection.

       If the logical datapath has any ACL or a load balancer with a VIP
       configured, the following flow will also be added:

              •  A priority 34000 logical flow is added for each logical
                 switch datapath with the match eth.dst == E to allow the
                 service monitor reply packet destined to ovn-controller
                 that sets the allow bit, where E is the service monitor
                 MAC defined in the options:svc_monitor_mac column of the
                 NB_Global table.

       Ingress Table 9: from-lport ACL action

       Logical flows in this table decide how to proceed based on the
       values of the allow, drop, and reject bits that may have been set
       in the previous table.

              •  If no ACLs are configured, then a priority 0 flow is
                 installed that matches everything and advances to the
                 next table.

              •  A priority 1000 flow is installed that will advance the
                 packet to the next table if the allow bit is set.

              •  A priority 1000 flow is installed that will run the drop;
                 action if the drop bit is set.

              •  A priority 1000 flow is installed that will run the
                 tcp_reset { output <-> inport;
                 next(pipeline=egress,table=5);} action for TCP
                 connections, the icmp4/icmp6 action for UDP connections,
                 and the sctp_abort { output <-> inport;
                 next(pipeline=egress,table=5);} action for SCTP
                 associations.

              •  If any ACLs have tiers configured on them, then three
                 priority 500 flows are installed. If the current tier
                 counter is 0, 1, or 2, then the current tier counter is
                 incremented by one and the packet is sent back to the
                 previous table for re-evaluation.

       Ingress Table 10: from-lport QoS Marking

       Logical flows in this table closely reproduce those in the QoS
       table with the action column set in the OVN_Northbound database for
       the from-lport direction.

              •  For every qos_rules entry in a logical switch with DSCP
                 marking enabled, a flow will be added at the priority
                 mentioned in the QoS table.

              •  One priority-0 fallback flow that matches all packets and
                 advances to the next table.

       Ingress Table 11: from-lport QoS Meter

       Logical flows in this table closely reproduce those in the QoS
       table with the bandwidth column set in the OVN_Northbound database
       for the from-lport direction.

              •  For every qos_rules entry in a logical switch with
                 metering enabled, a flow will be added at the priority
                 mentioned in the QoS table.

              •  One priority-0 fallback flow that matches all packets and
                 advances to the next table.

       Ingress Table 12: Load balancing affinity check

       The load balancing affinity check table contains the following
       logical flows:

              •  For all the configured load balancing rules for a switch
                 in the OVN_Northbound database where a positive affinity
                 timeout is specified in the options column, that include
                 an L4 port PORT of protocol P and IP address VIP, a
                 priority-100 flow is added. For IPv4 VIPs, the flow
                 matches ct.new && ip && ip4.dst == VIP && P.dst == PORT.
                 For IPv6 VIPs, the flow matches ct.new && ip && ip6.dst
                 == VIP && P && P.dst == PORT. The flow’s action is
                 reg9[6] = chk_lb_aff(); next;.

              •  A priority 0 flow is added which matches on all packets
                 and applies the action next;.

       Ingress Table 13: LB

              •  For all the configured load balancing rules for a switch
                 in the OVN_Northbound database where a positive affinity
                 timeout is specified in the options column, that include
                 an L4 port PORT of protocol P and IP address VIP, a
                 priority-150 flow is added. For IPv4 VIPs, the flow
                 matches reg9[6] == 1 && ct.new && ip && ip4.dst == VIP &&
                 P.dst == PORT. For IPv6 VIPs, the flow matches reg9[6] ==
                 1 && ct.new && ip && ip6.dst == VIP && P && P.dst ==
                 PORT. The flow’s action is ct_lb_mark(args), where args
                 contains comma separated IP addresses (and optional port
                 numbers) to load balance to. The address family of the IP
                 addresses of args is the same as the address family of
                 VIP.

              •  For all the configured load balancing rules for a switch
                 in the OVN_Northbound database that include an L4 port
                 PORT of protocol P and IP address VIP, a priority-120
                 flow is added. For IPv4 VIPs, the flow matches ct.new &&
                 ip && ip4.dst == VIP && P.dst == PORT. For IPv6 VIPs, the
                 flow matches ct.new && ip && ip6.dst == VIP && P && P.dst
                 == PORT. The flow’s action is ct_lb_mark(args), where
                 args contains comma separated IP addresses (and optional
                 port numbers) to load balance to. The address family of
                 the IP addresses of args is the same as the address
                 family of VIP. If health check is enabled, then args will
                 only contain those endpoints whose service monitor status
                 entry in the OVN_Southbound db is either online or empty.
                 For IPv4 traffic the flow also loads the original
                 destination IP and transport port in registers reg1 and
                 reg2. For IPv6 traffic the flow also loads the original
                 destination IP and transport port in registers xxreg1 and
                 reg2. The above flow is created even if the load balancer
                 is attached to a logical router connected to the current
                 logical switch and the install_ls_lb_from_router variable
                 in options is set to true.

              •  For all the configured load balancing rules for a switch
                 in the OVN_Northbound database that include just an IP
                 address VIP to match on, OVN adds a priority-110 flow.
                 For IPv4 VIPs, the flow matches ct.new && ip && ip4.dst
                 == VIP. For IPv6 VIPs, the flow matches ct.new && ip &&
                 ip6.dst == VIP. The action on this flow is
                 ct_lb_mark(args), where args contains comma separated IP
                 addresses of the same address family as VIP. For IPv4
                 traffic the flow also loads the original destination IP
                 and transport port in registers reg1 and reg2. For IPv6
                 traffic the flow also loads the original destination IP
                 and transport port in registers xxreg1 and reg2. The
                 above flow is created even if the load balancer is
                 attached to a logical router connected to the current
                 logical switch and the install_ls_lb_from_router variable
                 in options is set to true.

              •  If the load balancer is created with the --reject option
                 and it has no active backends, a TCP reset segment (for
                 tcp) or an ICMP port unreachable packet (for all other
                 kinds of traffic) will be sent whenever an incoming
                 packet is received for this load balancer. Please note
                 that using the --reject option will disable the empty_lb
                 SB controller event for this load balancer.

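       A load balancer of the kind these flows implement can be sketched
       with ovn-nbctl (all names and addresses are examples); the last
       command sets the affinity timeout used by the affinity tables
       described in the surrounding sections:

              $ ovn-nbctl lb-add lb0 10.0.0.10:80 10.0.0.3:80,10.0.0.4:80 tcp
              $ ovn-nbctl ls-lb-add sw0 lb0
              $ ovn-nbctl set load_balancer lb0 options:affinity_timeout=60
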
       Ingress Table 14: Load balancing affinity learn

       The load balancing affinity learn table contains the following
       logical flows:

              •  For all the configured load balancing rules for a switch
                 in the OVN_Northbound database where a positive affinity
                 timeout T is specified in the options column, that
                 include an L4 port PORT of protocol P and IP address VIP,
                 a priority-100 flow is added. For IPv4 VIPs, the flow
                 matches reg9[6] == 0 && ct.new && ip && ip4.dst == VIP &&
                 P.dst == PORT. For IPv6 VIPs, the flow matches reg9[6] ==
                 0 && ct.new && ip && ip6.dst == VIP && P && P.dst ==
                 PORT. The flow’s action is commit_lb_aff(vip = VIP:PORT,
                 backend = backend ip:backend port, proto = P, timeout =
                 T);.

              •  A priority 0 flow is added which matches on all packets
                 and applies the action next;.

       Ingress Table 15: Pre-Hairpin

              •  If the logical switch has load balancer(s) configured,
                 then a priority-100 flow is added with the match ip &&
                 ct.trk to check if the packet needs to be hairpinned (if
                 after load balancing the destination IP matches the
                 source IP) or not by executing the actions reg0[6] =
                 chk_lb_hairpin(); and reg0[12] = chk_lb_hairpin_reply();
                 and advances the packet to the next table.

              •  A priority-0 flow that simply moves traffic to the next
                 table.

       Ingress Table 16: Nat-Hairpin

              •  If the logical switch has load balancer(s) configured,
                 then a priority-100 flow is added with the match ip &&
                 ct.new && ct.trk && reg0[6] == 1 which hairpins the
                 traffic by NATting the source IP to the load balancer VIP
                 by executing the action ct_snat_to_vip and advances the
                 packet to the next table.

              •  If the logical switch has load balancer(s) configured,
                 then a priority-100 flow is added with the match ip &&
                 ct.est && ct.trk && reg0[6] == 1 which hairpins the
                 traffic by NATting the source IP to the load balancer VIP
                 by executing the action ct_snat and advances the packet
                 to the next table.

              •  If the logical switch has load balancer(s) configured,
                 then a priority-90 flow is added with the match ip &&
                 reg0[12] == 1 which matches on the replies of hairpinned
                 traffic (i.e., destination IP is VIP, source IP is the
                 backend IP and source L4 port is the backend port for L4
                 load balancers) and executes ct_snat and advances the
                 packet to the next table.

              •  A priority-0 flow that simply moves traffic to the next
                 table.

       Ingress Table 17: Hairpin

              •  If the logical switch has an attached logical switch port
                 of vtep type, then for each distributed gateway router
                 port RP attached to this logical switch that has a
                 chassis redirect port cr-RP, a priority-2000 flow is
                 added with the match

                 reg0[14] == 1 && is_chassis_resident(cr-RP)

                 and action next;.

                 The reg0[14] register bit is set in the ingress L2 port
                 security check table for traffic received from HW VTEP
                 (ramp) ports.

              •  If the logical switch has an attached logical switch port
                 of vtep type, then a priority-1000 flow that matches on
                 the reg0[14] register bit for the traffic received from
                 HW VTEP (ramp) ports. This traffic is passed to ingress
                 table ls_in_l2_lkup.

              •  A priority-1 flow that hairpins traffic matched by
                 non-default flows in the Pre-Hairpin table. Hairpinning
                 is done at L2: Ethernet addresses are swapped and the
                 packets are looped back on the input port.

              •  A priority-0 flow that simply moves traffic to the next
                 table.

       Ingress table 18: from-lport ACL evaluation after LB

       Logical flows in this table closely reproduce those in the ACL
       eval table in the OVN_Northbound database for the from-lport
       direction with the option apply-after-lb set to true. The priority
       values from the ACL table have a limited range and have 1000 added
       to them to leave room for OVN default flows at both higher and
       lower priorities. The flows in this table indicate the ACL verdict
       by setting reg8[16] for allow-type ACLs, reg8[17] for drop ACLs,
       and reg8[18] for reject ACLs, and then advancing the packet to the
       next table. These will be referred to as the allow bit, drop bit,
       and reject bit throughout the documentation for this table and the
       next one.

       Like with ACLs that are evaluated before load balancers, if the
       ACL is configured with a tier value, then the current tier
       counter, supplied in reg8[30..31], is matched against the ACL’s
       configured tier in addition to the ACL’s match.

              •  allow apply-after-lb ACLs translate into logical flows
                 that set the allow bit. If there are any stateful ACLs
                 (including both before-lb and after-lb ACLs) on this
                 datapath, then allow ACLs also run ct_commit; next;
                 (which acts as a hint for an upcoming table to commit the
                 connection to conntrack). In case the ACL has a label
                 then reg3 is loaded with the label value and the reg0[13]
                 bit is set to 1 (which acts as a hint for the next tables
                 to commit the label to conntrack).

              •  allow-related apply-after-lb ACLs translate into logical
                 flows that set the allow bit and run the
                 ct_commit(ct_label=0/1); next; actions for new
                 connections and reg0[1] = 1; next; for existing
                 connections. In case the ACL has a label then reg3 is
                 loaded with the label value and the reg0[13] bit is set
                 to 1 (which acts as a hint for the next tables to commit
                 the label to conntrack).

              •  allow-stateless apply-after-lb ACLs translate into
                 logical flows that set the allow bit and advance to the
                 next table.

              •  reject apply-after-lb ACLs translate into logical flows
                 that set the reject bit and advance to the next table.

              •  pass apply-after-lb ACLs translate into logical flows
                 that do not set the allow, drop, or reject bit and
                 advance to the next table.

              •  Other apply-after-lb ACLs set the drop bit for new or
                 untracked connections and run ct_commit(ct_label=1/1);
                 for known connections. Setting ct_label marks a
                 connection as one that was previously allowed, but should
                 no longer be allowed due to a policy change.

              •  One priority-65532 flow matching packets with reg0[17]
                 set (either replies to existing sessions or traffic
                 related to existing sessions) that allows these by
                 setting the allow bit and advancing to the next table.

              •  One priority-0 fallback flow that matches all packets and
                 advances to the next table.

       Ingress Table 19: from-lport ACL action after LB

       Logical flows in this table decide how to proceed based on the
       values of the allow, drop, and reject bits that may have been set
       in the previous table.

              •  If no ACLs are configured, then a priority 0 flow is
                 installed that matches everything and advances to the
                 next table.

              •  A priority 1000 flow is installed that will advance the
                 packet to the next table if the allow bit is set.

              •  A priority 1000 flow is installed that will run the drop;
                 action if the drop bit is set.

              •  A priority 1000 flow is installed that will run the
                 tcp_reset { output <-> inport;
                 next(pipeline=egress,table=5);} action for TCP
                 connections, the icmp4/icmp6 action for UDP connections,
                 and the sctp_abort { output <-> inport;
                 next(pipeline=egress,table=5);} action for SCTP
                 associations.

              •  If any ACLs have tiers configured on them, then three
                 priority 500 flows are installed. If the current tier
                 counter is 0, 1, or 2, then the current tier counter is
                 incremented by one and the packet is sent back to the
                 previous table for re-evaluation.

       Ingress Table 20: Stateful

              •  A priority 100 flow is added which commits the packet to
                 the conntrack and sets the most significant 32 bits of
                 ct_label with the reg3 value based on the hint provided
                 by previous tables (with a match for reg0[1] == 1 &&
                 reg0[13] == 1). This is used by the ACLs with a label to
                 commit the label value to conntrack.

              •  For ACLs without a label, a second priority-100 flow
                 commits packets to the connection tracker using the
                 ct_commit; next; action based on a hint provided by the
                 previous tables (with a match for reg0[1] == 1 &&
                 reg0[13] == 0).

              •  A priority-0 flow that simply moves traffic to the next
                 table.

       Ingress Table 21: ARP/ND responder

       This table implements the ARP/ND responder in a logical switch for
       known IPs. The advantage of the ARP responder flow is to limit ARP
       broadcasts by locally responding to ARP requests without the need
       to send to other hypervisors. One common case is when the inport is
       a logical port associated with a VIF and the broadcast is responded
       to on the local hypervisor rather than broadcast across the whole
       network and responded to by the destination VM. This behavior is
       proxy ARP.

       ARP requests arrive from VMs from a logical switch inport of type
       default. For this case, the logical switch proxy ARP rules can be
       for other VMs or logical router ports. Logical switch proxy ARP
       rules may be programmed both for mac binding of IP addresses on
       other logical switch VIF ports (which are of the default logical
       switch port type, representing connectivity to VMs or containers),
       and for mac binding of IP addresses on logical switch router type
       ports, representing their logical router port peers. In order to
       support proxy ARP for logical router ports, an IP address must be
       configured on the logical switch router type port, with the same
       value as the peer logical router port. The configured MAC addresses
       must match as well. When a VM sends an ARP request for a
       distributed logical router port and if the peer router type port of
       the attached logical switch does not have an IP address configured,
       the ARP request will be broadcast on the logical switch. One of the
       copies of the ARP request will go through the logical switch router
       type port to the logical router datapath, where the logical router
       ARP responder will generate a reply. The MAC binding of a
       distributed logical router, once learned by an associated VM, is
       used for all that VM’s communication needing routing. Hence, the
       action of a VM re-arping for the mac binding of the logical router
       port should be rare.

       Logical switch ARP responder proxy ARP rules can also be hit when
       receiving ARP requests externally on a L2 gateway port. In this
       case, the hypervisor acting as an L2 gateway responds to the ARP
       request on behalf of a destination VM.

       Note that ARP requests received from localnet logical inports can
       either go directly to VMs, in which case the VM responds, or can
       hit an ARP responder for a logical router port if the packet is
       used to resolve a logical router port next hop address. In either
       case, logical switch ARP responder rules will not be hit. It
       contains these logical flows:

              •  If a packet was received from a HW VTEP (ramp switch) and
                 it is an ARP or Neighbor Solicitation packet, it is
                 passed to the next table with maximum priority. ARP/ND
                 requests from a HW VTEP must be handled in the logical
                 router ingress pipeline.

              •  If the logical switch has no router ports with
                 options:arp_proxy configured, a priority-100 flow is
                 added that skips the ARP responder if the inport is of
                 type localnet and advances directly to the next table.
                 ARP requests sent to localnet ports can be received by
                 multiple hypervisors. Now, because the same mac binding
                 rules are downloaded to all hypervisors, each of the
                 multiple hypervisors will respond. This will confuse L2
                 learning on the source of the ARP requests. ARP requests
                 received on an inport of type router are not expected to
                 hit any logical switch ARP responder flows. However, no
                 skip flows are installed for these packets, as there
                 would be some additional flow cost for this and the
                 value appears limited.

              •  If the inport V is of type virtual, a priority-100
                 logical flow is added for each P configured in the
                 options:virtual-parents column with the match

                 inport == P && ((arp.op == 1 && arp.spa == VIP && arp.tpa == VIP) || (arp.op == 2 && arp.spa == VIP))
                 inport == P && ((nd_ns && ip6.dst == {VIP, NS_MULTICAST_ADDR} && nd.target == VIP) || (nd_na && nd.target == VIP))

                 and applies the action

                 bind_vport(V, inport);

                 and advances the packet to the next table.

                 Where VIP is the virtual IP configured in the column
                 options:virtual-ip and NS_MULTICAST_ADDR is the
                 solicited-node multicast address corresponding to the
                 VIP.

1251 • Priority-50 flows that match ARP requests to each known
1252 IP address A of every logical switch port, and respond
1253 with ARP replies directly with corresponding Ethernet ad‐
1254 dress E:
1255
1256 eth.dst = eth.src;
1257 eth.src = E;
1258 arp.op = 2; /* ARP reply. */
1259 arp.tha = arp.sha;
1260 arp.sha = E;
1261 arp.tpa = arp.spa;
1262 arp.spa = A;
1263 outport = inport;
1264 flags.loopback = 1;
1265 output;
1266
1267
1268 These flows are omitted for logical ports (other than
1269 router ports or localport ports) that are down (unless
1270 ignore_lsp_down is configured as true in options column
1271 of NB_Global table of the Northbound database), for logi‐
1272 cal ports of type virtual, for logical ports with ’un‐
1273 known’ address set and for logical ports of a logical
1274 switch configured with other_config:vlan-passthru=true.
1275
1276 The above ARP responder flows are added for the list of
1277 IPv4 addresses if defined in options:arp_proxy column of
1278 Logical_Switch_Port table for logical switch ports of
1279 type router.
1280
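As a hedged example, such proxy ARP/ND addresses might be configured on a router-type logical switch port as follows; the port name and addresses are hypothetical:

    # Sketch: make this logical switch answer ARP/ND for extra
    # addresses on the router-type port sw0-lr0.
    ovn-nbctl lsp-set-options sw0-lr0 arp_proxy="169.254.239.254 fe80::1"
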
1281 • Priority-50 flows that match IPv6 ND neighbor solicita‐
1282 tions to each known IP address A (and A’s solicited node
1283 address) of every logical switch port except of type
1284 router, and respond with neighbor advertisements directly
1285 with corresponding Ethernet address E:
1286
1287 nd_na {
1288 eth.src = E;
1289 ip6.src = A;
1290 nd.target = A;
1291 nd.tll = E;
1292 outport = inport;
1293 flags.loopback = 1;
1294 output;
1295 };
1296
1297
1298 Priority-50 flows that match IPv6 ND neighbor solicita‐
1299 tions to each known IP address A (and A’s solicited node
1300 address) of logical switch port of type router, and re‐
1301 spond with neighbor advertisements directly with corre‐
1302 sponding Ethernet address E:
1303
1304 nd_na_router {
1305 eth.src = E;
1306 ip6.src = A;
1307 nd.target = A;
1308 nd.tll = E;
1309 outport = inport;
1310 flags.loopback = 1;
1311 output;
1312 };
1313
1314
1315 These flows are omitted for logical ports (other than
1316 router ports or localport ports) that are down (unless
1317 ignore_lsp_down is configured as true in options column
1318 of NB_Global table of the Northbound database), for logi‐
1319 cal ports of type virtual and for logical ports with ’un‐
1320 known’ address set.
1321
1322 The above NDP responder flows are added for the list of
1323 IPv6 addresses if defined in options:arp_proxy column of
1324 Logical_Switch_Port table for logical switch ports of
1325 type router.
1326
1327 • Priority-100 flows with match criteria like the ARP and
1328 ND flows above, except that they only match packets from
1329 the inport that owns the IP addresses in question, with
1330 action next;. These flows prevent OVN from replying to,
1331 for example, an ARP request emitted by a VM for its own
1332 IP address. A VM only makes this kind of request to at‐
1333 tempt to detect a duplicate IP address assignment, so
1334 sending a reply will prevent the VM from accepting the IP
1335 address that it owns.
1336
1337 In place of next;, it would be reasonable to use drop;
1338 for the flows’ actions. If everything is working as it is
1339 configured, then this would produce equivalent results,
1340 since no host should reply to the request. But ARPing for
1341 one’s own IP address is intended to detect situations
1342 where the network is not working as configured, so drop‐
1343 ping the request would frustrate that intent.
1344
1345 • For each SVC_MON_SRC_IP defined in the value of the
1346 ip_port_mappings:ENDPOINT_IP column of Load_Balancer ta‐
1347 ble, a priority-110 logical flow is added with the match
1348 arp.tpa == SVC_MON_SRC_IP && arp.op == 1 and applies the
1349 action
1350
1351 eth.dst = eth.src;
1352 eth.src = E;
1353 arp.op = 2; /* ARP reply. */
1354 arp.tha = arp.sha;
1355 arp.sha = E;
1356 arp.tpa = arp.spa;
1357 arp.spa = A;
1358 outport = inport;
1359 flags.loopback = 1;
1360 output;
1361
1362
1363 where E is the service monitor source mac defined in the
1364 options:svc_monitor_mac column in the NB_Global table.
1365 This mac is used as the source mac in the service monitor
1366 packets for the load balancer endpoint IP health checks.
1367
1368 SVC_MON_SRC_IP is used as the source ip in the service
1369 monitor IPv4 packets for the load balancer endpoint IP
1370 health checks.
1371
1372 These flows are required if an ARP request is sent for
1373 the IP SVC_MON_SRC_IP.
1374
1375 For IPv6, a similar flow is added with the following ac‐
1376 tion
1377
1378 nd_na {
1379 eth.dst = eth.src;
1380 eth.src = E;
1381 ip6.src = A;
1382 nd.target = A;
1383 nd.tll = E;
1384 outport = inport;
1385 flags.loopback = 1;
1386 output;
1387 };
1388
1389
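For reference, an ip_port_mappings entry of the shape assumed above might be created as follows; the load balancer name, port, and addresses are hypothetical, and the value format is ENDPOINT_IP=LOGICAL_PORT:SVC_MON_SRC_IP:

    # Sketch: health-check backend 10.0.0.4 through logical port
    # sw0-p1, using 10.0.0.254 as the service monitor source IP.
    ovn-nbctl set load_balancer lb0 \
        'ip_port_mappings:"10.0.0.4"="sw0-p1:10.0.0.254"'
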
1390 • For each VIP configured in the table Forwarding_Group a
1391 priority-50 logical flow is added with the match arp.tpa
1392 == vip && arp.op == 1
1393 and applies the action
1394
1395 eth.dst = eth.src;
1396 eth.src = E;
1397 arp.op = 2; /* ARP reply. */
1398 arp.tha = arp.sha;
1399 arp.sha = E;
1400 arp.tpa = arp.spa;
1401 arp.spa = A;
1402 outport = inport;
1403 flags.loopback = 1;
1404 output;
1405
1406
1407 where E is the forwarding group’s mac defined in the
1408 vmac column.
1409
1410 A is used as either the destination ip for load balancing
1411 traffic to child ports or as nexthop to hosts behind the
1412 child ports.
1413
1414 These flows are required to respond to ARP requests sent
1415 for the IP vip.
1416
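A forwarding group of the kind matched above might be created with something like the following; the group, switch, port names, VIP, and vMAC are hypothetical:

    # Sketch: answer ARP for 10.0.0.100 with the group's vMAC and
    # load balance across the child ports; --liveness enables the
    # liveness=true behavior mentioned under Destination Lookup.
    ovn-nbctl --liveness fwd-group-add fwd-grp1 sw0 \
        10.0.0.100 00:00:00:00:00:64 sw0-p1 sw0-p2
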
1417 • One priority-0 fallback flow that matches all packets and
1418 advances to the next table.
1419
1420 Ingress Table 22: DHCP option processing
1421
1422 This table adds the DHCPv4 options to a DHCPv4 packet from the logical
1423 ports configured with IPv4 address(es) and DHCPv4 options, and simi‐
1424 larly for DHCPv6 options. This table also adds flows for the logical
1425 ports of type external.
1426
1427 • A priority-100 logical flow is added for these logical
1428 ports which matches the IPv4 packet with udp.src = 68 and
1429 udp.dst = 67 and applies the action put_dhcp_opts and ad‐
1430 vances the packet to the next table.
1431
1432 reg0[3] = put_dhcp_opts(offer_ip = ip, options...);
1433 next;
1434
1435
1436 For DHCPDISCOVER and DHCPREQUEST, this transforms the
1437 packet into a DHCP reply, adds the DHCP offer IP ip and
1438 options to the packet, and stores 1 into reg0[3]. For
1439 other kinds of packets, it just stores 0 into reg0[3].
1440 Either way, it continues to the next table.
1441
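For context, DHCPv4 options of the kind consumed by put_dhcp_opts are typically attached to a port along these lines; all names and values are hypothetical:

    # Sketch: create a DHCP_Options row, fill in its options, and
    # point a logical switch port at it.
    uuid=$(ovn-nbctl create dhcp_options cidr=10.0.0.0/24)
    ovn-nbctl dhcp-options-set-options "$uuid" \
        server_id=10.0.0.1 server_mac=00:00:00:00:00:01 \
        lease_time=3600 router=10.0.0.1
    ovn-nbctl lsp-set-dhcpv4-options sw0-p1 "$uuid"
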
1442 • A priority-100 logical flow is added for these logical
1443 ports which matches the IPv6 packet with udp.src = 546
1444 and udp.dst = 547 and applies the action put_dhcpv6_opts
1445 and advances the packet to the next table.
1446
1447 reg0[3] = put_dhcpv6_opts(ia_addr = ip, options...);
1448 next;
1449
1450
1451 For DHCPv6 Solicit/Request/Confirm packets, this trans‐
1452 forms the packet into a DHCPv6 Advertise/Reply, adds the
1453 DHCPv6 offer IP ip and options to the packet, and stores
1454 1 into reg0[3]. For other kinds of packets, it just
1455 stores 0 into reg0[3]. Either way, it continues to the
1456 next table.
1457
1458 • A priority-0 flow that matches all packets and advances
1459 to the next table.
1460
1461 Ingress Table 23: DHCP responses
1462
1463 This table implements DHCP responder for the DHCP replies generated by
1464 the previous table.
1465
1466 • A priority 100 logical flow is added for the logical
1467 ports configured with DHCPv4 options which matches IPv4
1468 packets with udp.src == 68 && udp.dst == 67 && reg0[3] ==
1469 1 and responds back to the inport after applying these
1470 actions. If reg0[3] is set to 1, it means that the action
1471 put_dhcp_opts was successful.
1472
1473 eth.dst = eth.src;
1474 eth.src = E;
1475 ip4.src = S;
1476 udp.src = 67;
1477 udp.dst = 68;
1478 outport = P;
1479 flags.loopback = 1;
1480 output;
1481
1482
1483 where E is the server MAC address and S is the server
1484 IPv4 address defined in the DHCPv4 options. Note that
1485 ip4.dst field is handled by put_dhcp_opts.
1486
1487 (This terminates ingress packet processing; the packet
1488 does not go to the next ingress table.)
1489
1490 • A priority 100 logical flow is added for the logical
1491 ports configured with DHCPv6 options which matches IPv6
1492 packets with udp.src == 546 && udp.dst == 547 && reg0[3]
1493 == 1 and responds back to the inport after applying these
1494 actions. If reg0[3] is set to 1, it means that the action
1495 put_dhcpv6_opts was successful.
1496
1497 eth.dst = eth.src;
1498 eth.src = E;
1499 ip6.dst = A;
1500 ip6.src = S;
1501 udp.src = 547;
1502 udp.dst = 546;
1503 outport = P;
1504 flags.loopback = 1;
1505 output;
1506
1507
1508 where E is the server MAC address and S is the server
1509 IPv6 LLA address generated from the server_id defined in
1510 the DHCPv6 options and A is the IPv6 address defined in
1511 the logical port’s addresses column.
1512
1513 (This terminates packet processing; the packet does not
1514 go on to the next ingress table.)
1515
1516 • A priority-0 flow that matches all packets and advances
1517 to the next table.
1518
1519 Ingress Table 24: DNS Lookup
1520
1521 This table looks up and resolves the DNS names to the corresponding
1522 configured IP address(es).
1523
1524 • A priority-100 logical flow for each logical switch data‐
1525 path if it is configured with DNS records, which matches
1526 the IPv4 and IPv6 packets with udp.dst = 53 and applies
1527 the action dns_lookup and advances the packet to the next
1528 table.
1529
1530 reg0[4] = dns_lookup(); next;
1531
1532
1533 For valid DNS packets, this transforms the packet into a
1534 DNS reply if the DNS name can be resolved, and stores 1
1535 into reg0[4]. For failed DNS resolution or other kinds of
1536 packets, it just stores 0 into reg0[4]. Either way, it
1537 continues to the next table.
1538
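DNS records of the kind consulted by dns_lookup might be configured as follows; the record and switch name are hypothetical:

    # Sketch: create a DNS row and attach it to the logical switch
    # so its datapath gets the priority-100 dns_lookup flow.
    uuid=$(ovn-nbctl create dns records='"vm1.example.org"="10.0.0.4"')
    ovn-nbctl add logical_switch sw0 dns_records "$uuid"
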
1539 Ingress Table 25: DNS Responses
1540
1541 This table implements DNS responder for the DNS replies generated by
1542 the previous table.
1543
1544 • A priority-100 logical flow for each logical switch data‐
1545 path if it is configured with DNS records, which matches
1546 the IPv4 and IPv6 packets with udp.dst = 53 && reg0[4] ==
1547 1 and responds back to the inport after applying these
1548 actions. If reg0[4] is set to 1, it means that the action
1549 dns_lookup was successful.
1550
1551 eth.dst <-> eth.src;
1552 ip4.src <-> ip4.dst;
1553 udp.dst = udp.src;
1554 udp.src = 53;
1555 outport = P;
1556 flags.loopback = 1;
1557 output;
1558
1559
1560 (This terminates ingress packet processing; the packet
1561 does not go to the next ingress table.)
1562
1563 Ingress Table 26: External ports
1564
1565 Traffic from the external logical ports enters the ingress datapath
1566 pipeline via the localnet port. This table adds the below logical
1567 flows to handle the traffic from these ports.
1568
1569 • A priority-100 flow is added for each external logical
1570 port which doesn’t reside on a chassis to drop the
1571 ARP/IPv6 NS request to the router IP(s) (of the logical
1572 switch) which matches on the inport of the external logi‐
1573 cal port and the valid eth.src address(es) of the exter‐
1574 nal logical port.
1575
1576 This flow guarantees that the ARP/NS request to the
1577 router IP address from the external ports is responded to
1578 only by the chassis which has claimed these external
1579 ports. All other chassis drop these packets.
1580
1581 • A priority-100 flow is added for each external logical
1582 port which doesn’t reside on a chassis to drop any packet
1583 destined to the router mac, with the match inport == ex‐
1584 ternal && eth.src == E && eth.dst == R && !is_chas‐
1585 sis_resident("external"), where E is the external port
1586 mac and R is the router port mac.
1587
1588 • A priority-0 flow that matches all packets and advances
1589 to the next table.
1590
1591 Ingress Table 27: Destination Lookup
1592
1593 This table implements switching behavior. It contains these logical
1594 flows:
1595
1596 • A priority-110 flow with the match eth.src == E for all
1597 logical switch datapaths and applies the action han‐
1598 dle_svc_check(inport). Where E is the service monitor mac
1599 defined in the options:svc_monitor_mac column of
1600 NB_Global table.
1601
1602 • A priority-100 flow that punts all IGMP/MLD packets to
1603 ovn-controller if multicast snooping is enabled on the
1604 logical switch.
1605
1606 • Priority-90 flows that forward registered IP multicast
1607 traffic to their corresponding multicast group, which
1608 ovn-northd creates based on learnt IGMP_Group entries.
1609 The flows also forward packets to the MC_MROUTER_FLOOD
1610 multicast group, which ovn-northd populates with all the
1611 logical ports that are connected to logical routers with
1612 options:mcast_relay=’true’.
1613
1614 • A priority-85 flow that forwards all IP multicast traffic
1615 destined to 224.0.0.X to the MC_FLOOD_L2 multicast group,
1616 which ovn-northd populates with all non-router logical
1617 ports.
1618
1619 • A priority-85 flow that forwards all IP multicast traffic
1620 destined to reserved multicast IPv6 addresses (RFC 4291,
1621 2.7.1, e.g., Solicited-Node multicast) to the MC_FLOOD
1622 multicast group, which ovn-northd populates with all en‐
1623 abled logical ports.
1624
1625 • A priority-80 flow that forwards all unregistered IP mul‐
1626 ticast traffic to the MC_STATIC multicast group, which
1627 ovn-northd populates with all the logical ports that have
1628 options:mcast_flood=’true’. The flow also forwards unreg‐
1629 istered IP multicast traffic to the MC_MROUTER_FLOOD
1630 multicast group, which ovn-northd populates with all the
1631 logical ports connected to logical routers that have
1632 options:mcast_relay=’true’.
1633
1634 • A priority-80 flow that drops all unregistered IP multi‐
1635 cast traffic if other_config:mcast_snoop=’true’ and
1636 other_config:mcast_flood_unregistered=’false’ and the
1637 switch is not connected to a logical router that has
1638 options:mcast_relay=’true’ and the switch doesn’t have
1639 any logical port with options:mcast_flood=’true’.
1640
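These multicast knobs live in the logical switch’s other_config column; a minimal sketch, with a hypothetical switch name:

    # Sketch: enable IGMP/MLD snooping and stop flooding
    # unregistered multicast traffic on sw0.
    ovn-nbctl set logical_switch sw0 \
        other_config:mcast_snoop=true \
        other_config:mcast_flood_unregistered=false
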
1641 • Priority-80 flows for each IP address/VIP/NAT address
1642 owned by a router port connected to the switch. These
1643 flows match ARP requests and ND packets for the specific
1644 IP addresses. Matched packets are forwarded only to the
1645 router that owns the IP address and to the MC_FLOOD_L2
1646 multicast group which contains all non-router logical
1647 ports.
1648
1649 • Priority-75 flows for each port connected to a logical
1650 router matching self-originated ARP request/RARP re‐
1651 quest/ND packets. These packets are flooded to the
1652 MC_FLOOD_L2 multicast group, which contains all non-router logical ports.
1653
1654 • A priority-72 flow that outputs all ARP requests and ND
1655 packets with an Ethernet broadcast or multicast eth.dst
1656 to the MC_FLOOD_L2 multicast group if other_config:broad‐
1657 cast-arps-to-all-routers=true.
1658
1659 • A priority-70 flow that outputs all packets with an Eth‐
1660 ernet broadcast or multicast eth.dst to the MC_FLOOD mul‐
1661 ticast group.
1662
1663 • One priority-50 flow that matches each known Ethernet ad‐
1664 dress against eth.dst. The action of this flow outputs
1665 the packet to the single associated output port if that
1666 port is enabled; a drop; action is applied if the LSP is disabled.
1667
1668 For the Ethernet address on a logical switch port of type
1669 router, when that logical switch port’s addresses column
1670 is set to router and the connected logical router port
1671 has a gateway chassis:
1672
1673 • The flow for the connected logical router port’s
1674 Ethernet address is only programmed on the gateway
1675 chassis.
1676
1677 • If the logical router has rules specified in nat
1678 with external_mac, then those addresses are also
1679 used to populate the switch’s destination lookup
1680 on the chassis where logical_port is resident.
1681
1682 For the Ethernet address on a logical switch port of type
1683 router, when that logical switch port’s addresses column
1684 is set to router and the connected logical router port
1685 specifies a reside-on-redirect-chassis and the logical
1686 router to which the connected logical router port belongs
1687 has a distributed gateway LRP:
1688
1689 • The flow for the connected logical router port’s
1690 Ethernet address is only programmed on the gateway
1691 chassis.
1692
1693 • For each forwarding group configured on the logical
1694 switch datapath, a priority-50 flow that matches on
1695 eth.dst == VIP
1696 with an action of fwd_group(childports=args), where
1697 args contains comma separated logical switch child ports
1698 to load balance to. If liveness is enabled, then the ac‐
1699 tion also includes liveness=true.
1700
1701 • One priority-0 fallback flow that matches all packets
1702 with the action outport = get_fdb(eth.dst); next;. The
1703 action get_fdb gets the port for the eth.dst in the MAC
1704 learning table of the logical switch datapath. If there
1705 is no entry for eth.dst in the MAC learning table, then
1706 it stores none in the outport.
1707
1708 Ingress Table 28: Destination unknown
1709
1710 This table handles packets whose destination was not found, or was
1711 looked up in the MAC learning table of the logical switch datapath. It
1712 contains the following flows.
1713
1714 • Priority 50 flow with the match outport == P is added for
1715 each disabled Logical Switch Port P. This flow has action
1716 drop;.
1717
1718 • If the logical switch has logical ports with ’unknown’
1719 addresses set, then the below logical flow is added:
1720
1721 • Priority 50 flow with the match outport == "none"
1722 that outputs such packets to the MC_UNKNOWN multi‐
1723 cast group, which ovn-northd populates with all
1724 enabled logical ports that accept unknown desti‐
1725 nation packets. As a small optimization, if no
1726 logical ports accept unknown destination packets,
1727 ovn-northd omits this multicast group and logical
1728 flow.
1729
1730 If the logical switch has no logical ports with ’unknown’
1731 address set, then the below logical flow is added:
1732
1733 • Priority 50 flow with the match outport == "none"
1734 that drops the packets.
1735
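A logical port opts in to receiving such unknown-destination traffic via the unknown keyword in its addresses column, e.g. (port name and addresses hypothetical):

    # Sketch: sw0-p9 accepts packets for unknown destinations in
    # addition to traffic for its own MAC/IP.
    ovn-nbctl lsp-set-addresses sw0-p9 "50:54:00:00:00:09 10.0.0.9" unknown
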
1736 • One priority-0 fallback flow that outputs the packet to
1737 the egress stage with the outport learnt from get_fdb ac‐
1738 tion.
1739
1740 Egress Table 0: to-lport Pre-ACLs
1741
1742 This is similar to ingress table Pre-ACLs except for to-lport traffic.
1743
1744 This table also has a priority-110 flow with the match eth.src == E for
1745 all logical switch datapaths to move traffic to the next table. Where E
1746 is the service monitor mac defined in the options:svc_monitor_mac col‐
1747 umn of NB_Global table.
1748
1749 This table also has a priority-110 flow with the match outport == I for
1750 all logical switch datapaths to move traffic to the next table. Where I
1751 is the peer of a logical router port. This flow is added to skip the
1752 connection tracking of packets which will be entering logical router
1753 datapath from logical switch datapath for routing.
1754
1755 Egress Table 1: Pre-LB
1756
1757 This table is similar to ingress table Pre-LB. It contains a priority-0
1758 flow that simply moves traffic to the next table. Moreover it contains
1759 two priority-110 flows to move multicast, IPv6 Neighbor Discovery and
1760 MLD traffic to the next table. If any load balancing rules exist for
1761 the datapath, a priority-100 flow is added with a match of ip and ac‐
1762 tion of reg0[2] = 1; next; to act as a hint for table Pre-stateful to
1763 send IP packets to the connection tracker for packet de-fragmentation
1764 and possibly DNAT the destination VIP to one of the selected backends
1765 for already committed load balanced traffic.
1766
1767 This table also has a priority-110 flow with the match eth.src == E for
1768 all logical switch datapaths to move traffic to the next table. Where E
1769 is the service monitor mac defined in the options:svc_monitor_mac col‐
1770 umn of NB_Global table.
1771
1772 This table also has a priority-110 flow with the match outport == I for
1773 all logical switch datapaths to move traffic to the next table, and, if
1774 there are no stateful ACLs, to clear the ct_state. Where I is the peer of a
1775 logical router port. This flow is added to skip the connection tracking
1776 of packets which will be entering logical router datapath from logical
1777 switch datapath for routing.
1778
1779 Egress Table 2: Pre-stateful
1780
1781 This is similar to ingress table Pre-stateful. This table adds the be‐
1782 low 3 logical flows.
1783
1784 • A priority-120 flow that sends the packets to the con‐
1785 nection tracker using ct_lb_mark; as the action, so that
1786 the already established traffic gets unDNATted from the
1787 backend IP to the load balancer VIP, based on a hint pro‐
1788 vided by the previous tables with a match for reg0[2] ==
1789 1. If the packet was not DNATted earlier, then
1790 ct_lb_mark functions like ct_next.
1791
1792 • A priority-100 flow sends the packets to connection
1793 tracker based on a hint provided by the previous tables
1794 (with a match for reg0[0] == 1) by using the ct_next; ac‐
1795 tion.
1796
1797 • A priority-0 flow that matches all packets to advance to
1798 the next table.
1799
1800 Egress Table 3: from-lport ACL hints
1801
1802 This is similar to ingress table ACL hints.
1803
1804 Egress Table 4: to-lport ACL evaluation
1805
1806 This is similar to ingress table ACL eval except for to-lport ACLs. As
1807 a reminder, these flows use the following register bits to indicate
1808 their verdicts. Allow-type ACLs set reg8[16], drop ACLs set reg8[17],
1809 and reject ACLs set reg8[18].
1810
1811 Also like with ingress ACLs, egress ACLs can have a configured tier. If
1812 a tier is configured, then the current tier counter is evaluated
1813 against the ACL’s configured tier in addition to the ACL’s match. The
1814 current tier counter is stored in reg8[30..31].
1815
1816 Similar to ingress table, a priority-65532 flow is added to allow IPv6
1817 Neighbor solicitation, Neighbor discovery, Router solicitation, Router
1818 advertisement and MLD packets regardless of other ACLs defined.
1819
1820 In addition, the following flows are added.
1821
1822 • A priority 34000 logical flow is added for each logical
1823 port which has DHCPv4 options defined to allow the DHCPv4
1824 reply packet and which has DHCPv6 options defined to al‐
1825 low the DHCPv6 reply packet from Ingress Table 23:
1826 DHCP responses. This is indicated by setting the allow
1827 bit.
1828
1829 • A priority 34000 logical flow is added for each logical
1830 switch datapath configured with DNS records with the
1831 match udp.dst = 53 to allow the DNS reply packet from the
1832 Ingress Table 25: DNS responses. This is indicated by
1833 setting the allow bit.
1834
1835 • A priority 34000 logical flow is added for each logical
1836 switch datapath with the match eth.src = E to allow the
1837 service monitor request packet generated by ovn-con‐
1838 troller with the action next, where E is the service mon‐
1839 itor mac defined in the options:svc_monitor_mac column of
1840 NB_Global table. This is indicated by setting the allow
1841 bit.
1842
1843 Egress Table 5: to-lport ACL action
1844
1845 This is similar to ingress table ACL action.
1846
1847 Egress Table 6: to-lport QoS Marking
1848
1849 This is similar to ingress table QoS marking except they apply to
1850 to-lport QoS rules.
1851
1852 Egress Table 7: to-lport QoS Meter
1853
1854 This is similar to ingress table QoS meter except they apply to
1855 to-lport QoS rules.
1856
1857 Egress Table 8: Stateful
1858
1859 This is similar to ingress table Stateful except that there are no
1860 rules added for load balancing new connections.
1861
1862 Egress Table 9: Egress Port Security - check
1863
1864 This is similar to the port security logic in table Ingress Port Secu‐
1865 rity check except that action check_out_port_sec is used to check the
1866 port security rules. This table adds the below logical flows.
1867
1868 • A priority 100 flow which matches on the multicast traf‐
1869 fic and applies the action REGBIT_PORT_SEC_DROP = 0;
1870 next; to skip the out port security checks.
1871
1872 • A priority 0 logical flow is added which matches on all
1873 the packets and applies the action REGBIT_PORT_SEC_DROP
1874 = check_out_port_sec(); next;. The action
1875 check_out_port_sec applies the port security rules based
1876 on the addresses defined in the port_security column of
1877 Logical_Switch_Port table before delivering the packet to
1878 the outport.
1879
1880 Egress Table 10: Egress Port Security - Apply
1881
1882 This is similar to the ingress port security logic in the ingress ta‐
1883 ble Ingress Port Security - Apply. This table drops the packets if the
1884 port security check failed in the previous stage, i.e., the register
1885 bit REGBIT_PORT_SEC_DROP is set to 1.
1886
1887 The following flows are added.
1888
1889 • For each port configured with egress qos in the op‐
1890 tions:qdisc_queue_id column of Logical_Switch_Port, run‐
1891 ning a localnet port on the same logical switch, a prior‐
1892 ity 110 flow is added which matches on the localnet out‐
1893 port and on the port inport and applies the action
1894 set_queue(id); output;.
1895
1896 • For each localnet port configured with egress qos in the
1897 options:qdisc_queue_id column of Logical_Switch_Port, a
1898 priority 100 flow is added which matches on the localnet
1899 outport and applies the action set_queue(id); output;.
1900
1901 Please remember to mark the corresponding physical inter‐
1902 face with ovn-egress-iface set to true in external_ids.
1903
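As a sketch, the egress QoS queue referenced above is wired up in two steps; the port, interface, and queue id are hypothetical, and the second command marks the physical interface as just described:

    # Sketch: send sw0-p1's traffic out the localnet port in
    # queue 10, and mark the NIC that carries it.
    ovn-nbctl lsp-set-options sw0-p1 qdisc_queue_id=10
    ovs-vsctl set interface eth1 external-ids:ovn-egress-iface=true
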
1904 • A priority-50 flow that drops the packet if the register
1905 bit REGBIT_PORT_SEC_DROP is set to 1.
1906
1907 • A priority-0 flow that outputs the packet to the outport.
1908
1909 Logical Router Datapaths
1910 Logical router datapaths will only exist for Logical_Router rows in the
1911 OVN_Northbound database that do not have enabled set to false.
1912
1913 Ingress Table 0: L2 Admission Control
1914
1915 This table drops packets that the router shouldn’t see at all based on
1916 their Ethernet headers. It contains the following flows:
1917
1918 • Priority-100 flows to drop packets with VLAN tags or mul‐
1919 ticast Ethernet source addresses.
1920
1921 • For each enabled router port P with Ethernet address E, a
1922 priority-50 flow that matches inport == P && (eth.mcast
1923 || eth.dst == E), stores the router port ethernet address
1924 and advances to the next table, with action xreg0[0..47]=E;
1925 next;.
1926
1927 For the gateway port on a distributed logical router
1928 (where one of the logical router ports specifies a gate‐
1929 way chassis), the above flow matching eth.dst == E is
1930 only programmed on the gateway port instance on the gate‐
1931 way chassis. If the LRP’s logical switch has an attached
1932 LSP of vtep type, the is_chassis_resident() part is not
1933 added to the lflow, to allow traffic originating from the
1934 logical switch to reach LR services (LBs, NAT).
1935
1936 For each gateway port GW on a distributed logical router,
1937 a priority-120 flow is added that matches inport == cr-GW
1938 && !is_chassis_resident(cr-GW), where cr-GW is the chas‐
1939 sis resident port of GW; it stores GW as inport and ad‐
1940 vances to the next table.
1941
1942 For a distributed logical router, or for a gateway router
1943 whose port is configured with options:gateway_mtu, the
1944 action of the above flow is modified to add
1945 check_pkt_larger, which marks the packet by setting REG‐
1946 BIT_PKT_LARGER if its size is greater than the MTU. If
1947 the port is also configured with options:gateway_mtu_by‐
1948 pass then another flow is added, with priority-55, to by‐
1949 pass the check_pkt_larger flow. This is useful for traf‐
1950 fic that normally doesn’t need to be fragmented and for
1951 which check_pkt_larger, which might not be offloadable,
1952 is not really needed. One such example is TCP traffic.
1953
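Both options live on the Logical_Router_Port; a hedged example, assuming a hypothetical port lrp0 and that gateway_mtu_bypass accepts a match expression such as tcp:

    # Sketch: check packet size against a 1442-byte MTU on lrp0,
    # but skip the (possibly non-offloadable) check for TCP.
    ovn-nbctl set logical_router_port lrp0 \
        options:gateway_mtu=1442 options:gateway_mtu_bypass=tcp
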
1954 • For each dnat_and_snat NAT rule on a distributed router
1955 that specifies an external Ethernet address E, a prior‐
1956 ity-50 flow that matches inport == GW && eth.dst == E,
1957 where GW is the logical router distributed gateway port
1958 corresponding to the NAT rule (specified or inferred),
1959 with action xreg0[0..47]=E; next;.
1960
1961 This flow is only programmed on the gateway port instance
1962 on the chassis where the logical_port specified in the
1963 NAT rule resides.
1964
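A distributed dnat_and_snat rule of this shape can be created with lr-nat-add; the router, addresses, logical port, and external MAC below are hypothetical:

    # Sketch: floating IP 172.16.0.10 for 10.0.0.4, handled on the
    # chassis where sw0-p1 resides, using the given external MAC.
    ovn-nbctl lr-nat-add lr0 dnat_and_snat 172.16.0.10 10.0.0.4 \
        sw0-p1 00:00:00:00:00:10
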
1965 • A priority-0 logical flow that matches all packets not
1966 already handled (match 1) and drops them (action drop;).
1969
1970 Ingress Table 1: Neighbor lookup
1971
1972 For ARP and IPv6 Neighbor Discovery packets, this table looks into the
1973 MAC_Binding records to determine if OVN needs to learn the mac bind‐
1974 ings. Following flows are added:
1975
1976 • For each router port P that owns IP address A, which be‐
1977 longs to subnet S with prefix length L, if the option al‐
1978 ways_learn_from_arp_request is true for this router, a
1979 priority-100 flow is added which matches inport == P &&
1980 arp.spa == S/L && arp.op == 1 (ARP request) with the fol‐
1981 lowing actions:
1982
1983 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1984 next;
1985
1986
1987 If the option always_learn_from_arp_request is false, the
1988 following two flows are added.
1989
1990 A priority-110 flow is added which matches inport == P &&
1991 arp.spa == S/L && arp.tpa == A && arp.op == 1 (ARP re‐
1992 quest) with the following actions:
1993
1994 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1995 reg9[3] = 1;
1996 next;
1997
1998
1999 A priority-100 flow is added which matches inport == P &&
2000 arp.spa == S/L && arp.op == 1 (ARP request) with the fol‐
2001 lowing actions:
2002
2003 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
2004 reg9[3] = lookup_arp_ip(inport, arp.spa);
2005 next;
2006
2007
2008 If the logical router port P is a distributed gateway
2009 router port, additional match is_chassis_resident(cr-P)
2010 is added for all these flows.
2011
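This per-router knob is set in the options column, for instance (router name hypothetical):

    # Sketch: learn MAC bindings only from ARP requests the router
    # actually needs, enabling the priority-110/100 flow pair above.
    ovn-nbctl set logical_router lr0 \
        options:always_learn_from_arp_request=false
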
2012 • A priority-100 flow which matches on ARP reply packets
2013 and applies the actions if the option al‐
2014 ways_learn_from_arp_request is true:
2015
2016 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
2017 next;
2018
2019
2020 If the option always_learn_from_arp_request is false, the
2021 above actions will be:
2022
2023 reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
2024 reg9[3] = 1;
2025 next;
2026
2027
2028 • A priority-100 flow which matches on IPv6 Neighbor Dis‐
2029 covery advertisement packet and applies the actions if
2030 the option always_learn_from_arp_request is true:
2031
2032 reg9[2] = lookup_nd(inport, nd.target, nd.tll);
2033 next;
2034
2035
2036 If the option always_learn_from_arp_request is false, the
2037 above actions will be:
2038
2039 reg9[2] = lookup_nd(inport, nd.target, nd.tll);
2040 reg9[3] = 1;
2041 next;
2042
2043
2044 • A priority-100 flow which matches on IPv6 Neighbor Dis‐
2045 covery solicitation packet and applies the actions if the
2046 option always_learn_from_arp_request is true:
2047
2048 reg9[2] = lookup_nd(inport, ip6.src, nd.sll);
2049 next;
2050
2051
2052 If the option always_learn_from_arp_request is false, the
2053 above actions will be:
2054
2055 reg9[2] = lookup_nd(inport, ip6.src, nd.sll);
2056 reg9[3] = lookup_nd_ip(inport, ip6.src);
2057 next;
2058
2059
2060 • A priority-0 fallback flow that matches all packets and
2061 applies the action reg9[2] = 1; next; advancing the
2062 packet to the next table.
2063
2064 Ingress Table 2: Neighbor learning
2065
2066 This table adds flows to learn the mac bindings from the ARP and IPv6
2067 Neighbor Solicitation/Advertisement packets if it is needed according
2068 to the lookup results from the previous stage.
2069
2070 reg9[2] will be 1 if the lookup_arp/lookup_nd in the previous table was
2071 successful or skipped, meaning no need to learn mac binding from the
2072 packet.
2073
2074 reg9[3] will be 1 if the lookup_arp_ip/lookup_nd_ip in the previous ta‐
2075 ble was successful or skipped, meaning it is ok to learn mac binding
2076 from the packet (if reg9[2] is 0).
2077
2078 • A priority-100 flow with the match reg9[2] == 1 ||
2079 reg9[3] == 0 that advances the packet to the next table,
2080 as there is no need to learn the neighbor.
2081
2082 • A priority-95 flow with the match nd_ns && (ip6.src == 0
2083 || nd.sll == 0) and applies the action next;
2084
2085 • A priority-90 flow with the match arp and applies the ac‐
2086 tion put_arp(inport, arp.spa, arp.sha); next;
2087
2088 • A priority-95 flow with the match nd_na && nd.tll == 0
2089 and applies the action put_nd(inport, nd.target,
2090 eth.src); next;
2091
2092 • A priority-90 flow with the match nd_na and applies the
2093 action put_nd(inport, nd.target, nd.tll); next;
2094
2095 • A priority-90 flow with the match nd_ns and applies the
2096 action put_nd(inport, ip6.src, nd.sll); next;
2097
2098 • A priority-0 logical flow that matches all packets not
2099 already handled (match 1) and drops them (action drop;).
2100
2101 Ingress Table 3: IP Input
2102
2103 This table is the core of the logical router datapath functionality. It
2104 contains the following flows to implement very basic IP host function‐
2105 ality.
2106
2107 • For each dnat_and_snat NAT rule on a distributed logical
2108 router or gateway router with a gateway port configured
2109 with options:gateway_mtu set to a valid integer value M,
2110 a priority-160 flow with the match inport == LRP && REG‐
2111 BIT_PKT_LARGER && REGBIT_EGRESS_LOOPBACK == 0, where LRP
2112 is the logical router port, applies the following action
2113 for ipv4 and ipv6 respectively:
2114
2115 icmp4_error {
2116 icmp4.type = 3; /* Destination Unreachable. */
2117 icmp4.code = 4; /* Frag Needed and DF was Set. */
2118 icmp4.frag_mtu = M;
2119 eth.dst = eth.src;
2120 eth.src = E;
2121 ip4.dst = ip4.src;
2122 ip4.src = I;
2123 ip.ttl = 255;
2124 REGBIT_EGRESS_LOOPBACK = 1;
2125 REGBIT_PKT_LARGER = 0;
2126 outport = LRP;
2127 flags.loopback = 1;
2128 output;
2129 };
2130 icmp6_error {
2131 icmp6.type = 2;
2132 icmp6.code = 0;
2133 icmp6.frag_mtu = M;
2134 eth.dst = eth.src;
2135 eth.src = E;
2136 ip6.dst = ip6.src;
2137 ip6.src = I;
2138 ip.ttl = 255;
2139 REGBIT_EGRESS_LOOPBACK = 1;
2140 REGBIT_PKT_LARGER = 0;
2141 outport = LRP;
2142 flags.loopback = 1;
2143 output;
2144 };
2145
2146
2147 where E and I are the NAT rule external mac and IP re‐
2148 spectively.
2149
2150 • For distributed logical routers or gateway routers with
2151 a gateway port configured with options:gateway_mtu set to
2152 a valid integer value M, a priority-150 flow with the
2153 match inport == LRP && REGBIT_PKT_LARGER && REG‐
2154 BIT_EGRESS_LOOPBACK == 0, where LRP is the logical router
2155 port, applies the following action for ipv4 and ipv6 re‐
2156 spectively:
2157
2158 icmp4_error {
2159 icmp4.type = 3; /* Destination Unreachable. */
2160 icmp4.code = 4; /* Frag Needed and DF was Set. */
2161 icmp4.frag_mtu = M;
2162 eth.dst = E;
2163 ip4.dst = ip4.src;
2164 ip4.src = I;
2165 ip.ttl = 255;
2166 REGBIT_EGRESS_LOOPBACK = 1;
2167 REGBIT_PKT_LARGER = 0;
2168 next(pipeline=ingress, table=0);
2169 };
2170 icmp6_error {
2171 icmp6.type = 2;
2172 icmp6.code = 0;
2173 icmp6.frag_mtu = M;
2174 eth.dst = E;
2175 ip6.dst = ip6.src;
2176 ip6.src = I;
2177 ip.ttl = 255;
2178 REGBIT_EGRESS_LOOPBACK = 1;
2179 REGBIT_PKT_LARGER = 0;
2180 next(pipeline=ingress, table=0);
2181 };
2182
2183
2184 • For each NAT entry of a distributed logical router (with
2185 distributed gateway router port(s)) of type snat, a pri‐
2186 ority-120 flow with the match inport == P && ip4.src == A
2187 advances the packet to the next pipeline, where P is the
2188 distributed logical router port corresponding to the NAT
2189 entry (specified or inferred) and A is the external_ip
2190 set in the NAT entry. If A is an IPv6 address, then
2191 ip6.src is used for the match.
2192
2193 The above flow is required to handle the routing of the
2194 east/west NAT traffic.
2195
2196 • For each BFD port, the following two priority-110 flows
2197 are added to manage BFD traffic:
2198
2199 • if ip4.src or ip6.src is any IP address owned by
2200 the router port and udp.dst == 3784, the packet
2201 is advanced to the next pipeline stage.
2202
2203 • if ip4.dst or ip6.dst is any IP address owned by
2204 the router port and udp.dst == 3784, the han‐
2205 dle_bfd_msg action is executed.
2206
2207 • L3 admission control: Priority-120 flows allow IGMP and
2208 MLD packets if the router has logical ports that have
2209 options:mcast_flood=’true’.
2210
2211 • L3 admission control: A priority-100 flow drops packets
2212 that match any of the following:
2213
2214 • ip4.src[28..31] == 0xe (multicast source)
2215
2216 • ip4.src == 255.255.255.255 (broadcast source)
2217
2218 • ip4.src == 127.0.0.0/8 || ip4.dst == 127.0.0.0/8
2219 (localhost source or destination)
2220
2221 • ip4.src == 0.0.0.0/8 || ip4.dst == 0.0.0.0/8 (zero
2222 network source or destination)
2223
2224 • ip4.src or ip6.src is any IP address owned by the
2225 router, unless the packet was recirculated due to
2226 egress loopback as indicated by REG‐
2227 BIT_EGRESS_LOOPBACK.
2228
2229 • ip4.src is the broadcast address of any IP network
2230 known to the router.
2231
2232 • A priority-100 flow parses DHCPv6 replies from IPv6 pre‐
2233 fix delegation routers (udp.src == 547 && udp.dst ==
2234 546). The handle_dhcpv6_reply action is used to send IPv6
2235 prefix delegation messages to the delegation router.
2236
2237 • ICMP echo reply. These flows reply to ICMP echo requests
2238 received for the router’s IP address. Let A be an IP ad‐
2239 dress owned by a router port. Then, for each A that is an
2240 IPv4 address, a priority-90 flow matches on ip4.dst == A
2241 and icmp4.type == 8 && icmp4.code == 0 (ICMP echo re‐
2242 quest). For each A that is an IPv6 address, a priority-90
2243 flow matches on ip6.dst == A and icmp6.type == 128 &&
2244 icmp6.code == 0 (ICMPv6 echo request). The port of the
2245 router that receives the echo request does not matter.
2246 Also, the ip.ttl of the echo request packet is not
2247 checked, so it complies with RFC 1812, section 4.2.2.9.
2248 Flows for ICMPv4 echo requests use the following actions:
2249
2250 ip4.dst <-> ip4.src;
2251 ip.ttl = 255;
2252 icmp4.type = 0;
2253 flags.loopback = 1;
2254 next;
2255
2256
2257 Flows for ICMPv6 echo requests use the following actions:
2258
2259 ip6.dst <-> ip6.src;
2260 ip.ttl = 255;
2261 icmp6.type = 129;
2262 flags.loopback = 1;
2263 next;
2264
2265
2266 • Reply to ARP requests.
2267
2268 These flows reply to ARP requests for the router’s own IP
2269 address. The ARP requests are handled only if the re‐
2270 questor’s IP belongs to the same subnet as the logical
2271 router port. For each router port P that owns IP address
2272 A, which belongs to subnet S with prefix length L, and
2273 Ethernet address E, a priority-90 flow matches inport ==
2274 P && arp.spa == S/L && arp.op == 1 && arp.tpa == A (ARP
2275 request) with the following actions:
2276
2277 eth.dst = eth.src;
2278 eth.src = xreg0[0..47];
2279 arp.op = 2; /* ARP reply. */
2280 arp.tha = arp.sha;
2281 arp.sha = xreg0[0..47];
2282 arp.tpa = arp.spa;
2283 arp.spa = A;
2284 outport = inport;
2285 flags.loopback = 1;
2286 output;
2287
2288
2289 For the gateway port on a distributed logical router
2290 (where one of the logical router ports specifies a gate‐
2291 way chassis), the above flows are only programmed on the
2292 gateway port instance on the gateway chassis. This behav‐
2293 ior avoids generation of multiple ARP responses from dif‐
2294 ferent chassis, and allows upstream MAC learning to point
2295 to the gateway chassis.
2296
2297 For the logical router port with the option reside-on-re‐
2298 direct-chassis set (which is centralized), the above
2299 flows are only programmed on the gateway port instance on
2300 the gateway chassis (if the logical router has a distrib‐
2301 uted gateway port). This behavior avoids generation of
2302 multiple ARP responses from different chassis, and allows
2303 upstream MAC learning to point to the gateway chassis.
2304
2305 • Reply to IPv6 Neighbor Solicitations. These flows reply
2306 to Neighbor Solicitation requests for the router’s own
2307 IPv6 address and populate the logical router’s mac bind‐
2308 ing table.
2309
2310 For each router port P that owns IPv6 address A, so‐
2311 licited node address S, and Ethernet address E, a prior‐
2312 ity-90 flow matches inport == P && nd_ns && ip6.dst ==
2313 {A, E} && nd.target == A with the following actions:
2314
2315 nd_na_router {
2316 eth.src = xreg0[0..47];
2317 ip6.src = A;
2318 nd.target = A;
2319 nd.tll = xreg0[0..47];
2320 outport = inport;
2321 flags.loopback = 1;
2322 output;
2323 };
2324
2325
2326 For the gateway port on a distributed logical router
2327 (where one of the logical router ports specifies a gate‐
2328 way chassis), the above flows replying to IPv6 Neighbor
2329 Solicitations are only programmed on the gateway port in‐
2330 stance on the gateway chassis. This behavior avoids gen‐
2331 eration of multiple replies from different chassis, and
2332 allows upstream MAC learning to point to the gateway
2333 chassis.
2334
2335 • These flows reply to ARP requests or IPv6 neighbor solic‐
2336 itation for the virtual IP addresses configured in the
2337 router for NAT (both DNAT and SNAT) or load balancing.
2338
2339 IPv4: For a configured NAT (both DNAT and SNAT) IP ad‐
2340 dress or a load balancer IPv4 VIP A, for each router port
2341 P with Ethernet address E, a priority-90 flow matches
2342 arp.op == 1 && arp.tpa == A (ARP request) with the fol‐
2343 lowing actions:
2344
2345 eth.dst = eth.src;
2346 eth.src = xreg0[0..47];
2347 arp.op = 2; /* ARP reply. */
2348 arp.tha = arp.sha;
2349 arp.sha = xreg0[0..47];
2350 arp.tpa <-> arp.spa;
2351 outport = inport;
2352 flags.loopback = 1;
2353 output;
2354
2355
2356 IPv4: For a configured load balancer IPv4 VIP, a similar
2357 flow is added with the additional match inport == P if
2358 the VIP is reachable from any logical router port of the
2359 logical router.
2360
2361 If the router port P is a distributed gateway router
2362 port, then the is_chassis_resident(P) is also added in
2363 the match condition for the load balancer IPv4 VIP A.
2364
2365 IPv6: For a configured NAT (both DNAT and SNAT) IP ad‐
2366 dress or a load balancer IPv6 VIP A (if the VIP is reach‐
2367 able from any logical router port of the logical router),
2368 solicited node address S, for each router port P with
2369 Ethernet address E, a priority-90 flow matches inport ==
2370 P && nd_ns && ip6.dst == {A, S} && nd.target == A with
2371 the following actions:
2372
2373 eth.dst = eth.src;
2374 nd_na {
2375 eth.src = xreg0[0..47];
2376 nd.tll = xreg0[0..47];
2377 ip6.src = A;
2378 nd.target = A;
2379 outport = inport;
2380 flags.loopback = 1;
2381 output;
2382 }
2383
2384
2385 If the router port P is a distributed gateway router
2386 port, then the is_chassis_resident(P) is also added in
2387 the match condition for the load balancer IPv6 VIP A.
2388
2389 For the gateway port on a distributed logical router with
2390 NAT (where one of the logical router ports specifies a
2391 gateway chassis):
2392
2393 • If the corresponding NAT rule cannot be handled in
2394 a distributed manner, then a priority-92 flow is
2395 programmed on the gateway port instance on the
2396 gateway chassis. A priority-91 drop flow is pro‐
2397 grammed on the other chassis when ARP requests/NS
2398 packets are received on the gateway port. This be‐
2399 havior avoids generation of multiple ARP responses
2400 from different chassis, and allows upstream MAC
2401 learning to point to the gateway chassis.
2402
2403 • If the corresponding NAT rule can be handled in a
2404 distributed manner, then this flow is only pro‐
2405 grammed on the gateway port instance where the
2406 logical_port specified in the NAT rule resides.
2407
2408 Some of the actions are different for this case,
2409 using the external_mac specified in the NAT rule
2410 rather than the gateway port’s Ethernet address E:
2411
2412 eth.src = external_mac;
2413 arp.sha = external_mac;
2414
2415
2416 or in the case of IPv6 neighbor solicitation:
2417
2418 eth.src = external_mac;
2419 nd.tll = external_mac;
2420
2421
2422 This behavior avoids generation of multiple ARP
2423 responses from different chassis, and allows up‐
2424 stream MAC learning to point to the correct chas‐
2425 sis.
2426
2427 • Priority-85 flows which drop the ARP and IPv6 Neighbor
2428 Discovery packets.
2429
2430 • A priority-84 flow explicitly allows IPv6 multicast traf‐
2431 fic that is supposed to reach the router pipeline (i.e.,
2432 router solicitation and router advertisement packets).
2433
2434 • A priority-83 flow explicitly drops IPv6 multicast traf‐
2435 fic that is destined to reserved multicast groups.
2436
2437 • A priority-82 flow allows IP multicast traffic if op‐
2438 tions:mcast_relay=’true’, otherwise drops it.
2439
2440 • UDP port unreachable. Priority-80 flows generate ICMP
2441 port unreachable messages in reply to UDP datagrams di‐
2442 rected to the router’s IP address, except in the special
2443 case of gateways, which accept traffic directed to a
2444 router IP for load balancing and NAT purposes.
2445
2446 These flows should not match IP fragments with nonzero
2447 offset.
2448
2449 • TCP reset. Priority-80 flows generate TCP reset messages
2450 in reply to TCP datagrams directed to the router’s IP ad‐
2451 dress, except in the special case of gateways, which ac‐
2452 cept traffic directed to a router IP for load balancing
2453 and NAT purposes.
2454
2455 These flows should not match IP fragments with nonzero
2456 offset.
2457
2458 • Protocol or address unreachable. Priority-70 flows gener‐
2459 ate ICMP protocol or address unreachable messages for
2460 IPv4 and IPv6 respectively in reply to packets directed
2461 to the router’s IP address on IP protocols other than
2462 UDP, TCP, and ICMP, except in the special case of gate‐
2463 ways, which accept traffic directed to a router IP for
2464 load balancing purposes.
2465
2466 These flows should not match IP fragments with nonzero
2467 offset.
2468
2469 • Drop other IP traffic to this router. These flows drop
2470 any other traffic destined to an IP address of this
2471 router that is not already handled by one of the flows
2472 above, which amounts to ICMP (other than echo requests)
2473 and fragments with nonzero offsets. For each IP address A
2474 owned by the router, a priority-60 flow matches ip4.dst
2475 == A or ip6.dst == A and drops the traffic. An exception
2476 is made and the above flow is not added if the router
2477 port’s own IP address is used to SNAT packets passing
2478 through that router or if it is used as a load balancer
2479 VIP.
2480
2481 The flows above handle all of the traffic that might be directed to the
2482 router itself. The following flows (with lower priorities) handle the
2483 remaining traffic, potentially for forwarding:
2484
2485 • Drop Ethernet local broadcast. A priority-50 flow with
2486 match eth.bcast drops traffic destined to the local Eth‐
2487 ernet broadcast address. By definition this traffic
2488 should not be forwarded.
2489
2490 • Avoid ICMP time exceeded for multicast. A priority-32
2491 flow with match ip.ttl == {0, 1} && !ip.later_frag &&
2492 (ip4.mcast || ip6.mcast) and actions drop; drops multi‐
2493 cast packets whose TTL has expired without sending ICMP
2494 time exceeded.
2495
2496 • ICMP time exceeded. For each router port P, whose IP ad‐
2497 dress is A, a priority-31 flow with match inport == P &&
2498 ip.ttl == {0, 1} && !ip.later_frag matches packets whose
2499 TTL has expired, with the following actions to send an
2500 ICMP time exceeded reply for IPv4 and IPv6 respectively:
2501
2502 icmp4 {
2503 icmp4.type = 11; /* Time exceeded. */
2504 icmp4.code = 0; /* TTL exceeded in transit. */
2505 ip4.dst = ip4.src;
2506 ip4.src = A;
2507 ip.ttl = 254;
2508 next;
2509 };
2510 icmp6 {
2511 icmp6.type = 3; /* Time exceeded. */
2512 icmp6.code = 0; /* TTL exceeded in transit. */
2513 ip6.dst = ip6.src;
2514 ip6.src = A;
2515 ip.ttl = 254;
2516 next;
2517 };
2518
2519
2520 • TTL discard. A priority-30 flow with match ip.ttl == {0,
2521 1} and actions drop; drops other packets whose TTL has
2522 expired, that should not receive an ICMP error reply (i.e.,
2523 fragments with nonzero offset).
2524
2525 • Next table. A priority-0 flow matches all packets that
2526 aren’t already handled and uses the action next; to feed
2527 them to the next table.
2528
2529 Ingress Table 4: UNSNAT
2530
2531 This is for already established connections’ reverse traffic; i.e.,
2532 SNAT has already been done in the egress pipeline and the packet has
2533 entered the ingress pipeline as part of a reply. It is unSNATted here.
2534
2535 Ingress Table 4: UNSNAT on Gateway and Distributed Routers
2536
2537 • If the Router (Gateway or Distributed) is configured with
2538 load balancers, then below lflows are added:
2539
2540 For each IPv4 address A defined as a load balancer VIP
2541 with the protocol P (and the protocol port T if defined)
2542 that is also present as an external_ip in the NAT table,
2543 a priority-120 logical flow is added with the match ip4 &&
2544 ip4.dst == A && P with the action next; to advance the
2545 packet to the next table. If the load balancer has the
2546 protocol port T defined, then the match also has P.dst == T.
2547
2548 The above flows are also added for IPv6 load balancers.
2549
2550 Ingress Table 4: UNSNAT on Gateway Routers
2551
2552 • If the Gateway router has been configured to force SNAT
2553 any previously DNATted packets to B, a priority-110 flow
2554 matches ip && ip4.dst == B or ip && ip6.dst == B with an
2555 action ct_snat;.
2556
2557 If the Gateway router is configured with
2558 lb_force_snat_ip=router_ip then for every logical router
2559 port P attached to the Gateway router with the router ip
2560 B, a priority-110 flow is added with the match inport ==
2561 P && ip4.dst == B or inport == P && ip6.dst == B with an
2562 action ct_snat;.
2563
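The router_ip variant mentioned here is configured as, for example (router name hypothetical):

    # Sketch: SNAT load-balanced traffic to the IP of whichever
    # router port the packet leaves through.
    ovn-nbctl set logical_router lr0 options:lb_force_snat_ip=router_ip
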
2564 If the Gateway router has been configured to force SNAT
2565 any previously load-balanced packets to B, a priority-100
2566 flow matches ip && ip4.dst == B or ip && ip6.dst == B
2567 with an action ct_snat;.
2568
2569 For each NAT configuration in the OVN Northbound data‐
2570 base, that asks to change the source IP address of a
2571 packet from A to B, a priority-90 flow matches ip &&
2572 ip4.dst == B or ip && ip6.dst == B with an action
2573 ct_snat;. If the NAT rule is of type dnat_and_snat and
2574 has stateless=true in the options, then the action would
2575 be next;.
2576
2577 A priority-0 logical flow with match 1 has actions next;.
2578
2579 Ingress Table 4: UNSNAT on Distributed Routers
2580
2581 • For each configuration in the OVN Northbound database,
2582 that asks to change the source IP address of a packet
2583 from A to B, two priority-100 flows are added.
2584
2585 If the NAT rule cannot be handled in a distributed man‐
2586 ner, then the below priority-100 flows are only pro‐
2587 grammed on the gateway chassis.
2588
2589 • The first flow matches ip && ip4.dst == B && in‐
2590 port == GW
2591 or ip && ip6.dst == B && inport == GW where GW is
2592 the distributed gateway port corresponding to the
2593 NAT rule (specified or inferred), with an action
2594 ct_snat; to unSNAT in the common zone. If the NAT
2595 rule is of type dnat_and_snat and has state‐
2596 less=true in the options, then the action would be
2597 next;.
2598
2599 If the NAT entry is of type snat, then there is an
2600 additional match is_chassis_resident(cr-GW)
2601 where cr-GW is the chassis resident port of GW.
2602
2603 A priority-0 logical flow with match 1 has actions next;.
2604
2605 Ingress Table 5: DEFRAG
2606
2607 This is to send packets to connection tracker for tracking and defrag‐
2608 mentation. It contains a priority-0 flow that simply moves traffic to
2609 the next table.
2610
2611 For all load balancing rules that are configured in OVN_Northbound
2612 database for a Gateway router, a priority-100 flow is added for each
2613 configured virtual IP address VIP. For IPv4 VIPs the flow matches ip &&
2614 ip4.dst == VIP. For IPv6 VIPs, the flow matches ip && ip6.dst == VIP.
2615 The flow applies the action ct_dnat; to send IP packets to the connec‐
2616 tion tracker for packet de-fragmentation and to dnat the destination IP
2617 for the committed connection before sending it to the next table.
2618
2619 If ECMP routes with symmetric reply are configured in the OVN_North‐
2620 bound database for a gateway router, a priority-100 flow is added for
2621 each router port on which symmetric replies are configured. The match‐
2622 ing logic for these ports essentially reverses the configured logic of
2623 the ECMP route. So for instance, a route with a destination routing
2624 policy will instead match if the source IP address matches the static
2625 route’s prefix. The flow uses the actions chk_ecmp_nh_mac(); ct_next or
2626 chk_ecmp_nh(); ct_next to send IP packets to table 76 or to table 77 in
2627 order to check if source info is already stored by OVN and then to the
2628 connection tracker for packet de-fragmentation and tracking before
2629 sending it to the next table.
2630
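An ECMP route with symmetric reply, as described above, might be added as follows; router, prefix, and next hops are hypothetical:

    # Sketch: two equal-cost next hops; replies are forced back
    # through the next hop the original request used.
    ovn-nbctl --ecmp-symmetric-reply lr-route-add lr0 \
        192.168.1.0/24 10.0.0.2
    ovn-nbctl --ecmp-symmetric-reply lr-route-add lr0 \
        192.168.1.0/24 10.0.0.3
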
2631 If load balancing rules are configured in the OVN_Northbound database
2632 for a Gateway router, a priority 50 flow that matches icmp || icmp6
2633 with an action of ct_dnat; is added. This allows potentially related
2634 ICMP traffic to pass through CT.
2635
2636 Ingress Table 6: Load balancing affinity check
2637
2638 Load balancing affinity check table contains the following logical
2639 flows:
2640
2641 • For all the configured load balancing rules for a logical
2642 router where a positive affinity timeout is specified in
2643 options column, that includes a L4 port PORT of protocol
2644 P and IPv4 or IPv6 address VIP, a priority-100 flow that
2645 matches on ct.new && ip && ip.dst == VIP && P && P.dst ==
2646 PORT (xxreg0 == VIP
2647 in the IPv6 case) with an action of reg0 = ip.dst;
2648 reg9[16..31] = P.dst; reg9[6] = chk_lb_aff(); next;
2649 (xxreg0 == ip6.dst in the IPv6 case)
2650
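Affinity is enabled per load balancer through its options column, e.g. (load balancer name and timeout hypothetical):

    # Sketch: pin a client to its chosen backend for 60 seconds;
    # this is the state that chk_lb_aff() tests above.
    ovn-nbctl set load_balancer lb0 options:affinity_timeout=60
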
2651 • A priority 0 flow is added which matches on all packets
2652 and applies the action next;.
2653
2654 Ingress Table 7: DNAT
2655
2656 Packets enter the pipeline with destination IP address that needs to be
2657 DNATted from a virtual IP address to a real IP address. Packets in the
2658 reverse direction need to be unDNATted.
2659
2660 Ingress Table 7: Load balancing DNAT rules
2661
2662 The following load balancing DNAT flows are added for a Gateway router
2663 or Router with a gateway port. These flows are programmed only on the gate‐
2664 way chassis. These flows do not get programmed for load balancers with
2665 IPv6 VIPs.
2666
2667 • For all the configured load balancing rules for a logical
2668 router where a positive affinity timeout is specified in
2669 options column, that includes a L4 port PORT of protocol
2670 P and IPv4 or IPv6 address VIP, a priority-150 flow that
2671 matches on reg9[6] == 1 && ct.new && ip && ip.dst == VIP
2672 && P && P.dst == PORT with an action of ct_lb_mark(args)
2673 , where args contains comma separated IP addresses (and
2674 optional port numbers) to load balance to. The address
2675 family of the IP addresses of args is the same as the ad‐
2676 dress family of VIP.
2677
2678 • If controller_event has been enabled for all the config‐
2679 ured load balancing rules for a Gateway router or Router
2680 with gateway port in OVN_Northbound database that does
2681 not have configured backends, a priority-130 flow is
2682 added to trigger ovn-controller events whenever the chas‐
2683 sis receives a packet for that particular VIP. If
2684 event-elb meter has been previously created, it will be
2685 associated with the empty_lb logical flow.
2686
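The event machinery referenced here is enabled globally, and event-elb is an ordinary OVN meter; the exact knob location and the rate below are assumptions:

    # Sketch: raise empty_lb controller events, rate-limited by a
    # meter named event-elb.
    ovn-nbctl set NB_Global . options:controller_event=true
    ovn-nbctl meter-add event-elb drop 10 pktps
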
2687 • For all the configured load balancing rules for a Gateway
2688 router or Router with gateway port in OVN_Northbound
2689 database that includes a L4 port PORT of protocol P and
2690 IPv4 or IPv6 address VIP, a priority-120 flow that
2691 matches on ct.new && !ct.rel && ip && ip.dst == VIP && P
2692 && P.dst ==
2693 PORT with an action of ct_lb_mark(args), where args con‐
2694 tains comma separated IPv4 or IPv6 addresses (and op‐
2695 tional port numbers) to load balance to. If the router is
2696 configured to force SNAT any load-balanced packets, the
2697 above action will be replaced by flags.force_snat_for_lb
2698 = 1; ct_lb_mark(args; force_snat);. If the load balancing
2699 rule is configured with skip_snat set to true, the above
2700 action will be replaced by flags.skip_snat_for_lb = 1;
2701 ct_lb_mark(args; skip_snat);. If health check is enabled,
2702 then args will only contain those endpoints whose service
2703 monitor status entry in OVN_Southbound db is either on‐
2704 line or empty.
2705
2706 • For all the configured load balancing rules for a router
2707 in OVN_Northbound database that includes just an IP ad‐
2708 dress VIP to match on, a priority-110 flow that matches
2709 on ct.new && !ct.rel && ip4 && ip.dst == VIP with an ac‐
2710 tion of ct_lb_mark(args), where args contains comma sepa‐
2711 rated IPv4 or IPv6 addresses. If the router is configured
2712 to force SNAT any load-balanced packets, the above action
2713 will be replaced by flags.force_snat_for_lb = 1;
2714 ct_lb_mark(args; force_snat);. If the load balancing rule
2715 is configured with skip_snat set to true, the above ac‐
2716 tion will be replaced by flags.skip_snat_for_lb = 1;
2717 ct_lb_mark(args; skip_snat);.
2718
2719 The previous table lr_in_defrag sets the register reg0
2720 (or xxreg0 for IPv6) and does ct_dnat. Hence for estab‐
2721 lished traffic, this table just advances the packet to
2722 the next stage.
2723
2724 • If the load balancer is created with the --reject option
2725 and it has no active backends, a TCP reset segment (for
2726 tcp) or an ICMP port unreachable packet (for all other
2727 kinds of traffic) will be sent whenever an incoming
2728 packet is received for this load balancer. Note that
2729 using the --reject option disables the empty_lb SB
2730 controller event for this load balancer.
2731
2732 • For related traffic, a priority 50 flow that matches
2733 ct.rel && !ct.est && !ct.new with an action of
2734 ct_commit_nat; is added if the router has a load balancer
2735 assigned to it, along with two priority 70 flows that
2736 match the skip_snat and force_snat flags, setting
2737 flags.force_snat_for_lb = 1 or flags.skip_snat_for_lb = 1 accordingly.
2738
2739 • For established traffic, a priority 50 flow that
2740 matches ct.est && !ct.rel && !ct.new && ct_mark.natted
2741 with an action of next; is added if the router has a
2742 load balancer assigned to it, along with two priority 70
2743 flows that match the skip_snat and force_snat flags,
2744 setting flags.force_snat_for_lb = 1 or
2745 flags.skip_snat_for_lb = 1 accordingly.
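
       Continuing the hypothetical lb0 example above, the priority-120 flow
       described here would appear roughly as follows in ovn-sbctl
       lflow-list output (abridged and illustrative; exact table numbers
       vary between OVN versions):

           table=.. (lr_in_dnat), priority=120,
             match=(ct.new && !ct.rel && ip && ip4.dst == 172.16.0.10 &&
                    tcp && tcp.dst == 80),
             action=(ct_lb_mark(backends=10.0.0.2:8080,10.0.0.3:8080);)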
2746
2747 Ingress Table 7: DNAT on Gateway Routers
2748
2749 • For each configuration in the OVN Northbound database,
2750 that asks to change the destination IP address of a
2751 packet from A to B, a priority-100 flow matches ip &&
2752 ip4.dst == A or ip && ip6.dst == A with an action
2753 flags.loopback = 1; ct_dnat(B);. If the Gateway router is
2754 configured to force SNAT any DNATed packet, the above ac‐
2755 tion will be replaced by flags.force_snat_for_dnat = 1;
2756 flags.loopback = 1; ct_dnat(B);. If the NAT rule is of
2757 type dnat_and_snat and has stateless=true in the options,
2758 then the action would be ip4/6.dst = (B).
2759
2760 If the NAT rule has allowed_ext_ips configured, then
2761 there is an additional match ip4.src == allowed_ext_ips.
2762 Similarly, for IPv6, the match would be ip6.src ==
2763 allowed_ext_ips.
2764
2765 If the NAT rule has exempted_ext_ips set, then there is
2766 an additional flow configured at priority 101. The flow
2767 matches if the source IP is an exempted_ext_ip and the
2768 action is next;. This flow is used to bypass the ct_dnat
2769 action for a packet originating from exempted_ext_ips.
2770
2771 • A priority-0 logical flow with match 1 has actions next;.
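
       As an illustration (router name and addresses hypothetical), such a
       DNAT configuration could be created on a Gateway router lr0 with:

           ovn-nbctl lr-nat-add lr0 dnat 172.16.0.100 10.0.0.5

       which corresponds to A = 172.16.0.100 and B = 10.0.0.5 above, i.e. a
       priority-100 flow with action flags.loopback = 1; ct_dnat(10.0.0.5);.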
2772
2773 Ingress Table 7: DNAT on Distributed Routers
2774
2775 On distributed routers, the DNAT table only handles packets with a des‐
2776 tination IP address that needs to be DNATted from a virtual IP address
2777 to a real IP address. The unDNAT processing in the reverse direction is
2778 handled in a separate table in the egress pipeline.
2779
2780 • For each configuration in the OVN Northbound database,
2781 that asks to change the destination IP address of a
2782 packet from A to B, a priority-100 flow matches ip &&
2783 ip4.dst == A && inport == GW, where GW is the logical
2784 router gateway port corresponding to the NAT rule (speci‐
2785 fied or inferred), with an action ct_dnat(B);. The match
2786 will include ip6.dst == A in the IPv6 case. If the NAT
2787 rule is of type dnat_and_snat and has stateless=true in
2788 the options, then the action would be ip4/6.dst=(B).
2789
2790 If the NAT rule cannot be handled in a distributed man‐
2791 ner, then the priority-100 flow above is only programmed
2792 on the gateway chassis.
2793
2794 If the NAT rule has allowed_ext_ips configured, then
2795 there is an additional match ip4.src == allowed_ext_ips.
2796 Similarly, for IPv6, the match would be ip6.src ==
2797 allowed_ext_ips.
2798
2799 If the NAT rule has exempted_ext_ips set, then there is
2800 an additional flow configured at priority 101. The flow
2801 matches if the source IP is an exempted_ext_ip and the
2802 action is next;. This flow is used to bypass the ct_dnat
2803 action for a packet originating from exempted_ext_ips.
2804
2805 A priority-0 logical flow with match 1 has actions next;.
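
       As a sketch of a NAT rule that can be handled in a distributed manner
       (names and addresses hypothetical), a dnat_and_snat entry can be
       given a logical port and external MAC so that it is processed on the
       chassis hosting that port:

           ovn-nbctl lr-nat-add lr0 dnat_and_snat 172.16.0.100 10.0.0.5 \
               lsp0 00:00:00:00:01:02

       Without the logical port and external MAC arguments, the rule cannot
       be handled in a distributed manner and the priority-100 flow above is
       programmed only on the gateway chassis.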
2806
2807 Ingress Table 8: Load balancing affinity learn
2808
2809 The load balancing affinity learn table contains the following logical
2810 flows:
2811
2812 • For all the configured load balancing rules for a logical
2813 router where a positive affinity timeout T is specified
2814 in the options column, that includes an L4 port PORT of
2815 protocol P and an IPv4 or IPv6 address VIP, a priority-100
2816 flow that matches on reg9[6] == 0 && ct.new && ip && reg0
2817 == VIP && P && reg9[16..31] == PORT (xxreg0 == VIP in the
2818 IPv6 case) with an action of commit_lb_aff(vip =
2819 VIP:PORT, backend = backend ip: backend port, proto = P,
2820 timeout = T);.
2822
2823 • A priority 0 flow is added which matches on all packets
2824 and applies the action next;.
2825
2826 Ingress Table 9: ECMP symmetric reply processing
2827
2828 • If ECMP routes with symmetric reply are configured in the
2829 OVN_Northbound database for a gateway router, a prior‐
2830 ity-100 flow is added for each router port on which sym‐
2831 metric replies are configured. The matching logic for
2832 these ports essentially reverses the configured logic of
2833 the ECMP route. So for instance, a route with a destina‐
2834 tion routing policy will instead match if the source IP
2835 address matches the static route’s prefix. The flow uses
2836 the action ct_commit { ct_label.ecmp_reply_eth = eth.src;
2837 ct_mark.ecmp_reply_port = K; }; commit_ecmp_nh(); next;
2838 to commit the connection, storing eth.src and the ECMP
2839 reply port binding tunnel key K in the ct_label and
2840 ct_mark, and recording the traffic pattern in table 76 or 77.
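
       As an illustration (addresses hypothetical), a pair of ECMP routes
       with symmetric reply handling could be configured with:

           ovn-nbctl --ecmp-symmetric-reply lr-route-add lr0 \
               10.10.0.0/24 172.16.1.1
           ovn-nbctl --ecmp-symmetric-reply lr-route-add lr0 \
               10.10.0.0/24 172.16.1.2

       Reply traffic for these routes then matches the priority-100 flows
       above, committing the connection with the source MAC and ECMP reply
       port recorded as described.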
2842
2843 Ingress Table 10: IPv6 ND RA option processing
2844
2845 • A priority-50 logical flow is added for each logical
2846 router port configured with IPv6 ND RA options. It
2847 matches IPv6 ND Router Solicitation packets, applies the
2848 action put_nd_ra_opts, and advances the packet to the
2849 next table.
2850
2851 reg0[5] = put_nd_ra_opts(options); next;
2852
2853
2854 For a valid IPv6 ND RS packet, this transforms the packet
2855 into an IPv6 ND RA reply and sets the RA options to the
2856 packet and stores 1 into reg0[5]. For other kinds of
2857 packets, it just stores 0 into reg0[5]. Either way, it
2858 continues to the next table.
2859
2860 • A priority-0 logical flow with match 1 has actions next;.
2861
2862 Ingress Table 11: IPv6 ND RA responder
2863
2864 This table implements IPv6 ND RA responder for the IPv6 ND RA replies
2865 generated by the previous table.
2866
2867 • A priority-50 logical flow is added for each logical
2868 router port configured with IPv6 ND RA options which
2869 matches IPv6 ND RA packets and reg0[5] == 1 and responds
2870 back to the inport after applying these actions. If
2871 reg0[5] is set to 1, it means that the action
2872 put_nd_ra_opts was successful.
2873
2874 eth.dst = eth.src;
2875 eth.src = E;
2876 ip6.dst = ip6.src;
2877 ip6.src = I;
2878 outport = P;
2879 flags.loopback = 1;
2880 output;
2881
2882
2883 where E is the MAC address and I is the IPv6 link local
2884 address of the logical router port.
2885
2886 (This terminates packet processing in ingress pipeline;
2887 the packet does not go to the next ingress table.)
2888
2889 • A priority-0 logical flow with match 1 has actions next;.
2890
2891 Ingress Table 12: IP Routing Pre
2892
2893 If a packet arrives at this table from a Logical Router Port P that has
2894 the options:route_table value set, a logical flow with match inport ==
2895 "P" and priority 100 is added, with an action that stores a unique
2896 per-datapath generated non-zero 32-bit value in OVS register 7. This
2897 register’s value is checked in the next table. If the packet didn’t
2898 match any configured inport (the <main> route table), the register 7
2899 value is set to 0. (See the configuration sketch after the list below.)
2900
2901 This table contains the following logical flows:
2902
2903 • A priority-100 flow with match inport == "LRP_NAME" and
2904 an action that sets the route table identifier in reg7.
2905
2906 • A priority-0 logical flow with match 1 has actions reg7 = 0; next;.
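
       As a configuration sketch (router, port, and route table names
       hypothetical), a route table can be attached to a logical router port
       and populated as follows:

           ovn-nbctl set logical_router_port lrp0 options:route_table=rtb-1
           ovn-nbctl --route-table=rtb-1 lr-route-add lr0 \
               192.168.100.0/24 172.16.1.1

       Packets entering through lrp0 then have the non-zero identifier for
       rtb-1 stored in reg7, so only routes in rtb-1 (plus connected routes)
       can match in the next table.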
2907
2908 Ingress Table 13: IP Routing
2909
2910 A packet that arrives at this table is an IP packet that should be
2911 routed to the address in ip4.dst or ip6.dst. This table implements IP
2912 routing, setting reg0 (or xxreg0 for IPv6) to the next-hop IP address
2913 (leaving ip4.dst or ip6.dst, the packet’s final destination, unchanged)
2914 and advances to the next table for ARP resolution. It also sets reg1
2915 (or xxreg1) to the IP address owned by the selected router port
2916 (ingress table ARP Request will generate an ARP request, if needed,
2917 with reg0 as the target protocol address and reg1 as the source proto‐
2918 col address).
2919
2920 For ECMP routes, i.e. multiple static routes with the same policy and
2921 prefix but different nexthops, the above actions are deferred to the
2922 next table. This table, instead, is responsible for determining the
2923 ECMP group id and selecting a member id within the group based on
2924 5-tuple hashing. It stores the group id in reg8[0..15] and the member
2925 id in reg8[16..31]. This step is skipped with a priority-10300 rule if
2926 the traffic going out the ECMP route is reply traffic and the ECMP
2927 route was configured to use symmetric replies. Instead, the values
2928 stored in conntrack are used to choose the destination. The
2929 ct_label.ecmp_reply_eth field tells the destination MAC address to
2930 which the packet should be sent. The ct_mark.ecmp_reply_port field
2931 tells the logical router port on which the packet should be sent.
2932 These values are saved to the conntrack fields when the initial
2933 ingress traffic is received over the ECMP route and committed to
2934 conntrack. If REGBIT_KNOWN_ECMP_NH is set, the priority-10300 flows in
2935 this stage set the outport, while the eth.dst is set by flows at the ARP/ND Resolution stage.
2936
2937 This table contains the following logical flows:
2938
2939 • Priority-10550 flow that drops IPv6 Router Solicita‐
2940 tion/Advertisement packets that were not processed in
2941 previous tables.
2942
2943 • Priority-10550 flows that drop IGMP and MLD packets with
2944 source MAC address owned by the router. These are used to
2945 prevent looping statically forwarded IGMP and MLD packets
2946 for which TTL is not decremented (it is always 1).
2947
2948 • Priority-10500 flows that match IP multicast traffic des‐
2949 tined to groups registered on any of the attached
2950 switches and set outport to the associated multicast
2951 group that will eventually flood the traffic to all in‐
2952 terested attached logical switches. The flows also decre‐
2953 ment TTL.
2954
2955 • Priority-10460 flows that match IGMP and MLD control
2956 packets, set outport to the MC_STATIC multicast group,
2957 which ovn-northd populates with the logical ports that
2958 have options:mcast_flood=’true’. If no router ports are
2959 configured to flood multicast traffic the packets are
2960 dropped.
2961
2962 • Priority-10450 flow that matches unregistered IP multi‐
2963 cast traffic, decrements TTL and sets outport to the
2964 MC_STATIC multicast group, which ovn-northd populates
2965 with the logical ports that have
2966 options:mcast_flood=’true’. If no router ports are configured to
2967 flood multicast traffic the packets are dropped.
2968
2969 • IPv4 routing table. For each route to IPv4 network N with
2970 netmask M, on router port P with IP address A and Ether‐
2971 net address E, a logical flow with match ip4.dst == N/M,
2972 whose priority is the number of 1-bits in M, has the fol‐
2973 lowing actions:
2974
2975 ip.ttl--;
2976 reg8[0..15] = 0;
2977 reg0 = G;
2978 reg1 = A;
2979 eth.src = E;
2980 outport = P;
2981 flags.loopback = 1;
2982 next;
2983
2984
2985 (Ingress table 1 already verified that ip.ttl--; will not
2986 yield a TTL exceeded error.)
2987
2988 If the route has a gateway, G is the gateway IP address;
2989 if the route is from a configured static route, G is the
2990 next hop IP address; otherwise it is ip4.dst. (An example follows this list.)
2991
2992 • IPv6 routing table. For each route to IPv6 network N with
2993 netmask M, on router port P with IP address A and Ether‐
2994 net address E, a logical flow with match in CIDR notation
2995 ip6.dst == N/M, whose priority is the integer value of M,
2996 has the following actions:
2997
2998 ip.ttl--;
2999 reg8[0..15] = 0;
3000 xxreg0 = G;
3001 xxreg1 = A;
3002 eth.src = E;
3003 outport = P;
3004 flags.loopback = 1;
3005 next;
3006
3007
3008 (Ingress table 1 already verified that ip.ttl--; will not
3009 yield a TTL exceeded error.)
3010
3011 If the route has a gateway, G is the gateway IP address;
3012 if the route is from a configured static route, G is the
3013 next hop IP address; otherwise it is ip6.dst.
3014
3015 If the address A is in the link-local scope, the route
3016 will be limited to sending on the ingress port.
3017
3018 For each static route, reg7 == id && is prefixed to the
3019 logical flow match. For routes with a route_table value
3020 set, a unique non-zero id is used. For routes within the
3021 <main> route table (no route table set), this id value is
3022 0.
3023
3024 For each connected route (a route to the LRP’s subnet
3025 CIDR), the logical flow match has no reg7 == id && prefix,
3026 so that routes to the LRP’s subnets appear in all routing tables.
3027
3028 • ECMP routes are grouped by policy and prefix. A unique
3029 non-zero id is assigned to each group, and each member
3030 of a group is also assigned a unique non-zero id within
3031 that group.
3032
3033 For each IPv4/IPv6 ECMP group with group id GID and mem‐
3034 ber ids MID1, MID2, ..., a logical flow with match in
3035 CIDR notation ip4.dst == N/M, or ip6.dst == N/M, whose
3036 priority is the integer value of M, has the following ac‐
3037 tions:
3038
3039 ip.ttl--;
3040 flags.loopback = 1;
3041 reg8[0..15] = GID;
3042 select(reg8[16..31], MID1, MID2, ...);
3043
3044
3045 • A priority-0 logical flow that matches all packets not
3046 already handled (match 1) and drops them (action drop;).
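
       As a hypothetical example of the priority scheme described above, a
       static route added with:

           ovn-nbctl lr-route-add lr0 10.10.0.0/24 172.16.1.1

       would produce a flow matching ip4.dst == 10.10.0.0/24, with a
       priority derived from the 24-bit mask, that loads the next hop G =
       172.16.1.1 into reg0.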
3047
3048 Ingress Table 14: IP_ROUTING_ECMP
3049
3050 This table implements the second part of IP routing for ECMP routes
3051 following the previous table. If a packet matched an ECMP group in the
3052 previous table, this table matches the group id and member id stored
3053 from the previous table, setting reg0 (or xxreg0 for IPv6) to the next-
3054 hop IP address (leaving ip4.dst or ip6.dst, the packet’s final destina‐
3055 tion, unchanged) and advances to the next table for ARP resolution. It
3056 also sets reg1 (or xxreg1) to the IP address owned by the selected
3057 router port (ingress table ARP Request will generate an ARP request, if
3058 needed, with reg0 as the target protocol address and reg1 as the source
3059 protocol address).
3060
3061 This processing is skipped for reply traffic being sent out of an ECMP
3062 route if the route was configured to use symmetric replies.
3063
3064 This table contains the following logical flows:
3065
3066 • A priority-150 flow that matches reg8[0..15] == 0 with
3067 action next;, so that packets of non-ECMP routes bypass
3068 this table directly.
3069
3070 • For each member with ID MID in each ECMP group with ID
3071 GID, a priority-100 flow with match reg8[0..15] == GID &&
3072 reg8[16..31] == MID has following actions:
3073
3074 [xx]reg0 = G;
3075 [xx]reg1 = A;
3076 eth.src = E;
3077 outport = P;
3078
3079
3080 • A priority-0 logical flow that matches all packets not
3081 already handled (match 1) and drops them (action drop;).
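
       An illustrative sketch of the two-stage ECMP handling, assuming a
       hypothetical group id 1 with two members for prefix 10.10.0.0/24:

           Previous table:
             ip4.dst == 10.10.0.0/24
               => reg8[0..15] = 1; select(reg8[16..31], 1, 2);
           This table:
             reg8[0..15] == 1 && reg8[16..31] == 1
               => reg0 = 172.16.1.1; reg1 = A; eth.src = E; outport = P;
             reg8[0..15] == 1 && reg8[16..31] == 2
               => reg0 = 172.16.1.2; reg1 = A; eth.src = E; outport = P;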
3082
3083 Ingress Table 15: Router policies
3084
3085 This table adds flows for the logical router policies configured on the
3086 logical router. Please see the OVN_Northbound database Logi‐
3087 cal_Router_Policy table documentation in ovn-nb for supported actions.
3088
3089 • For each router policy configured on the logical router,
3090 a logical flow is added with specified priority, match
3091 and actions.
3092
3093 • If the policy action is reroute with 2 or more nexthops
3094 defined, then the logical flow is added with the follow‐
3095 ing actions:
3096
3097 reg8[0..15] = GID;
3098 reg8[16..31] = select(1,..n);
3099
3100
3101 where GID is the ECMP group id generated by ovn-northd
3102 for this policy and n is the number of nexthops. The
3103 select action selects one of the nexthop member ids,
3104 stores it in the register reg8[16..31] and advances the
3105 packet to the next stage.
3106
3107 • If the policy action is reroute with just one nexthop,
3108 then the logical flow is added with the following ac‐
3109 tions:
3110
3111 [xx]reg0 = H;
3112 eth.src = E;
3113 outport = P;
3114 reg8[0..15] = 0;
3115 flags.loopback = 1;
3116 next;
3117
3118
3119 where H is the nexthop defined in the router policy, E
3120 is the ethernet address of the logical router port from
3121 which the nexthop is reachable and P is the logical
3122 router port from which the nexthop is reachable.
3123
3124 • If a router policy has the option pkt_mark=m set and if
3125 the action is not drop, then the action also includes
3126 pkt.mark = m to mark the packet with the marker m.
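
       As an illustration (names, matches, and nexthops hypothetical),
       policies such as those described above can be created with:

           ovn-nbctl lr-policy-add lr0 1000 "ip4.src == 10.0.0.0/24" \
               reroute 172.16.1.1,172.16.1.2
           ovn-nbctl lr-policy-add lr0 900 "ip4.src == 10.0.1.0/24" \
               allow pkt_mark=42

       The first policy takes the multi-nexthop reroute path (ECMP group id
       and select action), while the second has pkt.mark = 42 added to its
       actions.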
3127
3128 Ingress Table 16: ECMP handling for router policies
3129
3130 This table handles the ECMP for the router policies configured with
3131 multiple nexthops.
3132
3133 • A priority-150 flow is added to advance the packet to the
3134 next stage if the ECMP group id register reg8[0..15] is
3135 0.
3136
3137 • For each ECMP reroute router policy with multiple nex‐
3138 thops, a priority-100 flow is added for each nexthop H
3139 with the match reg8[0..15] == GID && reg8[16..31] == M
3140 where GID is the router policy group id generated by
3141 ovn-northd and M is the member id of the nexthop H gener‐
3142 ated by ovn-northd. The following actions are added to
3143 the flow:
3144
3145 [xx]reg0 = H;
3146 eth.src = E;
3147 outport = P
3148 "flags.loopback = 1; "
3149 "next;"
3150
3151
3152 where H is the nexthop defined in the router policy, E
3153 is the ethernet address of the logical router port from
3154 which the nexthop is reachable and P is the logical
3155 router port from which the nexthop is reachable.
3156
3157 • A priority-0 logical flow that matches all packets not
3158 already handled (match 1) and drops them (action drop;).
3159
3160 Ingress Table 17: ARP/ND Resolution
3161
3162 Any packet that reaches this table is an IP packet whose next-hop IPv4
3163 address is in reg0 or IPv6 address is in xxreg0. (ip4.dst or ip6.dst
3164 contains the final destination.) This table resolves the IP address in
3165 reg0 (or xxreg0) into an output port in outport and an Ethernet address
3166 in eth.dst, using the following flows:
3167
3168 • A priority-500 flow that matches IP multicast traffic
3169 that was allowed in the routing pipeline. For this kind
3170 of traffic the outport was already set so the flow just
3171 advances to the next table.
3172
3173 • Priority-200 flows that match ECMP reply traffic for the
3174 routes configured to use symmetric replies, with actions
3175 push(xxreg1); xxreg1 = ct_label; eth.dst =
3176 xxreg1[32..79]; pop(xxreg1); next;. xxreg1 is used here
3177 to avoid masked access to ct_label, to make the flow HW-
3178 offloading friendly.
3179
3180 • Static MAC bindings. MAC bindings can be known statically
3181 based on data in the OVN_Northbound database. For router
3182 ports connected to logical switches, MAC bindings can be
3183 known statically from the addresses column in the Logi‐
3184 cal_Switch_Port table. (Note: the flow is not installed
3185 for IPs of logical switch ports of type virtual, and dy‐
3186 namic MAC binding is used for those IPs instead, so that
3187 virtual parent failover does not depend on ovn-northd, to
3188 achieve better failover performance.) For router ports
3189 connected to other logical routers, MAC bindings can be
3190 known statically from the mac and networks column in the
3191 Logical_Router_Port table. (Note: the flow is NOT in‐
3192 stalled for the IP addresses that belong to a neighbor
3193 logical router port if the current router has the op‐
3194 tions:dynamic_neigh_routers set to true.)
3195
3196 For each IPv4 address A whose host is known to have Eth‐
3197 ernet address E on router port P, a priority-100 flow
3198 with match outport == P && reg0 == A has actions eth.dst
3199 = E; next;.
3200
3201 For each IPv6 address A whose host is known to have Eth‐
3202 ernet address E on router port P, a priority-100 flow
3203 with match outport == P && xxreg0 == A has actions
3204 eth.dst = E; next;.
3205
3206 For each logical router port with an IPv4 address A and a
3207 mac address of E that is reachable via a different logi‐
3208 cal router port P, a priority-100 flow with match outport
3209 == P && reg0 == A has actions eth.dst = E; next;.
3210
3211 For each logical router port with an IPv6 address A and a
3212 mac address of E that is reachable via a different logi‐
3213 cal router port P, a priority-100 flow with match outport
3214 == P && xxreg0 == A has actions eth.dst = E; next;.
3215
3216 • Static MAC bindings from NAT entries. MAC bindings can
3217 also be known for the entries in the NAT table. The below
3218 flows are programmed for distributed logical routers, i.e.,
3219 those with a distributed router port.
3220
3221 For each row in the NAT table with IPv4 address A in the
3222 external_ip column of the NAT table, the below two flows are pro‐
3223 grammed:
3224
3225 A priority-100 flow with the match outport == P && reg0
3226 == A has actions eth.dst = E; next;, where P is the dis‐
3227 tributed logical router port, E is the Ethernet address
3228 from the external_mac column of the NAT table (if set, for
3229 rules of type dnat_and_snat), otherwise the Ethernet address
3230 of the distributed logical router port. Note that if the exter‐
3231 nal_ip is not within a subnet on the owning logical
3232 router, then OVN will only create ARP resolution flows if
3233 the options:add_route is set to true. Otherwise, no ARP
3234 resolution flows will be added.
3235
3236 Corresponding to the above flow, a priority-150 flow with
3237 the match inport == P && outport == P && ip4.dst == A has
3238 actions drop; to exclude packets that have gone through
3239 DNAT/unSNAT stage but failed to convert the destination,
3240 to avoid a loop.
3241
3242 For IPv6 NAT entries, same flows are added, but using the
3243 register xxreg0 and field ip6 for the match.
3244
3245 • If the router datapath runs a port with redirect-type set
3246 to bridged, for each distributed NAT rule with IP A in
3247 the logical_ip column and logical port P in the logi‐
3248 cal_port column of NAT table, a priority-90 flow with the
3249 match outport == Q && ip.src == A && is_chassis_resi‐
3250 dent(P), where Q is the distributed logical router port
3251 and action get_arp(outport, reg0); next; for IPv4 and
3252 get_nd(outport, xxreg0); next; for IPv6.
3253
3254 • Traffic whose IP destination is an address owned by the
3255 router should be dropped. Such traffic is normally
3256 dropped in ingress table IP Input except for IPs that are
3257 also shared with SNAT rules. However, if there was no un‐
3258 SNAT operation that happened successfully until this
3259 point in the pipeline and the destination IP of the
3260 packet is still a router owned IP, the packets can be
3261 safely dropped.
3262
3263 A priority-2 logical flow with match ip4.dst == {..}
3264 matches on traffic destined to router-owned IPv4 ad‐
3265 dresses which are also SNAT IPs. This flow has action
3266 drop;.
3267
3268 A priority-2 logical flow with match ip6.dst == {..}
3269 matches on traffic destined to router-owned IPv6 ad‐
3270 dresses which are also SNAT IPs. This flow has action
3271 drop;.
3272
3273 A priority-0 logical flow that matches all packets not
3274 already handled (match 1) and drops them (action drop;).
3275
3276 • Dynamic MAC bindings. These flows resolve MAC-to-IP bind‐
3277 ings that have become known dynamically through ARP or
3278 neighbor discovery. (The ingress table ARP Request will
3279 issue an ARP or neighbor solicitation request for cases
3280 where the binding is not yet known.)
3281
3282 A priority-0 logical flow with match ip4 has actions
3283 get_arp(outport, reg0); next;.
3284
3285 A priority-0 logical flow with match ip6 has actions
3286 get_nd(outport, xxreg0); next;.
3287
3288 • For a distributed gateway LRP with redirect-type set to
3289 bridged, a priority-50 flow matches outport ==
3290 "ROUTER_PORT" && !is_chassis_resident("cr-ROUTER_PORT")
3291 and has actions eth.dst = E; next;, where E is the
3292 ethernet address of the logical router port.
3293
3294 Ingress Table 18: Check packet length
3295
3296 For distributed logical routers or gateway routers with a gateway port
3297 whose options:gateway_mtu is set to a valid integer value, this table
3298 adds a priority-50 logical flow with the match outport == GW_PORT,
3299 where GW_PORT is the gateway router port, that applies the action
3300 check_pkt_larger and advances the packet to the next table.
3301
3302 REGBIT_PKT_LARGER = check_pkt_larger(L); next;
3303
3304
3305 where L is the packet length to check for. If the packet is larger than
3306 L, it stores 1 in the register bit REGBIT_PKT_LARGER. The value of L is
3307 taken from the options:gateway_mtu column of the Logical_Router_Port row.
3308
3309 If the port is also configured with options:gateway_mtu_bypass then an‐
3310 other flow is added, with priority 55, to bypass the check_pkt_larger
3311 flow.
3312
3313 This table adds one priority-0 fallback flow that matches all packets
3314 and advances to the next table.
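
       As a sketch (port name and MTU value hypothetical), the check can be
       enabled on a gateway port with:

           ovn-nbctl set logical_router_port lrp-gw options:gateway_mtu=1442

       after which ovn-northd emits the REGBIT_PKT_LARGER =
       check_pkt_larger(L); next; flow above, with L derived from the
       configured value of 1442.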
3315
3316 Ingress Table 19: Handle larger packets
3317
3318 For distributed logical routers or gateway routers with a gateway port
3319 whose options:gateway_mtu is set to a valid integer value, this table
3320 adds the following priority-150 logical flow for each logical
3321 router port with the match inport == LRP && outport == GW_PORT && REG‐
3322 BIT_PKT_LARGER && !REGBIT_EGRESS_LOOPBACK, where LRP is the logical
3323 router port and GW_PORT is the gateway port, and applies the following
3324 action for IPv4 and IPv6 respectively:
3325
3326 icmp4 {
3327 icmp4.type = 3; /* Destination Unreachable. */
3328 icmp4.code = 4; /* Frag Needed and DF was Set. */
3329 icmp4.frag_mtu = M;
3330 eth.dst = E;
3331 ip4.dst = ip4.src;
3332 ip4.src = I;
3333 ip.ttl = 255;
3334 REGBIT_EGRESS_LOOPBACK = 1;
3335 REGBIT_PKT_LARGER = 0;
3336 next(pipeline=ingress, table=0);
3337 };
3338 icmp6 {
3339 icmp6.type = 2;
3340 icmp6.code = 0;
3341 icmp6.frag_mtu = M;
3342 eth.dst = E;
3343 ip6.dst = ip6.src;
3344 ip6.src = I;
3345 ip.ttl = 255;
3346 REGBIT_EGRESS_LOOPBACK = 1;
3347 REGBIT_PKT_LARGER = 0;
3348 next(pipeline=ingress, table=0);
3349 };
3350
3351
3352 • Where M is the (fragment MTU - 58) whose value is taken
3353 from options:gateway_mtu column of Logical_Router_Port
3354 row.
3355
3356 • E is the Ethernet address of the logical router port.
3357
3358 • I is the IPv4/IPv6 address of the logical router port.
3359
3360 This table adds one priority-0 fallback flow that matches all packets
3361 and advances to the next table.
3362
3363 Ingress Table 20: Gateway Redirect
3364
3365 For distributed logical routers where one or more of the logical router
3366 ports specifies a gateway chassis, this table redirects certain packets
3367 to the distributed gateway port instances on the gateway chassis.
3368 This table has the following flows:
3369
3370 • For all the configured load balancing rules that include
3371 an IPv4 address VIP, and a list of IPv4 backend addresses
3372 B0, B1 .. Bn defined for the VIP, a priority-200 flow is
3373 added that matches ip4 && (ip4.src == B0 || ip4.src == B1
3374 || ... || ip4.src == Bn) with an action outport = CR;
3375 next; where CR is the chassisredirect port representing
3376 the instance of the logical router distributed gateway
3377 port on the gateway chassis. If the backend IPv4 address
3378 Bx is also configured with L4 port PORT of protocol P,
3379 then the match also includes P.src == PORT. Similar flows
3380 are added for IPv6.
3381
3382 • For each NAT rule in the OVN Northbound database that can
3383 be handled in a distributed manner, a priority-100 logi‐
3384 cal flow is added with match ip4.src == B && outport ==
3385 GW && is_chassis_resident(P), where GW is the distributed
3386 gateway port specified in the NAT rule and P is the NAT
3387 logical port. IP traffic matching the above rule is man‐
3388 aged locally, setting reg1 to C and eth.src to D, where C
3389 is the NAT external IP and D is the NAT external MAC.
3390
3391 • For each dnat_and_snat NAT rule with stateless=true and
3392 allowed_ext_ips configured, a priority-75 flow is pro‐
3393 grammed with match ip4.dst == B and action outport = CR;
3394 next; where B is the NAT rule external IP and CR is the
3395 chassisredirect port representing the instance of the
3396 logical router distributed gateway port on the gateway
3397 chassis. Moreover, a priority-70 flow is programmed with
3398 the same match and action drop;. For each dnat_and_snat NAT
3399 rule with stateless=true and exempted_ext_ips configured,
3400 a priority-75 flow is programmed with match ip4.dst == B
3401 and action drop; where B is the NAT rule external IP. A
3402 similar flow is added for IPv6 traffic.
3403
3404 • For each NAT rule in the OVN Northbound database that can
3405 be handled in a distributed manner, a priority-80 logical
3406 flow with action drop; is added if the NAT logical port
3407 is a virtual port not claimed by any chassis yet.
3408
3409 • A priority-50 logical flow with match outport == GW has
3410 actions outport = CR; next;, where GW is the logical
3411 router distributed gateway port and CR is the chas‐
3412 sisredirect port representing the instance of the logical
3413 router distributed gateway port on the gateway chassis.
3414
3415 • A priority-0 logical flow with match 1 has actions next;.
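
       As an illustration (names hypothetical), a logical router port
       becomes a distributed gateway port by assigning it one or more
       gateway chassis with priorities:

           ovn-nbctl lrp-set-gateway-chassis lrp-gw chassis-1 20
           ovn-nbctl lrp-set-gateway-chassis lrp-gw chassis-2 10

       The corresponding chassisredirect port is named cr-lrp-gw, and the
       priority-50 flow above sets outport = "cr-lrp-gw" so that matching
       traffic is handled on the active gateway chassis.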
3416
3417 Ingress Table 21: ARP Request
3418
3419 In the common case where the Ethernet destination has been resolved,
3420 this table outputs the packet. Otherwise, it composes and sends an ARP
3421 or IPv6 Neighbor Solicitation request. It holds the following flows:
3422
3423 • Unknown MAC address. A priority-100 flow for IPv4 packets
3424 with match eth.dst == 00:00:00:00:00:00 has the following
3425 actions:
3426
3427 arp {
3428 eth.dst = ff:ff:ff:ff:ff:ff;
3429 arp.spa = reg1;
3430 arp.tpa = reg0;
3431 arp.op = 1; /* ARP request. */
3432 output;
3433 };
3434
3435
3436 Unknown MAC address. For each IPv6 static route associ‐
3437 ated with the router with the nexthop IP: G, a prior‐
3438 ity-200 flow for IPv6 packets with match eth.dst ==
3439 00:00:00:00:00:00 && xxreg0 == G with the following ac‐
3440 tions is added:
3441
3442 nd_ns {
3443 eth.dst = E;
3444 ip6.dst = I;
3445 nd.target = G;
3446 output;
3447 };
3448
3449
3450 Where E is the multicast mac derived from the Gateway IP,
3451 I is the solicited-node multicast address corresponding
3452 to the target address G.
3453
3454 Unknown MAC address. A priority-100 flow for IPv6 packets
3455 with match eth.dst == 00:00:00:00:00:00 has the following
3456 actions:
3457
3458 nd_ns {
3459 nd.target = xxreg0;
3460 output;
3461 };
3462
3463
3464 (Ingress table IP Routing initialized reg1 with the IP
3465 address owned by outport and (xx)reg0 with the next-hop
3466 IP address.)
3467
3468 The IP packet that triggers the ARP/IPv6 NS request is
3469 dropped.
3470
3471 • Known MAC address. A priority-0 flow with match 1 has ac‐
3472 tions output;.
3473
3474 Egress Table 0: Check DNAT local
3475
3476 This table checks if the packet needs to be DNATed in the router
3477 ingress table lr_in_dnat after it is SNATed and looped back to the
3478 ingress pipeline. This check is done only for routers configured with
3479 distributed gateway ports and NAT entries. This check is done so that
3480 SNAT and DNAT are done in different zones instead of a common zone.
3481
3482 • A priority-0 logical flow with match 1 has actions REG‐
3483 BIT_DST_NAT_IP_LOCAL = 0; next;.
3484
3485 Egress Table 1: UNDNAT
3486
3487 This is for the reverse traffic of already established connections;
3488 i.e., DNAT has already been done in the ingress pipeline and now the
3489 packet has entered the egress pipeline as part of a reply. This traffic
3490 is unDNATted here.
3491
3492 • A priority-0 logical flow with match 1 has actions next;.
3493
3494 Egress Table 1: UNDNAT on Gateway Routers
3495
3496 • For IPv6 Neighbor Discovery or Router Solicitation/Adver‐
3497 tisement traffic, a priority-100 flow with action next;.
3498
3499 • For all IP packets, a priority-50 flow with an action
3500 flags.loopback = 1; ct_dnat;.
3501
3502 Egress Table 1: UNDNAT on Distributed Routers
3503
3504 • For all the configured load balancing rules for a router
3505 with gateway port in OVN_Northbound database that in‐
3506 cludes an IPv4 address VIP, for every backend IPv4 ad‐
3507 dress B defined for the VIP, a priority-120 flow is pro‐
3508 grammed on the gateway chassis that matches ip && ip4.src ==
3509 B && outport == GW, where GW is the logical router gate‐
3510 way port with an action ct_dnat;. If the backend IPv4 ad‐
3511 dress B is also configured with L4 port PORT of protocol
3512 P, then the match also includes P.src == PORT. These
3513 flows are not added for load balancers with IPv6 VIPs.
3514
3515 If the router is configured to force SNAT any load-bal‐
3516 anced packets, the above action will be replaced by
3517 flags.force_snat_for_lb = 1; ct_dnat;.
3518
3519 • For each configuration in the OVN Northbound database
3520 that asks to change the destination IP address of a
3521 packet from an IP address of A to B, a priority-100 flow
3522 matches ip && ip4.src == B && outport == GW, where GW is
3523 the logical router gateway port, with an action ct_dnat;.
3524 If the NAT rule is of type dnat_and_snat and has state‐
3525 less=true in the options, then the action would be next;.
3526
3527 If the NAT rule cannot be handled in a distributed man‐
3528 ner, then the priority-100 flow above is only programmed
3529 on the gateway chassis with the action ct_dnat.
3530
3531 If the NAT rule can be handled in a distributed manner,
3532 then there is an additional action eth.src = EA;, where
3533 EA is the ethernet address associated with the IP address
3534 A in the NAT rule. This allows upstream MAC learning to
3535 point to the correct chassis.
3536
3537 Egress Table 2: Post UNDNAT
3538
3539 • A priority-50 logical flow is added that commits any un‐
3540 tracked flows from the previous table lr_out_undnat for
3541 Gateway routers. This flow matches on ct.new && ip with
3542 action ct_commit { }; next;.
3543
3544 • A priority-0 logical flow with match 1 has actions next;.
3545
3546 Egress Table 3: SNAT
3547
3548 Packets that are configured to be SNATed get their source IP address
3549 changed based on the configuration in the OVN Northbound database.
3550
3551 • A priority-120 flow to advance the IPv6 Neighbor Solici‐
3552 tation packet to the next table to skip SNAT. In the case
3553 where ovn-controller injects an IPv6 Neighbor Solicita‐
3554 tion packet (for the nd_ns action) we don’t want the
3555 packet to go through conntrack.
3556
3557 Egress Table 3: SNAT on Gateway Routers
3558
3559 • If the Gateway router in the OVN Northbound database has
3560 been configured to force SNAT a packet (that has been
3561 previously DNATted) to B, a priority-100 flow matches
3562 flags.force_snat_for_dnat == 1 && ip with an action
3563 ct_snat(B);.
3564
3565 • If a load balancer configured to skip snat has been ap‐
3566 plied to the Gateway router pipeline, a priority-120 flow
3567 matches flags.skip_snat_for_lb == 1 && ip with an action
3568 next;.
3569
3570 • If the Gateway router in the OVN Northbound database has
3571 been configured to force SNAT a packet (that has been
3572 previously load-balanced) using the router IP (i.e., op‐
3573 tions:lb_force_snat_ip=router_ip), then for each logical
3574 router port P attached to the Gateway router, a prior‐
3575 ity-110 flow matches flags.force_snat_for_lb == 1 && out‐
3576 port == P
3577 with an action ct_snat(R); where R is the IP configured
3578 on the router port. If R is an IPv4 address then the
3579 match will also include ip4 and if it is an IPv6 address,
3580 then the match will also include ip6.
3581
3582 If the logical router port P is configured with multiple
3583 IPv4 and multiple IPv6 addresses, only the first IPv4 and
3584 first IPv6 address is considered.
3585
3586 • If the Gateway router in the OVN Northbound database has
3587 been configured to force SNAT a packet (that has been
3588 previously load-balanced) to B, a priority-100 flow
3589 matches flags.force_snat_for_lb == 1 && ip with an action
3590 ct_snat(B);.
3591
3592 • For each configuration in the OVN Northbound database,
3593 that asks to change the source IP address of a packet
3594 from an IP address of A or to change the source IP ad‐
3595 dress of a packet that belongs to network A to B, a flow
3596 matches ip && ip4.src == A && (!ct.trk || !ct.rpl) with
3597 an action ct_snat(B);. The priority of the flow is calcu‐
3598 lated based on the mask of A, with matches having larger
3599 masks getting higher priorities. If the NAT rule is of
3600 type dnat_and_snat and has stateless=true in the options,
3601 then the action would be ip4/6.src = (B).
3602
3603 • If the NAT rule has allowed_ext_ips configured, then
3604 there is an additional match ip4.dst == allowed_ext_ips.
3605 Similarly, for IPv6, the match would be ip6.dst ==
3606 allowed_ext_ips.
3607
3608 • If the NAT rule has exempted_ext_ips set, then there is
3609 an additional flow configured at priority + 1 of the cor‐
3610 responding NAT rule. The flow matches if the destination
3611 IP is an exempted_ext_ip and the action is next;. This
3612 flow is used to bypass the ct_snat action for a packet
3613 which is destined to exempted_ext_ips.
3614
3615 • A priority-0 logical flow with match 1 has actions next;.
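
       As an illustration (addresses hypothetical), an SNAT rule of the kind
       described above could be added with:

           ovn-nbctl lr-nat-add lr0 snat 172.16.0.1 10.0.0.0/24

       which corresponds to A = 10.0.0.0/24 and B = 172.16.0.1, i.e. a flow
       matching ip && ip4.src == 10.0.0.0/24 && (!ct.trk || !ct.rpl) with
       action ct_snat(172.16.0.1);.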
3616
3617 Egress Table 3: SNAT on Distributed Routers
3618
3619 • For each configuration in the OVN Northbound database,
3620 that asks to change the source IP address of a packet
3621 from an IP address of A or to change the source IP ad‐
3622 dress of a packet that belongs to network A to B, two
3623 flows are added. The priority P of these flows is calcu‐
3624 lated based on the mask of A, with matches having larger
3625 masks getting higher priorities.
3626
3627 If the NAT rule cannot be handled in a distributed man‐
3628 ner, then the below flows are only programmed on the
3629 gateway chassis, increasing the flow priority by 128 so
3630 that they are evaluated first.
3631
3632 • The first flow is added with the calculated prior‐
3633 ity P and match ip && ip4.src == A && outport ==
3634 GW, where GW is the logical router gateway port,
3635 with an action ct_snat(B); so the packet is SNATed
3636 in the common zone. If the NAT rule is of type
3637 dnat_and_snat and has stateless=true in the options,
3638 then the action would be ip4/6.src = (B).
3639
3640 If the NAT rule can be handled in a distributed manner,
3641 then there is an additional action (for both the flows)
3642 eth.src = EA;, where EA is the ethernet address associ‐
3643 ated with the IP address A in the NAT rule. This allows
3644 upstream MAC learning to point to the correct chassis.
3645
3646 If the NAT rule has allowed_ext_ips configured, then
3647 there is an additional match ip4.dst == allowed_ext_ips.
3648 Similarly, for IPv6, the match would be ip6.dst ==
3649 allowed_ext_ips.
3650
3651 If the NAT rule has exempted_ext_ips set, then there is
3652 an additional flow configured at the priority P + 2 of
3653 corresponding NAT rule. The flow matches if the destina‐
3654 tion IP is an exempted_ext_ip and the action is next;.
3655 This flow is used to bypass the ct_snat action for a
3656 flow which is destined to exempted_ext_ips.
3657
3658 • A priority-0 logical flow with match 1 has actions next;.
3659
3660 Egress Table 4: Post SNAT
3661
3662 Packets reaching this table are processed according to the flows below:
3663
3664 • A priority-0 logical flow that matches all packets not
3665 already handled (match 1) and applies the action next;.
3666
3667 Egress Table 5: Egress Loopback
3668
3669 This table applies to distributed logical routers where one of the log‐
3670 ical router ports specifies a gateway chassis.
3671
3672 While UNDNAT and SNAT processing have already occurred by this point,
3673 this traffic needs to be forced through egress loopback on this dis‐
3674 tributed gateway port instance, in order for UNSNAT and DNAT processing
3675 to be applied, and also for IP routing and ARP resolution after all of
3676 the NAT processing, so that the packet can be forwarded to the destina‐
3677 tion.
3678
3679 This table has the following flows:
3680
3681 • For each NAT rule in the OVN Northbound database on a
3682 distributed router, a priority-100 logical flow is added
3683 with match ip4.dst == E && outport == GW && is_chas‐
3684 sis_resident(P), where E is the external IP address spec‐
3685 ified in the NAT rule and GW is the distributed gateway
3686 port corresponding to the NAT rule (specified or in‐
3687 ferred). For a dnat_and_snat NAT rule, P is the logical
3688 port specified in the NAT rule. If the logical_port col‐
3689 umn of the NAT table is NOT set, then P is the chassisre‐
3690 direct port of GW. The flow has the following actions:
3691
3692 clone {
3693 ct_clear;
3694 inport = outport;
3695 outport = "";
3696 flags = 0;
3697 flags.loopback = 1;
3698 reg0 = 0;
3699 reg1 = 0;
3700 ...
3701 reg9 = 0;
3702 REGBIT_EGRESS_LOOPBACK = 1;
3703 next(pipeline=ingress, table=0);
3704 };
3705
3706
3707 flags.loopback is set since in_port is unchanged and the
3708 packet may return to that port after NAT processing.
3709 REGBIT_EGRESS_LOOPBACK is set to indicate that egress
3710 loopback has occurred, in order to skip the source IP ad‐
3711 dress check against the router address.
3712
3713 • A priority-0 logical flow with match 1 has actions next;.
3714
3715 Egress Table 6: Delivery
3716
3717 Packets that reach this table are ready for delivery. It contains:
3718
3719 • Priority-110 logical flows that match IP multicast pack‐
3720 ets on each enabled logical router port and modify the
3721 Ethernet source address of the packets to the Ethernet
3722 address of the port and then execute action output;.
3723
3724 • Priority-100 logical flows that match packets on each en‐
3725 abled logical router port, with action output;.
3726
3727 • A priority-0 logical flow that matches all packets not
3728 already handled (match 1) and drops them (action drop;).
3729
3730 DROP SAMPLING
3731 As described in the previous section, there are several places where
3732 ovn-northd might decide to drop a packet by explicitly creating a Log‐
3733 ical_Flow with the drop; action.
3734
3735 When debug drop-sampling has been configured in the OVN Northbound
3736 database, ovn-northd will replace all the drop; actions with a sam‐
3737 ple(priority=65535, collector_set=id, obs_domain=obs_id,
3738 obs_point=@cookie) action, where:
3739
3740 • id is the value of the debug_drop_collector_set option
3741 configured in the OVN Northbound database.
3742
3743 • obs_id has its 8 most significant bits equal to the
3744 value of the debug_drop_domain_id option in the OVN
3745 Northbound database and its 24 least significant bits
3746 equal to the datapath’s tunnel key.
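
       As a configuration sketch (the collector set and domain id values are
       hypothetical; the option names are those described above), drop
       sampling could be enabled with:

           ovn-nbctl set NB_Global . options:debug_drop_collector_set=1
           ovn-nbctl set NB_Global . options:debug_drop_domain_id=42

       Flows that would otherwise use drop; then use the sample(...) action
       described above, with collector_set=1 and the 8 most significant bits
       of obs_domain equal to 42.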
3747
3748
3749
3750OVN 23.09.2 ovn-northd ovn-northd(8)