ovn-northd(8)                     OVN Manual                     ovn-northd(8)

NAME

       ovn-northd and ovn-northd-ddlog - Open Virtual Network central
       control daemon

SYNOPSIS

       ovn-northd [options]

DESCRIPTION

       ovn-northd is a centralized daemon responsible for translating the
       high-level OVN configuration into logical configuration consumable
       by daemons such as ovn-controller. It translates the logical network
       configuration in terms of conventional network concepts, taken from
       the OVN Northbound Database (see ovn-nb(5)), into logical datapath
       flows in the OVN Southbound Database (see ovn-sb(5)) below it.

       ovn-northd is implemented in C. ovn-northd-ddlog is a compatible
       implementation written in DDlog, a language for incremental database
       processing. This documentation applies to both implementations, with
       differences indicated where relevant.

OPTIONS

       --ovnnb-db=database
              The OVSDB database containing the OVN Northbound Database. If
              the OVN_NB_DB environment variable is set, its value is used
              as the default. Otherwise, the default is unix:/ovnnb_db.sock.

       --ovnsb-db=database
              The OVSDB database containing the OVN Southbound Database. If
              the OVN_SB_DB environment variable is set, its value is used
              as the default. Otherwise, the default is unix:/ovnsb_db.sock.

       --ddlog-record=file
              This option is for ovn-northd-ddlog only. It causes the
              daemon to record the initial database state and later changes
              to file in the text-based DDlog command format. The
              ovn_northd_cli program can later replay these changes for
              debugging purposes. This option has a performance impact. See
              debugging-ddlog.rst in the OVN documentation for more
              details.

       --dry-run
              Causes ovn-northd to start paused. In the paused state,
              ovn-northd does not apply any changes to the databases,
              although it continues to monitor them. For more information,
              see the pause command, under Runtime Management Commands
              below.

              For ovn-northd-ddlog, one could use this option with
              --ddlog-record to generate a replay log without restarting a
              process or disturbing a running system.

       --n-threads N
              In certain situations, it may be desirable to enable
              parallelization on a system to decrease latency (at the
              potential cost of increasing CPU usage).

              This option causes ovn-northd to use N threads when building
              logical flows, when N is within [2-256]. If N is 1,
              parallelization is disabled (the default behavior). If N is
              less than 1, then N is set to 1, parallelization is disabled,
              and a warning is logged. If N is more than 256, then N is set
              to 256, parallelization is enabled (with 256 threads), and a
              warning is logged.

              ovn-northd-ddlog does not support this option.

       database in the above options must be an OVSDB active or passive
       connection method, as described in ovsdb(7).
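
       For example, a deployment might point ovn-northd at database servers
       on another host; the addresses and ports below are illustrative:

              ovn-northd --ovnnb-db=tcp:192.168.0.1:6641 \
                  --ovnsb-db=tcp:192.168.0.1:6642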

   Daemon Options
       --pidfile[=pidfile]
              Causes a file (by default, program.pid) to be created
              indicating the PID of the running process. If the pidfile
              argument is not specified, or if it does not begin with /,
              then it is created in .

              If --pidfile is not specified, no pidfile is created.

       --overwrite-pidfile
              By default, when --pidfile is specified and the specified
              pidfile already exists and is locked by a running process,
              the daemon refuses to start. Specify --overwrite-pidfile to
              cause it to instead overwrite the pidfile.

              When --pidfile is not specified, this option has no effect.

       --detach
              Runs this program as a background process. The process forks,
              and in the child it starts a new session, closes the standard
              file descriptors (which has the side effect of disabling
              logging to the console), and changes its current directory to
              the root (unless --no-chdir is specified). After the child
              completes its initialization, the parent exits.

       --monitor
              Creates an additional process to monitor this program. If it
              dies due to a signal that indicates a programming error
              (SIGABRT, SIGALRM, SIGBUS, SIGFPE, SIGILL, SIGPIPE, SIGSEGV,
              SIGXCPU, or SIGXFSZ) then the monitor process starts a new
              copy of it. If the daemon dies or exits for another reason,
              the monitor process exits.

              This option is normally used with --detach, but it also
              functions without it.
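
       For example, to run ovn-northd as a supervised background daemon
       with a pidfile and a log file (the log file path is illustrative):

              ovn-northd --detach --monitor --pidfile \
                  --log-file=/var/log/ovn/ovn-northd.log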

       --no-chdir
              By default, when --detach is specified, the daemon changes
              its current working directory to the root directory after it
              detaches. Otherwise, invoking the daemon from a carelessly
              chosen directory would prevent the administrator from
              unmounting the file system that holds that directory.

              Specifying --no-chdir suppresses this behavior, preventing
              the daemon from changing its current working directory. This
              may be useful for collecting core files, since it is common
              behavior to write core dumps into the current working
              directory and the root directory is not a good directory to
              use.

              This option has no effect when --detach is not specified.

       --no-self-confinement
              By default this daemon will try to self-confine itself to
              work with files under well-known directories determined at
              build time. It is better to stick with this default behavior
              and not to use this flag unless some other access control is
              used to confine the daemon. Note that in contrast to other
              access control implementations that are typically enforced
              from kernel space (e.g. DAC or MAC), self-confinement is
              imposed by the user-space daemon itself and hence should not
              be considered a full confinement strategy, but instead should
              be viewed as an additional layer of security.

       --user=user:group
              Causes this program to run as a different user specified in
              user:group, thus dropping most of the root privileges. Short
              forms user and :group are also allowed, with the current user
              or group assumed, respectively. Only daemons started by the
              root user accept this argument.

              On Linux, daemons will be granted CAP_IPC_LOCK and
              CAP_NET_BIND_SERVICE before dropping root privileges. Daemons
              that interact with a datapath, such as ovs-vswitchd, will be
              granted three additional capabilities, namely CAP_NET_ADMIN,
              CAP_NET_BROADCAST and CAP_NET_RAW. The capability change will
              apply even if the new user is root.

              On Windows, this option is not currently supported. For
              security reasons, specifying this option will cause the
              daemon process not to start.

   Logging Options
       -v[spec]
       --verbose=[spec]
            Sets logging levels. Without any spec, sets the log level for
            every module and destination to dbg. Otherwise, spec is a list
            of words separated by spaces or commas or colons, up to one
            from each category below:

            •      A valid module name, as displayed by the vlog/list
                   command on ovs-appctl(8), limits the log level change to
                   the specified module.

            •      syslog, console, or file, to limit the log level change
                   to the system log, to the console, or to a file,
                   respectively. (If --detach is specified, the daemon
                   closes its standard file descriptors, so logging to the
                   console will have no effect.)

                   On the Windows platform, syslog is accepted as a word
                   and is only useful along with the --syslog-target option
                   (the word has no effect otherwise).

            •      off, emer, err, warn, info, or dbg, to control the log
                   level. Messages of the given severity or higher will be
                   logged, and messages of lower severity will be filtered
                   out. off filters out all messages. See ovs-appctl(8) for
                   a definition of each log level.

            Case is not significant within spec.

            Regardless of the log levels set for file, logging to a file
            will not take place unless --log-file is also specified (see
            below).

            For compatibility with older versions of OVS, any is accepted
            as a word but has no effect.
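
       For example, to log debug messages from every module to a file while
       keeping the console at warnings only (a representative combination):

              ovn-northd --log-file -vfile:dbg -vconsole:warn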

       -v
       --verbose
            Sets the maximum logging verbosity level, equivalent to
            --verbose=dbg.

       -vPATTERN:destination:pattern
       --verbose=PATTERN:destination:pattern
            Sets the log pattern for destination to pattern. Refer to
            ovs-appctl(8) for a description of the valid syntax for
            pattern.

       -vFACILITY:facility
       --verbose=FACILITY:facility
            Sets the RFC5424 facility of the log message. facility can be
            one of kern, user, mail, daemon, auth, syslog, lpr, news, uucp,
            clock, ftp, ntp, audit, alert, clock2, local0, local1, local2,
            local3, local4, local5, local6 or local7. If this option is not
            specified, daemon is used as the default for the local system
            syslog and local0 is used while sending a message to the target
            provided via the --syslog-target option.

       --log-file[=file]
            Enables logging to a file. If file is specified, then it is
            used as the exact name for the log file. The default log file
            name used if file is omitted is /var/log/ovn/program.log.

       --syslog-target=host:port
            Send syslog messages to UDP port on host, in addition to the
            system syslog. The host must be a numerical IP address, not a
            hostname.

       --syslog-method=method
            Specify method as how syslog messages should be sent to the
            syslog daemon. The following forms are supported:

            •      libc, to use the libc syslog() function. The downside of
                   using this option is that libc adds a fixed prefix to
                   every message before it is actually sent to the syslog
                   daemon over the /dev/log UNIX domain socket.

            •      unix:file, to use a UNIX domain socket directly. It is
                   possible to specify an arbitrary message format with
                   this option. However, rsyslogd 8.9 and older versions
                   use a hard-coded parser function anyway that limits UNIX
                   domain socket use. If you want to use an arbitrary
                   message format with older rsyslogd versions, then use a
                   UDP socket to a localhost IP address instead.

            •      udp:ip:port, to use a UDP socket. With this method it is
                   possible to use an arbitrary message format also with
                   older rsyslogd. When sending syslog messages over a UDP
                   socket, extra precautions need to be taken: for example,
                   the syslog daemon needs to be configured to listen on
                   the specified UDP port, accidental iptables rules could
                   be interfering with local syslog traffic, and there are
                   some security considerations that apply to UDP sockets
                   but do not apply to UNIX domain sockets.

            •      null, to discard all messages logged to syslog.

            The default is taken from the OVS_SYSLOG_METHOD environment
            variable; if it is unset, the default is libc.

   PKI Options
       PKI configuration is required in order to use SSL for the
       connections to the Northbound and Southbound databases.

              -p privkey.pem
              --private-key=privkey.pem
                   Specifies a PEM file containing the private key used as
                   identity for outgoing SSL connections.

              -c cert.pem
              --certificate=cert.pem
                   Specifies a PEM file containing a certificate that
                   certifies the private key specified on -p or
                   --private-key to be trustworthy. The certificate must be
                   signed by the certificate authority (CA) that the peer
                   in SSL connections will use to verify it.

              -C cacert.pem
              --ca-cert=cacert.pem
                   Specifies a PEM file containing the CA certificate for
                   verifying certificates presented to this program by SSL
                   peers. (This may be the same certificate that SSL peers
                   use to verify the certificate specified on -c or
                   --certificate, or it may be a different one, depending
                   on the PKI design in use.)

              -C none
              --ca-cert=none
                   Disables verification of certificates presented by SSL
                   peers. This introduces a security risk, because it means
                   that certificates cannot be verified to be those of
                   known trusted hosts.
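
       For example, to connect to the databases over SSL (addresses and
       file names are illustrative):

              ovn-northd --ovnnb-db=ssl:192.168.0.1:6641 \
                  --ovnsb-db=ssl:192.168.0.1:6642 \
                  -p privkey.pem -c cert.pem -C cacert.pem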

   Other Options
       --unixctl=socket
              Sets the name of the control socket on which program listens
              for runtime management commands (see RUNTIME MANAGEMENT
              COMMANDS, below). If socket does not begin with /, it is
              interpreted as relative to . If --unixctl is not used at all,
              the default socket is /program.pid.ctl, where pid is the
              program’s process ID.

              On Windows a local named pipe is used to listen for runtime
              management commands. A file is created at the absolute path
              pointed to by socket or, if --unixctl is not used at all, a
              file is created as program in the configured OVS_RUNDIR
              directory. The file exists just to mimic the behavior of a
              Unix domain socket.

              Specifying none for socket disables the control socket
              feature.

       -h
       --help
            Prints a brief help message to the console.

       -V
       --version
            Prints version information to the console.

RUNTIME MANAGEMENT COMMANDS

       ovs-appctl can send commands to a running ovn-northd process. The
       currently supported commands are described below.

              exit   Causes ovn-northd to gracefully terminate.

              pause  Pauses ovn-northd. When it is paused, ovn-northd
                     receives changes from the Northbound and Southbound
                     databases as usual, but it does not send any updates.
                     A paused ovn-northd also drops database locks, which
                     allows any other non-paused instance of ovn-northd to
                     take over.

              resume Resumes ovn-northd operation, processing Northbound
                     and Southbound database contents and generating
                     logical flows. This will also instruct ovn-northd to
                     attempt to acquire the lock on the SB DB.

              is-paused
                     Returns "true" if ovn-northd is currently paused,
                     "false" otherwise.

              status Prints this server’s status. Status will be "active"
                     if ovn-northd has acquired the OVSDB lock on the SB
                     DB, "standby" if it has not, or "paused" if this
                     instance is paused.
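
       For example, to check and toggle the state of a running instance
       (assuming the default control socket):

              ovs-appctl -t ovn-northd status
              ovs-appctl -t ovn-northd pause
              ovs-appctl -t ovn-northd is-paused
              ovs-appctl -t ovn-northd resume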

              sb-cluster-state-reset
                     Reset southbound database cluster status when
                     databases are destroyed and rebuilt.

                     If all databases in a clustered southbound database
                     are removed from disk, then the stored index of all
                     databases will be reset to zero. This will cause
                     ovn-northd to be unable to read or write to the
                     southbound database, because it will always detect the
                     data as stale. In such a case, run this command so
                     that ovn-northd will reset its local index so that it
                     can interact with the southbound database again.

              nb-cluster-state-reset
                     Reset northbound database cluster status when
                     databases are destroyed and rebuilt.

                     This performs the same task as sb-cluster-state-reset
                     except for the northbound database client.

              set-n-threads N
                     Set the number of threads used for building logical
                     flows. When N is within [2-256], parallelization is
                     enabled. When N is 1, parallelization is disabled.
                     When N is less than 1 or more than 256, an error is
                     returned. If ovn-northd fails to start parallelization
                     (e.g. it fails to set up semaphores), parallelization
                     is disabled and an error is returned.

              get-n-threads
                     Return the number of threads used for building logical
                     flows.
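
       For example, to enable four worker threads at runtime and confirm
       the setting:

              ovs-appctl -t ovn-northd set-n-threads 4
              ovs-appctl -t ovn-northd get-n-threads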

              inc-engine/show-stats
                     Display ovn-northd engine counters. For each engine
                     node the following counters have been added:

                     •      recompute

                     •      compute

                     •      abort

              inc-engine/show-stats engine_node_name counter_name
                     Display the ovn-northd engine counter(s) for the
                     specified engine_node_name. counter_name is optional
                     and can be one of recompute, compute or abort.

              inc-engine/clear-stats
                     Reset ovn-northd engine counters.
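
       For example, to inspect and then reset the incremental engine
       counters (the engine node name below is illustrative):

              ovs-appctl -t ovn-northd inc-engine/show-stats
              ovs-appctl -t ovn-northd inc-engine/show-stats lflow recompute
              ovs-appctl -t ovn-northd inc-engine/clear-stats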

       Only ovn-northd-ddlog supports the following commands:

              enable-cpu-profiling
              disable-cpu-profiling
                   Enables or disables profiling of CPU time used by the
                   DDlog engine. When CPU profiling is enabled, the profile
                   command (see below) will include DDlog CPU usage
                   statistics in its output. Enabling CPU profiling will
                   slow ovn-northd-ddlog. Disabling CPU profiling does not
                   clear any previously recorded statistics.

              profile
                   Outputs a profile of the current and peak sizes of
                   arrangements inside DDlog. This profiling data can be
                   useful for optimizing DDlog code. If CPU profiling was
                   previously enabled (even if it was later disabled), the
                   output also includes a CPU time profile. See Profiling
                   inside the tutorial in the DDlog repository for an
                   introduction to profiling DDlog.

ACTIVE-STANDBY FOR HIGH AVAILABILITY

       You may run ovn-northd more than once in an OVN deployment. When
       connected to a standalone or clustered DB setup, OVN will
       automatically ensure that only one of them is active at a time. If
       multiple instances of ovn-northd are running and the active
       ovn-northd fails, one of the hot standby instances of ovn-northd
       will automatically take over.

   Active-Standby with multiple OVN DB servers
       You may run multiple OVN DB servers in an OVN deployment with:

              •      OVN DB servers deployed in active/passive mode with
                     one active and multiple passive ovsdb-servers.

              •      ovn-northd also deployed on all these nodes, using
                     unix sockets to connect to the local OVN DB servers.

       In such deployments, the ovn-northds on the passive nodes will
       process the DB changes and compute logical flows, which will only be
       thrown out later because write transactions are not allowed by the
       passive ovsdb-servers. This results in unnecessary CPU usage.

       With the help of the runtime management command pause, you can pause
       ovn-northd on these nodes. When a passive node becomes master, you
       can use the runtime management command resume to resume ovn-northd
       so that it processes the DB changes.

LOGICAL FLOW TABLE STRUCTURE

       One of the main purposes of ovn-northd is to populate the
       Logical_Flow table in the OVN_Southbound database. This section
       describes how ovn-northd does this for switch and router logical
       datapaths.

   Logical Switch Datapaths
     Ingress Table 0: Admission Control and Ingress Port Security check

       Ingress table 0 contains these logical flows:

              •      Priority 100 flows to drop packets with VLAN tags or
                     multicast Ethernet source addresses.

              •      For each disabled logical port, a priority 100 flow is
                     added which matches on all packets and applies the
                     action REGBIT_PORT_SEC_DROP = 1; next; so that the
                     packets are dropped in the next stage.

              •      For each (enabled) vtep logical port, a priority 70
                     flow is added which matches on all packets and applies
                     the action REGBIT_FROM_RAMP = 1;
                     next(pipeline=ingress, table=S_SWITCH_IN_L2_LKUP); to
                     skip most stages of the ingress pipeline and go
                     directly to the ingress L2 lookup table to determine
                     the output port. Packets from a VTEP (RAMP) switch
                     should not be subjected to any ACL checks. The egress
                     pipeline will do the ACL checks.

              •      For each enabled logical port configured with a qdisc
                     queue id in the options:qdisc_queue_id column of
                     Logical_Switch_Port, a priority 70 flow is added which
                     matches on all packets and applies the action
                     set_queue(id); REGBIT_PORT_SEC_DROP =
                     check_in_port_sec(); next;.

              •      A priority 1 flow is added which matches on all
                     packets for all the logical ports and applies the
                     action REGBIT_PORT_SEC_DROP = check_in_port_sec();
                     next; to evaluate the port security. The
                     check_in_port_sec action applies the port security
                     rules defined in the port_security column of the
                     Logical_Switch_Port table.
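
       As a sketch, the first and last of these flows might appear in
       ovn-sbctl lflow-list output roughly as follows (the stage name and
       the register backing REGBIT_PORT_SEC_DROP are illustrative):

              table=0 (ls_in_check_port_sec), priority=100,
                match=(vlan.present), action=(drop;)
              table=0 (ls_in_check_port_sec), priority=1,
                match=(1), action=(reg0[15] = check_in_port_sec(); next;)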

     Ingress Table 1: Ingress Port Security - Apply

       This table drops the packets if the port security check failed in
       the previous stage, i.e. the register bit REGBIT_PORT_SEC_DROP is
       set to 1.

       Ingress table 1 contains these logical flows:

              •      A priority-50 fallback flow that drops the packet if
                     the register bit REGBIT_PORT_SEC_DROP is set to 1.

              •      One priority-0 fallback flow that matches all packets
                     and advances to the next table.

     Ingress Table 2: Lookup MAC address learning table

       This table looks up the MAC learning table of the logical switch
       datapath to check if the port-MAC pair is present or not. A MAC is
       learnt only for logical switch VIF ports whose port security is
       disabled and that have the ’unknown’ address set.

              •      For each such logical port p whose port security is
                     disabled and that has the ’unknown’ address set, the
                     following flow is added.

                     •      Priority 100 flow with the match inport == p
                            and action reg0[11] = lookup_fdb(inport,
                            eth.src); next;

              •      One priority-0 fallback flow that matches all packets
                     and advances to the next table.

     Ingress Table 3: Learn MAC of ’unknown’ ports.

       This table learns the MAC addresses seen on the logical ports whose
       port security is disabled and that have the ’unknown’ address set,
       if the lookup_fdb action returned false in the previous table.

              •      For each such logical port p whose port security is
                     disabled and that has the ’unknown’ address set, the
                     following flow is added.

                     •      Priority 100 flow with the match inport == p &&
                            reg0[11] == 0 and action put_fdb(inport,
                            eth.src); next; which stores the port-MAC pair
                            in the MAC learning table of the logical switch
                            datapath and advances the packet to the next
                            table.

              •      One priority-0 fallback flow that matches all packets
                     and advances to the next table.
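
       As a sketch, the lookup and learn flows for a hypothetical port p1
       might appear in lflow output roughly as (stage names illustrative):

              table=2 (ls_in_lookup_fdb), priority=100,
                match=(inport == "p1"),
                action=(reg0[11] = lookup_fdb(inport, eth.src); next;)
              table=3 (ls_in_put_fdb), priority=100,
                match=(inport == "p1" && reg0[11] == 0),
                action=(put_fdb(inport, eth.src); next;)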

     Ingress Table 4: from-lport Pre-ACLs

       This table prepares flows for possible stateful ACL processing in
       ingress table ACLs. It contains a priority-0 flow that simply moves
       traffic to the next table. If stateful ACLs are used in the logical
       datapath, a priority-100 flow is added that sets a hint (with
       reg0[0] = 1; next;) for table Pre-stateful to send IP packets to the
       connection tracker before eventually advancing to ingress table
       ACLs. If special ports such as router ports or localnet ports can’t
       use ct(), a priority-110 flow is added to skip over stateful ACLs.
       Multicast, IPv6 Neighbor Discovery and MLD traffic also skips
       stateful ACLs. For "allow-stateless" ACLs, a flow is added to bypass
       setting the hint for connection tracker processing.

       This table also has a priority-110 flow with the match eth.dst == E
       for all logical switch datapaths to move traffic to the next table,
       where E is the service monitor MAC defined in the
       options:svc_monitor_mac column of the NB_Global table.

     Ingress Table 5: Pre-LB

       This table prepares flows for possible stateful load balancing
       processing in ingress tables LB and Stateful. It contains a
       priority-0 flow that simply moves traffic to the next table.
       Moreover it contains two priority-110 flows to move multicast, IPv6
       Neighbor Discovery and MLD traffic to the next table. If load
       balancing rules with virtual IP addresses (and ports) are configured
       in the OVN_Northbound database for a logical switch datapath, a
       priority-100 flow is added with the match ip to match on IP packets,
       with the action reg0[2] = 1; next; to act as a hint for table
       Pre-stateful to send IP packets to the connection tracker for packet
       de-fragmentation (and to possibly do DNAT for already established
       load balanced traffic) before eventually advancing to ingress table
       Stateful. If controller_event has been enabled and load balancing
       rules with empty backends have been added in OVN_Northbound, a
       priority-130 flow is added to trigger ovn-controller events whenever
       the chassis receives a packet for that particular VIP. If the
       event-elb meter has been previously created, it will be associated
       with the empty_lb logical flow.

       Prior to OVN 20.09 we were setting the reg0[0] = 1 only if the IP
       destination matches the load balancer VIP. However this had issues
       in cases where a logical switch doesn’t have any ACLs with the
       allow-related action. To understand the issue, let’s take a TCP load
       balancer 10.0.0.10:80=10.0.0.3:80. If a logical port p1 with IP
       10.0.0.5 opens a TCP connection with the VIP 10.0.0.10, then the
       packet in the ingress pipeline of p1 is sent to p1’s conntrack zone
       id and the packet is load balanced to the backend 10.0.0.3. The
       reply packet from the backend lport is not sent to the conntrack of
       the backend lport’s zone id. This is fine as long as the packet is
       valid. But suppose the backend lport sends an invalid TCP packet
       (such as one with an incorrect sequence number); the packet gets
       delivered to the lport p1 without unDNATing the packet to the VIP
       10.0.0.10, and this causes the connection to be reset by the lport
       p1’s VIF.

       We can’t fix this issue by adding a logical flow to drop ct.inv
       packets in the egress pipeline, since it would drop all other
       connections not destined to the load balancers. To fix this issue,
       we send all the packets to the conntrack in the ingress pipeline if
       a load balancer is configured. We can then add an lflow to drop
       ct.inv packets.

       This table also has priority-120 flows that punt all IGMP/MLD
       packets to ovn-controller if the switch is an interconnect switch
       with multicast snooping enabled.

       This table also has a priority-110 flow with the match eth.dst == E
       for all logical switch datapaths to move traffic to the next table,
       where E is the service monitor MAC defined in the
       options:svc_monitor_mac column of the NB_Global table.

       This table also has a priority-110 flow with the match inport == I
       for all logical switch datapaths to move traffic to the next table,
       where I is the peer of a logical router port. This flow is added to
       skip the connection tracking of packets which enter from the logical
       router datapath to the logical switch datapath.

     Ingress Table 6: Pre-stateful

       This table prepares flows for all possible stateful processing in
       the next tables. It contains a priority-0 flow that simply moves
       traffic to the next table.

              •      Priority-120 flows that send the packets to the
                     connection tracker using ct_lb_mark; as the action so
                     that the already established traffic destined to the
                     load balancer VIP gets DNATted. These flows match each
                     VIP’s IP and port. For IPv4 traffic the flows also
                     load the original destination IP and transport port in
                     registers reg1 and reg2. For IPv6 traffic the flows
                     also load the original destination IP and transport
                     port in registers xxreg1 and reg2.

              •      A priority-110 flow sends the packets that don’t match
                     the above flows to the connection tracker based on a
                     hint provided by the previous tables (with a match for
                     reg0[2] == 1) by using the ct_lb_mark; action.

              •      A priority-100 flow sends the packets to the
                     connection tracker based on a hint provided by the
                     previous tables (with a match for reg0[0] == 1) by
                     using the ct_next; action.

     Ingress Table 7: from-lport ACL hints

       This table consists of logical flows that set hints (reg0 bits) to
       be used in the next stage, in the ACL processing table, if stateful
       ACLs or load balancers are configured. Multiple hints can be set for
       the same packet. The possible hints are:

              •      reg0[7]: the packet might match an allow-related ACL
                     and might have to commit the connection to conntrack.

              •      reg0[8]: the packet might match an allow-related ACL
                     but there will be no need to commit the connection to
                     conntrack because it already exists.

              •      reg0[9]: the packet might match a drop/reject ACL.

              •      reg0[10]: the packet might match a drop/reject ACL but
                     the connection was previously allowed so it might have
                     to be committed again with ct_label=1/1.

       The table contains the following flows:

              •      A priority-65535 flow to advance to the next table if
                     the logical switch has no ACLs configured, otherwise a
                     priority-0 flow to advance to the next table.

              •      A priority-7 flow that matches on packets that
                     initiate a new session. This flow sets reg0[7] and
                     reg0[9] and then advances to the next table.

              •      A priority-6 flow that matches on packets that are in
                     the request direction of an already existing session
                     that has been marked as blocked. This flow sets
                     reg0[7] and reg0[9] and then advances to the next
                     table.

              •      A priority-5 flow that matches untracked packets. This
                     flow sets reg0[8] and reg0[9] and then advances to the
                     next table.

              •      A priority-4 flow that matches on packets that are in
                     the request direction of an already existing session
                     that has not been marked as blocked. This flow sets
                     reg0[8] and reg0[10] and then advances to the next
                     table.

              •      A priority-3 flow that matches on packets that are not
                     part of established sessions. This flow sets reg0[9]
                     and then advances to the next table.

              •      A priority-2 flow that matches on packets that are
                     part of an established session that has been marked as
                     blocked. This flow sets reg0[9] and then advances to
                     the next table.

              •      A priority-1 flow that matches on packets that are
                     part of an established session that has not been
                     marked as blocked. This flow sets reg0[10] and then
                     advances to the next table.

     Ingress table 8: from-lport ACLs before LB

       Logical flows in this table closely reproduce those in the ACL table
       in the OVN_Northbound database for the from-lport direction without
       the option apply-after-lb set or set to false. The priority values
       from the ACL table have a limited range and have 1000 added to them
       to leave room for OVN default flows at both higher and lower
       priorities.

              •      allow ACLs translate into logical flows with the next;
                     action. If there are any stateful ACLs on this
                     datapath, then allow ACLs translate to ct_commit;
                     next; (which acts as a hint for the next tables to
                     commit the connection to conntrack). In case the ACL
                     has a label then reg3 is loaded with the label value
                     and the reg0[13] bit is set to 1 (which acts as a hint
                     for the next tables to commit the label to conntrack).

              •      allow-related ACLs translate into logical flows with
                     the ct_commit(ct_label=0/1); next; actions for new
                     connections and reg0[1] = 1; next; for existing
                     connections. In case the ACL has a label then reg3 is
                     loaded with the label value and the reg0[13] bit is
                     set to 1 (which acts as a hint for the next tables to
                     commit the label to conntrack).

              •      allow-stateless ACLs translate into logical flows with
                     the next; action.

              •      reject ACLs translate into logical flows with the
                     tcp_reset { output <-> inport;
                     next(pipeline=egress,table=5); } action for TCP
                     connections, the icmp4/icmp6 action for UDP
                     connections, and the sctp_abort { output <-> inport;
                     next(pipeline=egress,table=5); } action for SCTP
                     associations.

              •      Other ACLs translate to drop; for new or untracked
                     connections and ct_commit(ct_label=1/1); for known
                     connections. Setting ct_label marks a connection as
                     one that was previously allowed, but should no longer
                     be allowed due to a policy change.

       This table contains a priority-65535 flow to advance to the next
       table if the logical switch has no ACLs configured, otherwise a
       priority-0 flow to advance to the next table so that ACLs allow
       packets by default if the options:default_acl_drop column of
       NB_Global is false or not set. Otherwise the flow action is set to
       drop; to implement a default drop behavior.

       If the logical datapath has a stateful ACL or a load balancer with a
       VIP configured, the following flows will also be added:

              •      If the options:default_acl_drop column of NB_Global is
                     false or not set, a priority-1 flow that sets the hint
                     to commit IP traffic that is not part of established
                     sessions to the connection tracker (with action
                     reg0[1] = 1; next;). This is needed for the default
                     allow policy because, while the initiator’s direction
                     may not have any stateful rules, the server’s may and
                     then its return traffic would not be known and marked
                     as invalid.

              •      If the options:default_acl_drop column of NB_Global is
                     true, a priority-1 flow that drops IP traffic that is
                     not part of established sessions.

              •      A priority-65532 flow that allows any traffic in the
                     reply direction for a connection that has been
                     committed to the connection tracker (i.e., established
                     flows), as long as the committed flow does not have
                     ct_mark.blocked set. We only handle traffic in the
                     reply direction here because we want all packets going
                     in the request direction to still go through the flows
                     that implement the currently defined policy based on
                     ACLs. If a connection is no longer allowed by policy,
                     ct_mark.blocked will get set and packets in the reply
                     direction will no longer be allowed, either. This flow
                     also clears the register bits reg0[9] and reg0[10]. If
                     ACL logging and logging of related packets is enabled,
                     then a companion priority-65533 flow will be installed
                     that accomplishes the same thing but also logs the
                     traffic.

              •      A priority-65532 flow that allows any traffic that is
                     considered related to a committed flow in the
                     connection tracker (e.g., an ICMP Port Unreachable
                     from a non-listening UDP port), as long as the
                     committed flow does not have ct_mark.blocked set. This
                     flow also applies NAT to the related traffic so that
                     ICMP headers and the inner packet have correct
                     addresses. If ACL logging and logging of related
                     packets is enabled, then a companion priority-65533
                     flow will be installed that accomplishes the same
                     thing but also logs the traffic.

              •      A priority-65532 flow that drops all traffic marked by
                     the connection tracker as invalid.

              •      A priority-65532 flow that drops all traffic in the
                     reply direction with ct_mark.blocked set, meaning that
                     the connection should no longer be allowed due to a
                     policy change. Packets in the request direction are
                     skipped here to let a newly created ACL re-allow this
                     connection.

              •      A priority-65532 flow that allows IPv6 Neighbor
                     Solicitation, Neighbor Advertisement, Router
                     Solicitation, Router Advertisement and MLD packets.

       If the logical datapath has any ACL or a load balancer with a VIP
       configured, the following flow will also be added:

              •      A priority 34000 logical flow is added for each
                     logical switch datapath with the match eth.dst == E to
                     allow the service monitor reply packet destined to
                     ovn-controller with the action next;, where E is the
                     service monitor MAC defined in the
                     options:svc_monitor_mac column of the NB_Global table.

     Ingress Table 9: from-lport QoS Marking

       Logical flows in this table closely reproduce those in the QoS table
       with the action column set in the OVN_Northbound database for the
       from-lport direction.

              •      For every qos_rules entry in a logical switch with
                     DSCP marking enabled, a flow will be added at the
                     priority mentioned in the QoS table.

              •      One priority-0 fallback flow that matches all packets
                     and advances to the next table.

     Ingress Table 10: from-lport QoS Meter

       Logical flows in this table closely reproduce those in the QoS table
       with the bandwidth column set in the OVN_Northbound database for the
       from-lport direction.

              •      For every qos_rules entry in a logical switch with
                     metering enabled, a flow will be added at the priority
                     mentioned in the QoS table.

              •      One priority-0 fallback flow that matches all packets
                     and advances to the next table.

     Ingress Table 11: Load balancing affinity check

       The load balancing affinity check table contains the following
       logical flows:

              •      For all the configured load balancing rules for a
                     switch in the OVN_Northbound database where a positive
                     affinity timeout is specified in the options column,
                     and that include an L4 port PORT of protocol P and IP
                     address VIP, a priority-100 flow is added. For IPv4
                     VIPs, the flow matches ct.new && ip && ip4.dst == VIP
                     && P.dst == PORT. For IPv6 VIPs, the flow matches
                     ct.new && ip && ip6.dst == VIP && P && P.dst == PORT.
                     The flow’s action is reg9[6] = chk_lb_aff(); next;.

              •      A priority 0 flow is added which matches on all
                     packets and applies the action next;.
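
       As a sketch, for a hypothetical TCP load balancer with VIP
       10.0.0.10:80, the affinity check flow might look roughly like (stage
       name illustrative):

              table=11 (ls_in_lb_aff_check), priority=100,
                match=(ct.new && ip && ip4.dst == 10.0.0.10 && tcp.dst == 80),
                action=(reg9[6] = chk_lb_aff(); next;)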

     Ingress Table 12: LB

              •      For all the configured load balancing rules for a
                     switch in the OVN_Northbound database where a positive
                     affinity timeout is specified in the options column,
                     and that include an L4 port PORT of protocol P and IP
                     address VIP, a priority-150 flow is added. For IPv4
                     VIPs, the flow matches reg9[6] == 1 && ct.new && ip &&
                     ip4.dst == VIP && P.dst == PORT. For IPv6 VIPs, the
                     flow matches reg9[6] == 1 && ct.new && ip && ip6.dst
                     == VIP && P && P.dst == PORT. The flow’s action is
                     ct_lb_mark(args), where args contains comma separated
                     IP addresses (and optional port numbers) to load
                     balance to. The address family of the IP addresses of
                     args is the same as the address family of VIP.

              •      For all the configured load balancing rules for a
                     switch in the OVN_Northbound database that include an
                     L4 port PORT of protocol P and IP address VIP, a
                     priority-120 flow is added. For IPv4 VIPs, the flow
                     matches ct.new && ip && ip4.dst == VIP && P.dst ==
                     PORT. For IPv6 VIPs, the flow matches ct.new && ip &&
                     ip6.dst == VIP && P && P.dst == PORT. The flow’s
                     action is ct_lb_mark(args), where args contains comma
                     separated IP addresses (and optional port numbers) to
                     load balance to. The address family of the IP
                     addresses of args is the same as the address family of
                     VIP. If a health check is enabled, then args will only
                     contain those endpoints whose service monitor status
                     entry in the OVN_Southbound db is either online or
                     empty. For IPv4 traffic the flow also loads the
                     original destination IP and transport port in
                     registers reg1 and reg2. For IPv6 traffic the flow
                     also loads the original destination IP and transport
                     port in registers xxreg1 and reg2. The above flow is
                     created even if the load balancer is attached to a
                     logical router connected to the current logical switch
                     and the install_ls_lb_from_router variable in options
                     is set to true.

              •      For all the configured load balancing rules for a
                     switch in the OVN_Northbound database that include
                     just an IP address VIP to match on, OVN adds a
                     priority-110 flow. For IPv4 VIPs, the flow matches
                     ct.new && ip && ip4.dst == VIP. For IPv6 VIPs, the
                     flow matches ct.new && ip && ip6.dst == VIP. The
                     action on this flow is ct_lb_mark(args), where args
                     contains comma separated IP addresses of the same
                     address family as VIP. For IPv4 traffic the flow also
                     loads the original destination IP and transport port
                     in registers reg1 and reg2. For IPv6 traffic the flow
                     also loads the original destination IP and transport
                     port in registers xxreg1 and reg2. The above flow is
                     created even if the load balancer is attached to a
                     logical router connected to the current logical switch
                     and the install_ls_lb_from_router variable in options
                     is set to true.

              •      If the load balancer is created with the --reject
                     option and it has no active backends, a TCP reset
                     segment (for tcp) or an ICMP port unreachable packet
                     (for all other kinds of traffic) will be sent whenever
                     an incoming packet is received for this load balancer.
                     Please note that using the --reject option will
                     disable the empty_lb SB controller event for this load
                     balancer.
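
       As a sketch, the priority-120 flow for the same hypothetical VIP
       with two backends might look roughly like (stage name and register
       loads illustrative):

              table=12 (ls_in_lb), priority=120,
                match=(ct.new && ip && ip4.dst == 10.0.0.10 && tcp.dst == 80),
                action=(reg1 = 10.0.0.10; reg2[0..15] = 80;
                        ct_lb_mark(backends=10.0.0.3:80,10.0.0.4:80);)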

     Ingress Table 13: Load balancing affinity learn

       The load balancing affinity learn table contains the following
       logical flows:

              •      For all the configured load balancing rules for a
                     switch in the OVN_Northbound database where a positive
                     affinity timeout T is specified in the options column,
                     and that include an L4 port PORT of protocol P and IP
                     address VIP, a priority-100 flow is added. For IPv4
                     VIPs, the flow matches reg9[6] == 0 && ct.new && ip &&
                     ip4.dst == VIP && P.dst == PORT. For IPv6 VIPs, the
                     flow matches ct.new && ip && ip6.dst == VIP && P &&
                     P.dst == PORT. The flow’s action is commit_lb_aff(vip
                     = VIP:PORT, backend = backend ip:backend port, proto =
                     P, timeout = T);.

              •      A priority 0 flow is added which matches on all
                     packets and applies the action next;.

     Ingress table 14: from-lport ACLs after LB

       Logical flows in this table closely reproduce those in the ACL table
       in the OVN_Northbound database for the from-lport direction with the
       option apply-after-lb set to true. The priority values from the ACL
       table have a limited range and have 1000 added to them to leave room
       for OVN default flows at both higher and lower priorities.

              •      allow apply-after-lb ACLs translate into logical flows
                     with the next; action. If there are any stateful ACLs
                     (including both before-lb and after-lb ACLs) on this
                     datapath, then allow ACLs translate to ct_commit;
                     next; (which acts as a hint for the next tables to
                     commit the connection to conntrack). In case the ACL
                     has a label then reg3 is loaded with the label value
                     and the reg0[13] bit is set to 1 (which acts as a hint
                     for the next tables to commit the label to conntrack).

              •      allow-related apply-after-lb ACLs translate into
                     logical flows with the ct_commit(ct_label=0/1); next;
                     actions for new connections and reg0[1] = 1; next; for
                     existing connections. In case the ACL has a label then
                     reg3 is loaded with the label value and the reg0[13]
                     bit is set to 1 (which acts as a hint for the next
                     tables to commit the label to conntrack).

              •      allow-stateless apply-after-lb ACLs translate into
                     logical flows with the next; action.

              •      reject apply-after-lb ACLs translate into logical
                     flows with the tcp_reset { output <-> inport;
                     next(pipeline=egress,table=5); } action for TCP
                     connections, the icmp4/icmp6 action for UDP
                     connections, and the sctp_abort { output <-> inport;
                     next(pipeline=egress,table=5); } action for SCTP
                     associations.

              •      Other apply-after-lb ACLs translate to drop; for new
                     or untracked connections and ct_commit(ct_label=1/1);
                     for known connections. Setting ct_label marks a
                     connection as one that was previously allowed, but
                     should no longer be allowed due to a policy change.

              •      One priority-0 fallback flow that matches all packets
                     and advances to the next table.

     Ingress Table 15: Stateful

              •      A priority 100 flow is added which commits the packet
                     to the conntrack and sets the most significant 32 bits
                     of ct_label with the reg3 value based on the hint
                     provided by previous tables (with a match for reg0[1]
                     == 1 && reg0[13] == 1). This is used by the ACLs with
                     a label to commit the label value to conntrack.

              •      For ACLs without a label, a second priority-100 flow
                     commits packets to the connection tracker using the
                     ct_commit; next; action based on a hint provided by
                     the previous tables (with a match for reg0[1] == 1 &&
                     reg0[13] == 0).

              •      A priority-0 flow that simply moves traffic to the
                     next table.
988
989     Ingress Table 16: Pre-Hairpin
990
991              •      If  the  logical  switch has load balancer(s) configured,
992                     then a priority-100 flow is added with the  match  ip  &&
993                     ct.trk  to check if the packet needs to be hairpinned (if
994                     after load  balancing  the  destination  IP  matches  the
995                     source  IP)  or  not  by  executing the actions reg0[6] =
996                     chk_lb_hairpin(); and reg0[12] =  chk_lb_hairpin_reply();
997                     and advances the packet to the next table.
998
999              •      A  priority-0  flow that simply moves traffic to the next
1000                     table.
1001
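       In lflow form, the Pre-Hairpin stage therefore reduces to roughly the
       following pair of flows (shown in ovn-sbctl lflow-list style; the
       stage name and table number are illustrative):

              table=16(ls_in_pre_hairpin), priority=100, match=(ip && ct.trk),
                action=(reg0[6] = chk_lb_hairpin();
                        reg0[12] = chk_lb_hairpin_reply(); next;)
              table=16(ls_in_pre_hairpin), priority=0, match=(1), action=(next;)
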
1002     Ingress Table 17: Nat-Hairpin
1003
1004              •      If the logical switch has  load  balancer(s)  configured,
1005                     then  a  priority-100  flow is added with the match ip &&
1006                     ct.new && ct.trk && reg0[6] == 1 which hairpins the traf‐
1007                     fic by NATting source IP to the load balancer VIP by exe‐
1008                     cuting the action ct_snat_to_vip and advances the  packet
1009                     to the next table.
1010
1011              •      If  the  logical  switch has load balancer(s) configured,
1012                     then a priority-100 flow is added with the  match  ip  &&
1013                     ct.est && ct.trk && reg0[6] == 1 which hairpins the traf‐
1014                     fic by NATting source IP to the load balancer VIP by exe‐
1015                     cuting  the action ct_snat and advances the packet to the
1016                     next table.
1017
1018              •      If the logical switch has  load  balancer(s)  configured,
1019                     then  a  priority-90  flow  is added with the match ip &&
1020                     reg0[12] == 1 which matches on the replies of  hairpinned
1021                     traffic  (i.e.,  destination  IP is VIP, source IP is the
1022                     backend IP and source L4 port is backend port for L4 load
1023                     balancers)  and  executes ct_snat and advances the packet
1024                     to the next table.
1025
1026              •      A priority-0 flow that simply moves traffic to  the  next
1027                     table.
1028
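       Taken together, the Nat-Hairpin flows can be sketched as follows (the
       stage name ls_in_nat_hairpin is assumed by analogy with the naming
       convention above):

              table=17(ls_in_nat_hairpin), priority=100,
                match=(ip && ct.new && ct.trk && reg0[6] == 1),
                action=(ct_snat_to_vip; next;)
              table=17(ls_in_nat_hairpin), priority=100,
                match=(ip && ct.est && ct.trk && reg0[6] == 1),
                action=(ct_snat; next;)
              table=17(ls_in_nat_hairpin), priority=90,
                match=(ip && reg0[12] == 1),
                action=(ct_snat; next;)
              table=17(ls_in_nat_hairpin), priority=0, match=(1),
                action=(next;)
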
1029     Ingress Table 18: Hairpin
1030
1031              •      For  each  distributed gateway router port RP attached to
1032                     the logical switch, a priority-2000 flow  is  added  with
                     the match reg0[14] == 1 && is_chassis_resident(RP) and
                     action next; to pass the traffic to the next table to
                     respond to the ARP requests for the router port IPs.
1036
                     The reg0[14] register bit is set in the ingress L2 port
                     security check table for traffic received from HW VTEP
                     (ramp) ports.
1040
1041              •      A priority-1000 flow that matches  on  reg0[14]  register
1042                     bit  for  the traffic received from HW VTEP (ramp) ports.
1043                     This traffic is passed to ingress table ls_in_l2_lkup.
1044
1045              •      A priority-1 flow that hairpins traffic matched  by  non-
                     default flows in the Pre-Hairpin table. Hairpinning is
                     done at L2: Ethernet addresses are swapped and the
                     packets are looped back on the input port.
1049
1050              •      A  priority-0  flow that simply moves traffic to the next
1051                     table.
1052
1053     Ingress Table 19: ARP/ND responder
1054
1055       This table implements ARP/ND responder in a logical  switch  for  known
1056       IPs. The advantage of the ARP responder flow is to limit ARP broadcasts
       by locally responding to ARP requests without the need to send them
       to other hypervisors. One common case is when the inport is a logical
       port associated with a VIF and the broadcast is responded to on the
       local hyper‐
1060       visor  rather  than broadcast across the whole network and responded to
1061       by the destination VM. This behavior is proxy ARP.
1062
1063       ARP requests arrive from VMs from a logical switch inport of  type  de‐
1064       fault.  For  this  case,  the logical switch proxy ARP rules can be for
1065       other VMs or logical router ports. Logical switch proxy ARP  rules  may
1066       be  programmed  both  for  mac binding of IP addresses on other logical
1067       switch VIF ports (which are of the default logical  switch  port  type,
1068       representing connectivity to VMs or containers), and for mac binding of
1069       IP addresses on logical switch router type  ports,  representing  their
1070       logical  router  port  peers. In order to support proxy ARP for logical
1071       router ports, an IP address must be configured on  the  logical  switch
1072       router  type port, with the same value as the peer logical router port.
1073       The configured MAC addresses must match as well. When a VM sends an ARP
1074       request  for  a  distributed logical router port and if the peer router
1075       type port of the attached logical switch does not have  an  IP  address
1076       configured,  the  ARP  request will be broadcast on the logical switch.
1077       One of the copies of the ARP request will go through the logical switch
1078       router  type  port  to  the  logical router datapath, where the logical
1079       router ARP responder will generate a reply. The MAC binding of  a  dis‐
1080       tributed  logical router, once learned by an associated VM, is used for
1081       all that VM’s communication needing routing. Hence, the action of a  VM
1082       re-arping  for  the  mac  binding  of the logical router port should be
1083       rare.
1084
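       As a concrete (hypothetical) example, suppose a logical switch router
       type port is configured with the same address as its peer logical
       router port, say 00:00:00:00:01:01 10.0.0.1. The responder flows
       described below then answer ARP requests for 10.0.0.1 directly on the
       local hypervisor, roughly:

              priority=50, match=(arp.tpa == 10.0.0.1 && arp.op == 1),
                action=(eth.dst = eth.src; eth.src = 00:00:00:00:01:01;
                        arp.op = 2; arp.tha = arp.sha;
                        arp.sha = 00:00:00:00:01:01; arp.tpa = arp.spa;
                        arp.spa = 10.0.0.1; outport = inport;
                        flags.loopback = 1; output;)
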
1085       Logical switch ARP responder proxy ARP rules can also be hit  when  re‐
1086       ceiving ARP requests externally on a L2 gateway port. In this case, the
1087       hypervisor acting as an L2 gateway, responds to the ARP request on  be‐
1088       half of a destination VM.
1089
1090       Note  that  ARP requests received from localnet logical inports can ei‐
       ther go directly to VMs, in which case the VM responds, or can hit an
       ARP responder for a logical router port if the packet is used to re‐
1093       solve a logical router port next hop address. In either  case,  logical
1094       switch  ARP  responder rules will not be hit. It contains these logical
1095       flows:
1096
              •      Priority-100 flows that skip the ARP responder if the
                     inport is of type localnet, advancing directly to the
                     next table. ARP requests sent to localnet ports can be
                     received by multiple hypervisors. Because the same mac
                     binding rules are downloaded to all hypervisors, each of
                     them would respond, which would confuse L2 learning on
                     the source of the ARP requests. ARP requests received on
                     an inport of type router are not expected to hit any
                     logical switch ARP responder flows. However, no skip
                     flows are installed for these packets, as there would be
                     some additional flow cost for this and the value appears
                     limited.
1109
              •      If inport V is of type virtual, a priority-100 logical
                     flow is added for each P configured in the
                     options:virtual-parents column with the match

                     inport == P && ((arp.op == 1 && arp.spa == VIP && arp.tpa == VIP) || (arp.op == 2 && arp.spa == VIP))
                     inport == P && ((nd_ns && ip6.dst == {VIP, NS_MULTICAST_ADDR} && nd.target == VIP) || (nd_na && nd.target == VIP))
1116
1117
1118                     and applies the action
1119
1120                     bind_vport(V, inport);
1121
1122
1123                     and advances the packet to the next table.
1124
                     Where VIP is the virtual IP configured in the column
                     options:virtual-ip and NS_MULTICAST_ADDR is the
                     solicited-node multicast address corresponding to the
                     VIP. (A concrete instance of this flow appears after
                     this list.)
1128
1129              •      Priority-50  flows  that match ARP requests to each known
1130                     IP address A of every logical switch  port,  and  respond
1131                     with ARP replies directly with corresponding Ethernet ad‐
1132                     dress E:
1133
1134                     eth.dst = eth.src;
1135                     eth.src = E;
1136                     arp.op = 2; /* ARP reply. */
1137                     arp.tha = arp.sha;
1138                     arp.sha = E;
1139                     arp.tpa = arp.spa;
1140                     arp.spa = A;
1141                     outport = inport;
1142                     flags.loopback = 1;
1143                     output;
1144
1145
1146                     These flows are omitted for  logical  ports  (other  than
1147                     router  ports  or  localport ports) that are down (unless
1148                     ignore_lsp_down is configured as true in  options  column
1149                     of NB_Global table of the Northbound database), for logi‐
1150                     cal ports of type virtual, for logical  ports  with  ’un‐
1151                     known’  address  set  and  for logical ports of a logical
1152                     switch configured with other_config:vlan-passthru=true.
1153
                     The above ARP responder flows are added for the list of
                     IPv4 addresses, if defined, in the options:arp_proxy
                     column of the Logical_Switch_Port table for logical
                     switch ports of type router.
1158
1159              •      Priority-50  flows  that match IPv6 ND neighbor solicita‐
1160                     tions to each known IP address A (and A’s solicited  node
1161                     address)  of  every  logical  switch  port except of type
1162                     router, and respond with neighbor advertisements directly
1163                     with corresponding Ethernet address E:
1164
1165                     nd_na {
1166                         eth.src = E;
1167                         ip6.src = A;
1168                         nd.target = A;
1169                         nd.tll = E;
1170                         outport = inport;
1171                         flags.loopback = 1;
1172                         output;
1173                     };
1174
1175
1176                     Priority-50  flows  that match IPv6 ND neighbor solicita‐
1177                     tions to each known IP address A (and A’s solicited  node
1178                     address)  of  logical switch port of type router, and re‐
1179                     spond with neighbor advertisements directly  with  corre‐
1180                     sponding Ethernet address E:
1181
1182                     nd_na_router {
1183                         eth.src = E;
1184                         ip6.src = A;
1185                         nd.target = A;
1186                         nd.tll = E;
1187                         outport = inport;
1188                         flags.loopback = 1;
1189                         output;
1190                     };
1191
1192
1193                     These  flows  are  omitted  for logical ports (other than
1194                     router ports or localport ports) that  are  down  (unless
1195                     ignore_lsp_down  is  configured as true in options column
1196                     of NB_Global table of the Northbound database), for logi‐
1197                     cal ports of type virtual and for logical ports with ’un‐
1198                     known’ address set.
1199
1200              •      Priority-100 flows with match criteria like the  ARP  and
1201                     ND  flows above, except that they only match packets from
1202                     the inport that owns the IP addresses in  question,  with
1203                     action  next;.  These flows prevent OVN from replying to,
1204                     for example, an ARP request emitted by a VM for  its  own
1205                     IP  address.  A VM only makes this kind of request to at‐
1206                     tempt to detect a duplicate  IP  address  assignment,  so
1207                     sending a reply will prevent the VM from accepting the IP
1208                     address that it owns.
1209
1210                     In place of next;, it would be reasonable  to  use  drop;
1211                     for the flows’ actions. If everything is working as it is
1212                     configured, then this would produce  equivalent  results,
1213                     since no host should reply to the request. But ARPing for
1214                     one’s own IP address is  intended  to  detect  situations
1215                     where  the network is not working as configured, so drop‐
1216                     ping the request would frustrate that intent.
1217
1218              •      For each SVC_MON_SRC_IP  defined  in  the  value  of  the
1219                     ip_port_mappings:ENDPOINT_IP  column of Load_Balancer ta‐
                     ble, a priority-110 logical flow is added with the match
                     arp.tpa == SVC_MON_SRC_IP && arp.op == 1 and applies the
                     action
1223
1224                     eth.dst = eth.src;
1225                     eth.src = E;
1226                     arp.op = 2; /* ARP reply. */
1227                     arp.tha = arp.sha;
1228                     arp.sha = E;
1229                     arp.tpa = arp.spa;
1230                     arp.spa = A;
1231                     outport = inport;
1232                     flags.loopback = 1;
1233                     output;
1234
1235
1236                     where E is the service monitor source mac defined in  the
1237                     options:svc_monitor_mac  column  in  the NB_Global table.
1238                     This mac is used as the source mac in the service monitor
1239                     packets for the load balancer endpoint IP health checks.
1240
1241                     SVC_MON_SRC_IP  is  used  as the source ip in the service
1242                     monitor IPv4 packets for the load  balancer  endpoint  IP
1243                     health checks.
1244
1245                     These  flows  are  required if an ARP request is sent for
1246                     the IP SVC_MON_SRC_IP.
1247
              •      For each VIP configured in the table Forwarding_Group, a
                     priority-50 logical flow is added with the match
                     arp.tpa == vip && arp.op == 1 and applies the action
1252
1253                     eth.dst = eth.src;
1254                     eth.src = E;
1255                     arp.op = 2; /* ARP reply. */
1256                     arp.tha = arp.sha;
1257                     arp.sha = E;
1258                     arp.tpa = arp.spa;
1259                     arp.spa = A;
1260                     outport = inport;
1261                     flags.loopback = 1;
1262                     output;
1263
1264
                     where E is the forwarding group’s MAC address defined in
                     its vmac column.
1267
                     A is used either as the destination IP for load
                     balancing traffic to child ports or as the nexthop to
                     hosts behind the child ports.
1271
                     These flows are required to respond to ARP requests sent
                     for the IP vip.
1274
1275              •      One priority-0 fallback flow that matches all packets and
1276                     advances to the next table.
1277
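       For instance, with a virtual port vp0 whose options:virtual-ip is
       10.0.0.10 and whose options:virtual-parents include p1 (hypothetical
       names), the virtual port flow described above instantiates as
       roughly:

              priority=100,
                match=(inport == "p1" && ((arp.op == 1 && arp.spa == 10.0.0.10 && arp.tpa == 10.0.0.10) || (arp.op == 2 && arp.spa == 10.0.0.10))),
                action=(bind_vport("vp0", inport); next;)
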
1278     Ingress Table 20: DHCP option processing
1279
1280       This  table adds the DHCPv4 options to a DHCPv4 packet from the logical
1281       ports configured with IPv4 address(es) and DHCPv4  options,  and  simi‐
1282       larly  for  DHCPv6  options. This table also adds flows for the logical
1283       ports of type external.
1284
1285              •      A priority-100 logical flow is added  for  these  logical
1286                     ports which matches the IPv4 packet with udp.src = 68 and
1287                     udp.dst = 67 and applies the action put_dhcp_opts and ad‐
1288                     vances the packet to the next table.
1289
1290                     reg0[3] = put_dhcp_opts(offer_ip = ip, options...);
1291                     next;
1292
1293
1294                     For  DHCPDISCOVER  and  DHCPREQUEST,  this transforms the
1295                     packet into a DHCP reply, adds the DHCP offer IP  ip  and
1296                     options  to  the  packet,  and stores 1 into reg0[3]. For
1297                     other kinds of packets, it just stores  0  into  reg0[3].
                     Either way, it continues to the next table. (A filled-in
                     example of put_dhcp_opts follows this list.)
1299
1300              •      A  priority-100  logical  flow is added for these logical
1301                     ports which matches the IPv6 packet with  udp.src  =  546
1302                     and  udp.dst = 547 and applies the action put_dhcpv6_opts
1303                     and advances the packet to the next table.
1304
1305                     reg0[3] = put_dhcpv6_opts(ia_addr = ip, options...);
1306                     next;
1307
1308
1309                     For DHCPv6 Solicit/Request/Confirm packets,  this  trans‐
1310                     forms  the packet into a DHCPv6 Advertise/Reply, adds the
1311                     DHCPv6 offer IP ip and options to the packet, and  stores
1312                     1  into  reg0[3].  For  other  kinds  of packets, it just
1313                     stores 0 into reg0[3]. Either way, it  continues  to  the
1314                     next table.
1315
              •      A priority-0 flow that matches all packets and advances
                     to the next table.
1318
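       A filled-in put_dhcp_opts action might look like the following
       sketch, with hypothetical addresses and a typical set of DHCPv4
       options (option names as defined for the DHCP_Options table in
       ovn-nb(5)):

              reg0[3] = put_dhcp_opts(offer_ip = 10.0.0.5,
                                      server_id = 10.0.0.1,
                                      netmask = 255.255.255.0,
                                      router = 10.0.0.1,
                                      lease_time = 3600);
              next;
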
1319     Ingress Table 21: DHCP responses
1320
1321       This table implements DHCP responder for the DHCP replies generated  by
1322       the previous table.
1323
1324              •      A  priority  100  logical  flow  is added for the logical
1325                     ports configured with DHCPv4 options which  matches  IPv4
1326                     packets with udp.src == 68 && udp.dst == 67 && reg0[3] ==
1327                     1 and responds back to the inport  after  applying  these
1328                     actions. If reg0[3] is set to 1, it means that the action
1329                     put_dhcp_opts was successful.
1330
1331                     eth.dst = eth.src;
1332                     eth.src = E;
1333                     ip4.src = S;
1334                     udp.src = 67;
1335                     udp.dst = 68;
1336                     outport = P;
1337                     flags.loopback = 1;
1338                     output;
1339
1340
1341                     where E is the server MAC address and  S  is  the  server
                     IPv4 address defined in the DHCPv4 options. Note that
                     the ip4.dst field is handled by put_dhcp_opts.
1344
1345                     (This terminates ingress packet  processing;  the  packet
1346                     does not go to the next ingress table.)
1347
1348              •      A  priority  100  logical  flow  is added for the logical
1349                     ports configured with DHCPv6 options which  matches  IPv6
1350                     packets  with udp.src == 546 && udp.dst == 547 && reg0[3]
1351                     == 1 and responds back to the inport after applying these
1352                     actions. If reg0[3] is set to 1, it means that the action
1353                     put_dhcpv6_opts was successful.
1354
1355                     eth.dst = eth.src;
1356                     eth.src = E;
1357                     ip6.dst = A;
1358                     ip6.src = S;
1359                     udp.src = 547;
1360                     udp.dst = 546;
1361                     outport = P;
1362                     flags.loopback = 1;
1363                     output;
1364
1365
1366                     where E is the server MAC address and  S  is  the  server
1367                     IPv6  LLA address generated from the server_id defined in
1368                     the DHCPv6 options and A is the IPv6 address  defined  in
1369                     the logical port’s addresses column.
1370
                     (This terminates ingress packet processing; the packet
                     does not go on to the next ingress table.)
1373
              •      A priority-0 flow that matches all packets and advances
                     to the next table.
1376
     Ingress Table 22: DNS Lookup
1378
1379       This  table  looks  up  and resolves the DNS names to the corresponding
1380       configured IP address(es).
1381
1382              •      A priority-100 logical flow for each logical switch data‐
1383                     path  if it is configured with DNS records, which matches
1384                     the IPv4 and IPv6 packets with udp.dst = 53  and  applies
1385                     the action dns_lookup and advances the packet to the next
1386                     table.
1387
1388                     reg0[4] = dns_lookup(); next;
1389
1390
1391                     For valid DNS packets, this transforms the packet into  a
1392                     DNS  reply  if the DNS name can be resolved, and stores 1
1393                     into reg0[4]. For failed DNS resolution or other kinds of
1394                     packets,  it  just  stores 0 into reg0[4]. Either way, it
1395                     continues to the next table.
1396
     Ingress Table 23: DNS Responses
1398
1399       This table implements DNS responder for the DNS  replies  generated  by
1400       the previous table.
1401
1402              •      A priority-100 logical flow for each logical switch data‐
1403                     path if it is configured with DNS records, which  matches
1404                     the IPv4 and IPv6 packets with udp.dst = 53 && reg0[4] ==
1405                     1 and responds back to the inport  after  applying  these
1406                     actions. If reg0[4] is set to 1, it means that the action
1407                     dns_lookup was successful.
1408
1409                     eth.dst <-> eth.src;
1410                     ip4.src <-> ip4.dst;
1411                     udp.dst = udp.src;
1412                     udp.src = 53;
1413                     outport = P;
1414                     flags.loopback = 1;
1415                     output;
1416
1417
1418                     (This terminates ingress packet  processing;  the  packet
1419                     does not go to the next ingress table.)
1420
     Ingress Table 24: External ports
1422
       Traffic from the external logical ports enters the ingress datapath
       pipeline via the localnet port. This table adds the below logical
       flows to handle the traffic from these ports.
1426
1427              •      A  priority-100  flow  is added for each external logical
1428                     port which doesn’t  reside  on  a  chassis  to  drop  the
1429                     ARP/IPv6  NS  request to the router IP(s) (of the logical
1430                     switch) which matches on the inport of the external logi‐
1431                     cal  port and the valid eth.src address(es) of the exter‐
1432                     nal logical port.
1433
                     This flow guarantees that the ARP/NS request to the
                     router IP address from the external ports is responded
                     to only by the chassis which has claimed these external
                     ports. All the other chassis drop these packets.
1438
1439                     A  priority-100  flow  is added for each external logical
1440                     port which doesn’t reside on a chassis to drop any packet
1441                     destined to the router mac - with the match inport == ex‐
1442                     ternal && eth.src == E  &&  eth.dst  ==  R  &&  !is_chas‐
1443                     sis_resident("external") where E is the external port mac
1444                     and R is the router port mac.
1445
              •      A priority-0 flow that matches all packets and advances
                     to the next table.
1448
     Ingress Table 25: Destination Lookup
1450
1451       This  table  implements  switching  behavior. It contains these logical
1452       flows:
1453
1454              •      A priority-110 flow with the match eth.src == E  for  all
1455                     logical  switch  datapaths  and  applies  the action han‐
1456                     dle_svc_check(inport). Where E is the service monitor mac
1457                     defined   in   the   options:svc_monitor_mac   column  of
1458                     NB_Global table.
1459
1460              •      A priority-100 flow that punts all  IGMP/MLD  packets  to
1461                     ovn-controller  if  multicast  snooping is enabled on the
1462                     logical switch.
1463
1464              •      Priority-90 flows that forward  registered  IP  multicast
1465                     traffic  to  their  corresponding  multicast group, which
1466                     ovn-northd creates based on  learnt  IGMP_Group  entries.
1467                     The  flows  also  forward packets to the MC_MROUTER_FLOOD
                     multicast group, which ovn-northd populates with all the
1469                     logical  ports that are connected to logical routers with
1470                     options:mcast_relay=’true’.
1471
1472              •      A priority-85 flow that forwards all IP multicast traffic
1473                     destined to 224.0.0.X to the MC_FLOOD_L2 multicast group,
1474                     which ovn-northd populates with  all  non-router  logical
1475                     ports.
1476
1477              •      A priority-85 flow that forwards all IP multicast traffic
1478                     destined to reserved multicast IPv6 addresses (RFC  4291,
1479                     2.7.1,  e.g.,  Solicited-Node  multicast) to the MC_FLOOD
1480                     multicast group, which ovn-northd populates with all  en‐
1481                     abled logical ports.
1482
              •      A priority-80 flow that forwards all unregistered IP
                     multicast traffic to the MC_STATIC multicast group,
                     which ovn-northd populates with all the logical ports
                     that have options:mcast_flood=’true’. The flow also
                     forwards unregistered IP multicast traffic to the
                     MC_MROUTER_FLOOD multicast group, which ovn-northd
                     populates with all the logical ports connected to
                     logical routers that have options:mcast_relay=’true’.
1491
              •      A priority-80 flow that drops all unregistered IP
                     multicast traffic if other_config:mcast_snoop=’true’ and
                     other_config:mcast_flood_unregistered=’false’ and the
                     switch is not connected to a logical router that has
                     options:mcast_relay=’true’ and the switch doesn’t have
                     any logical port with options:mcast_flood=’true’.
1498
1499              •      Priority-80  flows  for  each  IP address/VIP/NAT address
1500                     owned by a router port connected  to  the  switch.  These
1501                     flows  match ARP requests and ND packets for the specific
1502                     IP addresses. Matched packets are forwarded only  to  the
1503                     router  that  owns  the IP address and to the MC_FLOOD_L2
1504                     multicast group which  contains  all  non-router  logical
1505                     ports.
1506
1507              •      Priority-75  flows  for  each port connected to a logical
1508                     router matching  self  originated  ARP  request/RARP  re‐
1509                     quest/ND  packets.  These  packets  are  flooded  to  the
1510                     MC_FLOOD_L2 which contains all non-router logical ports.
1511
1512              •      A priority-70 flow that outputs all packets with an  Eth‐
1513                     ernet broadcast or multicast eth.dst to the MC_FLOOD mul‐
1514                     ticast group.
1515
              •      One priority-50 flow that matches each known Ethernet
                     address against eth.dst. The action of this flow outputs
                     the packet to the single associated output port if it is
                     enabled. A drop; action is applied if the LSP is
                     disabled.
1520
1521                     For the Ethernet address on a logical switch port of type
1522                     router, when that logical switch port’s addresses  column
1523                     is  set  to  router and the connected logical router port
1524                     has a gateway chassis:
1525
1526                     •      The flow for the connected logical  router  port’s
1527                            Ethernet address is only programmed on the gateway
1528                            chassis.
1529
1530                     •      If the logical router has rules specified  in  nat
1531                            with  external_mac,  then those addresses are also
1532                            used to populate the switch’s  destination  lookup
1533                            on the chassis where logical_port is resident.
1534
1535                     For the Ethernet address on a logical switch port of type
1536                     router, when that logical switch port’s addresses  column
1537                     is  set  to  router and the connected logical router port
1538                     specifies a reside-on-redirect-chassis  and  the  logical
1539                     router to which the connected logical router port belongs
1540                     to has a distributed gateway LRP:
1541
1542                     •      The flow for the connected logical  router  port’s
1543                            Ethernet address is only programmed on the gateway
1544                            chassis.
1545
                     For each forwarding group configured on the logical
                     switch datapath, a priority-50 flow that matches on
                     eth.dst == VIP with an action of
                     fwd_group(childports=args), where args contains a
                     comma-separated list of logical switch child ports to
                     load balance to. If liveness is enabled, then the action
                     also includes liveness=true. (A sketch of such a flow
                     appears after this list.)
1553
1554              •      One  priority-0  fallback  flow  that matches all packets
1555                     with the action outport =  get_fdb(eth.dst);  next;.  The
1556                     action  get_fdb  gets the port for the eth.dst in the MAC
1557                     learning table of the logical switch datapath.  If  there
1558                     is  no  entry for eth.dst in the MAC learning table, then
1559                     it stores none in the outport.
1560
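       For a forwarding group with vmac 00:00:00:00:50:01 and child ports p1
       and p2 (hypothetical values), the resulting flow might look roughly
       like:

              table=25(ls_in_l2_lkup), priority=50,
                match=(eth.dst == 00:00:00:00:50:01),
                action=(fwd_group(liveness=true, childports="p1", "p2");)
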
     Ingress Table 26: Destination unknown
1562
       This table handles the packets whose destination was not found or was
       looked up in the MAC learning table of the logical switch datapath.
       It contains the following flows.
1566
1567              •      Priority 50 flow with the match outport == P is added for
1568                     each disabled Logical Switch Port P. This flow has action
1569                     drop;.
1570
1571              •      If the logical switch has logical  ports  with  ’unknown’
1572                     addresses set, then the below logical flow is added
1573
                     •      Priority-50 flow with the match outport == "none"
                            that outputs the packets to the MC_UNKNOWN
                            multicast group, which ovn-northd populates with
                            all enabled
1577                            logical  ports  that  accept  unknown  destination
1578                            packets.  As  a  small optimization, if no logical
1579                            ports   accept   unknown   destination    packets,
1580                            ovn-northd  omits this multicast group and logical
1581                            flow.
1582
1583                     If the logical switch has no logical ports with ’unknown’
1584                     address set, then the below logical flow is added
1585
                     •      Priority-50 flow with the match outport == "none"
                            that drops the packets.
1588
              •      One priority-0 fallback flow that outputs the packet to
                     the egress stage with the outport learnt from the
                     get_fdb action.
1592
1593     Egress Table 0: Pre-LB
1594
1595       This table is similar to ingress table Pre-LB. It contains a priority-0
1596       flow  that simply moves traffic to the next table. Moreover it contains
1597       two priority-110 flows to move multicast, IPv6 Neighbor  Discovery  and
1598       MLD  traffic  to  the next table. If any load balancing rules exist for
1599       the datapath, a priority-100 flow is added with a match of ip  and  ac‐
1600       tion  of  reg0[2] = 1; next; to act as a hint for table Pre-stateful to
1601       send IP packets to the connection tracker for  packet  de-fragmentation
       and possibly DNAT the destination VIP to one of the selected backends
       for already committed load balanced traffic.
1604
1605       This table also has a priority-110 flow with the match eth.src == E for
1606       all logical switch datapaths to move traffic to the next table. Where E
1607       is the service monitor mac defined in the options:svc_monitor_mac  col‐
1608       umn of NB_Global table.
1609
1610     Egress Table 1: to-lport Pre-ACLs
1611
1612       This is similar to ingress table Pre-ACLs except for to-lport traffic.
1613
1614       This table also has a priority-110 flow with the match eth.src == E for
1615       all logical switch datapaths to move traffic to the next table. Where E
1616       is  the service monitor mac defined in the options:svc_monitor_mac col‐
1617       umn of NB_Global table.
1618
1619       This table also has a priority-110 flow with the match outport == I for
1620       all logical switch datapaths to move traffic to the next table. Where I
       is the peer of a logical router port. This flow is added to skip the
       connection tracking of packets which will be entering the logical
       router datapath from the logical switch datapath for routing.
1624
1625     Egress Table 2: Pre-stateful
1626
1627       This is similar to ingress table Pre-stateful. This table adds the  be‐
1628       low 3 logical flows.
1629
              •      A priority-120 flow that sends the packets to the
                     connection tracker using ct_lb_mark; as the action so
                     that the already established traffic gets unDNATted from
                     the backend IP to the load balancer VIP based on a hint
                     provided by the previous tables with a match for
                     reg0[2] == 1. If the packet was not DNATted earlier,
                     then ct_lb_mark functions like ct_next.
1637
1638              •      A  priority-100  flow  sends  the  packets  to connection
1639                     tracker based on a hint provided by the  previous  tables
1640                     (with a match for reg0[0] == 1) by using the ct_next; ac‐
1641                     tion.
1642
1643              •      A priority-0 flow that matches all packets to advance  to
1644                     the next table.
1645
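       In lflow form, the three flows above amount to roughly the following
       sketch (the stage name ls_out_pre_stateful is assumed by analogy with
       the ingress naming):

              table=2(ls_out_pre_stateful), priority=120,
                match=(reg0[2] == 1), action=(ct_lb_mark;)
              table=2(ls_out_pre_stateful), priority=100,
                match=(reg0[0] == 1), action=(ct_next;)
              table=2(ls_out_pre_stateful), priority=0, match=(1),
                action=(next;)
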
1646     Egress Table 3: from-lport ACL hints
1647
1648       This is similar to ingress table ACL hints.
1649
1650     Egress Table 4: to-lport ACLs
1651
1652       This is similar to ingress table ACLs except for to-lport ACLs.
1653
1654       In addition, the following flows are added.
1655
1656              •      A  priority  34000 logical flow is added for each logical
1657                     port which has DHCPv4 options defined to allow the DHCPv4
1658                     reply  packet and which has DHCPv6 options defined to al‐
                     low the DHCPv6 reply packet from the Ingress Table 21:
                     DHCP responses.
1661
1662              •      A  priority  34000 logical flow is added for each logical
1663                     switch datapath configured  with  DNS  records  with  the
                     match udp.dst = 53 to allow the DNS reply packet from
                     the Ingress Table 23: DNS responses.
1666
1667              •      A priority 34000 logical flow is added for  each  logical
1668                     switch  datapath  with the match eth.src = E to allow the
1669                     service monitor  request  packet  generated  by  ovn-con‐
1670                     troller with the action next, where E is the service mon‐
1671                     itor mac defined in the options:svc_monitor_mac column of
1672                     NB_Global table.
1673
1674     Egress Table 5: to-lport QoS Marking
1675
       This is similar to ingress table QoS marking except that it applies
       to to-lport QoS rules.
1678
1679     Egress Table 6: to-lport QoS Meter
1680
       This is similar to ingress table QoS meter except that it applies to
       to-lport QoS rules.
1683
1684     Egress Table 7: Stateful
1685
1686       This  is  similar  to  ingress  table Stateful except that there are no
1687       rules added for load balancing new connections.
1688
1689     Egress Table 8: Egress Port Security - check
1690
1691       This is similar to the port security logic in table Ingress Port  Secu‐
1692       rity  check  except that action check_out_port_sec is used to check the
1693       port security rules. This table adds the below logical flows.
1694
              •      A priority-100 flow which matches on the multicast
                     traffic and applies the action REGBIT_PORT_SEC_DROP = 0;
                     next; to skip the out port security checks.
1698
              •      A priority-0 logical flow is added which matches on all
                     the packets and applies the action REGBIT_PORT_SEC_DROP
                     = check_out_port_sec(); next;. The action
                     check_out_port_sec applies the port security rules based
1703                     on the addresses defined in the port_security  column  of
1704                     Logical_Switch_Port table before delivering the packet to
1705                     the outport.
1706
1707     Egress Table 9: Egress Port Security - Apply
1708
       This is similar to the ingress port security logic in ingress table
       Ingress Port Security - Apply. This table drops the packets if the
       port security check failed in the previous stage, i.e., the register
       bit REGBIT_PORT_SEC_DROP is set to 1.
1713
1714       The following flows are added.
1715
              •      For each localnet port configured with egress qos in the
                     options:qdisc_queue_id column of Logical_Switch_Port, a
                     priority-100 flow is added which matches on the localnet
                     outport and applies the action set_queue(id); output;.
1720
1721                     Please remember to mark the corresponding physical inter‐
1722                     face with ovn-egress-iface set to true in external_ids.
1723
1724              •      A  priority-50 flow that drops the packet if the register
1725                     bit REGBIT_PORT_SEC_DROP is set to 1.
1726
1727              •      A priority-0 flow that outputs the packet to the outport.
1728
1729   Logical Router Datapaths
1730       Logical router datapaths will only exist for Logical_Router rows in the
       OVN_Northbound database that do not have enabled set to false.
1732
1733     Ingress Table 0: L2 Admission Control
1734
1735       This  table drops packets that the router shouldn’t see at all based on
1736       their Ethernet headers. It contains the following flows:
1737
1738              •      Priority-100 flows to drop packets with VLAN tags or mul‐
1739                     ticast Ethernet source addresses.
1740
1741              •      For each enabled router port P with Ethernet address E, a
1742                     priority-50 flow that matches inport == P  &&  (eth.mcast
                     || eth.dst == E), stores the router port Ethernet
                     address and advances to the next table, with action
                     xreg0[0..47] = E; next;.
1746
1747                     For  the  gateway  port  on  a distributed logical router
1748                     (where one of the logical router ports specifies a  gate‐
1749                     way  chassis),  the  above  flow matching eth.dst == E is
1750                     only programmed on the gateway port instance on the gate‐
                     way chassis. If the LRP’s logical switch has an attached
                     LSP of vtep type, the is_chassis_resident() part is not
                     added to the logical flow, to allow traffic originating
                     from the logical switch to reach LR services (LBs, NAT).
1755
                     For a distributed logical router or for a gateway router
                     where the port is configured with options:gateway_mtu,
                     the action of the above flow is modified by adding
                     check_pkt_larger in order to mark the packet by setting
                     REGBIT_PKT_LARGER if the size is greater than the MTU
                     (see the sketch after this list). If the port is also
                     configured with options:gateway_mtu_bypass, then another
                     flow is added, with priority-55, to bypass the
                     check_pkt_larger flow. This is useful for traffic that
                     normally doesn’t need to be fragmented and for which
                     check_pkt_larger, which might not be offloadable, is not
                     really needed. One such example is TCP traffic.
1767
1768              •      For each dnat_and_snat NAT rule on a  distributed  router
1769                     that  specifies  an external Ethernet address E, a prior‐
1770                     ity-50 flow that matches inport == GW &&  eth.dst  ==  E,
1771                     where  GW  is the logical router distributed gateway port
1772                     corresponding to the NAT rule  (specified  or  inferred),
1773                     with action xreg0[0..47]=E; next;.
1774
1775                     This flow is only programmed on the gateway port instance
1776                     on the chassis where the logical_port  specified  in  the
1777                     NAT rule resides.
1778
1779              •      A  priority-0  logical  flow that matches all packets not
1780                     already handled (match 1) and drops them (action drop;).
1781
1782       Other packets are implicitly dropped.
1783
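       The gateway_mtu variant of the admission flow above can be sketched
       as follows, assuming REGBIT_PKT_LARGER maps to reg9[1] and a
       gateway_mtu of 1500 (both are assumptions for illustration; the
       argument to check_pkt_larger accounts for L2 overhead on top of the
       configured MTU):

              match=(inport == P && (eth.mcast || eth.dst == E)),
                action=(xreg0[0..47] = E; reg9[1] = check_pkt_larger(1518);
                        next;)
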
1784     Ingress Table 1: Neighbor lookup
1785
1786       For ARP and IPv6 Neighbor Discovery packets, this table looks into  the
1787       MAC_Binding  records  to  determine if OVN needs to learn the mac bind‐
1788       ings. Following flows are added:
1789
1790              •      For each router port P that owns IP address A, which  be‐
1791                     longs to subnet S with prefix length L, if the option al‐
1792                     ways_learn_from_arp_request is true for  this  router,  a
1793                     priority-100  flow  is added which matches inport == P &&
1794                     arp.spa == S/L && arp.op == 1 (ARP request) with the fol‐
1795                     lowing actions:
1796
1797                     reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1798                     next;
1799
1800
1801                     If the option always_learn_from_arp_request is false, the
1802                     following two flows are added.
1803
1804                     A priority-110 flow is added which matches inport == P &&
1805                     arp.spa  ==  S/L  && arp.tpa == A && arp.op == 1 (ARP re‐
1806                     quest) with the following actions:
1807
1808                     reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1809                     reg9[3] = 1;
1810                     next;
1811
1812
1813                     A priority-100 flow is added which matches inport == P &&
1814                     arp.spa == S/L && arp.op == 1 (ARP request) with the fol‐
1815                     lowing actions:
1816
1817                     reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1818                     reg9[3] = lookup_arp_ip(inport, arp.spa);
1819                     next;
1820
1821
                     If the logical router port P is a distributed gateway
                     router port, an additional match
                     is_chassis_resident(cr-P) is added for all these flows.
1825
1826              •      A priority-100 flow which matches on  ARP  reply  packets
1827                     and    applies    the   actions   if   the   option   al‐
1828                     ways_learn_from_arp_request is true:
1829
1830                     reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1831                     next;
1832
1833
1834                     If the option always_learn_from_arp_request is false, the
1835                     above actions will be:
1836
1837                     reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
1838                     reg9[3] = 1;
1839                     next;
1840
1841
1842              •      A  priority-100  flow which matches on IPv6 Neighbor Dis‐
1843                     covery advertisement packet and applies  the  actions  if
1844                     the option always_learn_from_arp_request is true:
1845
1846                     reg9[2] = lookup_nd(inport, nd.target, nd.tll);
1847                     next;
1848
1849
1850                     If the option always_learn_from_arp_request is false, the
1851                     above actions will be:
1852
1853                     reg9[2] = lookup_nd(inport, nd.target, nd.tll);
1854                     reg9[3] = 1;
1855                     next;
1856
1857
1858              •      A priority-100 flow which matches on IPv6  Neighbor  Dis‐
1859                     covery solicitation packet and applies the actions if the
1860                     option always_learn_from_arp_request is true:
1861
1862                     reg9[2] = lookup_nd(inport, ip6.src, nd.sll);
1863                     next;
1864
1865
1866                     If the option always_learn_from_arp_request is false, the
1867                     above actions will be:
1868
1869                     reg9[2] = lookup_nd(inport, ip6.src, nd.sll);
1870                     reg9[3] = lookup_nd_ip(inport, ip6.src);
1871                     next;
1872
1873
1874              •      A  priority-0  fallback flow that matches all packets and
1875                     applies the action  reg9[2]  =  1;  next;  advancing  the
1876                     packet to the next table.
1877
1878     Ingress Table 2: Neighbor learning
1879
1880       This  table  adds flows to learn the mac bindings from the ARP and IPv6
1881       Neighbor Solicitation/Advertisement packets if it is  needed  according
1882       to the lookup results from the previous stage.
1883
1884       reg9[2] will be 1 if the lookup_arp/lookup_nd in the previous table was
1885       successful or skipped, meaning no need to learn mac  binding  from  the
1886       packet.
1887
1888       reg9[3] will be 1 if the lookup_arp_ip/lookup_nd_ip in the previous ta‐
1889       ble was successful or skipped, meaning it is ok to  learn  mac  binding
1890       from the packet (if reg9[2] is 0).
1891
1892              •      A  priority-100  flow  with  the  match  reg9[2]  == 1 ||
1893                     reg9[3] == 0 and advances the packet to the next table as
1894                     there is no need to learn the neighbor.
1895
1896              •      A  priority-95 flow with the match nd_ns && (ip6.src == 0
1897                     || nd.sll == 0) and applies the action next;
1898
1899              •      A priority-90 flow with the match arp and applies the ac‐
1900                     tion put_arp(inport, arp.spa, arp.sha); next;
1901
1902              •      A  priority-95  flow with the match nd_na  && nd.tll == 0
1903                     and  applies   the   action   put_nd(inport,   nd.target,
1904                     eth.src); next;
1905
1906              •      A  priority-90  flow with the match nd_na and applies the
1907                     action put_nd(inport, nd.target, nd.tll); next;
1908
1909              •      A priority-90 flow with the match nd_ns and  applies  the
1910                     action put_nd(inport, ip6.src, nd.sll); next;
1911
1912              •      A  priority-0  logical  flow that matches all packets not
1913                     already handled (match 1) and drops them (action drop;).
1914
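       The two stages cooperate as in the following sketch: the lookup stage
       only sets the reg9 hint bits, and the learning stage consumes them
       (stage names and priorities are illustrative):

              table=1(lr_in_lookup_neighbor), priority=100,
                match=(arp.op == 2),
                action=(reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
                        next;)
              table=2(lr_in_learn_neighbor), priority=100,
                match=(reg9[2] == 1 || reg9[3] == 0), action=(next;)
              table=2(lr_in_learn_neighbor), priority=90, match=(arp),
                action=(put_arp(inport, arp.spa, arp.sha); next;)
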
1915     Ingress Table 3: IP Input
1916
1917       This table is the core of the logical router datapath functionality. It
1918       contains  the following flows to implement very basic IP host function‐
1919       ality.
1920
              •      For each dnat_and_snat NAT rule on distributed logical
                     routers or gateway routers with the gateway port
                     configured with options:gateway_mtu set to a valid
                     integer value M, a priority-160 flow with the match
                     inport == LRP && REGBIT_PKT_LARGER &&
                     REGBIT_EGRESS_LOOPBACK == 0, where LRP is the logical
                     router port, applies the following action for IPv4 and
                     IPv6 respectively:
1928
1929                     icmp4_error {
1930                         icmp4.type = 3; /* Destination Unreachable. */
1931                         icmp4.code = 4;  /* Frag Needed and DF was Set. */
1932                         icmp4.frag_mtu = M;
1933                         eth.dst = eth.src;
1934                         eth.src = E;
1935                         ip4.dst = ip4.src;
1936                         ip4.src = I;
1937                         ip.ttl = 255;
1938                         REGBIT_EGRESS_LOOPBACK = 1;
                         REGBIT_PKT_LARGER = 0;
1940                         outport = LRP;
1941                         flags.loopback = 1;
1942                         output;
1943                     };
1944                     icmp6_error {
1945                         icmp6.type = 2;
1946                         icmp6.code = 0;
1947                         icmp6.frag_mtu = M;
1948                         eth.dst = eth.src;
1949                         eth.src = E;
1950                         ip6.dst = ip6.src;
1951                         ip6.src = I;
1952                         ip.ttl = 255;
1953                         REGBIT_EGRESS_LOOPBACK = 1;
                         REGBIT_PKT_LARGER = 0;
1955                         outport = LRP;
1956                         flags.loopback = 1;
1957                         output;
1958                     };
1959
1960
1961                     where E and I are the NAT rule external mac  and  IP  re‐
1962                     spectively.
1963
              •      For distributed logical routers or gateway routers with
                     the gateway port configured with options:gateway_mtu set
                     to a valid integer value M, a priority-150 flow with the
                     match inport == LRP && REGBIT_PKT_LARGER &&
                     REGBIT_EGRESS_LOOPBACK == 0, where LRP is the logical
                     router port, applies the following action for IPv4 and
                     IPv6 respectively:
1971
1972                     icmp4_error {
1973                         icmp4.type = 3; /* Destination Unreachable. */
1974                         icmp4.code = 4;  /* Frag Needed and DF was Set. */
1975                         icmp4.frag_mtu = M;
1976                         eth.dst = E;
1977                         ip4.dst = ip4.src;
1978                         ip4.src = I;
1979                         ip.ttl = 255;
1980                         REGBIT_EGRESS_LOOPBACK = 1;
                         REGBIT_PKT_LARGER = 0;
1982                         next(pipeline=ingress, table=0);
1983                     };
1984                     icmp6_error {
1985                         icmp6.type = 2;
1986                         icmp6.code = 0;
1987                         icmp6.frag_mtu = M;
1988                         eth.dst = E;
1989                         ip6.dst = ip6.src;
1990                         ip6.src = I;
1991                         ip.ttl = 255;
1992                         REGBIT_EGRESS_LOOPBACK = 1;
                         REGBIT_PKT_LARGER = 0;
1994                         next(pipeline=ingress, table=0);
1995                     };
1996
1997
1998              •      For  each NAT entry of a distributed logical router (with
1999                     distributed gateway router port(s)) of type snat, a  pri‐
2000                     ority-120 flow with the match inport == P && ip4.src == A
2001                     advances the packet to the next pipeline, where P is  the
2002                     distributed  logical router port corresponding to the NAT
2003                     entry (specified or inferred) and A  is  the  external_ip
2004                     set  in  the  NAT  entry.  If  A is an IPv6 address, then
2005                     ip6.src is used for the match.
2006
                     The above flow is required to handle the routing of the
                     east/west NAT traffic.
2009
              •      For each BFD port the following two priority-110 flows
                     are added to manage BFD traffic:

                     •      if ip4.src or ip6.src is any IP address owned by
                            the router port and udp.dst == 3784, the packet
                            is advanced to the next pipeline stage.

                     •      if ip4.dst or ip6.dst is any IP address owned by
                            the router port and udp.dst == 3784, the
                            handle_bfd_msg action is executed.
2020
              •      L3 admission control: Priority-120 flows allow IGMP and
                     MLD packets if the router has logical ports that have
                     options:mcast_flood=’true’.
2024
2025              •      L3 admission control: A priority-100 flow  drops  packets
2026                     that match any of the following:
2027
                     •      ip4.src[28..31] == 0xe (multicast source)

                     •      ip4.src == 255.255.255.255 (broadcast source)

                     •      ip4.src == 127.0.0.0/8 || ip4.dst == 127.0.0.0/8
                            (localhost source or destination)

                     •      ip4.src == 0.0.0.0/8 || ip4.dst == 0.0.0.0/8
                            (zero network source or destination)

                     •      ip4.src or ip6.src is any IP address owned by the
                            router, unless the packet was recirculated due to
                            egress loopback as indicated by
                            REGBIT_EGRESS_LOOPBACK.

                     •      ip4.src is the broadcast address of any IP
                            network known to the router.
2045
              •      A priority-100 flow parses DHCPv6 replies from IPv6
                     prefix delegation routers (udp.src == 547 && udp.dst ==
                     546). The handle_dhcpv6_reply action is used to send
                     IPv6 prefix delegation messages to the delegation
                     router.
2050
2051              •      ICMP echo reply. These flows reply to ICMP echo  requests
2052                     received  for the router’s IP address. Let A be an IP ad‐
2053                     dress owned by a router port. Then, for each A that is an
2054                     IPv4  address, a priority-90 flow matches on ip4.dst == A
2055                     and icmp4.type == 8 && icmp4.code ==  0  (ICMP  echo  re‐
2056                     quest). For each A that is an IPv6 address, a priority-90
2057                     flow matches on ip6.dst == A and  icmp6.type  ==  128  &&
2058                     icmp6.code  ==  0  (ICMPv6 echo request). The port of the
2059                     router that receives the echo request  does  not  matter.
2060                     Also,  the  ip.ttl  of  the  echo  request  packet is not
2061                     checked, so it complies with RFC 1812,  section  4.2.2.9.
2062                     Flows for ICMPv4 echo requests use the following actions:
2063
2064                     ip4.dst <-> ip4.src;
2065                     ip.ttl = 255;
2066                     icmp4.type = 0;
2067                     flags.loopback = 1;
2068                     next;
2069
2070
2071                     Flows for ICMPv6 echo requests use the following actions:
2072
2073                     ip6.dst <-> ip6.src;
2074                     ip.ttl = 255;
2075                     icmp6.type = 129;
2076                     flags.loopback = 1;
2077                     next;
2078
2079
2080              •      Reply to ARP requests.
2081
2082                     These flows reply to ARP requests for the router’s
2083                     own IP address. ARP requests are handled only if the
2084                     requestor’s IP belongs to the same subnet as the log‐
2085                     ical router port. For each router port P that owns IP
2086                     address A, which belongs to subnet S with prefix
2087                     length L, and has Ethernet address E, a priority-90
2088                     flow matches inport == P && arp.spa == S/L && arp.op
2089                     == 1 && arp.tpa == A (ARP request) with these actions:
2090
2091                     eth.dst = eth.src;
2092                     eth.src = xreg0[0..47];
2093                     arp.op = 2; /* ARP reply. */
2094                     arp.tha = arp.sha;
2095                     arp.sha = xreg0[0..47];
2096                     arp.tpa = arp.spa;
2097                     arp.spa = A;
2098                     outport = inport;
2099                     flags.loopback = 1;
2100                     output;
2101
2102
2103                     For the gateway port  on  a  distributed  logical  router
2104                     (where  one of the logical router ports specifies a gate‐
2105                     way chassis), the above flows are only programmed on  the
2106                     gateway port instance on the gateway chassis. This behav‐
2107                     ior avoids generation of multiple ARP responses from dif‐
2108                     ferent chassis, and allows upstream MAC learning to point
2109                     to the gateway chassis.
2110
2111                     For the logical router port with the option reside-on-re‐
2112                     direct-chassis  set  (which  is  centralized),  the above
2113                     flows are only programmed on the gateway port instance on
2114                     the gateway chassis (if the logical router has a distrib‐
2115                     uted gateway port). This behavior  avoids  generation  of
2116                     multiple ARP responses from different chassis, and allows
2117                     upstream MAC learning to point to the gateway chassis.
2118
2119              •      Reply to IPv6 Neighbor Solicitations. These  flows  reply
2120                     to  Neighbor  Solicitation  requests for the router’s own
2121                     IPv6 address and populate the logical router’s mac  bind‐
2122                     ing table.
2123
2124                     For  each  router  port  P  that owns IPv6 address A, so‐
2125                     licited node address S, and Ethernet address E, a  prior‐
2126                     ity-90  flow  matches  inport == P && nd_ns && ip6.dst ==
2127                     {A, S} && nd.target == A with the following actions:
2128
2129                     nd_na_router {
2130                         eth.src = xreg0[0..47];
2131                         ip6.src = A;
2132                         nd.target = A;
2133                         nd.tll = xreg0[0..47];
2134                         outport = inport;
2135                         flags.loopback = 1;
2136                         output;
2137                     };
2138
2139
2140                     For the gateway port  on  a  distributed  logical  router
2141                     (where  one of the logical router ports specifies a gate‐
2142                     way chassis), the above flows replying to  IPv6  Neighbor
2143                     Solicitations are only programmed on the gateway port in‐
2144                     stance on the gateway chassis. This behavior avoids  gen‐
2145                     eration  of  multiple replies from different chassis, and
2146                     allows upstream MAC learning  to  point  to  the  gateway
2147                     chassis.
2148
2149              •      These flows reply to ARP requests or IPv6 neighbor
2150                     solicitations for the virtual IP addresses configured
2151                     in the router for NAT (both DNAT and SNAT) or load balancing.
2152
2153                     IPv4:  For  a  configured NAT (both DNAT and SNAT) IP ad‐
2154                     dress or a load balancer IPv4 VIP A, for each router port
2155                     P  with  Ethernet  address  E, a priority-90 flow matches
2156                     arp.op == 1 && arp.tpa == A (ARP request) with  the  fol‐
2157                     lowing actions:
2158
2159                     eth.dst = eth.src;
2160                     eth.src = xreg0[0..47];
2161                     arp.op = 2; /* ARP reply. */
2162                     arp.tha = arp.sha;
2163                     arp.sha = xreg0[0..47];
2164                     arp.tpa <-> arp.spa;
2165                     outport = inport;
2166                     flags.loopback = 1;
2167                     output;
2168
2169
2170                     IPv4:  For a configured load balancer IPv4 VIP, a similar
2171                     flow is added with the additional match inport  ==  P  if
2172                     the  VIP is reachable from any logical router port of the
2173                     logical router.
2174
2175                     If the router port P is a distributed gateway router
2176                     port, then is_chassis_resident(P) is also added to the
2177                     match condition for the load balancer IPv4 VIP A.
2178
2179                     IPv6: For a configured NAT (both DNAT and  SNAT)  IP  ad‐
2180                     dress or a load balancer IPv6 VIP A (if the VIP is reach‐
2181                     able from any logical router port of the logical router),
2182                     solicited  node  address  S,  for each router port P with
2183                     Ethernet address E, a priority-90 flow matches inport  ==
2184                     P  &&  nd_ns  && ip6.dst == {A, S} && nd.target == A with
2185                     the following actions:
2186
2187                     eth.dst = eth.src;
2188                     nd_na {
2189                         eth.src = xreg0[0..47];
2190                         nd.tll = xreg0[0..47];
2191                         ip6.src = A;
2192                         nd.target = A;
2193                         outport = inport;
2194                         flags.loopback = 1;
2195                         output;
2196                     }
2197
2198
2199                     If the router port P is a distributed gateway router
2200                     port, then is_chassis_resident(P) is also added to the
2201                     match condition for the load balancer IPv6 VIP A.
2202
2203                     For the gateway port on a distributed logical router with
2204                     NAT  (where  one  of the logical router ports specifies a
2205                     gateway chassis):
2206
2207                     •      If the corresponding NAT rule cannot be handled in
2208                            a  distributed  manner, then a priority-92 flow is
2209                            programmed on the gateway  port  instance  on  the
2210                            gateway  chassis.  A priority-91 drop flow is pro‐
2211                            grammed on the other chassis when ARP  requests/NS
2212                            packets are received on the gateway port. This be‐
2213                            havior avoids generation of multiple ARP responses
2214                            from  different  chassis,  and allows upstream MAC
2215                            learning to point to the gateway chassis.
2216
2217                     •      If the corresponding NAT rule can be handled in  a
2218                            distributed  manner,  then  this flow is only pro‐
2219                            grammed on the gateway  port  instance  where  the
2220                            logical_port specified in the NAT rule resides.
2221
2222                            Some  of  the actions are different for this case,
2223                            using the external_mac specified in the  NAT  rule
2224                            rather than the gateway port’s Ethernet address E:
2225
2226                            eth.src = external_mac;
2227                            arp.sha = external_mac;
2228
2229
2230                            or in the case of IPv6 neighbor solicitation:
2231
2232                            eth.src = external_mac;
2233                            nd.tll = external_mac;
2234
2235
2236                            This  behavior  avoids  generation of multiple ARP
2237                            responses from different chassis, and  allows  up‐
2238                            stream  MAC learning to point to the correct chas‐
2239                            sis.
2240
2241              •      Priority-85 flows that drop ARP and IPv6 Neighbor
2242                     Discovery packets.
2243
2244              •      A priority-84 flow explicitly allows IPv6 multicast traf‐
2245                     fic that is supposed to reach the router pipeline  (i.e.,
2246                     router solicitation and router advertisement packets).
2247
2248              •      A  priority-83 flow explicitly drops IPv6 multicast traf‐
2249                     fic that is destined to reserved multicast groups.
2250
2251              •      A priority-82 flow allows IP  multicast  traffic  if  op‐
2252                     tions:mcast_relay=’true’, otherwise drops it.
2253
2254              •      UDP  port  unreachable.  Priority-80  flows generate ICMP
2255                     port unreachable messages in reply to UDP  datagrams  di‐
2256                     rected  to the router’s IP address, except in the special
2257                     case of gateways, which  accept  traffic  directed  to  a
2258                     router IP for load balancing and NAT purposes.
2259
2260                     These  flows  should  not match IP fragments with nonzero
2261                     offset.
2262
2263              •      TCP reset. Priority-80 flows generate TCP reset mes‐
2264                     sages in reply to TCP segments directed to the
2265                     router’s IP address, except in the special case of
2266                     gateways, which accept traffic directed to a router IP
2267                     for load balancing and NAT purposes.
2268
2269                     These flows should not match IP  fragments  with  nonzero
2270                     offset.
2271
2272              •      Protocol or address unreachable. Priority-70 flows gener‐
2273                     ate ICMP protocol or  address  unreachable  messages  for
2274                     IPv4  and  IPv6 respectively in reply to packets directed
2275                     to the router’s IP address on  IP  protocols  other  than
2276                     UDP,  TCP,  and ICMP, except in the special case of gate‐
2277                     ways, which accept traffic directed to a  router  IP  for
2278                     load balancing purposes.
2279
2280                     These  flows  should  not match IP fragments with nonzero
2281                     offset.
2282
2283              •      Drop other IP traffic to this router.  These  flows  drop
2284                     any  other  traffic  destined  to  an  IP address of this
2285                     router that is not already handled by one  of  the  flows
2286                     above,  which  amounts to ICMP (other than echo requests)
2287                     and fragments with nonzero offsets. For each IP address A
2288                     owned  by  the router, a priority-60 flow matches ip4.dst
2289                     == A or ip6.dst == A and drops the traffic. An  exception
2290                     is  made  and  the  above flow is not added if the router
2291                     port’s own IP address is used  to  SNAT  packets  passing
2292                     through  that  router or if it is used as a load balancer
2293                     VIP.
2294
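       The L3 admission control checks above lend themselves to a compact
       illustration. The following Python sketch models the priority-100
       IPv4 drop classification (an illustrative model, not OVN code;
       router_ips and router_networks are hypothetical inputs):

           # Illustrative model (not OVN code) of the IPv4 drop checks;
           # router_ips: set of ipaddress addresses owned by the router,
           # router_networks: list of ipaddress networks known to it.
           import ipaddress

           def drop_at_admission(src, dst, router_ips, router_networks,
                                 egress_loopback=False):
               s = ipaddress.ip_address(src)
               d = ipaddress.ip_address(dst)
               if s.is_multicast:                 # ip4.src[28..31] == 0xe
                   return True
               if s == ipaddress.ip_address("255.255.255.255"):
                   return True                    # broadcast source
               for cidr in ("127.0.0.0/8", "0.0.0.0/8"):
                   net = ipaddress.ip_network(cidr)
                   if s in net or d in net:       # localhost/zero network
                       return True
               # Router-owned source, unless recirculated by egress
               # loopback (REGBIT_EGRESS_LOOPBACK).
               if s in router_ips and not egress_loopback:
                   return True
               # Broadcast address of any network known to the router.
               return any(s == n.broadcast_address
                          for n in router_networks)
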
2295       The flows above handle all of the traffic that might be directed to the
2296       router  itself.  The following flows (with lower priorities) handle the
2297       remaining traffic, potentially for forwarding:
2298
2299              •      Drop Ethernet local broadcast. A  priority-50  flow  with
2300                     match  eth.bcast drops traffic destined to the local Eth‐
2301                     ernet  broadcast  address.  By  definition  this  traffic
2302                     should not be forwarded.
2303
2304              •      ICMP  time exceeded. For each router port P, whose IP ad‐
2305                     dress is A, a priority-100 flow with match inport == P &&
2306                     ip.ttl  == {0, 1} && !ip.later_frag matches packets whose
2307                     TTL has expired, with the following actions  to  send  an
2308                     ICMP time exceeded reply for IPv4 and IPv6 respectively:
2309
2310                     icmp4 {
2311                         icmp4.type = 11; /* Time exceeded. */
2312                         icmp4.code = 0;  /* TTL exceeded in transit. */
2313                         ip4.dst = ip4.src;
2314                         ip4.src = A;
2315                         ip.ttl = 254;
2316                         next;
2317                     };
2318                     icmp6 {
2319                         icmp6.type = 3; /* Time exceeded. */
2320                         icmp6.code = 0;  /* TTL exceeded in transit. */
2321                         ip6.dst = ip6.src;
2322                         ip6.src = A;
2323                         ip.ttl = 254;
2324                         next;
2325                     };
2326
2327
2328              •      TTL discard. A priority-30 flow with match ip.ttl ==
2329                     {0, 1} and actions drop; drops other packets whose TTL
2330                     has expired and that should not receive an ICMP error
2331                     reply (i.e. fragments with nonzero offset).
2332
2333              •      Next table. A priority-0 flow matches all packets
2334                     that aren’t already handled and uses the action next;
2335                     to feed them to the next table.
2336
2337     Ingress Table 4: UNSNAT
2338
2339       This table handles reverse traffic for already established connec‐
2340       tions: SNAT was already done in the egress pipeline, and the packet
2341       has now re-entered the ingress pipeline as a reply. It is unSNATted here.
2342
2343       Ingress Table 4: UNSNAT on Gateway and Distributed Routers
2344
2345              •      If the router (Gateway or Distributed) is configured
2346                     with load balancers, then the logical flows below
2347                     are added:
2348
2349                     For each IPv4 address A defined as a load balancer
2350                     VIP with protocol P (and protocol port T, if
2351                     defined) that is also present as an external_ip in
2352                     the NAT table, a priority-120 logical flow is added
2353                     with the match ip4 && ip4.dst == A && P (plus P.dst
2354                     == T when T is defined) and the action next;.
2355
2356                     The above flows are also added for IPv6 load balancers.
2357
2358       Ingress Table 4: UNSNAT on Gateway Routers
2359
2360              •      If the Gateway router has been configured to force
2361                     SNAT any previously DNATted packets to B, a
2362                     priority-110 flow matches ip && ip4.dst == B or ip &&
2363                     ip6.dst == B with an action ct_snat;.
2364
2365                     If the Gateway router is configured with
2366                     lb_force_snat_ip=router_ip, then for every logical
2367                     router port P attached to the Gateway router with
2368                     router IP B, a priority-110 flow is added with the
2369                     match inport == P && ip4.dst == B or inport == P &&
2370                     ip6.dst == B with an action ct_snat;.
2371
2372                     If the Gateway router has been configured to force
2373                     SNAT any previously load-balanced packets to B, a
2374                     priority-100 flow matches ip && ip4.dst == B or ip &&
2375                     ip6.dst == B with an action ct_snat;.
2376
2377                     For each NAT configuration in the OVN Northbound
2378                     database that asks to change the source IP address of
2379                     a packet from A to B, a priority-90 flow matches ip &&
2380                     ip4.dst == B or ip && ip6.dst == B with an action
2381                     ct_snat;. If the NAT rule is of type dnat_and_snat and
2382                     has stateless=true in the options, then the action is
2383                     next; instead (see the Python sketch below).
2384
2385                     A priority-0 logical flow with match 1 has actions next;.
2386
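       As a hedged illustration of the per-NAT flow derivation described
       above, the following Python sketch builds the priority-90 UNSNAT
       matches and actions from a list of NAT entries (NatEntry and the
       output tuple format are hypothetical; only the match/action text
       mirrors this section):

           from collections import namedtuple

           NatEntry = namedtuple("NatEntry",
                                 "type external_ip is_v6 stateless")

           def unsnat_flows(nat_entries):
               flows = []
               for nat in nat_entries:
                   field = "ip6.dst" if nat.is_v6 else "ip4.dst"
                   match = "ip && %s == %s" % (field, nat.external_ip)
                   if nat.type == "dnat_and_snat" and nat.stateless:
                       action = "next;"      # stateless: skip conntrack
                   else:
                       action = "ct_snat;"   # unSNAT via conntrack
                   flows.append((90, match, action))
               flows.append((0, "1", "next;"))  # default: advance
               return flows

           print(unsnat_flows([NatEntry("snat", "172.16.1.1",
                                        False, False)]))
           # [(90, 'ip && ip4.dst == 172.16.1.1', 'ct_snat;'),
           #  (0, '1', 'next;')]
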
2387       Ingress Table 4: UNSNAT on Distributed Routers
2388
2389              •      For each configuration in the OVN Northbound database
2390                     that asks to change the source IP address of a packet
2391                     from A to B, two priority-100 flows are added.
2392
2393                     If the NAT rule cannot be handled in a  distributed  man‐
2394                     ner,  then  the  below  priority-100  flows are only pro‐
2395                     grammed on the gateway chassis.
2396
2397                     •      The first flow matches ip && ip4.dst == B &&
2398                            inport == GW && flags.loopback == 0 or the
2399                            same with ip6.dst == B in the IPv6 case, where
2400                            GW is the distributed gateway port correspond‐
2401                            ing to the NAT rule (specified or inferred),
2402                            with an action ct_snat_in_czone; to unSNAT in
2403                            the common zone. If the NAT rule is of type
2404                            dnat_and_snat and has stateless=true in the
2405                            options, then the action is next; instead.
2406
2407                            If the NAT entry is of type snat, there is an
2408                            additional match is_chassis_resident(cr-GW),
2409                            where cr-GW is the chassis resident port of GW.
2410
2411                     •      The second flow matches ip && ip4.dst == B &&
2412                            inport == GW && flags.loopback == 1 &&
2413                            flags.use_snat_zone == 1 or the same with
2414                            ip6.dst == B in the IPv6 case, where GW is the
2415                            distributed gateway port corresponding to the
2416                            NAT rule (specified or inferred), with an ac‐
2417                            tion ct_snat; to unSNAT in the snat zone. If
2418                            the NAT rule is of type dnat_and_snat and has
2419                            stateless=true in the options, then the action
2420                            is ip4/6.dst=(B) instead.
2421
2422                            If the NAT entry is of type snat, there is an
2423                            additional match is_chassis_resident(cr-GW),
2424                            where cr-GW is the chassis resident port of GW.
2425
2426                     A priority-0 logical flow with match 1 has actions next;.
2427
2428     Ingress Table 5: DEFRAG
2429
2430       This table sends packets to the connection tracker for tracking and
2431       defragmentation. It contains a priority-0 flow that simply moves
2432       traffic to the next table.
2433
2434       If  load  balancing rules with only virtual IP addresses are configured
2435       in OVN_Northbound database for a Gateway router, a priority-100 flow is
2436       added  for  each  configured  virtual IP address VIP. For IPv4 VIPs the
2437       flow matches ip && ip4.dst == VIP. For IPv6 VIPs, the flow  matches  ip
2438       && ip6.dst == VIP. The flow applies the action reg0 = VIP; ct_dnat; (or
2439       xxreg0 for IPv6) to send IP  packets  to  the  connection  tracker  for
2440       packet  de-fragmentation and to dnat the destination IP for the commit‐
2441       ted connection before sending it to the next table.
2442
2443       If load balancing rules with virtual IP addresses and ports are config‐
2444       ured  in  OVN_Northbound  database for a Gateway router, a priority-110
2445       flow is added for each configured  virtual  IP  address  VIP,  protocol
2446       PROTO  and  port  PORT. For IPv4 VIPs the flow matches ip && ip4.dst ==
2447       VIP && PROTO && PROTO.dst == PORT. For IPv6 VIPs, the flow  matches  ip
2448       &&  ip6.dst  == VIP && PROTO && PROTO.dst == PORT. The flow applies the
2449       action reg0 = VIP; reg9[16..31] = PROTO.dst; ct_dnat;  (or  xxreg0  for
2450       IPv6)  to send IP packets to the connection tracker for packet de-frag‐
2451       mentation and to dnat the destination IP for the  committed  connection
2452       before sending it to the next table.
2453
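       The match strings used by these DEFRAG flows follow directly from
       the VIP definition. A minimal sketch, assuming a hypothetical
       build_defrag_flow helper (an illustration of the text above, not
       ovn-northd code):

           def build_defrag_flow(vip, proto=None, port=None, is_v6=False):
               dst = "ip6.dst" if is_v6 else "ip4.dst"
               reg = "xxreg0" if is_v6 else "reg0"
               if proto and port:   # VIP with protocol and port
                   match = "ip && %s == %s && %s && %s.dst == %s" % (
                       dst, vip, proto, proto, port)
                   action = "%s = %s; reg9[16..31] = %s.dst; ct_dnat;" % (
                       reg, vip, proto)
                   return (110, match, action)
               match = "ip && %s == %s" % (dst, vip)   # VIP only
               return (100, match, "%s = %s; ct_dnat;" % (reg, vip))

           print(build_defrag_flow("30.0.0.1", "tcp", 80))
           # (110, 'ip && ip4.dst == 30.0.0.1 && tcp && tcp.dst == 80',
           #  'reg0 = 30.0.0.1; reg9[16..31] = tcp.dst; ct_dnat;')
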
2454       If  ECMP  routes  with symmetric reply are configured in the OVN_North‐
2455       bound database for a gateway router, a priority-100 flow is  added  for
2456       each  router port on which symmetric replies are configured. The match‐
2457       ing logic for these ports essentially reverses the configured logic  of
2458       the  ECMP  route.  So  for instance, a route with a destination routing
2459       policy will instead match if the source IP address matches  the  static
2460       route’s prefix. The flow uses the actions chk_ecmp_nh_mac(); ct_next
2461       or chk_ecmp_nh(); ct_next to send IP packets to table 76 or table
2462       77, in order to check whether source info is already stored by OVN,
2463       and then to the connection tracker for packet de-fragmentation and
2464       tracking before sending them to the next table.
2465
2466       If load balancing rules are configured in the OVN_Northbound data‐
2467       base for a Gateway router, a priority-50 flow that matches icmp ||
2468       icmp6 with an action of ct_dnat; is added. This allows potentially
2469       related ICMP traffic to pass through CT.
2470
2471     Ingress Table 6: Load balancing affinity check
2472
2473       Load balancing affinity check  table  contains  the  following  logical
2474       flows:
2475
2476              •      For all the configured load balancing rules for a
2477                     logical router where a positive affinity timeout is
2478                     specified in the options column, that include an L4
2479                     port PORT of protocol P and IPv4 or IPv6 address VIP,
2480                     a priority-100 flow is added that matches on ct.new &&
2481                     ip && reg0 == VIP && P && reg9[16..31] == PORT (xxreg0
2482                     == VIP in the IPv6 case) with an action of reg9[6] =
2483                     chk_lb_aff(); next; (see the sketch below).
2484
2485              •      A  priority  0 flow is added which matches on all packets
2486                     and applies the action next;.
2487
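       A toy model of the affinity check/learn pair may help: the check
       stage asks whether a client already has a chosen backend for a
       VIP, and the learn stage (Ingress Table 8 below) commits one with
       timeout T. This is an illustrative sketch, not OVN code; the table
       and helpers are hypothetical, and keying by client address is a
       plausible simplification of what chk_lb_aff/commit_lb_aff track:

           import time

           # (client_ip, vip, port) -> (backend, expiry); hypothetical
           # stand-in for the affinity state OVN keeps in the datapath.
           affinity = {}

           def chk_lb_aff(client_ip, vip, port):
               entry = affinity.get((client_ip, vip, port))
               if entry and entry[1] > time.monotonic():
                   return entry[0]   # reg9[6] = 1: reuse this backend
               return None           # reg9[6] = 0: pick a new backend

           def commit_lb_aff(client_ip, vip, port, backend, timeout):
               affinity[(client_ip, vip, port)] = (
                   backend, time.monotonic() + timeout)
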
2488     Ingress Table 7: DNAT
2489
2490       Packets enter the pipeline with a destination IP address that needs
2491       to be DNATted from a virtual IP address to a real IP address. Pack‐
2492       ets in the reverse direction need to be unDNATted.
2493
2494       Ingress Table 7: Load balancing DNAT rules
2495
2496       The following load balancing DNAT flows are added for a Gateway
2497       router or a Router with a gateway port. These flows are programmed
2498       only on the gateway chassis. These flows do not get programmed for
2499       load balancers with IPv6 VIPs.
2500
2501              •      For all the configured load balancing rules for a
2502                     logical router where a positive affinity timeout is
2503                     specified in the options column, that include an L4
2504                     port PORT of protocol P and IPv4 or IPv6 address VIP,
2505                     a priority-150 flow that matches on reg9[6] == 1 &&
2506                     ct.new && ip && reg0 == VIP && P && reg9[16..31] ==
2507                     PORT (xxreg0 == VIP in the IPv6 case) with an action
2508                     of ct_lb_mark(args), where args contains comma-sepa‐
2509                     rated IP addresses (and optional port numbers) to load
2510                     balance to. The address family of the IP addresses of
2511                     args is the same as the address family of VIP.
2512
2513              •      If controller_event has been enabled for all the
2514                     configured load balancing rules for a Gateway router
2515                     or Router with gateway port in the OVN_Northbound
2516                     database that do not have configured backends, a
2517                     priority-130 flow is added to trigger ovn-controller
2518                     events whenever the chassis receives a packet for
2519                     that particular VIP. If the event-elb meter has been
2520                     created, it is associated with the empty_lb logical flow.
2521
2522              •      For all the configured load balancing rules for a
2523                     Gateway router or Router with gateway port in the
2524                     OVN_Northbound database that include an L4 port PORT
2525                     of protocol P and IPv4 or IPv6 address VIP, a prior‐
2526                     ity-120 flow that matches on ct.new && !ct.rel && ip
2527                     && reg0 == VIP && P && reg9[16..31] == PORT (xxreg0 ==
2528                     VIP in the IPv6 case) with an action of
2529                     ct_lb_mark(args), where args contains comma-separated
2530                     IPv4 or IPv6 addresses (and optional port numbers) to
2531                     load balance to. If the router is configured to force
2532                     SNAT any load-balanced packets, the above action will
2533                     be replaced by flags.force_snat_for_lb = 1;
2534                     ct_lb_mark(args);. If the load balancing rule is con‐
2535                     figured with skip_snat set to true, the above action
2536                     will be replaced by flags.skip_snat_for_lb = 1;
2537                     ct_lb_mark(args);. If health check is enabled, then
2538                     args will only contain those endpoints whose service
2539                     monitor status entry in the OVN_Southbound db is
2540                     either online or empty.
2541
2542                     The previous table lr_in_defrag sets  the  register  reg0
2543                     (or  xxreg0  for IPv6) and does ct_dnat. Hence for estab‐
2544                     lished traffic, this table just advances  the  packet  to
2545                     the next stage.
2546
2547              •      For all the configured load balancing rules for a
2548                     router in the OVN_Northbound database that include an
2549                     L4 port PORT of protocol P and IPv4 or IPv6 address
2550                     VIP, a priority-120 flow that matches on ct.est &&
2551                     !ct.rel && ip4 && reg0 == VIP && P && reg9[16..31] ==
2552                     PORT (ip6 and xxreg0 == VIP in the IPv6 case) with an
2553                     action of next;. If the router is configured to force
2554                     SNAT any load-balanced packets, the above action will
2555                     be replaced by flags.force_snat_for_lb = 1; next;. If
2556                     the load balancing rule is configured with skip_snat
2557                     set to true, the above action will be replaced by
2558                     flags.skip_snat_for_lb = 1; next;.
2559
2560                     The  previous  table  lr_in_defrag sets the register reg0
2561                     (or xxreg0 for IPv6) and does ct_dnat. Hence  for  estab‐
2562                     lished  traffic,  this  table just advances the packet to
2563                     the next stage.
2564
2565              •      For all the configured load balancing rules for a  router
2566                     in  OVN_Northbound  database that includes just an IP ad‐
2567                     dress VIP to match on, a priority-110 flow  that  matches
2568                     on  ct.new  &&  !ct.rel  &&  ip4  && reg0 == VIP (ip6 and
2569                     xxreg0 == VIP  in  the  IPv6  case)  with  an  action  of
2570                     ct_lb_mark(args),  where  args  contains  comma separated
2571                     IPv4 or IPv6 addresses. If the router  is  configured  to
2572                     force  SNAT  any  load-balanced packets, the above action
2573                     will  be  replaced  by   flags.force_snat_for_lb   =   1;
2574                     ct_lb_mark(args);.  If the load balancing rule is config‐
2575                     ured with skip_snat set to true, the above action will be
2576                     replaced      by      flags.skip_snat_for_lb     =     1;
2577                     ct_lb_mark(args);.
2578
2579                     The previous table lr_in_defrag sets  the  register  reg0
2580                     (or  xxreg0  for IPv6) and does ct_dnat. Hence for estab‐
2581                     lished traffic, this table just advances  the  packet  to
2582                     the next stage.
2583
2584              •      For  all the configured load balancing rules for a router
2585                     in OVN_Northbound database that includes just an  IP  ad‐
2586                     dress  VIP  to match on, a priority-110 flow that matches
2587                     on ct.est && !ct.rel && ip4 && reg0 == VIP  (or  ip6  and
2588                     xxreg0  == VIP) with an action of next;. If the router is
2589                     configured to force SNAT any load-balanced  packets,  the
2590                     above  action will be replaced by flags.force_snat_for_lb
2591                     = 1; next;. If the load balancing rule is configured with
2592                     skip_snat  set to true, the above action will be replaced
2593                     by flags.skip_snat_for_lb = 1; next;.
2594
2595                     The previous table lr_in_defrag sets  the  register  reg0
2596                     (or  xxreg0  for IPv6) and does ct_dnat. Hence for estab‐
2597                     lished traffic, this table just advances  the  packet  to
2598                     the next stage.
2599
2600              •      If the load balancer is created with the --reject
2601                     option and it has no active backends, a TCP reset
2602                     segment (for tcp) or an ICMP port unreachable packet
2603                     (for all other kinds of traffic) will be sent when‐
2604                     ever an incoming packet is received for this load
2605                     balancer. Note that using the --reject option
2606                     disables the empty_lb SB controller event for it.
2607
2608              •      For related traffic, a priority-50 flow that matches
2609                     ct.rel && !ct.est && !ct.new with an action of
2610                     ct_commit_nat; is added if the router has a load bal‐
2611                     ancer assigned to it, along with two priority-70 flows
2612                     that match the skip_snat and force_snat flags.
2613
2614       Ingress Table 7: DNAT on Gateway Routers
2615
2616              •      For each configuration in the OVN Northbound data‐
2617                     base that asks to change the destination IP address
2618                     of a packet from A to B, a priority-100 flow matches
2619                     ip && ip4.dst == A or ip && ip6.dst == A with an ac‐
2620                     tion flags.loopback = 1; ct_dnat(B);. If the Gateway
2621                     router is configured to force SNAT any DNATed packet,
2622                     the action is instead flags.force_snat_for_dnat = 1;
2623                     flags.loopback = 1; ct_dnat(B);. If the NAT rule is
2624                     of type dnat_and_snat and has stateless=true in the
2625                     options, the action is ip4/6.dst=(B) (sketch below).
2626
2627                     If the NAT rule has allowed_ext_ips configured, then
2628                     there is an additional match ip4.src ==
2629                     allowed_ext_ips. Similarly, for IPv6, the match would
2630                     be ip6.src == allowed_ext_ips.
2631
2632                     If the NAT rule has exempted_ext_ips set, then there
2633                     is an additional flow configured at priority 101. The
2634                     flow matches if the source IP is an exempted_ext_ip,
2635                     and the action is next;. This flow bypasses the
2636                     ct_dnat action for packets from exempted_ext_ips.
2637
2638              •      A priority-0 logical flow with match 1 has actions next;.
2639
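       As a sketch of the flow derivation just described (dnat_flows is a
       hypothetical helper; only the match/action text mirrors this
       section):

           def dnat_flows(a, b, is_v6=False, force_snat=False,
                          stateless=False, allowed_ext_ips=None,
                          exempted_ext_ips=None):
               ipv = "ip6" if is_v6 else "ip4"
               match = "ip && %s.dst == %s" % (ipv, a)
               if allowed_ext_ips:
                   match += " && %s.src == %s" % (ipv, allowed_ext_ips)
               if stateless:
                   action = "%s.dst=(%s);" % (ipv, b)
               elif force_snat:
                   action = ("flags.force_snat_for_dnat = 1; "
                             "flags.loopback = 1; ct_dnat(%s);" % b)
               else:
                   action = "flags.loopback = 1; ct_dnat(%s);" % b
               flows = [(100, match, action)]
               if exempted_ext_ips:  # bypass ct_dnat for exempted sources
                   flows.append((101, "ip && %s.dst == %s && %s.src == %s"
                                 % (ipv, a, ipv, exempted_ext_ips),
                                 "next;"))
               return flows
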
2640       Ingress Table 7: DNAT on Distributed Routers
2641
2642       On distributed routers, the DNAT table only handles packets with a
2643       destination IP address that needs to be DNATted from a virtual IP
2644       address to a real IP address. The unDNAT processing in the reverse
2645       direction is handled in a separate table in the egress pipeline.
2646
2647              •      For each configuration in the OVN Northbound database
2648                     that  asks  to  change  the  destination  IP address of a
2649                     packet from A to B, a priority-100  flow  matches  ip  &&
2650                     ip4.dst  ==  B  &&  inport == GW, where GW is the logical
2651                     router gateway port corresponding to the NAT rule (speci‐
2652                     fied  or inferred), with an action ct_dnat(B);. The match
2653                     will include ip6.dst == B in the IPv6 case.  If  the  NAT
2654                     rule  is  of type dnat_and_snat and has stateless=true in
2655                     the options, then the action would be ip4/6.dst=(B).
2656
2657                     If the NAT rule cannot be handled in a  distributed  man‐
2658                     ner,  then the priority-100 flow above is only programmed
2659                     on the gateway chassis.
2660
2661                     If the NAT rule has allowed_ext_ips configured, then
2662                     there is an additional match ip4.src ==
2663                     allowed_ext_ips. Similarly, for IPv6, the match would
2664                     be ip6.src == allowed_ext_ips.
2665
2666                     If the NAT rule has exempted_ext_ips set, then there
2667                     is an additional flow configured at priority 101. The
2668                     flow matches if the source IP is an exempted_ext_ip,
2669                     and the action is next;. This flow bypasses the
2670                     ct_dnat action for packets from exempted_ext_ips.
2671
2672                     A priority-0 logical flow with match 1 has actions next;.
2673
2674     Ingress Table 8: Load balancing affinity learn
2675
2676       Load  balancing  affinity  learn  table  contains the following logical
2677       flows:
2678
2679              •      For all the configured load balancing rules for a
2680                     logical router where a positive affinity timeout T is
2681                     specified in the options column, that include an L4
2682                     port PORT of protocol P and IPv4 or IPv6 address VIP,
2683                     a priority-100 flow that matches on reg9[6] == 0 &&
2684                     ct.new && ip && reg0 == VIP && P && reg9[16..31] ==
2685                     PORT (xxreg0 == VIP in the IPv6 case) with an action
2686                     of commit_lb_aff(vip = VIP:PORT, backend = backend
2687                     ip: backend port, proto = P, timeout = T);. This
2688                     stores the chosen backend for subsequent connections.
2689
2690              •      A priority 0 flow is added which matches on  all  packets
2691                     and applies the action next;.
2692
2693     Ingress Table 9: ECMP symmetric reply processing
2694
2695              •      If ECMP routes with symmetric reply are configured
2696                     in the OVN_Northbound database for a gateway router,
2697                     a priority-100 flow is added for each router port on
2698                     which symmetric replies are configured. The matching
2699                     logic for these ports essentially reverses the con‐
2700                     figured logic of the ECMP route: a route with a des‐
2701                     tination routing policy will instead match if the
2702                     source IP address matches the static route’s prefix
2703                     (see the sketch below). The flow uses the action
2704                     ct_commit { ct_label.ecmp_reply_eth = eth.src;
2705                     ct_mark.ecmp_reply_port = K; }; commit_ecmp_nh();
2706                     next; to commit the connection, storing eth.src and
2707                     the ECMP reply port binding tunnel key K in ct_label
2708                     and ct_mark, and the traffic pattern to table 76 or 77.
2709
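       The "reversed" match can be sketched directly: a route configured
       for destination prefix N/M is matched against the source address
       of the reply instead (reverse_match is a hypothetical helper):

           import ipaddress

           def reverse_match(route_prefix):
               net = ipaddress.ip_network(route_prefix)
               field = "ip6.src" if net.version == 6 else "ip4.src"
               return "%s == %s" % (field, net)

           print(reverse_match("192.0.2.0/24"))  # ip4.src == 192.0.2.0/24
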
2710     Ingress Table 10: IPv6 ND RA option processing
2711
2712              •      A priority-50 logical flow is added for each logical
2713                     router port configured with IPv6 ND RA options, which
2714                     matches IPv6 ND Router Solicitation packets, applies
2715                     the action put_nd_ra_opts, and advances the packet to
2716                     the next table.
2717
2718                     reg0[5] = put_nd_ra_opts(options); next;
2719
2720
2721                     For a valid IPv6 ND RS packet, this transforms the
2722                     packet into an IPv6 ND RA reply, sets the RA options
2723                     on the packet, and stores 1 into reg0[5]. For other
2724                     kinds of packets, it just stores 0 into reg0[5]. Ei‐
2725                     ther way, it continues to the next table.
2726
2727              •      A priority-0 logical flow with match 1 has actions next;.
2728
2729     Ingress Table 11: IPv6 ND RA responder
2730
2731       This  table  implements IPv6 ND RA responder for the IPv6 ND RA replies
2732       generated by the previous table.
2733
2734              •      A priority-50 logical flow  is  added  for  each  logical
2735                     router  port  configured  with  IPv6  ND RA options which
2736                     matches IPv6 ND RA packets and reg0[5] == 1 and  responds
2737                     back  to  the  inport  after  applying  these actions. If
2738                     reg0[5]  is  set  to  1,  it  means   that   the   action
2739                     put_nd_ra_opts was successful.
2740
2741                     eth.dst = eth.src;
2742                     eth.src = E;
2743                     ip6.dst = ip6.src;
2744                     ip6.src = I;
2745                     outport = P;
2746                     flags.loopback = 1;
2747                     output;
2748
2749
2750                     where  E  is the MAC address and I is the IPv6 link local
2751                     address of the logical router port.
2752
2753                     (This terminates packet processing in  ingress  pipeline;
2754                     the packet does not go to the next ingress table.)
2755
2756              •      A priority-0 logical flow with match 1 has actions next;.
2757
2758     Ingress Table 12: IP Routing Pre
2759
2760       If a packet arrives at this table from a Logical Router Port P that
2761       has the options:route_table value set, a priority-100 logical flow
2762       with match inport == "P" sets a uniquely generated, per-datapath,
2763       non-zero 32-bit value in OVS register 7, which the next table
2764       checks. If the packet did not match any configured inport (the
2765       <main> route table), register 7 is set to 0 (see the sketch below).
2766
2767       This table contains the following logical flows:
2768
2769              •      A priority-100 flow with match inport == "LRP_NAME"
2770                     and an action that sets the route table identifier in reg7.
2771
2772                     A priority-0 logical flow with match 1 has actions
2773                     reg7 = 0; next;.
2774
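       A minimal sketch of the identifier assignment mentioned above
       (hypothetical helper and state; ovn-northd’s actual id generation
       is internal):

           route_table_ids = {}   # per-datapath: table name -> id

           def route_table_id(name):
               """Return a unique non-zero id per named route table;
               the unnamed <main> table is always 0."""
               if not name:
                   return 0
               if name not in route_table_ids:
                   route_table_ids[name] = len(route_table_ids) + 1
               return route_table_ids[name]
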
2775     Ingress Table 13: IP Routing
2776
2777       A packet that arrives at this table is an  IP  packet  that  should  be
2778       routed  to  the address in ip4.dst or ip6.dst. This table implements IP
2779       routing, setting reg0 (or xxreg0 for IPv6) to the next-hop  IP  address
2780       (leaving ip4.dst or ip6.dst, the packet’s final destination, unchanged)
2781       and advances to the next table for ARP resolution. It  also  sets  reg1
2782       (or  xxreg1)  to  the  IP  address  owned  by  the selected router port
2783       (ingress table ARP Request will generate an  ARP  request,  if  needed,
2784       with  reg0 as the target protocol address and reg1 as the source proto‐
2785       col address).
2786
2787       For ECMP routes, i.e. multiple static routes with the same policy
2788       and prefix but different nexthops, the above actions are deferred
2789       to the next table. This table, instead, is responsible for deter‐
2790       mining the ECMP group id and selecting a member id within the
2791       group based on 5-tuple hashing (sketched below). It stores the
2792       group id in reg8[0..15] and the member id in reg8[16..31]. This
2793       step is skipped with a priority-10300 rule if the traffic going
2794       out the ECMP route is reply traffic and the ECMP route was config‐
2795       ured to use symmetric replies. Instead, the values stored in conn‐
2796       track are used to choose the destination: ct_label.ecmp_reply_eth
2797       tells the destination MAC address to which the packet should be
2798       sent, and ct_mark.ecmp_reply_port tells the logical router port on
2799       which the packet should be sent. These values are saved to conn‐
2800       track when the initial ingress traffic is received over the ECMP
2801       route. If REGBIT_KNOWN_ECMP_NH is set, the priority-10300 flows in
2802       this stage set the outport, while eth.dst is set at ARP/ND Resolution.
2803
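       The member selection can be pictured as hashing the 5-tuple into
       one of the member ids, in the spirit of select(reg8[16..31], MID1,
       MID2, ...). The sketch below is illustrative only; the real
       datapath hash is implementation-defined:

           def ecmp_member(src_ip, dst_ip, proto, sport, dport,
                           member_ids):
               # Stand-in 5-tuple hash; picks one member id.
               h = hash((src_ip, dst_ip, proto, sport, dport))
               return member_ids[h % len(member_ids)]

           print(ecmp_member("10.0.0.5", "203.0.113.7", "tcp",
                             33412, 443, [1, 2, 3]))
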
2804       This table contains the following logical flows:
2805
2806              •      Priority-10550  flow  that  drops  IPv6  Router Solicita‐
2807                     tion/Advertisement packets that  were  not  processed  in
2808                     previous tables.
2809
2810              •      Priority-10550  flows that drop IGMP and MLD packets with
2811                     source MAC address owned by the router. These are used to
2812                     prevent looping statically forwarded IGMP and MLD packets
2813                     for which TTL is not decremented (it is always 1).
2814
2815              •      Priority-10500 flows that match IP multicast traffic
2816                     destined to groups registered on any of the attached
2817                     switches and set outport to the associated multicast
2818                     group that will eventually flood the traffic to all
2819                     interested attached logical switches. The flows also
2820                     decrement TTL.
2821
2822              •      Priority-10460 flows that match IGMP and MLD control
2823                     packets and set outport to the MC_STATIC multicast
2824                     group, which ovn-northd populates with the logical
2825                     ports that have options:mcast_flood=’true’. If no
2826                     router ports are configured to flood multicast
2827                     traffic, the packets are dropped.
2828
2829              •      A priority-10450 flow that matches unregistered IP
2830                     multicast traffic, decrements TTL, and sets outport
2831                     to the MC_STATIC multicast group, which ovn-northd
2832                     populates with the logical ports that have op‐
2833                     tions:mcast_flood=’true’. If no router ports are con‐
2834                     figured to flood multicast traffic, the packets are dropped.
2835
2836              •      IPv4 routing table. For each route to IPv4 network N
2837                     with netmask M, on router port P with IP address A and
2838                     Ethernet address E, a logical flow with match ip4.dst
2839                     == N/M, whose priority is the number of 1-bits in M
2840                     (see the sketch after this list), has these actions:
2841
2842                     ip.ttl--;
2843                     reg8[0..15] = 0;
2844                     reg0 = G;
2845                     reg1 = A;
2846                     eth.src = E;
2847                     outport = P;
2848                     flags.loopback = 1;
2849                     next;
2850
2851
2852                     (Ingress table 1 already verified that ip.ttl--; will not
2853                     yield a TTL exceeded error.)
2854
2855                     If the route has a gateway, G is the gateway IP
2856                     address; if the route comes from a configured static
2857                     route, G is the next hop IP address; else G is ip4.dst.
2858
2859              •      IPv6 routing table. For each route to IPv6 network N with
2860                     netmask  M, on router port P with IP address A and Ether‐
2861                     net address E, a logical flow with match in CIDR notation
2862                     ip6.dst == N/M, whose priority is the integer value of M,
2863                     has the following actions:
2864
2865                     ip.ttl--;
2866                     reg8[0..15] = 0;
2867                     xxreg0 = G;
2868                     xxreg1 = A;
2869                     eth.src = E;
2870                     outport = P;
2871                     flags.loopback = 1;
2872                     next;
2873
2874
2875                     (Ingress table 1 already verified that ip.ttl--; will not
2876                     yield a TTL exceeded error.)
2877
2878                     If the route has a gateway, G is the gateway IP
2879                     address; if the route comes from a configured static
2880                     route, G is the next hop IP address; else G is ip6.dst.
2881
2882                     If  the  address  A is in the link-local scope, the route
2883                     will be limited to sending on the ingress port.
2884
2885                     For each static route, the logical flow match por‐
2886                     tion is prefixed with reg7 == id &&. For routes with
2887                     a route_table value set, a unique non-zero id is
2888                     used. For routes within the <main> route table (no
2889                     route table set), this id value is 0.
2890
2891                     For each connected route (a route to the LRP’s sub‐
2892                     net CIDR), the match has no reg7 == id && prefix, so
2893                     routes to LRP subnets appear in all routing tables.
2894
2895              •      ECMP routes are grouped by policy and prefix. A
2896                     unique non-zero id is assigned to each group, and each
2897                     member is also assigned a unique non-zero id within
2898                     its group.
2899
2900                     For  each IPv4/IPv6 ECMP group with group id GID and mem‐
2901                     ber ids MID1, MID2, ..., a logical  flow  with  match  in
2902                     CIDR  notation  ip4.dst  == N/M, or ip6.dst == N/M, whose
2903                     priority is the integer value of M, has the following ac‐
2904                     tions:
2905
2906                     ip.ttl--;
2907                     flags.loopback = 1;
2908                     reg8[0..15] = GID;
2909                     select(reg8[16..31], MID1, MID2, ...);
2910
2911
2912              •      A  priority-0  logical  flow that matches all packets not
2913                     already handled (match 1) and drops them (action drop;).
2914
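       The routing-table priorities above reduce to the prefix length:
       the number of 1-bits in an IPv4 netmask M, or the integer value of
       M for IPv6, so longer (more specific) prefixes always win. A
       one-line check in Python:

           import ipaddress

           def route_priority(prefix):
               # Number of 1-bits in the netmask == prefix length.
               return ipaddress.ip_network(prefix).prefixlen

           print(route_priority("10.0.0.0/24"))    # 24
           print(route_priority("2001:db8::/64"))  # 64
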
2915     Ingress Table 14: IP_ROUTING_ECMP
2916
2917       This table implements the second part of IP  routing  for  ECMP  routes
2918       following the previous table. If a packet matched an ECMP group in the
2919       previous table, this table matches the group id and  member  id  stored
2920       from the previous table, setting reg0 (or xxreg0 for IPv6) to the next-
2921       hop IP address (leaving ip4.dst or ip6.dst, the packet’s final destina‐
2922       tion,  unchanged) and advances to the next table for ARP resolution. It
2923       also sets reg1 (or xxreg1) to the IP  address  owned  by  the  selected
2924       router port (ingress table ARP Request will generate an ARP request, if
2925       needed, with reg0 as the target protocol address and reg1 as the source
2926       protocol address).
2927
2928       This  processing is skipped for reply traffic being sent out of an ECMP
2929       route if the route was configured to use symmetric replies.
2930
2931       This table contains the following logical flows:
2932
2933              •      A priority-150 flow that matches reg8[0..15]  ==  0  with
2934                     action   next;  directly  bypasses  packets  of  non-ECMP
2935                     routes.
2936
2937              •      For each member with ID MID in each ECMP  group  with  ID
2938                     GID, a priority-100 flow with match reg8[0..15] == GID &&
2939                     reg8[16..31] == MID has the following actions:
2940
2941                     [xx]reg0 = G;
2942                     [xx]reg1 = A;
2943                     eth.src = E;
2944                     outport = P;
2945
2946
2947              •      A priority-0 logical flow that matches  all  packets  not
2948                     already handled (match 1) and drops them (action drop;).
2949
2950     Ingress Table 15: Router policies
2951
2952       This table adds flows for the logical router policies configured on the
2953       logical  router.  Please  see   the   OVN_Northbound   database   Logi‐
2954       cal_Router_Policy table documentation in ovn-nb for supported actions.
2955
2956              •      For  each router policy configured on the logical router,
2957                     a logical flow is added with  specified  priority,  match
2958                     and actions.
2959
2960              •      If  the  policy action is reroute with 2 or more nexthops
2961                     defined, then the logical flow is added with the  follow‐
2962                     ing actions:
2963
2964                     reg8[0..15] = GID;
2965                     reg8[16..31] = select(1,..n);
2966
2967
2968                     where GID is the ECMP group id generated by ovn-northd
2969                     for this policy and n is the number of nexthops. The
2970                     select action selects one of the nexthop member ids,
2971                     stores it in register reg8[16..31], and advances the
2972                     packet to the next stage.
2973
2974              •      If the policy action is reroute with just one nex‐
2975                     thop, then the logical flow is added with the follow‐
2976                     ing actions:
2977
2978                     [xx]reg0 = H;
2979                     eth.src = E;
2980                     outport = P;
2981                     reg8[0..15] = 0;
2982                     flags.loopback = 1;
2983                     next;
2984
2985
2986                     where H is the nexthop defined in the router policy,
2987                     E is the Ethernet address of the logical router port
2988                     from which the nexthop is reachable, and P is that
2989                     logical router port.
2990
2991              •      If a router policy has the option pkt_mark=m set  and  if
2992                     the  action  is  not  drop, then the action also includes
2993                     pkt.mark = m to mark the packet with the marker m.
2994
2995     Ingress Table 16: ECMP handling for router policies
2996
2997       This table handles the ECMP for the  router  policies  configured  with
2998       multiple nexthops.
2999
3000              •      A priority-150 flow is added to advance the packet to the
3001                     next stage if the ECMP group id register  reg8[0..15]  is
3002                     0.
3003
3004              •      For  each  ECMP  reroute router policy with multiple nex‐
3005                     thops, a priority-100 flow is added for  each  nexthop  H
3006                     with  the  match  reg8[0..15] == GID && reg8[16..31] == M
3007                     where GID is the router  policy  group  id  generated  by
3008                     ovn-northd and M is the member id of the nexthop H gener‐
3009                     ated by ovn-northd. The following actions  are  added  to
3010                     the flow:
3011
3012                     [xx]reg0 = H;
3013                     eth.src = E;
3014                     outport = P;
3015                     flags.loopback = 1;
3016                     next;
3017
3018
3019                     where H is the nexthop defined in the router policy,
3020                     E is the Ethernet address of the logical router port
3021                     from which the nexthop is reachable, and P is that
3022                     logical router port.
3023
3024              •      A priority-0 logical flow that matches  all  packets  not
3025                     already handled (match 1) and drops them (action drop;).
3026
3027     Ingress Table 17: ARP/ND Resolution
3028
3029       Any  packet that reaches this table is an IP packet whose next-hop IPv4
3030       address is in reg0 or IPv6 address is in xxreg0.  (ip4.dst  or  ip6.dst
3031       contains  the final destination.) This table resolves the IP address in
3032       reg0 (or xxreg0) into an output port in outport and an Ethernet address
3033       in eth.dst, using the following flows:
3034
3035              •      A  priority-500  flow  that  matches IP multicast traffic
3036                     that was allowed in the routing pipeline. For  this  kind
3037                     of  traffic  the outport was already set so the flow just
3038                     advances to the next table.
3039
3040              •      Priority-200 flows that match ECMP reply traffic for  the
3041                     routes  configured to use symmetric replies, with actions
3042                     push(xxreg1);    xxreg1    =    ct_label;    eth.dst    =
3043                     xxreg1[32..79];  pop(xxreg1);  next;. xxreg1 is used here
3044                     to avoid masked access to ct_label, to make the flow  HW-
3045                     offloading friendly.
3046
3047              •      Static MAC bindings. MAC bindings can be known statically
3048                     based on data in the OVN_Northbound database. For  router
3049                     ports  connected to logical switches, MAC bindings can be
3050                     known statically from the addresses column in  the  Logi‐
3051                     cal_Switch_Port  table.  For  router  ports  connected to
                     other logical routers, MAC bindings can be known
                     statically from the mac and networks columns in the
                     Logical_Router_Port table. (Note: the flow is NOT
                     installed for the IP addresses that belong to a
                     neighbor logical router port if the current router has
                     options:dynamic_neigh_routers set to true.)
3058
                     For each IPv4 address A whose host is known to have
                     Ethernet address E on router port P, a priority-100
                     flow with match outport == P && reg0 == A has actions
                     eth.dst = E; next;.
3063
                     For each virtual IP A configured on a logical port of
                     type virtual, with its virtual parent set in its
                     corresponding Port_Binding record, the virtual parent
                     having the Ethernet address E, and the virtual IP
                     reachable via the router port P, a priority-100 flow
                     with match outport == P && xxreg0/reg0 == A has actions
                     eth.dst = E; next;.
3071
                     For each virtual IP A configured on a logical port of
                     type virtual, with its virtual parent not set in its
                     corresponding Port_Binding record and the virtual IP A
                     reachable via the router port P, a priority-100 flow
                     with match outport == P && xxreg0/reg0 == A has actions
                     eth.dst = 00:00:00:00:00:00; next;. This flow is added
                     so that the ARP is always resolved for the virtual IP A
                     by generating an ARP request rather than consulting the
                     MAC_Binding table, which can have an incorrect value
                     for the virtual IP A.
3082
                     For each IPv6 address A whose host is known to have
                     Ethernet address E on router port P, a priority-100
                     flow with match outport == P && xxreg0 == A has actions
                     eth.dst = E; next;.
3087
                     For each logical router port with an IPv4 address A and
                     a MAC address of E that is reachable via a different
                     logical router port P, a priority-100 flow with match
                     outport == P && reg0 == A has actions eth.dst = E;
                     next;.
3092
                     For each logical router port with an IPv6 address A and
                     a MAC address of E that is reachable via a different
                     logical router port P, a priority-100 flow with match
                     outport == P && xxreg0 == A has actions eth.dst = E;
                     next;.
3097
              •      Static MAC bindings from NAT entries. MAC bindings can
                     also be known for the entries in the NAT table. The
                     flows below are programmed for distributed logical
                     routers, i.e., routers with a distributed router port.

                     For each row in the NAT table with IPv4 address A in
                     the external_ip column of the NAT table, a priority-100
                     flow with the match outport == P && reg0 == A has
                     actions eth.dst = E; next;, where P is the distributed
                     logical router port and E is the Ethernet address, if
                     set, in the external_mac column of the NAT table for
                     NAT rules of type dnat_and_snat, otherwise the Ethernet
                     address of the distributed logical router port. Note
                     that if the external_ip is not within a subnet on the
                     owning logical router, then OVN will only create ARP
                     resolution flows if options:add_route is set to true.
                     Otherwise, no ARP resolution flows will be added.
3115
                     For IPv6 NAT entries, the same flows are added, but
                     using the register xxreg0 for the match.
3118
              •      If the router datapath runs a port with redirect-type
                     set to bridged, for each distributed NAT rule with IP A
                     in the logical_ip column and logical port P in the
                     logical_port column of the NAT table, a priority-90
                     flow is added with the match outport == Q && ip.src ==
                     A && is_chassis_resident(P), where Q is the distributed
                     logical router port, and action get_arp(outport, reg0);
                     next; for IPv4 and get_nd(outport, xxreg0); next; for
                     IPv6.
3127
              •      Traffic whose IP destination is an address owned by the
                     router should be dropped. Such traffic is normally
                     dropped in the ingress table IP Input, except for IPs
                     that are also shared with SNAT rules. However, if no
                     unSNAT operation has happened successfully up to this
                     point in the pipeline and the destination IP of the
                     packet is still a router owned IP, the packets can be
                     safely dropped.
3136
                     A priority-2 logical flow with match ip4.dst == {..}
                     matches on traffic destined to router owned IPv4
                     addresses which are also SNAT IPs. This flow has action
                     drop;.

                     A priority-2 logical flow with match ip6.dst == {..}
                     matches on traffic destined to router owned IPv6
                     addresses which are also SNAT IPs. This flow has action
                     drop;.
3146
                     A priority-0 logical flow that matches all packets not
                     already handled (match 1) and drops them (action
                     drop;).
3149
3150              •      Dynamic MAC bindings. These flows resolve MAC-to-IP bind‐
3151                     ings  that  have  become known dynamically through ARP or
3152                     neighbor discovery. (The ingress table ARP  Request  will
3153                     issue  an  ARP or neighbor solicitation request for cases
3154                     where the binding is not yet known.)
3155
3156                     A priority-0 logical flow  with  match  ip4  has  actions
3157                     get_arp(outport, reg0); next;.
3158
3159                     A  priority-0  logical  flow  with  match ip6 has actions
3160                     get_nd(outport, xxreg0); next;.
3161
              •      For a distributed gateway LRP with redirect-type set to
                     bridged, a priority-50 flow matches outport ==
                     "ROUTER_PORT" && !is_chassis_resident("cr-ROUTER_PORT")
                     and has actions eth.dst = E; next;, where E is the
                     Ethernet address of the logical router port. (An
                     illustrative static-binding flow is sketched below.)
3167
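       A minimal sketch of a static MAC binding flow from this table, with
       a hypothetical port name and addresses:

              table=17(lr_in_arp_resolve), priority=100,
                match=(outport == "lr0-ls0" && reg0 == 10.0.0.5),
                action=(eth.dst = 00:00:00:00:af:05; next;)
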
3168     Ingress Table 18: Check packet length
3169
       For distributed logical routers or gateway routers whose gateway
       port is configured with options:gateway_mtu set to a valid integer
       value, this table adds a priority-50 logical flow with the match
       outport == GW_PORT, where GW_PORT is the gateway router port, that
       applies the action check_pkt_larger and advances the packet to the
       next table.
3175
3176       REGBIT_PKT_LARGER = check_pkt_larger(L); next;
3177
3178
       where L is the packet length to check for. If the packet is larger
       than L, it stores 1 in the register bit REGBIT_PKT_LARGER. The value
       of L is taken from the options:gateway_mtu column of the
       Logical_Router_Port row.
3182
3183       If the port is also configured with options:gateway_mtu_bypass then an‐
3184       other  flow  is added, with priority-55, to bypass the check_pkt_larger
3185       flow.
3186
3187       This table adds one priority-0 fallback flow that matches  all  packets
3188       and advances to the next table.
3189
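       A configuration sketch enabling this check on a gateway port, with a
       hypothetical port name:

              $ ovn-nbctl set Logical_Router_Port lr0-public \
                      options:gateway_mtu=1500
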
3190     Ingress Table 19: Handle larger packets
3191
       For distributed logical routers or gateway routers whose gateway
       port is configured with options:gateway_mtu set to a valid integer
       value, this table adds the following priority-150 logical flow for
       each logical router port with the match inport == LRP && outport ==
       GW_PORT && REGBIT_PKT_LARGER && !REGBIT_EGRESS_LOOPBACK, where LRP
       is the logical router port and GW_PORT is the gateway port, and
       applies the following action for IPv4 and IPv6 respectively:
3199
3200       icmp4 {
3201           icmp4.type = 3; /* Destination Unreachable. */
3202           icmp4.code = 4;  /* Frag Needed and DF was Set. */
3203           icmp4.frag_mtu = M;
3204           eth.dst = E;
3205           ip4.dst = ip4.src;
3206           ip4.src = I;
3207           ip.ttl = 255;
3208           REGBIT_EGRESS_LOOPBACK = 1;
3209           REGBIT_PKT_LARGER = 0;
3210           next(pipeline=ingress, table=0);
3211       };
3212       icmp6 {
3213           icmp6.type = 2;
3214           icmp6.code = 0;
3215           icmp6.frag_mtu = M;
3216           eth.dst = E;
3217           ip6.dst = ip6.src;
3218           ip6.src = I;
3219           ip.ttl = 255;
3220           REGBIT_EGRESS_LOOPBACK = 1;
3221           REGBIT_PKT_LARGER = 0;
3222           next(pipeline=ingress, table=0);
3223       };
3224
3225
              •      M is the (fragment MTU - 58), whose value is taken from
                     the options:gateway_mtu column of the
                     Logical_Router_Port row.

              •      E is the Ethernet address of the logical router port.

              •      I is the IPv4/IPv6 address of the logical router port.
3233
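       As a worked example under the definition of M above, a gateway port
       with options:gateway_mtu=1500 would advertise:

              icmp4.frag_mtu = 1442;  /* 1500 - 58 */
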
3234       This  table  adds one priority-0 fallback flow that matches all packets
3235       and advances to the next table.
3236
3237     Ingress Table 20: Gateway Redirect
3238
       For distributed logical routers where one or more of the logical
       router ports specifies a gateway chassis, this table redirects
       certain packets to the distributed gateway port instances on the
       gateway chassis. This table has the following flows:
3243
              •      For each NAT rule in the OVN Northbound database that
                     can be handled in a distributed manner, a priority-100
                     logical flow with match ip4.src == B && outport == GW
                     && is_chassis_resident(P), where GW is the distributed
                     gateway port specified in the NAT rule and P is the NAT
                     logical port. IP traffic matching the above rule will
                     be handled locally, setting reg1 to C and eth.src to D,
                     where C is the NAT external IP and D is the NAT
                     external MAC. (A configuration sketch for such a rule
                     follows this list.)
3252
3253              •      For each dnat_and_snat NAT rule with  stateless=true  and
3254                     allowed_ext_ips  configured,  a  priority-75 flow is pro‐
3255                     grammed with match ip4.dst == B and action outport =  CR;
3256                     next;  where  B is the NAT rule external IP and CR is the
3257                     chassisredirect port representing  the  instance  of  the
3258                     logical  router  distributed  gateway port on the gateway
                     chassis. Moreover, a priority-70 flow is programmed
                     with the same match and action drop;. For each
                     dnat_and_snat NAT
3261                     rule with stateless=true and exempted_ext_ips configured,
3262                     a  priority-75 flow is programmed with match ip4.dst == B
3263                     and action drop; where B is the NAT rule external  IP.  A
3264                     similar flow is added for IPv6 traffic.
3265
3266              •      For each NAT rule in the OVN Northbound database that can
3267                     be handled in a distributed manner, a priority-80 logical
3268                     flow  with  drop action if the NAT logical port is a vir‐
3269                     tual port not claimed by any chassis yet.
3270
3271              •      A priority-50 logical flow with match outport ==  GW  has
3272                     actions  outport  =  CR;  next;,  where GW is the logical
3273                     router distributed gateway  port  and  CR  is  the  chas‐
3274                     sisredirect port representing the instance of the logical
3275                     router distributed gateway port on the gateway chassis.
3276
3277              •      A priority-0 logical flow with match 1 has actions next;.
3278
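       A configuration sketch for a NAT rule that can be handled in a
       distributed manner, with hypothetical router, addresses, logical
       port and MAC:

              $ ovn-nbctl lr-nat-add lr0 dnat_and_snat 172.16.0.10 \
                      10.0.0.5 vm1 00:00:00:00:af:10
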
3279     Ingress Table 21: ARP Request
3280
3281       In the common case where the Ethernet destination  has  been  resolved,
3282       this  table outputs the packet. Otherwise, it composes and sends an ARP
3283       or IPv6 Neighbor Solicitation request. It holds the following flows:
3284
3285              •      Unknown MAC address. A priority-100 flow for IPv4 packets
3286                     with match eth.dst == 00:00:00:00:00:00 has the following
3287                     actions:
3288
3289                     arp {
3290                         eth.dst = ff:ff:ff:ff:ff:ff;
3291                         arp.spa = reg1;
3292                         arp.tpa = reg0;
3293                         arp.op = 1;  /* ARP request. */
3294                         output;
3295                     };
3296
3297
                     Unknown MAC address. For each IPv6 static route
                     associated with the router with nexthop IP G, a
                     priority-200 flow for IPv6 packets with match eth.dst
                     == 00:00:00:00:00:00 && xxreg0 == G is added with the
                     following actions:
3303
                     nd_ns {
                         eth.dst = E;
                         ip6.dst = I;
                         nd.target = G;
                         output;
                     };
3310
3311
                     where E is the multicast MAC address derived from the
                     gateway IP G, and I is the solicited-node multicast
                     address corresponding to the target address G.
3315
3316                     Unknown MAC address. A priority-100 flow for IPv6 packets
3317                     with match eth.dst == 00:00:00:00:00:00 has the following
3318                     actions:
3319
3320                     nd_ns {
3321                         nd.target = xxreg0;
3322                         output;
3323                     };
3324
3325
                     (The ingress table IP Routing initialized reg1 with the
                     IP address owned by outport and (xx)reg0 with the
                     next-hop IP address.)
3329
3330                     The IP packet that triggers the ARP/IPv6  NS  request  is
3331                     dropped.
3332
3333              •      Known MAC address. A priority-0 flow with match 1 has ac‐
3334                     tions output;.
3335
3336     Egress Table 0: Check DNAT local
3337
       This table checks if the packet needs to be DNATed in the router
       ingress table lr_in_dnat after it is SNATed and looped back to the
       ingress pipeline. The check is performed only for routers configured
       with distributed gateway ports and NAT entries, and it ensures that
       SNAT and DNAT are performed in different conntrack zones instead of
       a common zone.
3343
              •      For each NAT rule in the OVN Northbound database on a
                     distributed router, a priority-50 logical flow with
                     match ip4.dst == E && is_chassis_resident(P), where E
                     is the external IP address specified in the NAT rule
                     and GW is the logical router distributed gateway port.
                     For a dnat_and_snat NAT rule, P is the logical port
                     specified in the NAT rule; if the logical_port column
                     of the NAT table is NOT set, then P is the
                     chassisredirect port of GW. The flow has the actions:
                     REGBIT_DST_NAT_IP_LOCAL = 1; next;. (An illustrative
                     flow is sketched below.)
3353
3354              •      A priority-0 logical flow with match 1 has  actions  REG‐
3355                     BIT_DST_NAT_IP_LOCAL = 0; next;.
3356
3357       This  table  also  installs a priority-50 logical flow for each logical
3358       router that has NATs configured on it. The flow has match ip &&  ct_la‐
3359       bel.natted  == 1 and action REGBIT_DST_NAT_IP_LOCAL = 1; next;. This is
3360       intended to ensure that traffic that was DNATted  locally  will  use  a
3361       separate  conntrack  zone  for  SNAT  if  SNAT is required later in the
3362       egress pipeline. Note that this flow checks the value of  ct_label.nat‐
3363       ted,  which  is set in the ingress pipeline. This means that ovn-northd
3364       assumes that this value is carried over from the  ingress  pipeline  to
3365       the  egress  pipeline and is not altered or cleared. If conntrack label
3366       values are ever changed to be cleared between the  ingress  and  egress
3367       pipelines,  then  the match conditions of this flow will be updated ac‐
3368       cordingly.
3369
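       A minimal sketch of the per-NAT-rule flow described above, with a
       hypothetical external IP and chassisredirect port (the symbolic
       REGBIT_DST_NAT_IP_LOCAL stands for the register bit that appears in
       a real flow dump):

              table=0 (lr_out_chk_dnat_local), priority=50,
                match=(ip4.dst == 172.16.0.10 &&
                       is_chassis_resident("cr-lr0-public")),
                action=(REGBIT_DST_NAT_IP_LOCAL = 1; next;)
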
3370     Egress Table 1: UNDNAT
3371
       This is for already established connections’ reverse traffic, i.e.,
       DNAT has already been done in the ingress pipeline and now the
       packet has entered the egress pipeline as part of a reply. This
       traffic is unDNATed here.
3376
3377              •      A priority-0 logical flow with match 1 has actions next;.
3378
3379     Egress Table 1: UNDNAT on Gateway Routers
3380
3381              •      For  all  IP  packets,  a priority-50 flow with an action
3382                     flags.loopback = 1; ct_dnat;.
3383
3384     Egress Table 1: UNDNAT on Distributed Routers
3385
              •      For all the configured load balancing rules for a
                     router with a gateway port in the OVN_Northbound
                     database that include an IPv4 address VIP, for every
                     backend IPv4 address B defined for the VIP a
                     priority-120 flow is programmed on the gateway chassis
                     that matches ip && ip4.src == B && outport == GW, where
                     GW is the logical router gateway port, with an action
                     ct_dnat_in_czone;. If the backend IPv4 address B is
                     also configured with L4 port PORT of protocol P, then
                     the match also includes P.src == PORT. These flows are
                     not added for load balancers with IPv6 VIPs. (A
                     configuration sketch follows this list.)
3397
                     If the router is configured to force SNAT any
                     load-balanced packets, the above action will be
                     replaced by flags.force_snat_for_lb = 1; ct_dnat;.
3401
3402              •      For each configuration in  the  OVN  Northbound  database
3403                     that  asks  to  change  the  destination  IP address of a
3404                     packet from an IP address of A to B, a priority-100  flow
3405                     matches  ip && ip4.src == B && outport == GW, where GW is
3406                     the  logical  router  gateway  port,   with   an   action
3407                     ct_dnat_in_czone;.   If   the   NAT   rule   is  of  type
3408                     dnat_and_snat and has stateless=true in the options, then
3409                     the action would be next;.
3410
                     If the NAT rule cannot be handled in a distributed
                     manner, then the priority-100 flow above is only
                     programmed on the gateway chassis with the action
                     ct_dnat_in_czone;.
3414
3415                     If  the  NAT rule can be handled in a distributed manner,
3416                     then there is an additional action eth.src =  EA;,  where
3417                     EA is the ethernet address associated with the IP address
3418                     A in the NAT rule. This allows upstream MAC  learning  to
3419                     point to the correct chassis.
3420
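       A configuration sketch for a load balancer whose reverse traffic
       would hit the priority-120 flows above, with hypothetical names and
       addresses:

              $ ovn-nbctl lb-add lb0 172.16.0.100:80 10.0.0.5:8080 tcp
              $ ovn-nbctl lr-lb-add lr0 lb0
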
3421     Egress Table 2: Post UNDNAT
3422
              •      A priority-50 logical flow is added that commits any
                     untracked flows from the previous table lr_out_undnat
                     for Gateway routers. This flow matches on ct.new && ip
                     with action ct_commit { }; next;.
3427
3428              •      A priority-0 logical flow with match 1 has actions next;.
3429
3430     Egress Table 3: SNAT
3431
3432       Packets that are configured to be SNATed get their  source  IP  address
3433       changed based on the configuration in the OVN Northbound database.
3434
              •      A priority-120 flow to advance the IPv6 Neighbor
                     Solicitation packet to the next table to skip SNAT. In
                     the case where ovn-controller injects an IPv6 Neighbor
                     Solicitation packet (for the nd_ns action) we don’t
                     want the packet to go through conntrack.
3440
3441       Egress Table 3: SNAT on Gateway Routers
3442
3443              •      If  the Gateway router in the OVN Northbound database has
3444                     been configured to force SNAT a  packet  (that  has  been
3445                     previously  DNATted)  to  B,  a priority-100 flow matches
3446                     flags.force_snat_for_dnat ==  1  &&  ip  with  an  action
3447                     ct_snat(B);.
3448
3449              •      If  a  load balancer configured to skip snat has been ap‐
3450                     plied to the Gateway router pipeline, a priority-120 flow
3451                     matches  flags.skip_snat_for_lb == 1 && ip with an action
3452                     next;.
3453
              •      If the Gateway router in the OVN Northbound database
                     has been configured to force SNAT a packet (that has
                     been previously load-balanced) using the router IP
                     (i.e., options:lb_force_snat_ip=router_ip), then for
                     each logical router port P attached to the Gateway
                     router, a priority-110 flow matches
                     flags.force_snat_for_lb == 1 && outport == P with an
                     action ct_snat(R);, where R is the IP configured on the
                     router port. If R is an IPv4 address then the match
                     will also include ip4 and if it is an IPv6 address,
                     then the match will also include ip6. (A configuration
                     sketch appears at the end of this SNAT section.)
3465
3466                     If  the logical router port P is configured with multiple
3467                     IPv4 and multiple IPv6 addresses, only the first IPv4 and
3468                     first IPv6 address is considered.
3469
3470              •      If  the Gateway router in the OVN Northbound database has
3471                     been configured to force SNAT a  packet  (that  has  been
3472                     previously  load-balanced)  to  B,  a  priority-100  flow
3473                     matches flags.force_snat_for_lb == 1 && ip with an action
3474                     ct_snat(B);.
3475
              •      For each configuration in the OVN Northbound database
                     that asks to change the source IP address of a packet
                     from an IP address of A, or to change the source IP
                     address of a packet that belongs to network A, to B, a
                     flow matches ip && ip4.src == A && (!ct.trk || !ct.rpl)
                     with an action ct_snat(B);. The priority of the flow is
                     calculated based on the mask of A, with matches having
                     larger masks getting higher priorities. If the NAT rule
                     is of type dnat_and_snat and has stateless=true in the
                     options, then the action would be ip4/6.src=(B).
3486
              •      If the NAT rule has allowed_ext_ips configured, then
                     there is an additional match ip4.dst ==
                     allowed_ext_ips. Similarly, for IPv6, the match would
                     be ip6.dst == allowed_ext_ips.
3491
              •      If the NAT rule has exempted_ext_ips set, then there is
                     an additional flow configured at the priority + 1 of
                     the corresponding NAT rule. The flow matches if the
                     destination IP is an exempted_ext_ip and the action is
                     next;. This flow is used to bypass the ct_snat action
                     for a packet which is destined to exempted_ext_ips.
3498
3499              •      A priority-0 logical flow with match 1 has actions next;.
3500
3501       Egress Table 3: SNAT on Distributed Routers
3502
              •      For each configuration in the OVN Northbound database
                     that asks to change the source IP address of a packet
                     from an IP address of A, or to change the source IP
                     address of a packet that belongs to network A, to B,
                     two flows are added. The priority P of these flows is
                     calculated based on the mask of A, with matches having
                     larger masks getting higher priorities.
3510
                     If the NAT rule cannot be handled in a distributed
                     manner, then the flows below are only programmed on the
                     gateway chassis, with the flow priority increased by
                     128 so that they are evaluated first.
3515
                     •      The first flow is added with the calculated
                            priority P and match ip && ip4.src == A &&
                            outport == GW, where GW is the logical router
                            gateway port, with an action
                            ct_snat_in_czone(B); to SNAT in the common zone.
                            If the NAT rule is of type dnat_and_snat and has
                            stateless=true in the options, then the action
                            would be ip4/6.src=(B).
3523
3524                     •      The  second flow is added with the calculated pri‐
3525                            ority P + 1  and match ip && ip4.src == A &&  out‐
3526                            port  == GW && REGBIT_DST_NAT_IP_LOCAL == 0, where
3527                            GW is the logical router gateway port, with an ac‐
3528                            tion  ct_snat(B); to SNAT in the snat zone. If the
3529                            NAT rule is of type dnat_and_snat and  has  state‐
3530                            less=true in the options, then the action would be
3531                            ip4/6.src=(B).
3532
3533                     If the NAT rule can be handled in a  distributed  manner,
3534                     then  there  is an additional action (for both the flows)
3535                     eth.src = EA;, where EA is the ethernet  address  associ‐
3536                     ated  with  the IP address A in the NAT rule. This allows
3537                     upstream MAC learning to point to the correct chassis.
3538
                     If the NAT rule has allowed_ext_ips configured, then
                     there is an additional match ip4.dst ==
                     allowed_ext_ips. Similarly, for IPv6, the match would
                     be ip6.dst == allowed_ext_ips.
3543
                     If the NAT rule has exempted_ext_ips set, then there is
                     an additional flow configured at the priority P + 2 of
                     the corresponding NAT rule. The flow matches if the
                     destination IP is an exempted_ext_ip and the action is
                     next;. This flow is used to bypass the ct_snat action
                     for a packet which is destined to exempted_ext_ips.
3550
3551              •      A priority-0 logical flow with match 1 has actions next;.
3552
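       A configuration sketch for the force-SNAT-using-router-IP behavior
       described in this section, with a hypothetical router name:

              $ ovn-nbctl set Logical_Router lr0 \
                      options:lb_force_snat_ip=router_ip
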
3553     Egress Table 4: Egress Loopback
3554
       This table is for distributed logical routers where one of the
       logical router ports specifies a gateway chassis.
3557
3558       While  UNDNAT  and SNAT processing have already occurred by this point,
3559       this traffic needs to be forced through egress loopback  on  this  dis‐
3560       tributed gateway port instance, in order for UNSNAT and DNAT processing
3561       to be applied, and also for IP routing and ARP resolution after all  of
3562       the NAT processing, so that the packet can be forwarded to the destina‐
3563       tion.
3564
3565       This table has the following flows:
3566
3567              •      For each NAT rule in the OVN  Northbound  database  on  a
3568                     distributed  router,  a  priority-100  logical  flow with
3569                     match ip4.dst == E && outport == GW  &&  is_chassis_resi‐
3570                     dent(P),  where E is the external IP address specified in
3571                     the NAT rule, GW is the distributed gateway  port  corre‐
3572                     sponding  to  the  NAT  rule (specified or inferred). For
3573                     dnat_and_snat NAT rule, P is the logical  port  specified
3574                     in  the  NAT rule. If logical_port column of NAT table is
3575                     NOT set, then P is the chassisredirect port  of  GW  with
3576                     the following actions:
3577
3578                     clone {
3579                         ct_clear;
3580                         inport = outport;
3581                         outport = "";
3582                         flags = 0;
3583                         flags.loopback = 1;
3584                         flags.use_snat_zone = REGBIT_DST_NAT_IP_LOCAL;
3585                         reg0 = 0;
3586                         reg1 = 0;
3587                         ...
3588                         reg9 = 0;
3589                         REGBIT_EGRESS_LOOPBACK = 1;
3590                         next(pipeline=ingress, table=0);
3591                     };
3592
3593
                     flags.loopback is set since inport is unchanged and the
                     packet may return to that port after NAT processing.
3596                     REGBIT_EGRESS_LOOPBACK  is  set  to  indicate that egress
3597                     loopback has occurred, in order to skip the source IP ad‐
3598                     dress check against the router address.
3599
3600              •      A priority-0 logical flow with match 1 has actions next;.
3601
3602     Egress Table 5: Delivery
3603
3604       Packets that reach this table are ready for delivery. It contains:
3605
3606              •      Priority-110  logical flows that match IP multicast pack‐
3607                     ets on each enabled logical router port  and  modify  the
3608                     Ethernet  source  address  of the packets to the Ethernet
3609                     address of the port and then execute action output;.
3610
3611              •      Priority-100 logical flows that match packets on each en‐
3612                     abled logical router port, with action output;.
3613
3614              •      A  priority-0  logical  flow that matches all packets not
3615                     already handled (match 1) and drops them (action drop;).
3616

DROP SAMPLING

       As described in the previous section, there are several places where
       ovn-northd might decide to drop a packet by explicitly creating a
       Logical_Flow with the drop; action.

       When debug drop-sampling has been configured in the OVN Northbound
       database, ovn-northd will replace all the drop; actions with a
       sample(priority=65535, collector_set=id, obs_domain=obs_id,
       obs_point=@cookie) action, where:
3626
              •      id is the value of the debug_drop_collector_set option
                     configured in the OVN Northbound database.

              •      obs_id has its 8 most significant bits equal to the
                     value of the debug_drop_domain_id option in the OVN
                     Northbound database and its 24 least significant bits
                     equal to the datapath’s tunnel key.
3634
3635
3636
3637OVN 22.12.0                       ovn-northd                     ovn-northd(8)