ovn-architecture(7)               OVN Manual               ovn-architecture(7)
2
3
4

NAME

6       ovn-architecture - Open Virtual Network architecture
7

DESCRIPTION

9       OVN,  the  Open Virtual Network, is a system to support logical network
10       abstraction in virtual machine and container environments. OVN  comple‐
11       ments  the existing capabilities of OVS to add native support for logi‐
12       cal network abstractions, such as logical L2 and L3 overlays and  secu‐
13       rity  groups.  Services  such as DHCP are also desirable features. Just
14       like OVS, OVN’s design goal is to have a production-quality implementa‐
15       tion that can operate at significant scale.
16
17       A  physical  network comprises physical wires, switches, and routers. A
18       virtual network extends a physical network into a  hypervisor  or  con‐
19       tainer  platform, bridging VMs or containers into the physical network.
20       An OVN logical network is a network implemented in software that is in‐
21       sulated  from  physical (and thus virtual) networks by tunnels or other
22       encapsulations. This allows IP and other address spaces used in logical
23       networks  to overlap with those used on physical networks without caus‐
24       ing conflicts. Logical network topologies can be arranged  without  re‐
25       gard  for  the  topologies  of the physical networks on which they run.
26       Thus, VMs that are part of a logical network can migrate from one phys‐
27       ical  machine  to  another without network disruption. See Logical Net‐
28       works, below, for more information.
29
30       The encapsulation layer prevents VMs and containers connected to a log‐
31       ical  network  from  communicating with nodes on physical networks. For
32       clustering VMs and containers, this can be acceptable  or  even  desir‐
33       able,  but  in  many  cases  VMs and containers do need connectivity to
34       physical networks. OVN provides multiple forms  of  gateways  for  this
35       purpose. See Gateways, below, for more information.
36
37       An OVN deployment consists of several components:
38
39              •      A  Cloud Management System (CMS), which is OVN’s ultimate
40                     client (via its users and administrators).  OVN  integra‐
41                     tion  requires  installing  a CMS-specific plugin and re‐
42                     lated software (see below). OVN initially  targets  Open‐
43                     Stack as CMS.
44
45                     We  generally  speak  of ``the’’ CMS, but one can imagine
46                     scenarios in which multiple CMSes manage different  parts
47                     of an OVN deployment.
48
49              •      An OVN Database physical or virtual node (or, eventually,
50                     cluster) installed in a central location.
51
52              •      One or more (usually many) hypervisors. Hypervisors  must
53                     run Open vSwitch and implement the interface described in
54                     Documentation/topics/integration.rst in  the  OVN  source
55                     tree.  Any  hypervisor platform supported by Open vSwitch
56                     is acceptable.
57
58              •      Zero or more gateways. A gateway extends  a  tunnel-based
59                     logical  network  into a physical network by bidirection‐
60                     ally forwarding packets between tunnels  and  a  physical
61                     Ethernet  port.  This  allows non-virtualized machines to
62                     participate in logical networks. A gateway may be a phys‐
63                     ical  host,  a virtual machine, or an ASIC-based hardware
64                     switch that supports the vtep(5) schema.
65
                     Hypervisors and gateways are together called transport
                     nodes or chassis.
68
69       The  diagram  below  shows  how the major components of OVN and related
70       software interact. Starting at the top of the diagram, we have:
71
72              •      The Cloud Management System, as defined above.
73
74              •      The OVN/CMS Plugin is the component of the CMS  that  in‐
75                     terfaces  to OVN. In OpenStack, this is a Neutron plugin.
76                     The plugin’s main purpose is to translate the  CMS’s  no‐
77                     tion  of  logical  network  configuration,  stored in the
78                     CMS’s configuration database in  a  CMS-specific  format,
79                     into an intermediate representation understood by OVN.
80
81                     This  component  is  necessarily  CMS-specific,  so a new
82                     plugin needs to be developed for each CMS that  is  inte‐
83                     grated  with OVN. All of the components below this one in
84                     the diagram are CMS-independent.
85
86              •      The OVN Northbound  Database  receives  the  intermediate
87                     representation  of  logical  network configuration passed
88                     down by the OVN/CMS Plugin. The database schema is  meant
89                     to  be  ``impedance matched’’ with the concepts used in a
90                     CMS, so that it  directly  supports  notions  of  logical
91                     switches, routers, ACLs, and so on. See ovn-nb(5) for de‐
92                     tails.
93
94                     The OVN Northbound Database has  only  two  clients:  the
95                     OVN/CMS Plugin above it and ovn-northd below it.
96
              •      ovn-northd(8) connects to the OVN Northbound Database
98                     above it and the OVN Southbound  Database  below  it.  It
99                     translates  the logical network configuration in terms of
100                     conventional network concepts, taken from the OVN  North‐
101                     bound  Database,  into  logical datapath flows in the OVN
102                     Southbound Database below it.
103
104              •      The OVN Southbound Database is the center of the  system.
105                     Its  clients  are  ovn-northd(8)  above  it  and ovn-con‐
106                     troller(8) on every transport node below it.
107
108                     The OVN Southbound Database contains three kinds of data:
109                     Physical  Network  (PN)  tables that specify how to reach
110                     hypervisor and other nodes, Logical Network  (LN)  tables
111                     that  describe  the logical network in terms of ``logical
112                     datapath flows,’’ and Binding tables  that  link  logical
113                     network  components’  locations  to the physical network.
114                     The hypervisors populate the PN and Port_Binding  tables,
115                     whereas ovn-northd(8) populates the LN tables.
116
117                     OVN  Southbound  Database performance must scale with the
118                     number of transport nodes. This will likely require  some
119                     work  on  ovsdb-server(1)  as  we  encounter bottlenecks.
120                     Clustering for availability may be needed.
121
122       The remaining components are replicated onto each hypervisor:
123
              •      ovn-controller(8) is OVN’s agent on each hypervisor and
125                     software  gateway.  Northbound,  it  connects  to the OVN
126                     Southbound Database to learn about OVN configuration  and
127                     status  and to populate the PN table and the Chassis col‐
128                     umn in Binding table with the hypervisor’s status. South‐
129                     bound, it connects to ovs-vswitchd(8) as an OpenFlow con‐
130                     troller, for control over network traffic, and to the lo‐
131                     cal  ovsdb-server(1)  to  allow it to monitor and control
132                     Open vSwitch configuration.
133
              •      ovs-vswitchd(8) and ovsdb-server(1) are conventional com‐
135                     ponents of Open vSwitch.
136
137                                         CMS
138                                          |
139                                          |
140                              +-----------|-----------+
141                              |           |           |
142                              |     OVN/CMS Plugin    |
143                              |           |           |
144                              |           |           |
145                              |   OVN Northbound DB   |
146                              |           |           |
147                              |           |           |
148                              |       ovn-northd      |
149                              |           |           |
150                              +-----------|-----------+
151                                          |
152                                          |
153                                +-------------------+
154                                | OVN Southbound DB |
155                                +-------------------+
156                                          |
157                                          |
158                       +------------------+------------------+
159                       |                  |                  |
160         HV 1          |                  |    HV n          |
161       +---------------|---------------+  .  +---------------|---------------+
162       |               |               |  .  |               |               |
163       |        ovn-controller         |  .  |        ovn-controller         |
164       |         |          |          |  .  |         |          |          |
165       |         |          |          |     |         |          |          |
166       |  ovs-vswitchd   ovsdb-server  |     |  ovs-vswitchd   ovsdb-server  |
167       |                               |     |                               |
168       +-------------------------------+     +-------------------------------+
169
170
171   Information Flow in OVN
172       Configuration  data  in OVN flows from north to south. The CMS, through
173       its  OVN/CMS  plugin,  passes  the  logical  network  configuration  to
174       ovn-northd  via  the  northbound database. In turn, ovn-northd compiles
175       the configuration into a lower-level form and passes it to all  of  the
176       chassis via the southbound database.
177
178       Status information in OVN flows from south to north. OVN currently pro‐
179       vides only a few forms of status information. First,  ovn-northd  popu‐
180       lates  the  up column in the northbound Logical_Switch_Port table: if a
181       logical port’s chassis column in the southbound Port_Binding  table  is
182       nonempty,  it  sets up to true, otherwise to false. This allows the CMS
183       to detect when a VM’s networking has come up.
184
185       Second, OVN provides feedback to the CMS on the realization of its con‐
186       figuration,  that is, whether the configuration provided by the CMS has
       taken effect. This feature requires the CMS to participate in a se‐
       quence number protocol, which works the following way (a brief com‐
       mand-line sketch follows the list):
189
190              1.  When  the  CMS  updates  the configuration in the northbound
191                  database, as part of the same transaction, it increments the
192                  value  of the nb_cfg column in the NB_Global table. (This is
193                  only necessary if the CMS wants to know when the  configura‐
194                  tion has been realized.)
195
196              2.  When  ovn-northd  updates the southbound database based on a
197                  given snapshot of the northbound database, it copies  nb_cfg
198                  from  northbound  NB_Global  into  the  southbound  database
199                  SB_Global table, as part of the same transaction. (Thus,  an
200                  observer  monitoring  both  databases can determine when the
201                  southbound database is caught up with the northbound.)
202
203              3.  After ovn-northd receives confirmation from  the  southbound
204                  database  server that its changes have committed, it updates
205                  sb_cfg in the northbound NB_Global table to the nb_cfg  ver‐
206                  sion  that  was  pushed  down. (Thus, the CMS or another ob‐
207                  server can determine when the southbound database is  caught
208                  up without a connection to the southbound database.)
209
210              4.  The  ovn-controller process on each chassis receives the up‐
211                  dated southbound database, with  the  updated  nb_cfg.  This
212                  process  in turn updates the physical flows installed in the
213                  chassis’s Open vSwitch instances. When it receives confirma‐
214                  tion from Open vSwitch that the physical flows have been up‐
215                  dated, it updates nb_cfg in its own Chassis  record  in  the
216                  southbound database.
217
218              5.  ovn-northd  monitors the nb_cfg column in all of the Chassis
219                  records in the southbound database. It keeps  track  of  the
220                  minimum  value  among all the records and copies it into the
221                  hv_cfg column in the northbound NB_Global table. (Thus,  the
222                  CMS or another observer can determine when all of the hyper‐
223                  visors have caught up to the northbound configuration.)
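
       As an illustration (not part of the protocol itself), an observer with
       access to the databases can read these counters using the generic
       database commands of ovn-nbctl and ovn-sbctl, or simply wait for the
       hypervisors to catch up. A minimal sketch:

              # Sketch: read the sequence numbers described above.  Column
              # and table names are from ovn-nb(5) and ovn-sb(5).
              ovn-nbctl get NB_Global . nb_cfg   # written by the CMS
              ovn-sbctl get SB_Global . nb_cfg   # copied down by ovn-northd
              ovn-nbctl get NB_Global . sb_cfg   # southbound commit confirmed
              ovn-nbctl get NB_Global . hv_cfg   # minimum across all chassis
              # Or block until all hypervisors have caught up:
              ovn-nbctl --wait=hv sync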
224
225   Chassis Setup
226       Each chassis in an OVN deployment  must  be  configured  with  an  Open
227       vSwitch  bridge dedicated for OVN’s use, called the integration bridge.
228       System startup  scripts  may  create  this  bridge  prior  to  starting
229       ovn-controller  if desired. If this bridge does not exist when ovn-con‐
230       troller starts, it will be created automatically with the default  con‐
231       figuration  suggested  below.  The  ports on the integration bridge in‐
232       clude:
233
234              •      On any chassis, tunnel ports that OVN  uses  to  maintain
235                     logical  network  connectivity.  ovn-controller adds, up‐
236                     dates, and removes these tunnel ports.
237
238              •      On a hypervisor, any VIFs that are to be attached to log‐
239                     ical  networks. The hypervisor itself, or the integration
240                     between Open vSwitch and  the  hypervisor  (described  in
241                     Documentation/topics/integration.rst) takes care of this.
242                     (This is not part of OVN or new to OVN; this  is  pre-ex‐
243                     isting integration work that has already been done on hy‐
244                     pervisors that support OVS.)
245
246              •      On a gateway, the physical port used for logical  network
247                     connectivity. System startup scripts add this port to the
248                     bridge prior to starting ovn-controller. This  can  be  a
249                     patch port to another bridge, instead of a physical port,
250                     in more sophisticated setups.
251
252       Other ports should not be attached to the integration bridge.  In  par‐
253       ticular, physical ports attached to the underlay network (as opposed to
254       gateway ports, which are physical ports attached to  logical  networks)
255       must not be attached to the integration bridge. Underlay physical ports
256       should instead be attached to a separate Open vSwitch bridge (they need
257       not be attached to any bridge at all, in fact).
258
259       The integration bridge should be configured as described below. The ef‐
260       fect    of    each    of    these    settings    is    documented    in
261       ovs-vswitchd.conf.db(5):
262
263              fail-mode=secure
264                     Avoids  switching  packets  between isolated logical net‐
265                     works before ovn-controller  starts  up.  See  Controller
266                     Failure Settings in ovs-vsctl(8) for more information.
267
268              other-config:disable-in-band=true
269                     Suppresses  in-band  control  flows  for  the integration
270                     bridge. It would be unusual for such  flows  to  show  up
271                     anyway,  because OVN uses a local controller (over a Unix
272                     domain socket) instead of a remote controller. It’s  pos‐
273                     sible,  however, for some other bridge in the same system
274                     to have an in-band remote controller, and  in  that  case
275                     this  suppresses the flows that in-band control would or‐
276                     dinarily set up. Refer to the documentation for more  in‐
277                     formation.
278
279       The  customary  name  for the integration bridge is br-int, but another
280       name may be used.
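
       If a startup script chooses to create the integration bridge itself, a
       minimal sketch using ovs-vsctl might look like the following
       (ovn-controller would otherwise create the bridge with equivalent
       settings):

              # Sketch: create the integration bridge with the settings
              # described above.
              ovs-vsctl --may-exist add-br br-int \
                  -- set Bridge br-int fail_mode=secure \
                         other-config:disable-in-band=true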
281
282   Logical Networks
283       Logical network concepts in OVN include logical  switches  and  logical
284       routers,  the  logical version of Ethernet switches and IP routers, re‐
285       spectively. Like their physical cousins, logical switches  and  routers
286       can  be  connected  into sophisticated topologies. Logical switches and
287       routers are ordinarily purely logical entities, that is, they  are  not
288       associated  or bound to any physical location, and they are implemented
289       in a distributed manner at each hypervisor that participates in OVN.
290
291       Logical switch ports (LSPs) are points of connectivity into and out  of
292       logical  switches.  There  are  many kinds of logical switch ports. The
293       most ordinary kind represent VIFs, that is, attachment points  for  VMs
294       or containers. A VIF logical port is associated with the physical loca‐
295       tion of its VM, which might change as the VM migrates. (A  VIF  logical
296       port  can  be  associated  with a VM that is powered down or suspended.
297       Such a logical port has no location and no connectivity.)
298
299       Logical router ports (LRPs) are points of connectivity into and out  of
       logical routers. An LRP connects a logical router either to a logical
301       switch or to another logical router. Logical routers  only  connect  to
302       VMs,  containers,  and  other network nodes indirectly, through logical
303       switches.
304
305       Logical switches and logical routers have  distinct  kinds  of  logical
306       ports,  so  properly  speaking  one  should  usually talk about logical
307       switch ports or logical router ports. However, an unqualified ``logical
308       port’’ usually refers to a logical switch port.
309
310       When a VM sends a packet to a VIF logical switch port, the Open vSwitch
311       flow tables simulate the packet’s journey through that  logical  switch
312       and  any  other  logical routers and logical switches that it might en‐
313       counter. This happens without transmitting the packet across any physi‐
314       cal  medium: the flow tables implement all of the switching and routing
315       decisions and behavior. If the flow tables ultimately decide to  output
316       the packet at a logical port attached to another hypervisor (or another
317       kind of transport node), then that is the time at which the  packet  is
318       encapsulated for physical network transmission and sent.
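
       Such a topology can be expressed to OVN with ovn-nbctl. For example, a
       minimal sketch that creates one logical switch, one VIF logical switch
       port, and a connected logical router (names and addresses are invented;
       the router port type is described in the next subsection):

              # Sketch: one logical switch, one VIF port, one logical router.
              ovn-nbctl ls-add ls1
              ovn-nbctl lsp-add ls1 vm1-port
              ovn-nbctl lsp-set-addresses vm1-port "00:00:00:00:00:01 10.0.1.10"
              ovn-nbctl lr-add lr1
              ovn-nbctl lrp-add lr1 lrp-ls1 00:00:00:00:ff:01 10.0.1.1/24
              ovn-nbctl lsp-add ls1 ls1-lr1
              ovn-nbctl lsp-set-type ls1-lr1 router
              ovn-nbctl lsp-set-addresses ls1-lr1 router
              ovn-nbctl lsp-set-options ls1-lr1 router-port=lrp-ls1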
319
320     Logical Switch Port Types
321
322       OVN  supports a number of kinds of logical switch ports. VIF ports that
323       connect to VMs or containers, described above, are  the  most  ordinary
324       kind  of  LSP.  In the OVN northbound database, VIF ports have an empty
325       string for their type. This section describes some  of  the  additional
326       port types.
327
328       A  router  logical  switch  port connects a logical switch to a logical
329       router, designating a particular LRP as its peer.
330
331       A localnet logical switch port bridges a logical switch to  a  physical
332       VLAN. A logical switch may have one or more localnet ports. Such a log‐
333       ical switch is used in two scenarios:
334
335              •      With one or more router logical switch ports,  to  attach
336                     L3 gateway routers and distributed gateways to a physical
337                     network.
338
339              •      With one or more VIF logical switch ports, to attach  VMs
340                     or  containers  directly  to  a physical network. In this
341                     case, the logical switch is not really logical, since  it
342                     is  bridged to the physical network rather than insulated
343                     from it, and therefore cannot have independent but  over‐
344                     lapping  IP  address  namespaces, etc. A deployment might
345                     nevertheless choose such a configuration to  take  advan‐
346                     tage  of  the OVN control plane and features such as port
347                     security and ACLs.
348
349       When a logical switch contains multiple localnet ports,  the  following
350       is assumed.
351
352              •      Each chassis has a bridge mapping for one of the localnet
353                     physical networks only.
354
355              •      To facilitate interconnectivity between VIF ports of  the
356                     switch that are located on different chassis with differ‐
357                     ent physical network connectivity, the fabric  implements
358                     L3  routing  between these adjacent physical network seg‐
359                     ments.
360
       Note: nothing said above implies that a chassis cannot be plugged into
       multiple physical networks, as long as they belong to different
       switches.
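
       For illustration, a localnet port bridging ls1 to a physical network
       named physnet1 on VLAN 10 might be configured as follows (a sketch;
       names are invented, and the bridge mapping assumes each chassis
       reaches physnet1 through a local bridge named br-phys):

              # Sketch: add a localnet port and, on each chassis, map the
              # physical network name to a local bridge.
              ovn-nbctl lsp-add ls1 ls1-physnet1
              ovn-nbctl lsp-set-type ls1-physnet1 localnet
              ovn-nbctl lsp-set-addresses ls1-physnet1 unknown
              ovn-nbctl lsp-set-options ls1-physnet1 network_name=physnet1
              ovn-nbctl set Logical_Switch_Port ls1-physnet1 tag=10
              # On each chassis (run locally):
              ovs-vsctl set Open_vSwitch . \
                  external-ids:ovn-bridge-mappings=physnet1:br-phys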
364
365       A localport logical switch port is a special kind of VIF logical switch
366       port.  These  ports are present in every chassis, not bound to any par‐
367       ticular one. Traffic to such a port will never be forwarded  through  a
368       tunnel, and traffic from such a port is expected to be destined only to
369       the same chassis, typically in response to a request it received. Open‐
370       Stack  Neutron  uses a localport port to serve metadata to VMs. A meta‐
371       data proxy process is attached to this port on every host and  all  VMs
372       within  the same network will reach it at the same IP/MAC address with‐
373       out any traffic being sent over a tunnel. For further details, see  the
374       OpenStack documentation for networking-ovn.
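
       For example, a metadata localport of the kind Neutron uses might be
       created like this (a sketch; the MAC is invented and the IP is the
       conventional link-local metadata address):

              # Sketch: a localport present on every chassis.
              ovn-nbctl lsp-add ls1 ls1-metadata
              ovn-nbctl lsp-set-type ls1-metadata localport
              ovn-nbctl lsp-set-addresses ls1-metadata \
                  "02:00:00:00:00:01 169.254.169.254"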
375
376       LSP  types  vtep and l2gateway are used for gateways. See Gateways, be‐
377       low, for more information.
378
379     Implementation Details
380
381       These concepts are details of how OVN is implemented  internally.  They
382       might still be of interest to users and administrators.
383
384       Logical  datapaths  are an implementation detail of logical networks in
385       the OVN southbound database. ovn-northd translates each logical  switch
386       or  router  in  the  northbound database into a logical datapath in the
387       southbound database Datapath_Binding table.
388
389       For the most part, ovn-northd also translates each logical switch  port
390       in the OVN northbound database into a record in the southbound database
391       Port_Binding table. The latter table corresponds roughly to the  north‐
392       bound  Logical_Switch_Port table. It has multiple types of logical port
393       bindings, of which many types correspond  directly  to  northbound  LSP
394       types. LSP types handled this way include VIF (empty string), localnet,
395       localport, vtep, and l2gateway.
396
397       The Port_Binding table has some types of port binding that do not  cor‐
       respond directly to logical switch port types. The most common is the
       patch port binding, known as a logical patch port. These port bindings
       always occur in pairs, and a packet that enters on either side comes
       out on the
401       other. ovn-northd connects logical switches  and  logical  routers  to‐
402       gether using logical patch ports.
403
404       Port  bindings  with types vtep, l2gateway, l3gateway, and chassisredi‐
405       rect are used for gateways. These are explained in Gateways, below.
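
       These southbound objects can be inspected directly with ovn-sbctl, for
       example (a sketch; the output depends on the deployment):

              # Sketch: see how ovn-northd realized the logical topology.
              ovn-sbctl list Datapath_Binding          # one row per LS/LR
              ovn-sbctl find Port_Binding type=patch   # logical patch ports
              ovn-sbctl lflow-list                     # logical datapath flows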
406
407   Gateways
408       Gateways provide limited  connectivity  between  logical  networks  and
409       physical ones. They can also provide connectivity between different OVN
       deployments. This section focuses on the former; the latter is de‐
       scribed in detail in the section OVN Deployments Interconnection.

       OVN supports multiple kinds of gateways.
414
415     VTEP Gateways
416
417       A  ``VTEP  gateway’’  connects an OVN logical network to a physical (or
418       virtual) switch that implements the OVSDB VTEP schema that  accompanies
419       Open vSwitch. (The ``VTEP gateway’’ term is a misnomer, since a VTEP is
420       just a VXLAN Tunnel Endpoint, but it is a well established  name.)  See
421       Life Cycle of a VTEP gateway, below, for more information.
422
423       The  main  intended  use  case  for VTEP gateways is to attach physical
424       servers to an OVN logical network using a physical  top-of-rack  switch
425       that supports the OVSDB VTEP schema.
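
       For example, attaching a logical switch to a logical switch defined on
       such a top-of-rack switch might look like this (a sketch; the switch
       and VTEP logical switch names are invented, and the option names are
       from ovn-nb(5)):

              # Sketch: a vtep LSP bound to TOR switch "tor1" and VTEP
              # logical switch "ls1-vtep".
              ovn-nbctl lsp-add ls1 ls1-tor1
              ovn-nbctl lsp-set-type ls1-tor1 vtep
              ovn-nbctl lsp-set-options ls1-tor1 \
                  vtep-physical-switch=tor1 vtep-logical-switch=ls1-vtep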
426
427     L2 Gateways
428
       An L2 gateway simply attaches a designated physical L2 segment available
430       on some chassis to a logical network. The physical network  effectively
431       becomes part of the logical network.
432
       To set up an L2 gateway, the CMS adds an l2gateway LSP to an appropriate
434       logical switch, setting LSP options to name the  chassis  on  which  it
435       should be bound. ovn-northd copies this configuration into a southbound
436       Port_Binding record. On the designated chassis, ovn-controller forwards
437       packets appropriately to and from the physical segment.
438
439       L2  gateway ports have features in common with localnet ports. However,
440       with a localnet port, the physical network becomes  the  transport  be‐
441       tween  hypervisors.  With  an L2 gateway, packets are still transported
442       between hypervisors over tunnels and the l2gateway port  is  only  used
443       for  the  packets that are on the physical network. The application for
444       L2 gateways is similar to that for VTEP gateways, e.g. to add  non-vir‐
445       tualized  machines to a logical network, but L2 gateways do not require
446       special support from top-of-rack hardware switches.
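
       For example (a sketch; the port, network, and chassis names are
       invented):

              # Sketch: an l2gateway port bound to chassis "gw1".
              ovn-nbctl lsp-add ls1 ls1-l2gw
              ovn-nbctl lsp-set-type ls1-l2gw l2gateway
              ovn-nbctl lsp-set-addresses ls1-l2gw unknown
              ovn-nbctl lsp-set-options ls1-l2gw \
                  network_name=physnet1 l2gateway-chassis=gw1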
447
448     L3 Gateway Routers
449
450       As described above under Logical Networks, ordinary OVN logical routers
451       are  distributed: they are not implemented in a single place but rather
452       in every hypervisor chassis. This is a problem  for  stateful  services
453       such  as  SNAT  and DNAT, which need to be implemented in a centralized
454       manner.
455
456       To allow for this  kind  of  functionality,  OVN  supports  L3  gateway
457       routers, which are OVN logical routers that are implemented in a desig‐
458       nated chassis. Gateway routers are typically used  between  distributed
459       logical  routers  and physical networks. The distributed logical router
460       and the logical switches behind it, to which VMs and containers attach,
461       effectively  reside  on each hypervisor. The distributed router and the
462       gateway router are connected by another logical switch,  sometimes  re‐
463       ferred  to  as  a  ``join’’ logical switch. (OVN logical routers may be
464       connected to one another directly, without an intervening  switch,  but
465       the  OVN  implementation only supports gateway logical routers that are
466       connected to logical switches. Using a join logical switch also reduces
467       the  number  of  IP addresses needed on the distributed router.) On the
468       other side, the gateway router connects to another logical switch  that
469       has a localnet port connecting to the physical network.
470
471       The  following  diagram  shows a typical situation. One or more logical
472       switches LS1, ..., LSn connect to distributed logical router LR1, which
473       in  turn  connects  through LSjoin to gateway logical router GLR, which
474       also connects to logical switch LSlocal, which includes a localnet port
475       to attach to the physical network.
476
477                                       LSlocal
478                                          |
479                                         GLR
480                                          |
481                                       LSjoin
482                                          |
483                                         LR1
484                                          |
485                                     +----+----+
486                                     |    |    |
487                                    LS1  ...  LSn
488
489
490       To  configure an L3 gateway router, the CMS sets options:chassis in the
491       router’s northbound Logical_Router to the chassis’s name. In  response,
492       ovn-northd  uses  a  special l3gateway port binding (instead of a patch
493       binding) in the southbound database to connect the  logical  router  to
494       its  neighbors.  In  turn,  ovn-controller tunnels packets to this port
495       binding to the designated L3 gateway  chassis,  instead  of  processing
496       them locally.
497
498       DNAT and SNAT rules may be associated with a gateway router, which pro‐
499       vides a central location that can handle one-to-many SNAT (aka IP  mas‐
500       querading).  Distributed  gateway  ports, described below, also support
501       NAT.
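
       For illustration, pinning a gateway router to a chassis and adding a
       one-to-many SNAT rule might look like this (a sketch; names and
       addresses are invented):

              # Sketch: bind gateway router GLR to chassis "gw1" and SNAT
              # the overlay subnet to an external address.
              ovn-nbctl lr-add GLR
              ovn-nbctl set Logical_Router GLR options:chassis=gw1
              ovn-nbctl lr-nat-add GLR snat 203.0.113.2 10.0.0.0/16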
502
503     Distributed Gateway Ports
504
505       A distributed gateway port is a logical router port that  is  specially
506       configured  to  designate one distinguished chassis, called the gateway
507       chassis, for centralized processing. A distributed gateway port  should
508       connect  to  a logical switch that has an LSP that connects externally,
509       that is, either a localnet LSP or a connection to another  OVN  deploy‐
510       ment  (see  OVN Deployments Interconnection). Packets that traverse the
511       distributed gateway port are processed without  involving  the  gateway
512       chassis  when  they  can  be, but when needed they do take an extra hop
513       through it.
514
515       The following diagram illustrates the  use  of  a  distributed  gateway
516       port. A number of logical switches LS1, ..., LSn connect to distributed
517       logical router LR1, which in  turn  connects  through  the  distributed
518       gateway port to logical switch LSlocal that includes a localnet port to
519       attach to the physical network.
520
521                                       LSlocal
522                                          |
523                                         LR1
524                                          |
525                                     +----+----+
526                                     |    |    |
527                                    LS1  ...  LSn
528
529
530       ovn-northd creates two southbound Port_Binding records to  represent  a
531       distributed  gateway  port, instead of the usual one. One of these is a
       patch port binding named for the LRP, which handles as much of the
       traffic as it can. The other is a port binding with type chassisredirect,
534       named cr-port. The chassisredirect port  binding  has  one  specialized
535       job: when a packet is output to it, the flow table causes it to be tun‐
536       neled to the gateway chassis, at which point it is automatically output
537       to the patch port binding. Thus, the flow table can output to this port
538       binding in cases where a particular task has to happen on  the  gateway
539       chassis.  The  chassisredirect  port binding is not otherwise used (for
540       example, it never receives packets).
541
542       The CMS may configure distributed gateway ports three  different  ways.
543       See   Distributed   Gateway   Ports  in  the  documentation  for  Logi‐
544       cal_Router_Port in ovn-nb(5) for details.
545
546       Distributed gateway ports support high availability. When more than one
547       chassis  is specified, OVN only uses one at a time as the gateway chas‐
548       sis. OVN uses BFD to monitor gateway connectivity, preferring the high‐
549       est-priority gateway that is online.
550
551       A logical router can have multiple distributed gateway ports, each con‐
552       necting different external networks. However, some  features,  such  as
553       NAT  and load balancers, are not supported yet for logical routers with
554       more than one distributed gateway port configured.
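
       One common way to configure a distributed gateway port is to name one
       or more gateway chassis on the LRP, for example (a sketch; names and
       addresses are invented; see ovn-nb(5) for the other methods):

              # Sketch: a distributed gateway port with two candidate gateway
              # chassis; the higher priority (20) is preferred.
              ovn-nbctl lrp-add lr1 lrp-external 00:00:00:00:ff:02 \
                  203.0.113.1/24
              ovn-nbctl lrp-set-gateway-chassis lrp-external gw1 20
              ovn-nbctl lrp-set-gateway-chassis lrp-external gw2 10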
555
556       Physical VLAN MTU Issues
557
558       Consider the preceding diagram again:
559
560                                       LSlocal
561                                          |
562                                         LR1
563                                          |
564                                     +----+----+
565                                     |    |    |
566                                    LS1  ...  LSn
567
568
569       Suppose that each logical switch LS1, ..., LSn is bridged to a physical
570       VLAN-tagged network attached to a localnet port on LSlocal, over a dis‐
571       tributed gateway port on LR1. If a packet originating on  LSi  is  des‐
572       tined to the external network, OVN sends it to the gateway chassis over
573       a tunnel. There, the packet traverses LR1’s  logical  router  pipeline,
574       possibly  undergoes  NAT,  and eventually ends up at LSlocal’s localnet
575       port. If all of the physical links in the network have  the  same  MTU,
576       then the packet’s transit across a tunnel causes an MTU problem: tunnel
577       overhead prevents a packet that uses the full physical MTU from  cross‐
578       ing the tunnel to the gateway chassis (without fragmentation).
579
580       OVN  offers two solutions to this problem, the reside-on-redirect-chas‐
581       sis and redirect-type options.  Both  solutions  require  each  logical
582       switch  LS1,  ...,  LSn  to include a localnet logical switch port LN1,
583       ..., LNn respectively, that is present  on  each  chassis.  Both  cause
584       packets  to  be  sent  over the localnet ports instead of tunnels. They
       differ in which packets (some or all) are sent this way. The most promi‐
586       nent  tradeoff between these options is that reside-on-redirect-chassis
587       is easier to configure and that redirect-type performs better for east-
588       west traffic.
589
590       The first solution is the reside-on-redirect-chassis option for logical
591       router ports. Setting this option on a LRP from (e.g.) LS1 to LR1  dis‐
592       ables  forwarding  from  LS1  to  LR1 except on the gateway chassis. On
593       chassis other than the gateway chassis, this single change  means  that
594       packets  that  would  otherwise  have been forwarded to LR1 are instead
595       forwarded to LN1. The instance of LN1 on the gateway chassis  then  re‐
596       ceives  the packet and forwards it to LR1. The packet traverses the LR1
597       logical router pipeline, possibly undergoes NAT, and eventually ends up
598       at LSlocal’s localnet port. The packet never traverses a tunnel, avoid‐
599       ing the MTU issue.
600
601       This option has the further consequence of centralizing ``distributed’’
602       logical  router  LR1, since no packets are forwarded from LS1 to LR1 on
603       any chassis other than the gateway chassis. Therefore, east-west  traf‐
604       fic  passes  through  the  gateway  chassis, not just north-south. (The
605       naive ``fix’’ of allowing east-west traffic to  flow  directly  between
606       chassis over LN1 does not work because routing sets the Ethernet source
607       address to LR1’s source address. Seeing this single Ethernet source ad‐
608       dress  originate  from  all  of  the  chassis will confuse the physical
609       switch.)
610
611       Do not set the reside-on-redirect-chassis option on a distributed gate‐
612       way  port. In the diagram above, it would be set on the LRPs connecting
613       LS1, ..., LSn to LR1.
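
       For example, using the names from the diagram (a sketch; the LRP name
       is invented):

              # Sketch: set the option on the LRP that connects LS1 to LR1,
              # not on the distributed gateway port itself.
              ovn-nbctl set Logical_Router_Port lrp-ls1 \
                  options:reside-on-redirect-chassis=true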
614
615       The second solution is the redirect-type option for distributed gateway
616       ports.  Setting  this  option  to bridged causes packets that are redi‐
617       rected to the gateway chassis to go over the localnet ports instead  of
618       being  tunneled. This option does not change how OVN treats packets not
619       redirected to the gateway chassis.
620
621       The redirect-type option requires the administrator or the CMS to  con‐
622       figure  each  participating  chassis with a unique Ethernet address for
623       the logical router by  setting  ovn-chassis-mac-mappings  in  the  Open
624       vSwitch  database, for use by ovn-controller. This makes it more diffi‐
625       cult to configure than reside-on-redirect-chassis.
626
627       Set the redirect-type option on a distributed gateway port.
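
       For example (a sketch; the port name, physical network name, and MAC
       address are invented):

              # Sketch: redirect bridged traffic over the localnet network.
              ovn-nbctl set Logical_Router_Port lrp-external \
                  options:redirect-type=bridged
              # On each chassis, give the router a unique MAC on physnet1:
              ovs-vsctl set open . \
                external-ids:ovn-chassis-mac-mappings="physnet1:0a:00:00:00:00:01"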
628
629       Using Distributed Gateway Ports For Scalability
630
631       Although the primary goal of distributed gateway ports  is  to  provide
632       connectivity  to  external  networks,  there  is a special use case for
633       scalability.
634
       In some deployments, such as those using ovn-kubernetes, logical
       switches are bound to individual chassis and are connected by a
       distributed logical router. In such deployments, the chassis-level
       logical switches are centralized on their chassis rather than
       distributed, so the ovn-controller on each chassis does not need to
       process flows and ports of logical switches on other chassis. However,
       without any specific hint, ovn-controller would still process all the
       logical switches as if they were fully distributed. In this case,
       distributed gateway ports can be very useful. The chassis-level logical
       switches can be connected to the distributed router using distributed
       gateway ports, by setting the gateway chassis (or an HA chassis group
       with only a single chassis in it) to the chassis that each logical
       switch is bound to. ovn-controller then skips processing the logical
       switches on all the other chassis, greatly improving scalability,
       especially when there is a large number of chassis.
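
       For example, in such a deployment the LRP that connects node1’s
       per-chassis switch to the cluster router could be pinned to that
       chassis (a sketch; names and addresses are invented):

              # Sketch: pin the LRP for node1's switch to chassis "node1" so
              # that other chassis can skip that switch entirely.
              ovn-nbctl lrp-add lr-cluster lrp-node1 00:00:00:00:aa:01 \
                  10.1.1.1/24
              ovn-nbctl lrp-set-gateway-chassis lrp-node1 node1 100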
650
651   Life Cycle of a VIF
652       Tables and their schemas presented in isolation are difficult to under‐
653       stand. Here’s an example.
654
655       A VIF on a hypervisor is a virtual network interface attached either to
       a VM or a container running directly on that hypervisor. (This is dif‐
       ferent from the interface of a container running inside a VM.)
658
       The steps in this example often refer to details of the OVN Southbound
       and OVN Northbound database schemas. Please see ovn-sb(5) and
       ovn-nb(5), respectively, for the full story on these databases. (A
       brief command-line sketch of two of the steps follows the list.)
662
663              1.  A VIF’s life cycle begins when a CMS administrator creates a
664                  new VIF using the CMS user interface or API and adds it to a
665                  switch (one implemented by OVN as a logical switch). The CMS
666                  updates its own  configuration.  This  includes  associating
667                  unique,  persistent  identifier  vif-id and Ethernet address
668                  mac with the VIF.
669
670              2.  The CMS plugin updates the OVN Northbound  database  to  in‐
671                  clude   the   new   VIF,  by  adding  a  row  to  the  Logi‐
672                  cal_Switch_Port table. In the new row, name is  vif-id,  mac
673                  is  mac,  switch  points  to  the OVN logical switch’s Logi‐
674                  cal_Switch record, and other columns are initialized  appro‐
675                  priately.
676
677              3.  ovn-northd  receives  the OVN Northbound database update. In
678                  turn, it makes the corresponding updates to the  OVN  South‐
679                  bound  database,  by adding rows to the OVN Southbound data‐
680                  base Logical_Flow table to reflect the new port, e.g. add  a
681                  flow  to  recognize  that packets destined to the new port’s
682                  MAC address should be delivered to it, and update  the  flow
683                  that delivers broadcast and multicast packets to include the
684                  new port. It also creates a record in the Binding table  and
685                  populates  all its columns except the column that identifies
686                  the chassis.
687
688              4.  On  every  hypervisor,  ovn-controller  receives  the  Logi‐
689                  cal_Flow  table updates that ovn-northd made in the previous
690                  step. As long as the VM that owns the VIF  is  powered  off,
691                  ovn-controller  cannot  do much; it cannot, for example, ar‐
692                  range to send packets to or receive packets  from  the  VIF,
693                  because the VIF does not actually exist anywhere.
694
695              5.  Eventually,  a  user  powers on the VM that owns the VIF. On
696                  the hypervisor where the VM is powered on,  the  integration
697                  between  the hypervisor and Open vSwitch (described in Docu‐
698                  mentation/topics/integration.rst) adds the VIF  to  the  OVN
699                  integration    bridge    and   stores   vif-id   in   exter‐
700                  nal_ids:iface-id to indicate that the interface  is  an  in‐
701                  stantiation  of  the  new  VIF. (None of this code is new in
702                  OVN; this is pre-existing integration work that has  already
703                  been done on hypervisors that support OVS.)
704
705              6.  On the hypervisor where the VM is powered on, ovn-controller
706                  notices external_ids:iface-id in the new Interface.  In  re‐
707                  sponse, in the OVN Southbound DB, it updates the Binding ta‐
708                  ble’s chassis column for the row that links the logical port
                  from external_ids:iface-id to the hypervisor. Afterward,
710                  ovn-controller updates the local hypervisor’s  OpenFlow  ta‐
711                  bles  so  that packets to and from the VIF are properly han‐
712                  dled.
713
714              7.  Some CMS systems, including OpenStack, fully start a VM only
715                  when  its  networking  is ready. To support this, ovn-northd
716                  notices the chassis column updated for the  row  in  Binding
717                  table  and  pushes  this upward by updating the up column in
718                  the OVN Northbound database’s Logical_Switch_Port  table  to
719                  indicate  that  the  VIF is now up. The CMS, if it uses this
720                  feature, can then react by allowing the  VM’s  execution  to
721                  proceed.
722
723              8.  On  every  hypervisor  but  the  one  where the VIF resides,
724                  ovn-controller notices the completely populated row  in  the
725                  Binding table. This provides ovn-controller the physical lo‐
726                  cation of the logical port, so  each  instance  updates  the
727                  OpenFlow  tables  of  its  switch (based on logical datapath
728                  flows in the OVN DB Logical_Flow table) so that  packets  to
729                  and from the VIF can be properly handled via tunnels.
730
731              9.  Eventually,  a  user powers off the VM that owns the VIF. On
732                  the hypervisor where the VM was  powered  off,  the  VIF  is
733                  deleted from the OVN integration bridge.
734
735              10. On  the  hypervisor  where  the VM was powered off, ovn-con‐
736                  troller notices that the VIF was deleted.  In  response,  it
737                  removes  the Chassis column content in the Binding table for
738                  the logical port.
739
740              11. On every hypervisor, ovn-controller notices the empty  Chas‐
741                  sis  column in the Binding table’s row for the logical port.
742                  This means that ovn-controller no longer knows the  physical
743                  location  of  the logical port, so each instance updates its
744                  OpenFlow table to reflect that.
745
746              12. Eventually, when the VIF (or its entire  VM)  is  no  longer
747                  needed by anyone, an administrator deletes the VIF using the
748                  CMS user interface or API. The CMS updates its own  configu‐
749                  ration.
750
751              13. The CMS plugin removes the VIF from the OVN Northbound data‐
752                  base, by deleting its row in the Logical_Switch_Port table.
753
754              14. ovn-northd receives the OVN Northbound update  and  in  turn
755                  updates the OVN Southbound database accordingly, by removing
756                  or updating the rows from the OVN Southbound database  Logi‐
757                  cal_Flow  table  and  Binding table that were related to the
758                  now-destroyed VIF.
759
760              15. On  every  hypervisor,  ovn-controller  receives  the  Logi‐
761                  cal_Flow  table updates that ovn-northd made in the previous
762                  step. ovn-controller updates OpenFlow tables to reflect  the
763                  update,  although there may not be much to do, since the VIF
764                  had already become unreachable when it was removed from  the
765                  Binding table in a previous step.
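
       The CMS-side and hypervisor-side pieces of this sequence (steps 2 and
       5) can be approximated from the command line as follows (a sketch;
       vif-id, the MAC and IP addresses, and the tap device name are
       invented):

              # Sketch of step 2: the CMS plugin's northbound update.
              ovn-nbctl lsp-add ls1 vif-id-1234
              ovn-nbctl lsp-set-addresses vif-id-1234 \
                  "00:00:00:00:00:42 10.0.1.42"
              # Sketch of step 5: the hypervisor integration attaches the VIF.
              ovs-vsctl add-port br-int tap1234 \
                  -- set Interface tap1234 external-ids:iface-id=vif-id-1234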
766
767   Life Cycle of a Container Interface Inside a VM
768       OVN  provides  virtual  network  abstractions by converting information
769       written in OVN_NB database to OpenFlow flows in each hypervisor. Secure
770       virtual  networking  for multi-tenants can only be provided if OVN con‐
771       troller is the only entity that can modify flows in Open vSwitch.  When
772       the  Open vSwitch integration bridge resides in the hypervisor, it is a
773       fair assumption to make that tenant workloads running inside VMs cannot
774       make any changes to Open vSwitch flows.
775
776       If  the infrastructure provider trusts the applications inside the con‐
777       tainers not to break out and modify the Open vSwitch flows,  then  con‐
778       tainers  can be run in hypervisors. This is also the case when contain‐
779       ers are run inside the VMs and Open  vSwitch  integration  bridge  with
780       flows  added  by  OVN  controller  resides in the same VM. For both the
781       above cases, the workflow is the same as explained with an  example  in
782       the previous section ("Life Cycle of a VIF").
783
784       This  section talks about the life cycle of a container interface (CIF)
785       when containers are created in the VMs and the Open vSwitch integration
786       bridge resides inside the hypervisor. In this case, even if a container
787       application breaks out, other tenants are not affected because the con‐
788       tainers  running  inside  the  VMs  cannot modify the flows in the Open
789       vSwitch integration bridge.
790
791       When multiple containers are created inside a VM,  there  are  multiple
792       CIFs  associated  with  them. The network traffic associated with these
793       CIFs need to reach the Open vSwitch integration bridge running  in  the
794       hypervisor  for OVN to support virtual network abstractions. OVN should
795       also be able to distinguish network traffic coming from different CIFs.
796       There are two ways to distinguish network traffic of CIFs.
797
798       One  way  is  to  provide one VIF for every CIF (1:1 model). This means
799       that there could be a lot of network devices in  the  hypervisor.  This
800       would slow down OVS because of all the additional CPU cycles needed for
801       the management of all the VIFs. It would also mean that the entity cre‐
802       ating  the  containers in a VM should also be able to create the corre‐
803       sponding VIFs in the hypervisor.
804
       The second way is to provide a single VIF for all the CIFs (1:many
       model). OVN could then distinguish network traffic coming from differ‐
       ent CIFs via a tag written in every packet. OVN uses this mechanism,
       with VLAN as the tag. (A brief command-line sketch follows the num‐
       bered steps below.)
809
              1.  A CIF’s life cycle begins when a container is spawned inside
                  a VM, by either the same CMS that created the VM, a tenant
                  that owns that VM, or even a container orchestration system
                  different from the CMS that initially created the VM. What‐
                  ever the entity is, it needs to know the vif-id associated
                  with the network interface of the VM through which the con‐
                  tainer interface’s network traffic is expected to go. The
                  entity that creates the container interface also needs to
                  choose an unused VLAN inside that VM.
820
821              2.  The container spawning entity (either  directly  or  through
822                  the  CMS that manages the underlying infrastructure) updates
823                  the OVN Northbound database  to  include  the  new  CIF,  by
824                  adding  a  row  to the Logical_Switch_Port table. In the new
                  row, name is any unique identifier, parent_name is the vif-
                  id of the VM through which the CIF’s network traffic is ex‐
                  pected to go, and tag is the VLAN tag that identifies the
                  network traffic of that CIF.
829
830              3.  ovn-northd  receives  the OVN Northbound database update. In
831                  turn, it makes the corresponding updates to the  OVN  South‐
832                  bound  database,  by adding rows to the OVN Southbound data‐
833                  base’s Logical_Flow table to reflect the new port  and  also
834                  by  creating  a  new row in the Binding table and populating
835                  all its columns except the column that identifies the  chas‐
836                  sis.
837
838              4.  On   every  hypervisor,  ovn-controller  subscribes  to  the
839                  changes in the Binding table. When a new row is  created  by
840                  ovn-northd  that  includes  a value in parent_port column of
841                  Binding table, the ovn-controller in  the  hypervisor  whose
842                  OVN  integration bridge has that same value in vif-id in ex‐
843                  ternal_ids:iface-id updates the local hypervisor’s  OpenFlow
844                  tables so that packets to and from the VIF with the particu‐
845                  lar VLAN tag are properly handled. Afterward it updates  the
846                  chassis  column of the Binding to reflect the physical loca‐
847                  tion.
848
849              5.  One can only start the application inside the container  af‐
850                  ter  the  underlying  network  is  ready.  To  support this,
851                  ovn-northd notices the updated chassis column in Binding ta‐
852                  ble  and  updates  the up column in the OVN Northbound data‐
853                  base’s Logical_Switch_Port table to indicate that the CIF is
                  now up. The entity responsible for starting the container
                  application queries this value and starts the application.
856
              6.  Eventually, the entity that created and started the con‐
                  tainer stops it. The entity, through the CMS (or directly),
                  deletes its row in the Logical_Switch_Port table.
860
861              7.  ovn-northd receives the OVN Northbound update  and  in  turn
862                  updates the OVN Southbound database accordingly, by removing
863                  or updating the rows from the OVN Southbound database  Logi‐
864                  cal_Flow  table  that were related to the now-destroyed CIF.
865                  It also deletes the row in the Binding table for that CIF.
866
867              8.  On  every  hypervisor,  ovn-controller  receives  the  Logi‐
868                  cal_Flow  table updates that ovn-northd made in the previous
869                  step. ovn-controller updates OpenFlow tables to reflect  the
870                  update.
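
       Step 2 can be approximated from the command line as follows (a sketch;
       the CIF name, parent vif-id, VLAN tag, and addresses are invented):

              # Sketch: a CIF on parent VIF "vif-id-1234" using VLAN tag 42.
              ovn-nbctl lsp-add ls1 cif-web-1 vif-id-1234 42
              ovn-nbctl lsp-set-addresses cif-web-1 \
                  "00:00:00:00:01:01 10.0.1.101"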
871
872   Architectural Physical Life Cycle of a Packet
873       This section describes how a packet travels from one virtual machine or
874       container to another through OVN. This description focuses on the phys‐
875       ical treatment of a packet; for a description of the logical life cycle
876       of a packet, please refer to the Logical_Flow table in ovn-sb(5).
877
       This section mentions several data and metadata fields, summarized
       here for clarity (a brief inspection sketch follows the list):
880
881              tunnel key
882                     When  OVN encapsulates a packet in Geneve or another tun‐
883                     nel, it attaches extra data to it to allow the  receiving
884                     OVN  instance to process it correctly. This takes differ‐
885                     ent forms depending on the particular encapsulation,  but
886                     in  each  case we refer to it here as the ``tunnel key.’’
887                     See Tunnel Encapsulations, below, for details.
888
889              logical datapath field
890                     A field that denotes the logical datapath through which a
891                     packet  is being processed. OVN uses the field that Open‐
892                     Flow 1.1+ simply (and confusingly) calls ``metadata’’  to
893                     store  the logical datapath. (This field is passed across
894                     tunnels as part of the tunnel key.)
895
896              logical input port field
897                     A field that denotes the  logical  port  from  which  the
898                     packet  entered  the logical datapath. OVN stores this in
899                     Open vSwitch extension register number 14.
900
                     Geneve and STT tunnels pass this field as part of the
                     tunnel key. Ramp switch VXLAN tunnels do not explicitly
                     carry a logical input port, but they are used to communi‐
                     cate with gateways that, from OVN’s perspective, consist
                     of only a single logical port, so OVN can set the logical
                     input port field to this one on ingress to the OVN logi‐
                     cal pipeline. Regular VXLAN tunnels do not carry the in‐
                     put port field at all. This puts additional limitations
                     on cluster capabilities, described in the Tunnel Encapsu‐
                     lations section, below.
911
912              logical output port field
913                     A  field  that  denotes  the  logical port from which the
914                     packet will leave the logical datapath. This is  initial‐
915                     ized  to  0 at the beginning of the logical ingress pipe‐
916                     line. OVN stores this in Open vSwitch extension  register
917                     number 15.
918
                     Geneve, STT, and regular VXLAN tunnels pass this field as
                     part of the tunnel key. Ramp switch VXLAN tunnels do not
                     carry the logical output port field in the tunnel key, so
                     when an OVN hypervisor receives a packet from a ramp
                     switch VXLAN tunnel, it resubmits the packet to table 8
                     to determine the output port(s); when the packet reaches
                     table 32, it is resubmitted to table 33 for local deliv‐
                     ery by checking the MLF_RCV_FROM_RAMP flag, which is set
                     when the packet arrives from a ramp tunnel.
930
931              conntrack zone field for logical ports
932                     A  field  that  denotes  the connection tracking zone for
933                     logical ports. The value only has local significance  and
934                     is not meaningful between chassis. This is initialized to
935                     0 at the beginning of the logical ingress  pipeline.  OVN
936                     stores this in Open vSwitch extension register number 13.
937
938              conntrack zone fields for routers
939                     Fields  that  denote  the  connection  tracking zones for
940                     routers. These values only have  local  significance  and
941                     are  not  meaningful between chassis. OVN stores the zone
942                     information for north to south traffic (for  DNATting  or
943                     ECMP  symmetric replies) in Open vSwitch extension regis‐
944                     ter number 11 and zone information  for  south  to  north
945                     traffic  (for SNATing) in Open vSwitch extension register
946                     number 12.
947
948              logical flow flags
                     The logical flags are intended to keep context between
                     tables in order to decide which rules in subsequent
                     tables are matched. These values only have local
952                     significance  and are not meaningful between chassis. OVN
953                     stores the logical flags in Open vSwitch extension regis‐
954                     ter number 10.
955
956              VLAN ID
957                     The  VLAN ID is used as an interface between OVN and con‐
958                     tainers nested inside a VM (see Life Cycle of a container
959                     interface inside a VM, above, for more information).
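
       These fields can be observed directly in the OpenFlow flows that
       ovn-controller installs in the integration bridge (conventionally
       named br-int): the logical datapath appears as the OpenFlow
       ``metadata’’ field, and the other fields above appear as registers
       reg10 through reg15 in flow matches and actions. For example, one
       might dump the installed flows on a hypervisor with:

              $ ovs-ofctl -O OpenFlow13 dump-flows br-int

       The exact flows, table numbers, and register values are an
       implementation detail and vary across OVN versions; the command above
       is only a convenient way to see these fields in use.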
960
961       Initially,  a  VM or container on the ingress hypervisor sends a packet
962       on a port attached to the OVN integration bridge. Then:
963
964              1.  OpenFlow table 0 performs  physical-to-logical  translation.
965                  It  matches  the packet’s ingress port. Its actions annotate
966                  the packet with logical metadata,  by  setting  the  logical
967                  datapath  field  to  identify  the logical datapath that the
968                  packet is traversing and the logical  input  port  field  to
969                  identify  the  ingress port. Then it resubmits to table 8 to
970                  enter the logical ingress pipeline.
971
972                  Packets that originate from a container nested within  a  VM
973                  are  treated  in  a  slightly different way. The originating
974                  container can be distinguished  based  on  the  VIF-specific
975                  VLAN  ID, so the physical-to-logical translation flows addi‐
976                  tionally match on VLAN ID and the  actions  strip  the  VLAN
977                  header.  Following  this  step, OVN treats packets from con‐
978                  tainers just like any other packets.
979
980                  Table 0 also processes packets that arrive from other  chas‐
981                  sis.  It  distinguishes  them  from other packets by ingress
982                  port, which is a tunnel. As with packets just  entering  the
983                  OVN  pipeline, the actions annotate these packets with logi‐
984                  cal datapath metadata. For tunnel  types  that  support  it,
985                  they  are also annotated with logical ingress port metadata.
986                  In addition, the actions set the logical output port  field,
987                  which is available because in OVN tunneling occurs after the
988                  logical output port is known. These  pieces  of  information
989                  are  obtained  from  the  tunnel encapsulation metadata (see
990                  Tunnel Encapsulations for encoding details).  Then  the  ac‐
991                  tions resubmit to table 33 to enter the logical egress pipe‐
992                  line.
993
994              2.  OpenFlow tables 8 through 31  execute  the  logical  ingress
995                  pipeline  from  the Logical_Flow table in the OVN Southbound
996                  database. These tables are expressed entirely  in  terms  of
997                  logical concepts like logical ports and logical datapaths. A
998                  big part of ovn-controller’s job is to translate  them  into
999                  equivalent  OpenFlow  (in particular it translates the table
1000                  numbers: Logical_Flow tables 0 through  23  become  OpenFlow
1001                  tables 8 through 31).
1002
1003                  Each logical flow maps to one or more OpenFlow flows. An ac‐
1004                  tual packet ordinarily matches only one of  these,  although
1005                  in  some  cases  it  can  match more than one of these flows
1006                  (which is not a problem because all of them  have  the  same
1007                  actions). ovn-controller uses the first 32 bits of the logi‐
1008                  cal flow’s UUID as the  cookie  for  its  OpenFlow  flow  or
                  flows. (This cookie is not necessarily unique, since the
                  first 32 bits of a logical flow’s UUID are not necessarily
                  unique.)
1011
1012                  Some logical flows can map to the Open vSwitch ``conjunctive
1013                  match’’ extension (see ovs-fields(7)). Flows with a conjunc‐
1014                  tion action use an OpenFlow cookie of 0,  because  they  can
1015                  correspond  to multiple logical flows. The OpenFlow flow for
1016                  a conjunctive match includes a match on conj_id.
1017
1018                  Some logical flows may not be represented  in  the  OpenFlow
1019                  tables  on  a given hypervisor, if they could not be used on
1020                  that hypervisor. For example, if no VIF in a logical  switch
1021                  resides on a given hypervisor, and the logical switch is not
1022                  otherwise reachable on that hypervisor (e.g. over  a  series
1023                  of hops through logical switches and routers starting from a
1024                  VIF on the hypervisor), then the logical  flow  may  not  be
1025                  represented there.
1026
1027                  Most  OVN  actions  have  fairly  obvious implementations in
1028                  OpenFlow (with OVS extensions), e.g. next; is implemented as
1029                  resubmit,  field  =  constant; as set_field. A few are worth
1030                  describing in more detail:
1031
1032                  output:
1033                         Implemented by resubmitting the packet to  table  32.
1034                         If the pipeline executes more than one output action,
1035                         then each one is separately resubmitted to table  32.
1036                         This  can  be  used  to  send  multiple copies of the
1037                         packet to multiple ports. (If the packet was not mod‐
1038                         ified  between  the  output  actions, and some of the
1039                         copies are destined to the same hypervisor, then  us‐
1040                         ing  a logical multicast output port would save band‐
1041                         width between hypervisors.)
1042
1043                  get_arp(P, A);
1044                  get_nd(P, A);
1045                       Implemented by storing arguments into OpenFlow  fields,
1046                       then  resubmitting  to  table  66, which ovn-controller
1047                       populates with flows generated from the MAC_Binding ta‐
1048                       ble in the OVN Southbound database. If there is a match
1049                       in table 66, then its actions store the  bound  MAC  in
1050                       the Ethernet destination address field.
1051
1052                       (The  OpenFlow  actions  save  and restore the OpenFlow
1053                       fields used for the arguments, so that the OVN  actions
1054                       do not have to be aware of this temporary use.)
1055
1056                  put_arp(P, A, E);
1057                  put_nd(P, A, E);
1058                       Implemented  by  storing  the  arguments  into OpenFlow
1059                       fields, then outputting  a  packet  to  ovn-controller,
1060                       which updates the MAC_Binding table.
1061
1062                       (The  OpenFlow  actions  save  and restore the OpenFlow
1063                       fields used for the arguments, so that the OVN  actions
1064                       do not have to be aware of this temporary use.)
1065
1066                  R = lookup_arp(P, A, M);
1067                  R = lookup_nd(P, A, M);
1068                       Implemented  by storing arguments into OpenFlow fields,
1069                       then resubmitting to  table  67,  which  ovn-controller
1070                       populates with flows generated from the MAC_Binding ta‐
1071                       ble in the OVN Southbound database. If there is a match
1072                       in table 67, then its actions set the logical flow flag
1073                       MLF_LOOKUP_MAC.
1074
1075                       (The OpenFlow actions save  and  restore  the  OpenFlow
1076                       fields  used for the arguments, so that the OVN actions
1077                       do not have to be aware of this temporary use.)
1078
1079              3.  OpenFlow tables 32 through 47 implement the output action in
1080                  the logical ingress pipeline. Specifically, table 32 handles
1081                  packets to remote hypervisors, table 33 handles  packets  to
1082                  the  local  hypervisor,  and table 34 checks whether packets
1083                  whose logical ingress and egress port are the same should be
1084                  discarded.
1085
1086                  Logical  patch ports are a special case. Logical patch ports
1087                  do not have a physical location and  effectively  reside  on
1088                  every  hypervisor.  Thus, flow table 33, for output to ports
1089                  on the local hypervisor, naturally implements output to uni‐
1090                  cast  logical  patch  ports  too. However, applying the same
1091                  logic to a logical patch port that is part of a logical mul‐
1092                  ticast  group yields packet duplication, because each hyper‐
1093                  visor that contains a logical port in  the  multicast  group
1094                  will also output the packet to the logical patch port. Thus,
1095                  multicast groups implement output to logical patch ports  in
1096                  table 32.
1097
1098                  Each  flow  in table 32 matches on a logical output port for
1099                  unicast or multicast logical ports that  include  a  logical
1100                  port  on  a remote hypervisor. Each flow’s actions implement
1101                  sending a packet to the port it matches. For unicast logical
1102                  output ports on remote hypervisors, the actions set the tun‐
1103                  nel key to the correct value, then send the  packet  on  the
1104                  tunnel  port to the correct hypervisor. (When the remote hy‐
1105                  pervisor receives the packet, table 0 there  will  recognize
1106                  it  as a tunneled packet and pass it along to table 33.) For
1107                  multicast logical output ports, the actions send one copy of
1108                  the packet to each remote hypervisor, in the same way as for
1109                  unicast destinations. If a multicast group includes a  logi‐
1110                  cal  port or ports on the local hypervisor, then its actions
1111                  also resubmit to table 33. Table 32 also includes:
1112
1113                  •      A higher-priority rule to match packets received from
1114                         ramp switch tunnels, based on flag MLF_RCV_FROM_RAMP,
1115                         and resubmit these packets to table 33 for local  de‐
                         livery. Packets received from ramp switch tunnels
                         reach here because the tunnel key lacks a logical
                         output port field, so these packets had to be
                         resubmitted to table 8 to determine the output
                         port.
1121
1122                  •      A higher-priority rule to match packets received from
1123                         ports of type localport, based on the  logical  input
1124                         port,  and resubmit these packets to table 33 for lo‐
1125                         cal delivery. Ports of type localport exist on  every
1126                         hypervisor  and  by  definition  their traffic should
1127                         never go out through a tunnel.
1128
1129                  •      A higher-priority rule to match packets that have the
1130                         MLF_LOCAL_ONLY  logical flow flag set, and whose des‐
1131                         tination is a multicast address. This flag  indicates
1132                         that the packet should not be delivered to remote hy‐
1133                         pervisors, even if the multicast destination includes
1134                         ports  on  remote hypervisors. This flag is used when
1135                         ovn-controller is the  originator  of  the  multicast
1136                         packet.  Since each ovn-controller instance is origi‐
1137                         nating these packets, the packets only need to be de‐
1138                         livered to local ports.
1139
1140                  •      A  fallback  flow that resubmits to table 33 if there
1141                         is no other match.
1142
1143                  Flows in table 33 resemble those in table 32 but for logical
1144                  ports  that reside locally rather than remotely. For unicast
1145                  logical output ports on the local  hypervisor,  the  actions
1146                  just  resubmit  to table 34. For multicast output ports that
1147                  include one or more logical ports on the  local  hypervisor,
1148                  for each such logical port P, the actions change the logical
1149                  output port to P, then resubmit to table 34.
1150
                  A special case arises when a localnet port exists on the
                  datapath: remote ports are then reached by switching
                  through the localnet port. In this case, instead of adding
                  a flow in table 32 to reach the remote port, a flow is
                  added in table 33 that changes the logical output port to
                  the localnet port and resubmits to table 33, as if the
                  packet were unicast to a logical port on the local
                  hypervisor.
1158
1159                  Table 34 matches and drops packets for which the logical in‐
1160                  put and output ports are the same and the MLF_ALLOW_LOOPBACK
1161                  flag is not set. It also drops  MLF_LOCAL_ONLY  packets  di‐
1162                  rected to a localnet port. It resubmits other packets to ta‐
1163                  ble 40.
1164
1165              4.  OpenFlow tables 40 through 63  execute  the  logical  egress
1166                  pipeline  from  the Logical_Flow table in the OVN Southbound
1167                  database. The egress pipeline can perform a final  stage  of
1168                  validation  before  packet delivery. Eventually, it may exe‐
1169                  cute an output action, which  ovn-controller  implements  by
1170                  resubmitting  to  table  64. A packet for which the pipeline
1171                  never executes output is effectively  dropped  (although  it
1172                  may have been transmitted through a tunnel across a physical
1173                  network).
1174
1175                  The egress pipeline cannot change the logical output port or
1176                  cause further tunneling.
1177
              5.  Table 64 bypasses OpenFlow loopback when MLF_ALLOW_LOOPBACK
                  is set. Logical loopback was handled in table 34, but
                  OpenFlow by default also prevents loopback to the OpenFlow
                  ingress port. Thus, when MLF_ALLOW_LOOPBACK is set,
                  OpenFlow table 64 saves the OpenFlow ingress port, sets it
                  to zero, resubmits to table 65 for logical-to-physical
                  transformation, and then restores the OpenFlow ingress
                  port, effectively disabling OpenFlow loopback prevention.
                  When MLF_ALLOW_LOOPBACK is unset, the table 64 flow simply
                  resubmits to table 65.
1188
1189              6.  OpenFlow table 65 performs logical-to-physical  translation,
1190                  the  opposite  of  table  0. It matches the packet’s logical
1191                  egress port. Its actions output the packet to the  port  at‐
1192                  tached  to  the  OVN integration bridge that represents that
                  logical port. If the logical egress port is a container
                  nested within a VM, then before sending the packet the
                  actions push on a VLAN header with an appropriate VLAN ID.
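
       The path of a packet through these OpenFlow tables can be traced on a
       hypervisor with ovs-appctl ofproto/trace. The sketch below is
       illustrative only: the bridge name (conventionally br-int), the
       OpenFlow port number, and the addresses must be replaced with values
       from the actual deployment:

              $ ovs-appctl ofproto/trace br-int \
                  'in_port=5,icmp,dl_src=50:54:00:00:00:01,dl_dst=50:54:00:00:00:02,nw_src=10.0.0.1,nw_dst=10.0.0.2'

       The trace output walks through the physical-to-logical translation in
       table 0, the logical pipelines, and the final logical-to-physical
       translation, mirroring the steps above. ovn-trace(8) provides a
       similar trace expressed in terms of logical flows rather than OpenFlow
       flows.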
1196
1197   Logical Routers and Logical Patch Ports
1198       Typically logical routers and logical patch ports do not have a  physi‐
1199       cal  location  and  effectively reside on every hypervisor. This is the
1200       case for logical  patch  ports  between  logical  routers  and  logical
1201       switches behind those logical routers, to which VMs (and VIFs) attach.
1202
1203       Consider a packet sent from one virtual machine or container to another
1204       VM or container that resides on a different  subnet.  The  packet  will
1205       traverse  tables 0 to 65 as described in the previous section Architec‐
1206       tural Physical Life Cycle of a Packet, using the logical datapath  rep‐
1207       resenting  the  logical switch that the sender is attached to. At table
1208       32, the packet will use the fallback flow that resubmits locally to ta‐
1209       ble 33 on the same hypervisor. In this case, all of the processing from
1210       table 0 to table 65 occurs on the hypervisor where the sender resides.
1211
       When the packet reaches table 65, the logical egress port is a logical
       patch port. ovn-controller implements output to a logical patch port
       by cloning the packet and resubmitting it directly to the first
       OpenFlow flow table in the ingress pipeline, setting the logical
       ingress port to the peer logical patch port, and using the peer
       logical patch port’s logical datapath (that represents the logical
       router).
1218
1219       The packet re-enters the ingress pipeline in order to traverse tables 8
1220       to 65 again, this time using the logical datapath representing the log‐
1221       ical router. The processing continues as described in the previous sec‐
1222       tion Architectural Physical Life Cycle of a  Packet.  When  the  packet
       reaches table 65, the logical egress port will once again be a logical
1224       patch port. In the same manner as described above, this  logical  patch
1225       port  will  cause  the packet to be resubmitted to OpenFlow tables 8 to
1226       65, this time using  the  logical  datapath  representing  the  logical
1227       switch that the destination VM or container is attached to.
1228
1229       The packet traverses tables 8 to 65 a third and final time. If the des‐
1230       tination VM or container resides on a remote hypervisor, then table  32
1231       will  send  the packet on a tunnel port from the sender’s hypervisor to
1232       the remote hypervisor. Finally table 65 will output the packet directly
1233       to the destination VM or container.
1234
1235       The  following  sections describe two exceptions, where logical routers
1236       and/or logical patch ports are associated with a physical location.
1237
1238     Gateway Routers
1239
1240       A gateway router is a logical router that is bound to a physical  loca‐
1241       tion.  This  includes  all  of  the  logical patch ports of the logical
1242       router, as well as all of the  peer  logical  patch  ports  on  logical
1243       switches.  In the OVN Southbound database, the Port_Binding entries for
1244       these logical patch ports use the type l3gateway rather than patch,  in
1245       order  to  distinguish  that  these  logical patch ports are bound to a
1246       chassis.
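
       For example, assuming a logical router named lr0 and a chassis named
       gw1 (both names are illustrative), a gateway router can be bound to
       its chassis in the OVN Northbound database with a command such as:

              $ ovn-nbctl set Logical_Router lr0 options:chassis=gw1

       Clearing options:chassis turns the router back into an ordinary
       distributed logical router.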
1247
1248       When a hypervisor processes a packet on a logical datapath representing
1249       a  logical switch, and the logical egress port is a l3gateway port rep‐
1250       resenting connectivity to a gateway router, the  packet  will  match  a
1251       flow  in table 32 that sends the packet on a tunnel port to the chassis
1252       where the gateway router resides. This processing in table 32  is  done
1253       in the same manner as for VIFs.
1254
1255     Distributed Gateway Ports
1256
1257       This  section provides additional details on distributed gateway ports,
1258       outlined earlier.
1259
1260       The primary design goal of distributed gateway ports  is  to  allow  as
1261       much  traffic as possible to be handled locally on the hypervisor where
1262       a VM or container resides. Whenever possible, packets from  the  VM  or
1263       container  to  the outside world should be processed completely on that
1264       VM’s or container’s hypervisor, eventually traversing a  localnet  port
1265       instance or a tunnel to the physical network or a different OVN deploy‐
1266       ment. Whenever possible, packets from the outside world to a VM or con‐
1267       tainer  should be directed through the physical network directly to the
1268       VM’s or container’s hypervisor.
1269
1270       In order to allow for the distributed processing of  packets  described
1271       in  the  paragraph  above, distributed gateway ports need to be logical
1272       patch ports that effectively reside on every  hypervisor,  rather  than
1273       l3gateway  ports  that  are bound to a particular chassis. However, the
1274       flows associated with distributed gateway ports often need to be  asso‐
1275       ciated with physical locations, for the following reasons:
1276
1277              •      The  physical  network that the localnet port is attached
1278                     to typically uses L2 learning. Any Ethernet address  used
1279                     over the distributed gateway port must be restricted to a
1280                     single physical location so that upstream L2 learning  is
1281                     not  confused.  Traffic  sent out the distributed gateway
1282                     port towards the localnet port with a  specific  Ethernet
1283                     address  must  be  sent  out one specific instance of the
1284                     distributed gateway port on one specific chassis. Traffic
1285                     received  from  the  localnet  port (or from a VIF on the
1286                     same logical switch as the localnet port) with a specific
1287                     Ethernet address must be directed to the logical switch’s
1288                     patch port instance on that specific chassis.
1289
1290                     Due to the implications of L2 learning, the Ethernet  ad‐
1291                     dress and IP address of the distributed gateway port need
1292                     to be restricted to a single physical location. For  this
1293                     reason, the user must specify one chassis associated with
1294                     the distributed gateway port. Note that traffic  travers‐
1295                     ing the distributed gateway port using other Ethernet ad‐
1296                     dresses and IP addresses (e.g. one-to-one NAT) is not re‐
1297                     stricted to this chassis.
1298
1299                     Replies  to  ARP  and ND requests must be restricted to a
1300                     single physical location, where the Ethernet  address  in
1301                     the  reply  resides. This includes ARP and ND replies for
1302                     the IP address of the distributed gateway port, which are
1303                     restricted  to  the chassis that the user associated with
1304                     the distributed gateway port.
1305
1306              •      In order to support one-to-many SNAT (aka  IP  masquerad‐
1307                     ing),  where  multiple logical IP addresses spread across
1308                     multiple chassis are mapped to a single external  IP  ad‐
1309                     dress, it will be necessary to handle some of the logical
1310                     router processing on a specific chassis in a  centralized
1311                     manner.  Since  the SNAT external IP address is typically
1312                     the distributed gateway port IP address, and for simplic‐
1313                     ity,  the  same  chassis  associated with the distributed
1314                     gateway port is used.
1315
1316       The details of flow restrictions to specific chassis are  described  in
1317       the ovn-northd documentation.
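
       For example, a chassis might be associated with a distributed gateway
       port using a command like the following, where lr0-public and gw1 are
       illustrative names for the Logical_Router_Port and the chassis:

              $ ovn-nbctl lrp-set-gateway-chassis lr0-public gw1

       Equivalently, the CMS can create Gateway_Chassis rows in the
       OVN_Northbound database directly.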
1318
1319       While  most  of  the physical location dependent aspects of distributed
1320       gateway ports can be handled by  restricting  some  flows  to  specific
1321       chassis, one additional mechanism is required. When a packet leaves the
1322       ingress pipeline and the logical egress port is the distributed gateway
1323       port, one of two different sets of actions is required at table 32:
1324
1325              •      If  the packet can be handled locally on the sender’s hy‐
1326                     pervisor (e.g. one-to-one NAT traffic), then  the  packet
1327                     should  just  be  resubmitted locally to table 33, in the
1328                     normal manner for distributed logical patch ports.
1329
1330              •      However, if the packet needs to be handled on the chassis
1331                     associated  with  the distributed gateway port (e.g. one-
1332                     to-many SNAT traffic or non-NAT traffic), then  table  32
1333                     must send the packet on a tunnel port to that chassis.
1334
1335       In order to trigger the second set of actions, the chassisredirect type
1336       of southbound Port_Binding has been added. Setting the  logical  egress
1337       port  to the type chassisredirect logical port is simply a way to indi‐
1338       cate that although the packet is destined for the  distributed  gateway
1339       port,  it  needs  to be redirected to a different chassis. At table 32,
1340       packets with this logical egress port are sent to a  specific  chassis,
1341       in the same way that table 32 directs packets whose logical egress port
1342       is a VIF or a type l3gateway port to different chassis. Once the packet
1343       arrives at that chassis, table 33 resets the logical egress port to the
1344       value representing the distributed gateway port. For  each  distributed
1345       gateway  port,  there  is one type chassisredirect port, in addition to
1346       the distributed logical patch port representing the distributed gateway
1347       port.
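
       The chassisredirect port that ovn-northd creates for a distributed
       gateway port can be inspected in the southbound database, for example
       with the generic database command below; such ports are conventionally
       named with a cr- prefix:

              $ ovn-sbctl find Port_Binding type=chassisredirect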
1348
1349     High Availability for Distributed Gateway Ports
1350
1351       OVN  allows you to specify a prioritized list of chassis for a distrib‐
1352       uted gateway port. This is done by associating multiple Gateway_Chassis
1353       rows with a Logical_Router_Port in the OVN_Northbound database.
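
       Extending the earlier sketch, a prioritized list of gateway chassis
       might be configured by repeating lrp-set-gateway-chassis with a
       priority argument (the names and priorities below are illustrative; a
       higher priority is preferred):

              $ ovn-nbctl lrp-set-gateway-chassis lr0-public gw1 20
              $ ovn-nbctl lrp-set-gateway-chassis lr0-public gw2 10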
1354
1355       When  multiple  chassis  have been specified for a gateway, all chassis
1356       that may send packets to that gateway will enable BFD on tunnels to all
1357       configured  gateway chassis. The current master chassis for the gateway
1358       is the highest priority gateway chassis that is currently viewed as ac‐
1359       tive based on BFD status.
1360
1361       For  more  information on L3 gateway high availability, please refer to
1362       http://docs.ovn.org/en/latest/topics/high-availability.
1363
1364     Restrictions of Distributed Gateway Ports
1365
1366       Distributed gateway ports are used to connect to an  external  network,
1367       which  can be a physical network modeled by a logical switch with a lo‐
1368       calnet port, and can also be a logical switch that  interconnects  dif‐
1369       ferent  OVN  deployments (see OVN Deployments Interconnection). Usually
       there can be many logical routers connected to the same external
       logical switch, as shown in the diagram below.
1372
1373                                     +--LS-EXT-+
1374                                     |    |    |
1375                                     |    |    |
1376                                    LR1  ...  LRn
1377
1378
       In this diagram, there are n logical routers connected to a logical
       switch LS-EXT, each with a distributed gateway port, so that traffic
       sent to the external world is redirected to the gateway chassis
       assigned to the distributed gateway port of the respective logical
       router.
1383
       In the logical topology, nothing prevents a user from adding a route
       between the logical routers via the connected distributed gateway
       ports on LS-EXT. However, the route works only if LS-EXT is a physical
       network (modeled by a logical switch with a localnet port). In that
       case the packet is delivered between the gateway chassis through the
       localnet port via the physical network. If LS-EXT is a regular logical
       switch (backed by tunneling only, as in the use case of OVN
       interconnection), then the packet is dropped on the source gateway
       chassis. The limitation is due to the fact that distributed gateway
       ports are tied to a physical location; without a physical network
       connection, the packet would either have to be dropped or transferred
       over tunnels, which could cause bigger problems, such as broadcast
       packets being redirected repeatedly by different gateway chassis.
1397
       With this limitation in mind, if a user does want direct connectivity
       between the logical routers, it is better to create an internal
       logical switch connected to the logical routers with regular logical
       router ports, which are completely distributed, so that packets do not
       have to leave a chassis unless necessary. This is more optimal than
       routing via the distributed gateway ports.
1404
1405     ARP request and ND NS packet processing
1406
       Because ARP requests and ND NS packets are usually broadcast or
       multicast packets, for performance reasons OVN deals with requests
       that target OVN-owned IP addresses (i.e., IP addresses configured on
       the router ports, VIPs, NAT IPs) in a specific way and only forwards
       them to the logical router that owns the target IP address. This
       behavior is different from that of traditional switches and implies
       that other routers/hosts connected to the logical switch will not
       learn the MAC/IP binding from the request packet.
1415
1416       All other ARP and ND packets are flooded in the L2 broadcast domain and
1417       to all attached logical patch ports.
1418
1419     VIFs on the logical switch connected by a distributed gateway port
1420
1421       Typically the logical switch connected by a distributed gateway port is
1422       for external connectivity, usually to a physical network through a  lo‐
1423       calnet  port  on  the  logical  switch,  or  to a remote OVN deployment
       through OVN Interconnection. In these cases no VIF ports are required
       on the logical switch.
1426
       While not very common, it is still possible to create VIF ports on the
       logical switch connected by a distributed gateway port, but there is a
       limitation: such logical ports need to reside on the gateway chassis
       where the distributed gateway port resides in order to get
       connectivity to other logical switches through the distributed gateway
       port. There is no limitation on the VIFs connecting within the logical
       switch, or beyond the logical switch through other regular distributed
       logical router ports.
1435
       A special case is when distributed gateway ports are used for
       scalability purposes, as mentioned earlier in this document. The
       logical switches connected by such distributed gateway ports are not
       for external connectivity but simply host regular VIFs. However, the
       above limitation usually does not matter, because in this use case all
       the VIFs on the logical switch are located on the same chassis as the
       distributed gateway port that connects the logical switch.
1443
1444   Multiple localnet logical switches connected to a Logical Router
1445       It is possible to have multiple logical switches each with  a  localnet
1446       port (representing physical networks) connected to a logical router, in
1447       which one localnet logical switch may provide the external connectivity
       via a distributed gateway port and the rest of the localnet logical
1449       switches use VLAN tagging in the physical network. It is expected  that
1450       ovn-bridge-mappings  is configured appropriately on the chassis for all
1451       these localnet networks.
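
       For example, ovn-bridge-mappings might be configured on each chassis
       as follows, where the physical network names and provider bridge
       names are illustrative (see ovn-controller(8) for the authoritative
       description of this setting):

              $ ovs-vsctl set open . external-ids:ovn-bridge-mappings="physnet1:br-phys1,physnet2:br-phys2"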
1452
1453     East West routing
1454
       East-West routing between these localnet VLAN tagged logical switches
       works almost the same way as for normal logical switches. When a VM
       sends such a packet, then:
1458
1459              1.  It first enters the ingress pipeline, and then egress  pipe‐
1460                  line of the source localnet logical switch datapath. It then
1461                  enters the ingress pipeline of the logical  router  datapath
1462                  via the logical router port in the source chassis.
1463
              2.  A routing decision is made.
1465
              3.  From the router datapath, the packet enters the ingress
                  pipeline and then the egress pipeline of the destination
                  localnet logical switch datapath and goes out of the
                  integration bridge to the provider bridge (belonging to the
                  destination logical switch) via the localnet port. While
                  sending the packet to the provider bridge, the router port
                  MAC used as the source MAC is replaced with a chassis
                  unique MAC.
1473
                  This chassis unique MAC is configured as a global OVS
                  configuration option on each chassis (e.g. via "ovs-vsctl
                  set open . external-ids:ovn-chassis-mac-mappings=
                  "phys:aa:bb:cc:dd:ee:$i$i""). For more details, see
                  ovn-controller(8).
1478
                  If the above is not configured, then the source MAC would
                  be the router port MAC. This could create problems if there
                  is more than one chassis: since the router port is
                  distributed, the same (MAC,VLAN) tuple would be seen by the
                  physical network from other chassis as well, which could
                  cause these issues:
1485
                  •      Continuous MAC moves in the top-of-rack (ToR)
                         switch.
1487
                  •      The ToR dropping the traffic that is causing the
                         continuous MAC moves.
1490
1491                  •      ToR blocking the ports from which MAC moves are  hap‐
1492                         pening.
1493
              4.  The destination chassis receives the packet via the
                  localnet port and sends it to the integration bridge.
                  Before entering the integration bridge, the source MAC of
                  the packet is replaced with the router port MAC again. The
                  packet enters the ingress pipeline and then the egress
                  pipeline of the destination localnet logical switch and
                  finally gets delivered to the destination VM port.
1501
1502     External traffic
1503
       The following happens when a VM sends external traffic (which requires
       NATting) and the chassis hosting the VM doesn’t have a distributed
       gateway port.
1507
1508              1.  The  packet  first  enters  the  ingress  pipeline, and then
1509                  egress pipeline of the source localnet logical switch  data‐
1510                  path.  It  then  enters  the ingress pipeline of the logical
1511                  router datapath via the logical router port  in  the  source
1512                  chassis.
1513
              2.  A routing decision is made. Since the gateway router or the
                  distributed gateway port doesn’t reside on the source
                  chassis, the traffic is redirected to the gateway chassis
                  via the tunnel port.
1518
1519              3.  The gateway chassis receives the packet via the tunnel  port
1520                  and  the  packet  enters  the egress pipeline of the logical
                  router datapath. NAT rules are applied here. The packet
                  then enters the ingress pipeline and then the egress
                  pipeline of the localnet logical switch datapath that
                  provides external connectivity, and finally goes out via
                  that logical switch’s localnet port.
1526
1527       Although this works, the VM traffic is tunnelled  when  sent  from  the
1528       compute  chassis  to the gateway chassis. In order for it to work prop‐
1529       erly, the MTU of the localnet logical switches must be lowered  to  ac‐
1530       count for the tunnel encapsulation.
1531
1532   Centralized  routing for localnet VLAN tagged logical switches connected to
1533       a Logical Router
1534       To overcome the tunnel encapsulation problem described in the  previous
1535       section,  OVN  supports  the option of enabling centralized routing for
       localnet VLAN tagged logical switches. The CMS can set the option
       options:reside-on-redirect-chassis to true for each
       Logical_Router_Port that connects to the localnet VLAN tagged logical
       switches. This causes the gateway chassis (hosting the distributed
       gateway port) to handle all the routing for these networks, making it
       centralized. It will reply to the ARP requests for the logical router
       port IPs.
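
       A hypothetical example, where lr0-vlan100 names a Logical_Router_Port
       connected to a VLAN tagged localnet logical switch:

              $ ovn-nbctl set Logical_Router_Port lr0-vlan100 options:reside-on-redirect-chassis=true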
1542
1543       If  the logical router doesn’t have a distributed gateway port connect‐
1544       ing to the localnet logical switch which provides  external  connectiv‐
       ity, or if it has more than one distributed gateway port, then this
1546       option is ignored by OVN.
1547
       The following happens when a VM sends east-west traffic that needs
1549       to be routed:
1550
1551              1.  The  packet  first  enters  the  ingress  pipeline, and then
1552                  egress pipeline of the source localnet logical switch  data‐
1553                  path  and  is sent out via a localnet port of the source lo‐
                  calnet logical switch (instead of sending it to the router
                  pipeline).
1556
1557              2.  The  gateway chassis receives the packet via a localnet port
1558                  of the source localnet logical switch and sends  it  to  the
1559                  integration bridge. The packet then enters the ingress pipe‐
1560                  line, and then egress pipeline of the source localnet  logi‐
1561                  cal  switch  datapath and enters the ingress pipeline of the
1562                  logical router datapath.
1563
              3.  A routing decision is made.
1565
1566              4.  From the router datapath, packet enters the ingress pipeline
1567                  and then egress pipeline of the destination localnet logical
1568                  switch datapath. It then goes out of the integration  bridge
                  to the provider bridge (belonging to the destination
                  logical switch) via a localnet port.
1571
1572              5.  The destination chassis receives the packet via  a  localnet
1573                  port  and sends it to the integration bridge. The packet en‐
1574                  ters the ingress pipeline and then egress  pipeline  of  the
                  destination localnet logical switch and is finally
                  delivered to the destination VM port.
1577
       The following happens when a VM sends external traffic that requires
       NATting:
1580
1581              1.  The  packet  first  enters  the  ingress  pipeline, and then
1582                  egress pipeline of the source localnet logical switch  data‐
1583                  path  and  is sent out via a localnet port of the source lo‐
                  calnet logical switch (instead of sending it to the router
                  pipeline).
1586
1587              2.  The  gateway chassis receives the packet via a localnet port
1588                  of the source localnet logical switch and sends  it  to  the
1589                  integration bridge. The packet then enters the ingress pipe‐
1590                  line, and then egress pipeline of the source localnet  logi‐
1591                  cal  switch  datapath and enters the ingress pipeline of the
1592                  logical router datapath.
1593
              3.  A routing decision is made and NAT rules are applied.
1595
1596              4.  From the router datapath, packet enters the ingress pipeline
1597                  and  then  egress  pipeline  of  the localnet logical switch
1598                  datapath which provides external connectivity. It then  goes
1599                  out  of  the  integration bridge to the provider bridge (be‐
1600                  longing to the logical switch which provides  external  con‐
1601                  nectivity) via a localnet port.
1602
1603       The following happens for the reverse external traffic.
1604
1605              1.  The gateway chassis receives the packet from a localnet port
1606                  of the logical switch which provides external  connectivity.
1607                  The  packet then enters the ingress pipeline and then egress
1608                  pipeline of the localnet logical switch (which provides  ex‐
1609                  ternal  connectivity).  The  packet  then enters the ingress
1610                  pipeline of the logical router datapath.
1611
1612              2.  The ingress pipeline of the logical router datapath  applies
1613                  the  unNATting  rules.  The  packet  then enters the ingress
1614                  pipeline and then egress pipeline  of  the  source  localnet
                  logical switch. Since the source VM doesn’t reside on the
                  gateway chassis, the packet is sent out via a localnet port
1617                  of the source logical switch.
1618
1619              3.  The  source  chassis receives the packet via a localnet port
1620                  and sends it to the integration bridge.  The  packet  enters
1621                  the  ingress pipeline and then egress pipeline of the source
1622                  localnet logical switch and finally gets  delivered  to  the
1623                  source VM port.
1624
1625       As  an  alternative  to  reside-on-redirect-chassis, OVN supports VLAN-
1626       based redirection. Whereas reside-on-redirect-chassis  centralizes  all
1627       router functionality, VLAN-based redirection only changes how OVN redi‐
1628       rects packets to the gateway chassis. By setting  options:redirect-type
1629       to  bridged on a distributed gateway port, OVN redirects packets to the
1630       gateway chassis using the localnet port of the  router’s  peer  logical
1631       switch, instead of a tunnel.
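
       For example, assuming a distributed gateway port named lr0-public (an
       illustrative name), bridged redirection can be requested with:

              $ ovn-nbctl set Logical_Router_Port lr0-public options:redirect-type=bridged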
1632
1633       If  the logical router doesn’t have a distributed gateway port connect‐
1634       ing to the localnet logical switch which provides  external  connectiv‐
       ity, or if it has more than one distributed gateway port, then this
1636       option is ignored by OVN.
1637
       The following happens for bridged redirection:
1639
              1.  On the compute chassis, the packet passes through the
                  logical router’s ingress pipeline.

              2.  If the logical output port is the router port to which the
                  gateway chassis is attached, then the packet is
                  "redirected" to the gateway chassis using the peer logical
                  switch’s localnet port.

              3.  This redirected packet has the router port MAC (the one to
                  which the gateway chassis is attached) as its destination
                  MAC. Its VLAN ID is that of the localnet port (of the peer
                  logical switch of the logical router port).

              4.  On the gateway chassis, the packet enters the logical
                  router pipeline again, and this time it passes through the
                  egress pipeline as well.

              5.  Reverse traffic flows stay the same.
1657
       Some guidelines and expectations with bridged redirection:
1659
              1.  Since the router port MAC is the destination MAC, it has to
                  be ensured that the physical network learns it ONLY from
                  the gateway chassis. This means that
                  ovn-chassis-mac-mappings should be configured on all the
                  compute nodes, so that the physical network never learns
                  the router port MAC from the compute nodes.

              2.  Since the packet enters the logical router ingress pipeline
                  twice (once on the compute chassis and again on the gateway
                  chassis), the TTL will be decremented twice.
1669
              3.  The default redirection type continues to be overlay. The
                  user can switch the redirect-type between bridged and
                  overlay by changing the value of options:redirect-type.
1673
1674   Life Cycle of a VTEP gateway
1675       A gateway is a chassis that forwards traffic  between  the  OVN-managed
1676       part of a logical network and a physical VLAN, extending a tunnel-based
1677       logical network into a physical network.
1678
1679       The steps below refer often to details of the  OVN  and  VTEP  database
1680       schemas. Please see ovn-sb(5), ovn-nb(5) and vtep(5), respectively, for
1681       the full story on these databases.
1682
1683              1.  A VTEP gateway’s life cycle begins  with  the  administrator
1684                  registering  the VTEP gateway as a Physical_Switch table en‐
                  try in the VTEP database. The ovn-controller-vtep connected
                  to this VTEP database will recognize the new VTEP gateway
1687                  and  create  a  new  Chassis  table  entry  for  it  in  the
1688                  OVN_Southbound database.
1689
1690              2.  The administrator can then create a new Logical_Switch table
1691                  entry, and bind a particular vlan on a VTEP  gateway’s  port
1692                  to  any  VTEP  logical switch. Once a VTEP logical switch is
1693                  bound to a VTEP gateway, the ovn-controller-vtep will detect
1694                  it  and  add its name to the vtep_logical_switches column of
                  the Chassis table in the OVN_Southbound database. Note that
                  the tunnel_key column of the VTEP logical switch is not
                  filled at creation. The ovn-controller-vtep will set the
                  column when the corresponding VTEP logical switch is bound
                  to an OVN logical network.
1700
              3.  Now, the administrator can use the CMS to add a VTEP
                  logical switch to the OVN logical network. To do that, the
                  CMS must first create a new Logical_Switch_Port table entry
                  in the OVN_Northbound database. Then, the type column of
                  this entry must be set to "vtep". Next, the
                  vtep-logical-switch and vtep-physical-switch keys in the
                  options column must also be specified, since multiple VTEP
                  gateways can attach to the same VTEP logical switch.
                  Finally, the addresses column of this logical port must be
                  set to "unknown"; this adds a priority-0 entry in the
                  "ls_in_l2_lkup" stage of the logical switch ingress
                  pipeline, so that traffic with a MAC address unknown to OVN
                  goes through this Logical_Switch_Port to the physical
                  network. (A command-level sketch of steps 1 through 3
                  appears after this list.)
1714
1715              4.  The newly created logical port in the  OVN_Northbound  data‐
1716                  base  and  its  configuration  will  be  passed  down to the
1717                  OVN_Southbound database as a new Port_Binding  table  entry.
1718                  The  ovn-controller-vtep  will recognize the change and bind
1719                  the logical port to the corresponding VTEP gateway  chassis.
                  Binding the same VTEP logical switch to different OVN
                  logical networks is not allowed, and a warning will be
                  generated in the log.
1723
              5.  Besides binding to the VTEP gateway chassis, the
                  ovn-controller-vtep will update the tunnel_key column of
                  the VTEP
1726                  logical  switch  to the corresponding Datapath_Binding table
1727                  entry’s tunnel_key for the bound OVN logical network.
1728
              6.  Next, the ovn-controller-vtep will keep reacting to
                  configuration changes to the Port_Binding table in the
                  OVN_Southbound database, updating the Ucast_Macs_Remote
                  table in the VTEP database accordingly. This allows the
                  VTEP gateway to understand where to forward the unicast
                  traffic coming from the extended external network.
1735
1736              7.  Eventually,  the VTEP gateway’s life cycle ends when the ad‐
1737                  ministrator unregisters the VTEP gateway from the VTEP data‐
1738                  base.  The  ovn-controller-vtep will recognize the event and
1739                  remove all related configurations (Chassis table  entry  and
1740                  port bindings) in the OVN_Southbound database.
1741
1742              8.  When the ovn-controller-vtep is terminated, all related con‐
1743                  figurations in the  OVN_Southbound  database  and  the  VTEP
                  database will be cleaned up, including Chassis table entries
1745                  for all registered VTEP gateways and  their  port  bindings,
1746                  and  all  Ucast_Macs_Remote  table  entries  and  the  Logi‐
1747                  cal_Switch tunnel keys.
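
       The administrator-facing part of steps 1 through 3 might look like the
       following sketch, using illustrative names only (br-vtep for the
       physical switch, p0 for one of its ports, vtep-ls0 for the VTEP
       logical switch, and ls1 and lsp-vtep for the OVN logical switch and
       its attachment port):

              $ vtep-ctl add-ps br-vtep
              $ vtep-ctl add-port br-vtep p0
              $ vtep-ctl add-ls vtep-ls0
              $ vtep-ctl bind-ls br-vtep p0 100 vtep-ls0
              $ ovn-nbctl lsp-add ls1 lsp-vtep
              $ ovn-nbctl lsp-set-type lsp-vtep vtep
              $ ovn-nbctl lsp-set-options lsp-vtep vtep-physical-switch=br-vtep vtep-logical-switch=vtep-ls0
              $ ovn-nbctl lsp-set-addresses lsp-vtep unknown

       See vtep-ctl(8) and ovn-nbctl(8) for the details of these commands.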
1748
1749   OVN Deployments Interconnection
       It is not uncommon for an operator to deploy multiple OVN clusters,
       for two main reasons. Firstly, an operator may prefer to deploy one
       OVN cluster per availability zone, e.g. in different physical regions,
       to avoid a single point of failure. Secondly, there is an upper limit
       to how far a single OVN control plane can scale.
1755
       Although the control planes of the different availability zones (AZs)
       are independent of each other, the workloads from different AZs may
1758       need to communicate across the zones. The OVN  interconnection  feature
1759       provides  a  native  way  to  interconnect  different AZs by L3 routing
1760       through transit overlay networks between logical routers  of  different
1761       AZs.
1762
1763       A  global OVN Interconnection Northbound database is introduced for the
1764       operator (probably through CMS systems) to  configure  transit  logical
1765       switches  that  connect  logical  routers from different AZs. A transit
       switch is similar to a regular logical switch, but it is used for
       interconnection purposes only. Typically, each transit switch can be
       used to connect all logical routers that belong to the same tenant
       across all AZs.
1770
       A dedicated daemon process, ovn-ic, the OVN interconnection
       controller, runs in each AZ. It consumes this data and populates the
       corresponding logical switches into that AZ’s own northbound database,
       so that logical routers can be connected to the transit switch by
       creating patch port pairs in their northbound databases. Any router
       ports connected to the transit switches are considered
       interconnection ports, which will be exchanged between AZs.
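
       As a sketch of this workflow (all names are illustrative), an operator
       might create a transit switch in the global Interconnection Northbound
       database and then, in each AZ, connect a logical router to it and pick
       a gateway chassis:

              $ ovn-ic-nbctl ts-add ts1

              $ ovn-nbctl lrp-add lr1 lrp-lr1-ts1 aa:aa:aa:aa:aa:01 169.254.100.1/24
              $ ovn-nbctl lsp-add ts1 lsp-ts1-lr1
              $ ovn-nbctl lsp-set-type lsp-ts1-lr1 router
              $ ovn-nbctl lsp-set-addresses lsp-ts1-lr1 router
              $ ovn-nbctl lsp-set-options lsp-ts1-lr1 router-port=lrp-lr1-ts1
              $ ovn-nbctl lrp-set-gateway-chassis lrp-lr1-ts1 gw1

       Designating a chassis as an interconnection gateway is done through
       the local Open vSwitch configuration on that chassis; see
       ovn-controller(8) for the relevant external-ids settings.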
1778
1779       Physically, when workloads from different AZs communicate, packets need
1780       to go through multiple hops: source chassis, source  gateway,  destina‐
1781       tion  gateway  and  destination  chassis.  All these hops are connected
1782       through tunnels so that the packets never  leave  overlay  networks.  A
1783       distributed gateway port is required to connect the logical router to a
1784       transit switch, with a gateway chassis specified, so that  the  traffic
1785       can be forwarded through the gateway chassis.
1786
1787       A  global OVN Interconnection Southbound database is introduced for ex‐
1788       changing control plane information between the AZs. The  data  in  this
       database is populated and consumed by the ovn-ic of each AZ. The main
1790       information in this database includes:
1791
              •      Datapath bindings for transit switches, which mainly
                     contain the tunnel keys generated for each transit
                     switch.
1794                     Separate key ranges are reserved for transit switches  so
1795                     that  they  will  never conflict with any tunnel keys lo‐
1796                     cally assigned for datapaths within each AZ.
1797
              •      Availability zones, which are registered by ovn-ic from
1799                     each AZ.
1800
1801              •      Gateways. Each AZ specifies the chassis that are supposed
1802                     to work as interconnection gateways, and its ovn-ic popu-
1803                     lates this information to the interconnection  southbound
1804                     DB. The ovn-ic of every other AZ then learns these  gate-
1805                     ways and adds each of them to its own southbound DB as  a
1806                     chassis.
1807
1808              •      Port bindings for logical switch  ports  created  on  the
1809                     transit switch. Each AZ maintains its logical  router  to
1810                     transit switch connections independently, but ovn-ic  au‐
1811                     tomatically  populates  local  port  bindings  on transit
1812                     switches to the global interconnection southbound DB, and
1813                     learns  remote  port  bindings from other AZs back to its
1814                     own northbound and southbound DBs, so that logical  flows
1815                     can be produced and then translated to OVS flows locally,
1816                     which finally enables data plane communication.
1817
1818              •      Routes that are advertised between different AZs. If  en-
1819                     abled, routes are automatically exchanged by ovn-ic. Both
1820                     static routes and directly connected subnets  are  adver-
1821                     tised. Options in the options column of the NB_Global ta-
1822                     ble of the OVN_NB database control route  advertisement:
1823                     whether routes are advertised and learned at all, whether
1824                     default routes are advertised or learned, and which CIDRs
1825                     are blacklisted. See ovn-nb(5) for more details.
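
       For example, assuming the ic-route-adv and ic-route-learn  options  of
       NB_Global (consult ovn-nb(5) for the authoritative option  names),  an
       operator might enable route exchange in an AZ roughly as follows:

              ovn-nbctl set NB_Global . \
                  options:ic-route-adv=true options:ic-route-learn=true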
1826
1827       The tunnel keys for transit switch datapaths and related port  bindings
1828       must be agreed upon across all AZs. This is ensured by generating  and
1829       storing the keys in the global interconnection southbound database. Any
1830       ovn-ic in any AZ can allocate a key, and race conditions are  resolved
1831       by enforcing a unique index on the column in the database.
1832
1833       Once each AZ’s NB and SB databases are populated with  interconnection
1834       switches and ports, and the tunnel keys have been agreed  upon,  data
1835       plane communication between the AZs is established.
1836
1837       When VXLAN tunneling is enabled in an OVN cluster, the interconnection
1838       feature is not supported, due to the limited range available for VNIs.
1839
1840     A day in the life of a packet crossing AZs
1841
1842              1.  An IP packet is sent out from a VIF on a hypervisor (HV1) of
1843                  AZ1, with destination IP belonging to a VIF in AZ2.
1844
1845              2.  In  HV1’s  OVS  flow tables, the packet goes through logical
1846                  switch and logical router pipelines, and in a logical router
1847                  pipeline,  the  routing stage finds out the next hop for the
1848                  destination IP, which belongs to  a  remote  logical  router
1849                  port  in  AZ2, and the output port, which is a chassis-redi‐
1850                  rect port located on  an  interconnection  gateway  (GW1  in
1851                  AZ1), so HV1 sends the packet to GW1 through a tunnel.
1852
1853              3.  On GW1, the packet continues through the  logical  router
1854                  pipeline and switches to the transit switch’s pipeline via
1855                  the peer port of the chassis redirect port. In the transit
1856                  switch’s pipeline it is output to the remote logical port,
1857                  which is located on a gateway (GW2) in AZ2, so  GW1  sends
1858                  the packet to GW2 through a tunnel.
1859
1860              4.  On GW2, the packet continues through the  transit  switch
1861                  pipeline and switches to the logical router pipeline via the
1862                  peer port, which is a chassis redirect port located on GW2.
1863                  The logical router pipeline then forwards the packet to the
1864                  relevant logical pipelines according to the destination  IP
1865                  address, and determines the MAC address and location of the
1866                  destination VIF port - a hypervisor (HV2).  GW2  then  sends
1867                  the packet to HV2 through a tunnel.
1868
1869              5.  On HV2, the packet is delivered to the final destination VIF
1870                  port by the logical switch egress pipeline,  just  the  same
1871                  way as for intra-AZ communications.
1872
1873   Native OVN services for external logical ports
1874       To provide OVN native services (such as DHCP, IPv6 RA, and DNS lookup)
1875       to cloud resources that are external to OVN,  OVN  supports  external
1876       logical ports.
1877
1878       Below are some of the use cases where external ports can be used.
1879
1880              •      VMs connected to SR-IOV NICs - traffic from these VMs by-
1881                     passes the kernel stack, so the local ovn-controller does
1882                     not bind these ports and cannot serve the native services.
1883
1884              •      When the CMS supports provisioning bare-metal servers.
1885
1886       OVN provides the native services if the CMS has applied the following
1887       configuration in the OVN Northbound Database (example after the list):
1888
1889              •      A row is created in Logical_Switch_Port, configuring  the
1890                     addresses column and setting the type to external.
1891
1892              •      ha_chassis_group column is configured.
1893
1894              •      Each HA chassis that belongs to the HA chassis group has
1895                     ovn-bridge-mappings configured and has proper L2 connec-
1896                     tivity so that it can receive the DHCP and other related
1897                     request packets from these external resources.
1898
1899              •      The Logical_Switch of this port has a localnet port.
1900
1901              •      Native OVN services are enabled by configuring  the  DHCP
1902                     and other options in the same way as for  normal  logical
1903                     ports.
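
       As a combined, illustrative example (all names and addresses below are
       hypothetical), the CMS-side configuration for one external port  might
       look roughly like this:

              # HA chassis group providing the native services:
              ovn-nbctl ha-chassis-group-add hagrp1
              ovn-nbctl ha-chassis-group-add-chassis hagrp1 chassis-1 30
              ovn-nbctl ha-chassis-group-add-chassis hagrp1 chassis-2 20

              # External port with its addresses and HA chassis group:
              ovn-nbctl lsp-add sw0 sw0-ext
              ovn-nbctl lsp-set-type sw0-ext external
              ovn-nbctl lsp-set-addresses sw0-ext "00:00:00:00:00:10 10.0.0.10"
              ovn-nbctl set Logical_Switch_Port sw0-ext \
                  ha_chassis_group=$(ovn-nbctl --bare --columns _uuid \
                                     find HA_Chassis_Group name=hagrp1)

              # Localnet port for L2 connectivity to the external resources:
              ovn-nbctl lsp-add sw0 sw0-localnet
              ovn-nbctl lsp-set-type sw0-localnet localnet
              ovn-nbctl lsp-set-addresses sw0-localnet unknown
              ovn-nbctl lsp-set-options sw0-localnet network_name=physnet1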
1904
1905       It is recommended to use the same HA chassis group for all the external
1906       ports of a logical switch. Otherwise, the physical switch might observe
1907       MAC flapping when different chassis provide the native  services.  For
1908       example, when supporting the native DHCPv4 service, the  DHCPv4  server
1909       MAC (configured in the options:server_mac column of  the  DHCP_Options
1910       table) originating from different ports can cause MAC flapping. The MAC
1911       of the logical router IP(s) can also flap if the same HA chassis  group
1912       is not set for all the external ports of a logical switch.
1913

SECURITY

1915   Role-Based Access Controls for the Southbound DB
1916       In  order  to provide additional security against the possibility of an
1917       OVN chassis becoming compromised in such a way as to allow rogue  soft‐
1918       ware  to  make arbitrary modifications to the southbound database state
1919       and thus disrupt the  OVN  network,  role-based  access  controls  (see
1920       ovsdb-server(1) for additional details) are provided for the southbound
1921       database.
1922
1923       The implementation of role-based access controls (RBAC)  requires  the
1924       addition of two tables to an OVSDB schema: the RBAC_Role  table,  which
1925       is indexed by role name and maps the names of the tables  that  may  be
1926       modifiable for a given role to individual rows in a  permissions  table
1927       containing detailed permission information for that role, and the per-
1928       missions table itself, whose rows contain the  following  information
1929       (these tables can be inspected as shown in the example further below):
1930
1931              Table Name
1932                     The name of the associated table. This column exists pri‐
1933                     marily  as an aid for humans reading the contents of this
1934                     table.
1935
1936              Auth Criteria
1937                     A set of strings containing the names of columns (or col‐
1938                     umn:key pairs for columns containing string:string maps).
1939                     The contents of at least one of the columns or column:key
1940                     values in a row to be modified, inserted, or deleted must
1941                     be equal to the ID of the client attempting to act on the
1942                     row  in order for the authorization check to pass. If the
1943                     authorization criteria are empty, authorization  checking
1944                     is  disabled and all clients for the role will be treated
1945                     as authorized.
1946
1947              Insert/Delete
1948                     Row insertion/deletion permission; boolean value indicat‐
1949                     ing whether insertion and deletion of rows is allowed for
1950                     the associated table. If true, insertion and deletion  of
1951                     rows is allowed for authorized clients.
1952
1953              Updatable Columns
1954                     A  set of strings containing the names of columns or col‐
1955                     umn:key pairs that may be updated or  mutated  by  autho‐
1956                     rized  clients. Modifications to columns within a row are
1957                     only permitted  when  the  authorization  check  for  the
1958                     client passes and all columns to be modified are included
1959                     in this set of modifiable columns.
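
       In a deployment, both tables live in the southbound database  and  can
       be inspected (read-only, for illustration) with ovn-sbctl:

              ovn-sbctl list RBAC_Role
              ovn-sbctl list RBAC_Permission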
1960
1961       RBAC configuration for the OVN southbound  database  is  maintained  by
1962       ovn-northd. With RBAC enabled, modifications are only permitted for the
1963       Chassis, Encap, Port_Binding, MAC_Binding, and IGMP_Group  tables,  and
1964       are restricted as follows:
1965
1966              Chassis
1967                     Authorization: client ID must match the chassis name.
1968
1969                     Insert/Delete:  authorized row insertion and deletion are
1970                     permitted.
1971
1972                     Update: The columns  nb_cfg,  external_ids,  encaps,  and
1973                     vtep_logical_switches may be modified when authorized.
1974
1975              Encap  Authorization: client ID must match the chassis name.
1976
1977                     Insert/Delete: row insertion and row deletion are permit‐
1978                     ted.
1979
1980                     Update: The columns type, options, and ip  can  be  modi‐
1981                     fied.
1982
1983              Port_Binding
1984                     Authorization: disabled (all clients are  considered  au-
1985                     thorized). A future enhancement may add columns (or  keys
1986                     to  external_ids)  in  order to control which chassis are
1987                     allowed to bind each port.
1988
1989                     Insert/Delete: row insertion/deletion are  not  permitted
1990                     (ovn-northd maintains rows in this table).
1991
1992                     Update: Only modifications to the chassis column are per‐
1993                     mitted.
1994
1995              MAC_Binding
1996                     Authorization: disabled (all clients are considered to be
1997                     authorized).
1998
1999                     Insert/Delete: row insertion/deletion are permitted.
2000
2001                     Update:  The  columns logical_port, ip, mac, and datapath
2002                     may be modified by ovn-controller.
2003
2004              IGMP_Group
2005                     Authorization: disabled (all clients are considered to be
2006                     authorized).
2007
2008                     Insert/Delete: row insertion/deletion are permitted.
2009
2010                     Update: The columns address, chassis, datapath, and ports
2011                     may be modified by ovn-controller.
2012
2013       Enabling RBAC for ovn-controller connections to the southbound database
2014       requires the following steps:
2015
2016              1.  Creating SSL certificates for each chassis with the certifi‐
2017                  cate CN field set to the chassis name (e.g.  for  a  chassis
2018                  with   external-ids:system-id=chassis-1,   via  the  command
2019                  "ovs-pki -u req+sign chassis-1 switch").
2020
2021              2.  Configuring each ovn-controller to use SSL  when  connecting
2022                  to  the  southbound database (e.g. via "ovs-vsctl set open .
2023                  external-ids:ovn-remote=ssl:x.x.x.x:6642").
2024
2025              3.  Configuring a southbound database SSL remote with  "ovn-con‐
2026                  troller"    role   (e.g.   via   "ovn-sbctl   set-connection
2027                  role=ovn-controller pssl:6642").
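
       As an optional check (not a required step), the configured  role  can
       afterwards be verified on the southbound database:

              ovn-sbctl get-connection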
2028
2029   Encrypt Tunnel Traffic with IPsec
2030       OVN tunnel traffic goes through physical routers  and  switches.  These
2031       physical devices could be untrusted (devices in a  public  network)  or
2032       might be compromised. Encrypting the tunnel  traffic  can  prevent  the
2033       traffic from being monitored and manipulated.
2034
2035       The tunnel traffic is encrypted with IPsec. The CMS sets the ipsec col-
2036       umn in the northbound NB_Global table to enable or disable  IPsec  en-
2037       cryption. If ipsec is true, all OVN tunnels will be encrypted; if ipsec
2038       is false, no OVN tunnels will be encrypted.
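
       For example, the CMS (or an operator acting on its behalf)  might  en-
       able encryption with a command along these lines:

              ovn-nbctl set nb_global . ipsec=true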
2039
2040       When CMS updates the ipsec column in the  northbound  NB_Global  table,
2041       ovn-northd  copies  the  value  to  the  ipsec column in the southbound
2042       SB_Global table. ovn-controller in each chassis monitors the southbound
2043       database  and sets the options of the OVS tunnel interface accordingly.
2044       OVS tunnel interface options are monitored  by  the  ovs-monitor-ipsec
2045       daemon, which configures the IKE daemon to set up IPsec connections.
2046
2047       Chassis authenticate each other using certificates. The authentication
2048       succeeds if the other end of the tunnel presents a certificate  signed
2049       by a trusted CA and the common name (CN) matches the expected  chassis
2050       name. The SSL certificates used for role-based access controls  (RBAC)
2051       can also be used for IPsec, or ovs-pki can be used to create  separate
2052       certificates. The certificate must be x.509 version 3, with the CN and
2053       subjectAltName fields both set to the chassis name.
2054
2055       The CA certificate, chassis certificate and private key are required to
2056       be  installed  in  each  chassis  before  enabling  IPsec.  Please  see
2057       ovs-vswitchd.conf.db(5) for setting up CA-based IPsec authentication.
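
       Once the certificates are installed and ipsec is enabled, the state of
       the IPsec tunnels on a chassis can typically be  checked  through  the
       ovs-monitor-ipsec daemon, for example:

              ovs-appctl -t ovs-monitor-ipsec tunnels/show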
2058

DESIGN DECISIONS

2060   Tunnel Encapsulations
2061       In  general,  OVN  annotates logical network packets that it sends from
2062       one hypervisor to another with the following three pieces of  metadata,
2063       which are encoded in an encapsulation-specific fashion:
2064
2065              •      24-bit  logical  datapath identifier, from the tunnel_key
2066                     column in the OVN Southbound Datapath_Binding table.
2067
2068              •      15-bit logical ingress port identifier. ID 0 is  reserved
2069                     for  internal use within OVN. IDs 1 through 32767, inclu‐
2070                     sive, may be assigned to  logical  ports  (see  the  tun‐
2071                     nel_key column in the OVN Southbound Port_Binding table).
2072
2073              •      16-bit  logical  egress  port  identifier.  IDs 0 through
2074                     32767 have the same meaning as for logical ingress ports.
2075                     IDs  32768  through  65535, inclusive, may be assigned to
2076                     logical multicast groups (see the  tunnel_key  column  in
2077                     the OVN Southbound Multicast_Group table).
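
       These identifiers are visible in the southbound database. For example,
       the following illustrative queries list the tunnel  keys  assigned  to
       datapaths, ports, and multicast groups:

              ovn-sbctl --columns=tunnel_key,external_ids list Datapath_Binding
              ovn-sbctl --columns=tunnel_key,logical_port list Port_Binding
              ovn-sbctl --columns=tunnel_key,name list Multicast_Group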
2078
2079       When VXLAN is enabled on any hypervisor in a  cluster,  datapath  and
2080       egress port identifier ranges are reduced to 12 bits. This is done be-
2081       cause only STT and Geneve provide a large enough space  for  metadata
2082       (over 32 bits per packet). To accommodate VXLAN, the 24 available bits
2083       are split as follows:
2084
2085              •      12-bit logical datapath identifier, derived from the tun‐
2086                     nel_key column in the OVN Southbound Datapath_Binding ta‐
2087                     ble.
2088
2089              •      12-bit logical egress port identifier. IDs 0 through 2047
2090                     are used for unicast output ports. IDs 2048 through 4095,
2091                     inclusive,  may  be  assigned to logical multicast groups
2092                     (see the tunnel_key column in the OVN  Southbound  Multi‐
2093                     cast_Group  table).  For  multicast  group tunnel keys, a
2094                     special mapping scheme is used to transform the  internal
2095                     OVN 16-bit keys to 12-bit values before sending  packets
2096                     through a VXLAN tunnel, and back from 12-bit tunnel keys
2097                     to 16-bit values when receiving packets  from  a  VXLAN
2098                     tunnel.
2099
2100              •      No logical ingress port identifier.
2101
2102       The limited space available for metadata when VXLAN tunnels are enabled
2103       in a cluster puts the following functional  limitations  on  features
2104       available to users:
2105
2106              •      The maximum number of networks is reduced to 4096.
2107
2108              •      The maximum number of ports per  network  is  reduced  to
2109                     4096. (Including multicast group ports.)
2110
2111              •      ACLs  matching  against  logical ingress port identifiers
2112                     are not supported.
2113
2114              •      OVN interconnection feature is not supported.
2115
2116       In addition to the functional limitations described above, the follow-
2117       ing should be considered before enabling VXLAN in your cluster:
2118
2119              •      STT and Geneve use randomized UDP or TCP  source  ports,
2120                     which allows efficient distribution among multiple paths
2121                     in environments that use ECMP in their underlay.
2122
2123              •      NICs  are  available to offload STT and Geneve encapsula‐
2124                     tion and decapsulation.
2125
2126       Due to its flexibility, the preferred encapsulation between hypervisors
2127       is Geneve. For Geneve encapsulation, OVN transmits the logical datapath
2128       identifier in the Geneve VNI. OVN transmits  the  logical  ingress  and
2129       logical  egress  ports  in  a  TLV  with class 0x0102, type 0x80, and a
2130       32-bit value encoded as follows, from MSB to LSB:
2131
2132         1       15          16
2133       +---+------------+-----------+
2134       |rsv|ingress port|egress port|
2135       +---+------------+-----------+
2136         0
2137
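       The encapsulation a hypervisor uses is configured locally in its  Open
       vSwitch database rather than in the OVN databases; as an  illustrative
       example (the IP address is hypothetical), a chassis might be set up to
       use Geneve as follows:

              ovs-vsctl set open . external-ids:ovn-encap-type=geneve \
                                   external-ids:ovn-encap-ip=192.168.0.10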
2138
2139       Environments whose NICs lack Geneve offload may prefer  STT  encapsula‐
2140       tion  for  performance  reasons. For STT encapsulation, OVN encodes all
2141       three pieces of logical metadata in the STT 64-bit tunnel  ID  as  fol‐
2142       lows, from MSB to LSB:
2143
2144           9          15          16         24
2145       +--------+------------+-----------+--------+
2146       |reserved|ingress port|egress port|datapath|
2147       +--------+------------+-----------+--------+
2148           0
2149
2150
2151       For connecting to gateways, in addition to Geneve and STT, OVN supports
2152       VXLAN, because only  VXLAN  support  is  common  on  top-of-rack  (ToR)
2153       switches. Currently, gateways have a feature set that matches the capa‐
2154       bilities as defined by the VTEP schema, so fewer bits of  metadata  are
2155       necessary.  In  the future, gateways that do not support encapsulations
2156       with large amounts of metadata may continue to have a  reduced  feature
2157       set.
2158
2159
2160
2161OVN 21.09.0                    OVN Architecture            ovn-architecture(7)