ovn-architecture(7)           Open vSwitch Manual          ovn-architecture(7)

NAME

6       ovn-architecture - Open Virtual Network architecture
7

DESCRIPTION

9       OVN,  the  Open Virtual Network, is a system to support virtual network
10       abstraction. OVN complements the existing capabilities of  OVS  to  add
11       native support for virtual network abstractions, such as virtual L2 and
12       L3 overlays and security groups. Services such as DHCP are also  desir‐
13       able  features.  Just  like OVS, OVN’s design goal is to have a produc‐
14       tion-quality implementation that can operate at significant scale.
15
16       An OVN deployment consists of several components:
17
18              ·      A Cloud Management System (CMS), which is OVN’s  ultimate
19                     client  (via  its users and administrators). OVN integra‐
20                     tion  requires  installing  a  CMS-specific  plugin   and
21                     related software (see below). OVN initially targets Open‐
22                     Stack as CMS.
23
24                     We generally speak of ``the’’ CMS, but  one  can  imagine
25                     scenarios  in which multiple CMSes manage different parts
26                     of an OVN deployment.
27
28              ·      An OVN Database physical or virtual node (or, eventually,
29                     cluster) installed in a central location.
30
31              ·      One  or more (usually many) hypervisors. Hypervisors must
32                     run Open vSwitch and implement the interface described in
33                     IntegrationGuide.rst in the OVS source tree. Any hypervi‐
34                     sor platform supported by Open vSwitch is acceptable.
35
36              ·      Zero or more gateways. A gateway extends  a  tunnel-based
37                     logical  network  into a physical network by bidirection‐
38                     ally forwarding packets between tunnels  and  a  physical
39                     Ethernet  port.  This  allows non-virtualized machines to
40                     participate in logical networks. A gateway may be a phys‐
41                     ical  host,  a virtual machine, or an ASIC-based hardware
42                     switch that supports the vtep(5) schema.
43
                     Hypervisors and gateways are together called transport
                     nodes or chassis.
46
47       The  diagram  below  shows  how the major components of OVN and related
48       software interact. Starting at the top of the diagram, we have:
49
50              ·      The Cloud Management System, as defined above.
51
52              ·      The OVN/CMS Plugin is  the  component  of  the  CMS  that
53                     interfaces  to OVN. In OpenStack, this is a Neutron plug‐
54                     in. The plugin’s main purpose is to translate  the  CMS’s
55                     notion  of  logical  network configuration, stored in the
56                     CMS’s configuration database in  a  CMS-specific  format,
57                     into an intermediate representation understood by OVN.
58
59                     This  component  is  necessarily  CMS-specific,  so a new
60                     plugin needs to be developed for each CMS that  is  inte‐
61                     grated  with OVN. All of the components below this one in
62                     the diagram are CMS-independent.
63
64              ·      The OVN Northbound  Database  receives  the  intermediate
65                     representation  of  logical  network configuration passed
66                     down by the OVN/CMS Plugin. The database schema is  meant
67                     to  be  ``impedance matched’’ with the concepts used in a
68                     CMS, so that it  directly  supports  notions  of  logical
69                     switches,  routers,  ACLs,  and  so on. See ovn-nb(5) for
70                     details.
71
72                     The OVN Northbound Database has  only  two  clients:  the
73                     OVN/CMS Plugin above it and ovn-northd below it.
74
75              ·      ovn-northd(8)  connects  to  the  OVN Northbound Database
76                     above it and the OVN Southbound  Database  below  it.  It
77                     translates  the logical network configuration in terms of
78                     conventional network concepts, taken from the OVN  North‐
79                     bound  Database,  into  logical datapath flows in the OVN
80                     Southbound Database below it.
81
82              ·      The OVN Southbound Database is the center of the  system.
83                     Its  clients  are  ovn-northd(8)  above  it  and ovn-con‐
84                     troller(8) on every transport node below it.
85
86                     The OVN Southbound Database contains three kinds of data:
87                     Physical  Network  (PN)  tables that specify how to reach
88                     hypervisor and other nodes, Logical Network  (LN)  tables
89                     that  describe  the logical network in terms of ``logical
90                     datapath flows,’’ and Binding tables  that  link  logical
91                     network  components’  locations  to the physical network.
92                     The hypervisors populate the PN and Port_Binding  tables,
93                     whereas ovn-northd(8) populates the LN tables.
94
95                     OVN  Southbound  Database performance must scale with the
96                     number of transport nodes. This will likely require  some
97                     work  on  ovsdb-server(1)  as  we  encounter bottlenecks.
98                     Clustering for availability may be needed.
99
100       The remaining components are replicated onto each hypervisor:
101
102              ·      ovn-controller(8) is OVN’s agent on each  hypervisor  and
103                     software  gateway.  Northbound,  it  connects  to the OVN
104                     Southbound Database to learn about OVN configuration  and
105                     status  and to populate the PN table and the Chassis col‐
106                     umn in Binding table with the hypervisor’s status. South‐
107                     bound, it connects to ovs-vswitchd(8) as an OpenFlow con‐
108                     troller, for control over network  traffic,  and  to  the
109                     local  ovsdb-server(1) to allow it to monitor and control
110                     Open vSwitch configuration.
111
112              ·      ovs-vswitchd(8) and ovsdb-server(1) are conventional com‐
113                     ponents of Open vSwitch.
114
115                                         CMS
116                                          |
117                                          |
118                              +-----------|-----------+
119                              |           |           |
120                              |     OVN/CMS Plugin    |
121                              |           |           |
122                              |           |           |
123                              |   OVN Northbound DB   |
124                              |           |           |
125                              |           |           |
126                              |       ovn-northd      |
127                              |           |           |
128                              +-----------|-----------+
129                                          |
130                                          |
131                                +-------------------+
132                                | OVN Southbound DB |
133                                +-------------------+
134                                          |
135                                          |
136                       +------------------+------------------+
137                       |                  |                  |
138         HV 1          |                  |    HV n          |
139       +---------------|---------------+  .  +---------------|---------------+
140       |               |               |  .  |               |               |
141       |        ovn-controller         |  .  |        ovn-controller         |
142       |         |          |          |  .  |         |          |          |
143       |         |          |          |     |         |          |          |
144       |  ovs-vswitchd   ovsdb-server  |     |  ovs-vswitchd   ovsdb-server  |
145       |                               |     |                               |
146       +-------------------------------+     +-------------------------------+
147
148
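       On a running deployment, each of the layers above can be  inspected
       with its companion command-line utility. The commands below are only
       a sketch and assume a working northbound database, southbound
       database, and local Open vSwitch instance:

              ovn-nbctl show    # logical configuration (northbound view)
              ovn-sbctl show    # chassis and port bindings (southbound view)
              ovs-vsctl show    # local Open vSwitch bridges and ports
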
149   Information Flow in OVN
150       Configuration  data  in OVN flows from north to south. The CMS, through
151       its  OVN/CMS  plugin,  passes  the  logical  network  configuration  to
152       ovn-northd  via  the  northbound database. In turn, ovn-northd compiles
153       the configuration into a lower-level form and passes it to all  of  the
154       chassis via the southbound database.
155
156       Status information in OVN flows from south to north. OVN currently pro‐
157       vides only a few forms of status information. First,  ovn-northd  popu‐
158       lates  the  up column in the northbound Logical_Switch_Port table: if a
159       logical port’s chassis column in the southbound Port_Binding  table  is
160       nonempty,  it  sets up to true, otherwise to false. This allows the CMS
161       to detect when a VM’s networking has come up.
162
163       Second, OVN provides feedback to the CMS on the realization of its con‐
164       figuration,  that is, whether the configuration provided by the CMS has
165       taken effect. This  feature  requires  the  CMS  to  participate  in  a
166       sequence number protocol, which works the following way:
167
168              1.
169                When the CMS updates the configuration in the northbound data‐
170                base, as part of the same transaction, it increments the value
171                of  the  nb_cfg  column  in the NB_Global table. (This is only
172                necessary if the CMS wants to know when the configuration  has
173                been realized.)
174
175              2.
176                When  ovn-northd  updates  the  southbound database based on a
177                given snapshot of the northbound database,  it  copies  nb_cfg
178                from   northbound   NB_Global  into  the  southbound  database
179                SB_Global table, as part of the same  transaction.  (Thus,  an
180                observer  monitoring  both  databases  can  determine when the
181                southbound database is caught up with the northbound.)
182
183              3.
184                After ovn-northd receives  confirmation  from  the  southbound
185                database  server  that  its changes have committed, it updates
186                sb_cfg in the northbound NB_Global table to the nb_cfg version
187                that  was  pushed down. (Thus, the CMS or another observer can
188                determine when the southbound database is caught up without  a
189                connection to the southbound database.)
190
191              4.
192                The  ovn-controller  process  on  each  chassis  receives  the
193                updated southbound database, with  the  updated  nb_cfg.  This
194                process  in  turn  updates the physical flows installed in the
195                chassis’s Open vSwitch instances. When it  receives  confirma‐
196                tion  from  Open  vSwitch  that  the  physical flows have been
197                updated, it updates nb_cfg in its own Chassis  record  in  the
198                southbound database.
199
200              5.
201                ovn-northd  monitors  the  nb_cfg column in all of the Chassis
202                records in the southbound database. It keeps track of the min‐
203                imum value among all the records and copies it into the hv_cfg
204                column in the northbound NB_Global table. (Thus,  the  CMS  or
205                another  observer  can  determine  when all of the hypervisors
206                have caught up to the northbound configuration.)
207
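       For example, an operator or CMS can drive and observe this sequence
       number protocol from the command line. The commands below are only a
       sketch; they assume the standard ovn-nbctl and ovn-sbctl utilities
       and single-row NB_Global and SB_Global tables:

              # Wait until every chassis has caught up with the current
              # northbound contents (this bumps and tracks nb_cfg).
              ovn-nbctl --wait=hv sync

              # Inspect the sequence numbers by hand.
              ovn-nbctl get NB_Global . nb_cfg sb_cfg hv_cfg
              ovn-sbctl get SB_Global . nb_cfg
              ovn-sbctl --columns=name,nb_cfg list Chassis
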
208   Chassis Setup
209       Each chassis in an OVN deployment  must  be  configured  with  an  Open
210       vSwitch  bridge dedicated for OVN’s use, called the integration bridge.
211       System startup  scripts  may  create  this  bridge  prior  to  starting
212       ovn-controller  if desired. If this bridge does not exist when ovn-con‐
213       troller starts, it will be created automatically with the default  con‐
214       figuration  suggested  below.  The  ports  on  the  integration  bridge
215       include:
216
217              ·      On any chassis, tunnel ports that OVN  uses  to  maintain
218                     logical   network   connectivity.   ovn-controller  adds,
219                     updates, and removes these tunnel ports.
220
221              ·      On a hypervisor, any VIFs that are to be attached to log‐
222                     ical  networks. The hypervisor itself, or the integration
223                     between Open vSwitch and  the  hypervisor  (described  in
224                     IntegrationGuide.rst)  takes  care  of this. (This is not
225                     part of OVN or new to OVN; this is pre-existing  integra‐
226                     tion  work that has already been done on hypervisors that
227                     support OVS.)
228
229              ·      On a gateway, the physical port used for logical  network
230                     connectivity. System startup scripts add this port to the
231                     bridge prior to starting ovn-controller. This  can  be  a
232                     patch port to another bridge, instead of a physical port,
233                     in more sophisticated setups.
234
235       Other ports should not be attached to the integration bridge.  In  par‐
236       ticular, physical ports attached to the underlay network (as opposed to
237       gateway ports, which are physical ports attached to  logical  networks)
238       must not be attached to the integration bridge. Underlay physical ports
239       should instead be attached to a separate Open vSwitch bridge (they need
240       not be attached to any bridge at all, in fact).
241
242       The  integration  bridge  should  be configured as described below. The
243       effect   of   each    of    these    settings    is    documented    in
244       ovs-vswitchd.conf.db(5):
245
246              fail-mode=secure
247                     Avoids  switching  packets  between isolated logical net‐
248                     works before ovn-controller  starts  up.  See  Controller
249                     Failure Settings in ovs-vsctl(8) for more information.
250
251              other-config:disable-in-band=true
252                     Suppresses  in-band  control  flows  for  the integration
253                     bridge. It would be unusual for such  flows  to  show  up
254                     anyway,  because OVN uses a local controller (over a Unix
255                     domain socket) instead of a remote controller. It’s  pos‐
256                     sible,  however, for some other bridge in the same system
257                     to have an in-band remote controller, and  in  that  case
258                     this  suppresses  the  flows  that  in-band control would
259                     ordinarily set up. Refer to the  documentation  for  more
260                     information.
261
262       The  customary  name  for the integration bridge is br-int, but another
263       name may be used.
264
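       As an illustration, a system startup script might create and
       configure the integration bridge along these lines. This is only a
       sketch: the database address and encapsulation settings are examples,
       and the external_ids keys consumed by ovn-controller are documented
       in ovn-controller(8):

              ovs-vsctl --may-exist add-br br-int
              ovs-vsctl set-fail-mode br-int secure
              ovs-vsctl set bridge br-int other_config:disable-in-band=true

              # Point the local ovn-controller at the OVN Southbound Database
              # and configure tunnel encapsulation.
              ovs-vsctl set Open_vSwitch . \
                  external_ids:ovn-remote=tcp:192.0.2.1:6642 \
                  external_ids:ovn-encap-type=geneve \
                  external_ids:ovn-encap-ip=192.0.2.10
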
265   Logical Networks
       Logical networks implement the same concepts as  physical  networks,
       but they are insulated from the physical network by tunnels or  other
       encapsulations. This allows logical networks to have separate IP  and
       other address spaces that overlap, without conflicting,  with  those
       used for physical networks. Logical network topologies can be arranged
       without regard for the topologies of the physical networks  on  which
       they run.
273
274       Logical network concepts in OVN include:
275
276              ·      Logical  switches,  the  logical  version   of   Ethernet
277                     switches.
278
279              ·      Logical routers, the logical version of IP routers. Logi‐
280                     cal switches and routers can be connected into  sophisti‐
281                     cated topologies.
282
283              ·      Logical  datapaths are the logical version of an OpenFlow
284                     switch. Logical switches and routers are both implemented
285                     as logical datapaths.
286
287              ·      Logical ports represent the points of connectivity in and
288                     out of logical switches and logical routers. Some  common
289                     types of logical ports are:
290
291                     ·      Logical ports representing VIFs.
292
293                     ·      Localnet  ports represent the points of connectiv‐
294                            ity between logical switches and the physical net‐
295                            work.  They  are  implemented  as  OVS patch ports
296                            between the integration bridge  and  the  separate
297                            Open  vSwitch  bridge that underlay physical ports
298                            attach to.
299
300                     ·      Logical patch ports represent the points  of  con‐
301                            nectivity  between  logical  switches  and logical
302                            routers, and in some cases  between  peer  logical
303                            routers. There is a pair of logical patch ports at
304                            each such point of connectivity, one on each side.
305
306                     ·      Localport ports represent the points of local con‐
307                            nectivity between logical switches and VIFs. These
308                            ports are present in every chassis (not  bound  to
309                            any  particular  one)  and  traffic from them will
310                            never go through a tunnel. A localport is expected
311                            to only generate traffic destined for a local des‐
312                            tination, typically in response to  a  request  it
313                            received.  One  use  case is how OpenStack Neutron
                            uses a localport port for serving  metadata  to
                            VMs residing on  every  hypervisor.  A  metadata
                            proxy process is attached to this port on every
                            host and all VMs within the same network reach it at
318                            the same IP/MAC address without any traffic  being
319                            sent over a tunnel. Further details can be seen at
320                            https://docs.openstack.org/developer/networking-
321                            ovn/design/metadata_api.html.
322
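       The logical elements above are created through the OVN  Northbound
       Database, normally by the CMS plugin. The ovn-nbctl commands below
       sketch the same operations by hand; every name and address here is
       an example:

              ovn-nbctl ls-add sw0                      # logical switch
              ovn-nbctl lsp-add sw0 sw0-vif1            # logical port for a VIF
              ovn-nbctl lsp-set-addresses sw0-vif1 "00:00:00:00:00:01 10.0.0.11"

              ovn-nbctl lsp-add sw0 sw0-physnet         # localnet port
              ovn-nbctl lsp-set-type sw0-physnet localnet
              ovn-nbctl lsp-set-addresses sw0-physnet unknown
              ovn-nbctl lsp-set-options sw0-physnet network_name=physnet1

              ovn-nbctl lsp-add sw0 sw0-meta            # localport port
              ovn-nbctl lsp-set-type sw0-meta localport

              ovn-nbctl lr-add lr0                      # logical router
              ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 10.0.0.1/24
              ovn-nbctl lsp-add sw0 sw0-lr0             # router-facing port
              ovn-nbctl lsp-set-type sw0-lr0 router
              ovn-nbctl lsp-set-addresses sw0-lr0 router
              ovn-nbctl lsp-set-options sw0-lr0 router-port=lrp0
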
323   Life Cycle of a VIF
324       Tables and their schemas presented in isolation are difficult to under‐
325       stand. Here’s an example.
326
327       A VIF on a hypervisor is a virtual network interface attached either to
328       a  VM  or a container running directly on that hypervisor (This is dif‐
329       ferent from the interface of a container running inside a VM).
330
331       The steps in this example refer often to details of  the  OVN  and  OVN
332       Northbound  database  schemas.  Please  see  ovn-sb(5)  and  ovn-nb(5),
333       respectively, for the full story on these databases.
334
335              1.
336                A VIF’s life cycle begins when a CMS administrator  creates  a
337                new  VIF  using the CMS user interface or API and adds it to a
338                switch (one implemented by OVN as a logical switch).  The  CMS
339                updates  its  own  configuration.  This  includes  associating
                updates its own configuration. This includes associating  a
                unique, persistent identifier vif-id and an Ethernet address
                mac with the VIF.
343              2.
344                The  CMS plugin updates the OVN Northbound database to include
345                the new VIF, by adding a row to the Logical_Switch_Port table.
346                In  the  new row, name is vif-id, mac is mac, switch points to
347                the OVN logical switch’s Logical_Switch record, and other col‐
348                umns are initialized appropriately.
349
350              3.
351                ovn-northd  receives  the  OVN  Northbound database update. In
352                turn, it makes the corresponding updates to the OVN Southbound
353                database,  by adding rows to the OVN Southbound database Logi‐
354                cal_Flow table to reflect the new port, e.g.  add  a  flow  to
355                recognize  that packets destined to the new port’s MAC address
356                should be delivered to it, and update the flow  that  delivers
357                broadcast  and  multicast  packets to include the new port. It
358                also creates a record in the Binding table and  populates  all
359                its columns except the column that identifies the chassis.
360
361              4.
362                On  every hypervisor, ovn-controller receives the Logical_Flow
363                table updates that ovn-northd made in the  previous  step.  As
364                long  as  the  VM  that  owns the VIF is powered off, ovn-con‐
365                troller cannot do much; it cannot,  for  example,  arrange  to
366                send  packets  to or receive packets from the VIF, because the
367                VIF does not actually exist anywhere.
368
369              5.
370                Eventually, a user powers on the VM that owns the VIF. On  the
371                hypervisor where the VM is powered on, the integration between
372                the  hypervisor  and  Open  vSwitch  (described  in   Integra‐
373                tionGuide.rst)  adds the VIF to the OVN integration bridge and
374                stores vif-id in external_ids:iface-id to  indicate  that  the
375                interface  is  an  instantiation of the new VIF. (None of this
376                code is new in OVN; this is pre-existing integration work that
377                has already been done on hypervisors that support OVS.)
378
379              6.
380                On  the  hypervisor where the VM is powered on, ovn-controller
381                notices  external_ids:iface-id  in  the  new   Interface.   In
382                response, in the OVN Southbound DB, it updates the Binding ta‐
383                ble’s chassis column for the row that links the  logical  port
384                from  external_ids:  iface-id  to  the  hypervisor. Afterward,
385                ovn-controller updates the local hypervisor’s OpenFlow  tables
386                so that packets to and from the VIF are properly handled.
387
388              7.
389                Some  CMS  systems, including OpenStack, fully start a VM only
390                when its networking is  ready.  To  support  this,  ovn-northd
391                notices  the chassis column updated for the row in Binding ta‐
392                ble and pushes this upward by updating the up  column  in  the
393                OVN  Northbound  database’s Logical_Switch_Port table to indi‐
394                cate that the VIF is now up. The CMS, if it uses this feature,
395                can then react by allowing the VM’s execution to proceed.
396
397              8.
398                On  every  hypervisor  but  the  one  where  the  VIF resides,
399                ovn-controller notices the completely  populated  row  in  the
400                Binding table. This provides ovn-controller the physical loca‐
401                tion of the logical port, so each instance updates  the  Open‐
402                Flow  tables of its switch (based on logical datapath flows in
403                the OVN DB Logical_Flow table) so that packets to and from the
404                VIF can be properly handled via tunnels.
405
406              9.
407                Eventually, a user powers off the VM that owns the VIF. On the
408                hypervisor where the VM was powered off, the  VIF  is  deleted
409                from the OVN integration bridge.
410
411              10.
412                On the hypervisor where the VM was powered off, ovn-controller
413                notices that the VIF was deleted. In response, it removes  the
414                Chassis  column  content  in the Binding table for the logical
415                port.
416
417              11.
418                On every hypervisor, ovn-controller notices the empty  Chassis
419                column  in  the Binding table’s row for the logical port. This
420                means that ovn-controller no longer knows the  physical  loca‐
421                tion  of  the logical port, so each instance updates its Open‐
422                Flow table to reflect that.
423
424              12.
425                Eventually, when the VIF (or  its  entire  VM)  is  no  longer
426                needed  by  anyone, an administrator deletes the VIF using the
427                CMS user interface or API. The CMS updates its own  configura‐
428                tion.
429
430              13.
431                The  CMS  plugin removes the VIF from the OVN Northbound data‐
432                base, by deleting its row in the Logical_Switch_Port table.
433
434              14.
435                ovn-northd receives the OVN  Northbound  update  and  in  turn
436                updates  the  OVN Southbound database accordingly, by removing
437                or updating the rows from the OVN  Southbound  database  Logi‐
438                cal_Flow table and Binding table that were related to the now-
439                destroyed VIF.
440
441              15.
442                On every hypervisor, ovn-controller receives the  Logical_Flow
443                table  updates  that  ovn-northd  made  in  the previous step.
444                ovn-controller updates OpenFlow tables to reflect the  update,
445                although  there  may  not  be  much  to  do, since the VIF had
446                already become unreachable when it was removed from the  Bind‐
447                ing table in a previous step.
448
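       Steps 2 and 5 above can be reproduced by hand for experimentation.
       The commands below are only a sketch; the switch, port, and interface
       names are examples:

              # Step 2: the CMS plugin adds the VIF to the northbound database.
              ovn-nbctl lsp-add sw0 vif-id
              ovn-nbctl lsp-set-addresses vif-id "00:00:00:00:00:01 10.0.0.11"

              # Step 5: the hypervisor integration attaches the instantiated
              # interface to the integration bridge and records the vif-id.
              ovs-vsctl add-port br-int tap0 \
                  -- set Interface tap0 external_ids:iface-id=vif-id

              # Steps 6 and later can then be observed in the southbound
              # database: the chassis column of the Port_Binding row for
              # vif-id becomes nonempty.
              ovn-sbctl --columns=logical_port,chassis list Port_Binding
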
449   Life Cycle of a Container Interface Inside a VM
450       OVN  provides  virtual  network  abstractions by converting information
       written in the OVN_NB database to OpenFlow flows in each  hypervisor.
       Secure virtual networking for multiple tenants can only  be  provided
       if ovn-controller is the only entity that can modify  flows  in  Open
       vSwitch. When the Open vSwitch integration bridge resides in the
       hypervisor, it is a
455       fair assumption to make that tenant workloads running inside VMs cannot
456       make any changes to Open vSwitch flows.
457
458       If  the infrastructure provider trusts the applications inside the con‐
459       tainers not to break out and modify the Open vSwitch flows,  then  con‐
460       tainers  can be run in hypervisors. This is also the case when contain‐
       ers are run inside the VMs and the Open vSwitch integration  bridge,
       with flows added by ovn-controller, resides in the same VM. For both the
463       above cases, the workflow is the same as explained with an  example  in
464       the previous section ("Life Cycle of a VIF").
465
466       This  section talks about the life cycle of a container interface (CIF)
467       when containers are created in the VMs and the Open vSwitch integration
468       bridge resides inside the hypervisor. In this case, even if a container
469       application breaks out, other tenants are not affected because the con‐
470       tainers  running  inside  the  VMs  cannot modify the flows in the Open
471       vSwitch integration bridge.
472
473       When multiple containers are created inside a VM,  there  are  multiple
474       CIFs  associated  with  them. The network traffic associated with these
       CIFs needs to reach the Open vSwitch integration bridge running in the
476       hypervisor  for OVN to support virtual network abstractions. OVN should
477       also be able to distinguish network traffic coming from different CIFs.
478       There are two ways to distinguish network traffic of CIFs.
479
480       One  way  is  to  provide one VIF for every CIF (1:1 model). This means
481       that there could be a lot of network devices in  the  hypervisor.  This
482       would slow down OVS because of all the additional CPU cycles needed for
483       the management of all the VIFs. It would also mean that the entity cre‐
484       ating  the  containers in a VM should also be able to create the corre‐
485       sponding VIFs in the hypervisor.
486
487       The second way is to provide a single VIF  for  all  the  CIFs  (1:many
488       model).  OVN could then distinguish network traffic coming from differ‐
       ent CIFs via a tag written in every packet. OVN uses this model, with
       VLAN as the tagging mechanism.
491
492              1.
                A CIF’s life cycle begins when a container is spawned inside
                a VM, either by the same CMS that created the VM, by a tenant
                that owns that VM, or even by a container orchestration
                system different from the CMS that initially created the VM.
                Whoever the entity is, it needs to know the vif-id associated
                with the network interface of the VM through which the
                container interface’s network traffic is expected to go. The
                entity that creates the container interface also needs to
                choose an unused VLAN inside that VM.
502
503              2.
504                The  container spawning entity (either directly or through the
505                CMS that manages the underlying  infrastructure)  updates  the
506                OVN  Northbound  database  to include the new CIF, by adding a
507                row to the Logical_Switch_Port table. In the new row, name  is
                any unique identifier, parent_name is the vif-id of the VM
                through which the CIF’s network traffic is expected to go,
                and tag is the VLAN tag that identifies the network traffic
                of that CIF.
512
513              3.
514                ovn-northd receives the OVN  Northbound  database  update.  In
515                turn, it makes the corresponding updates to the OVN Southbound
516                database, by adding rows to the OVN Southbound database’s Log‐
517                ical_Flow table to reflect the new port and also by creating a
518                new row in the Binding table and populating  all  its  columns
519                except the column that identifies the chassis.
520
521              4.
522                On  every hypervisor, ovn-controller subscribes to the changes
523                in the Binding table. When a new row is created by  ovn-northd
524                that  includes a value in parent_port column of Binding table,
                the ovn-controller in the hypervisor whose OVN integration
                bridge has an interface with that vif-id in its
                external_ids:iface-id updates the local hypervisor’s OpenFlow
                tables so that packets to and from the VIF with that
                particular VLAN tag are properly
529                handled. Afterward it updates the chassis column of the  Bind‐
530                ing to reflect the physical location.
531
532              5.
533                One  can only start the application inside the container after
534                the underlying network is ready. To support  this,  ovn-northd
535                notices  the  updated  chassis  column  in  Binding  table and
536                updates the up column in the OVN Northbound  database’s  Logi‐
537                cal_Switch_Port  table to indicate that the CIF is now up. The
                entity responsible for starting the container  application
                queries this value and starts the application.
540
541              6.
                Eventually, the entity that created and started the container
                stops it. The entity, through the CMS (or directly), deletes
                its row in the Logical_Switch_Port table.
545
546              7.
547                ovn-northd  receives  the  OVN  Northbound  update and in turn
548                updates the OVN Southbound database accordingly,  by  removing
549                or  updating  the  rows from the OVN Southbound database Logi‐
550                cal_Flow table that were related to the now-destroyed CIF.  It
551                also deletes the row in the Binding table for that CIF.
552
553              8.
554                On  every hypervisor, ovn-controller receives the Logical_Flow
555                table updates that  ovn-northd  made  in  the  previous  step.
556                ovn-controller updates OpenFlow tables to reflect the update.
557
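       Step 2 above corresponds to an ovn-nbctl invocation along these
       lines. This is only a sketch; the port names and VLAN tag are
       examples:

              # Add a CIF whose traffic arrives through the VM's VIF
              # "vif-id", tagged with VLAN 42.
              ovn-nbctl lsp-add sw0 cif-id vif-id 42
              ovn-nbctl lsp-set-addresses cif-id "00:00:00:00:00:02 10.0.0.12"
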
558   Architectural Physical Life Cycle of a Packet
559       This section describes how a packet travels from one virtual machine or
560       container to another through OVN. This description focuses on the phys‐
561       ical treatment of a packet; for a description of the logical life cycle
562       of a packet, please refer to the Logical_Flow table in ovn-sb(5).
563
564       This section mentions several data and  metadata  fields,  for  clarity
565       summarized here:
566
567              tunnel key
568                     When  OVN encapsulates a packet in Geneve or another tun‐
569                     nel, it attaches extra data to it to allow the  receiving
570                     OVN  instance to process it correctly. This takes differ‐
571                     ent forms depending on the particular encapsulation,  but
572                     in  each  case we refer to it here as the ``tunnel key.’’
573                     See Tunnel Encapsulations, below, for details.
574
575              logical datapath field
576                     A field that denotes the logical datapath through which a
577                     packet  is being processed. OVN uses the field that Open‐
578                     Flow 1.1+ simply (and confusingly) calls ``metadata’’  to
579                     store  the logical datapath. (This field is passed across
580                     tunnels as part of the tunnel key.)
581
582              logical input port field
583                     A field that denotes the  logical  port  from  which  the
584                     packet  entered  the logical datapath. OVN stores this in
585                     Open vSwitch extension register number 14.
586
587                     Geneve and STT tunnels pass this field  as  part  of  the
588                     tunnel  key.  Although  VXLAN  tunnels  do not explicitly
589                     carry a logical input port, OVN only uses VXLAN to commu‐
590                     nicate  with gateways that from OVN’s perspective consist
591                     of only a single logical port, so that OVN  can  set  the
592                     logical  input  port  field to this one on ingress to the
593                     OVN logical pipeline.
594
595              logical output port field
596                     A field that denotes the  logical  port  from  which  the
597                     packet  will leave the logical datapath. This is initial‐
598                     ized to 0 at the beginning of the logical  ingress  pipe‐
599                     line.  OVN stores this in Open vSwitch extension register
600                     number 15.
601
                     Geneve and STT tunnels pass this field as part of the
                     tunnel key. VXLAN tunnels do not transmit the logical
                     output port field. Because the tunnel key carries no
                     logical output port, a packet received from a VXLAN
                     tunnel by an OVN hypervisor is resubmitted to table 8 to
                     determine the output port(s); when the packet reaches
                     table 32, it is resubmitted to table 33 for local
                     delivery based on the MLF_RCV_FROM_VXLAN flag, which is
                     set when a packet arrives from a VXLAN tunnel.
612
613              conntrack zone field for logical ports
614                     A  field  that  denotes  the connection tracking zone for
615                     logical ports. The value only has local significance  and
616                     is not meaningful between chassis. This is initialized to
617                     0 at the beginning of the logical ingress  pipeline.  OVN
618                     stores this in Open vSwitch extension register number 13.
619
620              conntrack zone fields for routers
621                     Fields  that  denote  the  connection  tracking zones for
622                     routers. These values only have  local  significance  and
623                     are  not  meaningful between chassis. OVN stores the zone
624                     information for DNATting in Open vSwitch extension regis‐
625                     ter  number  11  and zone information for SNATing in Open
626                     vSwitch extension register number 12.
627
628              logical flow flags
629                     The logical flags are intended to handle keeping  context
630                     between  tables  in order to decide which rules in subse‐
631                     quent tables are matched. These values  only  have  local
632                     significance  and are not meaningful between chassis. OVN
633                     stores the logical flags in Open vSwitch extension regis‐
634                     ter number 10.
635
636              VLAN ID
637                     The  VLAN ID is used as an interface between OVN and con‐
638                     tainers nested inside a VM (see Life Cycle of a container
639                     interface inside a VM, above, for more information).
640
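       On a chassis, these fields can be seen in the flows installed by
       ovn-controller and in packet traces. The commands below are only a
       sketch; the bridge name and the trace flow are examples:

              # Registers appear as reg10 through reg15 and the logical
              # datapath as "metadata" in OpenFlow flow dumps.
              ovs-ofctl -O OpenFlow13 dump-flows br-int table=0

              # Trace a packet through the pipeline to watch the fields
              # being populated.
              ovs-appctl ofproto/trace br-int in_port=5,icmp
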
641       Initially,  a  VM or container on the ingress hypervisor sends a packet
642       on a port attached to the OVN integration bridge. Then:
643
644              1.
645                OpenFlow table 0 performs physical-to-logical translation.  It
646                matches  the  packet’s  ingress port. Its actions annotate the
647                packet with logical metadata, by setting the logical  datapath
648                field  to  identify  the  logical  datapath that the packet is
649                traversing and the logical input port field  to  identify  the
650                ingress  port. Then it resubmits to table 8 to enter the logi‐
651                cal ingress pipeline.
652
653                Packets that originate from a container nested within a VM are
654                treated in a slightly different way. The originating container
655                can be distinguished based on the VIF-specific VLAN ID, so the
656                physical-to-logical  translation  flows  additionally match on
657                VLAN ID and the actions strip the VLAN header. Following  this
658                step,  OVN  treats packets from containers just like any other
659                packets.
660
661                Table 0 also processes packets that arrive from other chassis.
662                It  distinguishes  them  from  other  packets by ingress port,
663                which is a tunnel. As with packets just entering the OVN pipe‐
664                line, the actions annotate these packets with logical datapath
665                and logical ingress port metadata. In  addition,  the  actions
666                set  the logical output port field, which is available because
667                in OVN tunneling occurs  after  the  logical  output  port  is
668                known. These three pieces of information are obtained from the
669                tunnel encapsulation metadata (see Tunnel  Encapsulations  for
670                encoding  details).  Then  the actions resubmit to table 33 to
671                enter the logical egress pipeline.
672
673              2.
674                OpenFlow tables 8 through 31 execute the logical ingress pipe‐
675                line  from  the Logical_Flow table in the OVN Southbound data‐
676                base. These tables are expressed entirely in terms of  logical
677                concepts  like logical ports and logical datapaths. A big part
678                of ovn-controller’s job is to translate them  into  equivalent
679                OpenFlow (in particular it translates the table numbers: Logi‐
680                cal_Flow tables 0 through 23 become OpenFlow tables 8  through
681                31).
682
683                Each  logical  flow  maps  to  one  or more OpenFlow flows. An
684                actual packet ordinarily matches only one of  these,  although
685                in some cases it can match more than one of these flows (which
686                is not a problem because all of them have the  same  actions).
687                ovn-controller  uses  the  first 32 bits of the logical flow’s
688                UUID as the cookie for its OpenFlow flow or  flows.  (This  is
689                not  necessarily  unique, since the first 32 bits of a logical
690                flow’s UUID is not necessarily unique.)
691
692                Some logical flows can map to the Open  vSwitch  ``conjunctive
693                match’’  extension  (see ovs-fields(7)). Flows with a conjunc‐
694                tion action use an OpenFlow cookie of 0, because they can cor‐
695                respond  to  multiple  logical  flows. The OpenFlow flow for a
696                conjunctive match includes a match on conj_id.
697
698                Some logical flows may not  be  represented  in  the  OpenFlow
699                tables  on  a  given  hypervisor, if they could not be used on
700                that hypervisor. For example, if no VIF in  a  logical  switch
701                resides  on  a given hypervisor, and the logical switch is not
702                otherwise reachable on that hypervisor (e.g. over a series  of
703                hops  through logical switches and routers starting from a VIF
704                on the hypervisor), then the logical flow may  not  be  repre‐
705                sented there.
706
707                Most  OVN actions have fairly obvious implementations in Open‐
708                Flow (with OVS  extensions),  e.g.  next;  is  implemented  as
709                resubmit,  field  =  constant;  as  set_field. A few are worth
710                describing in more detail:
711
712                output:
713                       Implemented by resubmitting the packet to table 32.  If
714                       the pipeline executes more than one output action, then
715                       each one is separately resubmitted to  table  32.  This
716                       can  be  used  to send multiple copies of the packet to
717                       multiple ports. (If the packet was not modified between
718                       the output actions, and some of the copies are destined
719                       to the same hypervisor, then using a logical  multicast
720                       output port would save bandwidth between hypervisors.)
721
722                get_arp(P, A);
723                get_nd(P, A);
724                     Implemented  by  storing  arguments into OpenFlow fields,
725                     then resubmitting to table 66, which ovn-controller popu‐
726                     lates  with flows generated from the MAC_Binding table in
727                     the OVN Southbound database. If there is a match in table
728                     66,  then its actions store the bound MAC in the Ethernet
729                     destination address field.
730
731                     (The OpenFlow  actions  save  and  restore  the  OpenFlow
732                     fields used for the arguments, so that the OVN actions do
733                     not have to be aware of this temporary use.)
734
735                put_arp(P, A, E);
736                put_nd(P, A, E);
737                     Implemented  by  storing  the  arguments  into   OpenFlow
738                     fields, then outputting a packet to ovn-controller, which
739                     updates the MAC_Binding table.
740
741                     (The OpenFlow  actions  save  and  restore  the  OpenFlow
742                     fields used for the arguments, so that the OVN actions do
743                     not have to be aware of this temporary use.)
744
745              3.
746                OpenFlow tables 32 through 47 implement the output  action  in
747                the  logical  ingress pipeline. Specifically, table 32 handles
748                packets to remote hypervisors, table 33 handles packets to the
749                local  hypervisor,  and  table 34 checks whether packets whose
750                logical ingress and egress port are the same  should  be  dis‐
751                carded.
752
753                Logical patch ports are a special case. Logical patch ports do
754                not have a physical location and effectively reside  on  every
755                hypervisor.  Thus,  flow  table 33, for output to ports on the
756                local hypervisor, naturally implements output to unicast logi‐
757                cal  patch  ports  too.  However, applying the same logic to a
758                logical patch port that is part of a logical  multicast  group
759                yields  packet  duplication, because each hypervisor that con‐
760                tains a logical port in the multicast group will  also  output
761                the  packet  to the logical patch port. Thus, multicast groups
762                implement output to logical patch ports in table 32.
763
764                Each flow in table 32 matches on a  logical  output  port  for
765                unicast or multicast logical ports that include a logical port
766                on a remote hypervisor. Each flow’s actions implement  sending
767                a  packet  to  the port it matches. For unicast logical output
768                ports on remote hypervisors, the actions set the tunnel key to
769                the  correct value, then send the packet on the tunnel port to
770                the correct hypervisor. (When the remote  hypervisor  receives
771                the  packet,  table  0  there  will recognize it as a tunneled
772                packet and pass it along to table 33.) For  multicast  logical
773                output  ports, the actions send one copy of the packet to each
774                remote hypervisor, in the same way  as  for  unicast  destina‐
775                tions.  If  a multicast group includes a logical port or ports
776                on the local hypervisor, then its actions also resubmit to ta‐
777                ble 33. Table 32 also includes:
778
779                ·      A  higher-priority  rule to match packets received from
780                       VXLAN tunnels, based on  flag  MLF_RCV_FROM_VXLAN,  and
781                       resubmit  these packets to table 33 for local delivery.
                       Packets received from VXLAN tunnels reach here because
                       the tunnel key lacks a logical output port field, so
                       these packets need to be resubmitted to table 8 to
                       determine the output port.
786
787                ·      A  higher-priority  rule to match packets received from
788                       ports of type localport, based  on  the  logical  input
789                       port,  and resubmit these packets to table 33 for local
790                       delivery. Ports of type localport exist on every hyper‐
791                       visor  and  by definition their traffic should never go
792                       out through a tunnel.
793
794                ·      A higher-priority rule to match packets that  have  the
795                       MLF_LOCAL_ONLY  logical flow flag set, and whose desti‐
796                       nation is a multicast address. This flag indicates that
797                       the  packet  should not be delivered to remote hypervi‐
798                       sors, even if the multicast destination includes  ports
799                       on  remote hypervisors. This flag is used when ovn-con‐
800                       troller is the  originator  of  the  multicast  packet.
801                       Since each ovn-controller instance is originating these
802                       packets, the packets only need to be delivered to local
803                       ports.
804
805                ·      A  fallback flow that resubmits to table 33 if there is
806                       no other match.
807
808                Flows in table 33 resemble those in table 32 but  for  logical
809                ports  that  reside  locally rather than remotely. For unicast
810                logical output ports on the local hypervisor, the actions just
811                resubmit  to table 34. For multicast output ports that include
812                one or more logical ports on the local  hypervisor,  for  each
813                such  logical  port  P,  the actions change the logical output
814                port to P, then resubmit to table 34.
815
                A special case is that when a localnet port exists on the
                datapath, a remote port is reached by switching through the
                localnet port rather than through a tunnel. In this case,
                instead of adding a flow in table 32 to reach the remote
                port, a flow is added in table 33 to switch the logical
                output port to the localnet port and resubmit to table 33,
                as if the packet were unicast to a logical port on the local
                hypervisor.
823
824                Table 34 matches and drops packets for which the logical input
825                and  output ports are the same and the MLF_ALLOW_LOOPBACK flag
826                is not set. It resubmits other packets to table 40.
827
828              4.
829                OpenFlow tables 40 through 63 execute the logical egress pipe‐
830                line  from  the Logical_Flow table in the OVN Southbound data‐
831                base. The egress pipeline can perform a final stage of valida‐
832                tion  before  packet  delivery.  Eventually, it may execute an
833                output action, which ovn-controller implements by resubmitting
834                to  table  64.  A packet for which the pipeline never executes
835                output is effectively  dropped  (although  it  may  have  been
836                transmitted through a tunnel across a physical network).
837
838                The  egress  pipeline cannot change the logical output port or
839                cause further tunneling.
840
841              5.
842                Table 64 bypasses OpenFlow loopback when MLF_ALLOW_LOOPBACK is
843                set. Logical loopback was handled in table 34, but OpenFlow by
844                default also prevents loopback to the OpenFlow  ingress  port.
845                Thus,  when MLF_ALLOW_LOOPBACK is set, OpenFlow table 64 saves
846                the OpenFlow ingress port, sets it to zero, resubmits to table
847                65  for  logical-to-physical transformation, and then restores
                the OpenFlow ingress port, effectively disabling OpenFlow
                loopback prevention. When MLF_ALLOW_LOOPBACK is unset, the
                table 64 flow simply resubmits to table 65.
851
852              6.
853                OpenFlow table 65  performs  logical-to-physical  translation,
854                the  opposite  of  table  0.  It  matches the packet’s logical
855                egress port.  Its  actions  output  the  packet  to  the  port
856                attached  to  the  OVN integration bridge that represents that
857                logical port. If the logical egress port is a container nested
                within a VM, then before sending the packet the actions push on
859                a VLAN header with an appropriate VLAN ID.
860
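       The translation described above can be examined on a live chassis.
       The commands below are only a sketch; sw0 is an example logical
       switch, and the OpenFlow table numbers follow the layout described
       in this section:

              # Logical flows, as ovn-northd wrote them to the southbound
              # database.
              ovn-sbctl lflow-list sw0

              # The OpenFlow flows that ovn-controller generated from them;
              # the logical ingress pipeline occupies tables 8 through 31.
              ovs-ofctl -O OpenFlow13 dump-flows br-int table=8
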
861   Logical Routers and Logical Patch Ports
862       Typically logical routers and logical patch ports do not have a  physi‐
863       cal  location  and  effectively reside on every hypervisor. This is the
864       case for logical  patch  ports  between  logical  routers  and  logical
865       switches behind those logical routers, to which VMs (and VIFs) attach.
866
867       Consider a packet sent from one virtual machine or container to another
868       VM or container that resides on a different  subnet.  The  packet  will
869       traverse  tables 0 to 65 as described in the previous section Architec‐
870       tural Physical Life Cycle of a Packet, using the logical datapath  rep‐
871       resenting  the  logical switch that the sender is attached to. At table
872       32, the packet will use the fallback flow that resubmits locally to ta‐
873       ble 33 on the same hypervisor. In this case, all of the processing from
874       table 0 to table 65 occurs on the hypervisor where the sender resides.
875
876       When the packet reaches table 65, the logical egress port is a  logical
877       patch port. The implementation in table 65 differs depending on the OVS
878       version, although the observed behavior is meant to be the same:
879
880              ·      In OVS versions 2.6 and earlier, table 65 outputs  to  an
881                     OVS  patch  port  that represents the logical patch port.
882                     The packet re-enters the OpenFlow flow table from the OVS
883                     patch  port’s peer in table 0, which identifies the logi‐
884                     cal datapath and logical input  port  based  on  the  OVS
885                     patch port’s OpenFlow port number.
886
887              ·      In  OVS  versions 2.7 and later, the packet is cloned and
888                     resubmitted directly to the first OpenFlow flow table  in
889                     the ingress pipeline, setting the logical ingress port to
890                     the peer logical patch port, and using the  peer  logical
891                     patch  port’s logical datapath (that represents the logi‐
892                     cal router).
893
894       The packet re-enters the ingress pipeline in order to traverse tables 8
895       to 65 again, this time using the logical datapath representing the log‐
896       ical router. The processing continues as described in the previous sec‐
897       tion  Architectural  Physical  Life  Cycle of a Packet. When the packet
       reaches table 65, the logical egress port will once again be a logical
899       patch  port.  In the same manner as described above, this logical patch
900       port will cause the packet to be resubmitted to OpenFlow  tables  8  to
901       65,  this  time  using  the  logical  datapath representing the logical
902       switch that the destination VM or container is attached to.
903
904       The packet traverses tables 8 to 65 a third and final time. If the des‐
905       tination  VM or container resides on a remote hypervisor, then table 32
906       will send the packet on a tunnel port from the sender’s  hypervisor  to
907       the remote hypervisor. Finally table 65 will output the packet directly
908       to the destination VM or container.
909
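       A topology that exercises this path can be built with ovn-nbctl
       along these lines. This is only a sketch; all names and addresses
       are examples:

              ovn-nbctl lr-add lr0
              for i in 1 2; do
                  ovn-nbctl ls-add sw$i
                  ovn-nbctl lrp-add lr0 lrp$i 00:00:00:00:ff:0$i 10.0.$i.1/24
                  ovn-nbctl lsp-add sw$i sw$i-lr0
                  ovn-nbctl lsp-set-type sw$i-lr0 router
                  ovn-nbctl lsp-set-addresses sw$i-lr0 router
                  ovn-nbctl lsp-set-options sw$i-lr0 router-port=lrp$i
              done

       A packet sent from a VIF on sw1 to a VIF on sw2 then traverses the
       sw1, lr0, and sw2 logical datapaths as described above.
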
910       The following sections describe two exceptions, where  logical  routers
911       and/or logical patch ports are associated with a physical location.
912
913     Gateway Routers
914
915       A  gateway router is a logical router that is bound to a physical loca‐
916       tion. This includes all of the  logical  patch  ports  of  the  logical
917       router,  as  well  as  all  of  the peer logical patch ports on logical
918       switches. In the OVN Southbound database, the Port_Binding entries  for
919       these  logical patch ports use the type l3gateway rather than patch, in
920       order to distinguish that these logical patch  ports  are  bound  to  a
921       chassis.
922
923       When a hypervisor processes a packet on a logical datapath representing
924       a logical switch, and the logical egress port is an l3gateway port rep‐
925       resenting  connectivity  to  a  gateway router, the packet will match a
926       flow in table 32 that sends the packet on a tunnel port to the  chassis
927       where  the  gateway router resides. This processing in table 32 is done
928       in the same manner as for VIFs.
929
930       Gateway routers are  typically  used  in  between  distributed  logical
931       routers  and  physical networks. The distributed logical router and the
932       logical switches behind it, to which VMs and containers attach,  effec‐
933       tively  reside on each hypervisor. The distributed router and the gate‐
934       way router are connected by another logical switch, sometimes  referred
935       to as a join logical switch. On the other side, the gateway router con‐
936       nects to another logical switch that has a localnet port connecting  to
937       the physical network.
938
939       When using gateway routers, DNAT and SNAT rules are associated with the
940       gateway router, which provides a central location that can handle  one-
941       to-many SNAT (aka IP masquerading).
942
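       For example, a gateway router bound to a chassis and providing
       one-to-many SNAT for a tenant subnet could be configured along these
       lines (a sketch only; the router, chassis, and address values are
       hypothetical):

              # Hypothetical router name, chassis name, and addresses.
              ovn-nbctl lr-add gw0
              ovn-nbctl set logical_router gw0 options:chassis=chassis-1
              ovn-nbctl lr-nat-add gw0 snat 203.0.113.1 192.168.1.0/24

       The options:chassis setting is what distinguishes a gateway router
       from a distributed logical router; the join logical switch and the
       localnet-attached logical switch described above are created
       separately.
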
943     Distributed Gateway Ports
944
945       Distributed  gateway ports are logical router patch ports that directly
946       connect distributed logical routers to logical switches  with  localnet
947       ports.
948
949       The  primary  design  goal  of distributed gateway ports is to allow as
950       much traffic as possible to be handled locally on the hypervisor  where
951       a  VM  or  container resides. Whenever possible, packets from the VM or
952       container to the outside world should be processed completely  on  that
953       VM’s  or  container’s hypervisor, eventually traversing a localnet port
954       instance on that hypervisor to the physical network. Whenever possible,
955       packets  from the outside world to a VM or container should be directed
956       through the physical network directly to the VM’s or container’s hyper‐
957       visor,  where  the  packet  will enter the integration bridge through a
958       localnet port.
959
960       In order to allow for the distributed processing of  packets  described
961       in  the  paragraph  above, distributed gateway ports need to be logical
962       patch ports that effectively reside on every  hypervisor,  rather  than
963       l3gateway  ports  that  are bound to a particular chassis. However, the
964       flows associated with distributed gateway ports often need to be  asso‐
965       ciated with physical locations, for the following reasons:
966
967              ·      The  physical  network that the localnet port is attached
968                     to typically uses L2 learning. Any Ethernet address  used
969                     over the distributed gateway port must be restricted to a
970                     single physical location so that upstream L2 learning  is
971                     not  confused.  Traffic  sent out the distributed gateway
972                     port towards the localnet port with a  specific  Ethernet
973                     address  must  be  sent  out one specific instance of the
974                     distributed gateway port on one specific chassis. Traffic
975                     received  from  the  localnet  port (or from a VIF on the
976                     same logical switch as the localnet port) with a specific
977                     Ethernet address must be directed to the logical switch’s
978                     patch port instance on that specific chassis.
979
980                     Due to the implications  of  L2  learning,  the  Ethernet
981                     address  and  IP  address of the distributed gateway port
982                     need to be restricted to a single physical location.  For
983                     this reason, the user must specify one chassis associated
984                     with the distributed  gateway  port.  Note  that  traffic
985                     traversing  the distributed gateway port using other Eth‐
986                     ernet addresses and IP addresses (e.g. one-to-one NAT) is
987                     not restricted to this chassis.
988
989                     Replies  to  ARP  and ND requests must be restricted to a
990                     single physical location, where the Ethernet  address  in
991                     the  reply  resides. This includes ARP and ND replies for
992                     the IP address of the distributed gateway port, which are
993                     restricted  to  the chassis that the user associated with
994                     the distributed gateway port.
995
996              ·      In order to support one-to-many SNAT (aka  IP  masquerad‐
997                     ing),  where  multiple logical IP addresses spread across
998                     multiple chassis are  mapped  to  a  single  external  IP
999                     address, it will be necessary to handle some of the logi‐
1000                     cal router processing on a specific chassis in a central‐
1001                     ized  manner. Since the SNAT external IP address is typi‐
1002                     cally the distributed gateway port IP  address,  and  for
1003                     simplicity, the same chassis associated with the distrib‐
1004                     uted gateway port is used.
1005
1006       The details of flow restrictions to specific chassis are  described  in
1007       the ovn-northd documentation.
1008
1009       While  most  of  the physical location dependent aspects of distributed
1010       gateway ports can be handled by  restricting  some  flows  to  specific
1011       chassis, one additional mechanism is required. When a packet leaves the
1012       ingress pipeline and the logical egress port is the distributed gateway
1013       port, one of two different sets of actions is required at table 32:
1014
1015              ·      If  the  packet  can  be  handled locally on the sender’s
1016                     hypervisor (e.g. one-to-one NAT traffic), then the packet
1017                     should  just  be  resubmitted locally to table 33, in the
1018                     normal manner for distributed logical patch ports.
1019
1020              ·      However, if the packet needs to be handled on the chassis
1021                     associated  with  the distributed gateway port (e.g. one-
1022                     to-many SNAT traffic or non-NAT traffic), then  table  32
1023                     must send the packet on a tunnel port to that chassis.
1024
1025       In order to trigger the second set of actions, the chassisredirect type
1026       of southbound Port_Binding has been added. Setting the  logical  egress
1027       port  to the type chassisredirect logical port is simply a way to indi‐
1028       cate that although the packet is destined for the  distributed  gateway
1029       port,  it  needs  to be redirected to a different chassis. At table 32,
1030       packets with this logical egress port are sent to a  specific  chassis,
1031       in the same way that table 32 directs packets whose logical egress port
1032       is a VIF or a type l3gateway port to different chassis. Once the packet
1033       arrives at that chassis, table 33 resets the logical egress port to the
1034       value representing the distributed gateway port. For  each  distributed
1035       gateway  port,  there  is one type chassisredirect port, in addition to
1036       the distributed logical patch port representing the distributed gateway
1037       port.
1038
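       The resulting southbound state can be examined, for example, with the
       generic database commands of ovn-sbctl(8); the following (a sketch)
       lists the chassisredirect port bindings:

              ovn-sbctl find Port_Binding type=chassisredirect
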
1039     High Availability for Distributed Gateway Ports
1040
1041       OVN  allows you to specify a prioritized list of chassis for a distrib‐
1042       uted gateway port. This is done by associating multiple Gateway_Chassis
1043       rows with a Logical_Router_Port in the OVN_Northbound database.
1044
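       For example, the following sketch (with hypothetical port and chassis
       names) associates two prioritized Gateway_Chassis rows with a
       distributed gateway port named lr0-public:

              # Hypothetical names; a higher priority is preferred.
              ovn-nbctl lrp-set-gateway-chassis lr0-public chassis-1 20
              ovn-nbctl lrp-set-gateway-chassis lr0-public chassis-2 10
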
1045       When  multiple  chassis  have been specified for a gateway, all chassis
1046       that may send packets to that gateway will enable BFD on tunnels to all
1047       configured  gateway chassis. The current master chassis for the gateway
1048       is the highest priority gateway chassis that  is  currently  viewed  as
1049       active based on BFD status.
1050
1051       For  more  information on L3 gateway high availability, please refer to
1052       http://docs.openvswitch.org/en/latest/topics/high-availability.
1053
1054   Life Cycle of a VTEP gateway
1055       A gateway is a chassis that forwards traffic  between  the  OVN-managed
1056       part of a logical network and a physical VLAN, extending a tunnel-based
1057       logical network into a physical network.
1058
1059       The steps below refer often to details of the OVN and VTEP database
1060       schemas. Please see ovn-sb(5), ovn-nb(5), and vtep(5) for the full story
1061       on these databases. A command sketch of steps 1 to 3 follows the list.
1062
1063              1.
1064                A VTEP gateway’s life cycle begins with the administrator reg‐
1065                istering  the VTEP gateway as a Physical_Switch table entry in
1066                the VTEP database. The ovn-controller-vtep connected to this
1067                VTEP database will recognize the new VTEP gateway and create
1068                a new Chassis table entry for it in the  OVN_Southbound  data‐
1069                base.
1070
1071              2.
1072                The  administrator  can then create a new Logical_Switch table
1073                entry, and bind a particular vlan on a VTEP gateway’s port  to
1074                any  VTEP  logical switch. Once a VTEP logical switch is bound
1075                to a VTEP gateway, the ovn-controller-vtep will detect it  and
1076                add  its name to the vtep_logical_switches column of the Chas‐
1077                sis table in the OVN_Southbound database. Note that the tunnel_key
1078                column of the VTEP logical switch is not filled in at creation.
1079                The ovn-controller-vtep will set the column when the corresponding
1080                VTEP logical switch is bound to an OVN logical network.
1081
1082              3.
1083                Now,  the  administrator can use the CMS to add a VTEP logical
1084                switch to the OVN logical network. To do that,  the  CMS  must
1085                first  create  a  new  Logical_Switch_Port  table entry in the
1086                OVN_Northbound database. Then, the type column of  this  entry
1087                must be set to "vtep". Next, the vtep-logical-switch and vtep-
1088                physical-switch keys in the options column must also be speci‐
1089                fied, since multiple VTEP gateways can attach to the same VTEP
1090                logical switch.
1091
1092              4.
1093                The newly created logical port in the OVN_Northbound  database
1094                and  its  configuration  will be passed down to the OVN_South‐
1095                bound database as a new Port_Binding table entry. The ovn-con‐
1096                troller-vtep  will  recognize  the change and bind the logical
1097                port to the corresponding VTEP gateway chassis.  Configuration
1098                of binding the same VTEP logical switch to different OVN
1099                logical networks is not allowed, and a warning will be gener‐
1100                ated in the log.
1101
1102              5.
1103                Besides binding to the VTEP gateway chassis, the ovn-con‐
1104                troller-vtep will update the tunnel_key  column  of  the  VTEP
1105                logical  switch  to  the  corresponding Datapath_Binding table
1106                entry’s tunnel_key for the bound OVN logical network.
1107
1108              6.
1109                Next, the ovn-controller-vtep will keep reacting to changes
1110                to the Port_Binding table in the OVN_Southbound
1111                database, and updating the Ucast_Macs_Remote table in the VTEP
1112                database.  This allows the VTEP gateway to understand where to
1113                forward the unicast traffic coming from the extended  external
1114                network.
1115
1116              7.
1117                Eventually, the VTEP gateway’s life cycle ends when the admin‐
1118                istrator unregisters the VTEP gateway from the VTEP  database.
1119                The  ovn-controller-vtep  will  recognize the event and remove
1120                all related configurations (Chassis table entry and port bind‐
1121                ings) in the OVN_Southbound database.
1122
1123              8.
1124                When  the  ovn-controller-vtep is terminated, all related con‐
1125                figurations in the OVN_Southbound database and the VTEP  data‐
1126                base will be cleaned up, including Chassis table entries for all
1127                registered VTEP gateways and  their  port  bindings,  and  all
1128                Ucast_Macs_Remote  table entries and the Logical_Switch tunnel
1129                keys.
1130
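       As a sketch of steps 1 to 3 above (all switch, port, and VLAN names
       are hypothetical), the administrator and the CMS might issue commands
       along these lines:

              # Step 1: register the VTEP gateway in the VTEP database.
              vtep-ctl add-ps br-vtep

              # Step 2: bind VLAN 100 on physical port p0 to a VTEP logical
              # switch.
              vtep-ctl add-port br-vtep p0
              vtep-ctl add-ls vtep-ls0
              vtep-ctl bind-ls br-vtep p0 100 vtep-ls0

              # Step 3: attach the VTEP logical switch to an OVN logical
              # switch through a logical port of type vtep.
              ovn-nbctl lsp-add ovn-ls0 lsp-vtep0
              ovn-nbctl lsp-set-type lsp-vtep0 vtep
              ovn-nbctl lsp-set-options lsp-vtep0 \
                  vtep-physical-switch=br-vtep vtep-logical-switch=vtep-ls0

       The remaining steps are carried out by ovn-northd and
       ovn-controller-vtep without further administrator action.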

SECURITY

1132   Role-Based Access Controls for the Southbound DB
1133       In order to provide additional security against the possibility  of  an
1134       OVN  chassis becoming compromised in such a way as to allow rogue soft‐
1135       ware to make arbitrary modifications to the southbound  database  state
1136       and  thus  disrupt  the  OVN  network,  role-based access controls (see
1137       ovsdb-server(1) for additional details) are provided for the southbound
1138       database.
1139
1140       The implementation of role-based access controls (RBAC) requires the
1141       addition of two tables to an OVSDB schema: the RBAC_Role table, which
1142       is indexed by role name and maps the names of the various tables that
1143       may be modified for a given role to individual rows in a permissions
1144       table containing detailed permission information for that role, and
1145       the permissions table itself, which consists of rows containing the
1146       following information:
1147
1148              Table Name
1149                     The name of the associated table. This column exists pri‐
1150                     marily as an aid for humans reading the contents of  this
1151                     table.
1152
1153              Auth Criteria
1154                     A set of strings containing the names of columns (or col‐
1155                     umn:key pairs for columns containing string:string maps).
1156                     The contents of at least one of the columns or column:key
1157                     values in a row to be modified, inserted, or deleted must
1158                     be equal to the ID of the client attempting to act on the
1159                     row in order for the authorization check to pass. If  the
1160                     authorization  criteria  is empty, authorization checking
1161                     is disabled and all clients for the role will be  treated
1162                     as authorized.
1163
1164              Insert/Delete
1165                     Row insertion/deletion permission; boolean value indicat‐
1166                     ing whether insertion and deletion of rows is allowed for
1167                     the  associated table. If true, insertion and deletion of
1168                     rows is allowed for authorized clients.
1169
1170              Updatable Columns
1171                     A set of strings containing the names of columns or  col‐
1172                     umn:key  pairs  that  may be updated or mutated by autho‐
1173                     rized clients. Modifications to columns within a row  are
1174                     only  permitted  when  the  authorization  check  for the
1175                     client passes and all columns to be modified are included
1176                     in this set of modifiable columns.
1177
1178       RBAC  configuration  for  the  OVN southbound database is maintained by
1179       ovn-northd. With RBAC enabled, modifications are only permitted for the
1180       Chassis,   Encap,   Port_Binding,   and  MAC_Binding  tables,  and  are
1181       restricted as follows:
1182
1183              Chassis
1184                     Authorization: client ID must match the chassis name.
1185
1186                     Insert/Delete: authorized row insertion and deletion  are
1187                     permitted.
1188
1189                     Update:  The  columns  nb_cfg,  external_ids, encaps, and
1190                     vtep_logical_switches may be modified when authorized.
1191
1192              Encap  Authorization: client ID must match the chassis name.
1193
1194                     Insert/Delete: row insertion and row deletion are permit‐
1195                     ted.
1196
1197                     Update:  The  columns  type, options, and ip can be modi‐
1198                     fied.
1199
1200              Port_Binding
1201                     Authorization: disabled (all clients are considered
1202                     authorized). A future enhancement may add columns (or
1203                     keys to external_ids) in order to control which chassis
1204                     are allowed to bind each port.
1205
1206                     Insert/Delete: row insertion/deletion are not permitted
1207                     (ovn-northd maintains the rows in this table).
1208
1209                     Update: Only modifications to the chassis column are per‐
1210                     mitted.
1211
1212              MAC_Binding
1213                     Authorization: disabled (all clients are considered to be
1214                     authorized).
1215
1216                     Insert/Delete: row insertion/deletion are permitted.
1217
1218                     Update: The columns logical_port, ip, mac,  and  datapath
1219                     may be modified by ovn-controller.
1220
1221       Enabling RBAC for ovn-controller connections to the southbound database
1222       requires the following steps:
1223
1224              1.
1225                Creating SSL certificates for each chassis with  the  certifi‐
1226                cate CN field set to the chassis name (e.g. for a chassis with
1227                external-ids:system-id=chassis-1, via the command "ovs-pki  -B
1228                1024 -u req+sign chassis-1 switch").
1229
1230              2.
1231                Configuring  each ovn-controller to use SSL when connecting to
1232                the southbound  database  (e.g.  via  "ovs-vsctl  set  open  .
1233                external-ids:ovn-remote=ssl:x.x.x.x:6642").
1234
1235              3.
1236                Configuring  a  southbound  database SSL remote with "ovn-con‐
1237                troller"   role   (e.g.    via    "ovn-sbctl    set-connection
1238                role=ovn-controller pssl:6642").
1239
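       The resulting access-control configuration can then be inspected (a
       sketch, using the generic database commands described in
       ovn-sbctl(8)), for example with:

              ovn-sbctl get-connection
              ovn-sbctl list RBAC_Role
              ovn-sbctl list RBAC_Permission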

DESIGN DECISIONS

1241   Tunnel Encapsulations
1242       OVN annotates logical network packets that it sends from one hypervisor
1243       to another with the following  three  pieces  of  metadata,  which  are
1244       encoded in an encapsulation-specific fashion:
1245
1246              ·      24-bit  logical  datapath identifier, from the tunnel_key
1247                     column in the OVN Southbound Datapath_Binding table.
1248
1249              ·      15-bit logical ingress port identifier. ID 0 is  reserved
1250                     for  internal use within OVN. IDs 1 through 32767, inclu‐
1251                     sive, may be assigned to  logical  ports  (see  the  tun‐
1252                     nel_key column in the OVN Southbound Port_Binding table).
1253
1254              ·      16-bit  logical  egress  port  identifier.  IDs 0 through
1255                     32767 have the same meaning as for logical ingress ports.
1256                     IDs  32768  through  65535, inclusive, may be assigned to
1257                     logical multicast groups (see the  tunnel_key  column  in
1258                     the OVN Southbound Multicast_Group table).
1259
1260       For  hypervisor-to-hypervisor traffic, OVN supports only Geneve and STT
1261       encapsulations, for the following reasons:
1262
1263              ·      Only STT and Geneve support the large amounts of metadata
1264                     (over  32  bits  per  packet) that OVN uses (as described
1265                     above).
1266
1267              ·      STT and Geneve use randomized UDP  or  TCP  source  ports
1268                     that allow efficient distribution among multiple paths
1269                     in environments that use ECMP in their underlay.
1270
1271              ·      NICs are available to offload STT and  Geneve  encapsula‐
1272                     tion and decapsulation.
1273
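       The encapsulations that a given chassis offers are configured through
       external-ids keys read by ovn-controller (see ovn-controller(8)). A
       minimal sketch, assuming the chassis tunnels over the hypothetical
       address 192.0.2.10:

              # 192.0.2.10 is a placeholder tunnel endpoint address.
              ovs-vsctl set open . \
                  external-ids:ovn-encap-type=geneve \
                  external-ids:ovn-encap-ip=192.0.2.10
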
1274       Due to its flexibility, the preferred encapsulation between hypervisors
1275       is Geneve. For Geneve encapsulation, OVN transmits the logical datapath
1276       identifier  in  the  Geneve  VNI. OVN transmits the logical ingress and
1277       logical egress ports in a TLV with  class  0x0102,  type  0x80,  and  a
1278       32-bit value encoded as follows, from MSB to LSB:
1279
1280         1       15          16
1281       +---+------------+-----------+
1282       |rsv|ingress port|egress port|
1283       +---+------------+-----------+
1284         0
1285
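       Within Open vSwitch, such a Geneve option is accessed through a
       tun_metadata field configured with a TLV mapping. A hand-configured
       equivalent (a sketch only; the tun_metadata register shown is merely
       illustrative) would be:

              # tun_metadata0 is chosen here for illustration.
              ovs-ofctl add-tlv-map br-int \
                  '{class=0x0102,type=0x80,len=4}->tun_metadata0'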
1286
1287       Environments  whose  NICs lack Geneve offload may prefer STT encapsula‐
1288       tion for performance reasons. For STT encapsulation,  OVN  encodes  all
1289       three  pieces  of  logical metadata in the STT 64-bit tunnel ID as fol‐
1290       lows, from MSB to LSB:
1291
1292           9          15          16         24
1293       +--------+------------+-----------+--------+
1294       |reserved|ingress port|egress port|datapath|
1295       +--------+------------+-----------+--------+
1296           0
1297
1298
1299       For connecting to gateways, in addition to Geneve and STT, OVN supports
1300       VXLAN,  because  only  VXLAN  support  is  common  on top-of-rack (ToR)
1301       switches. Currently, gateways have a feature set that matches the capa‐
1302       bilities  as  defined by the VTEP schema, so fewer bits of metadata are
1303       necessary. In the future, gateways that do not  support  encapsulations
1304       with  large  amounts of metadata may continue to have a reduced feature
1305       set.
1306
1307
1308
1309Open vSwitch 2.10.1            OVN Architecture            ovn-architecture(7)