ovn-architecture(7)           Open vSwitch Manual            ovn-architecture(7)
2
3
4
NAME
       ovn-architecture - Open Virtual Network architecture
7
DESCRIPTION
       OVN, the Open Virtual Network, is a system to support virtual network
10 abstraction. OVN complements the existing capabilities of OVS to add
11 native support for virtual network abstractions, such as virtual L2 and
12 L3 overlays and security groups. Services such as DHCP are also desir‐
13 able features. Just like OVS, OVN’s design goal is to have a produc‐
14 tion-quality implementation that can operate at significant scale.
15
16 An OVN deployment consists of several components:
17
18 · A Cloud Management System (CMS), which is OVN’s ultimate
19 client (via its users and administrators). OVN integra‐
20 tion requires installing a CMS-specific plugin and
21 related software (see below). OVN initially targets Open‐
22 Stack as CMS.
23
24 We generally speak of ``the’’ CMS, but one can imagine
25 scenarios in which multiple CMSes manage different parts
26 of an OVN deployment.
27
28 · An OVN Database physical or virtual node (or, eventually,
29 cluster) installed in a central location.
30
31 · One or more (usually many) hypervisors. Hypervisors must
32 run Open vSwitch and implement the interface described in
33 IntegrationGuide.rst in the OVS source tree. Any hypervi‐
34 sor platform supported by Open vSwitch is acceptable.
35
36 · Zero or more gateways. A gateway extends a tunnel-based
37 logical network into a physical network by bidirection‐
38 ally forwarding packets between tunnels and a physical
39 Ethernet port. This allows non-virtualized machines to
40 participate in logical networks. A gateway may be a phys‐
41 ical host, a virtual machine, or an ASIC-based hardware
42 switch that supports the vtep(5) schema.
43
               Hypervisors and gateways are together called transport
               nodes or chassis.
46
47 The diagram below shows how the major components of OVN and related
48 software interact. Starting at the top of the diagram, we have:
49
50 · The Cloud Management System, as defined above.
51
52 · The OVN/CMS Plugin is the component of the CMS that
53 interfaces to OVN. In OpenStack, this is a Neutron plug‐
54 in. The plugin’s main purpose is to translate the CMS’s
55 notion of logical network configuration, stored in the
56 CMS’s configuration database in a CMS-specific format,
57 into an intermediate representation understood by OVN.
58
59 This component is necessarily CMS-specific, so a new
60 plugin needs to be developed for each CMS that is inte‐
61 grated with OVN. All of the components below this one in
62 the diagram are CMS-independent.
63
64 · The OVN Northbound Database receives the intermediate
65 representation of logical network configuration passed
66 down by the OVN/CMS Plugin. The database schema is meant
67 to be ``impedance matched’’ with the concepts used in a
68 CMS, so that it directly supports notions of logical
69 switches, routers, ACLs, and so on. See ovn-nb(5) for
70 details.
71
72 The OVN Northbound Database has only two clients: the
73 OVN/CMS Plugin above it and ovn-northd below it.
74
75 · ovn-northd(8) connects to the OVN Northbound Database
76 above it and the OVN Southbound Database below it. It
77 translates the logical network configuration in terms of
78 conventional network concepts, taken from the OVN North‐
79 bound Database, into logical datapath flows in the OVN
80 Southbound Database below it.
81
82 · The OVN Southbound Database is the center of the system.
83 Its clients are ovn-northd(8) above it and ovn-con‐
84 troller(8) on every transport node below it.
85
86 The OVN Southbound Database contains three kinds of data:
87 Physical Network (PN) tables that specify how to reach
88 hypervisor and other nodes, Logical Network (LN) tables
89 that describe the logical network in terms of ``logical
90 datapath flows,’’ and Binding tables that link logical
91 network components’ locations to the physical network.
92 The hypervisors populate the PN and Port_Binding tables,
93 whereas ovn-northd(8) populates the LN tables.
94
95 OVN Southbound Database performance must scale with the
96 number of transport nodes. This will likely require some
97 work on ovsdb-server(1) as we encounter bottlenecks.
98 Clustering for availability may be needed.
99
100 The remaining components are replicated onto each hypervisor:
101
102 · ovn-controller(8) is OVN’s agent on each hypervisor and
103 software gateway. Northbound, it connects to the OVN
104 Southbound Database to learn about OVN configuration and
105 status and to populate the PN table and the Chassis col‐
106 umn in Binding table with the hypervisor’s status. South‐
107 bound, it connects to ovs-vswitchd(8) as an OpenFlow con‐
108 troller, for control over network traffic, and to the
109 local ovsdb-server(1) to allow it to monitor and control
110 Open vSwitch configuration.
111
112 · ovs-vswitchd(8) and ovsdb-server(1) are conventional com‐
113 ponents of Open vSwitch.
114
115 CMS
116 |
117 |
118 +-----------|-----------+
119 | | |
120 | OVN/CMS Plugin |
121 | | |
122 | | |
123 | OVN Northbound DB |
124 | | |
125 | | |
126 | ovn-northd |
127 | | |
128 +-----------|-----------+
129 |
130 |
131 +-------------------+
132 | OVN Southbound DB |
133 +-------------------+
134 |
135 |
136 +------------------+------------------+
137 | | |
138 HV 1 | | HV n |
139 +---------------|---------------+ . +---------------|---------------+
140 | | | . | | |
141 | ovn-controller | . | ovn-controller |
142 | | | | . | | | |
143 | | | | | | | |
144 | ovs-vswitchd ovsdb-server | | ovs-vswitchd ovsdb-server |
145 | | | |
146 +-------------------------------+ +-------------------------------+
147
148
149 Information Flow in OVN
150 Configuration data in OVN flows from north to south. The CMS, through
151 its OVN/CMS plugin, passes the logical network configuration to
152 ovn-northd via the northbound database. In turn, ovn-northd compiles
153 the configuration into a lower-level form and passes it to all of the
154 chassis via the southbound database.
155
156 Status information in OVN flows from south to north. OVN currently pro‐
157 vides only a few forms of status information. First, ovn-northd popu‐
158 lates the up column in the northbound Logical_Switch_Port table: if a
159 logical port’s chassis column in the southbound Port_Binding table is
160 nonempty, it sets up to true, otherwise to false. This allows the CMS
161 to detect when a VM’s networking has come up.
162
163 Second, OVN provides feedback to the CMS on the realization of its con‐
164 figuration, that is, whether the configuration provided by the CMS has
165 taken effect. This feature requires the CMS to participate in a
       sequence number protocol, which works the following way (an example of
       reading the relevant columns follows these steps):
167
168 1.
169 When the CMS updates the configuration in the northbound data‐
170 base, as part of the same transaction, it increments the value
171 of the nb_cfg column in the NB_Global table. (This is only
172 necessary if the CMS wants to know when the configuration has
173 been realized.)
174
175 2.
176 When ovn-northd updates the southbound database based on a
177 given snapshot of the northbound database, it copies nb_cfg
178 from northbound NB_Global into the southbound database
179 SB_Global table, as part of the same transaction. (Thus, an
180 observer monitoring both databases can determine when the
181 southbound database is caught up with the northbound.)
182
183 3.
184 After ovn-northd receives confirmation from the southbound
185 database server that its changes have committed, it updates
186 sb_cfg in the northbound NB_Global table to the nb_cfg version
187 that was pushed down. (Thus, the CMS or another observer can
188 determine when the southbound database is caught up without a
189 connection to the southbound database.)
190
191 4.
192 The ovn-controller process on each chassis receives the
193 updated southbound database, with the updated nb_cfg. This
194 process in turn updates the physical flows installed in the
195 chassis’s Open vSwitch instances. When it receives confirma‐
196 tion from Open vSwitch that the physical flows have been
197 updated, it updates nb_cfg in its own Chassis record in the
198 southbound database.
199
200 5.
201 ovn-northd monitors the nb_cfg column in all of the Chassis
202 records in the southbound database. It keeps track of the min‐
203 imum value among all the records and copies it into the hv_cfg
204 column in the northbound NB_Global table. (Thus, the CMS or
205 another observer can determine when all of the hypervisors
206 have caught up to the northbound configuration.)
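
       As a rough illustration of this protocol from an observer's point of
       view, the three counters can be read back with the generic database
       commands (a sketch; ``.'' refers to the single NB_Global row):

              # Value last written by the CMS.
              ovn-nbctl get NB_Global . nb_cfg
              # Value that the southbound database has caught up to.
              ovn-nbctl get NB_Global . sb_cfg
              # Minimum value reported back by all chassis.
              ovn-nbctl get NB_Global . hv_cfg

       When hv_cfg reaches the value that the CMS wrote into nb_cfg, every
       chassis has realized that version of the configuration.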
207
208 Chassis Setup
209 Each chassis in an OVN deployment must be configured with an Open
210 vSwitch bridge dedicated for OVN’s use, called the integration bridge.
211 System startup scripts may create this bridge prior to starting
212 ovn-controller if desired. If this bridge does not exist when ovn-con‐
213 troller starts, it will be created automatically with the default con‐
214 figuration suggested below. The ports on the integration bridge
215 include:
216
217 · On any chassis, tunnel ports that OVN uses to maintain
218 logical network connectivity. ovn-controller adds,
219 updates, and removes these tunnel ports.
220
221 · On a hypervisor, any VIFs that are to be attached to log‐
222 ical networks. The hypervisor itself, or the integration
223 between Open vSwitch and the hypervisor (described in
224 IntegrationGuide.rst) takes care of this. (This is not
225 part of OVN or new to OVN; this is pre-existing integra‐
226 tion work that has already been done on hypervisors that
227 support OVS.)
228
229 · On a gateway, the physical port used for logical network
230 connectivity. System startup scripts add this port to the
231 bridge prior to starting ovn-controller. This can be a
232 patch port to another bridge, instead of a physical port,
233 in more sophisticated setups.
234
235 Other ports should not be attached to the integration bridge. In par‐
236 ticular, physical ports attached to the underlay network (as opposed to
237 gateway ports, which are physical ports attached to logical networks)
238 must not be attached to the integration bridge. Underlay physical ports
239 should instead be attached to a separate Open vSwitch bridge (they need
240 not be attached to any bridge at all, in fact).
241
242 The integration bridge should be configured as described below. The
243 effect of each of these settings is documented in
244 ovs-vswitchd.conf.db(5):
245
246 fail-mode=secure
247 Avoids switching packets between isolated logical net‐
248 works before ovn-controller starts up. See Controller
249 Failure Settings in ovs-vsctl(8) for more information.
250
251 other-config:disable-in-band=true
252 Suppresses in-band control flows for the integration
253 bridge. It would be unusual for such flows to show up
254 anyway, because OVN uses a local controller (over a Unix
255 domain socket) instead of a remote controller. It’s pos‐
256 sible, however, for some other bridge in the same system
257 to have an in-band remote controller, and in that case
258 this suppresses the flows that in-band control would
259 ordinarily set up. Refer to the documentation for more
260 information.
261
262 The customary name for the integration bridge is br-int, but another
263 name may be used.
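
       For example, a startup script might create and configure the
       integration bridge along these lines (a sketch; ovn-controller
       applies the same defaults itself if the bridge is absent):

              ovs-vsctl -- --may-exist add-br br-int \
                        -- set Bridge br-int fail-mode=secure \
                           other-config:disable-in-band=true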
264
265 Logical Networks
       Logical networks implement the same concepts as physical networks, but
       they are insulated from the physical network with tunnels or other
       encapsulations. This allows logical networks to have separate IP and
       other address spaces that overlap, without conflict, with those used
       for physical networks. Logical network topologies can be arranged
       without regard for the topologies of the physical networks on which
       they run.
273
       Logical network concepts in OVN include the following (a brief
       configuration sketch follows this list):
275
276 · Logical switches, the logical version of Ethernet
277 switches.
278
279 · Logical routers, the logical version of IP routers. Logi‐
280 cal switches and routers can be connected into sophisti‐
281 cated topologies.
282
283 · Logical datapaths are the logical version of an OpenFlow
284 switch. Logical switches and routers are both implemented
285 as logical datapaths.
286
287 · Logical ports represent the points of connectivity in and
288 out of logical switches and logical routers. Some common
289 types of logical ports are:
290
291 · Logical ports representing VIFs.
292
293 · Localnet ports represent the points of connectiv‐
294 ity between logical switches and the physical net‐
295 work. They are implemented as OVS patch ports
296 between the integration bridge and the separate
297 Open vSwitch bridge that underlay physical ports
298 attach to.
299
300 · Logical patch ports represent the points of con‐
301 nectivity between logical switches and logical
302 routers, and in some cases between peer logical
303 routers. There is a pair of logical patch ports at
304 each such point of connectivity, one on each side.
305
306 · Localport ports represent the points of local con‐
307 nectivity between logical switches and VIFs. These
308 ports are present in every chassis (not bound to
309 any particular one) and traffic from them will
310 never go through a tunnel. A localport is expected
311 to only generate traffic destined for a local des‐
312 tination, typically in response to a request it
313 received. One use case is how OpenStack Neutron
                          uses a localport port for serving metadata to VMs
                          residing on every hypervisor. A metadata proxy
                          process is attached to this port on every host and
                          all VMs within the same network will reach it at
318 the same IP/MAC address without any traffic being
319 sent over a tunnel. Further details can be seen at
320 https://docs.openstack.org/developer/networking-
321 ovn/design/metadata_api.html.
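
       As a brief sketch, the following commands create a logical switch
       with one VIF-type logical port and one localnet port (the names sw0,
       sw0-port1, sw0-physnet, physnet1 and br-phys are placeholders, and
       the addresses are examples):

              ovn-nbctl ls-add sw0
              ovn-nbctl lsp-add sw0 sw0-port1
              ovn-nbctl lsp-set-addresses sw0-port1 "00:00:00:00:00:01 10.0.0.11"

              # Connect sw0 to the physical network named physnet1.
              ovn-nbctl lsp-add sw0 sw0-physnet
              ovn-nbctl lsp-set-type sw0-physnet localnet
              ovn-nbctl lsp-set-addresses sw0-physnet unknown
              ovn-nbctl lsp-set-options sw0-physnet network_name=physnet1

              # On each chassis that should provide physnet1 connectivity,
              # map it to a local Open vSwitch bridge.
              ovs-vsctl set Open_vSwitch . \
                        external-ids:ovn-bridge-mappings=physnet1:br-phys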
322
323 Life Cycle of a VIF
324 Tables and their schemas presented in isolation are difficult to under‐
325 stand. Here’s an example.
326
       A VIF on a hypervisor is a virtual network interface attached either to
       a VM or a container running directly on that hypervisor (this is
       different from the interface of a container running inside a VM).
330
       The steps in this example often refer to details of the OVN Southbound
       and OVN Northbound database schemas. Please see ovn-sb(5) and
       ovn-nb(5), respectively, for the full story on these databases. A
       brief command-level sketch of steps 2 and 5 follows the list.
334
335 1.
336 A VIF’s life cycle begins when a CMS administrator creates a
337 new VIF using the CMS user interface or API and adds it to a
338 switch (one implemented by OVN as a logical switch). The CMS
            updates its own configuration. This includes associating a
            unique, persistent identifier vif-id and an Ethernet address mac
            with the VIF.
342
343 2.
344 The CMS plugin updates the OVN Northbound database to include
345 the new VIF, by adding a row to the Logical_Switch_Port table.
346 In the new row, name is vif-id, mac is mac, switch points to
347 the OVN logical switch’s Logical_Switch record, and other col‐
348 umns are initialized appropriately.
349
350 3.
351 ovn-northd receives the OVN Northbound database update. In
352 turn, it makes the corresponding updates to the OVN Southbound
353 database, by adding rows to the OVN Southbound database Logi‐
354 cal_Flow table to reflect the new port, e.g. add a flow to
355 recognize that packets destined to the new port’s MAC address
356 should be delivered to it, and update the flow that delivers
357 broadcast and multicast packets to include the new port. It
358 also creates a record in the Binding table and populates all
359 its columns except the column that identifies the chassis.
360
361 4.
362 On every hypervisor, ovn-controller receives the Logical_Flow
363 table updates that ovn-northd made in the previous step. As
364 long as the VM that owns the VIF is powered off, ovn-con‐
365 troller cannot do much; it cannot, for example, arrange to
366 send packets to or receive packets from the VIF, because the
367 VIF does not actually exist anywhere.
368
369 5.
370 Eventually, a user powers on the VM that owns the VIF. On the
371 hypervisor where the VM is powered on, the integration between
372 the hypervisor and Open vSwitch (described in Integra‐
373 tionGuide.rst) adds the VIF to the OVN integration bridge and
374 stores vif-id in external_ids:iface-id to indicate that the
375 interface is an instantiation of the new VIF. (None of this
376 code is new in OVN; this is pre-existing integration work that
377 has already been done on hypervisors that support OVS.)
378
379 6.
380 On the hypervisor where the VM is powered on, ovn-controller
381 notices external_ids:iface-id in the new Interface. In
            response, in the OVN Southbound DB, it updates the Binding
            table’s chassis column for the row that links the logical port
            from external_ids:iface-id to the hypervisor. Afterward,
385 ovn-controller updates the local hypervisor’s OpenFlow tables
386 so that packets to and from the VIF are properly handled.
387
388 7.
389 Some CMS systems, including OpenStack, fully start a VM only
390 when its networking is ready. To support this, ovn-northd
            notices the chassis column updated for the row in the Binding
            table and pushes this upward by updating the up column in the
393 OVN Northbound database’s Logical_Switch_Port table to indi‐
394 cate that the VIF is now up. The CMS, if it uses this feature,
395 can then react by allowing the VM’s execution to proceed.
396
397 8.
398 On every hypervisor but the one where the VIF resides,
399 ovn-controller notices the completely populated row in the
400 Binding table. This provides ovn-controller the physical loca‐
401 tion of the logical port, so each instance updates the Open‐
402 Flow tables of its switch (based on logical datapath flows in
403 the OVN DB Logical_Flow table) so that packets to and from the
404 VIF can be properly handled via tunnels.
405
406 9.
407 Eventually, a user powers off the VM that owns the VIF. On the
408 hypervisor where the VM was powered off, the VIF is deleted
409 from the OVN integration bridge.
410
411 10.
412 On the hypervisor where the VM was powered off, ovn-controller
413 notices that the VIF was deleted. In response, it removes the
414 Chassis column content in the Binding table for the logical
415 port.
416
417 11.
418 On every hypervisor, ovn-controller notices the empty Chassis
419 column in the Binding table’s row for the logical port. This
420 means that ovn-controller no longer knows the physical loca‐
421 tion of the logical port, so each instance updates its Open‐
422 Flow table to reflect that.
423
424 12.
425 Eventually, when the VIF (or its entire VM) is no longer
426 needed by anyone, an administrator deletes the VIF using the
427 CMS user interface or API. The CMS updates its own configura‐
428 tion.
429
430 13.
431 The CMS plugin removes the VIF from the OVN Northbound data‐
432 base, by deleting its row in the Logical_Switch_Port table.
433
434 14.
435 ovn-northd receives the OVN Northbound update and in turn
436 updates the OVN Southbound database accordingly, by removing
437 or updating the rows from the OVN Southbound database Logi‐
438 cal_Flow table and Binding table that were related to the now-
439 destroyed VIF.
440
441 15.
442 On every hypervisor, ovn-controller receives the Logical_Flow
443 table updates that ovn-northd made in the previous step.
444 ovn-controller updates OpenFlow tables to reflect the update,
445 although there may not be much to do, since the VIF had
446 already become unreachable when it was removed from the Bind‐
447 ing table in a previous step.
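
       As a brief sketch of steps 2 and 5 above (sw0, vnet0 and the vif-id
       and address values are placeholders), the CMS plugin’s northbound
       update corresponds to something like:

              ovn-nbctl lsp-add sw0 vif-id
              ovn-nbctl lsp-set-addresses vif-id "00:00:00:00:00:01 10.0.0.11"

       and the hypervisor integration’s part of step 5 corresponds to:

              ovs-vsctl set Interface vnet0 external_ids:iface-id=vif-id

       Afterward, the chassis column of the VIF’s row can be inspected with,
       for example, ovn-sbctl list Port_Binding.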
448
449 Life Cycle of a Container Interface Inside a VM
       OVN provides virtual network abstractions by converting information
       written in the OVN_NB database into OpenFlow flows in each hypervisor.
       Secure virtual networking for multiple tenants can only be provided if
       ovn-controller is the only entity that can modify flows in Open
       vSwitch. When the Open vSwitch integration bridge resides in the
       hypervisor, it is a fair assumption that tenant workloads running
       inside VMs cannot make any changes to Open vSwitch flows.
457
458 If the infrastructure provider trusts the applications inside the con‐
459 tainers not to break out and modify the Open vSwitch flows, then con‐
460 tainers can be run in hypervisors. This is also the case when contain‐
461 ers are run inside the VMs and Open vSwitch integration bridge with
       ers are run inside the VMs and the Open vSwitch integration bridge,
       with flows added by ovn-controller, resides in the same VM. For both the
464 the previous section ("Life Cycle of a VIF").
465
466 This section talks about the life cycle of a container interface (CIF)
467 when containers are created in the VMs and the Open vSwitch integration
468 bridge resides inside the hypervisor. In this case, even if a container
469 application breaks out, other tenants are not affected because the con‐
470 tainers running inside the VMs cannot modify the flows in the Open
471 vSwitch integration bridge.
472
473 When multiple containers are created inside a VM, there are multiple
474 CIFs associated with them. The network traffic associated with these
       CIFs needs to reach the Open vSwitch integration bridge running in the
476 hypervisor for OVN to support virtual network abstractions. OVN should
477 also be able to distinguish network traffic coming from different CIFs.
478 There are two ways to distinguish network traffic of CIFs.
479
480 One way is to provide one VIF for every CIF (1:1 model). This means
481 that there could be a lot of network devices in the hypervisor. This
482 would slow down OVS because of all the additional CPU cycles needed for
483 the management of all the VIFs. It would also mean that the entity cre‐
484 ating the containers in a VM should also be able to create the corre‐
485 sponding VIFs in the hypervisor.
486
487 The second way is to provide a single VIF for all the CIFs (1:many
488 model). OVN could then distinguish network traffic coming from differ‐
489 ent CIFs via a tag written in every packet. OVN uses this mechanism and
490 uses VLAN as the tagging mechanism.
491
492 1.
            A CIF’s life cycle begins when a container is spawned inside a
            VM by either the same CMS that created the VM, a tenant that
            owns that VM, or even a container orchestration system that is
            different from the CMS that initially created the VM. Whoever
            the entity is, it will need to know the vif-id that is
            associated with the network interface of the VM through which
            the container interface’s network traffic is expected to go.
            The entity that creates the container interface will also need
            to choose an unused VLAN inside that VM.
502
503 2.
504 The container spawning entity (either directly or through the
505 CMS that manages the underlying infrastructure) updates the
506 OVN Northbound database to include the new CIF, by adding a
507 row to the Logical_Switch_Port table. In the new row, name is
            any unique identifier, parent_name is the vif-id of the VM
            through which the CIF’s network traffic is expected to go, and
            tag is the VLAN tag that identifies the network traffic of that
            CIF. (A command-level sketch of this step follows the list.)
512
513 3.
514 ovn-northd receives the OVN Northbound database update. In
515 turn, it makes the corresponding updates to the OVN Southbound
516 database, by adding rows to the OVN Southbound database’s Log‐
517 ical_Flow table to reflect the new port and also by creating a
518 new row in the Binding table and populating all its columns
519 except the column that identifies the chassis.
520
521 4.
522 On every hypervisor, ovn-controller subscribes to the changes
523 in the Binding table. When a new row is created by ovn-northd
            that includes a value in the parent_port column of the Binding
            table, the ovn-controller in the hypervisor whose OVN
            integration bridge has that same vif-id value in
            external_ids:iface-id updates the local hypervisor’s OpenFlow
            tables so that packets to and from the VIF with the particular
            VLAN tag are properly handled. Afterward, it updates the chassis
            column of that Binding row to reflect the physical location.
531
532 5.
533 One can only start the application inside the container after
534 the underlying network is ready. To support this, ovn-northd
535 notices the updated chassis column in Binding table and
536 updates the up column in the OVN Northbound database’s Logi‐
537 cal_Switch_Port table to indicate that the CIF is now up. The
            entity responsible for starting the container application
            queries this value and starts the application.
540
541 6.
            Eventually the entity that created and started the container
            stops it. The entity, through the CMS (or directly), deletes
            its row in the Logical_Switch_Port table.
545
546 7.
547 ovn-northd receives the OVN Northbound update and in turn
548 updates the OVN Southbound database accordingly, by removing
549 or updating the rows from the OVN Southbound database Logi‐
550 cal_Flow table that were related to the now-destroyed CIF. It
551 also deletes the row in the Binding table for that CIF.
552
553 8.
554 On every hypervisor, ovn-controller receives the Logical_Flow
555 table updates that ovn-northd made in the previous step.
556 ovn-controller updates OpenFlow tables to reflect the update.
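
       As a brief sketch of step 2 above, ovn-nbctl can create such a nested
       port directly by naming the parent port and VLAN tag in the lsp-add
       command (sw0, cif1, vif-id, the tag 42 and the addresses are
       placeholders):

              ovn-nbctl lsp-add sw0 cif1 vif-id 42
              ovn-nbctl lsp-set-addresses cif1 "00:00:00:00:00:02 10.0.0.12"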
557
558 Architectural Physical Life Cycle of a Packet
559 This section describes how a packet travels from one virtual machine or
560 container to another through OVN. This description focuses on the phys‐
561 ical treatment of a packet; for a description of the logical life cycle
562 of a packet, please refer to the Logical_Flow table in ovn-sb(5).
563
       This section mentions several data and metadata fields, summarized
       here for clarity:
566
567 tunnel key
568 When OVN encapsulates a packet in Geneve or another tun‐
569 nel, it attaches extra data to it to allow the receiving
570 OVN instance to process it correctly. This takes differ‐
571 ent forms depending on the particular encapsulation, but
572 in each case we refer to it here as the ``tunnel key.’’
573 See Tunnel Encapsulations, below, for details.
574
575 logical datapath field
576 A field that denotes the logical datapath through which a
577 packet is being processed. OVN uses the field that Open‐
578 Flow 1.1+ simply (and confusingly) calls ``metadata’’ to
579 store the logical datapath. (This field is passed across
580 tunnels as part of the tunnel key.)
581
582 logical input port field
583 A field that denotes the logical port from which the
584 packet entered the logical datapath. OVN stores this in
585 Open vSwitch extension register number 14.
586
587 Geneve and STT tunnels pass this field as part of the
588 tunnel key. Although VXLAN tunnels do not explicitly
589 carry a logical input port, OVN only uses VXLAN to commu‐
590 nicate with gateways that from OVN’s perspective consist
591 of only a single logical port, so that OVN can set the
592 logical input port field to this one on ingress to the
593 OVN logical pipeline.
594
595 logical output port field
596 A field that denotes the logical port from which the
597 packet will leave the logical datapath. This is initial‐
598 ized to 0 at the beginning of the logical ingress pipe‐
599 line. OVN stores this in Open vSwitch extension register
600 number 15.
601
602 Geneve and STT tunnels pass this field as part of the
              tunnel key. VXLAN tunnels do not carry a logical output port
              field in the tunnel key, so when a packet is received from a
              VXLAN tunnel by an OVN hypervisor, it is resubmitted to table
              8 to determine the output port(s); when the packet reaches
              table 32, it is resubmitted to table 33 for local delivery by
              checking the MLF_RCV_FROM_VXLAN flag, which is set when the
              packet arrives from a VXLAN tunnel.
612
613 conntrack zone field for logical ports
614 A field that denotes the connection tracking zone for
615 logical ports. The value only has local significance and
616 is not meaningful between chassis. This is initialized to
617 0 at the beginning of the logical ingress pipeline. OVN
618 stores this in Open vSwitch extension register number 13.
619
620 conntrack zone fields for routers
621 Fields that denote the connection tracking zones for
622 routers. These values only have local significance and
623 are not meaningful between chassis. OVN stores the zone
624 information for DNATting in Open vSwitch extension regis‐
625 ter number 11 and zone information for SNATing in Open
626 vSwitch extension register number 12.
627
628 logical flow flags
629 The logical flags are intended to handle keeping context
630 between tables in order to decide which rules in subse‐
631 quent tables are matched. These values only have local
632 significance and are not meaningful between chassis. OVN
633 stores the logical flags in Open vSwitch extension regis‐
634 ter number 10.
635
636 VLAN ID
637 The VLAN ID is used as an interface between OVN and con‐
              tainers nested inside a VM (see Life Cycle of a Container
              Interface Inside a VM, above, for more information).
640
641 Initially, a VM or container on the ingress hypervisor sends a packet
       on a port attached to the OVN integration bridge. Then (a tracing
       example follows these steps):
643
644 1.
645 OpenFlow table 0 performs physical-to-logical translation. It
646 matches the packet’s ingress port. Its actions annotate the
647 packet with logical metadata, by setting the logical datapath
648 field to identify the logical datapath that the packet is
649 traversing and the logical input port field to identify the
650 ingress port. Then it resubmits to table 8 to enter the logi‐
651 cal ingress pipeline.
652
653 Packets that originate from a container nested within a VM are
654 treated in a slightly different way. The originating container
655 can be distinguished based on the VIF-specific VLAN ID, so the
656 physical-to-logical translation flows additionally match on
657 VLAN ID and the actions strip the VLAN header. Following this
658 step, OVN treats packets from containers just like any other
659 packets.
660
661 Table 0 also processes packets that arrive from other chassis.
662 It distinguishes them from other packets by ingress port,
663 which is a tunnel. As with packets just entering the OVN pipe‐
664 line, the actions annotate these packets with logical datapath
665 and logical ingress port metadata. In addition, the actions
666 set the logical output port field, which is available because
667 in OVN tunneling occurs after the logical output port is
668 known. These three pieces of information are obtained from the
669 tunnel encapsulation metadata (see Tunnel Encapsulations for
670 encoding details). Then the actions resubmit to table 33 to
671 enter the logical egress pipeline.
672
673 2.
674 OpenFlow tables 8 through 31 execute the logical ingress pipe‐
675 line from the Logical_Flow table in the OVN Southbound data‐
676 base. These tables are expressed entirely in terms of logical
677 concepts like logical ports and logical datapaths. A big part
678 of ovn-controller’s job is to translate them into equivalent
679 OpenFlow (in particular it translates the table numbers: Logi‐
680 cal_Flow tables 0 through 23 become OpenFlow tables 8 through
681 31).
682
683 Each logical flow maps to one or more OpenFlow flows. An
684 actual packet ordinarily matches only one of these, although
685 in some cases it can match more than one of these flows (which
686 is not a problem because all of them have the same actions).
687 ovn-controller uses the first 32 bits of the logical flow’s
688 UUID as the cookie for its OpenFlow flow or flows. (This is
689 not necessarily unique, since the first 32 bits of a logical
            flow’s UUID are not necessarily unique.)
691
692 Some logical flows can map to the Open vSwitch ``conjunctive
693 match’’ extension (see ovs-fields(7)). Flows with a conjunc‐
694 tion action use an OpenFlow cookie of 0, because they can cor‐
695 respond to multiple logical flows. The OpenFlow flow for a
696 conjunctive match includes a match on conj_id.
697
698 Some logical flows may not be represented in the OpenFlow
699 tables on a given hypervisor, if they could not be used on
700 that hypervisor. For example, if no VIF in a logical switch
701 resides on a given hypervisor, and the logical switch is not
702 otherwise reachable on that hypervisor (e.g. over a series of
703 hops through logical switches and routers starting from a VIF
704 on the hypervisor), then the logical flow may not be repre‐
705 sented there.
706
707 Most OVN actions have fairly obvious implementations in Open‐
708 Flow (with OVS extensions), e.g. next; is implemented as
709 resubmit, field = constant; as set_field. A few are worth
710 describing in more detail:
711
712 output:
713 Implemented by resubmitting the packet to table 32. If
714 the pipeline executes more than one output action, then
715 each one is separately resubmitted to table 32. This
716 can be used to send multiple copies of the packet to
717 multiple ports. (If the packet was not modified between
718 the output actions, and some of the copies are destined
719 to the same hypervisor, then using a logical multicast
720 output port would save bandwidth between hypervisors.)
721
722 get_arp(P, A);
723 get_nd(P, A);
724 Implemented by storing arguments into OpenFlow fields,
725 then resubmitting to table 66, which ovn-controller popu‐
726 lates with flows generated from the MAC_Binding table in
727 the OVN Southbound database. If there is a match in table
728 66, then its actions store the bound MAC in the Ethernet
729 destination address field.
730
731 (The OpenFlow actions save and restore the OpenFlow
732 fields used for the arguments, so that the OVN actions do
733 not have to be aware of this temporary use.)
734
735 put_arp(P, A, E);
736 put_nd(P, A, E);
737 Implemented by storing the arguments into OpenFlow
738 fields, then outputting a packet to ovn-controller, which
739 updates the MAC_Binding table.
740
741 (The OpenFlow actions save and restore the OpenFlow
742 fields used for the arguments, so that the OVN actions do
743 not have to be aware of this temporary use.)
744
745 3.
746 OpenFlow tables 32 through 47 implement the output action in
747 the logical ingress pipeline. Specifically, table 32 handles
748 packets to remote hypervisors, table 33 handles packets to the
749 local hypervisor, and table 34 checks whether packets whose
750 logical ingress and egress port are the same should be dis‐
751 carded.
752
753 Logical patch ports are a special case. Logical patch ports do
754 not have a physical location and effectively reside on every
755 hypervisor. Thus, flow table 33, for output to ports on the
756 local hypervisor, naturally implements output to unicast logi‐
757 cal patch ports too. However, applying the same logic to a
758 logical patch port that is part of a logical multicast group
759 yields packet duplication, because each hypervisor that con‐
760 tains a logical port in the multicast group will also output
761 the packet to the logical patch port. Thus, multicast groups
762 implement output to logical patch ports in table 32.
763
764 Each flow in table 32 matches on a logical output port for
765 unicast or multicast logical ports that include a logical port
766 on a remote hypervisor. Each flow’s actions implement sending
767 a packet to the port it matches. For unicast logical output
768 ports on remote hypervisors, the actions set the tunnel key to
769 the correct value, then send the packet on the tunnel port to
770 the correct hypervisor. (When the remote hypervisor receives
771 the packet, table 0 there will recognize it as a tunneled
772 packet and pass it along to table 33.) For multicast logical
773 output ports, the actions send one copy of the packet to each
774 remote hypervisor, in the same way as for unicast destina‐
775 tions. If a multicast group includes a logical port or ports
776 on the local hypervisor, then its actions also resubmit to ta‐
777 ble 33. Table 32 also includes:
778
779 · A higher-priority rule to match packets received from
780 VXLAN tunnels, based on flag MLF_RCV_FROM_VXLAN, and
781 resubmit these packets to table 33 for local delivery.
782 Packets received from VXLAN tunnels reach here because
                 the tunnel key lacks a logical output port field, and
                 thus these packets had to be resubmitted to table 8 to
                 determine the output port.
786
787 · A higher-priority rule to match packets received from
788 ports of type localport, based on the logical input
789 port, and resubmit these packets to table 33 for local
790 delivery. Ports of type localport exist on every hyper‐
791 visor and by definition their traffic should never go
792 out through a tunnel.
793
794 · A higher-priority rule to match packets that have the
795 MLF_LOCAL_ONLY logical flow flag set, and whose desti‐
796 nation is a multicast address. This flag indicates that
797 the packet should not be delivered to remote hypervi‐
798 sors, even if the multicast destination includes ports
799 on remote hypervisors. This flag is used when ovn-con‐
800 troller is the originator of the multicast packet.
801 Since each ovn-controller instance is originating these
802 packets, the packets only need to be delivered to local
803 ports.
804
805 · A fallback flow that resubmits to table 33 if there is
806 no other match.
807
808 Flows in table 33 resemble those in table 32 but for logical
809 ports that reside locally rather than remotely. For unicast
810 logical output ports on the local hypervisor, the actions just
811 resubmit to table 34. For multicast output ports that include
812 one or more logical ports on the local hypervisor, for each
813 such logical port P, the actions change the logical output
814 port to P, then resubmit to table 34.
815
            A special case is that, when a localnet port exists on the
            datapath, a remote port is reached by switching through the
            localnet port. In this case, instead of adding a flow in table
            32 to reach the remote port, a flow is added in table 33 to
            switch the logical output port to the localnet port and resubmit
            to table 33, as if the packet were unicast to a logical port on
            the local hypervisor.
823
824 Table 34 matches and drops packets for which the logical input
825 and output ports are the same and the MLF_ALLOW_LOOPBACK flag
826 is not set. It resubmits other packets to table 40.
827
828 4.
829 OpenFlow tables 40 through 63 execute the logical egress pipe‐
830 line from the Logical_Flow table in the OVN Southbound data‐
831 base. The egress pipeline can perform a final stage of valida‐
832 tion before packet delivery. Eventually, it may execute an
833 output action, which ovn-controller implements by resubmitting
834 to table 64. A packet for which the pipeline never executes
835 output is effectively dropped (although it may have been
836 transmitted through a tunnel across a physical network).
837
838 The egress pipeline cannot change the logical output port or
839 cause further tunneling.
840
841 5.
842 Table 64 bypasses OpenFlow loopback when MLF_ALLOW_LOOPBACK is
843 set. Logical loopback was handled in table 34, but OpenFlow by
844 default also prevents loopback to the OpenFlow ingress port.
845 Thus, when MLF_ALLOW_LOOPBACK is set, OpenFlow table 64 saves
846 the OpenFlow ingress port, sets it to zero, resubmits to table
847 65 for logical-to-physical transformation, and then restores
            the OpenFlow ingress port, effectively disabling OpenFlow
            loopback prevention. When MLF_ALLOW_LOOPBACK is unset, the
            table 64 flow simply resubmits to table 65.
851
852 6.
853 OpenFlow table 65 performs logical-to-physical translation,
854 the opposite of table 0. It matches the packet’s logical
855 egress port. Its actions output the packet to the port
856 attached to the OVN integration bridge that represents that
            logical port. If the logical egress port is a container nested
            within a VM, then before sending the packet the actions push on
            a VLAN header with an appropriate VLAN ID.
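
       The path that a particular packet takes through these tables can be
       examined with ovn-trace(8), which simulates the logical pipelines, or
       with ovs-appctl ofproto/trace, which traces the physical OpenFlow
       tables on one chassis. A sketch, in which the datapath name, port
       names, OpenFlow port number and addresses are all examples:

              ovn-trace sw0 'inport == "sw0-port1" && eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02'

              ovs-appctl ofproto/trace br-int \
                        in_port=5,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02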
860
861 Logical Routers and Logical Patch Ports
862 Typically logical routers and logical patch ports do not have a physi‐
863 cal location and effectively reside on every hypervisor. This is the
864 case for logical patch ports between logical routers and logical
865 switches behind those logical routers, to which VMs (and VIFs) attach.
866
867 Consider a packet sent from one virtual machine or container to another
868 VM or container that resides on a different subnet. The packet will
869 traverse tables 0 to 65 as described in the previous section Architec‐
870 tural Physical Life Cycle of a Packet, using the logical datapath rep‐
871 resenting the logical switch that the sender is attached to. At table
872 32, the packet will use the fallback flow that resubmits locally to ta‐
873 ble 33 on the same hypervisor. In this case, all of the processing from
874 table 0 to table 65 occurs on the hypervisor where the sender resides.
875
876 When the packet reaches table 65, the logical egress port is a logical
877 patch port. The implementation in table 65 differs depending on the OVS
878 version, although the observed behavior is meant to be the same:
879
880 · In OVS versions 2.6 and earlier, table 65 outputs to an
881 OVS patch port that represents the logical patch port.
882 The packet re-enters the OpenFlow flow table from the OVS
883 patch port’s peer in table 0, which identifies the logi‐
884 cal datapath and logical input port based on the OVS
885 patch port’s OpenFlow port number.
886
887 · In OVS versions 2.7 and later, the packet is cloned and
888 resubmitted directly to the first OpenFlow flow table in
889 the ingress pipeline, setting the logical ingress port to
890 the peer logical patch port, and using the peer logical
891 patch port’s logical datapath (that represents the logi‐
892 cal router).
893
894 The packet re-enters the ingress pipeline in order to traverse tables 8
895 to 65 again, this time using the logical datapath representing the log‐
896 ical router. The processing continues as described in the previous sec‐
897 tion Architectural Physical Life Cycle of a Packet. When the packet
       reaches table 65, the logical egress port will once again be a logical
899 patch port. In the same manner as described above, this logical patch
900 port will cause the packet to be resubmitted to OpenFlow tables 8 to
901 65, this time using the logical datapath representing the logical
902 switch that the destination VM or container is attached to.
903
904 The packet traverses tables 8 to 65 a third and final time. If the des‐
905 tination VM or container resides on a remote hypervisor, then table 32
906 will send the packet on a tunnel port from the sender’s hypervisor to
907 the remote hypervisor. Finally table 65 will output the packet directly
908 to the destination VM or container.
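
       As a brief sketch, the following commands connect a logical switch to
       a logical router (all of the names and addresses, such as lr0, lrp0,
       sw0 and sw0-lr0, are placeholders):

              ovn-nbctl lr-add lr0
              ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 10.0.0.1/24
              ovn-nbctl lsp-add sw0 sw0-lr0
              ovn-nbctl lsp-set-type sw0-lr0 router
              ovn-nbctl lsp-set-addresses sw0-lr0 router
              ovn-nbctl lsp-set-options sw0-lr0 router-port=lrp0

       A second logical switch on another subnet would be attached in the
       same way. ovn-northd represents each such connection as the pair of
       logical patch ports described above, with no physical location of its
       own.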
909
910 The following sections describe two exceptions, where logical routers
911 and/or logical patch ports are associated with a physical location.
912
913 Gateway Routers
914
915 A gateway router is a logical router that is bound to a physical loca‐
916 tion. This includes all of the logical patch ports of the logical
917 router, as well as all of the peer logical patch ports on logical
918 switches. In the OVN Southbound database, the Port_Binding entries for
919 these logical patch ports use the type l3gateway rather than patch, in
920 order to distinguish that these logical patch ports are bound to a
921 chassis.
922
923 When a hypervisor processes a packet on a logical datapath representing
       a logical switch, and the logical egress port is an l3gateway port
       representing connectivity to a gateway router, the packet will match a
926 flow in table 32 that sends the packet on a tunnel port to the chassis
927 where the gateway router resides. This processing in table 32 is done
928 in the same manner as for VIFs.
929
930 Gateway routers are typically used in between distributed logical
931 routers and physical networks. The distributed logical router and the
932 logical switches behind it, to which VMs and containers attach, effec‐
933 tively reside on each hypervisor. The distributed router and the gate‐
934 way router are connected by another logical switch, sometimes referred
935 to as a join logical switch. On the other side, the gateway router con‐
936 nects to another logical switch that has a localnet port connecting to
937 the physical network.
938
939 When using gateway routers, DNAT and SNAT rules are associated with the
940 gateway router, which provides a central location that can handle one-
941 to-many SNAT (aka IP masquerading).
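
       As a sketch, a logical router becomes a gateway router when it is
       bound to a chassis, and NAT rules are then attached to it (lr-gw,
       chassis-1 and the addresses are placeholders):

              ovn-nbctl lr-add lr-gw
              ovn-nbctl set Logical_Router lr-gw options:chassis=chassis-1
              ovn-nbctl lr-nat-add lr-gw snat 203.0.113.1 10.0.0.0/24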
942
943 Distributed Gateway Ports
944
945 Distributed gateway ports are logical router patch ports that directly
946 connect distributed logical routers to logical switches with localnet
947 ports.
948
949 The primary design goal of distributed gateway ports is to allow as
950 much traffic as possible to be handled locally on the hypervisor where
951 a VM or container resides. Whenever possible, packets from the VM or
952 container to the outside world should be processed completely on that
953 VM’s or container’s hypervisor, eventually traversing a localnet port
954 instance on that hypervisor to the physical network. Whenever possible,
955 packets from the outside world to a VM or container should be directed
956 through the physical network directly to the VM’s or container’s hyper‐
957 visor, where the packet will enter the integration bridge through a
958 localnet port.
959
960 In order to allow for the distributed processing of packets described
961 in the paragraph above, distributed gateway ports need to be logical
962 patch ports that effectively reside on every hypervisor, rather than
963 l3gateway ports that are bound to a particular chassis. However, the
964 flows associated with distributed gateway ports often need to be asso‐
965 ciated with physical locations, for the following reasons:
966
967 · The physical network that the localnet port is attached
968 to typically uses L2 learning. Any Ethernet address used
969 over the distributed gateway port must be restricted to a
970 single physical location so that upstream L2 learning is
971 not confused. Traffic sent out the distributed gateway
972 port towards the localnet port with a specific Ethernet
973 address must be sent out one specific instance of the
974 distributed gateway port on one specific chassis. Traffic
975 received from the localnet port (or from a VIF on the
976 same logical switch as the localnet port) with a specific
977 Ethernet address must be directed to the logical switch’s
978 patch port instance on that specific chassis.
979
980 Due to the implications of L2 learning, the Ethernet
981 address and IP address of the distributed gateway port
982 need to be restricted to a single physical location. For
983 this reason, the user must specify one chassis associated
984 with the distributed gateway port. Note that traffic
985 traversing the distributed gateway port using other Eth‐
986 ernet addresses and IP addresses (e.g. one-to-one NAT) is
987 not restricted to this chassis.
988
989 Replies to ARP and ND requests must be restricted to a
990 single physical location, where the Ethernet address in
991 the reply resides. This includes ARP and ND replies for
992 the IP address of the distributed gateway port, which are
993 restricted to the chassis that the user associated with
994 the distributed gateway port.
995
996 · In order to support one-to-many SNAT (aka IP masquerad‐
997 ing), where multiple logical IP addresses spread across
998 multiple chassis are mapped to a single external IP
999 address, it will be necessary to handle some of the logi‐
1000 cal router processing on a specific chassis in a central‐
1001 ized manner. Since the SNAT external IP address is typi‐
1002 cally the distributed gateway port IP address, and for
1003 simplicity, the same chassis associated with the distrib‐
1004 uted gateway port is used.
1005
1006 The details of flow restrictions to specific chassis are described in
1007 the ovn-northd documentation.
1008
1009 While most of the physical location dependent aspects of distributed
1010 gateway ports can be handled by restricting some flows to specific
1011 chassis, one additional mechanism is required. When a packet leaves the
1012 ingress pipeline and the logical egress port is the distributed gateway
1013 port, one of two different sets of actions is required at table 32:
1014
1015 · If the packet can be handled locally on the sender’s
1016 hypervisor (e.g. one-to-one NAT traffic), then the packet
1017 should just be resubmitted locally to table 33, in the
1018 normal manner for distributed logical patch ports.
1019
1020 · However, if the packet needs to be handled on the chassis
1021 associated with the distributed gateway port (e.g. one-
1022 to-many SNAT traffic or non-NAT traffic), then table 32
1023 must send the packet on a tunnel port to that chassis.
1024
1025 In order to trigger the second set of actions, the chassisredirect type
1026 of southbound Port_Binding has been added. Setting the logical egress
1027 port to the type chassisredirect logical port is simply a way to indi‐
1028 cate that although the packet is destined for the distributed gateway
1029 port, it needs to be redirected to a different chassis. At table 32,
1030 packets with this logical egress port are sent to a specific chassis,
1031 in the same way that table 32 directs packets whose logical egress port
1032 is a VIF or a type l3gateway port to different chassis. Once the packet
1033 arrives at that chassis, table 33 resets the logical egress port to the
1034 value representing the distributed gateway port. For each distributed
1035 gateway port, there is one type chassisredirect port, in addition to
1036 the distributed logical patch port representing the distributed gateway
1037 port.
1038
1039 High Availability for Distributed Gateway Ports
1040
1041 OVN allows you to specify a prioritized list of chassis for a distrib‐
1042 uted gateway port. This is done by associating multiple Gateway_Chassis
1043 rows with a Logical_Router_Port in the OVN_Northbound database.
1044
1045 When multiple chassis have been specified for a gateway, all chassis
1046 that may send packets to that gateway will enable BFD on tunnels to all
1047 configured gateway chassis. The current master chassis for the gateway
1048 is the highest priority gateway chassis that is currently viewed as
1049 active based on BFD status.
1050
1051 For more information on L3 gateway high availability, please refer to
1052 http://docs.openvswitch.org/en/latest/topics/high-availability.
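
       As a sketch, a prioritized list of gateway chassis can be configured
       with ovn-nbctl (lrp0, chassis-1 and chassis-2 are placeholders; the
       chassis with the highest priority value is preferred):

              ovn-nbctl lrp-set-gateway-chassis lrp0 chassis-1 20
              ovn-nbctl lrp-set-gateway-chassis lrp0 chassis-2 10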
1053
1054 Life Cycle of a VTEP gateway
1055 A gateway is a chassis that forwards traffic between the OVN-managed
1056 part of a logical network and a physical VLAN, extending a tunnel-based
1057 logical network into a physical network.
1058
1059 The steps below refer often to details of the OVN and VTEP database
       schemas. Please see ovn-sb(5), ovn-nb(5), and vtep(5) for the full
       story on these databases. A brief configuration sketch follows the
       steps below.
1062
1063 1.
1064 A VTEP gateway’s life cycle begins with the administrator reg‐
1065 istering the VTEP gateway as a Physical_Switch table entry in
            the VTEP database. The ovn-controller-vtep connected to this
            VTEP database will recognize the new VTEP gateway and create
1068 a new Chassis table entry for it in the OVN_Southbound data‐
1069 base.
1070
1071 2.
1072 The administrator can then create a new Logical_Switch table
            entry, and bind a particular VLAN on a VTEP gateway’s port to
            any VTEP logical switch. Once a VTEP logical switch is bound
            to a VTEP gateway, the ovn-controller-vtep will detect it and
            add its name to the vtep_logical_switches column of the
            Chassis table in the OVN_Southbound database. Note that the
            tunnel_key column of the VTEP logical switch is not filled in
            at creation. The ovn-controller-vtep will set the column when
            the corresponding VTEP logical switch is bound to an OVN
            logical network.
1081
1082 3.
1083 Now, the administrator can use the CMS to add a VTEP logical
1084 switch to the OVN logical network. To do that, the CMS must
1085 first create a new Logical_Switch_Port table entry in the
1086 OVN_Northbound database. Then, the type column of this entry
1087 must be set to "vtep". Next, the vtep-logical-switch and vtep-
1088 physical-switch keys in the options column must also be speci‐
1089 fied, since multiple VTEP gateways can attach to the same VTEP
1090 logical switch.
1091
1092 4.
1093 The newly created logical port in the OVN_Northbound database
1094 and its configuration will be passed down to the OVN_South‐
1095 bound database as a new Port_Binding table entry. The ovn-con‐
1096 troller-vtep will recognize the change and bind the logical
            port to the corresponding VTEP gateway chassis. Binding the
            same VTEP logical switch to different OVN logical networks is
            not allowed, and a warning will be generated in the log.
1101
1102 5.
            Besides binding to the VTEP gateway chassis, the ovn-con‐
1104 troller-vtep will update the tunnel_key column of the VTEP
1105 logical switch to the corresponding Datapath_Binding table
1106 entry’s tunnel_key for the bound OVN logical network.
1107
1108 6.
1109 Next, the ovn-controller-vtep will keep reacting to the con‐
            figuration change in the Port_Binding in the OVN_Southbound
1111 database, and updating the Ucast_Macs_Remote table in the VTEP
1112 database. This allows the VTEP gateway to understand where to
1113 forward the unicast traffic coming from the extended external
1114 network.
1115
1116 7.
1117 Eventually, the VTEP gateway’s life cycle ends when the admin‐
1118 istrator unregisters the VTEP gateway from the VTEP database.
1119 The ovn-controller-vtep will recognize the event and remove
1120 all related configurations (Chassis table entry and port bind‐
1121 ings) in the OVN_Southbound database.
1122
1123 8.
1124 When the ovn-controller-vtep is terminated, all related con‐
1125 figurations in the OVN_Southbound database and the VTEP data‐
1126 base will be cleaned, including Chassis table entries for all
1127 registered VTEP gateways and their port bindings, and all
1128 Ucast_Macs_Remote table entries and the Logical_Switch tunnel
1129 keys.
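
       As a brief configuration sketch (ps1, port1, vtep_ls1, sw0, sw0-vtep
       and the VLAN tag 100 are placeholders), step 2 corresponds to
       commands like the following against the VTEP database:

              vtep-ctl add-ls vtep_ls1
              vtep-ctl bind-ls ps1 port1 100 vtep_ls1

       and step 3 corresponds to northbound commands like:

              ovn-nbctl lsp-add sw0 sw0-vtep
              ovn-nbctl lsp-set-type sw0-vtep vtep
              ovn-nbctl lsp-set-options sw0-vtep vtep-physical-switch=ps1 \
                        vtep-logical-switch=vtep_ls1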
1130
   Role-Based Access Controls for the Southbound DB
1133 In order to provide additional security against the possibility of an
1134 OVN chassis becoming compromised in such a way as to allow rogue soft‐
1135 ware to make arbitrary modifications to the southbound database state
1136 and thus disrupt the OVN network, role-based access controls (see
1137 ovsdb-server(1) for additional details) are provided for the southbound
1138 database.
1139
1140 The implementation of role-based access controls (RBAC) requires the
1141 addition of two tables to an OVSDB schema: the RBAC_Role table, which
       is indexed by role name and maps the names of the various tables that
       may be modifiable for a given role to individual rows in a permissions
       table containing detailed permission information for that role, and
       the permissions table itself, which consists of rows containing the
       following information:
1147
1148 Table Name
1149 The name of the associated table. This column exists pri‐
1150 marily as an aid for humans reading the contents of this
1151 table.
1152
1153 Auth Criteria
1154 A set of strings containing the names of columns (or col‐
1155 umn:key pairs for columns containing string:string maps).
1156 The contents of at least one of the columns or column:key
1157 values in a row to be modified, inserted, or deleted must
1158 be equal to the ID of the client attempting to act on the
1159 row in order for the authorization check to pass. If the
              authorization criteria are empty, authorization checking
1161 is disabled and all clients for the role will be treated
1162 as authorized.
1163
1164 Insert/Delete
1165 Row insertion/deletion permission; boolean value indicat‐
1166 ing whether insertion and deletion of rows is allowed for
1167 the associated table. If true, insertion and deletion of
1168 rows is allowed for authorized clients.
1169
1170 Updatable Columns
1171 A set of strings containing the names of columns or col‐
1172 umn:key pairs that may be updated or mutated by autho‐
1173 rized clients. Modifications to columns within a row are
1174 only permitted when the authorization check for the
1175 client passes and all columns to be modified are included
1176 in this set of modifiable columns.
1177
1178 RBAC configuration for the OVN southbound database is maintained by
1179 ovn-northd. With RBAC enabled, modifications are only permitted for the
1180 Chassis, Encap, Port_Binding, and MAC_Binding tables, and are
       restricted as follows:
1182
1183 Chassis
1184 Authorization: client ID must match the chassis name.
1185
1186 Insert/Delete: authorized row insertion and deletion are
1187 permitted.
1188
1189 Update: The columns nb_cfg, external_ids, encaps, and
1190 vtep_logical_switches may be modified when authorized.
1191
1192 Encap Authorization: client ID must match the chassis name.
1193
1194 Insert/Delete: row insertion and row deletion are permit‐
1195 ted.
1196
1197 Update: The columns type, options, and ip can be modi‐
1198 fied.
1199
1200 Port_Binding
              Authorization: disabled (all clients are considered
              authorized). A future enhancement may add columns (or keys
              to external_ids) in order to control which chassis are
              allowed to bind each port.
1205
1206 Insert/Delete: row insertion/deletion are not permitted
              (ovn-northd maintains rows in this table).
1208
1209 Update: Only modifications to the chassis column are per‐
1210 mitted.
1211
1212 MAC_Binding
1213 Authorization: disabled (all clients are considered to be
1214 authorized).
1215
1216 Insert/Delete: row insertion/deletion are permitted.
1217
1218 Update: The columns logical_port, ip, mac, and datapath
1219 may be modified by ovn-controller.
1220
1221 Enabling RBAC for ovn-controller connections to the southbound database
       requires the following steps (a consolidated example follows them):
1223
1224 1.
1225 Creating SSL certificates for each chassis with the certifi‐
1226 cate CN field set to the chassis name (e.g. for a chassis with
1227 external-ids:system-id=chassis-1, via the command "ovs-pki -B
1228 1024 -u req+sign chassis-1 switch").
1229
1230 2.
1231 Configuring each ovn-controller to use SSL when connecting to
1232 the southbound database (e.g. via "ovs-vsctl set open .
1233 external-ids:ovn-remote=ssl:x.x.x.x:6642").
1234
1235 3.
1236 Configuring a southbound database SSL remote with "ovn-con‐
1237 troller" role (e.g. via "ovn-sbctl set-connection
1238 role=ovn-controller pssl:6642").
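
       As a consolidated sketch of the steps above (the chassis name
       chassis-1, the certificate paths, and the address x.x.x.x are
       placeholders), a chassis might be configured as follows; how the SSL
       options reach ovn-controller depends on how it is started:

              ovs-vsctl set open . \
                        external-ids:ovn-remote=ssl:x.x.x.x:6642
              ovn-controller --private-key=/etc/openvswitch/chassis-1-privkey.pem \
                        --certificate=/etc/openvswitch/chassis-1-cert.pem \
                        --ca-cert=/etc/openvswitch/cacert.pem

       while the central node restricts those connections to the
       ovn-controller role:

              ovn-sbctl set-connection role=ovn-controller pssl:6642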
1239
1241 Tunnel Encapsulations
1242 OVN annotates logical network packets that it sends from one hypervisor
1243 to another with the following three pieces of metadata, which are
1244 encoded in an encapsulation-specific fashion:
1245
1246 · 24-bit logical datapath identifier, from the tunnel_key
1247 column in the OVN Southbound Datapath_Binding table.
1248
1249 · 15-bit logical ingress port identifier. ID 0 is reserved
1250 for internal use within OVN. IDs 1 through 32767, inclu‐
1251 sive, may be assigned to logical ports (see the tun‐
1252 nel_key column in the OVN Southbound Port_Binding table).
1253
1254 · 16-bit logical egress port identifier. IDs 0 through
1255 32767 have the same meaning as for logical ingress ports.
1256 IDs 32768 through 65535, inclusive, may be assigned to
1257 logical multicast groups (see the tunnel_key column in
1258 the OVN Southbound Multicast_Group table).
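
       These identifiers can be inspected directly in the southbound
       database with the generic database commands, for example:

              ovn-sbctl --columns=tunnel_key,external_ids list Datapath_Binding
              ovn-sbctl --columns=logical_port,tunnel_key list Port_Binding
              ovn-sbctl --columns=name,tunnel_key list Multicast_Group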
1259
1260 For hypervisor-to-hypervisor traffic, OVN supports only Geneve and STT
1261 encapsulations, for the following reasons:
1262
1263 · Only STT and Geneve support the large amounts of metadata
1264 (over 32 bits per packet) that OVN uses (as described
1265 above).
1266
1267 · STT and Geneve use randomized UDP or TCP source ports
                 that allow efficient distribution among multiple paths
1269 in environments that use ECMP in their underlay.
1270
1271 · NICs are available to offload STT and Geneve encapsula‐
1272 tion and decapsulation.
1273
1274 Due to its flexibility, the preferred encapsulation between hypervisors
1275 is Geneve. For Geneve encapsulation, OVN transmits the logical datapath
1276 identifier in the Geneve VNI. OVN transmits the logical ingress and
1277 logical egress ports in a TLV with class 0x0102, type 0x80, and a
1278 32-bit value encoded as follows, from MSB to LSB:
1279
             1       15          16
           +---+------------+-----------+
           |rsv|ingress port|egress port|
           +---+------------+-----------+
             0
1285
1286
1287 Environments whose NICs lack Geneve offload may prefer STT encapsula‐
1288 tion for performance reasons. For STT encapsulation, OVN encodes all
1289 three pieces of logical metadata in the STT 64-bit tunnel ID as fol‐
1290 lows, from MSB to LSB:
1291
             9        15          16          24
           +--------+------------+-----------+--------+
           |reserved|ingress port|egress port|datapath|
           +--------+------------+-----------+--------+
             0
1297
1298
1299 For connecting to gateways, in addition to Geneve and STT, OVN supports
1300 VXLAN, because only VXLAN support is common on top-of-rack (ToR)
1301 switches. Currently, gateways have a feature set that matches the capa‐
1302 bilities as defined by the VTEP schema, so fewer bits of metadata are
1303 necessary. In the future, gateways that do not support encapsulations
1304 with large amounts of metadata may continue to have a reduced feature
1305 set.
1306
1307
1308
Open vSwitch 2.10.1              OVN Architecture            ovn-architecture(7)