ovn-architecture(7)           Open vSwitch Manual          ovn-architecture(7)

NAME
   ovn-architecture - Open Virtual Network architecture

DESCRIPTION
   OVN, the Open Virtual Network, is a system to support virtual network
   abstraction. OVN complements the existing capabilities of OVS to add
   native support for virtual network abstractions, such as virtual L2 and
   L3 overlays and security groups. Services such as DHCP are also
   desirable features. Just like OVS, OVN’s design goal is to have a
   production-quality implementation that can operate at significant scale.

   An OVN deployment consists of several components:

   ·  A Cloud Management System (CMS), which is OVN’s ultimate client (via
      its users and administrators). OVN integration requires installing a
      CMS-specific plugin and related software (see below). OVN initially
      targets OpenStack as CMS.

      We generally speak of ``the’’ CMS, but one can imagine scenarios in
      which multiple CMSes manage different parts of an OVN deployment.

   ·  An OVN Database physical or virtual node (or, eventually, cluster)
      installed in a central location.

   ·  One or more (usually many) hypervisors. Hypervisors must run Open
      vSwitch and implement the interface described in IntegrationGuide.rst
      in the OVS source tree. Any hypervisor platform supported by Open
      vSwitch is acceptable.

   ·  Zero or more gateways. A gateway extends a tunnel-based logical
      network into a physical network by bidirectionally forwarding packets
      between tunnels and a physical Ethernet port. This allows
      non-virtualized machines to participate in logical networks. A
      gateway may be a physical host, a virtual machine, or an ASIC-based
      hardware switch that supports the vtep(5) schema.

      Hypervisors and gateways are together called transport nodes or
      chassis.

   The diagram below shows how the major components of OVN and related
   software interact. Starting at the top of the diagram, we have:

   ·  The Cloud Management System, as defined above.

   ·  The OVN/CMS Plugin is the component of the CMS that interfaces to
      OVN. In OpenStack, this is a Neutron plugin. The plugin’s main
      purpose is to translate the CMS’s notion of logical network
      configuration, stored in the CMS’s configuration database in a
      CMS-specific format, into an intermediate representation understood
      by OVN.

      This component is necessarily CMS-specific, so a new plugin needs to
      be developed for each CMS that is integrated with OVN. All of the
      components below this one in the diagram are CMS-independent.

   ·  The OVN Northbound Database receives the intermediate representation
      of logical network configuration passed down by the OVN/CMS Plugin.
      The database schema is meant to be ``impedance matched’’ with the
      concepts used in a CMS, so that it directly supports notions of
      logical switches, routers, ACLs, and so on. See ovn-nb(5) for
      details.

      The OVN Northbound Database has only two clients: the OVN/CMS Plugin
      above it and ovn-northd below it.

   ·  ovn-northd(8) connects to the OVN Northbound Database above it and
      the OVN Southbound Database below it. It translates the logical
      network configuration in terms of conventional network concepts,
      taken from the OVN Northbound Database, into logical datapath flows
      in the OVN Southbound Database below it.

   ·  The OVN Southbound Database is the center of the system. Its clients
      are ovn-northd(8) above it and ovn-controller(8) on every transport
      node below it.

      The OVN Southbound Database contains three kinds of data: Physical
      Network (PN) tables that specify how to reach hypervisor and other
      nodes, Logical Network (LN) tables that describe the logical network
      in terms of ``logical datapath flows,’’ and Binding tables that link
      logical network components’ locations to the physical network. The
      hypervisors populate the PN and Port_Binding tables, whereas
      ovn-northd(8) populates the LN tables.

      OVN Southbound Database performance must scale with the number of
      transport nodes. This will likely require some work on
      ovsdb-server(1) as we encounter bottlenecks. Clustering for
      availability may be needed.

   The remaining components are replicated onto each hypervisor:

   ·  ovn-controller(8) is OVN’s agent on each hypervisor and software
      gateway. Northbound, it connects to the OVN Southbound Database to
      learn about OVN configuration and status and to populate the PN table
      and the Chassis column in the Binding table with the hypervisor’s
      status. Southbound, it connects to ovs-vswitchd(8) as an OpenFlow
      controller, for control over network traffic, and to the local
      ovsdb-server(1) to allow it to monitor and control Open vSwitch
      configuration.

   ·  ovs-vswitchd(8) and ovsdb-server(1) are conventional components of
      Open vSwitch.

                                  CMS
                                   |
                                   |
                       +-----------|-----------+
                       |           |           |
                       |     OVN/CMS Plugin    |
                       |           |           |
                       |           |           |
                       |   OVN Northbound DB   |
                       |           |           |
                       |           |           |
                       |       ovn-northd      |
                       |           |           |
                       +-----------|-----------+
                                   |
                                   |
                         +-------------------+
                         | OVN Southbound DB |
                         +-------------------+
                                   |
                                   |
                +------------------+------------------+
                |                  |                  |
 HV 1           |                  |    HV n          |
+---------------|---------------+  .  +---------------|---------------+
|               |               |  .  |               |               |
|        ovn-controller         |  .  |        ovn-controller         |
|         |            |        |  .  |         |            |        |
|         |            |        |     |         |            |        |
|  ovs-vswitchd   ovsdb-server  |     |  ovs-vswitchd   ovsdb-server  |
|                               |     |                               |
+-------------------------------+     +-------------------------------+

Information Flow in OVN
   Configuration data in OVN flows from north to south. The CMS, through
   its OVN/CMS plugin, passes the logical network configuration to
   ovn-northd via the northbound database. In turn, ovn-northd compiles the
   configuration into a lower-level form and passes it to all of the
   chassis via the southbound database.

   Status information in OVN flows from south to north. OVN currently
   provides only a few forms of status information. First, ovn-northd
   populates the up column in the northbound Logical_Switch_Port table: if
   a logical port’s chassis column in the southbound Port_Binding table is
   nonempty, it sets up to true, otherwise to false. This allows the CMS to
   detect when a VM’s networking has come up.

   Second, OVN provides feedback to the CMS on the realization of its
   configuration, that is, whether the configuration provided by the CMS
   has taken effect. This feature requires the CMS to participate in a
   sequence number protocol, which works the following way (a brief
   command-line sketch follows the list):

   1. When the CMS updates the configuration in the northbound database, as
      part of the same transaction, it increments the value of the nb_cfg
      column in the NB_Global table. (This is only necessary if the CMS
      wants to know when the configuration has been realized.)

   2. When ovn-northd updates the southbound database based on a given
      snapshot of the northbound database, it copies nb_cfg from the
      northbound NB_Global table into the southbound SB_Global table, as
      part of the same transaction. (Thus, an observer monitoring both
      databases can determine when the southbound database has caught up
      with the northbound.)

   3. After ovn-northd receives confirmation from the southbound database
      server that its changes have committed, it updates sb_cfg in the
      northbound NB_Global table to the nb_cfg version that was pushed
      down. (Thus, the CMS or another observer can determine when the
      southbound database has caught up without a connection to the
      southbound database.)

   4. The ovn-controller process on each chassis receives the updated
      southbound database, with the updated nb_cfg. This process in turn
      updates the physical flows installed in the chassis’s Open vSwitch
      instances. When it receives confirmation from Open vSwitch that the
      physical flows have been updated, it updates nb_cfg in its own
      Chassis record in the southbound database.

   5. ovn-northd monitors the nb_cfg column in all of the Chassis records
      in the southbound database. It keeps track of the minimum value among
      all the records and copies it into the hv_cfg column in the
      northbound NB_Global table. (Thus, the CMS or another observer can
      determine when all of the hypervisors have caught up to the
      northbound configuration.)

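   As a minimal sketch (assuming the stock ovn-nbctl client and treating
   the counter value 42 as illustrative), a CMS or another observer can
   drive and poll this protocol directly against the NB_Global table:

      # Bump nb_cfg as part of a configuration change.
      $ ovn-nbctl set NB_Global . nb_cfg=42

      # sb_cfg reaches 42 once ovn-northd has committed the corresponding
      # southbound changes; hv_cfg reaches 42 once every chassis has
      # installed the matching physical flows.
      $ ovn-nbctl get NB_Global . nb_cfg
      $ ovn-nbctl get NB_Global . sb_cfg
      $ ovn-nbctl get NB_Global . hv_cfg

   Recent ovn-nbctl versions can also block until realization with
   ovn-nbctl --wait=hv sync, where available.
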
Chassis Setup
   Each chassis in an OVN deployment must be configured with an Open
   vSwitch bridge dedicated for OVN’s use, called the integration bridge.
   System startup scripts may create this bridge prior to starting
   ovn-controller if desired. If this bridge does not exist when
   ovn-controller starts, it will be created automatically with the default
   configuration suggested below. The ports on the integration bridge
   include:

   ·  On any chassis, tunnel ports that OVN uses to maintain logical
      network connectivity. ovn-controller adds, updates, and removes these
      tunnel ports.

   ·  On a hypervisor, any VIFs that are to be attached to logical
      networks. The hypervisor itself, or the integration between Open
      vSwitch and the hypervisor (described in IntegrationGuide.rst), takes
      care of this. (This is not part of OVN or new to OVN; this is
      pre-existing integration work that has already been done on
      hypervisors that support OVS.)

   ·  On a gateway, the physical port used for logical network
      connectivity. System startup scripts add this port to the bridge
      prior to starting ovn-controller. This can be a patch port to another
      bridge, instead of a physical port, in more sophisticated setups.

   Other ports should not be attached to the integration bridge. In
   particular, physical ports attached to the underlay network (as opposed
   to gateway ports, which are physical ports attached to logical networks)
   must not be attached to the integration bridge. Underlay physical ports
   should instead be attached to a separate Open vSwitch bridge (in fact,
   they need not be attached to any bridge at all).

   The integration bridge should be configured as described below. The
   effect of each of these settings is documented in
   ovs-vswitchd.conf.db(5):

   fail-mode=secure
      Avoids switching packets between isolated logical networks before
      ovn-controller starts up. See Controller Failure Settings in
      ovs-vsctl(8) for more information.

   other-config:disable-in-band=true
      Suppresses in-band control flows for the integration bridge. It would
      be unusual for such flows to show up anyway, because OVN uses a local
      controller (over a Unix domain socket) instead of a remote
      controller. It’s possible, however, for some other bridge in the same
      system to have an in-band remote controller, and in that case this
      suppresses the flows that in-band control would ordinarily set up.
      Refer to In-Band Control in the Open vSwitch documentation for more
      information.

   The customary name for the integration bridge is br-int, but another
   name may be used.

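   For example, a minimal sketch of pre-creating the integration bridge by
   hand with the suggested settings (the bridge name is the customary one;
   adapt as needed):

      $ ovs-vsctl add-br br-int \
          -- set Bridge br-int fail-mode=secure \
                 other-config:disable-in-band=true

   ovn-controller applies an equivalent default configuration itself when
   the bridge is absent at startup, so this is only needed when startup
   scripts want to pre-create or customize the bridge.
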
Logical Networks
   A logical network implements the same concepts as a physical network,
   but it is insulated from the physical network by tunnels or other
   encapsulations. This allows logical networks to have separate IP and
   other address spaces that overlap, without conflicting, with those used
   for physical networks. Logical network topologies can be arranged
   without regard for the topologies of the physical networks on which they
   run.

   Logical network concepts in OVN include the following (a brief
   command-line sketch follows the list):

   ·  Logical switches, the logical version of Ethernet switches.

   ·  Logical routers, the logical version of IP routers. Logical switches
      and routers can be connected into sophisticated topologies.

   ·  Logical datapaths, the logical version of an OpenFlow switch. Logical
      switches and routers are both implemented as logical datapaths.

   ·  Logical ports, which represent the points of connectivity in and out
      of logical switches and logical routers. Some common types of logical
      ports are:

      ·  Logical ports representing VIFs.

      ·  Localnet ports, which represent the points of connectivity between
         logical switches and the physical network. They are implemented as
         OVS patch ports between the integration bridge and the separate
         Open vSwitch bridge that the underlay physical ports attach to.

      ·  Logical patch ports, which represent the points of connectivity
         between logical switches and logical routers, and in some cases
         between peer logical routers. There is a pair of logical patch
         ports at each such point of connectivity, one on each side.

      ·  Localport ports, which represent points of local connectivity
         between logical switches and VIFs. These ports are present on
         every chassis (not bound to any particular one) and traffic from
         them will never go through a tunnel. A localport is expected to
         only generate traffic destined for a local destination, typically
         in response to a request it received. One use case is how
         OpenStack Neutron uses a localport port to serve metadata to VMs
         residing on every hypervisor. A metadata proxy process is attached
         to this port on every host and all VMs within the same network
         will reach it at the same IP/MAC address without any traffic being
         sent over a tunnel. Further details can be found at
         https://docs.openstack.org/developer/networking-ovn/design/metadata_api.html.

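   As a minimal sketch (all names and addresses are illustrative, and
   assume the standard ovn-nbctl client), a CMS or administrator could
   create several of these objects directly:

      # A logical switch with a VIF port and a localnet port.
      $ ovn-nbctl ls-add ls0
      $ ovn-nbctl lsp-add ls0 vif0
      $ ovn-nbctl lsp-set-addresses vif0 "00:00:00:00:00:01 10.0.0.11"
      $ ovn-nbctl lsp-add ls0 ln0
      $ ovn-nbctl lsp-set-type ln0 localnet
      $ ovn-nbctl lsp-set-addresses ln0 unknown
      $ ovn-nbctl lsp-set-options ln0 network_name=physnet1

      # A logical router connected to the switch; the router port and the
      # switch port of type router become a pair of logical patch ports.
      $ ovn-nbctl lr-add lr0
      $ ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:00:ff 10.0.0.1/24
      $ ovn-nbctl lsp-add ls0 ls0-lr0
      $ ovn-nbctl lsp-set-type ls0-lr0 router
      $ ovn-nbctl lsp-set-addresses ls0-lr0 00:00:00:00:00:ff
      $ ovn-nbctl lsp-set-options ls0-lr0 router-port=lrp0
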
Life Cycle of a VIF
   Tables and their schemas presented in isolation are difficult to
   understand. Here’s an example.

   A VIF on a hypervisor is a virtual network interface attached either to
   a VM or to a container running directly on that hypervisor (this is
   different from the interface of a container running inside a VM).

   The steps in this example refer often to details of the OVN and OVN
   Northbound database schemas. Please see ovn-sb(5) and ovn-nb(5),
   respectively, for the full story on these databases. A condensed
   command-line sketch of steps 2 and 5 follows the list.

   1.  A VIF’s life cycle begins when a CMS administrator creates a new VIF
       using the CMS user interface or API and adds it to a switch (one
       implemented by OVN as a logical switch). The CMS updates its own
       configuration. This includes associating a unique, persistent
       identifier vif-id and an Ethernet address mac with the VIF.

   2.  The CMS plugin updates the OVN Northbound database to include the
       new VIF, by adding a row to the Logical_Switch_Port table. In the
       new row, name is vif-id, mac is mac, switch points to the OVN
       logical switch’s Logical_Switch record, and other columns are
       initialized appropriately.

   3.  ovn-northd receives the OVN Northbound database update. In turn, it
       makes the corresponding updates to the OVN Southbound database, by
       adding rows to the OVN Southbound database Logical_Flow table to
       reflect the new port, e.g. add a flow to recognize that packets
       destined to the new port’s MAC address should be delivered to it,
       and update the flow that delivers broadcast and multicast packets to
       include the new port. It also creates a record in the Binding table
       and populates all its columns except the column that identifies the
       chassis.

   4.  On every hypervisor, ovn-controller receives the Logical_Flow table
       updates that ovn-northd made in the previous step. As long as the VM
       that owns the VIF is powered off, ovn-controller cannot do much; it
       cannot, for example, arrange to send packets to or receive packets
       from the VIF, because the VIF does not actually exist anywhere.

   5.  Eventually, a user powers on the VM that owns the VIF. On the
       hypervisor where the VM is powered on, the integration between the
       hypervisor and Open vSwitch (described in IntegrationGuide.rst) adds
       the VIF to the OVN integration bridge and stores vif-id in
       external_ids:iface-id to indicate that the interface is an
       instantiation of the new VIF. (None of this code is new in OVN; this
       is pre-existing integration work that has already been done on
       hypervisors that support OVS.)

   6.  On the hypervisor where the VM is powered on, ovn-controller notices
       external_ids:iface-id in the new Interface. In response, in the OVN
       Southbound DB, it updates the Binding table’s chassis column for the
       row that links the logical port from external_ids:iface-id to the
       hypervisor. Afterward, ovn-controller updates the local hypervisor’s
       OpenFlow tables so that packets to and from the VIF are properly
       handled.

   7.  Some CMS systems, including OpenStack, fully start a VM only when
       its networking is ready. To support this, ovn-northd notices that
       the chassis column has been updated for the row in the Binding table
       and pushes this upward by updating the up column in the OVN
       Northbound database’s Logical_Switch_Port table to indicate that the
       VIF is now up. The CMS, if it uses this feature, can then react by
       allowing the VM’s execution to proceed.

   8.  On every hypervisor but the one where the VIF resides,
       ovn-controller notices the completely populated row in the Binding
       table. This provides ovn-controller the physical location of the
       logical port, so each instance updates the OpenFlow tables of its
       switch (based on logical datapath flows in the OVN DB Logical_Flow
       table) so that packets to and from the VIF can be properly handled
       via tunnels.

   9.  Eventually, a user powers off the VM that owns the VIF. On the
       hypervisor where the VM was powered off, the VIF is deleted from the
       OVN integration bridge.

   10. On the hypervisor where the VM was powered off, ovn-controller
       notices that the VIF was deleted. In response, it removes the
       Chassis column content in the Binding table for the logical port.

   11. On every hypervisor, ovn-controller notices the empty Chassis column
       in the Binding table’s row for the logical port. This means that
       ovn-controller no longer knows the physical location of the logical
       port, so each instance updates its OpenFlow table to reflect that.

   12. Eventually, when the VIF (or its entire VM) is no longer needed by
       anyone, an administrator deletes the VIF using the CMS user
       interface or API. The CMS updates its own configuration.

   13. The CMS plugin removes the VIF from the OVN Northbound database, by
       deleting its row in the Logical_Switch_Port table.

   14. ovn-northd receives the OVN Northbound update and in turn updates
       the OVN Southbound database accordingly, by removing or updating the
       rows from the OVN Southbound database Logical_Flow table and Binding
       table that were related to the now-destroyed VIF.

   15. On every hypervisor, ovn-controller receives the Logical_Flow table
       updates that ovn-northd made in the previous step. ovn-controller
       updates OpenFlow tables to reflect the update, although there may
       not be much to do, since the VIF had already become unreachable when
       it was removed from the Binding table in a previous step.

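   As a minimal sketch of steps 2 and 5 (assuming vif-id is vif1, the
   logical switch is ls0, and the local tap device is tap-vif1; all names
   are illustrative), the CMS plugin and the hypervisor integration would
   issue roughly:

      # Step 2: the CMS plugin adds the logical port to the northbound DB.
      $ ovn-nbctl lsp-add ls0 vif1
      $ ovn-nbctl lsp-set-addresses vif1 "00:00:00:00:00:01 10.0.0.11"

      # Step 5: the hypervisor integration plugs the VIF into the
      # integration bridge and records the vif-id.
      $ ovs-vsctl add-port br-int tap-vif1 \
          -- set Interface tap-vif1 external_ids:iface-id=vif1
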
Life Cycle of a Container Interface Inside a VM
   OVN provides virtual network abstractions by converting information
   written in the OVN_NB database to OpenFlow flows in each hypervisor.
   Secure virtual networking for multiple tenants can only be provided if
   OVN controller is the only entity that can modify flows in Open vSwitch.
   When the Open vSwitch integration bridge resides in the hypervisor, it
   is a fair assumption to make that tenant workloads running inside VMs
   cannot make any changes to Open vSwitch flows.

   If the infrastructure provider trusts the applications inside the
   containers not to break out and modify the Open vSwitch flows, then
   containers can be run in hypervisors. This is also the case when
   containers are run inside the VMs and the Open vSwitch integration
   bridge with flows added by OVN controller resides in the same VM. For
   both of the above cases, the workflow is the same as explained with an
   example in the previous section ("Life Cycle of a VIF").

   This section talks about the life cycle of a container interface (CIF)
   when containers are created in the VMs and the Open vSwitch integration
   bridge resides inside the hypervisor. In this case, even if a container
   application breaks out, other tenants are not affected because the
   containers running inside the VMs cannot modify the flows in the Open
   vSwitch integration bridge.

   When multiple containers are created inside a VM, there are multiple
   CIFs associated with them. The network traffic associated with these
   CIFs needs to reach the Open vSwitch integration bridge running in the
   hypervisor for OVN to support virtual network abstractions. OVN should
   also be able to distinguish network traffic coming from different CIFs.
   There are two ways to distinguish network traffic of CIFs.

   One way is to provide one VIF for every CIF (1:1 model). This means that
   there could be a lot of network devices in the hypervisor. This would
   slow down OVS because of all the additional CPU cycles needed for the
   management of all the VIFs. It would also mean that the entity creating
   the containers in a VM should also be able to create the corresponding
   VIFs in the hypervisor.

   The second way is to provide a single VIF for all the CIFs (1:many
   model). OVN could then distinguish network traffic coming from different
   CIFs via a tag written in every packet. OVN uses this mechanism and uses
   VLAN as the tagging mechanism. A brief command-line sketch follows the
   numbered steps below.

   1. A CIF’s life cycle begins when a container is spawned inside a VM by
      either the same CMS that created the VM, a tenant that owns that VM,
      or even a container orchestration system different from the CMS that
      initially created the VM. Whoever the entity is, it will need to know
      the vif-id associated with the network interface of the VM through
      which the container interface’s network traffic is expected to go.
      The entity that creates the container interface will also need to
      choose an unused VLAN inside that VM.

   2. The container spawning entity (either directly or through the CMS
      that manages the underlying infrastructure) updates the OVN
      Northbound database to include the new CIF, by adding a row to the
      Logical_Switch_Port table. In the new row, name is any unique
      identifier, parent_name is the vif-id of the VM through which the
      CIF’s network traffic is expected to go, and tag is the VLAN tag that
      identifies the network traffic of that CIF.

   3. ovn-northd receives the OVN Northbound database update. In turn, it
      makes the corresponding updates to the OVN Southbound database, by
      adding rows to the OVN Southbound database’s Logical_Flow table to
      reflect the new port and also by creating a new row in the Binding
      table and populating all its columns except the column that
      identifies the chassis.

   4. On every hypervisor, ovn-controller subscribes to the changes in the
      Binding table. When a new row is created by ovn-northd that includes
      a value in the parent_port column of the Binding table, the
      ovn-controller on the hypervisor whose OVN integration bridge has
      that same value in external_ids:iface-id updates the local
      hypervisor’s OpenFlow tables so that packets to and from the VIF with
      the particular VLAN tag are properly handled. Afterward it updates
      the chassis column of the Binding row to reflect the physical
      location.

   5. One can only start the application inside the container after the
      underlying network is ready. To support this, ovn-northd notices the
      updated chassis column in the Binding table and updates the up column
      in the OVN Northbound database’s Logical_Switch_Port table to
      indicate that the CIF is now up. The entity responsible for starting
      the container application queries this value and starts the
      application.

   6. Eventually the entity that created and started the container stops
      it. The entity, through the CMS (or directly), deletes its row in the
      Logical_Switch_Port table.

   7. ovn-northd receives the OVN Northbound update and in turn updates the
      OVN Southbound database accordingly, by removing or updating the rows
      from the OVN Southbound database Logical_Flow table that were related
      to the now-destroyed CIF. It also deletes the row in the Binding
      table for that CIF.

   8. On every hypervisor, ovn-controller receives the Logical_Flow table
      updates that ovn-northd made in the previous step. ovn-controller
      updates OpenFlow tables to reflect the update.

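   As a minimal sketch of step 2 (assuming the parent VM’s VIF is vif1, the
   CIF is named cif1, and VLAN 42 is unused inside the VM; all names are
   illustrative), the container spawning entity could run:

      # Add a child port whose traffic arrives on vif1 tagged with VLAN 42.
      $ ovn-nbctl lsp-add ls0 cif1 vif1 42
      $ ovn-nbctl lsp-set-addresses cif1 "00:00:00:00:00:02 10.0.0.12"
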
Architectural Physical Life Cycle of a Packet
   This section describes how a packet travels from one virtual machine or
   container to another through OVN. This description focuses on the
   physical treatment of a packet; for a description of the logical life
   cycle of a packet, please refer to the Logical_Flow table in ovn-sb(5).

   This section mentions several data and metadata fields, summarized here
   for clarity:

   tunnel key
      When OVN encapsulates a packet in Geneve or another tunnel, it
      attaches extra data to it to allow the receiving OVN instance to
      process it correctly. This takes different forms depending on the
      particular encapsulation, but in each case we refer to it here as the
      ``tunnel key.’’ See Tunnel Encapsulations, below, for details.

   logical datapath field
      A field that denotes the logical datapath through which a packet is
      being processed. OVN uses the field that OpenFlow 1.1+ simply (and
      confusingly) calls ``metadata’’ to store the logical datapath. (This
      field is passed across tunnels as part of the tunnel key.)

   logical input port field
      A field that denotes the logical port from which the packet entered
      the logical datapath. OVN stores this in Open vSwitch extension
      register number 14.

      Geneve and STT tunnels pass this field as part of the tunnel key.
      Although VXLAN tunnels do not explicitly carry a logical input port,
      OVN only uses VXLAN to communicate with gateways that from OVN’s
      perspective consist of only a single logical port, so that OVN can
      set the logical input port field to this one on ingress to the OVN
      logical pipeline.

   logical output port field
      A field that denotes the logical port from which the packet will
      leave the logical datapath. This is initialized to 0 at the beginning
      of the logical ingress pipeline. OVN stores this in Open vSwitch
      extension register number 15.

      Geneve and STT tunnels pass this field as part of the tunnel key.
      VXLAN tunnels do not transmit the logical output port field. Since a
      VXLAN tunnel key does not carry a logical output port, a packet
      received from a VXLAN tunnel by an OVN hypervisor is resubmitted to
      table 8 to determine the output port(s); when such a packet reaches
      table 32, it is resubmitted to table 33 for local delivery by
      checking the MLF_RCV_FROM_VXLAN flag, which is set when the packet
      arrives from a VXLAN tunnel.

   conntrack zone field for logical ports
      A field that denotes the connection tracking zone for logical ports.
      The value only has local significance and is not meaningful between
      chassis. This is initialized to 0 at the beginning of the logical
      ingress pipeline. OVN stores this in Open vSwitch extension register
      number 13.

   conntrack zone fields for routers
      Fields that denote the connection tracking zones for routers. These
      values only have local significance and are not meaningful between
      chassis. OVN stores the zone information for DNATting in Open vSwitch
      extension register number 11 and the zone information for SNATting in
      Open vSwitch extension register number 12.

   logical flow flags
      The logical flags are intended to handle keeping context between
      tables in order to decide which rules in subsequent tables are
      matched. These values only have local significance and are not
      meaningful between chassis. OVN stores the logical flags in Open
      vSwitch extension register number 10.

   VLAN ID
      The VLAN ID is used as an interface between OVN and containers nested
      inside a VM (see Life Cycle of a Container Interface Inside a VM,
      above, for more information).

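   These fields can be observed on a running chassis. As a minimal sketch
   (register and table numbers are those described above; exact output
   formats vary between versions):

      # Logical flows as ovn-northd wrote them to the southbound database.
      $ ovn-sbctl lflow-list

      # The OpenFlow flows ovn-controller generated from them; the metadata
      # field and registers reg10 through reg15 carry the logical datapath,
      # flags, conntrack zones, and logical input/output ports.
      $ ovs-ofctl -O OpenFlow13 dump-flows br-int

      # ovn-trace simulates a packet’s trip through the logical pipelines.
      $ ovn-trace ls0 'inport == "vif1" && eth.src == 00:00:00:00:00:01 &&
                       eth.dst == 00:00:00:00:00:02'
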
   Initially, a VM or container on the ingress hypervisor sends a packet on
   a port attached to the OVN integration bridge. Then:

   1. OpenFlow table 0 performs physical-to-logical translation. It matches
      the packet’s ingress port. Its actions annotate the packet with
      logical metadata, by setting the logical datapath field to identify
      the logical datapath that the packet is traversing and the logical
      input port field to identify the ingress port. Then it resubmits to
      table 8 to enter the logical ingress pipeline.

      Packets that originate from a container nested within a VM are
      treated in a slightly different way. The originating container can be
      distinguished based on the VIF-specific VLAN ID, so the
      physical-to-logical translation flows additionally match on VLAN ID
      and the actions strip the VLAN header. Following this step, OVN
      treats packets from containers just like any other packets.

      Table 0 also processes packets that arrive from other chassis. It
      distinguishes them from other packets by ingress port, which is a
      tunnel. As with packets just entering the OVN pipeline, the actions
      annotate these packets with logical datapath and logical ingress port
      metadata. In addition, the actions set the logical output port field,
      which is available because in OVN tunneling occurs after the logical
      output port is known. These three pieces of information are obtained
      from the tunnel encapsulation metadata (see Tunnel Encapsulations for
      encoding details). Then the actions resubmit to table 33 to enter the
      logical egress pipeline.

   2. OpenFlow tables 8 through 31 execute the logical ingress pipeline
      from the Logical_Flow table in the OVN Southbound database. These
      tables are expressed entirely in terms of logical concepts like
      logical ports and logical datapaths. A big part of ovn-controller’s
      job is to translate them into equivalent OpenFlow (in particular it
      translates the table numbers: Logical_Flow tables 0 through 23 become
      OpenFlow tables 8 through 31).

      Each logical flow maps to one or more OpenFlow flows. An actual
      packet ordinarily matches only one of these, although in some cases
      it can match more than one of these flows (which is not a problem
      because all of them have the same actions). ovn-controller uses the
      first 32 bits of the logical flow’s UUID as the cookie for its
      OpenFlow flow or flows. (This is not necessarily unique, since the
      first 32 bits of a logical flow’s UUID are not necessarily unique. A
      sketch of relating OpenFlow flows back to logical flows by cookie
      follows this list.)

      Some logical flows can map to the Open vSwitch ``conjunctive match’’
      extension (see ovs-fields(7)). Flows with a conjunction action use an
      OpenFlow cookie of 0, because they can correspond to multiple logical
      flows. The OpenFlow flow for a conjunctive match includes a match on
      conj_id.

      Some logical flows may not be represented in the OpenFlow tables on a
      given hypervisor, if they could not be used on that hypervisor. For
      example, if no VIF in a logical switch resides on a given hypervisor,
      and the logical switch is not otherwise reachable on that hypervisor
      (e.g. over a series of hops through logical switches and routers
      starting from a VIF on the hypervisor), then the logical flow may not
      be represented there.

      Most OVN actions have fairly obvious implementations in OpenFlow
      (with OVS extensions), e.g. next; is implemented as resubmit, field =
      constant; as set_field. A few are worth describing in more detail:

      output:
         Implemented by resubmitting the packet to table 32. If the
         pipeline executes more than one output action, then each one is
         separately resubmitted to table 32. This can be used to send
         multiple copies of the packet to multiple ports. (If the packet
         was not modified between the output actions, and some of the
         copies are destined to the same hypervisor, then using a logical
         multicast output port would save bandwidth between hypervisors.)

      get_arp(P, A);
      get_nd(P, A);
         Implemented by storing arguments into OpenFlow fields, then
         resubmitting to table 66, which ovn-controller populates with
         flows generated from the MAC_Binding table in the OVN Southbound
         database. If there is a match in table 66, then its actions store
         the bound MAC in the Ethernet destination address field.

         (The OpenFlow actions save and restore the OpenFlow fields used
         for the arguments, so that the OVN actions do not have to be aware
         of this temporary use.)

      put_arp(P, A, E);
      put_nd(P, A, E);
         Implemented by storing the arguments into OpenFlow fields, then
         outputting a packet to ovn-controller, which updates the
         MAC_Binding table.

         (The OpenFlow actions save and restore the OpenFlow fields used
         for the arguments, so that the OVN actions do not have to be aware
         of this temporary use.)

   3. OpenFlow tables 32 through 47 implement the output action in the
      logical ingress pipeline. Specifically, table 32 handles packets to
      remote hypervisors, table 33 handles packets to the local hypervisor,
      and table 34 checks whether packets whose logical ingress and egress
      port are the same should be discarded.

      Logical patch ports are a special case. Logical patch ports do not
      have a physical location and effectively reside on every hypervisor.
      Thus, flow table 33, for output to ports on the local hypervisor,
      naturally implements output to unicast logical patch ports too.
      However, applying the same logic to a logical patch port that is part
      of a logical multicast group yields packet duplication, because each
      hypervisor that contains a logical port in the multicast group will
      also output the packet to the logical patch port. Thus, multicast
      groups implement output to logical patch ports in table 32.

      Each flow in table 32 matches on a logical output port for unicast or
      multicast logical ports that include a logical port on a remote
      hypervisor. Each flow’s actions implement sending a packet to the
      port it matches. For unicast logical output ports on remote
      hypervisors, the actions set the tunnel key to the correct value,
      then send the packet on the tunnel port to the correct hypervisor.
      (When the remote hypervisor receives the packet, table 0 there will
      recognize it as a tunneled packet and pass it along to table 33.) For
      multicast logical output ports, the actions send one copy of the
      packet to each remote hypervisor, in the same way as for unicast
      destinations. If a multicast group includes a logical port or ports
      on the local hypervisor, then its actions also resubmit to table 33.
      Table 32 also includes:

      ·  A higher-priority rule to match packets received from VXLAN
         tunnels, based on the MLF_RCV_FROM_VXLAN flag, and resubmit these
         packets to table 33 for local delivery. Packets received from
         VXLAN tunnels reach here because of the lack of a logical output
         port field in the tunnel key, and thus these packets needed to be
         submitted to table 8 to determine the output port.

      ·  A higher-priority rule to match packets received from ports of
         type localport, based on the logical input port, and resubmit
         these packets to table 33 for local delivery. Ports of type
         localport exist on every hypervisor and by definition their
         traffic should never go out through a tunnel.

      ·  A higher-priority rule to match packets that have the
         MLF_LOCAL_ONLY logical flow flag set, and whose destination is a
         multicast address. This flag indicates that the packet should not
         be delivered to remote hypervisors, even if the multicast
         destination includes ports on remote hypervisors. This flag is
         used when ovn-controller is the originator of the multicast
         packet. Since each ovn-controller instance is originating these
         packets, the packets only need to be delivered to local ports.

      ·  A fallback flow that resubmits to table 33 if there is no other
         match.

      Flows in table 33 resemble those in table 32 but for logical ports
      that reside locally rather than remotely. For unicast logical output
      ports on the local hypervisor, the actions just resubmit to table 34.
      For multicast output ports that include one or more logical ports on
      the local hypervisor, for each such logical port P, the actions
      change the logical output port to P, then resubmit to table 34.

      A special case is that when a localnet port exists on the datapath, a
      remote port is reached by switching through the localnet port. In
      this case, instead of adding a flow in table 32 to reach the remote
      port, a flow is added in table 33 to switch the logical output port
      to the localnet port and resubmit to table 33, as if the packet were
      unicast to a logical port on the local hypervisor.

      Table 34 matches and drops packets for which the logical input and
      output ports are the same and the MLF_ALLOW_LOOPBACK flag is not set.
      It resubmits other packets to table 40.

   4. OpenFlow tables 40 through 63 execute the logical egress pipeline
      from the Logical_Flow table in the OVN Southbound database. The
      egress pipeline can perform a final stage of validation before packet
      delivery. Eventually, it may execute an output action, which
      ovn-controller implements by resubmitting to table 64. A packet for
      which the pipeline never executes output is effectively dropped
      (although it may have been transmitted through a tunnel across a
      physical network).

      The egress pipeline cannot change the logical output port or cause
      further tunneling.

   5. Table 64 bypasses OpenFlow loopback when MLF_ALLOW_LOOPBACK is set.
      Logical loopback was handled in table 34, but OpenFlow by default
      also prevents loopback to the OpenFlow ingress port. Thus, when
      MLF_ALLOW_LOOPBACK is set, OpenFlow table 64 saves the OpenFlow
      ingress port, sets it to zero, resubmits to table 65 for
      logical-to-physical transformation, and then restores the OpenFlow
      ingress port, effectively disabling OpenFlow loopback prevention.
      When MLF_ALLOW_LOOPBACK is unset, the table 64 flow simply resubmits
      to table 65.

   6. OpenFlow table 65 performs logical-to-physical translation, the
      opposite of table 0. It matches the packet’s logical egress port. Its
      actions output the packet to the port attached to the OVN integration
      bridge that represents that logical port. If the logical egress port
      is a container nested within a VM, then before sending the packet the
      actions push on a VLAN header with an appropriate VLAN ID.

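   As a minimal sketch of relating these OpenFlow tables back to logical
   flows (the cookie value shown is illustrative), the cookie printed by
   ovs-ofctl is the first 32 bits of the logical flow’s UUID:

      # Pick a logical flow and note the first 32 bits of its UUID.
      $ ovn-sbctl lflow-list ls0

      # Show the OpenFlow flows generated for logical ingress table 0,
      # which ovn-controller installs as OpenFlow table 8.
      $ ovs-ofctl -O OpenFlow13 dump-flows br-int table=8

      # Show only the flows generated from one logical flow, by cookie.
      $ ovs-ofctl -O OpenFlow13 dump-flows br-int cookie=0x12345678/-1
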
Logical Routers and Logical Patch Ports
   Typically logical routers and logical patch ports do not have a physical
   location and effectively reside on every hypervisor. This is the case
   for logical patch ports between logical routers and the logical switches
   behind those logical routers, to which VMs (and VIFs) attach.

   Consider a packet sent from one virtual machine or container to another
   VM or container that resides on a different subnet. The packet will
   traverse tables 0 to 65 as described in the previous section,
   Architectural Physical Life Cycle of a Packet, using the logical
   datapath representing the logical switch that the sender is attached to.
   At table 32, the packet will use the fallback flow that resubmits
   locally to table 33 on the same hypervisor. In this case, all of the
   processing from table 0 to table 65 occurs on the hypervisor where the
   sender resides.

   When the packet reaches table 65, the logical egress port is a logical
   patch port. The implementation in table 65 differs depending on the OVS
   version, although the observed behavior is meant to be the same:

   ·  In OVS versions 2.6 and earlier, table 65 outputs to an OVS patch
      port that represents the logical patch port. The packet re-enters the
      OpenFlow flow table from the OVS patch port’s peer in table 0, which
      identifies the logical datapath and logical input port based on the
      OVS patch port’s OpenFlow port number.

   ·  In OVS versions 2.7 and later, the packet is cloned and resubmitted
      directly to the first OpenFlow flow table in the ingress pipeline,
      setting the logical ingress port to the peer logical patch port, and
      using the peer logical patch port’s logical datapath (that represents
      the logical router).

   The packet re-enters the ingress pipeline in order to traverse tables 8
   to 65 again, this time using the logical datapath representing the
   logical router. The processing continues as described in the previous
   section, Architectural Physical Life Cycle of a Packet. When the packet
   reaches table 65, the logical egress port will once again be a logical
   patch port. In the same manner as described above, this logical patch
   port will cause the packet to be resubmitted to OpenFlow tables 8 to 65,
   this time using the logical datapath representing the logical switch
   that the destination VM or container is attached to.

   The packet traverses tables 8 to 65 a third and final time. If the
   destination VM or container resides on a remote hypervisor, then table
   32 will send the packet on a tunnel port from the sender’s hypervisor to
   the remote hypervisor. Finally table 65 will output the packet directly
   to the destination VM or container.

   The following sections describe two exceptions, where logical routers
   and/or logical patch ports are associated with a physical location.

Gateway Routers

   A gateway router is a logical router that is bound to a physical
   location. This includes all of the logical patch ports of the logical
   router, as well as all of the peer logical patch ports on logical
   switches. In the OVN Southbound database, the Port_Binding entries for
   these logical patch ports use the type l3gateway rather than patch, in
   order to distinguish that these logical patch ports are bound to a
   chassis.

   When a hypervisor processes a packet on a logical datapath representing
   a logical switch, and the logical egress port is an l3gateway port
   representing connectivity to a gateway router, the packet will match a
   flow in table 32 that sends the packet on a tunnel port to the chassis
   where the gateway router resides. This processing in table 32 is done in
   the same manner as for VIFs.

   Gateway routers are typically used in between distributed logical
   routers and physical networks. The distributed logical router and the
   logical switches behind it, to which VMs and containers attach,
   effectively reside on each hypervisor. The distributed router and the
   gateway router are connected by another logical switch, sometimes
   referred to as a join logical switch. On the other side, the gateway
   router connects to another logical switch that has a localnet port
   connecting to the physical network.

   When using gateway routers, DNAT and SNAT rules are associated with the
   gateway router, which provides a central location that can handle
   one-to-many SNAT (aka IP masquerading).

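   As a minimal sketch (the chassis name and addresses are illustrative), a
   gateway router is created by pinning a logical router to a chassis and
   attaching NAT rules to it:

      # Bind the logical router lr-gw to the chassis named gw1.
      $ ovn-nbctl lr-add lr-gw
      $ ovn-nbctl set Logical_Router lr-gw options:chassis=gw1

      # One-to-many SNAT for the tenant subnet behind the router.
      $ ovn-nbctl lr-nat-add lr-gw snat 192.0.2.10 10.0.0.0/24
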
Distributed Gateway Ports

   Distributed gateway ports are logical router patch ports that directly
   connect distributed logical routers to logical switches with localnet
   ports.

   The primary design goal of distributed gateway ports is to allow as much
   traffic as possible to be handled locally on the hypervisor where a VM
   or container resides. Whenever possible, packets from the VM or
   container to the outside world should be processed completely on that
   VM’s or container’s hypervisor, eventually traversing a localnet port
   instance on that hypervisor to the physical network. Whenever possible,
   packets from the outside world to a VM or container should be directed
   through the physical network directly to the VM’s or container’s
   hypervisor, where the packet will enter the integration bridge through a
   localnet port.

   In order to allow for the distributed processing of packets described in
   the paragraph above, distributed gateway ports need to be logical patch
   ports that effectively reside on every hypervisor, rather than l3gateway
   ports that are bound to a particular chassis. However, the flows
   associated with distributed gateway ports often need to be associated
   with physical locations, for the following reasons:

   ·  The physical network that the localnet port is attached to typically
      uses L2 learning. Any Ethernet address used over the distributed
      gateway port must be restricted to a single physical location so that
      upstream L2 learning is not confused. Traffic sent out the
      distributed gateway port towards the localnet port with a specific
      Ethernet address must be sent out one specific instance of the
      distributed gateway port on one specific chassis. Traffic received
      from the localnet port (or from a VIF on the same logical switch as
      the localnet port) with a specific Ethernet address must be directed
      to the logical switch’s patch port instance on that specific chassis.

      Due to the implications of L2 learning, the Ethernet address and IP
      address of the distributed gateway port need to be restricted to a
      single physical location. For this reason, the user must specify one
      chassis associated with the distributed gateway port. Note that
      traffic traversing the distributed gateway port using other Ethernet
      addresses and IP addresses (e.g. one-to-one NAT) is not restricted to
      this chassis.

      Replies to ARP and ND requests must be restricted to a single
      physical location, where the Ethernet address in the reply resides.
      This includes ARP and ND replies for the IP address of the
      distributed gateway port, which are restricted to the chassis that
      the user associated with the distributed gateway port.

   ·  In order to support one-to-many SNAT (aka IP masquerading), where
      multiple logical IP addresses spread across multiple chassis are
      mapped to a single external IP address, it will be necessary to
      handle some of the logical router processing on a specific chassis in
      a centralized manner. Since the SNAT external IP address is typically
      the distributed gateway port IP address, and for simplicity, the same
      chassis associated with the distributed gateway port is used.

   The details of flow restrictions to specific chassis are described in
   the ovn-northd documentation.

   While most of the physical location dependent aspects of distributed
   gateway ports can be handled by restricting some flows to specific
   chassis, one additional mechanism is required. When a packet leaves the
   ingress pipeline and the logical egress port is the distributed gateway
   port, one of two different sets of actions is required at table 32:

   ·  If the packet can be handled locally on the sender’s hypervisor (e.g.
      one-to-one NAT traffic), then the packet should just be resubmitted
      locally to table 33, in the normal manner for distributed logical
      patch ports.

   ·  However, if the packet needs to be handled on the chassis associated
      with the distributed gateway port (e.g. one-to-many SNAT traffic or
      non-NAT traffic), then table 32 must send the packet on a tunnel port
      to that chassis.

   In order to trigger the second set of actions, the chassisredirect type
   of southbound Port_Binding has been added. Setting the logical egress
   port to the type chassisredirect logical port is simply a way to
   indicate that although the packet is destined for the distributed
   gateway port, it needs to be redirected to a different chassis. At table
   32, packets with this logical egress port are sent to a specific
   chassis, in the same way that table 32 directs packets whose logical
   egress port is a VIF or a type l3gateway port to different chassis. Once
   the packet arrives at that chassis, table 33 resets the logical egress
   port to the value representing the distributed gateway port. For each
   distributed gateway port, there is one type chassisredirect port, in
   addition to the distributed logical patch port representing the
   distributed gateway port.

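   As a minimal sketch (port, chassis, and address names are illustrative),
   the chassisredirect port that accompanies a distributed gateway port can
   be observed in the southbound database:

      # A distributed gateway port is a router port with a gateway chassis.
      $ ovn-nbctl lrp-add lr0 lrp-ext 00:00:00:00:10:01 192.0.2.1/24
      $ ovn-nbctl lrp-set-gateway-chassis lrp-ext gw1 20

      # ovn-northd creates a companion southbound port binding of type
      # chassisredirect for the port.
      $ ovn-sbctl find Port_Binding type=chassisredirect
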
High Availability for Distributed Gateway Ports

   OVN allows you to specify a prioritized list of chassis for a
   distributed gateway port. This is done by associating multiple
   Gateway_Chassis rows with a Logical_Router_Port in the OVN_Northbound
   database.

   When multiple chassis have been specified for a gateway, all chassis
   that may send packets to that gateway will enable BFD on tunnels to all
   configured gateway chassis. The current master chassis for the gateway
   is the highest priority gateway chassis that is currently viewed as
   active based on BFD status.

   For more information on L3 gateway high availability, please refer to
   http://docs.openvswitch.org/en/latest/topics/high-availability.

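   As a minimal sketch (chassis names and priorities are illustrative),
   assigning several gateway chassis with different priorities creates the
   corresponding Gateway_Chassis rows; the highest-priority chassis that
   BFD considers up becomes the master:

      $ ovn-nbctl lrp-set-gateway-chassis lrp-ext gw1 20
      $ ovn-nbctl lrp-set-gateway-chassis lrp-ext gw2 10

      # Inspect the resulting prioritized list and the BFD sessions.
      $ ovn-nbctl lrp-get-gateway-chassis lrp-ext
      $ ovs-appctl bfd/show
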
Multiple localnet logical switches connected to a Logical Router
   It is possible to have multiple logical switches, each with a localnet
   port (representing physical networks), connected to a logical router, in
   which one localnet logical switch may provide the external connectivity
   via a distributed gateway port and the rest of the localnet logical
   switches use VLAN tagging in the physical network. It is expected that
   ovn-bridge-mappings is configured appropriately on the chassis for all
   these localnet networks.

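   As a minimal sketch (bridge and network names are illustrative), each
   chassis maps the physical network names to provider bridges, and each
   VLAN tagged physical network gets its own localnet logical switch:

      # On every chassis: map physical network names to provider bridges.
      $ ovs-vsctl set Open_vSwitch . \
          external-ids:ovn-bridge-mappings=physnet1:br-provider,physvlan:br-vlan

      # A localnet logical switch for one of the VLAN tagged physical
      # networks. (The VLAN tag itself is configured on the localnet port;
      # see ovn-nb(5) for the relevant column.)
      $ ovn-nbctl ls-add ls-vlan100
      $ ovn-nbctl lsp-add ls-vlan100 ln-vlan100
      $ ovn-nbctl lsp-set-type ln-vlan100 localnet
      $ ovn-nbctl lsp-set-addresses ln-vlan100 unknown
      $ ovn-nbctl lsp-set-options ln-vlan100 network_name=physvlan
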
East West routing

   East-West routing between these localnet VLAN tagged logical switches
   works almost the same way as for normal logical switches. When the VM
   sends such a packet, then:

   1. It first enters the ingress pipeline, and then the egress pipeline of
      the source localnet logical switch datapath. It then enters the
      ingress pipeline of the logical router datapath via the logical
      router port in the source chassis.

   2. A routing decision is taken.

   3. From the router datapath, the packet enters the ingress pipeline and
      then the egress pipeline of the destination localnet logical switch
      datapath and goes out of the integration bridge to the provider
      bridge (belonging to the destination logical switch) via the localnet
      port.

   4. The destination chassis receives the packet via the localnet port and
      sends it to the integration bridge. The packet enters the ingress
      pipeline and then the egress pipeline of the destination localnet
      logical switch and finally gets delivered to the destination VM port.

External traffic

   The following happens when a VM sends external traffic (which requires
   NATting) and the chassis hosting the VM doesn’t have a distributed
   gateway port.

   1. The packet first enters the ingress pipeline, and then the egress
      pipeline of the source localnet logical switch datapath. It then
      enters the ingress pipeline of the logical router datapath via the
      logical router port in the source chassis.

   2. A routing decision is taken. Since the gateway router or the
      distributed gateway port doesn’t reside in the source chassis, the
      traffic is redirected to the gateway chassis via the tunnel port.

   3. The gateway chassis receives the packet via the tunnel port and the
      packet enters the egress pipeline of the logical router datapath. NAT
      rules are applied here. The packet then enters the ingress pipeline
      and then the egress pipeline of the localnet logical switch datapath
      which provides external connectivity, and finally goes out via the
      localnet port of the logical switch which provides external
      connectivity.

   Although this works, the VM traffic is tunnelled when sent from the
   compute chassis to the gateway chassis. In order for it to work
   properly, the MTU of the localnet logical switches must be lowered to
   account for the tunnel encapsulation.

Centralized routing for localnet VLAN tagged logical switches connected
to a Logical Router

   To overcome the tunnel encapsulation problem described in the previous
   section, OVN supports the option of enabling centralized routing for
   localnet VLAN tagged logical switches. The CMS can configure the option
   options:reside-on-redirect-chassis to true for each Logical_Router_Port
   which connects to the localnet VLAN tagged logical switches. This causes
   the gateway chassis (hosting the distributed gateway port) to handle all
   the routing for these networks, making it centralized. It will reply to
   the ARP requests for the logical router port IPs.

   If the logical router doesn’t have a distributed gateway port connecting
   to the localnet logical switch which provides external connectivity,
   then this option is ignored by OVN.

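   As a minimal sketch (the port name is illustrative), the option is set
   on the router port that faces a VLAN tagged localnet logical switch:

      $ ovn-nbctl set Logical_Router_Port lrp-vlan100 \
          options:reside-on-redirect-chassis=true
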
1115 The following happens when a VM sends an east-west traffic which needs
1116 to be routed:
1117
1118 1. The packet first enters the ingress pipeline, and then
1119 egress pipeline of the source localnet logical switch data‐
1120 path and is sent out via the localnet port of the source
1121 localnet logical switch (instead of sending it to router
1122 pipeline).
1123
1124 2. The gateway chassis receives the packet via the localnet
1125 port of the source localnet logical switch and sends it to
1126 the integration bridge. The packet then enters the ingress
1127 pipeline, and then egress pipeline of the source localnet
1128 logical switch datapath and enters the ingress pipeline of
1129 the logical router datapath.
1130
1131 3. Routing decision is taken.
1132
1133 4. From the router datapath, packet enters the ingress pipeline
1134 and then egress pipeline of the destination localnet logical
1135 switch datapath. It then goes out of the integration bridge
1136 to the provider bridge ( belonging to the destination logi‐
1137 cal switch) via the localnet port.
1138
1139 5. The destination chassis receives the packet via the localnet
1140 port and sends it to the integration bridge. The packet
1141 enters the ingress pipeline and then egress pipeline of the
1142 destination localnet logical switch and finally delivered to
1143 the destination VM port.
1144
1145 The following happens when a VM sends an external traffic which
1146 requires NATting:
1147
1148 1. The packet first enters the ingress pipeline, and then
1149 egress pipeline of the source localnet logical switch data‐
1150 path and is sent out via the localnet port of the source
1151 localnet logical switch (instead of sending it to router
1152 pipeline).
1153
1154 2. The gateway chassis receives the packet via the localnet
1155 port of the source localnet logical switch and sends it to
1156 the integration bridge. The packet then enters the ingress
1157 pipeline, and then egress pipeline of the source localnet
1158 logical switch datapath and enters the ingress pipeline of
1159 the logical router datapath.
1160
1161              3. A routing decision is made and NAT rules are applied.
1162
1163              4. From the router datapath, the packet enters the ingress
1164                 pipeline and then egress pipeline of the localnet logical
1165                 switch datapath which provides external connectivity. It then
1166                 goes out of the integration bridge to the provider bridge
1167                 (belonging to the logical switch which provides external
1168                 connectivity) via the localnet port.
1169
1170       The following happens for the reverse external traffic:
1171
1172 1. The gateway chassis receives the packet from the localnet
1173 port of the logical switch which provides external connec‐
1174 tivity. The packet then enters the ingress pipeline and then
1175 egress pipeline of the localnet logical switch (which pro‐
1176 vides external connectivity). The packet then enters the
1177 ingress pipeline of the logical router datapath.
1178
1179 2. The ingress pipeline of the logical router datapath applies
1180 the unNATting rules. The packet then enters the ingress
1181 pipeline and then egress pipeline of the source localnet
1182 logical switch. Since the source VM doesn’t reside in the
1183 gateway chassis, the packet is sent out via the localnet
1184 port of the source logical switch.
1185
1186 3. The source chassis receives the packet via the localnet port
1187 and sends it to the integration bridge. The packet enters
1188 the ingress pipeline and then egress pipeline of the source
1189 localnet logical switch and finally gets delivered to the
1190 source VM port.
1191
1192 Life Cycle of a VTEP gateway
1193 A gateway is a chassis that forwards traffic between the OVN-managed
1194 part of a logical network and a physical VLAN, extending a tunnel-based
1195 logical network into a physical network.
1196
1197 The steps below refer often to details of the OVN and VTEP database
1198 schemas. Please see ovn-sb(5), ovn-nb(5) and vtep(5), respectively, for
1199 the full story on these databases.
1200
1201 1. A VTEP gateway’s life cycle begins with the administrator
1202 registering the VTEP gateway as a Physical_Switch table
1203 entry in the VTEP database. The ovn-controller-vtep con‐
1204                 nected to this VTEP database will recognize the new VTEP
1205 gateway and create a new Chassis table entry for it in the
1206 OVN_Southbound database.
1207
1208              2. The administrator can then create a new Logical_Switch table
1209                 entry, and bind a particular VLAN on a VTEP gateway's port
1210                 to any VTEP logical switch. Once a VTEP logical switch is
1211                 bound to a VTEP gateway, the ovn-controller-vtep will detect
1212                 it and add its name to the vtep_logical_switches column of
1213                 the Chassis table in the OVN_Southbound database. Note that
1214                 the tunnel_key column of the VTEP logical switch is not
1215                 filled at creation. The ovn-controller-vtep will set the
1216                 column when the corresponding VTEP logical switch is bound
1217                 to an OVN logical network.
1218
1219              3. Now, the administrator can use the CMS to add a VTEP logical
1220                 switch to the OVN logical network. To do that, the CMS must
1221                 first create a new Logical_Switch_Port table entry in the
1222                 OVN_Northbound database. Then, the type column of this entry
1223                 must be set to "vtep". Next, the vtep-logical-switch and
1224                 vtep-physical-switch keys in the options column must also be
1225                 specified, since multiple VTEP gateways can attach to the same
1226                 VTEP logical switch (see the example after these steps).
1227
1228 4. The newly created logical port in the OVN_Northbound data‐
1229 base and its configuration will be passed down to the
1230 OVN_Southbound database as a new Port_Binding table entry.
1231 The ovn-controller-vtep will recognize the change and bind
1232 the logical port to the corresponding VTEP gateway chassis.
1233                 Configuration of binding the same VTEP logical switch to
1234 different OVN logical networks is not allowed and a warning
1235 will be generated in the log.
1236
1237 5. Beside binding to the VTEP gateway chassis, the ovn-con‐
1238 troller-vtep will update the tunnel_key column of the VTEP
1239 logical switch to the corresponding Datapath_Binding table
1240 entry’s tunnel_key for the bound OVN logical network.
1241
1242 6. Next, the ovn-controller-vtep will keep reacting to the con‐
1243                 figuration change in the Port_Binding in the OVN_Southbound
1244 database, and updating the Ucast_Macs_Remote table in the
1245 VTEP database. This allows the VTEP gateway to understand
1246 where to forward the unicast traffic coming from the
1247 extended external network.
1248
1249 7. Eventually, the VTEP gateway’s life cycle ends when the
1250 administrator unregisters the VTEP gateway from the VTEP
1251 database. The ovn-controller-vtep will recognize the event
1252 and remove all related configurations (Chassis table entry
1253 and port bindings) in the OVN_Southbound database.
1254
1255 8. When the ovn-controller-vtep is terminated, all related con‐
1256 figurations in the OVN_Southbound database and the VTEP
1257 database will be cleaned, including Chassis table entries
1258 for all registered VTEP gateways and their port bindings,
1259 and all Ucast_Macs_Remote table entries and the Logi‐
1260 cal_Switch tunnel keys.
1261
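       As a concrete illustration of steps 2 and 3 above, the bindings
       might be created along the following lines (all names such as ps1,
       swp1, ls0, sw0 and sw0-vtep are hypothetical; see vtep-ctl(8) and
       ovn-nbctl(8) for the exact command syntax):

              # On the VTEP database: create a VTEP logical switch and
              # bind VLAN 100 on port swp1 of physical switch ps1 to it.
              vtep-ctl add-ls ls0
              vtep-ctl bind-ls ps1 swp1 100 ls0

              # On the OVN Northbound database: attach the VTEP logical
              # switch to OVN logical switch sw0 through a "vtep" port.
              ovn-nbctl lsp-add sw0 sw0-vtep
              ovn-nbctl lsp-set-type sw0-vtep vtep
              ovn-nbctl lsp-set-options sw0-vtep vtep-physical-switch=ps1 \
                  vtep-logical-switch=ls0
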
1262 Native OVN services for external logical ports
1263       To provide OVN native services (like DHCP/IPv6 RA/DNS lookup) to
1264       cloud resources which are external, OVN supports external logical
1265       ports.
1266
1267 Below are some of the use cases where external ports can be used.
1268
1269              · VMs connected to SR-IOV NICs - Traffic from these VMs
1270                bypasses the kernel stack, so the local ovn-controller does
1271                not bind these ports and cannot serve the native services.
1272
1273              · When the CMS supports provisioning baremetal servers.
1274
1275       OVN will provide the native services if the CMS has done the following
1276       configuration in the OVN Northbound Database (an example follows the list).
1277
1278 · A row is created in Logical_Switch_Port, configuring the
1279 addresses column and setting the type to external.
1280
1281              · The ha_chassis_group column is configured.
1282
1283 · The HA chassis which belongs to the HA chassis group has
1284 the ovn-bridge-mappings configured and has proper L2 con‐
1285 nectivity so that it can receive the DHCP and other
1286 related request packets from these external resources.
1287
1288 · The Logical_Switch of this port has a localnet port.
1289
1290              · Native OVN services are enabled by configuring the DHCP
1291                and other options in the same way as it is done for normal
1292                logical ports.
1293
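       A minimal sketch of this configuration, using hypothetical names
       (sw0, sw0-ext, hagrp1, gw1, gw2) and the ovn-nbctl HA chassis
       group commands, might look like the following:

              # Create an HA chassis group and add the gateway chassis to
              # it with priorities (the highest priority chassis is active).
              ovn-nbctl ha-chassis-group-add hagrp1
              ovn-nbctl ha-chassis-group-add-chassis hagrp1 gw1 30
              ovn-nbctl ha-chassis-group-add-chassis hagrp1 gw2 20

              # Create the external port and point it at the group.
              ovn-nbctl lsp-add sw0 sw0-ext
              ovn-nbctl lsp-set-type sw0-ext external
              ovn-nbctl lsp-set-addresses sw0-ext "aa:bb:cc:dd:ee:01 10.0.0.10"
              hagrp=$(ovn-nbctl --bare --columns=_uuid find \
                  HA_Chassis_Group name=hagrp1)
              ovn-nbctl set Logical_Switch_Port sw0-ext ha_chassis_group=$hagrp
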
1294       It is recommended to use the same HA chassis group for all the external
1295       ports of a logical switch. Otherwise, the physical switch might see MAC
1296       flapping when different chassis provide the native services. For
1297       example, when supporting the native DHCPv4 service, the DHCPv4 server
1298       MAC (configured in the options:server_mac column of the DHCP_Options
1299       table) originating from different ports can cause MAC flapping. The MAC
1300       of the logical router IP(s) can also flap if the same HA chassis group
1301       is not set for all the external ports of a logical switch.
1302
1304   Role-Based Access Controls for the Southbound DB
1305 In order to provide additional security against the possibility of an
1306 OVN chassis becoming compromised in such a way as to allow rogue soft‐
1307 ware to make arbitrary modifications to the southbound database state
1308 and thus disrupt the OVN network, role-based access controls (see
1309 ovsdb-server(1) for additional details) are provided for the southbound
1310 database.
1311
1312       The implementation of role-based access controls (RBAC) requires the
1313       addition of two tables to an OVSDB schema: the RBAC_Role table, which
1314       is indexed by role name and maps the names of the various tables that
1315       may be modifiable for a given role to individual rows in a permissions
1316       table containing detailed permission information for that role, and
1317       the permissions table itself, which consists of rows containing the
1318       following information:
1319
1320 Table Name
1321 The name of the associated table. This column exists pri‐
1322 marily as an aid for humans reading the contents of this
1323 table.
1324
1325 Auth Criteria
1326 A set of strings containing the names of columns (or col‐
1327 umn:key pairs for columns containing string:string maps).
1328 The contents of at least one of the columns or column:key
1329 values in a row to be modified, inserted, or deleted must
1330 be equal to the ID of the client attempting to act on the
1331 row in order for the authorization check to pass. If the
1332 authorization criteria is empty, authorization checking
1333 is disabled and all clients for the role will be treated
1334 as authorized.
1335
1336 Insert/Delete
1337 Row insertion/deletion permission; boolean value indicat‐
1338 ing whether insertion and deletion of rows is allowed for
1339 the associated table. If true, insertion and deletion of
1340 rows is allowed for authorized clients.
1341
1342 Updatable Columns
1343 A set of strings containing the names of columns or col‐
1344 umn:key pairs that may be updated or mutated by autho‐
1345 rized clients. Modifications to columns within a row are
1346 only permitted when the authorization check for the
1347 client passes and all columns to be modified are included
1348 in this set of modifiable columns.
1349
1350       RBAC configuration for the OVN southbound database is maintained by
1351       ovn-northd. With RBAC enabled, modifications are only permitted for the
1352       Chassis, Encap, Port_Binding, and MAC_Binding tables, and are
1353       restricted as follows (an inspection example follows these descriptions):
1354
1355 Chassis
1356 Authorization: client ID must match the chassis name.
1357
1358 Insert/Delete: authorized row insertion and deletion are
1359 permitted.
1360
1361 Update: The columns nb_cfg, external_ids, encaps, and
1362 vtep_logical_switches may be modified when authorized.
1363
1364 Encap Authorization: client ID must match the chassis name.
1365
1366 Insert/Delete: row insertion and row deletion are permit‐
1367 ted.
1368
1369 Update: The columns type, options, and ip can be modi‐
1370 fied.
1371
1372 Port_Binding
1373                 Authorization: disabled (all clients are considered
1374                 authorized). A future enhancement may add columns (or keys
1375                 to external_ids) in order to control which chassis are
1376                 allowed to bind each port.
1377
1378 Insert/Delete: row insertion/deletion are not permitted
1379                 (ovn-northd maintains rows in this table).
1380
1381 Update: Only modifications to the chassis column are per‐
1382 mitted.
1383
1384 MAC_Binding
1385 Authorization: disabled (all clients are considered to be
1386 authorized).
1387
1388 Insert/Delete: row insertion/deletion are permitted.
1389
1390 Update: The columns logical_port, ip, mac, and datapath
1391 may be modified by ovn-controller.
1392
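       Since these are ordinary OVSDB tables, one convenient way to
       inspect the RBAC configuration that ovn-northd maintains is the
       generic ovn-sbctl database commands, for example:

              ovn-sbctl list RBAC_Role
              ovn-sbctl list RBAC_Permission
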
1393 Enabling RBAC for ovn-controller connections to the southbound database
1394 requires the following steps:
1395
1396 1. Creating SSL certificates for each chassis with the certifi‐
1397 cate CN field set to the chassis name (e.g. for a chassis
1398 with external-ids:system-id=chassis-1, via the command
1399 "ovs-pki -u req+sign chassis-1 switch").
1400
1401 2. Configuring each ovn-controller to use SSL when connecting
1402 to the southbound database (e.g. via "ovs-vsctl set open .
1403 external-ids:ovn-remote=ssl:x.x.x.x:6642").
1404
1405 3. Configuring a southbound database SSL remote with "ovn-con‐
1406 troller" role (e.g. via "ovn-sbctl set-connection
1407 role=ovn-controller pssl:6642").
1408
1409 Encrypt Tunnel Traffic with IPsec
1410       OVN tunnel traffic goes through physical routers and switches. These
1411       physical devices could be untrusted (devices in a public network) or
1412       might be compromised. Encrypting the tunnel traffic can prevent the
1413       traffic data from being monitored and manipulated.
1414
1415 The tunnel traffic is encrypted with IPsec. The CMS sets the ipsec col‐
1416       umn in the northbound NB_Global table to enable or disable IPsec encryp‐
1417 tion. If ipsec is true, all OVN tunnels will be encrypted. If ipsec is
1418 false, no OVN tunnels will be encrypted.
1419
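       For example, a CMS (or an administrator experimenting by hand)
       could enable encryption of all OVN tunnels with:

              ovn-nbctl set NB_Global . ipsec=true
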
1420       When the CMS updates the ipsec column in the northbound NB_Global table,
1421 ovn-northd copies the value to the ipsec column in the southbound
1422 SB_Global table. ovn-controller in each chassis monitors the southbound
1423 database and sets the options of the OVS tunnel interface accordingly.
1424 OVS tunnel interface options are monitored by the ovs-monitor-ipsec
1425       daemon, which configures the IKE daemon to set up IPsec connections.
1426
1427       Chassis authenticate each other using certificates. The authentication
1428       succeeds if the other end of the tunnel presents a certificate signed
1429       by a trusted CA and the common name (CN) matches the expected chassis
1430       name. The SSL certificates used in role-based access controls (RBAC)
1431       can also be used for IPsec, or ovs-pki can be used to create separate
1432       certificates. The certificate is required to be x.509 version 3, with
1433       the CN field and subjectAltName field set to the chassis name.
1434
1435 The CA certificate, chassis certificate and private key are required to
1436 be installed in each chassis before enabling IPsec. Please see
1437 ovs-vswitchd.conf.db(5) for setting up CA based IPsec authentication.
1438
1440 Tunnel Encapsulations
1441 OVN annotates logical network packets that it sends from one hypervisor
1442 to another with the following three pieces of metadata, which are
1443 encoded in an encapsulation-specific fashion:
1444
1445 · 24-bit logical datapath identifier, from the tunnel_key
1446 column in the OVN Southbound Datapath_Binding table.
1447
1448 · 15-bit logical ingress port identifier. ID 0 is reserved
1449 for internal use within OVN. IDs 1 through 32767, inclu‐
1450 sive, may be assigned to logical ports (see the tun‐
1451 nel_key column in the OVN Southbound Port_Binding table).
1452
1453 · 16-bit logical egress port identifier. IDs 0 through
1454 32767 have the same meaning as for logical ingress ports.
1455 IDs 32768 through 65535, inclusive, may be assigned to
1456 logical multicast groups (see the tunnel_key column in
1457 the OVN Southbound Multicast_Group table).
1458
1459 For hypervisor-to-hypervisor traffic, OVN supports only Geneve and STT
1460 encapsulations, for the following reasons:
1461
1462 · Only STT and Geneve support the large amounts of metadata
1463 (over 32 bits per packet) that OVN uses (as described
1464 above).
1465
1466 · STT and Geneve use randomized UDP or TCP source ports
1467                 that allow efficient distribution among multiple paths
1468 in environments that use ECMP in their underlay.
1469
1470 · NICs are available to offload STT and Geneve encapsula‐
1471 tion and decapsulation.
1472
1473 Due to its flexibility, the preferred encapsulation between hypervisors
1474 is Geneve. For Geneve encapsulation, OVN transmits the logical datapath
1475 identifier in the Geneve VNI. OVN transmits the logical ingress and
1476 logical egress ports in a TLV with class 0x0102, type 0x80, and a
1477 32-bit value encoded as follows, from MSB to LSB:
1478
1479 1 15 16
1480 +---+------------+-----------+
1481 |rsv|ingress port|egress port|
1482 +---+------------+-----------+
1483 0
1484
1485
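       For instance, a hypothetical ingress port with tunnel_key 5 and an
       egress port with tunnel_key 9 are packed into the 32-bit option
       value 0x00050009, as the following shell arithmetic shows:

              printf '0x%08x\n' $(( (5 << 16) | 9 ))     # 0x00050009
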
1486 Environments whose NICs lack Geneve offload may prefer STT encapsula‐
1487 tion for performance reasons. For STT encapsulation, OVN encodes all
1488 three pieces of logical metadata in the STT 64-bit tunnel ID as fol‐
1489 lows, from MSB to LSB:
1490
1491 9 15 16 24
1492 +--------+------------+-----------+--------+
1493 |reserved|ingress port|egress port|datapath|
1494 +--------+------------+-----------+--------+
1495 0
1496
1497
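       Continuing the same hypothetical example with a datapath tunnel_key
       of 11, the 64-bit STT tunnel ID works out to 0x000005000900000b:

              printf '0x%016x\n' $(( (5 << 40) | (9 << 24) | 11 ))
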
1498 For connecting to gateways, in addition to Geneve and STT, OVN supports
1499 VXLAN, because only VXLAN support is common on top-of-rack (ToR)
1500 switches. Currently, gateways have a feature set that matches the capa‐
1501 bilities as defined by the VTEP schema, so fewer bits of metadata are
1502 necessary. In the future, gateways that do not support encapsulations
1503 with large amounts of metadata may continue to have a reduced feature
1504 set.
1505
1506
1507
1508Open vSwitch 2.11.1 OVN Architecture ovn-architecture(7)