ovn-architecture(7)              Open vSwitch Manual             ovn-architecture(7)
2
3
4
NAME
       ovn-architecture - Open Virtual Network architecture
7
DESCRIPTION
       OVN, the Open Virtual Network, is a system to support virtual network
10 abstraction. OVN complements the existing capabilities of OVS to add
11 native support for virtual network abstractions, such as virtual L2 and
12 L3 overlays and security groups. Services such as DHCP are also desir‐
13 able features. Just like OVS, OVN’s design goal is to have a produc‐
14 tion-quality implementation that can operate at significant scale.
15
16 An OVN deployment consists of several components:
17
18 · A Cloud Management System (CMS), which is OVN’s ultimate
19 client (via its users and administrators). OVN integra‐
20 tion requires installing a CMS-specific plugin and
21 related software (see below). OVN initially targets Open‐
22 Stack as CMS.
23
24 We generally speak of ``the’’ CMS, but one can imagine
25 scenarios in which multiple CMSes manage different parts
26 of an OVN deployment.
27
28 · An OVN Database physical or virtual node (or, eventually,
29 cluster) installed in a central location.
30
31 · One or more (usually many) hypervisors. Hypervisors must
32 run Open vSwitch and implement the interface described in
33 IntegrationGuide.rst in the OVS source tree. Any hypervi‐
34 sor platform supported by Open vSwitch is acceptable.
35
36 · Zero or more gateways. A gateway extends a tunnel-based
37 logical network into a physical network by bidirection‐
38 ally forwarding packets between tunnels and a physical
39 Ethernet port. This allows non-virtualized machines to
40 participate in logical networks. A gateway may be a phys‐
41 ical host, a virtual machine, or an ASIC-based hardware
42 switch that supports the vtep(5) schema.
43
              Hypervisors and gateways are together called transport
              nodes or chassis.
46
47 The diagram below shows how the major components of OVN and related
48 software interact. Starting at the top of the diagram, we have:
49
50 · The Cloud Management System, as defined above.
51
52 · The OVN/CMS Plugin is the component of the CMS that
53 interfaces to OVN. In OpenStack, this is a Neutron plug‐
54 in. The plugin’s main purpose is to translate the CMS’s
55 notion of logical network configuration, stored in the
56 CMS’s configuration database in a CMS-specific format,
57 into an intermediate representation understood by OVN.
58
59 This component is necessarily CMS-specific, so a new
60 plugin needs to be developed for each CMS that is inte‐
61 grated with OVN. All of the components below this one in
62 the diagram are CMS-independent.
63
64 · The OVN Northbound Database receives the intermediate
65 representation of logical network configuration passed
66 down by the OVN/CMS Plugin. The database schema is meant
67 to be ``impedance matched’’ with the concepts used in a
68 CMS, so that it directly supports notions of logical
69 switches, routers, ACLs, and so on. See ovn-nb(5) for
70 details.
71
72 The OVN Northbound Database has only two clients: the
73 OVN/CMS Plugin above it and ovn-northd below it.
74
75 · ovn-northd(8) connects to the OVN Northbound Database
76 above it and the OVN Southbound Database below it. It
77 translates the logical network configuration in terms of
78 conventional network concepts, taken from the OVN North‐
79 bound Database, into logical datapath flows in the OVN
80 Southbound Database below it.
81
82 · The OVN Southbound Database is the center of the system.
83 Its clients are ovn-northd(8) above it and ovn-con‐
84 troller(8) on every transport node below it.
85
86 The OVN Southbound Database contains three kinds of data:
87 Physical Network (PN) tables that specify how to reach
88 hypervisor and other nodes, Logical Network (LN) tables
89 that describe the logical network in terms of ``logical
90 datapath flows,’’ and Binding tables that link logical
91 network components’ locations to the physical network.
92 The hypervisors populate the PN and Port_Binding tables,
93 whereas ovn-northd(8) populates the LN tables.
94
95 OVN Southbound Database performance must scale with the
96 number of transport nodes. This will likely require some
97 work on ovsdb-server(1) as we encounter bottlenecks.
98 Clustering for availability may be needed.
99
100 The remaining components are replicated onto each hypervisor:
101
102 · ovn-controller(8) is OVN’s agent on each hypervisor and
103 software gateway. Northbound, it connects to the OVN
104 Southbound Database to learn about OVN configuration and
105 status and to populate the PN table and the Chassis col‐
              umn in the Binding table with the hypervisor’s status. South‐
107 bound, it connects to ovs-vswitchd(8) as an OpenFlow con‐
108 troller, for control over network traffic, and to the
109 local ovsdb-server(1) to allow it to monitor and control
110 Open vSwitch configuration.
111
112 · ovs-vswitchd(8) and ovsdb-server(1) are conventional com‐
113 ponents of Open vSwitch.
114
115 CMS
116 |
117 |
118 +-----------|-----------+
119 | | |
120 | OVN/CMS Plugin |
121 | | |
122 | | |
123 | OVN Northbound DB |
124 | | |
125 | | |
126 | ovn-northd |
127 | | |
128 +-----------|-----------+
129 |
130 |
131 +-------------------+
132 | OVN Southbound DB |
133 +-------------------+
134 |
135 |
136 +------------------+------------------+
137 | | |
138 HV 1 | | HV n |
139 +---------------|---------------+ . +---------------|---------------+
140 | | | . | | |
141 | ovn-controller | . | ovn-controller |
142 | | | | . | | | |
143 | | | | | | | |
144 | ovs-vswitchd ovsdb-server | | ovs-vswitchd ovsdb-server |
145 | | | |
146 +-------------------------------+ +-------------------------------+
147
148
149 Information Flow in OVN
150 Configuration data in OVN flows from north to south. The CMS, through
151 its OVN/CMS plugin, passes the logical network configuration to
152 ovn-northd via the northbound database. In turn, ovn-northd compiles
153 the configuration into a lower-level form and passes it to all of the
154 chassis via the southbound database.
155
156 Status information in OVN flows from south to north. OVN currently pro‐
157 vides only a few forms of status information. First, ovn-northd popu‐
158 lates the up column in the northbound Logical_Switch_Port table: if a
159 logical port’s chassis column in the southbound Port_Binding table is
160 nonempty, it sets up to true, otherwise to false. This allows the CMS
161 to detect when a VM’s networking has come up.
162
163 Second, OVN provides feedback to the CMS on the realization of its con‐
164 figuration, that is, whether the configuration provided by the CMS has
165 taken effect. This feature requires the CMS to participate in a
166 sequence number protocol, which works the following way:
167
168 1. When the CMS updates the configuration in the northbound
169 database, as part of the same transaction, it increments the
170 value of the nb_cfg column in the NB_Global table. (This is
171 only necessary if the CMS wants to know when the configura‐
172 tion has been realized.)
173
174 2. When ovn-northd updates the southbound database based on a
175 given snapshot of the northbound database, it copies nb_cfg
176 from northbound NB_Global into the southbound database
177 SB_Global table, as part of the same transaction. (Thus, an
178 observer monitoring both databases can determine when the
179 southbound database is caught up with the northbound.)
180
181 3. After ovn-northd receives confirmation from the southbound
182 database server that its changes have committed, it updates
183 sb_cfg in the northbound NB_Global table to the nb_cfg ver‐
184 sion that was pushed down. (Thus, the CMS or another
185 observer can determine when the southbound database is
186 caught up without a connection to the southbound database.)
187
188 4. The ovn-controller process on each chassis receives the
189 updated southbound database, with the updated nb_cfg. This
190 process in turn updates the physical flows installed in the
191 chassis’s Open vSwitch instances. When it receives confirma‐
192 tion from Open vSwitch that the physical flows have been
193 updated, it updates nb_cfg in its own Chassis record in the
194 southbound database.
195
196 5. ovn-northd monitors the nb_cfg column in all of the Chassis
197 records in the southbound database. It keeps track of the
198 minimum value among all the records and copies it into the
199 hv_cfg column in the northbound NB_Global table. (Thus, the
200 CMS or another observer can determine when all of the hyper‐
201 visors have caught up to the northbound configuration.)
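
       As an illustration of this protocol, a CMS or an administrator
       with access to both databases could check progress from the
       command line using the generic database commands of ovn-nbctl(8)
       and ovn-sbctl(8). This is only a sketch; the exact commands and
       options available depend on the installed version:

              # Block until all hypervisors have caught up with the
              # current northbound contents (hv_cfg >= nb_cfg).
              ovn-nbctl --wait=hv sync

              # Or inspect the sequence numbers directly.
              ovn-nbctl get NB_Global . nb_cfg     # written by the CMS
              ovn-nbctl get NB_Global . sb_cfg     # southbound caught up
              ovn-nbctl get NB_Global . hv_cfg     # hypervisors caught up
              ovn-sbctl --columns=hostname,nb_cfg list Chassis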
202
203 Chassis Setup
204 Each chassis in an OVN deployment must be configured with an Open
205 vSwitch bridge dedicated for OVN’s use, called the integration bridge.
206 System startup scripts may create this bridge prior to starting
207 ovn-controller if desired. If this bridge does not exist when ovn-con‐
208 troller starts, it will be created automatically with the default con‐
209 figuration suggested below. The ports on the integration bridge
210 include:
211
212 · On any chassis, tunnel ports that OVN uses to maintain
213 logical network connectivity. ovn-controller adds,
214 updates, and removes these tunnel ports.
215
216 · On a hypervisor, any VIFs that are to be attached to log‐
217 ical networks. The hypervisor itself, or the integration
218 between Open vSwitch and the hypervisor (described in
219 IntegrationGuide.rst) takes care of this. (This is not
220 part of OVN or new to OVN; this is pre-existing integra‐
221 tion work that has already been done on hypervisors that
222 support OVS.)
223
224 · On a gateway, the physical port used for logical network
225 connectivity. System startup scripts add this port to the
226 bridge prior to starting ovn-controller. This can be a
227 patch port to another bridge, instead of a physical port,
228 in more sophisticated setups.
229
230 Other ports should not be attached to the integration bridge. In par‐
231 ticular, physical ports attached to the underlay network (as opposed to
232 gateway ports, which are physical ports attached to logical networks)
233 must not be attached to the integration bridge. Underlay physical ports
234 should instead be attached to a separate Open vSwitch bridge (they need
235 not be attached to any bridge at all, in fact).
236
237 The integration bridge should be configured as described below. The
238 effect of each of these settings is documented in
239 ovs-vswitchd.conf.db(5):
240
241 fail-mode=secure
242 Avoids switching packets between isolated logical net‐
243 works before ovn-controller starts up. See Controller
244 Failure Settings in ovs-vsctl(8) for more information.
245
246 other-config:disable-in-band=true
247 Suppresses in-band control flows for the integration
248 bridge. It would be unusual for such flows to show up
249 anyway, because OVN uses a local controller (over a Unix
250 domain socket) instead of a remote controller. It’s pos‐
251 sible, however, for some other bridge in the same system
252 to have an in-band remote controller, and in that case
253 this suppresses the flows that in-band control would
              ordinarily set up. See the In-Band Control section of the
              Open vSwitch documentation for more information.
256
257 The customary name for the integration bridge is br-int, but another
258 name may be used.
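
       For example, a startup script might create and configure the
       bridge along the following lines before starting ovn-controller.
       This is only a sketch: the bridge name, the southbound database
       address, and the encapsulation settings are placeholders, and the
       external-ids keys shown are consumed by ovn-controller(8):

              ovs-vsctl add-br br-int
              ovs-vsctl set-fail-mode br-int secure
              ovs-vsctl set bridge br-int other-config:disable-in-band=true

              # Typical ovn-controller bootstrap settings.
              ovs-vsctl set open . external-ids:ovn-remote="tcp:192.0.2.10:6642"
              ovs-vsctl set open . external-ids:ovn-encap-type=geneve
              ovs-vsctl set open . external-ids:ovn-encap-ip=192.0.2.11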
259
260 Logical Networks
261 A logical network implements the same concepts as physical networks,
262 but they are insulated from the physical network with tunnels or other
263 encapsulations. This allows logical networks to have separate IP and
264 other address spaces that overlap, without conflicting, with those used
265 for physical networks. Logical network topologies can be arranged with‐
266 out regard for the topologies of the physical networks on which they
267 run.
268
269 Logical network concepts in OVN include:
270
271 · Logical switches, the logical version of Ethernet
272 switches.
273
274 · Logical routers, the logical version of IP routers. Logi‐
275 cal switches and routers can be connected into sophisti‐
276 cated topologies.
277
278 · Logical datapaths are the logical version of an OpenFlow
279 switch. Logical switches and routers are both implemented
280 as logical datapaths.
281
282 · Logical ports represent the points of connectivity in and
283 out of logical switches and logical routers. Some common
284 types of logical ports are:
285
286 · Logical ports representing VIFs.
287
288 · Localnet ports represent the points of connectiv‐
289 ity between logical switches and the physical net‐
290 work. They are implemented as OVS patch ports
291 between the integration bridge and the separate
292 Open vSwitch bridge that underlay physical ports
293 attach to.
294
295 · Logical patch ports represent the points of con‐
296 nectivity between logical switches and logical
297 routers, and in some cases between peer logical
298 routers. There is a pair of logical patch ports at
299 each such point of connectivity, one on each side.
300
301 · Localport ports represent the points of local con‐
302 nectivity between logical switches and VIFs. These
303 ports are present in every chassis (not bound to
304 any particular one) and traffic from them will
305 never go through a tunnel. A localport is expected
306 to only generate traffic destined for a local des‐
307 tination, typically in response to a request it
308 received. One use case is how OpenStack Neutron
309 uses a localport port for serving metadata to VM’s
                   uses a localport port for serving metadata to VMs
                   residing on every hypervisor. A metadata proxy
                   process is attached to this port on every host, and
                   all VMs within the same network will reach it at
314 sent over a tunnel. Further details can be seen at
315 https://docs.openstack.org/developer/networking-
316 ovn/design/metadata_api.html.
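
       For illustration, a CMS plugin (or an administrator using
       ovn-nbctl(8) directly) might create a logical switch with a VIF
       port and a localnet port along these lines. The names, addresses,
       and the physical network name physnet1 are examples only:

              ovn-nbctl ls-add sw0
              ovn-nbctl lsp-add sw0 sw0-vif1
              ovn-nbctl lsp-set-addresses sw0-vif1 "00:00:00:00:00:01 10.0.0.11"

              # A localnet port connecting sw0 to a physical network.
              ovn-nbctl lsp-add sw0 sw0-physnet
              ovn-nbctl lsp-set-type sw0-physnet localnet
              ovn-nbctl lsp-set-addresses sw0-physnet unknown
              ovn-nbctl lsp-set-options sw0-physnet network_name=physnet1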
317
318 Life Cycle of a VIF
319 Tables and their schemas presented in isolation are difficult to under‐
320 stand. Here’s an example.
321
322 A VIF on a hypervisor is a virtual network interface attached either to
       a VM or a container running directly on that hypervisor (this is
       different from the interface of a container running inside a VM).
325
326 The steps in this example refer often to details of the OVN and OVN
327 Northbound database schemas. Please see ovn-sb(5) and ovn-nb(5),
328 respectively, for the full story on these databases.
329
330 1. A VIF’s life cycle begins when a CMS administrator creates a
331 new VIF using the CMS user interface or API and adds it to a
332 switch (one implemented by OVN as a logical switch). The CMS
              updates its own configuration. This includes associating a
              unique, persistent identifier vif-id and an Ethernet
              address mac with the VIF.
336
          2.  The CMS plugin updates the OVN Northbound database to
              include the new VIF, by adding a row to the
              Logical_Switch_Port table, as sketched after this list. In
              the new row, name is vif-id, mac is mac, switch points to
              the OVN logical switch’s Logical_Switch record, and other
              columns are initialized appropriately.
343
344 3. ovn-northd receives the OVN Northbound database update. In
345 turn, it makes the corresponding updates to the OVN South‐
346 bound database, by adding rows to the OVN Southbound data‐
347 base Logical_Flow table to reflect the new port, e.g. add a
348 flow to recognize that packets destined to the new port’s
349 MAC address should be delivered to it, and update the flow
350 that delivers broadcast and multicast packets to include the
351 new port. It also creates a record in the Binding table and
352 populates all its columns except the column that identifies
353 the chassis.
354
355 4. On every hypervisor, ovn-controller receives the Logi‐
356 cal_Flow table updates that ovn-northd made in the previous
357 step. As long as the VM that owns the VIF is powered off,
358 ovn-controller cannot do much; it cannot, for example,
359 arrange to send packets to or receive packets from the VIF,
360 because the VIF does not actually exist anywhere.
361
362 5. Eventually, a user powers on the VM that owns the VIF. On
363 the hypervisor where the VM is powered on, the integration
364 between the hypervisor and Open vSwitch (described in Inte‐
365 grationGuide.rst) adds the VIF to the OVN integration bridge
366 and stores vif-id in external_ids:iface-id to indicate that
367 the interface is an instantiation of the new VIF. (None of
368 this code is new in OVN; this is pre-existing integration
369 work that has already been done on hypervisors that support
370 OVS.)
371
372 6. On the hypervisor where the VM is powered on, ovn-controller
373 notices external_ids:iface-id in the new Interface. In
374 response, in the OVN Southbound DB, it updates the Binding
375 table’s chassis column for the row that links the logical
              port from external_ids:iface-id to the hypervisor. After‐
377 ward, ovn-controller updates the local hypervisor’s OpenFlow
378 tables so that packets to and from the VIF are properly han‐
379 dled.
380
381 7. Some CMS systems, including OpenStack, fully start a VM only
382 when its networking is ready. To support this, ovn-northd
              notices the updated chassis column for the row in the Binding
              table and pushes this upward by updating the up column in
385 the OVN Northbound database’s Logical_Switch_Port table to
386 indicate that the VIF is now up. The CMS, if it uses this
387 feature, can then react by allowing the VM’s execution to
388 proceed.
389
390 8. On every hypervisor but the one where the VIF resides,
391 ovn-controller notices the completely populated row in the
392 Binding table. This provides ovn-controller the physical
393 location of the logical port, so each instance updates the
394 OpenFlow tables of its switch (based on logical datapath
395 flows in the OVN DB Logical_Flow table) so that packets to
396 and from the VIF can be properly handled via tunnels.
397
398 9. Eventually, a user powers off the VM that owns the VIF. On
399 the hypervisor where the VM was powered off, the VIF is
400 deleted from the OVN integration bridge.
401
402 10. On the hypervisor where the VM was powered off, ovn-con‐
403 troller notices that the VIF was deleted. In response, it
404 removes the Chassis column content in the Binding table for
405 the logical port.
406
407 11. On every hypervisor, ovn-controller notices the empty Chas‐
408 sis column in the Binding table’s row for the logical port.
409 This means that ovn-controller no longer knows the physical
410 location of the logical port, so each instance updates its
411 OpenFlow table to reflect that.
412
413 12. Eventually, when the VIF (or its entire VM) is no longer
414 needed by anyone, an administrator deletes the VIF using the
415 CMS user interface or API. The CMS updates its own configu‐
416 ration.
417
418 13. The CMS plugin removes the VIF from the OVN Northbound data‐
419 base, by deleting its row in the Logical_Switch_Port table.
420
421 14. ovn-northd receives the OVN Northbound update and in turn
422 updates the OVN Southbound database accordingly, by removing
423 or updating the rows from the OVN Southbound database Logi‐
424 cal_Flow table and Binding table that were related to the
425 now-destroyed VIF.
426
427 15. On every hypervisor, ovn-controller receives the Logi‐
428 cal_Flow table updates that ovn-northd made in the previous
429 step. ovn-controller updates OpenFlow tables to reflect the
430 update, although there may not be much to do, since the VIF
431 had already become unreachable when it was removed from the
432 Binding table in a previous step.
433
434 Life Cycle of a Container Interface Inside a VM
435 OVN provides virtual network abstractions by converting information
       written in the OVN_NB database into OpenFlow flows on each
       hypervisor. Secure virtual networking for multiple tenants can only
       be provided if ovn-controller is the only entity that can modify
       flows in Open vSwitch. When
439 the Open vSwitch integration bridge resides in the hypervisor, it is a
440 fair assumption to make that tenant workloads running inside VMs cannot
441 make any changes to Open vSwitch flows.
442
443 If the infrastructure provider trusts the applications inside the con‐
444 tainers not to break out and modify the Open vSwitch flows, then con‐
445 tainers can be run in hypervisors. This is also the case when contain‐
       ers are run inside the VMs and the Open vSwitch integration bridge
       with flows added by ovn-controller resides in the same VM. For both the
448 above cases, the workflow is the same as explained with an example in
449 the previous section ("Life Cycle of a VIF").
450
451 This section talks about the life cycle of a container interface (CIF)
452 when containers are created in the VMs and the Open vSwitch integration
453 bridge resides inside the hypervisor. In this case, even if a container
454 application breaks out, other tenants are not affected because the con‐
455 tainers running inside the VMs cannot modify the flows in the Open
456 vSwitch integration bridge.
457
458 When multiple containers are created inside a VM, there are multiple
459 CIFs associated with them. The network traffic associated with these
       CIFs needs to reach the Open vSwitch integration bridge running in the
461 hypervisor for OVN to support virtual network abstractions. OVN should
462 also be able to distinguish network traffic coming from different CIFs.
463 There are two ways to distinguish network traffic of CIFs.
464
465 One way is to provide one VIF for every CIF (1:1 model). This means
466 that there could be a lot of network devices in the hypervisor. This
467 would slow down OVS because of all the additional CPU cycles needed for
468 the management of all the VIFs. It would also mean that the entity cre‐
469 ating the containers in a VM should also be able to create the corre‐
470 sponding VIFs in the hypervisor.
471
472 The second way is to provide a single VIF for all the CIFs (1:many
473 model). OVN could then distinguish network traffic coming from differ‐
474 ent CIFs via a tag written in every packet. OVN uses this mechanism and
475 uses VLAN as the tagging mechanism.
476
          1.  A CIF’s life cycle begins when a container is spawned
              inside a VM by either the same CMS that created the VM, a
              tenant that owns that VM, or even a container
              orchestration system that is different from the CMS that
              initially created the VM. Whichever entity it is, it will
              need to know the vif-id associated with the network
              interface of the VM through which the container
              interface’s network traffic is expected to go. The entity
              that creates the container interface will also need to
              choose an unused VLAN inside that VM.
487
          2.  The container spawning entity (either directly or through
              the CMS that manages the underlying infrastructure)
              updates the OVN Northbound database to include the new
              CIF, by adding a row to the Logical_Switch_Port table, as
              sketched after this list. In the new row, name is any
              unique identifier, parent_name is the vif-id of the VM
              through which the CIF’s network traffic is expected to
              go, and tag is the VLAN tag that identifies the network
              traffic of that CIF.
496
497 3. ovn-northd receives the OVN Northbound database update. In
498 turn, it makes the corresponding updates to the OVN South‐
499 bound database, by adding rows to the OVN Southbound data‐
500 base’s Logical_Flow table to reflect the new port and also
501 by creating a new row in the Binding table and populating
502 all its columns except the column that identifies the chas‐
503 sis.
504
505 4. On every hypervisor, ovn-controller subscribes to the
506 changes in the Binding table. When a new row is created by
507 ovn-northd that includes a value in parent_port column of
508 Binding table, the ovn-controller in the hypervisor whose
509 OVN integration bridge has that same value in vif-id in
510 external_ids:iface-id updates the local hypervisor’s Open‐
511 Flow tables so that packets to and from the VIF with the
512 particular VLAN tag are properly handled. Afterward it
513 updates the chassis column of the Binding to reflect the
514 physical location.
515
516 5. One can only start the application inside the container
517 after the underlying network is ready. To support this,
              ovn-northd notices the updated chassis column in the
              Binding table and updates the up column in the OVN
              Northbound database’s Logical_Switch_Port table to
              indicate that the CIF is now up. The entity responsible
              for starting the container application queries this value
              and starts the application.
523
          6.  Eventually, the entity that created and started the
              container stops it. The entity, through the CMS (or
              directly), deletes its row in the Logical_Switch_Port
              table.
527
528 7. ovn-northd receives the OVN Northbound update and in turn
529 updates the OVN Southbound database accordingly, by removing
530 or updating the rows from the OVN Southbound database Logi‐
531 cal_Flow table that were related to the now-destroyed CIF.
532 It also deletes the row in the Binding table for that CIF.
533
534 8. On every hypervisor, ovn-controller receives the Logi‐
535 cal_Flow table updates that ovn-northd made in the previous
536 step. ovn-controller updates OpenFlow tables to reflect the
537 update.
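
       The sketch referenced in step 2 above might look like this on the
       command line. The switch name sw0, the CIF port name cif1, and
       VLAN tag 42 are placeholders, and vm1-vif is the name of the
       parent VM’s own logical switch port:

              # lsp-add SWITCH PORT [PARENT] [TAG]
              ovn-nbctl lsp-add sw0 cif1 vm1-vif 42
              ovn-nbctl lsp-set-addresses cif1 "00:00:00:00:00:03 10.0.0.13"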
538
539 Architectural Physical Life Cycle of a Packet
540 This section describes how a packet travels from one virtual machine or
541 container to another through OVN. This description focuses on the phys‐
542 ical treatment of a packet; for a description of the logical life cycle
543 of a packet, please refer to the Logical_Flow table in ovn-sb(5).
544
545 This section mentions several data and metadata fields, for clarity
546 summarized here:
547
548 tunnel key
549 When OVN encapsulates a packet in Geneve or another tun‐
550 nel, it attaches extra data to it to allow the receiving
551 OVN instance to process it correctly. This takes differ‐
552 ent forms depending on the particular encapsulation, but
553 in each case we refer to it here as the ``tunnel key.’’
554 See Tunnel Encapsulations, below, for details.
555
556 logical datapath field
557 A field that denotes the logical datapath through which a
558 packet is being processed. OVN uses the field that Open‐
559 Flow 1.1+ simply (and confusingly) calls ``metadata’’ to
560 store the logical datapath. (This field is passed across
561 tunnels as part of the tunnel key.)
562
563 logical input port field
564 A field that denotes the logical port from which the
565 packet entered the logical datapath. OVN stores this in
566 Open vSwitch extension register number 14.
567
568 Geneve and STT tunnels pass this field as part of the
569 tunnel key. Although VXLAN tunnels do not explicitly
570 carry a logical input port, OVN only uses VXLAN to commu‐
571 nicate with gateways that from OVN’s perspective consist
572 of only a single logical port, so that OVN can set the
573 logical input port field to this one on ingress to the
574 OVN logical pipeline.
575
576 logical output port field
577 A field that denotes the logical port from which the
578 packet will leave the logical datapath. This is initial‐
579 ized to 0 at the beginning of the logical ingress pipe‐
580 line. OVN stores this in Open vSwitch extension register
581 number 15.
582
583 Geneve and STT tunnels pass this field as part of the
584 tunnel key. VXLAN tunnels do not transmit the logical
585 output port field. Since VXLAN tunnels do not carry a
586 logical output port field in the tunnel key, when a
              packet is received from a VXLAN tunnel by an OVN
              hypervisor, the packet is resubmitted to table 8 to
              determine the output port(s); when the packet reaches
              table 32, it is resubmitted to table 33 for local
              delivery based on the MLF_RCV_FROM_VXLAN flag, which is
              set when the packet arrives from a VXLAN tunnel.
593
594 conntrack zone field for logical ports
595 A field that denotes the connection tracking zone for
596 logical ports. The value only has local significance and
597 is not meaningful between chassis. This is initialized to
598 0 at the beginning of the logical ingress pipeline. OVN
599 stores this in Open vSwitch extension register number 13.
600
601 conntrack zone fields for routers
602 Fields that denote the connection tracking zones for
603 routers. These values only have local significance and
604 are not meaningful between chassis. OVN stores the zone
605 information for DNATting in Open vSwitch extension regis‐
              ter number 11 and the zone information for SNATting in Open
607 vSwitch extension register number 12.
608
609 logical flow flags
610 The logical flags are intended to handle keeping context
611 between tables in order to decide which rules in subse‐
612 quent tables are matched. These values only have local
613 significance and are not meaningful between chassis. OVN
614 stores the logical flags in Open vSwitch extension regis‐
615 ter number 10.
616
617 VLAN ID
618 The VLAN ID is used as an interface between OVN and con‐
619 tainers nested inside a VM (see Life Cycle of a container
620 interface inside a VM, above, for more information).
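
       To make the register assignments above concrete, a logical
       ingress flow installed by ovn-controller has roughly the
       following shape when viewed with ovs-ofctl dump-flows br-int.
       This is a hand-written, illustrative match rather than actual
       output; real table numbers, priorities, and values vary:

              table=8, priority=50, metadata=0x1, reg14=0x2
                      actions=resubmit(,9)

       Here metadata carries the logical datapath, reg14 the logical
       input port, and reg15 (not matched above) would carry the
       logical output port.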
621
622 Initially, a VM or container on the ingress hypervisor sends a packet
623 on a port attached to the OVN integration bridge. Then:
624
625 1. OpenFlow table 0 performs physical-to-logical translation.
626 It matches the packet’s ingress port. Its actions annotate
627 the packet with logical metadata, by setting the logical
628 datapath field to identify the logical datapath that the
629 packet is traversing and the logical input port field to
630 identify the ingress port. Then it resubmits to table 8 to
631 enter the logical ingress pipeline.
632
633 Packets that originate from a container nested within a VM
634 are treated in a slightly different way. The originating
635 container can be distinguished based on the VIF-specific
636 VLAN ID, so the physical-to-logical translation flows addi‐
637 tionally match on VLAN ID and the actions strip the VLAN
638 header. Following this step, OVN treats packets from con‐
639 tainers just like any other packets.
640
641 Table 0 also processes packets that arrive from other chas‐
642 sis. It distinguishes them from other packets by ingress
643 port, which is a tunnel. As with packets just entering the
644 OVN pipeline, the actions annotate these packets with logi‐
645 cal datapath and logical ingress port metadata. In addition,
646 the actions set the logical output port field, which is
647 available because in OVN tunneling occurs after the logical
648 output port is known. These three pieces of information are
649 obtained from the tunnel encapsulation metadata (see Tunnel
650 Encapsulations for encoding details). Then the actions
651 resubmit to table 33 to enter the logical egress pipeline.
652
653 2. OpenFlow tables 8 through 31 execute the logical ingress
654 pipeline from the Logical_Flow table in the OVN Southbound
655 database. These tables are expressed entirely in terms of
656 logical concepts like logical ports and logical datapaths. A
657 big part of ovn-controller’s job is to translate them into
658 equivalent OpenFlow (in particular it translates the table
659 numbers: Logical_Flow tables 0 through 23 become OpenFlow
660 tables 8 through 31).
661
662 Each logical flow maps to one or more OpenFlow flows. An
663 actual packet ordinarily matches only one of these, although
664 in some cases it can match more than one of these flows
665 (which is not a problem because all of them have the same
666 actions). ovn-controller uses the first 32 bits of the logi‐
667 cal flow’s UUID as the cookie for its OpenFlow flow or
668 flows. (This is not necessarily unique, since the first 32
              bits of a logical flow’s UUID are not necessarily unique.)
670
671 Some logical flows can map to the Open vSwitch ``conjunctive
672 match’’ extension (see ovs-fields(7)). Flows with a conjunc‐
673 tion action use an OpenFlow cookie of 0, because they can
674 correspond to multiple logical flows. The OpenFlow flow for
675 a conjunctive match includes a match on conj_id.
676
677 Some logical flows may not be represented in the OpenFlow
678 tables on a given hypervisor, if they could not be used on
679 that hypervisor. For example, if no VIF in a logical switch
680 resides on a given hypervisor, and the logical switch is not
681 otherwise reachable on that hypervisor (e.g. over a series
682 of hops through logical switches and routers starting from a
683 VIF on the hypervisor), then the logical flow may not be
684 represented there.
685
686 Most OVN actions have fairly obvious implementations in
687 OpenFlow (with OVS extensions), e.g. next; is implemented as
688 resubmit, field = constant; as set_field. A few are worth
689 describing in more detail:
690
691 output:
692 Implemented by resubmitting the packet to table 32.
693 If the pipeline executes more than one output action,
694 then each one is separately resubmitted to table 32.
695 This can be used to send multiple copies of the
696 packet to multiple ports. (If the packet was not mod‐
697 ified between the output actions, and some of the
698 copies are destined to the same hypervisor, then
699 using a logical multicast output port would save
700 bandwidth between hypervisors.)
701
702 get_arp(P, A);
703 get_nd(P, A);
704 Implemented by storing arguments into OpenFlow fields,
705 then resubmitting to table 66, which ovn-controller
706 populates with flows generated from the MAC_Binding ta‐
707 ble in the OVN Southbound database. If there is a match
708 in table 66, then its actions store the bound MAC in
709 the Ethernet destination address field.
710
711 (The OpenFlow actions save and restore the OpenFlow
712 fields used for the arguments, so that the OVN actions
713 do not have to be aware of this temporary use.)
714
715 put_arp(P, A, E);
716 put_nd(P, A, E);
717 Implemented by storing the arguments into OpenFlow
718 fields, then outputting a packet to ovn-controller,
719 which updates the MAC_Binding table.
720
721 (The OpenFlow actions save and restore the OpenFlow
722 fields used for the arguments, so that the OVN actions
723 do not have to be aware of this temporary use.)
724
725 R = lookup_arp(P, A, M);
726 R = lookup_nd(P, A, M);
727 Implemented by storing arguments into OpenFlow fields,
728 then resubmitting to table 67, which ovn-controller
729 populates with flows generated from the MAC_Binding ta‐
730 ble in the OVN Southbound database. If there is a match
731 in table 67, then its actions set the logical flow flag
732 MLF_LOOKUP_MAC.
733
734 (The OpenFlow actions save and restore the OpenFlow
735 fields used for the arguments, so that the OVN actions
736 do not have to be aware of this temporary use.)
737
738 3. OpenFlow tables 32 through 47 implement the output action in
739 the logical ingress pipeline. Specifically, table 32 handles
740 packets to remote hypervisors, table 33 handles packets to
741 the local hypervisor, and table 34 checks whether packets
742 whose logical ingress and egress port are the same should be
743 discarded.
744
745 Logical patch ports are a special case. Logical patch ports
746 do not have a physical location and effectively reside on
747 every hypervisor. Thus, flow table 33, for output to ports
748 on the local hypervisor, naturally implements output to uni‐
749 cast logical patch ports too. However, applying the same
750 logic to a logical patch port that is part of a logical mul‐
751 ticast group yields packet duplication, because each hyper‐
752 visor that contains a logical port in the multicast group
753 will also output the packet to the logical patch port. Thus,
754 multicast groups implement output to logical patch ports in
755 table 32.
756
757 Each flow in table 32 matches on a logical output port for
758 unicast or multicast logical ports that include a logical
759 port on a remote hypervisor. Each flow’s actions implement
760 sending a packet to the port it matches. For unicast logical
761 output ports on remote hypervisors, the actions set the tun‐
762 nel key to the correct value, then send the packet on the
763 tunnel port to the correct hypervisor. (When the remote
764 hypervisor receives the packet, table 0 there will recognize
765 it as a tunneled packet and pass it along to table 33.) For
766 multicast logical output ports, the actions send one copy of
767 the packet to each remote hypervisor, in the same way as for
768 unicast destinations. If a multicast group includes a logi‐
769 cal port or ports on the local hypervisor, then its actions
770 also resubmit to table 33. Table 32 also includes:
771
772 · A higher-priority rule to match packets received from
773 VXLAN tunnels, based on flag MLF_RCV_FROM_VXLAN, and
774 resubmit these packets to table 33 for local deliv‐
775 ery. Packets received from VXLAN tunnels reach here
                     because the tunnel key lacks a logical output port
                     field, and thus these packets need to be submitted
                     to table 8 to determine the output port.
779
780 · A higher-priority rule to match packets received from
781 ports of type localport, based on the logical input
782 port, and resubmit these packets to table 33 for
783 local delivery. Ports of type localport exist on
784 every hypervisor and by definition their traffic
785 should never go out through a tunnel.
786
787 · A higher-priority rule to match packets that have the
788 MLF_LOCAL_ONLY logical flow flag set, and whose des‐
789 tination is a multicast address. This flag indicates
790 that the packet should not be delivered to remote
791 hypervisors, even if the multicast destination
792 includes ports on remote hypervisors. This flag is
793 used when ovn-controller is the originator of the
794 multicast packet. Since each ovn-controller instance
795 is originating these packets, the packets only need
796 to be delivered to local ports.
797
798 · A fallback flow that resubmits to table 33 if there
799 is no other match.
800
801 Flows in table 33 resemble those in table 32 but for logical
802 ports that reside locally rather than remotely. For unicast
803 logical output ports on the local hypervisor, the actions
804 just resubmit to table 34. For multicast output ports that
805 include one or more logical ports on the local hypervisor,
806 for each such logical port P, the actions change the logical
807 output port to P, then resubmit to table 34.
808
              A special case is that when a localnet port exists on the
              datapath, the remote port is reached by switching through
              the localnet port. In this case, instead of adding a flow
              in table 32 to reach the remote port, a flow is added in
              table 33 that switches the logical output port to the
              localnet port and resubmits to table 33, as if the packet
              were unicast to a logical port on the local hypervisor.
816
817 Table 34 matches and drops packets for which the logical
818 input and output ports are the same and the MLF_ALLOW_LOOP‐
819 BACK flag is not set. It resubmits other packets to table
820 40.
821
822 4. OpenFlow tables 40 through 63 execute the logical egress
823 pipeline from the Logical_Flow table in the OVN Southbound
824 database. The egress pipeline can perform a final stage of
825 validation before packet delivery. Eventually, it may exe‐
826 cute an output action, which ovn-controller implements by
827 resubmitting to table 64. A packet for which the pipeline
828 never executes output is effectively dropped (although it
829 may have been transmitted through a tunnel across a physical
830 network).
831
832 The egress pipeline cannot change the logical output port or
833 cause further tunneling.
834
835 5. Table 64 bypasses OpenFlow loopback when MLF_ALLOW_LOOPBACK
836 is set. Logical loopback was handled in table 34, but Open‐
837 Flow by default also prevents loopback to the OpenFlow
838 ingress port. Thus, when MLF_ALLOW_LOOPBACK is set, OpenFlow
839 table 64 saves the OpenFlow ingress port, sets it to zero,
840 resubmits to table 65 for logical-to-physical transforma‐
841 tion, and then restores the OpenFlow ingress port, effec‐
              tively disabling OpenFlow loopback prevention. When
              MLF_ALLOW_LOOPBACK is unset, the table 64 flow simply
              resubmits to table 65.
845
846 6. OpenFlow table 65 performs logical-to-physical translation,
847 the opposite of table 0. It matches the packet’s logical
848 egress port. Its actions output the packet to the port
849 attached to the OVN integration bridge that represents that
850 logical port. If the logical egress port is a container
851 nested with a VM, then before sending the packet the actions
              nested within a VM, then before sending the packet the actions
853
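
       When debugging this pipeline it can be helpful to compare the
       logical and physical views of a packet. One possible approach,
       with placeholder datapath, port, and address names, is to trace
       the logical pipeline with ovn-trace(8) and the installed
       OpenFlow tables with ovs-appctl ofproto/trace:

              ovn-trace sw0 'inport == "sw0-vif1" &&
                  eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02'

              ovs-appctl ofproto/trace br-int \
                  in_port=tap-vif1,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02
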
854 Logical Routers and Logical Patch Ports
855 Typically logical routers and logical patch ports do not have a physi‐
856 cal location and effectively reside on every hypervisor. This is the
857 case for logical patch ports between logical routers and logical
858 switches behind those logical routers, to which VMs (and VIFs) attach.
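
       For reference, such a router and its connection to a logical
       switch are typically created with commands like the following
       (all names and addresses are examples only). The resulting pair
       of ports corresponds to the logical patch ports discussed in
       this section:

              ovn-nbctl lr-add lr0
              ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 10.0.0.1/24
              ovn-nbctl lsp-add sw0 sw0-lr0
              ovn-nbctl lsp-set-type sw0-lr0 router
              ovn-nbctl lsp-set-addresses sw0-lr0 router
              ovn-nbctl lsp-set-options sw0-lr0 router-port=lrp0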
859
860 Consider a packet sent from one virtual machine or container to another
861 VM or container that resides on a different subnet. The packet will
862 traverse tables 0 to 65 as described in the previous section Architec‐
863 tural Physical Life Cycle of a Packet, using the logical datapath rep‐
864 resenting the logical switch that the sender is attached to. At table
865 32, the packet will use the fallback flow that resubmits locally to ta‐
866 ble 33 on the same hypervisor. In this case, all of the processing from
867 table 0 to table 65 occurs on the hypervisor where the sender resides.
868
869 When the packet reaches table 65, the logical egress port is a logical
870 patch port. The implementation in table 65 differs depending on the OVS
871 version, although the observed behavior is meant to be the same:
872
873 · In OVS versions 2.6 and earlier, table 65 outputs to an
874 OVS patch port that represents the logical patch port.
875 The packet re-enters the OpenFlow flow table from the OVS
876 patch port’s peer in table 0, which identifies the logi‐
877 cal datapath and logical input port based on the OVS
878 patch port’s OpenFlow port number.
879
880 · In OVS versions 2.7 and later, the packet is cloned and
881 resubmitted directly to the first OpenFlow flow table in
882 the ingress pipeline, setting the logical ingress port to
883 the peer logical patch port, and using the peer logical
884 patch port’s logical datapath (that represents the logi‐
885 cal router).
886
887 The packet re-enters the ingress pipeline in order to traverse tables 8
888 to 65 again, this time using the logical datapath representing the log‐
889 ical router. The processing continues as described in the previous sec‐
890 tion Architectural Physical Life Cycle of a Packet. When the packet
       reaches table 65, the logical egress port will once again be a logical
892 patch port. In the same manner as described above, this logical patch
893 port will cause the packet to be resubmitted to OpenFlow tables 8 to
894 65, this time using the logical datapath representing the logical
895 switch that the destination VM or container is attached to.
896
897 The packet traverses tables 8 to 65 a third and final time. If the des‐
898 tination VM or container resides on a remote hypervisor, then table 32
899 will send the packet on a tunnel port from the sender’s hypervisor to
900 the remote hypervisor. Finally table 65 will output the packet directly
901 to the destination VM or container.
902
903 The following sections describe two exceptions, where logical routers
904 and/or logical patch ports are associated with a physical location.
905
906 Gateway Routers
907
908 A gateway router is a logical router that is bound to a physical loca‐
909 tion. This includes all of the logical patch ports of the logical
910 router, as well as all of the peer logical patch ports on logical
911 switches. In the OVN Southbound database, the Port_Binding entries for
912 these logical patch ports use the type l3gateway rather than patch, in
913 order to distinguish that these logical patch ports are bound to a
914 chassis.
915
916 When a hypervisor processes a packet on a logical datapath representing
       a logical switch, and the logical egress port is an l3gateway port rep‐
918 resenting connectivity to a gateway router, the packet will match a
919 flow in table 32 that sends the packet on a tunnel port to the chassis
920 where the gateway router resides. This processing in table 32 is done
921 in the same manner as for VIFs.
922
923 Gateway routers are typically used in between distributed logical
924 routers and physical networks. The distributed logical router and the
925 logical switches behind it, to which VMs and containers attach, effec‐
926 tively reside on each hypervisor. The distributed router and the gate‐
927 way router are connected by another logical switch, sometimes referred
928 to as a join logical switch. On the other side, the gateway router con‐
929 nects to another logical switch that has a localnet port connecting to
930 the physical network.
931
932 When using gateway routers, DNAT and SNAT rules are associated with the
933 gateway router, which provides a central location that can handle one-
934 to-many SNAT (aka IP masquerading).
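
       As an illustration, a gateway router is an ordinary logical
       router that the CMS has pinned to a chassis, optionally with NAT
       rules. The chassis name and the addresses below are placeholders:

              ovn-nbctl lr-add gw0
              ovn-nbctl set Logical_Router gw0 options:chassis=chassis-1

              # One-to-many SNAT for a tenant subnet behind the gateway.
              ovn-nbctl lr-nat-add gw0 snat 203.0.113.1 10.0.0.0/24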
935
936 Distributed Gateway Ports
937
938 Distributed gateway ports are logical router patch ports that directly
939 connect distributed logical routers to logical switches with localnet
940 ports.
941
942 The primary design goal of distributed gateway ports is to allow as
943 much traffic as possible to be handled locally on the hypervisor where
944 a VM or container resides. Whenever possible, packets from the VM or
945 container to the outside world should be processed completely on that
946 VM’s or container’s hypervisor, eventually traversing a localnet port
947 instance on that hypervisor to the physical network. Whenever possible,
948 packets from the outside world to a VM or container should be directed
949 through the physical network directly to the VM’s or container’s hyper‐
950 visor, where the packet will enter the integration bridge through a
951 localnet port.
952
953 In order to allow for the distributed processing of packets described
954 in the paragraph above, distributed gateway ports need to be logical
955 patch ports that effectively reside on every hypervisor, rather than
956 l3gateway ports that are bound to a particular chassis. However, the
957 flows associated with distributed gateway ports often need to be asso‐
958 ciated with physical locations, for the following reasons:
959
960 · The physical network that the localnet port is attached
961 to typically uses L2 learning. Any Ethernet address used
962 over the distributed gateway port must be restricted to a
963 single physical location so that upstream L2 learning is
964 not confused. Traffic sent out the distributed gateway
965 port towards the localnet port with a specific Ethernet
966 address must be sent out one specific instance of the
967 distributed gateway port on one specific chassis. Traffic
968 received from the localnet port (or from a VIF on the
969 same logical switch as the localnet port) with a specific
970 Ethernet address must be directed to the logical switch’s
971 patch port instance on that specific chassis.
972
973 Due to the implications of L2 learning, the Ethernet
974 address and IP address of the distributed gateway port
975 need to be restricted to a single physical location. For
976 this reason, the user must specify one chassis associated
977 with the distributed gateway port. Note that traffic
978 traversing the distributed gateway port using other Eth‐
979 ernet addresses and IP addresses (e.g. one-to-one NAT) is
980 not restricted to this chassis.
981
982 Replies to ARP and ND requests must be restricted to a
983 single physical location, where the Ethernet address in
984 the reply resides. This includes ARP and ND replies for
985 the IP address of the distributed gateway port, which are
986 restricted to the chassis that the user associated with
987 the distributed gateway port.
988
989 · In order to support one-to-many SNAT (aka IP masquerad‐
990 ing), where multiple logical IP addresses spread across
991 multiple chassis are mapped to a single external IP
992 address, it will be necessary to handle some of the logi‐
993 cal router processing on a specific chassis in a central‐
994 ized manner. Since the SNAT external IP address is typi‐
995 cally the distributed gateway port IP address, and for
996 simplicity, the same chassis associated with the distrib‐
997 uted gateway port is used.
998
999 The details of flow restrictions to specific chassis are described in
1000 the ovn-northd documentation.
1001
1002 While most of the physical location dependent aspects of distributed
1003 gateway ports can be handled by restricting some flows to specific
1004 chassis, one additional mechanism is required. When a packet leaves the
1005 ingress pipeline and the logical egress port is the distributed gateway
1006 port, one of two different sets of actions is required at table 32:
1007
1008 · If the packet can be handled locally on the sender’s
1009 hypervisor (e.g. one-to-one NAT traffic), then the packet
1010 should just be resubmitted locally to table 33, in the
1011 normal manner for distributed logical patch ports.
1012
1013 · However, if the packet needs to be handled on the chassis
1014 associated with the distributed gateway port (e.g. one-
1015 to-many SNAT traffic or non-NAT traffic), then table 32
1016 must send the packet on a tunnel port to that chassis.
1017
1018 In order to trigger the second set of actions, the chassisredirect type
1019 of southbound Port_Binding has been added. Setting the logical egress
1020 port to the type chassisredirect logical port is simply a way to indi‐
1021 cate that although the packet is destined for the distributed gateway
1022 port, it needs to be redirected to a different chassis. At table 32,
1023 packets with this logical egress port are sent to a specific chassis,
1024 in the same way that table 32 directs packets whose logical egress port
1025 is a VIF or a type l3gateway port to different chassis. Once the packet
1026 arrives at that chassis, table 33 resets the logical egress port to the
1027 value representing the distributed gateway port. For each distributed
1028 gateway port, there is one type chassisredirect port, in addition to
1029 the distributed logical patch port representing the distributed gateway
1030 port.
1031
1032 High Availability for Distributed Gateway Ports
1033
1034 OVN allows you to specify a prioritized list of chassis for a distrib‐
1035 uted gateway port. This is done by associating multiple Gateway_Chassis
1036 rows with a Logical_Router_Port in the OVN_Northbound database.
1037
1038 When multiple chassis have been specified for a gateway, all chassis
1039 that may send packets to that gateway will enable BFD on tunnels to all
1040 configured gateway chassis. The current master chassis for the gateway
1041 is the highest priority gateway chassis that is currently viewed as
1042 active based on BFD status.
1043
1044 For more information on L3 gateway high availability, please refer to
1045 http://docs.openvswitch.org/en/latest/topics/high-availability.
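
       For example, a prioritized list of gateway chassis can be
       configured with ovn-nbctl; the port and chassis names below are
       placeholders, and higher priority values are preferred:

              ovn-nbctl lrp-set-gateway-chassis lrp-ext chassis-1 20
              ovn-nbctl lrp-set-gateway-chassis lrp-ext chassis-2 10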
1046
1047 Multiple localnet logical switches connected to a Logical Router
1048 It is possible to have multiple logical switches each with a localnet
1049 port (representing physical networks) connected to a logical router, in
1050 which one localnet logical switch may provide the external connectivity
       via a distributed gateway port and the rest of the localnet logical
1052 switches use VLAN tagging in the physical network. It is expected that
1053 ovn-bridge-mappings is configured appropriately on the chassis for all
1054 these localnet networks.
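
       For example, a chassis that attaches to two such physical
       networks might be configured as follows; the network and bridge
       names are placeholders:

              ovs-vsctl set open . external-ids:ovn-bridge-mappings="physnet1:br-provider1,physnet2:br-provider2"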
1055
1056 East West routing
1057
       East-West routing between these localnet VLAN tagged logical switches
       works almost the same way as for normal logical switches. When a VM
       sends such a packet:
1061
1062 1. It first enters the ingress pipeline, and then egress pipe‐
1063 line of the source localnet logical switch datapath. It then
1064 enters the ingress pipeline of the logical router datapath
1065 via the logical router port in the source chassis.
1066
          2.  A routing decision is made.
1068
1069 3. From the router datapath, packet enters the ingress pipeline
1070 and then egress pipeline of the destination localnet logical
1071 switch datapath and goes out of the integration bridge to
              the provider bridge (belonging to the destination logical
              switch) via the localnet port. While sending the packet to
              the provider bridge, the router port MAC used as the source
              MAC is also replaced with a chassis-unique MAC.
1076
              This chassis-unique MAC is configured as a global OVS
              configuration option on each chassis (e.g. via "ovs-vsctl
              set open . external-ids:ovn-chassis-mac-mappings="phys:aa:bb:cc:dd:ee:$i$i"").
              For more details, see ovn-controller(8).
1081
              If the above is not configured, then the source MAC would
              be the router port MAC. This could create problems when
              there is more than one chassis, because the router port is
              distributed and the same (MAC, VLAN) tuple will be seen by
              the physical network from other chassis as well, which
              could cause these issues:
1088
                      ·   Continuous MAC moves in the top-of-rack (ToR)
                          switch.

                      ·   The ToR dropping the traffic that is causing
                          the continuous MAC moves.

                      ·   The ToR blocking the ports from which the MAC
                          moves are happening.
1096
1097 4. The destination chassis receives the packet via the localnet
1098 port and sends it to the integration bridge. Before entering
              the integration bridge, the source MAC of the packet is
              replaced with the router port MAC again. The packet enters the
1101 ingress pipeline and then egress pipeline of the destination
1102 localnet logical switch and finally gets delivered to the
1103 destination VM port.
1104
1105 External traffic
1106
       The following happens when a VM sends external traffic (which
       requires NATting) and the chassis hosting the VM doesn’t have a
       distributed gateway port.
1110
1111 1. The packet first enters the ingress pipeline, and then
1112 egress pipeline of the source localnet logical switch data‐
1113 path. It then enters the ingress pipeline of the logical
1114 router datapath via the logical router port in the source
1115 chassis.
1116
          2.  A routing decision is made. Since the gateway router or the
1118 distributed gateway port doesn’t reside in the source chas‐
1119 sis, the traffic is redirected to the gateway chassis via
1120 the tunnel port.
1121
1122 3. The gateway chassis receives the packet via the tunnel port
1123 and the packet enters the egress pipeline of the logical
1124 router datapath. NAT rules are applied here. The packet then
1125 enters the ingress pipeline and then egress pipeline of the
1126 localnet logical switch datapath which provides external
1127 connectivity and finally goes out via the localnet port of
1128 the logical switch which provides external connectivity.
1129
1130 Although this works, the VM traffic is tunnelled when sent from the
1131 compute chassis to the gateway chassis. In order for it to work prop‐
1132 erly, the MTU of the localnet logical switches must be lowered to
1133 account for the tunnel encapsulation.
1134
1135
    Centralized routing for localnet VLAN tagged logical switches connected
    to a Logical Router
1138
1139 To overcome the tunnel encapsulation problem described in the previous
1140 section, OVN supports the option of enabling centralized routing for
       localnet VLAN tagged logical switches. The CMS can configure the option
       options:reside-on-redirect-chassis to true on each Logical_Router_Port
       that connects to the localnet VLAN tagged logical switches. This
1144 causes the gateway chassis (hosting the distributed gateway port) to
1145 handle all the routing for these networks, making it centralized. It
1146 will reply to the ARP requests for the logical router port IPs.
1147
1148 If the logical router doesn’t have a distributed gateway port connect‐
1149 ing to the localnet logical switch which provides external connectiv‐
1150 ity, then this option is ignored by OVN.
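
       For example, assuming lrp-vlan1 is such a router port:

              ovn-nbctl set Logical_Router_Port lrp-vlan1 \
                      options:reside-on-redirect-chassis=true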
1151
       The following happens when a VM sends east-west traffic that needs
1153 to be routed:
1154
1155 1. The packet first enters the ingress pipeline, and then
1156 egress pipeline of the source localnet logical switch data‐
1157 path and is sent out via the localnet port of the source
1158 localnet logical switch (instead of sending it to router
1159 pipeline).
1160
1161 2. The gateway chassis receives the packet via the localnet
1162 port of the source localnet logical switch and sends it to
1163 the integration bridge. The packet then enters the ingress
1164 pipeline, and then egress pipeline of the source localnet
1165 logical switch datapath and enters the ingress pipeline of
1166 the logical router datapath.
1167
           3.  A routing decision is made.
1169
           4.  From the router datapath, the packet enters the ingress pipeline
1171 and then egress pipeline of the destination localnet logical
1172 switch datapath. It then goes out of the integration bridge
1173 to the provider bridge ( belonging to the destination logi‐
1174 cal switch) via the localnet port.
1175
1176 5. The destination chassis receives the packet via the localnet
1177 port and sends it to the integration bridge. The packet
1178 enters the ingress pipeline and then egress pipeline of the
               destination localnet logical switch and is finally delivered to
1180 the destination VM port.
1181
       The following happens when a VM sends external traffic that requires
       NATting:
1184
1185 1. The packet first enters the ingress pipeline, and then
1186 egress pipeline of the source localnet logical switch data‐
1187 path and is sent out via the localnet port of the source
1188 localnet logical switch (instead of sending it to router
1189 pipeline).
1190
1191 2. The gateway chassis receives the packet via the localnet
1192 port of the source localnet logical switch and sends it to
1193 the integration bridge. The packet then enters the ingress
1194 pipeline, and then egress pipeline of the source localnet
1195 logical switch datapath and enters the ingress pipeline of
1196 the logical router datapath.
1197
           3.  A routing decision is made and NAT rules are applied.
1199
           4.  From the router datapath, the packet enters the ingress pipeline
1201 and then egress pipeline of the localnet logical switch
1202 datapath which provides external connectivity. It then goes
1203 out of the integration bridge to the provider bridge
1204 (belonging to the logical switch which provides external
1205 connectivity) via the localnet port.
1206
       The following happens for the reverse external traffic:
1208
1209 1. The gateway chassis receives the packet from the localnet
1210 port of the logical switch which provides external connec‐
1211 tivity. The packet then enters the ingress pipeline and then
1212 egress pipeline of the localnet logical switch (which pro‐
1213 vides external connectivity). The packet then enters the
1214 ingress pipeline of the logical router datapath.
1215
1216 2. The ingress pipeline of the logical router datapath applies
1217 the unNATting rules. The packet then enters the ingress
1218 pipeline and then egress pipeline of the source localnet
1219 logical switch. Since the source VM doesn’t reside in the
1220 gateway chassis, the packet is sent out via the localnet
1221 port of the source logical switch.
1222
1223 3. The source chassis receives the packet via the localnet port
1224 and sends it to the integration bridge. The packet enters
1225 the ingress pipeline and then egress pipeline of the source
1226 localnet logical switch and finally gets delivered to the
1227 source VM port.
1228
    VLAN based redirection

       As an enhancement to reside-on-redirect-chassis, OVN also supports
       VLAN based redirection. By setting options:redirect-type to vlan on a
       gateway chassis attached router port, the user can enforce that
       redirected packets do not use the tunnel port but instead go out as
       VLAN packets via the localnet port of the peer logical switch.
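
       For example, assuming a distributed gateway port named lr0-public (the
       name is illustrative), VLAN based redirection could be enabled with:

              ovn-nbctl set Logical_Router_Port lr0-public \
                  options:redirect-type=vlan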
1234
       The following happens for VLAN based redirection:
1236
           1.  On the compute chassis, the packet passes through the logical
               router’s ingress pipeline.

           2.  If the logical outport is a gateway chassis attached router
               port, then the packet is "redirected" to the gateway chassis
               using the peer logical switch’s localnet port.

           3.  This VLAN backed redirected packet has the router port MAC
               (the port to which the gateway chassis is attached) as its
               destination MAC, and its VLAN ID is that of the localnet port
               (of the peer logical switch of the logical router port).

           4.  On the gateway chassis, the packet enters the logical router
               pipeline again and this time it passes through the egress
               pipeline as well.
1252
           5.  The reverse traffic flow stays the same.
1254
       Some guidelines and expectations with VLAN based redirection:
1256
           1.  Since the router port MAC is the destination MAC, it has to be
               ensured that the physical network learns it ONLY from the
               gateway chassis. This means that ovn-chassis-mac-mappings
               should be configured on all the compute nodes, so that the
               physical network never learns the router port MAC from the
               compute nodes (see the example after this list).
1262
           2.  Since the packet enters the logical router ingress pipeline
               twice (once on the compute chassis and again on the gateway
               chassis), the TTL will be decremented twice.
1266
           3.  The default redirection type continues to be overlay. The user
               can switch the redirect-type between vlan and overlay by
               changing the value of options:redirect-type.
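
       The following sketch illustrates guidelines 1 and 3 above. The
       physical network name physnet1, the chassis MAC, and the port name
       lr0-public are all illustrative:

              # On every compute (non-gateway) chassis, map the physical
              # network to a chassis-specific MAC so that the fabric never
              # learns the real router port MAC from compute nodes.
              ovs-vsctl set Open_vSwitch . \
                  external-ids:ovn-chassis-mac-mappings="physnet1:aa:bb:cc:dd:ee:01"

              # Switch back to the default overlay redirection if needed.
              ovn-nbctl set Logical_Router_Port lr0-public \
                  options:redirect-type=overlay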
1270
1271 Life Cycle of a VTEP gateway
1272 A gateway is a chassis that forwards traffic between the OVN-managed
1273 part of a logical network and a physical VLAN, extending a tunnel-based
1274 logical network into a physical network.
1275
1276 The steps below refer often to details of the OVN and VTEP database
1277 schemas. Please see ovn-sb(5), ovn-nb(5) and vtep(5), respectively, for
1278 the full story on these databases.
1279
1280 1. A VTEP gateway’s life cycle begins with the administrator
1281 registering the VTEP gateway as a Physical_Switch table
1282 entry in the VTEP database. The ovn-controller-vtep con‐
               nected to this VTEP database will recognize the new VTEP
1284 gateway and create a new Chassis table entry for it in the
1285 OVN_Southbound database.
1286
           2.  The administrator can then create a new Logical_Switch table
               entry and bind a particular VLAN on a VTEP gateway’s port to
               any VTEP logical switch (see the command sketch after this
               list). Once a VTEP logical switch is bound to a VTEP gateway,
               the ovn-controller-vtep will detect it and add its name to the
               vtep_logical_switches column of the Chassis table in the
               OVN_Southbound database. Note that the tunnel_key column of
               the VTEP logical switch is not filled at creation. The
               ovn-controller-vtep will set the column when the corresponding
               VTEP logical switch is bound to an OVN logical network.
1297
1298 3. Now, the administrator can use the CMS to add a VTEP logical
1299 switch to the OVN logical network. To do that, the CMS must
1300 first create a new Logical_Switch_Port table entry in the
1301 OVN_Northbound database. Then, the type column of this entry
1302 must be set to "vtep". Next, the vtep-logical-switch and
1303 vtep-physical-switch keys in the options column must also be
1304 specified, since multiple VTEP gateways can attach to the
1305 same VTEP logical switch.
1306
1307 4. The newly created logical port in the OVN_Northbound data‐
1308 base and its configuration will be passed down to the
1309 OVN_Southbound database as a new Port_Binding table entry.
1310 The ovn-controller-vtep will recognize the change and bind
1311 the logical port to the corresponding VTEP gateway chassis.
               Binding the same VTEP logical switch to different OVN logical
               networks is not allowed, and a warning will be generated in
               the log.
1315
           5.  Besides binding to the VTEP gateway chassis, the
               ovn-controller-vtep will update the tunnel_key column of the VTEP
1318 logical switch to the corresponding Datapath_Binding table
1319 entry’s tunnel_key for the bound OVN logical network.
1320
           6.  Next, the ovn-controller-vtep will keep reacting to
               configuration changes to the Port_Binding table in the
               OVN_Southbound database, updating the Ucast_Macs_Remote table
               in the VTEP database. This allows the VTEP gateway to understand
1325 where to forward the unicast traffic coming from the
1326 extended external network.
1327
1328 7. Eventually, the VTEP gateway’s life cycle ends when the
1329 administrator unregisters the VTEP gateway from the VTEP
1330 database. The ovn-controller-vtep will recognize the event
1331 and remove all related configurations (Chassis table entry
1332 and port bindings) in the OVN_Southbound database.
1333
1334 8. When the ovn-controller-vtep is terminated, all related con‐
1335 figurations in the OVN_Southbound database and the VTEP
               database will be cleaned up, including Chassis table entries
1337 for all registered VTEP gateways and their port bindings,
1338 and all Ucast_Macs_Remote table entries and the Logi‐
1339 cal_Switch tunnel keys.
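
       A minimal command-level sketch of steps 1 through 3 above, assuming a
       VTEP gateway named tor1 with a port p0, a VTEP logical switch
       vtep-ls0, and an OVN logical switch sw0 (all names are illustrative):

              # Steps 1 and 2: register the gateway in the VTEP database and
              # bind VLAN 100 on its port p0 to a VTEP logical switch.
              vtep-ctl add-ps tor1
              vtep-ctl add-port tor1 p0
              vtep-ctl add-ls vtep-ls0
              vtep-ctl bind-ls tor1 p0 100 vtep-ls0

              # Step 3: attach the VTEP logical switch to an OVN logical
              # network by creating a "vtep" logical port in the northbound
              # database.
              ovn-nbctl lsp-add sw0 sw0-vtep
              ovn-nbctl lsp-set-type sw0-vtep vtep
              ovn-nbctl lsp-set-options sw0-vtep vtep-physical-switch=tor1 \
                  vtep-logical-switch=vtep-ls0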
1340
1341 Native OVN services for external logical ports
       To provide OVN native services (like DHCP, IPv6 RA, and DNS lookup) to
       cloud resources which are external, OVN supports external logical
       ports.
1345
1346 Below are some of the use cases where external ports can be used.
1347
                  ·      VMs connected to SR-IOV NICs. Traffic from these VMs
                         bypasses the kernel stack, so the local
                         ovn-controller does not bind these ports and cannot
                         serve the native services.

                  ·      When the CMS supports provisioning baremetal servers.
1353
       OVN will provide the native services if the CMS has done the following
       configuration in the OVN Northbound Database.
1356
1357 · A row is created in Logical_Switch_Port, configuring the
1358 addresses column and setting the type to external.
1359
1360 · ha_chassis_group column is configured.
1361
                  ·      The HA chassis which belong to the HA chassis group
                         have ovn-bridge-mappings configured and have proper
                         L2 connectivity so that they can receive the DHCP
                         and other related request packets from these
                         external resources.
1366
1367 · The Logical_Switch of this port has a localnet port.
1368
                  ·      Native OVN services are enabled by configuring the
                         DHCP and other options in the same way as for normal
                         logical ports (see the sketch after this list).
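
       The following is a rough sketch of such a configuration, assuming
       ovn-nbctl’s HA chassis group commands are available. The chassis names
       gw1 and gw2, the group name, the switch and port names, and the
       addresses are all illustrative:

              # Create an HA chassis group and add chassis with priorities.
              ovn-nbctl ha-chassis-group-add hagrp1
              ovn-nbctl ha-chassis-group-add-chassis hagrp1 gw1 20
              ovn-nbctl ha-chassis-group-add-chassis hagrp1 gw2 10

              # Create the external port and point it at the group.
              ovn-nbctl lsp-add sw0 sw0-ext
              ovn-nbctl lsp-set-type sw0-ext external
              ovn-nbctl lsp-set-addresses sw0-ext "00:00:00:00:00:10 10.0.0.10"
              hagrp=$(ovn-nbctl --bare --columns=_uuid find \
                  HA_Chassis_Group name=hagrp1)
              ovn-nbctl set Logical_Switch_Port sw0-ext ha_chassis_group=$hagrp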
1372
       It is recommended to use the same HA chassis group for all the external
       ports of a logical switch. Otherwise, the physical switch might see MAC
       flapping when different chassis provide the native services. For
       example, when supporting the native DHCPv4 service, the DHCPv4 server
       MAC (configured in the options:server_mac column in the DHCP_Options
       table) originating from different ports can cause MAC flapping. The
       MAC of the logical router IP(s) can also flap if the same HA chassis
       group is not set for all the external ports of a logical switch.
1381
    Role-Based Access Controls for the Southbound DB
1384 In order to provide additional security against the possibility of an
1385 OVN chassis becoming compromised in such a way as to allow rogue soft‐
1386 ware to make arbitrary modifications to the southbound database state
1387 and thus disrupt the OVN network, role-based access controls (see
1388 ovsdb-server(1) for additional details) are provided for the southbound
1389 database.
1390
1391 The implementation of role-based access controls (RBAC) requires the
1392 addition of two tables to an OVSDB schema: the RBAC_Role table, which
1393 is indexed by role name and maps the the names of the various tables
1394 that may be modifiable for a given role to individual rows in a permis‐
1395 sions table containing detailed permission information for that role,
1396 and the permission table itself which consists of rows containing the
1397 following information:
1398
1399 Table Name
1400 The name of the associated table. This column exists pri‐
1401 marily as an aid for humans reading the contents of this
1402 table.
1403
1404 Auth Criteria
1405 A set of strings containing the names of columns (or col‐
1406 umn:key pairs for columns containing string:string maps).
1407 The contents of at least one of the columns or column:key
1408 values in a row to be modified, inserted, or deleted must
1409 be equal to the ID of the client attempting to act on the
1410 row in order for the authorization check to pass. If the
                  authorization criteria set is empty, authorization checking
1412 is disabled and all clients for the role will be treated
1413 as authorized.
1414
1415 Insert/Delete
1416 Row insertion/deletion permission; boolean value indicat‐
1417 ing whether insertion and deletion of rows is allowed for
1418 the associated table. If true, insertion and deletion of
1419 rows is allowed for authorized clients.
1420
1421 Updatable Columns
1422 A set of strings containing the names of columns or col‐
1423 umn:key pairs that may be updated or mutated by autho‐
1424 rized clients. Modifications to columns within a row are
1425 only permitted when the authorization check for the
1426 client passes and all columns to be modified are included
1427 in this set of modifiable columns.
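
       Assuming the southbound schema names these tables RBAC_Role and
       RBAC_Permission, their contents can be inspected with ovn-sbctl’s
       generic database commands, for example:

              ovn-sbctl list RBAC_Role
              ovn-sbctl list RBAC_Permission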
1428
1429 RBAC configuration for the OVN southbound database is maintained by
1430 ovn-northd. With RBAC enabled, modifications are only permitted for the
1431 Chassis, Encap, Port_Binding, and MAC_Binding tables, and are
       restricted as follows:
1433
1434 Chassis
1435 Authorization: client ID must match the chassis name.
1436
1437 Insert/Delete: authorized row insertion and deletion are
1438 permitted.
1439
1440 Update: The columns nb_cfg, external_ids, encaps, and
1441 vtep_logical_switches may be modified when authorized.
1442
1443 Encap Authorization: client ID must match the chassis name.
1444
1445 Insert/Delete: row insertion and row deletion are permit‐
1446 ted.
1447
1448 Update: The columns type, options, and ip can be modi‐
1449 fied.
1450
1451 Port_Binding
              Authorization: disabled (all clients are considered
              authorized). A future enhancement may add columns (or keys to
              external_ids) in order to control which chassis are allowed to
              bind each port.
1456
              Insert/Delete: row insertion/deletion are not permitted
              (ovn-northd maintains rows in this table).
1459
1460 Update: Only modifications to the chassis column are per‐
1461 mitted.
1462
1463 MAC_Binding
1464 Authorization: disabled (all clients are considered to be
1465 authorized).
1466
1467 Insert/Delete: row insertion/deletion are permitted.
1468
1469 Update: The columns logical_port, ip, mac, and datapath
1470 may be modified by ovn-controller.
1471
1472 Enabling RBAC for ovn-controller connections to the southbound database
       requires the following steps (see the consolidated sketch after the
       list):
1474
1475 1. Creating SSL certificates for each chassis with the certifi‐
1476 cate CN field set to the chassis name (e.g. for a chassis
1477 with external-ids:system-id=chassis-1, via the command
1478 "ovs-pki -u req+sign chassis-1 switch").
1479
1480 2. Configuring each ovn-controller to use SSL when connecting
1481 to the southbound database (e.g. via "ovs-vsctl set open .
1482 external-ids:ovn-remote=ssl:x.x.x.x:6642").
1483
1484 3. Configuring a southbound database SSL remote with "ovn-con‐
1485 troller" role (e.g. via "ovn-sbctl set-connection
1486 role=ovn-controller pssl:6642").
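
       Put together, the three steps above amount to something like the
       following, reusing the example names and the placeholder address from
       the list (chassis-1 and x.x.x.x are illustrative):

              # Step 1: create a certificate whose CN is the chassis name.
              ovs-pki -u req+sign chassis-1 switch

              # Step 2: on each chassis, connect to the southbound database
              # over SSL.
              ovs-vsctl set open . \
                  external-ids:ovn-remote=ssl:x.x.x.x:6642

              # Step 3: expose a southbound SSL remote bound to the
              # ovn-controller role.
              ovn-sbctl set-connection role=ovn-controller pssl:6642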
1487
1488 Encrypt Tunnel Traffic with IPsec
1489 OVN tunnel traffic goes through physical routers and switches. These
       physical devices could be untrusted (devices in a public network) or
       might be compromised. Enabling encryption for the tunnel traffic can
1492 prevent the traffic data from being monitored and manipulated.
1493
1494 The tunnel traffic is encrypted with IPsec. The CMS sets the ipsec col‐
       umn in the northbound NB_Global table to enable or disable IPsec
       encryption. If ipsec is true, all OVN tunnels will be encrypted. If ipsec is
1497 false, no OVN tunnels will be encrypted.
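
       For example, the CMS (or an administrator) can toggle this with:

              # Enable IPsec for all OVN tunnels; use ipsec=false to disable.
              ovn-nbctl set NB_Global . ipsec=true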
1498
1499 When CMS updates the ipsec column in the northbound NB_Global table,
1500 ovn-northd copies the value to the ipsec column in the southbound
1501 SB_Global table. ovn-controller in each chassis monitors the southbound
1502 database and sets the options of the OVS tunnel interface accordingly.
1503 OVS tunnel interface options are monitored by the ovs-monitor-ipsec
       daemon, which configures the IKE daemon to set up IPsec connections.
1505
       Chassis authenticate each other using certificates. The authentication
       succeeds if the other end of the tunnel presents a certificate signed
       by a trusted CA and the common name (CN) matches the expected chassis
       name. The SSL certificates used in role-based access controls (RBAC)
       can also be used for IPsec, or ovs-pki can be used to create different
       certificates. The certificate is required to be x.509 version 3, with
       the CN field and the subjectAltName field set to the chassis name.
1513
1514 The CA certificate, chassis certificate and private key are required to
1515 be installed in each chassis before enabling IPsec. Please see
1516 ovs-vswitchd.conf.db(5) for setting up CA based IPsec authentication.
1517
1519 Tunnel Encapsulations
1520 OVN annotates logical network packets that it sends from one hypervisor
1521 to another with the following three pieces of metadata, which are
1522 encoded in an encapsulation-specific fashion:
1523
1524 · 24-bit logical datapath identifier, from the tunnel_key
1525 column in the OVN Southbound Datapath_Binding table.
1526
1527 · 15-bit logical ingress port identifier. ID 0 is reserved
1528 for internal use within OVN. IDs 1 through 32767, inclu‐
1529 sive, may be assigned to logical ports (see the tun‐
1530 nel_key column in the OVN Southbound Port_Binding table).
1531
1532 · 16-bit logical egress port identifier. IDs 0 through
1533 32767 have the same meaning as for logical ingress ports.
1534 IDs 32768 through 65535, inclusive, may be assigned to
1535 logical multicast groups (see the tunnel_key column in
1536 the OVN Southbound Multicast_Group table).
1537
1538 For hypervisor-to-hypervisor traffic, OVN supports only Geneve and STT
1539 encapsulations, for the following reasons:
1540
1541 · Only STT and Geneve support the large amounts of metadata
1542 (over 32 bits per packet) that OVN uses (as described
1543 above).
1544
              ·      STT and Geneve use randomized UDP or TCP source ports
                     that allow efficient distribution among multiple paths
                     in environments that use ECMP in their underlay.
1548
1549 · NICs are available to offload STT and Geneve encapsula‐
1550 tion and decapsulation.
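
       For reference, each chassis advertises the encapsulation type it uses
       and its tunnel endpoint IP through ovn-controller’s configuration in
       the local Open_vSwitch database; the IP address below is illustrative:

              ovs-vsctl set Open_vSwitch . \
                  external-ids:ovn-encap-type=geneve \
                  external-ids:ovn-encap-ip=192.168.0.11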
1551
1552 Due to its flexibility, the preferred encapsulation between hypervisors
1553 is Geneve. For Geneve encapsulation, OVN transmits the logical datapath
1554 identifier in the Geneve VNI. OVN transmits the logical ingress and
1555 logical egress ports in a TLV with class 0x0102, type 0x80, and a
1556 32-bit value encoded as follows, from MSB to LSB:
1557
1558 1 15 16
1559 +---+------------+-----------+
1560 |rsv|ingress port|egress port|
1561 +---+------------+-----------+
1562 0
1563
1564
1565 Environments whose NICs lack Geneve offload may prefer STT encapsula‐
1566 tion for performance reasons. For STT encapsulation, OVN encodes all
1567 three pieces of logical metadata in the STT 64-bit tunnel ID as fol‐
1568 lows, from MSB to LSB:
1569
1570 9 15 16 24
1571 +--------+------------+-----------+--------+
1572 |reserved|ingress port|egress port|datapath|
1573 +--------+------------+-----------+--------+
1574 0
1575
1576
1577 For connecting to gateways, in addition to Geneve and STT, OVN supports
1578 VXLAN, because only VXLAN support is common on top-of-rack (ToR)
1579 switches. Currently, gateways have a feature set that matches the capa‐
1580 bilities as defined by the VTEP schema, so fewer bits of metadata are
1581 necessary. In the future, gateways that do not support encapsulations
1582 with large amounts of metadata may continue to have a reduced feature
1583 set.
1584
1585
1586
1587Open vSwitch 2.12.0 OVN Architecture ovn-architecture(7)