EVS_OVERVIEW(8) Corosync Cluster Engine Programmer's Manual EVS_OVERVIEW(8)

evs_overview - EVS Library Overview

The EVS library is delivered with the corosync project. This library
is used to create distributed applications that operate properly during
partitions, merges, and faults.

The library provides a mechanism to:
    * handle abstraction for multiple instances of an EVS library in
      one application
    * deliver messages
    * deliver configuration changes
    * join one or more groups
    * leave one or more groups
    * send messages to one or more groups
    * send messages to currently joined groups

The EVS library implements a messaging model known as Extended Virtual
Synchrony. This model allows one sender to transmit to many receivers
using standard UDP/IP. UDP/IP is unreliable and unordered, so the EVS
library adds ordering and reliability to messages. Hardware multicast
is used so that a packet addressed to two or more receivers is sent
only once. Erroneous messages are corrected automatically by the
library.

Certain guarantees are provided by the EVS library. These guarantees
are related to message delivery and configuration change delivery.

multicast
       A multicast occurs when a network interface card sends a UDP
       packet to multiple receivers simultaneously.

processor
       A processor is the entity that executes the extended virtual
       synchrony algorithms.

configuration
       A configuration is the current description of the processors
       executing the extended virtual synchrony algorithm.

configuration change
       A configuration change occurs when a new configuration is
       delivered.

partition
       A partition occurs when a configuration splits into two or more
       configurations, or a processor fails or is stopped and leaves
       the configuration.

merge
       A merge occurs when two or more configurations join into a
       larger new configuration. When a new processor starts up, it is
       treated as a configuration with only one processor and a merge
       occurs.

fifo ordering
       A message is FIFO ordered when one sender and one receiver agree
       on the order of the messages sent.

agreed ordering
       A message is AGREED ordered when all processors agree on the
       order of the messages sent.

safe ordering
       A message is SAFE ordered when all processors agree on the order
       of messages sent and those messages are not delivered until all
       processors have a copy of the message to deliver.

virtual synchrony
       Virtual synchrony is obtained when all processors agree on the
       order of messages sent and configuration changes sent for each
       new configuration.

The virtual synchrony messaging model has many benefits for developing
distributed applications. Applications designed around replication
benefit the most. Applications that must be able to partition and
merge also benefit from the virtual synchrony messaging model.

All applications receive a copy of transmitted messages even if there
are errors on the transmission media. This allows optimizations when
every processor must receive a copy of the message for replication.

All messages are ordered according to agreed ordering, which helps
avoid race conditions. Consider a lock service implemented over several
processors. Two requests, A and B, occur at the same time on two
separate processors. The requests are placed in the same order for
every processor and then delivered. If A is ordered first, all
processors receive request A before request B and can reject request
B. Any creation or deletion of a shared data structure can benefit
from this mechanism.
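
The lock service scenario above can be sketched as follows. This is an
illustration only and does not use the EVS API; the lock_request and
lock_replica structures and both functions are hypothetical names
invented for this example. Each processor keeps its own replica and
applies the same deterministic rule to every request it is delivered.
Because agreed ordering delivers the requests in the same order
everywhere, every replica should grant the lock to the same requester.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request record: which processor asked for the lock. */
struct lock_request {
    int requester_id;
};

/* Hypothetical per-processor replica state; -1 means the lock is free. */
struct lock_replica {
    int owner_id;
};

static void replica_init(struct lock_replica *r)
{
    assert(r != NULL);
    r->owner_id = -1;
}

/*
 * Apply one delivered request to the local replica.
 * Returns 1 if the lock was granted to the requester,
 * 0 if the request was rejected because the lock is already held.
 */
static int replica_deliver(struct lock_replica *r,
                           const struct lock_request *req)
{
    assert(r != NULL && req != NULL);
    if (r->owner_id == -1) {
        r->owner_id = req->requester_id;
        return 1;
    }
    return 0;
}
```

If request A and then request B are passed to replica_deliver() on any
processor, A is granted and B is rejected, with no further coordination
between the processors.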

Self delivery ensures that messages that are sent by a processor are
also delivered back to that processor. This allows the processor
sending the message to execute logic when the message is self delivered
according to agreed ordering and the virtual synchrony rules. It also
permits all logic to be placed in one message handler instead of two
separate places.

Virtual synchrony allows the current configuration to be used to make
decisions in partitions and merges. Since the configuration is sent in
the stream of messages to the application, the application can alter
its behavior based upon the configuration changes.
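
One way an application might react to configuration changes is
sketched below. This is an illustration only: the app_state structure,
the on_config_change() function, and the lowest-identifier rule are
assumptions made for this example and are not part of the EVS library.
Because every processor in a configuration is delivered the same member
list at the same point in the message stream, each one can reach the
same decision without exchanging further messages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical application state; ids are assumed to be non-negative. */
struct app_state {
    int my_id;
    int is_coordinator;
};

/*
 * Hypothetical handler run for each delivered configuration change.
 * The member with the lowest id in the new configuration becomes the
 * coordinator; an empty member list leaves nobody in charge.
 */
static void on_config_change(struct app_state *app,
                             const int *members, size_t member_cnt)
{
    int lowest = -1;

    assert(app != NULL);
    for (size_t i = 0; i < member_cnt; i++) {
        if (lowest == -1 || members[i] < lowest)
            lowest = members[i];
    }
    app->is_coordinator = (lowest != -1 && lowest == app->my_id);
}
```

With this rule, a processor with id 2 is not the coordinator while the
configuration is {1, 2, 3}, takes over after a partition leaves it in
{2, 3}, and steps down again once the configurations merge.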

The EVS library is a thin IPC interface to the corosync executive. The
corosync executive provides services for the SA Forum AIS libraries as
well as the EVS library.

The corosync executive uses a ring protocol and membership protocol to
send messages according to the semantics required by extended virtual
synchrony. The ring protocol creates a virtual ring of processors. A
token is rotated around the ring of processors. When the token is
possessed by a processor, that processor may multicast messages to
other processors in the system.

The token is called the ORF token (for ordering, reliability, flow
control). The ORF token orders all messages by increasing a sequence
number every time a message is multicast. In this way, an ordering is
placed on all messages that all processors agree to. The token also
contains a retransmission list. If a token is received by a processor
that has not yet received a message it should have, the sequence number
of that message is added to the retransmission list. A processor that
has a copy of the message then retransmits the message. The ORF token
provides configuration-wide flow control by tracking the number of
messages sent and limiting the number of messages that may be sent by
one processor on each possession of the token.
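
The three roles of the ORF token described above can be sketched as
follows. This is a simplified model, not the corosync executive's
implementation: the orf_token layout, the two constants, and both
function names are assumptions made for this example, and the real
token carries more state than is shown here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RETRANS        8  /* assumed capacity of the retransmission list */
#define MAX_SEND_PER_VISIT 2  /* assumed per-possession flow control limit */

/* Hypothetical, simplified token layout. */
struct orf_token {
    unsigned int seq;                  /* sequence number of the last multicast */
    unsigned int retrans[MAX_RETRANS]; /* sequence numbers reported missing */
    size_t retrans_cnt;
};

/*
 * Ordering and flow control: while holding the token, a processor with
 * `pending` queued messages stamps at most MAX_SEND_PER_VISIT of them
 * with consecutive sequence numbers, stored in out_seqs.
 * Returns the number of messages it may multicast on this possession.
 */
static size_t token_send(struct orf_token *t, size_t pending,
                         unsigned int *out_seqs)
{
    size_t n = pending < MAX_SEND_PER_VISIT ? pending : MAX_SEND_PER_VISIT;

    assert(t != NULL && out_seqs != NULL);
    for (size_t i = 0; i < n; i++) {
        t->seq++;
        out_seqs[i] = t->seq;
    }
    return n;
}

/*
 * Reliability: received[s] is nonzero when message s has arrived at
 * this processor. Every sequence number from 1 to the token's seq that
 * is missing locally, and not already listed, is added to the
 * retransmission list. Returns the number of entries added.
 */
static size_t token_request_missing(struct orf_token *t,
                                    const unsigned char *received,
                                    size_t received_len)
{
    size_t added = 0;

    assert(t != NULL);
    for (unsigned int s = 1; s <= t->seq; s++) {
        int listed = 0;

        if (s < received_len && received[s])
            continue;                  /* this message already arrived */
        for (size_t i = 0; i < t->retrans_cnt; i++) {
            if (t->retrans[i] == s)
                listed = 1;
        }
        if (!listed && t->retrans_cnt < MAX_RETRANS) {
            t->retrans[t->retrans_cnt++] = s;
            added++;
        }
    }
    return added;
}
```

In this model a processor with five queued messages sends only two per
possession of the token, and a processor that missed message 2 adds
that sequence number to the list so that a processor holding a copy can
retransmit it.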

The membership protocol is responsible for ring formation and detecting
when a processor within a ring has failed. If the token fails to make
a rotation within a timeout period known as the token rotation timeout,
the membership protocol will form a new ring. If a new processor
starts, it will also form a new ring. Two or more configurations may
be used to form a new ring, allowing many partitions to merge together
into one new configuration.

The EVS library obtains 8.5 MB/sec throughput on 100 Mbit network links
with many processors. Larger messages obtain better throughput because
the time to access Ethernet is about the same for a small message as it
is for a larger message. Smaller messages obtain a higher message rate
(messages per second), because a small message still takes somewhat
less time to send than a large one.

80% of CPU utilization occurs because of encryption and authentication.
The corosync executive can be built without encryption and
authentication for those with no security requirements and low CPU
utilization requirements. Even without encryption or authentication,
under heavy load, processor utilization can reach 25% on 1.5 GHz
processors.

The current corosync executive supports 16 processors. Support for
more processors is possible by changing defines in the corosync
executive, but this is untested.

The EVS library encrypts all messages sent over the network using the
SOBER-128 stream cipher. The EVS library uses HMAC and SHA1 to
authenticate all messages. The EVS library uses SOBER-128 as a pseudo
random number generator. The EVS library feeds the PRNG using the
/dev/random Linux device.

This software is not yet production quality, so there may still be some
bugs. There appear to be very few, however, since no previously unknown
bugs are being reported at this point.

evs_initialize(3), evs_finalize(3), evs_fd_get(3), evs_dispatch(3),
evs_join(3), evs_leave(3), evs_mcast_joined(3), evs_mcast_groups(3),
evs_membership_get(3), evs_context_get(3), evs_context_set(3)

corosync Man Page 2004-08-31 EVS_OVERVIEW(8)