Resource Management Daemon (RMD)

Build Status ⟨https://travis-ci.org/intel/rmd⟩
Go Report Card ⟨https://goreportcard.com/report/github.com/intel/rmd⟩
GoDoc ⟨https://godoc.org/github.com/intel/rmd⟩
______________________________________________________________________________


Resource Management Daemon (RMD) is a system daemon running on generic
Linux platforms. Its purpose is to provide a central, uniform interface
for hardware resource management tasks on x86 platforms.

______________________________________________________________________________


RMD manages Intel RDT
⟨https://www.intel.com/content/www/us/en/architecture-and-technology/resource-director-technology.html⟩
resources as a first step. In the current release, Cache Allocation
Technology (CAT) is supported. The CAT hardware feature is exposed to
software through a number of Model Specific Registers (MSRs) and is
supported by several software layers (e.g., libpqos and the resctrl
file system). The advantages of RMD are:

· User-friendly API: Most (if not all) of the alternative ways to use
  RDT resources involve manipulating bit masks, whereas RMD offers a
  user-friendly RESTful API: end users only need to specify the amount
  of desired resources and a few other attributes, and RMD converts
  that quantity into the corresponding bit masks correctly and
  automatically.

· System-level awareness: One system may (quite possibly, in a
  hyper-converged deployment) host several software entities such as
  OpenStack, Kubernetes, Ceph, and so on. Each of these entities may
  have built-in support for RDT resources, but none has a system-level
  view of all the competitors for RDT resources, so coordination is
  lacking. Through RMD, these entities can collaborate in resource
  consumption; RMD can act as a system-level resource orchestrator.

· Built-in intelligence: Though not supported yet, machine learning is
  one of the attractive features on the RMD roadmap. It will provide
  intelligence to automatically adjust resource usage according to
  user-defined policies and the pressure of resource contention.

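The bit-mask conversion described above can be sketched as follows. This is an illustrative helper, not RMD's actual code: the function name, the free-mask representation, and the first-fit strategy are all assumptions.

```python
def ways_to_cbm(num_ways, free_mask, total_ways=11):
    """Turn a requested number of cache ways into a contiguous
    capacity bit mask (CBM), first-fit within `free_mask`.
    Returns None when no contiguous run of free ways exists."""
    if num_ways <= 0 or num_ways > total_ways:
        return None
    run = (1 << num_ways) - 1              # e.g. 3 ways -> 0b111
    for shift in range(total_ways - num_ways + 1):
        candidate = run << shift
        if candidate & free_mask == candidate:   # every way in the run is free
            return candidate
    return None

# On an 11-way system where ways 0 and 1 are already taken:
print(bin(ways_to_cbm(3, 0b11111111100)))  # -> 0b11100
```

The point of the API is exactly this division of labor: the user asks for "3 cache ways" and the daemon worries about where those ways land in the mask.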
Cache Pools/Groups
RMD divides the system L3 cache into the following groups, or pools.
Each task on an RMD-enabled system falls into one of these groups,
explicitly or implicitly. A workload describes a group of tasks with
the same cache attributes.

· OS group: This is the default cache group; any newly spawned task on
  the system is put into it unless specified otherwise. Tasks in this
  group all share the cache ways allocated to the group but do not
  share/overlap with cache ways in other groups.

· Infra group: The infrastructure group. Tasks allocating cache ways
  from this group share cache ways with all of the other groups except
  the OS group. This group is intended for infrastructure software
  that provides common facilities to all of the workloads; an example
  would be the virtual switch software that connects all the virtual
  machines in the system.

· Guaranteed group: Workloads allocating cache ways from this group
  get their guaranteed amount of desired cache ways. Cache ways in
  this group are dedicated to their associated workloads and not
  shared with others, except the infra group.

· Best effort group: Workloads allocating cache ways from this group
  have their minimal amount of desired cache ways guaranteed but can
  burst to their maximum amount whenever possible. Cache ways in this
  group are also dedicated to their associated workloads and not
  shared with others, except the infra group.

· Shared group: Workloads allocating cache ways from the shared group
  share the whole amount of cache ways assigned to the group.

The amount of cache ways for each of the above groups is configurable
in the RMD configuration file. As an example, consider a system of 11
cache ways divided among these pools.

Cache Specification
Please refer to the API documentation ⟨docs/api/v1/swagger.yaml⟩ for a
comprehensive description of the RMD APIs. Here is a brief depiction of
how workloads are assigned to the aforementioned cache pools.


The OS group is the default group: if a task or workload is not
explicitly moved to another group, it stays in the OS group.


Tasks in the infra group are pre-configured in the configuration file.
No API is provided to assign a task to the infra group dynamically.


End users state their cache requirements by specifying two values, max
and min, in the Cache section associated with the workload:

· max == min > 0    ==> guaranteed group

· max > min > 0     ==> best effort group

· max == min == 0   ==> shared group


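As a minimal sketch, the max/min-to-pool mapping above can be written out directly (the function name is illustrative; RMD's own validation lives in its policy engine):

```python
def cache_pool(max_ways, min_ways):
    """Map a workload's Cache max/min request to its pool,
    following the rules above."""
    if max_ways == min_ways == 0:
        return "shared"
    if max_ways == min_ways > 0:
        return "guaranteed"
    if max_ways > min_ways > 0:
        return "best effort"
    # Anything else (e.g. min > max, or max > min == 0) is not a
    # combination the rules above define.
    raise ValueError("unsupported max/min combination")

print(cache_pool(4, 4))  # -> guaranteed
print(cache_pool(4, 2))  # -> best effort
print(cache_pool(0, 0))  # -> shared
```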
From a logical point of view, RMD consists of several components:


· HTTPS server -- provides mutual (client and server) authentication
  and traffic encryption

· RESTful API provider -- accepts and sanitizes user requirements

· Policy engine -- decides whether to enforce or reject a user
  requirement based on system resource status

· Resctrl filesystem interface -- interacts with the kernel resctrl
  interface to enforce user requirements

From a physical point of view, RMD is composed of two processes -- the
front-end and the back-end. The split into two processes is for
security reasons. The front-end process, which does most of the work,
runs as a normal user (least privilege), whereas the back-end process
runs as a privileged user because it has to modify the resctrl file
system. The back-end process is deliberately kept as small and simple
as possible; logic is added to the back-end only when elevated
privilege is definitely needed. The front-end and back-end communicate
via an anonymous pipe.

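The two-process pattern can be illustrated with a generic privilege-separation sketch in Python. This is not RMD's actual wire protocol -- just the anonymous-pipe arrangement the text describes, with a fork standing in for the two daemon processes (Unix only).

```python
import os

# Two anonymous pipes: one carries requests down to the privileged
# side, the other carries replies back.
req_r, req_w = os.pipe()
rsp_r, rsp_w = os.pipe()

pid = os.fork()
if pid == 0:
    # Child plays the back-end: in RMD this side runs privileged and
    # is kept deliberately small -- read a request, act, reply.
    os.close(req_w)
    os.close(rsp_r)
    request = os.read(req_r, 1024)
    os.write(rsp_w, b"OK: " + request)
    os._exit(0)

# Parent plays the front-end: least privilege, does most of the work.
os.close(req_r)
os.close(rsp_w)
os.write(req_w, b"alloc 3 cache ways")
os.close(req_w)
reply = os.read(rsp_r, 1024).decode()
os.waitpid(pid, 0)
print(reply)  # -> OK: alloc 3 cache ways
```

Keeping the privileged side down to "read request, act, reply" is what makes the least-privilege argument credible.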

For more information on the design and architecture, please refer to
the developers guide ⟨docs/DeveloperQuickStart.md⟩.

Please refer to the API documentation ⟨docs/api/v1/swagger.yaml⟩ for a
comprehensive description of the RMD APIs. This section introduces the
API entry points and the rationale behind them.


/cache entry point
This entry point and its sub-categories provide system cache
information, so only the "GET" method is accepted.


/workloads entry point
Through the "/workloads" entry point you can specify a workload by CPU
IDs and/or task IDs, and specify the workload's cache demand in one of
two ways. The first is to specify the Cache "max"/"min" values
explicitly, as described above. The second is to associate the
workload with one of the pre-defined "policies" (see the "/policy"
entry point below); each pre-defined policy translates into
pre-defined max/min values.


/hospitality entry point
The reason behind the "/hospitality" entry point is that there is
often a need to know how well a host can fulfill a certain cache
allocation requirement. This need usually arises from scheduling in a
large cluster deployment, so the notion of a "hospitality score" is
introduced.


Why can't the available cache amount do the job? Currently the last
level cache on Intel platforms can only be allocated contiguously, so
the total amount of available last level cache is misleading due to
fragmentation.


The hospitality score is calculated differently for workloads of
different cache groups. (In the explanation below, 'value' means the
largest contiguous run of available cache ways in the corresponding
group.)


· guaranteed group: if value > max_cache then return 100, else
  return 0

· best effort group: if value > max_cache then return 100; if
  min_cache < value < max_cache then return (value/max_cache)*100; if
  value < min_cache then return 0

· shared group: return 100 if the current workload number in the
  shared group is less than max_allowed_shared


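The scoring rules above can be written out as a small function. This is an illustrative sketch: the parameter names are taken from the bullets, and the boundary case value == max_cache, which the text leaves implicit, is treated here as a full fit.

```python
def hospitality_score(group, value, max_cache=0, min_cache=0,
                      workloads=0, max_allowed_shared=0):
    """`value` is the largest contiguous run of available cache
    ways in the corresponding group."""
    if group == "guaranteed":
        return 100 if value >= max_cache else 0
    if group == "best effort":
        if value >= max_cache:
            return 100
        if value >= min_cache:
            return int(value / max_cache * 100)  # partial fit
        return 0
    if group == "shared":
        return 100 if workloads < max_allowed_shared else 0
    raise ValueError("unknown group: " + group)

print(hospitality_score("guaranteed", 5, max_cache=4))                # -> 100
print(hospitality_score("best effort", 3, max_cache=4, min_cache=2))  # -> 75
print(hospitality_score("shared", 0, workloads=2, max_allowed_shared=4))  # -> 100
```

Note how the guaranteed group is all-or-nothing while the best effort group degrades proportionally between min and max -- that is what lets a scheduler rank hosts rather than just filter them.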
/policy entry point
The "/policy" entry point contains the pre-defined recommended cache
usage values for the specific platform this RMD instance is running
on. Though completely configurable, the default policies are defined
as "Gold/Silver/Bronze" to classify different service levels. API
users can get the policies and associate workloads with one of them.


Configuration guide ⟨docs/ConfigurationGuide.md⟩

API Documentation ⟨docs/api/v1/swagger.yaml⟩

Users guide ⟨docs/UserGuide.md⟩

Developers guide ⟨docs/DeveloperQuickStart.md⟩