Resource Management Daemon (RMD)

Build Status ⟨https://travis-ci.org/intel/rmd⟩
Go Report Card ⟨https://goreportcard.com/report/github.com/intel/rmd⟩
GoDoc ⟨https://godoc.org/github.com/intel/rmd⟩
______________________________________________________________________________


Resource Management Daemon (RMD) is a system daemon running on generic
Linux platforms. Its purpose is to provide a central, uniform interface
for hardware resource management tasks on x86 platforms.

______________________________________________________________________________


RMD manages Intel RDT
⟨https://www.intel.com/content/www/us/en/architecture-and-technology/resource-director-technology.html⟩
resources as a first step. In the current release, Cache Allocation
Technology (CAT) is supported. The CAT hardware feature is exposed to
software through a number of Model Specific Registers (MSRs) and is
supported by several software layers (e.g., libpqos and the resctrl
file system). The advantages of RMD are:

· User-friendly API: Most (if not all) of the alternative ways to use
  RDT resources involve manipulating bit masks, whereas RMD offers a
  user-friendly RESTful API: end users only need to specify the amount
  of desired resources and a few other attributes, and RMD converts
  that quantity into the corresponding bit masks correctly and
  automatically.

· System-level awareness: One system may (quite possibly, in a
  hyper-converged deployment) host several software entities such as
  OpenStack, Kubernetes, Ceph, and so on. Each of these entities may
  have built-in support for RDT resources, but none has a system-level
  view of all the competitors for RDT resources, so coordination is
  lacking. Through RMD, these entities can collaborate in resource
  consumption; RMD can act as a system-level resource orchestrator.

· Built-in intelligence: Though not supported yet, machine learning is
  one of the attractive features on the RMD roadmap. It will provide
  intelligence to automatically adjust resource usage according to
  user-defined policies and the pressure of resource contention.

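The bit-mask conversion described above can be sketched as follows. This is an illustrative helper, not RMD's actual code: the function name, the free-mask representation, and the first-fit strategy are all assumptions.

```python
def ways_to_cbm(num_ways, free_mask, total_ways=11):
    """Turn a requested number of cache ways into a contiguous
    capacity bit mask (CBM), first-fit within `free_mask`.
    Returns None when no contiguous run of free ways exists."""
    if num_ways <= 0 or num_ways > total_ways:
        return None
    run = (1 << num_ways) - 1              # e.g. 3 ways -> 0b111
    for shift in range(total_ways - num_ways + 1):
        candidate = run << shift
        if candidate & free_mask == candidate:   # every way in the run is free
            return candidate
    return None

# On an 11-way system where ways 0 and 1 are already taken:
print(bin(ways_to_cbm(3, 0b11111111100)))  # -> 0b11100
```

The point of the API is exactly this division of labor: the user asks for "3 cache ways" and the daemon worries about where those ways land in the mask.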
Cache Pools/Groups
RMD divides the system L3 cache into the following groups, or pools.
Each task on an RMD-enabled system falls into one of these groups,
explicitly or implicitly. A workload describes a group of tasks with
the same cache attributes.

· OS group: This is the default cache group; any newly spawned task on
  the system is put into it unless specified otherwise. Tasks in this
  group all share the cache ways allocated to the group but do not
  share/overlap with cache ways in other groups.

· Infra group: The infrastructure group. Tasks allocating cache ways
  from this group share cache ways with all of the other groups except
  the OS group. This group is intended for infrastructure software
  that provides common facilities to all of the workloads; an example
  would be the virtual switch software that connects all the virtual
  machines in the system.

· Guaranteed group: Workloads allocating cache ways from this group
  get their guaranteed amount of desired cache ways. Cache ways in
  this group are dedicated to their associated workloads and not
  shared with others, except the infra group.

· Best effort group: Workloads allocating cache ways from this group
  have their minimal amount of desired cache ways guaranteed but can
  burst to their maximum amount whenever possible. Cache ways in this
  group are also dedicated to their associated workloads and not
  shared with others, except the infra group.

· Shared group: Workloads allocating cache ways from the shared group
  share the whole amount of cache ways assigned to the group.

The amount of cache ways for each of the above groups is configurable
in the RMD configuration file. As an example, consider a system of 11
cache ways divided among these pools.

Cache Specification
Please refer to the API documentation ⟨docs/api/v1/swagger.yaml⟩ for a
comprehensive description of the RMD APIs. Here is a brief depiction of
how workloads are assigned to the aforementioned cache pools.


The OS group is the default group: if a task or workload is not
explicitly moved to another group, it stays in the OS group.


Tasks in the infra group are pre-configured in the configuration file.
No API is provided to assign a task to the infra group dynamically.


End users state their cache requirements by specifying two values, max
and min, in the Cache section associated with the workload:

· max == min > 0    ==> guaranteed group

· max > min > 0     ==> best effort group

· max == min == 0   ==> shared group


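As a minimal sketch, the max/min-to-pool mapping above can be written out directly (the function name is illustrative; RMD's own validation lives in its policy engine):

```python
def cache_pool(max_ways, min_ways):
    """Map a workload's Cache max/min request to its pool,
    following the rules above."""
    if max_ways == min_ways == 0:
        return "shared"
    if max_ways == min_ways > 0:
        return "guaranteed"
    if max_ways > min_ways > 0:
        return "best effort"
    # Anything else (e.g. min > max, or max > min == 0) is not a
    # combination the rules above define.
    raise ValueError("unsupported max/min combination")

print(cache_pool(4, 4))  # -> guaranteed
print(cache_pool(4, 2))  # -> best effort
print(cache_pool(0, 0))  # -> shared
```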
From a logical point of view, RMD consists of several components:


· HTTPS server -- provides mutual (client and server) authentication
  and traffic encryption

· RESTful API provider -- accepts and sanitizes user requirements

· Policy engine -- decides whether to enforce or reject a user
  requirement based on system resource status

· Resctrl filesystem interface -- interacts with the kernel resctrl
  interface to enforce user requirements

From a physical point of view, RMD is composed of two processes -- the
front-end and the back-end. The split into two processes is for
security reasons. The front-end process, which does most of the work,
runs as a normal user (least privilege), whereas the back-end process
runs as a privileged user because it has to modify the resctrl file
system. The back-end process is deliberately kept as small and simple
as possible; logic is added to the back-end only when elevated
privilege is definitely needed. The front-end and back-end communicate
via an anonymous pipe.

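The two-process pattern can be illustrated with a generic privilege-separation sketch in Python. This is not RMD's actual wire protocol -- just the anonymous-pipe arrangement the text describes, with a fork standing in for the two daemon processes (Unix only).

```python
import os

# Two anonymous pipes: one carries requests down to the privileged
# side, the other carries replies back.
req_r, req_w = os.pipe()
rsp_r, rsp_w = os.pipe()

pid = os.fork()
if pid == 0:
    # Child plays the back-end: in RMD this side runs privileged and
    # is kept deliberately small -- read a request, act, reply.
    os.close(req_w)
    os.close(rsp_r)
    request = os.read(req_r, 1024)
    os.write(rsp_w, b"OK: " + request)
    os._exit(0)

# Parent plays the front-end: least privilege, does most of the work.
os.close(req_r)
os.close(rsp_w)
os.write(req_w, b"alloc 3 cache ways")
os.close(req_w)
reply = os.read(rsp_r, 1024).decode()
os.waitpid(pid, 0)
print(reply)  # -> OK: alloc 3 cache ways
```

Keeping the privileged side down to "read request, act, reply" is what makes the least-privilege argument credible.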

For more information on the design and architecture, please refer to
the developers guide ⟨docs/DeveloperQuickStart.md⟩.

Please refer to the API documentation ⟨docs/api/v1/swagger.yaml⟩ for a
comprehensive description of the RMD APIs. This section introduces the
API entry points and the rationale behind them.


/cache entry point
This entry point and its sub-categories provide system cache
information, so only the "GET" method is accepted.


/workloads entry point
Through the "/workloads" entry point you can specify a workload by CPU
IDs and/or task IDs, and specify the workload's cache demand in one of
two ways. The first is to specify the Cache "max"/"min" values
explicitly, as described above. The second is to associate the
workload with one of the pre-defined "policies" (see the "/policy"
entry point below); each pre-defined policy translates into
pre-defined max/min values.


/hospitality entry point
The reason behind the "/hospitality" entry point is that there is
often a need to know how well a host can fulfill a certain cache
allocation requirement. This need usually arises from scheduling in a
large cluster deployment, so the notion of a "hospitality score" is
introduced.


Why can't the available cache amount do the job? Currently the last
level cache on Intel platforms can only be allocated contiguously, so
the total amount of available last level cache is misleading due to
fragmentation.


The hospitality score is calculated differently for workloads of
different cache groups. (In the explanation below, 'value' means the
largest contiguous run of available cache ways in the corresponding
group.)


· guaranteed group: if value > max_cache then return 100, else
  return 0

· best effort group: if value > max_cache then return 100; if
  min_cache < value < max_cache then return (value/max_cache)*100; if
  value < min_cache then return 0

· shared group: return 100 if the current workload number in the
  shared group is less than max_allowed_shared


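The scoring rules above can be written out as a small function. This is an illustrative sketch: the parameter names are taken from the bullets, and the boundary case value == max_cache, which the text leaves implicit, is treated here as a full fit.

```python
def hospitality_score(group, value, max_cache=0, min_cache=0,
                      workloads=0, max_allowed_shared=0):
    """`value` is the largest contiguous run of available cache
    ways in the corresponding group."""
    if group == "guaranteed":
        return 100 if value >= max_cache else 0
    if group == "best effort":
        if value >= max_cache:
            return 100
        if value >= min_cache:
            return int(value / max_cache * 100)  # partial fit
        return 0
    if group == "shared":
        return 100 if workloads < max_allowed_shared else 0
    raise ValueError("unknown group: " + group)

print(hospitality_score("guaranteed", 5, max_cache=4))                # -> 100
print(hospitality_score("best effort", 3, max_cache=4, min_cache=2))  # -> 75
print(hospitality_score("shared", 0, workloads=2, max_allowed_shared=4))  # -> 100
```

Note how the guaranteed group is all-or-nothing while the best effort group degrades proportionally between min and max -- that is what lets a scheduler rank hosts rather than just filter them.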
/policy entry point
The "/policy" entry point contains the pre-defined recommended cache
usage values for the specific platform this RMD instance is running
on. Though completely configurable, the default policies are defined
as "Gold/Silver/Bronze" to classify different service levels. API
users can get the policies and associate workloads with one of them.


Configuration guide ⟨docs/ConfigurationGuide.md⟩

API Documentation ⟨docs/api/v1/swagger.yaml⟩

Users guide ⟨docs/UserGuide.md⟩

Developers guide ⟨docs/DeveloperQuickStart.md⟩