SYSTEMD.RESOURCE-CONTROL(5)  systemd.resource-control  SYSTEMD.RESOURCE-CONTROL(5)

NAME
systemd.resource-control - Resource control unit settings

SYNOPSIS
slice.slice, scope.scope, service.service, socket.socket, mount.mount,
swap.swap

DESCRIPTION
Unit configuration files for services, slices, scopes, sockets, mount
points, and swap devices share a subset of configuration options for
resource control of spawned processes. Internally, this relies on the
Linux Control Groups (cgroups) kernel concept for organizing processes
in a hierarchical tree of named groups for the purpose of resource
management.

This man page lists the configuration options shared by those six unit
types. See systemd.unit(5) for the common options of all unit
configuration files, and systemd.slice(5), systemd.scope(5),
systemd.service(5), systemd.socket(5), systemd.mount(5), and
systemd.swap(5) for more information on the specific unit configuration
files. The resource control configuration options are configured in the
[Slice], [Scope], [Service], [Socket], [Mount], or [Swap] sections,
depending on the unit type.

In addition, options which control resources available to programs
executed by systemd are listed in systemd.exec(5). Those options
complement options listed here.

See the New Control Group Interfaces[1] for an introduction on how to
make use of resource control APIs from programs.

IMPLICIT DEPENDENCIES
The following dependencies are implicitly added:

· Units with a Slice= setting automatically acquire Requires= and
  After= dependencies on the specified slice unit.

UNIFIED AND LEGACY CONTROL GROUP HIERARCHIES
The unified control group hierarchy is the newer version of the kernel
control group interface; see Control Groups v2[2]. Depending on the
resource type, there are differences in resource control capabilities.
Also, because of interface changes, some resource types have a separate
set of options on the unified hierarchy.

CPU
    CPUWeight= and StartupCPUWeight= replace CPUShares= and
    StartupCPUShares=, respectively.

    The "cpuacct" controller does not exist separately on the unified
    hierarchy.

Memory
    MemoryMax= replaces MemoryLimit=. MemoryLow= and MemoryHigh= are
    effective only on the unified hierarchy.

IO
    "IO"-prefixed settings are a superset of and replace
    "BlockIO"-prefixed ones. On the unified hierarchy, IO resource
    control also applies to buffered writes.

To ease the transition, there is best-effort translation between the
two versions of settings. For each controller, if any of the settings
for the unified hierarchy are present, all settings for the legacy
hierarchy are ignored. If the resulting settings are for the other type
of hierarchy, the configurations are translated before application.

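As a sketch of this precedence rule, consider a hypothetical service
that carries both a legacy and a unified memory limit (the values are
assumptions chosen for illustration):

    [Service]
    MemoryLimit=1G
    MemoryMax=2G

Because MemoryMax= is a setting for the unified hierarchy, MemoryLimit=
is ignored for the memory controller. On a system running the legacy
hierarchy, the MemoryMax= value would then be translated, on a
best-effort basis, before being applied.
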
The legacy control group hierarchy (see Control Groups version 1[3]),
also called cgroup-v1, doesn't allow safe delegation of controllers to
unprivileged processes. If the system uses the legacy control group
hierarchy, resource control is disabled for the systemd user instance;
see systemd(1).

OPTIONS
Units of the types listed above can have settings for resource control
configuration:

CPUAccounting=
    Turn on CPU usage accounting for this unit. Takes a boolean
    argument. Note that turning on CPU accounting for one unit will
    also implicitly turn it on for all units contained in the same
    slice and for all its parent slices and the units contained
    therein. The system default for this setting may be controlled with
    DefaultCPUAccounting= in systemd-system.conf(5).

CPUWeight=weight, StartupCPUWeight=weight
    Assign the specified CPU time weight to the processes executed, if
    the unified control group hierarchy is used on the system. These
    options take an integer value and control the "cpu.weight" control
    group attribute. The allowed range is 1 to 10000. Defaults to 100.
    For details about this control group attribute, see Control Groups
    v2[2] and CFS Scheduler[4]. The available CPU time is split up
    among all units within one slice relative to their CPU time weight.

    While StartupCPUWeight= only applies to the startup phase of the
    system, CPUWeight= applies to normal runtime of the system and, if
    the former is not set, also to the startup phase. Using
    StartupCPUWeight= allows prioritizing specific services at boot-up
    differently than during normal runtime.

    These settings replace CPUShares= and StartupCPUShares=.

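    To illustrate how weights divide CPU time, consider two
    hypothetical services that are the only busy units in the same
    slice. The unit names and values are assumptions made for this
    sketch:

        # batch.service (hypothetical)
        [Service]
        CPUWeight=100

        # frontend.service (hypothetical)
        [Service]
        CPUWeight=300
        StartupCPUWeight=600

    When both units compete for CPU time during normal runtime, the
    weights sum to 400, so batch.service is entitled to roughly
    100/400 = 25% and frontend.service to roughly 300/400 = 75% of the
    CPU time available to the slice. During the startup phase the ratio
    would be 100 to 600 instead. Weights only matter under contention;
    an otherwise idle system leaves either unit free to use more.
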
CPUQuota=
    Assign the specified CPU time quota to the processes executed.
    Takes a percentage value, suffixed with "%". The percentage
    specifies how much CPU time the unit shall get at maximum, relative
    to the total CPU time available on one CPU. Use values > 100% for
    allotting CPU time on more than one CPU. This controls the
    "cpu.max" attribute on the unified control group hierarchy and
    "cpu.cfs_quota_us" on legacy. For details about these control group
    attributes, see Control Groups v2[2] and sched-bwc.txt[5].

    Example: CPUQuota=20% ensures that the executed processes will
    never get more than 20% CPU time on one CPU.

CPUQuotaPeriodSec=
    Assign the duration over which the CPU time quota specified by
    CPUQuota= is measured. Takes a time duration value in seconds, with
    an optional suffix such as "ms" for milliseconds (or "s" for
    seconds). The default setting is 100ms. The period is clamped to
    the range supported by the kernel, which is [1ms, 1000ms].
    Additionally, the period is adjusted up so that the quota interval
    is also at least 1ms. Setting CPUQuotaPeriodSec= to an empty value
    resets it to the default.

    This controls the second field of the "cpu.max" attribute on the
    unified control group hierarchy and "cpu.cfs_period_us" on legacy.
    For details about these control group attributes, see Control
    Groups v2[2] and CFS Scheduler[4].

    Example: CPUQuotaPeriodSec=10ms requests that the CPU quota be
    measured in periods of 10ms.

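    The interaction between the two settings can be worked through
    with a small sketch. The values below are assumptions chosen for
    illustration:

        [Service]
        CPUQuota=20%
        CPUQuotaPeriodSec=10ms

    Here the processes may run for 20% of each 10ms period, that is
    for 2ms per period. On the unified hierarchy, where "cpu.max" holds
    a quota followed by a period, both in microseconds, this would be
    expected to correspond to a value of "2000 10000". With the default
    period of 100ms, the same quota would correspond to 20ms per
    period.
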
AllowedCPUs=
    Restrict processes to be executed on specific CPUs. Takes a list of
    CPU indices or ranges separated by either whitespace or commas. CPU
    ranges are specified by the lower and upper CPU indices separated
    by a dash.

    Setting AllowedCPUs= doesn't guarantee that all of the CPUs will be
    used by the processes as it may be limited by parent units. The
    effective configuration is reported as EffectiveCPUs=.

    This setting is supported only with the unified control group
    hierarchy.

AllowedMemoryNodes=
    Restrict processes to be executed on specific memory NUMA nodes.
    Takes a list of memory NUMA node indices or ranges separated by
    either whitespace or commas. Memory NUMA node ranges are specified
    by the lower and upper node indices separated by a dash.

    Setting AllowedMemoryNodes= doesn't guarantee that all of the
    memory NUMA nodes will be used by the processes as it may be
    limited by parent units. The effective configuration is reported as
    EffectiveMemoryNodes=.

    This setting is supported only with the unified control group
    hierarchy.

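    A minimal sketch of the list syntax shared by AllowedCPUs= and
    AllowedMemoryNodes=, assuming a machine that has at least nine
    CPUs and one NUMA node:

        [Service]
        AllowedCPUs=0-3,8
        AllowedMemoryNodes=0

    This requests CPUs 0, 1, 2, 3 and 8 and memory NUMA node 0. The
    set that is actually granted may be smaller if a parent unit is
    more restrictive.
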
MemoryAccounting=
    Turn on process and kernel memory accounting for this unit. Takes a
    boolean argument. Note that turning on memory accounting for one
    unit will also implicitly turn it on for all units contained in the
    same slice and for all its parent slices and the units contained
    therein. The system default for this setting may be controlled with
    DefaultMemoryAccounting= in systemd-system.conf(5).

MemoryMin=bytes
    Specify the memory usage protection of the executed processes in
    this unit. If the memory usages of this unit and all its ancestors
    are below their minimum boundaries, this unit's memory won't be
    reclaimed.

    Takes a memory size in bytes. If the value is suffixed with K, M, G
    or T, the specified memory size is parsed as Kilobytes, Megabytes,
    Gigabytes, or Terabytes (with the base 1024), respectively.
    Alternatively, a percentage value may be specified, which is taken
    relative to the installed physical memory on the system. If
    assigned the special value "infinity", all available memory is
    protected, which may be useful in order to always inherit all of
    the protection afforded by ancestors. This controls the
    "memory.min" control group attribute. For details about this
    control group attribute, see Memory Interface Files[6].

    This setting is supported only if the unified control group
    hierarchy is used and disables MemoryLimit=.

    Units may have their children use a default "memory.min" value by
    specifying DefaultMemoryMin=, which has the same semantics as
    MemoryMin=. This setting does not affect "memory.min" in the unit
    itself.

MemoryLow=bytes
    Specify the best-effort memory usage protection of the executed
    processes in this unit. If the memory usages of this unit and all
    its ancestors are below their low boundaries, this unit's memory
    won't be reclaimed as long as memory can be reclaimed from
    unprotected units.

    Takes a memory size in bytes. If the value is suffixed with K, M, G
    or T, the specified memory size is parsed as Kilobytes, Megabytes,
    Gigabytes, or Terabytes (with the base 1024), respectively.
    Alternatively, a percentage value may be specified, which is taken
    relative to the installed physical memory on the system. If
    assigned the special value "infinity", all available memory is
    protected, which may be useful in order to always inherit all of
    the protection afforded by ancestors. This controls the
    "memory.low" control group attribute. For details about this
    control group attribute, see Memory Interface Files[6].

    This setting is supported only if the unified control group
    hierarchy is used and disables MemoryLimit=.

    Units may have their children use a default "memory.low" value by
    specifying DefaultMemoryLow=, which has the same semantics as
    MemoryLow=. This setting does not affect "memory.low" in the unit
    itself.

MemoryHigh=bytes
    Specify the throttling limit on memory usage of the executed
    processes in this unit. Memory usage may go above the limit if
    unavoidable, but the processes are heavily slowed down and memory
    is taken away aggressively in such cases. This is the main
    mechanism to control memory usage of a unit.

    Takes a memory size in bytes. If the value is suffixed with K, M, G
    or T, the specified memory size is parsed as Kilobytes, Megabytes,
    Gigabytes, or Terabytes (with the base 1024), respectively.
    Alternatively, a percentage value may be specified, which is taken
    relative to the installed physical memory on the system. If
    assigned the special value "infinity", no memory throttling is
    applied. This controls the "memory.high" control group attribute.
    For details about this control group attribute, see Memory
    Interface Files[6].

    This setting is supported only if the unified control group
    hierarchy is used and disables MemoryLimit=.

MemoryMax=bytes
    Specify the absolute limit on memory usage of the executed
    processes in this unit. If memory usage cannot be contained under
    the limit, the out-of-memory killer is invoked inside the unit. It
    is recommended to use MemoryHigh= as the main control mechanism and
    use MemoryMax= as the last line of defense.

    Takes a memory size in bytes. If the value is suffixed with K, M, G
    or T, the specified memory size is parsed as Kilobytes, Megabytes,
    Gigabytes, or Terabytes (with the base 1024), respectively.
    Alternatively, a percentage value may be specified, which is taken
    relative to the installed physical memory on the system. If
    assigned the special value "infinity", no memory limit is applied.
    This controls the "memory.max" control group attribute. For details
    about this control group attribute, see Memory Interface Files[6].

    This setting replaces MemoryLimit=.

MemorySwapMax=bytes
    Specify the absolute limit on swap usage of the executed processes
    in this unit.

    Takes a swap size in bytes. If the value is suffixed with K, M, G
    or T, the specified swap size is parsed as Kilobytes, Megabytes,
    Gigabytes, or Terabytes (with the base 1024), respectively. If
    assigned the special value "infinity", no swap limit is applied.
    This controls the "memory.swap.max" control group attribute. For
    details about this control group attribute, see Memory Interface
    Files[6].

    This setting is supported only if the unified control group
    hierarchy is used and disables MemoryLimit=.

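    The memory settings are typically combined. The following sketch
    layers them for a hypothetical service on the unified hierarchy;
    all values are assumptions chosen for illustration:

        [Service]
        MemoryAccounting=yes
        MemoryLow=256M
        MemoryHigh=768M
        MemoryMax=1G
        MemorySwapMax=512M

    Read from top to bottom: up to 256M is protected on a best-effort
    basis, throttling starts above 768M, and 1G acts as the hard limit
    at which the out-of-memory killer is invoked. Because the suffixes
    use base 1024, MemoryMax=1G corresponds to 1073741824 bytes.
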
TasksAccounting=
    Turn on task accounting for this unit. Takes a boolean argument. If
    enabled, the system manager will keep track of the number of tasks
    in the unit. The number of tasks accounted this way includes both
    kernel threads and userspace processes, with each thread counting
    individually. Note that turning on tasks accounting for one unit
    will also implicitly turn it on for all units contained in the same
    slice and for all its parent slices and the units contained
    therein. The system default for this setting may be controlled with
    DefaultTasksAccounting= in systemd-system.conf(5).

TasksMax=N
    Specify the maximum number of tasks that may be created in the
    unit. This ensures that the number of tasks accounted for the unit
    (see above) stays below a specific limit. This either takes an
    absolute number of tasks or a percentage value that is taken
    relative to the configured maximum number of tasks on the system.
    If assigned the special value "infinity", no tasks limit is
    applied. This controls the "pids.max" control group attribute. For
    details about this control group attribute, see Process Number
    Controller[7].

    The system default for this setting may be controlled with
    DefaultTasksMax= in systemd-system.conf(5).

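    A short sketch of the two accepted forms of TasksMax=, with values
    that are assumptions chosen for illustration:

        [Service]
        TasksAccounting=yes
        # Absolute limit on the number of tasks
        TasksMax=512

    To express the limit relative to the configured system-wide
    maximum instead, a percentage such as TasksMax=10% may be used in
    place of the absolute number. Since each thread counts
    individually, heavily threaded programs may need a higher limit
    than their process count suggests.
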
IOAccounting=
    Turn on Block I/O accounting for this unit, if the unified control
    group hierarchy is used on the system. Takes a boolean argument.
    Note that turning on block I/O accounting for one unit will also
    implicitly turn it on for all units contained in the same slice and
    for all its parent slices and the units contained therein. The
    system default for this setting may be controlled with
    DefaultIOAccounting= in systemd-system.conf(5).

    This setting replaces BlockIOAccounting= and disables settings
    prefixed with BlockIO or StartupBlockIO.

IOWeight=weight, StartupIOWeight=weight
    Set the default overall block I/O weight for the executed
    processes, if the unified control group hierarchy is used on the
    system. Takes a single weight value (between 1 and 10000) to set
    the default block I/O weight. This controls the "io.weight" control
    group attribute, which defaults to 100. For details about this
    control group attribute, see IO Interface Files[8]. The available
    I/O bandwidth is split up among all units within one slice relative
    to their block I/O weight.

    While StartupIOWeight= only applies to the startup phase of the
    system, IOWeight= applies to the later runtime of the system and,
    if the former is not set, also to the startup phase. This allows
    prioritizing specific services at boot-up differently than during
    runtime.

    These settings replace BlockIOWeight= and StartupBlockIOWeight= and
    disable settings prefixed with BlockIO or StartupBlockIO.

IODeviceWeight=device weight
    Set the per-device overall block I/O weight for the executed
    processes, if the unified control group hierarchy is used on the
    system. Takes a space-separated pair of a file path and a weight
    value to specify the device specific weight value, between 1 and
    10000. (Example: "/dev/sda 1000"). The file path may be specified
    as path to a block device node or as any other file, in which case
    the backing block device of the file system of the file is
    determined. This controls the "io.weight" control group attribute,
    which defaults to 100. Use this option multiple times to set
    weights for multiple devices. For details about this control group
    attribute, see IO Interface Files[8].

    This setting replaces BlockIODeviceWeight= and disables settings
    prefixed with BlockIO or StartupBlockIO.

    The specified device node should reference a block device that has
    an I/O scheduler associated, i.e. should not refer to partition or
    loopback block devices, but to the originating, physical device.
    When a path to a regular file or directory is specified, an attempt
    is made to discover the correct originating device backing the
    file system of the specified path. This works correctly only for
    simpler cases, where the file system is directly placed on a
    partition or physical block device, or where simple 1:1 encryption
    using dm-crypt/LUKS is used. This discovery does not cover complex
    storage and in particular RAID and volume management storage
    devices.

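    A brief sketch combining the default and the per-device weight,
    reusing the /dev/sda path from the example above. The values are
    assumptions chosen for illustration:

        [Service]
        IOWeight=200
        IODeviceWeight=/dev/sda 1000

    With this fragment the unit uses a weight of 200 on block devices
    in general and a weight of 1000 on /dev/sda. Further
    IODeviceWeight= lines may be added for other devices.
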
IOReadBandwidthMax=device bytes, IOWriteBandwidthMax=device bytes
    Set the per-device overall block I/O bandwidth maximum limit for
    the executed processes, if the unified control group hierarchy is
    used on the system. This limit is not work-conserving and the
    executed processes are not allowed to use more even if the device
    has idle capacity. Takes a space-separated pair of a file path and
    a bandwidth value (in bytes per second) to specify the device
    specific bandwidth. The file path may be a path to a block device
    node, or any other file, in which case the backing block device
    of the file system of the file is used. If the bandwidth is
    suffixed with K, M, G, or T, the specified bandwidth is parsed as
    Kilobytes, Megabytes, Gigabytes, or Terabytes, respectively, to the
    base of 1000. (Example:
    "/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 5M"). This
    controls the "io.max" control group attribute. Use this option
    multiple times to set bandwidth limits for multiple devices. For
    details about this control group attribute, see IO Interface
    Files[8].

    These settings replace BlockIOReadBandwidth= and
    BlockIOWriteBandwidth= and disable settings prefixed with BlockIO
    or StartupBlockIO.

    Similar restrictions on block device discovery as for
    IODeviceWeight= apply, see above.

IOReadIOPSMax=device IOPS, IOWriteIOPSMax=device IOPS
    Set the per-device overall block I/O IOs-Per-Second maximum limit
    for the executed processes, if the unified control group hierarchy
    is used on the system. This limit is not work-conserving and the
    executed processes are not allowed to use more even if the device
    has idle capacity. Takes a space-separated pair of a file path and
    an IOPS value to specify the device specific IOPS. The file path
    may be a path to a block device node, or any other file, in which
    case the backing block device of the file system of the file is
    used. If the IOPS is suffixed with K, M, G, or T, the specified
    IOPS is parsed as KiloIOPS, MegaIOPS, GigaIOPS, or TeraIOPS,
    respectively, to the base of 1000. (Example:
    "/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 1K"). This
    controls the "io.max" control group attribute. Use this option
    multiple times to set IOPS limits for multiple devices. For details
    about this control group attribute, see IO Interface Files[8].

    These settings are supported only if the unified control group
    hierarchy is used and disable settings prefixed with BlockIO or
    StartupBlockIO.

    Similar restrictions on block device discovery as for
    IODeviceWeight= apply, see above.

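    The bandwidth and IOPS limits can be set side by side. The sketch
    below uses /dev/sda as a stand-in device path, and the values are
    assumptions chosen for illustration:

        [Service]
        IOReadBandwidthMax=/dev/sda 5M
        IOWriteBandwidthMax=/dev/sda 2M
        IOReadIOPSMax=/dev/sda 1K

    Following the base-1000 parsing described above, 5M stands for
    5000000 bytes per second and 1K for 1000 I/O operations per
    second. As these limits are not work-conserving, the unit stays
    capped even when the device is otherwise idle.
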
IODeviceLatencyTargetSec=device target
    Set the per-device average target I/O latency for the executed
    processes, if the unified control group hierarchy is used on the
    system. Takes a file path and a timespan separated by a space to
    specify the device specific latency target. (Example: "/dev/sda
    25ms"). The file path may be specified as path to a block device
    node or as any other file, in which case the backing block device
    of the file system of the file is determined. This controls the
    "io.latency" control group attribute. Use this option multiple
    times to set latency targets for multiple devices. For details
    about this control group attribute, see IO Interface Files[8].

    Implies "IOAccounting=yes".

    This setting is supported only if the unified control group
    hierarchy is used.

    Similar restrictions on block device discovery as for
    IODeviceWeight= apply, see above.

IPAccounting=
    Takes a boolean argument. If true, turns on IPv4 and IPv6 network
    traffic accounting for packets sent or received by the unit. When
    this option is turned on, all IPv4 and IPv6 sockets created by any
    process of the unit are accounted for.

    When this option is used in socket units, it applies to all IPv4
    and IPv6 sockets associated with it (including both listening and
    connection sockets where this applies). Note that for
    socket-activated services, this configuration setting and the
    accounting data of the service unit and the socket unit are kept
    separate, and displayed separately. No propagation of the setting
    and the collected statistics is done, in either direction.
    Moreover, any traffic sent or received on any of the socket unit's
    sockets is accounted to the socket unit, and never to the service
    unit it might have activated, even if the socket is used by it.

    The system default for this setting may be controlled with
    DefaultIPAccounting= in systemd-system.conf(5).

IPAddressAllow=ADDRESS[/PREFIXLENGTH]...,
IPAddressDeny=ADDRESS[/PREFIXLENGTH]...
    Turn on address range network traffic filtering for IP packets sent
    and received over AF_INET and AF_INET6 sockets. Both directives
    take a space-separated list of IPv4 or IPv6 addresses, each
    optionally suffixed with an address prefix length in bits
    (separated by a "/" character). If the latter is omitted, the
    address is considered a host address, i.e. the prefix covers the
    whole address (32 for IPv4, 128 for IPv6).

    The access lists configured with this option are applied to all
    sockets created by processes of this unit (or in the case of socket
    units, associated with it). The lists are implicitly combined with
    any lists configured for any of the parent slice units this unit
    might be a member of. By default all access lists are empty. Both
    ingress and egress traffic is filtered by these settings. In case
    of ingress traffic the source IP address is checked against these
    access lists, in case of egress traffic the destination IP address
    is checked. When configured, the lists are enforced as follows:

    · Access will be granted in case an IP packet's
      destination/source address matches any entry in the
      IPAddressAllow= setting.

    · Otherwise, access will be denied in case its destination/source
      address matches any entry in the IPAddressDeny= setting.

    · Otherwise, access will be granted.

    In order to implement a whitelisting IP firewall, it is recommended
    to use an IPAddressDeny=any setting on an upper-level slice unit
    (such as the root slice -.slice or the slice containing all system
    services system.slice; see systemd.special(7) for details on these
    slice units), plus individual per-service IPAddressAllow= lines
    permitting network access to relevant services, and only them.

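    The whitelisting setup recommended above might be sketched as a
    pair of drop-in files. The file names, the example.service unit,
    and the 192.0.2.0/24 documentation range are assumptions made for
    illustration:

        # /etc/systemd/system/system.slice.d/10-ip-deny.conf
        [Slice]
        IPAddressDeny=any

        # /etc/systemd/system/example.service.d/10-ip-allow.conf
        [Service]
        IPAddressAllow=localhost
        IPAddressAllow=192.0.2.0/24

    Applying the three rules in order: a packet exchanged between
    example.service and 192.0.2.7 matches the allow list and is
    granted. A packet to or from any other non-loopback address misses
    the allow list, matches "any" in the deny list inherited from
    system.slice, and is denied. Other services in system.slice that
    have no allow list of their own are denied all IP traffic.
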
    Note that for socket-activated services, the IP access list
    configured on the socket unit applies to all sockets associated
    with it directly, but not to any sockets created by the ultimately
    activated services for it. Conversely, the IP access list
    configured for the service is not applied to any sockets passed
    into the service via socket activation. Thus, it is usually a good
    idea to replicate the IP access lists on both the socket and the
    service unit. However, it often makes sense to maintain one list
    more open and the other one more restricted, depending on the
    use case.

    If these settings are used multiple times in the same unit, the
    specified lists are combined. If an empty string is assigned to
    these settings, the specific access list is reset and all previous
    settings are undone.

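    The combining and resetting behavior can be traced in a short
    sketch. The addresses are taken from documentation ranges and are
    assumptions chosen for illustration:

        [Service]
        IPAddressAllow=192.0.2.0/24
        IPAddressAllow=198.51.100.7
        # The empty assignment resets the list built up so far
        IPAddressAllow=
        IPAddressAllow=localhost

    The first two lines are combined into one allow list, the empty
    assignment then discards it, and the resulting allow list of this
    unit contains only "localhost". This pattern is useful in drop-in
    files that need to override rather than extend a list.
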
    In place of explicit IPv4 or IPv6 address and prefix length
    specifications, a small set of symbolic names may be used. The
    following names are defined:

    Table 1. Special address/network names
    ┌──────────────┬─────────────────────┬─────────────────────┐
    │Symbolic Name │ Definition          │ Meaning             │
    ├──────────────┼─────────────────────┼─────────────────────┤
    │any           │ 0.0.0.0/0 ::/0      │ Any host            │
    ├──────────────┼─────────────────────┼─────────────────────┤
    │localhost     │ 127.0.0.0/8 ::1/128 │ All addresses on    │
    │              │                     │ the local loopback  │
    ├──────────────┼─────────────────────┼─────────────────────┤
    │link-local    │ 169.254.0.0/16      │ All link-local IP   │
    │              │ fe80::/64           │ addresses           │
    ├──────────────┼─────────────────────┼─────────────────────┤
    │multicast     │ 224.0.0.0/4         │ All IP multicasting │
    │              │ ff00::/8            │ addresses           │
    └──────────────┴─────────────────────┴─────────────────────┘

    Note that these settings might not be supported on some systems
    (for example if eBPF control group support is not enabled in the
    underlying kernel or container manager). These settings will have
    no effect in that case. If compatibility with such systems is
    desired it is hence recommended to not exclusively rely on them for
    IP security.

IPIngressFilterPath=BPF_FS_PROGRAM_PATH,
IPEgressFilterPath=BPF_FS_PROGRAM_PATH
    Add custom network traffic filters implemented as BPF programs,
    applying to all IP packets sent and received over AF_INET and
    AF_INET6 sockets. Takes an absolute path to a pinned BPF program in
    the BPF virtual filesystem (/sys/fs/bpf/).

    The filters configured with this option are applied to all sockets
    created by processes of this unit (or in the case of socket units,
    associated with it). The filters are loaded in addition to the
    filters of any of the parent slice units this unit might be a
    member of, as well as any IPAddressAllow= and IPAddressDeny=
    filters in any of these units. By default there are no filters
    specified.

    If these settings are used multiple times in the same unit, all the
    specified programs are attached. If an empty string is assigned to
    these settings, the program list is reset and all previously
    specified programs are ignored.

    Note that for socket-activated services, the IP filter programs
    configured on the socket unit apply to all sockets associated with
    it directly, but not to any sockets created by the ultimately
    activated services for it. Conversely, the IP filter programs
    configured for the service are not applied to any sockets passed
    into the service via socket activation. Thus, it is usually a good
    idea to replicate the IP filter programs on both the socket and
    the service unit. However, it often makes sense to maintain one
    configuration more open and the other one more restricted,
    depending on the use case.

    Note that these settings might not be supported on some systems
    (for example if eBPF control group support is not enabled in the
    underlying kernel or container manager). These settings will fail
    the service in that case. If compatibility with such systems is
    desired it is hence recommended to attach your filter manually
    (requires Delegate=yes) instead of using this setting.

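    A minimal sketch of how these settings might be used. It assumes
    that two BPF programs have already been loaded and pinned at the
    paths shown; the program names are hypothetical:

        [Service]
        IPIngressFilterPath=/sys/fs/bpf/example_ingress
        IPEgressFilterPath=/sys/fs/bpf/example_egress

    Loading and pinning the programs is outside the scope of the unit
    file and has to happen before the unit is started.
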
DeviceAllow=
    Control access to specific device nodes by the executed processes.
    Takes two space-separated strings: a device node specifier followed
    by a combination of r, w, m to control reading, writing, or
    creation of the specific device node(s) by the unit (mknod),
    respectively. On cgroup-v1 this controls the "devices.allow"
    control group attribute. For details about this control group
    attribute, see Device Whitelist Controller[9]. In the unified
    cgroup hierarchy this functionality is implemented using eBPF
    filtering.

    The device node specifier is either a path to a device node in the
    file system, starting with /dev/, or a string starting with either
    "char-" or "block-" followed by a device group name, as listed in
    /proc/devices. The latter is useful to whitelist all current and
    future devices belonging to a specific device group at once. The
    device group is matched according to filename globbing rules; you
    may hence use the "*" and "?" wildcards. (Note that such globbing
    wildcards are not available for device node path specifications!)
    In order to match device nodes by numeric major/minor, use device
    node paths in the /dev/char/ and /dev/block/ directories. However,
    matching devices by major/minor is generally not recommended as
    assignments are neither stable nor portable between systems or
    different kernel versions.

    Examples: /dev/sda5 is a path to a device node, referring to an ATA
    or SCSI block device. "char-pts" and "char-alsa" are specifiers
    for all pseudo TTYs and all ALSA sound devices, respectively.
    "char-cpu/*" is a specifier matching all CPU related device groups.

    Note that whitelists defined this way should only reference device
    groups which are resolvable at the time the unit is started. Any
    device groups not resolvable then are not added to the device
    whitelist. In order to work around this limitation, consider
    extending service units with a pair of After=modprobe@xyz.service
    and Wants=modprobe@xyz.service lines that load the necessary kernel
    module implementing the device group if missing. Example:

        ...
        [Unit]
        Wants=modprobe@loop.service
        After=modprobe@loop.service

        [Service]
        DeviceAllow=block-loop
        DeviceAllow=/dev/loop-control
        ...

DevicePolicy=auto|closed|strict
    Control the policy for allowing device access:

    strict
        means to only allow types of access that are explicitly
        specified.

    closed
        in addition, allows access to standard pseudo devices including
        /dev/null, /dev/zero, /dev/full, /dev/random, and /dev/urandom.

    auto
        in addition, allows access to all devices if no explicit
        DeviceAllow= is present. This is the default.

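    DevicePolicy= and DeviceAllow= are usually set together. The
    following sketch reuses the device specifiers from the examples
    above; the choice of access modes is an assumption made for
    illustration:

        [Service]
        DevicePolicy=closed
        DeviceAllow=/dev/sda5 r
        DeviceAllow=char-pts rw

    With this fragment the processes may use the standard pseudo
    devices permitted by "closed", read from /dev/sda5, and read from
    and write to pseudo TTYs. Creating device nodes is not permitted
    for either entry, because "m" is not listed.
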
628 Slice=
629 The name of the slice unit to place the unit in. Defaults to
630 system.slice for all non-instantiated units of all unit types
631 (except for slice units themselves see below). Instance units are
632 by default placed in a subslice of system.slice that is named after
633 the template name.
634
635 This option may be used to arrange systemd units in a hierarchy of
636 slices each of which might have resource settings applied.
637
638 For units of type slice, the only accepted value for this setting
639 is the parent slice. Since the name of a slice unit implies the
640 parent slice, it is hence redundant to ever set this parameter
641 directly for slice units.
642
643 Special care should be taken when relying on the default slice
644 assignment in templated service units that have
645 DefaultDependencies=no set, see systemd.service(5), section
646 "Default Dependencies" for details.

Delegate=
Turns on delegation of further resource control partitioning to
processes of the unit. Units where this is enabled may create and
manage their own private subhierarchy of control groups below the
control group of the unit itself. For unprivileged services (i.e.
those using the User= setting) the unit's control group will be
made accessible to the relevant user. When enabled, the service
manager will refrain from manipulating control groups or moving
processes below the unit's control group, so that a clear concept
of ownership is established: the control group tree above the
unit's control group (i.e. towards the root control group) is owned
and managed by the service manager of the host, while the control
group tree below the unit's control group is owned and managed by
the unit itself.

Takes either a boolean argument or a list of control group
controller names. If true, delegation is turned on, and all
supported controllers are enabled for the unit, making them
available to the unit's processes for management. If false,
delegation is turned off entirely (and no additional controllers
are enabled). If set to a list of controllers, delegation is turned
on, and the specified controllers are enabled for the unit. Note
that controllers other than the ones specified might be made
available as well, depending on the configuration of the containing
slice unit or other units contained in it. Note that assigning the
empty string will enable delegation but reset the list of
controllers, so that all assignments prior to this will have no
effect. Defaults to false.

Note that controller delegation to less privileged code is only
safe on the unified control group hierarchy. Accordingly, access to
the specified controllers will not be granted to unprivileged
services on the legacy hierarchy, even when requested.

The following controller names may be specified: cpu, cpuacct,
cpuset, io, blkio, memory, devices, pids, bpf-firewall, and
bpf-devices.

Not all of these controllers are available on all kernels, however,
and some are specific to the unified hierarchy while others are
specific to the legacy hierarchy. Also note that the kernel might
support further controllers, which aren't covered here yet as
delegation is either not supported at all for them or not defined
cleanly.

For further details on the delegation model consult Control Group
APIs and Delegation[10].
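
A minimal sketch of the list form, assuming a service that runs as a
hypothetical unprivileged user named "example" and manages its own
control group subtree:

```ini
[Service]
# Hypothetical unprivileged user; the unit's control group is made
# accessible to this user.
User=example
# Delegate only the listed controllers. Delegate=yes would enable all
# supported controllers instead.
Delegate=cpu memory pids
```

The controller names are taken from the list above. Which of them are
actually usable depends on the kernel and on the hierarchy in use.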

DisableControllers=
Disables controllers from being enabled for a unit's children. If a
controller listed is already in use in its subtree, the controller
will be removed from the subtree. This can be used to avoid child
units being able to implicitly or explicitly enable a controller.
Defaults to not disabling any controllers.

It may not be possible to successfully disable a controller if the
unit or any child of the unit in question delegates controllers to
its children, as any delegated subtree of the cgroup hierarchy is
unmanaged by systemd.

Multiple controllers may be specified, separated by spaces. You may
also pass DisableControllers= multiple times, in which case each
new instance adds another controller to disable. Passing
DisableControllers= by itself with no controller name present
resets the disabled controller list.

The following controller names may be specified: cpu, cpuacct,
cpuset, io, blkio, memory, devices, pids, bpf-firewall, and
bpf-devices.
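
The accumulating behaviour described above can be sketched as follows
for a slice unit. The choice of controllers is purely illustrative.

```ini
[Slice]
# Keep child units from enabling these two controllers.
DisableControllers=cpu memory
# A further assignment adds to the list rather than replacing it.
DisableControllers=io
```

After both lines are processed, cpu, memory, and io are disabled for
the children of the slice.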

The following options are deprecated. Use the indicated superseding
options instead:

CPUShares=weight, StartupCPUShares=weight
Assign the specified CPU time share weight to the processes
executed. These options take an integer value and control the
"cpu.shares" control group attribute. The allowed range is 2 to
262144. Defaults to 1024. For details about this control group
attribute, see CFS Scheduler[4]. The available CPU time is split up
among all units within one slice relative to their CPU time share
weight.

While StartupCPUShares= only applies to the startup phase of the
system, CPUShares= applies to normal runtime of the system, and if
the former is not set also to the startup phase. Using
StartupCPUShares= allows prioritizing specific services at boot-up
differently than during normal runtime.

Implies "CPUAccounting=yes".

These settings are deprecated. Use CPUWeight= and StartupCPUWeight=
instead.
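
A sketch of a unit fragment that uses the superseding options. The
values are illustrative only and are not a conversion of any particular
CPUShares= value, since the two settings use different ranges; see the
description of CPUWeight= for the range and default that apply.

```ini
[Service]
# Takes the place of a deprecated CPUShares= assignment.
CPUWeight=200
# Takes the place of a deprecated StartupCPUShares= assignment.
StartupCPUWeight=500
```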

MemoryLimit=bytes
Specify the limit on maximum memory usage of the executed
processes. The limit specifies how much process and kernel memory
can be used by tasks in this unit. Takes a memory size in bytes. If
the value is suffixed with K, M, G or T, the specified memory size
is parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes (with
the base 1024), respectively. Alternatively, a percentage value may
be specified, which is taken relative to the installed physical
memory on the system. If assigned the special value "infinity", no
memory limit is applied. This controls the "memory.limit_in_bytes"
control group attribute. For details about this control group
attribute, see Memory Resource Controller[11].

Implies "MemoryAccounting=yes".

This setting is deprecated. Use MemoryMax= instead.
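
As a sketch of the replacement, the fragment below keeps the deprecated
line as a comment for comparison and sets the superseding option. The
value 512M is an arbitrary example; with the base-1024 suffixes
described above it corresponds to 512 * 1024 * 1024 bytes.

```ini
[Service]
# Deprecated form, shown for comparison only:
#MemoryLimit=512M
# Superseding option with the same illustrative value.
MemoryMax=512M
```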

BlockIOAccounting=
Turn on Block I/O accounting for this unit, if the legacy control
group hierarchy is used on the system. Takes a boolean argument.
Note that turning on block I/O accounting for one unit will also
implicitly turn it on for all units contained in the same slice and
for all its parent slices and the units contained therein. The
system default for this setting may be controlled with
DefaultBlockIOAccounting= in systemd-system.conf(5).

This setting is deprecated. Use IOAccounting= instead.

BlockIOWeight=weight, StartupBlockIOWeight=weight
Set the default overall block I/O weight for the executed
processes, if the legacy control group hierarchy is used on the
system. Takes a single weight value (between 10 and 1000) to set
the default block I/O weight. This controls the "blkio.weight"
control group attribute, which defaults to 500. For details about
this control group attribute, see Block IO Controller[12]. The
available I/O bandwidth is split up among all units within one
slice relative to their block I/O weight.

While StartupBlockIOWeight= only applies to the startup phase of
the system, BlockIOWeight= applies to the later runtime of the
system, and if the former is not set also to the startup phase.
This allows prioritizing specific services at boot-up differently
than during runtime.

Implies "BlockIOAccounting=yes".

These settings are deprecated. Use IOWeight= and StartupIOWeight=
instead.

BlockIODeviceWeight=device weight
Set the per-device overall block I/O weight for the executed
processes, if the legacy control group hierarchy is used on the
system. Takes a space-separated pair of a file path and a weight
value to specify the device specific weight value, between 10 and
1000. (Example: "/dev/sda 500"). The file path may be specified as
a path to a block device node or as any other file, in which case
the backing block device of the file system of the file is
determined. This controls the "blkio.weight_device" control group
attribute, which defaults to 1000. Use this option multiple times
to set weights for multiple devices. For details about this control
group attribute, see Block IO Controller[12].

Implies "BlockIOAccounting=yes".

This setting is deprecated. Use IODeviceWeight= instead.
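
A sketch of the superseding weight options, combining an overall weight
with a per-device weight. The device path reuses the example above, and
all values are illustrative; the ranges documented for IOWeight= and
IODeviceWeight= differ from the legacy ones, so existing values should
not be copied over unchanged.

```ini
[Service]
# Takes the place of BlockIOWeight= and StartupBlockIOWeight=.
IOWeight=200
StartupIOWeight=500
# Takes the place of BlockIODeviceWeight=; a device path and a weight.
IODeviceWeight=/dev/sda 500
```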

BlockIOReadBandwidth=device bytes, BlockIOWriteBandwidth=device bytes
Set the per-device overall block I/O bandwidth limit for the
executed processes, if the legacy control group hierarchy is used
on the system. Takes a space-separated pair of a file path and a
bandwidth value (in bytes per second) to specify the device
specific bandwidth. The file path may be specified as a path to a
block device node or as any other file, in which case the backing
block device of the file system of the file is used. If the
bandwidth is suffixed with K, M, G, or T, the specified bandwidth
is parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes,
respectively, to the base of 1000. (Example:
"/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 5M"). This
controls the "blkio.throttle.read_bps_device" and
"blkio.throttle.write_bps_device" control group attributes. Use
this option multiple times to set bandwidth limits for multiple
devices. For details about these control group attributes, see
Block IO Controller[12].

Implies "BlockIOAccounting=yes".

These settings are deprecated. Use IOReadBandwidthMax= and
IOWriteBandwidthMax= instead.
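
A sketch of the superseding bandwidth options, reusing the device path
from the example above. The limits are arbitrary illustrations, and the
second device path is a placeholder showing that the option may be
repeated.

```ini
[Service]
# Read limit for one device, in bytes per second with a suffix.
IOReadBandwidthMax=/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 5M
# Write limit for the same device.
IOWriteBandwidthMax=/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 2M
# Repeat the option to limit a further (placeholder) device.
IOReadBandwidthMax=/dev/sdb 10M
```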

systemd(1), systemd-system.conf(5), systemd.unit(5),
systemd.service(5), systemd.slice(5), systemd.scope(5),
systemd.socket(5), systemd.mount(5), systemd.swap(5), systemd.exec(5),
systemd.directives(7), systemd.special(7). For the kernel
documentation on control groups and specific controllers, see Control
Groups v2[2].

1. New Control Group Interfaces
https://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/

2. Control Groups v2
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html

3. Control Groups version 1
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/

4. CFS Scheduler
https://www.kernel.org/doc/html/latest/scheduler/sched-design-CFS.html

5. sched-bwc.txt
https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt

6. Memory Interface Files
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory-interface-files

7. Process Number Controller
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/pids.html

8. IO Interface Files
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#io-interface-files

9. Device Whitelist Controller
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/devices.html

10. Control Group APIs and Delegation
https://systemd.io/CGROUP_DELEGATION

11. Memory Resource Controller
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/memory.html

12. Block IO Controller
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/blkio-controller.html


systemd 245 SYSTEMD.RESOURCE-CONTROL(5)