SYSTEMD.RESOURCE-CONTROL(5)    systemd.resource-control    SYSTEMD.RESOURCE-CONTROL(5)

NAME
systemd.resource-control - Resource control unit settings

SYNOPSIS
slice.slice, scope.scope, service.service, socket.socket, mount.mount,
swap.swap

DESCRIPTION
13 Unit configuration files for services, slices, scopes, sockets, mount
14 points, and swap devices share a subset of configuration options for
15 resource control of spawned processes. Internally, this relies on the
16 Linux Control Groups (cgroups) kernel concept for organizing processes
17 in a hierarchical tree of named groups for the purpose of resource
18 management.
19
20 This man page lists the configuration options shared by those six unit
21 types. See systemd.unit(5) for the common options of all unit
22 configuration files, and systemd.slice(5), systemd.scope(5),
23 systemd.service(5), systemd.socket(5), systemd.mount(5), and
24 systemd.swap(5) for more information on the specific unit configuration
25 files. The resource control configuration options are configured in the
26 [Slice], [Scope], [Service], [Socket], [Mount], or [Swap] sections,
27 depending on the unit type.
28
29 In addition, options which control resources available to programs
30 executed by systemd are listed in systemd.exec(5). Those options
31 complement options listed here.
32
33 See the New Control Group Interfaces[1] for an introduction on how to
34 make use of resource control APIs from programs.
35
36 Setting resource controls for a group of related units
37 As described in systemd.unit(5), the settings listed here may be set
38 through the main file of a unit and drop-in snippets in *.d/
39 directories. The list of directories searched for drop-ins includes
40 names formed by repeatedly truncating the unit name after all dashes.
41 This is particularly convenient to set resource limits for a group of
42 units with similar names.
43
For example, every user gets their own slice user-nnn.slice. Local
configuration that affects user 1000 may be placed in the unit file
/etc/systemd/system/user-1000.slice, in drop-ins matching
/etc/systemd/system/user-1000.slice.d/*.conf, but also in
/etc/systemd/system/user-.slice.d/*.conf. This last directory applies
to all user slices.
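
As a sketch of the last case, a drop-in such as the following would
apply to every user slice. The file name 50-limits.conf and the values
are illustrative assumptions, not defaults or recommendations:

    # /etc/systemd/system/user-.slice.d/50-limits.conf (hypothetical name)
    [Slice]
    # Cap each user slice at 2G of memory and 4096 tasks
    MemoryMax=2G
    TasksMax=4096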
50
IMPLICIT DEPENDENCIES
The following dependencies are implicitly added:
53
54 • Units with the Slice= setting set automatically acquire Requires=
55 and After= dependencies on the specified slice unit.
56
UNIFIED AND LEGACY CONTROL GROUP HIERARCHIES
The unified control group hierarchy is the new version of the kernel
control group interface; see Control Groups v2[2]. Depending on the
resource type, there are differences in resource control capabilities.
Also, because of interface changes, some resource types have a separate
set of options on the unified hierarchy.
63
64 CPU
65 CPUWeight= and StartupCPUWeight= replace CPUShares= and
66 StartupCPUShares=, respectively.
67
68 The "cpuacct" controller does not exist separately on the unified
69 hierarchy.
70
71 Memory
72 MemoryMax= replaces MemoryLimit=. MemoryLow= and MemoryHigh= are
73 effective only on unified hierarchy.
74
75 IO
76 "IO"-prefixed settings are a superset of and replace
77 "BlockIO"-prefixed ones. On unified hierarchy, IO resource control
78 also applies to buffered writes.
79
80 To ease the transition, there is best-effort translation between the
81 two versions of settings. For each controller, if any of the settings
82 for the unified hierarchy are present, all settings for the legacy
83 hierarchy are ignored. If the resulting settings are for the other type
84 of hierarchy, the configurations are translated before application.
85
86 Legacy control group hierarchy (see Control Groups version 1[3]), also
87 called cgroup-v1, doesn't allow safe delegation of controllers to
88 unprivileged processes. If the system uses the legacy control group
89 hierarchy, resource control is disabled for the systemd user instance,
90 see systemd(1).
91
OPTIONS
Units of the types listed above can have settings for resource control
configuration:
95
96 CPUAccounting=
97 Turn on CPU usage accounting for this unit. Takes a boolean
98 argument. Note that turning on CPU accounting for one unit will
99 also implicitly turn it on for all units contained in the same
100 slice and for all its parent slices and the units contained
101 therein. The system default for this setting may be controlled with
102 DefaultCPUAccounting= in systemd-system.conf(5).
103
104 CPUWeight=weight, StartupCPUWeight=weight
105 Assign the specified CPU time weight to the processes executed, if
106 the unified control group hierarchy is used on the system. These
107 options take an integer value and control the "cpu.weight" control
108 group attribute. The allowed range is 1 to 10000. Defaults to 100.
109 For details about this control group attribute, see Control Groups
110 v2[2] and CFS Scheduler[4]. The available CPU time is split up
111 among all units within one slice relative to their CPU time weight.
112
113 While StartupCPUWeight= only applies to the startup phase of the
114 system, CPUWeight= applies to normal runtime of the system, and if
115 the former is not set also to the startup phase. Using
116 StartupCPUWeight= allows prioritizing specific services at boot-up
117 differently than during normal runtime.
118
119 These settings replace CPUShares= and StartupCPUShares=.
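
A minimal sketch of relative weighting, assuming two hypothetical
services, important.service and background.service, that are placed in
the same slice. When both compete for CPU time, it would be split
roughly 4:1 between them (weights 400 and 100):

    # important.service (hypothetical unit)
    [Service]
    CPUWeight=400
    # Use an even higher weight only during the startup phase
    StartupCPUWeight=1000

    # background.service (hypothetical unit)
    [Service]
    CPUWeight=100

The weights only express a ratio between sibling units; they do not
reserve or cap CPU time by themselves.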
120
121 CPUQuota=
122 Assign the specified CPU time quota to the processes executed.
123 Takes a percentage value, suffixed with "%". The percentage
124 specifies how much CPU time the unit shall get at maximum, relative
125 to the total CPU time available on one CPU. Use values > 100% for
126 allotting CPU time on more than one CPU. This controls the
127 "cpu.max" attribute on the unified control group hierarchy and
128 "cpu.cfs_quota_us" on legacy. For details about these control group
129 attributes, see Control Groups v2[2] and sched-bwc.txt[5].
130
131 Example: CPUQuota=20% ensures that the executed processes will
132 never get more than 20% CPU time on one CPU.
133
134 CPUQuotaPeriodSec=
Assign the duration over which the CPU time quota specified by
CPUQuota= is measured. Takes a time duration value in seconds, with
an optional suffix such as "ms" for milliseconds (or "s" for
seconds). The default setting is 100ms. The period is clamped to
the range supported by the kernel, which is [1ms, 1000ms].
Additionally, the period is adjusted up so that the quota interval
is also at least 1ms. Setting CPUQuotaPeriodSec= to an empty value
resets it to the default.
143
144 This controls the second field of "cpu.max" attribute on the
145 unified control group hierarchy and "cpu.cfs_period_us" on legacy.
146 For details about these control group attributes, see Control
147 Groups v2[2] and CFS Scheduler[4].
148
Example: CPUQuotaPeriodSec=10ms requests that the CPU quota be
measured in periods of 10ms.
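
To illustrate how CPUQuota= and CPUQuotaPeriodSec= combine, the
fragment below is a sketch with arbitrarily chosen values. Assuming the
quota is the given percentage of the period, 150% of a 10ms period is
15ms of CPU time per period, which on the unified hierarchy would be
expected to appear as "15000 10000" (microseconds) in "cpu.max":

    [Service]
    # Up to one and a half CPUs' worth of time ...
    CPUQuota=150%
    # ... measured over 10ms periods instead of the default 100ms
    CPUQuotaPeriodSec=10ms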
151
152 AllowedCPUs=
153 Restrict processes to be executed on specific CPUs. Takes a list of
154 CPU indices or ranges separated by either whitespace or commas. CPU
155 ranges are specified by the lower and upper CPU indices separated
156 by a dash.
157
158 Setting AllowedCPUs= doesn't guarantee that all of the CPUs will be
159 used by the processes as it may be limited by parent units. The
160 effective configuration is reported as EffectiveCPUs=.
161
162 This setting is supported only with the unified control group
163 hierarchy.
164
165 AllowedMemoryNodes=
Restrict processes to be executed on specific memory NUMA nodes.
Takes a list of memory NUMA node indices or ranges separated by
either whitespace or commas. Memory NUMA node ranges are specified
by the lower and upper NUMA node indices separated by a dash.
170
171 Setting AllowedMemoryNodes= doesn't guarantee that all of the
172 memory NUMA nodes will be used by the processes as it may be
173 limited by parent units. The effective configuration is reported as
174 EffectiveMemoryNodes=.
175
176 This setting is supported only with the unified control group
177 hierarchy.
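
A short sketch of the list syntax shared by AllowedCPUs= and
AllowedMemoryNodes=. The indices are arbitrary examples and assume a
machine that actually has these CPUs and NUMA nodes:

    [Service]
    # CPUs 0 through 3, plus CPU 8
    AllowedCPUs=0-3,8
    # Memory NUMA nodes 0 and 1
    AllowedMemoryNodes=0-1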
178
179 MemoryAccounting=
180 Turn on process and kernel memory accounting for this unit. Takes a
181 boolean argument. Note that turning on memory accounting for one
182 unit will also implicitly turn it on for all units contained in the
183 same slice and for all its parent slices and the units contained
184 therein. The system default for this setting may be controlled with
185 DefaultMemoryAccounting= in systemd-system.conf(5).
186
187 MemoryMin=bytes, MemoryLow=bytes
Specify the memory usage protection of the executed processes in
this unit. When reclaiming memory, the unit is treated as if it were
using less memory, so that memory is preferentially reclaimed from
unprotected units. Using MemoryLow= results in a weaker protection
where memory may still be reclaimed to avoid invoking the OOM killer
in case there is no other reclaimable memory.
195
196 For a protection to be effective, it is generally required to set a
197 corresponding allocation on all ancestors, which is then
198 distributed between children (with the exception of the root
199 slice). Any MemoryMin= or MemoryLow= allocation that is not
200 explicitly distributed to specific children is used to create a
201 shared protection for all children. As this is a shared protection,
202 the children will freely compete for the memory.
203
204 Takes a memory size in bytes. If the value is suffixed with K, M, G
205 or T, the specified memory size is parsed as Kilobytes, Megabytes,
206 Gigabytes, or Terabytes (with the base 1024), respectively.
207 Alternatively, a percentage value may be specified, which is taken
208 relative to the installed physical memory on the system. If
209 assigned the special value "infinity", all available memory is
210 protected, which may be useful in order to always inherit all of
211 the protection afforded by ancestors. This controls the
212 "memory.min" or "memory.low" control group attribute. For details
213 about this control group attribute, see Memory Interface Files[6].
214
215 This setting is supported only if the unified control group
216 hierarchy is used and disables MemoryLimit=.
217
Units may have their children use a default "memory.min" or
"memory.low" value by specifying DefaultMemoryMin= or
DefaultMemoryLow=, which have the same semantics as MemoryMin= and
MemoryLow=. These settings do not affect "memory.min" or
"memory.low" in the unit itself. Using them to set a default child
allocation is only useful on kernels older than 5.7, which do not
support the "memory_recursiveprot" cgroup2 mount option.
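
A sketch of a protection that is set on an ancestor and then
distributed to a child, assuming a hypothetical database.slice that
contains a hypothetical db.service:

    # database.slice (hypothetical unit)
    [Slice]
    MemoryLow=2G

    # db.service (hypothetical unit)
    [Service]
    Slice=database.slice
    MemoryLow=1G

Following the rules above, 1G of the slice's protection is assigned to
db.service, and the remaining 1G forms a shared protection for which
all children of database.slice compete.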
225
226 MemoryHigh=bytes
227 Specify the throttling limit on memory usage of the executed
228 processes in this unit. Memory usage may go above the limit if
229 unavoidable, but the processes are heavily slowed down and memory
230 is taken away aggressively in such cases. This is the main
231 mechanism to control memory usage of a unit.
232
233 Takes a memory size in bytes. If the value is suffixed with K, M, G
234 or T, the specified memory size is parsed as Kilobytes, Megabytes,
235 Gigabytes, or Terabytes (with the base 1024), respectively.
236 Alternatively, a percentage value may be specified, which is taken
237 relative to the installed physical memory on the system. If
238 assigned the special value "infinity", no memory throttling is
239 applied. This controls the "memory.high" control group attribute.
240 For details about this control group attribute, see Memory
241 Interface Files[6].
242
243 This setting is supported only if the unified control group
244 hierarchy is used and disables MemoryLimit=.
245
246 MemoryMax=bytes
247 Specify the absolute limit on memory usage of the executed
248 processes in this unit. If memory usage cannot be contained under
249 the limit, out-of-memory killer is invoked inside the unit. It is
250 recommended to use MemoryHigh= as the main control mechanism and
251 use MemoryMax= as the last line of defense.
252
253 Takes a memory size in bytes. If the value is suffixed with K, M, G
254 or T, the specified memory size is parsed as Kilobytes, Megabytes,
255 Gigabytes, or Terabytes (with the base 1024), respectively.
256 Alternatively, a percentage value may be specified, which is taken
257 relative to the installed physical memory on the system. If
258 assigned the special value "infinity", no memory limit is applied.
259 This controls the "memory.max" control group attribute. For details
260 about this control group attribute, see Memory Interface Files[6].
261
262 This setting replaces MemoryLimit=.
263
264 MemorySwapMax=bytes
265 Specify the absolute limit on swap usage of the executed processes
266 in this unit.
267
268 Takes a swap size in bytes. If the value is suffixed with K, M, G
269 or T, the specified swap size is parsed as Kilobytes, Megabytes,
270 Gigabytes, or Terabytes (with the base 1024), respectively. If
271 assigned the special value "infinity", no swap limit is applied.
272 This controls the "memory.swap.max" control group attribute. For
273 details about this control group attribute, see Memory Interface
274 Files[6].
275
276 This setting is supported only if the unified control group
277 hierarchy is used and disables MemoryLimit=.
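
The memory limits described above might be combined as in the
following sketch. The values are arbitrary; the byte counts in the
comments follow from the base-1024 suffixes:

    [Service]
    # Throttle and reclaim above 768M (768 * 1024 * 1024 = 805306368 bytes)
    MemoryHigh=768M
    # Absolute limit at 1G (1024 * 1024 * 1024 = 1073741824 bytes)
    MemoryMax=1G
    # Do not let the unit's processes use swap
    MemorySwapMax=0

Keeping MemoryHigh= below MemoryMax= matches the recommendation to
treat MemoryMax= as the last line of defense.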
278
279 TasksAccounting=
280 Turn on task accounting for this unit. Takes a boolean argument. If
281 enabled, the system manager will keep track of the number of tasks
282 in the unit. The number of tasks accounted this way includes both
283 kernel threads and userspace processes, with each thread counting
284 individually. Note that turning on tasks accounting for one unit
285 will also implicitly turn it on for all units contained in the same
286 slice and for all its parent slices and the units contained
287 therein. The system default for this setting may be controlled with
288 DefaultTasksAccounting= in systemd-system.conf(5).
289
290 TasksMax=N
291 Specify the maximum number of tasks that may be created in the
292 unit. This ensures that the number of tasks accounted for the unit
293 (see above) stays below a specific limit. This either takes an
294 absolute number of tasks or a percentage value that is taken
295 relative to the configured maximum number of tasks on the system.
296 If assigned the special value "infinity", no tasks limit is
297 applied. This controls the "pids.max" control group attribute. For
298 details about this control group attribute, see Process Number
299 Controller[7].
300
301 The system default for this setting may be controlled with
302 DefaultTasksMax= in systemd-system.conf(5).
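
A minimal sketch of a task limit, with an arbitrarily chosen value:

    [Service]
    TasksAccounting=yes
    # At most 512 tasks (processes and threads) in this unit
    TasksMax=512
    # A percentage of the system-wide maximum is also accepted, e.g.:
    # TasksMax=10%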
303
304 IOAccounting=
Turn on block I/O accounting for this unit, if the unified control
group hierarchy is used on the system. Takes a boolean argument.
Note that turning on block I/O accounting for one unit will also
implicitly turn it on for all units contained in the same slice and
for all its parent slices and the units contained therein. The
system default for this setting may be controlled with
DefaultIOAccounting= in systemd-system.conf(5).
312
313 This setting replaces BlockIOAccounting= and disables settings
314 prefixed with BlockIO or StartupBlockIO.
315
316 IOWeight=weight, StartupIOWeight=weight
317 Set the default overall block I/O weight for the executed
318 processes, if the unified control group hierarchy is used on the
319 system. Takes a single weight value (between 1 and 10000) to set
320 the default block I/O weight. This controls the "io.weight" control
321 group attribute, which defaults to 100. For details about this
322 control group attribute, see IO Interface Files[8]. The available
323 I/O bandwidth is split up among all units within one slice relative
324 to their block I/O weight.
325
326 While StartupIOWeight= only applies to the startup phase of the
327 system, IOWeight= applies to the later runtime of the system, and
328 if the former is not set also to the startup phase. This allows
329 prioritizing specific services at boot-up differently than during
330 runtime.
331
332 These settings replace BlockIOWeight= and StartupBlockIOWeight= and
333 disable settings prefixed with BlockIO or StartupBlockIO.
334
335 IODeviceWeight=device weight
336 Set the per-device overall block I/O weight for the executed
337 processes, if the unified control group hierarchy is used on the
338 system. Takes a space-separated pair of a file path and a weight
339 value to specify the device specific weight value, between 1 and
340 10000. (Example: "/dev/sda 1000"). The file path may be specified
341 as path to a block device node or as any other file, in which case
342 the backing block device of the file system of the file is
343 determined. This controls the "io.weight" control group attribute,
344 which defaults to 100. Use this option multiple times to set
345 weights for multiple devices. For details about this control group
346 attribute, see IO Interface Files[8].
347
348 This setting replaces BlockIODeviceWeight= and disables settings
349 prefixed with BlockIO or StartupBlockIO.
350
351 The specified device node should reference a block device that has
352 an I/O scheduler associated, i.e. should not refer to partition or
353 loopback block devices, but to the originating, physical device.
354 When a path to a regular file or directory is specified it is
355 attempted to discover the correct originating device backing the
356 file system of the specified path. This works correctly only for
357 simpler cases, where the file system is directly placed on a
358 partition or physical block device, or where simple 1:1 encryption
359 using dm-crypt/LUKS is used. This discovery does not cover complex
360 storage and in particular RAID and volume management storage
361 devices.
362
363 IOReadBandwidthMax=device bytes, IOWriteBandwidthMax=device bytes
Set the per-device overall block I/O bandwidth maximum limit for
the executed processes, if the unified control group hierarchy is
used on the system. This limit is not work-conserving and the
executed processes are not allowed to use more even if the device
has idle capacity. Takes a space-separated pair of a file path and
a bandwidth value (in bytes per second) to specify the device
specific bandwidth. The file path may be a path to a block device
node, or any other file, in which case the backing block device
of the file system of the file is used. If the bandwidth is
suffixed with K, M, G, or T, the specified bandwidth is parsed as
Kilobytes, Megabytes, Gigabytes, or Terabytes, respectively, to the
base of 1000. (Example:
"/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 5M"). This
controls the "io.max" control group attribute. Use this option
multiple times to set bandwidth limits for multiple devices. For
details about this control group attribute, see IO Interface
Files[8].
381
382 These settings replace BlockIOReadBandwidth= and
383 BlockIOWriteBandwidth= and disable settings prefixed with BlockIO
384 or StartupBlockIO.
385
386 Similar restrictions on block device discovery as for
387 IODeviceWeight= apply, see above.
388
389 IOReadIOPSMax=device IOPS, IOWriteIOPSMax=device IOPS
Set the per-device overall block I/O IOs-Per-Second maximum limit
for the executed processes, if the unified control group hierarchy
is used on the system. This limit is not work-conserving and the
executed processes are not allowed to use more even if the device
has idle capacity. Takes a space-separated pair of a file path and
an IOPS value to specify the device specific IOPS. The file path
may be a path to a block device node, or any other file, in which
case the backing block device of the file system of the file is
used. If the IOPS is suffixed with K, M, G, or T, the specified
IOPS is parsed as KiloIOPS, MegaIOPS, GigaIOPS, or TeraIOPS,
respectively, to the base of 1000. (Example:
"/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 1K"). This
controls the "io.max" control group attribute. Use this option
multiple times to set IOPS limits for multiple devices. For details
about this control group attribute, see IO Interface Files[8].
405
406 These settings are supported only if the unified control group
407 hierarchy is used and disable settings prefixed with BlockIO or
408 StartupBlockIO.
409
410 Similar restrictions on block device discovery as for
411 IODeviceWeight= apply, see above.
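
The I/O settings above might be combined as in the following sketch.
The device /dev/sda and all values are illustrative assumptions; the
figures in the comments follow the base-1000 suffix rule stated above:

    [Service]
    # Default weight for all devices, relative to sibling units
    IOWeight=200
    # Higher weight for one specific device
    IODeviceWeight=/dev/sda 500
    # 5M is parsed as 5 * 1000 * 1000 = 5000000 bytes per second
    IOReadBandwidthMax=/dev/sda 5M
    # 1K is parsed as 1000 I/O operations per second
    IOWriteIOPSMax=/dev/sda 1K

Because the maximum limits are not work-conserving, the unit stays
below them even when the device is otherwise idle, whereas the weights
only matter under contention.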
412
413 IODeviceLatencyTargetSec=device target
Set the per-device average target I/O latency for the executed
processes, if the unified control group hierarchy is used on the
system. Takes a file path and a timespan separated by a space to
specify the device specific latency target. (Example: "/dev/sda
25ms"). The file path may be specified as path to a block device
node or as any other file, in which case the backing block device
of the file system of the file is determined. This controls the
"io.latency" control group attribute. Use this option multiple
times to set latency targets for multiple devices. For details about
this control group attribute, see IO Interface Files[8].

Implies "IOAccounting=yes".

This setting is supported only if the unified control group
hierarchy is used.
429
430 Similar restrictions on block device discovery as for
431 IODeviceWeight= apply, see above.
432
433 IPAccounting=
434 Takes a boolean argument. If true, turns on IPv4 and IPv6 network
435 traffic accounting for packets sent or received by the unit. When
436 this option is turned on, all IPv4 and IPv6 sockets created by any
437 process of the unit are accounted for.
438
439 When this option is used in socket units, it applies to all IPv4
440 and IPv6 sockets associated with it (including both listening and
441 connection sockets where this applies). Note that for
442 socket-activated services, this configuration setting and the
443 accounting data of the service unit and the socket unit are kept
444 separate, and displayed separately. No propagation of the setting
445 and the collected statistics is done, in either direction.
446 Moreover, any traffic sent or received on any of the socket unit's
447 sockets is accounted to the socket unit — and never to the service
448 unit it might have activated, even if the socket is used by it.
449
450 The system default for this setting may be controlled with
451 DefaultIPAccounting= in systemd-system.conf(5).
452
453 IPAddressAllow=ADDRESS[/PREFIXLENGTH]...,
454 IPAddressDeny=ADDRESS[/PREFIXLENGTH]...
455 Turn on network traffic filtering for IP packets sent and received
456 over AF_INET and AF_INET6 sockets. Both directives take a space
457 separated list of IPv4 or IPv6 addresses, each optionally suffixed
458 with an address prefix length in bits after a "/" character. If the
459 suffix is omitted, the address is considered a host address, i.e.
460 the filter covers the whole address (32 bits for IPv4, 128 bits for
461 IPv6).
462
463 The access lists configured with this option are applied to all
464 sockets created by processes of this unit (or in the case of socket
465 units, associated with it). The lists are implicitly combined with
466 any lists configured for any of the parent slice units this unit
467 might be a member of. By default both access lists are empty. Both
468 ingress and egress traffic is filtered by these settings. In case
469 of ingress traffic the source IP address is checked against these
470 access lists, in case of egress traffic the destination IP address
471 is checked. The following rules are applied in turn:
472
473 • Access is granted when the checked IP address matches an entry
474 in the IPAddressAllow= list.
475
476 • Otherwise, access is denied when the checked IP address matches
477 an entry in the IPAddressDeny= list.
478
479 • Otherwise, access is granted.
480
In order to implement an allow-listing IP firewall, it is
recommended to use an IPAddressDeny=any setting on an upper-level
slice unit (such as the root slice -.slice or the slice containing
all system services system.slice – see systemd.special(7) for
details on these slice units), plus individual per-service
IPAddressAllow= lines permitting network access to relevant
services, and only them.
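
The allow-listing setup described above can be sketched as follows.
The drop-in file name, the unit web.service, and the documentation
address ranges are hypothetical placeholders:

    # /etc/systemd/system/system.slice.d/10-ip-deny.conf (hypothetical name)
    [Slice]
    IPAddressDeny=any

    # web.service (hypothetical unit)
    [Service]
    # Hosts in these two networks remain reachable
    IPAddressAllow=192.0.2.0/24 2001:db8::/32

Since a match in IPAddressAllow= is checked before IPAddressDeny=,
traffic between web.service and the listed networks is granted, while
other services in system.slice stay denied.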
488
Note that for socket-activated services, the IP access list
configured on the socket unit applies to all sockets associated
with it directly, but not to any sockets created by the ultimately
activated services for it. Conversely, the IP access list
configured for the service is not applied to any sockets passed
into the service via socket activation. Thus, it is usually a good
idea to replicate the IP access lists on both the socket and the
service unit. Nevertheless, it may make sense to maintain one list
more open and the other one more restricted, depending on the
use case.
499
500 If these settings are used multiple times in the same unit the
501 specified lists are combined. If an empty string is assigned to
502 these settings the specific access list is reset and all previous
503 settings undone.
504
505 In place of explicit IPv4 or IPv6 address and prefix length
506 specifications a small set of symbolic names may be used. The
507 following names are defined:
508
Table 1. Special address/network names
┌──────────────┬─────────────────────┬─────────────────────┐
│Symbolic Name │ Definition          │ Meaning             │
├──────────────┼─────────────────────┼─────────────────────┤
│any           │ 0.0.0.0/0 ::/0      │ Any host            │
├──────────────┼─────────────────────┼─────────────────────┤
│localhost     │ 127.0.0.0/8 ::1/128 │ All addresses on    │
│              │                     │ the local loopback  │
├──────────────┼─────────────────────┼─────────────────────┤
│link-local    │ 169.254.0.0/16      │ All link-local IP   │
│              │ fe80::/64           │ addresses           │
├──────────────┼─────────────────────┼─────────────────────┤
│multicast     │ 224.0.0.0/4         │ All IP multicasting │
│              │ ff00::/8            │ addresses           │
└──────────────┴─────────────────────┴─────────────────────┘

524 Note that these settings might not be supported on some systems
525 (for example if eBPF control group support is not enabled in the
526 underlying kernel or container manager). These settings will have
527 no effect in that case. If compatibility with such systems is
528 desired it is hence recommended to not exclusively rely on them for
529 IP security.
530
531 IPIngressFilterPath=BPF_FS_PROGRAM_PATH,
532 IPEgressFilterPath=BPF_FS_PROGRAM_PATH
533 Add custom network traffic filters implemented as BPF programs,
534 applying to all IP packets sent and received over AF_INET and
535 AF_INET6 sockets. Takes an absolute path to a pinned BPF program in
536 the BPF virtual filesystem (/sys/fs/bpf/).
537
538 The filters configured with this option are applied to all sockets
539 created by processes of this unit (or in the case of socket units,
540 associated with it). The filters are loaded in addition to filters
541 any of the parent slice units this unit might be a member of as
542 well as any IPAddressAllow= and IPAddressDeny= filters in any of
543 these units. By default there are no filters specified.
544
545 If these settings are used multiple times in the same unit all the
546 specified programs are attached. If an empty string is assigned to
547 these settings the program list is reset and all previous specified
548 programs ignored.
549
Note that for socket-activated services, the IP filter programs
configured on the socket unit apply to all sockets associated with
it directly, but not to any sockets created by the ultimately
activated services for it. Conversely, the IP filter programs
configured for the service are not applied to any sockets passed
into the service via socket activation. Thus, it is usually a good
idea to replicate the IP filter programs on both the socket and
the service unit. However, it often makes sense to maintain one
configuration more open and the other one more restricted,
depending on the use case.
560
561 Note that these settings might not be supported on some systems
562 (for example if eBPF control group support is not enabled in the
563 underlying kernel or container manager). These settings will fail
564 the service in that case. If compatibility with such systems is
565 desired it is hence recommended to attach your filter manually
566 (requires Delegate=yes) instead of using this setting.
567
568 DeviceAllow=
569 Control access to specific device nodes by the executed processes.
570 Takes two space-separated strings: a device node specifier followed
571 by a combination of r, w, m to control reading, writing, or
572 creation of the specific device node(s) by the unit (mknod),
573 respectively. On cgroup-v1 this controls the "devices.allow"
574 control group attribute. For details about this control group
575 attribute, see Device Whitelist Controller[9]. In the unified
576 cgroup hierarchy this functionality is implemented using eBPF
577 filtering.
578
579 The device node specifier is either a path to a device node in the
580 file system, starting with /dev/, or a string starting with either
581 "char-" or "block-" followed by a device group name, as listed in
582 /proc/devices. The latter is useful to allow-list all current and
583 future devices belonging to a specific device group at once. The
584 device group is matched according to filename globbing rules, you
585 may hence use the "*" and "?" wildcards. (Note that such globbing
586 wildcards are not available for device node path specifications!)
587 In order to match device nodes by numeric major/minor, use device
588 node paths in the /dev/char/ and /dev/block/ directories. However,
589 matching devices by major/minor is generally not recommended as
590 assignments are neither stable nor portable between systems or
591 different kernel versions.
592
593 Examples: /dev/sda5 is a path to a device node, referring to an ATA
594 or SCSI block device. "char-pts" and "char-alsa" are specifiers
595 for all pseudo TTYs and all ALSA sound devices, respectively.
596 "char-cpu/*" is a specifier matching all CPU related device groups.
597
598 Note that allow lists defined this way should only reference device
599 groups which are resolvable at the time the unit is started. Any
600 device groups not resolvable then are not added to the device allow
601 list. In order to work around this limitation, consider extending
602 service units with a pair of After=modprobe@xyz.service and
603 Wants=modprobe@xyz.service lines that load the necessary kernel
604 module implementing the device group if missing. Example:
605
    ...
    [Unit]
    Wants=modprobe@loop.service
    After=modprobe@loop.service

    [Service]
    DeviceAllow=block-loop
    DeviceAllow=/dev/loop-control
    ...
615
616 DevicePolicy=auto|closed|strict
617 Control the policy for allowing device access:
618
619 strict
620 means to only allow types of access that are explicitly
621 specified.
622
623 closed
624 in addition, allows access to standard pseudo devices including
625 /dev/null, /dev/zero, /dev/full, /dev/random, and /dev/urandom.
626
627 auto
628 in addition, allows access to all devices if no explicit
629 DeviceAllow= is present. This is the default.
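
DevicePolicy= and DeviceAllow= are typically used together. The
following sketch reuses the device specifiers from the examples above;
the access modes are arbitrary choices:

    [Service]
    # Standard pseudo devices plus the explicit entries below
    DevicePolicy=closed
    # Read-only access to one block device node
    DeviceAllow=/dev/sda5 r
    # Read and write access to all pseudo TTYs
    DeviceAllow=char-pts rw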
630
631 Slice=
The name of the slice unit to place the unit in. Defaults to
system.slice for all non-instantiated units of all unit types
(except for slice units themselves, see below). Instance units are
by default placed in a subslice of system.slice that is named after
the template name.
637
638 This option may be used to arrange systemd units in a hierarchy of
639 slices each of which might have resource settings applied.
640
641 For units of type slice, the only accepted value for this setting
642 is the parent slice. Since the name of a slice unit implies the
643 parent slice, it is hence redundant to ever set this parameter
644 directly for slice units.
645
646 Special care should be taken when relying on the default slice
647 assignment in templated service units that have
648 DefaultDependencies=no set, see systemd.service(5), section
649 "Default Dependencies" for details.
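
A sketch of a custom slice assignment, using the hypothetical unit
names myapp.service and batch-myapp.slice. Because the name of a slice
implies its parent, batch-myapp.slice would be located below
batch.slice without any Slice= line of its own:

    # myapp.service (hypothetical unit and binary path)
    [Service]
    ExecStart=/usr/bin/myapp
    Slice=batch-myapp.slice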
650
651 Delegate=
652 Turns on delegation of further resource control partitioning to
653 processes of the unit. Units where this is enabled may create and
654 manage their own private subhierarchy of control groups below the
655 control group of the unit itself. For unprivileged services (i.e.
656 those using the User= setting) the unit's control group will be
657 made accessible to the relevant user. When enabled the service
658 manager will refrain from manipulating control groups or moving
659 processes below the unit's control group, so that a clear concept
660 of ownership is established: the control group tree above the
661 unit's control group (i.e. towards the root control group) is owned
662 and managed by the service manager of the host, while the control
663 group tree below the unit's control group is owned and managed by
664 the unit itself. Takes either a boolean argument or a list of
665 control group controller names. If true, delegation is turned on,
666 and all supported controllers are enabled for the unit, making them
667 available to the unit's processes for management. If false,
668 delegation is turned off entirely (and no additional controllers
669 are enabled). If set to a list of controllers, delegation is turned
670 on, and the specified controllers are enabled for the unit. Note
671 that controllers other than the ones specified might be made
672 available as well, depending on configuration of the containing
673 slice unit or other units contained in it. Note that assigning the
674 empty string will enable delegation, but reset the list of
675 controllers; all assignments prior to this will have no effect.
676 Defaults to false.
677
678 Note that controller delegation to less privileged code is only
679 safe on the unified control group hierarchy. Accordingly, access to
680 the specified controllers will not be granted to unprivileged
681 services on the legacy hierarchy, even when requested.
682
683 The following controller names may be specified: cpu, cpuacct,
684 cpuset, io, blkio, memory, devices, pids, bpf-firewall, and
685 bpf-devices.
686
687 Not all of these controllers are available on all kernels however,
688 and some are specific to the unified hierarchy while others are
689 specific to the legacy hierarchy. Also note that the kernel might
690 support further controllers, which aren't covered here yet as
691 delegation is either not supported at all for them or not defined
692 cleanly.
693
694 For further details on the delegation model consult Control Group
695 APIs and Delegation[10].
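
For illustration, a service that manages its own sub-cgroups might
request delegation of a few controllers as sketched below. The user
name and binary path are hypothetical, and which controllers are
actually available depends on the kernel and the hierarchy in use:

    [Service]
    ExecStart=/usr/bin/container-manager
    User=ctmgr
    # Delegate only these controllers; Delegate=yes would request
    # all supported ones instead.
    Delegate=cpu memory pids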
696
697 DisableControllers=
698 Disables controllers from being enabled for a unit's children. If a
699 controller listed is already in use in its subtree, the controller
700 will be removed from the subtree. This can be used to avoid child
701 units being able to implicitly or explicitly enable a controller.
702 Defaults to not disabling any controllers.
703
704 It may not be possible to successfully disable a controller if the
705 unit or any child of the unit in question delegates controllers to
706 its children, as any delegated subtree of the cgroup hierarchy is
707 unmanaged by systemd.
708
709 Multiple controllers may be specified, separated by spaces. You may
710 also pass DisableControllers= multiple times, in which case each
711 new instance adds another controller to disable. Passing
712 DisableControllers= by itself with no controller name present
713 resets the disabled controller list.
714
715 The following controller names may be specified: cpu, cpuacct,
716 cpuset, io, blkio, memory, devices, pids, bpf-firewall, and
717 bpf-devices.
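
A sketch of a drop-in for a hypothetical slice named untrusted.slice
that keeps its children from enabling two controllers. The slice name
and the choice of controllers are assumptions made for illustration:

    [Slice]
    # Each assignment adds to the list of disabled controllers.
    DisableControllers=cpu
    DisableControllers=io
    # Equivalent single-line form: DisableControllers=cpu io
    # An empty assignment in a later drop-in would clear the list:
    # DisableControllers=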
718
719 ManagedOOMSwap=auto|kill, ManagedOOMMemoryPressure=auto|kill
720 Specifies how systemd-oomd.service(8) will act on this unit's
721 cgroups. Defaults to auto.
722
723 When set to kill, systemd-oomd will actively monitor this unit's
724 cgroup metrics to decide whether it needs to act. If the cgroup
725 passes the limits set by oomd.conf(5) or its overrides,
726 systemd-oomd will send a SIGKILL to all of the processes under the
727 chosen candidate cgroup. Note that only descendant cgroups can be
728 eligible candidates for killing; the unit that set its property to
729 kill is not a candidate (unless one of its ancestors set their
730 property to kill). You can find more details on candidates and kill
731 behavior at systemd-oomd.service(8) and oomd.conf(5). A unit with
732 either of these properties set to kill also automatically acquires
733 After= and Wants= dependencies on systemd-oomd.service unless
734 DefaultDependencies=no.
735
736 When set to auto, systemd-oomd will not actively use this cgroup's
737 data for monitoring and detection. However, if an ancestor cgroup
738 has one of these properties set to kill, a unit with auto can still
739 be an eligible candidate for systemd-oomd to act on.
740
741 ManagedOOMMemoryPressureLimit=
742 Overrides the default memory pressure limit set by oomd.conf(5) for
743 this unit (cgroup). Takes a percentage value between 0% and 100%,
744 inclusive. This property is ignored unless
745 ManagedOOMMemoryPressure=kill. Defaults to 0%, which means to use
746 the default set by oomd.conf(5).
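
As a sketch, a slice could opt its descendant cgroups into memory
pressure based killing with a custom limit. The slice name and the 60%
value are arbitrary examples, not recommended settings:

    # workload.slice (hypothetical)
    [Slice]
    # Let systemd-oomd monitor this cgroup and pick candidates
    # among its descendants.
    ManagedOOMMemoryPressure=kill
    # Override the limit from oomd.conf(5) for this unit only.
    ManagedOOMMemoryPressureLimit=60%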
747
748 ManagedOOMPreference=none|avoid|omit
749 Allows deprioritizing or omitting this unit's cgroup as a candidate
750 when systemd-oomd needs to act. Requires support for extended
751 attributes (see xattr(7)) in order to use avoid or omit.
752 Additionally, systemd-oomd will ignore these extended attributes if
753 the unit's cgroup is not owned by the root user.
754
755 If this property is set to avoid, the service manager will convey
756 this to systemd-oomd, which will only select this cgroup if there
757 are no other viable candidates.
758
759 If this property is set to omit, the service manager will convey
760 this to systemd-oomd, which will ignore this cgroup as a candidate
761 and will not perform any actions on it.
762
763 It is recommended to use avoid and omit sparingly, as they can
764 adversely affect systemd-oomd's kill behavior. Also note that these
765 extended attributes are not applied recursively to cgroups under
766 this unit's cgroup.
767
768 Defaults to none which means systemd-oomd will rank this unit's
769 cgroup as defined in systemd-oomd.service(8) and oomd.conf(5).
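
A minimal sketch of deprioritizing a system service as a kill
candidate. The binary path is hypothetical; as noted above, the
setting only takes effect if the unit's cgroup is owned by root:

    [Service]
    ExecStart=/usr/bin/important-daemon
    # Only select this cgroup if no other viable candidate exists.
    ManagedOOMPreference=avoid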
770
771 DEPRECATED OPTIONS
772 The following options are deprecated. Use the indicated superseding
773 options instead:
774
775 CPUShares=weight, StartupCPUShares=weight
776 Assign the specified CPU time share weight to the processes
777 executed. These options take an integer value and control the
778 "cpu.shares" control group attribute. The allowed range is 2 to
779 262144. Defaults to 1024. For details about this control group
780 attribute, see CFS Scheduler[4]. The available CPU time is split up
781 among all units within one slice relative to their CPU time share
782 weight.
783
784 While StartupCPUShares= only applies to the startup phase of the
785 system, CPUShares= applies to normal runtime of the system, and if
786 the former is not set also to the startup phase. Using
787 StartupCPUShares= allows prioritizing specific services at boot-up
788 differently than during normal runtime.
789
790 Implies "CPUAccounting=yes".
791
792 These settings are deprecated. Use CPUWeight= and StartupCPUWeight=
793 instead.
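
A sketch of migrating a unit to the superseding option. The values are
illustrative only and are not an exact conversion: 2048 is twice the
CPUShares= default of 1024, and 200 is twice the CPUWeight= default of
100:

    [Service]
    # Deprecated form:
    # CPUShares=2048
    # Superseding form (value chosen by analogy, see above):
    CPUWeight=200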
794
795 MemoryLimit=bytes
796 Specify the limit on maximum memory usage of the executed
797 processes. The limit specifies how much process and kernel memory
798 can be used by tasks in this unit. Takes a memory size in bytes. If
799 the value is suffixed with K, M, G or T, the specified memory size
800 is parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes (with
801 the base 1024), respectively. Alternatively, a percentage value may
802 be specified, which is taken relative to the installed physical
803 memory on the system. If assigned the special value "infinity", no
804 memory limit is applied. This controls the "memory.limit_in_bytes"
805 control group attribute. For details about this control group
806 attribute, see Memory Resource Controller[11].
807
808 Implies "MemoryAccounting=yes".
809
810 This setting is deprecated. Use MemoryMax= instead.
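
A sketch of the corresponding migration, with an arbitrary example
value of 512M:

    [Service]
    # Deprecated form:
    # MemoryLimit=512M
    # Superseding form:
    MemoryMax=512M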
811
812 BlockIOAccounting=
813 Turn on Block I/O accounting for this unit, if the legacy control
814 group hierarchy is used on the system. Takes a boolean argument.
815 Note that turning on block I/O accounting for one unit will also
816 implicitly turn it on for all units contained in the same slice and
817 for all its parent slices and the units contained therein. The
818 system default for this setting may be controlled with
819 DefaultBlockIOAccounting= in systemd-system.conf(5).
820
821 This setting is deprecated. Use IOAccounting= instead.
822
823 BlockIOWeight=weight, StartupBlockIOWeight=weight
824 Set the default overall block I/O weight for the executed
825 processes, if the legacy control group hierarchy is used on the
826 system. Takes a single weight value (between 10 and 1000) to set
827 the default block I/O weight. This controls the "blkio.weight"
828 control group attribute, which defaults to 500. For details about
829 this control group attribute, see Block IO Controller[12]. The
830 available I/O bandwidth is split up among all units within one
831 slice relative to their block I/O weight.
832
833 While StartupBlockIOWeight= only applies to the startup phase of
834 the system, BlockIOWeight= applies to the later runtime of the
835 system, and if the former is not set also to the startup phase.
836 This allows prioritizing specific services at boot-up differently
837 than during runtime.
838
839 Implies "BlockIOAccounting=yes".
840
841 These settings are deprecated. Use IOWeight= and StartupIOWeight=
842 instead.
843
844 BlockIODeviceWeight=device weight
845 Set the per-device overall block I/O weight for the executed
846 processes, if the legacy control group hierarchy is used on the
847 system. Takes a space-separated pair of a file path and a weight
848 value to specify the device specific weight value, between 10 and
849 1000. (Example: "/dev/sda 500"). The file path may be specified as
850 path to a block device node or as any other file, in which case the
851 backing block device of the file system of the file is determined.
852 This controls the "blkio.weight_device" control group attribute,
853 which defaults to 1000. Use this option multiple times to set
854 weights for multiple devices. For details about this control group
855 attribute, see Block IO Controller[12].
856
857 Implies "BlockIOAccounting=yes".
858
859 This setting is deprecated. Use IODeviceWeight= instead.
860
861 BlockIOReadBandwidth=device bytes, BlockIOWriteBandwidth=device bytes
862 Set the per-device overall block I/O bandwidth limit for the
863 executed processes, if the legacy control group hierarchy is used
864 on the system. Takes a space-separated pair of a file path and a
865 bandwidth value (in bytes per second) to specify the device
866 specific bandwidth. The file path may be specified as a path to a
867 block device node, or as any other file, in which case the backing
868 block device of the file system of the file is used. If the bandwidth is
869 suffixed with K, M, G, or T, the specified bandwidth is parsed as
870 Kilobytes, Megabytes, Gigabytes, or Terabytes, respectively, to the
871 base of 1000. (Example:
872 "/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 5M"). This
873 controls the "blkio.throttle.read_bps_device" and
874 "blkio.throttle.write_bps_device" control group attributes. Use
875 this option multiple times to set bandwidth limits for multiple
876 devices. For details about these control group attributes, see
877 Block IO Controller[12].
878
879 Implies "BlockIOAccounting=yes".
880
881 These settings are deprecated. Use IOReadBandwidthMax= and
882 IOWriteBandwidthMax= instead.
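
A sketch of migrating the block I/O settings above to their
superseding options. The device path and values are illustrative
assumptions. Weights should be chosen anew rather than copied, since
the two weight settings use different scales; here each side uses its
own default (500 for BlockIOWeight=, 100 for IOWeight=):

    [Service]
    # Deprecated: BlockIOAccounting=yes
    IOAccounting=yes
    # Deprecated: BlockIOWeight=500
    IOWeight=100
    # Deprecated: BlockIOReadBandwidth=/dev/sda 5M
    IOReadBandwidthMax=/dev/sda 5M
    # Deprecated: BlockIOWriteBandwidth=/dev/sda 5M
    IOWriteBandwidthMax=/dev/sda 5M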
883
884 SEE ALSO
885 systemd(1), systemd-system.conf(5), systemd.unit(5),
886 systemd.service(5), systemd.slice(5), systemd.scope(5),
887 systemd.socket(5), systemd.mount(5), systemd.swap(5), systemd.exec(5),
888 systemd.directives(7), systemd.special(7), systemd-oomd.service(8), the
889 documentation for control groups and specific controllers in the Linux
890 kernel: Control Groups v2[2].
891
892 NOTES
893 1. New Control Group Interfaces
894 https://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/
895
896 2. Control Groups v2
897 https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html
898
899 3. Control Groups version 1
900 https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/
901
902 4. CFS Scheduler
903 https://www.kernel.org/doc/html/latest/scheduler/sched-design-CFS.html
904
905 5. sched-bwc.txt
906 https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt
907
908 6. Memory Interface Files
909 https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory-interface-files
910
911 7. Process Number Controller
912 https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/pids.html
913
914 8. IO Interface Files
915 https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#io-interface-files
916
917 9. Device Whitelist Controller
918 https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/devices.html
919
920 10. Control Group APIs and Delegation
921 https://systemd.io/CGROUP_DELEGATION
922
923 11. Memory Resource Controller
924 https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/memory.html
925
926 12. Block IO Controller
927 https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/blkio-controller.html
928
929
930
931systemd 248 SYSTEMD.RESOURCE-CONTROL(5)