systemd.resource-control(5)


NAME

       systemd.resource-control - Resource control unit settings

SYNOPSIS

       slice.slice, scope.scope, service.service, socket.socket, mount.mount,
       swap.swap

DESCRIPTION

       Unit configuration files for services, slices, scopes, sockets, mount
       points, and swap devices share a subset of configuration options for
       resource control of spawned processes. Internally, this relies on the
       Linux Control Groups (cgroups) kernel concept for organizing processes
       in a hierarchical tree of named groups for the purpose of resource
       management.

       This man page lists the configuration options shared by those six unit
       types. See systemd.unit(5) for the common options of all unit
       configuration files, and systemd.slice(5), systemd.scope(5),
       systemd.service(5), systemd.socket(5), systemd.mount(5), and
       systemd.swap(5) for more information on the specific unit configuration
       files. The resource control configuration options are configured in the
       [Slice], [Scope], [Service], [Socket], [Mount], or [Swap] sections,
       depending on the unit type.

       In addition, options which control resources available to programs
       executed by systemd are listed in systemd.exec(5). Those options
       complement the options listed here.

       See the New Control Group Interfaces[1] for an introduction on how to
       make use of resource control APIs from programs.

   Setting resource controls for a group of related units
       As described in systemd.unit(5), the settings listed here may be set
       through the main file of a unit and drop-in snippets in *.d/
       directories. The list of directories searched for drop-ins includes
       names formed by repeatedly truncating the unit name after all dashes.
       This is particularly convenient for setting resource limits for a
       group of units with similar names.

       For example, every user gets their own slice user-nnn.slice. Local
       configuration that affects user 1000 may be placed in the unit file
       /etc/systemd/system/user-1000.slice, in drop-ins matching
       /etc/systemd/system/user-1000.slice.d/*.conf, and also in
       /etc/systemd/system/user-.slice.d/*.conf. This last directory applies
       to all user slices.
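
       As a minimal sketch of such a shared drop-in, the following fragment
       would apply a memory throttling limit and a task limit to every user
       slice. The file name 50-limits.conf and the limit values are
       illustrative assumptions, not defaults.

       ```ini
       # /etc/systemd/system/user-.slice.d/50-limits.conf (hypothetical file name)
       [Slice]
       # Throttle each user slice once it uses more than 4G of memory.
       MemoryHigh=4G
       # Allow at most 4096 tasks per user slice.
       TasksMax=4096
       ```

       After adding or changing a drop-in, run systemctl daemon-reload so
       that the manager rereads the configuration.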

IMPLICIT DEPENDENCIES

       The following dependencies are implicitly added:

       •   Units with the Slice= setting set automatically acquire Requires=
           and After= dependencies on the specified slice unit.
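
       For illustration only, a service assigned to a slice might look like
       the sketch below. The unit name, slice name, and binary path are
       hypothetical.

       ```ini
       # example.service (hypothetical unit)
       [Service]
       ExecStart=/usr/bin/example-daemon
       # Implicitly adds Requires=example.slice and After=example.slice.
       Slice=example.slice
       ```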

UNIFIED AND LEGACY CONTROL GROUP HIERARCHIES

       The unified control group hierarchy is the new version of the kernel
       control group interface; see Control Groups v2[2]. Depending on the
       resource type, there are differences in resource control capabilities.
       Also, because of interface changes, some resource types have a
       separate set of options on the unified hierarchy.

       CPU
           CPUWeight= and StartupCPUWeight= replace CPUShares= and
           StartupCPUShares=, respectively.

           The "cpuacct" controller does not exist separately on the unified
           hierarchy.

       Memory
           MemoryMax= replaces MemoryLimit=. MemoryLow= and MemoryHigh= are
           effective only on the unified hierarchy.

       IO
           "IO"-prefixed settings are a superset of and replace
           "BlockIO"-prefixed ones. On the unified hierarchy, IO resource
           control also applies to buffered writes.

       To ease the transition, there is best-effort translation between the
       two versions of settings. For each controller, if any of the settings
       for the unified hierarchy are present, all settings for the legacy
       hierarchy are ignored. If the resulting settings are for the other type
       of hierarchy, the configurations are translated before application.

       The legacy control group hierarchy (see Control Groups version 1[3]),
       also called cgroup-v1, does not allow safe delegation of controllers
       to unprivileged processes. If the system uses the legacy control group
       hierarchy, resource control is disabled for the systemd user instance;
       see systemd(1).

OPTIONS

       Units of the types listed above can have settings for resource control
       configuration:

       CPUAccounting=
           Turn on CPU usage accounting for this unit. Takes a boolean
           argument. Note that turning on CPU accounting for one unit will
           also implicitly turn it on for all units contained in the same
           slice and for all its parent slices and the units contained
           therein. The system default for this setting may be controlled with
           DefaultCPUAccounting= in systemd-system.conf(5).

       CPUWeight=weight, StartupCPUWeight=weight
           Assign the specified CPU time weight to the processes executed, if
           the unified control group hierarchy is used on the system. These
           options take an integer value and control the "cpu.weight" control
           group attribute. The allowed range is 1 to 10000. Defaults to 100.
           For details about this control group attribute, see Control Groups
           v2[2] and CFS Scheduler[4]. The available CPU time is split up
           among all units within one slice relative to their CPU time weight.
           A higher weight means more CPU time, a lower weight means less.

           StartupCPUWeight= applies only to the startup phase of the system.
           CPUWeight= applies to the normal runtime of the system and, if
           StartupCPUWeight= is not set, also to the startup phase. Using
           StartupCPUWeight= allows prioritizing specific services at boot-up
           differently than during normal runtime.

           These settings replace CPUShares= and StartupCPUShares=.
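
           As a minimal sketch, a drop-in for a hypothetical service that
           should be favored over its siblings, and favored even more during
           boot, could read as follows. The weight values are illustrative
           assumptions.

           ```ini
           [Service]
           # Twice the default weight of 100 during normal runtime.
           CPUWeight=200
           # A higher weight that applies only during the startup phase.
           StartupCPUWeight=500
           ```

           Weights only matter under contention: they describe how CPU time
           is divided among units in the same slice, not an absolute limit.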

       CPUQuota=
           Assign the specified CPU time quota to the processes executed.
           Takes a percentage value, suffixed with "%". The percentage
           specifies how much CPU time the unit shall get at maximum, relative
           to the total CPU time available on one CPU. Use values > 100% for
           allotting CPU time on more than one CPU. This controls the
           "cpu.max" attribute on the unified control group hierarchy and
           "cpu.cfs_quota_us" on legacy. For details about these control group
           attributes, see Control Groups v2[2] and sched-bwc.txt[5].

           Example: CPUQuota=20% ensures that the executed processes will
           never get more than 20% CPU time on one CPU.

       CPUQuotaPeriodSec=
           Assign the duration over which the CPU time quota specified by
           CPUQuota= is measured. Takes a time duration value in seconds, with
           an optional suffix such as "ms" for milliseconds (or "s" for
           seconds). The default setting is 100ms. The period is clamped to
           the range supported by the kernel, which is [1ms, 1000ms].
           Additionally, the period is adjusted up so that the quota interval
           is also at least 1ms. Setting CPUQuotaPeriodSec= to an empty value
           resets it to the default.

           This controls the second field of the "cpu.max" attribute on the
           unified control group hierarchy and "cpu.cfs_period_us" on legacy.
           For details about these control group attributes, see Control
           Groups v2[2] and CFS Scheduler[4].

           Example: CPUQuotaPeriodSec=10ms requests that the CPU quota is
           measured in periods of 10ms.
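
           The two settings are typically used together. The following sketch
           assumes a service that may use up to one and a half CPUs, with the
           quota measured over a shorter period than the default; the values
           are illustrative.

           ```ini
           [Service]
           # Up to 150% of one CPU, i.e. one and a half CPUs' worth of time.
           CPUQuota=150%
           # Measure the quota in 50ms periods instead of the default 100ms.
           CPUQuotaPeriodSec=50ms
           ```

           Under these assumptions the unit may consume up to 75ms of CPU
           time, summed over all CPUs, in each 50ms period.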

       AllowedCPUs=
           Restrict processes to be executed on specific CPUs. Takes a list of
           CPU indices or ranges separated by either whitespace or commas. CPU
           ranges are specified by the lower and upper CPU indices separated
           by a dash.

           Setting AllowedCPUs= doesn't guarantee that all of the CPUs will be
           used by the processes as it may be limited by parent units. The
           effective configuration is reported as EffectiveCPUs=.

           This setting is supported only with the unified control group
           hierarchy.

       AllowedMemoryNodes=
           Restrict processes to be executed on specific memory NUMA nodes.
           Takes a list of memory NUMA node indices or ranges separated by
           either whitespace or commas. Memory NUMA node ranges are specified
           by the lower and upper NUMA node indices separated by a dash.

           Setting AllowedMemoryNodes= doesn't guarantee that all of the
           memory NUMA nodes will be used by the processes as it may be
           limited by parent units. The effective configuration is reported as
           EffectiveMemoryNodes=.

           This setting is supported only with the unified control group
           hierarchy.
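
           A sketch of both settings follows. It assumes a machine that has
           at least nine CPUs and a NUMA node 0; the indices are illustrative
           and must be adapted to the actual topology.

           ```ini
           [Service]
           # CPUs 0 through 3, plus CPU 8 (a range and a single index).
           AllowedCPUs=0-3 8
           # Allocate memory from NUMA node 0 only.
           AllowedMemoryNodes=0
           ```

           The values actually in effect, after any restriction by parent
           units, are reported as EffectiveCPUs= and EffectiveMemoryNodes=.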

       MemoryAccounting=
           Turn on process and kernel memory accounting for this unit. Takes a
           boolean argument. Note that turning on memory accounting for one
           unit will also implicitly turn it on for all units contained in the
           same slice and for all its parent slices and the units contained
           therein. The system default for this setting may be controlled with
           DefaultMemoryAccounting= in systemd-system.conf(5).

       MemoryMin=bytes, MemoryLow=bytes
           Specify the memory usage protection of the executed processes in
           this unit. When reclaiming memory, the unit is treated as if it
           were using less memory, so that memory is preferentially reclaimed
           from unprotected units. Using MemoryLow= results in a weaker
           protection where memory may still be reclaimed to avoid invoking
           the OOM killer in case there is no other reclaimable memory.

           For a protection to be effective, it is generally required to set a
           corresponding allocation on all ancestors, which is then
           distributed between children (with the exception of the root
           slice). Any MemoryMin= or MemoryLow= allocation that is not
           explicitly distributed to specific children is used to create a
           shared protection for all children. As this is a shared protection,
           the children will freely compete for the memory.

           Takes a memory size in bytes. If the value is suffixed with K, M, G
           or T, the specified memory size is parsed as Kilobytes, Megabytes,
           Gigabytes, or Terabytes (with the base 1024), respectively.
           Alternatively, a percentage value may be specified, which is taken
           relative to the installed physical memory on the system. If
           assigned the special value "infinity", all available memory is
           protected, which may be useful in order to always inherit all of
           the protection afforded by ancestors. This controls the
           "memory.min" or "memory.low" control group attribute. For details
           about this control group attribute, see Memory Interface Files[6].

           These settings are supported only if the unified control group
           hierarchy is used, and they disable MemoryLimit=.

           Units may have their children use a default "memory.min" or
           "memory.low" value by specifying DefaultMemoryMin= or
           DefaultMemoryLow=, which have the same semantics as MemoryMin= and
           MemoryLow=. These settings do not affect "memory.min" or
           "memory.low" in the unit itself. Using them to set a default child
           allocation is only useful on kernels older than 5.7, which do not
           support the "memory_recursiveprot" cgroup2 mount option.
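
           Because a protection generally has to be configured on the
           ancestors as well, a setup usually involves at least two units.
           The sketch below assumes a hypothetical slice db.slice and a
           hypothetical service placed in it; names and sizes are
           illustrative.

           ```ini
           # db.slice (hypothetical slice): protection available to its children
           [Slice]
           MemoryLow=2G

           # db-primary.service (hypothetical service inside db.slice)
           [Service]
           Slice=db.slice
           # Explicitly claim 1G of the slice's 2G protection for this unit.
           MemoryLow=1G
           ```

           The remaining 1G of the slice's allocation is not assigned to a
           specific child and is therefore shared by all children of
           db.slice, which compete for it freely.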

       MemoryHigh=bytes
           Specify the throttling limit on memory usage of the executed
           processes in this unit. Memory usage may go above the limit if
           unavoidable, but the processes are heavily slowed down and memory
           is taken away aggressively in such cases. This is the main
           mechanism to control memory usage of a unit.

           Takes a memory size in bytes. If the value is suffixed with K, M, G
           or T, the specified memory size is parsed as Kilobytes, Megabytes,
           Gigabytes, or Terabytes (with the base 1024), respectively.
           Alternatively, a percentage value may be specified, which is taken
           relative to the installed physical memory on the system. If
           assigned the special value "infinity", no memory throttling is
           applied. This controls the "memory.high" control group attribute.
           For details about this control group attribute, see Memory
           Interface Files[6].

           This setting is supported only if the unified control group
           hierarchy is used and disables MemoryLimit=.

       MemoryMax=bytes
           Specify the absolute limit on memory usage of the executed
           processes in this unit. If memory usage cannot be contained under
           the limit, the out-of-memory killer is invoked inside the unit. It
           is recommended to use MemoryHigh= as the main control mechanism and
           use MemoryMax= as the last line of defense.

           Takes a memory size in bytes. If the value is suffixed with K, M, G
           or T, the specified memory size is parsed as Kilobytes, Megabytes,
           Gigabytes, or Terabytes (with the base 1024), respectively.
           Alternatively, a percentage value may be specified, which is taken
           relative to the installed physical memory on the system. If
           assigned the special value "infinity", no memory limit is applied.
           This controls the "memory.max" control group attribute. For details
           about this control group attribute, see Memory Interface Files[6].

           This setting replaces MemoryLimit=.
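
           Following the recommendation above, a unit would typically set
           both limits, with MemoryMax= somewhat above MemoryHigh=. The sizes
           in this sketch are illustrative assumptions.

           ```ini
           [Service]
           # Main control: throttle and reclaim aggressively above 1G.
           MemoryHigh=1G
           # Last line of defense: the OOM killer is invoked above 1536M.
           MemoryMax=1536M
           ```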

       MemorySwapMax=bytes
           Specify the absolute limit on swap usage of the executed processes
           in this unit.

           Takes a swap size in bytes. If the value is suffixed with K, M, G
           or T, the specified swap size is parsed as Kilobytes, Megabytes,
           Gigabytes, or Terabytes (with the base 1024), respectively. If
           assigned the special value "infinity", no swap limit is applied.
           This controls the "memory.swap.max" control group attribute. For
           details about this control group attribute, see Memory Interface
           Files[6].

           This setting is supported only if the unified control group
           hierarchy is used and disables MemoryLimit=.

       TasksAccounting=
           Turn on task accounting for this unit. Takes a boolean argument. If
           enabled, the system manager will keep track of the number of tasks
           in the unit. The number of tasks accounted this way includes both
           kernel threads and userspace processes, with each thread counting
           individually. Note that turning on tasks accounting for one unit
           will also implicitly turn it on for all units contained in the same
           slice and for all its parent slices and the units contained
           therein. The system default for this setting may be controlled with
           DefaultTasksAccounting= in systemd-system.conf(5).

       TasksMax=N
           Specify the maximum number of tasks that may be created in the
           unit. This ensures that the number of tasks accounted for the unit
           (see above) stays below a specific limit. This either takes an
           absolute number of tasks or a percentage value that is taken
           relative to the configured maximum number of tasks on the system.
           If assigned the special value "infinity", no tasks limit is
           applied. This controls the "pids.max" control group attribute. For
           details about this control group attribute, see Process Number
           Controller[7].

           The system default for this setting may be controlled with
           DefaultTasksMax= in systemd-system.conf(5).
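
           A minimal sketch of a task limit, with an illustrative value:

           ```ini
           [Service]
           TasksAccounting=yes
           # Absolute limit; threads count individually towards it.
           TasksMax=512
           ```

           A percentage such as TasksMax=10% may be used instead of an
           absolute number, in which case the limit is taken relative to the
           configured maximum number of tasks on the system.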

       IOAccounting=
           Turn on block I/O accounting for this unit, if the unified control
           group hierarchy is used on the system. Takes a boolean argument.
           Note that turning on block I/O accounting for one unit will also
           implicitly turn it on for all units contained in the same slice and
           for all its parent slices and the units contained therein. The
           system default for this setting may be controlled with
           DefaultIOAccounting= in systemd-system.conf(5).

           This setting replaces BlockIOAccounting= and disables settings
           prefixed with BlockIO or StartupBlockIO.

       IOWeight=weight, StartupIOWeight=weight
           Set the default overall block I/O weight for the executed
           processes, if the unified control group hierarchy is used on the
           system. Takes a single weight value (between 1 and 10000) to set
           the default block I/O weight. This controls the "io.weight" control
           group attribute, which defaults to 100. For details about this
           control group attribute, see IO Interface Files[8]. The available
           I/O bandwidth is split up among all units within one slice relative
           to their block I/O weight. A higher weight means more I/O
           bandwidth, a lower weight means less.

           StartupIOWeight= applies only to the startup phase of the system.
           IOWeight= applies to the later runtime of the system and, if
           StartupIOWeight= is not set, also to the startup phase. This allows
           prioritizing specific services at boot-up differently than during
           runtime.

           These settings replace BlockIOWeight= and StartupBlockIOWeight= and
           disable settings prefixed with BlockIO or StartupBlockIO.

       IODeviceWeight=device weight
           Set the per-device overall block I/O weight for the executed
           processes, if the unified control group hierarchy is used on the
           system. Takes a space-separated pair of a file path and a weight
           value to specify the device specific weight value, between 1 and
           10000. (Example: "/dev/sda 1000"). The file path may be specified
           as path to a block device node or as any other file, in which case
           the backing block device of the file system of the file is
           determined. This controls the "io.weight" control group attribute,
           which defaults to 100. Use this option multiple times to set
           weights for multiple devices. For details about this control group
           attribute, see IO Interface Files[8].

           This setting replaces BlockIODeviceWeight= and disables settings
           prefixed with BlockIO or StartupBlockIO.

           The specified device node should reference a block device that has
           an I/O scheduler associated, i.e. should not refer to partition or
           loopback block devices, but to the originating, physical device.
           When a path to a regular file or directory is specified, systemd
           attempts to discover the originating device backing the file
           system of the specified path. This works correctly only for
           simpler cases, where the file system is directly placed on a
           partition or physical block device, or where simple 1:1 encryption
           using dm-crypt/LUKS is used. This discovery does not cover complex
           storage and in particular RAID and volume management storage
           devices.

       IOReadBandwidthMax=device bytes, IOWriteBandwidthMax=device bytes
           Set the per-device overall block I/O bandwidth maximum limit for
           the executed processes, if the unified control group hierarchy is
           used on the system. This limit is not work-conserving and the
           executed processes are not allowed to use more even if the device
           has idle capacity. Takes a space-separated pair of a file path and
           a bandwidth value (in bytes per second) to specify the device
           specific bandwidth. The file path may be a path to a block device
           node, or any other file, in which case the backing block device of
           the file system of the file is used. If the bandwidth is suffixed
           with K, M, G, or T, the specified bandwidth is parsed as
           Kilobytes, Megabytes, Gigabytes, or Terabytes, respectively, to the
           base of 1000. (Example:
           "/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 5M"). This
           controls the "io.max" control group attribute. Use this option
           multiple times to set bandwidth limits for multiple devices. For
           details about this control group attribute, see IO Interface
           Files[8].

           These settings replace BlockIOReadBandwidth= and
           BlockIOWriteBandwidth= and disable settings prefixed with BlockIO
           or StartupBlockIO.

           Similar restrictions on block device discovery as for
           IODeviceWeight= apply, see above.
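
           The following sketch combines a reduced I/O weight with hard
           bandwidth limits for one device. It assumes that /dev/sda is the
           physical device of interest; the device path and the values are
           illustrative.

           ```ini
           [Service]
           IOAccounting=yes
           # Half the default weight of 100 relative to sibling units.
           IOWeight=50
           # Hard caps in bytes per second (suffixes use base 1000 here).
           IOReadBandwidthMax=/dev/sda 10M
           IOWriteBandwidthMax=/dev/sda 5M
           ```

           Unlike the weight, the bandwidth limits apply even when the
           device is otherwise idle.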

       IOReadIOPSMax=device IOPS, IOWriteIOPSMax=device IOPS
           Set the per-device overall block I/O IOs-Per-Second maximum limit
           for the executed processes, if the unified control group hierarchy
           is used on the system. This limit is not work-conserving and the
           executed processes are not allowed to use more even if the device
           has idle capacity. Takes a space-separated pair of a file path and
           an IOPS value to specify the device specific IOPS. The file path
           may be a path to a block device node, or any other file, in which
           case the backing block device of the file system of the file is
           used. If the IOPS is suffixed with K, M, G, or T, the specified
           IOPS is parsed as KiloIOPS, MegaIOPS, GigaIOPS, or TeraIOPS,
           respectively, to the base of 1000. (Example:
           "/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 1K"). This
           controls the "io.max" control group attribute. Use this option
           multiple times to set IOPS limits for multiple devices. For details
           about this control group attribute, see IO Interface Files[8].

           These settings are supported only if the unified control group
           hierarchy is used and disable settings prefixed with BlockIO or
           StartupBlockIO.

           Similar restrictions on block device discovery as for
           IODeviceWeight= apply, see above.

       IODeviceLatencyTargetSec=device target
           Set the per-device average target I/O latency for the executed
           processes, if the unified control group hierarchy is used on the
           system. Takes a file path and a timespan separated by a space to
           specify the device specific latency target. (Example: "/dev/sda
           25ms"). The file path may be specified as path to a block device
           node or as any other file, in which case the backing block device
           of the file system of the file is determined. This controls the
           "io.latency" control group attribute. Use this option multiple
           times to set latency targets for multiple devices. For details
           about this control group attribute, see IO Interface Files[8].

           Implies "IOAccounting=yes".

           This setting is supported only if the unified control group
           hierarchy is used.

           Similar restrictions on block device discovery as for
           IODeviceWeight= apply, see above.
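
           As a sketch, an IOPS cap and a latency target for the same device
           could be written as follows. The device path /dev/sda and the
           values are assumptions for illustration.

           ```ini
           [Service]
           # At most 1000 read operations per second on this device.
           IOReadIOPSMax=/dev/sda 1K
           # Aim for an average I/O latency of 25ms on this device.
           IODeviceLatencyTargetSec=/dev/sda 25ms
           ```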

       IPAccounting=
           Takes a boolean argument. If true, turns on IPv4 and IPv6 network
           traffic accounting for packets sent or received by the unit. When
           this option is turned on, all IPv4 and IPv6 sockets created by any
           process of the unit are accounted for.

           When this option is used in socket units, it applies to all IPv4
           and IPv6 sockets associated with it (including both listening and
           connection sockets where this applies). Note that for
           socket-activated services, this configuration setting and the
           accounting data of the service unit and the socket unit are kept
           separate, and displayed separately. No propagation of the setting
           and the collected statistics is done, in either direction.
           Moreover, any traffic sent or received on any of the socket unit's
           sockets is accounted to the socket unit, and never to the service
           unit it might have activated, even if the socket is used by it.

           The system default for this setting may be controlled with
           DefaultIPAccounting= in systemd-system.conf(5).

       IPAddressAllow=ADDRESS[/PREFIXLENGTH]...,
       IPAddressDeny=ADDRESS[/PREFIXLENGTH]...
           Turn on network traffic filtering for IP packets sent and received
           over AF_INET and AF_INET6 sockets. Both directives take a
           space-separated list of IPv4 or IPv6 addresses, each optionally
           suffixed with an address prefix length in bits after a "/"
           character. If the suffix is omitted, the address is considered a
           host address, i.e. the filter covers the whole address (32 bits
           for IPv4, 128 bits for IPv6).

           The access lists configured with this option are applied to all
           sockets created by processes of this unit (or in the case of socket
           units, associated with it). The lists are implicitly combined with
           any lists configured for any of the parent slice units this unit
           might be a member of. By default both access lists are empty. Both
           ingress and egress traffic is filtered by these settings. In case
           of ingress traffic the source IP address is checked against these
           access lists, in case of egress traffic the destination IP address
           is checked. The following rules are applied in turn:

           •   Access is granted when the checked IP address matches an entry
               in the IPAddressAllow= list.

           •   Otherwise, access is denied when the checked IP address matches
               an entry in the IPAddressDeny= list.

           •   Otherwise, access is granted.

           In order to implement an allow-listing IP firewall, it is
           recommended to use an IPAddressDeny=any setting on an upper-level
           slice unit (such as the root slice -.slice or the slice containing
           all system services, system.slice; see systemd.special(7) for
           details on these slice units), plus individual per-service
           IPAddressAllow= lines permitting network access to relevant
           services, and only them.
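
           The allow-listing setup described above can be sketched with two
           fragments. The drop-in file name, the service, and the permitted
           network 192.168.1.0/24 are hypothetical.

           ```ini
           # /etc/systemd/system/system.slice.d/10-ip-deny.conf (hypothetical file name)
           [Slice]
           # Deny all IP traffic for every unit below system.slice by default.
           IPAddressDeny=any

           # Drop-in for a hypothetical service that needs network access
           [Service]
           # Allow entries are checked first, so these win over the deny above.
           IPAddressAllow=localhost
           IPAddressAllow=192.168.1.0/24
           ```

           Because the allow list is consulted before the deny list, the
           service can reach the loopback addresses and the listed network,
           while all other system services remain cut off.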
490
491           Note that for socket-activated services, the IP access list
492           configured on the socket unit applies to all sockets associated
493           with it directly, but not to any sockets created by the ultimately
494           activated services for it. Conversely, the IP access list
495           configured for the service is not applied to any sockets passed
496           into the service via socket activation. Thus, it is usually a good
497           idea to replicate the IP access lists on both the socket and the
498           service unit. Nevertheless, it may make sense to maintain one list
499           more open and the other one more restricted, depending on the
500           usecase.
501
502           If these settings are used multiple times in the same unit the
503           specified lists are combined. If an empty string is assigned to
504           these settings the specific access list is reset and all previous
505           settings undone.
506
507           In place of explicit IPv4 or IPv6 address and prefix length
508           specifications a small set of symbolic names may be used. The
509           following names are defined:
510
511           Table 1. Special address/network names
512           ┌──────────────┬─────────────────────┬─────────────────────┐
513           │Symbolic Name │ Definition          │ Meaning             │
514           ├──────────────┼─────────────────────┼─────────────────────┤
515           │any           │ 0.0.0.0/0 ::/0      │ Any host            │
516           ├──────────────┼─────────────────────┼─────────────────────┤
517           │localhost     │ 127.0.0.0/8 ::1/128 │ All addresses on    │
518           │              │                     │ the local loopback  │
519           ├──────────────┼─────────────────────┼─────────────────────┤
520           │link-local    │ 169.254.0.0/16      │ All link-local IP   │
521           │              │ fe80::/64           │ addresses           │
522           ├──────────────┼─────────────────────┼─────────────────────┤
523           │multicast     │ 224.0.0.0/4         │ All IP multicasting │
524           │              │ ff00::/8            │ addresses           │
525           └──────────────┴─────────────────────┴─────────────────────┘
526           Note that these settings might not be supported on some systems
527           (for example if eBPF control group support is not enabled in the
528           underlying kernel or container manager). These settings will have
529           no effect in that case. If compatibility with such systems is
530           desired it is hence recommended to not exclusively rely on them for
531           IP security.
532
533       IPIngressFilterPath=BPF_FS_PROGRAM_PATH,
534       IPEgressFilterPath=BPF_FS_PROGRAM_PATH
535           Add custom network traffic filters implemented as BPF programs,
536           applying to all IP packets sent and received over AF_INET and
537           AF_INET6 sockets. Takes an absolute path to a pinned BPF program in
538           the BPF virtual filesystem (/sys/fs/bpf/).
539
540           The filters configured with this option are applied to all sockets
541           created by processes of this unit (or in the case of socket units,
542           associated with it). The filters are loaded in addition to filters
543           any of the parent slice units this unit might be a member of as
544           well as any IPAddressAllow= and IPAddressDeny= filters in any of
545           these units. By default there are no filters specified.
546
547           If these settings are used multiple times in the same unit, all the
548           specified programs are attached. If an empty string is assigned to
549           these settings, the program list is reset and all previously
550           specified programs are ignored.
551
552           If the path BPF_FS_PROGRAM_PATH in an IPIngressFilterPath=
553           assignment is already being handled by a BPFProgram= ingress hook,
554           e.g. BPFProgram=ingress:BPF_FS_PROGRAM_PATH, the assignment is
555           still considered valid and the program is attached to the cgroup.
556           The same applies to an IPEgressFilterPath= path and the egress hook.
557
558           Note that for socket-activated services, the IP filter programs
559           configured on the socket unit apply to all sockets associated with
560           it directly, but not to any sockets created by the ultimately
561           activated services for it. Conversely, the IP filter programs
562           configured for the service are not applied to any sockets passed
563           into the service via socket activation. Thus, it is usually a good
564           idea to replicate the IP filter programs on both the socket and
565           the service unit. However, it often makes sense to maintain one
566           configuration more open and the other one more restricted,
567           depending on the use case.
568
569           Note that these settings might not be supported on some systems
570           (for example if eBPF control group support is not enabled in the
571           underlying kernel or container manager). These settings will fail
572           the service in that case. If compatibility with such systems is
573           desired it is hence recommended to attach your filter manually
574           (requires Delegate=yes) instead of using this setting.
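
           A minimal sketch of a service that attaches pinned filter
           programs. The two paths are hypothetical; the programs are
           assumed to have been loaded and pinned in the BPF file system by
           some other means before the unit starts.

               ...
               [Service]
               IPIngressFilterPath=/sys/fs/bpf/example-ingress-filter
               IPEgressFilterPath=/sys/fs/bpf/example-egress-filter
               ...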
575
576       BPFProgram=type:program-path
577           Add a custom cgroup BPF program.
578
579           BPFProgram= allows attaching BPF hooks to the cgroup of a systemd
580           unit. (This generalizes the functionality exposed via
581           IPEgressFilterPath= for egress and IPIngressFilterPath= for
582           ingress.) Cgroup-bpf hooks in the form of BPF programs loaded to
583           the BPF filesystem are attached with cgroup-bpf attach flags
584           determined by the unit. For details about attachment types and
585           flags see
586           https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/include/uapi/linux/bpf.h.
587           For general BPF documentation please refer to
588           https://www.kernel.org/doc/html/latest/bpf/index.html.
589
590           The specification of a BPF program consists of a type followed by a
591           program-path with ":" as the separator: type:program-path.
592
593           type is the string name of the BPF attach type also used in bpftool.
594           type can be one of egress, ingress, sock_create, sock_ops, device,
595           bind4, bind6, connect4, connect6, post_bind4, post_bind6, sendmsg4,
596           sendmsg6, sysctl, recvmsg4, recvmsg6, getsockopt, setsockopt.
597
598           Setting BPFProgram= to an empty value makes previous assignments
599           ineffective.
600
601           Multiple assignments of the same type:program-path value have the
602           same effect as a single assignment: the program with the path
603           program-path will be attached to cgroup hook type just once.
604
605           If a BPF egress program pinned at program-path is already being
606           handled by IPEgressFilterPath=, the BPFProgram= assignment is still
607           considered valid and the program is attached to the cgroup. The
608           same applies to the ingress hook and IPIngressFilterPath=.
609
610           BPF programs passed with BPFProgram= are attached to the cgroup of
611           a unit with the BPF attach flag multi, which allows further
612           attachments of the same type in the hierarchy rooted at the unit cgroup.
613
614           Examples:
615
616               BPFProgram=egress:/sys/fs/bpf/egress-hook
617               BPFProgram=bind6:/sys/fs/bpf/sock-addr-hook
618
619       SocketBindAllow=bind-rule, SocketBindDeny=bind-rule
620           Allow or deny binding a socket address to a socket by matching it
621           with the bind-rule and applying a corresponding action if there is
622           a match.
623
624           bind-rule describes socket properties such as address-family,
625           transport-protocol and ip-ports.
626
627           bind-rule := { [address-family:][transport-protocol:][ip-ports] |
628           any }
629
630           address-family := { ipv4 | ipv6 }
631
632           transport-protocol := { tcp | udp }
633
634           ip-ports := { ip-port | ip-port-range }
635
636           An optional address-family expects ipv4 or ipv6 values. If not
637           specified, a rule will be matched for both IPv4 and IPv6 addresses
638           and applied depending on other socket fields, e.g.
639           transport-protocol, ip-port.
640
641           An optional transport-protocol expects tcp or udp transport
642           protocol names. If not specified, a rule will be matched for any
643           transport protocol.
644
645           An optional ip-port value must lie within the 1...65535 interval,
646           inclusive, i.e. dynamic port 0 is not allowed. A range of
647           sequential ports is described by ip-port-range :=
648           ip-port-low-ip-port-high, where ip-port-low is smaller than or
649           equal to ip-port-high and both are within 1...65535, inclusive.
650
651           A special value any can be used to apply a rule to any address
652           family, transport protocol and any port with a positive value.
653
654           To allow multiple rules assign SocketBindAllow= or SocketBindDeny=
655           multiple times. To clear the existing assignments pass an empty
656           SocketBindAllow= or SocketBindDeny= assignment.
657
658           For each of SocketBindAllow= and SocketBindDeny=, the maximum
659           allowed number of assignments is 128.
660
661           •   Binding to a socket is allowed when a socket address matches an
662               entry in the SocketBindAllow= list.
663
664           •   Otherwise, binding is denied when the socket address matches an
665               entry in the SocketBindDeny= list.
666
667           •   Otherwise, binding is allowed.
668
669           The feature is implemented with cgroup/bind4 and cgroup/bind6
670           cgroup-bpf hooks.
671
672           Examples:
673
674               ...
675               # Allow binding IPv6 socket addresses with a port greater than or equal to 10000.
676               [Service]
677               SocketBindAllow=ipv6:10000-65535
678               SocketBindDeny=any
679               ...
680               # Allow binding IPv4 and IPv6 socket addresses with 1234 and 4321 ports.
681               [Service]
682               SocketBindAllow=1234
683               SocketBindAllow=4321
684               SocketBindDeny=any
685               ...
686               # Deny binding IPv6 socket addresses.
687               [Service]
688               SocketBindDeny=ipv6
689               ...
690               # Deny binding IPv4 and IPv6 socket addresses.
691               [Service]
692               SocketBindDeny=any
693               ...
694               # Allow binding only over TCP
695               [Service]
696               SocketBindAllow=tcp
697               SocketBindDeny=any
698               ...
699               # Allow binding only over IPv6/TCP
700               [Service]
701               SocketBindAllow=ipv6:tcp
702               SocketBindDeny=any
703               ...
704               # Allow binding ports within 10000-65535 range over IPv4/UDP.
705               [Service]
706               SocketBindAllow=ipv4:udp:10000-65535
707               SocketBindDeny=any
708               ...
709
710       DeviceAllow=
711           Control access to specific device nodes by the executed processes.
712           Takes two space-separated strings: a device node specifier followed
713           by a combination of r, w, m to control reading, writing, or
714           creation of the specific device node(s) by the unit (mknod),
715           respectively. On cgroup-v1 this controls the "devices.allow"
716           control group attribute. For details about this control group
717           attribute, see Device Whitelist Controller[9]. In the unified
718           cgroup hierarchy this functionality is implemented using eBPF
719           filtering.
720
721           The device node specifier is either a path to a device node in the
722           file system, starting with /dev/, or a string starting with either
723           "char-" or "block-" followed by a device group name, as listed in
724           /proc/devices. The latter is useful to allow-list all current and
725           future devices belonging to a specific device group at once. The
726           device group is matched according to filename globbing rules; you
727           may hence use the "*" and "?" wildcards. (Note that such globbing
728           wildcards are not available for device node path specifications!)
729           In order to match device nodes by numeric major/minor, use device
730           node paths in the /dev/char/ and /dev/block/ directories. However,
731           matching devices by major/minor is generally not recommended as
732           assignments are neither stable nor portable between systems or
733           different kernel versions.
734
735           Examples: /dev/sda5 is a path to a device node, referring to an ATA
736           or SCSI block device.  "char-pts" and "char-alsa" are specifiers
737           for all pseudo TTYs and all ALSA sound devices, respectively.
738           "char-cpu/*" is a specifier matching all CPU related device groups.
739
740           Note that allow lists defined this way should only reference device
741           groups which are resolvable at the time the unit is started. Any
742           device groups not resolvable then are not added to the device allow
743           list. In order to work around this limitation, consider extending
744           service units with a pair of After=modprobe@xyz.service and
745           Wants=modprobe@xyz.service lines that load the necessary kernel
746           module implementing the device group if missing. Example:
747
748               ...
749               [Unit]
750               Wants=modprobe@loop.service
751               After=modprobe@loop.service
752
753               [Service]
754               DeviceAllow=block-loop
755               DeviceAllow=/dev/loop-control
756               ...
757
758       DevicePolicy=auto|closed|strict
759           Control the policy for allowing device access:
760
761           strict
762               means to only allow types of access that are explicitly
763               specified.
764
765           closed
766               in addition, allows access to standard pseudo devices including
767               /dev/null, /dev/zero, /dev/full, /dev/random, and /dev/urandom.
768
769           auto
770               in addition, allows access to all devices if no explicit
771               DeviceAllow= is present. This is the default.
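
           As an illustrative sketch, the policy can be combined with
           DeviceAllow= as follows. The device choices are assumptions
           picked from the DeviceAllow= examples above. With closed, the
           standard pseudo devices stay accessible in addition to the two
           entries listed.

               ...
               [Service]
               DevicePolicy=closed
               DeviceAllow=/dev/sda5 r
               DeviceAllow=char-pts rw
               ...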
772
773       Slice=
774           The name of the slice unit to place the unit in. Defaults to
775           system.slice for all non-instantiated units of all unit types
776           (except for slice units themselves, see below). Instance units are
777           by default placed in a subslice of system.slice that is named after
778           the template name.
779
780           This option may be used to arrange systemd units in a hierarchy of
781           slices, each of which might have resource settings applied.
782
783           For units of type slice, the only accepted value for this setting
784           is the parent slice. Since the name of a slice unit implies the
785           parent slice, it is hence redundant to ever set this parameter
786           directly for slice units.
787
788           Special care should be taken when relying on the default slice
789           assignment in templated service units that have
790           DefaultDependencies=no set, see systemd.service(5), section
791           "Default Dependencies" for details.
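
           A minimal sketch of placing a service in a custom slice. The
           slice name is hypothetical; following the slice naming rules
           described in systemd.slice(5), example-backend.slice is itself
           located within example.slice.

               ...
               [Service]
               Slice=example-backend.slice
               ...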
792
793       Delegate=
794           Turns on delegation of further resource control partitioning to
795           processes of the unit. Units where this is enabled may create and
796           manage their own private subhierarchy of control groups below the
797           control group of the unit itself. For unprivileged services (i.e.
798           those using the User= setting) the unit's control group will be
799           made accessible to the relevant user. When enabled the service
800           manager will refrain from manipulating control groups or moving
801           processes below the unit's control group, so that a clear concept
802           of ownership is established: the control group tree above the
803           unit's control group (i.e. towards the root control group) is owned
804           and managed by the service manager of the host, while the control
805           group tree below the unit's control group is owned and managed by
806           the unit itself. Takes either a boolean argument or a list of
807           control group controller names. If true, delegation is turned on,
808           and all supported controllers are enabled for the unit, making them
809           available to the unit's processes for management. If false,
810           delegation is turned off entirely (and no additional controllers
811           are enabled). If set to a list of controllers, delegation is turned
812           on, and the specified controllers are enabled for the unit. Note
813           that controllers other than the ones specified might be made
814           available as well, depending on configuration of the containing
815           slice unit or other units contained in it. Note that assigning the
816           empty string will enable delegation, but reset the list of
817           controllers; all assignments prior to this will have no effect.
818           Defaults to false.
819
820           Note that controller delegation to less privileged code is only
821           safe on the unified control group hierarchy. Accordingly, access to
822           the specified controllers will not be granted to unprivileged
823           services on the legacy hierarchy, even when requested.
824
825           The following controller names may be specified: cpu, cpuacct,
826           cpuset, io, blkio, memory, devices, pids, bpf-firewall, and
827           bpf-devices.
828
829           Not all of these controllers are available on all kernels however,
830           and some are specific to the unified hierarchy while others are
831           specific to the legacy hierarchy. Also note that the kernel might
832           support further controllers, which aren't covered here yet as
833           delegation is either not supported at all for them or not defined
834           cleanly.
835
836           For further details on the delegation model consult Control Group
837           APIs and Delegation[10].
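
           As a sketch, an unprivileged service that manages its own
           sub-hierarchy might be configured as follows. The user name and
           the controller selection are assumptions for illustration, and
           the unified control group hierarchy is assumed.

               ...
               [Service]
               User=example
               Delegate=cpu memory pids
               ...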
838
839       DisableControllers=
840           Disables controllers from being enabled for a unit's children. If a
841           controller listed is already in use in its subtree, the controller
842           will be removed from the subtree. This can be used to avoid child
843           units being able to implicitly or explicitly enable a controller.
844           Defaults to not disabling any controllers.
845
846           It may not be possible to successfully disable a controller if the
847           unit or any child of the unit in question delegates controllers to
848           its children, as any delegated subtree of the cgroup hierarchy is
849           unmanaged by systemd.
850
851           Multiple controllers may be specified, separated by spaces. You may
852           also pass DisableControllers= multiple times, in which case each
853           new instance adds another controller to disable. Passing
854           DisableControllers= by itself with no controller name present
855           resets the disabled controller list.
856
857           The following controller names may be specified: cpu, cpuacct,
858           cpuset, io, blkio, memory, devices, pids, bpf-firewall, and
859           bpf-devices.
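
           A minimal sketch for a slice unit, with an arbitrary choice of
           controllers. The second assignment adds to the first, so cpu,
           io, and memory are all disabled for the children of the slice.

               ...
               [Slice]
               DisableControllers=cpu
               DisableControllers=io memory
               ...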
860
861       ManagedOOMSwap=auto|kill, ManagedOOMMemoryPressure=auto|kill
862           Specifies how systemd-oomd.service(8) will act on this unit's
863           cgroups. Defaults to auto.
864
865           When set to kill, systemd-oomd will actively monitor this unit's
866           cgroup metrics to decide whether it needs to act. If the cgroup
867           passes the limits set by oomd.conf(5) or its overrides,
868           systemd-oomd will send a SIGKILL to all of the processes under the
869           chosen candidate cgroup. Note that only descendant cgroups can be
870           eligible candidates for killing; the unit that set its property to
871           kill is not a candidate (unless one of its ancestors set their
872           property to kill). You can find more details on candidates and kill
873           behavior at systemd-oomd.service(8) and oomd.conf(5). Setting
874           either of these properties to kill will also automatically acquire
875           After= and Wants= dependencies on systemd-oomd.service unless
876           DefaultDependencies=no.
877
878           When set to auto, systemd-oomd will not actively use this cgroup's
879           data for monitoring and detection. However, if an ancestor cgroup
880           has one of these properties set to kill, a unit with auto can still
881           be an eligible candidate for systemd-oomd to act on.
882
883       ManagedOOMMemoryPressureLimit=
884           Overrides the default memory pressure limit set by oomd.conf(5) for
885           this unit (cgroup). Takes a percentage value between 0% and 100%,
886           inclusive. This property is ignored unless
887           ManagedOOMMemoryPressure=kill. Defaults to 0%, which means to use
888           the default set by oomd.conf(5).
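
           As a sketch, the following lets systemd-oomd act on the
           descendants of a slice based on memory pressure. The 50% limit
           is an arbitrary illustrative value, not a recommendation.

               ...
               [Slice]
               ManagedOOMMemoryPressure=kill
               ManagedOOMMemoryPressureLimit=50%
               ...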
889
890       ManagedOOMPreference=none|avoid|omit
891           Allows deprioritizing or omitting this unit's cgroup as a candidate
892           when systemd-oomd needs to act. Requires support for extended
893           attributes (see xattr(7)) in order to use avoid or omit.
894           Additionally, systemd-oomd will ignore these extended attributes if
895           the unit's cgroup is not owned by the root user.
896
897           If this property is set to avoid, the service manager will convey
898           this to systemd-oomd, which will only select this cgroup if there
899           are no other viable candidates.
900
901           If this property is set to omit, the service manager will convey
902           this to systemd-oomd, which will ignore this cgroup as a candidate
903           and will not perform any actions on it.
904
905           It is recommended to use avoid and omit sparingly, as they can
906           adversely affect systemd-oomd's kill behavior. Also note that these
907           extended attributes are not applied recursively to cgroups under
908           this unit's cgroup.
909
910           Defaults to none which means systemd-oomd will rank this unit's
911           cgroup as defined in systemd-oomd.service(8) and oomd.conf(5).
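
           A minimal sketch for a service that should only be selected as
           a last resort:

               ...
               [Service]
               ManagedOOMPreference=avoid
               ...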
912

DEPRECATED OPTIONS

914       The following options are deprecated. Use the indicated superseding
915       options instead:
916
917       CPUShares=weight, StartupCPUShares=weight
918           Assign the specified CPU time share weight to the processes
919           executed. These options take an integer value and control the
920           "cpu.shares" control group attribute. The allowed range is 2 to
921           262144. Defaults to 1024. For details about this control group
922           attribute, see CFS Scheduler[4]. The available CPU time is split up
923           among all units within one slice relative to their CPU time share
924           weight.
925
926           While StartupCPUShares= only applies to the startup phase of the
927           system, CPUShares= applies to normal runtime of the system, and if
928           the former is not set also to the startup phase. Using
929           StartupCPUShares= allows prioritizing specific services at boot-up
930           differently than during normal runtime.
931
932           Implies "CPUAccounting=yes".
933
934           These settings are deprecated. Use CPUWeight= and StartupCPUWeight=
935           instead.
936
937       MemoryLimit=bytes
938           Specify the limit on maximum memory usage of the executed
939           processes. The limit specifies how much process and kernel memory
940           can be used by tasks in this unit. Takes a memory size in bytes. If
941           the value is suffixed with K, M, G or T, the specified memory size
942           is parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes (with
943           the base 1024), respectively. Alternatively, a percentage value may
944           be specified, which is taken relative to the installed physical
945           memory on the system. If assigned the special value "infinity", no
946           memory limit is applied. This controls the "memory.limit_in_bytes"
947           control group attribute. For details about this control group
948           attribute, see Memory Resource Controller[11].
949
950           Implies "MemoryAccounting=yes".
951
952           This setting is deprecated. Use MemoryMax= instead.
953
954       BlockIOAccounting=
955           Turn on Block I/O accounting for this unit, if the legacy control
956           group hierarchy is used on the system. Takes a boolean argument.
957           Note that turning on block I/O accounting for one unit will also
958           implicitly turn it on for all units contained in the same slice and
959           for all its parent slices and the units contained therein. The
960           system default for this setting may be controlled with
961           DefaultBlockIOAccounting= in systemd-system.conf(5).
962
963           This setting is deprecated. Use IOAccounting= instead.
964
965       BlockIOWeight=weight, StartupBlockIOWeight=weight
966           Set the default overall block I/O weight for the executed
967           processes, if the legacy control group hierarchy is used on the
968           system. Takes a single weight value (between 10 and 1000) to set
969           the default block I/O weight. This controls the "blkio.weight"
970           control group attribute, which defaults to 500. For details about
971           this control group attribute, see Block IO Controller[12]. The
972           available I/O bandwidth is split up among all units within one
973           slice relative to their block I/O weight.
974
975           While StartupBlockIOWeight= only applies to the startup phase of
976           the system, BlockIOWeight= applies to the later runtime of the
977           system, and if the former is not set also to the startup phase.
978           This allows prioritizing specific services at boot-up differently
979           than during runtime.
980
981           Implies "BlockIOAccounting=yes".
982
983           These settings are deprecated. Use IOWeight= and StartupIOWeight=
984           instead.
985
986       BlockIODeviceWeight=device weight
987           Set the per-device overall block I/O weight for the executed
988           processes, if the legacy control group hierarchy is used on the
989           system. Takes a space-separated pair of a file path and a weight
990           value to specify the device specific weight value, between 10 and
991           1000. (Example: "/dev/sda 500"). The file path may be specified as
992           path to a block device node or as any other file, in which case the
993           backing block device of the file system of the file is determined.
994           This controls the "blkio.weight_device" control group attribute,
995           which defaults to 1000. Use this option multiple times to set
996           weights for multiple devices. For details about this control group
997           attribute, see Block IO Controller[12].
998
999           Implies "BlockIOAccounting=yes".
1000
1001           This setting is deprecated. Use IODeviceWeight= instead.
1002
1003       BlockIOReadBandwidth=device bytes, BlockIOWriteBandwidth=device bytes
1004           Set the per-device overall block I/O bandwidth limit for the
1005           executed processes, if the legacy control group hierarchy is used
1006           on the system. Takes a space-separated pair of a file path and a
1007           bandwidth value (in bytes per second) to specify the device
1008           specific bandwidth. The file path may be a path to a block device
1009           node, or any other file, in which case the backing block device
1010           of the file system of the file is used. If the bandwidth is
1011           suffixed with K, M, G, or T, the specified bandwidth is parsed as
1012           Kilobytes, Megabytes, Gigabytes, or Terabytes, respectively, to the
1013           base of 1000. (Example:
1014           "/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 5M"). This
1015           controls the "blkio.throttle.read_bps_device" and
1016           "blkio.throttle.write_bps_device" control group attributes. Use
1017           this option multiple times to set bandwidth limits for multiple
1018           devices. For details about these control group attributes, see
1019           Block IO Controller[12].
1020
1021           Implies "BlockIOAccounting=yes".
1022
1023           These settings are deprecated. Use IOReadBandwidthMax= and
1024           IOWriteBandwidthMax= instead.
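
       As a sketch of moving a unit away from the deprecated options, the
       first fragment below uses legacy settings and the second uses the
       superseding ones. All values are illustrative assumptions. The
       weight settings use different scales, so the numbers shown are not
       a computed conversion; consult the description of each new setting
       for its range and default.

           ...
           # Deprecated settings
           [Service]
           CPUShares=2048
           MemoryLimit=1G
           BlockIOWeight=500
           BlockIOReadBandwidth=/dev/sda 5M
           ...
           # Superseding settings
           [Service]
           CPUWeight=200
           MemoryMax=1G
           IOWeight=100
           IOReadBandwidthMax=/dev/sda 5M
           ...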
1025

NOTES

1035        1. New Control Group Interfaces
1036           https://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/
1037
1038        2. Control Groups v2
1039           https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html
1040
1041        3. Control Groups version 1
1042           https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/
1043
1044        4. CFS Scheduler
1045           https://www.kernel.org/doc/html/latest/scheduler/sched-design-CFS.html
1046
1047        5. sched-bwc.txt
1048           https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt
1049
1050        6. Memory Interface Files
1051           https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory-interface-files
1052
1053        7. Process Number Controller
1054           https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/pids.html
1055
1056        8. IO Interface Files
1057           https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#io-interface-files
1058
1059        9. Device Whitelist Controller
1060           https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/devices.html
1061
1062       10. Control Group APIs and Delegation
1063           https://systemd.io/CGROUP_DELEGATION
1064
1065       11. Memory Resource Controller
1066           https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/memory.html
1067
1068       12. Block IO Controller
1069           https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/blkio-controller.html
1070
1071
1072
1073systemd 249                                        SYSTEMD.RESOURCE-CONTROL(5)