systemd.resource-control(5)

SYSTEMD.RESOURCE-CONTROL(5)    systemd.resource-control    SYSTEMD.RESOURCE-CONTROL(5)

NAME

       systemd.resource-control - Resource control unit settings

SYNOPSIS

       slice.slice, scope.scope, service.service, socket.socket, mount.mount,
       swap.swap

DESCRIPTION

       Unit configuration files for services, slices, scopes, sockets, mount
       points, and swap devices share a subset of configuration options for
       resource control of spawned processes. Internally, this relies on the
       Linux Control Groups (cgroups) kernel concept for organizing processes
       in a hierarchical tree of named groups for the purpose of resource
       management.

       This man page lists the configuration options shared by those six unit
       types. See systemd.unit(5) for the common options of all unit
       configuration files, and systemd.slice(5), systemd.scope(5),
       systemd.service(5), systemd.socket(5), systemd.mount(5), and
       systemd.swap(5) for more information on the specific unit configuration
       files. The resource control configuration options are configured in the
       [Slice], [Scope], [Service], [Socket], [Mount], or [Swap] sections,
       depending on the unit type.

       In addition, options which control resources available to programs
       executed by systemd are listed in systemd.exec(5). Those options
       complement options listed here.

   Enabling and disabling controllers
       Controllers in the cgroup hierarchy are hierarchical, and resource
       control is realized by distributing resource assignments between
       siblings in branches of the cgroup hierarchy. There is no need to
       explicitly enable a cgroup controller for a unit. systemd will
       instruct the kernel to enable a controller for a given unit when this
       unit has configuration for a given controller. For example, when
       CPUWeight= is set, the cpu controller will be enabled, and when
       TasksMax= is set, the pids controller will be enabled. In addition,
       various controllers may also be enabled explicitly via the
       MemoryAccounting=/TasksAccounting=/IOAccounting= settings. Because of
       how the cgroup hierarchy works, controllers will be automatically
       enabled for all parent units and for any sibling units starting with
       the lowest level at which a controller is enabled. Units for which a
       controller is enabled may be subject to resource control even if they
       don't have any explicit configuration.

       Setting Delegate= enables any delegated controllers for that unit (see
       below). The delegatee may then enable controllers for its children as
       appropriate. In particular, if the delegatee is systemd (in the
       user@.service unit), it will repeat the same logic as the system
       instance and enable controllers for user units which have resource
       limits configured, and their siblings and parents and parents'
       siblings.

       Controllers may be disabled for parts of the cgroup hierarchy with
       DisableControllers= (see below).

       Example 1. Enabling and disabling controllers

                                 -.slice
                                /       \
                         /-----/         \--------------\
                        /                                \
                 system.slice                       user.slice
                   /       \                          /      \
                  /         \                        /        \
                 /           \              user@42.service  user@1000.service
                /             \             Delegate=        Delegate=yes
           a.service       b.slice                             /        \
           CPUWeight=20   DisableControllers=cpu              /          \
                            /  \                      app.slice      session.slice
                           /    \                     CPUWeight=100  CPUWeight=100
                          /      \
                  b1.service   b2.service
                               CPUWeight=1000

       In this hierarchy, the cpu controller is enabled for all units shown
       except b1.service and b2.service. Because there is no explicit
       configuration for system.slice and user.slice, CPU resources will be
       split equally between them. Similarly, resources are allocated equally
       between children of user.slice and between the child slices beneath
       user@1000.service. Assuming that there is no further configuration of
       resources or delegation below slices app.slice or session.slice, the
       cpu controller would not be enabled for units in those slices and CPU
       resources would be further allocated using other mechanisms, e.g. based
       on nice levels. The manager for user 42 has delegation enabled without
       any controllers, i.e. it can manipulate its subtree of the cgroup
       hierarchy, but without resource control.

       In the slice system.slice, service a.service receives 1/6 of the CPU
       resources and slice b.slice receives 5/6, because a.service has a
       weight of 20 and slice b.slice gets the default value of 100 for
       cpu.weight when CPUWeight= is not set: 20/(20+100) = 1/6 and
       100/(20+100) = 5/6.

       The CPUWeight= setting in service b2.service is neutralized by
       DisableControllers= in slice b.slice, so the cpu controller would not
       be enabled for services b1.service and b2.service, and CPU resources
       would be further allocated using other mechanisms, e.g. based on nice
       levels.

   Setting resource controls for a group of related units
       As described in systemd.unit(5), the settings listed here may be set
       through the main file of a unit and drop-in snippets in *.d/
       directories. The list of directories searched for drop-ins includes
       names formed by repeatedly truncating the unit name after all dashes.
       This is particularly convenient for setting resource limits for a
       group of units with similar names.

       For example, every user gets their own slice user-nnn.slice. Drop-ins
       with local configuration that affect user 1000 may be placed in
       /etc/systemd/system/user-1000.slice,
       /etc/systemd/system/user-1000.slice.d/*.conf, but also
       /etc/systemd/system/user-.slice.d/*.conf. This last directory applies
       to all user slices.
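
       As a sketch of the last case, a drop-in like the following would
       apply to every user slice. The file name 50-limits.conf and the
       values are hypothetical and chosen only for illustration; the
       settings themselves are described under OPTIONS.

```ini
# /etc/systemd/system/user-.slice.d/50-limits.conf
# (hypothetical file name and values, for illustration only)
[Slice]
MemoryHigh=2G
TasksMax=4096
```

       A drop-in placed in /etc/systemd/system/user-1000.slice.d/ instead
       would affect only the slice of user 1000.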

       See the New Control Group Interfaces[1] for an introduction on how to
       make use of resource control APIs from programs.

IMPLICIT DEPENDENCIES

       The following dependencies are implicitly added:

       •   Units with the Slice= setting set automatically acquire Requires=
           and After= dependencies on the specified slice unit.
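
       For illustration, a service placed in a slice might look like the
       sketch below. The unit name example.service, the slice name
       batch.slice, and the command path are all hypothetical. With this
       setting the service would implicitly gain Requires=batch.slice and
       After=batch.slice.

```ini
# example.service (hypothetical unit, for illustration only)
[Service]
ExecStart=/usr/local/bin/example-job
Slice=batch.slice
```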

OPTIONS

       Units of the types listed above can have settings for resource control
       configuration:

   CPU Accounting and Control
       CPUAccounting=
           Turn on CPU usage accounting for this unit. Takes a boolean
           argument. Note that turning on CPU accounting for one unit will
           also implicitly turn it on for all units contained in the same
           slice and for all its parent slices and the units contained
           therein. The system default for this setting may be controlled with
           DefaultCPUAccounting= in systemd-system.conf(5).

           Under the unified cgroup hierarchy, CPU accounting is available for
           all units and this setting has no effect.

       CPUWeight=weight, StartupCPUWeight=weight
           These settings control the cpu controller in the unified hierarchy.

           These options accept an integer value or the special string
           "idle":

           •   If set to an integer value, assign the specified CPU time
               weight to the processes executed, if the unified control group
               hierarchy is used on the system. These options control the
               "cpu.weight" control group attribute. The allowed range is 1 to
               10000. Defaults to unset, but the kernel default is 100. For
               details about this control group attribute, see Control Groups
               v2[2] and CFS Scheduler[3]. The available CPU time is split up
               among all units within one slice relative to their CPU time
               weight. A higher weight means more CPU time, a lower weight
               means less.

           •   If set to the special string "idle", mark the cgroup for "idle
               scheduling", which means that it will get CPU resources only
               when there are no processes not marked in this way to execute
               in this cgroup or its siblings. This setting corresponds to the
               "cpu.idle" cgroup attribute.

               Note that this value only has an effect on cgroup-v2; for
               cgroup-v1 it is equivalent to the minimum weight.

           While StartupCPUWeight= applies to the startup and shutdown phases
           of the system, CPUWeight= applies to normal runtime of the system,
           and if the former is not set also to the startup and shutdown
           phases. Using StartupCPUWeight= allows prioritizing specific
           services at boot-up and shutdown differently than during normal
           runtime.

           In addition to the resource allocation performed by the cpu
           controller, the kernel may automatically divide resources based on
           session-id grouping, see "The autogroup feature" in sched(7). The
           effect of this feature is similar to the cpu controller with no
           explicit configuration, so users should be careful to not mistake
           one for the other.
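
           A minimal sketch of how the two settings might be combined in a
           service unit. The values are illustrative only; each weight is
           relative to sibling units, which use the kernel default of 100
           when unset.

```ini
[Service]
# Twice the kernel default weight of 100 during normal runtime.
CPUWeight=200
# A lower weight while the system is booting up or shutting down.
StartupCPUWeight=50
```

           Writing CPUWeight=idle instead would request idle scheduling for
           the unit, as described in the second item above.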

       CPUQuota=
           This setting controls the cpu controller in the unified hierarchy.

           Assign the specified CPU time quota to the processes executed.
           Takes a percentage value, suffixed with "%". The percentage
           specifies how much CPU time the unit shall get at maximum, relative
           to the total CPU time available on one CPU. Use values > 100% for
           allotting CPU time on more than one CPU. This controls the
           "cpu.max" attribute on the unified control group hierarchy and
           "cpu.cfs_quota_us" on legacy. For details about these control group
           attributes, see Control Groups v2[2] and CFS Bandwidth Control[4].
           Setting CPUQuota= to an empty value unsets the quota.

           Example: CPUQuota=20% ensures that the executed processes will
           never get more than 20% CPU time on one CPU.

       CPUQuotaPeriodSec=
           This setting controls the cpu controller in the unified hierarchy.

           Assign the duration over which the CPU time quota specified by
           CPUQuota= is measured. Takes a time duration value in seconds, with
           an optional suffix such as "ms" for milliseconds (or "s" for
           seconds). The default setting is 100ms. The period is clamped to
           the range supported by the kernel, which is [1ms, 1000ms].
           Additionally, the period is adjusted up so that the quota interval
           is also at least 1ms. Setting CPUQuotaPeriodSec= to an empty value
           resets it to the default.

           This controls the second field of the "cpu.max" attribute on the
           unified control group hierarchy and "cpu.cfs_period_us" on legacy.
           For details about these control group attributes, see Control
           Groups v2[2] and CFS Scheduler[3].

           Example: CPUQuotaPeriodSec=10ms requests that the CPU quota be
           measured in periods of 10ms.
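
           The two settings combine as sketched below; the values are
           illustrative. Following the description above, a 150% quota over
           a 10ms period corresponds to 15ms of CPU time per 10ms period,
           which would be expected to appear in "cpu.max" as "15000 10000"
           (microseconds). That figure is derived from the text, not read
           from a running system.

```ini
[Service]
# Allow up to one and a half CPUs' worth of CPU time...
CPUQuota=150%
# ...measured over 10ms periods instead of the 100ms default.
CPUQuotaPeriodSec=10ms
```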

       AllowedCPUs=, StartupAllowedCPUs=
           These settings control the cpuset controller in the unified
           hierarchy.

           Restrict processes to be executed on specific CPUs. Takes a list of
           CPU indices or ranges separated by either whitespace or commas. CPU
           ranges are specified by the lower and upper CPU indices separated
           by a dash.

           Setting AllowedCPUs= or StartupAllowedCPUs= doesn't guarantee that
           all of the CPUs will be used by the processes, as the set may be
           limited by parent units. The effective configuration is reported as
           EffectiveCPUs=.

           While StartupAllowedCPUs= applies to the startup and shutdown
           phases of the system, AllowedCPUs= applies to normal runtime of the
           system, and if the former is not set also to the startup and
           shutdown phases. Using StartupAllowedCPUs= allows prioritizing
           specific services at boot-up and shutdown differently than during
           normal runtime.

           These settings are supported only with the unified control group
           hierarchy.
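
           A short sketch of the list syntax; the CPU indices are
           illustrative and assume a machine with at least nine CPUs.

```ini
[Service]
# CPUs 0 through 3 plus CPU 8 during normal runtime.
# The same list could also be written with whitespace: 0-3 8
AllowedCPUs=0-3,8
# A wider set while the system is booting up or shutting down.
StartupAllowedCPUs=0-7
```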

   Memory Accounting and Control
       MemoryAccounting=
           This setting controls the memory controller in the unified
           hierarchy.

           Turn on process and kernel memory accounting for this unit. Takes a
           boolean argument. Note that turning on memory accounting for one
           unit will also implicitly turn it on for all units contained in the
           same slice and for all its parent slices and the units contained
           therein. The system default for this setting may be controlled with
           DefaultMemoryAccounting= in systemd-system.conf(5).

       MemoryMin=bytes, MemoryLow=bytes
           These settings control the memory controller in the unified
           hierarchy.

           Specify the memory usage protection of the executed processes in
           this unit. When reclaiming memory, the unit is treated as if it
           were using less memory, so that memory is preferentially reclaimed
           from unprotected units. Using MemoryLow= results in a weaker
           protection where memory may still be reclaimed to avoid invoking
           the OOM killer in case there is no other reclaimable memory.

           For a protection to be effective, it is generally required to set a
           corresponding allocation on all ancestors, which is then
           distributed between children (with the exception of the root
           slice). Any MemoryMin= or MemoryLow= allocation that is not
           explicitly distributed to specific children is used to create a
           shared protection for all children. As this is a shared protection,
           the children will freely compete for the memory.

           Takes a memory size in bytes. If the value is suffixed with K, M, G
           or T, the specified memory size is parsed as Kilobytes, Megabytes,
           Gigabytes, or Terabytes (with the base 1024), respectively.
           Alternatively, a percentage value may be specified, which is taken
           relative to the installed physical memory on the system. If
           assigned the special value "infinity", all available memory is
           protected, which may be useful in order to always inherit all of
           the protection afforded by ancestors. This controls the
           "memory.min" or "memory.low" control group attribute. For details
           about this control group attribute, see Memory Interface Files[5].

           Units may have their children use a default "memory.min" or
           "memory.low" value by specifying DefaultMemoryMin= or
           DefaultMemoryLow=, which have the same semantics as MemoryMin= and
           MemoryLow=. These settings do not affect "memory.min" or
           "memory.low" in the unit itself. Using them to set a default child
           allocation is only useful on kernels older than 5.7, which do not
           support the "memory_recursiveprot" cgroup2 mount option.
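
           Because protection generally has to be configured on the
           ancestors as well, a sketch needs at least two units. The unit
           names db.slice and db.service and the sizes below are
           hypothetical; the block shows fragments of two separate files.

```ini
# db.slice (hypothetical slice): protection granted to the whole subtree.
[Slice]
MemoryLow=2G

# db.service (hypothetical service): its explicit share of that protection.
[Service]
Slice=db.slice
MemoryLow=1G
```

           In this sketch the remaining 1G of the slice's allocation is not
           assigned to a specific child, so per the description above it
           would act as shared protection for all units in db.slice.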

       MemoryHigh=bytes
           This setting controls the memory controller in the unified
           hierarchy.

           Specify the throttling limit on memory usage of the executed
           processes in this unit. Memory usage may go above the limit if
           unavoidable, but the processes are heavily slowed down and memory
           is taken away aggressively in such cases. This is the main
           mechanism to control memory usage of a unit.

           Takes a memory size in bytes. If the value is suffixed with K, M, G
           or T, the specified memory size is parsed as Kilobytes, Megabytes,
           Gigabytes, or Terabytes (with the base 1024), respectively.
           Alternatively, a percentage value may be specified, which is taken
           relative to the installed physical memory on the system. If
           assigned the special value "infinity", no memory throttling is
           applied. This controls the "memory.high" control group attribute.
           For details about this control group attribute, see Memory
           Interface Files[5].

       MemoryMax=bytes
           This setting controls the memory controller in the unified
           hierarchy.

           Specify the absolute limit on memory usage of the executed
           processes in this unit. If memory usage cannot be contained under
           the limit, the out-of-memory killer is invoked inside the unit. It
           is recommended to use MemoryHigh= as the main control mechanism and
           use MemoryMax= as the last line of defense.

           Takes a memory size in bytes. If the value is suffixed with K, M, G
           or T, the specified memory size is parsed as Kilobytes, Megabytes,
           Gigabytes, or Terabytes (with the base 1024), respectively.
           Alternatively, a percentage value may be specified, which is taken
           relative to the installed physical memory on the system. If
           assigned the special value "infinity", no memory limit is applied.
           This controls the "memory.max" control group attribute. For details
           about this control group attribute, see Memory Interface Files[5].
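
           The recommendation above, MemoryHigh= as the main control with
           MemoryMax= as a backstop, might be written as in the following
           sketch. The sizes are illustrative; with base 1024, 768M means
           768 × 1024 × 1024 bytes.

```ini
[Service]
# Throttle the unit and reclaim aggressively above 768M...
MemoryHigh=768M
# ...and invoke the OOM killer inside the unit if usage cannot be kept under 1G.
MemoryMax=1G
```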

       MemorySwapMax=bytes
           This setting controls the memory controller in the unified
           hierarchy.

           Specify the absolute limit on swap usage of the executed processes
           in this unit.

           Takes a swap size in bytes. If the value is suffixed with K, M, G
           or T, the specified swap size is parsed as Kilobytes, Megabytes,
           Gigabytes, or Terabytes (with the base 1024), respectively. If
           assigned the special value "infinity", no swap limit is applied.
           This setting controls the "memory.swap.max" control group
           attribute. For details about this control group attribute, see
           Memory Interface Files[5].
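
           A one-line sketch with an illustrative value:

```ini
[Service]
# Cap swap usage of the unit at 512M (suffixes use base 1024).
MemorySwapMax=512M
```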

       MemoryZSwapMax=bytes
           This setting controls the memory controller in the unified
           hierarchy.

           Specify the absolute limit on zswap usage of the processes in this
           unit. Zswap is a lightweight compressed cache for swap pages. It
           takes pages that are in the process of being swapped out and
           attempts to compress them into a dynamically allocated RAM-based
           memory pool. If the limit specified is hit, no entries from this
           unit will be stored in the pool until existing entries are faulted
           back or written out to disk. See the kernel's Zswap[6]
           documentation for more details.

           Takes a size in bytes. If the value is suffixed with K, M, G or T,
           the specified size is parsed as Kilobytes, Megabytes, Gigabytes, or
           Terabytes (with the base 1024), respectively. If assigned the
           special value "infinity", no limit is applied. This setting
           controls the "memory.zswap.max" control group attribute. For
           details about this control group attribute, see Memory Interface
           Files[5].

       AllowedMemoryNodes=, StartupAllowedMemoryNodes=
           These settings control the cpuset controller in the unified
           hierarchy.

           Restrict processes to be executed on specific memory NUMA nodes.
           Takes a list of memory NUMA node indices or ranges separated by
           either whitespace or commas. Memory NUMA node ranges are specified
           by the lower and upper NUMA node indices separated by a dash.

           Setting AllowedMemoryNodes= or StartupAllowedMemoryNodes= doesn't
           guarantee that all of the memory NUMA nodes will be used by the
           processes, as the set may be limited by parent units. The effective
           configuration is reported as EffectiveMemoryNodes=.

           While StartupAllowedMemoryNodes= applies to the startup and
           shutdown phases of the system, AllowedMemoryNodes= applies to
           normal runtime of the system, and if the former is not set also to
           the startup and shutdown phases. Using StartupAllowedMemoryNodes=
           allows prioritizing specific services at boot-up and shutdown
           differently than during normal runtime.

           These settings are supported only with the unified control group
           hierarchy.
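
           A short sketch of the list syntax; the node indices are
           illustrative and assume a machine with at least four NUMA nodes.

```ini
[Service]
# NUMA nodes 0 and 1 during normal runtime.
AllowedMemoryNodes=0-1
# Any of nodes 0 through 3 during boot-up and shutdown.
StartupAllowedMemoryNodes=0-3
```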

   Process Accounting and Control
       TasksAccounting=
           This setting controls the pids controller in the unified hierarchy.

           Turn on task accounting for this unit. Takes a boolean argument. If
           enabled, the kernel will keep track of the total number of tasks in
           the unit and its children. This number includes both kernel threads
           and userspace processes, with each thread counted individually.
           Note that turning on tasks accounting for one unit will also
           implicitly turn it on for all units contained in the same slice and
           for all its parent slices and the units contained therein. The
           system default for this setting may be controlled with
           DefaultTasksAccounting= in systemd-system.conf(5).

       TasksMax=N
           This setting controls the pids controller in the unified hierarchy.

           Specify the maximum number of tasks that may be created in the
           unit. This ensures that the number of tasks accounted for the unit
           (see above) stays below a specific limit. This either takes an
           absolute number of tasks or a percentage value that is taken
           relative to the configured maximum number of tasks on the system.
           If assigned the special value "infinity", no tasks limit is
           applied. This controls the "pids.max" control group attribute. For
           details about this control group attribute, see the pids
           controller[7].

           The system default for this setting may be controlled with
           DefaultTasksMax= in systemd-system.conf(5).
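
           A sketch of the two accepted forms, with illustrative values.
           Only one assignment would normally be active, so the alternative
           is shown commented out.

```ini
[Service]
# An absolute limit: at most 512 tasks, each thread counted individually.
TasksMax=512
# Alternatively, a share of the system-wide configured maximum:
#TasksMax=10%
```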

   IO Accounting and Control
       IOAccounting=
           This setting controls the io controller in the unified hierarchy.

           Turn on Block I/O accounting for this unit, if the unified control
           group hierarchy is used on the system. Takes a boolean argument.
           Note that turning on block I/O accounting for one unit will also
           implicitly turn it on for all units contained in the same slice and
           for all its parent slices and the units contained therein. The
           system default for this setting may be controlled with
           DefaultIOAccounting= in systemd-system.conf(5).

       IOWeight=weight, StartupIOWeight=weight
           These settings control the io controller in the unified hierarchy.

           Set the default overall block I/O weight for the executed
           processes, if the unified control group hierarchy is used on the
           system. Takes a single weight value (between 1 and 10000) to set
           the default block I/O weight. This controls the "io.weight" control
           group attribute, which defaults to 100. For details about this
           control group attribute, see IO Interface Files[8]. The available
           I/O bandwidth is split up among all units within one slice relative
           to their block I/O weight. A higher weight means more I/O
           bandwidth, a lower weight means less.

           While StartupIOWeight= applies to the startup and shutdown phases
           of the system, IOWeight= applies to the later runtime of the
           system, and if the former is not set also to the startup and
           shutdown phases. This allows prioritizing specific services at
           boot-up and shutdown differently than during runtime.

       IODeviceWeight=device weight
           This setting controls the io controller in the unified hierarchy.

           Set the per-device overall block I/O weight for the executed
           processes, if the unified control group hierarchy is used on the
           system. Takes a space-separated pair of a file path and a weight
           value to specify the device specific weight value, between 1 and
           10000. (Example: "/dev/sda 1000"). The file path may be specified
           as path to a block device node or as any other file, in which case
           the backing block device of the file system of the file is
           determined. This controls the "io.weight" control group attribute,
           which defaults to 100. Use this option multiple times to set
           weights for multiple devices. For details about this control group
           attribute, see IO Interface Files[8].

           The specified device node should reference a block device that has
           an I/O scheduler associated, i.e. should not refer to partition or
           loopback block devices, but to the originating, physical device.
           When a path to a regular file or directory is specified, an attempt
           is made to discover the correct originating device backing the
           file system of the specified path. This works correctly only for
           simpler cases, where the file system is directly placed on a
           partition or physical block device, or where simple 1:1 encryption
           using dm-crypt/LUKS is used. This discovery does not cover complex
           storage and in particular RAID and volume management storage
           devices.
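
           A sketch combining the overall and per-device weights. The
           device paths and values are illustrative; the paths are assumed
           to name whole physical devices with an I/O scheduler, as required
           above.

```ini
[Service]
# Overall weight, relative to the default of 100.
IOWeight=200
# Per-device weights; the directive is repeated once per device.
IODeviceWeight=/dev/sda 1000
IODeviceWeight=/dev/sdb 50
```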

       IOReadBandwidthMax=device bytes, IOWriteBandwidthMax=device bytes
           These settings control the io controller in the unified hierarchy.

           Set the per-device overall block I/O bandwidth maximum limit for
           the executed processes, if the unified control group hierarchy is
           used on the system. This limit is not work-conserving and the
           executed processes are not allowed to use more even if the device
           has idle capacity. Takes a space-separated pair of a file path and
           a bandwidth value (in bytes per second) to specify the device
           specific bandwidth. The file path may be a path to a block device
           node, or any other file, in which case the backing block device
           of the file system of the file is used. If the bandwidth is
           suffixed with K, M, G, or T, the specified bandwidth is parsed as
           Kilobytes, Megabytes, Gigabytes, or Terabytes, respectively, to the
           base of 1000. (Example:
           "/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 5M"). This
           controls the "io.max" control group attribute. Use this option
           multiple times to set bandwidth limits for multiple devices. For
           details about this control group attribute, see IO Interface
           Files[8].

           Similar restrictions on block device discovery as for
           IODeviceWeight= apply, see above.

       IOReadIOPSMax=device IOPS, IOWriteIOPSMax=device IOPS
           These settings control the io controller in the unified hierarchy.

           Set the per-device overall block I/O IOs-Per-Second maximum limit
           for the executed processes, if the unified control group hierarchy
           is used on the system. This limit is not work-conserving and the
           executed processes are not allowed to use more even if the device
           has idle capacity. Takes a space-separated pair of a file path and
           an IOPS value to specify the device specific IOPS. The file path
           may be a path to a block device node, or any other file, in which
           case the backing block device of the file system of the file is
           used. If the IOPS is suffixed with K, M, G, or T, the specified
           IOPS is parsed as KiloIOPS, MegaIOPS, GigaIOPS, or TeraIOPS,
           respectively, to the base of 1000. (Example:
           "/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 1K"). This
           controls the "io.max" control group attribute. Use this option
           multiple times to set IOPS limits for multiple devices. For details
           about this control group attribute, see IO Interface Files[8].

           Similar restrictions on block device discovery as for
           IODeviceWeight= apply, see above.
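
           A sketch showing a bandwidth limit and an IOPS limit on the same
           device. The device path and values are illustrative; the
           suffixes are interpreted as described above.

```ini
[Service]
# Read bandwidth from /dev/sda capped at 5M bytes per second.
IOReadBandwidthMax=/dev/sda 5M
# Writes to the same device capped at 1K I/O operations per second.
IOWriteIOPSMax=/dev/sda 1K
```

           Because these limits are not work-conserving, the unit stays
           below them even when the device is otherwise idle.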
525
526       IODeviceLatencyTargetSec=device target
527           This setting controls the io controller in the unified hierarchy.
528
529           Set the per-device average target I/O latency for the executed
530           processes, if the unified control group hierarchy is used on the
531           system. Takes a file path and a timespan separated by a space to
532           specify the device specific latency target. (Example: "/dev/sda
533           25ms"). The file path may be specified as path to a block device
534           node or as any other file, in which case the backing block device
535           of the file system of the file is determined. This controls the
536           "io.latency" control group attribute. Use this option multiple
537           times to set latency target for multiple devices. For details about
538           this control group attribute, see IO Interface Files[8].
539
540           Implies "IOAccounting=yes".
541
542           These settings are supported only if the unified control group
543           hierarchy is used.
544
545           Similar restrictions on block device discovery as for
546           IODeviceWeight= apply, see above.
547
548   Network Accounting and Control
549       IPAccounting=
550           Takes a boolean argument. If true, turns on IPv4 and IPv6 network
551           traffic accounting for packets sent or received by the unit. When
552           this option is turned on, all IPv4 and IPv6 sockets created by any
553           process of the unit are accounted for.
554
555           When this option is used in socket units, it applies to all IPv4
556           and IPv6 sockets associated with it (including both listening and
557           connection sockets where this applies). Note that for
558           socket-activated services, this configuration setting and the
559           accounting data of the service unit and the socket unit are kept
560           separate, and displayed separately. No propagation of the setting
561           and the collected statistics is done, in either direction.
562           Moreover, any traffic sent or received on any of the socket unit's
563           sockets is accounted to the socket unit — and never to the service
564           unit it might have activated, even if the socket is used by it.
565
566           The system default for this setting may be controlled with
567           DefaultIPAccounting= in systemd-system.conf(5).
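
           Because the setting is not propagated between a socket unit and
           the service it activates, accounting for a socket-activated
           service would typically be enabled on both units. A sketch,
           assuming hypothetical units named example.socket and
           example.service:

           ```ini
           # example.socket (hypothetical unit name)
           [Socket]
           IPAccounting=yes

           # example.service (hypothetical unit name)
           [Service]
           IPAccounting=yes
           ```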
568
569       IPAddressAllow=ADDRESS[/PREFIXLENGTH]...,
570       IPAddressDeny=ADDRESS[/PREFIXLENGTH]...
571           Turn on network traffic filtering for IP packets sent and received
572           over AF_INET and AF_INET6 sockets. Both directives take a space
573           separated list of IPv4 or IPv6 addresses, each optionally suffixed
574           with an address prefix length in bits after a "/" character. If the
575           suffix is omitted, the address is considered a host address, i.e.
576           the filter covers the whole address (32 bits for IPv4, 128 bits for
577           IPv6).
578
579           The access lists configured with this option are applied to all
580           sockets created by processes of this unit (or in the case of socket
581           units, associated with it). The lists are implicitly combined with
582           any lists configured for any of the parent slice units this unit
583           might be a member of. By default both access lists are empty. Both
584           ingress and egress traffic is filtered by these settings. In case
585           of ingress traffic the source IP address is checked against these
586           access lists, in case of egress traffic the destination IP address
587           is checked. The following rules are applied in turn:
588
589           •   Access is granted when the checked IP address matches an entry
590               in the IPAddressAllow= list.
591
592           •   Otherwise, access is denied when the checked IP address matches
593               an entry in the IPAddressDeny= list.
594
595           •   Otherwise, access is granted.
596
597           In order to implement an allow-listing IP firewall, it is
598           recommended to use an IPAddressDeny=any setting on an upper-level
599           slice unit (such as the root slice -.slice or the slice containing
600           all system services system.slice – see systemd.special(7) for
601           details on these slice units), plus individual per-service
602           IPAddressAllow= lines permitting network access to relevant
603           services, and only them.
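
           The allow-listing setup described above could look roughly like
           the following pair of drop-ins. The drop-in file names, the unit
           name example.service and the address ranges are assumptions made
           for illustration only:

           ```ini
           # /etc/systemd/system/system.slice.d/10-ip-deny.conf (hypothetical path)
           [Slice]
           # Deny everything that is not explicitly allowed further down.
           IPAddressDeny=any

           # /etc/systemd/system/example.service.d/10-ip-allow.conf (hypothetical path)
           [Service]
           # Documentation-only example ranges; replace with real peers.
           IPAddressAllow=192.0.2.0/24 2001:db8::/32
           IPAddressAllow=localhost
           ```

           Since an address matching IPAddressAllow= is granted access
           before IPAddressDeny= is consulted, the per-service entries take
           effect even though the slice denies all traffic.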
604
605           Note that for socket-activated services, the IP access list
606           configured on the socket unit applies to all sockets associated
607           with it directly, but not to any sockets created by the ultimately
608           activated services for it. Conversely, the IP access list
609           configured for the service is not applied to any sockets passed
610           into the service via socket activation. Thus, it is usually a good
611           idea to replicate the IP access lists on both the socket and the
612           service unit. Nevertheless, it may make sense to maintain one list
613           more open and the other one more restricted, depending on the
614           use case.
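
           A sketch of replicating the same access list on a socket unit and
           the service it activates. The unit names, the port and the
           address range are hypothetical:

           ```ini
           # example.socket (hypothetical unit name)
           [Socket]
           ListenStream=8080
           IPAddressAllow=10.0.0.0/8
           IPAddressDeny=any

           # example.service (hypothetical unit name)
           [Service]
           # Covers sockets the service creates itself, not the passed-in one.
           IPAddressAllow=10.0.0.0/8
           IPAddressDeny=any
           ```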
615
616           If these settings are used multiple times in the same unit the
617           specified lists are combined. If an empty string is assigned to
618           these settings the specific access list is reset and all previous
619           settings undone.
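
           For instance, a drop-in might discard an allow list configured
           earlier in the same unit and start over. The address below is an
           illustrative assumption:

           ```ini
           [Service]
           # The empty assignment resets the list built up so far.
           IPAddressAllow=
           # Only this entry remains in the unit's own allow list.
           IPAddressAllow=192.0.2.10
           ```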
620
621           In place of explicit IPv4 or IPv6 address and prefix length
622           specifications a small set of symbolic names may be used. The
623           following names are defined:
624
625           Table 1. Special address/network names
626           ┌──────────────┬─────────────────────┬─────────────────────┐
627           │Symbolic Name │ Definition          │ Meaning             │
628           ├──────────────┼─────────────────────┼─────────────────────┤
629           │any           │ 0.0.0.0/0 ::/0      │ Any host            │
630           ├──────────────┼─────────────────────┼─────────────────────┤
631           │localhost     │ 127.0.0.0/8 ::1/128 │ All addresses on    │
632           │              │                     │ the local loopback  │
633           ├──────────────┼─────────────────────┼─────────────────────┤
634           │link-local    │ 169.254.0.0/16      │ All link-local IP   │
635           │              │ fe80::/64           │ addresses           │
636           ├──────────────┼─────────────────────┼─────────────────────┤
637           │multicast     │ 224.0.0.0/4         │ All IP multicasting │
638           │              │ ff00::/8            │ addresses           │
639           └──────────────┴─────────────────────┴─────────────────────┘
640           Note that these settings might not be supported on some systems
641           (for example if eBPF control group support is not enabled in the
642           underlying kernel or container manager). These settings will have
643           no effect in that case. If compatibility with such systems is
644           desired it is hence recommended to not exclusively rely on them for
645           IP security.
646
647           This option cannot be bypassed by prefixing "+" to the executable
648           path in the service unit, as it applies to the whole control group.
649
650       SocketBindAllow=bind-rule, SocketBindDeny=bind-rule
651           Allow or deny binding a socket address to a socket by matching it
652           with the bind-rule and applying a corresponding action if there is
653           a match.
654
655           bind-rule describes socket properties such as address-family,
656           transport-protocol and ip-ports.
657
658           bind-rule := { [address-family:][transport-protocol:][ip-ports] |
659           any }
660
661           address-family := { ipv4 | ipv6 }
662
663           transport-protocol := { tcp | udp }
664
665           ip-ports := { ip-port | ip-port-range }
666
667           An optional address-family expects ipv4 or ipv6 values. If not
668           specified, a rule will be matched for both IPv4 and IPv6 addresses
669           and applied depending on other socket fields, e.g.
670           transport-protocol, ip-port.
671
672           An optional transport-protocol expects tcp or udp transport
673           protocol names. If not specified, a rule will be matched for any
674           transport protocol.
675
676           An optional ip-port value must lie within the 1...65535 interval,
677           inclusive, i.e. dynamic port 0 is not allowed. A range of
678           sequential ports is described by ip-port-range :=
679           ip-port-low-ip-port-high, where ip-port-low is smaller than or
680           equal to ip-port-high and both are within 1...65535, inclusive.
681
682           A special value any can be used to apply a rule to any address
683           family, transport protocol and any port with a positive value.
684
685           To allow multiple rules assign SocketBindAllow= or SocketBindDeny=
686           multiple times. To clear the existing assignments pass an empty
687           SocketBindAllow= or SocketBindDeny= assignment.
688
689           For each of SocketBindAllow= and SocketBindDeny=, the maximum
690           allowed number of assignments is 128.
691
692           •   Binding to a socket is allowed when a socket address matches an
693               entry in the SocketBindAllow= list.
694
695           •   Otherwise, binding is denied when the socket address matches an
696               entry in the SocketBindDeny= list.
697
698           •   Otherwise, binding is allowed.
699
700           The feature is implemented with cgroup/bind4 and cgroup/bind6
701           cgroup-bpf hooks.
702
703           Examples:
704
705               ...
706               # Allow binding IPv6 socket addresses with a port greater than or equal to 10000.
707               [Service]
708               SocketBindAllow=ipv6:10000-65535
709               SocketBindDeny=any
710               ...
711               # Allow binding IPv4 and IPv6 socket addresses with ports 1234 and 4321.
712               [Service]
713               SocketBindAllow=1234
714               SocketBindAllow=4321
715               SocketBindDeny=any
716               ...
717               # Deny binding IPv6 socket addresses.
718               [Service]
719               SocketBindDeny=ipv6
720               ...
721               # Deny binding IPv4 and IPv6 socket addresses.
722               [Service]
723               SocketBindDeny=any
724               ...
725               # Allow binding only over TCP
726               [Service]
727               SocketBindAllow=tcp
728               SocketBindDeny=any
729               ...
730               # Allow binding only over IPv6/TCP
731               [Service]
732               SocketBindAllow=ipv6:tcp
733               SocketBindDeny=any
734               ...
735               # Allow binding ports within 10000-65535 range over IPv4/UDP.
736               [Service]
737               SocketBindAllow=ipv4:udp:10000-65535
738               SocketBindDeny=any
739               ...
740
741           This option cannot be bypassed by prefixing "+" to the executable
742           path in the service unit, as it applies to the whole control group.
743
744       RestrictNetworkInterfaces=
745           Takes a list of space-separated network interface names. This
746           option restricts the network interfaces that processes of this unit
747           can use. By default processes can only use the network interfaces
748           listed (allow-list). If the first character of the rule is "~", the
749           effect is inverted: the processes can only use network interfaces
750           not listed (deny-list).
751
752           This option can appear multiple times, in which case the network
753           interface names are merged. If the empty string is assigned, the
754           set is reset and all prior assignments will have no effect.
755
756           If you specify both types of this option (i.e. allow-listing and
757           deny-listing), the first encountered will take precedence and will
758           dictate the default action (allow vs deny). Then the next
759           occurrences of this option will add or delete the listed network
760           interface names from the set, depending on its type and the default
761           action.
762
763           The loopback interface ("lo") is not treated in any special way,
764           you have to configure it explicitly in the unit file.
765
766           Example 1: allow-list
767
768               RestrictNetworkInterfaces=eth1
769               RestrictNetworkInterfaces=eth2
770
771           Programs in the unit will only be able to use the eth1 and eth2
772           network interfaces.
773
774           Example 2: deny-list
775
776               RestrictNetworkInterfaces=~eth1 eth2
777
778           Programs in the unit will be able to use any network interface but
779           eth1 and eth2.
780
781           Example 3: mixed
782
783               RestrictNetworkInterfaces=eth1 eth2
784               RestrictNetworkInterfaces=~eth1
785
786           Programs in the unit will only be able to use the eth2 network
787           interface.
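
           Example 4: allow-list that keeps loopback (a sketch; the interface
           name eth0 is an assumption)

           ```ini
           [Service]
           # "lo" is not allowed implicitly, so it is listed explicitly.
           RestrictNetworkInterfaces=lo eth0
           ```

           Programs in the unit would then only be able to use the lo and
           eth0 network interfaces.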
788
789           This option cannot be bypassed by prefixing "+" to the executable
790           path in the service unit, as it applies to the whole control group.
791
792   BPF Programs
793       IPIngressFilterPath=BPF_FS_PROGRAM_PATH,
794       IPEgressFilterPath=BPF_FS_PROGRAM_PATH
795           Add custom network traffic filters implemented as BPF programs,
796           applying to all IP packets sent and received over AF_INET and
797           AF_INET6 sockets. Takes an absolute path to a pinned BPF program in
798           the BPF virtual filesystem (/sys/fs/bpf/).
799
800           The filters configured with this option are applied to all sockets
801           created by processes of this unit (or in the case of socket units,
802           associated with it). The filters are loaded in addition to filters
803           of any of the parent slice units this unit might be a member of as
804           well as any IPAddressAllow= and IPAddressDeny= filters in any of
805           these units. By default there are no filters specified.
806
807           If these settings are used multiple times in the same unit all the
808           specified programs are attached. If an empty string is assigned to
809           these settings the program list is reset and all previously
810           specified programs are ignored.
811
812           If the path BPF_FS_PROGRAM_PATH in IPIngressFilterPath= assignment
813           is already being handled by BPFProgram= ingress hook, e.g.
814           BPFProgram=ingress:BPF_FS_PROGRAM_PATH, the assignment will still
815           be considered valid and the program will be attached to a
816           cgroup. Same for IPEgressFilterPath= path and egress hook.
817
818           Note that for socket-activated services, the IP filter programs
819           configured on the socket unit apply to all sockets associated with
820           it directly, but not to any sockets created by the ultimately
821           activated services for it. Conversely, the IP filter programs
822           configured for the service are not applied to any sockets passed
823           into the service via socket activation. Thus, it is usually a good
824           idea to replicate the IP filter programs on both the socket and
825           the service unit. However, it often makes sense to maintain one
826           configuration more open and the other one more restricted,
827           depending on the use case.
828
829           Note that these settings might not be supported on some systems
830           (for example if eBPF control group support is not enabled in the
831           underlying kernel or container manager). These settings will fail
832           the service in that case. If compatibility with such systems is
833           desired it is hence recommended to attach your filter manually
834           (requires Delegate=yes) instead of using this setting.
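
           A minimal sketch of attaching one ingress and one egress filter.
           The program paths are hypothetical and assume the programs have
           already been loaded and pinned in the BPF file system by some
           other means:

           ```ini
           [Service]
           # Hypothetical pinned programs; they must exist under /sys/fs/bpf/.
           IPIngressFilterPath=/sys/fs/bpf/example-ingress
           IPEgressFilterPath=/sys/fs/bpf/example-egress
           ```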
835
836       BPFProgram=type:program-path
837           BPFProgram= allows attaching custom BPF programs to the cgroup of a
838           unit. (This generalizes the functionality exposed via
839           IPEgressFilterPath= and IPIngressFilterPath= for other hooks.)
840           Cgroup-bpf hooks in the form of BPF programs loaded to the BPF
841           filesystem are attached with cgroup-bpf attach flags determined by
842           the unit. For details about attachment types and flags see
843           bpf.h[9]. Also refer to the general BPF documentation[10].
844
845           The specification of a BPF program consists of a pair of BPF
846           program type and program path in the file system, with ":" as the
847           separator: type:program-path.
848
849           The BPF program type is equivalent to the BPF attach type used in
850           bpftool. It may be one of egress, ingress, sock_create, sock_ops,
851           device, bind4, bind6, connect4, connect6, post_bind4, post_bind6,
852           sendmsg4, sendmsg6, sysctl, recvmsg4, recvmsg6, getsockopt,
853           setsockopt.
854
855           The specified program path must be an absolute path referencing a
856           BPF program inode in the bpffs file system (which generally means
857           it must begin with /sys/fs/bpf/). If a specified program does not
858           exist (i.e. has not been uploaded to the BPF subsystem of the
859           kernel yet), it will not be installed but unit activation will
860           continue (a warning will be printed to the logs).
861
862           Setting BPFProgram= to an empty value makes previous assignments
863           ineffective.
864
865           Multiple assignments of the same program type/path pair have the
866           same effect as a single assignment: the program will be attached
867           just once.
868
869           If a BPF egress program pinned to program-path is already being
870           handled by IPEgressFilterPath=, the BPFProgram= assignment will
871           still be considered valid and the program will be attached to a
872           cgroup. Similarly for the ingress hook and IPIngressFilterPath=.
873
874           BPF programs passed with BPFProgram= are attached to the cgroup of
875           a unit with BPF attach flag multi, which allows further attachments
876           of the same type within cgroup hierarchy topped by the unit cgroup.
877
878           Examples:
879
880               BPFProgram=egress:/sys/fs/bpf/egress-hook
881               BPFProgram=bind6:/sys/fs/bpf/sock-addr-hook
882
883   Device Access
884       DeviceAllow=
885           Control access to specific device nodes by the executed processes.
886           Takes two space-separated strings: a device node specifier followed
887           by a combination of r, w, m to control reading, writing, or
888           creation of the specific device nodes by the unit (mknod),
889           respectively. This functionality is implemented using eBPF
890           filtering.
891
892           When access to all physical devices should be disallowed,
893           PrivateDevices= may be used instead. See systemd.exec(5).
894
895           The device node specifier is either a path to a device node in the
896           file system, starting with /dev/, or a string starting with either
897           "char-" or "block-" followed by a device group name, as listed in
898           /proc/devices. The latter is useful to allow-list all current and
899           future devices belonging to a specific device group at once. The
900           device group is matched according to filename globbing rules, you
901           may hence use the "*" and "?"  wildcards. (Note that such globbing
902           wildcards are not available for device node path specifications!)
903           In order to match device nodes by numeric major/minor, use device
904           node paths in the /dev/char/ and /dev/block/ directories. However,
905           matching devices by major/minor is generally not recommended as
906           assignments are neither stable nor portable between systems or
907           different kernel versions.
908
909           Examples: /dev/sda5 is a path to a device node, referring to an ATA
910           or SCSI block device.  "char-pts" and "char-alsa" are specifiers
911           for all pseudo TTYs and all ALSA sound devices, respectively.
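
           Combining the specifiers above with access modes might look as
           follows. The choice of access modes is an assumption made for
           illustration:

           ```ini
           [Service]
           # Read-only access to one block device node.
           DeviceAllow=/dev/sda5 r
           # Read and write access to all pseudo TTYs and ALSA sound devices.
           DeviceAllow=char-pts rw
           DeviceAllow=char-alsa rw
           ```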
912           "char-cpu/*" is a specifier matching all CPU related device groups.
913
914           Note that allow lists defined this way should only reference device
915           groups which are resolvable at the time the unit is started. Any
916           device groups not resolvable then are not added to the device allow
917           list. In order to work around this limitation, consider extending
918           service units with a pair of After=modprobe@xyz.service and
919           Wants=modprobe@xyz.service lines that load the necessary kernel
920           module implementing the device group if missing. Example:
921
922               ...
923               [Unit]
924               Wants=modprobe@loop.service
925               After=modprobe@loop.service
926
927               [Service]
928               DeviceAllow=block-loop
929               DeviceAllow=/dev/loop-control
930               ...
931
932           This option cannot be bypassed by prefixing "+" to the executable
933           path in the service unit, as it applies to the whole control group.
934
935       DevicePolicy=auto|closed|strict
936           Control the policy for allowing device access:
937
938           strict
939               means to only allow types of access that are explicitly
940               specified.
941
942           closed
943               in addition, allows access to standard pseudo devices including
944               /dev/null, /dev/zero, /dev/full, /dev/random, and /dev/urandom.
945
946           auto
947               in addition, allows access to all devices if no explicit
948               DeviceAllow= is present. This is the default.
949
950           This option cannot be bypassed by prefixing "+" to the executable
951           path in the service unit, as it applies to the whole control group.
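
           A sketch of a unit that is limited to the standard pseudo devices
           plus one explicitly listed device. The device path /dev/fuse is
           used purely as an example:

           ```ini
           [Service]
           # closed: standard pseudo devices plus explicit DeviceAllow= entries.
           DevicePolicy=closed
           # Hypothetical additional device needed by the service.
           DeviceAllow=/dev/fuse rw
           ```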
952
953   Control Group Management
954       Slice=
955           The name of the slice unit to place the unit in. Defaults to
956           system.slice for all non-instantiated units of all unit types
957           (except for slice units themselves, see below). Instance units are
958           by default placed in a subslice of system.slice that is named after
959           the template name.
960
961           This option may be used to arrange systemd units in a hierarchy of
962           slices each of which might have resource settings applied.
963
964           For units of type slice, the only accepted value for this setting
965           is the parent slice. Since the name of a slice unit implies the
966           parent slice, it is hence redundant to ever set this parameter
967           directly for slice units.
968
969           Special care should be taken when relying on the default slice
970           assignment in templated service units that have
971           DefaultDependencies=no set, see systemd.service(5), section
972           "Default Dependencies" for details.
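
           A sketch of placing a service in a custom slice that carries a
           resource setting. The unit names and the memory limit are
           hypothetical:

           ```ini
           # example.slice (hypothetical unit name)
           [Slice]
           # Limit applied to everything placed in this slice.
           MemoryMax=2G

           # example.service (hypothetical unit name)
           [Service]
           Slice=example.slice
           ```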
973
974       Delegate=
975           Turns on delegation of further resource control partitioning to
976           processes of the unit. Units where this is enabled may create and
977           manage their own private subhierarchy of control groups below the
978           control group of the unit itself. For unprivileged services (i.e.
979           those using the User= setting) the unit's control group will be
980           made accessible to the relevant user.
981
982           When enabled the service manager will refrain from manipulating
983           control groups or moving processes below the unit's control group,
984           so that a clear concept of ownership is established: the control
985           group tree at the level of the unit's control group and above (i.e.
986           towards the root control group) is owned and managed by the service
987           manager of the host, while the control group tree below the unit's
988           control group is owned and managed by the unit itself.
989
990           Takes either a boolean argument or a (possibly empty) list of
991           control group controller names. If true, delegation is turned on,
992           and all supported controllers are enabled for the unit, making them
993           available to the unit's processes for management. If false,
994           delegation is turned off entirely (and no additional controllers
995           are enabled). If set to a list of controllers, delegation is turned
996           on, and the specified controllers are enabled for the unit.
997           Assigning the empty string will enable delegation, but reset the
998           list of controllers, and all assignments prior to this will have no
999           effect. Note that additional controllers other than the ones
1000           specified might be made available as well, depending on
1001           configuration of the containing slice unit or other units contained
1002           in it. Defaults to false.
1003
1004           Note that controller delegation to less privileged code is only
1005           safe on the unified control group hierarchy. Accordingly, access to
1006           the specified controllers will not be granted to unprivileged
1007           services on the legacy hierarchy, even when requested.
1008
1009           The following controller names may be specified: cpu, cpuacct,
1010           cpuset, io, blkio, memory, devices, pids, bpf-firewall, and
1011           bpf-devices.
1012
1013           Not all of these controllers are available on all kernels however,
1014           and some are specific to the unified hierarchy while others are
1015           specific to the legacy hierarchy. Also note that the kernel might
1016           support further controllers, which aren't covered here yet as
1017           delegation is either not supported at all for them or not defined
1018           cleanly.
1019
1020           Note that because of the hierarchical nature of the cgroup hierarchy,
1021           any controllers that are delegated will be enabled for the parent
1022           and sibling units of the unit with delegation.
1023
1024           For further details on the delegation model consult Control Group
1025           APIs and Delegation[11].
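
           A minimal sketch of delegating a subset of controllers to an
           unprivileged service. The user name and the controller selection
           are assumptions:

           ```ini
           [Service]
           # Hypothetical unprivileged user that will own the delegated subtree.
           User=example
           # Turn on delegation and enable these controllers for the unit.
           Delegate=cpu memory pids
           ```

           As noted above, further controllers may still become available
           depending on the configuration of the containing slice.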
1026
1027       DisableControllers=
1028           Disables controllers from being enabled for a unit's children. If a
1029           controller listed is already in use in its subtree, the controller
1030           will be removed from the subtree. This can be used to avoid
1031           configuration in child units from being able to implicitly or
1032           explicitly enable a controller. Defaults to empty.
1033
1034           Multiple controllers may be specified, separated by spaces. You may
1035           also pass DisableControllers= multiple times, in which case each
1036           new instance adds another controller to disable. Passing
1037           DisableControllers= by itself with no controller name present
1038           resets the disabled controller list.
1039
1040           It may not be possible to disable a controller after units have
1041           been started, if the unit or any child of the unit in question
1042           delegates controllers to its children, as any delegated subtree of
1043           the cgroup hierarchy is unmanaged by systemd.
1044
1045           The following controller names may be specified: cpu, cpuacct,
1046           cpuset, io, blkio, memory, devices, pids, bpf-firewall, and
1047           bpf-devices.
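
           For example, a slice could prevent its children from enabling two
           controllers. The slice name and the controller selection are
           hypothetical:

           ```ini
           # example.slice (hypothetical unit name)
           [Slice]
           # Units below this slice cannot enable the cpu or io controllers.
           DisableControllers=cpu io
           ```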
1048
1049   Memory Pressure Control
1050       ManagedOOMSwap=auto|kill, ManagedOOMMemoryPressure=auto|kill
1051           Specifies how systemd-oomd.service(8) will act on this unit's
1052           cgroups. Defaults to auto.
1053
1054           When set to kill, the unit becomes a candidate for monitoring by
1055           systemd-oomd. If the cgroup passes the limits set by oomd.conf(5)
1056           or the unit configuration, systemd-oomd will select a descendant
1057           cgroup and send SIGKILL to all of the processes under it. You can
1058           find more details on candidates and kill behavior at
1059           systemd-oomd.service(8) and oomd.conf(5).
1060
1061           Setting either of these properties to kill will also result in
1062           After= and Wants= dependencies on systemd-oomd.service unless
1063           DefaultDependencies=no is set.
1064
1065           When set to auto, systemd-oomd will not actively use this cgroup's
1066           data for monitoring and detection. However, if an ancestor cgroup
1067           has one of these properties set to kill, a unit with auto can still
1068           be a candidate for systemd-oomd to terminate.
1069
1070       ManagedOOMMemoryPressureLimit=
1071           Overrides the default memory pressure limit set by oomd.conf(5) for
1072           this unit (cgroup). Takes a percentage value between 0% and 100%,
1073           inclusive. This property is ignored unless
1074           ManagedOOMMemoryPressure=kill. Defaults to 0%, which means to use
1075           the default set by oomd.conf(5).
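
           A sketch of opting a unit in to memory pressure monitoring with a
           unit-specific limit. The 50% value is an arbitrary illustration,
           not a recommended threshold:

           ```ini
           [Service]
           # Make this unit's cgroup a candidate for systemd-oomd monitoring.
           ManagedOOMMemoryPressure=kill
           # Illustrative override of the default from oomd.conf(5).
           ManagedOOMMemoryPressureLimit=50%
           ```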
1076
1077       ManagedOOMPreference=none|avoid|omit
1078           Allows deprioritizing or omitting this unit's cgroup as a candidate
1079           when systemd-oomd needs to act. Requires support for extended
1080           attributes (see xattr(7)) in order to use avoid or omit.
1081
1082           When calculating candidates to relieve swap usage, systemd-oomd
1083           will only respect these extended attributes if the unit's cgroup is
1084           owned by root.
1085
1086           When calculating candidates to relieve memory pressure,
1087           systemd-oomd will only respect these extended attributes if the
1088           unit's cgroup is owned by root, or if the unit's cgroup owner and
1089           the owner of the monitored ancestor cgroup are the same. For
1090           example, if systemd-oomd is calculating candidates for -.slice,
1091           then extended attributes set on descendants of
1092           /user.slice/user-1000.slice/user@1000.service/ will be ignored
1093           because the descendants are owned by UID 1000, and -.slice is owned
1094           by UID 0. But, if calculating candidates for
1095           /user.slice/user-1000.slice/user@1000.service/, then extended
1096           attributes set on the descendants would be respected.
1097
1098           If this property is set to avoid, the service manager will convey
1099           this to systemd-oomd, which will only select this cgroup if there
1100           are no other viable candidates.
1101
1102           If this property is set to omit, the service manager will convey
1103           this to systemd-oomd, which will ignore this cgroup as a candidate
1104           and will not perform any actions on it.
1105
1106           It is recommended to use avoid and omit sparingly, as they can
1107           adversely affect systemd-oomd's kill behavior. Also note that these
1108           extended attributes are not applied recursively to cgroups under
1109           this unit's cgroup.
1110
1111           Defaults to none which means systemd-oomd will rank this unit's
1112           cgroup as defined in systemd-oomd.service(8) and oomd.conf(5).
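
           A minimal sketch of deprioritizing a unit as a kill candidate.
           Whether this is appropriate depends on the workload:

           ```ini
           [Service]
           # systemd-oomd selects this cgroup only if no other candidate is viable.
           ManagedOOMPreference=avoid
           ```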
1113

HISTORY

1115       systemd 252
1116           Options for controlling the Legacy Control Group Hierarchy (Control
1117           Groups version 1[12]) are now fully deprecated: CPUShares=weight,
1118           StartupCPUShares=weight, MemoryLimit=bytes, BlockIOAccounting=,
1119           BlockIOWeight=weight, StartupBlockIOWeight=weight,
1120           BlockIODeviceWeight=device weight, BlockIOReadBandwidth=device
1121           bytes, BlockIOWriteBandwidth=device bytes. Please switch to the
1122           unified cgroup hierarchy.
1123

NOTES

1133        1. New Control Group Interfaces
1134           https://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface
1135
1136        2. Control Groups v2
1137           https://docs.kernel.org/admin-guide/cgroup-v2.html
1138
1139        3. CFS Scheduler
1140           https://docs.kernel.org/scheduler/sched-design-CFS.html
1141
1142        4. CFS Bandwidth Control
1143           https://docs.kernel.org/scheduler/sched-bwc.html
1144
1145        5. Memory Interface Files
1146           https://docs.kernel.org/admin-guide/cgroup-v2.html#memory-interface-files
1147
1148        6. Zswap
1149           https://www.kernel.org/doc/html/latest/admin-guide/mm/zswap.html
1150
1151        7. pids controller
1152           https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#pid
1153
1154        8. IO Interface Files
1155           https://docs.kernel.org/admin-guide/cgroup-v2.html#io-interface-files
1156
1157        9. bpf.h
1158           https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/include/uapi/linux/bpf.h
1159
1160       10. BPF documentation
1161           https://docs.kernel.org/bpf/
1162
1163       11. Control Group APIs and Delegation
1164           https://systemd.io/CGROUP_DELEGATION
1165
1166       12. Control Groups version 1
1167           https://docs.kernel.org/admin-guide/cgroup-v1/index.html
1168
1169
1170
1171systemd 253                                        SYSTEMD.RESOURCE-CONTROL(5)