SYSTEMD.RESOURCE-CONTROL(5)   systemd.resource-control   SYSTEMD.RESOURCE-CONTROL(5)

NAME

       systemd.resource-control - Resource control unit settings

SYNOPSIS

       slice.slice, scope.scope, service.service, socket.socket, mount.mount,
       swap.swap

DESCRIPTION

13       Unit configuration files for services, slices, scopes, sockets, mount
14       points, and swap devices share a subset of configuration options for
15       resource control of spawned processes. Internally, this relies on the
16       Linux Control Groups (cgroups) kernel concept for organizing processes
17       in a hierarchical tree of named groups for the purpose of resource
18       management.
19
20       This man page lists the configuration options shared by those six unit
21       types. See systemd.unit(5) for the common options of all unit
22       configuration files, and systemd.slice(5), systemd.scope(5),
23       systemd.service(5), systemd.socket(5), systemd.mount(5), and
24       systemd.swap(5) for more information on the specific unit configuration
25       files. The resource control configuration options are configured in the
26       [Slice], [Scope], [Service], [Socket], [Mount], or [Swap] sections,
27       depending on the unit type.
28
29       In addition, options which control resources available to programs
30       executed by systemd are listed in systemd.exec(5). Those options
31       complement options listed here.
32
33       See the New Control Group Interfaces[1] for an introduction on how to
34       make use of resource control APIs from programs.
35
36   Setting resource controls for a group of related units
37       As described in systemd.unit(5), the settings listed here may be set
38       through the main file of a unit and drop-in snippets in *.d/
39       directories. The list of directories searched for drop-ins includes
40       names formed by repeatedly truncating the unit name after all dashes.
41       This is particularly convenient to set resource limits for a group of
42       units with similar names.
43
44       For example, every user gets their own slice user-nnn.slice. Drop-ins
45       with local configuration that affect user 1000 may be placed in
46       /etc/systemd/system/user-1000.slice,
47       /etc/systemd/system/user-1000.slice.d/*.conf, but also
48       /etc/systemd/system/user-.slice.d/*.conf. This last directory applies
49       to all user slices.
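
       As a minimal sketch of the last case, a drop-in like the following
       would apply to every user slice. The file name 50-limits.conf and the
       values are illustrative assumptions, not defaults; the settings
       themselves are described under OPTIONS.

       ```ini
       # /etc/systemd/system/user-.slice.d/50-limits.conf (hypothetical name)
       [Slice]
       # Throttle each user's slice above 2G of memory and cap its task count.
       MemoryHigh=2G
       TasksMax=50%
       ```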
50

IMPLICIT DEPENDENCIES

52       The following dependencies are implicitly added:
53
54       •   Units with the Slice= setting set automatically acquire Requires=
55           and After= dependencies on the specified slice unit.
56

UNIFIED AND LEGACY CONTROL GROUP HIERARCHIES

       The unified control group hierarchy is the new version of the kernel
       control group interface, see Control Groups v2[2]. Depending on the
       resource type, there are differences in resource control capabilities.
       Also, because of interface changes, some resource types have a separate
       set of options on the unified hierarchy.
63
64       CPU
65           CPUWeight= and StartupCPUWeight= replace CPUShares= and
66           StartupCPUShares=, respectively.
67
68           The "cpuacct" controller does not exist separately on the unified
69           hierarchy.
70
71       Memory
           MemoryMax= replaces MemoryLimit=. MemoryLow= and MemoryHigh= are
           effective only on the unified hierarchy.
74
75       IO
           "IO"-prefixed settings are a superset of and replace
           "BlockIO"-prefixed ones. On the unified hierarchy, IO resource
           control also applies to buffered writes.
79
80       To ease the transition, there is best-effort translation between the
81       two versions of settings. For each controller, if any of the settings
82       for the unified hierarchy are present, all settings for the legacy
83       hierarchy are ignored. If the resulting settings are for the other type
84       of hierarchy, the configurations are translated before application.
85
       The legacy control group hierarchy (see Control Groups version 1[3]),
       also called cgroup-v1, doesn't allow safe delegation of controllers to
       unprivileged processes. If the system uses the legacy control group
       hierarchy, resource control is disabled for the systemd user instance,
       see systemd(1).
91

OPTIONS

93       Units of the types listed above can have settings for resource control
94       configuration:
95
96       CPUAccounting=
97           Turn on CPU usage accounting for this unit. Takes a boolean
98           argument. Note that turning on CPU accounting for one unit will
99           also implicitly turn it on for all units contained in the same
100           slice and for all its parent slices and the units contained
101           therein. The system default for this setting may be controlled with
102           DefaultCPUAccounting= in systemd-system.conf(5).
103
104       CPUWeight=weight, StartupCPUWeight=weight
105           Assign the specified CPU time weight to the processes executed, if
106           the unified control group hierarchy is used on the system. These
107           options take an integer value and control the "cpu.weight" control
108           group attribute. The allowed range is 1 to 10000. Defaults to 100.
109           For details about this control group attribute, see Control Groups
110           v2[2] and CFS Scheduler[4]. The available CPU time is split up
111           among all units within one slice relative to their CPU time weight.
112           A higher weight means more CPU time, a lower weight means less.
113
114           While StartupCPUWeight= applies to the startup and shutdown phases
115           of the system, CPUWeight= applies to normal runtime of the system,
116           and if the former is not set also to the startup and shutdown
117           phases. Using StartupCPUWeight= allows prioritizing specific
118           services at boot-up and shutdown differently than during normal
119           runtime.
120
121           These settings replace CPUShares= and StartupCPUShares=.
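
           For illustration, a service that should receive a larger share of
           CPU time under contention, and more still during boot, might be
           configured as in the sketch below. The values are arbitrary
           examples within the documented range.

           ```ini
           [Service]
           # The default weight is 100, so 200 requests twice the share of a
           # sibling unit in the same slice that uses the default.
           CPUWeight=200
           # Used instead of CPUWeight= during the startup and shutdown phases.
           StartupCPUWeight=500
           ```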
122
123       CPUQuota=
124           Assign the specified CPU time quota to the processes executed.
125           Takes a percentage value, suffixed with "%". The percentage
126           specifies how much CPU time the unit shall get at maximum, relative
127           to the total CPU time available on one CPU. Use values > 100% for
128           allotting CPU time on more than one CPU. This controls the
129           "cpu.max" attribute on the unified control group hierarchy and
130           "cpu.cfs_quota_us" on legacy. For details about these control group
131           attributes, see Control Groups v2[2] and sched-bwc.txt[5]. Setting
132           CPUQuota= to an empty value unsets the quota.
133
134           Example: CPUQuota=20% ensures that the executed processes will
135           never get more than 20% CPU time on one CPU.
136
137       CPUQuotaPeriodSec=
           Assign the duration over which the CPU time quota specified by
           CPUQuota= is measured. Takes a time duration value in seconds, with
           an optional suffix such as "ms" for milliseconds (or "s" for
           seconds). The default setting is 100ms. The period is clamped to
           the range supported by the kernel, which is [1ms, 1000ms].
           Additionally, the period is adjusted up so that the quota interval
           is also at least 1ms. Setting CPUQuotaPeriodSec= to an empty value
           resets it to the default.
146
147           This controls the second field of "cpu.max" attribute on the
148           unified control group hierarchy and "cpu.cfs_period_us" on legacy.
149           For details about these control group attributes, see Control
150           Groups v2[2] and CFS Scheduler[4].
151
152           Example: CPUQuotaPeriodSec=10ms to request that the CPU quota is
153           measured in periods of 10ms.
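
           A sketch combining CPUQuota= with a custom period follows; the
           numbers are illustrative only.

           ```ini
           [Service]
           # Values above 100% allot time on more than one CPU.
           CPUQuota=150%
           # Measure the quota over 10ms periods instead of the default 100ms.
           CPUQuotaPeriodSec=10ms
           ```

           With these values the unit may consume up to 15ms of CPU time in
           each 10ms period, summed over all CPUs.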
154
155       AllowedCPUs=, StartupAllowedCPUs=
156           Restrict processes to be executed on specific CPUs. Takes a list of
157           CPU indices or ranges separated by either whitespace or commas. CPU
158           ranges are specified by the lower and upper CPU indices separated
159           by a dash.
160
161           Setting AllowedCPUs= or StartupAllowedCPUs= doesn't guarantee that
162           all of the CPUs will be used by the processes as it may be limited
163           by parent units. The effective configuration is reported as
164           EffectiveCPUs=.
165
166           While StartupAllowedCPUs= applies to the startup and shutdown
167           phases of the system, AllowedCPUs= applies to normal runtime of the
168           system, and if the former is not set also to the startup and
169           shutdown phases. Using StartupAllowedCPUs= allows prioritizing
170           specific services at boot-up and shutdown differently than during
171           normal runtime.
172
173           This setting is supported only with the unified control group
174           hierarchy.
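
           As an example of the list syntax, the following sketch mixes a
           range with a single index. The CPU numbers are assumptions about a
           hypothetical machine.

           ```ini
           [Service]
           # CPUs 0 through 3 plus CPU 8 during normal runtime.
           AllowedCPUs=0-3 8
           # A different set while the system is starting up or shutting down.
           StartupAllowedCPUs=0-7
           ```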
175
176       AllowedMemoryNodes=, StartupAllowedMemoryNodes=
           Restrict processes to be executed on specific memory NUMA nodes.
           Takes a list of memory NUMA node indices or ranges separated by
           either whitespace or commas. Memory NUMA node ranges are specified
           by the lower and upper NUMA node indices separated by a dash.
181
182           Setting AllowedMemoryNodes= or StartupAllowedMemoryNodes= doesn't
183           guarantee that all of the memory NUMA nodes will be used by the
184           processes as it may be limited by parent units. The effective
185           configuration is reported as EffectiveMemoryNodes=.
186
187           While StartupAllowedMemoryNodes= applies to the startup and
188           shutdown phases of the system, AllowedMemoryNodes= applies to
189           normal runtime of the system, and if the former is not set also to
190           the startup and shutdown phases. Using StartupAllowedMemoryNodes=
191           allows prioritizing specific services at boot-up and shutdown
192           differently than during normal runtime.
193
194           This setting is supported only with the unified control group
195           hierarchy.
196
197       MemoryAccounting=
198           Turn on process and kernel memory accounting for this unit. Takes a
199           boolean argument. Note that turning on memory accounting for one
200           unit will also implicitly turn it on for all units contained in the
201           same slice and for all its parent slices and the units contained
202           therein. The system default for this setting may be controlled with
203           DefaultMemoryAccounting= in systemd-system.conf(5).
204
205       MemoryMin=bytes, MemoryLow=bytes
           Specify the memory usage protection of the executed processes in
           this unit. When reclaiming memory, the unit is treated as if it
           were using less memory, so that memory is preferentially reclaimed
           from unprotected units. Using MemoryLow= results in a weaker
           protection where memory may still be reclaimed to avoid invoking
           the OOM killer in case there is no other reclaimable memory.
213
214           For a protection to be effective, it is generally required to set a
215           corresponding allocation on all ancestors, which is then
216           distributed between children (with the exception of the root
217           slice). Any MemoryMin= or MemoryLow= allocation that is not
218           explicitly distributed to specific children is used to create a
219           shared protection for all children. As this is a shared protection,
220           the children will freely compete for the memory.
221
222           Takes a memory size in bytes. If the value is suffixed with K, M, G
223           or T, the specified memory size is parsed as Kilobytes, Megabytes,
224           Gigabytes, or Terabytes (with the base 1024), respectively.
225           Alternatively, a percentage value may be specified, which is taken
226           relative to the installed physical memory on the system. If
227           assigned the special value "infinity", all available memory is
228           protected, which may be useful in order to always inherit all of
229           the protection afforded by ancestors. This controls the
230           "memory.min" or "memory.low" control group attribute. For details
231           about this control group attribute, see Memory Interface Files[6].
232
233           This setting is supported only if the unified control group
234           hierarchy is used and disables MemoryLimit=.
235
236           Units may have their children use a default "memory.min" or
237           "memory.low" value by specifying DefaultMemoryMin= or
238           DefaultMemoryLow=, which has the same semantics as MemoryMin= and
239           MemoryLow=. This setting does not affect "memory.min" or
240           "memory.low" in the unit itself. Using it to set a default child
241           allocation is only useful on kernels older than 5.7, which do not
242           support the "memory_recursiveprot" cgroup2 mount option.
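
           A minimal sketch of both protection levels on one unit is shown
           below. The sizes are arbitrary, and as noted above the protection
           is generally only effective if the ancestors carry a corresponding
           allocation.

           ```ini
           [Service]
           # Stronger protection for the first 128M of the unit's memory.
           MemoryMin=128M
           # Weaker protection up to 512M: may still be reclaimed if nothing
           # else is reclaimable, to avoid invoking the OOM killer.
           MemoryLow=512M
           ```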
243
244       MemoryHigh=bytes
245           Specify the throttling limit on memory usage of the executed
246           processes in this unit. Memory usage may go above the limit if
247           unavoidable, but the processes are heavily slowed down and memory
248           is taken away aggressively in such cases. This is the main
249           mechanism to control memory usage of a unit.
250
251           Takes a memory size in bytes. If the value is suffixed with K, M, G
252           or T, the specified memory size is parsed as Kilobytes, Megabytes,
253           Gigabytes, or Terabytes (with the base 1024), respectively.
254           Alternatively, a percentage value may be specified, which is taken
255           relative to the installed physical memory on the system. If
256           assigned the special value "infinity", no memory throttling is
257           applied. This controls the "memory.high" control group attribute.
258           For details about this control group attribute, see Memory
259           Interface Files[6].
260
261           This setting is supported only if the unified control group
262           hierarchy is used and disables MemoryLimit=.
263
264       MemoryMax=bytes
           Specify the absolute limit on memory usage of the executed
           processes in this unit. If memory usage cannot be contained under
           the limit, the out-of-memory killer is invoked inside the unit. It
           is recommended to use MemoryHigh= as the main control mechanism and
           use MemoryMax= as the last line of defense.
270
271           Takes a memory size in bytes. If the value is suffixed with K, M, G
272           or T, the specified memory size is parsed as Kilobytes, Megabytes,
273           Gigabytes, or Terabytes (with the base 1024), respectively.
274           Alternatively, a percentage value may be specified, which is taken
275           relative to the installed physical memory on the system. If
276           assigned the special value "infinity", no memory limit is applied.
277           This controls the "memory.max" control group attribute. For details
278           about this control group attribute, see Memory Interface Files[6].
279
280           This setting replaces MemoryLimit=.
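
           The recommended pairing of MemoryHigh= and MemoryMax= might look
           like the following sketch; the sizes are examples, not suggested
           values.

           ```ini
           [Service]
           # Main control: throttle and reclaim aggressively above 1G.
           MemoryHigh=1G
           # Last line of defense: the OOM killer is invoked inside the unit
           # if usage cannot be contained under 2G.
           MemoryMax=2G
           ```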
281
282       MemorySwapMax=bytes
283           Specify the absolute limit on swap usage of the executed processes
284           in this unit.
285
286           Takes a swap size in bytes. If the value is suffixed with K, M, G
287           or T, the specified swap size is parsed as Kilobytes, Megabytes,
288           Gigabytes, or Terabytes (with the base 1024), respectively. If
289           assigned the special value "infinity", no swap limit is applied.
290           This controls the "memory.swap.max" control group attribute. For
291           details about this control group attribute, see Memory Interface
292           Files[6].
293
294           This setting is supported only if the unified control group
295           hierarchy is used and disables MemoryLimit=.
296
297       TasksAccounting=
298           Turn on task accounting for this unit. Takes a boolean argument. If
299           enabled, the system manager will keep track of the number of tasks
300           in the unit. The number of tasks accounted this way includes both
301           kernel threads and userspace processes, with each thread counting
302           individually. Note that turning on tasks accounting for one unit
303           will also implicitly turn it on for all units contained in the same
304           slice and for all its parent slices and the units contained
305           therein. The system default for this setting may be controlled with
306           DefaultTasksAccounting= in systemd-system.conf(5).
307
308       TasksMax=N
309           Specify the maximum number of tasks that may be created in the
310           unit. This ensures that the number of tasks accounted for the unit
311           (see above) stays below a specific limit. This either takes an
312           absolute number of tasks or a percentage value that is taken
313           relative to the configured maximum number of tasks on the system.
314           If assigned the special value "infinity", no tasks limit is
315           applied. This controls the "pids.max" control group attribute. For
316           details about this control group attribute, see Process Number
317           Controller[7].
318
319           The system default for this setting may be controlled with
320           DefaultTasksMax= in systemd-system.conf(5).
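
           For example, a unit could be limited as in this sketch, where the
           number is chosen arbitrarily:

           ```ini
           [Service]
           TasksAccounting=yes
           # Absolute limit; each thread counts individually.
           TasksMax=512
           # Alternatively, a percentage of the configured system maximum:
           # TasksMax=10%
           ```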
321
322       IOAccounting=
           Turn on block I/O accounting for this unit, if the unified control
           group hierarchy is used on the system. Takes a boolean argument.
           Note that turning on block I/O accounting for one unit will also
           implicitly turn it on for all units contained in the same slice and
           for all its parent slices and the units contained therein. The
           system default for this setting may be controlled with
           DefaultIOAccounting= in systemd-system.conf(5).
330
331           This setting replaces BlockIOAccounting= and disables settings
332           prefixed with BlockIO or StartupBlockIO.
333
334       IOWeight=weight, StartupIOWeight=weight
335           Set the default overall block I/O weight for the executed
336           processes, if the unified control group hierarchy is used on the
337           system. Takes a single weight value (between 1 and 10000) to set
338           the default block I/O weight. This controls the "io.weight" control
339           group attribute, which defaults to 100. For details about this
340           control group attribute, see IO Interface Files[8]. The available
341           I/O bandwidth is split up among all units within one slice relative
342           to their block I/O weight. A higher weight means more I/O
343           bandwidth, a lower weight means less.
344
345           While StartupIOWeight= applies to the startup and shutdown phases
346           of the system, IOWeight= applies to the later runtime of the
347           system, and if the former is not set also to the startup and
348           shutdown phases. This allows prioritizing specific services at
349           boot-up and shutdown differently than during runtime.
350
351           These settings replace BlockIOWeight= and StartupBlockIOWeight= and
352           disable settings prefixed with BlockIO or StartupBlockIO.
353
354       IODeviceWeight=device weight
355           Set the per-device overall block I/O weight for the executed
356           processes, if the unified control group hierarchy is used on the
357           system. Takes a space-separated pair of a file path and a weight
358           value to specify the device specific weight value, between 1 and
359           10000. (Example: "/dev/sda 1000"). The file path may be specified
360           as path to a block device node or as any other file, in which case
361           the backing block device of the file system of the file is
362           determined. This controls the "io.weight" control group attribute,
363           which defaults to 100. Use this option multiple times to set
364           weights for multiple devices. For details about this control group
365           attribute, see IO Interface Files[8].
366
367           This setting replaces BlockIODeviceWeight= and disables settings
368           prefixed with BlockIO or StartupBlockIO.
369
           The specified device node should reference a block device that has
           an I/O scheduler associated, i.e. it should not refer to a
           partition or loopback block device, but to the originating,
           physical device. When a path to a regular file or directory is
           specified, an attempt is made to discover the correct originating
           device backing the file system of the specified path. This works
           correctly only for simpler cases, where the file system is directly
           placed on a partition or physical block device, or where simple 1:1
           encryption using dm-crypt/LUKS is used. This discovery does not
           cover complex storage and in particular RAID and volume management
           storage devices.
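
           A sketch combining the overall weight with a per-device weight
           follows. The device path is taken from the example above and the
           weights are illustrative.

           ```ini
           [Service]
           # Overall block I/O weight; the default is 100.
           IOWeight=200
           # Device-specific weight; names the whole disk, not a partition.
           IODeviceWeight=/dev/sda 1000
           ```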
381
382       IOReadBandwidthMax=device bytes, IOWriteBandwidthMax=device bytes
           Set the per-device overall block I/O bandwidth maximum limit for
           the executed processes, if the unified control group hierarchy is
           used on the system. This limit is not work-conserving and the
           executed processes are not allowed to use more even if the device
           has idle capacity. Takes a space-separated pair of a file path and
           a bandwidth value (in bytes per second) to specify the device
           specific bandwidth. The file path may be a path to a block device
           node, or to any other file, in which case the backing block device
           of the file system of the file is used. If the bandwidth is
           suffixed with K, M, G, or T, the specified bandwidth is parsed as
           Kilobytes, Megabytes, Gigabytes, or Terabytes, respectively, to the
           base of 1000. (Example:
           "/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 5M"). This
           controls the "io.max" control group attribute. Use this option
           multiple times to set bandwidth limits for multiple devices. For
           details about this control group attribute, see IO Interface
           Files[8].
400
401           These settings replace BlockIOReadBandwidth= and
402           BlockIOWriteBandwidth= and disable settings prefixed with BlockIO
403           or StartupBlockIO.
404
405           Similar restrictions on block device discovery as for
406           IODeviceWeight= apply, see above.
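
           To illustrate repeated use for several devices, consider the
           sketch below. The device names /dev/sda and /dev/sdb and the limits
           are assumptions for the example.

           ```ini
           [Service]
           # Read and write limits for the first device, in bytes per second.
           IOReadBandwidthMax=/dev/sda 5M
           IOWriteBandwidthMax=/dev/sda 2M
           # The option is repeated to add a limit for a second device.
           IOReadBandwidthMax=/dev/sdb 10M
           ```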
407
408       IOReadIOPSMax=device IOPS, IOWriteIOPSMax=device IOPS
           Set the per-device overall block I/O IOs-Per-Second maximum limit
           for the executed processes, if the unified control group hierarchy
           is used on the system. This limit is not work-conserving and the
           executed processes are not allowed to use more even if the device
           has idle capacity. Takes a space-separated pair of a file path and
           an IOPS value to specify the device specific IOPS. The file path
           may be a path to a block device node, or to any other file, in
           which case the backing block device of the file system of the file
           is used. If the IOPS is suffixed with K, M, G, or T, the specified
           IOPS is parsed as KiloIOPS, MegaIOPS, GigaIOPS, or TeraIOPS,
           respectively, to the base of 1000. (Example:
           "/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 1K"). This
           controls the "io.max" control group attribute. Use this option
           multiple times to set IOPS limits for multiple devices. For details
           about this control group attribute, see IO Interface Files[8].
424
425           These settings are supported only if the unified control group
426           hierarchy is used and disable settings prefixed with BlockIO or
427           StartupBlockIO.
428
429           Similar restrictions on block device discovery as for
430           IODeviceWeight= apply, see above.
431
432       IODeviceLatencyTargetSec=device target
           Set the per-device average target I/O latency for the executed
           processes, if the unified control group hierarchy is used on the
           system. Takes a file path and a timespan separated by a space to
           specify the device specific latency target. (Example: "/dev/sda
           25ms"). The file path may be specified as a path to a block device
           node or as any other file, in which case the backing block device
           of the file system of the file is determined. This controls the
           "io.latency" control group attribute. Use this option multiple
           times to set latency targets for multiple devices. For details
           about this control group attribute, see IO Interface Files[8].
443
444           Implies "IOAccounting=yes".
445
           This setting is supported only if the unified control group
           hierarchy is used.
448
449           Similar restrictions on block device discovery as for
450           IODeviceWeight= apply, see above.
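
           In a unit file, the example above would read as follows; this is
           only a sketch with an illustrative device and target.

           ```ini
           [Service]
           # Target an average I/O latency of 25ms on this device.
           IODeviceLatencyTargetSec=/dev/sda 25ms
           ```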
451
452       IPAccounting=
453           Takes a boolean argument. If true, turns on IPv4 and IPv6 network
454           traffic accounting for packets sent or received by the unit. When
455           this option is turned on, all IPv4 and IPv6 sockets created by any
456           process of the unit are accounted for.
457
458           When this option is used in socket units, it applies to all IPv4
459           and IPv6 sockets associated with it (including both listening and
460           connection sockets where this applies). Note that for
461           socket-activated services, this configuration setting and the
462           accounting data of the service unit and the socket unit are kept
463           separate, and displayed separately. No propagation of the setting
464           and the collected statistics is done, in either direction.
465           Moreover, any traffic sent or received on any of the socket unit's
466           sockets is accounted to the socket unit — and never to the service
467           unit it might have activated, even if the socket is used by it.
468
469           The system default for this setting may be controlled with
470           DefaultIPAccounting= in systemd-system.conf(5).
471
472       IPAddressAllow=ADDRESS[/PREFIXLENGTH]...,
473       IPAddressDeny=ADDRESS[/PREFIXLENGTH]...
474           Turn on network traffic filtering for IP packets sent and received
475           over AF_INET and AF_INET6 sockets. Both directives take a space
476           separated list of IPv4 or IPv6 addresses, each optionally suffixed
477           with an address prefix length in bits after a "/" character. If the
478           suffix is omitted, the address is considered a host address, i.e.
479           the filter covers the whole address (32 bits for IPv4, 128 bits for
480           IPv6).
481
482           The access lists configured with this option are applied to all
483           sockets created by processes of this unit (or in the case of socket
484           units, associated with it). The lists are implicitly combined with
485           any lists configured for any of the parent slice units this unit
486           might be a member of. By default both access lists are empty. Both
487           ingress and egress traffic is filtered by these settings. In case
488           of ingress traffic the source IP address is checked against these
489           access lists, in case of egress traffic the destination IP address
490           is checked. The following rules are applied in turn:
491
492           •   Access is granted when the checked IP address matches an entry
493               in the IPAddressAllow= list.
494
495           •   Otherwise, access is denied when the checked IP address matches
496               an entry in the IPAddressDeny= list.
497
498           •   Otherwise, access is granted.
499
           In order to implement an allow-listing IP firewall, it is
           recommended to use an IPAddressDeny=any setting on an upper-level
           slice unit (such as the root slice -.slice or the slice containing
           all system services system.slice – see systemd.special(7) for
           details on these slice units), plus individual per-service
           IPAddressAllow= lines permitting network access to relevant
           services, and only them.
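
           The allow-listing setup described above can be sketched with two
           drop-ins. The file names, the unit name example.service and the
           address range are hypothetical.

           ```ini
           # /etc/systemd/system/system.slice.d/10-deny-all.conf
           [Slice]
           # Deny all IP traffic for every unit in system.slice by default.
           IPAddressDeny=any

           # /etc/systemd/system/example.service.d/10-allow.conf
           [Service]
           # Allow entries are checked first, so these addresses pass despite
           # the deny-all list combined in from the parent slice.
           IPAddressAllow=localhost 192.168.1.0/24
           ```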
507
           Note that for socket-activated services, the IP access list
           configured on the socket unit applies to all sockets associated
           with it directly, but not to any sockets created by the ultimately
           activated services for it. Conversely, the IP access list
           configured for the service is not applied to any sockets passed
           into the service via socket activation. Thus, it is usually a good
           idea to replicate the IP access lists on both the socket and the
           service unit. Nevertheless, it may make sense to maintain one list
           more open and the other one more restricted, depending on the
           use case.
518
519           If these settings are used multiple times in the same unit the
520           specified lists are combined. If an empty string is assigned to
521           these settings the specific access list is reset and all previous
522           settings undone.
523
524           In place of explicit IPv4 or IPv6 address and prefix length
525           specifications a small set of symbolic names may be used. The
526           following names are defined:
527
           Table 1. Special address/network names
           ┌──────────────┬─────────────────────┬─────────────────────┐
           │Symbolic Name │ Definition          │ Meaning             │
           ├──────────────┼─────────────────────┼─────────────────────┤
           │any           │ 0.0.0.0/0 ::/0      │ Any host            │
           ├──────────────┼─────────────────────┼─────────────────────┤
           │localhost     │ 127.0.0.0/8 ::1/128 │ All addresses on    │
           │              │                     │ the local loopback  │
           ├──────────────┼─────────────────────┼─────────────────────┤
           │link-local    │ 169.254.0.0/16      │ All link-local IP   │
           │              │ fe80::/64           │ addresses           │
           ├──────────────┼─────────────────────┼─────────────────────┤
           │multicast     │ 224.0.0.0/4         │ All IP multicasting │
           │              │ ff00::/8            │ addresses           │
           └──────────────┴─────────────────────┴─────────────────────┘
543           Note that these settings might not be supported on some systems
544           (for example if eBPF control group support is not enabled in the
545           underlying kernel or container manager). These settings will have
546           no effect in that case. If compatibility with such systems is
547           desired it is hence recommended to not exclusively rely on them for
548           IP security.
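
           As an illustration of the symbolic names in Table 1, the following
           sketch limits a service to loopback and link-local traffic plus
           one additional network. The 192.0.2.0/24 prefix is a placeholder
           from the documentation address range and is an assumption, not a
           value taken from this page.

```ini
[Service]
# Deny everything that is not explicitly allowed below.
IPAddressDeny=any
# Symbolic names from Table 1.
IPAddressAllow=localhost link-local
# Placeholder network (assumption), given as address/prefix length.
# Repeated assignments are combined with the list above.
IPAddressAllow=192.0.2.0/24
```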
549
550       IPIngressFilterPath=BPF_FS_PROGRAM_PATH,
551       IPEgressFilterPath=BPF_FS_PROGRAM_PATH
552           Add custom network traffic filters implemented as BPF programs,
553           applying to all IP packets sent and received over AF_INET and
554           AF_INET6 sockets. Takes an absolute path to a pinned BPF program in
555           the BPF virtual filesystem (/sys/fs/bpf/).
556
           The filters configured with this option are applied to all sockets
           created by processes of this unit (or in the case of socket units,
           associated with it). The filters are loaded in addition to the
           filters of any of the parent slice units this unit might be a
           member of, as well as any IPAddressAllow= and IPAddressDeny=
           filters in any of these units. By default there are no filters
           specified.
563
           If these settings are used multiple times in the same unit, all the
           specified programs are attached. If an empty string is assigned to
           these settings, the program list is reset and all previously
           specified programs are ignored.

           If the path BPF_FS_PROGRAM_PATH in an IPIngressFilterPath=
           assignment is already being handled by a BPFProgram= ingress hook,
           e.g. BPFProgram=ingress:BPF_FS_PROGRAM_PATH, the assignment is
           still considered valid and the program is attached to the cgroup.
           The same applies to an IPEgressFilterPath= path and the egress
           hook.
574
           Note that for socket-activated services, the IP filter programs
           configured on the socket unit apply to all sockets associated with
           it directly, but not to any sockets created by the services it
           ultimately activates. Conversely, the IP filter programs
           configured for the service are not applied to any sockets passed
           into the service via socket activation. Thus, it is usually a good
           idea to replicate the IP filter programs on both the socket and
           the service unit. Nevertheless, it may make sense to maintain one
           configuration more open and the other one more restricted,
           depending on the use case.

           Note that these settings might not be supported on some systems
           (for example if eBPF control group support is not enabled in the
           underlying kernel or container manager). In that case, these
           settings will cause the service to fail. If compatibility with
           such systems is desired, it is hence recommended to attach your
           filter manually (requires Delegate=yes) instead of using this
           setting.
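
           A minimal sketch of replicating a filter on both units of a
           socket-activated service follows. The unit names, the port and
           the pinned program paths are hypothetical; the programs are
           assumed to have been loaded and pinned below /sys/fs/bpf/ before
           the units start.

```ini
# Hypothetical example.socket
[Socket]
ListenStream=8080
# Applies to the sockets associated with the socket unit itself.
IPIngressFilterPath=/sys/fs/bpf/example-ingress

# Hypothetical example.service
[Service]
# Applies to sockets created by the service's own processes.
IPIngressFilterPath=/sys/fs/bpf/example-ingress
IPEgressFilterPath=/sys/fs/bpf/example-egress
```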
592
593       BPFProgram=type:program-path
594           Add a custom cgroup BPF program.
595
596           BPFProgram= allows attaching BPF hooks to the cgroup of a systemd
597           unit. (This generalizes the functionality exposed via
598           IPEgressFilterPath= for egress and IPIngressFilterPath= for
599           ingress.) Cgroup-bpf hooks in the form of BPF programs loaded to
600           the BPF filesystem are attached with cgroup-bpf attach flags
601           determined by the unit. For details about attachment types and
602           flags see
603           https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/include/uapi/linux/bpf.h.
604           For general BPF documentation please refer to
605           https://www.kernel.org/doc/html/latest/bpf/index.html.
606
           The specification of a BPF program consists of a type followed by a
           program-path, with ":" as the separator: type:program-path.

           type is the string name of the BPF attach type, as also used in
           bpftool. type can be one of egress, ingress, sock_create, sock_ops,
           device, bind4, bind6, connect4, connect6, post_bind4, post_bind6,
           sendmsg4, sendmsg6, sysctl, recvmsg4, recvmsg6, getsockopt,
           setsockopt.

           Setting BPFProgram= to an empty value makes previous assignments
           ineffective.

           Multiple assignments of the same type:program-path value have the
           same effect as a single assignment: the program with the path
           program-path will be attached to cgroup hook type just once.

           If the BPF egress program pinned at program-path is already being
           handled by IPEgressFilterPath=, the BPFProgram= assignment is still
           considered valid and the program is attached to the cgroup. The
           same applies to the ingress hook and an IPIngressFilterPath=
           assignment.

           BPF programs passed with BPFProgram= are attached to the cgroup of
           a unit with the BPF attach flag multi, which allows further
           attachments of the same type within the cgroup hierarchy topped by
           the unit cgroup.
630
631           Examples:
632
               BPFProgram=egress:/sys/fs/bpf/egress-hook
               BPFProgram=bind6:/sys/fs/bpf/sock-addr-hook
635
636       SocketBindAllow=bind-rule, SocketBindDeny=bind-rule
637           Allow or deny binding a socket address to a socket by matching it
638           with the bind-rule and applying a corresponding action if there is
639           a match.
640
641           bind-rule describes socket properties such as address-family,
642           transport-protocol and ip-ports.
643
644           bind-rule := { [address-family:][transport-protocol:][ip-ports] |
645           any }
646
647           address-family := { ipv4 | ipv6 }
648
649           transport-protocol := { tcp | udp }
650
651           ip-ports := { ip-port | ip-port-range }
652
653           An optional address-family expects ipv4 or ipv6 values. If not
654           specified, a rule will be matched for both IPv4 and IPv6 addresses
655           and applied depending on other socket fields, e.g.
656           transport-protocol, ip-port.
657
658           An optional transport-protocol expects tcp or udp transport
659           protocol names. If not specified, a rule will be matched for any
660           transport protocol.
661
           An optional ip-port value must lie within the interval 1...65535,
           inclusive, i.e. dynamic port 0 is not allowed. A range of
           sequential ports is described by ip-port-range :=
           ip-port-low-ip-port-high, where ip-port-low is smaller than or
           equal to ip-port-high and both are within 1...65535, inclusive.

           The special value any can be used to apply a rule to any address
           family, transport protocol and any port with a positive value.

           To allow multiple rules, assign SocketBindAllow= or SocketBindDeny=
           multiple times. To clear the existing assignments, pass an empty
           SocketBindAllow= or SocketBindDeny= assignment.

           For each of SocketBindAllow= and SocketBindDeny=, the maximum
           allowed number of assignments is 128.
677
678           •   Binding to a socket is allowed when a socket address matches an
679               entry in the SocketBindAllow= list.
680
681           •   Otherwise, binding is denied when the socket address matches an
682               entry in the SocketBindDeny= list.
683
684           •   Otherwise, binding is allowed.
685
686           The feature is implemented with cgroup/bind4 and cgroup/bind6
687           cgroup-bpf hooks.
688
689           Examples:
690
               ...
               # Allow binding IPv6 socket addresses with a port greater than or equal to 10000.
               [Service]
               SocketBindAllow=ipv6:10000-65535
               SocketBindDeny=any
               ...
               # Allow binding IPv4 and IPv6 socket addresses with 1234 and 4321 ports.
               [Service]
               SocketBindAllow=1234
               SocketBindAllow=4321
               SocketBindDeny=any
               ...
               # Deny binding IPv6 socket addresses.
               [Service]
               SocketBindDeny=ipv6
               ...
               # Deny binding IPv4 and IPv6 socket addresses.
               [Service]
               SocketBindDeny=any
               ...
               # Allow binding only over TCP
               [Service]
               SocketBindAllow=tcp
               SocketBindDeny=any
               ...
               # Allow binding only over IPv6/TCP
               [Service]
               SocketBindAllow=ipv6:tcp
               SocketBindDeny=any
               ...
               # Allow binding ports within 10000-65535 range over IPv4/UDP.
               [Service]
               SocketBindAllow=ipv4:udp:10000-65535
               SocketBindDeny=any
               ...
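
           As a further sketch of the three-step evaluation order described
           above, a deny rule can also be used on its own, without a
           catch-all SocketBindDeny=any. The port range is chosen purely for
           illustration.

```ini
[Service]
# There are no SocketBindAllow= entries, so the first rule never matches.
# The second rule denies binding to IPv4/TCP ports 1-1023.
SocketBindDeny=ipv4:tcp:1-1023
# Any other socket address matches neither list and is allowed by the
# third rule.
```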
726
727       RestrictNetworkInterfaces=
           Takes a list of space-separated network interface names. This
           option restricts the network interfaces that processes of this unit
           can use. By default, processes can only use the network interfaces
           listed (allow-list). If the first character of the rule is "~", the
           effect is inverted: the processes can only use network interfaces
           not listed (deny-list).

           This option can appear multiple times, in which case the network
           interface names are merged. If the empty string is assigned, the
           set is reset and all prior assignments will have no effect.

           If you specify both types of this option (i.e. allow-listing and
           deny-listing), the first encountered will take precedence and will
           dictate the default action (allow vs. deny). Subsequent occurrences
           of this option will then add or delete the listed network interface
           names from the set, depending on their type and the default action.

           The loopback interface ("lo") is not treated in any special way;
           you have to configure it explicitly in the unit file.
748
           Example 1: allow-list

               RestrictNetworkInterfaces=eth1
               RestrictNetworkInterfaces=eth2

           Programs in the unit will only be able to use the eth1 and eth2
           network interfaces.

           Example 2: deny-list

               RestrictNetworkInterfaces=~eth1 eth2

           Programs in the unit will be able to use any network interface but
           eth1 and eth2.

           Example 3: mixed

               RestrictNetworkInterfaces=eth1 eth2
               RestrictNetworkInterfaces=~eth1

           Programs in the unit will only be able to use the eth2 network
           interface.
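
           Because the loopback interface is not treated specially, an
           allow-list that should keep local traffic working has to name it.
           A minimal sketch, with eth0 as a placeholder interface name:

```ini
[Service]
# Allow-list: "lo" must be listed explicitly, otherwise it is restricted
# like any other interface that is not named here.
RestrictNetworkInterfaces=lo eth0
```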
771
772       DeviceAllow=
773           Control access to specific device nodes by the executed processes.
774           Takes two space-separated strings: a device node specifier followed
775           by a combination of r, w, m to control reading, writing, or
776           creation of the specific device node(s) by the unit (mknod),
777           respectively. On cgroup-v1 this controls the "devices.allow"
778           control group attribute. For details about this control group
779           attribute, see Device Whitelist Controller[9]. In the unified
780           cgroup hierarchy this functionality is implemented using eBPF
781           filtering.
782
783           When access to all physical devices should be disallowed,
784           PrivateDevices= may be used instead. See systemd.exec(5).
785
           The device node specifier is either a path to a device node in the
           file system, starting with /dev/, or a string starting with either
           "char-" or "block-" followed by a device group name, as listed in
           /proc/devices. The latter is useful to allow-list all current and
           future devices belonging to a specific device group at once. The
           device group is matched according to filename globbing rules, so
           you may use the "*" and "?" wildcards. (Note that such globbing
           wildcards are not available for device node path specifications!)
           In order to match device nodes by numeric major/minor, use device
           node paths in the /dev/char/ and /dev/block/ directories. However,
           matching devices by major/minor is generally not recommended as
           assignments are neither stable nor portable between systems or
           different kernel versions.

           Examples: /dev/sda5 is a path to a device node, referring to an ATA
           or SCSI block device. "char-pts" and "char-alsa" are specifiers
           for all pseudo TTYs and all ALSA sound devices, respectively.
           "char-cpu/*" is a specifier matching all CPU related device groups.
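
           In a unit file, the specifiers above might be combined with
           access modes as in the following sketch. The chosen r/w
           combinations are illustrative assumptions only.

```ini
[Service]
# Read-only access to a single block device node.
DeviceAllow=/dev/sda5 r
# Read and write access to all pseudo TTYs and all ALSA sound devices.
DeviceAllow=char-pts rw
DeviceAllow=char-alsa rw
```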
804
805           Note that allow lists defined this way should only reference device
806           groups which are resolvable at the time the unit is started. Any
807           device groups not resolvable then are not added to the device allow
808           list. In order to work around this limitation, consider extending
809           service units with a pair of After=modprobe@xyz.service and
810           Wants=modprobe@xyz.service lines that load the necessary kernel
811           module implementing the device group if missing. Example:
812
               ...
               [Unit]
               Wants=modprobe@loop.service
               After=modprobe@loop.service

               [Service]
               DeviceAllow=block-loop
               DeviceAllow=/dev/loop-control
               ...
822
823       DevicePolicy=auto|closed|strict
824           Control the policy for allowing device access:
825
826           strict
827               means to only allow types of access that are explicitly
828               specified.
829
830           closed
831               in addition, allows access to standard pseudo devices including
832               /dev/null, /dev/zero, /dev/full, /dev/random, and /dev/urandom.
833
834           auto
835               in addition, allows access to all devices if no explicit
836               DeviceAllow= is present. This is the default.
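
           A short sketch of how the policy interacts with DeviceAllow=:
           under strict, even the standard pseudo devices have to be listed
           explicitly, whereas closed would already permit them. The
           selection of devices is an illustrative assumption.

```ini
[Service]
# strict: only the accesses listed via DeviceAllow= are permitted.
DevicePolicy=strict
DeviceAllow=/dev/null rw
DeviceAllow=/dev/urandom r
```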
837
838       Slice=
           The name of the slice unit to place the unit in. Defaults to
           system.slice for all non-instantiated units of all unit types
           (except for slice units themselves, see below). Instance units are
           by default placed in a subslice of system.slice that is named after
           the template name.
844
845           This option may be used to arrange systemd units in a hierarchy of
846           slices each of which might have resource settings applied.
847
848           For units of type slice, the only accepted value for this setting
849           is the parent slice. Since the name of a slice unit implies the
850           parent slice, it is hence redundant to ever set this parameter
851           directly for slice units.
852
853           Special care should be taken when relying on the default slice
854           assignment in templated service units that have
855           DefaultDependencies=no set, see systemd.service(5), section
856           "Default Dependencies" for details.
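
           The following sketch places a service in a custom slice and
           applies a resource setting to that slice. All unit names and the
           limit are hypothetical. Note that workload-batch.slice needs no
           Slice= line of its own, since its name already implies
           workload.slice as the parent.

```ini
# Hypothetical batch-job.service
[Service]
Slice=workload-batch.slice

# Hypothetical workload-batch.slice
[Slice]
# Illustrative limit shared by all units placed in this slice.
MemoryMax=2G
```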
857
858       Delegate=
859           Turns on delegation of further resource control partitioning to
860           processes of the unit. Units where this is enabled may create and
861           manage their own private subhierarchy of control groups below the
862           control group of the unit itself. For unprivileged services (i.e.
863           those using the User= setting) the unit's control group will be
864           made accessible to the relevant user. When enabled the service
865           manager will refrain from manipulating control groups or moving
866           processes below the unit's control group, so that a clear concept
867           of ownership is established: the control group tree above the
868           unit's control group (i.e. towards the root control group) is owned
869           and managed by the service manager of the host, while the control
870           group tree below the unit's control group is owned and managed by
871           the unit itself. Takes either a boolean argument or a list of
872           control group controller names. If true, delegation is turned on,
873           and all supported controllers are enabled for the unit, making them
874           available to the unit's processes for management. If false,
875           delegation is turned off entirely (and no additional controllers
876           are enabled). If set to a list of controllers, delegation is turned
           on, and the specified controllers are enabled for the unit. Note
           that controllers other than the ones specified might be made
           available as well, depending on the configuration of the
           containing slice unit or other units contained in it. Note that
           assigning the empty string will enable delegation, but reset the
           list of controllers; all assignments prior to this will have no
           effect. Defaults to false.
884
885           Note that controller delegation to less privileged code is only
886           safe on the unified control group hierarchy. Accordingly, access to
887           the specified controllers will not be granted to unprivileged
888           services on the legacy hierarchy, even when requested.
889
890           The following controller names may be specified: cpu, cpuacct,
891           cpuset, io, blkio, memory, devices, pids, bpf-firewall, and
892           bpf-devices.
893
894           Not all of these controllers are available on all kernels however,
895           and some are specific to the unified hierarchy while others are
896           specific to the legacy hierarchy. Also note that the kernel might
897           support further controllers, which aren't covered here yet as
898           delegation is either not supported at all for them or not defined
899           cleanly.
900
901           For further details on the delegation model consult Control Group
902           APIs and Delegation[10].
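
           A minimal sketch of delegating a subset of controllers to an
           unprivileged service. The user name is hypothetical and the
           controller list is only an example.

```ini
[Service]
# Hypothetical unprivileged user; the unit's control group is made
# accessible to it.
User=example
# Turn on delegation and enable these controllers for the unit.
Delegate=cpu memory pids
```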
903
904       DisableControllers=
905           Disables controllers from being enabled for a unit's children. If a
906           controller listed is already in use in its subtree, the controller
907           will be removed from the subtree. This can be used to avoid child
908           units being able to implicitly or explicitly enable a controller.
909           Defaults to not disabling any controllers.
910
911           It may not be possible to successfully disable a controller if the
912           unit or any child of the unit in question delegates controllers to
913           its children, as any delegated subtree of the cgroup hierarchy is
914           unmanaged by systemd.
915
916           Multiple controllers may be specified, separated by spaces. You may
917           also pass DisableControllers= multiple times, in which case each
918           new instance adds another controller to disable. Passing
919           DisableControllers= by itself with no controller name present
920           resets the disabled controller list.
921
922           The following controller names may be specified: cpu, cpuacct,
923           cpuset, io, blkio, memory, devices, pids, bpf-firewall, and
924           bpf-devices.
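
           The assignment rules above might be used as in this sketch for a
           hypothetical slice; which controllers to disable depends entirely
           on the deployment.

```ini
# Hypothetical example.slice
[Slice]
# Two controllers, separated by spaces.
DisableControllers=cpu io
# A further assignment adds another controller to the list.
DisableControllers=memory
```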
925
926       ManagedOOMSwap=auto|kill, ManagedOOMMemoryPressure=auto|kill
927           Specifies how systemd-oomd.service(8) will act on this unit's
928           cgroups. Defaults to auto.
929
           When set to kill, the unit becomes a candidate for monitoring by
           systemd-oomd. If the cgroup passes the limits set by oomd.conf(5)
           or the unit configuration, systemd-oomd will select a descendant
           cgroup and send SIGKILL to all of the processes under it. You can
           find more details on candidates and kill behavior in
           systemd-oomd.service(8) and oomd.conf(5).

           Setting either of these properties to kill will also result in
           After= and Wants= dependencies on systemd-oomd.service unless
           DefaultDependencies=no is set.
940
941           When set to auto, systemd-oomd will not actively use this cgroup's
942           data for monitoring and detection. However, if an ancestor cgroup
943           has one of these properties set to kill, a unit with auto can still
944           be a candidate for systemd-oomd to terminate.
945
946       ManagedOOMMemoryPressureLimit=
947           Overrides the default memory pressure limit set by oomd.conf(5) for
948           this unit (cgroup). Takes a percentage value between 0% and 100%,
949           inclusive. This property is ignored unless
950           ManagedOOMMemoryPressure=kill. Defaults to 0%, which means to use
951           the default set by oomd.conf(5).
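
           As a sketch, memory pressure monitoring with a per-unit limit
           could be configured as follows. The 50% value is an arbitrary
           illustration, not a recommendation.

```ini
[Service]
# Make this unit a candidate for monitoring by systemd-oomd.
ManagedOOMMemoryPressure=kill
# Only honored because ManagedOOMMemoryPressure=kill is set above.
ManagedOOMMemoryPressureLimit=50%
```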
952
953       ManagedOOMPreference=none|avoid|omit
954           Allows deprioritizing or omitting this unit's cgroup as a candidate
955           when systemd-oomd needs to act. Requires support for extended
956           attributes (see xattr(7)) in order to use avoid or omit.
957           Additionally, systemd-oomd will ignore these extended attributes if
958           the unit's cgroup is not owned by the root user.
959
960           If this property is set to avoid, the service manager will convey
961           this to systemd-oomd, which will only select this cgroup if there
962           are no other viable candidates.
963
964           If this property is set to omit, the service manager will convey
965           this to systemd-oomd, which will ignore this cgroup as a candidate
966           and will not perform any actions on it.
967
           It is recommended to use avoid and omit sparingly, as they can
           adversely affect systemd-oomd's kill behavior. Also note that these
           extended attributes are not applied recursively to cgroups under
           this unit's cgroup.
972
973           Defaults to none which means systemd-oomd will rank this unit's
974           cgroup as defined in systemd-oomd.service(8) and oomd.conf(5).
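
           For example, a service that should only be selected as a last
           resort might carry the following setting (a sketch; whether
           avoid is appropriate depends on the workload):

```ini
[Service]
# systemd-oomd selects this unit's cgroup only if there are no other
# viable candidates.
ManagedOOMPreference=avoid
```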
975

DEPRECATED OPTIONS

977       The following options are deprecated. Use the indicated superseding
978       options instead:
979
980       CPUShares=weight, StartupCPUShares=weight
981           Assign the specified CPU time share weight to the processes
982           executed. These options take an integer value and control the
983           "cpu.shares" control group attribute. The allowed range is 2 to
984           262144. Defaults to 1024. For details about this control group
985           attribute, see CFS Scheduler[4]. The available CPU time is split up
986           among all units within one slice relative to their CPU time share
987           weight.
988
989           While StartupCPUShares= applies to the startup and shutdown phases
990           of the system, CPUShares= applies to normal runtime of the system,
991           and if the former is not set also to the startup and shutdown
992           phases. Using StartupCPUShares= allows prioritizing specific
993           services at boot-up and shutdown differently than during normal
994           runtime.
995
996           Implies "CPUAccounting=yes".
997
998           These settings are deprecated. Use CPUWeight= and StartupCPUWeight=
999           instead.
1000
1001       MemoryLimit=bytes
1002           Specify the limit on maximum memory usage of the executed
1003           processes. The limit specifies how much process and kernel memory
1004           can be used by tasks in this unit. Takes a memory size in bytes. If
1005           the value is suffixed with K, M, G or T, the specified memory size
1006           is parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes (with
1007           the base 1024), respectively. Alternatively, a percentage value may
1008           be specified, which is taken relative to the installed physical
1009           memory on the system. If assigned the special value "infinity", no
1010           memory limit is applied. This controls the "memory.limit_in_bytes"
1011           control group attribute. For details about this control group
1012           attribute, see Memory Resource Controller[11].
1013
1014           Implies "MemoryAccounting=yes".
1015
1016           This setting is deprecated. Use MemoryMax= instead.
1017
1018       BlockIOAccounting=
           Turn on Block I/O accounting for this unit, if the legacy control
           group hierarchy is used on the system. Takes a boolean argument.
           Note that turning on block I/O accounting for one unit will also
           implicitly turn it on for all units contained in the same slice and
           for all its parent slices and the units contained therein. The
           system default for this setting may be controlled with
           DefaultBlockIOAccounting= in systemd-system.conf(5).
1026
1027           This setting is deprecated. Use IOAccounting= instead.
1028
1029       BlockIOWeight=weight, StartupBlockIOWeight=weight
1030           Set the default overall block I/O weight for the executed
1031           processes, if the legacy control group hierarchy is used on the
1032           system. Takes a single weight value (between 10 and 1000) to set
1033           the default block I/O weight. This controls the "blkio.weight"
1034           control group attribute, which defaults to 500. For details about
1035           this control group attribute, see Block IO Controller[12]. The
1036           available I/O bandwidth is split up among all units within one
1037           slice relative to their block I/O weight.
1038
1039           While StartupBlockIOWeight= only applies to the startup and
1040           shutdown phases of the system, BlockIOWeight= applies to the later
1041           runtime of the system, and if the former is not set also to the
1042           startup and shutdown phases. This allows prioritizing specific
1043           services at boot-up and shutdown differently than during runtime.
1044
1045           Implies "BlockIOAccounting=yes".
1046
1047           These settings are deprecated. Use IOWeight= and StartupIOWeight=
1048           instead.
1049
1050       BlockIODeviceWeight=device weight
1051           Set the per-device overall block I/O weight for the executed
1052           processes, if the legacy control group hierarchy is used on the
1053           system. Takes a space-separated pair of a file path and a weight
1054           value to specify the device specific weight value, between 10 and
1055           1000. (Example: "/dev/sda 500"). The file path may be specified as
1056           path to a block device node or as any other file, in which case the
1057           backing block device of the file system of the file is determined.
1058           This controls the "blkio.weight_device" control group attribute,
1059           which defaults to 1000. Use this option multiple times to set
1060           weights for multiple devices. For details about this control group
1061           attribute, see Block IO Controller[12].
1062
1063           Implies "BlockIOAccounting=yes".
1064
1065           This setting is deprecated. Use IODeviceWeight= instead.
1066
1067       BlockIOReadBandwidth=device bytes, BlockIOWriteBandwidth=device bytes
           Set the per-device overall block I/O bandwidth limit for the
           executed processes, if the legacy control group hierarchy is used
           on the system. Takes a space-separated pair of a file path and a
           bandwidth value (in bytes per second) to specify the device
           specific bandwidth. The file path may be specified as a path to a
           block device node or as any other file, in which case the backing
           block device of the file system of the file is used. If the
           bandwidth is suffixed with K, M, G, or T, the specified bandwidth
           is parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes,
           respectively, to the base of 1000. (Example:
           "/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 5M"). This
           controls the "blkio.throttle.read_bps_device" and
           "blkio.throttle.write_bps_device" control group attributes. Use
           this option multiple times to set bandwidth limits for multiple
           devices. For details about these control group attributes, see
           Block IO Controller[12].
1084
1085           Implies "BlockIOAccounting=yes".
1086
1087           These settings are deprecated. Use IOReadBandwidthMax= and
1088           IOWriteBandwidthMax= instead.
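
           A sketch of a unit that uses the superseding settings named in
           this section. The values are illustrative and are not
           conversions of any deprecated value: the weight settings use
           different ranges than their deprecated counterparts, so consult
           the description of each setting before carrying numbers over.

```ini
[Service]
# Instead of CPUShares=
CPUWeight=200
# Instead of MemoryLimit=
MemoryMax=1G
# Instead of BlockIOWeight=
IOWeight=200
# Instead of BlockIOReadBandwidth=
IOReadBandwidthMax=/dev/sda 5M
```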
1089

SEE ALSO

1091       systemd(1), systemd-system.conf(5), systemd.unit(5),
1092       systemd.service(5), systemd.slice(5), systemd.scope(5),
1093       systemd.socket(5), systemd.mount(5), systemd.swap(5), systemd.exec(5),
1094       systemd.directives(7), systemd.special(7), systemd-oomd.service(8), The
1095       documentation for control groups and specific controllers in the Linux
1096       kernel: Control Groups v2[2].
1097

NOTES

1099        1. New Control Group Interfaces
1100           https://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/
1101
1102        2. Control Groups v2
1103           https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html
1104
1105        3. Control Groups version 1
1106           https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/
1107
1108        4. CFS Scheduler
1109           https://www.kernel.org/doc/html/latest/scheduler/sched-design-CFS.html
1110
1111        5. sched-bwc.txt
1112           https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt
1113
1114        6. Memory Interface Files
1115           https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory-interface-files
1116
1117        7. Process Number Controller
1118           https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/pids.html
1119
1120        8. IO Interface Files
1121           https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#io-interface-files
1122
1123        9. Device Whitelist Controller
1124           https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/devices.html
1125
1126       10. Control Group APIs and Delegation
1127           https://systemd.io/CGROUP_DELEGATION
1128
1129       11. Memory Resource Controller
1130           https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/memory.html
1131
1132       12. Block IO Controller
1133           https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/blkio-controller.html
1134


systemd 251                                        SYSTEMD.RESOURCE-CONTROL(5)