SYSTEMD.RESOURCE-CONTROL(5)    systemd.resource-control    SYSTEMD.RESOURCE-CONTROL(5)

NAME
systemd.resource-control - Resource control unit settings

SYNOPSIS
slice.slice, scope.scope, service.service, socket.socket, mount.mount,
swap.swap

DESCRIPTION
Unit configuration files for services, slices, scopes, sockets, mount
points, and swap devices share a subset of configuration options for
resource control of spawned processes. Internally, this relies on the
Linux Control Groups (cgroups) kernel concept for organizing processes
in a hierarchical tree of named groups for the purpose of resource
management.

This man page lists the configuration options shared by those six unit
types. See systemd.unit(5) for the common options of all unit
configuration files, and systemd.slice(5), systemd.scope(5),
systemd.service(5), systemd.socket(5), systemd.mount(5), and
systemd.swap(5) for more information on the specific unit configuration
files. The resource control options are set in the [Slice], [Scope],
[Service], [Socket], [Mount], or [Swap] sections, depending on the unit
type.

In addition, options which control resources available to programs
executed by systemd are listed in systemd.exec(5). Those options
complement options listed here.

See the New Control Group Interfaces[1] for an introduction to how to
make use of resource control APIs from programs.

Setting resource controls for a group of related units
As described in systemd.unit(5), the settings listed here may be set
through the main file of a unit and drop-in snippets in *.d/
directories. The list of directories searched for drop-ins includes
names formed by repeatedly truncating the unit name after all dashes.
This is particularly convenient to set resource limits for a group of
units with similar names.

For example, every user gets their own slice user-nnn.slice. Local
configuration that affects user 1000 may be placed in the unit file
/etc/systemd/system/user-1000.slice, in drop-ins
/etc/systemd/system/user-1000.slice.d/*.conf, and also in
/etc/systemd/system/user-.slice.d/*.conf. Drop-ins in this last
directory apply to all user slices.

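As an illustration, a drop-in along the following lines would apply the
same limits to every user slice. The file name 50-limits.conf and the
values are assumptions chosen for this sketch, not recommendations:

    # /etc/systemd/system/user-.slice.d/50-limits.conf (hypothetical name)
    [Slice]
    # Throttle each user slice above 4G of memory (illustrative value).
    MemoryHigh=4G
    # Cap the number of tasks per user slice (illustrative value).
    TasksMax=2048

A drop-in in /etc/systemd/system/user-1000.slice.d/ using the same
settings would affect user 1000 only.
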
IMPLICIT DEPENDENCIES
The following dependencies are implicitly added:

• Units with the Slice= setting set automatically acquire Requires=
  and After= dependencies on the specified slice unit.

UNIFIED AND LEGACY CONTROL GROUP HIERARCHIES
The unified control group hierarchy is the new version of the kernel
control group interface, see Control Groups v2[2]. Depending on the
resource type, there are differences in resource control capabilities.
Also, because of interface changes, some resource types have a separate
set of options on the unified hierarchy.

CPU
CPUWeight= and StartupCPUWeight= replace CPUShares= and
StartupCPUShares=, respectively.

The "cpuacct" controller does not exist separately on the unified
hierarchy.

Memory
MemoryMax= replaces MemoryLimit=. MemoryLow= and MemoryHigh= are
effective only on the unified hierarchy.

IO
"IO"-prefixed settings are a superset of and replace
"BlockIO"-prefixed ones. On the unified hierarchy, IO resource control
also applies to buffered writes.

To ease the transition, there is best-effort translation between the
two versions of settings. For each controller, if any of the settings
for the unified hierarchy are present, all settings for the legacy
hierarchy are ignored. If the resulting settings are for the other type
of hierarchy, the configurations are translated before application.

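As a sketch of the rule above, consider a hypothetical unit that carries
both generations of the CPU settings (the values are assumptions for
illustration). Because a unified-hierarchy setting is present for the
CPU controller, the legacy one is not considered:

    [Service]
    # Unified-hierarchy setting: used.
    CPUWeight=200
    # Legacy setting for the same controller: ignored, since CPUWeight=
    # is present.
    CPUShares=512
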
The legacy control group hierarchy (see Control Groups version 1[3]),
also called cgroup-v1, doesn't allow safe delegation of controllers to
unprivileged processes. If the system uses the legacy control group
hierarchy, resource control is disabled for the systemd user instance,
see systemd(1).

OPTIONS
Units of the types listed above can have settings for resource control
configuration:

CPUAccounting=
Turn on CPU usage accounting for this unit. Takes a boolean
argument. Note that turning on CPU accounting for one unit will
also implicitly turn it on for all units contained in the same
slice and for all its parent slices and the units contained
therein. The system default for this setting may be controlled with
DefaultCPUAccounting= in systemd-system.conf(5).

CPUWeight=weight, StartupCPUWeight=weight
Assign the specified CPU time weight to the processes executed, if
the unified control group hierarchy is used on the system. These
options take an integer value and control the "cpu.weight" control
group attribute. The allowed range is 1 to 10000. Defaults to 100.
For details about this control group attribute, see Control Groups
v2[2] and CFS Scheduler[4]. The available CPU time is split up
among all units within one slice relative to their CPU time weight.
A higher weight means more CPU time, a lower weight means less.

While StartupCPUWeight= only applies to the startup phase of the
system, CPUWeight= applies to normal runtime of the system and, if
the former is not set, also to the startup phase. Using
StartupCPUWeight= allows prioritizing specific services at boot-up
differently than during normal runtime.

These settings replace CPUShares= and StartupCPUShares=.

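A worked example may help with the relative nature of the weights. The
unit names a.service and b.service and the values are hypothetical. If
both units sit in the same slice and both are CPU-bound at the same
time, a.service is entitled to 300 / (300 + 100) = 75% of the CPU time
available to that slice and b.service to the remaining 25%:

    # a.service (hypothetical)
    [Service]
    CPUWeight=300

    # b.service (hypothetical)
    [Service]
    CPUWeight=100

The weights only express a ratio between competing units; they do not
by themselves cap a unit that is running alone.
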
CPUQuota=
Assign the specified CPU time quota to the processes executed.
Takes a percentage value, suffixed with "%". The percentage
specifies how much CPU time the unit shall get at maximum, relative
to the total CPU time available on one CPU. Use values > 100% for
allotting CPU time on more than one CPU. This controls the
"cpu.max" attribute on the unified control group hierarchy and
"cpu.cfs_quota_us" on legacy. For details about these control group
attributes, see Control Groups v2[2] and sched-bwc.txt[5].

Example: CPUQuota=20% ensures that the executed processes will
never get more than 20% CPU time on one CPU.

CPUQuotaPeriodSec=
Assign the duration over which the CPU time quota specified by
CPUQuota= is measured. Takes a time duration value in seconds, with
an optional suffix such as "ms" for milliseconds (or "s" for
seconds). The default setting is 100ms. The period is clamped to
the range supported by the kernel, which is [1ms, 1000ms].
Additionally, the period is adjusted up so that the quota interval
is also at least 1ms. Setting CPUQuotaPeriodSec= to an empty value
resets it to the default.

This controls the second field of the "cpu.max" attribute on the
unified control group hierarchy and "cpu.cfs_period_us" on legacy.
For details about these control group attributes, see Control
Groups v2[2] and CFS Scheduler[4].

Example: CPUQuotaPeriodSec=10ms requests that the CPU quota be
measured in periods of 10ms.

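The interaction of the two settings can be sketched with assumed
values. With a quota of 150% and a period of 10ms, the unit may consume
up to 1.5 x 10ms = 15ms of CPU time (summed over all CPUs) in each 10ms
period:

    [Service]
    # Up to one and a half CPUs' worth of time (illustrative value).
    CPUQuota=150%
    # Measure the quota over 10ms periods instead of the 100ms default.
    CPUQuotaPeriodSec=10ms

On the unified hierarchy, and assuming the kernel expresses both fields
of "cpu.max" in microseconds, this would be expected to show up as
roughly "15000 10000" in the unit's control group.
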
AllowedCPUs=
Restrict processes to be executed on specific CPUs. Takes a list of
CPU indices or ranges separated by either whitespace or commas. CPU
ranges are specified by the lower and upper CPU indices separated
by a dash.

Setting AllowedCPUs= doesn't guarantee that all of the CPUs will be
used by the processes as it may be limited by parent units. The
effective configuration is reported as EffectiveCPUs=.

This setting is supported only with the unified control group
hierarchy.

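A minimal sketch of the list syntax, with CPU indices picked only for
illustration:

    [Service]
    # CPUs 0, 1, 2, 3 and 8; a range and a single index, comma-separated.
    AllowedCPUs=0-3,8

The same list could be written with whitespace as the separator, for
example "0-3 8".
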
AllowedMemoryNodes=
Restrict processes to be executed on specific memory NUMA nodes.
Takes a list of memory NUMA node indices or ranges separated by
either whitespace or commas. Memory NUMA node ranges are specified
by the lower and upper NUMA node indices separated by a dash.

Setting AllowedMemoryNodes= doesn't guarantee that all of the
memory NUMA nodes will be used by the processes as it may be
limited by parent units. The effective configuration is reported as
EffectiveMemoryNodes=.

This setting is supported only with the unified control group
hierarchy.

MemoryAccounting=
Turn on process and kernel memory accounting for this unit. Takes a
boolean argument. Note that turning on memory accounting for one
unit will also implicitly turn it on for all units contained in the
same slice and for all its parent slices and the units contained
therein. The system default for this setting may be controlled with
DefaultMemoryAccounting= in systemd-system.conf(5).

MemoryMin=bytes, MemoryLow=bytes
Specify the memory usage protection of the executed processes in
this unit. When reclaiming memory, the unit is treated as if it
were using less memory, so that memory is preferentially reclaimed
from unprotected units. Using MemoryLow= results in a weaker
protection where memory may still be reclaimed to avoid invoking
the OOM killer in case there is no other reclaimable memory.

For a protection to be effective, it is generally required to set a
corresponding allocation on all ancestors, which is then
distributed between children (with the exception of the root
slice). Any MemoryMin= or MemoryLow= allocation that is not
explicitly distributed to specific children is used to create a
shared protection for all children. As this is a shared protection,
the children will freely compete for the memory.

Takes a memory size in bytes. If the value is suffixed with K, M, G
or T, the specified memory size is parsed as Kilobytes, Megabytes,
Gigabytes, or Terabytes (with the base 1024), respectively.
Alternatively, a percentage value may be specified, which is taken
relative to the installed physical memory on the system. If
assigned the special value "infinity", all available memory is
protected, which may be useful in order to always inherit all of
the protection afforded by ancestors. This controls the
"memory.min" or "memory.low" control group attribute. For details
about this control group attribute, see Memory Interface Files[6].

These settings are supported only if the unified control group
hierarchy is used, and they disable MemoryLimit=.

Units may have their children use a default "memory.min" or
"memory.low" value by specifying DefaultMemoryMin= or
DefaultMemoryLow=, which have the same semantics as MemoryMin= and
MemoryLow=. These settings do not affect "memory.min" or
"memory.low" in the unit itself. Using them to set a default child
allocation is only useful on kernels older than 5.7, which do not
support the "memory_recursiveprot" cgroup2 mount option.

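The distribution of protection between a parent and its children can
be sketched as follows. The unit names db.slice and db-main.service
and the sizes are hypothetical. The slice is protected with 2G; 1G of
that is explicitly handed to one child, and the remaining 1G is left
as shared protection for which all units in the slice compete:

    # db.slice (hypothetical)
    [Slice]
    MemoryLow=2G

    # db-main.service (hypothetical), placed in db.slice
    [Service]
    Slice=db.slice
    MemoryLow=1G

Since the parent of db.slice is the root slice, no further ancestor
allocation is needed in this sketch.
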
MemoryHigh=bytes
Specify the throttling limit on memory usage of the executed
processes in this unit. Memory usage may go above the limit if
unavoidable, but the processes are heavily slowed down and memory
is taken away aggressively in such cases. This is the main
mechanism to control memory usage of a unit.

Takes a memory size in bytes. If the value is suffixed with K, M, G
or T, the specified memory size is parsed as Kilobytes, Megabytes,
Gigabytes, or Terabytes (with the base 1024), respectively.
Alternatively, a percentage value may be specified, which is taken
relative to the installed physical memory on the system. If
assigned the special value "infinity", no memory throttling is
applied. This controls the "memory.high" control group attribute.
For details about this control group attribute, see Memory
Interface Files[6].

This setting is supported only if the unified control group
hierarchy is used and disables MemoryLimit=.

MemoryMax=bytes
Specify the absolute limit on memory usage of the executed
processes in this unit. If memory usage cannot be contained under
the limit, the out-of-memory killer is invoked inside the unit. It
is recommended to use MemoryHigh= as the main control mechanism and
use MemoryMax= as the last line of defense.

Takes a memory size in bytes. If the value is suffixed with K, M, G
or T, the specified memory size is parsed as Kilobytes, Megabytes,
Gigabytes, or Terabytes (with the base 1024), respectively.
Alternatively, a percentage value may be specified, which is taken
relative to the installed physical memory on the system. If
assigned the special value "infinity", no memory limit is applied.
This controls the "memory.max" control group attribute. For details
about this control group attribute, see Memory Interface Files[6].

This setting replaces MemoryLimit=.

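The recommended pairing of a throttling limit with a hard limit might
look like the following sketch. The sizes are assumptions for
illustration only. With the base of 1024, 768M is 805306368 bytes and
1G is 1073741824 bytes:

    [Service]
    # Main control: throttle and reclaim aggressively above 768M.
    MemoryHigh=768M
    # Last line of defense: the OOM killer is invoked inside the unit
    # if usage cannot be kept under 1G.
    MemoryMax=1G
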
MemorySwapMax=bytes
Specify the absolute limit on swap usage of the executed processes
in this unit.

Takes a swap size in bytes. If the value is suffixed with K, M, G
or T, the specified swap size is parsed as Kilobytes, Megabytes,
Gigabytes, or Terabytes (with the base 1024), respectively. If
assigned the special value "infinity", no swap limit is applied.
This controls the "memory.swap.max" control group attribute. For
details about this control group attribute, see Memory Interface
Files[6].

This setting is supported only if the unified control group
hierarchy is used and disables MemoryLimit=.

TasksAccounting=
Turn on task accounting for this unit. Takes a boolean argument. If
enabled, the system manager will keep track of the number of tasks
in the unit. The number of tasks accounted this way includes both
kernel threads and userspace processes, with each thread counting
individually. Note that turning on tasks accounting for one unit
will also implicitly turn it on for all units contained in the same
slice and for all its parent slices and the units contained
therein. The system default for this setting may be controlled with
DefaultTasksAccounting= in systemd-system.conf(5).

TasksMax=N
Specify the maximum number of tasks that may be created in the
unit. This ensures that the number of tasks accounted for the unit
(see above) stays below a specific limit. This either takes an
absolute number of tasks or a percentage value that is taken
relative to the configured maximum number of tasks on the system.
If assigned the special value "infinity", no tasks limit is
applied. This controls the "pids.max" control group attribute. For
details about this control group attribute, see Process Number
Controller[7].

The system default for this setting may be controlled with
DefaultTasksMax= in systemd-system.conf(5).

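The accepted forms can be sketched as follows; the numbers are
illustrative assumptions, and a real unit would normally carry only
one of these lines:

    [Service]
    # Absolute limit on the number of tasks (threads count individually).
    TasksMax=512
    # Alternative form: a share of the system-wide configured maximum.
    #TasksMax=10%
    # Alternative form: no limit at all.
    #TasksMax=infinity
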
IOAccounting=
Turn on block I/O accounting for this unit, if the unified control
group hierarchy is used on the system. Takes a boolean argument.
Note that turning on block I/O accounting for one unit will also
implicitly turn it on for all units contained in the same slice and
for all its parent slices and the units contained therein. The
system default for this setting may be controlled with
DefaultIOAccounting= in systemd-system.conf(5).

This setting replaces BlockIOAccounting= and disables settings
prefixed with BlockIO or StartupBlockIO.

IOWeight=weight, StartupIOWeight=weight
Set the default overall block I/O weight for the executed
processes, if the unified control group hierarchy is used on the
system. Takes a single weight value (between 1 and 10000) to set
the default block I/O weight. This controls the "io.weight" control
group attribute, which defaults to 100. For details about this
control group attribute, see IO Interface Files[8]. The available
I/O bandwidth is split up among all units within one slice relative
to their block I/O weight. A higher weight means more I/O
bandwidth, a lower weight means less.

While StartupIOWeight= only applies to the startup phase of the
system, IOWeight= applies to the later runtime of the system and,
if the former is not set, also to the startup phase. This allows
prioritizing specific services at boot-up differently than during
runtime.

These settings replace BlockIOWeight= and StartupBlockIOWeight= and
disable settings prefixed with BlockIO or StartupBlockIO.

IODeviceWeight=device weight
Set the per-device overall block I/O weight for the executed
processes, if the unified control group hierarchy is used on the
system. Takes a space-separated pair of a file path and a weight
value to specify the device specific weight value, between 1 and
10000. (Example: "/dev/sda 1000"). The file path may be specified
as a path to a block device node or as any other file, in which
case the backing block device of the file system of the file is
determined. This controls the "io.weight" control group attribute,
which defaults to 100. Use this option multiple times to set
weights for multiple devices. For details about this control group
attribute, see IO Interface Files[8].

This setting replaces BlockIODeviceWeight= and disables settings
prefixed with BlockIO or StartupBlockIO.

The specified device node should reference a block device that has
an I/O scheduler associated, i.e. should not refer to partition or
loopback block devices, but to the originating, physical device.
When a path to a regular file or directory is specified, an attempt
is made to discover the correct originating device backing the file
system of the specified path. This works correctly only for simpler
cases, where the file system is directly placed on a partition or
physical block device, or where simple 1:1 encryption using
dm-crypt/LUKS is used. This discovery does not cover complex
storage and in particular RAID and volume management storage
devices.

IOReadBandwidthMax=device bytes, IOWriteBandwidthMax=device bytes
Set the per-device overall block I/O bandwidth maximum limit for
the executed processes, if the unified control group hierarchy is
used on the system. This limit is not work-conserving and the
executed processes are not allowed to use more even if the device
has idle capacity. Takes a space-separated pair of a file path and
a bandwidth value (in bytes per second) to specify the device
specific bandwidth. The file path may be specified as a path to a
block device node or as any other file, in which case the backing
block device of the file system of the file is used. If the
bandwidth is suffixed with K, M, G, or T, the specified bandwidth
is parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes,
respectively, to the base of 1000. (Example:
"/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 5M"). This
controls the "io.max" control group attribute. Use this option
multiple times to set bandwidth limits for multiple devices. For
details about this control group attribute, see IO Interface
Files[8].

These settings replace BlockIOReadBandwidth= and
BlockIOWriteBandwidth= and disable settings prefixed with BlockIO
or StartupBlockIO.

Similar restrictions on block device discovery as for
IODeviceWeight= apply, see above.

IOReadIOPSMax=device IOPS, IOWriteIOPSMax=device IOPS
Set the per-device overall block I/O IOs-Per-Second maximum limit
for the executed processes, if the unified control group hierarchy
is used on the system. This limit is not work-conserving and the
executed processes are not allowed to use more even if the device
has idle capacity. Takes a space-separated pair of a file path and
an IOPS value to specify the device specific IOPS. The file path
may be specified as a path to a block device node or as any other
file, in which case the backing block device of the file system of
the file is used. If the IOPS is suffixed with K, M, G, or T, the
specified IOPS is parsed as KiloIOPS, MegaIOPS, GigaIOPS, or
TeraIOPS, respectively, to the base of 1000. (Example:
"/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 1K"). This
controls the "io.max" control group attribute. Use this option
multiple times to set IOPS limits for multiple devices. For details
about this control group attribute, see IO Interface Files[8].

These settings are supported only if the unified control group
hierarchy is used and disable settings prefixed with BlockIO or
StartupBlockIO.

Similar restrictions on block device discovery as for
IODeviceWeight= apply, see above.

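The weight and limit settings described so far might be combined as in
the following sketch. The device /dev/sda and all values are
assumptions for illustration. Following the base-1000 rule stated
above, 5M is read as 5 x 1000 x 1000 = 5000000 bytes per second and 1K
as 1000 operations per second:

    [Service]
    # Relative share of I/O bandwidth under contention (default is 100).
    IOWeight=200
    # Hard caps for one device; not work-conserving.
    IOReadBandwidthMax=/dev/sda 5M
    IOWriteBandwidthMax=/dev/sda 5M
    # At most 1000 write operations per second on the same device.
    IOWriteIOPSMax=/dev/sda 1K
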
IODeviceLatencyTargetSec=device target
Set the per-device average target I/O latency for the executed
processes, if the unified control group hierarchy is used on the
system. Takes a file path and a timespan separated by a space to
specify the device specific latency target. (Example: "/dev/sda
25ms"). The file path may be specified as a path to a block device
node or as any other file, in which case the backing block device
of the file system of the file is determined. This controls the
"io.latency" control group attribute. Use this option multiple
times to set latency targets for multiple devices. For details
about this control group attribute, see IO Interface Files[8].

Implies "IOAccounting=yes".

This setting is supported only if the unified control group
hierarchy is used.

Similar restrictions on block device discovery as for
IODeviceWeight= apply, see above.

IPAccounting=
Takes a boolean argument. If true, turns on IPv4 and IPv6 network
traffic accounting for packets sent or received by the unit. When
this option is turned on, all IPv4 and IPv6 sockets created by any
process of the unit are accounted for.

When this option is used in socket units, it applies to all IPv4
and IPv6 sockets associated with it (including both listening and
connection sockets where this applies). Note that for
socket-activated services, this configuration setting and the
accounting data of the service unit and the socket unit are kept
separate, and displayed separately. No propagation of the setting
and the collected statistics is done, in either direction.
Moreover, any traffic sent or received on any of the socket unit's
sockets is accounted to the socket unit, and never to the service
unit it might have activated, even if the socket is used by it.

The system default for this setting may be controlled with
DefaultIPAccounting= in systemd-system.conf(5).

IPAddressAllow=ADDRESS[/PREFIXLENGTH]...,
IPAddressDeny=ADDRESS[/PREFIXLENGTH]...
Turn on network traffic filtering for IP packets sent and received
over AF_INET and AF_INET6 sockets. Both directives take a space
separated list of IPv4 or IPv6 addresses, each optionally suffixed
with an address prefix length in bits after a "/" character. If the
suffix is omitted, the address is considered a host address, i.e.
the filter covers the whole address (32 bits for IPv4, 128 bits for
IPv6).

The access lists configured with this option are applied to all
sockets created by processes of this unit (or in the case of socket
units, associated with it). The lists are implicitly combined with
any lists configured for any of the parent slice units this unit
might be a member of. By default both access lists are empty. Both
ingress and egress traffic is filtered by these settings. In case
of ingress traffic the source IP address is checked against these
access lists, in case of egress traffic the destination IP address
is checked. The following rules are applied in turn:

• Access is granted when the checked IP address matches an entry
  in the IPAddressAllow= list.

• Otherwise, access is denied when the checked IP address matches
  an entry in the IPAddressDeny= list.

• Otherwise, access is granted.

In order to implement an allow-listing IP firewall, it is
recommended to use an IPAddressDeny=any setting on an upper-level
slice unit (such as the root slice -.slice or the slice containing
all system services system.slice; see systemd.special(7) for
details on these slice units), plus individual per-service
IPAddressAllow= lines permitting network access to relevant
services, and only them.

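The allow-listing setup described above could be sketched as follows.
The drop-in file name, the service, and the network 192.168.1.0/24 are
assumptions for illustration:

    # /etc/systemd/system/system.slice.d/10-ip-deny.conf (hypothetical)
    [Slice]
    IPAddressDeny=any

    # Drop-in for one service that needs network access (hypothetical)
    [Service]
    IPAddressAllow=localhost
    IPAddressAllow=192.168.1.0/24

For the service, an address in 192.168.1.0/24 or on the loopback
matches the allow list and is granted by the first rule. Any other
address falls through to the second rule, matches "any" in the deny
list inherited from the slice, and is denied.
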
Note that for socket-activated services, the IP access list
configured on the socket unit applies to all sockets associated
with it directly, but not to any sockets created by the ultimately
activated services for it. Conversely, the IP access list
configured for the service is not applied to any sockets passed
into the service via socket activation. Thus, it is usually a good
idea to replicate the IP access lists on both the socket and the
service unit. Nevertheless, it may make sense to maintain one list
more open and the other one more restricted, depending on the
use case.

If these settings are used multiple times in the same unit, the
specified lists are combined. If an empty string is assigned to
these settings, the specific access list is reset and all previous
settings are undone.

In place of explicit IPv4 or IPv6 address and prefix length
specifications a small set of symbolic names may be used. The
following names are defined:

Table 1. Special address/network names
┌──────────────┬─────────────────────┬─────────────────────┐
│Symbolic Name │ Definition          │ Meaning             │
├──────────────┼─────────────────────┼─────────────────────┤
│any           │ 0.0.0.0/0 ::/0      │ Any host            │
├──────────────┼─────────────────────┼─────────────────────┤
│localhost     │ 127.0.0.0/8 ::1/128 │ All addresses on    │
│              │                     │ the local loopback  │
├──────────────┼─────────────────────┼─────────────────────┤
│link-local    │ 169.254.0.0/16      │ All link-local IP   │
│              │ fe80::/64           │ addresses           │
├──────────────┼─────────────────────┼─────────────────────┤
│multicast     │ 224.0.0.0/4         │ All IP multicasting │
│              │ ff00::/8            │ addresses           │
└──────────────┴─────────────────────┴─────────────────────┘

Note that these settings might not be supported on some systems
(for example if eBPF control group support is not enabled in the
underlying kernel or container manager). These settings will have
no effect in that case. If compatibility with such systems is
desired it is hence recommended to not exclusively rely on them for
IP security.

IPIngressFilterPath=BPF_FS_PROGRAM_PATH,
IPEgressFilterPath=BPF_FS_PROGRAM_PATH
Add custom network traffic filters implemented as BPF programs,
applying to all IP packets sent and received over AF_INET and
AF_INET6 sockets. Takes an absolute path to a pinned BPF program in
the BPF virtual filesystem (/sys/fs/bpf/).

The filters configured with this option are applied to all sockets
created by processes of this unit (or in the case of socket units,
associated with it). The filters are loaded in addition to the
filters of any of the parent slice units this unit might be a
member of, as well as any IPAddressAllow= and IPAddressDeny=
filters in any of these units. By default there are no filters
specified.

If these settings are used multiple times in the same unit, all the
specified programs are attached. If an empty string is assigned to
these settings, the program list is reset and all previously
specified programs are ignored.

If the path BPF_FS_PROGRAM_PATH in an IPIngressFilterPath=
assignment is already being handled by a BPFProgram= ingress hook,
e.g. BPFProgram=ingress:BPF_FS_PROGRAM_PATH, the assignment will
still be considered valid and the program will be attached to a
cgroup. The same applies to an IPEgressFilterPath= path and the
egress hook.

Note that for socket-activated services, the IP filter programs
configured on the socket unit apply to all sockets associated with
it directly, but not to any sockets created by the ultimately
activated services for it. Conversely, the IP filter programs
configured for the service are not applied to any sockets passed
into the service via socket activation. Thus, it is usually a good
idea to replicate the IP filter programs on both the socket and
the service unit. However, it often makes sense to maintain one
configuration more open and the other one more restricted,
depending on the use case.

Note that these settings might not be supported on some systems
(for example if eBPF control group support is not enabled in the
underlying kernel or container manager). These settings will fail
the service in that case. If compatibility with such systems is
desired it is hence recommended to attach your filter manually
(requires Delegate=yes) instead of using this setting.

BPFProgram=type:program-path
Add a custom cgroup BPF program.

BPFProgram= allows attaching BPF hooks to the cgroup of a systemd
unit. (This generalizes the functionality exposed via
IPEgressFilterPath= for egress and IPIngressFilterPath= for
ingress.) Cgroup-bpf hooks in the form of BPF programs loaded to
the BPF filesystem are attached with cgroup-bpf attach flags
determined by the unit. For details about attachment types and
flags see
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/include/uapi/linux/bpf.h.
For general BPF documentation please refer to
https://www.kernel.org/doc/html/latest/bpf/index.html.

The specification of a BPF program consists of a type followed by a
program-path with ":" as the separator: type:program-path.

type is the string name of the BPF attach type, as also used in
bpftool. type can be one of egress, ingress, sock_create, sock_ops,
device, bind4, bind6, connect4, connect6, post_bind4, post_bind6,
sendmsg4, sendmsg6, sysctl, recvmsg4, recvmsg6, getsockopt,
setsockopt.

Setting BPFProgram= to an empty value makes previous assignments
ineffective.

Multiple assignments of the same type:program-path value have the
same effect as a single assignment: the program with the path
program-path will be attached to cgroup hook type just once.

If a BPF egress program pinned at program-path is already being
handled by IPEgressFilterPath=, the BPFProgram= assignment will
still be considered valid and the program will be attached to a
cgroup. The same applies to the ingress hook and an
IPIngressFilterPath= assignment.

BPF programs passed with BPFProgram= are attached to the cgroup of
a unit with the BPF attach flag multi, which allows further
attachments of the same type within the cgroup hierarchy topped by
the unit cgroup.

Examples:

    BPFProgram=egress:/sys/fs/bpf/egress-hook
    BPFProgram=bind6:/sys/fs/bpf/sock-addr-hook

SocketBindAllow=bind-rule, SocketBindDeny=bind-rule
Allow or deny binding a socket address to a socket by matching it
with the bind-rule and applying a corresponding action if there is
a match.

bind-rule describes socket properties such as address-family,
transport-protocol and ip-ports.

    bind-rule := { [address-family:][transport-protocol:][ip-ports] | any }

    address-family := { ipv4 | ipv6 }

    transport-protocol := { tcp | udp }

    ip-ports := { ip-port | ip-port-range }

An optional address-family expects ipv4 or ipv6 values. If not
specified, a rule will be matched for both IPv4 and IPv6 addresses
and applied depending on other socket fields, e.g.
transport-protocol, ip-port.

An optional transport-protocol expects tcp or udp transport
protocol names. If not specified, a rule will be matched for any
transport protocol.

An optional ip-port value must lie within the interval 1...65535,
inclusive, i.e. dynamic port 0 is not allowed. A range of
sequential ports is described by ip-port-range :=
ip-port-low-ip-port-high, where ip-port-low is smaller than or
equal to ip-port-high and both are within 1...65535, inclusive.

651 A special value any can be used to apply a rule to any address
652 family, transport protocol and any port with a positive value.
653
654 To allow multiple rules assign SocketBindAllow= or SocketBindDeny=
655 multiple times. To clear the existing assignments pass an empty
656 SocketBindAllow= or SocketBindDeny= assignment.
657
658 For each of SocketBindAllow= and SocketBindDeny=, maximum allowed
659 number of assignments is 128.
660
661 • Binding to a socket is allowed when a socket address matches an
662 entry in the SocketBindAllow= list.
663
664 • Otherwise, binding is denied when the socket address matches an
665 entry in the SocketBindDeny= list.
666
667 • Otherwise, binding is allowed.
668
669 The feature is implemented with cgroup/bind4 and cgroup/bind6
670 cgroup-bpf hooks.

Examples:

...
# Allow binding IPv6 socket addresses with a port greater than or equal to 10000.
[Service]
SocketBindAllow=ipv6:10000-65535
SocketBindDeny=any
...
# Allow binding IPv4 and IPv6 socket addresses with 1234 and 4321 ports.
[Service]
SocketBindAllow=1234
SocketBindAllow=4321
SocketBindDeny=any
...
# Deny binding IPv6 socket addresses.
[Service]
SocketBindDeny=ipv6
...
# Deny binding IPv4 and IPv6 socket addresses.
[Service]
SocketBindDeny=any
...
# Allow binding only over TCP
[Service]
SocketBindAllow=tcp
SocketBindDeny=any
...
# Allow binding only over IPv6/TCP
[Service]
SocketBindAllow=ipv6:tcp
SocketBindDeny=any
...
# Allow binding ports within 10000-65535 range over IPv4/UDP.
[Service]
SocketBindAllow=ipv4:udp:10000-65535
SocketBindDeny=any
...
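
The evaluation order given in the bullet list above means that a matching
SocketBindAllow= rule wins before any SocketBindDeny= rule is consulted, and
that a bind matching neither list is allowed. The following fragment is a
minimal sketch of that order, written against the bind-rule grammar above;
port 8080 is an assumed value chosen only for illustration.

```ini
# Illustrative only: 8080 is an assumed port, not taken from this page.
[Service]
# Checked first: binding TCP port 8080 matches here and is allowed.
SocketBindAllow=tcp:8080
# Checked second: every other TCP bind matches here and is denied.
SocketBindDeny=tcp
# UDP binds match neither list, so they fall through and are allowed.
```

To restrict UDP as well, the deny rule would presumably be widened to
SocketBindDeny=any, as in the examples above.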

DeviceAllow=
Control access to specific device nodes by the executed processes.
Takes two space-separated strings: a device node specifier followed
by a combination of r, w, m to control reading, writing, or
creation of the specific device node(s) by the unit (mknod),
respectively. On cgroup-v1 this controls the "devices.allow"
control group attribute. For details about this control group
attribute, see Device Whitelist Controller[9]. In the unified
cgroup hierarchy this functionality is implemented using eBPF
filtering.

The device node specifier is either a path to a device node in the
file system, starting with /dev/, or a string starting with either
"char-" or "block-" followed by a device group name, as listed in
/proc/devices. The latter is useful to allow-list all current and
future devices belonging to a specific device group at once. The
device group is matched according to filename globbing rules; you
may hence use the "*" and "?" wildcards. (Note that such globbing
wildcards are not available for device node path specifications!)
In order to match device nodes by numeric major/minor, use device
node paths in the /dev/char/ and /dev/block/ directories. However,
matching devices by major/minor is generally not recommended, as
assignments are neither stable nor portable between systems or
different kernel versions.

Examples: /dev/sda5 is a path to a device node, referring to an ATA
or SCSI block device. "char-pts" and "char-alsa" are specifiers
for all pseudo TTYs and all ALSA sound devices, respectively.
"char-cpu/*" is a specifier matching all CPU related device groups.
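
As a sketch of how the specifier and the access string combine, the fragment
below reuses the device specifiers named above; which of r, w, and m is
granted to each one is an assumption made for illustration.

```ini
[Service]
# Read-only access to a single partition.
DeviceAllow=/dev/sda5 r
# Read and write access to all pseudo TTYs.
DeviceAllow=char-pts rw
# Read, write, and mknod for all ALSA sound devices.
DeviceAllow=char-alsa rwm
```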

Note that allow lists defined this way should only reference device
groups which are resolvable at the time the unit is started. Any
device groups not resolvable then are not added to the device allow
list. In order to work around this limitation, consider extending
service units with a pair of After=modprobe@xyz.service and
Wants=modprobe@xyz.service lines that load the necessary kernel
module implementing the device group if missing. Example:

...
[Unit]
Wants=modprobe@loop.service
After=modprobe@loop.service

[Service]
DeviceAllow=block-loop
DeviceAllow=/dev/loop-control
...

DevicePolicy=auto|closed|strict
Control the policy for allowing device access:

strict
    means to only allow types of access that are explicitly
    specified.

closed
    in addition, allows access to standard pseudo devices including
    /dev/null, /dev/zero, /dev/full, /dev/random, and /dev/urandom.

auto
    in addition, allows access to all devices if no explicit
    DeviceAllow= is present. This is the default.
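
The three policies differ mainly in what is reachable without an explicit
DeviceAllow= entry. A minimal sketch, assuming a service that needs only two
pseudo devices:

```ini
[Service]
# With strict, nothing is accessible unless it is listed explicitly,
# not even the standard pseudo devices.
DevicePolicy=strict
DeviceAllow=/dev/null rw
DeviceAllow=/dev/urandom r
```

With DevicePolicy=closed the two DeviceAllow= lines would be unnecessary,
since /dev/null and /dev/urandom are among the standard pseudo devices that
policy already permits.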

Slice=
The name of the slice unit to place the unit in. Defaults to
system.slice for all non-instantiated units of all unit types
(except for slice units themselves, see below). Instance units are
by default placed in a subslice of system.slice that is named after
the template name.

This option may be used to arrange systemd units in a hierarchy of
slices, each of which might have resource settings applied.

For units of type slice, the only accepted value for this setting
is the parent slice. Since the name of a slice unit implies the
parent slice, it is hence redundant to ever set this parameter
directly for slice units.

Special care should be taken when relying on the default slice
assignment in templated service units that have
DefaultDependencies=no set, see systemd.service(5), section
"Default Dependencies" for details.
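
A sketch of placing a service in a slice that carries its own resource
setting. Both unit names are hypothetical, the two fragments belong in
separate unit files, and the MemoryMax= value is an arbitrary illustration.

```ini
# example.slice (hypothetical unit name)
[Slice]
MemoryMax=1G

# example.service (hypothetical unit name)
[Service]
Slice=example.slice
```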

Delegate=
Turns on delegation of further resource control partitioning to
processes of the unit. Units where this is enabled may create and
manage their own private subhierarchy of control groups below the
control group of the unit itself. For unprivileged services (i.e.
those using the User= setting) the unit's control group will be
made accessible to the relevant user. When enabled, the service
manager will refrain from manipulating control groups or moving
processes below the unit's control group, so that a clear concept
of ownership is established: the control group tree above the
unit's control group (i.e. towards the root control group) is owned
and managed by the service manager of the host, while the control
group tree below the unit's control group is owned and managed by
the unit itself.

Takes either a boolean argument or a list of control group
controller names. If true, delegation is turned on, and all
supported controllers are enabled for the unit, making them
available to the unit's processes for management. If false,
delegation is turned off entirely (and no additional controllers
are enabled). If set to a list of controllers, delegation is turned
on, and the specified controllers are enabled for the unit. Note
that controllers other than the ones specified might be made
available as well, depending on the configuration of the containing
slice unit or other units contained in it. Note that assigning the
empty string will enable delegation but reset the list of
controllers; all assignments prior to this will have no effect.
Defaults to false.

Note that controller delegation to less privileged code is only
safe on the unified control group hierarchy. Accordingly, access to
the specified controllers will not be granted to unprivileged
services on the legacy hierarchy, even when requested.

The following controller names may be specified: cpu, cpuacct,
cpuset, io, blkio, memory, devices, pids, bpf-firewall, and
bpf-devices.

Not all of these controllers are available on all kernels however,
and some are specific to the unified hierarchy while others are
specific to the legacy hierarchy. Also note that the kernel might
support further controllers, which aren't covered here yet as
delegation is either not supported at all for them or not defined
cleanly.

For further details on the delegation model consult Control Group
APIs and Delegation[10].
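
A minimal sketch of delegating a fixed set of controllers to an unprivileged
service. The user name is hypothetical and the choice of controllers is an
assumption for illustration.

```ini
[Service]
# Hypothetical account; with User= set, the unit's control group is
# made accessible to that user.
User=exampleuser
# Turn on delegation and enable only these controllers for the unit.
Delegate=cpu memory pids
```

Delegate=yes would instead enable all supported controllers for the unit.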

DisableControllers=
Disables controllers from being enabled for a unit's children. If a
controller listed is already in use in its subtree, the controller
will be removed from the subtree. This can be used to prevent child
units from implicitly or explicitly enabling a controller.
Defaults to not disabling any controllers.

It may not be possible to successfully disable a controller if the
unit or any child of the unit in question delegates controllers to
its children, as any delegated subtree of the cgroup hierarchy is
unmanaged by systemd.

Multiple controllers may be specified, separated by spaces. You may
also pass DisableControllers= multiple times, in which case each
new instance adds another controller to disable. Passing
DisableControllers= by itself with no controller name present
resets the disabled controller list.

The following controller names may be specified: cpu, cpuacct,
cpuset, io, blkio, memory, devices, pids, bpf-firewall, and
bpf-devices.
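
The list semantics described above can be sketched as follows. The fragment
is written for a slice unit, and the controllers chosen are illustrative.

```ini
[Slice]
# Keep units below this slice from enabling the cpu and io controllers.
DisableControllers=cpu io
# A later assignment adds to the list instead of replacing it.
DisableControllers=memory
```

An empty DisableControllers= assignment placed after these lines would reset
the list.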

ManagedOOMSwap=auto|kill, ManagedOOMMemoryPressure=auto|kill
Specifies how systemd-oomd.service(8) will act on this unit's
cgroups. Defaults to auto.

When set to kill, systemd-oomd will actively monitor this unit's
cgroup metrics to decide whether it needs to act. If the cgroup
passes the limits set by oomd.conf(5) or its overrides,
systemd-oomd will send a SIGKILL to all of the processes under the
chosen candidate cgroup. Note that only descendant cgroups can be
eligible candidates for killing; the unit that set its property to
kill is not a candidate (unless one of its ancestors set their
property to kill). You can find more details on candidates and kill
behavior in systemd-oomd.service(8) and oomd.conf(5). Setting
either of these properties to kill will also cause the unit to
automatically acquire After= and Wants= dependencies on
systemd-oomd.service unless DefaultDependencies=no is set.

When set to auto, systemd-oomd will not actively use this cgroup's
data for monitoring and detection. However, if an ancestor cgroup
has one of these properties set to kill, a unit with auto can still
be an eligible candidate for systemd-oomd to act on.

ManagedOOMMemoryPressureLimit=
Overrides the default memory pressure limit set by oomd.conf(5) for
this unit (cgroup). Takes a percentage value between 0% and 100%,
inclusive. This property is ignored unless
ManagedOOMMemoryPressure=kill. Defaults to 0%, which means to use
the default set by oomd.conf(5).
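
Because only descendant cgroups are candidates for killing, these settings
are typically most useful on a unit that contains others. A sketch for a
slice unit follows; the 50% limit is an assumed value, not a recommendation.

```ini
[Slice]
# Let systemd-oomd monitor this slice's cgroup for memory pressure.
ManagedOOMMemoryPressure=kill
# Assumed value for illustration; overrides the oomd.conf(5) default.
# Ignored unless ManagedOOMMemoryPressure=kill is set.
ManagedOOMMemoryPressureLimit=50%
```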

ManagedOOMPreference=none|avoid|omit
Allows deprioritizing or omitting this unit's cgroup as a candidate
when systemd-oomd needs to act. Requires support for extended
attributes (see xattr(7)) in order to use avoid or omit.
Additionally, systemd-oomd will ignore these extended attributes if
the unit's cgroup is not owned by the root user.

If this property is set to avoid, the service manager will convey
this to systemd-oomd, which will only select this cgroup if there
are no other viable candidates.

If this property is set to omit, the service manager will convey
this to systemd-oomd, which will ignore this cgroup as a candidate
and will not perform any actions on it.

It is recommended to use avoid and omit sparingly, as they can
adversely affect systemd-oomd's kill behavior. Also note that these
extended attributes are not applied recursively to cgroups under
this unit's cgroup.

Defaults to none, which means systemd-oomd will rank this unit's
cgroup as defined in systemd-oomd.service(8) and oomd.conf(5).
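
A minimal sketch of deprioritizing one service as a kill candidate, assuming
the unit's cgroup is owned by root and extended attributes are supported:

```ini
[Service]
# Select this unit's cgroup only if there are no other viable candidates.
ManagedOOMPreference=avoid
```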

DEPRECATED OPTIONS
The following options are deprecated. Use the indicated superseding
options instead:

CPUShares=weight, StartupCPUShares=weight
Assign the specified CPU time share weight to the processes
executed. These options take an integer value and control the
"cpu.shares" control group attribute. The allowed range is 2 to
262144. Defaults to 1024. For details about this control group
attribute, see CFS Scheduler[4]. The available CPU time is split up
among all units within one slice relative to their CPU time share
weight.

While StartupCPUShares= only applies to the startup phase of the
system, CPUShares= applies to normal runtime of the system, and if
the former is not set also to the startup phase. Using
StartupCPUShares= allows prioritizing specific services at boot-up
differently than during normal runtime.

Implies "CPUAccounting=yes".

These settings are deprecated. Use CPUWeight= and StartupCPUWeight=
instead.
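
A sketch of moving from the deprecated setting to its replacement. It assumes
that CPUWeight= defaults to 100, so that a unit holding twice the default
share keeps roughly the same relative priority; the exact mapping between the
two scales is not specified in this section.

```ini
[Service]
# Deprecated: twice the default share of 1024.
#CPUShares=2048
# Superseding setting: twice the assumed default weight of 100.
CPUWeight=200
```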

MemoryLimit=bytes
Specify the limit on maximum memory usage of the executed
processes. The limit specifies how much process and kernel memory
can be used by tasks in this unit. Takes a memory size in bytes. If
the value is suffixed with K, M, G or T, the specified memory size
is parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes (with
the base 1024), respectively. Alternatively, a percentage value may
be specified, which is taken relative to the installed physical
memory on the system. If assigned the special value "infinity", no
memory limit is applied. This controls the "memory.limit_in_bytes"
control group attribute. For details about this control group
attribute, see Memory Resource Controller[11].

Implies "MemoryAccounting=yes".

This setting is deprecated. Use MemoryMax= instead.

BlockIOAccounting=
Turn on Block I/O accounting for this unit, if the legacy control
group hierarchy is used on the system. Takes a boolean argument.
Note that turning on block I/O accounting for one unit will also
implicitly turn it on for all units contained in the same slice and
for all its parent slices and the units contained therein. The
system default for this setting may be controlled with
DefaultBlockIOAccounting= in systemd-system.conf(5).

This setting is deprecated. Use IOAccounting= instead.

BlockIOWeight=weight, StartupBlockIOWeight=weight
Set the default overall block I/O weight for the executed
processes, if the legacy control group hierarchy is used on the
system. Takes a single weight value (between 10 and 1000) to set
the default block I/O weight. This controls the "blkio.weight"
control group attribute, which defaults to 500. For details about
this control group attribute, see Block IO Controller[12]. The
available I/O bandwidth is split up among all units within one
slice relative to their block I/O weight.

While StartupBlockIOWeight= only applies to the startup phase of
the system, BlockIOWeight= applies to the later runtime of the
system, and if the former is not set also to the startup phase.
This allows prioritizing specific services at boot-up differently
than during runtime.

Implies "BlockIOAccounting=yes".

These settings are deprecated. Use IOWeight= and StartupIOWeight=
instead.

BlockIODeviceWeight=device weight
Set the per-device overall block I/O weight for the executed
processes, if the legacy control group hierarchy is used on the
system. Takes a space-separated pair of a file path and a weight
value to specify the device specific weight value, between 10 and
1000. (Example: "/dev/sda 500"). The file path may be specified as
a path to a block device node or as any other file, in which case
the backing block device of the file system of the file is
determined. This controls the "blkio.weight_device" control group
attribute, which defaults to 1000. Use this option multiple times to
set weights for multiple devices. For details about this control
group attribute, see Block IO Controller[12].

Implies "BlockIOAccounting=yes".

This setting is deprecated. Use IODeviceWeight= instead.

BlockIOReadBandwidth=device bytes, BlockIOWriteBandwidth=device bytes
Set the per-device overall block I/O bandwidth limit for the
executed processes, if the legacy control group hierarchy is used
on the system. Takes a space-separated pair of a file path and a
bandwidth value (in bytes per second) to specify the device
specific bandwidth. The file path may be a path to a block device
node or any other file, in which case the backing block device
of the file system of the file is used. If the bandwidth is
suffixed with K, M, G, or T, the specified bandwidth is parsed as
Kilobytes, Megabytes, Gigabytes, or Terabytes, respectively, to the
base of 1000. (Example:
"/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 5M"). This
controls the "blkio.throttle.read_bps_device" and
"blkio.throttle.write_bps_device" control group attributes. Use
this option multiple times to set bandwidth limits for multiple
devices. For details about these control group attributes, see
Block IO Controller[12].

Implies "BlockIOAccounting=yes".

These settings are deprecated. Use IOReadBandwidthMax= and
IOWriteBandwidthMax= instead.
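
The replacements named in the descriptions above can be sketched as a
before-and-after fragment. The device path and the values are illustrative.
Weight settings are left out because the deprecated and superseding options
may not share a scale.

```ini
[Service]
# Deprecated settings, commented out:
#MemoryLimit=1G
#BlockIOAccounting=yes
#BlockIOReadBandwidth=/dev/sda 5M
#BlockIOWriteBandwidth=/dev/sda 5M

# Superseding settings:
MemoryMax=1G
IOAccounting=yes
IOReadBandwidthMax=/dev/sda 5M
IOWriteBandwidthMax=/dev/sda 5M
```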

SEE ALSO
systemd(1), systemd-system.conf(5), systemd.unit(5),
systemd.service(5), systemd.slice(5), systemd.scope(5),
systemd.socket(5), systemd.mount(5), systemd.swap(5), systemd.exec(5),
systemd.directives(7), systemd.special(7), systemd-oomd.service(8), the
documentation for control groups and specific controllers in the Linux
kernel: Control Groups v2[2].

NOTES
 1. New Control Group Interfaces
    https://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/

 2. Control Groups v2
    https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html

 3. Control Groups version 1
    https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/

 4. CFS Scheduler
    https://www.kernel.org/doc/html/latest/scheduler/sched-design-CFS.html

 5. sched-bwc.txt
    https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt

 6. Memory Interface Files
    https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory-interface-files

 7. Process Number Controller
    https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/pids.html

 8. IO Interface Files
    https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#io-interface-files

 9. Device Whitelist Controller
    https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/devices.html

10. Control Group APIs and Delegation
    https://systemd.io/CGROUP_DELEGATION

11. Memory Resource Controller
    https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/memory.html

12. Block IO Controller
    https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/blkio-controller.html

systemd 249 SYSTEMD.RESOURCE-CONTROL(5)