sched(7)                 Miscellaneous Information Manual                 sched(7)
2
3
4
NAME
sched - overview of CPU scheduling

DESCRIPTION
9 Since Linux 2.6.23, the default scheduler is CFS, the "Completely Fair
10 Scheduler". The CFS scheduler replaced the earlier "O(1)" scheduler.
11
12 API summary
13 Linux provides the following system calls for controlling the CPU
14 scheduling behavior, policy, and priority of processes (or, more pre‐
15 cisely, threads).
16
17 nice(2)
18 Set a new nice value for the calling thread, and return the new
19 nice value.
20
21 getpriority(2)
22 Return the nice value of a thread, a process group, or the set
23 of threads owned by a specified user.
24
25 setpriority(2)
26 Set the nice value of a thread, a process group, or the set of
27 threads owned by a specified user.
28
29 sched_setscheduler(2)
30 Set the scheduling policy and parameters of a specified thread.
31
32 sched_getscheduler(2)
33 Return the scheduling policy of a specified thread.
34
35 sched_setparam(2)
36 Set the scheduling parameters of a specified thread.
37
38 sched_getparam(2)
39 Fetch the scheduling parameters of a specified thread.
40
41 sched_get_priority_max(2)
42 Return the maximum priority available in a specified scheduling
43 policy.
44
45 sched_get_priority_min(2)
46 Return the minimum priority available in a specified scheduling
47 policy.
48
49 sched_rr_get_interval(2)
50 Fetch the quantum used for threads that are scheduled under the
51 "round-robin" scheduling policy.
52
53 sched_yield(2)
54 Cause the caller to relinquish the CPU, so that some other
thread can be executed.
56
57 sched_setaffinity(2)
58 (Linux-specific) Set the CPU affinity of a specified thread.
59
60 sched_getaffinity(2)
61 (Linux-specific) Get the CPU affinity of a specified thread.
62
63 sched_setattr(2)
64 Set the scheduling policy and parameters of a specified thread.
65 This (Linux-specific) system call provides a superset of the
66 functionality of sched_setscheduler(2) and sched_setparam(2).
67
68 sched_getattr(2)
69 Fetch the scheduling policy and parameters of a specified
70 thread. This (Linux-specific) system call provides a superset
71 of the functionality of sched_getscheduler(2) and sched_get‐
72 param(2).
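
As an illustrative sketch of how a few of these interfaces fit together
(not an excerpt from the pages listed above), the following program
reports the calling thread's scheduling policy and nice value; a pid or
who argument of 0 means "the caller":

    #include <errno.h>
    #include <sched.h>
    #include <stdio.h>
    #include <sys/resource.h>

    int
    main(void)
    {
        int policy, nice_val;

        /* 0 means "the calling thread" for sched_getscheduler(). */
        policy = sched_getscheduler(0);
        if (policy == -1) {
            perror("sched_getscheduler");
            return 1;
        }

        /* getpriority() may legitimately return -1, so clear errno first. */
        errno = 0;
        nice_val = getpriority(PRIO_PROCESS, 0);
        if (nice_val == -1 && errno != 0) {
            perror("getpriority");
            return 1;
        }

        printf("policy = %d (SCHED_OTHER = %d), nice = %d\n",
               policy, SCHED_OTHER, nice_val);
        return 0;
    }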
73
74 Scheduling policies
75 The scheduler is the kernel component that decides which runnable
76 thread will be executed by the CPU next. Each thread has an associated
77 scheduling policy and a static scheduling priority, sched_priority.
78 The scheduler makes its decisions based on knowledge of the scheduling
79 policy and static priority of all threads on the system.
80
81 For threads scheduled under one of the normal scheduling policies
82 (SCHED_OTHER, SCHED_IDLE, SCHED_BATCH), sched_priority is not used in
83 scheduling decisions (it must be specified as 0).
84
85 Processes scheduled under one of the real-time policies (SCHED_FIFO,
86 SCHED_RR) have a sched_priority value in the range 1 (low) to 99
87 (high). (As the numbers imply, real-time threads always have higher
88 priority than normal threads.) Note well: POSIX.1 requires an imple‐
mentation to support only a minimum of 32 distinct priority levels for the
90 real-time policies, and some systems supply just this minimum. Porta‐
91 ble programs should use sched_get_priority_min(2) and sched_get_prior‐
92 ity_max(2) to find the range of priorities supported for a particular
93 policy.
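
For example, a portable program might query the range for SCHED_FIFO as
in the following sketch:

    #include <sched.h>
    #include <stdio.h>

    int
    main(void)
    {
        int min, max;

        min = sched_get_priority_min(SCHED_FIFO);
        max = sched_get_priority_max(SCHED_FIFO);
        if (min == -1 || max == -1) {
            perror("sched_get_priority_min/max");
            return 1;
        }

        printf("SCHED_FIFO priorities: %d (low) to %d (high)\n", min, max);
        return 0;
    }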
94
95 Conceptually, the scheduler maintains a list of runnable threads for
96 each possible sched_priority value. In order to determine which thread
97 runs next, the scheduler looks for the nonempty list with the highest
98 static priority and selects the thread at the head of this list.
99
100 A thread's scheduling policy determines where it will be inserted into
101 the list of threads with equal static priority and how it will move in‐
102 side this list.
103
104 All scheduling is preemptive: if a thread with a higher static priority
105 becomes ready to run, the currently running thread will be preempted
106 and returned to the wait list for its static priority level. The
107 scheduling policy determines the ordering only within the list of
108 runnable threads with equal static priority.
109
110 SCHED_FIFO: First in-first out scheduling
111 SCHED_FIFO can be used only with static priorities higher than 0, which
112 means that when a SCHED_FIFO thread becomes runnable, it will always
113 immediately preempt any currently running SCHED_OTHER, SCHED_BATCH, or
114 SCHED_IDLE thread. SCHED_FIFO is a simple scheduling algorithm without
115 time slicing. For threads scheduled under the SCHED_FIFO policy, the
116 following rules apply:
117
118 • A running SCHED_FIFO thread that has been preempted by another
119 thread of higher priority will stay at the head of the list for its
120 priority and will resume execution as soon as all threads of higher
121 priority are blocked again.
122
123 • When a blocked SCHED_FIFO thread becomes runnable, it will be in‐
124 serted at the end of the list for its priority.
125
126 • If a call to sched_setscheduler(2), sched_setparam(2), sched_se‐
127 tattr(2), pthread_setschedparam(3), or pthread_setschedprio(3)
128 changes the priority of the running or runnable SCHED_FIFO thread
identified by pid, the effect on the thread's position in the list
depends on the direction of the change to the thread's priority:
131
132 (a) If the thread's priority is raised, it is placed at the end of
133 the list for its new priority. As a consequence, it may pre‐
134 empt a currently running thread with the same priority.
135
136 (b) If the thread's priority is unchanged, its position in the run
137 list is unchanged.
138
139 (c) If the thread's priority is lowered, it is placed at the front
140 of the list for its new priority.
141
142 According to POSIX.1-2008, changes to a thread's priority (or pol‐
143 icy) using any mechanism other than pthread_setschedprio(3) should
144 result in the thread being placed at the end of the list for its
145 priority.
146
147 • A thread calling sched_yield(2) will be put at the end of the list.
148
149 No other events will move a thread scheduled under the SCHED_FIFO pol‐
150 icy in the wait list of runnable threads with equal static priority.
151
152 A SCHED_FIFO thread runs until either it is blocked by an I/O request,
153 it is preempted by a higher priority thread, or it calls
154 sched_yield(2).
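
For example, a thread might request SCHED_FIFO roughly as in the
following sketch; the priority value 50 is an arbitrary illustration,
and the call will normally fail with EPERM unless the caller has the
privileges described under "Privileges and resource limits" below:

    #include <sched.h>
    #include <stdio.h>

    int
    main(void)
    {
        struct sched_param sp = { .sched_priority = 50 };  /* arbitrary example value */

        /* pid 0 means "the calling thread". */
        if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
            perror("sched_setscheduler");   /* commonly EPERM for unprivileged callers */
            return 1;
        }

        /* From here on, the thread runs under SCHED_FIFO at priority 50. */
        return 0;
    }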
155
156 SCHED_RR: Round-robin scheduling
157 SCHED_RR is a simple enhancement of SCHED_FIFO. Everything described
158 above for SCHED_FIFO also applies to SCHED_RR, except that each thread
159 is allowed to run only for a maximum time quantum. If a SCHED_RR
160 thread has been running for a time period equal to or longer than the
161 time quantum, it will be put at the end of the list for its priority.
162 A SCHED_RR thread that has been preempted by a higher priority thread
163 and subsequently resumes execution as a running thread will complete
164 the unexpired portion of its round-robin time quantum. The length of
165 the time quantum can be retrieved using sched_rr_get_interval(2).
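
For example, the quantum for the calling thread can be queried as in
the following sketch:

    #include <sched.h>
    #include <stdio.h>
    #include <time.h>

    int
    main(void)
    {
        struct timespec quantum;

        /* pid 0 refers to the calling thread. */
        if (sched_rr_get_interval(0, &quantum) == -1) {
            perror("sched_rr_get_interval");
            return 1;
        }

        printf("SCHED_RR quantum: %ld.%09ld seconds\n",
               (long) quantum.tv_sec, (long) quantum.tv_nsec);
        return 0;
    }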
166
167 SCHED_DEADLINE: Sporadic task model deadline scheduling
168 Since Linux 3.14, Linux provides a deadline scheduling policy
169 (SCHED_DEADLINE). This policy is currently implemented using GEDF
170 (Global Earliest Deadline First) in conjunction with CBS (Constant
171 Bandwidth Server). To set and fetch this policy and associated at‐
172 tributes, one must use the Linux-specific sched_setattr(2) and
173 sched_getattr(2) system calls.
174
175 A sporadic task is one that has a sequence of jobs, where each job is
176 activated at most once per period. Each job also has a relative dead‐
177 line, before which it should finish execution, and a computation time,
178 which is the CPU time necessary for executing the job. The moment when
179 a task wakes up because a new job has to be executed is called the ar‐
180 rival time (also referred to as the request time or release time). The
181 start time is the time at which a task starts its execution. The abso‐
182 lute deadline is thus obtained by adding the relative deadline to the
183 arrival time.
184
185 The following diagram clarifies these terms:
186
187 arrival/wakeup absolute deadline
188 | start time |
189 | | |
190 v v v
191 -----x--------xooooooooooooooooo--------x--------x---
192 |<- comp. time ->|
193 |<------- relative deadline ------>|
194 |<-------------- period ------------------->|
195
196 When setting a SCHED_DEADLINE policy for a thread using sched_se‐
197 tattr(2), one can specify three parameters: Runtime, Deadline, and Pe‐
198 riod. These parameters do not necessarily correspond to the aforemen‐
199 tioned terms: usual practice is to set Runtime to something bigger than
200 the average computation time (or worst-case execution time for hard
201 real-time tasks), Deadline to the relative deadline, and Period to the
202 period of the task. Thus, for SCHED_DEADLINE scheduling, we have:
203
204 arrival/wakeup absolute deadline
205 | start time |
206 | | |
207 v v v
208 -----x--------xooooooooooooooooo--------x--------x---
209 |<-- Runtime ------->|
210 |<----------- Deadline ----------->|
211 |<-------------- Period ------------------->|
212
213 The three deadline-scheduling parameters correspond to the sched_run‐
214 time, sched_deadline, and sched_period fields of the sched_attr struc‐
215 ture; see sched_setattr(2). These fields express values in nanosec‐
216 onds. If sched_period is specified as 0, then it is made the same as
217 sched_deadline.
218
219 The kernel requires that:
220
221 sched_runtime <= sched_deadline <= sched_period
222
223 In addition, under the current implementation, all of the parameter
224 values must be at least 1024 (i.e., just over one microsecond, which is
225 the resolution of the implementation), and less than 2^63. If any of
226 these checks fails, sched_setattr(2) fails with the error EINVAL.
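
Because glibc did not historically provide a wrapper for
sched_setattr(2), programs commonly invoke the raw system call and
declare the structure themselves, using the layout documented in
sched_setattr(2). The following sketch uses illustrative values only
(10 ms runtime, 30 ms deadline, 100 ms period, which satisfy the
constraint above); it normally requires CAP_SYS_NICE and may fail with
EBUSY if the admission test described below does not pass:

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    #ifndef SCHED_DEADLINE
    #define SCHED_DEADLINE 6
    #endif

    /* Layout as documented in sched_setattr(2). */
    struct sched_attr {
        uint32_t size;
        uint32_t sched_policy;
        uint64_t sched_flags;
        int32_t  sched_nice;
        uint32_t sched_priority;
        uint64_t sched_runtime;     /* nanoseconds */
        uint64_t sched_deadline;    /* nanoseconds */
        uint64_t sched_period;      /* nanoseconds */
    };

    int
    main(void)
    {
        struct sched_attr attr = {
            .size           = sizeof(attr),
            .sched_policy   = SCHED_DEADLINE,
            .sched_runtime  =  10 * 1000 * 1000,   /*  10 ms */
            .sched_deadline =  30 * 1000 * 1000,   /*  30 ms */
            .sched_period   = 100 * 1000 * 1000,   /* 100 ms */
        };

        /* pid 0 means "the calling thread"; the last argument is flags. */
        if (syscall(SYS_sched_setattr, 0, &attr, 0) == -1) {
            perror("sched_setattr");    /* e.g., EPERM, EINVAL, or EBUSY */
            return 1;
        }
        return 0;
    }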
227
228 The CBS guarantees non-interference between tasks, by throttling
229 threads that attempt to over-run their specified Runtime.
230
231 To ensure deadline scheduling guarantees, the kernel must prevent situ‐
232 ations where the set of SCHED_DEADLINE threads is not feasible (schedu‐
lable) within the given constraints. The kernel thus performs an ad‐
mission test when setting or changing SCHED_DEADLINE policy and at‐
235 tributes. This admission test calculates whether the change is feasi‐
236 ble; if it is not, sched_setattr(2) fails with the error EBUSY.
237
238 For example, it is required (but not necessarily sufficient) for the
239 total utilization to be less than or equal to the total number of CPUs
240 available, where, since each thread can maximally run for Runtime per
241 Period, that thread's utilization is its Runtime divided by its Period.
242
243 In order to fulfill the guarantees that are made when a thread is ad‐
244 mitted to the SCHED_DEADLINE policy, SCHED_DEADLINE threads are the
245 highest priority (user controllable) threads in the system; if any
246 SCHED_DEADLINE thread is runnable, it will preempt any thread scheduled
247 under one of the other policies.
248
249 A call to fork(2) by a thread scheduled under the SCHED_DEADLINE policy
250 fails with the error EAGAIN, unless the thread has its reset-on-fork
251 flag set (see below).
252
253 A SCHED_DEADLINE thread that calls sched_yield(2) will yield the cur‐
254 rent job and wait for a new period to begin.
255
256 SCHED_OTHER: Default Linux time-sharing scheduling
257 SCHED_OTHER can be used at only static priority 0 (i.e., threads under
258 real-time policies always have priority over SCHED_OTHER processes).
259 SCHED_OTHER is the standard Linux time-sharing scheduler that is in‐
260 tended for all threads that do not require the special real-time mecha‐
261 nisms.
262
263 The thread to run is chosen from the static priority 0 list based on a
264 dynamic priority that is determined only inside this list. The dynamic
265 priority is based on the nice value (see below) and is increased for
266 each time quantum the thread is ready to run, but denied to run by the
267 scheduler. This ensures fair progress among all SCHED_OTHER threads.
268
269 In the Linux kernel source code, the SCHED_OTHER policy is actually
270 named SCHED_NORMAL.
271
272 The nice value
273 The nice value is an attribute that can be used to influence the CPU
274 scheduler to favor or disfavor a process in scheduling decisions. It
275 affects the scheduling of SCHED_OTHER and SCHED_BATCH (see below) pro‐
276 cesses. The nice value can be modified using nice(2), setpriority(2),
277 or sched_setattr(2).
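
For example, a process might make itself "nicer" by five units as in
the following sketch (the increment is arbitrary):

    #include <errno.h>
    #include <stdio.h>
    #include <sys/resource.h>

    int
    main(void)
    {
        int cur;

        /* getpriority() may legitimately return -1, so clear errno first. */
        errno = 0;
        cur = getpriority(PRIO_PROCESS, 0);       /* 0 means "this process" */
        if (cur == -1 && errno != 0) {
            perror("getpriority");
            return 1;
        }

        if (setpriority(PRIO_PROCESS, 0, cur + 5) == -1) {   /* be nicer by 5 */
            perror("setpriority");
            return 1;
        }
        return 0;
    }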
278
279 According to POSIX.1, the nice value is a per-process attribute; that
280 is, the threads in a process should share a nice value. However, on
281 Linux, the nice value is a per-thread attribute: different threads in
282 the same process may have different nice values.
283
284 The range of the nice value varies across UNIX systems. On modern
285 Linux, the range is -20 (high priority) to +19 (low priority). On some
286 other systems, the range is -20..20. Very early Linux kernels (before
287 Linux 2.0) had the range -infinity..15.
288
289 The degree to which the nice value affects the relative scheduling of
290 SCHED_OTHER processes likewise varies across UNIX systems and across
291 Linux kernel versions.
292
293 With the advent of the CFS scheduler in Linux 2.6.23, Linux adopted an
294 algorithm that causes relative differences in nice values to have a
295 much stronger effect. In the current implementation, each unit of dif‐
296 ference in the nice values of two processes results in a factor of 1.25
297 in the degree to which the scheduler favors the higher priority
298 process. This causes very low nice values (+19) to truly provide lit‐
299 tle CPU to a process whenever there is any other higher priority load
300 on the system, and makes high nice values (-20) deliver most of the CPU
301 to applications that require it (e.g., some audio applications).
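
As a rough worked example (the exact weights used by the kernel are
tabulated, so this is only an approximation): two otherwise identical
CPU-bound processes whose nice values differ by 5 will receive CPU time
in a ratio of about 1.25^5, that is, roughly 3 to 1.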
302
303 On Linux, the RLIMIT_NICE resource limit can be used to define a limit
304 to which an unprivileged process's nice value can be raised; see setr‐
305 limit(2) for details.
306
307 For further details on the nice value, see the subsections on the auto‐
308 group feature and group scheduling, below.
309
310 SCHED_BATCH: Scheduling batch processes
311 (Since Linux 2.6.16.) SCHED_BATCH can be used only at static priority
312 0. This policy is similar to SCHED_OTHER in that it schedules the
313 thread according to its dynamic priority (based on the nice value).
314 The difference is that this policy will cause the scheduler to always
315 assume that the thread is CPU-intensive. Consequently, the scheduler
316 will apply a small scheduling penalty with respect to wakeup behavior,
317 so that this thread is mildly disfavored in scheduling decisions.
318
319 This policy is useful for workloads that are noninteractive, but do not
320 want to lower their nice value, and for workloads that want a determin‐
321 istic scheduling policy without interactivity causing extra preemptions
322 (between the workload's tasks).
323
324 SCHED_IDLE: Scheduling very low priority jobs
325 (Since Linux 2.6.23.) SCHED_IDLE can be used only at static priority
326 0; the process nice value has no influence for this policy.
327
328 This policy is intended for running jobs at extremely low priority
329 (lower even than a +19 nice value with the SCHED_OTHER or SCHED_BATCH
330 policies).
331
332 Resetting scheduling policy for child processes
333 Each thread has a reset-on-fork scheduling flag. When this flag is
334 set, children created by fork(2) do not inherit privileged scheduling
335 policies. The reset-on-fork flag can be set by either:
336
337 • ORing the SCHED_RESET_ON_FORK flag into the policy argument when
338 calling sched_setscheduler(2) (since Linux 2.6.32); or
339
340 • specifying the SCHED_FLAG_RESET_ON_FORK flag in attr.sched_flags
341 when calling sched_setattr(2).
342
343 Note that the constants used with these two APIs have different names.
344 The state of the reset-on-fork flag can analogously be retrieved using
345 sched_getscheduler(2) and sched_getattr(2).
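
For example, a thread might request SCHED_FIFO together with
reset-on-fork roughly as in the following sketch (the priority value is
arbitrary, and the usual privilege requirements described below still
apply):

    #define _GNU_SOURCE          /* for SCHED_RESET_ON_FORK from <sched.h> */
    #include <sched.h>
    #include <stdio.h>

    int
    main(void)
    {
        struct sched_param sp = { .sched_priority = 10 };  /* arbitrary example value */

        if (sched_setscheduler(0, SCHED_FIFO | SCHED_RESET_ON_FORK, &sp) == -1) {
            perror("sched_setscheduler");
            return 1;
        }

        /* Children created by fork(2) will now start under SCHED_OTHER. */
        return 0;
    }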
346
347 The reset-on-fork feature is intended for media-playback applications,
and can be used to prevent applications from evading the RLIMIT_RTTIME re‐
349 source limit (see getrlimit(2)) by creating multiple child processes.
350
351 More precisely, if the reset-on-fork flag is set, the following rules
352 apply for subsequently created children:
353
354 • If the calling thread has a scheduling policy of SCHED_FIFO or
355 SCHED_RR, the policy is reset to SCHED_OTHER in child processes.
356
357 • If the calling process has a negative nice value, the nice value is
358 reset to zero in child processes.
359
360 After the reset-on-fork flag has been enabled, it can be reset only if
361 the thread has the CAP_SYS_NICE capability. This flag is disabled in
362 child processes created by fork(2).
363
364 Privileges and resource limits
Before Linux 2.6.12, only privileged (CAP_SYS_NICE) threads could set a
nonzero static priority (i.e., set a real-time scheduling policy). The
only change that an unprivileged thread could make was to set the
SCHED_OTHER policy, and this could be done only if the effective user ID
369 of the caller matches the real or effective user ID of the target
370 thread (i.e., the thread specified by pid) whose policy is being
371 changed.
372
373 A thread must be privileged (CAP_SYS_NICE) in order to set or modify a
374 SCHED_DEADLINE policy.
375
376 Since Linux 2.6.12, the RLIMIT_RTPRIO resource limit defines a ceiling
377 on an unprivileged thread's static priority for the SCHED_RR and
378 SCHED_FIFO policies. The rules for changing scheduling policy and pri‐
379 ority are as follows:
380
381 • If an unprivileged thread has a nonzero RLIMIT_RTPRIO soft limit,
382 then it can change its scheduling policy and priority, subject to
383 the restriction that the priority cannot be set to a value higher
384 than the maximum of its current priority and its RLIMIT_RTPRIO soft
385 limit.
386
387 • If the RLIMIT_RTPRIO soft limit is 0, then the only permitted
388 changes are to lower the priority, or to switch to a non-real-time
389 policy.
390
391 • Subject to the same rules, another unprivileged thread can also make
392 these changes, as long as the effective user ID of the thread making
393 the change matches the real or effective user ID of the target
394 thread.
395
396 • Special rules apply for the SCHED_IDLE policy. Before Linux 2.6.39,
397 an unprivileged thread operating under this policy cannot change its
398 policy, regardless of the value of its RLIMIT_RTPRIO resource limit.
399 Since Linux 2.6.39, an unprivileged thread can switch to either the
400 SCHED_BATCH or the SCHED_OTHER policy so long as its nice value
401 falls within the range permitted by its RLIMIT_NICE resource limit
402 (see getrlimit(2)).
403
404 Privileged (CAP_SYS_NICE) threads ignore the RLIMIT_RTPRIO limit; as
405 with older kernels, they can make arbitrary changes to scheduling pol‐
406 icy and priority. See getrlimit(2) for further information on
407 RLIMIT_RTPRIO.
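
An unprivileged program can inspect its own ceiling with getrlimit(2),
for example as in the following sketch:

    #include <stdio.h>
    #include <sys/resource.h>

    int
    main(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_RTPRIO, &rl) == -1) {
            perror("getrlimit");
            return 1;
        }

        /* A soft limit of 0 means: no real-time priority may be requested. */
        printf("RLIMIT_RTPRIO soft limit: %llu, hard limit: %llu\n",
               (unsigned long long) rl.rlim_cur,
               (unsigned long long) rl.rlim_max);
        return 0;
    }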
408
409 Limiting the CPU usage of real-time and deadline processes
410 A nonblocking infinite loop in a thread scheduled under the SCHED_FIFO,
411 SCHED_RR, or SCHED_DEADLINE policy can potentially block all other
412 threads from accessing the CPU forever. Before Linux 2.6.25, the only
413 way of preventing a runaway real-time process from freezing the system
414 was to run (at the console) a shell scheduled under a higher static
415 priority than the tested application. This allows an emergency kill of
416 tested real-time applications that do not block or terminate as ex‐
417 pected.
418
419 Since Linux 2.6.25, there are other techniques for dealing with runaway
420 real-time and deadline processes. One of these is to use the
421 RLIMIT_RTTIME resource limit to set a ceiling on the CPU time that a
422 real-time process may consume. See getrlimit(2) for details.
423
424 Since Linux 2.6.25, Linux also provides two /proc files that can be
425 used to reserve a certain amount of CPU time to be used by non-real-
426 time processes. Reserving CPU time in this fashion allows some CPU
427 time to be allocated to (say) a root shell that can be used to kill a
428 runaway process. Both of these files specify time values in microsec‐
429 onds:
430
431 /proc/sys/kernel/sched_rt_period_us
432 This file specifies a scheduling period that is equivalent to
433 100% CPU bandwidth. The value in this file can range from 1 to
434 INT_MAX, giving an operating range of 1 microsecond to around 35
435 minutes. The default value in this file is 1,000,000 (1 sec‐
436 ond).
437
438 /proc/sys/kernel/sched_rt_runtime_us
439 The value in this file specifies how much of the "period" time
440 can be used by all real-time and deadline scheduled processes on
441 the system. The value in this file can range from -1 to
442 INT_MAX-1. Specifying -1 makes the run time the same as the pe‐
443 riod; that is, no CPU time is set aside for non-real-time pro‐
444 cesses (which was the behavior before Linux 2.6.25). The de‐
445 fault value in this file is 950,000 (0.95 seconds), meaning that
446 5% of the CPU time is reserved for processes that don't run un‐
447 der a real-time or deadline scheduling policy.
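
Both files contain plain decimal text and can simply be read as such
(and, with suitable privileges, written); the following sketch prints
the two values:

    #include <stdio.h>

    int
    main(void)
    {
        long period_us, runtime_us;
        FILE *fp;

        fp = fopen("/proc/sys/kernel/sched_rt_period_us", "r");
        if (fp == NULL || fscanf(fp, "%ld", &period_us) != 1) {
            fprintf(stderr, "failed to read sched_rt_period_us\n");
            return 1;
        }
        fclose(fp);

        fp = fopen("/proc/sys/kernel/sched_rt_runtime_us", "r");
        if (fp == NULL || fscanf(fp, "%ld", &runtime_us) != 1) {
            fprintf(stderr, "failed to read sched_rt_runtime_us\n");
            return 1;
        }
        fclose(fp);

        printf("Real-time tasks may use %ld of every %ld microseconds\n",
               runtime_us, period_us);
        return 0;
    }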
448
449 Response time
450 A blocked high priority thread waiting for I/O has a certain response
451 time before it is scheduled again. The device driver writer can
452 greatly reduce this response time by using a "slow interrupt" interrupt
453 handler.
454
455 Miscellaneous
456 Child processes inherit the scheduling policy and parameters across a
457 fork(2). The scheduling policy and parameters are preserved across ex‐
458 ecve(2).
459
460 Memory locking is usually needed for real-time processes to avoid pag‐
461 ing delays; this can be done with mlock(2) or mlockall(2).
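
For example, a real-time process might lock its address space early
during start-up, as in the following sketch:

    #include <stdio.h>
    #include <sys/mman.h>

    int
    main(void)
    {
        /* Lock all current and future mappings into RAM so that
           time-critical code paths do not incur page faults. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
            perror("mlockall");
            return 1;
        }

        /* ... time-critical work ... */
        return 0;
    }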
462
463 The autogroup feature
464 Since Linux 2.6.38, the kernel provides a feature known as autogrouping
465 to improve interactive desktop performance in the face of multiprocess,
466 CPU-intensive workloads such as building the Linux kernel with large
467 numbers of parallel build processes (i.e., the make(1) -j flag).
468
469 This feature operates in conjunction with the CFS scheduler and re‐
470 quires a kernel that is configured with CONFIG_SCHED_AUTOGROUP. On a
471 running system, this feature is enabled or disabled via the file
472 /proc/sys/kernel/sched_autogroup_enabled; a value of 0 disables the
473 feature, while a value of 1 enables it. The default value in this file
474 is 1, unless the kernel was booted with the noautogroup parameter.
475
476 A new autogroup is created when a new session is created via setsid(2);
477 this happens, for example, when a new terminal window is started. A
478 new process created by fork(2) inherits its parent's autogroup member‐
479 ship. Thus, all of the processes in a session are members of the same
480 autogroup. An autogroup is automatically destroyed when the last
481 process in the group terminates.
482
483 When autogrouping is enabled, all of the members of an autogroup are
484 placed in the same kernel scheduler "task group". The CFS scheduler
485 employs an algorithm that equalizes the distribution of CPU cycles
486 across task groups. The benefits of this for interactive desktop per‐
487 formance can be described via the following example.
488
489 Suppose that there are two autogroups competing for the same CPU (i.e.,
490 presume either a single CPU system or the use of taskset(1) to confine
491 all the processes to the same CPU on an SMP system). The first group
492 contains ten CPU-bound processes from a kernel build started with
493 make -j10. The other contains a single CPU-bound process: a video
494 player. The effect of autogrouping is that the two groups will each
495 receive half of the CPU cycles. That is, the video player will receive
496 50% of the CPU cycles, rather than just 9% of the cycles, which would
497 likely lead to degraded video playback. The situation on an SMP system
498 is more complex, but the general effect is the same: the scheduler dis‐
499 tributes CPU cycles across task groups such that an autogroup that con‐
500 tains a large number of CPU-bound processes does not end up hogging CPU
501 cycles at the expense of the other jobs on the system.
502
503 A process's autogroup (task group) membership can be viewed via the
504 file /proc/pid/autogroup:
505
506 $ cat /proc/1/autogroup
507 /autogroup-1 nice 0
508
509 This file can also be used to modify the CPU bandwidth allocated to an
510 autogroup. This is done by writing a number in the "nice" range to the
511 file to set the autogroup's nice value. The allowed range is from +19
512 (low priority) to -20 (high priority). (Writing values outside of this
513 range causes write(2) to fail with the error EINVAL.)
514
515 The autogroup nice setting has the same meaning as the process nice
516 value, but applies to distribution of CPU cycles to the autogroup as a
517 whole, based on the relative nice values of other autogroups. For a
518 process inside an autogroup, the CPU cycles that it receives will be a
519 product of the autogroup's nice value (compared to other autogroups)
520 and the process's nice value (compared to other processes in the same
autogroup).
522
523 The use of the cgroups(7) CPU controller to place processes in cgroups
524 other than the root CPU cgroup overrides the effect of autogrouping.
525
526 The autogroup feature groups only processes scheduled under non-real-
527 time policies (SCHED_OTHER, SCHED_BATCH, and SCHED_IDLE). It does not
528 group processes scheduled under real-time and deadline policies. Those
529 processes are scheduled according to the rules described earlier.
530
531 The nice value and group scheduling
532 When scheduling non-real-time processes (i.e., those scheduled under
533 the SCHED_OTHER, SCHED_BATCH, and SCHED_IDLE policies), the CFS sched‐
534 uler employs a technique known as "group scheduling", if the kernel was
535 configured with the CONFIG_FAIR_GROUP_SCHED option (which is typical).
536
537 Under group scheduling, threads are scheduled in "task groups". Task
538 groups have a hierarchical relationship, rooted under the initial task
539 group on the system, known as the "root task group". Task groups are
540 formed in the following circumstances:
541
542 • All of the threads in a CPU cgroup form a task group. The parent of
543 this task group is the task group of the corresponding parent
544 cgroup.
545
546 • If autogrouping is enabled, then all of the threads that are (im‐
547 plicitly) placed in an autogroup (i.e., the same session, as created
548 by setsid(2)) form a task group. Each new autogroup is thus a sepa‐
549 rate task group. The root task group is the parent of all such au‐
550 togroups.
551
552 • If autogrouping is enabled, then the root task group consists of all
553 processes in the root CPU cgroup that were not otherwise implicitly
554 placed into a new autogroup.
555
556 • If autogrouping is disabled, then the root task group consists of
557 all processes in the root CPU cgroup.
558
559 • If group scheduling was disabled (i.e., the kernel was configured
560 without CONFIG_FAIR_GROUP_SCHED), then all of the processes on the
561 system are notionally placed in a single task group.
562
563 Under group scheduling, a thread's nice value has an effect for sched‐
564 uling decisions only relative to other threads in the same task group.
565 This has some surprising consequences in terms of the traditional se‐
566 mantics of the nice value on UNIX systems. In particular, if auto‐
567 grouping is enabled (which is the default in various distributions),
568 then employing setpriority(2) or nice(1) on a process has an effect
569 only for scheduling relative to other processes executed in the same
570 session (typically: the same terminal window).
571
572 Conversely, for two processes that are (for example) the sole CPU-bound
573 processes in different sessions (e.g., different terminal windows, each
574 of whose jobs are tied to different autogroups), modifying the nice
575 value of the process in one of the sessions has no effect in terms of
576 the scheduler's decisions relative to the process in the other session.
577 A possibly useful workaround here is to use a command such as the fol‐
578 lowing to modify the autogroup nice value for all of the processes in a
579 terminal session:
580
581 $ echo 10 > /proc/self/autogroup
582
583 Real-time features in the mainline Linux kernel
584 Since Linux 2.6.18, Linux is gradually becoming equipped with real-time
585 capabilities, most of which are derived from the former realtime-pre‐
586 empt patch set. Until the patches have been completely merged into the
587 mainline kernel, they must be installed to achieve the best real-time
588 performance. These patches are named:
589
590 patch-kernelversion-rtpatchversion
591
592 and can be downloaded from ⟨http://www.kernel.org/pub/linux/kernel
593 /projects/rt/⟩.
594
595 Without the patches and prior to their full inclusion into the mainline
596 kernel, the kernel configuration offers only the three preemption
597 classes CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_VOLUNTARY, and CONFIG_PRE‐
598 EMPT_DESKTOP which respectively provide no, some, and considerable re‐
599 duction of the worst-case scheduling latency.
600
601 With the patches applied or after their full inclusion into the main‐
602 line kernel, the additional configuration item CONFIG_PREEMPT_RT be‐
603 comes available. If this is selected, Linux is transformed into a reg‐
604 ular real-time operating system. The FIFO and RR scheduling policies
605 are then used to run a thread with true real-time priority and a mini‐
606 mum worst-case scheduling latency.
607
NOTES
The cgroups(7) CPU controller can be used to limit the CPU consumption
610 of groups of processes.
611
Originally, Standard Linux was intended as a general-purpose operating
system able to handle background processes, interactive applications,
and less demanding real-time applications (applications that usually
need to meet timing deadlines). Although Linux 2.6 allowed for kernel
preemption and the newly introduced O(1) scheduler ensured that the
time needed to schedule is fixed and deterministic irrespective of the
number of active tasks, true real-time computing was not possible up to
Linux 2.6.17.
620
SEE ALSO
chcpu(1), chrt(1), lscpu(1), ps(1), taskset(1), top(1), getpriority(2),
623 mlock(2), mlockall(2), munlock(2), munlockall(2), nice(2),
624 sched_get_priority_max(2), sched_get_priority_min(2),
625 sched_getaffinity(2), sched_getparam(2), sched_getscheduler(2),
626 sched_rr_get_interval(2), sched_setaffinity(2), sched_setparam(2),
627 sched_setscheduler(2), sched_yield(2), setpriority(2),
628 pthread_getaffinity_np(3), pthread_getschedparam(3),
629 pthread_setaffinity_np(3), sched_getcpu(3), capabilities(7), cpuset(7)
630
631 Programming for the real world - POSIX.4 by Bill O. Gallmeister,
632 O'Reilly & Associates, Inc., ISBN 1-56592-074-0.
633
634 The Linux kernel source files Documentation/scheduler/sched-deadline
635 .txt, Documentation/scheduler/sched-rt-group.txt, Documentation/
636 scheduler/sched-design-CFS.txt, and Documentation/scheduler/
637 sched-nice-design.txt
638
639
640
Linux man-pages 6.04                 2023-02-10                       sched(7)