SCHED(7) Linux Programmer's Manual SCHED(7)

sched - overview of CPU scheduling

9 Since Linux 2.6.23, the default scheduler is CFS, the "Completely Fair
10 Scheduler". The CFS scheduler replaced the earlier "O(1)" scheduler.
11
12 API summary
13 Linux provides the following system calls for controlling the CPU
14 scheduling behavior, policy, and priority of processes (or, more pre‐
15 cisely, threads).
16
17 nice(2)
18 Set a new nice value for the calling thread, and return the new
19 nice value.
20
21 getpriority(2)
22 Return the nice value of a thread, a process group, or the set
23 of threads owned by a specified user.
24
25 setpriority(2)
26 Set the nice value of a thread, a process group, or the set of
27 threads owned by a specified user.
28
29 sched_setscheduler(2)
30 Set the scheduling policy and parameters of a specified thread.
31
32 sched_getscheduler(2)
33 Return the scheduling policy of a specified thread.
34
35 sched_setparam(2)
36 Set the scheduling parameters of a specified thread.
37
38 sched_getparam(2)
39 Fetch the scheduling parameters of a specified thread.
40
41 sched_get_priority_max(2)
42 Return the maximum priority available in a specified scheduling
43 policy.
44
45 sched_get_priority_min(2)
46 Return the minimum priority available in a specified scheduling
47 policy.
48
49 sched_rr_get_interval(2)
50 Fetch the quantum used for threads that are scheduled under the
51 "round-robin" scheduling policy.
52
53 sched_yield(2)
Cause the caller to relinquish the CPU, so that some other
thread can be executed.
56
57 sched_setaffinity(2)
58 (Linux-specific) Set the CPU affinity of a specified thread.
59
60 sched_getaffinity(2)
61 (Linux-specific) Get the CPU affinity of a specified thread.
62
63 sched_setattr(2)
64 Set the scheduling policy and parameters of a specified thread.
65 This (Linux-specific) system call provides a superset of the
66 functionality of sched_setscheduler(2) and sched_setparam(2).
67
68 sched_getattr(2)
69 Fetch the scheduling policy and parameters of a specified
70 thread. This (Linux-specific) system call provides a superset
71 of the functionality of sched_getscheduler(2) and sched_get‐
72 param(2).
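
As an illustration of the query calls listed above, the following
minimal sketch uses the wrappers that Python's os module provides for
sched_getscheduler(2), sched_getparam(2), and sched_getaffinity(2) on
Linux. The helper name describe_thread is hypothetical and is not part
of any API described on this page.

```python
import os

def describe_thread(pid=0):
    """Return (policy, static priority, allowed CPUs); pid 0 means the caller."""
    raw = os.sched_getscheduler(pid)
    # The reset-on-fork flag may be ORed into the returned policy value,
    # so mask it off before comparing against the SCHED_* constants.
    flag = getattr(os, "SCHED_RESET_ON_FORK", 0)
    policy = raw & ~flag
    priority = os.sched_getparam(pid).sched_priority
    cpus = os.sched_getaffinity(pid)
    return policy, priority, cpus
```

For a thread running under one of the normal policies, the returned
static priority is expected to be 0.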
73
74 Scheduling policies
75 The scheduler is the kernel component that decides which runnable
76 thread will be executed by the CPU next. Each thread has an associated
77 scheduling policy and a static scheduling priority, sched_priority.
78 The scheduler makes its decisions based on knowledge of the scheduling
79 policy and static priority of all threads on the system.
80
81 For threads scheduled under one of the normal scheduling policies
82 (SCHED_OTHER, SCHED_IDLE, SCHED_BATCH), sched_priority is not used in
83 scheduling decisions (it must be specified as 0).
84
Threads scheduled under one of the real-time policies (SCHED_FIFO,
SCHED_RR) have a sched_priority value in the range 1 (low) to 99
(high). (As the numbers imply, real-time threads always have higher
priority than normal threads.) Note well: POSIX.1 requires an
implementation to support only a minimum of 32 distinct priority levels
for the real-time policies, and some systems supply just this minimum.
Portable programs should use sched_get_priority_min(2) and
sched_get_priority_max(2) to find the range of priorities supported for
a particular policy.
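
The portable approach can be sketched as follows, assuming the Python
os module wrappers for sched_get_priority_min(2) and
sched_get_priority_max(2). The helper names priority_range and
clamp_priority are hypothetical.

```python
import os

def priority_range(policy):
    """Ask the system for the priority range of a policy instead of assuming 1..99."""
    return os.sched_get_priority_min(policy), os.sched_get_priority_max(policy)

def clamp_priority(policy, wanted):
    """Force a requested priority into the range the system supports."""
    lo, hi = priority_range(policy)
    return max(lo, min(hi, wanted))
```

A program would call clamp_priority(os.SCHED_FIFO, wanted) before
passing the result to one of the calls that set scheduling parameters.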
94
95 Conceptually, the scheduler maintains a list of runnable threads for
96 each possible sched_priority value. In order to determine which thread
97 runs next, the scheduler looks for the nonempty list with the highest
98 static priority and selects the thread at the head of this list.
99
100 A thread's scheduling policy determines where it will be inserted into
101 the list of threads with equal static priority and how it will move
102 inside this list.
103
104 All scheduling is preemptive: if a thread with a higher static priority
105 becomes ready to run, the currently running thread will be preempted
106 and returned to the wait list for its static priority level. The
107 scheduling policy determines the ordering only within the list of
108 runnable threads with equal static priority.
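
The conceptual selection rule can be sketched as a toy model. This is
not the kernel's implementation; the function pick_next and the
dictionary layout are assumptions made only for illustration.

```python
from collections import deque

def pick_next(run_lists):
    """run_lists maps a static priority to a deque of runnable thread names.

    Returns the thread at the head of the nonempty list with the highest
    static priority, or None if nothing is runnable.
    """
    nonempty = [prio for prio, queue in run_lists.items() if queue]
    if not nonempty:
        return None
    return run_lists[max(nonempty)][0]
```

In this model, appending a thread to a higher-priority list changes
the result of the next call, which corresponds to preemption.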
109
110 SCHED_FIFO: First in-first out scheduling
111 SCHED_FIFO can be used only with static priorities higher than 0, which
means that when a SCHED_FIFO thread becomes runnable, it will always
113 immediately preempt any currently running SCHED_OTHER, SCHED_BATCH, or
114 SCHED_IDLE thread. SCHED_FIFO is a simple scheduling algorithm without
115 time slicing. For threads scheduled under the SCHED_FIFO policy, the
116 following rules apply:
117
118 1) A running SCHED_FIFO thread that has been preempted by another
119 thread of higher priority will stay at the head of the list for its
120 priority and will resume execution as soon as all threads of higher
121 priority are blocked again.
122
123 2) When a blocked SCHED_FIFO thread becomes runnable, it will be
124 inserted at the end of the list for its priority.
125
3) If a call to sched_setscheduler(2), sched_setparam(2),
sched_setattr(2), pthread_setschedparam(3), or
pthread_setschedprio(3) changes the priority of the running or
runnable SCHED_FIFO thread identified by pid, the effect on the
thread's position in the list depends on the direction of the change
to the thread's priority:
131
132 · If the thread's priority is raised, it is placed at the end of
133 the list for its new priority. As a consequence, it may preempt
134 a currently running thread with the same priority.
135
136 · If the thread's priority is unchanged, its position in the run
137 list is unchanged.
138
139 · If the thread's priority is lowered, it is placed at the front of
140 the list for its new priority.
141
142 According to POSIX.1-2008, changes to a thread's priority (or pol‐
143 icy) using any mechanism other than pthread_setschedprio(3) should
144 result in the thread being placed at the end of the list for its
145 priority.
146
4) A thread calling sched_yield(2) will be put at the end of the list
for its priority.
148
149 No other events will move a thread scheduled under the SCHED_FIFO pol‐
150 icy in the wait list of runnable threads with equal static priority.
151
152 A SCHED_FIFO thread runs until either it is blocked by an I/O request,
153 it is preempted by a higher priority thread, or it calls
154 sched_yield(2).
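
The list movements in rules 2 to 4 can be modeled with a small toy
class. This is a sketch for reasoning about the rules, not kernel
code; the class FifoLists and its method names are hypothetical.

```python
from collections import deque

class FifoLists:
    """Toy model of the per-priority lists of runnable SCHED_FIFO threads."""

    def __init__(self):
        self.lists = {}  # static priority -> deque of thread names

    def _list(self, prio):
        return self.lists.setdefault(prio, deque())

    def wake(self, name, prio):
        # Rule 2: a thread that becomes runnable goes to the end of its list.
        self._list(prio).append(name)

    def set_priority(self, name, old, new):
        # Rule 3: the new position depends on the direction of the change.
        if new == old:
            return                            # unchanged: position is kept
        self._list(old).remove(name)
        if new > old:
            self._list(new).append(name)      # raised: end of the new list
        else:
            self._list(new).appendleft(name)  # lowered: front of the new list

    def yield_(self, name, prio):
        # Rule 4: a yielding thread moves to the end of its list.
        queue = self._list(prio)
        queue.remove(name)
        queue.append(name)
```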
155
156 SCHED_RR: Round-robin scheduling
157 SCHED_RR is a simple enhancement of SCHED_FIFO. Everything described
158 above for SCHED_FIFO also applies to SCHED_RR, except that each thread
159 is allowed to run only for a maximum time quantum. If a SCHED_RR
160 thread has been running for a time period equal to or longer than the
161 time quantum, it will be put at the end of the list for its priority.
162 A SCHED_RR thread that has been preempted by a higher priority thread
163 and subsequently resumes execution as a running thread will complete
164 the unexpired portion of its round-robin time quantum. The length of
165 the time quantum can be retrieved using sched_rr_get_interval(2).
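
The effect of the quantum on threads of equal priority can be sketched
with a toy simulation. It ignores blocking and preemption by
higher-priority threads, and the function round_robin is a
hypothetical helper, not a system interface. (On Linux, Python exposes
the real quantum query as os.sched_rr_get_interval().)

```python
from collections import deque

def round_robin(work, quantum):
    """work maps a thread name to the CPU time it still needs.

    Returns the ordered list of (name, time run) slices for threads of
    equal priority, each limited to one quantum per turn.
    """
    queue = deque(work.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        ran = min(quantum, remaining)
        timeline.append((name, ran))
        if remaining > ran:
            # Quantum expired: go to the end of the list for this priority.
            queue.append((name, remaining - ran))
    return timeline
```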
166
167 SCHED_DEADLINE: Sporadic task model deadline scheduling
168 Since version 3.14, Linux provides a deadline scheduling policy
169 (SCHED_DEADLINE). This policy is currently implemented using GEDF
170 (Global Earliest Deadline First) in conjunction with CBS (Constant
171 Bandwidth Server). To set and fetch this policy and associated
172 attributes, one must use the Linux-specific sched_setattr(2) and
173 sched_getattr(2) system calls.
174
175 A sporadic task is one that has a sequence of jobs, where each job is
176 activated at most once per period. Each job also has a relative dead‐
177 line, before which it should finish execution, and a computation time,
178 which is the CPU time necessary for executing the job. The moment when
179 a task wakes up because a new job has to be executed is called the
180 arrival time (also referred to as the request time or release time).
181 The start time is the time at which a task starts its execution. The
182 absolute deadline is thus obtained by adding the relative deadline to
183 the arrival time.
184
185 The following diagram clarifies these terms:
186
187 arrival/wakeup absolute deadline
188 | start time |
189 | | |
190 v v v
191 -----x--------xooooooooooooooooo--------x--------x---
192 |<- comp. time ->|
193 |<------- relative deadline ------>|
194 |<-------------- period ------------------->|
195
196 When setting a SCHED_DEADLINE policy for a thread using
197 sched_setattr(2), one can specify three parameters: Runtime, Deadline,
198 and Period. These parameters do not necessarily correspond to the
199 aforementioned terms: usual practice is to set Runtime to something
200 bigger than the average computation time (or worst-case execution time
201 for hard real-time tasks), Deadline to the relative deadline, and
202 Period to the period of the task. Thus, for SCHED_DEADLINE scheduling,
203 we have:
204
205 arrival/wakeup absolute deadline
206 | start time |
207 | | |
208 v v v
209 -----x--------xooooooooooooooooo--------x--------x---
210 |<-- Runtime ------->|
211 |<----------- Deadline ----------->|
212 |<-------------- Period ------------------->|
213
214 The three deadline-scheduling parameters correspond to the sched_run‐
215 time, sched_deadline, and sched_period fields of the sched_attr struc‐
216 ture; see sched_setattr(2). These fields express values in nanosec‐
217 onds. If sched_period is specified as 0, then it is made the same as
218 sched_deadline.
219
220 The kernel requires that:
221
222 sched_runtime <= sched_deadline <= sched_period
223
224 In addition, under the current implementation, all of the parameter
225 values must be at least 1024 (i.e., just over one microsecond, which is
226 the resolution of the implementation), and less than 2^63. If any of
227 these checks fails, sched_setattr(2) fails with the error EINVAL.
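
The checks described above can be sketched as a pure function that
mirrors the stated rules. It does not call the kernel; the function
name check_deadline_params and the constant names are hypothetical,
and the limits are those given for the implementation described on
this page.

```python
DL_MIN = 1024      # smallest accepted value, in nanoseconds
DL_MAX = 2 ** 63   # values must be strictly less than this

def check_deadline_params(runtime, deadline, period=0):
    """Return True if the values would pass the checks described above."""
    if period == 0:
        period = deadline          # a period of 0 means "same as deadline"
    values = (runtime, deadline, period)
    in_range = all(DL_MIN <= v < DL_MAX for v in values)
    ordered = runtime <= deadline <= period
    return in_range and ordered
```

For example, a runtime of 10 ms, a deadline of 30 ms, and a period of
100 ms (all expressed in nanoseconds) pass these checks, whereas
swapping runtime and deadline does not.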
228
229 The CBS guarantees non-interference between tasks, by throttling
230 threads that attempt to over-run their specified Runtime.
231
To ensure deadline scheduling guarantees, the kernel must prevent
situations where the set of SCHED_DEADLINE threads is not feasible
(schedulable) within the given constraints. The kernel thus performs an
admission test when setting or changing SCHED_DEADLINE policy and
attributes. This admission test calculates whether the change is
feasible; if it is not, sched_setattr(2) fails with the error EBUSY.
238
239 For example, it is required (but not necessarily sufficient) for the
240 total utilization to be less than or equal to the total number of CPUs
241 available, where, since each thread can maximally run for Runtime per
242 Period, that thread's utilization is its Runtime divided by its Period.
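
The utilization condition can be written out as a short calculation.
This sketch checks only the necessary condition stated above, not the
kernel's full admission test; the function names are hypothetical.

```python
from fractions import Fraction

def total_utilization(tasks):
    """tasks is a list of (runtime, period) pairs in the same time unit."""
    return sum(Fraction(runtime, period) for runtime, period in tasks)

def may_be_admitted(tasks, ncpus):
    """Necessary (but not sufficient) condition: utilization <= CPU count."""
    return total_utilization(tasks) <= ncpus
```

Three threads with Runtime/Period of 10/100, 30/100, and 50/100 have a
total utilization of 9/10 and satisfy the condition on one CPU; adding
a fourth thread with 20/100 raises the total to 11/10, which can be
satisfied only with at least two CPUs.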
243
244 In order to fulfill the guarantees that are made when a thread is
245 admitted to the SCHED_DEADLINE policy, SCHED_DEADLINE threads are the
246 highest priority (user controllable) threads in the system; if any
247 SCHED_DEADLINE thread is runnable, it will preempt any thread scheduled
248 under one of the other policies.
249
250 A call to fork(2) by a thread scheduled under the SCHED_DEADLINE policy
251 fails with the error EAGAIN, unless the thread has its reset-on-fork
252 flag set (see below).
253
254 A SCHED_DEADLINE thread that calls sched_yield(2) will yield the cur‐
255 rent job and wait for a new period to begin.
256
257 SCHED_OTHER: Default Linux time-sharing scheduling
SCHED_OTHER can be used only at static priority 0 (i.e., threads under
real-time policies always have priority over SCHED_OTHER threads).
260 SCHED_OTHER is the standard Linux time-sharing scheduler that is
261 intended for all threads that do not require the special real-time
262 mechanisms.
263
The thread to run is chosen from the static priority 0 list based on a
dynamic priority that is determined only inside this list. The dynamic
priority is based on the nice value (see below) and is increased for
each time quantum in which the thread is ready to run but is denied the
CPU by the scheduler. This ensures fair progress among all SCHED_OTHER
threads.
269
270 The nice value
271 The nice value is an attribute that can be used to influence the CPU
272 scheduler to favor or disfavor a process in scheduling decisions. It
273 affects the scheduling of SCHED_OTHER and SCHED_BATCH (see below) pro‐
274 cesses. The nice value can be modified using nice(2), setpriority(2),
275 or sched_setattr(2).
276
277 According to POSIX.1, the nice value is a per-process attribute; that
278 is, the threads in a process should share a nice value. However, on
279 Linux, the nice value is a per-thread attribute: different threads in
280 the same process may have different nice values.
281
282 The range of the nice value varies across UNIX systems. On modern
283 Linux, the range is -20 (high priority) to +19 (low priority). On some
other systems, the range is -20..20. Very early Linux kernels (before
Linux 2.0) had the range -infinity..15.
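
A minimal sketch of reading the caller's nice value and keeping a
requested value inside the modern Linux range, using the Python os
module wrapper for getpriority(2). The names NICE_MIN, NICE_MAX,
current_nice, and clamp_nice are hypothetical.

```python
import os

NICE_MIN, NICE_MAX = -20, 19   # range on modern Linux, as stated above

def current_nice():
    """Return the nice value of the caller (who == 0 means the caller)."""
    return os.getpriority(os.PRIO_PROCESS, 0)

def clamp_nice(value):
    """Force a requested nice value into the supported range."""
    return max(NICE_MIN, min(NICE_MAX, value))
```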
286
287 The degree to which the nice value affects the relative scheduling of
288 SCHED_OTHER processes likewise varies across UNIX systems and across
289 Linux kernel versions.
290
With the advent of the CFS scheduler in kernel 2.6.23, Linux adopted an
algorithm that causes relative differences in nice values to have a
much stronger effect. In the current implementation, each unit of
difference in the nice values of two processes results in a factor of
1.25 in the degree to which the scheduler favors the higher priority
process. As a result, a low-priority nice value (+19) truly provides
little CPU to a process whenever there is any other higher priority
load on the system, and a high-priority nice value (-20) delivers most
of the CPU to applications that require it (e.g., some audio
applications).
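
The factor of 1.25 can be turned into a rough estimate of how two
CPU-bound processes on one CPU would share it. This is an idealized
sketch that interprets the factor as a ratio of scheduling weights,
which is an assumption; the function cpu_share is hypothetical and
does not reproduce the kernel's exact arithmetic.

```python
def cpu_share(nice_a, nice_b, factor=1.25):
    """Estimated fraction of one CPU received by process A when competing with B.

    Each unit of nice difference scales the relative weight by `factor`;
    a lower nice value means a larger weight.
    """
    weight_a = factor ** -nice_a
    weight_b = factor ** -nice_b
    return weight_a / (weight_a + weight_b)
```

Under these assumptions, a process at nice 0 competing with a process
at nice 1 receives about 56% of the CPU, and equal nice values give
each process 50%.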
300
301 On Linux, the RLIMIT_NICE resource limit can be used to define a limit
302 to which an unprivileged process's nice value can be raised; see setr‐
303 limit(2) for details.
304
305 For further details on the nice value, see the subsections on the auto‐
306 group feature and group scheduling, below.
307
308 SCHED_BATCH: Scheduling batch processes
309 (Since Linux 2.6.16.) SCHED_BATCH can be used only at static priority
310 0. This policy is similar to SCHED_OTHER in that it schedules the
311 thread according to its dynamic priority (based on the nice value).
312 The difference is that this policy will cause the scheduler to always
313 assume that the thread is CPU-intensive. Consequently, the scheduler
314 will apply a small scheduling penalty with respect to wakeup behavior,
315 so that this thread is mildly disfavored in scheduling decisions.
316
317 This policy is useful for workloads that are noninteractive, but do not
318 want to lower their nice value, and for workloads that want a determin‐
319 istic scheduling policy without interactivity causing extra preemptions
320 (between the workload's tasks).
321
322 SCHED_IDLE: Scheduling very low priority jobs
323 (Since Linux 2.6.23.) SCHED_IDLE can be used only at static priority
324 0; the process nice value has no influence for this policy.
325
326 This policy is intended for running jobs at extremely low priority
327 (lower even than a +19 nice value with the SCHED_OTHER or SCHED_BATCH
328 policies).
329
330 Resetting scheduling policy for child processes
331 Each thread has a reset-on-fork scheduling flag. When this flag is
332 set, children created by fork(2) do not inherit privileged scheduling
333 policies. The reset-on-fork flag can be set by either:
334
335 * ORing the SCHED_RESET_ON_FORK flag into the policy argument when
336 calling sched_setscheduler(2) (since Linux 2.6.32); or
337
338 * specifying the SCHED_FLAG_RESET_ON_FORK flag in attr.sched_flags
339 when calling sched_setattr(2).
340
341 Note that the constants used with these two APIs have different names.
342 The state of the reset-on-fork flag can analogously be retrieved using
343 sched_getscheduler(2) and sched_getattr(2).
344
The reset-on-fork feature is intended for media-playback applications,
and can be used to prevent applications from evading the RLIMIT_RTTIME
resource limit (see getrlimit(2)) by creating multiple child processes.
348
349 More precisely, if the reset-on-fork flag is set, the following rules
350 apply for subsequently created children:
351
352 * If the calling thread has a scheduling policy of SCHED_FIFO or
353 SCHED_RR, the policy is reset to SCHED_OTHER in child processes.
354
355 * If the calling process has a negative nice value, the nice value is
356 reset to zero in child processes.
357
358 After the reset-on-fork flag has been enabled, it can be reset only if
359 the thread has the CAP_SYS_NICE capability. This flag is disabled in
360 child processes created by fork(2).
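
The inheritance rules above can be summarized as a small model. It
covers only the two rules listed in this section; the function
child_attributes and the use of policy names as strings are
assumptions made for illustration.

```python
def child_attributes(policy, nice, reset_on_fork):
    """Return (policy, nice, reset_on_fork) as seen by a child after fork(2)."""
    if reset_on_fork:
        if policy in ("SCHED_FIFO", "SCHED_RR"):
            policy = "SCHED_OTHER"   # real-time policies are not inherited
        if nice < 0:
            nice = 0                 # negative nice values are reset to zero
    # The flag itself is disabled in the child in every case.
    return policy, nice, False
```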
361
362 Privileges and resource limits
363 In Linux kernels before 2.6.12, only privileged (CAP_SYS_NICE) threads
364 can set a nonzero static priority (i.e., set a real-time scheduling
365 policy). The only change that an unprivileged thread can make is to
366 set the SCHED_OTHER policy, and this can be done only if the effective
367 user ID of the caller matches the real or effective user ID of the tar‐
368 get thread (i.e., the thread specified by pid) whose policy is being
369 changed.
370
371 A thread must be privileged (CAP_SYS_NICE) in order to set or modify a
372 SCHED_DEADLINE policy.
373
374 Since Linux 2.6.12, the RLIMIT_RTPRIO resource limit defines a ceiling
375 on an unprivileged thread's static priority for the SCHED_RR and
376 SCHED_FIFO policies. The rules for changing scheduling policy and pri‐
377 ority are as follows:
378
379 * If an unprivileged thread has a nonzero RLIMIT_RTPRIO soft limit,
380 then it can change its scheduling policy and priority, subject to
381 the restriction that the priority cannot be set to a value higher
382 than the maximum of its current priority and its RLIMIT_RTPRIO soft
383 limit.
384
385 * If the RLIMIT_RTPRIO soft limit is 0, then the only permitted
386 changes are to lower the priority, or to switch to a non-real-time
387 policy.
388
389 * Subject to the same rules, another unprivileged thread can also make
390 these changes, as long as the effective user ID of the thread making
391 the change matches the real or effective user ID of the target
392 thread.
393
394 * Special rules apply for the SCHED_IDLE policy. In Linux kernels
395 before 2.6.39, an unprivileged thread operating under this policy
396 cannot change its policy, regardless of the value of its
397 RLIMIT_RTPRIO resource limit. In Linux kernels since 2.6.39, an
398 unprivileged thread can switch to either the SCHED_BATCH or the
399 SCHED_OTHER policy so long as its nice value falls within the range
400 permitted by its RLIMIT_NICE resource limit (see getrlimit(2)).
401
402 Privileged (CAP_SYS_NICE) threads ignore the RLIMIT_RTPRIO limit; as
403 with older kernels, they can make arbitrary changes to scheduling pol‐
404 icy and priority. See getrlimit(2) for further information on
405 RLIMIT_RTPRIO.
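
The first two rules, together with the exemption for privileged
threads, reduce to a single comparison, sketched below. The sketch
deliberately ignores the user ID check and the special rules for
SCHED_IDLE; the function may_set_rt_priority is hypothetical. (On
Linux, the soft limit itself can be read in Python with
resource.getrlimit(resource.RLIMIT_RTPRIO).)

```python
def may_set_rt_priority(new, current, rtprio_soft, privileged=False):
    """Model of the ceiling imposed by the RLIMIT_RTPRIO soft limit.

    new and current are static priorities; 0 stands for a non-real-time
    policy. A soft limit of 0 leaves max(current, 0) == current, so only
    lowering the priority remains possible.
    """
    if privileged:
        return True    # CAP_SYS_NICE: the limit is ignored
    return new <= max(current, rtprio_soft)
```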
406
407 Limiting the CPU usage of real-time and deadline processes
408 A nonblocking infinite loop in a thread scheduled under the SCHED_FIFO,
409 SCHED_RR, or SCHED_DEADLINE policy can potentially block all other
410 threads from accessing the CPU forever. Prior to Linux 2.6.25, the
411 only way of preventing a runaway real-time process from freezing the
412 system was to run (at the console) a shell scheduled under a higher
413 static priority than the tested application. This allows an emergency
414 kill of tested real-time applications that do not block or terminate as
415 expected.
416
417 Since Linux 2.6.25, there are other techniques for dealing with runaway
418 real-time and deadline processes. One of these is to use the
419 RLIMIT_RTTIME resource limit to set a ceiling on the CPU time that a
420 real-time process may consume. See getrlimit(2) for details.
421
422 Since version 2.6.25, Linux also provides two /proc files that can be
423 used to reserve a certain amount of CPU time to be used by non-real-
424 time processes. Reserving CPU time in this fashion allows some CPU
425 time to be allocated to (say) a root shell that can be used to kill a
426 runaway process. Both of these files specify time values in microsec‐
427 onds:
428
429 /proc/sys/kernel/sched_rt_period_us
430 This file specifies a scheduling period that is equivalent to
431 100% CPU bandwidth. The value in this file can range from 1 to
432 INT_MAX, giving an operating range of 1 microsecond to around 35
433 minutes. The default value in this file is 1,000,000 (1 sec‐
434 ond).
435
436 /proc/sys/kernel/sched_rt_runtime_us
437 The value in this file specifies how much of the "period" time
438 can be used by all real-time and deadline scheduled processes on
439 the system. The value in this file can range from -1 to
440 INT_MAX-1. Specifying -1 makes the run time the same as the
441 period; that is, no CPU time is set aside for non-real-time pro‐
442 cesses (which was the Linux behavior before kernel 2.6.25). The
443 default value in this file is 950,000 (0.95 seconds), meaning
444 that 5% of the CPU time is reserved for processes that don't run
445 under a real-time or deadline scheduling policy.
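
The relationship between the two files can be checked with a short
calculation. The function reserved_fraction is a hypothetical helper
that takes the two values as integers; it does not read the /proc
files itself.

```python
from fractions import Fraction

def reserved_fraction(runtime_us, period_us):
    """Fraction of CPU time left for processes outside the real-time
    and deadline policies."""
    if runtime_us == -1:
        return Fraction(0)   # -1: run time equals the period, nothing reserved
    return 1 - Fraction(runtime_us, period_us)
```

With the default values, reserved_fraction(950000, 1000000) yields
1/20, that is, the 5% mentioned above.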
446
447 Response time
448 A blocked high priority thread waiting for I/O has a certain response
449 time before it is scheduled again. The device driver writer can
450 greatly reduce this response time by using a "slow interrupt" interrupt
451 handler.
452
453 Miscellaneous
454 Child processes inherit the scheduling policy and parameters across a
455 fork(2). The scheduling policy and parameters are preserved across
456 execve(2).
457
458 Memory locking is usually needed for real-time processes to avoid pag‐
459 ing delays; this can be done with mlock(2) or mlockall(2).
460
461 The autogroup feature
462 Since Linux 2.6.38, the kernel provides a feature known as autogrouping
463 to improve interactive desktop performance in the face of multiprocess,
464 CPU-intensive workloads such as building the Linux kernel with large
465 numbers of parallel build processes (i.e., the make(1) -j flag).
466
467 This feature operates in conjunction with the CFS scheduler and
468 requires a kernel that is configured with CONFIG_SCHED_AUTOGROUP. On a
469 running system, this feature is enabled or disabled via the file
470 /proc/sys/kernel/sched_autogroup_enabled; a value of 0 disables the
471 feature, while a value of 1 enables it. The default value in this file
472 is 1, unless the kernel was booted with the noautogroup parameter.
473
474 A new autogroup is created when a new session is created via setsid(2);
475 this happens, for example, when a new terminal window is started. A
476 new process created by fork(2) inherits its parent's autogroup member‐
477 ship. Thus, all of the processes in a session are members of the same
478 autogroup. An autogroup is automatically destroyed when the last
479 process in the group terminates.
480
481 When autogrouping is enabled, all of the members of an autogroup are
482 placed in the same kernel scheduler "task group". The CFS scheduler
483 employs an algorithm that equalizes the distribution of CPU cycles
484 across task groups. The benefits of this for interactive desktop per‐
485 formance can be described via the following example.
486
487 Suppose that there are two autogroups competing for the same CPU (i.e.,
488 presume either a single CPU system or the use of taskset(1) to confine
489 all the processes to the same CPU on an SMP system). The first group
490 contains ten CPU-bound processes from a kernel build started with
491 make -j10. The other contains a single CPU-bound process: a video
492 player. The effect of autogrouping is that the two groups will each
493 receive half of the CPU cycles. That is, the video player will receive
494 50% of the CPU cycles, rather than just 9% of the cycles, which would
495 likely lead to degraded video playback. The situation on an SMP system
496 is more complex, but the general effect is the same: the scheduler dis‐
497 tributes CPU cycles across task groups such that an autogroup that con‐
498 tains a large number of CPU-bound processes does not end up hogging CPU
499 cycles at the expense of the other jobs on the system.
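
The arithmetic of the single-CPU example can be sketched as follows,
assuming that every process is CPU-bound and has the same nice value.
The function per_process_share is a hypothetical helper.

```python
from fractions import Fraction

def per_process_share(group_sizes, autogroup=True):
    """Return the CPU share of one process in each group on a single CPU.

    group_sizes lists the number of CPU-bound processes in each group.
    """
    if autogroup:
        # Each group gets an equal share, split among its members.
        return [Fraction(1, len(group_sizes) * n) for n in group_sizes]
    # Without grouping, every process competes equally with all others.
    total = sum(group_sizes)
    return [Fraction(1, total) for _ in group_sizes]
```

For group_sizes of [10, 1], the video player receives 1/2 of the CPU
with autogrouping and 1/11 (about 9%) without it.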
500
501 A process's autogroup (task group) membership can be viewed via the
502 file /proc/[pid]/autogroup:
503
504 $ cat /proc/1/autogroup
505 /autogroup-1 nice 0
506
507 This file can also be used to modify the CPU bandwidth allocated to an
508 autogroup. This is done by writing a number in the "nice" range to the
509 file to set the autogroup's nice value. The allowed range is from +19
510 (low priority) to -20 (high priority). (Writing values outside of this
511 range causes write(2) to fail with the error EINVAL.)
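
A minimal sketch of parsing a line in the format shown above and of
checking a value before writing it to the file. The function names are
hypothetical, and the parser assumes exactly the three-field format of
the example.

```python
def parse_autogroup(line):
    """Parse a line such as '/autogroup-1 nice 0' into (name, nice)."""
    name, keyword, nice = line.split()
    if keyword != "nice":
        raise ValueError("unexpected autogroup format: %r" % line)
    return name, int(nice)

def autogroup_nice_is_valid(value):
    """Values outside -20..+19 would make the write fail with EINVAL."""
    return -20 <= value <= 19
```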
512
513 The autogroup nice setting has the same meaning as the process nice
514 value, but applies to distribution of CPU cycles to the autogroup as a
515 whole, based on the relative nice values of other autogroups. For a
516 process inside an autogroup, the CPU cycles that it receives will be a
517 product of the autogroup's nice value (compared to other autogroups)
518 and the process's nice value (compared to other processes in the same
autogroup).
520
521 The use of the cgroups(7) CPU controller to place processes in cgroups
522 other than the root CPU cgroup overrides the effect of autogrouping.
523
524 The autogroup feature groups only processes scheduled under non-real-
525 time policies (SCHED_OTHER, SCHED_BATCH, and SCHED_IDLE). It does not
526 group processes scheduled under real-time and deadline policies. Those
527 processes are scheduled according to the rules described earlier.
528
529 The nice value and group scheduling
530 When scheduling non-real-time processes (i.e., those scheduled under
531 the SCHED_OTHER, SCHED_BATCH, and SCHED_IDLE policies), the CFS sched‐
532 uler employs a technique known as "group scheduling", if the kernel was
533 configured with the CONFIG_FAIR_GROUP_SCHED option (which is typical).
534
535 Under group scheduling, threads are scheduled in "task groups". Task
536 groups have a hierarchical relationship, rooted under the initial task
537 group on the system, known as the "root task group". Task groups are
538 formed in the following circumstances:
539
540 * All of the threads in a CPU cgroup form a task group. The parent of
541 this task group is the task group of the corresponding parent
542 cgroup.
543
544 * If autogrouping is enabled, then all of the threads that are
545 (implicitly) placed in an autogroup (i.e., the same session, as cre‐
546 ated by setsid(2)) form a task group. Each new autogroup is thus a
547 separate task group. The root task group is the parent of all such
548 autogroups.
549
550 * If autogrouping is enabled, then the root task group consists of all
551 processes in the root CPU cgroup that were not otherwise implicitly
552 placed into a new autogroup.
553
554 * If autogrouping is disabled, then the root task group consists of
555 all processes in the root CPU cgroup.
556
557 * If group scheduling was disabled (i.e., the kernel was configured
558 without CONFIG_FAIR_GROUP_SCHED), then all of the processes on the
559 system are notionally placed in a single task group.
560
561 Under group scheduling, a thread's nice value has an effect for sched‐
562 uling decisions only relative to other threads in the same task group.
563 This has some surprising consequences in terms of the traditional
564 semantics of the nice value on UNIX systems. In particular, if auto‐
565 grouping is enabled (which is the default in various distributions),
566 then employing setpriority(2) or nice(1) on a process has an effect
567 only for scheduling relative to other processes executed in the same
568 session (typically: the same terminal window).
569
570 Conversely, for two processes that are (for example) the sole CPU-bound
571 processes in different sessions (e.g., different terminal windows, each
572 of whose jobs are tied to different autogroups), modifying the nice
573 value of the process in one of the sessions has no effect in terms of
574 the scheduler's decisions relative to the process in the other session.
575 A possibly useful workaround here is to use a command such as the fol‐
576 lowing to modify the autogroup nice value for all of the processes in a
577 terminal session:
578
579 $ echo 10 > /proc/self/autogroup
580
581 Real-time features in the mainline Linux kernel
582 Since kernel version 2.6.18, Linux is gradually becoming equipped with
583 real-time capabilities, most of which are derived from the former real‐
584 time-preempt patch set. Until the patches have been completely merged
585 into the mainline kernel, they must be installed to achieve the best
586 real-time performance. These patches are named:
587
588 patch-kernelversion-rtpatchversion
589
and can be downloaded from
⟨http://www.kernel.org/pub/linux/kernel/projects/rt/⟩.
592
Without the patches and prior to their full inclusion into the mainline
kernel, the kernel configuration offers only the three preemption
classes CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_VOLUNTARY, and
CONFIG_PREEMPT_DESKTOP, which respectively provide no, some, and
considerable reduction of the worst-case scheduling latency.
598
599 With the patches applied or after their full inclusion into the main‐
600 line kernel, the additional configuration item CONFIG_PREEMPT_RT
601 becomes available. If this is selected, Linux is transformed into a
602 regular real-time operating system. The FIFO and RR scheduling poli‐
603 cies are then used to run a thread with true real-time priority and a
604 minimum worst-case scheduling latency.
605
607 The cgroups(7) CPU controller can be used to limit the CPU consumption
608 of groups of processes.
609
Originally, standard Linux was intended as a general-purpose operating
system able to handle background processes, interactive applications,
and less demanding real-time applications (applications that usually
need to meet timing deadlines). Although the Linux kernel 2.6 allowed
for kernel preemption, and the newly introduced O(1) scheduler ensured
that the time needed to schedule was fixed and deterministic
irrespective of the number of active tasks, true real-time computing
was not possible up to kernel version 2.6.17.
618
620 chrt(1), taskset(1), getpriority(2), mlock(2), mlockall(2), munlock(2),
621 munlockall(2), nice(2), sched_get_priority_max(2),
622 sched_get_priority_min(2), sched_getaffinity(2), sched_getparam(2),
623 sched_getscheduler(2), sched_rr_get_interval(2), sched_setaffinity(2),
624 sched_setparam(2), sched_setscheduler(2), sched_yield(2),
625 setpriority(2), pthread_getaffinity_np(3), pthread_setaffinity_np(3),
626 sched_getcpu(3), capabilities(7), cpuset(7)
627
628 Programming for the real world - POSIX.4 by Bill O. Gallmeister,
629 O'Reilly & Associates, Inc., ISBN 1-56592-074-0.
630
631 The Linux kernel source files Documentation/scheduler/sched-
632 deadline.txt, Documentation/scheduler/sched-rt-group.txt,
633 Documentation/scheduler/sched-design-CFS.txt, and
634 Documentation/scheduler/sched-nice-design.txt
635
637 This page is part of release 4.16 of the Linux man-pages project. A
638 description of the project, information about reporting bugs, and the
639 latest version of this page, can be found at
640 https://www.kernel.org/doc/man-pages/.

Linux 2018-02-02 SCHED(7)