SCHED(7)                   Linux Programmer's Manual                  SCHED(7)

NAME
       sched - overview of CPU scheduling

DESCRIPTION
       Since Linux 2.6.23, the default scheduler is CFS, the "Completely
       Fair Scheduler".  The CFS scheduler replaced the earlier "O(1)"
       scheduler.

   API summary
       Linux provides the following system calls for controlling the CPU
       scheduling behavior, policy, and priority of processes (or, more
       precisely, threads).

       nice(2)
              Set a new nice value for the calling thread, and return the
              new nice value.

       getpriority(2)
              Return the nice value of a thread, a process group, or the set
              of threads owned by a specified user.

       setpriority(2)
              Set the nice value of a thread, a process group, or the set of
              threads owned by a specified user.

       sched_setscheduler(2)
              Set the scheduling policy and parameters of a specified
              thread.

       sched_getscheduler(2)
              Return the scheduling policy of a specified thread.

       sched_setparam(2)
              Set the scheduling parameters of a specified thread.

       sched_getparam(2)
              Fetch the scheduling parameters of a specified thread.

       sched_get_priority_max(2)
              Return the maximum priority available in a specified
              scheduling policy.

       sched_get_priority_min(2)
              Return the minimum priority available in a specified
              scheduling policy.

       sched_rr_get_interval(2)
              Fetch the quantum used for threads that are scheduled under
              the "round-robin" scheduling policy.

       sched_yield(2)
              Cause the caller to relinquish the CPU, so that some other
              thread can be executed.

       sched_setaffinity(2)
              (Linux-specific) Set the CPU affinity of a specified thread.

       sched_getaffinity(2)
              (Linux-specific) Get the CPU affinity of a specified thread.

       sched_setattr(2)
              Set the scheduling policy and parameters of a specified
              thread.  This (Linux-specific) system call provides a superset
              of the functionality of sched_setscheduler(2) and
              sched_setparam(2).

       sched_getattr(2)
              Fetch the scheduling policy and parameters of a specified
              thread.  This (Linux-specific) system call provides a superset
              of the functionality of sched_getscheduler(2) and
              sched_getparam(2).

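       As a brief illustration of this API, the following minimal sketch
       (not taken from the interfaces' manual pages; error handling
       abbreviated) switches the calling thread to the SCHED_FIFO policy and
       then reads the policy back:

           #include <sched.h>
           #include <stdio.h>
           #include <stdlib.h>

           int
           main(void)
           {
               struct sched_param sp;

               /* Lowest real-time priority supported for SCHED_FIFO. */
               sp.sched_priority = sched_get_priority_min(SCHED_FIFO);

               /* Needs CAP_SYS_NICE or a suitable RLIMIT_RTPRIO limit. */
               if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
                   perror("sched_setscheduler");
                   exit(EXIT_FAILURE);
               }

               /* A PID of 0 means the calling thread. */
               printf("policy is now %s\n",
                      sched_getscheduler(0) == SCHED_FIFO ? "SCHED_FIFO"
                                                          : "other");
               exit(EXIT_SUCCESS);
           }
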
   Scheduling policies
       The scheduler is the kernel component that decides which runnable
       thread will be executed by the CPU next.  Each thread has an
       associated scheduling policy and a static scheduling priority,
       sched_priority.  The scheduler makes its decisions based on knowledge
       of the scheduling policy and static priority of all threads on the
       system.

       For threads scheduled under one of the normal scheduling policies
       (SCHED_OTHER, SCHED_IDLE, SCHED_BATCH), sched_priority is not used in
       scheduling decisions (it must be specified as 0).

       Processes scheduled under one of the real-time policies (SCHED_FIFO,
       SCHED_RR) have a sched_priority value in the range 1 (low) to 99
       (high).  (As the numbers imply, real-time threads always have higher
       priority than normal threads.)  Note well: POSIX.1 requires an
       implementation to support only a minimum of 32 distinct priority
       levels for the real-time policies, and some systems supply just this
       minimum.  Portable programs should use sched_get_priority_min(2) and
       sched_get_priority_max(2) to find the range of priorities supported
       for a particular policy.

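       For illustration, a minimal sketch that queries these limits at run
       time (output format chosen arbitrarily):

           #include <sched.h>
           #include <stdio.h>

           int
           main(void)
           {
               /* Both calls return -1 and set errno on error. */
               printf("SCHED_FIFO: %d..%d\n",
                      sched_get_priority_min(SCHED_FIFO),
                      sched_get_priority_max(SCHED_FIFO));
               printf("SCHED_RR:   %d..%d\n",
                      sched_get_priority_min(SCHED_RR),
                      sched_get_priority_max(SCHED_RR));
               return 0;
           }
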
       Conceptually, the scheduler maintains a list of runnable threads for
       each possible sched_priority value.  In order to determine which
       thread runs next, the scheduler looks for the nonempty list with the
       highest static priority and selects the thread at the head of this
       list.

       A thread's scheduling policy determines where it will be inserted
       into the list of threads with equal static priority and how it will
       move inside this list.

       All scheduling is preemptive: if a thread with a higher static
       priority becomes ready to run, the currently running thread will be
       preempted and returned to the wait list for its static priority
       level.  The scheduling policy determines the ordering only within the
       list of runnable threads with equal static priority.

   SCHED_FIFO: First in-first out scheduling
       SCHED_FIFO can be used only with static priorities higher than 0,
       which means that when a SCHED_FIFO thread becomes runnable, it will
       always immediately preempt any currently running SCHED_OTHER,
       SCHED_BATCH, or SCHED_IDLE thread.  SCHED_FIFO is a simple scheduling
       algorithm without time slicing.  For threads scheduled under the
       SCHED_FIFO policy, the following rules apply:

       1) A running SCHED_FIFO thread that has been preempted by another
          thread of higher priority will stay at the head of the list for
          its priority and will resume execution as soon as all threads of
          higher priority are blocked again.

       2) When a blocked SCHED_FIFO thread becomes runnable, it will be
          inserted at the end of the list for its priority.

       3) If a call to sched_setscheduler(2), sched_setparam(2),
          sched_setattr(2), pthread_setschedparam(3), or
          pthread_setschedprio(3) changes the priority of the running or
          runnable SCHED_FIFO thread identified by pid, the effect on the
          thread's position in the list depends on the direction of the
          change to the thread's priority:

          ·  If the thread's priority is raised, it is placed at the end of
             the list for its new priority.  As a consequence, it may
             preempt a currently running thread with the same priority.

          ·  If the thread's priority is unchanged, its position in the run
             list is unchanged.

          ·  If the thread's priority is lowered, it is placed at the front
             of the list for its new priority.

          According to POSIX.1-2008, changes to a thread's priority (or
          policy) using any mechanism other than pthread_setschedprio(3)
          should result in the thread being placed at the end of the list
          for its priority.

       4) A thread calling sched_yield(2) will be put at the end of the
          list.

       No other events will move a thread scheduled under the SCHED_FIFO
       policy in the wait list of runnable threads with equal static
       priority.

       A SCHED_FIFO thread runs until either it is blocked by an I/O
       request, it is preempted by a higher priority thread, or it calls
       sched_yield(2).

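       For illustration of rule 3 above, the following sketch (priority
       values are arbitrary; error handling abbreviated; link with
       -pthread) changes only the priority of the calling thread using
       pthread_setschedprio(3):

           #include <pthread.h>
           #include <sched.h>
           #include <stdio.h>

           int
           main(void)
           {
               struct sched_param sp = { .sched_priority = 10 };
               int policy, err;

               /* Put the calling thread under SCHED_FIFO (needs
                  privilege). */
               err = pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
               if (err != 0) {
                   fprintf(stderr, "pthread_setschedparam: %d\n", err);
                   return 1;
               }

               /* Change only the priority; the policy is left alone. */
               err = pthread_setschedprio(pthread_self(), 11);
               if (err != 0)
                   fprintf(stderr, "pthread_setschedprio: %d\n", err);

               pthread_getschedparam(pthread_self(), &policy, &sp);
               printf("policy %d, priority %d\n", policy, sp.sched_priority);
               return 0;
           }
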
   SCHED_RR: Round-robin scheduling
       SCHED_RR is a simple enhancement of SCHED_FIFO.  Everything described
       above for SCHED_FIFO also applies to SCHED_RR, except that each
       thread is allowed to run only for a maximum time quantum.  If a
       SCHED_RR thread has been running for a time period equal to or longer
       than the time quantum, it will be put at the end of the list for its
       priority.  A SCHED_RR thread that has been preempted by a higher
       priority thread and subsequently resumes execution as a running
       thread will complete the unexpired portion of its round-robin time
       quantum.  The length of the time quantum can be retrieved using
       sched_rr_get_interval(2).

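       The quantum can be inspected as in the following sketch (the value is
       meaningful only for threads running under SCHED_RR):

           #include <sched.h>
           #include <stdio.h>
           #include <time.h>

           int
           main(void)
           {
               struct timespec ts;

               /* A PID of 0 selects the calling thread. */
               if (sched_rr_get_interval(0, &ts) == -1) {
                   perror("sched_rr_get_interval");
                   return 1;
               }
               printf("RR quantum: %ld.%09ld seconds\n",
                      (long) ts.tv_sec, ts.tv_nsec);
               return 0;
           }
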
   SCHED_DEADLINE: Sporadic task model deadline scheduling
       Since version 3.14, Linux provides a deadline scheduling policy
       (SCHED_DEADLINE).  This policy is currently implemented using GEDF
       (Global Earliest Deadline First) in conjunction with CBS (Constant
       Bandwidth Server).  To set and fetch this policy and associated
       attributes, one must use the Linux-specific sched_setattr(2) and
       sched_getattr(2) system calls.

       A sporadic task is one that has a sequence of jobs, where each job is
       activated at most once per period.  Each job also has a relative
       deadline, before which it should finish execution, and a computation
       time, which is the CPU time necessary for executing the job.  The
       moment when a task wakes up because a new job has to be executed is
       called the arrival time (also referred to as the request time or
       release time).  The start time is the time at which a task starts its
       execution.  The absolute deadline is thus obtained by adding the
       relative deadline to the arrival time.

       The following diagram clarifies these terms:

           arrival/wakeup                    absolute deadline
                |    start time                    |
                |        |                         |
                v        v                         v
           -----x--------xooooooooooooooooo--------x--------x---
                         |<- comp. time ->|
                |<------- relative deadline ------>|
                |<-------------- period ------------------->|

       When setting a SCHED_DEADLINE policy for a thread using
       sched_setattr(2), one can specify three parameters: Runtime,
       Deadline, and Period.  These parameters do not necessarily correspond
       to the aforementioned terms: usual practice is to set Runtime to
       something bigger than the average computation time (or worst-case
       execution time for hard real-time tasks), Deadline to the relative
       deadline, and Period to the period of the task.  Thus, for
       SCHED_DEADLINE scheduling, we have:

           arrival/wakeup                    absolute deadline
                |    start time                    |
                |        |                         |
                v        v                         v
           -----x--------xooooooooooooooooo--------x--------x---
                         |<-- Runtime ------->|
                |<----------- Deadline ----------->|
                |<-------------- Period ------------------->|

       The three deadline-scheduling parameters correspond to the
       sched_runtime, sched_deadline, and sched_period fields of the
       sched_attr structure; see sched_setattr(2).  These fields express
       values in nanoseconds.  If sched_period is specified as 0, then it is
       made the same as sched_deadline.

       The kernel requires that:

           sched_runtime <= sched_deadline <= sched_period

       In addition, under the current implementation, all of the parameter
       values must be at least 1024 (i.e., just over one microsecond, which
       is the resolution of the implementation), and less than 2^63.  If any
       of these checks fails, sched_setattr(2) fails with the error EINVAL.

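       Since glibc provides no wrapper for sched_setattr(2), the call is
       typically made via syscall(2).  The following sketch is illustrative
       only: the structure layout follows the description in
       sched_setattr(2), the fallback definition of SCHED_DEADLINE assumes
       the value used by the Linux UAPI headers, and the parameter values
       are arbitrary examples (10 ms of Runtime every 100 ms, with a 50 ms
       Deadline):

           #define _GNU_SOURCE
           #include <stdint.h>
           #include <stdio.h>
           #include <sys/syscall.h>
           #include <unistd.h>

           #ifndef SCHED_DEADLINE
           #define SCHED_DEADLINE 6    /* value from <linux/sched.h> */
           #endif

           struct sched_attr {         /* layout per sched_setattr(2) */
               uint32_t size;
               uint32_t sched_policy;
               uint64_t sched_flags;
               int32_t  sched_nice;
               uint32_t sched_priority;
               uint64_t sched_runtime;     /* nanoseconds */
               uint64_t sched_deadline;    /* nanoseconds */
               uint64_t sched_period;      /* nanoseconds */
           };

           int
           main(void)
           {
               struct sched_attr attr = {
                   .size           = sizeof(attr),
                   .sched_policy   = SCHED_DEADLINE,
                   .sched_runtime  =  10 * 1000 * 1000,   /*  10 ms */
                   .sched_deadline =  50 * 1000 * 1000,   /*  50 ms */
                   .sched_period   = 100 * 1000 * 1000,   /* 100 ms */
               };

               /* 0 selects the calling thread; the last argument is flags.
                  Setting SCHED_DEADLINE requires CAP_SYS_NICE. */
               if (syscall(SYS_sched_setattr, 0, &attr, 0) == -1) {
                   perror("sched_setattr");   /* EPERM, EINVAL, or EBUSY */
                   return 1;
               }
               printf("now running under SCHED_DEADLINE\n");
               return 0;
           }
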
       The CBS guarantees non-interference between tasks, by throttling
       threads that attempt to over-run their specified Runtime.

       To ensure deadline scheduling guarantees, the kernel must prevent
       situations where the set of SCHED_DEADLINE threads is not feasible
       (schedulable) within the given constraints.  The kernel thus performs
       an admission test when setting or changing SCHED_DEADLINE policy and
       attributes.  This admission test calculates whether the change is
       feasible; if it is not, sched_setattr(2) fails with the error EBUSY.

       For example, it is required (but not necessarily sufficient) for the
       total utilization to be less than or equal to the total number of
       CPUs available, where, since each thread can maximally run for
       Runtime per Period, that thread's utilization is its Runtime divided
       by its Period.

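       As a worked example (the numbers are arbitrary): three SCHED_DEADLINE
       threads, each with Runtime 10 ms and Period 50 ms, have a total
       utilization of

           3 * (10 ms / 50 ms) = 0.6

       which would satisfy this condition on a single-CPU system; ten such
       threads (total utilization 2.0) could satisfy it only on a system
       with at least two CPUs, and even then admission is not guaranteed,
       since the condition is necessary but not sufficient.
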
       In order to fulfill the guarantees that are made when a thread is
       admitted to the SCHED_DEADLINE policy, SCHED_DEADLINE threads are the
       highest priority (user controllable) threads in the system; if any
       SCHED_DEADLINE thread is runnable, it will preempt any thread
       scheduled under one of the other policies.

       A call to fork(2) by a thread scheduled under the SCHED_DEADLINE
       policy fails with the error EAGAIN, unless the thread has its
       reset-on-fork flag set (see below).

       A SCHED_DEADLINE thread that calls sched_yield(2) will yield the
       current job and wait for a new period to begin.

   SCHED_OTHER: Default Linux time-sharing scheduling
       SCHED_OTHER can be used at only static priority 0 (i.e., threads
       under real-time policies always have priority over SCHED_OTHER
       processes).  SCHED_OTHER is the standard Linux time-sharing scheduler
       that is intended for all threads that do not require the special
       real-time mechanisms.

       The thread to run is chosen from the static priority 0 list based on
       a dynamic priority that is determined only inside this list.  The
       dynamic priority is based on the nice value (see below) and is
       increased for each time quantum the thread is ready to run, but is
       denied a run by the scheduler.  This ensures fair progress among all
       SCHED_OTHER threads.

       In the Linux kernel source code, the SCHED_OTHER policy is actually
       named SCHED_NORMAL.

   The nice value
       The nice value is an attribute that can be used to influence the CPU
       scheduler to favor or disfavor a process in scheduling decisions.  It
       affects the scheduling of SCHED_OTHER and SCHED_BATCH (see below)
       processes.  The nice value can be modified using nice(2),
       setpriority(2), or sched_setattr(2).

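       For illustration, a minimal sketch (the value 10 is arbitrary) that
       raises the nice value of the calling process and reads it back:

           #include <errno.h>
           #include <stdio.h>
           #include <sys/resource.h>

           int
           main(void)
           {
               int nv;

               /* Raise the nice value (lower the scheduling priority). */
               if (setpriority(PRIO_PROCESS, 0, 10) == -1) {
                   perror("setpriority");
                   return 1;
               }

               /* getpriority() can legitimately return -1, so test errno. */
               errno = 0;
               nv = getpriority(PRIO_PROCESS, 0);
               if (nv == -1 && errno != 0) {
                   perror("getpriority");
                   return 1;
               }
               printf("nice value is now %d\n", nv);
               return 0;
           }
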
       According to POSIX.1, the nice value is a per-process attribute; that
       is, the threads in a process should share a nice value.  However, on
       Linux, the nice value is a per-thread attribute: different threads in
       the same process may have different nice values.

       The range of the nice value varies across UNIX systems.  On modern
       Linux, the range is -20 (high priority) to +19 (low priority).  On
       some other systems, the range is -20..20.  Very early Linux kernels
       (before Linux 2.0) had the range -infinity..15.

       The degree to which the nice value affects the relative scheduling of
       SCHED_OTHER processes likewise varies across UNIX systems and across
       Linux kernel versions.

       With the advent of the CFS scheduler in kernel 2.6.23, Linux adopted
       an algorithm that causes relative differences in nice values to have
       a much stronger effect.  In the current implementation, each unit of
       difference in the nice values of two processes results in a factor of
       1.25 in the degree to which the scheduler favors the higher priority
       process.  This causes very low nice values (+19) to truly provide
       little CPU to a process whenever there is any other higher priority
       load on the system, and makes high nice values (-20) deliver most of
       the CPU to applications that require it (e.g., some audio
       applications).

       On Linux, the RLIMIT_NICE resource limit can be used to define a
       limit to which an unprivileged process's nice value can be raised;
       see setrlimit(2) for details.

       For further details on the nice value, see the subsections on the
       autogroup feature and group scheduling, below.

   SCHED_BATCH: Scheduling batch processes
       (Since Linux 2.6.16.)  SCHED_BATCH can be used only at static
       priority 0.  This policy is similar to SCHED_OTHER in that it
       schedules the thread according to its dynamic priority (based on the
       nice value).  The difference is that this policy will cause the
       scheduler to always assume that the thread is CPU-intensive.
       Consequently, the scheduler will apply a small scheduling penalty
       with respect to wakeup behavior, so that this thread is mildly
       disfavored in scheduling decisions.

       This policy is useful for workloads that are noninteractive, but do
       not want to lower their nice value, and for workloads that want a
       deterministic scheduling policy without interactivity causing extra
       preemptions (between the workload's tasks).

   SCHED_IDLE: Scheduling very low priority jobs
       (Since Linux 2.6.23.)  SCHED_IDLE can be used only at static priority
       0; the process nice value has no influence for this policy.

       This policy is intended for running jobs at extremely low priority
       (lower even than a +19 nice value with the SCHED_OTHER or SCHED_BATCH
       policies).

   Resetting scheduling policy for child processes
       Each thread has a reset-on-fork scheduling flag.  When this flag is
       set, children created by fork(2) do not inherit privileged scheduling
       policies.  The reset-on-fork flag can be set by either:

       *  ORing the SCHED_RESET_ON_FORK flag into the policy argument when
          calling sched_setscheduler(2) (since Linux 2.6.32); or

       *  specifying the SCHED_FLAG_RESET_ON_FORK flag in attr.sched_flags
          when calling sched_setattr(2).

       Note that the constants used with these two APIs have different
       names.  The state of the reset-on-fork flag can analogously be
       retrieved using sched_getscheduler(2) and sched_getattr(2).

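       The following sketch (illustrative only; the priority value is
       arbitrary) requests SCHED_RR together with the reset-on-fork flag, so
       that children later created by fork(2) do not inherit the real-time
       policy:

           #define _GNU_SOURCE    /* exposes SCHED_RESET_ON_FORK on glibc */
           #include <sched.h>
           #include <stdio.h>

           int
           main(void)
           {
               struct sched_param sp = { .sched_priority = 1 };

               /* OR the flag into the policy argument (Linux 2.6.32+). */
               if (sched_setscheduler(0, SCHED_RR | SCHED_RESET_ON_FORK,
                                      &sp) == -1) {
                   perror("sched_setscheduler");
                   return 1;
               }
               printf("SCHED_RR with reset-on-fork requested\n");
               return 0;
           }
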
       The reset-on-fork feature is intended for media-playback
       applications, and can be used to prevent applications from evading
       the RLIMIT_RTTIME resource limit (see getrlimit(2)) by creating
       multiple child processes.

       More precisely, if the reset-on-fork flag is set, the following rules
       apply for subsequently created children:

       *  If the calling thread has a scheduling policy of SCHED_FIFO or
          SCHED_RR, the policy is reset to SCHED_OTHER in child processes.

       *  If the calling process has a negative nice value, the nice value
          is reset to zero in child processes.

       After the reset-on-fork flag has been enabled, it can be reset only
       if the thread has the CAP_SYS_NICE capability.  This flag is disabled
       in child processes created by fork(2).

   Privileges and resource limits
       In Linux kernels before 2.6.12, only privileged (CAP_SYS_NICE)
       threads could set a nonzero static priority (i.e., set a real-time
       scheduling policy).  The only change that an unprivileged thread
       could make was to set the SCHED_OTHER policy, and this could be done
       only if the effective user ID of the caller matched the real or
       effective user ID of the target thread (i.e., the thread specified by
       pid) whose policy was being changed.

       A thread must be privileged (CAP_SYS_NICE) in order to set or modify
       a SCHED_DEADLINE policy.

       Since Linux 2.6.12, the RLIMIT_RTPRIO resource limit defines a
       ceiling on an unprivileged thread's static priority for the SCHED_RR
       and SCHED_FIFO policies.  The rules for changing scheduling policy
       and priority are as follows:

       *  If an unprivileged thread has a nonzero RLIMIT_RTPRIO soft limit,
          then it can change its scheduling policy and priority, subject to
          the restriction that the priority cannot be set to a value higher
          than the maximum of its current priority and its RLIMIT_RTPRIO
          soft limit.

       *  If the RLIMIT_RTPRIO soft limit is 0, then the only permitted
          changes are to lower the priority, or to switch to a
          non-real-time policy.

       *  Subject to the same rules, another unprivileged thread can also
          make these changes, as long as the effective user ID of the thread
          making the change matches the real or effective user ID of the
          target thread.

       *  Special rules apply for the SCHED_IDLE policy.  In Linux kernels
          before 2.6.39, an unprivileged thread operating under this policy
          cannot change its policy, regardless of the value of its
          RLIMIT_RTPRIO resource limit.  In Linux kernels since 2.6.39, an
          unprivileged thread can switch to either the SCHED_BATCH or the
          SCHED_OTHER policy so long as its nice value falls within the
          range permitted by its RLIMIT_NICE resource limit (see
          getrlimit(2)).

       Privileged (CAP_SYS_NICE) threads ignore the RLIMIT_RTPRIO limit; as
       with older kernels, they can make arbitrary changes to scheduling
       policy and priority.  See getrlimit(2) for further information on
       RLIMIT_RTPRIO.

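       For example (an illustrative sketch; the ceiling of 10 is arbitrary),
       a privileged process might establish a real-time priority ceiling
       that is then inherited by the unprivileged processes it starts:

           #include <stdio.h>
           #include <sys/resource.h>

           int
           main(void)
           {
               /* Allow SCHED_FIFO/SCHED_RR priorities up to 10 for this
                  process and, via inheritance, for its future children.
                  Raising the hard limit requires privilege. */
               struct rlimit rl = { .rlim_cur = 10, .rlim_max = 10 };

               if (setrlimit(RLIMIT_RTPRIO, &rl) == -1) {
                   perror("setrlimit");
                   return 1;
               }
               printf("RLIMIT_RTPRIO ceiling set to 10\n");
               return 0;
           }
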
   Limiting the CPU usage of real-time and deadline processes
       A nonblocking infinite loop in a thread scheduled under the
       SCHED_FIFO, SCHED_RR, or SCHED_DEADLINE policy can potentially block
       all other threads from accessing the CPU forever.  Prior to Linux
       2.6.25, the only way of preventing a runaway real-time process from
       freezing the system was to run (at the console) a shell scheduled
       under a higher static priority than the tested application.  This
       allowed an emergency kill of tested real-time applications that did
       not block or terminate as expected.

       Since Linux 2.6.25, there are other techniques for dealing with
       runaway real-time and deadline processes.  One of these is to use the
       RLIMIT_RTTIME resource limit to set a ceiling on the CPU time that a
       real-time process may consume.  See getrlimit(2) for details.

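       For example (an illustrative sketch; the budgets are arbitrary), a
       real-time application can bound its own unblocked CPU time before
       switching to a real-time policy:

           #include <stdio.h>
           #include <sys/resource.h>

           int
           main(void)
           {
               /* RLIMIT_RTTIME is measured in microseconds of CPU time
                  consumed without sleeping or blocking; SIGXCPU is sent at
                  the soft limit and SIGKILL at the hard limit. */
               struct rlimit rl = { .rlim_cur = 200000, .rlim_max = 500000 };

               if (setrlimit(RLIMIT_RTTIME, &rl) == -1) {
                   perror("setrlimit");
                   return 1;
               }
               /* ... switch to SCHED_FIFO/SCHED_RR and do the work ... */
               return 0;
           }
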
       Since version 2.6.25, Linux also provides two /proc files that can be
       used to reserve a certain amount of CPU time to be used by
       non-real-time processes.  Reserving CPU time in this fashion allows
       some CPU time to be allocated to (say) a root shell that can be used
       to kill a runaway process.  Both of these files specify time values
       in microseconds:

       /proc/sys/kernel/sched_rt_period_us
              This file specifies a scheduling period that is equivalent to
              100% CPU bandwidth.  The value in this file can range from 1
              to INT_MAX, giving an operating range of 1 microsecond to
              around 35 minutes.  The default value in this file is
              1,000,000 (1 second).

       /proc/sys/kernel/sched_rt_runtime_us
              The value in this file specifies how much of the "period" time
              can be used by all real-time and deadline scheduled processes
              on the system.  The value in this file can range from -1 to
              INT_MAX-1.  Specifying -1 makes the run time the same as the
              period; that is, no CPU time is set aside for non-real-time
              processes (which was the Linux behavior before kernel 2.6.25).
              The default value in this file is 950,000 (0.95 seconds),
              meaning that 5% of the CPU time is reserved for processes that
              don't run under a real-time or deadline scheduling policy.

   Response time
       A blocked high priority thread waiting for I/O has a certain response
       time before it is scheduled again.  The device driver writer can
       greatly reduce this response time by using a "slow interrupt"
       interrupt handler.

   Miscellaneous
       Child processes inherit the scheduling policy and parameters across a
       fork(2).  The scheduling policy and parameters are preserved across
       execve(2).

       Memory locking is usually needed for real-time processes to avoid
       paging delays; this can be done with mlock(2) or mlockall(2).

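       A common pattern is to lock all current and future mappings before
       entering the time-critical part of the program; a minimal sketch:

           #include <stdio.h>
           #include <sys/mman.h>

           int
           main(void)
           {
               /* Lock present and future mappings into RAM so the real-time
                  path cannot take a major page fault.  Needs CAP_IPC_LOCK
                  or a sufficient RLIMIT_MEMLOCK limit. */
               if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
                   perror("mlockall");
                   return 1;
               }
               /* ... time-critical work ... */
               return 0;
           }
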
   The autogroup feature
       Since Linux 2.6.38, the kernel provides a feature known as
       autogrouping to improve interactive desktop performance in the face
       of multiprocess, CPU-intensive workloads such as building the Linux
       kernel with large numbers of parallel build processes (i.e., the
       make(1) -j flag).

       This feature operates in conjunction with the CFS scheduler and
       requires a kernel that is configured with CONFIG_SCHED_AUTOGROUP.  On
       a running system, this feature is enabled or disabled via the file
       /proc/sys/kernel/sched_autogroup_enabled; a value of 0 disables the
       feature, while a value of 1 enables it.  The default value in this
       file is 1, unless the kernel was booted with the noautogroup
       parameter.

       A new autogroup is created when a new session is created via
       setsid(2); this happens, for example, when a new terminal window is
       started.  A new process created by fork(2) inherits its parent's
       autogroup membership.  Thus, all of the processes in a session are
       members of the same autogroup.  An autogroup is automatically
       destroyed when the last process in the group terminates.

       When autogrouping is enabled, all of the members of an autogroup are
       placed in the same kernel scheduler "task group".  The CFS scheduler
       employs an algorithm that equalizes the distribution of CPU cycles
       across task groups.  The benefits of this for interactive desktop
       performance can be described via the following example.

       Suppose that there are two autogroups competing for the same CPU
       (i.e., presume either a single CPU system or the use of taskset(1) to
       confine all the processes to the same CPU on an SMP system).  The
       first group contains ten CPU-bound processes from a kernel build
       started with make -j10.  The other contains a single CPU-bound
       process: a video player.  The effect of autogrouping is that the two
       groups will each receive half of the CPU cycles.  That is, the video
       player will receive 50% of the CPU cycles, rather than just 9% of the
       cycles, which would likely lead to degraded video playback.  The
       situation on an SMP system is more complex, but the general effect is
       the same: the scheduler distributes CPU cycles across task groups
       such that an autogroup that contains a large number of CPU-bound
       processes does not end up hogging CPU cycles at the expense of the
       other jobs on the system.

       A process's autogroup (task group) membership can be viewed via the
       file /proc/[pid]/autogroup:

           $ cat /proc/1/autogroup
           /autogroup-1 nice 0

       This file can also be used to modify the CPU bandwidth allocated to
       an autogroup.  This is done by writing a number in the "nice" range
       to the file to set the autogroup's nice value.  The allowed range is
       from +19 (low priority) to -20 (high priority).  (Writing values
       outside of this range causes write(2) to fail with the error EINVAL.)

       The autogroup nice setting has the same meaning as the process nice
       value, but applies to distribution of CPU cycles to the autogroup as
       a whole, based on the relative nice values of other autogroups.  For
       a process inside an autogroup, the CPU cycles that it receives will
       be a product of the autogroup's nice value (compared to other
       autogroups) and the process's nice value (compared to other processes
       in the same autogroup).

       The use of the cgroups(7) CPU controller to place processes in
       cgroups other than the root CPU cgroup overrides the effect of
       autogrouping.

       The autogroup feature groups only processes scheduled under
       non-real-time policies (SCHED_OTHER, SCHED_BATCH, and SCHED_IDLE).
       It does not group processes scheduled under real-time and deadline
       policies.  Those processes are scheduled according to the rules
       described earlier.

   The nice value and group scheduling
       When scheduling non-real-time processes (i.e., those scheduled under
       the SCHED_OTHER, SCHED_BATCH, and SCHED_IDLE policies), the CFS
       scheduler employs a technique known as "group scheduling", if the
       kernel was configured with the CONFIG_FAIR_GROUP_SCHED option (which
       is typical).

       Under group scheduling, threads are scheduled in "task groups".  Task
       groups have a hierarchical relationship, rooted under the initial
       task group on the system, known as the "root task group".  Task
       groups are formed in the following circumstances:

       *  All of the threads in a CPU cgroup form a task group.  The parent
          of this task group is the task group of the corresponding parent
          cgroup.

       *  If autogrouping is enabled, then all of the threads that are
          (implicitly) placed in an autogroup (i.e., the same session, as
          created by setsid(2)) form a task group.  Each new autogroup is
          thus a separate task group.  The root task group is the parent of
          all such autogroups.

       *  If autogrouping is enabled, then the root task group consists of
          all processes in the root CPU cgroup that were not otherwise
          implicitly placed into a new autogroup.

       *  If autogrouping is disabled, then the root task group consists of
          all processes in the root CPU cgroup.

       *  If group scheduling was disabled (i.e., the kernel was configured
          without CONFIG_FAIR_GROUP_SCHED), then all of the processes on the
          system are notionally placed in a single task group.

       Under group scheduling, a thread's nice value has an effect for
       scheduling decisions only relative to other threads in the same task
       group.  This has some surprising consequences in terms of the
       traditional semantics of the nice value on UNIX systems.  In
       particular, if autogrouping is enabled (which is the default in
       various distributions), then employing setpriority(2) or nice(1) on a
       process has an effect only for scheduling relative to other processes
       executed in the same session (typically: the same terminal window).

       Conversely, for two processes that are (for example) the sole
       CPU-bound processes in different sessions (e.g., different terminal
       windows, each of whose jobs are tied to different autogroups),
       modifying the nice value of the process in one of the sessions has no
       effect in terms of the scheduler's decisions relative to the process
       in the other session.  A possibly useful workaround here is to use a
       command such as the following to modify the autogroup nice value for
       all of the processes in a terminal session:

           $ echo 10 > /proc/self/autogroup

   Real-time features in the mainline Linux kernel
       Since kernel version 2.6.18, Linux is gradually becoming equipped
       with real-time capabilities, most of which are derived from the
       former realtime-preempt patch set.  Until the patches have been
       completely merged into the mainline kernel, they must be installed to
       achieve the best real-time performance.  These patches are named:

           patch-kernelversion-rtpatchversion

       and can be downloaded from ⟨http://www.kernel.org/pub/linux/kernel
       /projects/rt/⟩.

       Without the patches and prior to their full inclusion into the
       mainline kernel, the kernel configuration offers only the three
       preemption classes CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_VOLUNTARY, and
       CONFIG_PREEMPT_DESKTOP, which respectively provide no, some, and
       considerable reduction of the worst-case scheduling latency.

       With the patches applied or after their full inclusion into the
       mainline kernel, the additional configuration item CONFIG_PREEMPT_RT
       becomes available.  If this is selected, Linux is transformed into a
       regular real-time operating system.  The FIFO and RR scheduling
       policies are then used to run a thread with true real-time priority
       and a minimum worst-case scheduling latency.

NOTES
       The cgroups(7) CPU controller can be used to limit the CPU
       consumption of groups of processes.

       Originally, Standard Linux was intended as a general-purpose
       operating system able to handle background processes, interactive
       applications, and less demanding real-time applications
       (applications that usually need to meet timing deadlines).  Although
       the Linux 2.6 kernel allowed for kernel preemption and the newly
       introduced O(1) scheduler ensured that the time needed to schedule is
       fixed and deterministic irrespective of the number of active tasks,
       true real-time computing was not possible up to kernel version
       2.6.17.

SEE ALSO
       chcpu(1), chrt(1), lscpu(1), ps(1), taskset(1), top(1),
       getpriority(2), mlock(2), mlockall(2), munlock(2), munlockall(2),
       nice(2), sched_get_priority_max(2), sched_get_priority_min(2),
       sched_getaffinity(2), sched_getparam(2), sched_getscheduler(2),
       sched_rr_get_interval(2), sched_setaffinity(2), sched_setparam(2),
       sched_setscheduler(2), sched_yield(2), setpriority(2),
       pthread_getaffinity_np(3), pthread_getschedparam(3),
       pthread_setaffinity_np(3), sched_getcpu(3), capabilities(7),
       cpuset(7)

       Programming for the real world - POSIX.4 by Bill O. Gallmeister,
       O'Reilly & Associates, Inc., ISBN 1-56592-074-0.

       The Linux kernel source files
       Documentation/scheduler/sched-deadline.txt,
       Documentation/scheduler/sched-rt-group.txt,
       Documentation/scheduler/sched-design-CFS.txt, and
       Documentation/scheduler/sched-nice-design.txt

COLOPHON
       This page is part of release 5.07 of the Linux man-pages project.  A
       description of the project, information about reporting bugs, and the
       latest version of this page, can be found at
       https://www.kernel.org/doc/man-pages/.

Linux                             2019-08-02                          SCHED(7)