CLONE(2)                   Linux Programmer's Manual                  CLONE(2)

NAME

       clone, __clone2, clone3 - create a child process

SYNOPSIS

       /* Prototype for the glibc wrapper function */

       #define _GNU_SOURCE
       #include <sched.h>

       int clone(int (*fn)(void *), void *stack, int flags, void *arg, ...
                 /* pid_t *parent_tid, void *tls, pid_t *child_tid */ );

       /* For the prototype of the raw clone() system call, see NOTES */

       long clone3(struct clone_args *cl_args, size_t size);

       Note: There is not yet a glibc wrapper for clone3(); see NOTES.

DESCRIPTION

       These system calls create a new ("child") process, in a manner similar
       to fork(2).

       By contrast with fork(2), these system calls provide more precise
       control over what pieces of execution context are shared between the
       calling process and the child process.  For example, using these
       system calls, the caller can control whether or not the two processes
       share the virtual address space, the table of file descriptors, and the
       table of signal handlers.  These system calls also allow the new child
       process to be placed in separate namespaces(7).

       Note that in this manual page, "calling process" normally corresponds
       to "parent process".  But see the descriptions of CLONE_PARENT and
       CLONE_THREAD below.

       This page describes the following interfaces:

       *  The glibc clone() wrapper function and the underlying system call on
          which it is based.  The main text describes the wrapper function;
          the differences for the raw system call are described toward the end
          of this page.

       *  The newer clone3() system call.

       In the remainder of this page, the terminology "the clone call" is used
       when noting details that apply to all of these interfaces.

   The clone() wrapper function
       When the child process is created with the clone() wrapper function, it
       commences execution by calling the function pointed to by the argument
       fn.  (This differs from fork(2), where execution continues in the child
       from the point of the fork(2) call.)  The arg argument is passed as the
       argument of the function fn.

       When the fn(arg) function returns, the child process terminates.  The
       integer returned by fn is the exit status for the child process.  The
       child process may also terminate explicitly by calling exit(2) or after
       receiving a fatal signal.

       The stack argument specifies the location of the stack used by the
       child process.  Since the child and calling process may share memory,
       it is not possible for the child process to execute in the same stack
       as the calling process.  The calling process must therefore set up
       memory space for the child stack and pass a pointer to this space to
       clone().  Stacks grow downward on all processors that run Linux (except
       the HP PA processors), so stack usually points to the topmost address
       of the memory space set up for the child stack.  Note that clone() does
       not provide a means whereby the caller can inform the kernel of the
       size of the stack area.
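
       The following sketch illustrates the stack convention described above.
       It allocates the child stack with malloc(3), passes the topmost
       address to clone(), and sets CLONE_VM so that a value written by the
       child is visible to the parent.  It assumes a downward-growing stack
       (that is, not HP PA); the 1 MiB size and the names child_fn(),
       run_shared_vm_child(), and shared_value are illustrative, not part of
       any API.

```c
#define _GNU_SOURCE
#include <sched.h>      /* clone(), CLONE_VM */
#include <signal.h>     /* SIGCHLD */
#include <stdlib.h>     /* malloc(), free() */
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), WIFEXITED(), WEXITSTATUS() */

#define CHILD_STACK_SIZE (1024 * 1024)  /* illustrative size: 1 MiB */

/* Written by the child; the parent sees the write only because of CLONE_VM. */
static volatile int shared_value = 0;

/* clone() starts the child here, calling child_fn(arg). */
static int child_fn(void *arg)
{
    shared_value = *(int *) arg;  /* store into the shared address space */
    return 7;                     /* returning from fn sets the exit status */
}

/* Returns the child's exit status, or -1 on failure. */
static int run_shared_vm_child(int value)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* Assumes a downward-growing stack: pass the topmost address. */
    char *stack_top = stack + CHILD_STACK_SIZE;

    /* The low byte of flags is the termination signal; SIGCHLD lets a
       plain waitpid() collect the child. */
    pid_t pid = clone(child_fn, stack_top, CLONE_VM | SIGCHLD, &value);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    pid_t w = waitpid(pid, &status, 0);
    free(stack);                  /* the child no longer runs on it */
    if (w == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

       Because the caller chose the stack area, it is also responsible for
       freeing it, which is safe here only after waitpid(2) has reported that
       the child terminated.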

       The remaining arguments to clone() are discussed below.

   clone3()
       The clone3() system call provides a superset of the functionality of
       the older clone() interface.  It also provides a number of API
       improvements, including: space for additional flags bits; cleaner
       separation in the use of various arguments; and the ability to specify
       the size of the child's stack area.

       As with fork(2), clone3() returns in both the parent and the child.  It
       returns 0 in the child process and returns the PID of the child in the
       parent.

       The cl_args argument of clone3() is a structure of the following form:

           struct clone_args {
               u64 flags;        /* Flags bit mask */
               u64 pidfd;        /* Where to store PID file descriptor
                                    (int *) */
               u64 child_tid;    /* Where to store child TID,
                                    in child's memory (pid_t *) */
               u64 parent_tid;   /* Where to store child TID,
                                    in parent's memory (pid_t *) */
               u64 exit_signal;  /* Signal to deliver to parent on
                                    child termination */
               u64 stack;        /* Pointer to lowest byte of stack */
               u64 stack_size;   /* Size of stack */
               u64 tls;          /* Location of new TLS */
               u64 set_tid;      /* Pointer to a pid_t array
                                    (since Linux 5.5) */
               u64 set_tid_size; /* Number of elements in set_tid
                                    (since Linux 5.5) */
               u64 cgroup;       /* File descriptor for target cgroup
                                    of child (since Linux 5.7) */
           };

       The size argument that is supplied to clone3() should be initialized to
       the size of this structure.  (The existence of the size argument
       permits future extensions to the clone_args structure.)

       The stack for the child process is specified via cl_args.stack, which
       points to the lowest byte of the stack area, and cl_args.stack_size,
       which specifies the size of the stack in bytes.  In the case where the
       CLONE_VM flag (see below) is specified, a stack must be explicitly
       allocated and specified.  Otherwise, these two fields can be specified
       as NULL and 0, which causes the child to use the same stack area as the
       parent (in the child's own virtual address space).
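
       Since glibc provides no clone3() wrapper, a program can invoke the
       system call through syscall(2).  The sketch below makes a fork(2)-like
       call: the structure starts zeroed, size is sizeof(struct clone_args),
       exit_signal is SIGCHLD, and stack and stack_size are left as 0 so that
       the child runs on the same stack area as the parent, in its own
       address space.  It assumes kernel headers recent enough (Linux 5.3 or
       later) to provide struct clone_args in <linux/sched.h> and the clone3
       system call number; the helper name clone3_exit_status() is
       hypothetical.

```c
#include <errno.h>         /* errno, ENOSYS */
#include <linux/sched.h>   /* struct clone_args (kernel headers >= 5.3) */
#include <signal.h>        /* SIGCHLD */
#include <string.h>        /* memset() */
#include <sys/syscall.h>   /* SYS_clone3 / __NR_clone3 */
#include <sys/types.h>
#include <sys/wait.h>      /* waitpid() */
#include <unistd.h>        /* syscall(), _exit() */

#ifndef SYS_clone3
#define SYS_clone3 __NR_clone3  /* older C library headers */
#endif

/* Hypothetical helper: create a child with clone3(), have it exit with
   `code`, and return the status collected by the parent, or -1 with
   errno set if clone3() or waitpid() fails. */
static int clone3_exit_status(int code)
{
    struct clone_args args;
    memset(&args, 0, sizeof(args));  /* start with every field zeroed */
    args.exit_signal = SIGCHLD;      /* notify the parent, as fork(2) does */
    /* args.stack and args.stack_size stay 0: without CLONE_VM the child
       keeps using its copy of the parent's stack. */

    long ret = syscall(SYS_clone3, &args, sizeof(args));
    if (ret == -1)
        return -1;                   /* e.g., ENOSYS on kernels before 5.3 */
    if (ret == 0)
        _exit(code);                 /* child: clone3() returned 0 */

    /* Parent: ret is the child's PID. */
    int status;
    if (waitpid((pid_t) ret, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

       As with fork(2), the child resumes at the point of the call and is
       distinguished from the parent only by the return value 0.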

       The remaining fields in the cl_args argument are discussed below.

   Equivalence between clone() and clone3() arguments
       Unlike the older clone() interface, where arguments are passed
       individually, in the newer clone3() interface the arguments are
       packaged into the clone_args structure shown above.  This structure
       allows for a superset of the information passed via the clone()
       arguments.

       The following table shows the equivalence between the arguments of
       clone() and the fields in the clone_args argument supplied to clone3():

              clone()         clone3()        Notes
                              cl_args field
              flags & ~0xff   flags           For most flags; details below
              parent_tid      pidfd           See CLONE_PIDFD
              child_tid       child_tid       See CLONE_CHILD_SETTID
              parent_tid      parent_tid      See CLONE_PARENT_SETTID
              flags & 0xff    exit_signal
              stack           stack
              ---             stack_size
              tls             tls             See CLONE_SETTLS
              ---             set_tid         See below for details
              ---             set_tid_size
              ---             cgroup          See CLONE_INTO_CGROUP
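
       The table can be read as a translation recipe.  The hypothetical
       helper below, clone_args_from_clone(), packs clone()-style arguments
       into a struct clone_args.  One difference is not visible in the table:
       clone() is usually given the topmost stack address, whereas
       cl_args.stack is the lowest byte, so the helper takes the stack base
       and size instead.  It assumes <linux/sched.h> defines struct
       clone_args and CLONE_PIDFD.

```c
#include <linux/sched.h>   /* struct clone_args, CLONE_* flags */
#include <signal.h>        /* SIGCHLD, used by callers */
#include <stdint.h>        /* uintptr_t */
#include <string.h>        /* memset() */
#include <sys/types.h>

/* Hypothetical helper: translate clone()-style arguments into a
   struct clone_args, following the equivalence table. */
static struct clone_args
clone_args_from_clone(unsigned long flags, void *stack_base,
                      size_t stack_size, pid_t *parent_tid,
                      void *tls, pid_t *child_tid)
{
    struct clone_args ca;
    memset(&ca, 0, sizeof(ca));

    ca.flags = flags & ~0xffUL;         /* flag bits */
    ca.exit_signal = flags & 0xff;      /* termination signal (low byte) */

    if (flags & CLONE_PIDFD)            /* parent_tid doubles as pidfd */
        ca.pidfd = (uintptr_t) parent_tid;
    else
        ca.parent_tid = (uintptr_t) parent_tid;

    ca.child_tid = (uintptr_t) child_tid;
    ca.stack = (uintptr_t) stack_base;  /* lowest byte, not the top */
    ca.stack_size = stack_size;
    ca.tls = (uintptr_t) tls;
    return ca;
}
```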

   The child termination signal
       When the child process terminates, a signal may be sent to the parent.
       The termination signal is specified in the low byte of flags (clone())
       or in cl_args.exit_signal (clone3()).  If this signal is specified as
       anything other than SIGCHLD, then the parent process must specify the
       __WALL or __WCLONE options when waiting for the child with wait(2).  If
       no signal (i.e., zero) is specified, then the parent process is not
       signaled when the child terminates.
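
       As a minimal sketch of the zero-signal case (the function names are
       hypothetical), the child below is created with 0 in the low byte of
       flags, so no signal is sent when it terminates, and the parent passes
       __WALL to waitpid(2) to reap it.

```c
#define _GNU_SOURCE
#include <sched.h>      /* clone() */
#include <stdlib.h>     /* malloc(), free() */
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), __WALL */

#define STACK_SIZE (64 * 1024)   /* illustrative */

static int quiet_child(void *arg)
{
    (void) arg;
    return 5;                    /* exit status seen by the parent */
}

/* Returns the exit status of a child created with no termination
   signal, or -1 on failure. */
static int reap_signalless_child(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* Low byte of flags is 0: no signal is sent when the child exits.
       No CLONE_VM, so the child gets its own copy of memory. */
    pid_t pid = clone(quiet_child, stack + STACK_SIZE, 0, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    /* The termination signal is not SIGCHLD, so __WALL (or __WCLONE)
       is needed for waitpid() to consider this child. */
    int status;
    pid_t w = waitpid(pid, &status, __WALL);
    free(stack);
    if (w == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```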

   The set_tid array
       By default, the kernel chooses the next sequential PID for the new
       process in each of the PID namespaces where it is present.  When
       creating a process with clone3(), the set_tid array (available since
       Linux 5.5) can be used to select specific PIDs for the process in some
       or all of the PID namespaces where it is present.  If the PID of the
       newly created process should be set only for the current PID namespace
       or in the newly created PID namespace (if flags contains CLONE_NEWPID),
       then the first element in the set_tid array has to be the desired PID
       and set_tid_size needs to be 1.

       If the PID of the newly created process should have a certain value in
       multiple PID namespaces, then the set_tid array can have multiple
       entries.  The first entry defines the PID in the most deeply nested PID
       namespace, and each of the following entries contains the PID in the
       corresponding ancestor PID namespace.  The number of PID namespaces in
       which a PID should be set is defined by set_tid_size, which cannot be
       larger than the number of currently nested PID namespaces.

       To create a process with the following PIDs in a PID namespace
       hierarchy:

              PID NS level   Requested PID   Notes
              0              31496           Outermost PID namespace
              1              42
              2              7               Innermost PID namespace

       Set the array to:

           set_tid[0] = 7;
           set_tid[1] = 42;
           set_tid[2] = 31496;
           set_tid_size = 3;

       If only the PIDs in the two innermost PID namespaces need to be
       specified, set the array to:

           set_tid[0] = 7;
           set_tid[1] = 42;
           set_tid_size = 2;

       The PID in the PID namespaces outside the two innermost PID namespaces
       will be selected the same way as any other PID is selected.

       The set_tid feature requires CAP_SYS_ADMIN in all owning user
       namespaces of the target PID namespaces.

       Callers may only choose a PID greater than 1 in a given PID namespace
       if an init process (i.e., a process with PID 1) already exists in that
       namespace.  Otherwise, the PID entry for this PID namespace must be 1.
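
       Because set_tid is ordered from the innermost namespace outward, code
       that numbers nesting levels from the outermost namespace, as in the
       table above, must reverse the order.  The hypothetical helper
       build_set_tid() below does so and returns the value for set_tid_size;
       a caller would then store the array address in cl_args.set_tid and the
       count in cl_args.set_tid_size.  The helper does not check the privilege
       or init-process requirements just described.

```c
#include <stddef.h>      /* size_t */
#include <sys/types.h>   /* pid_t */

/* Hypothetical helper.  pids_by_level[i] is the PID wanted at nesting
   level i (level 0 = outermost, level nlevels-1 = innermost).  Copies
   the `count` innermost entries into set_tid, innermost first, and
   returns the value for set_tid_size, or 0 if count is out of range. */
static size_t build_set_tid(const pid_t *pids_by_level, size_t nlevels,
                            size_t count, pid_t *set_tid)
{
    if (count == 0 || count > nlevels)
        return 0;
    for (size_t i = 0; i < count; i++)
        set_tid[i] = pids_by_level[nlevels - 1 - i];  /* innermost first */
    return count;
}
```

       With the table's values {31496, 42, 7} and count 3, the helper yields
       set_tid = {7, 42, 31496}; with count 2, it yields {7, 42}.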

   The flags mask
       Both clone() and clone3() allow a flags bit mask that modifies their
       behavior and allows the caller to specify what is shared between the
       calling process and the child process.  This bit mask (the flags
       argument of clone() or the cl_args.flags field passed to clone3()) is
       referred to as the flags mask in the remainder of this page.

       The flags mask is specified as a bitwise-OR of zero or more of the
       constants listed below.  Except as noted below, these flags are
       available (and have the same effect) in both clone() and clone3().

       CLONE_CHILD_CLEARTID (since Linux 2.5.49)
              Clear (zero) the child thread ID at the location pointed to by
              child_tid (clone()) or cl_args.child_tid (clone3()) in child
              memory when the child exits, and do a wakeup on the futex at
              that address.  The address involved may be changed by the
              set_tid_address(2) system call.  This is used by threading
              libraries.

       CLONE_CHILD_SETTID (since Linux 2.5.49)
              Store the child thread ID at the location pointed to by
              child_tid (clone()) or cl_args.child_tid (clone3()) in the
              child's memory.  The store operation completes before the clone
              call returns control to user space in the child process.  (Note
              that the store operation may not have completed before the clone
              call returns in the parent process, which will be relevant if
              the CLONE_VM flag is also employed.)

       CLONE_CLEAR_SIGHAND (since Linux 5.5)
              By default, signal dispositions in the child thread are the same
              as in the parent.  If this flag is specified, then all signals
              that are handled in the parent are reset to their default
              dispositions (SIG_DFL) in the child.

              Specifying this flag together with CLONE_SIGHAND is nonsensical
              and disallowed.

       CLONE_DETACHED (historical)
              For a while (during the Linux 2.5 development series) there was
              a CLONE_DETACHED flag, which caused the parent not to receive a
              signal when the child terminated.  Ultimately, the effect of
              this flag was subsumed under the CLONE_THREAD flag and by the
              time Linux 2.6.0 was released, this flag had no effect.
              Starting in Linux 2.6.2, the need to give this flag together
              with CLONE_THREAD disappeared.

              This flag is still defined, but it is usually ignored when
              calling clone().  However, see the description of CLONE_PIDFD
              for some exceptions.

       CLONE_FILES (since Linux 2.0)
              If CLONE_FILES is set, the calling process and the child process
              share the same file descriptor table.  Any file descriptor
              created by the calling process or by the child process is also
              valid in the other process.  Similarly, if one of the processes
              closes a file descriptor, or changes its associated flags (using
              the fcntl(2) F_SETFD operation), the other process is also
              affected.  If a process sharing a file descriptor table calls
              execve(2), its file descriptor table is duplicated (unshared).

              If CLONE_FILES is not set, the child process inherits a copy of
              all file descriptors opened in the calling process at the time
              of the clone call.  Subsequent operations that open or close
              file descriptors, or change file descriptor flags, performed by
              either the calling process or the child process do not affect
              the other process.  Note, however, that the duplicated file
              descriptors in the child refer to the same open file
              descriptions as the corresponding file descriptors in the
              calling process, and thus share file offsets and file status
              flags (see open(2)).
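
              A minimal sketch of the shared table, assuming /dev/null is
              readable: a child created with CLONE_FILES (but not CLONE_VM)
              opens a file and reports the new descriptor number through its
              exit status, which works here only for numbers below 255.
              Because the table is shared, that descriptor remains valid in
              the parent.  The function names are hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* open(), fcntl() */
#include <sched.h>      /* clone(), CLONE_FILES */
#include <signal.h>     /* SIGCHLD */
#include <stdlib.h>     /* malloc(), free() */
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid() */
#include <unistd.h>     /* close() */

#define STACK_SIZE (64 * 1024)   /* illustrative */

/* Child: open a file and report the descriptor number as exit status. */
static int open_in_child(void *arg)
{
    int fd = open((const char *) arg, O_RDONLY);
    return fd < 0 ? 255 : fd;    /* 255 serves as a failure marker here */
}

/* Returns a descriptor opened by a CLONE_FILES child, or -1. */
static int fd_from_sharing_child(const char *path)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    pid_t pid = clone(open_in_child, stack + STACK_SIZE,
                      CLONE_FILES | SIGCHLD, (void *) path);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    pid_t w = waitpid(pid, &status, 0);
    free(stack);
    if (w == -1 || !WIFEXITED(status) || WEXITSTATUS(status) == 255)
        return -1;

    int fd = WEXITSTATUS(status);
    /* Valid in the parent only because the table is shared. */
    return fcntl(fd, F_GETFD) == -1 ? -1 : fd;
}
```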

       CLONE_FS (since Linux 2.0)
              If CLONE_FS is set, the caller and the child process share the
              same filesystem information.  This includes the root of the
              filesystem, the current working directory, and the umask.  Any
              call to chroot(2), chdir(2), or umask(2) performed by the
              calling process or the child process also affects the other
              process.

              If CLONE_FS is not set, the child process works on a copy of the
              filesystem information of the calling process at the time of the
              clone call.  Calls to chroot(2), chdir(2), or umask(2) performed
              later by one of the processes do not affect the other process.

       CLONE_INTO_CGROUP (since Linux 5.7)
              By default, a child process is placed in the same version 2
              cgroup as its parent.  The CLONE_INTO_CGROUP flag allows the
              child process to be created in a different version 2 cgroup.
              (Note that CLONE_INTO_CGROUP has effect only for version 2
              cgroups.)

              In order to place the child process in a different cgroup, the
              caller specifies CLONE_INTO_CGROUP in cl_args.flags and passes a
              file descriptor that refers to a version 2 cgroup in the
              cl_args.cgroup field.  (This file descriptor can be obtained by
              opening a cgroup v2 directory using either the O_RDONLY or the
              O_PATH flag.)  Note that all of the usual restrictions
              (described in cgroups(7)) on placing a process into a version 2
              cgroup apply.

              Among the possible use cases for CLONE_INTO_CGROUP are the
              following:

              *  Spawning a process into a cgroup different from the parent's
                 cgroup makes it possible for a service manager to directly
                 spawn new services into dedicated cgroups.  This eliminates
                 the accounting jitter that would be caused if the child
                 process was first created in the same cgroup as the parent
                 and then moved into the target cgroup.  Furthermore, spawning
                 the child process directly into a target cgroup is
                 significantly cheaper than moving the child process into the
                 target cgroup after it has been created.

              *  The CLONE_INTO_CGROUP flag also allows the creation of frozen
                 child processes by spawning them into a frozen cgroup.  (See
                 cgroups(7) for a description of the freezer controller.)

              *  For threaded applications (or even thread implementations
                 which make use of cgroups to limit individual threads), it is
                 possible to establish a fixed cgroup layout before spawning
                 each thread directly into its target cgroup.

       CLONE_IO (since Linux 2.6.25)
              If CLONE_IO is set, then the new process shares an I/O context
              with the calling process.  If this flag is not set, then (as
              with fork(2)) the new process has its own I/O context.

              The I/O context is the I/O scope of the disk scheduler (i.e.,
              what the I/O scheduler uses to model scheduling of a process's
              I/O).  If processes share the same I/O context, they are treated
              as one by the I/O scheduler.  As a consequence, they get to
              share disk time.  For some I/O schedulers, if two processes
              share an I/O context, they will be allowed to interleave their
              disk access.  If several threads are doing I/O on behalf of the
              same process (aio_read(3), for instance), they should employ
              CLONE_IO to get better I/O performance.

              If the kernel is not configured with the CONFIG_BLOCK option,
              this flag is a no-op.

       CLONE_NEWCGROUP (since Linux 4.6)
              Create the process in a new cgroup namespace.  If this flag is
              not set, then (as with fork(2)) the process is created in the
              same cgroup namespace as the calling process.

              For further information on cgroup namespaces, see
              cgroup_namespaces(7).

              Only a privileged process (CAP_SYS_ADMIN) can employ
              CLONE_NEWCGROUP.

       CLONE_NEWIPC (since Linux 2.6.19)
              If CLONE_NEWIPC is set, then create the process in a new IPC
              namespace.  If this flag is not set, then (as with fork(2)) the
              process is created in the same IPC namespace as the calling
              process.

              For further information on IPC namespaces, see
              ipc_namespaces(7).

              Only a privileged process (CAP_SYS_ADMIN) can employ
              CLONE_NEWIPC.  This flag can't be specified in conjunction with
              CLONE_SYSVSEM.

       CLONE_NEWNET (since Linux 2.6.24)
              (The implementation of this flag was completed only by about
              kernel version 2.6.29.)

              If CLONE_NEWNET is set, then create the process in a new network
              namespace.  If this flag is not set, then (as with fork(2)) the
              process is created in the same network namespace as the calling
              process.

              For further information on network namespaces, see
              network_namespaces(7).

              Only a privileged process (CAP_SYS_ADMIN) can employ
              CLONE_NEWNET.

       CLONE_NEWNS (since Linux 2.4.19)
              If CLONE_NEWNS is set, the cloned child is started in a new
              mount namespace, initialized with a copy of the namespace of the
              parent.  If CLONE_NEWNS is not set, the child lives in the same
              mount namespace as the parent.

              For further information on mount namespaces, see namespaces(7)
              and mount_namespaces(7).

              Only a privileged process (CAP_SYS_ADMIN) can employ
              CLONE_NEWNS.  It is not permitted to specify both CLONE_NEWNS
              and CLONE_FS in the same clone call.

       CLONE_NEWPID (since Linux 2.6.24)
              If CLONE_NEWPID is set, then create the process in a new PID
              namespace.  If this flag is not set, then (as with fork(2)) the
              process is created in the same PID namespace as the calling
              process.

              For further information on PID namespaces, see namespaces(7) and
              pid_namespaces(7).

              Only a privileged process (CAP_SYS_ADMIN) can employ
              CLONE_NEWPID.  This flag can't be specified in conjunction with
              CLONE_THREAD or CLONE_PARENT.

       CLONE_NEWUSER
              (This flag first became meaningful for clone() in Linux 2.6.23,
              the current clone() semantics were merged in Linux 3.5, and the
              final pieces to make the user namespaces completely usable were
              merged in Linux 3.8.)

              If CLONE_NEWUSER is set, then create the process in a new user
              namespace.  If this flag is not set, then (as with fork(2)) the
              process is created in the same user namespace as the calling
              process.

              For further information on user namespaces, see namespaces(7)
              and user_namespaces(7).

              Before Linux 3.8, use of CLONE_NEWUSER required that the caller
              have three capabilities: CAP_SYS_ADMIN, CAP_SETUID, and
              CAP_SETGID.  Starting with Linux 3.8, no privileges are needed
              to create a user namespace.

              This flag can't be specified in conjunction with CLONE_THREAD or
              CLONE_PARENT.  For security reasons, CLONE_NEWUSER cannot be
              specified in conjunction with CLONE_FS.

       CLONE_NEWUTS (since Linux 2.6.19)
              If CLONE_NEWUTS is set, then create the process in a new UTS
              namespace, whose identifiers are initialized by duplicating the
              identifiers from the UTS namespace of the calling process.  If
              this flag is not set, then (as with fork(2)) the process is
              created in the same UTS namespace as the calling process.

              For further information on UTS namespaces, see
              uts_namespaces(7).

              Only a privileged process (CAP_SYS_ADMIN) can employ
              CLONE_NEWUTS.

       CLONE_PARENT (since Linux 2.3.12)
              If CLONE_PARENT is set, then the parent of the new child (as
              returned by getppid(2)) will be the same as that of the calling
              process.

              If CLONE_PARENT is not set, then (as with fork(2)) the child's
              parent is the calling process.

              Note that it is the parent process, as returned by getppid(2),
              which is signaled when the child terminates, so that if
              CLONE_PARENT is set, then the parent of the calling process,
              rather than the calling process itself, will be signaled.

              The CLONE_PARENT flag can't be used in clone calls by the global
              init process (PID 1 in the initial PID namespace) and init
              processes in other PID namespaces.  This restriction prevents
              the creation of multi-rooted process trees as well as the
              creation of unreapable zombies in the initial PID namespace.

       CLONE_PARENT_SETTID (since Linux 2.5.49)
              Store the child thread ID at the location pointed to by
              parent_tid (clone()) or cl_args.parent_tid (clone3()) in the
              parent's memory.  (In Linux 2.5.32-2.5.48 there was a flag
              CLONE_SETTID that did this.)  The store operation completes
              before the clone call returns control to user space.
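
              For example, in the following sketch (the function names are
              hypothetical) the parent passes CLONE_PARENT_SETTID and a
              parent_tid pointer to the clone() wrapper, then compares the
              stored value with the value clone() returned.  Both are the
              child's TID, so they are expected to match.

```c
#define _GNU_SOURCE
#include <sched.h>      /* clone(), CLONE_PARENT_SETTID */
#include <signal.h>     /* SIGCHLD */
#include <stdlib.h>     /* malloc(), free() */
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid() */

#define STACK_SIZE (64 * 1024)   /* illustrative */

static int trivial_child(void *arg)
{
    (void) arg;
    return 0;
}

/* Returns 1 if the TID stored via CLONE_PARENT_SETTID matches the value
   returned by clone(), 0 if not, and -1 on failure. */
static int parent_settid_matches(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    pid_t ptid = 0;              /* the kernel stores the child TID here */
    pid_t pid = clone(trivial_child, stack + STACK_SIZE,
                      CLONE_PARENT_SETTID | SIGCHLD, NULL, &ptid);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    /* The store has completed by the time clone() returns. */
    int match = (ptid == pid);

    waitpid(pid, NULL, 0);
    free(stack);
    return match;
}
```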

       CLONE_PID (Linux 2.0 to 2.5.15)
              If CLONE_PID is set, the child process is created with the same
              process ID as the calling process.  This is good for hacking the
              system, but otherwise of not much use.  From Linux 2.3.21
              onward, this flag could be specified only by the system boot
              process (PID 0).  The flag disappeared completely from the
              kernel sources in Linux 2.5.16.  Subsequently, the kernel
              silently ignored this bit if it was specified in the flags mask.
              Much later, the same bit was recycled for use as the CLONE_PIDFD
              flag.

       CLONE_PIDFD (since Linux 5.2)
              If this flag is specified, a PID file descriptor referring to
              the child process is allocated and placed at a specified
              location in the parent's memory.  The close-on-exec flag is set
              on this new file descriptor.  PID file descriptors can be used
              for the purposes described in pidfd_open(2).

              *  When using clone3(), the PID file descriptor is placed at the
                 location pointed to by cl_args.pidfd.

              *  When using clone(), the PID file descriptor is placed at the
                 location pointed to by parent_tid.  Since the parent_tid
                 argument is used to return the PID file descriptor,
                 CLONE_PIDFD cannot be used with CLONE_PARENT_SETTID when
                 calling clone().

              It is currently not possible to use this flag together with
              CLONE_THREAD.  This means that the process identified by the PID
              file descriptor will always be a thread group leader.

              If the obsolete CLONE_DETACHED flag is specified alongside
              CLONE_PIDFD when calling clone(), an error is returned.  An
              error also results if CLONE_DETACHED is specified when calling
              clone3().  This error behavior ensures that the bit
              corresponding to CLONE_DETACHED can be reused for further PID
              file descriptor features in the future.
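
              A sketch of CLONE_PIDFD with the clone() wrapper, assuming
              Linux 5.2 or later: the descriptor arrives through parent_tid
              (declared as pid_t * but receiving an int), and the sketch
              checks that its close-on-exec flag is set.  The #ifndef fallback
              supplies the kernel's value of CLONE_PIDFD in case the C library
              headers do not define it; the function names are hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* fcntl(), FD_CLOEXEC */
#include <sched.h>      /* clone() */
#include <signal.h>     /* SIGCHLD */
#include <stdlib.h>     /* malloc(), free() */
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid() */
#include <unistd.h>     /* close() */

#ifndef CLONE_PIDFD
#define CLONE_PIDFD 0x00001000   /* value from <linux/sched.h> */
#endif

#define STACK_SIZE (64 * 1024)   /* illustrative */

static int idle_child(void *arg)
{
    (void) arg;
    return 0;
}

/* Returns 1 if clone() delivered a close-on-exec PID file descriptor,
   0 if it did not, and -1 if clone() failed. */
static int pidfd_is_cloexec(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    int pidfd = -1;
    /* With clone(), parent_tid doubles as the pidfd location, so
       CLONE_PARENT_SETTID must not be combined with CLONE_PIDFD. */
    pid_t pid = clone(idle_child, stack + STACK_SIZE,
                      CLONE_PIDFD | SIGCHLD, NULL, (pid_t *) &pidfd);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int fdflags = fcntl(pidfd, F_GETFD);
    waitpid(pid, NULL, 0);
    if (pidfd >= 0)
        close(pidfd);
    free(stack);
    return (fdflags != -1 && (fdflags & FD_CLOEXEC)) ? 1 : 0;
}
```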

       CLONE_PTRACE (since Linux 2.2)
              If CLONE_PTRACE is specified, and the calling process is being
              traced, then trace the child also (see ptrace(2)).

       CLONE_SETTLS (since Linux 2.5.32)
              The TLS (Thread Local Storage) descriptor is set to tls.

              The interpretation of tls and the resulting effect is
              architecture dependent.  On x86, tls is interpreted as a struct
              user_desc * (see set_thread_area(2)).  On x86-64 it is the new
              value to be set for the %fs base register (see the ARCH_SET_FS
              argument to arch_prctl(2)).  On architectures with a dedicated
              TLS register, it is the new value of that register.

              Use of this flag requires detailed knowledge and generally it
              should not be used except in libraries implementing threading.

       CLONE_SIGHAND (since Linux 2.0)
              If CLONE_SIGHAND is set, the calling process and the child
              process share the same table of signal handlers.  If the calling
              process or child process calls sigaction(2) to change the
              behavior associated with a signal, the behavior is changed in
              the other process as well.  However, the calling process and
              child processes still have distinct signal masks and sets of
              pending signals.  So, one of them may block or unblock signals
              using sigprocmask(2) without affecting the other process.

              If CLONE_SIGHAND is not set, the child process inherits a copy
              of the signal handlers of the calling process at the time of the
              clone call.  Calls to sigaction(2) performed later by one of the
              processes have no effect on the other process.

              Since Linux 2.6.0, the flags mask must also include CLONE_VM if
              CLONE_SIGHAND is specified.
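
              The sharing can be observed in a short sketch (the function
              names and the choice of SIGUSR1 are illustrative): a child
              created with CLONE_SIGHAND | CLONE_VM sets SIGUSR1 to SIG_IGN,
              and the parent then reads that disposition from the shared
              table without having set it itself.

```c
#define _GNU_SOURCE
#include <sched.h>      /* clone(), CLONE_SIGHAND, CLONE_VM */
#include <signal.h>     /* sigaction(), SIGUSR1, SIGCHLD */
#include <stdlib.h>     /* malloc(), free() */
#include <string.h>     /* memset() */
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid() */

#define STACK_SIZE (64 * 1024)   /* illustrative */

/* Child: ignore SIGUSR1; with CLONE_SIGHAND this edits the shared table. */
static int ignore_usr1(void *arg)
{
    (void) arg;
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL) == -1 ? 1 : 0;
}

/* Returns 1 if the parent observes the child's new SIGUSR1 disposition,
   0 if not, and -1 on failure. */
static int sighand_is_shared(void)
{
    struct sigaction dfl, now;
    memset(&dfl, 0, sizeof(dfl));
    dfl.sa_handler = SIG_DFL;
    if (sigaction(SIGUSR1, &dfl, NULL) == -1)   /* known starting point */
        return -1;

    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* Since Linux 2.6.0, CLONE_SIGHAND requires CLONE_VM as well. */
    pid_t pid = clone(ignore_usr1, stack + STACK_SIZE,
                      CLONE_SIGHAND | CLONE_VM | SIGCHLD, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    pid_t w = waitpid(pid, &status, 0);
    free(stack);
    if (w == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    if (sigaction(SIGUSR1, NULL, &now) == -1)
        return -1;
    return now.sa_handler == SIG_IGN ? 1 : 0;
}
```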
545
546       CLONE_STOPPED (since Linux 2.6.0)
547              If CLONE_STOPPED is set, then the child is initially stopped (as
548              though  it  was  sent  a SIGSTOP signal), and must be resumed by
549              sending it a SIGCONT signal.
550
551              This flag was deprecated  from  Linux  2.6.25  onward,  and  was
552              removed  altogether  in  Linux  2.6.38.   Since then, the kernel
553              silently ignores it without error.  Starting with Linux 4.6, the
554              same bit was reused for the CLONE_NEWCGROUP flag.
555
556       CLONE_SYSVSEM (since Linux 2.5.10)
557              If  CLONE_SYSVSEM is set, then the child and the calling process
558              share a single list of System V  semaphore  adjustment  (semadj)
559              values  (see  semop(2)).   In this case, the shared list accumu‐
560              lates semadj values across all processes sharing the  list,  and
561              semaphore  adjustments  are performed only when the last process
562              that is sharing the list terminates (or ceases sharing the  list
563              using  unshare(2)).  If this flag is not set, then the child has
564              a separate semadj list that is initially empty.
565
566       CLONE_THREAD (since Linux 2.4.0)
567              If CLONE_THREAD is set, the child is placed in the  same  thread
568              group as the calling process.  To make the remainder of the dis‐
569              cussion of CLONE_THREAD more readable, the term "thread" is used
570              to refer to the processes within a thread group.
571
572              Thread  groups  were a feature added in Linux 2.4 to support the
573              POSIX threads notion of a set of threads  that  share  a  single
574              PID.   Internally, this shared PID is the so-called thread group
575              identifier (TGID) for the thread group.  Since Linux 2.4,  calls
576              to getpid(2) return the TGID of the caller.
577
578              The  threads  within a group can be distinguished by their (sys‐
579              tem-wide) unique thread IDs (TID).  A new thread's TID is avail‐
580              able as the function result returned to the caller, and a thread
581              can obtain its own TID using gettid(2).
582
583              When a clone call is made without specifying CLONE_THREAD,  then
584              the  resulting thread is placed in a new thread group whose TGID
585              is the same as the thread's TID.  This thread is the  leader  of
586              the new thread group.
587
588              A  new  thread  created  with  CLONE_THREAD  has the same parent
589              process as the process that made  the  clone  call  (i.e.,  like
590              CLONE_PARENT), so that calls to getppid(2) return the same value
591              for all of the threads in a thread group.  When  a  CLONE_THREAD
592              thread  terminates,  the  thread  that  created it is not sent a
593              SIGCHLD (or other termination) signal; nor  can  the  status  of
594              such a thread be obtained using wait(2).  (The thread is said to
595              be detached.)
596
597              After all of the threads in a thread group terminate, the parent
598              process of the thread group is sent a SIGCHLD (or other termina‐
599              tion) signal.
600
601              If any of the threads in a thread group performs  an  execve(2),
602              then  all  threads other than the thread group leader are termi‐
603              nated, and the new program  is  executed  in  the  thread  group
604              leader.
605
606              If  one  of  the threads in a thread group creates a child using
607              fork(2), then any thread in  the  group  can  wait(2)  for  that
608              child.
609
610              Since  Linux 2.5.35, the flags mask must also include CLONE_SIG‐
611              HAND if CLONE_THREAD is specified (and note  that,  since  Linux
612              2.6.0, CLONE_SIGHAND also requires CLONE_VM to be included).
613
614              Signal  dispositions  and actions are process-wide: if an unhan‐
615              dled signal is delivered to a thread, then it will affect  (ter‐
616              minate, stop, continue, be ignored in) all members of the thread
617              group.
618
619              Each thread has its own signal mask, as set by sigprocmask(2).
620
621              A signal may be process-directed or thread-directed.  A process-
622              directed  signal  is  targeted at a thread group (i.e., a TGID),
623              and is delivered to an arbitrarily selected  thread  from  among
624              those  that  are  not  blocking  the  signal.   A  signal may be
625              process-directed because it was generated by the kernel for rea‐
626              sons  other  than  a  hardware exception, or because it was sent
627              using kill(2) or sigqueue(3).  A thread-directed signal is  tar‐
628              geted  at  (i.e., delivered to) a specific thread.  A signal may
629              be thread-directed because it was sent using tgkill(2) or
630              pthread_sigqueue(3),  or  because  the thread executed a machine
631              language instruction that triggered a hardware exception  (e.g.,
632              invalid  memory  access  triggering  SIGSEGV or a floating-point
633              exception triggering SIGFPE).
634
635              A call to sigpending(2) returns a signal set that is  the  union
636              of the pending process-directed signals and the signals that are
637              pending for the calling thread.
638
639              If a process-directed signal is delivered to a thread group, and
640              the  thread  group  has installed a handler for the signal, then
641              the handler will be invoked in exactly one, arbitrarily selected
642              member  of the thread group that has not blocked the signal.  If
643              multiple threads in a group are waiting to accept the same  sig‐
644              nal using sigwaitinfo(2), the kernel will arbitrarily select one
645              of these threads to receive the signal.
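       The following sketch shows the flag combination that the rules above
       require for a thread (CLONE_THREAD requires CLONE_SIGHAND, which
       requires CLONE_VM). It checks that the new thread reports the caller's
       TGID from getpid(2) but has its own TID. The thread cannot be reaped
       with wait(2), so the sketch waits for it using CLONE_CHILD_CLEARTID:
       when the thread exits, the kernel zeroes a TID word and performs a
       futex wake on it. The thread calls only raw system calls, because
       without CLONE_SETTLS it shares the caller's thread-local storage.
       This is a minimal sketch assuming glibc's clone() wrapper on Linux
       2.6 or later. The names run_thread_demo(), thread_fn(), seen_pid,
       and seen_tid are hypothetical.

```c
#define _GNU_SOURCE
#include <linux/futex.h>   /* FUTEX_WAIT */
#include <sched.h>         /* clone(), CLONE_* flags */
#include <sys/mman.h>      /* mmap(), munmap() */
#include <sys/syscall.h>   /* SYS_getpid, SYS_gettid, SYS_futex */
#include <sys/types.h>
#include <unistd.h>        /* syscall() */

#define THREAD_STACK_SIZE (256 * 1024)

static pid_t seen_pid, seen_tid;   /* filled in by the new thread */

static int
thread_fn(void *arg)
{
    (void) arg;
    seen_pid = (pid_t) syscall(SYS_getpid);   /* TGID of the group */
    seen_tid = (pid_t) syscall(SYS_gettid);   /* TID of this thread */
    return 0;   /* the glibc wrapper then terminates only this thread */
}

/* Creates one CLONE_THREAD thread and waits for it to exit.
   Returns 0 on success, -1 on failure. */
static int
run_thread_demo(void)
{
    const int flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND |
                      CLONE_THREAD | CLONE_SYSVSEM |
                      CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID;
    static pid_t tid_word;   /* set to the TID, zeroed at thread exit */
    char *stack;
    pid_t t;

    stack = mmap(NULL, THREAD_STACK_SIZE, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    /* No exit signal in flags: a CLONE_THREAD thread sends none */
    if (clone(thread_fn, stack + THREAD_STACK_SIZE, flags, NULL,
              &tid_word, NULL, &tid_word) == -1) {
        munmap(stack, THREAD_STACK_SIZE);
        return -1;
    }

    /* Sleep on the futex until the kernel clears tid_word */
    while ((t = __atomic_load_n(&tid_word, __ATOMIC_ACQUIRE)) != 0)
        syscall(SYS_futex, &tid_word, FUTEX_WAIT, t, NULL, NULL, 0);

    munmap(stack, THREAD_STACK_SIZE);   /* thread no longer uses it */
    return 0;
}
```

       A real thread library would also give each thread its own TLS area
       (CLONE_SETTLS) and manage stack reuse; this sketch omits both.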
646
647       CLONE_UNTRACED (since Linux 2.5.46)
648              If CLONE_UNTRACED is specified, then a  tracing  process  cannot
649              force CLONE_PTRACE on this child process.
650
651       CLONE_VFORK (since Linux 2.2)
652              If  CLONE_VFORK  is set, the execution of the calling process is
653              suspended until the child releases its virtual memory  resources
654              via a call to execve(2) or _exit(2) (as with vfork(2)).
655
656              If CLONE_VFORK is not set, then both the calling process and the
657              child are schedulable after the call, and an application  should
658              not rely on execution occurring in any particular order.
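       A minimal sketch of CLONE_VFORK combined with CLONE_VM, in the style of
       vfork(2): because the calling process stays suspended until the child
       terminates, the child's write to a shared variable is already visible
       when clone() returns, before waitpid(2) is called. The names
       vfork_demo(), vfork_child(), and child_ran are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>       /* clone(), CLONE_VM, CLONE_VFORK */
#include <signal.h>      /* SIGCHLD */
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>    /* waitpid() */
#include <unistd.h>

#define VFORK_STACK_SIZE (64 * 1024)

static volatile int child_ran;   /* shared with the child via CLONE_VM */

static int
vfork_child(void *arg)
{
    (void) arg;
    child_ran = 1;
    return 0;    /* child terminates, which releases the parent */
}

/* Returns the value of child_ran seen right after clone() returns
   (before the child is reaped), or -1 on error. */
static int
vfork_demo(void)
{
    char *stack;
    pid_t pid;
    int seen;

    stack = mmap(NULL, VFORK_STACK_SIZE, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    child_ran = 0;
    pid = clone(vfork_child, stack + VFORK_STACK_SIZE,
                CLONE_VM | CLONE_VFORK | SIGCHLD, NULL);
    if (pid == -1) {
        munmap(stack, VFORK_STACK_SIZE);
        return -1;
    }

    seen = child_ran;            /* the child has already terminated */
    waitpid(pid, NULL, 0);       /* reap it */
    munmap(stack, VFORK_STACK_SIZE);
    return seen;
}
```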
659
660       CLONE_VM (since Linux 2.0)
661              If  CLONE_VM  is  set, the calling process and the child process
662              run in the same memory space.  In particular, memory writes per‐
663              formed  by  the calling process or by the child process are also
664              visible in the other process.  Moreover, any memory  mapping  or
665              unmapping  performed  with  mmap(2) or munmap(2) by the child or
666              calling process also affects the other process.
667
668              If CLONE_VM is not set, the child process  runs  in  a  separate
669              copy  of  the memory space of the calling process at the time of
670              the clone call.  Memory writes or file mappings/unmappings  per‐
671              formed  by one of the processes do not affect the other, as with
672              fork(2).
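       The difference between a shared and a copied memory space can be
       observed directly. In this sketch, a child stores 42 into a global
       variable. After waitpid(2), the parent sees 42 only when CLONE_VM was
       specified; otherwise the write went to the child's copy and the parent
       still sees 0. The names observe_child_write(), store_value(), and
       shared_value are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>       /* clone(), CLONE_VM */
#include <signal.h>      /* SIGCHLD */
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define VM_STACK_SIZE (64 * 1024)

static volatile int shared_value;

static int
store_value(void *arg)
{
    shared_value = *(int *) arg;   /* reaches the parent only with CLONE_VM */
    return 0;
}

/* Runs a child created with 'flags' that stores 42 into shared_value,
   waits for it, and returns what the parent then sees (-1 on error). */
static int
observe_child_write(int flags)
{
    int value = 42;
    int result = -1;
    char *stack;
    pid_t pid;

    stack = mmap(NULL, VM_STACK_SIZE, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    shared_value = 0;
    pid = clone(store_value, stack + VM_STACK_SIZE, flags | SIGCHLD, &value);
    if (pid != -1 && waitpid(pid, NULL, 0) != -1)
        result = shared_value;

    munmap(stack, VM_STACK_SIZE);
    return result;
}
```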
673

RETURN VALUE

675       On success, the thread ID of the child process is returned in the call‐
676       er's  thread  of execution.  On failure, -1 is returned in the caller's
677       context, no child process is created, and errno is set to indicate the
678       error.
679

ERRORS

681       EAGAIN Too many processes are already running; see fork(2).
682
683       EBUSY (clone3() only)
684              CLONE_INTO_CGROUP  was  specified in cl_args.flags, but the file
685              descriptor specified in cl_args.cgroup refers  to  a  version  2
686              cgroup in which a domain controller is enabled.
687
688       EEXIST (clone3() only)
689              One (or more) of the PIDs specified in set_tid already exists in
690              the corresponding PID namespace.
691
692       EINVAL Both CLONE_SIGHAND and CLONE_CLEAR_SIGHAND were specified in the
693              flags mask.
694
695       EINVAL CLONE_SIGHAND  was specified in the flags mask, but CLONE_VM was
696              not.  (Since Linux 2.6.0.)
697
698       EINVAL CLONE_THREAD was specified in the flags mask, but  CLONE_SIGHAND
699              was not.  (Since Linux 2.5.35.)
700
701       EINVAL CLONE_THREAD  was  specified  in the flags mask, but the current
702              process previously called unshare(2) with the CLONE_NEWPID  flag
703              or used setns(2) to reassociate itself with a PID namespace.
704
705       EINVAL Both CLONE_FS and CLONE_NEWNS were specified in the flags mask.
706
707       EINVAL (since Linux 3.9)
708              Both  CLONE_NEWUSER  and  CLONE_FS  were  specified in the flags
709              mask.
710
711       EINVAL Both CLONE_NEWIPC and CLONE_SYSVSEM were specified in the  flags
712              mask.
713
714       EINVAL One (or both) of CLONE_NEWPID or CLONE_NEWUSER and one (or both)
715              of CLONE_THREAD or CLONE_PARENT  were  specified  in  the  flags
716              mask.
717
718       EINVAL (since Linux 2.6.32)
719              CLONE_PARENT was specified, and the caller is an init process.
720
721       EINVAL Returned  by the glibc clone() wrapper function when fn or stack
722              is specified as NULL.
723
724       EINVAL CLONE_NEWIPC was specified in the flags mask, but the kernel was
725              not   configured   with  the  CONFIG_SYSVIPC  and  CONFIG_IPC_NS
726              options.
727
728       EINVAL CLONE_NEWNET was specified in the flags mask, but the kernel was
729              not configured with the CONFIG_NET_NS option.
730
731       EINVAL CLONE_NEWPID was specified in the flags mask, but the kernel was
732              not configured with the CONFIG_PID_NS option.
733
734       EINVAL CLONE_NEWUSER was specified in the flags mask,  but  the  kernel
735              was not configured with the CONFIG_USER_NS option.
736
737       EINVAL CLONE_NEWUTS was specified in the flags mask, but the kernel was
738              not configured with the CONFIG_UTS_NS option.
739
740       EINVAL stack is not aligned to a suitable boundary for  this  architec‐
741              ture.  For example, on aarch64, stack must be a multiple of 16.
742
743       EINVAL (clone3() only)
744              CLONE_DETACHED was specified in the flags mask.
745
746       EINVAL (clone() only)
747              CLONE_PIDFD  was  specified  together with CLONE_DETACHED in the
748              flags mask.
749
750       EINVAL CLONE_PIDFD was specified  together  with  CLONE_THREAD  in  the
751              flags mask.
752
753       EINVAL (clone() only)
754              CLONE_PIDFD  was  specified together with CLONE_PARENT_SETTID in
755              the flags mask.
756
757       EINVAL (clone3() only)
758              set_tid_size is greater than the number  of  nested  PID  names‐
759              paces.
760
761       EINVAL (clone3() only)
762              One of the PIDs specified in set_tid was invalid.
763
764       EINVAL (AArch64 only, Linux 4.6 and earlier)
765              stack was not aligned to a 128-bit boundary.
766
767       ENOMEM Cannot  allocate  sufficient memory to allocate a task structure
768              for the child, or to copy those parts of  the  caller's  context
769              that need to be copied.
770
771       ENOSPC (since Linux 3.7)
772              CLONE_NEWPID  was  specified in the flags mask, but the limit on
773              the nesting depth of PID namespaces would  have  been  exceeded;
774              see pid_namespaces(7).
775
776       ENOSPC (since Linux 4.9; beforehand EUSERS)
777              CLONE_NEWUSER  was  specified  in  the  flags mask, and the call
778              would cause the limit on the number of nested user namespaces to
779              be exceeded.  See user_namespaces(7).
780
781              From  Linux  3.11 to Linux 4.8, the error diagnosed in this case
782              was EUSERS.
783
784       ENOSPC (since Linux 4.9)
785              One of the values in the flags mask specified the creation of  a
786              new  user  namespace,  but  doing so would have caused the limit
787              defined by  the  corresponding  file  in  /proc/sys/user  to  be
788              exceeded.  For further details, see namespaces(7).
789
790       EOPNOTSUPP (clone3() only)
791              CLONE_INTO_CGROUP  was  specified in cl_args.flags, but the file
792              descriptor specified in cl_args.cgroup refers  to  a  version  2
793              cgroup that is in the domain invalid state.
794
795       EPERM  CLONE_NEWCGROUP,    CLONE_NEWIPC,   CLONE_NEWNET,   CLONE_NEWNS,
796              CLONE_NEWPID, or CLONE_NEWUTS was specified by  an  unprivileged
797              process (process without CAP_SYS_ADMIN).
798
799       EPERM  CLONE_PID  was  specified  by  a  process  other than process 0.
800              (This error occurs only on Linux 2.5.15 and earlier.)
801
802       EPERM  CLONE_NEWUSER was specified in the flags mask,  but  either  the
803              effective  user  ID or the effective group ID of the caller does
804              not have a mapping in  the  parent  namespace  (see  user_names‐
805              paces(7)).
806
807       EPERM (since Linux 3.9)
808              CLONE_NEWUSER  was specified in the flags mask and the caller is
809              in a chroot environment (i.e., the caller's root directory  does
810              not  match the root directory of the mount namespace in which it
811              resides).
812
813       EPERM (clone3() only)
814              set_tid_size was greater than zero, and  the  caller  lacks  the
815              CAP_SYS_ADMIN  capability  in one or more of the user namespaces
816              that own the corresponding PID namespaces.
817
818       ERESTARTNOINTR (since Linux 2.6.17)
819              System call was interrupted by a signal and will  be  restarted.
820              (This can be seen only during a trace.)
821
822       EUSERS (Linux 3.11 to Linux 4.8)
823              CLONE_NEWUSER  was specified in the flags mask, and the limit on
824              the number of nested user namespaces would be exceeded.  See the
825              discussion of the ENOSPC error above.
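       Several of the EINVAL cases above come from flag-consistency checks
       that the kernel makes before creating anything, so they are easy to
       probe. The sketch below calls the glibc clone() wrapper with
       inconsistent flags and reports errno. It assumes Linux 2.6.0 or
       later, where both the CLONE_THREAD/CLONE_SIGHAND and the
       CLONE_SIGHAND/CLONE_VM rules apply. Namespace flags are avoided on
       purpose, because a seccomp filter (such as those used by some
       container runtimes) may reject them with EPERM before the kernel's
       own checks run. The names clone_errno() and never_runs() are
       hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>     /* clone(), CLONE_* flags */

static int
never_runs(void *arg)
{
    (void) arg;
    return 0;
}

/* Calls clone() with 'flags' (and no exit signal); returns errno if the
   call failed, or 0 if a child was unexpectedly created. */
static int
clone_errno(int flags)
{
    static char stack[16 * 1024] __attribute__((aligned(16)));

    if (clone(never_runs, stack + sizeof(stack), flags, NULL) == -1)
        return errno;
    return 0;    /* unexpected: the child is not reaped here */
}
```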
826

VERSIONS

828       The clone3() system call first appeared in Linux 5.3.
829

CONFORMING TO

831       These  system  calls  are Linux-specific and should not be used in pro‐
832       grams intended to be portable.
833

NOTES

835       One use of these system calls is to implement threads: multiple flows
836       of  control  in  a  program  that  run concurrently in a shared address
837       space.
838
839       Glibc  does  not  provide  a  wrapper  for  clone3();  call  it   using
840       syscall(2).
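       A minimal sketch of calling clone3() through syscall(2) in a
       fork(2)-like way. cl_args.stack is left as 0, so the child continues
       on a copy-on-write duplicate of the caller's stack and sees 0 as the
       return value. The sketch avoids depending on kernel headers that
       provide struct clone_args and SYS_clone3. Instead it declares its own
       copy of the original 64-byte version of the structure (the name
       struct clone_args_v0 is hypothetical). If SYS_clone3 is not defined,
       it falls back to system call number 435. That fallback is an
       assumption: it holds for x86-64 and the generic system call table,
       but not for every architecture. On kernels before 5.3, or under a
       seccomp filter that blocks clone3(), the call fails, typically with
       ENOSYS.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>       /* SIGCHLD */
#include <stdint.h>
#include <string.h>       /* memset() */
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>       /* syscall(), _exit() */

#ifndef SYS_clone3
#define SYS_clone3 435    /* assumption: number on x86-64/generic table */
#endif

/* Hypothetical local copy of the first (64-byte) struct clone_args */
struct clone_args_v0 {
    uint64_t flags;
    uint64_t pidfd;
    uint64_t child_tid;
    uint64_t parent_tid;
    uint64_t exit_signal;
    uint64_t stack;
    uint64_t stack_size;
    uint64_t tls;
};

/* Creates a child with clone3() and stores its wait status in *status.
   Returns 0 on success, -1 (errno set) on failure. */
static int
clone3_fork_demo(int *status)
{
    struct clone_args_v0 args;
    long ret;

    memset(&args, 0, sizeof(args));
    args.exit_signal = SIGCHLD;   /* lets the parent use waitpid() */

    ret = syscall(SYS_clone3, &args, sizeof(args));
    if (ret == -1)
        return -1;
    if (ret == 0)
        _exit(7);                 /* child: async-signal-safe calls only */

    if (waitpid((pid_t) ret, status, 0) == -1)
        return -1;
    return 0;
}
```

       The child calls only _exit(2), because glibc was not involved in
       creating it and its internal per-process state is not updated.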
841
842       Note  that the glibc clone() wrapper function makes some changes in the
843       memory pointed to by stack (changes required to set the stack  up  cor‐
844       rectly  for the child) before invoking the clone() system call.  So, in
845       cases where clone() is used to recursively create children, do not  use
846       the buffer employed for the parent's stack as the stack of the child.
847
848       The kcmp(2) system call can be used to test whether two processes share
849       various resources such as a file descriptor table, System  V  semaphore
850       undo operations, or a virtual address space.
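       As an illustration of kcmp(2), the following sketch creates a child
       that simply waits and compares the caller's and the child's virtual
       address spaces with KCMP_VM. It then kills the child. kcmp() returns
       0 when the two resources are the same. The sketch assumes a kernel
       built with kcmp(2) support and a caller that passes the ptrace
       access-mode check kcmp(2) performs; otherwise the call fails, for
       example with ENOSYS or EPERM. The names compare_vm_with_child() and
       park() are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/kcmp.h>   /* KCMP_VM */
#include <sched.h>        /* clone(), CLONE_VM */
#include <signal.h>       /* kill(), SIGKILL, SIGCHLD */
#include <sys/mman.h>
#include <sys/syscall.h>  /* SYS_kcmp */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define KCMP_STACK_SIZE (64 * 1024)

static int
park(void *arg)
{
    (void) arg;
    for (;;)
        pause();          /* wait here until the parent kills us */
    return 0;
}

/* Returns kcmp(KCMP_VM) for the caller and a child created with 'flags':
   0 if they share an address space, positive if not, -1 on error. */
static long
compare_vm_with_child(int flags)
{
    char *stack;
    pid_t pid;
    long res;
    int saved_errno;

    stack = mmap(NULL, KCMP_STACK_SIZE, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    pid = clone(park, stack + KCMP_STACK_SIZE, flags | SIGCHLD, NULL);
    if (pid == -1) {
        munmap(stack, KCMP_STACK_SIZE);
        return -1;
    }

    res = syscall(SYS_kcmp, getpid(), pid, KCMP_VM, 0UL, 0UL);
    saved_errno = errno;

    kill(pid, SIGKILL);           /* end the parked child and reap it */
    waitpid(pid, NULL, 0);
    munmap(stack, KCMP_STACK_SIZE);
    errno = saved_errno;
    return res;
}
```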
851
852       Handlers  registered  using pthread_atfork(3) are not executed during a
853       clone call.
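       The following sketch checks the statement above. A prepare handler
       registered with pthread_atfork(3) runs for fork(2) but not for the
       glibc clone() wrapper. The names atfork_demo(), count_prepare(),
       noop_child(), and prepare_calls are hypothetical. On glibc versions
       before 2.34, linking with -pthread may be needed for pthread_atfork().

```c
#define _GNU_SOURCE
#include <pthread.h>     /* pthread_atfork() */
#include <sched.h>       /* clone() */
#include <signal.h>      /* SIGCHLD */
#include <sys/types.h>
#include <sys/wait.h>    /* waitpid() */
#include <unistd.h>      /* fork(), _exit() */

static volatile int prepare_calls;   /* incremented by the prepare handler */

static void
count_prepare(void)
{
    prepare_calls++;
}

static int
noop_child(void *arg)
{
    (void) arg;
    return 0;
}

/* Records how many times the prepare handler has run after a clone()
   call and after a fork() call. Returns 0 on success, -1 on error. */
static int
atfork_demo(int *after_clone, int *after_fork)
{
    static char stack[64 * 1024] __attribute__((aligned(16)));
    pid_t pid;

    if (pthread_atfork(count_prepare, NULL, NULL) != 0)
        return -1;

    pid = clone(noop_child, stack + sizeof(stack), SIGCHLD, NULL);
    if (pid == -1 || waitpid(pid, NULL, 0) == -1)
        return -1;
    *after_clone = prepare_calls;     /* clone() did not run the handler */

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(0);
    if (waitpid(pid, NULL, 0) == -1)
        return -1;
    *after_fork = prepare_calls;      /* fork() ran it once */
    return 0;
}
```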
854
855       In the Linux 2.4.x series, CLONE_THREAD generally  does  not  make  the
856       parent of the new thread the same as the parent of the calling process.
857       However, for kernel versions 2.4.7  to  2.4.18  the  CLONE_THREAD  flag
858       implied the CLONE_PARENT flag (as in Linux 2.6.0 and later).
859
860       On  i386,  clone()  should not be called through vsyscall, but directly
861       through int $0x80.
862
863   C library/kernel differences
864       The raw clone() system call corresponds more closely to fork(2) in that
865       execution  in the child continues from the point of the call.  As such,
866       the fn and arg arguments of the clone() wrapper function are omitted.
867
868       In contrast to the glibc wrapper, the raw clone() system  call  accepts
869       NULL as a stack argument (and clone3() likewise allows cl_args.stack to
870       be NULL).  In this case, the child uses a  duplicate  of  the  parent's
871       stack.   (Copy-on-write  semantics  ensure that the child gets separate
872       copies of stack pages when either process modifies the stack.)  In this
873       case,  for  correct operation, the CLONE_VM option should not be speci‐
874       fied.  (If the child shares the parent's memory because of the  use  of
875       the  CLONE_VM  flag, then no copy-on-write duplication occurs and chaos
876       is likely to result.)
877
878       The order of the arguments also differs in the  raw  system  call,  and
879       there are variations in the arguments across architectures, as detailed
880       in the following paragraphs.
881
882       The raw system call interface on x86-64 and  some  other  architectures
883       (including sh, tile, and alpha) is:
884
885           long clone(unsigned long flags, void *stack,
886                      int *parent_tid, int *child_tid,
887                      unsigned long tls);
888
889       On  x86-32,  and  several  other common architectures (including score,
890       ARM, ARM 64, PA-RISC, arc, Power PC, xtensa, and MIPS),  the  order  of
891       the last two arguments is reversed:
892
893           long clone(unsigned long flags, void *stack,
894                     int *parent_tid, unsigned long tls,
895                     int *child_tid);
896
897       On  the  cris  and s390 architectures, the order of the first two argu‐
898       ments is reversed:
899
900           long clone(void *stack, unsigned long flags,
901                      int *parent_tid, int *child_tid,
902                      unsigned long tls);
903
904       On the microblaze architecture, an additional argument is supplied:
905
906           long clone(unsigned long flags, void *stack,
907                      int stack_size,         /* Size of stack */
908                      int *parent_tid, int *child_tid,
909                      unsigned long tls);
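       Because the argument order varies across architectures, code that
       invokes the raw system call directly has to take the architecture
       into account. The sketch below uses the raw clone() system call
       through syscall(2) like fork(2). It passes a NULL stack and no
       CLONE_VM, so the child continues from the point of the call. Passing
       zero for parent_tid, tls, and child_tid makes the order of the
       trailing arguments irrelevant, including the extra stack_size
       argument on microblaze. That leaves only the swapped first two
       arguments on cris and s390. The sketch assumes the compiler
       predefines __s390__ or __CRIS__ on those architectures. It has not
       been checked against the blackfin, m68k, sparc, or ia64 conventions.
       The names raw_clone_fork() and run_raw_clone_child() are
       hypothetical.

```c
#define _GNU_SOURCE
#include <signal.h>       /* SIGCHLD */
#include <sys/syscall.h>  /* SYS_clone */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>       /* syscall(), _exit() */

/* Fork-like raw clone(): exit signal SIGCHLD, NULL stack, no CLONE_VM */
static long
raw_clone_fork(void)
{
#if defined(__s390__) || defined(__CRIS__)
    /* stack first, then flags */
    return syscall(SYS_clone, 0L, (unsigned long) SIGCHLD, 0L, 0L, 0L);
#else
    /* flags first, then stack; zero trailing args make their order moot */
    return syscall(SYS_clone, (unsigned long) SIGCHLD, 0L, 0L, 0L, 0L);
#endif
}

/* Returns the child's wait status, or -1 on error. */
static int
run_raw_clone_child(int code)
{
    long pid = raw_clone_fork();
    int status;

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);    /* child resumes here, as after fork(2) */
    if (waitpid((pid_t) pid, &status, 0) == -1)
        return -1;
    return status;
}
```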
910
911   blackfin, m68k, and sparc
912       The argument-passing conventions on blackfin, m68k, and sparc are  dif‐
913       ferent  from  the descriptions above.  For details, see the kernel (and
914       glibc) source.
915
916   ia64
917       On ia64, a different interface is used:
918
919           int __clone2(int (*fn)(void *),
920                        void *stack_base, size_t stack_size,
921                        int flags, void *arg, ...
922                     /* pid_t *parent_tid, struct user_desc *tls,
923                        pid_t *child_tid */ );
924
925       The prototype shown above is for the glibc wrapper  function;  for  the
926       system  call  itself,  the prototype can be described as follows (it is
927       identical to the clone() prototype on microblaze):
928
929           long clone2(unsigned long flags, void *stack_base,
930                       int stack_size,         /* Size of stack */
931                       int *parent_tid, int *child_tid,
932                       unsigned long tls);
933
934       __clone2() operates in the same way as clone(), except that  stack_base
935       points  to the lowest address of the child's stack area, and stack_size
936       specifies the size of the stack pointed to by stack_base.
937
938   Linux 2.4 and earlier
939       In Linux 2.4 and earlier, clone() does not take  arguments  parent_tid,
940       tls, and child_tid.
941

BUGS

943       GNU C library versions 2.3.4 up to and including 2.24 contained a wrap‐
944       per function for  getpid(2)  that  performed  caching  of  PIDs.   This
945       caching relied on support in the glibc wrapper for clone(), but limita‐
946       tions in the implementation meant that the cache was not up to date  in
947       some  circumstances.   In  particular, if a signal was delivered to the
948       child immediately after the clone() call, then a call to getpid(2) in a
949       handler  for  the  signal  could  return the PID of the calling process
950       ("the parent"), if the clone wrapper had not yet had a chance to update
951       the  PID  cache  in the child.  (This discussion ignores the case where
952       the child was created using CLONE_THREAD, when getpid(2) should  return
953       the  same  value  in  the child and in the process that called clone(),
954       since the caller and the child are  in  the  same  thread  group.   The
955       stale-cache  problem also does not occur if the flags argument includes
956       CLONE_VM.)  To get the truth, it was sometimes necessary  to  use  code
957       such as the following:
958
           #include <sys/syscall.h>    /* Definition of SYS_getpid */
           #include <unistd.h>         /* Declaration of syscall() */

           pid_t mypid;

           mypid = syscall(SYS_getpid);
964
965       Because  of the stale-cache problem, as well as other problems noted in
966       getpid(2), the PID caching feature was removed in glibc 2.25.
967

EXAMPLES

969       The following program demonstrates the use of clone() to create a child
970       process  that  executes in a separate UTS namespace.  The child changes
971       the hostname in its UTS namespace.  Both parent and child then  display
972       the  system  hostname, making it possible to see that the hostname dif‐
973       fers in the UTS namespaces of the parent and child.  For an example  of
974       the use of this program, see setns(2).
975
976       Within  the  sample  program, we allocate the memory that is to be used
977       for the child's stack using mmap(2) rather than malloc(3) for the  fol‐
978       lowing reasons:
979
980       *  mmap(2)  allocates  a block of memory that starts on a page boundary
981          and is a multiple of the page size.  This is useful if  we  want  to
982          establish a guard page (a page with protection PROT_NONE) at the end
983          of the stack using mprotect(2).
984
985       *  We can specify the MAP_STACK flag to request a mapping that is suit‐
986          able  for  a  stack.  For the moment, this flag is a no-op on Linux,
987          but it exists and has effect on some other  systems,  so  we  should
988          include it for portability.
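       The first point above can be sketched as follows. A hypothetical
       helper, alloc_guarded_stack(), rounds the requested size up to whole
       pages and adds one extra page. It then uses mprotect(2) to make the
       lowest page PROT_NONE, so that a stack overflow in the child faults
       instead of silently overwriting adjacent memory. Like the program
       below, it assumes that the stack grows downward. The guard page
       therefore sits at the low end, and the value passed to clone() is
       the high end of the mapping.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>    /* mmap(), mprotect(), munmap() */
#include <unistd.h>      /* sysconf() */

/* Maps 'size' bytes of stack (rounded up to pages) plus one PROT_NONE
   guard page at the low end. Returns the base of the mapping, or NULL;
   *top receives the stack pointer for clone(), *maplen the mapping length. */
static char *
alloc_guarded_stack(size_t size, char **top, size_t *maplen)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t len = ((size + page - 1) / page) * page + page;  /* + guard */
    char *base;

    base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    if (mprotect(base, page, PROT_NONE) == -1) {   /* guard page */
        munmap(base, len);
        return NULL;
    }

    *top = base + len;      /* page-aligned, hence suitably aligned */
    *maplen = len;
    return base;
}
```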
989
990   Program source
       #define _GNU_SOURCE
       #include <sys/wait.h>
       #include <sys/utsname.h>
       #include <sched.h>
       #include <signal.h>
       #include <string.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <unistd.h>
       #include <sys/mman.h>

       #define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                               } while (0)

       static int              /* Start function for cloned child */
       childFunc(void *arg)
       {
           struct utsname uts;

           /* Change hostname in UTS namespace of child */

           if (sethostname(arg, strlen(arg)) == -1)
               errExit("sethostname");

           /* Retrieve and display hostname */

           if (uname(&uts) == -1)
               errExit("uname");
           printf("uts.nodename in child:  %s\n", uts.nodename);
           fflush(stdout);     /* Returning from childFunc() skips stdio flushing */

           /* Keep the namespace open for a while, by sleeping.
              This allows some experimentation--for example, another
              process might join the namespace. */

           sleep(200);

           return 0;           /* Child terminates now */
       }

       #define STACK_SIZE (1024 * 1024)    /* Stack size for cloned child */

       int
       main(int argc, char *argv[])
       {
           char *stack;                    /* Start of stack buffer */
           char *stackTop;                 /* End of stack buffer */
           pid_t pid;
           struct utsname uts;

           if (argc < 2) {
               fprintf(stderr, "Usage: %s <child-hostname>\n", argv[0]);
               exit(EXIT_FAILURE);
           }

           /* Allocate memory to be used for the stack of the child */

           stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
           if (stack == MAP_FAILED)
               errExit("mmap");

           stackTop = stack + STACK_SIZE;  /* Assume stack grows downward */

           /* Create child that has its own UTS namespace;
              child commences execution in childFunc() */

           pid = clone(childFunc, stackTop, CLONE_NEWUTS | SIGCHLD, argv[1]);
           if (pid == -1)
               errExit("clone");
           printf("clone() returned %ld\n", (long) pid);

           /* Parent falls through to here */

           sleep(1);           /* Give child time to change its hostname */

           /* Display hostname in parent's UTS namespace. This will be
              different from hostname in child's UTS namespace. */

           if (uname(&uts) == -1)
               errExit("uname");
           printf("uts.nodename in parent: %s\n", uts.nodename);

           if (waitpid(pid, NULL, 0) == -1)    /* Wait for child */
               errExit("waitpid");
           printf("child has terminated\n");

           exit(EXIT_SUCCESS);
       }
1078

COLOPHON

1086       This page is part of release 5.07 of the Linux  man-pages  project.   A
1087       description  of  the project, information about reporting bugs, and the
1088       latest    version    of    this    page,    can     be     found     at
1089       https://www.kernel.org/doc/man-pages/.
1090
1091
1092
1093Linux                             2020-06-09                          CLONE(2)