clone(2)                      System Calls Manual                     clone(2)

NAME

       clone, __clone2, clone3 - create a child process

LIBRARY

       Standard C library (libc, -lc)

SYNOPSIS

       /* Prototype for the glibc wrapper function */

       #define _GNU_SOURCE
       #include <sched.h>

       int clone(int (*fn)(void *_Nullable), void *stack, int flags,
                 void *_Nullable arg, ...  /* pid_t *_Nullable parent_tid,
                                              void *_Nullable tls,
                                              pid_t *_Nullable child_tid */ );

       /* For the prototype of the raw clone() system call, see NOTES */

       #include <linux/sched.h>    /* Definition of struct clone_args */
       #include <sched.h>          /* Definition of CLONE_* constants */
       #include <sys/syscall.h>    /* Definition of SYS_* constants */
       #include <unistd.h>

       long syscall(SYS_clone3, struct clone_args *cl_args, size_t size);

       Note: glibc provides no wrapper for clone3(), necessitating the use of
       syscall(2).

DESCRIPTION

       These system calls create a new ("child") process, in a manner similar
       to fork(2).

       By contrast with fork(2), these system calls provide more precise
       control over what pieces of execution context are shared between the
       calling process and the child process.  For example, using these
       system calls, the caller can control whether or not the two processes
       share the virtual address space, the table of file descriptors, and
       the table of signal handlers.  These system calls also allow the new
       child process to be placed in separate namespaces(7).

       Note that in this manual page, "calling process" normally corresponds
       to "parent process".  But see the descriptions of CLONE_PARENT and
       CLONE_THREAD below.

       This page describes the following interfaces:

       •  The glibc clone() wrapper function and the underlying system call on
          which it is based.  The main text describes the wrapper function;
          the differences for the raw system call are described toward the end
          of this page.

       •  The newer clone3() system call.

       In the remainder of this page, the terminology "the clone call" is used
       when noting details that apply to all of these interfaces.

   The clone() wrapper function
       When the child process is created with the clone() wrapper function, it
       commences execution by calling the function pointed to by the argument
       fn.  (This differs from fork(2), where execution continues in the child
       from the point of the fork(2) call.)  The arg argument is passed as the
       argument of the function fn.

       When the fn(arg) function returns, the child process terminates.  The
       integer returned by fn is the exit status for the child process.  The
       child process may also terminate explicitly by calling exit(2) or after
       receiving a fatal signal.

       The stack argument specifies the location of the stack used by the
       child process.  Since the child and calling process may share memory,
       it is not possible for the child process to execute in the same stack
       as the calling process.  The calling process must therefore set up
       memory space for the child stack and pass a pointer to this space to
       clone().  Stacks grow downward on all processors that run Linux (except
       the HP PA processors), so stack usually points to the topmost address
       of the memory space set up for the child stack.  Note that clone() does
       not provide a means whereby the caller can inform the kernel of the
       size of the stack area.

       The remaining arguments to clone() are discussed below.
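
       The following is a minimal sketch of the wrapper in use, not a
       definitive implementation.  The caller allocates a child stack with
       malloc(3), passes its topmost address (assuming a downward-growing
       stack, i.e., not HP PA), and puts SIGCHLD in the low byte of flags so
       that a plain waitpid(2) works.  CLONE_VM is added to show that a write
       made by the child lands in the caller's memory.  The 1 MiB stack size
       and the names demo_clone_vm() and child_fn() are illustrative choices.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Illustrative size; clone() itself is never told how big the stack is. */
#define CHILD_STACK_SIZE (1024 * 1024)

/* Child entry point: runs on the caller-supplied stack. */
static int child_fn(void *arg)
{
    int *shared = arg;
    *shared = 42;   /* lands in the caller's memory because of CLONE_VM */
    return 7;       /* becomes the child's exit status */
}

/* Returns the value stored by the child, or -1 on error.
   *exit_status receives the child's exit status. */
int demo_clone_vm(int *exit_status)
{
    int shared = 0;
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* Downward-growing stack: pass the topmost address of the area. */
    char *stack_top = stack + CHILD_STACK_SIZE;

    pid_t pid = clone(child_fn, stack_top, CLONE_VM | SIGCHLD, &shared);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;  /* child may still be using the stack: do not free it */
    free(stack);    /* the child has exited, so its stack is unused */

    *exit_status = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return shared;
}
```

       Without CLONE_VM the child would modify only its own copy of shared,
       and demo_clone_vm() would return 0.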

   clone3()
       The clone3() system call provides a superset of the functionality of
       the older clone() interface.  It also provides a number of API
       improvements, including: space for additional flags bits; cleaner
       separation in the use of various arguments; and the ability to specify
       the size of the child's stack area.

       As with fork(2), clone3() returns in both the parent and the child.  It
       returns 0 in the child process and returns the PID of the child in the
       parent.

       The cl_args argument of clone3() is a structure of the following form:

           struct clone_args {
               u64 flags;        /* Flags bit mask */
               u64 pidfd;        /* Where to store PID file descriptor
                                    (int *) */
               u64 child_tid;    /* Where to store child TID,
                                    in child's memory (pid_t *) */
               u64 parent_tid;   /* Where to store child TID,
                                    in parent's memory (pid_t *) */
               u64 exit_signal;  /* Signal to deliver to parent on
                                    child termination */
               u64 stack;        /* Pointer to lowest byte of stack */
               u64 stack_size;   /* Size of stack */
               u64 tls;          /* Location of new TLS */
               u64 set_tid;      /* Pointer to a pid_t array
                                    (since Linux 5.5) */
               u64 set_tid_size; /* Number of elements in set_tid
                                    (since Linux 5.5) */
               u64 cgroup;       /* File descriptor for target cgroup
                                    of child (since Linux 5.7) */
           };

       The size argument that is supplied to clone3() should be initialized to
       the size of this structure.  (The existence of the size argument
       permits future extensions to the clone_args structure.)

       The stack for the child process is specified via cl_args.stack, which
       points to the lowest byte of the stack area, and cl_args.stack_size,
       which specifies the size of the stack in bytes.  In the case where the
       CLONE_VM flag (see below) is specified, a stack must be explicitly
       allocated and specified.  Otherwise, these two fields can be specified
       as NULL and 0, which causes the child to use the same stack area as the
       parent (in the child's own virtual address space).

       The remaining fields in the cl_args argument are discussed below.
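
       As a sketch of the raw interface, the code below invokes clone3()
       through syscall(2) with a zero-filled struct clone_args, so that
       cl_args.stack and cl_args.stack_size are NULL and 0 and the child runs
       fork(2)-style on a copy of the parent's stack.  It assumes kernel
       headers recent enough to define SYS_clone3 and struct clone_args
       (Linux 5.3 or later); if SYS_clone3 is missing, or the kernel or a
       seccomp(2) filter rejects the call, the sketch simply reports failure.
       The function names are hypothetical.  Because the call bypasses glibc,
       the child restricts itself to _exit(2).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/sched.h>    /* struct clone_args */
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* fork(2)-like clone3(): the child's PID in the parent, 0 in the child,
   -1 (with errno set) on failure. */
static pid_t clone3_fork(void)
{
#ifdef SYS_clone3
    struct clone_args args;
    memset(&args, 0, sizeof(args));  /* stack = NULL, stack_size = 0 */
    args.exit_signal = SIGCHLD;      /* notify the parent as fork(2) does */
    return (pid_t) syscall(SYS_clone3, &args, sizeof(args));
#else
    errno = ENOSYS;                  /* headers predate clone3() */
    return -1;
#endif
}

/* Returns the child's exit status, or -1 if clone3() failed. */
int demo_clone3(void)
{
    pid_t pid = clone3_fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(42);                   /* child: exit without touching glibc state */

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

       Passing sizeof(args) lets the kernel tell which version of the
       structure the caller was compiled against.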

   Equivalence between clone() and clone3() arguments
       Unlike the older clone() interface, where arguments are passed
       individually, in the newer clone3() interface the arguments are
       packaged into the clone_args structure shown above.  This structure
       allows for a superset of the information passed via the clone()
       arguments.

       The following table shows the equivalence between the arguments of
       clone() and the fields in the clone_args argument supplied to clone3():

           clone()         clone3()        Notes
                           cl_args field
           flags & ~0xff   flags           For most flags; details
                                           below
           parent_tid      pidfd           See CLONE_PIDFD
           child_tid       child_tid       See CLONE_CHILD_SETTID
           parent_tid      parent_tid      See CLONE_PARENT_SETTID
           flags & 0xff    exit_signal
           stack           stack
           ---             stack_size
           tls             tls             See CLONE_SETTLS
           ---             set_tid         See below for details
           ---             set_tid_size
           ---             cgroup          See CLONE_INTO_CGROUP
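
       The table can be read as a mechanical translation.  The sketch below
       is a hypothetical helper, not a library function, that fills a struct
       clone_args from clone()-style arguments.  One difference is not
       visible in the table: clone() receives the top of the child stack,
       whereas cl_args.stack is the lowest byte, so the helper takes the base
       and size of the stack area instead.  It assumes <linux/sched.h> from
       Linux 5.3 or later.

```c
#define _GNU_SOURCE
#include <linux/sched.h>    /* struct clone_args, CLONE_* constants */
#include <signal.h>         /* SIGCHLD, used by callers in the low byte */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: translate clone()-style arguments, following the
   table, into a struct clone_args suitable for clone3(). */
void clone_args_from_clone(struct clone_args *ca, unsigned long flags,
                           void *stack_base, size_t stack_size,
                           pid_t *parent_tid, unsigned long tls,
                           pid_t *child_tid)
{
    memset(ca, 0, sizeof(*ca));
    ca->flags = flags & ~0xffUL;      /* flag bits above the low byte */
    ca->exit_signal = flags & 0xff;   /* termination signal in the low byte */

    /* clone() overloads parent_tid: with CLONE_PIDFD it receives the pidfd. */
    if (flags & CLONE_PIDFD)
        ca->pidfd = (uintptr_t) parent_tid;
    else
        ca->parent_tid = (uintptr_t) parent_tid;

    ca->child_tid = (uintptr_t) child_tid;
    ca->stack = (uintptr_t) stack_base;   /* lowest byte, not the top */
    ca->stack_size = stack_size;
    ca->tls = tls;
}
```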

   The child termination signal
       When the child process terminates, a signal may be sent to the parent.
       The termination signal is specified in the low byte of flags (clone())
       or in cl_args.exit_signal (clone3()).  If this signal is specified as
       anything other than SIGCHLD, then the parent process must specify the
       __WALL or __WCLONE options when waiting for the child with wait(2).  If
       no signal (i.e., zero) is specified, then the parent process is not
       signaled when the child terminates.
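
       A minimal sketch of the zero-signal case: the low byte of flags is 0,
       so the caller receives no signal when the child exits and passes
       __WALL to waitpid(2); with neither __WALL nor __WCLONE, the wait would
       not consider this "clone" child.  The stack size and function names
       are illustrative assumptions.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (256 * 1024)   /* illustrative size */

static int quiet_child(void *arg)
{
    (void) arg;
    return 5;                           /* exit status seen by the parent */
}

/* Returns the child's exit status, or -1 on error. */
int demo_no_exit_signal(void)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* Low byte of flags is 0: no termination signal is sent. */
    pid_t pid = clone(quiet_child, stack + CHILD_STACK_SIZE, 0, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    /* __WALL: wait for the child whatever its termination signal. */
    pid_t w = waitpid(pid, &status, __WALL);
    free(stack);    /* no CLONE_VM: the child used its own copy */
    if (w == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```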

   The set_tid array
       By default, the kernel chooses the next sequential PID for the new
       process in each of the PID namespaces where it is present.  When
       creating a process with clone3(), the set_tid array (available since
       Linux 5.5) can be used to select specific PIDs for the process in some
       or all of the PID namespaces where it is present.  If the PID of the
       newly created process should be set only for the current PID namespace
       or in the newly created PID namespace (if flags contains CLONE_NEWPID),
       then the first element in the set_tid array has to be the desired PID
       and set_tid_size needs to be 1.

       If the PID of the newly created process should have a certain value in
       multiple PID namespaces, then the set_tid array can have multiple
       entries.  The first entry defines the PID in the most deeply nested PID
       namespace, and each of the following entries contains the PID in the
       corresponding ancestor PID namespace.  The number of PID namespaces in
       which a PID should be set is defined by set_tid_size, which cannot be
       larger than the number of currently nested PID namespaces.

       To create a process with the following PIDs in a PID namespace
       hierarchy:

           PID NS level   Requested PID   Notes
           0              31496           Outermost PID namespace
           1              42
           2              7               Innermost PID namespace

       Set the array to:

           set_tid[0] = 7;
           set_tid[1] = 42;
           set_tid[2] = 31496;
           set_tid_size = 3;

       If only the PIDs in the two innermost PID namespaces need to be
       specified, set the array to:

           set_tid[0] = 7;
           set_tid[1] = 42;
           set_tid_size = 2;

       The PIDs in the PID namespaces outside the two innermost PID namespaces
       are selected in the same way as any other PID.

       The set_tid feature requires CAP_SYS_ADMIN or (since Linux 5.9)
       CAP_CHECKPOINT_RESTORE in all owning user namespaces of the target PID
       namespaces.

       Callers may only choose a PID greater than 1 in a given PID namespace
       if an init process (i.e., a process with PID 1) already exists in that
       namespace.  Otherwise the PID entry for this PID namespace must be 1.
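
       The table above lists namespaces from the outermost level inward,
       while set_tid[0] must hold the PID for the innermost namespace, so
       filling the array is a reversal.  The hypothetical helper below does
       that and records the array in a struct clone_args.  It assumes
       <linux/sched.h> from Linux 5.5 or later (for the set_tid fields).
       Passing the result to clone3() still requires the privileges and the
       init-process condition described above.

```c
#define _GNU_SOURCE
#include <linux/sched.h>    /* struct clone_args (set_tid since Linux 5.5) */
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical helper.  by_level[0] is the PID wanted in the outermost
   namespace being set, by_level[nlevels - 1] the PID wanted in the
   innermost one, matching the row order of the table.  clone3() expects
   the innermost PID first, so the order is reversed into set_tid. */
void fill_set_tid(struct clone_args *ca, pid_t *set_tid,
                  const pid_t *by_level, size_t nlevels)
{
    for (size_t i = 0; i < nlevels; i++)
        set_tid[i] = by_level[nlevels - 1 - i];

    ca->set_tid = (uintptr_t) set_tid;  /* pointer to the pid_t array */
    ca->set_tid_size = nlevels;         /* must not exceed nesting depth */
}
```

       For the two-level case above, pass by_level = {42, 7} with nlevels = 2.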

   The flags mask
       Both clone() and clone3() allow a flags bit mask that modifies their
       behavior and allows the caller to specify what is shared between the
       calling process and the child process.  This bit mask, the flags
       argument of clone() or the cl_args.flags field passed to clone3(), is
       referred to as the flags mask in the remainder of this page.

       The flags mask is specified as a bitwise OR of zero or more of the
       constants listed below.  Except as noted below, these flags are
       available (and have the same effect) in both clone() and clone3().

       CLONE_CHILD_CLEARTID (since Linux 2.5.49)
              Clear (zero) the child thread ID at the location pointed to by
              child_tid (clone()) or cl_args.child_tid (clone3()) in child
              memory when the child exits, and do a wakeup on the futex at
              that address.  The address involved may be changed by the
              set_tid_address(2) system call.  This is used by threading
              libraries.

       CLONE_CHILD_SETTID (since Linux 2.5.49)
              Store the child thread ID at the location pointed to by
              child_tid (clone()) or cl_args.child_tid (clone3()) in the
              child's memory.  The store operation completes before the clone
              call returns control to user space in the child process.  (Note
              that the store operation may not have completed before the clone
              call returns in the parent process, which is relevant if the
              CLONE_VM flag is also employed.)
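
              These two flags are typically combined with CLONE_VM, as
              threading libraries do.  In the sketch below (function and
              structure names are illustrative), the child reads the TID that
              CLONE_CHILD_SETTID stored before it began running, and the
              caller, after waiting for the child, finds that
              CLONE_CHILD_CLEARTID has reset the same word to zero.  Note that
              clone() takes parent_tid, tls, and child_tid, in that order,
              after arg.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (256 * 1024)   /* illustrative size */

struct tid_demo {
    pid_t ctid;            /* kernel stores, then clears, the child TID here */
    pid_t seen_by_child;   /* value the child observed at startup */
};

static int tid_child(void *arg)
{
    struct tid_demo *d = arg;
    d->seen_by_child = d->ctid;   /* already stored by CLONE_CHILD_SETTID */
    return 0;
}

/* Returns the child's PID after it has exited, or -1 on error. */
pid_t demo_child_tid(struct tid_demo *d)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    d->ctid = -1;
    d->seen_by_child = -1;
    int flags = CLONE_VM | CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD;

    /* Optional arguments: parent_tid, tls, child_tid. */
    pid_t pid = clone(tid_child, stack + CHILD_STACK_SIZE, flags, d,
                      (pid_t *) NULL, (void *) NULL, &d->ctid);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    if (waitpid(pid, NULL, 0) == -1)
        return -1;          /* child may still use the stack: do not free */
    free(stack);
    return pid;
}
```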

       CLONE_CLEAR_SIGHAND (since Linux 5.5)
              By default, signal dispositions in the child thread are the same
              as in the parent.  If this flag is specified, then all signals
              that are handled in the parent are reset to their default
              dispositions (SIG_DFL) in the child.

              Specifying this flag together with CLONE_SIGHAND is nonsensical
              and disallowed.

       CLONE_DETACHED (historical)
              For a while (during the Linux 2.5 development series) there was
              a CLONE_DETACHED flag, which caused the parent not to receive a
              signal when the child terminated.  Ultimately, the effect of
              this flag was subsumed under the CLONE_THREAD flag and by the
              time Linux 2.6.0 was released, this flag had no effect.
              Starting in Linux 2.6.2, the need to give this flag together
              with CLONE_THREAD disappeared.

              This flag is still defined, but it is usually ignored when
              calling clone().  However, see the description of CLONE_PIDFD
              for some exceptions.

       CLONE_FILES (since Linux 2.0)
              If CLONE_FILES is set, the calling process and the child process
              share the same file descriptor table.  Any file descriptor
              created by the calling process or by the child process is also
              valid in the other process.  Similarly, if one of the processes
              closes a file descriptor, or changes its associated flags (using
              the fcntl(2) F_SETFD operation), the other process is also
              affected.  If a process sharing a file descriptor table calls
              execve(2), its file descriptor table is duplicated (unshared).

              If CLONE_FILES is not set, the child process inherits a copy of
              all file descriptors opened in the calling process at the time
              of the clone call.  Subsequent operations that open or close
              file descriptors, or change file descriptor flags, performed by
              either the calling process or the child process do not affect
              the other process.  Note, however, that the duplicated file
              descriptors in the child refer to the same open file
              descriptions as the corresponding file descriptors in the
              calling process, and thus share file offsets and file status
              flags (see open(2)).
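
              The difference between the two cases can be observed directly.
              In the sketch below (names illustrative, and assuming /dev/null
              can be opened), the child opens a file and reports the
              descriptor number as its exit status; the caller then checks
              with the fcntl(2) F_GETFD operation whether that number is
              valid in its own table.  With CLONE_FILES it is; without it, the
              descriptor exists only in the child's copy of the table.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (256 * 1024)   /* illustrative size */

static int open_in_child(void *arg)
{
    (void) arg;
    int fd = open("/dev/null", O_RDONLY);
    return fd < 0 ? 255 : fd;   /* report the descriptor via the exit status */
}

/* Returns 1 if the descriptor opened by the child is valid in the caller,
   0 if it is not, or -1 on error. */
int child_fd_visible(int share_files)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    int flags = SIGCHLD | (share_files ? CLONE_FILES : 0);
    pid_t pid = clone(open_in_child, stack + CHILD_STACK_SIZE, flags, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    pid_t w = waitpid(pid, &status, 0);
    free(stack);     /* no CLONE_VM: the child used its own copy */
    if (w == -1 || !WIFEXITED(status) || WEXITSTATUS(status) == 255)
        return -1;

    int fd = WEXITSTATUS(status);
    if (fcntl(fd, F_GETFD) == -1)
        return 0;    /* not present in the caller's table */
    if (share_files)
        close(fd);   /* the shared table still holds it; tidy up */
    return 1;
}
```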

       CLONE_FS (since Linux 2.0)
              If CLONE_FS is set, the caller and the child process share the
              same filesystem information.  This includes the root of the
              filesystem, the current working directory, and the umask.  Any
              call to chroot(2), chdir(2), or umask(2) performed by the
              calling process or the child process also affects the other
              process.

              If CLONE_FS is not set, the child process works on a copy of the
              filesystem information of the calling process at the time of the
              clone call.  Calls to chroot(2), chdir(2), or umask(2) performed
              later by one of the processes do not affect the other process.
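
              The umask is the easiest piece of this shared state to observe
              without touching the filesystem.  In the sketch below (names and
              mask values are illustrative), the child calls umask(2); with
              CLONE_FS the change is visible in the caller, and without it the
              caller's umask is untouched.  The caller's original umask is
              restored before returning.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (256 * 1024)   /* illustrative size */

static int set_umask_in_child(void *arg)
{
    (void) arg;
    umask(077);
    return 0;
}

/* Returns the caller's umask as seen after the child ran, or
   (mode_t) -1 on error.  The caller's original umask is restored. */
mode_t umask_after_child(int share_fs)
{
    mode_t original = umask(022);       /* start from a known value */
    mode_t seen = (mode_t) -1;

    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack != NULL) {
        int flags = SIGCHLD | (share_fs ? CLONE_FS : 0);
        pid_t pid = clone(set_umask_in_child, stack + CHILD_STACK_SIZE,
                          flags, NULL);
        if (pid != -1 && waitpid(pid, NULL, 0) != -1)
            seen = umask(022);          /* umask() returns the previous value */
        free(stack);                    /* no CLONE_VM: child used its copy */
    }

    umask(original);                    /* restore the caller's umask */
    return seen;
}
```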

       CLONE_INTO_CGROUP (since Linux 5.7)
              By default, a child process is placed in the same version 2
              cgroup as its parent.  The CLONE_INTO_CGROUP flag allows the
              child process to be created in a different version 2 cgroup.
              (Note that CLONE_INTO_CGROUP has effect only for version 2
              cgroups.)

              In order to place the child process in a different cgroup, the
              caller specifies CLONE_INTO_CGROUP in cl_args.flags and passes a
              file descriptor that refers to a version 2 cgroup in the
              cl_args.cgroup field.  (This file descriptor can be obtained by
              opening a cgroup v2 directory using either the O_RDONLY or the
              O_PATH flag.)  Note that all of the usual restrictions (described
              in cgroups(7)) on placing a process into a version 2 cgroup
              apply.

              Among the possible use cases for CLONE_INTO_CGROUP are the
              following:

              •  Spawning a process into a cgroup different from the parent's
                 cgroup makes it possible for a service manager to directly
                 spawn new services into dedicated cgroups.  This eliminates
                 the accounting jitter that would be caused if the child
                 process were first created in the same cgroup as the parent
                 and then moved into the target cgroup.  Furthermore, spawning
                 the child process directly into a target cgroup is
                 significantly cheaper than moving the child process into the
                 target cgroup after it has been created.

              •  The CLONE_INTO_CGROUP flag also allows the creation of frozen
                 child processes by spawning them into a frozen cgroup.  (See
                 cgroups(7) for a description of the freezer controller.)

              •  For threaded applications (or even thread implementations
                 which make use of cgroups to limit individual threads), it is
                 possible to establish a fixed cgroup layout before spawning
                 each thread directly into its target cgroup.

       CLONE_IO (since Linux 2.6.25)
              If CLONE_IO is set, then the new process shares an I/O context
              with the calling process.  If this flag is not set, then (as
              with fork(2)) the new process has its own I/O context.

              The I/O context is the I/O scope of the disk scheduler (i.e.,
              what the I/O scheduler uses to model scheduling of a process's
              I/O).  If processes share the same I/O context, they are treated
              as one by the I/O scheduler.  As a consequence, they get to
              share disk time.  For some I/O schedulers, if two processes
              share an I/O context, they will be allowed to interleave their
              disk access.  If several threads are doing I/O on behalf of the
              same process (aio_read(3), for instance), they should employ
              CLONE_IO to get better I/O performance.

              If the kernel is not configured with the CONFIG_BLOCK option,
              this flag is a no-op.

       CLONE_NEWCGROUP (since Linux 4.6)
              Create the process in a new cgroup namespace.  If this flag is
              not set, then (as with fork(2)) the process is created in the
              same cgroup namespace as the calling process.

              For further information on cgroup namespaces, see
              cgroup_namespaces(7).

              Only a privileged process (CAP_SYS_ADMIN) can employ
              CLONE_NEWCGROUP.

       CLONE_NEWIPC (since Linux 2.6.19)
              If CLONE_NEWIPC is set, then create the process in a new IPC
              namespace.  If this flag is not set, then (as with fork(2)) the
              process is created in the same IPC namespace as the calling
              process.

              For further information on IPC namespaces, see
              ipc_namespaces(7).

              Only a privileged process (CAP_SYS_ADMIN) can employ
              CLONE_NEWIPC.  This flag can't be specified in conjunction with
              CLONE_SYSVSEM.

       CLONE_NEWNET (since Linux 2.6.24)
              (The implementation of this flag was completed only by about
              Linux 2.6.29.)

              If CLONE_NEWNET is set, then create the process in a new network
              namespace.  If this flag is not set, then (as with fork(2)) the
              process is created in the same network namespace as the calling
              process.

              For further information on network namespaces, see
              network_namespaces(7).

              Only a privileged process (CAP_SYS_ADMIN) can employ
              CLONE_NEWNET.

       CLONE_NEWNS (since Linux 2.4.19)
              If CLONE_NEWNS is set, the cloned child is started in a new
              mount namespace, initialized with a copy of the namespace of the
              parent.  If CLONE_NEWNS is not set, the child lives in the same
              mount namespace as the parent.

              For further information on mount namespaces, see namespaces(7)
              and mount_namespaces(7).

              Only a privileged process (CAP_SYS_ADMIN) can employ
              CLONE_NEWNS.  It is not permitted to specify both CLONE_NEWNS
              and CLONE_FS in the same clone call.

       CLONE_NEWPID (since Linux 2.6.24)
              If CLONE_NEWPID is set, then create the process in a new PID
              namespace.  If this flag is not set, then (as with fork(2)) the
              process is created in the same PID namespace as the calling
              process.

              For further information on PID namespaces, see namespaces(7) and
              pid_namespaces(7).

              Only a privileged process (CAP_SYS_ADMIN) can employ
              CLONE_NEWPID.  This flag can't be specified in conjunction with
              CLONE_THREAD or CLONE_PARENT.

       CLONE_NEWUSER
              (This flag first became meaningful for clone() in Linux 2.6.23;
              the current clone() semantics were merged in Linux 3.5, and the
              final pieces to make the user namespaces completely usable were
              merged in Linux 3.8.)

              If CLONE_NEWUSER is set, then create the process in a new user
              namespace.  If this flag is not set, then (as with fork(2)) the
              process is created in the same user namespace as the calling
              process.

              For further information on user namespaces, see namespaces(7)
              and user_namespaces(7).

              Before Linux 3.8, use of CLONE_NEWUSER required that the caller
              have three capabilities: CAP_SYS_ADMIN, CAP_SETUID, and
              CAP_SETGID.  Starting with Linux 3.8, no privileges are needed
              to create a user namespace.

              This flag can't be specified in conjunction with CLONE_THREAD or
              CLONE_PARENT.  For security reasons, CLONE_NEWUSER cannot be
              specified in conjunction with CLONE_FS.

       CLONE_NEWUTS (since Linux 2.6.19)
              If CLONE_NEWUTS is set, then create the process in a new UTS
              namespace, whose identifiers are initialized by duplicating the
              identifiers from the UTS namespace of the calling process.  If
              this flag is not set, then (as with fork(2)) the process is
              created in the same UTS namespace as the calling process.

              For further information on UTS namespaces, see
              uts_namespaces(7).

              Only a privileged process (CAP_SYS_ADMIN) can employ
              CLONE_NEWUTS.
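
              As an illustration of the namespace flags, the sketch below
              combines CLONE_NEWUTS with CLONE_NEWUSER so that, where
              unprivileged user namespaces are permitted, no capability in the
              caller's namespace is needed: the child holds CAP_SYS_ADMIN in
              the new user namespace that owns its new UTS namespace.  The
              child changes its hostname with sethostname(2) while the
              caller's hostname stays unchanged.  Many systems and container
              runtimes disable unprivileged user namespaces, in which case
              clone() fails and the sketch returns -1.  The hostname string
              and function names are illustrative.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (256 * 1024)   /* illustrative size */
#define CHILD_HOSTNAME "clone-demo"     /* illustrative value */

static int uts_child(void *arg)
{
    (void) arg;
    char name[256];
    if (sethostname(CHILD_HOSTNAME, strlen(CHILD_HOSTNAME)) == -1)
        return 1;
    if (gethostname(name, sizeof(name)) == -1)
        return 1;
    return strcmp(name, CHILD_HOSTNAME) == 0 ? 0 : 1;
}

/* Returns 0 if the child renamed its own UTS namespace while the caller's
   hostname stayed unchanged, 1 if not, or -1 if clone() or a query failed. */
int demo_new_uts(void)
{
    char before[256], after[256];
    if (gethostname(before, sizeof(before)) == -1)
        return -1;

    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    int flags = CLONE_NEWUSER | CLONE_NEWUTS | SIGCHLD;
    pid_t pid = clone(uts_child, stack + CHILD_STACK_SIZE, flags, NULL);
    if (pid == -1) {
        free(stack);
        return -1;   /* e.g., unprivileged user namespaces disabled */
    }

    int status;
    pid_t w = waitpid(pid, &status, 0);
    free(stack);     /* no CLONE_VM: the child used its own copy */
    if (w == -1 || gethostname(after, sizeof(after)) == -1)
        return -1;

    int child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
    return (child_ok && strcmp(before, after) == 0) ? 0 : 1;
}
```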

       CLONE_PARENT (since Linux 2.3.12)
              If CLONE_PARENT is set, then the parent of the new child (as
              returned by getppid(2)) will be the same as that of the calling
              process.

              If CLONE_PARENT is not set, then (as with fork(2)) the child's
              parent is the calling process.

              Note that it is the parent process, as returned by getppid(2),
              which is signaled when the child terminates, so that if
              CLONE_PARENT is set, then the parent of the calling process,
              rather than the calling process itself, is signaled.

              The CLONE_PARENT flag can't be used in clone calls by the global
              init process (PID 1 in the initial PID namespace) and init
              processes in other PID namespaces.  This restriction prevents
              the creation of multi-rooted process trees as well as the
              creation of unreapable zombies in the initial PID namespace.

       CLONE_PARENT_SETTID (since Linux 2.5.49)
              Store the child thread ID at the location pointed to by
              parent_tid (clone()) or cl_args.parent_tid (clone3()) in the
              parent's memory.  (In Linux 2.5.32-2.5.48 there was a flag
              CLONE_SETTID that did this.)  The store operation completes
              before the clone call returns control to user space.
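
              A short sketch (names illustrative): because the store
              completes before clone() returns, the caller can read
              parent_tid immediately, and for a child created without
              CLONE_THREAD the stored value equals the returned PID.  In the
              clone() wrapper, parent_tid is the first optional argument
              after arg.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (256 * 1024)   /* illustrative size */

static int idle_child(void *arg)
{
    (void) arg;
    return 0;
}

/* Creates a child with CLONE_PARENT_SETTID.  Returns 1 if the TID stored
   in ptid matched clone()'s return value, 0 if not, or -1 on error. */
int demo_parent_settid(void)
{
    pid_t ptid = -1;
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* Optional arguments: parent_tid, tls, child_tid. */
    pid_t pid = clone(idle_child, stack + CHILD_STACK_SIZE,
                      CLONE_PARENT_SETTID | SIGCHLD, NULL,
                      &ptid, (void *) NULL, (pid_t *) NULL);
    int matched = (pid != -1 && ptid == pid);   /* stored before return */

    if (pid != -1)
        waitpid(pid, NULL, 0);
    free(stack);     /* no CLONE_VM: the child used its own copy */
    return pid == -1 ? -1 : matched;
}
```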

       CLONE_PID (Linux 2.0 to Linux 2.5.15)
              If CLONE_PID is set, the child process is created with the same
              process ID as the calling process.  This is good for hacking the
              system, but otherwise of not much use.  From Linux 2.3.21
              onward, this flag could be specified only by the system boot
              process (PID 0).  The flag disappeared completely from the
              kernel sources in Linux 2.5.16.  Subsequently, the kernel
              silently ignored this bit if it was specified in the flags mask.
              Much later, the same bit was recycled for use as the CLONE_PIDFD
              flag.

       CLONE_PIDFD (since Linux 5.2)
              If this flag is specified, a PID file descriptor referring to
              the child process is allocated and placed at a specified
              location in the parent's memory.  The close-on-exec flag is set
              on this new file descriptor.  PID file descriptors can be used
              for the purposes described in pidfd_open(2).

              •  When using clone3(), the PID file descriptor is placed at the
                 location pointed to by cl_args.pidfd.

              •  When using clone(), the PID file descriptor is placed at the
                 location pointed to by parent_tid.  Since the parent_tid
                 argument is used to return the PID file descriptor,
                 CLONE_PIDFD cannot be used with CLONE_PARENT_SETTID when
                 calling clone().

              It is currently not possible to use this flag together with
              CLONE_THREAD.  This means that the process identified by the PID
              file descriptor will always be a thread group leader.

              If the obsolete CLONE_DETACHED flag is specified alongside
              CLONE_PIDFD when calling clone(), an error is returned.  An
              error also results if CLONE_DETACHED is specified when calling
              clone3().  This error behavior ensures that the bit
              corresponding to CLONE_DETACHED can be reused for further PID
              file descriptor features in the future.
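
              A sketch of the clone() form, in which the descriptor comes back
              through parent_tid (names illustrative).  A fallback definition
              of CLONE_PIDFD, using the value 0x00001000 from the kernel's
              <linux/sched.h>, is supplied in case the C library headers lack
              it.  The check confirms that the close-on-exec flag is set, as
              described above.  Since older kernels ignored this bit (see
              CLONE_PID above), the sketch also checks that a descriptor was
              actually written.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef CLONE_PIDFD
#define CLONE_PIDFD 0x00001000   /* value from <linux/sched.h>, Linux 5.2 */
#endif

#define CHILD_STACK_SIZE (256 * 1024)   /* illustrative size */

static int pidfd_child(void *arg)
{
    (void) arg;
    return 0;
}

/* Returns 1 if clone() produced a PID file descriptor with FD_CLOEXEC set,
   0 if no such descriptor was produced, or -1 if clone() failed. */
int demo_clone_pidfd(void)
{
    int pidfd = -1;
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* With CLONE_PIDFD, clone() writes the descriptor through parent_tid. */
    pid_t pid = clone(pidfd_child, stack + CHILD_STACK_SIZE,
                      CLONE_PIDFD | SIGCHLD, NULL,
                      &pidfd, (void *) NULL, (pid_t *) NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int result = 0;
    if (pidfd >= 0) {
        int fdflags = fcntl(pidfd, F_GETFD);
        result = (fdflags != -1 && (fdflags & FD_CLOEXEC)) ? 1 : 0;
        close(pidfd);
    }

    waitpid(pid, NULL, 0);
    free(stack);     /* no CLONE_VM: the child used its own copy */
    return result;
}
```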

       CLONE_PTRACE (since Linux 2.2)
              If CLONE_PTRACE is specified, and the calling process is being
              traced, then trace the child also (see ptrace(2)).

       CLONE_SETTLS (since Linux 2.5.32)
              The TLS (Thread Local Storage) descriptor is set to tls.

              The interpretation of tls and the resulting effect are
              architecture-dependent.  On x86, tls is interpreted as a struct
              user_desc * (see set_thread_area(2)).  On x86-64 it is the new
              value to be set for the %fs base register (see the ARCH_SET_FS
              argument to arch_prctl(2)).  On architectures with a dedicated
              TLS register, it is the new value of that register.

              Use of this flag requires detailed knowledge and generally it
              should not be used except in libraries implementing threading.

       CLONE_SIGHAND (since Linux 2.0)
              If CLONE_SIGHAND is set, the calling process and the child
              process share the same table of signal handlers.  If the calling
              process or child process calls sigaction(2) to change the
              behavior associated with a signal, the behavior is changed in
              the other process as well.  However, the calling process and the
              child process still have distinct signal masks and sets of
              pending signals.  So, one of them may block or unblock signals
              using sigprocmask(2) without affecting the other process.

              If CLONE_SIGHAND is not set, the child process inherits a copy
              of the signal handlers of the calling process at the time of the
              clone call.  Calls to sigaction(2) performed later by one of the
              processes have no effect on the other process.

              Since Linux 2.6.0, the flags mask must also include CLONE_VM if
              CLONE_SIGHAND is specified.
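
              In the sketch below (names illustrative), the child installs a
              handler for SIGUSR1 with sigaction(2), and the caller then
              checks whether its own disposition changed.  CLONE_VM is
              included in both cases because, as noted above, CLONE_SIGHAND
              requires it.  The caller's original disposition is restored
              before returning.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (256 * 1024)   /* illustrative size */

static void usr1_handler(int sig)
{
    (void) sig;
}

static int install_in_child(void *arg)
{
    (void) arg;
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = usr1_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL) == 0 ? 0 : 1;
}

/* Returns 1 if the child's sigaction() changed the caller's SIGUSR1
   disposition, 0 if it did not, or -1 on error. */
int child_handler_visible(int share_sighand)
{
    struct sigaction original, now;
    if (sigaction(SIGUSR1, NULL, &original) == -1)
        return -1;

    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    int flags = CLONE_VM | SIGCHLD | (share_sighand ? CLONE_SIGHAND : 0);
    pid_t pid = clone(install_in_child, stack + CHILD_STACK_SIZE, flags, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }
    if (waitpid(pid, NULL, 0) == -1)
        return -1;           /* child may still use the stack: do not free */
    free(stack);

    int result = -1;
    if (sigaction(SIGUSR1, NULL, &now) == 0)
        result = (now.sa_handler == usr1_handler) ? 1 : 0;
    sigaction(SIGUSR1, &original, NULL);   /* restore the caller's setting */
    return result;
}
```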
556
557       CLONE_STOPPED (since Linux 2.6.0)
558              If CLONE_STOPPED is set, then the child is initially stopped (as
559              though it was sent a SIGSTOP signal), and  must  be  resumed  by
560              sending it a SIGCONT signal.
561
562              This  flag  was deprecated from Linux 2.6.25 onward, and was re‐
563              moved altogether  in  Linux  2.6.38.   Since  then,  the  kernel
564              silently ignores it without error.  Starting with Linux 4.6, the
565              same bit was reused for the CLONE_NEWCGROUP flag.
566
567       CLONE_SYSVSEM (since Linux 2.5.10)
568              If CLONE_SYSVSEM is set, then the child and the calling  process
569              share  a  single  list of System V semaphore adjustment (semadj)
570              values (see semop(2)).  In this case, the  shared  list  accumu‐
571              lates  semadj  values across all processes sharing the list, and
572              semaphore adjustments are performed only when the  last  process
573              that  is sharing the list terminates (or ceases sharing the list
574              using unshare(2)).  If this flag is not set, then the child  has
575              a separate semadj list that is initially empty.
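              A minimal sketch of semadj sharing, assuming glibc and hypothetical
              helper names. The child increments a new semaphore with SEM_UNDO
              and exits. Without CLONE_SYSVSEM, the child's own adjustment is
              applied when it exits, so the value falls back to 0. With
              CLONE_SYSVSEM, the caller still shares the list, so the adjustment
              is deferred and the value stays at 1. As semctl(2) requires, the
              program defines union semun itself.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>

#define SEM_STACK_SIZE (64 * 1024)     /* Assumed size */

union semun {                          /* Must be defined by the caller */
    int              val;
    struct semid_ds *buf;
    unsigned short  *array;
};

/* Child: add 1 to semaphore 0, recording a semadj of -1, then exit. */
static int up_with_undo(void *arg)
{
    struct sembuf op = { .sem_num = 0, .sem_op = 1, .sem_flg = SEM_UNDO };

    return semop(*(int *) arg, &op, 1) == -1;     /* Exit status 0 on success */
}

/* Returns the semaphore's value after the child has exited, or -1. */
static int value_after_child_exit(int flags)
{
    union semun init = { .val = 0 };
    char *stack;
    pid_t pid;
    int semid, status, value = -1;

    semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1)
        return -1;
    if (semctl(semid, 0, SETVAL, init) == -1)
        goto remove_sem;

    stack = mmap(NULL, SEM_STACK_SIZE, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        goto remove_sem;

    pid = clone(up_with_undo, stack + SEM_STACK_SIZE, flags | SIGCHLD, &semid);
    if (pid != -1 && waitpid(pid, &status, 0) != -1
        && WIFEXITED(status) && WEXITSTATUS(status) == 0)
        value = semctl(semid, 0, GETVAL);   /* The child has terminated */

    munmap(stack, SEM_STACK_SIZE);
remove_sem:
    semctl(semid, 0, IPC_RMID);
    return value;
}
```

              Under these assumptions, value_after_child_exit(0) returns 0 and
              value_after_child_exit(CLONE_SYSVSEM) returns 1.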
576
577       CLONE_THREAD (since Linux 2.4.0)
578              If  CLONE_THREAD  is set, the child is placed in the same thread
579              group as the calling process.  To make the remainder of the dis‐
580              cussion of CLONE_THREAD more readable, the term "thread" is used
581              to refer to the processes within a thread group.
582
583              Thread groups were a feature added in Linux 2.4 to  support  the
584              POSIX  threads  notion  of  a set of threads that share a single
585              PID.  Internally, this shared PID is the so-called thread  group
586              identifier  (TGID) for the thread group.  Since Linux 2.4, calls
587              to getpid(2) return the TGID of the caller.
588
589              The threads within a group can be distinguished by  their  (sys‐
590              tem-wide) unique thread IDs (TID).  A new thread's TID is avail‐
591              able as the function result returned to the caller, and a thread
592              can obtain its own TID using gettid(2).
593
594              When  a clone call is made without specifying CLONE_THREAD, then
595              the resulting thread is placed in a new thread group whose  TGID
596              is  the  same as the thread's TID.  This thread is the leader of
597              the new thread group.
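              A minimal sketch, assuming glibc's clone() wrapper and a
              hypothetical helper child_is_group_leader(). A child created
              without CLONE_THREAD compares its TGID with its TID (both obtained
              through syscall(2)) and checks that its parent is the caller.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define LEADER_STACK_SIZE (64 * 1024)  /* Assumed size */

/* Child: exit 0 if it leads its own thread group (TGID == TID) and its
   parent is the process that called clone(); otherwise exit 1. */
static int check_leader(void *arg)
{
    pid_t caller = *(pid_t *) arg;

    if (syscall(SYS_getpid) != syscall(SYS_gettid))
        return 1;
    return getppid() == caller ? 0 : 1;
}

/* Returns 1 if a child created without CLONE_THREAD reported being a
   thread-group leader, 0 if not, or -1 on error. */
static int child_is_group_leader(void)
{
    pid_t self = getpid(), pid;
    char *stack;
    int status, result = -1;

    stack = mmap(NULL, LEADER_STACK_SIZE, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    pid = clone(check_leader, stack + LEADER_STACK_SIZE, SIGCHLD, &self);
    if (pid != -1 && waitpid(pid, &status, 0) != -1 && WIFEXITED(status))
        result = (WEXITSTATUS(status) == 0);

    munmap(stack, LEADER_STACK_SIZE);
    return result;
}
```

              child_is_group_leader() is expected to return 1.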
598
599              A new thread created  with  CLONE_THREAD  has  the  same  parent
600              process  as  the  process  that  made the clone call (i.e., like
601              CLONE_PARENT), so that calls to getppid(2) return the same value
602              for  all  of the threads in a thread group.  When a CLONE_THREAD
603              thread terminates, the thread that created  it  is  not  sent  a
604              SIGCHLD  (or  other  termination)  signal; nor can the status of
605              such a thread be obtained using wait(2).  (The thread is said to
606              be detached.)
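              Because such a thread cannot be reaped with wait(2), another way
              is needed to learn when it has terminated. The sketch below uses
              CLONE_CHILD_CLEARTID, the mechanism on which NPTL's
              pthread_join(3) relies: when the thread exits, the kernel zeroes
              the TID word and performs a futex(2) wake on it. The helper and
              structure names are hypothetical, and the stack size is
              arbitrary. Without CLONE_SETTLS, the new thread shares the
              caller's thread-local storage, so it restricts itself to raw
              system calls.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <sched.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define THREAD_STACK_SIZE (64 * 1024)  /* Assumed size */

struct thread_ids {
    pid_t tgid;    /* What the new thread saw from getpid() */
    pid_t tid;     /* What the new thread saw from gettid() */
};

/* Runs in the new thread; uses only raw system calls (shared TLS). */
static int record_ids(void *arg)
{
    struct thread_ids *ids = arg;

    ids->tgid = (pid_t) syscall(SYS_getpid);
    ids->tid = (pid_t) syscall(SYS_gettid);
    return 0;
}

/* Creates a CLONE_THREAD child and waits for it to terminate.  Returns
   the TID that clone() returned, or -1 on error. */
static pid_t run_thread(struct thread_ids *ids)
{
    const int flags = CLONE_VM | CLONE_SIGHAND | CLONE_THREAD |
                      CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID;
    volatile pid_t tid_word = 0;   /* Set by clone(), zeroed at thread exit */
    pid_t tid, seen;
    char *stack;

    stack = mmap(NULL, THREAD_STACK_SIZE, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    /* CLONE_PARENT_SETTID stores the TID before clone() returns, so the
       loop below cannot miss the exit-time clear. */
    tid = clone(record_ids, stack + THREAD_STACK_SIZE, flags, ids,
                (pid_t *) &tid_word, NULL, (pid_t *) &tid_word);
    if (tid == -1) {
        munmap(stack, THREAD_STACK_SIZE);
        return -1;
    }

    /* Sleep until the kernel zeroes the word; retry on EAGAIN/EINTR. */
    while ((seen = tid_word) != 0)
        syscall(SYS_futex, (pid_t *) &tid_word, FUTEX_WAIT, seen, NULL, NULL, 0);

    munmap(stack, THREAD_STACK_SIZE);  /* The thread no longer uses it */
    return tid;
}
```

              After run_thread(&ids) succeeds, ids.tgid equals the caller's
              getpid(), and ids.tid equals the returned TID, which differs from
              the TGID.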
607
608              After all of the threads in a thread group terminate, the parent
609              process of the thread group is sent a SIGCHLD (or other termina‐
610              tion) signal.
611
612              If  any  of the threads in a thread group performs an execve(2),
613              then all threads other than the thread group leader  are  termi‐
614              nated,  and  the  new  program  is  executed in the thread group
615              leader.
616
617              If one of the threads in a thread group creates  a  child  using
618              fork(2),  then  any  thread  in  the  group can wait(2) for that
619              child.
620
621              Since Linux 2.5.35, the flags mask must also include  CLONE_SIG‐
622              HAND  if  CLONE_THREAD  is specified (and note that, since Linux
623              2.6.0, CLONE_SIGHAND also requires CLONE_VM to be included).
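              These dependencies can be observed directly. A minimal sketch,
              assuming glibc's clone() wrapper. clone_errno() is a hypothetical
              helper that returns the errno value from a failed clone() call,
              or 0 after creating and reaping a child.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define PROBE_STACK_SIZE (64 * 1024)   /* Assumed size */

static int probe_child(void *arg)
{
    (void) arg;
    return 0;                          /* Terminate immediately */
}

/* Calls clone(flags | SIGCHLD).  Returns the errno value if clone()
   fails, or 0 if a child was created and then reaped.  Not for valid
   CLONE_THREAD combinations: such a child cannot be reaped by waitpid(). */
static int clone_errno(int flags)
{
    char *stack;
    pid_t pid;
    int result;

    stack = mmap(NULL, PROBE_STACK_SIZE, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return errno;

    pid = clone(probe_child, stack + PROBE_STACK_SIZE, flags | SIGCHLD, NULL);
    if (pid == -1)
        result = errno;                /* Saved before munmap() can change it */
    else
        result = (waitpid(pid, NULL, 0) == -1) ? errno : 0;

    munmap(stack, PROBE_STACK_SIZE);
    return result;
}
```

              Under these assumptions, clone_errno(CLONE_SIGHAND),
              clone_errno(CLONE_THREAD), and clone_errno(CLONE_THREAD |
              CLONE_SIGHAND) each return EINVAL, whereas clone_errno(CLONE_VM |
              CLONE_SIGHAND) returns 0.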
624
625              Signal dispositions and actions are process-wide: if  an  unhan‐
626              dled  signal is delivered to a thread, then it will affect (ter‐
627              minate, stop, continue, be ignored in) all members of the thread
628              group.
629
630              Each thread has its own signal mask, as set by sigprocmask(2).
631
632              A signal may be process-directed or thread-directed.  A process-
633              directed signal is targeted at a thread group  (i.e.,  a  TGID),
634              and  is  delivered  to an arbitrarily selected thread from among
635              those that are  not  blocking  the  signal.   A  signal  may  be
636              process-directed because it was generated by the kernel for rea‐
637              sons other than a hardware exception, or because it was sent us‐
638              ing  kill(2)  or  sigqueue(3).  A thread-directed signal is tar‐
639              geted at (i.e., delivered to) a specific thread.  A  signal  may
640              be thread-directed because it was sent using tgkill(2) or
641              pthread_sigqueue(3), or because the thread  executed  a  machine
642              language  instruction that triggered a hardware exception (e.g.,
643              invalid memory access triggering SIGSEGV or a floating-point ex‐
644              ception triggering SIGFPE).
645
646              A  call  to sigpending(2) returns a signal set that is the union
647              of the pending process-directed signals and the signals that are
648              pending for the calling thread.
649
650              If a process-directed signal is delivered to a thread group, and
651              the thread group has installed a handler for  the  signal,  then
652              the handler is invoked in exactly one, arbitrarily selected mem‐
653              ber of the thread group that has not  blocked  the  signal.   If
654              multiple  threads in a group are waiting to accept the same sig‐
655              nal using sigwaitinfo(2), the kernel will arbitrarily select one
656              of these threads to receive the signal.
657
658       CLONE_UNTRACED (since Linux 2.5.46)
659              If  CLONE_UNTRACED  is  specified, then a tracing process cannot
660              force CLONE_PTRACE on this child process.
661
662       CLONE_VFORK (since Linux 2.2)
663              If CLONE_VFORK is set, the execution of the calling  process  is
664              suspended  until the child releases its virtual memory resources
665              via a call to execve(2) or _exit(2) (as with vfork(2)).
666
667              If CLONE_VFORK is not set, then both the calling process and the
668              child  are schedulable after the call, and an application should
669              not rely on execution occurring in any particular order.
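              A minimal sketch of the suspension, assuming glibc's clone()
              wrapper and a hypothetical helper flag_seen_on_return(). The child
              sets a flag in memory shared through CLONE_VM and terminates.
              Because of CLONE_VFORK, the caller can read the flag as soon as
              clone() returns, with no further synchronization.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define VFORK_STACK_SIZE (64 * 1024)   /* Assumed size */

/* Child: set the caller's flag (shared via CLONE_VM), then terminate. */
static int mark_done(void *arg)
{
    *(volatile int *) arg = 1;
    return 0;
}

/* Returns the flag value read immediately after clone() returns in the
   caller, before any waitpid(), or -1 on error. */
static int flag_seen_on_return(void)
{
    volatile int done = 0;
    char *stack;
    pid_t pid;
    int seen = -1;

    stack = mmap(NULL, VFORK_STACK_SIZE, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    /* CLONE_VFORK: the caller stays suspended until the child releases
       the shared memory (here, by terminating). */
    pid = clone(mark_done, stack + VFORK_STACK_SIZE,
                CLONE_VM | CLONE_VFORK | SIGCHLD, (void *) &done);
    if (pid != -1) {
        seen = done;                   /* Child has already terminated */
        waitpid(pid, NULL, 0);         /* Reap it */
    }

    munmap(stack, VFORK_STACK_SIZE);
    return seen;
}
```

              flag_seen_on_return() is expected to return 1.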
670
671       CLONE_VM (since Linux 2.0)
672              If CLONE_VM is set, the calling process and  the  child  process
673              run in the same memory space.  In particular, memory writes per‐
674              formed by the calling process or by the child process  are  also
675              visible  in  the other process.  Moreover, any memory mapping or
676              unmapping performed with mmap(2) or munmap(2) by  the  child  or
677              calling process also affects the other process.
678
679              If  CLONE_VM  is  not  set, the child process runs in a separate
680              copy of the memory space of the calling process at the  time  of
681              the  clone call.  Memory writes or file mappings/unmappings per‐
682              formed by one of the processes do not affect the other, as  with
683              fork(2).
684
685              If  the  CLONE_VM  flag is specified and the CLONE_VFORK flag is
686              not specified, then any alternate signal stack that  was  estab‐
687              lished by sigaltstack(2) is cleared in the child process.
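              A minimal sketch contrasting the two memory cases above, assuming
              glibc's clone() wrapper and a hypothetical helper
              value_after_child_write(). The child stores 42 into an int owned
              by the caller.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define VM_STACK_SIZE (64 * 1024)      /* Assumed size */

/* Child: write 42 into the int that the caller passed by address. */
static int store_42(void *arg)
{
    *(int *) arg = 42;
    return 0;
}

/* Returns the caller's copy of the int after the child has written it
   and terminated, or -1 on error. */
static int value_after_child_write(int flags)
{
    int value = 0, result = -1;
    char *stack;
    pid_t pid;

    stack = mmap(NULL, VM_STACK_SIZE, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    pid = clone(store_42, stack + VM_STACK_SIZE, flags | SIGCHLD, &value);
    if (pid != -1 && waitpid(pid, NULL, 0) != -1)
        result = value;

    munmap(stack, VM_STACK_SIZE);
    return result;
}
```

              value_after_child_write(CLONE_VM) returns 42, while
              value_after_child_write(0) returns 0, because the child then
              writes only to its own copy of the memory.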
688

RETURN VALUE

690       On  success,  the  thread  ID  of  the child process is returned in the
691       caller's thread of execution.   On  failure,  -1  is  returned  in  the
692       caller's  context, no child process is created, and errno is set to in‐
693       dicate the error.
694

ERRORS

696       EACCES (clone3() only)
697              CLONE_INTO_CGROUP was specified in cl_args.flags,  but  the  re‐
698              strictions  (described  in  cgroups(7))  on  placing  the  child
699              process into the version 2 cgroup referred to by  cl_args.cgroup
700              are not met.
701
702       EAGAIN Too many processes are already running; see fork(2).
703
704       EBUSY (clone3() only)
705              CLONE_INTO_CGROUP  was  specified in cl_args.flags, but the file
706              descriptor specified in cl_args.cgroup refers  to  a  version  2
707              cgroup in which a domain controller is enabled.
708
709       EEXIST (clone3() only)
710              One (or more) of the PIDs specified in set_tid already exists in
711              the corresponding PID namespace.
712
713       EINVAL Both CLONE_SIGHAND and CLONE_CLEAR_SIGHAND were specified in the
714              flags mask.
715
716       EINVAL CLONE_SIGHAND  was specified in the flags mask, but CLONE_VM was
717              not.  (Since Linux 2.6.0.)
718
719       EINVAL CLONE_THREAD was specified in the flags mask, but  CLONE_SIGHAND
720              was not.  (Since Linux 2.5.35.)
721
722       EINVAL CLONE_THREAD  was  specified  in the flags mask, but the current
723              process previously called unshare(2) with the CLONE_NEWPID  flag
724              or used setns(2) to reassociate itself with a PID namespace.
725
726       EINVAL Both CLONE_FS and CLONE_NEWNS were specified in the flags mask.
727
728       EINVAL (since Linux 3.9)
729              Both  CLONE_NEWUSER  and  CLONE_FS  were  specified in the flags
730              mask.
731
732       EINVAL Both CLONE_NEWIPC and CLONE_SYSVSEM were specified in the  flags
733              mask.
734
735       EINVAL One (or both) of CLONE_NEWPID or CLONE_NEWUSER and one (or both)
736              of CLONE_THREAD or CLONE_PARENT  were  specified  in  the  flags
737              mask.
738
739       EINVAL (since Linux 2.6.32)
740              CLONE_PARENT was specified, and the caller is an init process.
741
742       EINVAL Returned  by the glibc clone() wrapper function when fn or stack
743              is specified as NULL.
744
745       EINVAL CLONE_NEWIPC was specified in the flags mask, but the kernel was
746              not  configured  with  the  CONFIG_SYSVIPC and CONFIG_IPC_NS op‐
747              tions.
748
749       EINVAL CLONE_NEWNET was specified in the flags mask, but the kernel was
750              not configured with the CONFIG_NET_NS option.
751
752       EINVAL CLONE_NEWPID was specified in the flags mask, but the kernel was
753              not configured with the CONFIG_PID_NS option.
754
755       EINVAL CLONE_NEWUSER was specified in the flags mask,  but  the  kernel
756              was not configured with the CONFIG_USER_NS option.
757
758       EINVAL CLONE_NEWUTS was specified in the flags mask, but the kernel was
759              not configured with the CONFIG_UTS_NS option.
760
761       EINVAL stack is not aligned to a suitable boundary for  this  architec‐
762              ture.  For example, on aarch64, stack must be a multiple of 16.
763
764       EINVAL (clone3() only)
765              CLONE_DETACHED was specified in the flags mask.
766
767       EINVAL (clone() only)
768              CLONE_PIDFD  was  specified  together with CLONE_DETACHED in the
769              flags mask.
770
771       EINVAL CLONE_PIDFD was specified  together  with  CLONE_THREAD  in  the
772              flags mask.
773
774       EINVAL (clone() only)
775              CLONE_PIDFD  was  specified together with CLONE_PARENT_SETTID in
776              the flags mask.
777
778       EINVAL (clone3() only)
779              set_tid_size is greater than the  number  of  nested  PID  name‐
780              spaces.
781
782       EINVAL (clone3() only)
783              One of the PIDs specified in set_tid was invalid.
784
785       EINVAL (clone3() only)
786              CLONE_THREAD  or  CLONE_PARENT  was specified in the flags mask,
787              but a signal was specified in exit_signal.
788
789       EINVAL (AArch64 only, Linux 4.6 and earlier)
790              stack was not aligned to a 128-bit boundary.
791
792       ENOMEM Insufficient memory to allocate a task structure for the child,
793              or to copy those parts of the caller's context that need to be
794              copied.
795
796       ENOSPC (since Linux 3.7)
797              CLONE_NEWPID was specified in the flags mask, but the  limit  on
798              the  nesting  depth  of PID namespaces would have been exceeded;
799              see pid_namespaces(7).
800
801       ENOSPC (since Linux 4.9; beforehand EUSERS)
802              CLONE_NEWUSER was specified in the  flags  mask,  and  the  call
803              would cause the limit on the number of nested user namespaces to
804              be exceeded.  See user_namespaces(7).
805
806              From Linux 3.11 to Linux 4.8, the error diagnosed in  this  case
807              was EUSERS.
808
809       ENOSPC (since Linux 4.9)
810              One  of the values in the flags mask specified the creation of a
811              new user namespace, but doing so would have caused the limit de‐
812              fined  by  the  corresponding  file  in /proc/sys/user to be ex‐
813              ceeded.  For further details, see namespaces(7).
814
815       EOPNOTSUPP (clone3() only)
816              CLONE_INTO_CGROUP was specified in cl_args.flags, but  the  file
817              descriptor  specified  in  cl_args.cgroup  refers to a version 2
818              cgroup that is in the domain invalid state.
819
820       EPERM  CLONE_NEWCGROUP,   CLONE_NEWIPC,   CLONE_NEWNET,    CLONE_NEWNS,
821              CLONE_NEWPID,  or  CLONE_NEWUTS was specified by an unprivileged
822              process (process without CAP_SYS_ADMIN).
823
824       EPERM  CLONE_PID was specified by  a  process  other  than  process  0.
825              (This error occurs only on Linux 2.5.15 and earlier.)
826
827       EPERM  CLONE_NEWUSER  was  specified  in the flags mask, but either the
828              effective user ID or the effective group ID of the  caller  does
829              not  have  a  mapping  in  the  parent namespace (see user_name‐
830              spaces(7)).
831
832       EPERM (since Linux 3.9)
833              CLONE_NEWUSER was specified in the flags mask and the caller  is
834              in  a chroot environment (i.e., the caller's root directory does
835              not match the root directory of the mount namespace in which  it
836              resides).
837
838       EPERM (clone3() only)
839              set_tid_size  was  greater  than  zero, and the caller lacks the
840              CAP_SYS_ADMIN capability in one or more of the  user  namespaces
841              that own the corresponding PID namespaces.
842
843       ERESTARTNOINTR (since Linux 2.6.17)
844              System  call  was interrupted by a signal and will be restarted.
845              (This can be seen only during a trace.)
846
847       EUSERS (Linux 3.11 to Linux 4.8)
848              CLONE_NEWUSER was specified in the flags mask, and the limit  on
849              the number of nested user namespaces would be exceeded.  See the
850              discussion of the ENOSPC error above.
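       The glibc wrapper checks fn and stack before making any system call, so
       its EINVAL case is easy to observe. A minimal sketch with a hypothetical
       helper null_stack_errno():

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>

static int never_runs(void *arg)
{
    (void) arg;
    return 0;
}

/* Returns the errno value reported by the glibc clone() wrapper for a
   NULL stack, or 0 if the call unexpectedly succeeded. */
static int null_stack_errno(void)
{
    if (clone(never_runs, NULL, SIGCHLD, NULL) == -1)
        return errno;
    return 0;
}
```

       With glibc, null_stack_errno() is expected to return EINVAL.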
851

VERSIONS

853       The glibc clone() wrapper function makes some  changes  in  the  memory
854       pointed to by stack (changes required to set the stack up correctly for
855       the child) before invoking the clone() system call.  So, in cases where
856       clone()  is  used to recursively create children, do not use the buffer
857       employed for the parent's stack as the stack of the child.
858
859       On i386, clone() should not be called through  vsyscall,  but  directly
860       through int $0x80.
861
862   C library/kernel differences
863       The raw clone() system call corresponds more closely to fork(2) in that
864       execution in the child continues from the point of the call.  As  such,
865       the fn and arg arguments of the clone() wrapper function are omitted.
866
867       In  contrast  to the glibc wrapper, the raw clone() system call accepts
868       NULL as a stack argument (and clone3() likewise allows cl_args.stack to
869       be  NULL).   In  this  case, the child uses a duplicate of the parent's
870       stack.  (Copy-on-write semantics ensure that the  child  gets  separate
871       copies of stack pages when either process modifies the stack.)  In this
872       case, for correct operation, the CLONE_VM option should not  be  speci‐
873       fied.   (If  the child shares the parent's memory because of the use of
874       the CLONE_VM flag, then no copy-on-write duplication occurs  and  chaos
875       is likely to result.)
876
877       The  order  of  the  arguments also differs in the raw system call, and
878       there are variations in the arguments across architectures, as detailed
879       in the following paragraphs.
880
881       The  raw  system  call interface on x86-64 and some other architectures
882       (including sh, tile, and alpha) is:
883
884           long clone(unsigned long flags, void *stack,
885                      int *parent_tid, int *child_tid,
886                      unsigned long tls);
887
888       On x86-32, and several other  common  architectures  (including  score,
889       ARM,  ARM  64,  PA-RISC, arc, Power PC, xtensa, and MIPS), the order of
890       the last two arguments is reversed:
891
892           long clone(unsigned long flags, void *stack,
893                      int *parent_tid, unsigned long tls,
894                      int *child_tid);
895
896       On the cris and s390 architectures, the order of the  first  two  argu‐
897       ments is reversed:
898
899           long clone(void *stack, unsigned long flags,
900                      int *parent_tid, int *child_tid,
901                      unsigned long tls);
902
903       On the microblaze architecture, an additional argument is supplied:
904
905           long clone(unsigned long flags, void *stack,
906                      int stack_size,         /* Size of stack */
907                      int *parent_tid, int *child_tid,
908                      unsigned long tls);
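       The argument orders above can be hidden behind one helper. The sketch
       below uses the hypothetical name raw_clone() and covers only the three
       five-argument layouts shown. It selects among them with GCC's predefined
       architecture macros (an assumption worth checking for your toolchain)
       and refuses to compile elsewhere. It then exercises the NULL-stack mode
       described earlier, in which the child resumes from the point of the
       call with a return value of 0, as with fork(2).

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Issue the raw clone() system call, mapping one logical argument order
   onto the per-architecture layouts described above. */
static long raw_clone(unsigned long flags, void *stack, int *parent_tid,
                      int *child_tid, unsigned long tls)
{
#if defined(__x86_64__) || defined(__alpha__) || defined(__sh__)
    return syscall(SYS_clone, flags, stack, parent_tid, child_tid, tls);
#elif defined(__i386__) || defined(__arm__) || defined(__aarch64__) || \
      defined(__powerpc__) || defined(__mips__) || defined(__hppa__) || \
      defined(__arc__) || defined(__xtensa__)
    /* Last two arguments reversed. */
    return syscall(SYS_clone, flags, stack, parent_tid, tls, child_tid);
#elif defined(__s390__)
    /* First two arguments reversed. */
    return syscall(SYS_clone, stack, flags, parent_tid, child_tid, tls);
#else
#error "raw_clone(): argument order for this architecture not covered"
#endif
}

/* NULL stack, no CLONE_VM: the child gets a copy-on-write duplicate of
   the caller's stack and resumes here with a return value of 0.
   Returns the child's exit status, or -1 on error. */
static int raw_clone_fork_like(int code)
{
    int status;
    long ret = raw_clone(SIGCHLD, NULL, NULL, NULL, 0);

    if (ret == -1)
        return -1;
    if (ret == 0)
        _exit(code);                   /* Child */
    if (waitpid((pid_t) ret, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

       raw_clone_fork_like(7) is expected to return 7, the child's exit
       status.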
909
910   blackfin, m68k, and sparc
911       The  argument-passing conventions on blackfin, m68k, and sparc are dif‐
912       ferent from the descriptions above.  For details, see the  kernel  (and
913       glibc) source.
914
915   ia64
916       On ia64, a different interface is used:
917
918           int __clone2(int (*fn)(void *),
919                        void *stack_base, size_t stack_size,
920                        int flags, void *arg, ...
921                     /* pid_t *parent_tid, struct user_desc *tls,
922                        pid_t *child_tid */ );
923
924       The  prototype  shown  above is for the glibc wrapper function; for the
925       system call itself, the prototype can be described as  follows  (it  is
926       identical to the clone() prototype on microblaze):
927
928           long clone2(unsigned long flags, void *stack_base,
929                       int stack_size,         /* Size of stack */
930                       int *parent_tid, int *child_tid,
931                       unsigned long tls);
932
933       __clone2()  operates in the same way as clone(), except that stack_base
934       points to the lowest address of the child's stack area, and  stack_size
935       specifies the size of the stack pointed to by stack_base.
936

STANDARDS

938       Linux.
939

HISTORY

941       clone3()
942              Linux 5.3.
943
944   Linux 2.4 and earlier
945       In  the  Linux  2.4.x  series, CLONE_THREAD generally does not make the
946       parent of the new thread the same as the parent of the calling process.
947       However, from Linux 2.4.7 to Linux 2.4.18 the CLONE_THREAD flag implied
948       the CLONE_PARENT flag (as in Linux 2.6.0 and later).
949
950       In Linux 2.4 and earlier, clone() does not take  arguments  parent_tid,
951       tls, and child_tid.
952

NOTES

954       One use of these system calls is to implement threads: multiple flows
955       of control in a program that  run  concurrently  in  a  shared  address
956       space.
957
958       The kcmp(2) system call can be used to test whether two processes share
959       various resources such as a file descriptor table, System  V  semaphore
960       undo operations, or a virtual address space.
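       A minimal sketch of such a test, using the hypothetical helper
       shares_resource(). glibc provides no kcmp() wrapper, so the call goes
       through syscall(2); kcmp(2) returns 0 when the two resources are the
       same. The call can fail where kcmp(2) is not built into the kernel or is
       blocked (for example, by a seccomp filter), and the helper reports -1 in
       that case.

```c
#define _GNU_SOURCE
#include <linux/kcmp.h>      /* KCMP_VM, KCMP_FILES, KCMP_SYSVSEM, ... */
#include <sys/syscall.h>     /* SYS_kcmp */
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if pid1 and pid2 share the resource selected by 'type',
   0 if they do not, or -1 if kcmp() itself failed (e.g., ENOSYS, EPERM). */
static int shares_resource(pid_t pid1, pid_t pid2, int type)
{
    long ret = syscall(SYS_kcmp, pid1, pid2, type, 0UL, 0UL);

    if (ret == -1)
        return -1;
    return ret == 0;          /* kcmp() returns 0 when the two are equal */
}
```

       For a live child created with CLONE_VM, shares_resource(getpid(), pid,
       KCMP_VM) would be expected to return 1; for a child created without
       CLONE_VM, 0.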
961
962       Handlers  registered  using pthread_atfork(3) are not executed during a
963       clone call.
964

BUGS

966       GNU C library versions 2.3.4 up to and including 2.24 contained a wrap‐
967       per  function  for  getpid(2)  that  performed  caching  of PIDs.  This
968       caching relied on support in the glibc wrapper for clone(), but limita‐
969       tions  in the implementation meant that the cache was not up to date in
970       some circumstances.  In particular, if a signal was  delivered  to  the
971       child immediately after the clone() call, then a call to getpid(2) in a
972       handler for the signal could return the  PID  of  the  calling  process
973       ("the parent"), if the clone wrapper had not yet had a chance to update
974       the PID cache in the child.  (This discussion ignores  the  case  where
975       the  child was created using CLONE_THREAD, when getpid(2) should return
976       the same value in the child and in the  process  that  called  clone(),
977       since  the  caller  and  the  child  are in the same thread group.  The
978       stale-cache problem also does not occur if the flags argument  includes
979       CLONE_VM.)  To obtain the correct PID, it was sometimes necessary to use code
980       such as the following:
981
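982           #include <sys/syscall.h>    /* Definition of SYS_getpid */
983           #include <unistd.h>         /* Declaration of syscall() */
984
985           pid_t mypid;
986           mypid = syscall(SYS_getpid);  /* Bypasses the glibc PID cache */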
987
988       Because of the stale-cache problem, as well as other problems noted  in
989       getpid(2), the PID caching feature was removed in glibc 2.25.
990

EXAMPLES

992       The following program demonstrates the use of clone() to create a child
993       process that executes in a separate UTS namespace.  The  child  changes
994       the  hostname in its UTS namespace.  Both parent and child then display
995       the system hostname, making it possible to see that the  hostname  dif‐
996       fers  in the UTS namespaces of the parent and child.  For an example of
997       the use of this program, see setns(2).
998
999       Within the sample program, we allocate the memory that is  to  be  used
1000       for  the child's stack using mmap(2) rather than malloc(3) for the fol‐
1001       lowing reasons:
1002
1003       •  mmap(2) allocates a block of memory that starts on a  page  boundary
1004          and  is  a  multiple of the page size.  This is useful if we want to
1005          establish a guard page (a page with protection PROT_NONE) at the end
1006          of the stack using mprotect(2).
1007
1008       •  We can specify the MAP_STACK flag to request a mapping that is suit‐
1009          able for a stack.  For the moment, this flag is a  no-op  on  Linux,
1010          but it exists and has effect on some other systems, so we should in‐
1011          clude it for portability.
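       The guard-page idea from the first point can be sketched as follows.
       The structure and function names are hypothetical. The code assumes a
       downward-growing stack, so the PROT_NONE guard page is placed at the
       lowest address of the mapping, and top is the value that would be
       passed as the stack argument of clone().

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

struct child_stack {
    char   *base;     /* Start of the mapping (the guard page) */
    size_t  length;   /* Total length, guard page included */
    char   *top;      /* Highest address: pass this as clone()'s stack */
};

/* Map at least 'size' usable bytes plus one PROT_NONE guard page below
   them.  Returns 0 on success or -1 on failure (errno set). */
static int child_stack_alloc(struct child_stack *cs, size_t size)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t usable = (size + page - 1) / page * page;   /* Round up */

    cs->length = usable + page;
    cs->base = mmap(NULL, cs->length, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (cs->base == MAP_FAILED)
        return -1;

    /* An overflow of a downward-growing stack runs into the lowest
       page; make that page inaccessible. */
    if (mprotect(cs->base, page, PROT_NONE) == -1) {
        munmap(cs->base, cs->length);
        return -1;
    }

    cs->top = cs->base + cs->length;
    return 0;
}

static void child_stack_free(struct child_stack *cs)
{
    munmap(cs->base, cs->length);
}
```

       The mapping should be released only after the child has terminated,
       and, as noted under VERSIONS, a buffer in use as one process's stack
       should not be reused as the stack of that process's child.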
1012
1013   Program source
1014       #define _GNU_SOURCE
1015       #include <err.h>
1016       #include <sched.h>
1017       #include <signal.h>
1018       #include <stdint.h>
1019       #include <stdio.h>
1020       #include <stdlib.h>
1021       #include <string.h>
1022       #include <sys/mman.h>
1023       #include <sys/utsname.h>
1024       #include <sys/wait.h>
1025       #include <unistd.h>
1026
1027       static int              /* Start function for cloned child */
1028       childFunc(void *arg)
1029       {
1030           struct utsname uts;
1031
1032           /* Change hostname in UTS namespace of child. */
1033
1034           if (sethostname(arg, strlen(arg)) == -1)
1035               err(EXIT_FAILURE, "sethostname");
1036
1037           /* Retrieve and display hostname. */
1038
1039           if (uname(&uts) == -1)
1040               err(EXIT_FAILURE, "uname");
1041           printf("uts.nodename in child:  %s\n", uts.nodename);
1042
1043           /* Keep the namespace open for a while, by sleeping.
1044              This allows some experimentation--for example, another
1045              process might join the namespace. */
1046
1047           sleep(200);
1048
1049           return 0;           /* Child terminates now */
1050       }
1051
1052       #define STACK_SIZE (1024 * 1024)    /* Stack size for cloned child */
1053
1054       int
1055       main(int argc, char *argv[])
1056       {
1057           char            *stack;         /* Start of stack buffer */
1058           char            *stackTop;      /* End of stack buffer */
1059           pid_t           pid;
1060           struct utsname  uts;
1061
1062           if (argc < 2) {
1063               fprintf(stderr, "Usage: %s <child-hostname>\n", argv[0]);
1064               exit(EXIT_FAILURE);
1065           }
1066
1067           /* Allocate memory to be used for the stack of the child. */
1068
1069           stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
1070                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
1071           if (stack == MAP_FAILED)
1072               err(EXIT_FAILURE, "mmap");
1073
1074           stackTop = stack + STACK_SIZE;  /* Assume stack grows downward */
1075
1076           /* Create child that has its own UTS namespace;
1077              child commences execution in childFunc(). */
1078
1079           pid = clone(childFunc, stackTop, CLONE_NEWUTS | SIGCHLD, argv[1]);
1080           if (pid == -1)
1081               err(EXIT_FAILURE, "clone");
1082           printf("clone() returned %jd\n", (intmax_t) pid);
1083
1084           /* Parent falls through to here */
1085
1086           sleep(1);           /* Give child time to change its hostname */
1087
1088           /* Display hostname in parent's UTS namespace. This will be
1089              different from hostname in child's UTS namespace. */
1090
1091           if (uname(&uts) == -1)
1092               err(EXIT_FAILURE, "uname");
1093           printf("uts.nodename in parent: %s\n", uts.nodename);
1094
1095           if (waitpid(pid, NULL, 0) == -1)    /* Wait for child */
1096               err(EXIT_FAILURE, "waitpid");
1097           printf("child has terminated\n");
1098
1099           exit(EXIT_SUCCESS);
1100       }
1101

SEE ALSO

1103       fork(2),   futex(2),   getpid(2),    gettid(2),    kcmp(2),    mmap(2),
1104       pidfd_open(2),    set_thread_area(2),   set_tid_address(2),   setns(2),
1105       tkill(2),   unshare(2),   wait(2),   capabilities(7),    namespaces(7),
1106       pthreads(7)
1107
1108
1109
1110Linux man-pages 6.05              2023-05-03                          clone(2)