CLONE(2)                   Linux Programmer's Manual                  CLONE(2)

6 clone, __clone2, clone3 - create a child process
7
/* Prototype for the glibc wrapper function */

#define _GNU_SOURCE
#include <sched.h>

int clone(int (*fn)(void *), void *stack, int flags, void *arg, ...
          /* pid_t *parent_tid, void *tls, pid_t *child_tid */ );

/* For the prototype of the raw clone() system call, see NOTES */

long clone3(struct clone_args *cl_args, size_t size);
20
21 Note: There is not yet a glibc wrapper for clone3(); see NOTES.
22
24 These system calls create a new ("child") process, in a manner similar
25 to fork(2).
26
27 By contrast with fork(2), these system calls provide more precise con‐
28 trol over what pieces of execution context are shared between the call‐
29 ing process and the child process. For example, using these system
30 calls, the caller can control whether or not the two processes share
31 the virtual address space, the table of file descriptors, and the table
32 of signal handlers. These system calls also allow the new child
33 process to be placed in separate namespaces(7).
34
35 Note that in this manual page, "calling process" normally corresponds
36 to "parent process". But see the description of CLONE_PARENT below.
37
38 This page describes the following interfaces:
39
40 * The glibc clone() wrapper function and the underlying system call on
41 which it is based. The main text describes the wrapper function;
42 the differences for the raw system call are described toward the end
43 of this page.
44
45 * The newer clone3() system call.
46
In the remainder of this page, the terminology "the clone call" is used
when noting details that apply to all of these interfaces.
49
50 The clone() wrapper function
51 When the child process is created with the clone() wrapper function, it
52 commences execution by calling the function pointed to by the argument
53 fn. (This differs from fork(2), where execution continues in the child
54 from the point of the fork(2) call.) The arg argument is passed as the
55 argument of the function fn.
56
57 When the fn(arg) function returns, the child process terminates. The
58 integer returned by fn is the exit status for the child process. The
59 child process may also terminate explicitly by calling exit(2) or after
60 receiving a fatal signal.
61
62 The stack argument specifies the location of the stack used by the
63 child process. Since the child and calling process may share memory,
64 it is not possible for the child process to execute in the same stack
65 as the calling process. The calling process must therefore set up mem‐
66 ory space for the child stack and pass a pointer to this space to
67 clone(). Stacks grow downward on all processors that run Linux (except
68 the HP PA processors), so stack usually points to the topmost address
69 of the memory space set up for the child stack. Note that clone() does
70 not provide a means whereby the caller can inform the kernel of the
71 size of the stack area.
72
73 The remaining arguments to clone() are discussed below.
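
The stack handling described above can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the names run_child, child_fn, and STACK_SIZE are hypothetical, the 1 MiB stack size is an arbitrary assumption, and the code assumes an architecture on which stacks grow downward.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)   /* assumed size; clone() is never told it */

/* Illustrative child function: its return value becomes the exit status. */
static int child_fn(void *arg)
{
    return *(int *)arg;
}

/* Run child_fn(&code) in a child; return the child's exit status, or -1. */
int run_child(int code)
{
    assert(code >= 0 && code <= 255);   /* range of a normal exit status */

    char *stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    /* Stacks grow downward here, so pass the topmost address. */
    pid_t pid = clone(child_fn, stack + STACK_SIZE, SIGCHLD, &code);

    int wstatus = 0;
    int result = -1;
    if (pid != -1 && waitpid(pid, &wstatus, 0) == pid && WIFEXITED(wstatus))
        result = WEXITSTATUS(wstatus);

    munmap(stack, STACK_SIZE);
    return result;
}
```

Because CLONE_VM is not specified, the child works on its own copy of the caller's memory, so passing the address of the local variable code is safe in this sketch.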
74
75 clone3()
76 The clone3() system call provides a superset of the functionality of
77 the older clone() interface. It also provides a number of API improve‐
78 ments, including: space for additional flags bits; cleaner separation
79 in the use of various arguments; and the ability to specify the size of
80 the child's stack area.
81
82 As with fork(2), clone3() returns in both the parent and the child. It
83 returns 0 in the child process and returns the PID of the child in the
84 parent.
85
86 The cl_args argument of clone3() is a structure of the following form:
87
struct clone_args {
    u64 flags;        /* Flags bit mask */
    u64 pidfd;        /* Where to store PID file descriptor
                         (int *) */
    u64 child_tid;    /* Where to store child TID,
                         in child's memory (pid_t *) */
    u64 parent_tid;   /* Where to store child TID,
                         in parent's memory (pid_t *) */
    u64 exit_signal;  /* Signal to deliver to parent on
                         child termination */
    u64 stack;        /* Pointer to lowest byte of stack */
    u64 stack_size;   /* Size of stack */
    u64 tls;          /* Location of new TLS */
};
102
103 The size argument that is supplied to clone3() should be initialized to
104 the size of this structure. (The existence of the size argument per‐
105 mits future extensions to the clone_args structure.)
106
107 The stack for the child process is specified via cl_args.stack, which
108 points to the lowest byte of the stack area, and cl_args.stack_size,
109 which specifies the size of the stack in bytes. In the case where the
110 CLONE_VM flag (see below) is specified, a stack must be explicitly
111 allocated and specified. Otherwise, these two fields can be specified
112 as NULL and 0, which causes the child to use the same stack area as the
113 parent (in the child's own virtual address space).
114
115 The remaining fields in the cl_args argument are discussed below.
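
A fork-like use of clone3() might look like the sketch below. It assumes kernel headers recent enough to provide struct clone_args in <linux/sched.h> and SYS_clone3 in <sys/syscall.h>; the helper name clone3_exit_status is hypothetical. Since CLONE_VM is not used, the stack fields are left as zero.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/sched.h>    /* struct clone_args (assumes recent kernel headers) */
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative helper: the child exits with `code`; the parent returns that
   status, or -1 if clone3() is unavailable or fails. */
int clone3_exit_status(int code)
{
    assert(code >= 0 && code <= 255);

    struct clone_args args;
    memset(&args, 0, sizeof(args));   /* stack = 0, stack_size = 0 */
    args.exit_signal = SIGCHLD;

    long pid = syscall(SYS_clone3, &args, sizeof(args));
    if (pid == -1)
        return -1;                    /* e.g., ENOSYS on older kernels */
    if (pid == 0)
        _exit(code);                  /* child: clone3() returned 0 */

    int wstatus = 0;
    if (waitpid((pid_t) pid, &wstatus, 0) != pid || !WIFEXITED(wstatus))
        return -1;
    return WEXITSTATUS(wstatus);
}
```

Zeroing the whole structure before filling in individual fields keeps any fields added by later kernel versions at their default values.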
116
117 Equivalence between clone() and clone3() arguments
118 Unlike the older clone() interface, where arguments are passed individ‐
119 ually, in the newer clone3() interface the arguments are packaged into
120 the clone_args structure shown above. This structure allows for a
121 superset of the information passed via the clone() arguments.
122
123 The following table shows the equivalence between the arguments of
124 clone() and the fields in the clone_args argument supplied to clone3():
125
clone()          clone3()         Notes
                 cl_args field
flags & ~0xff    flags            For most flags; details below
parent_tid       pidfd            See CLONE_PIDFD
child_tid        child_tid        See CLONE_CHILD_SETTID
parent_tid       parent_tid       See CLONE_PARENT_SETTID
flags & 0xff     exit_signal
stack            stack
---              stack_size
tls              tls              See CLONE_SETTLS
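
The first and fifth rows of the table amount to splitting the clone() flags argument into two values. The sketch below shows only that arithmetic; struct flag_split and split_clone_flags are hypothetical names, not part of any API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdint.h>

/* Illustrative pair mirroring two clone_args fields. */
struct flag_split {
    uint64_t flags;         /* corresponds to cl_args.flags */
    uint64_t exit_signal;   /* corresponds to cl_args.exit_signal */
};

/* Split a clone() flags argument the way the table describes. */
struct flag_split split_clone_flags(unsigned long clone_flags)
{
    struct flag_split out;

    out.exit_signal = clone_flags & 0xff;     /* low byte: termination signal */
    out.flags = clone_flags & ~0xffUL;        /* everything else: CLONE_* bits */

    assert((out.flags | out.exit_signal) == clone_flags);
    return out;
}
```

For example, split_clone_flags(CLONE_VM | SIGCHLD) yields flags equal to CLONE_VM and exit_signal equal to SIGCHLD.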
137
138 The child termination signal
139 When the child process terminates, a signal may be sent to the parent.
140 The termination signal is specified in the low byte of flags (clone())
141 or in cl_args.exit_signal (clone3()). If this signal is specified as
142 anything other than SIGCHLD, then the parent process must specify the
143 __WALL or __WCLONE options when waiting for the child with wait(2). If
144 no signal (i.e., zero) is specified, then the parent process is not
145 signaled when the child terminates.
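
The effect of the termination signal on waiting can be sketched as follows. The child is created with no termination signal, so a plain waitpid(2) is expected to fail with ECHILD while a wait using __WALL reaps the child. The function name and the fixed 64 KiB static stack are assumptions made for this illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/wait.h>

static char child_stack[64 * 1024] __attribute__((aligned(16)));

static int child_fn(void *arg)
{
    (void) arg;
    return 0;
}

/* Returns 1 if a plain waitpid() fails with ECHILD and a __WALL wait
   then reaps the child; returns 0 otherwise. */
int needs_wall_without_sigchld(void)
{
    /* Low byte of flags is 0: no signal is sent on termination. */
    pid_t pid = clone(child_fn, child_stack + sizeof(child_stack), 0, NULL);
    assert(pid != -1);                /* sketch-level error handling */

    int wstatus = 0;
    pid_t plain = waitpid(pid, &wstatus, 0);
    int plain_errno = errno;

    pid_t with_wall = waitpid(pid, &wstatus, __WALL);

    return plain == -1 && plain_errno == ECHILD && with_wall == pid;
}
```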
146
147 The flags mask
148 Both clone() and clone3() allow a flags bit mask that modifies their
149 behavior and allows the caller to specify what is shared between the
150 calling process and the child process. This bit mask—the flags argu‐
151 ment of clone() or the cl_args.flags field passed to clone3()—is
152 referred to as the flags mask in the remainder of this page.
153
154 The flags mask is specified as a bitwise-OR of zero or more of the con‐
155 stants listed below. Except as noted below, these flags are available
156 (and have the same effect) in both clone() and clone3().
157
158 CLONE_CHILD_CLEARTID (since Linux 2.5.49)
159 Clear (zero) the child thread ID at the location pointed to by
160 child_tid (clone()) or cl_args.child_tid (clone3()) in child
161 memory when the child exits, and do a wakeup on the futex at
162 that address. The address involved may be changed by the
163 set_tid_address(2) system call. This is used by threading
164 libraries.
165
166 CLONE_CHILD_SETTID (since Linux 2.5.49)
167 Store the child thread ID at the location pointed to by
168 child_tid (clone()) or cl_args.child_tid (clone3()) in the
169 child's memory. The store operation completes before the clone
170 call returns control to user space in the child process. (Note
171 that the store operation may not have completed before the clone
172 call returns in the parent process, which will be relevant if
173 the CLONE_VM flag is also employed.)
174
175 CLONE_DETACHED (historical)
176 For a while (during the Linux 2.5 development series) there was
177 a CLONE_DETACHED flag, which caused the parent not to receive a
178 signal when the child terminated. Ultimately, the effect of
179 this flag was subsumed under the CLONE_THREAD flag and by the
180 time Linux 2.6.0 was released, this flag had no effect. Start‐
181 ing in Linux 2.6.2, the need to give this flag together with
182 CLONE_THREAD disappeared.
183
184 This flag is still defined, but it is usually ignored when call‐
185 ing clone(). However, see the description of CLONE_PIDFD for
186 some exceptions.
187
188 CLONE_FILES (since Linux 2.0)
189 If CLONE_FILES is set, the calling process and the child process
190 share the same file descriptor table. Any file descriptor cre‐
191 ated by the calling process or by the child process is also
192 valid in the other process. Similarly, if one of the processes
193 closes a file descriptor, or changes its associated flags (using
194 the fcntl(2) F_SETFD operation), the other process is also
195 affected. If a process sharing a file descriptor table calls
196 execve(2), its file descriptor table is duplicated (unshared).
197
198 If CLONE_FILES is not set, the child process inherits a copy of
199 all file descriptors opened in the calling process at the time
200 of the clone call. Subsequent operations that open or close
201 file descriptors, or change file descriptor flags, performed by
202 either the calling process or the child process do not affect
203 the other process. Note, however, that the duplicated file
204 descriptors in the child refer to the same open file descrip‐
205 tions as the corresponding file descriptors in the calling
206 process, and thus share file offsets and file status flags (see
207 open(2)).
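
The difference between a shared and a copied file descriptor table can be observed with a sketch like the one below, in which the child closes a descriptor and the parent then checks whether its own descriptor is still valid. The helper names, the use of /dev/null, and the static stack size are illustrative assumptions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

static char child_stack[64 * 1024] __attribute__((aligned(16)));

/* Child: close the descriptor whose number is passed through arg. */
static int close_fd(void *arg)
{
    return close(*(int *) arg) == 0 ? 0 : 1;
}

/* Returns 1 if the parent's descriptor became invalid after the child
   closed it, and 0 if it is still open. */
int child_close_affects_parent(int share_table)
{
    int fd = open("/dev/null", O_RDONLY);
    assert(fd != -1);                 /* sketch-level error handling */

    int flags = SIGCHLD | (share_table ? CLONE_FILES : 0);
    pid_t pid = clone(close_fd, child_stack + sizeof(child_stack),
                      flags, &fd);
    assert(pid != -1);
    waitpid(pid, NULL, 0);

    /* F_GETFD fails with EBADF if fd is gone from the parent's table. */
    int closed = (fcntl(fd, F_GETFD) == -1);
    if (!closed)
        close(fd);
    return closed;
}
```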
208
209 CLONE_FS (since Linux 2.0)
210 If CLONE_FS is set, the caller and the child process share the
211 same filesystem information. This includes the root of the
212 filesystem, the current working directory, and the umask. Any
213 call to chroot(2), chdir(2), or umask(2) performed by the call‐
214 ing process or the child process also affects the other process.
215
216 If CLONE_FS is not set, the child process works on a copy of the
217 filesystem information of the calling process at the time of the
218 clone call. Calls to chroot(2), chdir(2), or umask(2) performed
219 later by one of the processes do not affect the other process.
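
As an illustration of shared filesystem information, the sketch below lets the child change its umask and then reports the umask that the parent sees afterward. The function names and the initial mask of 022 are assumptions made for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>

static char child_stack[64 * 1024] __attribute__((aligned(16)));

/* Child: set the umask to the value passed through arg. */
static int set_umask(void *arg)
{
    umask(*(mode_t *) arg);
    return 0;
}

/* Returns the parent's umask as observed after the child has run. */
int umask_after_child(int share_fs, mode_t child_mask)
{
    mode_t saved = umask(022);        /* known starting point */

    int flags = SIGCHLD | (share_fs ? CLONE_FS : 0);
    pid_t pid = clone(set_umask, child_stack + sizeof(child_stack),
                      flags, &child_mask);
    assert(pid != -1);                /* sketch-level error handling */
    waitpid(pid, NULL, 0);

    mode_t seen = umask(saved);       /* returns current mask, restores old */
    return (int) seen;
}
```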
220
221 CLONE_IO (since Linux 2.6.25)
222 If CLONE_IO is set, then the new process shares an I/O context
223 with the calling process. If this flag is not set, then (as
224 with fork(2)) the new process has its own I/O context.
225
226 The I/O context is the I/O scope of the disk scheduler (i.e.,
227 what the I/O scheduler uses to model scheduling of a process's
228 I/O). If processes share the same I/O context, they are treated
229 as one by the I/O scheduler. As a consequence, they get to
230 share disk time. For some I/O schedulers, if two processes
231 share an I/O context, they will be allowed to interleave their
232 disk access. If several threads are doing I/O on behalf of the
233 same process (aio_read(3), for instance), they should employ
234 CLONE_IO to get better I/O performance.
235
236 If the kernel is not configured with the CONFIG_BLOCK option,
237 this flag is a no-op.
238
239 CLONE_NEWCGROUP (since Linux 4.6)
240 Create the process in a new cgroup namespace. If this flag is
241 not set, then (as with fork(2)) the process is created in the
same cgroup namespace as the calling process.
243
244 For further information on cgroup namespaces, see cgroup_names‐
245 paces(7).
246
247 Only a privileged process (CAP_SYS_ADMIN) can employ CLONE_NEWC‐
248 GROUP.
249
250 CLONE_NEWIPC (since Linux 2.6.19)
251 If CLONE_NEWIPC is set, then create the process in a new IPC
252 namespace. If this flag is not set, then (as with fork(2)), the
253 process is created in the same IPC namespace as the calling
254 process.
255
256 For further information on IPC namespaces, see ipc_names‐
257 paces(7).
258
259 Only a privileged process (CAP_SYS_ADMIN) can employ
260 CLONE_NEWIPC. This flag can't be specified in conjunction with
261 CLONE_SYSVSEM.
262
263 CLONE_NEWNET (since Linux 2.6.24)
264 (The implementation of this flag was completed only by about
265 kernel version 2.6.29.)
266
267 If CLONE_NEWNET is set, then create the process in a new network
268 namespace. If this flag is not set, then (as with fork(2)) the
269 process is created in the same network namespace as the calling
270 process.
271
272 For further information on network namespaces, see net‐
273 work_namespaces(7).
274
275 Only a privileged process (CAP_SYS_ADMIN) can employ
276 CLONE_NEWNET.
277
278 CLONE_NEWNS (since Linux 2.4.19)
279 If CLONE_NEWNS is set, the cloned child is started in a new
280 mount namespace, initialized with a copy of the namespace of the
281 parent. If CLONE_NEWNS is not set, the child lives in the same
282 mount namespace as the parent.
283
284 For further information on mount namespaces, see namespaces(7)
285 and mount_namespaces(7).
286
287 Only a privileged process (CAP_SYS_ADMIN) can employ
288 CLONE_NEWNS. It is not permitted to specify both CLONE_NEWNS
289 and CLONE_FS in the same clone call.
290
291 CLONE_NEWPID (since Linux 2.6.24)
292 If CLONE_NEWPID is set, then create the process in a new PID
293 namespace. If this flag is not set, then (as with fork(2)) the
294 process is created in the same PID namespace as the calling
295 process.
296
297 For further information on PID namespaces, see namespaces(7) and
298 pid_namespaces(7).
299
300 Only a privileged process (CAP_SYS_ADMIN) can employ CLONE_NEW‐
301 PID. This flag can't be specified in conjunction with
302 CLONE_THREAD or CLONE_PARENT.
303
304 CLONE_NEWUSER
305 (This flag first became meaningful for clone() in Linux 2.6.23,
306 the current clone() semantics were merged in Linux 3.5, and the
307 final pieces to make the user namespaces completely usable were
308 merged in Linux 3.8.)
309
310 If CLONE_NEWUSER is set, then create the process in a new user
311 namespace. If this flag is not set, then (as with fork(2)) the
312 process is created in the same user namespace as the calling
313 process.
314
315 For further information on user namespaces, see namespaces(7)
316 and user_namespaces(7).
317
318 Before Linux 3.8, use of CLONE_NEWUSER required that the caller
319 have three capabilities: CAP_SYS_ADMIN, CAP_SETUID, and CAP_SET‐
320 GID. Starting with Linux 3.8, no privileges are needed to cre‐
321 ate a user namespace.
322
323 This flag can't be specified in conjunction with CLONE_THREAD or
324 CLONE_PARENT. For security reasons, CLONE_NEWUSER cannot be
325 specified in conjunction with CLONE_FS.
326
327 CLONE_NEWUTS (since Linux 2.6.19)
328 If CLONE_NEWUTS is set, then create the process in a new UTS
329 namespace, whose identifiers are initialized by duplicating the
330 identifiers from the UTS namespace of the calling process. If
331 this flag is not set, then (as with fork(2)) the process is cre‐
332 ated in the same UTS namespace as the calling process.
333
334 For further information on UTS namespaces, see uts_names‐
335 paces(7).
336
337 Only a privileged process (CAP_SYS_ADMIN) can employ
338 CLONE_NEWUTS.
339
340 CLONE_PARENT (since Linux 2.3.12)
341 If CLONE_PARENT is set, then the parent of the new child (as
342 returned by getppid(2)) will be the same as that of the calling
343 process.
344
345 If CLONE_PARENT is not set, then (as with fork(2)) the child's
346 parent is the calling process.
347
348 Note that it is the parent process, as returned by getppid(2),
349 which is signaled when the child terminates, so that if
350 CLONE_PARENT is set, then the parent of the calling process,
351 rather than the calling process itself, will be signaled.
352
353 CLONE_PARENT_SETTID (since Linux 2.5.49)
Store the child thread ID at the location pointed to by parent_tid
(clone()) or cl_args.parent_tid (clone3()) in the parent's memory.
(In Linux 2.5.32-2.5.48 there was a flag CLONE_SETTID that did this.)
The store operation completes before the clone call returns control
to user space.
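
A minimal sketch of CLONE_PARENT_SETTID with the glibc wrapper follows. The optional arguments are passed in the order parent_tid, tls, child_tid; the function name and static stack are illustrative assumptions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

static char child_stack[64 * 1024] __attribute__((aligned(16)));

static int child_fn(void *arg)
{
    (void) arg;
    return 0;
}

/* Returns 1 if the stored thread ID equals the value returned by clone(). */
int parent_tid_matches(void)
{
    pid_t parent_tid = -1;

    pid_t pid = clone(child_fn, child_stack + sizeof(child_stack),
                      CLONE_PARENT_SETTID | SIGCHLD, NULL,
                      &parent_tid, NULL, NULL);
    assert(pid != -1);                /* sketch-level error handling */

    /* The store has completed by the time clone() returns here. */
    int matches = (parent_tid == pid);

    waitpid(pid, NULL, 0);
    return matches;
}
```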
359
360 CLONE_PID (Linux 2.0 to 2.5.15)
361 If CLONE_PID is set, the child process is created with the same
362 process ID as the calling process. This is good for hacking the
363 system, but otherwise of not much use. From Linux 2.3.21
364 onward, this flag could be specified only by the system boot
365 process (PID 0). The flag disappeared completely from the ker‐
366 nel sources in Linux 2.5.16. Subsequently, the kernel silently
367 ignored this bit if it was specified in the flags mask. Much
368 later, the same bit was recycled for use as the CLONE_PIDFD
369 flag.
370
371 CLONE_PIDFD (since Linux 5.2)
372 If this flag is specified, a PID file descriptor referring to
373 the child process is allocated and placed at a specified loca‐
374 tion in the parent's memory. The close-on-exec flag is set on
375 this new file descriptor. PID file descriptors can be used for
376 the purposes described in pidfd_open(2).
377
378 * When using clone3(), the PID file descriptor is placed at the
379 location pointed to by cl_args.pidfd.
380
381 * When using clone(), the PID file descriptor is placed at the
382 location pointed to by parent_tid. Since the parent_tid
383 argument is used to return the PID file descriptor,
384 CLONE_PIDFD cannot be used with CLONE_PARENT_SETTID when
385 calling clone().
386
387 It is currently not possible to use this flag together with
388 CLONE_THREAD. This means that the process identified by the PID
389 file descriptor will always be a thread group leader.
390
391 If the obsolete CLONE_DETACHED flag is specified alongside
392 CLONE_PIDFD when calling clone(), an error is returned. An
393 error also results if CLONE_DETACHED is specified when calling
394 clone3(). This error behavior ensures that the bit correspond‐
395 ing to CLONE_DETACHED can be reused for further PID file
396 descriptor features in the future.
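
The following sketch obtains a PID file descriptor through the parent_tid argument of clone() and waits for it to become readable. Monitoring the descriptor with poll(2) is taken from pidfd_open(2) and is an assumption about the running kernel; the fallback definition of CLONE_PIDFD is for older C library headers, and the function name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <sched.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef CLONE_PIDFD
#define CLONE_PIDFD 0x00001000    /* kernel value, for older C library headers */
#endif

static char child_stack[64 * 1024] __attribute__((aligned(16)));

static int child_fn(void *arg)
{
    (void) arg;
    return 0;
}

/* Returns 1 if the PID file descriptor became readable when the child
   terminated, 0 if it did not, and -1 if no descriptor was obtained. */
int wait_via_pidfd(void)
{
    /* The wrapper declares parent_tid as pid_t *, but an int is stored. */
    assert(sizeof(int) == sizeof(pid_t));

    int pidfd = -1;
    pid_t pid = clone(child_fn, child_stack + sizeof(child_stack),
                      CLONE_PIDFD | SIGCHLD, NULL, &pidfd, NULL, NULL);
    if (pid == -1)
        return -1;

    int readable = -1;
    if (pidfd >= 0) {
        struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
        readable = (poll(&pfd, 1, 5000) == 1 && (pfd.revents & POLLIN) != 0);
        close(pidfd);
    }

    waitpid(pid, NULL, 0);
    return readable;
}
```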
397
398 CLONE_PTRACE (since Linux 2.2)
399 If CLONE_PTRACE is specified, and the calling process is being
400 traced, then trace the child also (see ptrace(2)).
401
402 CLONE_SETTLS (since Linux 2.5.32)
403 The TLS (Thread Local Storage) descriptor is set to tls.
404
405 The interpretation of tls and the resulting effect is architec‐
406 ture dependent. On x86, tls is interpreted as a struct
407 user_desc * (see set_thread_area(2)). On x86-64 it is the new
408 value to be set for the %fs base register (see the ARCH_SET_FS
409 argument to arch_prctl(2)). On architectures with a dedicated
410 TLS register, it is the new value of that register.
411
412 Use of this flag requires detailed knowledge and generally it
413 should not be used except in libraries implementing threading.
414
415 CLONE_SIGHAND (since Linux 2.0)
416 If CLONE_SIGHAND is set, the calling process and the child
417 process share the same table of signal handlers. If the calling
418 process or child process calls sigaction(2) to change the behav‐
419 ior associated with a signal, the behavior is changed in the
420 other process as well. However, the calling process and child
421 processes still have distinct signal masks and sets of pending
422 signals. So, one of them may block or unblock signals using
423 sigprocmask(2) without affecting the other process.
424
425 If CLONE_SIGHAND is not set, the child process inherits a copy
426 of the signal handlers of the calling process at the time of the
427 clone call. Calls to sigaction(2) performed later by one of the
428 processes have no effect on the other process.
429
Since Linux 2.6.0, the flags mask must also include CLONE_VM if
CLONE_SIGHAND is specified.
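
The flag dependency above can be checked with a small probe that reports the errno value produced by a given flags mask. The helper clone_errno is a hypothetical name; it is a sketch that suits only masks that fail or that create an ordinary waitable child.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <sys/wait.h>

static char child_stack[64 * 1024] __attribute__((aligned(16)));

static int child_fn(void *arg)
{
    (void) arg;
    return 0;
}

/* Returns 0 if clone() succeeds with the given flags mask,
   or the errno value if it fails. */
int clone_errno(int flags)
{
    errno = 0;
    pid_t pid = clone(child_fn, child_stack + sizeof(child_stack),
                      flags, NULL);
    if (pid == -1)
        return errno;

    assert(pid > 0);                  /* the caller never sees 0 here */
    waitpid(pid, NULL, __WALL);
    return 0;
}
```

On a kernel that enforces the rule, clone_errno(CLONE_SIGHAND | SIGCHLD) is expected to yield EINVAL, because CLONE_VM is missing from the mask.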
432
433 CLONE_STOPPED (since Linux 2.6.0)
434 If CLONE_STOPPED is set, then the child is initially stopped (as
435 though it was sent a SIGSTOP signal), and must be resumed by
436 sending it a SIGCONT signal.
437
438 This flag was deprecated from Linux 2.6.25 onward, and was
439 removed altogether in Linux 2.6.38. Since then, the kernel
440 silently ignores it without error. Starting with Linux 4.6, the
441 same bit was reused for the CLONE_NEWCGROUP flag.
442
443 CLONE_SYSVSEM (since Linux 2.5.10)
444 If CLONE_SYSVSEM is set, then the child and the calling process
445 share a single list of System V semaphore adjustment (semadj)
446 values (see semop(2)). In this case, the shared list accumu‐
447 lates semadj values across all processes sharing the list, and
448 semaphore adjustments are performed only when the last process
449 that is sharing the list terminates (or ceases sharing the list
450 using unshare(2)). If this flag is not set, then the child has
451 a separate semadj list that is initially empty.
452
453 CLONE_THREAD (since Linux 2.4.0)
454 If CLONE_THREAD is set, the child is placed in the same thread
455 group as the calling process. To make the remainder of the dis‐
456 cussion of CLONE_THREAD more readable, the term "thread" is used
457 to refer to the processes within a thread group.
458
459 Thread groups were a feature added in Linux 2.4 to support the
460 POSIX threads notion of a set of threads that share a single
461 PID. Internally, this shared PID is the so-called thread group
462 identifier (TGID) for the thread group. Since Linux 2.4, calls
463 to getpid(2) return the TGID of the caller.
464
465 The threads within a group can be distinguished by their (sys‐
466 tem-wide) unique thread IDs (TID). A new thread's TID is avail‐
467 able as the function result returned to the caller, and a thread
468 can obtain its own TID using gettid(2).
469
470 When a clone call is made without specifying CLONE_THREAD, then
471 the resulting thread is placed in a new thread group whose TGID
472 is the same as the thread's TID. This thread is the leader of
473 the new thread group.
474
475 A new thread created with CLONE_THREAD has the same parent
476 process as the process that made the clone call (i.e., like
477 CLONE_PARENT), so that calls to getppid(2) return the same value
478 for all of the threads in a thread group. When a CLONE_THREAD
479 thread terminates, the thread that created it is not sent a
480 SIGCHLD (or other termination) signal; nor can the status of
481 such a thread be obtained using wait(2). (The thread is said to
482 be detached.)
483
After all of the threads in a thread group terminate, the parent
process of the thread group is sent a SIGCHLD (or other termination)
signal.
487
488 If any of the threads in a thread group performs an execve(2),
489 then all threads other than the thread group leader are termi‐
490 nated, and the new program is executed in the thread group
491 leader.
492
493 If one of the threads in a thread group creates a child using
494 fork(2), then any thread in the group can wait(2) for that
495 child.
496
497 Since Linux 2.5.35, the flags mask must also include CLONE_SIG‐
498 HAND if CLONE_THREAD is specified (and note that, since Linux
499 2.6.0, CLONE_SIGHAND also requires CLONE_VM to be included).
500
501 Signal dispositions and actions are process-wide: if an unhan‐
502 dled signal is delivered to a thread, then it will affect (ter‐
503 minate, stop, continue, be ignored in) all members of the thread
504 group.
505
506 Each thread has its own signal mask, as set by sigprocmask(2).
507
508 A signal may be process-directed or thread-directed. A process-
509 directed signal is targeted at a thread group (i.e., a TGID),
510 and is delivered to an arbitrarily selected thread from among
511 those that are not blocking the signal. A signal may be
512 process-directed because it was generated by the kernel for rea‐
513 sons other than a hardware exception, or because it was sent
514 using kill(2) or sigqueue(3). A thread-directed signal is tar‐
515 geted at (i.e., delivered to) a specific thread. A signal may
516 be thread directed because it was sent using tgkill(2) or
517 pthread_sigqueue(3), or because the thread executed a machine
518 language instruction that triggered a hardware exception (e.g.,
519 invalid memory access triggering SIGSEGV or a floating-point
520 exception triggering SIGFPE).
521
522 A call to sigpending(2) returns a signal set that is the union
523 of the pending process-directed signals and the signals that are
524 pending for the calling thread.
525
526 If a process-directed signal is delivered to a thread group, and
527 the thread group has installed a handler for the signal, then
528 the handler will be invoked in exactly one, arbitrarily selected
529 member of the thread group that has not blocked the signal. If
530 multiple threads in a group are waiting to accept the same sig‐
531 nal using sigwaitinfo(2), the kernel will arbitrarily select one
532 of these threads to receive the signal.
533
534 CLONE_UNTRACED (since Linux 2.5.46)
535 If CLONE_UNTRACED is specified, then a tracing process cannot
536 force CLONE_PTRACE on this child process.
537
538 CLONE_VFORK (since Linux 2.2)
539 If CLONE_VFORK is set, the execution of the calling process is
540 suspended until the child releases its virtual memory resources
541 via a call to execve(2) or _exit(2) (as with vfork(2)).
542
543 If CLONE_VFORK is not set, then both the calling process and the
544 child are schedulable after the call, and an application should
545 not rely on execution occurring in any particular order.
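
The suspension caused by CLONE_VFORK can be made visible by combining it with CLONE_VM, as in the sketch below: the parent reads a shared variable immediately after clone() returns, without waiting first. The variable and function names are illustrative assumptions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <sys/wait.h>

static char child_stack[64 * 1024] __attribute__((aligned(16)));
static volatile int child_ran;

/* Child: record that it ran, in memory shared with the parent. */
static int mark_ran(void *arg)
{
    (void) arg;
    child_ran = 1;
    return 0;
}

/* Returns 1 if the child's write is already visible when clone() returns. */
int vfork_child_finished_first(void)
{
    child_ran = 0;

    pid_t pid = clone(mark_ran, child_stack + sizeof(child_stack),
                      CLONE_VM | CLONE_VFORK | SIGCHLD, NULL);
    assert(pid != -1);                /* sketch-level error handling */

    /* The parent was suspended until the child terminated, so no
       explicit wait is needed before reading the variable. */
    int seen = child_ran;

    waitpid(pid, NULL, 0);            /* still reap the child */
    return seen;
}
```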
546
547 CLONE_VM (since Linux 2.0)
548 If CLONE_VM is set, the calling process and the child process
549 run in the same memory space. In particular, memory writes per‐
550 formed by the calling process or by the child process are also
551 visible in the other process. Moreover, any memory mapping or
552 unmapping performed with mmap(2) or munmap(2) by the child or
553 calling process also affects the other process.
554
555 If CLONE_VM is not set, the child process runs in a separate
556 copy of the memory space of the calling process at the time of
557 the clone call. Memory writes or file mappings/unmappings per‐
558 formed by one of the processes do not affect the other, as with
559 fork(2).
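
The two cases described for CLONE_VM can be compared with a sketch in which the child writes to a global variable and the parent reads it after the child has terminated. The names shared_value, set_value, and value_after_child are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <sys/wait.h>

static char child_stack[64 * 1024] __attribute__((aligned(16)));
static volatile int shared_value;

/* Child: write to a global variable. */
static int set_value(void *arg)
{
    (void) arg;
    shared_value = 42;
    return 0;
}

/* Returns the parent's view of shared_value after the child has run. */
int value_after_child(int share_vm)
{
    shared_value = 0;

    int flags = SIGCHLD | (share_vm ? CLONE_VM : 0);
    pid_t pid = clone(set_value, child_stack + sizeof(child_stack),
                      flags, NULL);
    assert(pid != -1);                /* sketch-level error handling */
    waitpid(pid, NULL, 0);

    return shared_value;
}
```

With CLONE_VM the parent sees the child's write; without it, the child modified only its own copy of the memory.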
560
One use of these system calls is to implement threads: multiple flows
563 of control in a program that run concurrently in a shared address
564 space.
565
566 Glibc does not provide a wrapper for clone3(); call it using
567 syscall(2).
568
569 Note that the glibc clone() wrapper function makes some changes in the
570 memory pointed to by stack (changes required to set the stack up cor‐
571 rectly for the child) before invoking the clone() system call. So, in
572 cases where clone() is used to recursively create children, do not use
573 the buffer employed for the parent's stack as the stack of the child.
574
575 C library/kernel differences
576 The raw clone() system call corresponds more closely to fork(2) in that
577 execution in the child continues from the point of the call. As such,
578 the fn and arg arguments of the clone() wrapper function are omitted.
579
580 In contrast to the glibc wrapper, the raw clone() system call accepts
581 NULL as a stack argument (and clone3() likewise allows cl_args.stack to
582 be NULL). In this case, the child uses a duplicate of the parent's
583 stack. (Copy-on-write semantics ensure that the child gets separate
584 copies of stack pages when either process modifies the stack.) In this
585 case, for correct operation, the CLONE_VM option should not be speci‐
586 fied. (If the child shares the parent's memory because of the use of
587 the CLONE_VM flag, then no copy-on-write duplication occurs and chaos
588 is likely to result.)
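
A fork-like call of the raw system call with a NULL stack might be sketched as follows. The sketch assumes an architecture on which flags is the first argument and stack the second (this is not the case on cris and s390); the remaining arguments are all zero, so their order does not matter here. The function name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative helper: the child exits with `code`; the parent returns
   that exit status, or -1 on failure. */
int raw_clone_exit_status(int code)
{
    assert(code >= 0 && code <= 255);

    /* No CLONE_VM: the child gets a copy-on-write duplicate of the stack. */
    long pid = syscall(SYS_clone, (unsigned long) SIGCHLD,
                       NULL, NULL, NULL, 0UL);
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);          /* child continues from the point of the call */

    int wstatus = 0;
    if (waitpid((pid_t) pid, &wstatus, 0) != pid || !WIFEXITED(wstatus))
        return -1;
    return WEXITSTATUS(wstatus);
}
```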
589
590 The order of the arguments also differs in the raw system call, and
591 there are variations in the arguments across architectures, as detailed
592 in the following paragraphs.
593
594 The raw system call interface on x86-64 and some other architectures
595 (including sh, tile, and alpha) is:
596
long clone(unsigned long flags, void *stack,
           int *parent_tid, int *child_tid,
           unsigned long tls);
600
601 On x86-32, and several other common architectures (including score,
602 ARM, ARM 64, PA-RISC, arc, Power PC, xtensa, and MIPS), the order of
603 the last two arguments is reversed:
604
long clone(unsigned long flags, void *stack,
           int *parent_tid, unsigned long tls,
           int *child_tid);
608
609 On the cris and s390 architectures, the order of the first two argu‐
610 ments is reversed:
611
long clone(void *stack, unsigned long flags,
           int *parent_tid, int *child_tid,
           unsigned long tls);
615
616 On the microblaze architecture, an additional argument is supplied:
617
long clone(unsigned long flags, void *stack,
           int stack_size,         /* Size of stack */
           int *parent_tid, int *child_tid,
           unsigned long tls);
622
623 blackfin, m68k, and sparc
624 The argument-passing conventions on blackfin, m68k, and sparc are dif‐
625 ferent from the descriptions above. For details, see the kernel (and
626 glibc) source.
627
628 ia64
629 On ia64, a different interface is used:
630
int __clone2(int (*fn)(void *),
             void *stack_base, size_t stack_size,
             int flags, void *arg, ...
             /* pid_t *parent_tid, struct user_desc *tls,
                pid_t *child_tid */ );
636
637 The prototype shown above is for the glibc wrapper function; for the
638 system call itself, the prototype can be described as follows (it is
639 identical to the clone() prototype on microblaze):
640
long clone2(unsigned long flags, void *stack_base,
            int stack_size,         /* Size of stack */
            int *parent_tid, int *child_tid,
            unsigned long tls);
645
646 __clone2() operates in the same way as clone(), except that stack_base
647 points to the lowest address of the child's stack area, and stack_size
648 specifies the size of the stack pointed to by stack_base.
649
650 Linux 2.4 and earlier
651 In Linux 2.4 and earlier, clone() does not take arguments parent_tid,
652 tls, and child_tid.
653
655 On success, the thread ID of the child process is returned in the call‐
656 er's thread of execution. On failure, -1 is returned in the caller's
657 context, no child process will be created, and errno will be set appro‐
658 priately.
659
661 EAGAIN Too many processes are already running; see fork(2).
662
663 EINVAL CLONE_SIGHAND was specified in the flags mask, but CLONE_VM was
664 not. (Since Linux 2.6.0.)
665
666 EINVAL CLONE_THREAD was specified in the flags mask, but CLONE_SIGHAND
667 was not. (Since Linux 2.5.35.)
668
669 EINVAL CLONE_THREAD was specified in the flags mask, but the current
670 process previously called unshare(2) with the CLONE_NEWPID flag
671 or used setns(2) to reassociate itself with a PID namespace.
672
673 EINVAL Both CLONE_FS and CLONE_NEWNS were specified in the flags mask.
674
675 EINVAL (since Linux 3.9)
676 Both CLONE_NEWUSER and CLONE_FS were specified in the flags
677 mask.
678
679 EINVAL Both CLONE_NEWIPC and CLONE_SYSVSEM were specified in the flags
680 mask.
681
682 EINVAL One (or both) of CLONE_NEWPID or CLONE_NEWUSER and one (or both)
683 of CLONE_THREAD or CLONE_PARENT were specified in the flags
684 mask.
685
686 EINVAL Returned by the glibc clone() wrapper function when fn or stack
687 is specified as NULL.
688
689 EINVAL CLONE_NEWIPC was specified in the flags mask, but the kernel was
690 not configured with the CONFIG_SYSVIPC and CONFIG_IPC_NS
691 options.
692
693 EINVAL CLONE_NEWNET was specified in the flags mask, but the kernel was
694 not configured with the CONFIG_NET_NS option.
695
696 EINVAL CLONE_NEWPID was specified in the flags mask, but the kernel was
697 not configured with the CONFIG_PID_NS option.
698
699 EINVAL CLONE_NEWUSER was specified in the flags mask, but the kernel
700 was not configured with the CONFIG_USER_NS option.
701
702 EINVAL CLONE_NEWUTS was specified in the flags mask, but the kernel was
703 not configured with the CONFIG_UTS_NS option.
704
705 EINVAL stack is not aligned to a suitable boundary for this architec‐
706 ture. For example, on aarch64, stack must be a multiple of 16.
707
EINVAL (clone3() only)
709 CLONE_DETACHED was specified in the flags mask.
710
EINVAL (clone() only)
712 CLONE_PIDFD was specified together with CLONE_DETACHED in the
713 flags mask.
714
715 EINVAL CLONE_PIDFD was specified together with CLONE_THREAD in the
716 flags mask.
717
718 EINVAL (clone() only)
719 CLONE_PIDFD was specified together with CLONE_PARENT_SETTID in
720 the flags mask.

ENOMEM Cannot allocate sufficient memory to allocate a task structure
       for the child, or to copy those parts of the caller's context
       that need to be copied.

ENOSPC (since Linux 3.7)
       CLONE_NEWPID was specified in the flags mask, but the limit on
       the nesting depth of PID namespaces would have been exceeded;
       see pid_namespaces(7).

ENOSPC (since Linux 4.9; beforehand EUSERS)
       CLONE_NEWUSER was specified in the flags mask, and the call
       would cause the limit on the number of nested user namespaces to
       be exceeded. See user_namespaces(7).

       From Linux 3.11 to Linux 4.8, the error diagnosed in this case
       was EUSERS.

ENOSPC (since Linux 4.9)
       One of the values in the flags mask specified the creation of a
       new user namespace, but doing so would have caused the limit
       defined by the corresponding file in /proc/sys/user to be
       exceeded. For further details, see namespaces(7).

EPERM  CLONE_NEWCGROUP, CLONE_NEWIPC, CLONE_NEWNET, CLONE_NEWNS,
       CLONE_NEWPID, or CLONE_NEWUTS was specified by an unprivileged
       process (process without CAP_SYS_ADMIN).

EPERM  CLONE_PID was specified by a process other than process 0.
       (This error occurs only on Linux 2.5.15 and earlier.)

EPERM  CLONE_NEWUSER was specified in the flags mask, but either the
       effective user ID or the effective group ID of the caller does
       not have a mapping in the parent namespace (see
       user_namespaces(7)).

EPERM  (since Linux 3.9)
       CLONE_NEWUSER was specified in the flags mask and the caller is
       in a chroot environment (i.e., the caller's root directory does
       not match the root directory of the mount namespace in which it
       resides).

ERESTARTNOINTR (since Linux 2.6.17)
       System call was interrupted by a signal and will be restarted.
       (This can be seen only during a trace.)

EUSERS (Linux 3.11 to Linux 4.8)
       CLONE_NEWUSER was specified in the flags mask, and the limit on
       the number of nested user namespaces would be exceeded. See the
       discussion of the ENOSPC error above.
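
Several of the EINVAL cases above are rejected before any child is
created, so they can be observed without privileges. The following is a
minimal sketch, not part of the original page: the names cloneErrno(),
probeChild() and PROBE_STACK_SIZE are hypothetical, and the stack is
assumed to grow downward.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

#define PROBE_STACK_SIZE (64 * 1024)    /* Hypothetical size for the probe */

static int              /* Start function for the probe child */
probeChild(void *arg)
{
    (void) arg;
    return 0;           /* Child terminates immediately */
}

/* Call clone() with the given flags.  Return 0 if a child was created
   (the child is reaped before returning); otherwise return the errno
   value reported by clone().  Hypothetical helper, for illustration. */

static int
cloneErrno(int flags)
{
    char *stack;
    pid_t pid;
    int err;

    stack = malloc(PROBE_STACK_SIZE);
    if (stack == NULL)
        return ENOMEM;

    /* Assume stack grows downward: pass the top of the buffer */

    pid = clone(probeChild, stack + PROBE_STACK_SIZE, flags, NULL);
    err = (pid == -1) ? errno : 0;

    if (pid != -1)
        waitpid(pid, NULL, __WALL);     /* Reap child, whatever its
                                           termination signal */
    free(stack);
    return err;
}
```

With this helper, cloneErrno(CLONE_SIGHAND | SIGCHLD) is expected to
return EINVAL, because CLONE_VM is missing, while cloneErrno(SIGCHLD)
is expected to return 0.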

The clone3() system call first appeared in Linux 5.3.

These system calls are Linux-specific and should not be used in
programs intended to be portable.

The kcmp(2) system call can be used to test whether two processes share
various resources such as a file descriptor table, System V semaphore
undo operations, or a virtual address space.
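
As a rough sketch of such a test, the helper below invokes kcmp(2)
through syscall(2). The name sharesResource() is hypothetical. The
sketch assumes that <linux/kcmp.h> and SYS_kcmp are available, that the
running kernel provides kcmp(2), and that the caller is permitted to
inspect both processes.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/kcmp.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if the two processes share the resource selected by type
   (for example, KCMP_VM or KCMP_FILES), 0 if they do not, or -1 with
   errno set if kcmp(2) fails.  Hypothetical helper, for illustration. */

static int
sharesResource(pid_t pid1, pid_t pid2, int type)
{
    long res;

    /* The last two arguments are unused for KCMP_VM and KCMP_FILES */

    res = syscall(SYS_kcmp, pid1, pid2, type, 0UL, 0UL);
    if (res == -1)
        return -1;

    return res == 0;    /* kcmp(2) returns 0 when the resources match */
}
```

For example, after creating a child with CLONE_FILES, the parent could
call sharesResource(getpid(), childPid, KCMP_FILES), where childPid is
the value returned by clone().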

Handlers registered using pthread_atfork(3) are not executed during a
clone call.
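
One way to observe this behavior is to count invocations of a prepare
handler around a call to the glibc clone() wrapper. The following is a
sketch only: the names prepareCalls, countPrepare(), idleChild() and
prepareCallsDuringClone() are hypothetical, the stack is assumed to
grow downward, and older glibc versions may need linking with -pthread.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

static int prepareCalls;        /* Incremented by the prepare handler */

static void
countPrepare(void)
{
    prepareCalls++;
}

static int                      /* Start function for cloned child */
idleChild(void *arg)
{
    (void) arg;
    return 0;
}

/* Register a prepare handler, create a child with clone(), and return
   the number of times the handler ran during the clone call, or -1 on
   failure.  Hypothetical helper, for illustration. */

static int
prepareCallsDuringClone(void)
{
    const size_t stackSize = 64 * 1024;
    char *stack;
    pid_t pid;

    if (pthread_atfork(countPrepare, NULL, NULL) != 0)
        return -1;

    stack = malloc(stackSize);
    if (stack == NULL)
        return -1;

    prepareCalls = 0;
    pid = clone(idleChild, stack + stackSize, SIGCHLD, NULL);
    if (pid == -1 || waitpid(pid, NULL, 0) == -1) {
        free(stack);
        return -1;
    }

    free(stack);
    return prepareCalls;
}
```

Given the statement above, prepareCallsDuringClone() is expected to
return 0; the same handler would run if the child were created with
fork(2) instead.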

In the Linux 2.4.x series, CLONE_THREAD generally does not make the
parent of the new thread the same as the parent of the calling process.
However, for kernel versions 2.4.7 to 2.4.18 the CLONE_THREAD flag
implied the CLONE_PARENT flag (as in Linux 2.6.0 and later).

On i386, clone() should not be called through vsyscall, but directly
through int $0x80.

GNU C library versions 2.3.4 up to and including 2.24 contained a
wrapper function for getpid(2) that performed caching of PIDs. This
caching relied on support in the glibc wrapper for clone(), but
limitations in the implementation meant that the cache was not up to
date in some circumstances. In particular, if a signal was delivered to
the child immediately after the clone() call, then a call to getpid(2)
in a handler for the signal could return the PID of the calling process
("the parent"), if the clone wrapper had not yet had a chance to update
the PID cache in the child. (This discussion ignores the case where
the child was created using CLONE_THREAD, when getpid(2) should return
the same value in the child and in the process that called clone(),
since the caller and the child are in the same thread group. The
stale-cache problem also does not occur if the flags argument includes
CLONE_VM.) To obtain the correct PID, it was sometimes necessary to
bypass the wrapper with code such as the following:

    #include <sys/syscall.h>
    #include <unistd.h>

    pid_t mypid;

    mypid = syscall(SYS_getpid);

Because of the stale-cache problem, as well as other problems noted in
getpid(2), the PID caching feature was removed in glibc 2.25.

The following program demonstrates the use of clone() to create a child
process that executes in a separate UTS namespace. The child changes
the hostname in its UTS namespace. Both parent and child then display
the system hostname, making it possible to see that the hostname
differs in the UTS namespaces of the parent and child. For an example
of the use of this program, see setns(2).

Within the sample program, we allocate the memory that is to be used
for the child's stack using mmap(2) rather than malloc(3) for the
following reasons:

*  mmap(2) allocates a block of memory that starts on a page boundary
   and is a multiple of the page size. This is useful if we want to
   establish a guard page (a page with protection PROT_NONE) at the end
   of the stack using mprotect(2).

*  We can specify the MAP_STACK flag to request a mapping that is
   suitable for a stack. For the moment, this flag is a no-op on Linux,
   but it exists and has effect on some other systems, so we should
   include it for portability.
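
The guard page mentioned in the first point is not set up by the sample
program. A minimal sketch of one way to do it follows. The name
allocGuardedStack() is hypothetical, the stack is assumed to grow
downward (so the guard page is placed at the lowest address), and size
is assumed to be a multiple of the page size.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map size bytes for a child stack plus one extra guard page below it.
   On success, return the lowest usable stack address and store in *top
   the address to pass as the stack argument of clone().  Return NULL on
   failure.  Hypothetical helper, for illustration. */

static char *
allocGuardedStack(size_t size, char **top)
{
    long pageSize;
    char *base;

    pageSize = sysconf(_SC_PAGESIZE);
    if (pageSize <= 0)
        return NULL;

    base = mmap(NULL, size + pageSize, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Assume stack grows downward: an overflow runs into the lowest
       page, so that page is made inaccessible */

    if (mprotect(base, pageSize, PROT_NONE) == -1) {
        munmap(base, size + pageSize);
        return NULL;
    }

    *top = base + pageSize + size;
    return base + pageSize;
}
```

With this layout, a child that overruns its stack receives SIGSEGV
instead of silently overwriting adjacent memory. To release the
mapping, pass the returned address minus one page, and size plus one
page, to munmap(2).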

Program source
    #define _GNU_SOURCE
    #include <sys/wait.h>
    #include <sys/utsname.h>
    #include <sched.h>
    #include <string.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/mman.h>

    #define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                            } while (0)

    static int              /* Start function for cloned child */
    childFunc(void *arg)
    {
        struct utsname uts;

        /* Change hostname in UTS namespace of child */

        if (sethostname(arg, strlen(arg)) == -1)
            errExit("sethostname");

        /* Retrieve and display hostname */

        if (uname(&uts) == -1)
            errExit("uname");
        printf("uts.nodename in child:  %s\n", uts.nodename);

        /* Keep the namespace open for a while, by sleeping.
           This allows some experimentation--for example, another
           process might join the namespace. */

        sleep(200);

        return 0;           /* Child terminates now */
    }

    #define STACK_SIZE (1024 * 1024)    /* Stack size for cloned child */

    int
    main(int argc, char *argv[])
    {
        char *stack;                    /* Start of stack buffer */
        char *stackTop;                 /* End of stack buffer */
        pid_t pid;
        struct utsname uts;

        if (argc < 2) {
            fprintf(stderr, "Usage: %s <child-hostname>\n", argv[0]);
            exit(EXIT_FAILURE);         /* Missing argument is an error */
        }

        /* Allocate memory to be used for the stack of the child */

        stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
        if (stack == MAP_FAILED)
            errExit("mmap");

        stackTop = stack + STACK_SIZE;  /* Assume stack grows downward */

        /* Create child that has its own UTS namespace;
           child commences execution in childFunc() */

        pid = clone(childFunc, stackTop, CLONE_NEWUTS | SIGCHLD, argv[1]);
        if (pid == -1)
            errExit("clone");
        printf("clone() returned %ld\n", (long) pid);

        /* Parent falls through to here */

        sleep(1);           /* Give child time to change its hostname */

        /* Display hostname in parent's UTS namespace. This will be
           different from hostname in child's UTS namespace. */

        if (uname(&uts) == -1)
            errExit("uname");
        printf("uts.nodename in parent: %s\n", uts.nodename);

        if (waitpid(pid, NULL, 0) == -1)    /* Wait for child */
            errExit("waitpid");
        printf("child has terminated\n");

        exit(EXIT_SUCCESS);
    }

fork(2), futex(2), getpid(2), gettid(2), kcmp(2), mmap(2),
pidfd_open(2), set_thread_area(2), set_tid_address(2), setns(2),
tkill(2), unshare(2), wait(2), capabilities(7), namespaces(7),
pthreads(7)

This page is part of release 5.04 of the Linux man-pages project. A
description of the project, information about reporting bugs, and the
latest version of this page, can be found at
https://www.kernel.org/doc/man-pages/.

Linux                             2019-11-19                         CLONE(2)