CLONE(2)                  Linux Programmer's Manual                 CLONE(2)

clone, __clone2 - create a child process

    /* Prototype for the glibc wrapper function */

    #define _GNU_SOURCE
    #include <sched.h>

    int clone(int (*fn)(void *), void *child_stack,
              int flags, void *arg, ...
              /* pid_t *ptid, void *newtls, pid_t *ctid */ );

    /* For the prototype of the raw system call, see NOTES */

clone() creates a new process, in a manner similar to fork(2).

This page describes both the glibc clone() wrapper function and the
underlying system call on which it is based. The main text describes
the wrapper function; the differences for the raw system call are
described toward the end of this page.

Unlike fork(2), clone() allows the child process to share parts of its
execution context with the calling process, such as the virtual address
space, the table of file descriptors, and the table of signal handlers.
(Note that on this manual page, "calling process" normally corresponds
to "parent process". But see the description of CLONE_PARENT below.)

One use of clone() is to implement threads: multiple flows of control
in a program that run concurrently in a shared address space.

When the child process is created with clone(), it commences execution
by calling the function pointed to by the argument fn. (This differs
from fork(2), where execution continues in the child from the point of
the fork(2) call.) The arg argument is passed as the argument of the
function fn.

When the fn(arg) function returns, the child process terminates. The
integer returned by fn is the exit status for the child process. The
child process may also terminate explicitly by calling exit(2) or after
receiving a fatal signal.

The child_stack argument specifies the location of the stack used by
the child process. Since the child and calling process may share memory,
it is not possible for the child process to execute in the same
stack as the calling process. The calling process must therefore set
up memory space for the child stack and pass a pointer to this space to
clone(). Stacks grow downward on all processors that run Linux (except
the HP PA processors), so child_stack usually points to the topmost
address of the memory space set up for the child stack.
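
The stack setup described above can be sketched as follows. This is a
minimal illustration under stated assumptions, not part of the clone()
interface: the helper name alloc_child_stack() is hypothetical, the
stack is assumed to grow downward, and mmap(2) with MAP_STACK is only
one reasonable way to obtain the memory.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map 'size' bytes for a child stack and return
   the topmost address, which is what child_stack should point to when
   the stack grows downward.  The base of the mapping is stored in
   *base_out so that it can be munmap()ed once the child is gone.
   Returns NULL on failure. */
void *
alloc_child_stack(size_t size, void **base_out)
{
    void *base = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    *base_out = base;
    return (char *) base + size;    /* Stack grows down from here */
}
```

The returned pointer would be passed as the child_stack argument; the
base pointer is kept only for the later munmap(2) call.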

The low byte of flags contains the number of the termination signal
sent to the parent when the child dies. If this signal is specified as
anything other than SIGCHLD, then the parent process must specify the
__WALL or __WCLONE options when waiting for the child with wait(2). If
no signal is specified, then the parent process is not signaled when
the child terminates.
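
The split between the termination signal and the remaining flag bits
can be shown with a small sketch. The helper name termination_signal()
is hypothetical; CSIGNAL is the mask for the low byte that glibc
exposes in <sched.h> when _GNU_SOURCE is defined.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>

/* Hypothetical helper: return the termination signal number encoded
   in the low byte of a clone() flags value (0 means "no signal"). */
int
termination_signal(int flags)
{
    return flags & CSIGNAL;
}
```

For example, a flags value of CLONE_VM | SIGCHLD requests SIGCHLD,
while CLONE_VM alone requests no signal at all.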

flags may also be bitwise-ORed with zero or more of the following constants,
in order to specify what is shared between the calling process
and the child process:

CLONE_CHILD_CLEARTID (since Linux 2.5.49)
       Clear (zero) the child thread ID at the location ctid in child
       memory when the child exits, and do a wakeup on the futex at
       that address. The address involved may be changed by the
       set_tid_address(2) system call. This is used by threading
       libraries.
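
As an illustration of how a threading library might use this flag, the
sketch below waits for a child by sleeping on the futex at ctid. It is
a simplified example under several assumptions: the names join_tid,
join_child(), and run_and_join() are hypothetical, the child shares
memory with the caller (CLONE_VM), and CLONE_PARENT_SETTID is pointed
at the same word so that it is nonzero before clone() returns.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define JOIN_STACK_SIZE (64 * 1024)

static pid_t join_tid;      /* Set at clone(), zeroed when child exits */
static int join_result;     /* Written by the child */

static int
join_child(void *arg)
{
    join_result = *(int *) arg * 2;
    return 0;
}

/* Hypothetical helper: run join_child() in a child that shares memory
   with the caller, wait for it on the ctid futex, and return the value
   it computed (or -1 on error). */
int
run_and_join(int input)
{
    char *stack = malloc(JOIN_STACK_SIZE);
    pid_t tid, cur;

    if (stack == NULL)
        return -1;
    join_result = 0;

    tid = clone(join_child, stack + JOIN_STACK_SIZE,
                CLONE_VM | CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID |
                SIGCHLD, &input, &join_tid, NULL, &join_tid);
    if (tid == -1) {
        free(stack);
        return -1;
    }

    /* Sleep until the kernel zeroes join_tid and wakes the futex */
    while ((cur = __atomic_load_n(&join_tid, __ATOMIC_ACQUIRE)) != 0)
        syscall(SYS_futex, &join_tid, FUTEX_WAIT, cur, NULL, NULL, 0);

    waitpid(tid, NULL, 0);          /* Reap the child */
    free(stack);
    return join_result;
}
```

The loop re-reads the word before each wait, so a wakeup that arrives
before the caller sleeps is not lost.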

CLONE_CHILD_SETTID (since Linux 2.5.49)
       Store the child thread ID at the location ctid in the child's
       memory. The store operation completes before clone() returns
       control to user space in the child process. (Note that the
       store operation may not have completed before clone() returns in
       the parent process, which will be relevant if the CLONE_VM flag
       is also employed.)

CLONE_FILES (since Linux 2.0)
       If CLONE_FILES is set, the calling process and the child process
       share the same file descriptor table. Any file descriptor created
       by the calling process or by the child process is also
       valid in the other process. Similarly, if one of the processes
       closes a file descriptor, or changes its associated flags (using
       the fcntl(2) F_SETFD operation), the other process is also
       affected. If a process sharing a file descriptor table calls
       execve(2), its file descriptor table is duplicated (unshared).

       If CLONE_FILES is not set, the child process inherits a copy of
       all file descriptors opened in the calling process at the time
       of clone(). Subsequent operations that open or close file
       descriptors, or change file descriptor flags, performed by
       either the calling process or the child process do not affect
       the other process. Note, however, that the duplicated file
       descriptors in the child refer to the same open file descriptions
       as the corresponding file descriptors in the calling
       process, and thus share file offsets and file status flags (see
       open(2)).
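
The difference between a shared and a copied descriptor table can be
observed with a sketch along the following lines. The helper names are
hypothetical, and a malloc(3)ed buffer is assumed to be adequate as a
child stack for such a short-lived child.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define FD_DEMO_STACK (64 * 1024)

static int
close_fd_in_child(void *arg)
{
    return close(*(int *) arg) == -1;   /* Exit status 0 on success */
}

/* Hypothetical helper: returns 1 if a close() in the child also closed
   the descriptor in the caller, 0 if the caller's descriptor survived,
   and -1 on error. */
int
child_close_affects_caller(int share_files)
{
    int pfd[2], closed;
    char *stack;
    pid_t pid;

    if (pipe(pfd) == -1)
        return -1;
    stack = malloc(FD_DEMO_STACK);
    if (stack == NULL)
        return -1;

    pid = clone(close_fd_in_child, stack + FD_DEMO_STACK,
                (share_files ? CLONE_FILES : 0) | SIGCHLD, &pfd[0]);
    if (pid == -1 || waitpid(pid, NULL, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);

    /* F_GETFD fails with EBADF if pfd[0] is no longer open here */
    closed = (fcntl(pfd[0], F_GETFD) == -1 && errno == EBADF);

    if (!closed)
        close(pfd[0]);
    close(pfd[1]);
    return closed;
}
```

With share_files nonzero the caller finds its descriptor closed; with
zero the child only closed its own copy.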

CLONE_FS (since Linux 2.0)
       If CLONE_FS is set, the caller and the child process share the
       same filesystem information. This includes the root of the
       filesystem, the current working directory, and the umask. Any
       call to chroot(2), chdir(2), or umask(2) performed by the calling
       process or the child process also affects the other process.

       If CLONE_FS is not set, the child process works on a copy of the
       filesystem information of the calling process at the time of the
       clone() call. Calls to chroot(2), chdir(2), or umask(2) performed
       later by one of the processes do not affect the other
       process.
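
A sketch of the umask case is shown below; the chroot(2) and chdir(2)
cases behave analogously. The helper names are hypothetical, and the
caller's umask is restored before the helper returns.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>

#define FS_DEMO_STACK (64 * 1024)

static int
set_umask_in_child(void *arg)
{
    umask(*(mode_t *) arg);
    return 0;
}

/* Hypothetical helper: returns 1 if a umask() call in the child changed
   the caller's umask, 0 if it did not, and -1 on error. */
int
child_umask_affects_caller(int share_fs)
{
    mode_t before, after, child_mask;
    char *stack;
    pid_t pid;

    before = umask(0);              /* Read the current umask ... */
    umask(before);                  /* ... and put it back */
    child_mask = before ^ 0777;     /* Always differs from 'before' */

    stack = malloc(FS_DEMO_STACK);
    if (stack == NULL)
        return -1;

    pid = clone(set_umask_in_child, stack + FS_DEMO_STACK,
                (share_fs ? CLONE_FS : 0) | SIGCHLD, &child_mask);
    if (pid == -1 || waitpid(pid, NULL, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);

    after = umask(before);          /* Read again, restoring 'before' */
    return after != before;
}
```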

CLONE_IO (since Linux 2.6.25)
       If CLONE_IO is set, then the new process shares an I/O context
       with the calling process. If this flag is not set, then (as
       with fork(2)) the new process has its own I/O context.

       The I/O context is the I/O scope of the disk scheduler (i.e.,
       what the I/O scheduler uses to model scheduling of a process's
       I/O). If processes share the same I/O context, they are treated
       as one by the I/O scheduler. As a consequence, they get to
       share disk time. For some I/O schedulers, if two processes
       share an I/O context, they will be allowed to interleave their
       disk access. If several threads are doing I/O on behalf of the
       same process (aio_read(3), for instance), they should employ
       CLONE_IO to get better I/O performance.

       If the kernel is not configured with the CONFIG_BLOCK option,
       this flag is a no-op.

CLONE_NEWCGROUP (since Linux 4.6)
       Create the process in a new cgroup namespace. If this flag is
       not set, then (as with fork(2)) the process is created in the
       same cgroup namespace as the calling process. This flag is
       intended for the implementation of containers.

       For further information on cgroup namespaces, see
       cgroup_namespaces(7).

       Only a privileged process (CAP_SYS_ADMIN) can employ
       CLONE_NEWCGROUP.

CLONE_NEWIPC (since Linux 2.6.19)
       If CLONE_NEWIPC is set, then create the process in a new IPC
       namespace. If this flag is not set, then (as with fork(2)) the
       process is created in the same IPC namespace as the calling
       process. This flag is intended for the implementation of
       containers.

       An IPC namespace provides an isolated view of System V IPC
       objects (see sysvipc(7)) and (since Linux 2.6.30) POSIX message
       queues (see mq_overview(7)). The common characteristic of these
       IPC mechanisms is that IPC objects are identified by mechanisms
       other than filesystem pathnames.

       Objects created in an IPC namespace are visible to all other
       processes that are members of that namespace, but are not visible
       to processes in other IPC namespaces.

       When an IPC namespace is destroyed (i.e., when the last process
       that is a member of the namespace terminates), all IPC objects
       in the namespace are automatically destroyed.

       Only a privileged process (CAP_SYS_ADMIN) can employ
       CLONE_NEWIPC. This flag can't be specified in conjunction with
       CLONE_SYSVSEM.

       For further information on IPC namespaces, see namespaces(7).

CLONE_NEWNET (since Linux 2.6.24)
       (The implementation of this flag was completed only by about
       kernel version 2.6.29.)

       If CLONE_NEWNET is set, then create the process in a new network
       namespace. If this flag is not set, then (as with fork(2)) the
       process is created in the same network namespace as the calling
       process. This flag is intended for the implementation of
       containers.

       A network namespace provides an isolated view of the networking
       stack (network device interfaces, IPv4 and IPv6 protocol stacks,
       IP routing tables, firewall rules, the /proc/net and
       /sys/class/net directory trees, sockets, etc.). A physical network
       device can live in exactly one network namespace. A virtual
       network (veth(4)) device pair provides a pipe-like abstraction
       that can be used to create tunnels between network namespaces,
       and can be used to create a bridge to a physical network
       device in another namespace.

       When a network namespace is freed (i.e., when the last process
       in the namespace terminates), its physical network devices are
       moved back to the initial network namespace (not to the parent
       of the process). For further information on network namespaces,
       see namespaces(7).

       Only a privileged process (CAP_SYS_ADMIN) can employ
       CLONE_NEWNET.

CLONE_NEWNS (since Linux 2.4.19)
       If CLONE_NEWNS is set, the cloned child is started in a new
       mount namespace, initialized with a copy of the namespace of the
       parent. If CLONE_NEWNS is not set, the child lives in the same
       mount namespace as the parent.

       Only a privileged process (CAP_SYS_ADMIN) can employ
       CLONE_NEWNS. It is not permitted to specify both CLONE_NEWNS
       and CLONE_FS in the same clone() call.

       For further information on mount namespaces, see namespaces(7)
       and mount_namespaces(7).

CLONE_NEWPID (since Linux 2.6.24)
       If CLONE_NEWPID is set, then create the process in a new PID
       namespace. If this flag is not set, then (as with fork(2)) the
       process is created in the same PID namespace as the calling
       process. This flag is intended for the implementation of
       containers.

       For further information on PID namespaces, see namespaces(7) and
       pid_namespaces(7).

       Only a privileged process (CAP_SYS_ADMIN) can employ
       CLONE_NEWPID. This flag can't be specified in conjunction with
       CLONE_THREAD or CLONE_PARENT.

CLONE_NEWUSER
       (This flag first became meaningful for clone() in Linux 2.6.23,
       the current clone() semantics were merged in Linux 3.5, and the
       final pieces to make the user namespaces completely usable were
       merged in Linux 3.8.)

       If CLONE_NEWUSER is set, then create the process in a new user
       namespace. If this flag is not set, then (as with fork(2)) the
       process is created in the same user namespace as the calling
       process.

       Before Linux 3.8, use of CLONE_NEWUSER required that the caller
       have three capabilities: CAP_SYS_ADMIN, CAP_SETUID, and
       CAP_SETGID. Starting with Linux 3.8, no privileges are needed to
       create a user namespace.

       This flag can't be specified in conjunction with CLONE_THREAD or
       CLONE_PARENT. For security reasons, CLONE_NEWUSER cannot be
       specified in conjunction with CLONE_FS.

       For further information on user namespaces, see namespaces(7)
       and user_namespaces(7).

CLONE_NEWUTS (since Linux 2.6.19)
       If CLONE_NEWUTS is set, then create the process in a new UTS
       namespace, whose identifiers are initialized by duplicating the
       identifiers from the UTS namespace of the calling process. If
       this flag is not set, then (as with fork(2)) the process is
       created in the same UTS namespace as the calling process. This
       flag is intended for the implementation of containers.

       A UTS namespace is the set of identifiers returned by uname(2);
       among these, the domain name and the hostname can be modified by
       setdomainname(2) and sethostname(2), respectively. Changes made
       to the identifiers in a UTS namespace are visible to all other
       processes in the same namespace, but are not visible to processes
       in other UTS namespaces.

       Only a privileged process (CAP_SYS_ADMIN) can employ
       CLONE_NEWUTS.

       For further information on UTS namespaces, see namespaces(7).

CLONE_PARENT (since Linux 2.3.12)
       If CLONE_PARENT is set, then the parent of the new child (as
       returned by getppid(2)) will be the same as that of the calling
       process.

       If CLONE_PARENT is not set, then (as with fork(2)) the child's
       parent is the calling process.

       Note that it is the parent process, as returned by getppid(2),
       which is signaled when the child terminates, so that if
       CLONE_PARENT is set, then the parent of the calling process,
       rather than the calling process itself, will be signaled.

CLONE_PARENT_SETTID (since Linux 2.5.49)
       Store the child thread ID at the location ptid in the parent's
       memory. (In Linux 2.5.32-2.5.48 there was a flag CLONE_SETTID
       that did this.) The store operation completes before clone()
       returns control to user space.

CLONE_PID (Linux 2.0 to 2.5.15)
       If CLONE_PID is set, the child process is created with the same
       process ID as the calling process. This is good for hacking the
       system, but otherwise not of much use. From Linux 2.3.21
       onward, this flag could be specified only by the system boot
       process (PID 0). The flag disappeared completely from the kernel
       sources in Linux 2.5.16. Since then, the kernel silently
       ignores this bit if it is specified in flags.

CLONE_PTRACE (since Linux 2.2)
       If CLONE_PTRACE is specified, and the calling process is being
       traced, then trace the child also (see ptrace(2)).

CLONE_SETTLS (since Linux 2.5.32)
       The TLS (Thread Local Storage) descriptor is set to newtls.

       The interpretation of newtls and the resulting effect is
       architecture dependent. On x86, newtls is interpreted as a struct
       user_desc * (see set_thread_area(2)). On x86-64 it is the new
       value to be set for the %fs base register (see the ARCH_SET_FS
       argument to arch_prctl(2)). On architectures with a dedicated
       TLS register, it is the new value of that register.

CLONE_SIGHAND (since Linux 2.0)
       If CLONE_SIGHAND is set, the calling process and the child
       process share the same table of signal handlers. If the calling
       process or child process calls sigaction(2) to change the behavior
       associated with a signal, the behavior is changed in the
       other process as well. However, the calling process and child
       processes still have distinct signal masks and sets of pending
       signals. So, one of them may block or unblock signals using
       sigprocmask(2) without affecting the other process.

       If CLONE_SIGHAND is not set, the child process inherits a copy
       of the signal handlers of the calling process at the time
       clone() is called. Calls to sigaction(2) performed later by one
       of the processes have no effect on the other process.

       Since Linux 2.6.0, flags must also include CLONE_VM if
       CLONE_SIGHAND is specified.
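
The sharing of signal dispositions can be observed with a sketch such
as the following. The helper names are hypothetical, SIGUSR1 is an
arbitrary choice of signal, and CLONE_VM is added whenever
CLONE_SIGHAND is requested, as required since Linux 2.6.0.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

#define SIG_DEMO_STACK (64 * 1024)

static int
ignore_sigusr1_in_child(void *arg)
{
    struct sigaction sa = { .sa_handler = SIG_IGN };

    (void) arg;
    return sigaction(SIGUSR1, &sa, NULL) == -1;
}

/* Hypothetical helper: returns 1 if a sigaction() call made by the
   child changed the caller's disposition for SIGUSR1, 0 if it did not,
   and -1 on error.  The disposition is reset to SIG_DFL on return. */
int
child_sigaction_affects_caller(int share_handlers)
{
    struct sigaction dfl = { .sa_handler = SIG_DFL }, cur;
    int flags = share_handlers ? (CLONE_VM | CLONE_SIGHAND) : 0;
    char *stack;
    pid_t pid;
    int changed;

    if (sigaction(SIGUSR1, &dfl, NULL) == -1)
        return -1;
    stack = malloc(SIG_DEMO_STACK);
    if (stack == NULL)
        return -1;

    pid = clone(ignore_sigusr1_in_child, stack + SIG_DEMO_STACK,
                flags | SIGCHLD, NULL);
    if (pid == -1 || waitpid(pid, NULL, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);

    if (sigaction(SIGUSR1, NULL, &cur) == -1)
        return -1;
    changed = (cur.sa_handler == SIG_IGN);

    sigaction(SIGUSR1, &dfl, NULL);     /* Restore default disposition */
    return changed;
}
```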

CLONE_STOPPED (since Linux 2.6.0)
       If CLONE_STOPPED is set, then the child is initially stopped (as
       though it was sent a SIGSTOP signal), and must be resumed by
       sending it a SIGCONT signal.

       This flag was deprecated from Linux 2.6.25 onward, and was
       removed altogether in Linux 2.6.38. Since then, the kernel
       silently ignores it without error. Starting with Linux 4.6, the
       same bit was reused for the CLONE_NEWCGROUP flag.

CLONE_SYSVSEM (since Linux 2.5.10)
       If CLONE_SYSVSEM is set, then the child and the calling process
       share a single list of System V semaphore adjustment (semadj)
       values (see semop(2)). In this case, the shared list accumulates
       semadj values across all processes sharing the list, and
       semaphore adjustments are performed only when the last process
       that is sharing the list terminates (or ceases sharing the list
       using unshare(2)). If this flag is not set, then the child has
       a separate semadj list that is initially empty.

CLONE_THREAD (since Linux 2.4.0)
       If CLONE_THREAD is set, the child is placed in the same thread
       group as the calling process. To make the remainder of the
       discussion of CLONE_THREAD more readable, the term "thread" is used
       to refer to the processes within a thread group.

       Thread groups were a feature added in Linux 2.4 to support the
       POSIX threads notion of a set of threads that share a single
       PID. Internally, this shared PID is the so-called thread group
       identifier (TGID) for the thread group. Since Linux 2.4, calls
       to getpid(2) return the TGID of the caller.

       The threads within a group can be distinguished by their
       (system-wide) unique thread IDs (TID). A new thread's TID is
       available as the function result returned to the caller of clone(),
       and a thread can obtain its own TID using gettid(2).

       When a call is made to clone() without specifying CLONE_THREAD,
       then the resulting thread is placed in a new thread group whose
       TGID is the same as the thread's TID. This thread is the leader
       of the new thread group.

       A new thread created with CLONE_THREAD has the same parent
       process as the caller of clone() (i.e., like CLONE_PARENT), so
       that calls to getppid(2) return the same value for all of the
       threads in a thread group. When a CLONE_THREAD thread terminates,
       the thread that created it using clone() is not sent a
       SIGCHLD (or other termination) signal; nor can the status of
       such a thread be obtained using wait(2). (The thread is said to
       be detached.)

       After all of the threads in a thread group terminate, the parent
       process of the thread group is sent a SIGCHLD (or other
       termination) signal.

       If any of the threads in a thread group performs an execve(2),
       then all threads other than the thread group leader are
       terminated, and the new program is executed in the thread group
       leader.

       If one of the threads in a thread group creates a child using
       fork(2), then any thread in the group can wait(2) for that
       child.

       Since Linux 2.5.35, flags must also include CLONE_SIGHAND if
       CLONE_THREAD is specified (and note that, since Linux 2.6.0,
       CLONE_SIGHAND also requires CLONE_VM to be included).

       Signal dispositions and actions are process-wide: if an unhandled
       signal is delivered to a thread, then it will affect
       (terminate, stop, continue, be ignored in) all members of the thread
       group.

       Each thread has its own signal mask, as set by sigprocmask(2).

       A signal may be process-directed or thread-directed. A
       process-directed signal is targeted at a thread group (i.e., a TGID),
       and is delivered to an arbitrarily selected thread from among
       those that are not blocking the signal. A signal may be process
       directed because it was generated by the kernel for reasons
       other than a hardware exception, or because it was sent using
       kill(2) or sigqueue(3). A thread-directed signal is targeted at
       (i.e., delivered to) a specific thread. A signal may be thread
       directed because it was sent using tgkill(2) or
       pthread_sigqueue(3), or because the thread executed a machine
       language instruction that triggered a hardware exception (e.g.,
       invalid memory access triggering SIGSEGV or a floating-point
       exception triggering SIGFPE).

       A call to sigpending(2) returns a signal set that is the union
       of the pending process-directed signals and the signals that are
       pending for the calling thread.

       If a process-directed signal is delivered to a thread group, and
       the thread group has installed a handler for the signal, then
       the handler will be invoked in exactly one, arbitrarily selected
       member of the thread group that has not blocked the signal. If
       multiple threads in a group are waiting to accept the same signal
       using sigwaitinfo(2), the kernel will arbitrarily select one
       of these threads to receive the signal.

CLONE_UNTRACED (since Linux 2.5.46)
       If CLONE_UNTRACED is specified, then a tracing process cannot
       force CLONE_PTRACE on this child process.

CLONE_VFORK (since Linux 2.2)
       If CLONE_VFORK is set, the execution of the calling process is
       suspended until the child releases its virtual memory resources
       via a call to execve(2) or _exit(2) (as with vfork(2)).

       If CLONE_VFORK is not set, then both the calling process and the
       child are schedulable after the call, and an application should
       not rely on execution occurring in any particular order.
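
The ordering guarantee that CLONE_VFORK provides can be illustrated
with the sketch below. The names are hypothetical, and CLONE_VM is
combined with CLONE_VFORK here only so that the caller can observe the
child's write directly.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

#define VFORK_DEMO_STACK (64 * 1024)

static volatile int vfork_child_ran;

static int
mark_ran(void *arg)
{
    (void) arg;
    vfork_child_ran = 1;
    return 0;                   /* Child terminates here */
}

/* Hypothetical helper: returns the value of vfork_child_ran as seen by
   the caller immediately after clone() returns, or -1 on error. */
int
child_ran_before_clone_returned(void)
{
    char *stack = malloc(VFORK_DEMO_STACK);
    pid_t pid;
    int seen;

    if (stack == NULL)
        return -1;
    vfork_child_ran = 0;

    pid = clone(mark_ran, stack + VFORK_DEMO_STACK,
                CLONE_VM | CLONE_VFORK | SIGCHLD, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    /* No wait has happened yet; the caller was suspended until the
       child terminated, so the child's write is already visible */
    seen = vfork_child_ran;

    waitpid(pid, NULL, 0);      /* Reap the child */
    free(stack);
    return seen;
}
```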

CLONE_VM (since Linux 2.0)
       If CLONE_VM is set, the calling process and the child process
       run in the same memory space. In particular, memory writes performed
       by the calling process or by the child process are also
       visible in the other process. Moreover, any memory mapping or
       unmapping performed with mmap(2) or munmap(2) by the child or
       calling process also affects the other process.

       If CLONE_VM is not set, the child process runs in a separate
       copy of the memory space of the calling process at the time of
       clone(). Memory writes or file mappings/unmappings performed by
       one of the processes do not affect the other, as with fork(2).
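
The effect of CLONE_VM on memory writes can be seen in a sketch such
as the following. The names shared_counter, bump_counter(), and
counter_after_child() are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

#define VM_DEMO_STACK (64 * 1024)

static volatile int shared_counter;

static int
bump_counter(void *arg)
{
    shared_counter += *(int *) arg;
    return 0;
}

/* Hypothetical helper: reset shared_counter to 0, let a child add
   'delta' to it, and return the value the caller sees afterward
   (or -1 on error). */
int
counter_after_child(int share_vm, int delta)
{
    char *stack = malloc(VM_DEMO_STACK);
    pid_t pid;

    if (stack == NULL)
        return -1;
    shared_counter = 0;

    pid = clone(bump_counter, stack + VM_DEMO_STACK,
                (share_vm ? CLONE_VM : 0) | SIGCHLD, &delta);
    if (pid == -1 || waitpid(pid, NULL, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);

    /* With CLONE_VM the child's write is visible here; without it the
       child modified only its own copy of the variable */
    return shared_counter;
}
```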

Note that the glibc clone() wrapper function makes some changes in the
memory pointed to by child_stack (changes required to set the stack up
correctly for the child) before invoking the clone() system call. So,
in cases where clone() is used to recursively create children, do not
use the buffer employed for the parent's stack as the stack of the
child.

C library/kernel differences
The raw clone() system call corresponds more closely to fork(2) in that
execution in the child continues from the point of the call. As such,
the fn and arg arguments of the clone() wrapper function are omitted.

Another difference for the raw clone() system call is that the
child_stack argument may be NULL, in which case the child uses a duplicate
of the parent's stack. (Copy-on-write semantics ensure that the
child gets separate copies of stack pages when either process modifies
the stack.) In this case, for correct operation, the CLONE_VM option
should not be specified. (If the child shares the parent's memory
because of the use of the CLONE_VM flag, then no copy-on-write duplication
occurs and chaos is likely to result.)
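
A fork-like use of the raw system call with a NULL child_stack might
look like the sketch below. It rests on explicit assumptions: the
helper name raw_clone_exit_status() is hypothetical, and the argument
order is that of architectures where flags is the first argument
(x86-64, for example); the example is not suitable as written for
architectures that reverse the first two arguments.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a child with the raw clone() system
   call, have it exit with 'status', and return the exit status that
   the parent collects (or -1 on error). */
int
raw_clone_exit_status(int status)
{
    int wstatus;
    long pid;

    /* No CLONE_VM and a NULL stack: the child runs on a copy-on-write
       duplicate of the parent's stack, as with fork(2) */
    pid = syscall(SYS_clone, (unsigned long) SIGCHLD, NULL, NULL, NULL,
                  0UL);
    if (pid == -1)
        return -1;

    if (pid == 0)
        _exit(status);      /* Child: continues from point of call */

    if (waitpid((pid_t) pid, &wstatus, 0) == -1 || !WIFEXITED(wstatus))
        return -1;
    return WEXITSTATUS(wstatus);
}
```

Because the last three arguments are NULL or zero here, the differing
order of ctid and newtls across architectures does not matter for this
particular call.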

The order of the arguments also differs in the raw system call, and
there are variations in the arguments across architectures, as detailed
in the following paragraphs.

The raw system call interface on x86-64 and some other architectures
(including sh, tile, ia-64, and alpha) is:

    long clone(unsigned long flags, void *child_stack,
               int *ptid, int *ctid,
               unsigned long newtls);

On x86-32, and several other common architectures (including score,
ARM, ARM 64, PA-RISC, arc, Power PC, xtensa, and MIPS), the order of
the last two arguments is reversed:

    long clone(unsigned long flags, void *child_stack,
               int *ptid, unsigned long newtls,
               int *ctid);

On the cris and s390 architectures, the order of the first two arguments
is reversed:

    long clone(void *child_stack, unsigned long flags,
               int *ptid, int *ctid,
               unsigned long newtls);

On the microblaze architecture, an additional argument is supplied:

    long clone(unsigned long flags, void *child_stack,
               int stack_size,         /* Size of stack */
               int *ptid, int *ctid,
               unsigned long newtls);

blackfin, m68k, and sparc
The argument-passing conventions on blackfin, m68k, and sparc are different
from the descriptions above. For details, see the kernel (and
glibc) source.

ia64
On ia64, a different interface is used:

    int __clone2(int (*fn)(void *),
                 void *child_stack_base, size_t stack_size,
                 int flags, void *arg, ...
                 /* pid_t *ptid, struct user_desc *tls, pid_t *ctid */ );

The prototype shown above is for the glibc wrapper function; for the
system call itself, the prototype can be described as follows (it is
identical to the clone() prototype on microblaze):

    long clone2(unsigned long flags, void *child_stack_base,
                int stack_size,         /* Size of stack */
                int *ptid, int *ctid,
                unsigned long tls);

__clone2() operates in the same way as clone(), except that
child_stack_base points to the lowest address of the child's stack
area, and stack_size specifies the size of the stack pointed to by
child_stack_base.

Linux 2.4 and earlier
In Linux 2.4 and earlier, clone() does not take arguments ptid, tls,
and ctid.

On success, the thread ID of the child process is returned in the caller's
thread of execution. On failure, -1 is returned in the caller's
context, no child process will be created, and errno will be set
appropriately.
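
Checking the result of clone() might be sketched as follows. The
helper name clone_errno() is hypothetical, and it is only meant for
flag combinations that either fail or create a child that can be
waited for. It relies on the rules stated on this page that
CLONE_SIGHAND requires CLONE_VM and CLONE_THREAD requires
CLONE_SIGHAND.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

#define ERR_DEMO_STACK (64 * 1024)

static int
do_nothing(void *arg)
{
    (void) arg;
    return 0;
}

/* Hypothetical helper: call clone() with 'flags' and return 0 on
   success, or the errno value if the call failed.  (ENOMEM is also
   returned if the stack buffer cannot be allocated.) */
int
clone_errno(int flags)
{
    char *stack = malloc(ERR_DEMO_STACK);
    pid_t pid;
    int err = 0;

    if (stack == NULL)
        return ENOMEM;

    pid = clone(do_nothing, stack + ERR_DEMO_STACK, flags, NULL);
    if (pid == -1)
        err = errno;            /* Save before another call changes it */
    else
        waitpid(pid, NULL, __WALL);     /* Reap the child */

    free(stack);
    return err;
}
```

For instance, clone_errno(CLONE_SIGHAND | SIGCHLD) is expected to
report EINVAL on Linux 2.6.0 and later, because CLONE_VM is missing.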

EAGAIN Too many processes are already running; see fork(2).

EINVAL CLONE_SIGHAND was specified, but CLONE_VM was not. (Since Linux
       2.6.0.)

EINVAL CLONE_THREAD was specified, but CLONE_SIGHAND was not. (Since
       Linux 2.5.35.)

EINVAL CLONE_THREAD was specified, but the current process previously
       called unshare(2) with the CLONE_NEWPID flag or used setns(2) to
       reassociate itself with a PID namespace.

EINVAL Both CLONE_FS and CLONE_NEWNS were specified in flags.

EINVAL (since Linux 3.9)
       Both CLONE_NEWUSER and CLONE_FS were specified in flags.

EINVAL Both CLONE_NEWIPC and CLONE_SYSVSEM were specified in flags.

EINVAL One (or both) of CLONE_NEWPID or CLONE_NEWUSER and one (or both)
       of CLONE_THREAD or CLONE_PARENT were specified in flags.

EINVAL Returned by the glibc clone() wrapper function when fn or
       child_stack is specified as NULL.

EINVAL CLONE_NEWIPC was specified in flags, but the kernel was not
       configured with the CONFIG_SYSVIPC and CONFIG_IPC_NS options.

EINVAL CLONE_NEWNET was specified in flags, but the kernel was not
       configured with the CONFIG_NET_NS option.

EINVAL CLONE_NEWPID was specified in flags, but the kernel was not
       configured with the CONFIG_PID_NS option.

EINVAL CLONE_NEWUSER was specified in flags, but the kernel was not
       configured with the CONFIG_USER_NS option.

EINVAL CLONE_NEWUTS was specified in flags, but the kernel was not
       configured with the CONFIG_UTS_NS option.

EINVAL child_stack is not aligned to a suitable boundary for this
       architecture. For example, on aarch64, child_stack must be a
       multiple of 16.

ENOMEM Cannot allocate sufficient memory to allocate a task structure
       for the child, or to copy those parts of the caller's context
       that need to be copied.

ENOSPC (since Linux 3.7)
       CLONE_NEWPID was specified in flags, but the limit on the nesting
       depth of PID namespaces would have been exceeded; see
       pid_namespaces(7).

ENOSPC (since Linux 4.9; beforehand EUSERS)
       CLONE_NEWUSER was specified in flags, and the call would cause
       the limit on the number of nested user namespaces to be
       exceeded. See user_namespaces(7).

       From Linux 3.11 to Linux 4.8, the error diagnosed in this case
       was EUSERS.

ENOSPC (since Linux 4.9)
       One of the values in flags specified the creation of a new user
       namespace, but doing so would have caused the limit defined by
       the corresponding file in /proc/sys/user to be exceeded. For
       further details, see namespaces(7).

EPERM  CLONE_NEWCGROUP, CLONE_NEWIPC, CLONE_NEWNET, CLONE_NEWNS,
       CLONE_NEWPID, or CLONE_NEWUTS was specified by an unprivileged
       process (process without CAP_SYS_ADMIN).

EPERM  CLONE_PID was specified by a process other than process 0.
       (This error occurs only on Linux 2.5.15 and earlier.)

EPERM  CLONE_NEWUSER was specified in flags, but either the effective
       user ID or the effective group ID of the caller does not have a
       mapping in the parent namespace (see user_namespaces(7)).

EPERM (since Linux 3.9)
       CLONE_NEWUSER was specified in flags and the caller is in a
       chroot environment (i.e., the caller's root directory does not
       match the root directory of the mount namespace in which it
       resides).

ERESTARTNOINTR (since Linux 2.6.17)
       System call was interrupted by a signal and will be restarted.
       (This can be seen only during a trace.)

EUSERS (Linux 3.11 to Linux 4.8)
       CLONE_NEWUSER was specified in flags, and the limit on the number
       of nested user namespaces would be exceeded. See the
       discussion of the ENOSPC error above.

clone() is Linux-specific and should not be used in programs intended
to be portable.

The kcmp(2) system call can be used to test whether two processes share
various resources such as a file descriptor table, System V semaphore
undo operations, or a virtual address space.

Handlers registered using pthread_atfork(3) are not executed during a
call to clone().

In the Linux 2.4.x series, CLONE_THREAD generally does not make the
parent of the new thread the same as the parent of the calling process.
However, for kernel versions 2.4.7 to 2.4.18 the CLONE_THREAD flag
implied the CLONE_PARENT flag (as in Linux 2.6.0 and later).

For a while there was CLONE_DETACHED (introduced in 2.5.32): parent
wants no child-exit signal. In Linux 2.6.2, the need to give this flag
together with CLONE_THREAD disappeared. This flag is still defined,
but has no effect.

On i386, clone() should not be called through vsyscall, but directly
through int $0x80.

GNU C library versions 2.3.4 up to and including 2.24 contained a wrapper
function for getpid(2) that performed caching of PIDs. This
caching relied on support in the glibc wrapper for clone(), but limitations
in the implementation meant that the cache was not up to date in
some circumstances. In particular, if a signal was delivered to the
child immediately after the clone() call, then a call to getpid(2) in a
handler for the signal could return the PID of the calling process
("the parent"), if the clone wrapper had not yet had a chance to update
the PID cache in the child. (This discussion ignores the case where
the child was created using CLONE_THREAD, when getpid(2) should return
the same value in the child and in the process that called clone(),
since the caller and the child are in the same thread group. The
stale-cache problem also does not occur if the flags argument includes
CLONE_VM.) To get the truth, it was sometimes necessary to use code
such as the following:

    #include <sys/syscall.h>    /* For SYS_getpid */
    #include <sys/types.h>      /* For pid_t */
    #include <unistd.h>         /* For syscall() */

    pid_t mypid;

    mypid = syscall(SYS_getpid);

Because of the stale-cache problem, as well as other problems noted in
getpid(2), the PID caching feature was removed in glibc 2.25.

The following program demonstrates the use of clone() to create a child
process that executes in a separate UTS namespace. The child changes
the hostname in its UTS namespace. Both parent and child then display
the system hostname, making it possible to see that the hostname differs
in the UTS namespaces of the parent and child. For an example of
the use of this program, see setns(2).

Program source
    #define _GNU_SOURCE
    #include <sys/wait.h>
    #include <sys/utsname.h>
    #include <sched.h>
    #include <string.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                            } while (0)

    static int              /* Start function for cloned child */
    childFunc(void *arg)
    {
        struct utsname uts;

        /* Change hostname in UTS namespace of child */

        if (sethostname(arg, strlen(arg)) == -1)
            errExit("sethostname");

        /* Retrieve and display hostname */

        if (uname(&uts) == -1)
            errExit("uname");
        printf("uts.nodename in child:  %s\n", uts.nodename);

        /* Keep the namespace open for a while, by sleeping.
           This allows some experimentation--for example, another
           process might join the namespace. */

        sleep(200);

        return 0;           /* Child terminates now */
    }

    #define STACK_SIZE (1024 * 1024)    /* Stack size for cloned child */

    int
    main(int argc, char *argv[])
    {
        char *stack;                    /* Start of stack buffer */
        char *stackTop;                 /* End of stack buffer */
        pid_t pid;
        struct utsname uts;

        if (argc < 2) {
            fprintf(stderr, "Usage: %s <child-hostname>\n", argv[0]);
            exit(EXIT_FAILURE);         /* Missing argument is an error */
        }

        /* Allocate stack for child */

        stack = malloc(STACK_SIZE);
        if (stack == NULL)
            errExit("malloc");
        stackTop = stack + STACK_SIZE;  /* Assume stack grows downward */

        /* Create child that has its own UTS namespace;
           child commences execution in childFunc() */

        pid = clone(childFunc, stackTop, CLONE_NEWUTS | SIGCHLD, argv[1]);
        if (pid == -1)
            errExit("clone");
        printf("clone() returned %ld\n", (long) pid);

        /* Parent falls through to here */

        sleep(1);           /* Give child time to change its hostname */

        /* Display hostname in parent's UTS namespace. This will be
           different from hostname in child's UTS namespace. */

        if (uname(&uts) == -1)
            errExit("uname");
        printf("uts.nodename in parent: %s\n", uts.nodename);

        if (waitpid(pid, NULL, 0) == -1)    /* Wait for child */
            errExit("waitpid");
        printf("child has terminated\n");

        exit(EXIT_SUCCESS);
    }

fork(2), futex(2), getpid(2), gettid(2), kcmp(2), set_thread_area(2),
set_tid_address(2), setns(2), tkill(2), unshare(2), wait(2),
capabilities(7), namespaces(7), pthreads(7)

This page is part of release 5.02 of the Linux man-pages project. A
description of the project, information about reporting bugs, and the
latest version of this page, can be found at
https://www.kernel.org/doc/man-pages/.

Linux                            2019-08-02                         CLONE(2)