CLONE(2)                  Linux Programmer's Manual                 CLONE(2)

NAME
clone, __clone2 - create a child process

SYNOPSIS
/* Prototype for the glibc wrapper function */

#define _GNU_SOURCE
#include <sched.h>

int clone(int (*fn)(void *), void *child_stack,
          int flags, void *arg, ...
          /* pid_t *ptid, void *newtls, pid_t *ctid */ );

/* For the prototype of the raw system call, see NOTES */
19
DESCRIPTION
clone() creates a new process, in a manner similar to fork(2).
22
23 This page describes both the glibc clone() wrapper function and the
24 underlying system call on which it is based. The main text describes
25 the wrapper function; the differences for the raw system call are
26 described toward the end of this page.
27
28 Unlike fork(2), clone() allows the child process to share parts of its
29 execution context with the calling process, such as the virtual address
30 space, the table of file descriptors, and the table of signal handlers.
31 (Note that on this manual page, "calling process" normally corresponds
32 to "parent process". But see the description of CLONE_PARENT below.)
33
34 One use of clone() is to implement threads: multiple flows of control
35 in a program that run concurrently in a shared address space.
36
37 When the child process is created with clone(), it commences execution
38 by calling the function pointed to by the argument fn. (This differs
39 from fork(2), where execution continues in the child from the point of
40 the fork(2) call.) The arg argument is passed as the argument of the
41 function fn.
42
43 When the fn(arg) function returns, the child process terminates. The
44 integer returned by fn is the exit status for the child process. The
45 child process may also terminate explicitly by calling exit(2) or after
46 receiving a fatal signal.
47
48 The child_stack argument specifies the location of the stack used by
49 the child process. Since the child and calling process may share mem‐
50 ory, it is not possible for the child process to execute in the same
51 stack as the calling process. The calling process must therefore set
52 up memory space for the child stack and pass a pointer to this space to
53 clone(). Stacks grow downward on all processors that run Linux (except
54 the HP PA processors), so child_stack usually points to the topmost
55 address of the memory space set up for the child stack.
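
The start-function and stack conventions described above can be sketched as follows. This is an illustration only, not part of the original page: the names run_in_child(), return_arg(), and CHILD_STACK_SIZE are hypothetical, the stack size is arbitrary, and the stack is assumed to grow downward.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (256 * 1024)   /* Arbitrary size for this sketch */

/* Hypothetical start function: returns the integer that arg points to */
static int
return_arg(void *arg)
{
    return *(int *) arg;
}

/* Run fn(arg) in a clone() child on a freshly mapped stack.
   Returns the child's exit status, or -1 on error. */
static int
run_in_child(int (*fn)(void *), void *arg)
{
    char *stack, *stack_top;
    int status, result = -1;
    pid_t pid;

    stack = mmap(NULL, CHILD_STACK_SIZE, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;
    stack_top = stack + CHILD_STACK_SIZE;   /* Assume stack grows downward */

    /* No sharing flags: the child gets a copy of the address space,
       so arg may point into the caller's memory */
    pid = clone(fn, stack_top, SIGCHLD, arg);
    if (pid != -1 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);   /* Value returned by fn */

    munmap(stack, CHILD_STACK_SIZE);
    return result;
}
```

With an int v set to 7, the call run_in_child(return_arg, &v) would be expected to yield 7, since the value returned by fn becomes the child's exit status.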
56
57 The low byte of flags contains the number of the termination signal
58 sent to the parent when the child dies. If this signal is specified as
59 anything other than SIGCHLD, then the parent process must specify the
60 __WALL or __WCLONE options when waiting for the child with wait(2). If
61 no signal is specified, then the parent process is not signaled when
62 the child terminates.
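
A minimal sketch of waiting for a child that has no termination signal, assuming a GCC-compatible compiler (for the alignment attribute). The names quiet_stack, exit_with_3(), and wait_for_signal_less_child() are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/wait.h>

/* Statically allocated child stack; size and alignment are assumptions */
static char quiet_stack[64 * 1024] __attribute__((aligned(16)));

static int
exit_with_3(void *arg)
{
    (void) arg;
    return 3;
}

/* Create a child whose termination signal is 0 (none) and collect its
   exit status. Returns the status, or -1 on error. */
static int
wait_for_signal_less_child(void)
{
    int status;
    pid_t pid;

    pid = clone(exit_with_3, quiet_stack + sizeof(quiet_stack), 0, NULL);
    if (pid == -1)
        return -1;

    /* The termination signal is not SIGCHLD, so __WALL (or __WCLONE)
       is needed for the wait to select this child */
    if (waitpid(pid, &status, __WALL) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```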
63
64 flags may also be bitwise-ORed with zero or more of the following con‐
65 stants, in order to specify what is shared between the calling process
66 and the child process:
67
68 CLONE_CHILD_CLEARTID (since Linux 2.5.49)
69 Clear (zero) the child thread ID at the location ctid in child
70 memory when the child exits, and do a wakeup on the futex at
71 that address. The address involved may be changed by the
72 set_tid_address(2) system call. This is used by threading
73 libraries.
74
75 CLONE_CHILD_SETTID (since Linux 2.5.49)
76 Store the child thread ID at the location ctid in the child's
77 memory. The store operation completes before clone() returns
78 control to user space.
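
The effect of CLONE_CHILD_CLEARTID can be observed from the caller when the ctid location is in shared memory. The following sketch assumes CLONE_VM is also used for that purpose; the names tid_stack, child_tid_word, do_nothing(), and ctid_after_child_exit() are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

static char tid_stack[64 * 1024] __attribute__((aligned(16)));
static volatile pid_t child_tid_word;   /* Location passed as ctid */

static int
do_nothing(void *arg)
{
    (void) arg;
    return 0;
}

/* Returns the value found at the ctid location after the child has been
   reaped, or -1 on error. */
static pid_t
ctid_after_child_exit(void)
{
    pid_t pid;

    child_tid_word = -1;
    pid = clone(do_nothing, tid_stack + sizeof(tid_stack),
                CLONE_VM | CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD,
                NULL,                           /* arg */
                NULL,                           /* ptid: unused here */
                NULL,                           /* newtls: unused here */
                (pid_t *) &child_tid_word);     /* ctid */
    if (pid == -1 || waitpid(pid, NULL, 0) != pid)
        return -1;

    return child_tid_word;      /* Cleared by the kernel at child exit */
}
```

A threading library would typically sleep on the futex at this address rather than call waitpid(2); the wait is used here only to keep the sketch short.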
79
80 CLONE_FILES (since Linux 2.0)
81 If CLONE_FILES is set, the calling process and the child process
82 share the same file descriptor table. Any file descriptor cre‐
83 ated by the calling process or by the child process is also
84 valid in the other process. Similarly, if one of the processes
85 closes a file descriptor, or changes its associated flags (using
86 the fcntl(2) F_SETFD operation), the other process is also
87 affected. If a process sharing a file descriptor table calls
88 execve(2), its file descriptor table is duplicated (unshared).
89
90 If CLONE_FILES is not set, the child process inherits a copy of
91 all file descriptors opened in the calling process at the time
92 of clone(). Subsequent operations that open or close file
93 descriptors, or change file descriptor flags, performed by
94 either the calling process or the child process do not affect
95 the other process. Note, however, that the duplicated file
96 descriptors in the child refer to the same open file descrip‐
97 tions as the corresponding file descriptors in the calling
98 process, and thus share file offsets and file status flags (see
99 open(2)).
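
The difference between a shared and a copied file descriptor table can be sketched as follows. The helper names are hypothetical, and the sketch assumes that /dev/null can be opened.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

static char files_stack[64 * 1024] __attribute__((aligned(16)));

static int                      /* Child: close the descriptor in *arg */
close_fd(void *arg)
{
    return close(*(int *) arg) == -1;
}

/* Returns 1 if a descriptor closed by the child is still open in the
   caller, 0 if it is not, or -1 on error. */
static int
fd_survives_child_close(int share_table)
{
    int fd, still_open;
    pid_t pid;

    fd = open("/dev/null", O_RDONLY);
    if (fd == -1)
        return -1;

    pid = clone(close_fd, files_stack + sizeof(files_stack),
                (share_table ? CLONE_FILES : 0) | SIGCHLD, &fd);
    if (pid == -1 || waitpid(pid, NULL, 0) != pid) {
        close(fd);
        return -1;
    }

    still_open = (fcntl(fd, F_GETFD) != -1);
    if (still_open)
        close(fd);
    return still_open;
}
```

Without CLONE_FILES the child closes only its own copy, so the caller's descriptor remains valid; with CLONE_FILES the close is visible in the caller as well.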
100
101 CLONE_FS (since Linux 2.0)
102 If CLONE_FS is set, the caller and the child process share the
103 same filesystem information. This includes the root of the
104 filesystem, the current working directory, and the umask. Any
105 call to chroot(2), chdir(2), or umask(2) performed by the call‐
106 ing process or the child process also affects the other process.
107
108 If CLONE_FS is not set, the child process works on a copy of the
109 filesystem information of the calling process at the time of the
110 clone() call. Calls to chroot(2), chdir(2), or umask(2) per‐
111 formed later by one of the processes do not affect the other
112 process.
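
A rough sketch of the CLONE_FS behavior, using chdir(2) in the child. The helper names are hypothetical, and the sketch assumes that /tmp exists and is a directory other than the root.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <sched.h>
#include <signal.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static char fs_stack[64 * 1024] __attribute__((aligned(16)));

static int                      /* Child: change directory to "/" */
chdir_to_root(void *arg)
{
    (void) arg;
    return chdir("/") == -1;
}

/* Returns 1 if the child's chdir("/") changed the caller's current
   working directory, 0 if it did not, or -1 on error. */
static int
cwd_follows_child(int share_fs)
{
    char saved[PATH_MAX], now[PATH_MAX];
    int followed = -1;
    pid_t pid;

    if (getcwd(saved, sizeof(saved)) == NULL || chdir("/tmp") == -1)
        return -1;

    pid = clone(chdir_to_root, fs_stack + sizeof(fs_stack),
                (share_fs ? CLONE_FS : 0) | SIGCHLD, NULL);
    if (pid != -1 && waitpid(pid, NULL, 0) == pid &&
            getcwd(now, sizeof(now)) != NULL)
        followed = (strcmp(now, "/") == 0);

    if (chdir(saved) == -1)     /* Restore the original directory */
        return -1;
    return followed;
}
```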
113
114 CLONE_IO (since Linux 2.6.25)
115 If CLONE_IO is set, then the new process shares an I/O context
116 with the calling process. If this flag is not set, then (as
117 with fork(2)) the new process has its own I/O context.
118
119 The I/O context is the I/O scope of the disk scheduler (i.e.,
120 what the I/O scheduler uses to model scheduling of a process's
121 I/O). If processes share the same I/O context, they are treated
122 as one by the I/O scheduler. As a consequence, they get to
123 share disk time. For some I/O schedulers, if two processes
124 share an I/O context, they will be allowed to interleave their
125 disk access. If several threads are doing I/O on behalf of the
126 same process (aio_read(3), for instance), they should employ
127 CLONE_IO to get better I/O performance.
128
129 If the kernel is not configured with the CONFIG_BLOCK option,
130 this flag is a no-op.
131
132 CLONE_NEWCGROUP (since Linux 4.6)
Create the process in a new cgroup namespace. If this flag is
not set, then (as with fork(2)) the process is created in the
same cgroup namespace as the calling process. This flag is
intended for the implementation of containers.
137
138 For further information on cgroup namespaces, see cgroup_names‐
139 paces(7).
140
141 Only a privileged process (CAP_SYS_ADMIN) can employ CLONE_NEWC‐
142 GROUP.
143
144 CLONE_NEWIPC (since Linux 2.6.19)
145 If CLONE_NEWIPC is set, then create the process in a new IPC
146 namespace. If this flag is not set, then (as with fork(2)), the
147 process is created in the same IPC namespace as the calling
148 process. This flag is intended for the implementation of con‐
149 tainers.
150
151 An IPC namespace provides an isolated view of System V IPC
152 objects (see svipc(7)) and (since Linux 2.6.30) POSIX message
153 queues (see mq_overview(7)). The common characteristic of these
154 IPC mechanisms is that IPC objects are identified by mechanisms
155 other than filesystem pathnames.
156
157 Objects created in an IPC namespace are visible to all other
158 processes that are members of that namespace, but are not visi‐
159 ble to processes in other IPC namespaces.
160
161 When an IPC namespace is destroyed (i.e., when the last process
162 that is a member of the namespace terminates), all IPC objects
163 in the namespace are automatically destroyed.
164
165 Only a privileged process (CAP_SYS_ADMIN) can employ
166 CLONE_NEWIPC. This flag can't be specified in conjunction with
167 CLONE_SYSVSEM.
168
169 For further information on IPC namespaces, see namespaces(7).
170
171 CLONE_NEWNET (since Linux 2.6.24)
172 (The implementation of this flag was completed only by about
173 kernel version 2.6.29.)
174
175 If CLONE_NEWNET is set, then create the process in a new network
176 namespace. If this flag is not set, then (as with fork(2)) the
177 process is created in the same network namespace as the calling
178 process. This flag is intended for the implementation of con‐
179 tainers.
180
181 A network namespace provides an isolated view of the networking
182 stack (network device interfaces, IPv4 and IPv6 protocol stacks,
183 IP routing tables, firewall rules, the /proc/net and
184 /sys/class/net directory trees, sockets, etc.). A physical net‐
185 work device can live in exactly one network namespace. A vir‐
186 tual network (veth(4)) device pair provides a pipe-like abstrac‐
187 tion that can be used to create tunnels between network names‐
188 paces, and can be used to create a bridge to a physical network
189 device in another namespace.
190
191 When a network namespace is freed (i.e., when the last process
192 in the namespace terminates), its physical network devices are
193 moved back to the initial network namespace (not to the parent
194 of the process). For further information on network namespaces,
195 see namespaces(7).
196
197 Only a privileged process (CAP_SYS_ADMIN) can employ
198 CLONE_NEWNET.
199
200 CLONE_NEWNS (since Linux 2.4.19)
201 If CLONE_NEWNS is set, the cloned child is started in a new
202 mount namespace, initialized with a copy of the namespace of the
203 parent. If CLONE_NEWNS is not set, the child lives in the same
204 mount namespace as the parent.
205
206 Only a privileged process (CAP_SYS_ADMIN) can employ
207 CLONE_NEWNS. It is not permitted to specify both CLONE_NEWNS
208 and CLONE_FS in the same clone() call.
209
210 For further information on mount namespaces, see namespaces(7)
211 and mount_namespaces(7).
212
213 CLONE_NEWPID (since Linux 2.6.24)
214 If CLONE_NEWPID is set, then create the process in a new PID
215 namespace. If this flag is not set, then (as with fork(2)) the
216 process is created in the same PID namespace as the calling
217 process. This flag is intended for the implementation of con‐
218 tainers.
219
220 For further information on PID namespaces, see namespaces(7) and
221 pid_namespaces(7).
222
223 Only a privileged process (CAP_SYS_ADMIN) can employ CLONE_NEW‐
224 PID. This flag can't be specified in conjunction with
225 CLONE_THREAD or CLONE_PARENT.
226
227 CLONE_NEWUSER
228 (This flag first became meaningful for clone() in Linux 2.6.23,
229 the current clone() semantics were merged in Linux 3.5, and the
230 final pieces to make the user namespaces completely usable were
231 merged in Linux 3.8.)
232
233 If CLONE_NEWUSER is set, then create the process in a new user
234 namespace. If this flag is not set, then (as with fork(2)) the
235 process is created in the same user namespace as the calling
236 process.
237
238 Before Linux 3.8, use of CLONE_NEWUSER required that the caller
239 have three capabilities: CAP_SYS_ADMIN, CAP_SETUID, and CAP_SET‐
240 GID. Starting with Linux 3.8, no privileges are needed to cre‐
241 ate a user namespace.
242
243 This flag can't be specified in conjunction with CLONE_THREAD or
244 CLONE_PARENT. For security reasons, CLONE_NEWUSER cannot be
245 specified in conjunction with CLONE_FS.
246
247 For further information on user namespaces, see namespaces(7)
248 and user_namespaces(7).
249
250 CLONE_NEWUTS (since Linux 2.6.19)
251 If CLONE_NEWUTS is set, then create the process in a new UTS
252 namespace, whose identifiers are initialized by duplicating the
253 identifiers from the UTS namespace of the calling process. If
254 this flag is not set, then (as with fork(2)) the process is cre‐
255 ated in the same UTS namespace as the calling process. This
256 flag is intended for the implementation of containers.
257
258 A UTS namespace is the set of identifiers returned by uname(2);
259 among these, the domain name and the hostname can be modified by
260 setdomainname(2) and sethostname(2), respectively. Changes made
261 to the identifiers in a UTS namespace are visible to all other
262 processes in the same namespace, but are not visible to pro‐
263 cesses in other UTS namespaces.
264
265 Only a privileged process (CAP_SYS_ADMIN) can employ
266 CLONE_NEWUTS.
267
268 For further information on UTS namespaces, see namespaces(7).
269
270 CLONE_PARENT (since Linux 2.3.12)
271 If CLONE_PARENT is set, then the parent of the new child (as
272 returned by getppid(2)) will be the same as that of the calling
273 process.
274
275 If CLONE_PARENT is not set, then (as with fork(2)) the child's
276 parent is the calling process.
277
278 Note that it is the parent process, as returned by getppid(2),
279 which is signaled when the child terminates, so that if
280 CLONE_PARENT is set, then the parent of the calling process,
281 rather than the calling process itself, will be signaled.
282
283 CLONE_PARENT_SETTID (since Linux 2.5.49)
284 Store the child thread ID at the location ptid in the parent's
285 memory. (In Linux 2.5.32-2.5.48 there was a flag CLONE_SETTID
286 that did this.) The store operation completes before clone()
287 returns control to user space.
288
289 CLONE_PID (Linux 2.0 to 2.5.15)
290 If CLONE_PID is set, the child process is created with the same
291 process ID as the calling process. This is good for hacking the
292 system, but otherwise of not much use. From Linux 2.3.21
293 onward, this flag could be specified only by the system boot
294 process (PID 0). The flag disappeared completely from the ker‐
295 nel sources in Linux 2.5.16. Since then, the kernel silently
296 ignores this bit if it is specified in flags.
297
298 CLONE_PTRACE (since Linux 2.2)
299 If CLONE_PTRACE is specified, and the calling process is being
300 traced, then trace the child also (see ptrace(2)).
301
302 CLONE_SETTLS (since Linux 2.5.32)
303 The TLS (Thread Local Storage) descriptor is set to newtls.
304
305 The interpretation of newtls and the resulting effect is archi‐
306 tecture dependent. On x86, newtls is interpreted as a struct
307 user_desc * (see set_thread_area(2)). On x86-64 it is the new
308 value to be set for the %fs base register (see the ARCH_SET_FS
309 argument to arch_prctl(2)). On architectures with a dedicated
310 TLS register, it is the new value of that register.
311
312 CLONE_SIGHAND (since Linux 2.0)
313 If CLONE_SIGHAND is set, the calling process and the child
314 process share the same table of signal handlers. If the calling
315 process or child process calls sigaction(2) to change the behav‐
316 ior associated with a signal, the behavior is changed in the
317 other process as well. However, the calling process and child
318 processes still have distinct signal masks and sets of pending
319 signals. So, one of them may block or unblock signals using
320 sigprocmask(2) without affecting the other process.
321
322 If CLONE_SIGHAND is not set, the child process inherits a copy
323 of the signal handlers of the calling process at the time
324 clone() is called. Calls to sigaction(2) performed later by one
325 of the processes have no effect on the other process.
326
Since Linux 2.6.0-test6, flags must also include CLONE_VM if
CLONE_SIGHAND is specified.
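
As an illustration of a shared table of signal handlers, the following sketch has the child change the disposition of SIGUSR1 and the caller read it back. The helper names are hypothetical; CLONE_VM is included because CLONE_SIGHAND requires it.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <string.h>
#include <sys/wait.h>

static char sig_stack[64 * 1024] __attribute__((aligned(16)));

static int                      /* Child: ignore SIGUSR1 */
ignore_sigusr1(void *arg)
{
    struct sigaction sa;

    (void) arg;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_IGN;
    return sigaction(SIGUSR1, &sa, NULL) == -1;
}

/* Returns 1 if the caller sees SIGUSR1 as ignored after the child
   changed the disposition, 0 if not, or -1 on error. */
static int
caller_sees_child_disposition(void)
{
    struct sigaction dfl, seen;
    pid_t pid;

    memset(&dfl, 0, sizeof(dfl));
    dfl.sa_handler = SIG_DFL;
    if (sigaction(SIGUSR1, &dfl, NULL) == -1)
        return -1;

    pid = clone(ignore_sigusr1, sig_stack + sizeof(sig_stack),
                CLONE_VM | CLONE_SIGHAND | SIGCHLD, NULL);
    if (pid == -1 || waitpid(pid, NULL, 0) != pid)
        return -1;

    if (sigaction(SIGUSR1, NULL, &seen) == -1)
        return -1;
    sigaction(SIGUSR1, &dfl, NULL);     /* Restore default disposition */
    return seen.sa_handler == SIG_IGN;
}
```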
329
330 CLONE_STOPPED (since Linux 2.6.0-test2)
331 If CLONE_STOPPED is set, then the child is initially stopped (as
332 though it was sent a SIGSTOP signal), and must be resumed by
333 sending it a SIGCONT signal.
334
335 This flag was deprecated from Linux 2.6.25 onward, and was
336 removed altogether in Linux 2.6.38. Since then, the kernel
337 silently ignores it without error. Starting with Linux 4.6, the
338 same bit was reused for the CLONE_NEWCGROUP flag.
339
340 CLONE_SYSVSEM (since Linux 2.5.10)
341 If CLONE_SYSVSEM is set, then the child and the calling process
342 share a single list of System V semaphore adjustment (semadj)
343 values (see semop(2)). In this case, the shared list accumu‐
344 lates semadj values across all processes sharing the list, and
345 semaphore adjustments are performed only when the last process
346 that is sharing the list terminates (or ceases sharing the list
347 using unshare(2)). If this flag is not set, then the child has
348 a separate semadj list that is initially empty.
349
350 CLONE_THREAD (since Linux 2.4.0-test8)
351 If CLONE_THREAD is set, the child is placed in the same thread
352 group as the calling process. To make the remainder of the dis‐
353 cussion of CLONE_THREAD more readable, the term "thread" is used
354 to refer to the processes within a thread group.
355
356 Thread groups were a feature added in Linux 2.4 to support the
357 POSIX threads notion of a set of threads that share a single
358 PID. Internally, this shared PID is the so-called thread group
359 identifier (TGID) for the thread group. Since Linux 2.4, calls
360 to getpid(2) return the TGID of the caller.
361
362 The threads within a group can be distinguished by their (sys‐
363 tem-wide) unique thread IDs (TID). A new thread's TID is avail‐
364 able as the function result returned to the caller of clone(),
365 and a thread can obtain its own TID using gettid(2).
366
367 When a call is made to clone() without specifying CLONE_THREAD,
368 then the resulting thread is placed in a new thread group whose
369 TGID is the same as the thread's TID. This thread is the leader
370 of the new thread group.
371
372 A new thread created with CLONE_THREAD has the same parent
373 process as the caller of clone() (i.e., like CLONE_PARENT), so
374 that calls to getppid(2) return the same value for all of the
375 threads in a thread group. When a CLONE_THREAD thread termi‐
376 nates, the thread that created it using clone() is not sent a
377 SIGCHLD (or other termination) signal; nor can the status of
378 such a thread be obtained using wait(2). (The thread is said to
379 be detached.)
380
After all of the threads in a thread group terminate, the parent
process of the thread group is sent a SIGCHLD (or other termination)
signal.
384
385 If any of the threads in a thread group performs an execve(2),
386 then all threads other than the thread group leader are termi‐
387 nated, and the new program is executed in the thread group
388 leader.
389
390 If one of the threads in a thread group creates a child using
391 fork(2), then any thread in the group can wait(2) for that
392 child.
393
394 Since Linux 2.5.35, flags must also include CLONE_SIGHAND if
395 CLONE_THREAD is specified (and note that, since Linux
396 2.6.0-test6, CLONE_SIGHAND also requires CLONE_VM to be
397 included).
398
399 Signals may be sent to a thread group as a whole (i.e., a TGID)
400 using kill(2), or to a specific thread (i.e., TID) using
401 tgkill(2).
402
403 Signal dispositions and actions are process-wide: if an unhan‐
404 dled signal is delivered to a thread, then it will affect (ter‐
405 minate, stop, continue, be ignored in) all members of the thread
406 group.
407
408 Each thread has its own signal mask, as set by sigprocmask(2),
409 but signals can be pending either: for the whole process (i.e.,
410 deliverable to any member of the thread group), when sent with
411 kill(2); or for an individual thread, when sent with tgkill(2).
412 A call to sigpending(2) returns a signal set that is the union
413 of the signals pending for the whole process and the signals
414 that are pending for the calling thread.
415
416 If kill(2) is used to send a signal to a thread group, and the
417 thread group has installed a handler for the signal, then the
418 handler will be invoked in exactly one, arbitrarily selected
419 member of the thread group that has not blocked the signal. If
420 multiple threads in a group are waiting to accept the same sig‐
421 nal using sigwaitinfo(2), the kernel will arbitrarily select one
422 of these threads to receive a signal sent using kill(2).
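
The relationship between TGID and TID can be sketched with a bare CLONE_THREAD child. This is a simplified illustration and not how a threading library creates threads (no TLS is set up, so the start function restricts itself to raw system calls). The names record_ids(), spawn_thread_and_join(), and struct ids are hypothetical. Because a CLONE_THREAD child cannot be waited for with wait(2), the sketch sleeps on the futex that CLONE_CHILD_CLEARTID wakes.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <sched.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static char thread_stack[64 * 1024] __attribute__((aligned(16)));

struct ids {
    pid_t tgid;         /* Result of getpid(2) inside the new thread */
    pid_t tid;          /* Result of gettid(2) inside the new thread */
};

static int              /* Start function for the new thread */
record_ids(void *arg)
{
    struct ids *out = arg;

    out->tgid = syscall(SYS_getpid);
    out->tid = syscall(SYS_gettid);
    return 0;           /* Terminates this thread only */
}

/* Create a thread in the caller's thread group, wait until it has
   exited, and return its TID (or -1 on error). */
static pid_t
spawn_thread_and_join(struct ids *out)
{
    static volatile pid_t tid_word;     /* Used as both ptid and ctid */
    pid_t tid, seen;

    tid_word = 0;
    tid = clone(record_ids, thread_stack + sizeof(thread_stack),
                CLONE_VM | CLONE_SIGHAND | CLONE_THREAD |
                CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID,
                out,
                (pid_t *) &tid_word,    /* ptid: set before clone() returns */
                NULL,                   /* newtls: unused here */
                (pid_t *) &tid_word);   /* ctid: cleared when thread exits */
    if (tid == -1)
        return -1;

    /* Sleep until the kernel clears the word and wakes the futex */
    while ((seen = tid_word) != 0)
        syscall(SYS_futex, &tid_word, FUTEX_WAIT, seen, NULL, NULL, 0);

    return tid;
}
```

After the call, out->tgid would be expected to equal the caller's getpid(2) result, while out->tid matches the value returned by clone() and differs from the TGID.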
423
424 CLONE_UNTRACED (since Linux 2.5.46)
425 If CLONE_UNTRACED is specified, then a tracing process cannot
426 force CLONE_PTRACE on this child process.
427
428 CLONE_VFORK (since Linux 2.2)
429 If CLONE_VFORK is set, the execution of the calling process is
430 suspended until the child releases its virtual memory resources
431 via a call to execve(2) or _exit(2) (as with vfork(2)).
432
433 If CLONE_VFORK is not set, then both the calling process and the
434 child are schedulable after the call, and an application should
435 not rely on execution occurring in any particular order.
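
The ordering guarantee of CLONE_VFORK can be sketched by combining it with CLONE_VM, so that a store made by the child is visible to the caller as soon as clone() returns. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <sys/wait.h>

static char vfork_stack[64 * 1024] __attribute__((aligned(16)));

static int                      /* Child: set the flag that arg points to */
set_flag(void *arg)
{
    *(volatile int *) arg = 1;
    return 0;
}

/* Returns the value of the flag as read immediately after clone()
   returns in the caller, or -1 on error. */
static int
flag_right_after_clone(void)
{
    volatile int flag = 0;
    int seen;
    pid_t pid;

    pid = clone(set_flag, vfork_stack + sizeof(vfork_stack),
                CLONE_VM | CLONE_VFORK | SIGCHLD, (void *) &flag);
    if (pid == -1)
        return -1;

    seen = flag;        /* Caller was suspended until the child exited */

    if (waitpid(pid, NULL, 0) != pid)
        return -1;
    return seen;
}
```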
436
437 CLONE_VM (since Linux 2.0)
438 If CLONE_VM is set, the calling process and the child process
439 run in the same memory space. In particular, memory writes per‐
440 formed by the calling process or by the child process are also
441 visible in the other process. Moreover, any memory mapping or
442 unmapping performed with mmap(2) or munmap(2) by the child or
443 calling process also affects the other process.
444
445 If CLONE_VM is not set, the child process runs in a separate
446 copy of the memory space of the calling process at the time of
447 clone(). Memory writes or file mappings/unmappings performed by
448 one of the processes do not affect the other, as with fork(2).
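
A small sketch contrasting the two cases: the child stores a value through a pointer, and the caller reads the same location after the child has terminated. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <sys/wait.h>

static char vm_stack[64 * 1024] __attribute__((aligned(16)));

static int                      /* Child: store 42 where arg points */
store_42(void *arg)
{
    *(volatile int *) arg = 42;
    return 0;
}

/* Returns the caller's view of the cell after the child has written to
   it and terminated, or -1 on error. */
static int
value_after_child_write(int share_vm)
{
    static volatile int cell;
    pid_t pid;

    cell = 0;
    pid = clone(store_42, vm_stack + sizeof(vm_stack),
                (share_vm ? CLONE_VM : 0) | SIGCHLD, (void *) &cell);
    if (pid == -1 || waitpid(pid, NULL, 0) != pid)
        return -1;

    return cell;
}
```

With CLONE_VM the caller would be expected to read 42; without it the child writes to its own copy and the caller still reads 0.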
449
NOTES
Note that the glibc clone() wrapper function makes some changes in the
452 memory pointed to by child_stack (changes required to set the stack up
453 correctly for the child) before invoking the clone() system call. So,
454 in cases where clone() is used to recursively create children, do not
455 use the buffer employed for the parent's stack as the stack of the
456 child.
457
458 C library/kernel differences
459 The raw clone() system call corresponds more closely to fork(2) in that
460 execution in the child continues from the point of the call. As such,
461 the fn and arg arguments of the clone() wrapper function are omitted.
462
463 Another difference for the raw clone() system call is that the
464 child_stack argument may be zero, in which case the child uses a dupli‐
465 cate of the parent's stack. (Copy-on-write semantics ensure that the
466 child gets separate copies of stack pages when either process modifies
467 the stack.) In this case, for correct operation, the CLONE_VM option
468 should not be specified. (If the child shares the parent's memory
469 because of the use of the CLONE_VM flag, then no copy-on-write duplica‐
470 tion occurs and chaos is likely to result.)
471
472 The order of the arguments also differs in the raw system call, and
473 there are variations in the arguments across architectures, as detailed
474 in the following paragraphs.
475
476 The raw system call interface on x86-64 and some other architectures
477 (including sh, tile, and alpha) is roughly:
478
479 long clone(unsigned long flags, void *child_stack,
480 int *ptid, int *ctid,
481 unsigned long newtls);
482
483 On x86-32, and several other common architectures (including score,
484 ARM, ARM 64, PA-RISC, arc, Power PC, xtensa, and MIPS), the order of
485 the last two arguments is reversed:
486
487 long clone(unsigned long flags, void *child_stack,
488 int *ptid, unsigned long newtls,
489 int *ctid);
490
491 On the cris and s390 architectures, the order of the first two argu‐
492 ments is reversed:
493
494 long clone(void *child_stack, unsigned long flags,
495 int *ptid, int *ctid,
496 unsigned long newtls);
497
498 On the microblaze architecture, an additional argument is supplied:
499
500 long clone(unsigned long flags, void *child_stack,
501 int stack_size, /* Size of stack */
502 int *ptid, int *ctid,
503 unsigned long newtls);
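
The fork-like behavior of the raw system call can be sketched with syscall(2). The sketch assumes an architecture on which flags is the first argument (for example x86-64, x86-32, or ARM); all remaining arguments are zero, so the differing order of ctid and newtls does not matter here. The name raw_clone_like_fork() is hypothetical.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a child with the raw clone() system call, passing a zero
   child_stack and no CLONE_VM. Returns the child's exit status, or -1
   on error. */
static int
raw_clone_like_fork(void)
{
    int status;
    long pid;

    pid = syscall(SYS_clone, (unsigned long) SIGCHLD,
                  NULL,         /* child_stack: use a copy of the caller's */
                  NULL, NULL,   /* ptid and ctid (or newtls): unused */
                  0UL);
    if (pid == -1)
        return -1;

    if (pid == 0)               /* Child: execution continues from here */
        _exit(42);

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```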
504
505 blackfin, m68k, and sparc
506 The argument-passing conventions on blackfin, m68k, and sparc are dif‐
507 ferent from the descriptions above. For details, see the kernel (and
508 glibc) source.
509
510 ia64
511 On ia64, a different interface is used:
512
513 int __clone2(int (*fn)(void *),
514 void *child_stack_base, size_t stack_size,
515 int flags, void *arg, ...
516 /* pid_t *ptid, struct user_desc *tls, pid_t *ctid */ );
517
518 The prototype shown above is for the glibc wrapper function; the raw
519 system call interface has no fn or arg argument, and changes the order
520 of the arguments so that flags is the first argument, and tls is the
521 last argument.
522
523 __clone2() operates in the same way as clone(), except that
524 child_stack_base points to the lowest address of the child's stack
525 area, and stack_size specifies the size of the stack pointed to by
526 child_stack_base.
527
528 Linux 2.4 and earlier
529 In Linux 2.4 and earlier, clone() does not take arguments ptid, tls,
530 and ctid.
531
RETURN VALUE
On success, the thread ID of the child process is returned in the
caller's thread of execution. On failure, -1 is returned in the caller's
context, no child process is created, and errno is set appropriately.
537
ERRORS
EAGAIN Too many processes are already running; see fork(2).
540
541 EINVAL CLONE_SIGHAND was specified, but CLONE_VM was not. (Since Linux
542 2.6.0-test6.)
543
544 EINVAL CLONE_THREAD was specified, but CLONE_SIGHAND was not. (Since
545 Linux 2.5.35.)
546
547 EINVAL Both CLONE_FS and CLONE_NEWNS were specified in flags.
548
549 EINVAL (since Linux 3.9)
550 Both CLONE_NEWUSER and CLONE_FS were specified in flags.
551
552 EINVAL Both CLONE_NEWIPC and CLONE_SYSVSEM were specified in flags.
553
554 EINVAL One (or both) of CLONE_NEWPID or CLONE_NEWUSER and one (or both)
555 of CLONE_THREAD or CLONE_PARENT were specified in flags.
556
557 EINVAL Returned by the glibc clone() wrapper function when fn or
558 child_stack is specified as NULL.
559
560 EINVAL CLONE_NEWIPC was specified in flags, but the kernel was not con‐
561 figured with the CONFIG_SYSVIPC and CONFIG_IPC_NS options.
562
563 EINVAL CLONE_NEWNET was specified in flags, but the kernel was not con‐
564 figured with the CONFIG_NET_NS option.
565
566 EINVAL CLONE_NEWPID was specified in flags, but the kernel was not con‐
567 figured with the CONFIG_PID_NS option.
568
EINVAL CLONE_NEWUTS was specified in flags, but the kernel was not
configured with the CONFIG_UTS_NS option.
571
572 EINVAL child_stack is not aligned to a suitable boundary for this
573 architecture. For example, on aarch64, child_stack must be a
574 multiple of 16.
575
576 ENOMEM Cannot allocate sufficient memory to allocate a task structure
577 for the child, or to copy those parts of the caller's context
578 that need to be copied.
579
580 ENOSPC (since Linux 3.7)
581 CLONE_NEWPID was specified in flags, but the limit on the nest‐
582 ing depth of PID namespaces would have been exceeded; see
583 pid_namespaces(7).
584
585 ENOSPC (since Linux 4.9; beforehand EUSERS)
586 CLONE_NEWUSER was specified in flags, and the call would cause
587 the limit on the number of nested user namespaces to be
588 exceeded. See user_namespaces(7).
589
590 From Linux 3.11 to Linux 4.8, the error diagnosed in this case
591 was EUSERS.
592
593 ENOSPC (since Linux 4.9)
594 One of the values in flags specified the creation of a new user
595 namespace, but doing so would have caused the limit defined by
596 the corresponding file in /proc/sys/user to be exceeded. For
597 further details, see namespaces(7).
598
599 EPERM CLONE_NEWCGROUP, CLONE_NEWIPC, CLONE_NEWNET, CLONE_NEWNS,
600 CLONE_NEWPID, or CLONE_NEWUTS was specified by an unprivileged
601 process (process without CAP_SYS_ADMIN).
602
603 EPERM CLONE_PID was specified by a process other than process 0.
604 (This error occurs only on Linux 2.5.15 and earlier.)
605
606 EPERM CLONE_NEWUSER was specified in flags, but either the effective
607 user ID or the effective group ID of the caller does not have a
608 mapping in the parent namespace (see user_namespaces(7)).
609
610 EPERM (since Linux 3.9)
611 CLONE_NEWUSER was specified in flags and the caller is in a
612 chroot environment (i.e., the caller's root directory does not
613 match the root directory of the mount namespace in which it
614 resides).
615
616 ERESTARTNOINTR (since Linux 2.6.17)
617 System call was interrupted by a signal and will be restarted.
618 (This can be seen only during a trace.)
619
620 EUSERS (Linux 3.11 to Linux 4.8)
621 CLONE_NEWUSER was specified in flags, and the limit on the num‐
622 ber of nested user namespaces would be exceeded. See the dis‐
623 cussion of the ENOSPC error above.
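
Two of the flag-combination errors listed above can be provoked without privileges. The following sketch reports the errno value produced by a given set of flags; the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <sys/wait.h>

static char err_stack[64 * 1024] __attribute__((aligned(16)));

static int                      /* Runs only if clone() succeeds */
return_zero(void *arg)
{
    (void) arg;
    return 0;
}

/* Returns the errno value set by a failed clone() with the given flags,
   or 0 if the call succeeded. Intended only for flag combinations that
   are expected to fail. */
static int
clone_errno(int flags)
{
    pid_t pid;

    pid = clone(return_zero, err_stack + sizeof(err_stack), flags, NULL);
    if (pid == -1)
        return errno;

    waitpid(pid, NULL, __WALL); /* Unexpected success: reap the child */
    return 0;
}
```

For example, clone_errno(CLONE_SIGHAND | SIGCHLD) would be expected to yield EINVAL because CLONE_VM is missing, and clone_errno(CLONE_VM | CLONE_THREAD | SIGCHLD) would be expected to yield EINVAL because CLONE_SIGHAND is missing.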
624
CONFORMING TO
clone() is Linux-specific and should not be used in programs intended
627 to be portable.
628
NOTES
The kcmp(2) system call can be used to test whether two processes share
631 various resources such as a file descriptor table, System V semaphore
632 undo operations, or a virtual address space.
633
634 Handlers registered using pthread_atfork(3) are not executed during a
635 call to clone().
636
637 In the Linux 2.4.x series, CLONE_THREAD generally does not make the
638 parent of the new thread the same as the parent of the calling process.
639 However, for kernel versions 2.4.7 to 2.4.18 the CLONE_THREAD flag
640 implied the CLONE_PARENT flag (as in Linux 2.6.0 and later).
641
For a while there was a CLONE_DETACHED flag (introduced in Linux 2.5.32),
which indicated that the parent wanted no child-exit signal. In Linux
2.6.2, the need to give this flag together with CLONE_THREAD disappeared.
This flag is still defined, but has no effect.
646
647 On i386, clone() should not be called through vsyscall, but directly
648 through int $0x80.
649
BUGS
GNU C library versions 2.3.4 up to and including 2.24 contained a
wrapper function for getpid(2) that performed caching of PIDs. This
653 caching relied on support in the glibc wrapper for clone(), but limita‐
654 tions in the implementation meant that the cache was not up to date in
655 some circumstances. In particular, if a signal was delivered to the
656 child immediately after the clone() call, then a call to getpid(2) in a
657 handler for the signal could return the PID of the calling process
658 ("the parent"), if the clone wrapper had not yet had a chance to update
659 the PID cache in the child. (This discussion ignores the case where
660 the child was created using CLONE_THREAD, when getpid(2) should return
661 the same value in the child and in the process that called clone(),
662 since the caller and the child are in the same thread group. The
663 stale-cache problem also does not occur if the flags argument includes
664 CLONE_VM.) To get the truth, it was sometimes necessary to use code
665 such as the following:
666
#include <syscall.h>
#include <unistd.h>

pid_t mypid;

mypid = syscall(SYS_getpid);
672
673 Because of the stale-cache problem, as well as other problems noted in
674 getpid(2), the PID caching feature was removed in glibc 2.25.
675
EXAMPLE
The following program demonstrates the use of clone() to create a child
678 process that executes in a separate UTS namespace. The child changes
679 the hostname in its UTS namespace. Both parent and child then display
680 the system hostname, making it possible to see that the hostname dif‐
681 fers in the UTS namespaces of the parent and child. For an example of
682 the use of this program, see setns(2).
683
684 Program source
#define _GNU_SOURCE
#include <sys/wait.h>
#include <sys/utsname.h>
#include <sched.h>
#include <signal.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)

static int              /* Start function for cloned child */
childFunc(void *arg)
{
    struct utsname uts;

    /* Change hostname in UTS namespace of child */

    if (sethostname(arg, strlen(arg)) == -1)
        errExit("sethostname");

    /* Retrieve and display hostname */

    if (uname(&uts) == -1)
        errExit("uname");
    printf("uts.nodename in child:  %s\n", uts.nodename);

    /* Keep the namespace open for a while, by sleeping.
       This allows some experimentation--for example, another
       process might join the namespace. */

    sleep(200);

    return 0;           /* Child terminates now */
}

#define STACK_SIZE (1024 * 1024)    /* Stack size for cloned child */

int
main(int argc, char *argv[])
{
    char *stack;                    /* Start of stack buffer */
    char *stackTop;                 /* End of stack buffer */
    pid_t pid;
    struct utsname uts;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s <child-hostname>\n", argv[0]);
        exit(EXIT_FAILURE);         /* Missing argument is an error */
    }

    /* Allocate stack for child */

    stack = malloc(STACK_SIZE);
    if (stack == NULL)
        errExit("malloc");
    stackTop = stack + STACK_SIZE;  /* Assume stack grows downward */

    /* Create child that has its own UTS namespace;
       child commences execution in childFunc() */

    pid = clone(childFunc, stackTop, CLONE_NEWUTS | SIGCHLD, argv[1]);
    if (pid == -1)
        errExit("clone");
    printf("clone() returned %ld\n", (long) pid);

    /* Parent falls through to here */

    sleep(1);           /* Give child time to change its hostname */

    /* Display hostname in parent's UTS namespace. This will be
       different from hostname in child's UTS namespace. */

    if (uname(&uts) == -1)
        errExit("uname");
    printf("uts.nodename in parent: %s\n", uts.nodename);

    if (waitpid(pid, NULL, 0) == -1)    /* Wait for child */
        errExit("waitpid");
    printf("child has terminated\n");

    exit(EXIT_SUCCESS);
}
769
SEE ALSO
fork(2), futex(2), getpid(2), gettid(2), kcmp(2), set_thread_area(2),
set_tid_address(2), setns(2), tkill(2), unshare(2), wait(2),
capabilities(7), namespaces(7), pthreads(7)
774
COLOPHON
This page is part of release 4.15 of the Linux man-pages project. A
description of the project, information about reporting bugs, and the
latest version of this page, can be found at
https://www.kernel.org/doc/man-pages/.
780

Linux                             2017-09-15                        CLONE(2)