CLONE(2)                  Linux Programmer's Manual                  CLONE(2)

NAME
clone, __clone2 - create a child process

SYNOPSIS
#define _GNU_SOURCE
#include <sched.h>

int clone(int (*fn)(void *), void *child_stack,
          int flags, void *arg, ...
          /* pid_t *ptid, struct user_desc *tls, pid_t *ctid */ );

DESCRIPTION
17 clone() creates a new process, in a manner similar to fork(2). It is
18 actually a library function layered on top of the underlying clone()
19 system call, hereinafter referred to as sys_clone. A description of
20 sys_clone is given towards the end of this page.
21
22 Unlike fork(2), these calls allow the child process to share parts of
23 its execution context with the calling process, such as the memory
24 space, the table of file descriptors, and the table of signal handlers.
25 (Note that on this manual page, "calling process" normally corresponds
26 to "parent process". But see the description of CLONE_PARENT below.)
27
28 The main use of clone() is to implement threads: multiple threads of
29 control in a program that run concurrently in a shared memory space.
30
31 When the child process is created with clone(), it executes the func‐
32 tion application fn(arg). (This differs from fork(2), where execution
33 continues in the child from the point of the fork(2) call.) The fn
34 argument is a pointer to a function that is called by the child process
35 at the beginning of its execution. The arg argument is passed to the
36 fn function.
37
38 When the fn(arg) function application returns, the child process termi‐
39 nates. The integer returned by fn is the exit code for the child
40 process. The child process may also terminate explicitly by calling
41 exit(2) or after receiving a fatal signal.
42
43 The child_stack argument specifies the location of the stack used by
44 the child process. Since the child and calling process may share mem‐
45 ory, it is not possible for the child process to execute in the same
46 stack as the calling process. The calling process must therefore set
47 up memory space for the child stack and pass a pointer to this space to
48 clone(). Stacks grow downwards on all processors that run Linux
49 (except the HP PA processors), so child_stack usually points to the
50 topmost address of the memory space set up for the child stack.
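
As a rough illustration of the fn, arg, and child_stack arguments described above, the following sketch runs a function in a clone() child and collects its exit code. The names run_clone_child, child_fn, and STACK_SIZE are hypothetical, and the stack size is an arbitrary choice. SIGCHLD is placed in the low byte of flags so that a plain waitpid(2) can reap the child.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

#define STACK_SIZE (256 * 1024)   /* arbitrary size chosen for this sketch */

/* Child entry point: the value it returns becomes the child's exit code. */
static int child_fn(void *arg)
{
    return *(int *)arg;
}

/* Runs child_fn(&code) in a clone() child; returns its exit code, or -1. */
int run_clone_child(int code)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* Stacks grow downwards, so pass the topmost address of the block. */
    pid_t pid = clone(child_fn, stack + STACK_SIZE, SIGCHLD, &code);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because CLONE_VM is not set in this sketch, the child reads its own copy of code; a call such as run_clone_child(7) would be expected to return 7.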
51
52 The low byte of flags contains the number of the termination signal
53 sent to the parent when the child dies. If this signal is specified as
54 anything other than SIGCHLD, then the parent process must specify the
55 __WALL or __WCLONE options when waiting for the child with wait(2). If
56 no signal is specified, then the parent process is not signaled when
57 the child terminates.
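
The effect of the termination signal on waiting can be sketched as follows. This is an illustration only: wait_needs_wall, child_fn, and STACK_SIZE are hypothetical names. The child is created with no termination signal, so a plain waitpid(2) is expected to fail with ECHILD, while waitpid(2) with __WALL reaps the child.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/wait.h>

#define STACK_SIZE (256 * 1024)   /* arbitrary size chosen for this sketch */

static int child_fn(void *arg)
{
    (void)arg;
    return 0;
}

/*
 * Creates a child with no termination signal (low byte of flags is 0).
 * Returns 1 if a plain waitpid() failed with ECHILD and waitpid() with
 * __WALL then reaped the child, 0 if the behavior differed, -1 on error.
 */
int wait_needs_wall(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    pid_t pid = clone(child_fn, stack + STACK_SIZE, 0, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    /* Without __WALL or __WCLONE this child is not eligible for waiting. */
    int plain_failed = (waitpid(pid, &status, 0) == -1 && errno == ECHILD);

    /* __WALL waits for the child regardless of its termination signal. */
    int wall_ok = (waitpid(pid, &status, __WALL) == pid);

    free(stack);
    return plain_failed && wall_ok;
}
```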
58
59 flags may also be bitwise-or'ed with zero or more of the following con‐
60 stants, in order to specify what is shared between the calling process
61 and the child process:
62
63 CLONE_CHILD_CLEARTID (since Linux 2.5.49)
64 Erase child thread ID at location ctid in child memory when the
65 child exits, and do a wakeup on the futex at that address. The
66 address involved may be changed by the set_tid_address(2) system
67 call. This is used by threading libraries.
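
A minimal sketch of how a caller might use CLONE_CHILD_CLEARTID to learn that a child has exited is shown below. It is not how any particular threading library is implemented. The names wait_for_cleartid, child_tid, and child_fn are hypothetical; CLONE_VM is assumed so that the caller can see the cleared word, and CLONE_PARENT_SETTID is pointed at the same address so that the word is nonzero before the caller starts waiting.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (256 * 1024)   /* arbitrary size chosen for this sketch */

/* Set to the child's TID at clone() time, cleared by the kernel at exit. */
static volatile pid_t child_tid;

static int child_fn(void *arg)
{
    (void)arg;
    return 0;
}

/* Returns 1 once the kernel has cleared child_tid, -1 on error. */
int wait_for_cleartid(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    child_tid = -1;
    pid_t pid = clone(child_fn, stack + STACK_SIZE,
                      CLONE_VM | CLONE_PARENT_SETTID |
                      CLONE_CHILD_CLEARTID | SIGCHLD,
                      NULL, (pid_t *)&child_tid, NULL, (pid_t *)&child_tid);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    /* Sleep on the futex until the word is cleared when the child exits. */
    pid_t seen;
    while ((seen = child_tid) != 0)
        syscall(SYS_futex, &child_tid, FUTEX_WAIT, seen, NULL, NULL, 0);

    waitpid(pid, NULL, 0);        /* reap the child (SIGCHLD was requested) */
    free(stack);
    return child_tid == 0;
}
```

The loop re-reads the word after every return from the futex call, so spurious wakeups and interrupted waits are harmless.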
68
69 CLONE_CHILD_SETTID (since Linux 2.5.49)
70 Store child thread ID at location ctid in child memory.
71
72 CLONE_FILES
73 If CLONE_FILES is set, the calling process and the child process
74 share the same file descriptor table. Any file descriptor cre‐
75 ated by the calling process or by the child process is also
76 valid in the other process. Similarly, if one of the processes
77 closes a file descriptor, or changes its associated flags (using
78 the fcntl(2) F_SETFD operation), the other process is also
79 affected.
80
81 If CLONE_FILES is not set, the child process inherits a copy of
82 all file descriptors opened in the calling process at the time
83 of clone(). (The duplicated file descriptors in the child refer
84 to the same open file descriptions (see open(2)) as the corre‐
85 sponding file descriptors in the calling process.) Subsequent
86 operations that open or close file descriptors, or change file
87 descriptor flags, performed by either the calling process or the
88 child process do not affect the other process.
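
The difference between a shared and a copied file descriptor table can be sketched as follows. The names child_close_affects_caller and close_fd are hypothetical, and /dev/null is used only as a convenient file to open.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (256 * 1024)   /* arbitrary size chosen for this sketch */

/* Child: close the descriptor whose number is passed through arg. */
static int close_fd(void *arg)
{
    return close(*(int *)arg) == 0 ? 0 : 1;
}

/*
 * Returns 1 if a close() in the child also closed the descriptor in the
 * caller, 0 if the caller's descriptor stayed open, -1 on error.
 */
int child_close_affects_caller(int share_files)
{
    int fd = open("/dev/null", O_RDONLY);
    if (fd == -1)
        return -1;

    char *stack = malloc(STACK_SIZE);
    if (stack == NULL) {
        close(fd);
        return -1;
    }

    int flags = SIGCHLD | (share_files ? CLONE_FILES : 0);
    pid_t pid = clone(close_fd, stack + STACK_SIZE, flags, &fd);
    if (pid == -1) {
        close(fd);
        free(stack);
        return -1;
    }
    waitpid(pid, NULL, 0);
    free(stack);

    /* F_GETFD fails (EBADF) once fd is gone from the caller's table. */
    if (fcntl(fd, F_GETFD) == -1)
        return 1;
    close(fd);
    return 0;
}
```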
89
90 CLONE_FS
91 If CLONE_FS is set, the caller and the child process share the
92 same file system information. This includes the root of the
93 file system, the current working directory, and the umask. Any
94 call to chroot(2), chdir(2), or umask(2) performed by the call‐
95 ing process or the child process also affects the other process.
96
97 If CLONE_FS is not set, the child process works on a copy of the
98 file system information of the calling process at the time of
the clone() call. Calls to chroot(2), chdir(2), or umask(2) performed
later by one of the processes do not affect the other process.
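
As an illustration of CLONE_FS, the sketch below lets the child change the umask and then reads the caller's umask. The names caller_umask_after_child and set_umask are hypothetical, and the values 022 and 077 are arbitrary.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (256 * 1024)   /* arbitrary size chosen for this sketch */

/* Child: change the umask of whatever file system information it has. */
static int set_umask(void *arg)
{
    (void)arg;
    umask(077);
    return 0;
}

/* Returns the caller's umask after the child has run, or -1 on error. */
int caller_umask_after_child(int share_fs)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    mode_t saved = umask(022);    /* known starting value; keep the old one */

    int flags = SIGCHLD | (share_fs ? CLONE_FS : 0);
    pid_t pid = clone(set_umask, stack + STACK_SIZE, flags, NULL);
    if (pid == -1) {
        umask(saved);
        free(stack);
        return -1;
    }
    waitpid(pid, NULL, 0);
    free(stack);

    /* umask() returns the previous mask, which is how the value is read. */
    mode_t now = umask(saved);
    return (int)now;
}
```

With CLONE_FS the caller is expected to observe 077; without it, the caller's umask stays at 022.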
102
103 CLONE_IO (since Linux 2.6.25)
104 If CLONE_IO is set, then the new process shares an I/O context
105 with the calling process. If this flag is not set, then (as
106 with fork(2)) the new process has its own I/O context.
107
The I/O context is the I/O scope of the disk scheduler (i.e.,
109 what the I/O scheduler uses to model scheduling of a process's
110 I/O). If processes share the same I/O context, they are treated
111 as one by the I/O scheduler. As a consequence, they get to
112 share disk time. For some I/O schedulers, if two processes
113 share an I/O context, they will be allowed to interleave their
114 disk access. If several threads are doing I/O on behalf of the
115 same process (aio_read(3), for instance), they should employ
116 CLONE_IO to get better I/O performance.
117
118 If the kernel is not configured with the CONFIG_BLOCK option,
119 this flag is a no-op.
120
121 CLONE_NEWIPC (since Linux 2.6.19)
122 If CLONE_NEWIPC is set, then create the process in a new IPC
123 namespace. If this flag is not set, then (as with fork(2)), the
124 process is created in the same IPC namespace as the calling
125 process. This flag is intended for the implementation of con‐
126 tainers.
127
An IPC namespace consists of the set of identifiers for System V
IPC objects. (These objects are created using msgget(2),
semget(2), and shmget(2).) Objects created in an IPC namespace are
visible to all other processes that are members of that namespace,
but are not visible to processes in other IPC namespaces.
133
When an IPC namespace is destroyed (i.e., when the last process
135 that is a member of the namespace terminates), all IPC objects
136 in the namespace are automatically destroyed.
137
138 Use of this flag requires: a kernel configured with the CON‐
139 FIG_SYSVIPC and CONFIG_IPC_NS options and that the process be
140 privileged (CAP_SYS_ADMIN). This flag can't be specified in
141 conjunction with CLONE_SYSVSEM.
142
143 CLONE_NEWNET (since Linux 2.6.24)
144 (The implementation of this flag is not yet complete, but proba‐
145 bly will be mostly complete by about Linux 2.6.28.)
146
147 If CLONE_NEWNET is set, then create the process in a new network
148 namespace. If this flag is not set, then (as with fork(2)), the
149 process is created in the same network namespace as the calling
150 process. This flag is intended for the implementation of con‐
151 tainers.
152
153 A network namespace provides an isolated view of the networking
154 stack (network device interfaces, IPv4 and IPv6 protocol stacks,
155 IP routing tables, firewall rules, the /proc/net and
156 /sys/class/net directory trees, sockets, etc.). A physical net‐
157 work device can live in exactly one network namespace. A vir‐
158 tual network device ("veth") pair provides a pipe-like abstrac‐
159 tion that can be used to create tunnels between network names‐
160 paces, and can be used to create a bridge to a physical network
161 device in another namespace.
162
163 When a network namespace is freed (i.e., when the last process
164 in the namespace terminates), its physical network devices are
165 moved back to the initial network namespace (not to the parent
166 of the process).
167
168 Use of this flag requires: a kernel configured with the CON‐
169 FIG_NET_NS option and that the process be privileged
170 (CAP_SYS_ADMIN).
171
172 CLONE_NEWNS (since Linux 2.4.19)
173 Start the child in a new mount namespace.
174
175 Every process lives in a mount namespace. The namespace of a
176 process is the data (the set of mounts) describing the file
177 hierarchy as seen by that process. After a fork(2) or clone()
178 where the CLONE_NEWNS flag is not set, the child lives in the
179 same mount namespace as the parent. The system calls mount(2)
180 and umount(2) change the mount namespace of the calling process,
181 and hence affect all processes that live in the same namespace,
182 but do not affect processes in a different mount namespace.
183
184 After a clone() where the CLONE_NEWNS flag is set, the cloned
185 child is started in a new mount namespace, initialized with a
186 copy of the namespace of the parent.
187
188 Only a privileged process (one having the CAP_SYS_ADMIN capabil‐
189 ity) may specify the CLONE_NEWNS flag. It is not permitted to
190 specify both CLONE_NEWNS and CLONE_FS in the same clone() call.
191
192 CLONE_NEWPID (since Linux 2.6.24)
193 If CLONE_NEWPID is set, then create the process in a new PID
194 namespace. If this flag is not set, then (as with fork(2)), the
195 process is created in the same PID namespace as the calling
196 process. This flag is intended for the implementation of con‐
197 tainers.
198
199 A PID namespace provides an isolated environment for PIDs: PIDs
200 in a new namespace start at 1, somewhat like a standalone sys‐
201 tem, and calls to fork(2), vfork(2), or clone(2) will produce
202 processes with PIDs that are unique within the namespace.
203
204 The first process created in a new namespace (i.e., the process
205 created using the CLONE_NEWPID flag) has the PID 1, and is the
206 "init" process for the namespace. Children that are orphaned
207 within the namespace will be reparented to this process rather
208 than init(8). Unlike the traditional init process, the "init"
209 process of a PID namespace can terminate, and if it does, all of
210 the processes in the namespace are terminated.
211
212 PID namespaces form a hierarchy. When a new PID namespace is
213 created, the processes in that namespace are visible in the PID
214 namespace of the process that created the new namespace; analo‐
215 gously, if the parent PID namespace is itself the child of
216 another PID namespace, then processes in the child and parent
217 PID namespaces will both be visible in the grandparent PID
218 namespace. Conversely, the processes in the "child" PID names‐
219 pace do not see the processes in the parent namespace. The
220 existence of a namespace hierarchy means that each process may
221 now have multiple PIDs: one for each namespace in which it is
222 visible; each of these PIDs is unique within the corresponding
223 namespace. (A call to getpid(2) always returns the PID associ‐
224 ated with the namespace in which the process lives.)
225
226 After creating the new namespace, it is useful for the child to
227 change its root directory and mount a new procfs instance at
228 /proc so that tools such as ps(1) work correctly. (If
229 CLONE_NEWNS is also included in flags, then it isn't necessary
230 to change the root directory: a new procfs instance can be
231 mounted directly over /proc.)
232
233 Use of this flag requires: a kernel configured with the CON‐
234 FIG_PID_NS option and that the process be privileged
235 (CAP_SYS_ADMIN). This flag can't be specified in conjunction
236 with CLONE_THREAD.
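
The statement that the first process in a new PID namespace has PID 1 can be checked with a sketch along the following lines. The names child_is_pid_one and report_pid are hypothetical. The call needs CAP_SYS_ADMIN, so the sketch treats a failed clone() as "could not be demonstrated" rather than as an error. The child uses the raw getpid system call to sidestep any PID caching in the C library.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (256 * 1024)   /* arbitrary size chosen for this sketch */

/* Child: exit code 0 if it sees itself as PID 1, 1 otherwise. */
static int report_pid(void *arg)
{
    (void)arg;
    return syscall(SYS_getpid) == 1 ? 0 : 1;
}

/*
 * Returns 1 if the child saw itself as PID 1, 0 if clone() failed
 * (for example with EPERM in an unprivileged process), -1 otherwise.
 */
int child_is_pid_one(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    pid_t pid = clone(report_pid, stack + STACK_SIZE,
                      CLONE_NEWPID | SIGCHLD, NULL);
    if (pid == -1) {
        free(stack);
        return 0;
    }

    int status;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : -1;
}
```

The value returned by clone() in the caller is the child's PID in the caller's namespace, which in general is not 1.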
237
238 CLONE_NEWUTS (since Linux 2.6.19)
239 If CLONE_NEWUTS is set, then create the process in a new UTS
240 namespace, whose identifiers are initialized by duplicating the
241 identifiers from the UTS namespace of the calling process. If
242 this flag is not set, then (as with fork(2)), the process is
243 created in the same UTS namespace as the calling process. This
244 flag is intended for the implementation of containers.
245
246 A UTS namespace is the set of identifiers returned by uname(2);
247 among these, the domain name and the host name can be modified
248 by setdomainname(2) and sethostname(2), respectively. Changes
249 made to the identifiers in a UTS namespace are visible to all
250 other processes in the same namespace, but are not visible to
251 processes in other UTS namespaces.
252
253 Use of this flag requires: a kernel configured with the CON‐
254 FIG_UTS_NS option and that the process be privileged
255 (CAP_SYS_ADMIN).
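
A sketch of UTS namespace isolation follows. The child changes the host name in its new namespace, and the caller checks that its own host name is unchanged. The names hostname_change_is_isolated and rename_host, and the host name "clone-demo", are hypothetical. Since CAP_SYS_ADMIN is required, a failed clone() is reported as "could not be demonstrated".

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (256 * 1024)   /* arbitrary size chosen for this sketch */

/* Child: set the host name of the UTS namespace it lives in. */
static int rename_host(void *arg)
{
    const char *name = arg;
    return sethostname(name, strlen(name)) == 0 ? 0 : 1;
}

/*
 * Returns 1 if the child changed its host name and the caller's host
 * name stayed the same, 0 if this could not be demonstrated (for
 * example, clone() failed with EPERM), -1 if the change leaked.
 */
int hostname_change_is_isolated(void)
{
    char before[256], after[256];

    if (gethostname(before, sizeof(before)) != 0)
        return 0;
    before[sizeof(before) - 1] = '\0';

    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return 0;

    pid_t pid = clone(rename_host, stack + STACK_SIZE,
                      CLONE_NEWUTS | SIGCHLD, (void *)"clone-demo");
    if (pid == -1) {
        free(stack);
        return 0;
    }

    int status;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited != pid || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return 0;

    if (gethostname(after, sizeof(after)) != 0)
        return 0;
    after[sizeof(after) - 1] = '\0';
    return strcmp(before, after) == 0 ? 1 : -1;
}
```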
256
257 CLONE_PARENT (since Linux 2.3.12)
258 If CLONE_PARENT is set, then the parent of the new child (as
259 returned by getppid(2)) will be the same as that of the calling
260 process.
261
262 If CLONE_PARENT is not set, then (as with fork(2)) the child's
263 parent is the calling process.
264
265 Note that it is the parent process, as returned by getppid(2),
266 which is signaled when the child terminates, so that if
267 CLONE_PARENT is set, then the parent of the calling process,
268 rather than the calling process itself, will be signaled.
269
270 CLONE_PARENT_SETTID (since Linux 2.5.49)
271 Store child thread ID at location ptid in parent and child mem‐
272 ory. (In Linux 2.5.32-2.5.48 there was a flag CLONE_SETTID that
273 did this.)
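
For illustration, the sketch below passes ptid together with CLONE_PARENT_SETTID and compares the stored value with the return value of clone(). The names parent_settid_matches and child_fn are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

#define STACK_SIZE (256 * 1024)   /* arbitrary size chosen for this sketch */

static int child_fn(void *arg)
{
    (void)arg;
    return 0;
}

/* Returns 1 if the stored thread ID equals clone()'s result, -1 on error. */
int parent_settid_matches(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    pid_t ptid = 0;
    /* The optional arguments are ptid, tls, and ctid, in that order. */
    pid_t pid = clone(child_fn, stack + STACK_SIZE,
                      CLONE_PARENT_SETTID | SIGCHLD, NULL,
                      &ptid, NULL, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int match = (ptid == pid);
    waitpid(pid, NULL, 0);
    free(stack);
    return match;
}
```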
274
275 CLONE_PID (obsolete)
276 If CLONE_PID is set, the child process is created with the same
277 process ID as the calling process. This is good for hacking the
278 system, but otherwise of not much use. Since 2.3.21 this flag
279 can be specified only by the system boot process (PID 0). It
280 disappeared in Linux 2.5.16.
281
282 CLONE_PTRACE
283 If CLONE_PTRACE is specified, and the calling process is being
284 traced, then trace the child also (see ptrace(2)).
285
286 CLONE_SETTLS (since Linux 2.5.32)
The tls argument is the new TLS (Thread Local Storage)
descriptor. (See set_thread_area(2).)
289
290 CLONE_SIGHAND
291 If CLONE_SIGHAND is set, the calling process and the child
292 process share the same table of signal handlers. If the calling
293 process or child process calls sigaction(2) to change the behav‐
294 ior associated with a signal, the behavior is changed in the
295 other process as well. However, the calling process and child
296 processes still have distinct signal masks and sets of pending
297 signals. So, one of them may block or unblock some signals
298 using sigprocmask(2) without affecting the other process.
299
300 If CLONE_SIGHAND is not set, the child process inherits a copy
301 of the signal handlers of the calling process at the time
302 clone() is called. Calls to sigaction(2) performed later by one
303 of the processes have no effect on the other process.
304
Since Linux 2.6.0-test6, flags must also include CLONE_VM if
CLONE_SIGHAND is specified.
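
The sharing of signal dispositions can be sketched as follows. The child changes the disposition of SIGUSR1 with sigaction(2), and the caller then inspects its own disposition. The names child_sigaction_affects_caller and ignore_usr1 are hypothetical; SIGUSR1 is an arbitrary choice, and CLONE_VM is included because it is required together with CLONE_SIGHAND.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>

#define STACK_SIZE (256 * 1024)   /* arbitrary size chosen for this sketch */

/* Child: ask for SIGUSR1 to be ignored. */
static int ignore_usr1(void *arg)
{
    struct sigaction sa;

    (void)arg;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL) == 0 ? 0 : 1;
}

/*
 * Returns 1 if the caller's SIGUSR1 disposition became SIG_IGN after
 * the child changed it, 0 if it was unaffected, -1 on error.
 */
int child_sigaction_affects_caller(int share_handlers)
{
    struct sigaction dfl, saved, now;

    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* Start from a known disposition and remember the previous one. */
    memset(&dfl, 0, sizeof(dfl));
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);
    sigaction(SIGUSR1, &dfl, &saved);

    int flags = SIGCHLD | (share_handlers ? CLONE_VM | CLONE_SIGHAND : 0);
    pid_t pid = clone(ignore_usr1, stack + STACK_SIZE, flags, NULL);
    if (pid == -1) {
        sigaction(SIGUSR1, &saved, NULL);
        free(stack);
        return -1;
    }
    waitpid(pid, NULL, 0);
    free(stack);

    sigaction(SIGUSR1, NULL, &now);       /* query without changing */
    sigaction(SIGUSR1, &saved, NULL);     /* restore the original state */
    return now.sa_handler == SIG_IGN;
}
```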
307
308 CLONE_STOPPED (since Linux 2.6.0-test2)
309 If CLONE_STOPPED is set, then the child is initially stopped (as
310 though it was sent a SIGSTOP signal), and must be resumed by
311 sending it a SIGCONT signal.
312
313 From Linux 2.6.25 this flag is deprecated. You probably never
314 wanted to use it, you certainly shouldn't be using it, and soon
315 it will go away.
316
317 CLONE_SYSVSEM (since Linux 2.5.10)
318 If CLONE_SYSVSEM is set, then the child and the calling process
319 share a single list of System V semaphore undo values (see
320 semop(2)). If this flag is not set, then the child has a sepa‐
321 rate undo list, which is initially empty.
322
323 CLONE_THREAD (since Linux 2.4.0-test8)
324 If CLONE_THREAD is set, the child is placed in the same thread
325 group as the calling process. To make the remainder of the dis‐
326 cussion of CLONE_THREAD more readable, the term "thread" is used
327 to refer to the processes within a thread group.
328
329 Thread groups were a feature added in Linux 2.4 to support the
330 POSIX threads notion of a set of threads that share a single
331 PID. Internally, this shared PID is the so-called thread group
332 identifier (TGID) for the thread group. Since Linux 2.4, calls
333 to getpid(2) return the TGID of the caller.
334
335 The threads within a group can be distinguished by their (sys‐
336 tem-wide) unique thread IDs (TID). A new thread's TID is avail‐
337 able as the function result returned to the caller of clone(),
338 and a thread can obtain its own TID using gettid(2).
339
340 When a call is made to clone() without specifying CLONE_THREAD,
341 then the resulting thread is placed in a new thread group whose
342 TGID is the same as the thread's TID. This thread is the leader
343 of the new thread group.
344
345 A new thread created with CLONE_THREAD has the same parent
346 process as the caller of clone() (i.e., like CLONE_PARENT), so
347 that calls to getppid(2) return the same value for all of the
348 threads in a thread group. When a CLONE_THREAD thread termi‐
349 nates, the thread that created it using clone() is not sent a
350 SIGCHLD (or other termination) signal; nor can the status of
351 such a thread be obtained using wait(2). (The thread is said to
352 be detached.)
353
After all of the threads in a thread group terminate, the parent
process of the thread group is sent a SIGCHLD (or other
termination) signal.
357
358 If any of the threads in a thread group performs an execve(2),
359 then all threads other than the thread group leader are termi‐
360 nated, and the new program is executed in the thread group
361 leader.
362
363 If one of the threads in a thread group creates a child using
364 fork(2), then any thread in the group can wait(2) for that
365 child.
366
367 Since Linux 2.5.35, flags must also include CLONE_SIGHAND if
368 CLONE_THREAD is specified.
369
370 Signals may be sent to a thread group as a whole (i.e., a TGID)
371 using kill(2), or to a specific thread (i.e., TID) using
372 tgkill(2).
373
374 Signal dispositions and actions are process-wide: if an unhan‐
375 dled signal is delivered to a thread, then it will affect (ter‐
376 minate, stop, continue, be ignored in) all members of the thread
377 group.
378
379 Each thread has its own signal mask, as set by sigprocmask(2),
380 but signals can be pending either: for the whole process (i.e.,
381 deliverable to any member of the thread group), when sent with
382 kill(2); or for an individual thread, when sent with tgkill(2).
383 A call to sigpending(2) returns a signal set that is the union
384 of the signals pending for the whole process and the signals
385 that are pending for the calling thread.
386
387 If kill(2) is used to send a signal to a thread group, and the
388 thread group has installed a handler for the signal, then the
389 handler will be invoked in exactly one, arbitrarily selected
390 member of the thread group that has not blocked the signal. If
391 multiple threads in a group are waiting to accept the same sig‐
392 nal using sigwaitinfo(2), the kernel will arbitrarily select one
393 of these threads to receive a signal sent using kill(2).
394
395 CLONE_UNTRACED (since Linux 2.5.46)
396 If CLONE_UNTRACED is specified, then a tracing process cannot
397 force CLONE_PTRACE on this child process.
398
399 CLONE_VFORK
400 If CLONE_VFORK is set, the execution of the calling process is
401 suspended until the child releases its virtual memory resources
402 via a call to execve(2) or _exit(2) (as with vfork(2)).
403
If CLONE_VFORK is not set, then both the calling process and the
child are schedulable after the call, and an application should
not rely on execution occurring in any particular order.
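
The suspension of the caller under CLONE_VFORK can be observed with a sketch such as the one below. The child sleeps briefly before setting a flag in shared memory; the caller reads the flag as soon as clone() returns. The names vfork_blocks_caller, slow_child, and child_done are hypothetical, and CLONE_VM is assumed so that the flag is shared.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (256 * 1024)   /* arbitrary size chosen for this sketch */

static volatile int child_done;   /* shared with the child through CLONE_VM */

/* Child: delay a little, then record that it ran to completion. */
static int slow_child(void *arg)
{
    (void)arg;
    usleep(50 * 1000);
    child_done = 1;
    return 0;
}

/* Returns 1 if the child had finished by the time clone() returned. */
int vfork_blocks_caller(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    child_done = 0;
    pid_t pid = clone(slow_child, stack + STACK_SIZE,
                      CLONE_VM | CLONE_VFORK | SIGCHLD, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    /* The caller resumes only after the child has released the memory. */
    int seen = child_done;

    waitpid(pid, NULL, 0);
    free(stack);
    return seen;
}
```

Without CLONE_VFORK the value read immediately after clone() would depend on scheduling, which is why only the CLONE_VFORK case is shown.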
407
408 CLONE_VM
409 If CLONE_VM is set, the calling process and the child process
410 run in the same memory space. In particular, memory writes per‐
411 formed by the calling process or by the child process are also
412 visible in the other process. Moreover, any memory mapping or
413 unmapping performed with mmap(2) or munmap(2) by the child or
414 calling process also affects the other process.
415
416 If CLONE_VM is not set, the child process runs in a separate
417 copy of the memory space of the calling process at the time of
418 clone(). Memory writes or file mappings/unmappings performed by
419 one of the processes do not affect the other, as with fork(2).
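
The following sketch contrasts the two cases: the child writes to a global variable, and the caller reads it after the child has terminated. The names counter_after_child, bump, and shared_counter are hypothetical, and 42 is an arbitrary value.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

#define STACK_SIZE (256 * 1024)   /* arbitrary size chosen for this sketch */

static volatile int shared_counter;

/* Child: write to a global variable. */
static int bump(void *arg)
{
    (void)arg;
    shared_counter = 42;
    return 0;
}

/* Returns the caller's view of shared_counter after the child has run. */
int counter_after_child(int share_vm)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    shared_counter = 0;
    int flags = SIGCHLD | (share_vm ? CLONE_VM : 0);
    pid_t pid = clone(bump, stack + STACK_SIZE, flags, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }
    waitpid(pid, NULL, 0);
    free(stack);

    /* 42 if the memory space was shared, 0 if the child wrote to a copy. */
    return shared_counter;
}
```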
420
421 sys_clone
422 The sys_clone system call corresponds more closely to fork(2) in that
423 execution in the child continues from the point of the call. Thus,
424 sys_clone only requires the flags and child_stack arguments, which have
425 the same meaning as for clone(). (Note that the order of these argu‐
426 ments differs from clone().)
427
428 Another difference for sys_clone is that the child_stack argument may
429 be zero, in which case copy-on-write semantics ensure that the child
430 gets separate copies of stack pages when either process modifies the
431 stack. In this case, for correct operation, the CLONE_VM option should
432 not be specified.
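
A fork(2)-like use of the raw system call might look like the following sketch. The name raw_clone_exit_code is hypothetical, and the call goes through syscall(2) rather than the clone() wrapper. Only flags and child_stack are meaningful here; the remaining arguments are passed as zero. The argument layout shown is the one used on x86-64 and is an assumption for other architectures.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the exit code of a child created by the raw call, or -1. */
int raw_clone_exit_code(int code)
{
    /*
     * flags comes first, then child_stack. A zero child_stack keeps the
     * child on a copy-on-write copy of the caller's stack, so CLONE_VM
     * is deliberately not specified.
     */
    long pid = syscall(SYS_clone, (unsigned long)SIGCHLD, 0UL, 0UL, 0UL, 0UL);
    if (pid == -1)
        return -1;

    if (pid == 0)
        _exit(code);      /* child: execution continues after the call */

    int status;
    if (waitpid((pid_t)pid, &status, 0) != (pid_t)pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```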
433
434 In Linux 2.4 and earlier, clone() does not take arguments ptid, tls,
435 and ctid.

RETURN VALUE
438 On success, the thread ID of the child process is returned in the call‐
439 er's thread of execution. On failure, -1 is returned in the caller's
440 context, no child process will be created, and errno will be set appro‐
441 priately.

ERRORS
444 EAGAIN Too many processes are already running.
445
446 EINVAL CLONE_SIGHAND was specified, but CLONE_VM was not. (Since Linux
447 2.6.0-test6.)
448
449 EINVAL CLONE_THREAD was specified, but CLONE_SIGHAND was not. (Since
450 Linux 2.5.35.)
451
452 EINVAL Both CLONE_FS and CLONE_NEWNS were specified in flags.
453
454 EINVAL Both CLONE_NEWIPC and CLONE_SYSVSEM were specified in flags.
455
456 EINVAL Both CLONE_NEWPID and CLONE_THREAD were specified in flags.
457
458 EINVAL Returned by clone() when a zero value is specified for
459 child_stack.
460
461 EINVAL CLONE_NEWIPC was specified in flags, but the kernel was not con‐
462 figured with the CONFIG_SYSVIPC and CONFIG_IPC_NS options.
463
464 EINVAL CLONE_NEWNET was specified in flags, but the kernel was not con‐
465 figured with the CONFIG_NET_NS option.
466
467 EINVAL CLONE_NEWPID was specified in flags, but the kernel was not con‐
468 figured with the CONFIG_PID_NS option.
469
EINVAL CLONE_NEWUTS was specified in flags, but the kernel was not
configured with the CONFIG_UTS_NS option.
472
473 ENOMEM Cannot allocate sufficient memory to allocate a task structure
474 for the child, or to copy those parts of the caller's context
475 that need to be copied.
476
477 EPERM CLONE_NEWIPC, CLONE_NEWNET, CLONE_NEWNS, CLONE_NEWPID, or
478 CLONE_NEWUTS was specified by an unprivileged process (process
479 without CAP_SYS_ADMIN).
480
481 EPERM CLONE_PID was specified by a process other than process 0.
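
Several of the EINVAL cases listed above can be provoked deliberately, as in the following sketch. The names clone_error and noop are hypothetical. The helper returns the errno value of a failed clone() call so that a caller can compare it against the list.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

#define STACK_SIZE (256 * 1024)   /* arbitrary size chosen for this sketch */

static int noop(void *arg)
{
    (void)arg;
    return 0;
}

/*
 * Calls clone() with the given flags, with or without a child stack.
 * Returns the errno of a failed call, 0 if the call succeeded, or -1
 * if the stack could not be allocated.
 */
int clone_error(int flags, int give_stack)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    errno = 0;
    pid_t pid = clone(noop, give_stack ? stack + STACK_SIZE : NULL,
                      flags, NULL);
    int err = (pid == -1) ? errno : 0;

    /* If the call unexpectedly succeeded, reap the child before leaving. */
    if (pid != -1)
        waitpid(pid, NULL, __WALL);

    free(stack);
    return err;
}
```

For example, clone_error(CLONE_SIGHAND | SIGCHLD, 1) omits CLONE_VM, and clone_error(SIGCHLD, 0) passes a zero child_stack; on the kernel and library versions where those checks apply, both are expected to yield EINVAL.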

VERSIONS
484 There is no entry for clone() in libc5. glibc2 provides clone() as
485 described in this manual page.

CONFORMING TO
488 The clone() and sys_clone calls are Linux-specific and should not be
489 used in programs intended to be portable.

NOTES
492 In the kernel 2.4.x series, CLONE_THREAD generally does not make the
493 parent of the new thread the same as the parent of the calling process.
494 However, for kernel versions 2.4.7 to 2.4.18 the CLONE_THREAD flag
495 implied the CLONE_PARENT flag (as in kernel 2.6).
496
497 For a while there was CLONE_DETACHED (introduced in 2.5.32): parent
498 wants no child-exit signal. In 2.6.2 the need to give this together
499 with CLONE_THREAD disappeared. This flag is still defined, but has no
500 effect.
501
502 On i386, clone() should not be called through vsyscall, but directly
503 through int $0x80.
504
505 On ia64, a different system call is used:
506
int __clone2(int (*fn)(void *),
             void *child_stack_base, size_t stack_size,
             int flags, void *arg, ...
             /* pid_t *ptid, struct user_desc *tls, pid_t *ctid */ );
511
512 The __clone2() system call operates in the same way as clone(), except
513 that child_stack_base points to the lowest address of the child's stack
514 area, and stack_size specifies the size of the stack pointed to by
515 child_stack_base.

BUGS
518 Versions of the GNU C library that include the NPTL threading library
519 contain a wrapper function for getpid(2) that performs caching of PIDs.
520 This caching relies on support in the glibc wrapper for clone(), but as
521 currently implemented, the cache may not be up to date in some circum‐
522 stances. In particular, if a signal is delivered to the child immedi‐
523 ately after the clone() call, then a call to getpid() in a handler for
524 the signal may return the PID of the calling process ("the parent"), if
525 the clone wrapper has not yet had a chance to update the PID cache in
526 the child. (This discussion ignores the case where the child was cre‐
527 ated using CLONE_THREAD, when getpid() should return the same value in
528 the child and in the process that called clone(), since the caller and
529 the child are in the same thread group. The stale-cache problem also
530 does not occur if the flags argument includes CLONE_VM.) To get the
531 truth, it may be necessary to use code such as the following:
532
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

pid_t mypid;

mypid = syscall(SYS_getpid);

SEE ALSO
540 fork(2), futex(2), getpid(2), gettid(2), set_thread_area(2),
541 set_tid_address(2), tkill(2), unshare(2), wait(2), capabilities(7),
542 pthreads(7)

COLOPHON
545 This page is part of release 3.25 of the Linux man-pages project. A
546 description of the project, and information about reporting bugs, can
547 be found at http://www.kernel.org/doc/man-pages/.

Linux                             2009-07-18                          CLONE(2)