CH-RUN(1)                        Charliecloud                        CH-RUN(1)

ch-run - Run a command in a Charliecloud container

$ ch-run [OPTION...] IMAGE -- CMD [ARG...]

Run command CMD in a fully unprivileged Charliecloud container using
the image located at IMAGE, which can be either a directory or, if the
proper support is enabled, a SquashFS archive.

-b, --bind=SRC[:DST]
    Bind-mount SRC at guest DST. If DST is not specified, it defaults
    to the same path as on the host; i.e., the default is
    --bind=SRC:SRC. Can be repeated.

    If --write is given and DST does not exist, it will be created as
    an empty directory. However, DST must be entirely within the image
    itself; DST cannot enter a previous bind mount. For example,
    --bind /foo:/tmp/foo will fail because /tmp is shared with the
    host via bind-mount (unless $TMPDIR is set to something else or
    --private-tmp is given).

    Most images have ten directories /mnt/[0-9] already available as
    mount points.

    Symlinks in DST are followed, and absolute links can have
    surprising behavior. Bind-mounting happens after namespace setup
    but before pivoting into the container image, so absolute links
    use the host root. For example, suppose the image has a symlink
    /foo -> /mnt. Then, --bind=/bar:/foo will bind-mount on the host’s
    /mnt, which is inaccessible on the host because namespaces are
    already set up and also inaccessible in the container because of
    the subsequent pivot into the image. Currently, this problem is
    only detected when DST needs to be created: ch-run will refuse to
    follow absolute symlinks in this case, to avoid directory creation
    surprises.
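As a sketch of how the /mnt/[0-9] mount points and a repeated --bind
might be combined, the following shell function prints one --bind
option per host directory, using guest destinations /mnt/0, /mnt/1,
and so on. The function name mnt_bind_args is hypothetical (it is not
part of Charliecloud), and the sketch assumes the image really does
provide /mnt/0 through /mnt/9.

```sh
# Hypothetical helper, not part of Charliecloud: print one --bind option per
# host directory given as an argument, mapping them to /mnt/0, /mnt/1, ...
mnt_bind_args() (
    n=0
    for src in "$@"; do
        if [ "$n" -gt 9 ]; then
            # Assumption: only /mnt/0 through /mnt/9 exist in the image.
            echo "mnt_bind_args: more than ten directories given" >&2
            exit 1
        fi
        printf '%s\n' "--bind=$src:/mnt/$n"
        n=$((n + 1))
    done
)
```

The output could then be placed on a ch-run command line, e.g.
ch-run $(mnt_bind_args /data /scratch) IMAGE -- CMD, provided the host
paths contain no whitespace.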

-c, --cd=DIR
    Initial working directory in container.

--ch-ssh
    Bind ch-ssh(1) into container at /usr/bin/ch-ssh.

--env-no-expand
    Don’t expand variables when using --set-env.

-g, --gid=GID
    Run as group GID within container.

-j, --join
    Use the same container (namespaces) as peer ch-run invocations.

--join-pid=PID
    Join the namespaces of an existing process.

--join-ct=N
    Number of ch-run peers (implies --join; default: see below).

--join-tag=TAG
    Label for ch-run peer group (implies --join; default: see below).

-m, --mount=DIR
    Use DIR for the SquashFS mount point, which must already exist.
    If not specified, the default is /var/tmp/$USER.ch/mnt, which
    will be created if needed.

--no-home
    By default, your host home directory (i.e., $HOME) is bind-mounted
    at guest /home/$USER. This is accomplished by mounting a new tmpfs
    at /home, which hides any image content under that path. If this
    is specified, neither of these things happens and the image’s
    /home is exposed unaltered.

--no-passwd
    By default, temporary /etc/passwd and /etc/group files are created
    according to the UID and GID maps for the container and
    bind-mounted into it. If this is specified, no such temporary
    files are created and the image’s files are exposed.

-t, --private-tmp
    By default, the host’s /tmp (or $TMPDIR if set) is bind-mounted at
    container /tmp. If this is specified, a new tmpfs is mounted on
    the container’s /tmp instead.

--set-env, --set-env=FILE, --set-env=VAR=VALUE
    Set environment variable(s). With:

    • no argument: as listed in file /ch/environment within the
      image. It is an error if the file does not exist or cannot be
      read. (Note that with SquashFS images, it is not currently
      possible to use other files within the image.)

    • FILE (i.e., no equals sign in argument): as specified in file at
      host path FILE. Again, it is an error if the file cannot be
      read.

    • VAR=VALUE (i.e., equals sign in argument): set variable VAR to
      VALUE.

    See below for details on how environment variables work in ch-run.

-u, --uid=UID
    Run as user UID within container.

--unset-env=GLOB
    Unset environment variables whose names match GLOB.

-v, --verbose
    Be more verbose (can be repeated).

-w, --write
    Mount image read-write (by default, the image is mounted
    read-only).

-?, --help
    Print help and exit.

--usage
    Print a short usage message and exit.

-V, --version
    Print version and exit.

Note: Because ch-run is fully unprivileged, it is not possible to
change UIDs and GIDs within the container (the relevant system calls
fail). In particular, setuid, setgid, and setcap executables do not
work. As a precaution, ch-run calls prctl(PR_SET_NO_NEW_PRIVS, 1) to
disable these executables within the container. This does not reduce
functionality but is a “belt and suspenders” precaution to reduce the
attack surface should bugs in these system calls or elsewhere arise.

ch-run supports two different image formats.

The first is a simple directory that contains a Linux filesystem tree.
Such a directory can be produced by:

• Using ch-convert directly from ch-image or another builder to a
  directory.

• Charliecloud’s tarball workflow: build or pull the image, ch-convert
  it to a tarball, transfer the tarball to the target system, then
  ch-convert the tarball to a directory.

• Manually mounting a SquashFS image, e.g. with squashfuse(1), and
  then unmounting it after the run with fusermount -u.

• Any other workflow that produces an appropriate directory tree.

The second is a SquashFS image archive mounted internally by ch-run,
available if it’s linked with the optional libsquashfuse_ll. ch-run
mounts the image filesystem, services all FUSE requests, and unmounts
it, all within ch-run. See --mount above to set the mount point
location.

Prior versions of Charliecloud provided wrappers for the squashfuse and
squashfuse_ll SquashFS mount commands and fusermount -u unmount
command. We removed these because we concluded they had minimal
value-add over the standard, unwrapped commands.

WARNING:
    Currently, Charliecloud unmounts the SquashFS filesystem when user
    command CMD’s process exits. It does not monitor any of its child
    processes. Therefore, if the user command spawns child processes
    and then exits before them (e.g., some daemons), those children
    will have the image unmounted from underneath them. In this case,
    the workaround is to mount/unmount using external tools. We expect
    to remove this limitation in a future version.
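The external mount/unmount workaround might be scripted roughly as
follows. This is only a sketch: the function names maybe_run, sq_run,
and sq_unmount and the DRY_RUN variable are hypothetical, and the
mount point is assumed to be an existing, empty directory. Unmounting
is kept as a separate step so that it can be delayed until no child
process is still using the image.

```sh
# Hypothetical helper: run a command, or only print it if DRY_RUN is set.
maybe_run() {
    if [ -n "${DRY_RUN-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Mount SquashFS image $1 on directory $2 with squashfuse(1), then run the
# remaining arguments as the user command via ch-run on the mounted tree.
sq_run() {
    sq_image=$1
    sq_mnt=$2
    shift 2
    maybe_run squashfuse "$sq_image" "$sq_mnt" || return
    maybe_run ch-run "$sq_mnt" -- "$@"
}

# Unmount directory $1. Call this only once nothing uses the image anymore.
sq_unmount() {
    maybe_run fusermount -u "$1"
}
```

With DRY_RUN=1 set, sq_run /var/tmp/foo.sqfs /var/tmp/mnt CMD followed
by sq_unmount /var/tmp/mnt only prints the three commands, which is a
convenient way to check them before running anything.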

In addition to any directories specified by the user with --bind,
ch-run has standard host files and directories that are bind-mounted in
as well.

The following host files and directories are bind-mounted at the same
location in the container. These give access to the host’s devices and
various kernel facilities. (Recall that Charliecloud provides minimal
isolation and containerized processes are mostly normal unprivileged
processes.) They cannot be disabled and are required; i.e., they must
exist both on the host and within the image.

• /dev

• /proc

• /sys

The following are optional: each is bind-mounted only if the path
exists both on the host and within the image, with no error or warning
if it does not.

• /etc/hosts and /etc/resolv.conf. Because Charliecloud containers
  share the host network namespace, they need the same hostname
  resolution configuration.

• /etc/machine-id. Provides a unique ID for the OS installation;
  matching the host works for most situations. Needed to support
  D-Bus, some software licensing situations, and likely other use
  cases. See also issue #1050.

• /var/lib/hugetlbfs at guest /var/opt/cray/hugetlbfs, and
  /var/opt/cray/alps/spool. These support Cray MPI.

• $PREFIX/bin/ch-ssh at guest /usr/bin/ch-ssh. SSH wrapper that
  automatically containerizes after connecting.

Additional bind mounts are done by default but can be disabled; see
the options above.

• $HOME at /home/$USER (and image /home is hidden). Makes user data
  and init files available.

• /tmp (or $TMPDIR if set) at guest /tmp. Provides a temporary
  directory that persists between container runs and is shared with
  non-containerized application components.

• temporary files at /etc/passwd and /etc/group. Usernames and group
  names need to be customized for each container run.

By default, different ch-run invocations use different user and mount
namespaces (i.e., different containers). While this has no impact on
sharing most resources between invocations, there are a few important
exceptions. These include:

1. ptrace(2), used by debuggers and related tools. One can attach a
   debugger to processes in descendant namespaces, but not sibling
   namespaces. The practical effect of this is that (without --join),
   you can’t run a command with ch-run and then attach to it with a
   debugger also run with ch-run.

2. Cross-memory attach (CMA) is used by cooperating processes to
   communicate by simply reading and writing one another’s memory.
   This is also not permitted between sibling namespaces. This affects
   various MPI implementations that use CMA to pass messages between
   ranks on the same node, because it’s faster than traditional shared
   memory.

--join is designed to address this by placing related ch-run commands
(the “peer group”) in the same container. This is done by one of the
peers creating the namespaces with unshare(2) and the others joining
with setns(2).

To do so, we need to know the number of peers and a name for the group.
These are specified by additional arguments that can (hopefully) be
left at default values in most cases:

• --join-ct sets the number of peers. The default is the value of the
  first of the following environment variables that is defined:
  OMPI_COMM_WORLD_LOCAL_SIZE, SLURM_STEP_TASKS_PER_NODE,
  SLURM_CPUS_ON_NODE.

• --join-tag sets the tag that names the peer group. The default is
  environment variable SLURM_STEP_ID, if defined; otherwise, the PID of
  ch-run’s parent. Tags can be re-used for peer groups that start at
  different times, i.e., once all peer ch-run processes have replaced
  themselves with the user command, the tag can be re-used.
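The default selection described in the two items above can be
previewed from a shell before launching a job. The sketch below only
mirrors the documented lookup order; the function names
default_join_ct and default_join_tag are hypothetical, and the values
are printed exactly as found in the environment, with no validation.

```sh
# Hypothetical helper: print the value of the first defined variable among
# those documented for --join-ct; fail if none of them is defined.
default_join_ct() (
    for var in OMPI_COMM_WORLD_LOCAL_SIZE SLURM_STEP_TASKS_PER_NODE \
               SLURM_CPUS_ON_NODE; do
        if printenv "$var" >/dev/null; then
            printenv "$var"
            exit 0
        fi
    done
    exit 1
)

# Hypothetical helper: print SLURM_STEP_ID if defined; otherwise the PID of
# this shell, which would be the parent of a ch-run started directly from it.
default_join_tag() {
    if printenv SLURM_STEP_ID >/dev/null; then
        printenv SLURM_STEP_ID
    else
        printf '%s\n' "$$"
    fi
}
```

If default_join_ct fails, none of the three variables is defined in
the current environment.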

Caveats:

• One cannot currently add peers after the fact, for example, if one
  later decides to start a debugger. (This is only required for code
  with bugs and is thus an unusual use case.)

• ch-run instances race. The winner of this race sets up the
  namespaces, and the other peers use the winner to find the namespaces
  to join. Therefore, if the user command of the winner exits, any
  remaining peers will not be able to join the namespaces, even if they
  are still active. There is currently no general way to specify which
  ch-run should be the winner.

• If --join-ct is too high, the winning ch-run’s user command exits
  before all peers join, or ch-run itself crashes, IPC resources such
  as semaphores and shared memory segments will be leaked. These appear
  as files in /dev/shm/ and can be removed with rm(1).

• Many of the arguments given to the race losers, such as the image
  path and --bind, will be ignored in favor of what was given to the
  winner.

ch-run leaves environment variables unchanged, i.e. the host
environment is passed through unaltered, except:

• limited tweaks to avoid significant guest breakage;

• user-set variables via --set-env;

• user-unset variables via --unset-env; and

• the variable CH_RUNNING, set by ch-run itself.

This section describes these features.

The default tweaks happen first, then --set-env and --unset-env in the
order specified on the command line, and then CH_RUNNING. The two
options can be repeated arbitrarily many times, e.g. to add/remove
multiple variable sets or add only some variables in a file.

Default behavior
By default, ch-run makes the following environment variable changes:

• $CH_RUNNING: Set to Weird Al Yankovic. While a process can figure out
  that it’s in an unprivileged container and what namespaces are active
  without this hint, that can be messy, and there is no way to tell
  that it’s a Charliecloud container specifically. This variable makes
  such a test simple and well-defined. (Note: This variable is
  unaffected by --unset-env.)

• $HOME: If the path to your home directory is not /home/$USER on the
  host, then an inherited $HOME will be incorrect inside the guest.
  This confuses some software, such as Spack. Thus, we change $HOME to
  /home/$USER, unless --no-home is specified, in which case it is left
  unchanged.

• $PATH: Newer Linux distributions replace some root-level directories,
  such as /bin, with symlinks to their counterparts in /usr.

  Some of these distributions (e.g., Fedora 24) have also dropped /bin
  from the default $PATH. This is a problem when the guest OS does not
  have a merged /usr (e.g., Debian 8 “Jessie”). Thus, we add /bin to
  $PATH if it’s not already present.

  Further reading:

  • The case for the /usr Merge

  • Fedora

  • Debian

• $TMPDIR: Unset, because this is almost certainly a host path, and
  that host path is made available in the guest at /tmp unless
  --private-tmp is given.
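The $PATH tweak above can be illustrated with a small shell function.
This is a sketch, not ch-run’s code: the name path_with_bin is
hypothetical, and appending /bin at the end (rather than placing it
elsewhere in the search path) is an assumption, since the text only
says that /bin is added if not already present.

```sh
# Hypothetical helper: print search path $1 with /bin added if it is missing.
# Assumption: /bin is appended at the end of the path.
path_with_bin() {
    case ":$1:" in
        *:/bin:*) printf '%s\n' "$1" ;;              # already present
        *)        printf '%s\n' "${1:+$1:}/bin" ;;   # append /bin
    esac
}
```

Wrapping the path in colons before matching lets the same pattern
find /bin at the beginning, middle, or end of the list without
matching entries such as /usr/bin.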

Setting variables with --set-env
The purpose of --set-env is to set environment variables within the
container. Values given replace any already in the environment (i.e.,
inherited from the host shell) or set by earlier --set-env. This flag
takes an optional argument with two possible forms:

1. If the argument contains an equals sign (=, ASCII 61), that sets an
   environment variable directly. For example, to set FOO to the string
   value bar:

       $ ch-run --set-env=FOO=bar ...

   Single straight quotes around the value (', ASCII 39) are stripped,
   though be aware that both single and double quotes are also
   interpreted by the shell. For instance, the following is similar to
   the prior example; the double quotes are removed by the shell and
   the single quotes are removed by ch-run:

       $ ch-run --set-env="'BAZ=qux'" ...

2. If the argument does not contain an equals sign, it is a host path
   to a file containing zero or more variables using the same syntax as
   above (except with no prior shell processing). This file contains a
   sequence of assignments separated by newlines. Empty lines are
   ignored, and no comments are interpreted. (This syntax is designed
   to accept the output of printenv and be easily produced by other
   simple mechanisms.) For example:

       $ cat /tmp/env.txt
       FOO=bar
       BAZ='qux'
       $ ch-run --set-env=/tmp/env.txt ...

   For directory images only (because the file is read before
   containerizing), guest paths can be given by prepending the image
   path.

3. If there is no argument, the file /ch/environment within the image
   is used. This file is commonly populated by ENV instructions in the
   Dockerfile. For example, equivalently to form 2:

       $ cat Dockerfile
       [...]
       ENV FOO=bar
       ENV BAZ=qux
       [...]
       $ ch-image build -t foo .
       $ ch-convert foo /var/tmp/foo.sqfs
       $ ch-run --set-env /var/tmp/foo.sqfs -- ...

   (Note the image path is interpreted correctly, not as the --set-env
   argument.)

   At present, there is no way to use files other than /ch/environment
   within SquashFS images.
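A file for the second form can be produced with a few lines of shell.
The sketch below writes one NAME=VALUE line for each named variable
that is defined in the current environment; the function name
write_env_file is hypothetical. It assumes values contain no newlines
and are not themselves wrapped in single quotes, since such quotes
would be stripped when the file is read.

```sh
# Hypothetical helper: write NAME=VALUE lines to file $1 for each remaining
# argument that names a variable defined in the current environment.
write_env_file() (
    file=$1
    shift
    : > "$file" || exit 1            # create or truncate the output file
    for name in "$@"; do
        if value=$(printenv "$name"); then
            printf '%s=%s\n' "$name" "$value" >> "$file"
        fi
    done
)
```

The resulting file could then be passed to ch-run with
--set-env=FILE; variables that are not defined are silently skipped.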

Environment variables are expanded for values that look like search
paths, unless --env-no-expand is given prior to --set-env. In this
case, the value is a sequence of zero or more possibly-empty items
separated by colon (:, ASCII 58). If an item begins with dollar sign
($, ASCII 36), then the rest of the item is the name of an environment
variable. If this variable is set to a non-empty value, that value is
substituted for the item; otherwise (i.e., the variable is unset or the
empty string), the item is deleted, including a delimiter colon. The
purpose of omitting empty expansions is to avoid surprising behavior
such as an empty element in $PATH meaning the current directory.

For example, to set HOSTPATH to the search path in the current shell
(this is expanded by ch-run, though letting the shell do it happens to
be equivalent):

    $ ch-run --set-env='HOSTPATH=$PATH' ...

To prepend /opt/bin to this current search path:

    $ ch-run --set-env='PATH=/opt/bin:$PATH' ...

To prepend /opt/bin to the search path set by the Dockerfile, as
retrieved from guest file /ch/environment (here we really cannot let
the shell expand $PATH):

    $ ch-run --set-env --set-env='PATH=/opt/bin:$PATH' ...
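The expansion rule described above can be modeled in a few lines of
shell, which may help when predicting what a given value will become.
This is a sketch written from the description only, not ch-run’s
implementation; the function name expand_value is hypothetical, quote
stripping is not modeled, and printenv(1) is assumed to accept -- to
end its options.

```sh
# Hypothetical model of the described rule: split value $1 on colons, replace
# each item that starts with "$" by the value of that environment variable,
# and drop the item (with one delimiter) if the variable is unset or empty.
expand_value() (
    rest=$1
    out=
    first=1
    while :; do
        item=${rest%%:*}                 # text up to the first colon
        keep=1
        case $item in
            '$'*)
                item=$(printenv -- "${item#?}") || item=
                [ -n "$item" ] || keep=0
                ;;
        esac
        if [ "$keep" -eq 1 ]; then
            if [ "$first" -eq 1 ]; then
                out=$item
                first=0
            else
                out=$out:$item
            fi
        fi
        case $rest in
            *:*) rest=${rest#*:} ;;      # continue after the first colon
            *)   break ;;
        esac
    done
    printf '%s\n' "$out"
)
```

Note that literal empty items are kept; only items that name an unset
or empty variable are deleted.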

Examples of valid assignments, assuming that environment variable BAR
is set to bar and UNSET is unset or set to the empty string:

┌───────────────────┬───────┬─────────────────────┐
│Assignment         │ Name  │ Value               │
├───────────────────┼───────┼─────────────────────┤
│FOO=bar            │ FOO   │ bar                 │
├───────────────────┼───────┼─────────────────────┤
│FOO=bar=baz        │ FOO   │ bar=baz             │
├───────────────────┼───────┼─────────────────────┤
│FLAGS=-march=foo   │ FLAGS │ -march=foo          │
│-mtune=bar         │       │ -mtune=bar          │
├───────────────────┼───────┼─────────────────────┤
│FLAGS='-march=foo  │ FLAGS │ -march=foo          │
│-mtune=bar'        │       │ -mtune=bar          │
├───────────────────┼───────┼─────────────────────┤
│FOO=$BAR           │ FOO   │ bar                 │
├───────────────────┼───────┼─────────────────────┤
│FOO=$BAR:baz       │ FOO   │ bar:baz             │
├───────────────────┼───────┼─────────────────────┤
│FOO=               │ FOO   │ empty string        │
├───────────────────┼───────┼─────────────────────┤
│FOO=$UNSET         │ FOO   │ empty string        │
├───────────────────┼───────┼─────────────────────┤
│FOO=baz:$UNSET:qux │ FOO   │ baz:qux (not        │
│                   │       │ baz::qux)           │
├───────────────────┼───────┼─────────────────────┤
│FOO=:bar:baz::     │ FOO   │ :bar:baz::          │
├───────────────────┼───────┼─────────────────────┤
│FOO=''             │ FOO   │ empty string        │
├───────────────────┼───────┼─────────────────────┤
│FOO=''''           │ FOO   │ '' (two single      │
│                   │       │ quotes)             │
└───────────────────┴───────┴─────────────────────┘

Example invalid assignments:

┌───────────┬──────────────────────┐
│Assignment │ Problem              │
├───────────┼──────────────────────┤
│FOO bar    │ no equals separator  │
├───────────┼──────────────────────┤
│=bar       │ name cannot be empty │
└───────────┴──────────────────────┘

Example valid assignments that are probably not what you want:

┌─────────────────┬───────┬───────────┬──────────────────┐
│Assignment       │ Name  │ Value     │ Problem          │
├─────────────────┼───────┼───────────┼──────────────────┤
│FOO="bar"        │ FOO   │ "bar"     │ double quotes    │
│                 │       │           │ aren’t stripped  │
├─────────────────┼───────┼───────────┼──────────────────┤
│FOO=bar # baz    │ FOO   │ bar # baz │ comments not     │
│                 │       │           │ supported        │
├─────────────────┼───────┼───────────┼──────────────────┤
│FOO=bar\tbaz     │ FOO   │ bar\tbaz  │ backslashes are  │
│                 │       │           │ not special      │
├─────────────────┼───────┼───────────┼──────────────────┤
│ FOO=bar         │  FOO  │ bar       │ leading space in │
│                 │       │           │ key              │
├─────────────────┼───────┼───────────┼──────────────────┤
│FOO= bar         │ FOO   │  bar      │ leading space in │
│                 │       │           │ value            │
├─────────────────┼───────┼───────────┼──────────────────┤
│$FOO=bar         │ $FOO  │ bar       │ variables not    │
│                 │       │           │ expanded in key  │
├─────────────────┼───────┼───────────┼──────────────────┤
│FOO=$BAR baz:qux │ FOO   │ qux       │ variable BAR baz │
│                 │       │           │ not set          │
└─────────────────┴───────┴───────────┴──────────────────┘

Removing variables with --unset-env
The purpose of --unset-env=GLOB is to remove unwanted environment
variables. The argument GLOB is a glob pattern (dialect fnmatch(3) with
no flags); all variables with matching names are removed from the
environment.

WARNING:
    Because the shell also interprets glob patterns, if any wildcard
    characters are in GLOB, it is important to put it in single quotes
    to avoid surprises.

GLOB must be a non-empty string.
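Before passing a pattern to --unset-env, it can be useful to preview
which variable names it would match in the current environment. The
sketch below does this with the shell’s own case pattern matching,
which is assumed here to behave like fnmatch(3) with no flags for
typical patterns; the function name match_env_names is hypothetical.
Names are taken from awk’s ENVIRON array, to which some awk
implementations may add entries of their own.

```sh
# Hypothetical helper: print, in sorted order, the names of environment
# variables matching glob pattern $1 (which must be non-empty).
match_env_names() (
    glob=$1
    if [ -z "$glob" ]; then
        echo 'match_env_names: GLOB must be a non-empty string' >&2
        exit 1
    fi
    awk 'BEGIN { for (name in ENVIRON) print name }' | sort |
    while IFS= read -r name; do
        case $name in
            $glob) printf '%s\n' "$name" ;;   # unquoted: used as a pattern
        esac
    done
)
```

As with --unset-env itself, the pattern should be single-quoted on
the command line, e.g. match_env_names 'SLURM*'.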

Example 1: Remove the single environment variable FOO:

    $ export FOO=bar
    $ env | fgrep FOO
    FOO=bar
    $ ch-run --unset-env=FOO $CH_TEST_IMGDIR/chtest -- env | fgrep FOO
    $

Example 2: Hide from a container the fact that it’s running in a Slurm
allocation, by removing all variables beginning with SLURM. You might
want to do this to test an MPI program with one rank and no launcher:

    $ salloc -N1
    $ env | egrep '^SLURM' | wc
    44 44 1092
    $ ch-run $CH_TEST_IMGDIR/mpihello-openmpi -- /hello/hello
    [... long error message ...]
    $ ch-run --unset-env='SLURM*' $CH_TEST_IMGDIR/mpihello-openmpi -- /hello/hello
    0: MPI version:
    Open MPI v3.1.3, package: Open MPI root@c897a83f6f92 Distribution, ident: 3.1.3, repo rev: v3.1.3, Oct 29, 2018
    0: init ok cn001.localdomain, 1 ranks, userns 4026532530
    0: send/receive ok
    0: finalize ok

Example 3: Clear the environment completely (remove all variables):

    $ ch-run --unset-env='*' $CH_TEST_IMGDIR/chtest -- env
    $

Note that some programs, such as shells, set some environment variables
even if started with no init files:

    $ ch-run --unset-env='*' $CH_TEST_IMGDIR/debian9 -- bash --noprofile --norc -c env
    SHLVL=1
    PWD=/
    _=/usr/bin/env
    $

Run the command echo hello inside a Charliecloud container using the
unpacked image at /data/foo:

    $ ch-run /data/foo -- echo hello
    hello

Run an MPI job that can use CMA to communicate:

    $ srun ch-run --join /data/foo -- bar

By default, ch-run logs its command line to syslog. (This can be
disabled by configuring with --disable-syslog.) This includes: (1) the
invoking real UID, (2) the number of command line arguments, and (3)
the arguments, separated by spaces. For example:

    Dec 10 18:19:08 mybox ch-run: uid=1000 args=7: ch-run -v /var/tmp/00_tiny -- echo hello "wor l}\$d"

Logging is one of the first things done during program initialization,
even before command line parsing. That is, almost all command lines are
logged, even if erroneous, and there is no logging of program success
or failure.

Arguments are serialized with the following procedure. The purpose is
to provide a human-readable reconstruction of the command line while
also allowing each argument to be recovered byte-for-byte.

• If an argument contains only printable ASCII bytes that are not
  whitespace, shell metacharacters, double quote (", ASCII 34
  decimal), or backslash (\, ASCII 92), then log it unchanged.

• Otherwise, (a) enclose the argument in double quotes and (b)
  backslash-escape double quotes, backslashes, and characters
  interpreted by Bash (including POSIX shells) within double quotes.
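The serialization procedure can be approximated in shell as follows.
This is a sketch written from the description above, not ch-run’s
code: the function names log_quote and log_args are hypothetical, the
set of characters logged unchanged (letters, digits, and % + , - . /
: = @ _) is an assumption standing in for “not a shell
metacharacter”, and the escaped characters are assumed to be double
quote, backslash, dollar sign, and backtick.

```sh
# Hypothetical helper: print argument $1 as it might appear in the log.
log_quote() (
    LC_ALL=C
    case $1 in
        *[!A-Za-z0-9%+,./:=@_-]*)
            # Contains a character outside the assumed safe set: wrap in
            # double quotes and backslash-escape \ " $ and `.
            printf '"%s"' "$(printf '%s' "$1" | sed 's/[\\"$`]/\\&/g')"
            ;;
        *)
            printf '%s' "$1"
            ;;
    esac
)

# Hypothetical helper: print the argument count and the serialized arguments,
# separated by spaces, in the style of the log line shown earlier.
log_args() (
    printf 'args=%d:' "$#"
    for arg in "$@"; do
        printf ' '
        log_quote "$arg"
    done
    printf '\n'
)
```

For the command line in the syslog example, log_args ch-run -v
/var/tmp/00_tiny -- echo hello 'wor l}$d' reproduces the portion of
the logged line starting at args=7.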

The verbatim command line typed in the shell cannot be recovered,
because not enough information is provided to UNIX programs. For
example, echo  'foo' is given to programs as a sequence of two
arguments, echo and foo; the two spaces and single quotes are removed
by the shell. The zero byte, ASCII NUL, cannot appear in arguments
because it would terminate the string.

If there is an error during containerization, ch-run exits with a
non-zero status. If the user command is started successfully, the exit
status is that of the user command, with one exception: if the image is
an internally mounted SquashFS filesystem and the user command is
killed by a signal, the exit status is 1 regardless of the signal
value.

If Charliecloud was obtained from your Linux distribution, use your
distribution’s bug reporting procedures.

Otherwise, report bugs to: https://github.com/hpc/charliecloud/issues

charliecloud(7)

Full documentation at: <https://hpc.github.io/charliecloud>

2014–2021, Triad National Security, LLC

0.26                         2022-01-24 00:00 UTC                    CH-RUN(1)