CH-RUN(1)                        Charliecloud                        CH-RUN(1)

NAME
       ch-run - Run a command in a Charliecloud container

SYNOPSIS
       $ ch-run [OPTION...] NEWROOT CMD [ARG...]

DESCRIPTION
       Run command CMD in a fully unprivileged Charliecloud container using
       the flattened and unpacked image directory located at NEWROOT.

       -b, --bind=SRC[:DST]
              Bind-mount SRC at guest DST. If DST is not specified, it
              defaults to the same path as on the host; i.e., the default
              is --bind=SRC:SRC. Can be repeated.

              If --write is given and DST does not exist, it will be created
              as an empty directory. However, DST must be entirely within
              the image itself; DST cannot enter a previous bind mount. For
              example, --bind /foo:/tmp/foo will fail because /tmp is shared
              with the host via bind-mount (unless --private-tmp is given).

              Most images already have ten directories /mnt/[0-9] available
              as mount points.

              Symlinks in DST are followed, and absolute links can have
              surprising behavior. Bind-mounting happens after namespace
              setup but before pivoting into the container image, so
              absolute links use the host root. For example, suppose the
              image has a symlink /foo -> /mnt. Then, --bind=/bar:/foo will
              bind-mount on the host’s /mnt, which is inaccessible on the
              host because namespaces are already set up and also
              inaccessible in the container because of the subsequent pivot
              into the image. Currently, this problem is only detected when
              DST needs to be created: ch-run will refuse to follow absolute
              symlinks in this case, to avoid directory creation surprises.
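The absolute-symlink pitfall described above can be checked for before launching a container. The following is a minimal sketch and not part of ch-run: the helper name first_absolute_symlink is hypothetical, and readlink(1) is assumed to be available. It walks DST inside the unpacked image one path component at a time and reports the first component that is a symlink with an absolute target.

```sh
# Hypothetical pre-flight helper (not provided by Charliecloud).
# Usage: first_absolute_symlink IMAGE_DIR DST
# Prints "LINK -> TARGET" and returns 0 if a component of DST inside
# the image is a symlink with an absolute target; returns 1 otherwise.
first_absolute_symlink() {
    cur=${1%/}      # image directory without trailing slash
    rest=${2#/}     # DST without leading slash
    while [ -n "$rest" ]; do
        part=${rest%%/*}            # next path component
        case $rest in
            */*) rest=${rest#*/} ;; # drop the component just taken
            *)   rest= ;;           # that was the last component
        esac
        [ -n "$part" ] || continue  # skip empty components ("//")
        cur=$cur/$part
        if [ -L "$cur" ]; then
            target=$(readlink "$cur")
            case $target in
                /*) printf '%s -> %s\n' "$cur" "$target"; return 0 ;;
            esac
        fi
    done
    return 1
}
```

For the example above, running first_absolute_symlink on the image directory with DST /foo would report the link to /mnt, which suggests choosing a different DST such as one of the /mnt/[0-9] directories.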

       -c, --cd=DIR
              Initial working directory in container.

       --ch-ssh
              Bind ch-ssh(1) into container at /usr/bin/ch-ssh.

       --env-no-expand
              Don’t expand variables when using --set-env.

       -g, --gid=GID
              Run as group GID within container.

       -j, --join
              Use the same container (namespaces) as peer ch-run
              invocations.

       --join-pid=PID
              Join the namespaces of an existing process.

       --join-ct=N
              Number of ch-run peers (implies --join; default: see below).

       --join-tag=TAG
              Label for ch-run peer group (implies --join; default: see
              below).

       --no-home
              By default, your host home directory (i.e., $HOME) is
              bind-mounted at guest /home/$USER. This is accomplished by
              mounting a new tmpfs at /home, which hides any image content
              under that path. If this is specified, neither of these
              things happens and the image’s /home is exposed unaltered.

       --no-passwd
              By default, temporary /etc/passwd and /etc/group files are
              created according to the UID and GID maps for the container
              and bind-mounted into it. If this is specified, no such
              temporary files are created and the image’s files are exposed.

       -t, --private-tmp
              By default, /tmp is shared with the host. If this is
              specified, a new tmpfs is mounted on the container’s /tmp
              instead.

       --set-env=FILE, --set-env=VAR=VALUE
              Set environment variable(s), either as specified in host path
              FILE, or by setting variable VAR to VALUE.

       -u, --uid=UID
              Run as user UID within container.

       --unset-env=GLOB
              Unset environment variables whose names match GLOB.

       -v, --verbose
              Be more verbose (can be repeated).

       -w, --write
              Mount image read-write (by default, the image is mounted
              read-only).

       -?, --help
              Print help and exit.

       --usage
              Print a short usage message and exit.

       -V, --version
              Print version and exit.

       Note: Because ch-run is fully unprivileged, it is not possible to
       change UIDs and GIDs within the container (the relevant system calls
       fail). In particular, setuid, setgid, and setcap executables do not
       work. As a precaution, ch-run calls prctl(PR_SET_NO_NEW_PRIVS, 1) to
       disable these executables within the container. This does not reduce
       functionality but is a “belt and suspenders” precaution to reduce the
       attack surface should bugs in these system calls or elsewhere arise.

HOST FILES AND DIRECTORIES AVAILABLE IN CONTAINER VIA BIND MOUNTS
       In addition to any directories specified by the user with --bind,
       ch-run has standard host files and directories that are bind-mounted
       in as well.

       The following host files and directories are bind-mounted at the same
       location in the container. These give access to the host’s devices
       and various kernel facilities. (Recall that Charliecloud provides
       minimal isolation and containerized processes are mostly normal
       unprivileged processes.) They cannot be disabled and are required;
       i.e., they must exist both on host and within the image.

       • /dev

       • /proc

       • /sys

       The following are optional: they are bind-mounted only if the path
       exists both on the host and within the image, with no error or
       warning otherwise.

       • /etc/hosts and /etc/resolv.conf. Because Charliecloud containers
         share the host network namespace, they need the same hostname
         resolution configuration.

       • /etc/machine-id. Provides a unique ID for the OS installation;
         matching the host works for most situations. Needed to support
         D-Bus, some software licensing situations, and likely other use
         cases. See also issue #1050.

       • /var/lib/hugetlbfs at guest /var/opt/cray/hugetlbfs, and
         /var/opt/cray/alps/spool. These support Cray MPI.

       • $PREFIX/bin/ch-ssh at guest /usr/bin/ch-ssh. SSH wrapper that
         automatically containerizes after connecting.

       The following additional bind mounts are done by default but can be
       disabled; see the options above.

       • $HOME at /home/$USER (and image /home is hidden). Makes user data
         and init files available.

       • /tmp. Provides a temporary directory that persists between
         container runs and is shared with non-containerized application
         components.

       • temporary files at /etc/passwd and /etc/group. Usernames and group
         names need to be customized for each container run.

MULTIPLE PROCESSES IN THE SAME CONTAINER WITH --JOIN
       By default, different ch-run invocations use different user and mount
       namespaces (i.e., different containers). While this has no impact on
       sharing most resources between invocations, there are a few important
       exceptions. These include:

       1. ptrace(2), used by debuggers and related tools. One can attach a
          debugger to processes in descendant namespaces, but not sibling
          namespaces. The practical effect of this is that (without --join),
          you can’t run a command with ch-run and then attach to it with a
          debugger also run with ch-run.

       2. Cross-memory attach (CMA) is used by cooperating processes to
          communicate by simply reading and writing one another’s memory.
          This is also not permitted between sibling namespaces. This
          affects various MPI implementations that use CMA to pass messages
          between ranks on the same node, because it’s faster than
          traditional shared memory.
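Whether two processes are in the same user namespace can be inspected on Linux through the /proc/PID/ns/user symlinks. The following is a minimal sketch; the helper name same_userns is hypothetical, and it assumes that /proc is mounted and that the caller is permitted to read both processes' namespace links.

```sh
# Hypothetical helper (not part of Charliecloud).
# Usage: same_userns PID1 PID2
# Returns 0 if both processes are in the same user namespace, 1 if they
# are not, and 2 if either namespace link cannot be read.
same_userns() {
    a=$(readlink "/proc/$1/ns/user") || return 2
    b=$(readlink "/proc/$2/ns/user") || return 2
    [ "$a" = "$b" ]
}
```

Two commands started by separate ch-run invocations without --join would be expected to compare as different, which is the situation in which the ptrace(2) and CMA restrictions above apply.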

       --join is designed to address this by placing related ch-run commands
       (the “peer group”) in the same container. This is done by one of the
       peers creating the namespaces with unshare(2) and the others joining
       with setns(2).

       To do so, we need to know the number of peers and a name for the
       group. These are specified by additional arguments that can
       (hopefully) be left at default values in most cases:

       • --join-ct sets the number of peers. The default is the value of the
         first of the following environment variables that is defined:
         OMPI_COMM_WORLD_LOCAL_SIZE, SLURM_STEP_TASKS_PER_NODE,
         SLURM_CPUS_ON_NODE.

       • --join-tag sets the tag that names the peer group. The default is
         environment variable SLURM_STEP_ID, if defined; otherwise, the PID
         of ch-run’s parent. Tags can be re-used for peer groups that start
         at different times, i.e., once all peer ch-run have replaced
         themselves with the user command, the tag can be re-used.
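The default selection described in the two items above can be sketched in shell. This is an illustration of the documented rule, not ch-run's own code; the function names default_join_ct and default_join_tag are hypothetical, and $PPID here stands in for the PID of ch-run's parent.

```sh
# Hypothetical helpers that mirror the documented defaults.

# Print the value of the first defined variable in the documented
# order; return 1 if none of them is defined.
default_join_ct() {
    for name in OMPI_COMM_WORLD_LOCAL_SIZE SLURM_STEP_TASKS_PER_NODE \
                SLURM_CPUS_ON_NODE; do
        # printenv succeeds if the variable is set, even to "".
        if value=$(printenv "$name"); then
            printf '%s\n' "$value"
            return 0
        fi
    done
    return 1
}

# Print SLURM_STEP_ID if defined, otherwise the parent PID.
default_join_tag() {
    printenv SLURM_STEP_ID || printf '%s\n' "$PPID"
}
```

Running both functions in the shell that will launch ch-run gives a quick way to see which values would likely be picked up before deciding whether to pass --join-ct or --join-tag explicitly.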

       Caveats:

       • One cannot currently add peers after the fact, for example, if one
         decides to start a debugger after the fact. (This is only required
         for code with bugs and is thus an unusual use case.)

       • ch-run instances race. The winner of this race sets up the
         namespaces, and the other peers use the winner to find the
         namespaces to join. Therefore, if the user command of the winner
         exits, any remaining peers will not be able to join the namespaces,
         even if they are still active. There is currently no general way to
         specify which ch-run should be the winner.

       • If --join-ct is too high, the winning ch-run’s user command exits
         before all peers join, or ch-run itself crashes, IPC resources such
         as semaphores and shared memory segments will be leaked. These
         appear as files in /dev/shm/ and can be removed with rm(1).

       • Many of the arguments given to the race losers, such as the image
         path and --bind, will be ignored in favor of what was given to the
         winner.

ENVIRONMENT VARIABLES
       ch-run leaves environment variables unchanged, i.e., the host
       environment is passed through unaltered, except:

       • limited tweaks to avoid significant guest breakage;

       • user-set variables via --set-env;

       • user-unset variables via --unset-env; and

       • setting CH_RUNNING.

       This section describes these features.

       The default tweaks happen first, and then --set-env and --unset-env
       in the order specified on the command line. The latter two can be
       repeated arbitrarily many times, e.g., to add/remove multiple
       variable sets or add only some variables in a file.

   Default behavior
       By default, ch-run makes the following environment variable changes:

       • $CH_RUNNING: Set to Weird Al Yankovic. While a process can figure
         out that it’s in an unprivileged container and what namespaces are
         active without this hint, the checks can be messy, and there is no
         way to tell that it’s a Charliecloud container specifically. This
         variable makes such a test simple and well-defined. (Note: This
         variable is unaffected by --unset-env.)

       • $HOME: If the path to your home directory is not /home/$USER on the
         host, then an inherited $HOME will be incorrect inside the guest.
         This confuses some software, such as Spack.

         Thus, we change $HOME to /home/$USER, unless --no-home is
         specified, in which case it is left unchanged.

       • $PATH: Newer Linux distributions replace some root-level
         directories, such as /bin, with symlinks to their counterparts in
         /usr.

         Some of these distributions (e.g., Fedora 24) have also dropped
         /bin from the default $PATH. This is a problem when the guest OS
         does not have a merged /usr (e.g., Debian 8 “Jessie”). Thus, we
         add /bin to $PATH if it’s not already present.

         Further reading:

         • The case for the /usr Merge

         • Fedora

         • Debian
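Two of the default tweaks above are easy to illustrate in shell. This is a sketch under stated assumptions, not ch-run's implementation: the helper names in_charliecloud and ensure_bin_in_path are hypothetical, and appending /bin (rather than prepending it) is an assumption, since the text only says that /bin is added.

```sh
# Hypothetical helpers illustrating the documented default tweaks.

# Succeed if CH_RUNNING is set (to any value, including empty).
in_charliecloud() {
    [ -n "${CH_RUNNING+set}" ]
}

# Print the given PATH-style value with /bin added if it is missing.
# Assumption: /bin is appended at the end.
ensure_bin_in_path() {
    case ":$1:" in
        *:/bin:*) printf '%s\n' "$1" ;;       # already present
        ::)       printf '%s\n' /bin ;;       # empty input
        *)        printf '%s\n' "$1:/bin" ;;  # add it
    esac
}
```

A script that must behave differently inside a container could call in_charliecloud instead of probing namespaces directly, which is the simple and well-defined test the $CH_RUNNING item refers to.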

   Setting variables with --set-env
       The purpose of --set-env is to set environment variables in addition
       to (or instead of) those inherited from the host shell.

       If the argument contains an equals character, then it is interpreted
       as a variable name and value; otherwise, it is a host path to a file
       with one variable name/value per line (guest paths can be specified
       by prepending the image path). Values given replace any already set
       (i.e., if a variable is repeated, the last value wins). Environment
       variables in the value are expanded unless --env-no-expand is given,
       though see below for syntax differences from the shell.

       For example, to prepend /opt/bin to the current shell’s path (note
       the single quotes, which protect $PATH from expansion by the shell,
       though here the results would be equivalent if we let the shell do
       it):

           $ ch-run --set-env='PATH=/opt/bin:$PATH' ...

       To add variables set by Dockerfile ENV instructions to the current
       environment:

           $ ch-run --set-env=$IMG/ch/environment ...

       To prepend /opt/bin to the path set by the Dockerfile (here we really
       can’t let the shell expand $PATH):

           $ ch-run --set-env=$IMG/ch/environment --set-env='PATH=/opt/bin:$PATH' ...

       The syntax of the argument is a key-value pair separated by the first
       equals character (=, ASCII 61), with optional single straight quotes
       (', ASCII 39) around the value, though be aware that quotes are also
       interpreted by the shell. Newlines (ASCII 10) are not permitted in
       either key or value. The value may be empty, but not the key.
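The key-value syntax just described can be sketched as a small shell function. This is an illustration of the documented rule only, not ch-run's parser; the function name parse_env_line and the result variables env_key and env_value are hypothetical, and how unbalanced quotes are treated is an assumption (they are left untouched here).

```sh
# Hypothetical parser for one --set-env argument or file line.
# On success, sets env_key and env_value and returns 0.
parse_env_line() {
    case $1 in
        =*)  echo "key cannot be empty" >&2; return 1 ;;
        *=*) ;;  # has a separator and a non-empty key
        *)   echo "no separator" >&2; return 1 ;;
    esac
    env_key=${1%%=*}    # everything before the first "="
    env_value=${1#*=}   # everything after the first "="
    # Strip one pair of surrounding single quotes, if present.
    case $env_value in
        "'"*"'") env_value=${env_value#?}; env_value=${env_value%?} ;;
    esac
}
```

For instance, parse_env_line 'FOO=bar=baz' leaves env_key set to FOO and env_value set to bar=baz, because only the first equals character separates key from value.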

       Environment variables in the value are expanded unless
       --env-no-expand is given. When expansion is enabled, the value is
       treated as a sequence of possibly-empty items separated by colon (:,
       ASCII 58). If an item begins with dollar sign ($, ASCII 36), then
       the rest of the item is the name of an environment variable. If this
       variable is set to a non-empty value, that value is substituted for
       the item; otherwise (i.e., the variable is unset or the empty
       string), the item is deleted, including a delimiter colon. The
       purpose of omitting empty expansions is to avoid surprising behavior
       such as an empty element in $PATH meaning the current directory. If
       no expansions happen, the value is left unchanged.
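The expansion rule above differs from shell expansion, so a small model of it may help. The following sketch splits the value on colons, substitutes items that name a non-empty variable, drops items that name an unset or empty variable, and rejoins the rest. It is not ch-run's code; the function name expand_value is hypothetical, printenv(1) is assumed to accept "--", and treating a lone "$" as naming an unset variable is an assumption.

```sh
# Hypothetical model of the documented --set-env expansion rule.
# Usage: expand_value VALUE   (prints the expanded value)
expand_value() {
    rest=$1
    out=
    sep=
    while :; do
        # Take the next colon-separated item (items may be empty).
        case $rest in
            *:*) item=${rest%%:*}; rest=${rest#*:}; last=0 ;;
            *)   item=$rest; last=1 ;;
        esac
        keep=1
        case $item in
            '$'*)
                name=${item#?}
                # Look up the named variable in the environment;
                # unset and empty are treated alike.
                if [ -n "$name" ]; then
                    item=$(printenv -- "$name") || item=
                else
                    item=
                fi
                [ -n "$item" ] || keep=0   # delete the item
                ;;
        esac
        if [ "$keep" -eq 1 ]; then
            out=$out$sep$item
            sep=:
        fi
        [ "$last" -eq 1 ] && break
    done
    printf '%s\n' "$out"
}
```

With BAR exported as bar and UNSET not set, expand_value 'baz:$UNSET:qux' prints baz:qux and expand_value ':bar:baz::' prints the value unchanged, since no item begins with a dollar sign.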

       If a file is given instead, it is a sequence of such arguments, one
       per line. Empty lines are ignored. No comments are interpreted. (This
       syntax is designed to accept the output of printenv and be easily
       produced by other simple mechanisms.)

       Examples of valid arguments, assuming that environment variable $BAR
       is set to bar and $UNSET is unset (or set to the empty string):

       ┌───────────────────┬───────┬─────────────────────┐
       │Line               │ Key   │ Value               │
       ├───────────────────┼───────┼─────────────────────┤
       │FOO=bar            │ FOO   │ bar                 │
       ├───────────────────┼───────┼─────────────────────┤
       │FOO=bar=baz        │ FOO   │ bar=baz             │
       ├───────────────────┼───────┼─────────────────────┤
       │FLAGS=-march=foo   │ FLAGS │ -march=foo          │
       │-mtune=bar         │       │ -mtune=bar          │
       ├───────────────────┼───────┼─────────────────────┤
       │FLAGS='-march=foo  │ FLAGS │ -march=foo          │
       │-mtune=bar'        │       │ -mtune=bar          │
       ├───────────────────┼───────┼─────────────────────┤
       │FOO=$BAR           │ FOO   │ bar                 │
       ├───────────────────┼───────┼─────────────────────┤
       │FOO=$BAR:baz       │ FOO   │ bar:baz             │
       ├───────────────────┼───────┼─────────────────────┤
       │FOO=               │ FOO   │ empty string (not   │
       │                   │       │ unset)              │
       ├───────────────────┼───────┼─────────────────────┤
       │FOO=$UNSET         │ FOO   │ empty string (not   │
       │                   │       │ unset or $UNSET)    │
       ├───────────────────┼───────┼─────────────────────┤
       │FOO=baz:$UNSET:qux │ FOO   │ baz:qux (not        │
       │                   │       │ baz::qux)           │
       ├───────────────────┼───────┼─────────────────────┤
       │FOO=:bar:baz::     │ FOO   │ :bar:baz::          │
       ├───────────────────┼───────┼─────────────────────┤
       │FOO=''             │ FOO   │ empty string (not   │
       │                   │       │ unset)              │
       ├───────────────────┼───────┼─────────────────────┤
       │FOO=''''           │ FOO   │ '' (two single      │
       │                   │       │ quotes)             │
       └───────────────────┴───────┴─────────────────────┘

       Example invalid lines:

       ┌────────┬─────────────────────┐
       │Line    │ Problem             │
       ├────────┼─────────────────────┤
       │FOO bar │ no separator        │
       ├────────┼─────────────────────┤
       │=bar    │ key cannot be empty │
       └────────┴─────────────────────┘

       Example valid lines that are probably not what you want:

       ┌─────────────────┬───────┬───────────┬──────────────────┐
       │Line             │ Key   │ Value     │ Problem          │
       ├─────────────────┼───────┼───────────┼──────────────────┤
       │FOO="bar"        │ FOO   │ "bar"     │ double quotes    │
       │                 │       │           │ aren’t stripped  │
       ├─────────────────┼───────┼───────────┼──────────────────┤
       │FOO=bar # baz    │ FOO   │ bar # baz │ comments not     │
       │                 │       │           │ supported        │
       ├─────────────────┼───────┼───────────┼──────────────────┤
       │FOO=bar\tbaz     │ FOO   │ bar\tbaz  │ backslashes are  │
       │                 │       │           │ not special      │
       ├─────────────────┼───────┼───────────┼──────────────────┤
       │ FOO=bar         │  FOO  │ bar       │ leading space in │
       │                 │       │           │ key              │
       ├─────────────────┼───────┼───────────┼──────────────────┤
       │FOO= bar         │ FOO   │  bar      │ leading space in │
       │                 │       │           │ value            │
       ├─────────────────┼───────┼───────────┼──────────────────┤
       │$FOO=bar         │ $FOO  │ bar       │ variables not    │
       │                 │       │           │ expanded in key  │
       ├─────────────────┼───────┼───────────┼──────────────────┤
       │FOO=$BAR baz:qux │ FOO   │ qux       │ variable BAR baz │
       │                 │       │           │ not set          │
       └─────────────────┴───────┴───────────┴──────────────────┘

   Removing variables with --unset-env
       The purpose of --unset-env=GLOB is to remove unwanted environment
       variables. The argument GLOB is a glob pattern (dialect fnmatch(3)
       with no flags); all variables with matching names are removed from
       the environment.

       WARNING:
          Because the shell also interprets glob patterns, if any wildcard
          characters are in GLOB, it is important to put it in single quotes
          to avoid surprises.

       GLOB must be a non-empty string.
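Before passing a pattern to --unset-env, it can be useful to preview which variable names it would match. The following is a minimal sketch: the function name names_matching is hypothetical, and it uses shell case patterns as an approximation of fnmatch(3) with no flags, so results may differ from ch-run in edge cases.

```sh
# Hypothetical preview helper (not part of Charliecloud).
# Usage: names_matching 'GLOB'
# Prints, in sorted order, the names of environment variables that
# match GLOB.
names_matching() {
    # List names via awk's ENVIRON so that multi-line values cannot
    # be mistaken for extra variables.
    awk 'BEGIN { for (name in ENVIRON) print name }' | sort |
    while IFS= read -r name; do
        # The unquoted $1 is interpreted as a pattern by case.
        case $name in
            $1) printf '%s\n' "$name" ;;
        esac
    done
}
```

As the warning above says, the pattern should be given in single quotes, e.g. names_matching 'SLURM*', so that the calling shell does not expand it against file names first.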

       Example 1: Remove the single environment variable FOO:

           $ export FOO=bar
           $ env | fgrep FOO
           FOO=bar
           $ ch-run --unset-env=FOO $CH_TEST_IMGDIR/chtest -- env | fgrep FOO
           $

       Example 2: Hide from a container the fact that it’s running in a
       Slurm allocation, by removing all variables beginning with SLURM. You
       might want to do this to test an MPI program with one rank and no
       launcher:

           $ salloc -N1
           $ env | egrep '^SLURM' | wc
                44      44    1092
           $ ch-run $CH_TEST_IMGDIR/mpihello-openmpi -- /hello/hello
           [... long error message ...]
           $ ch-run --unset-env='SLURM*' $CH_TEST_IMGDIR/mpihello-openmpi -- /hello/hello
           0: MPI version:
           Open MPI v3.1.3, package: Open MPI root@c897a83f6f92 Distribution, ident: 3.1.3, repo rev: v3.1.3, Oct 29, 2018
           0: init ok cn001.localdomain, 1 ranks, userns 4026532530
           0: send/receive ok
           0: finalize ok

       Example 3: Clear the environment completely (remove all variables):

           $ ch-run --unset-env='*' $CH_TEST_IMGDIR/chtest -- env
           $

       Note that some programs, such as shells, set some environment
       variables even if started with no init files:

           $ ch-run --unset-env='*' $CH_TEST_IMGDIR/debian9 -- bash --noprofile --norc -c env
           SHLVL=1
           PWD=/
           _=/usr/bin/env
           $

EXAMPLES
       Run the command echo hello inside a Charliecloud container using the
       unpacked image at /data/foo:

           $ ch-run /data/foo -- echo hello
           hello

       Run an MPI job that can use CMA to communicate:

           $ srun ch-run --join /data/foo -- bar

REPORTING BUGS
       If Charliecloud was obtained from your Linux distribution, use your
       distribution’s bug reporting procedures.

       Otherwise, report bugs to: <https://github.com/hpc/charliecloud/issues>

SEE ALSO
       charliecloud(7)

       Full documentation at: <https://hpc.github.io/charliecloud>

COPYRIGHT
       2014–2021, Triad National Security, LLC

0.25                          2021-09-20 00:00 UTC                   CH-RUN(1)