CH-IMAGE(1) Charliecloud CH-IMAGE(1)

NAME
ch-image - Build and manage images; completely unprivileged

SYNOPSIS
    $ ch-image [...] build [-t TAG] [-f DOCKERFILE] [...] CONTEXT
    $ ch-image [...] build-cache [...]
    $ ch-image [...] delete IMAGE_REF
    $ ch-image [...] gestalt [SELECTOR]
    $ ch-image [...] import PATH IMAGE_REF
    $ ch-image [...] list [-l] [IMAGE_REF]
    $ ch-image [...] pull [...] IMAGE_REF [DEST_REF]
    $ ch-image [...] push [--image DIR] IMAGE_REF [DEST_REF]
    $ ch-image [...] reset
    $ ch-image [...] undelete IMAGE_REF
    $ ch-image { --help | --version | --dependencies }

DESCRIPTION
ch-image is a tool for building and manipulating container images, but
not running them (for that you want ch-run). It is completely
unprivileged, with no setuid/setgid/setcap helpers. Many operations can use
caching for speed. The action to take is specified by a sub-command.

Options that print brief information and then exit:

-h, --help
    Print help and exit successfully. If specified before the
    sub-command, print general help and list of sub-commands; if
    after the sub-command, print help specific to that sub-command.

--dependencies
    Report dependency problems on standard output, if any, and
    exit. If all is well, there is no output and the exit is
    successful; in case of problems, the exit is unsuccessful.

--version
    Print version number and exit successfully.

Common options placed before or after the sub-command:

-a, --arch ARCH
    Use ARCH for architecture-aware registry operations. (See
    section “Architecture” below for details.)

--always-download
    Download all files when pulling, even if they are already in
    builder storage. Note that ch-image pull will always retrieve
    the most up-to-date image; this option is mostly for debugging.

--auth
    Authenticate with the remote repository, then (if successful)
    make all subsequent requests in authenticated mode. For most
    subcommands, the default is to never authenticate, i.e., make
    all requests anonymously. The exception is push, which
    implies --auth.

--cache
    Enable build cache. Default if a sufficiently new Git is
    available.

--cache-large SIZE
    Set the cache’s large file threshold to SIZE MiB, or 0 for no
    large files, which is the default. This can speed up some
    builds. Experimental. See section “Build cache” for details.

--no-cache
    Disable build cache. Default if a sufficiently new Git is not
    available. This option turns off the cache completely; if
    you want to re-execute a Dockerfile and store the new results
    in cache, use --rebuild instead.

--no-lock
    Disable storage directory locking. This lets you run as many
    concurrent ch-image instances as you want against the same
    storage directory, which risks corruption but may be OK for
    some workloads.

--profile
    Dump profile to files /tmp/chofile.p (cProfile dump format)
    and /tmp/chofile.txt (text summary). You can convert the
    former to a PDF call graph with gprof2dot -f pstats
    /tmp/chofile.p | dot -Tpdf -o /tmp/chofile.pdf. This excludes
    time spent in subprocesses. Profile data should still be
    written on fatal errors, but not if the program crashes.

--rebuild
    Execute all instructions, even if they are build cache hits,
    except for FROM which is retrieved from cache on hit.

--password-many
    Re-prompt the user every time a registry password is needed.

-s, --storage DIR
    Set the storage directory (see below for important details).

--tls-no-verify
    Don’t verify TLS certificates of the repository. (Do not use
    this option unless you understand the risks.)

-v, --verbose
    Print extra chatter; can be repeated.

ARCHITECTURE
Charliecloud provides the option --arch ARCH to specify the architecture
for architecture-aware registry operations. The argument ARCH can
be: (1) yolo, to bypass architecture-aware code and use the registry’s
default architecture; (2) host, to use the host’s architecture,
obtained with the equivalent of uname -m (default if --arch not
specified); or (3) an architecture name. If the specified architecture is
not available, the error message will list which ones are.

Notes:

1. ch-image is limited to one image per image reference in builder
   storage at a time, regardless of architecture. For example, if you
   say ch-image pull --arch=foo baz and then ch-image pull --arch=bar
   baz, builder storage will contain one image called “baz”, with
   architecture “bar”.

2. Images’ default architecture is usually amd64, so this is usually
   what you get with --arch=yolo. Similarly, if a registry image is
   architecture-unaware, it will still be pulled with --arch=amd64 and
   --arch=host on x86-64 hosts (other host architectures must specify
   --arch=yolo to pull architecture-unaware images).

3. uname -m and image registries often use different names for the same
   architecture. For example, what uname -m reports as “x86_64” is
   known to registries as “amd64”. --arch=host should translate if
   needed, but it’s useful to know this is happening. Directly
   specified architecture names are passed to the registry without
   translation.

4. Registries treat architecture as a pair of items, architecture and
   sometimes variant (e.g., “arm” and “v7”). Charliecloud treats
   architecture as a simple string and converts to/from the registry view
   transparently.
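
Notes 3 and 4 can be illustrated with a minimal Python sketch. This is
not Charliecloud's code: the translation table, the function names, and
the "architecture/variant" spelling of the simple string are all
assumptions made for the example.

```python
import platform

# Hypothetical table from `uname -m` names to registry-style names.
# Only the x86_64 -> amd64 pair comes from the text above; the rest
# are illustrative assumptions.
UNAME_TO_REGISTRY = {
    "x86_64": "amd64",
    "aarch64": "arm64/v8",
    "armv7l": "arm/v7",
}

def host_arch(machine=None):
    """Translate a `uname -m` style name to a registry-style name."""
    machine = machine or platform.machine()
    # Names without a table entry are passed through untranslated.
    return UNAME_TO_REGISTRY.get(machine, machine)

def split_arch(arch):
    """Split a simple string such as "arm/v7" into (architecture, variant)."""
    name, _, variant = arch.partition("/")
    return name, (variant or None)

def join_arch(name, variant=None):
    """Inverse of split_arch(): build the simple string again."""
    return name if variant is None else f"{name}/{variant}"
```

For instance, split_arch("arm/v7") gives the pair a registry would
expect, while split_arch("amd64") has no variant at all.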

AUTHENTICATION
Charliecloud does not have configuration files; thus, it has no separate
login subcommand to store secrets. Instead, Charliecloud will
prompt for a username and password when authentication is needed. Note
that some repositories refer to the secret as something other than a
“password”; e.g., GitLab calls it a “personal access token (PAT)”, Quay
calls it an “application token”, and NVIDIA NGC calls it an “API token”.

For non-interactive authentication, you can use environment variables
CH_IMAGE_USERNAME and CH_IMAGE_PASSWORD. Only do this if you fully
understand the implications for your specific use case, because it is
difficult to securely store secrets in environment variables.

By default for most subcommands, all registry access is anonymous. To
instead use authenticated access for everything, specify --auth or set
the environment variable $CH_IMAGE_AUTH=yes. The exception is push,
which always runs in authenticated mode. Even for pulling public
images, it can be useful to authenticate for registries that have
per-user rate limits, such as Docker Hub. (Older versions of
Charliecloud started with anonymous access, then tried to upgrade to
authenticated if it seemed necessary. However, this turned out to be
brittle; see issue #1318.)

The username and password are remembered for the life of the process
and silently re-offered to the registry if needed. One case when this
happens is on push to a private registry: many registries will first
offer a read-only token when ch-image checks if something exists, then
re-authenticate when upgrading the token to read-write for upload. If
your site uses one-time passwords such as provided by a security
device, you can specify --password-many to provide a new secret each
time.

These values are not saved persistently, e.g. in a file. Note that we
do use normal Python variables for this information, without pinning
them into physical RAM with mlock(2) or any other special treatment, so
we cannot guarantee they will never reach non-volatile storage.

Technical details

    Most registries use something called Bearer authentication,
    where the client (e.g., Charliecloud) includes a token in the
    headers of every HTTP request.

    The authorization dance is different from the typical UNIX
    approach, where there is a separate login sequence before any
    content requests are made. The client starts by simply making
    the HTTP request it wants (e.g., to GET an image manifest),
    and if the registry doesn’t like the client’s token
    (or if there is no token because the client doesn’t have one
    yet), it replies with HTTP 401 Unauthorized, but crucially it
    also provides instructions in the response header on how to
    get a token. The client then follows those instructions,
    obtains a token, re-tries the request, and (hopefully) all is
    well. This approach also allows a client to upgrade a token
    if needed, e.g. when transitioning from asking if a layer
    exists to uploading its content.

    The distinction between Charliecloud’s anonymous and
    authenticated modes is that it only asks for anonymous tokens
    in anonymous mode and authenticated tokens in authenticated
    mode. That is, anonymous mode does involve an authentication
    procedure to obtain a token, but this “authentication”
    is done anonymously. (Yes, it’s confusing.)

    Registries also often reply HTTP 401 when an image does not
    exist, rather than the seemingly more correct HTTP 404 Not
    Found. This is to avoid information leakage about the
    existence of images the client is not allowed to pull, and it’s
    why Charliecloud never says an image simply does not exist.
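
The “instructions in the response header” usually arrive as a
WWW-Authenticate challenge. The following Python sketch shows one way a
client might turn such a challenge into a token request URL. It is an
illustration only, not Charliecloud's implementation: the function
names are hypothetical, and the realm/service/scope parameter names
follow the common registry token convention, which a given registry may
not use.

```python
import re
from urllib.parse import urlencode

def parse_bearer_challenge(header):
    """Split a WWW-Authenticate value into (scheme, parameter dict)."""
    scheme, _, rest = header.partition(" ")
    # Collect key="value" pairs; quoted values may contain commas.
    params = dict(re.findall(r'(\w+)="([^"]*)"', rest))
    return scheme, params

def token_request_url(params):
    """Build the URL to GET a token from, per the parsed challenge."""
    query = {k: v for k, v in params.items() if k in ("service", "scope")}
    return params["realm"] + "?" + urlencode(query)

# Hypothetical challenge, as might accompany an HTTP 401 reply.
challenge = ('Bearer realm="https://auth.example.com/token",'
             'service="registry.example.com",'
             'scope="repository:foo/bar:pull"')
scheme, params = parse_bearer_challenge(challenge)
url = token_request_url(params)
```

In this sketch, an anonymous client would simply GET the resulting URL,
while an authenticated client would send the same request with
credentials attached; upgrading a token amounts to repeating the
procedure with the new scope the registry asks for.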

STORAGE DIRECTORY
ch-image maintains state using normal files and directories located in
its storage directory; contents include various caches and temporary
images used for building.

In descending order of priority, this directory is located at:

-s, --storage DIR
    Command line option.

$CH_IMAGE_STORAGE
    Environment variable. The path must be absolute, because the
    variable is likely set in a very different context than the one
    where it’s used, which makes it unclear what a relative path
    would be relative to.

/var/tmp/$USER.ch
    Default. (Previously, the default was /var/tmp/$USER/ch-image.
    If a valid storage directory is found at the old default
    path, ch-image tries to move it to the new default path.)
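
The priority order above can be sketched in a few lines of Python. This
is a hedged illustration, not the actual ch-image code: the function
name is hypothetical, the user name is read from $USER for simplicity,
and the migration from the old default path is left out.

```python
import os

def storage_dir(cli_storage=None, environ=None):
    """Pick the storage directory: option, then variable, then default."""
    environ = os.environ if environ is None else environ
    if cli_storage is not None:
        return cli_storage                      # -s, --storage DIR
    env = environ.get("CH_IMAGE_STORAGE")
    if env:
        if not os.path.isabs(env):
            # A relative path would be ambiguous, so refuse it.
            raise ValueError("$CH_IMAGE_STORAGE must be absolute: " + env)
        return env
    return "/var/tmp/%s.ch" % environ["USER"]   # default location
```

For example, storage_dir(None, {"USER": "alice"}) falls through to the
default and yields /var/tmp/alice.ch.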

Unlike many container implementations, there is no notion of storage
drivers, graph drivers, etc., to select and/or configure.

The storage directory can reside on any single filesystem (i.e., it
cannot be split across multiple filesystems). However, it contains lots
of small files and metadata traffic can be intense. For example, the
Charliecloud test suite uses approximately 400,000 files and
directories in the storage directory as of this writing. Place it on a
filesystem appropriate for this; tmpfs’es such as /var/tmp are a good
choice if you have enough RAM (/tmp is not recommended because ch-run
bind-mounts it into containers by default).

While you can currently poke around in the storage directory and find
unpacked images runnable with ch-run, this is not a supported use case.
The supported workflow uses ch-convert to obtain a packed image; see
the tutorial for details.

The storage directory format changes on no particular schedule.
ch-image is normally able to upgrade directories produced by a given
Charliecloud version up to one year after that version’s release. Upgrades
outside this window and downgrades are not supported. In these cases,
ch-image will refuse to run until you delete and re-initialize the
storage directory with ch-image reset.

WARNING:
    Network filesystems, especially Lustre, are typically bad choices
    for the storage directory. This is a site-specific question and your
    local support will likely have strong opinions.

BUILD CACHE
Overview
Subcommands that create images, such as build and pull, can use a build
cache to speed repeated operations. That is, an image is created by
starting from the empty image and executing a sequence of instructions,
largely Dockerfile instructions but also some others like “pull” and
“import”. Some instructions are expensive to execute (e.g., RUN wget
http://slow.example.com/bigfile or transferring data billed by the
byte), so it’s often cheaper to retrieve their results from cache
instead.

The build cache uses a relatively new Git under the hood; see the
installation instructions for version requirements. Charliecloud
implements workarounds for Git’s various storage limitations, so things like
file metadata and Git repositories within the image should work.
Important exception: No files named .git* or other Git metadata are
permitted in the image’s root directory.

The cache has three modes: enabled, disabled, and a hybrid mode called
rebuild, where the cache is fully enabled for FROM instructions but all
other operations re-execute and re-cache their results. The purpose of
rebuild is to do a clean rebuild of a Dockerfile atop a known-good base
image.

Enabled mode is selected with --cache or by setting $CH_IMAGE_CACHE to
enabled, disabled mode with --no-cache or the value disabled, and
rebuild mode with --rebuild or the value rebuild. The default mode is
enabled if an appropriate Git is installed, otherwise disabled.

Compared to other implementations
Other container implementations typically use build caches based on
overlayfs, or fuse-overlayfs in unprivileged situations (configured via
a “storage driver”). This works by creating a new tmpfs for each
instruction, layered atop the previous instruction’s tmpfs using
overlayfs. Each layer can then be tarred up separately to form a tar-based
diff.

The Git-based cache has two advantages over the overlayfs approach.
First, kernel-mode overlayfs is only available unprivileged in Linux
5.11 and higher, forcing the use of fuse-overlayfs and its accompanying
FUSE overhead for unprivileged use cases. Second, Git de-duplicates and
compresses files in a fairly sophisticated way across the entire build
cache, not just between image states with an ancestry relationship
(detailed in the next section).

A disadvantage is lowered performance in some cases. Preliminary
experiments suggest this performance penalty is relatively modest, and
sometimes Charliecloud is actually faster than alternatives. We have
ongoing experiments to answer this performance question in more detail.

De-duplication and garbage collection
Charliecloud’s build cache takes advantage of Git’s file de-duplication
features. This operates across the entire build cache, i.e., files are
de-duplicated no matter where in the cache they are found or the
relationship between their container images. Files are de-duplicated at
different times depending on whether they are identical or merely
similar.

Identical files are de-duplicated at git add time; in ch-image build
terms, that’s upon committing a successful instruction. That is, it’s
impossible to store two files with the same content in the build cache.
If you try (say, with RUN yum install -y foo in one Dockerfile and RUN
yum install -y foo bar in another, which are different instructions but
both install RPM foo’s files), the content is stored once and each copy
gets its own metadata and a pointer to the content, much like
filesystem hard links.
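
The reason identical files cannot be stored twice is that Git names
each stored file by a hash of its content. The Python sketch below
computes that name the way Git does for its default SHA-1 object
format, then uses a toy store (the ToyObjectStore class is purely
illustrative and unrelated to how ch-image drives Git) to show two
images sharing one stored copy.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Object ID Git gives to file content (SHA-1 object format)."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

class ToyObjectStore:
    """Toy content-addressed store: content once, metadata per copy."""

    def __init__(self):
        self.objects = {}  # object ID -> content
        self.entries = {}  # (image, path) -> (mode, object ID)

    def add(self, image, path, mode, content):
        oid = git_blob_id(content)
        self.objects.setdefault(oid, content)  # no-op if already stored
        self.entries[(image, path)] = (mode, oid)
        return oid

store = ToyObjectStore()
a = store.add("image1", "usr/bin/foo", 0o755, b"same bytes\n")
b = store.add("image2", "usr/bin/foo", 0o755, b"same bytes\n")
```

After both calls the store holds two entries but a single object,
which is the hard-link-like arrangement described above.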

Similar files, however, are only de-duplicated during Git’s garbage
collection process. When files are initially added to a Git repository
(with git add), they are stored inside the repository as (possibly
compressed) individual files, called objects in Git jargon. Upon garbage
collection, which happens both automatically when certain parameters
are met and explicitly with git gc, these files are archived and
(re-)compressed together into a single file called a packfile. Also,
existing packfiles may be re-written into the new one.

During this process, similar files are identified, and each set of
similar files is stored as one base file plus diffs to recover the others.
(Similarity detection seems to be based primarily on file size.) This
delta process is agnostic to alignment, which is an advantage over
alignment-sensitive block-level de-duplicating filesystems. Exception:
“Large” files are not compressed or de-duplicated. We use the Git
default threshold of 512 MiB (as of this writing).

Charliecloud runs Git garbage collection at two different times. First,
a lighter-weight garbage collection pass runs automatically when the
number of loose files (objects) grows beyond a limit. This limit is in
flux as we learn more about build cache performance, but it’s quite a
bit higher than the Git default. This garbage collection runs in the
background and can continue after the build completes; you may see Git
processes using a lot of CPU.

An important limitation of the automatic garbage collection is that
large packfiles (again, this is in flux, but it’s several GiB) will not
be re-packed, limiting the scope of similar file detection. To address
this, a heavier garbage collection can be run manually with ch-image
build-cache --gc. This will re-pack (and re-write) the entire build
cache, de-duplicating all similar files. In both cases, garbage
collection uses all available cores.

ch-image build-cache prints the specific garbage collection parameters
in use, and -v can be added for more detail.

Large file threshold
Because Git uses content-addressed storage, upon commit, it must read
in full all files modified by an instruction. This I/O cost can be a
significant fraction of build time for some large images. Regular files
larger than the experimental large file threshold are stored outside
the Git repository, somewhat like Git Large File Storage. ch-image
uses hard links to bring large files in and out of images as needed,
which is a fast metadata operation that ignores file content.

Option --cache-large sets the threshold in MiB; if not set, environment
variable CH_IMAGE_CACHE_LARGE is used; if that is not set either, the
default value 0 indicates that no files are considered large.
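
A minimal Python sketch of this precedence follows. The function names
are hypothetical and the real option parsing in ch-image may differ;
the sketch also assumes “larger than” means strictly greater than the
threshold.

```python
import os

MIB = 1024 * 1024

def cache_large_threshold(cli_value=None, environ=None):
    """Threshold in MiB: option, then environment variable, then 0."""
    environ = os.environ if environ is None else environ
    if cli_value is not None:
        return int(cli_value)                  # --cache-large SIZE
    env = environ.get("CH_IMAGE_CACHE_LARGE")
    if env is not None:
        return int(env)
    return 0                                   # no file counts as large

def is_large(size_bytes, threshold_mib):
    """True if a regular file of this size would bypass the Git repo."""
    return threshold_mib > 0 and size_bytes > threshold_mib * MIB
```

With the default threshold of 0, is_large() is false for every file,
so everything goes through Git as usual.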

There are two trade-offs. First, large files in any image with the same
path, mode, size, and mtime (to nanosecond precision if possible) are
considered identical, even if their content is not actually identical;
e.g., touch(1) shenanigans can corrupt an image. Second, every version
of a large file is stored verbatim and uncompressed (e.g., a large file
with a one-byte change will be stored in full twice), and large files
do not participate in the build cache’s de-duplication, so more storage
space will likely be used. Unused versions are deleted by ch-image
build-cache --gc.
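
The first trade-off is easy to reproduce. The Python sketch below
builds an identity key from exactly the four attributes named above
(the LargeFileKey type and function names are hypothetical, not
ch-image internals) and shows two files with different content
receiving the same key once their timestamps are forced to match.

```python
import os
import tempfile
from collections import namedtuple

LargeFileKey = namedtuple("LargeFileKey", "path mode size mtime_ns")

def large_file_key(image_path, st):
    """Identity of a large file: metadata only, content is never read."""
    return LargeFileKey(image_path, st.st_mode, st.st_size, st.st_mtime_ns)

def demo():
    """Create same-size files with different content in two fake images."""
    keys = []
    with tempfile.TemporaryDirectory() as top:
        for image, content in (("img1", b"AAAA"), ("img2", b"BBBB")):
            os.mkdir(os.path.join(top, image))
            path = os.path.join(top, image, "data.bin")
            with open(path, "wb") as f:
                f.write(content)
            os.chmod(path, 0o644)
            # Force identical timestamps, as touch(1) could.
            os.utime(path, ns=(1_000_000_000, 1_000_000_000))
            keys.append(large_file_key("data.bin", os.stat(path)))
    return keys

key1, key2 = demo()
```

Here key1 == key2 even though the bytes differ, so a cache keyed this
way would hand back the wrong file.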

(Note that Git has an unrelated setting called core.bigFileThreshold.)

Example
Suppose we have this Dockerfile:

    $ cat a.df
    FROM alpine:3.17
    RUN echo foo
    RUN echo bar

On our first build, we get:

    $ ch-image build -t foo -f a.df .
    1. FROM alpine:3.17
    [ ... pull chatter omitted ... ]
    2. RUN echo foo
    copying image ...
    foo
    3. RUN echo bar
    bar
    grown in 3 instructions: foo

Note the dot after each instruction’s line number. This means that the
instruction was executed. You can also see this by the output of the
two echo commands.

But on our second build, we get:

    $ ch-image build -t foo -f a.df .
    1* FROM alpine:3.17
    2* RUN echo foo
    3* RUN echo bar
    copying image ...
    grown in 3 instructions: foo

Here, instead of being executed, each instruction’s results were
retrieved from cache. (Charliecloud uses lazy retrieval; nothing is
actually retrieved until the end, as seen by the “copying image” message.)
Cache hit for each instruction is indicated by an asterisk (*) after
the line number. Even for such a small and short Dockerfile, this
build is noticeably faster than the first.

We can also try a second, slightly different Dockerfile. Note that the
first two instructions are the same, but the third is different:

    $ cat c.df
    FROM alpine:3.17
    RUN echo foo
    RUN echo qux
    $ ch-image build -t c -f c.df .
    1* FROM alpine:3.17
    2* RUN echo foo
    3. RUN echo qux
    copying image ...
    qux
    grown in 3 instructions: c

Here, the first two instructions are hits from the first Dockerfile,
but the third is a miss, so Charliecloud retrieves the state after the
second instruction and continues building from there.
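
The hit/miss pattern in these transcripts can be modeled with a short
Python sketch. It assumes each cached state is identified by a hash of
its parent state's identifier plus the instruction text; that is a
simplification chosen for illustration, and the helper names are
hypothetical rather than taken from ch-image.

```python
import hashlib

def state_id(parent_id, instruction):
    """Identifier of the state reached by running instruction on parent."""
    data = (parent_id + "\n" + instruction).encode()
    return hashlib.sha256(data).hexdigest()

def build(instructions, cache):
    """Return "*" for each cache hit and "." for each executed instruction."""
    markers = []
    sid = "ROOT"
    for inst in instructions:
        sid = state_id(sid, inst)
        if sid in cache:
            markers.append("*")
        else:
            cache.add(sid)  # stand-in for executing and committing
            markers.append(".")
    return markers

cache = set()
a_df = ["FROM alpine:3.17", "RUN echo foo", "RUN echo bar"]
c_df = ["FROM alpine:3.17", "RUN echo foo", "RUN echo qux"]
first = build(a_df, cache)
second = build(a_df, cache)
third = build(c_df, cache)
```

Because every identifier depends on its parent, a miss on one
instruction makes all later instructions in that build miss as well.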

We can also inspect the cache:

    $ ch-image build-cache --tree
    * (c) RUN echo qux
    | * (foo) RUN echo bar
    |/
    * RUN echo foo
    * (alpine+3.17) PULL alpine:3.17
    * (HEAD -> root) ROOT

    named images: 4
    state IDs: 5
    commits: 5
    files: 317
    disk used: 3 MiB

Here there are four named images: foo and c that we built, the base image
alpine:3.17 (written as alpine+3.17 because colon is not allowed in Git
branch names), and the empty base of everything, root. Also note how foo
and c diverge after the last common instruction RUN echo foo.

BUILD
Build an image from a Dockerfile and put it in the storage directory.

Synopsis
    $ ch-image [...] build [-t TAG] [-f DOCKERFILE] [...] CONTEXT

Description
Uses ch-run -w -u0 -g0 --no-passwd --unsafe to execute RUN
instructions. Note that FROM implicitly pulls the base image if needed, so you
may want to read about the pull subcommand below as well.

Required argument:

CONTEXT
    Path to context directory. This is the root of COPY
    instructions in the Dockerfile. If a single hyphen (-) is specified:
    (a) read the Dockerfile from standard input, (b) specifying
    --file is an error, and (c) there is no context, so COPY will
    fail. (See --file for how to provide the Dockerfile on
    standard input while also having a context.)

Options:

-b, --bind SRC[:DST]
    For RUN instructions only, bind-mount SRC at guest DST. The
    default destination if not specified is to use the same path
    as the host; i.e., the default is equivalent to
    --bind=SRC:SRC. If DST does not exist, try to create it as an
    empty directory, though images do have ten directories
    /mnt/[0-9] already available as mount points. Can be
    repeated.

    Note: See documentation for ch-run --bind for important
    caveats and gotchas.

    Note: Other instructions that modify the image filesystem,
    e.g. COPY, can only access host files from the context
    directory, regardless of this option.

--build-arg KEY[=VALUE]
    Set build-time variable KEY defined by ARG instruction to
    VALUE. If VALUE not specified, use the value of environment
    variable KEY.

-f, --file DOCKERFILE
    Use DOCKERFILE instead of CONTEXT/Dockerfile. If a single
    hyphen (-) is specified, read the Dockerfile from standard
    input; like docker build, the context directory is still
    available in this case.

--force[=MODE]
    Use unprivileged build workarounds of mode MODE, which can be
    fakeroot or seccomp (the default). See section “Privilege
    model” below for details on what this does and when you might
    need it.

--force-cmd=CMD,ARG1[,ARG2...]
    If command CMD is found in a RUN instruction, add the
    comma-separated ARGs to it. For example,
    --force-cmd=foo,-a,--bar=baz would transform RUN foo -c into
    RUN foo -a --bar=baz -c. This is intended to suppress
    validation that defeats --force=seccomp and implies that option.
    Can be repeated. If specified, replaces (does not extend) the
    default suppression options. Literal commas can be escaped
    with backslash; importantly however, backslash will need to
    be protected from the shell also. Section “Privilege model”
    below explains why you might need this.

-n, --dry-run
    Don’t actually execute any Dockerfile instructions.

--parse-only
    Stop after parsing the Dockerfile.

-t, --tag TAG
    Name of image to create. If not specified, infer the name:

    1. If the Dockerfile is named Dockerfile with an extension: use
       the extension with invalid characters stripped, e.g.
       Dockerfile.@FOO.bar → foo.bar.

    2. If the Dockerfile has extension dockerfile: use the basename
       with the same transformation, e.g. baz.@QUX.dockerfile →
       baz.qux.

    3. If the context directory is not /: use its name, i.e. the last
       component of the absolute path to the context directory,
       with the same transformation.

    4. Otherwise (context directory is /): use root.

    If no colon is present in the name, append :latest.
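
The four rules can be sketched in Python as follows. The text does not
spell out which characters count as invalid, so this sketch assumes
lowercasing and keeping only letters, digits, dot, underscore, and
hyphen, which reproduces both examples above. The function names are
hypothetical and this is not the ch-image implementation.

```python
import os
import re

def _clean(name):
    """Assumed transformation: lowercase, drop other characters."""
    return re.sub(r"[^a-z0-9._-]", "", name.lower())

def infer_tag(dockerfile, context):
    """Infer an image name when -t is not given."""
    base = os.path.basename(dockerfile)
    stem, ext = os.path.splitext(base)
    prefix = "Dockerfile."
    if base.startswith(prefix) and len(base) > len(prefix):
        name = _clean(base[len(prefix):])           # rule 1
    elif ext == ".dockerfile" and stem:
        name = _clean(stem)                         # rule 2
    else:
        ctx = os.path.abspath(context)
        if ctx != "/":
            name = _clean(os.path.basename(ctx))    # rule 3
        else:
            name = "root"                           # rule 4
    return name if ":" in name else name + ":latest"
```

For example, infer_tag("Dockerfile", "/") falls through to rule 4 and
yields root:latest.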

Privilege model
Overview
ch-image is a fully unprivileged image builder. It does not use any
setuid or setcap helper programs, and it does not use configuration files
/etc/subuid or /etc/subgid. This contrasts with the “rootless” or
“fakeroot” modes of some competing builders, which do require privileged
supporting code or utilities.

Without workarounds provided by --force, this approach does confuse
programs that expect to have real root privileges, most notably
distribution package installers. This subsection describes why that happens
and what you can do about it.

ch-image executes all instructions as the normal user who invokes it.
For RUN, this is accomplished with ch-run arguments including -w
--uid=0 --gid=0. That is, your host EUID and EGID are both mapped to
zero inside the container, and only one UID (zero) and GID (zero) are
available inside the container. Under this arrangement, processes
running in the container for each RUN appear to be running as root, but
many privileged system calls will fail without the workarounds
described below. This affects any fully unprivileged container build,
not just Charliecloud.

The most common time to see this is installing packages. For example,
here is RPM failing to chown(2) a file, which makes the package update
fail:

    Updating : 1:dbus-1.10.24-13.el7_6.x86_64 2/4
    Error unpacking rpm package 1:dbus-1.10.24-13.el7_6.x86_64
    error: unpacking of archive failed on file /usr/libexec/dbus-1/dbus-daemon-launch-helper;5cffd726: cpio: chown
    Cleanup : 1:dbus-libs-1.10.24-12.el7.x86_64 3/4
    error: dbus-1:1.10.24-13.el7_6.x86_64: install failed

This one is (ironically) apt-get failing to drop privileges:

    E: setgroups 65534 failed - setgroups (1: Operation not permitted)
    E: setegid 65534 failed - setegid (22: Invalid argument)
    E: seteuid 100 failed - seteuid (22: Invalid argument)
    E: setgroups 0 failed - setgroups (1: Operation not permitted)

Charliecloud provides two different mechanisms to avoid these problems.
Both involve lying to the containerized process about privileged system
calls, but at very different levels of complexity.

Workaround mode fakeroot
This mode uses fakeroot(1) to maintain an elaborate web of deceit that
is internally consistent. This program intercepts both privileged
system calls (e.g., setuid(2)) as well as other system calls whose return
values depend on those calls (e.g., getuid(2)), faking success for
privileged system calls (perhaps making no system call at all) and
altering return values to be consistent with earlier fake success.
Charliecloud automatically installs the fakeroot(1) program inside the
container and then wraps RUN instructions having known privilege needs
with it. Thus, this mode is only available for certain distributions.

The advantage of this mode is its consistency; e.g., careful programs
that check the new UID after attempting to change it will not notice
anything amiss. Its disadvantage is complexity: detailed knowledge and
procedures for multiple Linux distributions.

This mode has three basic steps:

1. After FROM, analyze the image to see what distribution it
   contains, which determines the specific workarounds.

2. Before the user command in the first RUN instruction where the
   injection seems needed, install fakeroot(1) in the image, if one
   is not already installed, as well as any other necessary
   initialization commands. For example, we turn off the apt sandbox (for
   Debian Buster) and configure EPEL but leave it disabled (for
   CentOS/RHEL).

3. Prepend fakeroot to RUN instructions that seem to need it, e.g.
   ones that contain apt, apt-get, dpkg for Debian derivatives and
   dnf, rpm, or yum for RPM-based distributions.

RUN instructions that do not seem to need modification are unaffected
by this mode.

The details are specific to each distribution. ch-image analyzes image
content (e.g., grepping /etc/debian_version) to select a configuration;
see lib/force.py for details. ch-image prints exactly what it is doing.

WARNING:
    Because of fakeroot mode’s complexity, we plan to remove it if
    seccomp mode performs well enough. If you have a situation where
    fakeroot mode works and seccomp does not, please let us know.

Workaround mode seccomp (default)
This mode uses the kernel’s seccomp(2) system call filtering to
intercept certain privileged system calls, do absolutely nothing, and return
success to the program.

The quashed system calls are: capset(2); chown(2) and friends; mknod(2)
and mknodat(2); and setuid(2), setgid(2), and setgroups(2) along with
the other system calls that change user or group.

The advantages of this approach are that it’s much simpler, it’s faster,
it’s completely agnostic to libc, and it’s mostly agnostic to
distribution. The disadvantage is that it’s a very lazy liar; even the most
cursory consistency checks will fail, e.g., getuid(2) after setuid(2).
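
The difference between the two kinds of lying can be shown with a toy
Python model. These classes only imitate the observable behavior
described in this section; they make no real system calls, and all
names in the sketch are hypothetical.

```python
class SeccompStyle:
    """Lazy liar: setuid reports success but changes nothing."""

    def __init__(self, uid=0):
        self._uid = uid

    def setuid(self, uid):
        return 0  # fake success; the call is simply dropped

    def getuid(self):
        return self._uid  # honest answer, unaware of the earlier lie


class FakerootStyle:
    """Consistent liar: remembers the fake change for later queries."""

    def __init__(self, uid=0):
        self._uid = uid

    def setuid(self, uid):
        self._uid = uid  # record the pretend UID
        return 0

    def getuid(self):
        return self._uid  # agrees with the earlier fake success


def drop_privileges_ok(calls, target_uid):
    """What a careful program does: change UID, then verify it."""
    calls.setuid(target_uid)
    return calls.getuid() == target_uid
```

A careful program modeled by drop_privileges_ok() is satisfied by
FakerootStyle but detects the inconsistency under SeccompStyle.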

While this mode does not provide consistency, it does offer a hook to
help prevent programs asking for consistency. For example, apt-get -o
APT::Sandbox::User=root will prevent apt-get from attempting to drop
privileges, which it verifies, exiting with failure if the correct IDs
are not found (which they won’t be under this approach). This can be
expressed with --force-cmd=apt-get,-o,APT::Sandbox::User=root, though
this particular case is built-in and does not need to be specified. The
full default configuration, which is applied regardless of the image
distribution, can be examined in the source file force.py. If any
--force-cmd are specified, this replaces (rather than extends) the
default configuration.

Note that because the substitutions are a simple regex with no
knowledge of shell syntax, they can cause unwanted modifications. For
example, RUN apt-get install -y apt-get will be run as /bin/sh -c "apt-get
-o APT::Sandbox::User=root install -y apt-get -o
APT::Sandbox::User=root". One workaround is to add escape syntax that is
transparent to the shell but defeats the regex match; e.g., RUN apt-get
install -y a\pt-get.
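
To make the pitfall concrete, here is a Python sketch of a
--force-cmd style substitution. The pattern used (match CMD wherever it
appears as a whitespace-delimited word) is an assumption chosen to
reproduce the behavior described above; the regex that ch-image
actually uses is in its source, and the function names below are
hypothetical.

```python
import re

def parse_force_cmd(spec):
    """Split "CMD,ARG1[,ARG2...]" on commas not escaped by backslash."""
    parts = [p.replace("\\,", ",") for p in re.split(r"(?<!\\),", spec)]
    return parts[0], parts[1:]

def apply_force_cmd(run_cmd, spec):
    """Append the ARGs after every whitespace-delimited CMD in run_cmd."""
    cmd, args = parse_force_cmd(spec)
    replacement = " ".join([cmd] + args)
    pattern = r"(?<!\S)" + re.escape(cmd) + r"(?!\S)"
    # A function replacement avoids backslash processing in the ARGs.
    return re.sub(pattern, lambda m: replacement, run_cmd)

SPEC = "apt-get,-o,APT::Sandbox::User=root"
plain = apply_force_cmd("foo -c", "foo,-a,--bar=baz")
doubled = apply_force_cmd("apt-get install -y apt-get", SPEC)
escaped = apply_force_cmd("apt-get install -y a\\pt-get", SPEC)
```

In this sketch, doubled receives the extra arguments twice because the
package name looks just like the command, while escaped is modified
only once: a backslash inside the second word keeps the pattern from
matching, and the shell later removes that backslash.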

This mode executes all RUN instructions with the seccomp(2) filter and
has no knowledge of which instructions actually used the intercepted
system calls. Therefore, the printed “instructions modified” number is
only a count of instructions with a hook applied as described above.

Compatibility with other Dockerfile interpreters
ch-image is an independent implementation and shares no code with other
Dockerfile interpreters. It uses a formal Dockerfile parsing grammar
developed from the Dockerfile reference documentation and miscellaneous
other sources, which you can examine in the source code.

We believe this independence is valuable for several reasons. First, it
helps the community examine Dockerfile syntax and semantics critically,
think rigorously about what is really needed, and build a more robust
standard. Second, it yields disjoint sets of bugs (note that Podman,
Buildah, and Docker all share the same Dockerfile parser). Third,
because it is a much smaller code base, it illustrates how Dockerfiles
work more clearly. Finally, it allows straightforward extensions if
needed to support scientific computing.

ch-image tries hard to be compatible with Docker and other
interpreters, though as an independent implementation, it is not
bug-compatible.

The following subsections describe differences from the Dockerfile
reference that we expect to be approximately permanent. For
not-yet-implemented features and bugs in this area, see related issues on GitHub.

None of these are set in stone. We are very interested in feedback on
our assessments and open questions. This helps us prioritize new
features and revise our thinking about what is needed for HPC containers.

Context directory
The context directory is bind-mounted into the build, rather than
copied like Docker. Thus, the size of the context is immaterial, and
the build reads directly from storage like any other local process
would. However, you still can’t access anything outside the context
directory.

Variable substitution
Variable substitution happens for all instructions, not just the ones
listed in the Dockerfile reference.

ARG and ENV cause cache misses upon definition, in contrast with Docker
where these variables miss upon use, except for certain cache-excluded
variables that never cause misses, listed below.
731
732 Note that ARG and ENV have different syntax despite very similar seman‐
733 tics.
734
735 ch-image passes the following proxy environment variables in to the
736 build. Changes to these variables do not cause a cache miss. They do
737 not require an ARG instruction, as documented in the Dockerfile refer‐
738 ence. Unlike Docker, they are available if the same-named environment
739 variable is defined; --build-arg is not required.
740
741 HTTP_PROXY
742 http_proxy
743 HTTPS_PROXY
744 https_proxy
745 FTP_PROXY
746 ftp_proxy
747 NO_PROXY
748 no_proxy
749
750 In addition to those listed in the Dockerfile reference, these environ‐
751 ment variables are passed through in the same way:
752
753 SSH_AUTH_SOCK
754 USER
755
756 Finally, these variables are also pre-defined but are unrelated to the
757 host environment:
758
759 PATH=/ch/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
760 TAR_OPTIONS=--no-same-owner
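
As a rough model of the rules in this subsection, the following Python sketch assembles the starting environment for a build from the three groups of variables. The function name initial_build_env is hypothetical and the logic is an assumption about the overall effect, not ch-image’s implementation.

```python
# Proxy variables listed in the Dockerfile reference.
PROXY_VARS = ["HTTP_PROXY", "http_proxy", "HTTPS_PROXY", "https_proxy",
              "FTP_PROXY", "ftp_proxy", "NO_PROXY", "no_proxy"]
# Additional pass-through variables.
EXTRA_VARS = ["SSH_AUTH_SOCK", "USER"]
# Pre-defined variables, unrelated to the host environment.
PREDEFINED = {
    "PATH": "/ch/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
    "TAR_OPTIONS": "--no-same-owner",
}


def initial_build_env(host_env):
    """Pre-defined values plus any pass-through variables set on the host."""
    env = dict(PREDEFINED)
    for name in PROXY_VARS + EXTRA_VARS:
        if name in host_env:
            env[name] = host_env[name]
    return env


# Example host environment (made-up values).
host = {"http_proxy": "http://proxy.example:3128", "HOME": "/home/alice",
        "PATH": "/some/host/path"}
print(initial_build_env(host))
```

In this model, the host’s HOME and PATH are not carried into the build, while http_proxy is available without --build-arg.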
761
762 ARG
763 Variables set with ARG are available anywhere in the Dockerfile, unlike
764 Docker, where they only work in FROM instructions, and possibly in
765 other ARG before the first FROM.
766
767 FROM
768 The FROM instruction accepts option --arg=NAME=VALUE, which serves the
769 same purpose as the ARG instruction. It can be repeated.
770
771 LABEL
772 The LABEL instruction accepts key=value pairs to add metadata for an
773 image. Unlike Docker, multiline values are not supported; see issue
774 #1512. The instruction can be repeated.
775
776 COPY
777 Especially for people used to UNIX cp(1), the semantics of the Docker‐
778 file COPY instruction can be confusing.
779
780 Most notably, when a source of the copy is a directory, the contents of
781 that directory, not the directory itself, are copied. This is docu‐
782 mented, but it’s a real gotcha because that’s not what cp(1) does, and
783 it means that many things you can do in one cp(1) command require mul‐
784 tiple COPY instructions.
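
The difference between the two behaviors can be seen in a small Python sketch that uses shutil in a temporary directory. Both helper names are hypothetical and neither is how ch-image or cp(1) is implemented; the sketch only contrasts where a source directory’s files end up. It needs Python 3.8 or later for dirs_exist_ok.

```python
import os
import shutil
import tempfile


def copy_dockerfile_style(src_dir, dst_dir):
    """Merge the contents of src_dir into dst_dir (COPY with a directory source)."""
    shutil.copytree(src_dir, dst_dir, symlinks=True, dirs_exist_ok=True)


def copy_cp_style(src_dir, dst_dir):
    """Like 'cp -r src_dir dst_dir' with dst_dir existing: src_dir lands inside it."""
    target = os.path.join(dst_dir, os.path.basename(src_dir))
    shutil.copytree(src_dir, target, symlinks=True)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "foo")
        os.makedirs(src)
        open(os.path.join(src, "bar.txt"), "w").close()
        dst_a = os.path.join(tmp, "dst_a")
        dst_b = os.path.join(tmp, "dst_b")
        os.makedirs(dst_a)
        os.makedirs(dst_b)
        copy_dockerfile_style(src, dst_a)
        copy_cp_style(src, dst_b)
        return sorted(os.listdir(dst_a)), sorted(os.listdir(dst_b))


# First list: COPY-style result; second list: cp-style result.
print(demo())
```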
785
786 Also, the reference documentation is incomplete. In our experience,
787 Docker also behaves as follows; ch-image does the same in an attempt to
788 be bug-compatible.
789
790 1. You can use absolute paths in the source; the root is the context
791 directory.
792
793 2. Destination directories are created if they don’t exist in the fol‐
794 lowing situations:
795
796 1. If the destination path ends in slash. (Documented.)
797
798 2. If the number of sources is greater than 1, either by wildcard or
799 explicitly, regardless of whether the destination ends in slash.
800 (Not documented.)
801
802 3. If there is a single source and it is a directory. (Not docu‐
803 mented.)
804
805 3. Symbolic links behave differently depending on how deep in the
806 copied tree they are. (Not documented.)
807
808 1. Symlinks at the top level — i.e., named as the destination or the
809 source, either explicitly or by wildcards — are dereferenced.
810 They are followed, and whatever they point to is used as the des‐
811 tination or source, respectively.
812
813 2. Symlinks at deeper levels are not dereferenced, i.e., the symlink
814 itself is copied.
815
816 4. If a directory appears at the same path in source and destination,
817 and is at the 2nd level or deeper, the source directory’s metadata
818 (e.g., permissions) are copied to the destination directory. (Not
819 documented.)
820
821 5. If an object appears in both the source and destination, and is at
822 the 2nd level or deeper, and is of different types in the source and
823 destination, then the source object will overwrite the destination
824 object. (Not documented.) For example, if /tmp/foo/bar is a regular
825 file, and /tmp is the context directory, then the following Docker‐
826 file snippet will result in a file in the container at /foo/bar
827 (copied from /tmp/foo/bar); the directory and all its contents will
828 be lost.
829
830 RUN mkdir -p /foo/bar && touch /foo/bar/baz
831 COPY foo /foo
832
833 We expect the following differences to be permanent:
834
835 • Wildcards use Python glob semantics, not the Go semantics.
836
837 • COPY --chown is ignored, because it doesn’t make sense in an unprivi‐
838 leged build.
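
One visible consequence of Python glob semantics is the handling of names that begin with a dot. The sketch below uses the standard glob module in a temporary directory standing in for a context directory; the file names and the match_names helper are made up. Go’s filepath.Match, by contrast, has no special rule for a leading dot, so the same pattern is expected to match the hidden file there.

```python
import glob
import os
import tempfile


def match_names(directory, pattern):
    """Return the sorted base names in directory matching a Python glob pattern."""
    paths = glob.glob(os.path.join(directory, pattern))
    return sorted(os.path.basename(p) for p in paths)


def demo():
    with tempfile.TemporaryDirectory() as context:
        for name in ("a.so", "b.so", ".hidden.so", "notes.txt"):
            open(os.path.join(context, name), "w").close()
        return match_names(context, "*.so")


# Python's "*" does not match a leading dot, so .hidden.so is skipped.
print(demo())
```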
839
840 Features we do not plan to support
841 • Parser directives are not supported. We have not identified a need
842 for any of them.
843
844 • EXPOSE: Charliecloud does not use the network namespace, so con‐
845 tainerized processes can simply listen on a host port like other un‐
846 privileged processes.
847
848 • HEALTHCHECK: This instruction’s main use case is monitoring server
849 processes rather than applications. Also, implementing it requires a
850 container supervisor daemon, which we have no plans to add.
851
852 • MAINTAINER is deprecated.
853
854 • STOPSIGNAL requires a container supervisor daemon process, which we
855 have no plans to add.
856
857 • USER does not make sense for unprivileged builds.
858
859 • VOLUME: This instruction is not currently supported. Charliecloud has
860 good support for bind mounts; we anticipate that it will continue to
861 focus on that and will not introduce the volume management features
862 that Docker has.
863
864 Examples
865 Build image bar using ./foo/bar/Dockerfile and context directory
866 ./foo/bar:
867
868 $ ch-image build -t bar -f ./foo/bar/Dockerfile ./foo/bar
869 [...]
870 grown in 4 instructions: bar
871
872 Same, but infer the image name and Dockerfile from the context direc‐
873 tory path:
874
875 $ ch-image build ./foo/bar
876 [...]
877 grown in 4 instructions: bar
878
879 Build using humongous vendor compilers you want to bind-mount instead
880 of installing into the image:
881
882 $ ch-image build --bind /opt/bigvendor:/opt .
883 $ cat Dockerfile
884 FROM centos:7
885
886 RUN /opt/bin/cc hello.c
887 #COPY /opt/lib/*.so /usr/local/lib # fail: COPY doesn’t bind mount
888 RUN cp /opt/lib/*.so /usr/local/lib # possible workaround
889 RUN ldconfig
890
892 $ ch-image [...] build-cache [...]
893
894 Print basic information about the cache. If -v is given, also print
895 some Git statistics and the Git repository configuration.
896
897 If any of the following options are given, do the corresponding opera‐
898 tion before printing. Multiple options can be given, in which case they
899 happen in this order.
900
901 --reset
902 Clear and re-initialize the build cache.
903
904 --gc Run Git garbage collection on the cache, including full
905 de-duplication of similar files. This will immediately remove
906 all cache entries not currently reachable from a named branch
907 (which is likely to cause corruption if the build cache is
908 being accessed concurrently by another process). The opera‐
909 tion can take a long time on large caches.
910
911 --text Print a text tree of the cache using Git’s git log --graph
912 feature. If -v is also given, the tree has more detail.
913
914 --dot Create a DOT export of the tree named ./build-cache.dot and a
915 PDF rendering ./build-cache.pdf. Requires graphviz and
916 git2dot.
917
919 $ ch-image [...] delete IMAGE_GLOB
920
921 Delete the image(s) described by IMAGE_GLOB from the storage directory
922 (including all build stages).
923
924 IMAGE_GLOB can be either a plain image reference or an image reference
925 with glob characters to match multiple images. For example, ch-image
926 delete 'foo*' will delete all images whose names start with foo.
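
To preview which images a glob would select, something like the following Python sketch can help. It uses the standard fnmatch module, which is an assumption; ch-image’s own matching may differ in details. The image names are examples only and select_images is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def select_images(names, image_glob):
    """Return the image names matched by a shell-style glob (case-sensitive)."""
    return [n for n in names if fnmatchcase(n, image_glob)]


# Example image names, as they might appear in "ch-image list" output.
images = ["alpine:3.17", "alpine:latest", "debian:buster"]
print(select_images(images, "alpine*"))
print(select_images(images, "debian:buster"))
```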
927
928 Importantly, this sub-command does not also remove the image from the
929 build cache. Therefore, it can be used to reduce the size of the stor‐
930 age directory, trading off the time needed to retrieve an image from
931 cache.
932
933 WARNING:
934 Glob characters must be quoted or otherwise protected from the
935 shell, which also desires to interpret them and will do so incor‐
936 rectly.
937
939 $ ch-image [...] gestalt [SELECTOR]
940
941 Provide information about the configuration and available features of
942 ch-image. End users generally will not need this; it is intended for
943 testing and debugging.
944
945 SELECTOR is one of:
946
947 • bucache. Exit successfully if the build cache is available, unsuc‐
948 cessfully with an error message otherwise. With -v, also print
949 version information about dependencies.
950
951 • bucache-dot. Exit successfully if build cache DOT trees can be
952 written, unsuccessfully with an error message otherwise. With -v,
953 also print version information about dependencies.
954
955 • python-path. Print the path to the Python interpreter in use and
956 exit successfully.
957
958 • storage-path. Print the storage directory path and exit success‐
959 fully.
960
962 Print information about images. If no argument is given, list the images
963 in builder storage.
964
965 Synopsis
966 $ ch-image [...] list [-l] [IMAGE_REF]
967
968 Description
969 Optional arguments:
970
971 -l, --long
972 Use long format (name, last change timestamp) when listing
973 images.
974
975 IMAGE_REF
976 Print details of what’s known about IMAGE_REF, both locally
977 and in the remote registry, if any.
978
979 Examples
980 List images in builder storage:
981
982 $ ch-image list
983 alpine:3.17 (amd64)
984 alpine:latest (amd64)
985 debian:buster (amd64)
986
987 Print details about Debian Buster image:
988
989 $ ch-image list debian:buster
990 details of image: debian:buster
991 in local storage: no
992 full remote ref: registry-1.docker.io:443/library/debian:buster
993 available remotely: yes
994 remote arch-aware: yes
995 host architecture: amd64
996 archs available: 386 bae2738ed83
997 amd64 98285d32477
998 arm/v7 97247fd4822
999 arm64/v8 122a0342878
1000
1001 For remotely available images like Debian Buster, the associated digest
1002 is listed beside each available architecture. Importantly, this feature
1003 does not provide the hash of the local image, which is only calculated
1004 on push.
1005
1007 $ ch-image [...] import PATH IMAGE_REF
1008
1009 Copy the image at PATH into builder storage with name IMAGE_REF. PATH
1010 can be:
1011
1012 • an image directory
1013
1014 • a tarball with no top-level directory (a.k.a. a “tarbomb”)
1015
1016 • a standard tarball with one top-level directory
1017
1018 If the imported image contains Charliecloud metadata, that will be im‐
1019 ported unchanged, i.e., images exported from ch-image builder storage
1020 will be functionally identical when re-imported.
1021
1022 NOTE:
1023 Every import creates a new cache entry, even if the file or direc‐
1024 tory has already been imported.
1025
1027 Pull the image described by the image reference IMAGE_REF from a repos‐
1028 itory to the local filesystem.
1029
1030 Synopsis
1031 $ ch-image [...] pull [...] IMAGE_REF [DEST_REF]
1032
1033 See the FAQ for the gory details on specifying image references.
1034
1035 Description
1036 Destination:
1037
1038 DEST_REF
1039 If specified, use this as the destination image reference,
1040 rather than IMAGE_REF. This lets you pull an image with a
1041 complicated reference while storing it locally with a simpler
1042 one.
1043
1044 Options:
1045
1046 --last-layer N
1047 Unpack only N layers, leaving an incomplete image. This op‐
1048 tion is intended for debugging.
1049
1050 --parse-only
1051 Parse IMAGE_REF, print a parse report, and exit successfully
1052 without talking to the internet or touching the storage di‐
1053 rectory.
1054
1055 This script does a fair amount of validation and fixing of the layer
1056 tarballs before flattening in order to support unprivileged use despite
1057 image problems we frequently see in the wild. For example, device files
1058 are ignored, and directory and file permissions are raised to a minimum
1059 of rwx------ and rw-------, respectively. Note, however, that symlinks
1060 pointing outside the image are permitted, because they are not
1061 resolved until runtime within a container.
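
The permission fix-up amounts to OR-ing a floor into each member’s mode so the unprivileged owner can always read and modify what was unpacked. A minimal sketch, assuming directories get the rwx------ floor (they must be searchable) and regular files the rw------- floor; raise_to_floor is a hypothetical name, not a ch-image function.

```python
DIR_FLOOR = 0o700   # rwx------
FILE_FLOOR = 0o600  # rw-------


def raise_to_floor(mode, is_dir):
    """Add any missing owner bits; bits already set are left untouched."""
    return mode | (DIR_FLOOR if is_dir else FILE_FLOOR)


# A directory unpacked as ---r-xr-x becomes rwxr-xr-x.
print(oct(raise_to_floor(0o055, is_dir=True)))
# A file unpacked as r--r--r-- becomes rw-r--r--.
print(oct(raise_to_floor(0o444, is_dir=False)))
```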
1062
1063 The following metadata in the pulled image is retained; all other meta‐
1064 data is currently ignored. (If you have a need for additional metadata,
1065 please let us know!)
1066
1067 • Current working directory set with WORKDIR is effective in down‐
1068 stream Dockerfiles.
1069
1070 • Environment variables set with ENV are effective in downstream
1071 Dockerfiles and also written to /ch/environment for use in ch-run
1072 --set-env.
1073
1074 • Mount point directories specified with VOLUME are created in the
1075 image if they don’t exist, but no other action is taken.
1076
1077 Note that some images (e.g., those with a “version 1 manifest”) do not
1078 contain metadata. A warning is printed in this case.
1079
1080 Examples
1081 Download the Debian Buster image matching the host’s architecture and
1082 place it in the storage directory:
1083
1084 $ uname -m
1085 aarch64
$ ch-image pull debian:buster
1086 pulling image: debian:buster
1087 requesting arch: arm64/v8
1088 manifest list: downloading
1089 manifest: downloading
1090 config: downloading
1091 layer 1/1: c54d940: downloading
1092 flattening image
1093 layer 1/1: c54d940: listing
1094 validating tarball members
1095 resolving whiteouts
1096 layer 1/1: c54d940: extracting
1097 image arch: arm64
1098 done
1099
1100 Same, specifying the architecture explicitly:
1101
1102 $ ch-image --arch=arm/v7 pull debian:buster
1103 pulling image: debian:buster
1104 requesting arch: arm/v7
1105 manifest list: downloading
1106 manifest: downloading
1107 config: downloading
1108 layer 1/1: 8947560: downloading
1109 flattening image
1110 layer 1/1: 8947560: listing
1111 validating tarball members
1112 resolving whiteouts
1113 layer 1/1: 8947560: extracting
1114 image arch: arm (may not match host arm64/v8)
1115
1117 Push the image described by the image reference IMAGE_REF from the lo‐
1118 cal filesystem to a repository.
1119
1120 Synopsis
1121 $ ch-image [...] push [--image DIR] IMAGE_REF [DEST_REF]
1122
1123 See the FAQ for the gory details on specifying image references.
1124
1125 Description
1126 Destination:
1127
1128 DEST_REF
1129 If specified, use this as the destination image reference,
1130 rather than IMAGE_REF. This lets you push to a repository
1131 without permanently adding a tag to the image.
1132
1133 Options:
1134
1135 --image DIR
1136 Use the unpacked image located at DIR rather than an image in
1137 the storage directory named IMAGE_REF.
1138
1139 Because Charliecloud is fully unprivileged, the owner and group of
1140 files in its images are not meaningful in the broader ecosystem. Thus,
1141 when pushed, everything in the image is flattened to user:group
1142 root:root. Also, setuid/setgid bits are removed, to avoid surprises if
1143 the image is pulled by a privileged container implementation.
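
The effect on each file can be sketched with Python’s tarfile module. The flatten_for_push function is hypothetical and only illustrates the transformation described above; it is not ch-image’s code. The path bin/su and the owner alice are made-up values.

```python
import tarfile


def flatten_for_push(member):
    """Set owner and group to root:root and clear setuid/setgid bits."""
    member.uid = member.gid = 0
    member.uname = member.gname = "root"
    member.mode &= ~0o6000  # drop setuid (0o4000) and setgid (0o2000)
    return member


# Example archive member owned by an ordinary user, with setuid set.
info = tarfile.TarInfo("bin/su")
info.mode = 0o4755
info.uid = info.gid = 1000
info.uname = info.gname = "alice"

flat = flatten_for_push(info)
print(flat.uname, flat.gname, oct(flat.mode))
```

A function of this shape can be passed as the filter argument of TarFile.add() when building a layer tarball.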
1144
1145 Examples
1146 Push a local image to the registry example.com:5000 at path /foo/bar
1147 with tag latest. Note that in this form, the local image must be named
1148 to match that remote reference.
1149
1150 $ ch-image push example.com:5000/foo/bar:latest
1151 pushing image: example.com:5000/foo/bar:latest
1152 layer 1/1: gathering
1153 layer 1/1: preparing
1154 preparing metadata
1155 starting upload
1156 layer 1/1: a1664c4: checking if already in repository
1157 layer 1/1: a1664c4: not present, uploading
1158 config: 89315a2: checking if already in repository
1159 config: 89315a2: not present, uploading
1160 manifest: uploading
1161 cleaning up
1162 done
1163
1164 Same, except use local image alpine:3.17. In this form, the local image
1165 name does not have to match the destination reference.
1166
1167 $ ch-image push alpine:3.17 example.com:5000/foo/bar:latest
1168 pushing image: alpine:3.17
1169 destination: example.com:5000/foo/bar:latest
1170 layer 1/1: gathering
1171 layer 1/1: preparing
1172 preparing metadata
1173 starting upload
1174 layer 1/1: a1664c4: checking if already in repository
1175 layer 1/1: a1664c4: not present, uploading
1176 config: 89315a2: checking if already in repository
1177 config: 89315a2: not present, uploading
1178 manifest: uploading
1179 cleaning up
1180 done
1181
1182 Same, except use unpacked image located at /var/tmp/image rather than
1183 an image in ch-image storage. (Also, the sole layer is already present
1184 in the remote registry, so we don’t upload it again.)
1185
1186 $ ch-image push --image /var/tmp/image example.com:5000/foo/bar:latest
1187 pushing image: example.com:5000/foo/bar:latest
1188 image path: /var/tmp/image
1189 layer 1/1: gathering
1190 layer 1/1: preparing
1191 preparing metadata
1192 starting upload
1193 layer 1/1: 892e38d: checking if already in repository
1194 layer 1/1: 892e38d: already present
1195 config: 546f447: checking if already in repository
1196 config: 546f447: not present, uploading
1197 manifest: uploading
1198 cleaning up
1199 done
1200
1202 $ ch-image [...] reset
1203
1204 Delete all images and cache from ch-image builder storage.
1205
1207 $ ch-image [...] undelete IMAGE_REF
1208
1209 If IMAGE_REF has been deleted but is in the build cache, recover it
1210 from the cache. Only available when the cache is enabled, and will not
1211 overwrite IMAGE_REF if it exists.
1212
1214 CH_IMAGE_USERNAME, CH_IMAGE_PASSWORD
1215 Username and password for registry authentication. See important
1216 caveats in section “Authentication” above.
1217
1218 CH_LOG_FILE
1219 If set, append log chatter to this file, rather than standard
1220 error. This is useful for debugging situations where standard
1221 error is consumed or lost.
1222
1223 Also sets verbose mode if not already set (equivalent to --ver‐
1224 bose).
1225
1226 CH_LOG_FESTOON
1227 If set, prepend PID and timestamp to logged chatter.
1228
1230 If Charliecloud was obtained from your Linux distribution, use your
1231 distribution’s bug reporting procedures.
1232
1233 Otherwise, report bugs to: https://github.com/hpc/charliecloud/issues
1234
1236 charliecloud(7)
1237
1238 Full documentation at: <https://hpc.github.io/charliecloud>
1239
1241 2014–2022, Triad National Security, LLC and others
1242
1243
1244
1245
12460.32 2023-07-19 00:00 UTC CH-IMAGE(1)