PAX(1P) POSIX Programmer's Manual PAX(1P)

PROLOG
This manual page is part of the POSIX Programmer's Manual. The Linux
implementation of this interface may differ (consult the corresponding
Linux manual page for details of Linux behavior), or the interface may
not be implemented on Linux.

NAME
pax - portable archive interchange

SYNOPSIS
pax [-dv] [-c|-n] [-H|-L] [-o options] [-f archive] [-s replstr]...
    [pattern...]

pax -r [-c|-n] [-dikuv] [-H|-L] [-f archive] [-o options]... [-p string]...
    [-s replstr]... [pattern...]

pax -w [-dituvX] [-H|-L] [-b blocksize] [[-a] [-f archive]] [-o options]...
    [-s replstr]... [-x format] [file...]

pax -r -w [-diklntuvX] [-H|-L] [-o options]... [-p string]...
    [-s replstr]... [file...] directory

DESCRIPTION
The pax utility shall read, write, and list the members of
archive files, and copy directory hierarchies. A variety of archive
formats shall be supported; see the -x format option.

The action to be taken depends on the presence of the -r and -w
options. The four combinations of -r and -w are referred to as the four
modes of operation: list, read, write, and copy modes, corresponding
respectively to the four forms shown in the SYNOPSIS section.

list In list mode (when neither -r nor -w is specified), pax
shall write the names of the members of the archive file read
from the standard input, with pathnames matching the specified
patterns, to standard output. If a named file is of type
directory, the file hierarchy rooted at that file shall be
listed as well.

read In read mode (when -r is specified, but -w is not), pax shall
extract the members of the archive file read from the standard
input, with pathnames matching the specified patterns.
If an extracted file is of type directory, the file hierarchy
rooted at that file shall be extracted as well. The extracted
files shall be created performing pathname resolution with
the directory in which pax was invoked as the current working
directory.

If an attempt is made to extract a directory when the directory
already exists, this shall not be considered an error.
If an attempt is made to extract a FIFO when the FIFO already
exists, this shall not be considered an error.

The ownership, access, and modification times, and file mode
of the restored files are discussed under the -p option.

write In write mode (when -w is specified, but -r is not), pax
shall write the contents of the file operands to the standard
output in an archive format. If no file operands are specified,
a list of files to copy, one per line, shall be read
from the standard input and each entry in this list shall be
processed as if it had been a file operand on the command
line. A file of type directory shall include all of the files
in the file hierarchy rooted at the file.

copy In copy mode (when both -r and -w are specified), pax shall
copy the file operands to the destination directory.

If no file operands are specified, a list of files to copy,
one per line, shall be read from the standard input. A file
of type directory shall include all of the files in the file
hierarchy rooted at the file.

The effect of the copy shall be as if the copied files were
written to a pax format archive file and then subsequently
extracted, except that copying of sockets may be supported
even if archiving them in write mode is not supported, and
that there may be hard links between the original and the
copied files. If the destination directory is a subdirectory
of one of the files to be copied, the results are unspecified.
If the destination directory is a file of a type not
defined by the System Interfaces volume of POSIX.1‐2017, the
results are implementation-defined; otherwise, it shall be an
error for the file named by the directory operand not to
exist, not be writable by the user, or not be a file of type
directory.
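
The mapping from the -r and -w options to the four modes can be sketched as follows. This is only an illustration of the rule stated above; the function name pax_mode is hypothetical and is not part of any pax implementation.

```python
def pax_mode(read_flag: bool, write_flag: bool) -> str:
    """Map the presence of -r and -w to the mode name used in this page."""
    if read_flag and write_flag:
        return "copy"    # pax -r -w ... directory
    if read_flag:
        return "read"    # pax -r ...
    if write_flag:
        return "write"   # pax -w ...
    return "list"        # neither option given
```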

In read or copy modes, if intermediate directories are necessary to
extract an archive member, pax shall perform actions equivalent to the
mkdir() function defined in the System Interfaces volume of
POSIX.1‐2017, called with the following arguments:

* The intermediate directory used as the path argument

* The value of the bitwise-inclusive OR of S_IRWXU, S_IRWXG, and
  S_IRWXO as the mode argument
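
A minimal Python sketch of the equivalent mkdir() call is shown below. The helper name make_intermediate_dirs is hypothetical. As with mkdir() itself, the process umask still applies to the mode that is requested.

```python
import os
import stat

# Bitwise-inclusive OR of the three permission masks named above (0o777).
INTERMEDIATE_MODE = stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO


def make_intermediate_dirs(member_path: str) -> None:
    """Create every missing directory above member_path, but not the member itself."""
    parent = os.path.dirname(member_path)
    if parent:
        # exist_ok=True: an already existing directory is not an error.
        os.makedirs(parent, mode=INTERMEDIATE_MODE, exist_ok=True)
```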

If any specified pattern or file operands are not matched by at least
one file or archive member, pax shall write a diagnostic message to
standard error for each one that did not match and exit with a non-zero
exit status.
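
The unmatched-operand rule can be illustrated with the sketch below. It is an approximation: Python's fnmatchcase stands in for the POSIX pattern matching notation and, unlike the filename expansion rules, lets '*' match a <slash>. The function names and the exit value 1 are assumptions made for the example.

```python
from fnmatch import fnmatchcase


def unmatched_patterns(patterns, member_names):
    """Return the patterns that match no member name, in command-line order."""
    return [p for p in patterns
            if not any(fnmatchcase(name, p) for name in member_names)]


def exit_status(patterns, member_names):
    """Non-zero when at least one pattern went unmatched (1 is an arbitrary choice)."""
    return 1 if unmatched_patterns(patterns, member_names) else 0
```

Each pattern returned by unmatched_patterns would correspond to one diagnostic message on standard error.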

The archive formats described in the EXTENDED DESCRIPTION section shall
be automatically detected on input. The default output archive format
shall be implementation-defined.

A single archive can span multiple files. The pax utility shall
determine, in an implementation-defined manner, what file to read or write
as the next file.

If the selected archive format supports the specification of linked
files, it shall be an error if these files cannot be linked when the
archive is extracted. For archive formats that do not store file
contents with each name that causes a hard link, if the file that contains
the data is not extracted during this pax session, either the data
shall be restored from the original file, or a diagnostic message shall
be displayed with the name of a file that can be used to extract the
data.

In traversing directories, pax shall detect infinite loops; that
is, entering a previously visited directory that is an ancestor of the
last file visited. When it detects an infinite loop, pax shall write a
diagnostic message to standard error and shall terminate.
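
One way to detect such a loop is to remember the device and inode numbers of the directories on the current path. The sketch below assumes a traversal that follows symbolic links (as with -L); the function name and the use of RuntimeError to stand in for "write a diagnostic and terminate" are choices made for the example, not part of the specification.

```python
import os


def walk_detecting_loops(root):
    """Return the directories visited under root, following symbolic links."""
    visited = []

    def visit(path, ancestors):
        st = os.stat(path)                      # follows symbolic links
        key = (st.st_dev, st.st_ino)
        if key in ancestors:
            # Entering a directory that is an ancestor of the current one.
            raise RuntimeError(f"infinite loop detected at {path}")
        visited.append(path)
        for entry in sorted(os.listdir(path)):
            child = os.path.join(path, entry)
            if os.path.isdir(child):            # true for links to directories too
                visit(child, ancestors | {key})

    visit(root, frozenset())
    return visited
```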

OPTIONS
The pax utility shall conform to the Base Definitions volume of
POSIX.1‐2017, Section 12.2, Utility Syntax Guidelines, except that the
order of presentation of the -o, -p, and -s options is significant.

The following options shall be supported:

-r Read an archive file from standard input.

-w Write files to the standard output in the specified archive
format.

-a Append files to the end of the archive. It is
implementation-defined which devices on the system support appending.
Additional file formats unspecified by this volume of
POSIX.1‐2017 may impose restrictions on appending.

-b blocksize
Block the output at a positive decimal integer number of
bytes per write to the archive file. Devices and archive
formats may impose restrictions on blocking. Blocking shall be
automatically determined on input. Conforming applications
shall not specify a blocksize value larger than 32256.
Default blocking when creating archives depends on the
archive format. (See the -x option below.)

-c Match all file or archive members except those specified by
the pattern or file operands.

-d Cause files of type directory being copied or archived or
archive members of type directory being extracted or listed to
match only the file or archive member itself and not the file
hierarchy rooted at the file.

-f archive
Specify the pathname of the input or output archive,
overriding the default standard input (in list or read modes) or
standard output (write mode).

-H If a symbolic link referencing a file of type directory is
specified on the command line, pax shall archive the file
hierarchy rooted in the file referenced by the link, using
the name of the link as the root of the file hierarchy.
Otherwise, if a symbolic link referencing a file of any other
file type which pax can normally archive is specified on the
command line, then pax shall archive the file referenced by
the link, using the name of the link. The default behavior,
when neither -H nor -L is specified, shall be to archive the
symbolic link itself.

-i Interactively rename files or archive members. For each
archive member matching a pattern operand or file matching a
file operand, a prompt shall be written to the file /dev/tty.
The prompt shall contain the name of the file or archive
member, but the format is otherwise unspecified. A line shall
then be read from /dev/tty. If this line is blank, the file
or archive member shall be skipped. If this line consists of
a single period, the file or archive member shall be
processed with no modification to its name. Otherwise, its name
shall be replaced with the contents of the line. The pax
utility shall immediately exit with a non-zero exit status if
end-of-file is encountered when reading a response or if
/dev/tty cannot be opened for reading and writing.

The results of extracting a hard link to a file that has been
renamed during extraction are unspecified.
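
The three possible responses to the -i prompt can be summarized in a small sketch. It assumes the response is delivered the way Python's readline() delivers it (an empty string at end-of-file, otherwise a line ending in a <newline>); the function name and the use of EOFError in place of an immediate non-zero exit are hypothetical.

```python
def interactive_rename(name, response):
    """Apply one -i response line; return the new name, or None to skip the member."""
    if response == "":
        # End-of-file: pax is required to exit immediately with a non-zero status.
        raise EOFError("end-of-file while reading a rename response")
    line = response.rstrip("\n")
    if line.strip(" \t") == "":
        return None        # blank line: skip this file or archive member
    if line == ".":
        return name        # single period: keep the name unchanged
    return line            # anything else replaces the name
```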

-k Prevent the overwriting of existing files.

-l (The letter ell.) In copy mode, hard links shall be made
between the source and destination file hierarchies whenever
possible. If specified in conjunction with -H or -L, when a
symbolic link is encountered, the hard link created in the
destination file hierarchy shall be to the file referenced by
the symbolic link. If specified when neither -H nor -L is
specified, when a symbolic link is encountered, the
implementation shall create a hard link to the symbolic link in the
source file hierarchy or copy the symbolic link to the
destination.

-L If a symbolic link referencing a file of type directory is
specified on the command line or encountered during the
traversal of a file hierarchy, pax shall archive the file
hierarchy rooted in the file referenced by the link, using the
name of the link as the root of the file hierarchy.
Otherwise, if a symbolic link referencing a file of any other file
type which pax can normally archive is specified on the
command line or encountered during the traversal of a file
hierarchy, pax shall archive the file referenced by the link,
using the name of the link. The default behavior, when
neither -H nor -L is specified, shall be to archive the symbolic
link itself.

-n Select the first archive member that matches each pattern
operand. No more than one archive member shall be matched for
each pattern (although members of type directory shall still
match the file hierarchy rooted at that file).

-o options
Provide information to the implementation to modify the
algorithm for extracting or writing files. The value of options
shall consist of one or more <comma>-separated keywords of
the form:

    keyword[[:]=value][,keyword[[:]=value], ...]

Some keywords apply only to certain file formats, as
indicated with each description. Use of keywords that are
inapplicable to the file format being processed produces
undefined results.

Keywords in the options argument shall be a string that would
be a valid portable filename as described in the Base
Definitions volume of POSIX.1‐2017, Section 3.282, Portable
Filename Character Set.

Note: Keywords are not expected to be filenames, merely
to follow the same character composition rules as
portable filenames.

Keywords can be preceded with white space. The value field
shall consist of zero or more characters; within value, the
application shall precede any literal <comma> with a
<backslash>, which shall be ignored, but preserves the <comma> as
part of value. A <comma> as the final character, or a
<comma> followed solely by white space as the final
characters, in options shall be ignored. Multiple -o options can be
specified; if keywords given to these multiple -o options
conflict, the keywords and values appearing later in command
line sequence shall take precedence and the earlier shall be
silently ignored. The following keyword values of options
shall be supported for the file formats as indicated:

delete=pattern
(Applicable only to the -x pax format.) When used in
write or copy mode, pax shall omit from extended header
records that it produces any keywords matching the
string pattern. When used in read or list mode, pax
shall ignore any keywords matching the string pattern
in the extended header records. In both cases, matching
shall be performed using the pattern matching notation
described in Section 2.13.1, Patterns Matching a Single
Character and Section 2.13.2, Patterns Matching
Multiple Characters. For example:

    -o delete=security.*

would suppress security-related information. See pax
Extended Header for extended header record keyword
usage.

When multiple -odelete=pattern options are specified,
the patterns shall be additive; all keywords matching
the specified string patterns shall be omitted from
extended header records that pax produces.
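
The additive behavior of several delete patterns can be sketched as a filter over a dictionary of extended header records. This is an illustration only: fnmatchcase approximates the pattern matching notation, and the keyword security.selinux is a made-up example, not a keyword defined by this page.

```python
from fnmatch import fnmatchcase


def apply_delete(records, delete_patterns):
    """Drop every record whose keyword matches any of the delete patterns."""
    return {keyword: value
            for keyword, value in records.items()
            if not any(fnmatchcase(keyword, p) for p in delete_patterns)}


# Hypothetical extended header records for one file.
records = {"path": "a/b", "mtime": "1.5", "security.selinux": "x"}
```

With these records, the pattern list ["security.*"] removes one record, and adding "?time" to the list removes mtime as well, leaving only path.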

exthdr.name=string
(Applicable only to the -x pax format.) This keyword
allows user control over the name that is written into
the ustar header blocks for the extended header
produced under the circumstances described in pax Header
Block. The name shall be the contents of string, after
the following character substitutions have been made:

┌──────────┬────────────────────────────────────────┐
│ string   │                                        │
│Includes: │ Replaced by:                           │
├──────────┼────────────────────────────────────────┤
│%d        │ The directory name of the file,        │
│          │ equivalent to the result of the dirname│
│          │ utility on the translated pathname.    │
│%f        │ The filename of the file, equivalent   │
│          │ to the result of the basename utility  │
│          │ on the translated pathname.            │
│%p        │ The process ID of the pax process.     │
│%%        │ A '%' character.                       │
└──────────┴────────────────────────────────────────┘

Any other '%' characters in string produce undefined
results.

If no -o exthdr.name=string is specified, pax shall use
the following default value:

    %d/PaxHeaders.%p/%f
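
The substitution table can be exercised with a short sketch. The function name is hypothetical, the pathname is assumed to have no trailing <slash>, and raising ValueError for an unknown '%' sequence is just one possible reading of "undefined results".

```python
import posixpath
import re


def expand_exthdr_name(template, pathname, pid):
    """Expand %d, %f, %p and %% in an exthdr.name string."""
    table = {
        "d": posixpath.dirname(pathname) or ".",  # dirname yields "." when there is no slash
        "f": posixpath.basename(pathname),
        "p": str(pid),
        "%": "%",
    }

    def substitute(match):
        key = match.group(1)
        if key not in table:
            raise ValueError(f"undefined conversion: %{key}")
        return table[key]

    return re.sub(r"%(.?)", substitute, template, flags=re.DOTALL)
```

For the default string %d/PaxHeaders.%p/%f, the pathname usr/lib/libc.a and process ID 1234, this yields usr/lib/PaxHeaders.1234/libc.a.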

globexthdr.name=string
(Applicable only to the -x pax format.) When used in
write or copy mode with the appropriate options, pax
shall create global extended header records with ustar
header blocks that will be treated as regular files by
previous versions of pax. This keyword allows user
control over the name that is written into the ustar
header blocks for global extended header records. The
name shall be the contents of string, after the
following character substitutions have been made:

┌──────────┬────────────────────────────────────────┐
│ string   │                                        │
│Includes: │ Replaced by:                           │
├──────────┼────────────────────────────────────────┤
│%n        │ An integer that represents the         │
│          │ sequence number of the global extended │
│          │ header record in the archive, starting │
│          │ at 1.                                  │
│%p        │ The process ID of the pax process.     │
│%%        │ A '%' character.                       │
└──────────┴────────────────────────────────────────┘

Any other '%' characters in string produce undefined
results.

If no -o globexthdr.name=string is specified, pax shall
use the following default value:

    $TMPDIR/GlobalHead.%p.%n

where $TMPDIR represents the value of the TMPDIR
environment variable. If TMPDIR is not set, pax shall use
/tmp.
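
The default global header name, including the TMPDIR fallback, might be computed as in this sketch. The function name is hypothetical, and the environment is passed in as a plain mapping so the example does not depend on the caller's real environment.

```python
def default_globexthdr_name(pid, sequence, environ):
    """Build $TMPDIR/GlobalHead.%p.%n for one global extended header record."""
    tmpdir = environ.get("TMPDIR", "/tmp")   # /tmp is used only when TMPDIR is not set
    return f"{tmpdir}/GlobalHead.{pid}.{sequence}"
```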

invalid=action
(Applicable only to the -x pax format.) This keyword
allows user control over the action pax takes upon
encountering values in an extended header record that,
in read or copy mode, are invalid in the destination
hierarchy or, in list mode, cannot be written in the
codeset and current locale of the implementation. The
following are invalid values that shall be recognized
by pax:

-- In read or copy mode, a filename or link name that
contains character encodings invalid in the
destination hierarchy. (For example, the name may
contain embedded NULs.)

-- In read or copy mode, a filename or link name that
is longer than the maximum allowed in the
destination hierarchy (for either a pathname component or
the entire pathname).

-- In list mode, any character string value (filename,
link name, user name, and so on) that cannot be
written in the codeset and current locale of the
implementation.

The following mutually-exclusive values of the action
argument are supported:

binary In write mode, pax shall generate a
hdrcharset=BINARY extended header record for
each file with a filename, link name, group
name, owner name, or any other field in an
extended header record that cannot be
translated to the UTF‐8 codeset, allowing the
archive to contain the files with unencoded
extended header record values. In read or
copy mode, pax shall use the values specified
in the header without translation, regardless
of whether this may overwrite an existing
file with a valid name. In list mode, pax
shall behave identically to the bypass
action.

bypass In read or copy mode, pax shall bypass the
file, causing no change to the destination
hierarchy. In list mode, pax shall write all
requested valid values for the file, but its
method for writing invalid values is
unspecified.

rename In read or copy mode, pax shall act as if the
-i option were in effect for each file with
invalid filename or link name values,
allowing the user to provide a replacement name
interactively. In list mode, pax shall
behave identically to the bypass action.

UTF‐8 When used in read, copy, or list mode and a
filename, link name, owner name, or any other
field in an extended header record cannot be
translated from the pax UTF‐8 codeset format
to the codeset and current locale of the
implementation, pax shall use the actual
UTF‐8 encoding for the name. If a hdrcharset
extended header record is in effect for this
file, the character set specified by that
record shall be used instead of UTF‐8. If a
hdrcharset=BINARY extended header record is
in effect for this file, no translation shall
be performed.

write In read or copy mode, pax shall write the
file, translating the name, regardless of
whether this may overwrite an existing file
with a valid name. In list mode, pax shall
behave identically to the bypass action.

If no -o invalid=option is specified, pax shall act as
if -oinvalid=bypass were specified. Any overwriting of
existing files that may be allowed by the -oinvalid=
actions shall be subject to permission (-p) and
modification time (-u) restrictions, and shall be suppressed
if the -k option is also specified.

linkdata
(Applicable only to the -x pax format.) In write mode,
pax shall write the contents of a file to the archive
even when that file is merely a hard link to a file
whose contents have already been written to the
archive.

listopt=format
This keyword specifies the output format of the table
of contents produced when the -v option is specified in
list mode. See List Mode Format Specifications. To
avoid ambiguity, the listopt=format shall be the only
or final keyword=value pair in a -o option-argument;
all characters in the remainder of the option-argument
shall be considered part of the format string. When
multiple -olistopt=format options are specified, the
format strings shall be considered a single,
concatenated string, evaluated in command line order.

times
(Applicable only to the -x pax format.) When used in
write or copy mode, pax shall include atime and mtime
extended header records for each file. See pax Extended
Header File Times.

In addition to these keywords, if the -x pax format is
specified, any of the keywords and values defined in pax Extended
Header, including implementation extensions, can be used in
-o option-arguments, in either of two modes:

keyword=value
When used in write or copy mode, these keyword/value
pairs shall be included at the beginning of the archive
as typeflag g global extended header records. When used
in read or list mode, these keyword/value pairs shall
act as if they had been at the beginning of the archive
as typeflag g global extended header records.

keyword:=value
When used in write or copy mode, these keyword/value
pairs shall be included as records at the beginning of
a typeflag x extended header for each file. (This shall
be equivalent to the <equals-sign> form except that it
creates no typeflag g global extended header records.)
When used in read or list mode, these keyword/value
pairs shall act as if they were included as records at
the end of each extended header; thus, they shall
override any global or file-specific extended header record
keywords of the same names. For example, in the
command:

    pax -r -o "
    gname:=mygroup,
    " <archive

the group name will be forced to a new value for all
files read from the archive.

The precedence of -o keywords over various fields in the
archive is described in pax Extended Header Keyword Precedence.
If the -o delete=pattern, -o keyword=value, or
-o keyword:=value options are used to override or remove any
extended header data needed to find files in an archive
(e.g., -o delete=size for a file whose size cannot be
represented in a ustar header or -o size=100 for a file whose size
is not 100 bytes), the behavior is undefined.
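
The syntax of a -o option-argument (comma-separated keywords, an optional = or := value, <backslash>-escaped commas, and an ignored trailing comma) can be modelled with the sketch below. It is a simplified reading of the rules above: the function name is hypothetical, empty fields between two commas are silently skipped, and the special "rest of the argument" rule for listopt=format is not handled.

```python
import re

# Keywords use the portable filename character set: letters, digits, '.', '_', '-'.
_KEYWORD = re.compile(r"([A-Za-z0-9._-]+)(?:(:?=)(.*))?", re.DOTALL)


def parse_o_options(argument):
    """Split one -o option-argument into (keyword, operator, value) tuples."""
    parts, current, i = [], [], 0
    while i < len(argument):
        ch = argument[i]
        if ch == "\\" and argument[i + 1:i + 2] == ",":
            current.append(",")      # escaped comma: drop the backslash, keep the comma
            i += 2
        elif ch == ",":
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))

    parsed = []
    for part in parts:
        part = part.lstrip()         # keywords can be preceded by white space
        if not part:
            continue                 # e.g. a trailing comma followed only by white space
        match = _KEYWORD.fullmatch(part)
        if match is None:
            raise ValueError(f"malformed keyword: {part!r}")
        parsed.append(match.groups())
    return parsed
```

Applied to the multi-line option-argument of the gname example above, the parser returns a single ("gname", ":=", "mygroup") entry; a keyword without a value, such as times, comes back with operator and value set to None.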

-p string Specify one or more file characteristic options (privileges).
The string option-argument shall be a string specifying file
characteristics to be retained or discarded on extraction.
The string shall consist of the specification characters a,
e, m, o, and p. Other implementation-defined characters can
be included. Multiple characteristics can be concatenated
within the same string and multiple -p options can be
specified. The meanings of the specification characters are as
follows:

a Do not preserve file access times.

e Preserve the user ID, group ID, file mode bits (see the
Base Definitions volume of POSIX.1‐2017, Section 3.169,
File Mode Bits), access time, modification time, and
any other implementation-defined file characteristics.

m Do not preserve file modification times.

o Preserve the user ID and group ID.

p Preserve the file mode bits. Other
implementation-defined file mode attributes may be preserved.

In the preceding list, "preserve" indicates that an
attribute stored in the archive shall be given to the
extracted file, subject to the permissions of the invoking
process. The access and modification times of the file shall
be preserved unless otherwise specified with the -p option or
not stored in the archive. All attributes that are not
preserved shall be determined as part of the normal file
creation action (see Section 1.1.1.4, File Read, Write, and
Creation).

If neither the e nor the o specification character is
specified, or the user ID and group ID are not preserved for any
reason, pax shall not set the S_ISUID and S_ISGID bits of the
file mode.

If the preservation of any of these items fails for any
reason, pax shall write a diagnostic message to standard error.
Failure to preserve these items shall affect the final exit
status, but shall not cause the extracted file to be deleted.

If file characteristic letters in any of the string
option-arguments are duplicated or conflict with each other, the
ones given last shall take precedence. For example, if -p eme
is specified, file modification times are preserved.
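
The "last one wins" rule can be made concrete by processing the specification characters strictly from left to right. The sketch below is an illustration: the function name and the four attribute labels are hypothetical, and implementation-defined characters are rejected rather than interpreted.

```python
def resolve_p_strings(strings):
    """Return which attributes are preserved after applying all -p strings in order."""
    # Defaults stated above: times are preserved, owner and mode are not.
    preserved = {"atime": True, "mtime": True, "owner": False, "mode": False}
    for string in strings:
        for ch in string:
            if ch == "a":
                preserved["atime"] = False
            elif ch == "m":
                preserved["mtime"] = False
            elif ch == "o":
                preserved["owner"] = True
            elif ch == "p":
                preserved["mode"] = True
            elif ch == "e":
                preserved = dict.fromkeys(preserved, True)   # preserve everything
            else:
                raise ValueError(f"unsupported specification character: {ch!r}")
    return preserved
```

For -p eme, the final e re-enables what the preceding m switched off, so modification times are preserved, as in the example above.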

-s replstr
Modify file or archive member names named by pattern or file
operands according to the substitution expression replstr,
using the syntax of the ed utility. The concepts of
"address" and "line" are meaningless in the context of
the pax utility, and shall not be supplied. The format shall
be:

    -s /old/new/[gp]

where as in ed, old is a basic regular expression and new can
contain an <ampersand>, '\n' (where n is a digit)
back-references, or subexpression matching. The old string shall also
be permitted to contain <newline> characters.

Any non-null character can be used as a delimiter ('/' shown
here). Multiple -s expressions can be specified; the
expressions shall be applied in the order specified, terminating
with the first successful substitution. The optional
trailing 'g' is as defined in the ed utility. The optional
trailing 'p' shall cause successful substitutions to be written to
standard error. File or archive member names that substitute
to the empty string shall be ignored when reading and writing
archives.
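
The control flow of several -s expressions (arbitrary delimiter, stop at the first successful substitution, ignore names that become empty) is sketched below. An important assumption: Python's re syntax is used in place of POSIX basic regular expressions, so groups are written ( ) rather than \( \). The function names are hypothetical, and the p flag is accepted but not acted on.

```python
import re


def parse_replstr(replstr):
    """Split 'DoldDnewD[gp]' (D is any delimiter) into (old, new, flags)."""
    delimiter = replstr[0]
    fields, current, i = [], [], 1
    while i < len(replstr) and len(fields) < 2:
        ch = replstr[i]
        if ch == "\\" and i + 1 < len(replstr):
            nxt = replstr[i + 1]
            # An escaped delimiter is a literal character; other escapes stay intact.
            current.append(nxt if nxt == delimiter else ch + nxt)
            i += 2
        elif ch == delimiter:
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    flags = replstr[i:]
    if len(fields) != 2 or set(flags) - set("gp"):
        raise ValueError(f"malformed replstr: {replstr!r}")
    return fields[0], fields[1], flags


def _expand(new, match):
    """Expand '&' and '\\n' back-references in the replacement text."""
    out, i = [], 0
    while i < len(new):
        ch = new[i]
        if ch == "\\" and i + 1 < len(new):
            nxt = new[i + 1]
            out.append((match.group(int(nxt)) or "") if nxt in "123456789" else nxt)
            i += 2
        elif ch == "&":
            out.append(match.group(0))
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def rename_member(name, replstrs):
    """Apply -s expressions in order; return the new name, or None if it becomes empty."""
    for replstr in replstrs:
        old, new, flags = parse_replstr(replstr)
        count = 0 if "g" in flags else 1          # 0 means "replace every match"
        result, hits = re.subn(old, lambda m: _expand(new, m), name, count=count)
        if hits:
            return result or None                 # first successful substitution wins
    return name
```

In rename_member("src/main.c", ["/^src/lib/", "/main/MAIN/"]) only the first expression is applied, because it succeeds; the second is never tried.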

-t When reading files from the file system, and if the user has
the permissions required by utime() to do so, set the access
time of each file read to the access time that it had before
being read by pax.

-u Ignore files that are older (having a less recent file
modification time) than a pre-existing file or archive member
with the same name. In read mode, an archive member with the
same name as a file in the file system shall be extracted if
the archive member is newer than the file. In write mode, an
archive file member with the same name as a file in the file
system shall be superseded if the file is newer than the
archive member. If -a is also specified, this is accomplished
by appending to the archive; otherwise, it is unspecified
whether this is accomplished by actual replacement in the
archive or by appending to the archive. In copy mode, the file
in the destination hierarchy shall be replaced by the file in
the source hierarchy or by a link to the file in the source
hierarchy if the file in the source hierarchy is newer.

-v In list mode, produce a verbose table of contents (see the
STDOUT section). Otherwise, write archive member pathnames
to standard error (see the STDERR section).

-x format Specify the output archive format. The pax utility shall
support the following formats:

cpio The cpio interchange format; see the EXTENDED
DESCRIPTION section. The default blocksize for this
format for character special archive files shall be
5120. Implementations shall support all blocksize
values less than or equal to 32256 that are
multiples of 512.

pax The pax interchange format; see the EXTENDED
DESCRIPTION section. The default blocksize for this
format for character special archive files shall be
5120. Implementations shall support all blocksize
values less than or equal to 32256 that are
multiples of 512.

ustar The tar interchange format; see the EXTENDED
DESCRIPTION section. The default blocksize for this
format for character special archive files shall be
10240. Implementations shall support all blocksize
values less than or equal to 32256 that are
multiples of 512.

Implementation-defined formats shall specify a default block
size as well as any other block sizes supported for character
special archive files.

Any attempt to append to an archive file in a format
different from the existing archive format shall cause pax to exit
immediately with a non-zero exit status.
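
The blocksize limits given under -b and -x can be checked with a few lines of arithmetic. The names below are hypothetical; the sketch only tests whether a value belongs to the set that every implementation is required to support for the three standard formats.

```python
MAX_PORTABLE_BLOCKSIZE = 32256           # upper limit for conforming applications

# Default blocksizes for character special archive files, per format.
DEFAULT_BLOCKSIZE = {"cpio": 5120, "pax": 5120, "ustar": 10240}


def is_portable_blocksize(blocksize: int) -> bool:
    """True for positive multiples of 512 that do not exceed 32256."""
    return 0 < blocksize <= MAX_PORTABLE_BLOCKSIZE and blocksize % 512 == 0
```

Note that 32256 is itself a multiple of 512 (63 x 512), so the largest portable value is exactly the stated limit.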

-X When traversing the file hierarchy specified by a pathname,
pax shall not descend into directories that have a different
device ID (st_dev; see the System Interfaces volume of
POSIX.1‐2017, stat()).

Specifying more than one of the mutually-exclusive options -H and -L
shall not be considered an error and the last option specified shall
determine the behavior of the utility.
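
The interaction of -H and -L, as described under those options and in the paragraph above, can be summarized in a sketch. Both function names are hypothetical, and options are represented as single letters without the leading hyphen.

```python
def effective_link_option(options):
    """Return 'H', 'L', or None; when both appear, the last one specified wins."""
    chosen = None
    for option in options:
        if option in ("H", "L"):
            chosen = option
    return chosen


def follows_symlink(link_option, on_command_line):
    """Decide whether a symbolic link is followed rather than archived as a link."""
    if link_option == "L":
        return True                  # command line and traversal alike
    if link_option == "H":
        return on_command_line       # only links named on the command line
    return False                     # default: archive the link itself
```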

The options that operate on the names of files or archive members (-c,
-i, -n, -s, -u, and -v) shall interact as follows. In read mode, the
archive members shall be selected based on the user-specified pattern
operands as modified by the -c, -n, and -u options. Then, any -s and -i
options shall modify, in that order, the names of the selected files.
The -v option shall write names resulting from these modifications.

In write mode, the files shall be selected based on the user-specified
pathnames as modified by the -n and -u options. Then, any -s and -i
options shall modify, in that order, the names of these selected files.
The -v option shall write names resulting from these modifications.

If both the -u and -n options are specified, pax shall not consider a
file selected unless it is newer than the file to which it is compared.

List Mode Format Specifications
In list mode with the -o listopt=format option, the format argument
shall be applied for each selected file. The pax utility shall append a
<newline> to the listopt output for each selected file. The format
argument shall be used as the format string described in the Base
Definitions volume of POSIX.1‐2017, Chapter 5, File Format Notation, with
the exceptions 1. through 6. defined in the EXTENDED DESCRIPTION
section of printf, plus the following exceptions:

7. The sequence (keyword) can occur before a format conversion
specifier. The conversion argument is defined by the value of
keyword. The implementation shall support the following keywords:

-- Any of the Field Name entries in Table 4-14, ustar Header
Block and Table 4-16, Octet-Oriented cpio Archive Entry. The
implementation may support the cpio keywords without the
leading c_ in addition to the form required by Table 4-16,
Octet-Oriented cpio Archive Entry.

-- Any keyword defined for the extended header in pax Extended
Header.

-- Any keyword provided as an implementation-defined extension
within the extended header defined in pax Extended Header.

For example, the sequence "%(charset)s" is the string value of
the name of the character set in the extended header.

The result of the keyword conversion argument shall be the value
from the applicable header field or extended header, without any
trailing NULs.

All keyword values used as conversion arguments shall be
translated from the UTF‐8 encoding (or alternative encoding specified
by any hdrcharset extended header record) to the character set
appropriate for the local file system, user database, and so on,
as applicable.
694
695 8. An additional conversion specifier character, T, shall be used to
696 specify time formats. The T conversion specifier character can be
697 preceded by the sequence (keyword=subformat), where subformat is
698 a date format as defined by date operands. The default keyword
699 shall be mtime and the default subformat shall be:
700
701
702 %b %e %H:%M %Y
703
704 9. An additional conversion specifier character, M, shall be used to
705 specify the file mode string as defined in ls Standard Output. If
706 (keyword) is omitted, the mode keyword shall be used. For exam‐
707 ple, %.1M writes the single character corresponding to the
708 <entry type> field of the ls -l command.
709
710 10. An additional conversion specifier character, D, shall be used to
711 specify the device for block or special files, if applicable, in
712 an implementation-defined format. If not applicable, and (key‐
713 word) is specified, then this conversion shall be equivalent to
714 %(keyword)u. If not applicable, and (keyword) is omitted, then
715 this conversion shall be equivalent to <space>.
716
717 11. An additional conversion specifier character, F, shall be used to
718 specify a pathname. The F conversion character can be preceded by
719 a sequence of <comma>-separated keywords:
720
721
722 (keyword[,keyword] ... )
723
724 The values for all the keywords that are non-null shall be con‐
725 catenated together, each separated by a '/'. The default shall
726 be (path) if the keyword path is defined; otherwise, the default
727 shall be (prefix,name).
728
729 12. An additional conversion specifier character, L, shall be used to
730 specify a symbolic link expansion. If the current file is a sym‐
731 bolic link, then %L shall expand to:
732
733
734 "%s -> %s", <value of keyword>, <contents of link>
735
736 Otherwise, the %L conversion specification shall be the equiva‐
737 lent of %F.
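
The F and L conversions above can be illustrated with a short Python sketch. This is not part of pax: the function names expand_F and expand_L, the representation of header fields as a dict of strings, and the choice to take the link contents from linkpath or linkname are all assumptions made for illustration.

```python
def expand_F(fields, keywords=None):
    """Join the non-null values of the listed keywords with '/'.

    fields   -- hypothetical dict mapping keyword names to string values
    keywords -- keyword names from "(keyword[,keyword] ... )", or None
    """
    if keywords is None:
        # Default is (path) if the keyword path is defined,
        # otherwise (prefix,name).
        keywords = ["path"] if "path" in fields else ["prefix", "name"]
    return "/".join(fields[k] for k in keywords if fields.get(k))


def expand_L(fields, keywords=None, is_symlink=False):
    """Expand %L: "name -> target" for symbolic links, %F otherwise."""
    name = expand_F(fields, keywords)
    if not is_symlink:
        return name
    # Assumption: the link contents come from linkpath, else linkname.
    target = fields.get("linkpath") or fields.get("linkname", "")
    return "%s -> %s" % (name, target)
```

With these assumptions, a member with prefix "usr/lib" and name "libc.so" expands to "usr/lib/libc.so", and an empty prefix is skipped rather than producing a leading '/'.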
738
740 The following operands shall be supported:
741
742 directory The destination directory pathname for copy mode.
743
744 file A pathname of a file to be copied or archived.
745
746 pattern A pattern matching one or more pathnames of archive members.
747 A pattern must be given in the name-generating notation of
748 the pattern matching notation in Section 2.13, Pattern Match‐
749 ing Notation, including the filename expansion rules in Sec‐
750 tion 2.13.3, Patterns Used for Filename Expansion. The
751 default, if no pattern is specified, is to select all members
752 in the archive.
753
755 In write mode, the standard input shall be used only if no file oper‐
756 ands are specified. It shall be a file containing a list of pathnames,
757 each terminated by a <newline> character.
758
759 In list and read modes, if -f is not specified, the standard input
760 shall be an archive file.
761
762 Otherwise, the standard input shall not be used.
763
765 The input file named by the archive option-argument, or standard input
766 when the archive is read from there, shall be a file formatted accord‐
767 ing to one of the specifications in the EXTENDED DESCRIPTION section or
768 some other implementation-defined format.
769
770 The file /dev/tty shall be used to write prompts and read responses.
771
773 The following environment variables shall affect the execution of pax:
774
775 LANG Provide a default value for the internationalization vari‐
776 ables that are unset or null. (See the Base Definitions vol‐
777 ume of POSIX.1‐2017, Section 8.2, Internationalization Vari‐
778 ables for the precedence of internationalization variables
779 used to determine the values of locale categories.)
780
781 LC_ALL If set to a non-empty string value, override the values of
782 all the other internationalization variables.
783
784 LC_COLLATE
785 Determine the locale for the behavior of ranges, equivalence
786 classes, and multi-character collating elements used in the
787 pattern matching expressions for the pattern operand, the
788 basic regular expression for the -s option, and the extended
789 regular expression defined for the yesexpr locale keyword in
790 the LC_MESSAGES category.
791
792 LC_CTYPE Determine the locale for the interpretation of sequences of
793 bytes of text data as characters (for example, single-byte as
794 opposed to multi-byte characters in arguments and input
795 files), the behavior of character classes used in the
796 extended regular expression defined for the yesexpr locale
797 keyword in the LC_MESSAGES category, and pattern matching.
798
799 LC_MESSAGES
800 Determine the locale used to process affirmative responses,
801 and the locale used to affect the format and contents of
802 diagnostic messages and prompts written to standard error.
803
804 LC_TIME Determine the format and contents of date and time strings
805 when the -v option is specified.
806
807 NLSPATH Determine the location of message catalogs for the processing
808 of LC_MESSAGES.
809
810 TMPDIR Determine the pathname that provides part of the default
811 global extended header record file, as described for the -o
812 globexthdr= keyword in the OPTIONS section.
813
814 TZ Determine the timezone used to calculate date and time
815 strings when the -v option is specified. If TZ is unset or
816 null, an unspecified default timezone shall be used.
817
819 Default.
820
822 In write mode, if -f is not specified, the standard output shall be the
823 archive formatted according to one of the specifications in the
824 EXTENDED DESCRIPTION section, or some other implementation-defined for‐
825 mat (see -x format).
826
827 In list mode, when the -olistopt=format has been specified, the
828 selected archive members shall be written to standard output using the
829 format described under List Mode Format Specifications. In list mode
830 without the -olistopt=format option, the table of contents of the
831 selected archive members shall be written to standard output using the
832 following format:
833
834
835 "%s\n", <pathname>
836
837 If the -v option is specified in list mode, the table of contents of
838 the selected archive members shall be written to standard output using
839 the following formats.
840
841 For pathnames representing hard links to previous members of the ar‐
842 chive:
843
844
845 "%s == %s\n", <ls -l listing>, <linkname>
846
847 For all other pathnames:
848
849
850 "%s\n", <ls -l listing>
851
852 where <ls -l listing> shall be the format specified by the ls utility
853 with the -l option. When writing pathnames in this format, it is
854 unspecified what is written for fields for which the underlying archive
855 format does not have the correct information, although the correct num‐
856 ber of <blank>-separated fields shall be written.
857
858 In list mode, standard output shall not be buffered more than a path‐
859 name (plus any associated information and a <newline> terminator) at a
860 time.
861
863 If -v is specified in read, write, or copy modes, pax shall write the
864 pathnames it processes to the standard error output using the following
865 format:
866
867
868 "%s\n", <pathname>
869
870 These pathnames shall be written as soon as processing is begun on the
871 file or archive member, and shall be flushed to standard error. The
872 trailing <newline>, which shall not be buffered, is written when the
873 file has been read or written.
874
875 If the -s option is specified, and the replacement string has a trail‐
876 ing 'p', substitutions shall be written to standard error in the fol‐
877 lowing format:
878
879
880 "%s >> %s\n", <original pathname>, <new pathname>
881
882 In all operating modes of pax, optional messages of unspecified format
883 concerning the input archive format and volume number, the number of
884 files, blocks, volumes, and media parts as well as other diagnostic
885 messages may be written to standard error.
886
887 In all formats, for both standard output and standard error, it is
888 unspecified how non-printable characters in pathnames or link names are
889 written.
890
891 When using the -xpax archive format, if a filename, link name, group
892 name, owner name, or any other field in an extended header record can‐
893 not be translated between the codeset in use for that extended header
894 record and the character set of the current locale, pax shall write a
895 diagnostic message to standard error, shall process the file as
896 described for the -o invalid= option, and then shall continue process‐
897 ing with the next file.
898
900 In read mode, the extracted output files shall be of the archived file
901 type. In copy mode, the copied output files shall be the type of the
902 file being copied. In either mode, existing files in the destination
903 hierarchy shall be overwritten only when all permission (-p), modifica‐
904 tion time (-u), and invalid-value (-oinvalid=) tests allow it.
905
906 In write mode, the output file named by the -f option-argument shall be
907 a file formatted according to one of the specifications in the EXTENDED
908 DESCRIPTION section, or some other implementation-defined format.
909
911 pax Interchange Format
912 A pax archive tape or file produced in the -xpax format shall contain a
913 series of blocks. The physical layout of the archive shall be identical
914 to the ustar format described in ustar Interchange Format. Each file
915 archived shall be represented by the following sequence:
916
917 * An optional header block with extended header records. This header
918 block is of the form described in pax Header Block, with a typeflag
919 value of x or g. The extended header records, described in pax
920 Extended Header, shall be included as the data for this header
921 block.
922
923 * A header block that describes the file. Any fields in the preceding
924 optional extended header shall override the associated fields in
925 this header block for this file.
926
927 * Zero or more blocks that contain the contents of the file.
928
929 At the end of the archive file there shall be two 512-byte blocks
930 filled with binary zeros, interpreted as an end-of-archive indicator.
931
932 A schematic of an example archive with global extended header records
933 and two actual files is shown in Figure 4-1, pax Format Archive Exam‐
934 ple. In the example, the second file in the archive has no extended
935 header preceding it, presumably because it has no need for extended
936 attributes.
937
938 Figure 4-1: pax Format Archive Example
939
940 pax Header Block
941 The pax header block shall be identical to the ustar header block
942 described in ustar Interchange Format, except that two additional type‐
943 flag values are defined:
944
945 x Represents extended header records for the following file in the
946 archive (which shall have its own ustar header block). The format
947 of these extended header records shall be as described in pax
948 Extended Header.
949
950 g Represents global extended header records for the following files
951 in the archive. The format of these extended header records shall
952 be as described in pax Extended Header. Each value shall affect
953 all subsequent files that do not override that value in their own
954 extended header record and until another global extended header
955 record is reached that provides another value for the same field.
956 The typeflag g global headers should not be used with interchange
957 media that could suffer partial data loss in transporting the ar‐
958 chive.
959
960 For both of these types, the size field shall be the size of the
961 extended header records in octets. The other fields in the header block
962 are not meaningful to this version of the pax utility. However, if this
963 archive is read by a pax utility conforming to the ISO POSIX‐2:1993
964 standard, the header block fields are used to create a regular file
965 that contains the extended header records as data. Therefore, header
966 block field values should be selected to provide reasonable file access
967 to this regular file.
968
969 A further difference from the ustar header block is that data blocks
970 for files of typeflag 1 (the digit one) (hard link) may be included,
971 which means that the size field may be greater than zero. Archives cre‐
972 ated by pax -o linkdata shall include these data blocks with the hard
973 links.
974
975 pax Extended Header
976 A pax extended header contains values that are inappropriate for the
977 ustar header block because of limitations in that format: fields
978 requiring a character encoding other than that described in the
979 ISO/IEC 646:1991 standard, fields representing file attributes not
980 described in the ustar header, and fields whose format or length do not
981 fit the requirements of the ustar header. The values in an extended
982 header add attributes to the following file (or files; see the descrip‐
983 tion of the typeflag g header block) or override values in the follow‐
984 ing header block(s), as indicated in the following list of keywords.
985
986 An extended header shall consist of one or more records, each con‐
987 structed as follows:
988
989
990 "%d %s=%s\n", <length>, <keyword>, <value>
991
992 The extended header records shall be encoded according to the
993 ISO/IEC 10646‐1:2000 standard UTF‐8 encoding. The <length> field,
994 <blank>, <equals-sign>, and <newline> shown shall be limited to the
995 portable character set, as encoded in UTF‐8. The <keyword> fields can
996 be any UTF‐8 characters. The <length> field shall be the decimal
997 length of the extended header record in octets, including the trailing
998 <newline>. If there is a hdrcharset extended header in effect for a
999 file, the value field for any gname, linkpath, path, and uname extended
1000 header records shall be encoded using the character set specified by
1001 the hdrcharset extended header record; otherwise, the value field shall
1002 be encoded using UTF‐8. The value field for all other keywords speci‐
1003 fied by POSIX.1‐2008 shall be encoded using UTF‐8.
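
Because <length> counts the whole record, including its own decimal digits, the length cannot be computed in a single step. The following Python sketch shows one way to build a record; the function name pax_record is hypothetical, and the sketch assumes the value is to be encoded as UTF‐8 (no hdrcharset record in effect).

```python
def pax_record(keyword: str, value: str) -> bytes:
    """Encode one "%d %s=%s\\n" record whose <length> covers the whole record."""
    body = b" " + keyword.encode("utf-8") + b"=" + value.encode("utf-8") + b"\n"
    length = len(body)
    # The length field counts its own digits, so repeat until it is stable.
    while True:
        total = len(body) + len(str(length))
        if total == length:
            break
        length = total
    return str(length).encode("ascii") + body
```

For example, the keyword path with the value foo gives the 12-octet record "12 path=foo\n": ten octets for " path=foo\n" plus two for the digits of the length itself.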
1004
1005 The <keyword> field shall be one of the entries from the following list
1006 or a keyword provided as an implementation extension. Keywords con‐
1007 sisting entirely of lowercase letters, digits, and periods are reserved
1008 for future standardization. A keyword shall not include an <equals-
1009 sign>. (In the following list, the notation ``file(s)'' or
1010 ``block(s)'' is used to acknowledge that a keyword affects the follow‐
1011 ing single file after a typeflag x extended header, but possibly multi‐
1012 ple files after typeflag g. Any requirements in the list for pax to
1013 include a record when in write or copy mode shall apply only when such
1014 a record has not already been provided through the use of the -o
1015 option. When used in copy mode, pax shall behave as if an archive had
1016 been created with applicable extended header records and then
1017 extracted.)
1018
1019 atime The file access time for the following file(s), equivalent to
1020 the value of the st_atime member of the stat structure for a
1021 file, as described by the stat() function. The access time
1022 shall be restored if the process has appropriate privileges
1023 required to do so. The format of the <value> shall be as
1024 described in pax Extended Header File Times.
1025
1026 charset The name of the character set used to encode the data in the
1027 following file(s). The entries in the following table are
1028 defined to refer to known standards; additional names may be
1029 agreed on between the originator and recipient.
1030
1031 ┌────────────────────────┬───────────────────────────────┐
1032 │ <value> │ Formal Standard │
1033 ├────────────────────────┼───────────────────────────────┤
1034 │ISO-IR 646 1990 │ ISO/IEC 646:1990 │
1035 │ISO-IR 8859 1 1998 │ ISO/IEC 8859‐1:1998 │
1036 │ISO-IR 8859 2 1999 │ ISO/IEC 8859‐2:1999 │
1037 │ISO-IR 8859 3 1999 │ ISO/IEC 8859‐3:1999 │
1038 │ISO-IR 8859 4 1998 │ ISO/IEC 8859‐4:1998 │
1039 │ISO-IR 8859 5 1999 │ ISO/IEC 8859‐5:1999 │
1040 │ISO-IR 8859 6 1999 │ ISO/IEC 8859‐6:1999 │
1041 │ISO-IR 8859 7 1987 │ ISO/IEC 8859‐7:1987 │
1042 │ISO-IR 8859 8 1999 │ ISO/IEC 8859‐8:1999 │
1043 │ISO-IR 8859 9 1999 │ ISO/IEC 8859‐9:1999 │
1044 │ISO-IR 8859 10 1998 │ ISO/IEC 8859‐10:1998 │
1045 │ISO-IR 8859 13 1998 │ ISO/IEC 8859‐13:1998 │
1046 │ISO-IR 8859 14 1998 │ ISO/IEC 8859‐14:1998 │
1047 │ISO-IR 8859 15 1999 │ ISO/IEC 8859‐15:1999 │
1048 │ISO-IR 10646 2000 │ ISO/IEC 10646:2000 │
1049 │ISO-IR 10646 2000 UTF-8 │ ISO/IEC 10646, UTF-8 encoding │
1050 │BINARY │ None. │
1051 └────────────────────────┴───────────────────────────────┘
1052 The encoding is included in an extended header for informa‐
1053 tion only; when pax is used as described in POSIX.1‐2008, it
1054 shall not translate the file data into any other encoding.
1055 The BINARY entry indicates unencoded binary data.
1056
1057 When used in write or copy mode, it is implementation-defined
1058 whether pax includes a charset extended header record for a
1059 file.
1060
1061 comment A series of characters used as a comment. All characters in
1062 the <value> field shall be ignored by pax.
1063
1064 gid The group ID of the group that owns the file, expressed as a
1065 decimal number using digits from the ISO/IEC 646:1991 stan‐
1066 dard. This record shall override the gid field in the follow‐
1067 ing header block(s). When used in write or copy mode, pax
1068 shall include a gid extended header record for each file
1069 whose group ID is greater than 2097151 (octal 7777777).
1070
1071 gname The group of the file(s), formatted as a group name in the
1072 group database. This record shall override the gid and gname
1073 fields in the following header block(s), and any gid extended
1074 header record. When used in read, copy, or list mode, pax
1075 shall translate the name from the encoding in the header
1076 record to the character set appropriate for the group data‐
1077 base on the receiving system. If any of the characters cannot
1078 be translated, and if neither the -oinvalid=UTF‐8 option nor
1079 the -oinvalid=binary option is specified, the results are
1080 implementation-defined. When used in write or copy mode, pax
1081 shall include a gname extended header record for each file
1082 whose group name cannot be represented entirely with the let‐
1083 ters and digits of the portable character set.
1084
1085 hdrcharset
1086 The name of the character set used to encode the value field
1087 of the gname, linkpath, path, and uname pax extended header
1088 records. The entries in the following table are defined to
1089 refer to known standards; additional names may be agreed on
1090 between the originator and the recipient.
1091
1092 ┌────────────────────────┬───────────────────────────────┐
1093 │ <value> │ Formal Standard │
1094 ├────────────────────────┼───────────────────────────────┤
1095 │ISO-IR 10646 2000 UTF-8 │ ISO/IEC 10646, UTF-8 encoding │
1096 │BINARY │ None. │
1097 └────────────────────────┴───────────────────────────────┘
1098 If no hdrcharset extended header record is specified, the
1099 default character set used to encode all values in extended
1100 header records shall be the ISO/IEC 10646‐1:2000 standard
1101 UTF‐8 encoding.
1102
1103 The BINARY entry indicates that all values recorded in
1104 extended headers for affected files are unencoded binary data
1105 from the underlying system.
1106
1107 linkpath The pathname of a link being created to another file, of any
1108 type, previously archived. This record shall override the
1109 linkname field in the following ustar header block(s). The
1110 following ustar header block shall determine the type of link
1111 created. If typeflag of the following header block is 1, it
1112 shall be a hard link. If typeflag is 2, it shall be a sym‐
1113 bolic link and the linkpath value shall be the contents of
1114 the symbolic link. The pax utility shall translate the name
1115 of the link (contents of the symbolic link) from the encoding
1116 in the header to the character set appropriate for the local
1117 file system. When used in write or copy mode, pax shall
1118 include a linkpath extended header record for each link whose
1119 pathname cannot be represented entirely with the members of
1120 the portable character set other than NUL.
1121
1122 mtime The file modification time of the following file(s), equiva‐
1123 lent to the value of the st_mtime member of the stat struc‐
1124 ture for a file, as described in the stat() function. This
1125 record shall override the mtime field in the following header
1126 block(s). The modification time shall be restored if the
1127 process has appropriate privileges required to do so. The
1128 format of the <value> shall be as described in pax Extended
1129 Header File Times.
1130
1131 path The pathname of the following file(s). This record shall
1132 override the name and prefix fields in the following header
1133 block(s). The pax utility shall translate the pathname of the
1134 file from the encoding in the header to the character set
1135 appropriate for the local file system.
1136
1137 When used in write or copy mode, pax shall include a path
1138 extended header record for each file whose pathname cannot be
1139 represented entirely with the members of the portable charac‐
1140 ter set other than NUL.
1141
1142 realtime.any
1143 The keywords prefixed by ``realtime.'' are reserved for
1144 future standardization.
1145
1146 security.any
1147 The keywords prefixed by ``security.'' are reserved for
1148 future standardization.
1149
1150 size The size of the file in octets, expressed as a decimal number
1151 using digits from the ISO/IEC 646:1991 standard. This record
1152 shall override the size field in the following header
1153 block(s). When used in write or copy mode, pax shall include
1154 a size extended header record for each file with a size value
1155 greater than 8589934591 (octal 77777777777).
1156
1157 uid The user ID of the file owner, expressed as a decimal number
1158 using digits from the ISO/IEC 646:1991 standard. This record
1159 shall override the uid field in the following header
1160 block(s). When used in write or copy mode, pax shall include
1161 a uid extended header record for each file whose owner ID is
1162 greater than 2097151 (octal 7777777).
1163
1164 uname The owner of the following file(s), formatted as a user name
1165 in the user database. This record shall override the uid and
1166 uname fields in the following header block(s), and any uid
1167 extended header record. When used in read, copy, or list
1168 mode, pax shall translate the name from the encoding in the
1169 header record to the character set appropriate for the user
1170 database on the receiving system. If any of the characters
1171 cannot be translated, and if neither the -oinvalid=UTF‐8
1172 option nor the -oinvalid=binary option is specified, the
1173 results are implementation-defined. When used in write or
1174 copy mode, pax shall include a uname extended header record
1175 for each file whose user name cannot be represented entirely
1176 with the letters and digits of the portable character set.
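
The numeric thresholds in the gid, size, and uid entries above are the largest values that fit in the corresponding octal ustar fields. A minimal Python sketch of that check follows; the function name required_numeric_records is hypothetical, and the sketch covers only these three keywords, not the character set rules for gname, linkpath, path, and uname.

```python
USTAR_MAX_ID = 0o7777777        # 2097151, largest uid/gid in a ustar field
USTAR_MAX_SIZE = 0o77777777777  # 8589934591, largest size in a ustar field


def required_numeric_records(uid: int, gid: int, size: int) -> dict:
    """Return the uid, gid, and size records that must be written."""
    records = {}
    if uid > USTAR_MAX_ID:
        records["uid"] = str(uid)    # decimal digits, per the uid entry
    if gid > USTAR_MAX_ID:
        records["gid"] = str(gid)
    if size > USTAR_MAX_SIZE:
        records["size"] = str(size)
    return records
```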
1177
1178 If the <value> field is zero length, it shall delete any header block
1179 field, previously entered extended header value, or global extended
1180 header value of the same name.
1181
1182 If a keyword in an extended header record (or in a -o option-argument)
1183 overrides or deletes a corresponding field in the ustar header block,
1184 pax shall ignore the contents of that header block field.
1185
1186 Unlike the ustar header block fields, NULs shall not delimit <value>s;
1187 all characters within the <value> field shall be considered data for
1188 the field. None of the length limitations of the ustar header block
1189 fields in Table 4-14, ustar Header Block shall apply to the extended
1190 header records.
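
The record layout and the rules above (values are not NUL-delimited, and a zero-length value requests deletion) can be sketched as a small Python parser. The function name parse_pax_records is hypothetical; the sketch keeps values as raw octets and uses None to mark a deletion request, both of which are choices made for illustration.

```python
def parse_pax_records(data: bytes) -> dict:
    """Split the data of an extended header into {keyword: value}."""
    records = {}
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)      # end of the <length> field
        length = int(data[pos:space])      # counts the whole record
        record = data[pos:pos + length]
        if length <= 0 or len(record) != length or not record.endswith(b"\n"):
            raise ValueError("malformed extended header record")
        # Keywords cannot contain '=', so the first '=' ends the keyword.
        keyword, _, value = record[space - pos + 1:-1].partition(b"=")
        key = keyword.decode("utf-8")
        # Values are raw octets and may contain NULs; an empty value
        # is kept as None to mark a deletion request.
        records[key] = value if value else None
        pos += length
    return records
```

Later records for the same keyword replace earlier ones in the returned dict, which matches the rule that the last conflicting record in a header takes precedence.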
1191
1192 pax Extended Header Keyword Precedence
1193 This section describes the precedence in which the various header
1194 records and fields and command line options are selected to apply to a
1195 file in the archive. When pax is used in read or list modes, it shall
1196 determine a file attribute in the following sequence:
1197
1198 1. If -odelete=keyword-prefix is used, the affected attributes shall
1199 be determined from step 7., if applicable, or ignored otherwise.
1200
1201 2. If -okeyword:= is used, the affected attributes shall be ignored.
1202
1203 3. If -okeyword:=value is used, the affected attribute shall be
1204 assigned the value.
1205
1206 4. If there is a typeflag x extended header record, the affected
1207 attribute shall be assigned the <value>. When extended header
1208 records conflict, the last one given in the header shall take
1209 precedence.
1210
1211 5. If -okeyword=value is used, the affected attribute shall be
1212 assigned the value.
1213
1214 6. If there is a typeflag g global extended header record, the
1215 affected attribute shall be assigned the <value>. When global
1216 extended header records conflict, the last one given in the global
1217 header shall take precedence.
1218
1219 7. Otherwise, the attribute shall be determined from the ustar header
1220 block.
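
The seven steps above can be sketched as a lookup chain in Python. This is a simplified illustration, not a definitive implementation: the function name and all parameter names are hypothetical, -odelete is modeled as a set of exact keyword names rather than a pattern, and zero-length values inside header records are not handled.

```python
def resolve_attribute(keyword, ustar, x_header=None, g_header=None,
                      opt_override=None, opt_default=None, opt_delete=()):
    """Pick the value of one attribute in read or list mode.

    ustar        -- fields of the ustar header block
    x_header     -- records of the typeflag x extended header
    g_header     -- records of the typeflag g global extended header
    opt_override -- -o keyword:=value settings ("" stands for -o keyword:=)
    opt_default  -- -o keyword=value settings
    opt_delete   -- keywords selected by -o delete= (simplified to names)
    Returns None when the attribute is ignored.
    """
    x_header = x_header or {}
    g_header = g_header or {}
    opt_override = opt_override or {}
    opt_default = opt_default or {}

    if keyword in opt_delete:            # step 1: use step 7 if applicable
        return ustar.get(keyword)
    if keyword in opt_override:          # steps 2 and 3
        return opt_override[keyword] or None
    if keyword in x_header:              # step 4
        return x_header[keyword]
    if keyword in opt_default:           # step 5
        return opt_default[keyword]
    if keyword in g_header:              # step 6
        return g_header[keyword]
    return ustar.get(keyword)            # step 7
```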
1221
1222 pax Extended Header File Times
1223 The pax utility shall write an mtime record for each file in write or
1224 copy modes if the file's modification time cannot be represented
1225 exactly in the ustar header logical record described in ustar Inter‐
1226 change Format. This can occur if the time is out of ustar range, or if
1227 the file system of the underlying implementation supports non-integer
1228 time granularities and the time is not an integer. All of these time
1229 records shall be formatted as a decimal representation of the time in
1230 seconds since the Epoch. If a <period> ('.') decimal point character
1231 is present, the digits to the right of the point shall represent the
1232 units of a subsecond timing granularity, where the first digit is
1233 tenths of a second and each subsequent digit is a tenth of the previous
1234 digit. In read or copy mode, the pax utility shall truncate the time of
1235 a file to the greatest value that is not greater than the input header
1236 file time. In write or copy mode, the pax utility shall output a time
1237 exactly if it can be represented exactly as a decimal number, and oth‐
1238 erwise shall generate only enough digits so that the same time shall be
1239 recovered if the file is extracted on a system whose underlying imple‐
1240 mentation supports the same time granularity.
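
The truncation rule for read and copy modes can be illustrated in Python. The sketch below parses a time value exactly and rounds it down to a given granularity; the function name parse_pax_time and the granularity_ns parameter are assumptions for illustration, with nanoseconds chosen as the finest unit.

```python
import math
from fractions import Fraction

NS_PER_SECOND = 1_000_000_000


def parse_pax_time(value: str, granularity_ns: int = 1):
    """Return (seconds, nanoseconds), never greater than the header time."""
    # Fraction parses the decimal string exactly, with no float rounding.
    total_ns = math.floor(Fraction(value) * NS_PER_SECOND)
    # Drop whatever the local time granularity cannot represent.
    total_ns -= total_ns % granularity_ns
    return divmod(total_ns, NS_PER_SECOND)
```

On a system with one-second granularity (granularity_ns set to 1000000000), the value 1700000000.9 is restored as 1700000000 seconds exactly, since rounding up would exceed the header time.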
1241
1242 ustar Interchange Format
1243 A ustar archive tape or file shall contain a series of logical records.
1244 Each logical record shall be a fixed-size logical record of 512 octets
1245 (see below). Although this format may be thought of as being stored on
1246 9-track industry-standard 12.7 mm (0.5 in) magnetic tape, other types
1247 of transportable media are not excluded. Each file archived shall be
1248 represented by a header logical record that describes the file, fol‐
1249 lowed by zero or more logical records that give the contents of the
1250 file. At the end of the archive file there shall be two 512-octet logi‐
1251 cal records filled with binary zeros, interpreted as an end-of-archive
1252 indicator.
1253
1254 The logical records may be grouped for physical I/O operations, as
1255 described under the -bblocksize and -x ustar options. Each group of
1256 logical records may be written with a single operation equivalent to
1257 the write() function. On magnetic tape, the result of this write shall
1258 be a single tape physical block. The last physical block shall always
1259 be the full size, so logical records after the two zero logical records
1260 may contain undefined data.
1261
1262 The header logical record shall be structured as shown in the following
1263 table. All lengths and offsets are in decimal.
1264
1265 Table 4-14: ustar Header Block
1266
1267 ┌───────────┬──────────────┬────────────────────┐
1268 │Field Name │ Octet Offset │ Length (in Octets) │
1269 ├───────────┼──────────────┼────────────────────┤
1270 │name │ 0 │ 100 │
1271 │mode │ 100 │ 8 │
1272 │uid │ 108 │ 8 │
1273 │gid │ 116 │ 8 │
1274 │size │ 124 │ 12 │
1275 │mtime │ 136 │ 12 │
1276 │chksum │ 148 │ 8 │
1277 │typeflag │ 156 │ 1 │
1278 │linkname │ 157 │ 100 │
1279 │magic │ 257 │ 6 │
1280 │version │ 263 │ 2 │
1281 │uname │ 265 │ 32 │
1282 │gname │ 297 │ 32 │
1283 │devmajor │ 329 │ 8 │
1284 │devminor │ 337 │ 8 │
1285 │prefix │ 345 │ 155 │
1286 └───────────┴──────────────┴────────────────────┘
1287 All characters in the header logical record shall be represented in the
1288 coded character set of the ISO/IEC 646:1991 standard. For maximum
1289 portability between implementations, names should be selected from
1290 characters represented by the portable filename character set as octets
1291 with the most significant bit zero. If an implementation supports the
1292 use of characters outside of <slash> and the portable filename charac‐
1293 ter set in names for files, users, and groups, one or more implementa‐
1294 tion-defined encodings of these characters shall be provided for inter‐
1295 change purposes.
1296
1297 However, the pax utility shall never create filenames on the local sys‐
1298 tem that cannot be accessed via the procedures described in
1299 POSIX.1‐2008. If a filename is found on the medium that would create an
1300 invalid filename, it is implementation-defined whether the data from
1301 the file is stored on the file hierarchy and under what name it is
1302 stored. The pax utility may choose to ignore these files as long as it
1303 produces an error indicating that the file is being ignored.
1304
1305 Each field within the header logical record is contiguous; that is,
1306 there is no padding used. Each character on the archive medium shall be
1307 stored contiguously.
1308
1309 The fields magic, uname, and gname are character strings each termi‐
1310 nated by a NUL character. The fields name, linkname, and prefix are
1311 NUL-terminated character strings except when every octet in the
1312 array, including the last, is a non-NUL character. The ver‐
1313 sion field is two octets containing the characters "00" (zero-zero).
1314 The typeflag contains a single character. All other fields are leading
1315 zero-filled octal numbers using digits from the ISO/IEC 646:1991 stan‐
1316 dard IRV. Each numeric field is terminated by one or more <space> or
1317 NUL characters.
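
The field layout in Table 4-14 and the termination rules above can be sketched as a Python decoder for one 512-octet header. The function name parse_ustar_header is hypothetical, the sketch does not verify chksum, and decoding string fields as ASCII with replacement of other octets is an assumption made to keep the example short.

```python
# (field name, octet offset, length in octets), as in Table 4-14.
USTAR_FIELDS = [
    ("name", 0, 100), ("mode", 100, 8), ("uid", 108, 8), ("gid", 116, 8),
    ("size", 124, 12), ("mtime", 136, 12), ("chksum", 148, 8),
    ("typeflag", 156, 1), ("linkname", 157, 100), ("magic", 257, 6),
    ("version", 263, 2), ("uname", 265, 32), ("gname", 297, 32),
    ("devmajor", 329, 8), ("devminor", 337, 8), ("prefix", 345, 155),
]
NUMERIC_FIELDS = {"mode", "uid", "gid", "size", "mtime", "chksum",
                  "devmajor", "devminor"}


def parse_ustar_header(block: bytes) -> dict:
    """Decode the fields of one header logical record."""
    if len(block) != 512:
        raise ValueError("a header logical record is 512 octets")
    header = {}
    for name, offset, length in USTAR_FIELDS:
        raw = block[offset:offset + length]
        if name in NUMERIC_FIELDS:
            # Octal digits terminated by one or more <space> or NUL.
            digits = raw.decode("ascii", "replace").strip(" \0")
            header[name] = int(digits, 8) if digits else 0
        elif name in ("typeflag", "version"):
            # Fixed-width fields with no terminator.
            header[name] = raw.decode("ascii", "replace")
        else:
            # String fields end at the first NUL, or fill the whole array.
            header[name] = raw.split(b"\0", 1)[0].decode("ascii", "replace")
    return header
```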
1318
1319 The name and the prefix fields shall produce the pathname of the file.
1320 A new pathname shall be formed, if prefix is not an empty string (its
1321 first character is not NUL), by concatenating prefix (up to the first
1322 NUL character), a <slash> character, and name; otherwise, name is used
1323 alone. In either case, name is terminated at the first NUL character.
1324 If prefix begins with a NUL character, it shall be ignored. In this
1325 manner, pathnames of at most 256 characters can be supported. If a
1326 pathname does not fit in the space provided, pax shall notify the user
1327 of the error, and shall not store any part of the file—header or data—
1328 on the medium.
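
The pathname rule can be sketched in Python in both directions. Reading follows the text above directly. For writing, the standard does not prescribe how to choose the split point, so split_ustar_pathname shows only one possible strategy (the first <slash> that lets both parts fit); both function names are hypothetical.

```python
def ustar_pathname(name_field: bytes, prefix_field: bytes) -> bytes:
    """Rebuild the pathname from the raw name and prefix fields."""
    name = name_field.split(b"\0", 1)[0]      # name ends at the first NUL
    prefix = prefix_field.split(b"\0", 1)[0]  # empty if it begins with NUL
    return prefix + b"/" + name if prefix else name


def split_ustar_pathname(path: bytes):
    """Return (prefix, name) fitting the 155- and 100-octet fields."""
    if len(path) <= 100:
        return b"", path
    for i in range(1, len(path)):
        # Split at a <slash>; the slash itself is not stored in either field.
        if path[i:i + 1] == b"/" and i <= 155 and 0 < len(path) - i - 1 <= 100:
            return path[:i], path[i + 1:]
    raise ValueError("pathname does not fit in the ustar header")
```

A caller that catches the ValueError would report the error and skip the file, or, when writing the pax format, fall back to a path extended header record.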
1329
1330 The linkname field, described below, shall not use the prefix to pro‐
1331 duce a pathname. As such, a linkname is limited to 100 characters. If
1332 the name does not fit in the space provided, pax shall notify the user
1333 of the error, and shall not attempt to store the link on the medium.
1334
1335 The mode field provides 12 bits encoded in the ISO/IEC 646:1991 stan‐
1336 dard octal digit representation. The encoded bits shall represent the
1337 following values:
1338
1339 Table: ustar mode Field
1340
1341 ┌──────────┬──────────────────┬─────────────────────────────────────────────────┐
1342 │Bit Value │ POSIX.1‐2008 Bit │ Description                                     │
1343 ├──────────┼──────────────────┼─────────────────────────────────────────────────┤
1344 │ 04000    │ S_ISUID          │ Set UID on execution.                           │
1345 │ 02000    │ S_ISGID          │ Set GID on execution.                           │
1346 │ 01000    │ <reserved>       │ Reserved for future standardization.            │
1347 │ 00400    │ S_IRUSR          │ Read permission for file owner class.           │
1348 │ 00200    │ S_IWUSR          │ Write permission for file owner class.          │
1349 │ 00100    │ S_IXUSR          │ Execute/search permission for file owner class. │
1350 │ 00040    │ S_IRGRP          │ Read permission for file group class.           │
1351 │ 00020    │ S_IWGRP          │ Write permission for file group class.          │
1352 │ 00010    │ S_IXGRP          │ Execute/search permission for file group class. │
1353 │ 00004    │ S_IROTH          │ Read permission for file other class.           │
1354 │ 00002    │ S_IWOTH          │ Write permission for file other class.          │
1355 │ 00001    │ S_IXOTH          │ Execute/search permission for file other class. │
1356 └──────────┴──────────────────┴─────────────────────────────────────────────────┘
1357 When appropriate privileges are required to set one of these mode bits,
1358 and the user restoring the files from the archive does not have appro‐
1359 priate privileges, the mode bits for which the user does not have
1360 appropriate privileges shall be ignored. Some of the mode bits in the
1361 archive format are not mentioned elsewhere in this volume of
1362 POSIX.1‐2017. If the implementation does not support those bits, they
1363 may be ignored.
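
As a rough illustration of the table above, the following Python sketch decodes a mode field into the names of the bits it sets. The function name decode_ustar_mode is hypothetical, and the sketch assumes the field holds octal digits followed by NUL or <space> terminators.

```python
USTAR_MODE_BITS = (
    (0o4000, "S_ISUID"), (0o2000, "S_ISGID"), (0o1000, "<reserved>"),
    (0o0400, "S_IRUSR"), (0o0200, "S_IWUSR"), (0o0100, "S_IXUSR"),
    (0o0040, "S_IRGRP"), (0o0020, "S_IWGRP"), (0o0010, "S_IXGRP"),
    (0o0004, "S_IROTH"), (0o0002, "S_IWOTH"), (0o0001, "S_IXOTH"),
)


def decode_ustar_mode(mode_field: bytes) -> list:
    """Return the names of the mode bits set in a raw ustar mode field."""
    # Assumption: trailing NUL and space octets are terminators, not digits.
    digits = mode_field.rstrip(b" \0")
    value = int(digits, 8) if digits else 0
    return [name for bit, name in USTAR_MODE_BITS if value & bit]
```

A restoring utility could drop any names from this list for which the user lacks appropriate privileges before applying the mode.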
1364
1365 The uid and gid fields are the user and group ID of the owner and group
1366 of the file, respectively.
1367
1368 The size field is the size of the file in octets. If the typeflag field
1369 is set to specify a file to be of type 1 (a link) or 2 (a symbolic
1370 link), the size field shall be specified as zero. If the typeflag field
1371 is set to specify a file of type 5 (directory), the size field shall be
1372 interpreted as described under the definition of that record type. No
1373 data logical records are stored for types 1, 2, or 5. If the typeflag
1374 field is set to 3 (character special file), 4 (block special file), or
1375 6 (FIFO), the meaning of the size field is unspecified by this volume
1376 of POSIX.1‐2017, and no data logical records shall be stored on the
1377 medium. Additionally, for type 6, the size field shall be ignored when
1378 reading. If the typeflag field is set to any other value, the number of
1379 logical records written following the header shall be (size+511)/512,
1380 ignoring any fraction in the result of the division.
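
The record-count rule can be written out as a short Python sketch. The function name data_record_count is hypothetical; the typeflag values are the ones listed in this section, and types 1 through 6 are treated as storing no data records, as described above.

```python
RECORD_SIZE = 512
# Links, symbolic links, special files, directories and FIFOs carry no data.
NO_DATA_TYPEFLAGS = frozenset("123456")


def data_record_count(typeflag: str, size: int) -> int:
    """Number of 512-octet logical records that follow the header."""
    if typeflag in NO_DATA_TYPEFLAGS:
        return 0
    # Integer division discards the fraction, so this rounds size up.
    return (size + RECORD_SIZE - 1) // RECORD_SIZE
```

For example, a regular file of 513 octets occupies two data records, the second of which is mostly padding.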
1381
1382 The mtime field shall be the modification time of the file at the time
1383 it was archived. It is the ISO/IEC 646:1991 standard representation of
1384 the octal value of the modification time obtained from the stat() func‐
1385 tion.
1386
1387 The chksum field shall be the ISO/IEC 646:1991 standard IRV representa‐
1388 tion of the octal value of the simple sum of all octets in the header
1389 logical record. Each octet in the header shall be treated as an
1390 unsigned value. These values shall be added to an unsigned integer,
1391 initialized to zero, the precision of which is not less than 17 bits.
1392 When calculating the checksum, the chksum field is treated as if it
1393 were all <space> characters.
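
A minimal Python sketch of the checksum calculation follows. The function name ustar_checksum is hypothetical, and the position of the chksum field (octet offset 148, length 8) is taken from the ustar header block layout, which is defined outside this passage.

```python
HEADER_SIZE = 512
CHKSUM_OFFSET = 148  # from the ustar header block layout
CHKSUM_LENGTH = 8


def ustar_checksum(header: bytes) -> int:
    """Sum all header octets as unsigned values, with chksum as spaces."""
    if len(header) != HEADER_SIZE:
        raise ValueError("a ustar header is one 512-octet logical record")
    blanked = (header[:CHKSUM_OFFSET]
               + b" " * CHKSUM_LENGTH
               + header[CHKSUM_OFFSET + CHKSUM_LENGTH:])
    # Iterating over bytes yields integers in the range 0..255.
    return sum(blanked)
```

The largest possible sum is 512 * 255 = 130560, which is below 2^17 = 131072; this is why 17 bits of precision are sufficient.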
1394
1395 The typeflag field specifies the type of file archived. If a particular
1396 implementation does not recognize the type, or the user does not have
1397 appropriate privileges to create that type, the file shall be extracted
1398 as if it were a regular file if the file type is defined to have a
1399 meaning for the size field that could cause data logical records to be
1400 written on the medium (see the previous description for size). If con‐
1401 version to a regular file occurs, the pax utility shall produce an
1402 error indicating that the conversion took place. All of the typeflag
1403 fields shall be coded in the ISO/IEC 646:1991 standard IRV:
1404
1405 0 Represents a regular file. For backwards-compatibility, a type‐
1406 flag value of binary zero ('\0') should be recognized as mean‐
1407 ing a regular file when extracting files from the archive. Ar‐
1408 chives written with this version of the archive file format
1409 create regular files with a typeflag value of the
1410 ISO/IEC 646:1991 standard IRV '0'.
1411
1412 1 Represents a file linked to another file, of any type, previ‐
1413 ously archived. Such files are identified by having the same
1414 device and file serial numbers, and pathnames that refer to
1415 different directory entries. All such files shall be archived
1416 as linked files. The linked-to name is specified in the
1417 linkname field with a NUL-character terminator if it is less
1418 than 100 octets in length.
1419
1420 2 Represents a symbolic link. The contents of the symbolic link
1421 shall be stored in the linkname field.
1422
1423 3,4 Represent character special files and block special files
1424 respectively. In this case the devmajor and devminor fields
1425 shall contain information defining the device, the format of
1426 which is unspecified by this volume of POSIX.1‐2017. Implemen‐
1427 tations may map the device specifications to their own local
1428 specification or may ignore the entry.
1429
1430 5 Specifies a directory or subdirectory. On systems where disk
1431 allocation is performed on a directory basis, the size field
1432 shall contain the maximum number of octets (which may be
1433 rounded to the nearest disk block allocation unit) that the
1434 directory may hold. A size field of zero indicates no such
1435 limiting. Systems that do not support limiting in this manner
1436 should ignore the size field.
1437
1438 6 Specifies a FIFO special file. Note that the archiving of a
1439 FIFO file archives the existence of this file and not its con‐
1440 tents.
1441
1442 7 Reserved to represent a file to which an implementation has
1443 associated some high-performance attribute. Implementations
1444 without such extensions should treat this file as a regular
1445 file (type 0).
1446
1447 A‐Z The letters 'A' to 'Z', inclusive, are reserved for custom
1448 implementations. All other values are reserved for future ver‐
1449 sions of this standard.
1450
1451 It is unspecified whether files with pathnames that refer to the same
1452 directory entry are archived as linked files or as separate files. If
1453 they are archived as linked files, this means that attempting to
1454 extract both pathnames from the resulting archive will always cause an
1455 error (unless the -u option is used) because the link cannot be cre‐
1456 ated.
1457
1458 It is unspecified whether files with the same device and file serial
1459 numbers being appended to an archive are treated as linked files to
1460 members that were in the archive before the append.
1461
1462 Attempts to archive a socket shall produce a diagnostic message when
1463 ustar interchange format is used, but may be allowed when pax inter‐
1464 change format is used. Handling of other file types is implementation-
1465 defined.
1466
1467 The magic field is the specification that this archive was output in
1468 this archive format. If this field contains ustar (the five characters
1469 from the ISO/IEC 646:1991 standard IRV shown followed by NUL), the
1470 uname and gname fields shall contain the ISO/IEC 646:1991 standard IRV
1471 representation of the owner and group of the file, respectively (trun‐
1472 cated to fit, if necessary). When the file is restored by a privileged,
1473 protection-preserving version of the utility, the user and group data‐
1474 bases shall be scanned for these names. If found, the user and group
1475 IDs contained within these files shall be used rather than the values
1476 contained within the uid and gid fields.
1477
1478 cpio Interchange Format
1479 The octet-oriented cpio archive format shall be a series of entries,
1480 each comprising a header that describes the file, the name of the file,
1481 and then the contents of the file.
1482
1483 An archive may be recorded as a series of fixed-size blocks of octets.
1484 This blocking shall be used only to make physical I/O more efficient.
1485 The last group of blocks shall always be at the full size.
1486
1487 For the octet-oriented cpio archive format, the individual entry infor‐
1488 mation shall be in the order indicated and described by the following
1489 table; see also the <cpio.h> header.
1490
1491 Table 4-16: Octet-Oriented cpio Archive Entry
1492
1493 ┌─────────────────────┬────────────────────┬─────────────────┐
1494 │ Header Field Name   │ Length (in Octets) │ Interpreted as  │
1495 ├─────────────────────┼────────────────────┼─────────────────┤
1496 │c_magic              │ 6                  │ Octal number    │
1497 │c_dev                │ 6                  │ Octal number    │
1498 │c_ino                │ 6                  │ Octal number    │
1499 │c_mode               │ 6                  │ Octal number    │
1500 │c_uid                │ 6                  │ Octal number    │
1501 │c_gid                │ 6                  │ Octal number    │
1502 │c_nlink              │ 6                  │ Octal number    │
1503 │c_rdev               │ 6                  │ Octal number    │
1504 │c_mtime              │ 11                 │ Octal number    │
1505 │c_namesize           │ 6                  │ Octal number    │
1506 │c_filesize           │ 11                 │ Octal number    │
1507 ├─────────────────────┼────────────────────┼─────────────────┤
1508 │Filename Field Name  │ Length             │ Interpreted as  │
1509 ├─────────────────────┼────────────────────┼─────────────────┤
1510 │c_name               │ c_namesize         │ Pathname string │
1511 ├─────────────────────┼────────────────────┼─────────────────┤
1512 │File Data Field Name │ Length             │ Interpreted as  │
1513 ├─────────────────────┼────────────────────┼─────────────────┤
1514 │c_filedata           │ c_filesize         │ Data            │
1515 └─────────────────────┴────────────────────┴─────────────────┘
1516 cpio Header
1517 For each file in the archive, a header as defined previously shall be
1518 written. The information in the header fields is written as streams of
1519 the ISO/IEC 646:1991 standard characters interpreted as octal numbers.
1520 The octal numbers shall be extended to the necessary length by append‐
1521 ing the ISO/IEC 646:1991 standard IRV zeros at the most-significant-
1522 digit end of the number; the result is written to the most-significant
1523 digit of the stream of octets first. The fields shall be interpreted
1524 as follows:
1525
1526 c_magic Identify the archive as being a transportable archive by con‐
1527 taining the identifying value "070707".
1528
1529 c_dev, c_ino
1530 Contains values that uniquely identify the file within the
1531 archive (that is, no files contain the same pair of c_dev and
1532 c_ino values unless they are links to the same file). The
1533 values shall be determined in an unspecified manner.
1534
1535 c_mode Contains the file type and access permissions as defined in
1536 the following table.
1537
1538 Table 4-17: Values for cpio c_mode Field
1539
1540 ┌──────────────────────┬─────────┬────────────────────────┐
1541 │File Permissions Name │ Value   │ Indicates              │
1542 ├──────────────────────┼─────────┼────────────────────────┤
1543 │C_IRUSR               │ 000400  │ Read by owner          │
1544 │C_IWUSR               │ 000200  │ Write by owner         │
1545 │C_IXUSR               │ 000100  │ Execute by owner       │
1546 │C_IRGRP               │ 000040  │ Read by group          │
1547 │C_IWGRP               │ 000020  │ Write by group         │
1548 │C_IXGRP               │ 000010  │ Execute by group       │
1549 │C_IROTH               │ 000004  │ Read by others         │
1550 │C_IWOTH               │ 000002  │ Write by others        │
1551 │C_IXOTH               │ 000001  │ Execute by others      │
1552 │C_ISUID               │ 004000  │ Set uid                │
1553 │C_ISGID               │ 002000  │ Set gid                │
1554 │C_ISVTX               │ 001000  │ Reserved               │
1555 ├──────────────────────┼─────────┼────────────────────────┤
1556 │File Type Name        │ Value   │ Indicates              │
1557 ├──────────────────────┼─────────┼────────────────────────┤
1558 │C_ISDIR               │ 040000  │ Directory              │
1559 │C_ISFIFO              │ 010000  │ FIFO                   │
1560 │C_ISREG               │ 0100000 │ Regular file           │
1561 │C_ISLNK               │ 0120000 │ Symbolic link          │
1562 │                      │         │                        │
1563 │C_ISBLK               │ 060000  │ Block special file     │
1564 │C_ISCHR               │ 020000  │ Character special file │
1565 │C_ISSOCK              │ 0140000 │ Socket                 │
1566 │                      │         │                        │
1567 │C_ISCTG               │ 0110000 │ Reserved               │
1568 └──────────────────────┴─────────┴────────────────────────┘
1569 Directories, FIFOs, symbolic links, and regular files shall
1570 be supported on a system conforming to this volume of
1571 POSIX.1‐2017; additional values defined previously are
1572 reserved for compatibility with existing systems. Additional
1573 file types may be supported; however, such files should not
1574 be written to archives intended to be transported to other
1575 systems.
1576
1577 c_uid Contains the user ID of the owner.
1578
1579 c_gid Contains the group ID of the group.
1580
1581 c_nlink Contains a number greater than or equal to the number of
1582 links in the archive referencing the file. If the -a option
1583 is used to append to a cpio archive, then the pax utility
1584 need not account for the files in the existing part of the
1585 archive when calculating the c_nlink values for the appended
1586 part of the archive, and need not alter the c_nlink values in
1587 the existing part of the archive if additional files with the
1588 same c_dev and c_ino values are appended to the archive.
1589
1590 c_rdev Contains implementation-defined information for character or
1591 block special files.
1592
1593 c_mtime Contains the latest time of modification of the file at the
1594 time the archive was created.
1595
1596 c_namesize
1597 Contains the length of the pathname, including the terminat‐
1598 ing NUL character.
1599
1600 c_filesize
1601 Contains the length in octets of the data section following
1602 the header structure.
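
The fixed-width header described above lends itself to a table-driven parser. The following Python sketch is one possible reading of the layout, not a reference implementation; the function name parse_cpio_header is hypothetical, while the field names and widths are copied from the table of header fields.

```python
CPIO_FIELDS = (
    ("c_magic", 6), ("c_dev", 6), ("c_ino", 6), ("c_mode", 6),
    ("c_uid", 6), ("c_gid", 6), ("c_nlink", 6), ("c_rdev", 6),
    ("c_mtime", 11), ("c_namesize", 6), ("c_filesize", 11),
)
CPIO_HEADER_LEN = sum(width for _, width in CPIO_FIELDS)  # 76 octets


def parse_cpio_header(raw: bytes) -> dict:
    """Decode the octal header fields of one cpio entry."""
    if len(raw) < CPIO_HEADER_LEN:
        raise ValueError("short cpio header")
    fields, pos = {}, 0
    for name, width in CPIO_FIELDS:
        # Each field is a zero-padded octal number, most significant digit first.
        fields[name] = int(raw[pos:pos + width], 8)
        pos += width
    if fields["c_magic"] != 0o070707:
        raise ValueError("c_magic is not 070707")
    return fields
```

The c_name field of c_namesize octets, and then c_filesize octets of data, follow immediately after these 76 octets.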
1603
1604 cpio Filename
1605 The c_name field shall contain the pathname of the file. The length of
1606 this field in octets is the value of c_namesize.
1607
1608 If a filename is found on the medium that would create an invalid path‐
1609 name, it is implementation-defined whether the data from the file is
1610 stored on the file hierarchy and under what name it is stored.
1611
1612 All characters shall be represented in the ISO/IEC 646:1991 standard
1613 IRV. For maximum portability between implementations, names should be
1614 selected from characters represented by the portable filename character
1615 set as octets with the most significant bit zero. If an implementation
1616 supports the use of characters outside the portable filename character
1617 set in names for files, users, and groups, one or more implementation-
1618 defined encodings of these characters shall be provided for interchange
1619 purposes. However, the pax utility shall never create filenames on the
1620 local system that cannot be accessed via the procedures described pre‐
1621 viously in this volume of POSIX.1‐2017. If a filename is found on the
1622 medium that would create an invalid filename, it is implementation-
1623 defined whether the data from the file is stored on the local file sys‐
1624 tem and under what name it is stored. The pax utility may choose to
1625 ignore these files as long as it produces an error indicating that the
1626 file is being ignored.
1627
1628 cpio File Data
1629 Following c_name, there shall be c_filesize octets of data. Interpreta‐
1630 tion of such data occurs in a manner dependent on the file. For regular
1631 files, the data shall consist of the contents of the file. For symbolic
1632 links, the data shall consist of the contents of the symbolic link. If
1633 c_filesize is zero, no data shall be contained in c_filedata.
1634
1635 When restoring from an archive:
1636
1637 * If the user does not have appropriate privileges to create a file
1638 of the specified type, pax shall ignore the entry and write an
1639 error message to standard error.
1640
1641 * Only regular files and symbolic links have data to be restored.
1642 Presuming a regular file meets any selection criteria that might be
1643 imposed on the format-reading utility by the user, such data shall
1644 be restored.
1645
1646 * If a user does not have appropriate privileges to set a particular
1647 mode flag, the flag shall be ignored. Some of the mode flags in the
1648 archive format are not mentioned elsewhere in this volume of
1649 POSIX.1‐2017. If the implementation does not support those flags,
1650 they may be ignored.
1651
1652 cpio Special Entries
1653 FIFO special files, directories, and the trailer shall be recorded with
1654 c_filesize equal to zero. Symbolic links shall be recorded with c_file‐
1655 size equal to the length of the contents of the symbolic link. For
1656 other special files, c_filesize is unspecified by this volume of
1657 POSIX.1‐2017. The header for the next file entry in the archive shall
1658 be written directly after the last octet of the file entry preceding
1659 it. A header denoting the filename TRAILER!!! shall indicate the end
1660 of the archive; the contents of octets in the last block of the archive
1661 following such a header are undefined.
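
Putting the header, filename, and file data rules together, an archive can be walked entry by entry until the trailer is found. The Python sketch below is illustrative only: iter_cpio and make_entry are hypothetical names, and make_entry fills most header fields with zeros purely to build test input.

```python
import io

FIELDS = (
    ("c_magic", 6), ("c_dev", 6), ("c_ino", 6), ("c_mode", 6),
    ("c_uid", 6), ("c_gid", 6), ("c_nlink", 6), ("c_rdev", 6),
    ("c_mtime", 11), ("c_namesize", 6), ("c_filesize", 11),
)
HEADER_LEN = sum(width for _, width in FIELDS)  # 76 octets


def make_entry(name: bytes, data: bytes = b"", mode: int = 0o100644) -> bytes:
    """Build one entry (hypothetical helper; unset fields are zero)."""
    values = {"c_magic": 0o070707, "c_mode": mode, "c_nlink": 1,
              "c_namesize": len(name) + 1, "c_filesize": len(data)}
    header = "".join(f"{values.get(field, 0):0{width}o}"
                     for field, width in FIELDS)
    return header.encode("ascii") + name + b"\0" + data


def iter_cpio(stream):
    """Yield (name, data) for each entry up to the TRAILER!!! header."""
    while True:
        raw = stream.read(HEADER_LEN)
        if len(raw) < HEADER_LEN:
            raise EOFError("archive ended before the TRAILER!!! entry")
        header, pos = {}, 0
        for field, width in FIELDS:
            header[field] = int(raw[pos:pos + width], 8)
            pos += width
        if header["c_magic"] != 0o070707:
            raise ValueError("not a cpio header")
        # c_namesize counts the terminating NUL, which is dropped here.
        name = stream.read(header["c_namesize"])[:-1]
        data = stream.read(header["c_filesize"])
        if name == b"TRAILER!!!":
            return  # octets after the trailer are undefined padding
        yield name, data
```

Because each header is written directly after the last octet of the preceding entry, no alignment or padding needs to be skipped between entries.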
1662
1664 The following exit values shall be returned:
1665
1666 0 All files were processed successfully.
1667
1668 >0 An error occurred.
1669
1671 If pax cannot create a file or a link when reading an archive or cannot
1672 find a file when writing an archive, or cannot preserve the user ID,
1673 group ID, or file mode when the -p option is specified, a diagnostic
1674 message shall be written to standard error and a non-zero exit status
1675 shall be returned, but processing shall continue. In the case where pax
1676 cannot create a link to a file, pax shall not, by default, create a
1677 second copy of the file.
1678
1679 If the extraction of a file from an archive is prematurely terminated
1680 by a signal or error, pax may have only partially extracted the file or
1681 (if the -n option was not specified) may have extracted a file of the
1682 same name as that specified by the user, but which is not the file the
1683 user wanted. Additionally, the file modes of extracted directories may
1684 have additional bits from the S_IRWXU mask set as well as incorrect
1685 modification and access times.
1686
1687 The following sections are informative.
1688
1690 Caution is advised when using the -a option to append to a cpio format
1691 archive. If any of the files being appended happen to be given the same
1692 c_dev and c_ino values as a file in the existing part of the archive,
1693 then they may be treated as links to that file on extraction. Thus, it
1694 is risky to use -a with cpio format except when it is done on the same
1695 system that the original archive was created on, and with the same pax
1696 utility, and in the knowledge that there has been little or no file
1697 system activity since the original archive was created that could lead
1698 to any of the files appended being given the same c_dev and c_ino val‐
1699 ues as an unrelated file in the existing part of the archive. Also,
1700 when (intentionally) appending additional links to a file in the exist‐
1701 ing part of the archive, the c_nlink values in the modified archive can
1702 be smaller than the number of links to the file in the archive, which
1703 may mean that the links are not preserved on extraction.
1704
1705 The -p (privileges) option was invented to reconcile differences
1706 between historical tar and cpio implementations. In particular, the two
1707 utilities use -m in diametrically opposed ways. The -p option also pro‐
1708 vides a consistent means of extending the ways in which future file
1709 attributes can be addressed, such as for enhanced security systems or
1710 high-performance files. Although it may seem complex, there are really
1711 two modes that are most commonly used:
1712
1713 -p e ``Preserve everything''. This would be used by the historical
1714 superuser, someone with all appropriate privileges, to preserve
1715 all aspects of the files as they are recorded in the archive.
1716 The e flag is the sum of o and p, and other implementation-
1717 defined attributes.
1718
1719 -p p ``Preserve'' the file mode bits. This would be used by the user
1720 with regular privileges who wished to preserve aspects of the
1721 file other than the ownership. The file times are preserved by
1722 default, but two other flags are offered to disable these and
1723 use the time of extraction.
1724
1725 The one pathname per line format of standard input precludes pathnames
1726 containing <newline> characters. Although such pathnames violate the
1727 portable filename guidelines, they may exist and their presence may
1728 inhibit usage of pax within shell scripts. This problem is inherited
1729 from historical archive programs. The problem can be avoided by listing
1730 filename arguments on the command line instead of on standard input.
1731
1732 It is almost certain that appropriate privileges are required for pax
1733 to accomplish parts of this volume of POSIX.1‐2017. Specifically, cre‐
1734 ating files of type block special or character special, restoring file
1735 access times unless the files are owned by the user (the -t option), or
1736 preserving file owner, group, and mode (the -p option) all probably
1737 require appropriate privileges.
1738
1739 In read mode, implementations are permitted to overwrite files when the
1740 archive has multiple members with the same name. This may fail if per‐
1741 missions on the first version of the file do not permit it to be over‐
1742 written.
1743
1744 The cpio and ustar formats can only support files up to 8589934592
1745 bytes (8 ∗ 2^30) in size.
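
The figure above follows from the width of the size fields. As a small worked check in Python, assuming 11 usable octal digits in both formats (c_filesize is 11 octets according to the cpio table; for ustar this is an assumption about the size field after its terminator):

```python
SIZE_DIGITS = 11  # assumed number of octal digits available for a file size


def max_octal_value(digits: int) -> int:
    """Largest value expressible in the given number of octal digits."""
    return 8 ** digits - 1


# 8**11 equals 8 * 2**30, the limit quoted in the text.
limit = max_octal_value(SIZE_DIGITS) + 1
```

Under that assumption, the largest size that can actually be written into the field is one octet less than the quoted limit.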
1746
1747 When archives containing binary header information are listed, the
1748 filenames printed may cause strange behavior on some terminals.
1749
1750 When all of the following are true:
1751
1752 1. A file of type directory is being placed into an archive.
1753
1754 2. The ustar archive format is being used.
1755
1756 3. The pathname of the directory is less than or equal to 155 bytes
1757 long (it will fit in the prefix field in the ustar header block).
1758
1759 4. The last component of the pathname of the directory is longer than
1760 100 bytes (it will not fit in the name field in the ustar
1761 header block).
1762
1763 some implementations of the pax utility will place the entire directory
1764 pathname in the prefix field, set the name field to an empty string,
1765 and place the directory in the archive. Other implementations of the
1766 pax utility will give an error under these conditions because the name
1767 field is not large enough to hold the last component of the directory
1768 name. This standard allows either behavior. However, when extracting a
1769 directory from a ustar format archive, this standard requires that all
1770 implementations be able to extract a directory even if the name field
1771 contains an empty string as long as the prefix field does not also con‐
1772 tain an empty string.
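
The two permitted behaviors can be modeled with a small Python sketch. Everything here is hypothetical naming: split_ustar_path tries to split a pathname at a <slash> so that both halves fit, and the allow_empty_name flag selects between the implementations that store the whole directory pathname in prefix and those that report an error.

```python
NAME_SIZE = 100
PREFIX_SIZE = 155


def split_ustar_path(path: bytes, allow_empty_name: bool = False):
    """Return (prefix, name) for a ustar header, or raise ValueError."""
    if len(path) <= NAME_SIZE:
        return b"", path
    for i, octet in enumerate(path):
        if octet == 0x2F:  # '/'
            prefix, name = path[:i], path[i + 1:]
            if 0 < len(prefix) <= PREFIX_SIZE and 0 < len(name) <= NAME_SIZE:
                return prefix, name
    # No usable split: optionally store the whole pathname in prefix.
    if allow_empty_name and len(path) <= PREFIX_SIZE:
        return path, b""
    raise ValueError("pathname does not fit in the ustar header")
```

A reader must accept the (prefix, empty name) form in either case, since extraction of such entries is required.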
1773
1775 The following command:
1776
1777
1778 pax -w -f /dev/rmt/1m .
1779
1780 copies the contents of the current directory to tape drive 1, medium
1781 density (assuming historical System V device naming procedures—the his‐
1782 torical BSD device name would be /dev/rmt9).
1783
1784 The following commands:
1785
1786
1787 mkdir newdir
1788 pax -rw olddir newdir
1789
1790 copy the olddir directory hierarchy to newdir.
1791
1792
1793 pax -r -s ',^//*usr//*,,' -f a.pax
1794
1795 reads the archive a.pax, with all files rooted in /usr in the archive
1796 extracted relative to the current directory.
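
To see what the substitution in this example does to individual member names, the same expression can be tried in Python. This is only an approximation: pax uses basic regular expressions, and the sketch relies on this particular pattern meaning the same thing in Python's re syntax. The function name strip_usr_root is hypothetical.

```python
import re


def strip_usr_root(pathname: str) -> str:
    """Apply the equivalent of -s ',^//*usr//*,,' to one pathname."""
    # count=1 mirrors a substitution given without the g flag.
    return re.sub(r"^//*usr//*", "", pathname, count=1)
```

For instance, /usr/bin/ls becomes bin/ls, while a name that does not start with /usr/ is left unchanged.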
1797
1798 Using the option:
1799
1800
1801 -o listopt="%M %(atime)T %(size)D %(name)s"
1802
1803 overrides the default output description in Standard Output and instead
1804 writes:
1805
1806
1807 -rw-rw--- Jan 12 15:53 1991 1492 /usr/foo/bar
1808
1809 Using the options:
1810
1811
1812 -o listopt='%L\t%(size)D\n%.7' \
1813 -o listopt='(name)s\n%(atime)T\n%T'
1814
1815 overrides the default output description in Standard Output and instead
1816 writes:
1817
1818
1819 /usr/foo/bar -> /tmp 1492
1820 /usr/fo
1821 Jan 12 15:53 1991
1822 Jan 31 15:53 2003
1823
1825 The pax utility was new for the ISO POSIX‐2:1993 standard. It repre‐
1826 sents a peaceful compromise between advocates of the historical tar and
1827 cpio utilities.
1828
1829 A fundamental difference between cpio and tar was in the way directo‐
1830 ries were treated. The cpio utility did not treat directories differ‐
1831 ently from other files, and to select a directory and its contents
1832 required that each file in the hierarchy be explicitly specified. For
1833 tar, a directory matched every file in the file hierarchy it rooted.
1834
1835 The pax utility offers both interfaces; by default, directories map
1836 into the file hierarchy they root. The -d option causes pax to skip any
1837 file not explicitly referenced, as cpio historically did. The tar-style
1838 behavior was chosen as the default because it was believed that
1839 this was the more common usage and because tar is the more commonly
1840 available interface, as it was historically provided on both System V
1841 and BSD implementations.
1842
1843 The data interchange format specification in this volume of
1844 POSIX.1‐2017 requires that processes with ``appropriate privileges''
1845 shall always restore the ownership and permissions of extracted files
1846 exactly as archived. If viewed from the historic equivalence between
1847 superuser and ``appropriate privileges'', there are two problems with
1848 this requirement. First, users running as superusers may unknowingly
1849 set dangerous permissions on extracted files. Second, it is needlessly
1850 limiting, in that superusers cannot extract files and own them as supe‐
1851 ruser unless the archive was created by the superuser. (It should be
1852 noted that restoration of ownerships and permissions for the superuser,
1853 by default, is historical practice in cpio, but not in tar.) In order
1854 to avoid these two problems, the pax specification has an additional
1855 ``privilege'' mechanism, the -p option. Only a pax invocation with the
1856 privileges needed, and which has the -p option set using the e specifi‐
1857 cation character, has appropriate privileges to restore full ownership
1858 and permission information.
1859
1860 Note also that this volume of POSIX.1‐2017 requires that the file own‐
1861 ership and access permissions shall be set, on extraction, in the same
1862 fashion as the creat() function when provided with the mode stored in
1863 the archive. This means that the file creation mask of the user is
1864 applied to the file permissions.
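
The effect of the file creation mask can be shown with a one-line calculation. The Python sketch below is a simplification that considers only the nine permission bits; the function name extracted_permissions is hypothetical.

```python
def extracted_permissions(archived_mode: int, file_creation_mask: int) -> int:
    """Permission bits a creat()-style extraction would leave on the file."""
    # Bits set in the mask are cleared from the archived permissions.
    return archived_mode & 0o777 & ~file_creation_mask
```

With a mask of 022, a file archived with mode 666 is extracted with mode 644 unless the -p option requests that the mode be preserved.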
1865
1866 Users should note that directories may be created by pax while extract‐
1867 ing files with permissions that are different from those that existed
1868 at the time the archive was created. When extracting sensitive informa‐
1869 tion into a directory hierarchy that no longer exists, users are
1870 encouraged to set their file creation mask appropriately to protect
1871 these files during extraction.
1872
1873 The table of contents output is written to standard output to facili‐
1874 tate pipeline processing.
1875
1876 An early proposal had hard links displaying for all pathnames. This was
1877 removed because it complicates the output of the case where -v is not
1878 specified and does not match historical cpio usage. The hard-link
1879 information is available in the -v display.
1880
1881 The description of the -l option allows implementations to make hard
1882 links to symbolic links. Earlier versions of this standard did not
1883 specify any way to create a hard link to a symbolic link, but many
1884 implementations provided this capability as an extension. If there are
1885 hard links to symbolic links when an archive is created, the implemen‐
1886 tation is required to archive the hard link in the archive (unless -H
1887 or -L is specified). When in read mode and in copy mode, implementa‐
1888 tions supporting hard links to symbolic links should use them when
1889 appropriate.
1890
1891 The archive formats inherited from the POSIX.1‐1990 standard have cer‐
1892 tain restrictions that have been brought along from historical usage.
1893 For example, there are restrictions on the length of pathnames stored
1894 in the archive. When pax is used in copy (-rw) mode (copying directory
1895 hierarchies), the ability to use extensions from the -x pax format
1896 overcomes these restrictions.
1897
1898 The default blocksize value of 5120 bytes for cpio was selected because
1899 it is one of the standard block-size values for cpio, set when the -B
1900 option is specified. (The other default block-size value for cpio is
1901 512 bytes, and this was considered to be too small.) The default block
1902 value of 10240 bytes for tar was selected because that is the standard
1903 block-size value for BSD tar. The maximum block size of 32256 bytes
1904 (2^15-512 bytes) is the largest multiple of 512 bytes that fits into a
1905 signed 16-bit tape controller transfer register. There are known limi‐
1906 tations in some historical systems that would prevent larger blocks
1907 from being accepted. Historical values were chosen to improve compatibility
1908 with historical scripts using dd or similar utilities to manipulate
1909 archives. Also, default block sizes for any file type other than
1910 character special file have been deleted from this volume of
1911 POSIX.1‐2017 as unimportant and not likely to affect the structure of
1912 the resulting archive.
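
The block-size figures in this paragraph can be checked with a little arithmetic. In the Python sketch below, largest_block_size is a hypothetical helper that rounds a register limit down to a whole number of 512-byte units.

```python
INT16_MAX = 2 ** 15 - 1  # largest value of a signed 16-bit register


def largest_block_size(limit: int = INT16_MAX, unit: int = 512) -> int:
    """Largest multiple of unit that does not exceed limit."""
    return (limit // unit) * unit
```

The result, 32256, is 63 units of 512 bytes, or 2**15 - 512; the cpio and tar defaults of 5120 and 10240 bytes are 10 and 20 such units respectively.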
1913
1914 Implementations are permitted to modify the block-size value based on
1915 the archive format or the device to which the archive is being written.
1916 This is to provide implementations with the opportunity to take advan‐
1917 tage of special types of devices, and it should not be used without a
1918 great deal of consideration as it almost certainly decreases archive
1919 portability.
1920
1921 The intended use of the -n option was to permit extraction of one or
1922 more files from the archive without processing the entire archive. This
1923 was viewed by the standard developers as offering significant perfor‐
1924 mance advantages over historical implementations. The -n option in
1925 early proposals had three effects; the first was to cause special char‐
1926 acters in patterns to not be treated specially. The second was to cause
1927 only the first file that matched a pattern to be extracted. The third
1928 was to cause pax to write a diagnostic message to standard error when
1929 no file was found matching a specified pattern. Only the second behav‐
1930 ior is retained by this volume of POSIX.1‐2017, for many reasons.
1931 First, it is in general not acceptable for a single option to have mul‐
1932 tiple effects. Second, the ability to make pattern matching characters
1933 act as normal characters is useful for parts of pax other than file
1934 extraction. Third, a finer degree of control over the special charac‐
1935 ters is useful because users may wish to normalize only a single spe‐
1936 cial character in a single filename. Fourth, given a more general
1937 escape mechanism, the previous behavior of the -n option can be easily
1938 obtained using the -s option or a sed script. Finally, writing a diag‐
1939 nostic message when a pattern specified by the user is unmatched by any
1940 file is useful behavior in all cases.
1941
1942 In this version, the -n option was removed from the copy mode synopsis of pax;
1943 it is inapplicable because there are no pattern operands specified in
1944 this mode.
1945
1946 Besides pax, POSIX.1‐2008 describes another method for copying subtrees,
1947 as part of the cp utility. Both methods are historical practice:
1948 cp provides a simpler, more intuitive interface, while pax offers
1949 a finer granularity of control. Each provides additional functionality
1950 to the other; in particular, pax maintains the hard-link structure of
1951 the hierarchy while cp does not. It is the intention of the standard
1952 developers that the results be similar (using appropriate option combi‐
1953 nations in both utilities). The results are not required to be identi‐
1954 cal; there seemed insufficient gain to applications to balance the dif‐
1955 ficulty of implementations having to guarantee that the results would
1956 be exactly identical.
1957
1958 A single archive may span more than one file. It is suggested that
1959 implementations provide informative messages to the user on standard
1960 error whenever the archive file is changed.
1961
1962 The -d option (do not create intermediate directories not listed in the
1963 archive) found in early proposals was originally provided as a comple‐
1964 ment to the historic -d option of cpio. It has been deleted.
1965
1966 The -s option in early proposals specified a subset of the substitution
1967 command from the ed utility. As there was no reason for only a subset
1968 to be supported, the -s option is now compatible with the current ed
1969 specification. Since the delimiter can be any non-null character, the
1970 following usage with single <space> characters is valid:
1971
1972
1973 pax -s " foo bar " ...
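The delimiter rule can be illustrated with a small sketch. The helper name parse_replstr is hypothetical and is not part of pax; the sketch only splits the argument on its first character and does not handle escaped delimiters or apply the regular expression itself.

```python
def parse_replstr(replstr):
    """Split a -s argument of the form <d>old<d>new<d>[gp] into its parts.

    Hypothetical helper for illustration; escaped delimiters inside
    old or new are deliberately not handled.
    """
    if len(replstr) < 2:
        raise ValueError("replstr too short")
    delim = replstr[0]                  # any non-null character may delimit
    parts = replstr[1:].split(delim)    # simplification: no escape handling
    if len(parts) != 3:
        raise ValueError("expected <d>old<d>new<d>[gp]")
    old, new, flags = parts
    return old, new, flags


print(parse_replstr(" foo bar "))     # <space> as the delimiter
print(parse_replstr("/foo/bar/g"))    # the more familiar <slash> form
```

Both calls yield the same old and new strings; only the delimiter and the trailing flags differ.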
1974
1975 The -t description is worded so as to note that this may cause the
1976 access time update caused by some other activity (which occurs while
1977 the file is being read) to be overwritten.
1978
1979 The default behavior of pax with regard to file modification times is
1980 the same as historical implementations of tar. It is not the histori‐
1981 cal behavior of cpio.
1982
1983 Because the -i option uses /dev/tty, utilities without a controlling
1984 terminal are not able to use this option.
1985
1986 The -y option, found in early proposals, has been deleted because a
1987 line containing a single <period> for the -i option has equivalent
1988 functionality. The special lines for the -i option (a single <period>
1989 and the empty line) are historical practice in cpio.
1990
1991 In early drafts, a -e charmap option was included to increase portabil‐
1992 ity of files between systems using different coded character sets. This
1993 option was omitted because it was apparent that consensus could not be
1994 formed for it. In this version, the use of UTF‐8 should be an adequate
1995 substitute.
1996
1997 The ISO POSIX‐2:1993 standard and ISO POSIX‐1 standard requirements for
1998 pax, however, made it very difficult to create a single archive con‐
1999 taining files created using extended characters provided by different
2000 locales. This version adds the hdrcharset keyword to make it possible
2001 to archive files in these cases without dropping files due to transla‐
2002 tion errors.
2003
2004 Translating filenames and other attributes from a locale's encoding to
2005 UTF‐8 and then back again can lose information, as the resulting file‐
2006 name might not be byte-for-byte equivalent to the original. To avoid
2007 this problem, users can specify the -o hdrcharset=binary option, which
2008 will cause the resulting archive to use binary format for all names and
2009 attributes. Such archives are not portable among hosts that use differ‐
2010 ent native encodings (e.g., EBCDIC versus ASCII-based encodings), but
2011 they will allow interchange among the vast majority of POSIX file sys‐
2012 tems in practical use. Also, the -o hdrcharset=binary option will cause
2013 pax in copy mode to behave more like other standard utilities such as
2014 cp.
2015
2016 If the values specified by the -o exthdr.name=value, -o globex‐
2017 thdr.name=value, or by $TMPDIR (if -o globexthdr.name is not specified)
2018 require a character encoding other than that described in the
2019 ISO/IEC 646:1991 standard, a path extended header record will have to
2020 be created for the file. If a hdrcharset extended header record is
2021 active for such headers, it will determine the codeset used for the
2022 value field in these extended path header records. These path extended
2023 header records always need to be created when writing an archive even
2024 if hdrcharset=binary has been specified and would contain the same
2025 (binary) data that appears in the ustar header record prefix and name
2026 fields. (In other words, an extended header path record is always
2027 required to be generated if the prefix or name fields contain non-ASCII
2028 characters even when hdrcharset=binary is also in effect for that
2029 file.)
2030
2031 The -k option was added to address international concerns about the
2032 dangers involved in the character set transformations of -e (if the
2033 target character set were different from the source, the filenames
2034 might be transformed into names matching existing files) and also was
2035 made more general to protect files transferred between file systems
2036 with different {NAME_MAX} values (truncating a filename on a smaller
2037 system might also inadvertently overwrite existing files). As stated,
2038 it prevents any overwriting, even if the target file is older than the
2039 source. This version adds more granularity of options to solve this
2040 problem by introducing the -o invalid=action option, specifically the UTF‐8 and
2041 binary actions. (Note that an existing file is still subject to over‐
2042 writing in this case. The -k option closes that loophole.)
2043
2044 Some of the file characteristics referenced in this volume of
2045 POSIX.1‐2017 might not be supported by some archive formats. For exam‐
2046 ple, neither the tar nor cpio formats contain the file access time. For
2047 this reason, the e specification character has been provided, intended
2048 to cause all file characteristics specified in the archive to be
2049 retained.
2050
2051 It is required that extracted directories, by default, have their
2052 access and modification times and permissions set to the values speci‐
2053 fied in the archive. This has obvious problems in that the directories
2054 are almost certainly modified after being extracted and that directory
2055 permissions may not permit file creation. One possible solution is to
2056 create directories with the mode specified in the archive, as modified
2057 by the umask of the user, with sufficient permissions to allow file
2058 creation. After all files have been extracted, pax would then reset the
2059 access and modification times and permissions as necessary.
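The possible solution described above might be sketched as follows. The function extract_directory and its parameters are hypothetical names chosen for this illustration; an actual pax implementation is free to sequence the work differently, provided the results are as specified.

```python
import os


def extract_directory(path, mode, atime, mtime, files):
    """Create a directory, fill it, then apply the archived metadata.

    Hypothetical sketch: `files` maps member names to their contents.
    """
    # Create the directory with owner rwx added, so members can be
    # written even if the archived mode (for example 0o555) forbids it.
    os.mkdir(path, mode | 0o700)
    for name, data in files.items():
        with open(os.path.join(path, name), "wb") as f:
            f.write(data)
    # Only after all members exist are the archived times and
    # permissions applied; writing the members above has already
    # modified the directory's own timestamps.
    os.utime(path, (atime, mtime))
    os.chmod(path, mode)
```

Deferring the final chmod() and utime() until the members are written avoids both problems named in the text: the directory remains writable during extraction, and its times are not disturbed afterwards.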
2060
2061 The list-mode formatting description borrows heavily from the one
2062 defined by the printf utility. However, since there is no separate op‐
2063 erand list to get conversion arguments, the format was extended to
2064 allow specifying the name of the conversion argument as part of the
2065 conversion specification.
2066
2067 The T conversion specifier allows time fields to be displayed in any of
2068 the date formats. Unlike the ls utility, pax does not adjust the format
2069 when the date is less than six months in the past. This makes parsing
2070 the output more predictable.
2071
2072 The D conversion specifier handles the ability to display the
2073 major/minor or file size, as with ls, by using %-8(size)D.
2074
2075 The L conversion specifier handles the ls display for symbolic links.
2076
2077 Conversion specifiers were added to generate existing known types used
2078 for ls.
2079
2080 pax Interchange Format
2081 The new POSIX data interchange format was developed primarily to sat‐
2082 isfy international concerns that the ustar and cpio formats did not
2083 provide for file, user, and group names encoded in characters outside a
2084 subset of the ISO/IEC 646:1991 standard. The standard developers real‐
2085 ized that this new POSIX data interchange format should be very exten‐
2086 sible because there were other requirements they foresaw in the near
2087 future:
2088
2089 * Support international character encodings and locale information
2090
2091 * Support security information (ACLs, and so on)
2092
2093 * Support future file types, such as realtime or contiguous files
2094
2095 * Include data areas for implementation use
2096
2097 * Support systems with words larger than 32 bits and timers with sub‐
2098 second granularity
2099
2100 The following were not goals for this format because these are better
2101 handled by separate utilities or are inappropriate for a portable for‐
2102 mat:
2103
2104 * Encryption
2105
2106 * Compression
2107
2108 * Data translation between locales and codesets
2109
2110 * inode storage
2111
2112 The format chosen to support the goals is an extension of the ustar
2113 format. Of the two formats previously available, only the ustar format
2114 was selected for extensions because:
2115
2116 * It was easier to extend in an upwards-compatible way. It offered
2117 version flags and header block type fields with room for future
2118 standardization. The cpio format, while possessing a more flexible
2119 file naming methodology, could not be extended without breaking
2120 some theoretical implementation or using a dummy filename that
2121 could be a legitimate filename.
2122
2123 * Industry experience since the original ``tar wars'' fought in
2124 developing the ISO POSIX‐1 standard has clearly been in favor of
2125 the ustar format, which is generally the default output format
2126 selected for pax implementations on new systems.
2127
2128 The new format was designed with one additional goal in mind: reason‐
2129 able behavior when an older tar or pax utility happened to read an ar‐
2130 chive. Since the POSIX.1‐1990 standard mandated that a ``format-reading
2131 utility'' had to treat unrecognized typeflag values as regular files,
2132 this allowed the format to include all the extended information in a
2133 pseudo-regular file that preceded each real file. An option is given
2134 that allows the archive creator to set up reasonable names for these
2135 files on the older systems. Also, the normative text suggests that rea‐
2136 sonable file access values be used for this ustar header block. Making
2137 these header files inaccessible for convenient reading and deleting
2138 would not be reasonable. File permissions of 600 or 700 are suggested.
2139
2140 The ustar typeflag field was used to accommodate the additional func‐
2141 tionality of the new format rather than magic or version because the
2142 POSIX.1‐1990 standard (and, by reference, the previous version of pax),
2143 mandated the behavior of the format-reading utility when it encountered
2144 an unknown typeflag, but was silent about the other two fields.
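The pseudo-regular file and its typeflag can be observed with any writer of the pax format. The sketch below uses Python's tarfile module as a stand-in for pax itself (an assumption of this example, not something the standard mentions) and inspects the raw header blocks. The member name is invented for illustration.

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
    # A non-ASCII name cannot be carried by the ustar name field alone,
    # so the writer has to emit a path extended header record.
    info = tarfile.TarInfo(name="donn\u00e9es.txt")
    tar.addfile(info)                 # zero-length regular file

raw = buf.getvalue()
TYPEFLAG_OFFSET = 156                 # offset of typeflag in a ustar header block
first_typeflag = raw[TYPEFLAG_OFFSET:TYPEFLAG_OFFSET + 1]
records = raw[512:1024]               # data block of the extended header
real_typeflag = raw[1024 + TYPEFLAG_OFFSET:1024 + TYPEFLAG_OFFSET + 1]

print(first_typeflag, real_typeflag)
print(records.rstrip(b"\0"))
```

The first header block carries typeflag x, and its data block holds the path record in UTF‐8. An older format-reading utility that does not recognize x would extract that block as a small regular file, which is the fallback behavior the design relies on.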
2145
2146 Early proposals for the first version of this standard contained a pro‐
2147 posed archive format that was based on compatibility with the standard
2148 for tape files (ISO 1001, similar to the format used historically on
2149 many mainframes and minicomputers). This format was overly complex and
2150 required considerable overhead in volume and header records. Further‐
2151 more, the standard developers felt that it would not be acceptable to
2152 the community of POSIX developers, so it was later changed to be a for‐
2153 mat more closely related to historical practice on POSIX systems.
2154
2155 The prefix and name split of pathnames in ustar was replaced by the
2156 single path extended header record for simplicity.
2157
2158 The concept of a global extended header (typeflag g) was controversial.
2159 If this were applied to an archive being recorded on magnetic tape, a
2160 few unreadable blocks at the beginning of the tape could be a serious
2161 problem; a utility attempting to extract as many files as possible from
2162 a damaged archive could lose a large percentage of file header informa‐
2163 tion in this case. However, if the archive were on a reliable medium,
2164 such as a CD‐ROM, the global extended header offers considerable poten‐
2165 tial size reductions by eliminating redundant information. Thus, the
2166 text warns against using the global method for unreliable media and
2167 provides a method for implanting global information in the extended
2168 header for each file, rather than in the typeflag g records.
2169
2170 No facility for data translation or filtering on a per-file basis is
2171 included because the standard developers could not invent an interface
2172 that would allow this in an efficient manner. If a filter, such as
2173 encryption or compression, is to be applied to all the files, it is
2174 more efficient to apply the filter to the entire archive as a single
2175 file. The standard developers considered interfaces that would invoke a
2176 shell script for each file going into or out of the archive, but the
2177 system overhead in this approach was considered to be too high.
2178
2179 One such approach would be to have filter= records that give a pathname
2180 for an executable. When the program is invoked, the file and archive
2181 would be open for standard input/output and all the header fields would
2182 be available as environment variables or command-line arguments. The
2183 standard developers did discuss such schemes, but they were omitted
2184 from POSIX.1‐2008 due to concerns about excessive overhead. Also, the
2185 program itself would need to be in the archive if it were to be used
2186 portably.
2187
2188 There is currently no portable means of identifying the character
2189 set(s) used for a file in the file system. Therefore, pax has not been
2190 given a mechanism to generate charset records automatically. The only
2191 portable means of doing this is for the user to write the archive using
2192 the -o charset=string command line option. This assumes that all of the
2193 files in the archive use the same encoding. The ``implementation-
2194 defined'' text is included to allow for a system that can identify the
2195 encodings used for each of its files.
2196
2197 The table of standards that accompanies the charset record description
2198 is acknowledged to be very limited. Only a limited number of character
2199 set standards is reasonable for maximal interchange. Any character set
2200 is, of course, possible by prior agreement. It was suggested that
2201 EBCDIC be listed, but it was omitted because it is not defined by a
2202 formal standard. Formal standards, and then only those with reasonably
2203 large followings, can be included here, simply as a matter of practi‐
2204 cality. The <value>s represent names of officially registered character
2205 sets in the format required by the ISO 2375:1985 standard.
2206
2207 The normal <comma> or <blank>-separated list rules are not followed in
2208 the case of keyword options to allow ease of argument parsing for
2209 getopts.
2210
2211 Further information on character encodings is in pax Archive Character
2212 Set Encoding/Decoding.
2213
2214 The standard developers have reserved keyword name space for vendor
2215 extensions. It is suggested that the format to be used is:
2216
2217
2218 VENDOR.keyword
2219
2220 where VENDOR is the name of the vendor or organization in all uppercase
2221 letters. It is further suggested that the keyword following the
2222 <period> be named differently than any of the standard keywords so that
2223 it could be used for future standardization, if appropriate, by omit‐
2224 ting the VENDOR prefix.
2225
2226 The <length> field in the extended header record was included to make
2227 it simpler to step through the records, even if a record contains an
2228 unknown format (to a particular pax) with complex interactions of spe‐
2229 cial characters. It also provides a minor integrity checkpoint within
2230 the records to aid a program attempting to recover files from a damaged
2231 archive.
2232
2233 There are no extended header versions of the devmajor and devminor
2234 fields because the unspecified format ustar header field should be suf‐
2235 ficient. If they are not, vendor-specific extended keywords (such as
2236 VENDOR.devmajor) should be used.
2237
2238 Device and i-number labeling of files was not adopted from cpio; files
2239 are interchanged strictly on a symbolic name basis, as in ustar.
2240
2241 Just as with the ustar format descriptions, the new format makes no
2242 special arrangements for multi-volume archives. Each of the pax archive
2243 types is assumed to be inside a single POSIX file and splitting that
2244 file over multiple volumes (diskettes, tape cartridges, and so on),
2245 processing their labels, and mounting each in the proper sequence are
2246 considered to be implementation details that cannot be described
2247 portably.
2248
2249 The pax format is intended for interchange, not only for backup on a
2250 single (family of) systems. It is not as densely packed as might be
2251 possible for backup:
2252
2253 * It contains information as coded characters that could be coded in
2254 binary.
2255
2256 * It identifies extended records with name fields that could be omit‐
2257 ted in favor of a fixed-field layout.
2258
2259 * It translates names into a portable character set and identifies
2260 locale-related information, both of which are probably unnecessary
2261 for backup.
2262
2263 The requirements on restoring from an archive are slightly different
2264 from the historical wording, allowing for non-monolithic privilege to
2265 bring forward as much as possible. In particular, attributes such as
2266 ``high performance file'' might be broadly but not universally granted
2267 while set-user-ID or chown() might be much more restricted. There is no
2268 implication in POSIX.1‐2008 that the security information be honored
2269 after it is restored to the file hierarchy, in spite of what might be
2270 improperly inferred by the silence on that topic. That is a topic for
2271 another standard.
2272
2273 Links are recorded in the fashion described here because a link can be
2274 to any file type. It is desirable in general to be able to restore part
2275 of an archive selectively and restore all of those files completely. If
2276 the data is not associated with each link, it is not possible to do
2277 this. However, the data associated with a file can be large, and when
2278 selective restoration is not needed, this can be a significant burden.
2279 The archive is structured so that files that have no associated data
2280 can always be restored by the name of any link, and
2281 the user may choose whether data is recorded with each instance of a
2282 file that contains data. The format permits mixing of both types of
2283 links in a single archive; this can be done for special needs, and pax
2284 is expected to interpret such archives on input properly, despite the
2285 fact that there is no pax option that would force this mixed case on
2286 output. (When -o linkdata is used, the output must contain the dupli‐
2287 cate data, but the implementation is free to include it or omit it when
2288 -o linkdata is not used.)
2289
2290 The time values are included as extended header records for those
2291 implementations needing more than the eleven octal digits allowed by
2292 the ustar format. Portable file timestamps cannot be negative. If pax
2293 encounters a file with a negative timestamp in copy or write mode, it
2294 can reject the file, substitute a non-negative timestamp, or generate a
2295 non-portable timestamp with a leading '-'. Even though some implemen‐
2296 tations can support finer file-time granularities than seconds, the
2297 normative text requires support only for seconds since the Epoch
2298 because the ISO POSIX‐1 standard states them that way. The ustar format
2299 includes only mtime; the new format adds atime and ctime for symmetry.
2300 The atime access time restored to the file system will be affected by
2301 the -p a and -p e options. The ctime creation time (actually inode mod‐
2302 ification time) is described with appropriate privileges so that it can
2303 be ignored when writing to the file system. POSIX does not provide a
2304 portable means to change file creation time. Nothing is intended to
2305 prevent a non-portable implementation of pax from restoring the value.
2306
2307 The gid, size, and uid extended header records were included to allow
2308 expansion beyond the sizes specified in the regular tar header. New
2309 file system architectures are emerging that will exhaust the 12-digit
2310 size field. There are probably not many systems requiring more than 8
2311 digits for user and group IDs, but the extended header values were
2312 included for completeness, allowing overrides for all of the decimal
2313 values in the tar header.
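The point at which an extended header record becomes necessary can be estimated from the field widths. The sketch below assumes that one octet of each numeric ustar field is taken by a terminator, leaving eleven octal digits in a 12-octet field (size, mtime) and seven in an 8-octet field (uid, gid). The function names are hypothetical.

```python
def max_octal_value(field_width):
    """Largest value that fits, assuming width - 1 usable octal digits."""
    return 8 ** (field_width - 1) - 1


def needs_extended_record(value, field_width):
    """True if the value cannot be stored in the plain ustar field."""
    return value < 0 or value > max_octal_value(field_width)


print(max_octal_value(12))   # size and mtime fields
print(max_octal_value(8))    # uid and gid fields
print(needs_extended_record(10 * 2**30, 12))   # a 10 GiB file
```

Under that assumption the size field is exhausted at 8 GiB, which is why the size record matters in practice well before the uid and gid records do.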
2314
2315 The standard developers intended to describe the effective results of
2316 pax with regard to file ownerships and permissions; implementations are
2317 not restricted in timing or sequencing the restoration of such, pro‐
2318 vided the results are as specified.
2319
2320 Much of the text describing the extended headers refers to use in
2321 ``write or copy modes''. The copy mode references are due to the norma‐
2322 tive text: ``The effect of the copy shall be as if the copied files
2323 were written to an archive file and then subsequently extracted ...''.
2324 There is certainly no way to test whether pax is actually generating
2325 the extended headers in copy mode, but the effects must be as if it
2326 had.
2327
2328 pax Archive Character Set Encoding/Decoding
2329 There is a need to exchange archives of files between systems of dif‐
2330 ferent native codesets. Filenames, group names, and user names must be
2331 preserved to the fullest extent possible when an archive is read on the
2332 receiving platform. Translation of the contents of files is not within
2333 the scope of the pax utility.
2334
2335 There will also be the need to represent characters that are not avail‐
2336 able on the receiving platform. These unsupported characters cannot be
2337 automatically folded to the local set of characters due to the chance
2338 of collisions. This could result in overwriting previously extracted
2339 files from the archive or pre-existing files on the system.
2340
2341 For these reasons, the codeset used to represent characters within the
2342 extended header records of the pax archive must be sufficiently rich to
2343 handle all commonly used character sets. The fields requiring transla‐
2344 tion include, at a minimum, filenames, user names, group names, and
2345 link pathnames. Implementations may wish to have localized extended
2346 keywords that use non-portable characters.
2347
2348 The standard developers considered the following options:
2349
2350 * The archive creator specifies the well-defined name of the source
2351 codeset. The receiver must then recognize the codeset name and per‐
2352 form the appropriate translations to the destination codeset.
2353
2354 * The archive creator includes within the archive the character map‐
2355 ping table for the source codeset used to encode extended header
2356 records. The receiver must then read the character mapping table
2357 and perform the appropriate translations to the destination code‐
2358 set.
2359
2360 * The archive creator translates the extended header records in the
2361 source codeset into a canonical form. The receiver must then per‐
2362 form the appropriate translations to the destination codeset.
2363
2364 The approach that incorporates the name of the source codeset poses the
2365 problem of codeset name registration, and makes the archive useless to
2366 pax archive decoders that do not recognize that codeset.
2367
2368 Because parts of an archive may be corrupted, the standard developers
2369 felt that including the character map of the source codeset was too
2370 fragile. The loss of this one key component could result in making the
2371 entire archive useless. (The difference between this and the global
2372 extended header decision was that the latter has a workaround—duplicat‐
2373 ing extended header records on unreliable media—but this would be too
2374 burdensome for large character set maps.)
2375
2376 Both of the above approaches also put an undue burden on the pax ar‐
2377 chive receiver to handle the cross-product of all source and destina‐
2378 tion codesets.
2379
2380 To simplify the translation from the source codeset to the canonical
2381 form and from the canonical form to the destination codeset, the stan‐
2382 dard developers decided that the internal representation should be a
2383 stateless encoding. A stateless encoding is one where each codepoint
2384 has the same meaning, without regard to the decoder being in a specific
2385 state. An example of a stateful encoding would be the Japanese Shift-
2386 JIS; an example of a stateless encoding would be the ISO/IEC 646:1991
2387 standard (equivalent to 7-bit ASCII).
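The difference between stateful and stateless encodings can be made visible with a short sketch. As an assumption of this example, it substitutes ISO-2022-JP for the Shift-JIS example named in the text, because ISO-2022-JP switches state with explicit escape sequences and a codec for it ships with Python.

```python
text = "\u3042"                       # HIRAGANA LETTER A

stateful = text.encode("iso2022_jp")  # escape sequences switch decoder state
stateless = text.encode("utf-8")      # every octet sequence stands alone

# Inside the stateful stream, the two octets b'$"' encode U+3042, but
# only because an escape sequence preceded them. Decoded from the
# initial state, the same octets are just a dollar sign and a quote.
same_octets_initial_state = b'$"'.decode("iso2022_jp")

print(stateful)
print(stateless)
print(same_octets_initial_state)
```

In the stateless form, the meaning of the octets never depends on what came before them, which is the property the standard developers wanted for the canonical representation.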
2388
2389 For these reasons, the standard developers decided to adopt a canonical
2390 format for the representation of file information strings. The obvious,
2391 well-endorsed candidate is the ISO/IEC 10646‐1:2000 standard (based in
2392 part on Unicode), which can be used to represent the characters of vir‐
2393 tually all standardized character sets. The standard developers ini‐
2394 tially agreed upon using UCS2 (16-bit Unicode) as the internal repre‐
2395 sentation. This repertoire of characters provides a sufficiently rich
2396 set to represent all commonly-used codesets.
2397
2398 However, the standard developers found that the 16-bit Unicode repre‐
2399 sentation had some problems. It forced the issue of standardizing byte
2400 ordering. The 2-byte length of each character made the extended header
2401 records twice as long for the case of strings coded entirely from his‐
2402 torical 7-bit ASCII. For these reasons, the standard developers chose
2403 the UTF‐8 defined in the ISO/IEC 10646‐1:2000 standard. This multi-byte
2404 representation encodes UCS2 or UCS4 characters reliably and determinis‐
2405 tically, eliminating the need for a canonical byte ordering. In addi‐
2406 tion, NUL octets and other characters possibly confusing to POSIX file
2407 systems do not appear, except to represent themselves. It was realized
2408 that certain national codesets take up more space after the encoding,
2409 due to their placement within the UCS range; it was felt that the use‐
2410 fulness of the encoding of the names outweighs the disadvantage of size
2411 increase for file, user, and group names.
2412
2413 The encoding of UTF‐8 is as follows:
2414
2415
2416 UCS4 Hex Encoding UTF-8 Binary Encoding
2417
2418 00000000-0000007F 0xxxxxxx
2419 00000080-000007FF 110xxxxx 10xxxxxx
2420 00000800-0000FFFF 1110xxxx 10xxxxxx 10xxxxxx
2421 00010000-001FFFFF 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
2422 00200000-03FFFFFF 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
2423 04000000-7FFFFFFF 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
2424
2425 where each 'x' represents a bit value from the character being trans‐
2426 lated.
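The table can be turned into a short encoder. This is a sketch of the six-row table exactly as shown; current definitions of UTF‐8 restrict code points to U+10FFFF, so the five- and six-octet forms are not produced by modern encoders. The function name utf8_encode is hypothetical.

```python
# (upper bound of the range, marker bits of the first octet,
#  number of continuation octets), one entry per multi-octet table row
_ROWS = (
    (0x000007FF, 0xC0, 1),
    (0x0000FFFF, 0xE0, 2),
    (0x001FFFFF, 0xF0, 3),
    (0x03FFFFFF, 0xF8, 4),
    (0x7FFFFFFF, 0xFC, 5),
)


def utf8_encode(cp):
    """Encode one UCS4 code point following the table row by row."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("code point out of range")
    if cp <= 0x7F:
        return bytes([cp])            # 0xxxxxxx
    for limit, lead, count in _ROWS:
        if cp <= limit:
            break
    # First octet: marker bits plus the most significant payload bits.
    out = [lead | (cp >> (6 * count))]
    # Continuation octets: 10xxxxxx, six payload bits each.
    for i in range(count - 1, -1, -1):
        out.append(0x80 | ((cp >> (6 * i)) & 0x3F))
    return bytes(out)


for cp in (0x41, 0xE9, 0x3042, 0x1F600):
    print(hex(cp), utf8_encode(cp).hex())
```

For code points in the range supported by current UTF‐8, the result matches the output of a standard encoder, and no NUL octet appears except for the code point zero itself.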
2427
2428 ustar Interchange Format
2429 The description of the ustar format reflects numerous enhancements over
2430 pre-1988 versions of the historical tar utility. The goal of these
2431 changes was not only to provide the functional enhancements desired,
2432 but also to retain compatibility between new and old versions. This
2433 compatibility has been retained. Archives written using the old ar‐
2434 chive format are compatible with the new format.
2435
2436 Implementors should be aware that the previous file format did not
2437 include a mechanism to archive directory type files. For this reason,
2438 the convention of using a filename ending with <slash> was adopted to
2439 specify a directory on the archive.
2440
2441 The total size of the name and prefix fields has been set to meet the
2442 minimum requirements for {PATH_MAX}. If a pathname will fit within the
2443 name field, it is recommended that the pathname be stored there without
2444 the use of the prefix field. Although the name field is known to be too
2445 small to contain {PATH_MAX} characters, the value was not changed in
2446 this version of the archive file format to retain backwards-compatibil‐
2447 ity, and instead the prefix was introduced. Also, because of the ear‐
2448 lier version of the format, there is no way to remove the restriction
2449 on the linkname field being limited in size to just that of the name
2450 field.
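The recommendation about the name and prefix fields can be sketched as follows, assuming the ustar field sizes of 100 octets for name and 155 octets for prefix, with the two joined by an implied <slash> on reading. The function name split_ustar_path is hypothetical, and other split points than the one chosen here are equally valid.

```python
NAME_SIZE = 100      # octets available in the ustar name field
PREFIX_SIZE = 155    # octets available in the ustar prefix field


def split_ustar_path(path):
    """Return (prefix, name) for a pathname given as bytes, or None.

    The prefix is left empty whenever the whole pathname fits in name.
    """
    if len(path) <= NAME_SIZE:
        return b"", path
    for i in range(len(path)):
        if path[i:i + 1] != b"/":
            continue
        prefix, name = path[:i], path[i + 1:]
        if 0 < len(prefix) <= PREFIX_SIZE and 0 < len(name) <= NAME_SIZE:
            return prefix, name
    return None      # cannot be represented without an extended header


print(split_ustar_path(b"short/name"))
print(split_ustar_path(b"a" * 150 + b"/" + b"b" * 50))
print(split_ustar_path(b"x" * 101))
```

A pathname that returns None here is one that the plain ustar fields cannot hold; in the pax format, the path extended header record covers that case.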
2451
2452 The size field is required to be meaningful in all implementation
2453 extensions, although it could be zero. This is required so that the
2454 data blocks can always be properly counted.
2455
2456 If device special files that cannot be represented in the standard
2457 format need to be archived, it is suggested that one of the
2458 extension types (A‐Z) be used, and that the additional information for
2459 the special file be represented as data and be reflected in the size
2460 field.
2461
2462 Attempting to restore a special file type, where it is converted to
2463 ordinary data and conflicts with an existing filename, need not be spe‐
2464 cially detected by the utility. If run as an ordinary user, pax should
2465 not be able to overwrite the entries in, for example, /dev in any case
2466 (whether the file is converted to another type or not). If run as a
2467 privileged user, it should be able to do so, and it would be considered
2468 a bug if it did not. The same is true of ordinary data files and simi‐
2469 larly named special files; it is impossible to anticipate the needs of
2470 the user (who could really intend to overwrite the file), so the behav‐
2471 ior should be predictable (and thus regular) and rely on the protection
2472 system as required.
2473
2474 The value 7 in the typeflag field is intended to define how contiguous
2475 files can be stored in a ustar archive. POSIX.1‐2008 does not require
2476 the contiguous file extension, but does define a standard way of ar‐
2477 chiving such files so that all conforming systems can interpret these
2478 file types in a meaningful and consistent manner. On a system that does
2479 not support extended file types, the pax utility should do the best it
2480 can with the file and go on to the next.
2481
2482 The file protection modes are those conventionally used by the ls util‐
2483 ity. This is extended beyond the usage in the ISO POSIX‐2 standard to
2484 support the ``shared text'' or ``sticky'' bit. It is intended that the
2485 conformance document should not document anything beyond the existence
2486 of and support of such a mode. Further extensions are expected to these
2487 bits, particularly with overloading the set-user-ID and set-group-ID
2488 flags.
2489
2490 cpio Interchange Format
2491 The reference to appropriate privileges in the cpio format refers to an
2492 error on standard output; the ustar format does not make comparable
2493 statements.
2494
2495 The model for this format was the historical System V cpio-c data
2496 interchange format. This model documents the portable version of the
2497 cpio format and not the binary version. It has the flexibility to
2498 transfer data of any type described within POSIX.1‐2008, yet is exten‐
2499 sible to transfer data types specific to extensions beyond POSIX.1‐2008
2500 (for example, contiguous files). Because it describes existing prac‐
2501 tice, there is no question of maintaining upwards-compatibility.
2502
2503 cpio Header
2504 There has been some concern that the size of the c_ino field of the
2505 header is too small to handle those systems that have very large inode
2506 numbers. However, the c_ino field in the header is used strictly as a
2507 hard-link resolution mechanism for archives. It is not necessarily the
2508 same value as the inode number of the file in the location from which
2509 that file is extracted.
2510
2511 The name c_magic is based on historical usage.
2512
2513 cpio Filename
2514 For most historical implementations of the cpio utility, {PATH_MAX}
2515 octets can be used to describe the pathname without the addition of any
2516 other header fields (the NUL character would be included in this
2517 count). {PATH_MAX} is the minimum value for pathname size, documented
2518 as 256 bytes. However, an implementation may use c_namesize to deter‐
2519 mine the exact length of the pathname. With the current description of
2520 the <cpio.h> header, this pathname size can be as large as a number
2521 that is described in six octal digits.
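How c_namesize determines the pathname length can be sketched with a minimal header parser. The field widths below follow the character-based cpio header layout (all fields are octal digit strings); the function name parse_cpio_header and the sample member are hypothetical.

```python
# (field name, width in octets); every field is a string of octal digits
CPIO_FIELDS = (
    ("c_magic", 6), ("c_dev", 6), ("c_ino", 6), ("c_mode", 6),
    ("c_uid", 6), ("c_gid", 6), ("c_nlink", 6), ("c_rdev", 6),
    ("c_mtime", 11), ("c_namesize", 6), ("c_filesize", 11),
)
HEADER_SIZE = sum(width for _, width in CPIO_FIELDS)


def parse_cpio_header(block):
    """Decode the fixed-size header at the start of `block` (bytes)."""
    if len(block) < HEADER_SIZE:
        raise ValueError("short header")
    fields = {}
    pos = 0
    for name, width in CPIO_FIELDS:
        fields[name] = int(block[pos:pos + width], 8)
        pos += width
    if fields["c_magic"] != 0o070707:
        raise ValueError("bad c_magic")
    return fields


sample = (b"070707" b"000001" b"000002" b"100644" b"000000" b"000000"
          b"000001" b"000000" b"00000000000" b"000006" b"00000000003"
          b"hello\0" b"abc")
hdr = parse_cpio_header(sample)
# c_namesize counts the terminating NUL, so drop one octet for the name.
name = sample[HEADER_SIZE:HEADER_SIZE + hdr["c_namesize"] - 1]
print(HEADER_SIZE, hdr["c_namesize"], name, 0o777777)
```

Six octal digits allow a c_namesize of up to 0o777777 octets, far beyond the 256-byte figure cited above, which is why a reader should rely on c_namesize rather than on a fixed buffer.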
2522
2523 Two values are documented under the c_mode field values to provide for
2524 extensibility for known file types:
2525
2526 0110 000 Reserved for contiguous files. The implementation may treat
2527 the rest of the information for this archive like a regular
2528 file. If this file type is undefined, the implementation may
2529 create the file as a regular file.
2530
2531 This provides for extensibility of the cpio format while allowing for
2532 the ability to read old archives. Files of an unknown type may be read
2533 as ``regular files'' on some implementations. On a system that does
2534 not support extended file types, the pax utility should do the best it
2535 can with the file and go on to the next.
2536
2538 None.
2539
2541 Chapter 2, Shell Command Language, cp, ed, getopts, ls, printf
2542
2543 The Base Definitions volume of POSIX.1‐2017, Section 3.169, File Mode
2544 Bits, Chapter 5, File Format Notation, Chapter 8, Environment Vari‐
2545 ables, Section 12.2, Utility Syntax Guidelines, <cpio.h>, <tar.h>
2546
2547 The System Interfaces volume of POSIX.1‐2017, chown(), creat(),
2548 fstatat(), mkdir(), mkfifo(), utime(), write()
2549
2551 Portions of this text are reprinted and reproduced in electronic form
2552 from IEEE Std 1003.1-2017, Standard for Information Technology -- Por‐
2553 table Operating System Interface (POSIX), The Open Group Base Specifi‐
2554 cations Issue 7, 2018 Edition, Copyright (C) 2018 by the Institute of
2555 Electrical and Electronics Engineers, Inc and The Open Group. In the
2556 event of any discrepancy between this version and the original IEEE and
2557 The Open Group Standard, the original IEEE and The Open Group Standard
2558 is the referee document. The original Standard can be obtained online
2559 at http://www.opengroup.org/unix/online.html .
2560
2561 Any typographical or formatting errors that appear in this page are
2562 most likely to have been introduced during the conversion of the source
2563 files to man page format. To report such errors, see
2564 https://www.kernel.org/doc/man-pages/reporting_bugs.html .
2565
2566
2567
2568IEEE/The Open Group 2017 PAX(1P)