PAX(1P)                POSIX Programmer's Manual               PAX(1P)

PROLOG
This manual page is part of the POSIX Programmer's Manual. The Linux
implementation of this interface may differ (consult the corresponding
Linux manual page for details of Linux behavior), or the interface may
not be implemented on Linux.

NAME
pax - portable archive interchange

SYNOPSIS
pax [-cdnv][-H|-L][-f archive][-s replstr]...[pattern...]

pax -r[-cdiknuv][-H|-L][-f archive][-o options]...[-p string]...
       [-s replstr]...[pattern...]

pax -w[-dituvX][-H|-L][-b blocksize][[-a][-f archive]][-o options]...
       [-s replstr]...[-x format][file...]

pax -r -w[-diklntuvX][-H|-L][-p string]...[-s replstr]...
       [file...] directory

DESCRIPTION
The pax utility shall read, write, and list the members of archive
files, and copy directory hierarchies. A variety of archive formats
shall be supported; see the -x format option.

The action to be taken depends on the presence of the -r and -w
options. The four combinations of -r and -w are referred to as the four
modes of operation: list, read, write, and copy modes, corresponding
respectively to the four forms shown in the SYNOPSIS section.

list   In list mode (when neither -r nor -w is specified), pax shall
       write the names of the members of the archive file read from the
       standard input, with pathnames matching the specified patterns,
       to standard output. If a named file is of type directory, the
       file hierarchy rooted at that file shall be listed as well.

read   In read mode (when -r is specified, but -w is not), pax shall
       extract the members of the archive file read from the standard
       input, with pathnames matching the specified patterns. If an
       extracted file is of type directory, the file hierarchy rooted
       at that file shall be extracted as well. The extracted files
       shall be created performing pathname resolution with the
       directory in which pax was invoked as the current working
       directory.

       If an attempt is made to extract a directory when the directory
       already exists, this shall not be considered an error. If an
       attempt is made to extract a FIFO when the FIFO already exists,
       this shall not be considered an error.

       The ownership, access, and modification times, and file mode of
       the restored files are discussed under the -p option.

write  In write mode (when -w is specified, but -r is not), pax shall
       write the contents of the file operands to the standard output
       in an archive format. If no file operands are specified, a list
       of files to copy, one per line, shall be read from the standard
       input. A file of type directory shall include all of the files
       in the file hierarchy rooted at the file.

copy   In copy mode (when both -r and -w are specified), pax shall copy
       the file operands to the destination directory.

       If no file operands are specified, a list of files to copy, one
       per line, shall be read from the standard input. A file of type
       directory shall include all of the files in the file hierarchy
       rooted at the file.

       The effect of the copy shall be as if the copied files were
       written to an archive file and then subsequently extracted,
       except that there may be hard links between the original and the
       copied files. If the destination directory is a subdirectory of
       one of the files to be copied, the results are unspecified. If
       the destination directory is a file of a type not defined by the
       System Interfaces volume of IEEE Std 1003.1-2001, the results
       are implementation-defined; otherwise, it shall be an error for
       the file named by the directory operand not to exist, not be
       writable by the user, or not be a file of type directory.

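The mapping from the -r and -w options to the four modes can be
summarized in a few lines. The following is only an illustrative
sketch; the function name pax_mode and the DEFAULT_ARCHIVE table are
hypothetical and are not part of any pax implementation. The table
records the default archive location described for each mode (copy
mode uses no archive file).

```python
# Illustrative sketch only: pax_mode and DEFAULT_ARCHIVE are hypothetical
# names used to restate the rules in the DESCRIPTION section.

def pax_mode(read_flag, write_flag):
    """Return the mode selected by the presence of -r and -w."""
    if read_flag and write_flag:
        return "copy"
    if read_flag:
        return "read"
    if write_flag:
        return "write"
    return "list"


# Where the archive comes from or goes to when -f is not given.
DEFAULT_ARCHIVE = {
    "list": "standard input",
    "read": "standard input",
    "write": "standard output",
    "copy": None,  # files are copied directly to the directory operand
}
```

For example, pax_mode(True, False) selects read mode, whose archive is
taken from standard input unless -f is specified.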
In read or copy modes, if intermediate directories are necessary to
extract an archive member, pax shall perform actions equivalent to the
mkdir() function defined in the System Interfaces volume of
IEEE Std 1003.1-2001, called with the following arguments:

*  The intermediate directory used as the path argument

*  The value of the bitwise-inclusive OR of S_IRWXU, S_IRWXG, and
   S_IRWXO as the mode argument

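A minimal sketch of the intermediate-directory rule, assuming a POSIX
file system. The helper name make_intermediate_dirs is hypothetical.
It creates each missing ancestor of a member's pathname with the mode
S_IRWXU | S_IRWXG | S_IRWXO; as with mkdir() itself, the process file
mode creation mask still applies to the resulting permissions.

```python
import os
import stat

# The mode argument named in the text (octal 0777).
INTERMEDIATE_MODE = stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO


def make_intermediate_dirs(member_path):
    """Create the missing ancestor directories of member_path.

    Hypothetical helper; returns the directories created, outermost first.
    """
    missing = []
    parent = os.path.dirname(member_path)
    while parent and not os.path.isdir(parent):
        missing.append(parent)
        parent = os.path.dirname(parent)

    created = []
    for directory in reversed(missing):
        os.mkdir(directory, INTERMEDIATE_MODE)  # the umask is still applied
        created.append(directory)
    return created
```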
If any specified pattern or file operands are not matched by at least
one file or archive member, pax shall write a diagnostic message to
standard error for each one that did not match and exit with a non-zero
exit status.

The archive formats described in the EXTENDED DESCRIPTION section shall
be automatically detected on input. The default output archive format
shall be implementation-defined.

A single archive can span multiple files. The pax utility shall
determine, in an implementation-defined manner, what file to read or
write as the next file.

If the selected archive format supports the specification of linked
files, it shall be an error if these files cannot be linked when the
archive is extracted. For archive formats that do not store file
contents with each name that causes a hard link, if the file that
contains the data is not extracted during this pax session, either the
data shall be restored from the original file, or a diagnostic message
shall be displayed with the name of a file that can be used to extract
the data.

In traversing directories, pax shall detect infinite loops; that is,
entering a previously visited directory that is an ancestor of the last
file visited. When it detects an infinite loop, pax shall write a
diagnostic message to standard error and shall terminate.

OPTIONS
The pax utility shall conform to the Base Definitions volume of
IEEE Std 1003.1-2001, Section 12.2, Utility Syntax Guidelines, except
that the order of presentation of the -o, -p, and -s options is
significant.

The following options shall be supported:

-r     Read an archive file from standard input.

-w     Write files to the standard output in the specified archive
       format.

-a     Append files to the end of the archive. It is
       implementation-defined which devices on the system support
       appending. Additional file formats unspecified by this volume of
       IEEE Std 1003.1-2001 may impose restrictions on appending.

-b blocksize
       Block the output at a positive decimal integer number of bytes
       per write to the archive file. Devices and archive formats may
       impose restrictions on blocking. Blocking shall be automatically
       determined on input. Conforming applications shall not specify a
       blocksize value larger than 32256. Default blocking when
       creating archives depends on the archive format. (See the -x
       option below.)

-c     Match all file or archive members except those specified by the
       pattern or file operands.

-d     Cause files of type directory being copied or archived or
       archive members of type directory being extracted or listed to
       match only the file or archive member itself and not the file
       hierarchy rooted at the file.

-f archive
       Specify the pathname of the input or output archive, overriding
       the default standard input (in list or read modes) or standard
       output (write mode).

-H     If a symbolic link referencing a file of type directory is
       specified on the command line, pax shall archive the file
       hierarchy rooted in the file referenced by the link, using the
       name of the link as the root of the file hierarchy. Otherwise,
       if a symbolic link referencing a file of any other file type
       which pax can normally archive is specified on the command
       line, then pax shall archive the file referenced by the link,
       using the name of the link. The default behavior shall be to
       archive the symbolic link itself.

-i     Interactively rename files or archive members. For each archive
       member matching a pattern operand or file matching a file
       operand, a prompt shall be written to the file /dev/tty. The
       prompt shall contain the name of the file or archive member, but
       the format is otherwise unspecified. A line shall then be read
       from /dev/tty. If this line is blank, the file or archive member
       shall be skipped. If this line consists of a single period, the
       file or archive member shall be processed with no modification
       to its name. Otherwise, its name shall be replaced with the
       contents of the line. The pax utility shall immediately exit
       with a non-zero exit status if end-of-file is encountered when
       reading a response or if /dev/tty cannot be opened for reading
       and writing.

       The results of extracting a hard link to a file that has been
       renamed during extraction are unspecified.

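The three possible responses to the -i prompt can be sketched as
follows. This is an illustration only: the function name
interactive_name is hypothetical, end-of-file is modeled as an empty
string (as returned by Python's readline()), and treating a line of
only white space as "blank" is an assumption about the text above.

```python
def interactive_name(current_name, response_line):
    """Map one line read from /dev/tty to the resulting member name.

    Returns None when the file or archive member is to be skipped.
    """
    if response_line == "":
        # End-of-file: pax exits immediately with a non-zero status.
        raise EOFError("end-of-file while reading a response")
    response = response_line.rstrip("\n")
    if response.strip() == "":
        return None              # blank line: skip this member
    if response == ".":
        return current_name      # single period: keep the name unchanged
    return response              # anything else: the replacement name
```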
-k     Prevent the overwriting of existing files.

-l     (The letter ell.) In copy mode, hard links shall be made between
       the source and destination file hierarchies whenever possible.
       If specified in conjunction with -H or -L, when a symbolic link
       is encountered, the hard link created in the destination file
       hierarchy shall be to the file referenced by the symbolic link.
       If specified when neither -H nor -L is specified, when a
       symbolic link is encountered, the implementation shall create a
       hard link to the symbolic link in the source file hierarchy or
       copy the symbolic link to the destination.

-L     If a symbolic link referencing a file of type directory is
       specified on the command line or encountered during the
       traversal of a file hierarchy, pax shall archive the file
       hierarchy rooted in the file referenced by the link, using the
       name of the link as the root of the file hierarchy. Otherwise,
       if a symbolic link referencing a file of any other file type
       which pax can normally archive is specified on the command line
       or encountered during the traversal of a file hierarchy, pax
       shall archive the file referenced by the link, using the name of
       the link. The default behavior shall be to archive the symbolic
       link itself.

-n     Select the first archive member that matches each pattern
       operand. No more than one archive member shall be matched for
       each pattern (although members of type directory shall still
       match the file hierarchy rooted at that file).

-o options
       Provide information to the implementation to modify the
       algorithm for extracting or writing files. The value of options
       shall consist of one or more comma-separated keywords of the
       form:

              keyword[[:]=value][,keyword[[:]=value], ...]

       Some keywords apply only to certain file formats, as indicated
       with each description. Use of keywords that are inapplicable to
       the file format being processed produces undefined results.

       Keywords in the options argument shall be a string that would be
       a valid portable filename as described in the Base Definitions
       volume of IEEE Std 1003.1-2001, Section 3.276, Portable Filename
       Character Set.

       Note:  Keywords are not expected to be filenames, merely to
              follow the same character composition rules as portable
              filenames.

       Keywords can be preceded with white space. The value field shall
       consist of zero or more characters; within value, the
       application shall precede any literal comma with a backslash,
       which shall be ignored, but preserves the comma as part of
       value. A comma as the final character, or a comma followed
       solely by white space as the final characters, in options shall
       be ignored. Multiple -o options can be specified; if keywords
       given to these multiple -o options conflict, the keywords and
       values appearing later in command line sequence shall take
       precedence and the earlier shall be silently ignored. The
       following keyword values of options shall be supported for the
       file formats as indicated:

       delete=pattern
              (Applicable only to the -x pax format.) When used in
              write or copy mode, pax shall omit from extended header
              records that it produces any keywords matching the string
              pattern. When used in read or list mode, pax shall ignore
              any keywords matching the string pattern in the extended
              header records. In both cases, matching shall be
              performed using the pattern matching notation described
              in Patterns Matching a Single Character and Patterns
              Matching Multiple Characters. For example:

                     -o delete=security.*

              would suppress security-related information. See pax
              Extended Header for extended header record keyword usage.

       exthdr.name=string
              (Applicable only to the -x pax format.) This keyword
              allows user control over the name that is written into
              the ustar header blocks for the extended header produced
              under the circumstances described in pax Header Block.
              The name shall be the contents of string, after the
              following character substitutions have been made:

              string
              Includes:   Replaced By:
              %d          The directory name of the file, equivalent to
                          the result of the dirname utility on the
                          translated pathname.
              %f          The filename of the file, equivalent to the
                          result of the basename utility on the
                          translated pathname.
              %p          The process ID of the pax process.
              %%          A '%' character.

              Any other '%' characters in string produce undefined
              results.

              If no -o exthdr.name=string is specified, pax shall use
              the following default value:

                     %d/PaxHeaders.%p/%f

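The substitution table for exthdr.name can be exercised with a short
sketch. The function expand_exthdr_name is a hypothetical name, and
Python's posixpath module is used here only as an approximation of the
dirname and basename utilities (edge cases such as pathnames that
begin with "//" may differ). Because the text leaves other '%'
sequences undefined, this sketch simply rejects them.

```python
import os
import posixpath


def expand_exthdr_name(template, pathname, pid=None):
    """Expand %d, %f, %p and %% in an exthdr.name template (sketch)."""
    pid = os.getpid() if pid is None else pid
    stripped = pathname.rstrip("/") or "/"
    substitutions = {
        "d": posixpath.dirname(stripped) or ".",   # approximates dirname
        "f": posixpath.basename(stripped) or "/",  # approximates basename
        "p": str(pid),
        "%": "%",
    }
    out, i = [], 0
    while i < len(template):
        ch = template[i]
        if ch == "%":
            key = template[i + 1] if i + 1 < len(template) else ""
            if key not in substitutions:
                # Undefined by the specification; rejected in this sketch.
                raise ValueError("unsupported sequence %" + key)
            out.append(substitutions[key])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

With the default template and a process ID of 1234, the pathname
usr/lib/libc.a expands to usr/lib/PaxHeaders.1234/libc.a.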
       globexthdr.name=string
              (Applicable only to the -x pax format.) When used in
              write or copy mode with the appropriate options, pax
              shall create global extended header records with ustar
              header blocks that will be treated as regular files by
              previous versions of pax. This keyword allows user
              control over the name that is written into the ustar
              header blocks for global extended header records. The
              name shall be the contents of string, after the following
              character substitutions have been made:

              string
              Includes:   Replaced By:
              %n          An integer that represents the sequence
                          number of the global extended header record
                          in the archive, starting at 1.
              %p          The process ID of the pax process.
              %%          A '%' character.

              Any other '%' characters in string produce undefined
              results.

              If no -o globexthdr.name=string is specified, pax shall
              use the following default value:

                     $TMPDIR/GlobalHead.%p.%n

              where $TMPDIR represents the value of the TMPDIR
              environment variable. If TMPDIR is not set, pax shall use
              /tmp.

       invalid=action
              (Applicable only to the -x pax format.) This keyword
              allows user control over the action pax takes upon
              encountering values in an extended header record that, in
              read or copy mode, are invalid in the destination
              hierarchy or, in list mode, cannot be written in the
              codeset and current locale of the implementation. The
              following are invalid values that shall be recognized by
              pax:

              *  In read or copy mode, a filename or link name that
                 contains character encodings invalid in the
                 destination hierarchy. (For example, the name may
                 contain embedded NULs.)

              *  In read or copy mode, a filename or link name that is
                 longer than the maximum allowed in the destination
                 hierarchy (for either a pathname component or the
                 entire pathname).

              *  In list mode, any character string value (filename,
                 link name, user name, and so on) that cannot be
                 written in the codeset and current locale of the
                 implementation.

              The following mutually-exclusive values of the action
              argument are supported:

              bypass
                     In read or copy mode, pax shall bypass the file,
                     causing no change to the destination hierarchy. In
                     list mode, pax shall write all requested valid
                     values for the file, but its method for writing
                     invalid values is unspecified.

              rename
                     In read or copy mode, pax shall act as if the -i
                     option were in effect for each file with invalid
                     filename or link name values, allowing the user to
                     provide a replacement name interactively. In list
                     mode, pax shall behave identically to the bypass
                     action.

              UTF-8
                     When used in read, copy, or list mode and a
                     filename, link name, owner name, or any other
                     field in an extended header record cannot be
                     translated from the pax UTF-8 codeset format to
                     the codeset and current locale of the
                     implementation, pax shall use the actual UTF-8
                     encoding for the name.

              write
                     In read or copy mode, pax shall write the file,
                     translating or truncating the name, regardless of
                     whether this may overwrite an existing file with a
                     valid name. In list mode, pax shall behave
                     identically to the bypass action.

              If no -o invalid= option is specified, pax shall act as
              if -o invalid=bypass were specified. Any overwriting of
              existing files that may be allowed by the -o invalid=
              actions shall be subject to permission (-p) and
              modification time (-u) restrictions, and shall be
              suppressed if the -k option is also specified.

       linkdata
              (Applicable only to the -x pax format.) In write mode,
              pax shall write the contents of a file to the archive
              even when that file is merely a hard link to a file whose
              contents have already been written to the archive.

       listopt=format
              This keyword specifies the output format of the table of
              contents produced when the -v option is specified in list
              mode. See List Mode Format Specifications. To avoid
              ambiguity, the listopt=format shall be the only or final
              keyword=value pair in a -o option-argument; all
              characters in the remainder of the option-argument shall
              be considered part of the format string. When multiple
              -o listopt=format options are specified, the format
              strings shall be considered a single, concatenated
              string, evaluated in command line order.

       times
              (Applicable only to the -x pax format.) When used in
              write or copy mode, pax shall include atime, ctime, and
              mtime extended header records for each file. See pax
              Extended Header File Times.

       In addition to these keywords, if the -x pax format is
       specified, any of the keywords and values defined in pax
       Extended Header, including implementation extensions, can be
       used in -o option-arguments, in either of two modes:

       keyword=value
              When used in write or copy mode, these keyword/value
              pairs shall be included at the beginning of the archive
              as typeflag g global extended header records. When used
              in read or list mode, these keyword/value pairs shall act
              as if they had been at the beginning of the archive as
              typeflag g global extended header records.

       keyword:=value
              When used in write or copy mode, these keyword/value
              pairs shall be included as records at the beginning of a
              typeflag x extended header for each file. (This shall be
              equivalent to the equal-sign form except that it creates
              no typeflag g global extended header records.) When used
              in read or list mode, these keyword/value pairs shall act
              as if they were included as records at the end of each
              extended header; thus, they shall override any global or
              file-specific extended header record keywords of the same
              names. For example, in the command:

                     pax -r -o "
                     gname:=mygroup,
                     " <archive

              the group name will be forced to a new value for all
              files read from the archive.

       The precedence of -o keywords over various fields in the archive
       is described in pax Extended Header Keyword Precedence.

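The syntax rules for the -o option-argument (comma separation,
backslash-escaped commas, optional leading white space, an ignored
trailing comma, and the two operators = and :=) can be sketched as a
small parser. The function parse_pax_options is a hypothetical name.
The sketch deliberately does not model the special rule that a
listopt=format value extends to the end of the option-argument, and it
skips empty fields anywhere in the string, which is an assumption.

```python
def parse_pax_options(options):
    """Split a -o option-argument into (keyword, operator, value) triples.

    operator is "=", ":=", or None for a bare keyword such as "times".
    """
    # Split on commas; a backslash before a comma is dropped and the
    # comma is kept as part of the value.
    fields, current, i = [], [], 0
    while i < len(options):
        ch = options[i]
        if ch == "\\" and i + 1 < len(options) and options[i + 1] == ",":
            current.append(",")
            i += 2
            continue
        if ch == ",":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))

    parsed = []
    for field in fields:
        field = field.lstrip()      # keywords can be preceded by white space
        if not field:               # e.g. a trailing comma plus white space
            continue
        keyword, sep, value = field.partition("=")
        if not sep:
            parsed.append((keyword, None, None))
        elif keyword.endswith(":"):
            parsed.append((keyword[:-1], ":=", value))
        else:
            parsed.append((keyword, "=", value))
    return parsed
```

Applied to the option-argument of the gname example above, the sketch
yields a single triple with keyword gname, operator := and value
mygroup; the trailing comma and newline are ignored.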
-p string
       Specify one or more file characteristic options (privileges).
       The string option-argument shall be a string specifying file
       characteristics to be retained or discarded on extraction. The
       string shall consist of the specification characters a, e, m, o,
       and p. Other implementation-defined characters can be included.
       Multiple characteristics can be concatenated within the same
       string and multiple -p options can be specified. The meanings of
       the specification characters are as follows:

       a      Do not preserve file access times.

       e      Preserve the user ID, group ID, file mode bits (see the
              Base Definitions volume of IEEE Std 1003.1-2001, Section
              3.168, File Mode Bits), access time, modification time,
              and any other implementation-defined file
              characteristics.

       m      Do not preserve file modification times.

       o      Preserve the user ID and group ID.

       p      Preserve the file mode bits. Other implementation-defined
              file mode attributes may be preserved.

       In the preceding list, "preserve" indicates that an attribute
       stored in the archive shall be given to the extracted file,
       subject to the permissions of the invoking process. The access
       and modification times of the file shall be preserved unless
       otherwise specified with the -p option or not stored in the
       archive. All attributes that are not preserved shall be
       determined as part of the normal file creation action (see File
       Read, Write, and Creation).

       If neither the e nor the o specification character is specified,
       or the user ID and group ID are not preserved for any reason,
       pax shall not set the S_ISUID and S_ISGID bits of the file mode.

       If the preservation of any of these items fails for any reason,
       pax shall write a diagnostic message to standard error. Failure
       to preserve these items shall affect the final exit status, but
       shall not cause the extracted file to be deleted.

       If file characteristic letters in any of the string
       option-arguments are duplicated or conflict with each other, the
       ones given last shall take precedence. For example, if -p eme is
       specified, file modification times are preserved.

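The "last one wins" rule for -p can be checked with a small model.
This is a sketch under stated assumptions: the names resolve_preserve
and effective_mode are hypothetical, implementation-defined
specification characters are rejected, and the starting state (times
preserved; ownership and mode bits left to normal file creation) is a
reading of the text above rather than a quotation of it.

```python
import stat


def resolve_preserve(*p_strings):
    """Fold the -p option-arguments, in command line order, into a state."""
    state = {"atime": True, "mtime": True, "owner": False, "mode": False}
    for string in p_strings:
        for ch in string:
            if ch == "a":
                state["atime"] = False
            elif ch == "m":
                state["mtime"] = False
            elif ch == "e":
                state = dict.fromkeys(state, True)   # preserve everything
            elif ch == "o":
                state["owner"] = True
            elif ch == "p":
                state["mode"] = True
            else:
                raise ValueError("unsupported specification character: " + ch)
    return state


def effective_mode(mode, owner_preserved):
    """Clear S_ISUID and S_ISGID when the user and group IDs are not kept."""
    if not owner_preserved:
        mode &= ~(stat.S_ISUID | stat.S_ISGID)
    return mode
```

For -p eme, the final e re-enables what the m disabled, so the
modification time ends up preserved, matching the example in the text.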
-s replstr
       Modify file or archive member names named by pattern or file
       operands according to the substitution expression replstr, using
       the syntax of the ed utility. The concepts of "address" and
       "line" are meaningless in the context of the pax utility, and
       shall not be supplied. The format shall be:

              -s /old/new/[gp]

       where as in ed, old is a basic regular expression and new can
       contain an ampersand, '\n' (where n is a digit) backreferences,
       or subexpression matching. The old string shall also be
       permitted to contain <newline>s.

       Any non-null character can be used as a delimiter ('/' shown
       here). Multiple -s expressions can be specified; the expressions
       shall be applied in the order specified, terminating with the
       first successful substitution. The optional trailing 'g' is as
       defined in the ed utility. The optional trailing 'p' shall cause
       successful substitutions to be written to standard error. File
       or archive member names that substitute to the empty string
       shall be ignored when reading and writing archives.

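The ordering rule for multiple -s expressions (stop at the first
successful substitution, and ignore names that become empty) can be
sketched as follows. Important assumptions: the helper names are
hypothetical; Python's re syntax stands in for POSIX basic regular
expressions, so patterns are written in Python syntax; delimiters
escaped inside old or new are not handled; and the 'p' flag is
accepted but its output to standard error is not modeled.

```python
import re


def parse_replstr(replstr):
    """Split <d>old<d>new<d>[gp] using the first character as delimiter."""
    delim = replstr[0]
    parts = replstr[1:].split(delim)
    if len(parts) != 3 or set(parts[2]) - set("gp"):
        raise ValueError("expected <d>old<d>new<d>[gp]: " + replstr)
    old, new, flags = parts
    return re.compile(old), new, flags


def _expand(new, match):
    """Expand '&' and '\\n' backreferences the way ed replacements do."""
    out, i = [], 0
    while i < len(new):
        ch = new[i]
        if ch == "\\" and i + 1 < len(new):
            nxt = new[i + 1]
            out.append((match.group(int(nxt)) or "") if nxt in "0123456789" else nxt)
            i += 2
        elif ch == "&":
            out.append(match.group(0))
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def rename_member(name, replstrs):
    """Apply -s expressions in order; return None if the name is ignored."""
    for replstr in replstrs:
        pattern, new, flags = parse_replstr(replstr)
        count = 0 if "g" in flags else 1     # 0 means "replace all" in re
        result, hits = pattern.subn(lambda m: _expand(new, m), name, count=count)
        if hits:
            return result or None            # empty name: member is ignored
    return name
```

For example, with the two expressions ,\.txt$,.bak, and ,^a,b, the name
a.txt becomes a.bak: the first expression succeeds, so the second is
never applied.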
-t     When reading files from the file system, and if the user has the
       permissions required by utime() to do so, set the access time of
       each file read to the access time that it had before being read
       by pax.

-u     Ignore files that are older (having a less recent file
       modification time) than a pre-existing file or archive member
       with the same name. In read mode, an archive member with the
       same name as a file in the file system shall be extracted if the
       archive member is newer than the file. In write mode, an archive
       file member with the same name as a file in the file system
       shall be superseded if the file is newer than the archive
       member. If -a is also specified, this is accomplished by
       appending to the archive; otherwise, it is unspecified whether
       this is accomplished by actual replacement in the archive or by
       appending to the archive. In copy mode, the file in the
       destination hierarchy shall be replaced by the file in the
       source hierarchy or by a link to the file in the source
       hierarchy if the file in the source hierarchy is newer.

-v     In list mode, produce a verbose table of contents (see the
       STDOUT section). Otherwise, write archive member pathnames to
       standard error (see the STDERR section).

-x format
       Specify the output archive format. The pax utility shall support
       the following formats:

       cpio   The cpio interchange format; see the EXTENDED DESCRIPTION
              section. The default blocksize for this format for
              character special archive files shall be 5120.
              Implementations shall support all blocksize values less
              than or equal to 32256 that are multiples of 512.

       pax    The pax interchange format; see the EXTENDED DESCRIPTION
              section. The default blocksize for this format for
              character special archive files shall be 5120.
              Implementations shall support all blocksize values less
              than or equal to 32256 that are multiples of 512.

       ustar  The tar interchange format; see the EXTENDED DESCRIPTION
              section. The default blocksize for this format for
              character special archive files shall be 10240.
              Implementations shall support all blocksize values less
              than or equal to 32256 that are multiples of 512.

       Implementation-defined formats shall specify a default block
       size as well as any other block sizes supported for character
       special archive files.

       Any attempt to append to an archive file in a format different
       from the existing archive format shall cause pax to exit
       immediately with a non-zero exit status.

       In copy mode, if no -x format is specified, pax shall behave as
       if -x pax were specified.

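The blocksize figures given for the three standard formats can be
collected into a small table with a portability check. The names
DEFAULT_BLOCKSIZE and blocksize_is_portable are hypothetical; the
check covers only the values every implementation is required to
support for these formats (positive multiples of 512 no larger than
32256), and an implementation may accept other values.

```python
# Default blocksize for character special archive files, per format.
DEFAULT_BLOCKSIZE = {"cpio": 5120, "pax": 5120, "ustar": 10240}

# Largest blocksize a conforming application may specify with -b.
MAX_BLOCKSIZE = 32256


def blocksize_is_portable(blocksize):
    """True if every implementation must support this -b value."""
    return 0 < blocksize <= MAX_BLOCKSIZE and blocksize % 512 == 0
```

Note that 32256 is itself a multiple of 512 (63 x 512), so the largest
permitted value is also a guaranteed one.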
-X     When traversing the file hierarchy specified by a pathname, pax
       shall not descend into directories that have a different device
       ID (st_dev; see the System Interfaces volume of
       IEEE Std 1003.1-2001, stat()).

The options that operate on the names of files or archive members (-c,
-i, -n, -s, -u, and -v) shall interact as follows. In read mode, the
archive members shall be selected based on the user-specified pattern
operands as modified by the -c, -n, and -u options. Then, any -s and -i
options shall modify, in that order, the names of the selected files.
The -v option shall write names resulting from these modifications.

In write mode, the files shall be selected based on the user-specified
pathnames as modified by the -n and -u options. Then, any -s and -i
options shall modify, in that order, the names of these selected files.
The -v option shall write names resulting from these modifications.

If both the -u and -n options are specified, pax shall not consider a
file selected unless it is newer than the file to which it is compared.

List Mode Format Specifications
In list mode with the -o listopt=format option, the format argument
shall be applied for each selected file. The pax utility shall append a
<newline> to the listopt output for each selected file. The format
argument shall be used as the format string described in the Base
Definitions volume of IEEE Std 1003.1-2001, Chapter 5, File Format
Notation, with the exceptions 1. through 5. defined in the EXTENDED
DESCRIPTION section of printf, plus the following exceptions:

 6. The sequence (keyword) can occur before a format conversion
    specifier. The conversion argument is defined by the value of
    keyword. The implementation shall support the following keywords:

    *  Any of the Field Name entries in ustar Header Block and
       Octet-Oriented cpio Archive Entry. The implementation may
       support the cpio keywords without the leading c_ in addition
       to the form required by Values for cpio c_mode Field.

    *  Any keyword defined for the extended header in pax Extended
       Header.

    *  Any keyword provided as an implementation-defined extension
       within the extended header defined in pax Extended Header.

    For example, the sequence "%(charset)s" is the string value of the
    name of the character set in the extended header.

    The result of the keyword conversion argument shall be the value
    from the applicable header field or extended header, without any
    trailing NULs.

    All keyword values used as conversion arguments shall be translated
    from the UTF-8 encoding to the character set appropriate for the
    local file system, user database, and so on, as applicable.

 7. An additional conversion specifier character, T, shall be used to
    specify time formats. The T conversion specifier character can be
    preceded by the sequence (keyword=subformat), where subformat is a
    date format as defined by date operands. The default keyword shall
    be mtime and the default subformat shall be:

           %b %e %H:%M %Y

 8. An additional conversion specifier character, M, shall be used to
    specify the file mode string as defined in ls Standard Output. If
    (keyword) is omitted, the mode keyword shall be used. For example,
    %.1M writes the single character corresponding to the <entry type>
    field of the ls -l command.

 9. An additional conversion specifier character, D, shall be used to
    specify the device for block or special files, if applicable, in an
    implementation-defined format. If not applicable, and (keyword) is
    specified, then this conversion shall be equivalent to
    %(keyword)u. If not applicable, and (keyword) is omitted, then this
    conversion shall be equivalent to <space>.

10. An additional conversion specifier character, F, shall be used to
    specify a pathname. The F conversion character can be preceded by a
    sequence of comma-separated keywords:

           (keyword[,keyword] ... )

    The values for all the keywords that are non-null shall be
    concatenated together, each separated by a '/'. The default shall
    be (path) if the keyword path is defined; otherwise, the default
    shall be (prefix,name).

11. An additional conversion specifier character, L, shall be used to
    specify a symbolic link expansion. If the current file is a
    symbolic link, then %L shall expand to:

           "%s -> %s", <value of keyword>, <contents of link>

    Otherwise, the %L conversion specification shall be the equivalent
    of %F.

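The F and L conversions described in exceptions 10 and 11 can be
sketched with a header modeled as a plain dictionary. Everything about
this model is an assumption made for illustration: the function names
format_F and format_L are hypothetical, a symbolic link is recognized
by the ustar typeflag value '2', and the link contents are taken from
a linkpath keyword if present, else from linkname.

```python
def format_F(header, keywords=None):
    """Join the non-null values of the keywords with '/' (sketch of %F)."""
    if keywords is None:
        # Default: (path) if path is defined, otherwise (prefix,name).
        keywords = ("path",) if "path" in header else ("prefix", "name")
    return "/".join(header[k] for k in keywords if header.get(k))


def format_L(header, keywords=None):
    """Sketch of %L: 'name -> target' for symbolic links, else like %F."""
    name = format_F(header, keywords)
    if header.get("typeflag") == "2":        # '2' marks a symbolic link
        target = header.get("linkpath") or header.get("linkname", "")
        return "%s -> %s" % (name, target)
    return name
```

For a header with prefix usr/lib and name libc.a, format_F yields
usr/lib/libc.a; an empty prefix is skipped rather than producing a
leading '/'.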
OPERANDS
The following operands shall be supported:

directory
       The destination directory pathname for copy mode.

file   A pathname of a file to be copied or archived.

pattern
       A pattern matching one or more pathnames of archive members. A
       pattern must be given in the name-generating notation of the
       pattern matching notation in Pattern Matching Notation,
       including the filename expansion rules in Patterns Used for
       Filename Expansion. The default, if no pattern is specified, is
       to select all members in the archive.

STDIN
In write mode, the standard input shall be used only if no file
operands are specified. It shall be a text file containing a list of
pathnames, one per line, without leading or trailing <blank>s.

In list and read modes, if -f is not specified, the standard input
shall be an archive file.

Otherwise, the standard input shall not be used.

719 The input file named by the archive option-argument, or standard input
720 when the archive is read from there, shall be a file formatted accord‐
721 ing to one of the specifications in the EXTENDED DESCRIPTION section or
722 some other implementation-defined format.
723
724 The file /dev/tty shall be used to write prompts and read responses.
725
726 ENVIRONMENT VARIABLES
727 The following environment variables shall affect the execution of pax:
728
729 LANG Provide a default value for the internationalization variables
730 that are unset or null. (See the Base Definitions volume of
731 IEEE Std 1003.1-2001, Section 8.2, Internationalization Vari‐
732 ables for the precedence of internationalization variables used
733 to determine the values of locale categories.)
734
735 LC_ALL If set to a non-empty string value, override the values of all
736 the other internationalization variables.
737
738 LC_COLLATE
739
740 Determine the locale for the behavior of ranges, equivalence
741 classes, and multi-character collating elements used in the pat‐
742 tern matching expressions for the pattern operand, the basic
743 regular expression for the -s option, and the extended regular
744 expression defined for the yesexpr locale keyword in the LC_MES‐
745 SAGES category.
746
747 LC_CTYPE
748 Determine the locale for the interpretation of sequences of
749 bytes of text data as characters (for example, single-byte as
750 opposed to multi-byte characters in arguments and input files),
751 the behavior of character classes used in the extended regular
752 expression defined for the yesexpr locale keyword in the LC_MES‐
753 SAGES category, and pattern matching.
754
755 LC_MESSAGES
756 Determine the locale for the processing of affirmative responses
757 and the locale that should be used to affect the format and
758 contents of diagnostic messages written to standard error.
759
760 LC_TIME
761 Determine the format and contents of date and time strings when
762 the -v option is specified.
763
764 NLSPATH
765 Determine the location of message catalogs for the processing of
766 LC_MESSAGES.
767
768 TMPDIR Determine the pathname that provides part of the default global
769 extended header record file, as described for the -o globexthdr=
770 keyword in the OPTIONS section.
771
772 TZ Determine the timezone used to calculate date and time strings
773 when the -v option is specified. If TZ is unset or null, an
774 unspecified default timezone shall be used.
775
776
777 ASYNCHRONOUS EVENTS
778 Default.
779
780 STDOUT
781 In write mode, if -f is not specified, the standard output shall be the
782 archive formatted according to one of the specifications in the
783 EXTENDED DESCRIPTION section, or some other implementation-defined for‐
784 mat (see -x format).
785
786 In list mode, when the -o listopt=format option has been specified, the
787 selected archive members shall be written to standard output using the
788 format described under List Mode Format Specifications. In list mode
789 without the -o listopt=format option, the table of contents of the
790 selected archive members shall be written to standard output using the
791 following format:
792
793
794 "%s\n", <pathname>
795
796 If the -v option is specified in list mode, the table of contents of
797 the selected archive members shall be written to standard output using
798 the following formats.
799
800 For pathnames representing hard links to previous members of the ar‐
801 chive:
802
803
804 "%s == %s\n", <ls -l listing>, <linkname>
805
806 For all other pathnames:
807
808
809 "%s\n", <ls -l listing>
810
811 where <ls -l listing> shall be the format specified by the ls utility
812 with the -l option. When writing pathnames in this format, it is
813 unspecified what is written for fields for which the underlying archive
814 format does not have the correct information, although the correct num‐
815 ber of <blank>-separated fields shall be written.
816
817 In list mode, standard output shall not be buffered more than a line at
818 a time.
819
820 STDERR
821 If -v is specified in read, write, or copy modes, pax shall write the
822 pathnames it processes to the standard error output using the following
823 format:
824
825
826 "%s\n", <pathname>
827
828 These pathnames shall be written as soon as processing is begun on the
829 file or archive member, and shall be flushed to standard error. The
830 trailing <newline>, which shall not be buffered, is written when the
831 file has been read or written.
832
833 If the -s option is specified, and the replacement string has a trail‐
834 ing 'p', substitutions shall be written to standard error in the fol‐
835 lowing format:
836
837
838 "%s >> %s\n", <original pathname>, <new pathname>
839
840 In all operating modes of pax, optional messages of unspecified format
841 concerning the input archive format and volume number, the number of
842 files, blocks, volumes, and media parts as well as other diagnostic
843 messages may be written to standard error.
844
845 In all formats, for both standard output and standard error, it is
846 unspecified how non-printable characters in pathnames or link names are
847 written.
848
849 When pax is in read mode or list mode, using the -x pax archive format,
850 and a filename, link name, owner name, or any other field in an
851 extended header record cannot be translated from the pax UTF-8 codeset
852 format to the codeset and current locale of the implementation, pax
853 shall write a diagnostic message to standard error, shall process the
854 file as described for the -o invalid= option, and then shall process
855 the next file in the archive.
856
857 OUTPUT FILES
858 In read mode, the extracted output files shall be of the archived file
859 type. In copy mode, the copied output files shall be the type of the
860 file being copied. In either mode, existing files in the destination
861 hierarchy shall be overwritten only when all permission (-p),
862 modification time (-u), and invalid-value (-o invalid=) tests allow it.
863
864 In write mode, the output file named by the -f option-argument shall be
865 a file formatted according to one of the specifications in the EXTENDED
866 DESCRIPTION section, or some other implementation-defined format.
867
868 EXTENDED DESCRIPTION
869 pax Interchange Format
870 A pax archive tape or file produced in the -x pax format shall contain
871 a series of blocks. The physical layout of the archive shall be
872 identical to the ustar format described in ustar Interchange Format. Each
873 file archived shall be represented by the following sequence:
874
875 * An optional header block with extended header records. This header
876 block is of the form described in pax Header Block, with a typeflag
877 value of x or g. The extended header records, described in pax
878 Extended Header, shall be included as the data for this header
879 block.
880
881 * A header block that describes the file. Any fields in the preceding
882 optional extended header shall override the associated fields in
883 this header block for this file.
884
885 * Zero or more blocks that contain the contents of the file.
886
887 At the end of the archive file there shall be two 512-byte blocks
888 filled with binary zeros, interpreted as an end-of-archive indicator.
889
890 A schematic of an example archive with global extended header records
891 and two actual files is shown in pax Format Archive Example. In the
892 example, the second file in the archive has no extended header
893 preceding it, presumably because it has no need for extended attributes.
894
895
896
897 Figure: pax Format Archive Example
898
899 pax Header Block
900 The pax header block shall be identical to the ustar header block
901 described in ustar Interchange Format, except that two additional type‐
902 flag values are defined:
903
904 x Represents extended header records for the following file in the
905 archive (which shall have its own ustar header block). The
906 format of these extended header records shall be as described in
907 pax Extended Header.
908
909 g Represents global extended header records for the following
910 files in the archive. The format of these extended header
911 records shall be as described in pax Extended Header. Each
912 value shall affect all subsequent files that do not override
913 that value in their own extended header record and until another
914 global extended header record is reached that provides another
915 value for the same field. The typeflag g global headers should
916 not be used with interchange media that could suffer partial
917 data loss in transporting the archive.
918
919
920 For both of these types, the size field shall be the size of the
921 extended header records in octets. The other fields in the header block
922 are not meaningful to this version of the pax utility. However, if this
923 archive is read by a pax utility conforming to the ISO POSIX-2:1993
924 standard, the header block fields are used to create a regular file
925 that contains the extended header records as data. Therefore, header
926 block field values should be selected to provide reasonable file access
927 to this regular file.
928
929 A further difference from the ustar header block is that data blocks
930 for files of typeflag 1 (the digit one) (hard link) may be included,
931 which means that the size field may be greater than zero. Archives cre‐
932 ated by pax -o linkdata shall include these data blocks with the hard
933 links.
934
935 pax Extended Header
936 A pax extended header contains values that are inappropriate for the
937 ustar header block because of limitations in that format: fields
938 requiring a character encoding other than that described in the
939 ISO/IEC 646:1991 standard, fields representing file attributes not
940 described in the ustar header, and fields whose format or length do not
941 fit the requirements of the ustar header. The values in an extended
942 header add attributes to the following file (or files; see the descrip‐
943 tion of the typeflag g header block) or override values in the follow‐
944 ing header block(s), as indicated in the following list of keywords.
945
946 An extended header shall consist of one or more records, each con‐
947 structed as follows:
948
949
950 "%d %s=%s\n", <length>, <keyword>, <value>
951
952 The extended header records shall be encoded according to the
953 ISO/IEC 10646-1:2000 standard (UTF-8). The <length> field, <blank>,
954 equals sign, and <newline> shown shall be limited to the portable char‐
955 acter set, as encoded in UTF-8. The <keyword> and <value> fields can be
956 any UTF-8 characters. The <length> field shall be the decimal length of
957 the extended header record in octets, including the trailing <newline>.
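
Because <length> counts its own decimal digits, it cannot be computed in a single step. The following Python sketch shows one way to build a record; the function name is hypothetical and the code is an illustration of the rule above, not text from the standard.

```python
def pax_record(keyword, value):
    """Build one extended header record, '<length> <keyword>=<value>\\n', as bytes."""
    if "=" in keyword:
        raise ValueError("a keyword shall not include an equals sign")
    body = b" " + keyword.encode("utf-8") + b"=" + value.encode("utf-8") + b"\n"
    length = len(body)
    # The length field includes its own digits, so iterate until it is stable.
    while True:
        total = len(str(length)) + len(body)
        if total == length:
            break
        length = total
    return str(length).encode("ascii") + body


if __name__ == "__main__":
    print(pax_record("mtime", "1350244992.023960108"))
```

The loop always terminates because adding a digit to the length field can enlarge the record by at most one octet per iteration.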
958
959 The <keyword> field shall be one of the entries from the following list
960 or a keyword provided as an implementation extension. Keywords consist‐
961 ing entirely of lowercase letters, digits, and periods are reserved for
962 future standardization. A keyword shall not include an equals sign. (In
963 the following list, the notations "file(s)" and "block(s)" are used to
964 acknowledge that a keyword affects the following single file after a
965 typeflag x extended header, but possibly multiple files after typeflag
966 g. Any requirements in the list for pax to include a record when in
967 write or copy mode shall apply only when such a record has not already
968 been provided through the use of the -o option. When used in copy mode,
969 pax shall behave as if an archive had been created with applicable
970 extended header records and then extracted.)
971
972 atime The file access time for the following file(s), equivalent to
973 the value of the st_atime member of the stat structure for a
974 file, as described by the stat() function. The access time shall
975 be restored if the process has the appropriate privilege
976 required to do so. The format of the <value> shall be as
977 described in pax Extended Header File Times.
978
979 charset
980 The name of the character set used to encode the data in the
981 following file(s). The entries in the following table are
982 defined to refer to known standards; additional names may be
983 agreed on between the originator and recipient.
984
985  <value>                  Formal Standard
986  ISO-IR 646 1990          ISO/IEC 646:1990
987  ISO-IR 8859 1 1998       ISO/IEC 8859-1:1998
988  ISO-IR 8859 2 1999       ISO/IEC 8859-2:1999
989  ISO-IR 8859 3 1999       ISO/IEC 8859-3:1999
990  ISO-IR 8859 4 1998       ISO/IEC 8859-4:1998
991  ISO-IR 8859 5 1999       ISO/IEC 8859-5:1999
992  ISO-IR 8859 6 1999       ISO/IEC 8859-6:1999
993  ISO-IR 8859 7 1987       ISO/IEC 8859-7:1987
994  ISO-IR 8859 8 1999       ISO/IEC 8859-8:1999
995  ISO-IR 8859 9 1999       ISO/IEC 8859-9:1999
996  ISO-IR 8859 10 1998      ISO/IEC 8859-10:1998
997  ISO-IR 8859 13 1998      ISO/IEC 8859-13:1998
998  ISO-IR 8859 14 1998      ISO/IEC 8859-14:1998
999  ISO-IR 8859 15 1999      ISO/IEC 8859-15:1999
1000 ISO-IR 10646 2000        ISO/IEC 10646:2000
1001 ISO-IR 10646 2000 UTF-8  ISO/IEC 10646, UTF-8 encoding
1002 BINARY                   None.
1003
1004 The encoding is included in an extended header for information only;
1005 when pax is used as described in IEEE Std 1003.1-2001, it shall not
1006 translate the file data into any other encoding. The BINARY entry indi‐
1007 cates unencoded binary data.
1008
1009 When used in write or copy mode, it is implementation-defined whether
1010 pax includes a charset extended header record for a file.
1011
1012 comment
1013 A series of characters used as a comment. All characters in the
1014 <value> field shall be ignored by pax.
1015
1016 ctime The file creation time for the following file(s), equivalent to
1017 the value of the st_ctime member of the stat structure for a
1018 file, as described by the stat() function. The creation time
1019 shall be restored if the process has the appropriate privilege
1020 required to do so. The format of the <value> shall be as
1021 described in pax Extended Header File Times.
1022
1023 gid The group ID of the group that owns the file, expressed as a
1024 decimal number using digits from the ISO/IEC 646:1991 standard.
1025 This record shall override the gid field in the following header
1026 block(s). When used in write or copy mode, pax shall include a
1027 gid extended header record for each file whose group ID is
1028 greater than 2097151 (octal 7777777).
1029
1030 gname The group of the file(s), formatted as a group name in the group
1031 database. This record shall override the gid and gname fields
1032 in the following header block(s), and any gid extended header
1033 record. When used in read, copy, or list mode, pax shall trans‐
1034 late the name from the UTF-8 encoding in the header record to
1035 the character set appropriate for the group database on the
1036 receiving system. If any of the UTF-8 characters cannot be
1037 translated, and if the -o invalid=UTF-8 option is not specified,
1038 the results are implementation-defined. When used in write
1039 or copy mode, pax shall include a gname extended header record
1040 for each file whose group name cannot be represented entirely
1041 with the letters and digits of the portable character set.
1042
1043 linkpath
1044 The pathname of a link being created to another file, of any
1045 type, previously archived. This record shall override the
1046 linkname field in the following ustar header block(s). The fol‐
1047 lowing ustar header block shall determine the type of link cre‐
1048 ated. If typeflag of the following header block is 1, it shall
1049 be a hard link. If typeflag is 2, it shall be a symbolic link
1050 and the linkpath value shall be the contents of the symbolic
1051 link. The pax utility shall translate the name of the link (con‐
1052 tents of the symbolic link) from the UTF-8 encoding to the char‐
1053 acter set appropriate for the local file system. When used in
1054 write or copy mode, pax shall include a linkpath extended header
1055 record for each link whose pathname cannot be represented
1056 entirely with the members of the portable character set other
1057 than NUL.
1058
1059 mtime The file modification time of the following file(s), equivalent
1060 to the value of the st_mtime member of the stat structure for a
1061 file, as described in the stat() function. This record shall
1062 override the mtime field in the following header block(s). The
1063 modification time shall be restored if the process has the
1064 appropriate privilege required to do so. The format of the
1065 <value> shall be as described in pax Extended Header File Times.
1067
1068 path The pathname of the following file(s). This record shall over‐
1069 ride the name and prefix fields in the following header
1070 block(s). The pax utility shall translate the pathname of the
1071 file from the UTF-8 encoding to the character set appropriate
1072 for the local file system.
1073
1074 When used in write or copy mode, pax shall include a path extended
1075 header record for each file whose pathname cannot be represented
1076 entirely with the members of the portable character set other than NUL.
1077
1078 realtime.any
1079 The keywords prefixed by "realtime." are reserved for future
1080 standardization.
1081
1082 security.any
1083 The keywords prefixed by "security." are reserved for future
1084 standardization.
1085
1086 size The size of the file in octets, expressed as a decimal number
1087 using digits from the ISO/IEC 646:1991 standard. This record
1088 shall override the size field in the following header block(s).
1089 When used in write or copy mode, pax shall include a size
1090 extended header record for each file with a size value greater
1091 than 8589934591 (octal 77777777777).
1092
1093 uid The user ID of the file owner, expressed as a decimal number
1094 using digits from the ISO/IEC 646:1991 standard. This record
1095 shall override the uid field in the following header block(s).
1096 When used in write or copy mode, pax shall include a uid
1097 extended header record for each file whose owner ID is greater
1098 than 2097151 (octal 7777777).
1099
1100 uname The owner of the following file(s), formatted as a user name in
1101 the user database. This record shall override the uid and uname
1102 fields in the following header block(s), and any uid extended
1103 header record. When used in read, copy, or list mode, pax shall
1104 translate the name from the UTF-8 encoding in the header record
1105 to the character set appropriate for the user database on the
1106 receiving system. If any of the UTF-8 characters cannot be
1107 translated, and if the -o invalid=UTF-8 option is not specified,
1108 the results are implementation-defined. When used in write
1109 or copy mode, pax shall include a uname extended header record
1110 for each file whose user name cannot be represented entirely
1111 with the letters and digits of the portable character set.
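
The numeric limits quoted for gid, size, and uid follow from the widths of the octal ustar fields. The Python sketch below, with hypothetical names, shows which of these records a writer would have to add for a given file; it covers only the three numeric keywords and is an illustration, not a complete writer.

```python
USTAR_MAX_ID = 0o7777777          # 2097151, largest value of a uid or gid field
USTAR_MAX_SIZE = 0o77777777777    # 8589934591, largest value of a size field


def numeric_overrides(uid, gid, size):
    """Return the uid, gid and size records required because ustar cannot hold them."""
    records = {}
    if uid > USTAR_MAX_ID:
        records["uid"] = str(uid)
    if gid > USTAR_MAX_ID:
        records["gid"] = str(gid)
    if size > USTAR_MAX_SIZE:
        records["size"] = str(size)
    return records


if __name__ == "__main__":
    print(numeric_overrides(uid=3000000, gid=100, size=10 * 2**30))
```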
1112
1113
1114 If the <value> field is zero length, it shall delete any header block
1115 field, previously entered extended header value, or global extended
1116 header value of the same name.
1117
1118 If a keyword in an extended header record (or in a -o option-argument)
1119 overrides or deletes a corresponding field in the ustar header block,
1120 pax shall ignore the contents of that header block field.
1121
1122 Unlike the ustar header block fields, NULs shall not delimit <value>s;
1123 all characters within the <value> field shall be considered data for
1124 the field. None of the length limitations of the ustar header block
1125 fields in ustar Header Block shall apply to the extended header
1126 records.
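
A reader therefore has to rely on the <length> prefix, not on NUL or <newline> characters, to find the end of each record. The following Python sketch parses the data of an extended header under that assumption; the function name is hypothetical, and a zero-length value is returned as an empty string so that the caller can apply the deletion rule.

```python
def parse_pax_records(data):
    """Parse raw extended header data (bytes) into a dict of keyword -> value."""
    attrs = {}
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)        # end of the decimal <length> field
        length = int(data[pos:space])
        end = pos + length
        record = data[pos:end]
        if end > len(data) or not record.endswith(b"\n"):
            raise ValueError("malformed extended header record")
        keyword, sep, value = data[space + 1:end - 1].partition(b"=")
        if not sep:
            raise ValueError("record has no equals sign")
        # NULs and newlines inside <value> are data; a later record for the
        # same keyword replaces an earlier one.
        attrs[keyword.decode("utf-8")] = value.decode("utf-8")
        pos = end
    return attrs


if __name__ == "__main__":
    print(parse_pax_records(b"9 path=a\n7 uid=\n"))
```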
1127
1128 pax Extended Header Keyword Precedence
1129 This section describes the precedence in which the various header
1130 records and fields and command line options are selected to apply to a
1131 file in the archive. When pax is used in read or list modes, it shall
1132 determine a file attribute in the following sequence:
1133
1134 1. If -o delete=keyword-prefix is used, the affected attributes shall
1135 be determined from step 7, if applicable, or ignored otherwise.
1136
1137 2. If -o keyword:= is used, the affected attributes shall be ignored.
1138
1139 3. If -o keyword:=value is used, the affected attribute shall be
1140 assigned the value.
1141
1142 4. If there is a typeflag x extended header record, the affected
1143 attribute shall be assigned the <value>. When extended header
1144 records conflict, the last one given in the header shall take
1145 precedence.
1146
1147 5. If -o keyword=value is used, the affected attribute shall be
1148 assigned the value.
1149
1150 6. If there is a typeflag g global extended header record, the
1151 affected attribute shall be assigned the <value>. When global
1152 extended header records conflict, the last one given in the global
1153 header shall take precedence.
1154
1155 7. Otherwise, the attribute shall be determined from the ustar header
1156 block.
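
The seven steps can be read as a chain of lookups. The Python sketch below is one possible rendering; all names are hypothetical, the ustar header is simplified to a dictionary keyed by the same keywords, and shell-style matching with fnmatchcase is an assumption about how the delete= argument is compared.

```python
from fnmatch import fnmatchcase


def resolve_attribute(keyword, ustar, global_hdr, ext_hdr,
                      opt_assign, opt_override, delete_patterns):
    """Return the value of one attribute in read or list mode, or None if ignored.

    opt_assign holds -o keyword=value settings, opt_override holds
    -o keyword:=value settings (an empty string stands for -o keyword:=).
    """
    if any(fnmatchcase(keyword, p) for p in delete_patterns):   # step 1
        return ustar.get(keyword)
    if keyword in opt_override:                                  # steps 2 and 3
        return opt_override[keyword] or None
    if keyword in ext_hdr:                                       # step 4
        return ext_hdr[keyword]
    if keyword in opt_assign:                                    # step 5
        return opt_assign[keyword]
    if keyword in global_hdr:                                    # step 6
        return global_hdr[keyword]
    return ustar.get(keyword)                                    # step 7


if __name__ == "__main__":
    ustar = {"uid": "1000"}
    print(resolve_attribute("uid", ustar, {"uid": "5"}, {"uid": "7"}, {}, {}, []))
```

Conflicts within a single header are assumed to have been settled already, with the last record winning, before the dictionaries are passed in.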
1157
1158 pax Extended Header File Times
1159 The pax utility shall write an mtime record for each file in write or
1160 copy modes if the file's modification time cannot be represented
1161 exactly in the ustar header logical record described in ustar
1162 Interchange Format. This can occur if the time is out of ustar range, or if
1163 the file system of the underlying implementation supports non-integer
1164 time granularities and the time is not an integer. All of these time
1165 records shall be formatted as a decimal representation of the time in
1166 seconds since the Epoch. If a period ('.') decimal point character is
1167 present, the digits to the right of the point shall represent the units
1168 of a subsecond timing granularity, where the first digit is tenths of a
1169 second and each subsequent digit is a tenth of the previous digit. In
1170 read or copy mode, the pax utility shall truncate the time of a file to
1171 the greatest value that is not greater than the input header file time.
1172 In write or copy mode, the pax utility shall output a time exactly if
1173 it can be represented exactly as a decimal number, and otherwise shall
1174 generate only enough digits so that the same time shall be recovered if
1175 the file is extracted on a system whose underlying implementation sup‐
1176 ports the same time granularity.
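
As an illustration of the time format, the Python sketch below writes a time held as whole seconds plus nanoseconds and truncates a value read from a header to a coarser granularity. The function names, the nanosecond source resolution, and the microsecond default granularity are assumptions for the example; only non-negative times are handled by the formatter.

```python
from decimal import Decimal, ROUND_FLOOR


def format_pax_time(seconds, nanoseconds=0):
    """Format a non-negative time as decimal seconds since the Epoch."""
    if nanoseconds == 0:
        return str(seconds)
    # Nine fractional digits are exact for nanoseconds; drop trailing zeros.
    return ("%d.%09d" % (seconds, nanoseconds)).rstrip("0")


def truncate_pax_time(value, granularity=Decimal("0.000001")):
    """Return the greatest multiple of granularity not greater than value."""
    t = Decimal(value)
    steps = (t / granularity).to_integral_value(rounding=ROUND_FLOOR)
    return steps * granularity


if __name__ == "__main__":
    text = format_pax_time(1350244992, 23960108)
    print(text, truncate_pax_time(text))
```

Decimal arithmetic is used for the truncation so that binary floating-point rounding cannot move a time across a granularity boundary.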
1177
1178 ustar Interchange Format
1179 A ustar archive tape or file shall contain a series of logical records.
1180 Each logical record shall be a fixed-size logical record of 512 octets
1181 (see below). Although this format may be thought of as being stored on
1182 9-track industry-standard 12.7 mm (0.5 in) magnetic tape, other types
1183 of transportable media are not excluded. Each file archived shall be
1184 represented by a header logical record that describes the file, fol‐
1185 lowed by zero or more logical records that give the contents of the
1186 file. At the end of the archive file there shall be two 512-octet logi‐
1187 cal records filled with binary zeros, interpreted as an end-of-archive
1188 indicator.
1189
1190 The logical records may be grouped for physical I/O operations, as
1191 described under the -b blocksize and -x ustar options. Each group of
1192 logical records may be written with a single operation equivalent to
1193 the write() function. On magnetic tape, the result of this write shall
1194 be a single tape physical block. The last physical block shall always
1195 be the full size, so logical records after the two zero logical records
1196 may contain undefined data.
1197
1198 The header logical record shall be structured as shown in the following
1199 table. All lengths and offsets are in decimal.
1200
1201 Table: ustar Header Block
1202
1203 Field Name   Octet Offset   Length (in Octets)
1204 name         0              100
1205 mode         100            8
1206 uid          108            8
1207 gid          116            8
1208 size         124            12
1209 mtime        136            12
1210 chksum       148            8
1211 typeflag     156            1
1212 linkname     157            100
1213 magic        257            6
1214 version      263            2
1215 uname        265            32
1216 gname        297            32
1217 devmajor     329            8
1218 devminor     337            8
1219 prefix       345            155
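
The table translates directly into a list of (field, offset, length) entries. The following Python sketch, with hypothetical function names, slices a header logical record into raw fields and checks that the fields are contiguous; the fields occupy the first 500 octets of the 512-octet record.

```python
USTAR_FIELDS = [
    ("name", 0, 100), ("mode", 100, 8), ("uid", 108, 8), ("gid", 116, 8),
    ("size", 124, 12), ("mtime", 136, 12), ("chksum", 148, 8),
    ("typeflag", 156, 1), ("linkname", 157, 100), ("magic", 257, 6),
    ("version", 263, 2), ("uname", 265, 32), ("gname", 297, 32),
    ("devmajor", 329, 8), ("devminor", 337, 8), ("prefix", 345, 155),
]


def split_header(block):
    """Slice a 512-octet header logical record into its raw fields."""
    if len(block) != 512:
        raise ValueError("a header logical record is 512 octets")
    return {name: block[off:off + size] for name, off, size in USTAR_FIELDS}


def check_layout():
    """Verify that the fields are contiguous and return the octets they occupy."""
    expected = 0
    for name, off, size in USTAR_FIELDS:
        assert off == expected, name
        expected = off + size
    return expected


if __name__ == "__main__":
    print(check_layout())
```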
1220
1221 All characters in the header logical record shall be represented in the
1222 coded character set of the ISO/IEC 646:1991 standard. For maximum
1223 portability between implementations, names should be selected from
1224 characters represented by the portable filename character set as octets
1225 with the most significant bit zero. If an implementation supports the
1226 use of characters outside of slash and the portable filename character
1227 set in names for files, users, and groups, one or more implementation-
1228 defined encodings of these characters shall be provided for interchange
1229 purposes.
1230
1231 However, the pax utility shall never create filenames on the local sys‐
1232 tem that cannot be accessed via the procedures described in
1233 IEEE Std 1003.1-2001. If a filename is found on the medium that would
1234 create an invalid filename, it is implementation-defined whether the
1235 data from the file is stored on the file hierarchy and under what name
1236 it is stored. The pax utility may choose to ignore these files as long
1237 as it produces an error indicating that the file is being ignored.
1238
1239 Each field within the header logical record is contiguous; that is,
1240 there is no padding used. Each character on the archive medium shall be
1241 stored contiguously.
1242
1243 The fields magic, uname, and gname are character strings each termi‐
1244 nated by a NUL character. The fields name, linkname, and prefix are
1245 NUL-terminated character strings except when all characters in the
1246 array contain non-NUL characters including the last character. The ver‐
1247 sion field is two octets containing the characters "00" (zero-zero).
1248 The typeflag contains a single character. All other fields are leading
1249 zero-filled octal numbers using digits from the ISO/IEC 646:1991 stan‐
1250 dard IRV. Each numeric field is terminated by one or more <space> or
1251 NUL characters.
1252
1253 The name and the prefix fields shall produce the pathname of the file.
1254 A new pathname shall be formed, if prefix is not an empty string (its
1255 first character is not NUL), by concatenating prefix (up to the first
1256 NUL character), a slash character, and name; otherwise, name is used
1257 alone. In either case, name is terminated at the first NUL character.
1258 If prefix begins with a NUL character, it shall be ignored. In this
1259 manner, pathnames of at most 256 characters can be supported. If a
1260 pathname does not fit in the space provided, pax shall notify the user
1261 of the error, and shall not store any part of the file (header or
1262 data) on the medium.
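
The pathname rule can be sketched in Python as a pair of helper functions, one for reading and one for writing. Both names are hypothetical. The splitting strategy (first slash that satisfies both field limits) is only one of several valid choices and is an assumption of the example.

```python
def join_path(prefix, name):
    """Form the pathname from the raw prefix and name fields of a header."""
    name = name.split(b"\0", 1)[0]        # name ends at the first NUL
    prefix = prefix.split(b"\0", 1)[0]    # so does prefix
    return prefix + b"/" + name if prefix else name


def split_path(path):
    """Return (prefix, name) for a pathname, or raise ValueError if it does not fit."""
    if len(path) <= 100:
        return b"", path
    # Try each slash as the split point: prefix <= 155 octets, name <= 100.
    for i in range(1, len(path)):
        if path[i:i + 1] == b"/" and i <= 155 and 0 < len(path) - i - 1 <= 100:
            return path[:i], path[i + 1:]
    raise ValueError("pathname does not fit in the ustar name and prefix fields")


if __name__ == "__main__":
    long_path = b"d" * 150 + b"/" + b"f" * 50
    prefix, name = split_path(long_path)
    print(len(prefix), len(name), join_path(prefix, name) == long_path)
```

The upper bound of 256 characters is 155 octets of prefix, one slash, and 100 octets of name.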
1263
1264 The linkname field, described below, shall not use the prefix to pro‐
1265 duce a pathname. As such, a linkname is limited to 100 characters. If
1266 the name does not fit in the space provided, pax shall notify the user
1267 of the error, and shall not attempt to store the link on the medium.
1268
1269 The mode field provides 12 bits encoded in the ISO/IEC 646:1991 stan‐
1270 dard octal digit representation. The encoded bits shall represent the
1271 following values:
1272
1273 Table: ustar mode Field
1274
1275 Bit Value   IEEE Std 1003.1-2001 Bit   Description
1276 04000       S_ISUID                    Set UID on execution.
1277 02000       S_ISGID                    Set GID on execution.
1278 01000       <reserved>                 Reserved for future standardization.
1279 00400       S_IRUSR                    Read permission for file owner class.
1280 00200       S_IWUSR                    Write permission for file owner class.
1282 00100       S_IXUSR                    Execute/search permission for file owner class.
1284 00040       S_IRGRP                    Read permission for file group class.
1285 00020       S_IWGRP                    Write permission for file group class.
1287 00010       S_IXGRP                    Execute/search permission for file group class.
1289 00004       S_IROTH                    Read permission for file other class.
1290 00002       S_IWOTH                    Write permission for file other class.
1292 00001       S_IXOTH                    Execute/search permission for file other class.
1294
1295 When appropriate privilege is required to set one of these mode bits,
1296 and the user restoring the files from the archive does not have the
1297 appropriate privilege, the mode bits for which the user does not have
1298 appropriate privilege shall be ignored. Some of the mode bits in the
1299 archive format are not mentioned elsewhere in this volume of
1300 IEEE Std 1003.1-2001. If the implementation does not support those
1301 bits, they may be ignored.
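
A small Python sketch of decoding the mode field follows. The function name is hypothetical, the reserved 01000 bit is skipped, and an empty field is treated as zero, which is an assumption rather than a stated requirement.

```python
MODE_BITS = [
    (0o4000, "S_ISUID"), (0o2000, "S_ISGID"),
    (0o0400, "S_IRUSR"), (0o0200, "S_IWUSR"), (0o0100, "S_IXUSR"),
    (0o0040, "S_IRGRP"), (0o0020, "S_IWGRP"), (0o0010, "S_IXGRP"),
    (0o0004, "S_IROTH"), (0o0002, "S_IWOTH"), (0o0001, "S_IXOTH"),
]


def decode_mode(field):
    """Return the names of the mode bits set in a raw octal mode field."""
    # Numeric fields are terminated by one or more <space> or NUL characters.
    text = field.rstrip(b" \0") or b"0"
    value = int(text, 8)
    return [name for bit, name in MODE_BITS if value & bit]


if __name__ == "__main__":
    print(decode_mode(b"0000644\0"))
```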
1302
1303 The uid and gid fields are the user and group ID of the owner and group
1304 of the file, respectively.
1305
1306 The size field is the size of the file in octets. If the typeflag field
1307 is set to specify a file to be of type 1 (a link) or 2 (a symbolic
1308 link), the size field shall be specified as zero. If the typeflag field
1309 is set to specify a file of type 5 (directory), the size field shall be
1310 interpreted as described under the definition of that record type. No
1311 data logical records are stored for types 1, 2, or 5. If the typeflag
1312 field is set to 3 (character special file), 4 (block special file), or
1313 6 (FIFO), the meaning of the size field is unspecified by this volume
1314 of IEEE Std 1003.1-2001, and no data logical records shall be stored on
1315 the medium. Additionally, for type 6, the size field shall be ignored
1316 when reading. If the typeflag field is set to any other value, the
1317 number of logical records written following the header shall be
1318 (size+511)/512, ignoring any fraction in the result of the division.
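
In Python the record count is an integer division. The sketch below uses a hypothetical function name and applies only to the ustar rules of this section, where types 1 through 6 carry no data logical records.

```python
NO_DATA_TYPES = (b"1", b"2", b"3", b"4", b"5", b"6")


def data_records(size, typeflag=b"0"):
    """Number of 512-octet data logical records that follow a ustar header."""
    if typeflag in NO_DATA_TYPES:
        return 0
    return (size + 511) // 512      # round up to whole logical records


if __name__ == "__main__":
    for size in (0, 1, 512, 513):
        print(size, data_records(size))
```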
1319
1320 The mtime field shall be the modification time of the file at the time
1321 it was archived. It is the ISO/IEC 646:1991 standard representation of
1322 the octal value of the modification time obtained from the stat() func‐
1323 tion.
1324
1325 The chksum field shall be the ISO/IEC 646:1991 standard IRV representa‐
1326 tion of the octal value of the simple sum of all octets in the header
1327 logical record. Each octet in the header shall be treated as an
1328 unsigned value. These values shall be added to an unsigned integer,
1329 initialized to zero, the precision of which is not less than 17 bits.
1330 When calculating the checksum, the chksum field is treated as if it
1331 were all spaces.
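
The checksum rule can be sketched in Python as follows; the function name is hypothetical. The largest possible sum is 512 * 255 = 130560, which is below 2^17 = 131072 and so fits in the 17 bits required above.

```python
def ustar_checksum(block):
    """Sum all octets of a header logical record, counting chksum as spaces."""
    if len(block) != 512:
        raise ValueError("a header logical record is 512 octets")
    # The chksum field occupies octets 148 to 155 and is treated as 8 spaces.
    return sum(block[:148]) + 8 * 0x20 + sum(block[156:])


if __name__ == "__main__":
    print(ustar_checksum(bytes(512)))
```

Indexing a bytes object yields unsigned values from 0 to 255, which matches the requirement that each octet be treated as unsigned.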
1332
1333 The typeflag field specifies the type of file archived. If a particular
1334 implementation does not recognize the type, or the user does not have
1335 appropriate privilege to create that type, the file shall be extracted
1336 as if it were a regular file if the file type is defined to have a
1337 meaning for the size field that could cause data logical records to be
1338 written on the medium (see the previous description for size). If con‐
1339 version to a regular file occurs, the pax utility shall produce an
1340 error indicating that the conversion took place. All of the typeflag
1341 fields shall be coded in the ISO/IEC 646:1991 standard IRV:
1342
1343 0 Represents a regular file. For backwards-compatibility, a
1344 typeflag value of binary zero ('\0') should be recognized as
1345 meaning a regular file when extracting files from the archive.
1346 Archives written with this version of the archive file format
1347 create regular files with a typeflag value of the ISO/IEC 646:1991
1348 standard IRV '0'.
1349
1350 1 Represents a file linked to another file, of any type, previ‐
1351 ously archived. Such files are identified by each file having
1352 the same device and file serial number. The linked-to name is
1353 specified in the linkname field with a NUL-character terminator
1354 if it is less than 100 octets in length.
1355
1356 2 Represents a symbolic link. The contents of the symbolic link
1357 shall be stored in the linkname field.
1358
1359 3,4 Represent character special files and block special files
1360 respectively. In this case the devmajor and devminor fields
1361 shall contain information defining the device, the format of
1362 which is unspecified by this volume of IEEE Std 1003.1-2001.
1363 Implementations may map the device specifications to their own
1364 local specification or may ignore the entry.
1365
1366 5 Specifies a directory or subdirectory. On systems where disk
1367 allocation is performed on a directory basis, the size field
1368 shall contain the maximum number of octets (which may be rounded
1369 to the nearest disk block allocation unit) that the directory
1370 may hold. A size field of zero indicates no such limiting. Sys‐
1371 tems that do not support limiting in this manner should ignore
1372 the size field.
1373
1374 6 Specifies a FIFO special file. Note that the archiving of a FIFO
1375 file archives the existence of this file and not its contents.
1376
1377 7 Reserved to represent a file to which an implementation has
1378 associated some high-performance attribute. Implementations
1379 without such extensions should treat this file as a regular file
1380 (type 0).
1381
1382 A-Z The letters 'A' to 'Z', inclusive, are reserved for custom
1383 implementations. All other values are reserved for future ver‐
1384 sions of IEEE Std 1003.1-2001.
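
The typeflag values listed above can be summarized as a lookup table. The following is a minimal sketch, not part of the standard; the helper name describe_typeflag and the label strings are assumptions made for illustration.

```python
# Typeflag characters from the list above, mapped to short labels.
USTAR_TYPEFLAGS = {
    "0": "regular file",
    "1": "hard link",
    "2": "symbolic link",
    "3": "character special",
    "4": "block special",
    "5": "directory",
    "6": "FIFO",
    "7": "reserved (high-performance attribute)",
}


def describe_typeflag(flag):
    """Return a label for a one-octet typeflag (hypothetical helper)."""
    if len(flag) != 1:
        raise ValueError("typeflag must be exactly one octet")
    if flag == b"\0":
        # Backwards compatibility: binary zero means a regular file.
        return USTAR_TYPEFLAGS["0"]
    char = flag.decode("ascii", errors="replace")
    if char in USTAR_TYPEFLAGS:
        return USTAR_TYPEFLAGS[char]
    if "A" <= char <= "Z":
        return "implementation-specific"
    return "reserved for future use"
```

A reader that follows the recommendation for type 7 would treat that entry as a regular file when it has no matching extension.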
1385
1386
1387 Attempts to archive a socket using ustar interchange format shall pro‐
1388 duce a diagnostic message. Handling of other file types is implementa‐
1389 tion-defined.
1390
1391 The magic field is the specification that this archive was output in
1392 this archive format. If this field contains ustar (the five characters
1393 from the ISO/IEC 646:1991 standard IRV shown followed by NUL), the
1394 uname and gname fields shall contain the ISO/IEC 646:1991 standard IRV
1395 representation of the owner and group of the file, respectively (trun‐
1396 cated to fit, if necessary). When the file is restored by a privileged,
1397 protection-preserving version of the utility, the user and group data‐
1398 bases shall be scanned for these names. If found, the user and group
1399 IDs contained within these files shall be used rather than the values
1400 contained within the uid and gid fields.
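
The name lookup described above can be sketched as follows, assuming a POSIX system where Python's pwd and grp modules are available. The function name resolve_ids is hypothetical, and the sketch covers only the lookup itself, not the privilege check that decides whether it applies.

```python
import grp
import pwd


def resolve_ids(uname, gname, uid, gid):
    """Prefer IDs found by name in the local databases; else keep uid/gid."""
    try:
        uid = pwd.getpwnam(uname).pw_uid
    except KeyError:
        pass  # name unknown locally: keep the numeric uid from the header
    try:
        gid = grp.getgrnam(gname).gr_gid
    except KeyError:
        pass  # name unknown locally: keep the numeric gid from the header
    return uid, gid
```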
1401
1402 cpio Interchange Format
1403 The octet-oriented cpio archive format shall be a series of entries,
1404 each comprising a header that describes the file, the name of the file,
1405 and then the contents of the file.
1406
1407 An archive may be recorded as a series of fixed-size blocks of octets.
1408 This blocking shall be used only to make physical I/O more efficient.
1409 The last group of blocks shall always be at the full size.
1410
1411 For the octet-oriented cpio archive format, the individual entry infor‐
1412 mation shall be in the order indicated and described by the following
1413 table; see also the <cpio.h> header.
1414
Table: Octet-Oriented cpio Archive Entry

Header Field Name      Length (in Octets)   Interpreted as
c_magic                6                    Octal number
c_dev                  6                    Octal number
c_ino                  6                    Octal number
c_mode                 6                    Octal number
c_uid                  6                    Octal number
c_gid                  6                    Octal number
c_nlink                6                    Octal number
c_rdev                 6                    Octal number
c_mtime                11                   Octal number
c_namesize             6                    Octal number
c_filesize             11                   Octal number

Filename Field Name    Length               Interpreted as
c_name                 c_namesize           Pathname string

File Data Field Name   Length               Interpreted as
c_filedata             c_filesize           Data
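
The fixed-width layout in the table above adds up to a 76-octet header. The following is a minimal parsing sketch; the names CPIO_FIELDS, CpioHeader, and parse_cpio_header are assumptions made for illustration and do not come from <cpio.h>.

```python
from collections import namedtuple

# Field names and widths in octets, in the order given by the table.
CPIO_FIELDS = [
    ("c_magic", 6), ("c_dev", 6), ("c_ino", 6), ("c_mode", 6),
    ("c_uid", 6), ("c_gid", 6), ("c_nlink", 6), ("c_rdev", 6),
    ("c_mtime", 11), ("c_namesize", 6), ("c_filesize", 11),
]
CPIO_HEADER_SIZE = sum(width for _, width in CPIO_FIELDS)

CpioHeader = namedtuple("CpioHeader", [name for name, _ in CPIO_FIELDS])


def parse_cpio_header(raw):
    """Decode the octal header fields from the start of raw (bytes)."""
    if len(raw) < CPIO_HEADER_SIZE:
        raise ValueError("header is shorter than the fixed field layout")
    values = []
    offset = 0
    for _name, width in CPIO_FIELDS:
        # Every header field is a string of octal digits.
        values.append(int(raw[offset:offset + width].decode("ascii"), 8))
        offset += width
    return CpioHeader(*values)
```

The pathname (c_namesize octets) and the file data (c_filesize octets) follow immediately after the octets consumed here.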
1433
1434 cpio Header
For each file in the archive, a header as defined previously shall be
written. The information in the header fields is written as streams of
the ISO/IEC 646:1991 standard characters interpreted as octal numbers.
The octal numbers shall be extended to the necessary length by adding
ISO/IEC 646:1991 standard IRV zeros at the most-significant-digit end
of the number (that is, as leading zeros); the result is written to
the stream of octets most-significant digit first. The fields shall be
interpreted as follows:
1443
c_magic
       Identifies the archive as being a transportable archive by
       containing the identifying value "070707".
1447
1448 c_dev, c_ino
1449 Contains values that uniquely identify the file within the ar‐
1450 chive (that is, no files contain the same pair of c_dev and
1451 c_ino values unless they are links to the same file). The values
1452 shall be determined in an unspecified manner.
1453
1454 c_mode Contains the file type and access permissions as defined in the
1455 following table.
1456
Table: Values for cpio c_mode Field

File Permissions Name   Value     Indicates
C_IRUSR                 000400    Read by owner
C_IWUSR                 000200    Write by owner
C_IXUSR                 000100    Execute by owner
C_IRGRP                 000040    Read by group
C_IWGRP                 000020    Write by group
C_IXGRP                 000010    Execute by group
C_IROTH                 000004    Read by others
C_IWOTH                 000002    Write by others
C_IXOTH                 000001    Execute by others
C_ISUID                 004000    Set uid
C_ISGID                 002000    Set gid
C_ISVTX                 001000    Reserved

File Type Name          Value     Indicates
C_ISDIR                 040000    Directory
C_ISFIFO                010000    FIFO
C_ISREG                 0100000   Regular file
C_ISLNK                 0120000   Symbolic link
C_ISBLK                 060000    Block special file
C_ISCHR                 020000    Character special file
C_ISSOCK                0140000   Socket
C_ISCTG                 0110000   Reserved
1481
1482 Directories, FIFOs, symbolic links, and regular files shall be sup‐
1483 ported on a system conforming to this volume of IEEE Std 1003.1-2001;
1484 additional values defined previously are reserved for compatibility
1485 with existing systems. Additional file types may be supported; how‐
1486 ever, such files should not be written to archives intended to be
1487 transported to other systems.
1488
1489 c_uid Contains the user ID of the owner.
1490
1491 c_gid Contains the group ID of the group.
1492
1493 c_nlink
1494 Contains the number of links referencing the file at the time
1495 the archive was created.
1496
1497 c_rdev Contains implementation-defined information for character or
1498 block special files.
1499
1500 c_mtime
1501 Contains the latest time of modification of the file at the time
1502 the archive was created.
1503
1504 c_namesize
1505 Contains the length of the pathname, including the terminating
1506 NUL character.
1507
1508 c_filesize
1509 Contains the length of the file in octets. This shall be the
1510 length of the data section following the header structure.
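
The c_mode values tabulated above can be decoded into a file type and a permission string. This is a minimal sketch: the mask 0o170000 used to isolate the file-type bits is an assumption (it is not stated in the table, though it covers every file-type value listed), and the name describe_mode is hypothetical.

```python
# Assumption: the file-type values in the table all lie within this mask.
C_TYPE_MASK = 0o170000

FILE_TYPES = {
    0o040000: "directory",
    0o010000: "FIFO",
    0o100000: "regular file",
    0o120000: "symbolic link",
    0o060000: "block special",
    0o020000: "character special",
    0o140000: "socket",
    0o110000: "reserved",
}


def describe_mode(c_mode):
    """Return (file type label, 'rwxrwxrwx'-style string) for c_mode."""
    file_type = FILE_TYPES.get(c_mode & C_TYPE_MASK, "unknown")
    perms = ""
    # Owner, group, and others occupy three bits each, high to low.
    for shift in (6, 3, 0):
        bits = (c_mode >> shift) & 0o7
        perms += "r" if bits & 0o4 else "-"
        perms += "w" if bits & 0o2 else "-"
        perms += "x" if bits & 0o1 else "-"
    return file_type, perms
```

The set-uid, set-gid, and C_ISVTX bits are left out of the permission string to keep the sketch short.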
1511
1512
1513 cpio Filename
1514 The c_name field shall contain the pathname of the file. The length of
1515 this field in octets is the value of c_namesize.
1516
1517 If a filename is found on the medium that would create an invalid path‐
1518 name, it is implementation-defined whether the data from the file is
1519 stored on the file hierarchy and under what name it is stored.
1520
1521 All characters shall be represented in the ISO/IEC 646:1991 standard
1522 IRV. For maximum portability between implementations, names should be
1523 selected from characters represented by the portable filename character
1524 set as octets with the most significant bit zero. If an implementation
1525 supports the use of characters outside the portable filename character
1526 set in names for files, users, and groups, one or more implementation-
1527 defined encodings of these characters shall be provided for interchange
1528 purposes. However, the pax utility shall never create filenames on the
1529 local system that cannot be accessed via the procedures described pre‐
1530 viously in this volume of IEEE Std 1003.1-2001. If a filename is found
1531 on the medium that would create an invalid filename, it is implementa‐
1532 tion-defined whether the data from the file is stored on the local file
1533 system and under what name it is stored. The pax utility may choose to
1534 ignore these files as long as it produces an error indicating that the
1535 file is being ignored.
1536
1537 cpio File Data
1538 Following c_name, there shall be c_filesize octets of data. Interpreta‐
1539 tion of such data occurs in a manner dependent on the file. If c_file‐
1540 size is zero, no data shall be contained in c_filedata.
1541
1542 When restoring from an archive:
1543
1544 * If the user does not have the appropriate privilege to create a file
1545 of the specified type, pax shall ignore the entry and write an error
1546 message to standard error.
1547
1548 * Only regular files have data to be restored. Presuming a regular
1549 file meets any selection criteria that might be imposed on the for‐
1550 mat-reading utility by the user, such data shall be restored.
1551
1552 * If a user does not have appropriate privilege to set a particular
1553 mode flag, the flag shall be ignored. Some of the mode flags in the
1554 archive format are not mentioned elsewhere in this volume of
1555 IEEE Std 1003.1-2001. If the implementation does not support those
1556 flags, they may be ignored.
1557
1558 cpio Special Entries
1559 FIFO special files, directories, and the trailer shall be recorded with
1560 c_filesize equal to zero. For other special files, c_filesize is
1561 unspecified by this volume of IEEE Std 1003.1-2001. The header for the
1562 next file entry in the archive shall be written directly after the last
1563 octet of the file entry preceding it. A header denoting the filename
1564 TRAILER!!! shall indicate the end of the archive; the contents of
1565 octets in the last block of the archive following such a header are
1566 undefined.
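
The entry sequence described above (header, pathname, data, repeated until the TRAILER!!! entry) can be walked with a short loop. This is a minimal sketch under stated assumptions: build_entry and iter_entries are hypothetical helpers, and build_entry zero-fills every field not under discussion, which a real archive writer would not do.

```python
HEADER_SIZE = 76          # total width of the header fields
TRAILER = "TRAILER!!!"


def build_entry(name, data=b"", mode=0o100644):
    """Build one entry; unrelated fields are zero (illustration only)."""
    encoded = name.encode("ascii") + b"\0"   # c_namesize counts the NUL
    header = (
        "070707"                              # c_magic
        + "000000" * 2                        # c_dev, c_ino
        + format(mode, "06o")                 # c_mode
        + "000000" * 4                        # c_uid, c_gid, c_nlink, c_rdev
        + format(0, "011o")                   # c_mtime
        + format(len(encoded), "06o")         # c_namesize
        + format(len(data), "011o")           # c_filesize
    )
    return header.encode("ascii") + encoded + data


def iter_entries(archive):
    """Yield (name, data) pairs until the TRAILER!!! entry is reached."""
    offset = 0
    while True:
        header = archive[offset:offset + HEADER_SIZE]
        if len(header) < HEADER_SIZE or header[:6] != b"070707":
            raise ValueError("missing or malformed cpio header")
        namesize = int(header[59:65].decode("ascii"), 8)
        filesize = int(header[65:76].decode("ascii"), 8)
        offset += HEADER_SIZE
        # Drop the terminating NUL that c_namesize includes.
        name = archive[offset:offset + namesize - 1].decode("ascii")
        offset += namesize
        if name == TRAILER:
            return  # octets after the trailer are undefined; ignore them
        yield name, archive[offset:offset + filesize]
        offset += filesize
```

Each header starts directly after the last octet of the previous entry, so the loop needs no padding or alignment step.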
1567
EXIT STATUS
1569 The following exit values shall be returned:
1570
1571 0 All files were processed successfully.
1572
1573 >0 An error occurred.
1574
1575
CONSEQUENCES OF ERRORS
1577 If pax cannot create a file or a link when reading an archive or cannot
1578 find a file when writing an archive, or cannot preserve the user ID,
1579 group ID, or file mode when the -p option is specified, a diagnostic
1580 message shall be written to standard error and a non-zero exit status
1581 shall be returned, but processing shall continue. In the case where pax
1582 cannot create a link to a file, pax shall not, by default, create a
1583 second copy of the file.
1584
1585 If the extraction of a file from an archive is prematurely terminated
1586 by a signal or error, pax may have only partially extracted the file or
1587 (if the -n option was not specified) may have extracted a file of the
1588 same name as that specified by the user, but which is not the file the
1589 user wanted. Additionally, the file modes of extracted directories may
1590 have additional bits from the S_IRWXU mask set as well as incorrect
1591 modification and access times.
1592
1593 The following sections are informative.
1594
APPLICATION USAGE
1596 The -p (privileges) option was invented to reconcile differences
1597 between historical tar and cpio implementations. In particular, the two
1598 utilities use -m in diametrically opposed ways. The -p option also pro‐
1599 vides a consistent means of extending the ways in which future file
1600 attributes can be addressed, such as for enhanced security systems or
1601 high-performance files. Although it may seem complex, there are really
1602 two modes that are most commonly used:
1603
-p e   "Preserve everything". This would be used by the historical
       superuser, someone with all the appropriate privileges, to
       preserve all aspects of the files as they are recorded in the
       archive. The e flag is the sum of o and p, and other
       implementation-defined attributes.
1609
-p p   "Preserve" the file mode bits. This would be used by the user
       with regular privileges who wished to preserve aspects of the
       file other than the ownership. The file times are preserved by
       default, but two other flags are offered to disable these and
       use the time of extraction.
1615
1616
1617 The one pathname per line format of standard input precludes pathnames
1618 containing <newline>s. Although such pathnames violate the portable
1619 filename guidelines, they may exist and their presence may inhibit
1620 usage of pax within shell scripts. This problem is inherited from his‐
1621 torical archive programs. The problem can be avoided by listing file‐
1622 name arguments on the command line instead of on standard input.
1623
1624 It is almost certain that appropriate privileges are required for pax
1625 to accomplish parts of this volume of IEEE Std 1003.1-2001. Specifi‐
1626 cally, creating files of type block special or character special,
1627 restoring file access times unless the files are owned by the user (the
1628 -t option), or preserving file owner, group, and mode (the -p option)
1629 all probably require appropriate privileges.
1630
1631 In read mode, implementations are permitted to overwrite files when the
1632 archive has multiple members with the same name. This may fail if per‐
1633 missions on the first version of the file do not permit it to be over‐
1634 written.
1635
1636 The cpio and ustar formats can only support files up to 8589934592
1637 bytes (8 * 2^30) in size.
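
The figure quoted above follows from the width of the size fields. The c_filesize field holds 11 octal digits (see the cpio table); this sketch assumes that the ustar size field likewise holds 11 octal digits. The constant names are made up for illustration.

```python
SIZE_FIELD_DIGITS = 11                 # octal digits available for the size

# Number of distinct values an 11-digit octal field can hold.
field_capacity = 8 ** SIZE_FIELD_DIGITS

# Largest size that still fits in the field (all digits are 7).
largest_encodable = field_capacity - 1

print(field_capacity)                  # the figure quoted in the text
print(format(largest_encodable, "o"))  # eleven 7s
```

Under that assumption, the largest file size that can actually be written in the field is one octet less than the quoted figure.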
1638
EXAMPLES
1640 The following command:
1641
1642
1643 pax -w -f /dev/rmt/1m .
1644
copies the contents of the current directory to tape drive 1, medium
density (assuming historical System V device naming procedures; the
historical BSD device name would be /dev/rmt9).
1648
1649 The following commands:
1650
1651
    mkdir newdir
    pax -rw olddir newdir
1653
1654 copy the olddir directory hierarchy to newdir.
1655
1656
1657 pax -r -s ',^//*usr//*,,' -f a.pax
1658
1659 reads the archive a.pax, with all files rooted in /usr in the archive
1660 extracted relative to the current directory.
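
The effect of the substitution in the preceding example can be previewed outside pax. The sketch below applies the same pattern with Python's re module; this is an approximation, since -s uses ed-style regular expressions, but the two syntaxes agree for this particular pattern. The function name strip_usr_prefix is hypothetical.

```python
import re


def strip_usr_prefix(path):
    """Remove a leading /usr/ (with any repeated slashes) from path."""
    # One or more slashes, "usr", one or more slashes, anchored at the start.
    return re.sub(r"^//*usr//*", "", path, count=1)


for member in ("/usr/foo/bar", "//usr//lib/x", "/etc/passwd"):
    print(member, "->", strip_usr_prefix(member))
```

Members outside /usr do not match the pattern and keep their original names.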
1661
1662 Using the option:
1663
1664
1665 -o listopt="%M %(atime)T %(size)D %(name)s"
1666
1667 overrides the default output description in Standard Output and instead
1668 writes:
1669
1670
1671 -rw-rw--- Jan 12 15:53 1492 /usr/foo/bar
1672
1673 Using the options:
1674
1675
1676 -o listopt='%L\t%(size)D\n%.7' \
1677 -o listopt='(name)s\n%(ctime)T\n%T'
1678
1679 overrides the default output description in Standard Output and instead
1680 writes:
1681
1682
1683 /usr/foo/bar -> /tmp 1492
1684 /usr/fo
1685 Jan 12 1991
1686 Jan 31 15:53
1687
RATIONALE
1689 The pax utility was new for the ISO POSIX-2:1993 standard. It repre‐
1690 sents a peaceful compromise between advocates of the historical tar and
1691 cpio utilities.
1692
1693 A fundamental difference between cpio and tar was in the way directo‐
1694 ries were treated. The cpio utility did not treat directories differ‐
1695 ently from other files, and to select a directory and its contents
1696 required that each file in the hierarchy be explicitly specified. For
1697 tar, a directory matched every file in the file hierarchy it rooted.
1698
The pax utility offers both interfaces; by default, directories map
into the file hierarchy they root. The -d option causes pax to skip any
file not explicitly referenced, as cpio historically did. The tar-style
behavior was chosen as the default because it was believed that this
was the more common usage and because tar is the more commonly
available interface, as it was historically provided on both System V
and BSD implementations.
1706
1707 The data interchange format specification in this volume of
1708 IEEE Std 1003.1-2001 requires that processes with "appropriate privi‐
1709 leges" shall always restore the ownership and permissions of extracted
1710 files exactly as archived. If viewed from the historic equivalence
1711 between superuser and "appropriate privileges", there are two problems
1712 with this requirement. First, users running as superusers may unknow‐
1713 ingly set dangerous permissions on extracted files. Second, it is need‐
1714 lessly limiting, in that superusers cannot extract files and own them
1715 as superuser unless the archive was created by the superuser. (It
1716 should be noted that restoration of ownerships and permissions for the
1717 superuser, by default, is historical practice in cpio, but not in tar.)
1718 In order to avoid these two problems, the pax specification has an
1719 additional "privilege" mechanism, the -p option. Only a pax invocation
1720 with the privileges needed, and which has the -p option set using the e
1721 specification character, has the "appropriate privilege" to restore
1722 full ownership and permission information.
1723
1724 Note also that this volume of IEEE Std 1003.1-2001 requires that the
1725 file ownership and access permissions shall be set, on extraction, in
1726 the same fashion as the creat() function when provided with the mode
1727 stored in the archive. This means that the file creation mask of the
1728 user is applied to the file permissions.
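
The effect of the file creation mask on an archived mode can be worked through numerically. This is a minimal sketch; the function name effective_mode is an assumption, and it models only the masking step, not the ownership rules.

```python
def effective_mode(archived_mode, umask):
    """Permission bits that result when umask is applied to archived_mode."""
    # Bits set in the mask are cleared from the requested mode.
    return archived_mode & ~umask & 0o7777


print(format(effective_mode(0o666, 0o022), "03o"))
print(format(effective_mode(0o777, 0o077), "03o"))
```

With a mask of 022, a file archived as 666 is extracted as 644; with a mask of 077, a file archived as 777 is extracted as 700.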
1729
1730 Users should note that directories may be created by pax while extract‐
1731 ing files with permissions that are different from those that existed
1732 at the time the archive was created. When extracting sensitive informa‐
1733 tion into a directory hierarchy that no longer exists, users are
1734 encouraged to set their file creation mask appropriately to protect
1735 these files during extraction.
1736
1737 The table of contents output is written to standard output to facili‐
1738 tate pipeline processing.
1739
1740 An early proposal had hard links displaying for all pathnames. This was
1741 removed because it complicates the output of the case where -v is not
1742 specified and does not match historical cpio usage. The hard-link
1743 information is available in the -v display.
1744
1745 The description of the -l option allows implementations to make hard
1746 links to symbolic links. IEEE Std 1003.1-2001 does not specify any way
1747 to create a hard link to a symbolic link, but many implementations pro‐
1748 vide this capability as an extension. If there are hard links to sym‐
1749 bolic links when an archive is created, the implementation is required
1750 to archive the hard link in the archive (unless -H or -L is specified).
1751 When in read mode and in copy mode, implementations supporting hard
1752 links to symbolic links should use them when appropriate.
1753
The archive formats inherited from the POSIX.1-1990 standard have
certain restrictions that have been brought along from historical
usage. For example, there are restrictions on the length of pathnames
stored in the archive. When pax is used in copy (-rw) mode (copying
directory hierarchies), the ability to use extensions from the -x pax
format overcomes these restrictions.
1760
The default blocksize value of 5120 bytes for cpio was selected because
it is one of the standard block-size values for cpio, set when the -B
option is specified. (The other default block-size value for cpio is
512 bytes, and this was considered to be too small.) The default block
value of 10240 bytes for tar was selected because that is the standard
block-size value for BSD tar. The maximum block size of 32256 bytes
(2**15-512 bytes) is the largest multiple of 512 bytes that fits into a
signed 16-bit tape controller transfer register. There are known
limitations in some historical systems that would prevent larger blocks
from being accepted. Historical values were chosen to improve
compatibility with historical scripts using dd or similar utilities to
manipulate archives. Also, default block sizes for any file type other
than character special file have been deleted from this volume of
IEEE Std 1003.1-2001 as unimportant and not likely to affect the
structure of the resulting archive.
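
The maximum block size quoted above can be checked arithmetically. This is a minimal sketch; the function name largest_multiple is made up for illustration.

```python
def largest_multiple(limit, unit):
    """Largest multiple of unit that does not exceed limit."""
    return (limit // unit) * unit


SIGNED_16_MAX = 2 ** 15 - 1            # largest signed 16-bit value
max_block = largest_multiple(SIGNED_16_MAX, 512)

print(max_block)                       # matches 2**15 - 512
print(5120 % 512, 10240 % 512)         # both defaults are multiples of 512
```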
1776
1777 Implementations are permitted to modify the block-size value based on
1778 the archive format or the device to which the archive is being written.
1779 This is to provide implementations with the opportunity to take advan‐
1780 tage of special types of devices, and it should not be used without a
1781 great deal of consideration as it almost certainly decreases archive
1782 portability.
1783
1784 The intended use of the -n option was to permit extraction of one or
1785 more files from the archive without processing the entire archive. This
1786 was viewed by the standard developers as offering significant perfor‐
1787 mance advantages over historical implementations. The -n option in
1788 early proposals had three effects; the first was to cause special char‐
1789 acters in patterns to not be treated specially. The second was to cause
1790 only the first file that matched a pattern to be extracted. The third
1791 was to cause pax to write a diagnostic message to standard error when
1792 no file was found matching a specified pattern. Only the second behav‐
1793 ior is retained by this volume of IEEE Std 1003.1-2001, for many rea‐
1794 sons. First, it is in general not acceptable for a single option to
1795 have multiple effects. Second, the ability to make pattern matching
1796 characters act as normal characters is useful for parts of pax other
1797 than file extraction. Third, a finer degree of control over the spe‐
1798 cial characters is useful because users may wish to normalize only a
1799 single special character in a single filename. Fourth, given a more
1800 general escape mechanism, the previous behavior of the -n option can be
1801 easily obtained using the -s option or a sed script. Finally, writing
1802 a diagnostic message when a pattern specified by the user is unmatched
1803 by any file is useful behavior in all cases.
1804
In this version, the -n option was removed from the copy mode synopsis
of pax; it is inapplicable because there are no pattern operands
specified in this mode.
1808
IEEE Std 1003.1-2001 describes another method for copying subtrees, as
part of the cp utility. Both methods are historical practice: cp
provides a simpler, more intuitive interface, while pax offers a finer
granularity of control. Each provides functionality that the other
lacks; in particular, pax maintains the hard-link structure of the
hierarchy while cp does not. It is the intention of the standard
developers that the results be similar (using appropriate option
combinations in both utilities). The results are not required to be
identical; there seemed insufficient gain to applications to balance
the difficulty of implementations having to guarantee that the results
would be exactly identical.
1820
1821 A single archive may span more than one file. It is suggested that
1822 implementations provide informative messages to the user on standard
1823 error whenever the archive file is changed.
1824
1825 The -d option (do not create intermediate directories not listed in the
1826 archive) found in early proposals was originally provided as a comple‐
1827 ment to the historic -d option of cpio. It has been deleted.
1828
1829 The -s option in early proposals specified a subset of the substitution
1830 command from the ed utility. As there was no reason for only a subset
1831 to be supported, the -s option is now compatible with the current ed
1832 specification. Since the delimiter can be any non-null character, the
1833 following usage with single spaces is valid:
1834
1835
1836 pax -s " foo bar " ...
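
How the delimiter is taken from the first character of the -s argument can be sketched as follows. This is a simplified illustration, not the ed grammar: the hypothetical helper split_replstr does not handle a delimiter escaped with a backslash inside the pattern or replacement.

```python
def split_replstr(replstr):
    """Split an -s argument into (delimiter, old, new, flags)."""
    if not replstr:
        raise ValueError("empty substitution expression")
    delimiter = replstr[0]             # any non-null character
    parts = replstr[1:].split(delimiter)
    if len(parts) != 3:
        raise ValueError("expected <delim>old<delim>new<delim>[flags]")
    old, new, flags = parts
    return delimiter, old, new, flags


print(split_replstr(" foo bar "))
print(split_replstr(",^//*usr//*,,"))
```

In the first call the delimiter is a single space, so the expression replaces foo with bar.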
1837
1838 The -t description is worded so as to note that this may cause the
1839 access time update caused by some other activity (which occurs while
1840 the file is being read) to be overwritten.
1841
1842 The default behavior of pax with regard to file modification times is
1843 the same as historical implementations of tar. It is not the historical
1844 behavior of cpio.
1845
1846 Because the -i option uses /dev/tty, utilities without a controlling
1847 terminal are not able to use this option.
1848
1849 The -y option, found in early proposals, has been deleted because a
1850 line containing a single period for the -i option has equivalent func‐
1851 tionality. The special lines for the -i option (a single period and the
1852 empty line) are historical practice in cpio.
1853
1854 In early drafts, a -e charmap option was included to increase portabil‐
1855 ity of files between systems using different coded character sets. This
1856 option was omitted because it was apparent that consensus could not be
1857 formed for it. In this version, the use of UTF-8 should be an adequate
1858 substitute.
1859
The -k option was added to address international concerns about the
dangers involved in the character set transformations of -e (if the
target character set were different from the source, the filenames
might be transformed into names matching existing files) and also was
made more general to protect files transferred between file systems
with different {NAME_MAX} values (truncating a filename on a smaller
system might also inadvertently overwrite existing files). As stated,
it prevents any overwriting, even if the target file is older than the
source. This version adds more granularity of options to solve this
problem by introducing the -o invalid= option, specifically the UTF-8
action. (Note that an existing file that is named with a UTF-8 encoding
is still subject to overwriting in this case. The -k option closes that
loophole.)
1873
1874 Some of the file characteristics referenced in this volume of
1875 IEEE Std 1003.1-2001 might not be supported by some archive formats.
1876 For example, neither the tar nor cpio formats contain the file access
1877 time. For this reason, the e specification character has been provided,
1878 intended to cause all file characteristics specified in the archive to
1879 be retained.
1880
1881 It is required that extracted directories, by default, have their
1882 access and modification times and permissions set to the values speci‐
1883 fied in the archive. This has obvious problems in that the directories
1884 are almost certainly modified after being extracted and that directory
1885 permissions may not permit file creation. One possible solution is to
1886 create directories with the mode specified in the archive, as modified
1887 by the umask of the user, with sufficient permissions to allow file
1888 creation. After all files have been extracted, pax would then reset the
1889 access and modification times and permissions as necessary.
1890
1891 The list-mode formatting description borrows heavily from the one
1892 defined by the printf utility. However, since there is no separate op‐
1893 erand list to get conversion arguments, the format was extended to
1894 allow specifying the name of the conversion argument as part of the
1895 conversion specification.
1896
1897 The T conversion specifier allows time fields to be displayed in any of
1898 the date formats. Unlike the ls utility, pax does not adjust the format
1899 when the date is less than six months in the past. This makes parsing
1900 the output more predictable.
1901
1902 The D conversion specifier handles the ability to display the
1903 major/minor or file size, as with ls, by using %-8(size)D.
1904
1905 The L conversion specifier handles the ls display for symbolic links.
1906
1907 Conversion specifiers were added to generate existing known types used
1908 for ls.
1909
1910 pax Interchange Format
1911 The new POSIX data interchange format was developed primarily to sat‐
1912 isfy international concerns that the ustar and cpio formats did not
1913 provide for file, user, and group names encoded in characters outside a
1914 subset of the ISO/IEC 646:1991 standard. The standard developers real‐
1915 ized that this new POSIX data interchange format should be very exten‐
1916 sible because there were other requirements they foresaw in the near
1917 future:
1918
1919 * Support international character encodings and locale information
1920
1921 * Support security information (ACLs, and so on)
1922
1923 * Support future file types, such as realtime or contiguous files
1924
1925 * Include data areas for implementation use
1926
1927 * Support systems with words larger than 32 bits and timers with sub‐
1928 second granularity
1929
1930 The following were not goals for this format because these are better
1931 handled by separate utilities or are inappropriate for a portable for‐
1932 mat:
1933
1934 * Encryption
1935
1936 * Compression
1937
1938 * Data translation between locales and codesets
1939
1940 * inode storage
1941
1942 The format chosen to support the goals is an extension of the ustar
1943 format. Of the two formats previously available, only the ustar format
1944 was selected for extensions because:
1945
1946 * It was easier to extend in an upwards-compatible way. It offered
1947 version flags and header block type fields with room for future
1948 standardization. The cpio format, while possessing a more flexible
1949 file naming methodology, could not be extended without breaking some
1950 theoretical implementation or using a dummy filename that could be a
1951 legitimate filename.
1952
* Industry experience since the original "tar wars" fought in
  developing the ISO POSIX-1 standard has clearly been in favor of the
  ustar format, which is generally the default output format selected
  for pax implementations on new systems.
1957
1958 The new format was designed with one additional goal in mind: reason‐
1959 able behavior when an older tar or pax utility happened to read an ar‐
1960 chive. Since the POSIX.1-1990 standard mandated that a "format-reading
1961 utility" had to treat unrecognized typeflag values as regular files,
1962 this allowed the format to include all the extended information in a
1963 pseudo-regular file that preceded each real file. An option is given
1964 that allows the archive creator to set up reasonable names for these
1965 files on the older systems. Also, the normative text suggests that rea‐
1966 sonable file access values be used for this ustar header block. Making
1967 these header files inaccessible for convenient reading and deleting
1968 would not be reasonable. File permissions of 600 or 700 are suggested.
1969
1970 The ustar typeflag field was used to accommodate the additional func‐
1971 tionality of the new format rather than magic or version because the
1972 POSIX.1-1990 standard (and, by reference, the previous version of pax),
1973 mandated the behavior of the format-reading utility when it encountered
1974 an unknown typeflag, but was silent about the other two fields.
1975
1976 Early proposals of the first revision to IEEE Std 1003.1-2001 contained
1977 a proposed archive format that was based on compatibility with the
1978 standard for tape files (ISO 1001, similar to the format used histori‐
1979 cally on many mainframes and minicomputers). This format was overly
1980 complex and required considerable overhead in volume and header
1981 records. Furthermore, the standard developers felt that it would not be
1982 acceptable to the community of POSIX developers, so it was later
1983 changed to be a format more closely related to historical practice on
1984 POSIX systems.
1985
1986 The prefix and name split of pathnames in ustar was replaced by the
1987 single path extended header record for simplicity.
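
A path record can be built as sketched below. The sketch assumes the extended header record layout "<length> <keyword>=<value>\n" given in the Extended Header description of this manual page, where the decimal length counts the whole record, including the length digits themselves. The function name pax_record is hypothetical.

```python
def pax_record(keyword, value):
    """Build one extended header record as bytes (illustrative sketch)."""
    body = " {}={}\n".format(keyword, value).encode("utf-8")
    size = len(body)
    # The length field counts its own digits, so iterate until it is stable.
    total = size + len(str(size))
    while len(str(total)) + size != total:
        total = size + len(str(total))
    return str(total).encode("ascii") + body


print(pax_record("path", "foo"))
```

Because the length is self-referential, a record whose body is 9 octets long needs a two-digit length field and is 11 octets in total, not 10.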
1988
The concept of a global extended header (typeflag g) was controversial.
If this were applied to an archive being recorded on magnetic tape, a
few unreadable blocks at the beginning of the tape could be a serious
problem; a utility attempting to extract as many files as possible from
a damaged archive could lose a large percentage of file header
information in this case. However, if the archive were on a reliable
medium, such as a CD-ROM, the global extended header offers
considerable potential size reductions by eliminating redundant
information. Thus, the text warns against using the global method for
unreliable media and provides a method for implanting global
information in the extended header for each file, rather than in the
typeflag g records.
2001
2002 No facility for data translation or filtering on a per-file basis is
2003 included because the standard developers could not invent an interface
2004 that would allow this in an efficient manner. If a filter, such as
2005 encryption or compression, is to be applied to all the files, it is
2006 more efficient to apply the filter to the entire archive as a single
2007 file. The standard developers considered interfaces that would invoke a
2008 shell script for each file going into or out of the archive, but the
2009 system overhead in this approach was considered to be too high.
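
The whole-archive approach described above can be sketched as follows. This is an illustration only: it uses Python's standard tarfile module (which can write a pax-format archive) as a stand-in for the pax utility, and gzip as the example filter. The helper names build_archive, filter_archive, and unfilter_and_list are hypothetical.

```python
import gzip
import io
import tarfile


def build_archive(members: dict) -> bytes:
    """Write {name: bytes} members into an in-memory pax-format archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as archive:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def filter_archive(archive_bytes: bytes) -> bytes:
    """Apply one filter (gzip here) to the entire archive as a single file."""
    return gzip.compress(archive_bytes)


def unfilter_and_list(filtered: bytes) -> list:
    """Undo the filter, then list member names from the recovered archive."""
    with tarfile.open(fileobj=io.BytesIO(gzip.decompress(filtered))) as archive:
        return archive.getnames()
```

The filter runs once over the archive stream, so no per-member program has to be started and nothing about the filter needs to be recorded in the archive itself.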
2010
2011 One such approach would be to have filter= records that give a pathname
2012 for an executable. When the program is invoked, the file and archive
2013 would be open for standard input/output and all the header fields would
2014 be available as environment variables or command-line arguments. The
2015 standard developers did discuss such schemes, but they were omitted
2016 from IEEE Std 1003.1-2001 due to concerns about excessive overhead.
2017 Also, the program itself would need to be in the archive if it were to
2018 be used portably.
2019
2020 There is currently no portable means of identifying the character
2021 set(s) used for a file in the file system. Therefore, pax has not been
2022 given a mechanism to generate charset records automatically. The only
2023 portable means of doing this is for the user to write the archive using
the -o charset=string command-line option. This assumes that all of
2025 the files in the archive use the same encoding. The "implementation-
2026 defined" text is included to allow for a system that can identify the
2027 encodings used for each of its files.
2028
2029 The table of standards that accompanies the charset record description
2030 is acknowledged to be very limited. Only a limited number of character
2031 set standards is reasonable for maximal interchange. Any character set
2032 is, of course, possible by prior agreement. It was suggested that
2033 EBCDIC be listed, but it was omitted because it is not defined by a
2034 formal standard. Formal standards, and then only those with reasonably
2035 large followings, can be included here, simply as a matter of practi‐
2036 cality. The <value>s represent names of officially registered character
2037 sets in the format required by the ISO 2375:1985 standard.
2038
2039 The normal comma or <blank>-separated list rules are not followed in
2040 the case of keyword options to allow ease of argument parsing for
2041 getopts.
2042
Further information on character encodings is in pax Archive Character
Set Encoding/Decoding.
2045
2046 The standard developers have reserved keyword name space for vendor
2047 extensions. It is suggested that the format to be used is:
2048
2049
2050 VENDOR.keyword
2051
2052 where VENDOR is the name of the vendor or organization in all uppercase
2053 letters. It is further suggested that the keyword following the period
2054 be named differently than any of the standard keywords so that it could
2055 be used for future standardization, if appropriate, by omitting the
2056 VENDOR prefix.
2057
2058 The <length> field in the extended header record was included to make
it simpler to step through the records, even if a record contains a
format unknown to a particular pax implementation, with complex
interactions of special characters. It also provides a minor integrity checkpoint within
2062 the records to aid a program attempting to recover files from a damaged
2063 archive.
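
A minimal sketch of stepping through records by their <length> field follows. It assumes the record layout "%d %s=%s\n" (decimal length, keyword, value), where the length counts the whole record, including its own digits and the trailing <newline>. The function names iter_records and make_record are hypothetical.

```python
def make_record(keyword: str, value: str) -> bytes:
    """Build one extended header record; the length includes its own digits."""
    body = " {}={}\n".format(keyword, value).encode("utf-8")
    length = len(body)
    # The digit count of the length can change the length, so iterate
    # until the value is self-consistent.
    while len(str(length)) + len(body) != length:
        length = len(str(length)) + len(body)
    return str(length).encode("ascii") + body


def iter_records(data: bytes):
    """Yield (keyword, value) pairs, advancing by each record's length."""
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)
        length = int(data[pos:space].decode("ascii"))
        record = data[pos:pos + length]
        # Minor integrity checkpoint: the record must be complete and
        # must end with a newline exactly where the length says it does.
        if len(record) != length or not record.endswith(b"\n"):
            raise ValueError("damaged record at offset {}".format(pos))
        keyword, _, value = record[space - pos + 1:-1].partition(b"=")
        yield keyword.decode("utf-8"), value.decode("utf-8")
        pos += length
```

Because the reader advances by length rather than scanning for a delimiter, a value that itself contains '=' or <newline> characters does not confuse the stepping, even when the keyword is one the reader does not understand.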
2064
2065 There are no extended header versions of the devmajor and devminor
2066 fields because the unspecified format ustar header field should be suf‐
2067 ficient. If they are not, vendor-specific extended keywords (such as
2068 VENDOR.devmajor) should be used.
2069
2070 Device and i-number labeling of files was not adopted from cpio; files
2071 are interchanged strictly on a symbolic name basis, as in ustar.
2072
2073 Just as with the ustar format descriptions, the new format makes no
2074 special arrangements for multi-volume archives. Each of the pax archive
2075 types is assumed to be inside a single POSIX file and splitting that
2076 file over multiple volumes (diskettes, tape cartridges, and so on),
2077 processing their labels, and mounting each in the proper sequence are
2078 considered to be implementation details that cannot be described
2079 portably.
2080
The pax format is intended for interchange, not only for backup on a
single system (or family of systems). It is not as densely packed as might be
2083 possible for backup:
2084
2085 * It contains information as coded characters that could be coded in
2086 binary.
2087
2088 * It identifies extended records with name fields that could be omit‐
2089 ted in favor of a fixed-field layout.
2090
2091 * It translates names into a portable character set and identifies
2092 locale-related information, both of which are probably unnecessary
2093 for backup.
2094
2095 The requirements on restoring from an archive are slightly different
2096 from the historical wording, allowing for non-monolithic privilege to
2097 bring forward as much as possible. In particular, attributes such as
2098 "high performance file" might be broadly but not universally granted
2099 while set-user-ID or chown() might be much more restricted. There is
2100 no implication in IEEE Std 1003.1-2001 that the security information be
2101 honored after it is restored to the file hierarchy, in spite of what
2102 might be improperly inferred by the silence on that topic. That is a
2103 topic for another standard.
2104
2105 Links are recorded in the fashion described here because a link can be
2106 to any file type. It is desirable in general to be able to restore part
2107 of an archive selectively and restore all of those files completely. If
2108 the data is not associated with each link, it is not possible to do
2109 this. However, the data associated with a file can be large, and when
2110 selective restoration is not needed, this can be a significant burden.
2111 The archive is structured so that files that have no associated data
can always be restored by the name of any link, and
2113 the user may choose whether data is recorded with each instance of a
2114 file that contains data. The format permits mixing of both types of
2115 links in a single archive; this can be done for special needs, and pax
2116 is expected to interpret such archives on input properly, despite the
2117 fact that there is no pax option that would force this mixed case on
2118 output. (When -o linkdata is used, the output must contain the dupli‐
2119 cate data, but the implementation is free to include it or omit it when
2120 -o linkdata is not used.)
2121
2122 The time values are included as extended header records for those
2123 implementations needing more than the eleven octal digits allowed by
2124 the ustar format. Portable file timestamps cannot be negative. If pax
2125 encounters a file with a negative timestamp in copy or write mode, it
2126 can reject the file, substitute a non-negative timestamp, or generate a
non-portable timestamp with a leading '-'. Even though some implementations
can support finer file-time granularities than seconds, the
2129 normative text requires support only for seconds since the Epoch
2130 because the ISO POSIX-1 standard states them that way. The ustar format
2131 includes only mtime; the new format adds atime and ctime for symmetry.
2132 The atime access time restored to the file system will be affected by
2133 the -p a and -p e options. The ctime creation time (actually inode
2134 modification time) is described with "appropriate privilege" so that it
2135 can be ignored when writing to the file system. POSIX does not provide
2136 a portable means to change file creation time. Nothing is intended to
2137 prevent a non-portable implementation of pax from restoring the value.
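
The eleven-octal-digit limit mentioned above can be checked with a short sketch. The function name ustar_time_field is hypothetical. Returning None for a negative value is just one of the three choices the text allows (it stands for rejecting the file or falling back to an extended header record).

```python
USTAR_TIME_DIGITS = 11  # "eleven octal digits", as stated in the text above


def ustar_time_field(seconds: int, digits: int = USTAR_TIME_DIGITS):
    """Return the zero-padded octal text for the field, or None if the
    value is negative or needs more octal digits than the field holds."""
    if seconds < 0 or seconds >= 8 ** digits:
        return None
    return format(seconds, "0{}o".format(digits))
```

The largest value that fits is 8**11 - 1 = 8589934591 seconds since the Epoch; anything beyond that needs the mtime extended header record.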
2138
2139 The gid, size, and uid extended header records were included to allow
2140 expansion beyond the sizes specified in the regular tar header. New
2141 file system architectures are emerging that will exhaust the 12-digit
2142 size field. There are probably not many systems requiring more than 8
2143 digits for user and group IDs, but the extended header values were
2144 included for completeness, allowing overrides for all of the decimal
2145 values in the tar header.
2146
2147 The standard developers intended to describe the effective results of
2148 pax with regard to file ownerships and permissions; implementations are
2149 not restricted in timing or sequencing the restoration of such, pro‐
2150 vided the results are as specified.
2151
Much of the text describing the extended headers refers to use in
"write or copy modes". The copy mode references are due to the normative
2154 text: "The effect of the copy shall be as if the copied files were
2155 written to an archive file and then subsequently extracted ...". There
2156 is certainly no way to test whether pax is actually generating the
2157 extended headers in copy mode, but the effects must be as if it had.
2158
2159 pax Archive Character Set Encoding/Decoding
2160 There is a need to exchange archives of files between systems of dif‐
2161 ferent native codesets. Filenames, group names, and user names must be
2162 preserved to the fullest extent possible when an archive is read on the
2163 receiving platform. Translation of the contents of files is not within
2164 the scope of the pax utility.
2165
2166 There will also be the need to represent characters that are not avail‐
2167 able on the receiving platform. These unsupported characters cannot be
2168 automatically folded to the local set of characters due to the chance
of collisions. This could result in overwriting previously extracted
2170 files from the archive or pre-existing files on the system.
2171
2172 For these reasons, the codeset used to represent characters within the
2173 extended header records of the pax archive must be sufficiently rich to
2174 handle all commonly used character sets. The fields requiring transla‐
2175 tion include, at a minimum, filenames, user names, group names, and
2176 link pathnames. Implementations may wish to have localized extended
2177 keywords that use non-portable characters.
2178
2179 The standard developers considered the following options:
2180
2181 * The archive creator specifies the well-defined name of the source
2182 codeset. The receiver must then recognize the codeset name and per‐
2183 form the appropriate translations to the destination codeset.
2184
2185 * The archive creator includes within the archive the character map‐
2186 ping table for the source codeset used to encode extended header
2187 records. The receiver must then read the character mapping table and
2188 perform the appropriate translations to the destination codeset.
2189
2190 * The archive creator translates the extended header records in the
2191 source codeset into a canonical form. The receiver must then perform
2192 the appropriate translations to the destination codeset.
2193
2194 The approach that incorporates the name of the source codeset poses the
2195 problem of codeset name registration, and makes the archive useless to
2196 pax archive decoders that do not recognize that codeset.
2197
2198 Because parts of an archive may be corrupted, the standard developers
2199 felt that including the character map of the source codeset was too
2200 fragile. The loss of this one key component could result in making the
2201 entire archive useless. (The difference between this and the global
extended header decision was that the latter has a workaround, duplicating
extended header records on unreliable media, but this would be too
2204 burdensome for large character set maps.)
2205
2206 Both of the above approaches also put an undue burden on the pax ar‐
2207 chive receiver to handle the cross-product of all source and destina‐
2208 tion codesets.
2209
2210 To simplify the translation from the source codeset to the canonical
2211 form and from the canonical form to the destination codeset, the stan‐
2212 dard developers decided that the internal representation should be a
2213 stateless encoding. A stateless encoding is one where each codepoint
2214 has the same meaning, without regard to the decoder being in a specific
2215 state. An example of a stateful encoding would be the Japanese Shift-
2216 JIS; an example of a stateless encoding would be the ISO/IEC 646:1991
2217 standard (equivalent to 7-bit ASCII).
2218
2219 For these reasons, the standard developers decided to adopt a canonical
2220 format for the representation of file information strings. The obvious,
2221 well-endorsed candidate is the ISO/IEC 10646-1:2000 standard (based in
2222 part on Unicode), which can be used to represent the characters of vir‐
2223 tually all standardized character sets. The standard developers ini‐
2224 tially agreed upon using UCS2 (16-bit Unicode) as the internal repre‐
2225 sentation. This repertoire of characters provides a sufficiently rich
2226 set to represent all commonly-used codesets.
2227
2228 However, the standard developers found that the 16-bit Unicode repre‐
2229 sentation had some problems. It forced the issue of standardizing byte
2230 ordering. The 2-byte length of each character made the extended header
2231 records twice as long for the case of strings coded entirely from his‐
2232 torical 7-bit ASCII. For these reasons, the standard developers chose
2233 the UTF-8 defined in the ISO/IEC 10646-1:2000 standard. This multi-byte
2234 representation encodes UCS2 or UCS4 characters reliably and determinis‐
2235 tically, eliminating the need for a canonical byte ordering. In addi‐
2236 tion, NUL octets and other characters possibly confusing to POSIX file
2237 systems do not appear, except to represent themselves. It was realized
2238 that certain national codesets take up more space after the encoding,
2239 due to their placement within the UCS range; it was felt that the use‐
2240 fulness of the encoding of the names outweighs the disadvantage of size
2241 increase for file, user, and group names.
2242
2243 The encoding of UTF-8 is as follows:
2244
2245
2246 UCS4 Hex Encoding UTF-8 Binary Encoding
2247
2248
2249 00000000-0000007F 0xxxxxxx
2250 00000080-000007FF 110xxxxx 10xxxxxx
2251 00000800-0000FFFF 1110xxxx 10xxxxxx 10xxxxxx
2252 00010000-001FFFFF 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
2253 00200000-03FFFFFF 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
2254 04000000-7FFFFFFF 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
2255
2256 where each 'x' represents a bit value from the character being trans‐
2257 lated.
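
The table above can be turned into a small encoder. This is a sketch rather than a reference implementation: the names ROWS and encode_ucs4 are hypothetical, and no checks are made for surrogates or other values a stricter encoder might reject.

```python
# One entry per table row:
# (highest UCS4 value in the row, lead-octet marker bits, continuation octets)
ROWS = [
    (0x0000007F, 0x00, 0),  # 0xxxxxxx
    (0x000007FF, 0xC0, 1),  # 110xxxxx 10xxxxxx
    (0x0000FFFF, 0xE0, 2),  # 1110xxxx + 2 continuation octets
    (0x001FFFFF, 0xF0, 3),  # 11110xxx + 3 continuation octets
    (0x03FFFFFF, 0xF8, 4),  # 111110xx + 4 continuation octets
    (0x7FFFFFFF, 0xFC, 5),  # 1111110x + 5 continuation octets
]


def encode_ucs4(codepoint: int) -> bytes:
    """Encode one UCS4 value into the octet sequence given by the table."""
    if codepoint < 0:
        raise ValueError("negative code point")
    for upper, marker, ntail in ROWS:
        if codepoint <= upper:
            tail = []
            for _ in range(ntail):
                # Each continuation octet carries the next 6 low-order bits.
                tail.append(0x80 | (codepoint & 0x3F))
                codepoint >>= 6
            # The remaining high-order bits go into the lead octet.
            return bytes([marker | codepoint] + tail[::-1])
    raise ValueError("value exceeds 0x7FFFFFFF")
```

For values up to 0x10FFFF (surrogates excluded) the result matches what Python's own str.encode("utf-8") produces. The 5-octet and 6-octet rows exist only in this older form of the table; later definitions of UTF-8 stop at four octets.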
2258
2259 ustar Interchange Format
2260 The description of the ustar format reflects numerous enhancements over
2261 pre-1988 versions of the historical tar utility. The goal of these
2262 changes was not only to provide the functional enhancements desired,
2263 but also to retain compatibility between new and old versions. This
2264 compatibility has been retained. Archives written using the old ar‐
2265 chive format are compatible with the new format.
2266
2267 Implementors should be aware that the previous file format did not
2268 include a mechanism to archive directory type files. For this reason,
2269 the convention of using a filename ending with slash was adopted to
2270 specify a directory on the archive.
2271
The total size of the name and prefix fields has been set to meet the
2273 minimum requirements for {PATH_MAX}. If a pathname will fit within the
2274 name field, it is recommended that the pathname be stored there without
2275 the use of the prefix field. Although the name field is known to be too
2276 small to contain {PATH_MAX} characters, the value was not changed in
2277 this version of the archive file format to retain backwards-compatibil‐
2278 ity, and instead the prefix was introduced. Also, because of the ear‐
2279 lier version of the format, there is no way to remove the restriction
2280 on the linkname field being limited in size to just that of the name
2281 field.
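
A sketch of the recommended use of the two fields follows. It assumes the ustar field sizes of 100 octets for name and 155 octets for prefix, with the split made at a <slash> that is itself not stored. The function name split_ustar_path is hypothetical.

```python
NAME_SIZE = 100    # assumed size of the ustar name field, in octets
PREFIX_SIZE = 155  # assumed size of the ustar prefix field, in octets


def split_ustar_path(path: bytes):
    """Return (prefix, name) for a ustar header, or None if no split fits."""
    if len(path) <= NAME_SIZE:
        # Recommended case: store the whole pathname in name, prefix unused.
        return b"", path
    start = 0
    while True:
        cut = path.find(b"/", start)
        if cut == -1:
            return None  # no slash leaves both parts within their fields
        prefix, name = path[:cut], path[cut + 1:]
        if 0 < len(prefix) <= PREFIX_SIZE and 0 < len(name) <= NAME_SIZE:
            return prefix, name
        start = cut + 1
```

A result of None marks a pathname that the ustar header alone cannot carry, which is the case the path extended header record of the pax format addresses.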
2282
2283 The size field is required to be meaningful in all implementation
2284 extensions, although it could be zero. This is required so that the
2285 data blocks can always be properly counted.
2286
It is suggested that if device special files need to be represented
but cannot be represented in the standard format, one of the
extension types (A-Z) be used, and that the additional information
for the special file be represented as data and be reflected in the
size field.
2292
2293 Attempting to restore a special file type, where it is converted to
2294 ordinary data and conflicts with an existing filename, need not be spe‐
2295 cially detected by the utility. If run as an ordinary user, pax should
2296 not be able to overwrite the entries in, for example, /dev in any case
2297 (whether the file is converted to another type or not). If run as a
2298 privileged user, it should be able to do so, and it would be considered
2299 a bug if it did not. The same is true of ordinary data files and simi‐
2300 larly named special files; it is impossible to anticipate the needs of
2301 the user (who could really intend to overwrite the file), so the behav‐
2302 ior should be predictable (and thus regular) and rely on the protection
2303 system as required.
2304
2305 The value 7 in the typeflag field is intended to define how contiguous
2306 files can be stored in a ustar archive. IEEE Std 1003.1-2001 does not
2307 require the contiguous file extension, but does define a standard way
2308 of archiving such files so that all conforming systems can interpret
2309 these file types in a meaningful and consistent manner. On a system
2310 that does not support extended file types, the pax utility should do
2311 the best it can with the file and go on to the next.
2312
2313 The file protection modes are those conventionally used by the ls util‐
2314 ity. This is extended beyond the usage in the ISO POSIX-2 standard to
2315 support the "shared text" or "sticky" bit. It is intended that the con‐
2316 formance document should not document anything beyond the existence of
2317 and support of such a mode. Further extensions are expected to these
2318 bits, particularly with overloading the set-user-ID and set-group-ID
2319 flags.
2320
2321 cpio Interchange Format
2322 The reference to appropriate privilege in the cpio format refers to an
2323 error on standard output; the ustar format does not make comparable
2324 statements.
2325
2326 The model for this format was the historical System V cpio -c data
2327 interchange format. This model documents the portable version of the
2328 cpio format and not the binary version. It has the flexibility to
2329 transfer data of any type described within IEEE Std 1003.1-2001, yet is
2330 extensible to transfer data types specific to extensions beyond
2331 IEEE Std 1003.1-2001 (for example, contiguous files). Because it
2332 describes existing practice, there is no question of maintaining
2333 upwards-compatibility.
2334
2335 cpio Header
2336 There has been some concern that the size of the c_ino field of the
2337 header is too small to handle those systems that have very large inode
2338 numbers. However, the c_ino field in the header is used strictly as a
2339 hard-link resolution mechanism for archives. It is not necessarily the
2340 same value as the inode number of the file in the location from which
2341 that file is extracted.
2342
2343 The name c_magic is based on historical usage.
2344
2345 cpio Filename
2346 For most historical implementations of the cpio utility, {PATH_MAX}
2347 octets can be used to describe the pathname without the addition of any
2348 other header fields (the NUL character would be included in this
2349 count). {PATH_MAX} is the minimum value for pathname size, documented
2350 as 256 bytes. However, an implementation may use c_namesize to deter‐
2351 mine the exact length of the pathname. With the current description of
2352 the <cpio.h> header, this pathname size can be as large as a number
2353 that is described in six octal digits.
2354
2355 Two values are documented under the c_mode field values to provide for
2356 extensibility for known file types:
2357
2358 0110 000
2359 Reserved for contiguous files. The implementation may treat the
2360 rest of the information for this archive like a regular file.
2361 If this file type is undefined, the implementation may create
2362 the file as a regular file.
2363
2364
2365 This provides for extensibility of the cpio format while allowing for
2366 the ability to read old archives. Files of an unknown type may be read
2367 as "regular files" on some implementations. On a system that does not
2368 support extended file types, the pax utility should do the best it can
2369 with the file and go on to the next.
2370
2372 None.
2373
2375 Shell Command Language, cp, ed, getopts, ls, printf(), the Base Defini‐
2376 tions volume of IEEE Std 1003.1-2001, <cpio.h>, the System Interfaces
2377 volume of IEEE Std 1003.1-2001, chown(), creat(), mkdir(), mkfifo(),
2378 stat(), utime(), write()
2379
2381 Portions of this text are reprinted and reproduced in electronic form
2382 from IEEE Std 1003.1, 2003 Edition, Standard for Information Technology
2383 -- Portable Operating System Interface (POSIX), The Open Group Base
2384 Specifications Issue 6, Copyright (C) 2001-2003 by the Institute of
2385 Electrical and Electronics Engineers, Inc and The Open Group. In the
2386 event of any discrepancy between this version and the original IEEE and
2387 The Open Group Standard, the original IEEE and The Open Group Standard
2388 is the referee document. The original Standard can be obtained online
2389 at http://www.opengroup.org/unix/online.html .
2390
2391
2392
2393IEEE/The Open Group 2003 PAX(1P)