PAX(P) POSIX Programmer's Manual PAX(P)

NAME
pax - portable archive interchange

SYNOPSIS
pax [-cdnv][-H|-L][-f archive][-s replstr]...[pattern...]

pax -r[-cdiknuv][-H|-L][-f archive][-o options]...[-p string]...
    [-s replstr]...[pattern...]

pax -w[-dituvX][-H|-L][-b blocksize][[-a][-f archive]][-o options]...
    [-s replstr]...[-x format][file...]

pax -r -w[-diklntuvX][-H|-L][-p string]...[-s replstr]...
    [file...] directory

DESCRIPTION
The pax utility shall read, write, and write lists of the members of
archive files and copy directory hierarchies. A variety of archive
formats shall be supported; see the -x format option.

The action to be taken depends on the presence of the -r and -w
options. The four combinations of -r and -w are referred to as the four
modes of operation: list, read, write, and copy modes, corresponding
respectively to the four forms shown in the SYNOPSIS section.

list In list mode (when neither -r nor -w is specified), pax shall
write the names of the members of the archive file read from the
standard input, with pathnames matching the specified patterns,
to standard output. If a named file is of type directory, the
file hierarchy rooted at that file shall be listed as well.

read In read mode (when -r is specified, but -w is not), pax shall
extract the members of the archive file read from the standard
input, with pathnames matching the specified patterns. If an
extracted file is of type directory, the file hierarchy rooted
at that file shall be extracted as well. The extracted files
shall be created performing pathname resolution with the directory
in which pax was invoked as the current working directory.

If an attempt is made to extract a directory when the directory already
exists, this shall not be considered an error. If an attempt is made to
extract a FIFO when the FIFO already exists, this shall not be
considered an error.

The ownership, access, and modification times, and file mode of the
restored files are discussed under the -p option.

write In write mode (when -w is specified, but -r is not), pax shall
write the contents of the file operands to the standard output
in an archive format. If no file operands are specified, a list
of files to copy, one per line, shall be read from the standard
input. A file of type directory shall include all of the files
in the file hierarchy rooted at the file.

copy In copy mode (when both -r and -w are specified), pax shall copy
the file operands to the destination directory.

If no file operands are specified, a list of files to copy, one per
line, shall be read from the standard input. A file of type directory
shall include all of the files in the file hierarchy rooted at the
file.

The effect of the copy shall be as if the copied files were written to
an archive file and then subsequently extracted, except that there may
be hard links between the original and the copied files. If the
destination directory is a subdirectory of one of the files to be copied,
the results are unspecified. If the destination directory is a file of
a type not defined by the System Interfaces volume of
IEEE Std 1003.1-2001, the results are implementation-defined;
otherwise, it shall be an error for the file named by the directory operand
not to exist, not be writable by the user, or not be a file of type
directory.
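
The four modes can be exercised with a short script. This is only a sketch: the scratch directory, the file names, and the archive name demo.tar are hypothetical, and the script skips the pax calls when no pax binary is installed.

```shell
# Scratch layout (hypothetical names, created only for this demonstration).
work=$(mktemp -d) || exit 1
mkdir -p "$work/src/docs" "$work/out" "$work/dest"
echo hello > "$work/src/docs/a.txt"

if command -v pax >/dev/null 2>&1; then
    # write mode: -w, with -f naming the archive instead of standard output
    (cd "$work" && pax -w -x ustar -f demo.tar src)
    # list mode: neither -r nor -w; member names go to standard output
    listing=$(cd "$work" && pax -f demo.tar)
    # read mode: -r extracts relative to the current working directory
    (cd "$work/out" && pax -r -f ../demo.tar)
    # copy mode: -r -w copies the file operands into the destination directory
    (cd "$work" && pax -r -w src dest)
    demo_status=ran
else
    listing=
    demo_status=skipped
fi
```

After a successful run, the same file should exist under both out/src/docs and dest/src/docs inside the scratch directory.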

In read or copy modes, if intermediate directories are necessary to
extract an archive member, pax shall perform actions equivalent to the
mkdir() function defined in the System Interfaces volume of
IEEE Std 1003.1-2001, called with the following arguments:

* The intermediate directory used as the path argument

* The value of the bitwise-inclusive OR of S_IRWXU, S_IRWXG, and
  S_IRWXO as the mode argument
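
The mode argument above works out to octal 777. As with any mkdir() call, the process file mode creation mask still applies; the mask value 022 below is an assumed example, not something the manual prescribes.

```shell
# S_IRWXU | S_IRWXG | S_IRWXO, using the octal values POSIX assigns to them.
requested=$(( 0700 | 0070 | 0007 ))
# Assumed umask of 022: bits set in the mask are cleared from the request.
effective=$(( requested & ~0022 ))

printf 'requested mode: %o\n' "$requested"   # prints 777
printf 'after umask:    %o\n' "$effective"   # prints 755
```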

If any specified pattern or file operands are not matched by at least
one file or archive member, pax shall write a diagnostic message to
standard error for each one that did not match and exit with a non-zero
exit status.

The archive formats described in the EXTENDED DESCRIPTION section shall
be automatically detected on input. The default output archive format
shall be implementation-defined.

A single archive can span multiple files. The pax utility shall
determine, in an implementation-defined manner, what file to read or write
as the next file.

If the selected archive format supports the specification of linked
files, it shall be an error if these files cannot be linked when the
archive is extracted. For archive formats that do not store file
contents with each name that causes a hard link, if the file that contains
the data is not extracted during this pax session, either the data
shall be restored from the original file, or a diagnostic message shall
be displayed with the name of a file that can be used to extract the
data. In traversing directories, pax shall detect infinite loops; that
is, entering a previously visited directory that is an ancestor of the
last file visited. When it detects an infinite loop, pax shall write a
diagnostic message to standard error and shall terminate.

OPTIONS
The pax utility shall conform to the Base Definitions volume of
IEEE Std 1003.1-2001, Section 12.2, Utility Syntax Guidelines, except
that the order of presentation of the -o, -p, and -s options is
significant.

The following options shall be supported:

-r Read an archive file from standard input.

-w Write files to the standard output in the specified archive format.

-a Append files to the end of the archive. It is implementation-defined
which devices on the system support appending. Additional
file formats unspecified by this volume of
IEEE Std 1003.1-2001 may impose restrictions on appending.

-b blocksize
Block the output at a positive decimal integer number of bytes
per write to the archive file. Devices and archive formats may
impose restrictions on blocking. Blocking shall be automatically
determined on input. Conforming applications shall not specify a
blocksize value larger than 32256. Default blocking when creating
archives depends on the archive format. (See the -x option
below.)

-c Match all file or archive members except those specified by the
pattern or file operands.

-d Cause files of type directory being copied or archived or archive
members of type directory being extracted or listed to
match only the file or archive member itself and not the file
hierarchy rooted at the file.

-f archive
Specify the pathname of the input or output archive, overriding
the default standard input (in list or read modes) or standard
output (write mode).

-H If a symbolic link referencing a file of type directory is specified
on the command line, pax shall archive the file hierarchy
rooted in the file referenced by the link, using the name of the
link as the root of the file hierarchy. Otherwise, if a symbolic
link referencing a file of any other file type which pax can
normally archive is specified on the command line, then pax
shall archive the file referenced by the link, using the name of
the link. The default behavior shall be to archive the symbolic
link itself.

-i Interactively rename files or archive members. For each archive
member matching a pattern operand or file matching a file operand,
a prompt shall be written to the file /dev/tty. The prompt
shall contain the name of the file or archive member, but the
format is otherwise unspecified. A line shall then be read from
/dev/tty. If this line is blank, the file or archive member
shall be skipped. If this line consists of a single period, the
file or archive member shall be processed with no modification
to its name. Otherwise, its name shall be replaced with the contents
of the line. The pax utility shall immediately exit with a
non-zero exit status if end-of-file is encountered when reading
a response or if /dev/tty cannot be opened for reading and writing.

The results of extracting a hard link to a file that has been renamed
during extraction are unspecified.

-k Prevent the overwriting of existing files.

-l (The letter ell.) In copy mode, hard links shall be made between
the source and destination file hierarchies whenever possible.
If specified in conjunction with -H or -L, when a symbolic link
is encountered, the hard link created in the destination file
hierarchy shall be to the file referenced by the symbolic link.
If specified when neither -H nor -L is specified, when a symbolic
link is encountered, the implementation shall create a
hard link to the symbolic link in the source file hierarchy or
copy the symbolic link to the destination.

-L If a symbolic link referencing a file of type directory is specified
on the command line or encountered during the traversal of
a file hierarchy, pax shall archive the file hierarchy rooted in
the file referenced by the link, using the name of the link as
the root of the file hierarchy. Otherwise, if a symbolic link
referencing a file of any other file type which pax can normally
archive is specified on the command line or encountered during
the traversal of a file hierarchy, pax shall archive the file
referenced by the link, using the name of the link. The default
behavior shall be to archive the symbolic link itself.

-n Select the first archive member that matches each pattern operand.
No more than one archive member shall be matched for each
pattern (although members of type directory shall still match
the file hierarchy rooted at that file).

-o options
Provide information to the implementation to modify the algorithm
for extracting or writing files. The value of options
shall consist of one or more comma-separated keywords of the
form:

    keyword[[:]=value][,keyword[[:]=value], ...]

Some keywords apply only to certain file formats, as indicated with
each description. Use of keywords that are inapplicable to the file
format being processed produces undefined results.

Keywords in the options argument shall be a string that would be a
valid portable filename as described in the Base Definitions volume of
IEEE Std 1003.1-2001, Section 3.276, Portable Filename Character Set.

Note:
Keywords are not expected to be filenames, merely to follow the
same character composition rules as portable filenames.

Keywords can be preceded with white space. The value field shall
consist of zero or more characters; within value, the application shall
precede any literal comma with a backslash, which shall be ignored, but
preserves the comma as part of value. A comma as the final character,
or a comma followed solely by white space as the final characters, in
options shall be ignored. Multiple -o options can be specified; if
keywords given to these multiple -o options conflict, the keywords and
values appearing later in command line sequence shall take precedence
and the earlier shall be silently ignored. The following keyword values
of options shall be supported for the file formats as indicated:

delete=pattern

(Applicable only to the -x pax format.) When used in write or
copy mode, pax shall omit from extended header records that it
produces any keywords matching the string pattern. When used in
read or list mode, pax shall ignore any keywords matching the
string pattern in the extended header records. In both cases,
matching shall be performed using the pattern matching notation
described in Patterns Matching a Single Character and Patterns
Matching Multiple Characters. For example:

    -o delete=security.*

would suppress security-related information. See pax Extended
Header for extended header record keyword usage.
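
The pattern matching notation named above is the same one the shell applies in case statements, so the effect of a delete= pattern on a set of keywords can be previewed without pax. This is a sketch: filter_keywords is a hypothetical helper, and security.example is an invented keyword used only to show a match.

```shell
# Hypothetical helper: print the keywords that survive a delete=pattern.
# $1 = pattern, remaining arguments = extended header keywords.
filter_keywords() {
    pattern=$1
    shift
    kept=
    for kw in "$@"; do
        case $kw in
            $pattern) ;;                          # matched: dropped
            *) kept="$kept${kept:+ }$kw" ;;       # not matched: kept
        esac
    done
    printf '%s\n' "$kept"
}

filter_keywords 'security.*' mtime security.example gname
# prints: mtime gname
```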

exthdr.name=string

(Applicable only to the -x pax format.) This keyword allows user
control over the name that is written into the ustar header
blocks for the extended header produced under the circumstances
described in pax Header Block. The name shall be the contents
of string, after the following character substitutions have been
made:

    string
    Includes:   Replaced By:
    %d          The directory name of the file, equivalent to the result
                of the dirname utility on the translated pathname.
    %f          The filename of the file, equivalent to the result of
                the basename utility on the translated pathname.
    %p          The process ID of the pax process.
    %%          A '%' character.

Any other '%' characters in string produce undefined results.

If no -o exthdr.name=string is specified, pax shall use the
following default value:

    %d/PaxHeaders.%p/%f
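
The substitutions can be worked through by hand with dirname and basename. The sketch below is illustrative only: expand_exthdr_name is a hypothetical helper, the pathname and process ID are invented, and any '%' sequence the manual leaves undefined is simply copied through unchanged.

```shell
# Hypothetical helper: expand an exthdr.name template for one pathname.
# $1 = template, $2 = translated pathname, $3 = process ID to substitute.
expand_exthdr_name() {
    tmpl=$1
    dir=$(dirname "$2")
    base=$(basename "$2")
    pid=$3
    out=
    while [ -n "$tmpl" ]; do
        case $tmpl in
            %d*) out=$out$dir;  tmpl=${tmpl#??} ;;
            %f*) out=$out$base; tmpl=${tmpl#??} ;;
            %p*) out=$out$pid;  tmpl=${tmpl#??} ;;
            %%*) out="${out}%"; tmpl=${tmpl#??} ;;
            *)   rest=${tmpl#?}            # copy one ordinary character
                 out=$out${tmpl%"$rest"}
                 tmpl=$rest ;;
        esac
    done
    printf '%s\n' "$out"
}

expand_exthdr_name '%d/PaxHeaders.%p/%f' src/docs/a.txt 1234
# prints: src/docs/PaxHeaders.1234/a.txt
```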

globexthdr.name=string

(Applicable only to the -x pax format.) When used in write or
copy mode with the appropriate options, pax shall create global
extended header records with ustar header blocks that will be
treated as regular files by previous versions of pax. This keyword
allows user control over the name that is written into the
ustar header blocks for global extended header records. The name
shall be the contents of string, after the following character
substitutions have been made:

    string
    Includes:   Replaced By:
    %n          An integer that represents the sequence number of the
                global extended header record in the archive, starting
                at 1.
    %p          The process ID of the pax process.
    %%          A '%' character.

Any other '%' characters in string produce undefined results.

If no -o globexthdr.name=string is specified, pax shall use the
following default value:

    $TMPDIR/GlobalHead.%p.%n

where $TMPDIR represents the value of the TMPDIR environment
variable. If TMPDIR is not set, pax shall use /tmp.
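
The default name and its /tmp fallback map directly onto a shell parameter expansion. In this sketch globexthdr_default is a hypothetical helper, and the process ID and sequence number are passed in as arguments rather than taken from a real pax process.

```shell
# Hypothetical helper: build the default global extended header name.
# $1 = process ID (%p), $2 = sequence number (%n, starting at 1).
globexthdr_default() {
    # ${TMPDIR-/tmp} falls back to /tmp only when TMPDIR is not set.
    printf '%s/GlobalHead.%s.%s\n' "${TMPDIR-/tmp}" "$1" "$2"
}

(TMPDIR=/var/tmp; globexthdr_default 1234 1)   # prints /var/tmp/GlobalHead.1234.1
(unset TMPDIR;    globexthdr_default 1234 2)   # prints /tmp/GlobalHead.1234.2
```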

invalid=action

(Applicable only to the -x pax format.) This keyword allows user
control over the action pax takes upon encountering values in an
extended header record that, in read or copy mode, are invalid
in the destination hierarchy or, in list mode, cannot be written
in the codeset and current locale of the implementation. The
following are invalid values that shall be recognized by pax:

* In read or copy mode, a filename or link name that
  contains character encodings invalid in the destination
  hierarchy. (For example, the name may contain
  embedded NULs.)

* In read or copy mode, a filename or link name that is
  longer than the maximum allowed in the destination
  hierarchy (for either a pathname component or the
  entire pathname).

* In list mode, any character string value (filename,
  link name, user name, and so on) that cannot be written
  in the codeset and current locale of the implementation.

The following mutually-exclusive values of the action argument
are supported:

bypass
In read or copy mode, pax shall bypass the file, causing
no change to the destination hierarchy. In list mode, pax
shall write all requested valid values for the file, but
its method for writing invalid values is unspecified.

rename
In read or copy mode, pax shall act as if the -i option
were in effect for each file with invalid filename or
link name values, allowing the user to provide a replacement
name interactively. In list mode, pax shall behave
identically to the bypass action.

UTF-8
When used in read, copy, or list mode and a filename,
link name, owner name, or any other field in an extended
header record cannot be translated from the pax UTF-8
codeset format to the codeset and current locale of the
implementation, pax shall use the actual UTF-8 encoding
for the name.

write
In read or copy mode, pax shall write the file, translating
or truncating the name, regardless of whether this
may overwrite an existing file with a valid name. In list
mode, pax shall behave identically to the bypass action.

If no -o invalid= option is specified, pax shall act as if -o
invalid=bypass were specified. Any overwriting of existing
files that may be allowed by the -o invalid= actions shall be
subject to permission (-p) and modification time (-u) restrictions,
and shall be suppressed if the -k option is also specified.

linkdata

(Applicable only to the -x pax format.) In write mode, pax shall
write the contents of a file to the archive even when that file
is merely a hard link to a file whose contents have already been
written to the archive.

listopt=format

This keyword specifies the output format of the table of contents
produced when the -v option is specified in list mode. See
List Mode Format Specifications. To avoid ambiguity, the
listopt=format shall be the only or final keyword=value pair
in a -o option-argument; all characters in the remainder of the
option-argument shall be considered part of the format string.
When multiple -o listopt=format options are specified, the format
strings shall be considered a single, concatenated string,
evaluated in command line order.

times

(Applicable only to the -x pax format.) When used in write or
copy mode, pax shall include atime, ctime, and mtime extended
header records for each file. See pax Extended Header File Times.

In addition to these keywords, if the -x pax format is specified, any
of the keywords and values defined in pax Extended Header, including
implementation extensions, can be used in -o option-arguments, in
either of two modes:

keyword=value

When used in write or copy mode, these keyword/value pairs shall
be included at the beginning of the archive as typeflag g global
extended header records. When used in read or list mode, these
keyword/value pairs shall act as if they had been at the beginning
of the archive as typeflag g global extended header
records.

keyword:=value

When used in write or copy mode, these keyword/value pairs shall
be included as records at the beginning of a typeflag x extended
header for each file. (This shall be equivalent to the equal-sign
form except that it creates no typeflag g global extended
header records.) When used in read or list mode, these
keyword/value pairs shall act as if they were included as records
at the end of each extended header; thus, they shall override
any global or file-specific extended header record keywords of
the same names. For example, in the command:

    pax -r -o "
    gname:=mygroup,
    " <archive

the group name will be forced to a new value for all files read
from the archive.

The precedence of -o keywords over various fields in the archive is
described in pax Extended Header Keyword Precedence.

-p string
Specify one or more file characteristic options (privileges).
The string option-argument shall be a string specifying file
characteristics to be retained or discarded on extraction. The
string shall consist of the specification characters a, e, m,
o, and p. Other implementation-defined characters can be
included. Multiple characteristics can be concatenated within
the same string and multiple -p options can be specified. The
meanings of the specification characters are as follows:

a
Do not preserve file access times.

e
Preserve the user ID, group ID, file mode bits (see the Base
Definitions volume of IEEE Std 1003.1-2001, Section 3.168, File
Mode Bits), access time, modification time, and any other
implementation-defined file characteristics.

m
Do not preserve file modification times.

o
Preserve the user ID and group ID.

p
Preserve the file mode bits. Other implementation-defined file
mode attributes may be preserved.

In the preceding list, "preserve" indicates that an attribute stored in
the archive shall be given to the extracted file, subject to the
permissions of the invoking process. The access and modification times of
the file shall be preserved unless otherwise specified with the -p
option or not stored in the archive. All attributes that are not
preserved shall be determined as part of the normal file creation action
(see File Read, Write, and Creation).

If neither the e nor the o specification character is specified, or the
user ID and group ID are not preserved for any reason, pax shall not
set the S_ISUID and S_ISGID bits of the file mode.

If the preservation of any of these items fails for any reason, pax
shall write a diagnostic message to standard error. Failure to
preserve these items shall affect the final exit status, but shall not
cause the extracted file to be deleted.

If file characteristic letters in any of the string option-arguments
are duplicated or conflict with each other, the ones given last shall
take precedence. For example, if -p eme is specified, file modification
times are preserved.
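
The last-one-wins rule can be traced by processing the specification characters left to right. The sketch below is a simplified model, not pax's implementation: resolve_p is a hypothetical helper, implementation-defined characters are ignored, and "no" for owner or mode just means the attribute is left to normal file creation.

```shell
# Hypothetical helper: report which attributes a -p string would preserve.
# For several -p options, pass their strings concatenated in command line order.
resolve_p() {
    spec=$1
    # Defaults described above: times are preserved, owner and mode are not.
    atime=yes mtime=yes owner=no mode=no
    while [ -n "$spec" ]; do
        rest=${spec#?}
        c=${spec%"$rest"}          # first remaining character
        spec=$rest
        case $c in
            a) atime=no ;;
            m) mtime=no ;;
            o) owner=yes ;;
            p) mode=yes ;;
            e) atime=yes mtime=yes owner=yes mode=yes ;;
        esac
    done
    printf 'atime=%s mtime=%s owner=%s mode=%s\n' "$atime" "$mtime" "$owner" "$mode"
}

resolve_p eme   # prints atime=yes mtime=yes owner=yes mode=yes
resolve_p em    # prints atime=yes mtime=no owner=yes mode=yes
```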

-s replstr
Modify file or archive member names named by pattern or file operands
according to the substitution expression replstr, using
the syntax of the ed utility. The concepts of "address" and
"line" are meaningless in the context of the pax utility, and
shall not be supplied. The format shall be:

    -s /old/new/[gp]

where, as in ed, old is a basic regular expression and new can contain
an ampersand, '\n' (where n is a digit) backreferences, or subexpression
matching. The old string shall also be permitted to contain <newline>s.

Any non-null character can be used as a delimiter ('/' shown here).
Multiple -s expressions can be specified; the expressions shall be
applied in the order specified, terminating with the first successful
substitution. The optional trailing 'g' is as defined in the ed utility.
The optional trailing 'p' shall cause successful substitutions to
be written to standard error. File or archive member names that
substitute to the empty string shall be ignored when reading and writing
archives.
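
Because replstr follows the ed substitute syntax, sed can approximate its effect on a list of names before an archive is touched. This is a sketch under assumptions: the two expressions and the sample names are invented, sed works line by line (so names containing newlines are out of scope), and the bare t command stands in for "terminating with the first successful substitution".

```shell
# Models: pax ... -s ',^src/,backup/,' -s ',\.txt$,.bak,'
# A comma is used as the delimiter so the slashes need no escaping.
rename_like_pax() {
    # 't' with no label jumps to the end of the script once a substitution
    # has succeeded, so the second expression is skipped for that name.
    sed -e 's,^src/,backup/,' -e t -e 's,\.txt$,.bak,'
}

printf '%s\n' src/docs/a.txt notes.txt | rename_like_pax
# prints:
#   backup/docs/a.txt
#   notes.bak
```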

-t When reading files from the file system, and if the user has the
permissions required by utime() to do so, set the access time of
each file read to the access time that it had before being read
by pax.

-u Ignore files that are older (having a less recent file modification
time) than a pre-existing file or archive member with the
same name. In read mode, an archive member with the same name as
a file in the file system shall be extracted if the archive member
is newer than the file. In write mode, an archive file member
with the same name as a file in the file system shall be
superseded if the file is newer than the archive member. If -a
is also specified, this is accomplished by appending to the archive;
otherwise, it is unspecified whether this is accomplished
by actual replacement in the archive or by appending to the archive.
In copy mode, the file in the destination hierarchy shall
be replaced by the file in the source hierarchy or by a link to
the file in the source hierarchy if the file in the source hierarchy
is newer.
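
The -u decision comes down to one modification-time comparison. The sketch below models it with two ordinary files standing in for the candidate and the pre-existing copy; is_newer is a hypothetical helper, the timestamps are invented, and a real archive member's time comes from its header rather than from the file system.

```shell
work=$(mktemp -d) || exit 1
touch -t 202001010000 "$work/existing"    # older pre-existing file
touch -t 202101010000 "$work/candidate"   # newer candidate

# Hypothetical helper: succeed when $1 was modified more recently than $2.
is_newer() {
    [ -n "$(find "$1" -prune -newer "$2")" ]
}

if is_newer "$work/candidate" "$work/existing"; then
    decision=replace      # -u would let the candidate supersede the old copy
else
    decision=ignore       # -u would skip the candidate
fi
printf '%s\n' "$decision"   # prints replace
```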

-v In list mode, produce a verbose table of contents (see the STDOUT
section). Otherwise, write archive member pathnames to standard
error (see the STDERR section).

-x format
Specify the output archive format. The pax utility shall support
the following formats:

cpio
The cpio interchange format; see the EXTENDED DESCRIPTION section.
The default blocksize for this format for character special
archive files shall be 5120. Implementations shall support
all blocksize values less than or equal to 32256 that are multiples
of 512.

pax
The pax interchange format; see the EXTENDED DESCRIPTION section.
The default blocksize for this format for character special
archive files shall be 5120. Implementations shall support
all blocksize values less than or equal to 32256 that are multiples
of 512.

ustar
The tar interchange format; see the EXTENDED DESCRIPTION section.
The default blocksize for this format for character special
archive files shall be 10240. Implementations shall support
all blocksize values less than or equal to 32256 that are multiples
of 512.

Implementation-defined formats shall specify a default block size as
well as any other block sizes supported for character special archive
files.
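
For the three required formats, the blocksize rules above reduce to a simple arithmetic check. In this sketch is_guaranteed_blocksize is a hypothetical helper that tests whether a -b value is one every implementation must accept for cpio, pax, and ustar; larger or unaligned values may still work on a given system.

```shell
# Hypothetical helper: succeed when $1 is a positive multiple of 512
# that does not exceed 32256 (which is 63 * 512).
is_guaranteed_blocksize() {
    [ "$1" -gt 0 ] && [ "$1" -le 32256 ] && [ $(( $1 % 512 )) -eq 0 ]
}

for size in 5120 10240 32256 32768 1000; do
    if is_guaranteed_blocksize "$size"; then
        printf '%s: guaranteed\n' "$size"
    else
        printf '%s: not guaranteed\n' "$size"
    fi
done
```

The first three values pass; 32768 exceeds the limit and 1000 is not a multiple of 512.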

Any attempt to append to an archive file in a format different from the
existing archive format shall cause pax to exit immediately with a
non-zero exit status.

In copy mode, if no -x format is specified, pax shall behave as if -x
pax were specified.

-X When traversing the file hierarchy specified by a pathname, pax
shall not descend into directories that have a different device
ID (st_dev; see the System Interfaces volume of
IEEE Std 1003.1-2001, stat()).

The options that operate on the names of files or archive members (-c,
-i, -n, -s, -u, and -v) shall interact as follows. In read mode, the
archive members shall be selected based on the user-specified pattern
operands as modified by the -c, -n, and -u options. Then, any -s and -i
options shall modify, in that order, the names of the selected files.
The -v option shall write names resulting from these modifications.

In write mode, the files shall be selected based on the user-specified
pathnames as modified by the -n and -u options. Then, any -s and -i
options shall modify, in that order, the names of these selected files.
The -v option shall write names resulting from these modifications.

If both the -u and -n options are specified, pax shall not consider a
file selected unless it is newer than the file to which it is compared.

List Mode Format Specifications
In list mode with the -o listopt=format option, the format argument
shall be applied for each selected file. The pax utility shall append a
<newline> to the listopt output for each selected file. The format
argument shall be used as the format string described in the Base
Definitions volume of IEEE Std 1003.1-2001, Chapter 5, File Format
Notation, with the exceptions 1. through 5. defined in the EXTENDED
DESCRIPTION section of printf, plus the following exceptions:

6. The sequence (keyword) can occur before a format conversion
specifier. The conversion argument is defined by the value of
keyword. The implementation shall support the following keywords:

* Any of the Field Name entries in ustar Header Block and
  Octet-Oriented cpio Archive Entry. The implementation may
  support the cpio keywords without the leading c_ in addition
  to the form required by Values for cpio c_mode Field.

* Any keyword defined for the extended header in pax Extended
  Header.

* Any keyword provided as an implementation-defined extension
  within the extended header defined in pax Extended Header.

For example, the sequence "%(charset)s" is the string value of the name
of the character set in the extended header.

The result of the keyword conversion argument shall be the value from
the applicable header field or extended header, without any trailing
NULs.

All keyword values used as conversion arguments shall be translated
from the UTF-8 encoding to the character set appropriate for the local
file system, user database, and so on, as applicable.

7. An additional conversion specifier character, T, shall be used
to specify time formats. The T conversion specifier character
can be preceded by the sequence (keyword=subformat), where
subformat is a date format as defined by date operands. The
default keyword shall be mtime and the default subformat shall
be:

    %b %e %H:%M %Y

8. An additional conversion specifier character, M, shall be used
to specify the file mode string as defined in ls Standard Output.
If (keyword) is omitted, the mode keyword shall be used.
For example, %.1M writes the single character corresponding to
the <entry type> field of the ls -l command.

9. An additional conversion specifier character, D, shall be used
to specify the device for block or special files, if applicable,
in an implementation-defined format. If not applicable, and
(keyword) is specified, then this conversion shall be equivalent
to %(keyword)u. If not applicable, and (keyword) is omitted,
then this conversion shall be equivalent to <space>.

10. An additional conversion specifier character, F, shall be used
to specify a pathname. The F conversion character can be preceded
by a sequence of comma-separated keywords:

    (keyword[,keyword] ... )

The values for all the keywords that are non-null shall be concatenated
together, each separated by a '/'. The default shall be (path) if the
keyword path is defined; otherwise, the default shall be (prefix,
name).

11. An additional conversion specifier character, L, shall be used
to specify a symbolic link expansion. If the current file is a
symbolic link, then %L shall expand to:

    "%s -> %s", <value of keyword>, <contents of link>

Otherwise, the %L conversion specification shall be the equivalent of
%F.
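
The default subformat in exception 7 uses the conversions of the date utility, so its shape can be previewed there. This sketch prints the current time rather than a file's mtime, and forces the C locale (an assumption made here) so that the month abbreviation is predictable.

```shell
# Render "now" with the default T subformat: month, day, hour:minute, year.
stamp=$(LC_ALL=C date '+%b %e %H:%M %Y')
printf '%s\n' "$stamp"
```

The %e conversion pads single-digit days with a space, so the day field is always two characters wide.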

OPERANDS
The following operands shall be supported:

directory
The destination directory pathname for copy mode.

file A pathname of a file to be copied or archived.

pattern
A pattern matching one or more pathnames of archive members. A
pattern must be given in the name-generating notation of the
pattern matching notation in Pattern Matching Notation, including
the filename expansion rules in Patterns Used for Filename
Expansion. The default, if no pattern is specified, is to
select all members in the archive.

STDIN
In write mode, the standard input shall be used only if no file
operands are specified. It shall be a text file containing a list of
pathnames, one per line, without leading or trailing <blank>s.
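
A common way to produce such a list is to pipe the output of find into pax with no file operands. The sketch below assumes a pax binary is installed and skips the archive step otherwise; the scratch directory, file names, and the archive name list.tar are all hypothetical.

```shell
work=$(mktemp -d) || exit 1
mkdir -p "$work/src"
echo one > "$work/src/a.txt"
echo two > "$work/src/b.log"

if command -v pax >/dev/null 2>&1; then
    # No file operands: pax reads the pathnames to archive from standard input.
    (cd "$work" && find src -type f -name '*.txt' | pax -w -x ustar -f list.tar)
    # List mode shows that only the selected file was archived.
    members=$(pax -f "$work/list.tar")
    stdin_demo=ran
else
    members=
    stdin_demo=skipped
fi
printf '%s\n' "$members"
```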
706
707 In list and read modes, if -f is not specified, the standard input
708 shall be an archive file.
709
710 Otherwise, the standard input shall not be used.
711
713 The input file named by the archive option-argument, or standard input
714 when the archive is read from there, shall be a file formatted accord‐
715 ing to one of the specifications in the EXTENDED DESCRIPTION section or
716 some other implementation-defined format.
717
718 The file /dev/tty shall be used to write prompts and read responses.
719
721 The following environment variables shall affect the execution of pax:
722
723 LANG Provide a default value for the internationalization variables
724 that are unset or null. (See the Base Definitions volume of
725 IEEE Std 1003.1-2001, Section 8.2, Internationalization Vari‐
726 ables for the precedence of internationalization variables used
727 to determine the values of locale categories.)
728
729 LC_ALL If set to a non-empty string value, override the values of all
730 the other internationalization variables.
731
732 LC_COLLATE
734 Determine the locale for the behavior of ranges, equivalence
735 classes, and multi-character collating elements used in the pat‐
736 tern matching expressions for the pattern operand, the basic
737 regular expression for the -s option, and the extended regular
738 expression defined for the yesexpr locale keyword in the LC_MES‐
739 SAGES category.
740
741 LC_CTYPE
742 Determine the locale for the interpretation of sequences of
743 bytes of text data as characters (for example, single-byte as
744 opposed to multi-byte characters in arguments and input files),
745 the behavior of character classes used in the extended regular
746 expression defined for the yesexpr locale keyword in the LC_MES‐
747 SAGES category, and pattern matching.
748
749 LC_MESSAGES
750 Determine the locale used to process affirmative responses and
751 the locale used to affect the format and contents of diagnostic
752 messages written to standard error.
753
754 LC_TIME
755 Determine the format and contents of date and time strings when
756 the -v option is specified.
757
758 NLSPATH
759 Determine the location of message catalogs for the processing of
760 LC_MESSAGES .
761
762 TMPDIR Determine the pathname that provides part of the default global
763 extended header record file, as described for the -o globexthdr=
764 keyword in the OPTIONS section.
765
766 TZ Determine the timezone used to calculate date and time strings
767 when the -v option is specified. If TZ is unset or null, an
768 unspecified default timezone shall be used.
769
770
772 Default.
773
775 In write mode, if -f is not specified, the standard output shall be the
776 archive formatted according to one of the specifications in the
777 EXTENDED DESCRIPTION section, or some other implementation-defined for‐
778 mat (see -x format).
779
780 In list mode, when the -o listopt= format has been specified, the
781 selected archive members shall be written to standard output using the
782 format described under List Mode Format Specifications . In list mode
783 without the -o listopt= format option, the table of contents of the
784 selected archive members shall be written to standard output using the
785 following format:
786
787
788 "%s\n", <pathname>
789
790 If the -v option is specified in list mode, the table of contents of
791 the selected archive members shall be written to standard output using
792 the following formats.
793
794 For pathnames representing hard links to previous members of the ar‐
795 chive:
796
797
798 "%s == %s\n", <ls -l listing>, <linkname>
799
800 For all other pathnames:
801
802
803 "%s\n", <ls -l listing>
804
805 where <ls -l listing> shall be the format specified by the ls utility
806 with the -l option. When writing pathnames in this format, it is
807 unspecified what is written for fields for which the underlying archive
808 format does not have the correct information, although the correct num‐
809 ber of <blank>-separated fields shall be written.
810
811 In list mode, standard output shall not be buffered more than a line at
812 a time.
813
815 If -v is specified in read, write, or copy modes, pax shall write the
816 pathnames it processes to the standard error output using the following
817 format:
818
819
820 "%s\n", <pathname>
821
822 These pathnames shall be written as soon as processing is begun on the
823 file or archive member, and shall be flushed to standard error. The
824 trailing <newline>, which shall not be buffered, is written when the
825 file has been read or written.
826
827 If the -s option is specified, and the replacement string has a trail‐
828 ing 'p' , substitutions shall be written to standard error in the fol‐
829 lowing format:
830
831
832 "%s >> %s\n", <original pathname>, <new pathname>
833
834 In all operating modes of pax, optional messages of unspecified format
835 concerning the input archive format and volume number, the number of
836 files, blocks, volumes, and media parts as well as other diagnostic
837 messages may be written to standard error.
838
839 In all formats, for both standard output and standard error, it is
840 unspecified how non-printable characters in pathnames or link names are
841 written.
842
843 When pax is in read mode or list mode, using the -x pax archive format,
844 and a filename, link name, owner name, or any other field in an
845 extended header record cannot be translated from the pax UTF-8 codeset
846 format to the codeset and current locale of the implementation, pax
847 shall write a diagnostic message to standard error, shall process the
848 file as described for the -o invalid= option, and then shall process
849 the next file in the archive.
850
852 In read mode, the extracted output files shall be of the archived file
853 type. In copy mode, the copied output files shall be the type of the
854 file being copied. In either mode, existing files in the destination
855 hierarchy shall be overwritten only when all permission (-p),
856 modification time (-u), and invalid-value (-o invalid=) tests allow it.
857
858 In write mode, the output file named by the -f option-argument shall be
859 a file formatted according to one of the specifications in the EXTENDED
860 DESCRIPTION section, or some other implementation-defined format.
861
863 pax Interchange Format
864 A pax archive tape or file produced in the -x pax format shall contain
865 a series of blocks. The physical layout of the archive shall be identi‐
866 cal to the ustar format described in ustar Interchange Format . Each
867 file archived shall be represented by the following sequence:
868
869 * An optional header block with extended header records. This header
870 block is of the form described in pax Header Block , with a typeflag
871 value of x or g. The extended header records, described in pax
872 Extended Header , shall be included as the data for this header
873 block.
874
875 * A header block that describes the file. Any fields in the preceding
876 optional extended header shall override the associated fields in
877 this header block for this file.
878
879 * Zero or more blocks that contain the contents of the file.
880
881 At the end of the archive file there shall be two 512-byte blocks
882 filled with binary zeros, interpreted as an end-of-archive indicator.
883
884 A schematic of an example archive with global extended header records
885 and two actual files is shown in pax Format Archive Example . In the
886 example, the second file in the archive has no extended header preced‐
887 ing it, presumably because it has no need for extended attributes.
888
889
890
891 Figure: pax Format Archive Example
892
893 pax Header Block
894 The pax header block shall be identical to the ustar header block
895 described in ustar Interchange Format , except that two additional
896 typeflag values are defined:
897
898 x Represents extended header records for the following file in the
899 archive (which shall have its own ustar header block). The for‐
900 mat of these extended header records shall be as described in
901 pax Extended Header .
902
903 g Represents global extended header records for the following
904 files in the archive. The format of these extended header
905 records shall be as described in pax Extended Header . Each
906 value shall affect all subsequent files that do not override
907 that value in their own extended header record and until another
908 global extended header record is reached that provides another
909 value for the same field. The typeflag g global headers should
910 not be used with interchange media that could suffer partial
911 data loss in transporting the archive.
912
913
914 For both of these types, the size field shall be the size of the
915 extended header records in octets. The other fields in the header block
916 are not meaningful to this version of the pax utility. However, if this
917 archive is read by a pax utility conforming to the ISO POSIX-2:1993
918 standard, the header block fields are used to create a regular file
919 that contains the extended header records as data. Therefore, header
920 block field values should be selected to provide reasonable file access
921 to this regular file.
922
923 A further difference from the ustar header block is that data blocks
924 for files of typeflag 1 (the digit one) (hard link) may be included,
925 which means that the size field may be greater than zero. Archives cre‐
926 ated by pax -o linkdata shall include these data blocks with the hard
927 links.
928
929 pax Extended Header
930 A pax extended header contains values that are inappropriate for the
931 ustar header block because of limitations in that format: fields
932 requiring a character encoding other than that described in the
933 ISO/IEC 646:1991 standard, fields representing file attributes not
934 described in the ustar header, and fields whose format or length do not
935 fit the requirements of the ustar header. The values in an extended
936 header add attributes to the following file (or files; see the descrip‐
937 tion of the typeflag g header block) or override values in the follow‐
938 ing header block(s), as indicated in the following list of keywords.
939
940 An extended header shall consist of one or more records, each con‐
941 structed as follows:
942
943
944 "%d %s=%s\n", <length>, <keyword>, <value>
945
946 The extended header records shall be encoded according to the
947 ISO/IEC 10646-1:2000 standard (UTF-8). The <length> field, <blank>,
948 equals sign, and <newline> shown shall be limited to the portable char‐
949 acter set, as encoded in UTF-8. The <keyword> and <value> fields can be
950 any UTF-8 characters. The <length> field shall be the decimal length of
951 the extended header record in octets, including the trailing <newline>.
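Because the <length> field counts its own decimal digits, a writer cannot simply prepend the length of the rest of the record. The following is a minimal sketch in Python of one way to build a record; the function name pax_record is hypothetical and not part of any pax implementation.

```python
def pax_record(keyword: str, value: str) -> bytes:
    """Build one extended header record of the form "%d %s=%s\\n"."""
    # Everything that follows the length digits:
    # <blank>, keyword, equals sign, value, <newline>, all encoded in UTF-8.
    body = b" " + keyword.encode("utf-8") + b"=" + value.encode("utf-8") + b"\n"

    # The length includes its own digits, so iterate until the value is stable.
    length = len(body)
    while len(str(length)) + len(body) != length:
        length = len(str(length)) + len(body)

    return str(length).encode("ascii") + body
```

For example, pax_record("path", "/tmp/x") yields the 15-octet record "15 path=/tmp/x" followed by a <newline>.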
952
953 The <keyword> field shall be one of the entries from the following list
954 or a keyword provided as an implementation extension. Keywords consist‐
955 ing entirely of lowercase letters, digits, and periods are reserved for
956 future standardization. A keyword shall not include an equals sign. (In
957 the following list, the notations "file(s)" or "block(s)" are used to
958 acknowledge that a keyword affects the following single file after a
959 typeflag x extended header, but possibly multiple files after typeflag
960 g. Any requirements in the list for pax to include a record when in
961 write or copy mode shall apply only when such a record has not already
962 been provided through the use of the -o option. When used in copy mode,
963 pax shall behave as if an archive had been created with applicable
964 extended header records and then extracted.)
965
966 atime The file access time for the following file(s), equivalent to
967 the value of the st_atime member of the stat structure for a
968 file, as described by the stat() function. The access time shall
969 be restored if the process has the appropriate privilege
970 required to do so. The format of the <value> shall be as
971 described in pax Extended Header File Times .
972
973 charset
974 The name of the character set used to encode the data in the
975 following file(s). The entries in the following table are
976 defined to refer to known standards; additional names may be
977 agreed on between the originator and recipient.
978
979 <value> Formal Standard
980 ISO-IR 646 1990 ISO/IEC 646:1990
981 ISO-IR 8859 1 1998 ISO/IEC 8859-1:1998
982 ISO-IR 8859 2 1999 ISO/IEC 8859-2:1999
983 ISO-IR 8859 3 1999 ISO/IEC 8859-3:1999
984 ISO-IR 8859 4 1998 ISO/IEC 8859-4:1998
985 ISO-IR 8859 5 1999 ISO/IEC 8859-5:1999
986 ISO-IR 8859 6 1999 ISO/IEC 8859-6:1999
987 ISO-IR 8859 7 1987 ISO/IEC 8859-7:1987
988 ISO-IR 8859 8 1999 ISO/IEC 8859-8:1999
989 ISO-IR 8859 9 1999 ISO/IEC 8859-9:1999
991 ISO-IR 8859 10 1998 ISO/IEC 8859-10:1998
992 ISO-IR 8859 13 1998 ISO/IEC 8859-13:1998
993 ISO-IR 8859 14 1998 ISO/IEC 8859-14:1998
994 ISO-IR 8859 15 1999 ISO/IEC 8859-15:1999
995 ISO-IR 10646 2000 ISO/IEC 10646:2000
996 ISO-IR 10646 2000 UTF-8 ISO/IEC 10646, UTF-8 encoding
997 BINARY None.
998
999 The encoding is included in an extended header for information only;
1000 when pax is used as described in IEEE Std 1003.1-2001, it shall not
1001 translate the file data into any other encoding. The BINARY entry indi‐
1002 cates unencoded binary data.
1003
1004 When used in write or copy mode, it is implementation-defined whether
1005 pax includes a charset extended header record for a file.
1006
1007 comment
1008 A series of characters used as a comment. All characters in the
1009 <value> field shall be ignored by pax.
1010
1011 ctime The file creation time for the following file(s), equivalent to
1012 the value of the st_ctime member of the stat structure for a
1013 file, as described by the stat() function. The creation time
1014 shall be restored if the process has the appropriate privilege
1015 required to do so. The format of the <value> shall be as
1016 described in pax Extended Header File Times .
1017
1018 gid The group ID of the group that owns the file, expressed as a
1019 decimal number using digits from the ISO/IEC 646:1991 standard.
1020 This record shall override the gid field in the following header
1021 block(s). When used in write or copy mode, pax shall include a
1022 gid extended header record for each file whose group ID is
1023 greater than 2097151 (octal 7777777).
1024
1025 gname The group of the file(s), formatted as a group name in the group
1026 database. This record shall override the gid and gname fields
1027 in the following header block(s), and any gid extended header
1028 record. When used in read, copy, or list mode, pax shall trans‐
1029 late the name from the UTF-8 encoding in the header record to
1030 the character set appropriate for the group database on the
1031 receiving system. If any of the UTF-8 characters cannot be
1032 translated, and if the -o invalid=UTF-8 option is not specified,
1033 the results are implementation-defined. When used in write
1034 or copy mode, pax shall include a gname extended header record
1035 for each file whose group name cannot be represented entirely
1036 with the letters and digits of the portable character set.
1037
1038 linkpath
1039 The pathname of a link being created to another file, of any
1040 type, previously archived. This record shall override the
1041 linkname field in the following ustar header block(s). The fol‐
1042 lowing ustar header block shall determine the type of link cre‐
1043 ated. If typeflag of the following header block is 1, it shall
1044 be a hard link. If typeflag is 2, it shall be a symbolic link
1045 and the linkpath value shall be the contents of the symbolic
1046 link. The pax utility shall translate the name of the link (con‐
1047 tents of the symbolic link) from the UTF-8 encoding to the char‐
1048 acter set appropriate for the local file system. When used in
1049 write or copy mode, pax shall include a linkpath extended header
1050 record for each link whose pathname cannot be represented
1051 entirely with the members of the portable character set other
1052 than NUL.
1053
1054 mtime The file modification time of the following file(s), equivalent
1055 to the value of the st_mtime member of the stat structure for a
1056 file, as described in the stat() function. This record shall
1057 override the mtime field in the following header block(s). The
1058 modification time shall be restored if the process has the
1059 appropriate privilege required to do so. The format of the
1060 <value> shall be as described in pax Extended Header File Times.
1062
1063 path The pathname of the following file(s). This record shall over‐
1064 ride the name and prefix fields in the following header
1065 block(s). The pax utility shall translate the pathname of the
1066 file from the UTF-8 encoding to the character set appropriate
1067 for the local file system.
1068
1069 When used in write or copy mode, pax shall include a path extended
1070 header record for each file whose pathname cannot be represented
1071 entirely with the members of the portable character set other than NUL.
1072
1073 realtime.any
1074 The keywords prefixed by "realtime." are reserved for future
1075 standardization.
1076
1077 security.any
1078 The keywords prefixed by "security." are reserved for future
1079 standardization.
1080
1081 size The size of the file in octets, expressed as a decimal number
1082 using digits from the ISO/IEC 646:1991 standard. This record
1083 shall override the size field in the following header block(s).
1084 When used in write or copy mode, pax shall include a size
1085 extended header record for each file with a size value greater
1086 than 8589934591 (octal 77777777777).
1087
1088 uid The user ID of the file owner, expressed as a decimal number
1089 using digits from the ISO/IEC 646:1991 standard. This record
1090 shall override the uid field in the following header block(s).
1091 When used in write or copy mode, pax shall include a uid
1092 extended header record for each file whose owner ID is greater
1093 than 2097151 (octal 7777777).
1094
1095 uname The owner of the following file(s), formatted as a user name in
1096 the user database. This record shall override the uid and uname
1097 fields in the following header block(s), and any uid extended
1098 header record. When used in read, copy, or list mode, pax shall
1099 translate the name from the UTF-8 encoding in the header record
1100 to the character set appropriate for the user database on the
1101 receiving system. If any of the UTF-8 characters cannot be
1102 translated, and if the -o invalid=UTF-8 option is not specified,
1103 the results are implementation-defined. When used in write
1104 or copy mode, pax shall include a uname extended header record
1105 for each file whose user name cannot be represented entirely
1106 with the letters and digits of the portable character set.
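The numeric thresholds given for the gid, size, and uid keywords are the largest values that fit in the corresponding octal ustar fields. A minimal sketch in Python of that write-mode decision follows; the function and constant names are hypothetical, and the sketch ignores records already supplied through -o.

```python
USTAR_MAX_ID = 0o7777777        # 2097151, limit for the uid and gid fields
USTAR_MAX_SIZE = 0o77777777777  # 8589934591, limit for the size field


def needed_numeric_records(uid: int, gid: int, size: int) -> dict:
    """Return the numeric extended header records a writer must emit."""
    records = {}
    if uid > USTAR_MAX_ID:
        records["uid"] = str(uid)    # decimal digits, as the keyword requires
    if gid > USTAR_MAX_ID:
        records["gid"] = str(gid)
    if size > USTAR_MAX_SIZE:
        records["size"] = str(size)
    return records
```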
1107
1108
1109 If the <value> field is zero length, it shall delete any header block
1110 field, previously entered extended header value, or global extended
1111 header value of the same name.
1112
1113 If a keyword in an extended header record (or in a -o option-argument)
1114 overrides or deletes a corresponding field in the ustar header block,
1115 pax shall ignore the contents of that header block field.
1116
1117 Unlike the ustar header block fields, NULs shall not delimit <value>s;
1118 all characters within the <value> field shall be considered data for
1119 the field. None of the length limitations of the ustar header block
1120 fields in ustar Header Block shall apply to the extended header
1121 records.
1122
1123 pax Extended Header Keyword Precedence
1124 This section describes the precedence in which the various header
1125 records and fields and command line options are selected to apply to a
1126 file in the archive. When pax is used in read or list modes, it shall
1127 determine a file attribute in the following sequence:
1128
1129 1. If -o delete=keyword-prefix is used, the affected attributes shall
1130 be determined from step 7., if applicable, or ignored otherwise.
1131
1132 2. If -o keyword:= is used, the affected attributes shall be ignored.
1133
1134 3. If -o keyword:=value is used, the affected attribute shall be
1135 assigned the value.
1136
1137 4. If there is a typeflag x extended header record, the affected
1138 attribute shall be assigned the <value>. When extended header
1139 records conflict, the last one given in the header shall take
1140 precedence.
1141
1142 5. If -o keyword=value is used, the affected attribute shall be
1143 assigned the value.
1144
1145 6. If there is a typeflag g global extended header record, the
1146 affected attribute shall be assigned the <value>. When global
1147 extended header records conflict, the last one given in the global
1148 header shall take precedence.
1149
1150 7. Otherwise, the attribute shall be determined from the ustar header
1151 block.
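The seven steps above can be sketched as a single lookup function in Python. This is an illustration only: the function name and the dictionary arguments are hypothetical, fnmatch stands in for the pattern matching notation used by -o delete=, and an empty value is assumed to mean that the attribute is deleted. Each dictionary is assumed to have been filled in archive or command-line order, so that a later conflicting record has already replaced an earlier one.

```python
import fnmatch


def resolve_attribute(keyword, ustar, x_records, g_records,
                      o_override=None, o_default=None, o_delete=()):
    """Pick the value of one attribute in read or list mode."""
    o_override = o_override or {}   # from -o keyword:=value
    o_default = o_default or {}     # from -o keyword=value

    # Step 1: -o delete= falls straight through to the ustar header block.
    if any(fnmatch.fnmatchcase(keyword, pattern) for pattern in o_delete):
        return ustar.get(keyword)
    # Steps 2 and 3: -o keyword:= (ignored) or -o keyword:=value.
    if keyword in o_override:
        return o_override[keyword] or None
    # Step 4: typeflag x extended header record.
    if keyword in x_records:
        return x_records[keyword] or None
    # Step 5: -o keyword=value.
    if keyword in o_default:
        return o_default[keyword] or None
    # Step 6: typeflag g global extended header record.
    if keyword in g_records:
        return g_records[keyword] or None
    # Step 7: the ustar header block.
    return ustar.get(keyword)
```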
1152
1153 pax Extended Header File Times
1154 The pax utility shall write an mtime record for each file in write or
1155 copy modes if the file's modification time cannot be represented
1156 exactly in the ustar header logical record described in ustar Inter‐
1157 change Format . This can occur if the time is out of ustar range, or if
1158 the file system of the underlying implementation supports non-integer
1159 time granularities and the time is not an integer. All of these time
1160 records shall be formatted as a decimal representation of the time in
1161 seconds since the Epoch. If a period ( '.' ) decimal point character is
1162 present, the digits to the right of the point shall represent the units
1163 of a subsecond timing granularity, where the first digit is tenths of a
1164 second and each subsequent digit is a tenth of the previous digit. In
1165 read or copy mode, the pax utility shall truncate the time of a file to
1166 the greatest value that is not greater than the input header file time.
1167 In write or copy mode, the pax utility shall output a time exactly if
1168 it can be represented exactly as a decimal number, and otherwise shall
1169 generate only enough digits so that the same time shall be recovered if
1170 the file is extracted on a system whose underlying implementation sup‐
1171 ports the same time granularity.
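The truncation rule for read and copy modes amounts to rounding toward negative infinity at the granularity the local file system supports. A minimal sketch in Python follows, assuming a nanosecond granularity by default; the function name and the resolution parameter are hypothetical.

```python
from decimal import Decimal, ROUND_FLOOR


def parse_pax_time(value: str, resolution: Decimal = Decimal("0.000000001")) -> Decimal:
    """Convert a time record <value> to the local timestamp granularity.

    The result is the greatest representable value that is not greater
    than the time given in the header record.
    """
    return Decimal(value).quantize(resolution, rounding=ROUND_FLOOR)
```

With a whole-second granularity (resolution of Decimal("1")), the value "-1.5" becomes -2 rather than -1, because -2 is the greatest integer that does not exceed -1.5.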
1172
1173 ustar Interchange Format
1174 A ustar archive tape or file shall contain a series of logical records.
1175 Each logical record shall be a fixed-size logical record of 512 octets
1176 (see below). Although this format may be thought of as being stored on
1177 9-track industry-standard 12.7 mm (0.5 in) magnetic tape, other types
1178 of transportable media are not excluded. Each file archived shall be
1179 represented by a header logical record that describes the file, fol‐
1180 lowed by zero or more logical records that give the contents of the
1181 file. At the end of the archive file there shall be two 512-octet logi‐
1182 cal records filled with binary zeros, interpreted as an end-of-archive
1183 indicator.
1184
1185 The logical records may be grouped for physical I/O operations, as
1186 described under the -b blocksize and -x ustar options. Each group of
1187 logical records may be written with a single operation equivalent to
1188 the write() function. On magnetic tape, the result of this write shall
1189 be a single tape physical block. The last physical block shall always
1190 be the full size, so logical records after the two zero logical records
1191 may contain undefined data.
1192
1193 The header logical record shall be structured as shown in the following
1194 table. All lengths and offsets are in decimal.
1195
1196 Table: ustar Header Block
1197
1198 Field Name Octet Offset Length (in Octets)
1199 name 0 100
1200 mode 100 8
1201 uid 108 8
1202 gid 116 8
1203 size 124 12
1204 mtime 136 12
1205 chksum 148 8
1206 typeflag 156 1
1207 linkname 157 100
1208 magic 257 6
1209 version 263 2
1210 uname 265 32
1211 gname 297 32
1212 devmajor 329 8
1213 devminor 337 8
1214 prefix 345 155
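The offsets and lengths in the table above cover 500 octets of the 512-octet header logical record. As an illustration, the following Python sketch splits a record into raw fields with the struct module; the names USTAR_FIELDS, USTAR_STRUCT, and split_header are hypothetical.

```python
import struct

# (field name, length in octets), in the order given by the table.
USTAR_FIELDS = [
    ("name", 100), ("mode", 8), ("uid", 8), ("gid", 8),
    ("size", 12), ("mtime", 12), ("chksum", 8), ("typeflag", 1),
    ("linkname", 100), ("magic", 6), ("version", 2),
    ("uname", 32), ("gname", 32), ("devmajor", 8), ("devminor", 8),
    ("prefix", 155),
]

# "=" disables alignment padding; the fields are contiguous in the record.
USTAR_STRUCT = struct.Struct("=" + "".join(f"{n}s" for _, n in USTAR_FIELDS))


def split_header(record: bytes) -> dict:
    """Return the raw, undecoded fields of one 512-octet header record."""
    if len(record) != 512:
        raise ValueError("a header logical record is 512 octets")
    values = USTAR_STRUCT.unpack_from(record)
    return dict(zip((name for name, _ in USTAR_FIELDS), values))
```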
1215
1216 All characters in the header logical record shall be represented in the
1217 coded character set of the ISO/IEC 646:1991 standard. For maximum
1218 portability between implementations, names should be selected from
1219 characters represented by the portable filename character set as octets
1220 with the most significant bit zero. If an implementation supports the
1221 use of characters outside of slash and the portable filename character
1222 set in names for files, users, and groups, one or more implementation-
1223 defined encodings of these characters shall be provided for interchange
1224 purposes.
1225
1226 However, the pax utility shall never create filenames on the local sys‐
1227 tem that cannot be accessed via the procedures described in
1228 IEEE Std 1003.1-2001. If a filename is found on the medium that would
1229 create an invalid filename, it is implementation-defined whether the
1230 data from the file is stored on the file hierarchy and under what name
1231 it is stored. The pax utility may choose to ignore these files as long
1232 as it produces an error indicating that the file is being ignored.
1233
1234 Each field within the header logical record is contiguous; that is,
1235 there is no padding used. Each character on the archive medium shall be
1236 stored contiguously.
1237
1238 The fields magic, uname, and gname are character strings each termi‐
1239 nated by a NUL character. The fields name, linkname, and prefix are
1240 NUL-terminated character strings except when all characters in the
1241 array contain non-NUL characters including the last character. The ver‐
1242 sion field is two octets containing the characters "00" (zero-zero).
1243 The typeflag contains a single character. All other fields are leading
1244 zero-filled octal numbers using digits from the ISO/IEC 646:1991 stan‐
1245 dard IRV. Each numeric field is terminated by one or more <space> or
1246 NUL characters.
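A reader therefore has to strip the trailing <space> or NUL terminators before interpreting a numeric field. The following Python sketch shows one way to do so; the function name parse_octal is hypothetical, and treating an empty field as zero is an assumption of this sketch, not a requirement stated here.

```python
def parse_octal(field: bytes) -> int:
    """Decode a zero-filled octal numeric field from a ustar header."""
    # Drop the one or more trailing <space> or NUL terminators.
    digits = field.rstrip(b" \0")
    # Assumption: a field with no digits at all is read as zero.
    return int(digits.decode("ascii"), 8) if digits else 0
```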
1247
1248 The name and the prefix fields shall produce the pathname of the file.
1249 A new pathname shall be formed, if prefix is not an empty string (its
1250 first character is not NUL), by concatenating prefix (up to the first
1251 NUL character), a slash character, and name; otherwise, name is used
1252 alone. In either case, name is terminated at the first NUL character.
1253 If prefix begins with a NUL character, it shall be ignored. In this
1254 manner, pathnames of at most 256 characters can be supported. If a
1255 pathname does not fit in the space provided, pax shall notify the user
1256 of the error, and shall not store any part of the file (header or
1257 data) on the medium.
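The rule for combining prefix and name can be sketched in Python as follows. The function name ustar_pathname is hypothetical; the arguments are the raw 100-octet name field and 155-octet prefix field.

```python
def ustar_pathname(name: bytes, prefix: bytes) -> bytes:
    """Form the member pathname from the raw name and prefix fields."""
    # Each field ends at its first NUL; a completely full field has none.
    name = name.split(b"\0", 1)[0]
    prefix = prefix.split(b"\0", 1)[0]
    # A prefix that begins with NUL is empty here and is ignored.
    return prefix + b"/" + name if prefix else name
```

The longest result is 155 + 1 + 100 = 256 octets, which matches the limit stated above.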
1258
1259 The linkname field, described below, shall not use the prefix to pro‐
1260 duce a pathname. As such, a linkname is limited to 100 characters. If
1261 the name does not fit in the space provided, pax shall notify the user
1262 of the error, and shall not attempt to store the link on the medium.
1263
1264 The mode field provides 12 bits encoded in the ISO/IEC 646:1991 stan‐
1265 dard octal digit representation. The encoded bits shall represent the
1266 following values:
1267
1268 Table: ustar mode Field
1269
1270 Bit Value IEEE Std 1003.1-2001 Bit Description
1271 04000 S_ISUID Set UID on execution.
1272 02000 S_ISGID Set GID on execution.
1273 01000 <reserved> Reserved for future standardization.
1274 00400 S_IRUSR Read permission for file owner class.
1275 00200 S_IWUSR Write permission for file owner
1276 class.
1277 00100 S_IXUSR Execute/search permission for file
1278 owner class.
1279 00040 S_IRGRP Read permission for file group class.
1280 00020 S_IWGRP Write permission for file group
1281 class.
1282 00010 S_IXGRP Execute/search permission for file
1283 group class.
1284 00004 S_IROTH Read permission for file other class.
1285 00002 S_IWOTH Write permission for file other
1286 class.
1287 00001 S_IXOTH Execute/search permission for file
1288 other class.
1289
1290 When appropriate privilege is required to set one of these mode bits,
1291 and the user restoring the files from the archive does not have the
1292 appropriate privilege, the mode bits for which the user does not have
1293 appropriate privilege shall be ignored. Some of the mode bits in the
1294 archive format are not mentioned elsewhere in this volume of
1295 IEEE Std 1003.1-2001. If the implementation does not support those
1296 bits, they may be ignored.
1297
1298 The uid and gid fields are the user and group ID of the owner and group
1299 of the file, respectively.
1300
1301 The size field is the size of the file in octets. If the typeflag field
1302 is set to specify a file to be of type 1 (a link) or 2 (a symbolic
1303 link), the size field shall be specified as zero. If the typeflag field
1304 is set to specify a file of type 5 (directory), the size field shall be
1305 interpreted as described under the definition of that record type. No
1306 data logical records are stored for types 1, 2, or 5. If the typeflag
1307 field is set to 3 (character special file), 4 (block special file), or
1308 6 (FIFO), the meaning of the size field is unspecified by this volume
1309 of IEEE Std 1003.1-2001, and no data logical records shall be stored on
1310 the medium. Additionally, for type 6, the size field shall be ignored
1311 when reading. If the typeflag field is set to any other value, the
1312 number of logical records written following the header shall be
1313 (size+511)/512, ignoring any fraction in the result of the division.
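The record count is a ceiling division by 512. A minimal Python sketch for the ustar format follows; the names are hypothetical, and the sketch does not cover the pax format case in which hard links may carry data.

```python
# typeflag values for which no data logical records are stored (ustar format).
NO_DATA_TYPEFLAGS = {"1", "2", "3", "4", "5", "6"}


def data_records(typeflag: str, size: int) -> int:
    """Number of 512-octet data records that follow a ustar header."""
    if typeflag in NO_DATA_TYPEFLAGS:
        return 0
    # Integer division discards the fraction, as the text requires.
    return (size + 511) // 512
```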
1314
1315 The mtime field shall be the modification time of the file at the time
1316 it was archived. It is the ISO/IEC 646:1991 standard representation of
1317 the octal value of the modification time obtained from the stat() func‐
1318 tion.
1319
1320 The chksum field shall be the ISO/IEC 646:1991 standard IRV representa‐
1321 tion of the octal value of the simple sum of all octets in the header
1322 logical record. Each octet in the header shall be treated as an
1323 unsigned value. These values shall be added to an unsigned integer,
1324 initialized to zero, the precision of which is not less than 17 bits.
1325 When calculating the checksum, the chksum field is treated as if it
1326 were all spaces.
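As an illustration, the checksum can be computed in Python as shown below; the function name ustar_checksum is hypothetical. The largest possible sum is 512 x 255 = 130560, which is why a precision of 17 bits is sufficient.

```python
def ustar_checksum(record: bytes) -> int:
    """Simple sum of the octets of a header record, as unsigned values."""
    if len(record) != 512:
        raise ValueError("a header logical record is 512 octets")
    header = bytearray(record)
    # The chksum field (offset 148, length 8) is treated as all spaces.
    header[148:156] = b" " * 8
    return sum(header)
```

A header record consisting entirely of zero octets gives 8 x 32 = 256, the contribution of the eight spaces alone.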
1327
1328 The typeflag field specifies the type of file archived. If a particular
1329 implementation does not recognize the type, or the user does not have
1330 appropriate privilege to create that type, the file shall be extracted
1331 as if it were a regular file if the file type is defined to have a
1332 meaning for the size field that could cause data logical records to be
1333 written on the medium (see the previous description for size). If con‐
1334 version to a regular file occurs, the pax utility shall produce an
1335 error indicating that the conversion took place. All of the typeflag
1336 fields shall be coded in the ISO/IEC 646:1991 standard IRV:
1337
1338 0 Represents a regular file. For backwards-compatibility, a type‐
1339 flag value of binary zero ( '\0' ) should be recognized as mean‐
1340 ing a regular file when extracting files from the archive. Ar‐
1341 chives written with this version of the archive file format cre‐
1342 ate regular files with a typeflag value of the ISO/IEC 646:1991
1343 standard IRV '0' .
1344
1345 1 Represents a file linked to another file, of any type, previ‐
1346 ously archived. Such files are identified by each file having
1347 the same device and file serial number. The linked-to name is
1348 specified in the linkname field with a NUL-character terminator
1349 if it is less than 100 octets in length.
1350
1351 2 Represents a symbolic link. The contents of the symbolic link
1352 shall be stored in the linkname field.
1353
1354 3,4 Represent character special files and block special files
1355 respectively. In this case the devmajor and devminor fields
1356 shall contain information defining the device, the format of
1357 which is unspecified by this volume of IEEE Std 1003.1-2001.
1358 Implementations may map the device specifications to their own
1359 local specification or may ignore the entry.
1360
1361 5 Specifies a directory or subdirectory. On systems where disk
1362 allocation is performed on a directory basis, the size field
1363 shall contain the maximum number of octets (which may be rounded
1364 to the nearest disk block allocation unit) that the directory
1365 may hold. A size field of zero indicates no such limiting. Sys‐
1366 tems that do not support limiting in this manner should ignore
1367 the size field.
1368
1369 6 Specifies a FIFO special file. Note that the archiving of a FIFO
1370 file archives the existence of this file and not its contents.
1371
1372 7 Reserved to represent a file to which an implementation has
1373 associated some high-performance attribute. Implementations
1374 without such extensions should treat this file as a regular file
1375 (type 0).
1376
A-Z The letters 'A' to 'Z', inclusive, are reserved for custom
implementations. All other values are reserved for future versions
of IEEE Std 1003.1-2001.
1380
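The typeflag values listed above can be summarized as a lookup. The following is a minimal sketch in Python, not part of the standard; the names TYPEFLAG_NAMES and classify_typeflag and the returned labels are hypothetical and chosen only for illustration.

```python
# Hypothetical helper: map a one-octet ustar typeflag to a descriptive label.
TYPEFLAG_NAMES = {
    "0": "regular file",
    "1": "hard link",
    "2": "symbolic link",
    "3": "character special",
    "4": "block special",
    "5": "directory",
    "6": "FIFO",
    "7": "reserved (high-performance attribute)",
}


def classify_typeflag(flag: bytes) -> str:
    """Return a label for a one-octet typeflag field."""
    if len(flag) != 1:
        raise ValueError("typeflag must be exactly one octet")
    if flag == b"\0":
        # Binary zero is accepted as a regular file for backwards compatibility.
        return "regular file"
    char = flag.decode("ascii", errors="replace")
    if char in TYPEFLAG_NAMES:
        return TYPEFLAG_NAMES[char]
    if "A" <= char <= "Z":
        # Letters A to Z are reserved for custom implementations.
        return "implementation-specific"
    return "reserved for future use"
```

An implementation without high-performance extensions would treat the label returned for '7' the same way as a regular file.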
1381
1382 Attempts to archive a socket using ustar interchange format shall pro‐
1383 duce a diagnostic message. Handling of other file types is implementa‐
1384 tion-defined.
1385
1386 The magic field is the specification that this archive was output in
1387 this archive format. If this field contains ustar (the five characters
1388 from the ISO/IEC 646:1991 standard IRV shown followed by NUL), the
1389 uname and gname fields shall contain the ISO/IEC 646:1991 standard IRV
1390 representation of the owner and group of the file, respectively (trun‐
1391 cated to fit, if necessary). When the file is restored by a privileged,
1392 protection-preserving version of the utility, the user and group data‐
1393 bases shall be scanned for these names. If found, the user and group
1394 IDs contained within these files shall be used rather than the values
1395 contained within the uid and gid fields.
1396
1397 cpio Interchange Format
1398 The octet-oriented cpio archive format shall be a series of entries,
1399 each comprising a header that describes the file, the name of the file,
1400 and then the contents of the file.
1401
1402 An archive may be recorded as a series of fixed-size blocks of octets.
1403 This blocking shall be used only to make physical I/O more efficient.
1404 The last group of blocks shall always be at the full size.
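
The full-size requirement for the last group of blocks amounts to padding the archive to a multiple of the block size. A minimal sketch in Python follows; the function name pad_to_block is hypothetical, and filling with NUL octets is an assumption, since the content of the padding is not specified here.

```python
def pad_to_block(archive: bytes, blocksize: int = 5120) -> bytes:
    """Pad archive data so that its length is a multiple of blocksize."""
    if blocksize <= 0:
        raise ValueError("blocksize must be positive")
    # Number of octets missing from the final block (0 if already full).
    shortfall = -len(archive) % blocksize
    # Assumption: NUL fill; any octet value would satisfy the size rule.
    return archive + b"\0" * shortfall
```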
1405
1406 For the octet-oriented cpio archive format, the individual entry infor‐
1407 mation shall be in the order indicated and described by the following
1408 table; see also the <cpio.h> header.
1409
1410 Table: Octet-Oriented cpio Archive Entry
1411
1412 Header Field Name Length (in Octets) Interpreted as
1413 c_magic 6 Octal number
1414 c_dev 6 Octal number
1415 c_ino 6 Octal number
1416 c_mode 6 Octal number
1417 c_uid 6 Octal number
1418 c_gid 6 Octal number
1419 c_nlink 6 Octal number
1420 c_rdev 6 Octal number
1421 c_mtime 11 Octal number
1422 c_namesize 6 Octal number
1423 c_filesize 11 Octal number
1424 Filename Field Name Length Interpreted as
1425 c_name c_namesize Pathname string
1426 File Data Field Name Length Interpreted as
1427 c_filedata c_filesize Data
1428
1429 cpio Header
1430 For each file in the archive, a header as defined previously shall be
1431 written. The information in the header fields is written as streams of
1432 the ISO/IEC 646:1991 standard characters interpreted as octal numbers.
The octal numbers shall be extended to the necessary length by adding
ISO/IEC 646:1991 standard IRV zeros at the most-significant-digit end
of the number (that is, as leading zeros); the result is written to the
stream of octets most-significant digit first. The fields shall be interpreted as
1437 follows:
1438
1439 c_magic
Identifies the archive as being a transportable archive by
containing the identifying value "070707".
1442
1443 c_dev, c_ino
1444 Contains values that uniquely identify the file within the ar‐
1445 chive (that is, no files contain the same pair of c_dev and
1446 c_ino values unless they are links to the same file). The values
1447 shall be determined in an unspecified manner.
1448
1449 c_mode Contains the file type and access permissions as defined in the
1450 following table.
1451
1452 Table: Values for cpio c_mode Field
1453
1454 File Permissions Name Value Indicates
1455 C_IRUSR 000400 Read by owner
1456 C_IWUSR 000200 Write by owner
1457 C_IXUSR 000100 Execute by owner
1458 C_IRGRP 000040 Read by group
1459 C_IWGRP 000020 Write by group
1460 C_IXGRP 000010 Execute by group
1461 C_IROTH 000004 Read by others
1462 C_IWOTH 000002 Write by others
1463 C_IXOTH 000001 Execute by others
1464 C_ISUID 004000 Set uid
1465 C_ISGID 002000 Set gid
1466 C_ISVTX 001000 Reserved
1467 File Type Name Value Indicates
1468 C_ISDIR 040000 Directory
1469 C_ISFIFO 010000 FIFO
1470 C_ISREG 0100000 Regular file
1471 C_ISLNK 0120000 Symbolic link
1472 C_ISBLK 060000 Block special file
1473 C_ISCHR 020000 Character special file
1474 C_ISSOCK 0140000 Socket
1475 C_ISCTG 0110000 Reserved
1476
1477 Directories, FIFOs, symbolic links, and regular files shall be sup‐
1478 ported on a system conforming to this volume of IEEE Std 1003.1-2001;
1479 additional values defined previously are reserved for compatibility
1480 with existing systems. Additional file types may be supported; how‐
1481 ever, such files should not be written to archives intended to be
1482 transported to other systems.
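
The c_mode table can be applied as in the following Python sketch, which splits a 6-octet c_mode field into a file type and a permission string. The names C_ISFMT, FILE_TYPES, and decode_c_mode are hypothetical; in particular, the file-type mask 0170000 is an assumption based on common practice and is not named in the table. The set-uid, set-gid, and C_ISVTX bits are ignored here for brevity.

```python
C_ISFMT = 0o170000  # assumed file-type mask; not named in the table

FILE_TYPES = {
    0o040000: "directory",
    0o010000: "FIFO",
    0o100000: "regular file",
    0o120000: "symbolic link",
    0o060000: "block special file",
    0o020000: "character special file",
    0o140000: "socket",
    0o110000: "reserved",
}


def decode_c_mode(field: bytes):
    """Split a 6-octet c_mode field into (file type, permission string)."""
    mode = int(field.decode("ascii"), 8)
    file_type = FILE_TYPES.get(mode & C_ISFMT, "unknown")
    perms = ""
    for shift in (6, 3, 0):  # owner, group, others
        bits = (mode >> shift) & 0o7
        perms += "r" if bits & 4 else "-"
        perms += "w" if bits & 2 else "-"
        perms += "x" if bits & 1 else "-"
    return file_type, perms
```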
1483
1484 c_uid Contains the user ID of the owner.
1485
1486 c_gid Contains the group ID of the group.
1487
1488 c_nlink
1489 Contains the number of links referencing the file at the time
1490 the archive was created.
1491
1492 c_rdev Contains implementation-defined information for character or
1493 block special files.
1494
1495 c_mtime
1496 Contains the latest time of modification of the file at the time
1497 the archive was created.
1498
1499 c_namesize
1500 Contains the length of the pathname, including the terminating
1501 NUL character.
1502
1503 c_filesize
1504 Contains the length of the file in octets. This shall be the
1505 length of the data section following the header structure.
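
The header layout described above (eleven zero-padded octal fields, 76 octets in total) can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation; the names CPIO_FIELDS, pack_header, and unpack_header are hypothetical.

```python
# (field name, width in octets), in the order given by the entry table.
CPIO_FIELDS = [
    ("c_magic", 6), ("c_dev", 6), ("c_ino", 6), ("c_mode", 6),
    ("c_uid", 6), ("c_gid", 6), ("c_nlink", 6), ("c_rdev", 6),
    ("c_mtime", 11), ("c_namesize", 6), ("c_filesize", 11),
]
HEADER_SIZE = sum(width for _, width in CPIO_FIELDS)  # 76 octets


def pack_header(values: dict) -> bytes:
    """Encode integer field values as zero-padded octal text."""
    out = b""
    for name, width in CPIO_FIELDS:
        value = values[name]
        if value < 0:
            raise ValueError(f"{name} must not be negative")
        text = format(value, "o").rjust(width, "0")  # leading zeros
        if len(text) > width:
            raise ValueError(f"{name} does not fit in {width} octal digits")
        out += text.encode("ascii")
    return out


def unpack_header(header: bytes) -> dict:
    """Decode a 76-octet header into a dict of integers."""
    if len(header) != HEADER_SIZE:
        raise ValueError(f"header must be exactly {HEADER_SIZE} octets")
    values, offset = {}, 0
    for name, width in CPIO_FIELDS:
        values[name] = int(header[offset:offset + width].decode("ascii"), 8)
        offset += width
    return values
```

The pathname (c_namesize octets, including the terminating NUL) and then the file data (c_filesize octets) would follow the packed header directly.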
1506
1507
1508 cpio Filename
1509 The c_name field shall contain the pathname of the file. The length of
1510 this field in octets is the value of c_namesize.
1511
1512 If a filename is found on the medium that would create an invalid path‐
1513 name, it is implementation-defined whether the data from the file is
1514 stored on the file hierarchy and under what name it is stored.
1515
1516 All characters shall be represented in the ISO/IEC 646:1991 standard
1517 IRV. For maximum portability between implementations, names should be
1518 selected from characters represented by the portable filename character
1519 set as octets with the most significant bit zero. If an implementation
1520 supports the use of characters outside the portable filename character
1521 set in names for files, users, and groups, one or more implementation-
1522 defined encodings of these characters shall be provided for interchange
1523 purposes. However, the pax utility shall never create filenames on the
1524 local system that cannot be accessed via the procedures described pre‐
1525 viously in this volume of IEEE Std 1003.1-2001. If a filename is found
1526 on the medium that would create an invalid filename, it is implementa‐
1527 tion-defined whether the data from the file is stored on the local file
1528 system and under what name it is stored. The pax utility may choose to
1529 ignore these files as long as it produces an error indicating that the
1530 file is being ignored.
1531
1532 cpio File Data
1533 Following c_name, there shall be c_filesize octets of data. Interpreta‐
1534 tion of such data occurs in a manner dependent on the file. If c_file‐
1535 size is zero, no data shall be contained in c_filedata.
1536
1537 When restoring from an archive:
1538
1539 * If the user does not have the appropriate privilege to create a file
1540 of the specified type, pax shall ignore the entry and write an error
1541 message to standard error.
1542
1543 * Only regular files have data to be restored. Presuming a regular
1544 file meets any selection criteria that might be imposed on the for‐
1545 mat-reading utility by the user, such data shall be restored.
1546
1547 * If a user does not have appropriate privilege to set a particular
1548 mode flag, the flag shall be ignored. Some of the mode flags in the
1549 archive format are not mentioned elsewhere in this volume of
1550 IEEE Std 1003.1-2001. If the implementation does not support those
1551 flags, they may be ignored.
1552
1553 cpio Special Entries
1554 FIFO special files, directories, and the trailer shall be recorded with
1555 c_filesize equal to zero. For other special files, c_filesize is
1556 unspecified by this volume of IEEE Std 1003.1-2001. The header for the
1557 next file entry in the archive shall be written directly after the last
1558 octet of the file entry preceding it. A header denoting the filename
1559 TRAILER!!! shall indicate the end of the archive; the contents of
1560 octets in the last block of the archive following such a header are
1561 undefined.
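
Because each header follows the previous entry directly and the archive ends with a TRAILER!!! entry, a reader can walk the archive with simple offset arithmetic. The Python sketch below assumes the 76-octet header layout from the entry table; the name iter_cpio_entries is hypothetical, and error handling is deliberately minimal.

```python
HEADER_SIZE = 76  # total of the header field widths in the entry table


def iter_cpio_entries(archive: bytes):
    """Yield (pathname, data) pairs until the TRAILER!!! entry is reached."""
    offset = 0
    while True:
        header = archive[offset:offset + HEADER_SIZE]
        if len(header) < HEADER_SIZE or header[:6] != b"070707":
            raise ValueError("bad or truncated cpio header")
        # c_namesize occupies octets 59..64, c_filesize octets 65..75.
        namesize = int(header[59:65].decode("ascii"), 8)
        filesize = int(header[65:76].decode("ascii"), 8)
        name_start = offset + HEADER_SIZE
        data_start = name_start + namesize
        data_end = data_start + filesize
        if data_end > len(archive):
            raise ValueError("truncated cpio entry")
        # c_namesize includes the terminating NUL, which is dropped here.
        name = archive[name_start:data_start - 1].decode("ascii")
        if name == "TRAILER!!!":
            return  # octets after the trailer are undefined and ignored
        yield name, archive[data_start:data_end]
        offset = data_end
```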
1562
EXIT STATUS
1564 The following exit values shall be returned:
1565
1566 0 All files were processed successfully.
1567
1568 >0 An error occurred.
1569
1570
CONSEQUENCES OF ERRORS
1572 If pax cannot create a file or a link when reading an archive or cannot
1573 find a file when writing an archive, or cannot preserve the user ID,
1574 group ID, or file mode when the -p option is specified, a diagnostic
1575 message shall be written to standard error and a non-zero exit status
1576 shall be returned, but processing shall continue. In the case where pax
1577 cannot create a link to a file, pax shall not, by default, create a
1578 second copy of the file.
1579
1580 If the extraction of a file from an archive is prematurely terminated
1581 by a signal or error, pax may have only partially extracted the file or
1582 (if the -n option was not specified) may have extracted a file of the
1583 same name as that specified by the user, but which is not the file the
1584 user wanted. Additionally, the file modes of extracted directories may
1585 have additional bits from the S_IRWXU mask set as well as incorrect
1586 modification and access times.
1587
1588 The following sections are informative.
1589
APPLICATION USAGE
1591 The -p (privileges) option was invented to reconcile differences
1592 between historical tar and cpio implementations. In particular, the two
1593 utilities use -m in diametrically opposed ways. The -p option also pro‐
1594 vides a consistent means of extending the ways in which future file
1595 attributes can be addressed, such as for enhanced security systems or
1596 high-performance files. Although it may seem complex, there are really
1597 two modes that are most commonly used:
1598
-p e "Preserve everything". This would be used by the historical
1600 superuser, someone with all the appropriate privileges, to pre‐
1601 serve all aspects of the files as they are recorded in the ar‐
1602 chive. The e flag is the sum of o and p, and other implementa‐
1603 tion-defined attributes.
1604
-p p "Preserve" the file mode bits. This would be used by the user
1606 with regular privileges who wished to preserve aspects of the
1607 file other than the ownership. The file times are preserved by
1608 default, but two other flags are offered to disable these and
1609 use the time of extraction.
1610
1611
1612 The one pathname per line format of standard input precludes pathnames
1613 containing <newline>s. Although such pathnames violate the portable
1614 filename guidelines, they may exist and their presence may inhibit
1615 usage of pax within shell scripts. This problem is inherited from his‐
1616 torical archive programs. The problem can be avoided by listing file‐
1617 name arguments on the command line instead of on standard input.
1618
1619 It is almost certain that appropriate privileges are required for pax
1620 to accomplish parts of this volume of IEEE Std 1003.1-2001. Specifi‐
1621 cally, creating files of type block special or character special,
1622 restoring file access times unless the files are owned by the user (the
1623 -t option), or preserving file owner, group, and mode (the -p option)
1624 all probably require appropriate privileges.
1625
1626 In read mode, implementations are permitted to overwrite files when the
1627 archive has multiple members with the same name. This may fail if per‐
1628 missions on the first version of the file do not permit it to be over‐
1629 written.
1630
1631 The cpio and ustar formats can only support files up to 8589934592
1632 bytes (8 * 2^30) in size.
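
The figure quoted above can be checked with a short calculation. The Python sketch below assumes the limit comes from an 11-digit octal size field, as with c_filesize in the cpio entry table; the names FIELD_DIGITS and fits_in_size_field are hypothetical. Note that under this assumption the largest value that actually fits in the field is one less than the quoted figure.

```python
FIELD_DIGITS = 11  # octal digits available for the size (assumption)

LIMIT = 8 ** FIELD_DIGITS  # first size that no longer fits


def fits_in_size_field(size: int, digits: int = FIELD_DIGITS) -> bool:
    """Return True if size can be written in the given number of octal digits."""
    return 0 <= size < 8 ** digits
```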
1633
EXAMPLES
1635 The following command:
1636
1637
1638 pax -w -f /dev/rmt/1m .
1639
copies the contents of the current directory to tape drive 1, medium
density (assuming historical System V device naming procedures; the
historical BSD device name would be /dev/rmt9).
1643
1644 The following commands:
1645
1646
mkdir newdir
pax -rw olddir newdir
1648
1649 copy the olddir directory hierarchy to newdir.
1650
1651
1652 pax -r -s ',^//*usr//*,,' -f a.pax
1653
1654 reads the archive a.pax, with all files rooted in /usr in the archive
1655 extracted relative to the current directory.
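
The effect of the substitution ',^//*usr//*,,' on individual pathnames can be previewed outside pax. The Python sketch below is only an approximation: it relies on this particular pattern meaning the same thing in Python's re syntax as in a basic regular expression, and the function name strip_usr_prefix is hypothetical.

```python
import re


def strip_usr_prefix(path: str) -> str:
    """Remove a leading /usr/ component, tolerating repeated slashes."""
    # One or more slashes, "usr", one or more slashes, anchored at the start.
    return re.sub(r"^//*usr//*", "", path, count=1)
```

Pathnames that do not begin with /usr/ are left unchanged, so those members keep their original names.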
1656
1657 Using the option:
1658
1659
1660 -o listopt="%M %(atime)T %(size)D %(name)s"
1661
1662 overrides the default output description in Standard Output and instead
1663 writes:
1664
1665
1666 -rw-rw--- Jan 12 15:53 1492 /usr/foo/bar
1667
1668 Using the options:
1669
1670
1671 -o listopt='%L\t%(size)D\n%.7' \
1672 -o listopt='(name)s\n%(ctime)T\n%T'
1673
1674 overrides the default output description in Standard Output and instead
1675 writes:
1676
1677
1678 /usr/foo/bar -> /tmp 1492
1679 /usr/fo
1680 Jan 12 1991
1681 Jan 31 15:53
1682
RATIONALE
1684 The pax utility was new for the ISO POSIX-2:1993 standard. It repre‐
1685 sents a peaceful compromise between advocates of the historical tar and
1686 cpio utilities.
1687
1688 A fundamental difference between cpio and tar was in the way directo‐
1689 ries were treated. The cpio utility did not treat directories differ‐
1690 ently from other files, and to select a directory and its contents
1691 required that each file in the hierarchy be explicitly specified. For
1692 tar, a directory matched every file in the file hierarchy it rooted.
1693
1694 The pax utility offers both interfaces; by default, directories map
1695 into the file hierarchy they root. The -d option causes pax to skip any
file not explicitly referenced, as cpio historically did. The tar-style
behavior was chosen as the default because it was believed that
1698 this was the more common usage and because tar is the more commonly
1699 available interface, as it was historically provided on both System V
1700 and BSD implementations.
1701
1702 The data interchange format specification in this volume of
1703 IEEE Std 1003.1-2001 requires that processes with "appropriate privi‐
1704 leges" shall always restore the ownership and permissions of extracted
1705 files exactly as archived. If viewed from the historic equivalence
1706 between superuser and "appropriate privileges", there are two problems
1707 with this requirement. First, users running as superusers may unknow‐
1708 ingly set dangerous permissions on extracted files. Second, it is need‐
1709 lessly limiting, in that superusers cannot extract files and own them
1710 as superuser unless the archive was created by the superuser. (It
1711 should be noted that restoration of ownerships and permissions for the
1712 superuser, by default, is historical practice in cpio, but not in tar.)
1713 In order to avoid these two problems, the pax specification has an
1714 additional "privilege" mechanism, the -p option. Only a pax invocation
1715 with the privileges needed, and which has the -p option set using the e
1716 specification character, has the "appropriate privilege" to restore
1717 full ownership and permission information.
1718
1719 Note also that this volume of IEEE Std 1003.1-2001 requires that the
1720 file ownership and access permissions shall be set, on extraction, in
1721 the same fashion as the creat() function when provided with the mode
1722 stored in the archive. This means that the file creation mask of the
1723 user is applied to the file permissions.
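
The effect of the file creation mask can be shown with a one-line calculation. The Python sketch below models only the permission bits and assumes no -p option is in effect; the function name effective_mode is hypothetical.

```python
def effective_mode(archived_mode: int, umask: int) -> int:
    """Permission bits a file receives when created with the archived mode."""
    # As with creat(): bits set in the mask are cleared from the mode.
    return archived_mode & 0o777 & ~umask
```

For example, a file archived with mode 666 and extracted under a mask of 022 would be created with mode 644.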
1724
1725 Users should note that directories may be created by pax while extract‐
1726 ing files with permissions that are different from those that existed
1727 at the time the archive was created. When extracting sensitive informa‐
1728 tion into a directory hierarchy that no longer exists, users are
1729 encouraged to set their file creation mask appropriately to protect
1730 these files during extraction.
1731
1732 The table of contents output is written to standard output to facili‐
1733 tate pipeline processing.
1734
1735 An early proposal had hard links displaying for all pathnames. This was
1736 removed because it complicates the output of the case where -v is not
1737 specified and does not match historical cpio usage. The hard-link
1738 information is available in the -v display.
1739
1740 The description of the -l option allows implementations to make hard
1741 links to symbolic links. IEEE Std 1003.1-2001 does not specify any way
1742 to create a hard link to a symbolic link, but many implementations pro‐
1743 vide this capability as an extension. If there are hard links to sym‐
1744 bolic links when an archive is created, the implementation is required
1745 to archive the hard link in the archive (unless -H or -L is specified).
1746 When in read mode and in copy mode, implementations supporting hard
1747 links to symbolic links should use them when appropriate.
1748
1749 The archive formats inherited from the POSIX.1-1990 standard have cer‐
1750 tain restrictions that have been brought along from historical usage.
1751 For example, there are restrictions on the length of pathnames stored
in the archive. When pax is used in copy (-rw) mode (copying directory
1753 hierarchies), the ability to use extensions from the -x pax format
1754 overcomes these restrictions.
1755
1756 The default blocksize value of 5120 bytes for cpio was selected because
1757 it is one of the standard block-size values for cpio, set when the -B
1758 option is specified. (The other default block-size value for cpio is
1759 512 bytes, and this was considered to be too small.) The default block
1760 value of 10240 bytes for tar was selected because that is the standard
1761 block-size value for BSD tar. The maximum block size of 32256 bytes
1762 (2**15-512 bytes) is the largest multiple of 512 bytes that fits into a
1763 signed 16-bit tape controller transfer register. There are known limi‐
1764 tations in some historical systems that would prevent larger blocks
from being accepted. Historical values were chosen to improve
compatibility with historical scripts using dd or similar utilities to
manipulate archives. Also, default block sizes for any file type other
than character special file have been deleted from this volume of
IEEE Std 1003.1-2001 as unimportant and not likely to affect the
structure of the resulting archive.
1771
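The block-size figures in the preceding paragraph can be verified with a short calculation. This Python sketch only restates the arithmetic; the constant names are hypothetical.

```python
INT16_MAX = 2 ** 15 - 1  # largest value of a signed 16-bit register
BLOCK_UNIT = 512         # block sizes are multiples of 512 bytes

# Largest multiple of 512 that does not exceed the signed 16-bit maximum.
MAX_BLOCKSIZE = (INT16_MAX // BLOCK_UNIT) * BLOCK_UNIT

CPIO_DEFAULT = 10 * BLOCK_UNIT  # 5120 bytes
TAR_DEFAULT = 20 * BLOCK_UNIT   # 10240 bytes
```
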
1772 Implementations are permitted to modify the block-size value based on
1773 the archive format or the device to which the archive is being written.
1774 This is to provide implementations with the opportunity to take advan‐
1775 tage of special types of devices, and it should not be used without a
1776 great deal of consideration as it almost certainly decreases archive
1777 portability.
1778
1779 The intended use of the -n option was to permit extraction of one or
1780 more files from the archive without processing the entire archive. This
1781 was viewed by the standard developers as offering significant perfor‐
1782 mance advantages over historical implementations. The -n option in
1783 early proposals had three effects; the first was to cause special char‐
1784 acters in patterns to not be treated specially. The second was to cause
1785 only the first file that matched a pattern to be extracted. The third
1786 was to cause pax to write a diagnostic message to standard error when
1787 no file was found matching a specified pattern. Only the second behav‐
1788 ior is retained by this volume of IEEE Std 1003.1-2001, for many rea‐
1789 sons. First, it is in general not acceptable for a single option to
1790 have multiple effects. Second, the ability to make pattern matching
1791 characters act as normal characters is useful for parts of pax other
1792 than file extraction. Third, a finer degree of control over the spe‐
1793 cial characters is useful because users may wish to normalize only a
1794 single special character in a single filename. Fourth, given a more
1795 general escape mechanism, the previous behavior of the -n option can be
1796 easily obtained using the -s option or a sed script. Finally, writing
1797 a diagnostic message when a pattern specified by the user is unmatched
1798 by any file is useful behavior in all cases.
1799
1800 In this version, the -n was removed from the copy mode synopsis of pax;
1801 it is inapplicable because there are no pattern operands specified in
1802 this mode.
1803
IEEE Std 1003.1-2001 also describes a method other than pax for copying
subtrees, as part of the cp utility. Both methods
1806 are historical practice: cp provides a simpler, more intuitive inter‐
1807 face, while pax offers a finer granularity of control. Each provides
1808 additional functionality to the other; in particular, pax maintains the
1809 hard-link structure of the hierarchy while cp does not. It is the
1810 intention of the standard developers that the results be similar (using
1811 appropriate option combinations in both utilities). The results are not
1812 required to be identical; there seemed insufficient gain to applica‐
1813 tions to balance the difficulty of implementations having to guarantee
1814 that the results would be exactly identical.
1815
1816 A single archive may span more than one file. It is suggested that
1817 implementations provide informative messages to the user on standard
1818 error whenever the archive file is changed.
1819
1820 The -d option (do not create intermediate directories not listed in the
1821 archive) found in early proposals was originally provided as a comple‐
1822 ment to the historic -d option of cpio. It has been deleted.
1823
1824 The -s option in early proposals specified a subset of the substitution
1825 command from the ed utility. As there was no reason for only a subset
1826 to be supported, the -s option is now compatible with the current ed
1827 specification. Since the delimiter can be any non-null character, the
1828 following usage with single spaces is valid:
1829
1830
1831 pax -s " foo bar " ...
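
The role of the delimiter can be illustrated by splitting a replstr into its parts. The Python sketch below is a simplification that does not handle escaped delimiters inside the pattern or replacement; the function name split_replstr is hypothetical.

```python
def split_replstr(replstr: str):
    """Split a -s argument into (old, new, flags) using its first character."""
    if len(replstr) < 2:
        raise ValueError("replstr is too short")
    delim = replstr[0]  # any non-null character may serve as the delimiter
    parts = replstr[1:].split(delim)
    if len(parts) != 3:
        # Simplification: escaped delimiters are not supported in this sketch.
        raise ValueError("expected exactly three delimiters")
    old, new, flags = parts
    return old, new, flags
```

With a single space as the delimiter, " foo bar " splits into the pattern foo, the replacement bar, and no flags.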
1832
1833 The -t description is worded so as to note that this may cause the
1834 access time update caused by some other activity (which occurs while
1835 the file is being read) to be overwritten.
1836
1837 The default behavior of pax with regard to file modification times is
1838 the same as historical implementations of tar. It is not the historical
1839 behavior of cpio.
1840
1841 Because the -i option uses /dev/tty, utilities without a controlling
1842 terminal are not able to use this option.
1843
1844 The -y option, found in early proposals, has been deleted because a
1845 line containing a single period for the -i option has equivalent func‐
1846 tionality. The special lines for the -i option (a single period and the
1847 empty line) are historical practice in cpio.
1848
1849 In early drafts, a -e charmap option was included to increase portabil‐
1850 ity of files between systems using different coded character sets. This
1851 option was omitted because it was apparent that consensus could not be
1852 formed for it. In this version, the use of UTF-8 should be an adequate
1853 substitute.
1854
1855 The -k option was added to address international concerns about the
1856 dangers involved in the character set transformations of -e (if the
1857 target character set were different from the source, the filenames
1858 might be transformed into names matching existing files) and also was
1859 made more general to protect files transferred between file systems
1860 with different {NAME_MAX} values (truncating a filename on a smaller
1861 system might also inadvertently overwrite existing files). As stated,
1862 it prevents any overwriting, even if the target file is older than the
1863 source. This version adds more granularity of options to solve this
problem by introducing the -o invalid= option, specifically the UTF-8
1865 action. (Note that an existing file that is named with a UTF-8 encoding
1866 is still subject to overwriting in this case. The -k option closes that
1867 loophole.)
1868
1869 Some of the file characteristics referenced in this volume of
1870 IEEE Std 1003.1-2001 might not be supported by some archive formats.
1871 For example, neither the tar nor cpio formats contain the file access
1872 time. For this reason, the e specification character has been provided,
1873 intended to cause all file characteristics specified in the archive to
1874 be retained.
1875
1876 It is required that extracted directories, by default, have their
1877 access and modification times and permissions set to the values speci‐
1878 fied in the archive. This has obvious problems in that the directories
1879 are almost certainly modified after being extracted and that directory
1880 permissions may not permit file creation. One possible solution is to
1881 create directories with the mode specified in the archive, as modified
1882 by the umask of the user, with sufficient permissions to allow file
1883 creation. After all files have been extracted, pax would then reset the
1884 access and modification times and permissions as necessary.
1885
1886 The list-mode formatting description borrows heavily from the one
1887 defined by the printf utility. However, since there is no separate op‐
1888 erand list to get conversion arguments, the format was extended to
1889 allow specifying the name of the conversion argument as part of the
1890 conversion specification.
1891
1892 The T conversion specifier allows time fields to be displayed in any of
1893 the date formats. Unlike the ls utility, pax does not adjust the format
1894 when the date is less than six months in the past. This makes parsing
1895 the output more predictable.
1896
1897 The D conversion specifier handles the ability to display the
1898 major/minor or file size, as with ls, by using %-8(size)D.
1899
1900 The L conversion specifier handles the ls display for symbolic links.
1901
1902 Conversion specifiers were added to generate existing known types used
1903 for ls.
1904
1905 pax Interchange Format
1906 The new POSIX data interchange format was developed primarily to sat‐
1907 isfy international concerns that the ustar and cpio formats did not
1908 provide for file, user, and group names encoded in characters outside a
1909 subset of the ISO/IEC 646:1991 standard. The standard developers real‐
1910 ized that this new POSIX data interchange format should be very exten‐
1911 sible because there were other requirements they foresaw in the near
1912 future:
1913
1914 * Support international character encodings and locale information
1915
1916 * Support security information (ACLs, and so on)
1917
1918 * Support future file types, such as realtime or contiguous files
1919
1920 * Include data areas for implementation use
1921
1922 * Support systems with words larger than 32 bits and timers with sub‐
1923 second granularity
1924
1925 The following were not goals for this format because these are better
1926 handled by separate utilities or are inappropriate for a portable for‐
1927 mat:
1928
1929 * Encryption
1930
1931 * Compression
1932
1933 * Data translation between locales and codesets
1934
1935 * inode storage
1936
1937 The format chosen to support the goals is an extension of the ustar
1938 format. Of the two formats previously available, only the ustar format
1939 was selected for extensions because:
1940
1941 * It was easier to extend in an upwards-compatible way. It offered
1942 version flags and header block type fields with room for future
1943 standardization. The cpio format, while possessing a more flexible
1944 file naming methodology, could not be extended without breaking some
1945 theoretical implementation or using a dummy filename that could be a
1946 legitimate filename.
1947
* Industry experience since the original "tar wars" fought in
developing the ISO POSIX-1 standard has clearly been in favor of the
1950 ustar format, which is generally the default output format selected
1951 for pax implementations on new systems.
1952
1953 The new format was designed with one additional goal in mind: reason‐
1954 able behavior when an older tar or pax utility happened to read an ar‐
1955 chive. Since the POSIX.1-1990 standard mandated that a "format-reading
1956 utility" had to treat unrecognized typeflag values as regular files,
1957 this allowed the format to include all the extended information in a
1958 pseudo-regular file that preceded each real file. An option is given
1959 that allows the archive creator to set up reasonable names for these
1960 files on the older systems. Also, the normative text suggests that rea‐
1961 sonable file access values be used for this ustar header block. Making
1962 these header files inaccessible for convenient reading and deleting
1963 would not be reasonable. File permissions of 600 or 700 are suggested.
1964
1965 The ustar typeflag field was used to accommodate the additional func‐
1966 tionality of the new format rather than magic or version because the
1967 POSIX.1-1990 standard (and, by reference, the previous version of pax),
1968 mandated the behavior of the format-reading utility when it encountered
1969 an unknown typeflag, but was silent about the other two fields.
1970
1971 Early proposals of the first revision to IEEE Std 1003.1-2001 contained
1972 a proposed archive format that was based on compatibility with the
1973 standard for tape files (ISO 1001, similar to the format used histori‐
1974 cally on many mainframes and minicomputers). This format was overly
1975 complex and required considerable overhead in volume and header
1976 records. Furthermore, the standard developers felt that it would not be
1977 acceptable to the community of POSIX developers, so it was later
1978 changed to be a format more closely related to historical practice on
1979 POSIX systems.
1980
1981 The prefix and name split of pathnames in ustar was replaced by the
1982 single path extended header record for simplicity.
1983
The concept of a global extended header (typeflag g) was
controversial. If this were applied to an archive being recorded on magnetic
1986 tape, a few unreadable blocks at the beginning of the tape could be a
1987 serious problem; a utility attempting to extract as many files as pos‐
1988 sible from a damaged archive could lose a large percentage of file
1989 header information in this case. However, if the archive were on a
1990 reliable medium, such as a CD-ROM, the global extended header offers
1991 considerable potential size reductions by eliminating redundant infor‐
1992 mation. Thus, the text warns against using the global method for unre‐
1993 liable media and provides a method for implanting global information in
1994 the extended header for each file, rather than in the typeflag g
1995 records.
1996
1997 No facility for data translation or filtering on a per-file basis is
1998 included because the standard developers could not invent an interface
1999 that would allow this in an efficient manner. If a filter, such as
2000 encryption or compression, is to be applied to all the files, it is
2001 more efficient to apply the filter to the entire archive as a single
2002 file. The standard developers considered interfaces that would invoke a
2003 shell script for each file going into or out of the archive, but the
2004 system overhead in this approach was considered to be too high.
2005
2006 One such approach would be to have filter= records that give a pathname
2007 for an executable. When the program is invoked, the file and archive
2008 would be open for standard input/output and all the header fields would
2009 be available as environment variables or command-line arguments. The
2010 standard developers did discuss such schemes, but they were omitted
2011 from IEEE Std 1003.1-2001 due to concerns about excessive overhead.
2012 Also, the program itself would need to be in the archive if it were to
2013 be used portably.
2014
2015 There is currently no portable means of identifying the character
2016 set(s) used for a file in the file system. Therefore, pax has not been
2017 given a mechanism to generate charset records automatically. The only
2018 portable means of doing this is for the user to write the archive using
2019 the -o charset=string command-line option. This assumes that all of
2020 the files in the archive use the same encoding. The "implementation-
2021 defined" text is included to allow for a system that can identify the
2022 encodings used for each of its files.
2023
2024 The table of standards that accompanies the charset record description
2025 is acknowledged to be very limited. Only a limited number of character
2026 set standards is reasonable for maximal interchange. Any character set
2027 is, of course, possible by prior agreement. It was suggested that
2028 EBCDIC be listed, but it was omitted because it is not defined by a
2029 formal standard. Formal standards, and then only those with reasonably
2030 large followings, can be included here, simply as a matter of practi‐
2031 cality. The <value>s represent names of officially registered character
2032 sets in the format required by the ISO 2375:1985 standard.
2033
2034 The normal comma or <blank>-separated list rules are not followed in
2035 the case of keyword options to allow ease of argument parsing for
2036 getopts.
2037
2038 Further information on character encodings is in pax Archive Character
2039 Set Encoding/Decoding.
2040
2041 The standard developers have reserved keyword name space for vendor
2042 extensions. It is suggested that the format to be used is:
2043
2044
2045 VENDOR.keyword
2046
2047 where VENDOR is the name of the vendor or organization in all uppercase
2048 letters. It is further suggested that the keyword following the period
2049 be named differently than any of the standard keywords so that it could
2050 be used for future standardization, if appropriate, by omitting the
2051 VENDOR prefix.
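
As an illustration only, the following Python sketch forms a vendor keyword and wraps it in an extended header record. It assumes the "%d %s=%s\n" record layout given in the pax extended header description, where the leading decimal length counts the whole record, including its own digits and the trailing <newline>. ACME, vendor_keyword(), and make_record() are hypothetical names chosen for the example; they are not part of any standard interface.

```python
def vendor_keyword(vendor: str, keyword: str) -> str:
    """Return 'VENDOR.keyword', insisting on an all-uppercase vendor name."""
    if not vendor or vendor != vendor.upper():
        raise ValueError("vendor name must be all uppercase")
    return f"{vendor}.{keyword}"


def make_record(keyword: str, value: str) -> bytes:
    """Build one record of the form b'<length> <keyword>=<value>\\n'."""
    body = f" {keyword}={value}\n".encode("utf-8")
    # The length field counts its own digits, so repeat until it is stable.
    length = len(body)
    while True:
        total = len(str(length)) + len(body)
        if total == length:
            break
        length = total
    return str(length).encode("ascii") + body


# Hypothetical vendor "ACME" recording a private comment keyword.
record = make_record(vendor_keyword("ACME", "comment"), "hello")
```

Dropping the vendor prefix later, should the keyword be standardized, changes only the first argument to make_record(); the length is recomputed from the record contents.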
2052
2053 The <length> field in the extended header record was included to make
2054 it simpler to step through the records, even if a record contains an
2055 unknown format (to a particular pax) with complex interactions of spe‐
2056 cial characters. It also provides a minor integrity checkpoint within
2057 the records to aid a program attempting to recover files from a damaged
2058 archive.
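
A minimal Python sketch of how the <length> field lets a reader step from record to record without understanding the values is shown here. It assumes the "%d %s=%s\n" record layout from the extended header description; iter_records() is a hypothetical helper, and the consistency checks are one possible use of the "integrity checkpoint" mentioned above, not required behavior.

```python
def iter_records(data: bytes):
    """Yield (keyword, value) pairs from extended header data."""
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)        # end of the decimal length
        length = int(data[pos:space])        # length of the whole record
        record = data[pos:pos + length]
        header_len = space - pos + 1         # digits plus the separating space
        if (length <= header_len or len(record) != length
                or not record.endswith(b"\n")):
            raise ValueError(f"damaged record at offset {pos}")
        keyword, _, value = record[header_len:-1].partition(b"=")
        yield keyword.decode("utf-8"), value.decode("utf-8")
        pos += length                        # jump straight to the next record


records = list(iter_records(b"22 ACME.comment=hello\n12 uid=1000\n"))
```

A reader that does not recognize ACME.comment can simply ignore the pair it receives; the position of the next record never depends on the contents of the value.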
2059
2060 There are no extended header versions of the devmajor and devminor
2061 fields because the unspecified format ustar header field should be suf‐
2062 ficient. If they are not, vendor-specific extended keywords (such as
2063 VENDOR.devmajor) should be used.
2064
2065 Device and i-number labeling of files was not adopted from cpio; files
2066 are interchanged strictly on a symbolic name basis, as in ustar.
2067
2068 Just as with the ustar format descriptions, the new format makes no
2069 special arrangements for multi-volume archives. Each of the pax archive
2070 types is assumed to be inside a single POSIX file and splitting that
2071 file over multiple volumes (diskettes, tape cartridges, and so on),
2072 processing their labels, and mounting each in the proper sequence are
2073 considered to be implementation details that cannot be described
2074 portably.
2075
2076 The pax format is intended for interchange, not only for backup on a
2077 single (family of) systems. It is not as densely packed as might be
2078 possible for backup:
2079
2080 * It contains information as coded characters that could be coded in
2081 binary.
2082
2083 * It identifies extended records with name fields that could be omit‐
2084 ted in favor of a fixed-field layout.
2085
2086 * It translates names into a portable character set and identifies
2087 locale-related information, both of which are probably unnecessary
2088 for backup.
2089
2090 The requirements on restoring from an archive differ slightly from the
2091 historical wording so that a system with non-monolithic privileges can
2092 restore as much as possible. In particular, attributes such as
2093 "high performance file" might be broadly but not universally granted
2094 while set-user-ID or chown() might be much more restricted. There is
2095 no implication in IEEE Std 1003.1-2001 that the security information be
2096 honored after it is restored to the file hierarchy, in spite of what
2097 might be improperly inferred by the silence on that topic. That is a
2098 topic for another standard.
2099
2100 Links are recorded in the fashion described here because a link can be
2101 to any file type. It is desirable in general to be able to restore part
2102 of an archive selectively and restore all of those files completely. If
2103 the data is not associated with each link, it is not possible to do
2104 this. However, the data associated with a file can be large, and when
2105 selective restoration is not needed, this can be a significant burden.
2106 The archive is structured so that files that have no associated data
2107 can always be restored by the name of any link, and
2108 the user may choose whether data is recorded with each instance of a
2109 file that contains data. The format permits mixing of both types of
2110 links in a single archive; this can be done for special needs, and pax
2111 is expected to interpret such archives on input properly, despite the
2112 fact that there is no pax option that would force this mixed case on
2113 output. (When -o linkdata is used, the output must contain the dupli‐
2114 cate data, but the implementation is free to include it or omit it when
2115 -o linkdata is not used.)
2116
2117 The time values are included as extended header records for those
2118 implementations needing more than the eleven octal digits allowed by
2119 the ustar format. Portable file timestamps cannot be negative. If pax
2120 encounters a file with a negative timestamp in copy or write mode, it
2121 can reject the file, substitute a non-negative timestamp, or generate a
2122 non-portable timestamp with a leading '-'. Even though some implemen‐
2123 tations can support finer file-time granularities than seconds, the
2124 normative text requires support only for seconds since the Epoch
2125 because the ISO POSIX-1 standard states them that way. The ustar format
2126 includes only mtime; the new format adds atime and ctime for symmetry.
2127 The atime access time restored to the file system will be affected by
2128 the -p a and -p e options. The ctime creation time (actually inode
2129 modification time) is described with "appropriate privilege" so that it
2130 can be ignored when writing to the file system. POSIX does not provide
2131 a portable means to change file creation time. Nothing is intended to
2132 prevent a non-portable implementation of pax from restoring the value.
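
The choice between the ustar field and an extended header record can be sketched in Python as follows. This is an illustration under stated assumptions: extended_mtime() and its on_negative policy names are hypothetical, and the three policies merely mirror the options listed above (reject, substitute, or emit a non-portable leading '-').

```python
USTAR_MTIME_MAX = 0o77777777777  # largest value in eleven octal digits


def extended_mtime(seconds: int, on_negative: str = "reject"):
    """Return the mtime extended record value, or None if ustar suffices."""
    if seconds < 0:
        if on_negative == "reject":
            raise ValueError("negative timestamps are not portable")
        if on_negative == "substitute":
            seconds = 0                  # replace with a non-negative time
        elif on_negative == "keep":
            return str(seconds)          # non-portable, leading '-'
        else:
            raise ValueError(f"unknown policy: {on_negative}")
    if seconds <= USTAR_MTIME_MAX:
        return None                      # fits the ustar header field
    return str(seconds)                  # needs an extended header record
```

For example, extended_mtime(USTAR_MTIME_MAX) needs no extended record, while a value one second later does.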
2133
2134 The gid, size, and uid extended header records were included to allow
2135 expansion beyond the sizes specified in the regular tar header. New
2136 file system architectures are emerging that will exhaust the 12-digit
2137 size field. There are probably not many systems requiring more than 8
2138 digits for user and group IDs, but the extended header values were
2139 included for completeness, allowing overrides for all of the decimal
2140 values in the tar header.
2141
2142 The standard developers intended to describe the effective results of
2143 pax with regard to file ownerships and permissions; implementations are
2144 not restricted in timing or sequencing the restoration of such, pro‐
2145 vided the results are as specified.
2146
2147 Much of the text describing the extended headers refers to use in
2148 "write or copy modes". The copy mode references are due to the normative
2149 text: "The effect of the copy shall be as if the copied files were
2150 written to an archive file and then subsequently extracted ...". There
2151 is certainly no way to test whether pax is actually generating the
2152 extended headers in copy mode, but the effects must be as if it had.
2153
2154 pax Archive Character Set Encoding/Decoding
2155 There is a need to exchange archives of files between systems of dif‐
2156 ferent native codesets. Filenames, group names, and user names must be
2157 preserved to the fullest extent possible when an archive is read on the
2158 receiving platform. Translation of the contents of files is not within
2159 the scope of the pax utility.
2160
2161 There will also be the need to represent characters that are not avail‐
2162 able on the receiving platform. These unsupported characters cannot be
2163 automatically folded to the local set of characters due to the chance
2164 of collisions. This could result in overwriting previously extracted
2165 files from the archive or pre-existing files on the system.
2166
2167 For these reasons, the codeset used to represent characters within the
2168 extended header records of the pax archive must be sufficiently rich to
2169 handle all commonly used character sets. The fields requiring transla‐
2170 tion include, at a minimum, filenames, user names, group names, and
2171 link pathnames. Implementations may wish to have localized extended
2172 keywords that use non-portable characters.
2173
2174 The standard developers considered the following options:
2175
2176 * The archive creator specifies the well-defined name of the source
2177 codeset. The receiver must then recognize the codeset name and per‐
2178 form the appropriate translations to the destination codeset.
2179
2180 * The archive creator includes within the archive the character map‐
2181 ping table for the source codeset used to encode extended header
2182 records. The receiver must then read the character mapping table and
2183 perform the appropriate translations to the destination codeset.
2184
2185 * The archive creator translates the extended header records in the
2186 source codeset into a canonical form. The receiver must then perform
2187 the appropriate translations to the destination codeset.
2188
2189 The approach that incorporates the name of the source codeset poses the
2190 problem of codeset name registration, and makes the archive useless to
2191 pax archive decoders that do not recognize that codeset.
2192
2193 Because parts of an archive may be corrupted, the standard developers
2194 felt that including the character map of the source codeset was too
2195 fragile. The loss of this one key component could result in making the
2196 entire archive useless. (The difference between this and the global
2197 extended header decision was that the latter has a workaround,
2198 duplicating extended header records on unreliable media, but this
2199 would be too burdensome for large character set maps.)
2200
2201 Both of the above approaches also put an undue burden on the pax ar‐
2202 chive receiver to handle the cross-product of all source and destina‐
2203 tion codesets.
2204
2205 To simplify the translation from the source codeset to the canonical
2206 form and from the canonical form to the destination codeset, the stan‐
2207 dard developers decided that the internal representation should be a
2208 stateless encoding. A stateless encoding is one where each codepoint
2209 has the same meaning, without regard to the decoder being in a specific
2210 state. An example of a stateful encoding would be the Japanese Shift-
2211 JIS; an example of a stateless encoding would be the ISO/IEC 646:1991
2212 standard (equivalent to 7-bit ASCII).
2213
2214 For these reasons, the standard developers decided to adopt a canonical
2215 format for the representation of file information strings. The obvious,
2216 well-endorsed candidate is the ISO/IEC 10646-1:2000 standard (based in
2217 part on Unicode), which can be used to represent the characters of vir‐
2218 tually all standardized character sets. The standard developers ini‐
2219 tially agreed upon using UCS2 (16-bit Unicode) as the internal repre‐
2220 sentation. This repertoire of characters provides a sufficiently rich
2221 set to represent all commonly-used codesets.
2222
2223 However, the standard developers found that the 16-bit Unicode repre‐
2224 sentation had some problems. It forced the issue of standardizing byte
2225 ordering. The 2-byte length of each character made the extended header
2226 records twice as long for the case of strings coded entirely from his‐
2227 torical 7-bit ASCII. For these reasons, the standard developers chose
2228 the UTF-8 defined in the ISO/IEC 10646-1:2000 standard. This multi-byte
2229 representation encodes UCS2 or UCS4 characters reliably and determinis‐
2230 tically, eliminating the need for a canonical byte ordering. In addi‐
2231 tion, NUL octets and other characters possibly confusing to POSIX file
2232 systems do not appear, except to represent themselves. It was realized
2233 that certain national codesets take up more space after the encoding,
2234 due to their placement within the UCS range; it was felt that the use‐
2235 fulness of the encoding of the names outweighs the disadvantage of size
2236 increase for file, user, and group names.
2237
2238 The encoding of UTF-8 is as follows:
2239
2240
2241 UCS4 Hex Encoding UTF-8 Binary Encoding
2242
2243
2244 00000000-0000007F 0xxxxxxx
2245 00000080-000007FF 110xxxxx 10xxxxxx
2246 00000800-0000FFFF 1110xxxx 10xxxxxx 10xxxxxx
2247 00010000-001FFFFF 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
2248 00200000-03FFFFFF 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
2249 04000000-7FFFFFFF 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
2250
2251 where each 'x' represents a bit value from the character being trans‐
2252 lated.
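
The table can be transcribed directly into code. The following Python sketch is illustrative only; utf8_encode() is a hypothetical name, and the function follows the six-row table above, including the five-octet and six-octet forms that later definitions of UTF-8 no longer permit.

```python
# (upper bound of the UCS4 range, lead-octet marker, continuation octets)
_ROWS = (
    (0x000007FF, 0xC0, 1),
    (0x0000FFFF, 0xE0, 2),
    (0x001FFFFF, 0xF0, 3),
    (0x03FFFFFF, 0xF8, 4),
    (0x7FFFFFFF, 0xFC, 5),
)


def utf8_encode(code: int) -> bytes:
    """Encode one UCS4 value according to the table above."""
    if not 0 <= code <= 0x7FFFFFFF:
        raise ValueError("value outside the UCS4 range")
    if code <= 0x7F:
        return bytes([code])                     # 0xxxxxxx
    for limit, lead, extra in _ROWS:
        if code <= limit:
            # The lead octet carries the high-order bits of the character.
            out = [lead | (code >> (6 * extra))]
            # Each continuation octet carries six more bits: 10xxxxxx.
            for shift in range(6 * (extra - 1), -1, -6):
                out.append(0x80 | ((code >> shift) & 0x3F))
            return bytes(out)
    raise AssertionError("unreachable")
```

For values that Python itself can encode, the result agrees with the built-in codec; for example, utf8_encode(0x20AC) equals "\u20ac".encode("utf-8").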
2253
2254 ustar Interchange Format
2255 The description of the ustar format reflects numerous enhancements over
2256 pre-1988 versions of the historical tar utility. The goal of these
2257 changes was not only to provide the functional enhancements desired,
2258 but also to retain compatibility between new and old versions. This
2259 compatibility has been retained. Archives written using the old ar‐
2260 chive format are compatible with the new format.
2261
2262 Implementors should be aware that the previous file format did not
2263 include a mechanism to archive directory type files. For this reason,
2264 the convention of using a filename ending with slash was adopted to
2265 specify a directory on the archive.
2266
2267 The total size of the name and prefix fields has been set to meet the
2268 minimum requirements for {PATH_MAX}. If a pathname will fit within the
2269 name field, it is recommended that the pathname be stored there without
2270 the use of the prefix field. Although the name field is known to be too
2271 small to contain {PATH_MAX} characters, the value was not changed in
2272 this version of the archive file format to retain backwards-compatibil‐
2273 ity, and instead the prefix was introduced. Also, because of the ear‐
2274 lier version of the format, there is no way to remove the restriction
2275 on the linkname field being limited in size to just that of the name
2276 field.
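
One way to apply this recommendation is sketched below in Python. The sketch assumes the field sizes from the ustar header description (100 octets for name, 155 for prefix) and that the full pathname is the prefix, a slash, and the name; split_ustar_path() is a hypothetical helper, not an interface defined by this standard.

```python
NAME_SIZE = 100    # octets in the ustar name field
PREFIX_SIZE = 155  # octets in the ustar prefix field


def split_ustar_path(path: bytes):
    """Return (prefix, name) for a ustar header, or raise ValueError."""
    if len(path) <= NAME_SIZE:
        return b"", path                 # fits in name; leave prefix empty
    # The slash at index 'cut' is dropped: prefix = path[:cut],
    # name = path[cut + 1:]. Both parts must fit and be non-empty.
    lowest = max(1, len(path) - NAME_SIZE - 1)
    highest = min(PREFIX_SIZE, len(path) - 2)
    cut = path.find(b"/", lowest, highest + 1)
    if cut == -1:
        raise ValueError("pathname cannot be split into prefix and name")
    return path[:cut], path[cut + 1:]
```

When the split fails, a writer producing the pax format could fall back to a path extended header record; a writer restricted to ustar can only report an error.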
2277
2278 The size field is required to be meaningful in all implementation
2279 extensions, although it could be zero. This is required so that the
2280 data blocks can always be properly counted.
2281
2282 If device special files need to be represented that cannot be
2283 represented in the standard format, it is suggested that one of the
2284 extension types (A-Z) be used, and that the additional information
2285 for the special file be represented as data and be reflected in the
2286 size field.
2287
2288 Attempting to restore a special file type, where it is converted to
2289 ordinary data and conflicts with an existing filename, need not be spe‐
2290 cially detected by the utility. If run as an ordinary user, pax should
2291 not be able to overwrite the entries in, for example, /dev in any case
2292 (whether the file is converted to another type or not). If run as a
2293 privileged user, it should be able to do so, and it would be considered
2294 a bug if it did not. The same is true of ordinary data files and simi‐
2295 larly named special files; it is impossible to anticipate the needs of
2296 the user (who could really intend to overwrite the file), so the behav‐
2297 ior should be predictable (and thus regular) and rely on the protection
2298 system as required.
2299
2300 The value 7 in the typeflag field is intended to define how contiguous
2301 files can be stored in a ustar archive. IEEE Std 1003.1-2001 does not
2302 require the contiguous file extension, but does define a standard way
2303 of archiving such files so that all conforming systems can interpret
2304 these file types in a meaningful and consistent manner. On a system
2305 that does not support extended file types, the pax utility should do
2306 the best it can with the file and go on to the next.
2307
2308 The file protection modes are those conventionally used by the ls util‐
2309 ity. This is extended beyond the usage in the ISO POSIX-2 standard to
2310 support the "shared text" or "sticky" bit. It is intended that the con‐
2311 formance document should not document anything beyond the existence of
2312 and support of such a mode. Further extensions are expected to these
2313 bits, particularly with overloading the set-user-ID and set-group-ID
2314 flags.
2315
2316 cpio Interchange Format
2317 The reference to appropriate privilege in the cpio format refers to an
2318 error on standard output; the ustar format does not make comparable
2319 statements.
2320
2321 The model for this format was the historical System V cpio -c data
2322 interchange format. This model documents the portable version of the
2323 cpio format and not the binary version. It has the flexibility to
2324 transfer data of any type described within IEEE Std 1003.1-2001, yet is
2325 extensible to transfer data types specific to extensions beyond
2326 IEEE Std 1003.1-2001 (for example, contiguous files). Because it
2327 describes existing practice, there is no question of maintaining
2328 upwards-compatibility.
2329
2330 cpio Header
2331 There has been some concern that the size of the c_ino field of the
2332 header is too small to handle those systems that have very large inode
2333 numbers. However, the c_ino field in the header is used strictly as a
2334 hard-link resolution mechanism for archives. It is not necessarily the
2335 same value as the inode number of the file in the location from which
2336 that file is extracted.
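
Because c_ino only has to distinguish files within one archive, a writer can hand out small archive-local numbers instead of copying the file system's inode numbers. The Python sketch below shows one possible strategy and is not mandated by this standard; InodeMapper is a hypothetical name, the six-octal-digit limit is taken from the cpio header description, and a single c_dev value is assumed for the whole archive.

```python
class InodeMapper:
    """Assign archive-local c_ino values to (st_dev, st_ino) pairs."""

    LIMIT = 0o777777  # largest value that fits in six octal digits

    def __init__(self):
        self._assigned = {}

    def c_ino(self, st_dev: int, st_ino: int) -> int:
        key = (st_dev, st_ino)
        if key not in self._assigned:
            if len(self._assigned) >= self.LIMIT:
                raise OverflowError("too many distinct files for c_ino")
            # Number files 1, 2, 3, ... in the order they are first seen.
            self._assigned[key] = len(self._assigned) + 1
        return self._assigned[key]


mapper = InodeMapper()
first = mapper.c_ino(2049, 10**12)        # inode too large for the field
second = mapper.c_ino(2049, 10**12 + 1)
again = mapper.c_ino(2049, 10**12)        # hard link to the first file
```

Two names for the same file receive the same c_ino, which is all that hard-link resolution on extraction requires.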
2337
2338 The name c_magic is based on historical usage.
2339
2340 cpio Filename
2341 For most historical implementations of the cpio utility, {PATH_MAX}
2342 octets can be used to describe the pathname without the addition of any
2343 other header fields (the NUL character would be included in this
2344 count). {PATH_MAX} is the minimum value for pathname size, documented
2345 as 256 bytes. However, an implementation may use c_namesize to deter‐
2346 mine the exact length of the pathname. With the current description of
2347 the <cpio.h> header, this pathname size can be as large as a number
2348 that is described in six octal digits.
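
The following Python sketch shows how a reader might use c_namesize rather than a fixed {PATH_MAX} buffer. It is illustrative only: the field order and widths are assumed from the cpio header description (octal character fields, 76 octets in total), and parse_header() and read_name() are hypothetical helpers.

```python
# (field name, width in octets), in header order
FIELDS = (
    ("c_magic", 6), ("c_dev", 6), ("c_ino", 6), ("c_mode", 6),
    ("c_uid", 6), ("c_gid", 6), ("c_nlink", 6), ("c_rdev", 6),
    ("c_mtime", 11), ("c_namesize", 6), ("c_filesize", 11),
)
HEADER_SIZE = sum(width for _, width in FIELDS)


def parse_header(block: bytes) -> dict:
    """Decode the octal character fields of one cpio header."""
    if len(block) < HEADER_SIZE:
        raise ValueError("truncated cpio header")
    fields, pos = {}, 0
    for name, width in FIELDS:
        fields[name] = int(block[pos:pos + width], 8)
        pos += width
    if fields["c_magic"] != 0o070707:
        raise ValueError("bad c_magic")
    return fields


def read_name(archive: bytes, offset: int = 0) -> bytes:
    """Return the pathname that follows the header at 'offset'."""
    header = parse_header(archive[offset:])
    start = offset + HEADER_SIZE
    raw = archive[start:start + header["c_namesize"]]
    return raw.rstrip(b"\0")  # c_namesize includes the terminating NUL
```

Six octal digits allow a c_namesize of at most 0o777777, that is, 262143 octets including the terminating NUL.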
2349
2350 Two values are documented under the c_mode field values to provide for
2351 extensibility for known file types:
2352
2353 0110 000
2354 Reserved for contiguous files. The implementation may treat the
2355 rest of the information for this archive like a regular file.
2356 If this file type is undefined, the implementation may create
2357 the file as a regular file.
2358
2359
2360 This provides for extensibility of the cpio format while allowing for
2361 the ability to read old archives. Files of an unknown type may be read
2362 as "regular files" on some implementations. On a system that does not
2363 support extended file types, the pax utility should do the best it can
2364 with the file and go on to the next.
2365
2367 None.
2368
2370 Shell Command Language, cp, ed, getopts, ls, printf(), the Base
2371 Definitions volume of IEEE Std 1003.1-2001, <cpio.h>, the System Inter‐
2372 faces volume of IEEE Std 1003.1-2001, chown(), creat(), mkdir(),
2373 mkfifo(), stat(), utime(), write()
2374
2376 Portions of this text are reprinted and reproduced in electronic form
2377 from IEEE Std 1003.1, 2003 Edition, Standard for Information Technology
2378 -- Portable Operating System Interface (POSIX), The Open Group Base
2379 Specifications Issue 6, Copyright (C) 2001-2003 by the Institute of
2380 Electrical and Electronics Engineers, Inc and The Open Group. In the
2381 event of any discrepancy between this version and the original IEEE and
2382 The Open Group Standard, the original IEEE and The Open Group Standard
2383 is the referee document. The original Standard can be obtained online
2384 at http://www.opengroup.org/unix/online.html .
2385
2386
2387
2388IEEE/The Open Group 2003 PAX(P)