PAX(1P)                 POSIX Programmer's Manual                 PAX(1P)

PROLOG
This manual page is part of the POSIX Programmer's Manual. The Linux
implementation of this interface may differ (consult the corresponding
Linux manual page for details of Linux behavior), or the interface may
not be implemented on Linux.

NAME
pax - portable archive interchange

SYNOPSIS
pax [-dv] [-c|-n] [-H|-L] [-o options] [-f archive] [-s replstr]...
    [pattern...]

pax -r [-c|-n] [-dikuv] [-H|-L] [-f archive] [-o options]... [-p string]...
    [-s replstr]... [pattern...]

pax -w [-dituvX] [-H|-L] [-b blocksize] [[-a] [-f archive]] [-o options]...
    [-s replstr]... [-x format] [file...]

pax -r -w [-diklntuvX] [-H|-L] [-o options]... [-p string]...
    [-s replstr]... [file...] directory

DESCRIPTION
The pax utility shall read, write, and write lists of the members of
archive files and copy directory hierarchies. A variety of archive
formats shall be supported; see the -x format option.

The action to be taken depends on the presence of the -r and -w
options. The four combinations of -r and -w are referred to as the four
modes of operation: list, read, write, and copy modes, corresponding
respectively to the four forms shown in the SYNOPSIS section.

list    In list mode (when neither -r nor -w is specified), pax
        shall write the names of the members of the archive file read
        from the standard input, with pathnames matching the
        specified patterns, to standard output. If a named file is of
        type directory, the file hierarchy rooted at that file shall
        be listed as well.

read    In read mode (when -r is specified, but -w is not), pax shall
        extract the members of the archive file read from the
        standard input, with pathnames matching the specified patterns.
        If an extracted file is of type directory, the file hierarchy
        rooted at that file shall be extracted as well. The extracted
        files shall be created performing pathname resolution with
        the directory in which pax was invoked as the current working
        directory.

        If an attempt is made to extract a directory when the
        directory already exists, this shall not be considered an
        error. If an attempt is made to extract a FIFO when the FIFO
        already exists, this shall not be considered an error.

        The ownership, access and modification times, and file mode
        of the restored files are discussed under the -p option.

write   In write mode (when -w is specified, but -r is not), pax
        shall write the contents of the file operands to the standard
        output in an archive format. If no file operands are
        specified, a list of files to copy, one per line, shall be read
        from the standard input and each entry in this list shall be
        processed as if it had been a file operand on the command
        line. A file of type directory shall include all of the files
        in the file hierarchy rooted at the file.

copy    In copy mode (when both -r and -w are specified), pax shall
        copy the file operands to the destination directory.

        If no file operands are specified, a list of files to copy,
        one per line, shall be read from the standard input. A file
        of type directory shall include all of the files in the file
        hierarchy rooted at the file.

        The effect of the copy shall be as if the copied files were
        written to a pax format archive file and then subsequently
        extracted, except that there may be hard links between the
        original and the copied files. If the destination directory
        is a subdirectory of one of the files to be copied, the
        results are unspecified. If the destination directory is a
        file of a type not defined by the System Interfaces volume of
        POSIX.1-2008, the results are implementation-defined;
        otherwise, it shall be an error for the file named by the
        directory operand not to exist, not be writable by the user,
        or not be a file of type directory.

In read or copy modes, if intermediate directories are necessary to
extract an archive member, pax shall perform actions equivalent to the
mkdir() function defined in the System Interfaces volume of
POSIX.1-2008, called with the following arguments:

 *  The intermediate directory used as the path argument

 *  The value of the bitwise-inclusive OR of S_IRWXU, S_IRWXG, and
    S_IRWXO as the mode argument
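
The two mkdir() arguments listed above can be illustrated with a short Python sketch. It is not how any pax implementation is written; the helper name make_intermediate_dirs and its root parameter are hypothetical, and the sketch assumes member pathnames use '/' separators and contain no ".." components.

```python
import os
import stat

# Bitwise-inclusive OR of S_IRWXU, S_IRWXG, and S_IRWXO (octal 777).
INTERMEDIATE_MODE = stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO


def make_intermediate_dirs(root, member_path):
    """Create the missing parent directories of member_path below root.

    Hypothetical helper for illustration; returns the directories created.
    """
    created = []
    current = root
    # Every component except the last names an intermediate directory.
    for component in member_path.split("/")[:-1]:
        if component in ("", "."):
            continue
        current = os.path.join(current, component)
        if not os.path.isdir(current):
            # The process umask still applies to the requested mode.
            os.mkdir(current, INTERMEDIATE_MODE)
            created.append(current)
    return created
```

Calling make_intermediate_dirs(dest, "a/b/c.txt") would create dest/a and dest/a/b but not the member itself. The permissions that result depend on the umask of the calling process, as with any mkdir() call.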

If any specified pattern or file operands are not matched by at least
one file or archive member, pax shall write a diagnostic message to
standard error for each one that did not match and exit with a non-zero
exit status.

The archive formats described in the EXTENDED DESCRIPTION section shall
be automatically detected on input. The default output archive format
shall be implementation-defined.

A single archive can span multiple files. The pax utility shall
determine, in an implementation-defined manner, what file to read or
write as the next file.

If the selected archive format supports the specification of linked
files, it shall be an error if these files cannot be linked when the
archive is extracted. For archive formats that do not store file
contents with each name that causes a hard link, if the file that
contains the data is not extracted during this pax session, either the
data shall be restored from the original file, or a diagnostic message
shall be displayed with the name of a file that can be used to extract
the data.

In traversing directories, pax shall detect infinite loops; that
is, entering a previously visited directory that is an ancestor of the
last file visited. When it detects an infinite loop, pax shall write a
diagnostic message to standard error and shall terminate.
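
One way to picture the loop check described above is to remember the identity of every directory on the path from the starting point to the current file. The Python sketch below does this with (st_dev, st_ino) pairs; that choice, the function name walk_following_links, and the LoopDetected exception are assumptions made for illustration, since the text does not prescribe a detection method.

```python
import os


class LoopDetected(Exception):
    """Raised when traversal re-enters a directory that is its own ancestor."""


def walk_following_links(path, ancestors=()):
    """Yield path and everything below it, following symbolic links.

    ancestors holds the (st_dev, st_ino) pairs of the directories that
    lead down to path.
    """
    if os.path.isdir(path):  # os.path.isdir() follows symbolic links
        st = os.stat(path)
        key = (st.st_dev, st.st_ino)
        if key in ancestors:
            raise LoopDetected(path)
        yield path
        for entry in sorted(os.listdir(path)):
            child = os.path.join(path, entry)
            yield from walk_following_links(child, ancestors + (key,))
    else:
        yield path
```

A symbolic link that points back to one of its own parent directories makes the generator raise LoopDetected instead of recursing forever; pax itself would write a diagnostic message and terminate at that point.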

OPTIONS
The pax utility shall conform to the Base Definitions volume of
POSIX.1-2008, Section 12.2, Utility Syntax Guidelines, except that the
order of presentation of the -o, -p, and -s options is significant.

The following options shall be supported:

-r      Read an archive file from standard input.

-w      Write files to the standard output in the specified archive
        format.

-a      Append files to the end of the archive. It is
        implementation-defined which devices on the system support
        appending. Additional file formats unspecified by this volume
        of POSIX.1-2008 may impose restrictions on appending.

-b blocksize
        Block the output at a positive decimal integer number of
        bytes per write to the archive file. Devices and archive
        formats may impose restrictions on blocking. Blocking shall be
        automatically determined on input. Conforming applications
        shall not specify a blocksize value larger than 32256.
        Default blocking when creating archives depends on the
        archive format. (See the -x option below.)

-c      Match all file or archive members except those specified by
        the pattern or file operands.

-d      Cause files of type directory being copied or archived or
        archive members of type directory being extracted or listed to
        match only the file or archive member itself and not the file
        hierarchy rooted at the file.

-f archive
        Specify the pathname of the input or output archive,
        overriding the default standard input (in list or read modes)
        or standard output (write mode).

-H      If a symbolic link referencing a file of type directory is
        specified on the command line, pax shall archive the file
        hierarchy rooted in the file referenced by the link, using
        the name of the link as the root of the file hierarchy.
        Otherwise, if a symbolic link referencing a file of any other
        file type which pax can normally archive is specified on the
        command line, then pax shall archive the file referenced by
        the link, using the name of the link. The default behavior,
        when neither -H nor -L is specified, shall be to archive the
        symbolic link itself.

-i      Interactively rename files or archive members. For each
        archive member matching a pattern operand or file matching a
        file operand, a prompt shall be written to the file /dev/tty.
        The prompt shall contain the name of the file or archive
        member, but the format is otherwise unspecified. A line shall
        then be read from /dev/tty. If this line is blank, the file
        or archive member shall be skipped. If this line consists of
        a single period, the file or archive member shall be
        processed with no modification to its name. Otherwise, its
        name shall be replaced with the contents of the line. The pax
        utility shall immediately exit with a non-zero exit status if
        end-of-file is encountered when reading a response or if
        /dev/tty cannot be opened for reading and writing.

        The results of extracting a hard link to a file that has been
        renamed during extraction are unspecified.
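
The three possible responses to the -i prompt can be summarized in a small Python sketch. The function name interpret_rename_reply and the SKIP sentinel are hypothetical, and the sketch assumes that "blank" means a line that is empty or holds only white space, which the text does not spell out.

```python
SKIP = object()  # sentinel meaning "skip this file or archive member"


def interpret_rename_reply(name, line):
    """Map one line read from the terminal to the name to be used.

    line is the value returned by a file object's readline(), so the
    empty string means end-of-file was reached.
    """
    if line == "":
        # pax exits immediately with a non-zero status in this case.
        raise EOFError("end-of-file while reading a rename response")
    reply = line.rstrip("\n")
    if reply.strip() == "":
        return SKIP   # blank line: the member is skipped
    if reply == ".":
        return name   # single period: keep the name unchanged
    return reply      # anything else replaces the name
```

For example, a reply of "b.txt" to a prompt for "a.txt" yields "b.txt", while a reply of "." leaves the name as "a.txt".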

-k      Prevent the overwriting of existing files.

-l      (The letter ell.) In copy mode, hard links shall be made
        between the source and destination file hierarchies whenever
        possible. If specified in conjunction with -H or -L, when a
        symbolic link is encountered, the hard link created in the
        destination file hierarchy shall be to the file referenced by
        the symbolic link. If specified when neither -H nor -L is
        specified, when a symbolic link is encountered, the
        implementation shall create a hard link to the symbolic link
        in the source file hierarchy or copy the symbolic link to the
        destination.

-L      If a symbolic link referencing a file of type directory is
        specified on the command line or encountered during the
        traversal of a file hierarchy, pax shall archive the file
        hierarchy rooted in the file referenced by the link, using the
        name of the link as the root of the file hierarchy.
        Otherwise, if a symbolic link referencing a file of any other
        file type which pax can normally archive is specified on the
        command line or encountered during the traversal of a file
        hierarchy, pax shall archive the file referenced by the link,
        using the name of the link. The default behavior, when
        neither -H nor -L is specified, shall be to archive the
        symbolic link itself.

-n      Select the first archive member that matches each pattern
        operand. No more than one archive member shall be matched for
        each pattern (although members of type directory shall still
        match the file hierarchy rooted at that file).

-o options
        Provide information to the implementation to modify the
        algorithm for extracting or writing files. The value of
        options shall consist of one or more <comma>-separated
        keywords of the form:

            keyword[[:]=value][,keyword[[:]=value], ...]

        Some keywords apply only to certain file formats, as
        indicated with each description. Use of keywords that are
        inapplicable to the file format being processed produces
        undefined results.

        Keywords in the options argument shall be a string that would
        be a valid portable filename as described in the Base
        Definitions volume of POSIX.1-2008, Section 3.278, Portable
        Filename Character Set.

        Note:     Keywords are not expected to be filenames, merely
                  to follow the same character composition rules as
                  portable filenames.

        Keywords can be preceded with white space. The value field
        shall consist of zero or more characters; within value, the
        application shall precede any literal <comma> with a
        <backslash>, which shall be ignored, but preserves the <comma>
        as part of value. A <comma> as the final character, or a
        <comma> followed solely by white space as the final
        characters, in options shall be ignored. Multiple -o options
        can be specified; if keywords given to these multiple -o
        options conflict, the keywords and values appearing later in
        command line sequence shall take precedence and the earlier
        shall be silently ignored. The following keyword values of
        options shall be supported for the file formats as indicated:

        delete=pattern
                (Applicable only to the -x pax format.) When used in
                write or copy mode, pax shall omit from extended header
                records that it produces any keywords matching the
                string pattern. When used in read or list mode, pax
                shall ignore any keywords matching the string pattern
                in the extended header records. In both cases, matching
                shall be performed using the pattern matching notation
                described in Section 2.13.1, Patterns Matching a Single
                Character and Section 2.13.2, Patterns Matching
                Multiple Characters. For example:

                    -o delete=security.*

                would suppress security-related information. See pax
                Extended Header for extended header record keyword
                usage.

                When multiple -odelete=pattern options are specified,
                the patterns shall be additive; all keywords matching
                the specified string patterns shall be omitted from
                extended header records that pax produces.

        exthdr.name=string
                (Applicable only to the -x pax format.) This keyword
                allows user control over the name that is written into
                the ustar header blocks for the extended header
                produced under the circumstances described in pax
                Header Block. The name shall be the contents of string,
                after the following character substitutions have been
                made:

                string includes   Replaced by
                ---------------   ----------------------------------
                %d                The directory name of the file,
                                  equivalent to the result of the
                                  dirname utility on the translated
                                  pathname.
                %f                The filename of the file,
                                  equivalent to the result of the
                                  basename utility on the translated
                                  pathname.
                %p                The process ID of the pax process.
                %%                A '%' character.

                Any other '%' characters in string produce undefined
                results.

                If no -o exthdr.name=string is specified, pax shall use
                the following default value:

                    %d/PaxHeaders.%p/%f

        globexthdr.name=string
                (Applicable only to the -x pax format.) When used in
                write or copy mode with the appropriate options, pax
                shall create global extended header records with ustar
                header blocks that will be treated as regular files by
                previous versions of pax. This keyword allows user
                control over the name that is written into the ustar
                header blocks for global extended header records. The
                name shall be the contents of string, after the
                following character substitutions have been made:

                string includes   Replaced by
                ---------------   ----------------------------------
                %n                An integer that represents the
                                  sequence number of the global
                                  extended header record in the
                                  archive, starting at 1.
                %p                The process ID of the pax process.
                %%                A '%' character.

                Any other '%' characters in string produce undefined
                results.

                If no -o globexthdr.name=string is specified, pax shall
                use the following default value:

                    $TMPDIR/GlobalHead.%p.%n

                where $TMPDIR represents the value of the TMPDIR
                environment variable. If TMPDIR is not set, pax shall
                use /tmp.
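
The substitutions for exthdr.name and globexthdr.name can be sketched in Python as follows. All function names here are hypothetical. Two simplifications are assumed: posixpath.dirname() and posixpath.basename() are adjusted only roughly toward the behavior of the dirname and basename utilities, and any '%' sequence the text calls undefined is rejected with ValueError.

```python
import posixpath


def expand_header_name(template, values):
    """Expand %-sequences in template using the mapping values.

    values maps a conversion letter such as "d" to its replacement.
    "%%" always yields "%". Other sequences are undefined by the text;
    this sketch rejects them.
    """
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch != "%":
            out.append(ch)
            i += 1
            continue
        letter = template[i + 1:i + 2]  # "" when "%" ends the template
        if letter == "%":
            out.append("%")
        elif letter in values:
            out.append(str(values[letter]))
        else:
            raise ValueError("unsupported sequence %" + letter)
        i += 2
    return "".join(out)


def exthdr_name(pathname, pid, template="%d/PaxHeaders.%p/%f"):
    """Name for a per-file extended header, using the default template."""
    path = pathname.rstrip("/") or "/"
    directory = posixpath.dirname(path) or "."   # dirname prints "." here
    filename = posixpath.basename(path) or "/"
    values = {"d": directory, "f": filename, "p": pid}
    return expand_header_name(template, values)


def globexthdr_name(sequence, pid, environ, template=None):
    """Name for a global extended header record; sequence starts at 1."""
    values = {"n": sequence, "p": pid}
    if template is None:
        # Default is $TMPDIR/GlobalHead.%p.%n, with /tmp when TMPDIR is unset.
        tmpdir = environ.get("TMPDIR", "/tmp")
        return tmpdir + expand_header_name("/GlobalHead.%p.%n", values)
    return expand_header_name(template, values)
```

Under these assumptions, exthdr_name("src/main.c", 1234) gives "src/PaxHeaders.1234/main.c", and globexthdr_name(1, 42, {}) gives "/tmp/GlobalHead.42.1".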

        invalid=action
                (Applicable only to the -x pax format.) This keyword
                allows user control over the action pax takes upon
                encountering values in an extended header record that,
                in read or copy mode, are invalid in the destination
                hierarchy or, in list mode, cannot be written in the
                codeset and current locale of the implementation. The
                following are invalid values that shall be recognized
                by pax:

                --  In read or copy mode, a filename or link name that
                    contains character encodings invalid in the
                    destination hierarchy. (For example, the name may
                    contain embedded NULs.)

                --  In read or copy mode, a filename or link name that
                    is longer than the maximum allowed in the
                    destination hierarchy (for either a pathname
                    component or the entire pathname).

                --  In list mode, any character string value (filename,
                    link name, user name, and so on) that cannot be
                    written in the codeset and current locale of the
                    implementation.

                The following mutually-exclusive values of the action
                argument are supported:

                binary  In write mode, pax shall generate a
                        hdrcharset=BINARY extended header record for
                        each file with a filename, link name, group
                        name, owner name, or any other field in an
                        extended header record that cannot be
                        translated to the UTF-8 codeset, allowing the
                        archive to contain the files with unencoded
                        extended header record values. In read or
                        copy mode, pax shall use the values specified
                        in the header without translation, regardless
                        of whether this may overwrite an existing
                        file with a valid name. In list mode, pax
                        shall behave identically to the bypass
                        action.

                bypass  In read or copy mode, pax shall bypass the
                        file, causing no change to the destination
                        hierarchy. In list mode, pax shall write all
                        requested valid values for the file, but its
                        method for writing invalid values is
                        unspecified.

                rename  In read or copy mode, pax shall act as if the
                        -i option were in effect for each file with
                        invalid filename or link name values,
                        allowing the user to provide a replacement
                        name interactively. In list mode, pax shall
                        behave identically to the bypass action.

                UTF-8   When used in read, copy, or list mode and a
                        filename, link name, owner name, or any other
                        field in an extended header record cannot be
                        translated from the pax UTF-8 codeset format
                        to the codeset and current locale of the
                        implementation, pax shall use the actual
                        UTF-8 encoding for the name. If a hdrcharset
                        extended header record is in effect for this
                        file, the character set specified by that
                        record shall be used instead of UTF-8. If a
                        hdrcharset=BINARY extended header record is
                        in effect for this file, no translation shall
                        be performed.

                write   In read or copy mode, pax shall write the
                        file, translating the name, regardless of
                        whether this may overwrite an existing file
                        with a valid name. In list mode, pax shall
                        behave identically to the bypass action.

                If no -o invalid=action option is specified, pax shall
                act as if -oinvalid=bypass were specified. Any
                overwriting of existing files that may be allowed by
                the -oinvalid= actions shall be subject to permission
                (-p) and modification time (-u) restrictions, and shall
                be suppressed if the -k option is also specified.

        linkdata
                (Applicable only to the -x pax format.) In write mode,
                pax shall write the contents of a file to the archive
                even when that file is merely a hard link to a file
                whose contents have already been written to the
                archive.

        listopt=format
                This keyword specifies the output format of the table
                of contents produced when the -v option is specified in
                list mode. See List Mode Format Specifications. To
                avoid ambiguity, the listopt=format shall be the only
                or final keyword=value pair in a -o option-argument;
                all characters in the remainder of the option-argument
                shall be considered part of the format string. When
                multiple -olistopt=format options are specified, the
                format strings shall be considered a single,
                concatenated string, evaluated in command line order.

        times
                (Applicable only to the -x pax format.) When used in
                write or copy mode, pax shall include atime and mtime
                extended header records for each file. See pax Extended
                Header File Times.

        In addition to these keywords, if the -x pax format is
        specified, any of the keywords and values defined in pax
        Extended Header, including implementation extensions, can be
        used in -o option-arguments, in either of two modes:

        keyword=value
                When used in write or copy mode, these keyword/value
                pairs shall be included at the beginning of the archive
                as typeflag g global extended header records. When used
                in read or list mode, these keyword/value pairs shall
                act as if they had been at the beginning of the archive
                as typeflag g global extended header records.

        keyword:=value
                When used in write or copy mode, these keyword/value
                pairs shall be included as records at the beginning of
                a typeflag x extended header for each file. (This shall
                be equivalent to the <equals-sign> form except that it
                creates no typeflag g global extended header records.)
                When used in read or list mode, these keyword/value
                pairs shall act as if they were included as records at
                the end of each extended header; thus, they shall
                override any global or file-specific extended header
                record keywords of the same names. For example, in the
                command:

                    pax -r -o "
                    gname:=mygroup,
                    " <archive

                the group name will be forced to a new value for all
                files read from the archive.

        The precedence of -o keywords over various fields in the
        archive is described in pax Extended Header Keyword Precedence.
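
The syntax rules for a single -o option-argument (leading white space, backslash-escaped commas, an ignored trailing comma, and listopt consuming the rest of the argument) can be sketched as a small Python parser. The function name parse_o_argument is hypothetical. The sketch does not validate keywords against the portable filename character set, and it does not model precedence between multiple -o options.

```python
def parse_o_argument(arg):
    """Split one -o option-argument into (keyword, operator, value) tuples.

    operator is "=", ":=", or None for a bare keyword such as linkdata.
    """
    items = []
    i, n = 0, len(arg)
    while i < n:
        if arg[i].isspace():  # white space may precede a keyword
            i += 1
            continue
        start = i
        while i < n and arg[i] not in "=:," and not arg[i].isspace():
            i += 1
        keyword = arg[start:i]
        operator = value = None
        if arg.startswith(":=", i):
            operator, i = ":=", i + 2
        elif arg.startswith("=", i):
            operator, i = "=", i + 1
        elif i < n and arg[i] == ":":
            raise ValueError("':' must be followed by '='")
        if operator is not None:
            if keyword == "listopt":
                # The format string runs to the end of the argument.
                value, i = arg[i:], n
            else:
                chars = []
                while i < n and arg[i] != ",":
                    if arg.startswith("\\,", i):
                        chars.append(",")  # escaped comma stays in value
                        i += 2
                    else:
                        chars.append(arg[i])
                        i += 1
                value = "".join(chars)
        if i < n and arg[i] == ",":
            i += 1  # consume the separator (or an ignored trailing comma)
        if keyword:
            items.append((keyword, operator, value))
    return items
```

Applied to the multi-line option-argument from the gname example above, the parser returns a single ("gname", ":=", "mygroup") tuple, because the surrounding newlines and the trailing comma are ignored.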

-p string
        Specify one or more file characteristic options (privileges).
        The string option-argument shall be a string specifying file
        characteristics to be retained or discarded on extraction.
        The string shall consist of the specification characters a,
        e, m, o, and p. Other implementation-defined characters can
        be included. Multiple characteristics can be concatenated
        within the same string and multiple -p options can be
        specified. The meanings of the specification characters are
        as follows:

        a       Do not preserve file access times.

        e       Preserve the user ID, group ID, file mode bits (see the
                Base Definitions volume of POSIX.1-2008, Section 3.169,
                File Mode Bits), access time, modification time, and
                any other implementation-defined file characteristics.

        m       Do not preserve file modification times.

        o       Preserve the user ID and group ID.

        p       Preserve the file mode bits. Other
                implementation-defined file mode attributes may be
                preserved.

        In the preceding list, "preserve" indicates that an
        attribute stored in the archive shall be given to the
        extracted file, subject to the permissions of the invoking
        process. The access and modification times of the file shall
        be preserved unless otherwise specified with the -p option or
        not stored in the archive. All attributes that are not
        preserved shall be determined as part of the normal file
        creation action (see Section 1.1.1.4, File Read, Write, and
        Creation).

        If neither the e nor the o specification character is
        specified, or the user ID and group ID are not preserved for
        any reason, pax shall not set the S_ISUID and S_ISGID bits of
        the file mode.

        If the preservation of any of these items fails for any
        reason, pax shall write a diagnostic message to standard
        error. Failure to preserve these items shall affect the final
        exit status, but shall not cause the extracted file to be
        deleted.

        If file characteristic letters in any of the string
        option-arguments are duplicated or conflict with each other,
        the ones given last shall take precedence. For example, if
        -p eme is specified, file modification times are preserved.
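
The "last one wins" rule for -p can be modeled by applying the specification characters one at a time, in command line order. The Python sketch below does so; the function name resolve_p_options and the dictionary keys are hypothetical, the defaults reflect the statement above that access and modification times are preserved unless stated otherwise, and implementation-defined characters are rejected for simplicity.

```python
# Characteristics preserved when no -p option is given.
DEFAULTS = {"atime": True, "mtime": True, "owner": False, "mode": False}


def resolve_p_options(strings):
    """Fold the -p option-arguments, in command line order, into flags."""
    keep = dict(DEFAULTS)
    for string in strings:
        for ch in string:
            if ch == "a":
                keep["atime"] = False
            elif ch == "m":
                keep["mtime"] = False
            elif ch == "o":
                keep["owner"] = True
            elif ch == "p":
                keep["mode"] = True
            elif ch == "e":
                # Preserve everything this sketch tracks.
                keep = dict.fromkeys(keep, True)
            else:
                raise ValueError("unknown specification character " + ch)
    return keep
```

With this model, resolve_p_options(["eme"]) reports that modification times are preserved, which matches the -p eme example in the text: the final e overrides the earlier m.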

-s replstr
        Modify file or archive member names named by pattern or file
        operands according to the substitution expression replstr,
        using the syntax of the ed utility. The concepts of
        "address" and "line" are meaningless in the context of
        the pax utility, and shall not be supplied. The format shall
        be:

            -s /old/new/[gp]

        where, as in ed, old is a basic regular expression and new
        can contain an <ampersand>, '\n' (where n is a digit)
        back-references, or subexpression matching. The old string
        shall also be permitted to contain <newline> characters.

        Any non-null character can be used as a delimiter ('/' shown
        here). Multiple -s expressions can be specified; the
        expressions shall be applied in the order specified,
        terminating with the first successful substitution. The
        optional trailing 'g' is as defined in the ed utility. The
        optional trailing 'p' shall cause successful substitutions to
        be written to standard error. File or archive member names
        that substitute to the empty string shall be ignored when
        reading and writing archives.

-t      When reading files from the file system, and if the user has
        the permissions required by utime() to do so, set the access
        time of each file read to the access time that it had before
        being read by pax.

-u      Ignore files that are older (having a less recent file
        modification time) than a pre-existing file or archive member
        with the same name. In read mode, an archive member with the
        same name as a file in the file system shall be extracted if
        the archive member is newer than the file. In write mode, an
        archive file member with the same name as a file in the file
        system shall be superseded if the file is newer than the
        archive member. If -a is also specified, this is accomplished
        by appending to the archive; otherwise, it is unspecified
        whether this is accomplished by actual replacement in the
        archive or by appending to the archive. In copy mode, the file
        in the destination hierarchy shall be replaced by the file in
        the source hierarchy or by a link to the file in the source
        hierarchy if the file in the source hierarchy is newer.

-v      In list mode, produce a verbose table of contents (see the
        STDOUT section). Otherwise, write archive member pathnames
        to standard error (see the STDERR section).

-x format
        Specify the output archive format. The pax utility shall
        support the following formats:

        cpio    The cpio interchange format; see the EXTENDED
                DESCRIPTION section. The default blocksize for this
                format for character special archive files shall be
                5120. Implementations shall support all blocksize
                values less than or equal to 32256 that are
                multiples of 512.

        pax     The pax interchange format; see the EXTENDED
                DESCRIPTION section. The default blocksize for this
                format for character special archive files shall be
                5120. Implementations shall support all blocksize
                values less than or equal to 32256 that are
                multiples of 512.

        ustar   The tar interchange format; see the EXTENDED
                DESCRIPTION section. The default blocksize for this
                format for character special archive files shall be
                10240. Implementations shall support all blocksize
                values less than or equal to 32256 that are
                multiples of 512.

        Implementation-defined formats shall specify a default block
        size as well as any other block sizes supported for character
        special archive files.
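
The blocksize figures given for the three standard formats can be collected into a small Python sketch. The names DEFAULT_BLOCKSIZE and is_portable_blocksize are hypothetical; the check only expresses which values every implementation is required to support for these formats, not what a particular implementation or device accepts.

```python
# Default blocksize for character special archive files, per format.
DEFAULT_BLOCKSIZE = {"cpio": 5120, "pax": 5120, "ustar": 10240}

# Largest value a conforming application may pass to -b.
MAX_BLOCKSIZE = 32256


def is_portable_blocksize(blocksize):
    """True when every implementation must support this -b value.

    The value must be positive, at most 32256, and a multiple of 512.
    """
    return 0 < blocksize <= MAX_BLOCKSIZE and blocksize % 512 == 0
```

Note that 32256 is itself a multiple of 512 (63 x 512), so the upper limit is a portable value, whereas 32768 is not.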

        Any attempt to append to an archive file in a format
        different from the existing archive format shall cause pax to
        exit immediately with a non-zero exit status.

-X      When traversing the file hierarchy specified by a pathname,
        pax shall not descend into directories that have a different
        device ID (st_dev; see the System Interfaces volume of
        POSIX.1-2008, stat()).
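
The -X rule can be pictured as a directory walk that prunes any subdirectory whose st_dev differs from that of the starting pathname. The Python sketch below is one possible reading, not a description of any pax implementation: the function name walk_same_device is hypothetical, and comparing against the device ID of the starting pathname (rather than of the parent directory) is an assumption.

```python
import os


def walk_same_device(root):
    """Yield pathnames under root without descending across device IDs."""
    root_dev = os.lstat(root).st_dev
    yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)
        keep = []
        for name in sorted(dirnames):
            path = os.path.join(dirpath, name)
            yield path  # the directory itself is still reported
            if os.lstat(path).st_dev == root_dev:
                keep.append(name)
        # Assigning in place tells os.walk() not to enter pruned directories.
        dirnames[:] = keep
```

A mount point below root would therefore be listed, but nothing beneath it would be visited.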

Specifying more than one of the mutually-exclusive options -H and -L
shall not be considered an error and the last option specified shall
determine the behavior of the utility.

The options that operate on the names of files or archive members (-c,
-i, -n, -s, -u, and -v) shall interact as follows. In read mode, the
archive members shall be selected based on the user-specified pattern
operands as modified by the -c, -n, and -u options. Then, any -s and -i
options shall modify, in that order, the names of the selected files.
The -v option shall write names resulting from these modifications.

In write mode, the files shall be selected based on the user-specified
pathnames as modified by the -n and -u options. Then, any -s and -i
options shall modify, in that order, the names of these selected files.
The -v option shall write names resulting from these modifications.

If both the -u and -n options are specified, pax shall not consider a
file selected unless it is newer than the file to which it is compared.
643
644 List Mode Format Specifications
645 In list mode with the −o listopt=format option, the format argument
646 shall be applied for each selected file. The pax utility shall append a
647 <newline> to the listopt output for each selected file. The format
648 argument shall be used as the format string described in the Base Defi‐
649 nitions volume of POSIX.1‐2008, Chapter 5, File Format Notation, with
650 the exceptions 1. through 6. defined in the EXTENDED DESCRIPTION sec‐
651 tion of printf, plus the following exceptions:
652
653 7. The sequence (keyword) can occur before a format conversion spec‐
654 ifier. The conversion argument is defined by the value of key‐
655 word. The implementation shall support the following keywords:
656
657 -- Any of the Field Name entries in Table 4-14, ustar Header
658 Block and Table 4-16, Octet-Oriented cpio Archive Entry. The
659 implementation may support the cpio keywords without the
660 leading c_ in addition to the form required by Table 4-16,
661 Octet-Oriented cpio Archive Entry.
662
663 -- Any keyword defined for the extended header in pax Extended
664 Header.
665
666 -- Any keyword provided as an implementation-defined extension
667 within the extended header defined in pax Extended Header.
668
669 For example, the sequence "%(charset)s" is the string value of
670 the name of the character set in the extended header.
671
672 The result of the keyword conversion argument shall be the value
673 from the applicable header field or extended header, without any
674 trailing NULs.
675
676 All keyword values used as conversion arguments shall be trans‐
677 lated from the UTF‐8 encoding (or alternative encoding specified
678 by any hdrcharset extended header record) to the character set
679 appropriate for the local file system, user database, and so on,
680 as applicable.
681
682 8. An additional conversion specifier character, T, shall be used to
683 specify time formats. The T conversion specifier character can be
684 preceded by the sequence (keyword=subformat), where subformat is
685 a date format as defined by date operands. The default keyword
686 shall be mtime and the default subformat shall be:
687
688 %b %e %H:%M %Y
689
690 9. An additional conversion specifier character, M, shall be used to
691 specify the file mode string as defined in ls Standard Output. If
692 (keyword) is omitted, the mode keyword shall be used. For exam‐
693 ple, %.1M writes the single character corresponding to the
694 <entry type> field of the ls −l command.
695
696 10. An additional conversion specifier character, D, shall be used to
697 specify the device for block or special files, if applicable, in
698 an implementation-defined format. If not applicable, and (key‐
699 word) is specified, then this conversion shall be equivalent to
700 %(keyword)u. If not applicable, and (keyword) is omitted, then
701 this conversion shall be equivalent to <space>.
702
703 11. An additional conversion specifier character, F, shall be used to
704 specify a pathname. The F conversion character can be preceded by
705 a sequence of <comma>-separated keywords:
706
707 (keyword[,keyword] ... )
708
709 The values for all the keywords that are non-null shall be con‐
710 catenated together, each separated by a '/'. The default shall
711 be (path) if the keyword path is defined; otherwise, the default
712 shall be (prefix,name).
713
714 12. An additional conversion specifier character, L, shall be used to
715 specify a symbolic link expansion. If the current file is a sym‐
716 bolic link, then %L shall expand to:
717
718 "%s −> %s", <value of keyword>, <contents of link>
719
720 Otherwise, the %L conversion specification shall be the equiva‐
721 lent of %F.
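
The F and L conversions in items 11 and 12 can be sketched as follows.
This is an illustration only and not part of the specification: the
function names, the dictionary of keyword values, and the link_contents
parameter are hypothetical, and <value of keyword> in item 12 is assumed
to be the pathname that %F would produce.

```python
def expand_f(values, keywords=None):
    """Sketch of the F conversion: join non-null keyword values with '/'."""
    if keywords is None:
        # Default is (path) if the path keyword is defined,
        # otherwise (prefix,name).
        keywords = ["path"] if "path" in values else ["prefix", "name"]
    return "/".join(values[k] for k in keywords if values.get(k))


def expand_l(values, link_contents=None, keywords=None):
    """Sketch of the L conversion; link_contents is None for non-links."""
    pathname = expand_f(values, keywords)
    if link_contents is not None:
        # Symbolic link: "<pathname> -> <contents of link>"
        return "%s -> %s" % (pathname, link_contents)
    return pathname  # otherwise equivalent to %F
```

Null (empty) keyword values are skipped before joining, so an empty
prefix does not produce a leading '/'.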
722
723 OPERANDS
724 The following operands shall be supported:
725
726 directory The destination directory pathname for copy mode.
727
728 file A pathname of a file to be copied or archived.
729
730 pattern A pattern matching one or more pathnames of archive members.
731 A pattern must be given in the name-generating notation of
732 the pattern matching notation in Section 2.13, Pattern Match‐
733 ing Notation, including the filename expansion rules in Sec‐
734 tion 2.13.3, Patterns Used for Filename Expansion. The
735 default, if no pattern is specified, is to select all members
736 in the archive.
737
738 STDIN
739 In write mode, the standard input shall be used only if no file oper‐
740 ands are specified. It shall be a file containing a list of pathnames,
741 each terminated by a <newline> character.
742
743 In list and read modes, if −f is not specified, the standard input
744 shall be an archive file.
745
746 Otherwise, the standard input shall not be used.
747
748 INPUT FILES
749 The input file named by the archive option-argument, or standard input
750 when the archive is read from there, shall be a file formatted accord‐
751 ing to one of the specifications in the EXTENDED DESCRIPTION section or
752 some other implementation-defined format.
753
754 The file /dev/tty shall be used to write prompts and read responses.
755
756 ENVIRONMENT VARIABLES
757 The following environment variables shall affect the execution of pax:
758
759 LANG Provide a default value for the internationalization vari‐
760 ables that are unset or null. (See the Base Definitions vol‐
761 ume of POSIX.1‐2008, Section 8.2, Internationalization Vari‐
762 ables for the precedence of internationalization variables
763 used to determine the values of locale categories.)
764
765 LC_ALL If set to a non-empty string value, override the values of
766 all the other internationalization variables.
767
768 LC_COLLATE
769 Determine the locale for the behavior of ranges, equivalence
770 classes, and multi-character collating elements used in the
771 pattern matching expressions for the pattern operand, the
772 basic regular expression for the −s option, and the extended
773 regular expression defined for the yesexpr locale keyword in
774 the LC_MESSAGES category.
775
776 LC_CTYPE Determine the locale for the interpretation of sequences of
777 bytes of text data as characters (for example, single-byte as
778 opposed to multi-byte characters in arguments and input
779 files), the behavior of character classes used in the
780 extended regular expression defined for the yesexpr locale
781 keyword in the LC_MESSAGES category, and pattern matching.
782
783 LC_MESSAGES
784 Determine the locale used to process affirmative responses,
785 and the locale used to affect the format and contents of
786 diagnostic messages and prompts written to standard error.
787
788 LC_TIME Determine the format and contents of date and time strings
789 when the −v option is specified.
790
791 NLSPATH Determine the location of message catalogs for the processing
792 of LC_MESSAGES.
793
794 TMPDIR Determine the pathname that provides part of the default
795 global extended header record file, as described for the −o
796 globexthdr= keyword in the OPTIONS section.
797
798 TZ Determine the timezone used to calculate date and time
799 strings when the −v option is specified. If TZ is unset or
800 null, an unspecified default timezone shall be used.
801
802 ASYNCHRONOUS EVENTS
803 Default.
804
805 STDOUT
806 In write mode, if −f is not specified, the standard output shall be the
807 archive formatted according to one of the specifications in the
808 EXTENDED DESCRIPTION section, or some other implementation-defined for‐
809 mat (see −x format).
810
811 In list mode, when the −olistopt=format has been specified, the
812 selected archive members shall be written to standard output using the
813 format described under List Mode Format Specifications. In list mode
814 without the −olistopt=format option, the table of contents of the
815 selected archive members shall be written to standard output using the
816 following format:
817
818 "%s\n", <pathname>
819
820 If the −v option is specified in list mode, the table of contents of
821 the selected archive members shall be written to standard output using
822 the following formats.
823
824 For pathnames representing hard links to previous members of the ar‐
825 chive:
826
827 "%s == %s\n", <ls −l listing>, <linkname>
828
829 For all other pathnames:
830
831 "%s\n", <ls −l listing>
832
833 where <ls −l listing> shall be the format specified by the ls utility
834 with the −l option. When writing pathnames in this format, it is
835 unspecified what is written for fields for which the underlying archive
836 format does not have the correct information, although the correct num‐
837 ber of <blank>-separated fields shall be written.
838
839 In list mode, standard output shall not be buffered more than a path‐
840 name (plus any associated information and a <newline> terminator) at a
841 time.
842
843 STDERR
844 If −v is specified in read, write, or copy modes, pax shall write the
845 pathnames it processes to the standard error output using the following
846 format:
847
848 "%s\n", <pathname>
849
850 These pathnames shall be written as soon as processing is begun on the
851 file or archive member, and shall be flushed to standard error. The
852 trailing <newline>, which shall not be buffered, is written when the
853 file has been read or written.
854
855 If the −s option is specified, and the replacement string has a trail‐
856 ing 'p', substitutions shall be written to standard error in the fol‐
857 lowing format:
858
859 "%s >> %s\n", <original pathname>, <new pathname>
860
861 In all operating modes of pax, optional messages of unspecified format
862 concerning the input archive format and volume number, the number of
863 files, blocks, volumes, and media parts as well as other diagnostic
864 messages may be written to standard error.
865
866 In all formats, for both standard output and standard error, it is
867 unspecified how non-printable characters in pathnames or link names are
868 written.
869
870 When using the −xpax archive format, if a filename, link name, group
871 name, owner name, or any other field in an extended header record can‐
872 not be translated between the codeset in use for that extended header
873 record and the character set of the current locale, pax shall write a
874 diagnostic message to standard error, shall process the file as
875 described for the −o invalid= option, and then shall continue process‐
876 ing with the next file.
877
878 OUTPUT FILES
879 In read mode, the extracted output files shall be of the archived file
880 type. In copy mode, the copied output files shall be the type of the
881 file being copied. In either mode, existing files in the destination
882 hierarchy shall be overwritten only when all permission (−p), modifica‐
883 tion time (−u), and invalid-value (−oinvalid=) tests allow it.
884
885 In write mode, the output file named by the −f option-argument shall be
886 a file formatted according to one of the specifications in the EXTENDED
887 DESCRIPTION section, or some other implementation-defined format.
888
889 EXTENDED DESCRIPTION
890 pax Interchange Format
891 A pax archive tape or file produced in the −xpax format shall contain a
892 series of blocks. The physical layout of the archive shall be identical
893 to the ustar format described in ustar Interchange Format. Each file
894 archived shall be represented by the following sequence:
895
896 * An optional header block with extended header records. This header
897 block is of the form described in pax Header Block, with a typeflag
898 value of x or g. The extended header records, described in pax
899 Extended Header, shall be included as the data for this header
900 block.
901
902 * A header block that describes the file. Any fields in the preceding
903 optional extended header shall override the associated fields in
904 this header block for this file.
905
906 * Zero or more blocks that contain the contents of the file.
907
908 At the end of the archive file there shall be two 512-byte blocks
909 filled with binary zeros, interpreted as an end-of-archive indicator.
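
A reader might test for the end-of-archive indicator as sketched below.
The names are hypothetical, and the sketch assumes a seekable stream
positioned where the next header block would begin.

```python
import io

BLOCK_SIZE = 512
END_OF_ARCHIVE = bytes(2 * BLOCK_SIZE)  # two blocks of binary zeros


def at_end_of_archive(stream):
    """Return True if the next two 512-byte blocks are all zeros.

    The stream position is restored, so the caller can go on to read a
    header block when the indicator is not present.
    """
    start = stream.tell()
    data = stream.read(2 * BLOCK_SIZE)
    stream.seek(start)
    return data == END_OF_ARCHIVE
```

A single zero block, or a short read, is not treated as the indicator.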
910
911 A schematic of an example archive with global extended header records
912 and two actual files is shown in Figure 4-1, pax Format Archive Exam‐
913 ple. In the example, the second file in the archive has no extended
914 header preceding it, presumably because it has no need for extended
915 attributes.
916
917 Figure 4-1: pax Format Archive Example
918
919 pax Header Block
920 The pax header block shall be identical to the ustar header block
921 described in ustar Interchange Format, except that two additional type‐
922 flag values are defined:
923
924 x Represents extended header records for the following file in the
925 archive (which shall have its own ustar header block). The format
926 of these extended header records shall be as described in pax
927 Extended Header.
928
929 g Represents global extended header records for the following files
930 in the archive. The format of these extended header records shall
931 be as described in pax Extended Header. Each value shall affect
932 all subsequent files that do not override that value in their own
933 extended header record and until another global extended header
934 record is reached that provides another value for the same field.
935 The typeflag g global headers should not be used with interchange
936 media that could suffer partial data loss in transporting the ar‐
937 chive.
938
939 For both of these types, the size field shall be the size of the
940 extended header records in octets. The other fields in the header block
941 are not meaningful to this version of the pax utility. However, if this
942 archive is read by a pax utility conforming to the ISO POSIX‐2:1993
943 standard, the header block fields are used to create a regular file
944 that contains the extended header records as data. Therefore, header
945 block field values should be selected to provide reasonable file access
946 to this regular file.
947
948 A further difference from the ustar header block is that data blocks
949 for files of typeflag 1 (the digit one) (hard link) may be included,
950 which means that the size field may be greater than zero. Archives cre‐
951 ated by pax −o linkdata shall include these data blocks with the hard
952 links.
953
954 pax Extended Header
955 A pax extended header contains values that are inappropriate for the
956 ustar header block because of limitations in that format: fields
957 requiring a character encoding other than that described in the
958 ISO/IEC 646:1991 standard, fields representing file attributes not
959 described in the ustar header, and fields whose format or length do not
960 fit the requirements of the ustar header. The values in an extended
961 header add attributes to the following file (or files; see the descrip‐
962 tion of the typeflag g header block) or override values in the follow‐
963 ing header block(s), as indicated in the following list of keywords.
964
965 An extended header shall consist of one or more records, each con‐
966 structed as follows:
967
968 "%d %s=%s\n", <length>, <keyword>, <value>
969
970 The extended header records shall be encoded according to the
971 ISO/IEC 10646‐1:2000 standard UTF‐8 encoding. The <length> field,
972 <blank>, <equals-sign>, and <newline> shown shall be limited to the
973 portable character set, as encoded in UTF‐8. The <keyword> fields can
974 be any UTF‐8 characters. The <length> field shall be the decimal
975 length of the extended header record in octets, including the trailing
976 <newline>. If there is a hdrcharset extended header in effect for a
977 file, the value field for any gname, linkpath, path, and uname extended
978 header records shall be encoded using the character set specified by
979 the hdrcharset extended header record; otherwise, the value field shall
980 be encoded using UTF‐8. The value field for all other keywords speci‐
981 fied by POSIX.1‐2008 shall be encoded using UTF‐8.
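
Because <length> counts its own decimal digits, a writer cannot simply
prepend the length of the rest of the record. The following sketch shows
one way to build and parse records. The function names are hypothetical,
and all values are assumed to be UTF-8 (that is, no hdrcharset record
naming BINARY is in effect).

```python
def encode_record(keyword, value):
    """Build one "<length> <keyword>=<value>\\n" record as UTF-8 bytes."""
    body = b" " + keyword.encode("utf-8") + b"=" + value.encode("utf-8") + b"\n"
    # The length includes its own digits, so iterate to a fixed point.
    length = len(body)
    while len(str(length)) + len(body) != length:
        length = len(str(length)) + len(body)
    return str(length).encode("ascii") + body


def decode_records(data):
    """Parse concatenated records into a dict; later records win."""
    records = {}
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)
        length = int(data[pos:space])
        record = data[space + 1:pos + length]
        if not record.endswith(b"\n"):
            raise ValueError("record does not end with <newline>")
        # A keyword cannot contain '=', so split at the first one only;
        # the value may itself contain '=' or <newline> characters.
        keyword, _, value = record[:-1].partition(b"=")
        records[keyword.decode("utf-8")] = value.decode("utf-8")
        pos += length
    return records
```

For example, the record for path=foo is 12 octets long: two digits, a
<blank>, eight octets of keyword, <equals-sign>, and value, and the
trailing <newline>.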
982
983 The <keyword> field shall be one of the entries from the following list
984 or a keyword provided as an implementation extension. Keywords con‐
985 sisting entirely of lowercase letters, digits, and periods are reserved
986 for future standardization. A keyword shall not include an <equals-
987 sign>. (In the following list, the notation ``file(s)'' or
988 ``block(s)'' is used to acknowledge that a keyword affects the follow‐
989 ing single file after a typeflag x extended header, but possibly multi‐
990 ple files after typeflag g. Any requirements in the list for pax to
991 include a record when in write or copy mode shall apply only when such
992 a record has not already been provided through the use of the −o
993 option. When used in copy mode, pax shall behave as if an archive had
994 been created with applicable extended header records and then
995 extracted.)
996
997 atime The file access time for the following file(s), equivalent to
998 the value of the st_atime member of the stat structure for a
999 file, as described by the stat() function. The access time
1000 shall be restored if the process has appropriate privileges
1001 required to do so. The format of the <value> shall be as
1002 described in pax Extended Header File Times.
1003
1004 charset The name of the character set used to encode the data in the
1005 following file(s). The entries in the following table are
1006 defined to refer to known standards; additional names may be
1007 agreed on between the originator and recipient.
1008
1009 ┌────────────────────────┬───────────────────────────────┐
1010 │ <value> │ Formal Standard │
1011 ├────────────────────────┼───────────────────────────────┤
1012 │ISO-IR 646 1990 │ ISO/IEC 646:1990 │
1013 │ISO-IR 8859 1 1998 │ ISO/IEC 8859‐1:1998 │
1014 │ISO-IR 8859 2 1999 │ ISO/IEC 8859‐2:1999 │
1015 │ISO-IR 8859 3 1999 │ ISO/IEC 8859‐3:1999 │
1016 │ISO-IR 8859 4 1998 │ ISO/IEC 8859‐4:1998 │
1017 │ISO-IR 8859 5 1999 │ ISO/IEC 8859‐5:1999 │
1018 │ISO-IR 8859 6 1999 │ ISO/IEC 8859‐6:1999 │
1019 │ISO-IR 8859 7 1987 │ ISO/IEC 8859‐7:1987 │
1020 │ISO-IR 8859 8 1999 │ ISO/IEC 8859‐8:1999 │
1021 │ISO-IR 8859 9 1999 │ ISO/IEC 8859‐9:1999 │
1022 │ISO-IR 8859 10 1998 │ ISO/IEC 8859‐10:1998 │
1023 │ISO-IR 8859 13 1998 │ ISO/IEC 8859‐13:1998 │
1024 │ISO-IR 8859 14 1998 │ ISO/IEC 8859‐14:1998 │
1025 │ISO-IR 8859 15 1999 │ ISO/IEC 8859‐15:1999 │
1026 │ISO-IR 10646 2000 │ ISO/IEC 10646:2000 │
1027 │ISO-IR 10646 2000 UTF-8 │ ISO/IEC 10646, UTF-8 encoding │
1028 │BINARY │ None. │
1029 └────────────────────────┴───────────────────────────────┘
1030 The encoding is included in an extended header for informa‐
1031 tion only; when pax is used as described in POSIX.1‐2008, it
1032 shall not translate the file data into any other encoding.
1033 The BINARY entry indicates unencoded binary data.
1034
1035 When used in write or copy mode, it is implementation-defined
1036 whether pax includes a charset extended header record for a
1037 file.
1038
1039 comment A series of characters used as a comment. All characters in
1040 the <value> field shall be ignored by pax.
1041
1042 gid The group ID of the group that owns the file, expressed as a
1043 decimal number using digits from the ISO/IEC 646:1991 stan‐
1044 dard. This record shall override the gid field in the follow‐
1045 ing header block(s). When used in write or copy mode, pax
1046 shall include a gid extended header record for each file
1047 whose group ID is greater than 2097151 (octal 7777777).
1048
1049 gname The group of the file(s), formatted as a group name in the
1050 group database. This record shall override the gid and gname
1051 fields in the following header block(s), and any gid extended
1052 header record. When used in read, copy, or list mode, pax
1053 shall translate the name from the encoding in the header
1054 record to the character set appropriate for the group data‐
1055 base on the receiving system. If any of the characters cannot
1056 be translated, and if neither the −oinvalid=UTF‐8 option nor
1057 the −oinvalid=binary option is specified, the results are
1058 implementation-defined. When used in write or copy mode, pax
1059 shall include a gname extended header record for each file
1060 whose group name cannot be represented entirely with the let‐
1061 ters and digits of the portable character set.
1062
1063 hdrcharset
1064 The name of the character set used to encode the value field
1065 of the gname, linkpath, path, and uname pax extended header
1066 records. The entries in the following table are defined to
1067 refer to known standards; additional names may be agreed
1068 between the originator and the recipient.
1069
1070 ┌────────────────────────┬───────────────────────────────┐
1071 │ <value> │ Formal Standard │
1072 ├────────────────────────┼───────────────────────────────┤
1073 │ISO-IR 10646 2000 UTF-8 │ ISO/IEC 10646, UTF-8 encoding │
1074 │BINARY │ None. │
1075 └────────────────────────┴───────────────────────────────┘
1076 If no hdrcharset extended header record is specified, the
1077 default character set used to encode all values in extended
1078 header records shall be the ISO/IEC 10646‐1:2000 standard
1079 UTF‐8 encoding.
1080
1081 The BINARY entry indicates that all values recorded in
1082 extended headers for affected files are unencoded binary data
1083 from the underlying system.
1084
1085 linkpath The pathname of a link being created to another file, of any
1086 type, previously archived. This record shall override the
1087 linkname field in the following ustar header block(s). The
1088 following ustar header block shall determine the type of link
1089 created. If typeflag of the following header block is 1, it
1090 shall be a hard link. If typeflag is 2, it shall be a sym‐
1091 bolic link and the linkpath value shall be the contents of
1092 the symbolic link. The pax utility shall translate the name
1093 of the link (contents of the symbolic link) from the encoding
1094 in the header to the character set appropriate for the local
1095 file system. When used in write or copy mode, pax shall
1096 include a linkpath extended header record for each link whose
1097 pathname cannot be represented entirely with the members of
1098 the portable character set other than NUL.
1099
1100 mtime The file modification time of the following file(s), equiva‐
1101 lent to the value of the st_mtime member of the stat struc‐
1102 ture for a file, as described in the stat() function. This
1103 record shall override the mtime field in the following header
1104 block(s). The modification time shall be restored if the
1105 process has appropriate privileges required to do so. The
1106 format of the <value> shall be as described in pax Extended
1107 Header File Times.
1108
1109 path The pathname of the following file(s). This record shall
1110 override the name and prefix fields in the following header
1111 block(s). The pax utility shall translate the pathname of the
1112 file from the encoding in the header to the character set
1113 appropriate for the local file system.
1114
1115 When used in write or copy mode, pax shall include a path
1116 extended header record for each file whose pathname cannot be
1117 represented entirely with the members of the portable charac‐
1118 ter set other than NUL.
1119
1120 realtime.any
1121 The keywords prefixed by ``realtime.'' are reserved for
1122 future standardization.
1123
1124 security.any
1125 The keywords prefixed by ``security.'' are reserved for
1126 future standardization.
1127
1128 size The size of the file in octets, expressed as a decimal number
1129 using digits from the ISO/IEC 646:1991 standard. This record
1130 shall override the size field in the following header
1131 block(s). When used in write or copy mode, pax shall include
1132 a size extended header record for each file with a size value
1133 greater than 8589934591 (octal 77777777777).
1134
1135 uid The user ID of the file owner, expressed as a decimal number
1136 using digits from the ISO/IEC 646:1991 standard. This record
1137 shall override the uid field in the following header
1138 block(s). When used in write or copy mode, pax shall include
1139 a uid extended header record for each file whose owner ID is
1140 greater than 2097151 (octal 7777777).
1141
1142 uname The owner of the following file(s), formatted as a user name
1143 in the user database. This record shall override the uid and
1144 uname fields in the following header block(s), and any uid
1145 extended header record. When used in read, copy, or list
1146 mode, pax shall translate the name from the encoding in the
1147 header record to the character set appropriate for the user
1148 database on the receiving system. If any of the characters
1149 cannot be translated, and if neither the −oinvalid=UTF‐8
1150 option nor the −oinvalid=binary option is specified, the
1151 results are implementation-defined. When used in write or
1152 copy mode, pax shall include a uname extended header record
1153 for each file whose user name cannot be represented entirely
1154 with the letters and digits of the portable character set.
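
The write-mode requirements in the list above can be summarized in a
sketch that reports which extended header records a file needs. The
function and constant names are hypothetical. The sketch also assumes
that the portable character set other than NUL consists of the control
characters from <alert> to <carriage-return> plus the printable
characters of ISO/IEC 646; it covers only the conditions stated in the
list and no others.

```python
import string

MAX_ID = 0o7777777        # 2097151, limit for uid and gid
MAX_SIZE = 0o77777777777  # 8589934591, limit for size

# Assumption: portable character set other than NUL.
PORTABLE = set(range(0x07, 0x0E)) | set(range(0x20, 0x7F))
LETTERS_AND_DIGITS = set(string.ascii_letters + string.digits)


def required_records(uid, gid, size, path, uname, gname, linkpath=None):
    """Return the keywords for which a record shall be included."""
    needed = []
    if uid > MAX_ID:
        needed.append("uid")
    if gid > MAX_ID:
        needed.append("gid")
    if size > MAX_SIZE:
        needed.append("size")
    if any(ord(c) not in PORTABLE for c in path):
        needed.append("path")
    if linkpath is not None and any(ord(c) not in PORTABLE for c in linkpath):
        needed.append("linkpath")
    if set(uname) - LETTERS_AND_DIGITS:
        needed.append("uname")
    if set(gname) - LETTERS_AND_DIGITS:
        needed.append("gname")
    return needed
```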
1155
1156 If the <value> field is zero length, it shall delete any header block
1157 field, previously entered extended header value, or global extended
1158 header value of the same name.
1159
1160 If a keyword in an extended header record (or in a −o option-argument)
1161 overrides or deletes a corresponding field in the ustar header block,
1162 pax shall ignore the contents of that header block field.
1163
1164 Unlike the ustar header block fields, NULs shall not delimit <value>s;
1165 all characters within the <value> field shall be considered data for
1166 the field. None of the length limitations of the ustar header block
1167 fields in Table 4-14, ustar Header Block shall apply to the extended
1168 header records.
1169
1170 pax Extended Header Keyword Precedence
1171 This section describes the precedence in which the various header
1172 records and fields and command line options are selected to apply to a
1173 file in the archive. When pax is used in read or list modes, it shall
1174 determine a file attribute in the following sequence:
1175
1176 1. If −odelete=keyword-prefix is used, the affected attributes shall
1177 be determined from step 7., if applicable, or ignored otherwise.
1178
1179 2. If −okeyword:= is used, the affected attributes shall be ignored.
1180
1181 3. If −okeyword:=value is used, the affected attribute shall be
1182 assigned the value.
1183
1184 4. If there is a typeflag x extended header record, the affected
1185 attribute shall be assigned the <value>. When extended header
1186 records conflict, the last one given in the header shall take
1187 precedence.
1188
1189 5. If −okeyword=value is used, the affected attribute shall be
1190 assigned the value.
1191
1192 6. If there is a typeflag g global extended header record, the
1193 affected attribute shall be assigned the <value>. When global
1194 extended header records conflict, the last one given in the global
1195 header shall take precedence.
1196
1197 7. Otherwise, the attribute shall be determined from the ustar header
1198 block.
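
The seven steps above can be sketched as a single lookup. All names are
hypothetical: opt_override holds −o keyword:=value settings, opt_default
holds −o keyword=value settings, and delete_prefixes is assumed to be a
list of plain keyword prefixes. A result of None means the attribute is
ignored. The deletion effect of zero-length <value> fields is not
modeled.

```python
def resolve_attribute(keyword, ustar, ext_x=None, ext_g=None,
                      opt_override=None, opt_default=None,
                      delete_prefixes=()):
    """Return one attribute value for read or list mode, or None."""
    ext_x = ext_x or {}
    ext_g = ext_g or {}
    opt_override = opt_override or {}
    opt_default = opt_default or {}

    # Step 1: a deleted keyword is taken from the ustar header (step 7)
    # if it has a field there, and is ignored otherwise.
    if any(keyword.startswith(p) for p in delete_prefixes):
        return ustar.get(keyword)
    # Steps 2 and 3: keyword:= (ignored) or keyword:=value.
    if keyword in opt_override:
        return opt_override[keyword] or None
    # Step 4: typeflag x extended header record.
    if keyword in ext_x:
        return ext_x[keyword]
    # Step 5: keyword=value.
    if keyword in opt_default:
        return opt_default[keyword]
    # Step 6: typeflag g global extended header record.
    if keyword in ext_g:
        return ext_g[keyword]
    # Step 7: ustar header block.
    return ustar.get(keyword)
```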
1199
1200 pax Extended Header File Times
1201 The pax utility shall write an mtime record for each file in write or
1202 copy modes if the file's modification time cannot be represented
1203 exactly in the ustar header logical record described in ustar Inter‐
1204 change Format. This can occur if the time is out of ustar range, or if
1205 the file system of the underlying implementation supports non-integer
1206 time granularities and the time is not an integer. All of these time
1207 records shall be formatted as a decimal representation of the time in
1208 seconds since the Epoch. If a <period> ('.') decimal point character
1209 is present, the digits to the right of the point shall represent the
1210 units of a subsecond timing granularity, where the first digit is
1211 tenths of a second and each subsequent digit is a tenth of the previous
1212 digit. In read or copy mode, the pax utility shall truncate the time of
1213 a file to the greatest value that is not greater than the input header
1214 file time. In write or copy mode, the pax utility shall output a time
1215 exactly if it can be represented exactly as a decimal number, and oth‐
1216 erwise shall generate only enough digits so that the same time shall be
1217 recovered if the file is extracted on a system whose underlying imple‐
1218 mentation supports the same time granularity.
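
The time format can be sketched as below. The function names are
hypothetical, and the receiving system is assumed to have nanosecond
granularity. Integer arithmetic is used so that no precision is lost,
and extra digits are discarded so that the result is never greater than
the input time.

```python
NS_PER_SECOND = 10**9


def parse_pax_time(value):
    """Return (seconds, nanoseconds), truncated toward minus infinity."""
    negative = value.startswith("-")
    whole, _, frac = value.lstrip("+-").partition(".")
    ns = int(whole or "0") * NS_PER_SECOND + int((frac + "000000000")[:9])
    dropped_digits = any(d != "0" for d in frac[9:])
    if negative:
        # Dropping digits from a negative time must move it downward.
        ns = -ns - (1 if dropped_digits else 0)
    return divmod(ns, NS_PER_SECOND)


def format_pax_time(seconds, nanoseconds):
    """Write a time exactly, with no more digits than are needed."""
    total = seconds * NS_PER_SECOND + nanoseconds
    sign = "-" if total < 0 else ""
    whole, frac = divmod(abs(total), NS_PER_SECOND)
    text = sign + str(whole)
    if frac:
        text += "." + ("%09d" % frac).rstrip("0")
    return text
```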
1219
1220 ustar Interchange Format
1221 A ustar archive tape or file shall contain a series of logical records.
1222 Each logical record shall be a fixed-size logical record of 512 octets
1223 (see below). Although this format may be thought of as being stored on
1224 9-track industry-standard 12.7 mm (0.5 in) magnetic tape, other types
1225 of transportable media are not excluded. Each file archived shall be
1226 represented by a header logical record that describes the file, fol‐
1227 lowed by zero or more logical records that give the contents of the
1228 file. At the end of the archive file there shall be two 512-octet logi‐
1229 cal records filled with binary zeros, interpreted as an end-of-archive
1230 indicator.
1231
1232 The logical records may be grouped for physical I/O operations, as
1233 described under the −bblocksize and −x ustar options. Each group of
1234 logical records may be written with a single operation equivalent to
1235 the write() function. On magnetic tape, the result of this write shall
1236 be a single tape physical block. The last physical block shall always
1237 be the full size, so logical records after the two zero logical records
1238 may contain undefined data.
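
Padding the final physical block can be sketched as follows. The
function name is hypothetical. Zero fill is only one permitted choice,
since the records after the two zero logical records may contain
undefined data.

```python
RECORD_SIZE = 512


def pad_to_physical_block(archive, blocksize):
    """Pad archive bytes so that the last physical block is full size."""
    if blocksize <= 0 or blocksize % RECORD_SIZE:
        raise ValueError("blocksize must be a positive multiple of 512")
    shortfall = -len(archive) % blocksize
    return archive + bytes(shortfall)
```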
1239
1240 The header logical record shall be structured as shown in the following
1241 table. All lengths and offsets are in decimal.
1242
1243 Table 4-14: ustar Header Block
1244
1245 ┌───────────┬──────────────┬────────────────────┐
1246 │Field Name │ Octet Offset │ Length (in Octets) │
1247 ├───────────┼──────────────┼────────────────────┤
1248 │name │ 0 │ 100 │
1249 │mode │ 100 │ 8 │
1250 │uid │ 108 │ 8 │
1251 │gid │ 116 │ 8 │
1252 │size │ 124 │ 12 │
1253 │mtime │ 136 │ 12 │
1254 │chksum │ 148 │ 8 │
1255 │typeflag │ 156 │ 1 │
1256 │linkname │ 157 │ 100 │
1257 │magic │ 257 │ 6 │
1258 │version │ 263 │ 2 │
1259 │uname │ 265 │ 32 │
1260 │gname │ 297 │ 32 │
1261 │devmajor │ 329 │ 8 │
1262 │devminor │ 337 │ 8 │
1263 │prefix │ 345 │ 155 │
1264 └───────────┴──────────────┴────────────────────┘
1265 All characters in the header logical record shall be represented in the
1266 coded character set of the ISO/IEC 646:1991 standard. For maximum
1267 portability between implementations, names should be selected from
1268 characters represented by the portable filename character set as octets
1269 with the most significant bit zero. If an implementation supports the
1270 use of characters outside of <slash> and the portable filename charac‐
1271 ter set in names for files, users, and groups, one or more implementa‐
1272 tion-defined encodings of these characters shall be provided for inter‐
1273 change purposes.
1274
1275 However, the pax utility shall never create filenames on the local sys‐
1276 tem that cannot be accessed via the procedures described in
1277 POSIX.1‐2008. If a filename is found on the medium that would create an
1278 invalid filename, it is implementation-defined whether the data from
1279 the file is stored on the file hierarchy and under what name it is
1280 stored. The pax utility may choose to ignore these files as long as it
1281 produces an error indicating that the file is being ignored.
1282
1283 Each field within the header logical record is contiguous; that is,
1284 there is no padding used. Each character on the archive medium shall be
1285 stored contiguously.
1286
1287 The fields magic, uname, and gname are character strings each termi‐
1288 nated by a NUL character. The fields name, linkname, and prefix are
1289 NUL-terminated character strings except when all characters in the
1290 array contain non-NUL characters including the last character. The ver‐
1291 sion field is two octets containing the characters "00" (zero-zero).
1292 The typeflag contains a single character. All other fields are leading
1293 zero-filled octal numbers using digits from the ISO/IEC 646:1991 stan‐
1294 dard IRV. Each numeric field is terminated by one or more <space> or
1295 NUL characters.
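
A minimal sketch of decoding a header logical record, using the offsets
and lengths of Table 4-14 and the field rules above. The function name
and the returned dictionary are hypothetical, and an empty numeric field
is assumed to mean zero.

```python
FIELDS = [
    ("name", 0, 100), ("mode", 100, 8), ("uid", 108, 8), ("gid", 116, 8),
    ("size", 124, 12), ("mtime", 136, 12), ("chksum", 148, 8),
    ("typeflag", 156, 1), ("linkname", 157, 100), ("magic", 257, 6),
    ("version", 263, 2), ("uname", 265, 32), ("gname", 297, 32),
    ("devmajor", 329, 8), ("devminor", 337, 8), ("prefix", 345, 155),
]
STRING_FIELDS = {"name", "linkname", "prefix", "magic", "uname", "gname"}
RAW_FIELDS = {"typeflag", "version"}


def parse_header(block):
    """Decode a 512-octet ustar header logical record into a dict."""
    if len(block) != 512:
        raise ValueError("a header logical record is 512 octets")
    header = {}
    for name, offset, length in FIELDS:
        raw = block[offset:offset + length]
        if name in STRING_FIELDS:
            # NUL-terminated unless the string fills the whole field.
            header[name] = raw.split(b"\0", 1)[0].decode("ascii", "replace")
        elif name in RAW_FIELDS:
            header[name] = raw.decode("ascii", "replace")
        else:
            # Octal digits terminated by <space> or NUL characters.
            tokens = raw.replace(b"\0", b" ").split()
            header[name] = int(tokens[0], 8) if tokens else 0
    return header
```

The fields occupy the first 500 octets; the rest of the record is not
described by the table.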
1296
1297 The name and the prefix fields shall produce the pathname of the file.
1298 A new pathname shall be formed, if prefix is not an empty string (its
1299 first character is not NUL), by concatenating prefix (up to the first
1300 NUL character), a <slash> character, and name; otherwise, name is used
1301 alone. In either case, name is terminated at the first NUL character.
1302 If prefix begins with a NUL character, it shall be ignored. In this
1303 manner, pathnames of at most 256 characters can be supported. If a
1304 pathname does not fit in the space provided, pax shall notify the user
1305 of the error, and shall not store any part of the file—header or data—
1306 on the medium.
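
The pathname rules can be sketched in both directions. The names are
hypothetical. The limit of 256 characters follows from a 155-octet
prefix, one <slash>, and a 100-octet name; the split point must fall on
a <slash> so that concatenation reproduces the original pathname.

```python
NAME_SIZE = 100
PREFIX_SIZE = 155
MAX_PATHNAME = PREFIX_SIZE + 1 + NAME_SIZE  # 256


def join_path(prefix, name):
    """Form the pathname from the raw prefix and name fields."""
    name = name.split(b"\0", 1)[0]
    prefix = prefix.split(b"\0", 1)[0]
    return prefix + b"/" + name if prefix else name


def split_path(path):
    """Split a pathname into (prefix, name), or raise ValueError."""
    if len(path) <= NAME_SIZE:
        return b"", path
    for i in range(1, len(path)):
        name_length = len(path) - i - 1
        if (path[i:i + 1] == b"/" and i <= PREFIX_SIZE
                and 1 <= name_length <= NAME_SIZE):
            return path[:i], path[i + 1:]
    raise ValueError("pathname does not fit in the space provided")
```

A pathname longer than 100 characters that has no <slash> at a suitable
position does not fit even if it is shorter than 256 characters.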
1307
1308 The linkname field, described below, shall not use the prefix to pro‐
1309 duce a pathname. As such, a linkname is limited to 100 characters. If
1310 the name does not fit in the space provided, pax shall notify the user
1311 of the error, and shall not attempt to store the link on the medium.
1312
1313 The mode field provides 12 bits encoded in the ISO/IEC 646:1991 stan‐
1314 dard octal digit representation. The encoded bits shall represent the
1315 following values:
1316
1317 Table: ustar mode Field
1318
1319 ┌──────────┬──────────────────┬─────────────────────────────────────────────────┐
1320 │Bit Value │ POSIX.1‐2008 Bit │ Description │
1321 ├──────────┼──────────────────┼─────────────────────────────────────────────────┤
1322 │ 04000 │ S_ISUID │ Set UID on execution. │
1323 │ 02000 │ S_ISGID │ Set GID on execution. │
1324 │ 01000 │ <reserved> │ Reserved for future standardization. │
1325 │ 00400 │ S_IRUSR │ Read permission for file owner class. │
1326 │ 00200 │ S_IWUSR │ Write permission for file owner class. │
1327 │ 00100 │ S_IXUSR │ Execute/search permission for file owner class. │
1328 │ 00040 │ S_IRGRP │ Read permission for file group class. │
1329 │ 00020 │ S_IWGRP │ Write permission for file group class. │
1330 │ 00010 │ S_IXGRP │ Execute/search permission for file group class. │
1331 │ 00004 │ S_IROTH │ Read permission for file other class. │
1332 │ 00002 │ S_IWOTH │ Write permission for file other class. │
1333 │ 00001 │ S_IXOTH │ Execute/search permission for file other class. │
1334 └──────────┴──────────────────┴─────────────────────────────────────────────────┘
1335 When appropriate privileges are required to set one of these mode bits,
1336 and the user restoring the files from the archive does not have appro‐
1337 priate privileges, the mode bits for which the user does not have
1338 appropriate privileges shall be ignored. Some of the mode bits in the
1339 archive format are not mentioned elsewhere in this volume of
1340 POSIX.1‐2008. If the implementation does not support those bits, they
1341 may be ignored.
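
As an illustration of the table above, the following Python sketch decodes a mode field into the names of the bits it sets. The function name decode_ustar_mode and the assumption that the field is padded or terminated with <space> or NUL characters are illustrative choices, not requirements stated here.

```python
# Bit values and names copied from the ustar mode field table.
USTAR_MODE_BITS = [
    (0o4000, "S_ISUID"),
    (0o2000, "S_ISGID"),
    (0o1000, "<reserved>"),
    (0o0400, "S_IRUSR"),
    (0o0200, "S_IWUSR"),
    (0o0100, "S_IXUSR"),
    (0o0040, "S_IRGRP"),
    (0o0020, "S_IWGRP"),
    (0o0010, "S_IXGRP"),
    (0o0004, "S_IROTH"),
    (0o0002, "S_IWOTH"),
    (0o0001, "S_IXOTH"),
]


def decode_ustar_mode(field: bytes) -> list:
    """Return the names of the bits set in an octal-digit mode field."""
    # Assumption: the field is padded or terminated with <space> or NUL.
    digits = field.decode("ascii").strip(" \0")
    value = int(digits, 8) if digits else 0
    # Only the low 12 bits are defined by the table.
    value &= 0o7777
    return [name for bit, name in USTAR_MODE_BITS if value & bit]


print(decode_ustar_mode(b"0000644\0"))
```

A reader that lacks appropriate privileges for a bit, or does not support it, would simply drop that name from the result before applying the mode.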
1342
1343 The uid and gid fields are the user and group ID of the owner and group
1344 of the file, respectively.
1345
1346 The size field is the size of the file in octets. If the typeflag field
1347 is set to specify a file to be of type 1 (a link) or 2 (a symbolic
1348 link), the size field shall be specified as zero. If the typeflag field
1349 is set to specify a file of type 5 (directory), the size field shall be
1350 interpreted as described under the definition of that record type. No
1351 data logical records are stored for types 1, 2, or 5. If the typeflag
1352 field is set to 3 (character special file), 4 (block special file), or
1353 6 (FIFO), the meaning of the size field is unspecified by this volume
1354 of POSIX.1‐2008, and no data logical records shall be stored on the
1355 medium. Additionally, for type 6, the size field shall be ignored when
1356 reading. If the typeflag field is set to any other value, the number of
1357 logical records written following the header shall be (size+511)/512,
1358 ignoring any fraction in the result of the division.
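
The record count described above can be sketched in Python as follows. The function name data_records is hypothetical, and the typeflag is passed as a one-character string for readability.

```python
# Typeflags for which no data logical records are stored on the medium:
# links (1, 2), special files (3, 4), directories (5), and FIFOs (6).
NO_DATA_TYPES = {"1", "2", "3", "4", "5", "6"}


def data_records(size: int, typeflag: str) -> int:
    """Number of 512-octet logical records that follow the header."""
    if typeflag in NO_DATA_TYPES:
        return 0
    # Integer division discards the fraction, as the text requires.
    return (size + 511) // 512


for size in (0, 1, 512, 513):
    print(size, data_records(size, "0"))
```

Adding 511 before dividing rounds any partial final record up to a whole one, so a 513-octet file occupies two records.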
1359
1360 The mtime field shall be the modification time of the file at the time
1361 it was archived. It is the ISO/IEC 646:1991 standard representation of
1362 the octal value of the modification time obtained from the stat() func‐
1363 tion.
1364
1365 The chksum field shall be the ISO/IEC 646:1991 standard IRV representa‐
1366 tion of the octal value of the simple sum of all octets in the header
1367 logical record. Each octet in the header shall be treated as an
1368 unsigned value. These values shall be added to an unsigned integer,
1369 initialized to zero, the precision of which is not less than 17 bits.
1370 When calculating the checksum, the chksum field is treated as if it
1371 were all <space> characters.
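
A minimal Python sketch of this calculation follows. The offset 148 and length 8 of the chksum field are assumptions taken from the ustar header layout given earlier in this description; the function name is illustrative.

```python
HEADER_SIZE = 512
CHKSUM_OFFSET = 148  # assumed position of chksum in the ustar header
CHKSUM_LENGTH = 8    # assumed length of chksum in octets


def ustar_checksum(header: bytes) -> int:
    """Sum all header octets as unsigned values, with chksum as spaces."""
    if len(header) != HEADER_SIZE:
        raise ValueError("a ustar header is one 512-octet logical record")
    blanked = (
        header[:CHKSUM_OFFSET]
        + b" " * CHKSUM_LENGTH
        + header[CHKSUM_OFFSET + CHKSUM_LENGTH:]
    )
    # Iterating over bytes yields unsigned integers in the range 0..255.
    return sum(blanked)


print(ustar_checksum(bytes(HEADER_SIZE)))
```

The 17-bit minimum precision follows from the largest possible sum: 512 octets of value 255 give 130560, which is below 2^17 (131072) but above 2^16.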
1372
1373 The typeflag field specifies the type of file archived. If a particular
1374 implementation does not recognize the type, or the user does not have
1375 appropriate privileges to create that type, the file shall be extracted
1376 as if it were a regular file if the file type is defined to have a
1377 meaning for the size field that could cause data logical records to be
1378 written on the medium (see the previous description for size). If con‐
1379 version to a regular file occurs, the pax utility shall produce an
1380 error indicating that the conversion took place. All of the typeflag
1381 fields shall be coded in the ISO/IEC 646:1991 standard IRV:
1382
1383 0 Represents a regular file. For backwards-compatibility, a type‐
1384 flag value of binary zero ('\0') should be recognized as mean‐
1385 ing a regular file when extracting files from the archive. Ar‐
1386 chives written with this version of the archive file format
1387 create regular files with a typeflag value of the
1388 ISO/IEC 646:1991 standard IRV '0'.
1389
1390 1 Represents a file linked to another file, of any type, previ‐
1391 ously archived. Such files are identified by having the same
1392 device and file serial numbers, and pathnames that refer to
1393 different directory entries. All such files shall be archived
1394 as linked files. The linked-to name is specified in the
1395 linkname field with a NUL-character terminator if it is less
1396 than 100 octets in length.
1397
1398 2 Represents a symbolic link. The contents of the symbolic link
1399 shall be stored in the linkname field.
1400
1401 3,4 Represent character special files and block special files
1402 respectively. In this case the devmajor and devminor fields
1403 shall contain information defining the device, the format of
1404 which is unspecified by this volume of POSIX.1‐2008. Implemen‐
1405 tations may map the device specifications to their own local
1406 specification or may ignore the entry.
1407
1408 5 Specifies a directory or subdirectory. On systems where disk
1409 allocation is performed on a directory basis, the size field
1410 shall contain the maximum number of octets (which may be
1411 rounded to the nearest disk block allocation unit) that the
1412 directory may hold. A size field of zero indicates no such
1413 limiting. Systems that do not support limiting in this manner
1414 should ignore the size field.
1415
1416 6 Specifies a FIFO special file. Note that the archiving of a
1417 FIFO file archives the existence of this file and not its con‐
1418 tents.
1419
1420 7 Reserved to represent a file to which an implementation has
1421 associated some high-performance attribute. Implementations
1422 without such extensions should treat this file as a regular
1423 file (type 0).
1424
1425 A‐Z The letters 'A' to 'Z', inclusive, are reserved for custom
1426 implementations. All other values are reserved for future ver‐
1427 sions of this standard.
1428
1429 It is unspecified whether files with pathnames that refer to the same
1430 directory entry are archived as linked files or as separate files. If
1431 they are archived as linked files, this means that attempting to
1432 extract both pathnames from the resulting archive will always cause an
1433 error (unless the −u option is used) because the link cannot be cre‐
1434 ated.
1435
1436 It is unspecified whether files with the same device and file serial
1437 numbers being appended to an archive are treated as linked files to
1438 members that were in the archive before the append.
1439
1440 Attempts to archive a socket using ustar interchange format shall pro‐
1441 duce a diagnostic message. Handling of other file types is implementa‐
1442 tion-defined.
1443
1444 The magic field is the specification that this archive was output in
1445 this archive format. If this field contains ustar (the five characters
1446 from the ISO/IEC 646:1991 standard IRV shown followed by NUL), the
1447 uname and gname fields shall contain the ISO/IEC 646:1991 standard IRV
1448 representation of the owner and group of the file, respectively (trun‐
1449 cated to fit, if necessary). When the file is restored by a privileged,
1450 protection-preserving version of the utility, the user and group data‐
1451 bases shall be scanned for these names. If found, the user and group
1452 IDs contained within these files shall be used rather than the values
1453 contained within the uid and gid fields.
1454
1455 cpio Interchange Format
1456 The octet-oriented cpio archive format shall be a series of entries,
1457 each comprising a header that describes the file, the name of the file,
1458 and then the contents of the file.
1459
1460 An archive may be recorded as a series of fixed-size blocks of octets.
1461 This blocking shall be used only to make physical I/O more efficient.
1462 The last group of blocks shall always be at the full size.
1463
1464 For the octet-oriented cpio archive format, the individual entry infor‐
1465 mation shall be in the order indicated and described by the following
1466 table; see also the <cpio.h> header.
1467
1468 Table 4-16: Octet-Oriented cpio Archive Entry
1469
1470 ┌─────────────────────┬────────────────────┬─────────────────┐
1471 │ Header Field Name │ Length (in Octets) │ Interpreted as │
1472 ├─────────────────────┼────────────────────┼─────────────────┤
1473 │c_magic │ 6 │ Octal number │
1474 │c_dev │ 6 │ Octal number │
1475 │c_ino │ 6 │ Octal number │
1476 │c_mode │ 6 │ Octal number │
1477 │c_uid │ 6 │ Octal number │
1478 │c_gid │ 6 │ Octal number │
1479 │c_nlink │ 6 │ Octal number │
1480 │c_rdev │ 6 │ Octal number │
1481 │c_mtime │ 11 │ Octal number │
1482 │c_namesize │ 6 │ Octal number │
1483 │c_filesize │ 11 │ Octal number │
1484 ├─────────────────────┼────────────────────┼─────────────────┤
1485 │Filename Field Name │ Length │ Interpreted as │
1486 ├─────────────────────┴────────────────────┴─────────────────┤
1487 │c_name c_namesize Pathname string │
1488 ├─────────────────────┬────────────────────┬─────────────────┤
1489 │File Data Field Name │ Length │ Interpreted as │
1490 ├─────────────────────┴────────────────────┴─────────────────┤
1491 │c_filedata c_filesize Data │
1492 └────────────────────────────────────────────────────────────┘
1493 cpio Header
1494 For each file in the archive, a header as defined previously shall be
1495 written. The information in the header fields is written as streams of
1496 the ISO/IEC 646:1991 standard characters interpreted as octal numbers.
1497 The octal numbers shall be extended to the necessary length by append‐
1498 ing the ISO/IEC 646:1991 standard IRV zeros at the most-significant-
1499 digit end of the number; the result is written to the most-significant
1500 digit of the stream of octets first. The fields shall be interpreted
1501 as follows:
1502
1503 c_magic Identify the archive as being a transportable archive by con‐
1504 taining the identifying value "070707".
1505
1506 c_dev, c_ino
1507 Contains values that uniquely identify the file within the
1508 archive (that is, no files contain the same pair of c_dev and
1509 c_ino values unless they are links to the same file). The
1510 values shall be determined in an unspecified manner.
1511
1512 c_mode Contains the file type and access permissions as defined in
1513 the following table.
1514
1515 Table 4-17: Values for cpio c_mode Field
1516
1517 ┌──────────────────────┬─────────┬────────────────────────┐
1518 │File Permissions Name │  Value  │ Indicates              │
1519 ├──────────────────────┼─────────┼────────────────────────┤
1520 │C_IRUSR               │ 000400  │ Read by owner          │
1521 │C_IWUSR               │ 000200  │ Write by owner         │
1522 │C_IXUSR               │ 000100  │ Execute by owner       │
1523 │C_IRGRP               │ 000040  │ Read by group          │
1524 │C_IWGRP               │ 000020  │ Write by group         │
1525 │C_IXGRP               │ 000010  │ Execute by group       │
1526 │C_IROTH               │ 000004  │ Read by others         │
1527 │C_IWOTH               │ 000002  │ Write by others        │
1528 │C_IXOTH               │ 000001  │ Execute by others      │
1529 │C_ISUID               │ 004000  │ Set uid                │
1530 │C_ISGID               │ 002000  │ Set gid                │
1531 │C_ISVTX               │ 001000  │ Reserved               │
1532 ├──────────────────────┼─────────┼────────────────────────┤
1533 │File Type Name        │  Value  │ Indicates              │
1534 ├──────────────────────┼─────────┼────────────────────────┤
1535 │C_ISDIR               │ 040000  │ Directory              │
1536 │C_ISFIFO              │ 010000  │ FIFO                   │
1537 │C_ISREG               │ 0100000 │ Regular file           │
1538 │C_ISLNK               │ 0120000 │ Symbolic link          │
1539 │                      │         │                        │
1540 │C_ISBLK               │ 060000  │ Block special file     │
1541 │C_ISCHR               │ 020000  │ Character special file │
1542 │C_ISSOCK              │ 0140000 │ Socket                 │
1543 │                      │         │                        │
1544 │C_ISCTG               │ 0110000 │ Reserved               │
1545 └──────────────────────┴─────────┴────────────────────────┘
1546 Directories, FIFOs, symbolic links, and regular files shall
1547 be supported on a system conforming to this volume of
1548 POSIX.1‐2008; additional values defined previously are
1549 reserved for compatibility with existing systems. Additional
1550 file types may be supported; however, such files should not
1551 be written to archives intended to be transported to other
1552 systems.
1553
1554 c_uid Contains the user ID of the owner.
1555
1556 c_gid Contains the group ID of the group.
1557
1558 c_nlink Contains a number greater than or equal to the number of
1559 links in the archive referencing the file. If the −a option
1560 is used to append to a cpio archive, then the pax utility
1561 need not account for the files in the existing part of the
1562 archive when calculating the c_nlink values for the appended
1563 part of the archive, and need not alter the c_nlink values in
1564 the existing part of the archive if additional files with the
1565 same c_dev and c_ino values are appended to the archive.
1566
1567 c_rdev Contains implementation-defined information for character or
1568 block special files.
1569
1570 c_mtime Contains the latest time of modification of the file at the
1571 time the archive was created.
1572
1573 c_namesize
1574 Contains the length of the pathname, including the terminat‐
1575 ing NUL character.
1576
1577 c_filesize
1578 Contains the length in octets of the data section following
1579 the header structure.
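
The header layout and field interpretations above can be sketched in Python as follows. The function names are illustrative, and the file type mask 0170000 is an assumption: it is not named in the table, but it covers every file type value listed there.

```python
# Field names and lengths from the octet-oriented cpio archive entry table.
CPIO_FIELDS = [
    ("c_magic", 6), ("c_dev", 6), ("c_ino", 6), ("c_mode", 6),
    ("c_uid", 6), ("c_gid", 6), ("c_nlink", 6), ("c_rdev", 6),
    ("c_mtime", 11), ("c_namesize", 6), ("c_filesize", 11),
]
HEADER_LEN = sum(length for _, length in CPIO_FIELDS)  # 76 octets

# File type values from the c_mode table.
C_FILE_TYPES = {
    0o040000: "directory", 0o010000: "FIFO",
    0o100000: "regular file", 0o120000: "symbolic link",
    0o060000: "block special file", 0o020000: "character special file",
    0o140000: "socket", 0o110000: "reserved (C_ISCTG)",
}
TYPE_MASK = 0o170000  # assumption: mask covering all listed type values


def parse_cpio_header(data: bytes) -> dict:
    """Decode the fixed-length header into a dict of integers."""
    if len(data) < HEADER_LEN:
        raise ValueError("truncated cpio header")
    fields, pos = {}, 0
    for name, length in CPIO_FIELDS:
        # Each field is a zero-padded octal number in ASCII digits.
        fields[name] = int(data[pos:pos + length].decode("ascii"), 8)
        pos += length
    if fields["c_magic"] != 0o070707:
        raise ValueError("not an octet-oriented cpio header")
    return fields


def split_c_mode(c_mode: int) -> tuple:
    """Return (file type description, permission bits)."""
    return C_FILE_TYPES.get(c_mode & TYPE_MASK, "unknown"), c_mode & 0o7777


sample = (b"070707" b"000001" b"000002" b"100644" b"001750" b"001750"
          b"000001" b"000000" b"00000000000" b"000006" b"00000000005")
fields = parse_cpio_header(sample)
print(fields["c_namesize"], fields["c_filesize"])
print(split_c_mode(fields["c_mode"]))
```

After the 76 header octets, a reader would consume c_namesize octets of pathname and then c_filesize octets of data.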
1580
1581 cpio Filename
1582 The c_name field shall contain the pathname of the file. The length of
1583 this field in octets is the value of c_namesize.
1584
1585 If a filename is found on the medium that would create an invalid path‐
1586 name, it is implementation-defined whether the data from the file is
1587 stored on the file hierarchy and under what name it is stored.
1588
1589 All characters shall be represented in the ISO/IEC 646:1991 standard
1590 IRV. For maximum portability between implementations, names should be
1591 selected from characters represented by the portable filename character
1592 set as octets with the most significant bit zero. If an implementation
1593 supports the use of characters outside the portable filename character
1594 set in names for files, users, and groups, one or more implementation-
1595 defined encodings of these characters shall be provided for interchange
1596 purposes. However, the pax utility shall never create filenames on the
1597 local system that cannot be accessed via the procedures described pre‐
1598 viously in this volume of POSIX.1‐2008. If a filename is found on the
1599 medium that would create an invalid filename, it is implementation-
1600 defined whether the data from the file is stored on the local file sys‐
1601 tem and under what name it is stored. The pax utility may choose to
1602 ignore these files as long as it produces an error indicating that the
1603 file is being ignored.
1604
1605 cpio File Data
1606 Following c_name, there shall be c_filesize octets of data. Interpreta‐
1607 tion of such data occurs in a manner dependent on the file. For regular
1608 files, the data shall consist of the contents of the file. For symbolic
1609 links, the data shall consist of the contents of the symbolic link. If
1610 c_filesize is zero, no data shall be contained in c_filedata.
1611
1612 When restoring from an archive:
1613
1614 * If the user does not have appropriate privileges to create a file
1615 of the specified type, pax shall ignore the entry and write an
1616 error message to standard error.
1617
1618 * Only regular files and symbolic links have data to be restored.
1619 Presuming a regular file meets any selection criteria that might be
1620 imposed on the format-reading utility by the user, such data shall
1621 be restored.
1622
1623 * If a user does not have appropriate privileges to set a particular
1624 mode flag, the flag shall be ignored. Some of the mode flags in the
1625 archive format are not mentioned elsewhere in this volume of
1626 POSIX.1‐2008. If the implementation does not support those flags,
1627 they may be ignored.
1628
1629 cpio Special Entries
1630 FIFO special files, directories, and the trailer shall be recorded with
1631 c_filesize equal to zero. Symbolic links shall be recorded with c_file‐
1632 size equal to the length of the contents of the symbolic link. For
1633 other special files, c_filesize is unspecified by this volume of
1634 POSIX.1‐2008. The header for the next file entry in the archive shall
1635 be written directly after the last octet of the file entry preceding
1636 it. A header denoting the filename TRAILER!!! shall indicate the end
1637 of the archive; the contents of octets in the last block of the archive
1638 following such a header are undefined.
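
As a rough sketch of how entries and the trailer fit together, the following Python code builds a two-entry archive. The helper names, the zero values used for c_dev, c_ino, c_uid, c_gid, c_rdev, and c_mtime, and the c_mode and c_nlink values chosen for the trailer are all illustrative assumptions.

```python
def cpio_field(value: int, width: int) -> bytes:
    """Encode value as octal digits, zero-padded at the most significant end."""
    digits = format(value, "o")
    if len(digits) > width:
        raise ValueError("value does not fit in %d octal digits" % width)
    return digits.rjust(width, "0").encode("ascii")


def cpio_entry(name: str, mode: int, data: bytes = b"") -> bytes:
    """Build header + c_name + c_filedata for one entry."""
    name_bytes = name.encode("ascii") + b"\0"  # c_namesize counts the NUL
    header = b"".join([
        b"070707",                        # c_magic
        cpio_field(0, 6),                 # c_dev (placeholder)
        cpio_field(0, 6),                 # c_ino (placeholder)
        cpio_field(mode, 6),              # c_mode
        cpio_field(0, 6),                 # c_uid (placeholder)
        cpio_field(0, 6),                 # c_gid (placeholder)
        cpio_field(1, 6),                 # c_nlink
        cpio_field(0, 6),                 # c_rdev (placeholder)
        cpio_field(0, 11),                # c_mtime (placeholder)
        cpio_field(len(name_bytes), 6),   # c_namesize
        cpio_field(len(data), 11),        # c_filesize
    ])
    return header + name_bytes + data


def cpio_trailer() -> bytes:
    """The end-of-archive entry: filename TRAILER!!!, c_filesize zero."""
    return cpio_entry("TRAILER!!!", 0)


# Each header starts directly after the last octet of the previous entry.
archive = cpio_entry("hello", 0o100644, b"hi\n") + cpio_trailer()
print(len(archive))
```

Any octets added after the trailer to fill the last block carry no meaning, so a reader can stop at the TRAILER!!! entry.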
1639
1641 The following exit values shall be returned:
1642
1643 0 All files were processed successfully.
1644
1645 >0 An error occurred.
1646
1648 If pax cannot create a file or a link when reading an archive or cannot
1649 find a file when writing an archive, or cannot preserve the user ID,
1650 group ID, or file mode when the −p option is specified, a diagnostic
1651 message shall be written to standard error and a non-zero exit status
1652 shall be returned, but processing shall continue. In the case where pax
1653 cannot create a link to a file, pax shall not, by default, create a
1654 second copy of the file.
1655
1656 If the extraction of a file from an archive is prematurely terminated
1657 by a signal or error, pax may have only partially extracted the file or
1658 (if the −n option was not specified) may have extracted a file of the
1659 same name as that specified by the user, but which is not the file the
1660 user wanted. Additionally, the file modes of extracted directories may
1661 have additional bits from the S_IRWXU mask set as well as incorrect
1662 modification and access times.
1663
1664 The following sections are informative.
1665
1667 Caution is advised when using the −a option to append to a cpio format
1668 archive. If any of the files being appended happen to be given the same
1669 c_dev and c_ino values as a file in the existing part of the archive,
1670 then they may be treated as links to that file on extraction. Thus, it
1671 is risky to use −a with cpio format except when it is done on the same
1672 system that the original archive was created on, and with the same pax
1673 utility, and in the knowledge that there has been little or no file
1674 system activity since the original archive was created that could lead
1675 to any of the files appended being given the same c_dev and c_ino val‐
1676 ues as an unrelated file in the existing part of the archive. Also,
1677 when (intentionally) appending additional links to a file in the exist‐
1678 ing part of the archive, the c_nlink values in the modified archive can
1679 be smaller than the number of links to the file in the archive, which
1680 may mean that the links are not preserved on extraction.
1681
1682 The −p (privileges) option was invented to reconcile differences
1683 between historical tar and cpio implementations. In particular, the two
1684 utilities use −m in diametrically opposed ways. The −p option also pro‐
1685 vides a consistent means of extending the ways in which future file
1686 attributes can be addressed, such as for enhanced security systems or
1687 high-performance files. Although it may seem complex, there are really
1688 two modes that are most commonly used:
1689
1690 −p e ``Preserve everything''. This would be used by the historical
1691 superuser, someone with all appropriate privileges, to preserve
1692 all aspects of the files as they are recorded in the archive.
1693 The e flag is the sum of o and p, and other implementation-
1694 defined attributes.
1695
1696 −p p ``Preserve'' the file mode bits. This would be used by the user
1697 with regular privileges who wished to preserve aspects of the
1698 file other than the ownership. The file times are preserved by
1699 default, but two other flags are offered to disable these and
1700 use the time of extraction.
1701
1702 The one pathname per line format of standard input precludes pathnames
1703 containing <newline> characters. Although such pathnames violate the
1704 portable filename guidelines, they may exist and their presence may
1705 inhibit usage of pax within shell scripts. This problem is inherited
1706 from historical archive programs. The problem can be avoided by listing
1707 filename arguments on the command line instead of on standard input.
1708
1709 It is almost certain that appropriate privileges are required for pax
1710 to accomplish parts of this volume of POSIX.1‐2008. Specifically, cre‐
1711 ating files of type block special or character special, restoring file
1712 access times unless the files are owned by the user (the −t option), or
1713 preserving file owner, group, and mode (the −p option) all probably
1714 require appropriate privileges.
1715
1716 In read mode, implementations are permitted to overwrite files when the
1717 archive has multiple members with the same name. This may fail if per‐
1718 missions on the first version of the file do not permit it to be over‐
1719 written.
1720
1721 The cpio and ustar formats can only support files up to 8589934592
1722 bytes (8 ∗ 2^30) in size.
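
The figure above follows from the width of the size fields. The sketch below assumes eleven octal digits are available in both formats: c_filesize is 11 octets in the cpio table, and the ustar size field is assumed to hold 11 digits plus a terminator. Strictly, the largest value eleven octal digits can represent is one less than the figure quoted.

```python
SIZE_DIGITS = 11            # octal digits available for the file size
LIMIT = 8 ** SIZE_DIGITS    # first size that no longer fits


def size_fits(size: int) -> bool:
    """True if size can be written in SIZE_DIGITS octal digits."""
    return 0 <= size < LIMIT


print(LIMIT, LIMIT == 8 * 2 ** 30)
print(format(LIMIT - 1, "o"))
```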
1723
1724 When archives containing binary header information are listed, the
1725 filenames printed may cause strange behavior on some terminals.
1726
1727 When all of the following are true:
1728
1729 1. A file of type directory is being placed into an archive.
1730
1731 2. The ustar archive format is being used.
1732
1733 3. The pathname of the directory is less than or equal to 155 bytes
1734 long (it will fit in the prefix field in the ustar header block).
1735
1736 4. The last component of the pathname of the directory is longer than
1737 100 bytes long (it will not fit in the name field in the ustar
1738 header block).
1739
1740 some implementations of the pax utility will place the entire directory
1741 pathname in the prefix field, set the name field to an empty string,
1742 and place the directory in the archive. Other implementations of the
1743 pax utility will give an error under these conditions because the name
1744 field is not large enough to hold the last component of the directory
1745 name. This standard allows either behavior. However, when extracting a
1746 directory from a ustar format archive, this standard requires that all
1747 implementations be able to extract a directory even if the name field
1748 contains an empty string as long as the prefix field does not also con‐
1749 tain an empty string.
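
The two behaviors can be sketched in Python as follows. The function split_ustar_path and its allow_empty_name parameter are hypothetical; the sketch assumes the usual ustar rule that the pathname is the prefix, a <slash>, and the name, and it is not taken from any particular implementation.

```python
NAME_MAX = 100    # octets available in the name field
PREFIX_MAX = 155  # octets available in the prefix field


def split_ustar_path(path: bytes, allow_empty_name: bool = False) -> tuple:
    """Return (prefix, name) for a pathname, or raise ValueError."""
    if len(path) <= NAME_MAX:
        return b"", path
    # Try each <slash> as the split point between prefix and name.
    for i in range(len(path)):
        if path[i:i + 1] == b"/":
            prefix, name = path[:i], path[i + 1:]
            if 0 < len(prefix) <= PREFIX_MAX and 0 < len(name) <= NAME_MAX:
                return prefix, name
    # First behavior: whole directory pathname in prefix, empty name.
    if allow_empty_name and len(path) <= PREFIX_MAX:
        return path, b""
    # Second behavior: report an error.
    raise ValueError("pathname does not fit the ustar name and prefix fields")


long_dir = b"d/" + b"x" * 120  # last component longer than 100 octets
print(split_ustar_path(long_dir, allow_empty_name=True)[1] == b"")
```

Whichever choice a writer makes, a reader has to accept a header whose name field is empty as long as the prefix field is not.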
1750
1752 The following command:
1753
1754 pax −w −f /dev/rmt/1m .
1755
1756 copies the contents of the current directory to tape drive 1, medium
1757 density (assuming historical System V device naming procedures—the his‐
1758 torical BSD device name would be /dev/rmt9).
1759
1760 The following commands:
1761
1762 mkdir newdir
1763 pax −rw olddir newdir
1764
1765 copy the olddir directory hierarchy to newdir.
1766
1767 pax −r −s ',^//*usr//*,,' −f a.pax
1768
1769 reads the archive a.pax, with all files rooted in /usr in the archive
1770 extracted relative to the current directory.
1771
1772 Using the option:
1773
1774 −o listopt="%M %(atime)T %(size)D %(name)s"
1775
1776 overrides the default output description in Standard Output and instead
1777 writes:
1778
1779 −rw−rw−−− Jan 12 15:53 2003 1492 /usr/foo/bar
1780
1781 Using the options:
1782
1783 −o listopt='%L\t%(size)D\n%.7' \
1784 −o listopt='(name)s\n%(atime)T\n%T'
1785
1786 overrides the default output description in Standard Output and instead
1787 writes:
1788
1789 /usr/foo/bar −> /tmp 1492
1790 /usr/fo
1791 Jan 12 15:53 1991
1792 Jan 31 15:53 2003
1793
1795 The pax utility was new for the ISO POSIX‐2:1993 standard. It repre‐
1796 sents a peaceful compromise between advocates of the historical tar and
1797 cpio utilities.
1798
1799 A fundamental difference between cpio and tar was in the way directo‐
1800 ries were treated. The cpio utility did not treat directories differ‐
1801 ently from other files, and to select a directory and its contents
1802 required that each file in the hierarchy be explicitly specified. For
1803 tar, a directory matched every file in the file hierarchy it rooted.
1804
1805 The pax utility offers both interfaces; by default, directories map
1806 into the file hierarchy they root. The −d option causes pax to skip any
1807 file not explicitly referenced, as cpio historically did. The tar-style
1808 behavior was chosen as the default because it was believed that
1809 this was the more common usage and because tar is the more commonly
1810 available interface, as it was historically provided on both System V
1811 and BSD implementations.
1812
1813 The data interchange format specification in this volume of
1814 POSIX.1‐2008 requires that processes with ``appropriate privileges''
1815 shall always restore the ownership and permissions of extracted files
1816 exactly as archived. If viewed from the historic equivalence between
1817 superuser and ``appropriate privileges'', there are two problems with
1818 this requirement. First, users running as superusers may unknowingly
1819 set dangerous permissions on extracted files. Second, it is needlessly
1820 limiting, in that superusers cannot extract files and own them as supe‐
1821 ruser unless the archive was created by the superuser. (It should be
1822 noted that restoration of ownerships and permissions for the superuser,
1823 by default, is historical practice in cpio, but not in tar.) In order to
1824 avoid these two problems, the pax specification has an additional
1825 ``privilege'' mechanism, the −p option. Only a pax invocation with the
1826 privileges needed, and which has the −p option set using the e specifi‐
1827 cation character, has appropriate privileges to restore full ownership
1828 and permission information.
1829
1830 Note also that this volume of POSIX.1‐2008 requires that the file own‐
1831 ership and access permissions shall be set, on extraction, in the same
1832 fashion as the creat() function when provided with the mode stored in
1833 the archive. This means that the file creation mask of the user is
1834 applied to the file permissions.
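
The effect of the file creation mask can be shown with a short Python sketch. The function name is illustrative, and the calculation covers only the default case described above, where the archived mode is passed through the mask as creat() would do.

```python
def extracted_permissions(archived_mode: int, umask: int) -> int:
    """Permission bits a file receives when created under a given umask."""
    # creat() clears every permission bit that is set in the mask.
    return archived_mode & 0o777 & ~umask


print(oct(extracted_permissions(0o666, 0o022)))
print(oct(extracted_permissions(0o777, 0o077)))
```

With a mask of 077, for example, no group or other permissions survive extraction, whatever the archive recorded.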
1835
1836 Users should note that directories may be created by pax while extract‐
1837 ing files with permissions that are different from those that existed
1838 at the time the archive was created. When extracting sensitive informa‐
1839 tion into a directory hierarchy that no longer exists, users are
1840 encouraged to set their file creation mask appropriately to protect
1841 these files during extraction.
1842
1843 The table of contents output is written to standard output to facili‐
1844 tate pipeline processing.
1845
1846 An early proposal had hard links displaying for all pathnames. This was
1847 removed because it complicates the output of the case where −v is not
1848 specified and does not match historical cpio usage. The hard-link
1849 information is available in the −v display.
1850
1851 The description of the −l option allows implementations to make hard
1852 links to symbolic links. Earlier versions of this standard did not
1853 specify any way to create a hard link to a symbolic link, but many
1854 implementations provided this capability as an extension. If there are
1855 hard links to symbolic links when an archive is created, the implemen‐
1856 tation is required to archive the hard link in the archive (unless −H
1857 or −L is specified). When in read mode and in copy mode, implementa‐
1858 tions supporting hard links to symbolic links should use them when
1859 appropriate.
1860
1861 The archive formats inherited from the POSIX.1‐1990 standard have cer‐
1862 tain restrictions that have been brought along from historical usage.
1863 For example, there are restrictions on the length of pathnames stored
1864 in the archive. When pax is used in copy (−rw) mode (copying directory
1865 hierarchies), the ability to use extensions from the −x pax format over‐
1866 comes these restrictions.
1867
1868 The default blocksize value of 5120 bytes for cpio was selected because
1869 it is one of the standard block-size values for cpio, set when the −B
1870 option is specified. (The other default block-size value for cpio is
1871 512 bytes, and this was considered to be too small.) The default block
1872 value of 10240 bytes for tar was selected because that is the standard
1873 block-size value for BSD tar. The maximum block size of 32256 bytes
1874 (2^15−512 bytes) is the largest multiple of 512 bytes that fits into a
1875 signed 16-bit tape controller transfer register. There are known limi‐
1876 tations in some historical systems that would prevent larger blocks
1877 from being accepted. Historical values were chosen to improve compati‐
1878 bility with historical scripts using dd or similar utilities to manipu‐
1879 late archives. Also, default block sizes for any file type other than
1880 character special file have been deleted from this volume of
1881 POSIX.1‐2008 as unimportant and not likely to affect the structure of
1882 the resulting archive.
1883
1884 Implementations are permitted to modify the block-size value based on
1885 the archive format or the device to which the archive is being written.
1886 This is to provide implementations with the opportunity to take advan‐
1887 tage of special types of devices, and it should not be used without a
1888 great deal of consideration as it almost certainly decreases archive
1889 portability.
1890
1891 The intended use of the −n option was to permit extraction of one or
1892 more files from the archive without processing the entire archive. This
1893 was viewed by the standard developers as offering significant perfor‐
1894 mance advantages over historical implementations. The −n option in
1895 early proposals had three effects; the first was to cause special char‐
1896 acters in patterns to not be treated specially. The second was to cause
1897 only the first file that matched a pattern to be extracted. The third
1898 was to cause pax to write a diagnostic message to standard error when
1899 no file was found matching a specified pattern. Only the second behav‐
1900 ior is retained by this volume of POSIX.1‐2008, for many reasons.
1901 First, it is in general not acceptable for a single option to have mul‐
1902 tiple effects. Second, the ability to make pattern matching characters
1903 act as normal characters is useful for parts of pax other than file
1904 extraction. Third, a finer degree of control over the special charac‐
1905 ters is useful because users may wish to normalize only a single spe‐
1906 cial character in a single filename. Fourth, given a more general
1907 escape mechanism, the previous behavior of the −n option can be easily
1908 obtained using the −s option or a sed script. Finally, writing a diag‐
1909 nostic message when a pattern specified by the user is unmatched by any
1910 file is useful behavior in all cases.
1911
1912 In this version, the −n was removed from the copy mode synopsis of pax;
1913 it is inapplicable because there are no pattern operands specified in
1914 this mode.
1915
1916 Besides pax, POSIX.1‐2008 offers another method for copying subtrees,
1917 described as part of the cp utility. Both methods are historical prac‐
1918 tice: cp provides a simpler, more intuitive interface, while pax offers
1919 a finer granularity of control. Each provides additional functionality
1920 to the other; in particular, pax maintains the hard-link structure of
1921 the hierarchy while cp does not. It is the intention of the standard
1922 developers that the results be similar (using appropriate option combi‐
1923 nations in both utilities). The results are not required to be identi‐
1924 cal; there seemed insufficient gain to applications to balance the dif‐
1925 ficulty of implementations having to guarantee that the results would
1926 be exactly identical.
1927
1928 A single archive may span more than one file. It is suggested that
1929 implementations provide informative messages to the user on standard
1930 error whenever the archive file is changed.
1931
1932 The −d option (do not create intermediate directories not listed in the
1933 archive) found in early proposals was originally provided as a comple‐
1934 ment to the historic −d option of cpio. It has been deleted.
1935
1936 The −s option in early proposals specified a subset of the substitution
1937 command from the ed utility. As there was no reason for only a subset
1938 to be supported, the −s option is now compatible with the current ed
1939 specification. Since the delimiter can be any non-null character, the
1940 following usage with single <space> characters is valid:
1941
1942 pax −s " foo bar " ...
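
To illustrate how the delimiter is taken from the first character of the replacement string, the following Python sketch splits a −s argument into its parts. It assumes the /old/new/[gp] form given in the description of the −s option; the function name is hypothetical, and a backslash-escaped delimiter is simply kept as written rather than interpreted.

```python
def split_replstr(replstr: str) -> tuple:
    """Split <d>old<d>new<d>[flags] into (old, new, flags)."""
    if len(replstr) < 2:
        raise ValueError("replacement string is too short")
    delim = replstr[0]  # any non-null character may serve as delimiter
    parts, current, i = [], [], 1
    while i < len(replstr) and len(parts) < 2:
        ch = replstr[i]
        if ch == "\\" and i + 1 < len(replstr):
            # Keep an escaped character (possibly the delimiter) as is.
            current.append(replstr[i:i + 2])
            i += 2
            continue
        if ch == delim:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if len(parts) != 2:
        raise ValueError("expected <d>old<d>new<d>[gp]")
    return parts[0], parts[1], replstr[i:]


print(split_replstr(" foo bar "))
print(split_replstr(",^//*usr//*,,"))
```

The second call uses the replacement string from the EXAMPLES section, where the new string is empty and the matched /usr/ prefix is therefore removed.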
1943
1944 The −t description is worded so as to note that this may cause the
1945 access time update caused by some other activity (which occurs while
1946 the file is being read) to be overwritten.
1947
1948 The default behavior of pax with regard to file modification times is
1949 the same as historical implementations of tar. It is not the histori‐
1950 cal behavior of cpio.
1951
1952 Because the −i option uses /dev/tty, utilities without a controlling
1953 terminal are not able to use this option.
1954
1955 The −y option, found in early proposals, has been deleted because a
1956 line containing a single <period> for the −i option has equivalent
1957 functionality. The special lines for the −i option (a single <period>
1958 and the empty line) are historical practice in cpio.
1959
1960 In early drafts, a −e charmap option was included to increase
1961 portability of files between systems using different coded character
1962 sets. This option was omitted because it was apparent that no consensus
1963 could be reached on it. In this version, the use of UTF‐8 should be an
1964 adequate substitute.
1965
1966 The ISO POSIX‐2:1993 standard and ISO POSIX‐1 standard requirements for
1967 pax, however, made it very difficult to create a single archive con‐
1968 taining files created using extended characters provided by different
1969 locales. This version adds the hdrcharset keyword to make it possible
1970 to archive files in these cases without dropping files due to transla‐
1971 tion errors.
1972
1973 Translating filenames and other attributes from a locale's encoding to
1974 UTF‐8 and then back again can lose information, as the resulting file‐
1975 name might not be byte-for-byte equivalent to the original. To avoid
1976 this problem, users can specify the −o hdrcharset=binary option, which
1977 will cause the resulting archive to use binary format for all names and
1978 attributes. Such archives are not portable among hosts that use differ‐
1979 ent native encodings (e.g., EBCDIC versus ASCII-based encodings), but
1980 they will allow interchange among the vast majority of POSIX file sys‐
1981 tems in practical use. Also, the −o hdrcharset=binary option will cause
1982 pax in copy mode to behave more like other standard utilities such as
1983 cp.
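
One way the round trip can lose information is when a filename contains
bytes that are not valid in the locale's encoding. The Python sketch below
shows only that one case. The function name round_trips and the choice of
the "replace" error policy are assumptions made for illustration; they do
not describe what any particular pax implementation does.

```python
def round_trips(raw_name: bytes, locale_encoding: str) -> bool:
    """Return True if locale -> UTF-8 -> locale reproduces the bytes."""
    # Decode the on-disk bytes using the locale's encoding; undecodable
    # bytes are replaced (an assumed policy for this sketch).
    text = raw_name.decode(locale_encoding, errors="replace")
    # Store as UTF-8 in the archive, then read it back.
    stored = text.encode("utf-8")
    restored = stored.decode("utf-8").encode(locale_encoding, errors="replace")
    return restored == raw_name


# A Latin-1 encoded name survives when the locale really is Latin-1,
# but not when the same bytes are interpreted under a UTF-8 locale.
print(round_trips(b"caf\xe9", "latin-1"))
print(round_trips(b"caf\xe9", "utf-8"))
```

Archiving the raw bytes unchanged, as −o hdrcharset=binary does, avoids
the translation step altogether.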
1984
1985 If the values specified by the −o exthdr.name=value, −o globex‐
1986 thdr.name=value, or by $TMPDIR (if −o globexthdr.name is not specified)
1987 require a character encoding other than that described in the
1988 ISO/IEC 646:1991 standard, a path extended header record will have to
1989 be created for the file. If a hdrcharset extended header record is
1990 active for such headers, it will determine the codeset used for the
1991 value field in these extended path header records. These path extended
1992 header records always need to be created when writing an archive even
1993 if hdrcharset=binary has been specified and would contain the same
1994 (binary) data that appears in the ustar header record prefix and name
1995 fields. (In other words, an extended header path record is always
1996 required to be generated if the prefix or name fields contain non-ASCII
1997 characters even when hdrcharset=binary is also in effect for that
1998 file.)
1999
2000 The −k option was added to address international concerns about the
2001 dangers involved in the character set transformations of −e (if the
2002 target character set were different from the source, the filenames
2003 might be transformed into names matching existing files) and also was
2004 made more general to protect files transferred between file systems
2005 with different {NAME_MAX} values (truncating a filename on a smaller
2006 system might also inadvertently overwrite existing files). As stated,
2007 it prevents any overwriting, even if the target file is older than the
2008 source. This version adds finer-grained options to solve this problem
2009 by introducing the −o invalid= option, specifically the UTF‐8 and
2010 binary actions. (Note that an existing file is still subject to
2011 overwriting in this case. The −k option closes that loophole.)
2012
2013 Some of the file characteristics referenced in this volume of
2014 POSIX.1‐2008 might not be supported by some archive formats. For exam‐
2015 ple, neither the tar nor cpio formats contain the file access time. For
2016 this reason, the e specification character has been provided, intended
2017 to cause all file characteristics specified in the archive to be
2018 retained.
2019
2020 It is required that extracted directories, by default, have their
2021 access and modification times and permissions set to the values speci‐
2022 fied in the archive. This has obvious problems in that the directories
2023 are almost certainly modified after being extracted and that directory
2024 permissions may not permit file creation. One possible solution is to
2025 create directories with the mode specified in the archive, as modified
2026 by the umask of the user, with sufficient permissions to allow file
2027 creation. After all files have been extracted, pax would then reset the
2028 access and modification times and permissions as necessary.
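
The two-pass approach described above might look roughly like the
following Python sketch. The function names and the shape of the pending
list are hypothetical. The sketch leaves out the umask handling mentioned
in the text and simply grants the owner full access during extraction.

```python
import os
import stat


def create_dir_for_extraction(path: str, archived_mode: int) -> None:
    """Create a directory that the extracting user can write into."""
    os.makedirs(path, exist_ok=True)
    # Temporarily add owner read/write/search to the archived mode.
    os.chmod(path, stat.S_IMODE(archived_mode) | stat.S_IRWXU)


def finalize_dirs(pending) -> None:
    """Apply archived times and permissions after all files are extracted.

    `pending` is an iterable of (path, mode, atime, mtime) tuples.
    """
    # Deepest directories first, so that a parent is still searchable
    # while its children are being fixed up.
    ordered = sorted(pending, key=lambda e: e[0].count(os.sep), reverse=True)
    for path, mode, atime, mtime in ordered:
        os.utime(path, (atime, mtime))
        os.chmod(path, stat.S_IMODE(mode))
```

Deferring the final chmod() and utime() calls means that files created
inside a read-only directory neither fail nor disturb the directory's
restored modification time.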
2029
2030 The list-mode formatting description borrows heavily from the one
2031 defined by the printf utility. However, since there is no separate op‐
2032 erand list to get conversion arguments, the format was extended to
2033 allow specifying the name of the conversion argument as part of the
2034 conversion specification.
2035
2036 The T conversion specifier allows time fields to be displayed in any of
2037 the date formats. Unlike the ls utility, pax does not adjust the format
2038 when the date is less than six months in the past. This makes parsing
2039 the output more predictable.
2040
2041 The D conversion specifier handles the ability to display the
2042 major/minor or file size, as with ls, by using %−8(size)D.
2043
2044 The L conversion specifier handles the ls display for symbolic links.
2045
2046 Conversion specifiers were added to generate existing known types used
2047 for ls.
2048
2049 pax Interchange Format
2050 The new POSIX data interchange format was developed primarily to sat‐
2051 isfy international concerns that the ustar and cpio formats did not
2052 provide for file, user, and group names encoded in characters outside a
2053 subset of the ISO/IEC 646:1991 standard. The standard developers real‐
2054 ized that this new POSIX data interchange format should be very exten‐
2055 sible because there were other requirements they foresaw in the near
2056 future:
2057
2058 * Support international character encodings and locale information
2059
2060 * Support security information (ACLs, and so on)
2061
2062 * Support future file types, such as realtime or contiguous files
2063
2064 * Include data areas for implementation use
2065
2066 * Support systems with words larger than 32 bits and timers with sub‐
2067 second granularity
2068
2069 The following were not goals for this format because these are better
2070 handled by separate utilities or are inappropriate for a portable for‐
2071 mat:
2072
2073 * Encryption
2074
2075 * Compression
2076
2077 * Data translation between locales and codesets
2078
2079 * inode storage
2080
2081 The format chosen to support the goals is an extension of the ustar
2082 format. Of the two formats previously available, only the ustar format
2083 was selected for extensions because:
2084
2085 * It was easier to extend in an upwards-compatible way. It offered
2086 version flags and header block type fields with room for future
2087 standardization. The cpio format, while possessing a more flexible
2088 file naming methodology, could not be extended without breaking
2089 some theoretical implementation or using a dummy filename that
2090 could be a legitimate filename.
2091
2092 * Industry experience since the original ``tar wars'' fought in
2093 developing the ISO POSIX‐1 standard has clearly been in favor of
2094 the ustar format, which is generally the default output format
2095 selected for pax implementations on new systems.
2096
2097 The new format was designed with one additional goal in mind: reason‐
2098 able behavior when an older tar or pax utility happened to read an ar‐
2099 chive. Since the POSIX.1‐1990 standard mandated that a ``format-reading
2100 utility'' had to treat unrecognized typeflag values as regular files,
2101 this allowed the format to include all the extended information in a
2102 pseudo-regular file that preceded each real file. An option is given
2103 that allows the archive creator to set up reasonable names for these
2104 files on the older systems. Also, the normative text suggests that rea‐
2105 sonable file access values be used for this ustar header block. Making
2106 these header files inaccessible for convenient reading and deleting
2107 would not be reasonable. File permissions of 600 or 700 are suggested.
2108
2109 The ustar typeflag field was used to accommodate the additional func‐
2110 tionality of the new format rather than magic or version because the
2111 POSIX.1‐1990 standard (and, by reference, the previous version of pax),
2112 mandated the behavior of the format-reading utility when it encountered
2113 an unknown typeflag, but was silent about the other two fields.
2114
2115 Early proposals for the first version of this standard contained a pro‐
2116 posed archive format that was based on compatibility with the standard
2117 for tape files (ISO 1001, similar to the format used historically on
2118 many mainframes and minicomputers). This format was overly complex and
2119 required considerable overhead in volume and header records. Further‐
2120 more, the standard developers felt that it would not be acceptable to
2121 the community of POSIX developers, so it was later changed to be a for‐
2122 mat more closely related to historical practice on POSIX systems.
2123
2124 The prefix and name split of pathnames in ustar was replaced by the
2125 single path extended header record for simplicity.
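
The effect of the path record can be observed with Python's standard
tarfile module, which can write the pax interchange format. This is an
illustration using a different tool, not the pax utility itself, and the
helper name pax_headers_for is hypothetical. With this module, a pathname
too long for the ustar name field is expected to show up as a single path
extended header record.

```python
import io
import tarfile


def pax_headers_for(name: str) -> dict:
    """Write one empty member in pax format and return its extended headers."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        tar.addfile(tarfile.TarInfo(name))  # zero-length regular file
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        member = tar.getmembers()[0]
        return dict(member.pax_headers)


long_name = "dir/" + "x" * 120
print(pax_headers_for(long_name).get("path") == long_name)
print("path" in pax_headers_for("short.txt"))
```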
2126
2127 The concept of a global extended header (typeflag g) was controversial.
2128 If this were applied to an archive being recorded on magnetic tape, a
2129 few unreadable blocks at the beginning of the tape could be a serious
2130 problem; a utility attempting to extract as many files as possible from
2131 a damaged archive could lose a large percentage of file header informa‐
2132 tion in this case. However, if the archive were on a reliable medium,
2133 such as a CD‐ROM, the global extended header offers considerable poten‐
2134 tial size reductions by eliminating redundant information. Thus, the
2135 text warns against using the global method for unreliable media and
2136 provides a method for implanting global information in the extended
2137 header for each file, rather than in the typeflag g records.
2138
2139 No facility for data translation or filtering on a per-file basis is
2140 included because the standard developers could not invent an interface
2141 that would allow this in an efficient manner. If a filter, such as
2142 encryption or compression, is to be applied to all the files, it is
2143 more efficient to apply the filter to the entire archive as a single
2144 file. The standard developers considered interfaces that would invoke a
2145 shell script for each file going into or out of the archive, but the
2146 system overhead in this approach was considered to be too high.
2147
2148 One such approach would be to have filter= records that give a pathname
2149 for an executable. When the program is invoked, the file and archive
2150 would be open for standard input/output and all the header fields would
2151 be available as environment variables or command-line arguments. The
2152 standard developers did discuss such schemes, but they were omitted
2153 from POSIX.1‐2008 due to concerns about excessive overhead. Also, the
2154 program itself would need to be in the archive if it were to be used
2155 portably.
2156
2157 There is currently no portable means of identifying the character
2158 set(s) used for a file in the file system. Therefore, pax has not been
2159 given a mechanism to generate charset records automatically. The only
2160 portable means of doing this is for the user to write the archive using
2161 the −o charset=string command line option. This assumes that all of the
2162 files in the archive use the same encoding. The ``implementation-
2163 defined'' text is included to allow for a system that can identify the
2164 encodings used for each of its files.
2165
2166 The table of standards that accompanies the charset record description
2167 is acknowledged to be very limited. Only a limited number of character
2168 set standards is reasonable for maximal interchange. Any character set
2169 is, of course, possible by prior agreement. It was suggested that
2170 EBCDIC be listed, but it was omitted because it is not defined by a
2171 formal standard. Formal standards, and then only those with reasonably
2172 large followings, can be included here, simply as a matter of practi‐
2173 cality. The <value>s represent names of officially registered character
2174 sets in the format required by the ISO 2375:1985 standard.
2175
2176 The normal <comma> or <blank>-separated list rules are not followed in
2177 the case of keyword options to allow ease of argument parsing for
2178 getopts.
2179
2180 Further information on character encodings is in pax Archive Character
2181 Set Encoding/Decoding.
2182
2183 The standard developers have reserved keyword name space for vendor
2184 extensions. It is suggested that the format to be used is:
2185
2186 VENDOR.keyword
2187
2188 where VENDOR is the name of the vendor or organization in all uppercase
2189 letters. It is further suggested that the keyword following the
2190 <period> be named differently than any of the standard keywords so that
2191 it could be used for future standardization, if appropriate, by omit‐
2192 ting the VENDOR prefix.
2193
2194 The <length> field in the extended header record was included to make
2195 it simpler to step through the records, even if a record contains an
2196 unknown format (to a particular pax) with complex interactions of spe‐
2197 cial characters. It also provides a minor integrity checkpoint within
2198 the records to aid a program attempting to recover files from a damaged
2199 archive.
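
A short Python sketch shows why the <length> field makes records easy to
step through. It assumes the record layout "<length> <keyword>=<value>"
followed by a <newline>, with <length> counting every octet of the record,
including its own digits and the trailing <newline>. The function names
are hypothetical.

```python
def make_record(keyword: str, value: str) -> bytes:
    """Build one extended header record with a self-consistent length."""
    body = (" %s=%s\n" % (keyword, value)).encode("utf-8")
    length = len(body)
    # The length counts its own digits, so iterate until it stops growing.
    while len(body) + len(str(length)) != length:
        length = len(body) + len(str(length))
    return str(length).encode("ascii") + body


def parse_records(data: bytes):
    """Step through records using only the length field."""
    records = []
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)
        length = int(data[pos:space])
        record = data[space + 1:pos + length]
        if not record.endswith(b"\n"):
            raise ValueError("record at offset %d is damaged" % pos)
        keyword, _, value = record[:-1].partition(b"=")
        records.append((keyword.decode("utf-8"), value.decode("utf-8")))
        pos += length  # jump straight to the next record
    return records


data = make_record("path", "foo") + make_record("VENDOR.note", "a b\nc")
print(parse_records(data))
```

Because the parser advances by <length> rather than scanning for a
<newline>, a value that itself contains <newline> or '=' characters does
not confuse it, and the final <newline> check acts as the minor integrity
checkpoint mentioned above.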
2200
2201 There are no extended header versions of the devmajor and devminor
2202 fields because the unspecified format ustar header field should be suf‐
2203 ficient. If they are not, vendor-specific extended keywords (such as
2204 VENDOR.devmajor) should be used.
2205
2206 Device and i-number labeling of files was not adopted from cpio; files
2207 are interchanged strictly on a symbolic name basis, as in ustar.
2208
2209 Just as with the ustar format descriptions, the new format makes no
2210 special arrangements for multi-volume archives. Each of the pax archive
2211 types is assumed to be inside a single POSIX file and splitting that
2212 file over multiple volumes (diskettes, tape cartridges, and so on),
2213 processing their labels, and mounting each in the proper sequence are
2214 considered to be implementation details that cannot be described
2215 portably.
2216
2217 The pax format is intended for interchange, not only for backup on a
2218 single (family of) systems. It is not as densely packed as might be
2219 possible for backup:
2220
2221 * It contains information as coded characters that could be coded in
2222 binary.
2223
2224 * It identifies extended records with name fields that could be omit‐
2225 ted in favor of a fixed-field layout.
2226
2227 * It translates names into a portable character set and identifies
2228 locale-related information, both of which are probably unnecessary
2229 for backup.
2230
2231 The requirements on restoring from an archive are slightly different
2232 from the historical wording, allowing for non-monolithic privilege to
2233 bring forward as much as possible. In particular, attributes such as
2234 ``high performance file'' might be broadly but not universally granted
2235 while set-user-ID or chown() might be much more restricted. There is no
2236 implication in POSIX.1‐2008 that the security information be honored
2237 after it is restored to the file hierarchy, in spite of what might be
2238 improperly inferred by the silence on that topic. That is a topic for
2239 another standard.
2240
2241 Links are recorded in the fashion described here because a link can be
2242 to any file type. It is desirable in general to be able to restore part
2243 of an archive selectively and restore all of those files completely. If
2244 the data is not associated with each link, it is not possible to do
2245 this. However, the data associated with a file can be large, and when
2246 selective restoration is not needed, this can be a significant burden.
2247 The archive is structured so that files that have no associated data
2248 can always be restored by the name of any link, and
2249 the user may choose whether data is recorded with each instance of a
2250 file that contains data. The format permits mixing of both types of
2251 links in a single archive; this can be done for special needs, and pax
2252 is expected to interpret such archives on input properly, despite the
2253 fact that there is no pax option that would force this mixed case on
2254 output. (When −o linkdata is used, the output must contain the dupli‐
2255 cate data, but the implementation is free to include it or omit it when
2256 −o linkdata is not used.)
2257
2258 The time values are included as extended header records for those
2259 implementations needing more than the eleven octal digits allowed by
2260 the ustar format. Portable file timestamps cannot be negative. If pax
2261 encounters a file with a negative timestamp in copy or write mode, it
2262 can reject the file, substitute a non-negative timestamp, or generate a
2263 non-portable timestamp with a leading '−'. Even though some implemen‐
2264 tations can support finer file-time granularities than seconds, the
2265 normative text requires support only for seconds since the Epoch
2266 because the ISO POSIX‐1 standard states them that way. The ustar format
2267 includes only mtime; the new format adds atime and ctime for symmetry.
2268 The atime access time restored to the file system will be affected by
2269 the −p a and −p e options. The ctime creation time (actually inode mod‐
2270 ification time) is described with appropriate privileges so that it can
2271 be ignored when writing to the file system. POSIX does not provide a
2272 portable means to change file creation time. Nothing is intended to
2273 prevent a non-portable implementation of pax from restoring the value.
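
As a rough Python sketch, the check for whether a timestamp fits in the
eleven octal digits mentioned above could look like this. Treating
negative and sub-second values as "does not fit" is an assumption of the
sketch; the text above leaves the handling of negative timestamps to the
implementation. The names are hypothetical.

```python
USTAR_TIME_DIGITS = 11
USTAR_TIME_MAX = 8 ** USTAR_TIME_DIGITS - 1  # largest 11-digit octal value


def fits_ustar_time(timestamp: float) -> bool:
    """True if the value is a whole, non-negative number of seconds in range."""
    return 0 <= timestamp <= USTAR_TIME_MAX and timestamp == int(timestamp)


def ustar_time_field(seconds: int) -> str:
    """Format a timestamp as eleven zero-padded octal digits."""
    if not fits_ustar_time(seconds):
        raise ValueError("timestamp needs an extended header record")
    return format(int(seconds), "011o")


print(ustar_time_field(0))
print(fits_ustar_time(8 ** 11))
```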
2274
2275 The gid, size, and uid extended header records were included to allow
2276 expansion beyond the sizes specified in the regular tar header. New
2277 file system architectures are emerging that will exhaust the 12-digit
2278 size field. There are probably not many systems requiring more than 8
2279 digits for user and group IDs, but the extended header values were
2280 included for completeness, allowing overrides for all of the decimal
2281 values in the tar header.
2282
2283 The standard developers intended to describe the effective results of
2284 pax with regard to file ownerships and permissions; implementations are
2285 not restricted in timing or sequencing the restoration of such, pro‐
2286 vided the results are as specified.
2287
2288 Much of the text describing the extended headers refers to use in
2289 ``write or copy modes''. The copy mode references are due to the norma‐
2290 tive text: ``The effect of the copy shall be as if the copied files
2291 were written to an archive file and then subsequently extracted ...''.
2292 There is certainly no way to test whether pax is actually generating
2293 the extended headers in copy mode, but the effects must be as if it
2294 had.
2295
2296 pax Archive Character Set Encoding/Decoding
2297 There is a need to exchange archives of files between systems of dif‐
2298 ferent native codesets. Filenames, group names, and user names must be
2299 preserved to the fullest extent possible when an archive is read on the
2300 receiving platform. Translation of the contents of files is not within
2301 the scope of the pax utility.
2302
2303 There will also be the need to represent characters that are not avail‐
2304 able on the receiving platform. These unsupported characters cannot be
2305 automatically folded to the local set of characters due to the chance
2306 of collisions. This could result in overwriting previously extracted
2307 files from the archive or pre-existing files on the system.
2308
2309 For these reasons, the codeset used to represent characters within the
2310 extended header records of the pax archive must be sufficiently rich to
2311 handle all commonly used character sets. The fields requiring transla‐
2312 tion include, at a minimum, filenames, user names, group names, and
2313 link pathnames. Implementations may wish to have localized extended
2314 keywords that use non-portable characters.
2315
2316 The standard developers considered the following options:
2317
2318 * The archive creator specifies the well-defined name of the source
2319 codeset. The receiver must then recognize the codeset name and per‐
2320 form the appropriate translations to the destination codeset.
2321
2322 * The archive creator includes within the archive the character map‐
2323 ping table for the source codeset used to encode extended header
2324 records. The receiver must then read the character mapping table
2325 and perform the appropriate translations to the destination code‐
2326 set.
2327
2328 * The archive creator translates the extended header records in the
2329 source codeset into a canonical form. The receiver must then per‐
2330 form the appropriate translations to the destination codeset.
2331
2332 The approach that incorporates the name of the source codeset poses the
2333 problem of codeset name registration, and makes the archive useless to
2334 pax archive decoders that do not recognize that codeset.
2335
2336 Because parts of an archive may be corrupted, the standard developers
2337 felt that including the character map of the source codeset was too
2338 fragile. The loss of this one key component could result in making the
2339 entire archive useless. (The difference between this and the global
2340 extended header decision was that the latter has a workaround—duplicat‐
2341 ing extended header records on unreliable media—but this would be too
2342 burdensome for large character set maps.)
2343
2344 Both of the above approaches also put an undue burden on the pax ar‐
2345 chive receiver to handle the cross-product of all source and destina‐
2346 tion codesets.
2347
2348 To simplify the translation from the source codeset to the canonical
2349 form and from the canonical form to the destination codeset, the stan‐
2350 dard developers decided that the internal representation should be a
2351 stateless encoding. A stateless encoding is one where each codepoint
2352 has the same meaning, without regard to the decoder being in a specific
2353 state. An example of a stateful encoding would be the Japanese Shift-
2354 JIS; an example of a stateless encoding would be the ISO/IEC 646:1991
2355 standard (equivalent to 7-bit ASCII).
2356
2357 For these reasons, the standard developers decided to adopt a canonical
2358 format for the representation of file information strings. The obvious,
2359 well-endorsed candidate is the ISO/IEC 10646‐1:2000 standard (based in
2360 part on Unicode), which can be used to represent the characters of vir‐
2361 tually all standardized character sets. The standard developers ini‐
2362 tially agreed upon using UCS2 (16-bit Unicode) as the internal repre‐
2363 sentation. This repertoire of characters provides a sufficiently rich
2364 set to represent all commonly-used codesets.
2365
2366 However, the standard developers found that the 16-bit Unicode repre‐
2367 sentation had some problems. It forced the issue of standardizing byte
2368 ordering. The 2-byte length of each character made the extended header
2369 records twice as long for the case of strings coded entirely from his‐
2370 torical 7-bit ASCII. For these reasons, the standard developers chose
2371 the UTF‐8 defined in the ISO/IEC 10646‐1:2000 standard. This multi-byte
2372 representation encodes UCS2 or UCS4 characters reliably and determinis‐
2373 tically, eliminating the need for a canonical byte ordering. In addi‐
2374 tion, NUL octets and other characters possibly confusing to POSIX file
2375 systems do not appear, except to represent themselves. It was realized
2376 that certain national codesets take up more space after the encoding,
2377 due to their placement within the UCS range; it was felt that the use‐
2378 fulness of the encoding of the names outweighs the disadvantage of size
2379 increase for file, user, and group names.
2380
2381 The encoding of UTF‐8 is as follows:
2382
2383 UCS4 Hex Encoding    UTF-8 Binary Encoding
2384
2385 00000000-0000007F    0xxxxxxx
2386 00000080-000007FF    110xxxxx 10xxxxxx
2387 00000800-0000FFFF    1110xxxx 10xxxxxx 10xxxxxx
2388 00010000-001FFFFF    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
2389 00200000-03FFFFFF    111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
2390 04000000-7FFFFFFF    1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
2391
2392 where each 'x' represents a bit value from the character being trans‐
2393 lated.
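
The table can be turned into code fairly directly. The following Python
sketch encodes a single UCS4 value according to the six rows above. The
function name encode_ucs4 is hypothetical. Python's own UTF-8 codec covers
only the first four rows, so the five- and six-octet forms can be checked
against the table only.

```python
# (upper bound of range, number of octets, leading-octet marker bits)
_ROWS = [
    (0x0000007F, 1, 0x00),
    (0x000007FF, 2, 0xC0),
    (0x0000FFFF, 3, 0xE0),
    (0x001FFFFF, 4, 0xF0),
    (0x03FFFFFF, 5, 0xF8),
    (0x7FFFFFFF, 6, 0xFC),
]


def encode_ucs4(codepoint: int) -> bytes:
    """Encode one UCS4 value using the bit layout in the table."""
    if codepoint < 0:
        raise ValueError("codepoint must be non-negative")
    for upper, count, lead in _ROWS:
        if codepoint <= upper:
            octets = []
            # Fill continuation octets (10xxxxxx) from the low-order end.
            for _ in range(count - 1):
                octets.append(0x80 | (codepoint & 0x3F))
                codepoint >>= 6
            # Whatever bits remain go into the leading octet.
            octets.append(lead | codepoint)
            return bytes(reversed(octets))
    raise ValueError("codepoint out of range")


print(encode_ucs4(0x41))
print(encode_ucs4(0x20AC) == "\u20ac".encode("utf-8"))
```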
2394
2395 ustar Interchange Format
2396 The description of the ustar format reflects numerous enhancements over
2397 pre-1988 versions of the historical tar utility. The goal of these
2398 changes was not only to provide the functional enhancements desired,
2399 but also to retain compatibility between new and old versions. This
2400 compatibility has been retained. Archives written using the old ar‐
2401 chive format are compatible with the new format.
2402
2403 Implementors should be aware that the previous file format did not
2404 include a mechanism to archive directory type files. For this reason,
2405 the convention of using a filename ending with <slash> was adopted to
2406 specify a directory on the archive.
2407
2408 The total size of the name and prefix fields has been set to meet the
2409 minimum requirements for {PATH_MAX}. If a pathname will fit within the
2410 name field, it is recommended that the pathname be stored there without
2411 the use of the prefix field. Although the name field is known to be too
2412 small to contain {PATH_MAX} characters, the value was not changed in
2413 this version of the archive file format to retain backwards-compatibil‐
2414 ity, and instead the prefix was introduced. Also, because of the ear‐
2415 lier version of the format, there is no way to remove the restriction
2416 on the linkname field being limited in size to just that of the name
2417 field.
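
The recommendation above can be sketched in Python as follows. The field
sizes of 100 octets for name and 155 octets for prefix are taken from the
ustar header definition, not from this rationale. The function name and
the choice of the leftmost workable <slash> are assumptions of the sketch.

```python
NAME_SIZE = 100    # ustar name field size in octets
PREFIX_SIZE = 155  # ustar prefix field size in octets


def split_ustar_path(path: bytes):
    """Return (prefix, name) for a ustar header, or raise ValueError."""
    # Preferred case: the whole pathname fits in the name field.
    if len(path) <= NAME_SIZE:
        return b"", path
    # Otherwise split at a <slash>; the slash itself is not stored.
    for i in range(len(path)):
        if path[i:i + 1] != b"/":
            continue
        prefix, name = path[:i], path[i + 1:]
        if 0 < len(prefix) <= PREFIX_SIZE and 0 < len(name) <= NAME_SIZE:
            return prefix, name
    raise ValueError("pathname cannot be stored in ustar name/prefix fields")


print(split_ustar_path(b"short/name"))
```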
2418
2419 The size field is required to be meaningful in all implementation
2420 extensions, although it could be zero. This is required so that the
2421 data blocks can always be properly counted.
2422
2423 If device special files that cannot be represented in the standard
2424 format need to be archived, it is suggested that one of the extension
2425 types (A‐Z) be used, and that the additional information for the
2426 special file be represented as data and be reflected in the size
2427 field.
2428
2429 Attempting to restore a special file type, where it is converted to
2430 ordinary data and conflicts with an existing filename, need not be spe‐
2431 cially detected by the utility. If run as an ordinary user, pax should
2432 not be able to overwrite the entries in, for example, /dev in any case
2433 (whether the file is converted to another type or not). If run as a
2434 privileged user, it should be able to do so, and it would be considered
2435 a bug if it did not. The same is true of ordinary data files and simi‐
2436 larly named special files; it is impossible to anticipate the needs of
2437 the user (who could really intend to overwrite the file), so the behav‐
2438 ior should be predictable (and thus regular) and rely on the protection
2439 system as required.
2440
2441 The value 7 in the typeflag field is intended to define how contiguous
2442 files can be stored in a ustar archive. POSIX.1‐2008 does not require
2443 the contiguous file extension, but does define a standard way of ar‐
2444 chiving such files so that all conforming systems can interpret these
2445 file types in a meaningful and consistent manner. On a system that does
2446 not support extended file types, the pax utility should do the best it
2447 can with the file and go on to the next.
2448
2449 The file protection modes are those conventionally used by the ls util‐
2450 ity. This is extended beyond the usage in the ISO POSIX‐2 standard to
2451 support the ``shared text'' or ``sticky'' bit. It is intended that the
2452 conformance document should not document anything beyond the existence
2453 of and support of such a mode. Further extensions are expected to these
2454 bits, particularly with overloading the set-user-ID and set-group-ID
2455 flags.
2456
2457 cpio Interchange Format
2458 The reference to appropriate privileges in the cpio format refers to an
2459 error on standard output; the ustar format does not make comparable
2460 statements.
2461
2462 The model for this format was the historical System V cpio −c data
2463 interchange format. This model documents the portable version of the
2464 cpio format and not the binary version. It has the flexibility to
2465 transfer data of any type described within POSIX.1‐2008, yet is exten‐
2466 sible to transfer data types specific to extensions beyond POSIX.1‐2008
2467 (for example, contiguous files). Because it describes existing prac‐
2468 tice, there is no question of maintaining upwards-compatibility.
2469
2470 cpio Header
2471 There has been some concern that the size of the c_ino field of the
2472 header is too small to handle those systems that have very large inode
2473 numbers. However, the c_ino field in the header is used strictly as a
2474 hard-link resolution mechanism for archives. It is not necessarily the
2475 same value as the inode number of the file in the location from which
2476 that file is extracted.
2477
2478 The name c_magic is based on historical usage.
2479
2480 cpio Filename
2481 For most historical implementations of the cpio utility, {PATH_MAX}
2482 octets can be used to describe the pathname without the addition of any
2483 other header fields (the NUL character would be included in this
2484 count). {PATH_MAX} is the minimum value for pathname size, documented
2485 as 256 bytes. However, an implementation may use c_namesize to deter‐
2486 mine the exact length of the pathname. With the current description of
2487 the <cpio.h> header, this pathname size can be as large as a number
2488 that is described in six octal digits.
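
The six-octal-digit limit can be made concrete with a small Python
sketch. It assumes, as the text above indicates, that the count includes
the terminating NUL. The function name is hypothetical.

```python
C_NAMESIZE_DIGITS = 6


def c_namesize_field(pathname: bytes) -> str:
    """Format the pathname length, including the NUL, as six octal digits."""
    size = len(pathname) + 1  # count the terminating NUL
    if size >= 8 ** C_NAMESIZE_DIGITS:
        raise ValueError("pathname too long for c_namesize")
    return format(size, "06o")


print(c_namesize_field(b"foo"))
```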
2489
2490 Two values are documented under the c_mode field values to provide for
2491 extensibility for known file types:
2492
2493 0110 000 Reserved for contiguous files. The implementation may treat
2494 the rest of the information for this archive like a regular
2495 file. If this file type is undefined, the implementation may
2496 create the file as a regular file.
2497
2498 This provides for extensibility of the cpio format while allowing for
2499 the ability to read old archives. Files of an unknown type may be read
2500 as ``regular files'' on some implementations. On a system that does
2501 not support extended file types, the pax utility should do the best it
2502 can with the file and go on to the next.
2503
2505 None.
2506
2508 Chapter 2, Shell Command Language, cp, ed, getopts, ls, printf
2509
2510 The Base Definitions volume of POSIX.1‐2008, Section 3.169, File Mode
2511 Bits, Chapter 5, File Format Notation, Chapter 8, Environment Vari‐
2512 ables, Section 12.2, Utility Syntax Guidelines, <cpio.h>
2513
2514 The System Interfaces volume of POSIX.1‐2008, chown(), creat(),
2515 fstatat(), mkdir(), mkfifo(), utime(), write()
2516
2518 Portions of this text are reprinted and reproduced in electronic form
2519 from IEEE Std 1003.1, 2013 Edition, Standard for Information Technology
2520 -- Portable Operating System Interface (POSIX), The Open Group Base
2521 Specifications Issue 7, Copyright (C) 2013 by the Institute of Electri‐
2522 cal and Electronics Engineers, Inc and The Open Group. (This is
2523 POSIX.1-2008 with the 2013 Technical Corrigendum 1 applied.) In the
2524 event of any discrepancy between this version and the original IEEE and
2525 The Open Group Standard, the original IEEE and The Open Group Standard
2526 is the referee document. The original Standard can be obtained online
2527 at http://www.unix.org/online.html .
2528
2529 Any typographical or formatting errors that appear in this page are
2530 most likely to have been introduced during the conversion of the source
2531 files to man page format. To report such errors, see
2532 https://www.kernel.org/doc/man-pages/reporting_bugs.html .
2533
2534
2535
2536IEEE/The Open Group 2013 PAX(1P)