CPIO(5)                     BSD File Formats Manual                    CPIO(5)

NAME
cpio — format of cpio archive files

DESCRIPTION
The cpio archive format collects any number of files, directories, and
other file system objects (symbolic links, device nodes, etc.) into a
single stream of bytes.

General Format
Each file system object in a cpio archive comprises a header record with
basic numeric metadata followed by the full pathname of the entry and the
file data. The header record stores a series of integer values that
generally follow the fields in struct stat. (See stat(2) for details.)
The variants differ primarily in how they store those integers (binary,
octal, or hexadecimal). The header is followed by the pathname of the
entry (the length of the pathname is stored in the header) and any file
data. The end of the archive is indicated by a special record with the
pathname “TRAILER!!!”.

PWB format
The PWB binary cpio format is the original format, dating from when cpio
was introduced as part of the Programmer's Work Bench system, a variant
of 6th Edition UNIX. It stores numbers as 2-byte and 4-byte binary
values. Each entry begins with a header in the following format:

struct header_pwb_cpio {
        short   h_magic;
        short   h_dev;
        short   h_ino;
        short   h_mode;
        short   h_uid;
        short   h_gid;
        short   h_nlink;
        short   h_majmin;
        long    h_mtime;
        short   h_namesize;
        long    h_filesize;
};

The short fields here are 16-bit integer values, while the long fields
are 32-bit integers. Since PWB UNIX, like the 6th Edition UNIX it was
based on, only ran on PDP-11 computers, they are in PDP-endian format:
each 16-bit value is stored little-endian, but the two 16-bit halves of a
32-bit value are stored most significant half first. That is, the long
integer whose hexadecimal representation is 0x12345678 would be stored in
four successive bytes as 0x34, 0x12, 0x78, 0x56. The fields are as
follows:

h_magic
        The integer value octal 070707.

h_dev, h_ino
        The device and inode numbers from the disk. These are used by
        programs that read cpio archives to determine when two entries
        refer to the same file. Programs that synthesize cpio archives
        should be careful to set these to distinct values for each entry.

h_mode  The mode specifies both the regular permissions and the file
        type, and it also holds a couple of bits that are irrelevant to
        the cpio format, because the field is actually a raw copy of the
        mode field in the inode representing the file. These are the
        IALLOC flag, which shows that the inode entry is in use, and the
        ILARG flag, which shows that the file it represents is large
        enough to have indirect block pointers in the inode. The mode is
        decoded as follows:

        0100000  IALLOC flag - irrelevant to cpio.
        0060000  This masks the file type bits.
        0040000  File type value for directories.
        0020000  File type value for character special devices.
        0060000  File type value for block special devices.
        0010000  ILARG flag - irrelevant to cpio.
        0004000  SUID bit.
        0002000  SGID bit.
        0001000  Sticky bit.
        0000777  The lower 9 bits specify read/write/execute permissions
                 for world, group, and user following standard POSIX
                 conventions.

h_uid, h_gid
        The numeric user id and group id of the owner.

h_nlink
        The number of links to this file. Directories always have a
        value of at least two here. Note that hardlinked files include
        file data with every copy in the archive.

h_majmin
        For block special and character special entries, this field
        contains the associated device number, with the major number in
        the high byte, and the minor number in the low byte. For all
        other entry types, it should be set to zero by writers and
        ignored by readers.

h_mtime
        Modification time of the file, indicated as the number of seconds
        since the start of the epoch, 00:00:00 UTC January 1, 1970.

h_namesize
        The number of bytes in the pathname that follows the header.
        This count includes the trailing NUL byte.

h_filesize
        The size of the file. Note that this archive format is limited
        to 16 megabyte file sizes, because PWB UNIX, like 6th Edition,
        only used an unsigned 24-bit integer for the file size
        internally.
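
As an illustration of the layout and byte order described above, the
following is a minimal sketch of decoding a PWB header from a raw buffer.
The names pwb_fields, pwb_get16, pwb_get32, and pwb_parse_header are
hypothetical and do not come from any cpio implementation. The sketch
assumes the header occupies 26 bytes with no padding between fields, which
is what the struct above yields with 2-byte shorts and 4-byte longs.

```c
#include <stdint.h>

/* Hypothetical decoded form of the on-disk PWB header. */
struct pwb_fields {
    uint16_t dev, ino, mode, uid, gid, nlink, majmin, namesize;
    uint32_t mtime, filesize;
};

/* A 16-bit value is stored little-endian: low byte first. */
uint16_t pwb_get16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* A 32-bit value is stored as two 16-bit words, most significant word
 * first, each word little-endian (PDP-endian). */
uint32_t pwb_get32(const unsigned char *p)
{
    return ((uint32_t)pwb_get16(p) << 16) | pwb_get16(p + 2);
}

/* Decode 26 header bytes. Returns 0 on success, or -1 if the magic
 * number is not octal 070707. */
int pwb_parse_header(const unsigned char *buf, struct pwb_fields *out)
{
    if (pwb_get16(buf) != 070707)
        return -1;
    out->dev      = pwb_get16(buf + 2);
    out->ino      = pwb_get16(buf + 4);
    out->mode     = pwb_get16(buf + 6);
    out->uid      = pwb_get16(buf + 8);
    out->gid      = pwb_get16(buf + 10);
    out->nlink    = pwb_get16(buf + 12);
    out->majmin   = pwb_get16(buf + 14);
    out->mtime    = pwb_get32(buf + 16);
    out->namesize = pwb_get16(buf + 20);
    out->filesize = pwb_get32(buf + 22);
    return 0;
}
```

With the bytes 0x34, 0x12, 0x78, 0x56 in the h_mtime position, this sketch
yields an mtime of 0x12345678, matching the example given earlier.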

The pathname immediately follows the fixed header. If h_namesize is odd,
an additional NUL byte is added after the pathname. The file data is
then appended, again followed by an additional NUL byte if needed so that
the next header starts at an even offset.
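
The padding rule can be sketched as follows. This is only an
illustration: the function and constant names are hypothetical, and the
26-byte header size is an assumption derived from the struct above
(eight shorts, a long, a short, and a long, with no padding).

```c
#include <stddef.h>

/* Assumed size of the fixed binary header in bytes. */
enum { CPIO_BIN_HEADER_SIZE = 26 };

/* Bytes occupied by an n-byte field once padded to an even length. */
size_t cpio_bin_padded(size_t n)
{
    return n + (n & 1);
}

/* Total bytes occupied by one entry: header, padded pathname (namesize
 * includes the trailing NUL), and padded file data. */
size_t cpio_bin_entry_size(size_t namesize, size_t filesize)
{
    return CPIO_BIN_HEADER_SIZE
        + cpio_bin_padded(namesize)
        + cpio_bin_padded(filesize);
}
```

For the trailer entry, whose pathname “TRAILER!!!” occupies 11 bytes
including the NUL and which carries no data, cpio_bin_entry_size(11, 0)
gives 38.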

Hardlinked files are not given special treatment; the full file contents
are included with each copy of the file.

New Binary Format
The new binary cpio format showed up when cpio was adopted into late 7th
Edition UNIX. It is exactly like the PWB binary format, described above,
except for three changes:

First, UNIX now ran on more than one hardware type, so the endianness of
16-bit integers must be determined by observing the magic number at the
start of the header. The 32-bit integers are still always stored with
the most significant word first, though, so each of those two, in the
struct shown above, was stored as an array of two 16-bit integers, in the
traditional order. Those 16-bit integers, like all the others in the
struct, were accessed using a macro that byte-swapped them if necessary.
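
A reader might apply this rule roughly as in the sketch below, which
inspects the two magic bytes to decide whether 16-bit values need
swapping and then reads 32-bit values as two 16-bit words, most
significant word first. All function names are hypothetical, and the
return codes are a convention chosen for this example only.

```c
#include <stdint.h>

/* Octal 070707 is 0x71C7. Returns 0 if the archive stores 16-bit values
 * little-endian, 1 if big-endian, -1 if the bytes are not a binary cpio
 * magic number. */
int cpio_bin_byte_order(const unsigned char *p)
{
    if (p[0] == 0xC7 && p[1] == 0x71)
        return 0;
    if (p[0] == 0x71 && p[1] == 0xC7)
        return 1;
    return -1;
}

/* Read a 16-bit value using the byte order found above. */
uint16_t cpio_bin_get16(const unsigned char *p, int big_endian)
{
    if (big_endian)
        return (uint16_t)((p[0] << 8) | p[1]);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* The most significant 16-bit word always comes first; only the bytes
 * within each word depend on the detected order. */
uint32_t cpio_bin_get32(const unsigned char *p, int big_endian)
{
    return ((uint32_t)cpio_bin_get16(p, big_endian) << 16)
        | cpio_bin_get16(p + 2, big_endian);
}
```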

Next, 7th Edition had more file types to store, and the IALLOC and ILARG
flag bits were repurposed to accommodate these. The revised use of the
various bits is as follows:

0170000  This masks the file type bits.
0140000  File type value for sockets.
0120000  File type value for symbolic links. For symbolic links, the
         link body is stored as file data.
0100000  File type value for regular files.
0060000  File type value for block special devices.
0040000  File type value for directories.
0020000  File type value for character special devices.
0010000  File type value for named pipes or FIFOs.
0004000  SUID bit.
0002000  SGID bit.
0001000  Sticky bit.
0000777  The lower 9 bits specify read/write/execute permissions for
         world, group, and user following standard POSIX conventions.

Finally, the file size field now represents a signed 32-bit integer in
the underlying file system, so the maximum file size has increased to 2
gigabytes.

Note that there is no obvious way to tell which of the two binary formats
an archive uses, other than to see which one makes more sense. The
typical error scenario is that a PWB format archive unpacked as if it
were in the new format will create named sockets instead of directories,
and then fail to unpack files that should go in those directories.
Running bsdcpio -itv on an unknown archive will make it obvious which it
is: if it's PWB format, directories will be listed with an 's' instead of
a 'd' as the first character of the mode string, and the larger files
will have a '?' in that position.
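
The confusion follows directly from the two mode tables. The sketch
below classifies one mode value under each interpretation; the function
names are hypothetical, and the type characters merely mirror the
listing described above rather than reproducing any tool's code.

```c
/* File type character under the new binary interpretation. */
char newbin_type_char(unsigned mode)
{
    switch (mode & 0170000) {
    case 0140000: return 's';   /* socket */
    case 0120000: return 'l';   /* symbolic link */
    case 0100000: return '-';   /* regular file */
    case 0060000: return 'b';   /* block special */
    case 0040000: return 'd';   /* directory */
    case 0020000: return 'c';   /* character special */
    case 0010000: return 'p';   /* FIFO */
    default:      return '?';   /* no such type in this format */
    }
}

/* File type character under the PWB interpretation. Only the bits under
 * mask 0060000 select the type; IALLOC (0100000) and ILARG (0010000)
 * are ignored. */
char pwb_type_char(unsigned mode)
{
    switch (mode & 0060000) {
    case 0040000: return 'd';
    case 0020000: return 'c';
    case 0060000: return 'b';
    default:      return '-';
    }
}
```

A PWB directory has IALLOC set, so its mode looks like 0140755: that is
'd' under the PWB rules but 's' under the new rules. A large PWB regular
file has both IALLOC and ILARG set (for example 0110644), which matches
no type in the new table.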

Portable ASCII Format
Version 2 of the Single UNIX Specification (“SUSv2”) standardized an
ASCII variant that is portable across all platforms. It is commonly
known as the “old character” format or as the “odc” format. It stores
the same numeric fields as the old binary format, but represents them as
6-character or 11-character octal values.

struct cpio_odc_header {
        char    c_magic[6];
        char    c_dev[6];
        char    c_ino[6];
        char    c_mode[6];
        char    c_uid[6];
        char    c_gid[6];
        char    c_nlink[6];
        char    c_rdev[6];
        char    c_mtime[11];
        char    c_namesize[6];
        char    c_filesize[11];
};

The fields are identical to those in the new binary format. The name and
file body follow the fixed header. Unlike the binary formats, there is
no additional padding after the pathname or file contents. If the files
being archived are themselves entirely ASCII, then the resulting archive
will be entirely ASCII, except for the NUL byte that terminates the name
field.
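
Reading these fields amounts to parsing fixed-width octal strings. The
following is a minimal sketch; the names are hypothetical, the offsets
are computed from the struct above (76 bytes in total), and the parser
is deliberately strict in that it rejects anything other than the digits
0 through 7.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets derived from struct cpio_odc_header. */
enum {
    ODC_HEADER_SIZE  = 76,
    ODC_OFF_NAMESIZE = 59,
    ODC_OFF_FILESIZE = 65
};

/* Parse a fixed-width octal field that is not NUL-terminated.
 * Returns 0 on success, -1 if a character is not an octal digit. */
int cpio_parse_octal(const char *field, size_t width, uint64_t *out)
{
    uint64_t v = 0;
    size_t i;

    for (i = 0; i < width; i++) {
        if (field[i] < '0' || field[i] > '7')
            return -1;
        v = (v << 3) | (uint64_t)(field[i] - '0');
    }
    *out = v;
    return 0;
}

/* Extract the pathname and file data lengths from a 76-byte header.
 * Returns 0 on success, -1 on a bad magic string or malformed field. */
int odc_entry_sizes(const char *hdr, uint64_t *namesize, uint64_t *filesize)
{
    if (memcmp(hdr, "070707", 6) != 0)
        return -1;
    if (cpio_parse_octal(hdr + ODC_OFF_NAMESIZE, 6, namesize) != 0)
        return -1;
    return cpio_parse_octal(hdr + ODC_OFF_FILESIZE, 11, filesize);
}
```

Because this format has no padding, the next header begins exactly
ODC_HEADER_SIZE + namesize + filesize bytes after the current one.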

New ASCII Format
The “new” ASCII format uses 8-byte hexadecimal fields for all numbers and
separates device numbers into separate fields for major and minor
numbers.

struct cpio_newc_header {
        char    c_magic[6];
        char    c_ino[8];
        char    c_mode[8];
        char    c_uid[8];
        char    c_gid[8];
        char    c_nlink[8];
        char    c_mtime[8];
        char    c_filesize[8];
        char    c_devmajor[8];
        char    c_devminor[8];
        char    c_rdevmajor[8];
        char    c_rdevminor[8];
        char    c_namesize[8];
        char    c_check[8];
};

Except as specified below, the fields here match those specified for the
new binary format above.

magic   The string “070701”.

check   This field is always set to zero by writers and ignored by
        readers. See the next section for more details.

The pathname is followed by NUL bytes so that the total size of the fixed
header plus pathname is a multiple of four. Likewise, the file data is
padded to a multiple of four bytes. Note that this format supports only
4 gigabyte files (unlike the older ASCII format, which supports 8
gigabyte files).
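
The field encoding and the alignment rule can be sketched as follows.
The names are hypothetical; the 110-byte header size is computed from
the struct above (a 6-byte magic plus thirteen 8-byte fields), and the
offsets assume that each header itself starts on a 4-byte boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of struct cpio_newc_header: 6 + 13 * 8 bytes. */
enum { NEWC_HEADER_SIZE = 110 };

/* Parse one 8-character hexadecimal field (not NUL-terminated).
 * Returns 0 on success, -1 on a non-hexadecimal character. */
int newc_parse_hex(const char *field, uint32_t *out)
{
    uint32_t v = 0;
    int i;

    for (i = 0; i < 8; i++) {
        char c = field[i];
        unsigned d;

        if (c >= '0' && c <= '9')
            d = (unsigned)(c - '0');
        else if (c >= 'a' && c <= 'f')
            d = (unsigned)(c - 'a') + 10;
        else if (c >= 'A' && c <= 'F')
            d = (unsigned)(c - 'A') + 10;
        else
            return -1;
        v = (v << 4) | d;
    }
    *out = v;
    return 0;
}

/* Round n up to the next multiple of four. */
size_t newc_pad4(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/* Offset of the file data, measured from the start of the header. */
size_t newc_data_offset(uint32_t namesize)
{
    return newc_pad4((size_t)NEWC_HEADER_SIZE + namesize);
}

/* Offset of the next header, measured from the start of this one. */
size_t newc_next_header(uint32_t namesize, uint32_t filesize)
{
    return newc_data_offset(namesize) + newc_pad4(filesize);
}
```

For the trailer entry (an 11-byte pathname including the NUL, no data),
the header plus pathname is 121 bytes, so the next offset is 124.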

In this format, hardlinked files are handled by setting the filesize to
zero for each entry except the last one that appears in the archive.

New CRC Format
The CRC format is identical to the new ASCII format described in the
previous section except that the magic field is set to “070702” and the
check field is set to the sum of all bytes in the file data. This sum is
computed treating all bytes as unsigned values and using unsigned
arithmetic. Only the least-significant 32 bits of the sum are stored.
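
The checksum described above is small enough to sketch in full. The
function name is hypothetical; the use of uint32_t means that overflow
wraps modulo 2^32, which keeps only the least-significant 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum every byte of the file data as an unsigned value. Unsigned
 * 32-bit arithmetic wraps on overflow, so the result is the low
 * 32 bits of the full sum. */
uint32_t cpio_crc_checksum(const unsigned char *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < len; i++)
        sum += data[i];
    return sum;
}
```

For the three bytes “abc” (97, 98, 99) the result is 294.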

HP variants
The cpio implementation distributed with HPUX used XXXX but stored device
numbers differently XXX.

Other Extensions and Variants
Sun Solaris uses additional file types to store extended file data,
including ACLs and extended attributes, as special entries in cpio
archives.

XXX Others? XXX

SEE ALSO
cpio(1), tar(5)

STANDARDS
The cpio utility is no longer a part of POSIX or the Single UNIX
Specification. It last appeared in Version 2 of the Single UNIX
Specification (“SUSv2”). It has been supplanted in subsequent standards
by pax(1). The portable ASCII format is currently part of the
specification for the pax(1) utility.

HISTORY
The original cpio utility was written by Dick Haight while working in
AT&T's Unix Support Group. It appeared in 1977 as part of PWB/UNIX 1.0,
the “Programmer's Work Bench” derived from Version 6 AT&T UNIX that was
used internally at AT&T. Both the new binary and old character formats
were in use by 1980, according to the System III source released by SCO
under their “Ancient Unix” license. The character format was adopted as
part of IEEE Std 1003.1-1988 (“POSIX.1”). XXX when did "newc" appear?
Who invented it? When did HP come out with their variant? When did Sun
introduce ACLs and extended attributes? XXX

BUGS
The “CRC” format is misnamed, as it uses a simple checksum and not a
cyclic redundancy check.

The binary formats are limited to 16 bits for user id, group id, device,
and inode numbers. They are limited to 16 megabyte and 2 gigabyte file
sizes for the older and newer variants, respectively.

The old ASCII format is limited to 18 bits for the user id, group id,
device, and inode numbers. It is limited to 8 gigabyte file sizes.

The new ASCII format is limited to 4 gigabyte file sizes.

None of the cpio formats stores user or group names, which are essential
when moving files between systems with dissimilar user or group
numbering.

Especially when writing older cpio variants, it may be necessary to map
actual device/inode values to synthesized values that fit the available
fields. With very large filesystems, this may be necessary even for the
newer formats.

BSD                            December 23, 2011                           BSD