FILE(P)                    POSIX Programmer's Manual                   FILE(P)

NAME
       file - determine file type

SYNOPSIS
       file [-dh][-M file][-m file] file ...

       file -i [-h] file ...

DESCRIPTION
       The file utility shall perform a series of tests in sequence on each
       specified file in an attempt to classify it:

        1. If file does not exist, cannot be read, or its file status could
           not be determined, the output shall indicate that the file was
           processed, but that its type could not be determined.

        2. If the file is not a regular file, its file type shall be
           identified. The file types directory, FIFO, socket, block special,
           and character special shall be identified as such. Other
           implementation-defined file types may also be identified. If file
           is a symbolic link, by default the link shall be resolved and file
           shall test the type of file referenced by the symbolic link. (See
           the -h and -i options below.)

        3. If the length of file is zero, it shall be identified as an empty
           file.

        4. The file utility shall examine an initial segment of file and
           shall make a guess at identifying its contents based on
           position-sensitive tests. (The answer is not guaranteed to be
           correct; see the -d, -M, and -m options below.)

        5. The file utility shall examine file and make a guess at
           identifying its contents based on context-sensitive default system
           tests. (The answer is not guaranteed to be correct.)

        6. The file shall be identified as a data file.

       If file does not exist, cannot be read, or its file status could not be
       determined, the output shall indicate that the file was processed, but
       that its type could not be determined.

       If file is a symbolic link, by default the link shall be resolved and
       file shall test the type of file referenced by the symbolic link.
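
The file type part of this sequence (steps 1 to 3, with step 6 as the fallback) can be approximated with the test utility alone. The following is a minimal sketch, not a description of how any file implementation works: the name coarse_type is hypothetical, the content-based steps 4 and 5 are skipped, and only a <type> string is printed.

```shell
# coarse_type PATH
# Hypothetical helper: approximate steps 1-3 and 6 using test(1) only.
coarse_type() {
    f=$1
    if [ -h "$f" ] && [ ! -e "$f" ]; then
        # Dangling symbolic link: reported as a link, as if -h were given.
        printf '%s\n' 'symbolic link'
    elif [ ! -e "$f" ]; then
        printf '%s\n' 'cannot open'          # step 1: nonexistent
    elif [ -d "$f" ]; then
        printf '%s\n' 'directory'            # step 2: file type tests
    elif [ -p "$f" ]; then
        printf '%s\n' 'fifo'
    elif [ -S "$f" ]; then
        printf '%s\n' 'socket'
    elif [ -b "$f" ]; then
        printf '%s\n' 'block special'
    elif [ -c "$f" ]; then
        printf '%s\n' 'character special'
    elif [ ! -r "$f" ]; then
        printf '%s\n' 'cannot open'          # step 1: unreadable regular file
    elif [ ! -s "$f" ]; then
        printf '%s\n' 'empty'                # step 3: zero length
    else
        printf '%s\n' 'data'                 # step 6: steps 4 and 5 omitted
    fi
}
```

Because the test primaries other than -h follow symbolic links, the sketch resolves links by default, as the description requires.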

OPTIONS
       The file utility shall conform to the Base Definitions volume of
       IEEE Std 1003.1-2001, Section 12.2, Utility Syntax Guidelines, except
       that the order of the -m, -d, and -M options shall be significant.

       The following options shall be supported by the implementation:

       -d     Apply any position-sensitive default system tests and
              context-sensitive default system tests to the file. This is the
              default if no -M or -m option is specified.

       -h     When a symbolic link is encountered, identify the file as a
              symbolic link. If -h is not specified and file is a symbolic
              link that refers to a nonexistent file, file shall identify the
              file as a symbolic link, as if -h had been specified.

       -i     If a file is a regular file, do not attempt to classify the type
              of the file further, but identify the file as specified in the
              STDOUT section.

       -M file
              Specify the name of a file containing position-sensitive tests
              that shall be applied to a file in order to classify it (see the
              EXTENDED DESCRIPTION). No position-sensitive default system
              tests nor context-sensitive default system tests shall be
              applied unless the -d option is also specified.

       -m file
              Specify the name of a file containing position-sensitive tests
              that shall be applied to a file in order to classify it (see the
              EXTENDED DESCRIPTION).

       If the -m option is specified without specifying the -d option or the
       -M option, position-sensitive default system tests shall be applied
       after the position-sensitive tests specified by the -m option. If the
       -M option is specified with the -d option, the -m option, or both, or
       the -m option is specified with the -d option, the concatenation of the
       position-sensitive tests specified by these options shall be applied in
       the order specified by the appearance of these options. If a -M or -m
       file option-argument is -, the results are unspecified.
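
The ordering rules above are easier to follow when worked through on concrete option lists. The following sketch only models the rules as read from this section; it does not run file. The function name position_test_order and the placeholder word "default" (standing for the position-sensitive default system tests) are assumptions made for illustration, and grouped options such as -dh are not handled.

```shell
# position_test_order [-d] [-M file] [-m file] ... [operand ...]
# Hypothetical helper: print the sources of position-sensitive tests,
# one per line, in the order in which they would be applied.
position_test_order() {
    order='' seen_d=no seen_M=no
    while [ "$#" -gt 0 ]; do
        case $1 in
            -d) seen_d=yes; order="$order default" ;;
            -M) [ "$#" -ge 2 ] || return 2
                seen_M=yes; order="$order $2"; shift ;;
            -m) [ "$#" -ge 2 ] || return 2
                order="$order $2"; shift ;;
            *)  break ;;
        esac
        shift
    done
    if [ "$seen_d$seen_M" = nono ]; then
        # Neither -d nor -M: the defaults follow any -m tests. With no
        # option at all this leaves just "default", which matches -d.
        order="$order default"
    fi
    # Deliberately unquoted so the list is split into one word per line;
    # magic file names containing blanks or glob characters are not handled.
    printf '%s\n' $order
}
```

For example, position_test_order -m mymagic prints mymagic and then default, while position_test_order -M mymagic prints only mymagic.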

OPERANDS
       The following operand shall be supported:

       file   A pathname of a file to be tested.

STDIN
       Not used.

INPUT FILES
       The file can be any file type.

ENVIRONMENT VARIABLES
       The following environment variables shall affect the execution of file:

       LANG   Provide a default value for the internationalization variables
              that are unset or null. (See the Base Definitions volume of
              IEEE Std 1003.1-2001, Section 8.2, Internationalization
              Variables for the precedence of internationalization variables
              used to determine the values of locale categories.)

       LC_ALL If set to a non-empty string value, override the values of all
              the other internationalization variables.

       LC_CTYPE
              Determine the locale for the interpretation of sequences of
              bytes of text data as characters (for example, single-byte as
              opposed to multi-byte characters in arguments and input files).

       LC_MESSAGES
              Determine the locale that should be used to affect the format
              and contents of diagnostic messages written to standard error
              and informative messages written to standard output.

       NLSPATH
              Determine the location of message catalogs for the processing of
              LC_MESSAGES.

ASYNCHRONOUS EVENTS
       Default.

STDOUT
       In the POSIX locale, the following format shall be used to identify
       each operand, file specified:

              "%s: %s\n", <file>, <type>

       The values for <type> are unspecified, except that in the POSIX locale,
       if file is identified as one of the types listed in the following
       table, <type> shall contain (but is not limited to) the corresponding
       string, unless the file is identified by a position-sensitive test
       specified by a -M or -m option. Each space shown in the strings shall
       be exactly one <space>.

       Table: File Utility Output Strings

       If file is:                             <type> shall contain  Notes
                                               the string:
       Nonexistent                             cannot open
       Block special                           block special         1
       Character special                       character special     1
       Directory                               directory             1
       FIFO                                    fifo                  1
       Socket                                  socket                1
       Symbolic link                           symbolic link to      1
       Regular file                            regular file          1,2
       Empty regular file                      empty                 3
       Regular file that cannot be read        cannot open           3
       Executable binary                       executable            4,6
       ar archive library (see ar)             archive               4,6
       Extended cpio format (see pax)          cpio archive          4,6
       Extended tar format (see ustar in pax)  tar archive           4,6
       Shell script                            commands text         5,6
       C-language source                       c program text        5,6
       FORTRAN source                          fortran program text  5,6
       Regular file whose type cannot be       data
       determined

       Notes:

        1. This is a file type test.

        2. This test is applied only if the -i option is specified.

        3. This test is applied only if the -i option is not specified.

        4. This is a position-sensitive default system test.

        5. This is a context-sensitive default system test.

        6. Position-sensitive default system tests and context-sensitive
           default system tests are not applied if the -M option is specified
           unless the -d option is also specified.

       In the POSIX locale, if file is identified as a symbolic link (see the
       -h option), the following alternative output format shall be used:

              "%s: %s %s\n", <file>, <type>, <contents of link>

       If the file named by the file operand does not exist, cannot be read,
       or the type of the file named by the file operand cannot be determined,
       this shall not be considered an error that affects the exit status.
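
Since every output line starts with the operand followed by a colon and one space, a script that knows the operand can cut the <type> text out without guessing where the pathname ends. The following is a minimal sketch under that assumption; type_field is a hypothetical helper name, and the sketch operates on a line of output passed as an argument rather than running file itself.

```shell
# type_field PATH LINE
# Hypothetical helper: strip the leading "<file>: " from one line of
# output produced for PATH, leaving the <type> text (and, for symbolic
# links, the <contents of link> that follows it).
type_field() {
    prefix="$1: "
    case $2 in
        "$prefix"*) printf '%s\n' "${2#"$prefix"}" ;;
        *)          return 1 ;;
    esac
}
```

With this helper, type_field /tmp/x '/tmp/x: directory' prints directory. Matching on the known prefix, rather than on the first colon, keeps pathnames that themselves contain ": " from being split in the wrong place. An implementation that pads the separator for alignment would not match this sketch.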

STDERR
       The standard error shall be used only for diagnostic messages.

OUTPUT FILES
       None.

EXTENDED DESCRIPTION
       A file specified as an option-argument to the -m or -M options shall
       contain one position-sensitive test per line, which shall be applied to
       the file. If the test succeeds, the message field of the line shall be
       printed and no further tests shall be applied, with the exception that
       tests on immediately following lines beginning with a single '>'
       character shall be applied.

       Each line shall be composed of the following four <blank>-separated
       fields:

       offset An unsigned number (optionally preceded by a single '>'
              character) specifying the offset, in bytes, of the value in the
              file that is to be compared against the value field of the line.
              If the file is shorter than the specified offset, the test shall
              fail.

              If the offset begins with the character '>', the test contained
              in the line shall not be applied to the file unless the test on
              the last line for which the offset did not begin with a '>' was
              successful. By default, the offset shall be interpreted as an
              unsigned decimal number. With a leading 0x or 0X, the offset
              shall be interpreted as a hexadecimal number; otherwise, with a
              leading 0, the offset shall be interpreted as an octal number.
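
The base rules for the offset field happen to match the way shell arithmetic reads integer constants, which gives a quick way to check what a given offset means. The following is a minimal sketch; offset_to_decimal is a hypothetical helper name and the input check is deliberately simple.

```shell
# offset_to_decimal OFFSET
# Hypothetical helper: print the byte offset named by a magic file offset
# field as a decimal number. A leading '>' is dropped first.
offset_to_decimal() {
    off=${1#>}
    case $off in
        # Reject empty input and anything that is not a plain constant,
        # since the text is handed to shell arithmetic below.
        ''|[!0-9]*|*[!0-9a-fA-FxX]*) return 1 ;;
    esac
    printf '%d\n' "$(($off))"
}
```

For instance, an offset written as 0x1f, 037, or 31 names the same byte in each case.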

       type   The type of the value in the file to be tested. The type shall
              consist of the type specification characters c, d, f, s, and u,
              specifying character, signed decimal, floating point, string,
              and unsigned decimal, respectively.

              The type string shall be interpreted as the bytes from the file
              starting at the specified offset and including the same number
              of bytes specified by the value field. If insufficient bytes
              remain in the file past the offset to match the value field, the
              test shall fail.

              The type specification characters d, f, and u can be followed by
              an optional unsigned decimal integer that specifies the number
              of bytes represented by the type. The type specification
              character f can be followed by an optional F, D, or L,
              indicating that the value is of type float, double, or long
              double, respectively. The type specification characters d and u
              can be followed by an optional C, S, I, or L, indicating that
              the value is of type char, short, int, or long, respectively.

              The default number of bytes represented by the type specifiers
              d, f, and u shall correspond to their respective C-language
              types as follows. If the system claims conformance to the
              C-Language Development Utilities option, those specifiers shall
              correspond to the default sizes used in the c99 utility.
              Otherwise, the default sizes shall be implementation-defined.

              For the type specifier characters d and u, the default number of
              bytes shall correspond to the size of a basic integer type of
              the implementation. For these specifier characters, the
              implementation shall support values of the optional number of
              bytes to be converted corresponding to the number of bytes in
              the C-language types char, short, int, or long. These numbers
              can also be specified by an application as the characters C, S,
              I, and L, respectively. The byte order used when interpreting
              numeric values is implementation-defined, but shall correspond
              to the order in which a constant of the corresponding type is
              stored in memory on the system.

              For the type specifier f, the default number of bytes shall
              correspond to the number of bytes in the basic double precision
              floating-point data type of the underlying implementation. The
              implementation shall support values of the optional number of
              bytes to be converted corresponding to the number of bytes in
              the C-language types float, double, and long double. These
              numbers can also be specified by an application as the
              characters F, D, and L, respectively.

              All type specifiers, except for s, can be followed by a mask
              specifier of the form &number. The mask value shall be AND'ed
              with the value of the input file before the comparison with the
              value field of the line is made. By default, the mask shall be
              interpreted as an unsigned decimal number. With a leading 0x or
              0X, the mask shall be interpreted as an unsigned hexadecimal
              number; otherwise, with a leading 0, the mask shall be
              interpreted as an unsigned octal number.

              The strings byte, short, long, and string shall also be
              supported as type fields, being interpreted as dC, dS, dL, and
              s, respectively.

       value  The value to be compared with the value from the file.

              If the specifier from the type field is s or string, then
              interpret the value as a string. Otherwise, interpret it as a
              number. If the value is a string, then the test shall succeed
              only when a string value exactly matches the bytes from the
              file.

              If the value is a string, it can contain the following
              sequences:

              \character
                     The backslash-escape sequences as specified in the Base
                     Definitions volume of IEEE Std 1003.1-2001, Table 5-1,
                     Escape Sequences and Associated Actions ('\\', '\a',
                     '\b', '\f', '\n', '\r', '\t', '\v'). The results of using
                     any other character, other than an octal digit, following
                     the backslash are unspecified.

              \octal
                     Octal sequences that can be used to represent characters
                     with specific coded values. An octal sequence shall
                     consist of a backslash followed by the longest sequence
                     of one, two, or three octal-digit characters (01234567).
                     If the size of a byte on the system is greater than 9
                     bits, the valid escape sequence used to represent a byte
                     is implementation-defined.

              By default, any value that is not a string shall be interpreted
              as a signed decimal number. Any such value, with a leading 0x or
              0X, shall be interpreted as an unsigned hexadecimal number;
              otherwise, with a leading zero, the value shall be interpreted
              as an unsigned octal number.

              If the value is not a string, it can be preceded by a character
              indicating the comparison to be performed. Permissible
              characters and the comparisons they specify are as follows:

              =      The test shall succeed if the value from the file equals
                     the value field.

              <      The test shall succeed if the value from the file is less
                     than the value field.

              >      The test shall succeed if the value from the file is
                     greater than the value field.

              &      The test shall succeed if all of the set bits in the
                     value field are set in the value from the file.

              ^      The test shall succeed if at least one of the set bits in
                     the value field is not set in the value from the file.

              x      The test shall succeed if the file is large enough to
                     contain a value of the type specified starting at the
                     offset specified.
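
The six comparison characters map directly onto integer and bitwise operations. The following sketch restates them in shell arithmetic so that individual cases can be tried by hand. The name magic_compare is hypothetical, the value from the file is passed in as a number rather than read from a file, and the sizes and signedness of the real types are ignored.

```shell
# magic_compare OP FILE_VALUE TEST_VALUE
# Hypothetical helper: succeed (exit status 0) if the comparison named by
# OP holds. Both values must be integer constants (decimal, 0x hex, or
# leading-0 octal), because they are handed to shell arithmetic as-is.
magic_compare() {
    op=$1
    fv=$(($2))
    tv=$(($3))
    case $op in
        '=') [ "$fv" -eq "$tv" ] ;;
        '<') [ "$fv" -lt "$tv" ] ;;
        '>') [ "$fv" -gt "$tv" ] ;;
        '&') [ "$((fv & tv))" -eq "$tv" ] ;;  # every set bit of TEST_VALUE is set
        '^') [ "$((fv & tv))" -ne "$tv" ] ;;  # at least one of those bits is clear
        x)   : ;;   # the size check is assumed to have passed already
        *)   return 2 ;;
    esac
}
```

A mask specifier is applied before the comparison, so a test written with type byte&0x80 and value >0 corresponds to magic_compare '>' "$((b & 0x80))" 0, where b stands for the byte read from the file.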

       message
              The message to be printed if the test succeeds. The message
              shall be interpreted using the notation for the printf
              formatting specification; see printf(). In both cases the value
              from the file shall be the argument for the printf formatting
              specification: a string if the value field was a string, and a
              number otherwise.

EXIT STATUS
       The following exit values shall be returned:

        0     Successful completion.

       >0     An error occurred.

CONSEQUENCES OF ERRORS
       Default.

       The following sections are informative.

APPLICATION USAGE
       The file utility can only be required to guess at many of the file
       types because only exhaustive testing can determine some types with
       certainty. For example, binary data on some implementations might match
       the initial segment of an executable or a tar archive.

       Note that the table indicates that the output contains the stated
       string. Systems may add text before or after the string. For
       executables, as an example, the machine architecture and various facts
       about how the file was link-edited may be included. Note also that on
       systems that recognize shell script files starting with "#!" as
       executable files, these may be identified as executable binary files
       rather than as shell scripts.

EXAMPLES
       Determine whether an argument is a binary executable file:

              file "$1" | grep -Fq executable &&
                  printf "%s is executable.\n" "$1"
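
A second sketch in the same spirit uses the -i option to test for a regular file. It assumes a file utility that implements -i as described on this page; several widely installed file utilities give -i a different meaning, so this is an illustration rather than a portable recipe. The name is_regular is hypothetical. The pattern anchors on the "<file>: " prefix so that a pathname which itself contains the words "regular file" does not produce a false match.

```shell
# is_regular PATH
# Hypothetical helper: succeed if "file -i" reports PATH as a regular file.
# Assumes -i behaves as specified on this page.
is_regular() {
    out=$(file -i "$1") || return 2
    case $out in
        "$1: "*'regular file'*) return 0 ;;
        *)                      return 1 ;;
    esac
}
```

A caller might then write is_regular "$1" && printf "%s is a regular file.\n" "$1".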

RATIONALE
       The -f option was omitted because the same effect can (and should) be
       obtained using the xargs utility.

       Historical versions of the file utility attempt to identify the
       following types of files: symbolic link, directory, character special,
       block special, socket, tar archive, cpio archive, SCCS archive, archive
       library, empty, compress output, pack output, binary data, C source,
       FORTRAN source, assembler source, nroff/troff/eqn/tbl source, troff
       output, shell script, C shell script, English text, ASCII text, various
       executables, APL workspace, compiled terminfo entries, and CURSES
       screen images. Only those types that are reasonably well specified in
       POSIX or are directly related to POSIX utilities are listed in the
       table.

       Historical systems have used a "magic file" named /etc/magic to help
       identify file types. Because it is generally useful for users and
       scripts to be able to identify special file types, the -m flag and a
       portable format for user-created magic files have been specified. No
       requirement is made that an implementation of file use this method of
       identifying files, only that users be permitted to add their own
       classifying tests.

       In addition, three options have been added to historical practice. The
       -d flag has been added to permit users to cause their tests to follow
       any default system tests. The -i flag has been added to permit users to
       test portably for regular files in shell scripts. The -M flag has been
       added to permit users to ignore any default system tests.

       The IEEE Std 1003.1-2001 description of default system tests and the
       interaction between the -d, -M, and -m options did not clearly indicate
       that there were two types of "default system tests". The
       "position-sensitive tests" determine file types by looking for certain
       string or binary values at specific offsets in the file being examined.
       These position-sensitive tests were implemented in historical systems
       using the magic file described above. Some of these tests are now built
       into the file utility itself on some implementations so the output can
       provide more detail than can be provided by magic files. For example, a
       magic file can easily identify a core file on most implementations, but
       cannot name the program file that dropped the core. A magic file could
       produce output such as:

              /home/dwc/core: ELF 32-bit MSB core file SPARC Version 1

       but by building the test into the file utility, you could get output
       such as:

              /home/dwc/core: ELF 32-bit MSB core file SPARC Version 1, from 'testprog'

       These extended built-in tests are still to be treated as
       position-sensitive default system tests even if they are not listed in
       /etc/magic or any other magic file.

       The context-sensitive default system tests were always built into the
       file utility. These tests looked for language constructs in text files
       trying to identify shell scripts, C, FORTRAN, and other computer
       language source files, and even plain text files. With the addition of
       the -m and -M options the distinction between position-sensitive and
       context-sensitive default system tests became important because the
       order of testing is important. The context-sensitive system default
       tests should never be applied before any position-sensitive tests even
       if the -d option is specified before a -m option or -M option due to
       the high probability that the context-sensitive system default tests
       will incorrectly identify arbitrary text files as text files before
       position-sensitive tests specified by the -m or -M option would be
       applied to give a more accurate identification.

       Leaving the meaning of -M - and -m - unspecified allows an existing
       prototype of these options to continue to work in a
       backwards-compatible manner. (In that implementation, -M - was roughly
       equivalent to -d in IEEE Std 1003.1-2001.)

       The historical -c option was omitted as not particularly useful to
       users or portable shell scripts. In addition, a reasonable
       implementation of the file utility would report any errors found each
       time the magic file is read.

       The historical format of the magic file was the same as that specified
       by the Rationale in the ISO POSIX-2:1993 standard for the offset,
       value, and message fields; however, it used less precise type fields
       than the format specified by the current normative text. The new type
       field values are a superset of the historical ones.

       The following is an example magic file:

              0  short        070707              cpio archive
              0  short        0143561             Byte-swapped cpio archive
              0  string       070707              ASCII cpio archive
              0  long         0177555             Very old archive
              0  short        0177545             Old archive
              0  short        017437              Old packed data
              0  string       \037\036            Packed data
              0  string       \377\037            Compacted data
              0  string       \037\235            Compressed data
              >2 byte&0x80    >0                  Block compressed
              >2 byte&0x1f    x                   %d bits
              0  string       \032\001            Compiled Terminfo Entry
              0  short        0433                Curses screen image
              0  short        0434                Curses screen image
              0  string       <ar>                System V Release 1 archive
              0  string       !<arch>\n__.SYMDEF  Archive random library
              0  string       !<arch>             Archive
              0  string       ARF_BEGARF          PHIGS clear text archive
              0  long         0x137A2950          Scalable OpenFont binary
              0  long         0x137A2951          Encrypted scalable OpenFont binary
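
The string entries in the example magic file name bytes with octal escapes. As a rough way to see by hand which bytes a file really starts with, od can print them in the same octal notation. The following is a minimal sketch, assuming the -A, -N, and -t options of od; leading_bytes_octal is a hypothetical helper name.

```shell
# leading_bytes_octal FILE COUNT
# Hypothetical helper: print the first COUNT bytes of FILE as three-digit
# octal numbers run together, for example 037235 for the bytes \037\235.
leading_bytes_octal() {
    od -A n -t o1 -N "$2" "$1" | tr -d '[:space:]'
}
```

A check such as [ "$(leading_bytes_octal "$f" 2)" = 037235 ] then corresponds to the "Compressed data" string test at offset 0, where f is assumed to hold the pathname being examined.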

       The use of a basic integer data type is intended to allow the
       implementation to choose a word size commonly used by applications on
       that architecture.

FUTURE DIRECTIONS
       None.

SEE ALSO
       ar, ls, pax

COPYRIGHT
       Portions of this text are reprinted and reproduced in electronic form
       from IEEE Std 1003.1, 2003 Edition, Standard for Information Technology
       -- Portable Operating System Interface (POSIX), The Open Group Base
       Specifications Issue 6, Copyright (C) 2001-2003 by the Institute of
       Electrical and Electronics Engineers, Inc and The Open Group. In the
       event of any discrepancy between this version and the original IEEE and
       The Open Group Standard, the original IEEE and The Open Group Standard
       is the referee document. The original Standard can be obtained online
       at http://www.opengroup.org/unix/online.html .

IEEE/The Open Group                  2003                              FILE(P)