FILE(1)                   BSD General Commands Manual                  FILE(1)

NAME
file - determine file type

SYNOPSIS
file [-bchikLNnprsvz0] [--apple] [--mime-encoding] [--mime-type]
     [-e testname] [-F separator] [-f namefile] [-m magicfiles] file ...
file -C [-m magicfiles]
file [--help]

DESCRIPTION
This manual page documents version 5.04 of the file command.

file tests each argument in an attempt to classify it. There are three
sets of tests, performed in this order: filesystem tests, magic tests,
and language tests. The first test that succeeds causes the file type to
be printed.

The type printed will usually contain one of the words text (the file
contains only printing characters and a few common control characters and
is probably safe to read on an ASCII terminal), executable (the file
contains the result of compiling a program in a form understandable to
some UNIX kernel or another), or data, meaning anything else (data is
usually ‘binary’ or non-printable). Exceptions are well-known file
formats (core files, tar archives) that are known to contain binary data.
When modifying magic files or the program itself, make sure to preserve
these keywords. Users depend on knowing that all the readable files in a
directory have the word ‘text’ printed. Don't do as Berkeley did and
change ‘shell commands text’ to ‘shell script’.

The filesystem tests are based on examining the return from a stat(2)
system call. The program checks to see if the file is empty, or if it's
some sort of special file. Any known file types appropriate to the system
you are running on (sockets, symbolic links, or named pipes (FIFOs) on
those systems that implement them) are intuited if they are defined in
the system header file <sys/stat.h>.
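The filesystem tests can be pictured with a minimal Python sketch. It is an illustration only, not the code file uses: the function name classify_by_stat and the returned strings are assumptions, and the mode checks come from Python's stat module rather than from <sys/stat.h> directly.

```python
import os
import stat


def classify_by_stat(path):
    """Return a short description based only on lstat(2) data, or None."""
    st = os.lstat(path)  # lstat so that a symlink is reported, not followed
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    return None  # non-empty regular file: leave it to the content tests
```

A result of None here corresponds to falling through to the magic and language tests.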

The magic tests are used to check for files with data in particular fixed
formats. The canonical example of this is a binary executable (compiled
program) a.out file, whose format is defined in <elf.h>, <a.out.h> and
possibly <exec.h> in the standard include directory. These files have a
‘magic number’ stored in a particular place near the beginning of the
file that tells the UNIX operating system that the file is a binary
executable, and which of several types thereof. The concept of a ‘magic
number’ has been applied by extension to data files. Any file with some
invariant identifier at a small fixed offset into the file can usually be
described in this way. The information identifying these files is read
from the compiled magic file /usr/share/misc/magic.mgc, or the files in
the directory /usr/share/misc/magic if the compiled file does not exist.
In addition, if $HOME/.magic.mgc or $HOME/.magic exists, it will be used
in preference to the system magic files. If /etc/magic exists, it will
be used together with other magic files.
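The idea of an invariant identifier at a small fixed offset can be sketched in a few lines of Python. The table MAGIC_TABLE and the function match_magic are hypothetical names invented for this illustration; the three signatures are well-known ones, but the real entries and their syntax are those of the magic files described in magic(5).

```python
# Hypothetical, hand-written table: (offset, expected bytes, description).
# The real entries live in the magic files described by magic(5).
MAGIC_TABLE = [
    (0, b"\x7fELF", "ELF"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (257, b"ustar", "POSIX tar archive"),
]


def match_magic(data):
    """Return the description of the first table entry found in data."""
    for offset, pattern, description in MAGIC_TABLE:
        if data[offset:offset + len(pattern)] == pattern:
            return description
    return None  # no entry matched
```

The entries are tried in table order and the first match wins, so the order of the table matters in this sketch.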

If a file does not match any of the entries in the magic file, it is
examined to see if it seems to be a text file. ASCII, ISO-8859-x,
non-ISO 8-bit extended-ASCII character sets (such as those used on
Macintosh and IBM PC systems), UTF-8-encoded Unicode, UTF-16-encoded
Unicode, and EBCDIC character sets can be distinguished by the different
ranges and sequences of bytes that constitute printable text in each set.
If a file passes any of these tests, its character set is reported.
ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified as
‘text’ because they will be mostly readable on nearly any terminal;
UTF-16 and EBCDIC are only ‘character data’ because, while they contain
text, it is text that will require translation before it can be read. In
addition, file will attempt to determine other characteristics of
text-type files. If the lines of a file are terminated by CR, CRLF, or
NEL, instead of the Unix-standard LF, this will be reported. Files that
contain embedded escape sequences or overstriking will also be
identified.
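A much reduced Python sketch of the character-set and line-terminator checks follows. It handles only ASCII and UTF-8 and ignores NEL, ISO-8859-x, UTF-16 and EBCDIC. The set TEXT_CONTROLS and both function names are assumptions made for the example, not file's actual tables or routines.

```python
# Control bytes treated as harmless in text: BEL, BS, TAB, LF, VT, FF, CR, ESC.
# This set is an assumption for the sketch, not file's actual table.
TEXT_CONTROLS = {7, 8, 9, 10, 11, 12, 13, 27}


def guess_charset(data):
    """Return "ASCII", "UTF-8", or None if neither test passes."""
    if all(b in TEXT_CONTROLS or 32 <= b < 127 for b in data):
        return "ASCII"
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    for char in text:
        code = ord(char)
        if code not in TEXT_CONTROLS and (code < 32 or code == 127):
            return None  # stray control character: not plain text
    return "UTF-8"


def line_terminators(data):
    """List which of CRLF, CR and LF occur as line terminators."""
    crlf = data.count(b"\r\n")
    lone_cr = data.count(b"\r") - crlf
    lone_lf = data.count(b"\n") - crlf
    found = []
    if crlf:
        found.append("CRLF")
    if lone_cr:
        found.append("CR")
    if lone_lf:
        found.append("LF")
    return found
```

The ASCII test runs first because every ASCII file is also valid UTF-8; reporting the narrower set is the more useful answer.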

Once file has determined the character set used in a text-type file, it
will attempt to determine in what language the file is written. The
language tests look for particular strings (cf. <names.h>) that can
appear anywhere in the first few blocks of a file. For example, the
keyword .br indicates that the file is most likely a troff(1) input file,
just as the keyword struct indicates a C program. These tests are less
reliable than the previous two groups, so they are performed last. The
language test routines also test for some miscellany (such as tar(1)
archives).
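The keyword lookup can be sketched as follows. The table TOKENS holds only the two keywords mentioned above, and BLOCK_SIZE, the function name and the returned strings are all assumptions for illustration; the real list is the one in <names.h>.

```python
# Hypothetical keyword table; the two entries come from the examples above.
TOKENS = {
    ".br": "troff input",
    "struct": "C program",
}
BLOCK_SIZE = 4096  # assumed stand-in for "the first few blocks"


def guess_language(text):
    """Return a description for the first known token near the start."""
    for word in text[:BLOCK_SIZE].split():
        if word in TOKENS:
            return TOKENS[word] + " text"
    return None  # no known token found
```

A single keyword is weak evidence, which is one way to see why these tests are run last.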

Any file that cannot be identified as having been written in any of the
character sets listed above is simply said to be ‘data’.

OPTIONS
-b, --brief
        Do not prepend filenames to output lines (brief mode).

-C, --compile
        Write a magic.mgc output file that contains a pre-parsed version
        of the magic file or directory.

-c, --checking-printout
        Cause a checking printout of the parsed form of the magic file.
        This is usually used in conjunction with the -m flag to debug a
        new magic file before installing it.

-e, --exclude testname
        Exclude the test named in testname from the list of tests made to
        determine the file type. Valid test names are:

        apptype   EMX application type (only on EMX).

        text      Various types of text files (this test will try to
                  guess the text encoding, irrespective of the setting of
                  the ‘encoding’ option).

        encoding  Different text encodings for soft magic tests.

        tokens    Looks for known tokens inside text files.

        cdf       Prints details of Compound Document Files.

        compress  Checks for, and looks inside, compressed files.

        elf       Prints ELF file details.

        soft      Consults magic files.

        tar       Examines tar files.

-F, --separator separator
        Use the specified string as the separator between the filename
        and the file result returned. Defaults to ‘:’.

-f, --files-from namefile
        Read the names of the files to be examined from namefile (one per
        line) before the argument list. Either namefile or at least one
        filename argument must be present; to test the standard input,
        use ‘-’ as a filename argument.

-h, --no-dereference
        Do not follow symlinks (on systems that support symbolic links).
        This is the default if the environment variable POSIXLY_CORRECT
        is not defined.

-i, --mime
        Causes the file command to output mime type strings rather than
        the more traditional human readable ones. Thus it may say
        ‘text/plain; charset=us-ascii’ rather than ‘ASCII text’. In
        order for this option to work, file changes the way it handles
        files recognized by the command itself (such as many of the text
        file types, directories etc.), and makes use of an alternative
        ‘magic’ file. (See the FILES section, below.)

--mime-type, --mime-encoding
        Like -i, but print only the specified element(s).

-k, --keep-going
        Don't stop at the first match, keep going. Subsequent matches
        will have the string ‘\012- ’ prepended. (If you want a newline,
        see the ‘-r’ option.)

-L, --dereference
        Follow symlinks, as the like-named option in ls(1) does (on
        systems that support symbolic links). This is the default if the
        environment variable POSIXLY_CORRECT is defined.

-m, --magic-file magicfiles
        Specify an alternate list of files and directories containing
        magic. This can be a single item, or a colon-separated list. If
        a compiled magic file is found alongside a file or directory, it
        will be used instead.

-N, --no-pad
        Don't pad filenames so that they align in the output.

-n, --no-buffer
        Force stdout to be flushed after checking each file. This is
        only useful if checking a list of files. It is intended to be
        used by programs that want filetype output from a pipe.

-p, --preserve-date
        On systems that support utime(2) or utimes(2), attempt to
        preserve the access time of files analyzed, to pretend that file
        never read them.
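What preserving the access time amounts to can be shown with a short Python sketch. It is an approximation under stated assumptions: read_preserving_atime and its nbytes parameter are hypothetical, and os.utime() stands in for the utime(2) family of calls that file itself would use.

```python
import os


def read_preserving_atime(path, nbytes=4096):
    """Read the start of a file, then put its timestamps back."""
    before = os.stat(path)  # remember the times before touching the file
    with open(path, "rb") as handle:
        data = handle.read(nbytes)
    # os.utime() is Python's wrapper around the utime(2) family of calls.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
    return data
```

Restoring the times changes the inode change time (ctime), so the read is hidden from tools that look at atime but not from those that look at ctime.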

-r, --raw
        Don't translate unprintable characters to \ooo. Normally file
        translates unprintable characters to their octal representation.
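The default translation that -r turns off can be approximated in Python as below. The function name escape_unprintable is an assumption, and so is the exact set of bytes treated as printable (ASCII 32 to 126 here); file's own rules may differ.

```python
def escape_unprintable(raw):
    """Render bytes as text, writing non-printable bytes as \\ooo."""
    out = []
    for byte in raw:
        if 32 <= byte < 127:
            out.append(chr(byte))  # printable ASCII passes through
        else:
            out.append("\\%03o" % byte)  # three-digit octal escape
    return "".join(out)
```

A newline (octal 012) comes out as the four characters ‘\012’, which is the form that appears in the string the -k option prepends to subsequent matches.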

-s, --special-files
        Normally, file only attempts to read and determine the type of
        argument files which stat(2) reports are ordinary files. This
        prevents problems, because reading special files may have
        peculiar consequences. Specifying the -s option causes file to
        also read argument files which are block or character special
        files. This is useful for determining the filesystem types of
        the data in raw disk partitions, which are block special files.
        This option also causes file to disregard the file size as
        reported by stat(2) since on some systems it reports a zero size
        for raw disk partitions.

-v, --version
        Print the version of the program and exit.

-z, --uncompress
        Try to look inside compressed files.

-0, --print0
        Output a null character ‘\0’ after the end of the filename. This
        makes the output easy to split with cut(1). This does not affect
        the separator, which is still printed.
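A program reading -0 output can split each record at the null character instead of guessing where the filename ends. The sketch below assumes, from the description above, that each output line has the form filename, null character, separator, padding, description. The function name parse_print0 is hypothetical, and filenames containing newlines are not handled.

```python
def parse_print0(output, separator=":"):
    """Split captured 'file -0' output into (filename, description) pairs."""
    results = []
    for line in output.split("\n"):
        name, nul, rest = line.partition("\0")
        if not nul:
            continue  # no null character: not a record this sketch expects
        if rest.startswith(separator):
            rest = rest[len(separator):]  # drop the separator after the NUL
        results.append((name, rest.strip()))
    return results
```

Because the split happens at the null character, a filename that itself contains the separator (for example ‘a:b.txt’) is still recovered intact.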

--help  Print a help message and exit.

FILES
/usr/share/misc/magic.mgc  Default compiled list of magic.
/usr/share/misc/magic      Directory containing default magic files.

ENVIRONMENT
The environment variable MAGIC can be used to set the default magic file
name. If that variable is set, then file will not attempt to open
$HOME/.magic. file adds ‘.mgc’ to the value of this variable as
appropriate. The environment variable POSIXLY_CORRECT controls (on
systems that support symbolic links) whether file will attempt to follow
symlinks or not. If set, then file follows symlinks, otherwise it does
not. This is also controlled by the -L and -h options.

SEE ALSO
magic(5), strings(1), od(1), hexdump(1), file(1posix)

STANDARDS CONFORMANCE
This program is believed to exceed the System V Interface Definition of
FILE(CMD), as near as one can determine from the vague language contained
therein. Its behavior is mostly compatible with the System V program of
the same name. This version knows more magic, however, so it will
produce different (albeit more accurate) output in many cases.

The one significant difference between this version and System V is that
this version treats any white space as a delimiter, so that spaces in
pattern strings must be escaped. For example,

    >10     string  language impress        (imPRESS data)

in an existing magic file would have to be changed to

    >10     string  language\ impress       (imPRESS data)

In addition, in this version, if a pattern string contains a backslash,
it must be escaped. For example

    0       string          \begindata      Andrew Toolkit document

in an existing magic file would have to be changed to

    0       string          \\begindata     Andrew Toolkit document
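The two conversions above can be done mechanically on a pattern string. The Python sketch below is only an illustration: escape_pattern is a hypothetical helper, and it handles plain spaces and backslashes only, not tabs or other white space.

```python
def escape_pattern(pattern):
    """Escape a System V style pattern string for this version's magic file."""
    # Backslashes go first so the ones added for spaces are not doubled.
    escaped = pattern.replace("\\", "\\\\")
    return escaped.replace(" ", "\\ ")
```

Applied to the two examples, ‘language impress’ becomes ‘language\ impress’ and ‘\begindata’ becomes ‘\\begindata’.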

SunOS releases 3.2 and later from Sun Microsystems include a file command
derived from the System V one, but with some extensions. My version
differs from Sun's only in minor ways. It includes the extension of the
‘&’ operator, used as, for example,

    >16     long&0x7fffffff >0              not stripped
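The effect of the ‘&’ operator in that entry is to mask the value read from the file before the comparison is made. A Python sketch of that one test follows. The function name is hypothetical, the ‘>’ continuation level is not modelled, and the value is read as a four-byte integer in native byte order, which is how magic(5) describes the long type.

```python
import struct


def masked_long_is_positive(data, offset=16, mask=0x7FFFFFFF):
    """Apply mask to the 4-byte value at offset, then test value > 0."""
    if len(data) < offset + 4:
        return False  # not enough data to hold the field
    (value,) = struct.unpack_from("=i", data, offset)  # native byte order
    return (value & mask) > 0
```

With the mask 0x7fffffff the top bit is ignored, so a field holding only that bit compares as zero and the entry does not match.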

MAGIC DIRECTORY
The magic file entries have been collected from various sources, mainly
USENET, and contributed by various authors. Christos Zoulas (address
below) will collect additional or corrected magic file entries. A
consolidation of magic file entries will be distributed periodically.

The order of entries in the magic file is significant. Depending on what
system you are using, the order that they are put together may be
incorrect. If your old file command uses a magic file, keep the old magic
file around for comparison purposes (rename it to
/usr/share/misc/magic.orig).

EXAMPLES
    $ file file.c file /dev/{wd0a,hda}
    file.c:    C program text
    file:      ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
               dynamically linked (uses shared libs), stripped
    /dev/wd0a: block special (0/0)
    /dev/hda:  block special (3/0)

    $ file -s /dev/wd0{b,d}
    /dev/wd0b: data
    /dev/wd0d: x86 boot sector

    $ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
    /dev/hda:   x86 boot sector
    /dev/hda1:  Linux/i386 ext2 filesystem
    /dev/hda2:  x86 boot sector
    /dev/hda3:  x86 boot sector, extended partition table
    /dev/hda4:  Linux/i386 ext2 filesystem
    /dev/hda5:  Linux/i386 swap file
    /dev/hda6:  Linux/i386 swap file
    /dev/hda7:  Linux/i386 swap file
    /dev/hda8:  Linux/i386 swap file
    /dev/hda9:  empty
    /dev/hda10: empty

    $ file -i file.c file /dev/{wd0a,hda}
    file.c:      text/x-c
    file:        application/x-executable
    /dev/hda:    application/x-not-regular-file
    /dev/wd0a:   application/x-not-regular-file

HISTORY
There has been a file command in every UNIX since at least Research
Version 4 (man page dated November, 1973). The System V version
introduced one significant change: the external list of magic types.
This slowed the program down slightly but made it a lot more flexible.

This program, based on the System V version, was written by Ian Darwin
<ian@darwinsys.com> without looking at anybody else's source code.

John Gilmore revised the code extensively, making it better than the
first version. Geoff Collyer found several inadequacies and provided
some magic file entries. Contribution of the ‘&’ operator by Rob
McMahon, cudcv@warwick.ac.uk, 1989.

Guy Harris, guy@netapp.com, made many changes from 1993 to the present.

Primary development and maintenance from 1990 to the present by Christos
Zoulas (christos@astron.com).

Altered by Chris Lowth, chris@lowth.com, 2000: Handle the -i option to
output mime type strings, using an alternative magic file and internal
logic.

Altered by Eric Fischer (enf@pobox.com), July, 2000, to identify
character codes and attempt to identify the languages of non-ASCII files.

Altered by Reuben Thomas (rrt@sc3d.org), 2007 to 2008, to improve MIME
support and merge MIME and non-MIME magic, support directories as well as
files of magic, apply many bug fixes and improve the build system.

The list of contributors to the ‘magic’ directory (magic files) is too
long to include here. You know who you are; thank you. Many
contributors are listed in the source files.

LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999. Covered by the
standard Berkeley Software Distribution copyright; see the file
LEGAL.NOTICE in the source distribution.

The files tar.h and is_tar.c were written by John Gilmore from his
public-domain tar(1) program, and are not covered by the above license.

BUGS
There must be a better way to automate the construction of the Magic file
from all the glop in Magdir. What is it?

file uses several algorithms that favor speed over accuracy; thus it can
be misled about the contents of text files.

The support for text files (primarily for programming languages) is
simplistic, inefficient and requires recompilation to update.

The list of keywords in ascmagic probably belongs in the Magic file.
This could be done by using some keyword like ‘*’ for the offset value.

Complain about conflicts in the magic file entries. Make a rule that the
magic entries sort based on file offset rather than position within the
magic file?

The program should provide a way to give an estimate of ‘how good’ a
guess is. We end up removing guesses (e.g. ‘From ’ as first 5 chars of
file) because they are not as good as other guesses (e.g. ‘Newsgroups:’
versus ‘Return-Path:’). Still, if the others don't pan out, it should
be possible to use the first guess.

This manual page, and particularly this section, is too long.

RETURN CODE
file returns 0 on success, and non-zero on error.

If the file named by the file operand does not exist, cannot be read, or
the type of the file named by the file operand cannot be determined, this
is not considered an error that affects the exit status.

AVAILABILITY
You can obtain the original author's latest version by anonymous FTP from
ftp.astron.com, as the file /pub/file/file-X.YZ.tar.gz

BSD                             October 9, 2008                            BSD