FILE(1)                   BSD General Commands Manual                  FILE(1)

NAME

     file — determine file type

SYNOPSIS

     file [-bchikLNnprsvz0] [--apple] [--mime-encoding] [--mime-type]
          [-e testname] [-F separator] [-f namefile] [-m magicfiles] file ...
     file -C [-m magicfiles]
     file [--help]

DESCRIPTION

     This manual page documents version 5.04 of the file command.

     file tests each argument in an attempt to classify it.  There are three
     sets of tests, performed in this order: filesystem tests, magic tests,
     and language tests.  The first test that succeeds causes the file type to
     be printed.

     The type printed will usually contain one of the words text (the file
     contains only printing characters and a few common control characters and
     is probably safe to read on an ASCII terminal), executable (the file
     contains the result of compiling a program in a form understandable to
     some UNIX kernel or another), or data, meaning anything else (data is
     usually ‘binary’ or non-printable).  Exceptions are well-known file
     formats (core files, tar archives) that are known to contain binary data.
     When modifying magic files or the program itself, make sure to preserve
     these keywords.  Users depend on knowing that all the readable files in a
     directory have the word ‘text’ printed.  Don't do as Berkeley did and
     change ‘shell commands text’ to ‘shell script’.

     The filesystem tests are based on examining the return from a stat(2)
     system call.  The program checks to see if the file is empty, or if it's
     some sort of special file.  Any known file types appropriate to the
     system you are running on (sockets, symbolic links, or named pipes
     (FIFOs) on those systems that implement them) are intuited if they are
     defined in the system header file <sys/stat.h>.

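     As a rough illustration of the filesystem tests, the following Python
     sketch classifies a path from lstat(2) data alone.  It is not the code
     file uses; the function name filesystem_test and the strings it returns
     are assumptions made for this example.

```python
import os
import stat
import tempfile


def filesystem_test(path):
    """Classify path from lstat(2) data alone.

    Returns a short description, or None for a non-empty regular file,
    which the later magic and language tests would have to examine.
    The function name and returned strings are illustrative assumptions.
    """
    st = os.lstat(path)  # lstat: inspect a symlink itself, not its target
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    return None  # contents must be read to say anything more


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        empty = os.path.join(tmp, "empty")
        open(empty, "w").close()
        print(tmp, "->", filesystem_test(tmp))
        print(empty, "->", filesystem_test(empty))
```

     A result of None here corresponds to the case where the filesystem
     tests do not succeed and the next set of tests is tried.
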
     The magic tests are used to check for files with data in particular fixed
     formats.  The canonical example of this is a binary executable (compiled
     program) a.out file, whose format is defined in <elf.h>, <a.out.h> and
     possibly <exec.h> in the standard include directory.  These files have a
     ‘magic number’ stored in a particular place near the beginning of the
     file that tells the UNIX operating system that the file is a binary
     executable, and which of several types thereof.  The concept of a ‘magic
     number’ has been applied by extension to data files.  Any file with some
     invariant identifier at a small fixed offset into the file can usually
     be described in this way.  The information identifying these files is
     read from the compiled magic file /usr/share/misc/magic.mgc, or the files
     in the directory /usr/share/misc/magic if the compiled file does not
     exist.  In addition, if $HOME/.magic.mgc or $HOME/.magic exists, it will
     be used in preference to the system magic files.  If /etc/magic exists,
     it will be used together with other magic files.

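     The idea of an invariant identifier at a fixed offset can be sketched in
     a few lines of Python.  The table below is a tiny hand-written stand-in
     for a magic file, and the names MAGIC_TABLE and magic_test are
     assumptions for this example; real magic files use the much richer
     syntax described in magic(5).

```python
# Each entry: (offset, bytes expected at that offset, description).
# An illustrative table only; it is not read from any real magic file.
MAGIC_TABLE = [
    (0, b"\x7fELF", "ELF"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (0, b"%PDF-", "PDF document"),
    (257, b"ustar", "tar archive"),
]


def magic_test(data, table=MAGIC_TABLE):
    """Return the description of the first entry that matches data.

    Entries are tried in table order, so the order is significant:
    the first match wins.  Returns None when nothing matches.
    """
    for offset, expected, description in table:
        if data[offset:offset + len(expected)] == expected:
            return description
    return None


if __name__ == "__main__":
    print(magic_test(b"\x7fELF\x02\x01\x01"))
    print(magic_test(b"plain words"))
```

     Slicing past the end of a short input simply yields fewer bytes, so a
     file shorter than an entry's offset fails that entry without error.
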
     If a file does not match any of the entries in the magic file, it is
     examined to see if it seems to be a text file.  ASCII, ISO-8859-x,
     non-ISO 8-bit extended-ASCII character sets (such as those used on
     Macintosh and IBM PC systems), UTF-8-encoded Unicode, UTF-16-encoded
     Unicode, and EBCDIC character sets can be distinguished by the different
     ranges and sequences of bytes that constitute printable text in each set.
     If a file passes any of these tests, its character set is reported.
     ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified as
     ‘text’ because they will be mostly readable on nearly any terminal;
     UTF-16 and EBCDIC are only ‘character data’ because, while they contain
     text, it is text that will require translation before it can be read.
     In addition, file will attempt to determine other characteristics of
     text-type files.  If the lines of a file are terminated by CR, CRLF, or
     NEL, instead of the Unix-standard LF, this will be reported.  Files that
     contain embedded escape sequences or overstriking will also be
     identified.

     Once file has determined the character set used in a text-type file, it
     will attempt to determine in what language the file is written.  The
     language tests look for particular strings (cf. <names.h>) that can
     appear anywhere in the first few blocks of a file.  For example, the
     keyword .br indicates that the file is most likely a troff(1) input file,
     just as the keyword struct indicates a C program.  These tests are less
     reliable than the previous two groups, so they are performed last.  The
     language test routines also test for some miscellany (such as tar(1)
     archives).

     Any file that cannot be identified as having been written in any of the
     character sets listed above is simply said to be ‘data’.

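     The character set and line terminator checks can be approximated as
     follows.  This Python sketch is a simplification under stated
     assumptions: the set of bytes counted as text, the function names, and
     the exact output strings are chosen for this example, and EBCDIC, NEL,
     escape sequences and overstriking are not handled.

```python
# Bytes treated as ordinary text: printable ASCII plus a few common
# control characters (BEL, BS, HT, LF, FF, CR, ESC).  Illustrative choice.
TEXT_BYTES = frozenset(range(0x20, 0x7F)) | frozenset(b"\a\b\t\n\f\r\x1b")


def charset_of(data):
    """Guess a character set from the byte ranges present in data."""
    if all(b in TEXT_BYTES for b in data):
        return "ASCII text"
    if all(b in TEXT_BYTES or b >= 0x80 for b in data):
        try:
            data.decode("utf-8")
            return "UTF-8 Unicode text"
        except UnicodeDecodeError:
            pass  # high bytes present, but not valid UTF-8 sequences
    if data[:2] in (b"\xff\xfe", b"\xfe\xff"):  # UTF-16 byte order mark
        return "UTF-16 Unicode character data"
    if all(b in TEXT_BYTES or b >= 0xA0 for b in data):
        return "ISO-8859 text"
    return "data"


def line_terminators(data):
    """Return "CRLF" or "CR" if that is the only terminator used."""
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf  # bare CR
    lf = data.count(b"\n") - crlf  # bare LF
    if crlf and not cr and not lf:
        return "CRLF"
    if cr and not crlf and not lf:
        return "CR"
    return None  # LF only, no terminators, or a mixture (not handled)


def describe(data):
    """Combine the character set guess with the terminator report."""
    if not data:
        return "empty"
    label = charset_of(data)
    if label.endswith("text"):
        terminator = line_terminators(data)
        if terminator:
            label += ", with %s line terminators" % terminator
    return label


if __name__ == "__main__":
    for sample in (b"hello\n", b"a\r\nb\r\n", b"\x00\x01\x02"):
        print(repr(sample), "->", describe(sample))
```

     Anything that fails every check falls through to ‘data’, which mirrors
     the final rule stated above.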

OPTIONS

     -b, --brief
             Do not prepend filenames to output lines (brief mode).

     -C, --compile
             Write a magic.mgc output file that contains a pre-parsed version
             of the magic file or directory.

     -c, --checking-printout
             Cause a checking printout of the parsed form of the magic file.
             This is usually used in conjunction with the -m flag to debug a
             new magic file before installing it.

     -e, --exclude testname
             Exclude the test named in testname from the list of tests made to
             determine the file type.  Valid test names are:

             apptype   EMX application type (only on EMX).

             text      Various types of text files (this test will try to
                       guess the text encoding, irrespective of the setting of
                       the ‘encoding’ option).

             encoding  Different text encodings for soft magic tests.

             tokens    Looks for known tokens inside text files.

             cdf       Prints details of Compound Document Files.

             compress  Checks for, and looks inside, compressed files.

             elf       Prints ELF file details.

             soft      Consults magic files.

             tar       Examines tar files.

     -F, --separator separator
             Use the specified string as the separator between the filename
             and the file result returned.  Defaults to ‘:’.

     -f, --files-from namefile
             Read the names of the files to be examined from namefile (one per
             line) before the argument list.  Either namefile or at least one
             filename argument must be present; to test the standard input,
             use ‘-’ as a filename argument.

     -h, --no-dereference
             Do not follow symbolic links (on systems that support symbolic
             links).  This is the default if the environment variable
             POSIXLY_CORRECT is not defined.

     -i, --mime
             Causes the file command to output MIME type strings rather than
             the more traditional human-readable ones.  Thus it may say
             ‘text/plain; charset=us-ascii’ rather than ‘ASCII text’.  In
             order for this option to work, file changes the way it handles
             files recognized by the command itself (such as many of the text
             file types, directories, etc.), and makes use of an alternative
             ‘magic’ file.  (See the FILES section, below.)

     --mime-type, --mime-encoding
             Like -i, but print only the specified element(s).

     -k, --keep-going
             Don't stop at the first match, keep going.  Subsequent matches
             will have the string ‘\012- ’ prepended.  (If you want a
             newline, see the ‘-r’ option.)

     -L, --dereference
             Follow symbolic links, as the like-named option in ls(1) does
             (on systems that support symbolic links).  This is the default
             if the environment variable POSIXLY_CORRECT is defined.

     -m, --magic-file magicfiles
             Specify an alternate list of files and directories containing
             magic.  This can be a single item, or a colon-separated list.  If
             a compiled magic file is found alongside a file or directory, it
             will be used instead.

     -N, --no-pad
             Don't pad filenames so that they align in the output.

     -n, --no-buffer
             Force stdout to be flushed after checking each file.  This is
             only useful if checking a list of files.  It is intended to be
             used by programs that want filetype output from a pipe.

     -p, --preserve-date
             On systems that support utime(2) or utimes(2), attempt to
             preserve the access time of files analyzed, to pretend that file
             never read them.

     -r, --raw
             Don't translate unprintable characters to \ooo.  Normally file
             translates unprintable characters to their octal representation.

     -s, --special-files
             Normally, file only attempts to read and determine the type of
             argument files which stat(2) reports are ordinary files.  This
             prevents problems, because reading special files may have
             peculiar consequences.  Specifying the -s option causes file to
             also read argument files which are block or character special
             files.  This is useful for determining the filesystem types of
             the data in raw disk partitions, which are block special files.
             This option also causes file to disregard the file size as
             reported by stat(2) since on some systems it reports a zero size
             for raw disk partitions.

     -v, --version
             Print the version of the program and exit.

     -z, --uncompress
             Try to look inside compressed files.

     -0, --print0
             Output a null character ‘\0’ after the end of the filename.
             This makes the output easy to process with cut(1).  This does
             not affect the separator, which is still printed.

     --help  Print a help message and exit.

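     The \ooo translation that -r switches off can be pictured with a short
     Python sketch.  The function name escape_unprintable and the choice of
     which bytes count as printable (ASCII 0x20 through 0x7e) are
     assumptions for this example, not a description of file's own code.

```python
def escape_unprintable(data):
    """Render bytes as text, writing unprintable bytes as \\ooo (octal)."""
    pieces = []
    for byte in data:
        if 0x20 <= byte <= 0x7E:  # printable ASCII: copy unchanged
            pieces.append(chr(byte))
        else:  # anything else: backslash plus three octal digits
            pieces.append("\\%03o" % byte)
    return "".join(pieces)


if __name__ == "__main__":
    # A newline (octal 012) inside a description is shown escaped.
    print(escape_unprintable(b"first match\n- second match"))
```

     Under this scheme a newline becomes the four characters \012, which is
     the form in which the -k prefix appears unless -r is given.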

FILES

     /usr/share/misc/magic.mgc  Default compiled list of magic.
     /usr/share/misc/magic      Directory containing default magic files.

ENVIRONMENT

     The environment variable MAGIC can be used to set the default magic file
     name.  If that variable is set, then file will not attempt to open
     $HOME/.magic.  file adds ‘.mgc’ to the value of this variable as
     appropriate.  The environment variable POSIXLY_CORRECT controls (on
     systems that support symbolic links) whether file will attempt to follow
     symlinks or not.  If it is set, file follows symlinks; otherwise it does
     not.  This is also controlled by the -L and -h options.

SEE ALSO

     magic(5), strings(1), od(1), hexdump(1), file(1posix)

STANDARDS CONFORMANCE

     This program is believed to exceed the System V Interface Definition of
     FILE(CMD), as near as one can determine from the vague language contained
     therein.  Its behavior is mostly compatible with the System V program of
     the same name.  This version knows more magic, however, so it will
     produce different (albeit more accurate) output in many cases.

     The one significant difference between this version and System V is that
     this version treats any white space as a delimiter, so that spaces in
     pattern strings must be escaped.  For example,

           >10     string  language impress        (imPRESS data)

     in an existing magic file would have to be changed to

           >10     string  language\ impress       (imPRESS data)

     In addition, in this version, if a pattern string contains a backslash,
     it must be escaped.  For example

           0       string          \begindata      Andrew Toolkit document

     in an existing magic file would have to be changed to

           0       string          \\begindata     Andrew Toolkit document

     SunOS releases 3.2 and later from Sun Microsystems include a file command
     derived from the System V one, but with some extensions.  My version
     differs from Sun's only in minor ways.  It includes the extension of the
     ‘&’ operator, used as, for example,

           >16     long&0x7fffffff >0              not stripped

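     Read as a test, the entry above takes the long at offset 16, ANDs it
     with the mask 0x7fffffff, and succeeds if the result is greater than
     zero (the leading ‘>’ marks a continuation level, not part of the
     offset).  A minimal Python sketch of that evaluation follows; the
     function name masked_long_gt is hypothetical, and a 4-byte
     little-endian long is assumed here only to make the example
     deterministic.

```python
import struct


def masked_long_gt(data, offset, mask, threshold, byteorder="<"):
    """Evaluate a test of the form 'long&mask >threshold' at offset.

    byteorder is a struct prefix ("<" little-endian, ">" big-endian);
    fixing it is an assumption of this sketch.
    """
    if len(data) < offset + 4:
        return False  # not enough bytes to read the value
    (value,) = struct.unpack_from(byteorder + "l", data, offset)
    return (value & mask) > threshold


if __name__ == "__main__":
    sample = bytes(16) + struct.pack("<l", 5)
    if masked_long_gt(sample, 16, 0x7FFFFFFF, 0):
        print("not stripped")
```

     With this mask the top bit of the value is ignored, so a long with only
     that bit set does not satisfy the test.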

MAGIC DIRECTORY

     The magic file entries have been collected from various sources, mainly
     USENET, and contributed by various authors.  Christos Zoulas (address
     below) will collect additional or corrected magic file entries.  A
     consolidation of magic file entries will be distributed periodically.

     The order of entries in the magic file is significant.  Depending on what
     system you are using, the order that they are put together may be
     incorrect.  If your old file command uses a magic file, keep the old
     magic file around for comparison purposes (rename it to
     /usr/share/misc/magic.orig).

EXAMPLES

           $ file file.c file /dev/{wd0a,hda}
           file.c:   C program text
           file:     ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
                     dynamically linked (uses shared libs), stripped
           /dev/wd0a: block special (0/0)
           /dev/hda: block special (3/0)

           $ file -s /dev/wd0{b,d}
           /dev/wd0b: data
           /dev/wd0d: x86 boot sector

           $ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
           /dev/hda:   x86 boot sector
           /dev/hda1:  Linux/i386 ext2 filesystem
           /dev/hda2:  x86 boot sector
           /dev/hda3:  x86 boot sector, extended partition table
           /dev/hda4:  Linux/i386 ext2 filesystem
           /dev/hda5:  Linux/i386 swap file
           /dev/hda6:  Linux/i386 swap file
           /dev/hda7:  Linux/i386 swap file
           /dev/hda8:  Linux/i386 swap file
           /dev/hda9:  empty
           /dev/hda10: empty

           $ file -i file.c file /dev/{wd0a,hda}
           file.c:      text/x-c
           file:        application/x-executable
           /dev/hda:    application/x-not-regular-file
           /dev/wd0a:   application/x-not-regular-file

HISTORY

     There has been a file command in every UNIX since at least Research
     Version 4 (man page dated November, 1973).  The System V version
     introduced one significant change: the external list of magic types.
     This slowed the program down slightly but made it a lot more flexible.

     This program, based on the System V version, was written by Ian Darwin
     <ian@darwinsys.com> without looking at anybody else's source code.

     John Gilmore revised the code extensively, making it better than the
     first version.  Geoff Collyer found several inadequacies and provided
     some magic file entries.  Contribution of the ‘&’ operator by Rob
     McMahon, cudcv@warwick.ac.uk, 1989.

     Guy Harris, guy@netapp.com, made many changes from 1993 to the present.

     Primary development and maintenance from 1990 to the present by Christos
     Zoulas (christos@astron.com).

     Altered by Chris Lowth, chris@lowth.com, 2000: Handle the -i option to
     output mime type strings, using an alternative magic file and internal
     logic.

     Altered by Eric Fischer (enf@pobox.com), July, 2000, to identify
     character codes and attempt to identify the languages of non-ASCII files.

     Altered by Reuben Thomas (rrt@sc3d.org), 2007 to 2008, to improve MIME
     support and merge MIME and non-MIME magic, support directories as well as
     files of magic, apply many bug fixes and improve the build system.

     The list of contributors to the ‘magic’ directory (magic files) is too
     long to include here.  You know who you are; thank you.  Many
     contributors are listed in the source files.


LEGAL NOTICE

     Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.  Covered by the
     standard Berkeley Software Distribution copyright; see the file
     LEGAL.NOTICE in the source distribution.

     The files tar.h and is_tar.c were written by John Gilmore from his
     public-domain tar(1) program, and are not covered by the above license.

BUGS

     There must be a better way to automate the construction of the Magic file
     from all the glop in Magdir.  What is it?

     file uses several algorithms that favor speed over accuracy; thus it can
     be misled about the contents of text files.

     The support for text files (primarily for programming languages) is
     simplistic, inefficient and requires recompilation to update.

     The list of keywords in ascmagic probably belongs in the Magic file.
     This could be done by using some keyword like ‘*’ for the offset value.

     Complain about conflicts in the magic file entries.  Make a rule that the
     magic entries sort based on file offset rather than position within the
     magic file?

     The program should provide a way to give an estimate of ‘how good’ a
     guess is.  We end up removing guesses (e.g. ‘From ’ as first 5 chars of
     file) because they are not as good as other guesses (e.g. ‘Newsgroups:’
     versus ‘Return-Path:’).  Still, if the others don't pan out, it should
     be possible to use the first guess.

     This manual page, and particularly this section, is too long.

RETURN CODE

     file returns 0 on success, and non-zero on error.

     If the file named by the file operand does not exist, cannot be read, or
     the type of the file named by the file operand cannot be determined, this
     is not considered an error that affects the exit status.

AVAILABILITY

     You can obtain the original author's latest version by anonymous FTP on
     ftp.astron.com in the directory /pub/file/file-X.YZ.tar.gz

BSD                             October 9, 2008                            BSD