MAGIC(5)                  BSD File Formats Manual                  MAGIC(5)

NAME
magic - file command's magic pattern file
5
DESCRIPTION
This manual page documents the format of magic files as used by the
file(1) command, version 5.44. The file(1) command identifies the type
of a file using, among other tests, a test for whether the file contains
certain “magic patterns”. The database of these “magic patterns” is
usually located in a binary file in /usr/share/misc/magic.mgc or a
directory of source text magic pattern fragment files in
/usr/share/misc/magic. The database specifies what patterns are to be
tested for, what message or MIME type to print if a particular pattern is
found, and additional information to extract from the file.

The format of the source fragment files that are used to build this
database is as follows: Each line of a fragment file specifies a test to be
performed. A test compares the data starting at a particular offset in
the file with a byte value, a string or a numeric value. If the test
succeeds, a message is printed. The line consists of the following
fields:
23
offset A number specifying the offset (in bytes) into the file of the
data which is to be tested. This offset can be a negative number
if it is:
• The first direct offset of the magic entry (at continuation
level 0), in which case it is interpreted as an offset from the
end of the file going backwards. This works only when a file
descriptor to the file is available and it is a regular file.
• A continuation offset relative to the end of the last up-level
field (&).
34
35 type The type of the data to be tested. The possible values are:
36
37 byte A one-byte value.
38
39 short A two-byte value in this machine's native byte or‐
40 der.
41
42 long A four-byte value in this machine's native byte or‐
43 der.
44
45 quad An eight-byte value in this machine's native byte
46 order.
47
48 float A 32-bit single precision IEEE floating point number
49 in this machine's native byte order.
50
51 double A 64-bit double precision IEEE floating point number
52 in this machine's native byte order.
53
string A string of bytes. The string type specification
can be optionally followed by a /<width> option and
optionally followed by a set of flags /[bCcfTtWw]*.
The width limits the number of characters to be
copied. Zero means all characters. The following
flags are supported:
b Force binary file test.
C Use upper case insensitive matching: upper
case characters in the magic match both lower
and upper case characters in the target,
whereas lower case characters in the magic
only match lower case characters in the target.
c Use lower case insensitive matching: lower
case characters in the magic match both lower
and upper case characters in the target,
whereas upper case characters in the magic
only match upper case characters in the target.
To do a complete case insensitive match,
specify both “c” and “C”.
f Require that the matched string is a full
word, not a partial word match.
T Trim the string, i.e. leading and trailing
whitespace is deleted before the string is
printed.
t Force text file test.
W Compact whitespace in the target, which must
contain at least one whitespace character.
If the magic has n consecutive blanks, the
target needs at least n consecutive blanks to
match.
w Treat every blank in the magic as an optional
blank.
87
88 pstring A Pascal-style string where the first byte/short/int
89 is interpreted as the unsigned length. The length
90 defaults to byte and can be specified as a modifier.
91 The following modifiers are supported:
92 B A byte length (default).
93 H A 2 byte big endian length.
94 h A 2 byte little endian length.
95 L A 4 byte big endian length.
96 l A 4 byte little endian length.
97 J The length includes itself in its count.
98 The string is not NUL terminated. “J” is used
99 rather than the more valuable “I” because this type
100 of length is a feature of the JPEG format.
101
102 date A four-byte value interpreted as a UNIX date.
103
104 qdate An eight-byte value interpreted as a UNIX date.
105
106 ldate A four-byte value interpreted as a UNIX-style date,
107 but interpreted as local time rather than UTC.
108
109 qldate An eight-byte value interpreted as a UNIX-style
110 date, but interpreted as local time rather than UTC.
111
112 qwdate An eight-byte value interpreted as a Windows-style
113 date.
114
115 beid3 A 32-bit ID3 length in big-endian byte order.
116
117 beshort A two-byte value in big-endian byte order.
118
119 belong A four-byte value in big-endian byte order.
120
121 bequad An eight-byte value in big-endian byte order.
122
123 befloat A 32-bit single precision IEEE floating point number
124 in big-endian byte order.
125
126 bedouble A 64-bit double precision IEEE floating point number
127 in big-endian byte order.
128
129 bedate A four-byte value in big-endian byte order, inter‐
130 preted as a Unix date.
131
132 beqdate An eight-byte value in big-endian byte order, inter‐
133 preted as a Unix date.
134
135 beldate A four-byte value in big-endian byte order, inter‐
136 preted as a UNIX-style date, but interpreted as lo‐
137 cal time rather than UTC.
138
139 beqldate An eight-byte value in big-endian byte order, inter‐
140 preted as a UNIX-style date, but interpreted as lo‐
141 cal time rather than UTC.
142
143 beqwdate An eight-byte value in big-endian byte order, inter‐
144 preted as a Windows-style date.
145
146 bestring16 A two-byte unicode (UCS16) string in big-endian byte
147 order.
148
149 leid3 A 32-bit ID3 length in little-endian byte order.
150
151 leshort A two-byte value in little-endian byte order.
152
153 lelong A four-byte value in little-endian byte order.
154
155 lequad An eight-byte value in little-endian byte order.
156
157 lefloat A 32-bit single precision IEEE floating point number
158 in little-endian byte order.
159
160 ledouble A 64-bit double precision IEEE floating point number
161 in little-endian byte order.
162
163 ledate A four-byte value in little-endian byte order, in‐
164 terpreted as a UNIX date.
165
166 leqdate An eight-byte value in little-endian byte order, in‐
167 terpreted as a UNIX date.
168
169 leldate A four-byte value in little-endian byte order, in‐
170 terpreted as a UNIX-style date, but interpreted as
171 local time rather than UTC.
172
173 leqldate An eight-byte value in little-endian byte order, in‐
174 terpreted as a UNIX-style date, but interpreted as
175 local time rather than UTC.
176
177 leqwdate An eight-byte value in little-endian byte order, in‐
178 terpreted as a Windows-style date.
179
180 lestring16 A two-byte unicode (UCS16) string in little-endian
181 byte order.
182
183 melong A four-byte value in middle-endian (PDP-11) byte or‐
184 der.
185
186 medate A four-byte value in middle-endian (PDP-11) byte or‐
187 der, interpreted as a UNIX date.
188
189 meldate A four-byte value in middle-endian (PDP-11) byte or‐
190 der, interpreted as a UNIX-style date, but inter‐
191 preted as local time rather than UTC.
192
193 indirect Starting at the given offset, consult the magic
194 database again. The offset of the indirect magic is
195 by default absolute in the file, but one can specify
196 /r to indicate that the offset is relative from the
197 beginning of the entry.
198
199 name Define a “named” magic instance that can be called
200 from another use magic entry, like a subroutine
201 call. Named instance direct magic offsets are rela‐
202 tive to the offset of the previous matched entry,
203 but indirect offsets are relative to the beginning
204 of the file as usual. Named magic entries always
205 match.
206
use Recursively call the named magic starting from the
current offset. If the name of the referenced magic
begins with a ^ then the endianness of the magic is
switched; if the magic mentioned leshort, for example,
it is treated as beshort and vice versa. This is
useful to avoid duplicating the rules for different
endianness.
214
regex A regular expression match in extended POSIX regular
expression syntax (like egrep). Regular expressions
can take exponential time to process, and their
performance is hard to predict, so their use is
discouraged. When used in production environments,
their performance should be carefully checked. The
size of the string to search should also be limited
by specifying /<length>, to avoid performance issues
scanning long files. The type specification can
also be optionally followed by /[c][s][l]. The “c”
flag makes the match case insensitive, while the “s”
flag updates the offset to the start offset of the
match, rather than the end. The “l” modifier
changes the limit of length to mean number of lines
instead of a byte count. Lines are delimited by the
platform's native line delimiter. When a line count
is specified, an implicit byte count is also computed,
assuming each line is 80 characters long. If neither
a byte count nor a line count is specified, the search
is limited automatically to 8KiB. ^ and $ match the
beginning and end of individual lines, respectively,
not beginning and end of file.
237
238 search A literal string search starting at the given off‐
239 set. The same modifier flags can be used as for
240 string patterns. The search expression must contain
241 the range in the form /number, that is the number of
242 positions at which the match will be attempted,
243 starting from the start offset. This is suitable
244 for searching larger binary expressions with vari‐
245 able offsets, using \ escapes for special charac‐
246 ters. The order of modifier and number is not rele‐
247 vant.
248
default This is intended to be used with the test x (which
is always true) and it has no type. It matches when
no other test at that continuation level has matched
before. Clearing the matched tests for a continuation
level can be done using the clear test.
254
255 clear This test is always true and clears the match flag
256 for that continuation level. It is intended to be
257 used with the default test.
258
259 der Parse the file as a DER Certificate file. The test
260 field is used as a der type that needs to be
261 matched. The DER types are: eoc, bool, int,
262 bit_str, octet_str, null, obj_id, obj_desc, ext,
263 real, enum, embed, utf8_str, rel_oid, time, res2,
264 seq, set, num_str, prt_str, t61_str, vid_str,
265 ia5_str, utc_time, gen_time, gr_str, vis_str,
266 gen_str, univ_str, char_str, bmp_str, date, tod,
267 datetime, duration, oid-iri, rel-oid-iri. These
268 types can be followed by an optional numeric size,
269 which indicates the field width in bytes.
270
guid A Globally Unique Identifier, parsed and printed as
XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX. Its format
is a string.
274
offset This is a quad value indicating the current offset
of the file. It can be used to determine the size
of the file or the magic buffer. For example, the
magic entries:

     -0    offset    x        this file is %lld bytes
     -0    offset    <=100    must be more than 100 \
         bytes and is only %lld
283
284 octal A string representing an octal number.
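
The byte-order variants in the type list above can be illustrated with a
short Python sketch. It is not how file(1) is implemented: the function
name read_value and the sample bytes are made up for illustration, and the
values are decoded as unsigned for simplicity.

```python
import struct

# struct format codes for a few of the fixed-width magic types
FORMATS = {
    "beshort": ">H", "leshort": "<H",
    "belong": ">I", "lelong": "<I",
    "bequad": ">Q", "lequad": "<Q",
}

def read_value(data, offset, mtype):
    """Decode one unsigned integer of the given magic type at offset."""
    if mtype == "melong":
        # PDP-11 middle-endian: two little-endian 16-bit words,
        # with the most significant word stored first.
        high, low = struct.unpack_from("<HH", data, offset)
        return (high << 16) | low
    return struct.unpack_from(FORMATS[mtype], data, offset)[0]

sample = bytes([0x12, 0x34, 0x56, 0x78])
print(hex(read_value(sample, 0, "belong")))  # 0x12345678
print(hex(read_value(sample, 0, "lelong")))  # 0x78563412
print(hex(read_value(sample, 0, "melong")))  # 0x34127856
```

The same four bytes give three different values, which is why a magic
entry has to name the byte order it expects.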
285
For compatibility with the Single UNIX Specification, the type specifiers dC
and d1 are equivalent to byte, the type specifiers uC and u1 are equivalent
to ubyte, the type specifiers dS and d2 are equivalent to short, the
type specifiers uS and u2 are equivalent to ushort, the type specifiers
dI, dL, and d4 are equivalent to long, the type specifiers uI, uL, and u4
are equivalent to ulong, the type specifier d8 is equivalent to quad, the
type specifier u8 is equivalent to uquad, and the type specifier s is
equivalent to string. In addition, the type specifier dQ is equivalent
to quad and the type specifier uQ is equivalent to uquad.
295
Each top-level magic pattern (see below for an explanation of levels) is
classified as text or binary according to the types used. Types “regex”
and “search” are classified as text tests, unless non-printable characters
are used in the pattern. All other tests are classified as binary.
A top-level pattern is considered to be a text pattern when all its patterns
are text patterns; otherwise, it is considered to be a binary pattern.
When matching a file, binary patterns are tried first; if no match is
found, and the file looks like text, then its encoding is determined and
the text patterns are tried.
305
The numeric types may optionally be followed by & and a numeric value, to
specify that the value is to be AND'ed with the numeric value before any
comparisons are done. Prepending a u to the type indicates that ordered
comparisons should be unsigned.

test The value to be compared with the value from the file. If the type is
numeric, this value is specified in C form; if it is a string, it is
specified as a C string with the usual escapes permitted (e.g. \n for
new-line).
314
Numeric values may be preceded by a character indicating the operation to
be performed. It may be =, to specify that the value from the file must
equal the specified value, <, to specify that the value from the file
must be less than the specified value, >, to specify that the value from
the file must be greater than the specified value, &, to specify that the
value from the file must have set all of the bits that are set in the
specified value, ^, to specify that the value from the file must have
clear at least one of the bits that are set in the specified value, ~, to
specify that the value given after it is negated before being tested, or x,
to specify that any value will match. If the character is omitted, it is
assumed to be =. Operators &, ^, and ~ don't work with floats and doubles.
The operator ! specifies that the line matches if the test does not succeed.
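
The comparison operators described above can be sketched in Python as
follows. This is only an illustration of the described semantics for
integer values, not the code used by file(1). The function name
numeric_test and its parameters are hypothetical, and the ~ operator is
left out because its result depends on the width of the type.

```python
def numeric_test(file_value, op, magic_value=0, mask=None):
    """Evaluate one numeric magic test on an integer read from a file."""
    if mask is not None:
        file_value &= mask  # the optional "&mask" that follows the type
    if op == "x":
        return True  # any value matches
    if op == "=":
        return file_value == magic_value
    if op == "!":
        return file_value != magic_value
    if op == "<":
        return file_value < magic_value
    if op == ">":
        return file_value > magic_value
    if op == "&":
        # every bit set in magic_value must be set in the file value
        return (file_value & magic_value) == magic_value
    if op == "^":
        # at least one bit set in magic_value must be clear in the file value
        return (file_value & magic_value) != magic_value
    raise ValueError("unsupported operator: " + op)

print(numeric_test(0x3F, "<", 0x40))             # True
print(numeric_test(0x0184, "&", 0x0100))         # True
print(numeric_test(0x0184, "^", 0x0003))         # True
print(numeric_test(0x1234, "=", 0x34, mask=0xFF))  # True
```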
327
Numeric values are specified in C form; e.g. 13 is decimal, 013 is octal,
and 0x13 is hexadecimal.

Numeric operations are not performed on date types; instead, the numeric
value is interpreted as an offset.

For string values, the string from the file must match the specified
string. The operators =, < and > (but not &) can be applied to strings.
The length used for matching is that of the string argument in the magic
file. This means that a line can match any non-empty string (usually
used to then print the string), with >\0 (because all non-empty strings
are greater than the empty string).

Dates are treated as numerical values in the respective internal
representation.
343
The special test x always evaluates to true.

message The message to be printed if the comparison succeeds. If the string
contains a printf(3) format specification, the value from the file (with any
specified masking performed) is printed using the message as the format
string. If the string begins with “\b”, the message printed is the
remainder of the string with no whitespace added before it: multiple
matches are normally separated by a single space.
351
An Apple 4+4 character creator and type can be specified as:

     !:apple CREATYPE
355
A MIME type is given on a separate line, which must be the next line that
is neither blank nor a comment after the magic line that identifies the
file type, and has the following format:

     !:mime MIMETYPE

i.e. the literal string “!:mime” followed by the MIME type.
363
An optional strength can be supplied on a separate line which refers to the
current magic description using the following format:

     !:strength OP VALUE

The operator OP can be: +, -, *, or / and VALUE is a constant between 0 and
255. This constant is applied using the specified operator to the currently
computed default magic strength.
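
As a rough illustration of the arithmetic, the following Python sketch
applies an OP and VALUE pair to a default strength. The function name
adjust_strength and the starting value 70 are hypothetical, and integer
division for / is an assumption rather than documented behaviour.

```python
def adjust_strength(default, op, value):
    """Apply a "!:strength OP VALUE" adjustment to a default strength."""
    if not 0 <= value <= 255:
        raise ValueError("VALUE must be between 0 and 255")
    if op == "+":
        return default + value
    if op == "-":
        return default - value
    if op == "*":
        return default * value
    if op == "/":
        if value == 0:
            raise ValueError("cannot divide by zero")
        return default // value  # assumption: integer division
    raise ValueError("OP must be one of + - * /")

print(adjust_strength(70, "+", 30))  # 100
print(adjust_strength(70, "/", 2))   # 35
```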
372
Some file formats contain additional information which is to be printed
along with the file type or need additional tests to determine the true
file type. These additional tests are introduced by one or more >
characters preceding the offset. The number of > on the line indicates the
level of the test; a line with no > at the beginning is considered to be at
level 0. Tests are arranged in a tree-like hierarchy: if the test on a line
at level n succeeds, all following tests at level n+1 are performed, and the
messages printed if the tests succeed, until a line with level n (or less)
appears. For more complex files, one can use empty messages to get just
the "if/then" effect, in the following way:

     0      string   MZ
     >0x18  leshort  <0x40   MS-DOS executable
     >0x18  leshort  >0x3f   extended PC executable (e.g., MS Windows)
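
The level rule can be sketched in Python as below. This is a simplified
model of a single top-level entry and its continuation lines, not the
matching engine of file(1). The names ENTRIES, describe and make_sample
are made up for illustration, and leshort is read as unsigned here.

```python
import struct

def leshort(data, offset):
    """Read a two-byte little-endian value (unsigned in this sketch)."""
    return struct.unpack_from("<H", data, offset)[0]

# (level, test, message) triples mirroring the MZ example above; the
# empty message on the first line gives the plain "if/then" effect.
ENTRIES = [
    (0, lambda d: d[0:2] == b"MZ", ""),
    (1, lambda d: leshort(d, 0x18) < 0x40, "MS-DOS executable"),
    (1, lambda d: leshort(d, 0x18) > 0x3F,
     "extended PC executable (e.g., MS Windows)"),
]

def describe(entries, data):
    messages = []
    allowed = 0  # deepest level that may currently be tested
    for level, test, message in entries:
        if level > allowed:
            continue  # the parent test failed, so skip this line
        if test(data):
            if message:
                messages.append(message)
            allowed = level + 1  # children of this line may now run
        else:
            allowed = level
    return " ".join(messages)

def make_sample(value_at_0x18):
    """Build a 64-byte buffer starting with MZ and a chosen value at 0x18."""
    buf = bytearray(0x40)
    buf[0:2] = b"MZ"
    struct.pack_into("<H", buf, 0x18, value_at_0x18)
    return bytes(buf)

print(describe(ENTRIES, make_sample(0x1C)))  # MS-DOS executable
print(describe(ENTRIES, make_sample(0x40)))  # extended PC executable (e.g., MS Windows)
```

A buffer that does not start with MZ produces an empty string, because
the level 0 test fails and both level 1 lines are skipped.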
387
Offsets do not need to be constant, but can also be read from the file
being examined. If the first character following the last > is a ( then the
string after the parenthesis is interpreted as an indirect offset. That
means that the number after the parenthesis is used as an offset in the
file. The value at that offset is read, and is used again as an offset in
the file. Indirect offsets are of the form:
( x [[.,][bBcCeEfFgGhHiIlmsSqQ]][+-][ y ]). The value of x is used as an
offset in the file. A byte, id3 length, short or long is read at that offset
depending on the [bBcCeEfFgGhHiIlmsSqQ] type specifier. The value is
treated as signed if “,” is specified or unsigned if “.” is specified.
The capitalized types interpret the number as a big endian value, whereas
the small letter versions interpret the number as a little endian value;
the m type interprets the number as a middle endian (PDP-11) value. To
that number the value of y is added and the result is used as an offset in
the file. The default type if one is not specified is long. The following
types are recognized:

     Type    Mnemonic      Endian    Size
     bcBC    Byte/Char     N/A       1
     efg     Double        Little    8
     EFG     Double        Big       8
     hs      Half/Short    Little    2
     HS      Half/Short    Big       2
     i       ID3           Little    4
     I       ID3           Big       4
     m       Middle        Middle    4
     o       Octal         Textual   Variable
     q       Quad          Little    8
     Q       Quad          Big       8
417
That way variable length structures can be examined:

     # MS Windows executables are also valid MS-DOS executables
     0           string   MZ
     >0x18       leshort  <0x40   MZ executable (MS-DOS)
     # skip the whole block below if it is not an extended executable
     >0x18       leshort  >0x3f
     >>(0x3c.l)  string   PE\0\0  PE executable (MS-Windows)
     >>(0x3c.l)  string   LX\0\0  LX executable (OS/2)

This strategy of examining files has a drawback: you must make sure that you
eventually print something, or users may get empty output (such as when
there is neither PE\0\0 nor LX\0\0 in the above example).
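
The lookup performed by (0x3c.l) in the example above can be sketched in
Python. The buffer layout is invented for the demonstration and the helper
names indirect_offset and build_sample are hypothetical. The stored value
is read as an unsigned 4-byte little-endian number.

```python
import struct

def indirect_offset(data, x):
    """Resolve (x.l): read a 4-byte little-endian value stored at x."""
    return struct.unpack_from("<I", data, x)[0]

def build_sample():
    """Build a small fake executable whose header offset is kept at 0x3c."""
    buf = bytearray(0x90)
    buf[0:2] = b"MZ"
    struct.pack_into("<H", buf, 0x18, 0x40)
    struct.pack_into("<I", buf, 0x3C, 0x80)  # offset value stored in the file
    buf[0x80:0x84] = b"PE\0\0"
    return bytes(buf)

data = build_sample()
target = indirect_offset(data, 0x3C)
print(hex(target))                            # 0x80
print(data[target:target + 4] == b"PE\0\0")   # True
```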
431
If this indirect offset cannot be used directly, simple calculations are
possible: appending [+-*/%&|^]number inside parentheses allows one to
modify the value read from the file before it is used as an offset:

     # MS Windows executables are also valid MS-DOS executables
     0            string   MZ
     # sometimes, the value at 0x18 is less than 0x40 but there's still an
     # extended executable, simply appended to the file
     >0x18        leshort  <0x40
     >>(4.s*512)  leshort  0x014c   COFF executable (MS-DOS, DJGPP)
     >>(4.s*512)  leshort  !0x014c  MZ executable (MS-DOS)
443
Sometimes you do not know the exact offset as this depends on the length or
position (when indirection was used before) of preceding fields. You can
specify an offset relative to the end of the last up-level field using ‘&’
as a prefix to the offset:

     0           string   MZ
     >0x18       leshort  >0x3f
     >>(0x3c.l)  string   PE\0\0  PE executable (MS-Windows)
     # immediately following the PE signature is the CPU type
     >>>&0       leshort  0x14c   for Intel 80386
     >>>&0       leshort  0x184   for DEC Alpha
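
A relative offset can be pictured as "continue from where the parent
match ended". The Python sketch below walks through the example above on
an invented buffer. The helper name match_string is hypothetical, and the
code models only this one case.

```python
import struct

def match_string(data, offset, literal):
    """Return the offset just past literal if it is found at offset, else None."""
    end = offset + len(literal)
    return end if data[offset:end] == literal else None

# Invented sample: header offset at 0x3c, signature at 0x80, CPU type after it.
buf = bytearray(0x90)
struct.pack_into("<I", buf, 0x3C, 0x80)
buf[0x80:0x84] = b"PE\0\0"
struct.pack_into("<H", buf, 0x84, 0x14C)
data = bytes(buf)

pe_offset = struct.unpack_from("<I", data, 0x3C)[0]     # (0x3c.l)
field_end = match_string(data, pe_offset, b"PE\0\0")    # end of the up-level field
cpu = struct.unpack_from("<H", data, field_end + 0)[0]  # &0 leshort
print(hex(field_end))  # 0x84
print(hex(cpu))        # 0x14c
```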
455
Indirect and relative offsets can be combined:

     0                string   MZ
     >0x18            leshort  <0x40
     >>(4.s*512)      leshort  !0x014c  MZ executable (MS-DOS)
     # if it's not COFF, go back 512 bytes and add the offset taken
     # from byte 2/3, which is yet another way of finding the start
     # of the extended executable
     >>>&(2.s-514)    string   LE       LE executable (MS Windows VxD driver)

Or the other way around:

     0                   string   MZ
     >0x18               leshort  >0x3f
     >>(0x3c.l)          string   LE\0\0  LE executable (MS-Windows)
     # at offset 0x80 (-4, since relative offsets start at the end
     # of the up-level match) inside the LE header, we find the absolute
     # offset to the code area, where we look for a specific signature
     >>>(&0x7c.l+0x26)   string   UPX     \b, UPX compressed

Or even both!

     0                string   MZ
     >0x18            leshort  >0x3f
     >>(0x3c.l)       string   LE\0\0  LE executable (MS-Windows)
     # at offset 0x58 inside the LE header, we find the relative offset
     # to a data area where we look for a specific signature
     >>>&(&0x54.l-3)  string   UNACE   \b, ACE self-extracting archive
484
If you have to deal with offset/length pairs in your file, even the second
value in a parenthesized expression can be taken from the file itself,
using another set of parentheses. Note that this additional indirect offset
is always relative to the start of the main indirect offset.

     0                    string        MZ
     >0x18                leshort       >0x3f
     >>(0x3c.l)           string        PE\0\0  PE executable (MS-Windows)
     # search for the PE section called ".idata"...
     >>>&0xf4             search/0x140  .idata
     # ...and go to the end of it, calculated from start+length;
     # these are located 14 and 10 bytes after the section name
     >>>>(&0xe.l+(-4))    string        PK\3\4  \b, ZIP self-extracting archive
498
If you have a list of known values at a particular continuation level, and
you want to provide a switch-like default case:

     # clear that continuation level match
     >18     clear
     >18     lelong   1    one
     >18     lelong   2    two
     >18     default  x
     # print default match
     >>18    lelong   x    unmatched 0x%x
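
In conventional code the same pattern is a switch statement with a
default branch. The Python sketch below models it with an explicit match
flag. The function name switch_message is hypothetical and the sketch
covers only the entries shown above.

```python
def switch_message(value):
    """Mimic the clear / lelong / default entries for one lelong value."""
    matched = False  # "clear" resets the match flag for this level
    messages = []
    for known, text in ((1, "one"), (2, "two")):
        if value == known:
            messages.append(text)
            matched = True
    if not matched:
        # "default x" succeeds only when no earlier test at this level matched
        messages.append("unmatched 0x%x" % value)
    return " ".join(messages)

print(switch_message(2))  # two
print(switch_message(7))  # unmatched 0x7
```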
509
SEE ALSO
file(1) - the command that reads this file.
512
BUGS
The formats long, belong, lelong, melong, short, beshort, and leshort do
not depend on the length of the C data types short and long on the
platform, even though the Single UNIX Specification implies that they do.
However, as OS X Mountain Lion has passed the Single UNIX Specification
validation suite, and supplies a version of file(1) in which they do not
depend on the sizes of the C data types and that is built for a 64-bit
environment in which long is 8 bytes rather than 4 bytes, presumably the
validation suite does not test whether, for example, long refers to an
item with the same size as the C data type long. There should probably
be type names int8, uint8, int16, uint16, int32, uint32, int64, and
uint64, and specified-byte-order variants of them, to make it clearer
that those types have specified widths.
526
BSD                            October 9, 2022                            BSD