MAGIC(5) BSD File Formats Manual MAGIC(5)

NAME
magic - file command's magic pattern file

DESCRIPTION
7 This manual page documents the format of magic files as used by the
8 file(1) command, version 5.42. The file(1) command identifies the type
9 of a file using, among other tests, a test for whether the file contains
10 certain “magic patterns”. The database of these “magic patterns” is usu‐
11 ally located in a binary file in /usr/share/misc/magic.mgc or a directory
12 of source text magic pattern fragment files in /usr/share/misc/magic.
13 The database specifies what patterns are to be tested for, what message
14 or MIME type to print if a particular pattern is found, and additional
15 information to extract from the file.
16
17 The format of the source fragment files that are used to build this data‐
18 base is as follows: Each line of a fragment file specifies a test to be
19 performed. A test compares the data starting at a particular offset in
20 the file with a byte value, a string or a numeric value. If the test
21 succeeds, a message is printed. The line consists of the following
22 fields:
23
24 offset A number specifying the offset (in bytes) into the file of the
25 data which is to be tested. This offset can be a negative num‐
26 ber if it is:
• The first direct offset of the magic entry (at continuation
level 0), in which case it is interpreted as an offset from the
end of the file going backwards. This works only when a
file descriptor to the file is available and it is a regular
file.
32 • A continuation offset relative to the end of the last up-
33 level field (&).
34
35 type The type of the data to be tested. The possible values are:
36
37 byte A one-byte value.
38
39 short A two-byte value in this machine's native byte or‐
40 der.
41
42 long A four-byte value in this machine's native byte or‐
43 der.
44
45 quad An eight-byte value in this machine's native byte
46 order.
47
48 float A 32-bit single precision IEEE floating point number
49 in this machine's native byte order.
50
51 double A 64-bit double precision IEEE floating point number
52 in this machine's native byte order.
53
54 string A string of bytes. The string type specification
55 can be optionally followed by /[WwcCtbTf]*. The “W”
56 flag compacts whitespace in the target, which must
57 contain at least one whitespace character. If the
58 magic has n consecutive blanks, the target needs at
59 least n consecutive blanks to match. The “w” flag
60 treats every blank in the magic as an optional
blank. The “f” flag requires that the matched
string is a full word, not a partial word match.
The “c” flag specifies case insensitive matching:
lower case characters in the magic match both lower
and upper case characters in the target, whereas
upper case characters in the magic only match upper
case characters in the target. The “C” flag
specifies case insensitive matching: upper case
characters in the magic match both lower and upper case
characters in the target, whereas lower case
characters in the magic only match lower case characters
in the target. To do a complete case insensitive
match, specify both “c” and “C”. The “t” flag
74 forces the test to be done for text files, while the
75 “b” flag forces the test to be done for binary
76 files. The “T” flag causes the string to be
77 trimmed, i.e. leading and trailing whitespace is
78 deleted before the string is printed.
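
The effect of the “c” and “C” flags can be illustrated with a short Python sketch. It is not file(1)'s implementation: the function name flag_match is hypothetical, and the sketch assumes that a magic character not covered by either flag must match the target exactly.

```python
def flag_match(magic: str, target: str, flags: str = "") -> bool:
    """Compare magic against the start of target, honouring the c/C flags."""
    if len(target) < len(magic):
        return False
    for m, t in zip(magic, target):
        if "c" in flags and m.islower():
            ok = t.lower() == m      # lower case magic matches either case
        elif "C" in flags and m.isupper():
            ok = t.upper() == m      # upper case magic matches either case
        else:
            ok = t == m              # assumption: exact match otherwise
        if not ok:
            return False
    return True
```

With both flags given, every letter in the magic is covered by one of the two branches, which is why “cC” yields a fully case insensitive match.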
79
80 pstring A Pascal-style string where the first byte/short/int
81 is interpreted as the unsigned length. The length
82 defaults to byte and can be specified as a modifier.
83 The following modifiers are supported:
84 B A byte length (default).
85 H A 2 byte big endian length.
86 h A 2 byte little endian length.
87 L A 4 byte big endian length.
88 l A 4 byte little endian length.
89 J The length includes itself in its count.
90 The string is not NUL terminated. “J” is used
91 rather than the more valuable “I” because this type
92 of length is a feature of the JPEG format.
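
As a rough illustration of the pstring modifiers, the following Python sketch reads a length-prefixed string. The helper name read_pstring and the table _LENGTH_FORMATS are hypothetical, and the handling of “J” assumes that the stored length counts the bytes of the length field itself.

```python
import struct

# Length-prefix layouts keyed by the modifier letters listed above.
_LENGTH_FORMATS = {"B": ">B", "H": ">H", "h": "<H", "L": ">I", "l": "<I"}

def read_pstring(data: bytes, offset: int = 0, modifiers: str = "B") -> bytes:
    """Return the payload of a length-prefixed string found at offset."""
    size_flags = [m for m in modifiers if m in _LENGTH_FORMATS]
    fmt = _LENGTH_FORMATS[size_flags[0] if size_flags else "B"]
    (length,) = struct.unpack_from(fmt, data, offset)
    prefix = struct.calcsize(fmt)
    if "J" in modifiers:
        # Assumption: the stored length includes the length field itself.
        length = max(length - prefix, 0)
    start = offset + prefix
    return data[start:start + length]
```

For instance, read_pstring(b"\x00\x07hello!", modifiers="HJ") skips the two-byte big endian length and returns the five payload bytes.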
93
94 date A four-byte value interpreted as a UNIX date.
95
96 qdate An eight-byte value interpreted as a UNIX date.
97
98 ldate A four-byte value interpreted as a UNIX-style date,
99 but interpreted as local time rather than UTC.
100
101 qldate An eight-byte value interpreted as a UNIX-style
102 date, but interpreted as local time rather than UTC.
103
104 qwdate An eight-byte value interpreted as a Windows-style
105 date.
106
107 beid3 A 32-bit ID3 length in big-endian byte order.
108
109 beshort A two-byte value in big-endian byte order.
110
111 belong A four-byte value in big-endian byte order.
112
113 bequad An eight-byte value in big-endian byte order.
114
115 befloat A 32-bit single precision IEEE floating point number
116 in big-endian byte order.
117
118 bedouble A 64-bit double precision IEEE floating point number
119 in big-endian byte order.
120
121 bedate A four-byte value in big-endian byte order, inter‐
122 preted as a Unix date.
123
124 beqdate An eight-byte value in big-endian byte order, inter‐
125 preted as a Unix date.
126
127 beldate A four-byte value in big-endian byte order, inter‐
128 preted as a UNIX-style date, but interpreted as lo‐
129 cal time rather than UTC.
130
131 beqldate An eight-byte value in big-endian byte order, inter‐
132 preted as a UNIX-style date, but interpreted as lo‐
133 cal time rather than UTC.
134
135 beqwdate An eight-byte value in big-endian byte order, inter‐
136 preted as a Windows-style date.
137
138 bestring16 A two-byte unicode (UCS16) string in big-endian byte
139 order.
140
141 leid3 A 32-bit ID3 length in little-endian byte order.
142
143 leshort A two-byte value in little-endian byte order.
144
145 lelong A four-byte value in little-endian byte order.
146
147 lequad An eight-byte value in little-endian byte order.
148
149 lefloat A 32-bit single precision IEEE floating point number
150 in little-endian byte order.
151
152 ledouble A 64-bit double precision IEEE floating point number
153 in little-endian byte order.
154
155 ledate A four-byte value in little-endian byte order, in‐
156 terpreted as a UNIX date.
157
158 leqdate An eight-byte value in little-endian byte order, in‐
159 terpreted as a UNIX date.
160
161 leldate A four-byte value in little-endian byte order, in‐
162 terpreted as a UNIX-style date, but interpreted as
163 local time rather than UTC.
164
165 leqldate An eight-byte value in little-endian byte order, in‐
166 terpreted as a UNIX-style date, but interpreted as
167 local time rather than UTC.
168
169 leqwdate An eight-byte value in little-endian byte order, in‐
170 terpreted as a Windows-style date.
171
172 lestring16 A two-byte unicode (UCS16) string in little-endian
173 byte order.
174
175 melong A four-byte value in middle-endian (PDP-11) byte or‐
176 der.
177
178 medate A four-byte value in middle-endian (PDP-11) byte or‐
179 der, interpreted as a UNIX date.
180
181 meldate A four-byte value in middle-endian (PDP-11) byte or‐
182 der, interpreted as a UNIX-style date, but inter‐
183 preted as local time rather than UTC.
184
185 indirect Starting at the given offset, consult the magic
186 database again. The offset of the indirect magic is
187 by default absolute in the file, but one can specify
188 /r to indicate that the offset is relative from the
189 beginning of the entry.
190
191 name Define a “named” magic instance that can be called
192 from another use magic entry, like a subroutine
193 call. Named instance direct magic offsets are rela‐
194 tive to the offset of the previous matched entry,
195 but indirect offsets are relative to the beginning
196 of the file as usual. Named magic entries always
197 match.
198
use Recursively call the named magic starting from the
current offset. If the name of the referenced magic
begins with a ^ then the endianness of the magic is
switched; if the magic mentioned leshort, for example,
it is treated as beshort and vice versa. This
is useful to avoid duplicating the rules for different
endianness.
206
regex A regular expression match in extended POSIX regular
expression syntax (like egrep). Regular expressions
can take exponential time to process, and their
performance is hard to predict, so their use is
discouraged. When used in production environments,
their performance should be carefully checked. The
size of the string to search should also be limited
by specifying /<length>, to avoid performance issues
scanning long files. The type specification can
also be optionally followed by /[c][s][l]. The “c”
flag makes the match case insensitive, while the “s”
flag updates the offset to the start offset of the
match, rather than the end. The “l” modifier
changes the limit of length to mean number of lines
instead of a byte count. Lines are delimited by the
platform's native line delimiter. When a line count
is specified, an implicit byte count is also computed
assuming each line is 80 characters long. If neither
a byte nor a line count is specified, the search
is limited automatically to 8KiB. ^ and $ match the
beginning and end of individual lines, respectively,
not beginning and end of file.
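
The length limits described for regex can be sketched in Python as follows. This is only an approximation: Python's re module stands in for POSIX extended regular expressions, the names byte_limit and limited_search are hypothetical, and only the byte window is modelled, not the counting of lines itself.

```python
import re

DEFAULT_LIMIT = 8 * 1024      # 8KiB when no count is given
ASSUMED_LINE_WIDTH = 80       # bytes assumed per line for a line count

def byte_limit(byte_count=None, line_count=None):
    """Byte window implied by the /<length> and /l modifiers (sketch)."""
    if line_count is not None:
        return line_count * ASSUMED_LINE_WIDTH
    if byte_count is not None:
        return byte_count
    return DEFAULT_LIMIT

def limited_search(pattern: bytes, data: bytes, offset=0, **limits):
    """Search only the limited window; ^ and $ anchor at line boundaries."""
    window = data[offset:offset + byte_limit(**limits)]
    return re.search(pattern, window, re.MULTILINE)
```

The re.MULTILINE flag reproduces the stated behaviour of ^ and $, which match at individual lines rather than at the beginning and end of the file.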
229
230 search A literal string search starting at the given off‐
231 set. The same modifier flags can be used as for
232 string patterns. The search expression must contain
233 the range in the form /number, that is the number of
234 positions at which the match will be attempted,
235 starting from the start offset. This is suitable
236 for searching larger binary expressions with vari‐
237 able offsets, using \ escapes for special charac‐
238 ters. The order of modifier and number is not rele‐
239 vant.
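
The meaning of the range in search/number can be shown with a minimal Python sketch. The function name search is used here purely for illustration, and the sketch ignores the optional string modifier flags.

```python
def search(data: bytes, pattern: bytes, offset: int, count: int) -> int:
    """Return the position of pattern, trying count start positions, or -1."""
    for start in range(offset, offset + count):
        if data.startswith(pattern, start):
            return start
    return -1
```

Note that the range counts start positions, so a pattern beginning exactly at offset + count is not found.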
240
default This is intended to be used with the test x (which
is always true) and it has no type. It matches when
no other test at that continuation level has matched
before. Clearing the matched tests for a continuation
level can be done using the clear test.
246
247 clear This test is always true and clears the match flag
248 for that continuation level. It is intended to be
249 used with the default test.
250
251 der Parse the file as a DER Certificate file. The test
252 field is used as a der type that needs to be
253 matched. The DER types are: eoc, bool, int,
254 bit_str, octet_str, null, obj_id, obj_desc, ext,
255 real, enum, embed, utf8_str, rel_oid, time, res2,
256 seq, set, num_str, prt_str, t61_str, vid_str,
257 ia5_str, utc_time, gen_time, gr_str, vis_str,
258 gen_str, univ_str, char_str, bmp_str, date, tod,
259 datetime, duration, oid-iri, rel-oid-iri. These
260 types can be followed by an optional numeric size,
261 which indicates the field width in bytes.
262
guid A Globally Unique Identifier, parsed and printed as
XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX. Its format
is a string.
266
267 offset This is a quad value indicating the current offset
268 of the file. It can be used to determine the size
269 of the file or the magic buffer. For example the
270 magic entries:
271
272 -0 offset x this file is %lld bytes
273 -0 offset <=100 must be more than 100 \
274 bytes and is only %lld
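
The fixed-width integer types listed above (for example beshort, lelong and melong) differ only in width and byte order. The Python sketch below decodes raw bytes accordingly. The name decode_int is hypothetical, and the middle-endian branch assumes the usual PDP-11 layout, in which the high 16-bit word is stored first and each word is itself little-endian.

```python
def decode_int(raw: bytes, order: str = "le", signed: bool = True) -> int:
    """Decode raw bytes as an integer in the given byte order (sketch)."""
    if order == "be":
        return int.from_bytes(raw, "big", signed=signed)
    if order == "le":
        return int.from_bytes(raw, "little", signed=signed)
    if order == "me":
        # Assumption: PDP-11 order, high word first, each word little-endian.
        if len(raw) != 4:
            raise ValueError("middle-endian values are four bytes")
        swapped = bytes([raw[1], raw[0], raw[3], raw[2]])
        return int.from_bytes(swapped, "big", signed=signed)
    raise ValueError(f"unknown byte order: {order}")
```

Under that assumption the bytes 0B 0A 0D 0C decode to 0x0A0B0C0D as a melong.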
275
For compatibility with the Single UNIX Specification, the type
specifiers dC and d1 are equivalent to byte, the type specifiers uC
278 and u1 are equivalent to ubyte, the type specifiers dS and d2
279 are equivalent to short, the type specifiers uS and u2 are
280 equivalent to ushort, the type specifiers dI, dL, and d4 are
281 equivalent to long, the type specifiers uI, uL, and u4 are
282 equivalent to ulong, the type specifier d8 is equivalent to
283 quad, the type specifier u8 is equivalent to uquad, and the type
284 specifier s is equivalent to string. In addition, the type
285 specifier dQ is equivalent to quad and the type specifier uQ is
286 equivalent to uquad.
287
288 Each top-level magic pattern (see below for an explanation of
289 levels) is classified as text or binary according to the types
290 used. Types “regex” and “search” are classified as text tests,
291 unless non-printable characters are used in the pattern. All
other tests are classified as binary. A top-level pattern is
considered to be a text pattern when all its patterns are text
patterns; otherwise, it is considered to be a binary pattern. When
295 matching a file, binary patterns are tried first; if no match is
296 found, and the file looks like text, then its encoding is deter‐
297 mined and the text patterns are tried.
298
299 The numeric types may optionally be followed by & and a numeric
300 value, to specify that the value is to be AND'ed with the nu‐
301 meric value before any comparisons are done. Prepending a u to
302 the type indicates that ordered comparisons should be unsigned.
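
Masking and the u prefix can be sketched in Python as shown below. The helper masked_value is hypothetical. It only illustrates that the AND is applied before any comparison and that the same byte compares differently when read as signed or unsigned.

```python
def masked_value(raw: bytes, mask=None, unsigned: bool = False,
                 byteorder: str = "little") -> int:
    """Decode raw, then AND it with mask before any comparison."""
    value = int.from_bytes(raw, byteorder, signed=not unsigned)
    if mask is not None:
        value &= mask
    return value
```

For example, the byte 0xfe is -2 when read as a byte but 254 when read as a ubyte, so an ordered test such as >0x80 succeeds only in the unsigned case.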
303
304 test The value to be compared with the value from the file. If the
305 type is numeric, this value is specified in C form; if it is a
306 string, it is specified as a C string with the usual escapes
307 permitted (e.g. \n for new-line).
308
Numeric values may be preceded by a character indicating the
operation to be performed. It may be =, to specify that the value
from the file must equal the specified value, <, to specify that
the value from the file must be less than the specified value,
>, to specify that the value from the file must be greater than
the specified value, &, to specify that the value from the file
must have set all of the bits that are set in the specified
value, ^, to specify that the value from the file must have
clear at least one of the bits that are set in the specified
value, ~, to specify that the value specified after it is negated
before the test, or x, to specify that any value will match. If
the character is omitted, it is assumed to be =. Operators &, ^,
and ~ don't work with floats and doubles. The operator !
specifies that the line matches if the test does not succeed.
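
The numeric operators can be summarized with a small Python sketch. The function numeric_test is hypothetical, the ~ operator is omitted, and the ^ branch assumes that the test succeeds when at least one of the listed bits is clear in the value from the file.

```python
def numeric_test(file_value: int, spec: int, op: str = "=",
                 negate: bool = False) -> bool:
    """Evaluate one numeric test; negate models the ! operator."""
    if op == "x":
        result = True
    elif op == "=":
        result = file_value == spec
    elif op == "<":
        result = file_value < spec
    elif op == ">":
        result = file_value > spec
    elif op == "&":
        result = (file_value & spec) == spec   # all listed bits set
    elif op == "^":
        result = (file_value & spec) != spec   # assumed: some listed bit clear
    else:
        raise ValueError(f"unsupported operator: {op}")
    return not result if negate else result
```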
323
324 Numeric values are specified in C form; e.g. 13 is decimal, 013
325 is octal, and 0x13 is hexadecimal.
326
Numeric operations are not performed on date types; instead, the
numeric value is interpreted as an offset.
329
330 For string values, the string from the file must match the spec‐
331 ified string. The operators =, < and > (but not &) can be ap‐
332 plied to strings. The length used for matching is that of the
333 string argument in the magic file. This means that a line can
334 match any non-empty string (usually used to then print the
335 string), with >\0 (because all non-empty strings are greater
336 than the empty string).
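
The rule that the magic string's length bounds the comparison can be sketched in Python. The name string_test is hypothetical, and byte-wise ordering is assumed for < and >.

```python
def string_test(target: bytes, magic: bytes, op: str = "=") -> bool:
    """Compare only as many bytes as the magic string contains."""
    window = target[:len(magic)]
    if op == "=":
        return window == magic
    if op == "<":
        return window < magic
    if op == ">":
        return window > magic
    raise ValueError(f"unsupported operator: {op}")
```

With the magic string \0, only the first byte of the target is examined, so any target whose first byte is not NUL satisfies the > test.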
337
338 Dates are treated as numerical values in the respective internal
339 representation.
340
341 The special test x always evaluates to true.
342
343 message The message to be printed if the comparison succeeds. If the
344 string contains a printf(3) format specification, the value from
345 the file (with any specified masking performed) is printed using
346 the message as the format string. If the string begins with
347 “\b”, the message printed is the remainder of the string with no
348 whitespace added before it: multiple matches are normally sepa‐
349 rated by a single space.
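
How messages are combined can be sketched in Python as follows. The function render is hypothetical. Python's % operator stands in for printf(3) and does not accept every C format (for example %lld), and the \b escape is assumed to have already been converted to a backspace character.

```python
def render(matches):
    """Join (message, value) pairs the way the description suggests."""
    out = ""
    for message, value in matches:
        # Python's % operator stands in for printf(3) here.
        text = message % value if "%" in message else message
        if text.startswith("\b"):
            out += text[1:]                  # no separating space
        elif text:
            out += (" " if out else "") + text
    return out
```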
350
An Apple 4+4 character creator and type can be specified as:
352
353 !:apple CREATYPE
354
A MIME type is given on a separate line, which must be the next line that
is neither blank nor a comment after the magic line that identifies the
file type, and has the following format:
358
359 !:mime MIMETYPE
360
361 i.e. the literal string “!:mime” followed by the MIME type.
362
363 An optional strength can be supplied on a separate line which refers to
364 the current magic description using the following format:
365
366 !:strength OP VALUE
367
The operator OP can be: +, -, *, or / and VALUE is a constant between 0
and 255. This constant is applied using the specified operator to the
currently computed default magic strength.
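
The strength adjustment amounts to simple arithmetic, sketched here in Python. The function apply_strength is hypothetical, and integer division is an assumption, since the text above does not say how / rounds.

```python
def apply_strength(default: int, op: str, value: int) -> int:
    """Adjust a computed default strength with a !:strength line."""
    if not 0 <= value <= 255:
        raise ValueError("VALUE must be between 0 and 255")
    if op == "+":
        return default + value
    if op == "-":
        return default - value
    if op == "*":
        return default * value
    if op == "/":
        return default // value    # assumption: integer division
    raise ValueError(f"unknown operator: {op}")
```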
371
372 Some file formats contain additional information which is to be printed
373 along with the file type or need additional tests to determine the true
374 file type. These additional tests are introduced by one or more > char‐
375 acters preceding the offset. The number of > on the line indicates the
376 level of the test; a line with no > at the beginning is considered to be
377 at level 0. Tests are arranged in a tree-like hierarchy: if the test on
378 a line at level n succeeds, all following tests at level n+1 are per‐
379 formed, and the messages printed if the tests succeed, until a line with
380 level n (or less) appears. For more complex files, one can use empty
381 messages to get just the "if/then" effect, in the following way:
382
383 0 string MZ
384 >0x18 leshort <0x40 MS-DOS executable
385 >0x18 leshort >0x3f extended PC executable (e.g., MS Windows)
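
The control flow of this example can be mirrored in a few lines of Python. The function describe_mz is hypothetical and hard-codes the three magic lines above. It shows only how the level 0 test gates the level 1 tests.

```python
import struct

def describe_mz(data: bytes) -> str:
    """Mirror the three-line MZ example: level 0 gates the level 1 tests."""
    if not data.startswith(b"MZ"):                    # 0      string  MZ
        return ""
    if len(data) < 0x1a:
        return ""                                     # nothing to read at 0x18
    (value,) = struct.unpack_from("<h", data, 0x18)   # leshort at 0x18
    parts = []
    if value < 0x40:                                  # >0x18  leshort <0x40
        parts.append("MS-DOS executable")
    if value > 0x3f:                                  # >0x18  leshort >0x3f
        parts.append("extended PC executable (e.g., MS Windows)")
    return " ".join(parts)
```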
386
387 Offsets do not need to be constant, but can also be read from the file
388 being examined. If the first character following the last > is a ( then
389 the string after the parenthesis is interpreted as an indirect offset.
390 That means that the number after the parenthesis is used as an offset in
the file. The value at that offset is read, and is used again as an
offset in the file. Indirect offsets are of the form:
(x[[.,][bBcCeEfFgGhHiIlmsSqQ]][+-][y]). The value of x is used as an
offset in the file. A byte, id3 length, short or long is read at that
offset depending on the [bBcCeEfFgGhHiIlmsSqQ] type specifier. The value
is treated as signed if “,” is specified or unsigned if “.” is
specified. The capitalized types interpret the number as a big endian value,
398 whereas the small letter versions interpret the number as a little endian
399 value; the m type interprets the number as a middle endian (PDP-11)
400 value. To that number the value of y is added and the result is used as
401 an offset in the file. The default type if one is not specified is long.
402 The following types are recognized:
403
Type    Mnemonic      Endian    Size
bcBC    Byte/Char     N/A       1
efg     Double        Little    8
EFG     Double        Big       8
hs      Half/Short    Little    2
HS      Half/Short    Big       2
i       ID3           Little    4
I       ID3           Big       4
m       Middle        Middle    4
q       Quad          Little    8
Q       Quad          Big       8
415
416 That way variable length structures can be examined:
417
418 # MS Windows executables are also valid MS-DOS executables
419 0 string MZ
420 >0x18 leshort <0x40 MZ executable (MS-DOS)
421 # skip the whole block below if it is not an extended executable
422 >0x18 leshort >0x3f
423 >>(0x3c.l) string PE\0\0 PE executable (MS-Windows)
424 >>(0x3c.l) string LX\0\0 LX executable (OS/2)
425
This strategy of examining has a drawback: you must make sure that you
eventually print something, or users may get empty output (such as when
there is neither PE\0\0 nor LX\0\0 in the above example).
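
Resolving an indirect offset such as (0x3c.l) can be sketched in Python. The helper indirect_offset and the table _FORMATS are hypothetical and cover only a few of the type letters. The signed parameter models the difference between “,” and “.”.

```python
import struct

# struct formats for a few of the type letters described above.
_FORMATS = {"b": "b", "s": "<h", "S": ">h", "l": "<i", "q": "<q", "Q": ">q"}

def indirect_offset(data: bytes, x: int, kind: str = "l", y: int = 0,
                    signed: bool = False) -> int:
    """Read the value stored at x and return it plus y as the new offset."""
    fmt = _FORMATS[kind]
    if not signed:
        fmt = fmt[:-1] + fmt[-1].upper()   # unsigned variant of the format
    (value,) = struct.unpack_from(fmt, data, x)
    return value + y
```

Applied to an image whose little-endian long at 0x3c holds 0x80, the call indirect_offset(image, 0x3c, "l") returns 0x80, the offset at which the next test is then performed.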
429
430 If this indirect offset cannot be used directly, simple calculations are
431 possible: appending [+-*/%&|^]number inside parentheses allows one to
432 modify the value read from the file before it is used as an offset:
433
434 # MS Windows executables are also valid MS-DOS executables
435 0 string MZ
# sometimes, the value at 0x18 is less than 0x40 but there's still an
437 # extended executable, simply appended to the file
438 >0x18 leshort <0x40
439 >>(4.s*512) leshort 0x014c COFF executable (MS-DOS, DJGPP)
440 >>(4.s*512) leshort !0x014c MZ executable (MS-DOS)
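
The arithmetic in an expression such as (4.s*512) can be sketched in Python. The function computed_offset and the table _OPS are hypothetical. The sketch reads an unsigned little-endian short, as “.s” suggests, and assumes integer division for /.

```python
import operator
import struct

_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
        "/": operator.floordiv, "%": operator.mod,
        "&": operator.and_, "|": operator.or_, "^": operator.xor}

def computed_offset(data: bytes, x: int, op: str, number: int) -> int:
    """Model (x.s<op>number): read a little-endian short, then apply op."""
    (value,) = struct.unpack_from("<H", data, x)
    return _OPS[op](value, number)
```

If the short at offset 4 holds 3, then (4.s*512) resolves to offset 1536.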
441
442 Sometimes you do not know the exact offset as this depends on the length
443 or position (when indirection was used before) of preceding fields. You
444 can specify an offset relative to the end of the last up-level field us‐
445 ing ‘&’ as a prefix to the offset:
446
447 0 string MZ
448 >0x18 leshort >0x3f
449 >>(0x3c.l) string PE\0\0 PE executable (MS-Windows)
450 # immediately following the PE signature is the CPU type
451 >>>&0 leshort 0x14c for Intel 80386
452 >>>&0 leshort 0x184 for DEC Alpha
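
A relative offset is measured from the end of the previous match at the level above. The Python sketch below follows the example: it locates the PE signature through the long at 0x3c and then reads the short that immediately follows it. The function pe_machine is hypothetical.

```python
import struct

def pe_machine(data: bytes):
    """Match the PE signature, then read the short right after it."""
    (pe_start,) = struct.unpack_from("<I", data, 0x3c)      # (0x3c.l)
    signature = b"PE\0\0"
    if data[pe_start:pe_start + len(signature)] != signature:
        return None
    after_match = pe_start + len(signature)                 # base for &
    (machine,) = struct.unpack_from("<H", data, after_match + 0)   # &0
    return machine
```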
453
454 Indirect and relative offsets can be combined:
455
456 0 string MZ
457 >0x18 leshort <0x40
458 >>(4.s*512) leshort !0x014c MZ executable (MS-DOS)
459 # if it's not COFF, go back 512 bytes and add the offset taken
460 # from byte 2/3, which is yet another way of finding the start
461 # of the extended executable
462 >>>&(2.s-514) string LE LE executable (MS Windows VxD driver)
463
464 Or the other way around:
465
466 0 string MZ
467 >0x18 leshort >0x3f
468 >>(0x3c.l) string LE\0\0 LE executable (MS-Windows)
469 # at offset 0x80 (-4, since relative offsets start at the end
470 # of the up-level match) inside the LE header, we find the absolute
471 # offset to the code area, where we look for a specific signature
472 >>>(&0x7c.l+0x26) string UPX \b, UPX compressed
473
474 Or even both!
475
476 0 string MZ
477 >0x18 leshort >0x3f
478 >>(0x3c.l) string LE\0\0 LE executable (MS-Windows)
479 # at offset 0x58 inside the LE header, we find the relative offset
480 # to a data area where we look for a specific signature
481 >>>&(&0x54.l-3) string UNACE \b, ACE self-extracting archive
482
483 If you have to deal with offset/length pairs in your file, even the sec‐
484 ond value in a parenthesized expression can be taken from the file it‐
485 self, using another set of parentheses. Note that this additional indi‐
486 rect offset is always relative to the start of the main indirect offset.
487
488 0 string MZ
489 >0x18 leshort >0x3f
490 >>(0x3c.l) string PE\0\0 PE executable (MS-Windows)
491 # search for the PE section called ".idata"...
492 >>>&0xf4 search/0x140 .idata
493 # ...and go to the end of it, calculated from start+length;
494 # these are located 14 and 10 bytes after the section name
495 >>>>(&0xe.l+(-4)) string PK\3\4 \b, ZIP self-extracting archive
496
497 If you have a list of known values at a particular continuation level,
498 and you want to provide a switch-like default case:
499
500 # clear that continuation level match
501 >18 clear
502 >18 lelong 1 one
503 >18 lelong 2 two
504 >18 default x
505 # print default match
506 >>18 lelong x unmatched 0x%x
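
The switch-like behaviour of clear and default can be modelled in Python. The function switch_level is hypothetical. It tracks a single match flag for one continuation level, as the description of the two tests suggests.

```python
def switch_level(value, cases, default_format):
    """Model clear / known values / default at one continuation level."""
    matched = False                         # >18 clear
    messages = []
    for expected, message in cases:         # >18 lelong 1 one, ...
        if value == expected:
            matched = True
            messages.append(message)
    if not matched:                         # >18 default x
        messages.append(default_format % value)   # >>18 lelong x ...
    return " ".join(messages)
```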
507
SEE ALSO
file(1) - the command that reads this file.
510
BUGS
The formats long, belong, lelong, melong, short, beshort, and leshort do
not depend on the length of the C data types short and long on the
platform, even though the Single UNIX Specification implies that they do.
However, as OS X Mountain Lion has passed the Single UNIX Specification
validation suite, and supplies a version of file(1) in which they do not
depend on the sizes of the C data types and that is built for a 64-bit
environment in which long is 8 bytes rather than 4 bytes, presumably the
validation suite does not test whether, for example, long refers to an
item with the same size as the C data type long. There should probably
be type names int8, uint8, int16, uint16, int32, uint32, int64, and
uint64, and specified-byte-order variants of them, to make it clearer
that those types have specified widths.
524
BSD May 9, 2021 BSD