SORTER(1)                 General Commands Manual                SORTER(1)

NAME
       sorter - Sort files in an image into categories based on file type

SYNOPSIS
       sorter [-b size] [-e] [-E] [-h] [-l] [-md5] [-s] [-sha1] [-U] [-v]
       [-V] [-a hash_alert] [-c config] [-C config] [-d dir] [-m mnt]
       [-n nsrl_db] [-x hash_exclude] [-i imgtype] [-o imgoffset]
       [-f fstype] image [image] [meta_addr]

DESCRIPTION
       sorter is a Perl script that analyzes a file system and organizes
       the allocated and unallocated files by file type. It runs the
       'file' command on each file and sorts the files according to the
       rules in configuration files. It also checks for extension
       mismatches to identify 'hidden' files. Hash databases can be
       supplied for files that are known to be good and can be ignored,
       and for files that are known to be bad and should raise an alert.

       By default, the program uses the configuration files in the
       directory where The Sleuth Kit was installed. Those can be
       overridden with run-time options. There is a standard
       configuration file for all file system types and a specific one
       for a given operating system.

ARGUMENTS
       The required arguments are as follows. sorter analyzes one or more
       images and either saves the results in the '-d' directory or lists
       the results to STDOUT (if '-l' is given).

       -d dir Specify the directory where all files are written. This
              includes the index files and, if the '-s' flag is given,
              the subdirectories. This MUST be given unless the '-l'
              list flag is given.

       -l     List information to STDOUT (no files are ever written).
              This is useful for incident response, with the use of
              'netcat'. This cannot be used if '-d' is used.

       image [images]
              The disk or partition image to read, whose format is given
              with '-i'. Multiple image file names can be given if the
              image is split into multiple segments. If only one image
              file is given and its name is the first in a sequence
              (e.g., as indicated by ending in '.001'), subsequent image
              segments are included automatically.

       The options are as follows:

       -f fstype
              Specify the file system type of the image(s). The type
              names are the same ones that The Sleuth Kit uses.

       -i imgtype
              Specify the image type in which the file system is
              located. The type names are the same ones that The Sleuth
              Kit uses.

       -o imgoffset
              Specify the sector offset from the beginning of the image
              to the start of the file system.

       -b size
              Specify the minimum size of file to process. All files
              smaller than this size are ignored.

       -c config
              Specify the location of an additional configuration file.
              This file is loaded in addition to the standard ones in
              the install directory, and its settings have priority over
              the standard files.

       -C config
              Specify the location of the ONLY configuration file. The
              standard config files are not loaded if this option is
              given. For example, the 'share/sort' directory contains a
              file called 'images.sort' that holds only rules about
              graphic images. If it is specified with '-C', only data
              about graphic images is saved for the disk image.

       -m mnt Specify the mounting point of the image being analyzed.
              This is only for cosmetic reasons: entries in the output
              files are written with the full path instead of just the
              relative path. If this is given, then only one image can
              be given.

       -a hash_alert
              Specify the location of a hash database with entries of
              known 'bad' files. If any file is found with an MD5 hash
              value in this database, it is listed in a special alert
              file. This database must have been indexed for MD5 using
              'hfind' in The Sleuth Kit before it is used by sorter.

       -n nsrl_db
              Specify the location of the NIST National Software
              Reference Library (NSRL) database (www.nsrl.nist.gov). Any
              file found in the NSRL is ignored and not placed into a
              category. The database must be indexed for MD5 with
              'hfind' in The Sleuth Kit before it is used by sorter. The
              database file is currently called 'NSRLFile.txt'.

       -x hash_exclude
              Specify the location of a hash database with entries of
              known 'good' files. If any file is found with an MD5 hash
              value in this database, it is ignored and not processed or
              saved to the category files. This database must have been
              indexed for MD5 using 'hfind' in The Sleuth Kit before it
              is used by sorter.

       -e     Perform extension mismatch checks only (no category index
              files are generated).

       -U     Do not save data about unknown file types. By default, an
              'unknown' file is created for files where the 'file'
              output is not known. This allows one to refine the
              configuration. If this is not desired, use this flag.

       -h     Create category files in HTML.

       -md5   Calculate the MD5 value for each file and save it in the
              category file. This is done automatically when any of the
              hash databases is given.

       -sha1  Calculate the SHA-1 value for each file and save it in the
              category file.

       -s     Save the actual file content to subdirectories in the
              directory specified by '-d'. For example, all JPG and GIF
              files would be saved in the 'images' directory. If '-h' is
              also given, thumbnails of graphic images are also created.

       -v     Display verbose information.

       -V     Display version.

       [meta_addr]
              The metadata address of the directory to start with. By
              default, the root directory is used. If this is given,
              then only one image can be given.
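       The constraints among these arguments can be restated as a short
       Python sketch. It is not part of sorter (which is a Perl script);
       the function name check_args and its parameters are hypothetical
       and only encode the rules listed above.

```python
def check_args(out_dir, list_mode, images, mnt=None, meta_addr=None):
    """Return a list of problems with a hypothetical sorter invocation.

    out_dir   -- value of '-d', or None if not given
    list_mode -- True if '-l' was given
    images    -- list of image file names
    mnt       -- value of '-m', or None if not given
    meta_addr -- starting metadata address, or None if not given
    """
    problems = []
    # Exactly one of '-d dir' and '-l' has to be chosen.
    if out_dir is None and not list_mode:
        problems.append("either -d or -l must be given")
    if out_dir is not None and list_mode:
        problems.append("-l cannot be used together with -d")
    if not images:
        problems.append("at least one image is required")
    # '-m' and a starting meta_addr each restrict the run to one image.
    if mnt is not None and len(images) > 1:
        problems.append("-m allows only one image")
    if meta_addr is not None and len(images) > 1:
        problems.append("meta_addr allows only one image")
    return problems


print(check_args("data/sorter", False, ["images/hda1.dd"]))  # []
```

       An empty list means the combination is consistent with the rules
       in this section; it says nothing about whether the image itself
       can be read.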

BACKGROUND
       sorter is a Perl script that interacts with other Sleuth Kit
       tools. It starts by reading the configuration files from the
       installation directory. There is a general configuration file and
       a specific one for each operating system. The specific one is
       determined from the '-f' flag. Each configuration file contains
       rules for processing the output of the 'file' command. One type
       of rule uses a regular expression to identify which category
       (e.g., 'images') a given 'file' output (e.g., 'image data')
       belongs to. Another type lists the file extensions (e.g., .txt)
       that belong to a 'file' output (e.g., ASCII(.*?)text). See the
       description of the configuration file rules below.

       The program then runs the 'fls' tool in The Sleuth Kit to
       identify the files in the file system image. Each identified file
       is read using the 'icat' tool. If a hash database is given, the
       hash of the file is calculated and looked up. If it is found in
       an 'alert' database, then it is added to a special 'alert.txt'
       file. If it is found in the NSRL or 'exclude' database, then it
       is ignored as a known good file. Excluded files are recorded in
       an 'exclude' file for future reference but are not saved in the
       category files.

       The 'file' command is then run to identify the file type (based
       on header information). The configuration file rules are used to
       identify which category it belongs to. An entry is added to the
       corresponding category file (in the '-d dir' directory). If the
       '-s' flag is given, then a copy of the file is saved in a
       subdirectory with the same name as the category. If the HTML
       format is used, then hyperlinks allow one to easily view saved
       files and see what is in each category.

       Files that do not have a category are recorded in the 'unknown'
       category and the 'data' category. 'data' is for files with a
       structure that 'file' does not know, and 'unknown' is for files
       with a structure that 'file' knows about but that no rule assigns
       to a category. These are saved for future reference, but the
       unknown category can be ignored by using the '-U' flag.

       A copy of the files can be saved by using the '-s' flag. If so,
       the files are saved in a subdirectory named after the category.
       Each file is named using the file system image name followed by
       the metadata address and the original file extension. The
       category index file can be used to translate the actual name to
       the saved name. The HTML format makes viewing easier, as there
       are links to each file from the category index file.

       The program also consults the rules about the file extension. If
       the file has an extension (anything after a '.'), it is compared
       to the rules. If the extension is not found in the rules as a
       valid extension for the file type, the file is added to the
       'mismatch' file. If the file does not have an extension, it is
       not entered, even if the file type has valid extensions. This
       check is done even if the file is found in one of the known good
       hash databases. If it is found in one of those, it is added to a
       special file. Files of type 'data' have no extension checks done
       by default (as they have an unknown structure).

       The program repeats the above procedures using the output of the
       'ils' command as well. This allows sorter to examine the contents
       of unallocated files that still have pointers to the data units
       (not all file systems will produce data from this step).
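       The hash lookup and the extension check described in this section
       can be illustrated with a minimal Python sketch. This is not
       sorter's own code. The function names are hypothetical, plain
       Python sets stand in for the 'hfind'-indexed databases, and
       checking the alert database before the exclude database is an
       assumption that this page does not state.

```python
import hashlib
import os


def hash_status(content, alert_md5s, exclude_md5s):
    """Classify file content by its MD5 value.

    alert_md5s and exclude_md5s are sets of lowercase hex digests that
    stand in for the alert and the NSRL/exclude databases.
    """
    md5 = hashlib.md5(content).hexdigest()
    if md5 in alert_md5s:        # assumption: alerts are checked first
        return "alert"
    if md5 in exclude_md5s:
        return "exclude"
    return "none"


def is_mismatch(name, category, valid_exts):
    """Return True if the file's extension is not valid for its type.

    valid_exts is the set of lowercase extensions that the ext rules
    allow for this file's 'file' output.
    """
    if category == "data":       # unknown structure, so no check
        return False
    ext = os.path.splitext(name)[1][1:].lower()
    if not ext:                  # files without an extension are skipped
        return False
    return ext not in valid_exts


print(is_mismatch("holiday.txt", "images", {"jpg", "jpeg"}))  # True
```

       Keeping the two checks separate mirrors the text above: the
       extension check still runs for files found in a known good
       database.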

CONFIGURATION FILES
       Configuration files define which file types belong in which
       categories and which extensions belong to which file types.
       Configuration files are distributed with the sorter tool and are
       located in the 'share/sorter' directory of the installation
       directory.

       The 'default.sort' file is used for any file system type. It
       contains entries for common file types. A specific operating
       system file also exists, which is useful for extensions that are
       specific to a given OS. By default, the default file and the
       OS-specific one are used. With the '-c' flag, an additional file
       can be used. If the '-C' flag is used, then only the supplied
       configuration file is used.

       There are two rule types in the configuration files. Each rule
       starts with a header that specifies which rule type it is
       (category or ext). Both rule types have two additional columns,
       separated by any white space.

       The category rule has the category name as the second column and
       a Perl regular expression in the third column. The category name
       cannot contain spaces and can consist only of letters and
       numbers. The regular expression is matched case-insensitively
       against the output of 'file'. More than one rule can exist for a
       category, but only one category can exist for a given file
       output. For example:

       This saves all file output with 'image data' anywhere in it to
       the 'images' category:

              category   images   image data

       This saves all file output that has 'ASCII' followed by anything
       and then 'text' to the 'text' category:

              category   text     ASCII(.*?)text

       This saves all file output that is just 'data' to the 'data'
       category (the ^ and $ define the boundaries in Perl). The 'data'
       value is common in the output of 'file' for unknown binary data.

              category   data     ^data$

       There is a special category of 'ignore' that is used to skip over
       files assigned to it. This is mainly a time and space saver.

       The extension rule is similar except that the second column lists
       the valid extensions for the file output. Multiple rules can
       exist for the same file type. The comparison is
       case-insensitive. If no extension is valid for the file type, a
       rule does not need to be made; that is already assumed.

       For example, ASCII text is used with several file extensions, so
       the following rules could exist:

              ext   txt,log      ASCII(.*?)text
              ext   c,cpp,h,js   ASCII(.*?)text

       Please email me any rules that you find useful for standard
       investigations and I will incorporate them into future releases
       (carrier at sleuthkit dot org).
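       As an illustration of the two rule types described in this
       section, the following Python sketch parses rule lines and applies
       them to 'file' output. It is not sorter's implementation (sorter
       is written in Perl). The function names are hypothetical, Python's
       re module is used as a stand-in for Perl regular expressions
       (the patterns shown here behave the same in both), and returning
       the first matching category is an assumption.

```python
import re


def parse_rules(lines):
    """Split rule lines into category rules and ext rules."""
    categories, extensions = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Three columns; the regular expression may contain spaces.
        kind, second, pattern = line.split(None, 2)
        regex = re.compile(pattern, re.IGNORECASE)
        if kind == "category":
            categories.append((second, regex))
        elif kind == "ext":
            extensions.append((second.lower().split(","), regex))
        else:
            raise ValueError("unknown rule type: " + kind)
    return categories, extensions


def categorize(file_output, categories):
    """Return the first matching category, or 'unknown' if none match."""
    for name, regex in categories:
        if regex.search(file_output):
            return name
    return "unknown"


def valid_extensions(file_output, extensions):
    """Collect the extensions of every ext rule that matches."""
    valid = set()
    for exts, regex in extensions:
        if regex.search(file_output):
            valid.update(exts)
    return valid


RULES = [
    "category images image data",
    "category text   ASCII(.*?)text",
    "category data   ^data$",
    "ext txt,log     ASCII(.*?)text",
    "ext c,cpp,h,js  ASCII(.*?)text",
]
cats, exts = parse_rules(RULES)
print(categorize("JPEG image data, JFIF standard 1.01", cats))  # images
```

       Because several ext rules may match the same 'file' output, the
       sketch merges their extension lists rather than stopping at the
       first match.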

EXAMPLES
       To run sorter with no hash databases, the following can be used:

              # sorter -f ntfs -d data/sorter images/hda1.dd

              # sorter -d data/sorter images/hda1.dd

              # sorter -i raw -f ntfs -o 63 -d data/sorter images/hda.dd

       To include the NSRL, an exclude, and an alert hash database:

              # sorter -f ntfs -d data/sorter -a /usr/hash/rootkit.db \
                  -x /usr/hash/win2k.db -n /usr/hash/nsrl/NSRLFile.txt \
                  images/hda1.dd

       To just identify images using the supplied 'images.sort' file:

              # sorter -f ntfs \
                  -C /usr/local/sleuthkit/share/sort/images.sort \
                  -d data/sorter -h -s images/hda1.dd
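       The '-o 63' value in the example above is a sector offset, not a
       byte offset. The following Python lines show the conversion,
       assuming 512-byte sectors; the sector size is an assumption that
       this page does not state, and the function name is hypothetical.

```python
SECTOR_SIZE = 512  # assumption: bytes per sector


def sector_to_byte_offset(sector, sector_size=SECTOR_SIZE):
    """Convert a sector offset, as passed to '-o', into a byte offset."""
    return sector * sector_size


print(sector_to_byte_offset(63))  # 32256
```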

REFERENCES
       The NIST National Software Reference Library (NSRL) can be found
       at www.nsrl.nist.gov.

LICENSE
       Distributed under the Common Public License, found in the
       cpl1.0.txt file in The Sleuth Kit licenses directory.

AUTHOR
       Brian Carrier <carrier at sleuthkit dot org>

       Send documentation updates to <doc-updates at sleuthkit dot org>