SORTER(1)                   General Commands Manual                  SORTER(1)

NAME

       sorter - Sort files in an image into categories based on file type

SYNOPSIS

       sorter [-b size] [-e] [-E] [-h] [-l] [-md5] [-s] [-sha1] [-U] [-v] [-V]
       [-a hash_alert] [-c config] [-C config] [-d dir] [-m mnt] [-n nsrl_db]
       [-x hash_exclude] [-i imgtype] [-o imgoffset] [-f fstype] image
       [image] [meta_addr]

DESCRIPTION

       sorter is a Perl script that analyzes a file system to organize the
       allocated and unallocated files by file type.  It runs the 'file'
       command on each file and organizes the files according to the rules in
       configuration files.  Extension mismatch checks are also performed to
       identify 'hidden' files.  One can also provide hash databases of files
       that are known to be good and can be ignored, and of files that are
       known to be bad and should raise an alert.

       By default, the program uses the configuration files in the directory
       where The Sleuth Kit was installed.  Those can be overridden with
       run-time options.  There is a standard configuration file for all file
       system types and a specific one for a given operating system.

ARGUMENTS

       The required arguments are as follows.  sorter analyzes one or more
       images and either saves the results in the '-d' directory or lists the
       results to STDOUT (if '-l' is given).

       -d dir Specify the directory where all files should be written.  This
              includes the index files and, if the '-s' flag is given, the
              subdirectories.  This MUST be given unless the '-l' list flag
              is given.

       -l     List information to STDOUT (no files are ever written).  This is
              useful for incident response, with the use of 'netcat'.  This
              cannot be used if '-d' is used.

       image [images]
              The disk or partition image to read, whose format is given with
              '-i'.  Multiple image file names can be given if the image is
              split into multiple segments.  If only one image file is given
              and its name is the first in a sequence (e.g., as indicated by
              ending in '.001'), subsequent image segments will be included
              automatically.

       The options are as follows:

       -f fstype
              Specify the file system type of the image(s).  This is the same
              type that The Sleuth Kit uses.

       -i imgtype
              Specify the image type in which the file system is located.
              This is the same type that The Sleuth Kit uses.

       -o imgoffset
              Specify the sector offset from the beginning of the image to the
              start of the file system.

       -b size
              Specify the minimum size of file to process.  All files smaller
              than this size will be ignored.

       -c config
              Specify the location of an additional configuration file.  This
              file will be loaded in addition to the standard ones in the
              install directory.  Its settings will have priority over the
              standard files.

       -C config
              Specify the location of the ONLY configuration file.  The
              standard config files will not be loaded if this option is
              given.  For example, in the 'share/sort' directory there is a
              file called 'images.sort'.  This file contains only rules about
              graphic images.  If it is specified with -C, then only data
              about graphic images will be saved from the image.

       -m mnt Specify the mount point of the image being analyzed.  This is
              only for cosmetic reasons.  When the entries in the output files
              are written, the files will have the full path instead of just
              the relative path.  If this is given, then only one image can be
              given.

       -a hash_alert
              Specify the location of a hash database with entries of known
              'bad' files.  If any file is found with an MD5 hash value in
              this database, it will be placed in a special alert file.  This
              database must have been indexed for MD5 using 'hfind' in The
              Sleuth Kit before it is used by sorter.

       -n nsrl_db
              Specify the location of the NIST National Software Reference
              Library (NSRL) database (www.nsrl.nist.gov).  Any file found in
              the NSRL will be ignored and not placed into a category.  The
              database must be indexed for MD5 with 'hfind' in The Sleuth Kit
              before it is used by sorter.  The database file is currently
              called 'NSRLFile.txt'.

       -x hash_exclude
              Specify the location of a hash database with entries of known
              'good' files.  If any file is found with an MD5 hash value in
              this database, it will be ignored and not processed or saved to
              the category files.  This database must have been indexed for
              MD5 using 'hfind' in The Sleuth Kit before it is used by sorter.

       -e     Perform extension mismatch checks only (no category index files
              are generated).

       -U     Do not save data about unknown file types.  By default, an
              'unknown' file is created for files where the 'file' output is
              not known.  This allows one to refine their configuration.  If
              this is not desired, use this flag.

       -h     Create category files in HTML.

       -md5   Calculate the MD5 value for each file and save it in the
              category file.  This will be done automatically when any of the
              databases are given.

       -sha1  Calculate the SHA-1 value for each file and save it in the
              category file.

       -s     Save the actual file content to subdirectories in the directory
              specified by '-d'.  For example, all JPG and GIF files would
              be saved in the 'images' directory.  If '-h' is also given,
              thumbnails of graphic images are also created.

       -v     Display verbose information.

       -V     Display version.

       [meta_addr]
              The metadata address of the directory to start with.  By
              default, the root directory is used.  If this is given, then
              only one image can be given.

HIGH-LEVEL OVERVIEW OF PROCESS

       sorter is a Perl script that interacts with other tools in The Sleuth
       Kit.  It starts by reading the configuration files from the
       installation directory.  There is a general configuration file and a
       specific one for each operating system.  The specific one is determined
       from the '-f' flag.  Each configuration file contains rules for
       processing the output of the 'file' command.  One type of rule uses a
       regular expression to identify which category (e.g., 'images') a given
       'file' output (e.g., 'image data') belongs to.  Another type of rule
       lists the file extensions (e.g., .txt) that belong to a 'file' output
       (e.g., ASCII(.*?)text).  See the CONFIGURATION FILES section below.

       The program then runs the 'fls' tool in The Sleuth Kit to identify the
       files in the file system image.  The content of each identified file
       is read using the 'icat' tool.  If a hash database is given, the hash
       of the file is calculated and looked up.  If it is found in an 'alert'
       database, then it is added to a special 'alert.txt' file.  If it is
       found in the NSRL or 'exclude' database, then it is ignored as a known
       good file.  Excluded files are recorded in an 'exclude' file for future
       reference, but they are not saved in the category files.
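
       The hash lookup step above can be pictured with the following
       minimal Python sketch.  It is an illustration only, not sorter's
       own code: sorter is written in Perl and looks hashes up through
       'hfind' against indexed databases, whereas this sketch assumes
       small in-memory sets.  The function names, the outcome strings,
       and the choice to check the alert database first are all
       assumptions made for the example.

```python
import hashlib


def md5_hex(content: bytes) -> str:
    """Return the lowercase hexadecimal MD5 digest of a file's content."""
    return hashlib.md5(content).hexdigest()


def triage(content: bytes, alert: set, exclude: set, nsrl: set) -> str:
    """Classify content as 'alert', 'exclude', or 'categorize'.

    Hypothetical helper: the hash sets stand in for the indexed
    databases given with -a, -x, and -n.
    """
    digest = md5_hex(content)
    if digest in alert:
        return "alert"        # known bad: would be listed in the alert file
    if digest in nsrl or digest in exclude:
        return "exclude"      # known good: skipped, noted in the exclude file
    return "categorize"       # not in any database: continue to 'file' rules


# Made-up contents standing in for files read from the image.
bad = b"pretend rootkit"
good = b"pretend system file"
alert_db = {md5_hex(bad)}
exclude_db = {md5_hex(good)}

for sample in (bad, good, b"something else"):
    print(triage(sample, alert_db, exclude_db, set()))
```

       The text above does not say whether an alerted file is also
       categorized, so the sketch simply reports the first outcome that
       applies.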

       The 'file' command is then run to identify the file type (based on
       header information).  The configuration file rules are used to identify
       which category it belongs to.  An entry is added to the corresponding
       category file (in the '-d dir' directory).  If the '-s' flag is given,
       then a copy of the file is saved in a subdirectory with the same name
       as the category.  If the HTML format is used, then hyperlinks allow
       one to easily view saved files and see what is in each category.

       Files that do not have a category are recorded in the 'unknown'
       category or the 'data' category.  'data' is for files with a structure
       that 'file' does not know, and 'unknown' is for files with a structure
       that 'file' knows about but that no category rule matches.  These are
       saved for future reference, but the unknown category can be omitted by
       using the '-U' flag.

       A copy of the files can be saved by using the '-s' flag.  If so, then
       the files are saved in a subdirectory that is named with the category
       name.  Each file is named using the file system image name followed by
       the metadata address and the original file extension.  The category
       index file can be used to translate the actual name to the saved name.
       The HTML format makes viewing easier as there are links to each file
       from the category index file.

       The program also consults the rules about the file extension.  If
       the file has an extension (anything after a '.'), it is compared to
       the rules.  If the extension is not found in the rules as a valid
       extension for the file type, the file is added to the 'mismatch' file.
       If the file does not have an extension, it is not recorded as a
       mismatch even if the file type has valid extensions.  This check is
       done even if the file is found in one of the known good hash databases.
       If a mismatched file is found in one of those databases, it is added
       to a special file.  Files of type 'data' have no extension checks done
       by default (as they have an unknown structure).

       The program repeats the above procedures using the output of the 'ils'
       command as well.  This allows 'sorter' to examine the contents of
       unallocated files that still have pointers to the data units (not all
       file systems will produce data from this step).

CONFIGURATION FILES

       Configuration files are used to define what file types belong in which
       categories and what extensions belong to what file types.
       Configuration files are distributed with the 'sorter' tool and are
       located in the installation directory in the 'share/sorter' directory.

       The 'default.sort' file is used for any file system type.  It contains
       entries for common file types.  A specific operating system file also
       exists, which is useful for extensions that are specific to a given OS.
       By default, the default file and the OS-specific one will be used.
       Using the '-c' flag, an additional file can be used.  If the '-C' flag
       is used, then only the supplied configuration file is used.

       There are two rule types in the configuration files.  Each rule starts
       with a header that specifies which rule type it is (category or ext).
       Both rule types have two additional columns, which can be separated by
       any white space.

       The category rule has the category name as the second column and a Perl
       regular expression in the third column.  The category name cannot
       contain spaces and may consist only of letters and numbers.  The
       regular expression is used to examine the output of 'file' and is
       applied case-insensitively.  More than one rule can exist for a
       category, but only one category can exist for a given file output.
       For example:

       This saves all file output with 'image data' anywhere in it to the
       'images' category:
           category        images          image data

       This saves all file output that has 'ASCII' followed by anything and
       then 'text' to the 'text' category:
           category        text            ASCII(.*?)text

       This saves all file output that is just 'data' to the 'data' category
       (the ^ and $ define the boundaries in Perl).  The 'data' value is
       common in the output of 'file' for unknown binary data.
           category        data            ^data$

       There is a special category of 'ignore' that is used to skip over files
       of this type.  This is mainly a time and space saver.
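
       As a rough illustration of how category rules could be applied to
       'file' output, here is a minimal Python sketch.  It is not
       sorter's implementation: sorter uses Perl regular expressions,
       and the function names, the "first matching rule wins" order, and
       the sample 'file' output strings are assumptions made for the
       example.

```python
import re

# Rules in the three-column layout described above.
RULES_TEXT = """\
category        images          image data
category        text            ASCII(.*?)text
category        data            ^data$
"""


def parse_category_rules(text):
    """Return a list of (category, compiled regex) pairs in file order."""
    rules = []
    for line in text.splitlines():
        # Split into header, category name, and the rest (the regex),
        # so that a regex containing spaces stays intact.
        fields = line.split(None, 2)
        if len(fields) == 3 and fields[0] == "category":
            pattern = re.compile(fields[2].strip(), re.IGNORECASE)
            rules.append((fields[1], pattern))
    return rules


def categorize(file_output, rules):
    """Return the first category whose regex matches, or None."""
    for name, pattern in rules:
        if pattern.search(file_output):
            return name
    return None


rules = parse_category_rules(RULES_TEXT)
for output in ("JPEG image data, JFIF standard 1.01",
               "ASCII English text",
               "data",
               "MPEG ADTS, layer III"):
    print(output, "->", categorize(output, rules))
```

       An output that matches no rule comes back as None here, which
       corresponds loosely to the 'unknown' category described earlier.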

       The extension rule is similar, except that the second column lists the
       valid extensions for the file output.  Multiple rules can exist for the
       same file type.  The comparison is case-insensitive.  If no extension
       is valid for the file type, a rule does not need to be made; that is
       already assumed.

       For example, ASCII text is used for several file extensions, so the
       following rules could exist:

           ext             txt,log         ASCII(.*?)text
           ext             c,cpp,h,js      ASCII(.*?)text
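
       The extension check can be sketched in the same way.  The Python
       below is a simplified reading of the rules above, not sorter's
       code: the function names are hypothetical, the extensions from
       all matching rules are merged, a file type with no matching rule
       is treated as having no valid extensions, and the default
       exemption for files of type 'data' is left out.

```python
import os
import re

EXT_RULES_TEXT = """\
ext             txt,log         ASCII(.*?)text
ext             c,cpp,h,js      ASCII(.*?)text
"""


def parse_ext_rules(text):
    """Return a list of (set of extensions, compiled regex) pairs."""
    rules = []
    for line in text.splitlines():
        fields = line.split(None, 2)
        if len(fields) == 3 and fields[0] == "ext":
            exts = {e.lower() for e in fields[1].split(",")}
            rules.append((exts, re.compile(fields[2].strip(), re.IGNORECASE)))
    return rules


def is_mismatch(file_name, file_output, rules):
    """Return True if the name's extension is not valid for the file type."""
    ext = os.path.splitext(file_name)[1].lstrip(".").lower()
    if not ext:
        return False  # files without an extension are not reported
    valid = set()
    for exts, pattern in rules:
        if pattern.search(file_output):
            valid |= exts  # several rules may cover the same file type
    return ext not in valid


ext_rules = parse_ext_rules(EXT_RULES_TEXT)
print(is_mismatch("notes.TXT", "ASCII English text", ext_rules))
print(is_mismatch("photo.jpg", "ASCII text", ext_rules))
print(is_mismatch("README", "ASCII text", ext_rules))
```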

       Please email me any rules that you find useful for standard
       investigations and I will incorporate them into future releases
       (carrier at sleuthkit dot org).

EXAMPLES

       To run sorter with no hash databases, the following can be used:

           # sorter -f ntfs -d data/sorter images/hda1.dd
           # sorter -d data/sorter images/hda1.dd

           # sorter -i raw -f ntfs -o 63 -d data/sorter images/hda.dd

       To include the NSRL, an exclude, and an alert hash database:

           # sorter -f ntfs -d data/sorter -a /usr/hash/rootkit.db \
               -x /usr/hash/win2k.db -n /usr/hash/nsrl/NSRLFile.txt \
               images/hda1.dd

       To just identify images using the supplied 'images.sort' file:

           # sorter -f ntfs -C /usr/local/sleuthkit/share/sort/images.sort \
               -d data/sorter -h -s images/hda1.dd

REQUIREMENTS

       The NIST National Software Reference Library (NSRL) can be found at
       www.nsrl.nist.gov.

LICENSE

       Distributed under the Common Public License, found in the cpl1.0.txt
       file in The Sleuth Kit licenses directory.

AUTHOR

       Brian Carrier <carrier at sleuthkit dot org>

       Send documentation updates to <doc-updates at sleuthkit dot org>

                                                                     SORTER(1)