SORTER(1)                   General Commands Manual                  SORTER(1)

NAME

       sorter - Sort files in an image into categories based on file type

SYNOPSIS

       sorter [-b size] [-e] [-E] [-h] [-l] [-md5] [-s] [-sha1] [-U] [-v] [-V]
       [-a hash_alert] [-c config] [-C config] [-d dir] [-m mnt] [-n nsrl_db]
       [-x hash_exclude] [-i imgtype] [-o imgoffset] [-f fstype] image
       [image] [meta_addr]

DESCRIPTION

       sorter is a Perl script that analyzes a file system to organize the
       allocated and unallocated files by file type.  It runs the 'file'
       command on each file and organizes the files according to the rules in
       configuration files.  Extension mismatching is also done to identify
       'hidden' files.  One can also provide hash databases for files that are
       known to be good and can be ignored and for files that are known to be
       bad and should trigger an alert.

       By default, the program uses the configuration files in the directory
       where The Sleuth Kit was installed.  Those can be overridden with
       run-time options.  There is a standard configuration file for all file
       system types and then a specific one for a given operating system.

ARGUMENTS

       The required arguments are as follows.  This will analyze one or more
       images and either save the results in the '-d' directory or list the
       results to STDOUT (if '-l' is given).

       -d dir Specify the directory where all files should be written.  This
              includes the index files and, if the '-s' flag is given, the
              subdirectories.  This MUST be given, unless the '-l' list flag
              is given.

       -l     List information to STDOUT (no files are ever written).  This is
              useful for Incident Response, with the use of 'netcat'.  This
              cannot be used if '-d' is used.

       images The file names of the image(s) to analyze.

       The options are as follows:

       -f fstype
              Specify the file system type of the image(s).  This is the same
              type that The Sleuth Kit uses.

       -i imgtype
              Specify the image type in which the file system is located.
              This is the same type that The Sleuth Kit uses.

       -o imgoffset
              Specify the sector offset from the beginning of the image to the
              start of the file system.

       -b size
              Specify the minimum size of file to process.  All files smaller
              than this size will be ignored.

       -c config
              Specify the location of an additional configuration file.  This
              file will be loaded in addition to the standard ones in the
              install directory.  These settings will have priority over the
              standard files.

       -C config
              Specify the location of the ONLY configuration file.  The
              standard config files will not be loaded if this option is
              given.  For example, in the 'share/sort' directory there is a
              file called 'images.sort'.  This file contains only rules about
              graphic images.  If it is specified with -C, then only data
              about graphic images will be saved.

       -m mnt Specify the mounting point of the image being analyzed.  This is
              only for cosmetic reasons.  When the entries in the output files
              are written, the files will have the full path instead of just
              the relative path.  If this is given, then only one image can be
              given.

       -a hash_alert
              Specify the location of a hash database with entries of known
              'bad' files.  If any file is found with an MD5 hash value in
              this database, it will be placed in a special alert file.  This
              database must have been indexed for MD5 using 'hfind' in The
              Sleuth Kit before it is used by sorter.

       -n nsrl_db
              Specify the location of the NIST National Software Reference
              Library (NSRL) database (www.nsrl.nist.gov).  Any file found in
              the NSRL will be ignored and not placed into a category.  The
              database must be indexed for MD5 with 'hfind' in The Sleuth Kit
              before it is used by sorter.  The database file is currently
              called 'NSRLFile.txt'.

       -x hash_exclude
              Specify the location of a hash database with entries of known
              'good' files.  If any file is found with an MD5 hash value in
              this database, it will be ignored and not processed or saved to
              the category files.  This database must have been indexed for
              MD5 using 'hfind' in The Sleuth Kit before it is used by sorter.

       -e     Perform extension mismatch checks only (no category index files
              are generated).

       -E     Perform category indexing only (no extension mismatch checks).

       -U     Do not save data about unknown file types.  By default, an
              'unknown' file is created for files where the 'file' output is
              not known.  This allows one to refine their configuration.  If
              this is not desired, use this flag.

       -h     Create category files in HTML.

       -md5   Calculate the MD5 value for each file and save it in the
              category file.  This will be done automatically when any of the
              databases are given.

       -sha1  Calculate the SHA-1 value for each file and save it in the
              category file.

       -s     Save the actual file content to sub-directories in the directory
              specified by '-d'.  For example, all JPG and GIF files would
              actually be saved in the 'images' directory.  If '-h' is also
              given, thumbnails of graphic images are also created.

       -v     Display verbose information.

       -V     Display version.

       [meta_addr]
              The meta data address of the directory to start with.  By
              default, the root directory is used.  If this is given, then
              only one image can be given.

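       The constraints stated above ('-d' versus '-l', and the single-image
       limit for '-m' and meta_addr) can be summarized in a short Python
       sketch.  This is only an illustration of the documented rules, not
       sorter's own code; the function name check_arguments and its
       parameters are hypothetical.

```python
def check_arguments(images, out_dir=None, list_only=False,
                    mount=None, meta_addr=None):
    """Return a list of problems with a proposed sorter invocation.

    Hypothetical helper: it only encodes the rules stated in this page.
    """
    errors = []
    if not images:
        errors.append("at least one image is required")
    if list_only and out_dir is not None:
        errors.append("-l cannot be used together with -d")
    if not list_only and out_dir is None:
        errors.append("-d is required unless -l is given")
    if len(images) > 1 and mount is not None:
        errors.append("-m allows only one image")
    if len(images) > 1 and meta_addr is not None:
        errors.append("meta_addr allows only one image")
    return errors


if __name__ == "__main__":
    # A valid call: one image, output directory given.
    print(check_arguments(["images/hda1.dd"], out_dir="data/sorter"))
    # An invalid call: two images combined with a mounting point.
    print(check_arguments(["a.dd", "b.dd"], out_dir="out", mount="/mnt"))
```
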

HIGH-LEVEL OVERVIEW OF PROCESS

       sorter is a Perl script that interacts with other tools in The Sleuth
       Kit.  It starts by reading the configuration files from the
       installation directory.  There is a general configuration file and a
       specific one for each operating system.  The specific one is determined
       from the '-f' flag.  Each configuration file contains rules for
       processing the output of the 'file' command.  One type of rule
       identifies which category (e.g. 'images') a given 'file' output belongs
       to (e.g. 'image data'), using regular expressions.  Another type of
       rule lists the file extensions (e.g. .txt) that belong to a 'file'
       output (e.g. ASCII(.*?)text).  See the CONFIGURATION FILES section
       below.

       The program then runs the 'fls' tool in The Sleuth Kit to identify the
       files in the file system image.  The content of each identified file is
       read using the 'icat' tool.  If a hash database is given, the hash of
       the file is calculated and looked up.  If it is found in an 'alert'
       database, then it is added to a special 'alert.txt' file.  If it is
       found in the NSRL or 'exclude' database, then it is ignored as a known
       good file.  Excluded files are recorded in an 'exclude' file for future
       reference but are not saved in the category files.

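       The hash lookup step can be sketched in Python as follows.  This is a
       simplified illustration under stated assumptions: sorter itself
       queries databases indexed with 'hfind', whereas the sketch uses
       in-memory sets of MD5 strings, and the function name classify_by_hash
       and the order of the two checks are assumptions.

```python
import hashlib


def classify_by_hash(content, alert_hashes, exclude_hashes):
    """Return 'alert', 'exclude', or None for the bytes of one file.

    alert_hashes and exclude_hashes are sets of lowercase MD5 hex strings
    that stand in for the indexed hash databases (an assumption).
    """
    digest = hashlib.md5(content).hexdigest()
    if digest in alert_hashes:
        return "alert"      # known bad: record in the alert file
    if digest in exclude_hashes:
        return "exclude"    # known good: record it, skip the category files
    return None             # not in any database: continue with 'file'


if __name__ == "__main__":
    bad = {hashlib.md5(b"rootkit payload").hexdigest()}
    good = {hashlib.md5(b"stock system file").hexdigest()}
    for data in (b"rootkit payload", b"stock system file", b"user document"):
        print(classify_by_hash(data, bad, good))
```
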
       The 'file' command is then run to identify the file type (based on
       header information).  The configuration file rules are used to identify
       which category it belongs to.  An entry is added to the corresponding
       category file (in the '-d dir' directory).  If the '-s' flag is given,
       then a copy of the file is saved in a subdirectory of the same name as
       the category.  If the HTML format is used, then hyperlinks will allow
       one to easily view saved files and view what is in each category.

       Files that do not have a category are recorded in either the 'unknown'
       category or the 'data' category.  'data' is for files with a structure
       that 'file' does not know and 'unknown' is for files with a structure
       that 'file' knows about.  These are saved for future reference, but the
       unknown category can be ignored by using the '-U' flag.

       A copy of the files can be saved by using the '-s' flag.  If so, then
       the files are saved in a subdirectory that is named with the category
       name.  Each file is named using the file system image name followed by
       the meta data address and the original file extension.  The category
       index file can be used to translate the actual name to the saved name.
       The HTML format makes viewing easier as there are links to each file
       from the category index file.

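       The naming scheme for saved copies can be illustrated with a short
       Python sketch.  The text above only states the order of the parts
       (image name, meta data address, original extension); the '-' separator
       and the function name saved_name are assumptions made for the example.

```python
import os


def saved_name(image_path, meta_addr, original_name, sep="-"):
    """Build the name of a saved copy: image name, address, extension.

    The separator is hypothetical; only the order of the three parts
    comes from the description above.
    """
    image = os.path.basename(image_path)
    _, ext = os.path.splitext(original_name)   # ext keeps its leading '.'
    return f"{image}{sep}{meta_addr}{ext}"


if __name__ == "__main__":
    print(saved_name("images/hda1.dd", 1234, "Documents/photo.JPG"))
    print(saved_name("images/hda1.dd", 77, "README"))
```
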
       The program will also consult the rules about the file extension.  If
       the file has an extension at the end of it (anything after a '.'), it
       will be compared to the rules.  If the extension is not found in the
       rules as a valid extension for the file type, it will be added to the
       'mismatch' file.  If the file does not have an extension, it will not
       be entered even if the file type has valid extensions.  This check is
       done even if the file is found in one of the known good hash databases;
       a mismatched file that is found in one of those databases is added to a
       separate special file.  Files of type 'data' have no extension checks
       done by default (as they have an unknown structure).

       The program repeats the above procedures using the output of the 'ils'
       command as well.  This allows 'sorter' to examine the contents of
       unallocated files that still have pointers to the data units (not all
       file systems will produce data from this step).

CONFIGURATION FILES

       Configuration files are used to define what file types belong in which
       categories and what extensions belong to what file types.
       Configuration files are distributed with the 'sorter' tool and are
       located in the installation directory in the 'share/sorter' directory.

       The 'default.sort' file is used by any file system type.  It contains
       entries for common file types.  A specific operating system file also
       exists, which is useful for extensions that are specific to a given OS.
       By default, the default file and the OS specific one will be used.
       Using the '-c' flag, an additional file can be used.  If the '-C' flag
       is used, then only the supplied configuration file is used.

       There are two rule types in the configuration files.  Each rule starts
       with a header that specifies which rule type it is (category or ext).
       Both rule types have two additional columns that can be separated by
       any white space.

       The category rule has the category name as the second column and a Perl
       regular expression in the third column.  The category name cannot
       contain spaces and may consist only of letters and numbers.  The
       regular expression is used to examine the output of 'file' and is
       applied case-insensitively.  More than one rule can exist for a
       category, but only one category can exist for a given file output.  For
       example:

       This saves all file output with 'image data' anywhere in it to the
       'images' category:
           category        images          image data

       This saves all file output that has 'ASCII' followed by anything and
       then 'text' to the 'text' category:
           category        text            ASCII(.*?)text

       This saves all file output that is just 'data' to the 'data' category
       (the ^ and $ define the boundaries in Perl).  The 'data' value is
       common in the output of file for unknown binary data.
           category        data            ^data$

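       As an illustration of how category rules map 'file' output to a
       category, the following Python sketch parses the three example rules
       and applies them case-insensitively.  It uses Python's re module as a
       stand-in for Perl regular expressions, and the names
       load_category_rules and categorize are hypothetical; taking the first
       matching rule is also an assumption.

```python
import re

RULES = """\
category        images          image data
category        text            ASCII(.*?)text
category        data            ^data$
"""


def load_category_rules(text):
    """Parse 'category <name> <regex>' lines into (name, pattern) pairs."""
    rules = []
    for line in text.splitlines():
        fields = line.split(None, 2)   # header, name, rest of line = regex
        if len(fields) == 3 and fields[0] == "category":
            pattern = re.compile(fields[2].strip(), re.IGNORECASE)
            rules.append((fields[1], pattern))
    return rules


def categorize(file_output, rules):
    """Return the category of the first matching rule, or None."""
    for name, pattern in rules:
        if pattern.search(file_output):
            return name
    return None


if __name__ == "__main__":
    rules = load_category_rules(RULES)
    for output in ("JPEG image data, JFIF standard 1.01",
                   "ASCII English text", "data", "ELF 64-bit LSB executable"):
        print(output, "->", categorize(output, rules))
```
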
       There is a special category of 'ignore' that is used to skip over files
       of this type.  This is mainly a time and space saver.

       The extension rule is similar except that the second column has the
       valid extensions for the file output.  Multiple rules can exist for the
       same file type.  The comparison is case-insensitive.  If no extension
       is valid for the file type, a rule does not need to be made; that is
       already assumed.

       For example, ASCII text is used for several file extensions, so the
       following rules could exist:

           ext             txt,log         ASCII(.*?)text
           ext             c,cpp,h,js      ASCII(.*?)text

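       The extension check can be sketched in Python using the two example
       rules.  This is a rough illustration, not sorter's implementation: the
       names load_ext_rules and is_mismatch are hypothetical, Python's re
       module stands in for Perl regular expressions, and a file type matched
       by no ext rule is treated as having no valid extensions, which is one
       reading of the text above.

```python
import os
import re

EXT_RULES = """\
ext             txt,log         ASCII(.*?)text
ext             c,cpp,h,js      ASCII(.*?)text
"""


def load_ext_rules(text):
    """Parse 'ext <ext,ext,...> <regex>' lines into (set, pattern) pairs."""
    rules = []
    for line in text.splitlines():
        fields = line.split(None, 2)
        if len(fields) == 3 and fields[0] == "ext":
            exts = {e.lower() for e in fields[1].split(",")}
            rules.append((exts, re.compile(fields[2].strip(), re.IGNORECASE)))
    return rules


def is_mismatch(filename, file_output, rules):
    """True if the file has an extension that no matching rule allows."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if not ext:
        return False          # files without an extension are not reported
    valid = set()
    for exts, pattern in rules:
        if pattern.search(file_output):
            valid |= exts     # several rules may apply to one file type
    return ext not in valid


if __name__ == "__main__":
    rules = load_ext_rules(EXT_RULES)
    print(is_mismatch("notes.txt", "ASCII English text", rules))
    print(is_mismatch("notes.jpg", "ASCII English text", rules))
```
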
       Please email me any rules that you find useful for standard
       investigations and I will incorporate them into future releases
       (carrier at sleuthkit dot org).

EXAMPLES

       To run sorter with no hash databases, the following can be used:

           # sorter -f ntfs -d data/sorter images/hda1.dd
           # sorter -d data/sorter images/hda1.dd

           # sorter -i raw -f ntfs -o 63 -d data/sorter images/hda.dd

       To include the NSRL, an exclude, and an alert hash database:

           # sorter -f ntfs -d data/sorter -a /usr/hash/rootkit.db \
               -x /usr/hash/win2k.db -n /usr/hash/nsrl/NSRLFile.txt \
               images/hda1.dd

       To just identify images using the supplied 'images.sort' file:

           # sorter -f ntfs -C /usr/local/sleuthkit/share/sort/images.sort \
               -d data/sorter -h -s images/hda1.dd

REQUIREMENTS

       The NIST National Software Reference Library (NSRL) can be found at
       www.nsrl.nist.gov.

LICENSE

       Distributed under the Common Public License, found in the cpl1.0.txt
       file in The Sleuth Kit licenses directory.

AUTHOR

       Brian Carrier <carrier at sleuthkit dot org>

       Send documentation updates to <doc-updates at sleuthkit dot org>

                                                                     SORTER(1)