agedu(1)                         Simon Tatham                         agedu(1)

NAME

       agedu - correlate disk usage with last-access times to identify large
       and disused data

SYNOPSIS

       agedu [ options ] action [action...]

DESCRIPTION

       agedu scans a directory tree and produces reports about how much disk
       space is used in each directory and subdirectory, and also how that
       usage of disk space corresponds to files with last-access times a long
       time ago.

       In other words, agedu is a tool you might use to help you free up disk
       space. It lets you see which directories are taking up the most space,
       as du does; but unlike du, it also distinguishes between large
       collections of data which are still in use and ones which have not
       been accessed in months or years - for instance, large archives
       downloaded, unpacked, used once, and never cleaned up. Where du helps
       you find what's using your disk space, agedu helps you find what's
       wasting your disk space.

       agedu has several operating modes. In one mode, it scans your disk and
       builds an index file containing a data structure which allows it to
       efficiently retrieve any information it might need. Typically, you
       would use it in this mode first, and then run it in one of a number of
       `query' modes to display a report of the disk space usage of a
       particular directory and its subdirectories. Those reports can be
       produced as plain text (much like du) or as HTML. agedu can even run
       as a miniature web server, presenting each directory's HTML report
       with hyperlinks to let you navigate around the file system to similar
       reports for other directories.

       So you would typically start using agedu by telling it to do a scan of
       a directory tree and build an index. This is done with a command such
       as

       $ agedu -s /home/fred

       which will build a large data file called agedu.dat in your current
       directory. (If that current directory is inside /home/fred, don't
       worry - agedu is smart enough to discount its own index file.)

       Having built the index, you would now query it for reports of disk
       space usage. If you have a graphical web browser, the simplest and
       nicest way to query the index is by running agedu in web server mode:

       $ agedu -w

       which will print (among other messages) a URL on its standard output
       along the lines of

       URL: http://127.0.0.1:48638/

       (That URL will always begin with `127.', meaning that it's in the
       localhost address space. So only processes running on the same
       computer can even try to connect to that web server, and also there is
       access control to prevent other users from seeing it - see below for
       more detail.)

       Now paste that URL into your web browser, and you will be shown a
       graphical representation of the disk usage in /home/fred and its
       immediate subdirectories, with varying colours used to show the
       difference between disused and recently-accessed data. Click on any
       subdirectory to descend into it and see a report for its
       subdirectories in turn; click on parts of the pathname at the top of
       any page to return to higher-level directories. When you've finished
       browsing, you can just press Ctrl-D to send an end-of-file indication
       to agedu, and it will shut down.

       After that, you probably want to delete the data file agedu.dat, since
       it's pretty large. In fact, the command agedu -R will do this for you;
       and you can chain agedu commands on the same command line, so that
       instead of the above you could have done

       $ agedu -s /home/fred -w -R

       for a single self-contained run of agedu which builds its index,
       serves web pages from it, and cleans it up when finished.

       If you don't have a graphical web browser, you can do text-based
       queries as well. Having scanned /home/fred as above, you might run

       $ agedu -t /home/fred

       which again gives a summary of the disk usage in /home/fred and its
       immediate subdirectories; but this time agedu will print it on
       standard output, in much the same format as du. If you then want to
       find out how much old data is there, you can add the -a option to show
       only files last accessed a certain length of time ago. For example, to
       show only files which haven't been looked at in six months or more:

       $ agedu -t /home/fred -a 6m

       That's the essence of what agedu does. It has other modes of operation
       for more complex situations, and the usual array of configurable
       options. The following sections contain a complete reference for all
       its functionality.
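
       The scan, report and clean-up steps described above can be wrapped in
       a small shell function. This is only a sketch: the function name
       agedu_old_report, its argument order and the temporary index location
       are made up for illustration. It relies only on the -f, -s, -t, -a and
       -R options documented in this page.

```sh
# Hypothetical helper: scan a tree, print the data that has not been
# accessed for a given age, then delete the index again.
# Usage: agedu_old_report DIRECTORY [AGE]     (AGE defaults to 6m)
agedu_old_report() {
    dir=$1
    age=${2:-6m}
    # Keep the index out of the current directory (this path is an assumption).
    index="${TMPDIR:-/tmp}/agedu-report.$$.dat"

    if ! command -v agedu >/dev/null 2>&1; then
        echo "agedu is not installed" >&2
        return 127
    fi

    agedu -f "$index" -s "$dir" &&            # build the index
        agedu -f "$index" -t "$dir" -a "$age" # text report of old data only
    status=$?

    # Remove the index if the scan got far enough to create one.
    if [ -f "$index" ]; then
        agedu -f "$index" -R
    fi
    return "$status"
}
```

       For example, agedu_old_report /home/fred 1y would report on data in
       /home/fred that has not been accessed for a year or more.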

OPERATING MODES

       This section describes the operating modes supported by agedu. Each of
       these is in the form of a command-line option, sometimes with an
       argument. Multiple operating-mode options may appear on the command
       line, in which case agedu will perform the specified actions one after
       another. For instance, as shown in the previous section, you might
       want to perform a disk scan and immediately launch a web server giving
       reports from that scan.

       -s directory or --scan directory
              In this mode, agedu scans the file system starting at the
              specified directory, and indexes the results of the scan into a
              large data file which other operating modes can query.

              By default, the scan is restricted to a single file system
              (since the expected use of agedu is that you would probably use
              it because a particular disk partition was running low on
              space). You can remove that restriction using the --cross-fs
              option; other configuration options allow you to include or
              exclude files or entire subdirectories from the scan. See the
              next section for full details of the configurable options.

              The index file is created with restrictive permissions, in case
              the file system you are scanning contains confidential
              information in its structure.

              Index files are dependent on the characteristics of the CPU
              architecture you created them on. You should not expect to be
              able to move an index file between different types of computer
              and have it continue to work. If you need to transfer the
              results of a disk scan to a different kind of computer, see the
              -D and -L options below.

       -w or --web
              In this mode, agedu expects to find an index file already
              written. It allocates a network port, and starts up a web
              server on that port which serves reports generated from the
              index file. By default it invents its own URL and prints it
              out.

              The web server runs until agedu receives an end-of-file event on
              its standard input. (The expected usage is that you run it from
              the command line, immediately browse web pages until you're
              satisfied, and then press Ctrl-D.) To disable the EOF
              behaviour, use the --no-eof option.

              In case the index file contains any confidential information
              about your file system, the web server protects the pages it
              serves from access by other people. On Linux, this is done
              transparently by means of using /proc/net/tcp to check the owner
              of each incoming connection; failing that, the web server will
              require a password to view the reports, and agedu will print the
              password it invented on standard output along with the URL.

              Configurable options for this mode let you specify your own
              address and port number to listen on, and also specify your own
              choice of authentication method (including turning
              authentication off completely) and a username and password of
              your choice.

       -t directory or --text directory
              In this mode, agedu generates a textual report on standard
              output, listing the disk usage in the specified directory and
              all its subdirectories down to a given depth. By default that
              depth is 1, so that you see a report for directory itself and
              all of its immediate subdirectories. You can configure a
              different depth (or no depth limit) using -d, described in the
              next section.

              Used on its own, -t merely lists the total disk usage in each
              subdirectory; agedu's additional ability to distinguish unused
              from recently-used data is not activated. To activate it, use
              the -a option to specify a minimum age.

              The directory structure stored in agedu's index file is treated
              as a set of literal strings. This means that you cannot refer to
              directories by synonyms. So if you ran agedu -s ., then all the
              path names you later pass to the -t option must be either `.' or
              begin with `./'. Similarly, symbolic links within the directory
              you scanned will not be followed; you must refer to each
              directory by its canonical, symlink-free pathname.

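       Because index paths are literal strings, a query path is only found if
       it is spelled with the same prefix that was given to -s. The check
       below is a rough sketch of that rule, not agedu's actual lookup code;
       the function name is hypothetical.

```sh
# Hypothetical check: does QUERY use the same literal spelling as the
# SCAN_ROOT that was passed to "agedu -s"?  This is purely a string
# comparison; no symlinks are resolved and no paths are normalised.
spelled_like_scan_root() {
    scan_root=$1
    query=$2
    case $query in
        "$scan_root" | "$scan_root"/*) return 0 ;;
        *) return 1 ;;
    esac
}

spelled_like_scan_root . ./src      && echo "./src: same spelling as the scan"
spelled_like_scan_root . "$PWD/src" || echo "$PWD/src: different spelling"
```

       After agedu -s ., the first form is the one to pass to -t; the
       absolute path names the same directory but does not appear in the
       index under that name.
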
       -R or --remove
              In this mode, agedu deletes its index file. Running just agedu
              -R on its own is therefore equivalent to typing rm agedu.dat.
              However, you can also put -R on the end of a command line to
              indicate that agedu should delete its index file after it
              finishes performing other operations.

       -D or --dump
              In this mode, agedu reads an existing index file and produces a
              dump of its contents on standard output. This dump can later be
              loaded into a new index file, perhaps on another computer.

       -L or --load
              In this mode, agedu expects to read a dump produced by the -D
              option from its standard input. It constructs an index file from
              that dump, exactly as it would have if it had read the same data
              from a disk scan in -s mode.

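       A dump is the way to move scan results between machines whose index
       formats differ. The sketch below only defines two thin wrappers around
       -D and -L; the function names and file arguments are hypothetical, and
       how the dump is copied between machines is left to you.

```sh
# On the machine that owns the index: write a portable dump of it.
# Usage: agedu_export INDEX_FILE DUMP_FILE
agedu_export() {
    agedu -f "$1" -D > "$2"
}

# On the other machine: rebuild a native index from that dump.
# Usage: agedu_import DUMP_FILE INDEX_FILE
agedu_import() {
    agedu -f "$2" -L < "$1"
}
```

       A typical sequence would be agedu_export agedu.dat home.dump on the
       first machine, then agedu_import home.dump agedu.dat on the second
       once the dump has been copied across.
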
       -S directory or --scan-dump directory
              In this mode, agedu will scan a directory tree and convert the
              results straight into a dump on standard output, without
              generating an index file at all. So running agedu -S /path
              should produce equivalent output to that of agedu -s /path -D,
              except that the latter will produce an index file as a side
              effect whereas -S will not.

              (The output will not be exactly identical, due to a difference
              in treatment of last-access times on directories. However, it
              should be effectively equivalent for most purposes. See the
              documentation of the --dir-atime option in the next section for
              further detail.)

       -H directory or --html directory
              In this mode, agedu will generate an HTML report of the disk
              usage in the specified directory and its immediate
              subdirectories, in the same form that it serves from its web
              server in -w mode.

              By default, a single HTML report will be generated and simply
              written to standard output, with no hyperlinks pointing to other
              similar pages. If you also specify the -d option (see below),
              agedu will instead write out a collection of HTML files with
              hyperlinks between them, and call the top-level file index.html.

       --cgi  In this mode, agedu will run as the bulk of a CGI script which
              provides the same set of web pages as the built-in web server
              would. It will read the usual CGI environment variables, and
              write CGI-style data to its standard output.

              The actual CGI program itself should be a tiny wrapper around
              agedu which passes it the --cgi option, and also (probably) -f
              to locate the index file. agedu will do everything else.

              No access control is performed in this mode: restricting access
              to CGI scripts is assumed to be the job of the web server.
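
       The wrapper mentioned under --cgi can be as small as two lines. The
       sketch below writes one out. The wrapper location and the index path
       /var/lib/agedu/home.dat are placeholders, and the wrapper assumes that
       agedu is on the web server's PATH.

```sh
# Write a minimal CGI wrapper script (both file names are examples only).
wrapper="${TMPDIR:-/tmp}/agedu.cgi"
cat > "$wrapper" <<'EOF'
#!/bin/sh
# Hand the request to agedu, which reads the CGI environment itself.
exec agedu --cgi -f /var/lib/agedu/home.dat
EOF
chmod 755 "$wrapper"
```

       The resulting file would then be installed wherever your web server
       looks for CGI programs, with access restricted by the web server's own
       configuration.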

OPTIONS

       This section describes the various configuration options that affect
       agedu's operation in one mode or another.

       The following option affects nearly all modes (except -S):

       -f filename or --file filename
              Specifies the location of the index file which agedu creates,
              reads or removes depending on its operating mode. By default,
              this is simply `agedu.dat', in whatever is the current working
              directory when you run agedu.

       The following options affect the disk-scanning modes, -s and -S:

       --cross-fs and --no-cross-fs
              These configure whether or not the disk scan is permitted to
              cross between different file systems. The default is not to:
              agedu will normally skip over subdirectories on which a
              different file system is mounted. This makes it convenient when
              you want to free up space on a particular file system which is
              running low. However, in other circumstances you might wish to
              see general information about the use of space no matter which
              file system it's on (for instance, if your real concern is your
              backup media running out of space, and if your backups do not
              treat different file systems specially); in that situation, use
              --cross-fs.

              (Note that this default is the opposite way round from the
              corresponding option in du.)

       --prune wildcard and --prune-path wildcard
              These cause particular files or directories to be omitted
              entirely from the scan. If agedu's scan encounters a file or
              directory whose name matches the wildcard provided to the
              --prune option, it will not include that file in its index, and
              also if it's a directory it will skip over it and not scan its
              contents.

              Note that in most Unix shells, wildcards will probably need to
              be escaped on the command line, to prevent the shell from
              expanding the wildcard before agedu sees it.

              --prune-path is similar to --prune, except that the wildcard is
              matched against the entire pathname instead of just the filename
              at the end of it. So whereas --prune *a*b* will match any file
              whose actual name contains an a somewhere before a b,
              --prune-path *a*b* will also match a file whose name contains b
              and which is inside a directory containing an a, or any file
              inside a directory of that form, and so on.

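       The difference between the two matching rules can be tried out with
       the shell's own wildcard matching. This is a sketch under the
       assumption that shell patterns behave closely enough to agedu's
       wildcards for this example; the function names are hypothetical.

```sh
# --prune: compare the wildcard with the last pathname component only.
# Usage: matches_prune WILDCARD PATHNAME
matches_prune() {
    case ${2##*/} in
        $1) return 0 ;;
    esac
    return 1
}

# --prune-path: compare the wildcard with the whole pathname.
# Usage: matches_prune_path WILDCARD PATHNAME
matches_prune_path() {
    case $2 in
        $1) return 0 ;;
    esac
    return 1
}

# The file name "b.txt" has no "a" before its "b", but the full path does.
matches_prune      '*a*b*' ./alpha/b.txt || echo "--prune: no match"
matches_prune_path '*a*b*' ./alpha/b.txt && echo "--prune-path: match"
```

       Note the single quotes around the wildcard, which stop the shell from
       expanding it before the function (or agedu) sees it.
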
       --exclude wildcard and --exclude-path wildcard
              These cause particular files or directories to be omitted from
              the index, but not from the scan. If agedu's scan encounters a
              file or directory whose name matches the wildcard provided to
              the --exclude option, it will not include that file in its index
              - but unlike --prune, if the file in question is a directory it
              will still scan its contents and index them if they are not
              ruled out themselves by --exclude options.

              As above, --exclude-path is similar to --exclude, except that
              the wildcard is matched against the entire pathname.

       --include wildcard and --include-path wildcard
              These cause particular files or directories to be re-included in
              the index and the scan, if they had previously been ruled out by
              one of the above exclude or prune options. You can interleave
              include, exclude and prune options as you wish on the command
              line, and if more than one of them applies to a file then the
              last one takes priority.

              For example, if you wanted to see only the disk space taken up
              by MP3 files, you might run

              $ agedu -s . --exclude '*' --include '*.mp3'

              which will cause everything to be omitted from the scan, but
              then the MP3 files to be put back in. If you then wanted only a
              subset of those MP3s, you could then exclude some of them again
              by adding, say, `--exclude-path './queen/*'' (or, more
              efficiently, `--prune ./queen') on the end of that command.

              As with the previous two options, --include-path is similar to
              --include except that the wildcard is matched against the entire
              pathname.

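       The rule that the last applicable option takes priority can be
       modelled in a few lines of shell. This is an illustration only: it
       uses shell wildcards as a stand-in for agedu's, handles only the
       name-based include and exclude forms, and does not model the way
       --prune skips directory contents. The function name is hypothetical.

```sh
# Usage: is_indexed NAME [include|exclude WILDCARD]...
# Succeeds if NAME would end up in the index under "last match wins".
is_indexed() {
    name=$1
    shift
    verdict=include                 # nothing is ruled out by default
    while [ "$#" -ge 2 ]; do
        case $name in
            $2) verdict=$1 ;;       # a later match overrides an earlier one
        esac
        shift 2
    done
    [ "$verdict" = include ]
}

is_indexed song.mp3  exclude '*' include '*.mp3' && echo "song.mp3 is indexed"
is_indexed notes.txt exclude '*' include '*.mp3' || echo "notes.txt is left out"
```

       Swapping the two rules round would leave nothing indexed, since the
       final exclude of `*' would then override the earlier include.
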
       --progress, --no-progress and --tty-progress
              When agedu is scanning a directory tree, it will typically print
              a one-line progress report every second showing where it has
              reached in the scan, so you can have some idea of how much
              longer it will take. (Of course, it can't predict exactly how
              long it will take, since it doesn't know which of the
              directories it hasn't scanned yet will turn out to be huge.)

              By default, those progress reports are displayed on agedu's
              standard error channel, if that channel points to a terminal
              device. If you need to manually enable or disable them, you can
              use the above three options to do so: --progress unconditionally
              enables the progress reports, --no-progress unconditionally
              disables them, and --tty-progress reverts to the default
              behaviour which is conditional on standard error being a
              terminal.

       --dir-atime and --no-dir-atime
              In normal operation, agedu ignores the atimes (last access
              times) on the directories it scans: it only pays attention to
              the atimes of the files inside those directories. This is
              because directory atimes tend to be reset by a lot of system
              administrative tasks, such as cron jobs which scan the file
              system for one reason or another - or even other invocations of
              agedu itself, though it tries to avoid modifying any atimes if
              possible. So the literal atimes on directories are typically not
              representative of how long ago the data in question was last
              accessed with real intent to use that data in particular.

              Instead, agedu makes up a fake atime for every directory it
              scans, which is equal to the newest atime of any file in or
              below that directory (or the directory's last modification time,
              whichever is newer). This is based on the assumption that all
              important accesses to directories are actually accesses to the
              files inside those directories, so that when any file is
              accessed all the directories on the path leading to it should be
              considered to have been accessed as well.

              In unusual cases it is possible that a directory itself might
              embody important data which is accessed by reading the
              directory. In that situation, agedu's atime-faking policy will
              misreport the directory as disused. In the unlikely event that
              such directories form a significant part of your disk space
              usage, you might want to turn off the faking. The --dir-atime
              option does this: it causes the disk scan to read the original
              atimes of the directories it scans.

              The faking of atimes on directories also requires a processing
              pass over the index file after the main disk scan is complete.
              --dir-atime also turns this pass off. Hence, this option affects
              the -L option as well as -s and -S.

              (The previous section mentioned that there might be subtle
              differences between the output of agedu -s /path -D and agedu
              -S /path. This is why. Doing a scan with -s and then dumping it
              with -D will dump the fully faked atimes on the directories,
              whereas doing a scan-to-dump with -S will dump only partially
              faked atimes - specifically, each directory's last modification
              time - since the subsequent processing pass will not have had a
              chance to take place. However, loading either of the resulting
              dump files with -L will perform the atime-faking processing
              pass, leading to the same data in the index file in each case.
              In normal usage it should be safe to ignore all of this
              complexity.)

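       The faked directory atime can be approximated outside agedu for a
       single directory. The function below is a rough sketch rather than
       agedu's algorithm: it assumes GNU find (for -printf), looks only at
       regular files, and takes the modification time of the top-level
       directory alone. The function name is hypothetical.

```sh
# Print an approximate "fake atime" for DIR, in seconds since the epoch:
# the newest of
#   - the atime of every regular file in or below DIR, and
#   - DIR's own last modification time.
# Usage: fake_dir_atime DIR
fake_dir_atime() {
    {
        find "$1" -type f -printf '%A@\n'       # atime of each file
        find "$1" -maxdepth 0 -printf '%T@\n'   # mtime of DIR itself
    } | sort -n | tail -n 1
}
```

       Running fake_dir_atime /home/fred/project and comparing the result
       with the directory's real atime gives a feel for how far the two can
       diverge.
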
       --mtime
              This option causes agedu to index files by their last
              modification time instead of their last access time. You might
              want to use this if your last access times were completely
              useless for some reason: for example, if you had recently
              searched every file on your system, the system would have lost
              all the information about what files you hadn't recently
              accessed before then. Using this option is liable to be less
              effective at finding genuinely wasted space than the normal mode
              (that is, it will be more likely to flag things as disused when
              they're not, so you will have more candidates to go through by
              hand looking for data you don't need), but may be better than
              nothing if your last-access times are unhelpful.

              Another use for this mode might be to find recently created
              large data. If your disk has been gradually filling up for
              years, the default mode of agedu will let you find unused data
              to delete; but if you know your disk had plenty of space
              recently and now it's suddenly full, and you suspect that some
              rogue program has left a large core dump or output file, then
              agedu --mtime might be a convenient way to locate the culprit.

       The following option affects all the modes that generate reports: the
       web server mode -w, the stand-alone HTML generation mode -H and the
       text report mode -t.

       --files
              This option causes agedu's reports to list the individual files
              in each directory, instead of just giving a combined report for
              everything that's not in a subdirectory.

       The following options affect the stand-alone HTML generation mode -H
       and the text report mode -t.

       -d depth or --depth depth
              This option controls the maximum depth to which agedu recurses
              when generating a text or HTML report.

              In text mode, the default is 1, meaning that the report will
              include the directory given on the command line and all of its
              immediate subdirectories. A depth of two includes another level
              below that, and so on; a depth of zero means only the directory
              on the command line.

              In HTML mode, specifying this option switches agedu from writing
              out a single HTML file to writing out multiple files which link
              to each other. A depth of 1 means agedu will write out an HTML
              file for the given directory and also one for each of its
              immediate subdirectories.

              If you want agedu to recurse as deeply as possible, give the
              special word `max' as an argument to -d.

       -o filename or --output filename
              This option is used to specify an output file for agedu to write
              its report to. In text mode or single-file HTML mode, the
              argument is treated as the name of a file. In multiple-file HTML
              mode, the argument is treated as the name of a directory: the
              directory will be created if it does not already exist, and the
              output HTML files will be created inside it.

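       Combining -H, -d and -o gives a static set of linked HTML pages. The
       wrapper below is a sketch that assumes an index has already been built
       with -s; the function name and its argument order are hypothetical.

```sh
# Write a browsable set of HTML reports for DIR into OUTDIR.
# Usage: agedu_html_tree INDEX_FILE DIR OUTDIR [DEPTH]   (DEPTH defaults to 1)
agedu_html_tree() {
    # Giving -d switches -H to multiple linked files, so the argument to -o
    # is treated as a directory rather than a single file.
    agedu -f "$1" -H "$2" -d "${4:-1}" -o "$3"
}
```

       For example, agedu_html_tree agedu.dat /home/fred ./report 2 would
       leave the top-level page in ./report/index.html.
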
       The following options affect the web server mode -w, and in one case
       also the stand-alone HTML generation mode -H:

       -r age range or --age-range age range
              The HTML reports produced by agedu use a range of colours to
              indicate how long ago data was last accessed, running from red
              (representing the most disused data) to green (representing the
              newest). By default, the lengths of time represented by the two
              ends of that spectrum are chosen by examining the data file to
              see what range of ages appears in it. However, you might want to
              set your own limits, and you can do this using -r.

              The argument to -r consists of a single age, or two ages
              separated by a minus sign. An age is a number, followed by one
              of `y' (years), `m' (months), `w' (weeks) or `d' (days). The
              first age in the range represents the oldest data, and will be
              coloured red in the HTML; the second age represents the newest,
              coloured green. If the second age is not specified, it will
              default to zero (so that green means data which has been
              accessed just now).

              For example, -r 2y will mark data in red if it has been unused
              for two years or more, and green if it has been accessed just
              now. -r 2y-3m will similarly mark data red if it has been unused
              for two years or more, but will mark it green if it has been
              accessed three months ago or later.

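       The way an -r argument divides into its two ages can be shown with a
       short shell function. This only illustrates the syntax described
       above; it is not agedu's parser, and the function name is
       hypothetical.

```sh
# Split an -r argument into its oldest (red) and newest (green) ages.
# A missing second age defaults to zero.
# Usage: split_age_range RANGE
split_age_range() {
    case $1 in
        *-*) echo "oldest=${1%%-*} newest=${1#*-}" ;;
        *)   echo "oldest=$1 newest=0" ;;
    esac
}

split_age_range 2y-3m    # prints: oldest=2y newest=3m
split_age_range 2y       # prints: oldest=2y newest=0
```
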
       --address addr[:port]
              Specifies the network address and port number on which agedu
              should listen when running its web server. If you want agedu to
              listen for connections coming in from any source, you should
              probably specify the special IP address 0.0.0.0. If the port
              number is omitted, an arbitrary unused port will be chosen for
              you and displayed.

              If you specify this option, agedu will not print its URL on
              standard output (since you are expected to know what address you
              told it to listen to).

       --auth auth-type
              Specifies how agedu should control access to the web pages it
              serves. The options are as follows:

              magic  This option only works on Linux, and only when the
                     incoming connection is from the same machine that agedu
                     is running on. On Linux, the special file /proc/net/tcp
                     contains a list of network connections currently known to
                     the operating system kernel, including which user id
                     created them. So agedu will look up each incoming
                     connection in that file, and allow access if it comes
                     from the same user id under which agedu itself is
                     running. Therefore, in agedu's normal web server mode,
                     you can safely run it on a multi-user machine and no
                     other user will be able to read data out of your index
                     file.

              basic  In this mode, agedu will use HTTP Basic authentication:
                     the user will have to provide a username and password via
                     their browser. agedu will normally make up a username and
                     password for the purpose, but you can specify your own;
                     see below.

              none   In this mode, the web server is unauthenticated: anyone
                     connecting to it has full access to the reports generated
                     by agedu. Do not do this unless there is nothing
                     confidential at all in your index file, or unless you are
                     certain that nobody but you can run processes on your
                     computer.

              default
                     This is the default mode if you do not specify one of the
                     above. In this mode, agedu will attempt to use Linux
                     magic authentication, but if it detects at startup time
                     that /proc/net/tcp is absent or non-functional then it
                     will fall back to using HTTP Basic authentication and
                     invent a user name and password.

       --auth-file filename or --auth-fd fd
              When agedu is using HTTP Basic authentication, these options
              allow you to specify your own user name and password. If you
              specify --auth-file, these will be read from the specified file;
              if you specify --auth-fd they will instead be read from a given
              file descriptor which you should have arranged to pass to agedu.
              In either case, the authentication details should consist of the
              username, followed by a colon, followed by the password,
              followed immediately by end of file (no trailing newline, or
              else it will be considered part of the password).

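       Since a trailing newline would become part of the password, the
       credentials file is best written with printf rather than echo. The
       sketch below creates such a file; the user name, password and file
       location are examples only.

```sh
# Create a credentials file in "username:password" form with no trailing
# newline.  All three values here are placeholders.
authfile="${TMPDIR:-/tmp}/agedu-auth.example"
(
    umask 077                                   # keep a new file private
    printf '%s:%s' fred 's3cret' > "$authfile"  # this format adds no newline
)

# Then, with an index already built, the web server could be started as:
#   agedu -w --auth basic --auth-file "$authfile"
```

       The umask is changed inside a subshell so that it does not affect the
       rest of the session.
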
       --no-eof
              Stop agedu in web server mode from looking for end-of-file on
              standard input and treating it as a signal to terminate.

LIMITATIONS

       The data file is pretty large. The core of agedu is the tree-based data
       structure it uses in its index in order to efficiently perform the
       queries it needs; this data structure requires O(N log N) storage. This
       is larger than you might expect; a scan of my own home directory,
       containing half a million files and directories and about 20GB of
       data, produced an index file over 60MB in size. Furthermore, since the
       data file must be memory-mapped during most processing, it can never
       grow larger than available address space, so a really big filesystem
       may need to be indexed on a 64-bit computer. (This is one reason for
       the existence of the -D and -L options: you can do the scanning on the
       machine with access to the filesystem, and the indexing on a machine
       big enough to handle it.)

       The data structure also does not usefully permit access control within
       the data file, so it would be difficult - even given the willingness to
       do additional coding - to run a system-wide agedu scan on a cron job
       and serve the right subset of reports to each user.

       In certain circumstances, agedu can report false positives (reporting
       files as disused which are in fact in use) as well as the more benign
       false negatives (reporting files as in use which are not). This arises
       when a file is, semantically speaking, `read' without actually being
       physically read. Typically this occurs when a program checks whether
       the file's mtime has changed and only bothers re-reading it if it has;
       programs which do this include rsync(1) and make(1). Such programs will
       fail to update the atime of unmodified files despite depending on their
       continued existence; a directory full of such files will be reported as
       disused by agedu but deleting them will cause trouble.

LICENCE

       agedu is free software, distributed under the MIT licence. Type agedu
       --licence to see the full licence text.



Simon Tatham                      2008-11-02                          agedu(1)