agedu(1)                         Simon Tatham                         agedu(1)

NAME
       agedu - correlate disk usage with last-access times to identify large
       and disused data

SYNOPSIS
       agedu [ options ] action [action...]

DESCRIPTION
       agedu scans a directory tree and produces reports about how much disk
       space is used in each directory and subdirectory, and how much of that
       space is occupied by files whose last-access times are a long time in
       the past.

       In other words, agedu is a tool you might use to help you free up disk
       space. It lets you see which directories are taking up the most space,
       as du does; but unlike du, it also distinguishes between large
       collections of data which are still in use and ones which have not been
       accessed in months or years - for instance, large archives downloaded,
       unpacked, used once, and never cleaned up. Where du helps you find
       what's using your disk space, agedu helps you find what's wasting your
       disk space.

       agedu has several operating modes. In one mode, it scans your disk and
       builds an index file containing a data structure which allows it to
       efficiently retrieve any information it might need. Typically, you
       would use it in this mode first, and then run it in one of a number of
       `query' modes to display a report of the disk space usage of a
       particular directory and its subdirectories. Those reports can be
       produced as plain text (much like du) or as HTML. agedu can even run as
       a miniature web server, presenting each directory's HTML report with
       hyperlinks to let you navigate around the file system to similar
       reports for other directories.

       So you would typically start using agedu by telling it to do a scan of
       a directory tree and build an index. This is done with a command such
       as

              $ agedu -s /home/fred

       which will build a large data file called agedu.dat in your current
       directory. (If that current directory is inside /home/fred, don't worry
       - agedu is smart enough to discount its own index file.)

       Having built the index, you would now query it for reports of disk
       space usage. If you have a graphical web browser, the simplest and
       nicest way to query the index is by running agedu in web server mode:

              $ agedu -w

       which will print (among other messages) a URL on its standard output
       along the lines of

              URL: http://127.0.0.1:48638/

       (That URL will always begin with `127.', meaning that it's in the
       localhost address space. So only processes running on the same computer
       can even try to connect to that web server, and also there is access
       control to prevent other users from seeing it - see below for more
       detail.)

       Now paste that URL into your web browser, and you will be shown a
       graphical representation of the disk usage in /home/fred and its
       immediate subdirectories, with varying colours used to show the
       difference between disused and recently-accessed data. Click on any
       subdirectory to descend into it and see a report for its subdirectories
       in turn; click on parts of the pathname at the top of any page to
       return to higher-level directories. When you've finished browsing, you
       can just press Ctrl-D to send an end-of-file indication to agedu, and
       it will shut down.

       After that, you probably want to delete the data file agedu.dat, since
       it's pretty large. In fact, the command agedu -R will do this for you;
       and you can chain agedu commands on the same command line, so that
       instead of the above you could have done

              $ agedu -s /home/fred -w -R

       for a single self-contained run of agedu which builds its index, serves
       web pages from it, and cleans it up when finished.

       If you don't have a graphical web browser, you can do text-based
       queries as well. Having scanned /home/fred as above, you might run

              $ agedu -t /home/fred

       which again gives a summary of the disk usage in /home/fred and its
       immediate subdirectories; but this time agedu will print it on standard
       output, in much the same format as du. If you then want to find out how
       much old data is there, you can add the -a option to show only files
       last accessed at least a certain length of time ago. For example, to
       show only files which haven't been looked at in six months or more:

              $ agedu -t /home/fred -a 6m

       That's the essence of what agedu does. It has other modes of operation
       for more complex situations, and the usual array of configurable
       options. The following sections contain a complete reference for all
       its functionality.

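       The scan, report and clean-up steps above can also be combined in a
       small non-interactive script. The following is a minimal sketch and not
       part of agedu itself: the target directory, the AGEDU_TARGET variable
       and the index location are placeholder assumptions, and the -f option
       (documented later in this page) is used to keep the index out of the
       current directory. The script does nothing if agedu is not installed.

```shell
#!/bin/sh
# Sketch only: scan a tree, print a text report of data not accessed
# for six months or more, then remove the index again.
# AGEDU_TARGET and both default values are placeholder assumptions.
target=${AGEDU_TARGET:-/home/fred}
index=${TMPDIR:-/tmp}/agedu-example.dat

if command -v agedu >/dev/null 2>&1; then
    # Only query and remove the index if the scan succeeded.
    if agedu -f "$index" -s "$target"; then
        agedu -f "$index" -t "$target" -a 6m
        agedu -f "$index" -R
    fi
else
    echo "agedu is not installed; skipping" >&2
fi
```
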
OPERATING MODES
       This section describes the operating modes supported by agedu. Each of
       these is in the form of a command-line option, sometimes with an
       argument. Multiple operating-mode options may appear on the command
       line, in which case agedu will perform the specified actions one after
       another. For instance, as shown in the previous section, you might want
       to perform a disk scan and immediately launch a web server giving
       reports from that scan.

       -s directory or --scan directory
              In this mode, agedu scans the file system starting at the
              specified directory, and indexes the results of the scan into a
              large data file which other operating modes can query.

              By default, the scan is restricted to a single file system
              (since the expected use of agedu is that you would probably use
              it because a particular disk partition was running low on
              space). You can remove that restriction using the --cross-fs
              option; other configuration options allow you to include or
              exclude files or entire subdirectories from the scan. See the
              next section for full details of the configurable options.

              The index file is created with restrictive permissions, in case
              the file system you are scanning contains confidential
              information in its structure.

              Index files are dependent on the characteristics of the CPU
              architecture you created them on. You should not expect to be
              able to move an index file between different types of computer
              and have it continue to work. If you need to transfer the
              results of a disk scan to a different kind of computer, see the
              -D and -L options below.

       -w or --web
              In this mode, agedu expects to find an index file already
              written. It allocates a network port, and starts up a web server
              on that port which serves reports generated from the index file.
              By default it invents its own URL and prints it out.

              The web server runs until agedu receives an end-of-file event on
              its standard input. (The expected usage is that you run it from
              the command line, immediately browse web pages until you're
              satisfied, and then press Ctrl-D.) To disable the EOF behaviour,
              use the --no-eof option.

              In case the index file contains any confidential information
              about your file system, the web server protects the pages it
              serves from access by other people. On Linux, this is done
              transparently by means of using /proc/net/tcp to check the owner
              of each incoming connection; failing that, the web server will
              require a password to view the reports, and agedu will print the
              password it invented on standard output along with the URL.

              Configurable options for this mode let you specify your own
              address and port number to listen on, and also specify your own
              choice of authentication method (including turning
              authentication off completely) and a username and password of
              your choice.

       -t directory or --text directory
              In this mode, agedu generates a textual report on standard
              output, listing the disk usage in the specified directory and
              all its subdirectories down to a given depth. By default that
              depth is 1, so that you see a report for directory itself and
              all of its immediate subdirectories. You can configure a
              different depth (or no depth limit) using -d, described in the
              next section.

              Used on its own, -t merely lists the total disk usage in each
              subdirectory; agedu's additional ability to distinguish unused
              from recently-used data is not activated. To activate it, use
              the -a option to specify a minimum age.

              The directory structure stored in agedu's index file is treated
              as a set of literal strings. This means that you cannot refer to
              directories by synonyms. So if you ran agedu -s ., then all the
              path names you later pass to the -t option must be either `.' or
              begin with `./'. Similarly, symbolic links within the directory
              you scanned will not be followed; you must refer to each
              directory by its canonical, symlink-free pathname.

       -R or --remove
              In this mode, agedu deletes its index file. Running just agedu
              -R on its own is therefore equivalent to typing rm agedu.dat.
              However, you can also put -R on the end of a command line to
              indicate that agedu should delete its index file after it
              finishes performing other operations.

       -D or --dump
              In this mode, agedu reads an existing index file and produces a
              dump of its contents on standard output. This dump can later be
              loaded into a new index file, perhaps on another computer.

       -L or --load
              In this mode, agedu expects to read a dump produced by the -D
              option from its standard input. It constructs an index file from
              that dump, exactly as it would have if it had read the same data
              from a disk scan in -s mode.

       -S directory or --scan-dump directory
              In this mode, agedu will scan a directory tree and convert the
              results straight into a dump on standard output, without
              generating an index file at all. So running agedu -S /path
              should produce equivalent output to that of agedu -s /path -D,
              except that the latter will produce an index file as a side
              effect whereas -S will not.

              (The output will not be exactly identical, due to a difference
              in treatment of last-access times on directories. However, it
              should be effectively equivalent for most purposes. See the
              documentation of the --dir-atime option in the next section for
              further detail.)

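              As a rough illustration of how -S and -L fit together, the
              following sketch scans a small throwaway tree into a dump file
              and then rebuilds an index from that dump. The temporary
              directory and the file names are assumptions made for the
              example; in practice the dump would typically be produced on one
              machine and loaded on another. The script does nothing if agedu
              is not installed.

```shell
#!/bin/sh
# Sketch only: scan to a dump, then build an index from the dump.
# All paths are created under a temporary directory for illustration.
work=$(mktemp -d) || exit 1
mkdir "$work/tree"
echo "example data" > "$work/tree/file.txt"
dump=$work/scan.dump
index=$work/agedu.dat

if command -v agedu >/dev/null 2>&1; then
    # Step 1 (machine holding the data): scan straight to a dump on stdout.
    agedu -S "$work/tree" > "$dump"
    # Step 2 (machine doing the indexing): load the dump from stdin.
    agedu -L -f "$index" < "$dump"
    # Query the rebuilt index using the same literal path that was scanned.
    agedu -f "$index" -t "$work/tree"
else
    echo "agedu is not installed; skipping" >&2
fi
```

              Across two machines, the same two steps could be joined by a
              pipe (for instance through ssh), so that no intermediate dump
              file is needed.
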
       -H directory or --html directory
              In this mode, agedu will generate an HTML report of the disk
              usage in the specified directory and its immediate
              subdirectories, in the same form that it serves from its web
              server in -w mode.

              By default, a single HTML report will be generated and simply
              written to standard output, with no hyperlinks pointing to other
              similar pages. If you also specify the -d option (see below),
              agedu will instead write out a collection of HTML files with
              hyperlinks between them, and call the top-level file index.html.

       --cgi  In this mode, agedu will run as the bulk of a CGI script which
              provides the same set of web pages as the built-in web server
              would. It will read the usual CGI environment variables, and
              write CGI-style data to its standard output.

              The actual CGI program itself should be a tiny wrapper around
              agedu which passes it the --cgi option, and also (probably) -f
              to locate the index file. agedu will do everything else.

              No access control is performed in this mode: restricting access
              to CGI scripts is assumed to be the job of the web server.

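              The tiny wrapper described above might look something like the
              script generated below. This is only a sketch: the index path
              /var/lib/agedu/home.dat and the file name agedu.cgi are
              placeholder assumptions, and the wrapper is written into a
              temporary directory rather than into a real web server's CGI
              directory.

```shell
#!/bin/sh
# Sketch only: generate a tiny CGI wrapper around agedu.
# The index path inside the wrapper is a placeholder assumption.
cgi_dir=$(mktemp -d) || exit 1
wrapper=$cgi_dir/agedu.cgi

cat > "$wrapper" <<'EOF'
#!/bin/sh
# Hand the request over to agedu, which reads the CGI environment itself.
exec agedu --cgi -f /var/lib/agedu/home.dat
EOF
chmod 755 "$wrapper"
echo "wrote $wrapper"
```

              Since agedu performs no access control in this mode, any
              restriction on who may run the wrapper has to be configured in
              the web server.
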
OPTIONS
       This section describes the various configuration options that affect
       agedu's operation in one mode or another.

       The following option affects nearly all modes (except -S):

       -f filename or --file filename
              Specifies the location of the index file which agedu creates,
              reads or removes depending on its operating mode. By default,
              this is simply `agedu.dat', in whatever is the current working
              directory when you run agedu.

       The following options affect the disk-scanning modes, -s and -S:

       --cross-fs and --no-cross-fs
              These configure whether or not the disk scan is permitted to
              cross between different file systems. The default is not to:
              agedu will normally skip over subdirectories on which a
              different file system is mounted. This makes it convenient when
              you want to free up space on a particular file system which is
              running low. However, in other circumstances you might wish to
              see general information about the use of space no matter which
              file system it's on (for instance, if your real concern is your
              backup media running out of space, and if your backups do not
              treat different file systems specially); in that situation, use
              --cross-fs.

              (Note that this default is the opposite way round from the
              corresponding option in du.)

       --prune wildcard and --prune-path wildcard
              These cause particular files or directories to be omitted
              entirely from the scan. If agedu's scan encounters a file or
              directory whose name matches the wildcard provided to the
              --prune option, it will not include that file in its index, and
              also if it's a directory it will skip over it and not scan its
              contents.

              Note that in most Unix shells, wildcards will probably need to
              be escaped on the command line, to prevent the shell from
              expanding the wildcard before agedu sees it.

              --prune-path is similar to --prune, except that the wildcard is
              matched against the entire pathname instead of just the filename
              at the end of it. So whereas --prune *a*b* will match any file
              whose actual name contains an a somewhere before a b,
              --prune-path *a*b* will also match a file whose name contains b
              and which is inside a directory containing an a, or any file
              inside a directory of that form, and so on.

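              The note about escaping wildcards can be seen in a small
              experiment that does not involve agedu at all. In this sketch,
              show_args is a hypothetical helper standing in for any program
              that receives command-line arguments, and the two file names
              are made up for the demonstration.

```shell
#!/bin/sh
# Sketch only: show what a program receives with and without quoting.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
touch a1b a2b

# Print each argument in brackets, one line per invocation.
show_args() {
    printf '[%s]' "$@"
    printf '\n'
}

unquoted=$(show_args --prune *a*b*)     # the shell expands the wildcard
quoted=$(show_args --prune '*a*b*')     # the wildcard is passed through

echo "unquoted: $unquoted"   # unquoted: [--prune][a1b][a2b]
echo "quoted:   $quoted"     # quoted:   [--prune][*a*b*]
```

              Only in the quoted form does the program see the wildcard
              itself, which is what agedu needs in order to do its own
              matching.
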
       --exclude wildcard and --exclude-path wildcard
              These cause particular files or directories to be omitted from
              the index, but not from the scan. If agedu's scan encounters a
              file or directory whose name matches the wildcard provided to
              the --exclude option, it will not include that file in its index
              - but unlike --prune, if the file in question is a directory it
              will still scan its contents and index them if they are not
              ruled out themselves by --exclude options.

              As above, --exclude-path is similar to --exclude, except that
              the wildcard is matched against the entire pathname.

       --include wildcard and --include-path wildcard
              These cause particular files or directories to be re-included in
              the index and the scan, if they had previously been ruled out by
              one of the above exclude or prune options. You can interleave
              include, exclude and prune options as you wish on the command
              line, and if more than one of them applies to a file then the
              last one takes priority.

              For example, if you wanted to see only the disk space taken up
              by MP3 files, you might run

                     $ agedu -s . --exclude '*' --include '*.mp3'

              which will cause everything to be omitted from the scan, but
              then the MP3 files to be put back in. If you then wanted only a
              subset of those MP3s, you could then exclude some of them again
              by adding, say, `--exclude-path './queen/*'' (or, more
              efficiently, `--prune ./queen') on the end of that command.

              As with the previous two options, --include-path is similar to
              --include except that the wildcard is matched against the entire
              pathname.

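              The `last one takes priority' rule can be modelled in a few
              lines of shell. This is an illustration of the rule as described
              above, not agedu's actual code: decide is a hypothetical
              function, it handles only name-based include and exclude rules
              (not --prune or the -path variants), and it uses the shell's own
              pattern matching as a stand-in for agedu's wildcards.

```shell
#!/bin/sh
# Sketch only: "the last matching option takes priority".
# decide NAME RULE...   where each RULE is include:PATTERN or exclude:PATTERN
decide() {
    name=$1
    shift
    verdict=include          # nothing is ruled out until some rule matches
    for rule in "$@"; do
        action=${rule%%:*}   # text before the first colon
        pattern=${rule#*:}   # text after the first colon
        case $name in
            $pattern) verdict=$action ;;   # a later match overrides earlier ones
        esac
    done
    echo "$verdict"
}

decide song.mp3  'exclude:*' 'include:*.mp3'   # prints: include
decide notes.txt 'exclude:*' 'include:*.mp3'   # prints: exclude
```

              The two calls mirror the MP3 example: everything is excluded
              first, and the later include rule then wins for names ending in
              .mp3.
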
       --progress, --no-progress and --tty-progress
              When agedu is scanning a directory tree, it will typically print
              a one-line progress report every second showing where it has
              reached in the scan, so you can have some idea of how much
              longer it will take. (Of course, it can't predict exactly how
              long it will take, since it doesn't know which of the
              directories it hasn't scanned yet will turn out to be huge.)

              By default, those progress reports are displayed on agedu's
              standard error channel, if that channel points to a terminal
              device. If you need to manually enable or disable them, you can
              use the above three options to do so: --progress unconditionally
              enables the progress reports, --no-progress unconditionally
              disables them, and --tty-progress reverts to the default
              behaviour which is conditional on standard error being a
              terminal.

       --dir-atime and --no-dir-atime
              In normal operation, agedu ignores the atimes (last access
              times) on the directories it scans: it only pays attention to
              the atimes of the files inside those directories. This is
              because directory atimes tend to be reset by a lot of system
              administrative tasks, such as cron jobs which scan the file
              system for one reason or another - or even other invocations of
              agedu itself, though it tries to avoid modifying any atimes if
              possible. So the literal atimes on directories are typically not
              representative of how long ago the data in question was last
              accessed with real intent to use that data in particular.

              Instead, agedu makes up a fake atime for every directory it
              scans, which is equal to the newest atime of any file in or
              below that directory (or the directory's last modification time,
              whichever is newer). This is based on the assumption that all
              important accesses to directories are actually accesses to the
              files inside those directories, so that when any file is
              accessed all the directories on the path leading to it should be
              considered to have been accessed as well.

              In unusual cases it is possible that a directory itself might
              embody important data which is accessed by reading the
              directory. In that situation, agedu's atime-faking policy will
              misreport the directory as disused. In the unlikely event that
              such directories form a significant part of your disk space
              usage, you might want to turn off the faking. The --dir-atime
              option does this: it causes the disk scan to read the original
              atimes of the directories it scans.

              The faking of atimes on directories also requires a processing
              pass over the index file after the main disk scan is complete.
              --dir-atime also turns this pass off. Hence, this option affects
              the -L option as well as -s and -S.

              (The previous section mentioned that there might be subtle
              differences between the output of agedu -s /path -D and agedu -S
              /path. This is why. Doing a scan with -s and then dumping it
              with -D will dump the fully faked atimes on the directories,
              whereas doing a scan-to-dump with -S will dump only partially
              faked atimes - specifically, each directory's last modification
              time - since the subsequent processing pass will not have had a
              chance to take place. However, loading either of the resulting
              dump files with -L will perform the atime-faking processing
              pass, leading to the same data in the index file in each case.
              In normal usage it should be safe to ignore all of this
              complexity.)

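              The fake atime described above can be approximated for a single
              directory with standard tools. The sketch below is not how agedu
              computes it internally: fake_atime is a hypothetical helper, it
              assumes GNU find and GNU stat (the -c option), and it considers
              only the modification time of the top-level directory, not of
              the subdirectories beneath it.

```shell
#!/bin/sh
# Sketch only, assuming GNU find and GNU stat (stat -c).
# fake_atime DIR: print, in seconds since the epoch, the newer of
#   - the newest atime of any regular file in or below DIR, and
#   - DIR's own last modification time.
fake_atime() {
    dir=$1
    dir_mtime=$(stat -c %Y "$dir") || return 1
    newest=$(find "$dir" -type f -exec stat -c %X {} + | sort -n | tail -n 1)
    newest=${newest:-0}      # no files at all: fall back to the mtime
    if [ "$newest" -gt "$dir_mtime" ]; then
        echo "$newest"
    else
        echo "$dir_mtime"
    fi
}
```

              Calling fake_atime on a directory prints one number; comparing
              it with the current time gives a rough idea of how a scan would
              age that directory.
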
       --mtime
              This option causes agedu to index files by their last
              modification time instead of their last access time. You might
              want to use this if your last access times were completely
              useless for some reason: for example, if you had recently
              searched every file on your system, the system would have lost
              all the information about what files you hadn't recently
              accessed before then. Using this option is liable to be less
              effective at finding genuinely wasted space than the normal mode
              (that is, it will be more likely to flag things as disused when
              they're not, so you will have more candidates to go through by
              hand looking for data you don't need), but may be better than
              nothing if your last-access times are unhelpful.

              Another use for this mode might be to find recently created
              large data. If your disk has been gradually filling up for
              years, the default mode of agedu will let you find unused data
              to delete; but if you know your disk had plenty of space
              recently and now it's suddenly full, and you suspect that some
              rogue program has left a large core dump or output file, then
              agedu --mtime might be a convenient way to locate the culprit.

       The following option affects all the modes that generate reports: the
       web server mode -w, the stand-alone HTML generation mode -H and the
       text report mode -t.

       --files
              This option causes agedu's reports to list the individual files
              in each directory, instead of just giving a combined report for
              everything that's not in a subdirectory.

       The following options affect the stand-alone HTML generation mode -H
       and the text report mode -t.

       -d depth or --depth depth
              This option controls the maximum depth to which agedu recurses
              when generating a text or HTML report.

              In text mode, the default is 1, meaning that the report will
              include the directory given on the command line and all of its
              immediate subdirectories. A depth of two includes another level
              below that, and so on; a depth of zero means only the directory
              on the command line.

              In HTML mode, specifying this option switches agedu from writing
              out a single HTML file to writing out multiple files which link
              to each other. A depth of 1 means agedu will write out an HTML
              file for the given directory and also one for each of its
              immediate subdirectories.

              If you want agedu to recurse as deeply as possible, give the
              special word `max' as an argument to -d.

       -o filename or --output filename
              This option is used to specify an output file for agedu to write
              its report to. In text mode or single-file HTML mode, the
              argument is treated as the name of a file. In multiple-file HTML
              mode, the argument is treated as the name of a directory: the
              directory will be created if it does not already exist, and the
              output HTML files will be created inside it.

       The following options affect the web server mode -w, and in one case
       also the stand-alone HTML generation mode -H:

       -r age range or --age-range age range
              The HTML reports produced by agedu use a range of colours to
              indicate how long ago data was last accessed, running from red
              (representing the most disused data) to green (representing the
              newest). By default, the lengths of time represented by the two
              ends of that spectrum are chosen by examining the data file to
              see what range of ages appears in it. However, you might want to
              set your own limits, and you can do this using -r.

              The argument to -r consists of a single age, or two ages
              separated by a minus sign. An age is a number, followed by one
              of `y' (years), `m' (months), `w' (weeks) or `d' (days). The
              first age in the range represents the oldest data, and will be
              coloured red in the HTML; the second age represents the newest,
              coloured green. If the second age is not specified, it will
              default to zero (so that green means data which has been
              accessed just now).

              For example, -r 2y will mark data in red if it has been unused
              for two years or more, and green if it has been accessed just
              now. -r 2y-3m will similarly mark data red if it has been unused
              for two years or more, but will mark it green if it was last
              accessed within the past three months.

       --address addr[:port]
              Specifies the network address and port number on which agedu
              should listen when running its web server. If you want agedu to
              listen for connections coming in from any source, you should
              probably specify the special IP address 0.0.0.0. If the port
              number is omitted, an arbitrary unused port will be chosen for
              you and displayed.

              If you specify this option, agedu will not print its URL on
              standard output (since you are expected to know what address you
              told it to listen to).

       --auth auth-type
              Specifies how agedu should control access to the web pages it
              serves. The options are as follows:

              magic  This option only works on Linux, and only when the
                     incoming connection is from the same machine that agedu
                     is running on. On Linux, the special file /proc/net/tcp
                     contains a list of network connections currently known to
                     the operating system kernel, including which user id
                     created them. So agedu will look up each incoming
                     connection in that file, and allow access if it comes
                     from the same user id under which agedu itself is
                     running. Therefore, in agedu's normal web server mode,
                     you can safely run it on a multi-user machine and no
                     other user will be able to read data out of your index
                     file.

              basic  In this mode, agedu will use HTTP Basic authentication:
                     the user will have to provide a username and password via
                     their browser. agedu will normally make up a username and
                     password for the purpose, but you can specify your own;
                     see below.

              none   In this mode, the web server is unauthenticated: anyone
                     connecting to it has full access to the reports generated
                     by agedu. Do not do this unless there is nothing
                     confidential at all in your index file, or unless you are
                     certain that nobody but you can run processes on your
                     computer.

              default
                     This is the default mode if you do not specify one of the
                     above. In this mode, agedu will attempt to use Linux
                     magic authentication, but if it detects at startup time
                     that /proc/net/tcp is absent or non-functional then it
                     will fall back to using HTTP Basic authentication and
                     invent a user name and password.

       --auth-file filename or --auth-fd fd
              When agedu is using HTTP Basic authentication, these options
              allow you to specify your own user name and password. If you
              specify --auth-file, these will be read from the specified file;
              if you specify --auth-fd they will instead be read from a given
              file descriptor which you should have arranged to pass to agedu.
              In either case, the authentication details should consist of the
              username, followed by a colon, followed by the password,
              followed immediately by end of file (no trailing newline, or
              else it will be considered part of the password).

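              Because a trailing newline would become part of the password,
              the file is best written with printf rather than echo. The
              following sketch shows one way to do that; the username and
              password are placeholders, and the file is created in a
              temporary location chosen by mktemp.

```shell
#!/bin/sh
# Sketch only: write "username:password" with no trailing newline.
# The credentials below are placeholders.
umask 077                       # keep the file private to the current user
auth_file=$(mktemp) || exit 1
printf '%s:%s' 'fred' 'correct horse' > "$auth_file"

# Report the size: 4 + 1 + 13 = 18 bytes, so no newline was appended.
wc -c < "$auth_file"
```

              The resulting file could then be handed to the web server with a
              command along the lines of

                     $ agedu -w --auth basic --auth-file /path/to/authfile

              where /path/to/authfile stands for the file written above.
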
       --no-eof
              Stop agedu in web server mode from looking for end-of-file on
              standard input and treating it as a signal to terminate.

LIMITATIONS
       The data file is pretty large. The core of agedu is the tree-based data
       structure it uses in its index in order to efficiently perform the
       queries it needs; this data structure requires O(N log N) storage. This
       is larger than you might expect; a scan of my own home directory,
       containing half a million files and directories and about 20GB of data,
       produced an index file over 60MB in size. Furthermore, since the data
       file must be memory-mapped during most processing, it can never grow
       larger than available address space, so a really big filesystem may
       need to be indexed on a 64-bit computer. (This is one reason for the
       existence of the -D and -L options: you can do the scanning on the
       machine with access to the filesystem, and the indexing on a machine
       big enough to handle it.)

       The data structure also does not usefully permit access control within
       the data file, so it would be difficult - even given the willingness to
       do additional coding - to run a system-wide agedu scan on a cron job
       and serve the right subset of reports to each user.

       In certain circumstances, agedu can report false positives (reporting
       files as disused which are in fact in use) as well as the more benign
       false negatives (reporting files as in use which are not). This arises
       when a file is, semantically speaking, `read' without actually being
       physically read. Typically this occurs when a program checks whether
       the file's mtime has changed and only bothers re-reading it if it has;
       programs which do this include rsync(1) and make(1). Such programs will
       fail to update the atime of unmodified files despite depending on their
       continued existence; a directory full of such files will be reported as
       disused by agedu but deleting them will cause trouble.

LICENCE
       agedu is free software, distributed under the MIT licence. Type agedu
       --licence to see the full licence text.

Simon Tatham                       2008-11-02                         agedu(1)