agedu(1)                      Simon Tatham                      agedu(1)

agedu - correlate disk usage with last-access times to identify large
and disused data

agedu [ options ] action [action...]

agedu scans a directory tree and produces reports about how much disk
space is used in each directory and subdirectory, and also how that
usage of disk space corresponds to files with last-access times a long
time ago.

In other words, agedu is a tool you might use to help you free up disk
space. It lets you see which directories are taking up the most space,
as du does; but unlike du, it also distinguishes between large
collections of data which are still in use and ones which have not been
accessed in months or years - for instance, large archives downloaded,
unpacked, used once, and never cleaned up. Where du helps you find
what's using your disk space, agedu helps you find what's wasting your
disk space.

agedu has several operating modes. In one mode, it scans your disk and
builds an index file containing a data structure which allows it to
efficiently retrieve any information it might need. Typically, you
would use it in this mode first, and then run it in one of a number of
`query' modes to display a report of the disk space usage of a
particular directory and its subdirectories. Those reports can be
produced as plain text (much like du) or as HTML. agedu can even run as
a miniature web server, presenting each directory's HTML report with
hyperlinks to let you navigate around the file system to similar
reports for other directories.

So you would typically start using agedu by telling it to do a scan of
a directory tree and build an index. This is done with a command such
as

    $ agedu -s /home/fred

which will build a large data file called agedu.dat in your current
directory. (If that current directory is inside /home/fred, don't worry
- agedu is smart enough to discount its own index file.)

Having built the index, you would now query it for reports of disk
space usage. If you have a graphical web browser, the simplest and
nicest way to query the index is by running agedu in web server mode:

    $ agedu -w

which will print (among other messages) a URL on its standard output
along the lines of

    URL: http://127.0.0.1:48638/

(That URL will always begin with `127.', meaning that it's in the
localhost address space. So only processes running on the same computer
can even try to connect to that web server, and also there is access
control to prevent other users from seeing it - see below for more
detail.)

Now paste that URL into your web browser, and you will be shown a
graphical representation of the disk usage in /home/fred and its
immediate subdirectories, with varying colours used to show the
difference between disused and recently-accessed data. Click on any
subdirectory to descend into it and see a report for its subdirectories
in turn; click on parts of the pathname at the top of any page to
return to higher-level directories. When you've finished browsing, you
can just press Ctrl-D to send an end-of-file indication to agedu, and
it will shut down.

After that, you probably want to delete the data file agedu.dat, since
it's pretty large. In fact, the command agedu -R will do this for you;
and you can chain agedu commands on the same command line, so that
instead of the above you could have done

    $ agedu -s /home/fred -w -R

for a single self-contained run of agedu which builds its index, serves
web pages from it, and cleans it up when finished.

In some situations, you might want to scan the directory structure of
one computer, but run agedu's user interface on another. In that case,
you can do your scan using the agedu -S option in place of agedu -s,
which will make agedu not bother building an index file but instead
just write out its scan results in plain text on standard output; then
you can funnel that output to the other machine using SSH (or whatever
other technique you prefer), and there, run agedu -L to load in the
textual dump and turn it into an index file. For example, you might run
a command like this (plus any ssh options you need) on the machine you
want to scan:

    $ agedu -S /home/fred | ssh indexing-machine agedu -L

or, equivalently, run something like this on the other machine:

    $ ssh machine-to-scan agedu -S /home/fred | agedu -L

Either way, the agedu -L command will create an agedu.dat index file,
which you can then use with agedu -w just as above.

(Another way to do this might be to build the index file on the first
machine as normal, and then just copy it to the other machine once it's
complete. However, for efficiency, the index file is formatted
differently depending on the CPU architecture that agedu is compiled
for. So if that doesn't match between the two machines - e.g. if one is
a 32-bit machine and one 64-bit - then agedu.dat files written on one
machine will not work on the other. The technique described above using
-S and -L should work between any two machines.)

If you don't have a graphical web browser, you can do text-based
queries instead of using agedu's web interface. Having scanned
/home/fred in any of the ways suggested above, you might run

    $ agedu -t /home/fred

which again gives a summary of the disk usage in /home/fred and its
immediate subdirectories; but this time agedu will print it on standard
output, in much the same format as du. If you then want to find out how
much old data is there, you can add the -a option to show only files
last accessed a certain length of time ago. For example, to show only
files which haven't been looked at in six months or more:

    $ agedu -t /home/fred -a 6m

That's the essence of what agedu does. It has other modes of operation
for more complex situations, and the usual array of configurable
options. The following sections contain a complete reference for all
its functionality.

This section describes the operating modes supported by agedu. Each of
these is in the form of a command-line option, sometimes with an
argument. Multiple operating-mode options may appear on the command
line, in which case agedu will perform the specified actions one after
another. For instance, as shown in the previous section, you might want
to perform a disk scan and immediately launch a web server giving
reports from that scan.

-s directory or --scan directory
    In this mode, agedu scans the file system starting at the
    specified directory, and indexes the results of the scan into a
    large data file which other operating modes can query.

    By default, the scan is restricted to a single file system
    (since the expected use of agedu is that you would probably use
    it because a particular disk partition was running low on
    space). You can remove that restriction using the --cross-fs
    option; other configuration options allow you to include or
    exclude files or entire subdirectories from the scan. See the
    next section for full details of the configurable options.

    The index file is created with restrictive permissions, in case
    the file system you are scanning contains confidential
    information in its structure.

    Index files are dependent on the characteristics of the CPU
    architecture you created them on. You should not expect to be
    able to move an index file between different types of computer
    and have it continue to work. If you need to transfer the
    results of a disk scan to a different kind of computer, see the
    -D and -L options below.

-w or --web
    In this mode, agedu expects to find an index file already
    written. It allocates a network port, and starts up a web server
    on that port which serves reports generated from the index file.
    By default it invents its own URL and prints it out.

    The web server runs until agedu receives an end-of-file event on
    its standard input. (The expected usage is that you run it from
    the command line, immediately browse web pages until you're
    satisfied, and then press Ctrl-D.) To disable the EOF behaviour,
    use the --no-eof option.

    In case the index file contains any confidential information
    about your file system, the web server protects the pages it
    serves from access by other people. On Linux, this is done
    transparently by means of using /proc/net/tcp to check the owner
    of each incoming connection; failing that, the web server will
    require a password to view the reports, and agedu will print the
    password it invented on standard output along with the URL.

    Configurable options for this mode let you specify your own
    address and port number to listen on, and also specify your own
    choice of authentication method (including turning
    authentication off completely) and a username and password of
    your choice.

-t directory or --text directory
    In this mode, agedu generates a textual report on standard
    output, listing the disk usage in the specified directory and
    all its subdirectories down to a given depth. By default that
    depth is 1, so that you see a report for directory itself and
    all of its immediate subdirectories. You can configure a
    different depth (or no depth limit) using -d, described in the
    next section.

    Used on its own, -t merely lists the total disk usage in each
    subdirectory; agedu's additional ability to distinguish unused
    from recently-used data is not activated. To activate it, use
    the -a option to specify a minimum age.

    The directory structure stored in agedu's index file is treated
    as a set of literal strings. This means that you cannot refer to
    directories by synonyms. So if you ran agedu -s ., then all the
    path names you later pass to the -t option must be either `.' or
    begin with `./'. Similarly, symbolic links within the directory
    you scanned will not be followed; you must refer to each
    directory by its canonical, symlink-free pathname.

-R or --remove
    In this mode, agedu deletes its index file. Running just agedu
    -R on its own is therefore equivalent to typing rm agedu.dat.
    However, you can also put -R on the end of a command line to
    indicate that agedu should delete its index file after it
    finishes performing other operations.

-S directory or --scan-dump directory
    In this mode, agedu will scan a directory tree and convert the
    results straight into a textual dump on standard output, without
    generating an index file at all. The dump data is intended for
    agedu -L to read.

-L or --load
    In this mode, agedu expects to read a dump produced by the -S
    option from its standard input. It constructs an index file from
    that dump, exactly as it would have if it had read the same data
    from a disk scan in -s mode.

-D or --dump
    In this mode, agedu reads an existing index file and produces a
    dump of its contents on standard output, in the same format used
    by -S and -L. This option could be used to convert an existing
    index file into a format acceptable to a different kind of
    computer, by dumping it using -D and then loading the dump back
    in on the other machine using -L.

    (The output of agedu -D on an existing index file will not be
    exactly identical to what agedu -S would have originally
    produced, due to a difference in treatment of last-access times
    on directories. However, it should be effectively equivalent for
    most purposes. See the documentation of the --dir-atime option
    in the next section for further detail.)

-H directory or --html directory
    In this mode, agedu will generate an HTML report of the disk
    usage in the specified directory and its immediate
    subdirectories, in the same form that it serves from its web
    server in -w mode.

    By default, a single HTML report will be generated and simply
    written to standard output, with no hyperlinks pointing to other
    similar pages. If you also specify the -d option (see below),
    agedu will instead write out a collection of HTML files with
    hyperlinks between them, and call the top-level file index.html.

--cgi
    In this mode, agedu will run as the bulk of a CGI script which
    provides the same set of web pages as the built-in web server
    would. It will read the usual CGI environment variables, and
    write CGI-style data to its standard output.

    The actual CGI program itself should be a tiny wrapper around
    agedu which passes it the --cgi option, and also (probably) -f
    to locate the index file. agedu will do everything else. For
    example, your script might read

        #!/bin/sh
        /some/path/to/agedu --cgi -f /some/other/path/to/agedu.dat

    (Note that agedu will produce the entire CGI output, including
    status code, HTTP headers and the full HTML document. If you try
    to surround the call to agedu --cgi with code that adds your own
    HTML header and footer, you won't get the results you want, and
    agedu's HTTP-level features such as auto-redirecting to
    canonical versions of URIs will stop working.)

    No access control is performed in this mode: restricting access
    to CGI scripts is assumed to be the job of the web server.

--presort and --postsort
    In these two modes, agedu will expect to read a textual data
    dump from its standard input of the form produced by -S (and
    -D). It will transform the data into a different version of its
    text dump format, and write the transformed version on standard
    output.

    The ordinary dump file format is reasonably readable, but
    loading it into an index file using agedu -L requires it to be
    sorted in a specific order, which is complicated to describe and
    difficult to implement using ordinary Unix sorting tools. So if
    you want to construct your own data dump from a source of your
    own that agedu itself doesn't know how to scan, you will need to
    make sure it's sorted in the right order.

    To help with this, agedu provides a secondary dump format which
    is `sortable', in the sense that ordinary sort(1) without
    arguments will arrange it into the right order. However, the
    sortable format is much less readable and also twice the size,
    so you wouldn't want to write it directly!

    So the recommended procedure is to generate dump data in the
    ordinary format; then pipe it through agedu --presort to turn it
    into the sortable format; then sort it; then pipe it into agedu
    -L (which can accept either the normal or the sortable format as
    input). For example:

        generate_custom_data.sh | agedu --presort | sort | agedu -L

    If you need to transform the sorted dump file back into the
    ordinary format, agedu --postsort can do that. But since agedu
    -L can accept either format as input, you may not need to.

-h or --help
    Causes agedu to print some help text and terminate immediately.

-V or --version
    Causes agedu to print its version number and terminate
    immediately.

This section describes the various configuration options that affect
agedu's operation in one mode or another.

The following option affects nearly all modes (except -S):

-f filename or --file filename
    Specifies the location of the index file which agedu creates,
    reads or removes depending on its operating mode. By default,
    this is simply `agedu.dat', in whatever is the current working
    directory when you run agedu.

The following options affect the disk-scanning modes, -s and -S:

--cross-fs and --no-cross-fs
    These configure whether or not the disk scan is permitted to
    cross between different file systems. The default is not to:
    agedu will normally skip over subdirectories on which a
    different file system is mounted. This makes it convenient when
    you want to free up space on a particular file system which is
    running low. However, in other circumstances you might wish to
    see general information about the use of space no matter which
    file system it's on (for instance, if your real concern is your
    backup media running out of space, and if your backups do not
    treat different file systems specially); in that situation, use
    --cross-fs.

    (Note that this default is the opposite way round from the
    corresponding option in du.)

--prune wildcard and --prune-path wildcard
    These cause particular files or directories to be omitted
    entirely from the scan. If agedu's scan encounters a file or
    directory whose name matches the wildcard provided to the
    --prune option, it will not include that file in its index, and
    also if it's a directory it will skip over it and not scan its
    contents.

    Note that in most Unix shells, wildcards will probably need to
    be escaped on the command line, to prevent the shell from
    expanding the wildcard before agedu sees it.

    --prune-path is similar to --prune, except that the wildcard is
    matched against the entire pathname instead of just the filename
    at the end of it. So whereas --prune *a*b* will match any file
    whose actual name contains an a somewhere before a b,
    --prune-path *a*b* will also match a file whose name contains b
    and which is inside a directory containing an a, or any file
    inside a directory of that form, and so on.
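
The note above about escaping wildcards can be checked without agedu
at all. The following is a small sketch in plain POSIX shell; the
scratch directory and the file name crab.txt are made up for the
demonstration. It shows what a program would receive as its argument
when the pattern is left unquoted and when it is quoted.

```shell
# Work in a scratch directory containing one file that matches the pattern.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch crab.txt

# Unquoted: the shell replaces the pattern with the matching filenames,
# so a program would be handed "crab.txt" instead of the wildcard.
set -- *a*b*
unquoted=$1

# Quoted: the pattern reaches the program unchanged.
set -- '*a*b*'
quoted=$1

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```

So in practice you would normally write the option in a form such as
--prune '*a*b*', with the wildcard inside single quotes.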

--exclude wildcard and --exclude-path wildcard
    These cause particular files or directories to be omitted from
    the index, but not from the scan. If agedu's scan encounters a
    file or directory whose name matches the wildcard provided to
    the --exclude option, it will not include that file in its index
    - but unlike --prune, if the file in question is a directory it
    will still scan its contents and index them if they are not
    ruled out themselves by --exclude options.

    As above, --exclude-path is similar to --exclude, except that
    the wildcard is matched against the entire pathname.

--include wildcard and --include-path wildcard
    These cause particular files or directories to be re-included in
    the index and the scan, if they had previously been ruled out by
    one of the above exclude or prune options. You can interleave
    include, exclude and prune options as you wish on the command
    line, and if more than one of them applies to a file then the
    last one takes priority.

    For example, if you wanted to see only the disk space taken up
    by MP3 files, you might run

        $ agedu -s . --exclude '*' --include '*.mp3'

    which will cause everything to be omitted from the index, but
    then the MP3 files to be put back in. If you then wanted only a
    subset of those MP3s, you could then exclude some of them again
    by adding, say, `--exclude-path './queen/*'' (or, more
    efficiently, `--prune ./queen') on the end of that command.

    As with the previous two options, --include-path is similar to
    --include except that the wildcard is matched against the entire
    pathname.
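
The `last one takes priority' rule can be modelled in a few lines of
shell. This is only an illustrative sketch, not agedu's own code: the
function name decide and the "action:pattern" rule notation are
invented here, shell case patterns merely approximate agedu's wildcard
matching, only the filename forms of the options are modelled, and the
extra effect of --prune (not descending into a directory) is ignored.

```shell
# decide NAME RULE...
# Each RULE is "include:PATTERN", "exclude:PATTERN" or "prune:PATTERN"
# (a notation invented for this sketch). Rules are checked in
# command-line order, and the last rule that matches NAME wins.
# A name that matches no rule at all is included.
decide() {
    name=$1
    shift
    verdict=include
    for rule in "$@"; do
        action=${rule%%:*}      # text before the first colon
        pattern=${rule#*:}      # text after the first colon
        # $pattern is deliberately unquoted so that it acts as a glob.
        case $name in
            $pattern) verdict=$action ;;
        esac
    done
    printf '%s\n' "$verdict"
}

decide song.mp3  'exclude:*' 'include:*.mp3'   # prints "include"
decide notes.txt 'exclude:*' 'include:*.mp3'   # prints "exclude"
```

The two calls mirror the MP3 example above: everything is excluded
first, and the later --include rule then overrides that decision for
names ending in .mp3.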

--progress, --no-progress and --tty-progress
    When agedu is scanning a directory tree, it will typically print
    a one-line progress report every second showing where it has
    reached in the scan, so you can have some idea of how much
    longer it will take. (Of course, it can't predict exactly how
    long it will take, since it doesn't know which of the
    directories it hasn't scanned yet will turn out to be huge.)

    By default, those progress reports are displayed on agedu's
    standard error channel, if that channel points to a terminal
    device. If you need to manually enable or disable them, you can
    use the above three options to do so: --progress unconditionally
    enables the progress reports, --no-progress unconditionally
    disables them, and --tty-progress reverts to the default
    behaviour which is conditional on standard error being a
    terminal.
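
To see which way the default will go in a given context, you can ask
the shell the same question. The sketch below uses the standard test
`[ -t 2 ]'; the function name stderr_kind is made up, and it is an
assumption that agedu's own check behaves equivalently.

```shell
# Report whether file descriptor 2 (standard error) is a terminal.
stderr_kind() {
    if [ -t 2 ]; then
        echo terminal
    else
        echo not-a-terminal
    fi
}

stderr_kind                 # result depends on how the script was started
stderr_kind 2>/dev/null     # prints "not-a-terminal"
```

By the rule described above, a scan whose standard error is redirected
to a file or a pipe would show no progress reports unless --progress
is given.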

--dir-atime and --no-dir-atime
    In normal operation, agedu ignores the atimes (last access
    times) on the directories it scans: it only pays attention to
    the atimes of the files inside those directories. This is
    because directory atimes tend to be reset by a lot of system
    administrative tasks, such as cron jobs which scan the file
    system for one reason or another - or even other invocations of
    agedu itself, though it tries to avoid modifying any atimes if
    possible. So the literal atimes on directories are typically not
    representative of how long ago the data in question was last
    accessed with real intent to use that data in particular.

    Instead, agedu makes up a fake atime for every directory it
    scans, which is equal to the newest atime of any file in or
    below that directory (or the directory's last modification time,
    whichever is newest). This is based on the assumption that all
    important accesses to directories are actually accesses to the
    files inside those directories, so that when any file is
    accessed all the directories on the path leading to it should be
    considered to have been accessed as well.

    In unusual cases it is possible that a directory itself might
    embody important data which is accessed by reading the
    directory. In that situation, agedu's atime-faking policy will
    misreport the directory as disused. In the unlikely event that
    such directories form a significant part of your disk space
    usage, you might want to turn off the faking. The --dir-atime
    option does this: it causes the disk scan to read the original
    atimes of the directories it scans.

    The faking of atimes on directories also requires a processing
    pass over the index file after the main disk scan is complete.
    --dir-atime also turns this pass off. Hence, this option affects
    the -L option as well as -s and -S.

    (The previous section mentioned that there might be subtle
    differences between the output of agedu -s /path -D and agedu -S
    /path. This is why. Doing a scan with -s and then dumping it
    with -D will dump the fully faked atimes on the directories,
    whereas doing a scan-to-dump with -S will dump only partially
    faked atimes - specifically, each directory's last modification
    time - since the subsequent processing pass will not have had a
    chance to take place. However, loading either of the resulting
    dump files with -L will perform the atime-faking processing
    pass, leading to the same data in the index file in each case.
    In normal usage it should be safe to ignore all of this
    complexity.)
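
The idea behind the faked directory atime can be illustrated with
standard tools. The sketch below is not how agedu computes it: it
builds a made-up scratch tree, sets two access times by hand, and then
asks find and ls which file below the directory was accessed most
recently. It covers only the `newest atime of any file' half of the
rule and leaves out the comparison with the directory's own
modification time.

```shell
# Build a small scratch tree (all names here are invented).
tree=$(mktemp -d)
mkdir -p "$tree/a/b"
: > "$tree/a/old.dat"
: > "$tree/a/b/new.dat"

# Set explicit access times; the -t format is CCYYMMDDhhmm.
touch -a -t 201001010000 "$tree/a/old.dat"
touch -a -t 202001010000 "$tree/a/b/new.dat"

# ls -u selects the access time and -t sorts newest first, so the
# first line names the most recently accessed file below a/.
newest=$(find "$tree/a" -type f -exec ls -tu {} + | head -n 1)
printf 'newest access below a/: %s\n' "$newest"
```

Under the policy described above, the directory a/ would be treated as
being at least as recently accessed as that file, even though the file
sits one level further down.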

--mtime
    This option causes agedu to index files by their last
    modification time instead of their last access time. You might
    want to use this if your last access times were completely
    useless for some reason: for example, if you had recently
    searched every file on your system, the system would have lost
    all the information about what files you hadn't recently
    accessed before then. Using this option is liable to be less
    effective at finding genuinely wasted space than the normal mode
    (that is, it will be more likely to flag things as disused when
    they're not, so you will have more candidates to go through by
    hand looking for data you don't need), but may be better than
    nothing if your last-access times are unhelpful.

    Another use for this mode might be to find recently created
    large data. If your disk has been gradually filling up for
    years, the default mode of agedu will let you find unused data
    to delete; but if you know your disk had plenty of space
    recently and now it's suddenly full, and you suspect that some
    rogue program has left a large core dump or output file, then
    agedu --mtime might be a convenient way to locate the culprit.

--logicalsize
    This option causes agedu to consider the size of each file to be
    its `logical' size, rather than the amount of space it consumes
    on disk. (That is, it will use st_size instead of st_blocks in
    the data returned from stat(2).) This option makes agedu less
    accurate at reporting how much of your disk is used, but it
    might be useful in specialist cases, such as working around a
    file system that is misreporting physical sizes.

    For most files, the physical size of a file will be larger than
    the logical size, reflecting the fact that filesystem layouts
    generally allocate a whole number of blocks of the disk to each
    file, so some space is wasted at the end of the last block. So
    counting only the logical file size will typically cause
    under-reporting of the disk usage (perhaps large under-reporting
    in the case of a very large number of very small files).

    On the other hand, sometimes a file with a very large logical
    size can have `holes' where no data is actually stored, in which
    case using the logical size of the file will over-report its
    disk usage. So the use of logical sizes can give wrong answers
    in both directions.
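
The difference between the two sizes is easy to observe with a file
that consists entirely of a hole. The following sketch uses only
standard utilities rather than agedu; the 1 MiB figure is arbitrary,
and the physical size printed depends on whether the file system
holding the temporary file supports holes.

```shell
f=$(mktemp)

# Create a 1 MiB file that is all hole: seek past the start of the
# output file and write nothing.
dd if=/dev/zero of="$f" bs=1 count=0 seek=1048576 2>/dev/null

# Logical size: the byte count a reader of the file would see.
logical_bytes=$(wc -c < "$f" | tr -d ' ')
# Physical size: the space allocated on disk, in units of 1024 bytes.
physical_kib=$(du -k "$f" | cut -f1)

printf 'logical:  %s bytes\n' "$logical_bytes"
printf 'physical: %s KiB allocated\n' "$physical_kib"
rm -f "$f"
```

On a file system that stores holes sparsely, the second figure comes
out far smaller than the first, which is the over-reporting case that
--logicalsize would run into.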

The following option affects all the modes that generate reports: the
web server mode -w, the stand-alone HTML generation mode -H and the
text report mode -t.

--files
    This option causes agedu's reports to list the individual files
    in each directory, instead of just giving a combined report for
    everything that's not in a subdirectory.

The following option affects the text report mode -t.

-a age or --age age
    This option tells agedu to report only files of at least the
    specified age. An age is specified as a number, followed by one
    of `y' (years), `m' (months), `w' (weeks) or `d' (days). (This
    syntax is also used by the -r option.) For example, -a 6m will
    produce a text report which includes only files at least six
    months old.
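
If ages are being passed to agedu from a script, it can be handy to
check them first. The function below is a hypothetical helper, not
part of agedu: it accepts a string of decimal digits followed by
exactly one of the four unit letters listed above. Whether agedu
itself accepts any other forms of number is not stated here, so this
sketch is deliberately strict.

```shell
# is_age STRING
# Succeed if STRING is one or more decimal digits followed by exactly
# one of the unit letters y, m, w or d.
is_age() {
    case $1 in
        [0-9]*[ymwd]) ;;        # starts with a digit, ends with a unit
        *) return 1 ;;
    esac
    digits=${1%?}               # everything except the final unit letter
    case $digits in
        *[!0-9]*) return 1 ;;   # something other than a digit crept in
    esac
    return 0
}

for candidate in 6m 10y 6h 2y3m; do
    if is_age "$candidate"; then
        printf '%s: looks like an age\n' "$candidate"
    else
        printf '%s: rejected\n' "$candidate"
    fi
done
```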

The following options affect the stand-alone HTML generation mode -H
and the text report mode -t.

-d depth or --depth depth
    This option controls the maximum depth to which agedu recurses
    when generating a text or HTML report.

    In text mode, the default is 1, meaning that the report will
    include the directory given on the command line and all of its
    immediate subdirectories. A depth of two includes another level
    below that, and so on; a depth of zero means only the directory
    on the command line.

    In HTML mode, specifying this option switches agedu from writing
    out a single HTML file to writing out multiple files which link
    to each other. A depth of 1 means agedu will write out an HTML
    file for the given directory and also one for each of its
    immediate subdirectories.

    If you want agedu to recurse as deeply as possible, give the
    special word `max' as an argument to -d.

-o filename or --output filename
    This option is used to specify an output file for agedu to write
    its report to. In text mode or single-file HTML mode, the
    argument is treated as the name of a file. In multiple-file HTML
    mode, the argument is treated as the name of a directory: the
    directory will be created if it does not already exist, and the
    output HTML files will be created inside it.

The following option affects only the stand-alone HTML generation mode
-H, and even then, only in recursive mode (with -d):

--numeric
    This option tells agedu to name most of its output HTML files
    numerically. The root of the whole output file collection will
    still be called index.html, but all the rest will have names
    like 73.html or 12525.html. (The numbers are essentially
    arbitrary; in fact, they're indices of nodes in the data
    structure used by agedu's index file.)

    This system of file naming is less intuitive than the default of
    naming files after the sub-pathname they index. It's also less
    stable: the same pathname will not necessarily be represented by
    the same filename if agedu -H is re-run after another scan of
    the same directory tree. However, it does have the virtue that
    it keeps the filenames short, so that even if your directory
    tree is very deep, the output HTML files won't exceed any OS
    limit on filename length.

The following options affect the web server mode -w, and in some cases
also the stand-alone HTML generation mode -H:

-r age range or --age-range age range
    The HTML reports produced by agedu use a range of colours to
    indicate how long ago data was last accessed, running from red
    (representing the most disused data) to green (representing the
    newest). By default, the lengths of time represented by the two
    ends of that spectrum are chosen by examining the data file to
    see what range of ages appears in it. However, you might want to
    set your own limits, and you can do this using -r.

    The argument to -r consists of a single age, or two ages
    separated by a minus sign. An age is a number, followed by one
    of `y' (years), `m' (months), `w' (weeks) or `d' (days). (This
    syntax is also used by the -a option.) The first age in the
    range represents the oldest data, and will be coloured red in
    the HTML; the second age represents the newest, coloured green.
    If the second age is not specified, it will default to zero (so
    that green means data which has been accessed just now).

    For example, -r 2y will mark data in red if it has been unused
    for two years or more, and green if it has been accessed just
    now. -r 2y-3m will similarly mark data red if it has been unused
    for two years or more, but will mark it green if it has been
    accessed three months ago or later.
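
The way a range argument breaks down into its two ends can be sketched
as follows. The function split_range is a made-up illustration of the
rule just described (a missing second age defaults to zero); it does
not validate the ages and is not agedu's own parser.

```shell
# split_range RANGE
# Print the two ends of an age range written as "AGE" or "AGE-AGE".
split_range() {
    case $1 in
        *-*) oldest=${1%%-*}; newest=${1#*-} ;;   # two ages given
        *)   oldest=$1;       newest=0 ;;         # second age defaults to zero
    esac
    printf 'red at %s or older, green at %s\n' "$oldest" "$newest"
}

split_range 2y       # prints "red at 2y or older, green at 0"
split_range 2y-3m    # prints "red at 2y or older, green at 3m"
```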
612
613 --address addr[:port]
614 Specifies the network address and port number on which agedu
615 should listen when running its web server. If you want agedu to
616 listen for connections coming in from any source, specify the
617 address as the special value ANY. If the port number is omitted,
618 an arbitrary unused port will be chosen for you and displayed.
619
620 If you specify this option, agedu will not print its URL on
621 standard output (since you are expected to know what address you
622 told it to listen to).
623
624 --auth auth-type
625 Specifies how agedu should control access to the web pages it
626 serves. The options are as follows:
627
628 magic This option only works on Linux, and only when the incom‐
629 ing connection is from the same machine that agedu is
630 running on. On Linux, the special file /proc/net/tcp con‐
631 tains a list of network connections currently known to
632 the operating system kernel, including which user id cre‐
633 ated them. So agedu will look up each incoming connection
634 in that file, and allow access if it comes from the same
635 user id under which agedu itself is running. Therefore,
636 in agedu's normal web server mode, you can safely run it
637 on a multi-user machine and no other user will be able to
638 read data out of your index file.
639
640 basic In this mode, agedu will use HTTP Basic authentication:
641 the user will have to provide a username and password via
642 their browser. agedu will normally make up a username and
643 password for the purpose, but you can specify your own;
644 see below.
645
646 none In this mode, the web server is unauthenticated: anyone
647 connecting to it has full access to the reports generated
648 by agedu. Do not do this unless there is nothing confi‐
649 dential at all in your index file, or unless you are cer‐
650 tain that nobody but you can run processes on your com‐
651 puter.
652
653 default
654 This is the default mode if you do not specify one of the
655 above. In this mode, agedu will attempt to use Linux
656 magic authentication, but if it detects at startup time
657 that /proc/net/tcp is absent or non-functional then it
658 will fall back to using HTTP Basic authentication and
659 invent a user name and password.
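
As an illustration of the lookup that magic authentication relies on, the following Python sketch finds the user id owning one end of a TCP connection in text taken from /proc/net/tcp. It is not agedu's own code: the function names are hypothetical, it handles IPv4 only, and it assumes a little-endian machine. The column layout (local address, remote address, and the user id in the eighth column) is that of the Linux file.

```python
import socket
import struct


def parse_hex_endpoint(field):
    """Decode an IPv4 'ADDR:PORT' field as printed in /proc/net/tcp."""
    addr_hex, port_hex = field.split(":")
    # The address is a 32-bit value printed in host byte order; this
    # sketch assumes a little-endian machine such as x86.
    addr = socket.inet_ntoa(struct.pack("<I", int(addr_hex, 16)))
    return addr, int(port_hex, 16)


def owner_uid(table_text, client, server):
    """Return the uid owning the socket whose local end is `client` and
    whose remote end is `server`, or None if no row matches."""
    for line in table_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8:
            continue
        if (parse_hex_endpoint(fields[1]) == client
                and parse_hex_endpoint(fields[2]) == server):
            return int(fields[7])  # eighth column is the uid
    return None
```

A server following this approach would pass the peer address of an accepted connection as client and its own listening address as server, then compare the result with os.getuid().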
660
661 --auth-file filename or --auth-fd fd
662 When agedu is using HTTP Basic authentication, these options
663 allow you to specify your own user name and password. If you
664 specify --auth-file, these will be read from the specified file;
665 if you specify --auth-fd they will instead be read from a given
666 file descriptor which you should have arranged to pass to agedu.
667 In either case, the authentication details should consist of the
668 username, followed by a colon, followed by the password, fol‐
669 lowed immediately by end of file (no trailing newline, or else
670 it will be considered part of the password).
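
Because a trailing newline would become part of the password, it is easy to get this file wrong with a text editor or echo. The following Python sketch shows one way to write the credentials in the form described above. The helper name, the UTF-8 encoding and the 0600 permissions are choices made for this example, not requirements stated by agedu.

```python
import os


def write_auth_file(path, username, password):
    """Write 'username:password' with no trailing newline."""
    if ":" in username:
        # The first colon separates the two fields, so the user name
        # itself cannot contain one.
        raise ValueError("username must not contain a colon")
    # Mode 0o600 applies only if the file is newly created here.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        # UTF-8 is an assumption of this sketch.
        f.write(f"{username}:{password}".encode("utf-8"))
```

The resulting file can then be named in the --auth-file option.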
671
672 --title title
673 Specify the string that appears at the start of the <title> sec‐
674 tion of the output HTML pages. The default is `agedu'. This
675 title is followed by a colon and then the path you're viewing
676 within the index file. You might use this option if you were
677 serving agedu reports for several different servers and wanted
678 to make it clearer which one a user was looking at.
679
680 --launch shell-command
681 Specify a command to be run with the base URL of the web user
682 interface, once the web server has started up. The command will
683 be interpreted by /bin/sh, and the base URL will be appended to
684 it as an extra argument word.
685
686 A typical use for this would be `--launch=browse', which uses
687 the XDG `browse' command to automatically open the agedu web
688 interface in your default browser. However, other uses are pos‐
689 sible: for example, you could provide a command which communi‐
690 cates the URL to some other software that will use it for some‐
691 thing.
692
693 --no-eof
694 Stop agedu in web server mode from looking for end-of-file on
695 standard input and treating it as a signal to terminate.
696
698 The data file is pretty large. The core of agedu is the tree-based data
699 structure it uses in its index in order to efficiently perform the
700 queries it needs; this data structure requires O(N log N) storage. This
701 is larger than you might expect; a scan of my own home directory, con‐
702 taining half a million files and directories and about 20 GB of data,
703 produced an index file over 60 MB in size. Furthermore, since the data
704 file must be memory-mapped during most processing, it can never grow
705 larger than available address space, so a really big filesystem may
706 need to be indexed on a 64-bit computer. (This is one reason for the
707 existence of the -D and -L options: you can do the scanning on the
708 machine with access to the filesystem, and the indexing on a machine
709 big enough to handle it.)
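
To get a rough feel for how the index grows, the following Python sketch scales the single data point quoted above (half a million entries, about 60 megabytes) by N log N. This is back-of-envelope arithmetic only: the function name is hypothetical, the quoted size is read as 60 million bytes, and real index sizes will depend on the shape of the tree being scanned.

```python
import math


def estimated_index_bytes(entries, ref_entries=500_000, ref_bytes=60 * 10**6):
    """Scale the reference index size in proportion to N log N."""
    scale = (entries * math.log2(entries)) / (
        ref_entries * math.log2(ref_entries))
    return ref_bytes * scale


# Compare a hypothetical 50-million-entry scan with a 32-bit address space.
estimate = estimated_index_bytes(50_000_000)
print(f"{estimate / 2**30:.1f} GiB estimated, "
      f"exceeds 4 GiB: {estimate > 2**32}")
```

Under these assumptions, a hundredfold increase in the number of entries increases the index by rather more than a hundredfold, which is why a very large filesystem can outgrow a 32-bit address space.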
710
711 The data structure also does not usefully permit access control within
712 the data file, so it would be difficult - even given the willingness to
713 do additional coding - to run a system-wide agedu scan on a cron job
714 and serve the right subset of reports to each user.
715
716 In certain circumstances, agedu can report false positives (reporting
717 files as disused which are in fact in use) as well as the more benign
718 false negatives (reporting files as in use which are not). This arises
719 when a file is, semantically speaking, `read' without actually being
720 physically read. Typically this occurs when a program checks whether
721 the file's mtime has changed and only bothers re-reading it if it has;
722 programs which do this include rsync(1) and make(1). Such programs will
723 fail to update the atime of unmodified files despite depending on their
724 continued existence; a directory full of such files will be reported as
725 disused by agedu even in situations where deleting them will cause
726 trouble.
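
The effect can be seen in a few lines of Python. This sketch backdates a temporary file, inspects its mtime with stat() in the way a make-like tool might, and confirms that the atime has not moved. The helper name is hypothetical, and the sketch says nothing about what a real read of the file would do to the atime, since that depends on mount options.

```python
import os
import tempfile


def mtime_check_leaves_atime(path):
    """Return True if inspecting mtime via stat() left atime unchanged."""
    before = os.stat(path).st_atime_ns
    _ = os.stat(path).st_mtime_ns  # all that an mtime check looks at
    after = os.stat(path).st_atime_ns
    return before == after


with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"cached build output\n")
    path = f.name
try:
    # Backdate both timestamps by roughly a year.
    year_ago = os.stat(path).st_mtime - 365 * 24 * 3600
    os.utime(path, (year_ago, year_ago))
    unchanged = mtime_check_leaves_atime(path)
    print(unchanged)
finally:
    os.remove(path)
```

A file consulted only in this way will keep looking a year old to agedu, however often the check is made.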
727
728 Finally, of course, agedu's normal usage mode depends critically on the
729 OS providing last-access times which are at least approximately right.
730 So a file system mounted with Linux's `noatime' option, or the equiva‐
731 lent on any other OS, will not give useful results! (However, the Linux
732 mount option `relatime', which distributions now tend to use by
733 default, should be fine for all but specialist purposes: it reduces the
734 accuracy of last-access times so that they might be wrong by up to 24
735 hours, but if you're looking for files that have been unused for months
736 or years, that's not a problem.)
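
Before trusting a scan, it may be worth checking how the file systems involved are mounted. The following Python sketch extracts the atime-related options from text in the format of Linux's /proc/mounts. The function name is hypothetical, only the options listed in the code are examined, and mount points containing escaped characters are not decoded.

```python
def atime_options(mounts_text):
    """Map each mount point to the atime-related options it carries."""
    interesting = {"noatime", "relatime", "strictatime"}
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        # Fields are: device, mount point, file system type, options.
        mount_point, options = fields[1], fields[3].split(",")
        # A later entry for the same mount point replaces an earlier one.
        result[mount_point] = sorted(interesting.intersection(options))
    return result
```

On a Linux system this could be called as atime_options(open("/proc/mounts").read()); any mount point reported with noatime is one where agedu's age information should not be relied upon.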
737
739 agedu is free software, distributed under the MIT licence. Type agedu
740 --licence to see the full licence text.
741
742
743
744Simon Tatham 2008‐11‐02 agedu(1)