agedu(1)                        Simon Tatham                        agedu(1)

NAME
agedu - correlate disk usage with last-access times to identify large
and disused data

SYNOPSIS
agedu [ options ] action [action...]

DESCRIPTION
agedu scans a directory tree and produces reports about how much disk
space is used in each directory and subdirectory, and also how that
usage of disk space corresponds to files with last-access times a long
time ago.

In other words, agedu is a tool you might use to help you free up disk
space. It lets you see which directories are taking up the most space,
as du does; but unlike du, it also distinguishes between large
collections of data which are still in use and ones which have not been
accessed in months or years - for instance, large archives downloaded,
unpacked, used once, and never cleaned up. Where du helps you find
what's using your disk space, agedu helps you find what's wasting your
disk space.

agedu has several operating modes. In one mode, it scans your disk and
builds an index file containing a data structure which allows it to
efficiently retrieve any information it might need. Typically, you
would use it in this mode first, and then run it in one of a number of
`query' modes to display a report of the disk space usage of a
particular directory and its subdirectories. Those reports can be
produced as plain text (much like du) or as HTML. agedu can even run as
a miniature web server, presenting each directory's HTML report with
hyperlinks to let you navigate around the file system to similar
reports for other directories.

So you would typically start using agedu by telling it to do a scan of
a directory tree and build an index. This is done with a command such
as

    $ agedu -s /home/fred

which will build a large data file called agedu.dat in your current
directory. (If that current directory is inside /home/fred, don't worry
- agedu is smart enough to discount its own index file.)

Having built the index, you would now query it for reports of disk
space usage. If you have a graphical web browser, the simplest and
nicest way to query the index is by running agedu in web server mode:

    $ agedu -w

which will print (among other messages) a URL on its standard output
along the lines of

    URL: http://127.0.0.1:48638/

(That URL will always begin with `127.', meaning that it's in the
localhost address space. So only processes running on the same computer
can even try to connect to that web server, and also there is access
control to prevent other users from seeing it - see below for more
detail.)

Now paste that URL into your web browser, and you will be shown a
graphical representation of the disk usage in /home/fred and its
immediate subdirectories, with varying colours used to show the
difference between disused and recently-accessed data. Click on any
subdirectory to descend into it and see a report for its subdirectories
in turn; click on parts of the pathname at the top of any page to
return to higher-level directories. When you've finished browsing, you
can just press Ctrl-D to send an end-of-file indication to agedu, and
it will shut down.

After that, you probably want to delete the data file agedu.dat, since
it's pretty large. In fact, the command agedu -R will do this for you;
and you can chain agedu commands on the same command line, so that
instead of the above you could have done

    $ agedu -s /home/fred -w -R

for a single self-contained run of agedu which builds its index, serves
web pages from it, and cleans it up when finished.

In some situations, you might want to scan the directory structure of
one computer, but run agedu's user interface on another. In that case,
you can do your scan using the agedu -S option in place of agedu -s,
which will make agedu not bother building an index file but instead
just write out its scan results in plain text on standard output; then
you can funnel that output to the other machine using SSH (or whatever
other technique you prefer), and there, run agedu -L to load in the
textual dump and turn it into an index file. For example, you might run
a command like this (plus any ssh options you need) on the machine you
want to scan:

    $ agedu -S /home/fred | ssh indexing-machine agedu -L

or, equivalently, run something like this on the other machine:

    $ ssh machine-to-scan agedu -S /home/fred | agedu -L

Either way, the agedu -L command will create an agedu.dat index file,
which you can then use with agedu -w just as above.

(Another way to do this might be to build the index file on the first
machine as normal, and then just copy it to the other machine once it's
complete. However, for efficiency, the index file is formatted
differently depending on the CPU architecture that agedu is compiled
for. So if that doesn't match between the two machines - e.g. if one is
a 32-bit machine and one 64-bit - then agedu.dat files written on one
machine will not work on the other. The technique described above using
-S and -L should work between any two machines.)
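
If you do this often, the scan-and-load pipeline can be wrapped in a
small shell function. What follows is only a sketch: the function name
remote_scan and its argument order are made up for illustration, and it
assumes agedu is installed on both machines and that the directory name
contains nothing the remote shell would reinterpret.

```shell
# remote_scan HOST DIR [INDEX]
# Scan DIR on HOST with 'agedu -S' and build a local index from the
# dump with 'agedu -L'. INDEX defaults to agedu.dat, agedu's own default.
remote_scan() {
    host=$1
    dir=$2
    index=${3:-agedu.dat}
    # ssh joins its arguments into a single remote command line, so DIR
    # is re-parsed by the remote shell: keep it free of spaces and
    # wildcard characters, or add quoting to suit your setup.
    ssh "$host" agedu -S "$dir" | agedu -L -f "$index"
}
```

You would then run something like `remote_scan machine-to-scan
/home/fred` and follow it with agedu -w as usual.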

If you don't have a graphical web browser, you can do text-based
queries instead of using agedu's web interface. Having scanned
/home/fred in any of the ways suggested above, you might run

    $ agedu -t /home/fred

which again gives a summary of the disk usage in /home/fred and its
immediate subdirectories; but this time agedu will print it on standard
output, in much the same format as du. If you then want to find out how
much old data is there, you can add the -a option to show only files
last accessed a certain length of time ago. For example, to show only
files which haven't been looked at in six months or more:

    $ agedu -t /home/fred -a 6m
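
Since the text report looks much like du output, the usual du idioms
ought to carry over. The sketch below picks out the directories holding
the most old data. The function name old_hogs is hypothetical, and the
sort relies on an assumption: that each report line begins with a
numeric size, as du's lines do.

```shell
# old_hogs DIR AGE [COUNT]
# Print the COUNT (default 10) largest entries from a text report that
# is restricted to data at least AGE old.
old_hogs() {
    dir=$1
    age=$2
    count=${3:-10}
    # Assumption: every line starts with a numeric size, as in du, so a
    # reverse numeric sort puts the biggest entries first.
    agedu -t "$dir" -a "$age" | sort -n -r | head -n "$count"
}
```

For example, `old_hogs /home/fred 6m 5` would show the five largest
entries among data untouched for six months or more.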

That's the essence of what agedu does. It has other modes of operation
for more complex situations, and the usual array of configurable
options. The following sections contain a complete reference for all
its functionality.

OPERATING MODES
This section describes the operating modes supported by agedu. Each of
these is in the form of a command-line option, sometimes with an
argument. Multiple operating-mode options may appear on the command
line, in which case agedu will perform the specified actions one after
another. For instance, as shown in the previous section, you might want
to perform a disk scan and immediately launch a web server giving
reports from that scan.

-s directory or --scan directory
    In this mode, agedu scans the file system starting at the specified
    directory, and indexes the results of the scan into a large data
    file which other operating modes can query.

    By default, the scan is restricted to a single file system (since
    the expected use of agedu is that you would probably use it because
    a particular disk partition was running low on space). You can
    remove that restriction using the --cross-fs option; other
    configuration options allow you to include or exclude files or
    entire subdirectories from the scan. See the next section for full
    details of the configurable options.

    The index file is created with restrictive permissions, in case the
    file system you are scanning contains confidential information in
    its structure.

    Index files are dependent on the characteristics of the CPU
    architecture you created them on. You should not expect to be able
    to move an index file between different types of computer and have
    it continue to work. If you need to transfer the results of a disk
    scan to a different kind of computer, see the -D and -L options
    below.

-w or --web
    In this mode, agedu expects to find an index file already written.
    It allocates a network port, and starts up a web server on that
    port which serves reports generated from the index file. By default
    it invents its own URL and prints it out.

    The web server runs until agedu receives an end-of-file event on
    its standard input. (The expected usage is that you run it from the
    command line, immediately browse web pages until you're satisfied,
    and then press Ctrl-D.) To disable the EOF behaviour, use the
    --no-eof option.

    In case the index file contains any confidential information about
    your file system, the web server protects the pages it serves from
    access by other people. On Linux, this is done transparently by
    means of using /proc/net/tcp to check the owner of each incoming
    connection; failing that, the web server will require a password to
    view the reports, and agedu will print the password it invented on
    standard output along with the URL.

    Configurable options for this mode let you specify your own address
    and port number to listen on, and also specify your own choice of
    authentication method (including turning authentication off
    completely) and a username and password of your choice.

-t directory or --text directory
    In this mode, agedu generates a textual report on standard output,
    listing the disk usage in the specified directory and all its
    subdirectories down to a given depth. By default that depth is 1,
    so that you see a report for directory itself and all of its
    immediate subdirectories. You can configure a different depth (or
    no depth limit) using -d, described in the next section.

    Used on its own, -t merely lists the total disk usage in each
    subdirectory; agedu's additional ability to distinguish unused from
    recently-used data is not activated. To activate it, use the -a
    option to specify a minimum age.

    The directory structure stored in agedu's index file is treated as
    a set of literal strings. This means that you cannot refer to
    directories by synonyms. So if you ran agedu -s ., then all the
    path names you later pass to the -t option must be either `.' or
    begin with `./'. Similarly, symbolic links within the directory you
    scanned will not be followed; you must refer to each directory by
    its canonical, symlink-free pathname.
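
Because path names are compared as literal strings, a script that
drives agedu -t may want to check a query path against the scan root
before using it. Here is a minimal sketch of such a check; the function
name under_scan_root is made up, and the test is purely textual: it
does not consult the index file or the file system.

```shell
# under_scan_root ROOT PATH
# Succeed if PATH is literally ROOT itself or begins with ROOT followed
# by a slash, which is the form -t needs for a tree scanned as ROOT.
under_scan_root() {
    root=${1%/}                 # tolerate one trailing slash on ROOT
    case $2 in
        "$root"|"$root"/*) return 0 ;;
    esac
    return 1
}
```

So after agedu -s ., the path ./src passes the check but src and
/home/fred/src do not, even though all three may name the same
directory.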

-R or --remove
    In this mode, agedu deletes its index file. Running just agedu -R
    on its own is therefore equivalent to typing rm agedu.dat. However,
    you can also put -R on the end of a command line to indicate that
    agedu should delete its index file after it finishes performing
    other operations.

-S directory or --scan-dump directory
    In this mode, agedu will scan a directory tree and convert the
    results straight into a textual dump on standard output, without
    generating an index file at all. The dump data is intended for
    agedu -L to read.

-L or --load
    In this mode, agedu expects to read a dump produced by the -S
    option from its standard input. It constructs an index file from
    that dump, exactly as it would have if it had read the same data
    from a disk scan in -s mode.

-D or --dump
    In this mode, agedu reads an existing index file and produces a
    dump of its contents on standard output, in the same format used by
    -S and -L. This option could be used to convert an existing index
    file into a format acceptable to a different kind of computer, by
    dumping it using -D and then loading the dump back in on the other
    machine using -L.

    (The output of agedu -D on an existing index file will not be
    exactly identical to what agedu -S would have originally produced,
    due to a difference in treatment of last-access times on
    directories. However, it should be effectively equivalent for most
    purposes. See the documentation of the --dir-atime option in the
    next section for further detail.)

-H directory or --html directory
    In this mode, agedu will generate an HTML report of the disk usage
    in the specified directory and its immediate subdirectories, in the
    same form that it serves from its web server in -w mode.

    By default, a single HTML report will be generated and simply
    written to standard output, with no hyperlinks pointing to other
    similar pages. If you also specify the -d option (see below), agedu
    will instead write out a collection of HTML files with hyperlinks
    between them, and call the top-level file index.html.

--cgi
    In this mode, agedu will run as the bulk of a CGI script which
    provides the same set of web pages as the built-in web server
    would. It will read the usual CGI environment variables, and write
    CGI-style data to its standard output.

    The actual CGI program itself should be a tiny wrapper around agedu
    which passes it the --cgi option, and also (probably) -f to locate
    the index file. agedu will do everything else. For example, your
    script might read

        #!/bin/sh
        /some/path/to/agedu --cgi -f /some/other/path/to/agedu.dat

    (Note that agedu will produce the entire CGI output, including
    status code, HTTP headers and the full HTML document. If you try to
    surround the call to agedu --cgi with code that adds your own HTML
    header and footer, you won't get the results you want, and agedu's
    HTTP-level features such as auto-redirecting to canonical versions
    of URIs will stop working.)

    No access control is performed in this mode: restricting access to
    CGI scripts is assumed to be the job of the web server.

-h or --help
    Causes agedu to print some help text and terminate immediately.

-V or --version
    Causes agedu to print its version number and terminate immediately.

OPTIONS
This section describes the various configuration options that affect
agedu's operation in one mode or another.

The following option affects nearly all modes (except -S):

-f filename or --file filename
    Specifies the location of the index file which agedu creates, reads
    or removes depending on its operating mode. By default, this is
    simply `agedu.dat', in whatever is the current working directory
    when you run agedu.

The following options affect the disk-scanning modes, -s and -S:

--cross-fs and --no-cross-fs
    These configure whether or not the disk scan is permitted to cross
    between different file systems. The default is not to: agedu will
    normally skip over subdirectories on which a different file system
    is mounted. This makes it convenient when you want to free up space
    on a particular file system which is running low. However, in other
    circumstances you might wish to see general information about the
    use of space no matter which file system it's on (for instance, if
    your real concern is your backup media running out of space, and if
    your backups do not treat different file systems specially); in
    that situation, use --cross-fs.

    (Note that this default is the opposite way round from the
    corresponding option in du.)

--prune wildcard and --prune-path wildcard
    These cause particular files or directories to be omitted entirely
    from the scan. If agedu's scan encounters a file or directory whose
    name matches the wildcard provided to the --prune option, it will
    not include that file in its index, and also if it's a directory it
    will skip over it and not scan its contents.

    Note that in most Unix shells, wildcards will probably need to be
    escaped on the command line, to prevent the shell from expanding
    the wildcard before agedu sees it.

    --prune-path is similar to --prune, except that the wildcard is
    matched against the entire pathname instead of just the filename at
    the end of it. So whereas --prune *a*b* will match any file whose
    actual name contains an a somewhere before a b, --prune-path *a*b*
    will also match a file whose name contains b and which is inside a
    directory containing an a, or any file inside a directory of that
    form, and so on.
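
The difference between matching the filename and matching the whole
pathname can be tried out with the shell's own pattern matching. This
is an approximation for illustration only: the two function names are
invented, and agedu's wildcard matcher may differ in detail from the
shell's case patterns (in which `*' also matches `/').

```shell
# name_matches PATTERN PATH
# Match PATTERN against the last component of PATH only, in the manner
# described for --prune, --exclude and --include.
name_matches() {
    case ${2##*/} in
        $1) return 0 ;;
    esac
    return 1
}

# path_matches PATTERN PATH
# Match PATTERN against the whole of PATH, in the manner described for
# the -path variants of those options.
path_matches() {
    case $2 in
        $1) return 0 ;;
    esac
    return 1
}
```

With the pattern '*a*b*', ./alpha/b.txt fails name_matches (the name
b.txt contains no a) but passes path_matches. Note the single quotes:
when passing such a pattern to agedu itself, as in agedu -s . --prune
'*.o', they stop the shell from expanding it first.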

--exclude wildcard and --exclude-path wildcard
    These cause particular files or directories to be omitted from the
    index, but not from the scan. If agedu's scan encounters a file or
    directory whose name matches the wildcard provided to the --exclude
    option, it will not include that file in its index - but unlike
    --prune, if the file in question is a directory it will still scan
    its contents and index them if they are not ruled out themselves by
    --exclude options.

    As above, --exclude-path is similar to --exclude, except that the
    wildcard is matched against the entire pathname.

--include wildcard and --include-path wildcard
    These cause particular files or directories to be re-included in
    the index and the scan, if they had previously been ruled out by
    one of the above exclude or prune options. You can interleave
    include, exclude and prune options as you wish on the command line,
    and if more than one of them applies to a file then the last one
    takes priority.

    For example, if you wanted to see only the disk space taken up by
    MP3 files, you might run

        $ agedu -s . --exclude '*' --include '*.mp3'

    which will cause everything to be omitted from the scan, but then
    the MP3 files to be put back in. If you then wanted only a subset
    of those MP3s, you could then exclude some of them again by adding,
    say, `--exclude-path './queen/*'' (or, more efficiently, `--prune
    ./queen') on the end of that command.

    As with the previous two options, --include-path is similar to
    --include except that the wildcard is matched against the entire
    pathname.
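
The `last one takes priority' rule can be modelled in a few lines of
shell. This is a toy model rather than agedu's code: the function name
decide and the `include:'/`exclude:' rule syntax are invented for the
example, it matches filenames only, and it ignores the extra effect of
--prune on directory traversal.

```shell
# decide NAME RULE...
# Each RULE is "include:PATTERN" or "exclude:PATTERN", given in
# command-line order. Prints the verdict of the last rule whose pattern
# matches NAME, or "include" if no rule matches.
decide() {
    name=$1
    shift
    verdict=include             # indexed unless some rule says otherwise
    for rule in "$@"; do
        kind=${rule%%:*}
        pattern=${rule#*:}
        case $name in
            $pattern) verdict=$kind ;;   # later matches override earlier
        esac
    done
    printf '%s\n' "$verdict"
}
```

Here `decide song.mp3 'exclude:*' 'include:*.mp3'` prints include,
while the same rules in the opposite order print exclude, which is why
the order of options on the agedu command line matters.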

--progress, --no-progress and --tty-progress
    When agedu is scanning a directory tree, it will typically print a
    one-line progress report every second showing where it has reached
    in the scan, so you can have some idea of how much longer it will
    take. (Of course, it can't predict exactly how long it will take,
    since it doesn't know which of the directories it hasn't scanned
    yet will turn out to be huge.)

    By default, those progress reports are displayed on agedu's
    standard error channel, if that channel points to a terminal
    device. If you need to manually enable or disable them, you can use
    the above three options to do so: --progress unconditionally
    enables the progress reports, --no-progress unconditionally
    disables them, and --tty-progress reverts to the default behaviour
    which is conditional on standard error being a terminal.

--dir-atime and --no-dir-atime
    In normal operation, agedu ignores the atimes (last access times)
    on the directories it scans: it only pays attention to the atimes
    of the files inside those directories. This is because directory
    atimes tend to be reset by a lot of system administrative tasks,
    such as cron jobs which scan the file system for one reason or
    another - or even other invocations of agedu itself, though it
    tries to avoid modifying any atimes if possible. So the literal
    atimes on directories are typically not representative of how long
    ago the data in question was last accessed with real intent to use
    that data in particular.

    Instead, agedu makes up a fake atime for every directory it scans,
    which is equal to the newest atime of any file in or below that
    directory (or the directory's last modification time, whichever is
    newest). This is based on the assumption that all important
    accesses to directories are actually accesses to the files inside
    those directories, so that when any file is accessed all the
    directories on the path leading to it should be considered to have
    been accessed as well.

    In unusual cases it is possible that a directory itself might
    embody important data which is accessed by reading the directory.
    In that situation, agedu's atime-faking policy will misreport the
    directory as disused. In the unlikely event that such directories
    form a significant part of your disk space usage, you might want to
    turn off the faking. The --dir-atime option does this: it causes
    the disk scan to read the original atimes of the directories it
    scans.

    The faking of atimes on directories also requires a processing pass
    over the index file after the main disk scan is complete.
    --dir-atime also turns this pass off. Hence, this option affects
    the -L option as well as -s and -S.

    (The previous section mentioned that there might be subtle
    differences between the output of agedu -s /path -D and agedu -S
    /path. This is why. Doing a scan with -s and then dumping it with
    -D will dump the fully faked atimes on the directories, whereas
    doing a scan-to-dump with -S will dump only partially faked atimes
    - specifically, each directory's last modification time - since the
    subsequent processing pass will not have had a chance to take
    place. However, loading either of the resulting dump files with -L
    will perform the atime-faking processing pass, leading to the same
    data in the index file in each case. In normal usage it should be
    safe to ignore all of this complexity.)
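
The faking rule described above boils down to taking a maximum. The
following sketch shows that arithmetic on plain timestamps (integer
seconds since the epoch); the function name fake_dir_atime is invented,
and the code illustrates the rule as documented here rather than
agedu's actual implementation.

```shell
# fake_dir_atime DIR_MTIME [ATIME...]
# Print the newest of the directory's own modification time and the
# atimes of the files in or below it.
fake_dir_atime() {
    newest=$1                   # start from the directory's mtime
    shift
    for t in "$@"; do
        if [ "$t" -gt "$newest" ]; then
            newest=$t
        fi
    done
    printf '%s\n' "$newest"
}
```

Given only a directory's mtime and no file atimes, the result is the
mtime itself, which corresponds to the partially faked value described
above for -S dumps.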

--mtime
    This option causes agedu to index files by their last modification
    time instead of their last access time. You might want to use this
    if your last access times were completely useless for some reason:
    for example, if you had recently searched every file on your
    system, the system would have lost all the information about what
    files you hadn't recently accessed before then. Using this option
    is liable to be less effective at finding genuinely wasted space
    than the normal mode (that is, it will be more likely to flag
    things as disused when they're not, so you will have more
    candidates to go through by hand looking for data you don't need),
    but may be better than nothing if your last-access times are
    unhelpful.

    Another use for this mode might be to find recently created large
    data. If your disk has been gradually filling up for years, the
    default mode of agedu will let you find unused data to delete; but
    if you know your disk had plenty of space recently and now it's
    suddenly full, and you suspect that some rogue program has left a
    large core dump or output file, then agedu --mtime might be a
    convenient way to locate the culprit.

The following option affects all the modes that generate reports: the
web server mode -w, the stand-alone HTML generation mode -H and the
text report mode -t.

--files
    This option causes agedu's reports to list the individual files in
    each directory, instead of just giving a combined report for
    everything that's not in a subdirectory.

The following option affects the text report mode -t.

-a age or --age age
    This option tells agedu to report only files of at least the
    specified age. An age is specified as a number, followed by one of
    `y' (years), `m' (months), `w' (weeks) or `d' (days). (This syntax
    is also used by the -r option.) For example, -a 6m will produce a
    text report which includes only files at least six months old.
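
A script that accepts an age from a user might want to check it before
handing it to agedu. The sketch below validates the syntax and turns it
into an approximate number of days. The function name age_to_days is
invented, and the conversion factors are assumptions made for the
example (a month as 30 days, a year as 365); how agedu itself counts
months and years is not specified here.

```shell
# age_to_days SPEC
# Print the approximate number of days for SPEC (e.g. 6m, 2y, 3w, 10d).
# Return 1 without printing anything if SPEC is malformed.
age_to_days() {
    spec=$1
    number=${spec%?}            # everything but the last character
    unit=${spec#"$number"}      # the last character
    case $number in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Drop leading zeros so the shell does not treat the number as octal.
    while [ "${#number}" -gt 1 ] && [ "${number#0}" != "$number" ]; do
        number=${number#0}
    done
    case $unit in
        d) factor=1 ;;
        w) factor=7 ;;
        m) factor=30 ;;         # assumption: a month taken as 30 days
        y) factor=365 ;;        # assumption: a year taken as 365 days
        *) return 1 ;;
    esac
    printf '%s\n' "$((number * factor))"
}
```

A caller could use it as a guard, along the lines of `age_to_days
"$age" >/dev/null && agedu -t /home/fred -a "$age"`.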

The following options affect the stand-alone HTML generation mode -H
and the text report mode -t.

-d depth or --depth depth
    This option controls the maximum depth to which agedu recurses when
    generating a text or HTML report.

    In text mode, the default is 1, meaning that the report will
    include the directory given on the command line and all of its
    immediate subdirectories. A depth of two includes another level
    below that, and so on; a depth of zero means only the directory on
    the command line.

    In HTML mode, specifying this option switches agedu from writing
    out a single HTML file to writing out multiple files which link to
    each other. A depth of 1 means agedu will write out an HTML file
    for the given directory and also one for each of its immediate
    subdirectories.

    If you want agedu to recurse as deeply as possible, give the
    special word `max' as an argument to -d.

-o filename or --output filename
    This option is used to specify an output file for agedu to write
    its report to. In text mode or single-file HTML mode, the argument
    is treated as the name of a file. In multiple-file HTML mode, the
    argument is treated as the name of a directory: the directory will
    be created if it does not already exist, and the output HTML files
    will be created inside it.
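
Putting -H, -d and -o together gives a static copy of the reports that
can be browsed without running agedu's web server. A minimal sketch,
assuming an index has already been built; the function name export_html
is made up for the example.

```shell
# export_html DIR OUTDIR
# Write a linked set of HTML reports for DIR and everything below it
# into OUTDIR, using the existing index file.
export_html() {
    dir=$1
    outdir=$2
    # -d max recurses as deeply as possible; because -d is given, -o is
    # treated as a directory, created if it does not already exist.
    agedu -H "$dir" -d max -o "$outdir"
}
```

After `export_html /home/fred report`, the top-level page should be
report/index.html.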

The following option affects only the stand-alone HTML generation mode
-H, and even then, only in recursive mode (with -d):

--numeric
    This option tells agedu to name most of its output HTML files
    numerically. The root of the whole output file collection will
    still be called index.html, but all the rest will have names like
    73.html or 12525.html. (The numbers are essentially arbitrary; in
    fact, they're indices of nodes in the data structure used by
    agedu's index file.)

    This system of file naming is less intuitive than the default of
    naming files after the sub-pathname they index. It's also less
    stable: the same pathname will not necessarily be represented by
    the same filename if agedu -H is re-run after another scan of the
    same directory tree. However, it does have the virtue that it keeps
    the filenames short, so that even if your directory tree is very
    deep, the output HTML files won't exceed any OS limit on filename
    length.

The following options affect the web server mode -w, and in some cases
also the stand-alone HTML generation mode -H:

-r age range or --age-range age range
    The HTML reports produced by agedu use a range of colours to
    indicate how long ago data was last accessed, running from red
    (representing the most disused data) to green (representing the
    newest). By default, the lengths of time represented by the two
    ends of that spectrum are chosen by examining the data file to see
    what range of ages appears in it. However, you might want to set
    your own limits, and you can do this using -r.

    The argument to -r consists of a single age, or two ages separated
    by a minus sign. An age is a number, followed by one of `y'
    (years), `m' (months), `w' (weeks) or `d' (days). (This syntax is
    also used by the -a option.) The first age in the range represents
    the oldest data, and will be coloured red in the HTML; the second
    age represents the newest, coloured green. If the second age is not
    specified, it will default to zero (so that green means data which
    has been accessed just now).

    For example, -r 2y will mark data in red if it has been unused for
    two years or more, and green if it has been accessed just now. -r
    2y-3m will similarly mark data red if it has been unused for two
    years or more, but will mark it green if it has been accessed three
    months ago or later.

--address addr[:port]
    Specifies the network address and port number on which agedu should
    listen when running its web server. If you want agedu to listen for
    connections coming in from any source, specify the address as the
    special value ANY. If the port number is omitted, an arbitrary
    unused port will be chosen for you and displayed.

    If you specify this option, agedu will not print its URL on
    standard output (since you are expected to know what address you
    told it to listen to).

--auth auth-type
    Specifies how agedu should control access to the web pages it
    serves. The options are as follows:

    magic
        This option only works on Linux, and only when the incoming
        connection is from the same machine that agedu is running on.
        On Linux, the special file /proc/net/tcp contains a list of
        network connections currently known to the operating system
        kernel, including which user id created them. So agedu will
        look up each incoming connection in that file, and allow access
        if it comes from the same user id under which agedu itself is
        running. Therefore, in agedu's normal web server mode, you can
        safely run it on a multi-user machine and no other user will be
        able to read data out of your index file.

    basic
        In this mode, agedu will use HTTP Basic authentication: the
        user will have to provide a username and password via their
        browser. agedu will normally make up a username and password
        for the purpose, but you can specify your own; see below.

    none
        In this mode, the web server is unauthenticated: anyone
        connecting to it has full access to the reports generated by
        agedu. Do not do this unless there is nothing confidential at
        all in your index file, or unless you are certain that nobody
        but you can run processes on your computer.

    default
        This is the default mode if you do not specify one of the
        above. In this mode, agedu will attempt to use Linux magic
        authentication, but if it detects at startup time that
        /proc/net/tcp is absent or non-functional then it will fall
        back to using HTTP Basic authentication and invent a user name
        and password.

--auth-file filename or --auth-fd fd
    When agedu is using HTTP Basic authentication, these options allow
    you to specify your own user name and password. If you specify
    --auth-file, these will be read from the specified file; if you
    specify --auth-fd they will instead be read from a given file
    descriptor which you should have arranged to pass to agedu. In
    either case, the authentication details should consist of the
    username, followed by a colon, followed by the password, followed
    immediately by end of file (no trailing newline, or else it will be
    considered part of the password).
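
The `no trailing newline' requirement is easy to get wrong, because
echo and most text editors add one. A sketch of a helper that writes
the file in the required form follows; the function name
write_auth_file is invented, and restricting the file's permissions is
a precaution added for the example, not something stated above as a
requirement of agedu.

```shell
# write_auth_file FILE USER PASSWORD
# Write "USER:PASSWORD" to FILE with no trailing newline.
write_auth_file() {
    file=$1
    # The subshell keeps the umask change local. With umask 077 a newly
    # created file is readable and writable by its owner only.
    ( umask 077 && printf '%s:%s' "$2" "$3" > "$file" )
}
```

You could then start the server with something like agedu -w --auth
basic --auth-file followed by the name of that file.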

--title title
    Specify the string that appears at the start of the <title> section
    of the output HTML pages. The default is `agedu'. This title is
    followed by a colon and then the path you're viewing within the
    index file. You might use this option if you were serving agedu
    reports for several different servers and wanted to make it clearer
    which one a user was looking at.

--no-eof
    Stop agedu in web server mode from looking for end-of-file on
    standard input and treating it as a signal to terminate.

LIMITATIONS
The data file is pretty large. The core of agedu is the tree-based data
structure it uses in its index in order to efficiently perform the
queries it needs; this data structure requires O(N log N) storage. This
is larger than you might expect; a scan of my own home directory,
containing half a million files and directories and about 20GB of data,
produced an index file over 60MB in size. Furthermore, since the data
file must be memory-mapped during most processing, it can never grow
larger than available address space, so a really big filesystem may
need to be indexed on a 64-bit computer. (This is one reason for the
existence of the -S and -L options: you can do the scanning on the
machine with access to the filesystem, and the indexing on a machine
big enough to handle it.)

The data structure also does not usefully permit access control within
the data file, so it would be difficult - even given the willingness to
do additional coding - to run a system-wide agedu scan on a cron job
and serve the right subset of reports to each user.

In certain circumstances, agedu can report false positives (reporting
files as disused which are in fact in use) as well as the more benign
false negatives (reporting files as in use which are not). This arises
when a file is, semantically speaking, `read' without actually being
physically read. Typically this occurs when a program checks whether
the file's mtime has changed and only bothers re-reading it if it has;
programs which do this include rsync(1) and make(1). Such programs will
fail to update the atime of unmodified files despite depending on their
continued existence; a directory full of such files will be reported as
disused by agedu even in situations where deleting them will cause
trouble.

Finally, of course, agedu's normal usage mode depends critically on the
OS providing last-access times which are at least approximately right.
So a file system mounted with Linux's `noatime' option, or the
equivalent on any other OS, will not give useful results! (However, the
Linux mount option `relatime', which distributions now tend to use by
default, should be fine for all but specialist purposes: it reduces the
accuracy of last-access times so that they might be wrong by up to 24
hours, but if you're looking for files that have been unused for months
or years, that's not a problem.)
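
Before trusting a scan, it may be worth checking how the file system in
question is mounted. The sketch below is Linux-specific and makes some
assumptions: it reads /proc/mounts (or another file in the same format,
given as a second argument), it expects the mount point to be spelled
exactly as it appears there, and the function name has_noatime is
invented.

```shell
# has_noatime MOUNTPOINT [MOUNTS_FILE]
# Succeed if MOUNTPOINT is listed in MOUNTS_FILE (default /proc/mounts)
# with the noatime option.
has_noatime() {
    awk -v mp="$1" '
        $2 == mp {
            # Field 4 is the comma-separated list of mount options.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "noatime") found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "${2:-/proc/mounts}"
}
```

If `has_noatime /home` succeeds, a normal agedu scan of that file
system is unlikely to be meaningful, and --mtime may be the better
choice.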

LICENCE
agedu is free software, distributed under the MIT licence. Type agedu
--licence to see the full licence text.

Simon Tatham                     2008-11-02                     agedu(1)