File::Find(3pm)      Perl Programmers Reference Guide      File::Find(3pm)

NAME
File::Find - Traverse a directory tree.

SYNOPSIS
    use File::Find;
    find(\&wanted, @directories_to_search);
    sub wanted { ... }

    use File::Find;
    finddepth(\&wanted, @directories_to_search);
    sub wanted { ... }

    use File::Find;
    find({ wanted => \&process, follow => 1 }, '.');

DESCRIPTION
These are functions for searching through directory trees and doing
work on each file found, similar to the Unix find command. File::Find
exports two functions, "find" and "finddepth". They work similarly but
differ in when a directory is reported relative to its contents.

find
      find(\&wanted, @directories);
      find(\%options, @directories);

    "find()" does a depth-first search over the given @directories in
    the order they are given. For each file or directory found, it
    calls the &wanted subroutine (see "The wanted function" below for
    details on how to use it). Additionally, for each directory found,
    it will "chdir()" into that directory and continue the search,
    invoking the &wanted function on each file or subdirectory in the
    directory.

finddepth
      finddepth(\&wanted, @directories);
      finddepth(\%options, @directories);

    "finddepth()" works just like "find()" except that it invokes the
    &wanted function for a directory after invoking it for the
    directory's contents. It does a postorder traversal instead of a
    preorder traversal, working from the bottom of the directory tree
    up, whereas "find()" works from the top of the tree down.
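
The difference in visiting order can be observed with a small
experiment. The following is only a sketch: the helper names
build_tree and visit_order, and the directory layout a/b/file.txt, are
made up for illustration and are not part of File::Find.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(mkpath);

# Build a throwaway tree: $root/a/b/file.txt (illustrative layout).
sub build_tree {
    my $root = tempdir(CLEANUP => 1);
    mkpath("$root/a/b");
    open my $fh, '>', "$root/a/b/file.txt" or die "cannot create file: $!";
    close $fh;
    return $root;
}

# Run $walker (\&find or \&finddepth) over $root and return the paths
# in the order the callback saw them, relative to $root.
sub visit_order {
    my ($walker, $root) = @_;
    my @seen;
    $walker->(sub {
        my $rel = $File::Find::name;
        $rel =~ s/^\Q$root\E//;              # strip the temporary prefix
        push @seen, $rel eq '' ? '.' : $rel;
    }, $root);
    return @seen;
}

my $root = build_tree();
print "find:      @{[ visit_order(\&find, $root) ]}\n";
print "finddepth: @{[ visit_order(\&finddepth, $root) ]}\n";
# find:      . /a /a/b /a/b/file.txt
# finddepth: /a/b/file.txt /a/b /a .
```

Each directory level here holds a single entry, so the order does not
depend on how the file system happens to list siblings.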

%options

The first argument to "find()" is either a code reference to your
&wanted function, or a hash reference describing the operations to be
performed for each file. The code reference is described in "The wanted
function" below.

Here are the possible keys for the hash:

"wanted"
    The value should be a code reference. This code reference is
    described in "The wanted function" below.

"bydepth"
    Reports the name of a directory only AFTER all its entries have
    been reported. Entry point "finddepth()" is a shortcut for
    specifying "{ bydepth => 1 }" in the first argument of "find()".

"preprocess"
    The value should be a code reference. This code reference is used
    to preprocess the current directory. The name of the currently
    processed directory is in $File::Find::dir. Your preprocessing
    function is called after "readdir()", but before the loop that
    calls the "wanted()" function. It is called with a list of strings
    (actually file/directory names) and is expected to return a list of
    strings. The code can be used to sort the file/directory names
    alphabetically or numerically, or to filter out directory entries
    based on their name alone. When follow or follow_fast is in effect,
    "preprocess" is a no-op.

"postprocess"
    The value should be a code reference. It is invoked just before
    leaving the currently processed directory. It is called in void
    context with no arguments. The name of the current directory is in
    $File::Find::dir. This hook is handy for summarizing a directory,
    such as calculating its disk usage. When follow or follow_fast is
    in effect, "postprocess" is a no-op.
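
The two hooks can be combined as in the following sketch, which sorts
each directory's entries and records a per-directory byte count when
the directory is left. The function name dir_sizes and the shape of its
return values are assumptions made for this example only.

```perl
use strict;
use warnings;
use File::Find;

# Walk $top and return three references:
#   %bytes   - directory name => total size of the plain files directly in it
#   @visited - every $File::Find::name, in callback order
#   @summary - one line per directory, written by the postprocess hook
sub dir_sizes {
    my ($top) = @_;
    my (%bytes, @visited, @summary);
    find({
        # Receives the names returned by readdir(); sorting them makes the
        # order of entries within one directory predictable.
        preprocess  => sub { return sort @_ },
        wanted      => sub {
            push @visited, $File::Find::name;
            # -f performs the stat; "_" reuses that result for -s.
            $bytes{$File::Find::dir} += (-s _) if -f $_;
        },
        # Called with no arguments just before the directory is left.
        postprocess => sub {
            my $dir = $File::Find::dir;
            push @summary, sprintf("%s: %d bytes", $dir, $bytes{$dir} || 0);
        },
    }, $top);
    return (\%bytes, \@visited, \@summary);
}
```

Because both hooks are no-ops when follow or follow_fast is in effect,
this sketch only applies to traversals that do not follow symbolic
links.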

"follow"
    Causes symbolic links to be followed. Since directory trees with
    symbolic links (followed) may contain files more than once and may
    even have cycles, a hash has to be built up with an entry for each
    file. This might be expensive both in space and time for a large
    directory tree. See follow_fast and follow_skip below. If either
    follow or follow_fast is in effect:

    *   It is guaranteed that an lstat has been called before the
        user's "wanted()" function is called. This enables fast file
        checks involving _. Note that this guarantee no longer holds
        if neither follow nor follow_fast is set.

    *   There is a variable $File::Find::fullname which holds the
        absolute pathname of the file with all symbolic links
        resolved. If the link is a dangling symbolic link, then
        fullname will be set to "undef".

    This is a no-op on Win32.

"follow_fast"
    This is similar to follow except that it may report some files more
    than once. It does detect cycles, however. Since only symbolic
    links have to be hashed, this is much cheaper both in space and
    time. If processing a file more than once (by the user's "wanted()"
    function) is worse than just taking time, the option follow should
    be used.

    This is also a no-op on Win32.

"follow_skip"
    "follow_skip==1", which is the default, causes all files which are
    neither directories nor symbolic links to be ignored if they are
    about to be processed a second time. If a directory or a symbolic
    link is about to be processed a second time, File::Find dies.

    "follow_skip==0" causes File::Find to die if any file is about to
    be processed a second time.

    "follow_skip==2" causes File::Find to ignore any duplicate files
    and directories but to proceed normally otherwise.

"dangling_symlinks"
    If true and a code reference, it will be called with the symbolic
    link name and the directory it lives in as arguments. Otherwise, if
    true and warnings are on, the warning "symbolic_link_name is a
    dangling symbolic link\n" will be issued. If false, the dangling
    symbolic link will be silently ignored.

"no_chdir"
    Does not "chdir()" to each directory as it recurses. The "wanted()"
    function will need to be aware of this, of course. In this case, $_
    will be the same as $File::Find::name.

"untaint"
    If find is used in taint-mode (-T command line switch or if EUID !=
    UID or if EGID != GID) then internally directory names have to be
    untainted before they can be chdir'ed to. Therefore they are
    checked against a regular expression untaint_pattern. Note that all
    names passed to the user's wanted() function are still tainted. If
    this option is used while not in taint-mode, "untaint" is a no-op.

"untaint_pattern"
    See above. This should be set using the "qr" quoting operator. The
    default is set to "qr|^([-+@\w./]+)$|". Note that the parentheses
    are vital.

"untaint_skip"
    If set, a directory which fails the untaint_pattern is skipped,
    including all its sub-directories. The default is to 'die' in such
    a case.

The wanted function

The "wanted()" function does whatever verifications you want on each
file and directory. Note that despite its name, the "wanted()" function
is a generic callback function, and does not tell File::Find if a file
is "wanted" or not. In fact, its return value is ignored.

The wanted function takes no arguments but rather does its work through
a collection of variables.

    $File::Find::dir   is the current directory name,
    $_                 is the current filename within that directory,
    $File::Find::name  is the complete pathname to the file.

Don't modify these variables.

For example, when examining the file /some/path/foo.ext you will have:

    $File::Find::dir  = /some/path
    $_                = foo.ext
    $File::Find::name = /some/path/foo.ext
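
Since the return value of the callback is ignored, results are usually
gathered in a variable that the callback closes over. A minimal sketch
follows; the function name files_with_extension is hypothetical and the
path in the final comment is only a placeholder.

```perl
use strict;
use warnings;
use File::Find;

# Return the sorted full paths of all plain files under @dirs whose
# name ends in $ext (for example '.txt').
sub files_with_extension {
    my ($ext, @dirs) = @_;
    my @found;
    find(sub {
        # $_ is the bare filename; by default the current working
        # directory is $File::Find::dir, so -f $_ tests the right file.
        return unless -f $_ && /\Q$ext\E\z/;
        push @found, $File::Find::name;
    }, @dirs);
    return sort @found;
}

# Possible call (placeholder path):
#   my @txt = files_with_extension('.txt', '/some/path');
```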

You are chdir()'d to $File::Find::dir when the function is called,
unless "no_chdir" was specified. Note that when changing to directories
is in effect, the root directory (/) is a somewhat special case
inasmuch as the concatenation of $File::Find::dir, '/' and $_ is not
literally equal to $File::Find::name. The table below summarizes all
variants:

                 $File::Find::name  $File::Find::dir  $_
    default      /                  /                 .
    no_chdir=>0  /etc               /                 etc
                 /etc/x             /etc              x

    no_chdir=>1  /                  /                 /
                 /etc               /                 /etc
                 /etc/x             /etc              /etc/x
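
The same three columns can be reproduced for any directory. The sketch
below records them with and without "no_chdir"; the function name
trace_walk and the single file x in a temporary directory are
illustrative assumptions.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Return one [ $File::Find::name, $File::Find::dir, $_ ] triple for
# every entry under $top, with no_chdir set to $no_chdir.
sub trace_walk {
    my ($top, $no_chdir) = @_;
    my @rows;
    find({
        no_chdir => $no_chdir,
        wanted   => sub {
            push @rows, [ $File::Find::name, $File::Find::dir, $_ ];
        },
    }, $top);
    return @rows;
}

# Demonstration on a temporary directory holding one file named "x".
my $top = tempdir(CLEANUP => 1);
open my $fh, '>', "$top/x" or die "cannot create $top/x: $!";
close $fh;

for my $mode (0, 1) {
    print "no_chdir=>$mode\n";
    printf "  name=%s  dir=%s  \$_=%s\n", @$_ for trace_walk($top, $mode);
}
```

With no_chdir=>0 the last column holds "." for the top directory and
"x" for the file; with no_chdir=>1 it repeats the full name.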

When "follow" or "follow_fast" is in effect, there is also a
$File::Find::fullname. The function may set $File::Find::prune to prune
the tree unless "bydepth" was specified. Unless "follow" or
"follow_fast" is specified, for compatibility reasons (find.pl,
find2perl) the following globals are available in addition:
$File::Find::topdir, $File::Find::topdev, $File::Find::topino,
$File::Find::topmode and $File::Find::topnlink.
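
Pruning is typically done from inside the callback when it is invoked
for the directory to be skipped. A short sketch, assuming a
hypothetical helper named walk_without and a preorder traversal (that
is, "bydepth" not set):

```perl
use strict;
use warnings;
use File::Find;

# Return the sorted paths under $top, leaving out every directory
# named $skip together with everything below it.
sub walk_without {
    my ($top, $skip) = @_;
    my @paths;
    find(sub {
        if (-d $_ && $_ eq $skip) {
            $File::Find::prune = 1;   # do not descend into this directory
            return;                   # and do not record it either
        }
        push @paths, $File::Find::name;
    }, $top);
    return sort @paths;
}

# Possible call (placeholder path):
#   print "$_\n" for walk_without('/some/path', '.git');
```

With "finddepth()" or "bydepth" the contents have already been visited
by the time the directory itself is reported, which is why setting
$File::Find::prune has no effect there.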

This library is useful for the "find2perl" tool, which, when fed

    find2perl / -name .nfs\* -mtime +7 \
        -exec rm -f {} \; -o -fstype nfs -prune

produces something like:

    sub wanted {
        /^\.nfs.*\z/s &&
        (($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_)) &&
        int(-M _) > 7 &&
        unlink($_)
        ||
        ($nlink || (($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_))) &&
        $dev < 0 &&
        ($File::Find::prune = 1);
    }

Notice the "_" in the above "int(-M _)": the "_" is a magical
filehandle that caches the information from the preceding "stat()",
"lstat()", or filetest.

Here's another interesting wanted function. It will find all symbolic
links that don't resolve:

    sub wanted {
        -l && !-e && print "bogus link: $File::Find::name\n";
    }

See also the script "pfind" on CPAN for a nice application of this
module.

WARNINGS
If you run your program with the "-w" switch, or if you use the
"warnings" pragma, File::Find will report warnings for several weird
situations. You can disable these warnings by putting the statement

    no warnings 'File::Find';

in the appropriate scope. See perllexwarn for more info about lexical
warnings.

CAVEAT
$dont_use_nlink
    You can set the variable $File::Find::dont_use_nlink to 1, if you
    want to force File::Find to always stat directories. This was used
    for file systems that do not have an "nlink" count matching the
    number of sub-directories. Examples are ISO-9660 (CD-ROM), AFS,
    HPFS (OS/2 file system), FAT (DOS file system) and a couple of
    others.

    You shouldn't need to set this variable, since File::Find should
    now detect such file systems on-the-fly and switch itself to using
    stat. This works even for parts of your file system, like a mounted
    CD-ROM.

    If you do set $File::Find::dont_use_nlink to 1, you will notice
    slow-downs.

symlinks
    Be aware that the option to follow symbolic links can be dangerous.
    Depending on the structure of the directory tree (including
    symbolic links to directories) you might traverse a given
    (physical) directory more than once (only if "follow_fast" is in
    effect). Furthermore, deleting or changing files in a symbolically
    linked directory might cause very unpleasant surprises, since you
    delete or change files in an unknown directory.

NOTES
·   Mac OS (Classic) users should note a few differences:

    ·   The path separator is ':', not '/', and the current directory
        is denoted as ':', not '.'. You should be careful about
        specifying relative pathnames. While a full path always begins
        with a volume name, a relative pathname should always begin
        with a ':'. If specifying a volume name only, a trailing ':' is
        required.

    ·   $File::Find::dir is guaranteed to end with a ':'. If $_
        contains the name of a directory, that name may or may not end
        with a ':'. Likewise, $File::Find::name, which contains the
        complete pathname to that directory, and
        $File::Find::fullname, which holds the absolute pathname of
        that directory with all symbolic links resolved, may or may not
        end with a ':'.

    ·   The default "untaint_pattern" (see above) on Mac OS is set to
        "qr|^(.+)$|". Note that the parentheses are vital.

    ·   The invisible system file "Icon\015" is ignored. While this
        file may appear in every directory, there are some more
        invisible system files on every volume, which are all located
        at the volume root level (i.e. "MacintoshHD:"). These system
        files are not excluded automatically. Your filter may use the
        following code to recognize invisible files or directories
        (requires Mac::Files):

            use Mac::Files;

            # invisible() -- returns 1 if file/directory is invisible,
            # 0 if it's visible or undef if an error occurred

            sub invisible($) {
                my $file = shift;
                my ($fileCat, $fileInfo);
                my $invisible_flag = 1 << 14;

                if ( $fileCat = FSpGetCatInfo($file) ) {
                    if ( $fileInfo = $fileCat->ioFlFndrInfo() ) {
                        return (($fileInfo->fdFlags & $invisible_flag) && 1);
                    }
                }
                return undef;
            }

        Generally, invisible files are system files, unless an odd
        application decides to use invisible files for its own
        purposes. To distinguish such files from system files, you have
        to look at the type and creator file attributes. The MacPerl
        built-in functions "GetFileInfo(FILE)" and
        "SetFileInfo(CREATOR, TYPE, FILES)" offer access to these
        attributes (see MacPerl.pm for details).

        Files that appear on the desktop actually reside in a (hidden)
        directory named "Desktop Folder" on the particular disk volume.
        Note that, although all desktop files appear to be on the same
        "virtual" desktop, each disk volume actually maintains its own
        "Desktop Folder" directory.

BUGS AND CAVEATS
Despite the name of the "finddepth()" function, both "find()" and
"finddepth()" perform a depth-first search of the directory hierarchy.

HISTORY
File::Find used to produce incorrect results if called recursively.
During the development of perl 5.8 this bug was fixed. The first fixed
version of File::Find was 1.01.

perl v5.8.8                    2001-09-21                  File::Find(3pm)