File::Find(3pm)        Perl Programmers Reference Guide        File::Find(3pm)

NAME
File::Find - Traverse a directory tree.

SYNOPSIS
    use File::Find;
    find(\&wanted, @directories_to_search);
    sub wanted { ... }

    use File::Find;
    finddepth(\&wanted, @directories_to_search);
    sub wanted { ... }

    use File::Find;
    find({ wanted => \&process, follow => 1 }, '.');

DESCRIPTION
These functions search through directory trees and do work on each file
found, similar to the Unix find command. File::Find exports two
functions, "find" and "finddepth". They work similarly but have subtle
differences.

find
    find(\&wanted, @directories);
    find(\%options, @directories);

"find()" does a depth-first search over the given @directories in
the order they are given. For each file or directory found, it
calls the &wanted subroutine. (See below for details on how to use
the &wanted function.) Additionally, for each directory found, it
will "chdir()" into that directory and continue the search,
invoking the &wanted function on each file or subdirectory in the
directory.

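As a minimal sketch of the call described above, the following builds a small throwaway tree and collects every plain file with "find()". The file names and the use of File::Temp are assumptions made for illustration; they are not part of File::Find.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical sample tree: $root/a.txt and $root/sub/b.txt
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir failed: $!";
for my $path ("$root/a.txt", "$root/sub/b.txt") {
    open my $fh, '>', $path or die "cannot create $path: $!";
    close $fh;
}

# find() has chdir()'d into the containing directory, so $_ is the
# bare entry name; $File::Find::name is the path including $root.
my @files;
find(sub { push @files, $File::Find::name if -f $_ }, $root);

print "$_\n" for sort @files;
```

The callback is an anonymous sub here; a named sub passed as \&wanted behaves the same way.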
finddepth
    finddepth(\&wanted, @directories);
    finddepth(\%options, @directories);

"finddepth()" works just like "find()" except that it invokes the
&wanted function for a directory after invoking it for the
directory's contents. It does a postorder traversal instead of a
preorder traversal, working from the bottom of the directory tree
up where "find()" works from the top of the tree down.

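The difference in reporting order can be sketched on a single-branch tree, where the order does not depend on how the file system lists entries. The tree layout is a hypothetical example.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical single-branch tree: $root/sub/leaf.txt
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir failed: $!";
open my $fh, '>', "$root/sub/leaf.txt" or die "cannot create file: $!";
close $fh;

my (@pre, @post);
find(     sub { push @pre,  $File::Find::name }, $root);  # directory, then contents
finddepth(sub { push @post, $File::Find::name }, $root);  # contents, then directory

print "find:      @pre\n";
print "finddepth: @post\n";
```

With this layout "find()" reports the top directory first and "finddepth()" reports it last.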
%options
The first argument to "find()" is either a code reference to your
&wanted function, or a hash reference describing the operations to be
performed for each file. The code reference is described in "The
wanted function" below.

Here are the possible keys for the hash:

"wanted"
The value should be a code reference. This code reference is
described in "The wanted function" below. The &wanted subroutine is
mandatory.

"bydepth"
Reports the name of a directory only AFTER all its entries have been
reported. Entry point "finddepth()" is a shortcut for specifying
"{ bydepth => 1 }" in the first argument of "find()".

"preprocess"
The value should be a code reference. This code reference is used to
preprocess the current directory. The name of the currently
processed directory is in $File::Find::dir. Your preprocessing
function is called after "readdir()", but before the loop that calls
the "wanted()" function. It is called with a list of strings
(actually file/directory names) and is expected to return a list of
strings. The code can be used to sort the file/directory names
alphabetically or numerically, or to filter out directory entries
based on their name alone. When follow or follow_fast is in effect,
"preprocess" is a no-op.

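A hedged sketch of a "preprocess" hook that both filters and sorts the entries of each directory. The file names and the ".bak" rule are assumptions chosen for the example.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical sample files, created in non-alphabetical order
my $root = tempdir(CLEANUP => 1);
for my $name (qw(c.txt a.txt skip.bak b.txt)) {
    open my $fh, '>', "$root/$name" or die "cannot create $name: $!";
    close $fh;
}

my @seen;
find({
    wanted     => sub { push @seen, $_ if -f $_ },
    # Receives the names read from the directory and returns the ones
    # to visit: here everything except *.bak, in alphabetical order.
    preprocess => sub { return sort { $a cmp $b } grep { !/\.bak\z/ } @_ },
}, $root);

print "@seen\n";
```

Because the hook sees names only, a filter that needs file attributes would have to stat each entry itself.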
"postprocess"
The value should be a code reference. It is invoked just before
leaving the currently processed directory. It is called in void
context with no arguments. The name of the current directory is in
$File::Find::dir. This hook is handy for summarizing a directory,
such as calculating its disk usage. When follow or follow_fast is
in effect, "postprocess" is a no-op.

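The disk-usage idea mentioned above might look like the following sketch. The sample files and the %bytes hash are hypothetical; only plain files directly inside each directory are counted.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical sample tree: a 10-byte file and a 5-byte file
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir failed: $!";
my %content = ("$root/a.dat" => 'x' x 10, "$root/sub/b.dat" => 'y' x 5);
for my $path (keys %content) {
    open my $fh, '>', $path or die "cannot create $path: $!";
    print {$fh} $content{$path};
    close $fh or die "close failed: $!";
}

my %bytes;    # directory => bytes in the plain files directly inside it
find({
    # -f $_ fills the stat cache, so (-s _) reuses it for the size
    wanted      => sub { $bytes{$File::Find::dir} += (-s _) if -f $_ },
    # Called with no arguments just before find() leaves a directory
    postprocess => sub {
        my $dir = $File::Find::dir;
        printf "%6d bytes  %s\n", $bytes{$dir} || 0, $dir;
    },
}, $root);
```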
"follow"
Causes symbolic links to be followed. Since directory trees with
symbolic links (followed) may contain files more than once and may
even have cycles, a hash has to be built up with an entry for each
file. This might be expensive both in space and time for a large
directory tree. See follow_fast and follow_skip below. If either
follow or follow_fast is in effect:

·   It is guaranteed that an lstat has been called before the
    user's "wanted()" function is called. This enables fast file
    checks involving _. Note that this guarantee no longer holds
    if neither follow nor follow_fast is set.

·   There is a variable $File::Find::fullname which holds the
    absolute pathname of the file with all symbolic links
    resolved. If the link is a dangling symbolic link, then
    fullname will be set to "undef".

This is a no-op on Win32.

"follow_fast"
This is similar to follow except that it may report some files more
than once. It does detect cycles, however. Since only symbolic
links have to be hashed, this is much cheaper both in space and
time. If processing a file more than once (by the user's "wanted()"
function) is worse than just taking time, the option follow should
be used.

This is also a no-op on Win32.

"follow_skip"
"follow_skip==1", which is the default, causes all files which are
neither directories nor symbolic links to be ignored if they are
about to be processed a second time. If a directory or a symbolic
link is about to be processed a second time, File::Find dies.

"follow_skip==0" causes File::Find to die if any file is about to be
processed a second time.

"follow_skip==2" causes File::Find to ignore any duplicate files and
directories but to proceed normally otherwise.

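A sketch of "follow" combined with "follow_skip => 2", assuming a system with symbolic link support. The tree is hypothetical: one real directory and one symbolic link pointing at it, so the same file is reachable by two paths.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical tree: $root/real/file.txt plus $root/link -> real
my $root = tempdir(CLEANUP => 1);
mkdir "$root/real" or die "mkdir failed: $!";
open my $fh, '>', "$root/real/file.txt" or die "cannot create file: $!";
close $fh;
symlink 'real', "$root/link" or die "symlink failed: $!";

my @plain;
find({
    wanted      => sub { push @plain, $File::Find::name if -f $_ },
    follow      => 1,
    follow_skip => 2,   # ignore duplicates instead of dying
}, $root);

print "$_\n" for @plain;
```

Which of the two paths is reported depends on the order in which the directory entries are read. With the default "follow_skip" of 1, reaching the same directory twice would be fatal, as described above.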
"dangling_symlinks"
If true and a code reference, it will be called with the symbolic
link name and the directory it lives in as arguments. Otherwise, if
true and warnings are on, the warning "symbolic_link_name is a
dangling symbolic link\n" will be issued. If false, the dangling
symbolic link will be silently ignored.

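A hedged sketch of a "dangling_symlinks" callback, assuming a system with symbolic link support and that "follow" is in effect so that links are resolved. The link name and its missing target are made up for the example.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical tree: one symbolic link whose target does not exist
my $root = tempdir(CLEANUP => 1);
symlink 'no-such-target', "$root/broken" or die "symlink failed: $!";

my @dangling;
find({
    wanted            => sub { },   # nothing to do per file in this sketch
    follow            => 1,
    # Called with the link name and the directory it lives in
    dangling_symlinks => sub {
        my ($name, $dir) = @_;
        push @dangling, $name;
    },
}, $root);

print "dangling: $_\n" for @dangling;
```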
"no_chdir"
Does not "chdir()" to each directory as it recurses. The "wanted()"
function will need to be aware of this, of course. In this case, $_
will be the same as $File::Find::name.

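A short sketch showing that $_ carries the whole path when "no_chdir" is set. The sample file is hypothetical.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical sample tree: a single file under $root
my $root = tempdir(CLEANUP => 1);
open my $fh, '>', "$root/note.txt" or die "cannot create file: $!";
close $fh;

my @rows;    # pairs of [ $_, $File::Find::name ]
find({
    wanted   => sub { push @rows, [ $_, $File::Find::name ] },
    no_chdir => 1,
}, $root);

printf "%s\n", $_->[0] for @rows;
```

Since the working directory does not change, file tests in the callback operate on the path in $_ relative to wherever the program was started.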
"untaint"
If find is used in taint-mode (-T command line switch or if EUID !=
UID or if EGID != GID) then internally directory names have to be
untainted before they can be chdir'ed to. Therefore they are checked
against a regular expression untaint_pattern. Note that all names
passed to the user's wanted() function are still tainted. If this
option is used while not in taint-mode, "untaint" is a no-op.

"untaint_pattern"
See above. This should be set using the "qr" quoting operator. The
default is set to "qr|^([-+@\w./]+)$|". Note that the parentheses
are vital.

"untaint_skip"
If set, a directory which fails the untaint_pattern is skipped,
including all its sub-directories. The default is to 'die' in such a
case.

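Why the parentheses matter can be sketched without running in taint mode: the capture group is what yields the cleaned-up name. The directory names below are hypothetical, and the commented-out call only indicates how the options would be passed.

```perl
use strict;
use warnings;

# The documented default pattern; the capture group holds the name
# that is accepted as untainted.
my $untaint_pattern = qr|^([-+@\w./]+)$|;

my %result;
for my $dir ('/usr/local/lib', '/tmp/my docs') {    # hypothetical names
    $result{$dir} = $dir =~ $untaint_pattern ? $1 : undef;
    printf "%-16s %s\n", $dir,
        defined $result{$dir} ? 'accepted' : 'rejected';
}

# Hedged usage sketch (not executed here; \&wanted and $top are placeholders):
# find({ wanted => \&wanted, untaint => 1,
#        untaint_pattern => $untaint_pattern, untaint_skip => 1 }, $top);
```

The second name is rejected because a space is not in the default character class. With "untaint_skip" set such a directory would be skipped rather than causing a die.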
The wanted function
The "wanted()" function does whatever verifications you want on each
file and directory. Note that despite its name, the "wanted()"
function is a generic callback function, and does not tell File::Find
if a file is "wanted" or not. In fact, its return value is ignored.

The wanted function takes no arguments but rather does its work through
a collection of variables.

    $File::Find::dir is the current directory name,
    $_ is the current filename within that directory
    $File::Find::name is the complete pathname to the file.

The above variables have all been localized and may be changed without
affecting data outside of the wanted function.

For example, when examining the file /some/path/foo.ext you will have:

    $File::Find::dir = /some/path/
    $_ = foo.ext
    $File::Find::name = /some/path/foo.ext

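The three variables can be inspected with a sketch like the following, which records them for one hypothetical file. The %seen hash and the file name are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical sample tree: a single file named foo.ext
my $root = tempdir(CLEANUP => 1);
open my $fh, '>', "$root/foo.ext" or die "cannot create file: $!";
close $fh;

my %seen;
find(sub {
    return unless $_ eq 'foo.ext';    # the return value is ignored anyway
    %seen = (
        dir   => $File::Find::dir,
        entry => $_,
        name  => $File::Find::name,
    );
}, $root);

print "$_ = $seen{$_}\n" for sort keys %seen;
```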
You are chdir()'d to $File::Find::dir when the function is called,
unless "no_chdir" was specified. Note that when changing to directories
is in effect the root directory (/) is a somewhat special case inasmuch
as the concatenation of $File::Find::dir, '/' and $_ is not literally
equal to $File::Find::name. The table below summarizes all variants:

                  $File::Find::name  $File::Find::dir  $_
    default       /                  /                 .
    no_chdir=>0   /etc               /                 etc
                  /etc/x             /etc              x

    no_chdir=>1   /                  /                 /
                  /etc               /                 /etc
                  /etc/x             /etc              /etc/x

When "follow" or "follow_fast" is in effect, there is also a
$File::Find::fullname. The function may set $File::Find::prune to
prune the tree unless "bydepth" was specified. Unless "follow" or
"follow_fast" is specified, for compatibility reasons (find.pl,
find2perl) there are in addition the following globals available:
$File::Find::topdir, $File::Find::topdev, $File::Find::topino,
$File::Find::topmode and $File::Find::topnlink.

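Pruning can be sketched as follows. The directory names "keep" and "skipme" are hypothetical; the callback sets $File::Find::prune when it is called for the directory it does not want to enter.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical tree: $root/keep/data.txt and $root/skipme/data.txt
my $root = tempdir(CLEANUP => 1);
for my $dir (qw(keep skipme)) {
    mkdir "$root/$dir" or die "mkdir failed: $!";
    open my $fh, '>', "$root/$dir/data.txt" or die "cannot create file: $!";
    close $fh;
}

my @files;
find(sub {
    if (-d $_ && $_ eq 'skipme') {
        $File::Find::prune = 1;    # do not descend into this directory
        return;
    }
    push @files, $File::Find::name if -f $_;
}, $root);

print "$_\n" for @files;
```

This relies on "find()" calling the function for a directory before its contents, which is why pruning is not available with "bydepth".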
This library is useful for the "find2perl" tool, which, when fed

    find2perl / -name .nfs\* -mtime +7 \
        -exec rm -f {} \; -o -fstype nfs -prune

produces something like:

    sub wanted {
        /^\.nfs.*\z/s &&
        (($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_)) &&
        int(-M _) > 7 &&
        unlink($_)
        ||
        ($nlink || (($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_))) &&
        $dev < 0 &&
        ($File::Find::prune = 1);
    }

Notice the "_" in the above "int(-M _)": the "_" is a magical
filehandle that caches the information from the preceding "stat()",
"lstat()", or filetest.

Here's another interesting wanted function. It will find all symbolic
links that don't resolve:

    sub wanted {
        -l && !-e && print "bogus link: $File::Find::name\n";
    }

See also the script "pfind" on CPAN for a nice application of this
module.

WARNINGS
If you run your program with the "-w" switch, or if you use the
"warnings" pragma, File::Find will report warnings for several weird
situations. You can disable these warnings by putting the statement

    no warnings 'File::Find';

in the appropriate scope. See perllexwarn for more info about lexical
warnings.

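The scoping can be sketched by searching a top-level directory that does not exist, which is assumed here to be one of the situations File::Find warns about. The path is hypothetical and the __WARN__ handlers only count the messages.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical path that is known not to exist
my $missing = tempdir(CLEANUP => 1) . '/no-such-dir';

my (@loud, @quiet);
{
    local $SIG{__WARN__} = sub { push @loud, @_ };
    find(sub { }, $missing);          # warnings are enabled in this scope
}
{
    no warnings 'File::Find';         # silences only File::Find's category
    local $SIG{__WARN__} = sub { push @quiet, @_ };
    find(sub { }, $missing);
}

printf "with warnings: %d, without: %d\n", scalar @loud, scalar @quiet;
```

The pragma is lexical, so it must be in effect where "find()" is called, not where the wanted function is defined.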
CAVEAT
$dont_use_nlink
You can set the variable $File::Find::dont_use_nlink to 1, if you
want to force File::Find to always stat directories. This was used
for file systems that do not have an "nlink" count matching the
number of sub-directories. Examples are ISO-9660 (CD-ROM), AFS, HPFS
(OS/2 file system), FAT (DOS file system) and a couple of others.

You shouldn't need to set this variable, since File::Find should now
detect such file systems on-the-fly and switch itself to using stat.
This works even for parts of your file system, like a mounted CD-ROM.

If you do set $File::Find::dont_use_nlink to 1, you will notice
slowdowns.

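If the variable does have to be set, limiting it with "local" keeps the slower behaviour confined to one traversal. This is a sketch under that assumption, with a hypothetical sample tree.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical sample tree: $root/sub/deep.txt
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir failed: $!";
open my $fh, '>', "$root/sub/deep.txt" or die "cannot create file: $!";
close $fh;

my @files;
{
    # Force a stat of every entry for this traversal only; the
    # previous value is restored when the block ends.
    local $File::Find::dont_use_nlink = 1;
    find(sub { push @files, $File::Find::name if -f $_ }, $root);
}

print "$_\n" for @files;
```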
symlinks
Be aware that the option to follow symbolic links can be dangerous.
Depending on the structure of the directory tree (including symbolic
links to directories) you might traverse a given (physical) directory
more than once (only if "follow_fast" is in effect). Furthermore,
deleting or changing files in a symbolically linked directory might
cause very unpleasant surprises, since you delete or change files in
an unknown directory.

NOTES
·   Mac OS (Classic) users should note a few differences:

    ·   The path separator is ':', not '/', and the current directory
        is denoted as ':', not '.'. You should be careful about
        specifying relative pathnames. While a full path always begins
        with a volume name, a relative pathname should always begin
        with a ':'. If specifying a volume name only, a trailing ':'
        is required.

    ·   $File::Find::dir is guaranteed to end with a ':'. If $_
        contains the name of a directory, that name may or may not end
        with a ':'. Likewise, $File::Find::name, which contains the
        complete pathname to that directory, and $File::Find::fullname,
        which holds the absolute pathname of that directory with all
        symbolic links resolved, may or may not end with a ':'.

    ·   The default "untaint_pattern" (see above) on Mac OS is set to
        "qr|^(.+)$|". Note that the parentheses are vital.

    ·   The invisible system file "Icon\015" is ignored. While this
        file may appear in every directory, there are some more
        invisible system files on every volume, which are all located
        at the volume root level (i.e. "MacintoshHD:"). These system
        files are not excluded automatically. Your filter may use the
        following code to recognize invisible files or directories
        (requires Mac::Files):

            use Mac::Files;

            # invisible() -- returns 1 if file/directory is invisible,
            # 0 if it's visible or undef if an error occurred

            sub invisible($) {
                my $file = shift;
                my ($fileCat, $fileInfo);
                my $invisible_flag = 1 << 14;

                if ( $fileCat = FSpGetCatInfo($file) ) {
                    if ($fileInfo = $fileCat->ioFlFndrInfo() ) {
                        return (($fileInfo->fdFlags & $invisible_flag) && 1);
                    }
                }
                return undef;
            }

        Generally, invisible files are system files, unless an odd
        application decides to use invisible files for its own
        purposes. To distinguish such files from system files, you have
        to look at the type and creator file attributes. The MacPerl
        built-in functions "GetFileInfo(FILE)" and
        "SetFileInfo(CREATOR, TYPE, FILES)" offer access to these
        attributes (see MacPerl.pm for details).

        Files that appear on the desktop actually reside in a (hidden)
        directory named "Desktop Folder" on the particular disk volume.
        Note that, although all desktop files appear to be on the same
        "virtual" desktop, each disk volume actually maintains its own
        "Desktop Folder" directory.

BUGS AND CAVEATS
Despite the name of the "finddepth()" function, both "find()" and
"finddepth()" perform a depth-first search of the directory hierarchy.

HISTORY
File::Find used to produce incorrect results if called recursively.
During the development of perl 5.8 this bug was fixed. The first fixed
version of File::Find was 1.01.

SEE ALSO
find, find2perl.

perl v5.10.1                      2009-04-18                   File::Find(3pm)