File::Find(3pm)        Perl Programmers Reference Guide        File::Find(3pm)

NAME
       File::Find - Traverse a directory tree.

SYNOPSIS
           use File::Find;
           find(\&wanted, @directories_to_search);
           sub wanted { ... }

           use File::Find;
           finddepth(\&wanted, @directories_to_search);
           sub wanted { ... }

           use File::Find;
           find({ wanted => \&process, follow => 1 }, '.');

DESCRIPTION
       These functions search through directory trees, doing work on each
       file found, much like the Unix find command. File::Find exports two
       functions, "find" and "finddepth". They work similarly but differ in
       the order in which a directory and its contents are reported.

       find
               find(\&wanted,  @directories);
               find(\%options, @directories);

           find() does a depth-first search over the given @directories in
           the order they are given. For each file or directory found, it
           calls the &wanted subroutine. (See below for details on how to
           use the &wanted function.) Additionally, for each directory
           found, it will chdir() into that directory and continue the
           search, invoking the &wanted function on each file or
           subdirectory in the directory.

       finddepth
               finddepth(\&wanted,  @directories);
               finddepth(\%options, @directories);

           finddepth() works just like find() except that it invokes the
           &wanted function for a directory after invoking it for the
           directory's contents. It does a postorder traversal instead of a
           preorder traversal, working from the bottom of the directory
           tree up, where find() works from the top of the tree down.

       Despite the name of the finddepth() function, both find() and
       finddepth() perform a depth-first search of the directory hierarchy.

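       The difference in visiting order can be seen with a minimal sketch.
       The throwaway tree built under a File::Temp directory and the @pre
       and @post arrays are illustrative assumptions, not part of
       File::Find:

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Build a tiny throwaway tree: $root/a/b (hypothetical layout).
my $root = tempdir(CLEANUP => 1);
make_path(File::Spec->catdir($root, 'a', 'b'));

my (@pre, @post);

# find(): a directory is reported before its contents (preorder).
find(sub { push @pre, File::Spec->abs2rel($File::Find::name, $root) }, $root);

# finddepth(): a directory is reported after its contents (postorder).
finddepth(sub { push @post, File::Spec->abs2rel($File::Find::name, $root) }, $root);

print "find:      @pre\n";
print "finddepth: @post\n";
```

       On a system that uses "/" as the directory separator, this should
       list ". a a/b" for find() and "a/b a ." for finddepth().
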
   %options
       The first argument to find() is either a code reference to your
       &wanted function, or a hash reference describing the operations to
       be performed for each file. The code reference is described in "The
       wanted function" below.

       Here are the possible keys for the hash:

       "wanted"
           The value should be a code reference. This code reference is
           described in "The wanted function" below. The &wanted subroutine
           is mandatory.

       "bydepth"
           Reports the name of a directory only AFTER all its entries have
           been reported. Entry point finddepth() is a shortcut for
           specifying "{ bydepth => 1 }" in the first argument of find().

       "preprocess"
           The value should be a code reference. This code reference is
           used to preprocess the current directory. The name of the
           currently processed directory is in $File::Find::dir. Your
           preprocessing function is called after readdir(), but before the
           loop that calls the wanted() function. It is called with a list
           of strings (actually file/directory names) and is expected to
           return a list of strings. The code can be used to sort the
           file/directory names alphabetically, numerically, or to filter
           out directory entries based on their name alone. When follow or
           follow_fast is in effect, "preprocess" is a no-op.

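           As a rough illustration of "preprocess", the sketch below
           filters and sorts the names of each directory before they are
           visited. The file names and the ".bak" filter are made up for
           the example:

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical test data: four empty files in a temporary directory.
my $root = tempdir(CLEANUP => 1);
for my $name (qw(c.txt a.txt b.txt skip.bak)) {
    open my $fh, '>', "$root/$name" or die "create $name: $!";
    close $fh;
}

my @seen;
find({
    # Called once per directory with the names read from it; the list
    # it returns is what find() goes on to visit.
    preprocess => sub {
        my @keep = grep { !/\.bak\z/ } @_;
        return sort @keep;
    },
    wanted => sub { push @seen, $_ if -f $_ },
}, $root);

print "@seen\n";
```

           The ".bak" file never reaches wanted(), and the remaining files
           arrive in sorted order.
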
       "postprocess"
           The value should be a code reference. It is invoked just before
           leaving the currently processed directory. It is called in void
           context with no arguments. The name of the current directory is
           in $File::Find::dir. This hook is handy for summarizing a
           directory, such as calculating its disk usage. When follow or
           follow_fast is in effect, "postprocess" is a no-op.

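           The disk usage idea mentioned above might be sketched as
           follows. The %bytes and @report variables and the sample files
           are assumptions of this example; only plain files directly
           inside each directory are counted:

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical tree: one 10-byte file at the top, one 5-byte file below.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir: $!";
my %content = ("$root/top.txt" => 'x' x 10, "$root/sub/leaf.txt" => 'y' x 5);
for my $path (sort keys %content) {
    open my $fh, '>', $path or die "create $path: $!";
    print {$fh} $content{$path};
    close $fh or die "close $path: $!";
}

my %bytes;     # directory => bytes held by plain files directly inside it
my @report;    # one entry per directory, recorded as find() leaves it
find({
    wanted => sub {
        # -f stats the file; "-s _" reuses that cached stat.
        $bytes{$File::Find::dir} += (-s _) if -f $_;
    },
    # Runs with no arguments just before find() leaves a directory;
    # $File::Find::dir names that directory.
    postprocess => sub {
        push @report, [ $File::Find::dir, $bytes{$File::Find::dir} // 0 ];
    },
}, $root);

printf "%6d bytes  %s\n", $_->[1], $_->[0] for @report;
```
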
       "follow"
           Causes symbolic links to be followed. Since directory trees with
           symbolic links (followed) may contain files more than once and
           may even have cycles, a hash has to be built up with an entry
           for each file. This might be expensive both in space and time
           for a large directory tree. See "follow_fast" and "follow_skip"
           below. If either follow or follow_fast is in effect:

           •   It is guaranteed that an lstat has been called before the
               user's wanted() function is called. This enables fast file
               checks involving "_". Note that this guarantee no longer
               holds if neither follow nor follow_fast is set.

           •   There is a variable $File::Find::fullname which holds the
               absolute pathname of the file with all symbolic links
               resolved. If the link is a dangling symbolic link, then
               fullname will be set to "undef".

       "follow_fast"
           This is similar to follow except that it may report some files
           more than once. It does detect cycles, however. Since only
           symbolic links have to be hashed, this is much cheaper both in
           space and time. If processing a file more than once (by the
           user's wanted() function) is worse than just taking time, the
           option follow should be used.

       "follow_skip"
           "follow_skip==1", which is the default, causes all files which
           are neither directories nor symbolic links to be ignored if they
           are about to be processed a second time. If a directory or a
           symbolic link is about to be processed a second time, File::Find
           dies.

           "follow_skip==0" causes File::Find to die if any file is about
           to be processed a second time.

           "follow_skip==2" causes File::Find to ignore any duplicate files
           and directories but to proceed normally otherwise.

       "dangling_symlinks"
           Specifies what to do with symbolic links whose target doesn't
           exist. If true and a code reference, will be called with the
           symbolic link name and the directory it lives in as arguments.
           Otherwise, if true and warnings are on, a warning of the form
           "symbolic_link_name is a dangling symbolic link\n" will be
           issued. If false, the dangling symbolic link will be silently
           ignored.

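           A minimal sketch combining "follow", $File::Find::fullname and
           a "dangling_symlinks" callback is shown below. It assumes a
           platform where symlink() is available; the directory layout and
           the %fullname_for and @dangling variables are invented for the
           example:

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Assumes symlink() works on this platform.
my $root   = tempdir(CLEANUP => 1);    # the tree that find() walks
my $target = tempdir(CLEANUP => 1);    # lives outside that tree
open my $fh, '>', "$target/data.txt" or die "create: $!";
close $fh;
symlink $target,         "$root/link"   or die "symlink: $!";
symlink "$root/missing", "$root/broken" or die "symlink: $!";

my (%fullname_for, @dangling);
find({
    follow => 1,
    wanted => sub {
        # fullname is the path with symbolic links resolved
        # (undef for a dangling link).
        $fullname_for{$File::Find::name} = $File::Find::fullname;
    },
    dangling_symlinks => sub {
        my ($link, $dir) = @_;    # link name and the directory it lives in
        push @dangling, $link;
    },
}, $root);

for my $name (sort keys %fullname_for) {
    printf "%s -> %s\n", $name, $fullname_for{$name} // '(unresolved)';
}
print "dangling: @dangling\n";
```

           The file reached through "link" is reported under the link's
           path in $File::Find::name, while $File::Find::fullname points
           into the real target directory.
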
       "no_chdir"
           Does not chdir() to each directory as it recurses. The wanted()
           function will need to be aware of this, of course. In this case,
           $_ will be the same as $File::Find::name.

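           The effect of "no_chdir" can be checked with a short sketch.
           The single test file and the @rows array are assumptions of
           this example:

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical test data: one empty file in a temporary directory.
my $root = tempdir(CLEANUP => 1);
open my $fh, '>', "$root/note.txt" or die "create: $!";
close $fh;

my $start = getcwd();
my @rows;
find({
    no_chdir => 1,
    wanted   => sub {
        return unless -f $_;
        # With no_chdir, $_ carries the full path, so file tests work
        # from the caller's original working directory.
        my $cwd_state = getcwd() eq $start ? 'same cwd' : 'moved';
        push @rows, [ $_, $File::Find::name, $cwd_state ];
    },
}, $root);

print join(' | ', @$_), "\n" for @rows;
```
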
       "untaint"
           If find is used in taint-mode (-T command line switch or if
           EUID != UID or if EGID != GID), then internally directory names
           have to be untainted before they can be "chdir"'d to. Therefore
           they are checked against a regular expression untaint_pattern.
           Note that all names passed to the user's wanted() function are
           still tainted. If this option is used while not in taint-mode,
           "untaint" is a no-op.

       "untaint_pattern"
           See above. This should be set using the "qr" quoting operator.
           The default is set to "qr|^([-+@\w./]+)$|". Note that the
           parentheses are vital.

       "untaint_skip"
           If set, a directory which fails the untaint_pattern is skipped,
           including all its sub-directories. The default is to "die" in
           such a case.

   The wanted function
       The wanted() function does whatever verifications you want on each
       file and directory. Note that despite its name, the wanted() function
       is a generic callback function, and does not tell File::Find if a
       file is "wanted" or not. In fact, its return value is ignored.

       The wanted function takes no arguments but rather does its work
       through a collection of variables.

       $File::Find::dir   is the current directory name,
       $_                 is the current filename within that directory,
       $File::Find::name  is the complete pathname to the file.

       The above variables have all been localized and may be changed
       without affecting data outside of the wanted function.

       For example, when examining the file /some/path/foo.ext you will
       have:

           $File::Find::dir  = /some/path
           $_                = foo.ext
           $File::Find::name = /some/path/foo.ext

       You are chdir()'d to $File::Find::dir when the function is called,
       unless "no_chdir" was specified. Note that when changing to
       directories is in effect, the root directory (/) is a somewhat
       special case inasmuch as the concatenation of $File::Find::dir, '/'
       and $_ is not literally equal to $File::Find::name. The table below
       summarizes all variants:

                         $File::Find::name  $File::Find::dir  $_
           default       /                  /                 .
           no_chdir=>0   /etc               /                 etc
                         /etc/x             /etc              x

           no_chdir=>1   /                  /                 /
                         /etc               /                 /etc
                         /etc/x             /etc              /etc/x

       When "follow" or "follow_fast" is in effect, there is also a
       $File::Find::fullname. The function may set $File::Find::prune to
       prune the tree unless "bydepth" was specified. Unless "follow" or
       "follow_fast" is specified, for compatibility reasons (find.pl,
       find2perl) there are in addition the following globals available:
       $File::Find::topdir, $File::Find::topdev, $File::Find::topino,
       $File::Find::topmode and $File::Find::topnlink.

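       Pruning can be illustrated with a small sketch. The ".git" directory
       name and the file layout are hypothetical; the point is only that
       setting $File::Find::prune while visiting a directory keeps find()
       from descending into it:

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical tree with one directory that should be skipped.
my $root = tempdir(CLEANUP => 1);
make_path("$root/src", "$root/.git/objects");
for my $path ("$root/src/main.pl", "$root/.git/objects/blob") {
    open my $fh, '>', $path or die "create $path: $!";
    close $fh;
}

my @files;
find(sub {
    # Prune while visiting the directory itself, before its contents.
    if (-d $_ && $_ eq '.git') {
        $File::Find::prune = 1;
        return;
    }
    push @files, $_ if -f $_;
}, $root);

print "@files\n";
```

       Only main.pl is collected; nothing below ".git" reaches wanted().
       The same callback passed to finddepth() would not prune, since
       "bydepth" reports a directory after its contents.
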
       This library is useful for the "find2perl" tool (distributed as part
       of the App-find2perl CPAN distribution), which when fed,

           find2perl / -name .nfs\* -mtime +7 \
             -exec rm -f {} \; -o -fstype nfs -prune

       produces something like:

           sub wanted {
               /^\.nfs.*\z/s &&
               (($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_)) &&
               int(-M _) > 7 &&
               unlink($_)
               ||
               ($nlink || (($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_))) &&
               $dev < 0 &&
               ($File::Find::prune = 1);
           }

       Notice the "_" in the above "int(-M _)": the "_" is a magical
       filehandle that caches the information from the preceding stat(),
       lstat(), or filetest.

       Here's another interesting wanted function. It will find all
       symbolic links that don't resolve:

           sub wanted {
               -l && !-e && print "bogus link: $File::Find::name\n";
           }

       Note that you may mix directories and (non-directory) files in the
       list of paths passed to find() or finddepth().

           find(\&wanted, "./foo", "./bar", "./baz/epsilon");

       In the example above, no file in ./baz/ other than ./baz/epsilon
       will be evaluated by wanted().

       See also the script "pfind" on CPAN for a nice application of this
       module.

WARNINGS
       If you run your program with the "-w" switch, or if you use the
       "warnings" pragma, File::Find will report warnings for several weird
       situations. You can disable these warnings by putting the statement

           no warnings 'File::Find';

       in the appropriate scope. See the "warnings" pragma documentation
       for more information about lexical warnings.

BUGS AND CAVEATS
       $dont_use_nlink
           You can set the variable $File::Find::dont_use_nlink to 0 if you
           are sure the filesystem you are scanning reflects the number of
           subdirectories in the parent directory's "nlink" count.

           If you do set $File::Find::dont_use_nlink to 0, you may notice
           an improvement in speed at the risk of not recursing into
           subdirectories if a filesystem doesn't populate "nlink" as
           expected.

           $File::Find::dont_use_nlink now defaults to 1 on all platforms.

       symlinks
           Be aware that the option to follow symbolic links can be
           dangerous. Depending on the structure of the directory tree
           (including symbolic links to directories) you might traverse a
           given (physical) directory more than once (only if "follow_fast"
           is in effect). Furthermore, deleting or changing files in a
           symbolically linked directory might cause very unpleasant
           surprises, since you delete or change files in an unknown
           directory.

HISTORY
       File::Find used to produce incorrect results if called recursively.
       During the development of perl 5.8 this bug was fixed. The first
       fixed version of File::Find was 1.01.

SEE ALSO
       find(1), find2perl.

perl v5.38.2                      2023-11-30                   File::Find(3pm)