PIR(3)                User Contributed Perl Documentation               PIR(3)

NAME
       PIR - Short alias for Path::Iterator::Rule

VERSION
       version 1.015

SYNOPSIS
           use PIR;

           my $rule = PIR->new;             # match anything
           $rule->file->size(">10k");       # add/chain rules

           # iterator interface
           my $next = $rule->iter( @dirs );
           while ( defined( my $file = $next->() ) ) {
             ...
           }

           # list interface
           for my $file ( $rule->all( @dirs ) ) {
             ...
           }

DESCRIPTION
       This is an empty subclass of Path::Iterator::Rule. It saves you from
       having to type the full name repeatedly, which is particularly handy
       for one-liners:

           $ perl -MPIR -wE 'say for PIR->new->skip_dirs(".")->perl_module->all(@INC)'

USAGE
   Constructors
       "new"

           my $rule = Path::Iterator::Rule->new;

       Creates a new rule object that matches any file or directory. It takes
       no arguments. For convenience, it may also be called on an object, in
       which case it still returns a new object that matches any file or
       directory.

       "clone"

           my $common      = Path::Iterator::Rule->new->file->not_empty;
           my $big_files   = $common->clone->size(">1M");
           my $small_files = $common->clone->size("<10K");

       Creates a copy of a rule object. Useful for customizing different rule
       objects against a common base.
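
       As a minimal sketch of the "clone" pattern above, the following
       builds one base rule and specializes it twice. The scratch directory,
       file names, and byte thresholds are assumptions made up for this
       illustration; plain byte counts are used so that no size suffix needs
       interpreting.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Temp qw(tempdir);
use Path::Iterator::Rule;

# Hypothetical fixture: two files of known size in a scratch directory.
my $root  = tempdir( CLEANUP => 1 );
my %bytes = ( 'small.dat' => 10, 'large.dat' => 500 );
for my $name ( sort keys %bytes ) {
    open my $fh, '>:raw', "$root/$name" or die "$name: $!";
    print {$fh} 'x' x $bytes{$name};
    close $fh or die "$name: $!";
}

# One shared base rule, specialized twice without modifying the base.
my $common = Path::Iterator::Rule->new->file->not_empty;
my $big    = $common->clone->size(">100");     # more than 100 bytes
my $small  = $common->clone->size("<=100");    # 100 bytes or fewer

my @big   = map { basename($_) } $big->all($root);
my @small = map { basename($_) } $small->all($root);
my @base  = map { basename($_) } $common->all($root);

print "big: @big\nsmall: @small\nbase: @base\n";
```

       Because each specialization works on its own copy, the base rule
       still matches both files afterwards.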

   Matching and iteration
       "iter"

           my $next = $rule->iter( @dirs, \%options );
           while ( defined( my $file = $next->() ) ) {
             ...
           }

       Creates a subroutine reference iterator that returns a single result
       each time it is called. This iterator is "lazy" -- results are not
       precomputed.

       It takes as arguments a list of directories to search and an optional
       hash reference of control options. If no search directories are
       provided, the current directory is used ("."). Valid options include:

       •   "depthfirst" -- Controls order of results. Valid values are "1"
           (post-order, depth-first search), "0" (breadth-first search) or
           "-1" (pre-order, depth-first search). Default is 0.

       •   "error_handler" -- Catches errors during execution of rule tests.
           Default handler dies with the filename and error. If set to undef,
           error handling is disabled.

       •   "follow_symlinks" -- Follow directory symlinks when true. Default
           is 1.

       •   "report_symlinks" -- Includes symlinks in results when true.
           Default is equal to "follow_symlinks".

       •   "loop_safe" -- Prevents visiting the same directory more than once
           when true. Default is 1.

       •   "relative" -- Return matching items relative to the search
           directory. Default is 0.

       •   "sorted" -- Whether entries in a directory are sorted before
           processing. Default is 1.

       •   "visitor" -- An optional coderef that will be called on items
           matching all rules.

       Filesystem loops might exist from either hard or soft links. The
       "loop_safe" option prevents infinite loops, but adds some overhead by
       making "stat" calls. Because directories are visited only once when
       "loop_safe" is true, matches could come from a symlinked directory
       before the real directory depending on the search order.

       To get only the real files, turn off "follow_symlinks". To include
       symlinks in the results without descending into symlinked directories,
       turn off "follow_symlinks" but turn on "report_symlinks".

       Turning "loop_safe" off and leaving "follow_symlinks" on avoids "stat"
       calls and will be fastest, but with the risk of an infinite loop and
       repeated files. The default is slow, but safe.

       The "error_handler" parameter must be a subroutine reference. It will
       be called when a rule test throws an exception. The first argument
       will be the file name being inspected and the second argument will be
       the exception.
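
       A minimal sketch of a custom "error_handler", assuming a made-up rule
       that throws for one particular basename. The file names and the error
       text are hypothetical; the handler records each failure instead of
       dying, so iteration continues past the problem file.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Temp qw(tempdir);
use Path::Iterator::Rule;

# Hypothetical fixture: two empty files in a scratch directory.
my $root = tempdir( CLEANUP => 1 );
for my $name (qw(bad.txt good.txt)) {
    open my $fh, '>', "$root/$name" or die "$name: $!";
    close $fh or die "$name: $!";
}

# Illustrative rule that throws an exception for one basename.
my $rule = Path::Iterator::Rule->new->file->and(
    sub {
        my ( $path, $basename ) = @_;
        die "cannot inspect\n" if $basename eq 'bad.txt';
        return 1;
    }
);

my @errors;
my $next = $rule->iter(
    $root,
    {
        # Called with the file name and the exception, as described above.
        error_handler => sub {
            my ( $file, $error ) = @_;
            push @errors, [ basename($file), $error ];
        },
    }
);

my @found;
while ( defined( my $file = $next->() ) ) {
    push @found, basename($file);
}

print "found: @found\n";
print "error in $_->[0]: $_->[1]" for @errors;
```

       An item whose test threw is treated as not matching, so only the
       other file is returned here.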

       The optional "visitor" parameter must be a subroutine reference. If
       set, it will be called for any result that matches. It is called the
       same way a custom rule would be (see "EXTENDING") but its return value
       is ignored. It is called when an item is first inspected, so a
       post-order "depthfirst" setting is not respected.

       The paths inspected and returned are built from the search directories
       provided. If these are absolute, the paths returned will be absolute.
       If these are relative, the paths returned will be relative.

       If the search directories are absolute and the "relative" option is
       true, files returned will be relative to the search directory. Note
       that if the search directories are not mutually exclusive (whether
       containing subdirectories like @INC or symbolic links), files found
       could be returned relative to different initial search directories
       based on "depthfirst", "follow_symlinks" or "loop_safe".

       When the iterator is exhausted, it will return undef.
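
       The difference between the default paths and the "relative" option
       can be sketched as follows. The directory layout is an assumption
       invented for the example; the iterator is drained with the same loop
       shown for "iter" above.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Temp qw(tempdir);
use Path::Iterator::Rule;

# Hypothetical fixture: one file at the top level, one in a subdirectory.
my $root = tempdir( CLEANUP => 1 );
make_path("$root/sub");
for my $path ( "$root/a.txt", "$root/sub/b.txt" ) {
    open my $fh, '>', $path or die "$path: $!";
    close $fh or die "$path: $!";
}

my $rule = Path::Iterator::Rule->new->file;

# Default options: each result is built from the search directory given.
my @absolute = $rule->all($root);

# relative => 1 reports each match relative to the search directory.
my $next = $rule->iter( $root, { relative => 1 } );
my @relative;
while ( defined( my $file = $next->() ) ) {
    push @relative, $file;
}

print "$_\n" for @absolute, @relative;
```

       With the default breadth-first, sorted traversal, the top-level file
       is reported before the one in the subdirectory.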

       "iter_fast"

       This works just like "iter", except that it optimizes for speed over
       safety. Don't do this unless you're sure you need it and accept the
       consequences. See "PERFORMANCE" for details.

       "all"

           my @matches = $rule->all( @dirs, \%options );

       Returns a list of paths that match the rule. It takes the same
       arguments and has the same behaviors as the "iter" method. The "all"
       method uses "iter" internally to fetch all results.

       In scalar context, it will return the count of matched paths.

       In void context, it is optimized to iterate over everything, but not
       store results. This is most useful with the "visitor" option:

           $rule->all( $path, { visitor => \&callback } );
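
       As an illustration of the calling contexts described above, this
       sketch sums file sizes with a "visitor" in void context and then
       counts matches in scalar context. The file names and contents are
       hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Path::Iterator::Rule;

# Hypothetical fixture: files holding 5 and 3 bytes respectively.
my $root    = tempdir( CLEANUP => 1 );
my %content = ( 'five.txt' => '12345', 'three.txt' => '123' );
for my $name ( sort keys %content ) {
    open my $fh, '>:raw', "$root/$name" or die "$name: $!";
    print {$fh} $content{$name};
    close $fh or die "$name: $!";
}

my $rule = Path::Iterator::Rule->new->file;

# Void context: no result list is built; the visitor sees each match.
my $total_bytes = 0;
$rule->all(
    $root,
    {
        visitor => sub {
            my ($path) = @_;
            $total_bytes += -s $path;
        },
    }
);

# Scalar context: only the number of matches is returned.
my $count = $rule->all($root);

print "$count files, $total_bytes bytes\n";
```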

       "all_fast"

       This works just like "all", except that it optimizes for speed over
       safety. Don't do this unless you're sure you need it and accept the
       consequences. See "PERFORMANCE" for details.

       "test"

           if ( $rule->test( $path, $basename, $stash ) ) { ... }

       Tests a file path against a rule. Used internally, but provided should
       someone want to create their own custom iteration algorithm.

   Logic operations
       "Path::Iterator::Rule" provides three logic operations for adding rules
       to the object. Rules may be either a subroutine reference with
       specific semantics (described below in "EXTENDING") or another
       "Path::Iterator::Rule" object.

       "and"

           $rule->and( sub { -r -w -x $_ } ); # stacked filetest example
           $rule->and( @more_rules );

       Adds one or more constraints to the current rule. E.g. "old rule AND
       new1 AND new2 AND ...". Returns the object to allow method chaining.

       "or"

           $rule->or(
             $rule->new->name("foo*"),
             $rule->new->name("bar*"),
             sub { -r -w -x $_ },
           );

       Takes one or more alternatives and adds them as a constraint to the
       current rule. E.g. "old rule AND ( new1 OR new2 OR ... )". Returns the
       object to allow method chaining.

       "not"

           $rule->not( sub { -r -w -x $_ } );

       Takes one or more constraints and adds them as a single negative
       constraint to the current rule. E.g. "old rule AND NOT ( new1 AND new2
       AND ...)". Returns the object to allow method chaining.
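
       The three operations can be combined on one rule object. The sketch
       below, using made-up file names, builds "file AND ( foo* OR bar* ) AND
       NOT ( *.bak )".

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Temp qw(tempdir);
use Path::Iterator::Rule;

# Hypothetical fixture: four empty files in a scratch directory.
my $root = tempdir( CLEANUP => 1 );
for my $name (qw(foo.txt foo.bak bar.txt baz.txt)) {
    open my $fh, '>', "$root/$name" or die "$name: $!";
    close $fh or die "$name: $!";
}

my $rule = Path::Iterator::Rule->new->file;    # old rule: plain files

# ... AND ( name matches foo* OR name matches bar* )
$rule->or(
    $rule->new->name("foo*"),
    $rule->new->name("bar*"),
);

# ... AND NOT ( name matches *.bak )
$rule->not( $rule->new->name("*.bak") );

my @found = map { basename($_) } $rule->all($root);
print "@found\n";
```

       Here "baz.txt" fails the "or" constraint and "foo.bak" fails the "not"
       constraint, leaving the two remaining ".txt" files.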

       "skip"

           $rule->skip(
             $rule->new->dir->not_writeable,
             $rule->new->dir->name("foo"),
           );

       Takes one or more alternatives and will prune a directory if any of the
       criteria match or if any of the rules already indicate the directory
       should be pruned. Pruning means the directory will not be returned by
       the iterator and will not be searched.

       For files, it is equivalent to "$rule->not($rule->or(@rules))".
       Returns the object to allow method chaining.

       This method should be called as early as possible in the rule chain.
       See "skip_dirs" below for further explanation and an example.

RULE METHODS
       Rule methods are helpers that add constraints. Internally, they
       generate a closure to accomplish the desired logic and add it to the
       rule object with the "and" method. Rule methods return the object to
       allow for method chaining.

   File name rules
       "name"

           $rule->name( "foo.txt" );
           $rule->name( qr/foo/, "bar.*" );

       The "name" method takes one or more patterns and creates a rule that is
       true if any of the patterns match the basename of the file or directory
       path. Patterns may be regular expressions or glob expressions (or
       literal names).

       "iname"

           $rule->iname( "foo.txt" );
           $rule->iname( qr/foo/, "bar.*" );

       The "iname" method is just like the "name" method, but matches
       case-insensitively.
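
       A short sketch contrasting "name" and "iname", with glob and regular
       expression patterns mixed in one call. The file names are assumptions
       chosen so that only case distinguishes two of the extensions.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Temp qw(tempdir);
use Path::Iterator::Rule;

# Hypothetical fixture: extensions that differ only in case, plus one other.
my $root = tempdir( CLEANUP => 1 );
for my $name (qw(README.TXT notes.txt todo.md)) {
    open my $fh, '>', "$root/$name" or die "$name: $!";
    close $fh or die "$name: $!";
}

# Glob pattern, case-sensitive.
my @exact = map { basename($_) }
  Path::Iterator::Rule->new->file->name("*.txt")->all($root);

# Same glob, case-insensitive.
my @any_case = map { basename($_) }
  Path::Iterator::Rule->new->file->iname("*.txt")->all($root);

# A regular expression and a glob in one call: either may match.
my @mixed = map { basename($_) }
  Path::Iterator::Rule->new->file->name( qr/^todo/, "*.TXT" )->all($root);

print "exact: @exact\nany case: @any_case\nmixed: @mixed\n";
```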

       "skip_dirs"

           $rule->skip_dirs( @patterns );

       The "skip_dirs" method skips directories that match one or more
       patterns. Patterns may be regular expressions or globs (just like
       "name"). Directories that match will not be returned from the iterator
       and will be excluded from further search. This includes the starting
       directories. If that isn't what you want, see "skip_subdirs" instead.

       Note: this rule should be specified early so that it has a chance to
       operate before a logical shortcut. E.g.

           $rule->skip_dirs(".git")->file; # OK
           $rule->file->skip_dirs(".git"); # Won't work

       In the latter case, when a ".git" directory is seen, the "file" rule
       shortcuts the rule before the "skip_dirs" rule has a chance to act.
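
       The ordering effect can be observed directly. In this sketch the
       directory layout is hypothetical: a ".git" directory containing one
       file sits next to an ordinary file, and the two rule orders from the
       example above are run against it.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Path qw(make_path);
use File::Temp qw(tempdir);
use Path::Iterator::Rule;

# Hypothetical fixture: a ".git" directory next to an ordinary file.
my $root = tempdir( CLEANUP => 1 );
make_path("$root/.git");
for my $path ( "$root/.git/config", "$root/keep.txt" ) {
    open my $fh, '>', $path or die "$path: $!";
    close $fh or die "$path: $!";
}

# skip_dirs first: the ".git" directory is pruned before "file" is tested.
my @early = map { basename($_) }
  Path::Iterator::Rule->new->skip_dirs(".git")->file->all($root);

# file first: the "file" test fails on the directory and shortcuts, so
# skip_dirs never sees it and its contents are still searched.
my @late = sort map { basename($_) }
  Path::Iterator::Rule->new->file->skip_dirs(".git")->all($root);

print "early: @early\nlate: @late\n";
```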

       "skip_subdirs"

           $rule->skip_subdirs( @patterns );

       This works just like "skip_dirs", except that the starting directories
       (depth 0) are not skipped and may be returned from the iterator unless
       excluded by other rules.

   File test rules
       Most of the "-X" style filetests are available as boolean rules. The
       table below maps each filetest to its corresponding method name.

            Test | Method               Test |  Method
           ------|-------------        ------|----------------
             -r  |  readable             -R  |  r_readable
             -w  |  writeable            -W  |  r_writeable
             -w  |  writable             -W  |  r_writable
             -x  |  executable           -X  |  r_executable
             -o  |  owned                -O  |  r_owned
                 |                           |
             -e  |  exists               -f  |  file
             -z  |  empty                -d  |  directory, dir
             -s  |  nonempty             -l  |  symlink
                 |                       -p  |  fifo
             -u  |  setuid               -S  |  socket
             -g  |  setgid               -b  |  block
             -k  |  sticky               -c  |  character
                 |                       -t  |  tty
             -T  |  ascii
             -B  |  binary

       For example:

           $rule->file->nonempty; # -f -s $file

       The -X operators for timestamps take a single argument in a form that
       Number::Compare can interpret.

            Test | Method
           ------|-------------
             -A  |  accessed
             -M  |  modified
             -C  |  changed

       For example:

           $rule->modified(">1"); # -M $file > 1

   Stat test rules
       All of the "stat" elements have a method that takes a single argument
       in a form understood by Number::Compare.

           stat()  |  Method
          --------------------
              0    |  dev
              1    |  ino
              2    |  mode
              3    |  nlink
              4    |  uid
              5    |  gid
              6    |  rdev
              7    |  size
              8    |  atime
              9    |  mtime
             10    |  ctime
             11    |  blksize
             12    |  blocks

       For example:

           $rule->size(">10K");

   Depth rules
           $rule->min_depth(3);
           $rule->max_depth(5);

       The "min_depth" and "max_depth" rule methods take a single argument and
       limit the paths returned to a minimum or maximum depth (respectively)
       from the starting search directory. A depth of 0 means the starting
       directory itself. A depth of 1 means its children. (This is similar
       to the Unix "find" utility.)
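
       A minimal sketch of the depth rules, assuming a made-up tree with one
       file at each of depths 1, 2 and 3. Only files at exactly depth 2 are
       requested.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Path qw(make_path);
use File::Temp qw(tempdir);
use Path::Iterator::Rule;

# Hypothetical fixture: the search directory itself is depth 0.
my $root = tempdir( CLEANUP => 1 );
make_path("$root/d1/d2");
for my $path ( "$root/a.txt", "$root/d1/b.txt", "$root/d1/d2/c.txt" ) {
    open my $fh, '>', $path or die "$path: $!";
    close $fh or die "$path: $!";
}

# Depth rules placed first so directories at the maximum depth are pruned
# before the "file" test can shortcut the chain.
my $rule = Path::Iterator::Rule->new->min_depth(2)->max_depth(2)->file;

my @found = map { basename($_) } $rule->all($root);
print "@found\n";
```

       The file at depth 1 is too shallow, and the directory at depth 2 is
       not descended into, so the file at depth 3 is never visited.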

   Perl file rules
           # All perl rules
           $rule->perl_file;

           # Individual perl file rules
           $rule->perl_module;     # .pm files
           $rule->perl_pod;        # .pod files
           $rule->perl_test;       # .t files
           $rule->perl_installer;  # Makefile.PL or Build.PL
           $rule->perl_script;     # .pl or 'perl' in the shebang

       These rule methods match file names (or a shebang line) that are
       typical of Perl distribution files.

   Version control file rules
           # Skip all known VCS files
           $rule->skip_vcs;

           # Skip individual VCS files
           $rule->skip_cvs;
           $rule->skip_rcs;
           $rule->skip_svn;
           $rule->skip_git;
           $rule->skip_bzr;
           $rule->skip_hg;
           $rule->skip_darcs;

       Skips files and/or prunes directories related to a version control
       system. Just like "skip_dirs", these rules should be specified early
       to get the correct behavior.

   File content rules
       "contents_match"

           $rule->contents_match(qr/BEGIN .* END/xs);

       The "contents_match" rule takes a list of regular expressions and
       returns files that match one of the expressions.

       The expressions are applied to the file's contents as a single string.
       For large files, this is likely to take significant time and memory.

       Files are assumed to be encoded in UTF-8, but alternative Perl IO
       layers can be passed as the first argument:

           $rule->contents_match(":encoding(iso-8859-1)", qr/BEGIN .* END/xs);

       See perlio for further details.

       "line_match"

           $rule->line_match(qr/^new/i, qr/^Addition/);

       The "line_match" rule takes a list of regular expressions and returns
       files with at least one line that matches one of the expressions.

       Files are assumed to be encoded in UTF-8, but alternative Perl IO
       layers can be passed as the first argument.
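
       To contrast the two content rules, this sketch runs each against the
       same small set of files. The file names and contents are hypothetical;
       the regular expressions are the ones used in the examples above.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Temp qw(tempdir);
use Path::Iterator::Rule;

# Hypothetical fixture: three small text files.
my $root    = tempdir( CLEANUP => 1 );
my %content = (
    'notes.txt' => "first line\nNew entry here\n",
    'block.txt' => "BEGIN\npayload\nEND\n",
    'other.txt' => "nothing relevant\n",
);
for my $name ( sort keys %content ) {
    open my $fh, '>', "$root/$name" or die "$name: $!";
    print {$fh} $content{$name};
    close $fh or die "$name: $!";
}

# line_match tests one line at a time, so ^ anchors at each line start.
my @by_line = map { basename($_) }
  Path::Iterator::Rule->new->file->line_match(qr/^new/i)->all($root);

# contents_match sees the whole file as one string, so /s lets .* span lines.
my @by_contents = map { basename($_) }
  Path::Iterator::Rule->new->file->contents_match(qr/BEGIN .* END/xs)
  ->all($root);

print "line_match: @by_line\ncontents_match: @by_contents\n";
```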

       "shebang"

           $rule->shebang(qr/#!.*\bperl\b/);

       The "shebang" rule takes a list of regular expressions or glob patterns
       and checks them against the first line of a file.

   Other rules
       "dangling"

           $rule->symlink->dangling;
           $rule->not_dangling;

       The "dangling" rule method matches dangling symlinks. Use it or its
       inverse to control how dangling symlinks should be treated.

   Negated rules
       Most rule methods have a negated form preceded by "not_".

           $rule->not_name("foo.*")

       Because this happens automatically, it includes somewhat silly ones
       like "not_nonempty" (which is thus a less efficient way of saying
       "empty").

       Rules that skip directories or version control files do not have a
       negated version.

EXTENDING
   Custom rule subroutines
       Rules are implemented as (usually anonymous) subroutine callbacks that
       return a value indicating whether or not the rule matches. These
       callbacks are called with three arguments. The first argument is a
       path, which is also locally aliased as the $_ global variable for
       convenience in simple tests.

           $rule->and( sub { -r -w -x $_ } ); # tests $_

       The second argument is the basename of the path, which is useful for
       certain types of name checks:

           $rule->and( sub { $_[1] =~ /foo|bar/ } ); # "foo" or "bar" in basename

       The third argument is a hash reference that can be used to maintain
       state. Keys beginning with an underscore are reserved for
       "Path::Iterator::Rule" to provide additional data about the search in
       progress. For example, the "_depth" key is used to support minimum and
       maximum depth checks.

       The custom rule subroutine must return one of four values:

       •   A true value -- indicates the constraint is satisfied

       •   A false value -- indicates the constraint is not satisfied

       •   "\1" -- indicates the constraint is satisfied, and prunes if it's
           a directory

       •   "\0" -- indicates the constraint is not satisfied, and prunes if
           it's a directory

       A reference is a special flag that signals that a directory should not
       be searched recursively, regardless of whether the directory should be
       returned by the iterator or not.

       The legacy "0 but true" value used previously for pruning is no longer
       valid and will throw an exception if it is detected.

       Here is an example. This is equivalent to the "max_depth" rule method
       with a depth of 3:

           $rule->and(
             sub {
               my ($path, $basename, $stash) = @_;
               return 1 if $stash->{_depth} < 3;
               return \1 if $stash->{_depth} == 3;
               return \0; # should never get here
             }
           );

       Files and directories shallower than depth 3 will be returned, and
       those directories will be searched. Files at depth 3 will be returned.
       Directories at depth 3 will be returned, but their contents will not
       be added to the search.

       Returned references are "sticky" -- they propagate through "and" and
       "or" logic.

           0 && \0 = \0    \0 && 0 = \0    0 || \0 = \0    \0 || 0 = \0
           0 && \1 = \0    \0 && 1 = \0    0 || \1 = \1    \0 || 1 = \1
           1 && \0 = \0    \1 && 0 = \0    1 || \0 = \1    \1 || 0 = \1
           1 && \1 = \1    \1 && 1 = \1    1 || \1 = \1    \1 || 1 = \1

       Once a directory is flagged to be pruned, it will be pruned regardless
       of subsequent rules.

           $rule->max_depth(3)->name(qr/foo/);

       This will return files or directories with "foo" in the name, but all
       directories at depth 3 will be pruned, regardless of whether they match
       the name rule.

       Generally, if you want to do directory pruning, you are encouraged to
       use the "skip" method instead of writing your own logic using "\0" and
       "\1".
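
       For cases where hand-written pruning is still wanted, here is a
       minimal sketch of a custom rule that returns "\0". The marker file
       name ".noindex" and the directory layout are assumptions invented
       for this example, not conventions of the module.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Path qw(make_path);
use File::Temp qw(tempdir);
use Path::Iterator::Rule;

# Hypothetical fixture: "private" holds a marker file, "docs" does not.
my $root = tempdir( CLEANUP => 1 );
make_path( "$root/docs", "$root/private" );
for my $path ( "$root/docs/a.txt", "$root/private/.noindex",
    "$root/private/b.txt" )
{
    open my $fh, '>', $path or die "$path: $!";
    close $fh or die "$path: $!";
}

my $rule = Path::Iterator::Rule->new->and(
    sub {
        my ( $path, $basename, $stash ) = @_;

        # \0: not a match, and do not descend into this directory.
        return \0 if -d $path && -e "$path/.noindex";
        return 1;
    }
)->file;

my @found = map { basename($_) } $rule->all($root);
print "@found\n";
```

       The pruning rule is added before "file" for the same reason given for
       "skip_dirs": a later rule cannot act once an earlier one has failed.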

   Extension modules and custom rule methods
       One of the strengths of File::Find::Rule is the many CPAN modules that
       extend it. "Path::Iterator::Rule" provides the "add_helper" method to
       provide a similar mechanism for extensions.

       The "add_helper" class method takes three arguments: a "name" for the
       rule method, a closure-generating callback, and a flag for not
       generating a negated form of the rule. Unless the flag is true, an
       inverted "not_*" method is generated automatically. Extension classes
       should call this as a class method to install new rule methods. For
       example, this adds a "foo" method that checks if the filename is "foo":

           package Path::Iterator::Rule::Foo;

           use Path::Iterator::Rule;

           Path::Iterator::Rule->add_helper(
             foo => sub {
               my @args = @_; # do this to customize closure with arguments
               return sub {
                 my ($item, $basename) = @_;
                 return if -d "$item";
                 return $basename =~ /^foo$/;
               }
             }
           );

           1;

       This allows the following rule methods:

           $rule->foo;
           $rule->not_foo;

       The "add_helper" method will warn and ignore a helper with the same
       name as an existing method.
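
       A sketch of a helper that uses its arguments, following the pattern
       above. The helper name "has_extension" is hypothetical and chosen
       only for illustration; it is not a method shipped with
       "Path::Iterator::Rule".

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Temp qw(tempdir);
use Path::Iterator::Rule;

# Hypothetical helper: matches when the basename ends in a listed extension.
Path::Iterator::Rule->add_helper(
    has_extension => sub {
        my @extensions = map { lc($_) } @_;    # arguments of the rule method
        return sub {
            my ( $item, $basename ) = @_;
            my ($ext) = $basename =~ /\.([^.]+)\z/;
            return 0 unless defined $ext;
            return scalar grep { $_ eq lc($ext) } @extensions;
        };
    }
);

# Hypothetical fixture: one file per extension.
my $root = tempdir( CLEANUP => 1 );
for my $name (qw(a.txt b.log)) {
    open my $fh, '>', "$root/$name" or die "$name: $!";
    close $fh or die "$name: $!";
}

my @text = map { basename($_) }
  Path::Iterator::Rule->new->file->has_extension('txt')->all($root);

# The negated form is generated because no third argument was passed.
my @other = map { basename($_) }
  Path::Iterator::Rule->new->file->not_has_extension('txt')->all($root);

print "text: @text\nother: @other\n";
```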

   Subclassing
       Instead of processing and returning strings, this module may be
       subclassed to operate on objects that represent files. Such objects
       must stringify to a file path.

       The following private implementation methods must be overridden:

       •   _objectify -- given a path, return an object

       •   _children -- given a directory, return an (unsorted) list of [
           basename, full path ] entries within it, excluding "." and ".."

       Note that "_children" should return a list of tuples, where the tuples
       are array references containing basename and full path.

       See Path::Class::Rule source for an example.

LEXICAL WARNINGS
       If you run with lexical warnings enabled, "Path::Iterator::Rule" will
       issue warnings in certain circumstances (such as an unreadable
       directory that must be skipped). To disable these categories, put the
       following statement at the correct scope:

           no warnings 'Path::Iterator::Rule';

PERFORMANCE
       By default, "Path::Iterator::Rule" iterator options are "slow but
       safe". They ensure uniqueness, return files in sorted order, and throw
       nice error messages if something goes wrong.

       If you want speed over safety, set these options:

           %options = (
               loop_safe => 0,
               sorted => 0,
               depthfirst => -1,
               error_handler => undef
           );

       Alternatively, use the "iter_fast" and "all_fast" methods instead,
       which set these options for you.

           $iter = $rule->iter( @dirs, \%options );

           $iter = $rule->iter_fast( @dirs ); # same thing

       Depending on the file structure being searched, "depthfirst => -1" may
       or may not be a good choice. If you have lots of nested directories and
       all the files at the bottom, a depth first search might do less work or
       use less memory, particularly if the search will be halted early (e.g.
       finding the first N matches.)
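
       Since the fast variants do not sort, code that switches to them should
       not rely on result order. A minimal sketch, using a made-up tree
       without symlinks, that compares the two result sets after sorting:

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Temp qw(tempdir);
use Path::Iterator::Rule;

# Hypothetical fixture: three files spread over two levels, no symlinks.
my $root = tempdir( CLEANUP => 1 );
make_path("$root/lib/deep");
for my $path ( "$root/top.txt", "$root/lib/mid.txt", "$root/lib/deep/low.txt" )
{
    open my $fh, '>', $path or die "$path: $!";
    close $fh or die "$path: $!";
}

my $rule = Path::Iterator::Rule->new->file;

my @safe = $rule->all($root);         # default "slow but safe" options
my @fast = $rule->all_fast($root);    # speed-oriented options

# Compare as sets: sort both lists before checking that they agree.
my $same = join( "\n", sort @safe ) eq join( "\n", sort @fast );

print $same ? "same matches\n" : "different matches\n";
```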

       Rules will shortcut on failure, so be sure to put rules likely to fail
       early in a rule chain.

       Consider:

           $r1 = Path::Iterator::Rule->new->name(qr/foo/)->file;
           $r2 = Path::Iterator::Rule->new->file->name(qr/foo/);

       If there are lots of files, but only a few containing "foo", then $r1
       above will be faster.

       Rules are implemented as code references, so long chains have some
       overhead. Consider testing with a custom coderef that combines several
       tests into one.

       Consider:

           $r3 = Path::Iterator::Rule->new->and( sub { -x -w -r $_ } );
           $r4 = Path::Iterator::Rule->new->executable->writeable->readable;

       Rule $r3 above will be much faster, not only because it stacks the file
       tests, but because it requires only a single code reference.

CAVEATS
       Some features are still unimplemented:

       •   Untainting options

       •   Some File::Find::Rule helpers (e.g. "grep")

       •   Extension class loading via "import()"

       Filetest operators and stat rules are subject to the usual portability
       considerations. See perlport for details.

SEE ALSO
       There are many other file finding modules out there. They all have
       various features/deficiencies, depending on your preferences and needs.
       Here is an (incomplete) list of alternatives, with some comparison
       commentary.

       Path::Class::Rule and IO::All::Rule are subclasses of
       "Path::Iterator::Rule" and operate on Path::Class and IO::All objects,
       respectively. Because of this, they are substantially slower on large
       directory trees than just using this module directly.

       File::Find is part of the Perl core. It requires the user to write a
       callback function to process each node of the search. Callbacks must
       use global variables to determine the current node. It only supports
       depth-first search (both pre- and post-order). It supports pre- and
       post-processing callbacks; the former is required for sorting files to
       process in a directory. File::Find::Closures can be used to help
       create a callback for File::Find.

       File::Find::Rule is an object-oriented wrapper around File::Find. It
       provides a number of helper functions and there are many more
       "File::Find::Rule::*" modules on CPAN with additional helpers. It
       provides an iterator interface, but precomputes all the results.

       File::Next provides iterators for files, directories or "everything".
       It takes two callbacks, one to match files and one to decide which
       directories to descend. It does not allow control over breadth/depth
       order, though it does provide means to sort files for processing within
       a directory. Like File::Find, it requires callbacks to use global
       variables.

       Path::Class::Iterator walks a directory structure with an iterator. It
       is implemented as Path::Class subclasses, which adds a degree of extra
       complexity. It takes a single callback to define "interesting" paths to
       return. The callback gets a Path::Class::Iterator::File or
       Path::Class::Iterator::Dir object for evaluation.

       File::Find::Object and companion File::Find::Object::Rule are like
       File::Find and File::Find::Rule, but without File::Find inside. They
       use an iterator that does not precompute results. They can return
       File::Find::Object::Result objects, which give a subset of the utility
       of Path::Class objects. File::Find::Object::Rule appears to be a
       literal translation of File::Find::Rule, including oddities like making
       "-M" into a boolean.

       File::chdir::WalkDir recursively descends a tree, calling a callback on
       each file. It has no iterator. It supports exclusion patterns. It is
       depth-first post-order by default, but offers a pre-order option. It
       does not process symlinks.

       File::Find::Iterator is based on iterator patterns in Higher Order
       Perl. It allows a filtering callback. Symlinks are followed
       automatically without infinite loop protection. There is no control
       over order. It offers a "state file" option for resuming interrupted
       work.

       File::Find::Declare has declarative helper rules, no iterator, is
       Moose-based and offers no control over ordering or following symlinks.

       File::Find::Node has no iterator, does matching via callback and offers
       no control over ordering.

       File::Set builds up a set of files to operate on from a list of
       directories to include or exclude, with control over recursion. A
       callback is applied to each file (or directory) in the set. There is
       no iterator. There is no control over ordering. Symlinks are not
       followed. It has several extra features for checksumming the set and
       creating tarballs with /bin/tar.

THANKS
       Thank you to Ricardo Signes (rjbs) for inspiring me to write yet
       another file finder module, for writing file finder optimization
       benchmarks, and tirelessly running my code over and over to see if it
       got faster.

       •   See the speed of Perl file finders
           <http://rjbs.manxome.org/rubric/entry/1981>

AUTHOR
       David Golden <dagolden@cpan.org>

COPYRIGHT AND LICENSE
       This software is Copyright (c) 2013 by David Golden.

       This is free software, licensed under:

         The Apache License, Version 2.0, January 2004

perl v5.34.0                      2022-01-21                            PIR(3)