PIR(3)                User Contributed Perl Documentation               PIR(3)

NAME

       PIR - Short alias for Path::Iterator::Rule

VERSION

       version 1.014

SYNOPSIS

         use PIR;

         my $rule = PIR->new;          # match anything
         $rule->file->size(">10k");    # add/chain rules

         # iterator interface
         my $next = $rule->iter( @dirs );
         while ( defined( my $file = $next->() ) ) {
           ...
         }

         # list interface
         for my $file ( $rule->all( @dirs ) ) {
           ...
         }

DESCRIPTION

       This is an empty subclass of Path::Iterator::Rule.  It saves you from
       having to type the full name repeatedly, which is particularly handy
       for one-liners:

           $ perl -MPIR -wE 'say for PIR->new->skip_dirs(".")->perl_module->all(@INC)'

USAGE

   Constructors
       "new"

         my $rule = Path::Iterator::Rule->new;

       Creates a new rule object that matches any file or directory.  It takes
       no arguments. For convenience, it may also be called on an object, in
       which case it still returns a new object that matches any file or
       directory.

       "clone"

         my $common      = Path::Iterator::Rule->new->file->not_empty;
         my $big_files   = $common->clone->size(">1M");
         my $small_files = $common->clone->size("<10K");

       Creates a copy of a rule object.  Useful for customizing different rule
       objects against a common base.

   Matching and iteration
       "iter"

         my $next = $rule->iter( @dirs, \%options );
         while ( defined( my $file = $next->() ) ) {
           ...
         }

       Creates a subroutine reference iterator that returns a single result
       each time it is called.  This iterator is "lazy" -- results are not
       precomputed.

       It takes as arguments a list of directories to search and an optional
       hash reference of control options.  If no search directories are
       provided, the current directory is used (".").  Valid options include:

       •   "depthfirst" -- Controls order of results.  Valid values are "1"
           (post-order, depth-first search), "0" (breadth-first search) or
           "-1" (pre-order, depth-first search). Default is 0.

       •   "error_handler" -- Catches errors during execution of rule tests.
           Default handler dies with the filename and error. If set to undef,
           error handling is disabled.

       •   "follow_symlinks" -- Follow directory symlinks when true. Default
           is 1.

       •   "report_symlinks" -- Includes symlinks in results when true.
           Default is equal to "follow_symlinks".

       •   "loop_safe" -- Prevents visiting the same directory more than once
           when true.  Default is 1.

       •   "relative" -- Return matching items relative to the search
           directory. Default is 0.

       •   "sorted" -- Whether entries in a directory are sorted before
           processing. Default is 1.

       •   "visitor" -- An optional coderef that will be called on items
           matching all rules.

       Filesystem loops might exist from either hard or soft links.  The
       "loop_safe" option prevents infinite loops, but adds some overhead by
       making "stat" calls.  Because directories are visited only once when
       "loop_safe" is true, matches could come from a symlinked directory
       before the real directory depending on the search order.

       To get only the real files, turn off "follow_symlinks".  To include
       symlinks in results without descending into symlinked directories,
       turn off "follow_symlinks" but turn on "report_symlinks".

       Turning "loop_safe" off and leaving "follow_symlinks" on avoids "stat"
       calls and will be fastest, but with the risk of an infinite loop and
       repeated files.  The default is slow, but safe.

       The "error_handler" parameter must be a subroutine reference (or undef
       to disable error handling).  It will be called when a rule test throws
       an exception.  The first argument will be the file name being
       inspected and the second argument will be the exception.

       The optional "visitor" parameter must be a subroutine reference.  If
       set, it will be called for any result that matches.  It is called the
       same way a custom rule would be (see "EXTENDING") but its return value
       is ignored.  It is called when an item is first inspected, so a
       post-order setting ("depthfirst" of 1) is not respected.

       The paths inspected and returned will be relative to the search
       directories provided.  If these are absolute, then the paths returned
       will be absolute.  If these are relative, then the paths returned
       will be relative.

       If the search directories are absolute and the "relative" option is
       true, files returned will be relative to the search directory.  Note
       that if the search directories are not mutually exclusive (whether
       containing subdirectories like @INC or symbolic links), files found
       could be returned relative to different initial search directories
       based on "depthfirst", "follow_symlinks" or "loop_safe".

       When the iterator is exhausted, it will return undef.
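
       As an illustration of the options described above, here is a minimal
       sketch that builds a throwaway directory tree and iterates over it.
       The "write_file" helper and the file names are hypothetical and exist
       only for this example; the options are the ones listed above.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);
use Path::Iterator::Rule;

# Hypothetical helper: create a file (and its parent directories) under $root.
sub write_file {
    my ( $root, $rel, $content ) = @_;
    my $path = File::Spec->catfile( $root, split m{/}, $rel );
    make_path( dirname($path) );
    open my $fh, ">", $path or die "Cannot write $path: $!";
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";
    return $path;
}

my $root = tempdir( CLEANUP => 1 );
for my $rel ( "a.txt", "sub/b.txt" ) {
    write_file( $root, $rel, "x" );
}

my $next = Path::Iterator::Rule->new->file->iter(
    $root,
    {
        relative        => 1,     # report paths relative to $root
        depthfirst      => -1,    # pre-order, depth-first search
        follow_symlinks => 0,     # do not descend into symlinked directories
        error_handler   => sub {  # receives the path and the exception
            my ( $path, $error ) = @_;
            warn "Skipping $path: $error";
        },
    }
);

my @found;
while ( defined( my $file = $next->() ) ) {
    push @found, "$file";
}
@found = sort @found;
print "$_\n" for @found;
```

       Replacing the default "error_handler" with one that warns lets the
       search continue past a failing rule test instead of dying.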

       "iter_fast"

       This works just like "iter", except that it optimizes for speed over
       safety. Don't do this unless you're sure you need it and accept the
       consequences.  See "PERFORMANCE" for details.

       "all"

         my @matches = $rule->all( @dirs, \%options );

       Returns a list of paths that match the rule.  It takes the same
       arguments and has the same behaviors as the "iter" method.  The "all"
       method uses "iter" internally to fetch all results.

       In scalar context, it will return the count of matched paths.

       In void context, it is optimized to iterate over everything, but not
       store results.  This is most useful with the "visitor" option:

           $rule->all( $path, { visitor => \&callback } );
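
       The three calling contexts can be sketched as follows.  This is an
       illustrative example only: the "write_file" helper, the file names
       and the byte-counting visitor are assumptions made for the sketch.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);
use Path::Iterator::Rule;

# Hypothetical helper: create a file (and its parent directories) under $root.
sub write_file {
    my ( $root, $rel, $content ) = @_;
    my $path = File::Spec->catfile( $root, split m{/}, $rel );
    make_path( dirname($path) );
    open my $fh, ">", $path or die "Cannot write $path: $!";
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";
    return $path;
}

my $root = tempdir( CLEANUP => 1 );
write_file( $root, "a.txt",     "ab" );     # 2 bytes
write_file( $root, "sub/b.txt", "cde" );    # 3 bytes

my $rule = Path::Iterator::Rule->new->file;

# Void context: results are not stored; the visitor sees each match.
my $total = 0;
$rule->all(
    $root,
    {
        visitor => sub {
            my ($path) = @_;
            $total += -s $path;
        },
    }
);

# Scalar context: the number of matching paths.
my $count = $rule->all($root);

# List context: the matching paths themselves.
my @paths = $rule->all($root);

print "$count files, $total bytes\n";
```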

       "all_fast"

       This works just like "all", except that it optimizes for speed over
       safety. Don't do this unless you're sure you need it and accept the
       consequences.  See "PERFORMANCE" for details.

       "test"

         if ( $rule->test( $path, $basename, $stash ) ) { ... }

       Test a file path against a rule.  Used internally, but provided should
       someone want to create their own, custom iteration algorithm.

   Logic operations
       "Path::Iterator::Rule" provides four logic operations for adding rules
       to the object.  Rules may be either a subroutine reference with
       specific semantics (described below in "EXTENDING") or another
       "Path::Iterator::Rule" object.

       "and"

         $rule->and( sub { -r -w -x $_ } ); # stacked filetest example
         $rule->and( @more_rules );

       Adds one or more constraints to the current rule. E.g. "old rule AND
       new1 AND new2 AND ...".  Returns the object to allow method chaining.

       "or"

         $rule->or(
           $rule->new->name("foo*"),
           $rule->new->name("bar*"),
           sub { -r -w -x $_ },
         );

       Takes one or more alternatives and adds them as a constraint to the
       current rule. E.g. "old rule AND ( new1 OR new2 OR ... )".  Returns the
       object to allow method chaining.

       "not"

         $rule->not( sub { -r -w -x $_ } );

       Takes one or more alternatives and adds them as a negative constraint
       to the current rule. E.g. "old rule AND NOT ( new1 AND new2 AND ...)".
       Returns the object to allow method chaining.

       "skip"

         $rule->skip(
           $rule->new->dir->not_writeable,
           $rule->new->dir->name("foo"),
         );

       Takes one or more alternatives and will prune a directory if any of the
       criteria match or if any of the rules already indicate the directory
       should be pruned.  Pruning means the directory will not be returned by
       the iterator and will not be searched.

       For files, it is equivalent to "$rule->not($rule->or(@rules))".
       Returns the object to allow method chaining.

       This method should be called as early as possible in the rule chain.
       See "skip_dirs" below for further explanation and an example.
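
       The following sketch combines "skip", an implicit "and" (via the
       "file" rule method) and "or" on one rule object.  The directory
       layout, the "cache" directory name and the "write_file" helper are
       hypothetical and serve only to make the example self-contained.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);
use Path::Iterator::Rule;

# Hypothetical helper: create a file (and its parent directories) under $root.
sub write_file {
    my ( $root, $rel, $content ) = @_;
    my $path = File::Spec->catfile( $root, split m{/}, $rel );
    make_path( dirname($path) );
    open my $fh, ">", $path or die "Cannot write $path: $!";
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";
    return $path;
}

my $root = tempdir( CLEANUP => 1 );
for my $rel ( "notes.txt", "README.md", "data.bin", "cache/old.txt" ) {
    write_file( $root, $rel, "x" );
}

my $rule = Path::Iterator::Rule->new;
$rule->skip( $rule->new->dir->name("cache") );    # prune first
$rule->file;                                      # AND is a plain file
$rule->or(                                        # AND ( *.txt OR *.md )
    $rule->new->name("*.txt"),
    $rule->new->name(qr/\.md\z/),
);

my @docs = sort map { "$_" } $rule->all( $root, { relative => 1 } );
print "$_\n" for @docs;
```

       Because "skip" comes first, the "cache" directory is pruned before
       the "file" rule has a chance to shortcut the chain.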

RULE METHODS

       Rule methods are helpers that add constraints.  Internally, they
       generate a closure to accomplish the desired logic and add it to the
       rule object with the "and" method.  Rule methods return the object to
       allow for method chaining.

   File name rules
       "name"

         $rule->name( "foo.txt" );
         $rule->name( qr/foo/, "bar.*");

       The "name" method takes one or more patterns and creates a rule that is
       true if any of the patterns match the basename of the file or directory
       path.  Patterns may be regular expressions or glob expressions (or
       literal names).

       "iname"

         $rule->iname( "foo.txt" );
         $rule->iname( qr/foo/, "bar.*");

       The "iname" method is just like the "name" method, but matches
       case-insensitively.

       "skip_dirs"

         $rule->skip_dirs( @patterns );

       The "skip_dirs" method skips directories that match one or more
       patterns.  Patterns may be regular expressions or globs (just like
       "name").  Directories that match will not be returned from the iterator
       and will be excluded from further search.  This includes the starting
       directories.  If that isn't what you want, see "skip_subdirs" instead.

       Note: this rule should be specified early so that it has a chance to
       operate before a logical shortcut.  E.g.

         $rule->skip_dirs(".git")->file; # OK
         $rule->file->skip_dirs(".git"); # Won't work

       In the latter case, when a ".git" directory is seen, the "file" rule
       shortcuts the rule before the "skip_dirs" rule has a chance to act.

       "skip_subdirs"

         $rule->skip_subdirs( @patterns );

       This works just like "skip_dirs", except that the starting directories
       (depth 0) are not skipped and may be returned from the iterator unless
       excluded by other rules.
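
       The difference between the two methods shows up when the starting
       directory itself matches the pattern.  The sketch below assumes a
       hypothetical layout in which the search starts in a directory named
       "build" that contains a nested "build" directory; "write_file" is a
       helper invented for the example.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);
use Path::Iterator::Rule;

# Hypothetical helper: create a file (and its parent directories) under $root.
sub write_file {
    my ( $root, $rel, $content ) = @_;
    my $path = File::Spec->catfile( $root, split m{/}, $rel );
    make_path( dirname($path) );
    open my $fh, ">", $path or die "Cannot write $path: $!";
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";
    return $path;
}

my $root  = tempdir( CLEANUP => 1 );
my $start = File::Spec->catdir( $root, "build" );
write_file( $start, "main.c",      "x" );
write_file( $start, "build/out.o", "x" );

# skip_dirs also applies to the starting directory, so nothing is searched.
my @with_skip_dirs = map { "$_" }
  Path::Iterator::Rule->new->skip_dirs("build")->file
  ->all( $start, { relative => 1 } );

# skip_subdirs leaves the starting directory alone and prunes only the
# nested "build" directory.
my @with_skip_subdirs = map { "$_" }
  Path::Iterator::Rule->new->skip_subdirs("build")->file
  ->all( $start, { relative => 1 } );

print "skip_dirs:    @with_skip_dirs\n";
print "skip_subdirs: @with_skip_subdirs\n";
```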

   File test rules
       Most of the "-X" style filetest operators are available as boolean
       rules.  The table below maps each filetest to its corresponding method
       name.

          Test | Method               Test |  Method
         ------|-------------        ------|----------------
           -r  |  readable             -R  |  r_readable
           -w  |  writeable            -W  |  r_writeable
           -w  |  writable             -W  |  r_writable
           -x  |  executable           -X  |  r_executable
           -o  |  owned                -O  |  r_owned
               |                           |
           -e  |  exists               -f  |  file
           -z  |  empty                -d  |  directory, dir
           -s  |  nonempty             -l  |  symlink
               |                       -p  |  fifo
           -u  |  setuid               -S  |  socket
           -g  |  setgid               -b  |  block
           -k  |  sticky               -c  |  character
               |                       -t  |  tty
           -T  |  ascii
           -B  |  binary

       For example:

         $rule->file->nonempty; # -f -s $file

       The -X operators for timestamps take a single argument in a form that
       Number::Compare can interpret.

          Test | Method
         ------|-------------
           -A  |  accessed
           -M  |  modified
           -C  |  changed

       For example:

         $rule->modified(">1"); # -M $file > 1

   Stat test rules
       All of the "stat" elements have a method that takes a single argument
       in a form understood by Number::Compare.

         stat()  |  Method
        --------------------
              0  |  dev
              1  |  ino
              2  |  mode
              3  |  nlink
              4  |  uid
              5  |  gid
              6  |  rdev
              7  |  size
              8  |  atime
              9  |  mtime
             10  |  ctime
             11  |  blksize
             12  |  blocks

       For example:

         $rule->size(">10K");

   Depth rules
         $rule->min_depth(3);
         $rule->max_depth(5);

       The "min_depth" and "max_depth" rule methods take a single argument and
       limit the paths returned to a minimum or maximum depth (respectively)
       from the starting search directory.  A depth of 0 means the starting
       directory itself.  A depth of 1 means its children.  (This is similar
       to the Unix "find" utility.)
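
       To make the depth numbering concrete, the sketch below places one
       file at each of depths 1, 2 and 3 and selects files by depth.  The
       file names and the "write_file" helper are hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);
use Path::Iterator::Rule;

# Hypothetical helper: create a file (and its parent directories) under $root.
sub write_file {
    my ( $root, $rel, $content ) = @_;
    my $path = File::Spec->catfile( $root, split m{/}, $rel );
    make_path( dirname($path) );
    open my $fh, ">", $path or die "Cannot write $path: $!";
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";
    return $path;
}

my $root = tempdir( CLEANUP => 1 );
write_file( $root, "top.txt",      "x" );    # depth 1
write_file( $root, "a/mid.txt",    "x" );    # depth 2
write_file( $root, "a/b/deep.txt", "x" );    # depth 3

# Only the children of the starting directory.
my @shallow = map { "$_" }
  Path::Iterator::Rule->new->max_depth(1)->file
  ->all( $root, { relative => 1 } );

# Exactly depth 2: at least 2 and at most 2.
my @middle = map { "$_" }
  Path::Iterator::Rule->new->min_depth(2)->max_depth(2)->file
  ->all( $root, { relative => 1 } );

print "depth <= 1: @shallow\n";
print "depth == 2: @middle\n";
```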

   Perl file rules
         # All perl rules
         $rule->perl_file;

         # Individual perl file rules
         $rule->perl_module;     # .pm files
         $rule->perl_pod;        # .pod files
         $rule->perl_test;       # .t files
         $rule->perl_installer;  # Makefile.PL or Build.PL
         $rule->perl_script;     # .pl or 'perl' in the shebang

       These rule methods match file names (or a shebang line) that are
       typical of Perl distribution files.

   Version control file rules
         # Skip all known VCS files
         $rule->skip_vcs;

         # Skip individual VCS files
         $rule->skip_cvs;
         $rule->skip_rcs;
         $rule->skip_svn;
         $rule->skip_git;
         $rule->skip_bzr;
         $rule->skip_hg;
         $rule->skip_darcs;

       Skips files and/or prunes directories related to a version control
       system.  Just like "skip_dirs", these rules should be specified early
       to get the correct behavior.

   File content rules
       "contents_match"

         $rule->contents_match(qr/BEGIN .* END/xs);

       The "contents_match" rule takes a list of regular expressions and
       returns files that match one of the expressions.

       The expressions are applied to the file's contents as a single string.
       For large files, this is likely to take significant time and memory.

       Files are assumed to be encoded in UTF-8, but alternative Perl IO
       layers can be passed as the first argument:

         $rule->contents_match(":encoding(iso-8859-1)", qr/BEGIN .* END/xs);

       See perlio for further details.

       "line_match"

         $rule->line_match(qr/^new/i, qr/^Addition/);

       The "line_match" rule takes a list of regular expressions and returns
       files with at least one line that matches one of the expressions.

       Files are assumed to be encoded in UTF-8, but alternative Perl IO
       layers can be passed as the first argument.
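
       The practical difference between the two rules is that
       "contents_match" sees the whole file as one string while
       "line_match" sees one line at a time.  A minimal sketch, assuming a
       hypothetical file "block.txt" and an illustrative "write_file"
       helper:

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);
use Path::Iterator::Rule;

# Hypothetical helper: create a file (and its parent directories) under $root.
sub write_file {
    my ( $root, $rel, $content ) = @_;
    my $path = File::Spec->catfile( $root, split m{/}, $rel );
    make_path( dirname($path) );
    open my $fh, ">", $path or die "Cannot write $path: $!";
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";
    return $path;
}

my $root = tempdir( CLEANUP => 1 );
write_file( $root, "block.txt", "BEGIN\nmiddle\nEND\n" );
write_file( $root, "other.txt", "nothing here\n" );

my %opts = ( relative => 1 );

# The pattern spans several lines, so only contents_match can find it.
my @whole = map { "$_" }
  Path::Iterator::Rule->new->file->contents_match(qr/BEGIN.*END/s)
  ->all( $root, \%opts );
my @per_line = map { "$_" }
  Path::Iterator::Rule->new->file->line_match(qr/BEGIN.*END/s)
  ->all( $root, \%opts );

# A pattern that fits on one line works with line_match.
my @single = map { "$_" }
  Path::Iterator::Rule->new->file->line_match(qr/^middle\s*$/)
  ->all( $root, \%opts );

print "contents_match:         @whole\n";
print "line_match (multiline): @per_line\n";
print "line_match (one line):  @single\n";
```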

       "shebang"

         $rule->shebang(qr/#!.*\bperl\b/);

       The "shebang" rule takes a list of regular expressions or glob patterns
       and checks them against the first line of a file.

   Other rules
       "dangling"

         $rule->symlink->dangling;
         $rule->not_dangling;

       The "dangling" rule method matches dangling symlinks.  Use it or its
       inverse to control how dangling symlinks should be treated.

   Negated rules
       Most rule methods have a negated form preceded by "not_".

         $rule->not_name("foo.*");

       Because this happens automatically, it includes somewhat silly ones
       like "not_nonempty" (which is thus a less efficient way of saying
       "empty").

       Rules that skip directories or version control files do not have a
       negated version.

EXTENDING

   Custom rule subroutines
       Rules are implemented as (usually anonymous) subroutine callbacks that
       return a value indicating whether or not the rule matches.  These
       callbacks are called with three arguments.  The first argument is a
       path, which is also locally aliased as the $_ global variable for
       convenience in simple tests.

         $rule->and( sub { -r -w -x $_ } ); # tests $_

       The second argument is the basename of the path, which is useful for
       certain types of name checks:

         $rule->and( sub { $_[1] =~ /foo|bar/ } ); # "foo" or "bar" in basename

       The third argument is a hash reference that can be used to maintain
       state.  Keys beginning with an underscore are reserved for
       "Path::Iterator::Rule" to provide additional data about the search in
       progress.  For example, the "_depth" key is used to support minimum and
       maximum depth checks.

       The custom rule subroutine must return one of four values:

       •   A true value -- indicates the constraint is satisfied

       •   A false value -- indicates the constraint is not satisfied

       •   "\1" -- indicates the constraint is satisfied, and prune if it's a
           directory

       •   "\0" -- indicates the constraint is not satisfied, and prune if
           it's a directory

       A reference is a special flag that signals that a directory should not
       be searched recursively, regardless of whether the directory should be
       returned by the iterator or not.

       The legacy "0 but true" value used previously for pruning is no longer
       valid and will throw an exception if it is detected.

       Here is an example.  This is equivalent to the "max_depth" rule method
       with a depth of 3:

         $rule->and(
           sub {
             my ($path, $basename, $stash) = @_;
             return 1 if $stash->{_depth} < 3;
             return \1 if $stash->{_depth} == 3;
             return \0; # should never get here
           }
         );

       Files and directories at depths less than 3 will be returned, and
       those directories will be searched.  Files at depth 3 will be returned.
       Directories at depth 3 will be returned, but their contents will not be
       added to the search.

       References are "sticky" -- they will propagate through "and" and "or"
       logic.

           0 && \0 = \0    \0 && 0 = \0    0 || \0 = \0    \0 || 0 = \0
           0 && \1 = \0    \0 && 1 = \0    0 || \1 = \1    \0 || 1 = \1
           1 && \0 = \0    \1 && 0 = \0    1 || \0 = \1    \1 || 0 = \1
           1 && \1 = \1    \1 && 1 = \1    1 || \1 = \1    \1 || 1 = \1

       Once a directory is flagged to be pruned, it will be pruned regardless
       of subsequent rules.

           $rule->max_depth(3)->name(qr/foo/);

       This will return files or directories with "foo" in the name, but all
       directories at depth 3 will be pruned, regardless of whether they match
       the name rule.

       Generally, if you want to do directory pruning, you are encouraged to
       use the "skip" method instead of writing your own logic using "\0" and
       "\1".
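
       To compare the two approaches, the sketch below prunes directories
       that contain a marker file, first with a hand-written rule that
       returns "\0" and then with "skip".  The ".noindex" marker name, the
       directory layout and the "write_file" helper are all hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);
use Path::Iterator::Rule;

# Hypothetical helper: create a file (and its parent directories) under $root.
sub write_file {
    my ( $root, $rel, $content ) = @_;
    my $path = File::Spec->catfile( $root, split m{/}, $rel );
    make_path( dirname($path) );
    open my $fh, ">", $path or die "Cannot write $path: $!";
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";
    return $path;
}

my $root = tempdir( CLEANUP => 1 );
write_file( $root, "keep/a.txt",         "x" );
write_file( $root, "private/.noindex",   "" );
write_file( $root, "private/secret.txt", "x" );

# True for a directory that contains the (hypothetical) marker file.
my $is_marked = sub {
    my ($path) = @_;
    return -d $path && -e File::Spec->catfile( $path, ".noindex" );
};

# Hand-written pruning: \0 means "not a match, and do not descend".
my @custom = map { "$_" } Path::Iterator::Rule->new->and(
    sub { return $is_marked->(@_) ? \0 : 1 }
)->file->all( $root, { relative => 1 } );

# The same intent expressed with skip.
my @skipped = map { "$_" }
  Path::Iterator::Rule->new->skip($is_marked)->file
  ->all( $root, { relative => 1 } );

print "custom rule: @custom\n";
print "skip:        @skipped\n";
```

       In both cases the pruning rule is added before "file", so it is
       evaluated before the chain can shortcut on a directory.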

   Extension modules and custom rule methods
       One of the strengths of File::Find::Rule is the many CPAN modules that
       extend it.  "Path::Iterator::Rule" provides the "add_helper" method to
       provide a similar mechanism for extensions.

       The "add_helper" class method takes three arguments, a "name" for the
       rule method, a closure-generating callback, and a flag for not
       generating a negated form of the rule.  Unless the flag is true, an
       inverted "not_*" method is generated automatically.  Extension classes
       should call this as a class method to install new rule methods.  For
       example, this adds a "foo" method that checks if the filename is "foo":

         package Path::Iterator::Rule::Foo;

         use Path::Iterator::Rule;

         Path::Iterator::Rule->add_helper(
           foo => sub {
             my @args = @_; # do this to customize closure with arguments
             return sub {
               my ($item, $basename) = @_;
               return if -d "$item";
               return $basename =~ /^foo$/;
             }
           }
         );

         1;

       This allows the following rule methods:

         $rule->foo;
         $rule->not_foo;

       The "add_helper" method will warn and ignore a helper with the same
       name as an existing method.
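
       The "foo" example ignores its arguments.  As a further illustration,
       here is a sketch of a helper that uses the arguments passed to the
       rule method to customize the closure.  The "has_ext" name is
       hypothetical and is not part of "Path::Iterator::Rule"; the
       "write_file" helper exists only to set up test data.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);
use Path::Iterator::Rule;

# Hypothetical helper: create a file (and its parent directories) under $root.
sub write_file {
    my ( $root, $rel, $content ) = @_;
    my $path = File::Spec->catfile( $root, split m{/}, $rel );
    make_path( dirname($path) );
    open my $fh, ">", $path or die "Cannot write $path: $!";
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";
    return $path;
}

# Hypothetical rule method: matches files whose extension is in the list.
Path::Iterator::Rule->add_helper(
    has_ext => sub {
        my %want;
        $want{ lc $_ } = 1 for @_;    # arguments given to $rule->has_ext(...)
        return sub {
            my ( $item, $basename ) = @_;
            return 0 if -d "$item";
            my ($ext) = $basename =~ /\.([^.]+)\z/;
            return ( defined $ext && $want{ lc $ext } ) ? 1 : 0;
        };
    }
);

my $root = tempdir( CLEANUP => 1 );
for my $rel ( "a.txt", "b.md", "c.bin" ) {
    write_file( $root, $rel, "x" );
}

my @docs = sort map { "$_" }
  Path::Iterator::Rule->new->has_ext( "txt", "md" )
  ->all( $root, { relative => 1 } );

# The negated form is generated automatically.
my @rest = sort map { "$_" }
  Path::Iterator::Rule->new->file->not_has_ext( "txt", "md" )
  ->all( $root, { relative => 1 } );

print "has_ext:     @docs\n";
print "not_has_ext: @rest\n";
```

       Note that the negated form also matches directories, because the
       helper returns false for them; the "file" rule in front of
       "not_has_ext" filters those out.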

   Subclassing
       Instead of processing and returning strings, this module may be
       subclassed to operate on objects that represent files.  Such objects
       must stringify to a file path.

       The following private implementation methods must be overridden:

       •   _objectify -- given a path, return an object

       •   _children -- given a directory, return an (unsorted) list of [
           basename, full path ] entries within it, excluding "." and ".."

       Note that "_children" should return a list of tuples, where the tuples
       are array references containing basename and full path.

       See Path::Class::Rule source for an example.

LEXICAL WARNINGS

       If you run with lexical warnings enabled, "Path::Iterator::Rule" will
       issue warnings in certain circumstances (such as an unreadable
       directory that must be skipped).  To disable these categories, put the
       following statement at the correct scope:

         no warnings 'Path::Iterator::Rule';

PERFORMANCE

       By default, "Path::Iterator::Rule" iterator options are "slow but
       safe".  They ensure uniqueness, return files in sorted order, and throw
       nice error messages if something goes wrong.

       If you want speed over safety, set these options:

           %options = (
               loop_safe => 0,
               sorted => 0,
               depthfirst => -1,
               error_handler => undef
           );

       Alternatively, use the "iter_fast" and "all_fast" methods instead,
       which set these options for you.

           $iter = $rule->iter( @dirs, \%options );

           $iter = $rule->iter_fast( @dirs ); # same thing

       Depending on the file structure being searched, "depthfirst => -1" may
       or may not be a good choice. If you have lots of nested directories and
       all the files at the bottom, a depth-first search might do less work or
       use less memory, particularly if the search will be halted early (e.g.
       after finding the first N matches).

       Rules will shortcut on failure, so be sure to put rules likely to fail
       early in a rule chain.

       Consider:

           $r1 = Path::Iterator::Rule->new->name(qr/foo/)->file;
           $r2 = Path::Iterator::Rule->new->file->name(qr/foo/);

       If there are lots of files, but only a few containing "foo", then $r1
       above will be faster.

       Rules are implemented as code references, so long chains have some
       overhead.  Consider testing with a custom coderef that combines several
       tests into one.

       Consider:

           $r3 = Path::Iterator::Rule->new->and( sub { -x -w -r $_ } );
           $r4 = Path::Iterator::Rule->new->executable->writeable->readable;

       Rule $r3 above will be much faster, not only because it stacks the file
       tests, but because it requires only a single code reference.

CAVEATS

       Some features are still unimplemented:

       •   Untainting options

       •   Some File::Find::Rule helpers (e.g. "grep")

       •   Extension class loading via "import()"

       Filetest operators and stat rules are subject to the usual portability
       considerations.  See perlport for details.

SEE ALSO

       There are many other file finding modules out there.  They all have
       various features/deficiencies, depending on your preferences and needs.
       Here is an (incomplete) list of alternatives, with some comparison
       commentary.

       Path::Class::Rule and IO::All::Rule are subclasses of
       "Path::Iterator::Rule" and operate on Path::Class and IO::All objects,
       respectively.  Because of this, they are substantially slower on large
       directory trees than just using this module directly.

       File::Find is part of the Perl core.  It requires the user to write a
       callback function to process each node of the search.  Callbacks must
       use global variables to determine the current node.  It only supports
       depth-first search (both pre- and post-order). It supports pre- and
       post-processing callbacks; the former is required for sorting files to
       process in a directory.  File::Find::Closures can be used to help
       create a callback for File::Find.

       File::Find::Rule is an object-oriented wrapper around File::Find.  It
       provides a number of helper functions and there are many more
       "File::Find::Rule::*" modules on CPAN with additional helpers.  It
       provides an iterator interface, but precomputes all the results.

       File::Next provides iterators for file, directories or "everything".
       It takes two callbacks, one to match files and one to decide which
       directories to descend.  It does not allow control over breadth/depth
       order, though it does provide means to sort files for processing within
       a directory. Like File::Find, it requires callbacks to use global
       variables.

       Path::Class::Iterator walks a directory structure with an iterator.  It
       is implemented as Path::Class subclasses, which adds a degree of extra
       complexity. It takes a single callback to define "interesting" paths to
       return.  The callback gets a Path::Class::Iterator::File or
       Path::Class::Iterator::Dir object for evaluation.

       File::Find::Object and companion File::Find::Object::Rule are like
       File::Find and File::Find::Rule, but without File::Find inside.  They
       use an iterator that does not precompute results. They can return
       File::Find::Object::Result objects, which give a subset of the utility
       of Path::Class objects.  File::Find::Object::Rule appears to be a
       literal translation of File::Find::Rule, including oddities like making
       "-M" into a boolean.

       File::chdir::WalkDir recursively descends a tree, calling a callback on
       each file.  No iterator.  Supports exclusion patterns.  Depth-first
       post-order by default, but offers pre-order option. Does not process
       symlinks.

       File::Find::Iterator is based on iterator patterns in Higher Order
       Perl.  It allows a filtering callback. Symlinks are followed
       automatically without infinite loop protection. No control over order.
       It offers a "state file" option for resuming interrupted work.

       File::Find::Declare has declarative helper rules, no iterator, is
       Moose-based and offers no control over ordering or following symlinks.

       File::Find::Node has no iterator, does matching via callback and offers
       no control over ordering.

       File::Set builds up a set of files to operate on from a list of
       directories to include or exclude, with control over recursion.  A
       callback is applied to each file (or directory) in the set.  There is
       no iterator.  There is no control over ordering.  Symlinks are not
       followed.  It has several extra features for checksumming the set and
       creating tarballs with /bin/tar.

THANKS

       Thank you to Ricardo Signes (rjbs) for inspiring me to write yet
       another file finder module, for writing file finder optimization
       benchmarks, and tirelessly running my code over and over to see if it
       got faster.

       •   See the speed of Perl file finders
           <http://rjbs.manxome.org/rubric/entry/1981>

AUTHOR

       David Golden <dagolden@cpan.org>

COPYRIGHT AND LICENSE

       This software is Copyright (c) 2013 by David Golden.

       This is free software, licensed under:

         The Apache License, Version 2.0, January 2004

perl v5.32.1                      2021-01-27                            PIR(3)