Filter::Simple(3pm)    Perl Programmers Reference Guide    Filter::Simple(3pm)

NAME
       Filter::Simple - Simplified source filtering

SYNOPSIS
        # in MyFilter.pm:

            package MyFilter;

            use Filter::Simple;

            FILTER { ... };

            # or just:
            #
            # use Filter::Simple sub { ... };

        # in user's code:

            use MyFilter;

            # this code is filtered

            no MyFilter;

            # this code is not

DESCRIPTION
   The Problem
       Source filtering is an immensely powerful feature of recent versions of
       Perl.  It allows one to extend the language itself (e.g. the Switch
       module), to simplify the language (e.g. Language::Pythonesque), or to
       completely recast the language (e.g. Lingua::Romana::Perligata).
       Effectively, it allows one to use the full power of Perl as its own,
       recursively applied, macro language.

       The excellent Filter::Util::Call module (by Paul Marquess) provides a
       usable Perl interface to source filtering, but it is often too powerful
       and not nearly as simple as it could be.

       To use the module it is necessary to do the following:

       1.  Download, build, and install the Filter::Util::Call module.  (If
           you have Perl 5.7.1 or later, this is already done for you.)

       2.  Set up a module that does a "use Filter::Util::Call".

       3.  Within that module, create an "import" subroutine.

       4.  Within the "import" subroutine do a call to "filter_add", passing
           it a subroutine reference.

       5.  Within the subroutine reference, call "filter_read" or
           "filter_read_exact" to "prime" $_ with source code data from the
           source file that will "use" your module. Check the status value
           returned to see if any source code was actually read in.

       6.  Process the contents of $_ to change the source code in the desired
           manner.

       7.  Return the status value.

       8.  If the act of unimporting your module (via a "no") should cause
           source code filtering to cease, create an "unimport" subroutine,
           and have it call "filter_del". Make sure that the call to
           "filter_read" or "filter_read_exact" in step 5 will not
           accidentally read past the "no". Effectively this limits source
           code filters to line-by-line operation, unless the "import"
           subroutine does some fancy pre-pre-parsing of the source code it's
           filtering.

       For example, here is a minimal source code filter in a module named
       BANG.pm. It simply converts every occurrence of the sequence
       "BANG\s+BANG" to the sequence "die 'BANG' if $BANG" in any piece of
       code following a "use BANG;" statement (until the next "no BANG;"
       statement, if any):

           package BANG;

           use Filter::Util::Call;

           sub import {
               my ($class) = @_;    # the filter's package name, here "BANG"
               filter_add( sub {
                   my ($status, $no_seen);
                   my $data = "";
                   # read line by line until EOF, an error, or "no BANG;"
                   while (($status = filter_read()) > 0) {
                       if (/^\s*no\s+$class\s*;\s*?$/) {
                           $no_seen = 1;
                           last;
                       }
                       $data .= $_;
                       $_ = "";
                   }
                   return $status if $status < 0;
                   $_ = $data;
                   s/BANG\s+BANG/die 'BANG' if \$BANG/g;
                   $_ .= "no $class;\n" if $no_seen;
                   return length($_) ? 1 : 0;
               })
           }

           sub unimport {
               filter_del();
           }

           1;

       This level of sophistication puts filtering out of the reach of many
       programmers.

   A Solution
       The Filter::Simple module provides a simplified interface to
       Filter::Util::Call; one that is sufficient for most common cases.

       Instead of the above process, with Filter::Simple the task of setting
       up a source code filter is reduced to:

       1.  Download and install the Filter::Simple module.  (If you have Perl
           5.7.1 or later, this is already done for you.)

       2.  Set up a module that does a "use Filter::Simple" and then calls
           "FILTER { ... }".

       3.  Within the anonymous subroutine or block that is passed to
           "FILTER", process the contents of $_ to change the source code in
           the desired manner.

       In other words, the previous example would become:

           package BANG;
           use Filter::Simple;

           FILTER {
               s/BANG\s+BANG/die 'BANG' if \$BANG/g;
           };

           1;

       Note that the source code is passed as a single string, so any regex
       that uses "^" or "$" to detect line boundaries will need the "/m" flag.
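
       The effect of the "/m" flag can be seen without installing a filter at
       all. The following is only a sketch: $source is a hypothetical stand-in
       for the $_ that a "FILTER" block receives, and the "TODO" rewrite is
       invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the $_ seen inside FILTER: several source
# lines delivered as one string.
my $source = "print 1;\nTODO fix this\nprint 2;\n";

# Without /m, ^ anchors only at the very start of the string, so the
# second line is never matched and nothing changes.
(my $without_m = $source) =~ s/^TODO\b/# TODO/g;

# With /m, ^ matches at the start of every line in the string.
(my $with_m = $source) =~ s/^TODO\b/# TODO/mg;

print $with_m;
```

       Only the "/m" version comments out the "TODO" line.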

   Disabling or changing "no" behaviour
       By default, the installed filter only filters up to a line consisting
       of one of the three standard source "terminators":

           no ModuleName;  # optional comment

       or:

           __END__

       or:

           __DATA__

       but this can be altered by passing a second argument to "use
       Filter::Simple" or "FILTER" (just remember: there's no comma after the
       initial block when you use "FILTER").

       That second argument may be either a "qr"'d regular expression (which
       is then used to match the terminator line), or a defined false value
       (which indicates that no terminator line should be looked for), or a
       reference to a hash (in which case the terminator is the value
       associated with the key 'terminator').

       For example, to cause the previous filter to filter only up to a line
       of the form:

           GNAB esu;

       you would write:

           package BANG;
           use Filter::Simple;

           FILTER {
               s/BANG\s+BANG/die 'BANG' if \$BANG/g;
           }
           qr/^\s*GNAB\s+esu\s*;\s*?$/;

       or:

           FILTER {
               s/BANG\s+BANG/die 'BANG' if \$BANG/g;
           }
           { terminator => qr/^\s*GNAB\s+esu\s*;\s*?$/ };

       and to prevent the filter's being turned off in any way:

           package BANG;
           use Filter::Simple;

           FILTER {
               s/BANG\s+BANG/die 'BANG' if \$BANG/g;
           }
           "";    # or: 0

       or:

           FILTER {
               s/BANG\s+BANG/die 'BANG' if \$BANG/g;
           }
           { terminator => "" };

       Note that, no matter what you set the terminator pattern to, the actual
       terminator itself must be contained on a single source line.
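
       A custom terminator pattern can be checked against candidate lines
       before it is handed to "FILTER". This sketch assumes the pattern is
       tried against one source line at a time, as the single-line
       restriction suggests; the sample lines are invented for illustration.

```perl
use strict;
use warnings;

# The terminator pattern from the example above.
my $terminator = qr/^\s*GNAB\s+esu\s*;\s*?$/;

# Hypothetical source lines, each tested on its own.
my @lines = (
    "GNAB esu;\n",             # plain terminator
    "   GNAB   esu ;\n",       # extra whitespace is allowed by the pattern
    "GNAB esu; # done\n",      # trailing comment: this pattern rejects it
    "print 'GNAB esu;';\n",    # terminator text inside other code
);

my @is_terminator = map { /$terminator/ ? 1 : 0 } @lines;

printf "%d  %s", $is_terminator[$_], $lines[$_] for 0 .. $#lines;
```

       Unlike the default "no ModuleName;" terminator, this particular pattern
       makes no allowance for a trailing comment, so the third line is not
       treated as a terminator.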

   All-in-one interface
       Separating the loading of Filter::Simple:

           use Filter::Simple;

       from the setting up of the filtering:

           FILTER { ... };

       is useful because it allows other code (typically parser support code
       or caching variables) to be defined before the filter is invoked.
       However, there is often no need for such a separation.

       In those cases, it is easier to just append the filtering subroutine
       and any terminator specification directly to the "use" statement that
       loads Filter::Simple, like so:

           use Filter::Simple sub {
               s/BANG\s+BANG/die 'BANG' if \$BANG/g;
           };

       This is exactly the same as:

           use Filter::Simple;
           BEGIN {
               Filter::Simple::FILTER {
                   s/BANG\s+BANG/die 'BANG' if \$BANG/g;
               };
           }

       except that the "FILTER" subroutine is not exported by Filter::Simple.

   Filtering only specific components of source code
       One of the problems with a filter like:

           use Filter::Simple;

           FILTER { s/BANG\s+BANG/die 'BANG' if \$BANG/g };

       is that it indiscriminately applies the specified transformation to the
       entire text of your source program. So something like:

           warn "BANG BANG, YOU'RE DEAD";
           BANG BANG;

       will become:

           warn "die 'BANG' if $BANG, YOU'RE DEAD";
           die 'BANG' if $BANG;
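
       The over-eager rewrite can be reproduced on an ordinary string, with no
       filter installed. In this sketch $source is a hypothetical stand-in
       for the text a "FILTER" block would receive in $_.

```perl
use strict;
use warnings;

# Hypothetical source text: one string literal and one statement, both
# containing the sequence "BANG BANG".
my $source = <<'END';
warn "BANG BANG, YOU'RE DEAD";
BANG BANG;
END

# The same substitution the FILTER block applies to $_.
(my $filtered = $source) =~ s/BANG\s+BANG/die 'BANG' if \$BANG/g;

print $filtered;
```

       Both occurrences are rewritten, including the one inside the string
       literal.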

       It is very common when filtering source to only want to apply the
       filter to the non-character-string parts of the code, or alternatively
       to only the character strings.

       Filter::Simple supports this type of filtering by automatically
       exporting the "FILTER_ONLY" subroutine.

       "FILTER_ONLY" takes a sequence of specifiers that install separate (and
       possibly multiple) filters that act on only parts of the source code.
       For example:

           use Filter::Simple;

           FILTER_ONLY
               code      => sub { s/BANG\s+BANG/die 'BANG' if \$BANG/g },
               quotelike => sub { s/BANG\s+BANG/CHITTY CHITTY/g };

       The "code" subroutine will only be used to filter parts of the source
       code that are not quotelikes, POD, or "__DATA__". The "quotelike"
       subroutine only filters Perl quotelikes (including here documents).

       The full list of alternatives is:

       "code"
           Filters only those sections of the source code that are not
           quotelikes, POD, or "__DATA__".

       "code_no_comments"
           Filters only those sections of the source code that are not
           quotelikes, POD, comments, or "__DATA__".

       "executable"
           Filters only those sections of the source code that are not POD or
           "__DATA__".

       "executable_no_comments"
           Filters only those sections of the source code that are not POD,
           comments, or "__DATA__".

       "quotelike"
           Filters only Perl quotelikes (as interpreted by
           &Text::Balanced::extract_quotelike).

       "string"
           Filters only the string literal parts of a Perl quotelike (i.e. the
           contents of a string literal, either half of a "tr///", the second
           half of an "s///").

       "regex"
           Filters only the pattern literal parts of a Perl quotelike (i.e.
           the contents of a "qr//" or an "m//", the first half of an "s///").

       "all"
           Filters everything. Identical in effect to "FILTER".

       Except for "FILTER_ONLY code => sub {...}", each of the component
       filters is called repeatedly, once for each component found in the
       source code.

       Note that you can also apply two or more of the same type of filter in
       a single "FILTER_ONLY". For example, here's a simple macro-preprocessor
       that is only applied within regexes, with a final debugging pass that
       prints the resulting source code:

           use Regexp::Common;
           FILTER_ONLY
               regex => sub { s/!\[/[^/g },
               regex => sub { s/%d/$RE{num}{int}/g },
               regex => sub { s/%f/$RE{num}{real}/g },
               all   => sub { print if $::DEBUG };

   Filtering only the code parts of source code
       Most source code ceases to be grammatically correct when it is broken
       up into the pieces between string literals and regexes. So the 'code'
       and 'code_no_comments' component filters behave slightly differently
       from the other partial filters described in the previous section.

       Rather than calling the specified processor on each individual piece of
       code (i.e. on the bits between quotelikes), the 'code...' partial
       filters operate on the entire source code, but with the quotelike bits
       (and, in the case of 'code_no_comments', the comments) "blanked out".

       That is, a 'code...' filter replaces each quoted string, quotelike,
       regex, POD, and __DATA__ section with a placeholder. The delimiters of
       this placeholder are the contents of the $; variable at the time the
       filter is applied (normally "\034"). The remaining four bytes are a
       unique identifier for the component being replaced.

       This approach makes it comparatively easy to write code preprocessors
       without worrying about the form or contents of strings, regexes, etc.

       For convenience, during a 'code...' filtering operation, Filter::Simple
       provides a package variable ($Filter::Simple::placeholder) that
       contains a pre-compiled regex that matches any placeholder and
       captures the identifier within the placeholder. Placeholders can be
       moved and re-ordered within the source code as needed.

       In addition, a second package variable (@Filter::Simple::components)
       contains a list of the various pieces of $_, as they were originally
       split up to allow placeholders to be inserted.

       Once the filtering has been applied, the original strings, regexes,
       POD, etc. are re-inserted into the code, by replacing each placeholder
       with the corresponding original component (from @components). Note that
       this means that the @components variable must be treated with extreme
       care within the filter. The @components array stores the
       "back-translations" of each placeholder inserted into $_, as well as
       the interstitial source code between placeholders. If the placeholder
       back-translations are altered in @components, they will be similarly
       changed when the placeholders are removed from $_ after the filter is
       complete.

       For example, the following filter detects concatenated pairs of
       strings/quotelikes and reverses the order in which they are
       concatenated. Because the placeholder regex contains a capture group
       of its own, the two complete placeholders are captured as $1 and $3:

           package DemoRevCat;
           use Filter::Simple;

           FILTER_ONLY code => sub {
               my $ph = $Filter::Simple::placeholder;
               s{ ($ph) \s* [.] \s* ($ph) }{ $3.$1 }gx
           };

       Thus, the following code:

           use DemoRevCat;

           my $str = "abc" . q(def);

           print "$str\n";

       would become:

           my $str = q(def)."abc";

           print "$str\n";

       and hence print:

           defabc
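
       The placeholder round trip can be imitated by hand. What follows is a
       sketch under stated assumptions, not Filter::Simple's implementation:
       $ph is a hypothetical stand-in for $Filter::Simple::placeholder built
       from the description above (delimiter, four-byte identifier,
       delimiter), the local @components array plays the role of
       @Filter::Simple::components, and packing the component index with
       pack('N', ...) is an illustrative choice of identifier.

```perl
use strict;
use warnings;

# Assumed placeholder shape: delimiter, captured four-byte id, delimiter.
my $delim = "\034";                          # the default value of $;
my $ph    = qr/\Q$delim\E(.{4})\Q$delim\E/s;

# Pieces of the line  my $str = "abc" . q(def);  with the two quotelikes
# stored as components 1 and 3 (hypothetical split).
my @components = ('my $str = ', '"abc"', ' . ', 'q(def)', ";\n");

# Code text as a 'code' filter would see it: quotelikes blanked out.
my $code = $components[0]
         . $delim . pack('N', 1) . $delim
         . $components[2]
         . $delim . pack('N', 3) . $delim
         . $components[4];

# Each ($ph) contributes two groups: $1 and $3 are the whole placeholders,
# $2 and $4 are the identifiers captured inside them.
(my $swapped = $code) =~ s{ ($ph) \s* [.] \s* ($ph) }{$3.$1}gx;

# Back-translation: replace every placeholder with its original component.
(my $restored = $swapped) =~ s{$ph}{ $components[ unpack('N', $1) ] }ge;

print $restored;
```

       The filter only ever moves placeholders around; the string contents
       reappear untouched when the placeholders are back-translated.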

   Using Filter::Simple with an explicit "import" subroutine
       Filter::Simple generates a special "import" subroutine for your module
       (see "How it works") which would normally replace any "import"
       subroutine you might have explicitly declared.

       However, Filter::Simple is smart enough to notice your existing
       "import" and Do The Right Thing with it.  That is, if you explicitly
       define an "import" subroutine in a package that's using Filter::Simple,
       that "import" subroutine will still be invoked immediately after any
       filter you install.

       The only thing you have to remember is that the "import" subroutine
       must be declared before the filter is installed. If you use "FILTER" to
       install the filter:

           package Filter::TurnItUpTo11;

           use Filter::Simple;

           FILTER { s/(\w+)/\U$1/ };

       that will almost never be a problem, but if you install a filtering
       subroutine by passing it directly to the "use Filter::Simple"
       statement:

           package Filter::TurnItUpTo11;

           use Filter::Simple sub{ s/(\w+)/\U$1/ };

       then you must make sure that your "import" subroutine appears before
       that "use" statement.

   Using Filter::Simple and Exporter together
       Likewise, Filter::Simple is also smart enough to Do The Right Thing if
       you use Exporter:

           package Switch;
           use base 'Exporter';
           use Filter::Simple;

           @EXPORT    = qw(switch case);
           @EXPORT_OK = qw(given  when);

           FILTER { $_ = magic_Perl_filter($_) };

       Immediately after the filter has been applied to the source,
       Filter::Simple will pass control to Exporter, so it can do its magic
       too.

       Of course, here too, Filter::Simple has to know you're using Exporter
       before it applies the filter. That's almost never a problem, but if
       you're nervous about it, you can guarantee that things will work
       correctly by ensuring that your "use base 'Exporter'" always precedes
       your "use Filter::Simple".

   How it works
       The Filter::Simple module exports into the package that calls "FILTER"
       (or "use"s it directly) -- such as package "BANG" in the above example
       -- two automagically constructed subroutines -- "import" and "unimport"
       -- which take care of all the nasty details.

       In addition, the generated "import" subroutine passes its own argument
       list to the filtering subroutine, so the BANG.pm filter could easily be
       made parametric:

           package BANG;

           use Filter::Simple;

           FILTER {
               my ($die_msg, $var_name) = @_;
               s/BANG\s+BANG/die '$die_msg' if \${$var_name}/g;
           };

           # and in some user code:

           use BANG "BOOM", "BAM";  # "BANG BANG" becomes: die 'BOOM' if $BAM

       The specified filtering subroutine is called every time a "use BANG" is
       encountered, and passed all the source code following that call, up to
       either the next "no BANG;" (or whatever terminator you've set) or the
       end of the source file, whichever occurs first. By default, any "no
       BANG;" call must appear by itself on a separate line, or it is ignored.
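
       The parametric substitution can be tried on a plain string. In this
       sketch the two variables are set by hand, as if the user had written
       'use BANG "BOOM", "BAM";', and $source is a hypothetical stand-in for
       the filtered text.

```perl
use strict;
use warnings;

# Hypothetical arguments, standing in for the @_ that the generated
# import subroutine would pass to the filter.
my ($die_msg, $var_name) = ("BOOM", "BAM");

my $source = "BANG BANG;\n";

# Same substitution as in the parametric FILTER block: \$ keeps the
# dollar sign literal, while $die_msg and $var_name are interpolated.
(my $filtered = $source) =~ s/BANG\s+BANG/die '$die_msg' if \${$var_name}/g;

print $filtered;
```

       The generated text spells the variable as "${BAM}", which Perl treats
       the same as "$BAM".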

AUTHOR
       Damian Conway

CONTACT
       Filter::Simple is now maintained by the Perl5-Porters.  Please submit
       bugs via the "perlbug" tool that comes with your perl.  For usage
       instructions, read "perldoc perlbug" or possibly "man perlbug".  For
       mostly anything else, please contact <perl5-porters@perl.org>.

       Maintainer of the CPAN release is Steffen Mueller <smueller@cpan.org>.
       Contact him with technical difficulties with respect to the packaging
       of the CPAN module.

       Praise of the module, flowers, and presents still go to the author,
       Damian Conway <damian@conway.org>.

COPYRIGHT AND LICENSE
           Copyright (c) 2000-2008, Damian Conway. All Rights Reserved.
           This module is free software. It may be used, redistributed
           and/or modified under the same terms as Perl itself.

perl v5.12.4                      2011-06-07               Filter::Simple(3pm)