Filter::Simple(3pm)    Perl Programmers Reference Guide    Filter::Simple(3pm)


NAME

       Filter::Simple - Simplified source filtering


SYNOPSIS

        # in MyFilter.pm:

            package MyFilter;

            use Filter::Simple;

            FILTER { ... };

            # or just:
            #
            # use Filter::Simple sub { ... };

        # in user's code:

            use MyFilter;

            # this code is filtered

            no MyFilter;

            # this code is not


DESCRIPTION

       The Problem

       Source filtering is an immensely powerful feature of recent versions of
       Perl.  It allows one to extend the language itself (e.g. the Switch
       module), to simplify the language (e.g. Language::Pythonesque), or to
       completely recast the language (e.g. Lingua::Romana::Perligata).
       Effectively, it allows one to use the full power of Perl as its own,
       recursively applied, macro language.

       The excellent Filter::Util::Call module (by Paul Marquess) provides a
       usable Perl interface to source filtering, but it is often too powerful
       and not nearly as simple as it could be.

       To use that module it is necessary to do the following:

       1.  Download, build, and install the Filter::Util::Call module.  (If
           you have Perl 5.7.1 or later, this is already done for you.)

       2.  Set up a module that does a "use Filter::Util::Call".

       3.  Within that module, create an "import" subroutine.

       4.  Within the "import" subroutine do a call to "filter_add", passing
           it a subroutine reference.

       5.  Within the subroutine reference, call "filter_read" or
           "filter_read_exact" to "prime" $_ with source code data from the
           source file that will "use" your module. Check the status value
           returned to see if any source code was actually read in.

       6.  Process the contents of $_ to change the source code in the desired
           manner.

       7.  Return the status value.

       8.  If the act of unimporting your module (via a "no") should cause
           source code filtering to cease, create an "unimport" subroutine,
           and have it call "filter_del". Make sure that the call to
           "filter_read" or "filter_read_exact" in step 5 will not
           accidentally read past the "no". Effectively this limits source
           code filters to line-by-line operation, unless the "import"
           subroutine does some fancy pre-pre-parsing of the source code it's
           filtering.

       For example, here is a minimal source code filter in a module named
       BANG.pm. It simply converts every occurrence of the sequence
       "BANG\s+BANG" to the sequence "die 'BANG' if $BANG" in any piece of
       code following a "use BANG;" statement (until the next "no BANG;"
       statement, if any):

           package BANG;

           use strict;
           use warnings;
           use Filter::Util::Call;

           sub import {
               my ($class) = @_;    # the package being "use"d, i.e. BANG
               filter_add( sub {
                   my ($status, $no_seen);
                   my $data = "";
                   # Read source lines until EOF, an error, or "no BANG;".
                   while (($status = filter_read()) > 0) {
                       if (/^\s*no\s+$class\s*;\s*?$/) {
                           $no_seen = 1;
                           last;
                       }
                       $data .= $_;
                       $_ = "";
                   }
                   return $status if $status < 0;    # pass read errors on
                   $_ = $data;
                   s/BANG\s+BANG/die 'BANG' if \$BANG/g;
                   # Hand the "no" line back so perl calls unimport.
                   $_ .= "no $class;\n" if $no_seen;
                   return length $_ ? 1 : 0;         # 0 signals end of input
               })
           }

           sub unimport {
               filter_del();
           }

           1;

       This level of sophistication puts filtering out of the reach of many
       programmers.

       A Solution

       The Filter::Simple module provides a simplified interface to
       Filter::Util::Call; one that is sufficient for most common cases.

       Instead of the above process, with Filter::Simple the task of setting
       up a source code filter is reduced to:

       1.  Download and install the Filter::Simple module.  (If you have Perl
           5.7.1 or later, this is already done for you.)

       2.  Set up a module that does a "use Filter::Simple" and then calls
           "FILTER { ... }".

       3.  Within the anonymous subroutine or block that is passed to
           "FILTER", process the contents of $_ to change the source code in
           the desired manner.

       In other words, the previous example would become:

           package BANG;
           use Filter::Simple;

           FILTER {
               s/BANG\s+BANG/die 'BANG' if \$BANG/g;
           };

           1;

       Note that the source code is passed as a single string, so any regex
       that uses "^" or "$" to detect line boundaries will need the "/m" flag.

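       The effect of the "/m" flag can be seen without installing a filter at
       all. The following is a minimal sketch that applies a substitution to
       a multi-line string, which is the form in which a filter receives the
       source. The helper name "comment_out_debug" and the sample source are
       invented for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: comment out every line that begins with "debug".
sub comment_out_debug {
    my ($source) = @_;
    # With /m, ^ matches at the start of each line of the string;
    # without it, only a "debug" at the very start would be found.
    (my $filtered = $source) =~ s/^([ \t]*debug\b)/# $1/mg;
    return $filtered;
}

print comment_out_debug("print 1;\ndebug 'x';\nprint 2;\n");
```

       Dropping the "/m" leaves this sample unchanged, because "^" then
       matches only at the start of the whole string.
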
       Disabling or changing "no" behaviour

       By default, the installed filter only filters up to a line consisting
       of one of the three standard source "terminators":

           no ModuleName;  # optional comment

       or:

           __END__

       or:

           __DATA__

       but this can be altered by passing a second argument to "use
       Filter::Simple" or "FILTER" (just remember: there's no comma after the
       initial block when you use "FILTER").

       That second argument may be either a "qr"'d regular expression (which
       is then used to match the terminator line), or a defined false value
       (which indicates that no terminator line should be looked for), or a
       reference to a hash (in which case the terminator is the value
       associated with the key 'terminator').

       For example, to cause the previous filter to filter only up to a line
       of the form:

           GNAB esu;

       you would write:

           package BANG;
           use Filter::Simple;

           FILTER {
               s/BANG\s+BANG/die 'BANG' if \$BANG/g;
           }
           qr/^\s*GNAB\s+esu\s*;\s*?$/;

       or:

           FILTER {
               s/BANG\s+BANG/die 'BANG' if \$BANG/g;
           }
           { terminator => qr/^\s*GNAB\s+esu\s*;\s*?$/ };

       and to prevent the filter's being turned off in any way:

           package BANG;
           use Filter::Simple;

           FILTER {
               s/BANG\s+BANG/die 'BANG' if \$BANG/g;
           }
           "";    # or: 0

       or:

           FILTER {
               s/BANG\s+BANG/die 'BANG' if \$BANG/g;
           }
           { terminator => "" };

       Note that, no matter what you set the terminator pattern to, the actual
       terminator itself must be contained on a single source line.

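       The single-line restriction is consistent with the terminator being
       tested against one source line at a time, as in the
       Filter::Util::Call example earlier. A rough sketch of such a check,
       outside any real filter (the helper "split_at_terminator" is
       hypothetical and not part of Filter::Simple):

```perl
use strict;
use warnings;

# Hypothetical helper that mimics a line-by-line terminator check.
# Returns the text that would be filtered and the untouched remainder.
sub split_at_terminator {
    my ($source, $terminator) = @_;
    my (@filtered, @rest);
    my $seen = 0;
    for my $line (split /^/, $source) {    # keeps each line's newline
        $seen ||= $line =~ $terminator;
        if ($seen) { push @rest, $line } else { push @filtered, $line }
    }
    return (join('', @filtered), join('', @rest));
}

my $terminator = qr/^\s*GNAB\s+esu\s*;\s*?$/;
my ($filtered, $rest) = split_at_terminator(
    "BANG BANG;\nGNAB esu;\nBANG BANG;\n", $terminator);
print "filtered: $filtered";
print "rest:     $rest";
```

       A terminator split across two lines never matches in this scheme,
       since no single line contains all of it.
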
       All-in-one interface

       Separating the loading of Filter::Simple:

           use Filter::Simple;

       from the setting up of the filtering:

           FILTER { ... };

       is useful because it allows other code (typically parser support code
       or caching variables) to be defined before the filter is invoked.
       However, there is often no need for such a separation.

       In those cases, it is easier to just append the filtering subroutine
       and any terminator specification directly to the "use" statement that
       loads Filter::Simple, like so:

           use Filter::Simple sub {
               s/BANG\s+BANG/die 'BANG' if \$BANG/g;
           };

       This is exactly the same as:

           use Filter::Simple;
           BEGIN {
               Filter::Simple::FILTER {
                   s/BANG\s+BANG/die 'BANG' if \$BANG/g;
               };
           }

       except that the "FILTER" subroutine is not exported by Filter::Simple.

       Filtering only specific components of source code

       One of the problems with a filter like:

           use Filter::Simple;

           FILTER { s/BANG\s+BANG/die 'BANG' if \$BANG/g };

       is that it indiscriminately applies the specified transformation to the
       entire text of your source program. So something like:

           warn "BANG BANG, YOU'RE DEAD";
           BANG BANG;

       will become:

           warn "die 'BANG' if $BANG, YOU'RE DEAD";
           die 'BANG' if $BANG;

       It is very common when filtering source to only want to apply the
       filter to the non-character-string parts of the code, or alternatively
       to only the character strings.

       Filter::Simple supports this type of filtering by automatically
       exporting the "FILTER_ONLY" subroutine.

       "FILTER_ONLY" takes a sequence of specifiers that install separate (and
       possibly multiple) filters that act on only parts of the source code.
       For example:

           use Filter::Simple;

           FILTER_ONLY
               code      => sub { s/BANG\s+BANG/die 'BANG' if \$BANG/g },
               quotelike => sub { s/BANG\s+BANG/CHITTY CHITTY/g };

       The "code" subroutine will only be used to filter parts of the source
       code that are not quotelikes, POD, or "__DATA__". The "quotelike"
       subroutine only filters Perl quotelikes (including here documents).

       The full list of alternatives is:

       "code"
           Filters only those sections of the source code that are not
           quotelikes, POD, or "__DATA__".

       "code_no_comments"
           Filters only those sections of the source code that are not
           quotelikes, POD, comments, or "__DATA__".

       "executable"
           Filters only those sections of the source code that are not POD or
           "__DATA__".

       "executable_no_comments"
           Filters only those sections of the source code that are not POD,
           comments, or "__DATA__".

       "quotelike"
           Filters only Perl quotelikes (as interpreted by
           &Text::Balanced::extract_quotelike).

       "string"
           Filters only the string literal parts of a Perl quotelike (i.e. the
           contents of a string literal, either half of a "tr///", the second
           half of an "s///").

       "regex"
           Filters only the pattern literal parts of a Perl quotelike (i.e.
           the contents of a "qr//" or an "m//", the first half of an "s///").

       "all"
           Filters everything. Identical in effect to "FILTER".

       Except for "FILTER_ONLY code => sub {...}", each of the component
       filters is called repeatedly, once for each component found in the
       source code.

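       To illustrate what "once for each component" means, here is a
       simplified stand-in for a "quotelike" filter. It recognises only
       double-quoted strings, whereas the real module relies on
       Text::Balanced, and "filter_strings_only" is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a 'quotelike' component filter. The callback
# runs once per double-quoted string, with that string aliased to $_
# (the "for" modifier does the aliasing).
sub filter_strings_only {
    my ($source, $filter) = @_;
    $source =~ s{("(?:[^"\\]|\\.)*")}{
        my $piece = $1;
        $filter->() for $piece;
        $piece;
    }ge;
    return $source;
}

my $calls = 0;
print filter_strings_only(
    qq{print "BANG BANG", "BANG  BANG"; BANG BANG;\n},
    sub { $calls++; s/BANG\s+BANG/CHITTY CHITTY/g },
);
print "filter called $calls times\n";
```

       The callback runs twice for this sample, once per string, and the bare
       "BANG BANG" outside the strings is left alone.
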
       Note that you can also apply two or more of the same type of filter in
       a single "FILTER_ONLY". For example, here's a simple macro-preprocessor
       that is only applied within regexes, with a final debugging pass that
       prints the resulting source code:

           use Regexp::Common;
           FILTER_ONLY
               regex => sub { s/!\[/[^/g },
               regex => sub { s/%d/$RE{num}{int}/g },
               regex => sub { s/%f/$RE{num}{real}/g },
               all   => sub { print if $::DEBUG };

       Filtering only the code parts of source code

       Most source code ceases to be grammatically correct when it is broken
       up into the pieces between string literals and regexes. So the 'code'
       and 'code_no_comments' component filters behave slightly differently
       from the other partial filters described in the previous section.

       Rather than calling the specified processor on each individual piece of
       code (i.e. on the bits between quotelikes), the 'code...' partial
       filters operate on the entire source code, but with the quotelike bits
       (and, in the case of 'code_no_comments', the comments) "blanked out".

       That is, a 'code...' filter replaces each quoted string, quotelike,
       regex, POD, and __DATA__ section with a placeholder. The delimiters of
       this placeholder are the contents of the $; variable at the time the
       filter is applied (normally "\034"). The remaining four bytes are a
       unique identifier for the component being replaced.

       This approach makes it comparatively easy to write code preprocessors
       without worrying about the form or contents of strings, regexes, etc.

       For convenience, during a 'code...' filtering operation, Filter::Simple
       provides a package variable ($Filter::Simple::placeholder) that
       contains a pre-compiled regex that matches any placeholder and captures
       the identifier within the placeholder. Placeholders can be moved and
       re-ordered within the source code as needed.

       In addition, a second package variable (@Filter::Simple::components)
       contains a list of the various pieces of $_, as they were originally
       split up to allow placeholders to be inserted.

       Once the filtering has been applied, the original strings, regexes,
       POD, etc. are re-inserted into the code, by replacing each placeholder
       with the corresponding original component (from @components). Note that
       this means that the @components variable must be treated with extreme
       care within the filter. The @components array stores the
       "back-translations" of each placeholder inserted into $_, as well as
       the interstitial source code between placeholders. If the placeholder
       back-translations are altered in @components, they will be similarly
       changed when the placeholders are removed from $_ after the filter is
       complete.

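       The blank-out, filter, and re-insert cycle can be sketched in a few
       lines of plain Perl. This is only an approximation under stated
       assumptions: it hides double-quoted strings alone, and its placeholder
       is the separator around a numeric index rather than the four-byte
       identifier described above. The name "filter_code_only" is invented
       for the example:

```perl
use strict;
use warnings;

# Simplified stand-in for a 'code' filter. Only double-quoted strings
# are hidden, and the placeholder is <sep><index><sep> rather than the
# module's own format.
sub filter_code_only {
    my ($source, $filter) = @_;
    my @components;
    my $sep = "\034";    # the usual value of $;

    # Swap each string literal for a placeholder, remembering the original.
    $source =~ s{("(?:[^"\\]|\\.)*")}{
        push @components, $1;
        $sep . (@components - 1) . $sep;
    }ge;

    # Run the caller's filter with the blanked-out source aliased to $_.
    $filter->() for $source;

    # Re-insert the original strings.
    $source =~ s{$sep(\d+)$sep}{$components[$1]}g;
    return $source;
}

print filter_code_only(
    qq{warn "BANG BANG"; BANG BANG;\n},
    sub { s/BANG\s+BANG/die 'BANG' if \$BANG/g },
);
```

       Only the bare "BANG BANG" is rewritten here; the one inside the string
       is hidden behind its placeholder while the filter runs.
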
       For example, the following filter detects concatenated pairs of
       strings/quotelikes and reverses the order in which they are
       concatenated. Because the placeholder regex itself captures the
       identifier, the second placeholder is $3 rather than $2:

           package DemoRevCat;
           use Filter::Simple;

           FILTER_ONLY code => sub {
               my $ph = $Filter::Simple::placeholder;
               s{ ($ph) \s* [.] \s* ($ph) }{ $3.$1 }gx
           };

       Thus, the following code:

           use DemoRevCat;

           my $str = "abc" . q(def);

           print "$str\n";

       would become:

           my $str = q(def)."abc";

           print "$str\n";

       and hence print:

           defabc

       Using Filter::Simple with an explicit "import" subroutine

       Filter::Simple generates a special "import" subroutine for your module
       (see "How it works") which would normally replace any "import"
       subroutine you might have explicitly declared.

       However, Filter::Simple is smart enough to notice your existing
       "import" and Do The Right Thing with it.  That is, if you explicitly
       define an "import" subroutine in a package that's using Filter::Simple,
       that "import" subroutine will still be invoked immediately after any
       filter you install.

       The only thing you have to remember is that the "import" subroutine
       must be declared before the filter is installed. If you use "FILTER" to
       install the filter:

           package Filter::TurnItUpTo11;

           use Filter::Simple;

           FILTER { s/(\w+)/\U$1/ };

       that will almost never be a problem, but if you install a filtering
       subroutine by passing it directly to the "use Filter::Simple"
       statement:

           package Filter::TurnItUpTo11;

           use Filter::Simple sub{ s/(\w+)/\U$1/ };

       then you must make sure that your "import" subroutine appears before
       that "use" statement.

       Using Filter::Simple and Exporter together

       Likewise, Filter::Simple is also smart enough to Do The Right Thing if
       you use Exporter:

           package Switch;
           use base Exporter;
           use Filter::Simple;

           @EXPORT    = qw(switch case);
           @EXPORT_OK = qw(given  when);

           FILTER { $_ = magic_Perl_filter($_) };

       Immediately after the filter has been applied to the source,
       Filter::Simple will pass control to Exporter, so it can do its magic
       too.

       Of course, here too, Filter::Simple has to know you're using Exporter
       before it applies the filter. That's almost never a problem, but if
       you're nervous about it, you can guarantee that things will work
       correctly by ensuring that your "use base Exporter" always precedes
       your "use Filter::Simple".

       How it works

       The Filter::Simple module exports into the package that calls "FILTER"
       (or "use"s it directly) -- such as package "BANG" in the above example
       -- two automagically constructed subroutines -- "import" and "unimport"
       -- which take care of all the nasty details.

       In addition, the generated "import" subroutine passes its own argument
       list to the filtering subroutine, so the BANG.pm filter could easily be
       made parametric:

           package BANG;

           use Filter::Simple;

           FILTER {
               my ($die_msg, $var_name) = @_;
               s/BANG\s+BANG/die '$die_msg' if \${$var_name}/g;
           };

           # and in some user code:

           use BANG "BOOM", "BAM";  # "BANG BANG" becomes: die 'BOOM' if $BAM

       The specified filtering subroutine is called every time a "use BANG" is
       encountered, and passed all the source code following that call, up to
       either the next "no BANG;" (or whatever terminator you've set) or the
       end of the source file, whichever occurs first. By default, any "no
       BANG;" call must appear by itself on a separate line, or it is ignored.

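       The parametric filter body can be tried on a plain string. The sketch
       below assumes, as the example above does, that the filter sees the
       arguments from the "use" line in @_. The driver "apply_filter" is
       hypothetical and only imitates what the generated "import" arranges:

```perl
use strict;
use warnings;

# The parametric filter body from above, kept in a variable so it can be
# exercised directly.
my $bang_filter = sub {
    my ($die_msg, $var_name) = @_;
    s/BANG\s+BANG/die '$die_msg' if \${$var_name}/g;
};

# Hypothetical driver: source text aliased to $_, "use" arguments in @_.
sub apply_filter {
    my ($filter, $source, @args) = @_;
    $filter->(@args) for $source;
    return $source;
}

print apply_filter($bang_filter, "BANG BANG;\n", "BOOM", "BAM");
```

       The substitution writes the variable as "${BAM}", which Perl treats
       the same as "$BAM".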

AUTHOR

       Damian Conway (damian@conway.org)

COPYRIGHT

           Copyright (c) 2000-2001, Damian Conway. All Rights Reserved.
           This module is free software. It may be used, redistributed
           and/or modified under the same terms as Perl itself.



perl v5.8.8                       2001-09-21               Filter::Simple(3pm)