Filter::Simple(3pm)    Perl Programmers Reference Guide    Filter::Simple(3pm)

NAME
       Filter::Simple - Simplified source filtering

SYNOPSIS
        # in MyFilter.pm:

            package MyFilter;

            use Filter::Simple;

            FILTER { ... };

            # or just:
            #
            # use Filter::Simple sub { ... };

        # in user's code:

            use MyFilter;

            # this code is filtered

            no MyFilter;

            # this code is not

DESCRIPTION
   The Problem
       Source filtering is an immensely powerful feature of recent versions of
       Perl. It allows one to extend the language itself (e.g. the Switch
       module), to simplify the language (e.g. Language::Pythonesque), or to
       completely recast the language (e.g. Lingua::Romana::Perligata).
       Effectively, it allows one to use the full power of Perl as its own,
       recursively applied, macro language.

       The excellent Filter::Util::Call module (by Paul Marquess) provides a
       usable Perl interface to source filtering, but it is often too powerful
       and not nearly as simple as it could be.

       To use the module it is necessary to do the following:

       1.  Download, build, and install the Filter::Util::Call module. (If
           you have Perl 5.7.1 or later, this is already done for you.)

       2.  Set up a module that does a "use Filter::Util::Call".

       3.  Within that module, create an "import" subroutine.

       4.  Within the "import" subroutine do a call to "filter_add", passing
           it a subroutine reference.

       5.  Within the subroutine reference, call "filter_read" or
           "filter_read_exact" to "prime" $_ with source code data from the
           source file that will "use" your module. Check the status value
           returned to see if any source code was actually read in.

       6.  Process the contents of $_ to change the source code in the desired
           manner.

       7.  Return the status value.

       8.  If the act of unimporting your module (via a "no") should cause
           source code filtering to cease, create an "unimport" subroutine,
           and have it call "filter_del". Make sure that the call to
           "filter_read" or "filter_read_exact" in step 5 will not
           accidentally read past the "no". Effectively this limits source
           code filters to line-by-line operation, unless the "import"
           subroutine does some fancy pre-pre-parsing of the source code it's
           filtering.

       For example, here is a minimal source code filter in a module named
       BANG.pm. It simply converts every occurrence of the sequence
       "BANG\s+BANG" to the sequence "die 'BANG' if $BANG" in any piece of
       code following a "use BANG;" statement (until the next "no BANG;"
       statement, if any):

           package BANG;

           use Filter::Util::Call;

           sub import {
               my $class = shift;    # the package being imported ("BANG")
               filter_add( sub {
                   my ($status, $no_seen);
                   my $data = "";
                   # accumulate lines until EOF, an error, or "no BANG;"
                   while (($status = filter_read()) > 0) {
                       if (/^\s*no\s+$class\s*;\s*?$/) {
                           $no_seen = 1;
                           last;
                       }
                       $data .= $_;
                       $_ = "";
                   }
                   return $status if $status < 0;    # propagate read errors
                   return 0 unless length $data or $no_seen;    # EOF
                   $_ = $data;
                   s/BANG\s+BANG/die 'BANG' if \$BANG/g;
                   $_ .= "no $class;\n" if $no_seen;
                   return 1;
               })
           }

           sub unimport {
               filter_del();
           }

           1;

       This level of sophistication puts filtering out of the reach of many
       programmers.

   A Solution
       The Filter::Simple module provides a simplified interface to
       Filter::Util::Call; one that is sufficient for most common cases.

       Instead of the above process, with Filter::Simple the task of setting
       up a source code filter is reduced to:

       1.  Download and install the Filter::Simple module. (If you have Perl
           5.7.1 or later, this is already done for you.)

       2.  Set up a module that does a "use Filter::Simple" and then calls
           "FILTER { ... }".

       3.  Within the anonymous subroutine or block that is passed to
           "FILTER", process the contents of $_ to change the source code in
           the desired manner.

       In other words, the previous example would become:

           package BANG;
           use Filter::Simple;

           FILTER {
               s/BANG\s+BANG/die 'BANG' if \$BANG/g;
           };

           1;

       Note that the source code is passed as a single string, so any regex
       that uses "^" or "$" to detect line boundaries will need the "/m" flag.

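       The effect of the "/m" flag can be seen without installing a filter at
       all. The following is only a sketch: the string $source is a
       hypothetical stand-in for the $_ that a filter block receives, and the
       variable names are illustrative assumptions.

```perl
use strict;
use warnings;

# Stand-in for the $_ that a FILTER block receives: the remaining
# source as one multi-line string, not one line at a time.
my $source = "first line\nBANG BANG;\nlast line\n";

# Without /m, "^" anchors only at the very start of the string.
my @without_m = $source =~ /^(\w+)/g;

# With /m, "^" also matches immediately after each embedded newline.
my @with_m = $source =~ /^(\w+)/mg;

print "without /m: @without_m\n";    # without /m: first
print "with /m:    @with_m\n";       # with /m:    first BANG last
```

       A filter that needs to recognise something at the start of every line
       would therefore use the second form.
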
   Disabling or changing "no" behaviour
       By default, the installed filter only filters up to a line consisting
       of one of the three standard source "terminators":

           no ModuleName;  # optional comment

       or:

           __END__

       or:

           __DATA__

       but this can be altered by passing a second argument to "use
       Filter::Simple" or "FILTER" (just remember: there's no comma after the
       initial block when you use "FILTER").

       That second argument may be either a "qr"'d regular expression (which
       is then used to match the terminator line), or a defined false value
       (which indicates that no terminator line should be looked for), or a
       reference to a hash (in which case the terminator is the value
       associated with the key 'terminator').

       For example, to cause the previous filter to filter only up to a line
       of the form:

           GNAB esu;

       you would write:

           package BANG;
           use Filter::Simple;

           FILTER {
               s/BANG\s+BANG/die 'BANG' if \$BANG/g;
           }
           qr/^\s*GNAB\s+esu\s*;\s*?$/;

       or:

           FILTER {
               s/BANG\s+BANG/die 'BANG' if \$BANG/g;
           }
           { terminator => qr/^\s*GNAB\s+esu\s*;\s*?$/ };

       and to prevent the filter's being turned off in any way:

           package BANG;
           use Filter::Simple;

           FILTER {
               s/BANG\s+BANG/die 'BANG' if \$BANG/g;
           }
           "";    # or: 0

       or:

           FILTER {
               s/BANG\s+BANG/die 'BANG' if \$BANG/g;
           }
           { terminator => "" };

       Note that, no matter what you set the terminator pattern to, the actual
       terminator itself must be contained on a single source line.

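       To get a feel for which lines a terminator pattern accepts, the
       pattern can be tried against sample lines directly. This sketch
       assumes that each source line is tested on its own, as the
       single-line restriction above suggests; the sample lines in @lines
       are hypothetical.

```perl
use strict;
use warnings;

# The terminator pattern from the "GNAB esu;" example.
my $terminator = qr/^\s*GNAB\s+esu\s*;\s*?$/;

# Hypothetical source lines, one per element, newline included.
my @lines = (
    "GNAB esu;\n",                      # plain terminator
    "   GNAB   esu ;\n",                # extra whitespace is allowed
    "GNAB esu; # trailing comment\n",   # pattern allows no comment
    "print 'GNAB esu;';\n",             # terminator text inside a statement
);

for my $line (@lines) {
    my $verdict = $line =~ $terminator ? "terminates" : "is filtered";
    chomp(my $shown = $line);
    printf "%-32s %s\n", "[$shown]", $verdict;
}
```

       Only the first two lines match. Unlike the default "no ModuleName;"
       terminator, this particular pattern makes no allowance for a trailing
       comment, so the third line would be treated as ordinary source.
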
   All-in-one interface
       Separating the loading of Filter::Simple:

           use Filter::Simple;

       from the setting up of the filtering:

           FILTER { ... };

       is useful because it allows other code (typically parser support code
       or caching variables) to be defined before the filter is invoked.
       However, there is often no need for such a separation.

       In those cases, it is easier to just append the filtering subroutine
       and any terminator specification directly to the "use" statement that
       loads Filter::Simple, like so:

           use Filter::Simple sub {
               s/BANG\s+BANG/die 'BANG' if \$BANG/g;
           };

       This is exactly the same as:

           use Filter::Simple;
           BEGIN {
               Filter::Simple::FILTER {
                   s/BANG\s+BANG/die 'BANG' if \$BANG/g;
               };
           }

       except that the "FILTER" subroutine is not exported by Filter::Simple.

   Filtering only specific components of source code
       One of the problems with a filter like:

           use Filter::Simple;

           FILTER { s/BANG\s+BANG/die 'BANG' if \$BANG/g };

       is that it indiscriminately applies the specified transformation to the
       entire text of your source program. So something like:

           warn "BANG BANG, YOU'RE DEAD";
           BANG BANG;

       will become:

           warn "die 'BANG' if $BANG, YOU'RE DEAD";
           die 'BANG' if $BANG;

       It is very common when filtering source to only want to apply the
       filter to the non-character-string parts of the code, or alternatively
       to only the character strings.

       Filter::Simple supports this type of filtering by automatically
       exporting the "FILTER_ONLY" subroutine.

       "FILTER_ONLY" takes a sequence of specifiers that install separate (and
       possibly multiple) filters that act on only parts of the source code.
       For example:

           use Filter::Simple;

           FILTER_ONLY
               code      => sub { s/BANG\s+BANG/die 'BANG' if \$BANG/g },
               quotelike => sub { s/BANG\s+BANG/CHITTY CHITTY/g };

       The "code" subroutine will only be used to filter parts of the source
       code that are not quotelikes, POD, or "__DATA__". The "quotelike"
       subroutine only filters Perl quotelikes (including here documents).

       The full list of alternatives is:

       "code"
           Filters only those sections of the source code that are not
           quotelikes, POD, or "__DATA__".

       "code_no_comments"
           Filters only those sections of the source code that are not
           quotelikes, POD, comments, or "__DATA__".

       "executable"
           Filters only those sections of the source code that are not POD or
           "__DATA__".

       "executable_no_comments"
           Filters only those sections of the source code that are not POD,
           comments, or "__DATA__".

       "quotelike"
           Filters only Perl quotelikes (as interpreted by
           &Text::Balanced::extract_quotelike).

       "string"
           Filters only the string literal parts of a Perl quotelike (i.e. the
           contents of a string literal, either half of a "tr///", the second
           half of an "s///").

       "regex"
           Filters only the pattern literal parts of a Perl quotelike (i.e.
           the contents of a "qr//" or an "m//", the first half of an "s///").

       "all"
           Filters everything. Identical in effect to "FILTER".

       Except for "FILTER_ONLY code => sub {...}", each of the component
       filters is called repeatedly, once for each component found in the
       source code.

       Note that you can also apply two or more of the same type of filter in
       a single "FILTER_ONLY". For example, here's a simple macro-preprocessor
       that is only applied within regexes, with a final debugging pass that
       prints the resulting source code:

           use Regexp::Common;
           FILTER_ONLY
               regex => sub { s/!\[/[^/g },
               regex => sub { s/%d/$RE{num}{int}/g },
               regex => sub { s/%f/$RE{num}{real}/g },
               all   => sub { print if $::DEBUG };

   Filtering only the code parts of source code
       Most source code ceases to be grammatically correct when it is broken
       up into the pieces between string literals and regexes. So the 'code'
       and 'code_no_comments' component filters behave slightly differently
       from the other partial filters described in the previous section.

       Rather than calling the specified processor on each individual piece of
       code (i.e. on the bits between quotelikes), the 'code...' partial
       filters operate on the entire source code, but with the quotelike bits
       (and, in the case of 'code_no_comments', the comments) "blanked out".

       That is, a 'code...' filter replaces each quoted string, quotelike,
       regex, POD, and __DATA__ section with a placeholder. The delimiters of
       this placeholder are the contents of the $; variable at the time the
       filter is applied (normally "\034"). The remaining four bytes are a
       unique identifier for the component being replaced.

       This approach makes it comparatively easy to write code preprocessors
       without worrying about the form or contents of strings, regexes, etc.

       For convenience, during a 'code...' filtering operation, Filter::Simple
       provides a package variable ($Filter::Simple::placeholder) that
       contains a pre-compiled regex that matches any placeholder...and
       captures the identifier within the placeholder. Placeholders can be
       moved and re-ordered within the source code as needed.

       In addition, a second package variable (@Filter::Simple::components)
       contains a list of the various pieces of $_, as they were originally
       split up to allow placeholders to be inserted.

       Once the filtering has been applied, the original strings, regexes,
       POD, etc. are re-inserted into the code, by replacing each placeholder
       with the corresponding original component (from @components). Note that
       this means that the @components variable must be treated with extreme
       care within the filter. The @components array stores the
       "back-translations" of each placeholder inserted into $_, as well as
       the interstitial source code between placeholders. If the placeholder
       back-translations are altered in @components, they will be similarly
       changed when the placeholders are removed from $_ after the filter is
       complete.

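       The blank-out and restore cycle can be imitated by hand. The sketch
       below is not Filter::Simple's implementation: it recognises only
       simple double-quoted strings, keeps the originals in a hypothetical
       @originals array rather than @components, and builds each four-byte
       identifier with pack("N", ...), which is an assumption made purely
       for illustration.

```perl
use strict;
use warnings;

my $source = qq{my \$str = "abc" . "def";\n};

# Step 1: replace each double-quoted string with a placeholder of the
# form <$;><four-byte id><$;>, remembering the original text by index.
my $delim = $;;    # normally "\034"
my @originals;
(my $blanked = $source) =~ s{("(?:[^"\\]|\\.)*")}{
    push @originals, $1;
    $delim . pack("N", $#originals) . $delim;
}gex;

# A regex in the spirit of $Filter::Simple::placeholder: it matches one
# placeholder and captures the identifier inside it.
my $ph = qr/\Q$delim\E(.{4})\Q$delim\E/s;

# Step 2: the "filter" itself. It sees only code and placeholders, so
# swapping two concatenated placeholders needs no knowledge of strings.
# $ph has its own capture group, hence $3 for the second placeholder.
$blanked =~ s{($ph)\s*[.]\s*($ph)}{$3.$1}g;

# Step 3: put the original strings back in place of the placeholders.
(my $restored = $blanked) =~ s{$ph}{ $originals[ unpack "N", $1 ] }ge;

print $restored;    # my $str = "def"."abc";
```

       Because the substitution in step 2 never sees the contents of the
       strings, it cannot be confused by quotes, dots, or escapes inside
       them.
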
       For example, the following filter detects concatenated pairs of
       strings/quotelikes and reverses the order in which they are
       concatenated:

           package DemoRevCat;
           use Filter::Simple;

           FILTER_ONLY code => sub {
               my $ph = $Filter::Simple::placeholder;
               # $ph contains its own capture group (the identifier), so
               # the second placeholder as a whole is $3, not $2
               s{ ($ph) \s* [.] \s* ($ph) }{$3.$1}gx
           };

       Thus, the following code:

           use DemoRevCat;

           my $str = "abc" . q(def);

           print "$str\n";

       would become:

           my $str = q(def)."abc";

           print "$str\n";

       and hence print:

           defabc

   Using Filter::Simple with an explicit "import" subroutine
       Filter::Simple generates a special "import" subroutine for your module
       (see "How it works") which would normally replace any "import"
       subroutine you might have explicitly declared.

       However, Filter::Simple is smart enough to notice your existing
       "import" and Do The Right Thing with it. That is, if you explicitly
       define an "import" subroutine in a package that's using Filter::Simple,
       that "import" subroutine will still be invoked immediately after any
       filter you install.

       The only thing you have to remember is that the "import" subroutine
       must be declared before the filter is installed. If you use "FILTER" to
       install the filter:

           package Filter::TurnItUpTo11;

           use Filter::Simple;

           FILTER { s/(\w+)/\U$1/ };

       that will almost never be a problem, but if you install a filtering
       subroutine by passing it directly to the "use Filter::Simple"
       statement:

           package Filter::TurnItUpTo11;

           use Filter::Simple sub{ s/(\w+)/\U$1/ };

       then you must make sure that your "import" subroutine appears before
       that "use" statement.

   Using Filter::Simple and Exporter together
       Likewise, Filter::Simple is also smart enough to Do The Right Thing if
       you use Exporter:

           package Switch;
           use base 'Exporter';
           use Filter::Simple;

           @EXPORT    = qw(switch case);
           @EXPORT_OK = qw(given when);

           FILTER { $_ = magic_Perl_filter($_) };

       Immediately after the filter has been applied to the source,
       Filter::Simple will pass control to Exporter, so it can do its magic
       too.

       Of course, here too, Filter::Simple has to know you're using Exporter
       before it applies the filter. That's almost never a problem, but if
       you're nervous about it, you can guarantee that things will work
       correctly by ensuring that your "use base 'Exporter'" always precedes
       your "use Filter::Simple".

   How it works
       The Filter::Simple module exports into the package that calls "FILTER"
       (or "use"s it directly) -- such as package "BANG" in the above example
       -- two automagically constructed subroutines -- "import" and "unimport"
       -- which take care of all the nasty details.

       In addition, the generated "import" subroutine passes its own argument
       list to the filtering subroutine, so the BANG.pm filter could easily be
       made parametric:

           package BANG;

           use Filter::Simple;

           FILTER {
               my ($die_msg, $var_name) = @_;
               s/BANG\s+BANG/die '$die_msg' if \${$var_name}/g;
           };

           # and in some user code:

           use BANG "BOOM", "BAM";  # "BANG BANG" becomes: die 'BOOM' if $BAM

       The specified filtering subroutine is called every time a "use BANG" is
       encountered, and passed all the source code following that call, up to
       either the next "no BANG;" (or whatever terminator you've set) or the
       end of the source file, whichever occurs first. By default, any "no
       BANG;" call must appear by itself on a separate line, or it is ignored.

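       The way the arguments reach the substitution can be checked in
       isolation. In this sketch the anonymous subroutine $filter is a
       hypothetical stand-in for the block passed to "FILTER"; it is called
       by hand with the arguments that "use BANG "BOOM", "BAM"" would
       supply, and no filter is actually installed.

```perl
use strict;
use warnings;

# Stand-in for the FILTER block: it edits $_ in place and receives the
# arguments of the "use BANG ..." statement in @_.
my $filter = sub {
    my ($die_msg, $var_name) = @_;
    s/BANG\s+BANG/die '$die_msg' if \${$var_name}/g;
};

# Simulate:  use BANG "BOOM", "BAM";
local $_ = "BANG BANG;\n";
$filter->("BOOM", "BAM");

print;    # die 'BOOM' if ${BAM};
```

       The generated text uses the "${BAM}" form, which Perl treats the same
       as "$BAM".
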
AUTHOR
       Damian Conway

CONTACT
       Filter::Simple is now maintained by the Perl5-Porters. Please submit
       bugs via the "perlbug" tool that comes with your perl. For usage
       instructions, read "perldoc perlbug" or possibly "man perlbug". For
       almost anything else, please contact <perl5-porters@perl.org>.

       Maintainer of the CPAN release is Steffen Mueller <smueller@cpan.org>.
       Contact him with technical difficulties with respect to the packaging
       of the CPAN module.

       Praise of the module, flowers, and presents still go to the author,
       Damian Conway <damian@conway.org>.

COPYRIGHT AND LICENSE
           Copyright (c) 2000-2008, Damian Conway. All Rights Reserved.
           This module is free software. It may be used, redistributed
           and/or modified under the same terms as Perl itself.

perl v5.10.1                      2009-04-11               Filter::Simple(3pm)