Filter::Simple(3pm)    Perl Programmers Reference Guide    Filter::Simple(3pm)

NAME
Filter::Simple - Simplified source filtering

SYNOPSIS
    # in MyFilter.pm:

    package MyFilter;

    use Filter::Simple;

    FILTER { ... };

    # or just:
    #
    # use Filter::Simple sub { ... };

    # in user's code:

    use MyFilter;

    # this code is filtered

    no MyFilter;

    # this code is not

DESCRIPTION
The Problem

Source filtering is an immensely powerful feature of recent versions of
Perl. It allows one to extend the language itself (e.g. the Switch
module), to simplify the language (e.g. Language::Pythonesque), or to
completely recast the language (e.g. Lingua::Romana::Perligata).
Effectively, it allows one to use the full power of Perl as its own,
recursively applied, macro language.

The excellent Filter::Util::Call module (by Paul Marquess) provides a
usable Perl interface to source filtering, but it is often too powerful
and not nearly as simple as it could be.

To use that module it is necessary to do the following:

1.  Download, build, and install the Filter::Util::Call module. (If
    you have Perl 5.7.1 or later, this is already done for you.)

2.  Set up a module that does a "use Filter::Util::Call".

3.  Within that module, create an "import" subroutine.

4.  Within the "import" subroutine do a call to "filter_add", passing
    it a subroutine reference.

5.  Within that subroutine, call "filter_read" or "filter_read_exact"
    to "prime" $_ with source code data from the source file that will
    "use" your module. Check the status value returned to see if any
    source code was actually read in.

6.  Process the contents of $_ to change the source code in the desired
    manner.

7.  Return the status value.

8.  If the act of unimporting your module (via a "no") should cause
    source code filtering to cease, create an "unimport" subroutine,
    and have it call "filter_del". Make sure that the call to
    "filter_read" or "filter_read_exact" in step 5 will not
    accidentally read past the "no". Effectively this limits source
    code filters to line-by-line operation, unless the "import"
    subroutine does some fancy pre-pre-parsing of the source code it's
    filtering.

For example, here is a minimal source code filter in a module named
BANG.pm. It simply converts every occurrence of the sequence
"BANG\s+BANG" to the sequence "die 'BANG' if $BANG" in any piece of
code following a "use BANG;" statement (until the next "no BANG;"
statement, if any):

    package BANG;

    use Filter::Util::Call;

    sub import {
        my ($class) = @_;             # the filtering package: "BANG"
        filter_add( sub {
            my ($status, $no_seen);
            my $data  = "";
            my $count = 0;
            # Gather lines until end of file or a "no BANG;" line
            while (($status = filter_read()) > 0) {
                if (/^\s*no\s+$class\s*;\s*?$/) {
                    $no_seen = 1;
                    last;
                }
                $data .= $_;
                $count++;
                $_ = "";
            }
            # Pass on read errors, and end of file once nothing is left
            return $status unless $count || $no_seen;
            $_ = $data;
            s/BANG\s+BANG/die 'BANG' if \$BANG/g;
            $_ .= "no $class;\n" if $no_seen;
            return 1;
        })
    }

    sub unimport {
        filter_del();
    }

    1;

This level of sophistication puts filtering out of the reach of many
programmers.

A Solution

The Filter::Simple module provides a simplified interface to
Filter::Util::Call; one that is sufficient for most common cases.

Instead of the above process, with Filter::Simple the task of setting
up a source code filter is reduced to:

1.  Download and install the Filter::Simple module. (If you have Perl
    5.7.1 or later, this is already done for you.)

2.  Set up a module that does a "use Filter::Simple" and then calls
    "FILTER { ... }".

3.  Within the anonymous subroutine or block that is passed to
    "FILTER", process the contents of $_ to change the source code in
    the desired manner.

In other words, the previous example would become:

    package BANG;
    use Filter::Simple;

    FILTER {
        s/BANG\s+BANG/die 'BANG' if \$BANG/g;
    };

    1;

Note that the source code is passed as a single string, so any regex
that uses "^" or "$" to detect line boundaries will need the "/m" flag.

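A minimal sketch of why the "/m" flag matters. It applies a substitution to a multi-line string, which is how a filter body sees the source in $_. The "DEBUG" marker and the comment_out_debug helper are hypothetical, chosen only for illustration:

```perl
use strict;
use warnings;

# Hypothetical filter body: comment out every line that starts with DEBUG.
# Returns the result with and without the /m flag for comparison.
sub comment_out_debug {
    my ($source) = @_;
    (my $with_m    = $source) =~ s/^DEBUG\b/# DEBUG/gm;  # ^ matches at every line start
    (my $without_m = $source) =~ s/^DEBUG\b/# DEBUG/g;   # ^ matches only at the string start
    return ($with_m, $without_m);
}

my $source = "print 1;\nDEBUG print 2;\nDEBUG print 3;\n";
my ($with_m, $without_m) = comment_out_debug($source);

print $with_m;      # both DEBUG lines are commented out
print $without_m;   # unchanged: the string does not begin with DEBUG
```

Without "/m" the pattern can only match at the very start of the whole source string, so a filter written with line-oriented regexes would silently do nothing.
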
Disabling or changing "no" behaviour

By default, the installed filter only filters up to a line consisting
of one of the three standard source "terminators":

    no ModuleName;  # optional comment

or:

    __END__

or:

    __DATA__

but this can be altered by passing a second argument to "use
Filter::Simple" or "FILTER" (just remember: there's no comma after the
initial block when you use "FILTER").

That second argument may be either a "qr"'d regular expression (which
is then used to match the terminator line), or a defined false value
(which indicates that no terminator line should be looked for), or a
reference to a hash (in which case the terminator is the value
associated with the key 'terminator').

For example, to cause the previous filter to filter only up to a line
of the form:

    GNAB esu;

you would write:

    package BANG;
    use Filter::Simple;

    FILTER {
        s/BANG\s+BANG/die 'BANG' if \$BANG/g;
    }
    qr/^\s*GNAB\s+esu\s*;\s*?$/;

or:

    FILTER {
        s/BANG\s+BANG/die 'BANG' if \$BANG/g;
    }
    { terminator => qr/^\s*GNAB\s+esu\s*;\s*?$/ };

and to prevent the filter's being turned off in any way:

    package BANG;
    use Filter::Simple;

    FILTER {
        s/BANG\s+BANG/die 'BANG' if \$BANG/g;
    }
    "";    # or: 0

or:

    FILTER {
        s/BANG\s+BANG/die 'BANG' if \$BANG/g;
    }
    { terminator => "" };

Note that, no matter what you set the terminator pattern to, the actual
terminator itself must be contained on a single source line.

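The effect of a terminator pattern can be sketched on a plain string. This is only an illustration of the line-by-line matching described above, not the module's own code; split_at_terminator is a hypothetical helper:

```perl
use strict;
use warnings;

# Split $source at the first line matching $terminator. Returns the part
# a filter would see and the untouched remainder (terminator line included).
sub split_at_terminator {
    my ($source, $terminator) = @_;
    my @lines    = split /^/, $source;    # split into lines, keeping newlines
    my $filtered = "";
    while (@lines) {
        last if $lines[0] =~ $terminator;
        $filtered .= shift @lines;
    }
    return ($filtered, join("", @lines));
}

my $terminator = qr/^\s*GNAB\s+esu\s*;\s*?$/;
my $source     = "BANG BANG;\nGNAB esu;\nBANG BANG;\n";

my ($filtered, $rest) = split_at_terminator($source, $terminator);
$filtered =~ s/BANG\s+BANG/die 'BANG' if \$BANG/g;   # the filter body
print $filtered, $rest;
```

Only the first "BANG BANG;" is rewritten; everything from the "GNAB esu;" line onwards is passed through unchanged. Because the pattern is tested against one line at a time, a terminator spread over several lines would never match.
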
All-in-one interface

Separating the loading of Filter::Simple:

    use Filter::Simple;

from the setting up of the filtering:

    FILTER { ... };

is useful because it allows other code (typically parser support code
or caching variables) to be defined before the filter is invoked.
However, there is often no need for such a separation.

In those cases, it is easier to just append the filtering subroutine
and any terminator specification directly to the "use" statement that
loads Filter::Simple, like so:

    use Filter::Simple sub {
        s/BANG\s+BANG/die 'BANG' if \$BANG/g;
    };

This is exactly the same as:

    use Filter::Simple;
    BEGIN {
        Filter::Simple::FILTER {
            s/BANG\s+BANG/die 'BANG' if \$BANG/g;
        };
    }

except that the "FILTER" subroutine is not exported by Filter::Simple.

Filtering only specific components of source code

One of the problems with a filter like:

    use Filter::Simple;

    FILTER { s/BANG\s+BANG/die 'BANG' if \$BANG/g };

is that it indiscriminately applies the specified transformation to the
entire text of your source program. So something like:

    warn "BANG BANG, YOU'RE DEAD";
    BANG BANG;

will become:

    warn "die 'BANG' if $BANG, YOU'RE DEAD";
    die 'BANG' if $BANG;

It is very common when filtering source to only want to apply the
filter to the non-character-string parts of the code, or alternatively
to only the character strings.

Filter::Simple supports this type of filtering by automatically
exporting the "FILTER_ONLY" subroutine.

"FILTER_ONLY" takes a sequence of specifiers that install separate (and
possibly multiple) filters that act on only parts of the source code.
For example:

    use Filter::Simple;

    FILTER_ONLY
        code      => sub { s/BANG\s+BANG/die 'BANG' if \$BANG/g },
        quotelike => sub { s/BANG\s+BANG/CHITTY CHITTY/g };

The "code" subroutine will only be used to filter parts of the source
code that are not quotelikes, POD, or "__DATA__". The "quotelike"
subroutine only filters Perl quotelikes (including here documents).

The full list of alternatives is:

"code"
    Filters only those sections of the source code that are not
    quotelikes, POD, or "__DATA__".

"code_no_comments"
    Filters only those sections of the source code that are not
    quotelikes, POD, comments, or "__DATA__".

"executable"
    Filters only those sections of the source code that are not POD or
    "__DATA__".

"executable_no_comments"
    Filters only those sections of the source code that are not POD,
    comments, or "__DATA__".

"quotelike"
    Filters only Perl quotelikes (as interpreted by
    &Text::Balanced::extract_quotelike).

"string"
    Filters only the string literal parts of a Perl quotelike (i.e. the
    contents of a string literal, either half of a "tr///", or the
    second half of an "s///").

"regex"
    Filters only the pattern literal parts of a Perl quotelike (i.e.
    the contents of a "qr//" or an "m//", or the first half of an
    "s///").

"all"
    Filters everything. Identical in effect to "FILTER".

Except for "FILTER_ONLY code => sub {...}", each of the component
filters is called repeatedly, once for each component found in the
source code.

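The "once for each component" behaviour can be sketched without the module. This is not how Filter::Simple locates quotelikes (it relies on Text::Balanced); the sketch assumes only simple '...' and "..." literals with no escapes, and both helper names are hypothetical:

```perl
use strict;
use warnings;

# Run $filter with $_ set to $text and return the modified text.
sub apply_filter {
    my ($filter, $text) = @_;
    local $_ = $text;
    $filter->();
    return $_;
}

# Call $filter once per string literal, leaving all other text untouched.
# Naive assumption: literals are plain '...' or "..." without escapes.
sub filter_strings_only {
    my ($source, $filter) = @_;
    my $calls = 0;
    $source =~ s{("[^"]*"|'[^']*')}{ $calls++; apply_filter($filter, $1) }ge;
    return ($source, $calls);
}

my $source = qq{warn "BANG BANG, YOU'RE DEAD";\nprint 'BANG  BANG';\nBANG BANG;\n};
my ($result, $calls) = filter_strings_only(
    $source, sub { s/BANG\s+BANG/CHITTY CHITTY/g });

print $result;
print "filter called $calls times\n";
```

The filter runs twice, once per literal, and each call sees only that literal in $_. The bare "BANG BANG;" statement is never passed to it.
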
Note that you can also apply two or more of the same type of filter in
a single "FILTER_ONLY". For example, here's a simple macro-preprocessor
that is only applied within regexes, with a final debugging pass that
prints the resulting source code:

    use Filter::Simple;
    use Regexp::Common;

    FILTER_ONLY
        regex => sub { s/!\[/[^/g },
        regex => sub { s/%d/$RE{num}{int}/g },
        regex => sub { s/%f/$RE{num}{real}/g },
        all   => sub { print if $::DEBUG };

Filtering only the code parts of source code

Most source code ceases to be grammatically correct when it is broken
up into the pieces between string literals and regexes. So the 'code'
and 'code_no_comments' component filters behave slightly differently
from the other partial filters described in the previous section.

Rather than calling the specified processor on each individual piece of
code (i.e. on the bits between quotelikes), the 'code...' partial
filters operate on the entire source code, but with the quotelike bits
(and, in the case of 'code_no_comments', the comments) "blanked out".

That is, a 'code...' filter replaces each quoted string, quotelike,
regex, POD, and __DATA__ section with a placeholder. The delimiters of
this placeholder are the contents of the $; variable at the time the
filter is applied (normally "\034"). The remaining four bytes are a
unique identifier for the component being replaced.

This approach makes it comparatively easy to write code preprocessors
without worrying about the form or contents of strings, regexes, etc.

For convenience, during a 'code...' filtering operation, Filter::Simple
provides a package variable ($Filter::Simple::placeholder) that
contains a pre-compiled regex that matches any placeholder and captures
the identifier within the placeholder. Placeholders can be moved and
re-ordered within the source code as needed.

In addition, a second package variable (@Filter::Simple::components)
contains a list of the various pieces of $_, as they were originally
split up to allow placeholders to be inserted.

Once the filtering has been applied, the original strings, regexes,
POD, etc. are re-inserted into the code, by replacing each placeholder
with the corresponding original component (from @components). Note that
this means that the @components variable must be treated with extreme
care within the filter. The @components array stores the
"back-translations" of each placeholder inserted into $_, as well as
the interstitial source code between placeholders. If the placeholder
back-translations are altered in @components, they will be similarly
changed when the placeholders are removed from $_ after the filter is
complete.

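The placeholder round trip can be sketched in a few lines. This is a deliberately simplified illustration and not the module's implementation: it hides only plain '...' and "..." literals without escapes (Filter::Simple also hides regexes, POD and __DATA__, and finds them with Text::Balanced), it assumes $; holds its default value, and filter_code_only is a hypothetical name:

```perl
use strict;
use warnings;

# Apply $code_filter to $source with string literals hidden behind
# placeholders, then put the original literals back.
# Naive assumption: literals are plain '...' or "..." without escapes.
sub filter_code_only {
    my ($source, $code_filter) = @_;
    my $sep = "\034";                 # default value of $;
    my @components;                   # hidden literals, indexed by identifier

    # Replace each literal with: delimiter, 4-byte identifier, delimiter
    $source =~ s{("[^"]*"|'[^']*')}{
        push @components, $1;
        $sep . pack('N', $#components) . $sep;
    }ge;

    local $_ = $source;
    $code_filter->();                 # sees code plus opaque placeholders

    # Matches any placeholder and captures its identifier
    my $placeholder = qr/\Q$sep\E(.{4})\Q$sep\E/s;
    s/$placeholder/$components[ unpack('N', $1) ]/ge;
    return $_;
}

my $source = qq{warn "BANG BANG, YOU'RE DEAD";\nBANG BANG;\n};
print filter_code_only($source, sub { s/BANG\s+BANG/die 'BANG' if \$BANG/g });
```

The string passed to "warn" survives untouched because the filter never sees it, while the bare "BANG BANG;" statement is rewritten.
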
For example, the following filter detects concatenated pairs of
strings/quotelikes and reverses the order in which they are
concatenated:

    package DemoRevCat;
    use Filter::Simple;

    FILTER_ONLY code => sub {
        my $ph = $Filter::Simple::placeholder;
        # $ph contains its own capture group, so the second
        # placeholder is $3, not $2
        s{ ($ph) \s* [.] \s* ($ph) }{ $3.$1 }gx
    };

Thus, the following code:

    use DemoRevCat;

    my $str = "abc" . q(def);

    print "$str\n";

would become:

    my $str = q(def)."abc";

    print "$str\n";

and hence print:

    defabc

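Because the placeholder pattern captures the identifier, wrapping it in parentheses shifts the group numbering. The sketch below shows this with a stand-in pattern. The shape of $ph is an assumption based on the description above (delimiter, four captured bytes, delimiter), and swap_concatenated is a hypothetical helper:

```perl
use strict;
use warnings;

# Stand-in for $Filter::Simple::placeholder (an assumption about its shape:
# delimiter, four captured bytes, delimiter).
my $ph = qr/\034(.{4})\034/s;

# Swap two placeholders joined by the concatenation operator.
sub swap_concatenated {
    my ($code) = @_;
    # Groups: $1 = first placeholder,  $2 = its identifier,
    #         $3 = second placeholder, $4 = its identifier
    $code =~ s{ ($ph) \s* [.] \s* ($ph) }{$3.$1}gx;
    return $code;
}

my $code = "my \$str = \034AAAA\034 . \034BBBB\034;";
(my $visible = swap_concatenated($code)) =~ s/\034/|/g;   # make delimiters printable
print "$visible\n";
```

The second whole placeholder is therefore $3; $2 holds only the four identifier bytes of the first one.
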
Using Filter::Simple with an explicit "import" subroutine

Filter::Simple generates a special "import" subroutine for your module
(see "How it works") which would normally replace any "import"
subroutine you might have explicitly declared.

However, Filter::Simple is smart enough to notice your existing
"import" and Do The Right Thing with it. That is, if you explicitly
define an "import" subroutine in a package that's using Filter::Simple,
that "import" subroutine will still be invoked immediately after any
filter you install.

The only thing you have to remember is that the "import" subroutine
must be declared before the filter is installed. If you use "FILTER" to
install the filter:

    package Filter::TurnItUpTo11;

    use Filter::Simple;

    FILTER { s/(\w+)/\U$1/ };

that will almost never be a problem, but if you install a filtering
subroutine by passing it directly to the "use Filter::Simple"
statement:

    package Filter::TurnItUpTo11;

    use Filter::Simple sub{ s/(\w+)/\U$1/ };

then you must make sure that your "import" subroutine appears before
that "use" statement.

Using Filter::Simple and Exporter together

Likewise, Filter::Simple is also smart enough to Do The Right Thing if
you use Exporter:

    package Switch;
    use base 'Exporter';
    use Filter::Simple;

    @EXPORT    = qw(switch case);
    @EXPORT_OK = qw(given when);

    FILTER { $_ = magic_Perl_filter($_) };

Immediately after the filter has been applied to the source,
Filter::Simple will pass control to Exporter, so it can do its magic
too.

Of course, here too, Filter::Simple has to know you're using Exporter
before it applies the filter. That's almost never a problem, but if
you're nervous about it, you can guarantee that things will work
correctly by ensuring that your "use base 'Exporter'" always precedes
your "use Filter::Simple".

How it works

The Filter::Simple module exports two automagically constructed
subroutines, "import" and "unimport", into the package that calls
"FILTER" (or "use"s it directly), such as package "BANG" in the above
example. These two subroutines take care of all the nasty details.

In addition, the generated "import" subroutine passes its own argument
list to the filtering subroutine, so the BANG.pm filter could easily be
made parametric:

    package BANG;

    use Filter::Simple;

    FILTER {
        my ($die_msg, $var_name) = @_;
        s/BANG\s+BANG/die '$die_msg' if \${$var_name}/g;
    };

    # and in some user code:

    use BANG "BOOM", "BAM";  # "BANG BANG" becomes: die 'BOOM' if $BAM

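To see what the parametric substitution produces, the same filter body can be run on an ordinary string. This sketch does not install a filter; bang_filter is a hypothetical helper whose extra arguments mirror those given to "use BANG":

```perl
use strict;
use warnings;

# The body of the parametric BANG filter, applied to a string instead of
# to live source code. Arguments mirror those of "use BANG ...".
sub bang_filter {
    my ($source, $die_msg, $var_name) = @_;
    $source =~ s/BANG\s+BANG/die '$die_msg' if \${$var_name}/g;
    return $source;
}

print bang_filter("BANG BANG;\n", "BOOM", "BAM");   # prints: die 'BOOM' if ${BAM};
```

The generated text uses the "${BAM}" form, which Perl treats the same as "$BAM".
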
The specified filtering subroutine is called every time a "use BANG" is
encountered, and passed all the source code following that call, up to
either the next "no BANG;" (or whatever terminator you've set) or the
end of the source file, whichever occurs first. By default, any "no
BANG;" call must appear by itself on a separate line, or it is ignored.

AUTHOR
Damian Conway (damian@conway.org)

COPYRIGHT
Copyright (c) 2000-2001, Damian Conway. All Rights Reserved.
This module is free software. It may be used, redistributed
and/or modified under the same terms as Perl itself.

perl v5.8.8                    2001-09-21              Filter::Simple(3pm)