Pegex::API(3)         User Contributed Perl Documentation        Pegex::API(3)

The Pegex API

       Pegex can be used in many ways: inside scripts, from the command line,
       or as the foundation of a modular parsing framework. This document
       details the various ways to use Pegex.

       At the most abstract level, Pegex works like this:

           $result = $parser->new($grammar, $receiver)->parse($input);

       Which is to say, abstractly: a Pegex parser, under the direction of a
       Pegex grammar, parses an input stream and reports matches to a Pegex
       receiver, which produces a result.

       The parser, grammar, receiver, and even the input are Pegex objects.
       These four objects are involved in every Pegex parse operation, so
       let's review them briefly:

       Pegex::Parser
           The Pegex parsing engine. This engine applies the logic of the
           grammar to an input text. A parser object contains a grammar object
           and a receiver object. Its primary method is called "parse". The
           default parser engine is non-backtracking, recursive descent.
           However, there are parser subclasses for various alternative types
           of parsing.

       Pegex::Grammar
           A Pegex grammar starts as a text file/string composed in the Pegex
           syntax. Before it can be used by a parser, it must be compiled.
           After compilation, it is turned into a data tree consisting of
           rules and regexes. In modules that are based on a Pegex grammar,
           the grammar will be compiled into a class file. Pegex itself uses
           a Pegex grammar class called Pegex::Pegex::Grammar to parse various
           Pegex grammars.

       Pegex::Receiver
           A parser on its own has no idea what to do with the text it
           matches. A Pegex receiver is a class that contains methods
           corresponding to the rules in a grammar. As a rule in the grammar
           matches, its corresponding receiver method (if one exists) is
           called with the data that has been matched. It is the receiver's
           job to take action on the data, often building it into some new
           structure. Pegex will use Pegex::Tree::Wrap as the default
           receiver; it produces a reasonably readable tree of the
           matched/captured data.

       Pegex::Input
           Pegex abstracts its input streams into an object interface as well.
           Any operation that can take an input string can also take an input
           object. Pegex will turn regular strings into these objects. This is
           probably the API concept you will encounter the least, but it is
           covered here for completeness.

       All of these object classes can be subclassed to achieve various
       results. Normally, you will write your own Pegex grammar and a Pegex
       receiver to achieve a task.

   Starting Simple - The "pegex" Function
       The Pegex module exports a function called "pegex" that you can use for
       smaller tasks. Here is an example:

           use Pegex;
           use YAML;

           $grammar = "
           expr: num PLUS num
           num: /( DIGIT+ )/
           ";

           print Dump pegex($grammar)->parse('2+2');

       This program would produce:

           expr:
           - num: 2
           - num: 2

       Let's review what's happening here. The Pegex module is exporting a
       "pegex" function. This function takes a Pegex grammar string as input.
       Internally this function compiles the grammar string into a grammar
       object. Then it creates a parser object containing the grammar object
       and returns it.

       The parse method is called on the input string: '2+2'. The string
       matches, and a nice data structure is returned.

       So how was the data structure created? By the receiver object, of
       course! But we didn't specify one, did we? Nope. It used the default
       receiver, Pegex::Tree::Wrap. We could have said:

           print Dump pegex($grammar, 'Pegex::Tree::Wrap')->parse('2+2');

       This receiver basically generates a mapping, where rule names of
       matches are the keys, and the leaf values are the regex captures.

       The more basic receiver called Pegex::Tree generates a tree of
       sequences that contain just the data (without the rule names). This
       code:

           print Dump pegex($grammar, 'Pegex::Tree')->parse('2+2');

       would produce:

           - 2
           - 2
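
       As a rough illustration of the difference between the two receivers,
       the sketch below parses the same input with each one and reads the
       first operand out of the result as plain Perl data. The variable names
       ($wrapped, $plain) are made up for this example, and the nesting is
       assumed to match the YAML dumps shown above.

```perl
use strict;
use warnings;
use Pegex;

my $grammar = "
expr: num PLUS num
num: /( DIGIT+ )/
";

# Default receiver (Pegex::Tree::Wrap): rule names become hash keys.
my $wrapped = pegex($grammar)->parse('2+2');

# Pegex::Tree: only the captured data, as nested arrays.
my $plain = pegex($grammar, 'Pegex::Tree')->parse('2+2');

# Reach the first operand in each structure.
print "Wrap: $wrapped->{expr}[0]{num}\n";
print "Tree: $plain->[0]\n";
```

       The wrapped form is easier to inspect while developing a grammar; the
       plain form is usually the more convenient base for a custom receiver.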

       If we wrote our own receiver class called "Calculator" like this:

           package Calculator;
           use base 'Pegex::Tree';

           # Called when the "expr" rule matches; $data holds the two
           # captured numbers.
           sub got_expr {
               my ($receiver, $data) = @_;
               my ($left, $right) = @$data;
               return $left + $right;
           }

       Then, this:

           print pegex($grammar, 'Calculator')->parse('2+2');

       would print:

           4
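
       For readers who want to run the Calculator receiver as a single
       script, here is one possible arrangement. It defines the package
       inline rather than in its own file, and it adds a "got_num" method
       that is not part of the example above, purely to show that each rule
       can have its own callback. Treat it as a sketch rather than the
       canonical layout.

```perl
use strict;
use warnings;
use Pegex;

{
    package Calculator;
    use base 'Pegex::Tree';

    # Hypothetical extra hook: turn the captured digits into a number.
    sub got_num {
        my ($self, $digits) = @_;
        return 0 + $digits;
    }

    # Receives the two values returned by got_num and adds them.
    sub got_expr {
        my ($self, $data) = @_;
        my ($left, $right) = @$data;
        return $left + $right;
    }
}

my $grammar = "
expr: num PLUS num
num: /( DIGIT+ )/
";

my $sum = pegex($grammar, 'Calculator')->parse('2+2');
print "$sum\n";
```

       The value returned by each "got_" method replaces the matched data for
       that rule, which is how the sum ends up as the result of the parse.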

   More Explicit Usage
       Continuing with the example above, let's see how to do it a little more
       formally.

           use Pegex::Parser;
           use Pegex::Grammar;
           use Pegex::Tree;
           use Pegex::Input;
           use YAML;

           $grammar_text = "
           expr: num PLUS num
           num: /( DIGIT+ )/
           ";

           $grammar = Pegex::Grammar->new(text => $grammar_text);
           $receiver = Pegex::Tree->new();
           $parser = Pegex::Parser->new(
               grammar => $grammar,
               receiver => $receiver,
           );
           $input = Pegex::Input->new(string => '2+2');

           print Dump $parser->parse($input);

       This code does the same thing as the first example, but this time we've
       made all the objects ourselves.
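
       One reason to build the objects yourself is that the parser can then
       be kept around. The sketch below reuses a single parser for several
       inputs and traps a failed parse with "eval". It assumes that a parser
       object may be reused across calls to "parse" and that, by default, a
       failed parse dies; check Pegex::Parser if your version behaves
       differently.

```perl
use strict;
use warnings;
use Pegex::Parser;
use Pegex::Grammar;
use Pegex::Tree;

my $grammar_text = "
expr: num PLUS num
num: /( DIGIT+ )/
";

my $parser = Pegex::Parser->new(
    grammar  => Pegex::Grammar->new(text => $grammar_text),
    receiver => Pegex::Tree->new,
);

# Reuse the same parser object for more than one input string.
my @results = map { scalar $parser->parse($_) } ('2+2', '10+32');

# '2+' is missing its second operand, so the grammar cannot match it.
# The failure is assumed to be raised as an exception.
my $ok    = eval { $parser->parse('2+'); 1 };
my $error = $ok ? undef : $@;

print "parsed ", scalar(@results), " inputs\n";
print "parse of '2+' failed\n" if defined $error;
```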

   Precompiled Grammars
       If you ship a Pegex grammar as part of a CPAN distribution, you'll want
       it to be precompiled into a module. Pegex makes that easy.

       Say the grammar_text above is stored in a file called "share/expr.pgx".
       If you create a module called "lib/MyThing/Grammar.pm" with content
       like this:

           package MyThing::Grammar;
           use base 'Pegex::Grammar';
           use constant file => './share/expr.pgx';
           sub make_tree {
           }
           1;

       Then run this command line:

           perl -Ilib -MMyThing::Grammar=compile

       It will rewrite your module to look something like this:

           package MyThing::Grammar;
           use base 'Pegex::Grammar';
           use constant file => './share/expr.pgx';
           sub make_tree {
             { '+toprule' => 'expr',
               'PLUS' => { '.rgx' => qr/\G\+/ },
               'expr' => {
                 '.all' => [
                   { '.ref' => 'num' },
                   { '.ref' => 'PLUS' },
                   { '.ref' => 'num' }
                 ]
               },
               'num' => { '.rgx' => qr/\G([0-9]+)/ }
             }
           }
           1;

       This command found the file where your grammar is, compiled it, and
       used Data::Dumper to output it back into your module's "make_tree"
       method.

       This is what a compiled Pegex grammar looks like. As soon as this
       module is loaded, the grammar is ready to be used by Pegex.
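
       To make "ready to be used" concrete, the following sketch hands a
       compiled grammar class straight to a parser. So that it runs as one
       file, the class is declared inline with the tree copied from the
       generated module above, and the "file" constant is left out; in a real
       distribution you would "use MyThing::Grammar" instead. That the parser
       accepts such a hand-declared class is an assumption of this example.

```perl
use strict;
use warnings;
use Pegex::Parser;
use Pegex::Tree;

{
    package MyThing::Grammar;
    use base 'Pegex::Grammar';

    # Compiled tree, copied from the generated module shown above.
    sub make_tree {
        return {
            '+toprule' => 'expr',
            'PLUS'     => { '.rgx' => qr/\G\+/ },
            'expr'     => {
                '.all' => [
                    { '.ref' => 'num' },
                    { '.ref' => 'PLUS' },
                    { '.ref' => 'num' },
                ],
            },
            'num' => { '.rgx' => qr/\G([0-9]+)/ },
        };
    }
}

# No grammar text is compiled at run time; the tree is used as-is.
my $parser = Pegex::Parser->new(
    grammar  => MyThing::Grammar->new,
    receiver => Pegex::Tree->new,
);

my $result = $parser->parse('3+4');
print join(',', @$result), "\n";
```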

       Automatically rebuilding during development with environment variable

       If you find yourself needing to compile your grammar module a lot
       during development, just set this environment variable like so:

           export PERL_PEGEX_AUTO_COMPILE=MyThing::Grammar

       Now, every time the grammar module is loaded it will check to see if it
       needs to be recompiled, and do it on the fly.

       If you have more than one grammar to recompile, just list all the names
       separated by commas.

       Automatically rebuilding during development using "make"

       Alternatively, if your module uses "ExtUtils::MakeMaker", you can have
       "make" automatically rebuild your "Grammar" class if your ".pgx" file
       is updated.

       Simply add this at the bottom of your "Makefile.PL":

           sub MY::postamble {
             <<EOF;
           lib/MyThing/Grammar.pm : share/expr.pgx
           \t\$(PERL) -Ilib -MMyThing::Grammar=compile
           EOF
           }


See Also

       •   Pegex::Parser

       •   Pegex::Grammar

       •   Pegex::Receiver

       •   Pegex::Tree

       •   Pegex::Tree::Wrap

       •   Pegex::Input

perl v5.36.0                      2022-07-22                     Pegex::API(3)