1Pegex::Tutorial::CalculUasteorr(C3o)ntributed Perl DocumPeengteaxt:i:oTnutorial::Calculator(3)
2
3
4
6 A Pegex Calculator
7
8 When you look around the web for stuff about parsing, you inevitably
9 find a bunch of examples about how to write a precedence parser for
10 mathematical expressions.
11
12 • <http://en.wikipedia.org/wiki/Operator-precedence_parser>
13
14 • <http://eli.thegreenplace.net/2010/01/02/top-down-operator-precedence-parsing/>
15
16 • <http://www.hokstad.com/operator-precedence-parser.html>
17
18 • <http://www.perlmonks.org/?node_id=554516>
19
20 This tutorial is the Pegex version of that. Pegex actually comes with
21 an examples directory that contains two arithmetic expression
22 parser/evaluator calculator programs:
23
24 • <https://github.com/ingydotnet/pegex-pm/blob/master/eg/calculator/calculator1.pl>
25
26 • <https://github.com/ingydotnet/pegex-pm/blob/master/eg/calculator/calculator2.pl>
27
28 They both do the same thing but using different parsing approaches.
29 We'll cover both of them in detail. I hope you'll find that Pegex
30 handles operator precedence parsing elegantly.
31
33 Precedence parsers are interesting because you need to deal with
34 operators that have the same precedence, different precedences and both
35 left and right associativity.
36
37 Consider the equation:
38
39 1 - 2 ^ 3 ^ 4 + 5 * 6 / 7
40
41 Normal precedence and associativity rules make this the same as:
42
43 (1 - (2 ^ (3 ^ 4)) + ((5 * 6) / 7))
44
45 Our calculator should produce the same result for both. Note that this
46 means we will be parsing numbers, 5 operators, parentheses, and
47 separating whitespace.
48
49 Here's an example of the calculator program running:
50
51 > perl eg/calculator1.pl
52 Enter an equation: 1+2*3
53 1+2*3 = 7
54 Enter an equation: (1 + 2) * 3
55 (1 + 2) * 3 = 9
56 Enter an equation:
57
59 Most of the solutions that you'll read about on the web, involve (or
60 assume) a lexing/tokenizing step before parsing. Pegex always parses an
61 input stream directly, pulling out "tokens" that it needs using regex
62 captures. So the parse happens as one operation, which has many
63 advantages.
64
65 But how do we get the operator precedence rules into this? Well, we
66 have 2 different ways:
67
68 calculator1.pl - Operator Precedence Climbing
69 Note: The code in this example is copy/pasted from "calculator/" in
70 example
71 files. The code in those running files is slightly different but
72 rewritten to make more sense in this doc.
73
74 Our first example calculator uses what is known as the Operator
75 Precedence Climbing method. See:
76 <http://en.wikipedia.org/wiki/Operator-precedence_parser#Precedence_climbing_method>.
77
78 This is basically a clever technique of specifying our grammar rules
79 such that they imply precedence. Here's the pegex grammar from the
80 code:
81
82 expr: add-sub
83 add-sub: mul-div+ % /- ( [ '+-' ])/
84 mul-div: power+ % /- ([ '*/' ])/
85 power: token+ % /- '^' /
86 token: /- '(' -/ expr /- ')'/ | number
87 number: /- ( '-'? DIGIT+ )/
88
89 It's a little bit wonky but it works. It says that any expression is an
90 add/subtract and that an add/subtract is really a multiply/divide etc.
91 Finally after the last operator comes the number token and the parens.
92
93 It feels a bit backwards. One of the biggest drawbacks of PCM is that
94 it becomes less and less efficient with more and more operators. It
95 needs to go through the whole tree, just to find each number.
96
97 But it works and the code is minimal. The receiver class gets the
98 numbers in the correct order, immediately evaluates the answer and
99 returns the answer for each level. Whatever the return value of the
100 final operation is, becomes the result of the parse. Here's the
101 receiver class:
102
103 {
104 package Calculator;
105 use base 'Pegex::Tree';
106
107 sub gotrule {
108 my ($self, $list) = @_;
109 return $list unless ref $list;
110
111 # Right associative:
112 if ($self->rule eq 'power') {
113 while (@$list > 1) {
114 my ($a, $b) = splice(@$list, -2, 2);
115 push @$list, $a ** $b;
116 }
117 }
118 # Left associative:
119 else {
120 while (@$list > 1) {
121 my ($a, $op, $b) = splice(@$list, 0, 3);
122 unshift @$list,
123 ($op eq '+') ? ($a + $b) :
124 ($op eq '-') ? ($a - $b) :
125 ($op eq '*') ? ($a * $b) :
126 ($op eq '/') ? ($a / $b) :
127 die;
128 }
129 }
130 return @$list;
131 }
132 }
133
134 As you can see, it has an action method for each level or precedence.
135 It loops over the expression, evaluating it. Whether it loops from left
136 to right or right to left depends on the associativity that we want to
137 use.
138
139 Our runner code looks like this:
140
141 while (1) {
142 print "\nEnter an equation: ";
143 my $expr = <> || '';
144 chomp $expr;
145 last unless length $expr;
146 calc($expr);
147 }
148
149 sub calc {
150 my ($expr) = @_;
151 my $result = pegex($grammar, 'Calculator')->parse($expr);
152 if ($@) {
153 warn $@;
154 return;
155 }
156 print "$expr = $result\n";
157 }
158
159 And that's the whole thing. We have a working calculator as specced!
160
161 However the real point of this is to explore good parsing techniques,
162 and the PCM leaves us wanting to try something more efficient. Let's
163 try another approach...
164
165 calculator2.pl - Shunting Yard Algorithm
166 An age old way of parsing expressions is to somehow get the numbers and
167 operators into an RPN (Reverse Polish Notation) stack, which is each
168 operand follow by its operator. Once in that form, precedence and
169 associativity are accounted for.
170
171 For example:
172
173 1 / 2 - ( -3 * 4 )
174
175 becomes:
176
177 1, 2, /, -3, 4, *, -
178
179 To evaluate an RPN you pop off an operator and then attempt to pop off
180 and operand. If the operand is another operator you recurse. When you
181 have 2 operands you do the operation and put the result back on the
182 stack. When there is only 1 element on the stack, you are done. That's
183 your result.
184
185 Let's look at our new grammar in "calculator2.pl":
186
187 expr: operand (operator operand)*
188 operator: /- (['+-*/^'])/
189 operand: num | /- '('/ expr /- ')'/
190 num: /- ('-'? DIGIT+)/
191
192 This is much easier to understand. We are just parsing out the tokens.
193 In a (very real) sense, we are using Pegex as a lexer.
194
195 Now let's look at the receiver class:
196
197 {
198 package Calculator;
199 use base 'Pegex::Tree', 'Precedence';
200
201 my $operator_precedence_table = {
202 '+' => {p => 1, a => 'l'},
203 '-' => {p => 1, a => 'l'},
204 '*' => {p => 2, a => 'l'},
205 '/' => {p => 2, a => 'l'},
206 '^' => {p => 3, a => 'r'},
207 };
208
209 sub got_expr {
210 my ($self, $expr) = @_;
211 $self->precedence_rpn($expr, $operator_precedence_table);
212 }
213 }
214
215 This is also much simpler. There's only one method. What's going on?
216 Well the secret is that I put the code to turn the tokens into RPN in a
217 separate base class called "lib/Precedence.pm" in example.
218
219 This is an implementation of Edsger Dijkstra's famous Shunting-yard
220 Algorithm from 1961! It's only 20 lines of Perl. I won't include it
221 inline here, but have a look at it for yourself.
222 <https://github.com/ingydotnet/pegex-pm/blob/master/eg/calculator/lib/Precedence.pm>
223
224 The Shunting-yard algorithm simply takes a list of expression tokens
225 and transforms them into an RPN stack. It uses information from a
226 precedence/associativity table like the one above.
227
228 Unlike calculator1.pl where we evaluated as we parsed, calculator2.pl
229 creates an RPN which is akin to an AST. In other words, it's more like
230 something an actually language compiler would do.
231
232 But we are writing a calculator and we still need to evaluate this
233 puppy. I changed the runner code to look like this:
234
235 sub calc {
236 my $expr = shift;
237 my $calculator = pegex($grammar, 'Calculator');
238 my $rpn = eval { $calculator->parse($expr) };
239 my $result = RPN::evaluate($rpn);
240 print $@ || "$expr = $result\n";
241 }
242
243 So overall, this second solution was a bit more code, but also feels
244 more solid on several levels.
245
247 Pegex strives to be the nicest and most reusable way to write new
248 parsers. Operator precedence parsers are a necessary part of parsing
249 mathematical expressions and computer languages. This tutorial showed
250 you 2 ways to do it. As the demands for Pegex grow, we may see even
251 more ways to do it.
252
253
254
255perl v5.38.0 2023-07-21 Pegex::Tutorial::Calculator(3)