Pegex::Tutorial::Calculator(3pm)

1Pegex::Tutorial::CalculUasteorr(C3o)ntributed Perl DocumPeengteaxt:i:oTnutorial::Calculator(3)
2
3
4

Pegex Tutorial Calculator

6       A Pegex Calculator
7
8       When you look around the web for stuff about parsing, you inevitably
9       find a bunch of examples about how to write a precedence parser for
10       mathematical expressions.
11
12       •   <http://en.wikipedia.org/wiki/Operator-precedence_parser>
13
14       •   <http://eli.thegreenplace.net/2010/01/02/top-down-operator-precedence-parsing/>
15
16       •   <http://www.hokstad.com/operator-precedence-parser.html>
17
18       •   <http://www.perlmonks.org/?node_id=554516>
19
20       This tutorial is the Pegex version of that. Pegex actually comes with
21       an examples directory that contains two arithmetic expression
22       parser/evaluator calculator programs:
23
24       •   <https://github.com/ingydotnet/pegex-pm/blob/master/eg/calculator/calculator1.pl>
25
26       •   <https://github.com/ingydotnet/pegex-pm/blob/master/eg/calculator/calculator2.pl>
27
28       They both do the same thing but using different parsing approaches.
29       We'll cover both of them in detail. I hope you'll find that Pegex
30       handles operator precedence parsing elegantly.
31

The Problem

33       Precedence parsers are interesting because you need to deal with
34       operators that have the same precedence, different precedences and both
35       left and right associativity.
36
37       Consider the equation:
38
39           1 - 2 ^ 3 ^ 4 + 5 * 6 / 7
40
41       Normal precedence and associativity rules make this the same as:
42
43           (1 - (2 ^ (3 ^ 4)) + ((5 * 6) / 7))
44
45       Our calculator should produce the same result for both. Note that this
46       means we will be parsing numbers, 5 operators, parentheses, and
47       separating whitespace.
48
49       Here's an example of the calculator program running:
50
51           > perl eg/calculator1.pl
52           Enter an equation: 1+2*3
53           1+2*3 = 7
54           Enter an equation: (1 + 2) * 3
55           (1 + 2) * 3 = 9
56           Enter an equation:
57

The Solutions

59       Most of the solutions that you'll read about on the web, involve (or
60       assume) a lexing/tokenizing step before parsing. Pegex always parses an
61       input stream directly, pulling out "tokens" that it needs using regex
62       captures. So the parse happens as one operation, which has many
63       advantages.
64
65       But how do we get the operator precedence rules into this? Well, we
66       have 2 different ways:
67
68   calculator1.pl - Operator Precedence Climbing
69       Note: The code in this example is copy/pasted from "calculator/" in
70       example
71             files. The code in those running files is slightly different but
72             rewritten to make more sense in this doc.
73
74       Our first example calculator uses what is known as the Operator
75       Precedence Climbing method. See:
76       <http://en.wikipedia.org/wiki/Operator-precedence_parser#Precedence_climbing_method>.
77
78       This is basically a clever technique of specifying our grammar rules
79       such that they imply precedence. Here's the pegex grammar from the
80       code:
81
82           expr: add-sub
83           add-sub: mul-div+ % /- ( [ '+-' ])/
84           mul-div: power+ % /- ([ '*/' ])/
85           power: token+ % /- '^' /
86           token: /- '(' -/ expr /- ')'/ | number
87           number: /- ( '-'? DIGIT+ )/
88
89       It's a little bit wonky but it works. It says that any expression is an
90       add/subtract and that an add/subtract is really a multiply/divide etc.
91       Finally after the last operator comes the number token and the parens.
92
93       It feels a bit backwards. One of the biggest drawbacks of PCM is that
94       it becomes less and less efficient with more and more operators. It
95       needs to go through the whole tree, just to find each number.
96
97       But it works and the code is minimal. The receiver class gets the
98       numbers in the correct order, immediately evaluates the answer and
99       returns the answer for each level. Whatever the return value of the
100       final operation is, becomes the result of the parse. Here's the
101       receiver class:
102
103           {
104               package Calculator;
105               use base 'Pegex::Tree';
106
107               sub gotrule {
108                   my ($self, $list) = @_;
109                   return $list unless ref $list;
110
111                   # Right associative:
112                   if ($self->rule eq 'power') {
113                       while (@$list > 1) {
114                           my ($a, $b) = splice(@$list, -2, 2);
115                           push @$list, $a ** $b;
116                       }
117                   }
118                   # Left associative:
119                   else {
120                       while (@$list > 1) {
121                           my ($a, $op, $b) = splice(@$list, 0, 3);
122                           unshift @$list,
123                               ($op eq '+') ? ($a + $b) :
124                               ($op eq '-') ? ($a - $b) :
125                               ($op eq '*') ? ($a * $b) :
126                               ($op eq '/') ? ($a / $b) :
127                               die;
128                       }
129                   }
130                   return @$list;
131               }
132           }
133
134       As you can see, it has an action method for each level or precedence.
135       It loops over the expression, evaluating it. Whether it loops from left
136       to right or right to left depends on the associativity that we want to
137       use.
138
139       Our runner code looks like this:
140
141           while (1) {
142               print "\nEnter an equation: ";
143               my $expr = <> || '';
144               chomp $expr;
145               last unless length $expr;
146               calc($expr);
147           }
148
149           sub calc {
150               my ($expr) = @_;
151               my $result = pegex($grammar, 'Calculator')->parse($expr);
152               if ($@) {
153                   warn $@;
154                   return;
155               }
156               print "$expr = $result\n";
157           }
158
159       And that's the whole thing. We have a working calculator as specced!
160
161       However the real point of this is to explore good parsing techniques,
162       and the PCM leaves us wanting to try something more efficient. Let's
163       try another approach...
164
165   calculator2.pl - Shunting Yard Algorithm
166       An age old way of parsing expressions is to somehow get the numbers and
167       operators into an RPN (Reverse Polish Notation) stack, which is each
168       operand follow by its operator. Once in that form, precedence and
169       associativity are accounted for.
170
171       For example:
172
173           1 / 2 - ( -3 * 4 )
174
175       becomes:
176
177           1, 2, /, -3, 4, *, -
178
179       To evaluate an RPN you pop off an operator and then attempt to pop off
180       and operand. If the operand is another operator you recurse. When you
181       have 2 operands you do the operation and put the result back on the
182       stack. When there is only 1 element on the stack, you are done. That's
183       your result.
184
185       Let's look at our new grammar in "calculator2.pl":
186
187           expr: operand (operator operand)*
188           operator: /- (['+-*/^'])/
189           operand: num | /- '('/ expr /- ')'/
190           num: /- ('-'? DIGIT+)/
191
192       This is much easier to understand. We are just parsing out the tokens.
193       In a (very real) sense, we are using Pegex as a lexer.
194
195       Now let's look at the receiver class:
196
197           {
198               package Calculator;
199               use base 'Pegex::Tree', 'Precedence';
200
201               my $operator_precedence_table = {
202                   '+' => {p => 1, a => 'l'},
203                   '-' => {p => 1, a => 'l'},
204                   '*' => {p => 2, a => 'l'},
205                   '/' => {p => 2, a => 'l'},
206                   '^' => {p => 3, a => 'r'},
207               };
208
209               sub got_expr {
210                   my ($self, $expr) = @_;
211                   $self->precedence_rpn($expr, $operator_precedence_table);
212               }
213           }
214
215       This is also much simpler. There's only one method. What's going on?
216       Well the secret is that I put the code to turn the tokens into RPN in a
217       separate base class called "lib/Precedence.pm" in example.
218
219       This is an implementation of Edsger Dijkstra's famous Shunting-yard
220       Algorithm from 1961! It's only 20 lines of Perl. I won't include it
221       inline here, but have a look at it for yourself.
222       <https://github.com/ingydotnet/pegex-pm/blob/master/eg/calculator/lib/Precedence.pm>
223
224       The Shunting-yard algorithm simply takes a list of expression tokens
225       and transforms them into an RPN stack. It uses information from a
226       precedence/associativity table like the one above.
227
228       Unlike calculator1.pl where we evaluated as we parsed, calculator2.pl
229       creates an RPN which is akin to an AST. In other words, it's more like
230       something an actually language compiler would do.
231
232       But we are writing a calculator and we still need to evaluate this
233       puppy. I changed the runner code to look like this:
234
235           sub calc {
236               my $expr = shift;
237               my $calculator = pegex($grammar, 'Calculator');
238               my $rpn = eval { $calculator->parse($expr) };
239               my $result = RPN::evaluate($rpn);
240               print $@ || "$expr = $result\n";
241           }
242
243       So overall, this second solution was a bit more code, but also feels
244       more solid on several levels.
245

Conclusion

247       Pegex strives to be the nicest and most reusable way to write new
248       parsers.  Operator precedence parsers are a necessary part of parsing
249       mathematical expressions and computer languages. This tutorial showed
250       you 2 ways to do it.  As the demands for Pegex grow, we may see even
251       more ways to do it.
252
253
254
255perl v5.38.0                      2023-07-21    Pegex::Tutorial::Calculator(3)