1Marpa::XS::Rewrite(3) User Contributed Perl DocumentationMarpa::XS::Rewrite(3)
2
3
4

NAME

6       Marpa::XS::Rewrite - How Marpa rewrites grammars
7

OVERVIEW

9       Marpa rewrites grammars, adding internal symbols and rules in the
10       process.  These rewritings do not affect the semantics, but they do
11       show up when you look at Marpa's grammars.
12
13       Marpa's internal symbols have tags at the end, enclosed in square
14       brackets.  This means that all Marpa internal symbols end in a right
15       square bracket.
16
17   Marpa adds a new start rule
18       Many parsers add their own start rule and their own start symbol to
19       grammars.  Marpa is no exception.  The new internal start symbol is the
20       original start symbol with ""[']"" suffixed.  An example of a Marpa
21       internal start symbol is ""Expression[']"".  If the grammar allows a
22       null parse, there will also be a Marpa internal nulling start symbol,
23       with ""['][]"" suffixed.  An example of a Marpa internal nulling start
24       symbol would be ""Script['][]"".
25
26   Marpa eliminates proper nullable symbols
27       A symbol is nulled if it produces the empty sentence.  Nulling symbols
28       are those which are always nulled.  Nullable symbols are those which
29       are sometimes nulled.  Non-nullable symbols are those which are never
30       nulled.
31
32       All nulling symbols are also nullable symbols.  A proper nullable is
33       any nullable symbol which is not a nulling symbol.  In other words, a
34       proper nullable is a symbol that might or might not be nulled.
35
36       Nullable symbols were a problem in previous versions of Earley parsers.
37       Aycock and Horspool 2002 outlined a new approach for dealing with them.
38       I use their ideas with modifications of my own.
39
40       Marpa rewrites its grammar to eliminate proper nullables.  It does this
41       by turning every proper nullable into two symbols: a non-nullable
42       variant and a nulling variant.  The non-nullable variant keeps the
43       original symbol's name, but is no longer allowed to appear in places
44       where it might be nulled.
45
46       The name of the nulling variant is that of the original symbol with the
47       nulling tag (""[]"") suffixed.  When the nulling variant is used, it
48       must be nulled.
49
50       The newly introduced nulling symbols will not appear on any left hand
51       sides, with one exception: grammars that allow a null parse will have a
52       nulling start rule.  Except for the nulling start symbol, Marpa marks
53       nulling symbols internally and recognizes them directly, without the
54       need for empty rules.
55
56       Rules with proper nullables on the RHS have to be replaced with new
57       rules covering every possible combination of the non-nullable and
58       nulling variants.  That rewrite is called the CHAF rewrite.
59
60   Marpa divides some rules up into smaller ones
61       Here's an example of a CHAF rewrite:
62
63           {   lhs => 'statement',
64               rhs => [
65                   qw/optional_whitespace expression
66                       optional_whitespace optional_modifier
67                       optional_whitespace/
68               ]
69           }
70
71       This rule contains four instances of proper nullables, illustrating my
72       fear that grammars of practical interest will have lots of proper
73       nullables on right hand sides.  "optional_whitespace" and
74       "optional_modifier" are both proper nullables and "optional_whitespace"
75       appears three times.
76
77       Here's is the output from "show_rules", showing what Marpa does with
78       this rule:
79
80           0: statement -> optional_whitespace expression optional_whitespace optional_modifier optional_whitespace
81           /* !used */
82
83           15: statement -> optional_whitespace expression statement[R0:2] /* vrhs real=2 */
84           16: statement -> optional_whitespace expression optional_whitespace[] optional_modifier[] optional_whitespace[]
85           17: statement -> optional_whitespace[] expression statement[R0:2] /* vrhs real=2 */
86           18: statement -> optional_whitespace[] expression optional_whitespace[] optional_modifier[] optional_whitespace[]
87           19: statement[R0:2] -> optional_whitespace statement[R0:3] /* vlhs vrhs real=1 */
88           20: statement[R0:2] -> optional_whitespace optional_modifier[] optional_whitespace[] /* vlhs real=3 */
89           21: statement[R0:2] -> optional_whitespace[] statement[R0:3] /* vlhs vrhs real=1 */
90           22: statement[R0:3] -> optional_modifier optional_whitespace /* vlhs real=2 */
91           23: statement[R0:3] -> optional_modifier optional_whitespace[] /* vlhs real=2 */
92           24: statement[R0:3] -> optional_modifier[] optional_whitespace /* vlhs real=2 */
93
94       Rule 0 is the original rule.  Because Marpa has rewritten it, the rule
95       is marked ""!used"", telling later stages in the precomputation to
96       ignore it.  Marpa breaks Rule 0 up into three pieces, each with no more
97       than two proper nullables.  Marpa then eliminates the proper nullables
98       in each piece by factoring.
99
100       To factor a piece, Marpa first rewrites it into multiple rules, one for
101       each possible combination of nulled and non-nulled symbols.  Marpa then
102       replaces each proper nullable with its nulling or non-nullable variant,
103       as appropriate.  There are two symbol variants for each proper
104       nullable.  With a maximum of two proper nullables for each piece, each
105       with two variants, there are at most four combinations.  A rule for one
106       of these combinations is called a factored rule, or a factor.
107
108       In the example in the above display, the original rule (Rule 0) was
109       broken into three pieces.  Rule 0 had 5 RHS symbols, and the three
110       pieces contain, respectively, the first two RHS symbols; the third RHS
111       symbol; and the last two (that is, 4th and 5th) RHS symbols.
112
113       The first piece of Rule 0 is factored into four rules: Rules 15 to 18.
114       The second piece is factored into three rules: Rules 19 to 21.  The
115       third and last piece of Rule 0 is also factored into three rules: Rules
116       22 to 24.
117
118       When a rule is broken into pieces, the left hand side of the first
119       piece is the left hand side of the original rule.  New symbols are
120       introduced to be the left hand sides of the other pieces.  The names of
121       the new LHS symbols are formed by suffixing a tag to the name of the
122       original left hand side.  The suffixed tag begins with a capital "R"
123       and identifies the location of the piece in the original rule.  For
124       example, the tag ""[R0:3]"" indicates that this symbol is the left hand
125       side for the piece of Rule 0 which begins at right hand symbol 3 (the
126       first symbol is symbol 0).
127
128       When a new LHS symbol is created for a piece, the worst case is that
129       the new LHS is also a proper nullable.  This worst case occurs when all
130       the original symbols in the piece for the new LHS and all symbols in
131       all subsequent pieces are proper nullables.
132
133       A newly created LHS symbol will always appear on the RHS of the
134       previous piece.  If the newly created symbol is a proper nullable, then
135       it counts against the limit of two proper nullables for that previous
136       piece, and it must be factored, just the same as if it was one of the
137       proper nullables appearing in the original rule.
138
139       Rule 0 is such a worst case.  The last three symbols of the right hand
140       side are all proper nullables.  That means that all the symbols in the
141       last two pieces of the original rule are proper nullables.  Therefore
142       both of the newly created LHS symbols are proper nullables.
143
144       The original Rule 0 has 4 proper nullables.  After the CHAF rewrite,
145       there are 6 proper nullables: the original 4 plus the 2 symbols newly
146       created to serve as left hand sides.  This is why, in order to have at
147       most 2 proper nullables per piece, the original rule must to be divided
148       into 3 pieces.
149
150       CHAF preserves user semantics.  Marpa, when it splits the rule into
151       pieces and factors the pieces, inserts logic to gather and preserve the
152       values of child nodes.  The user's semantic actions see the original
153       child values as if the CHAF rewrite had never occurred.
154
155   Marpa converts sequence productions to BNF rules
156       Internally, Marpa converts productions specified as sequences into BNF
157       productions.  The conversion is done in a standard way.  For example,
158
159           {
160           lhs       => 'statements',
161           rhs       => [qw/statement/],
162           separator => 'comma',
163           min       => 1
164           }
165
166       becomes
167
168           2: statements -> statements[Subseq:8:5] /* vrhs real=0 */
169           3: statements -> statements[Subseq:8:5] comma /* vrhs real=1 */
170           4: statements[Subseq:8:5] -> statement /* vlhs real=1 */
171           5: statements[Subseq:8:5] -> statements[Subseq:8:5] comma statement /* vlhs vrhs real=2 */
172
173       In the added symbol, the tag ""[Subseq:8:5]"" indicates a special
174       symbol introduced to serve as the LHS of the sequence.  The ""8:5""
175       indentifies the sequence using the symbol ID's of the two symbols
176       involved.  ""statements"", the LHS symbol, has symbol ID 8.
177       ""statement"", the RHS symbol, has symbol ID 5.
178
179       Here's another example, this time of a sequence without a separator.
180       The rule
181
182           { lhs => 'block', rhs => [qw/statements/], min => 0 },
183
184       is rewritten by Marpa as follows:
185
186           7: block -> /* empty !used */
187           8: block -> block[Subseq:0:8] /* vrhs real=0 */
188           9: block[Subseq:0:8] -> statements /* vlhs real=1 */
189           10: block[Subseq:0:8] -> block[Subseq:0:8] statements /* vlhs vrhs real=1 */
190
191       Note that rule 7, the empty rule with the "block" symbol as its LHS, is
192       marked ""!used"".  "block" is a proper nullable, and rules from
193       sequence conversion undergo the same rewriting of proper nullables as
194       any other rules.  In this case a nulling variant of "block" ("block[]")
195       was created.  That made the empty rule created by the sequence
196       conversion useless, and that is why it was marked ""!used"".
197
199         Copyright 2012 Jeffrey Kegler
200         This file is part of Marpa::XS.  Marpa::XS is free software: you can
201         redistribute it and/or modify it under the terms of the GNU Lesser
202         General Public License as published by the Free Software Foundation,
203         either version 3 of the License, or (at your option) any later version.
204
205         Marpa::XS is distributed in the hope that it will be useful,
206         but WITHOUT ANY WARRANTY; without even the implied warranty of
207         MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
208         Lesser General Public License for more details.
209
210         You should have received a copy of the GNU Lesser
211         General Public License along with Marpa::XS.  If not, see
212         http://www.gnu.org/licenses/.
213
214
215
216perl v5.36.0                      2022-07-22             Marpa::XS::Rewrite(3)
Impressum