1Marpa::XS::Rewrite(3) User Contributed Perl DocumentationMarpa::XS::Rewrite(3)
2
3
4
6 Marpa::XS::Rewrite - How Marpa rewrites grammars
7
9 Marpa rewrites grammars, adding internal symbols and rules in the
10 process. These rewritings do not affect the semantics, but they do
11 show up when you look at Marpa's grammars.
12
13 Marpa's internal symbols have tags at the end, enclosed in square
14 brackets. This means that all Marpa internal symbols end in a right
15 square bracket.
16
17 Marpa adds a new start rule
18 Many parsers add their own start rule and their own start symbol to
19 grammars. Marpa is no exception. The new internal start symbol is the
20 original start symbol with ""[']"" suffixed. An example of a Marpa
21 internal start symbol is ""Expression[']"". If the grammar allows a
22 null parse, there will also be a Marpa internal nulling start symbol,
23 with ""['][]"" suffixed. An example of a Marpa internal nulling start
24 symbol would be ""Script['][]"".
25
26 Marpa eliminates proper nullable symbols
27 A symbol is nulled if it produces the empty sentence. Nulling symbols
28 are those which are always nulled. Nullable symbols are those which
29 are sometimes nulled. Non-nullable symbols are those which are never
30 nulled.
31
32 All nulling symbols are also nullable symbols. A proper nullable is
33 any nullable symbol which is not a nulling symbol. In other words, a
34 proper nullable is a symbol that might or might not be nulled.
35
36 Nullable symbols were a problem in previous versions of Earley parsers.
37 Aycock and Horspool 2002 outlined a new approach for dealing with them.
38 I use their ideas with modifications of my own.
39
40 Marpa rewrites its grammar to eliminate proper nullables. It does this
41 by turning every proper nullable into two symbols: a non-nullable
42 variant and a nulling variant. The non-nullable variant keeps the
43 original symbol's name, but is no longer allowed to appear in places
44 where it might be nulled.
45
46 The name of the nulling variant is that of the original symbol with the
47 nulling tag (""[]"") suffixed. When the nulling variant is used, it
48 must be nulled.
49
50 The newly introduced nulling symbols will not appear on any left hand
51 sides, with one exception: grammars that allow a null parse will have a
52 nulling start rule. Except for the nulling start symbol, Marpa marks
53 nulling symbols internally and recognizes them directly, without the
54 need for empty rules.
55
56 Rules with proper nullables on the RHS have to be replaced with new
57 rules covering every possible combination of the non-nullable and
58 nulling variants. That rewrite is called the CHAF rewrite.
59
60 Marpa divides some rules up into smaller ones
61 Here's an example of a CHAF rewrite:
62
63 { lhs => 'statement',
64 rhs => [
65 qw/optional_whitespace expression
66 optional_whitespace optional_modifier
67 optional_whitespace/
68 ]
69 }
70
71 This rule contains four instances of proper nullables, illustrating my
72 fear that grammars of practical interest will have lots of proper
73 nullables on right hand sides. "optional_whitespace" and
74 "optional_modifier" are both proper nullables and "optional_whitespace"
75 appears three times.
76
77 Here's is the output from "show_rules", showing what Marpa does with
78 this rule:
79
80 0: statement -> optional_whitespace expression optional_whitespace optional_modifier optional_whitespace
81 /* !used */
82
83 15: statement -> optional_whitespace expression statement[R0:2] /* vrhs real=2 */
84 16: statement -> optional_whitespace expression optional_whitespace[] optional_modifier[] optional_whitespace[]
85 17: statement -> optional_whitespace[] expression statement[R0:2] /* vrhs real=2 */
86 18: statement -> optional_whitespace[] expression optional_whitespace[] optional_modifier[] optional_whitespace[]
87 19: statement[R0:2] -> optional_whitespace statement[R0:3] /* vlhs vrhs real=1 */
88 20: statement[R0:2] -> optional_whitespace optional_modifier[] optional_whitespace[] /* vlhs real=3 */
89 21: statement[R0:2] -> optional_whitespace[] statement[R0:3] /* vlhs vrhs real=1 */
90 22: statement[R0:3] -> optional_modifier optional_whitespace /* vlhs real=2 */
91 23: statement[R0:3] -> optional_modifier optional_whitespace[] /* vlhs real=2 */
92 24: statement[R0:3] -> optional_modifier[] optional_whitespace /* vlhs real=2 */
93
94 Rule 0 is the original rule. Because Marpa has rewritten it, the rule
95 is marked ""!used"", telling later stages in the precomputation to
96 ignore it. Marpa breaks Rule 0 up into three pieces, each with no more
97 than two proper nullables. Marpa then eliminates the proper nullables
98 in each piece by factoring.
99
100 To factor a piece, Marpa first rewrites it into multiple rules, one for
101 each possible combination of nulled and non-nulled symbols. Marpa then
102 replaces each proper nullable with its nulling or non-nullable variant,
103 as appropriate. There are two symbol variants for each proper
104 nullable. With a maximum of two proper nullables for each piece, each
105 with two variants, there are at most four combinations. A rule for one
106 of these combinations is called a factored rule, or a factor.
107
108 In the example in the above display, the original rule (Rule 0) was
109 broken into three pieces. Rule 0 had 5 RHS symbols, and the three
110 pieces contain, respectively, the first two RHS symbols; the third RHS
111 symbol; and the last two (that is, 4th and 5th) RHS symbols.
112
113 The first piece of Rule 0 is factored into four rules: Rules 15 to 18.
114 The second piece is factored into three rules: Rules 19 to 21. The
115 third and last piece of Rule 0 is also factored into three rules: Rules
116 22 to 24.
117
118 When a rule is broken into pieces, the left hand side of the first
119 piece is the left hand side of the original rule. New symbols are
120 introduced to be the left hand sides of the other pieces. The names of
121 the new LHS symbols are formed by suffixing a tag to the name of the
122 original left hand side. The suffixed tag begins with a capital "R"
123 and identifies the location of the piece in the original rule. For
124 example, the tag ""[R0:3]"" indicates that this symbol is the left hand
125 side for the piece of Rule 0 which begins at right hand symbol 3 (the
126 first symbol is symbol 0).
127
128 When a new LHS symbol is created for a piece, the worst case is that
129 the new LHS is also a proper nullable. This worst case occurs when all
130 the original symbols in the piece for the new LHS and all symbols in
131 all subsequent pieces are proper nullables.
132
133 A newly created LHS symbol will always appear on the RHS of the
134 previous piece. If the newly created symbol is a proper nullable, then
135 it counts against the limit of two proper nullables for that previous
136 piece, and it must be factored, just the same as if it was one of the
137 proper nullables appearing in the original rule.
138
139 Rule 0 is such a worst case. The last three symbols of the right hand
140 side are all proper nullables. That means that all the symbols in the
141 last two pieces of the original rule are proper nullables. Therefore
142 both of the newly created LHS symbols are proper nullables.
143
144 The original Rule 0 has 4 proper nullables. After the CHAF rewrite,
145 there are 6 proper nullables: the original 4 plus the 2 symbols newly
146 created to serve as left hand sides. This is why, in order to have at
147 most 2 proper nullables per piece, the original rule must to be divided
148 into 3 pieces.
149
150 CHAF preserves user semantics. Marpa, when it splits the rule into
151 pieces and factors the pieces, inserts logic to gather and preserve the
152 values of child nodes. The user's semantic actions see the original
153 child values as if the CHAF rewrite had never occurred.
154
155 Marpa converts sequence productions to BNF rules
156 Internally, Marpa converts productions specified as sequences into BNF
157 productions. The conversion is done in a standard way. For example,
158
159 {
160 lhs => 'statements',
161 rhs => [qw/statement/],
162 separator => 'comma',
163 min => 1
164 }
165
166 becomes
167
168 2: statements -> statements[Subseq:8:5] /* vrhs real=0 */
169 3: statements -> statements[Subseq:8:5] comma /* vrhs real=1 */
170 4: statements[Subseq:8:5] -> statement /* vlhs real=1 */
171 5: statements[Subseq:8:5] -> statements[Subseq:8:5] comma statement /* vlhs vrhs real=2 */
172
173 In the added symbol, the tag ""[Subseq:8:5]"" indicates a special
174 symbol introduced to serve as the LHS of the sequence. The ""8:5""
175 indentifies the sequence using the symbol ID's of the two symbols
176 involved. ""statements"", the LHS symbol, has symbol ID 8.
177 ""statement"", the RHS symbol, has symbol ID 5.
178
179 Here's another example, this time of a sequence without a separator.
180 The rule
181
182 { lhs => 'block', rhs => [qw/statements/], min => 0 },
183
184 is rewritten by Marpa as follows:
185
186 7: block -> /* empty !used */
187 8: block -> block[Subseq:0:8] /* vrhs real=0 */
188 9: block[Subseq:0:8] -> statements /* vlhs real=1 */
189 10: block[Subseq:0:8] -> block[Subseq:0:8] statements /* vlhs vrhs real=1 */
190
191 Note that rule 7, the empty rule with the "block" symbol as its LHS, is
192 marked ""!used"". "block" is a proper nullable, and rules from
193 sequence conversion undergo the same rewriting of proper nullables as
194 any other rules. In this case a nulling variant of "block" ("block[]")
195 was created. That made the empty rule created by the sequence
196 conversion useless, and that is why it was marked ""!used"".
197
199 Copyright 2012 Jeffrey Kegler
200 This file is part of Marpa::XS. Marpa::XS is free software: you can
201 redistribute it and/or modify it under the terms of the GNU Lesser
202 General Public License as published by the Free Software Foundation,
203 either version 3 of the License, or (at your option) any later version.
204
205 Marpa::XS is distributed in the hope that it will be useful,
206 but WITHOUT ANY WARRANTY; without even the implied warranty of
207 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
208 Lesser General Public License for more details.
209
210 You should have received a copy of the GNU Lesser
211 General Public License along with Marpa::XS. If not, see
212 http://www.gnu.org/licenses/.
213
214
215
216perl v5.36.0 2022-07-22 Marpa::XS::Rewrite(3)