1Marpa::XS::Semantics::OUrsdeerr(C3o)ntributed Perl DocumMeanrtpaat:i:oXnS::Semantics::Order(3)
2
3
4
6 Marpa::XS::Semantics::Order - How Marpa ranks ambiguous parses
7
9 Marpa allows ambiguous parses. While an unambiguous parse can produce
10 at most one parse tree and one parse result, an ambiguous parse will
11 produce a parse series. A parse series is a sequence of parse trees,
12 each of which will have its own parse result.
13
14 This document describes ways of controlling the order in which the
15 recognizer's "value" method evaluates the parse trees of an ambiguous
16 parse. It also describes ways to exclude selected parse trees from the
17 parse series.
18
19 Duplicate parses are eliminated
20 When evaluating the parse trees in a parse series, Marpa never
21 evaluates the same parse tree twice. Marpa considers two parse trees
22 to be the same if they are semantic duplicates.
23
24 Two parse trees are semantic duplicates if and only if a recursive,
25 top-down evaluation of each applies the same rules in the same order at
26 the same earleme locations. This definition implies that, given any
27 deterministic semantics, two parse trees which are semantic duplicates
28 will always produce the same parse result -- hence the term "semantic
29 duplicates". When the Marpa documentation refers to duplicate parses,
30 it will mean semantic duplicates unless otherwise stated.
31
32 Default parse order
33 By calling the recognizer's "value" method repeatedly, Marpa can
34 produce all the parse results in the current parse series. The default
35 is for the parse results to be returned in an arbitrary order. This
36 corresponds to the ""none"" value of the recognizer's "ranking_method"
37 named argument.
38
39 Arbitrary parse order
40 When this document calls a behavior "arbitrary" it means that an
41 application must not rely on that behavior being repeated. An
42 arbitrary behavior is allowed to change from version to version of the
43 software, from run to run of the application, and even from call to
44 call of the methods.
45
46 "Arbitrary parse order", in particular, means that the particular order
47 in which parse trees are evaluated may not be repeated. But an
48 application can expect, even when the order of evaluation of the parse
49 trees is arbitrary, that all appropriate parse trees will be included
50 in the parse series, and that none of the parse trees in the parse
51 series will be the semantic duplicate of any other tree.
52
53 Ranking methods
54 Marpa grammar objects have a "ranking_method" named argument, whose
55 value can be the name of a ranking method, or ""none"", indicating that
56 the default ranking method is to be used.
57
58 The "rule" ranking method
59 The rule method ranks alternative parses according to their rules.
60 Every rule has a rule rank. A rule's rank can be specified using the
61 the "rank" named argument for that rule. Rule ranks must be integers.
62 If no rule rank is specified, the rule rank is 0. When being ranked,
63 parse trees are traversed top-down, left-to-right.
64
65 The "high_rule_only" ranking method
66 The "high_rule_only" ranking method is similar to the "rule" ranking
67 method, except that, at every choice point, it discards all of the
68 choices which have a rank lower than that of the highest ranked
69 alternative.
70
71 In carrying out the ranking, the choice points of the parse trees are
72 visited in top-down, left-to-right order. Since the "high_rule_only"
73 ranking method eliminates some parse trees, it can reduce or eliminate
74 the ambiguity of a parse, but it does not necessarily do either. This
75 is because, at each choice point among the parse trees, it is possible
76 that several of the choices, or all of them, will have the same rank as
77 the highest ranked alternative.
78
79 Null ranking
80 Some rules have a RHS which contains proper nullables: symbols which
81 may be nulled, but which are not nulling symbols. (Nulling symbols are
82 symbols which are always nulled.)
83
84 When a rule contains proper nullables, each application of that rule to
85 a section of input creates a nulling variant -- that rule with a
86 specific pattern of null and non-null symbols. In many cases, this
87 creates an ambiguity -- different nulling variants could apply to the
88 same substring of input. In ambiguous parsings of this kind, some
89 applications may want to rank nulling variants that start with non-null
90 symbols higher. Other applications may want to do the opposite -- to
91 rank nulling variants that start with null symbols higher.
92
93 The "null_ranking" named argument for rules specifies which nulling
94 variants are ranked high or low. Ranking of nulling variants is done
95 left-to-right, with the null preference as indicated by the
96 "null_ranking" named argument. Specifically, if the "null_ranking" is
97 ""low"", then the closer a nulling variant places its non-null symbols
98 to the start of the rule, the higher it ranks. If the "null_ranking"
99 is ""high"", then the closer a nulling variant places its null symbols
100 to the start of the rule, the higher it ranks.
101
102 A rule is null ranked if and only if its "null_ranking" is defined.
103 Note that this definition means that a rule can be considered null
104 ranked even if it does not have any nullable symbols. A rule which is
105 not null ranked is called non-null-ranked.
106
107 Null ranking, in the current implementation, is targeted at situations
108 where the alternatives are different applications of a single rule. In
109 situations where the choice is between two different rules, an
110 application which cares about the order will typically want to override
111 the null ranking using rule ranks. A more detailed description of the
112 rule comparison logic can be found below.
113
114 By default, "null_ranking" is undefined, which means the order of the
115 null variants is arbitrary. This is fine for most applications.
116
117 Details of ranking
118 When ranking, the logic descends the parse trees top-down and left-to-
119 right. Places where there is more than one alternative are called
120 choice points. Alternatives are ranked (in the "rule" ranking method)
121 or selected (in the "high_rule_only" ranking method), by comparing them
122 as follows:
123
124 • If two alternatives are for the same rule, and that the rule is
125 null ranked, they rank as described under "Null ranking".
126
127 • If the two alternatives have the same rule, and that rule is non-
128 null-ranked, the alternatives have equal rank.
129
130 • If the two alternatives have different rules, and the two rules
131 have different rule ranks, the alternative with the higher rule
132 rank ranks high.
133
134 • The remaining cases deal with the comparison of alternatives which
135 have different rules, when both rules have the same rule rank.
136
137 • If the rule for one alternative is null ranked, while the rule
138 for the other is non-null-ranked, the alternative with the non-
139 null-ranked rule ranks high.
140
141 • If both rules are non-null-ranked, the two alternatives have
142 equal rank.
143
144 • If both rules are null-ranked, their ranking is arbitrary --
145 either one may rank high, or they may have equal rank.
146
147 A general approach to sorting parses
148 The most general way to sort Marpa parses is for the application to
149 take control. The application can set up the Marpa semantic actions so
150 that the parse result of every parse tree is a "<rank, true_value>"
151 duple. The duples can then be sorted by "rank". Once the resuls are
152 sorted, the "rank" element of the duple can be discarded. (Those
153 familiar with the Schwartzian transform may note a resemblance. In
154 Perl, duples can be implemented as references to arrays of 2 elements.)
155
156 The user needs to be careful. In theory, ambiguity can cause an
157 exponential explosion in the number of results. In practice, ambiguity
158 tends to get out of hand very easily. Producing and sorting all the
159 parses can take a very long time.
160
161 The "constant" ranking method is no longer supported
162 The "constant" ranking method is no longer implemented. An initial
163 attempt to find the ideal compromise between flexibility and
164 efficiency, the "constant" ranking method proved too ambitious, and had
165 several drawbacks. First, its overhead was high in comparison to the
166 flexibility it offered. Second, it was not intuitive. Even when the
167 "constant" ranking method was powerful enough to solve a problem, it
168 was not always obvious how to actually use it. Finally, its
169 implementation was extremely complex.
170
171 The new "rule" ranking method is far more efficient, vastly more
172 intuitive, and far easier to implement and maintain. But the new
173 "rule" ranking method is also very flexible -- it passes the test suite
174 which was written for the "constant" ranking method.
175
177 Copyright 2012 Jeffrey Kegler
178 This file is part of Marpa::XS. Marpa::XS is free software: you can
179 redistribute it and/or modify it under the terms of the GNU Lesser
180 General Public License as published by the Free Software Foundation,
181 either version 3 of the License, or (at your option) any later version.
182
183 Marpa::XS is distributed in the hope that it will be useful,
184 but WITHOUT ANY WARRANTY; without even the implied warranty of
185 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
186 Lesser General Public License for more details.
187
188 You should have received a copy of the GNU Lesser
189 General Public License along with Marpa::XS. If not, see
190 http://www.gnu.org/licenses/.
191
192
193
194perl v5.34.0 2022-01-21 Marpa::XS::Semantics::Order(3)