Marpa::XS::Recognizer(3pm)    User Contributed Perl Documentation    Marpa::XS::Recognizer(3pm)

NAME
Marpa::XS::Recognizer - Marpa recognizers

SYNOPSIS
    my $recce = Marpa::XS::Recognizer->new( { grammar => $grammar } );
    $recce->read( 'Number', 42 );
    $recce->read( 'Multiply', );
    $recce->read( 'Number', 1 );
    $recce->read( 'Add', );
    $recce->read( 'Number', 7 );

DESCRIPTION
To create a recognizer object, use the "new" method.

To read input, use the "read" method.

To evaluate a parse tree, based on the input, use the "value" method.

Token streams
By default, Marpa uses the token-stream model of input. The
token-stream model is standard -- so standard that most documents about
parsing do not bother to describe it. In the token-stream model, each
read adds a token at the current location, then advances the current
location by one. The location before any input is numbered 0, and if N
tokens are read, they fill the locations from 1 to N.

This document describes only the token-stream model of input.
Marpa allows other models of the input, but their use requires special
method calls, which are described in the document on alternative input
models.

new
    my $recce = Marpa::XS::Recognizer->new( { grammar => $grammar } );

The "new" method creates a recognizer object. The "new" method either
returns a new recognizer object or throws an exception.

The arguments to the "new" method are references to hashes of named
arguments. In each key/value pair of these hashes, the key is the
argument name, and the hash value is the value of the argument. The
named arguments are described below.

terminals_expected
    my $terminals_expected = $recce->terminals_expected();

Returns a reference to a list of strings, where the strings are the
names of the terminals acceptable at the current location. In the
default input model, the presence of a terminal in this list means that
terminal will be acceptable in the next "read" method call. This is
highly useful for Ruby Slippers parsing.

check_terminal
    my $is_symbol_a_terminal = $recce->check_terminal('Document');

Returns a Perl true when its argument is the name of a terminal symbol.
Otherwise, returns a Perl false. Not often needed.

read
    $recce->read( 'Number', 42 );
    $recce->read( 'Multiply', );
    $recce->read( 'Number', 1 );
    $recce->read( 'Add', );
    $recce->read( 'Number', 7 );

The "read" method reads one token at the current parse location. It
then advances the current location by 1.

"read" takes two arguments: a token name and a token value. The token
name is required. It must be the name of a valid terminal symbol. The
token value is optional. It defaults to a Perl "undef". For details
about terminal symbols, see "Terminals" in Marpa::XS::Grammar.

The parser may accept or reject the token. If the parser accepts the
token, the "read" method returns the number of tokens which are
acceptable at the new current location. This number may be helpful in
guiding Ruby Slippers parsing.

"read" may return zero, which means that the next "read" call must
fail, because there is no token that will be acceptable to it. In the
default input model, where the "read" method is the only means of
inputting tokens, a zero return from a "read" method means that the
parse is exhausted -- that no more input is possible. More details on
"exhaustion" are in a section below.

Marpa may reject a token because it is not one of those acceptable at
the current location. When this happens, "read" returns a Perl
"undef". A rejected token need not end parsing -- it is perfectly
possible to retry the "read" call with another token. This is, in
fact, an important technique in Ruby Slippers parsing. For details,
see the section on Ruby Slippers parsing.

For other failures, including an attempt to "read" a token into an
exhausted parser, Marpa throws an exception.
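
The four outcomes of "read" described above (a positive count, zero, "undef", or an exception) can be told apart as in the following sketch. The grammar follows the arithmetic rules of the Marpa::XS synopsis, with the semantic actions left out because nothing is evaluated here. The helper "classify_read" is a hypothetical name introduced only for this illustration; it is not part of Marpa::XS.

```perl
use strict;
use warnings;
use Marpa::XS;

# Arithmetic grammar modelled on the Marpa::XS synopsis (actions omitted,
# since this sketch never calls "value").
my $grammar = Marpa::XS::Grammar->new(
    {   start => 'Expression',
        rules => [
            { lhs => 'Expression', rhs => [qw/Term/] },
            { lhs => 'Term',       rhs => [qw/Factor/] },
            { lhs => 'Factor',     rhs => [qw/Number/] },
            { lhs => 'Term',       rhs => [qw/Term Add Term/] },
            { lhs => 'Factor',     rhs => [qw/Factor Multiply Factor/] },
        ],
    }
);
$grammar->precompute();

my $recce = Marpa::XS::Recognizer->new( { grammar => $grammar } );

# Hypothetical helper: map the documented outcomes of read() to a label.
sub classify_read {
    my ( $recce, @token ) = @_;
    my $count;
    my $ok = eval { $count = $recce->read(@token); 1 };
    return 'exception' if not $ok;              # e.g. reading into an exhausted parser
    return 'rejected'  if not defined $count;   # token not acceptable here; may retry
    return 'exhausted' if $count == 0;          # accepted, but nothing can follow
    return 'accepted';                          # accepted, $count tokens acceptable next
}

my $first  = classify_read( $recce, 'Number', 42 );
my $second = classify_read( $recce, 'Number', 1 );    # a second Number in a row
my $third  = classify_read( $recce, 'Multiply' );     # retry with another token

print "first: $first, second: $second, third: $third\n";
```

The second call is expected to be rejected, since this grammar does not allow two "Number" tokens in a row, and the third call shows that the recognizer can still be used after a rejection.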

set
    $recce->set( { max_parses => 10, } );

The "set" method's arguments are references to hashes of named
arguments. The "set" method can be used to set or change named
arguments after the recognizer has been created. Details of the named
arguments are below.

value
    my $value_ref = $recce->value;
    my $value = $value_ref ? ${$value_ref} : 'No Parse';

Because Marpa parses ambiguous grammars, every parse is a series of
zero or more parse trees. There are zero parse trees if there was no
valid parse of the input according to the grammar.

The "value" method call evaluates the next parse tree in the parse
series, and returns a reference to the parse result for that parse
tree. If there are no more parse trees, the "value" method returns
"undef".

The "value" method's arguments are references to hashes of named
arguments. Details of the named arguments are below.

Several of the recognizer's named arguments are not allowed once
evaluation has begun. Evaluation of a parse series begins with its
first "value" method call. As a convenience, the named arguments of
the first "value" method call are treated as having been specified
BEFORE evaluation begins.

For example, the "end" named argument, which specifies the end of
parsing location in the input stream, may not be changed after
evaluation begins. The first "value" call of an ambiguous parse series
may have an "end" named argument, and its value will apply to the
entire parse series. If any subsequent call to "value" for that parse
series has an "end" named argument specified, an exception will be
thrown.
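
A parse series can be walked by calling "value" until it returns "undef". The sketch below assumes a deliberately ambiguous grammar (the symbol "E", the package "My_Actions" and its action names are chosen for this illustration only). Since the default ranking method returns results in arbitrary order, the results are sorted before printing.

```perl
use strict;
use warnings;
use Marpa::XS;

# Deliberately ambiguous: no precedence between Add and Multiply.
my $grammar = Marpa::XS::Grammar->new(
    {   start   => 'E',
        actions => 'My_Actions',
        rules   => [
            { lhs => 'E', rhs => [qw/E Add E/],      action => 'do_add' },
            { lhs => 'E', rhs => [qw/E Multiply E/], action => 'do_multiply' },
            { lhs => 'E', rhs => [qw/Number/],       action => 'first_arg' },
        ],
    }
);
$grammar->precompute();

# Each action receives a per-parse variable first, then the child values.
sub My_Actions::do_add {
    my ( undef, $left, undef, $right ) = @_;
    return $left + $right;
}

sub My_Actions::do_multiply {
    my ( undef, $left, undef, $right ) = @_;
    return $left * $right;
}

sub My_Actions::first_arg {
    my ( undef, $first ) = @_;
    return $first;
}

my $recce = Marpa::XS::Recognizer->new( { grammar => $grammar } );
for my $token ( [ 'Number', 42 ], ['Multiply'], [ 'Number', 1 ], ['Add'], [ 'Number', 7 ] ) {
    defined $recce->read( @{$token} ) or die "Token rejected: $token->[0]";
}

# Each call evaluates the next parse tree; undef ends the series.
my @results;
while ( defined( my $value_ref = $recce->value() ) ) {
    push @results, ${$value_ref};
}

my @sorted = sort { $a <=> $b } @results;
print "Parse results: @sorted\n";
```

With this grammar the input has two readings, (42 * 1) + 7 and 42 * (1 + 7), so two parse results are expected.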

show_earley_sets
    print $recce->show_earley_sets()
        or die "print failed: $ERRNO";

An advanced, internals-oriented tracing method, which will not be of
interest to most users. Most users will want to use the
"show_progress" method instead. "show_earley_sets" returns a
multi-line string listing every Earley item in every Earley set.

show_progress
    print $recce->show_progress()
        or die "print failed: $ERRNO";

Returns a string describing the progress of the parse. With no
arguments, the string contains reports for the current location. With
a single integer argument N, the string contains reports for location
N. With two numeric arguments, N and M, the arguments are interpreted
as a range of locations and the returned string contains reports for
all locations in the range.

If an argument is negative, -N, it indicates the Nth location counting
backward from the furthest location of the parse. For example, if 42
was the furthest location, -1 would be location 42 and -2 would be
location 41. Similarly, the method call "$recce->show_progress(-3,
-1)" returns reports for the last three locations of the parse, and
the method call "$recce->show_progress(0, -1)" returns progress reports
for the entire parse.
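
The arithmetic behind negative locations can be spelled out in a few lines of plain Perl. This is only an illustration of the rule stated above; "resolve_location" is a hypothetical helper, not a Marpa::XS function, and it makes no claim about how Marpa implements the rule internally.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a possibly negative location argument into an
# absolute location, given the furthest location of the parse.
sub resolve_location {
    my ( $location, $furthest ) = @_;
    return $location if $location >= 0;
    return $furthest + 1 + $location;    # -1 is the furthest location itself
}

my $furthest = 42;
my @last_three = map { resolve_location( $_, $furthest ) } ( -3, -2, -1 );
my @whole_parse = ( resolve_location( 0, $furthest ), resolve_location( -1, $furthest ) );

print "Last three locations: @last_three\n";
print "Whole parse runs from $whole_parse[0] to $whole_parse[1]\n";
```

With 42 as the furthest location, the range (-3, -1) covers locations 40 through 42, and (0, -1) covers 0 through 42.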

"show_progress" is Marpa's most powerful tool for debugging application
grammars. It can also be used to track the progress of a parse or to
investigate how a parse works. A much fuller description, with an
example, is in the document on debugging Marpa grammars.

NAMED ARGUMENTS
The recognizer's named arguments are accepted by its "new", "set" and
"value" methods. Most of these named arguments are valid everywhere,
although depending on their meaning, they may have no effect. For
example, the "too_many_earley_items" named argument will have no effect
after input is complete.

Some of the recognizer's named arguments are not valid for all calls of
the "new", "set" and "value" methods, and cause an exception if used in
the wrong one. For example, a "grammar" named argument is only valid
when a recognizer is being constructed by the "new" method. If a named
argument can cause an exception due to use in the wrong method call,
this is mentioned in its description below.

closures
The value of the "closures" named argument must be a reference to a
hash. In each key/value pair of this hash, the key must be an action
name. The hash value must be a CODE ref. The "closures" named argument
is not allowed once evaluation has begun.

When an action name is a key in the "closures" named argument, the
usual action resolution mechanism of the semantics is bypassed. One
common use of the "closures" named argument is to allow anonymous
subroutines to be semantic actions. For more details, see the document
on semantics.
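
As a sketch of that common use, the example below supplies every semantic action as an anonymous subroutine through "closures". The grammar follows the arithmetic rules of the Marpa::XS synopsis; the action names ("first_arg", "do_add", "do_multiply") are ordinary strings chosen for this illustration, and each rule names its action explicitly so that no action package is needed.

```perl
use strict;
use warnings;
use Marpa::XS;

my $grammar = Marpa::XS::Grammar->new(
    {   start => 'Expression',
        rules => [
            { lhs => 'Expression', rhs => [qw/Term/],   action => 'first_arg' },
            { lhs => 'Term',       rhs => [qw/Factor/], action => 'first_arg' },
            { lhs => 'Factor',     rhs => [qw/Number/], action => 'first_arg' },
            { lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
            {   lhs    => 'Factor',
                rhs    => [qw/Factor Multiply Factor/],
                action => 'do_multiply'
            },
        ],
    }
);
$grammar->precompute();

# Action names that appear as keys here bypass the usual action resolution.
my $recce = Marpa::XS::Recognizer->new(
    {   grammar  => $grammar,
        closures => {
            first_arg => sub { my ( undef, $first ) = @_; return $first; },
            do_add    => sub {
                my ( undef, $left, undef, $right ) = @_;
                return $left + $right;
            },
            do_multiply => sub {
                my ( undef, $left, undef, $right ) = @_;
                return $left * $right;
            },
        },
    }
);

for my $token ( [ 'Number', 42 ], ['Multiply'], [ 'Number', 1 ], ['Add'], [ 'Number', 7 ] ) {
    defined $recce->read( @{$token} ) or die "Token rejected: $token->[0]";
}

my $value_ref = $recce->value;
my $value = $value_ref ? ${$value_ref} : 'No Parse';
print "Value: $value\n";
```

Because "closures" is not allowed once evaluation has begun, it is passed to "new" here rather than set after the first "value" call.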

end
The "end" named argument specifies the parse end location. The default
is for the parse to end where the input did, so that the parse returned
is of the entire input. The "end" named argument is not allowed once
evaluation has begun.

grammar
The "new" method is required to have a "grammar" named argument. Its
value must be a precomputed Marpa grammar object. The "grammar" named
argument is not allowed anywhere else.

max_parses
The value must be an integer. If it is greater than zero, the
evaluator will return no more than that number of parse results. If it
is zero, there will be no limit on the number of parse results
returned. The default is for there to be no limit.

Marpa allows extremely ambiguous grammars. "max_parses" can be used if
the user wants to see only the first few parse results of an ambiguous
parse. "max_parses" is also useful to limit CPU usage and output
length when testing and debugging.

ranking_method
The value must be a string: one of "none", "rule", or
"high_rule_only". When the value is "none", Marpa returns the
parse results in arbitrary order. This is the default. The
"ranking_method" named argument is not allowed once evaluation has
begun.

The "rule" and "high_rule_only" ranking methods allow the user to
control the order in which parse results are returned by the "value"
method, and to exclude some parse results from the parse series. For
details, see the document on parse order.

too_many_earley_items
The "too_many_earley_items" argument is optional. If specified, it
sets the Earley item warning threshold. If an Earley set becomes
larger than the Earley item warning threshold, a warning is printed to
the trace file handle.

Marpa parses from any BNF, and can handle grammars and inputs which
produce large Earley sets. But parsing that involves large Earley sets
can be slow. Large Earley sets are something most applications can,
and will wish to, avoid.

By default, Marpa calculates an Earley item warning threshold based on
the size of the grammar. The default threshold will never be less than
100. If the Earley item warning threshold is set to 0, warnings about
large Earley sets are turned off.

trace_actions
The "value" method's "trace_actions" named argument is a boolean. If
the boolean value is true, Marpa prints tracing information as it
resolves action names to Perl closures. A boolean value of false turns
tracing off, which is the default. Traces are written to the trace
file handle.

trace_file_handle
The value is a file handle. Traces and warning messages go to the
trace file handle. By default the trace file handle is inherited from
the grammar used to create the recognizer.

trace_terminals
Very handy in debugging, and often useful even when the problem is not
in the lexing. The value is a trace level. When the trace level is 0,
tracing of terminals is off. This is the default.

At a trace level of 1 or higher, Marpa produces a trace message for
each terminal as it is accepted or rejected by the recognizer. At a
trace level of 2 or higher, the trace messages include, for every
location, a list of the terminals expected. In practical grammars,
output from trace level 2 can be voluminous.

trace_values
The "value" method's "trace_values" named argument is a numeric trace
level. If the numeric trace level is 1, Marpa prints tracing
information as values are computed in the evaluation stack. A trace
level of 0 turns value tracing off, which is the default. Traces are
written to the trace file handle.

warnings
The value is a boolean. Warnings are written to the trace file handle.
By default, the recognizer's warnings are on. Usually, an application
will want to leave them on.

RUBY SLIPPERS PARSING
    $recce = Marpa::XS::Recognizer->new( { grammar => $grammar } );

    my @tokens = (
        [ 'Number', 42 ],
        ['Multiply'], [ 'Number', 1 ],
        ['Add'], [ 'Number', 7 ],
    );

    TOKEN: for ( my $token_ix = 0; $token_ix <= $#tokens; $token_ix++ ) {
        defined $recce->read( @{ $tokens[$token_ix] } )
            or fix_things( $recce, \@tokens )
            or die q{Don't know how to fix things};
    }

Marpa is able to tell the application which symbols are acceptable as
tokens at the next location in the parse. The "terminals_expected"
method returns the list of tokens that will be accepted by the next
"read". The application can use this information to change the input
"on the fly" so that it is acceptable to the parser.

An application can also take a "try it and see" approach. If an
application is not sure whether a token is acceptable or not, the
application can try to read the dubious token using the "read" method.
If the token is rejected, the "read" method call will return a Perl
"undef". At that point, the application can retry the "read" with a
different token.
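
The code at the start of this section calls a "fix_things" routine that is left to the application. One possible shape for such a routine is sketched below, under stated assumptions: the input omits the multiplication operator between adjacent numbers, and the repair is to supply a virtual "Multiply" token whenever "terminals_expected" lists it. The grammar and the "My_Actions" package follow the Marpa::XS synopsis. Unlike the call shown above, this hypothetical "fix_things" takes the rejected token itself rather than a reference to the whole token list.

```perl
use strict;
use warnings;
use Marpa::XS;

my $grammar = Marpa::XS::Grammar->new(
    {   start          => 'Expression',
        actions        => 'My_Actions',
        default_action => 'first_arg',
        rules          => [
            { lhs => 'Expression', rhs => [qw/Term/] },
            { lhs => 'Term',       rhs => [qw/Factor/] },
            { lhs => 'Factor',     rhs => [qw/Number/] },
            { lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
            {   lhs    => 'Factor',
                rhs    => [qw/Factor Multiply Factor/],
                action => 'do_multiply'
            },
        ],
    }
);
$grammar->precompute();

sub My_Actions::do_add {
    my ( undef, $t1, undef, $t2 ) = @_;
    return $t1 + $t2;
}

sub My_Actions::do_multiply {
    my ( undef, $t1, undef, $t2 ) = @_;
    return $t1 * $t2;
}

sub My_Actions::first_arg { shift; return shift; }

# Hypothetical repair routine: if a Multiply is expected, invent one,
# then retry the token that was rejected.
sub fix_things {
    my ( $recce, $rejected_token ) = @_;
    my %expected = map { $_ => 1 } @{ $recce->terminals_expected() };
    return if not $expected{Multiply};
    defined $recce->read('Multiply') or return;
    return defined $recce->read( @{$rejected_token} );
}

my $recce = Marpa::XS::Recognizer->new( { grammar => $grammar } );

# "2 3 + 4": the operator between 2 and 3 is missing from the input.
my @tokens = ( [ 'Number', 2 ], [ 'Number', 3 ], ['Add'], [ 'Number', 4 ] );

for my $token (@tokens) {
    defined $recce->read( @{$token} )
        or fix_things( $recce, $token )
        or die q{Don't know how to fix things};
}

my $value_ref = $recce->value;
my $value = $value_ref ? ${$value_ref} : 'No Parse';
print "Value: $value\n";
```

With the virtual token supplied, the parser sees the input as 2 * 3 + 4.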

An example
Marpa's HTML parser, Marpa::HTML, is an example of how Ruby Slippers
parsing can help with a non-trivial, real-life application. When a
token is rejected in Marpa::HTML, it changes the input to match the
parser's expectations by

•   Modifying existing tokens, and

•   Creating new tokens.

The second technique, the creation of new "virtual" tokens, is used by
Marpa::HTML to deal with omitted start and end tags. The actual HTML
grammar that Marpa::HTML uses takes an oversimplified view of the HTML
-- it assumes, even when the HTML standards do not require it, that
start and end tags are always present. For most HTML files of
interest, this assumption will be contrary to fact.

Ruby Slippers parsing is used to make the grammar's over-simplistic
view of the world come true for it. Whenever a token is rejected,
Marpa::HTML looks at the expected tokens list. If it sees that a start
or end tag is expected, Marpa::HTML creates a token for it -- a
completely new "virtual" token that gives the parser exactly what it
expects. Marpa::HTML then resumes input at the point in the original
input stream where it left off.

PARSE EXHAUSTION
A parse is exhausted when it will accept no more input. In the default
input model, the "read" method indicates this by returning zero.

An exhausted parse is not necessarily a failed parse. Grammars are
often written so that once they "find what they are looking for", no
further input is acceptable. Grammars of that kind become exhausted
when they succeed.

COPYRIGHT AND LICENSE
Copyright 2012 Jeffrey Kegler
This file is part of Marpa::XS. Marpa::XS is free software: you can
redistribute it and/or modify it under the terms of the GNU Lesser
General Public License as published by the Free Software Foundation,
either version 3 of the License, or (at your option) any later version.

Marpa::XS is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.

You should have received a copy of the GNU Lesser
General Public License along with Marpa::XS. If not, see
http://www.gnu.org/licenses/.

perl v5.38.0                      2023-07-20                      Marpa::XS::Recognizer(3pm)