Marpa::XS::Recognizer(3pm)

User Contributed Perl Documentation

NAME

       Marpa::XS::Recognizer - Marpa recognizers

SYNOPSIS

           my $recce = Marpa::XS::Recognizer->new( { grammar => $grammar } );
           $recce->read( 'Number', 42 );
           $recce->read( 'Multiply', );
           $recce->read( 'Number', 1 );
           $recce->read( 'Add', );
           $recce->read( 'Number', 7 );

DESCRIPTION

       To create a recognizer object, use the "new" method.

       To read input, use the "read" method.

       To evaluate a parse tree, based on the input, use the "value" method.

   Token streams
       By default, Marpa uses the token-stream model of input.  The
       token-stream model is standard -- so standard that most documents
       about parsing do not bother to describe it.  In the token-stream
       model, each read adds a token at the current location, then advances
       the current location by one.  The location before any input is
       numbered 0 and, if N tokens are parsed, they fill the locations from
       1 to N.

       This document will describe only the token-stream model of input.
       Marpa allows other models of the input, but their use requires special
       method calls, which are described in the document on alternative input
       models.

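       The numbering can be illustrated with a small plain-Perl sketch.  It
       does not call Marpa at all; the names @tokens, $current_location and
       %token_at are hypothetical and exist only to show that N reads fill
       the locations from 1 to N.

```perl
use strict;
use warnings;

# Hypothetical bookkeeping only: this mimics the location numbering of the
# token-stream model and does not use Marpa itself.
my @tokens = (
    [ 'Number', 42 ], ['Multiply'], [ 'Number', 1 ], ['Add'], [ 'Number', 7 ],
);

my $current_location = 0;    # the location before any input
my %token_at;                # location => name of the token that fills it

for my $token (@tokens) {
    my ($name) = @{$token};
    my $filled = $current_location + 1;    # the Nth token fills location N
    $token_at{$filled} = $name;
    $current_location = $filled;           # each read advances by one
}

printf "%d tokens read, current location is %d\n",
    scalar @tokens, $current_location;
```

       Location 0 never holds a token in this sketch, which matches the
       description of location 0 as the location before any input.
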

CONSTRUCTOR

   new
           my $recce = Marpa::XS::Recognizer->new( { grammar => $grammar } );

       The "new" method creates a recognizer object.  The "new" method either
       returns a new recognizer object or throws an exception.

       The arguments to the "new" method are references to hashes of named
       arguments.  In each key/value pair of these hashes, the key is the
       argument name, and the hash value is the value of the argument.  The
       named arguments are described below.

ACCESSORS

   terminals_expected
           my $terminals_expected = $recce->terminals_expected();

       Returns a reference to a list of strings, where the strings are the
       names of the terminals acceptable at the current location.  In the
       default input model, the presence of a terminal in this list means that
       terminal will be acceptable in the next "read" method call.  This is
       highly useful for Ruby Slippers parsing.

   check_terminal
           my $is_symbol_a_terminal = $recce->check_terminal('Document');

       Returns a Perl true when its argument is the name of a terminal symbol.
       Otherwise, returns a Perl false.  Not often needed.

MUTATORS

   read
           $recce->read( 'Number', 42 );
           $recce->read( 'Multiply', );
           $recce->read( 'Number', 1 );
           $recce->read( 'Add', );
           $recce->read( 'Number', 7 );

       The "read" method reads one token at the current parse location.  It
       then advances the current location by 1.

       "read" takes two arguments: a token name and a token value.  The token
       name is required.  It must be the name of a valid terminal symbol.  The
       token value is optional.  It defaults to a Perl "undef".  For details
       about terminal symbols, see "Terminals" in Marpa::XS::Grammar.

       The parser may accept or reject the token.  If the parser accepted the
       token, the "read" method returns the number of tokens which are
       acceptable at the new current location.  This number may be helpful in
       guiding Ruby Slippers parsing.

       "read" may return zero, which means that the next "read" call must
       fail, because there is no token that will be acceptable to it.  In the
       default input model, where the "read" method is the only means of
       inputting tokens, a zero return from a "read" method means that the
       parse is exhausted -- that no more input is possible.  More details on
       "exhaustion" are in a section below.

       Marpa may reject a token because it is not one of those acceptable at
       the current location.  When this happens, "read" returns a Perl
       "undef".  A rejected token need not end parsing -- it is perfectly
       possible to retry the "read" call with another token.  This is, in
       fact, an important technique in Ruby Slippers parsing.  For details,
       see the section on Ruby Slippers parsing.

       For other failures, including an attempt to "read" a token into an
       exhausted parser, Marpa throws an exception.

   set
           $recce->set( { max_parses => 10, } );

       The "set" method's arguments are references to hashes of named
       arguments.  The "set" method can be used to set or change named
       arguments after the recognizer has been created.  Details of the named
       arguments are below.

   value
           my $value_ref = $recce->value;
           my $value = $value_ref ? ${$value_ref} : 'No Parse';

       Because Marpa parses ambiguous grammars, every parse is a series of
       zero or more parse trees.  There are zero parse trees if there was no
       valid parse of the input according to the grammar.

       The "value" method call evaluates the next parse tree in the parse
       series, and returns a reference to the parse result for that parse
       tree.  If there are no more parse trees, the "value" method returns
       "undef".

       The "value" method's arguments are references to hashes of named
       arguments.  Details of the named arguments are below.

       Several of the recognizer's named arguments are not allowed once
       evaluation has begun.  Evaluation of a parse series begins with its
       first "value" method call.  As a convenience, the named arguments of
       the first "value" method call are treated as having been specified
       BEFORE evaluation begins.

       For example, the "end" named argument, which specifies the end of
       parsing location in the input stream, may not be changed after
       evaluation begins.  The first "value" call of an ambiguous parse series
       may have an "end" named argument, and its value will apply to the
       entire parse series.  If any subsequent call to "value" for that parse
       series has an "end" named argument specified, an exception will be
       thrown.

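       The "read" and "value" calls above can be put together as in the
       following sketch.  The grammar and the "My_Actions" package are
       assumptions: they are adapted from the Marpa::XS synopsis and are not
       defined in this document.  The loop relies only on the behavior
       described above, namely that "value" returns a reference for each
       parse tree and "undef" when the parse series is finished.

```perl
use strict;
use warnings;
use Marpa::XS;

# Assumed grammar, adapted from the Marpa::XS synopsis; it is not defined
# in this document.
my $grammar = Marpa::XS::Grammar->new(
    {   start          => 'Expression',
        actions        => 'My_Actions',
        default_action => 'first_arg',
        rules          => [
            { lhs => 'Expression', rhs => [qw/Term/] },
            { lhs => 'Term',       rhs => [qw/Factor/] },
            { lhs => 'Factor',     rhs => [qw/Number/] },
            { lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
            {   lhs    => 'Factor',
                rhs    => [qw/Factor Multiply Factor/],
                action => 'do_multiply',
            },
        ],
    }
);
$grammar->precompute();

# Semantic actions.  The first argument is the per-parse variable, which
# these actions ignore.
sub My_Actions::first_arg { my ( undef, $first ) = @_; return $first; }

sub My_Actions::do_add {
    my ( undef, $left, undef, $right ) = @_;
    return $left + $right;
}

sub My_Actions::do_multiply {
    my ( undef, $left, undef, $right ) = @_;
    return $left * $right;
}

my $recce = Marpa::XS::Recognizer->new( { grammar => $grammar } );

for my $token (
    [ 'Number', 42 ], ['Multiply'], [ 'Number', 1 ], ['Add'], [ 'Number', 7 ] )
{
    # A defined return means the token was accepted.
    defined $recce->read( @{$token} )
        or die "Token rejected: $token->[0]";
}

# Each "value" call evaluates the next parse tree; undef ends the series.
my @results;
while ( defined( my $value_ref = $recce->value() ) ) {
    push @results, ${$value_ref};
}

print scalar(@results) ? "Results: @results\n" : "No Parse\n";
```

       Collecting the results in a loop, rather than calling "value" once,
       keeps the same code usable when the grammar is ambiguous and the
       parse series contains more than one parse tree.
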

TRACE ACCESSORS

   show_earley_sets
           print $recce->show_earley_sets()
               or die "print failed: $ERRNO";

       An advanced, internals-oriented tracing method, which will not be of
       interest to most users.  Most users will want to use the
       "show_progress" method instead.  "show_earley_sets" returns a
       multi-line string listing every Earley item in every Earley set.

   show_progress
           print $recce->show_progress()
               or die "print failed: $ERRNO";

       Returns a string describing the progress of the parse.  With no
       arguments, the string contains reports for the current location.  With
       a single integer argument N, the string contains reports for location
       N.  With two numeric arguments, N and M, the arguments are interpreted
       as a range of locations and the returned string contains reports for
       all locations in the range.

       If an argument is negative, -N, it indicates the Nth location counting
       backward from the furthest location of the parse.  For example, if 42
       was the furthest location, -1 would be location 42 and -2 would be
       location 41.  Similarly, the method call
       "$recce->show_progress(-3, -1)" returns reports for the last three
       locations of the parse.  The method call
       "$recce->show_progress(0, -1)" returns progress reports for the entire
       parse.

       "show_progress" is Marpa's most powerful tool for debugging application
       grammars.  It can also be used to track the progress of a parse or to
       investigate how a parse works.  A much fuller description, with an
       example, is in the document on debugging Marpa grammars.

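       The arithmetic behind negative locations can be sketched in plain
       Perl.  The helper "resolve_location" is hypothetical and is not part
       of Marpa; it only reproduces the rule stated above, where -1 is the
       furthest location and -2 is the one before it.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Marpa: map a show_progress-style
# location argument to an absolute location, given the furthest location.
sub resolve_location {
    my ( $location, $furthest ) = @_;
    return $location if $location >= 0;    # already absolute
    return $furthest + 1 + $location;      # -1 => furthest, -2 => furthest - 1
}

my $furthest = 42;
for my $argument ( 0, -1, -2, -3 ) {
    printf "%3d => %d\n", $argument, resolve_location( $argument, $furthest );
}
```

       With a furthest location of 42, the range (-3, -1) therefore covers
       locations 40 through 42, and (0, -1) covers locations 0 through 42.
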

NAMED ARGUMENTS

       The recognizer's named arguments are accepted by its "new", "set" and
       "value" methods.  Most of these named arguments are valid everywhere,
       although depending on their meaning, they may have no effect.  For
       example, the "too_many_earley_items" named argument will have no effect
       after input is complete.

       Some of the recognizer's named arguments are not valid for all calls of
       the "new", "set" and "value" methods, and cause an exception if used in
       the wrong one.  For example, a "grammar" named argument is only valid
       when a recognizer is being constructed by the "new" method.  If a named
       argument can cause an exception due to use in the wrong method call,
       this is mentioned in its description below.

   closures
       The value of the "closures" named argument must be a reference to a
       hash.  In each key/value pair of this hash, the key must be an action
       name.  The hash value must be a CODE ref.  The "closures" named
       argument is not allowed once evaluation has begun.

       When an action name is a key in the "closures" named argument, the
       usual action resolution mechanism of the semantics is bypassed.  One
       common use of the "closures" named argument is to allow anonymous
       subroutines to be semantic actions.  For more details, see the document
       on semantics.

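       A minimal sketch of anonymous subroutines as semantic actions follows.
       The grammar, the action names "first_arg", "do_add" and "do_multiply",
       and the choice to give every rule an explicit action are assumptions
       made for this illustration; only the shape of the "closures" hash
       (action name as key, CODE ref as value) comes from the description
       above.

```perl
use strict;
use warnings;
use Marpa::XS;

# Assumed grammar for illustration.  Every rule names its action, so that
# each action can be supplied through the "closures" hash.
my $grammar = Marpa::XS::Grammar->new(
    {   start => 'Expression',
        rules => [
            { lhs => 'Expression', rhs => [qw/Term/],   action => 'first_arg' },
            { lhs => 'Term',       rhs => [qw/Factor/], action => 'first_arg' },
            { lhs => 'Factor',     rhs => [qw/Number/], action => 'first_arg' },
            { lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
            {   lhs    => 'Factor',
                rhs    => [qw/Factor Multiply Factor/],
                action => 'do_multiply',
            },
        ],
    }
);
$grammar->precompute();

# Keys are action names, values are CODE refs.  The first argument of each
# action is the per-parse variable, which is ignored here.
my $recce = Marpa::XS::Recognizer->new(
    {   grammar  => $grammar,
        closures => {
            first_arg   => sub { my ( undef, $first ) = @_; return $first; },
            do_add      => sub { return $_[1] + $_[3]; },
            do_multiply => sub { return $_[1] * $_[3]; },
        },
    }
);

for my $token (
    [ 'Number', 42 ], ['Multiply'], [ 'Number', 1 ], ['Add'], [ 'Number', 7 ] )
{
    defined $recce->read( @{$token} )
        or die "Token rejected: $token->[0]";
}

my $value_ref = $recce->value();
my $value = $value_ref ? ${$value_ref} : 'No Parse';
print "Value: $value\n";
```

       The closures are passed to "new" here because, as noted above, the
       "closures" named argument is not allowed once evaluation has begun.
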
   end
       The "end" named argument specifies the parse end location.  The default
       is for the parse to end where the input did, so that the parse returned
       is of the entire input.  The "end" named argument is not allowed once
       evaluation has begun.

   grammar
       The "new" method is required to have a "grammar" named argument.  Its
       value must be a precomputed Marpa grammar object.  The "grammar" named
       argument is not allowed anywhere else.

   max_parses
       The value must be an integer.  If it is greater than zero, the
       evaluator will return no more than that number of parse results.  If it
       is zero, there will be no limit on the number of parse results
       returned.  The default is for there to be no limit.

       Marpa allows extremely ambiguous grammars.  "max_parses" can be used if
       the user wants to see only the first few parse results of an ambiguous
       parse.  "max_parses" is also useful to limit CPU usage and output
       length when testing and debugging.

   ranking_method
       The value must be a string: one of "none", "rule", or
       "high_rule_only".  When the value is "none", Marpa returns the
       parse results in arbitrary order.  This is the default.  The
       "ranking_method" named argument is not allowed once evaluation has
       begun.

       The "rule" and "high_rule_only" ranking methods allow the user to
       control the order in which parse results are returned by the "value"
       method, and to exclude some parse results from the parse series.  For
       details, see the document on parse order.

   too_many_earley_items
       The "too_many_earley_items" argument is optional.  If specified, it
       sets the Earley item warning threshold.  If an Earley set becomes
       larger than the Earley item warning threshold, a warning is printed to
       the trace file handle.

       Marpa parses from any BNF, and can handle grammars and inputs which
       produce large Earley sets.  But parsing that involves large Earley sets
       can be slow.  Large Earley sets are something most applications can,
       and will wish to, avoid.

       By default, Marpa calculates an Earley item warning threshold based on
       the size of the grammar.  The default threshold will never be less than
       100.  If the Earley item warning threshold is set to 0, warnings about
       large Earley sets are turned off.

   trace_actions
       The "value" method's "trace_actions" named argument is a boolean.  If
       the boolean value is true, Marpa prints tracing information as it
       resolves action names to Perl closures.  A boolean value of false turns
       tracing off, which is the default.  Traces are written to the trace
       file handle.

   trace_file_handle
       The value is a file handle.  Traces and warning messages go to the
       trace file handle.  By default the trace file handle is inherited from
       the grammar used to create the recognizer.

   trace_terminals
       Very handy in debugging, and often useful even when the problem is not
       in the lexing.  The value is a trace level.  When the trace level is 0,
       tracing of terminals is off.  This is the default.

       At a trace level of 1 or higher, Marpa produces a trace message for
       each terminal as it is accepted or rejected by the recognizer.  At a
       trace level of 2 or higher, the trace messages include, for every
       location, a list of the terminals expected.  In practical grammars,
       output from trace level 2 can be voluminous.

   trace_values
       The "value" method's "trace_values" named argument is a numeric trace
       level.  If the numeric trace level is 1, Marpa prints tracing
       information as values are computed in the evaluation stack.  A trace
       level of 0 turns value tracing off, which is the default.  Traces are
       written to the trace file handle.

   warnings
       The value is a boolean.  Warnings are written to the trace file handle.
       By default, the recognizer's warnings are on.  Usually, an application
       will want to leave them on.

RUBY SLIPPERS PARSING

           $recce = Marpa::XS::Recognizer->new( { grammar => $grammar } );

           my @tokens = (
               [ 'Number', 42 ],
               ['Multiply'], [ 'Number', 1 ],
               ['Add'],      [ 'Number', 7 ],
           );

           TOKEN: for ( my $token_ix = 0; $token_ix <= $#tokens; $token_ix++ ) {
               defined $recce->read( @{ $tokens[$token_ix] } )
                   or fix_things( $recce, \@tokens )
                   or die q{Don't know how to fix things};
           }

       Marpa is able to tell the application which symbols are acceptable as
       tokens at the next location in the parse.  The "terminals_expected"
       method returns the list of tokens that will be accepted by the next
       "read".  The application can use this information to change the input
       "on the fly" so that it is acceptable to the parser.

       An application can also take a "try it and see" approach.  If an
       application is not sure whether a token is acceptable or not, the
       application can try to read the dubious token using the "read" method.
       If the token is rejected, the "read" method call will return a Perl
       "undef".  At that point, the application can retry the "read" with a
       different token.

   An example
       Marpa's HTML parser, Marpa::HTML, is an example of how Ruby Slippers
       parsing can help with a non-trivial, real-life application.  When a
       token is rejected in Marpa::HTML, it changes the input to match the
       parser's expectations by

       •   Modifying existing tokens, and

       •   Creating new tokens.

       The second technique, the creation of new "virtual" tokens, is used by
       Marpa::HTML to deal with omitted start and end tags.  The actual HTML
       grammar that Marpa::HTML uses takes an oversimplified view of the HTML
       -- it assumes, even when the HTML standards do not require it, that
       start and end tags are always present.  For most HTML files of
       interest, this assumption will be contrary to fact.

       Ruby Slippers parsing is used to make the grammar's over-simplistic
       view of the world come true for it.  Whenever a token is rejected,
       Marpa::HTML looks at the expected tokens list.  If it sees that a start
       or end tag is expected, Marpa::HTML creates a token for it -- a
       completely new "virtual" token that gives the parser exactly what it
       expects.  Marpa::HTML then resumes input at the point in the original
       input stream where it left off.

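       The "try it and see" approach and the creation of virtual tokens can
       be combined in a small sketch.  This is not how Marpa::HTML is
       implemented.  The grammar (adapted from the Marpa::XS synopsis), the
       input with a missing operator, and the repair policy of supplying a
       virtual "Multiply" token are all assumptions made for illustration.

```perl
use strict;
use warnings;
use Marpa::XS;

# Assumed grammar, adapted from the Marpa::XS synopsis.
my $grammar = Marpa::XS::Grammar->new(
    {   start          => 'Expression',
        actions        => 'My_Actions',
        default_action => 'first_arg',
        rules          => [
            { lhs => 'Expression', rhs => [qw/Term/] },
            { lhs => 'Term',       rhs => [qw/Factor/] },
            { lhs => 'Factor',     rhs => [qw/Number/] },
            { lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
            {   lhs    => 'Factor',
                rhs    => [qw/Factor Multiply Factor/],
                action => 'do_multiply',
            },
        ],
    }
);
$grammar->precompute();

sub My_Actions::first_arg   { return $_[1]; }
sub My_Actions::do_add      { return $_[1] + $_[3]; }
sub My_Actions::do_multiply { return $_[1] * $_[3]; }

my $recce = Marpa::XS::Recognizer->new( { grammar => $grammar } );

# Hypothetical input in which the operator between 42 and 1 is missing.
my @tokens = ( [ 'Number', 42 ], [ 'Number', 1 ], ['Add'], [ 'Number', 7 ] );
my @virtual_tokens;    # names of the tokens invented by the repair policy

TOKEN: for my $token (@tokens) {

    # Try it and see: a defined return means the token was accepted.
    next TOKEN if defined $recce->read( @{$token} );

    # Rejected.  Ask the recognizer what it would accept here.
    my %expected = map { $_ => 1 } @{ $recce->terminals_expected() };

    # Assumed repair policy: treat a missing operator as multiplication.
    $expected{Multiply}
        or die "Don't know how to fix rejected token $token->[0]";
    defined $recce->read('Multiply')
        or die 'Virtual Multiply token was rejected';
    push @virtual_tokens, 'Multiply';

    # Resume with the original token.
    defined $recce->read( @{$token} )
        or die "Token $token->[0] still rejected after repair";
}

my $value_ref = $recce->value();
my $value = $value_ref ? ${$value_ref} : 'No Parse';
print "Value: $value\n";
```

       Consulting "terminals_expected" only after a rejection keeps the
       common path, where the token is accepted, down to a single "read"
       call per token.
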

PARSE EXHAUSTION

       A parse is exhausted when it will accept no more input.  In the default
       input model, the "read" method indicates this by returning zero.

       An exhausted parse is not necessarily a failed parse.  Grammars are
       often written so that once they "find what they are looking for", no
       further input is acceptable.  Grammars of that kind become exhausted
       when they succeed.

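       A grammar that becomes exhausted when it succeeds can be sketched as
       follows.  The two-token grammar and its symbol names are hypothetical.
       The check on the return value of "read" follows the description
       above: a defined, zero return in the default input model means that
       no further token can be accepted.

```perl
use strict;
use warnings;
use Marpa::XS;

# Hypothetical grammar: exactly two tokens, after which nothing more can
# be accepted.
my $grammar = Marpa::XS::Grammar->new(
    {   start => 'Greeting',
        rules => [ { lhs => 'Greeting', rhs => [qw/Hello World/] } ],
    }
);
$grammar->precompute();

my $recce = Marpa::XS::Recognizer->new( { grammar => $grammar } );

my $after_hello = $recce->read( 'Hello', 'hello' );
my $after_world = $recce->read( 'World', 'world' );

# undef would mean the token was rejected; zero means it was accepted and
# the parse is now exhausted.
my $is_exhausted = defined $after_world && $after_world == 0;

# Exhaustion is not failure: the parse can still be evaluated.
my $value_ref = $recce->value();

print 'Exhausted: ', ( $is_exhausted ? 'yes' : 'no' ), "\n";
print 'Parsed:    ', ( defined $value_ref ? 'yes' : 'no' ), "\n";
```

       Testing "defined" before comparing with zero distinguishes an
       exhausted parse from a rejected token, since both results are false
       in Perl.
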

COPYRIGHT AND LICENSE

         Copyright 2012 Jeffrey Kegler
         This file is part of Marpa::XS.  Marpa::XS is free software: you can
         redistribute it and/or modify it under the terms of the GNU Lesser
         General Public License as published by the Free Software Foundation,
         either version 3 of the License, or (at your option) any later version.

         Marpa::XS is distributed in the hope that it will be useful,
         but WITHOUT ANY WARRANTY; without even the implied warranty of
         MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
         Lesser General Public License for more details.

         You should have received a copy of the GNU Lesser
         General Public License along with Marpa::XS.  If not, see
         http://www.gnu.org/licenses/.

perl v5.38.0                      2023-07-20        Marpa::XS::Recognizer(3pm)