Regex(3) User Contributed Perl Documentation Regex(3)



NAME
YAPE::Regex - Yet Another Parser/Extractor for Regular Expressions

VERSION
This document refers to YAPE::Regex version 4.00.

SYNOPSIS
    use strict;
    use warnings;
    use YAPE::Regex;

    my $regex  = qr/reg(ular\s+)?exp?(ression)?/i;
    my $parser = YAPE::Regex->new($regex);

    # here is the tokenizing part
    while (my $chunk = $parser->next) {
        # each $chunk is the next token, a node object
    }
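The loop body above is left open. What follows is a minimal sketch of one way to fill it. It assumes YAPE::Regex is installed and uses only the "new", "next", "error" and "chunk" methods documented on this page, plus Perl's built-in "ref". The name "token_classes" is a hypothetical helper. Because "new" does not report an invalid pattern itself, the sketch first checks the pattern with "qr//".

```perl
use strict;
use warnings;
use YAPE::Regex;

# Hypothetical helper: tokenize a pattern with YAPE::Regex and return
# the class name of each node, reporting any parse error on STDERR.
sub token_classes {
    my ($pattern) = @_;

    # new() does not report an invalid pattern, so try compiling it first.
    unless (ref($pattern) eq 'Regexp' or eval { my $rx = qr/$pattern/; 1 }) {
        warn "pattern does not compile: $@";
    }

    my $parser = YAPE::Regex->new($pattern);
    my @classes;

    # next() returns one token (a node object) at a time, and undef
    # at the end or when there is no valid token.
    while (my $node = $parser->next) {
        push @classes, ref $node;    # a YAPE::Regex::xxx node class
    }

    # error() holds the message; chunk() shows the text near the failure.
    if (my $err = $parser->error) {
        warn sprintf "parse error: %s near '%s'\n", $err, $parser->chunk(10) // '';
    }
    return @classes;
}

my @classes = token_classes(qr/reg(ular\s+)?exp?(ression)?/i);
print "$_\n" for @classes;
```

Collecting class names keeps the sketch independent of the per-node methods, which are documented separately in "YAPE::Regex::Element".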

"YAPE" MODULES
The "YAPE" hierarchy of modules is an attempt at a unified means of
parsing and extracting content. It attempts to maintain a generic
interface, to promote simplicity and reusability. The API is powerful,
yet simple. The modules tokenize their input (a step you can intercept)
and build trees, so that specific nodes can be extracted.

DESCRIPTION
This module is yet another (?) parser and tree-builder for Perl regular
expressions. It builds a tree out of a regex, but at the moment the
tool for extracting from that tree is quite limited (see "Extracting
Sections"). However, the tree can be useful to extension modules.

USAGE
In addition to the base class, "YAPE::Regex", there is the auxiliary
class "YAPE::Regex::Element" (common to all "YAPE" base classes) that
holds the individual nodes' classes. Each node class is described in
that module's documentation.

Methods for "YAPE::Regex"
• "use YAPE::Regex;"

• "use YAPE::Regex qw( MyExt::Mod );"

If supplied no arguments, the module is loaded normally, and the
node classes are given the proper inheritance (from
"YAPE::Regex::Element"). If you supply a module (or list of
modules), "import" will load them (if needed) and set up their node
classes with the proper inheritance; that is, it will append
"YAPE::Regex" to @MyExt::Mod::ISA, and "YAPE::Regex::xxx" to each
node class's @ISA (where "xxx" is the name of the specific node
class).

    package MyExt::Mod;
    use YAPE::Regex 'MyExt::Mod';

    # roughly equivalent to:
    # push @MyExt::Mod::ISA,       'YAPE::Regex';
    # push @MyExt::Mod::text::ISA, 'YAPE::Regex::text';
    # ... and likewise for every other node class

• "my $p = YAPE::Regex->new($REx);"

Creates a "YAPE::Regex" object, using the contents of $REx as a
regular expression. If $REx is not already a compiled regex, "new"
will attempt to convert it to one (using "qr//"). If the regex
contains an error, that conversion fails, but the parser pretends it
was ok: it reports the bad token only when it reaches it in the
course of parsing.

• "my $text = $p->chunk($len);"

Returns the next $len characters of the input string; $len defaults
to 30. This is useful for figuring out why a parsing error occurred.

• "my $done = $p->done;"

Returns true if the parser is done with the input string, and false
otherwise.

• "my $errstr = $p->error;"

Returns the parser's error message.

• "my $extor = $p->extract;"

Returns a code reference that, each time it is called, returns the
next back-reference in the regex. For planned enhancements in
upcoming versions of this module, see "Extracting Sections".

• "my $str = $p->display(...);"

Returns a string representation of the entire content. It first
calls the "parse" method, in case there is data that has not yet
been parsed, and then calls the "fullstring" method on the root
nodes. See the "YAPE::Regex::Element" documentation for the
arguments to "fullstring".

• "my $node = $p->next;"

Returns the next token, or "undef" if there is no valid token. If
there was a problem in the parsing, an error message is available
through the "error" method.

• "my $node = $p->parse;"

Calls "next" until all the data has been parsed.

• "my $node = $p->root;"

Returns the root node of the tree structure.

• "my $state = $p->state;"

Returns the current state of the parser. It is one of the following
values: "alt", "anchor", "any", "backref", capture(N), "Cchar",
"class", "close", "code", "comment", cond(TYPE), "ctrl", "cut",
"done", "error", "flags", "group", "hex", "later",
lookahead(neg|pos), lookbehind(neg|pos), "macro", "named", "oct",
"slash", "text", and "utf8hex".

For capture(N), N is the number of the capture group.

For cond(TYPE), TYPE is either a number representing the
back-reference that the conditional depends on, or the string
"assert".

For "lookahead" and "lookbehind", either "neg" or "pos" appears,
depending on whether the assertion is negative or positive.

• "my $node = $p->top;"

Synonymous with "root".
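The @ISA wiring that "use YAPE::Regex 'MyExt::Mod'" performs for an extension can be pictured in plain Perl. The following is a minimal sketch of that documented effect, not YAPE::Regex's own "import" code. "wire_extension" is a hypothetical helper, and the node type names passed to it (text, anchor, capture) are illustrative assumptions. The sketch only edits @ISA arrays and does not load YAPE::Regex.

```perl
use strict;
use warnings;

# Hypothetical helper: a plain-Perl sketch of the inheritance that
# "use YAPE::Regex 'MyExt::Mod'" is documented to set up.
# It only appends to @ISA arrays; it does not load YAPE::Regex.
sub wire_extension {
    my ($ext, @node_types) = @_;
    no strict 'refs';    # symbolic references to package @ISA arrays

    # The extension class inherits from the base parser class.
    push @{"${ext}::ISA"}, 'YAPE::Regex';

    # Each extension node class inherits from the matching base node class.
    for my $type (@node_types) {
        push @{"${ext}::${type}::ISA"}, "YAPE::Regex::$type";
    }
    return;
}

# Illustrative node type names only.
wire_extension('MyExt::Mod', qw(text anchor capture));

print 'MyExt::Mod'->isa('YAPE::Regex')            ? "inherits\n" : "does not inherit\n";
print 'MyExt::Mod::text'->isa('YAPE::Regex::text') ? "inherits\n" : "does not inherit\n";
```

Perl's method resolution accepts @ISA entries naming packages that have not been loaded, so both "isa" checks succeed here even without YAPE::Regex installed.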

Extracting Sections
While extraction of nodes is the goal of the "YAPE" modules, the author
is at a loss as to what needs to be extracted from a regex. Currently,
all the "extract" method does is give you access to the regex's set of
back-references:

    my $extor = $parser->extract;
    while (my $backref = $extor->()) {
        # ...
    }

"japhy" is very open to suggestions on node extraction, both in how
the API should look and in what it should offer. Preliminary ideas
include extraction keywords like the output of -Dr (or the "re"
module's "debug" option).
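When experimenting with the "extract" loop, it can help to know independently how many capture groups a pattern has. This is a minimal core-Perl cross-check that does not use YAPE::Regex. It relies only on the documented perlvar behavior that $#+ gives the number of groups in the last successful match. "count_groups" is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: count the capture groups in a pattern.
# The trailing empty alternative guarantees that matching '' succeeds,
# even when the pattern itself cannot match; $#+ then reports how many
# groups the combined pattern has, whether or not they participated.
sub count_groups {
    my ($rx) = @_;
    '' =~ /$rx|/ or die "unreachable: the empty alternative always matches";
    return $#+;
}

my $regex = qr/reg(ular\s+)?exp?(ression)?/i;
print count_groups($regex), "\n";    # 2
```

Non-capturing groups such as "(?:...)" are not counted, so the result can be compared with the number of values the "extract" iterator returns for the same pattern.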

EXTENSIONS
• "YAPE::Regex::Explain"

Presents an explanation of a regular expression, node by node.

• "YAPE::Regex::Reverse" (Not released)

Reverses the nodes of a regular expression.
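This is a minimal sketch of using the first extension. It assumes YAPE::Regex::Explain is installed. Its "new" and "explain" methods come from that module's own documentation; this page does not describe them.

```perl
use strict;
use warnings;
use YAPE::Regex::Explain;

# Build an explainer for the synopsis pattern and fetch its
# node-by-node explanation as a single string.
my $regex       = qr/reg(ular\s+)?exp?(ression)?/i;
my $explanation = YAPE::Regex::Explain->new($regex)->explain;
print $explanation;
```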

TO DO
This is a list of things to add to future versions of this module.

API
• Create a robust "extract" method

Open to suggestions.

BUGS
The following is a list of known or reported bugs.

Pending
• "use charnames ':full'"

To understand "\N{...}" properly, you must be using Perl 5.6.0 or
higher. However, the parser only knows how to resolve full names
(those made available by "use charnames ':full'"). There might be an
option in the future to specify a class name.
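Because the parser only resolves full names, the names used in a pattern's "\N{...}" escapes can be checked with the core "charnames" module before parsing. This is a minimal core-Perl sketch. "is_full_name" is a hypothetical helper that relies on "charnames::vianame" returning undef for a name it does not know.

```perl
use strict;
use warnings;
use charnames ':full';

# Hypothetical helper: true if $name is a full Unicode character name
# that charnames can resolve at run time.
sub is_full_name {
    my ($name) = @_;
    return defined charnames::vianame($name);
}

# A pattern using a full name, the form the parser is documented to resolve.
my $regex = qr/\N{GREEK SMALL LETTER ALPHA}+/;
my $text  = "\x{3B1}\x{3B1}";    # two GREEK SMALL LETTER ALPHA characters

print is_full_name('GREEK SMALL LETTER ALPHA') ? "known\n" : "unknown\n";
print $text =~ $regex ? "matches\n" : "no match\n";
```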

SEE ALSO
The "YAPE::Regex::Element" documentation, for information on the node
classes. Also, "Text::Balanced", Damian Conway's excellent module,
which is used to match "(?{ ... })" and "(??{ ... })" blocks.

AUTHOR
The original author is Jeff "japhy" Pinyan (CPAN ID: PINYAN).

Gene Sullivan (gsullivan@cpan.org) is a co-maintainer.

LICENSE
This module is free software; you can redistribute it and/or modify it
under the same terms as Perl itself. See perlartistic.



perl v5.36.0 2023-01-20 Regex(3)