Regex(3) User Contributed Perl Documentation Regex(3)

NAME
YAPE::Regex - Yet Another Parser/Extractor for Regular Expressions

SYNOPSIS
    use YAPE::Regex;
    use strict;

    my $regex = qr/reg(ular\s+)?exp?(ression)?/i;
    my $parser = YAPE::Regex->new($regex);

    # here is the tokenizing part
    while (my $chunk = $parser->next) {
        # ...
    }

"YAPE" MODULES
The "YAPE" hierarchy of modules is an attempt at a unified means of
parsing and extracting content. It attempts to maintain a generic
interface, to promote simplicity and reusability. The API is powerful,
yet simple. The modules do tokenization (which can be intercepted) and
build trees, so that extraction of specific nodes is possible.

DESCRIPTION
This module is yet another (?) parser and tree-builder for Perl regular
expressions. It builds a tree out of a regex, but at the moment, the
extent of the extraction tool for the tree is quite limited (see
"Extracting Sections"). However, the tree can be useful to extension
modules.
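
A minimal sketch of the round trip described above: build the tree in one
call and turn it back into a string. It uses only the "new", "parse",
"error", and "display" methods documented under "Methods for
YAPE::Regex". That "error" is false after a successful parse is an
assumption. The exact string printed depends on the Perl and module
versions, so no output is shown.

```perl
use strict;
use warnings;
use YAPE::Regex;

# Build the whole tree in one call instead of stepping token by token.
my $parser = YAPE::Regex->new(qr/reg(ular\s+)?exp?(ression)?/i);
$parser->parse;

# Assumption: error() returns a false value when parsing succeeded.
if (my $err = $parser->error) {
    die "could not parse regex: $err\n";
}

# display() stringifies the tree that was just built.
my $shown = $parser->display;
print "$shown\n";
```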

USAGE
In addition to the base class, "YAPE::Regex", there is the auxiliary
class "YAPE::Regex::Element" (common to all "YAPE" base classes) that
holds the individual nodes' classes. There is documentation for the
node classes in that module's documentation.

Methods for "YAPE::Regex"
· "use YAPE::Regex;"

· "use YAPE::Regex qw( MyExt::Mod );"

  If supplied no arguments, the module is loaded normally, and the
  node classes are given the proper inheritance (from
  "YAPE::Regex::Element"). If you supply a module (or list of
  modules), "import" will automatically include them (if needed) and
  set up their node classes with the proper inheritance -- that is,
  it will append "YAPE::Regex" to @MyExt::Mod::ISA, and
  "YAPE::Regex::xxx" to each node class's @ISA (where "xxx" is the
  name of the specific node class).

      package MyExt::Mod;
      use YAPE::Regex 'MyExt::Mod';

      # does the work of:
      #   @MyExt::Mod::ISA = 'YAPE::Regex'
      #   @MyExt::Mod::text::ISA = 'YAPE::Regex::text'
      #   ...

· "my $p = YAPE::Regex->new($REx);"

  Creates a "YAPE::Regex" object, using the contents of $REx as a
  regular expression. The "new" method will attempt to convert $REx
  to a compiled regex (using "qr//") if $REx isn't already one. If
  there is an error in the regex, the conversion will fail, but the
  parser carries on as if it had succeeded. It will then report the
  bad token when it gets to it, in the course of parsing.

· "my $text = $p->chunk($len);"

  Returns the next $len characters in the input string; $len defaults
  to 30 characters. This is useful for figuring out why a parsing
  error occurs.

· "my $done = $p->done;"

  Returns true if the parser is done with the input string, and false
  otherwise.

· "my $errstr = $p->error;"

  Returns the parser error message.

· "my $backref = $p->extract;"

  Returns a code reference that returns the next back-reference in
  the regex. For more information on enhancements in upcoming
  versions of this module, check "Extracting Sections".

· "my $node = $p->display(...);"

  Returns a string representation of the entire content. It calls
  the "parse" method in case there is more data that has not yet been
  parsed. This calls the "fullstring" method on the root nodes.
  Check the "YAPE::Regex::Element" docs on the arguments to
  "fullstring".

· "my $node = $p->next;"

  Returns the next token, or "undef" if there is no valid token.
  There will be an error message (accessible with the "error" method)
  if there was a problem in the parsing.

· "my $node = $p->parse;"

  Calls "next" until all the data has been parsed.

· "my $node = $p->root;"

  Returns the root node of the tree structure.

· "my $state = $p->state;"

  Returns the current state of the parser. It is one of the
  following values: "alt", "anchor", "any", "backref", "capture(N)",
  "Cchar", "class", "close", "code", "comment", "cond(TYPE)", "ctrl",
  "cut", "done", "error", "flags", "group", "hex", "later",
  "lookahead(neg|pos)", "lookbehind(neg|pos)", "macro", "named",
  "oct", "slash", "text", and "utf8hex".

  For "capture(N)", N will be the number the captured pattern
  represents.

  For "cond(TYPE)", TYPE will either be a number representing the
  back-reference that the conditional depends on, or the string
  "assert".

  For "lookahead" and "lookbehind", one of "neg" and "pos" will be
  there, depending on the type of assertion.

· "my $node = $p->top;"

  Synonymous with "root".
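
The methods above can be combined into a tokenizing loop that also
reports where parsing stopped. The following is a sketch, not a
definitive recipe: it assumes that "state", sampled right after each call
to "next", names the token just read, and that "error" may be undefined
when nothing went wrong. The variable names are illustrative.

```perl
use strict;
use warnings;
use YAPE::Regex;

my $p = YAPE::Regex->new(qr/reg(ular\s+)?exp?(ression)?/i);

# Step through the regex one token at a time, recording the parser state
# after each token (for example "text" or "capture(1)").
my @seen;
while (my $node = $p->next) {
    push @seen, $p->state;
}

if ($p->done) {
    # The whole input was consumed.
    print "token states: @seen\n";
}
else {
    # next() returned undef before the end of the input: show the error
    # message and the next 20 characters of unparsed input.
    my $err  = $p->error;
    my $rest = $p->chunk(20);
    $err = 'unknown error' unless defined $err;
    warn "parse stopped: $err (near '$rest')\n";
}
```

Passing a length to "chunk" keeps the diagnostic short; without an
argument it returns up to 30 characters.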

Extracting Sections
While extraction of nodes is the goal of the "YAPE" modules, the author
is at a loss for words as to what needs to be extracted from a regex.
At the current time, all the "extract" method does is allow you access
to the regex's set of back-references:

    my $extor = $parser->extract;
    while (my $backref = $extor->()) {
        # ...
    }

"japhy" is very open to suggestions as to the approach to node
extraction (in how the API should look, in addition to what should be
proffered). Preliminary ideas include extraction keywords like the
output of -Dr (or the "re" module's "debug" option).
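
As a fuller sketch of the loop shown earlier in this section, the
following collects the text of every capturing group. It assumes that
each value returned by the extractor is a node object and that
"fullstring" (described in the "YAPE::Regex::Element" documentation) can
be called on it without arguments. The explicit call to "parse" is a
precaution in case "extract" does not parse on its own.

```perl
use strict;
use warnings;
use YAPE::Regex;

my $parser = YAPE::Regex->new(qr/(\d+)-(\w+)/);
$parser->parse;    # precaution: make sure the tree is complete

# extract() hands back an iterator; each call yields the next capturing
# group, and a false value signals that there are none left.
my $extor = $parser->extract;
my @groups;
while (my $backref = $extor->()) {
    # Assumption: fullstring() with no arguments returns the group's text.
    push @groups, $backref->fullstring;
}

print "$_\n" for @groups;
```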

EXTENSIONS
· "YAPE::Regex::Explain" 3.011

  Presents an explanation of a regular expression, node by node.

· "YAPE::Regex::Reverse" (Not released)

  Reverses the nodes of a regular expression.

TO DO
This is a listing of things to add to future versions of this module.

API
· Create a robust "extract" method

  Open to suggestions.

BUGS
Following is a list of known or reported bugs.

Pending
· "use charnames ':full'"

  To understand "\N{...}" properly, you must be using Perl 5.6.0 or
  higher. However, the parser only knows how to resolve full names
  (those made using "use charnames ':full'"). There might be an
  option in the future to specify a class name.

SEE ALSO
The "YAPE::Regex::Element" documentation, for information on the node
classes. Also, "Text::Balanced", Damian Conway's excellent module,
used for the matching of "(?{ ... })" and "(??{ ... })" blocks.

AUTHOR
Jeff "japhy" Pinyan
CPAN ID: PINYAN
PINYAN@cpan.org

perl v5.12.1 2009-11-30 Regex(3)