Tree::XPathEngine(3) User Contributed Perl Documentation Tree::XPathEngine(3)

NAME
Tree::XPathEngine - a re-usable XPath engine

DESCRIPTION
This module provides an XPath engine that can be re-used by other
modules/classes that implement trees.

It is designed to be compatible with Class::XPath, i.e. it passes the
Class::XPath tests if you replace Class::XPath by Tree::XPathEngine.

This code is a more or less direct copy of the XML::XPath module by
Matt Sergeant. I only removed the XML processing part (which parses an
XML document and loads it as a tree in memory) to remove the dependency
on XML::Parser, applied a couple of patches, removed a whole bunch of
XML-specific things (comments, processing instructions, namespaces...),
renamed a whole lot of methods to make Pod::Coverage happy, and changed
the docs.

The article "eXtending XML XPath",
http://www.xmltwig.com/article/extending_xml_xpath/, should give authors
who want to use this module enough background to do so.

Otherwise, my email is below ;--)

WARNING: while the underlying code is rather solid, this module most
likely lacks docs.

As they say, "patches welcome"... but I am also interested in any
experience using this module: what the tricky parts were, and how the
code or the docs could be improved.

SYNOPSIS
    use Tree::XPathEngine;

    my $tree = my_tree->new( ...);
    my $xp   = Tree::XPathEngine->new();

    # find all first grandkids, using $tree as the context
    my @nodeset = $xp->findnodes('/root/kid/grandkid[1]', $tree);

    package my_tree;

    # needs to provide these methods
    sub xpath_get_name             { ... }
    sub xpath_get_next_sibling     { ... }
    sub xpath_get_previous_sibling { ... }
    sub xpath_get_root_node        { ... }
    sub xpath_get_parent_node      { ... }
    sub xpath_get_child_nodes      { ... }
    sub xpath_is_element_node      { return 1; }
    sub xpath_cmp                  { ... }
    sub xpath_get_attributes       { ... } # only if attributes are used
    sub xpath_to_literal           { ... } # only if you want to use findnodes_as_string or findvalue

API
The API of Tree::XPathEngine itself is extremely simple, to allow you to
get going almost immediately. The deeper APIs are more complex, but you
shouldn't have to touch most of that.

new %options
options

xpath_name_re
a regular expression used to match names (node names or attribute
names). By default it is qr/[A-Za-z_][\w.-]*/, in order to work under
perl 5.6.n, but you might want to use something like qr/\p{L}[\w.-]*/
in 5.8.n, to accommodate letters outside of the ASCII range.

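To see why the option matters, the sketch below runs the two regular expressions quoted above against a few made-up names. It anchors each match with \A and \z so the whole name must match; that anchoring is an assumption made for the demonstration, not a statement about how the engine applies the regex internally. Tree::XPathEngine itself is not called.

```perl
use strict;
use warnings;
use utf8;                        # the source below contains non-ASCII names
use feature 'unicode_strings';   # apply Unicode rules to \w for all strings
binmode STDOUT, ':encoding(UTF-8)';

# The default name regex, and the Unicode-aware alternative quoted above.
my $ascii_name_re   = qr/[A-Za-z_][\w.-]*/;
my $unicode_name_re = qr/\p{L}[\w.-]*/;

# Returns 1 if the whole string is a name according to the given regex.
sub is_full_name {
    my ($name, $re) = @_;
    return $name =~ /\A$re\z/ ? 1 : 0;
}

for my $name ('kid', 'grand-kid.2', 'élément') {
    printf "%-12s ascii=%d unicode=%d\n", $name,
        is_full_name($name, $ascii_name_re),
        is_full_name($name, $unicode_name_re);
}
```

Following the "new %options" form above, the Unicode-aware regex would presumably be passed as Tree::XPathEngine->new( xpath_name_re => qr/\p{L}[\w.-]*/ ).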
findnodes ($path, $context)
Returns a list of nodes found by $path, in context $context. In scalar
context returns a "Tree::XPathEngine::NodeSet" object.

findnodes_as_string ($path, $context)
Returns the text values of the nodes found by $path, in context
$context.

findvalue ($path, $context)
Returns either a "Tree::XPathEngine::Literal", a
"Tree::XPathEngine::Boolean" or a "Tree::XPathEngine::Number" object.
If the path returns a NodeSet, $nodeset->xpath_to_literal is called
automatically for you (and thus a "Tree::XPathEngine::Literal" is
returned). Note that stringification is overloaded for each of these
objects, so you can just print the value found, or manipulate it in
the ways you would a normal perl value (e.g. using regular
expressions).

exists ($path, $context)
Returns true if $path, in context $context, finds at least one node.

matches ($node, $path, $context)
Returns true if the node matches the path.

find ($path, $context)
The find function takes an XPath expression (a string) and returns
either a Tree::XPathEngine::NodeSet object containing the nodes it
found (or empty if no nodes matched the path), or one of
Tree::XPathEngine::Literal (a string), Tree::XPathEngine::Number, or
Tree::XPathEngine::Boolean. It should always return something, and you
can use ->isa() to find out what it returned. If you need to check how
many nodes it found you should check $nodeset->size. See
Tree::XPathEngine::NodeSet.

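Since find can return any of four result classes, callers typically branch on ->isa(). The helper below is a sketch of that dispatch; result_kind is a hypothetical name, and only the four class names listed for find are assumed. So that it runs without the module installed, the usage lines bless empty hashes into those class names as stand-ins for real results.

```perl
use strict;
use warnings;

# Classify a value returned by find(), branching with ->isa() on the four
# result classes listed above. result_kind is a hypothetical helper name.
sub result_kind {
    my ($result) = @_;
    return 'nodeset' if $result->isa('Tree::XPathEngine::NodeSet');
    return 'literal' if $result->isa('Tree::XPathEngine::Literal');
    return 'number'  if $result->isa('Tree::XPathEngine::Number');
    return 'boolean' if $result->isa('Tree::XPathEngine::Boolean');
    die 'unexpected result type: ', ref($result), "\n";
}

# With the real module you would write something like:
#   my $result = $xp->find('//kid', $tree);
#   print $result->size, " kids\n" if result_kind($result) eq 'nodeset';
# Stand-in objects blessed into the class names let the helper run on its own.
my $fake_number = bless {}, 'Tree::XPathEngine::Number';
my $kind = result_kind($fake_number);
print "$kind\n";    # prints number
```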
XPath variables
XPath lets you use variables in expressions (see the XPath spec:
<http://www.w3.org/TR/xpath>).

set_var ($var_name, $val)
sets the variable $var_name to $val

get_var ($var_name)
gets the value of the variable (there should be no need to use this
method from outside the module, but it looked silly to have "set_var"
and "_get_var").

The purpose of this module is to add XPath support to generic tree
modules.

It works by letting you create a Tree::XPathEngine object that will be
called to resolve XPath queries on a context. The context is a node (or
a list of nodes) in a tree.

The tree should share some characteristics with an XML tree: it is made
of nodes, and there are 3 main kinds of nodes: the document (the whole
tree; the root of the tree is a child of this node), elements (regular
nodes in the tree) and attributes.

Nodes in the tree are expected to provide methods that will be called
by the XPath engine to resolve the query. Not all of the possible
methods need to be available, depending on the type of XPath queries
that need to be supported: for example, if the nodes do not have a text
value then there is no need for an "xpath_string_value" method, and
XPath queries cannot include the "string()" function (using it will
trigger a runtime error).

Most of the expected methods are usual methods for a tree module, so it
should not be too difficult to implement them by aliasing existing
methods to the required ones.

Just in case, here is a fast way to alias, for example, your own
"parent" method to the "xpath_get_parent_node" method needed by
Tree::XPathEngine:

    *xpath_get_parent_node= *parent; # in the node package

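Here is a self-contained sketch of that aliasing, using a hypothetical MyNode class that already has its own parent method. It uses the \&parent form, which aliases only the subroutine slot of the glob; assigning *parent, as above, also aliases the package scalar, array and hash named parent, which is usually harmless but broader than needed.

```perl
use strict;
use warnings;

# Hypothetical node class that already has its own "parent" method.
package MyNode;

sub new {
    my ($class, %args) = @_;
    return bless { parent => $args{parent} }, $class;
}

sub parent { return $_[0]->{parent} }

# Alias only the CODE slot of the glob: xpath_get_parent_node now runs
# exactly the same subroutine as parent. "no warnings 'once'" silences a
# possible "used only once" warning for a glob named just once in the source.
{
    no warnings 'once';
    *xpath_get_parent_node = \&parent;
}

package main;

my $root  = MyNode->new();
my $child = MyNode->new( parent => $root );
my $same  = $child->xpath_get_parent_node == $root;
my $msg   = $same ? "same parent\n" : "mismatch\n";
print $msg;    # prints "same parent"
```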
The XPath engine expects the whole tree and attributes to be full-blown
objects, which provide a set of methods similar to nodes. If they are
not, see below for ways to "fake" it.

Methods to be provided by the nodes
xpath_get_name
returns the name of the node.

Not used for the document.

xpath_string_value
The text corresponding to the node, used by the "string()" function
(for queries like "//foo[string()="bar"]")

xpath_get_next_sibling
returns the next sibling of the node

xpath_get_previous_sibling
returns the previous sibling of the node

xpath_get_root_node
returns the document object. See "Document object" below for more
details.

xpath_get_parent_node
The parent of the root of the tree is the document node.

The parent of an attribute is its element.

xpath_get_child_nodes
returns a list of children.

Note that the attributes are not children of an element.

xpath_is_element_node
xpath_is_document_node
xpath_is_attribute_node
xpath_is_text_node
only if the tree includes textual nodes

xpath_to_string
returns the node as a string

xpath_to_number
returns the node value as a number object

    # requires: use Tree::XPathEngine::Number;
    sub xpath_to_number
      { return Tree::XPathEngine::Number->new( $_[0]->xpath_string_value); }

xpath_cmp ($node_a, $node_b)
compares 2 nodes and returns -1, 0 or 1 depending on whether $node_a is
before, equal to or after $node_b in the tree.

This is needed in order to return sorted results and to remove
duplicates.

See "Ordering nodesets" below for a ready-to-use sorting method if
your tree does not have a "cmp" method.

Element specific methods
xpath_get_attributes
returns the list of attributes; attributes should be objects that
support the methods shown in "Attribute objects" below.

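To make the method list concrete, here is a sketch of a small hash-based tree providing the element-side methods. MiniTree::Node, add_child and the underscore helpers are hypothetical; the sketch has no attributes or text nodes, keeps an explicit document node as described under "Document object", and orders nodes by comparing their paths of child indexes from the document down. It only exercises the methods directly; it does not run Tree::XPathEngine queries.

```perl
use strict;
use warnings;

# Hypothetical minimal tree: every node is a hash with a name, some text,
# a parent and a list of children. The document node is created with
# is_doc => 1; its only child is the root element.
package MiniTree::Node;

use Scalar::Util qw(weaken);

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, text => $args{text} // '',
                   is_doc => $args{is_doc} ? 1 : 0, children => [] }, $class;
}

# Append a child; the back-reference is weakened to avoid a memory cycle.
sub add_child {
    my ($self, $child) = @_;
    $child->{parent} = $self;
    weaken($child->{parent});
    push @{ $self->{children} }, $child;
    return $child;
}

sub xpath_get_name          { $_[0]->{name} }
sub xpath_get_parent_node   { $_[0]->{parent} }
sub xpath_get_child_nodes   { @{ $_[0]->{children} } }
sub xpath_get_attributes    { () }    # no attributes in this sketch
sub xpath_is_document_node  { $_[0]->{is_doc} }
sub xpath_is_element_node   { $_[0]->{is_doc} ? 0 : 1 }
sub xpath_is_attribute_node { 0 }

# Climb until the node without a parent: the document.
sub xpath_get_root_node {
    my $node = shift;
    $node = $node->{parent} while $node->{parent};
    return $node;
}

# Own text followed by the text of all descendants, in document order.
sub xpath_string_value {
    my $self = shift;
    return join '', $self->{text},
        map { $_->xpath_string_value } $self->xpath_get_child_nodes;
}

# Index of the node in its parent's child list (undef for the document).
sub _index {
    my $self   = shift;
    my $parent = $self->{parent} or return;
    my $kids   = $parent->{children};
    for my $i (0 .. $#$kids) { return $i if $kids->[$i] == $self }
    return;
}

sub xpath_get_next_sibling {
    my $self = shift;
    my $i = $self->_index;
    return defined $i ? $self->{parent}{children}[ $i + 1 ] : undef;
}

sub xpath_get_previous_sibling {
    my $self = shift;
    my $i = $self->_index;
    return defined $i && $i > 0 ? $self->{parent}{children}[ $i - 1 ] : undef;
}

# Child indexes from the document down to this node.
sub _path {
    my $node = shift;
    my @path;
    while ($node->{parent}) { unshift @path, $node->_index; $node = $node->{parent} }
    return @path;
}

# Document order: compare the index paths; when one path is a prefix of
# the other, the ancestor (shorter path) comes first.
sub xpath_cmp {
    my ($x, $y) = @_;
    my @px = $x->_path;
    my @py = $y->_path;
    while (@px && @py) {
        my $c = shift(@px) <=> shift(@py);
        return $c if $c;
    }
    return @px <=> @py;
}

package main;

my $doc  = MiniTree::Node->new( is_doc => 1 );
my $root = $doc->add_child( MiniTree::Node->new( name => 'root' ) );
my $kid1 = $root->add_child( MiniTree::Node->new( name => 'kid', text => 'a' ) );
my $kid2 = $root->add_child( MiniTree::Node->new( name => 'kid', text => 'b' ) );
my $order = $kid2->xpath_cmp($kid1);
print "$order\n";    # prints 1: kid2 comes after kid1
```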
Document object
The original XPath works on XML, and is roughly speaking based on the
DOM model of an XML document. As far as the XPath engine is concerned,
it still deals with a DOM tree.

One of the possibly annoying consequences is that in the DOM the
document itself is a node that has a single element child, the root of
the document tree. If the tree you want to use this module on doesn't
follow that model, i.e. if its root element is the tree itself, then
you will have to fake it.

This is how I did it in Tree::DAG_Node::XPath:

    # in package Tree::DAG_Node::XPath
    sub xpath_get_root_node
      { my $node= shift;
        # The parent of root is a Tree::DAG_Node::XPath::Root
        # that helps getting the tree to mimic a DOM tree
        return $node->root->xpath_get_parent_node;
      }

    sub xpath_get_parent_node
      { my $node= shift;

        return $node->mother # normal case, any node but the root
            # the root parent is a Tree::DAG_Node::XPath::Root object
            # which contains the reference of the (real) root node
            || bless { root => $node }, 'Tree::DAG_Node::XPath::Root';
      }

    # class for the fake root for a tree
    package Tree::DAG_Node::XPath::Root;

    sub xpath_get_child_nodes   { return ( $_[0]->{root}); }
    sub address                 { return -1; } # the root is before all other nodes
    sub xpath_get_attributes    { return [] }
    sub xpath_is_document_node  { return 1 }
    sub xpath_is_element_node   { return 0 }
    sub xpath_is_attribute_node { return 0 }

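The same trick can be sketched without Tree::DAG_Node, for a hypothetical hash-based class. HashNode, HashNode::Document, add_daughter and the _xpath_document cache key are all made up for the example; "mother" mirrors the field used above. One deliberate difference: the code above blesses a new Root object on every call, whereas here the fake document is built once per root and cached, so repeated calls return the same object and == comparisons between them hold. Scalar::Util::weaken keeps the root and its cached document from forming a memory cycle.

```perl
use strict;
use warnings;

# Hypothetical hash-based node class whose root has no parent (the root IS
# the tree). "mother" mirrors the Tree::DAG_Node field used above.
package HashNode;

use Scalar::Util qw(weaken);

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, daughters => [] }, $class;
}

# Append a child; the back-reference is weakened to avoid a memory cycle.
sub add_daughter {
    my ($self, $kid) = @_;
    $kid->{mother} = $self;
    weaken($kid->{mother});
    push @{ $self->{daughters} }, $kid;
    return $kid;
}

sub xpath_get_name          { $_[0]->{name} }
sub xpath_get_child_nodes   { @{ $_[0]->{daughters} } }
sub xpath_is_document_node  { 0 }
sub xpath_is_element_node   { 1 }
sub xpath_is_attribute_node { 0 }

# Any node but the root: its real parent. The root: a fake document node,
# built on first use and cached on the root. The document only holds a weak
# reference back to the root, so the cache does not create a cycle.
sub xpath_get_parent_node {
    my $node = shift;
    return $node->{mother} if $node->{mother};
    return $node->{_xpath_document} ||= do {
        my $doc = bless { root => $node }, 'HashNode::Document';
        weaken($doc->{root});
        $doc;
    };
}

# Climb to the real root, then return its fake document.
sub xpath_get_root_node {
    my $node = shift;
    $node = $node->{mother} while $node->{mother};
    return $node->xpath_get_parent_node;
}

# The fake document: its single child is the real root element.
package HashNode::Document;

sub xpath_get_child_nodes   { return ( $_[0]->{root} ) }
sub xpath_get_parent_node   { return }
sub xpath_is_document_node  { 1 }
sub xpath_is_element_node   { 0 }
sub xpath_is_attribute_node { 0 }

package main;

my $root = HashNode->new( name => 'root' );
my $kid  = $root->add_daughter( HashNode->new( name => 'kid' ) );
my $doc  = $kid->xpath_get_root_node;
print ref($doc), "\n";    # prints HashNode::Document
```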
Attribute objects
If the attributes in the original tree are not objects, but simple
fields in a hash, you can generate objects on the fly:

    # in the element package
    sub xpath_get_attributes
      { my $elt= shift;
        my $atts= $elt->attributes; # returns a reference to a hash of attributes
        my $rank=-1;                # used for sorting
        my @atts= map { bless( { name => $_, value => $atts->{$_}, elt => $elt, rank => $rank -- },
                               'Tree::DAG_Node::XPath::Attribute')
                      }
                      sort keys %$atts;
        return @atts;
      }

    # the attribute package
    package Tree::DAG_Node::XPath::Attribute;
    use Tree::XPathEngine::Number;

    # not used, instead xpath_get_attributes in Tree::DAG_Node::XPath directly
    # returns an object blessed in this class
    #sub new
    #  { my( $class, $elt, $att)= @_;
    #    return bless { name => $att, value => $elt->att( $att), elt => $elt }, $class;
    #  }

    sub xpath_get_value         { return $_[0]->{value}; }
    sub xpath_get_name          { return $_[0]->{name} ; }
    sub xpath_string_value      { return $_[0]->{value}; }
    sub xpath_to_number         { return Tree::XPathEngine::Number->new( $_[0]->{value}); }
    sub xpath_is_document_node  { 0 }
    sub xpath_is_element_node   { 0 }
    sub xpath_is_attribute_node { 1 }
    sub to_string               { return qq{$_[0]->{name}="$_[0]->{value}"}; }

    # Tree::DAG_Node uses the address field to sort nodes, which simplifies things quite a bit
    # (addresses are compared as strings with cmp, so e.g. "0:10" sorts before "0:9")
    sub xpath_cmp { $_[0]->address cmp $_[1]->address }
    sub address
      { my $att= shift;
        my $elt= $att->{elt};
        return $elt->address . ':' . $att->{rank};
      }

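For elements that are not Tree::DAG_Node objects, the same on-the-fly approach can be sketched with a hypothetical HashElt class that keeps its attributes in a plain hash. Attributes are sorted by name and given increasing numeric ranks, compared with <=> rather than as strings. The xpath_cmp shown here deliberately handles only the element itself and attributes of that same element (anything else dies), since modelling the rest of the tree is outside the scope of this sketch.

```perl
use strict;
use warnings;

# Hypothetical element class keeping its attributes in a plain hash.
package HashElt;

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, atts => { %{ $args{atts} || {} } } }, $class;
}

sub xpath_get_name { $_[0]->{name} }

# Attribute objects are built on the fly, sorted by name so the order is
# stable, with increasing numeric ranks recording that order.
sub xpath_get_attributes {
    my $elt  = shift;
    my $atts = $elt->{atts};
    my $rank = 0;
    return map { bless( { name => $_, value => $atts->{$_}, elt => $elt, rank => $rank++ },
                        'HashElt::Attribute' ) }
           sort keys %$atts;
}

package HashElt::Attribute;

sub xpath_get_name          { $_[0]->{name} }
sub xpath_get_value         { $_[0]->{value} }
sub xpath_string_value      { $_[0]->{value} }
sub xpath_get_parent_node   { $_[0]->{elt} }    # an attribute's parent is its element
sub xpath_is_document_node  { 0 }
sub xpath_is_element_node   { 0 }
sub xpath_is_attribute_node { 1 }
sub xpath_to_string         { qq{$_[0]->{name}="$_[0]->{value}"} }

# Limited ordering: an element comes before its attributes, which are
# ordered by rank. Nodes elsewhere in the tree are not handled here.
sub xpath_cmp {
    my ($x, $y) = @_;
    return 1 if $y == $x->{elt};
    if ($y->isa('HashElt::Attribute') && $y->{elt} == $x->{elt}) {
        return $x->{rank} <=> $y->{rank};
    }
    die "xpath_cmp: node outside this element, not handled in this sketch\n";
}

package main;

my $elt  = HashElt->new( name => 'kid', atts => { id => 'k1', class => 'big' } );
my @atts = $elt->xpath_get_attributes;
print join(' ', map { $_->xpath_to_string } @atts), "\n";    # class="big" id="k1"
```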
Ordering nodesets
XPath query results must be sorted, and duplicates removed, so the
XPath engine needs to be able to sort nodes.

It does so by calling the "xpath_cmp" method on nodes.

One of the easiest ways to write such a method, for static trees, is to
have a method of the object return its position in the tree as a
number.

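A sketch of that position-number approach, for a hypothetical PosNode class: number_nodes walks the tree once in document order (a node before its children, children left to right) and stores each node's position, after which xpath_cmp is a single numeric comparison. The numbering has to be redone whenever the tree changes, which is why this suits static trees.

```perl
use strict;
use warnings;

# Hypothetical static tree: each node is a hash with a name and children.
package PosNode;

sub new {
    my ($class, $name, @children) = @_;
    return bless { name => $name, children => \@children }, $class;
}

# Number every node in document order (pre-order: a node before its
# children, children left to right). Must be re-run if the tree changes.
sub number_nodes {
    my ($root) = @_;
    my $pos   = 0;
    my @stack = ($root);
    while (my $node = pop @stack) {
        $node->{pos} = $pos++;
        # push children reversed so the leftmost child is popped first
        push @stack, reverse @{ $node->{children} };
    }
    return $pos;    # number of nodes numbered
}

sub xpath_cmp { $_[0]->{pos} <=> $_[1]->{pos} }

package main;

my ($b1, $b2) = (PosNode->new('b1'), PosNode->new('b2'));
my $a1   = PosNode->new('a1', $b1, $b2);
my $c1   = PosNode->new('c1');
my $root = PosNode->new('root', $a1, $c1);
PosNode::number_nodes($root);

my @sorted = sort { $a->xpath_cmp($b) } ($c1, $b2, $root, $b1, $a1);
print join(' ', map { $_->{name} } @sorted), "\n";    # root a1 b1 b2 c1
```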
If that is not possible, here is a method that should work (note that
it does not handle text nodes: it only compares elements, attributes
and the document):

    # in the tree element package
    # $ELEMENT, $ATTRIBUTE and $TREE hold the names of the element,
    # attribute and tree (document) classes

    sub xpath_cmp($$)
      { my( $x, $y)= @_;
        if( UNIVERSAL::isa( $y, $ELEMENT))
          { # 2 elts, compare them
            return $x->elt_cmp( $y);
          }
        elsif( UNIVERSAL::isa( $y, $ATTRIBUTE))
          { # elt <=> att, compare the elt to the att->{elt}
            # if the elt is the att->{elt} (cmp returns 0) then -1: elt is before att
            return ($x->elt_cmp( $y->{elt}) ) || -1 ;
          }
        elsif( UNIVERSAL::isa( $y, $TREE))
          { # elt <=> document, elt is after document
            return 1;
          }
        else
          { die "unknown node type ", ref( $y); }
      }

    sub elt_cmp
      { my( $x, $y)=@_;

        # easy cases
        return  0 if( $x == $y);
        return  1 if( $x->in($y)); # x starts after y
        return -1 if( $y->in($x)); # x starts before y

        # ancestors does not include the element itself
        my @x_pile= ($x, $x->ancestors);
        my @y_pile= ($y, $y->ancestors);

        # the 2 elements are not in the same tree
        return undef unless( $x_pile[-1] == $y_pile[-1]);

        # find the first non common ancestors (they are siblings)
        my $x_anc= pop @x_pile;
        my $y_anc= pop @y_pile;

        while( $x_anc == $y_anc)
          { $x_anc= pop @x_pile;
            $y_anc= pop @y_pile;
          }

        # from there walk left and right from both ancestors until one
        # meets the other, or runs out of siblings
        my( $x_prev, $x_next, $y_prev, $y_next)= ($x_anc, $x_anc, $y_anc, $y_anc);
        while( 1)
          { $x_prev= $x_prev->xpath_get_previous_sibling || return( -1);
            return 1 if( $x_prev == $y_next);
            $x_next= $x_next->xpath_get_next_sibling || return( 1);
            return -1 if( $x_next == $y_prev);
            $y_prev= $y_prev->xpath_get_previous_sibling || return( 1);
            return -1 if( $y_prev == $x_next);
            $y_next= $y_next->xpath_get_next_sibling || return( -1);
            return 1 if( $y_next == $x_prev);
          }
      }

    # true if $self is a descendant of $ancestor
    sub in
      { my ($self, $ancestor)= @_;
        while( $self= $self->xpath_get_parent_node) { return $self if( $self == $ancestor); }
        return;
      }

    sub ancestors
      { my( $self)= @_;
        my @ancestors;
        while( $self= $self->xpath_get_parent_node) { push @ancestors, $self; }
        return @ancestors;
      }

    # in the attribute package
    sub xpath_cmp($$)
      { my( $x, $y)= @_;
        if( UNIVERSAL::isa( $y, $ATTRIBUTE))
          { # 2 attributes, compare their elements, then their names
            return ($x->{elt}->elt_cmp( $y->{elt}) ) || ($x->{name} cmp $y->{name});
          }
        elsif( UNIVERSAL::isa( $y, $ELEMENT))
          { # att <=> elt: compare the att->{elt} and the elt
            # if att->{elt} is the elt (cmp returns 0) then 1: elt is before att
            return ($x->{elt}->elt_cmp( $y) ) || 1 ;
          }
        elsif( UNIVERSAL::isa( $y, $TREE))
          { # att <=> document, att is after document
            return 1;
          }
        else
          { die "unknown node type ", ref( $y); }
      }

The module supports the XPath recommendation to the same extent as
XML::XPath (that is, rather completely).

It includes a perl-specific extension: direct support for regular
expressions.

You can use the usual (in Perl!) "=~" and "!~" operators. Regular
expressions are / delimited (no other delimiter is accepted, \ inside
regexp must be backslashed), and the "imsx" modifiers can be used.

    $xp->findnodes( '//@att[.=~ /^v.$/]', $tree); # returns the list of attributes att
                                                  # whose value matches ^v.$

TODO
provide inheritable node and attribute classes for typical cases,
starting with nodes where the root IS the tree, and where attributes
are a simple hash (similar to what I did in Tree::DAG_Node::XPath).

better docs (patches welcome).

SEE ALSO
Tree::DAG_Node::XPath for an example of using this module

<http://www.xmltwig.com/article/extending_xml_xpath/> for background
information

Class::XPath, which is probably easier to use, but at this point
supports much less of XPath than Tree::XPathEngine.

AUTHOR
Michel Rodriguez, "<mirod@cpan.org>"

This code is heavily based on the code for XML::XPath by Matt Sergeant,
copyright 2000 Axkit.com Ltd

BUGS
Please report any bugs or feature requests to
"bug-tree-xpathengine@rt.cpan.org", or through the web interface at
<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Tree-XPathEngine>. I
will be notified, and then you'll automatically be notified of progress
on your bug as I make changes.

COPYRIGHT AND LICENSE
XML::XPath Copyright 2000-2004 AxKit.com Ltd. Copyright 2006 Michel
Rodriguez, All Rights Reserved.

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

perl v5.34.0 2022-01-21 Tree::XPathEngine(3)