Tree::XPathEngine(3)  User Contributed Perl Documentation Tree::XPathEngine(3)

NAME

       Tree::XPathEngine - a re-usable XPath engine

DESCRIPTION

       This module provides an XPath engine that can be re-used by other
       modules/classes that implement trees.

       It is designed to be compatible with Class::XPath, i.e. it passes its
       tests if you replace Class::XPath by Tree::XPathEngine.

       This code is a more or less direct copy of the XML::XPath module by
       Matt Sergeant. I only removed the XML processing part (that parses an
       XML document and loads it as a tree in memory) to remove the dependency
       on XML::Parser, applied a couple of patches, removed a whole bunch of
       XML specific things (comments, processing instructions, namespaces...),
       renamed a whole lot of methods to make Pod::Coverage happy, and changed
       the docs.

       The article eXtending XML XPath,
       http://www.xmltwig.com/article/extending_xml_xpath/ should give authors
       who want to use this module enough background to do so.

       Otherwise, my email is below ;--)

       WARNING: while the underlying code is rather solid, this module most
       likely lacks docs.

       As they say, "patches welcome"... but I am also interested in any
       experience using this module, what were the tricky parts, and how could
       the code or the docs be improved.

SYNOPSIS

           use Tree::XPathEngine;

           my $tree= my_tree->new( ...);
           my $xp = Tree::XPathEngine->new();

           # find all first grankids, using $tree as the context
           my @nodes = $xp->findnodes('/root/kid/grankid[1]', $tree);

           package my_tree;

           # needs to provide these methods
           sub xpath_get_name              { ... }
           sub xpath_get_next_sibling      { ... }
           sub xpath_get_previous_sibling  { ... }
           sub xpath_get_root_node         { ... }
           sub xpath_get_parent_node       { ... }
           sub xpath_get_child_nodes       { ... }
           sub xpath_is_element_node       { return 1; }
           sub xpath_cmp                   { ... }
           sub xpath_get_attributes        { ... } # only if attributes are used
           sub xpath_to_literal            { ... } # only if you want to use findnodes_as_string or findvalue

DETAILS

API

       The API of Tree::XPathEngine itself is extremely simple to allow you to
       get going almost immediately. The deeper APIs are more complex, but
       you shouldn't have to touch most of that.

   new %options
       options

       xpath_name_re
           a regular expression used to match names (node names or attribute
           names). By default it is qr/[A-Za-z_][\w.-]*/ in order to work
           under perl 5.6.n, but you might want to use something like
           qr/\p{L}[\w.-]*/ in 5.8.n, to accommodate letters outside of the
           ASCII range.
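
       As a rough illustration of the difference between the two patterns,
       the sketch below checks a few names against each of them. The helper
       is_name, the \A...\z anchoring and the sample names are assumptions
       made for this example only; the engine applies the pattern inside its
       own parser.

```perl
use strict;
use warnings;

my $ascii_name_re   = qr/[A-Za-z_][\w.-]*/;   # default pattern quoted above
my $unicode_name_re = qr/\p{L}[\w.-]*/;       # alternative quoted above

# hypothetical helper: true if the whole of $name is matched by $re
sub is_name {
    my ($re, $name) = @_;
    return $name =~ /\A$re\z/ ? 1 : 0;
}

# sample names, labelled in ASCII so the output stays printable anywhere
my @samples = (
    [ 'kid-1'              => 'kid-1' ],
    [ 'leading digit'      => '1kid' ],
    [ 'Greek alpha + beta' => "\x{3b1}\x{3b2}" ],
);

for my $sample (@samples) {
    my ($label, $name) = @$sample;
    printf "%-18s default=%d unicode=%d\n",
        $label, is_name($ascii_name_re, $name), is_name($unicode_name_re, $name);
}

# the chosen pattern would then be passed as the xpath_name_re option:
#   my $xp = Tree::XPathEngine->new( xpath_name_re => $unicode_name_re);
```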

   findnodes ($path, $context)
       Returns a list of nodes found by $path, in context $context.  In scalar
       context returns a "Tree::XPathEngine::NodeSet" object.

   findnodes_as_string ($path, $context)
       Returns the text values of the nodes.

   findvalue ($path, $context)
       Returns either a "Tree::XPathEngine::Literal", a
       "Tree::XPathEngine::Boolean" or a "Tree::XPathEngine::Number" object.
       If the path returns a NodeSet, $nodeset->xpath_to_literal is called
       automatically for you (and thus a "Tree::XPathEngine::Literal" is
       returned). Note that for each of the objects stringification is
       overloaded, so you can just print the value found, or manipulate it in
       the ways you would a normal perl value (e.g. using regular
       expressions).

   exists ($path, $context)
       Returns true if the given path exists.

   matches ($node, $path, $context)
       Returns true if the node matches the path.

   find ($path, $context)
       The find function takes an XPath expression (a string) and returns
       either a Tree::XPathEngine::NodeSet object containing the nodes it
       found (or empty if no nodes matched the path), or one of
       Tree::XPathEngine::Literal (a string), Tree::XPathEngine::Number, or
       Tree::XPathEngine::Boolean. It should always return something, and you
       can use ->isa() to find out what it returned. If you need to check how
       many nodes it found you should check $nodeset->size.  See
       Tree::XPathEngine::NodeSet.
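
       The ->isa() check mentioned above could be wrapped in a small helper
       along these lines. This is only a sketch: describe_result is a
       hypothetical name, and the code relies on nothing more than the class
       names, the size method and the overloaded stringification described
       in this section.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# hypothetical helper: describe whatever find() returned
sub describe_result {
    my ($result) = @_;
    return 'not an object' unless blessed $result;

    # a node set: report how many nodes it holds
    return 'nodeset with ' . $result->size . ' node(s)'
        if $result->isa('Tree::XPathEngine::NodeSet');

    # the scalar types: rely on their overloaded stringification
    for my $type (qw(Literal Number Boolean)) {
        return lc($type) . ": $result"
            if $result->isa("Tree::XPathEngine::$type");
    }
    return 'unexpected class ' . ref $result;
}

# intended use, assuming $xp and $tree exist as in the SYNOPSIS:
#   print describe_result( $xp->find( '/root/kid', $tree) ), "\n";
```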

   XPath variables
       XPath lets you use variables in expressions (see the XPath spec:
       <http://www.w3.org/TR/xpath>).

       set_var ($var_name, $val)
           sets the variable $var_name to $val

       get_var ($var_name)
           gets the value of the variable (there should be no need to use this
           method from outside the module, but it looked silly to have
           "set_var" and "_get_var").

How to use this module

       The purpose of this module is to add XPath support to generic tree
       modules.

       It works by letting you create a Tree::XPathEngine object, that will be
       called to resolve XPath queries on a context. The context is a node (or
       a list of nodes) in a tree.

       The tree should share some characteristics with an XML tree: it is made
       of nodes, and there are 3 kinds of nodes: document (the whole tree, the
       root of the tree is a child of this node), elements (regular nodes in
       the tree) and attributes.

       Nodes in the tree are expected to provide methods that will be called
       by the XPath engine to resolve the query. Not all of the possible
       methods need be available, depending on the type of XPath queries that
       need to be supported: for example if the nodes do not have a text value
       then there is no need for an "xpath_string_value" method, and XPath
       queries cannot include the "string()" function (using it will trigger a
       runtime error).

       Most of the expected methods are usual methods for a tree module, so it
       should not be too difficult to implement them, by aliasing existing
       methods to the required ones.

       Just in case, here is a fast way to alias for example your own "parent"
       method to the "xpath_get_parent_node" needed by Tree::XPathEngine:

         *xpath_get_parent_node= *parent; # in the node package
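
       A slightly fuller version of the same idea, as a self-contained
       sketch. My::Node and its parent and name methods are hypothetical
       stand-ins for an existing tree class; assigning a code reference to
       the glob aliases only the sub, which is a variation on the one-liner
       above.

```perl
use strict;
use warnings;

package My::Node;   # hypothetical node class with its own method names

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, parent => $args{parent} }, $class;
}
sub parent { return $_[0]->{parent} }
sub name   { return $_[0]->{name} }

# alias the existing methods to the names the engine calls
{
    no warnings 'once';   # each alias is only mentioned once in this file
    *xpath_get_parent_node = \&parent;
    *xpath_get_name        = \&name;
}

package main;

my $root = My::Node->new( name => 'root');
my $kid  = My::Node->new( name => 'kid', parent => $root);

print $kid->xpath_get_parent_node->xpath_get_name, "\n";   # prints "root"
```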

       The XPath engine expects the whole tree and attributes to be full blown
       objects, which provide a set of methods similar to nodes. If they are
       not, see below for ways to "fake" it.

   Methods to be provided by the nodes
       xpath_get_name
           returns the name of the node.

           Not used for the document.

       xpath_string_value
           The text corresponding to the node, used by the "string()" function
           (for queries like "//foo[string()="bar"]")

       xpath_get_next_sibling
       xpath_get_previous_sibling
       xpath_get_root_node
           returns the document object. See "Document object" below for more
           details.

       xpath_get_parent_node
           The parent of the root of the tree is the document node.

           The parent of an attribute is its element.

       xpath_get_child_nodes
           returns a list of children.

           Note that the attributes are not children of an element.

       xpath_is_element_node
       xpath_is_document_node
       xpath_is_attribute_node
       xpath_is_text_node
           only if the tree includes textual nodes

       xpath_to_string
           returns the node as a string

       xpath_to_number
           returns the node value as a number object

             sub xpath_to_number
               { return Tree::XPathEngine::Number->new( $_[0]->xpath_string_value); }

       xpath_cmp ($node_a, $node_b)
           compares 2 nodes and returns -1, 0 or 1 depending on whether
           $node_a is before, equal to or after $node_b in the tree.

           This is needed in order to return sorted results and to remove
           duplicates.

           See "Ordering nodesets" below for a ready-to-use sorting method if
           your tree does not have a "cmp" method.

   Element specific methods
       xpath_get_attributes
           returns the list of attributes. Attributes should be objects that
           support the methods shown in "Attribute objects" below.
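
       To make the list above more concrete, here is a minimal sketch of an
       element class that provides the navigation methods, together with a
       document object created on demand. My::Elt, My::Doc and the _sibling
       helper are hypothetical names; xpath_cmp, attributes and the
       string/number conversions are left out, so this is a starting point
       rather than a complete node class.

```perl
use strict;
use warnings;

package My::Elt;    # hypothetical element class

sub new {
    my ($class, $name, @children) = @_;
    my $self = bless { name => $name, parent => undef, children => [@children] }, $class;
    $_->{parent} = $self for @children;
    return $self;
}

sub xpath_get_name          { return $_[0]->{name} }
sub xpath_get_child_nodes   { return @{ $_[0]->{children} } }
sub xpath_is_element_node   { return 1 }
sub xpath_is_document_node  { return 0 }
sub xpath_is_attribute_node { return 0 }

# the parent of the root element is a document object created on demand
sub xpath_get_parent_node {
    my ($self) = @_;
    return $self->{parent} || bless { root => $self }, 'My::Doc';
}

# climb to the root element, then return its document parent
sub xpath_get_root_node {
    my ($self) = @_;
    $self = $self->{parent} while $self->{parent};
    return $self->xpath_get_parent_node;
}

# siblings are found through the parent's list of children
sub _sibling {
    my ($self, $offset) = @_;
    my $parent = $self->{parent} or return;
    my @kids  = @{ $parent->{children} };
    my ($pos) = grep { $kids[$_] == $self } 0 .. $#kids;
    my $i     = $pos + $offset;
    return $i >= 0 && $i <= $#kids ? $kids[$i] : undef;
}
sub xpath_get_next_sibling     { return $_[0]->_sibling(+1) }
sub xpath_get_previous_sibling { return $_[0]->_sibling(-1) }

package My::Doc;    # hypothetical document class

sub xpath_get_child_nodes   { return ( $_[0]->{root} ) }
sub xpath_get_parent_node   { return }
sub xpath_is_element_node   { return 0 }
sub xpath_is_document_node  { return 1 }
sub xpath_is_attribute_node { return 0 }

package main;

my $tree   = My::Elt->new( root => My::Elt->new('kid1'), My::Elt->new('kid2') );
my ($kid1) = $tree->xpath_get_child_nodes;

print $kid1->xpath_get_next_sibling->xpath_get_name, "\n";       # prints "kid2"
print $kid1->xpath_get_root_node->xpath_is_document_node, "\n";  # prints "1"
```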

Tricky bits

   Document object
       The original XPath works on XML, and is roughly speaking based on the
       DOM model of an XML document. As far as the XPath engine is concerned,
       it still deals with a DOM tree.

       One of the possibly annoying consequences is that in the DOM the
       document itself is a node, that has a single element child, the root of
       the document tree. If the tree you want to use this module on doesn't
       follow that model, if its root element is the tree itself, then you
       will have to fake it.

       This is how I did it in Tree::DAG_Node::XPath:

         # in package Tree::DAG_Node::XPath
         sub xpath_get_root_node
           { my $node= shift;
             # The parent of root is a Tree::DAG_Node::XPath::Root
             # that helps getting the tree to mimic a DOM tree
             return $node->root->xpath_get_parent_node;
           }

         sub xpath_get_parent_node
           { my $node= shift;

             return    $node->mother # normal case, any node but the root
                       # the root parent is a Tree::DAG_Node::XPath::Root object
                       # which contains the reference of the (real) root node
                    || bless { root => $node }, 'Tree::DAG_Node::XPath::Root';
           }

         # class for the fake root for a tree
         package Tree::DAG_Node::XPath::Root;

         sub xpath_get_child_nodes   { return ( $_[0]->{root}); }
         sub address                 { return -1; } # the root is before all other nodes
         sub xpath_get_attributes    { return []  }
         sub xpath_is_document_node  { return 1   }
         sub xpath_is_element_node   { return 0   }
         sub xpath_is_attribute_node { return 0   }

   Attribute objects
       If the attributes in the original tree are not objects, but simple
       fields in a hash, you can generate objects on the fly:

         # in the element package
         sub xpath_get_attributes
           { my $elt= shift;
             my $atts= $elt->attributes; # returns a reference to a hash of attributes
             my $rank=-1;                # used for sorting
             my @atts= map { bless( { name => $_, value => $atts->{$_}, elt => $elt, rank => $rank-- },
                                    'Tree::DAG_Node::XPath::Attribute')
                           }
                           sort keys %$atts;
             return @atts;
           }

         # the attribute package
         package Tree::DAG_Node::XPath::Attribute;
         use Tree::XPathEngine::Number;

         # not used, instead xpath_get_attributes in Tree::DAG_Node::XPath directly
         # returns an object blessed in this class
         #sub new
         #  { my( $class, $elt, $att)= @_;
         #    return bless { name => $att, value => $elt->att( $att), elt => $elt }, $class;
         #  }

         sub xpath_get_value         { return $_[0]->{value}; }
         sub xpath_get_name          { return $_[0]->{name} ; }
         sub xpath_string_value      { return $_[0]->{value}; }
         sub xpath_to_number         { return Tree::XPathEngine::Number->new( $_[0]->{value}); }
         sub xpath_is_document_node  { 0 }
         sub xpath_is_element_node   { 0 }
         sub xpath_is_attribute_node { 1 }
         sub to_string         { return qq{$_[0]->{name}="$_[0]->{value}"}; }

         # Tree::DAG_Node uses the address field to sort nodes, which simplifies things quite a bit
         sub xpath_cmp { $_[0]->address cmp $_[1]->address }
         sub address
           { my $att= shift;
             my $elt= $att->{elt};
             return $elt->address . ':' . $att->{rank};
           }

   Ordering nodesets
       XPath query results must be sorted, and duplicates removed, so the
       XPath engine needs to be able to sort nodes.

       It does so by calling the "xpath_cmp" method on nodes.

       One of the easiest ways to write such a method, for static trees, is to
       have a method of the object return its position in the tree as a
       number.
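
       For a static tree, the position-based approach might look like the
       following sketch. My::Elt, the pos field and number_tree are
       hypothetical names; only elements are numbered here, so a document
       object or attributes would need positions of their own before they
       could be compared the same way.

```perl
use strict;
use warnings;

package My::Elt;    # hypothetical element class for a tree that no longer changes

sub new {
    my ($class, $name, @children) = @_;
    return bless { name => $name, children => [@children], pos => undef }, $class;
}

# number every element in document order (pre-order, depth first)
sub number_tree {
    my ($self, $counter) = @_;
    $counter ||= do { my $n = 0; \$n };   # shared counter, created on the first call
    $self->{pos} = $$counter++;
    $_->number_tree($counter) for @{ $self->{children} };
    return $self;
}

# with positions in place, comparing two elements is a numeric comparison
sub xpath_cmp { return $_[0]->{pos} <=> $_[1]->{pos} }

package main;

my $c    = My::Elt->new('c');
my $bb   = My::Elt->new('b', $c);
my $d    = My::Elt->new('d');
my $root = My::Elt->new('a', $bb, $d)->number_tree;

my @sorted = sort { $a->xpath_cmp($b) } $d, $c, $root, $bb;
print join(' ', map { $_->{name} } @sorted), "\n";   # prints "a b c d"
```

       The numbering has to be redone whenever the tree changes, which is
       why this only suits static trees.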

       If that is not possible, here is a method that should work (note that
       it only compares elements):

         # in the tree element package

         sub xpath_cmp($$)
           { my( $a, $b)= @_;
             if( UNIVERSAL::isa( $b, $ELEMENT))       # $ELEMENT is the tree element class
               { # 2 elts, compare them
                 return $a->elt_cmp( $b);
               }
             elsif( UNIVERSAL::isa( $b, $ATTRIBUTE))  # $ATTRIBUTE is the attribute class
               { # elt <=> att, compare the elt to the att->{elt}
                 # if the elt is the att->{elt} (cmp returns 0) then -1, elt is before att
                 return ($a->elt_cmp( $b->{elt}) ) || -1 ;
               }
             elsif( UNIVERSAL::isa( $b, $TREE))       # $TREE is the tree class
               { # elt <=> document, elt is after document
                 return 1;
               }
             else
               { die "unknown node type ", ref( $b); }
           }


         sub elt_cmp
           { my( $a, $b)=@_;

             # easy cases
             return  0 if( $a == $b);
             return  1 if( $a->in($b)); # a starts after b
             return -1 if( $b->in($a)); # a starts before b

             # ancestors does not include the element itself
             my @a_pile= ($a, $a->ancestors);
             my @b_pile= ($b, $b->ancestors);

             # the 2 elements are not in the same twig
             return undef unless( $a_pile[-1] == $b_pile[-1]);

             # find the first non common ancestors (they are siblings)
             my $a_anc= pop @a_pile;
             my $b_anc= pop @b_pile;

             while( $a_anc == $b_anc)
               { $a_anc= pop @a_pile;
                 $b_anc= pop @b_pile;
               }

             # from there move left and right and figure out the order
             my( $a_prev, $a_next, $b_prev, $b_next)= ($a_anc, $a_anc, $b_anc, $b_anc);
             while( 1)
               { $a_prev= $a_prev->_prev_sibling || return( -1);
                 return 1 if( $a_prev == $b_next);
                 $a_next= $a_next->_next_sibling || return( 1);
                 return -1 if( $a_next == $b_prev);
                 $b_prev= $b_prev->_prev_sibling || return( 1);
                 return -1 if( $b_prev == $a_next);
                 $b_next= $b_next->_next_sibling || return( -1);
                 return 1 if( $b_next == $a_prev);
               }
           }

         sub in
           { my ($self, $ancestor)= @_;
             while( $self= $self->xpath_get_parent_node) { return $self if( $self ==  $ancestor); }
             return; # $ancestor is not an ancestor of the element
           }

         sub ancestors
           { my( $self)= @_;
             my @ancestors; # must be lexical, or the list would grow across calls
             while( $self= $self->xpath_get_parent_node) { push @ancestors, $self; }
             return @ancestors;
           }

         # in the attribute package
         sub xpath_cmp($$)
           { my( $a, $b)= @_;
             if( UNIVERSAL::isa( $b, $ATTRIBUTE))
               { # 2 attributes, compare their elements, then their name
                 return ($a->{elt}->elt_cmp( $b->{elt}) ) || ($a->{name} cmp $b->{name});
               }
             elsif( UNIVERSAL::isa( $b, $ELEMENT))
               { # att <=> elt : compare the att->elt and the elt
                 # if att->elt is the elt (cmp returns 0) then 1 (elt is before att)
                 return ($a->{elt}->elt_cmp( $b) ) || 1 ;
               }
             elsif( UNIVERSAL::isa( $b, $TREE))
               { # att <=> document, att is after document
                 return 1;
               }
             else
               { die "unknown node type ", ref( $b); }
           }

XPath extension

       The module supports the XPath recommendation to the same extent as
       XML::XPath (that is, rather completely).

       It includes a perl-specific extension: direct support for regular
       expressions.

       You can use the usual (in Perl!) "=~" and "!~" operators. Regular
       expressions are / delimited (no other delimiter is accepted, \ inside
       regexp must be backslashed), the "imsx" modifiers can be used.

         $xp->findnodes( '//@att[.=~ /^v.$/]'); # returns the list of attributes att
                                                # whose value matches ^v.$

TODO

       provide inheritable node and attribute classes for typical cases,
       starting with nodes where the root IS the tree, and where attributes
       are a simple hash (similar to what I did in Tree::DAG_Node).

       better docs (patches welcome).

SEE ALSO

       Tree::DAG_Node::XPath for an example of using this module

       <http://www.xmltwig.com/article/extending_xml_xpath/> for background
       information

       Class::XPath, which is probably easier to use, but at this point
       supports much less of XPath than Tree::XPathEngine.

AUTHOR

       Michel Rodriguez, "<mirod@cpan.org>"

       This code is heavily based on the code for XML::XPath by Matt Sergeant
       copyright 2000 Axkit.com Ltd

BUGS

       Please report any bugs or feature requests to
       "bug-tree-xpathengine@rt.cpan.org", or through the web interface at
       <http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Tree-XPathEngine>.  I
       will be notified, and then you'll automatically be notified of progress
       on your bug as I make changes.

ACKNOWLEDGEMENTS

       XML::XPath Copyright 2000-2004 AxKit.com Ltd.  Copyright 2006 Michel
       Rodriguez, All Rights Reserved.

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.34.0                      2022-01-21              Tree::XPathEngine(3)