Mojo::DOM58(3)      User Contributed Perl Documentation      Mojo::DOM58(3)

NAME
Mojo::DOM58 - Minimalistic HTML/XML DOM parser with CSS selectors
7
SYNOPSIS
9 use Mojo::DOM58;
10
11 # Parse
12 my $dom = Mojo::DOM58->new('<div><p id="a">Test</p><p id="b">123</p></div>');
13
14 # Find
15 say $dom->at('#b')->text;
16 say $dom->find('p')->map('text')->join("\n");
17 say $dom->find('[id]')->map(attr => 'id')->join("\n");
18
19 # Iterate
20 $dom->find('p[id]')->reverse->each(sub { say $_->{id} });
21
22 # Loop
for my $e ($dom->find('p[id]')->each) {
  say $e->{id}, ':', $e->text;
}
26
27 # Modify
28 $dom->find('div p')->last->append('<p id="c">456</p>');
29 $dom->at('#c')->prepend($dom->new_tag('p', id => 'd', '789'));
30 $dom->find(':not(p)')->map('strip');
31
32 # Render
33 say "$dom";
34
DESCRIPTION
Mojo::DOM58 is a minimalistic and relaxed pure-Perl HTML/XML DOM parser
based on Mojo::DOM. It supports the HTML Living Standard
<https://html.spec.whatwg.org/> and Extensible Markup Language (XML)
1.0 <http://www.w3.org/TR/xml/>, and matching based on CSS3 selectors
<http://www.w3.org/TR/selectors/>. It will even try to interpret broken
HTML and XML, so you should not use it for validation.
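
As a rough illustration of this relaxed parsing, the sketch below feeds the parser markup with unclosed "p" elements. The input string and variable names are made up for the example, and the repaired markup is printed rather than assumed.

```perl
use strict;
use warnings;
use Mojo::DOM58;

# Unclosed <p> elements, as often found in hand-written HTML
my $dom = Mojo::DOM58->new('<div><p>one<p>two</div>');

# The parser still builds a tree; collect the text of each paragraph
my @texts = $dom->find('p')->map('text')->each;
print scalar(@texts), " paragraphs: @texts\n";

# Render whatever structure the parser settled on
print "$dom\n";
```

Because malformed input is accepted silently like this, a successful parse says nothing about whether the markup was valid.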
42
FORK INFO
44 Mojo::DOM58 is a fork of Mojo::DOM and tracks features and fixes to
45 stay closely compatible with upstream. It differs only in the
46 standalone format and compatibility with Perl 5.8. Any bugs or patches
47 not related to these changes should be reported directly to the
48 Mojolicious issue tracker.
49
50 This release of Mojo::DOM58 is up to date with version 8.09 of
51 Mojolicious.
52
NODES AND ELEMENTS
54 When we parse an HTML/XML fragment, it gets turned into a tree of
55 nodes.
56
<!DOCTYPE html>
<html>
  <head><title>Hello</title></head>
  <body>World!</body>
</html>
62
There are currently eight different kinds of nodes: "cdata", "comment",
"doctype", "pi", "raw", "root", "tag" and "text". Elements are nodes of
the type "tag".
66
root
|- doctype (html)
+- tag (html)
   |- tag (head)
   |  +- tag (title)
   |     +- raw (Hello)
   +- tag (body)
      +- text (World!)
75
76 While all node types are represented as Mojo::DOM58 objects, some
77 methods like "attr" and "namespace" only apply to elements.
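
The node tree described above can be inspected with "descendant_nodes" and "type". This is a minimal sketch, assuming the same document written without whitespace between tags so that no extra "text" nodes appear. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new(
  '<!DOCTYPE html><html><head><title>Hello</title></head>'
    . '<body>World!</body></html>');

# Every node below the root, in document order, with its type
my @types = $dom->descendant_nodes->map('type')->each;
print join(', ', @types), "\n";

# Only nodes of type "tag" are elements and have a tag name
my @tags = $dom->descendant_nodes->grep(sub { $_->type eq 'tag' })
  ->map('tag')->each;
print join(', ', @tags), "\n";
```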
78
CASE-SENSITIVITY
Mojo::DOM58 defaults to HTML semantics, which means all tag and
attribute names are lowercased and selectors need to be lowercase as
well.
83
84 # HTML semantics
85 my $dom = Mojo::DOM58->new('<P ID="greeting">Hi!</P>');
86 say $dom->at('p[id]')->text;
87
88 If an XML declaration is found, the parser will automatically switch
89 into XML mode and everything becomes case-sensitive.
90
91 # XML semantics
92 my $dom = Mojo::DOM58->new('<?xml version="1.0"?><P ID="greeting">Hi!</P>');
93 say $dom->at('P[ID]')->text;
94
95 HTML or XML semantics can also be forced with the "xml" method.
96
97 # Force HTML semantics
98 my $dom = Mojo::DOM58->new->xml(0)->parse('<P ID="greeting">Hi!</P>');
99 say $dom->at('p[id]')->text;
100
101 # Force XML semantics
102 my $dom = Mojo::DOM58->new->xml(1)->parse('<P ID="greeting">Hi!</P>');
103 say $dom->at('P[ID]')->text;
104
SELECTORS
106 Mojo::DOM58 uses a CSS selector engine based on Mojo::DOM::CSS. All CSS
107 selectors that make sense for a standalone parser are supported.
108
109 * Any element.
110
111 my $all = $dom->find('*');
112
113 E An element of type "E".
114
115 my $title = $dom->at('title');
116
117 E[foo]
118 An "E" element with a "foo" attribute.
119
120 my $links = $dom->find('a[href]');
121
122 E[foo="bar"]
123 An "E" element whose "foo" attribute value is exactly equal to
124 "bar".
125
126 my $case_sensitive = $dom->find('input[type="hidden"]');
127 my $case_sensitive = $dom->find('input[type=hidden]');
128
129 E[foo="bar" i]
130 An "E" element whose "foo" attribute value is exactly equal to any
131 (ASCII-range) case-permutation of "bar". Note that this selector is
132 EXPERIMENTAL and might change without warning!
133
134 my $case_insensitive = $dom->find('input[type="hidden" i]');
135 my $case_insensitive = $dom->find('input[type=hidden i]');
136 my $case_insensitive = $dom->find('input[class~="foo" i]');
137
138 This selector is part of Selectors Level 4
139 <http://dev.w3.org/csswg/selectors-4>, which is still a work in
140 progress.
141
142 E[foo~="bar"]
An "E" element whose "foo" attribute value is a list of
whitespace-separated values, one of which is exactly equal to "bar".
145
146 my $foo = $dom->find('input[class~="foo"]');
147 my $foo = $dom->find('input[class~=foo]');
148
149 E[foo^="bar"]
150 An "E" element whose "foo" attribute value begins exactly with the
151 string "bar".
152
153 my $begins_with = $dom->find('input[name^="f"]');
154 my $begins_with = $dom->find('input[name^=f]');
155
156 E[foo$="bar"]
157 An "E" element whose "foo" attribute value ends exactly with the
158 string "bar".
159
160 my $ends_with = $dom->find('input[name$="o"]');
161 my $ends_with = $dom->find('input[name$=o]');
162
163 E[foo*="bar"]
164 An "E" element whose "foo" attribute value contains the substring
165 "bar".
166
167 my $contains = $dom->find('input[name*="fo"]');
168 my $contains = $dom->find('input[name*=fo]');
169
170 E[foo|="en"]
171 An "E" element whose "foo" attribute has a hyphen-separated list of
172 values beginning (from the left) with "en".
173
174 my $english = $dom->find('link[hreflang|=en]');
175
176 E:root
177 An "E" element, root of the document.
178
179 my $root = $dom->at(':root');
180
181 E:nth-child(n)
182 An "E" element, the "n-th" child of its parent.
183
184 my $third = $dom->find('div:nth-child(3)');
185 my $odd = $dom->find('div:nth-child(odd)');
186 my $even = $dom->find('div:nth-child(even)');
187 my $top3 = $dom->find('div:nth-child(-n+3)');
188
189 E:nth-last-child(n)
190 An "E" element, the "n-th" child of its parent, counting from the
191 last one.
192
193 my $third = $dom->find('div:nth-last-child(3)');
194 my $odd = $dom->find('div:nth-last-child(odd)');
195 my $even = $dom->find('div:nth-last-child(even)');
196 my $bottom3 = $dom->find('div:nth-last-child(-n+3)');
197
198 E:nth-of-type(n)
199 An "E" element, the "n-th" sibling of its type.
200
201 my $third = $dom->find('div:nth-of-type(3)');
202 my $odd = $dom->find('div:nth-of-type(odd)');
203 my $even = $dom->find('div:nth-of-type(even)');
204 my $top3 = $dom->find('div:nth-of-type(-n+3)');
205
206 E:nth-last-of-type(n)
207 An "E" element, the "n-th" sibling of its type, counting from the
208 last one.
209
210 my $third = $dom->find('div:nth-last-of-type(3)');
211 my $odd = $dom->find('div:nth-last-of-type(odd)');
212 my $even = $dom->find('div:nth-last-of-type(even)');
213 my $bottom3 = $dom->find('div:nth-last-of-type(-n+3)');
214
215 E:first-child
216 An "E" element, first child of its parent.
217
218 my $first = $dom->find('div p:first-child');
219
220 E:last-child
221 An "E" element, last child of its parent.
222
223 my $last = $dom->find('div p:last-child');
224
225 E:first-of-type
226 An "E" element, first sibling of its type.
227
228 my $first = $dom->find('div p:first-of-type');
229
230 E:last-of-type
231 An "E" element, last sibling of its type.
232
233 my $last = $dom->find('div p:last-of-type');
234
235 E:only-child
236 An "E" element, only child of its parent.
237
238 my $lonely = $dom->find('div p:only-child');
239
240 E:only-of-type
241 An "E" element, only sibling of its type.
242
243 my $lonely = $dom->find('div p:only-of-type');
244
245 E:empty
246 An "E" element that has no children (including text nodes).
247
248 my $empty = $dom->find(':empty');
249
250 E:link
251 An "E" element being the source anchor of a hyperlink of which the
252 target is not yet visited (":link") or already visited
253 (":visited"). Note that Mojo::DOM58 is not stateful, therefore
254 ":link" and ":visited" yield exactly the same results.
255
256 my $links = $dom->find(':link');
257 my $links = $dom->find(':visited');
258
259 E:visited
260 Alias for "E:link".
261
262 E:checked
263 A user interface element "E" which is checked (for instance a
264 radio-button or checkbox).
265
266 my $input = $dom->find(':checked');
267
268 E.warning
269 An "E" element whose class is "warning".
270
271 my $warning = $dom->find('div.warning');
272
273 E#myid
274 An "E" element with "ID" equal to "myid".
275
276 my $foo = $dom->at('div#foo');
277
278 E:not(s1, s2)
279 An "E" element that does not match either compound selector "s1" or
280 compound selector "s2". Note that support for compound selectors is
281 EXPERIMENTAL and might change without warning!
282
283 my $others = $dom->find('div p:not(:first-child, :last-child)');
284
285 Support for compound selectors was added as part of Selectors Level
286 4 <http://dev.w3.org/csswg/selectors-4>, which is still a work in
287 progress.
288
289 E:matches(s1, s2)
290 An "E" element that matches compound selector "s1" and/or compound
291 selector "s2". Note that this selector is EXPERIMENTAL and might
292 change without warning!
293
294 my $headers = $dom->find(':matches(section, article, aside, nav) h1');
295
296 This selector is part of Selectors Level 4
297 <http://dev.w3.org/csswg/selectors-4>, which is still a work in
298 progress.
299
A|E An "E" element that belongs to the namespace alias "A" from CSS
Namespaces Module Level 3 <https://www.w3.org/TR/css-namespaces-3/>.
Key/value pairs passed to selector methods are used to declare
namespace aliases.
304
305 my $elem = $dom->find('lq|elem', lq => 'http://example.com/q-markup');
306
307 Using an empty alias searches for an element that belongs to no
308 namespace.
309
310 my $div = $dom->find('|div');
311
312 E F An "F" element descendant of an "E" element.
313
314 my $headlines = $dom->find('div h1');
315
316 E > F
317 An "F" element child of an "E" element.
318
319 my $headlines = $dom->find('html > body > div > h1');
320
321 E + F
322 An "F" element immediately preceded by an "E" element.
323
324 my $second = $dom->find('h1 + h2');
325
326 E ~ F
327 An "F" element preceded by an "E" element.
328
329 my $second = $dom->find('h1 ~ h2');
330
331 E, F, G
332 Elements of type "E", "F" and "G".
333
334 my $headlines = $dom->find('h1, h2, h3');
335
336 E[foo=bar][bar=baz]
337 An "E" element whose attributes match all following attribute
338 selectors.
339
340 my $links = $dom->find('a[foo^=b][foo$=ar]');
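
To see several of the selectors above working against one concrete document, here is a small sketch. The markup, the "texts" helper and the hash keys are invented for this example. Only "find", "map" and "join" from this documentation are used.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new(<<'HTML');
<ul id="menu">
  <li class="item first"><a href="/home">Home</a></li>
  <li class="item"><a href="/docs">Docs</a></li>
  <li class="item last"><a>Disabled</a></li>
</ul>
HTML

# Hypothetical helper: texts of all matches for a selector, comma-joined
sub texts { $dom->find(shift)->map('text')->join(',') }

my %got = (
  with_href  => texts('a[href]'),                            # Home,Docs
  prefix     => texts('a[href^="/d"]'),                      # Docs
  second     => texts('li:nth-child(2) > a'),                # Docs
  not_first  => texts('li:not(.first) > a'),                 # Docs,Disabled
  first_last => texts('li:first-child a, li:last-child a'),  # Home,Disabled
);
print "$_: $got{$_}\n" for sort keys %got;
```

Whitespace between the "li" elements becomes "text" nodes, which the structural pseudo-classes ignore because they only count elements.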
341
OPERATORS
343 Mojo::DOM58 overloads the following operators.
344
345 array
346 my @nodes = @$dom;
347
348 Alias for "child_nodes".
349
350 # "<!-- Test -->"
351 $dom->parse('<!-- Test --><b>123</b>')->[0];
352
353 bool
354 my $bool = !!$dom;
355
356 Always true.
357
358 hash
359 my %attrs = %$dom;
360
361 Alias for "attr".
362
363 # "test"
364 $dom->parse('<div id="test">Test</div>')->at('div')->{id};
365
366 stringify
367 my $str = "$dom";
368
369 Alias for "to_string".
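
The overloaded operators can be combined on a single element. A brief sketch with made-up markup and variable names:

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new('<div id="test"><b>1</b><i>2</i></div>');
my $div = $dom->at('div');

my $count = @$div;         # array: child nodes of the element
my $id    = $div->{id};    # hash: attributes of the element
my $html  = "$div";        # stringify: same result as to_string

print "$count child nodes, id=$id\n";
print "$html\n";
```

Since the "bool" overload is always true, an object can be tested directly to tell it apart from "undef".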
370
FUNCTIONS
372 Mojo::DOM58 implements the following functions, which can be imported
373 individually.
374
375 tag_to_html
376 my $str = tag_to_html 'div', id => 'foo', 'safe content';
377
378 Generate HTML/XML tag and render it right away. This is a significantly
379 faster alternative to "new_tag" for template systems that have to
380 generate a lot of tags.
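
A short sketch of importing and calling the function, next to the equivalent "new_tag" call. The tag, attribute value and content are arbitrary sample data.

```perl
use strict;
use warnings;
use Mojo::DOM58 'tag_to_html';

# Render a tag straight to a string; the content is safe content
# and gets escaped
my $html = tag_to_html('a', href => '/search?q=1&page=2', 'Fish & Chips');
print "$html\n";

# The same tag built as an object with "new_tag" and rendered afterwards
my $via_dom = Mojo::DOM58->new_tag('a', href => '/search?q=1&page=2',
  'Fish & Chips')->to_string;
print "$via_dom\n";
```

The function form skips creating an intermediate object, which is where the speed advantage mentioned above would come from.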
381
METHODS
383 Mojo::DOM58 implements the following methods.
384
385 new
386 my $dom = Mojo::DOM58->new;
387 my $dom = Mojo::DOM58->new('<foo bar="baz">I ♥ Mojo::DOM58!</foo>');
388
389 Construct a new scalar-based Mojo::DOM58 object and "parse" HTML/XML
390 fragment if necessary.
391
392 new_tag
393 my $tag = Mojo::DOM58->new_tag('div');
394 my $tag = $dom->new_tag('div');
395 my $tag = $dom->new_tag('div', id => 'foo', hidden => undef);
396 my $tag = $dom->new_tag('div', 'safe content');
397 my $tag = $dom->new_tag('div', id => 'foo', 'safe content');
398 my $tag = $dom->new_tag('div', data => {mojo => 'rocks'}, 'safe content');
399 my $tag = $dom->new_tag('div', id => 'foo', sub { 'unsafe content' });
400
401 Construct a new Mojo::DOM58 object for an HTML/XML tag with or without
402 attributes and content. The "data" attribute may contain a hash
403 reference with key/value pairs to generate attributes from.
404
405 # "<br>"
406 $dom->new_tag('br');
407
408 # "<div></div>"
409 $dom->new_tag('div');
410
411 # "<div id="foo" hidden></div>"
412 $dom->new_tag('div', id => 'foo', hidden => undef);
413
# "<div>test &amp; 123</div>"
$dom->new_tag('div', 'test & 123');

# "<div id="foo">test &amp; 123</div>"
$dom->new_tag('div', id => 'foo', 'test & 123');

# "<div data-foo="1" data-bar="test">test &amp; 123</div>"
$dom->new_tag('div', data => {foo => 1, Bar => 'test'}, 'test & 123');
422
423 # "<div id="foo">test & 123</div>"
424 $dom->new_tag('div', id => 'foo', sub { 'test & 123' });
425
426 # "<div>Hello<b>Mojo!</b></div>"
427 $dom->parse('<div>Hello</div>')->at('div')
428 ->append_content($dom->new_tag('b', 'Mojo!'))->root;
429
430 all_text
431 my $text = $dom->all_text;
432
433 Extract text content from all descendant nodes of this element.
434
435 # "foo\nbarbaz\n"
436 $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->at('div')->all_text;
437
438 ancestors
439 my $collection = $dom->ancestors;
440 my $collection = $dom->ancestors('div ~ p');
441
442 Find all ancestor elements of this node matching the CSS selector and
443 return a collection containing these elements as Mojo::DOM58 objects.
444 All selectors listed in "SELECTORS" are supported.
445
446 # List tag names of ancestor elements
447 say $dom->ancestors->map('tag')->join("\n");
448
449 append
450 $dom = $dom->append('<p>I ♥ Mojo::DOM58!</p>');
451 $dom = $dom->append(Mojo::DOM58->new);
452
453 Append HTML/XML fragment to this node (for all node types other than
454 "root").
455
456 # "<div><h1>Test</h1><h2>123</h2></div>"
457 $dom->parse('<div><h1>Test</h1></div>')
458 ->at('h1')->append('<h2>123</h2>')->root;
459
460 # "<p>Test 123</p>"
461 $dom->parse('<p>Test</p>')->at('p')
462 ->child_nodes->first->append(' 123')->root;
463
464 append_content
465 $dom = $dom->append_content('<p>I ♥ Mojo::DOM58!</p>');
466 $dom = $dom->append_content(Mojo::DOM58->new);
467
468 Append HTML/XML fragment (for "root" and "tag" nodes) or raw content to
469 this node's content.
470
471 # "<div><h1>Test123</h1></div>"
472 $dom->parse('<div><h1>Test</h1></div>')
473 ->at('h1')->append_content('123')->root;
474
475 # "<!-- Test 123 --><br>"
476 $dom->parse('<!-- Test --><br>')
477 ->child_nodes->first->append_content('123 ')->root;
478
479 # "<p>Test<i>123</i></p>"
480 $dom->parse('<p>Test</p>')->at('p')->append_content('<i>123</i>')->root;
481
482 at
483 my $result = $dom->at('div ~ p');
484 my $result = $dom->at('svg|line', svg => 'http://www.w3.org/2000/svg');
485
486 Find first descendant element of this element matching the CSS selector
487 and return it as a Mojo::DOM58 object, or "undef" if none could be
488 found. All selectors listed in "SELECTORS" are supported.
489
490 # Find first element with "svg" namespace definition
491 my $namespace = $dom->at('[xmlns\:svg]')->{'xmlns:svg'};
492
Trailing key/value pairs can be used to declare XML namespace aliases.
494
495 # "<rect />"
496 $dom->parse('<svg xmlns="http://www.w3.org/2000/svg"><rect /></svg>')
497 ->at('svg|rect', svg => 'http://www.w3.org/2000/svg');
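
Because "at" returns "undef" when nothing matches, chaining a method call onto a failed lookup dies. A minimal sketch of guarding against that, with invented ids:

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new('<div><p id="a">Test</p></div>');

my %result;
for my $selector ('#a', '#missing') {
  # Objects are always true, so a plain boolean test separates
  # a match from undef
  my $e = $dom->at($selector);
  $result{$selector} = $e ? $e->text : '(no match)';
  print "$selector: $result{$selector}\n";
}
```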
498
499 attr
500 my $hash = $dom->attr;
501 my $foo = $dom->attr('foo');
502 $dom = $dom->attr({foo => 'bar'});
503 $dom = $dom->attr(foo => 'bar');
504
505 This element's attributes.
506
507 # Remove an attribute
508 delete $dom->attr->{id};
509
510 # Attribute without value
511 $dom->attr(selected => undef);
512
513 # List id attributes
514 say $dom->find('*')->map(attr => 'id')->compact->join("\n");
515
516 child_nodes
517 my $collection = $dom->child_nodes;
518
519 Return a collection containing all child nodes of this element as
520 Mojo::DOM58 objects.
521
522 # "<p><b>123</b></p>"
523 $dom->parse('<p>Test<b>123</b></p>')->at('p')->child_nodes->first->remove;
524
525 # "<!DOCTYPE html>"
526 $dom->parse('<!DOCTYPE html><b>123</b>')->child_nodes->first;
527
528 # " Test "
529 $dom->parse('<b>123</b><!-- Test -->')->child_nodes->last->content;
530
531 children
532 my $collection = $dom->children;
533 my $collection = $dom->children('div ~ p');
534
535 Find all child elements of this element matching the CSS selector and
536 return a collection containing these elements as Mojo::DOM58 objects.
537 All selectors listed in "SELECTORS" are supported.
538
539 # Show tag name of random child element
540 say $dom->children->shuffle->first->tag;
541
542 content
543 my $str = $dom->content;
544 $dom = $dom->content('<p>I ♥ Mojo::DOM58!</p>');
545 $dom = $dom->content(Mojo::DOM58->new);
546
547 Return this node's content or replace it with HTML/XML fragment (for
548 "root" and "tag" nodes) or raw content.
549
550 # "<b>Test</b>"
551 $dom->parse('<div><b>Test</b></div>')->at('div')->content;
552
553 # "<div><h1>123</h1></div>"
554 $dom->parse('<div><h1>Test</h1></div>')->at('h1')->content('123')->root;
555
556 # "<p><i>123</i></p>"
557 $dom->parse('<p>Test</p>')->at('p')->content('<i>123</i>')->root;
558
559 # "<div><h1></h1></div>"
560 $dom->parse('<div><h1>Test</h1></div>')->at('h1')->content('')->root;
561
562 # " Test "
563 $dom->parse('<!-- Test --><br>')->child_nodes->first->content;
564
565 # "<div><!-- 123 -->456</div>"
566 $dom->parse('<div><!-- Test -->456</div>')
567 ->at('div')->child_nodes->first->content(' 123 ')->root;
568
569 descendant_nodes
570 my $collection = $dom->descendant_nodes;
571
572 Return a collection containing all descendant nodes of this element as
573 Mojo::DOM58 objects.
574
575 # "<p><b>123</b></p>"
576 $dom->parse('<p><!-- Test --><b>123<!-- 456 --></b></p>')
577 ->descendant_nodes->grep(sub { $_->type eq 'comment' })
578 ->map('remove')->first;
579
580 # "<p><b>test</b>test</p>"
581 $dom->parse('<p><b>123</b>456</p>')
582 ->at('p')->descendant_nodes->grep(sub { $_->type eq 'text' })
583 ->map(content => 'test')->first->root;
584
585 find
586 my $collection = $dom->find('div ~ p');
587 my $collection = $dom->find('svg|line', svg => 'http://www.w3.org/2000/svg');
588
589 Find all descendant elements of this element matching the CSS selector
590 and return a collection containing these elements as Mojo::DOM58
591 objects. All selectors listed in "SELECTORS" are supported.
592
593 # Find a specific element and extract information
594 my $id = $dom->find('div')->[23]{id};
595
596 # Extract information from multiple elements
597 my @headers = $dom->find('h1, h2, h3')->map('text')->each;
598
599 # Count all the different tags
600 my $hash = $dom->find('*')->reduce(sub { $a->{$b->tag}++; $a }, {});
601
602 # Find elements with a class that contains dots
603 my @divs = $dom->find('div.foo\.bar')->each;
604
Trailing key/value pairs can be used to declare XML namespace aliases.
606
607 # "<rect />"
608 $dom->parse('<svg xmlns="http://www.w3.org/2000/svg"><rect /></svg>')
609 ->find('svg|rect', svg => 'http://www.w3.org/2000/svg')->first;
610
611 following
612 my $collection = $dom->following;
613 my $collection = $dom->following('div ~ p');
614
615 Find all sibling elements after this node matching the CSS selector and
616 return a collection containing these elements as Mojo::DOM58 objects.
617 All selectors listed in "SELECTORS" are supported.
618
619 # List tags of sibling elements after this node
620 say $dom->following->map('tag')->join("\n");
621
622 following_nodes
623 my $collection = $dom->following_nodes;
624
625 Return a collection containing all sibling nodes after this node as
626 Mojo::DOM58 objects.
627
628 # "C"
629 $dom->parse('<p>A</p><!-- B -->C')->at('p')->following_nodes->last->content;
630
631 matches
632 my $bool = $dom->matches('div ~ p');
633 my $bool = $dom->matches('svg|line', svg => 'http://www.w3.org/2000/svg');
634
635 Check if this element matches the CSS selector. All selectors listed in
636 "SELECTORS" are supported.
637
638 # True
639 $dom->parse('<p class="a">A</p>')->at('p')->matches('.a');
640 $dom->parse('<p class="a">A</p>')->at('p')->matches('p[class]');
641
642 # False
643 $dom->parse('<p class="a">A</p>')->at('p')->matches('.b');
644 $dom->parse('<p class="a">A</p>')->at('p')->matches('p[id]');
645
Trailing key/value pairs can be used to declare XML namespace aliases.
647
648 # True
649 $dom->parse('<svg xmlns="http://www.w3.org/2000/svg"><rect /></svg>')
650 ->matches('svg|rect', svg => 'http://www.w3.org/2000/svg');
651
652 namespace
653 my $namespace = $dom->namespace;
654
655 Find this element's namespace, or return "undef" if none could be
656 found.
657
658 # Find namespace for an element with namespace prefix
659 my $namespace = $dom->at('svg > svg\:circle')->namespace;
660
661 # Find namespace for an element that may or may not have a namespace prefix
662 my $namespace = $dom->at('svg > circle')->namespace;
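
The snippets above assume an existing document. A self-contained sketch with a made-up SVG fragment, parsed with XML semantics:

```perl
use strict;
use warnings;
use Mojo::DOM58;

# The default namespace is declared on the root element
my $svg = Mojo::DOM58->new->xml(1)
  ->parse('<svg xmlns="http://www.w3.org/2000/svg"><circle r="1" /></svg>');
my $ns = $svg->at('circle')->namespace;
print "circle: $ns\n";

# No namespace declaration anywhere, so undef is returned
my $none = Mojo::DOM58->new->xml(1)->parse('<a><b /></a>')->at('b')->namespace;
print "b: ", (defined $none ? $none : 'undef'), "\n";
```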
663
664 next
665 my $sibling = $dom->next;
666
667 Return Mojo::DOM58 object for next sibling element, or "undef" if there
668 are no more siblings.
669
670 # "<h2>123</h2>"
671 $dom->parse('<div><h1>Test</h1><h2>123</h2></div>')->at('h1')->next;
672
673 next_node
674 my $sibling = $dom->next_node;
675
676 Return Mojo::DOM58 object for next sibling node, or "undef" if there
677 are no more siblings.
678
679 # "456"
680 $dom->parse('<p><b>123</b><!-- Test -->456</p>')
681 ->at('b')->next_node->next_node;
682
683 # " Test "
684 $dom->parse('<p><b>123</b><!-- Test -->456</p>')
685 ->at('b')->next_node->content;
686
687 parent
688 my $parent = $dom->parent;
689
690 Return Mojo::DOM58 object for parent of this node, or "undef" if this
691 node has no parent.
692
693 # "<b><i>Test</i></b>"
694 $dom->parse('<p><b><i>Test</i></b></p>')->at('i')->parent;
695
696 parse
697 $dom = $dom->parse('<foo bar="baz">I ♥ Mojo::DOM58!</foo>');
698
699 Parse HTML/XML fragment.
700
701 # Parse XML
702 my $dom = Mojo::DOM58->new->xml(1)->parse('<foo>I ♥ Mojo::DOM58!</foo>');
703
704 preceding
705 my $collection = $dom->preceding;
706 my $collection = $dom->preceding('div ~ p');
707
708 Find all sibling elements before this node matching the CSS selector
709 and return a collection containing these elements as Mojo::DOM58
710 objects. All selectors listed in "SELECTORS" are supported.
711
712 # List tags of sibling elements before this node
713 say $dom->preceding->map('tag')->join("\n");
714
715 preceding_nodes
716 my $collection = $dom->preceding_nodes;
717
718 Return a collection containing all sibling nodes before this node as
719 Mojo::DOM58 objects.
720
721 # "A"
722 $dom->parse('A<!-- B --><p>C</p>')->at('p')->preceding_nodes->first->content;
723
724 prepend
725 $dom = $dom->prepend('<p>I ♥ Mojo::DOM58!</p>');
726 $dom = $dom->prepend(Mojo::DOM58->new);
727
728 Prepend HTML/XML fragment to this node (for all node types other than
729 "root").
730
731 # "<div><h1>Test</h1><h2>123</h2></div>"
732 $dom->parse('<div><h2>123</h2></div>')
733 ->at('h2')->prepend('<h1>Test</h1>')->root;
734
735 # "<p>Test 123</p>"
736 $dom->parse('<p>123</p>')
737 ->at('p')->child_nodes->first->prepend('Test ')->root;
738
739 prepend_content
740 $dom = $dom->prepend_content('<p>I ♥ Mojo::DOM58!</p>');
741 $dom = $dom->prepend_content(Mojo::DOM58->new);
742
743 Prepend HTML/XML fragment (for "root" and "tag" nodes) or raw content
744 to this node's content.
745
746 # "<div><h2>Test123</h2></div>"
747 $dom->parse('<div><h2>123</h2></div>')
748 ->at('h2')->prepend_content('Test')->root;
749
750 # "<!-- Test 123 --><br>"
751 $dom->parse('<!-- 123 --><br>')
752 ->child_nodes->first->prepend_content(' Test')->root;
753
754 # "<p><i>123</i>Test</p>"
755 $dom->parse('<p>Test</p>')->at('p')->prepend_content('<i>123</i>')->root;
756
757 previous
758 my $sibling = $dom->previous;
759
760 Return Mojo::DOM58 object for previous sibling element, or "undef" if
761 there are no more siblings.
762
763 # "<h1>Test</h1>"
764 $dom->parse('<div><h1>Test</h1><h2>123</h2></div>')->at('h2')->previous;
765
766 previous_node
767 my $sibling = $dom->previous_node;
768
769 Return Mojo::DOM58 object for previous sibling node, or "undef" if
770 there are no more siblings.
771
772 # "123"
773 $dom->parse('<p>123<!-- Test --><b>456</b></p>')
774 ->at('b')->previous_node->previous_node;
775
776 # " Test "
777 $dom->parse('<p>123<!-- Test --><b>456</b></p>')
778 ->at('b')->previous_node->content;
779
780 remove
781 my $parent = $dom->remove;
782
783 Remove this node and return "root" (for "root" nodes) or "parent".
784
785 # "<div></div>"
786 $dom->parse('<div><h1>Test</h1></div>')->at('h1')->remove;
787
788 # "<p><b>456</b></p>"
789 $dom->parse('<p>123<b>456</b></p>')
790 ->at('p')->child_nodes->first->remove->root;
791
792 replace
793 my $parent = $dom->replace('<div>I ♥ Mojo::DOM58!</div>');
794 my $parent = $dom->replace(Mojo::DOM58->new);
795
796 Replace this node with HTML/XML fragment and return "root" (for "root"
797 nodes) or "parent".
798
799 # "<div><h2>123</h2></div>"
800 $dom->parse('<div><h1>Test</h1></div>')->at('h1')->replace('<h2>123</h2>');
801
802 # "<p><b>123</b></p>"
803 $dom->parse('<p>Test</p>')
804 ->at('p')->child_nodes->[0]->replace('<b>123</b>')->root;
805
806 root
807 my $root = $dom->root;
808
809 Return Mojo::DOM58 object for "root" node.
810
811 selector
812 my $selector = $dom->selector;
813
814 Get a unique CSS selector for this element.
815
816 # "ul:nth-child(1) > li:nth-child(2)"
817 $dom->parse('<ul><li>Test</li><li>123</li></ul>')->find('li')->last->selector;
818
819 # "p:nth-child(1) > b:nth-child(1) > i:nth-child(1)"
820 $dom->parse('<p><b><i>Test</i></b></p>')->at('i')->selector;
821
822 strip
823 my $parent = $dom->strip;
824
825 Remove this element while preserving its content and return "parent".
826
827 # "<div>Test</div>"
828 $dom->parse('<div><h1>Test</h1></div>')->at('h1')->strip;
829
830 tag
831 my $tag = $dom->tag;
832 $dom = $dom->tag('div');
833
834 This element's tag name.
835
836 # List tag names of child elements
837 say $dom->children->map('tag')->join("\n");
838
839 tap
840 $dom = $dom->tap(sub {...});
841
842 Equivalent to "tap" in Mojo::Base.
843
844 text
845 my $text = $dom->text;
846
847 Extract text content from this element only (not including child
848 elements).
849
850 # "bar"
851 $dom->parse("<div>foo<p>bar</p>baz</div>")->at('p')->text;
852
853 # "foo\nbaz\n"
854 $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->at('div')->text;
855
856 to_string
857 my $str = $dom->to_string;
858
859 Render this node and its content to HTML/XML.
860
861 # "<b>Test</b>"
862 $dom->parse('<div><b>Test</b></div>')->at('div b')->to_string;
863
864 tree
865 my $tree = $dom->tree;
866 $dom = $dom->tree(['root']);
867
868 Document Object Model. Note that this structure should only be used
869 very carefully since it is very dynamic.
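
For orientation only, a read-only peek at the structure. The sketch assumes that each node is an array reference whose first element is the node type and, for "tag" nodes, whose second element is the tag name. This matches what "type" and "tag" report, but the layout is internal and the remaining elements are deliberately not relied on here.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new('<p id="a">Test</p>');

# Underlying structures of the root node and of the <p> element
my $root = $dom->tree;
my $p    = $dom->at('p')->tree;

# Assumption: index 0 is the node type, index 1 the tag name of a tag node
print "root node type: $root->[0]\n";
print "element: type=$p->[0], name=$p->[1]\n";
```

In normal use the public methods such as "type", "tag", "attr" and "child_nodes" are the safer way to reach the same information.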
870
871 type
872 my $type = $dom->type;
873
874 This node's type, usually "cdata", "comment", "doctype", "pi", "raw",
875 "root", "tag" or "text".
876
877 # "cdata"
878 $dom->parse('<![CDATA[Test]]>')->child_nodes->first->type;
879
880 # "comment"
881 $dom->parse('<!-- Test -->')->child_nodes->first->type;
882
883 # "doctype"
884 $dom->parse('<!DOCTYPE html>')->child_nodes->first->type;
885
886 # "pi"
887 $dom->parse('<?xml version="1.0"?>')->child_nodes->first->type;
888
889 # "raw"
890 $dom->parse('<title>Test</title>')->at('title')->child_nodes->first->type;
891
892 # "root"
893 $dom->parse('<p>Test</p>')->type;
894
895 # "tag"
896 $dom->parse('<p>Test</p>')->at('p')->type;
897
898 # "text"
899 $dom->parse('<p>Test</p>')->at('p')->child_nodes->first->type;
900
901 val
902 my $value = $dom->val;
903
904 Extract value from form element (such as "button", "input", "option",
905 "select" and "textarea"), or return "undef" if this element has no
906 value. In the case of "select" with "multiple" attribute, find "option"
907 elements with "selected" attribute and return an array reference with
908 all values, or "undef" if none could be found.
909
910 # "a"
911 $dom->parse('<input name=test value=a>')->at('input')->val;
912
913 # "b"
914 $dom->parse('<textarea>b</textarea>')->at('textarea')->val;
915
916 # "c"
917 $dom->parse('<option value="c">Test</option>')->at('option')->val;
918
919 # "d"
920 $dom->parse('<select><option selected>d</option></select>')
921 ->at('select')->val;
922
923 # "e"
924 $dom->parse('<select multiple><option selected>e</option></select>')
925 ->at('select')->val->[0];
926
927 # "on"
928 $dom->parse('<input name=test type=checkbox>')->at('input')->val;
929
930 with_roles
931 my $new_class = Mojo::DOM58->with_roles('Mojo::DOM58::Role::One');
932 my $new_class = Mojo::DOM58->with_roles('+One', '+Two');
933 $dom = $dom->with_roles('+One', '+Two');
934
935 Equivalent to "with_roles" in Mojo::Base. Note that role support
936 depends on Role::Tiny (2.000001+).
937
938 wrap
939 $dom = $dom->wrap('<div></div>');
940 $dom = $dom->wrap(Mojo::DOM58->new);
941
942 Wrap HTML/XML fragment around this node (for all node types other than
943 "root"), placing it as the last child of the first innermost element.
944
945 # "<p>123<b>Test</b></p>"
946 $dom->parse('<b>Test</b>')->at('b')->wrap('<p>123</p>')->root;
947
948 # "<div><p><b>Test</b></p>123</div>"
949 $dom->parse('<b>Test</b>')->at('b')->wrap('<div><p></p>123</div>')->root;
950
951 # "<p><b>Test</b></p><p>123</p>"
952 $dom->parse('<b>Test</b>')->at('b')->wrap('<p></p><p>123</p>')->root;
953
954 # "<p><b>Test</b></p>"
955 $dom->parse('<p>Test</p>')->at('p')->child_nodes->first->wrap('<b>')->root;
956
957 wrap_content
958 $dom = $dom->wrap_content('<div></div>');
959 $dom = $dom->wrap_content(Mojo::DOM58->new);
960
Wrap HTML/XML fragment around this node's content (for "root" and "tag"
nodes), placing the content as the last children of the first innermost
element.
963
# "<p><b>123Test</b></p>"
$dom->parse('<p>Test</p>')->at('p')->wrap_content('<b>123</b>')->root;
966
967 # "<p><b>Test</b></p><p>123</p>"
968 $dom->parse('<b>Test</b>')->wrap_content('<p></p><p>123</p>');
969
970 xml
971 my $bool = $dom->xml;
972 $dom = $dom->xml($bool);
973
Disable HTML semantics in parser and activate case-sensitivity; defaults
to auto-detection based on XML declarations.
976
COLLECTION METHODS
978 Some Mojo::DOM58 methods return an array-based collection object based
979 on Mojo::Collection, which can either be accessed directly as an array
980 reference, or with the following methods.
981
982 # Chain methods
$collection->map(sub { ucfirst })->shuffle->each(sub {
  my ($word, $num) = @_;
  say "$num: $word";
});
987
988 # Access array directly to manipulate collection
989 $collection->[23] += 100;
990 say for @$collection;
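
The snippets in this section assume an existing $collection. One way to obtain one is from "find", sketched here with made-up list markup. All variable names are illustrative.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new(
  '<ul><li>apple</li><li>banana</li><li></li><li>cherry</li></ul>');

# "find" returns a collection of elements, "map" a collection of strings
my $items = $dom->find('li')->map('text');

my $count    = @$items;                                # direct array access
my $nonempty = $items->compact->join(', ');            # drops the empty string
my $with_an  = $items->grep(qr/an/)->join(', ');       # regular expression filter
my $long     = $items->first(sub { length($_) > 5 });  # callback filter
my $last     = $items->last;

print "$count items: $nonempty\n";
print "grep: $with_an, first: $long, last: $last\n";
```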
991
992 compact
993 my $new = $collection->compact;
994
995 Create a new collection with all elements that are defined and not an
996 empty string.
997
998 # $collection contains (0, 1, undef, 2, '', 3)
999 $collection->compact->join(', '); # "0, 1, 2, 3"
1000
1001 each
1002 my @elements = $collection->each;
1003 $collection = $collection->each(sub {...});
1004
1005 Evaluate callback for each element in collection or return all elements
1006 as a list if none has been provided. The element will be the first
1007 argument passed to the callback and is also available as $_.
1008
1009 # Make a numbered list
$collection->each(sub {
  my ($e, $num) = @_;
  say "$num: $e";
});
1014
   first
         my $first = $collection->first;
         my $first = $collection->first(qr/foo/);
         my $first = $collection->first(sub {...});
         my $first = $collection->first($method);
         my $first = $collection->first($method, @args);

       Evaluate regular expression/callback for, or call method on, each
       element in collection and return the first one that matched the regular
       expression, or for which the callback/method returned true. The element
       will be the first argument passed to the callback and is also available
       as $_.

         # Longer version
         my $first = $collection->first(sub { $_->$method(@args) });

         # Find first value that contains the word "mojo"
         my $interesting = $collection->first(qr/mojo/i);

         # Find first value that is greater than 5
         my $greater = $collection->first(sub { $_ > 5 });

   flatten
         my $new = $collection->flatten;

       Flatten nested collections/arrays recursively and create a new
       collection with all elements.

         # $collection contains (1, [2, [3, 4], 5, [6]], 7)
         $collection->flatten->join(', '); # "1, 2, 3, 4, 5, 6, 7"

   grep
         my $new = $collection->grep(qr/foo/);
         my $new = $collection->grep(sub {...});
         my $new = $collection->grep($method);
         my $new = $collection->grep($method, @args);

       Evaluate regular expression/callback for, or call method on, each
       element in collection and create a new collection with all elements
       that matched the regular expression, or for which the callback/method
       returned true. The element will be the first argument passed to the
       callback and is also available as $_.

         # Longer version
         my $new = $collection->grep(sub { $_->$method(@args) });

         # Find all values that contain the word "mojo"
         my $interesting = $collection->grep(qr/mojo/i);

         # Find all values that are greater than 5
         my $greater = $collection->grep(sub { $_ > 5 });

   join
         my $string = $collection->join;
         my $string = $collection->join("\n");

       Turn collection into string.

         # Join all values with commas
         $collection->join(', ');

   last
         my $last = $collection->last;

       Return the last element in collection.

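       A minimal sketch of "last", assuming a throwaway document with three
       paragraphs (the markup is made up for illustration). It also shows that
       an empty collection yields undef, so the result is worth checking
       before calling methods on it.

```perl
use strict;
use warnings;
use Mojo::DOM58;

# Hypothetical markup for illustration
my $dom = Mojo::DOM58->new('<p>one</p><p>two</p><p>three</p>');

# "find" returns a collection of Mojo::DOM58 objects
my $last = $dom->find('p')->last;
print $last->text, "\n";

# Nothing matches here, so the collection is empty and "last" gives undef
my $none = $dom->find('h1')->last;
print "no match\n" unless defined $none;
```
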
   map
         my $new = $collection->map(sub {...});
         my $new = $collection->map($method);
         my $new = $collection->map($method, @args);

       Evaluate callback for, or call method on, each element in collection
       and create a new collection from the results. The element will be the
       first argument passed to the callback and is also available as $_.

         # Longer version
         my $new = $collection->map(sub { $_->$method(@args) });

         # Append the word "mojo" to all values
         my $domified = $collection->map(sub { $_ . 'mojo' });

   reduce
         my $result = $collection->reduce(sub {...});
         my $result = $collection->reduce(sub {...}, $initial);

       Reduce elements in collection with callback. The first element will be
       used as the initial value if none has been provided.

         # Calculate the sum of all values
         my $sum = $collection->reduce(sub { $a + $b });

         # Count how often each value occurs in collection
         my $hash = $collection->reduce(sub { $a->{$b}++; $a }, {});

   reverse
         my $new = $collection->reverse;

       Create a new collection with all elements in reverse order.

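       For illustration, a short sketch assuming a collection of three strings
       taken from made-up markup. Because "reverse" creates a new collection,
       the original keeps its order.

```perl
use strict;
use warnings;
use Mojo::DOM58;

# Hypothetical markup for illustration
my $dom   = Mojo::DOM58->new('<i>a</i><i>b</i><i>c</i>');
my $texts = $dom->find('i')->map('text');

# New collection in reverse order; $texts itself is left untouched
my $reversed = $texts->reverse;
print $reversed->join(', '), "\n";
print $texts->join(', '), "\n";
```
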
   slice
         my $new = $collection->slice(4 .. 7);

       Create a new collection with all selected elements.

         # $collection contains ('A', 'B', 'C', 'D', 'E')
         $collection->slice(1, 2, 4)->join(' '); # "B C E"

   shuffle
         my $new = $collection->shuffle;

       Create a new collection with all elements in random order.

   size
         my $size = $collection->size;

       Number of elements in collection.

   sort
         my $new = $collection->sort;
         my $new = $collection->sort(sub {...});

       Sort elements based on return value of callback and create a new
       collection from the results.

         # Sort values case-insensitively
         my $case_insensitive = $collection->sort(sub { uc($a) cmp uc($b) });

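       The following sketch contrasts the two forms using made-up numeric
       strings. It assumes that "sort" without a callback falls back to
       Perl's default string comparison, which is why a numeric callback is
       needed for values such as these.

```perl
use strict;
use warnings;
use Mojo::DOM58;

# Hypothetical markup for illustration
my $dom     = Mojo::DOM58->new('<i>10</i><i>9</i><i>100</i>');
my $numbers = $dom->find('i')->map('text');

# Assumed default: string comparison, so "100" sorts before "9"
my $by_string = $numbers->sort;

# Numeric comparison via a callback using $a and $b
my $by_number = $numbers->sort(sub { $a <=> $b });

print $by_string->join(' '), "\n";
print $by_number->join(' '), "\n";
```
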
   tap
         $collection = $collection->tap(sub {...});

       Equivalent to "tap" in Mojo::Base.

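       As a sketch of where "tap" can be handy, the example below records the
       size of a collection in the middle of a method chain. It assumes the
       callback receives the collection as its first argument, as with "tap"
       in Mojo::Base; the markup and variable names are made up.

```perl
use strict;
use warnings;
use Mojo::DOM58;

# Hypothetical markup for illustration
my $dom = Mojo::DOM58->new('<p>a</p><p>b</p>');

# "tap" runs the callback and returns the same collection,
# so the chain continues with "join"
my $seen;
my $joined = $dom->find('p')->map('text')->tap(sub {
  my $c = shift;
  $seen = $c->size;
})->join('-');

print "$seen elements: $joined\n";
```
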
   to_array
         my $array = $collection->to_array;

       Turn collection into array reference.

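       A short sketch, assuming that "to_array" returns a new, unblessed array
       reference holding the same elements, so changes to that array do not
       affect the collection. The markup is made up for illustration.

```perl
use strict;
use warnings;
use Mojo::DOM58;

# Hypothetical markup for illustration
my $dom   = Mojo::DOM58->new('<i>a</i><i>b</i>');
my $texts = $dom->find('i')->map('text');

# Plain array reference with the same elements
my $array = $texts->to_array;

# Assumed to be a copy: pushing onto it leaves the collection alone
push @$array, 'extra';
print scalar(@$array), ' vs ', $texts->size, "\n";
```
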
   uniq
         my $new = $collection->uniq;
         my $new = $collection->uniq(sub {...});
         my $new = $collection->uniq($method);
         my $new = $collection->uniq($method, @args);

       Create a new collection without duplicate elements, using the string
       representation of either the elements or the return value of the
       callback/method to decide uniqueness. Note that "undef" and empty
       string are treated the same.

         # Longer version
         my $new = $collection->uniq(sub { $_->$method(@args) });

         # $collection contains ('foo', 'bar', 'bar', 'baz')
         $collection->uniq->join(' '); # "foo bar baz"

         # $collection contains ([1, 2], [2, 1], [3, 2])
         $collection->uniq(sub { $_->[1] })->to_array; # "[[1, 2], [2, 1]]"

   with_roles
         $collection = $collection->with_roles('Mojo::Collection::Role::One');

       Equivalent to "with_roles" in Mojo::Base. Note that role support
       depends on Role::Tiny (2.000001+).

BUGS
       Report issues related to the format of this distribution or Perl 5.8
       support to the public bugtracker. Any other issues should be reported
       directly to the upstream Mojolicious issue tracker.

AUTHOR
       Dan Book <dbook@cpan.org>

       Code and tests adapted from Mojo::DOM, a lightweight DOM parser by the
       Mojolicious team.

CONTRIBUTORS
       Matt S Trout (mst)

COPYRIGHT AND LICENSE
       Copyright (c) 2008-2016 Sebastian Riedel and others.

       Copyright (c) 2016 "AUTHOR" and "CONTRIBUTORS" for adaptation to
       standalone format.

       This is free software, licensed under:

         The Artistic License 2.0 (GPL Compatible)

SEE ALSO
       Mojo::DOM, HTML::TreeBuilder, XML::LibXML, XML::Twig, XML::Smart



perl v5.30.1                      2020-01-30                    Mojo::DOM58(3)