Mojo::DOM58(3) User Contributed Perl Documentation Mojo::DOM58(3)

NAME
       Mojo::DOM58 - Minimalistic HTML/XML DOM parser with CSS selectors

SYNOPSIS
         use Mojo::DOM58;

         # Parse
         my $dom = Mojo::DOM58->new('<div><p id="a">Test</p><p id="b">123</p></div>');

         # Find
         say $dom->at('#b')->text;
         say $dom->find('p')->map('text')->join("\n");
         say $dom->find('[id]')->map(attr => 'id')->join("\n");

         # Iterate
         $dom->find('p[id]')->reverse->each(sub { say $_->{id} });

         # Loop
         for my $e ($dom->find('p[id]')->each) {
           say $e->{id}, ':', $e->text;
         }

         # Modify
         $dom->find('div p')->last->append('<p id="c">456</p>');
         $dom->at('#c')->prepend($dom->new_tag('p', id => 'd', '789'));
         $dom->find(':not(p)')->map('strip');

         # Render
         say "$dom";

DESCRIPTION
       Mojo::DOM58 is a minimalistic and relaxed pure-Perl HTML/XML DOM parser
       based on Mojo::DOM. It supports the HTML Living Standard
       <https://html.spec.whatwg.org/> and Extensible Markup Language (XML)
       1.0 <https://www.w3.org/TR/xml/>, and matching based on CSS3 selectors
       <https://www.w3.org/TR/selectors/>. It will even try to interpret
       broken HTML and XML, so you should not use it for validation.

FORK INFO
       Mojo::DOM58 is a fork of Mojo::DOM and tracks features and fixes to
       stay closely compatible with upstream. It differs only in the
       standalone format and compatibility with Perl 5.8. Any bugs or patches
       not related to these changes should be reported directly to the
       Mojolicious issue tracker.

       This release of Mojo::DOM58 is up to date with version 9.0 of
       Mojolicious.

NODES AND ELEMENTS
       When we parse an HTML/XML fragment, it gets turned into a tree of
       nodes.

         <!DOCTYPE html>
         <html>
           <head><title>Hello</title></head>
           <body>World!</body>
         </html>

       There are currently eight different kinds of nodes: "cdata",
       "comment", "doctype", "pi", "raw", "root", "tag" and "text". Elements
       are nodes of the type "tag".

         root
         |- doctype (html)
         +- tag (html)
            |- tag (head)
            |  +- tag (title)
            |     +- raw (Hello)
            +- tag (body)
               +- text (World!)

       While all node types are represented as Mojo::DOM58 objects, some
       methods like "attr" and "namespace" only apply to elements.

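       A minimal sketch, assuming Mojo::DOM58 is installed: it parses the
       document above (with the whitespace between tags removed, so that no
       extra "text" nodes appear) and lists the type of every node below
       "root" in document order, using only "descendant_nodes", "find",
       "map" and "type" from this page.

```perl
use strict;
use warnings;
use Mojo::DOM58;

# Same document as above, without whitespace between the tags
my $dom = Mojo::DOM58->new('<!DOCTYPE html><html><head><title>Hello</title>'
  . '</head><body>World!</body></html>');

# descendant_nodes walks the whole tree depth-first, in document order
my @types = $dom->descendant_nodes->map('type')->each;
print "@types\n";    # doctype tag tag tag raw tag text

# Only "tag" nodes are elements, and only elements are matched by find
my @tags = $dom->find('*')->map('tag')->each;
print "@tags\n";     # html head title body
```

       Note that the content of "title" shows up as a "raw" node rather than
       a "text" node, matching the tree diagram.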
CASE SENSITIVITY
       Mojo::DOM58 defaults to HTML semantics, which means that all tag and
       attribute names are lowercased and selectors need to be lowercase as
       well.

         # HTML semantics
         my $dom = Mojo::DOM58->new('<P ID="greeting">Hi!</P>');
         say $dom->at('p[id]')->text;

       If an XML declaration is found, the parser will automatically switch
       into XML mode and everything becomes case-sensitive.

         # XML semantics
         my $dom = Mojo::DOM58->new('<?xml version="1.0"?><P ID="greeting">Hi!</P>');
         say $dom->at('P[ID]')->text;

       HTML or XML semantics can also be forced with the "xml" method.

         # Force HTML semantics
         my $dom = Mojo::DOM58->new->xml(0)->parse('<P ID="greeting">Hi!</P>');
         say $dom->at('p[id]')->text;

         # Force XML semantics
         my $dom = Mojo::DOM58->new->xml(1)->parse('<P ID="greeting">Hi!</P>');
         say $dom->at('P[ID]')->text;

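       A short sketch of the difference, assuming Mojo::DOM58 is installed.
       The variable names are made up for illustration; the last two lines
       rely on the "xml" accessor reporting the mode that was auto-detected
       while parsing.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $html = Mojo::DOM58->new('<P ID="greeting">Hi!</P>');
my $xml  = Mojo::DOM58->new('<?xml version="1.0"?><P ID="greeting">Hi!</P>');

# HTML semantics: tag and attribute names were lowercased while parsing
my $html_tag = $html->at('p')->tag;     # "p"
my $html_id  = $html->at('p')->{id};    # "greeting"

# XML semantics (auto-detected from the declaration): case is preserved
my $xml_tag = $xml->at('P')->tag;       # "P"
my $xml_id  = $xml->at('P')->{ID};      # "greeting"

# Selectors are case-sensitive as well, so a lowercase "p" finds nothing
my $xml_lower = $xml->at('p');          # undef

print "$html_tag $html_id / $xml_tag $xml_id\n";
print 'XML detected: ', ($xml->xml  ? 'yes' : 'no'), "\n";    # yes
print 'XML detected: ', ($html->xml ? 'yes' : 'no'), "\n";    # no
```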
SELECTORS
       Mojo::DOM58 uses a CSS selector engine based on Mojo::DOM::CSS. All CSS
       selectors that make sense for a standalone parser are supported.

       *   Any element.

             my $all = $dom->find('*');

       E   An element of type "E".

             my $title = $dom->at('title');

       E[foo]
           An "E" element with a "foo" attribute.

             my $links = $dom->find('a[href]');

       E[foo="bar"]
           An "E" element whose "foo" attribute value is exactly equal to
           "bar".

             my $case_sensitive = $dom->find('input[type="hidden"]');
             my $case_sensitive = $dom->find('input[type=hidden]');

       E[foo="bar" i]
           An "E" element whose "foo" attribute value is exactly equal to any
           (ASCII-range) case-permutation of "bar". Note that this selector is
           EXPERIMENTAL and might change without warning!

             my $case_insensitive = $dom->find('input[type="hidden" i]');
             my $case_insensitive = $dom->find('input[type=hidden i]');
             my $case_insensitive = $dom->find('input[class~="foo" i]');

           This selector is part of Selectors Level 4
           <https://dev.w3.org/csswg/selectors-4>, which is still a work in
           progress.

       E[foo="bar" s]
           An "E" element whose "foo" attribute value is exactly and
           case-sensitively equal to "bar". Note that this selector is
           EXPERIMENTAL and might change without warning!

             my $case_sensitive = $dom->find('input[type="hidden" s]');

           This selector is part of Selectors Level 4
           <https://dev.w3.org/csswg/selectors-4>, which is still a work in
           progress.

       E[foo~="bar"]
           An "E" element whose "foo" attribute value is a list of
           whitespace-separated values, one of which is exactly equal to
           "bar".

             my $foo = $dom->find('input[class~="foo"]');
             my $foo = $dom->find('input[class~=foo]');

       E[foo^="bar"]
           An "E" element whose "foo" attribute value begins exactly with the
           string "bar".

             my $begins_with = $dom->find('input[name^="f"]');
             my $begins_with = $dom->find('input[name^=f]');

       E[foo$="bar"]
           An "E" element whose "foo" attribute value ends exactly with the
           string "bar".

             my $ends_with = $dom->find('input[name$="o"]');
             my $ends_with = $dom->find('input[name$=o]');

       E[foo*="bar"]
           An "E" element whose "foo" attribute value contains the substring
           "bar".

             my $contains = $dom->find('input[name*="fo"]');
             my $contains = $dom->find('input[name*=fo]');

       E[foo|="en"]
           An "E" element whose "foo" attribute has a hyphen-separated list of
           values beginning (from the left) with "en".

             my $english = $dom->find('link[hreflang|=en]');

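       The attribute selectors above can be compared side by side. This
       sketch assumes Mojo::DOM58 is installed; the markup and the $names
       helper are invented for illustration. It records the "name" of every
       element each selector matches, in document order.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new(
  '<input name="foo" class="x y">'
  . '<input name="bar" class="y">'
  . '<input name="baz" lang="en-US">');

# Hypothetical helper: comma-separated "name" attributes of all matches
my $names = sub {
  my $selector = shift;
  return $dom->find($selector)->map(attr => 'name')->join(',');
};

my %result;
for my $selector (
  'input[name^=b]',       # bar,baz  begins with
  'input[name$=o]',       # foo      ends with
  'input[name*=a]',       # bar,baz  contains
  'input[class~=y]',      # foo,bar  one of the whitespace-separated values
  'input[lang|=en]',      # baz      "en" or starts with "en-"
  'input[name=FOO i]',    # foo      case-insensitive (EXPERIMENTAL)
) {
  $result{$selector} = $names->($selector);
}
print "$_ => $result{$_}\n" for sort keys %result;
```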
       E:root
           An "E" element, root of the document.

             my $root = $dom->at(':root');

       E:nth-child(n)
           An "E" element, the "n-th" child of its parent.

             my $third = $dom->find('div:nth-child(3)');
             my $odd = $dom->find('div:nth-child(odd)');
             my $even = $dom->find('div:nth-child(even)');
             my $top3 = $dom->find('div:nth-child(-n+3)');

       E:nth-last-child(n)
           An "E" element, the "n-th" child of its parent, counting from the
           last one.

             my $third = $dom->find('div:nth-last-child(3)');
             my $odd = $dom->find('div:nth-last-child(odd)');
             my $even = $dom->find('div:nth-last-child(even)');
             my $bottom3 = $dom->find('div:nth-last-child(-n+3)');

       E:nth-of-type(n)
           An "E" element, the "n-th" sibling of its type.

             my $third = $dom->find('div:nth-of-type(3)');
             my $odd = $dom->find('div:nth-of-type(odd)');
             my $even = $dom->find('div:nth-of-type(even)');
             my $top3 = $dom->find('div:nth-of-type(-n+3)');

       E:nth-last-of-type(n)
           An "E" element, the "n-th" sibling of its type, counting from the
           last one.

             my $third = $dom->find('div:nth-last-of-type(3)');
             my $odd = $dom->find('div:nth-last-of-type(odd)');
             my $even = $dom->find('div:nth-last-of-type(even)');
             my $bottom3 = $dom->find('div:nth-last-of-type(-n+3)');

       E:first-child
           An "E" element, first child of its parent.

             my $first = $dom->find('div p:first-child');

       E:last-child
           An "E" element, last child of its parent.

             my $last = $dom->find('div p:last-child');

       E:first-of-type
           An "E" element, first sibling of its type.

             my $first = $dom->find('div p:first-of-type');

       E:last-of-type
           An "E" element, last sibling of its type.

             my $last = $dom->find('div p:last-of-type');

       E:only-child
           An "E" element, only child of its parent.

             my $lonely = $dom->find('div p:only-child');

       E:only-of-type
           An "E" element, only sibling of its type.

             my $lonely = $dom->find('div p:only-of-type');

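       A sketch of how the "an+b" arguments select positions, assuming
       Mojo::DOM58 is installed and using a made-up list of six items. Each
       selector is mapped to the text of the matching "li" elements.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new(
  '<ul><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li></ul>');

# Comma-separated text of every matching <li>, in document order
my $texts = sub {
  my $selector = shift;
  return $dom->find($selector)->map('text')->join(',');
};

my %result;
for my $selector (
  'li:nth-child(odd)',          # 1,3,5
  'li:nth-child(even)',         # 2,4,6
  'li:nth-child(3n+1)',         # 1,4    positions 3n+1 for n = 0, 1, ...
  'li:nth-child(-n+3)',         # 1,2,3  the first three
  'li:nth-last-child(-n+2)',    # 5,6    the last two
  'li:first-child',             # 1
  'li:last-of-type',            # 6
) {
  $result{$selector} = $texts->($selector);
}
print "$_ => $result{$_}\n" for sort keys %result;
```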
       E:empty
           An "E" element that has no children (including text nodes).

             my $empty = $dom->find(':empty');

       E:any-link
           Alias for "E:link". Note that this selector is EXPERIMENTAL and
           might change without warning! This selector is part of Selectors
           Level 4 <https://dev.w3.org/csswg/selectors-4>, which is still a
           work in progress.

       E:link
           An "E" element being the source anchor of a hyperlink of which the
           target is not yet visited (":link") or already visited
           (":visited"). Note that Mojo::DOM58 is not stateful, so
           ":any-link", ":link" and ":visited" yield exactly the same results.

             my $links = $dom->find(':any-link');
             my $links = $dom->find(':link');
             my $links = $dom->find(':visited');

       E:visited
           Alias for "E:link".

       E:scope
           An "E" element being a designated reference element. Note that this
           selector is EXPERIMENTAL and might change without warning!

             my $scoped = $dom->find('a:not(:scope > a)');
             my $scoped = $dom->find('div :scope p');
             my $scoped = $dom->find('~ p');

           This selector is part of Selectors Level 4
           <https://dev.w3.org/csswg/selectors-4>, which is still a work in
           progress.

       E:checked
           A user interface element "E" which is checked (for instance a radio
           button or checkbox).

             my $input = $dom->find(':checked');

       E.warning
           An "E" element whose class is "warning".

             my $warning = $dom->find('div.warning');

       E#myid
           An "E" element with "ID" equal to "myid".

             my $foo = $dom->at('div#foo');

       E:not(s1, s2)
           An "E" element that does not match either compound selector "s1" or
           compound selector "s2". Note that support for compound selectors is
           EXPERIMENTAL and might change without warning!

             my $others = $dom->find('div p:not(:first-child, :last-child)');

           Support for compound selectors was added as part of Selectors Level
           4 <https://dev.w3.org/csswg/selectors-4>, which is still a work in
           progress.

       E:is(s1, s2)
           An "E" element that matches compound selector "s1" and/or compound
           selector "s2". Note that this selector is EXPERIMENTAL and might
           change without warning!

             my $headers = $dom->find(':is(section, article, aside, nav) h1');

           This selector is part of Selectors Level 4
           <https://dev.w3.org/csswg/selectors-4>, which is still a work in
           progress.

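       A small sketch of ":not" with a selector list and of ":is", assuming
       Mojo::DOM58 is installed and using invented markup. Both features are
       marked EXPERIMENTAL above, so the behaviour shown here may change.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new(
  '<div><p>A</p><p>B</p><p>C</p></div>'
  . '<section><h1>S</h1></section><aside><h1>X</h1></aside>'
  . '<footer><h1>F</h1></footer>');

# Every <p> in the <div> except the first and the last one
my $middle = $dom->find('div p:not(:first-child, :last-child)')
  ->map('text')->join(',');    # "B"

# <h1> elements inside a <section> or an <aside>, but not inside <footer>
my $headers = $dom->find(':is(section, aside) h1')
  ->map('text')->join(',');    # "S,X"

print "$middle\n$headers\n";
```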
       E:has(rs1, rs2)
           An "E" element, if either of the relative selectors "rs1" or "rs2",
           when evaluated with "E" as the :scope element, matches an element.
           Note that this selector is EXPERIMENTAL and might change without
           warning!

             my $link = $dom->find('a:has(> img)');

           This selector is part of Selectors Level 4
           <https://dev.w3.org/csswg/selectors-4>, which is still a work in
           progress. Also be aware that this feature is currently marked
           "at-risk", so there is a high chance that it will get removed
           completely.

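       A sketch contrasting ":has" with and without a leading combinator,
       assuming Mojo::DOM58 is installed; the links are invented. Given the
       EXPERIMENTAL and "at-risk" status noted above, treat it as
       illustrative only.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new(
  '<a href="/1"><img src="one.png"></a>'
  . '<a href="/2">text only</a>'
  . '<a href="/3"><span><img src="three.png"></span></a>');

# "> img": the <img> must be a direct child of the link
my $direct = $dom->find('a:has(> img)')->map(attr => 'href')->join(',');    # "/1"

# No combinator: the <img> may be any descendant of the link
my $any = $dom->find('a:has(img)')->map(attr => 'href')->join(',');         # "/1,/3"

print "$direct\n$any\n";
```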
       A|E An "E" element that belongs to the namespace alias "A" from CSS
           Namespaces Module Level 3
           <https://www.w3.org/TR/css-namespaces-3/>. Key/value pairs passed
           to selector methods are used to declare namespace aliases.

             my $elem = $dom->find('lq|elem', lq => 'http://example.com/q-markup');

           Using an empty alias searches for an element that belongs to no
           namespace.

             my $div = $dom->find('|div');

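       A sketch of namespace aliases, assuming Mojo::DOM58 is installed. XML
       mode is forced so that the self-closing tags are unambiguous; the
       "doc" wrapper element and the "id" values are invented for
       illustration.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $svg = 'http://www.w3.org/2000/svg';
my $dom = Mojo::DOM58->new->xml(1)->parse(
  qq{<doc><svg xmlns="$svg"><rect id="a"/></svg><rect id="b"/></doc>});

# The trailing key/value pair binds the alias "svg" to the SVG namespace
my $in_svg = $dom->find('svg|rect', svg => $svg)->map(attr => 'id')->join(',');    # "a"

# Empty alias: <rect> elements that belong to no namespace at all
my $no_ns = $dom->find('|rect')->map(attr => 'id')->join(',');                     # "b"

# Without an alias the namespace is not considered
my $all = $dom->find('rect')->map(attr => 'id')->join(',');                        # "a,b"

print "$in_svg / $no_ns / $all\n";
```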
       E F An "F" element descendant of an "E" element.

             my $headlines = $dom->find('div h1');

       E > F
           An "F" element child of an "E" element.

             my $headlines = $dom->find('html > body > div > h1');

       E + F
           An "F" element immediately preceded by an "E" element.

             my $second = $dom->find('h1 + h2');

       E ~ F
           An "F" element preceded by an "E" element.

             my $second = $dom->find('h1 ~ h2');

       E, F, G
           Elements of type "E", "F" and "G".

             my $headlines = $dom->find('h1, h2, h3');

       E[foo=bar][bar=baz]
           An "E" element whose attributes match all following attribute
           selectors.

             my $links = $dom->find('a[foo^=b][foo$=ar]');

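       The combinators can be compared on a single fragment. A sketch,
       assuming Mojo::DOM58 is installed; the markup is invented for
       illustration. Results are listed in document order.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new('<div><h1>A</h1><h2>B</h2><p>x</p><h2>C</h2>'
  . '<section><h2>E</h2></section></div><h2>D</h2>');

my $texts = sub {
  my $selector = shift;
  return $dom->find($selector)->map('text')->join(',');
};

my %result;
for my $selector (
  'div h2',      # B,C,E  descendant at any depth
  'div > h2',    # B,C    direct child only
  'h1 + h2',     # B      immediately preceded by <h1>
  'h1 ~ h2',     # B,C    preceded by an <h1> sibling
  'div + h2',    # D      immediately preceded by <div>
  'h1, p',       # A,x    selector list
) {
  $result{$selector} = $texts->($selector);
}
print "$_ => $result{$_}\n" for sort keys %result;
```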
OPERATORS
       Mojo::DOM58 overloads the following operators.

   array
         my @nodes = @$dom;

       Alias for "child_nodes".

         # "<!-- Test -->"
         $dom->parse('<!-- Test --><b>123</b>')->[0];

   bool
         my $bool = !!$dom;

       Always true.

   hash
         my %attrs = %$dom;

       Alias for "attr".

         # "test"
         $dom->parse('<div id="test">Test</div>')->at('div')->{id};

   stringify
         my $str = "$dom";

       Alias for "to_string".

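       A sketch exercising all four overloads on one fragment, assuming
       Mojo::DOM58 is installed; the markup is invented for illustration.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new('<!-- Test --><div id="test">Hi</div>');

# array: @$dom is the list of child nodes (a comment and a <div> here)
my @nodes      = @$dom;
my $first_type = $nodes[0]->type;          # "comment"

# hash: %$element is the element's attribute hash
my %attrs = %{ $dom->at('div') };          # (id => 'test')

# bool: every node object is true
my $bool = !!$dom;

# stringify: "$dom" renders the whole tree back to HTML
my $html = "$dom";                         # '<!-- Test --><div id="test">Hi</div>'

print scalar(@nodes), " nodes, first is $first_type, id=$attrs{id}\n$html\n";
```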
FUNCTIONS
       Mojo::DOM58 implements the following functions, which can be imported
       individually.

   tag_to_html
         my $str = tag_to_html 'div', id => 'foo', 'safe content';

       Generate an HTML/XML tag and render it right away. This is a
       significantly faster alternative to "new_tag" for template systems that
       have to generate a lot of tags.

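       A sketch of "tag_to_html", assuming it is imported explicitly as
       described above. Plain string content is escaped, while content
       returned by a code reference is inserted as-is, following the same
       "safe"/"unsafe" content convention shown for "new_tag".

```perl
use strict;
use warnings;
use Mojo::DOM58 'tag_to_html';

# A plain string is treated as text and escaped
my $escaped = tag_to_html 'div', id => 'foo', 'a < b & c';
# '<div id="foo">a &lt; b &amp; c</div>'

# A code reference marks its return value as markup that is not escaped
my $raw = tag_to_html 'div', sub { '<b>bold</b>' };
# '<div><b>bold</b></div>'

# Void elements are rendered without an end tag in HTML mode
my $br = tag_to_html 'br';
# '<br>'

print "$escaped\n$raw\n$br\n";
```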
METHODS
       Mojo::DOM58 implements the following methods.

   new
         my $dom = Mojo::DOM58->new;
         my $dom = Mojo::DOM58->new('<foo bar="baz">I ♥ Mojo::DOM58!</foo>');

       Construct a new scalar-based Mojo::DOM58 object and "parse" an HTML/XML
       fragment if one is passed.

   new_tag
         my $tag = Mojo::DOM58->new_tag('div');
         my $tag = $dom->new_tag('div');
         my $tag = $dom->new_tag('div', id => 'foo', hidden => undef);
         my $tag = $dom->new_tag('div', 'safe content');
         my $tag = $dom->new_tag('div', id => 'foo', 'safe content');
         my $tag = $dom->new_tag('div', data => {mojo => 'rocks'}, 'safe content');
         my $tag = $dom->new_tag('div', id => 'foo', sub { 'unsafe content' });

       Construct a new Mojo::DOM58 object for an HTML/XML tag with or without
       attributes and content. The "data" attribute may contain a hash
       reference with key/value pairs to generate attributes from; the keys
       are lowercased and prefixed with "data-".

         # "<br>"
         $dom->new_tag('br');

         # "<div></div>"
         $dom->new_tag('div');

         # "<div id="foo" hidden></div>"
         $dom->new_tag('div', id => 'foo', hidden => undef);

         # "<div>test &amp; 123</div>"
         $dom->new_tag('div', 'test & 123');

         # "<div id="foo">test &amp; 123</div>"
         $dom->new_tag('div', id => 'foo', 'test & 123');

         # "<div data-foo="1" data-bar="test">test &amp; 123</div>"
         $dom->new_tag('div', data => {foo => 1, Bar => 'test'}, 'test & 123');

         # "<div id="foo">test & 123</div>"
         $dom->new_tag('div', id => 'foo', sub { 'test & 123' });

         # "<div>Hello<b>Mojo!</b></div>"
         $dom->parse('<div>Hello</div>')->at('div')
           ->append_content($dom->new_tag('b', 'Mojo!'))->root;

   all_text
         my $text = $dom->all_text;

       Extract text content from all descendant nodes of this element. For
       HTML documents, "script" and "style" elements are excluded.

         # "foo\nbarbaz\n"
         $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->at('div')->all_text;

   ancestors
         my $collection = $dom->ancestors;
         my $collection = $dom->ancestors('div ~ p');

       Find all ancestor elements of this node matching the CSS selector and
       return a collection containing these elements as Mojo::DOM58 objects.
       All selectors listed in "SELECTORS" are supported.

         # List tag names of ancestor elements
         say $dom->ancestors->map('tag')->join("\n");

   append
         $dom = $dom->append('<p>I ♥ Mojo::DOM58!</p>');
         $dom = $dom->append(Mojo::DOM58->new);

       Append an HTML/XML fragment to this node (for all node types other than
       "root").

         # "<div><h1>Test</h1><h2>123</h2></div>"
         $dom->parse('<div><h1>Test</h1></div>')
           ->at('h1')->append('<h2>123</h2>')->root;

         # "<p>Test 123</p>"
         $dom->parse('<p>Test</p>')->at('p')
           ->child_nodes->first->append(' 123')->root;

   append_content
         $dom = $dom->append_content('<p>I ♥ Mojo::DOM58!</p>');
         $dom = $dom->append_content(Mojo::DOM58->new);

       Append an HTML/XML fragment (for "root" and "tag" nodes) or raw content
       to this node's content.

         # "<div><h1>Test123</h1></div>"
         $dom->parse('<div><h1>Test</h1></div>')
           ->at('h1')->append_content('123')->root;

         # "<!-- Test 123 --><br>"
         $dom->parse('<!-- Test --><br>')
           ->child_nodes->first->append_content('123 ')->root;

         # "<p>Test<i>123</i></p>"
         $dom->parse('<p>Test</p>')->at('p')->append_content('<i>123</i>')->root;

   at
         my $result = $dom->at('div ~ p');
         my $result = $dom->at('svg|line', svg => 'http://www.w3.org/2000/svg');

       Find the first descendant element of this element matching the CSS
       selector and return it as a Mojo::DOM58 object, or "undef" if none
       could be found. All selectors listed in "SELECTORS" are supported.

         # Find first element with "svg" namespace definition
         my $namespace = $dom->at('[xmlns\:svg]')->{'xmlns:svg'};

       Trailing key/value pairs can be used to declare XML namespace aliases.

         # "<rect />"
         $dom->parse('<svg xmlns="http://www.w3.org/2000/svg"><rect /></svg>')
           ->at('svg|rect', svg => 'http://www.w3.org/2000/svg');

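       Because "at" returns "undef" when nothing matches, chaining another
       method call onto it would die in that case. A sketch of guarding
       against this, assuming Mojo::DOM58 is installed; the markup is
       invented for illustration.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new('<div><p class="x">A</p><p class="x">B</p></div>');

# "at" returns only the first match in document order
my $first = $dom->at('p.x')->text;    # "A"

# Nothing matches "p.y", so check for undef before calling "text"
my $node    = $dom->at('p.y');
my $missing = defined $node ? $node->text : 'none';    # "none"

print "$first $missing\n";
```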
   attr
         my $hash = $dom->attr;
         my $foo = $dom->attr('foo');
         $dom = $dom->attr({foo => 'bar'});
         $dom = $dom->attr(foo => 'bar');

       This element's attributes.

         # Remove an attribute
         delete $dom->attr->{id};

         # Attribute without value
         $dom->attr(selected => undef);

         # List id attributes
         say $dom->find('*')->map(attr => 'id')->compact->join("\n");

   child_nodes
         my $collection = $dom->child_nodes;

       Return a collection containing all child nodes of this element as
       Mojo::DOM58 objects.

         # "<p><b>123</b></p>"
         $dom->parse('<p>Test<b>123</b></p>')->at('p')->child_nodes->first->remove;

         # "<!DOCTYPE html>"
         $dom->parse('<!DOCTYPE html><b>123</b>')->child_nodes->first;

         # " Test "
         $dom->parse('<b>123</b><!-- Test -->')->child_nodes->last->content;

   children
         my $collection = $dom->children;
         my $collection = $dom->children('div ~ p');

       Find all child elements of this element matching the CSS selector and
       return a collection containing these elements as Mojo::DOM58 objects.
       All selectors listed in "SELECTORS" are supported.

         # Show tag name of random child element
         say $dom->children->shuffle->first->tag;

   content
         my $str = $dom->content;
         $dom = $dom->content('<p>I ♥ Mojo::DOM58!</p>');
         $dom = $dom->content(Mojo::DOM58->new);

       Return this node's content or replace it with an HTML/XML fragment (for
       "root" and "tag" nodes) or raw content.

         # "<b>Test</b>"
         $dom->parse('<div><b>Test</b></div>')->at('div')->content;

         # "<div><h1>123</h1></div>"
         $dom->parse('<div><h1>Test</h1></div>')->at('h1')->content('123')->root;

         # "<p><i>123</i></p>"
         $dom->parse('<p>Test</p>')->at('p')->content('<i>123</i>')->root;

         # "<div><h1></h1></div>"
         $dom->parse('<div><h1>Test</h1></div>')->at('h1')->content('')->root;

         # " Test "
         $dom->parse('<!-- Test --><br>')->child_nodes->first->content;

         # "<div><!-- 123 -->456</div>"
         $dom->parse('<div><!-- Test -->456</div>')
           ->at('div')->child_nodes->first->content(' 123 ')->root;

   descendant_nodes
         my $collection = $dom->descendant_nodes;

       Return a collection containing all descendant nodes of this element as
       Mojo::DOM58 objects.

         # "<p><b>123</b></p>"
         $dom->parse('<p><!-- Test --><b>123<!-- 456 --></b></p>')
           ->descendant_nodes->grep(sub { $_->type eq 'comment' })
           ->map('remove')->first;

         # "<p><b>test</b>test</p>"
         $dom->parse('<p><b>123</b>456</p>')
           ->at('p')->descendant_nodes->grep(sub { $_->type eq 'text' })
           ->map(content => 'test')->first->root;

   find
         my $collection = $dom->find('div ~ p');
         my $collection = $dom->find('svg|line', svg => 'http://www.w3.org/2000/svg');

       Find all descendant elements of this element matching the CSS selector
       and return a collection containing these elements as Mojo::DOM58
       objects. All selectors listed in "SELECTORS" are supported.

         # Find a specific element and extract information
         my $id = $dom->find('div')->[23]{id};

         # Extract information from multiple elements
         my @headers = $dom->find('h1, h2, h3')->map('text')->each;

         # Count all the different tags
         my $hash = $dom->find('*')->reduce(sub { $a->{$b->tag}++; $a }, {});

         # Find elements with a class that contains dots
         my @divs = $dom->find('div.foo\.bar')->each;

       Trailing key/value pairs can be used to declare XML namespace aliases.

         # "<rect />"
         $dom->parse('<svg xmlns="http://www.w3.org/2000/svg"><rect /></svg>')
           ->find('svg|rect', svg => 'http://www.w3.org/2000/svg')->first;

   following
         my $collection = $dom->following;
         my $collection = $dom->following('div ~ p');

       Find all sibling elements after this node matching the CSS selector and
       return a collection containing these elements as Mojo::DOM58 objects.
       All selectors listed in "SELECTORS" are supported.

         # List tags of sibling elements after this node
         say $dom->following->map('tag')->join("\n");

   following_nodes
         my $collection = $dom->following_nodes;

       Return a collection containing all sibling nodes after this node as
       Mojo::DOM58 objects.

         # "C"
         $dom->parse('<p>A</p><!-- B -->C')->at('p')->following_nodes->last->content;

   matches
         my $bool = $dom->matches('div ~ p');
         my $bool = $dom->matches('svg|line', svg => 'http://www.w3.org/2000/svg');

       Check if this element matches the CSS selector. All selectors listed in
       "SELECTORS" are supported.

         # True
         $dom->parse('<p class="a">A</p>')->at('p')->matches('.a');
         $dom->parse('<p class="a">A</p>')->at('p')->matches('p[class]');

         # False
         $dom->parse('<p class="a">A</p>')->at('p')->matches('.b');
         $dom->parse('<p class="a">A</p>')->at('p')->matches('p[id]');

       Trailing key/value pairs can be used to declare XML namespace aliases.

         # True
         $dom->parse('<svg xmlns="http://www.w3.org/2000/svg"><rect /></svg>')
           ->at('rect')->matches('svg|rect', svg => 'http://www.w3.org/2000/svg');

   namespace
         my $namespace = $dom->namespace;

       Find this element's namespace, or return "undef" if none could be
       found.

         # "http://www.w3.org/2000/svg"
         Mojo::DOM58->new('<svg xmlns:svg="http://www.w3.org/2000/svg"><svg:circle>3.14</svg:circle></svg>')
           ->at('svg\:circle')->namespace;

         # Find namespace for an element with namespace prefix
         my $namespace = $dom->at('svg > svg\:circle')->namespace;

         # Find namespace for an element that may or may not have a namespace prefix
         my $namespace = $dom->at('svg > circle')->namespace;

   next
         my $sibling = $dom->next;

       Return a Mojo::DOM58 object for the next sibling element, or "undef" if
       there are no more siblings.

         # "<h2>123</h2>"
         $dom->parse('<div><h1>Test</h1><h2>123</h2></div>')->at('h1')->next;

   next_node
         my $sibling = $dom->next_node;

       Return a Mojo::DOM58 object for the next sibling node, or "undef" if
       there are no more siblings.

         # "456"
         $dom->parse('<p><b>123</b><!-- Test -->456</p>')
           ->at('b')->next_node->next_node;

         # " Test "
         $dom->parse('<p><b>123</b><!-- Test -->456</p>')
           ->at('b')->next_node->content;

   parent
         my $parent = $dom->parent;

       Return a Mojo::DOM58 object for the parent of this node, or "undef" if
       this node has no parent.

         # "<b><i>Test</i></b>"
         $dom->parse('<p><b><i>Test</i></b></p>')->at('i')->parent;

   parse
         $dom = $dom->parse('<foo bar="baz">I ♥ Mojo::DOM58!</foo>');

       Parse an HTML/XML fragment.

         # Parse XML
         my $dom = Mojo::DOM58->new->xml(1)->parse('<foo>I ♥ Mojo::DOM58!</foo>');

   preceding
         my $collection = $dom->preceding;
         my $collection = $dom->preceding('div ~ p');

       Find all sibling elements before this node matching the CSS selector
       and return a collection containing these elements as Mojo::DOM58
       objects. All selectors listed in "SELECTORS" are supported.

         # List tags of sibling elements before this node
         say $dom->preceding->map('tag')->join("\n");

   preceding_nodes
         my $collection = $dom->preceding_nodes;

       Return a collection containing all sibling nodes before this node as
       Mojo::DOM58 objects.

         # "A"
         $dom->parse('A<!-- B --><p>C</p>')->at('p')->preceding_nodes->first->content;

   prepend
         $dom = $dom->prepend('<p>I ♥ Mojo::DOM58!</p>');
         $dom = $dom->prepend(Mojo::DOM58->new);

       Prepend an HTML/XML fragment to this node (for all node types other
       than "root").

         # "<div><h1>Test</h1><h2>123</h2></div>"
         $dom->parse('<div><h2>123</h2></div>')
           ->at('h2')->prepend('<h1>Test</h1>')->root;

         # "<p>Test 123</p>"
         $dom->parse('<p>123</p>')
           ->at('p')->child_nodes->first->prepend('Test ')->root;

   prepend_content
         $dom = $dom->prepend_content('<p>I ♥ Mojo::DOM58!</p>');
         $dom = $dom->prepend_content(Mojo::DOM58->new);

       Prepend an HTML/XML fragment (for "root" and "tag" nodes) or raw
       content to this node's content.

         # "<div><h2>Test123</h2></div>"
         $dom->parse('<div><h2>123</h2></div>')
           ->at('h2')->prepend_content('Test')->root;

         # "<!-- Test 123 --><br>"
         $dom->parse('<!-- 123 --><br>')
           ->child_nodes->first->prepend_content(' Test')->root;

         # "<p><i>123</i>Test</p>"
         $dom->parse('<p>Test</p>')->at('p')->prepend_content('<i>123</i>')->root;

   previous
         my $sibling = $dom->previous;

       Return a Mojo::DOM58 object for the previous sibling element, or
       "undef" if there are no more siblings.

         # "<h1>Test</h1>"
         $dom->parse('<div><h1>Test</h1><h2>123</h2></div>')->at('h2')->previous;

   previous_node
         my $sibling = $dom->previous_node;

       Return a Mojo::DOM58 object for the previous sibling node, or "undef"
       if there are no more siblings.

         # "123"
         $dom->parse('<p>123<!-- Test --><b>456</b></p>')
           ->at('b')->previous_node->previous_node;

         # " Test "
         $dom->parse('<p>123<!-- Test --><b>456</b></p>')
           ->at('b')->previous_node->content;

   remove
         my $parent = $dom->remove;

       Remove this node and return "root" (for "root" nodes) or "parent".

         # "<div></div>"
         $dom->parse('<div><h1>Test</h1></div>')->at('h1')->remove;

         # "<p><b>456</b></p>"
         $dom->parse('<p>123<b>456</b></p>')
           ->at('p')->child_nodes->first->remove->root;

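       The tree-editing methods above can be combined. A sketch, assuming
       Mojo::DOM58 is installed, that edits a list step by step and then
       renders the result; the list items are invented for illustration.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new('<ul><li>b</li></ul>');

$dom->at('li')->prepend('<li>a</li>');              # new sibling before <li>b</li>
$dom->at('li:last-child')->append('<li>c</li>');    # new sibling after <li>b</li>
$dom->at('ul')->append_content('<li>d</li>');       # new last child of <ul>
$dom->at('li:nth-child(2)')->remove;                # drop <li>b</li>

my $html = $dom->to_string;    # '<ul><li>a</li><li>c</li><li>d</li></ul>'
print "$html\n";
```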
   replace
         my $parent = $dom->replace('<div>I ♥ Mojo::DOM58!</div>');
         my $parent = $dom->replace(Mojo::DOM58->new);

       Replace this node with an HTML/XML fragment and return "root" (for
       "root" nodes) or "parent".

         # "<div><h2>123</h2></div>"
         $dom->parse('<div><h1>Test</h1></div>')->at('h1')->replace('<h2>123</h2>');

         # "<p><b>123</b></p>"
         $dom->parse('<p>Test</p>')
           ->at('p')->child_nodes->[0]->replace('<b>123</b>')->root;

   root
         my $root = $dom->root;

       Return the Mojo::DOM58 object for the "root" node.

   selector
         my $selector = $dom->selector;

       Get a unique CSS selector for this element.

         # "ul:nth-child(1) > li:nth-child(2)"
         $dom->parse('<ul><li>Test</li><li>123</li></ul>')->find('li')->last->selector;

         # "p:nth-child(1) > b:nth-child(1) > i:nth-child(1)"
         $dom->parse('<p><b><i>Test</i></b></p>')->at('i')->selector;

   strip
         my $parent = $dom->strip;

       Remove this element while preserving its content and return "parent".

         # "<div>Test</div>"
         $dom->parse('<div><h1>Test</h1></div>')->at('h1')->strip;

   tag
         my $tag = $dom->tag;
         $dom = $dom->tag('div');

       This element's tag name.

         # List tag names of child elements
         say $dom->children->map('tag')->join("\n");

   tap
         $dom = $dom->tap(sub {...});

       Equivalent to "tap" in Mojo::Base.

   text
         my $text = $dom->text;

       Extract text content from this element only (not including child
       elements).

         # "bar"
         $dom->parse("<div>foo<p>bar</p>baz</div>")->at('p')->text;

         # "foo\nbaz\n"
         $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->at('div')->text;

   to_string
         my $str = $dom->to_string;

       Render this node and its content to HTML/XML.

         # "<b>Test</b>"
         $dom->parse('<div><b>Test</b></div>')->at('div b')->to_string;

       To extract text content from all descendant nodes, see "all_text".

   tree
         my $tree = $dom->tree;
         $dom = $dom->tree(['root']);

       Document Object Model. Note that this structure should only be used
       very carefully since it is very dynamic.

   type
         my $type = $dom->type;

       This node's type, usually "cdata", "comment", "doctype", "pi", "raw",
       "root", "tag" or "text".

         # "cdata"
         $dom->parse('<![CDATA[Test]]>')->child_nodes->first->type;

         # "comment"
         $dom->parse('<!-- Test -->')->child_nodes->first->type;

         # "doctype"
         $dom->parse('<!DOCTYPE html>')->child_nodes->first->type;

         # "pi"
         $dom->parse('<?xml version="1.0"?>')->child_nodes->first->type;

         # "raw"
         $dom->parse('<title>Test</title>')->at('title')->child_nodes->first->type;

         # "root"
         $dom->parse('<p>Test</p>')->type;

         # "tag"
         $dom->parse('<p>Test</p>')->at('p')->type;

         # "text"
         $dom->parse('<p>Test</p>')->at('p')->child_nodes->first->type;

   val
         my $value = $dom->val;

       Extract the value from a form element (such as "button", "input",
       "option", "select" and "textarea"), or return "undef" if this element
       has no value. In the case of a "select" with a "multiple" attribute,
       find the "option" elements with a "selected" attribute and return an
       array reference with all their values, or "undef" if none could be
       found.

         # "a"
         $dom->parse('<input name=test value=a>')->at('input')->val;

         # "b"
         $dom->parse('<textarea>b</textarea>')->at('textarea')->val;

         # "c"
         $dom->parse('<option value="c">Test</option>')->at('option')->val;

         # "d"
         $dom->parse('<select><option selected>d</option></select>')
           ->at('select')->val;

         # "e"
         $dom->parse('<select multiple><option selected>e</option></select>')
           ->at('select')->val->[0];

         # "on"
         $dom->parse('<input name=test type=checkbox>')->at('input')->val;

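       A sketch that collects the values of a whole form into a hash,
       assuming Mojo::DOM58 is installed; the form markup is invented for
       illustration. As described above, a "select" with "multiple" yields
       an array reference, and an "option" without a "value" attribute
       contributes its text.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new(
  '<form>'
  . '<input name="user" value="sri">'
  . '<textarea name="bio">Perl</textarea>'
  . '<select name="lang"><option>en</option><option selected>de</option></select>'
  . '<select name="tags" multiple>'
  . '<option selected>a</option><option>b</option><option value="c" selected>C</option>'
  . '</select>'
  . '</form>');

# name => value for every named form element; the parentheses make Perl
# parse the braces as a block rather than an anonymous hash
my %form = map { ($_->{name}, $_->val) } $dom->find('[name]')->each;

for my $name (sort keys %form) {
  my $value = $form{$name};
  print "$name = ", (ref $value ? join(',', @$value) : $value), "\n";
}
```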
980 with_roles
981 my $new_class = Mojo::DOM58->with_roles('Mojo::DOM58::Role::One');
982 my $new_class = Mojo::DOM58->with_roles('+One', '+Two');
983 $dom = $dom->with_roles('+One', '+Two');
984
985 Equivalent to "with_roles" in Mojo::Base. Note that role support
986 depends on Role::Tiny (2.000001+).
987
988 wrap
989 $dom = $dom->wrap('<div></div>');
990 $dom = $dom->wrap(Mojo::DOM58->new);
991
992 Wrap HTML/XML fragment around this node (for all node types other than
993 "root"), placing it as the last child of the first innermost element.
994
995 # "<p>123<b>Test</b></p>"
996 $dom->parse('<b>Test</b>')->at('b')->wrap('<p>123</p>')->root;
997
998 # "<div><p><b>Test</b></p>123</div>"
999 $dom->parse('<b>Test</b>')->at('b')->wrap('<div><p></p>123</div>')->root;
1000
1001 # "<p><b>Test</b></p><p>123</p>"
1002 $dom->parse('<b>Test</b>')->at('b')->wrap('<p></p><p>123</p>')->root;
1003
1004 # "<p><b>Test</b></p>"
1005 $dom->parse('<p>Test</p>')->at('p')->child_nodes->first->wrap('<b>')->root;
1006
1007 wrap_content
1008 $dom = $dom->wrap_content('<div></div>');
1009 $dom = $dom->wrap_content(Mojo::DOM58->new);
1010
1011 Wrap HTML/XML fragment around this node's content (for "root" and "tag"
1012 nodes), placing it as the last children of the first innermost element.
1013
1014 # "<p><b>123Test</b></p>"
1015 $dom->parse('<p>Test<p>')->at('p')->wrap_content('<b>123</b>')->root;
1016
1017 # "<p><b>Test</b></p><p>123</p>"
1018 $dom->parse('<b>Test</b>')->wrap_content('<p></p><p>123</p>');
1019
1020 xml
1021 my $bool = $dom->xml;
1022 $dom = $dom->xml($bool);
1023
1024 Disable HTML semantics in the parser and activate case-sensitivity;
1025 defaults to auto-detection based on XML declarations.
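
The effect of "xml" is easiest to see with mixed-case tag names. The following is a minimal sketch using hypothetical markup. In HTML mode, tag names are lowercased, so selectors must be lowercase. In XML mode, tag names keep their case and matching is case-sensitive. The sketch also assumes that an XML declaration switches auto-detection to XML mode, as described above.

```perl
use strict;
use warnings;
use Mojo::DOM58;

# Hypothetical mixed-case markup
my $markup = '<Feed><Entry>x</Entry></Feed>';

# Default HTML semantics: tag names are lowercased while parsing
my $html = Mojo::DOM58->new($markup);

# Explicit XML semantics, enabled before parsing so case is preserved
my $xml = Mojo::DOM58->new->xml(1)->parse($markup);

print $html->at('entry')->text, "\n";
print defined $xml->at('entry') ? "found\n" : "not found\n";
print $xml->at('Entry')->text, "\n";
```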
1026
1028 Some Mojo::DOM58 methods return an array-based collection object based
1029 on Mojo::Collection, which can either be accessed directly as an array
1030 reference, or with the following methods.
1031
1032 # Chain methods
1033 $collection->map(sub { ucfirst })->shuffle->each(sub {
1034 my ($word, $num) = @_;
1035 say "$num: $word";
1036 });
1037
1038 # Access array directly to manipulate collection
1039 $collection->[23] += 100;
1040 say for @$collection;
1041
1042 compact
1043 my $new = $collection->compact;
1044
1045 Create a new collection with all elements that are defined and not an
1046 empty string.
1047
1048 # $collection contains (0, 1, undef, 2, '', 3)
1049 $collection->compact->join(', '); # "0, 1, 2, 3"
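
"compact" pairs naturally with attribute lookups, because "attr" returns undef for a missing attribute. The following is a minimal sketch. The markup is hypothetical: some paragraphs have no "title" attribute and some have an empty one.

```perl
use strict;
use warnings;
use Mojo::DOM58;

# Hypothetical paragraphs with present, missing and empty titles
my $dom = Mojo::DOM58->new(
  '<p title="a">1</p><p>2</p><p title="">3</p><p title="b">4</p>');

# map yields ('a', undef, '', 'b'); compact drops the undef and the ''
my $titles = $dom->find('p')->map(attr => 'title')->compact;

print $titles->join(', '), "\n";
```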
1050
1051 each
1052 my @elements = $collection->each;
1053 $collection = $collection->each(sub {...});
1054
1055 Evaluate callback for each element in collection, or return all elements
1056 as a list if none has been provided. The element will be the first
1057 argument passed to the callback and is also available as $_; the element's position in the collection, starting at 1, is passed as the second argument.
1058
1059 # Make a numbered list
1060 $collection->each(sub {
1061 my ($e, $num) = @_;
1062 say "$num: $e";
1063 });
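
The position argument makes "each" convenient for numbering query results. The following is a minimal sketch with hypothetical list markup. It assumes, as the numbered-list example above implies, that the second callback argument starts at 1.

```perl
use strict;
use warnings;
use Mojo::DOM58;

# Hypothetical list markup
my $dom = Mojo::DOM58->new('<ul><li>alpha</li><li>beta</li></ul>');

# Collect "position: text" for every list item, in document order
my @lines;
$dom->find('li')->each(sub {
  my ($e, $num) = @_;
  push @lines, "$num: " . $e->text;
});

print "$_\n" for @lines;

# Without a callback, each returns the elements as a plain list
my @items = $dom->find('li')->each;
```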
1064
1065 first
1066 my $first = $collection->first;
1067 my $first = $collection->first(qr/foo/);
1068 my $first = $collection->first(sub {...});
1069 my $first = $collection->first($method);
1070 my $first = $collection->first($method, @args);
1071
1072 Evaluate regular expression/callback for, or call method on, each
1073 element in collection and return the first one that matched the regular
1074 expression, or for which the callback/method returned true. The element
1075 will be the first argument passed to the callback and is also available
1076 as $_.
1077
1078 # Longer version
1079 my $first = $collection->first(sub { $_->$method(@args) });
1080
1081 # Find first value that contains the word "mojo"
1082 my $interesting = $collection->first(qr/mojo/i);
1083
1084 # Find first value that is greater than 5
1085 my $greater = $collection->first(sub { $_ > 5 });
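
With a callback, "first" stops at the first element whose attributes satisfy a condition. The following is a minimal sketch. The form fields are hypothetical.

```perl
use strict;
use warnings;
use Mojo::DOM58;

# Hypothetical inputs
my $dom = Mojo::DOM58->new(
  '<input name="q" type="text"><input name="agree" type="checkbox">');

# First input whose type attribute is "checkbox" (undef if there is none)
my $checkbox = $dom->find('input')
  ->first(sub { defined $_->{type} && $_->{type} eq 'checkbox' });

print $checkbox->{name}, "\n" if defined $checkbox;
```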
1086
1087 flatten
1088 my $new = $collection->flatten;
1089
1090 Flatten nested collections/arrays recursively and create a new
1091 collection with all elements.
1092
1093 # $collection contains (1, [2, [3, 4], 5, [6]], 7)
1094 $collection->flatten->join(', '); # "1, 2, 3, 4, 5, 6, 7"
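
Nested collections arise naturally when "map" returns one collection per element. "flatten" merges them into a single collection. The following is a minimal sketch with two hypothetical lists.

```perl
use strict;
use warnings;
use Mojo::DOM58;

# Hypothetical document with two lists
my $dom = Mojo::DOM58->new(
  '<ul><li>a</li><li>b</li></ul><ul><li>c</li></ul>');

# One inner collection of item texts per <ul>
my $nested = $dom->find('ul')->map(sub { $_->find('li')->map('text') });

# Merge the inner collections into one flat collection
my $flat = $nested->flatten;

print $flat->join(', '), "\n";
```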
1095
1096 grep
1097 my $new = $collection->grep(qr/foo/);
1098 my $new = $collection->grep(sub {...});
1099 my $new = $collection->grep($method);
1100 my $new = $collection->grep($method, @args);
1101
1102 Evaluate regular expression/callback for, or call method on, each
1103 element in collection and create a new collection with all elements
1104 that matched the regular expression, or for which the callback/method
1105 returned true. The element will be the first argument passed to the
1106 callback and is also available as $_.
1107
1108 # Longer version
1109 my $new = $collection->grep(sub { $_->$method(@args) });
1110
1111 # Find all values that contain the word "mojo"
1112 my $interesting = $collection->grep(qr/mojo/i);
1113
1114 # Find all values that are greater than 5
1115 my $greater = $collection->grep(sub { $_ > 5 });
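
A typical use of "grep" keeps only the elements whose attributes pass a test. The following is a minimal sketch that uses hypothetical links and keeps only HTTPS URLs.

```perl
use strict;
use warnings;
use Mojo::DOM58;

# Hypothetical links
my $dom = Mojo::DOM58->new(
  '<a href="https://example.com">A</a><a href="http://example.org">B</a>');

# Keep elements whose href starts with https://, then extract the URLs
my $secure = $dom->find('a[href]')
  ->grep(sub { $_->{href} =~ m{^https://} })
  ->map(attr => 'href');

print $secure->join("\n"), "\n";
```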
1116
1117 head
1118 my $new = $collection->head(4);
1119 my $new = $collection->head(-2);
1120
1121 Create a new collection with up to the specified number of elements
1122 from the beginning of the collection. A negative number will count from
1123 the end.
1124
1125 # $collection contains ('A', 'B', 'C', 'D', 'E')
1126 $collection->head(3)->join(' '); # "A B C"
1127 $collection->head(-3)->join(' '); # "A B"
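
"head" is handy for limiting results. A negative argument drops elements from the end instead. The following is a minimal sketch with hypothetical list items.

```perl
use strict;
use warnings;
use Mojo::DOM58;

# Hypothetical ordered list
my $dom   = Mojo::DOM58->new('<ol><li>1</li><li>2</li><li>3</li><li>4</li></ol>');
my $items = $dom->find('li')->map('text');

# Up to the first two items
my $top = $items->head(2);

# Everything except the last item
my $all_but_last = $items->head(-1);

print $top->join(' '), ' | ', $all_but_last->join(' '), "\n";
```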
1128
1129 join
1130 my $stream = $collection->join;
1131 my $stream = $collection->join("\n");
1132
1133 Turn collection into a string, placing the optional separator between elements.
1134
1135 # Join all values with commas
1136 $collection->join(', ');
1137
1138 last
1139 my $last = $collection->last;
1140
1141 Return the last element in collection.
1142
1143 map
1144 my $new = $collection->map(sub {...});
1145 my $new = $collection->map($method);
1146 my $new = $collection->map($method, @args);
1147
1148 Evaluate callback for, or call method on, each element in collection
1149 and create a new collection from the results. The element will be the
1150 first argument passed to the callback and is also available as $_.
1151
1152 # Longer version
1153 my $new = $collection->map(sub { $_->$method(@args) });
1154
1155 # Append the word "mojo" to all values
1156 my $domified = $collection->map(sub { $_ . 'mojo' });
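
Passing a method name to "map" saves writing a callback. A callback is still useful when the results need further processing. The following is a minimal sketch with hypothetical markup.

```perl
use strict;
use warnings;
use Mojo::DOM58;

# Hypothetical markup
my $dom = Mojo::DOM58->new('<div><p>a</p><b>b</b></div>');

# Tag names of all elements in document order, same as map(sub { $_->tag })
my $tags = $dom->find('*')->map('tag');

# Callback form: upper-cased text of selected elements
my $upper = $dom->find('p, b')->map(sub { uc $_->text });

print $tags->join(' '), "\n", $upper->join(','), "\n";
```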
1157
1158 reduce
1159 my $result = $collection->reduce(sub {...});
1160 my $result = $collection->reduce(sub {...}, $initial);
1161
1162 Reduce elements in collection with callback, which receives the accumulated value in $a and the current element in $b;
1163 the first element will be used as initial value if none has been provided.
1164
1165 # Calculate the sum of all values
1166 my $sum = $collection->reduce(sub { $a + $b });
1167
1168 # Count how often each value occurs in collection
1169 my $hash = $collection->reduce(sub { $a->{$b}++; $a }, {});
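
The following sketch applies "reduce" to values taken from hypothetical table cells. It runs once without an initial value and once with one, mirroring the two examples above.

```perl
use strict;
use warnings;
use Mojo::DOM58;

# Hypothetical table cells
my $dom   = Mojo::DOM58->new('<table><tr><td>3</td><td>4</td><td>3</td></tr></table>');
my $cells = $dom->find('td')->map('text');

# No initial value: the first cell seeds the running total in $a
my $sum = $cells->reduce(sub { $a + $b });

# Initial value {}: count how often each cell value occurs
my $counts = $cells->reduce(sub { $a->{$b}++; $a }, {});

print "sum=$sum\n";
print "$_: $counts->{$_}\n" for sort keys %$counts;
```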
1170
1171 reverse
1172 my $new = $collection->reverse;
1173
1174 Create a new collection with all elements in reverse order.
1175
1176 slice
1177 my $new = $collection->slice(4 .. 7);
1178
1179 Create a new collection with the elements at the given indices (counting from 0).
1180
1181 # $collection contains ('A', 'B', 'C', 'D', 'E')
1182 $collection->slice(1, 2, 4)->join(' '); # "B C E"
1183
1184 shuffle
1185 my $new = $collection->shuffle;
1186
1187 Create a new collection with all elements in random order.
1188
1189 size
1190 my $size = $collection->size;
1191
1192 Number of elements in collection.
1193
1194 sort
1195 my $new = $collection->sort;
1196 my $new = $collection->sort(sub {...});
1197
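1198 Sort elements based on the return value of the callback, which compares $a and $b like a Perl "sort" block, and create a new
1199 collection from the results. Without a callback, elements are sorted as strings.
1200
1201 # Sort values case-insensitively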
1202 my $case_insensitive = $collection->sort(sub { uc($a) cmp uc($b) });
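
The following minimal sketch uses hypothetical list items to contrast the default string ordering, where uppercase letters sort before lowercase ones, with the case-insensitive callback shown above.

```perl
use strict;
use warnings;
use Mojo::DOM58;

# Hypothetical list items with mixed case
my $dom   = Mojo::DOM58->new('<ul><li>banana</li><li>apple</li><li>Cherry</li></ul>');
my $words = $dom->find('li')->map('text');

# Default: plain string comparison
my $default = $words->sort;

# Callback: compare lowercased copies of $a and $b
my $ci = $words->sort(sub { lc($a) cmp lc($b) });

print $default->join(' '), "\n", $ci->join(' '), "\n";
```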
1203
1204 tail
1205 my $new = $collection->tail(4);
1206 my $new = $collection->tail(-2);
1207
1208 Create a new collection with up to the specified number of elements
1209 from the end of the collection. A negative number will count from the
1210 beginning.
1211
1212 # $collection contains ('A', 'B', 'C', 'D', 'E')
1213 $collection->tail(3)->join(' '); # "C D E"
1214 $collection->tail(-3)->join(' '); # "D E"
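
A negative argument to "tail" is a convenient way to skip leading elements, such as a table's header row. The following is a minimal sketch with a hypothetical table.

```perl
use strict;
use warnings;
use Mojo::DOM58;

# Hypothetical table with one header row
my $dom = Mojo::DOM58->new(
  '<table><tr><th>Name</th></tr><tr><td>a</td></tr><tr><td>b</td></tr></table>');

# Drop the first row, keeping only data rows
my $rows = $dom->find('tr')->tail(-1);

# Text of the first cell in each data row
my $names = $rows->map(sub { $_->at('td')->text });

print $names->join(', '), "\n";
```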
1215
1216 tap
1217 $collection = $collection->tap(sub {...});
1218
1219 Equivalent to "tap" in Mojo::Base.
1220
1221 to_array
1222 my $array = $collection->to_array;
1223
1224 Turn collection into array reference.
1225
1226 uniq
1227 my $new = $collection->uniq;
1228 my $new = $collection->uniq(sub {...});
1229 my $new = $collection->uniq($method);
1230 my $new = $collection->uniq($method, @args);
1231
1232 Create a new collection without duplicate elements, using the string
1233 representation of either the elements or the return value of the
1234 callback/method to decide uniqueness. Note that "undef" and an empty
1235 string are treated the same.
1236
1237 # Longer version
1238 my $new = $collection->uniq(sub { $_->$method(@args) });
1239
1240 # $collection contains ('foo', 'bar', 'bar', 'baz')
1241 $collection->uniq->join(' '); # "foo bar baz"
1242
1243 # $collection contains ([1, 2], [2, 1], [3, 2])
1244 $collection->uniq(sub { $_->[1] })->to_array; # "[[1, 2], [2, 1]]"
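
Passing a method name to "uniq" deduplicates elements by a derived key rather than by their full markup. The following is a minimal sketch with hypothetical markup. It keeps the first element found for each tag name.

```perl
use strict;
use warnings;
use Mojo::DOM58;

# Hypothetical markup with a repeated tag
my $dom = Mojo::DOM58->new('<div><p>a</p><p>b</p><span>c</span></div>');

# Keep only the first element of each tag name, in document order
my $firsts = $dom->find('*')->uniq('tag');

print $firsts->map('tag')->join(', '), "\n";
```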
1245
1246 with_roles
1247 $collection = $collection->with_roles('Mojo::Collection::Role::One');
1248
1249 Equivalent to "with_roles" in Mojo::Base. Note that role support
1250 depends on Role::Tiny (2.000001+).
1251
1252 DEBUGGING
1253 You can set the "MOJO_DOM58_CSS_DEBUG" environment variable to get some
1254 advanced diagnostic information printed to "STDERR".
1255
1256 MOJO_DOM58_CSS_DEBUG=1
1257
1258 BUGS
1259 Report issues related to the format of this distribution or Perl 5.8
1260 support to the public bugtracker. Any other issues should be reported
1261 directly to the upstream Mojolicious issue tracker.
1262
1263 AUTHOR
1264 Dan Book <dbook@cpan.org>
1265
1266 Code and tests adapted from Mojo::DOM, a lightweight DOM parser by the
1267 Mojolicious team.
1268
1269 CONTRIBUTORS
1270 Matt S Trout (mst)
1271
1272 COPYRIGHT AND LICENSE
1273 Copyright (c) 2008-2016 Sebastian Riedel and others.
1274
1275 Copyright (c) 2016 "AUTHOR" and "CONTRIBUTORS" for adaptation to
1276 standalone format.
1277
1278 This is free software, licensed under:
1279
1280 The Artistic License 2.0 (GPL Compatible)
1281
1282 SEE ALSO
1283 Mojo::DOM, HTML::TreeBuilder, XML::LibXML, XML::Twig, XML::Smart
1284
1285
1286
1287perl v5.34.0 2022-01-21 Mojo::DOM58(3)