HTML::GenToc(3) User Contributed Perl Documentation HTML::GenToc(3)

HTML::GenToc - Generate a Table of Contents for HTML documents.

version 3.20

    use HTML::GenToc;

    # create a new object with the default settings
    my $toc = HTML::GenToc->new();

    # or create one with explicit significant elements and levels
    $toc = HTML::GenToc->new(title=>"Table of Contents",
                             toc_entry=>{
                                 H1=>1,
                                 H2=>2
                             },
                             toc_end=>{
                                 H1=>'/H1',
                                 H2=>'/H2'
                             }
                            );

    # generate a ToC from a file (input takes a reference to an
    # array of filenames; a plain string is treated as HTML content)
    $toc->generate_toc(input=>[$html_file],
                       footer=>$footer_file,
                       header=>$header_file
                      );

HTML::GenToc generates anchors and a table of contents for HTML
documents. Depending on the arguments, it will insert the information
it generates into the document, or output it to a string, a separate
file or STDOUT.

While it defaults to taking H1 and H2 elements as the significant
elements to put into the table of contents, any tag can be defined as a
significant element. Also, the input need not be complete, pure HTML:
one can input pseudo-HTML or page fragments, which makes it suitable
for use on templates and HTML meta-languages such as WML.

Also included in the distribution is hypertoc, a script which uses the
module so that one can process files on the command line in a
user-friendly manner.

The generated ToC is a multi-level list containing links to the
significant elements. HTML::GenToc places the link to each significant
element into the ToC at the level specified by the user.

Example:

If H1s are specified as level 1, then they appear in the first-level
list of the ToC. If H2s are specified as level 2, then they appear in
a second-level list in the ToC.
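
The level-to-nesting rule can be sketched in plain Perl. The helper nested_toc below is hypothetical: it is not part of HTML::GenToc's API, and the module's own output markup may differ. It assumes each entry is an already-escaped [level, anchor, text] triple with positive levels, and it inserts an empty wrapper <li> when a level is skipped, which is the same effect described for disordered headings in the notes further down.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTML::GenToc: turn a flat list of
# [level, anchor, text] entries (positive levels) into nested <ul> lists.
sub nested_toc {
    my @entries = @_;
    my $html    = '';
    my $depth   = 0;     # number of <ul> lists currently open
    my @li_open;         # $li_open[$d] is true if an <li> is open at depth $d

    for my $entry (@entries) {
        my ($level, $anchor, $text) = @$entry;

        # Go deeper: open lists until we reach this entry's level. If a
        # level is skipped (e.g. H1 -> H3), add an empty wrapper <li>.
        while ($depth < $level) {
            if ($depth > 0 && !$li_open[$depth]) {
                $html .= '<li>';
                $li_open[$depth] = 1;
            }
            $html .= '<ul>';
            $depth++;
            $li_open[$depth] = 0;
        }

        # Go shallower: close items and lists down to this entry's level.
        while ($depth > $level) {
            $html .= '</li>' if $li_open[$depth];
            $html .= '</ul>';
            $depth--;
        }

        # Close the previous sibling item, then open this one.
        $html .= '</li>' if $li_open[$depth];
        $html .= qq{<li><a href="#$anchor">$text</a>};
        $li_open[$depth] = 1;
    }

    # Close everything that is still open.
    while ($depth > 0) {
        $html .= '</li>' if $li_open[$depth];
        $html .= '</ul>';
        $depth--;
    }
    return $html;
}
```

For an H1, H2, H1 sequence this yields one outer list whose first item contains a nested second-level list, matching the Example above.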

Information on the significant elements, and at what level each should
occur, is passed in to the methods used by this object; alternatively,
one can use the defaults.

There are two phases to the ToC generation. The first phase is to put
suitable anchors into the HTML documents, and the second phase is to
generate the ToC from HTML documents which have anchors in them for the
ToC to link to.

For more information on controlling the contents of the created ToC,
see "Formatting the ToC".

HTML::GenToc also supports the ability to incorporate the ToC into the
HTML document itself via the inline option. See "Inlining the ToC" for
more information.

In order for HTML::GenToc to support linking to significant elements,
HTML::GenToc inserts anchors into the significant elements. One can
use HTML::GenToc as a filter, outputting the result to another file, or
one can overwrite the original file, with the original backed up with a
suffix (default: "org") appended to the filename. One can also output
the result to a string.

Default arguments can be set when the object is created, and overridden
by setting arguments when the generate_toc method is called. Arguments
are given as a hash of arguments.

Method -- new
    $toc = HTML::GenToc->new();

    $toc = HTML::GenToc->new(toc_entry=>\%my_toc_entry,
                             toc_end=>\%my_toc_end,
                             bak=>'bak',
                             ...
                            );

Creates a new HTML::GenToc object.

These arguments will be used as defaults in invocations of other
methods.

See generate_toc for possible arguments.

generate_toc
    $toc->generate_toc(outfile=>"index2.html");

    my $result_str = $toc->generate_toc(to_string=>1);

Generates a table of contents for the significant elements in the HTML
documents, optionally generating anchors for them first.

Options

bak
    bak => string

If the input files are being overwritten (overwrite is on), copy each
original file to "filename.string" first. If the value is empty, no
backup file will be created. (default: org)

debug
    debug => 1

Enable verbose debugging output. Used for debugging this module;
in other words, don't bother. (default: off)

entrysep
    entrysep => string

Separator string for entries that are not <li> items (see toc_entry).
(default: ", ")

filenames
    filenames => \@filenames

The filenames to use when creating table-of-contents links. This
overrides the filenames given in the input option, and is expected
to have exactly the same number of elements. This can also be used
when passing in string content to the input option, to give a
(fake) filename to use for the links relating to that content.

footer
    footer => file_or_string

Either the name of a file containing the footer text for the ToC, or
a string containing the footer text.

header
    header => file_or_string

Either the name of a file containing the header text for the ToC, or
a string containing the header text.
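
The footer and header options accept either a filename or literal text. One plausible way to tell the two apart is sketched below. The helper file_or_string is hypothetical, and the rule it uses (a single-line argument naming an existing plain file is read; anything else is used as text) is an assumption, not a description of the module's internals.

```perl
use strict;
use warnings;

# Hypothetical helper: return the contents of $arg if it names an existing
# plain file, otherwise return $arg itself as literal text.
sub file_or_string {
    my ($arg) = @_;
    return '' unless defined $arg;
    # multi-line values, or values that are not an existing plain file,
    # are taken to be literal text
    return $arg if $arg =~ /\n/ || !-f $arg;

    open my $fh, '<', $arg or die "cannot read $arg: $!";
    my $text = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    return $text;
}
```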

ignore_only_one
    ignore_only_one => 1

If there would be only one item in the ToC, don't make a ToC.

ignore_sole_first
    ignore_sole_first => 1

If the first item in the ToC is of the highest level, AND it is the
only one of that level, ignore it. This is useful in web pages
where there is only one H1 header but one doesn't know beforehand
whether there will be only one.
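
The two ignore_* rules can be read as filters over the list of ToC entries. The sketch below uses a hypothetical helper, filter_entries, which is not a module function; it assumes entries are [level, text] pairs with positive levels and takes "highest level" to mean the numerically smallest level present.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating ignore_only_one and ignore_sole_first
# on a list of [level, text] entries (positive levels assumed).
sub filter_entries {
    my ($opts, @entries) = @_;

    # ignore_only_one: a single entry means no ToC at all
    return () if $opts->{ignore_only_one} && @entries == 1;

    if ($opts->{ignore_sole_first} && @entries) {
        # the "highest" level is taken to be the smallest level number
        my ($top) = sort { $a <=> $b } map { $_->[0] } @entries;
        my $count = grep { $_->[0] == $top } @entries;

        # drop the first entry only if it is the sole top-level one
        shift @entries if $entries[0][0] == $top && $count == 1;
    }
    return @entries;
}
```

With a page holding one H1 title followed by several H2s, ignore_sole_first drops the title and keeps the H2 entries.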

inline
    inline => 1

Put the ToC in the document at a given point. See "Inlining the ToC"
for more information.

input
    input => \@filenames

    input => $content

This is expected to be either a reference to an array of filenames,
or a string containing content to process.

The three main uses would be:

(a) you have more than one file to process, so pass in multiple
filenames

(b) you have one file to process, so pass in its filename as the
only array item

(c) you have HTML content to process, so pass in just the content
as a string

(default: undefined)
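
The distinction between the two forms of input, and how the filenames option lines up with it, can be illustrated with a small hypothetical helper. describe_inputs is not part of the module; it only classifies the argument the way the text above describes (an array reference means filenames, anything else means content) and applies a filenames override of the same length.

```perl
use strict;
use warnings;

# Hypothetical helper: classify an input argument the way the input and
# filenames options describe it.
sub describe_inputs {
    my ($input, $filenames) = @_;

    # an array reference holds filenames; anything else is HTML content
    my @items = ref $input eq 'ARRAY'
        ? ( map { +{ source => 'file', name => $_ } } @$input )
        : ( +{ source => 'string', name => undef } );

    # filenames, if given, must line up one-to-one with the inputs and
    # supplies the names used in ToC links
    if ($filenames) {
        die "filenames must have one entry per input\n"
            unless @$filenames == @items;
        $items[$_]{name} = $filenames->[$_] for 0 .. $#items;
    }
    return @items;
}
```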

notoc_match
    notoc_match => string

If there are certain individual tags you don't wish to include in
the table of contents, even though they match the "significant
elements", then if this pattern matches contents inside the tag
(not the body), that tag will be included neither in generating
anchors nor in generating the ToC. (default: 'class="notoc"')

ol
    ol => 1

Use an ordered list for level 1 ToC entries.

ol_num_levels
    ol_num_levels => 2

The number of levels deep the OL listing will go if ol is true. If
set to zero, an ordered list will be used for all levels. (default: 1)
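
Read together, ol and ol_num_levels decide, per nesting depth, whether an ordered or an unordered list is used. A hypothetical helper makes the rule explicit; the name list_tag is illustrative only, and the rule is taken from the option descriptions above rather than from the module's source.

```perl
use strict;
use warnings;

# Hypothetical helper: which list element to use at a given depth
# (1 = top level), following the ol and ol_num_levels descriptions.
sub list_tag {
    my ($depth, $ol, $ol_num_levels) = @_;
    return 'ul' unless $ol;                 # ol off: unordered everywhere
    return 'ol' if $ol_num_levels == 0;     # zero: ordered at every level
    return $depth <= $ol_num_levels ? 'ol' : 'ul';
}
```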

overwrite
    overwrite => 1

Overwrite the input file with the output. (default: off)

outfile
    outfile => file

File to write the output to. This is where the modified HTML
output goes. Note that it doesn't make sense to use this option
if you are processing more than one file. If you give '-' as the
filename, then output will go to STDOUT. (default: STDOUT)
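
The '-' convention for outfile is the usual Perl idiom of choosing between STDOUT and a freshly opened file. The helper open_output below is hypothetical and shows only that choice; it is not the module's output_toc.

```perl
use strict;
use warnings;

# Hypothetical helper: pick an output handle for an outfile-style value,
# where '-' (or no value) means STDOUT.
sub open_output {
    my ($outfile) = @_;
    return \*STDOUT if !defined $outfile || $outfile eq '-';
    open my $fh, '>', $outfile or die "cannot write $outfile: $!";
    return $fh;
}
```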

quiet
    quiet => 1

Suppress informative messages. (default: off)

textonly
    textonly => 1

Use only the text content of significant elements.

title
    title => string

Title for the ToC page (if not using header, inline or toc_only).
(default: "Table of Contents")

toc_after
    toc_after => \%toc_after_data

    %toc_after_data = ( tag1 => suffix1,
                        tag2 => suffix2
                      );

    toc_after => { H2=>'</em>' }

For defining the layout of significant elements in the ToC.

This expects a reference to a hash of tag=>suffix pairs.

The tag is the HTML tag which marks the start of the element. The
suffix is what is required to be appended to the Table of Contents
entry generated for that tag.

(default: undefined)

toc_before
    toc_before => \%toc_before_data

    %toc_before_data = ( tag1 => prefix1,
                         tag2 => prefix2
                       );

    toc_before => { H2=>'<em>' }

For defining the layout of significant elements in the ToC. The
tag is the HTML tag which marks the start of the element. The
prefix is what is required to be prepended to the Table of Contents
entry generated for that tag.

(default: undefined)
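
Used as a pair, toc_before and toc_after bracket each entry generated for a tag. The hypothetical decorate_entry below shows the pairing with the H2/em values above. Whether the module places the prefix and suffix inside or outside the link is not stated here, so this sketch simply assumes they wrap the whole entry. The hash keys must be spelled exactly as in toc_entry (H2, not h2) for the lookup to succeed.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap one ToC entry in the prefix/suffix configured
# for its tag. Assumes the wrapping goes around the whole entry link.
sub decorate_entry {
    my ($tag, $entry, $toc_before, $toc_after) = @_;
    my $prefix = $toc_before->{$tag} // '';   # no prefix configured
    my $suffix = $toc_after->{$tag}  // '';   # no suffix configured
    return $prefix . $entry . $suffix;
}
```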

toc_end
    toc_end => \%toc_end_data

    %toc_end_data = ( tag1 => endtag1,
                      tag2 => endtag2
                    );

    toc_end => { H1 => '/H1', H2 => '/H2' }

For defining significant elements. The tag is the HTML tag which
marks the start of the element. The endtag is the HTML tag which
marks the end of the element. When matching in the input file,
case is ignored (but make sure that all your tag options referring
to the same tag are exactly the same!).

toc_entry
    toc_entry => \%toc_entry_data

    %toc_entry_data = ( tag1 => level1,
                        tag2 => level2
                      );

    toc_entry => { H1 => 1, H2 => 2 }

For defining significant elements. The tag is the HTML tag which
marks the start of the element. The level is what level the tag is
considered to be. The value of level must be numeric and non-zero.
If the value is negative, consecutive entries represented by the
significant element will be separated by the value set by the
entrysep option.
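
The effect of a negative level can be pictured as merging consecutive entries before the list is built. This is a hypothetical sketch (join_negative_runs is not a module function); it keeps the levels as given and does not attempt to reproduce where the module places the merged entry in the final list.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse runs of consecutive entries that share the
# same negative level into one entry joined with $entrysep.
sub join_negative_runs {
    my ($entrysep, @entries) = @_;     # entries are [level, text] pairs
    my @out;
    for my $entry (@entries) {
        my ($level, $text) = @$entry;
        if ($level < 0 && @out && $out[-1][0] == $level) {
            $out[-1][1] .= $entrysep . $text;   # extend the current run
        }
        else {
            push @out, [ $level, $text ];       # copy, leave input intact
        }
    }
    return @out;
}
```

With the default entrysep of ", ", two consecutive level -2 entries "x" and "y" become a single entry "x, y".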

toclabel
    toclabel => string

HTML text that labels the ToC. Always used.
(default: "<h1>Table of Contents</h1>")

toc_tag
    toc_tag => string

If a ToC is to be included inline, this is the pattern which is
used to match the tag where the ToC should be put. This can be a
start-tag, an end-tag or a comment, but the < should be left out;
that is, if you want the ToC to be placed after the BODY tag, then
give "BODY". If you want a special comment tag to mark where the
ToC should go, then include the comment marks, for example:
"!--toc--" (default: BODY)

toc_tag_replace
    toc_tag_replace => 1

In conjunction with toc_tag, this is a flag to say whether the
given tag should be replaced, or if the ToC should be put after the
tag. This can be useful if your toc_tag is a comment and you don't
need it after you have the ToC in place. (default: false)

toc_only
    toc_only => 1

Output only the Table of Contents, that is, the Table of Contents
plus the toclabel. If there is a header or a footer, these will
also be output.

If toc_only is false, then if there is no header and inline is not
true, a suitable HTML page header will be output, and if there is
no footer and inline is not true, an HTML page footer will be
output.

(default: false)

to_string
    to_string => 1

Return the modified HTML output as a string. This overrides other
methods of output (unlike version 3.00). If to_string is false, the
method will return 1 rather than a string.

use_id
    use_id => 1

Use id="name" for anchors rather than <a name="name"/> anchors.
However, if an anchor already exists for a significant element, this
won't make an id for that particular element.

useorg
    useorg => 1

Use pre-existing backup files as the input source; that is, files
of the form infile.bak (see input and bak).

These methods are documented for developer purposes and aren't intended
to be used externally.

make_anchor_name
    $toc->make_anchor_name(content=>$content,
                           anchors=>\%anchors);

Makes the anchor name for one anchor. Bases the anchor on the content
of the significant element. Ensures that anchors are unique.
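
The idea behind make_anchor_name (content-based names that are kept unique) can be sketched as follows. anchor_name and its slug rules are hypothetical; the module's real naming scheme, which changed in version 3.10, is not reproduced here. The %seen hash presumably plays a role similar to the anchors argument above.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the module's algorithm: derive an anchor name
# from element content and keep it unique with a hash of names already used.
sub anchor_name {
    my ($content, $seen) = @_;
    (my $name = $content) =~ s/<[^>]*>//g;   # drop inner markup
    $name =~ s/[^A-Za-z0-9]+/_/g;            # runs of other chars become "_"
    $name =~ s/^_+|_+$//g;                   # trim leading/trailing "_"
    $name = 'section' if $name eq '';        # fallback for empty content

    my ($unique, $n) = ($name, 1);
    $unique = $name . '_' . $n++ while $seen->{$unique};
    $seen->{$unique} = 1;
    return $unique;
}
```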

make_anchors
    my $new_html = $toc->make_anchors(input=>$html,
                                      notoc_match=>$notoc_match,
                                      use_id=>$use_id,
                                      toc_entry=>\%toc_entries,
                                      toc_end=>\%toc_ends,
                                     );

Makes the anchors in the given input string. Returns a string.
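
The anchoring phase that make_anchors performs can be approximated with regular expressions. add_anchors below is a hypothetical sketch, not the module's implementation (which relies on HTML::SimpleParse): it wraps the text of each significant element in <a name="...">, skips elements whose start tag matches the notoc pattern, and leaves elements that already contain a named anchor alone, mirroring the behaviour described in the notes further down. It assumes well-formed, non-nested significant elements and ignores use_id.

```perl
use strict;
use warnings;

# Hypothetical sketch of the anchoring phase, using plain regular
# expressions. Assumes significant elements are well formed and not nested.
sub add_anchors {
    my ($html, $tags, $notoc) = @_;    # $tags: e.g. ['h1', 'h2']
    my %seen;                          # anchor names used so far
    my $alt = join '|', map { quotemeta } @$tags;

    my $anchor = sub {
        my ($tag, $attrs, $body) = @_;
        $attrs //= '';
        # skip notoc elements and elements that already carry an anchor name
        return "<$tag$attrs>$body</$tag>"
            if $attrs =~ /$notoc/ or $body =~ /<a\s[^>]*name=/i;

        (my $name = $body) =~ s/<[^>]*>//g;    # text content only
        $name =~ s/\W+/_/g;
        $name =~ s/^_+|_+$//g;
        $name = 'section' if $name eq '';
        my ($try, $n) = ($name, 1);
        $try = $name . '_' . $n++ while $seen{$try};
        $seen{$try} = 1;
        return qq{<$tag$attrs><a name="$try">$body</a></$tag>};
    };

    $html =~ s{<($alt)(\s[^>]*)?>(.*?)</\1>}{$anchor->($1, $2, $3)}gise;
    return $html;
}
```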

make_toc_list
    my @toc_list = $toc->make_toc_list(input=>$html,
                                       labels=>\%labels,
                                       notoc_match=>$notoc_match,
                                       toc_entry=>\%toc_entry,
                                       toc_end=>\%toc_end,
                                       filename=>$filename);

Makes a list of lists which represents the structure and content of (a
portion of) the ToC from one file. Also updates a list of labels for
the ToC entries.
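
The second phase, turning anchored HTML into ToC entries, can be illustrated with a hypothetical collect_entries helper. It returns a flat list of [level, anchor, text] triples rather than the module's list of lists, ignores toc_end and notoc_match for brevity, and assumes double-quoted NAME attributes (unquoted ones are not recognized by the module either, as noted below).

```perl
use strict;
use warnings;

# Hypothetical sketch, not make_toc_list itself: collect
# [level, anchor, text] entries from HTML that already has anchors,
# given a toc_entry-style hash of tag => level.
sub collect_entries {
    my ($html, $toc_entry) = @_;
    my %level = map { lc($_) => $toc_entry->{$_} } keys %$toc_entry;
    my $alt   = join '|', map { quotemeta } keys %level;
    my @entries;

    while ($html =~ m{<($alt)\b[^>]*>(.*?)</\1>}gis) {
        my ($tag, $body) = (lc $1, $2);
        my ($anchor) = $body =~ /<a\s[^>]*name="([^"]*)"/i;
        next unless defined $anchor;            # nothing to link to
        (my $text = $body) =~ s/<[^>]*>//g;     # strip inner markup
        push @entries, [ $level{$tag}, $anchor, $text ];
    }
    return @entries;
}
```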

build_lol
Builds a list of lists of paths, given a list of hashes with info about
paths.

output_toc
    $self->output_toc(toc=>$toc_str,
                      input=>\@input,
                      filenames=>\@filenames);

Produces the output, whether to a file, to STDOUT or to a string. The
"output" in this case could be the ToC, the modified (anchors added)
HTML, or both.

put_toc_inline
    my $newhtml = $toc->put_toc_inline(toc_str=>$toc_str,
                                       filename=>$filename,
                                       in_string=>$in_string);

Puts the given toc_str into the given input string; returns a string.

cp
    cp($src, $dst);

Copies file $src to $dst. Used for making backups of files.
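
The overwrite-with-backup behaviour described for the bak and overwrite options can be sketched with the core File::Copy module. backup_then_write is hypothetical and only shows the order of operations (back up first, then write); the implementation of the module's own cp function is not reproduced here.

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# Hypothetical sketch of the bak/overwrite behaviour: copy "file" to
# "file.<suffix>" before overwriting it; an empty suffix means no backup.
sub backup_then_write {
    my ($file, $new_content, $suffix) = @_;
    if (defined $suffix && length $suffix) {
        copy($file, "$file.$suffix") or die "backup of $file failed: $!";
    }
    open my $fh, '>', $file or die "cannot write $file: $!";
    print {$fh} $new_content;
    close $fh or die "cannot close $file: $!";
    return 1;
}
```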

Formatting the ToC
The toc_entry and other related options give you control over how the
ToC entries look, but there are other options that affect the final
appearance of the ToC file created.

With the header option, the contents of the given file (or string) will
be prepended to the generated ToC. This allows you to have
introductory text, or any other text, before the ToC.

Note:
If you use the header option, make sure the file specified contains
the opening HTML tag, the HEAD element (containing the TITLE
element), and the opening BODY tag. However, these tags/elements
should not be in the header file if the inline option is used. See
"Inlining the ToC" for information on what the header file should
contain for inlining the ToC.

With the toclabel option, the contents of the given string will be
prepended to the generated ToC (but after any text taken from a
header file).

With the footer option, the contents of the file will be appended after
the generated ToC.

Note:
If you use the footer, make sure it includes the closing BODY and
HTML tags (unless, of course, you are using the inline option).

If the header option is not specified, the appropriate starting HTML
markup will be added, unless the toc_only option is specified. If the
footer option is not specified, the appropriate closing HTML markup
will be added, unless the toc_only option is specified.

If you do not want or need to deal with header and footer files, you
can instead specify the title of the ToC file with the title option,
and a heading, or label, to put before the list of ToC entries with
the toclabel option. Both options have default values.

If you do not want HTML page tags to be supplied, and just want the ToC
itself, then specify the toc_only option. If there are no header or
footer files, then this will simply output the contents of toclabel and
the ToC itself.
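
The order in which these pieces are combined (header or default page start, toclabel, ToC, footer or default page end) can be summarised in a hypothetical assemble_page helper. The defaults for title and toclabel are the documented ones; the exact default page markup is an assumption for illustration and is not the module's literal output.

```perl
use strict;
use warnings;

# Hypothetical sketch of the assembly order described above. The default
# page markup below is an assumption, not the module's exact output.
sub assemble_page {
    my (%o) = @_;    # header, footer, title, toclabel, toc, toc_only, inline
    my $title     = $o{title}    // 'Table of Contents';
    my $toclabel  = $o{toclabel} // '<h1>Table of Contents</h1>';
    my $page_tags = !$o{toc_only} && !$o{inline};
    my $out = '';

    if (defined $o{header}) { $out .= $o{header} }
    elsif ($page_tags) {
        $out .= "<html>\n<head>\n<title>$title</title>\n</head>\n<body>\n";
    }

    $out .= $toclabel . $o{toc};     # label first, then the ToC itself

    if (defined $o{footer}) { $out .= $o{footer} }
    elsif ($page_tags)      { $out .= "</body>\n</html>\n" }

    return $out;
}
```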

Inlining the ToC
The ability to incorporate the ToC directly into an HTML document is
supported via the inline option.

Inlining will be done on the first file in the list of files processed,
and will only be done if that file contains an opening tag matching the
toc_tag value.

If overwrite is true, then the first file in the list will be
overwritten, with the generated ToC inserted at the appropriate spot.
Otherwise a modified version of the first file is output either to
STDOUT or to the output file defined by the outfile option.

The options toc_tag and toc_tag_replace are used to determine where and
how the ToC is inserted into the output.

Example 1

    $toc->generate_toc(inline=>1,
                       toc_tag => 'BODY',
                       toc_tag_replace => 0,
                       ...
                      );

This will put the generated ToC after the BODY tag of the first file.
If the header option is specified, then the contents of the specified
file are inserted after the BODY tag. If the toclabel option is not
empty, then the text specified by the toclabel option is inserted.
Then the ToC is inserted, and finally, if the footer option is
specified, it inserts the footer. Then the rest of the input file
follows as it was before.

Example 2

    $toc->generate_toc(inline=>1,
                       toc_tag => '!--toc--',
                       toc_tag_replace => 1,
                       ...
                      );

This will replace the first comment of the form <!--toc--> with the
generated ToC, in the order header, toclabel, ToC, footer, followed by
the rest of the input file.

Note:
The header file should not contain the beginning HTML tag and HEAD
element since the HTML file being processed should already contain
these tags/elements.
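
Both examples reduce to one regular-expression substitution on the first matching tag. inline_toc is a hypothetical helper, not the module's put_toc_inline; it treats toc_tag as a case-insensitive pattern, as the toc_tag option description suggests, and does not handle the header, toclabel or footer pieces, which a caller would concatenate into $toc first.

```perl
use strict;
use warnings;

# Hypothetical sketch of inlining (not put_toc_inline itself): find the
# first tag matching $toc_tag (the leading "<" left out, as with the
# toc_tag option) and put the ToC after it, or replace it.
sub inline_toc {
    my ($html, $toc, $toc_tag, $replace) = @_;
    my $re = qr/<$toc_tag(?:\s[^>]*)?>/i;   # toc_tag treated as a pattern
    if ($replace) {
        $html =~ s/$re/$toc/;               # toc_tag_replace => 1
    }
    else {
        $html =~ s/($re)/$1$toc/;           # toc_tag_replace => 0
    }
    return $html;                           # unchanged if no tag matched
}
```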

• HTML::GenToc is smart enough to detect anchors inside significant
elements. If the anchor defines the NAME attribute, HTML::GenToc
uses the value. Otherwise, it adds its own NAME attribute to the
anchor. If use_id is true, then it likewise checks for and uses IDs.

• The TITLE element is treated specially if specified in the
toc_entry option. It is illegal to insert anchors (A) into TITLE
elements. Therefore, HTML::GenToc will actually link to the
filename itself instead of the TITLE element of the document.

• HTML::GenToc will ignore a significant element if it does not
contain any non-whitespace characters. A warning message is
generated if such a condition exists.

• If you have a sequence of significant elements that change level in
a slightly disordered fashion, such as H1 -> H3 -> H2 or even H2 ->
H1, HTML::GenToc still creates a list which is valid HTML. However,
if you are using an ordered list to that depth, you will get strange
numbering, because an extra list element is inserted to nest the
elements at the correct level.

For example (H2 -> H1 with ol_num_levels=1):

    1.
       * My H2 Header
    2. My H1 Header

For example (H1 -> H3 -> H2 with ol_num_levels=0 and H3 also being
significant):

    1. My H1 Header
       1.
          1. My H3 Header
       2. My H2 Header
    2. My Second H1 Header

In cases such as this it may be better not to use the ol option.

• Version 3.10 (and above) generates more verbose (SEO-friendly)
anchors than prior versions. Thus anchors generated with earlier
versions will not match version 3.10 anchors.

• Version 3.00 (and above) of HTML::GenToc is not compatible with
version 2.x of HTML::GenToc. It is now designed to do everything
in one pass, and has dropped certain options: the infile option is
no longer used (it has been replaced with the input option); the
toc_file option no longer exists (use the outfile option instead);
the tocmap option is no longer supported. Also, the old
array-parsing of arguments is no longer supported. There is no longer
a generate_anchors method; everything is done with generate_toc.

It now generates lower-case tags rather than upper-case ones.

• HTML::GenToc is not very efficient (in memory or speed), and can be
slow for large documents.

• Now that generation of anchors and of the ToC are done in one pass,
even more memory is used than was the case before. This is more
noticeable when processing multiple files, since all files are read
into memory before processing them.

• Invalid markup will be generated if a significant element is
contained inside an anchor. For example:

    <a name="foo"><h1>The FOO command</h1></a>

will be converted to (if H1 is a significant element),

    <a name="foo"><h1><a name="The">The</a> FOO command</h1></a>

which is illegal since anchors cannot be nested.

It is better style to put anchor statements within the element to
be anchored. For example, the following is preferred:

    <h1><a name="foo">The FOO command</a></h1>

HTML::GenToc will detect the "foo" name and use it.

• NAME attributes without quotes are not recognized.

If you find any bugs, tell me about them.

The installation of this module requires "Module::Build". The module
depends on "HTML::SimpleParse", "HTML::Entities" and "HTML::LinkList"
and uses "Data::Dumper" for debugging purposes. The hypertoc script
depends on "Getopt::Long", "Getopt::ArgvFile" and "Pod::Usage".
Testing of this distribution depends on "Test::More".

To install this module, run the following commands:

    perl Build.PL
    ./Build
    ./Build test
    ./Build install

Or, if you're on a platform (like DOS or Windows) that doesn't like the
"./" notation, you can do this:

    perl Build.PL
    perl Build
    perl Build test
    perl Build install

In order to install somewhere other than the default, such as in a
directory under your home directory like "/home/fred/perl", run

    perl Build.PL --install_base /home/fred/perl

as the first step instead.

This will install the files underneath /home/fred/perl.

You will then need to make sure that you alter the PERL5LIB variable to
find the modules, and the PATH variable to find the script.

Therefore you will need to change your PATH to include
/home/fred/perl/script (where the script will be):

    PATH=/home/fred/perl/script:${PATH}

and the PERL5LIB variable to add /home/fred/perl/lib:

    PERL5LIB=/home/fred/perl/lib:${PERL5LIB}

perl(1) htmltoc(1) hypertoc(1)

Kathryn Andersen
(RUBYKAT) http://www.katspace.org/tools/hypertoc/

Based on htmltoc by Earl Hood ehood AT medusa.acs.uci.edu

Contributions by Dan Dascalescu, <http://dandascalescu.com>

Copyright (C) 1994-1997 Earl Hood, ehood AT medusa.acs.uci.edu
Copyright (C) 2002-2008 Kathryn Andersen

This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2 of the License, or (at your
option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
675 Mass Ave, Cambridge, MA 02139, USA.

perl v5.32.1 2021-01-27 HTML::GenToc(3)