HTML::GenToc(3)       User Contributed Perl Documentation      HTML::GenToc(3)

NAME

       HTML::GenToc - Generate a Table of Contents for HTML documents.

VERSION

       version 3.20

SYNOPSIS

         use HTML::GenToc;

         # create a new object with the default options
         my $toc = HTML::GenToc->new();

         # or create one with explicit options
         $toc = HTML::GenToc->new(title=>"Table of Contents",
                                  toc_entry=>{
                                    H1=>1,
                                    H2=>2
                                  },
                                  toc_end=>{
                                    H1=>'/H1',
                                    H2=>'/H2'
                                  }
           );

         # generate a ToC from a file (filenames are passed as an array ref)
         $toc->generate_toc(input=>[$html_file],
                            footer=>$footer_file,
                            header=>$header_file
           );

DESCRIPTION

       HTML::GenToc generates anchors and a table of contents for HTML
       documents.  Depending on the arguments, it will insert the information
       it generates, or output to a string, a separate file or STDOUT.

       While it defaults to taking H1 and H2 elements as the significant
       elements to put into the table of contents, any tag can be defined as a
       significant element.  Also, the input does not have to be complete,
       pure HTML; one can input pseudo-HTML or page fragments, which makes it
       suitable for use on templates and HTML meta-languages such as WML.

       Also included in the distribution is hypertoc, a script which uses the
       module so that one can process files on the command line in a
       user-friendly manner.

DETAILS

       The ToC generated is a multi-level list containing links to the
       significant elements.  HTML::GenToc places each link in the ToC at the
       level the user has specified for that significant element.

       Example:

       If H1s are specified as level 1, then they appear in the first-level
       list of the ToC.  If H2s are specified as level 2, then they appear in
       a second-level list in the ToC.

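       The level mapping in this example can be sketched in plain Perl.  The
       render_nested_list helper below is hypothetical and is not part of
       HTML::GenToc; it assumes levels start at 1 and never skip a level, and
       it only illustrates how level 1 and level 2 entries end up in nested
       lists.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::GenToc): render [level, text]
# pairs as nested <ul> lists.  Assumes levels start at 1 and never skip.
sub render_nested_list {
    my @entries = @_;
    my $html  = '';
    my $depth = 0;
    for my $entry (@entries) {
        my ($level, $text) = @$entry;
        if ($level > $depth) {
            # Going deeper: open a new list inside the still-open item.
            $html .= "<ul>\n" x ($level - $depth);
            $depth = $level;
        }
        else {
            # Coming back up: close the deeper lists and their items.
            while ($depth > $level) {
                $html .= "</li>\n</ul>\n";
                $depth--;
            }
            # Close the previous item at this level.
            $html .= "</li>\n";
        }
        $html .= "<li>$text";
    }
    # Close whatever is still open.
    $html .= "</li>\n</ul>\n" x $depth;
    return $html;
}

# H1 entries are level 1, H2 entries are level 2.
print render_nested_list(
    [1, 'Introduction'],
    [2, 'Background'],
    [2, 'Scope'],
    [1, 'Conclusion'],
);
```
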
       Information on the significant elements and the level at which they
       should occur is passed in to the methods used by this object, or one
       can use the defaults.

       There are two phases to the ToC generation.  The first phase is to put
       suitable anchors into the HTML documents, and the second phase is to
       generate the ToC from HTML documents which have anchors in them for the
       ToC to link to.

       For more information on controlling the contents of the created ToC,
       see "Formatting the ToC".

       HTML::GenToc also supports the ability to incorporate the ToC into the
       HTML document itself via the inline option.  See "Inlining the ToC" for
       more information.

       In order for HTML::GenToc to support linking to significant elements,
       HTML::GenToc inserts anchors into the significant elements.  One can
       use HTML::GenToc as a filter, outputting the result to another file, or
       one can overwrite the original file, with the original backed up with a
       suffix (default: "org") appended to the filename.  One can also output
       the result to a string.

METHODS

       Default arguments can be set when the object is created, and overridden
       by setting arguments when the generate_toc method is called.  Arguments
       are given as a hash of arguments.

   new
           $toc = HTML::GenToc->new();

           $toc = HTML::GenToc->new(toc_entry=>\%my_toc_entry,
               toc_end=>\%my_toc_end,
               bak=>'bak',
               ...
               );

       Creates a new HTML::GenToc object.

       These arguments will be used as defaults in invocations of other
       methods.

       See generate_toc for possible arguments.

   generate_toc
           $toc->generate_toc(outfile=>"index2.html");

           my $result_str = $toc->generate_toc(to_string=>1);

       Generates a table of contents for the significant elements in the HTML
       documents, optionally generating anchors for them first.

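       As a slightly fuller sketch, the following passes HTML content in as a
       string and asks for the result back as a string.  It uses only options
       documented on this page; the sample markup is invented for
       illustration, and the require guard is an assumption added so that the
       script exits cleanly when HTML::GenToc is not installed.

```perl
use strict;
use warnings;

# Sample content; the markup is made up for this example.
my $html = <<'HTML';
<html><head><title>Demo</title></head>
<body>
<h1>Introduction</h1>
<p>Some text.</p>
<h2>Details</h2>
<p>More text.</p>
</body></html>
HTML

# Only options documented on this page are used.
my %args = (
    input     => $html,   # a plain string is treated as content
    inline    => 1,       # put the ToC into the document itself
    to_string => 1,       # return the result instead of writing it out
    quiet     => 1,       # suppress informative messages
);

if (eval { require HTML::GenToc; 1 }) {
    my $toc    = HTML::GenToc->new();
    my $result = $toc->generate_toc(%args);
    print $result;
}
else {
    warn "HTML::GenToc is not installed; nothing to do\n";
}
```
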
       Options

       bak
           bak => string

           If the input file or files are being overwritten (overwrite is on),
           copy the original file to "filename.string".  If the value is
           empty, no backup file will be created.  (default: org)

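           The backup rule can be sketched as follows.  Both helpers are
           hypothetical and are not the module's own code; they only show
           the "filename.string" naming and the "empty means no backup"
           behaviour described above, using the core File::Copy module.

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# Hypothetical helper: work out the backup name for $file given the
# bak suffix.  Returns nothing when the suffix is empty or undefined.
sub backup_name {
    my ($file, $bak) = @_;
    return unless defined $bak && length $bak;
    return "$file.$bak";
}

# Hypothetical helper: copy $file to its backup name before it is
# overwritten.  Returns the backup name, or nothing if no backup
# was requested.
sub make_backup {
    my ($file, $bak) = @_;
    my $backup = backup_name($file, $bak);
    return unless defined $backup;
    copy($file, $backup) or die "cannot copy $file to $backup: $!";
    return $backup;
}
```

           With the default suffix, make_backup('index.html', 'org') would
           copy the file to index.html.org before it is overwritten.
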
       debug
           debug => 1

           Enable verbose debugging output.  Used for debugging this module;
           in other words, don't bother.  (default: off)

       entrysep
           entrysep => string

           Separator string for non-<li> item entries.  (default: ", ")

       filenames
           filenames => \@filenames

           The filenames to use when creating table-of-contents links.  This
           overrides the filenames given in the input option, and is expected
           to have exactly the same number of elements.  This can also be used
           when passing in string-content to the input option, to give a
           (fake) filename to use for the links relating to that content.

       footer
           footer => file_or_string

           Either the filename of the file containing footer text for the ToC,
           or a string containing the footer text.

       header
           header => file_or_string

           Either the filename of the file containing header text for the ToC,
           or a string containing the header text.

       ignore_only_one
           ignore_only_one => 1

           If there would be only one item in the ToC, don't make a ToC.

       ignore_sole_first
           ignore_sole_first => 1

           If the first item in the ToC is of the highest level, AND it is the
           only one of that level, ignore it.  This is useful in web-pages
           where there is only one H1 header but one doesn't know beforehand
           whether there will be only one.

       inline
           inline => 1

           Put the ToC in the document at a given point.  See "Inlining the
           ToC" for more information.

       input
           input => \@filenames

           input => $content

           This is expected to be either a reference to an array of filenames,
           or a string containing content to process.

           The three main uses would be:

           (a) you have more than one file to process, so pass in multiple
               filenames

           (b) you have one file to process, so pass in its filename as the
               only array item

           (c) you have HTML content to process, so pass in just the content
               as a string

           (default: undefined)

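           The three uses differ only in the shape of the value.  In the
           sketch below, the filenames are made up and input_kind is a
           hypothetical helper that mirrors the rule stated above; it is not
           a method of HTML::GenToc.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the rule described above: an array
# reference means filenames, any other defined value is HTML content.
sub input_kind {
    my ($input) = @_;
    return 'undefined' unless defined $input;
    return ref($input) eq 'ARRAY' ? 'filenames' : 'content';
}

my %many_files = (input => ['chapter1.html', 'chapter2.html']);  # use (a)
my %one_file   = (input => ['index.html']);                      # use (b)
my %content    = (input => '<h1>Title</h1><p>Text</p>');         # use (c)

for my $args (\%many_files, \%one_file, \%content) {
    print input_kind($args->{input}), "\n";
}
```
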
       notoc_match
           notoc_match => string

           If there are certain individual tags you don't wish to include in
           the table of contents, even though they match the "significant
           elements", then if this pattern matches the contents inside the tag
           (not the body), that tag will not be included, either in
           generating anchors or in generating the ToC.  (default:
           class="notoc")

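           The distinction between the tag and the body can be sketched like
           this.  The is_excluded helper is hypothetical and is not how the
           module is implemented; in particular, whether the real match is
           case-sensitive is not stated here, so the sketch simply applies
           the pattern as given.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether an element would be left out of
# the ToC.  Only the start tag is tested, never the element body.
sub is_excluded {
    my ($element, $notoc_match) = @_;
    my ($start_tag) = $element =~ /\A\s*(<[^>]*>)/;
    return 0 unless defined $start_tag;
    return $start_tag =~ /$notoc_match/ ? 1 : 0;
}

my $pattern = 'class="notoc"';   # the documented default

# The pattern appears inside the tag, so this heading is skipped.
print is_excluded('<h2 class="notoc">Skipped</h2>', $pattern), "\n";

# The pattern appears only in the body, so this heading is kept.
print is_excluded('<h2>class="notoc" in the body</h2>', $pattern), "\n";
```
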
       ol
           ol => 1

           Use an ordered list for level 1 ToC entries.

       ol_num_levels
           ol_num_levels => 2

           The number of levels deep the OL listing will go if ol is true.  If
           set to zero, will use an ordered list for all levels.  (default: 1)

       overwrite
           overwrite => 1

           Overwrite the input file with the output.  (default: off)

       outfile
           outfile => file

           File to write the output to.  This is where the modified HTML
           output goes.  Note that it doesn't make sense to use this option
           if you are processing more than one file.  If you give '-' as the
           filename, then output will go to STDOUT.  (default: STDOUT)

       quiet
           quiet => 1

           Suppress informative messages.  (default: off)

       textonly
           textonly => 1

           Use only text content in significant elements.

       title
           title => string

           Title for the ToC page (if not using header or inline or toc_only).
           (default: "Table of Contents")

       toc_after
           toc_after => \%toc_after_data

           %toc_after_data = ( tag1 => suffix1,
               tag2 => suffix2
               );

           toc_after => { H2=>'</em>' }

           For defining the layout of significant elements in the ToC.

           This expects a reference to a hash of tag=>suffix pairs.

           The tag is the HTML tag which marks the start of the element.  The
           suffix is what is required to be appended to the Table of Contents
           entry generated for that tag.

           (default: undefined)

       toc_before
           toc_before => \%toc_before_data

           %toc_before_data = ( tag1 => prefix1,
               tag2 => prefix2
               );

           toc_before => { H2=>'<em>' }

           For defining the layout of significant elements in the ToC.  The
           tag is the HTML tag which marks the start of the element.  The
           prefix is what is required to be prepended to the Table of Contents
           entry generated for that tag.

           (default: undefined)

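           Taken together, toc_before and toc_after wrap an entry in a
           per-tag prefix and suffix.  The decorate_entry helper below is a
           hypothetical illustration of that pairing, not the module's code;
           exactly which part of the generated entry the real module wraps
           is not specified here.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap the ToC text for $tag in the prefix and
# suffix configured for that tag, if any.
sub decorate_entry {
    my ($tag, $text, $toc_before, $toc_after) = @_;
    my $prefix = $toc_before->{$tag} // '';
    my $suffix = $toc_after->{$tag}  // '';
    return $prefix . $text . $suffix;
}

my %toc_before = (H2 => '<em>');
my %toc_after  = (H2 => '</em>');

# H2 entries are emphasised; H1 entries are left as they are.
print decorate_entry('H2', 'Details',      \%toc_before, \%toc_after), "\n";
print decorate_entry('H1', 'Introduction', \%toc_before, \%toc_after), "\n";
```
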
       toc_end
           toc_end => \%toc_end_data

           %toc_end_data = ( tag1 => endtag1,
               tag2 => endtag2
               );

           toc_end => { H1 => '/H1', H2 => '/H2' }

           For defining significant elements.  The tag is the HTML tag which
           marks the start of the element.  The endtag is the HTML tag which
           marks the end of the element.  When matching in the input file,
           case is ignored (but make sure that all your tag options referring
           to the same tag are exactly the same!).

       toc_entry
           toc_entry => \%toc_entry_data

           %toc_entry_data = ( tag1 => level1,
               tag2 => level2
               );

           toc_entry => { H1 => 1, H2 => 2 }

           For defining significant elements.  The tag is the HTML tag which
           marks the start of the element.  The level is what level the tag is
           considered to be.  The value of level must be numeric and non-zero.
           If the value is negative, consecutive entries represented by the
           significant element will be separated by the value set by the
           entrysep option.

       toclabel
           toclabel => string

           HTML text that labels the ToC.  Always used.  (default: "<h1>Table
           of Contents</h1>")

       toc_tag
           toc_tag => string

           If a ToC is to be included inline, this is the pattern which is
           used to match the tag where the ToC should be put.  This can be a
           start-tag, an end-tag or a comment, but the < should be left out;
           that is, if you want the ToC to be placed after the BODY tag, then
           give "BODY".  If you want a special comment tag to mark where the
           ToC should go, then include the comment marks, for example:
           "!--toc--".  (default: BODY)

       toc_tag_replace
           toc_tag_replace => 1

           In conjunction with toc_tag, this is a flag to say whether the
           given tag should be replaced, or if the ToC should be put after the
           tag.  This can be useful if your toc_tag is a comment and you don't
           need it after you have the ToC in place.  (default: false)

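           How toc_tag and toc_tag_replace interact can be sketched with a
           single substitution.  The insert_toc helper is hypothetical and
           only approximates the behaviour described above (first matching
           tag, case ignored); it is not the module's put_toc_inline method.

```perl
use strict;
use warnings;

# Hypothetical helper: put $toc at the first tag matching $toc_tag.
# The leading "<" is supplied here, as the toc_tag option leaves it out.
sub insert_toc {
    my ($html, $toc, $toc_tag, $replace) = @_;
    if ($replace) {
        # Replace the matched tag with the ToC.
        $html =~ s/<(?:$toc_tag)[^>]*>/$toc/i;
    }
    else {
        # Keep the matched tag and put the ToC straight after it.
        $html =~ s/(<(?:$toc_tag)[^>]*>)/$1$toc/i;
    }
    return $html;
}

my $toc = '<ul><li>Intro</li></ul>';

# After the BODY tag, keeping the tag.
print insert_toc('<html><body><h1>Intro</h1></body></html>',
                 $toc, 'BODY', 0), "\n";

# In place of a marker comment.
print insert_toc('<p>Preface</p><!--toc--><h1>Intro</h1>',
                 $toc, '!--toc--', 1), "\n";
```
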
       toc_only
           toc_only => 1

           Output only the Table of Contents, that is, the Table of Contents
           plus the toclabel.  If there is a header or a footer, these will
           also be output.

           If toc_only is false then if there is no header, and inline is not
           true, then a suitable HTML page header will be output, and if there
           is no footer and inline is not true, then an HTML page footer will
           be output.

           (default: false)

       to_string
           to_string => 1

           Return the modified HTML output as a string.  This does override
           other methods of output (unlike version 3.00).  If to_string is
           false, the method will return 1 rather than a string.

       use_id
           use_id => 1

           Use id="name" for anchors rather than <a name="name"/> anchors.
           However, if an anchor already exists for a significant element,
           this won't make an id for that particular element.

       useorg
           useorg => 1

           Use pre-existing backup files as the input source; that is, files
           of the form infile.bak (see input and bak).

INTERNAL METHODS

       These methods are documented for developer purposes and aren't intended
       to be used externally.

   make_anchor_name
           $toc->make_anchor_name(content=>$content,
               anchors=>\%anchors);

       Makes the anchor-name for one anchor.  Bases the anchor on the content
       of the significant element.  Ensures that anchors are unique.

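       One way to get content-based, unique names is sketched below.  The
       unique_anchor helper and its naming scheme (underscores, numeric
       suffixes) are assumptions made for illustration; the names actually
       produced by make_anchor_name may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: derive an anchor name from element content and
# make sure it has not been handed out before.  %$seen records the
# names used so far.
sub unique_anchor {
    my ($content, $seen) = @_;
    (my $name = $content) =~ s/<[^>]*>//g;   # drop any markup
    $name =~ s/^\s+|\s+$//g;                 # trim surrounding whitespace
    $name =~ s/\W+/_/g;                      # keep word characters only
    $name = 'anchor' if $name eq '';
    my $candidate = $name;
    my $n = 1;
    # Add a numeric suffix until the name is unused.
    $candidate = $name . '_' . ++$n while exists $seen->{$candidate};
    $seen->{$candidate} = 1;
    return $candidate;
}

my %anchors;
print unique_anchor('My <em>Header</em>', \%anchors), "\n";
print unique_anchor('My Header',          \%anchors), "\n";
```
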
   make_anchors
           my $new_html = $toc->make_anchors(input=>$html,
               notoc_match=>$notoc_match,
               use_id=>$use_id,
               toc_entry=>\%toc_entries,
               toc_end=>\%toc_ends,
               );

       Makes the anchors for the given input string.  Returns a string.

   make_toc_list
           my @toc_list = $toc->make_toc_list(input=>$html,
               labels=>\%labels,
               notoc_match=>$notoc_match,
               toc_entry=>\%toc_entry,
               toc_end=>\%toc_end,
               filename=>$filename);

       Makes a list of lists which represents the structure and content of (a
       portion of) the ToC from one file.  Also updates a list of labels for
       the ToC entries.

   build_lol
       Build a list of lists of paths, given a list of hashes with info about
       paths.

   output_toc
           $self->output_toc(toc=>$toc_str,
               input=>\@input,
               filenames=>\@filenames);

       Put the output (whether to file, STDOUT or string).  The "output" in
       this case could be the ToC, the modified (anchors added) HTML, or both.

   put_toc_inline
           my $newhtml = $toc->put_toc_inline(toc_str=>$toc_str,
               filename=>$filename, in_string=>$in_string);

       Puts the given toc_str into the given input string; returns a string.

   cp
           cp($src, $dst);

       Copies file $src to $dst.  Used for making backups of files.

FILE FORMATS

   Formatting the ToC
       The toc_entry and other related options give you control over how the
       ToC entries look, but there are other options that affect the final
       appearance of the ToC file created.

       With the header option, the contents of the given file (or string) will
       be prepended before the generated ToC.  This allows you to have
       introductory text, or any other text, before the ToC.

       Note:
           If you use the header option, make sure the file specified contains
           the opening HTML tag, the HEAD element (containing the TITLE
           element), and the opening BODY tag.  However, these tags/elements
           should not be in the header file if the inline option is used.  See
           "Inlining the ToC" for information on what the header file should
           contain for inlining the ToC.

       With the toclabel option, the contents of the given string will be
       prepended before the generated ToC (but after any text taken from a
       header file).

       With the footer option, the contents of the file will be appended after
       the generated ToC.

       Note:
           If you use the footer, make sure it includes the closing BODY and
           HTML tags (unless, of course, you are using the inline option).

       If the header option is not specified, the appropriate starting HTML
       markup will be added, unless the toc_only option is specified.  If the
       footer option is not specified, the appropriate closing HTML markup
       will be added, unless the toc_only option is specified.

       If you do not want or need to deal with header and footer files, you
       can instead use the title option to specify the title of the ToC file,
       and the toclabel option to specify a heading, or label, to put before
       the list of ToC entries.  Both options have default values.

       If you do not want HTML page tags to be supplied, and just want the ToC
       itself, then specify the toc_only option.  If there are no header or
       footer files, then this will simply output the contents of toclabel and
       the ToC itself.

   Inlining the ToC
       The ability to incorporate the ToC directly into an HTML document is
       supported via the inline option.

       Inlining will be done on the first file in the list of files processed,
       and will only be done if that file contains an opening tag matching the
       toc_tag value.

       If overwrite is true, then the first file in the list will be
       overwritten, with the generated ToC inserted at the appropriate spot.
       Otherwise a modified version of the first file is output to either
       STDOUT or to the output file defined by the outfile option.

       The options toc_tag and toc_tag_replace are used to determine where and
       how the ToC is inserted into the output.

       Example 1

           $toc->generate_toc(inline=>1,
                              toc_tag => 'BODY',
                              toc_tag_replace => 0,
                              ...
                              );

       This will put the generated ToC after the BODY tag of the first file.
       If the header option is specified, then the contents of the specified
       file are inserted after the BODY tag.  If the toclabel option is not
       empty, then the text specified by the toclabel option is inserted.
       Then the ToC is inserted, and finally, if the footer option is
       specified, the footer is inserted.  Then the rest of the input file
       follows as it was before.

       Example 2

           $toc->generate_toc(inline=>1,
                              toc_tag => '!--toc--',
                              toc_tag_replace => 1,
                              ...
                              );

       This will find the first comment of the form <!--toc--> and replace
       that comment with the ToC, in the order

           header
           toclabel
           ToC
           footer

       followed by the rest of the input file.

       Note:
           The header file should not contain the beginning HTML tag and HEAD
           element since the HTML file being processed should already contain
           these tags/elements.

NOTES

       ·   HTML::GenToc is smart enough to detect anchors inside significant
           elements.  If the anchor defines the NAME attribute, HTML::GenToc
           uses the value.  Otherwise, it adds its own NAME attribute to the
           anchor.  If use_id is true, then it likewise checks for and uses
           IDs.

       ·   The TITLE element is treated specially if specified in the
           toc_entry option.  It is illegal to insert anchors (A) into TITLE
           elements.  Therefore, HTML::GenToc will actually link to the
           filename itself instead of the TITLE element of the document.

       ·   HTML::GenToc will ignore a significant element if it does not
           contain any non-whitespace characters.  A warning message is
           generated if such a condition exists.

       ·   If you have a sequence of significant elements that change in a
           slightly disordered fashion, such as H1 -> H3 -> H2 or even H2 ->
           H1, HTML::GenToc still creates a list which is good HTML.  However,
           if you are using an ordered list to that depth, then you will get
           strange numbering, as an extra list element will have been inserted
           to nest the elements at the correct level.

           For example (H2 -> H1 with ol_num_levels=1):

               1.
                   * My H2 Header
               2. My H1 Header

           For example (H1 -> H3 -> H2 with ol_num_levels=0 and H3 also being
           significant):

               1. My H1 Header
                   1.
                       1. My H3 Header
                   2. My H2 Header
               2. My Second H1 Header

           In cases such as this it may be better not to use the ol option.

CAVEATS

       ·   Version 3.10 (and above) generates more verbose (SEO-friendly)
           anchors than prior versions.  Thus anchors generated with earlier
           versions will not match version 3.10 anchors.

       ·   Version 3.00 (and above) of HTML::GenToc is not compatible with
           Version 2.x of HTML::GenToc.  It is now designed to do everything
           in one pass, and has dropped certain options: the infile option is
           no longer used (it has been replaced with the input option); the
           toc_file option no longer exists (use the outfile option instead);
           the tocmap option is no longer supported.  Also the old
           array-parsing of arguments is no longer supported.  There is no
           longer a generate_anchors method; everything is done with
           generate_toc.

           It now generates lower-case tags rather than upper-case ones.

       ·   HTML::GenToc is not very efficient (memory and speed), and can be
           slow for large documents.

       ·   Now that generation of anchors and of the ToC are done in one pass,
           even more memory is used than was the case before.  This is more
           notable when processing multiple files, since all files are read
           into memory before processing them.

       ·   Invalid markup will be generated if a significant element is
           contained inside of an anchor.  For example:

               <a name="foo"><h1>The FOO command</h1></a>

           will be converted to (if H1 is a significant element),

               <a name="foo"><h1><a name="The">The</a> FOO command</h1></a>

           which is illegal since anchors cannot be nested.

           It is better style to put anchor statements within the element to
           be anchored.  For example, the following is preferred:

               <h1><a name="foo">The FOO command</a></h1>

           HTML::GenToc will detect the "foo" name and use it.

       ·   name attributes without quotes are not recognized.

BUGS

       Tell me about them.

REQUIRES

       The installation of this module requires "Module::Build".  The module
       depends on "HTML::SimpleParse", "HTML::Entities" and "HTML::LinkList"
       and uses "Data::Dumper" for debugging purposes.  The hypertoc script
       depends on "Getopt::Long", "Getopt::ArgvFile" and "Pod::Usage".
       Testing of this distribution depends on "Test::More".

INSTALLATION

       To install this module, run the following commands:

           perl Build.PL
           ./Build
           ./Build test
           ./Build install

       Or, if you're on a platform (like DOS or Windows) that doesn't like the
       "./" notation, you can do this:

          perl Build.PL
          perl Build
          perl Build test
          perl Build install

       In order to install somewhere other than the default, such as in a
       directory under your home directory, like "/home/fred/perl", run

          perl Build.PL --install_base /home/fred/perl

       as the first step instead.

       This will install the files underneath /home/fred/perl.

       You will then need to make sure that you alter the PERL5LIB variable to
       find the modules, and the PATH variable to find the script.

       Therefore you will need to change your path, to include
       /home/fred/perl/script (where the script will be):

               PATH=/home/fred/perl/script:${PATH}

       and the PERL5LIB variable, to add /home/fred/perl/lib:

               PERL5LIB=/home/fred/perl/lib:${PERL5LIB}

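       As an alternative to setting PERL5LIB, a script can name the library
       directory itself with the core lib pragma.  This is a minimal sketch
       that assumes the same /home/fred/perl install base as above; it only
       affects the script it appears in.

```perl
use strict;
use warnings;

# Add the private library directory to the front of the module search
# path for this script only.
use lib '/home/fred/perl/lib';

# Show the resulting search path, one directory per line.
print "$_\n" for @INC;
```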

SEE ALSO

       perl(1) htmltoc(1) hypertoc(1)

AUTHOR

       Kathryn Andersen
       (RUBYKAT)     http://www.katspace.org/tools/hypertoc/

       Based on htmltoc by Earl Hood       ehood AT medusa.acs.uci.edu

       Contributions by Dan Dascalescu, <http://dandascalescu.com>

COPYRIGHT

       Copyright (C) 1994-1997  Earl Hood, ehood AT medusa.acs.uci.edu
       Copyright (C) 2002-2008 Kathryn Andersen

       This program is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
       General Public License for more details.

       You should have received a copy of the GNU General Public License along
       with this program; if not, write to the Free Software Foundation, Inc.,
       675 Mass Ave, Cambridge, MA 02139, USA.

perl v5.30.0                      2019-07-26                   HTML::GenToc(3)