HTML::FormatText::WithLinks(3)  User Contributed Perl Documentation  HTML::FormatText::WithLinks(3)

NAME

       HTML::FormatText::WithLinks - HTML to text conversion with links as
       footnotes

SYNOPSIS

           use strict;
           use warnings;
           use HTML::FormatText::WithLinks;

           my $f = HTML::FormatText::WithLinks->new();

           my $html = qq(
           <html>
           <body>
           <p>
               Some html with a <a href="http://example.com/">link</a>
           </p>
           </body>
           </html>
           );

           my $text = $f->parse($html);

           print $text;

           # results in something like
           #
           #   Some html with a [1]link
           #
           #   1. http://example.com/

           my $f2 = HTML::FormatText::WithLinks->new(
               before_link => '',
               after_link  => ' [%l]',
               footnote    => ''
           );

           $text = $f2->parse($html);
           print $text;

           # results in something like
           #
           #   Some html with a link [http://example.com/]

           my $f3 = HTML::FormatText::WithLinks->new(
               link_num_generator => sub {
                   return "*" x (shift() + 1);
               },
               footnote => '[%n] %l'
           );

           $text = $f3->parse($html);
           print $text;

           # results in something like
           #
           #   Some html with a [*]link
           #
           #   [*] http://example.com/

DESCRIPTION

       HTML::FormatText::WithLinks takes HTML and turns it into plain text,
       printing all the links in the HTML as footnotes. By default, it
       attempts to mimic the format of the lynx text-based web browser's
       --dump option.

METHODS

   new
           my $f = HTML::FormatText::WithLinks->new( %options );

       Returns a new instance. It accepts all the options of HTML::FormatText
       plus the following:

       base
           A URI that will be used to turn any relative URIs in the HTML into
           absolute ones.

       doc_overrides_base
           If a base element is found in the document and it has an href
           attribute, then setting doc_overrides_base to true will cause the
           document's base to be used instead of the base option. This
           defaults to false.

       before_link (default: '[%n]')
       after_link (default: '')
       footnote (default: '[%n] %l')
           The strings to print before a link (i.e. when the <a> is found),
           after the link has ended (i.e. when the </a> is found), and when
           printing out footnotes, respectively.

           "%n" will be replaced by the link number (as produced by
           link_num_generator), "%l" will be replaced by the link itself.

           If footnote is set to '', no footnotes will be printed.

       link_num_generator (default: sub { return shift() + 1 })
           A subroutine that is passed the internal link number and returns
           the value to be printed for that link. The internal count starts
           at 0, which is why the default adds 1.

       with_emphasis
           If set to 1, italicised text will be surrounded by "/" and bold
           text by "_". You can change these markers by using the
           "italic_marker" and "bold_marker" options.

       unique_links
           If set to 1, only one footnote will be generated per unique URI,
           as opposed to the default behaviour, which is to generate a
           footnote for every link.

       anchor_links
           If set to 0, links pointing to local anchors (e.g. href="#top")
           will be skipped. The default behaviour is to include all links.

       skip_linked_urls
           If set to 1, links whose text equals their href value will be
           skipped. The default behaviour is to include all links.

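       The template and numbering options above combine roughly as shown
       here. This is a minimal sketch in core Perl, not the module's internal
       code: expand_template and footnote_index are hypothetical helpers, and
       both the single-pass placeholder substitution and the unique_links-style
       reuse of footnote numbers are assumptions made for illustration.

```perl
use strict;
use warnings;

# Option values mirroring the documented defaults.
my %opt = (before_link => '[%n]', after_link => '', footnote => '[%n] %l');

# Hypothetical helper, not part of HTML::FormatText::WithLinks: expand a
# template. Both placeholders are replaced in a single pass, so a URL that
# itself contains "%n" is not expanded a second time.
sub expand_template {
    my ($template, $num, $url) = @_;
    (my $out = $template) =~ s/%([nl])/$1 eq 'n' ? $num : $url/ge;
    return $out;
}

# The documented default link_num_generator: internal numbers start at 0.
my $link_num_generator = sub { return shift() + 1 };

# Hypothetical unique_links-style bookkeeping: a URL seen before reuses its
# existing internal number instead of getting a new footnote.
my (%index_of, @footnotes);
sub footnote_index {
    my ($url) = @_;
    unless (exists $index_of{$url}) {
        push @footnotes, $url;
        $index_of{$url} = $#footnotes;
    }
    return $index_of{$url};
}

# Two links to the same URL share one footnote number.
my $url = 'http://example.com/';
for my $text ('link', 'same link again') {
    my $n = $link_num_generator->(footnote_index($url));
    print expand_template($opt{before_link}, $n, $url), $text,
          expand_template($opt{after_link}, $n, $url), "\n";
}

# Footnotes are printed once per stored URL.
print "\n";
for my $i (0 .. $#footnotes) {
    print expand_template($opt{footnote}, $link_num_generator->($i),
                          $footnotes[$i]), "\n";
}
```

       Run as written, this prints "[1]link" and "[1]same link again",
       followed by a single footnote "[1] http://example.com/". Passing the
       SYNOPSIS generator, sub { return "*" x (shift() + 1) }, in place of
       the default would print "[*]" wherever "[1]" appears.
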
   parse
           my $text = $f->parse($html);

       Takes some HTML and returns it as text. Returns undef on error.

       Also returns undef if passed undef, and an empty string if passed an
       empty string.

   parse_file
           my $text = $f->parse_file($filename);

       Takes a filename and returns the contents of the file as plain text.
       Returns undef on error.

   error
           $f->error();

       Returns the last error that occurred. In practice this is likely to be
       either a warning that parse_file couldn't find the file or that
       HTML::TreeBuilder failed.

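       Because parse and parse_file signal failure by returning undef,
       callers will usually check for that and then consult error(). The
       following is a minimal sketch built around a hypothetical helper,
       html_file_to_text; it only calls the parse_file and error methods
       documented above, and the fallback message is an assumption for the
       case where error() has nothing to report.

```perl
use strict;
use warnings;

# Hypothetical helper: convert an HTML file using an already-constructed
# HTML::FormatText::WithLinks object, dying with the formatter's own error
# message when parse_file returns undef (its documented failure signal).
sub html_file_to_text {
    my ($formatter, $filename) = @_;
    my $text = $formatter->parse_file($filename);
    unless (defined $text) {
        my $err = $formatter->error() // 'unknown error';
        die "Could not convert $filename: $err\n";
    }
    return $text;
}
```

       For example, print html_file_to_text(HTML::FormatText::WithLinks->new,
       'page.html'); where page.html is a placeholder filename. Note that
       parse also returns undef when passed undef, so a caller of parse that
       needs to tell the two cases apart should check its input first.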

CAVEATS

       When passing HTML fragments, the results may be a little
       unpredictable. I've tried to work around the most egregious of the
       issues, but reports of any unexpected results are welcome.

       Also note that if for some reason there is an a tag in the document
       that does not have an href attribute, then it will be quietly ignored.
       If this is really a problem for anyone, then let me know and I'll see
       if I can think of a sensible thing to do in this case.


AUTHOR

       Struan Donald <struan@cpan.org>

       <http://www.exo.org.uk/code/>

       Ian Malpass <ian@indecorous.com> was responsible for the custom
       formatting bits and the nudge to release the code.

       Simon Dassow <janus@errornet.de> for the anchor_links option plus a
       few bugfixes and optimisations.

       Kevin Ryde for the code for pulling the base out of the document.

       Thomas Sibley <trs@bestpractical.com> for patches for skipping links
       whose text is their URL and for changing the delimiters for bold and
       italic text.


SOURCE CODE

       The source code for this module is hosted on GitHub
       <http://github.com/struan/html-formattext-withlinks>.

COPYRIGHT

       Copyright (C) 2003-2010 Struan Donald and Ian Malpass. All rights
       reserved.


LICENSE

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.


SEE ALSO

       perl(1), HTML::Formatter.

perl v5.36.0                      2023-01-20    HTML::FormatText::WithLinks(3)