linkchecker(1)              General Commands Manual              linkchecker(1)


NAME
       linkchecker - check HTML documents for broken links

SYNOPSIS
       linkchecker [options] [file-or-url]...

DESCRIPTION
       LinkChecker features recursive checking, multithreading, output in
       colored or normal text, HTML, SQL, CSV or a sitemap graph in GML or
       XML, support for HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet
       and local file links, restriction of link checking with regular
       expression filters for URLs, proxy support, username/password
       authorization for HTTP and FTP, robots.txt exclusion protocol support,
       i18n support, a command line interface, and a (Fast)CGI web interface
       (requires an HTTP server).

EXAMPLES
       The most common use checks the given domain recursively, plus any URL
       pointing outside of the domain:
              linkchecker http://treasure.calvinsplayground.de/
       Beware that this checks the whole site, which can have several hundred
       thousand URLs.  Use the -r option to restrict the recursion depth.
       Don't connect to mailto: hosts, only check their URL syntax; all other
       links are checked as usual:
              linkchecker --ignore-url=^mailto: www.mysite.org
       Checking a local HTML file on Unix:
              linkchecker ../bla.html
       Checking a local HTML file on Windows:
              linkchecker c:\temp\test.html
       You can skip the http:// URL part if the domain starts with www.:
              linkchecker www.myhomepage.de
       You can skip the ftp:// URL part if the domain starts with ftp.:
              linkchecker -r0 ftp.linux.org
       Generate a sitemap graph and convert it with the graphviz dot utility:
              linkchecker -odot -v www.myhomepage.de | dot -Tps > sitemap.ps

OPTIONS
   General options
       -h, --help
              Help me! Print usage information for this program.

       -fFILENAME, --config=FILENAME
              Use FILENAME as configuration file.  By default LinkChecker
              first searches /etc/linkchecker/linkcheckerrc and then
              ~/.linkchecker/linkcheckerrc.

       -I, --interactive
              Ask for a URL if none are given on the command line.

       -tNUMBER, --threads=NUMBER
              Generate no more than the given number of threads.  The default
              number of threads is 10.  To disable threading, specify a
              non-positive number.

       --priority
              Run with normal thread scheduling priority.  By default
              LinkChecker runs with low thread priority to be suitable as a
              background job.

       -V, --version
              Print version and exit.

       --allow-root
              Do not drop privileges when running as the root user on Unix
              systems.

   Output options
       -v, --verbose
              Log all checked URLs.  Default is to log only errors and
              warnings.

       --no-warnings
              Don't log warnings.  Default is to log warnings.

       -WREGEX, --warning-regex=REGEX
              Define a regular expression; a warning is printed if it matches
              any content of the checked link.  This applies only to valid
              pages, whose content can be retrieved.

              Use this to check for pages that contain some form of error
              message, for example "This page has moved" or "Oracle
              Application Server error".

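The effect of --warning-regex can be sketched in Python.  The page content, the pattern, and the variable names below are illustrative examples, not LinkChecker's internal implementation:

```python
import re

# Hypothetical content of a page that returned a valid HTTP status.
page = "<html><body>Sorry, this page has moved elsewhere.</body></html>"

# A --warning-regex style pattern flagging "soft" errors on valid pages.
warning_re = re.compile(r"this page has moved", re.IGNORECASE)

# The page is valid, but its content matches the warning pattern.
if warning_re.search(page):
    print("warning: page content matches warning regex")
```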
       --warning-size-bytes=NUMBER
              Print a warning if content size info is available and exceeds
              the given number of bytes.

       -q, --quiet
              Quiet operation, an alias for -o none.  This is only useful
              with -F.

       -oTYPE[/ENCODING], --output=TYPE[/ENCODING]
              Specify the output type as text, html, sql, csv, gml, dot,
              gxml, xml, none or blacklist.  The default type is text.  The
              various output types are documented below.  ENCODING specifies
              the output encoding; the default is that of your locale.  Valid
              encodings are listed at
              http://docs.python.org/lib/standard-encodings.html.

103 -FTYPE[/ENCODING][/FILENAME], --file-output=TYPE[/ENCODING][/FILENAME]
104 Output to a file linkchecker-out.TYPE, $HOME/.linkchecker/black‐
105 list for blacklist output, or FILENAME if specified. The ENCOD‐
106 ING specifies the output encoding, the default is that of your
107 locale. Valid encodings are listed at
108 http://docs.python.org/lib/standard-encodings.html. The FILE‐
109 NAME and ENCODING parts of the none output type will be ignored,
110 else if the file already exists, it will be overwritten. You
111 can specify this option more than once. Valid file output types
112 are text, html, sql, csv, gml, dot, xml, none or blacklist
113 Default is no file output. The various output types are docu‐
114 mented below. Note that you can suppress all console output with
115 the option -o none.
116
       --no-status
              Do not print check status messages.

       -DSTRING, --debug=STRING
              Print debugging output for the given logger.  Available loggers
              are cmdline, checking, cache, gui, dns and all.  Specifying all
              is an alias for specifying all available loggers.  The option
              can be given multiple times to debug with more than one logger.
              For accurate results, threading will be disabled during debug
              runs.

       --trace
              Print tracing information.

       --profile
              Write profiling data into a file named linkchecker.prof in the
              current working directory.  See also --viewprof.

       --viewprof
              Print out previously generated profiling data.  See also
              --profile.

   Checking options
       -rNUMBER, --recursion-level=NUMBER
              Check recursively all links up to the given depth.  A negative
              depth enables infinite recursion.  Default depth is infinite.

       --no-follow-url=REGEX
              Check but do not recurse into URLs matching the given regular
              expression.  This option can be given multiple times.

       --ignore-url=REGEX
              Only check the syntax of URLs matching the given regular
              expression.  This option can be given multiple times.

       -C, --cookies
              Accept and send HTTP cookies according to RFC 2109.  Only
              cookies which are sent back to the originating server are
              accepted.  Sent and accepted cookies are provided as additional
              logging information.

       -a, --anchors
              Check HTTP anchor references.  Default is not to check anchors.
              This option enables logging of the warning
              url-anchor-not-found.

       --no-anchor-caching
              Treat url#anchora and url#anchorb as equal when caching.  This
              is the default browser behaviour, but it is not specified in
              the URI specification.  Use with care, since broken anchors are
              not guaranteed to be detected in this mode.

       -uSTRING, --user=STRING
              Try the given username for HTTP and FTP authorization.  For FTP
              the default username is anonymous.  For HTTP there is no
              default username.  See also -p.

       -pSTRING, --password=STRING
              Try the given password for HTTP and FTP authorization.  For FTP
              the default password is anonymous@.  For HTTP there is no
              default password.  See also -u.

       --timeout=NUMBER
              Set the timeout for connection attempts in seconds.  The
              default timeout is 60 seconds.

       -PNUMBER, --pause=NUMBER
              Pause the given number of seconds between two subsequent
              connection requests to the same host.  Default is no pause
              between requests.

       -NSTRING, --nntp-server=STRING
              Specify an NNTP server for news: links.  Default is the
              environment variable NNTP_SERVER.  If no host is given, only
              the syntax of the link is checked.

       --no-proxy-for=REGEX
              Contact hosts that match the given regular expression directly
              instead of going through a proxy.  This option can be given
              multiple times.

OUTPUT TYPES
       Note that by default only errors and warnings are logged.  You should
       use the --verbose option to get the complete URL list, especially when
       outputting a sitemap graph format.

       text   Standard text logger, logging URLs in keyword: argument
              fashion.

       html   Log URLs in keyword: argument fashion, formatted as HTML.
              Additionally has links to the referenced pages.  Invalid URLs
              have HTML and CSS syntax check links appended.

       csv    Log check result in CSV format with one URL per line.

       gml    Log parent-child relations between linked URLs as a GML sitemap
              graph.

       dot    Log parent-child relations between linked URLs as a DOT sitemap
              graph.

       gxml   Log check result as a GraphXML sitemap graph.

       xml    Log check result as machine-readable XML.

       sql    Log check result as an SQL script with INSERT commands.  An
              example script to create the initial SQL table is included as
              create.sql.

       blacklist
              Suitable for cron jobs.  Logs the check result into a file
              ~/.linkchecker/blacklist which only contains entries with
              invalid URLs and the number of times they have failed.

       none   Logs nothing.  Suitable for debugging or checking the exit
              code.

REGULAR EXPRESSIONS
       Only Python regular expressions are accepted by LinkChecker.  See
       http://www.amk.ca/python/howto/regex/ for an introduction to regular
       expressions.

       The only addition is that a leading exclamation mark negates the
       regular expression.

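The negation convention can be sketched in Python.  The function name below is illustrative, not part of LinkChecker's API; only the leading-"!" behaviour follows the description above:

```python
import re

def url_filter_matches(pattern: str, url: str) -> bool:
    """Apply a LinkChecker-style filter pattern to a URL.

    A leading exclamation mark negates the result of the Python
    regular expression search, as described above.
    """
    negate = pattern.startswith("!")
    if negate:
        pattern = pattern[1:]
    found = re.search(pattern, url) is not None
    return not found if negate else found

# '^mailto:' matches mailto links only; '!^mailto:' matches everything else.
print(url_filter_matches("^mailto:", "mailto:calvin@example.org"))  # True
print(url_filter_matches("!^mailto:", "http://www.example.org/"))   # True
```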
COOKIE FILES
       A cookie file contains standard RFC 822 header data with the
       following possible names:

       Scheme (optional)
              Sets the scheme the cookies are valid for; the default scheme
              is http.

       Host (required)
              Sets the domain the cookies are valid for.

       Path (optional)
              Gives the path the cookies are valid for; the default path
              is /.

       Set-cookie (optional)
              Sets a cookie name/value.  Can be given more than once.

       Multiple entries are separated by a blank line.  The example below
       will send two cookies to all URLs starting with
       http://imadoofus.org/hello/ and one to all URLs starting with
       https://imaweevil.org/:

        Host: imadoofus.org
        Path: /hello
        Set-cookie: ID="smee"
        Set-cookie: spam="egg"

        Scheme: https
        Host: imaweevil.org
        Set-cookie: baggage="elitist"; comment="hologram"

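Since the entries are header data, they can be parsed with Python's standard header parser.  This is only a sketch of the file format, not LinkChecker's actual cookie-loading code; the function name and the returned dictionary layout are made up for illustration:

```python
from email.parser import Parser

def parse_cookie_entries(text: str):
    """Parse blank-line-separated header entries as shown above."""
    entries = []
    for block in text.split("\n\n"):
        if not block.strip():
            continue
        msg = Parser().parsestr(block)
        entries.append({
            "scheme": msg.get("Scheme", "http"),      # optional, default http
            "host": msg["Host"],                      # required
            "path": msg.get("Path", "/"),             # optional, default /
            "cookies": msg.get_all("Set-cookie", []), # may repeat
        })
    return entries

example = """Host: imadoofus.org
Path: /hello
Set-cookie: ID="smee"
Set-cookie: spam="egg"

Scheme: https
Host: imaweevil.org
Set-cookie: baggage="elitist"; comment="hologram"
"""
entries = parse_cookie_entries(example)
print(entries[0]["host"], len(entries[0]["cookies"]))  # imadoofus.org 2
print(entries[1]["scheme"])                            # https
```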

PROXY SUPPORT
       To use a proxy, set $http_proxy, $https_proxy or $ftp_proxy on Unix or
       Windows to the proxy URL.  The URL should be of the form
       http://[user:pass@]host[:port], for example http://localhost:8080 or
       http://joe:test@proxy.domain.  On a Mac, use the Internet Config to
       select a proxy.

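As a sketch, a proxied run on Unix might set the variables like this before invoking linkchecker; the proxy host, port, and credentials below are made-up examples:

```shell
# Point HTTP and FTP checks at a (hypothetical) local proxy.
export http_proxy="http://joe:test@localhost:8080"
export ftp_proxy="http://localhost:8080"
# linkchecker http://www.example.org/   # checks would now use the proxy
echo "$http_proxy"
```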
NOTES
       URLs on the command line starting with ftp. are treated like
       ftp://ftp., and URLs starting with www. are treated like http://www..
       You can also give local files as arguments.

       If you have your system configured to automatically establish a
       connection to the internet (e.g. with diald), it will connect when
       checking links not pointing to your local host.  Use the --ignore-url
       option to prevent this.

       Javascript links are currently ignored.

       If your platform does not support threading, LinkChecker disables it
       automatically.

       You can supply multiple user/password pairs in a configuration file.

       When checking news: links, the given NNTP host doesn't need to be the
       same as the host of the user browsing your pages.

ENVIRONMENT
       NNTP_SERVER - specifies the default NNTP server
       http_proxy - specifies the default HTTP proxy server
       ftp_proxy - specifies the default FTP proxy server
       LC_MESSAGES, LANG, LANGUAGE - specify the output language

RETURN VALUE
       The return value is non-zero when

       ·      invalid links were found, or

       ·      link warnings were found and warnings are enabled, or

       ·      a program error occurred.

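A cron-style wrapper can branch on this exit status.  The sketch below uses a stand-in function instead of a real linkchecker invocation (so it runs anywhere); in practice you would replace the function body with the actual command:

```shell
# Stand-in for `linkchecker -r1 --no-status "$url" > /dev/null`; a
# non-zero status means invalid links, enabled warnings, or an error.
run_check() { return 1; }

if run_check; then
  result="all links ok"
else
  result="problems found, see the log output"
fi
echo "$result"
```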
FILES
       /etc/linkchecker/linkcheckerrc, ~/.linkchecker/linkcheckerrc - default
       configuration files
       ~/.linkchecker/blacklist - default blacklist logger output filename
       linkchecker-out.TYPE - default logger file output name
       http://docs.python.org/lib/standard-encodings.html - valid output
       encodings
       http://www.amk.ca/python/howto/regex/ - regular expression
       documentation

AUTHOR
       Bastian Kleineidam <calvin@users.sourceforge.net>

COPYRIGHT
       Copyright © 2000-2007 Bastian Kleineidam



LinkChecker                       2001-03-10                     linkchecker(1)