linkcheckerrc(5)              File Formats Manual              linkcheckerrc(5)

NAME
       linkcheckerrc - configuration file for LinkChecker

DESCRIPTION
       linkcheckerrc is the configuration file for LinkChecker. The file is
       written in an INI-style format.
       The default file location is ~/.linkchecker/linkcheckerrc on Unix and
       %HOMEPATH%\.linkchecker\linkcheckerrc on Windows systems.
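A minimal Python sketch of the lookup rule above. The helper `default_config_path` is hypothetical and takes the environment as a plain dictionary. It assumes `~` means `$HOME` on Unix. It only mirrors the documented locations and is not LinkChecker's own lookup code.

```python
import ntpath
import posixpath


def default_config_path(is_windows: bool, env: dict) -> str:
    """Return the documented default linkcheckerrc location (hypothetical helper).

    On Unix, ~ is taken to be $HOME; on Windows, %HOMEPATH% is used.
    """
    if is_windows:
        # ntpath joins with backslashes, matching Windows path syntax
        return ntpath.join(env["HOMEPATH"], ".linkchecker", "linkcheckerrc")
    # posixpath joins with forward slashes, matching Unix path syntax
    return posixpath.join(env["HOME"], ".linkchecker", "linkcheckerrc")


print(default_config_path(False, {"HOME": "/home/alice"}))
# /home/alice/.linkchecker/linkcheckerrc
print(default_config_path(True, {"HOMEPATH": "\\Users\\alice"}))
# \Users\alice\.linkchecker\linkcheckerrc
```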

SETTINGS
   [checking]
       cookiefile=filename
              Read a file with initial cookie data. The cookie data format is
              explained in linkchecker(1).
              Command line option: --cookiefile

       localwebroot=STRING
              When checking absolute URLs inside local files, the given root
              directory is used as the base URL.
              Note that the given directory must have URL syntax, so it must
              use a slash to join directories instead of a backslash, and the
              given directory must end with a slash.
              Command line option: none

       nntpserver=STRING
              Specify an NNTP server for news: links. Default is the
              environment variable NNTP_SERVER. If no host is given, only the
              syntax of the link is checked.
              Command line option: --nntp-server

       recursionlevel=NUMBER
              Recursively check all links up to the given depth. A negative
              depth enables infinite recursion. Default depth is infinite.
              Command line option: --recursion-level

       threads=NUMBER
              Generate no more than the given number of threads. Default
              number of threads is 10. To disable threading, specify a
              non-positive number.
              Command line option: --threads

       timeout=NUMBER
              Set the timeout for connection attempts in seconds. The default
              timeout is 60 seconds.
              Command line option: --timeout

       aborttimeout=NUMBER
              Time to wait for checks to finish after the user aborts the
              first time (with Ctrl-C or the abort button). The default abort
              timeout is 300 seconds.
              Command line option: none
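The numeric settings above carry special values: a negative recursionlevel means unlimited depth, and a non-positive threads value disables threading. The following sketch normalizes raw string values using the documented defaults. `CheckingSettings` and `normalize_checking` are hypothetical names, and representing infinite depth as `None` is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckingSettings:
    # None stands for "infinite recursion" (hypothetical representation)
    max_depth: Optional[int]
    # 0 stands for "threading disabled"
    threads: int
    timeout: int
    aborttimeout: int


def normalize_checking(raw: dict) -> CheckingSettings:
    """Apply the documented defaults and special values (sketch)."""
    level = int(raw.get("recursionlevel", "-1"))  # default depth: infinite
    threads = int(raw.get("threads", "10"))       # default: 10 threads
    return CheckingSettings(
        max_depth=None if level < 0 else level,   # negative depth: no limit
        threads=max(threads, 0),                  # non-positive: no threading
        timeout=int(raw.get("timeout", "60")),
        aborttimeout=int(raw.get("aborttimeout", "300")),
    )


print(normalize_checking({}))
print(normalize_checking({"recursionlevel": "2", "threads": "-1"}))
```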

       useragent=STRING
              Specify the User-Agent string to send to the HTTP server, for
              example "Mozilla/4.0". The default is "LinkChecker/X.Y" where
              X.Y is the current version of LinkChecker.
              Command line option: --user-agent

       sslverify=[0|1|filename]
              If set to zero, SSL certificate checking is disabled. If set to
              one (the default), SSL certificate checking is enabled with the
              provided CA certificate file. If a filename is specified, it
              will be used as the certificate file.
              Command line option: none
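The three forms accepted by sslverify can be told apart as in the sketch below. `parse_sslverify` is a hypothetical helper. It assumes that only the literal strings 0 and 1 act as switches and that anything else is a file name. The bool-or-path result has the same shape as the `verify` argument of the Python requests library, though whether LinkChecker passes the value on that way is not stated here.

```python
from typing import Union


def parse_sslverify(value: str) -> Union[bool, str]:
    """Interpret sslverify=[0|1|filename] (hypothetical helper).

    Returns False to disable checking, True to check with the provided
    CA certificates, or the name of a certificate file.
    """
    value = value.strip()
    if value == "0":
        return False
    if value == "1":
        return True
    # Any other value is taken to be a certificate file name
    return value


print(parse_sslverify("0"))                             # False
print(parse_sslverify("1"))                             # True
print(parse_sslverify("/etc/ssl/certs/ca-bundle.crt"))  # /etc/ssl/certs/ca-bundle.crt
```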

       maxrunseconds=NUMBER
              Stop checking new URLs after the given number of seconds. This
              is the same as if the user stops (by hitting Ctrl-C) after the
              given number of seconds.
              The default is not to stop until all URLs are checked.
              Command line option: none

       maxnumurls=NUMBER
              Maximum number of URLs to check. New URLs will not be queued
              after the given number of URLs is checked.
              The default is to queue and check all URLs.
              Command line option: none

       maxrequestspersecond=NUMBER
              Limit the maximum number of requests per second to one host.

       allowedschemes=NAME[,NAME...]
              Allowed URL schemes as a comma-separated list.
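A sketch of how an allowedschemes list might be applied to a URL, using only Python's standard urllib.parse. Both helper names are hypothetical. Lowercasing the names relies on URL schemes being case-insensitive, and what LinkChecker reports for a URL with a disallowed scheme is not covered here.

```python
from urllib.parse import urlsplit


def parse_allowed_schemes(value: str) -> set:
    """Split allowedschemes=NAME[,NAME...] into a set of lowercase names."""
    return {name.strip().lower() for name in value.split(",") if name.strip()}


def scheme_allowed(url: str, allowed: set) -> bool:
    # urlsplit() lowercases the scheme, so "HTTPS://..." matches "https"
    return urlsplit(url).scheme in allowed


allowed = parse_allowed_schemes("http, https,mailto")
print(scheme_allowed("https://example.com/", allowed))  # True
print(scheme_allowed("ftp://example.com/", allowed))    # False
```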

   [filtering]
       ignore=REGEX (MULTILINE)
              Only check the syntax of URLs matching the given regular
              expressions.
              Command line option: --ignore-url

       ignorewarnings=NAME[,NAME...]
              Ignore the comma-separated list of warnings. See WARNINGS for
              the list of supported warnings.
              Command line option: none

       internlinks=REGEX
              Regular expression to add more URLs recognized as internal
              links. Default is that URLs given on the command line are
              internal.
              Command line option: none

       nofollow=REGEX (MULTILINE)
              Check but do not recurse into URLs matching the given regular
              expressions.
              Command line option: --no-follow-url

       checkextern=[0|1]
              Check external links. Default is to check internal links only.
              Command line option: --check-extern
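The difference between ignore and nofollow can be summarized as a small decision function. This is a sketch under two assumptions: patterns are applied with `re.search` (so they may match anywhere in the URL), and ignore is consulted before nofollow. `classify_url` and its return strings are hypothetical.

```python
import re
from typing import Iterable


def classify_url(url: str, ignore: Iterable[str], nofollow: Iterable[str]) -> str:
    """Decide how a URL is handled under ignore= and nofollow= (sketch)."""
    if any(re.search(p, url) for p in ignore):
        return "syntax-only"        # ignore=: only the syntax is checked
    if any(re.search(p, url) for p in nofollow):
        return "check-no-recurse"   # nofollow=: checked, but not recursed into
    return "check-and-recurse"


ignore = [r"^mailto:", r"bookmark"]
nofollow = [r"/archive/"]
for url in ["mailto:someone@example.com",
            "https://example.com/archive/2014.html",
            "https://example.com/index.html"]:
    print(url, "->", classify_url(url, ignore, nofollow))
```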

   [authentication]
       entry=REGEX USER [PASS] (MULTILINE)
              Provide different user/password pairs for different link types.
              Entries are a triple (URL regex, username, password) or a pair
              (URL regex, username), where the entries are separated by
              whitespace.
              The password is optional; if it is missing, it has to be entered
              on the command line.
              If the regular expression matches the checked URL, the given
              user/password pair is used for authentication. The command line
              options -u and -p match every link and therefore override the
              entries given here. The first match wins. At the moment,
              authentication is used/needed for http[s] and ftp links.
              Command line option: -u, -p
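The entry format and the first-match rule can be sketched in Python as follows. `parse_entries` and `find_credentials` are hypothetical helpers. A missing password is represented as `None`, meaning it would have to be entered interactively, and matching with `re.search` is an assumption.

```python
import re
from typing import List, Optional, Tuple

Entry = Tuple[str, str, Optional[str]]


def parse_entries(value: str) -> List[Entry]:
    """Parse the lines of a multiline entry= value: REGEX USER [PASS]."""
    entries = []
    for line in value.splitlines():
        fields = line.split()  # fields are separated by whitespace
        if not fields:
            continue           # skip blank lines
        if len(fields) not in (2, 3):
            raise ValueError(f"expected REGEX USER [PASS], got {line!r}")
        password = fields[2] if len(fields) == 3 else None  # None: ask later
        entries.append((fields[0], fields[1], password))
    return entries


def find_credentials(url: str, entries: List[Entry]) -> Optional[Tuple[str, Optional[str]]]:
    # The first matching entry wins.
    for regex, user, password in entries:
        if re.search(regex, url):
            return user, password
    return None


entries = parse_entries(r"""
^https?://intranet\. alice s3cret
^ftp:// bob
""")
print(find_credentials("https://intranet.example.com/", entries))  # ('alice', 's3cret')
print(find_credentials("ftp://files.example.com/", entries))       # ('bob', None)
```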

       loginurl=URL
              A login URL to be visited before checking. Also needs
              authentication data set for it.

       loginuserfield=STRING
              The name of the user CGI field. Default name is login.

       loginpasswordfield=STRING
              The name of the password CGI field. Default name is password.

       loginextrafields=NAME:VALUE (MULTILINE)
              Optionally, any additional CGI name/value pairs. Note that the
              default values are submitted automatically.
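The way the login settings combine into one form submission can be sketched with urllib.parse.urlencode. `build_login_form` is a hypothetical helper. It only shows the default field names and the NAME:VALUE parsing. It does not reproduce LinkChecker's own login step, which also submits the form's default values.

```python
from urllib.parse import urlencode


def build_login_form(user: str, password: str,
                     userfield: str = "login",
                     passwordfield: str = "password",
                     extrafields: str = "") -> str:
    """Assemble an urlencoded login form body (hypothetical helper).

    extrafields holds the multiline NAME:VALUE lines of loginextrafields.
    """
    data = {userfield: user, passwordfield: password}
    for line in extrafields.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split only on the first colon so that values may contain colons
        name, _, value = line.partition(":")
        data[name.strip()] = value.strip()
    return urlencode(data)


print(build_login_form("alice", "s3cret", extrafields="remember:1\nlang:en"))
# login=alice&password=s3cret&remember=1&lang=en
```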

   [output]
       debug=STRING[,STRING...]
              Print debugging output for the given modules. Available debug
              modules are cmdline, checking, cache, dns, thread, plugins and
              all. Specifying all is an alias for specifying all available
              modules.
              Command line option: --debug

       fileoutput=TYPE[,TYPE...]
              Output to files linkchecker-out.TYPE, or to
              $HOME/.linkchecker/blacklist for blacklist output.
              Valid file output types are text, html, sql, csv, gml, dot, xml,
              none or blacklist. Default is no file output. The various output
              types are documented below. Note that you can suppress all
              console output with log=none.
              Command line option: --file-output

       log=TYPE[/ENCODING]
              Specify output type as text, html, sql, csv, gml, dot, xml, none
              or blacklist. Default type is text. The various output types
              are documented below.
              ENCODING specifies the output encoding; the default is that of
              your locale. Valid encodings are listed at
              http://docs.python.org/library/codecs.html#standard-encodings.
              Command line option: --output
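The following sketch splits a log=TYPE[/ENCODING] value and validates the encoding name against Python's codec registry, which is what the linked standard-encodings list describes. `parse_log` is hypothetical, and returning `None` to mean "use the locale encoding" is an assumption.

```python
import codecs
from typing import Optional, Tuple

# Output types listed for log= above
LOG_TYPES = {"text", "html", "sql", "csv", "gml", "dot", "xml", "none", "blacklist"}


def parse_log(value: str) -> Tuple[str, Optional[str]]:
    """Split log=TYPE[/ENCODING]; None means 'use the locale encoding'."""
    logtype, sep, encoding = value.partition("/")
    logtype = logtype.strip()
    if logtype not in LOG_TYPES:
        raise ValueError(f"unknown output type: {logtype!r}")
    if not sep:
        return logtype, None
    # codecs.lookup raises LookupError for encodings Python does not know
    return logtype, codecs.lookup(encoding.strip()).name


print(parse_log("html"))        # ('html', None)
print(parse_log("text/UTF-8"))  # ('text', 'utf-8')
```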

       quiet=[0|1]
              If set, operate quietly. An alias for log=none. This is only
              useful with fileoutput.
              Command line option: --quiet

       status=[0|1]
              Control printing check status messages. Default is 1.
              Command line option: --no-status

       verbose=[0|1]
              If set, log all checked URLs once. Default is to log only errors
              and warnings.
              Command line option: --verbose

       warnings=[0|1]
              If set, log warnings. Default is to log warnings.
              Command line option: --no-warnings

   [text]
       filename=STRING
              Specify output filename for text logging. Default filename is
              linkchecker-out.txt.
              Command line option: --file-output=

       parts=STRING
              Comma-separated list of parts that have to be logged. See
              LOGGER PARTS below.
              Command line option: none

       encoding=STRING
              Valid encodings are listed in
              http://docs.python.org/library/codecs.html#standard-encodings.
              Default encoding is iso-8859-15.

       color*
              Color settings for the various log parts. The syntax is color or
              type;color. The type can be bold, light, blink or invert. The
              color can be default, black, red, green, yellow, blue, purple,
              cyan, white, Black, Red, Green, Yellow, Blue, Purple, Cyan or
              White.
              Command line option: none
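The color or type;color syntax can be checked with a small parser. `parse_color` is a hypothetical helper. It treats the lowercase and capitalized color names as distinct values, exactly as listed, without assuming what the difference between them is.

```python
from typing import Optional, Tuple

COLOR_TYPES = {"bold", "light", "blink", "invert"}
BASE_COLORS = ["default", "black", "red", "green", "yellow", "blue",
               "purple", "cyan", "white"]
# Capitalized names (Black, Red, ...) are listed as separate valid values
COLORS = set(BASE_COLORS) | {c.capitalize() for c in BASE_COLORS if c != "default"}


def parse_color(value: str) -> Tuple[Optional[str], str]:
    """Split a color* value of the form color or type;color (sketch)."""
    if ";" in value:
        ctype, color = (part.strip() for part in value.split(";", 1))
        if ctype not in COLOR_TYPES:
            raise ValueError(f"unknown color type: {ctype!r}")
    else:
        ctype, color = None, value.strip()
    if color not in COLORS:
        raise ValueError(f"unknown color: {color!r}")
    return ctype, color


print(parse_color("bold;green"))  # ('bold', 'green')
print(parse_color("cyan"))        # (None, 'cyan')
```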

       colorparent=STRING
              Set parent color. Default is white.

       colorurl=STRING
              Set URL color. Default is default.

       colorname=STRING
              Set name color. Default is default.

       colorreal=STRING
              Set real URL color. Default is cyan.

       colorbase=STRING
              Set base URL color. Default is purple.

       colorvalid=STRING
              Set valid color. Default is bold;green.

       colorinvalid=STRING
              Set invalid color. Default is bold;red.

       colorinfo=STRING
              Set info color. Default is default.

       colorwarning=STRING
              Set warning color. Default is bold;yellow.

       colordltime=STRING
              Set download time color. Default is default.

       colorreset=STRING
              Set reset color. Default is default.

   [gml]
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   [dot]
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   [csv]
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

       separator=CHAR
              Set CSV separator. Default is a comma (,).

       quotechar=CHAR
              Set CSV quote character. Default is a double quote (").
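To illustrate what separator and quotechar control, the following sketch writes two rows with Python's csv module using a non-default separator and quote character. The column names are hypothetical, and it is an assumption that the CSV logger quotes fields in the same minimal way.

```python
import csv
import io

# Example [csv] values; these are NOT the defaults, which are , and "
separator, quotechar = ";", "'"

buf = io.StringIO()
writer = csv.writer(buf, delimiter=separator, quotechar=quotechar,
                    quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
writer.writerow(["urlname", "result"])              # hypothetical columns
writer.writerow(["http://example.com/a;b", "valid"])
print(buf.getvalue())
# urlname;result
# 'http://example.com/a;b';valid
```

Only the field that contains the separator gets quoted, which is why the quote character matters mainly for URLs containing the separator.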

   [sql]
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

       dbname=STRING
              Set database name to store into. Default is linksdb.

       separator=CHAR
              Set SQL command separator character. Default is a semicolon (;).

   [html]
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

       colorbackground=COLOR
              Set HTML background color. Default is #fff7e5.

       colorurl=COLOR
              Set HTML URL color. Default is #dcd5cf.

       colorborder=COLOR
              Set HTML border color. Default is #000000.

       colorlink=COLOR
              Set HTML link color. Default is #191c83.

       colorwarning=COLOR
              Set HTML warning color. Default is #e0954e.

       colorerror=COLOR
              Set HTML error color. Default is #db4930.

       colorok=COLOR
              Set HTML valid color. Default is #3ba557.

   [blacklist]
       filename=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   [xml]
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   [gxml]
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   [sitemap]
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

       priority=FLOAT
              A number between 0.0 and 1.0 determining the priority. The
              default priority for the first URL is 1.0, and for all child
              URLs it is 0.5.

       frequency=[always|hourly|daily|weekly|monthly|yearly|never]
              How frequently pages change.
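The priority and frequency values correspond to the priority and changefreq elements of the Sitemaps protocol. The sketch below builds one url element with xml.etree.ElementTree and applies the documented default priorities. `sitemap_entry` is hypothetical and is not the sitemap logger's own code.

```python
import xml.etree.ElementTree as ET
from typing import Optional

FREQUENCIES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def sitemap_entry(loc: str, is_first: bool, frequency: str,
                  priority: Optional[float] = None) -> str:
    """Build one <url> element of a sitemap (sketch)."""
    if priority is None:
        priority = 1.0 if is_first else 0.5   # documented defaults
    if not 0.0 <= priority <= 1.0:
        raise ValueError("priority must be between 0.0 and 1.0")
    if frequency not in FREQUENCIES:
        raise ValueError(f"unknown frequency: {frequency!r}")
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    ET.SubElement(url, "changefreq").text = frequency
    ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(url, encoding="unicode")


print(sitemap_entry("https://example.com/about.html", is_first=False,
                    frequency="weekly"))
# <url><loc>https://example.com/about.html</loc><changefreq>weekly</changefreq><priority>0.5</priority></url>
```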

LOGGER PARTS
       all       (for all parts)
       id        (a unique ID for each log entry)
       realurl   (the full URL)
       result    (valid or invalid, with messages)
       extern    (1 or 0, only reported by some logger types)
       base      (base href=...)
       name      (<a href=...>name</a> and <img alt="name">)
       parenturl (if any)
       info      (some additional info, e.g. FTP welcome messages)
       warning   (warnings)
       dltime    (download time)
       checktime (check time)
       url       (the original URL name, can be relative)
       intro     (the blurb at the beginning, "starting at ...")
       outro     (the blurb at the end, "found x errors ...")
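A parts value can be resolved into concrete part names as sketched below. `resolve_parts` is a hypothetical helper. Rejecting unknown names is an assumption, and the order of `ALL_PARTS` simply follows the list above.

```python
ALL_PARTS = ["id", "realurl", "result", "extern", "base", "name", "parenturl",
             "info", "warning", "dltime", "checktime", "url", "intro", "outro"]


def resolve_parts(value: str) -> list:
    """Turn a parts= value into the list of parts to log (sketch)."""
    names = [n.strip() for n in value.split(",") if n.strip()]
    if "all" in names:
        return list(ALL_PARTS)       # 'all' selects every part
    unknown = [n for n in names if n not in ALL_PARTS]
    if unknown:
        raise ValueError(f"unknown logger parts: {unknown}")
    return names


print(resolve_parts("realurl, result, warning"))  # ['realurl', 'result', 'warning']
print(len(resolve_parts("all")))                    # 14
```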

MULTILINE
       Some option values can span multiple lines. Each line has to be
       indented for that to work. Lines starting with a hash (#) will be
       ignored, though they must still be indented.

       ignore=
         lconline
         bookmark
         # a comment
         ^mailto:
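Python's standard configparser reads the same indented continuation syntax, so it can show how the ignore example above splits into one pattern per line. configparser is used here for illustration only. Treat it as an assumption that LinkChecker's reader behaves identically.

```python
import configparser
import re

CONFIG = """\
[filtering]
ignore=
  lconline
  bookmark
  # a comment
  ^mailto:
"""

parser = configparser.RawConfigParser()
parser.read_string(CONFIG)

# The value keeps one pattern per line; the indented comment line is dropped.
raw = parser.get("filtering", "ignore")
patterns = [line.strip() for line in raw.splitlines() if line.strip()]
print(patterns)  # ['lconline', 'bookmark', '^mailto:']

compiled = [re.compile(p) for p in patterns]
print(any(p.search("mailto:someone@example.com") for p in compiled))  # True
```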

EXAMPLE
       [output]
       log=html

       [checking]
       threads=5

       [filtering]
       ignorewarnings=http-moved-permanent

PLUGINS
       All plugins have a separate section. If the section appears in the
       configuration file, the plugin is enabled. Some plugins read extra
       options in their section.
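Because a plugin is enabled merely by the presence of its section, checking for enabled plugins amounts to a section lookup. The sketch uses configparser and a hand-written set of the plugin names from this page. `KNOWN_PLUGINS` is hypothetical and is not LinkChecker's plugin registry.

```python
import configparser

# Plugin section names listed under PLUGINS (hypothetical registry)
KNOWN_PLUGINS = {"AnchorCheck", "LocationInfo", "RegexCheck",
                 "SslCertificateCheck", "HtmlSyntaxCheck", "HttpHeaderInfo",
                 "CssSyntaxCheck", "VirusCheck", "PdfParser", "WordParser"}

CONFIG = """\
[checking]
threads=5

[AnchorCheck]

[SslCertificateCheck]
sslcertwarndays=30
"""

parser = configparser.RawConfigParser()
parser.read_string(CONFIG)

# A plugin counts as enabled when its section is present, even if it is empty.
enabled = sorted(s for s in parser.sections() if s in KNOWN_PLUGINS)
print(enabled)  # ['AnchorCheck', 'SslCertificateCheck']

# Extra options are read from the plugin's own section.
print(parser.getint("SslCertificateCheck", "sslcertwarndays"))  # 30
```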

   [AnchorCheck]
       Checks the validity of HTML anchors.

   [LocationInfo]
       Adds the country and, if possible, the city name of the URL host as
       info. Needs GeoIP or pygeoip and a local country or city lookup DB
       installed.

   [RegexCheck]
       Define a regular expression which prints a warning if it matches any
       content of the checked link. This applies only to valid pages, so we
       can get their content.

       warningregex=REGEX
              Use this to check for pages that contain some form of error
              message, for example "This page has moved" or "Oracle
              Application error". REGEX should be unquoted.

              Note that multiple values can be combined in the regular
              expression, for example "(This page has moved|Oracle
              Application error)".
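The combined expression from the note above can be tried against page content as follows. `content_warning` and the wording of its message are hypothetical. The sketch assumes the pattern is applied with `re.search` to the page text.

```python
import re
from typing import Optional

# warningregex value, written unquoted in the config file
warningregex = r"(This page has moved|Oracle Application error)"
pattern = re.compile(warningregex)


def content_warning(content: str) -> Optional[str]:
    """Return a warning string if the page content matches (sketch)."""
    match = pattern.search(content)
    if match:
        return f"Found {match.group(0)!r} in link contents."
    return None


print(content_warning("<h1>This page has moved</h1>"))
print(content_warning("<h1>Welcome</h1>"))  # None
```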

   [SslCertificateCheck]
       Check the SSL certificate expiration date. Only internal https: links
       will be checked. A domain will only be checked once to avoid
       duplicate warnings.

       sslcertwarndays=NUMBER
              Configures the expiration warning time in days.
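A sketch of the expiration test that sslcertwarndays configures. `cert_expiry_warning` is hypothetical. Warning when fewer than the given number of days remain is an assumption about the exact comparison.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def cert_expiry_warning(not_after: datetime, now: datetime,
                        warndays: int) -> Optional[str]:
    """Warn when a certificate expires within `warndays` days (sketch)."""
    remaining = not_after - now
    if remaining < timedelta(0):
        return "certificate has expired"
    if remaining < timedelta(days=warndays):  # assumed comparison
        return f"certificate expires in {remaining.days} days"
    return None


now = datetime(2014, 1, 1, tzinfo=timezone.utc)
print(cert_expiry_warning(now + timedelta(days=10), now, warndays=30))
# certificate expires in 10 days
print(cert_expiry_warning(now + timedelta(days=90), now, warndays=30))
# None
```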

   [HtmlSyntaxCheck]
       Check the syntax of HTML pages with the online W3C HTML validator.
       See http://validator.w3.org/docs/api.html.

   [HttpHeaderInfo]
       Print HTTP headers in URL info.

       prefixes=prefix1[,prefix2]...
              List of comma-separated header prefixes, for example "X-" to
              display all HTTP headers that start with "X-".
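Selecting headers by prefix can be sketched as a dictionary filter. `matching_headers` is hypothetical. It compares names case-insensitively because HTTP header names are case-insensitive, which may differ from the plugin's actual behavior.

```python
def matching_headers(headers: dict, prefixes: str) -> dict:
    """Keep headers whose name starts with one of the prefixes (sketch)."""
    wanted = [p.strip().lower() for p in prefixes.split(",") if p.strip()]
    return {name: value for name, value in headers.items()
            if any(name.lower().startswith(p) for p in wanted)}


headers = {"Content-Type": "text/html", "X-Powered-By": "PHP/5.4",
           "x-cache": "HIT", "Server": "nginx"}
print(matching_headers(headers, "X-"))
# {'X-Powered-By': 'PHP/5.4', 'x-cache': 'HIT'}
```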

   [CssSyntaxCheck]
       Check the syntax of CSS files with the online W3C CSS validator. See
       http://jigsaw.w3.org/css-validator/manual.html#expert.

   [VirusCheck]
       Checks the page content for virus infections with clamav. A local
       clamav daemon must be installed.

       clamavconf=filename
              Filename of the clamd.conf config file.

   [PdfParser]
       Parse PDF files for URLs to check. Needs the pdfminer Python package
       installed.

   [WordParser]
       Parse Word files for URLs to check. Needs the pywin32 Python
       extension installed.

WARNINGS
       The following warnings are recognized in the 'ignorewarnings' config
       file entry:

       file-missing-slash
              The file: URL is missing a trailing slash.

       file-system-path
              The file: path is not the same as the system-specific path.

       ftp-missing-slash
              The ftp: URL is missing a trailing slash.

       http-cookie-store-error
              An error occurred while storing a cookie.

       http-empty-content
              The URL had no content.

       mail-no-mx-host
              The mail MX host could not be found.

       nntp-no-newsgroup
              The NNTP newsgroup could not be found.

       nntp-no-server
              No NNTP server was found.

       url-content-size-zero
              The URL content size is zero.

       url-content-too-large
              The URL content size is too large.

       url-effective-url
              The effective URL is different from the original.

       url-error-getting-content
              Could not get the content of the URL.

       url-obfuscated-ip
              The IP is obfuscated.

       url-whitespace
              The URL contains leading or trailing whitespace.
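The following sketch applies an ignorewarnings value: it splits the list, flags names that do not appear in the list above (for example typos), and drops ignored warnings from a result list. All helper names are hypothetical, and how LinkChecker itself treats unrecognized names is not described on this page.

```python
RECOGNIZED_WARNINGS = {
    "file-missing-slash", "file-system-path", "ftp-missing-slash",
    "http-cookie-store-error", "http-empty-content", "mail-no-mx-host",
    "nntp-no-newsgroup", "nntp-no-server", "url-content-size-zero",
    "url-content-too-large", "url-effective-url", "url-error-getting-content",
    "url-obfuscated-ip", "url-whitespace",
}


def parse_ignorewarnings(value: str) -> set:
    """Split ignorewarnings=NAME[,NAME...] into a set of names."""
    return {n.strip() for n in value.split(",") if n.strip()}


def unrecognized(names: set) -> set:
    # Names missing from the WARNINGS list above, e.g. typos
    return names - RECOGNIZED_WARNINGS


def visible_warnings(found: list, ignored: set) -> list:
    # found holds (tag, message) pairs; ignored tags are dropped
    return [(tag, msg) for tag, msg in found if tag not in ignored]


ignored = parse_ignorewarnings("url-whitespace, http-empty-content")
found = [("url-whitespace", "The URL contains leading or trailing whitespace."),
         ("mail-no-mx-host", "The mail MX host could not be found.")]
print(visible_warnings(found, ignored))
# [('mail-no-mx-host', 'The mail MX host could not be found.')]
print(unrecognized(parse_ignorewarnings("url-whitespce")))  # {'url-whitespce'}
```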

SEE ALSO
       linkchecker(1)

AUTHOR
       Bastian Kleineidam <bastian.kleineidam@web.de>

COPYRIGHT
       Copyright © 2000-2014 Bastian Kleineidam

LinkChecker                       2007-11-30                linkcheckerrc(5)