LINKCHECKERRC(5)                  LinkChecker                  LINKCHECKERRC(5)

NAME
       linkcheckerrc - configuration file for LinkChecker

DESCRIPTION
       linkcheckerrc is the configuration file for LinkChecker. The file is
       written in an INI-style format. The default file location is
       $XDG_CONFIG_HOME/linkchecker/linkcheckerrc or, if that variable is
       not set, ~/.config/linkchecker/linkcheckerrc on Unix, and
       %HOMEPATH%\.config\linkchecker\linkcheckerrc on Windows systems.

SETTINGS
   checking
       cookiefile=filename
              Read a file with initial cookie data. The cookie data format is
              explained in linkchecker(1). Command line option: --cookiefile

       debugmemory=[0|1]
              Write memory allocation statistics to a file on exit; requires
              meliae. The default is not to write the file. Command line
              option: none

       localwebroot=STRING
              When checking absolute URLs inside local files, the given root
              directory is used as base URL. Note that the given directory
              must have URL syntax, so it must use a slash to join
              directories instead of a backslash, and it must end with a
              slash. Command line option: none

       nntpserver=STRING
              Specify an NNTP server for news: links. Default is the
              environment variable NNTP_SERVER. If no host is given, only
              the syntax of the link is checked. Command line option:
              --nntp-server

       recursionlevel=NUMBER
              Recursively check all links up to the given depth. A negative
              depth enables infinite recursion. Default depth is infinite.
              Command line option: --recursion-level

       threads=NUMBER
              Generate no more than the given number of threads. Default
              number of threads is 10. To disable threading specify a
              non-positive number. Command line option: --threads

       timeout=NUMBER
              Set the timeout for connection attempts in seconds. The default
              timeout is 60 seconds. Command line option: --timeout

       aborttimeout=NUMBER
              Time to wait for checks to finish after the user aborts the
              first time (with Ctrl-C or the abort button). The default abort
              timeout is 300 seconds. Command line option: none

       useragent=STRING
              Specify the User-Agent string to send to the HTTP server, for
              example "Mozilla/4.0". The default is "LinkChecker/X.Y" where
              X.Y is the current version of LinkChecker. Command line option:
              --user-agent

       sslverify=[0|1|filename]
              If set to zero, disables SSL certificate checking. If set to
              one (the default), enables SSL certificate checking with the
              provided CA certificate file. If a filename is specified, it
              will be used as the certificate file. Command line option: none

       maxrunseconds=NUMBER
              Stop checking new URLs after the given number of seconds. Same
              as if the user stops (by hitting Ctrl-C) after the given number
              of seconds. The default is not to stop until all URLs are
              checked. Command line option: none

       maxfilesizedownload=NUMBER
              Files larger than NUMBER bytes will be ignored, without
              downloading anything if accessed over http and an accurate
              Content-Length header was returned. No more than this amount
              of a file will be downloaded. The default is 5242880 (5 MB).
              Command line option: none

       maxfilesizeparse=NUMBER
              Files larger than NUMBER bytes will not be parsed for links.
              The default is 1048576 (1 MB). Command line option: none

       maxnumurls=NUMBER
              Maximum number of URLs to check. New URLs will not be queued
              after the given number of URLs is checked. The default is to
              queue and check all URLs. Command line option: none

       maxrequestspersecond=NUMBER
              Limit the maximum number of HTTP requests per second to one
              host. The average number of requests per second is
              approximately one third of the maximum. Values less than 1 and
              at least 0.001 can be used. To use values greater than 10, the
              HTTP server must return a "LinkChecker" response header. The
              default is 10. Command line option: none

       robotstxt=[0|1]
              When using http, fetch robots.txt, and confirm whether each URL
              should be accessed before checking. The default is to use
              robots.txt files. Command line option: --no-robots

       allowedschemes=NAME[,NAME...]
              Allowed URL schemes as comma-separated list. Command line
              option: none

       resultcachesize=NUMBER
              Set the result cache size. The default is 100 000 URLs.
              Command line option: none

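       As a rough illustration of how these options combine, a [checking]
       section might look like the sketch below. All values are hypothetical
       and chosen only to show the syntax; /var/www/ is an assumed document
       root, not a default.

       ```ini
       [checking]
       # Hypothetical values for a gentle, bounded crawl.
       threads=4
       timeout=30
       recursionlevel=2
       maxrequestspersecond=2
       # URL syntax: forward slashes and a trailing slash.
       localwebroot=/var/www/
       ```
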
   filtering
       ignore=REGEX (MULTILINE)
              Only check syntax of URLs matching the given regular
              expressions. Command line option: --ignore-url

       ignorewarnings=NAME[,NAME...]
              Ignore the comma-separated list of warnings. See WARNINGS for
              the list of supported warnings. Command line option: none

       internlinks=REGEX
              Regular expression to add more URLs recognized as internal
              links. Default is that URLs given on the command line are
              internal. Command line option: none

       nofollow=REGEX (MULTILINE)
              Check but do not recurse into URLs matching the given regular
              expressions. Command line option: --no-follow-url

       checkextern=[0|1]
              Check external links. Default is to check internal links only.
              Command line option: --check-extern

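       A [filtering] section using the multiline options could be sketched
       as follows. The host www.example.com and both paths are made up for
       illustration.

       ```ini
       [filtering]
       # Also check links that point outside the start URLs.
       checkextern=1
       # Only syntax-check these URLs (one regex per indented line).
       ignore=
         ^mailto:
         ^https://www\.example\.com/logout
       # Check, but do not recurse into, a hypothetical archive area.
       nofollow=
         ^https://www\.example\.com/archive/
       ```
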
   authentication
       entry=REGEX USER [PASS] (MULTILINE)
              Provide individual username/password pairs for different links.
              In addition to a single login page specified with loginurl,
              multiple FTP, HTTP (Basic Authentication) and telnet links are
              supported. Entries are a triple (URL regex, username, password)
              or a tuple (URL regex, username), where the entries are
              separated by whitespace. The password is optional and if
              missing it has to be entered at the command line. If the
              regular expression matches the checked URL, the given
              username/password pair is used for authentication. The command
              line options -u and -p match every link and therefore override
              the entries given here. The first match wins. Command line
              option: -u, -p

       loginurl=URL
              The URL of a login page to be visited before link checking. The
              page is expected to contain an HTML form to collect credentials
              and submit them to the address in its action attribute using an
              HTTP POST request. The name attributes of the input elements of
              the form and the values to be submitted need to be available
              (see entry for an explanation of username and password values).

       loginuserfield=STRING
              The name attribute of the username input element. Default:
              login.

       loginpasswordfield=STRING
              The name attribute of the password input element. Default:
              password.

       loginextrafields=NAME:VALUE (MULTILINE)
              Optionally the name attributes of any additional input elements
              and the values to populate them with. Note that these are
              submitted without checking whether matching input elements
              exist in the HTML form.

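       The following sketch shows one way the entry and login options might
       be written together. Hosts, user names, the password and the form
       field names are all hypothetical. Since the loginurl description
       refers to entry for the credentials, the first entry is written so
       that its regex also matches the login URL.

       ```ini
       [authentication]
       # Each indented line: URL regex, user name, optional password.
       entry=
         ^https://www\.example\.com/ alice hypothetical-password
         ^ftp://files\.example\.com/ bob
       loginurl=https://www.example.com/login
       # name attributes of the form's input elements (assumed here)
       loginuserfield=username
       loginpasswordfield=passwd
       loginextrafields=
         remember:1
       ```

       The second entry has no password, so it would have to be entered at
       the command line.
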
   output
       URL checking results

       fileoutput=TYPE[,TYPE...]
              Output to a file linkchecker-out.TYPE, or
              $XDG_DATA_HOME/linkchecker/failures for the failures output
              type. Valid file output types are text, html, sql, csv, gml,
              dot, xml, none or failures. Default is no file output. The
              various output types are documented below. Note that you can
              suppress all console output with log=none. Command line
              option: --file-output

       log=TYPE[/ENCODING]
              Specify the console output type as text, html, sql, csv, gml,
              dot, xml, none or failures. Default type is text. The various
              output types are documented below. The ENCODING specifies the
              output encoding, the default is that of your locale. Valid
              encodings are listed at
              https://docs.python.org/library/codecs.html#standard-encodings.
              Command line option: --output

       verbose=[0|1]
              If set, log all checked URLs once. Default is to log only
              errors and warnings. Command line option: --verbose

       warnings=[0|1]
              If set, log warnings. Default is to log warnings. Command line
              option: --no-warnings

       ignoreerrors=URL_REGEX [MESSAGE_REGEX] (MULTILINE)
              Specify regular expressions to ignore errors for matching URLs,
              one per line. A second regular expression can be specified per
              line to only ignore matching error messages per corresponding
              URL. If the second expression is omitted, all errors are
              ignored. In contrast to filtering, this happens after checking,
              which allows checking URLs despite certain expected and
              tolerable errors. Default is to not ignore any errors. Example:

                 [output]
                 ignoreerrors=
                   ^https://deprecated\.example\.com ^410 Gone
                   # ignore all errors (no second expression), also for syntax check:
                   ^mailto:.*@example\.com$

       Progress updates

       status=[0|1]
              Control printing URL checker status messages. Default is 1.
              Command line option: --no-status

       Application

       debug=STRING[,STRING...]
              Print debugging output for the given logger. Available debug
              loggers are cmdline, checking, cache, plugin and all. all is an
              alias for all available loggers. Command line option: --debug

       Quiet

       quiet=[0|1]
              If set, operate quietly. An alias for log=none that also hides
              application information messages. This is only useful with
              fileoutput, else no results will be output. Command line
              option: --quiet

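       A minimal sketch of an [output] section that combines console and
       file output is given below. The chosen types and encoding are
       examples only.

       ```ini
       [output]
       # Console output as plain text, encoded as UTF-8.
       log=text/utf-8
       # Additionally write linkchecker-out.html and linkchecker-out.csv.
       fileoutput=html,csv
       # Log every checked URL, not only errors and warnings.
       verbose=1
       # Suppress progress status messages.
       status=0
       ```
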
   text
       filename=STRING
              Specify output filename for text logging. Default filename is
              linkchecker-out.txt. Command line option: --file-output

       parts=STRING
              Comma-separated list of parts that have to be logged. See
              LOGGER PARTS below. Command line option: none

       encoding=STRING
              Valid encodings are listed in
              https://docs.python.org/library/codecs.html#standard-encodings.
              Default encoding is the system default locale encoding.

       color* Color settings for the various log parts, syntax is color or
              type;color. The type can be bold, light, blink, invert. The
              color can be default, black, red, green, yellow, blue, purple,
              cyan, white, Black, Red, Green, Yellow, Blue, Purple, Cyan or
              White. Command line option: none

       colorparent=STRING
              Set parent color. Default is white.

       colorurl=STRING
              Set URL color. Default is default.

       colorname=STRING
              Set name color. Default is default.

       colorreal=STRING
              Set real URL color. Default is cyan.

       colorbase=STRING
              Set base URL color. Default is purple.

       colorvalid=STRING
              Set valid color. Default is bold;green.

       colorinvalid=STRING
              Set invalid color. Default is bold;red.

       colorinfo=STRING
              Set info color. Default is default.

       colorwarning=STRING
              Set warning color. Default is bold;yellow.

       colordltime=STRING
              Set download time color. Default is default.

       colorreset=STRING
              Set reset color. Default is default.

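       To illustrate the color and type;color syntax, a [text] section
       might be written like this. The filename and the color choices are
       arbitrary examples.

       ```ini
       [text]
       filename=linkcheck-report.txt
       encoding=utf-8
       # type;color form
       colorinvalid=blink;red
       colorvalid=light;green
       # plain color form
       colorwarning=yellow
       ```
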
   gml
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   dot
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   csv
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

       separator=CHAR
              Set CSV separator. Default is a semicolon (;).

       quotechar=CHAR
              Set CSV quote character. Default is a double quote (").

       dialect=STRING
              Controls the output formatting. See
              https://docs.python.org/3/library/csv.html#csv.Dialect.
              Default is excel.

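       For instance, comma-separated output could be requested with a
       sketch like the following. The filename is hypothetical.

       ```ini
       [csv]
       filename=links.csv
       # Use a comma instead of the default semicolon.
       separator=,
       dialect=excel
       ```
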
   sql
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

       dbname=STRING
              Set database name to store into. Default is linksdb.

       separator=CHAR
              Set SQL command separator character. Default is a semicolon
              (;).

   html
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

       colorbackground=COLOR
              Set HTML background color. Default is #fff7e5.

       colorurl=COLOR
              Set HTML URL color. Default is #dcd5cf.

       colorborder=COLOR
              Set HTML border color. Default is #000000.

       colorlink=COLOR
              Set HTML link color. Default is #191c83.

       colorwarning=COLOR
              Set HTML warning color. Default is #e0954e.

       colorerror=COLOR
              Set HTML error color. Default is #db4930.

       colorok=COLOR
              Set HTML valid color. Default is #3ba557.

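       A sketch of an [html] section overriding a few colors, using the
       same #rrggbb notation as the defaults listed above. The filename and
       the color values are examples only.

       ```ini
       [html]
       filename=report.html
       colorbackground=#ffffff
       colorerror=#cc0000
       colorok=#007700
       ```
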
   failures
       filename=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   xml
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   gxml
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   sitemap
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

       priority=FLOAT
              A number between 0.0 and 1.0 determining the priority. The
              default priority for the first URL is 1.0, for all child URLs
              0.5.

       frequency=[always|hourly|daily|weekly|monthly|yearly|never]
              How frequently pages are changing. Default is daily.

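       A [sitemap] section might be sketched as below. The values are
       hypothetical, and how a single priority value interacts with the
       per-URL defaults is not spelled out above.

       ```ini
       [sitemap]
       filename=sitemap.xml
       priority=0.8
       frequency=weekly
       ```
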
LOGGER PARTS
       all    for all parts

       id     a unique ID for each log entry

       realurl
              the full URL link

       result valid or invalid, with messages

       extern 1 or 0, only reported by some logger types

       base   base href=...

       name   <a href=...>name</a> and <img alt="name">

       parenturl
              if any

       info   some additional info, e.g. FTP welcome messages

       warning
              warnings

       dltime download time

       checktime
              check time

       url    the original URL name, can be relative

       intro  the blurb at the beginning, "starting at ..."

       outro  the blurb at the end, "found x errors ..."

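       As an illustration, the parts option of a logger section could
       select a subset of the names listed above. The selection shown here
       is arbitrary.

       ```ini
       [xml]
       # Log only these parts, separated by commas.
       parts=url,parenturl,result,warning,checktime
       ```
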
MULTILINE
       Some option values can span multiple lines. Each line has to be
       indented for that to work. Lines starting with a hash (#) will be
       ignored, though they must still be indented.

          ignore=
            lconline
            bookmark
            # a comment
            ^mailto:

EXAMPLE
          [output]
          log=html

          [checking]
          threads=5

          [filtering]
          ignorewarnings=http-moved-permanent

PLUGINS
       All plugins have a separate section. If the section appears in the
       configuration file the plugin is enabled. Some plugins read extra
       options in their section.

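       Since the presence of a section is what enables a plugin, a section
       without any options should be sufficient for plugins that need none.
       A sketch, using two of the plugin names described below:

       ```ini
       # Empty sections: the plugins are enabled with their defaults.
       [AnchorCheck]

       [PdfParser]
       ```

       PdfParser additionally depends on an external Python package, as
       noted in its description.
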
   AnchorCheck
       Checks validity of HTML anchors. When checking local files, URLs with
       anchors that link to directories, e.g. "example/#anchor", are not
       supported. There is no such limitation when using http(s).

   LocationInfo
       Adds the country and if possible city name of the URL host as info.
       Needs GeoIP or pygeoip and a local country or city lookup DB
       installed.

   RegexCheck
       Define a regular expression which prints a warning if it matches any
       content of the checked link. This applies only to valid pages, since
       only their content can be retrieved.

       warningregex=REGEX
              Use this to check for pages that contain some form of error
              message, for example "This page has moved" or "Oracle
              Application error". REGEX should be unquoted.

              Note that multiple values can be combined in the regular
              expression, for example "(This page has moved|Oracle
              Application error)".

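       Written out as a configuration section, the combined expression
       mentioned above would look like this (note that the regex is not
       quoted):

       ```ini
       [RegexCheck]
       warningregex=(This page has moved|Oracle Application error)
       ```
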
   SslCertificateCheck
       Check SSL certificate expiration date. Only internal https: links
       will be checked. A domain will only be checked once to avoid
       duplicate warnings.

       sslcertwarndays=NUMBER
              Configures the expiration warning time in days.

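       A sketch with a hypothetical warning period of 30 days:

       ```ini
       [SslCertificateCheck]
       sslcertwarndays=30
       ```
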
   HtmlSyntaxCheck
       Check the syntax of HTML pages with the online W3C HTML validator. See
       https://validator.w3.org/docs/api.html.

       NOTE:
          The HtmlSyntaxCheck plugin is currently broken and is disabled.

   HttpHeaderInfo
       Print HTTP headers in URL info.

       prefixes=prefix1[,prefix2]...
              List of comma-separated header prefixes. For example, "X-"
              displays all HTTP headers that start with "X-".

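       For example, the following sketch would ask for headers starting
       with "X-" and, as an additional hypothetical prefix, "Content-":

       ```ini
       [HttpHeaderInfo]
       prefixes=X-,Content-
       ```
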
   CssSyntaxCheck
       Check the syntax of CSS files with the online W3C CSS validator. See
       https://jigsaw.w3.org/css-validator/manual.html#expert.

   VirusCheck
       Checks the page content for virus infections with clamav. A local
       clamav daemon must be installed.

       clamavconf=filename
              Filename of clamd.conf config file.

   PdfParser
       Parse PDF files for URLs to check. Needs the pdfminer.six Python
       package installed.

   WordParser
       Parse Word files for URLs to check. Needs the pywin32 Python
       extension installed.

   MarkdownCheck
       Parse Markdown files for URLs to check.

       filename_re=REGEX
              Regular expression matching the names of Markdown files.

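       A sketch of a MarkdownCheck section follows. The regular expression
       is a made-up example that matches names ending in .md or .markdown;
       it is not claimed to be the plugin's default.

       ```ini
       [MarkdownCheck]
       filename_re=.*\.(md|markdown)$
       ```
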
WARNINGS
       The following warnings are recognized in the 'ignorewarnings' config
       file entry:

       file-anchorcheck-directory
              A local directory with an anchor, not supported by AnchorCheck.

       file-missing-slash
              The file: URL is missing a trailing slash.

       file-system-path
              The file: path is not the same as the system specific path.

       ftp-missing-slash
              The ftp: URL is missing a trailing slash.

       http-cookie-store-error
              An error occurred while storing a cookie.

       http-empty-content
              The URL had no content.

       http-rate-limited
              Too many HTTP requests.

       mail-no-mx-host
              The mail MX host could not be found.

       nntp-no-newsgroup
              The NNTP newsgroup could not be found.

       nntp-no-server
              No NNTP server was found.

       url-content-size-zero
              The URL content size is zero.

       url-content-too-large
              The URL content size is too large.

       url-content-type-unparseable
              The URL content type is not parseable.

       url-effective-url
              The effective URL is different from the original.

       url-error-getting-content
              Could not get the content of the URL.

       url-obfuscated-ip
              The IP is obfuscated.

       url-whitespace
              The URL contains leading or trailing whitespace.

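       Several of these names can be combined in one comma-separated
       ignorewarnings entry. The selection below is arbitrary:

       ```ini
       [filtering]
       ignorewarnings=url-effective-url,url-whitespace,http-rate-limited
       ```
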
SEE ALSO
       linkchecker(1)

AUTHOR
       Bastian Kleineidam <bastian.kleineidam@web.de>

COPYRIGHT
       2000-2016 Bastian Kleineidam, 2010-2022 LinkChecker Authors

10.1.0.post162+g614e84b5        October 31, 2022        LINKCHECKERRC(5)