linkcheckerrc(5)              File Formats Manual             linkcheckerrc(5)

NAME

       linkcheckerrc - configuration file for LinkChecker

DESCRIPTION

       linkcheckerrc is the configuration file for LinkChecker. The file is
       written in an INI-style format.
       The default file location is ~/.linkchecker/linkcheckerrc on Unix and
       %HOMEPATH%\.linkchecker\linkcheckerrc on Windows systems.

SETTINGS

   [checking]
       cookiefile=filename
              Read a file with initial cookie data. The cookie data format is
              explained in linkchecker(1).
              Command line option: --cookiefile

       localwebroot=STRING
              When checking absolute URLs inside local files, the given root
              directory is used as the base URL.
              Note that the given directory must have URL syntax, so it must
              use a slash to join directories instead of a backslash. The
              given directory must also end with a slash.
              Command line option: none
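       A minimal sketch of a localwebroot entry, assuming a hypothetical
       document root /var/www/example/ on a Unix system. It uses URL syntax
       with forward slashes and the required trailing slash; the
       commented-out line is the backslash form that the note above rules
       out.

```ini
[checking]
# Hypothetical web root: forward slashes and a trailing slash.
localwebroot=/var/www/example/
# Not accepted: backslashes and no trailing slash.
#localwebroot=\var\www\example
```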

       nntpserver=STRING
              Specify an NNTP server for news: links. The default is the
              environment variable NNTP_SERVER. If no host is given, only the
              syntax of the link is checked.
              Command line option: --nntp-server

       recursionlevel=NUMBER
              Check all links recursively up to the given depth. A negative
              depth enables infinite recursion. The default depth is infinite.
              Command line option: --recursion-level

       threads=NUMBER
              Generate no more than the given number of threads. The default
              number of threads is 10. To disable threading, specify a
              non-positive number.
              Command line option: --threads

       timeout=NUMBER
              Set the timeout for connection attempts in seconds. The default
              timeout is 60 seconds.
              Command line option: --timeout

       aborttimeout=NUMBER
              Time to wait for checks to finish after the user aborts the
              first time (with Ctrl-C or the abort button). The default abort
              timeout is 300 seconds.
              Command line option: none

       useragent=STRING
              Specify the User-Agent string to send to the HTTP server, for
              example "Mozilla/4.0". The default is "LinkChecker/X.Y", where
              X.Y is the current version of LinkChecker.
              Command line option: --user-agent

       sslverify=[0|1|filename]
              If set to zero, SSL certificate checking is disabled. If set to
              one (the default), SSL certificate checking is enabled with the
              provided CA certificate file. If a filename is specified, it is
              used as the certificate file.
              Command line option: none
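       The three accepted forms of sslverify, sketched in one [checking]
       section. Only one of them should be active at a time; the CA file
       path is hypothetical and depends on the system.

```ini
[checking]
# Disable SSL certificate checking:
#sslverify=0
# Check certificates with the provided CA certificate file (the default):
#sslverify=1
# Check certificates against a custom CA file (hypothetical path):
sslverify=/etc/ssl/certs/example-ca.pem
```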

       maxrunseconds=NUMBER
              Stop checking new URLs after the given number of seconds. This
              is the same as if the user stopped (by hitting Ctrl-C) after the
              given number of seconds.
              The default is not to stop until all URLs are checked.
              Command line option: none

       maxnumurls=NUMBER
              Maximum number of URLs to check. New URLs will not be queued
              after the given number of URLs is checked.
              The default is to queue and check all URLs.
              Command line option: none

       maxrequestspersecond=NUMBER
              Limit the maximum number of requests per second to one host.

       allowedschemes=NAME[,NAME...]
              Allowed URL schemes as a comma-separated list.
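       A sketch combining the limit options above. The numbers and the
       scheme list are illustrative values, not recommendations.

```ini
[checking]
# Stop checking new URLs after one hour.
maxrunseconds=3600
# Stop queueing new URLs once 10000 URLs have been checked.
maxnumurls=10000
# At most 5 requests per second to any single host.
maxrequestspersecond=5
# Allowed URL schemes (illustrative list).
allowedschemes=http,https
```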

   [filtering]
       ignore=REGEX (MULTILINE)
              Only check the syntax of URLs matching the given regular
              expressions.
              Command line option: --ignore-url

       ignorewarnings=NAME[,NAME...]
              Ignore the comma-separated list of warnings. See WARNINGS for
              the list of supported warnings.
              Command line option: none

       internlinks=REGEX
              Regular expression to add more URLs recognized as internal
              links. By default, the URLs given on the command line are
              internal.
              Command line option: none

       nofollow=REGEX (MULTILINE)
              Check but do not recurse into URLs matching the given regular
              expressions.
              Command line option: --no-follow-url

       checkextern=[0|1]
              Check external links. The default is to check internal links
              only.
              Command line option: --check-extern
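       A sketch of a [filtering] section using the MULTILINE syntax
       described below. The host names and the /logout path are
       hypothetical; the warning names are taken from the WARNINGS list.

```ini
[filtering]
# Only check the syntax of mailto: links and of a hypothetical logout URL.
ignore=
  ^mailto:
  /logout
# Check, but do not recurse into, the (hypothetical) archive pages.
nofollow=
  ^https?://www\.example\.com/archive/
# Also treat a second (hypothetical) host as internal.
internlinks=^https?://docs\.example\.com/
# Check external links too.
checkextern=1
ignorewarnings=url-whitespace,http-empty-content
```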

   [authentication]
       entry=REGEX USER [PASS] (MULTILINE)
              Provide different user/password pairs for different link types.
              Entries are a triple (URL regex, username, password) or a tuple
              (URL regex, username), where the entries are separated by
              whitespace.
              The password is optional; if it is missing, it has to be
              entered on the command line.
              If the regular expression matches the checked URL, the given
              user/password pair is used for authentication. The command line
              options -u and -p match every link and therefore override the
              entries given here. The first match wins. At the moment,
              authentication is used for http[s] and ftp links.
              Command line option: -u, -p
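       A sketch of multiline authentication entries, one per line: a triple
       with a password and a tuple without one. The URLs, user names and
       password are hypothetical.

```ini
[authentication]
entry=
  ^https?://intranet\.example\.com/ alice s3cret
  # No password given: it has to be entered on the command line.
  ^ftp://ftp\.example\.org/ bob
```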

       loginurl=URL
              A login URL to be visited before checking. It also needs
              authentication data set for it.

       loginuserfield=STRING
              The name of the user CGI field. The default name is login.

       loginpasswordfield=STRING
              The name of the password CGI field. The default name is
              password.

       loginextrafields=NAME:VALUE (MULTILINE)
              Optional additional CGI name/value pairs. Note that the default
              values are submitted automatically.
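       A sketch of a form login, assuming a hypothetical login page at
       https://www.example.com/login whose form uses the field names
       username and passwd plus an extra field remember. As required above,
       the login URL is also covered by an authentication entry.

```ini
[authentication]
entry=
  ^https://www\.example\.com/ alice s3cret
loginurl=https://www.example.com/login
loginuserfield=username
loginpasswordfield=passwd
loginextrafields=
  remember:on
```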

   [output]
       debug=STRING[,STRING...]
              Print debugging output for the given modules. Available debug
              modules are cmdline, checking, cache, dns, thread, plugins and
              all. Specifying all is an alias for specifying all available
              loggers.
              Command line option: --debug

       fileoutput=TYPE[,TYPE...]
              Output to files linkchecker-out.TYPE, or to
              $HOME/.linkchecker/blacklist for blacklist output.
              Valid file output types are text, html, sql, csv, gml, dot, xml,
              none or blacklist. The default is no file output. The various
              output types are documented below. Note that you can suppress
              all console output with log=none.
              Command line option: --file-output

       log=TYPE[/ENCODING]
              Specify the output type as text, html, sql, csv, gml, dot, xml,
              none or blacklist. The default type is text. The various output
              types are documented below.
              The ENCODING specifies the output encoding; the default is that
              of your locale. Valid encodings are listed at
              http://docs.python.org/library/codecs.html#standard-encodings.
              Command line option: --output

       quiet=[0|1]
              If set, operate quietly. An alias for log=none. This is only
              useful with fileoutput.
              Command line option: --quiet

       status=[0|1]
              Control printing of check status messages. The default is 1.
              Command line option: --no-status

       verbose=[0|1]
              If set, log all checked URLs once. The default is to log only
              errors and warnings.
              Command line option: --verbose

       warnings=[0|1]
              If set, log warnings. The default is to log warnings.
              Command line option: --no-warnings
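       A sketch of an [output] section that writes an HTML and a CSV report
       (linkchecker-out.html and linkchecker-out.csv, following the
       fileoutput description above) and suppresses console output. The
       commented-out line shows the TYPE/ENCODING form of log.

```ini
[output]
fileoutput=html,csv
# No console output (same effect as quiet=1).
log=none
# Alternative: text console output encoded as UTF-8.
#log=text/utf-8
```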

   [text]
       filename=STRING
              Specify the output filename for text logging. The default
              filename is linkchecker-out.txt.
              Command line option: --file-output=

       parts=STRING
              Comma-separated list of parts that have to be logged. See
              LOGGER PARTS below.
              Command line option: none

       encoding=STRING
              Valid encodings are listed in
              http://docs.python.org/library/codecs.html#standard-encodings.
              The default encoding is iso-8859-15.

       color*
              Color settings for the various log parts. The syntax is either
              color or type;color. The type can be bold, light, blink or
              invert. The color can be default, black, red, green, yellow,
              blue, purple, cyan, white, Black, Red, Green, Yellow, Blue,
              Purple, Cyan or White.
              Command line option: none

       colorparent=STRING
              Set parent color. Default is white.

       colorurl=STRING
              Set URL color. Default is default.

       colorname=STRING
              Set name color. Default is default.

       colorreal=STRING
              Set real URL color. Default is cyan.

       colorbase=STRING
              Set base URL color. Default is purple.

       colorvalid=STRING
              Set valid color. Default is bold;green.

       colorinvalid=STRING
              Set invalid color. Default is bold;red.

       colorinfo=STRING
              Set info color. Default is default.

       colorwarning=STRING
              Set warning color. Default is bold;yellow.

       colordltime=STRING
              Set download time color. Default is default.

       colorreset=STRING
              Set reset color. Default is default.
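       A sketch of a [text] section that logs only some parts and overrides
       a few colors, using both the plain color and the type;color syntax.
       The chosen parts and colors are illustrative.

```ini
[text]
# Parts from the LOGGER PARTS list.
parts=url,result,warning
# Plain color.
colorurl=blue
# type;color syntax.
colorvalid=light;green
colorinvalid=blink;red
```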

   [gml]
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   [dot]
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   [csv]
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

       separator=CHAR
              Set CSV separator. Default is a comma (,).

       quotechar=CHAR
              Set CSV quote character. Default is a double quote (").
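       A sketch of [csv] settings for semicolon-separated output with
       single-quoted fields, which can be convenient where spreadsheets use
       the comma as decimal separator. The values are illustrative.

```ini
[csv]
separator=;
quotechar='
```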

   [sql]
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

       dbname=STRING
              Set database name to store into. Default is linksdb.

       separator=CHAR
              Set SQL command separator character. Default is a semicolon (;).

   [html]
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

       colorbackground=COLOR
              Set HTML background color. Default is #fff7e5.

       colorurl=COLOR
              Set HTML URL color. Default is #dcd5cf.

       colorborder=COLOR
              Set HTML border color. Default is #000000.

       colorlink=COLOR
              Set HTML link color. Default is #191c83.

       colorwarning=COLOR
              Set HTML warning color. Default is #e0954e.

       colorerror=COLOR
              Set HTML error color. Default is #db4930.

       colorok=COLOR
              Set HTML valid color. Default is #3ba557.

   [blacklist]
       filename=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   [xml]
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   [gxml]
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   [sitemap]
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

       priority=FLOAT
              A number between 0.0 and 1.0 determining the priority. The
              default priority for the first URL is 1.0, and 0.5 for all
              child URLs.

       frequency=[always|hourly|daily|weekly|monthly|yearly|never]
              How frequently the pages change.
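       A sketch of [sitemap] settings with illustrative values: priority
       must lie between 0.0 and 1.0, and frequency must be one of the
       listed keywords.

```ini
[sitemap]
priority=0.8
frequency=weekly
```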

LOGGER PARTS

        all       (for all parts)
        id        (a unique ID for each log entry)
        realurl   (the full URL link)
        result    (valid or invalid, with messages)
        extern    (1 or 0, only reported in some logger types)
        base      (base href=...)
        name      (<a href=...>name</a> and <img alt="name">)
        parenturl (if any)
        info      (some additional info, e.g. FTP welcome messages)
        warning   (warnings)
        dltime    (download time)
        checktime (check time)
        url       (the original URL name, can be relative)
        intro     (the blurb at the beginning, "starting at ...")
        outro     (the blurb at the end, "found x errors ...")

MULTILINE

       Some option values can span multiple lines. Each line has to be
       indented for that to work. Lines starting with a hash (#) will be
       ignored, though they must still be indented.

        ignore=
          lconline
          bookmark
          # a comment
          ^mailto:

EXAMPLE

        [output]
        log=html

        [checking]
        threads=5

        [filtering]
        ignorewarnings=http-moved-permanent

PLUGINS

       All plugins have a separate section. If the section appears in the
       configuration file, the plugin is enabled. Some plugins read extra
       options in their section.
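       A minimal sketch of enabling plugins: for plugins without options,
       such as the two below, the section header alone is enough.

```ini
# Enable checking of HTML anchors; no options needed.
[AnchorCheck]

# Enable the online W3C HTML syntax check; no options needed.
[HtmlSyntaxCheck]
```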

   [AnchorCheck]
       Checks validity of HTML anchors.

   [LocationInfo]
       Adds the country and, if possible, the city name of the URL host as
       info. Needs GeoIP or pygeoip and a local country or city lookup DB
       installed.

   [RegexCheck]
       Define a regular expression which prints a warning if it matches any
       content of the checked link. This applies only to valid pages, so
       that their content can be retrieved.

       warningregex=REGEX
              Use this to check for pages that contain some form of error
              message, for example "This page has moved" or "Oracle
              Application error". REGEX should be unquoted.

              Note that multiple values can be combined in the regular
              expression, for example
              "(This page has moved|Oracle Application error)".
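       A sketch of the combined expression from the note above, written
       unquoted as required:

```ini
[RegexCheck]
# Warn if a valid page contains either phrase.
warningregex=(This page has moved|Oracle Application error)
```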

   [SslCertificateCheck]
       Check SSL certificate expiration date. Only internal https: links
       will be checked. A domain will only be checked once to avoid
       duplicate warnings.

       sslcertwarndays=NUMBER
              Configures the expiration warning time in days.
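       A sketch with an illustrative threshold, assuming the value is the
       number of days before expiration at which warnings start:

```ini
[SslCertificateCheck]
# Assumed meaning: warn about certificates expiring within 30 days.
sslcertwarndays=30
```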

   [HtmlSyntaxCheck]
       Check the syntax of HTML pages with the online W3C HTML validator.
       See http://validator.w3.org/docs/api.html.

   [HttpHeaderInfo]
       Print HTTP headers in URL info.

       prefixes=prefix1[,prefix2]...
              Comma-separated list of header prefixes, for example X- to
              display all HTTP headers that start with "X-".
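       A sketch that shows all X- headers plus the Server header in the URL
       info; the prefix list is illustrative.

```ini
[HttpHeaderInfo]
prefixes=X-,Server
```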

   [CssSyntaxCheck]
       Check the syntax of CSS files with the online W3C CSS validator. See
       http://jigsaw.w3.org/css-validator/manual.html#expert.

   [VirusCheck]
       Checks the page content for virus infections with clamav. A local
       clamav daemon must be installed.

       clamavconf=filename
              Filename of the clamd.conf config file.
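       A sketch pointing the plugin at a clamd configuration file. The path
       is hypothetical; the actual location depends on the system.

```ini
[VirusCheck]
clamavconf=/etc/clamav/clamd.conf
```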

   [PdfParser]
       Parse PDF files for URLs to check. Needs the pdfminer Python package
       installed.

   [WordParser]
       Parse Word files for URLs to check. Needs the pywin32 Python
       extension installed.

WARNINGS

       The following warnings are recognized in the 'ignorewarnings' config
       file entry:

       file-missing-slash
              The file: URL is missing a trailing slash.

       file-system-path
              The file: path is not the same as the system-specific path.

       ftp-missing-slash
              The ftp: URL is missing a trailing slash.

       http-cookie-store-error
              An error occurred while storing a cookie.

       http-empty-content
              The URL had no content.

       mail-no-mx-host
              The mail MX host could not be found.

       nntp-no-newsgroup
              The NNTP newsgroup could not be found.

       nntp-no-server
              No NNTP server was found.

       url-content-size-zero
              The URL content size is zero.

       url-content-too-large
              The URL content size is too large.

       url-effective-url
              The effective URL is different from the original.

       url-error-getting-content
              Could not get the content of the URL.

       url-obfuscated-ip
              The IP is obfuscated.

       url-whitespace
              The URL contains leading or trailing whitespace.

SEE ALSO

       linkchecker(1)

AUTHOR

       Bastian Kleineidam <bastian.kleineidam@web.de>

       Copyright © 2000-2014 Bastian Kleineidam

LinkChecker                       2007-11-30                  linkcheckerrc(5)