LINKCHECKERRC(5)                  LinkChecker                 LINKCHECKERRC(5)

NAME

       linkcheckerrc - configuration file for LinkChecker

DESCRIPTION

       linkcheckerrc is the configuration file for LinkChecker. The file is
       written in an INI-style format. The default file location is
       $XDG_CONFIG_HOME/linkchecker/linkcheckerrc or else
       ~/.config/linkchecker/linkcheckerrc on Unix,
       %HOMEPATH%\.config\linkchecker\linkcheckerrc on Windows systems.

SETTINGS

   checking
       cookiefile=filename
              Read a file with initial cookie data. The cookie data format is
              explained in linkchecker(1). Command line option: --cookiefile

       debugmemory=[0|1]
              Write memory allocation statistics to a file on exit; requires
              meliae. The default is not to write the file. Command line
              option: none

       localwebroot=STRING
              When checking absolute URLs inside local files, the given root
              directory is used as the base URL. The given directory must
              have URL syntax, so it must use a slash to join directories
              instead of a backslash, and it must end with a slash. Command
              line option: none

       nntpserver=STRING
              Specify an NNTP server for news: links. Default is the
              environment variable NNTP_SERVER. If no host is given, only the
              syntax of the link is checked. Command line option:
              --nntp-server

       recursionlevel=NUMBER
              Recursively check all links up to the given depth. A negative
              depth enables infinite recursion. Default depth is infinite.
              Command line option: --recursion-level

       threads=NUMBER
              Generate no more than the given number of threads. Default
              number of threads is 10. To disable threading, specify a
              non-positive number. Command line option: --threads

       timeout=NUMBER
              Set the timeout for connection attempts in seconds. The default
              timeout is 60 seconds. Command line option: --timeout

       aborttimeout=NUMBER
              Time to wait for checks to finish after the user aborts the
              first time (with Ctrl-C or the abort button). The default abort
              timeout is 300 seconds. Command line option: none

       useragent=STRING
              Specify the User-Agent string to send to the HTTP server, for
              example "Mozilla/4.0". The default is "LinkChecker/X.Y" where
              X.Y is the current version of LinkChecker. Command line option:
              --user-agent

       sslverify=[0|1|filename]
              If set to zero, disables SSL certificate checking. If set to
              one (the default), enables SSL certificate checking with the
              provided CA certificate file. If a filename is specified, it
              will be used as the certificate file. Command line option: none

       maxrunseconds=NUMBER
              Stop checking new URLs after the given number of seconds. Same
              as if the user stops (by hitting Ctrl-C) after the given number
              of seconds. The default is not to stop until all URLs are
              checked. Command line option: none

       maxfilesizedownload=NUMBER
              Files larger than NUMBER bytes will be ignored, without
              downloading anything if accessed over http and an accurate
              Content-Length header was returned. No more than this amount of
              a file will be downloaded. The default is 5242880 (5 MB).
              Command line option: none

       maxfilesizeparse=NUMBER
              Files larger than NUMBER bytes will not be parsed for links.
              The default is 1048576 (1 MB). Command line option: none

       maxnumurls=NUMBER
              Maximum number of URLs to check. New URLs will not be queued
              after the given number of URLs is checked. The default is to
              queue and check all URLs. Command line option: none

       maxrequestspersecond=NUMBER
              Limit the maximum number of HTTP requests per second to one
              host. The average number of requests per second is
              approximately one third of the maximum. Values less than 1 and
              at least 0.001 can be used. To use values greater than 10, the
              HTTP server must return a "LinkChecker" response header. The
              default is 10. Command line option: none

       robotstxt=[0|1]
              When using http, fetch robots.txt, and confirm whether each URL
              should be accessed before checking. The default is to use
              robots.txt files. Command line option: --no-robots

       allowedschemes=NAME[,NAME...]
              Allowed URL schemes as a comma-separated list. Command line
              option: none

       resultcachesize=NUMBER
              Set the result cache size. The default is 100 000 URLs. Command
              line option: none
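
       As a minimal sketch of how the checking options combine, a section
       for a slow or rate-sensitive site might look like the following. All
       values are illustrative assumptions, not recommendations:

          [checking]
          threads=5
          timeout=30
          recursionlevel=2
          maxrequestspersecond=2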

   filtering
       ignore=REGEX (MULTILINE)
              Only check syntax of URLs matching the given regular
              expressions. Command line option: --ignore-url

       ignorewarnings=NAME[,NAME...]
              Ignore the comma-separated list of warnings. See WARNINGS for
              the list of supported warnings. Command line option: none

       internlinks=REGEX
              Regular expression to add more URLs recognized as internal
              links. Default is that URLs given on the command line are
              internal. Command line option: none

       nofollow=REGEX (MULTILINE)
              Check but do not recurse into URLs matching the given regular
              expressions. Command line option: --no-follow-url

       checkextern=[0|1]
              Check external links. Default is to check internal links only.
              Command line option: --check-extern
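
       For illustration, the filtering options could be combined as follows.
       The example.com patterns are placeholders; ignore and nofollow take
       one indented regular expression per line (see MULTILINE):

          [filtering]
          ignore=
            ^mailto:
            ^https://www\.example\.com/private/
          nofollow=
            ^https://www\.example\.com/archive/
          checkextern=1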

   authentication
       entry=REGEX USER [PASS] (MULTILINE)
              Provide individual username/password pairs for different links.
              In addition to a single login page specified with loginurl,
              multiple FTP, HTTP (Basic Authentication) and telnet links are
              supported. Entries are a triple (URL regex, username, password)
              or a pair (URL regex, username), where the fields are separated
              by whitespace. The password is optional; if it is missing, it
              has to be entered at the command line. If the regular
              expression matches the checked URL, the given username/password
              pair is used for authentication. The command line options -u
              and -p match every link and therefore override the entries
              given here. The first match wins. Command line option: -u, -p

       loginurl=URL
              The URL of a login page to be visited before link checking. The
              page is expected to contain an HTML form to collect credentials
              and submit them to the address in its action attribute using an
              HTTP POST request. The name attributes of the input elements of
              the form and the values to be submitted need to be available
              (see entry for an explanation of username and password values).

       loginuserfield=STRING
              The name attribute of the username input element. Default:
              login.

       loginpasswordfield=STRING
              The name attribute of the password input element. Default:
              password.

       loginextrafields=NAME:VALUE (MULTILINE)
              Optionally the name attributes of any additional input elements
              and the values to populate them with. Note that these are
              submitted without checking whether matching input elements
              exist in the HTML form.
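
       A sketch of a form login combined with per-URL credentials. The URLs,
       field names, user names and password are hypothetical, and the example
       assumes that the credentials for the login form are taken from the
       entry whose regular expression matches loginurl:

          [authentication]
          loginurl=https://www.example.com/login
          loginuserfield=user
          loginpasswordfield=pass
          loginextrafields=
            remember:1
          entry=
            ^https://www\.example\.com/ alice secret
            ^ftp://ftp\.example\.com/ anonymous

       The second entry omits the password, so it would have to be entered at
       the command line.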

   output
   URL checking results
       fileoutput=TYPE[,TYPE...]
              Output to a file linkchecker-out.TYPE, or
              $XDG_DATA_HOME/linkchecker/failures for the failures output
              type. Valid file output types are text, html, sql, csv, gml,
              dot, xml, none or failures. Default is no file output. The
              various output types are documented below. Note that you can
              suppress all console output with log=none. Command line option:
              --file-output

       log=TYPE[/ENCODING]
              Specify the console output type as text, html, sql, csv, gml,
              dot, xml, none or failures. Default type is text. The various
              output types are documented below. The ENCODING specifies the
              output encoding; the default is that of your locale. Valid
              encodings are listed at
              https://docs.python.org/library/codecs.html#standard-encodings.
              Command line option: --output

       verbose=[0|1]
              If set, log all checked URLs once. Default is to log only
              errors and warnings. Command line option: --verbose

       warnings=[0|1]
              If set, log warnings. Default is to log warnings. Command line
              option: --no-warnings

       ignoreerrors=URL_REGEX [MESSAGE_REGEX] (MULTILINE)
              Specify regular expressions to ignore errors for matching URLs,
              one per line. A second regular expression can be specified per
              line to only ignore matching error messages for the
              corresponding URL. If the second expression is omitted, all
              errors are ignored. In contrast to filtering, this happens
              after checking, which allows checking URLs despite certain
              expected and tolerable errors. Default is to not ignore any
              errors. Example:

          [output]
          ignoreerrors=
            ^https://deprecated\.example\.com ^410 Gone
            # ignore all errors (no second expression), also for syntax check:
            ^mailto:.*@example\.com$

   Progress updates
       status=[0|1]
              Control printing URL checker status messages. Default is 1.
              Command line option: --no-status

   Application
       debug=STRING[,STRING...]
              Print debugging output for the given logger. Available debug
              loggers are cmdline, checking, cache, plugin and all. all is an
              alias for all available loggers. Command line option: --debug

   Quiet
       quiet=[0|1]
              If set, operate quietly. An alias for log=none that also hides
              application information messages. This is only useful with
              fileoutput, else no results will be output. Command line
              option: --quiet
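
       As a sketch, the following writes HTML and CSV result files while
       keeping UTF-8 text output on the console and turning off status
       messages. The choice of types and encoding is an assumption made for
       illustration:

          [output]
          log=text/utf-8
          fileoutput=html,csv
          verbose=1
          status=0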

OUTPUT TYPES

   text
       filename=STRING
              Specify output filename for text logging. Default filename is
              linkchecker-out.txt. Command line option: --file-output

       parts=STRING
              Comma-separated list of parts that have to be logged. See
              LOGGER PARTS below. Command line option: none

       encoding=STRING
              Valid encodings are listed in
              https://docs.python.org/library/codecs.html#standard-encodings.
              Default encoding is the system default locale encoding.

       color* Color settings for the various log parts; syntax is color or
              type;color. The type can be bold, light, blink, invert. The
              color can be default, black, red, green, yellow, blue, purple,
              cyan, white, Black, Red, Green, Yellow, Blue, Purple, Cyan or
              White. Command line option: none

       colorparent=STRING
              Set parent color. Default is white.

       colorurl=STRING
              Set URL color. Default is default.

       colorname=STRING
              Set name color. Default is default.

       colorreal=STRING
              Set real URL color. Default is cyan.

       colorbase=STRING
              Set base URL color. Default is purple.

       colorvalid=STRING
              Set valid color. Default is bold;green.

       colorinvalid=STRING
              Set invalid color. Default is bold;red.

       colorinfo=STRING
              Set info color. Default is default.

       colorwarning=STRING
              Set warning color. Default is bold;yellow.

       colordltime=STRING
              Set download time color. Default is default.

       colorreset=STRING
              Set reset color. Default is default.
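
       For example, a [text] section that renames the output file, limits the
       logged parts and changes two colors might look like this. The filename
       and the chosen values are illustrative only:

          [text]
          filename=links.txt
          parts=realurl,result,warning
          colorvalid=green
          colorinvalid=blink;red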

   gml
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   dot
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   csv
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

       separator=CHAR
              Set CSV separator. Default is a semicolon (;).

       quotechar=CHAR
              Set CSV quote character. Default is a double quote (").

       dialect=STRING
              Controls the output formatting. See
              https://docs.python.org/3/library/csv.html#csv.Dialect. Default
              is excel.
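
       A sketch of a [csv] section that switches to comma-separated output
       with single-quote quoting. The filename is a placeholder and the
       values are assumptions chosen for illustration:

          [csv]
          filename=links.csv
          separator=,
          quotechar='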

   sql
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

       dbname=STRING
              Set database name to store into. Default is linksdb.

       separator=CHAR
              Set SQL command separator character. Default is a semicolon (;).

   html
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

       colorbackground=COLOR
              Set HTML background color. Default is #fff7e5.

       colorurl=COLOR
              Set HTML URL color. Default is #dcd5cf.

       colorborder=COLOR
              Set HTML border color. Default is #000000.

       colorlink=COLOR
              Set HTML link color. Default is #191c83.

       colorwarning=COLOR
              Set HTML warning color. Default is #e0954e.

       colorerror=COLOR
              Set HTML error color. Default is #db4930.

       colorok=COLOR
              Set HTML valid color. Default is #3ba557.

   failures
       filename=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   xml
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   gxml
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   sitemap
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

       priority=FLOAT
              A number between 0.0 and 1.0 determining the priority. The
              default priority for the first URL is 1.0, for all child URLs
              0.5.

       frequency=[always|hourly|daily|weekly|monthly|yearly|never]
              How frequently pages are changing. Default is daily.
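
       As an illustration, a [sitemap] section for a site whose pages change
       about once a week might look like this. The filename and values are
       assumptions:

          [sitemap]
          filename=sitemap.xml
          priority=0.8
          frequency=weekly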

LOGGER PARTS

       all    for all parts

       id     a unique ID for each log entry

       realurl
              the full url link

       result valid or invalid, with messages

       extern 1 or 0; only reported by some logger types

       base   base href=...

       name   <a href=...>name</a> and <img alt="name">

       parenturl
              if any

       info   some additional info, e.g. FTP welcome messages

       warning
              warnings

       dltime download time

       checktime
              check time

       url    the original url name, can be relative

       intro  the blurb at the beginning, "starting at ..."

       outro  the blurb at the end, "found x errors ..."

MULTILINE

       Some option values can span multiple lines. Each line has to be
       indented for that to work. Lines starting with a hash (#) will be
       ignored, though they must still be indented.

          ignore=
            lconline
            bookmark
            # a comment
            ^mailto:

EXAMPLE

          [output]
          log=html

          [checking]
          threads=5

          [filtering]
          ignorewarnings=http-moved-permanent

PLUGINS

       All plugins have a separate section. If the section appears in the
       configuration file, the plugin is enabled. Some plugins read extra
       options in their section.

   AnchorCheck
       Checks validity of HTML anchors. When checking local files, URLs with
       anchors that link to directories, e.g. "example/#anchor", are not
       supported. There is no such limitation when using http(s).

   LocationInfo
       Adds the country and, if possible, city name of the URL host as info.
       Needs GeoIP or pygeoip and a local country or city lookup DB installed.

   RegexCheck
       Define a regular expression which prints a warning if it matches any
       content of the checked link. This applies only to valid pages, whose
       content can be retrieved.

       warningregex=REGEX
              Use this to check for pages that contain some form of error
              message, for example "This page has moved" or "Oracle
              Application error". REGEX should be unquoted.

              Note that multiple values can be combined in the regular
              expression, for example "(This page has moved|Oracle
              Application error)".
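
       A sketch of a section that enables the plugin with the combined
       expression mentioned above, written unquoted:

          [RegexCheck]
          warningregex=(This page has moved|Oracle Application error)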

   SslCertificateCheck
       Check SSL certificate expiration date. Only internal https: links will
       be checked. A domain will only be checked once to avoid duplicate
       warnings.

       sslcertwarndays=NUMBER
              Configures the expiration warning time in days.
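
       For example, to enable the plugin with an expiration warning time of
       30 days (an arbitrary value picked for illustration):

          [SslCertificateCheck]
          sslcertwarndays=30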

   HtmlSyntaxCheck
       Check the syntax of HTML pages with the online W3C HTML validator. See
       https://validator.w3.org/docs/api.html.

       NOTE:
          The HtmlSyntaxCheck plugin is currently broken and is disabled.

   HttpHeaderInfo
       Print HTTP headers in URL info.

       prefixes=prefix1[,prefix2]...
              Comma-separated list of header prefixes, for example "X-" to
              display all HTTP headers that start with "X-".
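
       As a sketch, the following would display headers whose names start
       with "Server" or "X-". The prefixes are assumptions chosen for
       illustration:

          [HttpHeaderInfo]
          prefixes=Server,X-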

   CssSyntaxCheck
       Check the syntax of CSS stylesheets with the online W3C CSS validator.
       See https://jigsaw.w3.org/css-validator/manual.html#expert.

   VirusCheck
       Checks the page content for virus infections with clamav. A local
       clamav daemon must be installed.

       clamavconf=filename
              Filename of clamd.conf config file.

   PdfParser
       Parse PDF files for URLs to check. Needs the pdfminer.six Python
       package installed.

   WordParser
       Parse Word files for URLs to check. Needs the pywin32 Python extension
       installed.

   MarkdownCheck
       Parse Markdown files for URLs to check.

       filename_re=REGEX
              Regular expression matching the names of Markdown files.
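
       A sketch of a MarkdownCheck section that treats files ending in .md as
       Markdown. The pattern is an assumption for illustration; how the
       expression is anchored against the filename is not specified here, so
       it is written to cover the whole name:

          [MarkdownCheck]
          filename_re=.*\.md$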

WARNINGS

       The following warnings are recognized in the 'ignorewarnings' config
       file entry:

       file-anchorcheck-directory
              A local directory with an anchor, not supported by AnchorCheck.

       file-missing-slash
              The file: URL is missing a trailing slash.

       file-system-path
              The file: path is not the same as the system specific path.

       ftp-missing-slash
              The ftp: URL is missing a trailing slash.

       http-cookie-store-error
              An error occurred while storing a cookie.

       http-empty-content
              The URL had no content.

       http-rate-limited
              Too many HTTP requests.

       mail-no-mx-host
              The mail MX host could not be found.

       nntp-no-newsgroup
              The NNTP newsgroup could not be found.

       nntp-no-server
              No NNTP server was found.

       url-content-size-zero
              The URL content size is zero.

       url-content-too-large
              The URL content size is too large.

       url-content-type-unparseable
              The URL content type is not parseable.

       url-effective-url
              The effective URL is different from the original.

       url-error-getting-content
              Could not get the content of the URL.

       url-obfuscated-ip
              The IP is obfuscated.

       url-whitespace
              The URL contains leading or trailing whitespace.
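
       For example, to suppress two of the warnings listed above (the
       selection is arbitrary and only illustrates the comma-separated
       syntax):

          [filtering]
          ignorewarnings=url-whitespace,http-empty-content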

SEE ALSO

       linkchecker(1)

AUTHOR

       Bastian Kleineidam <bastian.kleineidam@web.de>

COPYRIGHT

       2000-2016 Bastian Kleineidam, 2010-2022 LinkChecker Authors

10.1.0.post162+g614e84b5       October 31, 2022               LINKCHECKERRC(5)