LINKCHECKERRC(5)                  LinkChecker                 LINKCHECKERRC(5)

NAME

       linkcheckerrc - configuration file for LinkChecker

DESCRIPTION

       linkcheckerrc is the configuration file for LinkChecker. The file is
       written in an INI-style format. The default file location is
       ~/.linkchecker/linkcheckerrc on Unix and
       %HOMEPATH%\.linkchecker\linkcheckerrc on Windows systems.

SETTINGS

   checking
       cookiefile=filename
              Read a file with initial cookie data. The cookie data format is
              explained in linkchecker(1). Command line option: --cookiefile

       debugmemory=[0|1]
              Write memory allocation statistics to a file on exit; requires
              meliae. The default is not to write the file. Command line
              option: none

       localwebroot=STRING
              When checking absolute URLs inside local files, the given root
              directory is used as the base URL. The directory must use URL
              syntax, so it must join directories with a slash instead of a
              backslash, and it must end with a slash. Command line option:
              none

       nntpserver=STRING
              Specify an NNTP server for news: links. The default is the
              environment variable NNTP_SERVER. If no host is given, only the
              syntax of the link is checked. Command line option:
              --nntp-server

       recursionlevel=NUMBER
              Recursively check all links up to the given depth. A negative
              depth enables infinite recursion. The default depth is
              infinite. Command line option: --recursion-level

       threads=NUMBER
              Generate no more than the given number of threads. The default
              number of threads is 10. To disable threading, specify a
              non-positive number. Command line option: --threads

       timeout=NUMBER
              Set the timeout for connection attempts in seconds. The default
              timeout is 60 seconds. Command line option: --timeout

       aborttimeout=NUMBER
              Time to wait for checks to finish after the user aborts the
              first time (with Ctrl-C or the abort button). The default abort
              timeout is 300 seconds. Command line option: none

       useragent=STRING
              Specify the User-Agent string to send to the HTTP server, for
              example "Mozilla/4.0". The default is "LinkChecker/X.Y", where
              X.Y is the current version of LinkChecker. Command line option:
              --user-agent

       sslverify=[0|1|filename]
              If set to zero, SSL certificate checking is disabled. If set to
              one (the default), SSL certificate checking is enabled with the
              provided CA certificate file. If a filename is specified, it is
              used as the certificate file. Command line option: none

       maxrunseconds=NUMBER
              Stop checking new URLs after the given number of seconds. This
              has the same effect as the user stopping the check (by hitting
              Ctrl-C) after the given number of seconds. The default is not
              to stop until all URLs are checked. Command line option: none

       maxfilesizedownload=NUMBER
              Files larger than NUMBER bytes are ignored, without downloading
              anything if accessed over HTTP and an accurate Content-Length
              header was returned. No more than this amount of a file will be
              downloaded. The default is 5242880 (5 MB). Command line option:
              none

       maxfilesizeparse=NUMBER
              Files larger than NUMBER bytes are not parsed for links. The
              default is 1048576 (1 MB). Command line option: none

       maxnumurls=NUMBER
              Maximum number of URLs to check. New URLs are not queued after
              the given number of URLs has been checked. The default is to
              queue and check all URLs. Command line option: none

       maxrequestspersecond=NUMBER
              Limit the maximum number of requests per second to one host.
              The default is 10. Command line option: none

       robotstxt=[0|1]
              When using HTTP, fetch robots.txt and confirm whether each URL
              may be accessed before checking it. The default is to use
              robots.txt files. Command line option: --no-robots

       allowedschemes=NAME[,NAME...]
              Allowed URL schemes as a comma-separated list. Command line
              option: none

       resultcachesize=NUMBER
              Set the result cache size. The default is 100 000 URLs. Command
              line option: none

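       As a rough sketch of how several of the checking options might be
       combined, the fragment below limits the load placed on checked hosts.
       All values, and the localwebroot path, are illustrative assumptions
       rather than recommendations.

```ini
[checking]
# hypothetical values; adjust them to the site being checked
threads=5
timeout=30
recursionlevel=2
maxrequestspersecond=5
maxnumurls=1000
# URL syntax: forward slashes and a trailing slash
localwebroot=/var/www/html/
```
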
   filtering
       ignore=REGEX (MULTILINE)
              Only check the syntax of URLs matching the given regular
              expressions. Command line option: --ignore-url

       ignorewarnings=NAME[,NAME...]
              Ignore the comma-separated list of warnings. See WARNINGS for
              the list of supported warnings. Command line option: none

       internlinks=REGEX
              Regular expression to add more URLs recognized as internal
              links. The default is that URLs given on the command line are
              internal. Command line option: none

       nofollow=REGEX (MULTILINE)
              Check but do not recurse into URLs matching the given regular
              expressions. Command line option: --no-follow-url

       checkextern=[0|1]
              Check external links. The default is to check internal links
              only. Command line option: --check-extern

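       A minimal sketch of a filtering section follows. The host name and
       paths are hypothetical; only the option names come from the list
       above.

```ini
[filtering]
# only syntax-check mailto: links and a hypothetical /logout path
ignore=
  ^mailto:
  /logout
# check, but do not recurse into, a hypothetical archive area
nofollow=
  ^https://example\.org/archive/
checkextern=1
```
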
   authentication
       entry=REGEX USER [PASS] (MULTILINE)
              Provide individual username/password pairs for different links.
              In addition to a single login page specified with loginurl,
              multiple FTP, HTTP (Basic Authentication) and telnet links are
              supported. Each entry is a triple (URL regex, username,
              password) or a pair (URL regex, username), with the fields
              separated by whitespace. The password is optional; if it is
              missing, it has to be entered at the command line. If the
              regular expression matches the checked URL, the given
              username/password pair is used for authentication. The command
              line options -u and -p match every link and therefore override
              the entries given here. The first match wins. Command line
              option: -u, -p

       loginurl=URL
              The URL of a login page to be visited before link checking. The
              page is expected to contain an HTML form that collects
              credentials and submits them to the address in its action
              attribute using an HTTP POST request. The name attributes of
              the form's input elements and the values to be submitted need
              to be available (see entry for an explanation of username and
              password values).

       loginuserfield=STRING
              The name attribute of the username input element. Default:
              login.

       loginpasswordfield=STRING
              The name attribute of the password input element. Default:
              password.

       loginextrafields=NAME:VALUE (MULTILINE)
              Optionally, the name attributes of any additional input
              elements and the values to populate them with. Note that these
              are submitted without checking whether matching input elements
              exist in the HTML form.

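       The following sketch shows one way the authentication options could
       fit together. The site, user names, password and form field names are
       all invented for illustration. The second entry omits the password,
       which per the description above would then have to be entered at the
       command line.

```ini
[authentication]
# hypothetical site and credentials
entry=
  ^https://example\.org/ alice secret
  ^ftp://ftp\.example\.org/ bob
loginurl=https://example.org/login
# hypothetical name attributes of the login form's input elements
loginuserfield=user
loginpasswordfield=pass
loginextrafields=
  remember:1
```
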
   output
   URL checking results
       fileoutput=TYPE[,TYPE...]
              Output to a file linkchecker-out.TYPE, or to
              $HOME/.linkchecker/failures for the failures output type. Valid
              file output types are text, html, sql, csv, gml, dot, xml, none
              or failures. The default is no file output. The various output
              types are documented below. Note that you can suppress all
              console output with log=none. Command line option:
              --file-output

       log=TYPE[/ENCODING]
              Specify the console output type as text, html, sql, csv, gml,
              dot, xml, none or failures. The default type is text. The
              various output types are documented below. ENCODING specifies
              the output encoding; the default is that of your locale. Valid
              encodings are listed at
              https://docs.python.org/library/codecs.html#standard-encodings.
              Command line option: --output

       verbose=[0|1]
              If set, log all checked URLs once. The default is to log only
              errors and warnings. Command line option: --verbose

       warnings=[0|1]
              If set, log warnings. The default is to log warnings. Command
              line option: --no-warnings

   Progress updates
       status=[0|1]
              Control printing of URL checker status messages. The default is
              1. Command line option: --no-status

   Application
       debug=STRING[,STRING...]
              Print debugging output for the given modules. Available debug
              modules are cmdline, checking, cache, dns, thread, plugins and
              all. Specifying all is an alias for specifying all available
              loggers. Command line option: --debug

   Quiet
       quiet=[0|1]
              If set, operate quietly. An alias for log=none that also hides
              application information messages. This is only useful with
              fileoutput; otherwise no results will be output. Command line
              option: --quiet

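       As an illustration of the output options, the fragment below keeps
       text output on the console and additionally writes two result files.
       The chosen types and the utf_8 encoding are example values only.

```ini
[output]
# console output as text, encoded as UTF-8
log=text/utf_8
# additionally write linkchecker-out.html and linkchecker-out.csv
fileoutput=html,csv
verbose=0
warnings=1
status=0
```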

OUTPUT TYPES

   text
       filename=STRING
              Specify the output filename for text logging. The default
              filename is linkchecker-out.txt. Command line option:
              --file-output

       parts=STRING
              Comma-separated list of parts that have to be logged. See
              LOGGER PARTS below. Command line option: none

       encoding=STRING
              Valid encodings are listed at
              https://docs.python.org/library/codecs.html#standard-encodings.
              The default encoding is the system default locale encoding.

       color* Color settings for the various log parts. The syntax is color
              or type;color. The type can be bold, light, blink or invert.
              The color can be default, black, red, green, yellow, blue,
              purple, cyan, white, Black, Red, Green, Yellow, Blue, Purple,
              Cyan or White. Command line option: none

       colorparent=STRING
              Set parent color. Default is white.

       colorurl=STRING
              Set URL color. Default is default.

       colorname=STRING
              Set name color. Default is default.

       colorreal=STRING
              Set real URL color. Default is cyan.

       colorbase=STRING
              Set base URL color. Default is purple.

       colorvalid=STRING
              Set valid color. Default is bold;green.

       colorinvalid=STRING
              Set invalid color. Default is bold;red.

       colorinfo=STRING
              Set info color. Default is default.

       colorwarning=STRING
              Set warning color. Default is bold;yellow.

       colordltime=STRING
              Set download time color. Default is default.

       colorreset=STRING
              Set reset color. Default is default.

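       A small sketch of a [text] section, assuming a custom report filename
       and a few color overrides. The filename and the color choices are
       hypothetical; the type;color syntax is the one described under color*.

```ini
[text]
# hypothetical filename and color choices
filename=linkcheck-report.txt
encoding=utf_8
colorvalid=bold;green
colorinvalid=bold;red
colorwarning=bold;yellow
colorurl=blue
```
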
   gml
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   dot
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   csv
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

       separator=CHAR
              Set CSV separator. Default is a semicolon (;).

       quotechar=CHAR
              Set CSV quote character. Default is a double quote (").

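       For instance, a [csv] section that switches the separator from the
       default semicolon to a comma might look like the sketch below. The
       filename is an assumption.

```ini
[csv]
# hypothetical filename; comma instead of the default semicolon
filename=links.csv
separator=,
quotechar="
```
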
   sql
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

       dbname=STRING
              Set database name to store into. Default is linksdb.

       separator=CHAR
              Set SQL command separator character. Default is a semicolon (;).

   html
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

       colorbackground=COLOR
              Set HTML background color. Default is #fff7e5.

       colorurl=COLOR
              Set HTML URL color. Default is #dcd5cf.

       colorborder=COLOR
              Set HTML border color. Default is #000000.

       colorlink=COLOR
              Set HTML link color. Default is #191c83.

       colorwarning=COLOR
              Set HTML warning color. Default is #e0954e.

       colorerror=COLOR
              Set HTML error color. Default is #db4930.

       colorok=COLOR
              Set HTML valid color. Default is #3ba557.

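       A sketch of an [html] section overriding some of the default colors.
       The filename and the hexadecimal color values are arbitrary examples.

```ini
[html]
# hypothetical filename and colors
filename=linkcheck-report.html
colorbackground=#ffffff
colorlink=#0000cc
colorerror=#cc0000
colorok=#008000
```
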
   failures
       filename=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   xml
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   gxml
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

   sitemap
       filename=STRING
              See [text] section above.

       parts=STRING
              See [text] section above.

       encoding=STRING
              See [text] section above.

       priority=FLOAT
              A number between 0.0 and 1.0 determining the priority. The
              default priority for the first URL is 1.0, and for all child
              URLs 0.5.

       frequency=[always|hourly|daily|weekly|monthly|yearly|never]
              How frequently pages are changing.

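       As an example, a [sitemap] section could set a priority and a change
       frequency as sketched below. The filename and both values are
       assumptions chosen for illustration.

```ini
[sitemap]
# hypothetical filename and values
filename=sitemap.xml
priority=0.8
frequency=weekly
```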

LOGGER PARTS

       all    for all parts

       id     a unique ID for each log entry

       realurl
              the full URL of the link

       result valid or invalid, with messages

       extern 1 or 0, only reported in some logger types

       base   base href=...

       name   <a href=...>name</a> and <img alt="name">

       parenturl
              if any

       info   some additional info, e.g. FTP welcome messages

       warning
              warnings

       dltime download time

       checktime
              check time

       url    the original URL name, can be relative

       intro  the blurb at the beginning, "starting at ..."

       outro  the blurb at the end, "found x errors ..."

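       The part names above are what the parts option of an output type
       section accepts. A minimal sketch, with an arbitrary selection of
       parts:

```ini
[csv]
# log only a subset of parts; the selection is an arbitrary example
parts=realurl,result,warning,parenturl
```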

MULTILINE

       Some option values can span multiple lines. Each line has to be
       indented for that to work. Lines starting with a hash (#) are
       ignored, though they must still be indented.

          ignore=
            lconline
            bookmark
            # a comment
            ^mailto:

EXAMPLE

          [output]
          log=html

          [checking]
          threads=5

          [filtering]
          ignorewarnings=http-moved-permanent

PLUGINS

       All plugins have a separate section. If the section appears in the
       configuration file, the plugin is enabled. Some plugins read extra
       options in their section.

   AnchorCheck
       Checks validity of HTML anchors.

       NOTE:
          The AnchorCheck plugin is currently broken and is disabled.

   LocationInfo
       Adds the country and, if possible, the city name of the URL host as
       info. Needs GeoIP or pygeoip and a local country or city lookup
       database installed.

   RegexCheck
       Define a regular expression which prints a warning if it matches any
       content of the checked link. This applies only to valid pages, because
       their content is needed for matching.

       warningregex=REGEX
              Use this to check for pages that contain some form of error
              message, for example "This page has moved" or "Oracle
              Application error". REGEX should be unquoted.

              Note that multiple values can be combined in the regular
              expression, for example "(This page has moved|Oracle
              Application error)".

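       Written out as a configuration section, the combined expression
       mentioned above would look roughly like this (note that the value is
       unquoted):

```ini
[RegexCheck]
warningregex=(This page has moved|Oracle Application error)
```
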
   SslCertificateCheck
       Check SSL certificate expiration date. Only internal https: links will
       be checked. A domain will only be checked once to avoid duplicate
       warnings.

       sslcertwarndays=NUMBER
              Configures the expiration warning time in days.

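       A minimal sketch enabling the plugin with a warning time of 30 days.
       The number is an arbitrary example value.

```ini
[SslCertificateCheck]
# hypothetical warning time in days
sslcertwarndays=30
```
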
   HtmlSyntaxCheck
       Check the syntax of HTML pages with the online W3C HTML validator. See
       https://validator.w3.org/docs/api.html.

       NOTE:
          The HtmlSyntaxCheck plugin is currently broken and is disabled.

   HttpHeaderInfo
       Print HTTP headers in URL info.

       prefixes=prefix1[,prefix2]...
              Comma-separated list of header prefixes, for example "X-" to
              display all HTTP headers that start with "X-".

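       For example, to display the Server header together with all headers
       starting with "X-", the section might be written as follows. The
       choice of prefixes is illustrative.

```ini
[HttpHeaderInfo]
prefixes=Server,X-
```
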
   CssSyntaxCheck
       Check the syntax of CSS stylesheets with the online W3C CSS validator.
       See https://jigsaw.w3.org/css-validator/manual.html#expert.

   VirusCheck
       Checks the page content for virus infections with clamav. A local
       clamav daemon must be installed.

       clamavconf=filename
              Filename of clamd.conf config file.

   PdfParser
       Parse PDF files for URLs to check. Needs the pdfminer Python package
       installed.

   WordParser
       Parse Word files for URLs to check. Needs the pywin32 Python extension
       installed.

   MarkdownCheck
       Parse Markdown files for URLs to check.

       filename_re=REGEX
              Regular expression matching the names of Markdown files.

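       A sketch of a MarkdownCheck section. The pattern is a hypothetical
       example that matches file names ending in .md or .markdown.

```ini
[MarkdownCheck]
# hypothetical pattern: names ending in .md or .markdown
filename_re=.*\.(md|markdown)$
```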

WARNINGS

       The following warnings are recognized in the 'ignorewarnings' config
       file entry:

       file-missing-slash
              The file: URL is missing a trailing slash.

       file-system-path
              The file: path is not the same as the system specific path.

       ftp-missing-slash
              The ftp: URL is missing a trailing slash.

       http-cookie-store-error
              An error occurred while storing a cookie.

       http-empty-content
              The URL had no content.

       mail-no-mx-host
              The mail MX host could not be found.

       nntp-no-newsgroup
              The NNTP newsgroup could not be found.

       nntp-no-server
              No NNTP server was found.

       url-content-size-zero
              The URL content size is zero.

       url-content-too-large
              The URL content size is too large.

       url-effective-url
              The effective URL is different from the original.

       url-error-getting-content
              Could not get the content of the URL.

       url-obfuscated-ip
              The IP is obfuscated.

       url-whitespace
              The URL contains leading or trailing whitespace.

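       Several of these names can be combined in one comma-separated
       ignorewarnings entry. The selection below is only an example.

```ini
[filtering]
ignorewarnings=url-whitespace,http-empty-content
```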

SEE ALSO

       linkchecker(1)

AUTHOR

       Bastian Kleineidam <bastian.kleineidam@web.de>

COPYRIGHT

       2000-2016 Bastian Kleineidam, 2010-2021 LinkChecker Authors

10.0.1.post124+ga12fcf04       December 21, 2021              LINKCHECKERRC(5)