linkchecker(1)              General Commands Manual             linkchecker(1)

NAME

       linkchecker - check HTML documents for broken links

SYNOPSIS

       linkchecker [options] [file-or-url]...

DESCRIPTION

       LinkChecker features recursive checking, multithreading, output in
       colored or normal text, HTML, SQL, CSV or a sitemap graph in GML or
       XML, support for HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet
       and local file links, restriction of link checking with regular
       expression filters for URLs, proxy support, username/password
       authorization for HTTP and FTP, robots.txt exclusion protocol support,
       i18n support, a command line interface and a (Fast)CGI web interface
       (requires an HTTP server).

EXAMPLES

       The most common use checks the given domain recursively, plus any URL
       pointing outside of the domain:
         linkchecker http://treasure.calvinsplayground.de/
       Beware that this checks the whole site, which can have several hundred
       thousand URLs. Use the -r option to restrict the recursion depth.
       Don't connect to mailto: hosts, only check their URL syntax. All other
       links are checked as usual:
         linkchecker --ignore-url=^mailto: www.mysite.org
       Checking a local HTML file on Unix:
         linkchecker ../bla.html
       Checking a local HTML file on Windows:
         linkchecker c:\temp\test.html
       You can skip the http:// URL part if the domain starts with www.:
         linkchecker www.myhomepage.de
       You can skip the ftp:// URL part if the domain starts with ftp.:
         linkchecker -r0 ftp.linux.org
       Generate a sitemap graph and convert it with the graphviz dot utility:
         linkchecker -odot -v www.myhomepage.de | dot -Tps > sitemap.ps
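
       When invocations like the ones above are started from a script, it
       can help to assemble the argument list programmatically instead of
       quoting a shell string by hand. The following Python sketch does this
       for the -r, --ignore-url and -v options only. The helper name
       build_command and its parameters are illustrative assumptions, not
       part of LinkChecker.

```python
import shlex


def build_command(target, depth=None, ignore_patterns=(), verbose=False):
    """Return an argument list for a linkchecker run (hypothetical helper)."""
    args = ["linkchecker"]
    if depth is not None:
        # -rNUMBER restricts the recursion depth.
        args.append(f"-r{depth}")
    for pattern in ignore_patterns:
        # --ignore-url may be given multiple times.
        args.append(f"--ignore-url={pattern}")
    if verbose:
        args.append("-v")
    args.append(target)
    return args


cmd = build_command("www.mysite.org", depth=2, ignore_patterns=["^mailto:"])
# shlex.join renders the list as a copy-and-paste friendly shell line.
print(shlex.join(cmd))
```

       Passing the list itself to subprocess.run() avoids shell quoting
       problems with characters such as ^ in regular expressions.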

OPTIONS

   General options
       -h, --help
              Help me! Print usage information for this program.

       -fFILENAME, --config=FILENAME
              Use FILENAME as configuration file. By default LinkChecker first
              searches /etc/linkchecker/linkcheckerrc and then
              ~/.linkchecker/linkcheckerrc.

       -I, --interactive
              Ask for a URL if none is given on the command line.

       -tNUMBER, --threads=NUMBER
              Generate no more than the given number of threads. The default
              number of threads is 10. To disable threading specify a
              non-positive number.

       --priority
              Run with normal thread scheduling priority. By default
              LinkChecker runs with low thread priority to be suitable as a
              background job.

       -V, --version
              Print version and exit.

       --allow-root
              Do not drop privileges when running as root user on Unix
              systems.

   Output options
       -v, --verbose
              Log all checked URLs. Default is to log only errors and
              warnings.

       --no-warnings
              Don't log warnings. Default is to log warnings.

       -WREGEX, --warning-regex=REGEX
              Define a regular expression which prints a warning if it matches
              any content of the checked link. This applies only to valid
              pages, since only their content can be retrieved.

              Use this to check for pages that contain some form of error, for
              example "This page has moved" or "Oracle Application Server
              error".

       --warning-size-bytes=NUMBER
              Print a warning if content size info is available and exceeds
              the given number of bytes.

       -q, --quiet
              Quiet operation, an alias for -o none. This is only useful with
              -F.

       -oTYPE[/ENCODING], --output=TYPE[/ENCODING]
              Specify output type as text, html, sql, csv, gml, dot, xml, none
              or blacklist. Default type is text. The various output types
              are documented below. The ENCODING specifies the output
              encoding; the default is that of your locale. Valid encodings
              are listed at
              http://docs.python.org/lib/standard-encodings.html.

       -FTYPE[/ENCODING][/FILENAME], --file-output=TYPE[/ENCODING][/FILENAME]
              Output to a file linkchecker-out.TYPE,
              $HOME/.linkchecker/blacklist for blacklist output, or FILENAME
              if specified. The ENCODING specifies the output encoding; the
              default is that of your locale. Valid encodings are listed at
              http://docs.python.org/lib/standard-encodings.html. The
              FILENAME and ENCODING parts of the none output type will be
              ignored; otherwise, if the file already exists, it will be
              overwritten. You can specify this option more than once. Valid
              file output types are text, html, sql, csv, gml, dot, xml, none
              or blacklist. Default is no file output. The various output
              types are documented below. Note that you can suppress all
              console output with the option -o none.

       --no-status
              Do not print check status messages.

       -DSTRING, --debug=STRING
              Print debugging output for the given logger. Available loggers
              are cmdline, checking, cache, gui, dns and all. Specifying all
              is an alias for specifying all available loggers. The option
              can be given multiple times to debug with more than one logger.
              For accurate results, threading will be disabled during debug
              runs.

       --trace
              Print tracing information.

       --profile
              Write profiling data into a file named linkchecker.prof in the
              current working directory. See also --viewprof.

       --viewprof
              Print out previously generated profiling data. See also
              --profile.

   Checking options
       -rNUMBER, --recursion-level=NUMBER
              Check recursively all links up to the given depth. A negative
              depth will enable infinite recursion. Default depth is infinite.

       --no-follow-url=REGEX
              Check but do not recurse into URLs matching the given regular
              expression. This option can be given multiple times.

       --ignore-url=REGEX
              Only check syntax of URLs matching the given regular expression.
              This option can be given multiple times.

       -C, --cookies
              Accept and send HTTP cookies according to RFC 2109. Only cookies
              which are sent back to the originating server are accepted.
              Sent and accepted cookies are provided as additional logging
              information.

       -a, --anchors
              Check HTTP anchor references. Default is not to check anchors.
              This option enables logging of the warning url-anchor-not-found.

       --no-anchor-caching
              Treat url#anchora and url#anchorb as equal on caching. This is
              the default browser behaviour, but it's not specified in the URI
              specification. Use with care since broken anchors are not
              guaranteed to be detected in this mode.

       -uSTRING, --user=STRING
              Try the given username for HTTP and FTP authorization. For FTP
              the default username is anonymous. For HTTP there is no default
              username. See also -p.

       -pSTRING, --password=STRING
              Try the given password for HTTP and FTP authorization. For FTP
              the default password is anonymous@. For HTTP there is no default
              password. See also -u.

       --timeout=NUMBER
              Set the timeout for connection attempts in seconds. The default
              timeout is 60 seconds.

       -PNUMBER, --pause=NUMBER
              Pause the given number of seconds between two subsequent
              connection requests to the same host. Default is no pause
              between requests.

       -NSTRING, --nntp-server=STRING
              Specify an NNTP server for news: links. Default is the
              environment variable NNTP_SERVER. If no host is given, only the
              syntax of the link is checked.

       --no-proxy-for=REGEX
              Contact hosts that match the given regular expression directly
              instead of going through a proxy. This option can be given
              multiple times.

OUTPUT TYPES

       Note that by default only errors and warnings are logged. You should
       use the --verbose option to get the complete URL list, especially when
       outputting a sitemap graph format.

       text   Standard text logger, logging URLs in keyword: argument fashion.

       html   Log URLs in keyword: argument fashion, formatted as HTML.
              Additionally has links to the referenced pages. Invalid URLs
              have HTML and CSS syntax check links appended.

       csv    Log check result in CSV format with one URL per line.

       gml    Log parent-child relations between linked URLs as a GML sitemap
              graph.

       dot    Log parent-child relations between linked URLs as a DOT sitemap
              graph.

       gxml   Log check result as a GraphXML sitemap graph.

       xml    Log check result as machine-readable XML.

       sql    Log check result as SQL script with INSERT commands. An example
              script to create the initial SQL table is included as
              create.sql.

       blacklist
              Suitable for cron jobs. Logs the check result into a file
              ~/.linkchecker/blacklist which only contains entries with
              invalid URLs and the number of times they have failed.

       none   Logs nothing. Suitable for debugging or checking the exit code.
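
       The -F option combines one of these output types with an optional
       encoding and filename in a single TYPE[/ENCODING][/FILENAME] argument.
       The Python sketch below shows one way such an argument could be split
       and the documented default filenames applied. It is an illustration
       under stated assumptions, not LinkChecker's own parser: the function
       name split_file_output is hypothetical, and the set of accepted types
       is the one listed for -F.

```python
import os

# Output types the -F option is documented to accept.
FILE_OUTPUT_TYPES = {
    "text", "html", "sql", "csv", "gml", "dot", "xml", "none", "blacklist",
}


def split_file_output(spec):
    """Split TYPE[/ENCODING][/FILENAME] into (type, encoding, filename)."""
    # Split at most twice so that FILENAME may itself contain slashes.
    parts = spec.split("/", 2)
    output_type = parts[0]
    if output_type not in FILE_OUTPUT_TYPES:
        raise ValueError(f"unknown output type: {output_type!r}")
    if output_type == "none":
        # ENCODING and FILENAME are ignored for the none type.
        return output_type, None, None
    # None stands for "use the encoding of the current locale".
    encoding = parts[1] if len(parts) > 1 and parts[1] else None
    if len(parts) > 2 and parts[2]:
        filename = parts[2]
    elif output_type == "blacklist":
        filename = os.path.join(
            os.path.expanduser("~"), ".linkchecker", "blacklist")
    else:
        filename = f"linkchecker-out.{output_type}"
    return output_type, encoding, filename


print(split_file_output("html"))
print(split_file_output("csv/utf-8//tmp/out.csv"))
```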

REGULAR EXPRESSIONS

       Only Python regular expressions are accepted by LinkChecker. See
       http://www.amk.ca/python/howto/regex/ for an introduction to regular
       expressions.

       The only addition is that a leading exclamation mark negates the
       regular expression.
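
       The negation rule can be pictured with a few lines of Python. This is
       a minimal sketch, not LinkChecker's implementation: the function names
       are hypothetical, and it assumes that patterns are applied with
       re.search() (a match anywhere in the URL) rather than as full matches.

```python
import re


def compile_pattern(pattern):
    """Return (compiled regex, negate flag) for a pattern string."""
    negate = pattern.startswith("!")
    if negate:
        # The exclamation mark is not part of the regular expression.
        pattern = pattern[1:]
    return re.compile(pattern), negate


def pattern_matches(pattern, url):
    """Tell whether url is selected by pattern, honouring a leading '!'."""
    regex, negate = compile_pattern(pattern)
    # Assumption: search semantics, so ^ is needed to anchor at the start.
    found = regex.search(url) is not None
    return found != negate


for candidate in ("mailto:joe@example.org", "http://example.org/"):
    print(candidate,
          pattern_matches("^mailto:", candidate),
          pattern_matches("!^mailto:", candidate))
```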

COOKIE FILES

       A cookie file contains standard RFC 805 header data with the following
       possible names:

       Scheme (optional)
              Sets the scheme the cookies are valid for; default scheme is
              http.

       Host (required)
              Sets the domain the cookies are valid for.

       Path (optional)
              Gives the path the cookies are valid for; default path is /.

       Set-cookie (optional)
              Set cookie name/value. Can be given more than once.

       Multiple entries are separated by a blank line. The example below will
       send two cookies to all URLs starting with http://imadoofus.org/hello/
       and one to all URLs starting with https://imaweevil.org/:

        Host: imadoofus.org
        Path: /hello
        Set-cookie: ID="smee"
        Set-cookie: spam="egg"

        Scheme: https
        Host: imaweevil.org
        Set-cookie: baggage="elitist"; comment="hologram"
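
       To make the entry structure concrete, the following Python sketch
       reads a cookie file in the format described above and fills in the
       documented defaults for Scheme and Path. It is only an illustration:
       the function name parse_cookie_file is hypothetical, and using the
       header parser from Python's email package is an assumption, not a
       statement about how LinkChecker reads the file.

```python
import re
from email import message_from_string


def parse_cookie_file(text):
    """Parse blank-line-separated cookie entries (hypothetical helper)."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Strip indentation so every line starts with its header name.
        headers = "\n".join(line.strip() for line in block.splitlines())
        msg = message_from_string(headers + "\n\n")
        if msg["Host"] is None:
            raise ValueError("cookie entry lacks the required Host header")
        entries.append({
            "scheme": msg.get("Scheme", "http"),  # documented default
            "host": msg["Host"],
            "path": msg.get("Path", "/"),  # documented default
            "cookies": msg.get_all("Set-cookie", []),
        })
    return entries


SAMPLE = """
Host: imadoofus.org
Path: /hello
Set-cookie: ID="smee"
Set-cookie: spam="egg"

Scheme: https
Host: imaweevil.org
Set-cookie: baggage="elitist"; comment="hologram"
"""

for entry in parse_cookie_file(SAMPLE):
    print(entry["scheme"], entry["host"], entry["path"], entry["cookies"])
```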

PROXY SUPPORT

       To use a proxy, set $http_proxy, $https_proxy, $ftp_proxy on Unix or
       Windows to the proxy URL. The URL should be of the form
       http://[user:pass@]host[:port], for example http://localhost:8080 or
       http://joe:test@proxy.domain. On a Mac use the Internet Config to
       select a proxy.
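
       Before exporting one of these variables, a proxy URL can be checked
       against the documented form with Python's standard urllib.parse
       module. The sketch below is an illustration only; the function name
       describe_proxy is hypothetical and does not exist in LinkChecker.

```python
from urllib.parse import urlsplit


def describe_proxy(proxy_url):
    """Split http://[user:pass@]host[:port] into its components."""
    parts = urlsplit(proxy_url)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError(f"not of the form http://host[:port]: {proxy_url!r}")
    return {
        "host": parts.hostname,
        "port": parts.port,  # None when no port is given
        "user": parts.username,  # None when no credentials are given
        "password": parts.password,
    }


for url in ("http://localhost:8080", "http://joe:test@proxy.domain"):
    print(describe_proxy(url))
```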

NOTES

       URLs on the command line starting with ftp. are treated like
       ftp://ftp., URLs starting with www. are treated like http://www.. You
       can also give local files as arguments.

       If you have your system configured to automatically establish a
       connection to the internet (e.g. with diald), it will connect when
       checking links not pointing to your local host. Use the -s and -i
       options to prevent this.

       JavaScript links are currently ignored.

       If your platform does not support threading, LinkChecker disables it
       automatically.

       You can supply multiple user/password pairs in a configuration file.

       When checking news: links the given NNTP host doesn't need to be the
       same as the host of the user browsing your pages.
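
       The shorthand rule for arguments starting with www. or ftp., given in
       the first note, amounts to a simple prefix test. A minimal Python
       sketch follows; the function name expand_shorthand is hypothetical,
       and treating the prefixes as case-sensitive is an assumption.

```python
def expand_shorthand(argument):
    """Add the implied URL scheme to a www. or ftp. argument."""
    if argument.startswith("www."):
        return "http://" + argument
    if argument.startswith("ftp."):
        return "ftp://" + argument
    # Anything else (full URLs, local files) is passed through unchanged.
    return argument


for arg in ("www.myhomepage.de", "ftp.linux.org", "../bla.html"):
    print(arg, "->", expand_shorthand(arg))
```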

ENVIRONMENT

       NNTP_SERVER - specifies default NNTP server
       http_proxy - specifies default HTTP proxy server
       ftp_proxy - specifies default FTP proxy server
       LC_MESSAGES, LANG, LANGUAGE - specify output language

RETURN VALUE

       The return value is non-zero when

       ·      invalid links were found or

       ·      link warnings were found and warnings are enabled or

       ·      a program error occurred.
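
       A calling script can rely on this return value alone, for example in
       a cron job. The Python sketch below runs a check with the console
       output suppressed through -o none and reports only success or
       failure. The function name links_ok and its base_command parameter
       are hypothetical. Because every listed condition is only described as
       non-zero, the sketch does not try to tell them apart.

```python
import subprocess


def links_ok(target, base_command=("linkchecker",)):
    """Run a check with console output suppressed and report success."""
    # "-o none" logs nothing, so only the exit status carries the result.
    completed = subprocess.run([*base_command, "-o", "none", target])
    return completed.returncode == 0


# Example use (requires linkchecker to be installed and on PATH):
#     if not links_ok("www.myhomepage.de"):
#         print("broken links, warnings or a program error")
```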

FILES

       /etc/linkchecker/linkcheckerrc, ~/.linkchecker/linkcheckerrc - default
       configuration files
       ~/.linkchecker/blacklist - default blacklist logger output filename
       linkchecker-out.TYPE - default logger file output name
       http://docs.python.org/lib/standard-encodings.html - valid output
       encodings
       http://www.amk.ca/python/howto/regex/ - regular expression
       documentation

AUTHOR

       Bastian Kleineidam <calvin@users.sourceforge.net>

COPYRIGHT

       Copyright © 2000-2007 Bastian Kleineidam

LinkChecker                       2001-03-10                    linkchecker(1)