WGET(1)                            GNU Wget                           WGET(1)

NAME
       Wget - The non-interactive network downloader.

SYNOPSIS
       wget [option]... [URL]...

DESCRIPTION
       GNU Wget is a free utility for non-interactive download of files
       from the Web.  It supports HTTP, HTTPS, and FTP protocols, as well
       as retrieval through HTTP proxies.

       Wget is non-interactive, meaning that it can work in the background,
       while the user is not logged on.  This allows you to start a
       retrieval and disconnect from the system, letting Wget finish the
       work.  By contrast, most Web browsers require the user's constant
       presence, which can be a great hindrance when transferring a lot of
       data.

       Wget can follow links in HTML and XHTML pages and create local
       versions of remote web sites, fully recreating the directory
       structure of the original site.  This is sometimes referred to as
       ``recursive downloading.''  While doing that, Wget respects the
       Robot Exclusion Standard (/robots.txt).  Wget can be instructed to
       convert the links in downloaded HTML files to the local files for
       offline viewing.

       Wget has been designed for robustness over slow or unstable network
       connections; if a download fails due to a network problem, it will
       keep retrying until the whole file has been retrieved.  If the
       server supports regetting, it will instruct the server to continue
       the download from where it left off.

OPTIONS
   Option Syntax
       Since Wget uses GNU getopt to process command-line arguments, every
       option has a long form along with the short one.  Long options are
       more convenient to remember, but take time to type.  You may freely
       mix different option styles, or specify options after the command-
       line arguments.  Thus you may write:

               wget -r --tries=10 http://fly.srk.fer.hr/ -o log

       The space between the option accepting an argument and the argument
       may be omitted.  Instead of -o log you can write -olog.

       You may put several options that do not require arguments together,
       like:

               wget -drc <URL>

       This is a complete equivalent of:

               wget -d -r -c <URL>

       Since the options can be specified after the arguments, you may
       terminate them with --.  So the following will try to download URL
       -x, reporting failure to log:

               wget -o log -- -x

       The options that accept comma-separated lists all respect the
       convention that specifying an empty list clears its value.  This can
       be useful to clear the .wgetrc settings.  For instance, if your
       .wgetrc sets "exclude_directories" to /cgi-bin, the following
       example will first reset it, and then set it to exclude /~nobody and
       /~somebody.  You can also clear the lists in .wgetrc.

               wget -X '' -X /~nobody,/~somebody

       Most options that do not accept arguments are boolean options, so
       named because their state can be captured with a yes-or-no
       (``boolean'') variable.  For example, --follow-ftp tells Wget to
       follow FTP links from HTML files and, on the other hand, --no-glob
       tells it not to perform file globbing on FTP URLs.  A boolean option
       is either affirmative or negative (beginning with --no).  All such
       options share several properties.

       Unless stated otherwise, it is assumed that the default behavior is
       the opposite of what the option accomplishes.  For example, the
       documented existence of --follow-ftp assumes that the default is to
       not follow FTP links from HTML pages.

       Affirmative options can be negated by prepending --no- to the option
       name; negative options can be negated by omitting the --no- prefix.
       This might seem superfluous---if the default for an affirmative
       option is to not do something, then why provide a way to explicitly
       turn it off?  But the startup file may in fact change the default.
       For instance, using "follow_ftp = on" in .wgetrc makes Wget follow
       FTP links by default, and using --no-follow-ftp is the only way to
       restore the factory default from the command line.
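
       For example, with such a .wgetrc in place (the scenario and URL are
       illustrative), the following restores the factory behavior of not
       following FTP links for a single run:

               wget --no-follow-ftp -r http://fly.srk.fer.hr/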

   Basic Startup Options
       -V
       --version
           Display the version of Wget.

       -h
       --help
           Print a help message describing all of Wget's command-line
           options.

       -b
       --background
           Go to background immediately after startup.  If no output file
           is specified via the -o option, output is redirected to
           wget-log.

       -e command
       --execute command
           Execute command as if it were a part of .wgetrc.  A command thus
           invoked will be executed after the commands in .wgetrc, thus
           taking precedence over them.  If you need to specify more than
           one wgetrc command, use multiple instances of -e.
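
           For instance, to apply a wgetrc command for a single run (here
           the standard "robots" command, which controls robots.txt
           processing; the URL is illustrative):

               wget -e robots=off -r http://fly.srk.fer.hr/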

   Logging and Input File Options
       -o logfile
       --output-file=logfile
           Log all messages to logfile.  The messages are normally reported
           to standard error.

       -a logfile
       --append-output=logfile
           Append to logfile.  This is the same as -o, only it appends to
           logfile instead of overwriting the old log file.  If logfile
           does not exist, a new file is created.

       -d
       --debug
           Turn on debug output, meaning various information important to
           the developers of Wget if it does not work properly.  Your
           system administrator may have chosen to compile Wget without
           debug support, in which case -d will not work.  Please note that
           compiling with debug support is always safe---Wget compiled with
           the debug support will not print any debug info unless requested
           with -d.

       -q
       --quiet
           Turn off Wget's output.

       -v
       --verbose
           Turn on verbose output, with all the available data.  The
           default output is verbose.

       -nv
       --no-verbose
           Turn off verbose without being completely quiet (use -q for
           that), which means that error messages and basic information
           still get printed.

       -i file
       --input-file=file
           Read URLs from file.  If - is specified as file, URLs are read
           from the standard input.  (Use ./- to read from a file literally
           named -.)

           If this function is used, no URLs need be present on the command
           line.  If there are URLs both on the command line and in an
           input file, those on the command line will be the first ones to
           be retrieved.  The file need not be an HTML document (but no
           harm if it is)---it is enough if the URLs are just listed
           sequentially.

           However, if you specify --force-html, the document will be
           regarded as html.  In that case you may have problems with
           relative links, which you can solve either by adding "<base
           href="url">" to the documents or by specifying --base=url on the
           command line.
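
           For example, assuming a local file links.html (an illustrative
           name) whose relative links should be resolved against a remote
           server:

               wget --force-html --base=http://fly.srk.fer.hr/ -i links.html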

       -F
       --force-html
           When input is read from a file, force it to be treated as an
           HTML file.  This enables you to retrieve relative links from
           existing HTML files on your local disk, by adding "<base
           href="url">" to HTML, or using the --base command-line option.

       -B URL
       --base=URL
           Prepends URL to relative links read from the file specified with
           the -i option.

   Download Options
       --bind-address=ADDRESS
           When making client TCP/IP connections, bind to ADDRESS on the
           local machine.  ADDRESS may be specified as a hostname or IP
           address.  This option can be useful if your machine is bound to
           multiple IPs.

       -t number
       --tries=number
           Set number of retries to number.  Specify 0 or inf for infinite
           retrying.  The default is to retry 20 times, with the exception
           of fatal errors like ``connection refused'' or ``not found''
           (404), which are not retried.

       -O file
       --output-document=file
           The documents will not be written to the appropriate files, but
           all will be concatenated together and written to file.  If - is
           used as file, documents will be printed to standard output,
           disabling link conversion.  (Use ./- to print to a file
           literally named -.)

           Note that a combination with -k is only well-defined for
           downloading a single document.
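
           For example (the URLs are illustrative), to concatenate two
           documents to standard output and save the result:

               wget -O - http://fly.srk.fer.hr/a.html \
                    http://fly.srk.fer.hr/b.html > both.html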

       -nc
       --no-clobber
           If a file is downloaded more than once in the same directory,
           Wget's behavior depends on a few options, including -nc.  In
           certain cases, the local file will be clobbered, or overwritten,
           upon repeated download.  In other cases it will be preserved.

           When running Wget without -N, -nc, or -r, downloading the same
           file in the same directory will result in the original copy of
           file being preserved and the second copy being named file.1.  If
           that file is downloaded yet again, the third copy will be named
           file.2, and so on.  When -nc is specified, this behavior is
           suppressed, and Wget will refuse to download newer copies of
           file.  Therefore, ``no-clobber'' is actually a misnomer in this
           mode---it's not clobbering that's prevented (as the numeric
           suffixes were already preventing clobbering), but rather the
           multiple version saving that's prevented.

           When running Wget with -r, but without -N or -nc, re-downloading
           a file will result in the new copy simply overwriting the old.
           Adding -nc will prevent this behavior, instead causing the
           original version to be preserved and any newer copies on the
           server to be ignored.

           When running Wget with -N, with or without -r, the decision as
           to whether or not to download a newer copy of a file depends on
           the local and remote timestamp and size of the file.  -nc may
           not be specified at the same time as -N.

           Note that when -nc is specified, files with the suffixes .html
           or .htm will be loaded from the local disk and parsed as if they
           had been retrieved from the Web.

       -c
       --continue
           Continue getting a partially-downloaded file.  This is useful
           when you want to finish up a download started by a previous
           instance of Wget, or by another program.  For instance:

               wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z

           If there is a file named ls-lR.Z in the current directory, Wget
           will assume that it is the first portion of the remote file, and
           will ask the server to continue the retrieval from an offset
           equal to the length of the local file.

           Note that you don't need to specify this option if you just want
           the current invocation of Wget to retry downloading a file
           should the connection be lost midway through.  This is the
           default behavior.  -c only affects resumption of downloads
           started prior to this invocation of Wget, and whose local files
           are still sitting around.

           Without -c, the previous example would just download the remote
           file to ls-lR.Z.1, leaving the truncated ls-lR.Z file alone.

           Beginning with Wget 1.7, if you use -c on a non-empty file, and
           it turns out that the server does not support continued
           downloading, Wget will refuse to start the download from
           scratch, which would effectively ruin existing contents.  If you
           really want the download to start from scratch, remove the file.

           Also beginning with Wget 1.7, if you use -c on a file which is
           of equal size as the one on the server, Wget will refuse to
           download the file and print an explanatory message.  The same
           happens when the file is smaller on the server than locally
           (presumably because it was changed on the server since your last
           download attempt)---because ``continuing'' is not meaningful, no
           download occurs.

           On the other side of the coin, while using -c, any file that's
           bigger on the server than locally will be considered an
           incomplete download and only "(length(remote) - length(local))"
           bytes will be downloaded and tacked onto the end of the local
           file.  This behavior can be desirable in certain cases---for
           instance, you can use wget -c to download just the new portion
           that's been appended to a data collection or log file.

           However, if the file is bigger on the server because it's been
           changed, as opposed to just appended to, you'll end up with a
           garbled file.  Wget has no way of verifying that the local file
           is really a valid prefix of the remote file.  You need to be
           especially careful of this when using -c in conjunction with
           -r, since every file will be considered as an "incomplete
           download" candidate.

           Another instance where you'll get a garbled file if you try to
           use -c is if you have a lame HTTP proxy that inserts a
           ``transfer interrupted'' string into the local file.  In the
           future a ``rollback'' option may be added to deal with this
           case.

           Note that -c only works with FTP servers and with HTTP servers
           that support the "Range" header.

       --progress=type
           Select the type of the progress indicator you wish to use.
           Legal indicators are ``dot'' and ``bar''.

           The ``bar'' indicator is used by default.  It draws an ASCII
           progress bar graphics (a.k.a ``thermometer'' display) indicating
           the status of retrieval.  If the output is not a TTY, the
           ``dot'' bar will be used by default.

           Use --progress=dot to switch to the ``dot'' display.  It traces
           the retrieval by printing dots on the screen, each dot
           representing a fixed amount of downloaded data.

           When using the dotted retrieval, you may also set the style by
           specifying the type as dot:style.  Different styles assign
           different meaning to one dot.  With the "default" style each dot
           represents 1K, there are ten dots in a cluster and 50 dots in a
           line.  The "binary" style has a more ``computer''-like
           orientation---8K dots, 16-dots clusters and 48 dots per line
           (which makes for 384K lines).  The "mega" style is suitable for
           downloading very large files---each dot represents 64K
           retrieved, there are eight dots in a cluster, and 48 dots on
           each line (so each line contains 3M).

           Note that you can set the default style using the "progress"
           command in .wgetrc.  That setting may be overridden from the
           command line.  The exception is that, when the output is not a
           TTY, the ``dot'' progress will be favored over ``bar''.  To
           force the bar output, use --progress=bar:force.
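
           For example (the URL is illustrative), to use the ``mega'' dot
           style for a large download, or to force the bar display even
           when logging to a file:

               wget --progress=dot:mega http://fly.srk.fer.hr/big.iso
               wget --progress=bar:force -o log http://fly.srk.fer.hr/big.iso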

       -N
       --timestamping
           Turn on time-stamping.

       -S
       --server-response
           Print the headers sent by HTTP servers and responses sent by FTP
           servers.

       --spider
           When invoked with this option, Wget will behave as a Web spider,
           which means that it will not download the pages, just check that
           they are there.  For example, you can use Wget to check your
           bookmarks:

               wget --spider --force-html -i bookmarks.html

           This feature needs much more work for Wget to get close to the
           functionality of real web spiders.

       -T seconds
       --timeout=seconds
           Set the network timeout to seconds seconds.  This is equivalent
           to specifying --dns-timeout, --connect-timeout, and
           --read-timeout, all at the same time.

           When interacting with the network, Wget can check for timeout
           and abort the operation if it takes too long.  This prevents
           anomalies like hanging reads and infinite connects.  The only
           timeout enabled by default is a 900-second read timeout.
           Setting a timeout to 0 disables it altogether.  Unless you know
           what you are doing, it is best not to change the default timeout
           settings.

           All timeout-related options accept decimal values, as well as
           subsecond values.  For example, 0.1 seconds is a legal (though
           unwise) choice of timeout.  Subsecond timeouts are useful for
           checking server response times or for testing network latency.

       --dns-timeout=seconds
           Set the DNS lookup timeout to seconds seconds.  DNS lookups that
           don't complete within the specified time will fail.  By default,
           there is no timeout on DNS lookups, other than that implemented
           by system libraries.

       --connect-timeout=seconds
           Set the connect timeout to seconds seconds.  TCP connections
           that take longer to establish will be aborted.  By default,
           there is no connect timeout, other than that implemented by
           system libraries.

       --read-timeout=seconds
           Set the read (and write) timeout to seconds seconds.  The
           ``time'' of this timeout refers to idle time: if, at any point
           in the download, no data is received for more than the specified
           number of seconds, reading fails and the download is restarted.
           This option does not directly affect the duration of the entire
           download.

           Of course, the remote server may choose to terminate the
           connection sooner than this option requires.  The default read
           timeout is 900 seconds.
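
           For example (the URL is illustrative), to set each timeout
           individually rather than via -T:

               wget --dns-timeout=5 --connect-timeout=10 \
                    --read-timeout=60 http://fly.srk.fer.hr/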

       --limit-rate=amount
           Limit the download speed to amount bytes per second.  Amount may
           be expressed in bytes, kilobytes with the k suffix, or megabytes
           with the m suffix.  For example, --limit-rate=20k will limit the
           retrieval rate to 20KB/s.  This is useful when, for whatever
           reason, you don't want Wget to consume the entire available
           bandwidth.

           This option allows the use of decimal numbers, usually in
           conjunction with power suffixes; for example, --limit-rate=2.5k
           is a legal value.

           Note that Wget implements the limiting by sleeping the
           appropriate amount of time after a network read that took less
           time than specified by the rate.  Eventually this strategy
           causes the TCP transfer to slow down to approximately the
           specified rate.  However, it may take some time for this balance
           to be achieved, so don't be surprised if limiting the rate
           doesn't work well with very small files.
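
           For instance (the URL is illustrative), to cap a large download
           at 20KB/s:

               wget --limit-rate=20k http://fly.srk.fer.hr/big.iso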

       -w seconds
       --wait=seconds
           Wait the specified number of seconds between the retrievals.
           Use of this option is recommended, as it lightens the server
           load by making the requests less frequent.  Instead of in
           seconds, the time can be specified in minutes using the "m"
           suffix, in hours using "h" suffix, or in days using "d" suffix.

           Specifying a large value for this option is useful if the
           network or the destination host is down, so that Wget can wait
           long enough to reasonably expect the network error to be fixed
           before the retry.

       --waitretry=seconds
           If you don't want Wget to wait between every retrieval, but only
           between retries of failed downloads, you can use this option.
           Wget will use linear backoff, waiting 1 second after the first
           failure on a given file, then waiting 2 seconds after the second
           failure on that file, up to the maximum number of seconds you
           specify.  Therefore, a value of 10 will actually make Wget wait
           up to (1 + 2 + ... + 10) = 55 seconds per file.

           Note that this option is turned on by default in the global
           wgetrc file.

       --random-wait
           Some web sites may perform log analysis to identify retrieval
           programs such as Wget by looking for statistically significant
           similarities in the time between requests.  This option causes
           the time between requests to vary between 0 and 2 * wait
           seconds, where wait was specified using the --wait option, in
           order to mask Wget's presence from such analysis.

           A recent article in a publication devoted to development on a
           popular consumer platform provided code to perform this analysis
           on the fly.  Its author suggested blocking at the class C
           address level to ensure automated retrieval programs were
           blocked despite changing DHCP-supplied addresses.

           The --random-wait option was inspired by this ill-advised
           recommendation to block many unrelated users from a web site due
           to the actions of one.

       --no-proxy
           Don't use proxies, even if the appropriate *_proxy environment
           variable is defined.

       -Q quota
       --quota=quota
           Specify download quota for automatic retrievals.  The value can
           be specified in bytes (default), kilobytes (with k suffix), or
           megabytes (with m suffix).

           Note that quota will never affect downloading a single file.  So
           if you specify wget -Q10k ftp://wuarchive.wustl.edu/ls-lR.gz,
           all of the ls-lR.gz will be downloaded.  The same goes even when
           several URLs are specified on the command-line.  However, quota
           is respected when retrieving either recursively, or from an
           input file.  Thus you may safely type wget -Q2m -i
           sites---download will be aborted when the quota is exceeded.

           Setting quota to 0 or to inf unlimits the download quota.

       --no-dns-cache
           Turn off caching of DNS lookups.  Normally, Wget remembers the
           IP addresses it looked up from DNS so it doesn't have to
           repeatedly contact the DNS server for the same (typically small)
           set of hosts it retrieves from.  This cache exists in memory
           only; a new Wget run will contact DNS again.

           However, it has been reported that in some situations it is not
           desirable to cache host names, even for the duration of a
           short-running application like Wget.  With this option Wget
           issues a new DNS lookup (more precisely, a new call to
           "gethostbyname" or "getaddrinfo") each time it makes a new
           connection.  Please note that this option will not affect
           caching that might be performed by the resolving library or by
           an external caching layer, such as NSCD.

           If you don't understand exactly what this option does, you
           probably won't need it.

       --restrict-file-names=mode
           Change which characters found in remote URLs may show up in
           local file names generated from those URLs.  Characters that are
           restricted by this option are escaped, i.e. replaced with %HH,
           where HH is the hexadecimal number that corresponds to the
           restricted character.

           By default, Wget escapes the characters that are not valid as
           part of file names on your operating system, as well as control
           characters that are typically unprintable.  This option is
           useful for changing these defaults, either because you are
           downloading to a non-native partition, or because you want to
           disable escaping of the control characters.

           When mode is set to ``unix'', Wget escapes the character / and
           the control characters in the ranges 0--31 and 128--159.  This
           is the default on Unix-like OS'es.

           When mode is set to ``windows'', Wget escapes the characters \,
           |, /, :, ?, ", *, <, >, and the control characters in the ranges
           0--31 and 128--159.  In addition to this, Wget in Windows mode
           uses + instead of : to separate host and port in local file
           names, and uses @ instead of ? to separate the query portion of
           the file name from the rest.  Therefore, a URL that would be
           saved as www.xemacs.org:4300/search.pl?input=blah in Unix mode
           would be saved as www.xemacs.org+4300/search.pl@input=blah in
           Windows mode.  This mode is the default on Windows.

           If you append ,nocontrol to the mode, as in unix,nocontrol,
           escaping of the control characters is also switched off.  You
           can use --restrict-file-names=nocontrol to turn off escaping of
           control characters without affecting the choice of the OS to use
           as file name restriction mode.
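
           For example, to apply the Windows restrictions and also disable
           control-character escaping (reusing the illustrative URL from
           above; the quotes protect the ? from the shell):

               wget --restrict-file-names=windows,nocontrol \
                    'http://www.xemacs.org:4300/search.pl?input=blah'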

       -4
       --inet4-only
       -6
       --inet6-only
           Force connecting to IPv4 or IPv6 addresses.  With --inet4-only
           or -4, Wget will only connect to IPv4 hosts, ignoring AAAA
           records in DNS, and refusing to connect to IPv6 addresses
           specified in URLs.  Conversely, with --inet6-only or -6, Wget
           will only connect to IPv6 hosts and ignore A records and IPv4
           addresses.

           Neither option should be needed normally.  By default, an
           IPv6-aware Wget will use the address family specified by the
           host's DNS record.  If the DNS responds with both IPv4 and IPv6
           addresses, Wget will try them in sequence until it finds one it
           can connect to.  (Also see "--prefer-family" option described
           below.)

           These options can be used to deliberately force the use of IPv4
           or IPv6 address families on dual family systems, usually to aid
           debugging or to deal with broken network configuration.  Only
           one of --inet6-only and --inet4-only may be specified at the
           same time.  Neither option is available in Wget compiled without
           IPv6 support.

       --prefer-family=IPv4/IPv6/none
           When given a choice of several addresses, connect to the
           addresses with specified address family first.  IPv4 addresses
           are preferred by default.

           This avoids spurious errors and connect attempts when accessing
           hosts that resolve to both IPv6 and IPv4 addresses from IPv4
           networks.  For example, www.kame.net resolves to
           2001:200:0:8002:203:47ff:fea5:3085 and to 203.178.141.194.  When
           the preferred family is "IPv4", the IPv4 address is used first;
           when the preferred family is "IPv6", the IPv6 address is used
           first; if the specified value is "none", the address order
           returned by DNS is used without change.

           Unlike -4 and -6, this option doesn't inhibit access to any
           address family, it only changes the order in which the addresses
           are accessed.  Also note that the reordering performed by this
           option is stable---it doesn't affect order of addresses of the
           same family.  That is, the relative order of all IPv4 addresses
           and of all IPv6 addresses remains intact in all cases.

       --retry-connrefused
           Consider ``connection refused'' a transient error and try again.
           Normally Wget gives up on a URL when it is unable to connect to
           the site because failure to connect is taken as a sign that the
           server is not running at all and that retries would not help.
           This option is for mirroring unreliable sites whose servers tend
           to disappear for short periods of time.

       --user=user
       --password=password
           Specify the username user and password password for both FTP
           and HTTP file retrieval.  These parameters can be overridden
           using the --ftp-user and --ftp-password options for FTP
           connections and the --http-user and --http-password options for
           HTTP connections.

   Directory Options
       -nd
       --no-directories
           Do not create a hierarchy of directories when retrieving
           recursively.  With this option turned on, all files will get
           saved to the current directory, without clobbering (if a name
           shows up more than once, the filenames will get extensions .n).

       -x
       --force-directories
           The opposite of -nd---create a hierarchy of directories, even if
           one would not have been created otherwise.  E.g. wget -x
           http://fly.srk.fer.hr/robots.txt will save the downloaded file
           to fly.srk.fer.hr/robots.txt.

       -nH
       --no-host-directories
           Disable generation of host-prefixed directories.  By default,
           invoking Wget with -r http://fly.srk.fer.hr/ will create a
           structure of directories beginning with fly.srk.fer.hr/.  This
           option disables such behavior.

       --protocol-directories
           Use the protocol name as a directory component of local file
           names.  For example, with this option, wget -r http://host will
           save to http/host/... rather than just to host/....

       --cut-dirs=number
           Ignore number directory components.  This is useful for getting
           a fine-grained control over the directory where recursive
           retrieval will be saved.

           Take, for example, the directory at
           ftp://ftp.xemacs.org/pub/xemacs/.  If you retrieve it with -r,
           it will be saved locally under ftp.xemacs.org/pub/xemacs/.
           While the -nH option can remove the ftp.xemacs.org/ part, you
           are still stuck with pub/xemacs.  This is where --cut-dirs comes
           in handy; it makes Wget not ``see'' number remote directory
           components.  Here are several examples of how --cut-dirs option
           works.

               No options        -> ftp.xemacs.org/pub/xemacs/
               -nH               -> pub/xemacs/
               -nH --cut-dirs=1  -> xemacs/
               -nH --cut-dirs=2  -> .

               --cut-dirs=1      -> ftp.xemacs.org/xemacs/
               ...

           If you just want to get rid of the directory structure, this
           option is similar to a combination of -nd and -P.  However,
           unlike -nd, --cut-dirs does not lose with subdirectories---for
           instance, with -nH --cut-dirs=1, a beta/ subdirectory will be
           placed to xemacs/beta, as one would expect.

       -P prefix
       --directory-prefix=prefix
           Set directory prefix to prefix.  The directory prefix is the
           directory where all other files and subdirectories will be saved
           to, i.e. the top of the retrieval tree.  The default is . (the
           current directory).
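
           For example, combining the directory options above, the
           following saves the xemacs tree under downloads/xemacs/ (the
           prefix name is illustrative):

               wget -r -nH --cut-dirs=1 -P downloads/ \
                    ftp://ftp.xemacs.org/pub/xemacs/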

   HTTP Options
       -E
       --html-extension
           If a file of type application/xhtml+xml or text/html is
           downloaded and the URL does not end with the regexp
           \.[Hh][Tt][Mm][Ll]?, this option will cause the suffix .html to
           be appended to the local filename.  This is useful, for
           instance, when you're mirroring a remote site that uses .asp
           pages, but you want the mirrored pages to be viewable on your
           stock Apache server.  Another good use for this is when you're
           downloading CGI-generated materials.  A URL like
           http://site.com/article.cgi?25 will be saved as
           article.cgi?25.html.

           Note that filenames changed in this way will be re-downloaded
           every time you re-mirror a site, because Wget can't tell that
           the local X.html file corresponds to remote URL X (since it
           doesn't yet know that the URL produces output of type text/html
           or application/xhtml+xml).  To prevent this re-downloading, you
           must use -k and -K so that the original version of the file will
           be saved as X.orig.

       --http-user=user
       --http-password=password
           Specify the username user and password password on an HTTP
           server.  According to the type of the challenge, Wget will
           encode them using either the "basic" (insecure) or the "digest"
           authentication scheme.

           Another way to specify username and password is in the URL
           itself.  Either method reveals your password to anyone who
           bothers to run "ps".  To prevent the passwords from being seen,
           store them in .wgetrc or .netrc, and make sure to protect those
           files from other users with "chmod".  If the passwords are
           really important, do not leave them lying in those files
           either---edit the files and delete them after Wget has started
           the download.

       --no-cache
           Disable server-side cache.  In this case, Wget will send the
           remote server an appropriate directive (Pragma: no-cache) to get
           the file from the remote service, rather than returning the
           cached version.  This is especially useful for retrieving and
           flushing out-of-date documents on proxy servers.

           Caching is allowed by default.

       --no-cookies
           Disable the use of cookies.  Cookies are a mechanism for
           maintaining server-side state.  The server sends the client a
           cookie using the "Set-Cookie" header, and the client responds
           with the same cookie upon further requests.  Since cookies allow
           the server owners to keep track of visitors and for sites to
           exchange this information, some consider them a breach of
           privacy.  The default is to use cookies; however, storing
           cookies is not on by default.

       --load-cookies file
           Load cookies from file before the first HTTP retrieval.  file is
           a textual file in the format originally used by Netscape's
           cookies.txt file.

           You will typically use this option when mirroring sites that
           require that you be logged in to access some or all of their
           content.  The login process typically works by the web server
           issuing an HTTP cookie upon receiving and verifying your
           credentials.  The cookie is then resent by the browser when
           accessing that part of the site, and so proves your identity.

           Mirroring such a site requires Wget to send the same cookies
           your browser sends when communicating with the site.  This is
           achieved by --load-cookies---simply point Wget to the location
           of the cookies.txt file, and it will send the same cookies your
           browser would send in the same situation.  Different browsers
           keep textual cookie files in different locations:

           Netscape 4.x.
               The cookies are in ~/.netscape/cookies.txt.

           Mozilla and Netscape 6.x.
               Mozilla's cookie file is also named cookies.txt, located
               somewhere under ~/.mozilla, in the directory of your
               profile.  The full path usually ends up looking somewhat
               like ~/.mozilla/default/some-weird-string/cookies.txt.

           Internet Explorer.
               You can produce a cookie file Wget can use by using the File
               menu, Import and Export, Export Cookies.  This has been
               tested with Internet Explorer 5; it is not guaranteed to
               work with earlier versions.

           Other browsers.
               If you are using a different browser to create your cookies,
               --load-cookies will only work if you can locate or produce a
               cookie file in the Netscape format that Wget expects.

           If you cannot use --load-cookies, there might still be an
           alternative.  If your browser supports a ``cookie manager'', you
           can use it to view the cookies used when accessing the site
           you're mirroring.  Write down the name and value of the cookie,
           and manually instruct Wget to send those cookies, bypassing the
           ``official'' cookie support:

               wget --no-cookies --header "Cookie: <name>=<value>"

       --save-cookies file
           Save cookies to file before exiting.  This will not save cookies
           that have expired or that have no expiry time (so-called
           ``session cookies''), but also see --keep-session-cookies.

       --keep-session-cookies
           When specified, causes --save-cookies to also save session
           cookies.  Session cookies are normally not saved because they
           are meant to be kept in memory and forgotten when you exit the
           browser.  Saving them is useful on sites that require you to log
           in or to visit the home page before you can access some pages.
           With this option, multiple Wget runs are considered a single
           browser session as far as the site is concerned.

           Since the cookie file format does not normally carry session
           cookies, Wget marks them with an expiry timestamp of 0.  Wget's
           --load-cookies recognizes those as session cookies, but it might
           confuse other browsers.  Also note that cookies so loaded will
           be treated as other session cookies, which means that if you
           want --save-cookies to preserve them again, you must use
           --keep-session-cookies again.

       --ignore-length
           Unfortunately, some HTTP servers (CGI programs, to be more
           precise) send out bogus "Content-Length" headers, which makes
           Wget go wild, as it thinks not all the document was retrieved.
           You can spot this syndrome if Wget retries getting the same
           document again and again, each time claiming that the (otherwise
           normal) connection has closed on the very same byte.

           With this option, Wget will ignore the "Content-Length"
           header---as if it never existed.

       --header=header-line
           Send header-line along with the rest of the headers in each HTTP
           request.  The supplied header is sent as-is, which means it must
           contain name and value separated by colon, and must not contain
           newlines.

           You may define more than one additional header by specifying
           --header more than once.

               wget --header='Accept-Charset: iso-8859-2' \
                    --header='Accept-Language: hr' \
                    http://fly.srk.fer.hr/

           Specification of an empty string as the header value will clear
           all previous user-defined headers.

           As of Wget 1.10, this option can be used to override headers
           otherwise generated automatically.  This example instructs Wget
           to connect to localhost, but to specify foo.bar in the "Host"
           header:

               wget --header="Host: foo.bar" http://localhost/

           In versions of Wget prior to 1.10 such use of --header caused
           sending of duplicate headers.

       --proxy-user=user
       --proxy-password=password
           Specify the username user and password password for
           authentication on a proxy server.  Wget will encode them using
           the "basic" authentication scheme.

           Security considerations similar to those with --http-password
           pertain here as well.

       --referer=url
           Include `Referer: url' header in HTTP request.  Useful for
           retrieving documents with server-side processing that assume
           they are always being retrieved by interactive web browsers and
           only come out properly when Referer is set to one of the pages
           that point to them.

       --save-headers
           Save the headers sent by the HTTP server to the file, preceding
           the actual contents, with an empty line as the separator.

       -U agent-string
       --user-agent=agent-string
           Identify as agent-string to the HTTP server.

           The HTTP protocol allows the clients to identify themselves
           using a "User-Agent" header field.  This enables distinguishing
           the WWW software, usually for statistical purposes or for
           tracing of protocol violations.  Wget normally identifies as
           Wget/version, version being the current version number of Wget.

           However, some sites have been known to impose the policy of
           tailoring the output according to the "User-Agent"-supplied
           information.  While this is not such a bad idea in theory, it
           has been abused by servers denying information to clients other
           than (historically) Netscape or, more frequently, Microsoft
           Internet Explorer.  This option allows you to change the
           "User-Agent" line issued by Wget.  Use of this option is
           discouraged, unless you really know what you are doing.

           Specifying empty user agent with --user-agent="" instructs Wget
           not to send the "User-Agent" header in HTTP requests.
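
           For example (the agent string and URL are illustrative), to
           masquerade as a browser:

               wget --user-agent="Mozilla/4.0 (compatible)" \
                    http://fly.srk.fer.hr/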

       --post-data=string
       --post-file=file
           Use POST as the method for all HTTP requests and send the
           specified data in the request body.  "--post-data" sends string
           as data, whereas "--post-file" sends the contents of file.
           Other than that, they work in exactly the same way.

           Please be aware that Wget needs to know the size of the POST
           data in advance.  Therefore the argument to "--post-file" must
           be a regular file; specifying a FIFO or something like
           /dev/stdin won't work.  It's not quite clear how to work around
           this limitation inherent in HTTP/1.0.  Although HTTP/1.1
           introduces chunked transfer that doesn't require knowing the
           request length in advance, a client can't use chunked unless it
           knows it's talking to an HTTP/1.1 server.  And it can't know
           that until it receives a response, which in turn requires the
           request to have been completed -- a chicken-and-egg problem.

           Note: if Wget is redirected after the POST request is completed,
           it will not send the POST data to the redirected URL.  This is
           because URLs that process POST often respond with a redirection
           to a regular page, which does not desire or accept POST.  It is
           not completely clear that this behavior is optimal; if it
           doesn't work out, it might be changed in the future.

           This example shows how to log to a server using POST and then
           proceed to download the desired pages, presumably only
           accessible to authorized users:

               # Log in to the server.  This can be done only once.
               wget --save-cookies cookies.txt \
                    --post-data 'user=foo&password=bar' \
                    http://server.com/auth.php

               # Now grab the page or pages we care about.
               wget --load-cookies cookies.txt \
                    -p http://server.com/interesting/article.php

           If the server is using session cookies to track user
           authentication, the above will not work because --save-cookies
           will not save them (and neither will browsers) and the
           cookies.txt file will be empty.  In that case use
           --keep-session-cookies along with --save-cookies to force saving
           of session cookies.

   HTTPS (SSL/TLS) Options
       To support encrypted HTTP (HTTPS) downloads, Wget must be compiled
       with an external SSL library, currently OpenSSL.  If Wget is
       compiled without SSL support, none of these options are available.

       --secure-protocol=protocol
           Choose the secure protocol to be used.  Legal values are auto,
           SSLv2, SSLv3, and TLSv1.  If auto is used, the SSL library is
           given the liberty of choosing the appropriate protocol
           automatically, which is achieved by sending an SSLv2 greeting
           and announcing support for SSLv3 and TLSv1.  This is the
           default.

           Specifying SSLv2, SSLv3, or TLSv1 forces the use of the
           corresponding protocol.  This is useful when talking to old and
           buggy SSL server implementations that make it hard for OpenSSL
           to choose the correct protocol version.  Fortunately, such
           servers are quite rare.
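
           For example (the URL is illustrative), to insist on TLSv1:

               wget --secure-protocol=TLSv1 https://fly.srk.fer.hr/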

       --no-check-certificate
           Don't check the server certificate against the available
           certificate authorities.  Also don't require the URL host name
           to match the common name presented by the certificate.

           As of Wget 1.10, the default is to verify the server's
           certificate against the recognized certificate authorities,
           breaking the SSL handshake and aborting the download if the
           verification fails.  Although this provides more secure
           downloads, it does break interoperability with some sites that
           worked with previous Wget versions, particularly those using
           self-signed, expired, or otherwise invalid certificates.  This
           option forces an ``insecure'' mode of operation that turns the
           certificate verification errors into warnings and allows you to
           proceed.

           If you encounter ``certificate verification'' errors or ones
           saying that ``common name doesn't match requested host name'',
           you can use this option to bypass the verification and proceed
           with the download.  Only use this option if you are otherwise
           convinced of the site's authenticity, or if you really don't
           care about the validity of its certificate.  It is almost always
           a bad idea not to check the certificates when transmitting
           confidential or important data.
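
           For instance (the URL is illustrative), to fetch a file from a
           server with a self-signed certificate whose authenticity you
           have verified by other means:

               wget --no-check-certificate https://fly.srk.fer.hr/file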

       --certificate=file
           Use the client certificate stored in file.  This is needed for
           servers that are configured to require certificates from the
           clients that connect to them.  Normally a certificate is not
           required and this switch is optional.

       --certificate-type=type
           Specify the type of the client certificate.  Legal values are
           PEM (assumed by default) and DER, also known as ASN1.

       --private-key=file
           Read the private key from file.  This allows you to provide the
           private key in a file separate from the certificate.

       --private-key-type=type
           Specify the type of the private key.  Accepted values are PEM
           (the default) and DER.

       --ca-certificate=file
           Use file as the file with the bundle of certificate authorities
           (``CA'') to verify the peers.  The certificates must be in PEM
           format.

           Without this option Wget looks for CA certificates at the
           system-specified locations, chosen at OpenSSL installation time.

       --ca-directory=directory
           Specifies directory containing CA certificates in PEM format.
           Each file contains one CA certificate, and the file name is
           based on a hash value derived from the certificate.  This is
           achieved by processing a certificate directory with the
           "c_rehash" utility supplied with OpenSSL.  Using --ca-directory
           is more efficient than --ca-certificate when many certificates
           are installed because it allows Wget to fetch certificates on
           demand.

           Without this option Wget looks for CA certificates at the
           system-specified locations, chosen at OpenSSL installation time.

       --random-file=file
           Use file as the source of random data for seeding the
           pseudo-random number generator on systems without /dev/random.

           On such systems the SSL library needs an external source of
           randomness to initialize.  Randomness may be provided by EGD
           (see --egd-file below) or read from an external source specified
           by the user.  If this option is not specified, Wget looks for
           random data in $RANDFILE or, if that is unset, in $HOME/.rnd.
           If none of those are available, it is likely that SSL encryption
           will not be usable.

           If you're getting the ``Could not seed OpenSSL PRNG; disabling
           SSL.'' error, you should provide random data using some of the
           methods described above.

       --egd-file=file
           Use file as the EGD socket.  EGD stands for Entropy Gathering
           Daemon, a user-space program that collects data from various
           unpredictable system sources and makes it available to other
           programs that might need it.  Encryption software, such as the
           SSL library, needs sources of non-repeating randomness to seed
           the random number generator used to produce cryptographically
           strong keys.

           OpenSSL allows the user to specify his own source of entropy
           using the "RAND_FILE" environment variable.  If this variable is
           unset, or if the specified file does not produce enough
           randomness, OpenSSL will read random data from EGD socket
           specified using this option.

           If this option is not specified (and the equivalent startup
           command is not used), EGD is never contacted.  EGD is not needed
           on modern Unix systems that support /dev/random.

   FTP Options
       --ftp-user=user
       --ftp-password=password
           Specify the username user and password password on an FTP
           server.  Without this, or the corresponding startup option, the
           password defaults to -wget@, normally used for anonymous FTP.

           Another way to specify username and password is in the URL
           itself.  Either method reveals your password to anyone who
           bothers to run "ps".  To prevent the passwords from being seen,
           store them in .wgetrc or .netrc, and make sure to protect those
           files from other users with "chmod".  If the passwords are
           really important, do not leave them lying in those files
           either---edit the files and delete them after Wget has started
           the download.
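
           For example (the host, name, and password are illustrative, and
           the caveats above about command-line passwords apply):

               wget --ftp-user=name --ftp-password=secret \
                    ftp://fly.srk.fer.hr/path/file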

       --no-remove-listing
           Don't remove the temporary .listing files generated by FTP
           retrievals.  Normally, these files contain the raw directory
           listings received from FTP servers.  Not removing them can be
           useful for debugging purposes, or when you want to be able to
           easily check on the contents of remote server directories (e.g.
           to verify that a mirror you're running is complete).

           Note that even though Wget writes to a known filename for this
           file, this is not a security hole in the scenario of a user
           making .listing a symbolic link to /etc/passwd or something and
           asking "root" to run Wget in his or her directory.  Depending on
           the options used, either Wget will refuse to write to .listing,
           making the globbing/recursion/time-stamping operation fail, or
           the symbolic link will be deleted and replaced with the actual
           .listing file, or the listing will be written to a
           .listing.number file.

           Even though this situation isn't a problem, though, "root"
           should never run Wget in a non-trusted user's directory.  A user
           could do something as simple as linking index.html to
           /etc/passwd and asking "root" to run Wget with -N or -r so the
           file will be overwritten.

       --no-glob
           Turn off FTP globbing.  Globbing refers to the use of shell-like
           special characters (wildcards), like *, ?, [ and ] to retrieve
           more than one file from the same directory at once, like:

               wget ftp://gnjilux.srk.fer.hr/*.msg

           By default, globbing will be turned on if the URL contains a
           globbing character.  This option may be used to turn globbing on
           or off permanently.

           You may have to quote the URL to protect it from being expanded
           by your shell.  Globbing makes Wget look for a directory
           listing, which is system-specific.  This is why it currently
           works only with Unix FTP servers (and the ones emulating Unix
           "ls" output).

       --no-passive-ftp
           Disable the use of the passive FTP transfer mode.  Passive FTP
           mandates that the client connect to the server to establish the
           data connection rather than the other way around.

           If the machine is connected to the Internet directly, both
           passive and active FTP should work equally well.  Behind most
           firewall and NAT configurations passive FTP has a better chance
           of working.  However, in some rare firewall configurations,
           active FTP actually works when passive FTP doesn't.  If you
           suspect this to be the case, use this option, or set
           "passive_ftp=off" in your init file.

       --retr-symlinks
           Usually, when retrieving FTP directories recursively and a
           symbolic link is encountered, the linked-to file is not
           downloaded.  Instead, a matching symbolic link is created on the
           local filesystem.  The pointed-to file will not be downloaded
           unless this recursive retrieval would have encountered it
           separately and downloaded it anyway.

           When --retr-symlinks is specified, however, symbolic links are
           traversed and the pointed-to files are retrieved.  At this time,
           this option does not cause Wget to traverse symlinks to
           directories and recurse through them, but in the future it
           should be enhanced to do this.

           Note that when retrieving a file (not a directory) because it
           was specified on the command-line, rather than because it was
           recursed to, this option has no effect.  Symbolic links are
           always traversed in this case.

       --no-http-keep-alive
           Turn off the ``keep-alive'' feature for HTTP downloads.
           Normally, Wget asks the server to keep the connection open so
           that, when you download more than one document from the same
           server, they get transferred over the same TCP connection.  This
           saves time and at the same time reduces the load on the server.

           This option is useful when, for some reason, persistent
           (keep-alive) connections don't work for you, for example due to
           a server bug or due to the inability of server-side scripts to
           cope with the connections.

   Recursive Retrieval Options
       -r
       --recursive
           Turn on recursive retrieving.

       -l depth
       --level=depth
           Specify recursion maximum depth level depth.  The default
           maximum depth is 5.

       --delete-after
           This option tells Wget to delete every single file it downloads,
           after having done so.  It is useful for pre-fetching popular
           pages through a proxy, e.g.:

               wget -r -nd --delete-after http://whatever.com/~popular/page/

           The -r option is to retrieve recursively, and -nd to not create
           directories.

           Note that --delete-after deletes files on the local machine.  It
           does not issue the DELE command to remote FTP sites, for
           instance.  Also note that when --delete-after is specified,
           --convert-links is ignored, so .orig files are simply not
           created in the first place.

       -k
       --convert-links
           After the download is complete, convert the links in the
           document to make them suitable for local viewing.  This affects
           not only the visible hyperlinks, but any part of the document
           that links to external content, such as embedded images, links
           to style sheets, hyperlinks to non-HTML content, etc.

           Each link will be changed in one of the two ways:

           *   The links to files that have been downloaded by Wget will be
               changed to refer to the file they point to as a relative
               link.

               Example: if the downloaded file /foo/doc.html links to
               /bar/img.gif, also downloaded, then the link in doc.html
               will be modified to point to ../bar/img.gif.  This kind of
               transformation works reliably for arbitrary combinations of
               directories.

           *   The links to files that have not been downloaded by Wget
               will be changed to include host name and absolute path of
               the location they point to.

               Example: if the downloaded file /foo/doc.html links to
               /bar/img.gif (or to ../bar/img.gif), then the link in
               doc.html will be modified to point to
               http://hostname/bar/img.gif.

           Because of this, local browsing works reliably: if a linked file
           was downloaded, the link will refer to its local name; if it was
           not downloaded, the link will refer to its full Internet address
           rather than presenting a broken link.  The fact that the former
           links are converted to relative links ensures that you can move
           the downloaded hierarchy to another directory.

           Note that only at the end of the download can Wget know which
           links have been downloaded.  Because of that, the work done by
           -k will be performed at the end of all the downloads.

       -K
       --backup-converted
           When converting a file, back up the original version with a
           .orig suffix.  Affects the behavior of -N.

       -m
       --mirror
           Turn on options suitable for mirroring.  This option turns on
           recursion and time-stamping, sets infinite recursion depth and
           keeps FTP directory listings.  It is currently equivalent to
           -r -N -l inf --no-remove-listing.
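
           For example (the URL is illustrative):

               wget -m http://fly.srk.fer.hr/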

       -p
       --page-requisites
           This option causes Wget to download all the files that are
           necessary to properly display a given HTML page.  This includes
           such things as inlined images, sounds, and referenced
           stylesheets.

           Ordinarily, when downloading a single HTML page, any requisite
           documents that may be needed to display it properly are not
           downloaded.  Using -r together with -l can help, but since Wget
           does not ordinarily distinguish between external and inlined
           documents, one is generally left with ``leaf documents'' that
           are missing their requisites.

           For instance, say document 1.html contains an "<IMG>" tag
           referencing 1.gif and an "<A>" tag pointing to external document
           2.html.  Say that 2.html is similar but that its image is 2.gif
           and it links to 3.html.  Say this continues up to some
           arbitrarily high number.

           If one executes the command:

               wget -r -l 2 http://<site>/1.html

           then 1.html, 1.gif, 2.html, 2.gif, and 3.html will be
           downloaded.  As you can see, 3.html is without its requisite
           3.gif because Wget is simply counting the number of hops (up to
           2) away from 1.html in order to determine where to stop the
           recursion.  However, with this command:

               wget -r -l 2 -p http://<site>/1.html

           all the above files and 3.html's requisite 3.gif will be
           downloaded.  Similarly,

               wget -r -l 1 -p http://<site>/1.html

           will cause 1.html, 1.gif, 2.html, and 2.gif to be downloaded.
           One might think that:

               wget -r -l 0 -p http://<site>/1.html

           would download just 1.html and 1.gif, but unfortunately this is
           not the case, because -l 0 is equivalent to -l inf---that is,
           infinite recursion.  To download a single HTML page (or a
           handful of them, all specified on the command-line or in a -i
           URL input file) and its (or their) requisites, simply leave off
           -r and -l:

               wget -p http://<site>/1.html

           Note that Wget will behave as if -r had been specified, but only
           that single page and its requisites will be downloaded.  Links
           from that page to external documents will not be followed.
           Actually, to download a single page and all its requisites (even
           if they exist on separate websites), and make sure the lot
           displays properly locally, this author likes to use a few
           options in addition to -p:

               wget -E -H -k -K -p http://<site>/<document>

           To finish off this topic, it's worth knowing that Wget's idea of
           an external document link is any URL specified in an "<A>" tag,
           an "<AREA>" tag, or a "<LINK>" tag other than "<LINK
           REL="stylesheet">".
1239
1240 --strict-comments
1241 Turn on strict parsing of HTML comments. The default is to termi‐
1242 nate comments at the first occurrence of -->.
1243
1244 According to specifications, HTML comments are expressed as SGML
1245 declarations. Declaration is special markup that begins with <!
1246 and ends with >, such as <!DOCTYPE ...>, that may contain comments
1247 between a pair of -- delimiters. HTML comments are ``empty decla‐
1248 rations'', SGML declarations without any non-comment text. There‐
1249 fore, <!--foo--> is a valid comment, and so is <!--one-- --two-->,
1250 but <!--1--2--> is not.
1251
1252 On the other hand, most HTML writers don't perceive comments as
1253 anything other than text delimited with <!-- and -->, which is not
1254 quite the same. For example, something like <!------------> works
1255 as a valid comment as long as the number of dashes is a multiple of
1256 four (!). If not, the comment technically lasts until the next --,
1257 which may be at the other end of the document. Because of this,
1258 many popular browsers completely ignore the specification and
1259 implement what users have come to expect: comments delimited with
1260 <!-- and -->.
1261
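For instance, with a page containing the (hypothetical) line

<!--1--2--> <A HREF="next.html">next</A>

naive parsing terminates the comment at the first --> and finds the
link to next.html, while strict parsing treats the stray -- as opening
another comment, scans past the > and may never see the link.
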
1262 Until version 1.9, Wget interpreted comments strictly, which
1263 resulted in missing links in many web pages that displayed fine in
1264 browsers, but had the misfortune of containing non-compliant com‐
1265 ments. Beginning with version 1.9, Wget has joined the ranks of
1266 clients that implement ``naive'' comments, terminating each com‐
1267 ment at the first occurrence of -->.
1268
1269 If, for whatever reason, you want strict comment parsing, use this
1270 option to turn it on.
1271
1272 Recursive Accept/Reject Options
1273
1274 -A acclist --accept acclist
1275 -R rejlist --reject rejlist
1276 Specify comma-separated lists of file name suffixes or patterns to
1277 accept or reject.
1278
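For instance, to recursively fetch only JPEG and PNG images from a
hypothetical directory:

wget -r -A '*.jpg,*.png' http://www.example.com/pics/

-R takes the same kind of list, but rejects the files that match it.
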
1279 -D domain-list
1280 --domains=domain-list
1281 Set domains to be followed. domain-list is a comma-separated list
1282 of domains. Note that it does not turn on -H.
1283
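Since -D does not enable host spanning by itself, combine it with -H.
For instance, to let a recursive retrieval touch only two hypothetical
domains:

wget -r -H -D example.com,example.org http://www.example.com/
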
1284 --exclude-domains domain-list
1285 Specify the domains that are not to be followed.
1286
1287 --follow-ftp
1288 Follow FTP links from HTML documents. Without this option, Wget
1289 will ignore all the FTP links.
1290
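For instance (URL hypothetical):

wget -r --follow-ftp http://www.example.com/mirrors.html

will descend into the ftp:// links found on that page as well as the
usual HTTP ones.
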
1291 --follow-tags=list
1292 Wget has an internal table of HTML tag / attribute pairs that it
1293 considers when looking for linked documents during a recursive
1294 retrieval. If you want only a subset of those tags to be
1295 considered, specify them in a comma-separated list with this
1296 option.
1297
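For instance, to consider only plain links and image maps during a
recursive retrieval (a minimal sketch):

wget -r --follow-tags=a,area http://www.example.com/
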
1298 --ignore-tags=list
1299 This is the opposite of the --follow-tags option. To skip certain
1300 HTML tags when recursively looking for documents to download, spec‐
1301 ify them in a comma-separated list.
1302
1303 In the past, this option was the best bet for downloading a single
1304 page and its requisites, using a command-line like:
1305
1306 wget --ignore-tags=a,area -H -k -K -r http://<site>/<document>
1307
1308 However, the author of this option came across a page with tags
1309 like "<LINK REL="home" HREF="/">" and came to the realization that
1310 specifying tags to ignore was not enough. One can't just tell Wget
1311 to ignore "<LINK>", because then stylesheets will not be down‐
1312 loaded. Now the best bet for downloading a single page and its
1313 requisites is the dedicated --page-requisites option.
1314
1315 -H
1316 --span-hosts
1317 Enable spanning across hosts when doing recursive retrieving.
1318
1319 -L
1320 --relative
1321 Follow relative links only. Useful for retrieving a specific home
1322 page without any distractions, not even those from the same host.
1323
1324 -I list
1325 --include-directories=list
1326 Specify a comma-separated list of directories you wish to follow
1327 when downloading. Elements of list may contain wildcards.
1328
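For instance, to confine a recursive FTP retrieval to two hypothetical
top-level directories:

wget -r -I /pub,/mirror ftp://ftp.example.com/
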
1329 -X list
1330 --exclude-directories=list
1331 Specify a comma-separated list of directories you wish to exclude
1332 from download. Elements of list may contain wildcards.
1333
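For instance, to skip hypothetical CGI and scratch directories:

wget -r -X '/cgi-bin,/tmp*' http://www.example.com/

The quotes keep the shell from expanding the wildcard before Wget sees
it.
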
1334 -np
1335 --no-parent
1336 Do not ever ascend to the parent directory when retrieving recur‐
1337 sively. This is a useful option, since it guarantees that only the
1338 files below a certain hierarchy will be downloaded.
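
For instance (URL hypothetical):

wget -r --no-parent http://www.example.com/docs/

will never fetch anything above the /docs/ directory.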
1339
1340 EXAMPLES
1341 The examples are divided into three sections loosely based on their
1342 complexity.
1343
1344 Simple Usage
1345
1346 · Say you want to download a URL. Just type:
1347
1348 wget http://fly.srk.fer.hr/
1349
1350 · But what will happen if the connection is slow, and the file is
1351 lengthy? The connection will probably fail, perhaps more than
1352 once, before the whole file is retrieved. In this case, Wget will
1353 keep trying until it either gets the whole file or exceeds the
1354 default number of retries (this being 20). It is easy to change
1355 the number of tries to 45, to ensure that the whole file will
1356 arrive safely:
1357
1358 wget --tries=45 http://fly.srk.fer.hr/jpg/flyweb.jpg
1359
1360 · Now let's leave Wget to work in the background, and write its
1361 progress to log file log. It is tiring to type --tries, so we
1362 shall use -t.
1363
1364 wget -t 45 -o log http://fly.srk.fer.hr/jpg/flyweb.jpg &
1365
1366 The ampersand at the end of the line makes sure that Wget works in
1367 the background. To remove the limit on retries, use -t inf.
1368
1369 Using FTP is just as simple. Wget will take care of the login
1370 and password.
1371
1372 wget ftp://gnjilux.srk.fer.hr/welcome.msg
1373
1374 · If you specify a directory, Wget will retrieve the directory list‐
1375 ing, parse it and convert it to HTML. Try:
1376
1377 wget ftp://ftp.gnu.org/pub/gnu/
1378 links index.html
1379
1380 Advanced Usage
1381
1382 · You have a file that contains the URLs you want to download? Use
1383 the -i switch:
1384
1385 wget -i <file>
1386
1387 If you specify - as file name, the URLs will be read from standard
1388 input.
1389
1390 · Create a mirror image of the GNU web site, five levels deep, with
1391 the same directory structure the original has, with only one try
1392 per document, saving the log of the activities to gnulog:
1393
1394 wget -r -t1 http://www.gnu.org/ -o gnulog
1395
1396 · The same as the above, but convert the links in the HTML files to
1397 point to local files, so you can view the documents off-line:
1398
1399 wget --convert-links -r http://www.gnu.org/ -o gnulog
1400
1401 · Retrieve only one HTML page, but make sure that all the elements
1402 needed for the page to be displayed, such as inline images and
1403 external style sheets, are also downloaded. Also make sure the
1404 downloaded page references the local copies.
1405
1406 wget -p --convert-links http://www.server.com/dir/page.html
1407
1408 The HTML page will be saved to www.server.com/dir/page.html, and
1409 the images, stylesheets, etc., somewhere under www.server.com/,
1410 depending on where they were on the remote server.
1411
1412 · The same as the above, but without the www.server.com/ directory.
1413 In fact, I don't want to have all those random server directories
1414 anyway---just save all those files under a download/ subdirectory
1415 of the current directory.
1416
1417 wget -p --convert-links -nH -nd -Pdownload \
1418 http://www.server.com/dir/page.html
1419
1420 · Retrieve the index.html of www.lycos.com, showing the original
1421 server headers:
1422
1423 wget -S http://www.lycos.com/
1424
1425 · Save the server headers with the file, perhaps for post-processing.
1426
1427 wget --save-headers http://www.lycos.com/
1428 more index.html
1429
1430 · Retrieve the first two levels of wuarchive.wustl.edu, saving them
1431 to /tmp.
1432
1433 wget -r -l2 -P/tmp ftp://wuarchive.wustl.edu/
1434
1435 · You want to download all the GIFs from a directory on an HTTP
1436 server. You tried wget http://www.server.com/dir/*.gif, but that
1437 didn't work because HTTP retrieval does not support globbing. In
1438 that case, use:
1439
1440 wget -r -l1 --no-parent -A.gif http://www.server.com/dir/
1441
1442 More verbose, but the effect is the same. -r -l1 means to retrieve
1443 recursively, with maximum depth of 1. --no-parent means that ref‐
1444 erences to the parent directory are ignored, and -A.gif means to
1445 download only the GIF files. -A "*.gif" would have worked too.
1446
1447 · Suppose you were in the middle of a download when Wget was inter‐
1448 rupted. Now you do not want to clobber the files already present.
1449 In that case, use:
1450
1451 wget -nc -r http://www.gnu.org/
1452
1453 · If you want to embed your username and password in an HTTP or
1454 FTP URL, use the appropriate URL syntax.
1455
1456 wget ftp://hniksic:mypassword@unix.server.com/.emacs
1457
1458 Note, however, that this usage is not advisable on multi-user sys‐
1459 tems because it reveals your password to anyone who looks at the
1460 output of "ps".
1461
1462 · You would like the output documents to go to standard output
1463 instead of to files?
1464
1465 wget -O - http://jagor.srce.hr/ http://www.srce.hr/
1466
1467 You can also combine the two options and make pipelines to retrieve
1468 the documents from remote hotlists:
1469
1470 wget -O - http://cool.list.com/ | wget --force-html -i -
1471
1472 Very Advanced Usage
1473
1474 · If you wish Wget to keep a mirror of a page (or FTP subdirecto‐
1475 ries), use --mirror (-m), which is shorthand for -r -N -l inf
1476 --no-remove-listing. You can put Wget in the crontab file, asking
1477 it to recheck a site each Sunday:
1478
1479 crontab
1480 0 0 * * 0 wget --mirror http://www.gnu.org/ -o /home/me/weeklog
1481
1482 · In addition to the above, you want the links to be converted for
1483 local viewing. But, after having read this manual, you know that
1484 link conversion doesn't play well with timestamping, so you also
1485 want Wget to back up the original HTML files before the conversion.
1486 The Wget invocation would look like this:
1487
1488 wget --mirror --convert-links --backup-converted \
1489 http://www.gnu.org/ -o /home/me/weeklog
1490
1491 · But you've also noticed that local viewing doesn't work all that
1492 well when HTML files are saved under extensions other than .html,
1493 perhaps because they were served as index.cgi. So you'd like Wget
1494 to rename all the files served with content-type text/html or
1495 application/xhtml+xml to name.html.
1496
1497 wget --mirror --convert-links --backup-converted \
1498 --html-extension -o /home/me/weeklog \
1499 http://www.gnu.org/
1500
1501 Or, with less typing:
1502
1503 wget -m -k -K -E http://www.gnu.org/ -o /home/me/weeklog
1504
1505 FILES
1506 /etc/wgetrc
1507 Default location of the global startup file.
1508
1509 .wgetrc
1510 User startup file.
1511
1512 BUGS
1513 You are welcome to send bug reports about GNU Wget to
1514 <bug-wget@gnu.org>.
1515
1516 Before actually submitting a bug report, please try to follow a few
1517 simple guidelines.
1518
1519 1. Please try to ascertain that the behavior you see really is a bug.
1520 If Wget crashes, it's a bug. If Wget does not behave as docu‐
1521 mented, it's a bug. If things behave strangely, but you are not
1522 sure how they are supposed to work, it might well be a bug.
1523
1524 2. Try to repeat the bug in circumstances as simple as possible.
1525 E.g. if Wget crashes while downloading wget -rl0 -kKE -t5 -Y0
1526 http://yoyodyne.com -o /tmp/log, see whether the crash is
1527 repeatable, and if it occurs with a simpler set of options.
1528 You might even try to start the download at the page where the
1529 crash occurred to see if that page somehow triggered the crash.
1530
1531 Also, while I will probably be interested to know the contents of
1532 your .wgetrc file, just dumping it into the debug message is proba‐
1533 bly a bad idea. Instead, you should first try to see if the bug
1534 repeats with .wgetrc moved out of the way. Only if it turns out
1535 that .wgetrc settings affect the bug, mail me the relevant parts of
1536 the file.
1537
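One way to test this from the shell (file names arbitrary):

mv ~/.wgetrc ~/.wgetrc.away    # move the startup file out of the way
wget ...                       # repeat the command that showed the bug
mv ~/.wgetrc.away ~/.wgetrc    # restore it
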
1538 3. Please start Wget with the -d option and send us the resulting
1539 output (or relevant parts thereof). If Wget was compiled without
1540 debug support, recompile it---it is much easier to trace bugs
1541 with debug support on.
1542
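For instance, to capture the debug output in a file suitable for
attaching to a report (URL is a placeholder):

wget -d -o wget-debug.log http://www.example.com/
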
1543 Note: please make sure to remove any potentially sensitive informa‐
1544 tion from the debug log before sending it to the bug address. The
1545 "-d" won't go out of its way to collect sensitive information, but
1546 the log will contain a fairly complete transcript of Wget's commu‐
1547 nication with the server, which may include passwords and pieces of
1548 downloaded data. Since the bug address is publicly archived, you
1549 may assume that all bug reports are visible to the public.
1550
1551 4. If Wget has crashed, try to run it in a debugger, e.g. "gdb `which
1552 wget` core" and type "where" to get the backtrace. This may not
1553 work if the system administrator has disabled core files, but it is
1554 safe to try.
1555
1557 GNU Info entry for wget.
1558
1559 AUTHOR
1560 Originally written by Hrvoje Niksic <hniksic@xemacs.org>.
1561
1562 COPYRIGHT
1563 Copyright (c) 1996--2005 Free Software Foundation, Inc.
1564
1565 Permission is granted to make and distribute verbatim copies of this
1566 manual provided the copyright notice and this permission notice are
1567 preserved on all copies.
1568
1569 Permission is granted to copy, distribute and/or modify this document
1570 under the terms of the GNU Free Documentation License, Version 1.2 or
1571 any later version published by the Free Software Foundation; with the
1572 Invariant Sections being ``GNU General Public License'' and ``GNU Free
1573 Documentation License'', with no Front-Cover Texts, and with no Back-
1574 Cover Texts. A copy of the license is included in the section entitled
1575 ``GNU Free Documentation License''.
1576
1577
1578
1579 GNU Wget 1.10.2 (Red Hat modified)          2007-02-12          WGET(1)