httrack(1)                  General Commands Manual                 httrack(1)

NAME

       httrack - offline browser: copy websites to a local directory

SYNOPSIS

       httrack [ url ]... [ -filter ]... [ +filter ]... [ -O, --path ]
       [ -%O, --chroot ] [ -w, --mirror ] [ -W, --mirror-wizard ]
       [ -g, --get-files ] [ -i, --continue ] [ -Y, --mirrorlinks ]
       [ -P, --proxy ] [ -%f, --httpproxy-ftp[=N] ] [ -%b, --bind ]
       [ -rN, --depth[=N] ] [ -%eN, --ext-depth[=N] ] [ -mN, --max-files[=N] ]
       [ -MN, --max-size[=N] ] [ -EN, --max-time[=N] ] [ -AN, --max-rate[=N] ]
       [ -%cN, --connection-per-second[=N] ] [ -GN, --max-pause[=N] ]
       [ -%mN, --max-mms-time[=N] ] [ -cN, --sockets[=N] ] [ -TN, --timeout ]
       [ -RN, --retries[=N] ] [ -JN, --min-rate[=N] ]
       [ -HN, --host-control[=N] ] [ -%P, --extended-parsing[=N] ]
       [ -n, --near ] [ -t, --test ] [ -%L, --list ] [ -%S, --urllist ]
       [ -NN, --structure[=N] ] [ -%D, --cached-delayed-type-check ]
       [ -%M, --mime-html ] [ -LN, --long-names[=N] ]
       [ -KN, --keep-links[=N] ] [ -x, --replace-external ]
       [ -%x, --disable-passwords ] [ -%q, --include-query-string ]
       [ -o, --generate-errors ] [ -X, --purge-old[=N] ] [ -%p, --preserve ]
       [ -bN, --cookies[=N] ] [ -u, --check-type[=N] ]
       [ -j, --parse-java[=N] ] [ -sN, --robots[=N] ] [ -%h, --http-10 ]
       [ -%k, --keep-alive ] [ -%B, --tolerant ] [ -%s, --updatehack ]
       [ -%u, --urlhack ] [ -%A, --assume ] [ -@iN, --protocol[=N] ]
       [ -%w, --disable-module ] [ -F, --user-agent ] [ -%R, --referer ]
       [ -%E, --from ] [ -%F, --footer ] [ -%l, --language ]
       [ -C, --cache[=N] ] [ -k, --store-all-in-cache ]
       [ -%n, --do-not-recatch ] [ -%v, --display ] [ -Q, --do-not-log ]
       [ -q, --quiet ] [ -z, --extra-log ] [ -Z, --debug-log ]
       [ -v, --verbose ] [ -f, --file-log ] [ -f2, --single-log ]
       [ -I, --index ] [ -%i, --build-top-index ] [ -%I, --search-index ]
       [ -pN, --priority[=N] ] [ -S, --stay-on-same-dir ]
       [ -D, --can-go-down ] [ -U, --can-go-up ] [ -B, --can-go-up-and-down ]
       [ -a, --stay-on-same-address ] [ -d, --stay-on-same-domain ]
       [ -l, --stay-on-same-tld ] [ -e, --go-everywhere ]
       [ -%H, --debug-headers ] [ -%!, --disable-security-limits ]
       [ -V, --userdef-cmd ] [ -%U, --user ] [ -%W, --callback ]
       [ -K, --keep-links[=N] ]

DESCRIPTION

       httrack allows you to download a World Wide Web site from the Internet
       to a local directory, recursively building all directories and getting
       HTML, images, and other files from the server to your computer. HTTrack
       preserves the original site's relative link structure. Simply open a
       page of the "mirrored" website in your browser, and you can browse the
       site from link to link, as if you were viewing it online. HTTrack can
       also update an existing mirrored site and resume interrupted downloads.

EXAMPLES

       httrack www.someweb.com/bob/
              mirror site www.someweb.com/bob/ and only this site

       httrack www.someweb.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg -mime:application/*
              mirror the two sites together (with shared links) and accept
              any .jpg files on .com sites

       httrack www.someweb.com/bob/bobby.html +* -r6
              get all files starting from bobby.html, with a link depth of 6
              and the possibility of going everywhere on the web

       httrack www.someweb.com/bob/bobby.html --spider -P proxy.myhost.com:8080
              run the spider on www.someweb.com/bob/bobby.html using a proxy

       httrack --update
              update a mirror in the current folder

       httrack
              start the interactive mode

       httrack --continue
              continue a mirror in the current folder
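       When httrack is driven from a script, the command lines above can be
       assembled programmatically. The following is a minimal Python sketch,
       not part of HTTrack: build_httrack_command and the /tmp/bob-mirror
       path are hypothetical names chosen for illustration, and only the
       -O, -rN, -AN and -EN options described in OPTIONS are used. It builds
       the argument list and prints it; it does not run httrack.

```python
import shlex


def build_httrack_command(urls, mirror_dir, depth=None, max_rate=None,
                          max_time=None, filters=()):
    """Return an argv list for httrack (hypothetical helper, illustration only)."""
    argv = ["httrack", *urls, "-O", mirror_dir]
    if depth is not None:
        argv.append(f"-r{depth}")      # -rN: mirror depth
    if max_rate is not None:
        argv.append(f"-A{max_rate}")   # -AN: maximum transfer rate, bytes/second
    if max_time is not None:
        argv.append(f"-E{max_time}")   # -EN: maximum mirror time, seconds
    argv.extend(filters)               # scan rules such as +*.com/*.jpg
    return argv


argv = build_httrack_command(
    ["www.someweb.com/bob/"], "/tmp/bob-mirror",
    depth=6, max_rate=25000, max_time=3600, filters=["+*.com/*.jpg"],
)
# shlex.join quotes the filter so that the shell does not expand the "*".
print(shlex.join(argv))
```

       Passing the list to subprocess.run(argv) avoids shell quoting issues
       with filters that contain wildcard characters.
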

OPTIONS

   General options:
       -O     path for mirror/logfiles+cache (-O path mirror[,path cache and
              logfiles]) (--path <param>)

       -%O    chroot path to, must be root (-%O root path) (--chroot <param>)

   Action options:
       -w     *mirror web sites (--mirror)

       -W     mirror web sites, semi-automatic (asks questions)
              (--mirror-wizard)

       -g     just get files (saved in the current directory) (--get-files)

       -i     continue an interrupted mirror using the cache (--continue)

       -Y     mirror ALL links located in the first level pages (mirror links)
              (--mirrorlinks)

   Proxy options:
       -P     proxy use (-P proxy:port or -P user:pass@proxy:port) (--proxy
              <param>)

       -%f    *use proxy for ftp (%f0 don't use) (--httpproxy-ftp[=N])

       -%b    use this local hostname to make/send requests (-%b hostname)
              (--bind <param>)

   Limits options:
       -rN    set the mirror depth to N (* r9999) (--depth[=N])

       -%eN   set the external links depth to N (* %e0) (--ext-depth[=N])

       -mN    maximum file length for a non-html file (--max-files[=N])

       -mN,N2 maximum file length for non-html (N) and html (N2)

       -MN    maximum overall size that can be downloaded/scanned
              (--max-size[=N])

       -EN    maximum mirror time in seconds (60=1 minute, 3600=1 hour)
              (--max-time[=N])

       -AN    maximum transfer rate in bytes/second (1000=1KB/s max)
              (--max-rate[=N])

       -%cN   maximum number of connections/second (*%c10)
              (--connection-per-second[=N])

       -GN    pause transfer if N bytes reached, and wait until lock file is
              deleted (--max-pause[=N])

       -%mN   maximum mms stream download time in seconds (60=1 minute, 3600=1
              hour) (--max-mms-time[=N])

   Flow control:
       -cN    number of multiple connections (*c8) (--sockets[=N])

       -TN    timeout, number of seconds after which a non-responding link is
              shut down (--timeout)

       -RN    number of retries, in case of timeout or non-fatal errors (*R1)
              (--retries[=N])

       -JN    traffic jam control, minimum transfer rate (bytes/second)
              tolerated for a link (--min-rate[=N])

       -HN    host is abandoned if: 0=never, 1=timeout, 2=slow, 3=timeout or
              slow (--host-control[=N])

   Links options:
       -%P    *extended parsing, attempt to parse all links, even in unknown
              tags or Javascript (%P0 don't use) (--extended-parsing[=N])

       -n     get non-html files "near" an html file (ex: an image located
              outside) (--near)

       -t     test all URLs (even forbidden ones) (--test)

       -%L    <file> add all URLs located in this text file (one URL per line)
              (--list <param>)

       -%S    <file> add all scan rules located in this text file (one scan
              rule per line) (--urllist <param>)

   Build options:
       -NN    structure type (0 *original structure, 1+: see below)
              (--structure[=N]), or user defined structure
              (-N "%h%p/%n%q.%t")

       -%N    delayed type check, don't make any link test but wait for file
              downloads to start instead (experimental) (%N0 don't use, %N1
              use for unknown extensions, * %N2 always use)

       -%D    cached delayed type check, don't wait for remote type during
              updates, to speed them up (%D0 wait, * %D1 don't wait)
              (--cached-delayed-type-check)

       -%M    generate an RFC MIME-encapsulated full-archive (.mht)
              (--mime-html)

       -LN    long names (L1 *long names / L0 8-3 conversion / L2 ISO9660
              compatible) (--long-names[=N])

       -KN    keep original links (e.g. http://www.adr/link) (K0 *relative
              link, K absolute links, K4 original links, K3 absolute URI
              links) (--keep-links[=N])

       -x     replace external html links by error pages (--replace-external)

       -%x    do not include any password for external password protected
              websites (%x0 include) (--disable-passwords)

       -%q    *include query string for local files (useless, for information
              purpose only) (%q0 don't include) (--include-query-string)

       -o     *generate output html file in case of error (404..) (o0 don't
              generate) (--generate-errors)

       -X     *purge old files after update (X0 keep old files)
              (--purge-old[=N])

       -%p    preserve html files "as is" (identical to '-K4 -%F ""')
              (--preserve)

   Spider options:
       -bN    accept cookies in cookies.txt (0=do not accept, * 1=accept)
              (--cookies[=N])

       -u     check document type if unknown (cgi,asp..) (u0 don't check, * u1
              check but /, u2 check always) (--check-type[=N])

       -j     *parse Java Classes (j0 don't parse, bitmask: |1 parse default,
              |2 don't parse .class |4 don't parse .js |8 don't be aggressive)
              (--parse-java[=N])

       -sN    follow robots.txt and meta robots tags (0=never, 1=sometimes,
              * 2=always, 3=always (even strict rules)) (--robots[=N])

       -%h    force HTTP/1.0 requests (reduce update features, only for old
              servers or proxies) (--http-10)

       -%k    use keep-alive if possible, greatly reducing latency for small
              files and test requests (%k0 don't use) (--keep-alive)

       -%B    tolerant requests (accept bogus responses on some servers, but
              not standard!) (--tolerant)

       -%s    update hacks: various hacks to limit re-transfers when updating
              (identical size, bogus response..) (--updatehack)

       -%u    url hacks: various hacks to limit duplicate URLs (strip //,
              www.foo.com==foo.com..) (--urlhack)

       -%A    assume that a type (cgi,asp..) is always linked with a mime type
              (-%A php3,cgi=text/html;dat,bin=application/x-zip) (--assume
              <param>); can also be used to force a specific file type:
              --assume foo.cgi=text/html

       -@iN   internet protocol (0=both ipv6+ipv4, 4=ipv4 only, 6=ipv6 only)
              (--protocol[=N])

       -%w    disable a specific external mime module (-%w htsswf -%w htsjava)
              (--disable-module <param>)

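       The -j option above is described as a bitmask. As a hedged
       illustration of how such a value might be composed, the Python sketch
       below ORs the documented bits together. The names ParseJava and
       parse_java_option are hypothetical, and the sketch assumes the bits
       combine by plain bitwise OR as the word "bitmask" suggests.

```python
from enum import IntFlag


class ParseJava(IntFlag):
    """Bits listed for -j in this page (the names are made up for the sketch)."""
    PARSE_DEFAULT = 1    # |1 parse default
    NO_CLASS = 2         # |2 don't parse .class
    NO_JS = 4            # |4 don't parse .js
    NOT_AGGRESSIVE = 8   # |8 don't be aggressive


def parse_java_option(flags):
    """Format a combination of bits as a -jN command-line option."""
    return f"-j{int(flags)}"


# Default parsing, but skip .js files: 1 | 4 = 5
print(parse_java_option(ParseJava.PARSE_DEFAULT | ParseJava.NO_JS))  # -j5
```
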
   Browser ID:
       -F     user-agent field sent in HTTP headers (-F "user-agent name")
              (--user-agent <param>)

       -%R    default referer field sent in HTTP headers (--referer <param>)

       -%E    from email address sent in HTTP headers (--from <param>)

       -%F    footer string in HTML code (-%F "Mirrored [from host %s [file %s
              [at %s]]]") (--footer <param>)

       -%l    preferred language (-%l "fr, en, jp, *") (--language <param>)

   Log, index, cache:
       -C     create/use a cache for updates and retries (C0 no cache, C1 cache
              has priority, * C2 test update before) (--cache[=N])

       -k     store all files in cache (not useful if files on disk)
              (--store-all-in-cache)

       -%n    do not re-download locally erased files (--do-not-recatch)

       -%v    display downloaded filenames on screen (in realtime) (* %v1
              short version, %v2 full animation) (--display)

       -Q     no log - quiet mode (--do-not-log)

       -q     no questions - quiet mode (--quiet)

       -z     log - extra infos (--extra-log)

       -Z     log - debug (--debug-log)

       -v     log on screen (--verbose)

       -f     *log in files (--file-log)

       -f2    one single log file (--single-log)

       -I     *make an index (I0 don't make) (--index)

       -%i    make a top index for a project folder (* %i0 don't make)
              (--build-top-index)

       -%I    make a searchable index for this mirror (* %I0 don't make)
              (--search-index)

   Expert options:
       -pN    priority mode: (* p3) (--priority[=N])

       -p0    just scan, don't save anything (for checking links)

       -p1    save only html files

       -p2    save only non-html files

       -p3    *save all files

       -p7    get html files before, then treat other files

       -S     stay on the same directory (--stay-on-same-dir)

       -D     *can only go down into subdirs (--can-go-down)

       -U     can only go to upper directories (--can-go-up)

       -B     can both go up&down into the directory structure
              (--can-go-up-and-down)

       -a     *stay on the same address (--stay-on-same-address)

       -d     stay on the same principal domain (--stay-on-same-domain)

       -l     stay on the same TLD (eg: .com) (--stay-on-same-tld)

       -e     go everywhere on the web (--go-everywhere)

       -%H    debug HTTP headers in logfile (--debug-headers)

   Guru options: (do NOT use if possible)
       -#X    *use optimized engine (limited memory boundary checks)
              (--fast-engine)

       -#0    filter test (-#0 '*.gif' 'www.bar.com/foo.gif')
              (--debug-testfilters <param>)

       -#1    simplify test (-#1 ./foo/bar/../foobar)

       -#2    type test (-#2 /foo/bar.php)

       -#C    cache list (-#C '*.com/spider*.gif') (--debug-cache <param>)

       -#R    cache repair (damaged cache) (--repair-cache)

       -#d    debug parser (--debug-parsing)

       -#E    extract new.zip cache meta-data in meta.zip

       -#f    always flush log files (--advanced-flushlogs)

       -#FN   maximum number of filters (--advanced-maxfilters[=N])

       -#h    version info (--version)

       -#K    scan stdin (debug) (--debug-scanstdin)

       -#L    maximum number of links (-#L1000000) (--advanced-maxlinks)

       -#p    display ugly progress information (--advanced-progressinfo)

       -#P    catch URL (--catch-url)

       -#R    old FTP routines (debug) (--repair-cache)

       -#T    generate transfer ops. log every minute (--debug-xfrstats)

       -#u    wait time (--advanced-wait)

       -#Z    generate transfer rate statistics every minute
              (--debug-ratestats)

       -#!    execute a shell command (-#! "echo hello") (--exec <param>)

   Dangerous options: (do NOT use unless you know exactly what you are doing)
       -%!    bypass built-in security limits aimed at avoiding bandwidth
              abuses (bandwidth, simultaneous connections)
              (--disable-security-limits)

              IMPORTANT NOTE: DANGEROUS OPTION, ONLY SUITABLE FOR EXPERTS.
              USE IT WITH EXTREME CARE.

   Command-line specific options:
       -V     execute system command after each file ($0 is the filename: -V
              "rm \$0") (--userdef-cmd <param>)

       -%U    run the engine with another id when called as root (-%U smith)
              (--user <param>)

       -%W    use an external library function as a wrapper (-%W
              myfoo.so[,myparameters]) (--callback <param>)

   Details: Option N
       -N0    Site-structure (default)

       -N1    HTML in web/, images/other files in web/images/

       -N2    HTML in web/HTML, images/other in web/images

       -N3    HTML in web/, images/other in web/

       -N4    HTML in web/, images/other in web/xxx, where xxx is the file
              extension (all gif files will be placed in web/gif, for example)

       -N5    Images/other in web/xxx and HTML in web/HTML

       -N99   All files in web/, with random names (gadget!)

       -N100  Site-structure, without www.domain.xxx/

       -N101  Identical to N1 except that "web" is replaced by the site's name

       -N102  Identical to N2 except that "web" is replaced by the site's name

       -N103  Identical to N3 except that "web" is replaced by the site's name

       -N104  Identical to N4 except that "web" is replaced by the site's name

       -N105  Identical to N5 except that "web" is replaced by the site's name

       -N199  Identical to N99 except that "web" is replaced by the site's name

       -N1001 Identical to N1 except that there is no "web" directory

       -N1002 Identical to N2 except that there is no "web" directory

       -N1003 Identical to N3 except that there is no "web" directory (this is
              the structure set by the -g option)

       -N1004 Identical to N4 except that there is no "web" directory

       -N1005 Identical to N5 except that there is no "web" directory

       -N1099 Identical to N99 except that there is no "web" directory

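       The N1 to N5 layouts and their N10x and N100x variants follow a
       regular pattern, which the Python sketch below restates as a lookup.
       It is an illustration of the table above, not HTTrack's code:
       target_directory is a hypothetical helper, and "the site's name" is
       assumed here to be the host name. N0, N99, N100 and their variants
       are deliberately not covered.

```python
def target_directory(n, is_html, extension, site_name="www.someweb.com"):
    """Return the directory, relative to the mirror, where a file would land.

    Hypothetical helper covering only N1..N5, N101..N105 and N1001..N1005.
    """
    variant = n % 100    # 1..5: which of the five layouts
    family = n // 100    # 0: "web", 1: site's name, 10: no top directory
    html_dirs = {1: "", 2: "HTML", 3: "", 4: "", 5: "HTML"}
    other_dirs = {1: "images", 2: "images", 3: "", 4: extension, 5: extension}
    roots = {0: "web", 1: site_name, 10: ""}
    if variant not in html_dirs or family not in roots:
        raise ValueError(f"structure type {n} is not covered by this sketch")
    sub = html_dirs[variant] if is_html else other_dirs[variant]
    # Join the non-empty components; an empty result means the mirror root.
    return "/".join(part for part in (roots[family], sub) if part)


print(target_directory(4, False, "gif"))      # web/gif
print(target_directory(102, True, "html"))    # www.someweb.com/HTML
```
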
   Details: User-defined option N
          %n  Name of file without file type (ex: image)
          %N  Name of file, including file type (ex: image.gif)
          %t  File type (ex: gif)
          %p  Path [without ending /] (ex: /someimages)
          %h  Host name (ex: www.someweb.com)
          %M  URL MD5 (128 bits, 32 ascii bytes)
          %Q  query string MD5 (128 bits, 32 ascii bytes)
          %r  protocol name (ex: http)
          %q  small query string MD5 (16 bits, 4 ascii bytes)
          %s?  Short name version (ex: %sN)
          %[param]  param variable in query string
          %[param:before:after:empty:notfound]  advanced variable extraction

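       To make the tokens above concrete, here is a minimal Python sketch
       that expands a user-defined structure string for one URL. It only
       illustrates the documented meaning of %n, %N, %t, %p, %h and %r.
       expand_structure is a hypothetical name, the MD5 and short-name
       tokens are left untouched because their exact inputs are not
       specified here, and HTTrack's own expansion may differ in detail.

```python
import posixpath
from urllib.parse import urlsplit


def expand_structure(template, url):
    """Expand %n, %N, %t, %p, %h and %r in template for the given URL."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    name, ext = posixpath.splitext(filename)
    values = {
        "n": name,                   # file name without file type
        "N": filename,               # file name including file type
        "t": ext.lstrip("."),        # file type
        "p": directory.rstrip("/"),  # path without ending /
        "h": parts.hostname or "",   # host name
        "r": parts.scheme,           # protocol name
    }
    out = []
    i = 0
    while i < len(template):
        if template[i] == "%" and template[i + 1:i + 2] in values:
            out.append(values[template[i + 1]])
            i += 2
        else:
            out.append(template[i])  # unknown tokens are copied verbatim
            i += 1
    return "".join(out)


url = "http://www.someweb.com/someimages/image.gif"
print(expand_structure("%h%p/%n.%t", url))
```
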
   Details: User-defined option N and advanced variable extraction
          %[param:before:after:empty:notfound]

       param  parameter name

       before string to prepend if the parameter was found

       after  string to append if the parameter was found

       empty  string replacement if the parameter was empty

       notfound
              string replacement if the parameter could not be found

       All fields, except the first one (the parameter name), can be empty.

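       The field semantics listed above can be sketched in a few lines of
       Python. This is an interpretation of the description, not HTTrack's
       implementation: extract_variable is a hypothetical name, only the
       first occurrence of a parameter is used, and values are URL-decoded
       by parse_qs, which HTTrack may or may not do.

```python
from urllib.parse import parse_qs, urlsplit


def extract_variable(spec, url):
    """Evaluate the text between %[ and ] against the query string of url."""
    fields = spec.split(":")
    fields += [""] * (5 - len(fields))   # missing trailing fields are empty
    param, before, after, empty, notfound = fields[:5]
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if param not in query:
        return notfound                  # parameter could not be found
    value = query[param][0]
    if value == "":
        return empty                     # parameter was found but empty
    return before + value + after        # parameter was found


spec = "id:-:_:none:missing"
print(extract_variable(spec, "http://www.someweb.com/page.php?id=42"))  # -42_
print(extract_variable(spec, "http://www.someweb.com/page.php?id="))    # none
print(extract_variable(spec, "http://www.someweb.com/page.php?x=1"))    # missing
```
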
   Details: Option K
       -K0    foo.cgi?q=45  ->  foo4B54.html?q=45 (relative URI, default)

       -K     ->  http://www.foobar.com/folder/foo.cgi?q=45 (absolute URL)
              (--keep-links[=N])

       -K4    ->  foo.cgi?q=45 (original URL)

       -K3    ->  /folder/foo.cgi?q=45 (absolute URI)

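       The three non-default link forms can be reproduced with standard URL
       resolution, as in the Python sketch below. link_forms is a
       hypothetical helper and index.html is an assumed page name. The K0
       form is omitted because the local file name (foo4B54.html above)
       depends on HTTrack's own naming.

```python
from urllib.parse import urljoin, urlsplit


def link_forms(page_url, link):
    """Return the -K, -K3 and -K4 forms of a link found on page_url."""
    absolute_url = urljoin(page_url, link)
    parts = urlsplit(absolute_url)
    absolute_uri = parts.path + ("?" + parts.query if parts.query else "")
    return {
        "K": absolute_url,    # absolute URL
        "K3": absolute_uri,   # absolute URI (path and query only)
        "K4": link,           # original URL, unchanged
    }


forms = link_forms("http://www.foobar.com/folder/index.html", "foo.cgi?q=45")
for option, value in forms.items():
    print(f"-{option}: {value}")
```
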
   Shortcuts:
       --mirror <URLs>
              *make a mirror of site(s) (default)

       --get <URLs>
              get the files indicated, do not seek other URLs (-qg)

       --list <text file>
              add all URLs located in this text file (-%L)

       --mirrorlinks <URLs>
              mirror all links in 1st level pages (-Y)

       --testlinks <URLs>
              test links in pages (-r1p0C0I0t)

       --spider <URLs>
              spider site(s), to test links: reports Errors & Warnings
              (-p0C0I0t)

       --testsite <URLs>
              identical to --spider

       --skeleton <URLs>
              make a mirror, but get only html files (-p1)

       --update
              update a mirror, without confirmation (-iC2)

       --continue
              continue a mirror, without confirmation (-iC1)

       --catchurl
              create a temporary proxy to capture a URL or a form post URL

       --clean
              erase cache & log files

       --http10
              force http/1.0 requests (-%h)

   Details: Option %W: External callbacks prototypes
       see htsdefines.h

FILES

       /etc/httrack.conf
              The system-wide configuration file.

ENVIRONMENT

       HOME   Used if /etc/httrack.conf contains the line
              path ~/websites/#

DIAGNOSTICS

       Errors/Warnings are reported to hts-log.txt by default, or to stderr if
       the -v option was specified.

LIMITS

       These are the principal limits of HTTrack at the moment. Note that we
       have not heard of any other utility that has solved them.

       - Files referenced by scripts that generate complex filenames may not
       be found (ex: img.src='image'+a+Mobj.dst+'.gif')

       - Some Java classes may not find some of their files (class included)

       - Cgi-bin links may not work properly in some cases (parameters
       needed). To avoid them: use filters like -*cgi-bin*

BUGS

       Please report bugs to <bugs@httrack.com>. Include a complete, self-
       contained example that will allow the bug to be reproduced, and say
       which version of httrack you are using. Do not forget to detail options
       used, OS version, and any other information you deem necessary.

COPYRIGHT

       Copyright (C) Xavier Roche and other contributors

       This program is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; either version 2 of the License, or any later
       version.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
       Public License for more details.

       You should have received a copy of the GNU General Public License along
       with this program; if not, write to the Free Software Foundation, Inc.,
       59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.

AVAILABILITY

       The most recent released version of httrack can be found at:
       http://www.httrack.com

AUTHOR

       Xavier Roche <roche@httrack.com>

SEE ALSO

       The HTML documentation (available online at
       http://www.httrack.com/html/ ) contains more detailed information.
       Please also refer to the httrack FAQ (available online at
       http://www.httrack.com/html/faq.html )

httrack website copier             Jun 2007                         httrack(1)