httrack(1)                  General Commands Manual                 httrack(1)

NAME

       httrack - offline browser : copy websites to a local directory

SYNOPSIS

       httrack [ url ]... [ -filter ]... [ +filter ]... [ -O, --path ]
       [ -%O, --chroot ] [ -w, --mirror ] [ -W, --mirror-wizard ]
       [ -g, --get-files ] [ -i, --continue ] [ -Y, --mirrorlinks ]
       [ -P, --proxy ] [ -%f, --httpproxy-ftp[=N] ] [ -%b, --bind ]
       [ -rN, --depth[=N] ] [ -%eN, --ext-depth[=N] ]
       [ -mN, --max-files[=N] ] [ -MN, --max-size[=N] ]
       [ -EN, --max-time[=N] ] [ -AN, --max-rate[=N] ]
       [ -%cN, --connection-per-second[=N] ] [ -GN, --max-pause[=N] ]
       [ -%mN, --max-mms-time[=N] ] [ -cN, --sockets[=N] ]
       [ -TN, --timeout ] [ -RN, --retries[=N] ] [ -JN, --min-rate[=N] ]
       [ -HN, --host-control[=N] ] [ -%P, --extended-parsing[=N] ]
       [ -n, --near ] [ -t, --test ] [ -%L, --list ] [ -%S, --urllist ]
       [ -NN, --structure[=N] ] [ -%D, --cached-delayed-type-check ]
       [ -%M, --mime-html ] [ -LN, --long-names[=N] ]
       [ -KN, --keep-links[=N] ] [ -x, --replace-external ]
       [ -%x, --disable-passwords ] [ -%q, --include-query-string ]
       [ -o, --generate-errors ] [ -X, --purge-old[=N] ]
       [ -%p, --preserve ] [ -bN, --cookies[=N] ] [ -u, --check-type[=N] ]
       [ -j, --parse-java[=N] ] [ -sN, --robots[=N] ] [ -%h, --http-10 ]
       [ -%k, --keep-alive ] [ -%B, --tolerant ] [ -%s, --updatehack ]
       [ -%u, --urlhack ] [ -%A, --assume ] [ -@iN, --protocol[=N] ]
       [ -%w, --disable-module ] [ -F, --user-agent ] [ -%R, --referer ]
       [ -%E, --from ] [ -%F, --footer ] [ -%l, --language ]
       [ -C, --cache[=N] ] [ -k, --store-all-in-cache ]
       [ -%n, --do-not-recatch ] [ -%v, --display ] [ -Q, --do-not-log ]
       [ -q, --quiet ] [ -z, --extra-log ] [ -Z, --debug-log ]
       [ -v, --verbose ] [ -f, --file-log ] [ -f2, --single-log ]
       [ -I, --index ] [ -%i, --build-top-index ] [ -%I, --search-index ]
       [ -pN, --priority[=N] ] [ -S, --stay-on-same-dir ]
       [ -D, --can-go-down ] [ -U, --can-go-up ]
       [ -B, --can-go-up-and-down ] [ -a, --stay-on-same-address ]
       [ -d, --stay-on-same-domain ] [ -l, --stay-on-same-tld ]
       [ -e, --go-everywhere ] [ -%H, --debug-headers ]
       [ -%!, --disable-security-limits ] [ -V, --userdef-cmd ]
       [ -%U, --user ] [ -%W, --callback ] [ -K, --keep-links[=N] ] [

DESCRIPTION

       httrack allows you to download a World Wide Web site from the Internet
       to a local directory, recursively building all directories and getting
       HTML, images, and other files from the server to your computer. HTTrack
       arranges the original site's relative link structure. Simply open a
       page of the "mirrored" website in your browser, and you can browse the
       site from link to link, as if you were viewing it online. HTTrack can
       also update an existing mirrored site and resume interrupted
       downloads.

EXAMPLES

       httrack www.someweb.com/bob/
              mirror site www.someweb.com/bob/ and only this site

       httrack www.someweb.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg
       -mime:application/*
              mirror the two sites together (with shared links) and accept
              any .jpg files on .com sites

       httrack www.someweb.com/bob/bobby.html +* -r6
              get all files starting from bobby.html, with a link depth of 6
              and the possibility of going everywhere on the web

       httrack www.someweb.com/bob/bobby.html --spider -P
       proxy.myhost.com:8080
              run the spider on www.someweb.com/bob/bobby.html using a proxy

       httrack --update
              update a mirror in the current folder

       httrack
              start the interactive mode

       httrack --continue
              continue a mirror in the current folder

OPTIONS

   General options:
       -O     path for mirror/logfiles+cache (-O path mirror[,path cache and
              logfiles]) (--path <param>)

       -%O    chroot to this path, must be root (-%O root path) (--chroot
              <param>)

   Action options:
       -w     *mirror web sites (--mirror)

       -W     mirror web sites, semi-automatic (asks questions)
              (--mirror-wizard)

       -g     just get files (saved in the current directory) (--get-files)

       -i     continue an interrupted mirror using the cache (--continue)

       -Y     mirror ALL links located in the first level pages (mirror links)
              (--mirrorlinks)

   Proxy options:
       -P     proxy use (-P proxy:port or -P user:pass@proxy:port) (--proxy
              <param>)

       -%f    *use proxy for ftp (f0 don't use) (--httpproxy-ftp[=N])

       -%b    use this local hostname to make/send requests (-%b hostname)
              (--bind <param>)

   Limits options:
       -rN    set the mirror depth to N (* r9999) (--depth[=N])

       -%eN   set the external links depth to N (* %e0) (--ext-depth[=N])

       -mN    maximum file length for a non-html file (--max-files[=N])

       -mN,N2 maximum file length for non html (N) and html (N2)

       -MN    maximum overall size that can be uploaded/scanned
              (--max-size[=N])

       -EN    maximum mirror time in seconds (60=1 minute, 3600=1 hour)
              (--max-time[=N])

       -AN    maximum transfer rate in bytes/second (1000=1KB/s max)
              (--max-rate[=N])

       -%cN   maximum number of connections/second (*%c10)
              (--connection-per-second[=N])

       -GN    pause transfer if N bytes reached, and wait until lock file is
              deleted (--max-pause[=N])

       -%mN   maximum mms stream download time in seconds (60=1 minute, 3600=1
              hour) (--max-mms-time[=N])

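       As a rough illustration of the limits above, the sketch below converts
       human-friendly values into the units that -E (seconds) and -A (bytes
       per second) expect, then prints the resulting command instead of
       running it. The output directory and the chosen limits are assumptions
       made for this example only.

```shell
# Start URL borrowed from EXAMPLES; the output directory is an assumption.
url="www.someweb.com/bob/"
out="/tmp/bob-mirror"

# -E takes seconds: 2 hours = 2 * 3600.
max_time=$((2 * 3600))
# -A takes bytes per second: 25 KB/s = 25 * 1000 (this page uses 1000 = 1KB/s).
max_rate=$((25 * 1000))

# Build the command as a string so it can be reviewed before running it.
limit_cmd="httrack $url -O $out -r3 -E$max_time -A$max_rate"
echo "$limit_cmd"
```

       Printing the command first makes it easy to check the computed values
       before starting a long mirror.
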
   Flow control:
       -cN    number of multiple connections (*c8) (--sockets[=N])

       -TN    timeout, number of seconds after which a non-responding link is
              shut down (--timeout)

       -RN    number of retries, in case of timeout or non-fatal errors (*R1)
              (--retries[=N])

       -JN    traffic jam control, minimum transfer rate (bytes/second)
              tolerated for a link (--min-rate[=N])

       -HN    host is abandoned if: 0=never, 1=timeout, 2=slow, 3=timeout or
              slow (--host-control[=N])

   Links options:
       -%P    *extended parsing, attempt to parse all links, even in unknown
              tags or Javascript (%P0 don't use) (--extended-parsing[=N])

       -n     get non-html files near an html file (ex: an image located
              outside) (--near)

       -t     test all URLs (even forbidden ones) (--test)

       -%L    <file> add all URLs located in this text file (one URL per line)
              (--list <param>)

       -%S    <file> add all scan rules located in this text file (one scan
              rule per line) (--urllist <param>)

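       A minimal sketch of preparing a URL list for -%L, assuming two start
       URLs taken from EXAMPLES and a temporary file created with mktemp. The
       output directory is hypothetical, and the command is only printed.

```shell
# Write a URL list, one URL per line, to a temporary file.
list_file="$(mktemp)"
printf '%s\n' "www.someweb.com/bob/" "www.anothertest.com/mike/" > "$list_file"

# Count the URLs in the list (one per line).
url_count=$(wc -l < "$list_file" | tr -d ' ')
echo "URLs in list: $url_count"

# Print the command for review; -%L reads the start URLs from the file.
echo httrack -%L "$list_file" -O /tmp/list-mirror
```
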
   Build options:
       -NN    structure type (0 *original structure, 1+: see below)
              (--structure[=N])

              or user-defined structure (-N "%h%p/%n%q.%t")

       -%N    delayed type check, don't make any link test but wait for file
              downloads to start instead (experimental) (%N0 don't use, %N1
              use for unknown extensions, * %N2 always use)

       -%D    cached delayed type check, don't wait for remote type during
              updates, to speed them up (%D0 wait, * %D1 don't wait)
              (--cached-delayed-type-check)

       -%M    generate an RFC MIME-encapsulated full-archive (.mht)
              (--mime-html)

       -LN    long names (L1 *long names / L0 8-3 conversion / L2 ISO9660
              compatible) (--long-names[=N])

       -KN    keep original links (e.g. http://www.adr/link) (K0 *relative
              link, K absolute links, K4 original links, K3 absolute URI
              links) (--keep-links[=N])

       -x     replace external html links by error pages (--replace-external)

       -%x    do not include any password for external password protected
              websites (%x0 include) (--disable-passwords)

       -%q    *include query string for local files (useless, for information
              purpose only) (%q0 don't include) (--include-query-string)

       -o     *generate output html file in case of error (404..) (o0 don't
              generate) (--generate-errors)

       -X     *purge old files after update (X0 keep delete) (--purge-old[=N])

       -%p    preserve html files as is (identical to -K4 -%F "") (--preserve)

   Spider options:
       -bN    accept cookies in cookies.txt (0=do not accept, * 1=accept)
              (--cookies[=N])

       -u     check document type if unknown (cgi,asp..) (u0 don't check, * u1
              check but /, u2 check always) (--check-type[=N])

       -j     *parse Java Classes (j0 don't parse, bitmask: |1 parse default,
              |2 don't parse .class |4 don't parse .js |8 don't be aggressive)
              (--parse-java[=N])

       -sN    follow robots.txt and meta robots tags (0=never, 1=sometimes,
              * 2=always, 3=always (even strict rules)) (--robots[=N])

       -%h    force HTTP/1.0 requests (reduce update features, only for old
              servers or proxies) (--http-10)

       -%k    use keep-alive if possible, greatly reducing latency for small
              files and test requests (%k0 don't use) (--keep-alive)

       -%B    tolerant requests (accept bogus responses on some servers, but
              not standard!) (--tolerant)

       -%s    update hacks: various hacks to limit re-transfers when updating
              (identical size, bogus response..) (--updatehack)

       -%u    url hacks: various hacks to limit duplicate URLs (strip //,
              www.foo.com==foo.com..) (--urlhack)

       -%A    assume that a type (cgi,asp..) is always linked with a mime type
              (-%A php3,cgi=text/html;dat,bin=application/x-zip) (--assume
              <param>)

              can also be used to force a specific file type: --assume
              foo.cgi=text/html

       -@iN   internet protocol (0=both ipv6+ipv4, 4=ipv4 only, 6=ipv6 only)
              (--protocol[=N])

       -%w    disable a specific external mime module (-%w htsswf -%w htsjava)
              (--disable-module <param>)

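       The -j value is described above as a bitmask. The sketch below shows
       one way to combine the listed bits with shell arithmetic. The variable
       names are hypothetical, and the combination chosen (default parsing,
       skip .class files, not aggressive) is only an example.

```shell
# Bit values as listed for -j in this section.
PARSE_DEFAULT=1
SKIP_CLASS=2
SKIP_JS=4
NOT_AGGRESSIVE=8

# Combine the wanted bits with bitwise OR.
java_mask=$((PARSE_DEFAULT | SKIP_CLASS | NOT_AGGRESSIVE))

# Print the command for review instead of running it.
echo httrack www.someweb.com/bob/ "-j$java_mask"
```
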
   Browser ID:
       -F     user-agent field sent in HTTP headers (-F "user-agent name")
              (--user-agent <param>)

       -%R    default referer field sent in HTTP headers (--referer <param>)

       -%E    from email address sent in HTTP headers (--from <param>)

       -%F    footer string in Html code (-%F "Mirrored [from host %s [file %s
              [at %s]]]") (--footer <param>)

       -%l    preferred language (-%l "fr, en, jp, *") (--language <param>)

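       Values such as a user-agent or a language list usually contain spaces,
       so they need quoting to reach httrack as a single argument. The sketch
       below illustrates this with a hypothetical helper, count_args, and
       assumed example values. It does not call httrack.

```shell
# Assumed values for illustration; any string containing spaces behaves alike.
ua="ExampleMirror/1.0 (offline copy)"
lang="en, fr"

# Hypothetical helper that reports how many arguments a command would receive.
count_args() { echo "$#"; }

# Quoted: -F and -%l each receive exactly one value.
n_quoted=$(count_args www.someweb.com/bob/ -F "$ua" -%l "$lang")

# Unquoted: the shell splits the user agent into three separate words.
n_unquoted=$(count_args www.someweb.com/bob/ -F $ua -%l "$lang")

echo "quoted: $n_quoted arguments, unquoted: $n_unquoted arguments"
```
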
   Log, index, cache
       -C     create/use a cache for updates and retries (C0 no cache, C1 cache
              has priority, * C2 test update before) (--cache[=N])

       -k     store all files in cache (not useful if files on disk)
              (--store-all-in-cache)

       -%n    do not re-download locally erased files (--do-not-recatch)

       -%v    display on screen filenames downloaded (in realtime) - * %v1
              short version - %v2 full animation (--display)

       -Q     no log - quiet mode (--do-not-log)

       -q     no questions - quiet mode (--quiet)

       -z     log - extra infos (--extra-log)

       -Z     log - debug (--debug-log)

       -v     log on screen (--verbose)

       -f     *log in files (--file-log)

       -f2    one single log file (--single-log)

       -I     *make an index (I0 don't make) (--index)

       -%i    make a top index for a project folder (* %i0 don't make)
              (--build-top-index)

       -%I    make a searchable index for this mirror (* %I0 don't make)
              (--search-index)

   Expert options:
       -pN    priority mode: (* p3) (--priority[=N])

       -p0    just scan, don't save anything (for checking links)

       -p1    save only html files

       -p2    save only non html files

       -p3    *save all files

       -p7    get html files first, then treat other files

       -S     stay on the same directory (--stay-on-same-dir)

       -D     *can only go down into subdirs (--can-go-down)

       -U     can only go to upper directories (--can-go-up)

       -B     can both go up&down into the directory structure
              (--can-go-up-and-down)

       -a     *stay on the same address (--stay-on-same-address)

       -d     stay on the same principal domain (--stay-on-same-domain)

       -l     stay on the same TLD (eg: .com) (--stay-on-same-tld)

       -e     go everywhere on the web (--go-everywhere)

       -%H    debug HTTP headers in logfile (--debug-headers)

   Guru options: (do NOT use if possible)
       -#X    *use optimized engine (limited memory boundary checks)
              (--fast-engine)

       -#0    filter test (-#0 *.gif www.bar.com/foo.gif) (--debug-testfilters
              <param>)

       -#1    simplify test (-#1 ./foo/bar/../foobar)

       -#2    type test (-#2 /foo/bar.php)

       -#C    cache list (-#C *.com/spider*.gif) (--debug-cache <param>)

       -#R    cache repair (damaged cache) (--repair-cache)

       -#d    debug parser (--debug-parsing)

       -#E    extract new.zip cache meta-data in meta.zip

       -#f    always flush log files (--advanced-flushlogs)

       -#FN   maximum number of filters (--advanced-maxfilters[=N])

       -#h    version info (--version)

       -#K    scan stdin (debug) (--debug-scanstdin)

       -#L    maximum number of links (-#L1000000) (--advanced-maxlinks)

       -#p    display ugly progress information (--advanced-progressinfo)

       -#P    catch URL (--catch-url)

       -#R    old FTP routines (debug) (--repair-cache)

       -#T    generate transfer ops. log every minute (--debug-xfrstats)

       -#u    wait time (--advanced-wait)

       -#Z    generate transfer rate statistics every minute
              (--debug-ratestats)

       -#!    execute a shell command (-#! "echo hello") (--exec <param>)

   Dangerous options: (do NOT use unless you know exactly what you are doing)
       -%!    bypass built-in security limits aimed to avoid bandwidth abuse
              (bandwidth, simultaneous connections)
              (--disable-security-limits)

              IMPORTANT NOTE: DANGEROUS OPTION, ONLY SUITABLE FOR EXPERTS

              USE IT WITH EXTREME CARE

   Command-line specific options:
       -V     execute system command after each file ($0 is the filename: -V
              "rm ") (--userdef-cmd <param>)

       -%U    run the engine with another id when called as root (-%U smith)
              (--user <param>)

       -%W    use an external library function as a wrapper (-%W
              myfoo.so[,myparameters]) (--callback <param>)

   Details: Option N
       -N0    Site-structure (default)

       -N1    HTML in web/, images/other files in web/images/

       -N2    HTML in web/HTML, images/other in web/images

       -N3    HTML in web/, images/other in web/

       -N4    HTML in web/, images/other in web/xxx, where xxx is the file
              extension (all gif will be placed onto web/gif, for example)

       -N5    Images/other in web/xxx and HTML in web/HTML

       -N99   All files in web/, with random names (gadget !)

       -N100  Site-structure, without www.domain.xxx/

       -N101  Identical to N1 except that "web" is replaced by the site's name

       -N102  Identical to N2 except that "web" is replaced by the site's name

       -N103  Identical to N3 except that "web" is replaced by the site's name

       -N104  Identical to N4 except that "web" is replaced by the site's name

       -N105  Identical to N5 except that "web" is replaced by the site's name

       -N199  Identical to N99 except that "web" is replaced by the site's
              name

       -N1001 Identical to N1 except that there is no "web" directory

       -N1002 Identical to N2 except that there is no "web" directory

       -N1003 Identical to N3 except that there is no "web" directory (option
              set for g option)

       -N1004 Identical to N4 except that there is no "web" directory

       -N1005 Identical to N5 except that there is no "web" directory

       -N1099 Identical to N99 except that there is no "web" directory

   Details: User-defined option N
          %n  Name of file without file type (ex: image)
          %N  Name of file, including file type (ex: image.gif)
          %t  File type (ex: gif)
          %p  Path [without ending /] (ex: /someimages)
          %h  Host name (ex: www.someweb.com)
          %M  URL MD5 (128 bits, 32 ascii bytes)
          %Q  query string MD5 (128 bits, 32 ascii bytes)
          %r  protocol name (ex: http)
          %q  small query string MD5 (16 bits, 4 ascii bytes)
          %s?  Short name version (ex: %sN)
          %[param]  param variable in query string
          %[param:before:after:empty:notfound]  advanced variable extraction

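       To make the placeholders above more concrete, the sketch below expands
       a simple pattern by hand using the "ex:" values from the table. It
       handles only %h, %p, %n and %t with sed and is an illustration of what
       the placeholders stand for, not HTTrack's own expansion code.

```shell
# Sample URL parts, taken from the "ex:" values in the table above.
host="www.someweb.com"
path="/someimages"
name="image"
type="gif"

# A user-defined structure as it might be passed with -N.
pattern="%h%p/%n.%t"

# Naive substitution of four placeholders only (illustration).
expanded=$(printf '%s\n' "$pattern" |
  sed -e "s|%h|$host|g" -e "s|%p|$path|g" -e "s|%n|$name|g" -e "s|%t|$type|g")

echo "$expanded"
```
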
   Details: User-defined option N and advanced variable extraction
          %[param:before:after:empty:notfound]

       param  : parameter name

       before : string to prepend if the parameter was found

       after  : string to append if the parameter was found

       notfound
              : string replacement if the parameter could not be found

       empty  : string replacement if the parameter was empty

       All fields, except the first one (the parameter name), can be empty.

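       The field descriptions above can be read as a small decision rule. The
       sketch below models that reading in plain shell for a query string of
       the form a=1&b=2. The function name expand_var and the sample query are
       hypothetical, and the exact behaviour of HTTrack may differ in edge
       cases.

```shell
# expand_var QUERY PARAM BEFORE AFTER EMPTY NOTFOUND
# Models %[param:before:after:empty:notfound] as described in this section.
expand_var() {
  query=$1 param=$2 before=$3 after=$4 empty=$5 notfound=$6
  old_ifs=$IFS
  IFS='&'
  set -- $query          # split the query string into name=value pairs
  IFS=$old_ifs
  for pair in "$@"; do
    case $pair in
      "$param"=?*)          # found with a value: before + value + after
        printf '%s\n' "$before${pair#*=}$after"
        return ;;
      "$param"=|"$param")   # found but empty
        printf '%s\n' "$empty"
        return ;;
    esac
  done
  printf '%s\n' "$notfound"  # parameter absent from the query string
}

expand_var "id=45&lang=" id   "-" "" "none" "missing"   # prints -45
expand_var "id=45&lang=" lang "-" "" "none" "missing"   # prints none
expand_var "id=45&lang=" page "-" "" "none" "missing"   # prints missing
```
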
   Details: Option K
       -K0    foo.cgi?q=45  ->  foo4B54.html?q=45 (relative URI, default)

       -K     ->  http://www.foobar.com/folder/foo.cgi?q=45 (absolute URL)
              (--keep-links[=N])

       -K4    ->  foo.cgi?q=45 (original URL)

       -K3    ->  /folder/foo.cgi?q=45 (absolute URI)

   Shortcuts:
       --mirror <URLs>
              *make a mirror of site(s) (default)

       --get <URLs>
              get the files indicated, do not seek other URLs (-qg)

       --list <text file>
              add all URLs located in this text file (-%L)

       --mirrorlinks <URLs>
              mirror all links in 1st level pages (-Y)

       --testlinks <URLs>
              test links in pages (-r1p0C0I0t)

       --spider <URLs>
              spider site(s), to test links: reports Errors & Warnings
              (-p0C0I0t)

       --testsite <URLs>
              identical to --spider

       --skeleton <URLs>
              make a mirror, but gets only html files (-p1)

       --update
              update a mirror, without confirmation (-iC2)

       --continue
              continue a mirror, without confirmation (-iC1)

       --catchurl
              create a temporary proxy to capture a URL or a form post URL

       --clean
              erase cache & log files

       --http10
              force http/1.0 requests (-%h)

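       The shortcuts above are listed with the short options they stand for.
       As a reading aid, the sketch below puts that mapping into a small
       lookup function. The function name shortcut_flags is hypothetical, and
       only shortcuts that list an equivalent in this section are included.

```shell
# Print the short-option form listed for a shortcut in this section.
shortcut_flags() {
  case $1 in
    --get)               echo "-qg" ;;
    --list)              echo "-%L" ;;
    --mirrorlinks)       echo "-Y" ;;
    --testlinks)         echo "-r1p0C0I0t" ;;
    --spider|--testsite) echo "-p0C0I0t" ;;
    --skeleton)          echo "-p1" ;;
    --update)            echo "-iC2" ;;
    --continue)          echo "-iC1" ;;
    --http10)            echo "-%h" ;;
    *)                   return 1 ;;   # no equivalent listed
  esac
}

spider_flags=$(shortcut_flags --spider)
echo "--spider is listed as $spider_flags"
```
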
   Details: Option %W: External callbacks prototypes
       see htsdefines.h

FILES

       /etc/httrack.conf
              The system wide configuration file.

ENVIRONMENT

       HOME   Used if /etc/httrack.conf contains the line path ~/websites/#

DIAGNOSTICS

       Errors/Warnings are reported to hts-log.txt by default, or to stderr if
       the -v option is specified.

LIMITS

       These are the principal limits of HTTrack at the moment. Note that we
       have not heard of any other utility that has solved them.

       - Files whose names are generated by complex scripts may not be found
       (ex: img.src='image'+a+Mobj.dst+'.gif')

       - Some files referenced by Java classes may not be found (class files
       included)

       - Cgi-bin links may not work properly in some cases (parameters
       needed). To avoid them: use filters like -*cgi-bin*
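       When a filter such as -*cgi-bin* is typed in a shell, the * characters
       may be expanded by the shell before httrack sees them. The sketch below
       shows quoting as one way to pass the filter through unchanged. The URL
       is the one from EXAMPLES, and the command is only printed.

```shell
# Single quotes keep the shell from expanding * in the filter.
filter='-*cgi-bin*'

# Print the command for review; the quoted variable stays a single argument.
echo httrack www.someweb.com/bob/ "$filter"
```
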

BUGS

       Please report bugs to <bugs@httrack.com>. Include a complete,
       self-contained example that will allow the bug to be reproduced, and say
       which version of httrack you are using. Do not forget to detail options
       used, OS version, and any other information you deem necessary.

COPYRIGHT

       Copyright (C) Xavier Roche and other contributors

       This program is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; either version 2 of the License, or any later
       version.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
       Public License for more details.

       You should have received a copy of the GNU General Public License along
       with this program; if not, write to the Free Software Foundation, Inc.,
       59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

AVAILABILITY

       The most recent released version of httrack can be found at:
       http://www.httrack.com

AUTHOR

       Xavier Roche <roche@httrack.com>

SEE ALSO

       The HTML documentation (available online at
       http://www.httrack.com/html/ ) contains more detailed information.
       Please also refer to the httrack FAQ (available online at
       http://www.httrack.com/html/faq.html )

httrack website copier   HTTrack version 3.43-9 (compiled Jan  4 2010)   httrack(1)