httrack(1)                  General Commands Manual                 httrack(1)

NAME

       httrack - offline browser: copy websites to a local directory


SYNOPSIS

       httrack [ url ]... [ -filter ]... [ +filter ]... [ -O, --path ]
       [ -w, --mirror ] [ -W, --mirror-wizard ] [ -g, --get-files ]
       [ -i, --continue ] [ -Y, --mirrorlinks ] [ -P, --proxy ]
       [ -%f, --httpproxy-ftp[=N] ] [ -%b, --bind ] [ -rN, --depth[=N] ]
       [ -%eN, --ext-depth[=N] ] [ -mN, --max-files[=N] ]
       [ -MN, --max-size[=N] ] [ -EN, --max-time[=N] ] [ -AN, --max-rate[=N] ]
       [ -%cN, --connection-per-second[=N] ] [ -GN, --max-pause[=N] ]
       [ -cN, --sockets[=N] ] [ -TN, --timeout[=N] ] [ -RN, --retries[=N] ]
       [ -JN, --min-rate[=N] ] [ -HN, --host-control[=N] ]
       [ -%P, --extended-parsing[=N] ] [ -n, --near ] [ -t, --test ]
       [ -%L, --list ] [ -%S, --urllist ] [ -NN, --structure[=N] ]
       [ -%D, --cached-delayed-type-check ] [ -%M, --mime-html ]
       [ -LN, --long-names[=N] ] [ -KN, --keep-links[=N] ]
       [ -x, --replace-external ] [ -%x, --disable-passwords ]
       [ -%q, --include-query-string ] [ -o, --generate-errors ]
       [ -X, --purge-old[=N] ] [ -%p, --preserve ] [ -%T, --utf8-conversion ]
       [ -bN, --cookies[=N] ] [ -u, --check-type[=N] ] [ -j, --parse-java[=N] ]
       [ -sN, --robots[=N] ] [ -%h, --http-10 ] [ -%k, --keep-alive ]
       [ -%B, --tolerant ] [ -%s, --updatehack ] [ -%u, --urlhack ]
       [ -%A, --assume ] [ -@iN, --protocol[=N] ] [ -%w, --disable-module ]
       [ -F, --user-agent ] [ -%R, --referer ] [ -%E, --from ] [ -%F, --footer ]
       [ -%l, --language ] [ -%a, --accept ] [ -%X, --headers ]
       [ -C, --cache[=N] ] [ -k, --store-all-in-cache ]
       [ -%n, --do-not-recatch ] [ -%v, --display ] [ -Q, --do-not-log ]
       [ -q, --quiet ] [ -z, --extra-log ] [ -Z, --debug-log ] [ -v, --verbose ]
       [ -f, --file-log ] [ -f2, --single-log ] [ -I, --index ]
       [ -%i, --build-top-index ] [ -%I, --search-index ]
       [ -pN, --priority[=N] ] [ -S, --stay-on-same-dir ] [ -D, --can-go-down ]
       [ -U, --can-go-up ] [ -B, --can-go-up-and-down ]
       [ -a, --stay-on-same-address ] [ -d, --stay-on-same-domain ]
       [ -l, --stay-on-same-tld ] [ -e, --go-everywhere ]
       [ -%H, --debug-headers ] [ -%!, --disable-security-limits ]
       [ -V, --userdef-cmd ] [ -%W, --callback ] [ -K, --keep-links[=N] ]


DESCRIPTION

       httrack downloads a World Wide Web site from the Internet to a local
       directory, recursively rebuilding all directories and fetching HTML,
       images, and other files from the server to your computer. HTTrack
       preserves the original site's relative link structure. Simply open a
       page of the "mirrored" website in your browser, and you can browse the
       site from link to link, as if you were viewing it online. HTTrack can
       also update an existing mirrored site and resume interrupted downloads.


EXAMPLES

       httrack www.someweb.com/bob/
              mirror the site www.someweb.com/bob/ and only this site

       httrack www.someweb.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg
       -mime:application/*
              mirror the two sites together (with shared links), accept any
              .jpg files on .com sites, and reject files of MIME type
              application/*

       httrack www.someweb.com/bob/bobby.html +* -r6
              get all files starting from bobby.html, with a link depth of 6
              and the possibility of going everywhere on the web

       httrack www.someweb.com/bob/bobby.html --spider -P proxy.myhost.com:8080
              run the spider on www.someweb.com/bob/bobby.html using a proxy

       httrack --update
              update a mirror in the current folder

       httrack
              start the interactive mode

       httrack --continue
              continue a mirror in the current folder

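       The command lines above can also be assembled from a script. The
       following is a minimal Python sketch, not part of httrack:
       build_httrack_args is a hypothetical helper that only builds an
       argument list from options documented on this page (-O, -rN, and
       +/- filters) and prints it. Nothing is executed.

```python
import shlex


def build_httrack_args(urls, out_dir=None, depth=None, filters=()):
    """Assemble an httrack argument list (hypothetical helper, runs nothing)."""
    args = ["httrack", *urls]
    if out_dir is not None:
        args += ["-O", out_dir]      # -O: path for mirror, log files and cache
    if depth is not None:
        args.append(f"-r{depth}")    # -rN: mirror depth
    args += list(filters)            # +pattern / -pattern scan rules
    return args


args = build_httrack_args(
    ["www.someweb.com/bob/bobby.html"],
    out_dir="./mirror",              # assumed output directory, for illustration
    depth=6,
    filters=["+*"],
)
# shlex.join quotes the filter so a shell would not expand the "*".
print(shlex.join(args))
```

       Passing such a list to a process launcher without a shell avoids
       accidental wildcard expansion of filters like +*.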

OPTIONS

   General options:
       -O     path for the mirror, log files, and cache (-O
              path_mirror[,path_cache_and_logfiles]) (--path <param>)


   Action options:
       -w     *mirror web sites (--mirror)

       -W     mirror web sites, semi-automatic (asks questions)
              (--mirror-wizard)

       -g     just get files (saved in the current directory) (--get-files)

       -i     continue an interrupted mirror using the cache (--continue)

       -Y     mirror ALL links located in the first level pages (mirror links)
              (--mirrorlinks)


   Proxy options:
       -P     proxy use (-P proxy:port or -P user:pass@proxy:port) (--proxy
              <param>)

       -%f    *use proxy for ftp (f0 don't use) (--httpproxy-ftp[=N])

       -%b    use this local hostname to make/send requests (-%b hostname)
              (--bind <param>)


   Limits options:
       -rN    set the mirror depth to N (* r9999) (--depth[=N])

       -%eN   set the external links depth to N (* %e0) (--ext-depth[=N])

       -mN    maximum file length for a non-html file (--max-files[=N])

       -mN,N2 maximum file length for non-html (N) and html (N2)

       -MN    maximum overall size that can be downloaded/scanned
              (--max-size[=N])

       -EN    maximum mirror time in seconds (60=1 minute, 3600=1 hour)
              (--max-time[=N])

       -AN    maximum transfer rate in bytes/second (1000=1KB/s max)
              (--max-rate[=N])

       -%cN   maximum number of connections/second (*%c10)
              (--connection-per-second[=N])

       -GN    pause transfer if N bytes reached, and wait until lock file is
              deleted (--max-pause[=N])


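       Because -EN takes seconds and -AN takes bytes per second, it can help
       to derive the values from friendlier units. A minimal Python sketch
       follows; limit_flags is a hypothetical helper, and it uses the
       conversions stated above (60 = 1 minute, 1000 = 1KB/s).

```python
def limit_flags(max_minutes=None, max_kb_per_s=None):
    """Return -E / -A flags computed from minutes and KB/s (hypothetical helper)."""
    flags = []
    if max_minutes is not None:
        flags.append(f"-E{int(max_minutes * 60)}")     # -EN: seconds
    if max_kb_per_s is not None:
        flags.append(f"-A{int(max_kb_per_s * 1000)}")  # -AN: bytes/second
    return flags


# A 30 minute mirror capped at 25 KB/s.
print(" ".join(limit_flags(max_minutes=30, max_kb_per_s=25)))
```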
   Flow control:
       -cN    number of multiple connections (*c8) (--sockets[=N])

       -TN    timeout, number of seconds after which a non-responding link is
              shut down (--timeout[=N])

       -RN    number of retries, in case of timeout or non-fatal errors (*R1)
              (--retries[=N])

       -JN    traffic jam control, minimum transfer rate (bytes/second)
              tolerated for a link (--min-rate[=N])

       -HN    host is abandoned if: 0=never, 1=timeout, 2=slow, 3=timeout or
              slow (--host-control[=N])


   Links options:
       -%P    *extended parsing, attempt to parse all links, even in unknown
              tags or Javascript (%P0 don't use) (--extended-parsing[=N])

       -n     get non-html files 'near' an html file (ex: an image located
              outside) (--near)

       -t     test all URLs (even forbidden ones) (--test)

       -%L    <file> add all URLs located in this text file (one URL per line)
              (--list <param>)

       -%S    <file> add all scan rules located in this text file (one scan
              rule per line) (--urllist <param>)


   Build options:
       -NN    structure type (0 *original structure, 1+: see below)
              (--structure[=N]), or user defined structure (-N
              "%h%p/%n%q.%t")

       -%N    delayed type check, don't make any link test but wait for file
              downloads to start instead (experimental) (%N0 don't use, %N1
              use for unknown extensions, * %N2 always use)

       -%D    cached delayed type check, don't wait for remote type during
              updates, to speed them up (%D0 wait, * %D1 don't wait)
              (--cached-delayed-type-check)

       -%M    generate an RFC MIME-encapsulated full-archive (.mht)
              (--mime-html)

       -LN    long names (L1 *long names / L0 8-3 conversion / L2 ISO9660
              compatible) (--long-names[=N])

       -KN    keep original links (e.g. http://www.adr/link) (K0 *relative
              link, K absolute links, K4 original links, K3 absolute URI
              links, K5 transparent proxy link) (--keep-links[=N])

       -x     replace external html links by error pages (--replace-external)

       -%x    do not include any password for external password protected
              websites (%x0 include) (--disable-passwords)

       -%q    *include query string for local files (useless, for information
              purpose only) (%q0 don't include) (--include-query-string)

       -o     *generate output html file in case of error (404..) (o0 don't
              generate) (--generate-errors)

       -X     *purge old files after update (X0 keep them) (--purge-old[=N])

       -%p    preserve html files 'as is' (identical to -K4 -%F "")
              (--preserve)

       -%T    links conversion to UTF-8 (--utf8-conversion)


   Spider options:
       -bN    accept cookies in cookies.txt (0=do not accept, * 1=accept)
              (--cookies[=N])

       -u     check document type if unknown (cgi,asp..) (u0 don't check, * u1
              check except /, u2 check always) (--check-type[=N])

       -j     *parse Java Classes (j0 don't parse, bitmask: |1 parse default,
              |2 don't parse .class |4 don't parse .js |8 don't be aggressive)
              (--parse-java[=N])

       -sN    follow robots.txt and meta robots tags (0=never, 1=sometimes, *
              2=always, 3=always (even strict rules)) (--robots[=N])

       -%h    force HTTP/1.0 requests (reduce update features, only for old
              servers or proxies) (--http-10)

       -%k    use keep-alive if possible, greatly reducing latency for small
              files and test requests (%k0 don't use) (--keep-alive)

       -%B    tolerant requests (accept bogus responses on some servers, but
              not standard!) (--tolerant)

       -%s    update hacks: various hacks to limit re-transfers when updating
              (identical size, bogus response..) (--updatehack)

       -%u    url hacks: various hacks to limit duplicate URLs (strip //,
              www.foo.com==foo.com..) (--urlhack)

       -%A    assume that a type (cgi,asp..) is always linked with a mime type
              (-%A php3,cgi=text/html;dat,bin=application/x-zip) (--assume
              <param>); can also be used to force a specific file type:
              --assume foo.cgi=text/html

       -@iN   internet protocol (0=both ipv6+ipv4, 4=ipv4 only, 6=ipv6 only)
              (--protocol[=N])

       -%w    disable a specific external mime module (-%w htsswf -%w htsjava)
              (--disable-module <param>)


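       The -%A argument packs several type-to-MIME associations into one
       string. As an illustration of how that syntax reads, here is a minimal
       Python sketch; parse_assume is a hypothetical helper that follows the
       example shown above, and it is not a description of how httrack itself
       parses the option.

```python
def parse_assume(spec):
    """Read an --assume string into {type: mime} (hypothetical helper)."""
    mapping = {}
    for group in spec.split(";"):          # groups are separated by ';'
        if not group:
            continue
        types, _, mime = group.partition("=")
        for file_type in types.split(","):  # several types may share one MIME type
            mapping[file_type.strip()] = mime.strip()
    return mapping


# The example string from the -%A entry.
for file_type, mime in parse_assume(
    "php3,cgi=text/html;dat,bin=application/x-zip"
).items():
    print(file_type, "->", mime)
```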
   Browser ID:
       -F     user-agent field sent in HTTP headers (-F "user-agent name")
              (--user-agent <param>)

       -%R    default referer field sent in HTTP headers (--referer <param>)

       -%E    from email address sent in HTTP headers (--from <param>)

       -%F    footer string in HTML code (-%F "Mirrored [from host %s [file %s
              [at %s]]]") (--footer <param>)

       -%l    preferred language (-%l "fr, en, jp, *") (--language <param>)

       -%a    accepted formats (-%a "text/html,image/png;q=0.9,*/*;q=0.1")
              (--accept <param>)

       -%X    additional HTTP header line (-%X "X-Magic: 42") (--headers
              <param>)


   Log, index, cache:
       -C     create/use a cache for updates and retries (C0 no cache, C1
              cache has priority, * C2 test update before) (--cache[=N])

       -k     store all files in cache (not useful if files on disk)
              (--store-all-in-cache)

       -%n    do not re-download locally erased files (--do-not-recatch)

       -%v    display on screen filenames downloaded (in realtime) - * %v1
              short version - %v2 full animation (--display)

       -Q     no log - quiet mode (--do-not-log)

       -q     no questions - quiet mode (--quiet)

       -z     log - extra infos (--extra-log)

       -Z     log - debug (--debug-log)

       -v     log on screen (--verbose)

       -f     *log in files (--file-log)

       -f2    one single log file (--single-log)

       -I     *make an index (I0 don't make) (--index)

       -%i    make a top index for a project folder (* %i0 don't make)
              (--build-top-index)

       -%I    make a searchable index for this mirror (* %I0 don't make)
              (--search-index)


   Expert options:
       -pN    priority mode: (* p3) (--priority[=N])

       -p0    just scan, don't save anything (for checking links)

       -p1    save only html files

       -p2    save only non html files

       -p3    *save all files

       -p7    get html files before, then treat other files

       -S     stay on the same directory (--stay-on-same-dir)

       -D     *can only go down into subdirs (--can-go-down)

       -U     can only go to upper directories (--can-go-up)

       -B     can both go up&down into the directory structure
              (--can-go-up-and-down)

       -a     *stay on the same address (--stay-on-same-address)

       -d     stay on the same principal domain (--stay-on-same-domain)

       -l     stay on the same TLD (eg: .com) (--stay-on-same-tld)

       -e     go everywhere on the web (--go-everywhere)

       -%H    debug HTTP headers in logfile (--debug-headers)


   Guru options: (do NOT use if possible)
       -#X    *use optimized engine (limited memory boundary checks)
              (--fast-engine)

       -#0    filter test (-#0 '*.gif' 'www.bar.com/foo.gif')
              (--debug-testfilters <param>)

       -#1    simplify test (-#1 ./foo/bar/../foobar)

       -#2    type test (-#2 /foo/bar.php)

       -#C    cache list (-#C '*.com/spider*.gif') (--debug-cache <param>)

       -#R    cache repair (damaged cache) (--repair-cache)

       -#d    debug parser (--debug-parsing)

       -#E    extract new.zip cache meta-data in meta.zip

       -#f    always flush log files (--advanced-flushlogs)

       -#FN   maximum number of filters (--advanced-maxfilters[=N])

       -#h    version info (--version)

       -#K    scan stdin (debug) (--debug-scanstdin)

       -#L    maximum number of links (-#L1000000) (--advanced-maxlinks[=N])

       -#p    display ugly progress information (--advanced-progressinfo)

       -#P    catch URL (--catch-url)

       -#R    old FTP routines (debug) (--repair-cache)

       -#T    generate transfer ops. log every minute (--debug-xfrstats)

       -#u    wait time (--advanced-wait)

       -#Z    generate transfer rate statistics every minute
              (--debug-ratestats)


   Dangerous options: (do NOT use unless you know exactly what you are doing)
       -%!    bypass built-in security limits aimed at avoiding bandwidth
              abuse (bandwidth, simultaneous connections)
              (--disable-security-limits)

              IMPORTANT NOTE: DANGEROUS OPTION, ONLY SUITABLE FOR EXPERTS.
              USE IT WITH EXTREME CARE.


   Command-line specific options:
       -V     execute system command after each file ($0 is the filename: -V
              "rm \$0") (--userdef-cmd <param>)

       -%W    use an external library function as a wrapper (-%W
              myfoo.so[,myparameters]) (--callback <param>)


   Details: Option N
       -N0    Site-structure (default)

       -N1    HTML in web/, images/other files in web/images/

       -N2    HTML in web/HTML, images/other in web/images

       -N3    HTML in web/, images/other in web/

       -N4    HTML in web/, images/other in web/xxx, where xxx is the file
              extension (all gif will be placed onto web/gif, for example)

       -N5    Images/other in web/xxx and HTML in web/HTML

       -N99   All files in web/, with random names (gadget !)

       -N100  Site-structure, without www.domain.xxx/

       -N101  Identical to N1 except that "web" is replaced by the site's name

       -N102  Identical to N2 except that "web" is replaced by the site's name

       -N103  Identical to N3 except that "web" is replaced by the site's name

       -N104  Identical to N4 except that "web" is replaced by the site's name

       -N105  Identical to N5 except that "web" is replaced by the site's name

       -N199  Identical to N99 except that "web" is replaced by the site's name

       -N1001 Identical to N1 except that there is no "web" directory

       -N1002 Identical to N2 except that there is no "web" directory

       -N1003 Identical to N3 except that there is no "web" directory (option
              set for g option)

       -N1004 Identical to N4 except that there is no "web" directory

       -N1005 Identical to N5 except that there is no "web" directory

       -N1099 Identical to N99 except that there is no "web" directory

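       The numbering in this table follows a pattern: a base layout (1 to 5,
       or 99) plus 100 for the "site's name" variant or 1000 for the "no web
       directory" variant. The Python sketch below makes that pattern
       explicit; decode_structure is a hypothetical helper that only covers
       the values listed above.

```python
# Base layouts and top-directory variants, as listed under "Details: Option N".
BASE_LAYOUTS = {
    1: "HTML in web/, images/other files in web/images/",
    2: "HTML in web/HTML, images/other in web/images",
    3: "HTML in web/, images/other in web/",
    4: "HTML in web/, images/other in web/xxx (xxx = file extension)",
    5: "images/other in web/xxx and HTML in web/HTML",
    99: "all files in web/, with random names",
}
TOP_DIR = {
    0: '"web" directory',
    1: '"web" replaced by the site\'s name',
    10: 'no "web" directory',
}


def decode_structure(n):
    """Split a listed -N value into (base layout number, variant description)."""
    if n == 0:
        return 0, "site structure"
    if n == 100:
        return 0, "site structure, without www.domain.xxx/"
    family, base = divmod(n, 100)   # e.g. 1004 -> family 10, base 4
    if family not in TOP_DIR or base not in BASE_LAYOUTS:
        raise ValueError(f"-N{n} is not listed in the table")
    return base, TOP_DIR[family]


for n in (4, 104, 1004):
    base, variant = decode_structure(n)
    print(f"-N{n}: {BASE_LAYOUTS[base]}; {variant}")
```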
   Details: User-defined option N
          %n  Name of file without file type (ex: image)
          %N  Name of file, including file type (ex: image.gif)
          %t  File type (ex: gif)
          %p  Path [without ending /] (ex: /someimages)
          %h  Host name (ex: www.someweb.com)
          %M  URL MD5 (128 bits, 32 ascii bytes)
          %Q  query string MD5 (128 bits, 32 ascii bytes)
          %k  full query string
          %r  protocol name (ex: http)
          %q  small query string MD5 (16 bits, 4 ascii bytes)
          %s? Short name version (ex: %sN)
          %[param]  param variable in query string
          %[param:before:after:empty:notfound]  advanced variable extraction

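       To see how such a pattern maps a URL to a local path, here is a
       minimal Python sketch. expand_structure is a hypothetical helper that
       handles only %n, %N, %t, %p, %h and %r, using the meanings given in
       the list above. The MD5, query string and short-name variables are
       left out because this page does not say how they are derived, and the
       URL is an illustrative assumption.

```python
import posixpath
import re
from urllib.parse import urlsplit


def expand_structure(pattern, url):
    """Expand a subset of the -N variables for one URL (hypothetical helper)."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    stem, ext = posixpath.splitext(filename)
    values = {
        "n": stem,                   # name without file type
        "N": filename,               # name including file type
        "t": ext.lstrip("."),        # file type
        "p": directory.rstrip("/"),  # path without ending /
        "h": parts.netloc,           # host name
        "r": parts.scheme,           # protocol name
    }
    # Variables outside this subset are left untouched in the output.
    return re.sub(r"%([nNtphr])", lambda m: values[m.group(1)], pattern)


print(expand_structure("%h%p/%n.%t",
                       "http://www.someweb.com/someimages/image.gif"))
```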
   Details: User-defined option N and advanced variable extraction
          %[param:before:after:empty:notfound]

       param  parameter name

       before string to prepend if the parameter was found

       after  string to append if the parameter was found

       notfound
              string replacement if the parameter could not be found

       empty  string replacement if the parameter was empty

       All fields, except the first one (the parameter name), can be empty.


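       The field rules above can be sketched in a few lines of Python.
       extract_variables is a hypothetical helper that applies them to a
       URL's query string. It assumes that "before" and "after" are added
       only when the parameter has a non-empty value, which this page does
       not spell out, and the URLs are illustrative.

```python
import re
from urllib.parse import parse_qs, urlsplit

FIELD = re.compile(r"%\[([^\]]*)\]")


def extract_variables(pattern, url):
    """Expand %[param:before:after:empty:notfound] fields (hypothetical helper)."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)

    def replace(match):
        fields = match.group(1).split(":")
        fields += [""] * (5 - len(fields))   # every field but the name may be empty
        name, before, after, empty, notfound = fields[:5]
        if name not in query:
            return notfound                  # parameter could not be found
        value = query[name][0]
        if value == "":
            return empty                     # parameter was present but empty
        return before + value + after        # parameter found

    return FIELD.sub(replace, pattern)


print(extract_variables("%[q:item-:.html:none:missing]",
                        "http://www.foobar.com/folder/foo.cgi?q=45"))
```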
   Details: Option K
       -K0    foo.cgi?q=45 -> foo4B54.html?q=45 (relative URI, default)

       -K     -> http://www.foobar.com/folder/foo.cgi?q=45 (absolute URL)
              (--keep-links[=N])

       -K3    -> /folder/foo.cgi?q=45 (absolute URI)

       -K4    -> foo.cgi?q=45 (original URL)

       -K5    -> http://www.foobar.com/folder/foo4B54.html?q=45 (transparent
              proxy URL)


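       The K, K3 and K4 forms differ only in how much of the original URL is
       kept. The Python sketch below reproduces those three forms for the
       example link; link_forms is a hypothetical helper, and the page name
       index.html is an assumption. K0 and K5 are omitted because they
       depend on the local file name that httrack generates.

```python
from urllib.parse import urljoin, urlsplit


def link_forms(page_url, href):
    """Return the K, K3 and K4 forms of a link found on page_url (a sketch)."""
    absolute = urljoin(page_url, href)
    parts = urlsplit(absolute)
    uri = parts.path + ("?" + parts.query if parts.query else "")
    return {
        "K": absolute,   # absolute URL
        "K3": uri,       # absolute URI (no scheme or host)
        "K4": href,      # original link, unchanged
    }


forms = link_forms("http://www.foobar.com/folder/index.html", "foo.cgi?q=45")
for option, link in forms.items():
    print(f"-{option}: {link}")
```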
   Shortcuts:
       --mirror <URLs>
              *make a mirror of site(s) (default)

       --get <URLs>
              get the files indicated, do not seek other URLs (-qg)

       --list <text file>
              add all URLs located in this text file (-%L)

       --mirrorlinks <URLs>
              mirror all links in 1st level pages (-Y)

       --testlinks <URLs>
              test links in pages (-r1p0C0I0t)

       --spider <URLs>
              spider site(s), to test links: reports Errors & Warnings
              (-p0C0I0t)

       --testsite <URLs>
              identical to --spider

       --skeleton <URLs>
              make a mirror, but gets only html files (-p1)

       --update
              update a mirror, without confirmation (-iC2)

       --continue
              continue a mirror, without confirmation (-iC1)

       --catchurl
              create a temporary proxy to capture a URL or a form post URL

       --clean
              erase cache & log files

       --http10
              force http/1.0 requests (-%h)


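       Each shortcut with a parenthesised equivalent above is shorthand for
       a group of short options. As a reading aid, the Python sketch below
       records those equivalences in a table; expand_shortcuts is a
       hypothetical helper and does not reflect how httrack processes its
       arguments internally.

```python
# Equivalences copied from the Shortcuts list; entries without a documented
# equivalent (--mirror, --catchurl, --clean) are intentionally left out.
SHORTCUTS = {
    "--get": "-qg",
    "--list": "-%L",
    "--mirrorlinks": "-Y",
    "--testlinks": "-r1p0C0I0t",
    "--spider": "-p0C0I0t",
    "--testsite": "-p0C0I0t",   # documented as identical to --spider
    "--skeleton": "-p1",
    "--update": "-iC2",
    "--continue": "-iC1",
    "--http10": "-%h",
}


def expand_shortcuts(args):
    """Replace known shortcuts by their documented short-option form."""
    return [SHORTCUTS.get(arg, arg) for arg in args]


print(" ".join(expand_shortcuts(["httrack", "--update"])))
```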
   Details: Option %W: External callbacks prototypes
       see htsdefines.h

FILES

       /etc/httrack.conf
              The system-wide configuration file.


ENVIRONMENT

       HOME   Used if you defined the line path ~/websites/# in
              /etc/httrack.conf


DIAGNOSTICS

       Errors/Warnings are reported to hts-log.txt by default, or to stderr if
       the -v option was specified.


LIMITS

       These are the principal limits of HTTrack at the moment. Note that we
       have not heard of any other utility that has solved them.


       - Files whose names are generated by complex scripts may not be found
       (ex: img.src='image'+a+Mobj.dst+'.gif')

       - Some java classes may not find some of their files (class included)

       - Cgi-bin links may not work properly in some cases (parameters
       needed). To avoid them: use filters like -*cgi-bin*


BUGS

       Please report bugs to <bugs@httrack.com>. Include a complete,
       self-contained example that will allow the bug to be reproduced, and
       say which version of httrack you are using. Do not forget to detail
       options used, OS version, and any other information you deem necessary.

COPYRIGHT

       Copyright (C) 1998-2017 Xavier Roche and other contributors

       This program is free software: you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation, either version 3 of the License, or (at your
       option) any later version.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
       General Public License for more details.

       You should have received a copy of the GNU General Public License along
       with this program. If not, see <http://www.gnu.org/licenses/>.


AVAILABILITY

       The most recent released version of httrack can be found at:
       http://www.httrack.com


AUTHOR

       Xavier Roche <roche@httrack.com>


SEE ALSO

       The HTML documentation (available online at
       http://www.httrack.com/html/ ) contains more detailed information.
       Please also refer to the httrack FAQ (available online at
       http://www.httrack.com/html/faq.html )

httrack website copier            20 May 2017                       httrack(1)