httrack(1)                  General Commands Manual                 httrack(1)


NAME
       httrack - offline browser: copy websites to a local directory

SYNOPSIS
       httrack [ url ]... [ -filter ]... [ +filter ]... [ -O, --path ]
       [ -%O, --chroot ] [ -w, --mirror ] [ -W, --mirror-wizard ]
       [ -g, --get-files ] [ -i, --continue ] [ -Y, --mirrorlinks ]
       [ -P, --proxy ] [ -%f, --httpproxy-ftp[=N] ] [ -%b, --bind ]
       [ -rN, --depth[=N] ] [ -%eN, --ext-depth[=N] ] [ -mN, --max-files[=N] ]
       [ -MN, --max-size[=N] ] [ -EN, --max-time[=N] ] [ -AN, --max-rate[=N] ]
       [ -%cN, --connection-per-second[=N] ] [ -GN, --max-pause[=N] ]
       [ -%mN, --max-mms-time[=N] ] [ -cN, --sockets[=N] ] [ -TN, --timeout ]
       [ -RN, --retries[=N] ] [ -JN, --min-rate[=N] ]
       [ -HN, --host-control[=N] ] [ -%P, --extended-parsing[=N] ]
       [ -n, --near ] [ -t, --test ] [ -%L, --list ] [ -%S, --urllist ]
       [ -NN, --structure[=N] ] [ -%D, --cached-delayed-type-check ]
       [ -%M, --mime-html ] [ -LN, --long-names[=N] ] [ -KN, --keep-links[=N] ]
       [ -x, --replace-external ] [ -%x, --disable-passwords ]
       [ -%q, --include-query-string ] [ -o, --generate-errors ]
       [ -X, --purge-old[=N] ] [ -%p, --preserve ] [ -bN, --cookies[=N] ]
       [ -u, --check-type[=N] ] [ -j, --parse-java[=N] ] [ -sN, --robots[=N] ]
       [ -%h, --http-10 ] [ -%k, --keep-alive ] [ -%B, --tolerant ]
       [ -%s, --updatehack ] [ -%u, --urlhack ] [ -%A, --assume ]
       [ -@iN, --protocol[=N] ] [ -%w, --disable-module ] [ -F, --user-agent ]
       [ -%R, --referer ] [ -%E, --from ] [ -%F, --footer ] [ -%l, --language ]
       [ -C, --cache[=N] ] [ -k, --store-all-in-cache ] [ -%n, --do-not-recatch ]
       [ -%v, --display ] [ -Q, --do-not-log ] [ -q, --quiet ] [ -z, --extra-log ]
       [ -Z, --debug-log ] [ -v, --verbose ] [ -f, --file-log ]
       [ -f2, --single-log ] [ -I, --index ] [ -%i, --build-top-index ]
       [ -%I, --search-index ] [ -pN, --priority[=N] ] [ -S, --stay-on-same-dir ]
       [ -D, --can-go-down ] [ -U, --can-go-up ] [ -B, --can-go-up-and-down ]
       [ -a, --stay-on-same-address ] [ -d, --stay-on-same-domain ]
       [ -l, --stay-on-same-tld ] [ -e, --go-everywhere ] [ -%H, --debug-headers ]
       [ -%!, --disable-security-limits ] [ -V, --userdef-cmd ] [ -%U, --user ]
       [ -%W, --callback ] [ -K, --keep-links[=N] ]

DESCRIPTION
       httrack allows you to download a World Wide Web site from the Internet
       to a local directory, building all directories recursively and getting
       HTML, images, and other files from the server onto your computer.
       HTTrack arranges the original site's relative link structure: simply
       open a page of the "mirrored" website in your browser and you can
       browse the site from link to link, as if you were viewing it online.
       HTTrack can also update an existing mirrored site and resume
       interrupted downloads.

EXAMPLES
       httrack www.someweb.com/bob/
              mirror site www.someweb.com/bob/ and only this site

       httrack www.someweb.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg -mime:application/*
              mirror the two sites together (with shared links) and accept
              any .jpg files on .com sites

       httrack www.someweb.com/bob/bobby.html +* -r6
              get all files starting from bobby.html, with a link depth of 6,
              and the possibility of going anywhere on the web

       httrack www.someweb.com/bob/bobby.html --spider -P proxy.myhost.com:8080
              runs the spider on www.someweb.com/bob/bobby.html using a proxy

       httrack --update
              updates a mirror in the current folder

       httrack
              will bring you to the interactive mode

       httrack --continue
              continues a mirror in the current folder

OPTIONS
   General options:
       -O     path for mirror/logfiles+cache (-O path mirror[,path cache and
              logfiles]) (--path <param>)

       -%O    chroot to this path; must be root (-%O root path)
              (--chroot <param>)

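       For example, to store a mirror under a specific directory (the URL and
       output path below are illustrative placeholders):

       httrack www.example.com -O /home/user/mirrors/example
              mirror www.example.com, storing downloaded files, cache and log
              files under /home/user/mirrors/example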

   Action options:
       -w     *mirror web sites (--mirror)

       -W     mirror web sites, semi-automatic (asks questions)
              (--mirror-wizard)

       -g     just get files (saved in the current directory) (--get-files)

       -i     continue an interrupted mirror using the cache (--continue)

       -Y     mirror ALL links located in the first-level pages (mirror
              links) (--mirrorlinks)


   Proxy options:
       -P     proxy use (-P proxy:port or -P user:pass@proxy:port)
              (--proxy <param>)

       -%f    *use proxy for ftp (f0 don't use) (--httpproxy-ftp[=N])

       -%b    use this local hostname to make/send requests (-%b hostname)
              (--bind <param>)

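       For example (illustrative values; the proxy host, port and credentials
       are placeholders):

       httrack www.example.com -P user:password@proxy.example.com:3128 -%f0
              mirror www.example.com through an authenticated proxy, without
              using the proxy for ftp transfers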

   Limits options:
       -rN    set the mirror depth to N (* r9999) (--depth[=N])

       -%eN   set the external links depth to N (* %e0) (--ext-depth[=N])

       -mN    maximum file length for a non-html file (--max-files[=N])

       -mN,N2 maximum file length for non-html (N) and html (N2) files

       -MN    maximum overall size that can be downloaded/scanned
              (--max-size[=N])

       -EN    maximum mirror time in seconds (60=1 minute, 3600=1 hour)
              (--max-time[=N])

       -AN    maximum transfer rate in bytes/second (1000=1KB/s max)
              (--max-rate[=N])

       -%cN   maximum number of connections/second (*%c10)
              (--connection-per-second[=N])

       -GN    pause transfer if N bytes reached, and wait until the lock file
              is deleted (--max-pause[=N])

       -%mN   maximum mms stream download time in seconds (60=1 minute,
              3600=1 hour) (--max-mms-time[=N])

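       As an illustration (the URL and values are arbitrary placeholders),
       these limits can be combined on one command line:

       httrack www.example.com -r4 -M100000000 -A25000 -E3600
              mirror www.example.com with a link depth of 4, an overall size
              limit of about 100 MB, a transfer rate cap of 25 KB/s and a
              one-hour time limit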

   Flow control:
       -cN    number of multiple connections (*c8) (--sockets[=N])

       -TN    timeout, number of seconds after which a non-responding link is
              shut down (--timeout)

       -RN    number of retries, in case of timeout or non-fatal errors (*R1)
              (--retries[=N])

       -JN    traffic jam control, minimum transfer rate (bytes/second)
              tolerated for a link (--min-rate[=N])

       -HN    host is abandoned if: 0=never, 1=timeout, 2=slow, 3=timeout or
              slow (--host-control[=N])

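       For example (an illustrative invocation with placeholder values):

       httrack www.example.com -c4 -T30 -R2 -J5000 -H3
              use 4 simultaneous connections, a 30-second timeout, 2 retries,
              a minimum tolerated rate of 5000 bytes/second, and abandon
              hosts that time out or become too slow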

   Links options:
       -%P    *extended parsing, attempt to parse all links, even in unknown
              tags or Javascript (%P0 don't use) (--extended-parsing[=N])

       -n     get non-html files located near an html file (e.g. an image
              located outside) (--near)

       -t     test all URLs (even forbidden ones) (--test)

       -%L <file>
              add all URLs located in this text file (one URL per line)
              (--list <param>)

       -%S <file>
              add all scan rules located in this text file (one scan rule per
              line) (--urllist <param>)

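       For instance (urls.txt is a hypothetical file containing one URL per
       line):

       httrack --list urls.txt -n
              mirror every URL listed in urls.txt, also fetching non-html
              files referenced near the downloaded html pages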

   Build options:
       -NN    structure type (0 *original structure, 1+: see below), or
              user-defined structure (-N "%h%p/%n%q.%t") (--structure[=N])

       -%N    delayed type check: don't make any link test, but wait for
              files to start downloading instead (experimental) (%N0 don't
              use, %N1 use for unknown extensions, * %N2 always use)

       -%D    cached delayed type check: don't wait for the remote type
              during updates, to speed them up (%D0 wait, * %D1 don't wait)
              (--cached-delayed-type-check)

       -%M    generate an RFC MIME-encapsulated full archive (.mht)
              (--mime-html)

       -LN    long names (L1 *long names / L0 8-3 conversion / L2 ISO9660
              compatible) (--long-names[=N])

       -KN    keep original links (e.g. http://www.adr/link) (K0 *relative
              link, K absolute links, K4 original links, K3 absolute URI
              links) (--keep-links[=N])

       -x     replace external html links by error pages (--replace-external)

       -%x    do not include any password for external password-protected
              websites (%x0 include) (--disable-passwords)

       -%q    *include query string for local files (useless, for information
              purposes only) (%q0 don't include) (--include-query-string)

       -o     *generate an output html file in case of error (404..) (o0
              don't generate) (--generate-errors)

       -X     *purge old files after update (X0 keep them) (--purge-old[=N])

       -%p    preserve html files as-is (identical to -K4 -%F "") (--preserve)

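       For example (www.example.com is a placeholder), structure, naming and
       link-rewriting options can be combined:

       httrack www.example.com -N1 -L0 -K0
              mirror www.example.com with html files in web/ and images in
              web/images/, 8-3 filename conversion, and relative links (the
              default link style)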

   Spider options:
       -bN    accept cookies in cookies.txt (0=do not accept, *1=accept)
              (--cookies[=N])

       -u     check document type if unknown (cgi,asp..) (u0 don't check,
              * u1 check but /, u2 check always) (--check-type[=N])

       -j     *parse Java Classes (j0 don't parse; bitmask: |1 parse default,
              |2 don't parse .class, |4 don't parse .js, |8 don't be
              aggressive) (--parse-java[=N])

       -sN    follow robots.txt and meta robots tags (0=never, 1=sometimes,
              *2=always, 3=always (even strict rules)) (--robots[=N])

       -%h    force HTTP/1.0 requests (reduces update features, only for old
              servers or proxies) (--http-10)

       -%k    use keep-alive if possible, greatly reducing latency for small
              files and test requests (%k0 don't use) (--keep-alive)

       -%B    tolerant requests (accept bogus responses on some servers, but
              not standard!) (--tolerant)

       -%s    update hacks: various hacks to limit re-transfers when updating
              (identical size, bogus response..) (--updatehack)

       -%u    url hacks: various hacks to limit duplicate URLs (strip //,
              www.foo.com==foo.com..) (--urlhack)

       -%A    assume that a type (cgi,asp..) is always linked with a mime
              type (-%A php3,cgi=text/html;dat,bin=application/x-zip)
              (--assume <param>)
              can also be used to force a specific file type:
              --assume foo.cgi=text/html

       -@iN   internet protocol (0=both ipv6+ipv4, 4=ipv4 only, 6=ipv6 only)
              (--protocol[=N])

       -%w    disable a specific external mime module (-%w htsswf -%w
              htsjava) (--disable-module <param>)

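       An illustrative combination of these options (the site and type
       mapping are placeholders):

       httrack www.example.com -%k -b1 -%A cgi,php=text/html
              mirror www.example.com using keep-alive connections, accepting
              cookies, and treating .cgi and .php links as html documents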

   Browser ID:
       -F     user-agent field sent in HTTP headers (-F "user-agent name")
              (--user-agent <param>)

       -%R    default referer field sent in HTTP headers (--referer <param>)

       -%E    from email address sent in HTTP headers (--from <param>)

       -%F    footer string in html code (-%F "Mirrored [from host %s [file
              %s [at %s]]]") (--footer <param>)

       -%l    preferred language (-%l "fr, en, jp, *") (--language <param>)

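       For example (the user-agent string and site below are arbitrary
       placeholders):

       httrack www.example.com -F "Mozilla/5.0 (compatible; MyMirror/1.0)" -%l "en, fr, *"
              identify the crawler with a custom user-agent string and
              request English, then French, content when the server offers a
              choice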

   Log, index, cache options:
       -C     create/use a cache for updates and retries (C0 no cache,
              C1 cache has priority, *C2 test update before) (--cache[=N])

       -k     store all files in cache (not useful if files are on disk)
              (--store-all-in-cache)

       -%n    do not re-download locally erased files (--do-not-recatch)

       -%v    display downloaded filenames on screen (in real time) -
              *%v1 short version, %v2 full animation (--display)

       -Q     no log - quiet mode (--do-not-log)

       -q     no questions - quiet mode (--quiet)

       -z     log - extra infos (--extra-log)

       -Z     log - debug (--debug-log)

       -v     log on screen (--verbose)

       -f     *log in files (--file-log)

       -f2    one single log file (--single-log)

       -I     *make an index (I0 don't make) (--index)

       -%i    make a top index for a project folder (* %i0 don't make)
              (--build-top-index)

       -%I    make a searchable index for this mirror (* %I0 don't make)
              (--search-index)

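       As an illustration (placeholder site):

       httrack www.example.com -C2 -f2 -%v
              mirror www.example.com using the cache to test for updates,
              writing a single log file, and showing downloaded filenames on
              screen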

   Expert options:
       -pN    priority mode (* p3) (--priority[=N])

       -p0    just scan, don't save anything (for checking links)

       -p1    save only html files

       -p2    save only non-html files

       -p3    *save all files

       -p7    get html files first, then treat other files

       -S     stay on the same directory (--stay-on-same-dir)

       -D     *can only go down into subdirs (--can-go-down)

       -U     can only go to upper directories (--can-go-up)

       -B     can both go up and down into the directory structure
              (--can-go-up-and-down)

       -a     *stay on the same address (--stay-on-same-address)

       -d     stay on the same principal domain (--stay-on-same-domain)

       -l     stay on the same TLD (e.g. .com) (--stay-on-same-tld)

       -e     go everywhere on the web (--go-everywhere)

       -%H    debug HTTP headers in logfile (--debug-headers)

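       For example (placeholder URL), the scope and priority options control
       how far the crawl may wander and what gets saved:

       httrack www.example.com/docs/ -D -p1
              starting from www.example.com/docs/, only descend into
              subdirectories and save html files only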

   Guru options: (do NOT use if possible)
       -#X    *use optimized engine (limited memory boundary checks)
              (--fast-engine)

       -#0    filter test (-#0 *.gif www.bar.com/foo.gif)
              (--debug-testfilters <param>)

       -#1    simplify test (-#1 ./foo/bar/../foobar)

       -#2    type test (-#2 /foo/bar.php)

       -#C    cache list (-#C *.com/spider*.gif) (--debug-cache <param>)

       -#R    cache repair (damaged cache) (--repair-cache)

       -#d    debug parser (--debug-parsing)

       -#E    extract new.zip cache meta-data into meta.zip

       -#f    always flush log files (--advanced-flushlogs)

       -#FN   maximum number of filters (--advanced-maxfilters[=N])

       -#h    version info (--version)

       -#K    scan stdin (debug) (--debug-scanstdin)

       -#L    maximum number of links (-#L1000000) (--advanced-maxlinks)

       -#p    display ugly progress information (--advanced-progressinfo)

       -#P    catch URL (--catch-url)

       -#R    old FTP routines (debug) (--repair-cache)

       -#T    generate transfer ops. log every minute (--debug-xfrstats)

       -#u    wait time (--advanced-wait)

       -#Z    generate transfer rate statistics every minute
              (--debug-ratestats)

       -#!    execute a shell command (-#! "echo hello") (--exec <param>)

   Dangerous options: (do NOT use unless you know exactly what you are doing)
       -%!    bypass built-in security limits aimed at avoiding bandwidth
              abuse (bandwidth, simultaneous connections)
              (--disable-security-limits)

              IMPORTANT NOTE: DANGEROUS OPTION, ONLY SUITABLE FOR EXPERTS.
              USE IT WITH EXTREME CARE.

   Command-line specific options:
       -V     execute a system command after each file ($0 is the filename:
              -V "rm $0") (--userdef-cmd <param>)

       -%U    run the engine with another id when called as root (-%U smith)
              (--user <param>)

       -%W    use an external library function as a wrapper
              (-%W myfoo.so[,myparameters]) (--callback <param>)

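       An illustrative use of these options (the user name and the command
       run on each file are placeholders):

       httrack www.example.com -%U nobody -V "chmod 644 $0"
              when started as root, run the engine as the 'nobody' user and
              make every downloaded file world-readable after it is saved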

   Details: Option N
       -N0    Site-structure (default)

       -N1    HTML in web/, images/other files in web/images/

       -N2    HTML in web/HTML, images/other in web/images

       -N3    HTML in web/, images/other in web/

       -N4    HTML in web/, images/other in web/xxx, where xxx is the file
              extension (all gif files will be placed into web/gif, for
              example)

       -N5    Images/other in web/xxx and HTML in web/HTML

       -N99   All files in web/, with random names (gadget!)

       -N100  Site-structure, without www.domain.xxx/

       -N101  Identical to N1 except that "web" is replaced by the site's name

       -N102  Identical to N2 except that "web" is replaced by the site's name

       -N103  Identical to N3 except that "web" is replaced by the site's name

       -N104  Identical to N4 except that "web" is replaced by the site's name

       -N105  Identical to N5 except that "web" is replaced by the site's name

       -N199  Identical to N99 except that "web" is replaced by the site's name

       -N1001 Identical to N1 except that there is no "web" directory

       -N1002 Identical to N2 except that there is no "web" directory

       -N1003 Identical to N3 except that there is no "web" directory (option
              set for the -g option)

       -N1004 Identical to N4 except that there is no "web" directory

       -N1005 Identical to N5 except that there is no "web" directory

       -N1099 Identical to N99 except that there is no "web" directory

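       For example (placeholder site), to group downloaded files by extension
       under a directory named after the site:

       httrack www.example.com -N104
              mirror www.example.com with html files in a directory named
              after the site and other files sorted into per-extension
              subdirectories (gif/, jpg/, ...)
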
   Details: User-defined option N
       %n     Name of file without file type (ex: image)
       %N     Name of file, including file type (ex: image.gif)
       %t     File type (ex: gif)
       %p     Path [without ending /] (ex: /someimages)
       %h     Host name (ex: www.someweb.com)
       %M     URL MD5 (128 bits, 32 ascii bytes)
       %Q     query string MD5 (128 bits, 32 ascii bytes)
       %r     protocol name (ex: http)
       %q     small query string MD5 (16 bits, 4 ascii bytes)
       %s?    Short name version (ex: %sN)
       %[param]
              param variable in query string
       %[param:before:after:empty:notfound]
              advanced variable extraction

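       For example, the user-defined structure string shown above can be
       passed directly to -N (the URL is a placeholder):

       httrack www.example.com -N "%h%p/%n%q.%t"
              store each file as host/path/name plus a short query-string
              hash and the file extension, e.g.
              www.example.com/someimages/image4B54.gif
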
   Details: User-defined option N and advanced variable extraction
       %[param:before:after:empty:notfound]

       param     parameter name

       before    string to prepend if the parameter was found

       after     string to append if the parameter was found

       notfound  string replacement if the parameter could not be found

       empty     string replacement if the parameter was empty

       All fields, except the first one (the parameter name), can be empty.

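       A minimal sketch of this syntax, assuming the field order shown in the
       pattern above (the parameter name 'id' and the layout are illustrative):

       httrack "www.example.com/page.php?id=42" -N "%h%p/%n-%[id:::noid:noid].%t"
              name each saved page after the value of its 'id' query
              parameter, using "noid" when the parameter is empty or absent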

   Details: Option K
       -K0    foo.cgi?q=45 -> foo4B54.html?q=45 (relative URI, default)

       -K     -> http://www.foobar.com/folder/foo.cgi?q=45 (absolute URL)
              (--keep-links[=N])

       -K4    -> foo.cgi?q=45 (original URL)

       -K3    -> /folder/foo.cgi?q=45 (absolute URI)

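       For example (placeholder site), to keep links pointing at the original
       URLs rather than rewriting them for the local copy:

       httrack www.example.com -K4
              mirror www.example.com while leaving every link exactly as it
              appears in the original pages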

   Shortcuts:
       --mirror <URLs>
              *make a mirror of site(s) (default)

       --get <URLs>
              get the files indicated, do not seek other URLs (-qg)

       --list <text file>
              add all URLs located in this text file (-%L)

       --mirrorlinks <URLs>
              mirror all links in 1st level pages (-Y)

       --testlinks <URLs>
              test links in pages (-r1p0C0I0t)

       --spider <URLs>
              spider site(s), to test links: reports Errors & Warnings
              (-p0C0I0t)

       --testsite <URLs>
              identical to --spider

       --skeleton <URLs>
              make a mirror, but get only html files (-p1)

       --update
              update a mirror, without confirmation (-iC2)

       --continue
              continue a mirror, without confirmation (-iC1)

       --catchurl
              create a temporary proxy to capture a URL or a form post URL

       --clean
              erase cache & log files

       --http10
              force http/1.0 requests (-%h)

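       For example (placeholder URL):

       httrack --skeleton www.example.com/docs/
              download only the html pages of www.example.com/docs/, skipping
              images and other binary files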

   Details: Option %W: External callbacks prototypes
       see htsdefines.h

FILES
       /etc/httrack.conf
              The system-wide configuration file.

ENVIRONMENT VARIABLES
       HOME   used if you defined in /etc/httrack.conf the line:
              path ~/websites/#

OUTPUT
       Errors/Warnings are reported to hts-log.txt by default, or to stderr
       if the -v option was specified.

LIMITATIONS
       These are the principal limitations of HTTrack at the moment. Note
       that we have not heard of any other utility that has solved them.

       - Scripts generating complex filenames may produce links that cannot
         be found (ex: img.src='image'+a+Mobj.dst+'.gif')

       - Some Java classes may reference files (including other classes) that
         cannot be detected

       - Cgi-bin links may not work properly in some cases (parameters
         needed). To avoid them, use filters like -*cgi-bin*

BUGS
       Please report bugs to <bugs@httrack.com>. Include a complete,
       self-contained example that will allow the bug to be reproduced, and
       say which version of httrack you are using. Do not forget to detail
       the options used, the OS version, and any other information you deem
       necessary.

COPYRIGHT
       Copyright (C) Xavier Roche and other contributors

       This program is free software; you can redistribute it and/or modify
       it under the terms of the GNU General Public License as published by
       the Free Software Foundation; either version 2 of the License, or any
       later version.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
       General Public License for more details.

       You should have received a copy of the GNU General Public License
       along with this program; if not, write to the Free Software
       Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307,
       USA.

AVAILABILITY
       The most recent released version of httrack can be found at:
       http://www.httrack.com

AUTHOR
       Xavier Roche <roche@httrack.com>

SEE ALSO
       The HTML documentation (available online at
       http://www.httrack.com/html/) contains more detailed information.
       Please also refer to the httrack FAQ (available online at
       http://www.httrack.com/html/faq.html).


httrack website copier               Jun 2007                       httrack(1)