1httrack(1) General Commands Manual httrack(1)
2
3
4
5 NAME
6 httrack - offline browser : copy websites to a local directory
7
8 SYNOPSIS
9 httrack [ url ]... [ -filter ]... [ +filter ]... [ -O, --path ] [ -%O,
10 --chroot ] [ -w, --mirror ] [ -W, --mirror-wizard ] [ -g, --get-files ]
11 [ -i, --continue ] [ -Y, --mirrorlinks ] [ -P, --proxy ] [ -%f,
12 --httpproxy-ftp[=N] ] [ -%b, --bind ] [ -rN, --depth[=N] ] [ -%eN,
13 --ext-depth[=N] ] [ -mN, --max-files[=N] ] [ -MN, --max-size[=N] ] [
14 -EN, --max-time[=N] ] [ -AN, --max-rate[=N] ] [ -%cN,
15 --connection-per-second[=N] ] [ -GN, --max-pause[=N] ] [ -%mN,
16 --max-mms-time[=N] ] [ -cN, --sockets[=N] ] [ -TN, --timeout ] [ -RN,
17 --retries[=N] ] [ -JN, --min-rate[=N] ] [ -HN, --host-control[=N] ] [
18 -%P, --extended-parsing[=N] ] [ -n, --near ] [ -t, --test ] [ -%L,
19 --list ] [ -%S, --urllist ] [ -NN, --structure[=N] ] [ -%D,
20 --cached-delayed-type-check ] [ -%M, --mime-html ] [ -LN,
21 --long-names[=N] ] [ -KN, --keep-links[=N] ] [ -x, --replace-external ]
22 [ -%x, --disable-passwords ] [ -%q, --include-query-string ] [ -o,
23 --generate-errors ] [ -X, --purge-old[=N] ] [ -%p, --preserve ] [ -bN,
24 --cookies[=N] ] [ -u, --check-type[=N] ] [ -j, --parse-java[=N] ] [
25 -sN, --robots[=N] ] [ -%h, --http-10 ] [ -%k, --keep-alive ] [ -%B,
26 --tolerant ] [ -%s, --updatehack ] [ -%u, --urlhack ] [ -%A, --assume ]
27 [ -@iN, --protocol[=N] ] [ -%w, --disable-module ] [ -F, --user-agent ]
28 [ -%R, --referer ] [ -%E, --from ] [ -%F, --footer ] [ -%l, --language
29 ] [ -C, --cache[=N] ] [ -k, --store-all-in-cache ] [ -%n,
30 --do-not-recatch ] [ -%v, --display ] [ -Q, --do-not-log ] [ -q,
31 --quiet ] [ -z, --extra-log ] [ -Z, --debug-log ] [ -v, --verbose ] [
32 -f, --file-log ] [ -f2, --single-log ] [ -I, --index ] [ -%i,
33 --build-top-index ] [ -%I, --search-index ] [ -pN, --priority[=N] ] [
34 -S, --stay-on-same-dir ] [ -D, --can-go-down ] [ -U, --can-go-up ] [
35 -B, --can-go-up-and-down ] [ -a, --stay-on-same-address ] [ -d,
36 --stay-on-same-domain ] [ -l, --stay-on-same-tld ] [ -e,
37 --go-everywhere ] [ -%H, --debug-headers ] [ -%!, --disable-security-limits ] [
38 -V, --userdef-cmd ] [ -%U, --user ] [ -%W, --callback ]
40
41 DESCRIPTION
42 httrack allows you to download a World Wide Web site from the Internet
43 to a local directory, building recursively all directories, getting
44 HTML, images, and other files from the server to your computer. HTTrack
45 arranges the original site's relative link-structure. Simply open a
46 page of the "mirrored" website in your browser, and you can browse the
47 site from link to link, as if you were viewing it online. HTTrack can
48 also update an existing mirrored site, and resume interrupted down‐
49 loads.
50
51 EXAMPLES
52 httrack www.someweb.com/bob/
53 mirror site www.someweb.com/bob/ and only this site
54
55 httrack www.someweb.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg
56 -mime:application/*
57 mirror the two sites together (with shared links), accept any
58 .jpg files on .com sites, and reject application/* MIME types
59
60 httrack www.someweb.com/bob/bobby.html +* -r6
61 get all files starting from bobby.html, to a link depth of 6,
62 with the possibility of going anywhere on the web
63
64 httrack www.someweb.com/bob/bobby.html --spider -P
65 proxy.myhost.com:8080
66 runs the spider on www.someweb.com/bob/bobby.html using a proxy
67
68 httrack --update
69 updates a mirror in the current folder
70
71 httrack
72 will bring you to the interactive mode
73
74 httrack --continue
75 continues a mirror in the current folder
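
When httrack is driven from a script, invocations like the ones above can be assembled as argument lists rather than shell strings, which sidesteps quoting problems with filters such as +*.com/*.jpg. The following is only a sketch: the helper name build_mirror_command is hypothetical, an httrack binary on PATH is assumed, and the command is printed rather than executed.

```python
import shlex


def build_mirror_command(urls, filters=(), options=()):
    """Return an argv list for httrack: URLs first, then filters, then options.

    Hypothetical helper for illustration; it only assembles arguments and
    never contacts the network.
    """
    return ["httrack", *urls, *filters, *options]


# Second example from this page: two sites plus an accept and a reject filter.
cmd = build_mirror_command(
    ["www.someweb.com/bob/", "www.anothertest.com/mike/"],
    filters=["+*.com/*.jpg", "-mime:application/*"],
)

# shlex.join quotes the filters so a shell would not expand the wildcards.
print(shlex.join(cmd))

# To actually run it (assumes httrack is installed):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the list directly to subprocess.run keeps each filter as a single argument, so no shell escaping is needed at all.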
76
77 OPTIONS
78 General options:
79 -O path for mirror/logfiles+cache (-O path mirror[,path cache and
80 logfiles]) (--path <param>)
81
82 -%O chroot to this path; must be run as root (-%O root path) (--chroot <param>)
83
84
85 Action options:
86 -w *mirror web sites (--mirror)
87
88 -W mirror web sites, semi-automatic (asks questions)
89 (--mirror-wizard)
90
91 -g just get files (saved in the current directory) (--get-files)
92
93 -i continue an interrupted mirror using the cache (--continue)
94
95 -Y mirror ALL links located in the first level pages (mirror links)
96 (--mirrorlinks)
97
98
99 Proxy options:
100 -P proxy use (-P proxy:port or -P user:pass@proxy:port) (--proxy
101 <param>)
102
103 -%f *use proxy for ftp (%f0 don't use) (--httpproxy-ftp[=N])
104
105 -%b use this local hostname to make/send requests (-%b hostname)
106 (--bind <param>)
107
108
109 Limits options:
110 -rN set the mirror depth to N (* r9999) (--depth[=N])
111
112 -%eN set the external links depth to N (* %e0) (--ext-depth[=N])
113
114 -mN maximum file length for a non-html file (--max-files[=N])
115
116 -mN,N2 maximum file length for non html (N) and html (N2)
117
118 -MN maximum overall size that can be downloaded/scanned
119 (--max-size[=N])
120
121 -EN maximum mirror time in seconds (60=1 minute, 3600=1 hour)
122 (--max-time[=N])
123
124 -AN maximum transfer rate in bytes/second (1000=1KB/s max)
125 (--max-rate[=N])
126
127 -%cN maximum number of connections/second (*%c10)
128 (--connection-per-second[=N])
129
130 -GN pause transfer if N bytes reached, and wait until lock file is
131 deleted (--max-pause[=N])
132
133 -%mN maximum mms stream download time in seconds (60=1 minute, 3600=1
134 hour) (--max-mms-time[=N])
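
The limit options all follow the same "letter plus number" pattern, so a wrapper script can derive them from named settings. A minimal sketch, assuming a hypothetical helper limit_flags in which None means "keep httrack's default":

```python
def limit_flags(depth=None, max_size=None, max_time=None, max_rate=None):
    """Translate keyword limits into the short options listed above.

    Hypothetical helper: a value of None leaves the option out entirely.
    """
    mapping = [
        ("-r", depth),     # mirror depth
        ("-M", max_size),  # maximum overall size
        ("-E", max_time),  # maximum mirror time, in seconds
        ("-A", max_rate),  # maximum transfer rate, in bytes/second
    ]
    return [f"{flag}{value}" for flag, value in mapping if value is not None]


# Depth 3, stop after one hour, at most 25000 bytes/second.
flags = limit_flags(depth=3, max_time=3600, max_rate=25000)
print(flags)  # ['-r3', '-E3600', '-A25000']
```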
135
136
137 Flow control:
138 -cN number of multiple connections (*c8) (--sockets[=N])
139
140 -TN timeout: number of seconds after which a non-responding link is
141 shut down (--timeout)
142
143 -RN number of retries, in case of timeout or non-fatal errors (*R1)
144 (--retries[=N])
145
146 -JN traffic jam control: minimum transfer rate (bytes/second)
147 tolerated for a link (--min-rate[=N])
148
149 -HN host is abandoned if: 0=never, 1=timeout, 2=slow, 3=timeout or
150 slow (--host-control[=N])
151
152
153 Links options:
154 -%P *extended parsing, attempt to parse all links, even in unknown
155 tags or Javascript (%P0 don't use) (--extended-parsing[=N])
156
157 -n get non-html files near an html file (ex: an image located
158 outside) (--near)
159
160 -t test all URLs (even forbidden ones) (--test)
161
162 -%L <file> add all URLs located in this text file (one URL per line)
163 (--list <param>)
164
165 -%S <file> add all scan rules located in this text file (one scan
166 rule per line) (--urllist <param>)
167
168
169 Build options:
170 -NN structure type (0 *original structure, 1+: see below)
171 (--structure[=N])
172
173 or a user-defined structure (-N "%h%p/%n%q.%t")
174
175 -%N delayed type check, don't make any link test but wait for file
176 downloads to start instead (experimental) (%N0 don't use, %N1 use
177 for unknown extensions, * %N2 always use)
178
179 -%D cached delayed type check, don't wait for remote type during
180 updates, to speed them up (%D0 wait, * %D1 don't wait)
181 (--cached-delayed-type-check)
182
183 -%M generate an RFC MIME-encapsulated full-archive (.mht)
184 (--mime-html)
185
186 -LN long names (L1 *long names / L0 8-3 conversion / L2 ISO9660 com‐
187 patible) (--long-names[=N])
188
189 -KN keep original links (e.g. http://www.adr/link) (K0 *relative
190 link, K absolute links, K4 original links, K3 absolute URI
191 links) (--keep-links[=N])
192
193 -x replace external html links by error pages (--replace-external)
194
195 -%x do not include any password for external password protected web‐
196 sites (%x0 include) (--disable-passwords)
197
198 -%q *include query string for local files (useless, for information
199 purposes only) (%q0 don't include) (--include-query-string)
200
201 -o *generate output html file in case of error (404..) (o0 don't
202 generate) (--generate-errors)
203
204 -X *purge old files after update (X0 keep old files) (--purge-old[=N])
205
206 -%p preserve html files as is (identical to -K4 -%F "" )
207 (--preserve)
208
209
210 Spider options:
211 -bN accept cookies in cookies.txt (0=do not accept,* 1=accept)
212 (--cookies[=N])
213
214 -u check document type if unknown (cgi,asp..) (u0 don't check, * u1
215 check except /, u2 check always) (--check-type[=N])
216
217 -j *parse Java Classes (j0 don't parse, bitmask: |1 parse default,
218 |2 don't parse .class |4 don't parse .js |8 don't be aggressive)
219 (--parse-java[=N])
220
221 -sN follow robots.txt and meta robots tags (0=never,1=sometimes,*
222 2=always, 3=always (even strict rules)) (--robots[=N])
223
224 -%h force HTTP/1.0 requests (reduce update features, only for old
225 servers or proxies) (--http-10)
226
227 -%k use keep-alive if possible, greatly reducing latency for small
228 files and test requests (%k0 don't use) (--keep-alive)
229
230 -%B tolerant requests (accept bogus responses on some servers, but
231 not standard!) (--tolerant)
232
233 -%s update hacks: various hacks to limit re-transfers when updating
234 (identical size, bogus response..) (--updatehack)
235
236 -%u url hacks: various hacks to limit duplicate URLs (strip //,
237 www.foo.com==foo.com..) (--urlhack)
238
239 -%A assume that a type (cgi,asp..) is always linked with a mime type
240 (-%A php3,cgi=text/html;dat,bin=application/x-zip) (--assume
241 <param>)
242
243 --assume can also be used to force a type for a specific file:
244 --assume foo.cgi=text/html
245
246 -@iN internet protocol (0=both ipv6+ipv4, 4=ipv4 only, 6=ipv6 only)
247 (--protocol[=N])
248
249 -%w disable a specific external mime module (-%w htsswf -%w htsjava)
250 (--disable-module <param>)
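
Two of the spider options take composite values: -j is a bitmask and -%A is a semicolon-separated list of "extensions=mime" groups. The sketch below shows one way to build both. The enum member names and helper functions are made up for illustration; only the bit values and the string layout come from the descriptions above.

```python
from enum import IntFlag


class ParseJava(IntFlag):
    """Bit values copied from the -j description (member names are made up)."""
    PARSE_DEFAULT = 1
    SKIP_CLASS = 2
    SKIP_JS = 4
    NOT_AGGRESSIVE = 8


def parse_java_flag(mask):
    """Render a bitmask as the short option, e.g. -j5."""
    return f"-j{int(mask)}"


def assume_value(types):
    """Build the -%A parameter from {mime_type: [extensions]}."""
    return ";".join(f"{','.join(exts)}={mime}" for mime, exts in types.items())


print(parse_java_flag(ParseJava.PARSE_DEFAULT | ParseJava.SKIP_JS))  # -j5
print(assume_value({
    "text/html": ["php3", "cgi"],
    "application/x-zip": ["dat", "bin"],
}))  # php3,cgi=text/html;dat,bin=application/x-zip
```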
251
252
253 Browser ID:
254 -F user-agent field sent in HTTP headers (-F "user-agent name")
255 (--user-agent <param>)
256
257 -%R default referer field sent in HTTP headers (--referer <param>)
258
259 -%E from email address sent in HTTP headers (--from <param>)
260
261 -%F footer string in Html code (-%F "Mirrored [from host %s [file %s
262 [at %s]]]") (--footer <param>)
263
264 -%l preferred language (-%l "fr, en, jp, *") (--language <param>)
265
266
267 Log, index, cache
268 -C create/use a cache for updates and retries (C0 no cache, C1 cache
269 has priority, * C2 test update before) (--cache[=N])
270
271 -k store all files in cache (not useful if files on disk)
272 (--store-all-in-cache)
273
274 -%n do not re-download locally erased files (--do-not-recatch)
275
276 -%v display on screen filenames downloaded (in realtime) - * %v1
277 short version - %v2 full animation (--display)
278
279 -Q no log - quiet mode (--do-not-log)
280
281 -q no questions - quiet mode (--quiet)
282
283 -z log - extra info (--extra-log)
284
285 -Z log - debug (--debug-log)
286
287 -v log on screen (--verbose)
288
289 -f *log in files (--file-log)
290
291 -f2 one single log file (--single-log)
292
293 -I *make an index (I0 don't make) (--index)
294
295 -%i make a top index for a project folder (* %i0 don't make)
296 (--build-top-index)
297
298 -%I make a searchable index for this mirror (* %I0 don't make)
299 (--search-index)
300
301
302 Expert options:
303 -pN priority mode: (* p3) (--priority[=N])
304
305 -p0 just scan, don't save anything (for checking links)
306
307 -p1 save only html files
308
309 -p2 save only non html files
310
311 -p3 *save all files
312
313 -p7 get html files first, then process other files
314
315 -S stay on the same directory (--stay-on-same-dir)
316
317 -D *can only go down into subdirs (--can-go-down)
318
319 -U can only go to upper directories (--can-go-up)
320
321 -B can both go up&down into the directory structure
322 (--can-go-up-and-down)
323
324 -a *stay on the same address (--stay-on-same-address)
325
326 -d stay on the same principal domain (--stay-on-same-domain)
327
328 -l stay on the same TLD (eg: .com) (--stay-on-same-tld)
329
330 -e go everywhere on the web (--go-everywhere)
331
332 -%H debug HTTP headers in logfile (--debug-headers)
333
334
335 Guru options: (do NOT use if possible)
336 -#X *use optimized engine (limited memory boundary checks)
337 (--fast-engine)
338
339 -#0 filter test (-#0 *.gif www.bar.com/foo.gif )
340 (--debug-testfilters <param>)
341
342 -#1 simplify test (-#1 ./foo/bar/../foobar)
343
344 -#2 type test (-#2 /foo/bar.php)
345
346 -#C cache list (-#C *.com/spider*.gif) (--debug-cache <param>)
347
348 -#R cache repair (damaged cache) (--repair-cache)
349
350 -#d debug parser (--debug-parsing)
351
352 -#E extract new.zip cache meta-data in meta.zip
353
354 -#f always flush log files (--advanced-flushlogs)
355
356 -#FN maximum number of filters (--advanced-maxfilters[=N])
357
358 -#h version info (--version)
359
360 -#K scan stdin (debug) (--debug-scanstdin)
361
362 -#L maximum number of links (-#L1000000) (--advanced-maxlinks)
363
364 -#p display ugly progress information (--advanced-progressinfo)
365
366 -#P catch URL (--catch-url)
367
368 -#R old FTP routines (debug) (--repair-cache)
369
370 -#T generate transfer ops. log every minute (--debug-xfrstats)
371
372 -#u wait time (--advanced-wait)
373
374 -#Z generate transfer rate statistics every minute
375 (--debug-ratestats)
376
377 -#! execute a shell command (-#! "echo hello") (--exec <param>)
378
379
380 Dangerous options: (do NOT use unless you know exactly what you are doing)
381 -%! bypass built-in security limits aimed to avoid bandwidth abuse
382 (bandwidth, simultaneous connections)
383 (--disable-security-limits)
384
385 IMPORTANT NOTE: DANGEROUS OPTION, ONLY SUITABLE FOR EXPERTS.
386 USE IT WITH EXTREME CARE.
389
390
391 Command-line specific options:
392 -V execute system command after each file ($0 is the filename: -V
393 "rm \$0") (--userdef-cmd <param>)
394
395 -%U run the engine with another id when called as root (-%U smith)
396 (--user <param>)
397
398 -%W use an external library function as a wrapper (-%W
399 myfoo.so[,myparameters]) (--callback <param>)
400
401
402 Details: Option N
403 -N0 Site-structure (default)
404
405 -N1 HTML in web/, images/other files in web/images/
406
407 -N2 HTML in web/HTML, images/other in web/images
408
409 -N3 HTML in web/, images/other in web/
410
411 -N4 HTML in web/, images/other in web/xxx, where xxx is the file
412 extension (all gif will be placed onto web/gif, for example)
413
414 -N5 Images/other in web/xxx and HTML in web/HTML
415
416 -N99 All files in web/, with random names (gadget!)
417
418 -N100 Site-structure, without www.domain.xxx/
419
420 -N101 Identical to N1 except that "web" is replaced by the site's name
421
422 -N102 Identical to N2 except that "web" is replaced by the site's name
423
424 -N103 Identical to N3 except that "web" is replaced by the site's name
425
426 -N104 Identical to N4 except that "web" is replaced by the site's name
427
428 -N105 Identical to N5 except that "web" is replaced by the site's name
429
430 -N199 Identical to N99 except that "web" is replaced by the site's name
431
432 -N1001 Identical to N1 except that there is no "web" directory
433
434 -N1002 Identical to N2 except that there is no "web" directory
435
436 -N1003 Identical to N3 except that there is no "web" directory (this
437 is the structure set by the -g option)
438
439 -N1004 Identical to N4 except that there is no "web" directory
440
441 -N1005 Identical to N5 except that there is no "web" directory
442
443 -N1099 Identical to N99 except that there is no "web" directory
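
The numbering scheme above can be read as "base layout plus a root-directory variant". The sketch below restates that reading for the layouts built on N1 to N5. It is an illustration of the table, not HTTrack's own code: the function name, the default site name and the use of the file extension as the "xxx" directory are assumptions drawn from the descriptions.

```python
import posixpath


def local_path(n, filename, is_html, site="www.someweb.com"):
    """Return the relative save path for structure types built on N1..N5.

    Covers N1-N5, N101-N105 and N1001-N1005 as described above; other
    values (N0, N99, N100, ...) are deliberately not handled.
    """
    if n >= 1000:
        root, base = "", n - 1000      # no "web" directory
    elif n >= 100:
        root, base = site, n - 100     # "web" replaced by the site's name
    else:
        root, base = "web", n

    ext = posixpath.splitext(filename)[1].lstrip(".")
    html_dir = {1: "", 2: "HTML", 3: "", 4: "", 5: "HTML"}[base]
    other_dir = {1: "images", 2: "images", 3: "", 4: ext, 5: ext}[base]

    subdir = html_dir if is_html else other_dir
    return posixpath.join(root, subdir, filename)


print(local_path(4, "image.gif", is_html=False))     # web/gif/image.gif
print(local_path(2, "index.html", is_html=True))     # web/HTML/index.html
print(local_path(1004, "image.gif", is_html=False))  # gif/image.gif
print(local_path(101, "image.gif", is_html=False))   # www.someweb.com/images/image.gif
```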
444
445 Details: User-defined option N
446 %n Name of file without file type (ex: image)
447 %N Name of file, including file type (ex: image.gif)
448 %t File type (ex: gif)
449 %p Path [without ending /] (ex: /someimages)
450 %h Host name (ex: www.someweb.com)
451 %M URL MD5 (128 bits, 32 ascii bytes)
452 %Q query string MD5 (128 bits, 32 ascii bytes)
453 %r protocol name (ex: http)
454 %q small query string MD5 (16 bits, 4 ascii bytes)
455 %s? Short name version (ex: %sN)
456 %[param] param variable in query string
457 %[param:before:after:empty:notfound] advanced variable extraction
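
To make the variables above concrete, here is a small sketch that expands a subset of them for one URL. It is not HTTrack's implementation: the function name is hypothetical, only %n, %N, %t, %p, %h and %r are handled, and the MD5-based variables are left untouched because their exact inputs are not specified in this page.

```python
import posixpath
import re
from urllib.parse import urlsplit


def expand_structure(template, url):
    """Expand %n, %N, %t, %p, %h and %r in a -N style template for one URL."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    stem, ext = posixpath.splitext(filename)
    values = {
        "n": stem,                   # name without file type
        "N": filename,               # name including file type
        "t": ext.lstrip("."),        # file type
        "p": directory.rstrip("/"),  # path without ending /
        "h": parts.hostname or "",   # host name
        "r": parts.scheme,           # protocol name
    }
    # Unknown variables are kept verbatim so nothing is silently dropped.
    return re.sub(
        r"%([A-Za-z])",
        lambda m: values.get(m.group(1), m.group(0)),
        template,
    )


url = "http://www.someweb.com/someimages/image.gif"
print(expand_structure("%h%p/%n.%t", url))  # www.someweb.com/someimages/image.gif
print(expand_structure("%r/%h/%N", url))    # http/www.someweb.com/image.gif
```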
458
459 Details: User-defined option N and advanced variable extraction
460 %[param:before:after:empty:notfound]
461
462 param : parameter name
463
464 before : string to prepend if the parameter was found
465
466 after : string to append if the parameter was found
467
468 empty : replacement string if the parameter was empty
469
470 notfound : replacement string if the parameter could not be found
471
472 All fields except the first one (the parameter name) can be empty.
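
The field rules can be summarised as a three-way choice: parameter missing, parameter present but empty, or parameter present with a value. The sketch below applies that reading to a query string. The function name is hypothetical, and it assumes that "before" and "after" wrap the value only when the parameter is present and non-empty.

```python
from urllib.parse import parse_qs


def extract_variable(spec, query):
    """Evaluate 'param:before:after:empty:notfound' against a query string."""
    # Missing trailing fields default to the empty string.
    fields = (spec.split(":") + [""] * 5)[:5]
    param, before, after, empty, notfound = fields

    values = parse_qs(query, keep_blank_values=True)
    if param not in values:
        return notfound
    value = values[param][0]
    if value == "":
        return empty
    return f"{before}{value}{after}"


print(extract_variable("id:-:.x:none:missing", "id=42&lang=en"))  # -42.x
print(extract_variable("id:-:.x:none:missing", "id=&lang=en"))    # none
print(extract_variable("id:-:.x:none:missing", "lang=en"))        # missing
```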
475
476
477 Details: Option K
478 -K0 foo.cgi?q=45 -> foo4B54.html?q=45 (relative URI, default)
479
480 -K -> http://www.foobar.com/folder/foo.cgi?q=45 (absolute URL)
481 (--keep-links[=N])
482
483 -K4 -> foo.cgi?q=45 (original URL)
484
485 -K3 -> /folder/foo.cgi?q=45 (absolute URI)
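
The difference between the absolute URL, absolute URI and original forms can be reproduced with standard URL joining. The sketch below does so for the foo.cgi example; the page address it resolves against and the function name are assumptions, and -K0 is left out because it depends on the local file name HTTrack assigns.

```python
from urllib.parse import urljoin, urlsplit


def rewrite_link(link, page_url, mode):
    """Return the link as it would look under -K, -K3 or -K4."""
    absolute = urljoin(page_url, link)
    if mode == "K":    # absolute URL
        return absolute
    if mode == "K3":   # absolute URI: path and query, no scheme or host
        parts = urlsplit(absolute)
        return parts.path + ("?" + parts.query if parts.query else "")
    if mode == "K4":   # original link, untouched
        return link
    raise ValueError(f"unsupported mode: {mode}")


# Hypothetical page that contains the link foo.cgi?q=45.
page = "http://www.foobar.com/folder/index.html"
for mode in ("K", "K3", "K4"):
    print(mode, rewrite_link("foo.cgi?q=45", page, mode))
```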
486
487
488 Shortcuts:
489 --mirror
490 <URLs> *make a mirror of site(s) (default)
491
492 --get
493 <URLs> get the files indicated, do not seek other URLs
494 (-qg)
495
496 --list
497 <text file> add all URLs located in this text file (-%L)
498
499 --mirrorlinks
500 <URLs> mirror all links in 1st level pages (-Y)
501
502 --testlinks
503 <URLs> test links in pages (-r1p0C0I0t)
504
505 --spider
506 <URLs> spider site(s), to test links: reports Errors &
507 Warnings (-p0C0I0t)
508
509 --testsite
510 <URLs> identical to --spider
511
512 --skeleton
513 <URLs> make a mirror, but gets only html files (-p1)
514
515 --update
516 update a mirror, without confirmation (-iC2)
517
518 --continue
519 continue a mirror, without confirmation (-iC1)
520
521
522 --catchurl
523 create a temporary proxy to capture a URL or a form
524 post URL
525
526 --clean
527 erase cache & log files
528
529
530 --http10
531 force http/1.0 requests (-%h)
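
Since each shortcut is documented as a cluster of short options, a wrapper can expand them before logging or comparing command lines. A minimal sketch, with the table copied from the entries above and a hypothetical helper name:

```python
# Shortcut -> equivalent short options, as listed in this section.
SHORTCUTS = {
    "--get": "-qg",
    "--list": "-%L",
    "--mirrorlinks": "-Y",
    "--testlinks": "-r1p0C0I0t",
    "--spider": "-p0C0I0t",
    "--skeleton": "-p1",
    "--update": "-iC2",
    "--continue": "-iC1",
    "--http10": "-%h",
}


def expand_shortcuts(argv):
    """Replace known shortcuts with their short-option form; keep the rest."""
    return [SHORTCUTS.get(arg, arg) for arg in argv]


print(expand_shortcuts(["httrack", "www.someweb.com/bob/", "--spider"]))
```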
532
533
534 Details: Option %W: External callbacks prototypes
535 see htsdefines.h
536 FILES
537 /etc/httrack.conf
538 The system wide configuration file.
539
540 ENVIRONMENT
541 HOME Used if /etc/httrack.conf contains the line
542 path ~/websites/#
543
544 DIAGNOSTICS
545 Errors/Warnings are reported to hts-log.txt by default, or to stderr if
546 the -v option was specified.
547
548 LIMITS
549 These are the principal limits of HTTrack at the moment. Note that
550 we have not heard of any other utility that solves them.
551
552
553 - Filenames generated by complex scripts may not be detected (ex:
554 img.src='image'+a+Mobj.dst+'.gif')
555
556 - Files used by some java classes (other classes included) may not be found
557
558 - Cgi-bin links may not work properly in some cases (parameters
559 needed). To avoid them: use filters like -*cgi-bin*
560
561 BUGS
562 Please report bugs to <bugs@httrack.com>. Include a complete,
563 self-contained example that will allow the bug to be reproduced, and say
564 which version of httrack you are using. Do not forget to detail options
565 used, OS version, and any other information you deem necessary.
566
567 COPYRIGHT
568 Copyright (C) Xavier Roche and other contributors
569
570 This program is free software; you can redistribute it and/or modify it
571 under the terms of the GNU General Public License as published by the
572 Free Software Foundation; either version 2 of the License, or any later
573 version.
574
575 This program is distributed in the hope that it will be useful, but
576 WITHOUT ANY WARRANTY; without even the implied warranty of MER‐
577 CHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
578 Public License for more details.
579
580 You should have received a copy of the GNU General Public License along
581 with this program; if not, write to the Free Software Foundation, Inc.,
582 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
583
584 AVAILABILITY
585 The most recent released version of httrack can be found at:
586 http://www.httrack.com
587
588 AUTHOR
589 Xavier Roche <roche@httrack.com>
590
591 SEE ALSO
592 The HTML documentation (available online at
593 http://www.httrack.com/html/ ) contains more detailed information.
594 Please also refer to the httrack FAQ (available online at
595 http://www.httrack.com/html/faq.html )
596
597
598
599 httrack website copier HTTrack version 3.43-9 (compiled Jan 4 2010) httrack(1)