httrack(1)                  General Commands Manual                 httrack(1)

NAME
       httrack - offline browser: copy websites to a local directory

SYNOPSIS
       httrack [ url ]... [ -filter ]... [ +filter ]... [ -O, --path ] [ -w,
       --mirror ] [ -W, --mirror-wizard ] [ -g, --get-files ] [ -i, --continue
       ] [ -Y, --mirrorlinks ] [ -P, --proxy ] [ -%f, --httpproxy-ftp[=N] ] [
       -%b, --bind ] [ -rN, --depth[=N] ] [ -%eN, --ext-depth[=N] ] [ -mN,
       --max-files[=N] ] [ -MN, --max-size[=N] ] [ -EN, --max-time[=N] ] [
       -AN, --max-rate[=N] ] [ -%cN, --connection-per-second[=N] ] [ -GN,
       --max-pause[=N] ] [ -cN, --sockets[=N] ] [ -TN, --timeout[=N] ] [ -RN,
       --retries[=N] ] [ -JN, --min-rate[=N] ] [ -HN, --host-control[=N] ] [
       -%P, --extended-parsing[=N] ] [ -n, --near ] [ -t, --test ] [ -%L,
       --list ] [ -%S, --urllist ] [ -NN, --structure[=N] ] [ -%D,
       --cached-delayed-type-check ] [ -%M, --mime-html ] [ -LN,
       --long-names[=N] ] [ -KN, --keep-links[=N] ] [ -x, --replace-external ]
       [ -%x, --disable-passwords ] [ -%q, --include-query-string ] [ -o,
       --generate-errors ] [ -X, --purge-old[=N] ] [ -%p, --preserve ] [ -%T,
       --utf8-conversion ] [ -bN, --cookies[=N] ] [ -u, --check-type[=N] ] [
       -j, --parse-java[=N] ] [ -sN, --robots[=N] ] [ -%h, --http-10 ] [ -%k,
       --keep-alive ] [ -%B, --tolerant ] [ -%s, --updatehack ] [ -%u,
       --urlhack ] [ -%A, --assume ] [ -@iN, --protocol[=N] ] [ -%w,
       --disable-module ] [ -F, --user-agent ] [ -%R, --referer ] [ -%E,
       --from ] [ -%F, --footer ] [ -%l, --language ] [ -%a, --accept ] [ -%X,
       --headers ] [ -C, --cache[=N] ] [ -k, --store-all-in-cache ] [ -%n,
       --do-not-recatch ] [ -%v, --display ] [ -Q, --do-not-log ] [ -q,
       --quiet ] [ -z, --extra-log ] [ -Z, --debug-log ] [ -v, --verbose ] [
       -f, --file-log ] [ -f2, --single-log ] [ -I, --index ] [ -%i,
       --build-top-index ] [ -%I, --search-index ] [ -pN, --priority[=N] ] [
       -S, --stay-on-same-dir ] [ -D, --can-go-down ] [ -U, --can-go-up ] [
       -B, --can-go-up-and-down ] [ -a, --stay-on-same-address ] [ -d,
       --stay-on-same-domain ] [ -l, --stay-on-same-tld ] [ -e,
       --go-everywhere ] [ -%H, --debug-headers ] [ -%!,
       --disable-security-limits ] [ -V, --userdef-cmd ] [ -%W, --callback ]

DESCRIPTION
       httrack allows you to download a World Wide Web site from the Internet
       to a local directory, recursively building all directories and getting
       HTML, images, and other files from the server to your computer.
       HTTrack arranges the original site's relative link structure. Simply
       open a page of the "mirrored" website in your browser, and you can
       browse the site from link to link, as if you were viewing it online.
       HTTrack can also update an existing mirrored site and resume
       interrupted downloads.

EXAMPLES
       httrack www.someweb.com/bob/
              mirror site www.someweb.com/bob/ and only this site

       httrack www.someweb.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg
       -mime:application/*
              mirror the two sites together (with shared links), accept any
              .jpg files on .com sites, and reject files whose MIME type
              matches application/*

       httrack www.someweb.com/bob/bobby.html +* -r6
              get all files starting from bobby.html, with a link depth of 6
              and the possibility of going everywhere on the web

       httrack www.someweb.com/bob/bobby.html --spider -P
       proxy.myhost.com:8080
              run the spider on www.someweb.com/bob/bobby.html using a proxy

       httrack --update
              update a mirror in the current folder

       httrack
              start the interactive mode

       httrack --continue
              continue a mirror in the current folder
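
       The invocations above can be combined with the limit and flow-control
       options described under OPTIONS. The following is a minimal sketch, not
       a recommended configuration: the URL, the output directory and the
       numeric limits are placeholder assumptions, and run_httrack with its
       HTTRACK variable is a hypothetical dry-run helper introduced only for
       this example. By default it prints the command line instead of running
       it.

```shell
#!/bin/sh
# Hypothetical dry-run helper (not part of httrack): prints the command line
# unless HTTRACK names a program to execute, e.g. HTTRACK=httrack.
run_httrack() {
    if [ -n "${HTTRACK:-}" ]; then
        "$HTTRACK" "$@"
    else
        printf '%s\n' "httrack $*"
    fi
}

# Mirror one site ($1) into a directory ($2) with a link depth of 3 (-r3),
# at most 4 simultaneous connections (-c4) and a transfer cap of
# 25000 bytes/second (-A25000). All three limits are placeholder values.
mirror_with_limits() {
    run_httrack "$1" -O "$2" -r3 -c4 -A25000
}

mirror_with_limits www.someweb.com/bob/ ./mirror
# prints: httrack www.someweb.com/bob/ -O ./mirror -r3 -c4 -A25000
```

       Running the same script with HTTRACK=httrack in the environment would
       execute the command instead of printing it.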

OPTIONS
       An asterisk (*) marks the default value.

   General options:
       -O  path for mirror/logfiles+cache (-O path mirror[,path cache and
           logfiles]) (--path <param>)

   Action options:
       -w  *mirror web sites (--mirror)

       -W  mirror web sites, semi-automatic (asks questions)
           (--mirror-wizard)

       -g  just get files (saved in the current directory) (--get-files)

       -i  continue an interrupted mirror using the cache (--continue)

       -Y  mirror ALL links located in the first level pages (mirror links)
           (--mirrorlinks)

   Proxy options:
       -P  proxy use (-P proxy:port or -P user:pass@proxy:port)
           (--proxy <param>)

       -%f  *use proxy for ftp (f0 don't use) (--httpproxy-ftp[=N])

       -%b  use this local hostname to make/send requests (-%b hostname)
            (--bind <param>)

   Limits options:
       -rN  set the mirror depth to N (* r9999) (--depth[=N])

       -%eN  set the external links depth to N (* %e0) (--ext-depth[=N])

       -mN  maximum file length for a non-html file (--max-files[=N])

       -mN,N2  maximum file length for non-html (N) and html (N2) files

       -MN  maximum overall size that can be downloaded/scanned
            (--max-size[=N])

       -EN  maximum mirror time in seconds (60=1 minute, 3600=1 hour)
            (--max-time[=N])

       -AN  maximum transfer rate in bytes/second (1000=1KB/s max)
            (--max-rate[=N])

       -%cN  maximum number of connections/second (*%c10)
             (--connection-per-second[=N])

       -GN  pause transfer if N bytes reached, and wait until lock file is
            deleted (--max-pause[=N])

   Flow control:
       -cN  number of multiple connections (*c8) (--sockets[=N])

       -TN  timeout, number of seconds after which a non-responding link is
            shut down (--timeout[=N])

       -RN  number of retries, in case of timeout or non-fatal errors (*R1)
            (--retries[=N])

       -JN  traffic jam control, minimum transfer rate (bytes/second)
            tolerated for a link (--min-rate[=N])

       -HN  host is abandoned if: 0=never, 1=timeout, 2=slow, 3=timeout or
            slow (--host-control[=N])

   Links options:
       -%P  *extended parsing, attempt to parse all links, even in unknown
            tags or Javascript (%P0 don't use) (--extended-parsing[=N])

       -n  get non-html files near an html file (ex: an image located
           outside) (--near)

       -t  test all URLs (even forbidden ones) (--test)

       -%L <file>  add all URLs located in this text file (one URL per line)
            (--list <param>)

       -%S <file>  add all scan rules located in this text file (one scan
            rule per line) (--urllist <param>)
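
       As an illustration of the -%L file format (one URL per line), the
       sketch below writes a small list and passes it to httrack. The file
       name, both URLs and the output directory are placeholder assumptions,
       and run_httrack is a hypothetical dry-run helper that prints the
       command line unless HTTRACK is set.

```shell
#!/bin/sh
# Hypothetical dry-run helper (not part of httrack): prints the command line
# unless HTTRACK names a program to execute, e.g. HTTRACK=httrack.
run_httrack() {
    if [ -n "${HTTRACK:-}" ]; then
        "$HTTRACK" "$@"
    else
        printf '%s\n' "httrack $*"
    fi
}

# Write one URL per line, as -%L expects. The path and URLs are placeholders.
list_file=${TMPDIR:-/tmp}/httrack-urls.txt
printf '%s\n' \
    'www.someweb.com/bob/' \
    'www.anothertest.com/mike/' > "$list_file"

# Mirror every URL in the list into ./mirror.
run_httrack -%L "$list_file" -O ./mirror
```

       A scan-rule file for -%S could be prepared the same way, with one
       +filter or -filter per line.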

   Build options:
       -NN  structure type (0 *original structure, 1+: see below), or user
            defined structure (-N "%h%p/%n%q.%t") (--structure[=N])

       -%N  delayed type check, don't make any link test but wait for files
            download to start instead (experimental) (%N0 don't use, %N1 use
            for unknown extensions, * %N2 always use)

       -%D  cached delayed type check, don't wait for remote type during
            updates, to speed them up (%D0 wait, * %D1 don't wait)
            (--cached-delayed-type-check)

       -%M  generate a RFC MIME-encapsulated full-archive (.mht)
            (--mime-html)

       -LN  long names (L1 *long names / L0 8-3 conversion / L2 ISO9660
            compatible) (--long-names[=N])

       -KN  keep original links (e.g. http://www.adr/link) (K0 *relative
            link, K absolute links, K4 original links, K3 absolute URI
            links, K5 transparent proxy link) (--keep-links[=N])

       -x  replace external html links by error pages (--replace-external)

       -%x  do not include any password for external password protected
            websites (%x0 include) (--disable-passwords)

       -%q  *include query string for local files (useless, for information
            purpose only) (%q0 don't include) (--include-query-string)

       -o  *generate output html file in case of error (404..) (o0 don't
           generate) (--generate-errors)

       -X  *purge old files after update (X0 keep old files)
           (--purge-old[=N])

       -%p  preserve html files as is (identical to -K4 -%F "")
            (--preserve)

       -%T  links conversion to UTF-8 (--utf8-conversion)

   Spider options:
       -bN  accept cookies in cookies.txt (0=do not accept, * 1=accept)
            (--cookies[=N])

       -u  check document type if unknown (cgi,asp..) (u0 don't check, * u1
           check except for /, u2 check always) (--check-type[=N])

       -j  *parse Java Classes (j0 don't parse, bitmask: |1 parse default,
           |2 don't parse .class |4 don't parse .js |8 don't be aggressive)
           (--parse-java[=N])

       -sN  follow robots.txt and meta robots tags (0=never, 1=sometimes,
            * 2=always, 3=always (even strict rules)) (--robots[=N])

       -%h  force HTTP/1.0 requests (reduce update features, only for old
            servers or proxies) (--http-10)

       -%k  use keep-alive if possible, greatly reducing latency for small
            files and test requests (%k0 don't use) (--keep-alive)

       -%B  tolerant requests (accept bogus responses on some servers, but
            not standard!) (--tolerant)

       -%s  update hacks: various hacks to limit re-transfers when updating
            (identical size, bogus response..) (--updatehack)

       -%u  url hacks: various hacks to limit duplicate URLs (strip //,
            www.foo.com==foo.com..) (--urlhack)

       -%A  assume that a type (cgi,asp..) is always linked with a mime type
            (-%A php3,cgi=text/html;dat,bin=application/x-zip)
            (--assume <param>); can also be used to force a specific file
            type: --assume foo.cgi=text/html

       -@iN  internet protocol (0=both ipv6+ipv4, 4=ipv4 only, 6=ipv6 only)
             (--protocol[=N])

       -%w  disable a specific external mime module (-%w htsswf -%w htsjava)
            (--disable-module <param>)

   Browser ID:
       -F  user-agent field sent in HTTP headers (-F "user-agent name")
           (--user-agent <param>)

       -%R  default referer field sent in HTTP headers (--referer <param>)

       -%E  from email address sent in HTTP headers (--from <param>)

       -%F  footer string in HTML code (-%F "Mirrored [from host %s [file %s
            [at %s]]]") (--footer <param>)

       -%l  preferred language (-%l "fr, en, jp, *") (--language <param>)

       -%a  accepted formats (-%a "text/html,image/png;q=0.9,*/*;q=0.1")
            (--accept <param>)

       -%X  additional HTTP header line (-%X "X-Magic: 42")
            (--headers <param>)

   Log, index, cache:
       -C  create/use a cache for updates and retries (C0 no cache, C1 cache
           has priority, * C2 test update before) (--cache[=N])

       -k  store all files in cache (not useful if files on disk)
           (--store-all-in-cache)

       -%n  do not re-download locally erased files (--do-not-recatch)

       -%v  display on screen filenames downloaded (in realtime) - * %v1
            short version - %v2 full animation (--display)

       -Q  no log - quiet mode (--do-not-log)

       -q  no questions - quiet mode (--quiet)

       -z  log - extra info (--extra-log)

       -Z  log - debug (--debug-log)

       -v  log on screen (--verbose)

       -f  *log in files (--file-log)

       -f2  one single log file (--single-log)

       -I  *make an index (I0 don't make) (--index)

       -%i  make a top index for a project folder (* %i0 don't make)
            (--build-top-index)

       -%I  make a searchable index for this mirror (* %I0 don't make)
            (--search-index)

   Expert options:
       -pN  priority mode: (* p3) (--priority[=N])

            p0  just scan, don't save anything (for checking links)

            p1  save only html files

            p2  save only non-html files

            p3  *save all files

            p7  get html files first, then treat other files

       -S  stay on the same directory (--stay-on-same-dir)

       -D  *can only go down into subdirs (--can-go-down)

       -U  can only go to upper directories (--can-go-up)

       -B  can go both up and down in the directory structure
           (--can-go-up-and-down)

       -a  *stay on the same address (--stay-on-same-address)

       -d  stay on the same principal domain (--stay-on-same-domain)

       -l  stay on the same TLD (eg: .com) (--stay-on-same-tld)

       -e  go everywhere on the web (--go-everywhere)

       -%H  debug HTTP headers in logfile (--debug-headers)

   Guru options: (do NOT use if possible)
       -#X  *use optimized engine (limited memory boundary checks)
            (--fast-engine)

       -#0  filter test (-#0 *.gif www.bar.com/foo.gif)
            (--debug-testfilters <param>)

       -#1  simplify test (-#1 ./foo/bar/../foobar)

       -#2  type test (-#2 /foo/bar.php)

       -#C  cache list (-#C *.com/spider*.gif) (--debug-cache <param>)

       -#R  cache repair (damaged cache) (--repair-cache)

       -#d  debug parser (--debug-parsing)

       -#E  extract new.zip cache meta-data in meta.zip

       -#f  always flush log files (--advanced-flushlogs)

       -#FN  maximum number of filters (--advanced-maxfilters[=N])

       -#h  version info (--version)

       -#K  scan stdin (debug) (--debug-scanstdin)

       -#L  maximum number of links (-#L1000000) (--advanced-maxlinks[=N])

       -#p  display ugly progress information (--advanced-progressinfo)

       -#P  catch URL (--catch-url)

       -#R  old FTP routines (debug) (--repair-cache)

       -#T  generate transfer ops. log every minute (--debug-xfrstats)

       -#u  wait time (--advanced-wait)

       -#Z  generate transfer rate statistics every minute
            (--debug-ratestats)

   Dangerous options: (do NOT use unless you know exactly what you are doing)
       -%!  bypass built-in security limits aimed to avoid bandwidth abuses
            (bandwidth, simultaneous connections)
            (--disable-security-limits)

            IMPORTANT NOTE: DANGEROUS OPTION, ONLY SUITABLE FOR EXPERTS.
            USE IT WITH EXTREME CARE.

   Command-line specific options:
       -V  execute system command after each file ($0 is the filename: -V
           "rm \$0") (--userdef-cmd <param>)

       -%W  use an external library function as a wrapper (-%W
            myfoo.so[,myparameters]) (--callback <param>)

   Details: Option N
       -N0     Site-structure (default)

       -N1     HTML in web/, images/other files in web/images/

       -N2     HTML in web/HTML, images/other in web/images

       -N3     HTML in web/, images/other in web/

       -N4     HTML in web/, images/other in web/xxx, where xxx is the file
               extension (all gif files will be placed in web/gif, for
               example)

       -N5     Images/other in web/xxx and HTML in web/HTML

       -N99    All files in web/, with random names (gadget!)

       -N100   Site-structure, without www.domain.xxx/

       -N101   Identical to N1 except that "web" is replaced by the site's
               name

       -N102   Identical to N2 except that "web" is replaced by the site's
               name

       -N103   Identical to N3 except that "web" is replaced by the site's
               name

       -N104   Identical to N4 except that "web" is replaced by the site's
               name

       -N105   Identical to N5 except that "web" is replaced by the site's
               name

       -N199   Identical to N99 except that "web" is replaced by the site's
               name

       -N1001  Identical to N1 except that there is no "web" directory

       -N1002  Identical to N2 except that there is no "web" directory

       -N1003  Identical to N3 except that there is no "web" directory
               (the structure set by the -g option)

       -N1004  Identical to N4 except that there is no "web" directory

       -N1005  Identical to N5 except that there is no "web" directory

       -N1099  Identical to N99 except that there is no "web" directory

   Details: User-defined option N
       %n        Name of file without file type (ex: image)
       %N        Name of file, including file type (ex: image.gif)
       %t        File type (ex: gif)
       %p        Path [without ending /] (ex: /someimages)
       %h        Host name (ex: www.someweb.com)
       %M        URL MD5 (128 bits, 32 ascii bytes)
       %Q        query string MD5 (128 bits, 32 ascii bytes)
       %k        full query string
       %r        protocol name (ex: http)
       %q        small query string MD5 (16 bits, 4 ascii bytes)
       %s?       Short name version (ex: %sN)
       %[param]  param variable in query string
       %[param:before:after:empty:notfound]  advanced variable extraction
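
       To make the template syntax concrete, the sketch below previews how
       a few of the variables listed above combine into a local path, and
       then shows the -N template from the Build options being passed on the
       command line. preview_structure is a hypothetical helper written for
       this page, not httrack's own code: it substitutes only %h, %p, %n and
       %t, using the example values from the table. run_httrack is likewise
       a hypothetical dry-run helper; the URL and output directory are
       placeholders.

```shell
#!/bin/sh
# Hypothetical dry-run helper (not part of httrack): prints the command line
# unless HTTRACK names a program to execute, e.g. HTTRACK=httrack.
run_httrack() {
    if [ -n "${HTTRACK:-}" ]; then
        "$HTTRACK" "$@"
    else
        printf '%s\n' "httrack $*"
    fi
}

# Simplified preview of a -N template (an illustration only).
# $1 template, $2 host (%h), $3 path without ending / (%p),
# $4 file name without type (%n), $5 file type (%t).
# Values must not contain '|', '&' or '\' because they are used in sed.
preview_structure() {
    printf '%s\n' "$1" | sed \
        -e "s|%h|$2|g" -e "s|%p|$3|g" -e "s|%n|$4|g" -e "s|%t|$5|g"
}

preview_structure '%h%p/%n.%t' www.someweb.com /someimages image gif
# prints: www.someweb.com/someimages/image.gif

# Quote the template so the shell passes it to httrack unchanged.
run_httrack www.someweb.com/bob/ -O ./mirror -N '%h%p/%n%q.%t'
```

       The preview deliberately leaves out %q and the other hash-based
       variables, since their values are computed by httrack itself.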

   Details: User-defined option N and advanced variable extraction
       %[param:before:after:empty:notfound]

       param     parameter name

       before    string to prepend if the parameter was found

       after     string to append if the parameter was found

       notfound  string replacement if the parameter could not be found

       empty     string replacement if the parameter was empty

       All fields, except the first one (the parameter name), can be empty.

   Details: Option K
       -K0  foo.cgi?q=45 -> foo4B54.html?q=45 (relative URI, default)

       -K   -> http://www.foobar.com/folder/foo.cgi?q=45 (absolute URL)
            (--keep-links[=N])

       -K3  -> /folder/foo.cgi?q=45 (absolute URI)

       -K4  -> foo.cgi?q=45 (original URL)

       -K5  -> http://www.foobar.com/folder/foo4B54.html?q=45 (transparent
            proxy URL)

   Shortcuts:
       --mirror
              <URLs> *make a mirror of site(s) (default)

       --get
              <URLs> get the files indicated, do not seek other URLs (-qg)

       --list
              <text file> add all URLs located in this text file (-%L)

       --mirrorlinks
              <URLs> mirror all links in 1st level pages (-Y)

       --testlinks
              <URLs> test links in pages (-r1p0C0I0t)

       --spider
              <URLs> spider site(s), to test links: reports Errors &
              Warnings (-p0C0I0t)

       --testsite
              <URLs> identical to --spider

       --skeleton
              <URLs> make a mirror, but get only html files (-p1)

       --update
              update a mirror, without confirmation (-iC2)

       --continue
              continue a mirror, without confirmation (-iC1)

       --catchurl
              create a temporary proxy to capture a URL or a form post URL

       --clean
              erase cache & log files

       --http10
              force http/1.0 requests (-%h)
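
       The equivalences listed above can be captured in a small lookup, which
       may help when reading scripts that mix long and short forms. This is
       only a sketch: expand_shortcut is a hypothetical helper, and it covers
       only the shortcuts for which this page gives a short-option
       equivalent.

```shell
#!/bin/sh
# Hypothetical helper: print the short options that this page lists for a
# shortcut. Returns 1 (and prints nothing) for anything else.
expand_shortcut() {
    case "$1" in
        --get)                printf '%s\n' '-qg' ;;
        --list)               printf '%s\n' '-%L' ;;
        --mirrorlinks)        printf '%s\n' '-Y' ;;
        --testlinks)          printf '%s\n' '-r1p0C0I0t' ;;
        --spider|--testsite)  printf '%s\n' '-p0C0I0t' ;;
        --skeleton)           printf '%s\n' '-p1' ;;
        --update)             printf '%s\n' '-iC2' ;;
        --continue)           printf '%s\n' '-iC1' ;;
        --http10)             printf '%s\n' '-%h' ;;
        *)                    return 1 ;;
    esac
}

expand_shortcut --spider
# prints: -p0C0I0t
expand_shortcut --update
# prints: -iC2
```

       Read with the option tables, -p0C0I0t breaks down into "just scan"
       (p0), "no cache" (C0), "don't make an index" (I0) and "test all URLs"
       (t).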

   Details: Option %W: External callbacks prototypes
       see htsdefines.h

FILES
       /etc/httrack.conf
              The system wide configuration file.

ENVIRONMENT
       HOME   Used if /etc/httrack.conf contains the line
              path ~/websites/#

DIAGNOSTICS
       Errors/Warnings are reported to hts-log.txt by default, or to stderr
       if the -v option was specified.
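
       After a run, the log can be summarised with standard tools. The sketch
       below assumes that error and warning lines in hts-log.txt contain the
       literal strings "Error:" and "Warning:"; this is an assumption about
       the log format that this page does not state, so adjust the patterns
       if your log differs. count_log_issues is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: count lines mentioning errors and warnings in a log.
# Assumes (not confirmed by this page) that such lines contain the literal
# strings "Error:" and "Warning:".
count_log_issues() {
    # $1: path to the log file, e.g. ./hts-log.txt
    errors=$(grep -c 'Error:' "$1")
    warnings=$(grep -c 'Warning:' "$1")
    printf 'errors=%s warnings=%s\n' "$errors" "$warnings"
}

# Summarise the log in the current mirror directory, if there is one.
if [ -f ./hts-log.txt ]; then
    count_log_issues ./hts-log.txt
fi
```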

LIMITS
       These are the principal limits of HTTrack at the moment. Note that we
       have not heard of any other utility that has solved them.

       - Files whose names are generated by complex scripts may not be found
         (ex: img.src='image'+a+Mobj.dst+'.gif')

       - Files referenced by some Java classes may not be found (class files
         included)

       - Cgi-bin links may not work properly in some cases (parameters
         needed). To avoid them: use filters like -*cgi-bin*
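
       When a filter such as -*cgi-bin* is typed in a shell, it is safest to
       quote it so that the shell does not try to expand the asterisks
       itself. A minimal sketch follows, with a placeholder URL and output
       directory; run_httrack is a hypothetical dry-run helper that prints
       the command line unless HTTRACK is set.

```shell
#!/bin/sh
# Hypothetical dry-run helper (not part of httrack): prints the command line
# unless HTTRACK names a program to execute, e.g. HTTRACK=httrack.
run_httrack() {
    if [ -n "${HTTRACK:-}" ]; then
        "$HTTRACK" "$@"
    else
        printf '%s\n' "httrack $*"
    fi
}

# Single quotes keep the shell from treating '*' as a filename pattern,
# so httrack receives the filter exactly as written.
run_httrack www.someweb.com/bob/ -O ./mirror '-*cgi-bin*'
# prints: httrack www.someweb.com/bob/ -O ./mirror -*cgi-bin*
```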

BUGS
       Please report bugs to <bugs@httrack.com>. Include a complete,
       self-contained example that will allow the bug to be reproduced, and
       say which version of httrack you are using. Do not forget to detail
       the options used, the OS version, and any other information you deem
       necessary.

COPYRIGHT
       Copyright (C) 1998-2017 Xavier Roche and other contributors

       This program is free software: you can redistribute it and/or modify
       it under the terms of the GNU General Public License as published by
       the Free Software Foundation, either version 3 of the License, or (at
       your option) any later version.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
       General Public License for more details.

       You should have received a copy of the GNU General Public License
       along with this program. If not, see <http://www.gnu.org/licenses/>.

AVAILABILITY
       The most recent released version of httrack can be found at:
       http://www.httrack.com

AUTHOR
       Xavier Roche <roche@httrack.com>

SEE ALSO
       The HTML documentation (available online at
       http://www.httrack.com/html/ ) contains more detailed information.
       Please also refer to the httrack FAQ (available online at
       http://www.httrack.com/html/faq.html ).

httrack website copier            20 May 2017                       httrack(1)