thttpd(8) System Manager's Manual thttpd(8)

thttpd - tiny/turbo/throttling HTTP server

thttpd [-C configfile] [-p port] [-d dir] [-dd data_dir] [-r|-nor]
[-s|-nos] [-v|-nov] [-g|-nog] [-u user] [-c cgipat] [-t throttles] [-h
host] [-l logfile] [-i pidfile] [-T charset] [-P P3P] [-M maxage] [-V]
[-D]

thttpd is a simple, small, fast, and secure HTTP server. It doesn't
have a lot of special features, but it suffices for most uses of the
web, it's about as fast as the best full-featured servers (Apache,
NCSA, Netscape), and it has one extremely useful feature
(URL-traffic-based throttling) that no other server currently has.

-C Specifies a config-file to read. All options can be set either
by command-line flags or in the config file. See below for
details.

-p Specifies an alternate port number to listen on. The default is
80. The config-file option name for this flag is "port", and
the config.h option is DEFAULT_PORT.

-d Specifies a directory to chdir() to at startup. This is merely
a convenience - you could just as easily do a cd in the shell
script that invokes the program. The config-file option name
for this flag is "dir", and the config.h options are WEBDIR,
USE_USER_DIR.

-r Do a chroot() at initialization time, restricting file access to
the program's current directory. If -r is the compiled-in
default, then -nor disables it. See below for details. The
config-file option names for this flag are "chroot" and
"nochroot", and the config.h option is ALWAYS_CHROOT.

-dd Specifies a directory to chdir() to after chrooting. If you're
not chrooting, you might as well do a single chdir() with the -d
flag. If you are chrooting, this lets you put the web files in
a subdirectory of the chroot tree, instead of in the top level
mixed in with the chroot files. The config-file option name for
this flag is "data_dir".

-nos Don't do explicit symbolic link checking. Normally, thttpd
explicitly expands any symbolic links in filenames, to check
that the resulting path stays within the original document tree.
If you want to turn off this check and save some CPU time, you
can use the -nos flag; however, this is not recommended. Note,
though, that if you are using the chroot option, the symlink
checking is unnecessary and is turned off, so the safe way to
save those CPU cycles is to use chroot. The config-file option
names for this flag are "symlinkcheck" and "nosymlinkcheck".

-v Do el-cheapo virtual hosting. If -v is the compiled-in default,
then -nov disables it. See below for details. The config-file
option names for this flag are "vhost" and "novhost", and the
config.h option is ALWAYS_VHOST.

-g Use a global passwd file. This means that every file in the
entire document tree is protected by the single .htpasswd file
at the top of the tree. Otherwise the semantics of the
.htpasswd file are the same. If this option is set but there is
no .htpasswd file in the top-level directory, then thttpd
proceeds as if the option was not set - first looking for a local
.htpasswd file, and if that doesn't exist either then serving
the file without any password. If -g is the compiled-in
default, then -nog disables it. The config-file option names
for this flag are "globalpasswd" and "noglobalpasswd", and the
config.h option is ALWAYS_GLOBAL_PASSWD.

-u Specifies what user to switch to after initialization when
started as root. The default is "nobody". The config-file
option name for this flag is "user", and the config.h option is
DEFAULT_USER.

-c Specifies a wildcard pattern for CGI programs, for instance
"**.cgi" or "/cgi-bin/*". See below for details. The
config-file option name for this flag is "cgipat", and the config.h
option is CGI_PATTERN.

-t Specifies a file of throttle settings. See below for details.
The config-file option name for this flag is "throttles".

-h Specifies a hostname to bind to, for multihoming. The default
is to bind to all hostnames supported on the local machine. See
below for details. The config-file option name for this flag is
"host", and the config.h option is SERVER_NAME.

-l Specifies a file for logging. If no -l argument is specified,
thttpd logs via syslog(). If "-l /dev/null" is specified,
thttpd doesn't log at all. The config-file option name for this
flag is "logfile".

-i Specifies a file to write the process-id to. If no file is
specified, no process-id is written. You can use this file to
send signals to thttpd. See below for details. The config-file
option name for this flag is "pidfile".

-T Specifies the character set to use with text MIME types. The
default is iso-8859-1. The config-file option name for this
flag is "charset", and the config.h option is DEFAULT_CHARSET.

-P Specifies a P3P server privacy header to be returned with all
responses. See http://www.w3.org/P3P/ for details. Thttpd
doesn't do anything at all with the string except put it in the
P3P: response header. The config-file option name for this flag
is "p3p".

-M Specifies the number of seconds to be used in a "Cache-Control:
max-age" header to be returned with all responses. An
equivalent "Expires" header is also generated. The default is no
Cache-Control or Expires headers, which is just fine for most
sites. The config-file option name for this flag is "max_age".

-V Shows the current version info.

-D This was originally just a debugging flag; however, it's worth
mentioning because one of the things it does is prevent thttpd
from making itself a background daemon. Instead it runs in the
foreground like a regular program. This is necessary when you
want to run thttpd wrapped in a little shell script that
restarts it if it exits.

All the command-line options can also be set in a config file. One
advantage of using a config file is that the file can be changed, and
thttpd will pick up the changes with a restart.

The syntax of the config file is simple: a series of "option" or
"option=value" items separated by whitespace. The option names are
listed above with their corresponding command-line flags.
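
The syntax just described can be illustrated with a few lines of Python.
This is only a sketch of the "option" / "option=value" rule as stated
above, not thttpd's own parser; the function name parse_config and the
sample values are made up for illustration, while the option names come
from the flag list above.

```python
def parse_config(text):
    """Split config text into whitespace-separated items.

    "option=value" becomes {"option": "value"}; a bare "option"
    is stored as True. (Illustrative only, not thttpd's parser.)
    """
    options = {}
    for item in text.split():
        name, sep, value = item.partition("=")
        options[name] = value if sep else True
    return options


# Hypothetical config text using option names from the flag list.
sample = """
port=8080 dir=/usr/www
chroot user=nobody
cgipat=/cgi-bin/*
"""
config = parse_config(sample)
```

Because items are separated by any whitespace, it makes no difference
to this sketch whether they sit on one line or on several.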

chroot() is a system call that restricts the program's view of the
filesystem to the current directory and directories below it. It
becomes impossible for remote users to access any file outside of the
initial directory. The restriction is inherited by child processes, so
CGI programs get it too. This is a very strong security measure, and
is recommended. The only downside is that only root can call chroot(),
so this means the program must be started as root. However, the last
thing it does during initialization is to give up root access by
becoming another user, so this is safe.

The program can also be compile-time configured to always do a
chroot(), without needing the -r flag.

Note that with some other web servers, such as NCSA httpd, setting up a
directory tree for use with chroot() is complicated, involving creating
a bunch of special directories and copying in various files. With
thttpd it's a lot easier: all you have to do is make sure any shells,
utilities, and config files used by your CGI programs and scripts are
available. If you have CGI disabled, or if you make a policy that all
CGI programs must be written in a compiled language such as C and
statically linked, then you probably don't have to do any setup at all.

However, one thing you should do is tell syslogd about the chroot tree,
so that thttpd can still generate syslog messages. Check your system's
syslogd man page for how to do this. In FreeBSD you would put
something like this in /etc/rc.conf:
    syslogd_flags="-l /usr/local/www/data/dev/log"
Substitute in your own chroot tree's pathname, of course. Don't worry
about creating the log socket; syslogd wants to do that itself. (You
may need to create the dev directory.) In Linux the flag is -a instead
of -l, and there may be other differences.

Relevant config.h option: ALWAYS_CHROOT.

thttpd supports the CGI 1.1 spec.

In order for a CGI program to be run, its name must match the pattern
specified either at compile time or on the command line with the -c
flag. This is a simple shell-style filename pattern. You can use * to
match any string not including a slash, or ** to match any string
including slashes, or ? to match any single character. You can also
use multiple such patterns separated by |. The patterns get checked
against the filename part of the incoming URL. Don't forget to quote
any wildcard characters so that the shell doesn't mess with them.
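
To make the pattern rules concrete, here is a rough Python sketch that
translates such a pattern into a regular expression. It is not thttpd's
matching code. It assumes that a pattern must match the whole name, and
the names pattern_to_regex and wild_match are invented for this example.

```python
import re


def pattern_to_regex(pattern):
    """Translate one wildcard pattern into a regex string."""
    pieces = []
    # "**" is listed first so it wins over a single "*".
    for token in re.findall(r"\*\*|\*|\?|[^*?]+", pattern):
        if token == "**":
            pieces.append(".*")        # any string, slashes included
        elif token == "*":
            pieces.append("[^/]*")     # any string without a slash
        elif token == "?":
            pieces.append(".")         # any single character
        else:
            pieces.append(re.escape(token))  # literal text
    return "".join(pieces)


def wild_match(patterns, name):
    """True if name matches any of the |-separated patterns."""
    return any(
        re.fullmatch(pattern_to_regex(p), name, re.DOTALL)
        for p in patterns.split("|")
    )
```

With these rules, "**.cgi" accepts a .cgi file at any depth, while
"/cgi-bin/*" accepts only names directly inside /cgi-bin/.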

Restricting CGI programs to a single directory lets the site
administrator review them for security holes, and is strongly recommended. If
there are individual users that you trust, you can enable their
directories too.

If no CGI pattern is specified, neither here nor at compile time, then
CGI programs cannot be run at all. If you want to disable CGI as a
security measure, that's how you do it: just comment out the patterns
in the config file and don't run with the -c flag.

Note: the current working directory when a CGI program gets run is the
directory that the CGI program lives in. This isn't in the CGI 1.1
spec, but it's what most other HTTP servers do.

Relevant config.h options: CGI_PATTERN, CGI_TIMELIMIT, CGI_NICE,
CGI_PATH, CGI_LD_LIBRARY_PATH, CGIBINDIR.

Basic Authentication is available as an option at compile time. If
enabled, it uses a password file in the directory to be protected,
called .htpasswd by default. This file is formatted as the familiar
colon-separated username/encrypted-password pair, records delimited by
newlines. The protection does not carry over to subdirectories. The
utility program thtpasswd(1) is included to help create and modify
.htpasswd files.

Relevant config.h option: AUTH_FILE

The throttle file lets you set maximum byte rates on URLs or URL
groups. You can optionally set a minimum rate too. The format of the
throttle file is very simple. A # starts a comment, and the rest of
the line is ignored. Blank lines are ignored. The rest of the lines
should consist of a pattern, whitespace, and a number. The pattern is
a simple shell-style filename pattern, using ?/**/*, or multiple such
patterns separated by |.

The numbers in the file are byte rates, specified in units of bytes per
second. For comparison, a v.90 modem gives about 5000 B/s depending on
compression, a double-B-channel ISDN line about 12800 B/s, and a T1
line is about 150000 B/s. If you want to set a minimum rate as well,
use number-number.

Example:
    # throttle file for www.acme.com

    **              2000-100000  # limit total web usage to 2/3 of our T1,
                                 # but never go below 2000 B/s
    **.jpg|**.gif   50000        # limit images to 1/3 of our T1
    **.mpg          20000        # and movies to even less
    jef/**          20000        # jef's pages are too popular
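
The file format described above is simple enough to read with a short
Python function. The following is a sketch only, not the parser thttpd
uses. The name parse_throttles is made up, and storing a missing
minimum as None is a choice made for this example.

```python
def parse_throttles(text):
    """Return a list of (pattern, min_rate, max_rate) tuples.

    Rates are in bytes per second. min_rate is None when the line
    gives a single number rather than "min-max".
    """
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue                          # skip blank lines
        pattern, rate = line.split()
        if "-" in rate:
            low, high = rate.split("-", 1)
            rules.append((pattern, int(low), int(high)))
        else:
            rules.append((pattern, None, int(rate)))
    return rules


example = """
# throttle file for www.acme.com

**              2000-100000  # limit total web usage to 2/3 of our T1,
                             # but never go below 2000 B/s
**.jpg|**.gif   50000        # limit images to 1/3 of our T1
**.mpg          20000        # and movies to even less
jef/**          20000        # jef's pages are too popular
"""
rules = parse_throttles(example)
```

The fractions in the comments follow from the T1 figure given earlier:
100000 is two thirds of about 150000 B/s, and 50000 is one third.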

Throttling is implemented by checking each incoming URL filename
against all of the patterns in the throttle file. The server
accumulates statistics on how much bandwidth each pattern has accounted for
recently (via a rolling average). If a URL matches a pattern that has
been exceeding its specified limit, then the data returned is actually
slowed down, with pauses between each block. If that's not possible
(e.g. for CGI programs) or if the bandwidth has gotten way larger than
the limit, then the server returns a special code saying 'try again
later'.

The minimum rates are implemented similarly. If too many people are
trying to fetch something at the same time, throttling may slow down
each connection so much that it's not really usable. Furthermore, all
those slow connections clog up the server, using up file handles and
connection slots. Setting a minimum rate says that past a certain
point you should not even bother - the server returns the 'try again
later' code and the connection isn't even started.

There is no provision for setting a maximum connections/second
throttle, because throttling a request uses as much CPU as handling it, so
there would be no point. There is also no provision for throttling the
number of simultaneous connections on a per-URL basis. However, you can
control the overall number of connections for the whole server very
simply, by setting the operating system's per-process file descriptor
limit before starting thttpd. Be sure to set the hard limit, not the
soft limit.

Multihoming means using one machine to serve multiple hostnames. For
instance, if you're an internet provider and you want to let all of
your customers have customized web addresses, you might have
www.joe.acme.com, www.jane.acme.com, and your own www.acme.com, all
running on the same physical hardware. This feature is also known as
"virtual hosts". There are three steps to setting this up.

One, make DNS entries for all of the hostnames. The current way to do
this, allowed by HTTP/1.1, is to use CNAME aliases, like so:
    www.acme.com IN A 192.100.66.1
    www.joe.acme.com IN CNAME www.acme.com
    www.jane.acme.com IN CNAME www.acme.com
However, this is incompatible with older HTTP/1.0 browsers. If you
want to stay compatible, there's a different way - use A records
instead, each with a different IP address, like so:
    www.acme.com IN A 192.100.66.1
    www.joe.acme.com IN A 192.100.66.200
    www.jane.acme.com IN A 192.100.66.201
This is bad because it uses extra IP addresses, a somewhat scarce
resource. But if you want people with older browsers to be able to
visit your sites, you still have to do it this way.

Step two. If you're using the modern CNAME method of multihoming, then
you can skip this step. Otherwise, using the older multiple-IP-address
method, you must set up IP aliases or multiple interfaces for the extra
addresses. You can use ifconfig(8)'s alias command to tell the machine
to answer to all of the different IP addresses. Example:
    ifconfig le0 www.acme.com
    ifconfig le0 www.joe.acme.com alias
    ifconfig le0 www.jane.acme.com alias
If your OS's version of ifconfig doesn't have an alias command, you're
probably out of luck (but see
http://www.acme.com/software/thttpd/notes.html).

Third and last, you must set up thttpd to handle the multiple hosts.
The easiest way is with the -v flag, or the ALWAYS_VHOST config.h
option. This works with either CNAME multihosting or multiple-IP
multihosting. What it does is send each incoming request to a
subdirectory based on the hostname it's intended for. All you have to do in
order to set things up is to create those subdirectories in the
directory where thttpd will run. With the example above, you would run:
    mkdir www.acme.com www.joe.acme.com www.jane.acme.com
If you're using old-style multiple-IP multihosting, you should also
create symbolic links from the numeric addresses to the names, like so:
    ln -s www.acme.com 192.100.66.1
    ln -s www.joe.acme.com 192.100.66.200
    ln -s www.jane.acme.com 192.100.66.201
This lets the older HTTP/1.0 browsers find the right subdirectory.

There's an optional alternate step three if you're using multiple-IP
multihosting: run a separate thttpd process for each hostname, using
the -h flag to specify which one is which. This gives you more
flexibility, since you can run each of these processes in separate
directories, with different throttle files, etc. Example:
    thttpd -r -d /usr/www -h www.acme.com
    thttpd -r -d /usr/www/joe -u joe -h www.joe.acme.com
    thttpd -r -d /usr/www/jane -u jane -h www.jane.acme.com
But remember, this multiple-process method does not work with CNAME
multihosting - for that, you must use a single thttpd process with the
-v flag.

thttpd lets you define your own custom error pages for the various HTTP
errors. There's a separate file for each error number, all stored in
one special directory. The directory name is "errors", at the top of
the web directory tree. The error files should be named "errNNN.html",
where NNN is the error number. So for example, to make a custom error
page for the authentication failure error, which is number 401, you
would put your HTML into the file "errors/err401.html". If no custom
error file is found for a given error number, then the usual built-in
error page is generated.

If you're using the virtual hosts option, you can also have different
custom error pages for each different virtual host. In this case you
put another "errors" directory in the top of that virtual host's web
tree. thttpd will look first in the virtual host errors directory, and
then in the server-wide errors directory, and if neither of those has
an appropriate error file then it will generate the built-in error.
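
The lookup order just described can be sketched in Python as follows.
This only illustrates the search order and is not thttpd's code. It
assumes each virtual host's web tree is the subdirectory named after
the hostname, as set up by the -v flag, and the function name
find_error_page is invented for this example.

```python
import os
import tempfile


def find_error_page(web_root, status, vhost=None):
    """Return the path of the custom error page, or None.

    None stands for "fall back to the built-in error page".
    """
    name = "err%03d.html" % status
    candidates = []
    if vhost:
        # Per-host page is tried first.
        candidates.append(os.path.join(web_root, vhost, "errors", name))
    # Then the server-wide page.
    candidates.append(os.path.join(web_root, "errors", name))
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


# Demo in a throwaway directory tree.
with tempfile.TemporaryDirectory() as web_root:
    host = "www.joe.acme.com"
    os.makedirs(os.path.join(web_root, "errors"))
    os.makedirs(os.path.join(web_root, host, "errors"))
    for rel in ("errors/err401.html", "errors/err404.html",
                host + "/errors/err404.html"):
        open(os.path.join(web_root, *rel.split("/")), "w").close()

    per_host = os.path.relpath(find_error_page(web_root, 404, host), web_root)
    server_wide = os.path.relpath(find_error_page(web_root, 401, host), web_root)
    built_in = find_error_page(web_root, 500, host)
```

In the demo, the 404 page comes from the virtual host's own directory,
the 401 page falls back to the server-wide one, and 500 has no custom
page at all.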

Sometimes another site on the net will embed your image files in their
HTML files, which basically means they're stealing your bandwidth. You
can prevent them from doing this by using non-local referer filtering.
With this option, certain files can only be fetched via a local
referer. The files have to be referenced by a local web page. If a web
page on some other site references the files, that fetch will be
blocked. There are three config-file variables for this feature:

urlpat A wildcard pattern for the URLs that should require a local
referer. This is typically just image files, sound files, and so
on. For example:
    urlpat=**.jpg|**.gif|**.au|**.wav
For most sites, that one setting is all you need to enable
referer filtering.

noemptyreferers
By default, requests with no referer at all, or a null referer,
or a referer with no apparent hostname, are allowed. With this
variable set, such requests are disallowed.

localpat
A wildcard pattern that specifies the local host or hosts. This
is used to determine if the host in the referer is local or not.
If not specified it defaults to the actual local hostname.
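
How the three variables interact can be sketched as a small decision
function in Python. This is a reading of the description above and not
thttpd's implementation. The names fetch_allowed and wild_match are
made up, the wildcard helper is a compact stand-in for the pattern
rules described under CGI, and the example hostnames are illustrative.

```python
import re
from urllib.parse import urlsplit


def wild_match(patterns, text):
    """Match text against |-separated patterns using *, ** and ?."""
    table = {"**": ".*", "*": "[^/]*", "?": "."}

    def to_regex(pattern):
        tokens = re.findall(r"\*\*|\*|\?|[^*?]+", pattern)
        return "".join(table.get(t, re.escape(t)) for t in tokens)

    return any(re.fullmatch(to_regex(p), text, re.DOTALL)
               for p in patterns.split("|"))


def fetch_allowed(url, referer, urlpat, localpat, noemptyreferers=False):
    """Decide whether a fetch passes non-local referer filtering."""
    if not wild_match(urlpat, url):
        return True                      # URL is not protected at all
    host = urlsplit(referer).hostname if referer else None
    if not host:
        # No referer, null referer, or no apparent hostname.
        return not noemptyreferers
    return wild_match(localpat, host)    # referer must be local


URLPAT = "**.jpg|**.gif|**.au|**.wav"
LOCALPAT = "www.acme.com"
```

For example, with these settings a request for /pics/logo.gif referred
from a page on www.acme.com is allowed, while the same request referred
from another host is blocked.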

thttpd is very picky about symbolic links. Before delivering any file,
it first checks each element in the path to see if it's a symbolic
link, and expands them all out to get the final actual filename. Along
the way it checks for things like links with ".." that go above the
server's directory, and absolute symlinks (ones that start with a /).
These are prohibited as security holes, so the server returns an error
page for them. This means you can't set up your web directory with a
bunch of symlinks pointing to individual users' home web directories.
Instead you do it the other way around - the user web directories are
real subdirs of the main web directory, and in each user's home dir
there's a symlink pointing to their actual web dir.
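
The goal of the check, keeping the expanded path inside the document
tree, can be approximated in Python with os.path.realpath. Note that
this is a different technique from the element-by-element expansion
described above: it tests only where the path finally lands, so unlike
thttpd it would accept an absolute symlink that points back inside the
tree. The function name stays_within is invented for this sketch.

```python
import os
import tempfile


def stays_within(doc_root, relative_path):
    """True if the fully resolved path is inside doc_root."""
    root = os.path.realpath(doc_root)
    target = os.path.realpath(os.path.join(root, relative_path))
    return os.path.commonpath([root, target]) == root


# Demo: one real subdirectory and one symlink that escapes the tree.
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "www")
    os.makedirs(os.path.join(root, "jef"))
    os.symlink(tmp, os.path.join(root, "escape"))  # points above root

    inside = stays_within(root, "jef")
    outside = stays_within(root, "escape")
    dotdot = stays_within(root, "jef/../..")
```

In the demo the real subdirectory passes, while both the escaping
symlink and the ".." path are rejected.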

The CGI pattern is also affected - it gets matched against the
fully-expanded filename. So, if you have a single CGI directory but then put
a symbolic link in it pointing somewhere else, that won't work. The
CGI program will be treated as a regular file and returned to the
client, instead of getting run. This could be confusing.

thttpd is also picky about file permissions. It wants data files
(HTML, images) to be world readable. Readable by the group that the
thttpd process runs as is not enough - thttpd checks explicitly for the
world-readable bit. This is so that no one ever gets surprised by a
file that's not set world-readable and yet somehow is readable by the
HTTP server and therefore the *whole* world.

The same logic applies to directories. As with the standard Unix "ls"
program, thttpd will only let you look at the contents of a directory
if its read bit is on; but as with data files, this must be the
world-read bit, not just the group-read bit.

thttpd also wants the execute bit to be *off* for data files. A file
that is marked executable but doesn't match the CGI pattern might be a
script or program that got accidentally left in the wrong directory.
Allowing people to fetch the contents of the file might be a security
breach, so this is prohibited. Of course if an executable file *does*
match the CGI pattern, then it just gets run as a CGI.

In summary, data files should be mode 644 (rw-r--r--), directories
should be 755 (rwxr-xr-x) if you want to allow indexing and 711
(rwx--x--x) to disallow it, and CGI programs should be mode 755
(rwxr-xr-x) or 711 (rwx--x--x).
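
The mode rules in the summary can be checked with Python's stat module.
This sketch is not thttpd's code. The text above does not say which
execute bit thttpd tests, so the sketch assumes that any of the three
execute bits counts. The function names are made up.

```python
import stat


def world_readable(mode):
    """True if the world-read bit is set."""
    return bool(mode & stat.S_IROTH)


def any_execute_bit(mode):
    """True if the owner, group, or world execute bit is set.

    Assumption: the text does not say which bit thttpd checks.
    """
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))


def servable_as_data(mode):
    """Data files must be world readable and not executable."""
    return world_readable(mode) and not any_execute_bit(mode)


def directory_indexable(mode):
    """Directory contents are listed only with the world-read bit."""
    return world_readable(mode)
```

Under these assumptions mode 644 passes as a data file while 640 and
755 do not, and a 755 directory can be indexed while a 711 one cannot.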

thttpd does all of its logging via syslog(3). The facility it uses is
configurable. Aside from error messages, there are only a few log
entry types of interest, all fairly similar to CERN Common Log Format:
    Aug 6 15:40:34 acme thttpd[583]: 165.113.207.103 - - "GET /file" 200 357
    Aug 6 15:40:43 acme thttpd[583]: 165.113.207.103 - - "HEAD /file" 200 0
    Aug 6 15:41:16 acme thttpd[583]: referer http://www.acme.com/ -> /dir
    Aug 6 15:41:16 acme thttpd[583]: user-agent Mozilla/1.1N
The package includes a script for translating these log entries into
CERN-compatible files. Note that thttpd does not translate numeric IP
addresses into domain names. This is both to save time and as a minor
security measure (the numeric address is harder to spoof).
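
As an illustration of the entry layout shown above, the request
entries can be picked apart with a regular expression. This Python
sketch is based only on the sample lines in this section. It is not the
translation script shipped with the package, and the name
parse_request_entry is invented.

```python
import re

# Matches entries such as:
#   ... thttpd[583]: 165.113.207.103 - - "GET /file" 200 357
REQUEST_ENTRY = re.compile(
    r'thttpd\[(?P<pid>\d+)\]:\s+(?P<host>\S+)\s+\S+\s+\S+\s+'
    r'"(?P<request>[^"]*)"\s+(?P<status>\d{3})\s+(?P<bytes>\d+)\s*$'
)


def parse_request_entry(line):
    """Return a dict for a request entry, or None for other lines."""
    match = REQUEST_ENTRY.search(line)
    if match is None:
        return None
    entry = match.groupdict()
    for key in ("pid", "status", "bytes"):
        entry[key] = int(entry[key])
    return entry


sample = ('Aug 6 15:40:34 acme thttpd[583]: '
          '165.113.207.103 - - "GET /file" 200 357')
entry = parse_request_entry(sample)
```

The referer and user-agent entries have a different shape, so this
function returns None for them rather than guessing at their fields.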

Relevant config.h option: LOG_FACILITY.

If you'd rather log directly to a file, you can use the -l command-line
flag. But note that error messages still go to syslog.

thttpd handles a couple of signals, which you can send via the standard
Unix kill(1) command:

INT,TERM
These signals tell thttpd to shut down immediately. Any
requests in progress get aborted.

USR1 This signal tells thttpd to shut down as soon as it's done
servicing all current requests. In addition, the network socket it
uses to accept new connections gets closed immediately, which
means a fresh thttpd can be started up immediately.

USR2 This signal tells thttpd to generate the statistics syslog
messages immediately, instead of waiting for the regular hourly
update.

HUP This signal tells thttpd to close and re-open its (non-syslog)
log file, for instance if you rotated the logs and want it to
start using the new one. This is a little tricky to set up
correctly, for instance if you are using chroot() then the log file
must be within the chroot tree, but it's definitely doable.
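
Besides kill(1), the signals can be sent from a script by reading the
file written with the -i flag. The Python sketch below assumes that the
pid file holds just the process-id as a decimal number. The function
names and the path in the usage comment are hypothetical.

```python
import os
import signal


def read_pid(pidfile):
    """Read the process-id from the pid file.

    Assumes the file contains only the decimal process-id.
    """
    with open(pidfile) as handle:
        return int(handle.read().strip())


def signal_thttpd(pidfile, sig):
    """Send sig to the process named in pidfile and return its pid."""
    pid = read_pid(pidfile)
    os.kill(pid, sig)
    return pid


# Usage (hypothetical path): ask for a graceful shutdown.
#   signal_thttpd("/var/run/thttpd.pid", signal.SIGUSR1)
```

Passing signal.SIGHUP instead asks thttpd to re-open its log file, and
signal.SIGUSR2 requests the statistics messages, as described above.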

thtpasswd(1), syslogtocern(8)

Many thanks to contributors, reviewers, testers: John LoVerso, Jordan
Hayes, Chris Torek, Jim Thompson, Barton Schaffer, Geoff Adams, Dan
Kegel, John Hascall, Bennett Todd, KIKUCHI Takahiro, Catalin Ionescu.
Special thanks to Craig Leres for substantial debugging and
development, and for not complaining about my coding style very much.

Copyright © 1995,1998,1999,2000 by Jef Poskanzer <jef@acme.com>. All
rights reserved.

29 February 2000 thttpd(8)