HTTP::Proxy(3)        User Contributed Perl Documentation       HTTP::Proxy(3)

NAME
       HTTP::Proxy - A pure Perl HTTP proxy

SYNOPSIS
           use HTTP::Proxy;

           # initialisation
           my $proxy = HTTP::Proxy->new( port => 3128 );

           # alternate initialisation
           my $proxy = HTTP::Proxy->new;
           $proxy->port( 3128 ); # the classical accessors are here!

           # this is a MainLoop-like method
           $proxy->start;

DESCRIPTION
       This module implements an HTTP proxy, using an HTTP::Daemon to accept
       client connections, and an LWP::UserAgent to ask for the requested
       pages.

       The most interesting feature of this proxy object is its ability to
       filter the HTTP requests and responses through user-defined filters.

       Once the proxy is created, with the "new()" method, it is possible to
       alter its behaviour by adding so-called "filters". This is done by the
       "push_filter()" method. Once the proxy is ready to run, it can be
       launched with the "start()" method. This method does not normally
       return until the proxy is killed or otherwise stopped.

       An important thing to note is that the proxy is (except when running
       the "NoFork" engine) a forking proxy: it doesn't support passing
       information between child processes, and you can count on reliable
       information passing only during a single HTTP connection (request +
       response).
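
       As a minimal sketch of the sequence described above (create the
       proxy, add a filter, start it), the following pushes a single
       response header filter. The header name "X-Filtered-By" and its value
       are placeholders chosen for illustration, and the "start()" call is
       left commented out because it blocks:

```perl
use strict;
use warnings;
use HTTP::Proxy;
use HTTP::Proxy::HeaderFilter::simple;

my $proxy = HTTP::Proxy->new( port => 3128 );

# Illustrative filter: mark responses that went through the proxy.
# The header name and value are placeholders, not part of HTTP::Proxy.
my $marker = HTTP::Proxy::HeaderFilter::simple->new(
    sub {
        my ( $self, $headers, $message ) = @_;
        $headers->header( 'X-Filtered-By' => 'example-proxy' );
    }
);
$proxy->push_filter( response => $marker );

# start() hands control to the proxy and does not normally return:
# $proxy->start;
```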

FILTERS
       You can alter the way the default HTTP::Proxy works by plugging
       callbacks (filter objects, actually) at different stages of the
       request/response handling.

       When a request is received by the HTTP::Proxy object, it is filtered
       through a standard filter that transforms this request according to
       RFC 2616 (by adding the "Via:" header, and a few other
       transformations). This is the default, bare minimum behaviour.

       The response is also filtered in the same manner. There is a total of
       four filter chains: "request-headers", "request-body",
       "response-headers" and "response-body".

       You can add your own filters to the default ones with the
       "push_filter()" method. The method pushes a filter on the appropriate
       filter stack.

           $proxy->push_filter( response => $filter );

       The headers/body category is determined by the base class of the
       filter. There are two base classes for filters, which are
       "HTTP::Proxy::HeaderFilter" and "HTTP::Proxy::BodyFilter" (the names
       are self-explanatory). See the documentation of those two classes to
       find out how to write your own header or body filters.

       The named parameter is used to determine the request/response part.

       It is possible to push the same filter on the request and response
       stacks, as in the following example:

           $proxy->push_filter( request => $filter, response => $filter );

       If several filters match the message, they will be applied in the order
       they were pushed on their filter stack.
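
       To illustrate pushing one filter object on both stacks, here is a
       sketch of a header filter subclass. The package name
       "My::HeaderCounter" and the log prefix "COUNT" are hypothetical.
       Because the class derives from "HTTP::Proxy::HeaderFilter", the object
       ends up on the "request-headers" and "response-headers" chains:

```perl
use strict;
use warnings;
use HTTP::Proxy;

{
    package My::HeaderCounter;    # hypothetical name
    use base qw( HTTP::Proxy::HeaderFilter );

    # Called with the HTTP::Headers object and the message
    # (an HTTP::Request or an HTTP::Response) it belongs to.
    sub filter {
        my ( $self, $headers, $message ) = @_;
        my $kind  = $message->isa('HTTP::Request') ? 'request' : 'response';
        my @names = $headers->header_field_names;
        $self->proxy->log( HTTP::Proxy::FILTERS(), 'COUNT',
            "$kind has " . @names . " header fields" );
    }
}

my $proxy   = HTTP::Proxy->new( port => 3128 );
my $counter = My::HeaderCounter->new;

# one object, two stacks
$proxy->push_filter( request => $counter, response => $counter );
```

       The log line is only written when the "FILTERS" element is part of
       the proxy's "logmask".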

       Named parameters can be used to create the match routine. They are:

           method - the request method
           scheme - the URI scheme
           host   - the URI authority (host:port)
           path   - the URI path
           query  - the URI query string
           mime   - the MIME type (for a response-body filter)

       The filters are applied only when all the parameters match the
       request or the response. All these named parameters have default
       values, which are:

           method => 'OPTIONS,GET,HEAD,POST,PUT,DELETE,TRACE,CONNECT'
           scheme => 'http'
           host   => ''
           path   => ''
           query  => ''
           mime   => 'text/*'

       The "mime" parameter is a glob-like string, with a required "/"
       character and a "*" as a wildcard. Thus, "*/*" matches all responses
       that carry a "Content-Type:" header, and "" those with no
       "Content-Type:" header. To match any response (with or without a
       "Content-Type:" header), use "undef".

       The "mime" parameter is only meaningful with the "response-body" filter
       stack. It is ignored if passed to any other filter stack.

       The "method" and "scheme" parameters are strings consisting of
       comma-separated values. The "host" and "path" parameters are regular
       expressions.

       A match routine is compiled by the proxy and used to check if a
       particular request or response must be filtered through a particular
       filter.
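
       The match parameters can be combined in one "push_filter()" call, as
       in the sketch below. The host name, path, header name and text
       substitution are made up for the example. Since "host" is matched
       against the URI authority, the pattern allows for an optional port:

```perl
use strict;
use warnings;
use HTTP::Proxy;
use HTTP::Proxy::HeaderFilter::simple;
use HTTP::Proxy::BodyFilter::simple;

my $proxy = HTTP::Proxy->new( port => 3128 );

# Header filter limited to GET/HEAD requests for /docs/ on one site.
$proxy->push_filter(
    method  => 'GET,HEAD',
    host    => '(^|\.)example\.com(:\d+)?$',
    path    => '^/docs/',
    request => HTTP::Proxy::HeaderFilter::simple->new(
        sub {
            my ( $self, $headers, $message ) = @_;
            $headers->header( 'X-Docs-Request' => 1 );    # placeholder header
        }
    ),
);

# Body filter applied to HTML responses only, through the "mime" glob.
$proxy->push_filter(
    mime     => 'text/html',
    response => HTTP::Proxy::BodyFilter::simple->new(
        sub {
            my ( $self, $dataref, $message, $protocol, $buffer ) = @_;
            $$dataref =~ s/colour/color/g;                # placeholder edit
        }
    ),
);
```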

       It is also possible to push several filters on the same stack with the
       same match subroutine:

           # convert italics to bold
           $proxy->push_filter(
               mime     => 'text/html',
               response => HTTP::Proxy::BodyFilter::tags->new(),
               response => HTTP::Proxy::BodyFilter::simple->new(
                   sub { ${ $_[1] } =~ s!(</?)i>!$1b>!ig }
               )
           );

       For more details regarding the creation of new filters, check the
       "HTTP::Proxy::HeaderFilter" and "HTTP::Proxy::BodyFilter"
       documentation.

       Here's an example of subclassing a base filter class:

           # fixes a common typo ;-)
           # but chances are that this will modify a correct URL
           {
               package FilterPerl;
               use base qw( HTTP::Proxy::BodyFilter );

               sub filter {
                   my ( $self, $dataref, $message, $protocol, $buffer ) = @_;
                   $$dataref =~ s/PERL/Perl/g;
               }
           }
           $proxy->push_filter( response => FilterPerl->new() );

       Other examples can be found in the documentation for
       "HTTP::Proxy::HeaderFilter", "HTTP::Proxy::BodyFilter",
       "HTTP::Proxy::HeaderFilter::simple", "HTTP::Proxy::BodyFilter::simple".

           # a simple anonymiser
           # see eg/anonymiser.pl for the complete code
           # (the callback receives the filter object in $_[0]
           # and the HTTP::Headers object in $_[1])
           $proxy->push_filter(
               mime    => undef,
               request => HTTP::Proxy::HeaderFilter::simple->new(
                   sub { $_[1]->remove_header(qw( User-Agent From Referer Cookie )) },
               ),
               response => HTTP::Proxy::HeaderFilter::simple->new(
                   sub { $_[1]->remove_header(qw( Set-Cookie )); },
               )
           );

       IMPORTANT: If you use your own "LWP::UserAgent", you must install it
       before your calls to "push_filter()", otherwise the match method will
       make wrong assumptions about the schemes your agent supports.

       NOTE: It is likely that the possibility of changing the agent or the
       daemon may disappear in future versions.
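
       The ordering requirement above can be sketched as follows. The agent
       options are ordinary "LWP::UserAgent" constructor options picked for
       the example, not settings that HTTP::Proxy is known to require:

```perl
use strict;
use warnings;
use HTTP::Proxy;
use HTTP::Proxy::HeaderFilter::simple;
use LWP::UserAgent;

my $proxy = HTTP::Proxy->new( port => 3128 );

# Install the custom agent first ...
my $agent = LWP::UserAgent->new(
    env_proxy  => 1,     # honour *_proxy environment variables
    parse_head => 0,     # do not turn <head> content into headers
    timeout    => 30,
);
$proxy->agent($agent);

# ... and only then push filters, so that scheme checks use this agent.
$proxy->push_filter(
    scheme  => 'http',
    request => HTTP::Proxy::HeaderFilter::simple->new(
        sub { $_[1]->remove_header('Cookie') }
    ),
);
```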

METHODS
   Constructor and initialisation
       new()
           The "new()" method creates a new HTTP::Proxy object. All attributes
           can be passed as parameters to replace the default.

           Parameters that are not "HTTP::Proxy" attributes will be ignored
           by the proxy itself and passed on to the chosen
           "HTTP::Proxy::Engine" object.

       init()
           "init()" initialises the proxy without starting it. It is usually
           not needed.

           This method is called by "start()" if needed.

       push_filter()
           The "push_filter()" method is used to add filters to the proxy. It
           is fully described in section FILTERS.

   Accessors and mutators
       The HTTP::Proxy has several accessors and mutators.

       Called without arguments, the accessor returns the current value.
       Called with a single argument, it sets the current value and returns
       the previous one, in case you want to keep it.

       If you call a read-only accessor with a parameter, this parameter will
       be ignored.
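
       A short sketch of these calling conventions, using the "port"
       accessor and the read-only "conn" accessor (the port numbers are
       arbitrary):

```perl
use strict;
use warnings;
use HTTP::Proxy;

my $proxy = HTTP::Proxy->new( port => 3128 );

my $current  = $proxy->port;          # no argument: read the value
my $previous = $proxy->port(8080);    # one argument: set, return the old value

print "port was $previous, now ", $proxy->port, "\n";

# conn is read-only: the argument is ignored
$proxy->conn(42);
print "connections so far: ", $proxy->conn, "\n";
```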

       The defined accessors are (in alphabetical order):

       agent
           The LWP::UserAgent object used internally to connect to remote
           sites.

       chunk
           The chunk size for the LWP::UserAgent callbacks.

       client_socket (read-only)
           The socket currently connected to the client. Mostly useful in
           filters.

       client_headers
           This attribute holds a reference to the client headers set up by
           LWP::UserAgent ("Client-Aborted", "Client-Bad-Header-Line",
           "Client-Date", "Client-Junk", "Client-Peer", "Client-Request-Num",
           "Client-Response-Num", "Client-SSL-Cert-Issuer",
           "Client-SSL-Cert-Subject", "Client-SSL-Cipher",
           "Client-SSL-Warning", "Client-Transfer-Encoding",
           "Client-Warning").

           They are removed by the filter HTTP::Proxy::HeaderFilter::standard
           from the request and response objects received by the proxy.

           If a filter (such as an SSL certificate verification filter) needs
           to access them, it must do so through this accessor.

       conn (read-only)
           The number of connections processed by this HTTP::Proxy instance.

       daemon
           The HTTP::Daemon object used to accept incoming connections. (You
           usually never need this.)

       engine
           The HTTP::Proxy::Engine object that manages the child processes.

       hop_headers
           This attribute holds a reference to the hop-by-hop headers
           ("Connection", "Keep-Alive", "Proxy-Authenticate",
           "Proxy-Authorization", "TE", "Trailers", "Transfer-Encoding",
           "Upgrade").

           They are removed by the filter HTTP::Proxy::HeaderFilter::standard
           from the request and response objects received by the proxy.

           If a filter (such as a proxy authorisation filter) needs to access
           them, it must do so through this accessor.

       host
           The proxy HTTP::Daemon host (default: 'localhost').

           This means that by default, the proxy answers only to clients on
           the local machine. You can pass a specific interface address or
           ""/"undef" for any interface.

           This default prevents your proxy from being used as an anonymous
           proxy by script kiddies.

       known_methods( @groups ) (read-only)
           This method returns all HTTP methods (and extensions to HTTP)
           known to "HTTP::Proxy". Methods are grouped by type. Known method
           groups are: "HTTP", "WebDAV" and "DeltaV".

           Called with an empty list, this method will return all known
           methods. This method is case-insensitive, and will "carp()" if an
           unknown group name is passed.
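
           For instance, the groups could be listed like this (a sketch; the
           output depends on the installed version, so none is shown):

```perl
use strict;
use warnings;
use HTTP::Proxy;

my $proxy = HTTP::Proxy->new;

my @http   = $proxy->known_methods('HTTP');
my @webdav = $proxy->known_methods('webdav');    # group names are case-insensitive
my @all    = $proxy->known_methods();            # every known method

print "HTTP:   @http\n";
print "WebDAV: @webdav\n";
printf "%d methods known in total\n", scalar @all;
```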

       logfh
           A filehandle to a logfile (default: *STDERR).

       logmask( [$mask] )
           Be verbose in the logs (default: NONE).

           Here are the various elements that can be added to the mask. NONE
           is 0 and the elements that follow it are successive powers of 2,
           listed here in ascending order:

               NONE    - Log only errors
               PROXY   - Proxy information
               STATUS  - Requested URL, response status and total number
                         of connections processed
               PROCESS - Subprocesses information (fork, wait, etc.)
               SOCKET  - Information about low-level sockets
               HEADERS - Full request and response headers are sent along
               FILTERS - Filter information
               DATA    - Data received by the filters
               CONNECT - Data transmitted by the CONNECT method
               ENGINE  - Engine information
               ALL     - Log all of the above

           If you only want status and process information, you can use:

               $proxy->logmask( STATUS | PROCESS );

           Note that none of the logging constants are exported by default;
           they are exported by the ":log" tag. They can also be exported
           one by one.
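
           Since the mask is a bit field, elements can be combined, added and
           tested with the usual bitwise operators. A small sketch, assuming
           the constants were imported with the ":log" tag:

```perl
use strict;
use warnings;
use HTTP::Proxy qw( :log );    # imports NONE, PROXY, STATUS, ... ALL

my $proxy = HTTP::Proxy->new( port => 3128 );

# combine elements with bitwise OR
$proxy->logmask( STATUS | PROCESS );

# add one more element to the current mask
$proxy->logmask( $proxy->logmask() | HEADERS );

# test whether an element is enabled
print "headers are logged\n" if $proxy->logmask() & HEADERS;

# importing a single constant instead of the whole tag would look like:
#   use HTTP::Proxy qw( STATUS );
```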

       loop (read-only)
           Internal. False when the main loop is about to be broken.

       max_clients
       maxchild
           The maximum number of child processes the HTTP::Proxy object will
           spawn to handle client requests (default: depends on the engine).

           This method is currently delegated to the HTTP::Proxy::Engine
           object.

           "maxchild" is deprecated and will disappear.

       max_connections
       maxconn
           The maximum number of TCP connections the proxy will accept before
           returning from start(). 0 (the default) means never stop accepting
           connections.

           "maxconn" is deprecated.

           Note: "max_connections" will be deprecated soon, for two reasons:
           1) it is more of an HTTP::Proxy::Engine attribute, 2) not all
           engines will support it.

       max_keep_alive_requests
       maxserve
           The maximum number of requests the proxy will serve in a single
           connection (same as "MaxKeepAliveRequests" in Apache).

           "maxserve" is deprecated.

       port
           The proxy "HTTP::Daemon" port (default: 8080).

       request
           The request originally received by the proxy from the user-agent,
           which will be modified by the request filters.

       response
           The response received from the origin server by the proxy. It is
           normally "undef" until the proxy actually receives the beginning of
           a response from the origin server.

           If one of the request filters sets this attribute, it
           "short-circuits" the request/response scheme, and the proxy will
           return this response (which is NOT filtered through the response
           filter stacks) instead of the expected origin server response.
           This is useful for caching (though Squid does it much better) and
           proxy authentication, for example.
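
           A request filter that short-circuits the exchange might look like
           the following sketch. The blocked host name and the wording of the
           403 response are invented for the example:

```perl
use strict;
use warnings;
use HTTP::Proxy;
use HTTP::Proxy::HeaderFilter::simple;
use HTTP::Response;

my $proxy = HTTP::Proxy->new( port => 3128 );

# Refuse requests to a (hypothetical) blocked host without contacting it.
my $blocker = HTTP::Proxy::HeaderFilter::simple->new(
    sub {
        my ( $self, $headers, $message ) = @_;

        my $response = HTTP::Response->new( 403, 'Forbidden' );
        $response->header( 'Content-Type' => 'text/plain' );
        $response->content("Access to this host is blocked by the proxy.\n");

        # setting the attribute short-circuits the request
        $self->proxy->response($response);
    }
);

$proxy->push_filter(
    host    => '(^|\.)blocked\.example(:\d+)?$',
    request => $blocker,
);
```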

       stash
           The stash is a hash where filters can store data to share between
           them.

           The stash() method can be used to set the whole hash (with a HASH
           reference). To access individual keys simply do:

               $proxy->stash( 'bloop' );

           To set it, type:

               $proxy->stash( bloop => 'owww' );

           It's also possible to get a reference to the stash:

               my $s = $filter->proxy->stash();
               $s->{bang} = 'bam';

               # $proxy->stash( 'bang' ) will now return 'bam'

           Warning: since the proxy forks for each TCP connection, the data is
           only shared between filters in the same child process.
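
           As a sketch of two filters sharing data through the stash, the
           request filter below records a timestamp that the response filter
           reads back. The stash key "started" and the log prefix "TIMING"
           are arbitrary names:

```perl
use strict;
use warnings;
use HTTP::Proxy qw( :log );
use HTTP::Proxy::HeaderFilter::simple;

my $proxy = HTTP::Proxy->new( port => 3128 );

$proxy->push_filter(
    # remember when the request was seen ...
    request => HTTP::Proxy::HeaderFilter::simple->new(
        sub {
            my ( $self, $headers, $message ) = @_;
            $self->proxy->stash( started => time );
        }
    ),
    # ... and read it back when the response headers arrive
    response => HTTP::Proxy::HeaderFilter::simple->new(
        sub {
            my ( $self, $headers, $message ) = @_;
            my $elapsed = time - ( $self->proxy->stash('started') || time );
            $self->proxy->log( STATUS, 'TIMING', "response after ${elapsed}s" );
        }
    ),
);
```

           With keep-alive connections a child may serve several requests,
           so the stored value is overwritten by each new request.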

       timeout
           The timeout used by the internal LWP::UserAgent (default: 60).

       url (read-only)
           The url where the proxy can be reached.

       via
           The content of the Via: header. Setting it to an empty string will
           prevent its addition. (default: "$hostname (HTTP::Proxy/$VERSION)")

       x_forwarded_for
           If set to a true value, the proxy will send the "X-Forwarded-For:"
           header. (default: true)
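
           These attributes can be given to the constructor like any other.
           A sketch of a proxy that adds neither header and uses a shorter
           timeout (the values are arbitrary):

```perl
use strict;
use warnings;
use HTTP::Proxy;

# No "Via:" header, no "X-Forwarded-For:" header,
# and a 20 second timeout for the internal agent.
my $proxy = HTTP::Proxy->new(
    port            => 3128,
    timeout         => 20,
    via             => '',
    x_forwarded_for => 0,
);

print "Via: header disabled\n" if $proxy->via eq '';
```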

   Connection handling methods
       start()
           This method works like Tk's "MainLoop": you hand over control to
           the "HTTP::Proxy" object you created and configured.

           If "max_connections" is not zero, "start()" will return after
           accepting at most that many connections. It will return the total
           number of connections.

       serve_connections()
           This is the internal method used to handle each new TCP connection
           to the proxy.

   Other methods
       log( $level, $prefix, $message )
           Adds $message at the end of "logfh", if $level matches "logmask".
           The "log()" method also prints a timestamp.

           The output looks like:

               [Thu Dec 5 12:30:12 2002] ($$) $prefix: $message

           where $$ is the current process id.

           If $message is a multiline string, several log lines will be
           output, each line starting with $prefix.
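
           The interaction between "logfh", "logmask" and "log()" can be
           tried without starting the proxy. The sketch below logs to a
           temporary file; the prefix "EXAMPLE" and the messages are
           placeholders:

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );
use IO::Handle;
use HTTP::Proxy qw( :log );

# write the log to a temporary file instead of STDERR
my ( $fh, $filename ) = tempfile( UNLINK => 1 );
$fh->autoflush(1);

my $proxy = HTTP::Proxy->new( logfh => $fh, logmask => STATUS );

# in the mask: written as two lines, each starting with the prefix
$proxy->log( STATUS, 'EXAMPLE', "first line\nsecond line" );

# not in the mask: nothing is written
$proxy->log( HEADERS, 'EXAMPLE', 'not logged' );
```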

       is_protocol_supported( $scheme )
           Returns a boolean indicating if $scheme is supported by the proxy.

           This method is only used internally.

           It is essential to allow HTTP::Proxy users to create
           "pseudo-schemes" that LWP doesn't know about, but that one of the
           proxy filters can handle directly. New schemes are added as
           follows:

               $proxy->init();    # required to get an agent
               $proxy->agent->protocols_allowed(
                   [ @{ $proxy->agent->protocols_allowed }, 'myhttp' ] );

       new_connection()
           Increase the proxy's TCP connections counter. Only used by
           "HTTP::Proxy::Engine" objects.

   Apache-like attributes
       "HTTP::Proxy" has several Apache-like attributes that control the way
       the HTTP and TCP connections are handled.

       The following attributes control the TCP connection. They are passed to
       the underlying "HTTP::Proxy::Engine", which may (or may not) use them
       to change its behaviour.

       start_servers
           Number of child processes to fork at the beginning.

       max_clients
           Maximum number of concurrent TCP connections (i.e. child
           processes).

       max_requests_per_child
           Maximum number of TCP connections handled by the same child
           process.

       min_spare_servers
           Minimum number of inactive child processes.

       max_spare_servers
           Maximum number of inactive child processes.

       These attributes control the HTTP connection:

       keep_alive
           Support for keep alive HTTP connections.

       max_keep_alive_requests
           Maximum number of HTTP requests within a single TCP connection.

       keep_alive_timeout
           Timeout for keep-alive connection.
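
           Because engine parameters are simply passed through "new()", a
           tuned proxy could be created as in the sketch below. It assumes
           the "ScoreBoard" engine described in the HTTP::Proxy::Engine
           documentation, selected with an "engine" parameter. Whether each
           attribute is honoured depends on the engine, and the numbers are
           arbitrary:

```perl
use strict;
use warnings;
use HTTP::Proxy;

my $proxy = HTTP::Proxy->new(
    port => 3128,

    # assumption: engine selection by name, per HTTP::Proxy::Engine
    engine => 'ScoreBoard',

    # TCP-level attributes, handed to the engine
    start_servers          => 4,
    max_clients            => 12,
    min_spare_servers      => 2,
    max_spare_servers      => 8,
    max_requests_per_child => 250,

    # HTTP-level attribute, handled by the proxy
    max_keep_alive_requests => 10,
);

print "engine: ", ref( $proxy->engine ), "\n";
```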

EXPORTED SYMBOLS
       No symbols are exported by default. The ":log" tag exports all the
       logging constants.

BUGS
       This module does not work under Windows, but I can't see why, and do
       not have a development platform under that system. Patches and
       explanations very welcome.

       I guess it is because "fork()" is not well supported.

           $proxy->maxchild(0);

       However, David Fishburn says:
           This did not work for me under WinXP - ActiveState Perl 5.6, but it
           DOES work on WinXP ActiveState Perl 5.8.

       Several people have tried to help, but we haven't found a way to make
       it work correctly yet.

       As from version 0.16, the default engine is
       "HTTP::Proxy::Engine::NoFork". Let me know if it works better.

SEE ALSO
       HTTP::Proxy::Engine, HTTP::Proxy::BodyFilter,
       HTTP::Proxy::HeaderFilter, the examples in eg/.

AUTHOR
       Philippe "BooK" Bruhat, <book@cpan.org>.

       The module has its own web page at <http://http-proxy.mongueurs.net/>,
       complete with older versions and repository snapshot.

       There are also two mailing-lists: http-proxy@mongueurs.net for general
       discussion about "HTTP::Proxy" and http-proxy-cvs@mongueurs.net for CVS
       commit emails.

THANKS
       Many people helped me during the development of this module, either on
       mailing-lists, IRC or over a beer in a pub...

       So, in no particular order, thanks to the libwww-perl team for such a
       terrific suite of modules, perl-qa (tips for testing), the French Perl
       Mongueurs (for code tricks, beers and encouragement) and my growing
       user base... ";-)"

       I'd like to particularly thank Dan Grigsby, who's been using
       "HTTP::Proxy" since 2003 (before the filter classes even existed). He
       is apparently making a living from a product based on "HTTP::Proxy".
       Thanks a lot for your confidence in my work!

COPYRIGHT
       Copyright 2002-2008, Philippe Bruhat.

LICENSE
       This module is free software; you can redistribute it or modify it
       under the same terms as Perl itself.

perl v5.12.0                      2010-05-02                    HTTP::Proxy(3)