HTTP::Proxy(3)        User Contributed Perl Documentation       HTTP::Proxy(3)

NAME

       HTTP::Proxy - A pure Perl HTTP proxy

SYNOPSIS

           use HTTP::Proxy;

           # initialisation
           my $proxy = HTTP::Proxy->new( port => 3128 );

           # alternate initialisation
           my $proxy = HTTP::Proxy->new;
           $proxy->port( 3128 ); # the classical accessors are here!

           # this is a MainLoop-like method
           $proxy->start;

DESCRIPTION

       This module implements an HTTP proxy, using an HTTP::Daemon to accept
       client connections, and an LWP::UserAgent to fetch the requested
       pages.

       The most interesting feature of this proxy object is its ability to
       filter the HTTP requests and responses through user-defined filters.

       Once the proxy is created with the "new()" method, you can alter its
       behaviour by adding so-called "filters" with the "push_filter()"
       method. Once the proxy is ready to run, it can be launched with the
       "start()" method. This method does not normally return until the
       proxy is killed or otherwise stopped.

       An important thing to note is that the proxy is (except when running
       the "NoFork" engine) a forking proxy: it doesn't support passing
       information between child processes, and you can count on reliable
       information passing only during a single HTTP connection (request +
       response).

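       The forking caveat can be illustrated without the module at all. The
       following is a minimal sketch in plain Perl (the %seen hash and the
       pipe are illustrative assumptions, not part of HTTP::Proxy): a child
       process updates its own copy of a variable, and the parent never
       sees the change.

```perl
use strict;
use warnings;

# State held by the parent before any child is forked.
my %seen = ( count => 0 );

pipe( my $reader, my $writer ) or die "pipe: $!";

my $pid = fork();
die "fork: $!" unless defined $pid;

if ( $pid == 0 ) {
    # Child: it works on its own copy of %seen.
    close $reader;
    $seen{count}++;
    print {$writer} "child count=$seen{count}\n";
    close $writer;
    exit 0;
}

# Parent: the child's update never shows up here.
close $writer;
my $report = <$reader>;
waitpid $pid, 0;
print $report, "parent count=$seen{count}\n";
```

       Anything a filter needs to share across connections therefore has to
       go through an external channel such as a pipe, a file or a database.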

FILTERS

       You can alter the way the default HTTP::Proxy works by plugging
       callbacks (filter objects, actually) at different stages of the
       request/response handling.

       When a request is received by the HTTP::Proxy object, it is filtered
       through a standard filter that transforms this request according to
       RFC 2616 (by adding the "Via:" header, and a few other
       transformations). This is the default, bare minimum behaviour.

       The response is also filtered in the same manner. There is a total of
       four filter chains: "request-headers", "request-body",
       "response-headers" and "response-body".

       You can add your own filters to the default ones with the
       "push_filter()" method. The method pushes a filter on the appropriate
       filter stack.

           $proxy->push_filter( response => $filter );

       The headers/body category is determined by the base class of the
       filter.  There are two base classes for filters, which are
       "HTTP::Proxy::HeaderFilter" and "HTTP::Proxy::BodyFilter" (the names
       are self-explanatory). See the documentation of those two classes to
       find out how to write your own header or body filters.

       The named parameter is used to determine the request/response part.

       It is possible to push the same filter on the request and response
       stacks, as in the following example:

           $proxy->push_filter( request => $filter, response => $filter );

       If several filters match the message, they will be applied in the order
       they were pushed on their filter stack.

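       Why the push order matters can be seen with a toy stack. This is
       only a sketch in plain Perl: push_callback() and run_stack() are
       hypothetical helpers standing in for a filter stack, not the
       module's own code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for one filter stack: an ordered list of
# callbacks, each editing the body through a scalar reference.
my @stack;

sub push_callback { push @stack, @_; return scalar @stack }

sub run_stack {
    my ($dataref) = @_;
    $_->($dataref) for @stack;    # push order is application order
    return $$dataref;
}

push_callback( sub { ${ $_[0] } =~ s/PERL/Perl/g } );
push_callback( sub { ${ $_[0] } =~ s{Perl}{<b>Perl</b>}g } );

my $body = 'PERL rocks';
print run_stack( \$body ), "\n";
```

       Had the two callbacks been pushed in the opposite order, the second
       substitution would have found nothing to wrap, so the result would
       not contain any bold markup.
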
       Named parameters can be used to create the match routine. They are:

           method - the request method
           scheme - the URI scheme
           host   - the URI authority (host:port)
           path   - the URI path
           query  - the URI query string
           mime   - the MIME type (for a response-body filter)

       The filters are applied only when all the parameters match the
       request or the response. All these named parameters have default
       values, which are:

           method => 'OPTIONS,GET,HEAD,POST,PUT,DELETE,TRACE,CONNECT'
           scheme => 'http'
           host   => ''
           path   => ''
           query  => ''
           mime   => 'text/*'

       The "mime" parameter is a glob-like string, with a required "/"
       character and a "*" as a wildcard. Thus, "*/*" matches all responses
       that carry a "Content-Type:" header, and "" those with no
       "Content-Type:" header. To match any response (with or without a
       "Content-Type:" header), use "undef".

       The "mime" parameter is only meaningful with the "response-body" filter
       stack. It is ignored if passed to any other filter stack.

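       The three cases of the "mime" parameter (a glob, the empty string and
       "undef") could be modelled as follows. This is a sketch under
       assumptions: mime_matcher() is a hypothetical helper, and the way the
       proxy itself compiles the pattern may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTTP::Proxy: turn a glob-like
# MIME pattern into a predicate on a Content-Type value.
sub mime_matcher {
    my ($glob) = @_;

    # undef matches any response, with or without a Content-Type
    return sub { 1 } if !defined $glob;

    # '' matches only responses that carry no Content-Type
    return sub { ( !defined $_[0] || $_[0] eq '' ) ? 1 : 0 }
        if $glob eq '';

    die "mime pattern must contain a '/'\n" if index( $glob, '/' ) < 0;

    # quote everything, then turn each quoted '*' back into a wildcard
    my $re = quotemeta $glob;
    $re =~ s{\\\*}{[^/]*}g;
    my $qr = qr/\A$re\z/i;

    return sub {
        my ($type) = @_;
        return 0 if !defined $type;
        $type =~ s/\s*;.*\z//s;    # drop parameters such as charset
        return $type =~ $qr ? 1 : 0;
    };
}

my $is_text = mime_matcher('text/*');
print $is_text->('text/html; charset=utf-8') ? "match\n" : "no match\n";
```
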
       The "method" and "scheme" parameters are strings consisting of
       comma-separated values. The "host" and "path" parameters are regular
       expressions.

       A match routine is compiled by the proxy and used to check if a
       particular request or response must be filtered through a particular
       filter.

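       To make the idea of a compiled match routine concrete, here is a
       sketch of how such a routine could be built from the named
       parameters. compile_match() and the plain hash used to describe a
       request are assumptions made for illustration; the proxy's own
       routine works on real message objects and also handles "query" and
       "mime".

```perl
use strict;
use warnings;

# Hypothetical sketch, not the module's own code: build a predicate
# from the method, scheme, host and path parameters.
sub compile_match {
    my (%arg) = @_;

    my $methods = defined $arg{method}
        ? $arg{method}
        : 'OPTIONS,GET,HEAD,POST,PUT,DELETE,TRACE,CONNECT';
    my $schemes = defined $arg{scheme} ? $arg{scheme} : 'http';

    # comma-separated lists become lookup tables
    my %method = map { ( uc($_) => 1 ) } split /,/, $methods;
    my %scheme = map { ( lc($_) => 1 ) } split /,/, $schemes;

    # host and path are regular expressions; an empty one matches anything
    my %re;
    for my $part (qw( host path )) {
        my $src = $arg{$part};
        $re{$part} = qr/$src/ if defined $src && length $src;
    }

    return sub {
        my ($req) = @_;    # plain hash: method, scheme, host, path
        return 0 unless $method{ uc $req->{method} };
        return 0 unless $scheme{ lc $req->{scheme} };
        for my $part ( sort keys %re ) {
            return 0 unless $req->{$part} =~ $re{$part};
        }
        return 1;
    };
}

my $match = compile_match( host => '\.example\.com$', path => '^/img/' );
my %req   = (
    method => 'GET',
    scheme => 'http',
    host   => 'www.example.com',
    path   => '/img/logo.png',
);
print $match->( \%req ) ? "filtered\n" : "skipped\n";
```

       Compiling the predicate once, when the filter is pushed, keeps the
       per-message cost down to a few hash lookups and regex matches.
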
       It is also possible to push several filters on the same stack with the
       same match subroutine:

           # convert italics to bold
           $proxy->push_filter(
               mime     => 'text/html',
               response => HTTP::Proxy::BodyFilter::tags->new(),
               response =>
                 HTTP::Proxy::BodyFilter::simple->new( sub { s!(</?)i>!$1b>!ig } )
           );

       For more details regarding the creation of new filters, check the
       "HTTP::Proxy::HeaderFilter" and "HTTP::Proxy::BodyFilter"
       documentation.

       Here's an example of subclassing a base filter class:

           # fixes a common typo ;-)
           # but chances are that this will modify a correct URL
           {
               package FilterPerl;
               use base qw( HTTP::Proxy::BodyFilter );

               sub filter {
                   my ( $self, $dataref, $message, $protocol, $buffer ) = @_;
                   $$dataref =~ s/PERL/Perl/g;
               }
           }
           $proxy->push_filter( response => FilterPerl->new() );

       Other examples can be found in the documentation for
       "HTTP::Proxy::HeaderFilter", "HTTP::Proxy::BodyFilter",
       "HTTP::Proxy::HeaderFilter::simple", "HTTP::Proxy::BodyFilter::simple".

           # a simple anonymiser
           # see eg/anonymiser.pl for the complete code
           $proxy->push_filter(
               mime    => undef,
               request => HTTP::Proxy::HeaderFilter::simple->new(
                   sub { $_[0]->remove_header(qw( User-Agent From Referer Cookie )) },
               ),
               response => HTTP::Proxy::HeaderFilter::simple->new(
                   sub { $_[0]->remove_header(qw( Set-Cookie )); },
               )
           );

       IMPORTANT: If you use your own "LWP::UserAgent", you must install it
       before your calls to "push_filter()", otherwise the match method will
       make wrong assumptions about the schemes your agent supports.

       NOTE: It is likely that the possibility of changing the agent or the
       daemon will disappear in future versions.

METHODS

       Constructor and initialisation

       new()
           The "new()" method creates a new HTTP::Proxy object. All attributes
           can be passed as parameters to replace the default.

           Parameters that are not "HTTP::Proxy" attributes are not used by
           the proxy itself; they are passed on to the chosen
           "HTTP::Proxy::Engine" object.

       init()
           "init()" initialises the proxy without starting it. It is usually
           not needed.

           This method is called by "start()" if needed.

       push_filter()
           The "push_filter()" method is used to add filters to the proxy.  It
           is fully described in section FILTERS.

       Accessors and mutators

       The HTTP::Proxy has several accessors and mutators.

       Called without arguments, the accessor returns the current value.
       Called with a single argument, it sets the current value and returns
       the previous one, in case you want to keep it.

       If you call a read-only accessor with a parameter, this parameter will
       be ignored.

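       The get/set convention described above can be sketched with a tiny
       stand-in class. Sketch::Proxy is hypothetical and only mimics the
       documented behaviour; it is not how HTTP::Proxy is implemented.

```perl
use strict;
use warnings;

package Sketch::Proxy;    # hypothetical class, for illustration only

sub new {
    my ( $class, %attr ) = @_;
    return bless { conn => 0, %attr }, $class;
}

# read-write: return the old value, store the new one if given
for my $name (qw( port timeout )) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$name" } = sub {
        my $self = shift;
        my $old  = $self->{$name};
        $self->{$name} = shift if @_;
        return $old;
    };
}

# read-only: any argument is silently ignored
sub conn { return $_[0]{conn} }

package main;

my $proxy    = Sketch::Proxy->new( port => 8080 );
my $previous = $proxy->port(3128);    # stores 3128, returns 8080
$proxy->conn(42);                     # ignored
print "$previous ", $proxy->port, ' ', $proxy->conn, "\n";
```

       Returning the previous value makes it easy to change a setting
       temporarily and restore it afterwards.
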
       The defined accessors are (in alphabetical order):

       agent
           The LWP::UserAgent object used internally to connect to remote
           sites.

       chunk
           The chunk size for the LWP::UserAgent callbacks.

       client_socket (read-only)
           The socket currently connected to the client. Mostly useful in
           filters.

       client_headers
           This attribute holds a reference to the client headers set up by
           LWP::UserAgent ("Client-Aborted", "Client-Bad-Header-Line",
           "Client-Date", "Client-Junk", "Client-Peer", "Client-Request-Num",
           "Client-Response-Num", "Client-SSL-Cert-Issuer",
           "Client-SSL-Cert-Subject", "Client-SSL-Cipher",
           "Client-SSL-Warning", "Client-Transfer-Encoding", "Client-Warning").

           They are removed by the filter HTTP::Proxy::HeaderFilter::standard
           from the request and response objects received by the proxy.

           If a filter (such as an SSL certificate verification filter) needs
           to access them, it must do so through this accessor.

       conn (read-only)
           The number of connections processed by this HTTP::Proxy instance.

       daemon
           The HTTP::Daemon object used to accept incoming connections.  (You
           usually never need this.)

       engine
           The HTTP::Proxy::Engine object that manages the child processes.

222       conn (read-only)
223           The number of connections processed by this HTTP::Proxy instance.
224
225       daemon
226           The HTTP::Daemon object used to accept incoming connections.  (You
227           usually never need this.)
228
229       engine
230           The HTTP::Proxy::Engine object that manages the child processes.
231
232       hop_headers
233           This attribute holds a reference to the hop-by-hop headers ("Con‐
234           nection", "Keep-Alive", "Proxy-Authenticate", "Proxy-Authoriza‐
235           tion", "TE", "Trailers", "Transfer-Encoding", "Upgrade").
236
237           They are removed by the filter HTTP::Proxy::HeaderFilter::standard
238           from the request and response objects received by the proxy.
239
240           If a filter (such as a proxy authorisation filter) need to access
241           them, it must do it through this accessor.
242
243       host
244           The proxy HTTP::Daemon host (default: 'localhost').
245
246           This means that by default, the proxy answers only to clients on
247           the local machine. You can pass a specific interface address or
248           ""/"undef" for any interface.
249
250           This default prevents your proxy to be used as an anonymous proxy
251           by script kiddies.
252
253       known_methods( @groups ) (read-only)
254           This method returns all HTTP (and extensions to HTTP) known to
255           "HTTP::Proxy". Methods are grouped by type. Known method groups
256           are: "HTTP", "WebDAV" and "DeltaV".
257
258           Called with an empty list, this method will return all known meth‐
259           ods.  This method is case-insensitive, and will "carp()" if an
260           unknown group name is passed.
261
       logfh
           A filehandle to a logfile (default: *STDERR).

       logmask( [$mask] )
           Be verbose in the logs (default: NONE).

           Here are the various elements that can be added to the mask (their
           values are powers of 2, starting from 0 and listed here in
           ascending order):

               NONE    - Log only errors
               PROXY   - Proxy information
               STATUS  - Requested URL, response status and total number
                         of connections processed
               PROCESS - Subprocesses information (fork, wait, etc.)
               SOCKET  - Information about low-level sockets
               HEADERS - Full request and response headers are sent along
               FILTERS - Filter information
               DATA    - Data received by the filters
               CONNECT - Data transmitted by the CONNECT method
               ENGINE  - Engine information
               ALL     - Log all of the above

           If you only want status and process information, you can use:

               $proxy->logmask( STATUS | PROCESS );

           Note that the logging constants are not exported by default; they
           are exported by the ":log" tag. They can also be exported one by
           one.

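           The way a mask combines levels can be sketched with local
           stand-in constants. The numeric values below are an assumption
           derived from the description above (0 for NONE, then successive
           powers of 2); in real code the constants would come from the
           ":log" tag rather than being redefined.

```perl
use strict;
use warnings;

# Local stand-ins for the first few constants. The numeric values are
# an assumption derived from the description, not copied from the module.
use constant {
    NONE    => 0,
    PROXY   => 1,
    STATUS  => 2,
    PROCESS => 4,
    SOCKET  => 8,
};

my $mask = STATUS | PROCESS;

# A level is selected when it shares at least one bit with the mask.
sub selected {
    my ($level) = @_;
    return ( $mask & $level ) ? 1 : 0;
}

print 'STATUS selected: ', selected(STATUS), "\n";
print 'SOCKET selected: ', selected(SOCKET), "\n";
```
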
       loop (read-only)
           Internal. False when the main loop is about to be broken.

       max_clients
       maxchild
           The maximum number of child processes the HTTP::Proxy object will
           spawn to handle client requests (default: depends on the engine).

           This method is currently delegated to the HTTP::Proxy::Engine
           object.

           "maxchild" is deprecated and will disappear.

       max_connections
       maxconn
           The maximum number of TCP connections the proxy will accept before
           returning from start(). 0 (the default) means never stop accepting
           connections.

           "maxconn" is deprecated.

           Note: "max_connections" will be deprecated soon, for two reasons:
           1) it is more of an HTTP::Proxy::Engine attribute, 2) not all
           engines will support it.

       max_keep_alive_requests
       maxserve
           The maximum number of requests the proxy will serve in a single
           connection.  (same as "MaxRequestsPerChild" in Apache)

           "maxserve" is deprecated.

       port
           The proxy "HTTP::Daemon" port (default: 8080).

       request
           The request originally received by the proxy from the user-agent,
           which will be modified by the request filters.

       response
           The response received from the origin server by the proxy. It is
           normally "undef" until the proxy actually receives the beginning of
           a response from the origin server.

           If one of the request filters sets this attribute, it
           "short-circuits" the request/response scheme, and the proxy will
           return this response (which is NOT filtered through the response
           filter stacks) instead of the expected origin server response.
           This is useful for caching (though Squid does it much better) and
           proxy authentication, for example.

       stash
           The stash is a hash where filters can store data to share between
           them.

           The stash() method can be used to set the whole hash (with a HASH
           reference).  To access an individual key, simply do:

               $proxy->stash( 'bloop' );

           To set a key, type:

               $proxy->stash( bloop => 'owww' );

           It's also possible to get a reference to the stash:

               my $s = $filter->proxy->stash();
               $s->{bang} = 'bam';

               # $proxy->stash( 'bang' ) will now return 'bam'

           Warning: since the proxy forks for each TCP connection, the data is
           only shared between filters in the same child process.

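           The four calling conventions of stash() can be summarised in a
           small sketch. Sketch::Stash is a hypothetical class that only
           mimics the documented behaviour; the module's own method may be
           written differently.

```perl
use strict;
use warnings;

package Sketch::Stash;    # hypothetical class, for illustration only

sub new { return bless { stash => {} }, shift }

sub stash {
    my $self = shift;

    return $self->{stash} if !@_;    # no argument: the hash itself

    if ( @_ == 1 ) {
        my ($arg) = @_;
        return $self->{stash}{$arg} if !ref $arg;    # read one key
        die "stash() expects a HASH reference\n" if ref $arg ne 'HASH';
        return $self->{stash} = $arg;                # replace everything
    }

    my ( $key, $value ) = @_;
    return $self->{stash}{$key} = $value;            # write one key
}

package main;

my $proxy = Sketch::Stash->new;
$proxy->stash( bloop => 'owww' );
$proxy->stash->{bang} = 'bam';
print join( ',', map { $proxy->stash($_) } qw( bloop bang ) ), "\n";
```
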
       timeout
           The timeout used by the internal LWP::UserAgent (default: 60).

       url (read-only)
           The URL where the proxy can be reached.

       via The content of the Via: header. Setting it to an empty string will
           prevent its addition. (default: "$hostname (HTTP::Proxy/$VERSION)")

       x_forwarded_for
           If set to a true value, the proxy will send the "X-Forwarded-For:"
           header.  (default: true)

       Connection handling methods

       start()
           This method works like Tk's "MainLoop": you hand over control to
           the "HTTP::Proxy" object you created and configured.

           If "maxconn" is not zero, "start()" will return after accepting at
           most that many connections. It will return the total number of
           connections.

       serve_connections()
           This is the internal method used to handle each new TCP connection
           to the proxy.

       Other methods

       log( $level, $prefix, $message )
           Adds $message at the end of "logfh", if $level matches "logmask".
           The "log()" method also prints a timestamp.

           The output looks like:

               [Thu Dec  5 12:30:12 2002] ($$) $prefix: $message

           where $$ is the current process ID.

           If $message is a multiline string, several log lines will be
           output, each line starting with $prefix.

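           The line format, including the handling of multiline messages,
           can be reproduced with a few lines of plain Perl. format_log()
           is a hypothetical helper written for this sketch; it returns the
           lines instead of writing them to "logfh".

```perl
use strict;
use warnings;

# Hypothetical helper: build the log lines without printing them.
sub format_log {
    my ( $prefix, $message, $when ) = @_;
    $when = time if !defined $when;

    # localtime in scalar context gives a "Thu Dec  5 12:30:12 2002" string
    my $stamp = localtime $when;

    my @lines;
    push @lines, sprintf( "[%s] (%d) %s: %s\n", $stamp, $$, $prefix, $_ )
        for split /\n/, $message;
    return @lines;
}

print format_log( 'FILTER', "first line\nsecond line" );
```
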
       is_protocol_supported( $scheme )
           Returns a boolean indicating if $scheme is supported by the proxy.

           This method is only used internally.

           It is essential to allow HTTP::Proxy users to create
           "pseudo-schemes" that LWP doesn't know about, but that one of the
           proxy filters can handle directly. New schemes are added as
           follows:

               $proxy->init();    # required to get an agent
               $proxy->agent->protocols_allowed(
                   [ @{ $proxy->agent->protocols_allowed }, 'myhttp' ] );

       new_connection()
           Increase the proxy's TCP connections counter. Only used by
           "HTTP::Proxy::Engine" objects.

       Apache-like attributes

       "HTTP::Proxy" has several Apache-like attributes that control the way
       the HTTP and TCP connections are handled.

       The following attributes control the TCP connection. They are passed to
       the underlying "HTTP::Proxy::Engine", which may (or may not) use them
       to change its behaviour.

       start_servers
           Number of child processes to fork at the beginning.

       max_clients
           Maximum number of concurrent TCP connections (i.e. child
           processes).

       max_requests_per_child
           Maximum number of TCP connections handled by the same child
           process.

       min_spare_servers
           Minimum number of inactive child processes.

       max_spare_servers
           Maximum number of inactive child processes.

       These attributes control the HTTP connection:

       keep_alive
           Support for keep-alive HTTP connections.

       max_keep_alive_requests
           Maximum number of HTTP requests within a single TCP connection.

       keep_alive_timeout
           Timeout for keep-alive connection.

EXPORTED SYMBOLS

       No symbols are exported by default. The ":log" tag exports all the
       logging constants.

BUGS

       This module does not work under Windows, but I can't see why, and do
       not have a development platform under that system. Patches and
       explanations are very welcome.

       I guess it is because "fork()" is not well supported.

           $proxy->maxchild(0);

       However, David Fishburn says:
           This did not work for me under WinXP - ActiveState Perl 5.6, but it
           DOES work on WinXP ActiveState Perl 5.8.

       Several people have tried to help, but we haven't found a way to make
       it work correctly yet.

       As of version 0.16, the default engine is
       "HTTP::Proxy::Engine::NoFork".  Let me know if it works better.

SEE ALSO

       HTTP::Proxy::Engine, HTTP::Proxy::BodyFilter,
       HTTP::Proxy::HeaderFilter, the examples in eg/.

AUTHOR

       Philippe "BooK" Bruhat, <book@cpan.org>.

       The module has its own web page at <http://http-proxy.mongueurs.net/>
       complete with older versions and repository snapshot.

       There are also two mailing-lists: http-proxy@mongueurs.net for general
       discussion about "HTTP::Proxy" and http-proxy-cvs@mongueurs.net for CVS
       commit emails.

THANKS

       Many people helped me during the development of this module, either on
       mailing-lists, IRC or over a beer in a pub...

       So, in no particular order, thanks to the libwww-perl team for such a
       terrific suite of modules, perl-qa (tips for testing), the French Perl
       Mongueurs (for code tricks, beers and encouragement) and my growing
       user base... ";-)"

       I'd like to particularly thank Dan Grigsby, who's been using
       "HTTP::Proxy" since 2003 (before the filter classes even existed).  He
       is apparently making a living from a product based on "HTTP::Proxy".
       Thanks a lot for your confidence in my work!

COPYRIGHT

       Copyright 2002-2005, Philippe Bruhat.

LICENSE

       This module is free software; you can redistribute it or modify it
       under the same terms as Perl itself.



perl v5.8.8                       2006-09-04                    HTTP::Proxy(3)