HTTP::Proxy(3)        User Contributed Perl Documentation       HTTP::Proxy(3)

NAME

       HTTP::Proxy - A pure Perl HTTP proxy

SYNOPSIS

           use HTTP::Proxy;

           # initialisation
           my $proxy = HTTP::Proxy->new( port => 3128 );

           # alternate initialisation
           my $proxy = HTTP::Proxy->new;
           $proxy->port( 3128 ); # the classical accessors are here!

           # this is a MainLoop-like method
           $proxy->start;

DESCRIPTION

       This module implements an HTTP proxy, using an HTTP::Daemon to accept
       client connections, and an LWP::UserAgent to ask for the requested
       pages.

       The most interesting feature of this proxy object is its ability to
       filter the HTTP requests and responses through user-defined filters.

       Once the proxy is created, with the "new()" method, it is possible to
       alter its behaviour by adding so-called "filters". This is done by the
       "push_filter()" method. Once the proxy is ready to run, it can be
       launched, with the "start()" method. This method does not normally
       return until the proxy is killed or otherwise stopped.

       An important thing to note is that the proxy is (except when running
       the "NoFork" engine) a forking proxy: it doesn't support passing
       information between child processes, and you can count on reliable
       information passing only during a single HTTP connection (request +
       response).
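
       The consequence of forking can be seen without the proxy at all. The
       following is a minimal sketch in plain Perl, assuming a platform with
       a working "fork()"; the %seen hash is a hypothetical stand-in for any
       data a filter might keep in memory. A change made in a child process
       never reaches the parent or the other children.

```perl
use strict;
use warnings;

# Hypothetical per-process state, standing in for data kept by a filter.
my %seen = ( requests => 0 );

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ( $pid == 0 ) {
    # Child: this increment only touches the child's private copy of %seen.
    $seen{requests}++;
    exit 0;
}

# Parent: wait for the child, then look at our own, unchanged copy.
waitpid( $pid, 0 );
print "parent sees $seen{requests} request(s)\n";    # prints "parent sees 0 request(s)"
```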

FILTERS

       You can alter the way the default HTTP::Proxy works by plugging
       callbacks (filter objects, actually) at different stages of the
       request/response handling.

       When a request is received by the HTTP::Proxy object, it is filtered
       through a standard filter that transforms this request according to
       RFC 2616 (by adding the "Via:" header, and a few other
       transformations). This is the default, bare minimum behaviour.

       The response is also filtered in the same manner. There is a total of
       four filter chains: "request-headers", "request-body",
       "response-headers" and "response-body".

       You can add your own filters to the default ones with the
       "push_filter()" method. The method pushes a filter on the appropriate
       filter stack.

           $proxy->push_filter( response => $filter );

       The headers/body category is determined by the base class of the
       filter.  There are two base classes for filters, which are
       "HTTP::Proxy::HeaderFilter" and "HTTP::Proxy::BodyFilter" (the names
       are self-explanatory). See the documentation of those two classes to
       find out how to write your own header or body filters.

       The named parameter is used to determine the request/response part.

       It is possible to push the same filter on the request and response
       stacks, as in the following example:

           $proxy->push_filter( request => $filter, response => $filter );

       If several filters match the message, they will be applied in the order
       they were pushed on their filter stack.
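
       Why the push order matters can be shown with a small sketch in plain
       Perl. The @stack array and the run_stack() helper are hypothetical
       stand-ins for a body filter stack, not the module's own code; each
       callback receives a reference to the data, as body filters do.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a body filter stack: callbacks kept in push order.
my @stack;
push @stack, sub { ${ $_[0] } =~ s!(</?)i>!$1b>!ig };         # italics to bold
push @stack, sub { ${ $_[0] } =~ s!(</?)b>!$1strong>!ig };    # bold to strong

# Apply every callback to the data, first pushed first.
sub run_stack {
    my ($data) = @_;
    $_->( \$data ) for @stack;
    return $data;
}

print run_stack('<i>hi</i> <b>there</b>'), "\n";
```

       Because the italics filter runs first, its output is rewritten again
       by the second filter. With the two callbacks pushed in the opposite
       order, the text that started as italics would end up as plain bold.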

       Named parameters can be used to create the match routine. They are:

           method - the request method
           scheme - the URI scheme
           host   - the URI authority (host:port)
           path   - the URI path
           query  - the URI query string
           mime   - the MIME type (for a response-body filter)

       The filters are applied only when all the parameters match the
       request or the response. All these named parameters have default
       values, which are:

           method => 'OPTIONS,GET,HEAD,POST,PUT,DELETE,TRACE,CONNECT'
           scheme => 'http'
           host   => ''
           path   => ''
           query  => ''
           mime   => 'text/*'

       The "mime" parameter is a glob-like string, with a required "/"
       character and a "*" as a wildcard. Thus, "*/*" matches all responses
       that have a "Content-Type:" header, and "" those with no
       "Content-Type:" header. To match any response (with or without a
       "Content-Type:" header), use "undef".
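
       The three cases above (a glob, the empty string and "undef") can be
       sketched as follows. The mime_matcher() helper is hypothetical and is
       not how the module itself compiles the parameter; it also assumes the
       caller passes a bare media type, without parameters such as
       "charset".

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTTP::Proxy: returns a closure that tests
# a bare media type (or undef when there is no Content-Type) against a glob.
sub mime_matcher {
    my ($glob) = @_;

    # undef: match any response, with or without a Content-Type
    return sub { 1 } unless defined $glob;

    # empty string: match only responses without a Content-Type
    return sub { !defined $_[0] || $_[0] eq '' } if $glob eq '';

    die "mime glob must contain a '/': $glob\n" unless $glob =~ m{/};

    my $pattern = quotemeta $glob;    # 'text/*' becomes 'text\/\*'
    $pattern =~ s{\\\*}{[^/]+}g;      # each escaped '*' becomes a run of non-slash characters
    my $re = qr/\A$pattern\z/i;

    return sub { defined $_[0] && $_[0] =~ $re };
}

my $is_text = mime_matcher('text/*');
print $is_text->('text/html') ? "match\n" : "no match\n";
```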

       The "mime" parameter is only meaningful with the "response-body" filter
       stack. It is ignored if passed to any other filter stack.

       The "method" and "scheme" parameters are strings consisting of
       comma-separated values. The "host" and "path" parameters are regular
       expressions.

       A match routine is compiled by the proxy and used to check if a
       particular request or response must be filtered through a particular
       filter.
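
       To give an idea of what such a match routine does, here is a minimal
       sketch in plain Perl. The compile_match() helper and the hash used to
       describe a request are hypothetical; only the parameter names and
       their default values come from this documentation, and the "query"
       and "mime" parameters are left out for brevity.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's code: turn the named parameters into
# one closure, using the documented defaults for anything left out.
sub compile_match {
    my (%arg) = @_;
    my %method = map { $_ => 1 } split /,/,
        $arg{method} // 'OPTIONS,GET,HEAD,POST,PUT,DELETE,TRACE,CONNECT';
    my %scheme = map { $_ => 1 } split /,/, $arg{scheme} // 'http';

    # The default '' matches everything; (?:...) keeps the pattern non-empty.
    my $host_src = $arg{host} // '';
    my $path_src = $arg{path} // '';
    my $host     = qr/(?:$host_src)/;
    my $path     = qr/(?:$path_src)/;

    return sub {
        my ($req) = @_;    # hash ref with method, scheme, host and path keys
        return $method{ $req->{method} }
            && $scheme{ $req->{scheme} }
            && $req->{host} =~ $host
            && $req->{path} =~ $path;
    };
}

my $match = compile_match( method => 'GET,HEAD', host => '\.example\.com$' );
my %req   = (
    method => 'GET',
    scheme => 'http',
    host   => 'www.example.com',
    path   => '/index.html',
);
print $match->( \%req ) ? "filtered\n" : "skipped\n";
```

       All conditions must hold at once, which mirrors the rule that a filter
       is applied only when all the parameters match.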

       It is also possible to push several filters on the same stack with the
       same match subroutine:

           # convert italics to bold
           $proxy->push_filter(
               mime     => 'text/html',
               response => HTTP::Proxy::BodyFilter::tags->new(),
               response => HTTP::Proxy::BodyFilter::simple->new(
                   sub { ${ $_[1] } =~ s!(</?)i>!$1b>!ig }
               )
           );

       For more details regarding the creation of new filters, check the
       "HTTP::Proxy::HeaderFilter" and "HTTP::Proxy::BodyFilter"
       documentation.

       Here's an example of subclassing a base filter class:

           # fixes a common typo ;-)
           # but chances are that this will modify a correct URL
           {
               package FilterPerl;
               use base qw( HTTP::Proxy::BodyFilter );

               sub filter {
                   my ( $self, $dataref, $message, $protocol, $buffer ) = @_;
                   $$dataref =~ s/PERL/Perl/g;
               }
           }
           $proxy->push_filter( response => FilterPerl->new() );

       Other examples can be found in the documentation for
       "HTTP::Proxy::HeaderFilter", "HTTP::Proxy::BodyFilter",
       "HTTP::Proxy::HeaderFilter::simple", "HTTP::Proxy::BodyFilter::simple".

           # a simple anonymiser
           # see eg/anonymiser.pl for the complete code
           # (the callback receives the filter object as $_[0]
           # and the headers object as $_[1])
           $proxy->push_filter(
               mime    => undef,
               request => HTTP::Proxy::HeaderFilter::simple->new(
                   sub { $_[1]->remove_header(qw( User-Agent From Referer Cookie )) },
               ),
               response => HTTP::Proxy::HeaderFilter::simple->new(
                   sub { $_[1]->remove_header(qw( Set-Cookie )); },
               )
           );

       IMPORTANT: If you use your own "LWP::UserAgent", you must install it
       before your calls to "push_filter()", otherwise the match method will
       make wrong assumptions about the schemes your agent supports.

       NOTE: The possibility of changing the agent or the daemon is likely to
       disappear in future versions.

METHODS

   Constructor and initialisation
       new()
           The "new()" method creates a new HTTP::Proxy object. All attributes
           can be passed as parameters to replace the default.

           Parameters that are not "HTTP::Proxy" attributes will be ignored
           by the proxy itself and passed on to the chosen
           "HTTP::Proxy::Engine" object.

       init()
           "init()" initialises the proxy without starting it. It is usually
           not needed.

           This method is called by "start()" if needed.

       push_filter()
           The "push_filter()" method is used to add filters to the proxy.  It
           is fully described in section FILTERS.

   Accessors and mutators
       The HTTP::Proxy has several accessors and mutators.

       Called without arguments, the accessor returns the current value.
       Called with a single argument, it sets the current value and returns
       the previous one, in case you want to keep it.
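
       This get/set convention can be sketched in a few lines. The
       Sketch::Proxy class below is hypothetical and only mimics the
       behaviour described here; it is not the HTTP::Proxy implementation.

```perl
use strict;
use warnings;

# Hypothetical class illustrating the accessor convention only.
package Sketch::Proxy;

sub new {
    my ( $class, %args ) = @_;
    return bless { port => 8080, %args }, $class;
}

sub port {
    my $self = shift;
    my $old  = $self->{port};
    $self->{port} = shift if @_;    # one argument: store the new value
    return $old;                    # current value (get) or previous value (set)
}

package main;

my $proxy    = Sketch::Proxy->new;
my $previous = $proxy->port(3128);    # sets 3128, hands back the old value
print "was $previous, now ", $proxy->port, "\n";    # prints "was 8080, now 3128"
```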

       If you call a read-only accessor with a parameter, this parameter will
       be ignored.

       The defined accessors are (in alphabetical order):

       agent
           The LWP::UserAgent object used internally to connect to remote
           sites.

       chunk
           The chunk size for the LWP::UserAgent callbacks.

       client_socket (read-only)
           The socket currently connected to the client. Mostly useful in
           filters.

       client_headers
           This attribute holds a reference to the client headers set up by
           LWP::UserAgent ("Client-Aborted", "Client-Bad-Header-Line",
           "Client-Date", "Client-Junk", "Client-Peer", "Client-Request-Num",
           "Client-Response-Num", "Client-SSL-Cert-Issuer",
           "Client-SSL-Cert-Subject", "Client-SSL-Cipher",
           "Client-SSL-Warning", "Client-Transfer-Encoding",
           "Client-Warning").

           They are removed by the filter HTTP::Proxy::HeaderFilter::standard
           from the request and response objects received by the proxy.

           If a filter (such as an SSL certificate verification filter) needs
           to access them, it must do it through this accessor.

       conn (read-only)
           The number of connections processed by this HTTP::Proxy instance.

       daemon
           The HTTP::Daemon object used to accept incoming connections.  (You
           usually never need this.)

       engine
           The HTTP::Proxy::Engine object that manages the child processes.

       hop_headers
           This attribute holds a reference to the hop-by-hop headers
           ("Connection", "Keep-Alive", "Proxy-Authenticate",
           "Proxy-Authorization", "TE", "Trailers", "Transfer-Encoding",
           "Upgrade").

           They are removed by the filter HTTP::Proxy::HeaderFilter::standard
           from the request and response objects received by the proxy.

           If a filter (such as a proxy authorisation filter) needs to access
           them, it must do it through this accessor.

       host
           The proxy HTTP::Daemon host (default: 'localhost').

           This means that by default, the proxy answers only to clients on
           the local machine. You can pass a specific interface address or
           ""/"undef" for any interface.

           This default prevents your proxy from being used as an anonymous
           proxy by script kiddies.

       known_methods( @groups ) (read-only)
           This method returns all HTTP methods (and extensions to HTTP) known
           to "HTTP::Proxy". Methods are grouped by type. Known method groups
           are: "HTTP", "WebDAV" and "DeltaV".

           Called with an empty list, this method will return all known
           methods.  This method is case-insensitive, and will "carp()" if an
           unknown group name is passed.

       logfh
           A filehandle to a logfile (default: *STDERR).

       logmask( [$mask] )
           Be verbose in the logs (default: NONE).

           Here are the various elements that can be added to the mask (their
           values are powers of 2, starting from 0 and listed here in
           ascending order):

               NONE    - Log only errors
               PROXY   - Proxy information
               STATUS  - Requested URL, response status and total number
                         of connections processed
               PROCESS - Subprocesses information (fork, wait, etc.)
               SOCKET  - Information about low-level sockets
               HEADERS - Full request and response headers are sent along
               FILTERS - Filter information
               DATA    - Data received by the filters
               CONNECT - Data transmitted by the CONNECT method
               ENGINE  - Engine information
               ALL     - Log all of the above

           If you only want status and process information, you can use:

               $proxy->logmask( STATUS | PROCESS );

           Note that none of the logging constants are exported by default;
           they are exported by the ":log" tag. They can also be exported one
           by one.
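
           The mask arithmetic can be sketched in plain Perl. The numeric
           values below are assumptions chosen for illustration (zero for
           NONE, then successive powers of 2); real code should import the
           constants with the ":log" tag and use them by name only.

```perl
use strict;
use warnings;

# Assumed values, for illustration only; the real constants are exported
# by HTTP::Proxy and should be used by name, never by number.
use constant {
    NONE    => 0,
    PROXY   => 1,
    STATUS  => 2,
    PROCESS => 4,
    SOCKET  => 8,
};

# Combine two elements into one mask with a bitwise OR.
my $mask = STATUS | PROCESS;

# A level is selected when it shares at least one bit with the mask.
for my $level ( [ STATUS => STATUS ], [ SOCKET => SOCKET ] ) {
    my ( $name, $bit ) = @$level;
    printf "%-7s %s\n", $name, ( $mask & $bit ) ? 'logged' : 'skipped';
}
```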

       loop (read-only)
           Internal. False when the main loop is about to be broken.

       max_clients
       maxchild
           The maximum number of child processes the HTTP::Proxy object will
           spawn to handle client requests (default: depends on the engine).

           This method is currently delegated to the HTTP::Proxy::Engine
           object.

           "maxchild" is deprecated and will disappear.

       max_connections
       maxconn
           The maximum number of TCP connections the proxy will accept before
           returning from start(). 0 (the default) means never stop accepting
           connections.

           "maxconn" is deprecated.

           Note: "max_connections" will be deprecated soon, for two reasons:
           1) it is more of an HTTP::Proxy::Engine attribute, 2) not all
           engines will support it.

       max_keep_alive_requests
       maxserve
           The maximum number of requests the proxy will serve in a single
           connection.  (same as "MaxRequestsPerChild" in Apache)

           "maxserve" is deprecated.

       port
           The proxy "HTTP::Daemon" port (default: 8080).

       request
           The request originally received by the proxy from the user-agent,
           which will be modified by the request filters.

       response
           The response received from the origin server by the proxy. It is
           normally "undef" until the proxy actually receives the beginning of
           a response from the origin server.

           If one of the request filters sets this attribute, it
           "short-circuits" the request/response scheme, and the proxy will
           return this response (which is NOT filtered through the response
           filter stacks) instead of the expected origin server response.
           This is useful for caching (though Squid does it much better) and
           proxy authentication, for example.

       stash
           The stash is a hash where filters can store data to share between
           them.

           The stash() method can be used to set the whole hash (with a HASH
           reference).  To access individual keys simply do:

               $proxy->stash( 'bloop' );

           To set it, type:

               $proxy->stash( bloop => 'owww' );

           It's also possible to get a reference to the stash:

               my $s = $filter->proxy->stash();
               $s->{bang} = 'bam';

               # $proxy->stash( 'bang' ) will now return 'bam'

           Warning: since the proxy forks for each TCP connection, the data is
           only shared between filters in the same child process.

       timeout
           The timeout used by the internal LWP::UserAgent (default: 60).

       url (read-only)
           The url where the proxy can be reached.

       via The content of the Via: header. Setting it to an empty string will
           prevent its addition. (default: "$hostname (HTTP::Proxy/$VERSION)")

       x_forwarded_for
           If set to a true value, the proxy will send the "X-Forwarded-For:"
           header.  (default: true)

   Connection handling methods
       start()
           This method works like Tk's "MainLoop": you hand over control to
           the "HTTP::Proxy" object you created and configured.

           If "maxconn" is not zero, "start()" will return after accepting at
           most that many connections. It will return the total number of
           connections.

       serve_connections()
           This is the internal method used to handle each new TCP connection
           to the proxy.

   Other methods
       log( $level, $prefix, $message )
           Adds $message at the end of "logfh", if $level matches "logmask".
           The "log()" method also prints a timestamp.

           The output looks like:

               [Thu Dec  5 12:30:12 2002] ($$) $prefix: $message

           where $$ is the current process id.

           If $message is a multiline string, several log lines will be
           output, each line starting with $prefix.
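
           The layout described above, including the multiline case, can be
           reproduced with a short sketch. The format_log() helper is
           hypothetical: it returns the lines instead of writing them to
           "logfh", and it does not check any log mask.

```perl
use strict;
use warnings;

# Hypothetical formatter reproducing the documented layout; one output line
# per line of $message, each carrying the timestamp, process id and prefix.
sub format_log {
    my ( $prefix, $message ) = @_;
    my $stamp = '[' . scalar( localtime() ) . ']';
    my $pid   = $$;
    return map { "$stamp ($pid) $prefix: $_" } split /\n/, $message;
}

print "$_\n" for format_log( 'FILTER', "first line\nsecond line" );
```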

       is_protocol_supported( $scheme )
           Returns a boolean indicating if $scheme is supported by the proxy.

           This method is only used internally.

           It is essential to allow HTTP::Proxy users to create
           "pseudo-schemes" that LWP doesn't know about, but that one of the
           proxy filters can handle directly. New schemes are added as
           follows:

               $proxy->init();    # required to get an agent
               $proxy->agent->protocols_allowed(
                   [ @{ $proxy->agent->protocols_allowed }, 'myhttp' ] );

       new_connection()
           Increase the proxy's TCP connections counter. Only used by
           "HTTP::Proxy::Engine" objects.

   Apache-like attributes
       "HTTP::Proxy" has several Apache-like attributes that control the way
       the HTTP and TCP connections are handled.

       The following attributes control the TCP connection. They are passed to
       the underlying "HTTP::Proxy::Engine", which may (or may not) use them
       to change its behaviour.

       start_servers
           Number of child processes to fork at the beginning.

       max_clients
           Maximum number of concurrent TCP connections (i.e. child
           processes).

       max_requests_per_child
           Maximum number of TCP connections handled by the same child
           process.

       min_spare_servers
           Minimum number of inactive child processes.

       max_spare_servers
           Maximum number of inactive child processes.

       These attributes control the HTTP connection:

       keep_alive
           Support for keep alive HTTP connections.

       max_keep_alive_requests
           Maximum number of HTTP connections within a single TCP connection.

       keep_alive_timeout
           Timeout for keep-alive connection.

EXPORTED SYMBOLS

       No symbols are exported by default. The ":log" tag exports all the
       logging constants.

BUGS

       This module does not work under Windows, but I can't see why, and do
       not have a development platform under that system. Patches and
       explanations very welcome.

       I guess it is because "fork()" is not well supported.

           $proxy->maxchild(0);

       However, David Fishburn says:
           This did not work for me under WinXP - ActiveState Perl 5.6, but it
           DOES work on WinXP ActiveState Perl 5.8.

       Several people have tried to help, but we haven't found a way to make
       it work correctly yet.

       As of version 0.16, the default engine is
       "HTTP::Proxy::Engine::NoFork".  Let me know if it works better.

SEE ALSO

       HTTP::Proxy::Engine, HTTP::Proxy::BodyFilter,
       HTTP::Proxy::HeaderFilter, the examples in eg/.

AUTHOR

       Philippe "BooK" Bruhat, <book@cpan.org>.

       The module has its own web page at http://http-proxy.mongueurs.net/
       complete with older versions and repository snapshot.

       There are also two mailing-lists: http-proxy@mongueurs.net for general
       discussion about "HTTP::Proxy" and http-proxy-cvs@mongueurs.net for CVS
       commit emails.

THANKS

       Many people helped me during the development of this module, either on
       mailing-lists, IRC or over a beer in a pub...

       So, in no particular order, thanks to the libwww-perl team for such a
       terrific suite of modules, perl-qa (tips for testing), the French Perl
       Mongueurs (for code tricks, beers and encouragements) and my growing
       user base... ";-)"

       I'd like to particularly thank Dan Grigsby, who's been using
       "HTTP::Proxy" since 2003 (before the filter classes even existed).  He
       is apparently making a living from a product based on "HTTP::Proxy".
       Thanks a lot for your confidence in my work!

COPYRIGHT

       Copyright 2002-2008, Philippe Bruhat.

LICENSE

       This module is free software; you can redistribute it or modify it
       under the same terms as Perl itself.



perl v5.12.0                      2010-05-02                    HTTP::Proxy(3)