HTTP::Proxy(3)        User Contributed Perl Documentation       HTTP::Proxy(3)

NAME

       HTTP::Proxy - A pure Perl HTTP proxy

SYNOPSIS

           use HTTP::Proxy;

           # initialisation
           my $proxy = HTTP::Proxy->new( port => 3128 );

           # alternate initialisation
           $proxy = HTTP::Proxy->new;
           $proxy->port( 3128 ); # the classical accessors are here!

           # this is a MainLoop-like method
           $proxy->start;

DESCRIPTION

       This module implements an HTTP proxy, using an HTTP::Daemon to accept
       client connections, and an LWP::UserAgent to ask for the requested
       pages.

       The most interesting feature of this proxy object is its ability to
       filter the HTTP requests and responses through user-defined filters.

       Once the proxy is created, with the "new()" method, it is possible to
       alter its behaviour by adding so-called "filters." This is done by the
       "push_filter()" method. Once the proxy is ready to run, it can be
       launched, with the "start()" method. This method does not normally
       return until the proxy is killed or otherwise stopped.

       An important thing to note is that the proxy is (except when running
       the "NoFork" engine) a forking proxy: it doesn't support passing
       information between child processes, and you can count on reliable
       information passing only during a single HTTP connection (request +
       response).

FILTERS

       You can alter the way the default HTTP::Proxy works by plugging
       callbacks (filter objects, actually) at different stages of the
       request/response handling.

       When a request is received by the HTTP::Proxy object, it is filtered
       through a standard filter that transforms the request according to RFC
       2616 (by adding the "Via:" header, and other transformations). This is
       the default, bare minimum behaviour.

       The response is also filtered in the same manner. There is a total of
       four filter chains: "request-headers", "request-body",
       "response-headers" and "response-body".

       You can add your own filters to the default ones with the
       "push_filter()" method. The method pushes a filter on the appropriate
       filter stack.

           $proxy->push_filter( response => $filter );

       The headers/body category is determined by the base class of the
       filter. There are two base classes for filters, which are
       HTTP::Proxy::HeaderFilter and HTTP::Proxy::BodyFilter (the names are
       self-explanatory). See the documentation of those two classes to find
       out how to write your own header and body filters.

       The named parameter ("request" or "response") determines whether the
       filter is applied to the request or to the response.

       It is possible to push the same filter on the request and response
       stacks, as in the following example:

           $proxy->push_filter( request => $filter, response => $filter );

       If several filters match the message, they will be applied in the order
       they were pushed on their filter stack.

       Named parameters can be used to create the match routine. They are:

           method - the request method
           scheme - the URI scheme
           host   - the URI authority (host:port)
           path   - the URI path
           query  - the URI query string
           mime   - the MIME type (for a response-body filter)

       The filters are applied only when all the parameters match the
       request or the response. All these named parameters have default
       values, which are:

           method => 'OPTIONS,GET,HEAD,POST,PUT,DELETE,TRACE,CONNECT'
           scheme => 'http'
           host   => ''
           path   => ''
           query  => ''
           mime   => 'text/*'

       The "mime" parameter is a glob-like string, with a required "/"
       character and a "*" as a wildcard. Thus, "*/*" matches all responses
       that have a "Content-Type:" header, and "" those with no
       "Content-Type:" header. To match any response (with or without a
       "Content-Type:" header), use "undef".

       The "mime" parameter is only meaningful with the "response-body" filter
       stack. It is ignored if passed to any other filter stack.

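       The translation from a "mime" glob to a regular expression can be
       sketched in plain Perl. This is not HTTP::Proxy's own code:
       mime_matches() is a hypothetical helper, and the sketch assumes that
       "*" stands for any run of characters other than "/", that parameters
       such as "; charset=utf-8" are ignored, and that the comparison is
       case-insensitive.

```perl
           use strict;
           use warnings;

           # Hypothetical helper: does the "mime" glob $glob accept a
           # response whose Content-Type is $type? ($type is undef when
           # the response has no Content-Type: header.)
           sub mime_matches {
               my ( $glob, $type ) = @_;
               my $has_type = defined $type && $type ne '';

               return 1 if !defined $glob;                 # undef: any response
               return $has_type ? 0 : 1 if $glob eq '';    # "": no Content-Type
               return 0 if !$has_type;
               die "Invalid MIME type definition: $glob\n" if $glob !~ m{/};

               # quote the literal parts, turn each "*" into [^/]*
               my $re = join '[^/]*', map { quotemeta($_) } split /\*/, $glob, -1;

               # keep only "type/subtype", dropping "; charset=..." and the like
               ( my $bare = $type ) =~ s/\s*;.*//s;
               return $bare =~ /^$re$/i ? 1 : 0;
           }

           print mime_matches( 'text/*', 'text/html; charset=utf-8' ), "\n";
           print mime_matches( 'text/*', 'image/png' ), "\n";
           print mime_matches( '*/*', undef ), "\n";
```

       With the default "text/*", a response-body filter therefore sees HTML
       and plain text, but not images.
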
       The "method" and "scheme" parameters are strings consisting of
       comma-separated values. The "host" and "path" parameters are regular
       expressions.

       A match routine is compiled by the proxy and used to check if a
       particular request or response must be filtered through a particular
       filter.

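       The sketch below shows the idea behind such a match routine for the
       "method", "scheme", "host" and "path" parameters. It is an
       illustration, not HTTP::Proxy's implementation: make_matcher() is a
       hypothetical name, the request is passed as four plain strings
       instead of an HTTP::Request object, and "query" and "mime" are left
       out.

```perl
           use strict;
           use warnings;

           # Hypothetical sketch: compile the match parameters into a closure.
           sub make_matcher {
               my %arg = (
                   method => 'OPTIONS,GET,HEAD,POST,PUT,DELETE,TRACE,CONNECT',
                   scheme => 'http',
                   host   => '',
                   path   => '',
                   @_,    # caller-supplied values override the defaults
               );

               # comma-separated strings become lookup tables
               my %method = map { ( $_ => 1 ) } split /\s*,\s*/, $arg{method};
               my %scheme = map { ( $_ => 1 ) } split /\s*,\s*/, $arg{scheme};

               # an empty host/path string compiles to (?:), which matches anything
               my $host = qr/(?:$arg{host})/;
               my $path = qr/(?:$arg{path})/;

               return sub {
                   my ( $req_method, $req_scheme, $req_host, $req_path ) = @_;
                   return $method{$req_method}
                     && $scheme{$req_scheme}
                     && $req_host =~ $host
                     && $req_path =~ $path ? 1 : 0;
               };
           }

           my $match = make_matcher( host => '\.example\.com', path => '^/img/' );
           print $match->( 'GET', 'http',  'www.example.com', '/img/a.png' ), "\n";
           print $match->( 'GET', 'https', 'www.example.com', '/img/a.png' ), "\n";
```

       The filter runs only if every test in the closure succeeds, which is
       the "all the parameters match" rule stated above.
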
       It is also possible to push several filters on the same stack with the
       same match subroutine:

           use HTTP::Proxy::BodyFilter::tags;
           use HTTP::Proxy::BodyFilter::simple;

           # convert italics to bold
           $proxy->push_filter(
               mime     => 'text/html',
               response => HTTP::Proxy::BodyFilter::tags->new(),
               response => HTTP::Proxy::BodyFilter::simple->new(
                   sub { ${ $_[1] } =~ s!(</?)i>!$1b>!ig }
               )
           );

       For more details regarding the creation of new filters, check the
       HTTP::Proxy::HeaderFilter and HTTP::Proxy::BodyFilter documentation.

       Here's an example of subclassing a base filter class:

           # fixes a common typo ;-)
           # but chances are that this will modify a correct URL
           {
               package FilterPerl;
               use base qw( HTTP::Proxy::BodyFilter );

               sub filter {
                   my ( $self, $dataref, $message, $protocol, $buffer ) = @_;
                   $$dataref =~ s/PERL/Perl/g;
               }
           }
           $proxy->push_filter( response => FilterPerl->new() );

       Other examples can be found in the documentation for
       HTTP::Proxy::HeaderFilter, HTTP::Proxy::BodyFilter,
       HTTP::Proxy::HeaderFilter::simple and HTTP::Proxy::BodyFilter::simple.

           # a simple anonymiser
           # see eg/anonymiser.pl for the complete code
           use HTTP::Proxy::HeaderFilter::simple;

           $proxy->push_filter(
               mime    => undef,
               request => HTTP::Proxy::HeaderFilter::simple->new(
                   sub { $_[1]->remove_header(qw( User-Agent From Referer Cookie )) },
               ),
               response => HTTP::Proxy::HeaderFilter::simple->new(
                   sub { $_[1]->remove_header(qw( Set-Cookie )); },
               )
           );

       IMPORTANT: If you use your own LWP::UserAgent, you must install it
       before your calls to "push_filter()", otherwise the match method will
       make wrong assumptions about the schemes your agent supports.

       NOTE: It is likely that the possibility of changing the agent or the
       daemon will disappear in future versions.

METHODS

   Constructor and initialisation
       new()
           The "new()" method creates a new HTTP::Proxy object. All attributes
           can be passed as parameters to replace their default values.

           Parameters that are not HTTP::Proxy attributes will be ignored and
           passed to the chosen HTTP::Proxy::Engine object.

       init()
           "init()" initialises the proxy without starting it. It is usually
           not needed.

           This method is called by "start()" if needed.

       push_filter()
           The "push_filter()" method is used to add filters to the proxy.  It
           is fully described in section FILTERS.

   Accessors and mutators
       The HTTP::Proxy class has several accessors and mutators.

       Called with no arguments, an accessor returns the current value.
       Called with a single argument, it sets the current value and returns
       the previous one, in case you want to keep it.

       If you call a read-only accessor with a parameter, this parameter will
       be ignored.

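       For example, with the "port" attribute (a minimal sketch that only
       builds the object and never calls "start()"):

```perl
           use strict;
           use warnings;
           use HTTP::Proxy;

           my $proxy = HTTP::Proxy->new( port => 3128 );

           # one argument: store the new value, get the previous one back
           my $previous = $proxy->port(8888);

           # no argument: read the current value
           my $current = $proxy->port;

           print "port changed from $previous to $current\n";
```
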
       The defined accessors are (in alphabetical order):

       agent
           The LWP::UserAgent object used internally to connect to remote
           sites.

       chunk
           The chunk size for the LWP::UserAgent callbacks.

       client_socket (read-only)
           The socket currently connected to the client. Mostly useful in
           filters.

       client_headers
           This attribute holds a reference to the client headers set up by
           LWP::UserAgent ("Client-Aborted", "Client-Bad-Header-Line",
           "Client-Date", "Client-Junk", "Client-Peer", "Client-Request-Num",
           "Client-Response-Num", "Client-SSL-Cert-Issuer",
           "Client-SSL-Cert-Subject", "Client-SSL-Cipher",
           "Client-SSL-Warning", "Client-Transfer-Encoding",
           "Client-Warning").

           They are removed by the filter HTTP::Proxy::HeaderFilter::standard
           from the request and response objects received by the proxy.

           If a filter (such as an SSL certificate verification filter) needs
           to access them, it must do so through this accessor.

       conn (read-only)
           The number of connections processed by this HTTP::Proxy instance.

       daemon
           The HTTP::Daemon object used to accept incoming connections.  (You
           usually never need this.)

       engine
           The HTTP::Proxy::Engine object that manages the child processes.

       hop_headers
           This attribute holds a reference to the hop-by-hop headers
           ("Connection", "Keep-Alive", "Proxy-Authenticate",
           "Proxy-Authorization", "TE", "Trailers", "Transfer-Encoding",
           "Upgrade").

           They are removed by the filter HTTP::Proxy::HeaderFilter::standard
           from the request and response objects received by the proxy.

           If a filter (such as a proxy authorisation filter) needs to access
           them, it must do so through this accessor.

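           Stripping these headers boils down to a case-insensitive delete.
           The sketch below works on a plain hash of header names and values
           rather than on an HTTP::Headers object, and strip_hop_headers() is
           a hypothetical helper; it is not the code of
           HTTP::Proxy::HeaderFilter::standard.

```perl
               use strict;
               use warnings;

               # the hop-by-hop headers listed above
               my @HOP = qw( Connection Keep-Alive Proxy-Authenticate
                             Proxy-Authorization TE Trailers
                             Transfer-Encoding Upgrade );

               # Hypothetical helper: copy %$headers without the hop-by-hop
               # headers. HTTP header names are case-insensitive.
               sub strip_hop_headers {
                   my ($headers) = @_;
                   my %hop = map { ( lc($_) => 1 ) } @HOP;
                   my %clean;
                   for my $name ( keys %$headers ) {
                       $clean{$name} = $headers->{$name} unless $hop{ lc $name };
                   }
                   return \%clean;
               }

               my $clean = strip_hop_headers(
                   {   'Host'       => 'www.example.com',
                       'connection' => 'keep-alive',
                       'TE'         => 'trailers',
                       'Accept'     => 'text/html',
                   }
               );
               print join( ',', sort keys %$clean ), "\n";    # Accept,Host
```
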
       host
           The proxy HTTP::Daemon host (default: 'localhost').

           This means that by default, the proxy answers only to clients on
           the local machine. You can pass a specific interface address or
           ""/"undef" for any interface.

           This default prevents your proxy from being used as an anonymous
           proxy by script kiddies.

       known_methods( @groups ) (read-only)
           This method returns all HTTP methods (including extensions to
           HTTP) known to "HTTP::Proxy". Methods are grouped by type. Known
           method groups are: "HTTP", "WebDAV" and "DeltaV".

           Called with an empty list, this method will return all known
           methods. Group names are case-insensitive, and the method will
           "carp()" if an unknown group name is passed.

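           For instance, it can build the comma-separated "method" parameter
           of "push_filter()" when a filter must also see WebDAV requests.
           This sketch only prints the list; the filter that would use it is
           left out.

```perl
               use strict;
               use warnings;
               use HTTP::Proxy;

               my $proxy = HTTP::Proxy->new( port => 3128 );

               # methods of the "HTTP" and "WebDAV" groups
               # (group names are case-insensitive)
               my @methods = $proxy->known_methods(qw( HTTP webdav ));

               # suitable for push_filter( method => $method_list, ... )
               my $method_list = join ',', @methods;
               print "$method_list\n";
```
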
       logfh
           A filehandle to a logfile (default: *STDERR).

       logmask( [$mask] )
           Sets how verbose the logs are (default: "NONE").

           Here are the various elements that can be added to the mask (their
           values are powers of 2, starting from 0 and listed here in
           ascending order):

               NONE    - Log only errors
               PROXY   - Proxy information
               STATUS  - Requested URL, response status and total number
                         of connections processed
               PROCESS - Subprocess information (fork, wait, etc.)
               SOCKET  - Information about low-level sockets
               HEADERS - Full request and response headers are sent along
               FILTERS - Filter information
               DATA    - Data received by the filters
               CONNECT - Data transmitted by the CONNECT method
               ENGINE  - Engine information
               ALL     - Log all of the above

           If you only want status and process information, you can use:

               $proxy->logmask( STATUS | PROCESS );

           Note that none of the logging constants are exported by default:
           they are exported by the ":log" tag, and can also be exported one
           by one.

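           A slightly fuller sketch, importing the constants with the ":log"
           tag and then checking which flags are set (the mask is a plain
           integer, so "&" tests a single flag):

```perl
               use strict;
               use warnings;
               use HTTP::Proxy qw( :log );

               my $proxy = HTTP::Proxy->new( port => 3128 );

               # requested URLs, response statuses and child process events
               $proxy->logmask( STATUS | PROCESS );

               for my $flag ( [ STATUS => STATUS ], [ SOCKET => SOCKET ] ) {
                   my ( $name, $bit ) = @$flag;
                   printf "%-6s %s\n", $name,
                     ( $proxy->logmask() & $bit ) ? 'on' : 'off';
               }
```
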
       loop (read-only)
           Internal. False when the main loop is about to be broken.

       max_clients
       maxchild
           The maximum number of child processes the HTTP::Proxy object will
           spawn to handle client requests (default: depends on the engine).

           This method is currently delegated to the HTTP::Proxy::Engine
           object.

           "maxchild" is deprecated and will disappear.

       max_connections
       maxconn
           The maximum number of TCP connections the proxy will accept before
           returning from start(). 0 (the default) means never stop accepting
           connections.

           "maxconn" is deprecated.

           Note: "max_connections" will be deprecated soon, for two reasons:
           1) it is more of an HTTP::Proxy::Engine attribute, 2) not all
           engines will support it.

       max_keep_alive_requests
       maxserve
           The maximum number of requests the proxy will serve in a single
           connection.  (same as "MaxRequestsPerChild" in Apache)

           "maxserve" is deprecated.

       port
           The proxy HTTP::Daemon port (default: 8080).

       request
           The request originally received by the proxy from the user-agent,
           which will be modified by the request filters.

       response
           The response received from the origin server by the proxy. It is
           normally "undef" until the proxy actually receives the beginning of
           a response from the origin server.

           If one of the request filters sets this attribute, it
           "short-circuits" the request/response scheme, and the proxy will
           return this response (which is NOT filtered through the response
           filter stacks) instead of the expected origin server response.
           This is useful for caching (though Squid does it much better) and
           proxy authentication, for example.

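           As an illustration, the request filter below answers "403
           Forbidden" for one domain without ever contacting the origin
           server. The domain ads.example.com is a placeholder, and the sketch
           assumes, as in the anonymiser example in FILTERS, that the callback
           of HTTP::Proxy::HeaderFilter::simple receives the filter object and
           the headers as its first two arguments. The proxy is configured
           but not started.

```perl
               use strict;
               use warnings;
               use HTTP::Proxy;
               use HTTP::Proxy::HeaderFilter::simple;
               use HTTP::Response;

               my $proxy = HTTP::Proxy->new( port => 3128 );

               $proxy->push_filter(
                   host    => 'ads\.example\.com',    # a regular expression
                   request => HTTP::Proxy::HeaderFilter::simple->new(
                       sub {
                           my ( $self, $headers, $request ) = @_;

                           # setting the response skips the origin server;
                           # it is NOT run through the response filters
                           my $res = HTTP::Response->new( 403, 'Forbidden' );
                           $res->content_type('text/plain');
                           $res->content("Blocked by the proxy.\n");
                           $self->proxy->response($res);
                       }
                   ),
               );

               # $proxy->start;    # hand over control (blocks)
```
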
       stash
           The stash is a hash where filters can store data to share between
           them.

           The stash() method can be used to set the whole hash (with a HASH
           reference).  To access individual keys simply do:

               $proxy->stash( 'bloop' );

           To set a key, type:

               $proxy->stash( bloop => 'owww' );

           It's also possible to get a reference to the stash:

               my $s = $filter->proxy->stash();
               $s->{bang} = 'bam';

               # $proxy->stash( 'bang' ) will now return 'bam'

           Warning: since the proxy forks for each TCP connection, the data is
           only shared between filters in the same child process.

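           For example, a request filter can record when a request was
           forwarded, and a response filter, running later in the same child
           process, can read the value back. The stash key "started" and the
           "X-Proxy-Elapsed" header are made up for this sketch, which
           configures the proxy without starting it.

```perl
               use strict;
               use warnings;
               use HTTP::Proxy;
               use HTTP::Proxy::HeaderFilter::simple;
               use Time::HiRes ();

               my $proxy = HTTP::Proxy->new( port => 3128 );

               $proxy->push_filter(
                   # remember when the request was forwarded
                   request => HTTP::Proxy::HeaderFilter::simple->new(
                       sub { $_[0]->proxy->stash( started => Time::HiRes::time() ) }
                   ),
                   # report the elapsed time in a response header
                   response => HTTP::Proxy::HeaderFilter::simple->new(
                       sub {
                           my ( $self, $headers ) = @_;
                           my $started = $self->proxy->stash('started');
                           return unless defined $started;
                           $headers->header( 'X-Proxy-Elapsed' =>
                               sprintf( '%.3f', Time::HiRes::time() - $started ) );
                       }
                   ),
               );

               # $proxy->start;    # hand over control (blocks)
```
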
       timeout
           The timeout used by the internal LWP::UserAgent (default: 60
           seconds).

       url (read-only)
           The URL where the proxy can be reached.

       via
           The content of the "Via:" header. Setting it to an empty string
           will prevent its addition. (default:
           "$hostname (HTTP::Proxy/$VERSION)")

       x_forwarded_for
           If set to a true value, the proxy will send the "X-Forwarded-For:"
           header.  (default: true)

   Connection handling methods
       start()
           This method works like Tk's "MainLoop": you hand over control to
           the HTTP::Proxy object you created and configured.

           If "max_connections" (formerly "maxconn") is not zero, "start()"
           will return after accepting at most that many connections. It will
           return the total number of connections.

       serve_connections()
           This is the internal method used to handle each new TCP connection
           to the proxy.

   Other methods
       log( $level, $prefix, $message )
           Adds $message at the end of "logfh", if $level matches "logmask".
           The "log()" method also prints a timestamp.

           The output looks like:

               [Thu Dec  5 12:30:12 2002] ($$) $prefix: $message

           where $$ is the current process's id.

           If $message is a multiline string, several log lines will be
           output, each line starting with $prefix.

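           The same layout can be reproduced in a few lines of core Perl,
           which shows how a multiline message is split. format_log() is a
           hypothetical helper written for this illustration, not the
           module's "log()" method.

```perl
               use strict;
               use warnings;

               # Hypothetical helper: one output line per line of $message,
               # each with a timestamp, the process id and $prefix.
               sub format_log {
                   my ( $prefix, $message ) = @_;
                   my $stamp = localtime;    # e.g. "Thu Dec  5 12:30:12 2002"
                   return join '',
                     map { sprintf "[%s] (%d) %s: %s\n", $stamp, $$, $prefix, $_ }
                     split /\n/, $message;
               }

               print format_log( 'PROCESS', "forked child\nwaiting for requests" );
```
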
       is_protocol_supported( $scheme )
           Returns a boolean indicating if $scheme is supported by the proxy.

           This method is only used internally.

           It is essential to allow HTTP::Proxy users to create
           "pseudo-schemes" that LWP doesn't know about, but that one of the
           proxy filters can handle directly. New schemes are added as
           follows:

               $proxy->init();    # required to get an agent
               $proxy->agent->protocols_allowed(
                   [ @{ $proxy->agent->protocols_allowed }, 'myhttp' ] );

       new_connection()
           Increases the proxy's TCP connections counter. Only used by
           HTTP::Proxy::Engine objects.

   Apache-like attributes
       HTTP::Proxy has several Apache-like attributes that control the way the
       HTTP and TCP connections are handled.

       The following attributes control the TCP connection. They are passed to
       the underlying HTTP::Proxy::Engine, which may (or may not) use them to
       change its behaviour.

       start_servers
           Number of child processes to fork at the beginning.

       max_clients
           Maximum number of concurrent TCP connections (i.e. child
           processes).

       max_requests_per_child
           Maximum number of TCP connections handled by the same child
           process.

       min_spare_servers
           Minimum number of inactive child processes.

       max_spare_servers
           Maximum number of inactive child processes.

       These attributes control the HTTP connection:

       keep_alive
           Support for keep-alive HTTP connections.

       max_keep_alive_requests
           Maximum number of HTTP requests within a single TCP connection.

       keep_alive_timeout
           Timeout for keep-alive connections.

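       A sketch of passing these attributes to "new()". The values are
       arbitrary, and whether each one has any effect depends on the engine
       in use, as explained above.

```perl
           use strict;
           use warnings;
           use HTTP::Proxy;

           my $proxy = HTTP::Proxy->new(
               port => 3128,

               # TCP connection handling, passed on to the engine
               start_servers          => 2,
               max_clients            => 10,
               max_requests_per_child => 100,
               min_spare_servers      => 1,
               max_spare_servers      => 5,

               # HTTP connection handling
               keep_alive              => 1,
               max_keep_alive_requests => 10,
               keep_alive_timeout      => 15,
           );

           # $proxy->start;    # hand over control (blocks)
```
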

EXPORTED SYMBOLS

       No symbols are exported by default. The ":log" tag exports all the
       logging constants.

BUGS

       This module does not work under Windows, but I can't see why, and do
       not have a development platform under that system. Patches and
       explanations are very welcome.

       I guess it is because "fork()" is not well supported. A possible
       workaround is to keep the proxy from forking:

           $proxy->maxchild(0);

       However, David Fishburn says:
           This did not work for me under WinXP - ActiveState Perl 5.6, but it
           DOES work on WinXP ActiveState Perl 5.8.

       Several people have tried to help, but we haven't found a way to make
       it work correctly yet.

       As of version 0.16, the default engine is
       HTTP::Proxy::Engine::NoFork.  Let me know if it works better.

SEE ALSO

       HTTP::Proxy::Engine, HTTP::Proxy::BodyFilter,
       HTTP::Proxy::HeaderFilter, the examples in eg/.

AUTHOR

       Philippe "BooK" Bruhat, <book@cpan.org>.

       There is also a mailing-list: http-proxy@mongueurs.net for general
       discussion about HTTP::Proxy.

THANKS

       Many people helped me during the development of this module, either on
       mailing-lists, IRC or over a beer in a pub...

       So, in no particular order, thanks to the libwww-perl team for such a
       terrific suite of modules, perl-qa (tips for testing), the French Perl
       Mongueurs (for code tricks, beers and encouragement) and my growing
       user base... ;-)

       I'd like to particularly thank Dan Grigsby, who's been using
       HTTP::Proxy since 2003 (before the filter classes even existed).  He is
       apparently making a living from a product based on HTTP::Proxy. Thanks
       a lot for your confidence in my work!

COPYRIGHT

       Copyright 2002-2015, Philippe Bruhat.

LICENSE

       This module is free software; you can redistribute it or modify it
       under the same terms as Perl itself.

perl v5.32.0                      2020-07-28                    HTTP::Proxy(3)