HTTP::Proxy(3)        User Contributed Perl Documentation        HTTP::Proxy(3)

NAME
       HTTP::Proxy - A pure Perl HTTP proxy

SYNOPSIS
           use HTTP::Proxy;

           # initialisation
           my $proxy = HTTP::Proxy->new( port => 3128 );

           # alternate initialisation
           my $proxy = HTTP::Proxy->new;
           $proxy->port( 3128 );    # the classical accessors are here!

           # this is a MainLoop-like method
           $proxy->start;

DESCRIPTION
       This module implements an HTTP proxy, using an HTTP::Daemon to accept
       client connections, and an LWP::UserAgent to ask for the requested
       pages.

       The most interesting feature of this proxy object is its ability to
       filter the HTTP requests and responses through user-defined filters.

       Once the proxy is created, with the "new()" method, it is possible to
       alter its behaviour by adding so-called "filters". This is done with
       the "push_filter()" method. Once the proxy is ready to run, it can be
       launched with the "start()" method. This method does not normally
       return until the proxy is killed or otherwise stopped.

       An important thing to note is that the proxy is (except when running
       the "NoFork" engine) a forking proxy: it doesn't support passing
       information between child processes, and you can count on reliable
       information passing only during a single HTTP connection (request +
       response).

FILTERS
       You can alter the way the default HTTP::Proxy works by plugging
       callbacks (filter objects, actually) at different stages of the
       request/response handling.

       When a request is received by the HTTP::Proxy object, it is filtered
       through a standard filter that transforms the request according to RFC
       2616 (by adding the "Via:" header, and other transformations). This is
       the default, bare minimum behaviour.

       The response is also filtered in the same manner. There are four
       filter chains in total: "request-headers", "request-body",
       "response-headers" and "response-body".

       You can add your own filters to the default ones with the
       "push_filter()" method. The method pushes a filter on the appropriate
       filter stack.

           $proxy->push_filter( response => $filter );

       The headers/body category is determined by the base class of the
       filter. There are two base classes for filters:
       HTTP::Proxy::HeaderFilter and HTTP::Proxy::BodyFilter (the names are
       self-explanatory). See the documentation of those two classes to find
       out how to write your own header and body filters.

       The named parameter ("request" or "response") determines which part
       of the exchange the filter applies to.

       It is possible to push the same filter on the request and response
       stacks, as in the following example:

           $proxy->push_filter( request => $filter, response => $filter );

       If several filters match the message, they will be applied in the order
       they were pushed on their filter stack.

       Named parameters can be used to create the match routine. They are:

           method - the request method
           scheme - the URI scheme
           host   - the URI authority (host:port)
           path   - the URI path
           query  - the URI query string
           mime   - the MIME type (for a response-body filter)

       The filters are applied only when all the parameters match the
       request or the response. All these named parameters have default
       values, which are:

           method => 'OPTIONS,GET,HEAD,POST,PUT,DELETE,TRACE,CONNECT'
           scheme => 'http'
           host   => ''
           path   => ''
           query  => ''
           mime   => 'text/*'

       The "mime" parameter is a glob-like string, with a required "/"
       character and a "*" as a wildcard. Thus, "*/*" matches all responses
       that have a "Content-Type:" header, and "" those with no
       "Content-Type:" header. To match any response (with or without a
       "Content-Type:" header), use "undef".

       The "mime" parameter is only meaningful with the "response-body" filter
       stack. It is ignored if passed to any other filter stack.
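
The matching rules described for "mime" can be illustrated with a small
standalone sketch. The helper mime_matches() below is hypothetical: it is not
part of HTTP::Proxy and is not the routine the proxy compiles internally. It
only mirrors the documented behaviour ("*" as a wildcard, "" for responses
without a "Content-Type:" header, "undef" for any response), and assumes that
media type parameters such as "; charset=..." should be ignored.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of HTTP::Proxy: it mirrors the documented
# semantics of the "mime" parameter for illustration only.
sub mime_matches {
    my ( $pattern, $content_type ) = @_;

    # undef matches any response, with or without a Content-Type: header
    return 1 if !defined $pattern;

    # '' matches only responses that carry no Content-Type: header
    my $has_type = defined $content_type && length $content_type;
    return $has_type ? 0 : 1 if $pattern eq '';

    # any other pattern needs a Content-Type: header to compare against
    return 0 unless $has_type;

    # turn the glob into a regex: "*" stands for one type or subtype
    # token, everything else is matched literally
    my $regex = join '[^/;\s]+', map { quotemeta($_) } split /\*/, $pattern, -1;

    # anchor the match and ignore parameters such as "; charset=utf-8"
    return $content_type =~ /^$regex\s*(?:;|\z)/ ? 1 : 0;
}

print mime_matches( 'text/*', 'text/html; charset=utf-8' ) ? "match\n" : "no match\n";
print mime_matches( 'text/*', 'image/png' )                ? "match\n" : "no match\n";
```

The first call prints "match" and the second "no match". The real match
routine may differ in details such as whitespace handling.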

       The "method" and "scheme" parameters are strings consisting of
       comma-separated values. The "host" and "path" parameters are regular
       expressions.

       A match routine is compiled by the proxy and used to check if a
       particular request or response must be filtered through a particular
       filter.
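
As a minimal sketch of how the match parameters combine, the following pushes
a request header filter that only runs for GET and HEAD requests to a given
host and path prefix. The host name, the path and the "X-Filtered-By" header
are made up for the example, and "port => 0" is an assumption that asks the
system for any free port.

```perl
use strict;
use warnings;
use HTTP::Proxy;
use HTTP::Proxy::HeaderFilter::simple;

# port => 0 is an assumption here: it asks the system for a free port
my $proxy = HTTP::Proxy->new( port => 0 );

# hypothetical filter: tag matching requests with an extra header
my $tagger = HTTP::Proxy::HeaderFilter::simple->new(
    sub {
        my ( $self, $headers, $message ) = @_;
        $headers->header( 'X-Filtered-By' => 'example' );
    }
);

# applied only when ALL the parameters match: GET or HEAD requests,
# an authority matching the host regex, and a path under /docs/
$proxy->push_filter(
    method  => 'GET,HEAD',
    host    => 'example\.com',
    path    => '^/docs/',
    request => $tagger,
);
```

Parameters left out ("scheme", "query", "mime") keep their default values.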

       It is also possible to push several filters on the same stack with the
       same match subroutine:

           # convert italics to bold
           $proxy->push_filter(
               mime     => 'text/html',
               response => HTTP::Proxy::BodyFilter::tags->new(),
               response => HTTP::Proxy::BodyFilter::simple->new(
                   sub { ${ $_[1] } =~ s!(</?)i>!$1b>!ig }
               )
           );

       For more details regarding the creation of new filters, check the
       HTTP::Proxy::HeaderFilter and HTTP::Proxy::BodyFilter documentation.

       Here's an example of subclassing a base filter class:

           # fixes a common typo ;-)
           # but chances are that this will modify a correct URL
           {
               package FilterPerl;
               use base qw( HTTP::Proxy::BodyFilter );

               sub filter {
                   my ( $self, $dataref, $message, $protocol, $buffer ) = @_;
                   $$dataref =~ s/PERL/Perl/g;
               }
           }
           $proxy->push_filter( response => FilterPerl->new() );

       Other examples can be found in the documentation for
       HTTP::Proxy::HeaderFilter, HTTP::Proxy::BodyFilter,
       HTTP::Proxy::HeaderFilter::simple, HTTP::Proxy::BodyFilter::simple.

           # a simple anonymiser
           # see eg/anonymiser.pl for the complete code
           $proxy->push_filter(
               mime    => undef,
               request => HTTP::Proxy::HeaderFilter::simple->new(
                   sub { $_[1]->remove_header(qw( User-Agent From Referer Cookie )) },
               ),
               response => HTTP::Proxy::HeaderFilter::simple->new(
                   sub { $_[1]->remove_header(qw( Set-Cookie )); },
               )
           );

       IMPORTANT: If you use your own LWP::UserAgent, you must install it
       before your calls to "push_filter()", otherwise the match method will
       make wrong assumptions about the schemes your agent supports.
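
The ordering constraint above could look like the following sketch. The agent
settings are illustrative only, and "port => 0" is an assumption that asks the
system for any free port.

```perl
use strict;
use warnings;
use HTTP::Proxy;
use HTTP::Proxy::HeaderFilter::simple;
use LWP::UserAgent;

my $proxy = HTTP::Proxy->new( port => 0 );    # port 0: assumed "any free port"

# 1. install the custom agent first (these settings are only examples)
my $agent = LWP::UserAgent->new(
    timeout               => 30,
    requests_redirectable => [],    # let the client follow redirects itself
    parse_head            => 0,
);
$proxy->agent($agent);

# 2. only then push the filters, so that scheme checks use this agent
$proxy->push_filter(
    request => HTTP::Proxy::HeaderFilter::simple->new(
        sub { $_[1]->remove_header('Referer') }
    ),
);
```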

       NOTE: The possibility of changing the agent or the daemon is likely to
       disappear in future versions.

METHODS
   Constructor and initialisation
       new()
           The "new()" method creates a new HTTP::Proxy object. All attributes
           can be passed as parameters to replace the default.

           Parameters that are not HTTP::Proxy attributes are ignored by the
           proxy itself and passed on to the chosen HTTP::Proxy::Engine object.

       init()
           "init()" initialises the proxy without starting it. It is usually
           not needed.

           This method is called by "start()" if needed.

       push_filter()
           The "push_filter()" method is used to add filters to the proxy. It
           is fully described in section FILTERS.

   Accessors and mutators
       The HTTP::Proxy class has several accessors and mutators.

       Called without arguments, the accessor returns the current value.
       Called with a single argument, it sets the current value and returns
       the previous one, in case you want to keep it.

       If you call a read-only accessor with a parameter, this parameter will
       be ignored.
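
A short sketch of the getter/setter convention, using the "port" accessor (the
port numbers are arbitrary):

```perl
use strict;
use warnings;
use HTTP::Proxy;

my $proxy = HTTP::Proxy->new( port => 3128 );

my $current  = $proxy->port;          # no argument: returns the current value
my $previous = $proxy->port(8080);    # one argument: sets it, returns the old value

print "was $previous, now ", $proxy->port, "\n";
```

Keeping the returned value makes it easy to restore a setting later.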

       The defined accessors are (in alphabetical order):

       agent
           The LWP::UserAgent object used internally to connect to remote
           sites.

       chunk
           The chunk size for the LWP::UserAgent callbacks.

       client_socket (read-only)
           The socket currently connected to the client. Mostly useful in
           filters.

       client_headers
           This attribute holds a reference to the client headers set up by
           LWP::UserAgent ("Client-Aborted", "Client-Bad-Header-Line",
           "Client-Date", "Client-Junk", "Client-Peer", "Client-Request-Num",
           "Client-Response-Num", "Client-SSL-Cert-Issuer",
           "Client-SSL-Cert-Subject", "Client-SSL-Cipher",
           "Client-SSL-Warning", "Client-Transfer-Encoding",
           "Client-Warning").

           They are removed by the filter HTTP::Proxy::HeaderFilter::standard
           from the request and response objects received by the proxy.

           If a filter (such as an SSL certificate verification filter) needs
           to access them, it must do it through this accessor.

       conn (read-only)
           The number of connections processed by this HTTP::Proxy instance.

       daemon
           The HTTP::Daemon object used to accept incoming connections. (You
           usually never need this.)

       engine
           The HTTP::Proxy::Engine object that manages the child processes.

       hop_headers
           This attribute holds a reference to the hop-by-hop headers
           ("Connection", "Keep-Alive", "Proxy-Authenticate",
           "Proxy-Authorization", "TE", "Trailers", "Transfer-Encoding",
           "Upgrade").

           They are removed by the filter HTTP::Proxy::HeaderFilter::standard
           from the request and response objects received by the proxy.

           If a filter (such as a proxy authorisation filter) needs to access
           them, it must do it through this accessor.

       host
           The proxy HTTP::Daemon host (default: 'localhost').

           This means that by default, the proxy answers only to clients on
           the local machine. You can pass a specific interface address, or
           "" or "undef" for any interface.

           This default prevents your proxy from being used as an anonymous
           proxy by script kiddies.

       known_methods( @groups ) (read-only)
           This method returns all the HTTP methods (and extensions to HTTP)
           known to "HTTP::Proxy". Methods are grouped by type. Known method
           groups are: "HTTP", "WebDAV" and "DeltaV".

           Called with an empty list, this method will return all known
           methods. This method is case-insensitive, and will "carp()" if an
           unknown group name is passed.
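
For instance, a quick way to inspect the method groups (a sketch; the exact
contents and order of each list depend on the installed version):

```perl
use strict;
use warnings;
use HTTP::Proxy;

my $proxy = HTTP::Proxy->new;

my @http   = $proxy->known_methods('HTTP');      # core HTTP methods only
my @webdav = $proxy->known_methods('webdav');    # group names are case-insensitive
my @all    = $proxy->known_methods();            # every known method

printf "%d HTTP, %d WebDAV, %d in total\n",
    scalar @http, scalar @webdav, scalar @all;
```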

       logfh
           A filehandle to a logfile (default: *STDERR).

       logmask( [$mask] )
           Be verbose in the logs (default: "NONE").

           Here are the various elements that can be added to the mask (their
           values are powers of 2, starting from 0 and listed here in
           ascending order):

               NONE    - Log only errors
               PROXY   - Proxy information
               STATUS  - Requested URL, response status and total number
                         of connections processed
               PROCESS - Subprocesses information (fork, wait, etc.)
               SOCKET  - Information about low-level sockets
               HEADERS - Full request and response headers are sent along
               FILTERS - Filter information
               DATA    - Data received by the filters
               CONNECT - Data transmitted by the CONNECT method
               ENGINE  - Engine information
               ALL     - Log all of the above

           If you only want status and process information, you can use:

               $proxy->logmask( STATUS | PROCESS );

           Note that none of the logging constants is exported by default;
           the ":log" tag exports them all. They can also be imported one by
           one.
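
A minimal sketch of importing the constants with the ":log" tag and combining
them into a mask:

```perl
use strict;
use warnings;
use HTTP::Proxy qw( :log );    # imports NONE, PROXY, STATUS, ... ALL

my $proxy = HTTP::Proxy->new( logmask => STATUS | PROCESS );

# add one more element to the current mask
$proxy->logmask( $proxy->logmask() | FILTERS );

print "filters are logged\n" if $proxy->logmask() & FILTERS;
```

To import only a few constants, name them instead of using the tag, as in
"use HTTP::Proxy qw( STATUS PROCESS );".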

       loop (read-only)
           Internal. False when the main loop is about to be broken.

       max_clients
       maxchild
           The maximum number of child processes the HTTP::Proxy object will
           spawn to handle client requests (default: depends on the engine).

           This method is currently delegated to the HTTP::Proxy::Engine
           object.

           "maxchild" is deprecated and will disappear.

       max_connections
       maxconn
           The maximum number of TCP connections the proxy will accept before
           returning from start(). 0 (the default) means never stop accepting
           connections.

           "maxconn" is deprecated.

           Note: "max_connections" will be deprecated soon, for two reasons:
           1) it is more of an HTTP::Proxy::Engine attribute, 2) not all
           engines will support it.

       max_keep_alive_requests
       maxserve
           The maximum number of requests the proxy will serve in a single
           connection. (comparable to "MaxKeepAliveRequests" in Apache)

           "maxserve" is deprecated.

       port
           The proxy HTTP::Daemon port (default: 8080).

       request
           The request originally received by the proxy from the user-agent,
           which will be modified by the request filters.

       response
           The response received from the origin server by the proxy. It is
           normally "undef" until the proxy actually receives the beginning of
           a response from the origin server.

           If one of the request filters sets this attribute, it
           "short-circuits" the request/response scheme, and the proxy will
           return this response (which is NOT filtered through the response
           filter stacks) instead of the expected origin server response.
           This is useful for caching (though Squid does it much better) and
           proxy authentication, for example.
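
The short-circuit described above might be used as follows. This is a sketch:
the blocked host name and the response text are made up, and "port => 0" is
an assumption that asks the system for any free port.

```perl
use strict;
use warnings;
use HTTP::Proxy;
use HTTP::Proxy::HeaderFilter::simple;
use HTTP::Response;

my $proxy = HTTP::Proxy->new( port => 0 );    # port 0: assumed "any free port"

# refuse every request for a (hypothetical) blocked host
$proxy->push_filter(
    host    => 'blocked\.example\.com',
    request => HTTP::Proxy::HeaderFilter::simple->new(
        sub {
            my ( $self, $headers, $message ) = @_;

            my $response = HTTP::Response->new( 403, 'Forbidden' );
            $response->content_type('text/plain');
            $response->content("Access to this host is blocked.\n");

            # setting the response here means the origin server is
            # never contacted for this request
            $self->proxy->response($response);
        }
    ),
);
```

Since the response filter stacks are skipped, the response set by the filter
has to be complete as it stands.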

       stash
           The stash is a hash where filters can store data to share between
           them.

           The stash() method can be used to set the whole hash (with a HASH
           reference). To access individual keys simply do:

               $proxy->stash( 'bloop' );

           To set it, type:

               $proxy->stash( bloop => 'owww' );

           It's also possible to get a reference to the stash:

               my $s = $filter->proxy->stash();
               $s->{bang} = 'bam';

               # $proxy->stash( 'bang' ) will now return 'bam'

           Warning: since the proxy forks for each TCP connection, the data is
           only shared between filters in the same child process.

       timeout
           The timeout used by the internal LWP::UserAgent (default: 60).

       url (read-only)
           The URL where the proxy can be reached.

       via
           The content of the Via: header. Setting it to an empty string will
           prevent its addition. (default: "$hostname (HTTP::Proxy/$VERSION)")

       x_forwarded_for
           If set to a true value, the proxy will send the "X-Forwarded-For:"
           header. (default: true)

   Connection handling methods
       start()
           This method works like Tk's "MainLoop": you hand over control to
           the HTTP::Proxy object you created and configured.

           If "max_connections" is not zero, "start()" will return after
           accepting at most that many connections. It will return the total
           number of connections.

       serve_connections()
           This is the internal method used to handle each new TCP connection
           to the proxy.

   Other methods
       log( $level, $prefix, $message )
           Adds $message at the end of "logfh", if $level matches "logmask".
           The "log()" method also prints a timestamp.

           The output looks like:

               [Thu Dec  5 12:30:12 2002] ($$) $prefix: $message

           where $$ is the current process's id.

           If $message is a multiline string, several log lines will be
           output, each line starting with $prefix.
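
A sketch of "log()" in use, writing to a temporary file rather than STDERR so
the result can be read back. The "MYFILTER" prefix and the messages are
arbitrary.

```perl
use strict;
use warnings;
use File::Temp;
use HTTP::Proxy qw( :log );

# log to a temporary file instead of STDERR
my $logfile = File::Temp->new;
my $proxy   = HTTP::Proxy->new( logfh => $logfile, logmask => STATUS );

# logged: STATUS is part of the mask; a two-line message gives two log lines
$proxy->log( STATUS, 'MYFILTER', "first line\nsecond line" );

# not logged: FILTERS is not part of the mask
$proxy->log( FILTERS, 'MYFILTER', 'ignored' );

# read the log back and show it
$logfile->flush;
open my $in, '<', $logfile->filename or die "Cannot read log: $!";
print while <$in>;
close $in;
```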

       is_protocol_supported( $scheme )
           Returns a boolean indicating if $scheme is supported by the proxy.

           This method is only used internally.

           It is essential to allow HTTP::Proxy users to create
           "pseudo-schemes" that LWP doesn't know about, but that one of the
           proxy filters can handle directly. New schemes are added as
           follows:

               $proxy->init();    # required to get an agent
               $proxy->agent->protocols_allowed(
                   [ @{ $proxy->agent->protocols_allowed }, 'myhttp' ] );

       new_connection()
           Increase the proxy's TCP connections counter. Only used by
           HTTP::Proxy::Engine objects.

   Apache-like attributes
       HTTP::Proxy has several Apache-like attributes that control the way the
       HTTP and TCP connections are handled.

       The following attributes control the TCP connection. They are passed to
       the underlying HTTP::Proxy::Engine, which may (or may not) use them to
       change its behaviour.

       start_servers
           Number of child processes to fork at the beginning.

       max_clients
           Maximum number of concurrent TCP connections (i.e. child
           processes).

       max_requests_per_child
           Maximum number of TCP connections handled by the same child
           process.

       min_spare_servers
           Minimum number of inactive child processes.

       max_spare_servers
           Maximum number of inactive child processes.

       These attributes control the HTTP connection:

       keep_alive
           Support for keep-alive HTTP connections.

       max_keep_alive_requests
           Maximum number of HTTP requests within a single TCP connection.

       keep_alive_timeout
           Timeout for keep-alive connections.
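
These attributes can be given to the constructor like any other. A sketch with
arbitrary values; whether "max_clients" has any effect depends on the engine
in use:

```perl
use strict;
use warnings;
use HTTP::Proxy;

# max_clients is handed to the engine, which may or may not honour it;
# max_keep_alive_requests is an attribute of the proxy itself
my $proxy = HTTP::Proxy->new(
    port                    => 3128,
    max_clients             => 10,
    max_keep_alive_requests => 5,
);

print "requests per connection: ", $proxy->max_keep_alive_requests, "\n";
```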

EXPORTED SYMBOLS
       No symbols are exported by default. The ":log" tag exports all the
       logging constants.

BUGS
       This module does not work under Windows, but I can't see why, and do
       not have a development platform under that system. Patches and
       explanations very welcome.

       I guess it is because "fork()" is not well supported.

           $proxy->maxchild(0);

       However, David Fishburn says:
           This did not work for me under WinXP - ActiveState Perl 5.6, but it
           DOES work on WinXP ActiveState Perl 5.8.

       Several people have tried to help, but we haven't found a way to make
       it work correctly yet.

       As of version 0.16, the default engine is
       HTTP::Proxy::Engine::NoFork. Let me know if it works better.

SEE ALSO
       HTTP::Proxy::Engine, HTTP::Proxy::BodyFilter,
       HTTP::Proxy::HeaderFilter, the examples in eg/.

AUTHOR
       Philippe "BooK" Bruhat, <book@cpan.org>.

       There is also a mailing-list: http-proxy@mongueurs.net for general
       discussion about HTTP::Proxy.

THANKS
       Many people helped me during the development of this module, either on
       mailing-lists, IRC or over a beer in a pub...

       So, in no particular order, thanks to the libwww-perl team for such a
       terrific suite of modules, perl-qa (tips for testing), the French Perl
       Mongueurs (for code tricks, beers and encouragements) and my growing
       user base... ;-)

       I'd like to particularly thank Dan Grigsby, who's been using
       HTTP::Proxy since 2003 (before the filter classes even existed). He is
       apparently making a living from a product based on HTTP::Proxy. Thanks
       a lot for your confidence in my work!

COPYRIGHT
       Copyright 2002-2015, Philippe Bruhat.

LICENSE
       This module is free software; you can redistribute it or modify it
       under the same terms as Perl itself.

perl v5.30.1                      2020-01-30                    HTTP::Proxy(3)