POE::Component::Client::HTTP(3)  User Contributed Perl Documentation  POE::Component::Client::HTTP(3)

POE::Component::Client::HTTP - an HTTP user-agent component

    use POE qw(Component::Client::HTTP);

    POE::Component::Client::HTTP->spawn(
      Agent     => 'SpiffCrawler/0.90',   # defaults to something long
      Alias     => 'ua',                  # defaults to 'weeble'
      From      => 'spiffster@perl.org',  # defaults to undef (no header)
      Protocol  => 'HTTP/0.9',            # defaults to 'HTTP/1.1'
      Timeout   => 60,                    # defaults to 180 seconds
      MaxSize   => 16384,                 # defaults to entire response
      Streaming => 4096,                  # defaults to 0 (off)
      FollowRedirects => 2,               # defaults to 0 (off)
      Proxy     => "http://localhost:80", # defaults to HTTP_PROXY env. variable
      NoProxy   => [ "localhost", "127.0.0.1" ], # defaults to NO_PROXY env. variable
      BindAddr  => "12.34.56.78",         # defaults to INADDR_ANY
    );

    $kernel->post(
      'ua',        # posts to the 'ua' alias
      'request',   # posts to ua's 'request' state
      'response',  # which of our states will receive the response
      $request,    # an HTTP::Request object
    );

    # This is the sub which is called when the session receives a
    # 'response' event.
    sub response_handler {
      my ($request_packet, $response_packet) = @_[ARG0, ARG1];

      # HTTP::Request
      my $request_object  = $request_packet->[0];

      # HTTP::Response
      my $response_object = $response_packet->[0];

      my $stream_chunk;
      if (! defined($response_object->content)) {
        $stream_chunk = $response_packet->[1];
      }

      print(
        "*" x 78, "\n",
        "*** my request:\n",
        "-" x 78, "\n",
        $request_object->as_string(),
        "*" x 78, "\n",
        "*** their response:\n",
        "-" x 78, "\n",
        $response_object->as_string(),
      );

      if (defined $stream_chunk) {
        print "-" x 40, "\n", $stream_chunk, "\n";
      }

      print "*" x 78, "\n";
    }

POE::Component::Client::HTTP is an HTTP user-agent for POE. It lets
other sessions run while HTTP transactions are being processed, and it
lets several HTTP transactions be processed in parallel.

It supports keep-alive through POE::Component::Client::Keepalive, which
in turn uses POE::Component::Client::DNS for asynchronous name
resolution.

HTTP client components are not proper objects. Instead of being
created, as most objects are, they are "spawned" as separate sessions.
To avoid confusion (and hopefully not cause other confusion), they must
be spawned with a "spawn" method, not created anew with a "new" one.

spawn
PoCo::Client::HTTP's "spawn" method takes a few named parameters:

Agent => $user_agent_string
Agent => \@list_of_agents
If a User-Agent header is not present in the HTTP::Request, a random
one will be used from those specified by the "Agent" parameter. If
none are supplied, POE::Component::Client::HTTP will advertise itself
to the server.

"Agent" may contain a reference to a list of user agents. If this is
the case, PoCo::Client::HTTP will choose one of them at random for
each request.

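The two accepted forms of "Agent" can be pictured with a small
stand-alone sketch. The helper name pick_agent() is hypothetical and is
not part of the component; it only illustrates the "string or list
reference, chosen at random" behavior described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of POE::Component::Client::HTTP):
# accept either a plain string or a reference to a list of strings,
# and return one User-Agent string.
sub pick_agent {
    my ($agent) = @_;

    # A plain string is used as-is.
    return $agent unless ref $agent eq 'ARRAY';

    # A list reference yields one entry chosen at random.
    return $agent->[ int rand @$agent ];
}

my @agents = ('SpiffCrawler/0.90', 'SpiffCrawler/0.91');

print pick_agent('SpiffCrawler/0.90'), "\n";  # always the same string
print pick_agent(\@agents), "\n";             # one of the two entries
```
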
Alias => $session_alias
"Alias" sets the name by which the session will be known. If no
alias is given, the component defaults to "weeble". The alias lets
several sessions interact with HTTP components without keeping (or
even knowing) hard references to them. It's possible to spawn
several HTTP components with different names.

ConnectionManager => $poco_client_keepalive
"ConnectionManager" sets this component's connection pool manager.
It expects the connection manager to be a reference to a
POE::Component::Client::Keepalive object. The HTTP client component
will call "allocate()" on the connection manager itself so you should
not have done this already.

    my $pool = POE::Component::Client::Keepalive->new(
      keep_alive    => 10,  # seconds to keep connections alive
      max_open      => 100, # max concurrent connections - total
      max_per_host  => 20,  # max concurrent connections - per host
      timeout       => 30,  # max time (seconds) to establish a new connection
    );

    POE::Component::Client::HTTP->spawn(
      # ...
      ConnectionManager => $pool,
      # ...
    );

See POE::Component::Client::Keepalive for more information.

CookieJar => $cookie_jar
"CookieJar" sets the component's cookie jar. It expects the cookie
jar to be a reference to an HTTP::Cookies object.

From => $admin_address
"From" holds an e-mail address where the client's administrator
and/or maintainer may be reached. It defaults to undef, which means
no From header will be included in requests.

MaxSize => OCTETS
"MaxSize" specifies the largest response to accept from a server.
The content of larger responses will be truncated to OCTETS octets.
This has been used to return the <head></head> section of web pages
without the need to wade through <body></body>.

NoProxy => [ $host_1, $host_2, ..., $host_N ]
NoProxy => "host1,host2,hostN"
"NoProxy" specifies a list of server hosts that will not be proxied.
It is useful for local hosts and hosts that do not properly support
proxying. If NoProxy is not specified, a list will be taken from the
NO_PROXY environment variable.

    NoProxy => [ "localhost", "127.0.0.1" ],
    NoProxy => "localhost,127.0.0.1",

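Both "NoProxy" forms describe the same list of hosts. A minimal sketch
of normalizing them follows. The helper name no_proxy_list() is
hypothetical, and tolerating whitespace around the commas is an
assumption made for the example, not documented component behavior.

```perl
use strict;
use warnings;

# Hypothetical helper: turn either NoProxy form into a flat host list.
sub no_proxy_list {
    my ($no_proxy) = @_;
    return () unless defined $no_proxy;

    my @hosts = ref $no_proxy eq 'ARRAY'
        ? @$no_proxy                        # [ "host1", "host2" ]
        : split /\s*,\s*/, $no_proxy;       # "host1,host2"

    # Drop empty entries such as those left by a trailing comma.
    return grep { length } @hosts;
}

print join(' ', no_proxy_list([ "localhost", "127.0.0.1" ])), "\n";
print join(' ', no_proxy_list("localhost,127.0.0.1")), "\n";
```
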
BindAddr => $local_ip
Specify "BindAddr" to bind all client sockets to a particular local
address. The value of BindAddr will be passed through
POE::Component::Client::Keepalive to POE::Wheel::SocketFactory (as
"bind_address"). See that module's documentation for implementation
details.

    BindAddr => "12.34.56.78"

Protocol => $http_protocol_string
"Protocol" advertises the protocol that the client wishes to see.
Under normal circumstances, it should be left to its default value:
"HTTP/1.1".

Proxy => [ $proxy_host, $proxy_port ]
Proxy => $proxy_url
Proxy => $proxy_url,$proxy_url,...
"Proxy" specifies one or more proxy hosts that requests will be
passed through. If not specified, proxy servers will be taken from
the HTTP_PROXY (or http_proxy) environment variable. No proxying
will occur unless Proxy is set or one of the environment variables
exists.

The proxy can be specified either as a host and port, or as one or
more URLs. Proxy URLs must specify the proxy port, even if it is 80.

    Proxy => [ "127.0.0.1", 80 ],
    Proxy => "http://127.0.0.1:80/",

"Proxy" may specify multiple proxies separated by commas.
PoCo::Client::HTTP will choose proxies from this list at random.
This is useful for load balancing requests through multiple gateways.

    Proxy => "http://127.0.0.1:80/,http://127.0.0.1:81/",

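The rule that proxy URLs must carry an explicit port can be checked
before the value is handed to spawn(). The sketch below is an
illustration only: parse_proxies() is a hypothetical helper, and the
regular expression is a deliberately simple assumption about what a
proxy URL looks like, not the component's own parser.

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of [ host, port ] pairs from any
# of the documented Proxy forms.
sub parse_proxies {
    my ($proxy) = @_;

    # Already in [ $proxy_host, $proxy_port ] form.
    return ([ @$proxy ]) if ref $proxy eq 'ARRAY';

    my @proxies;
    for my $url (split /\s*,\s*/, $proxy) {
        $url =~ m{^http://([^/:]+):(\d+)/?\z}
            or die "Proxy URL must include a port: $url\n";
        push @proxies, [ $1, $2 ];
    }
    return @proxies;
}

my @proxies = parse_proxies("http://127.0.0.1:80/,http://127.0.0.1:81/");

# Pick one gateway at random, as the text describes.
my ($host, $port) = @{ $proxies[ rand @proxies ] };
print "using proxy $host:$port\n";
```
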
Streaming => OCTETS
"Streaming" allows Client::HTTP to return large content in chunks (of
OCTETS octets each) rather than combine the entire content into a
single HTTP::Response object.

By default, Client::HTTP reads the entire content for a response into
memory before returning an HTTP::Response object. This is obviously
bad for applications like streaming MP3 clients, because they often
fetch songs that never end. Yes, they go on and on, my friend.

When "Streaming" is set to nonzero, however, the response handler
receives chunks of up to OCTETS octets apiece. The response handler
receives slightly different parameters in this case. The first
element of the response packet (ARG1) is still an HTTP::Response
object, but it does not contain response content. The second element
is a chunk of raw response content, or undef if the stream has ended.

    sub streaming_response_handler {
      my $response_packet = $_[ARG1];
      my ($response, $data) = @$response_packet;

      # SAVED_STREAM is a filehandle the application opened earlier.
      print SAVED_STREAM $data if defined $data;
    }

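The chunk handling can be tried without a running POE kernel by
passing the response packet to a plain function. This is a sketch
under that assumption: save_chunk() is a hypothetical helper, and the
packets below are hand-built stand-ins for what the component would
deliver in ARG1.

```perl
use strict;
use warnings;

# Hypothetical helper: write one streamed chunk to a filehandle.
# Returns 1 while the stream continues and 0 once it has ended.
sub save_chunk {
    my ($response_packet, $fh) = @_;
    my ($response, $data) = @$response_packet;

    if (defined $data) {
        print {$fh} $data;
        return 1;
    }

    return 0;  # an undef chunk marks the end of the stream
}

# Simulated packets; the first element would be an HTTP::Response.
open my $out, '>', \my $saved or die "Can't open in-memory file: $!";
for my $packet ([ undef, "abc" ], [ undef, "def" ], [ undef, undef ]) {
    last unless save_chunk($packet, $out);
}
close $out;

print "saved ", length($saved), " octets\n";
```

Inside a real handler the same function could be called as
save_chunk($_[ARG1], $fh).
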
FollowRedirects => $number_of_hops_to_follow
"FollowRedirects" specifies how many redirects (e.g. 302 Moved) to
follow. If not specified, it defaults to 0, and thus no redirection
is followed. This maintains compatibility with the previous
behavior, which was not to follow redirects at all.

If redirects are followed, a response chain should be built, and can
be accessed through $response_object->previous(). See HTTP::Response
for details here.

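Walking the chain through previous() might look like the sketch
below. redirect_chain() is a hypothetical helper that relies only on
the previous() method mentioned above, so it works with any object
that provides one.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every response in a redirect chain,
# oldest first, by following previous() until it returns undef.
sub redirect_chain {
    my ($response) = @_;

    my @chain;
    for (my $r = $response; defined $r; $r = $r->previous) {
        unshift @chain, $r;
    }

    return @chain;  # the final response is the last element
}
```

In a response handler, something like
"print $_->code, "\n" for redirect_chain($response_object);" would
then list each hop's status code in the order the hops occurred.
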
Timeout => $query_timeout
"Timeout" sets how long POE::Component::Client::HTTP has to process
an application's request, in seconds. "Timeout" defaults to 180
(three minutes) if not specified.

It's important to note that the timeout begins when the component
receives an application's request, not when it attempts to connect to
the web server.

Timeouts may result from sending the component too many requests at
once. Each request would need to be received and tracked in order.
Consider this:

    $_[KERNEL]->post(component => request => ...) for (1..15_000);

15,000 requests are queued together in one enormous bolus. The
component would receive and initialize them in order. The first
socket activity wouldn't arrive until the 15,000th request was set
up. If that took longer than "Timeout", then the requests that have
waited too long would fail.

"ConnectionManager"'s own timeout and concurrency limits also affect
how many requests may be processed at once. For example, most of the
15,000 requests would wait in the connection manager's pool until
sockets become available. Meanwhile, the "Timeout" would be counting
down.

Applications may elect to control concurrency outside the component's
"Timeout". They may do so in a few ways.

The easiest way is to limit the initial number of requests to
something more manageable. As responses arrive, the application
should handle them and start new requests. This limits concurrency
to the initial request count.

An application may also outsource job throttling to another module,
such as POE::Component::JobQueue.

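The "limit the initial number of requests" approach can be sketched
with a small closure that keeps a fixed number of requests in flight.
Everything here is illustrative: make_throttle() is a hypothetical
helper, and the callback that "starts" a request only records the URL
where a real application would post to the component.

```perl
use strict;
use warnings;

# Hypothetical helper: keep at most $max_in_flight requests running.
# $queue is a reference to the list of pending work, and $start is a
# callback that actually begins one request.
sub make_throttle {
    my ($max_in_flight, $queue, $start) = @_;
    my $in_flight = 0;

    my $fill = sub {
        while ($in_flight < $max_in_flight && @$queue) {
            $in_flight++;
            $start->(shift @$queue);
        }
    };

    return {
        start       => $fill,                            # call once
        on_response => sub { $in_flight--; $fill->() },  # per response
    };
}

# Simulation: "starting" a request just records it.
my @started;
my @pending  = map { "http://example.invalid/$_" } 1 .. 5;
my $throttle = make_throttle(2, \@pending, sub { push @started, $_[0] });

$throttle->{start}->();        # begins the first two requests
$throttle->{on_response}->();  # one finished, so a third begins

print scalar(@started), " started, ", scalar(@pending), " queued\n";
```

In an application, the $start callback would post the request to the
component, and the response handler would call on_response once per
completed request.
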
In any case, "Timeout" and "ConnectionManager" may be tuned to
maximize timeouts and concurrency limits. This may help in some
cases. Developers should be aware that doing so will increase memory
usage. POE::Component::Client::HTTP and
POE::Component::Client::Keepalive track requests in memory, while
applications are free to keep pending requests on disk.

Sessions communicate asynchronously with PoCo::Client::HTTP. They post
requests to it, and it posts responses back.

request
Requests are posted to the component's "request" state. They include
an HTTP::Request object which defines the request. For example:

    # GET is exported by HTTP::Request::Common.
    $kernel->post(
      'ua', 'request',            # http session alias & state
      'response',                 # my state to receive responses
      GET 'http://poe.perl.org',  # a simple HTTP request
      'unique id',                # a tag to identify the request
      'progress',                 # an event to indicate progress
      'http://1.2.3.4:80/'        # proxy to use for this request
    );

Requests include the state to which responses will be posted. In the
previous example, the handler for a 'response' state will be called
with each HTTP response. The "progress" handler is optional and if
installed, the component will provide progress metrics (see sample
handler below). The "proxy" parameter is optional and if not defined,
a default proxy will be used if configured. No proxy will be used if
neither a default one nor a "proxy" parameter is defined.

pending_requests_count
There's also a pending_requests_count state that returns the number of
requests currently being processed. To receive the return value, it
must be invoked with $kernel->call().

    my $count = $kernel->call('ua' => 'pending_requests_count');

cancel
Cancel a specific HTTP request. Requires a reference to the original
request (blessed or stringified) so it knows which one to cancel. See
"progress handler" below for notes on canceling streaming requests.

To cancel a request based on its blessed HTTP::Request object:

    $kernel->post( component => cancel => $http_request );

To cancel a request based on its stringified HTTP::Request object:

    $kernel->post( component => cancel => "$http_request" );

shutdown
Responds to all pending requests with 408 (request timeout), and then
shuts down the component and all subcomponents.

response handler
In addition to all the usual POE parameters, HTTP responses come with
two list references:

    my ($request_packet, $response_packet) = @_[ARG0, ARG1];

$request_packet contains a reference to the original HTTP::Request
object. This is useful for matching responses back to the requests
that generated them.

    my $http_request_object = $request_packet->[0];
    my $http_request_tag    = $request_packet->[1]; # from the 'request' post

$response_packet contains a reference to the resulting HTTP::Response
object.

    my $http_response_object = $response_packet->[0];

Please see the HTTP::Request and HTTP::Response manpages for more
information.

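One way to use the request tag for matching responses back to
requests is a lookup table keyed by tag. This is only a sketch: the
%in_progress hash and both helper names are hypothetical, and the
request packet is simulated with a plain array reference.

```perl
use strict;
use warnings;

my %in_progress;  # tag => whatever the application wants to remember

# Hypothetical helper: call when posting a request with a given tag.
sub remember_request {
    my ($tag, $info) = @_;
    $in_progress{$tag} = $info;
    return;
}

# Hypothetical helper: call from the response handler with ARG0.
# Returns the remembered value and forgets the tag.
sub match_response {
    my ($request_packet) = @_;
    my $tag = $request_packet->[1];
    return unless defined $tag;
    return delete $in_progress{$tag};
}

remember_request('job-1', 'http://poe.perl.org');

# Simulated request packet: [ $http_request_object, $tag ].
my $info = match_response([ undef, 'job-1' ]);
print "response belongs to $info\n";
```
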
progress handler
The example progress handler shows how to calculate a percentage of
download completion.

    sub progress_handler {
      my $gen_args  = $_[ARG0];    # args passed to all calls
      my $call_args = $_[ARG1];    # args specific to the call

      my $req = $gen_args->[0];    # HTTP::Request object being serviced
      my $tag = $gen_args->[1];    # Request ID tag from the 'request' post.
      my $got = $call_args->[0];   # Number of bytes retrieved so far.
      my $tot = $call_args->[1];   # Total bytes to be retrieved.
      my $oct = $call_args->[2];   # Chunk of raw octets received this time.

      my $percent = $got / $tot * 100;

      printf(
        "-- %.0f%% [%d/%d]: %s\n", $percent, $got, $tot, $req->uri()
      );

      # To cancel the request:
      # $_[KERNEL]->post( component => cancel => $req );
    }

DEPRECATION WARNING

The third return argument (the raw octets received) has been
deprecated. Instead of it, use the Streaming parameter to get chunks
of content in the response handler.

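The percentage calculation in the example divides by the total, which
fails if the total is zero or undefined. Whether the component ever
reports such a total is an assumption here (for instance, a response
without a usable length), but guarding against it is cheap.
percent_done() is a hypothetical helper, not part of the component.

```perl
use strict;
use warnings;

# Hypothetical helper: percentage of completion, or undef when the
# total is unknown or zero and no percentage can be computed.
sub percent_done {
    my ($got, $tot) = @_;
    return unless defined $tot && $tot > 0;

    my $percent = $got / $tot * 100;
    return $percent > 100 ? 100 : $percent;  # clamp odd totals
}

for my $sample ([ 50, 200 ], [ 10, undef ]) {
    my $percent = percent_done(@$sample);
    print defined $percent
        ? sprintf("-- %.0f%%\n", $percent)
        : "-- progress unknown\n";
}
```
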
The HTTP::Request object passed to the request event can contain a CODE
reference as "content". This allows for sending large files without
wasting memory. Your callback should return a chunk of data each time
it is called, and an empty string when done. Don't forget to set the
Content-Length header correctly. Example:

    my $request = HTTP::Request->new( PUT => 'http://...' );

    my $file = '/path/to/large_file';

    open my $fh, '<', $file or die "Can't open $file: $!";
    binmode $fh;

    # Return the next chunk each time the component asks for more,
    # and an empty string once the file is exhausted.
    my $upload_cb = sub {
      if ( sysread $fh, my $buf, 4096 ) {
        return $buf;
      }
      else {
        close $fh;
        return '';
      }
    };

    $request->content_length( -s $file );

    $request->content( $upload_cb );

    $kernel->post( ua => 'request', 'response', $request );

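The callback contract (a chunk per call, then an empty string) can be
wrapped in a reusable factory and exercised without any network
traffic. This is a sketch only: make_upload_cb() is a hypothetical
helper, and the loop at the end merely stands in for the component
calling the callback repeatedly.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: build a content callback for one file.
sub make_upload_cb {
    my ($path, $chunk_size) = @_;

    open my $fh, '<', $path or die "Can't open $path: $!";
    binmode $fh;

    return sub {
        return '' unless $fh;  # already finished

        my $read = sysread $fh, my $buf, $chunk_size;
        die "Read error on $path: $!" unless defined $read;
        return $buf if $read;

        close $fh;
        undef $fh;
        return '';             # empty string signals the end
    };
}

# Demonstration with a temporary 10,000-octet file.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} 'x' x 10_000;
close $tmp_fh;

my $cb    = make_upload_cb($tmp_name, 4096);
my $total = 0;
while (length(my $chunk = $cb->())) {
    $total += length $chunk;
}
print "callback produced $total octets\n";
```

With such a helper, the request setup would reduce to
$request->content( make_upload_cb($file, 4096) ) together with the
same content_length() call shown above.
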
Transparent content decoding has been disabled as of version 0.84.
This also removes support for transparent gzip requesting and
decompression.

To re-enable gzip compression, request gzip with the Accept-Encoding
header and use HTTP::Response's decoded_content() method rather than
content():

    my $request = HTTP::Request->new(
      GET => "http://www.yahoo.com/", [
        'Accept-Encoding' => 'gzip'
      ]
    );

    # ... time passes ...

    my $content = $response->decoded_content();

The change in POE::Component::Client::HTTP behavior was prompted by
changes in HTTP::Response that surfaced a bug in the component's
transparent gzip handling.

Allowing the application to specify and handle content encodings seems
to be the most reliable and flexible resolution.

For more information about the problem and discussions regarding the
solution, see: <http://www.perlmonks.org/?node_id=683833> and
<http://rt.cpan.org/Ticket/Display.html?id=35538>

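decoded_content() is the route recommended above. As an illustration
of what "letting the application handle content encodings" can mean,
the sketch below decompresses a gzip body by hand with the core
IO::Uncompress::Gunzip module. This is a substitute technique shown
for clarity, not what the component or HTTP::Response does internally,
and maybe_gunzip() is a hypothetical helper.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip   $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: given a Content-Encoding value and a raw body,
# return the body decompressed if it was gzip-encoded.
sub maybe_gunzip {
    my ($content_encoding, $body) = @_;

    return $body
        unless defined $content_encoding
            && $content_encoding =~ /^\s*(?:x-)?gzip\s*\z/i;

    gunzip(\$body => \my $plain)
        or die "gunzip failed: $GunzipError\n";
    return $plain;
}

# Demonstration: compress a string, then recover it.
my $original = "hello, world\n" x 10;
gzip(\$original => \my $compressed)
    or die "gzip failed: $GzipError\n";

print maybe_gunzip('gzip', $compressed) eq $original
    ? "round trip ok\n"
    : "round trip failed\n";
```
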
POE::Component::Client::HTTP sets its own response headers with
additional information. All of its headers begin with "X-PCCH".

X-PCCH-Peer
X-PCCH-Peer contains the remote IPv4 address and port, separated by a
period. For example, "127.0.0.1.8675" represents port 8675 on
localhost.

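Because the address and port share the same separator, the value has
to be split at the last period. A minimal sketch follows, assuming the
IPv4-only format described above; parse_peer() is a hypothetical
helper.

```perl
use strict;
use warnings;

# Hypothetical helper: split an X-PCCH-Peer value into address and
# port. Returns an empty list if the value does not look like
# "a.b.c.d.port".
sub parse_peer {
    my ($peer) = @_;
    return unless defined $peer;

    my ($addr, $port) = $peer =~ /^(\d{1,3}(?:\.\d{1,3}){3})\.(\d+)\z/
        or return;

    return ($addr, $port);
}

my ($addr, $port) = parse_peer("127.0.0.1.8675");
print "address $addr, port $port\n";
```

In a response handler the value would typically come from
$response_object->header('X-PCCH-Peer').
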
Proxying will render X-PCCH-Peer nearly useless, since the socket will
be connected to a proxy rather than the server itself.

This feature was added at Doreen Grey's request. Doreen wanted a means
to find the remote server's address without having to make an
additional request.

Patches for IPv6 support are welcome.

POE::Component::Client::HTTP uses two standard environment variables:
HTTP_PROXY and NO_PROXY.

HTTP_PROXY sets the proxy server that Client::HTTP will forward
requests through. NO_PROXY sets a list of hosts that will not be
forwarded through a proxy.

See the Proxy and NoProxy constructor parameters for more information
about these variables.

This component is built upon HTTP::Request, HTTP::Response, and POE.
Please see its source code and the documentation for its foundation
modules to learn more. If you want to use cookies, you'll need to read
about HTTP::Cookies as well.

Also see the test program, t/01_request.t, in the PoCo::Client::HTTP
distribution.

There is no support for CGI_PROXY or CgiProxy.

Secure HTTP (https) proxying is not supported at this time.

There is no object oriented interface. See
POE::Component::Client::Keepalive and POE::Component::Client::DNS for
examples of a decent OO interface.

POE::Component::Client::HTTP is

·   Copyright 1999-2009 Rocco Caputo

·   Copyright 2004 Rob Bloodgood

·   Copyright 2004-2005 Martijn van Beers

All rights are reserved. POE::Component::Client::HTTP is free
software; you may redistribute it and/or modify it under the same terms
as Perl itself.

Joel Bernstein solved some nasty race conditions. Portugal Telecom
<http://www.sapo.pt/> was kind enough to support his contributions.

Jeff Bisbee added POD tests and documentation to pass several of them
to version 0.79. He's a kwalitee-increasing machine!

https://rt.cpan.org/Dist/Display.html?Queue=POE-Component-Client-HTTP

http://github.com/rcaputo/poe-component-client-http
http://gitorious.org/poe-component-client-http

http://search.cpan.org/dist/POE-Component-Client-HTTP/

perl v5.12.0                      2010-02-15                      POE::Component::Client::HTTP(3)