POE::Component::Client::HTTP(3)  User Contributed Perl Documentation  POE::Component::Client::HTTP(3)

NAME
       POE::Component::Client::HTTP - an HTTP user-agent component

VERSION
       version 0.949

SYNOPSIS
           use POE qw(Component::Client::HTTP);

           POE::Component::Client::HTTP->spawn(
             Agent     => 'SpiffCrawler/0.90',   # defaults to something long
             Alias     => 'ua',                  # defaults to 'weeble'
             From      => 'spiffster@perl.org',  # defaults to undef (no header)
             Protocol  => 'HTTP/0.9',            # defaults to 'HTTP/1.1'
             Timeout   => 60,                    # defaults to 180 seconds
             MaxSize   => 16384,                 # defaults to entire response
             Streaming => 4096,                  # defaults to 0 (off)
             FollowRedirects => 2,               # defaults to 0 (off)
             Proxy     => "http://localhost:80", # defaults to HTTP_PROXY env. variable
             NoProxy   => [ "localhost", "127.0.0.1" ], # defs to NO_PROXY env. variable
             BindAddr  => "12.34.56.78",         # defaults to INADDR_ANY
           );

           $kernel->post(
             'ua',        # posts to the 'ua' alias
             'request',   # posts to ua's 'request' state
             'response',  # which of our states will receive the response
             $request,    # an HTTP::Request object
           );

           # This is the sub which is called when the session receives a
           # 'response' event.
           sub response_handler {
             my ($request_packet, $response_packet) = @_[ARG0, ARG1];

             # HTTP::Request
             my $request_object  = $request_packet->[0];

             # HTTP::Response
             my $response_object = $response_packet->[0];

             my $stream_chunk;
             if (! defined($response_object->content)) {
               $stream_chunk = $response_packet->[1];
             }

             print(
               "*" x 78, "\n",
               "*** my request:\n",
               "-" x 78, "\n",
               $request_object->as_string(),
               "*" x 78, "\n",
               "*** their response:\n",
               "-" x 78, "\n",
               $response_object->as_string(),
             );

             if (defined $stream_chunk) {
               print "-" x 40, "\n", $stream_chunk, "\n";
             }

             print "*" x 78, "\n";
           }

DESCRIPTION
       POE::Component::Client::HTTP is an HTTP user-agent for POE. It lets
       other sessions run while HTTP transactions are being processed, and it
       lets several HTTP transactions be processed in parallel.

       It supports keep-alive through POE::Component::Client::Keepalive, which
       in turn uses POE::Component::Resolver for asynchronous IPv4 and IPv6
       name resolution.

       HTTP client components are not proper objects. Instead of being
       created, as most objects are, they are "spawned" as separate sessions.
       To avoid confusion (and hopefully not cause other confusion), they must
       be spawned with a "spawn" method, not created anew with a "new" one.

CONSTRUCTOR
   spawn
       PoCo::Client::HTTP's "spawn" method takes a few named parameters:

       Agent => $user_agent_string
       Agent => \@list_of_agents
           If a User-Agent header is not present in the HTTP::Request, a random
           one will be used from those specified by the "Agent" parameter. If
           none are supplied, POE::Component::Client::HTTP will advertise itself
           to the server.

           "Agent" may contain a reference to a list of user agents. If this is
           the case, PoCo::Client::HTTP will choose one of them at random for
           each request.

       Alias => $session_alias
           "Alias" sets the name by which the session will be known. If no
           alias is given, the component defaults to "weeble". The alias lets
           several sessions interact with HTTP components without keeping (or
           even knowing) hard references to them. It's possible to spawn
           several HTTP components with different names.

       ConnectionManager => $poco_client_keepalive
           "ConnectionManager" sets this component's connection pool manager.
           It expects the connection manager to be a reference to a
           POE::Component::Client::Keepalive object. The HTTP client component
           will call "allocate()" on the connection manager itself, so you
           should not have done this already.

               my $pool = POE::Component::Client::Keepalive->new(
                 keep_alive   => 10,  # seconds to keep connections alive
                 max_open     => 100, # max concurrent connections - total
                 max_per_host => 20,  # max concurrent connections - per host
                 timeout      => 30,  # max time (seconds) to establish a new connection
               );

               POE::Component::Client::HTTP->spawn(
                 # ...
                 ConnectionManager => $pool,
                 # ...
               );

           See POE::Component::Client::Keepalive for more information, including
           how to alter the connection manager's resolver configuration (for
           example, to force IPv6 or prefer it before IPv4).

       CookieJar => $cookie_jar
           "CookieJar" sets the component's cookie jar. It expects the cookie
           jar to be a reference to an HTTP::Cookies object.

       From => $admin_address
           "From" holds an e-mail address where the client's administrator
           and/or maintainer may be reached. It defaults to undef, which means
           no From header will be included in requests.

       MaxSize => OCTETS
           "MaxSize" specifies the largest response to accept from a server.
           The content of larger responses will be truncated to OCTETS octets.
           This has been used to return the <head></head> section of web pages
           without the need to wade through <body></body>.

       NoProxy => [ $host_1, $host_2, ..., $host_N ]
       NoProxy => "host1,host2,hostN"
           "NoProxy" specifies a list of server hosts that will not be proxied.
           It is useful for local hosts and hosts that do not properly support
           proxying. If NoProxy is not specified, a list will be taken from the
           NO_PROXY environment variable.

               NoProxy => [ "localhost", "127.0.0.1" ],
               NoProxy => "localhost,127.0.0.1",

       BindAddr => $local_ip
           Specify "BindAddr" to bind all client sockets to a particular local
           address. The value of BindAddr will be passed through
           POE::Component::Client::Keepalive to POE::Wheel::SocketFactory (as
           "bind_address"). See that module's documentation for implementation
           details.

               BindAddr => "12.34.56.78"

       Protocol => $http_protocol_string
           "Protocol" advertises the protocol that the client wishes to see.
           Under normal circumstances, it should be left to its default value:
           "HTTP/1.1".

       Proxy => [ $proxy_host, $proxy_port ]
       Proxy => $proxy_url
       Proxy => $proxy_url,$proxy_url,...
           "Proxy" specifies one or more proxy hosts that requests will be
           passed through. If not specified, proxy servers will be taken from
           the HTTP_PROXY (or http_proxy) environment variable. No proxying
           will occur unless Proxy is set or one of the environment variables
           exists.

           The proxy can be specified either as a host and port, or as one or
           more URLs. Proxy URLs must specify the proxy port, even if it is 80.

               Proxy => [ "127.0.0.1", 80 ],
               Proxy => "http://127.0.0.1:80/",

           "Proxy" may specify multiple proxies separated by commas.
           PoCo::Client::HTTP will choose proxies from this list at random.
           This is useful for load balancing requests through multiple gateways.

               Proxy => "http://127.0.0.1:80/,http://127.0.0.1:81/",
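
Because proxy URLs must carry an explicit port, an application may want to check a Proxy value before passing it to spawn(). The following is a minimal sketch of such a check. check_proxy_list() is a hypothetical helper, not part of POE::Component::Client::HTTP, and its pattern assumes plain http:// URLs whose host is a name or an IPv4 address.

```perl
use strict;
use warnings;

# Hypothetical helper: split a comma-separated Proxy value and make sure
# every URL names its port explicitly.  Dies on the first URL that does not.
sub check_proxy_list {
  my ($proxy_spec) = @_;

  # Trim whitespace around each entry and drop empty entries.
  my @urls = grep { length } map { s/^\s+|\s+$//gr } split /,/, $proxy_spec;

  for my $url (@urls) {
    die "proxy URL '$url' has no explicit port\n"
      unless $url =~ m{^http://[^/:\s]+:\d+(?:/|$)};
  }

  return @urls;
}

my @proxies = check_proxy_list("http://127.0.0.1:80/, http://127.0.0.1:81/");
print "accepted: $_\n" for @proxies;
```

The checked list can be joined with commas again and passed as the Proxy parameter.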

       Streaming => OCTETS
           "Streaming" allows Client::HTTP to return large content in chunks
           (of OCTETS octets each) rather than combine the entire content into
           a single HTTP::Response object.

           By default, Client::HTTP reads the entire content for a response into
           memory before returning an HTTP::Response object. This is obviously
           bad for applications like streaming MP3 clients, because they often
           fetch songs that never end. Yes, they go on and on, my friend.

           When "Streaming" is set to nonzero, however, the response handler
           receives chunks of up to OCTETS octets apiece. The response handler
           accepts slightly different parameters in this case. The response
           packet in ARG1 still begins with an HTTP::Response object, but that
           object does not contain response content. The packet's second
           element contains a chunk of raw response content, or undef if the
           stream has ended.

               sub streaming_response_handler {
                 my $response_packet = $_[ARG1];
                 my ($response, $data) = @$response_packet;
                 print SAVED_STREAM $data if defined $data;
               }

       FollowRedirects => $number_of_hops_to_follow
           "FollowRedirects" specifies how many redirects (e.g. 302 Moved) to
           follow. If not specified, it defaults to 0, and thus no redirection
           is followed. This maintains compatibility with the previous
           behavior, which was not to follow redirects at all.

           If redirects are followed, a response chain should be built, and can
           be accessed through $response_object->previous(). See HTTP::Response
           for details here.

       Timeout => $query_timeout
           "Timeout" sets how long POE::Component::Client::HTTP has to process
           an application's request, in seconds. "Timeout" defaults to 180
           (three minutes) if not specified.

           It's important to note that the timeout begins when the component
           receives an application's request, not when it attempts to connect to
           the web server.

           Timeouts may result from sending the component too many requests at
           once. Each request would need to be received and tracked in order.
           Consider this:

               $_[KERNEL]->post(component => request => ...) for (1..15_000);

           15,000 requests are queued together in one enormous bolus. The
           component would receive and initialize them in order. The first
           socket activity wouldn't arrive until the 15,000th request was set
           up. If that took longer than "Timeout", then the requests that have
           waited too long would fail.

           "ConnectionManager"'s own timeout and concurrency limits also affect
           how many requests may be processed at once. For example, most of the
           15,000 requests would wait in the connection manager's pool until
           sockets become available. Meanwhile, the "Timeout" would be counting
           down.

           Applications may elect to control concurrency outside the component's
           "Timeout". They may do so in a few ways.

           The easiest way is to limit the initial number of requests to
           something more manageable. As responses arrive, the application
           should handle them and start new requests. This limits concurrency
           to the initial request count.
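
The windowing idea can be sketched independently of POE. In this minimal sketch, make_throttle() is a hypothetical helper, not part of the component: it starts up to $limit jobs, then starts one more each time the application reports that a job has finished. In a real session the start callback would post a 'request' event to the component and the response handler would call the second closure. Here the callback only records what was started, so the sketch runs without a network.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of POE::Component::Client::HTTP.
# $jobs is an array reference that is consumed as jobs are started,
# $limit is the maximum number in flight, and $start_cb starts one job.
sub make_throttle {
  my ($jobs, $limit, $start_cb) = @_;
  my $in_flight = 0;

  # Start jobs until the window is full or the queue is empty.
  my $fill = sub {
    while ($in_flight < $limit && @$jobs) {
      $in_flight++;
      $start_cb->(shift @$jobs);
    }
    return $in_flight;
  };

  # Record one completed job, then top the window back up.
  my $finished = sub {
    $in_flight-- if $in_flight > 0;
    return $fill->();
  };

  return ($fill, $finished);
}

# Simulated run: seven hypothetical URLs, at most three in flight.
my @queue   = map { "http://localhost/page/$_" } 1 .. 7;
my @started;
my $peak    = 0;

my ($fill, $finished) = make_throttle(
  \@queue, 3,
  sub { push @started, $_[0] },   # a real session would post 'request' here
);

my $in_flight = $fill->();
while ($in_flight) {
  $peak = $in_flight if $in_flight > $peak;
  $in_flight = $finished->();     # a real session would call this per response
}

print "started ", scalar(@started), " requests, peak in flight $peak\n";
```

With "Streaming" enabled the response handler fires once per chunk, so the completion call would belong in the branch where the chunk is undef.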

           An application may also outsource job throttling to another module,
           such as POE::Component::JobQueue.

           In any case, "Timeout" and "ConnectionManager" may be tuned to
           maximize timeouts and concurrency limits. This may help in some
           cases. Developers should be aware that doing so will increase memory
           usage. POE::Component::Client::HTTP and Keepalive track requests in
           memory, while applications are free to keep pending requests on disk.

ACCEPTED EVENTS
       Sessions communicate asynchronously with PoCo::Client::HTTP. They post
       requests to it, and it posts responses back.

   request
       Requests are posted to the component's "request" state. They include
       an HTTP::Request object which defines the request. For example:

           $kernel->post(
             'ua', 'request',            # http session alias & state
             'response',                 # my state to receive responses
             GET('http://poe.perl.org'), # a simple HTTP request (GET is from HTTP::Request::Common)
             'unique id',                # a tag to identify the request
             'progress',                 # an event to indicate progress
             'http://1.2.3.4:80/'        # proxy to use for this request
           );

       Requests include the state to which responses will be posted. In the
       previous example, the handler for a 'response' state will be called
       with each HTTP response. The "progress" handler is optional and if
       installed, the component will provide progress metrics (see sample
       handler below). The "proxy" parameter is optional and if not defined,
       a default proxy will be used if configured. No proxy will be used if
       neither a default one nor a "proxy" parameter is defined.

   pending_requests_count
       There's also a pending_requests_count state that returns the number of
       requests currently being processed. To receive the return value, it
       must be invoked with $kernel->call().

           my $count = $kernel->call('ua' => 'pending_requests_count');

       NOTE: Sometimes the count might not be what you expected, because
       responses are currently in POE's queue and you haven't processed them.
       This could happen if you configure the "ConnectionManager"'s
       concurrency to a high enough value.

   cancel
       Cancel a specific HTTP request. Requires a reference to the original
       request (blessed or stringified) so it knows which one to cancel. See
       "progress handler" below for notes on canceling streaming requests.

       To cancel a request based on its blessed HTTP::Request object:

           $kernel->post( component => cancel => $http_request );

       To cancel a request based on its stringified HTTP::Request object:

           $kernel->post( component => cancel => "$http_request" );

   shutdown
       Responds to all pending requests with 408 (request timeout), and then
       shuts down the component and all subcomponents.

SENT EVENTS
   response handler
       In addition to all the usual POE parameters, HTTP responses come with
       two list references:

           my ($request_packet, $response_packet) = @_[ARG0, ARG1];

       $request_packet contains a reference to the original HTTP::Request
       object. This is useful for matching responses back to the requests
       that generated them.

           my $http_request_object = $request_packet->[0];
           my $http_request_tag    = $request_packet->[1]; # from the 'request' post

       $response_packet contains a reference to the resulting HTTP::Response
       object.

           my $http_response_object = $response_packet->[0];

       Please see the HTTP::Request and HTTP::Response manpages for more
       information.

   progress handler
       The example progress handler shows how to calculate a percentage of
       download completion.

           sub progress_handler {
             my $gen_args  = $_[ARG0];    # args passed to all calls
             my $call_args = $_[ARG1];    # args specific to the call

             my $req = $gen_args->[0];    # HTTP::Request object being serviced
             my $tag = $gen_args->[1];    # Request ID tag from the 'request' post.
             my $got = $call_args->[0];   # Number of bytes retrieved so far.
             my $tot = $call_args->[1];   # Total bytes to be retrieved.
             my $oct = $call_args->[2];   # Chunk of raw octets received this time.

             my $percent = $got / $tot * 100;

             printf(
               "-- %.0f%% [%d/%d]: %s\n", $percent, $got, $tot, $req->uri()
             );

             # To cancel the request:
             # $_[KERNEL]->post( component => cancel => $req );
           }

       DEPRECATION WARNING

       The third return argument (the raw octets received) has been
       deprecated. Instead of it, use the Streaming parameter to get chunks
       of content in the response handler.
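
The percentage calculation in the example handler divides by the total, so it may be worth guarding against a missing or zero total. This sketch assumes that can happen when a server sends no Content-Length header. percent_done() is a hypothetical helper, not part of the component.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a percentage between 0 and 100, or undef
# when the total size is unknown or zero.
sub percent_done {
  my ($got, $tot) = @_;
  return undef unless defined $tot && $tot > 0;
  my $percent = ($got // 0) / $tot * 100;
  return $percent > 100 ? 100 : $percent;
}

# One sample with a known total, one without.
for my $sample ([ 512, 2048 ], [ 512, undef ]) {
  my $percent = percent_done(@$sample);
  print defined($percent)
    ? sprintf("%.0f%% done\n", $percent)
    : "$sample->[0] bytes so far (total unknown)\n";
}
```

Inside a progress handler, $got and $tot would come from the call arguments as in the example above.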

REQUEST CALLBACK
       The HTTP::Request object passed to the request event can contain a CODE
       reference as "content". This allows for sending large files without
       wasting memory. Your callback should return a chunk of data each time
       it is called, and an empty string when done. Don't forget to set the
       Content-Length header correctly. Example:

           my $request = HTTP::Request->new( PUT => 'http://...' );

           my $file = '/path/to/large_file';

           # Fail early if the file cannot be read, and read it as raw bytes.
           open my $fh, '<', $file or die "Cannot open $file: $!";
           binmode $fh;

           my $upload_cb = sub {
             if ( sysread $fh, my $buf, 4096 ) {
               return $buf;
             }
             else {
               close $fh;
               return '';
             }
           };

           $request->content_length( -s $file );

           $request->content( $upload_cb );

           $kernel->post( ua => 'request', 'response', $request );
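
The callback contract (a chunk per call, then an empty string) can be exercised without a network. The following is a minimal sketch under that assumption. make_upload_callback() is a hypothetical helper that wraps the same sysread loop in a closure, and the temporary file stands in for a large upload.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: returns a closure that yields the file's content
# in chunks of at most $chunk_size bytes, then '' once the file is done.
sub make_upload_callback {
  my ($path, $chunk_size) = @_;
  open my $fh, '<', $path or die "Cannot open $path: $!";
  binmode $fh;

  return sub {
    return '' unless $fh;                     # already finished
    my $read = sysread $fh, my $buf, $chunk_size;
    die "Read error on $path: $!" unless defined $read;
    return $buf if $read;
    close $fh;
    undef $fh;
    return '';
  };
}

# Stand-in for a large file: 10,000 bytes in a temporary file.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} 'x' x 10_000;
close $tmp_fh or die "Cannot close $tmp_path: $!";

# Drain the callback the way a consumer would.
my $cb = make_upload_callback($tmp_path, 4096);
my @chunk_sizes;
while (length(my $chunk = $cb->())) {
  push @chunk_sizes, length $chunk;
}

print "sent ", scalar(@chunk_sizes), " chunks\n";
```

The closure keeps returning an empty string after the end of the file, so an extra call from the consumer is harmless.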

CONTENT ENCODING AND COMPRESSION
       Transparent content decoding has been disabled as of version 0.84.
       This also removes support for transparent gzip requesting and
       decompression.

       To re-enable gzip compression, request it with an Accept-Encoding
       header and use HTTP::Response's decoded_content() method rather than
       content():

           my $request = HTTP::Request->new(
             GET => "http://www.yahoo.com/", [
               'Accept-Encoding' => 'gzip'
             ]
           );

           # ... time passes ...

           my $content = $response->decoded_content();

       The change in POE::Component::Client::HTTP behavior was prompted by
       changes in HTTP::Response that surfaced a bug in the component's
       transparent gzip handling.

       Allowing the application to specify and handle content encodings seems
       to be the most reliable and flexible resolution.

       For more information about the problem and discussions regarding the
       solution, see: <http://www.perlmonks.org/?node_id=683833> and
       <http://rt.cpan.org/Ticket/Display.html?id=35538>

CLIENT HEADERS
       POE::Component::Client::HTTP sets its own response headers with
       additional information. All of its headers begin with "X-PCCH".

   X-PCCH-Errmsg
       POE::Component::Client::HTTP may fail because of an internal client
       error rather than an HTTP protocol error. X-PCCH-Errmsg will contain a
       human readable reason for client failures, should they occur.

       The text of X-PCCH-Errmsg may also be repeated in the response's
       content.

   X-PCCH-Peer
       X-PCCH-Peer contains the remote IPv4 address and port, separated by a
       period. For example, "127.0.0.1.8675" represents port 8675 on
       localhost.
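
Because the address itself contains periods, the port is whatever follows the last one. A minimal sketch of splitting the header value, where parse_pcch_peer() is a hypothetical helper name rather than anything the component provides:

```perl
use strict;
use warnings;

# Hypothetical helper: split an X-PCCH-Peer value into address and port.
# Returns an empty list if the value is missing or malformed.
sub parse_pcch_peer {
  my ($peer) = @_;
  # The greedy (.+) leaves only the digits after the final period as the port.
  return unless defined $peer && $peer =~ /^(.+)\.(\d+)$/;
  return ($1, $2);
}

my ($address, $port) = parse_pcch_peer("127.0.0.1.8675");
print "address=$address port=$port\n";
```

In a response handler, the value to parse would come from the response object's header() method, called in scalar context with 'X-PCCH-Peer'.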

       Proxying will render X-PCCH-Peer nearly useless, since the socket will
       be connected to a proxy rather than the server itself.

       This feature was added at Doreen Grey's request. Doreen wanted a means
       to find the remote server's address without having to make an
       additional request.

ENVIRONMENT
       POE::Component::Client::HTTP uses two standard environment variables:
       HTTP_PROXY and NO_PROXY.

       HTTP_PROXY sets the proxy server that Client::HTTP will forward
       requests through. NO_PROXY sets a list of hosts that will not be
       forwarded through a proxy.

       See the Proxy and NoProxy constructor parameters for more information
       about these variables.

SEE ALSO
       This component is built upon HTTP::Request, HTTP::Response, and POE.
       Please see its source code and the documentation for its foundation
       modules to learn more. If you want to use cookies, you'll need to read
       about HTTP::Cookies as well.

       Also see the test program, t/01_request.t, in the PoCo::Client::HTTP
       distribution.

BUGS
       There is no support for CGI_PROXY or CgiProxy.

       Secure HTTP (https) proxying is not supported at this time.

       There is no object oriented interface. See
       POE::Component::Client::Keepalive and POE::Component::Resolver for
       examples of a decent OO interface.

AUTHOR & COPYRIGHTS
       POE::Component::Client::HTTP is

       ·   Copyright 1999-2009 Rocco Caputo

       ·   Copyright 2004 Rob Bloodgood

       ·   Copyright 2004-2005 Martijn van Beers

       All rights are reserved. POE::Component::Client::HTTP is free
       software; you may redistribute it and/or modify it under the same terms
       as Perl itself.

CONTRIBUTORS
       Joel Bernstein solved some nasty race conditions. Portugal Telecom
       <http://www.sapo.pt/> was kind enough to support his contributions.

       Jeff Bisbee added POD tests and documentation to pass several of them
       to version 0.79. He's a kwalitee-increasing machine!

BUG TRACKER
       https://rt.cpan.org/Dist/Display.html?Queue=POE-Component-Client-HTTP

REPOSITORY
       GitHub: <http://github.com/rcaputo/poe-component-client-http>.

       Gitorious: <http://gitorious.org/poe-component-client-http>.

OTHER RESOURCES
       <http://search.cpan.org/dist/POE-Component-Client-HTTP/>

perl v5.30.1                      2020-01-30   POE::Component::Client::HTTP(3)