POE::Component::Client::HTTP(3)  User Contributed Perl Documentation  POE::Component::Client::HTTP(3)

NAME

       POE::Component::Client::HTTP - an HTTP user-agent component

VERSION

       version 0.949

SYNOPSIS

         use POE qw(Component::Client::HTTP);

         POE::Component::Client::HTTP->spawn(
           Agent     => 'SpiffCrawler/0.90',   # defaults to something long
           Alias     => 'ua',                  # defaults to 'weeble'
           From      => 'spiffster@perl.org',  # defaults to undef (no header)
           Protocol  => 'HTTP/0.9',            # defaults to 'HTTP/1.1'
           Timeout   => 60,                    # defaults to 180 seconds
           MaxSize   => 16384,                 # defaults to entire response
           Streaming => 4096,                  # defaults to 0 (off)
           FollowRedirects => 2,               # defaults to 0 (off)
           Proxy     => "http://localhost:80", # defaults to HTTP_PROXY env. variable
           NoProxy   => [ "localhost", "127.0.0.1" ], # defs to NO_PROXY env. variable
           BindAddr  => "12.34.56.78",         # defaults to INADDR_ANY
         );

         $kernel->post(
           'ua',        # posts to the 'ua' alias
           'request',   # posts to ua's 'request' state
           'response',  # which of our states will receive the response
           $request,    # an HTTP::Request object
         );

         # This is the sub which is called when the session receives a
         # 'response' event.
         sub response_handler {
           my ($request_packet, $response_packet) = @_[ARG0, ARG1];

           # HTTP::Request
           my $request_object  = $request_packet->[0];

           # HTTP::Response
           my $response_object = $response_packet->[0];

           my $stream_chunk;
           if (! defined($response_object->content)) {
             $stream_chunk = $response_packet->[1];
           }

           print(
             "*" x 78, "\n",
             "*** my request:\n",
             "-" x 78, "\n",
             $request_object->as_string(),
             "*" x 78, "\n",
             "*** their response:\n",
             "-" x 78, "\n",
             $response_object->as_string(),
           );

           if (defined $stream_chunk) {
             print "-" x 40, "\n", $stream_chunk, "\n";
           }

           print "*" x 78, "\n";
         }

DESCRIPTION

       POE::Component::Client::HTTP is an HTTP user-agent for POE.  It lets
       other sessions run while HTTP transactions are being processed, and it
       lets several HTTP transactions be processed in parallel.

       It supports keep-alive through POE::Component::Client::Keepalive, which
       in turn uses POE::Component::Resolver for asynchronous IPv4 and IPv6
       name resolution.

       HTTP client components are not proper objects.  Instead of being
       created, as most objects are, they are "spawned" as separate sessions.
       To avoid confusion (and hopefully not cause other confusion), they must
       be spawned with a "spawn" method, not created anew with a "new" one.

CONSTRUCTOR

   spawn
       PoCo::Client::HTTP's "spawn" method takes a few named parameters:

       Agent => $user_agent_string
       Agent => \@list_of_agents
         If a User-Agent header is not present in the HTTP::Request, a random
         one will be used from those specified by the "Agent" parameter.  If
         none are supplied, POE::Component::Client::HTTP will advertise itself
         to the server.

         "Agent" may contain a reference to a list of user agents.  If this is
         the case, PoCo::Client::HTTP will choose one of them at random for
         each request.

       Alias => $session_alias
         "Alias" sets the name by which the session will be known.  If no
         alias is given, the component defaults to "weeble".  The alias lets
         several sessions interact with HTTP components without keeping (or
         even knowing) hard references to them.  It's possible to spawn
         several HTTP components with different names.

       ConnectionManager => $poco_client_keepalive
         "ConnectionManager" sets this component's connection pool manager.
         It expects the connection manager to be a reference to a
         POE::Component::Client::Keepalive object.  The HTTP client component
         will call "allocate()" on the connection manager itself so you should
         not have done this already.

           my $pool = POE::Component::Client::Keepalive->new(
             keep_alive    => 10, # seconds to keep connections alive
             max_open      => 100, # max concurrent connections - total
             max_per_host  => 20, # max concurrent connections - per host
             timeout       => 30, # max time (seconds) to establish a new connection
           );

           POE::Component::Client::HTTP->spawn(
             # ...
             ConnectionManager => $pool,
             # ...
           );

         See POE::Component::Client::Keepalive for more information, including
         how to alter the connection manager's resolver configuration (for
         example, to force IPv6 or prefer it before IPv4).

       CookieJar => $cookie_jar
         "CookieJar" sets the component's cookie jar.  It expects the cookie
         jar to be a reference to an HTTP::Cookies object.

       From => $admin_address
         "From" holds an e-mail address where the client's administrator
         and/or maintainer may be reached.  It defaults to undef, which means
         no From header will be included in requests.

       MaxSize => OCTETS
         "MaxSize" specifies the largest response to accept from a server.
         The content of larger responses will be truncated to OCTETS octets.
         This has been used to return the <head></head> section of web pages
         without the need to wade through <body></body>.

       NoProxy => [ $host_1, $host_2, ..., $host_N ]
       NoProxy => "host1,host2,hostN"
         "NoProxy" specifies a list of server hosts that will not be proxied.
         It is useful for local hosts and hosts that do not properly support
         proxying.  If NoProxy is not specified, a list will be taken from the
         NO_PROXY environment variable.

           NoProxy => [ "localhost", "127.0.0.1" ],
           NoProxy => "localhost,127.0.0.1",

       BindAddr => $local_ip
         Specify "BindAddr" to bind all client sockets to a particular local
         address.  The value of BindAddr will be passed through
         POE::Component::Client::Keepalive to POE::Wheel::SocketFactory (as
         "bind_address").  See that module's documentation for implementation
         details.

           BindAddr => "12.34.56.78"

       Protocol => $http_protocol_string
         "Protocol" advertises the protocol that the client wishes to see.
         Under normal circumstances, it should be left to its default value:
         "HTTP/1.1".

       Proxy => [ $proxy_host, $proxy_port ]
       Proxy => $proxy_url
       Proxy => $proxy_url,$proxy_url,...
         "Proxy" specifies one or more proxy hosts that requests will be
         passed through.  If not specified, proxy servers will be taken from
         the HTTP_PROXY (or http_proxy) environment variable.  No proxying
         will occur unless Proxy is set or one of the environment variables
         exists.

         The proxy can be specified either as a host and port, or as one or
         more URLs.  Proxy URLs must specify the proxy port, even if it is 80.

           Proxy => [ "127.0.0.1", 80 ],
           Proxy => "http://127.0.0.1:80/",

         "Proxy" may specify multiple proxies separated by commas.
         PoCo::Client::HTTP will choose proxies from this list at random.
         This is useful for load balancing requests through multiple gateways.

           Proxy => "http://127.0.0.1:80/,http://127.0.0.1:81/",

       Streaming => OCTETS
         "Streaming" allows Client::HTTP to return large content in chunks
         (of OCTETS octets each) rather than combine the entire content into
         a single HTTP::Response object.

         By default, Client::HTTP reads the entire content for a response into
         memory before returning an HTTP::Response object.  This is obviously
         bad for applications like streaming MP3 clients, because they often
         fetch songs that never end.  Yes, they go on and on, my friend.

         When "Streaming" is set to nonzero, however, the response handler
         receives chunks of up to OCTETS octets apiece.  The response packet
         in ARG1 holds slightly different contents in this case.  Its first
         element is still an HTTP::Response object, but it does not contain
         response content, and its second element contains a chunk of raw
         response content, or undef if the stream has ended.

           sub streaming_response_handler {
             my $response_packet = $_[ARG1];
             my ($response, $data) = @$response_packet;
             print SAVED_STREAM $data if defined $data;
           }

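The chunk-or-undef convention described above can be wrapped in a small helper. This is only a sketch: stream_to_handle() is a hypothetical name, not part of the component, and it assumes the response packet has the two-element shape shown in the handler above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of POE::Component::Client::HTTP).
# Writes one streamed chunk to $fh and reports whether the stream ended.
sub stream_to_handle {
    my ($fh, $response_packet) = @_;
    my ($response, $chunk) = @$response_packet;

    # An undefined chunk marks the end of the stream.
    return 1 unless defined $chunk;

    print {$fh} $chunk or die "write failed: $!";
    return 0;    # more chunks may follow
}
```

A streaming response handler could call stream_to_handle($fh, $_[ARG1]) and close the handle once it returns true.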
       FollowRedirects => $number_of_hops_to_follow
         "FollowRedirects" specifies how many redirects (e.g. 302 Moved) to
         follow.  If not specified, it defaults to 0, and thus no redirection
         is followed.  This maintains compatibility with the previous
         behavior, which was not to follow redirects at all.

         If redirects are followed, a response chain should be built, and can
         be accessed through $response_object->previous(). See HTTP::Response
         for details here.

       Timeout => $query_timeout
         "Timeout" sets how long POE::Component::Client::HTTP has to process
         an application's request, in seconds.  "Timeout" defaults to 180
         (three minutes) if not specified.

         It's important to note that the timeout begins when the component
         receives an application's request, not when it attempts to connect to
         the web server.

         Timeouts may result from sending the component too many requests at
         once.  Each request would need to be received and tracked in order.
         Consider this:

           $_[KERNEL]->post(component => request => ...) for (1..15_000);

         15,000 requests are queued together in one enormous bolus.  The
         component would receive and initialize them in order.  The first
         socket activity wouldn't arrive until the 15,000th request was set
         up.  If that took longer than "Timeout", then the requests that have
         waited too long would fail.

         "ConnectionManager"'s own timeout and concurrency limits also affect
         how many requests may be processed at once.  For example, most of the
         15,000 requests would wait in the connection manager's pool until
         sockets become available.  Meanwhile, the "Timeout" would be counting
         down.

         Applications may elect to control concurrency outside the component's
         "Timeout".  They may do so in a few ways.

         The easiest way is to limit the initial number of requests to
         something more manageable.  As responses arrive, the application
         should handle them and start new requests.  This limits concurrency
         to the initial request count.

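The "start a few, then refill as responses arrive" approach might be sketched as follows. make_throttle() and its begin/done callbacks are hypothetical names invented for this illustration. The sketch is plain Perl so the bookkeeping is visible; in a POE program the $start callback would post one job to the component's "request" state, and the response handler would call done.

```perl
use strict;
use warnings;

# Hypothetical throttle (not part of the component): keeps at most
# $limit jobs in flight.  $start is called once per job to begin it.
sub make_throttle {
    my ($jobs, $limit, $start) = @_;
    my @queue     = @$jobs;
    my $in_flight = 0;

    # Start queued jobs until the limit is reached or the queue is empty.
    my $fill = sub {
        while ($in_flight < $limit && @queue) {
            $in_flight++;
            $start->(shift @queue);
        }
        return $in_flight;
    };

    return {
        begin => $fill,                              # call once at startup
        done  => sub { $in_flight--; $fill->() },    # call per response
    };
}
```

Both callbacks return the number of jobs currently in flight, so an application can tell it is finished when done returns 0.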
         An application may also outsource job throttling to another module,
         such as POE::Component::JobQueue.

         In any case, "Timeout" and "ConnectionManager" may be tuned to
         maximize timeouts and concurrency limits.  This may help in some
         cases.  Developers should be aware that doing so will increase memory
         usage.  POE::Component::Client::HTTP and KeepAlive track requests in
         memory, while applications are free to keep pending requests on disk.

ACCEPTED EVENTS

       Sessions communicate asynchronously with PoCo::Client::HTTP.  They post
       requests to it, and it posts responses back.

   request
       Requests are posted to the component's "request" state.  They include
       an HTTP::Request object which defines the request.  For example:

         $kernel->post(
           'ua', 'request',            # http session alias & state
           'response',                 # my state to receive responses
           GET('http://poe.perl.org'), # a simple HTTP request
           'unique id',                # a tag to identify the request
           'progress',                 # an event to indicate progress
           'http://1.2.3.4:80/'        # proxy to use for this request
         );

       Requests include the state to which responses will be posted.  In the
       previous example, the handler for a 'response' state will be called
       with each HTTP response.  The "progress" handler is optional and if
       installed, the component will provide progress metrics (see sample
       handler below).  The "proxy" parameter is optional and if not defined,
       a default proxy will be used if configured.  No proxy will be used if
       neither a default one nor a "proxy" parameter is defined.

   pending_requests_count
       There's also a pending_requests_count state that returns the number of
       requests currently being processed.  To receive the return value, it
       must be invoked with $kernel->call().

         my $count = $kernel->call('ua' => 'pending_requests_count');

       NOTE: Sometimes the count might not be what you expected, because
       responses are currently in POE's queue and you haven't processed them.
       This could happen if you configure the "ConnectionManager"'s
       concurrency to a high enough value.

   cancel
       Cancel a specific HTTP request.  Requires a reference to the original
       request (blessed or stringified) so it knows which one to cancel.  See
       "progress handler" below for notes on canceling streaming requests.

       To cancel a request based on its blessed HTTP::Request object:

         $kernel->post( component => cancel => $http_request );

       To cancel a request based on its stringified HTTP::Request object:

         $kernel->post( component => cancel => "$http_request" );

   shutdown
       Responds to all pending requests with 408 (request timeout), and then
       shuts down the component and all subcomponents.

SENT EVENTS

   response handler
       In addition to all the usual POE parameters, HTTP responses come with
       two list references:

         my ($request_packet, $response_packet) = @_[ARG0, ARG1];

       $request_packet contains a reference to the original HTTP::Request
       object.  This is useful for matching responses back to the requests
       that generated them.

         my $http_request_object = $request_packet->[0];
         my $http_request_tag    = $request_packet->[1]; # from the 'request' post

       $response_packet contains a reference to the resulting HTTP::Response
       object.

         my $http_response_object = $response_packet->[0];

       Please see the HTTP::Request and HTTP::Response manpages for more
       information.

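One way to use the request tag for matching responses to requests is a small lookup table keyed by tag. The sketch below is illustrative only: remember_request() and context_for_packet() are hypothetical helpers, and the only thing taken from the component is the packet layout described above (tag in element 1 of the request packet).

```perl
use strict;
use warnings;

# Hypothetical registry mapping request tags to application context.
my %context_for;

# Call before posting a request: remember what this tag refers to.
sub remember_request {
    my ($tag, $context) = @_;
    $context_for{$tag} = $context;
    return $tag;
}

# Call from the response handler with ARG0 (the request packet).
# Returns the stored context once, or nothing for an unknown or
# missing tag.
sub context_for_packet {
    my ($request_packet) = @_;
    my $tag = $request_packet->[1];
    return unless defined $tag && exists $context_for{$tag};
    return delete $context_for{$tag};
}
```

Deleting the entry on lookup keeps the table from growing as requests complete.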
   progress handler
       The example progress handler shows how to calculate a percentage of
       download completion.

         sub progress_handler {
           my $gen_args  = $_[ARG0];    # args passed to all calls
           my $call_args = $_[ARG1];    # args specific to the call

           my $req = $gen_args->[0];    # HTTP::Request object being serviced
           my $tag = $gen_args->[1];    # Request ID tag from the 'request' post.
           my $got = $call_args->[0];   # Number of bytes retrieved so far.
           my $tot = $call_args->[1];   # Total bytes to be retrieved.
           my $oct = $call_args->[2];   # Chunk of raw octets received this time.

           my $percent = $got / $tot * 100;

           printf(
             "-- %.0f%% [%d/%d]: %s\n", $percent, $got, $tot, $req->uri()
           );

           # To cancel the request:
           # $_[KERNEL]->post( component => cancel => $req );
         }

       DEPRECATION WARNING

       The third return argument (the raw octets received) has been
       deprecated.  Instead of it, use the Streaming parameter to get chunks
       of content in the response handler.

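The percentage calculation in the example handler divides by the total byte count. As a defensive sketch, the hypothetical helper below (percent_done() is not part of the component) returns undef instead of dividing when the total is missing or zero. That the total can be missing, for instance when a server sends no Content-Length, is an assumption of this sketch and is not stated by the documentation above.

```perl
use strict;
use warnings;

# Hypothetical helper: percentage of a download completed, or undef
# when the total size is unknown (assumed to show up as undef or 0).
sub percent_done {
    my ($got, $tot) = @_;
    return undef unless defined $tot && $tot > 0;
    return $got / $tot * 100;
}
```

A progress handler could then print the percentage only when percent_done($got, $tot) is defined, and fall back to printing the raw byte count otherwise.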

REQUEST CALLBACKS

       The HTTP::Request object passed to the request event can contain a CODE
       reference as "content".  This allows for sending large files without
       wasting memory.  Your callback should return a chunk of data each time
       it is called, and an empty string when done.  Don't forget to set the
       Content-Length header correctly.  Example:

         my $request = HTTP::Request->new( PUT => 'http://...' );

         my $file = '/path/to/large_file';

         # Fail early if the file cannot be read.
         open my $fh, '<', $file or die "Cannot open $file: $!";
         binmode $fh;

         # Return the next chunk on each call, and '' once the file is done.
         my $upload_cb = sub {
           if ( sysread $fh, my $buf, 4096 ) {
             return $buf;
           }
           else {
             close $fh;
             return '';
           }
         };

         $request->content_length( -s $file );

         $request->content( $upload_cb );

         $kernel->post( ua => 'request', 'response', $request );

CONTENT ENCODING AND COMPRESSION

       Transparent content decoding has been disabled as of version 0.84.
       This also removes support for transparent gzip requesting and
       decompression.

       To re-enable gzip compression, specify the gzip Content-Encoding and
       use HTTP::Response's decoded_content() method rather than content():

         my $request = HTTP::Request->new(
           GET => "http://www.yahoo.com/", [
             'Accept-Encoding' => 'gzip'
           ]
         );

         # ... time passes ...

         my $content = $response->decoded_content();

       The change in POE::Component::Client::HTTP behavior was prompted by
       changes in HTTP::Response that surfaced a bug in the component's
       transparent gzip handling.

       Allowing the application to specify and handle content encodings seems
       to be the most reliable and flexible resolution.

       For more information about the problem and discussions regarding the
       solution, see: <http://www.perlmonks.org/?node_id=683833> and
       <http://rt.cpan.org/Ticket/Display.html?id=35538>

CLIENT HEADERS

       POE::Component::Client::HTTP sets its own response headers with
       additional information.  All of its headers begin with "X-PCCH".

   X-PCCH-Errmsg
       POE::Component::Client::HTTP may fail because of an internal client
       error rather than an HTTP protocol error.  X-PCCH-Errmsg will contain a
       human readable reason for client failures, should they occur.

       The text of X-PCCH-Errmsg may also be repeated in the response's
       content.

   X-PCCH-Peer
       X-PCCH-Peer contains the remote IPv4 address and port, separated by a
       period.  For example, "127.0.0.1.8675" represents port 8675 on
       localhost.

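Because the address and port share the period as a separator, the port is whatever follows the last period. A minimal sketch of splitting the value, assuming the "address.port" format described above (split_peer() is a hypothetical helper, not part of the component):

```perl
use strict;
use warnings;

# Hypothetical helper: split an X-PCCH-Peer value such as
# "127.0.0.1.8675" into (address, port).  Returns an empty list when
# the value is missing or does not end in ".digits".
sub split_peer {
    my ($peer) = @_;
    return unless defined $peer && $peer =~ /\A(.+)\.(\d+)\z/;
    return ($1, $2);
}
```

In a response handler the value would typically come from $response->header('X-PCCH-Peer').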
       Proxying will render X-PCCH-Peer nearly useless, since the socket will
       be connected to a proxy rather than the server itself.

       This feature was added at Doreen Grey's request.  Doreen wanted a means
       to find the remote server's address without having to make an
       additional request.

ENVIRONMENT

       POE::Component::Client::HTTP uses two standard environment variables:
       HTTP_PROXY and NO_PROXY.

       HTTP_PROXY sets the proxy server that Client::HTTP will forward
       requests through.  NO_PROXY sets a list of hosts that will not be
       forwarded through a proxy.

       See the Proxy and NoProxy constructor parameters for more information
       about these variables.

SEE ALSO

       This component is built upon HTTP::Request, HTTP::Response, and POE.
       Please see its source code and the documentation for its foundation
       modules to learn more.  If you want to use cookies, you'll need to read
       about HTTP::Cookies as well.

       Also see the test program, t/01_request.t, in the PoCo::Client::HTTP
       distribution.

BUGS

       There is no support for CGI_PROXY or CgiProxy.

       Secure HTTP (https) proxying is not supported at this time.

       There is no object oriented interface.  See
       POE::Component::Client::Keepalive and POE::Component::Resolver for
       examples of a decent OO interface.

AUTHOR, COPYRIGHT, & LICENSE

       POE::Component::Client::HTTP is

       · Copyright 1999-2009 Rocco Caputo

       · Copyright 2004 Rob Bloodgood

       · Copyright 2004-2005 Martijn van Beers

       All rights are reserved.  POE::Component::Client::HTTP is free
       software; you may redistribute it and/or modify it under the same terms
       as Perl itself.

CONTRIBUTORS

       Joel Bernstein solved some nasty race conditions.  Portugal Telecom
       <http://www.sapo.pt/> was kind enough to support his contributions.

       Jeff Bisbee added POD tests and documentation to pass several of them
       to version 0.79.  He's a kwalitee-increasing machine!

BUG TRACKER

       https://rt.cpan.org/Dist/Display.html?Queue=POE-Component-Client-HTTP

REPOSITORY

       Github: <http://github.com/rcaputo/poe-component-client-http> .

       Gitorious: <http://gitorious.org/poe-component-client-http> .

OTHER RESOURCES

       <http://search.cpan.org/dist/POE-Component-Client-HTTP/>

perl v5.32.0                      2020-07-28   POE::Component::Client::HTTP(3)