HTTP(3)               User Contributed Perl Documentation              HTTP(3)

NAME

       AnyEvent::HTTP - simple but non-blocking HTTP/HTTPS client

SYNOPSIS

          use AnyEvent::HTTP;

          http_get "http://www.nethype.de/", sub { print $_[0] };

          # ... do something else here

DESCRIPTION

       This module is an AnyEvent user; you need to make sure that you use and
       run a supported event loop.

       This module implements a simple, stateless and non-blocking HTTP
       client. It supports GET, POST and other request methods, cookies and
       more, all on a very low level. It can follow redirects, supports
       proxies, and automatically limits the number of connections to the
       values specified in the RFC.

       It should generally be a "good client" that is good enough for most
       HTTP tasks. Simple tasks should be simple, but complex tasks should
       still be possible, as the user retains control over request and
       response headers.

       The caller is responsible for authentication management, cookies (if
       the simplistic implementation in this module doesn't suffice), referer
       and other high-level protocol details for which this module offers only
       limited support.

   METHODS
       http_get $url, key => value..., $cb->($data, $headers)
           Executes an HTTP-GET request. See the http_request function for
           details on additional parameters and the return value.

       http_head $url, key => value..., $cb->($data, $headers)
           Executes an HTTP-HEAD request. See the http_request function for
           details on additional parameters and the return value.

       http_post $url, $body, key => value..., $cb->($data, $headers)
           Executes an HTTP-POST request with a request body of $body. See the
           http_request function for details on additional parameters and the
           return value.

       http_request $method => $url, key => value..., $cb->($data, $headers)
           Executes an HTTP request of type $method (e.g. "GET", "POST"). The
           URL must be an absolute http or https URL.

           When called in void context, nothing is returned. In other
           contexts, "http_request" returns a "cancellation guard" - you have
           to keep the object alive at least until the callback gets called.
           If the object gets destroyed before the callback is called, the
           request will be cancelled.

           The callback will be called with the response body data as first
           argument (or "undef" if an error occurred), and a hash-ref with
           response headers (and trailers) as second argument.

           All the headers in that hash are lowercased. In addition to the
           response headers, the "pseudo-headers" (capitalised to avoid
           clashing with possible response headers) "HTTPVersion", "Status"
           and "Reason" contain the three parts of the HTTP Status-Line of
           the same name. If an error occurs during the body phase of a
           request, then the original "Status" and "Reason" values from the
           header are available as "OrigStatus" and "OrigReason".

           The pseudo-header "URL" contains the actual URL (which can differ
           from the requested URL when following redirects - for example, you
           might get an error that your URL scheme is not supported even
           though your URL is a valid http URL because it redirected to an ftp
           URL, in which case you can look at the URL pseudo header).

           The pseudo-header "Redirect" only exists when the request was a
           result of an internal redirect. In that case it is an array
           reference with the "($data, $headers)" from the redirect response.
           Note that this response could in turn be the result of a redirect
           itself, and "$headers->{Redirect}[1]{Redirect}" will then contain
           the original response, and so on.
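
           The nesting of "Redirect" entries can be unrolled with a small
           loop. The following is only a sketch: "redirect_chain" is a
           hypothetical helper, the header hash is hand-built rather than real
           server output, and it assumes that every intermediate header hash
           carries its own "URL" pseudo-header.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the URLs of all responses that led to the
# final one, oldest first, followed by the URL of the final response.
sub redirect_chain {
   my ($hdr) = @_;

   my @urls;
   while ($hdr) {
      unshift @urls, $hdr->{URL};
      # Redirect is [$data, $headers] of the previous response, if any
      $hdr = $hdr->{Redirect} ? $hdr->{Redirect}[1] : undef;
   }

   @urls
}

# hand-built header hash, shaped as described above (assumed, not captured)
my $final = {
   Status   => 200,
   URL      => "http://example.com/c",
   Redirect => ["", {
      Status   => 302,
      URL      => "http://example.com/b",
      Redirect => ["", { Status => 301, URL => "http://example.com/a" }],
   }],
};

print join (" -> ", redirect_chain ($final)), "\n";
```

           Inside a completion callback, the same helper would be called
           with the second callback argument.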

           If the server sends a header multiple times, then its values will
           be joined together with a comma (","), as per the HTTP spec.

           If an internal error occurs, such as not being able to resolve a
           hostname, then $data will be "undef", "$headers->{Status}" will be
           in the range 590-599 and the "Reason" pseudo-header will contain an
           error message. Currently the following status codes are used:

           595 - errors during connection establishment, proxy handshake.
           596 - errors during TLS negotiation, request sending and header
                 processing.
           597 - errors during body receiving or processing.
           598 - user aborted request via "on_header" or "on_body".
           599 - other, usually nonretryable, errors (garbled URL etc.).

           A typical callback might look like this:

              sub {
                 my ($body, $hdr) = @_;

                 if ($hdr->{Status} =~ /^2/) {
                    # ... everything should be ok
                 } else {
                    print "error, $hdr->{Status} $hdr->{Reason}\n";
                 }
              }
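
           When a program wants to react differently to the internal 59x
           codes listed above, a lookup table is usually enough. This is a
           minimal sketch: "classify_status" and its category names are made
           up for illustration and are not part of the module.

```perl
use strict;
use warnings;

# Category names are illustrative; the codes are the ones documented above.
my %INTERNAL = (
   595 => "connect",   # connection establishment, proxy handshake
   596 => "request",   # TLS negotiation, request sending, header processing
   597 => "body",      # body receiving or processing
   598 => "aborted",   # on_header or on_body returned false
   599 => "other",     # usually nonretryable (garbled URL etc.)
);

# Hypothetical helper: maps the Status pseudo-header to a coarse category.
sub classify_status {
   my ($hdr) = @_;
   my $status = $hdr->{Status};

   return "ok"                          if $status =~ /^2/;
   return "internal:$INTERNAL{$status}" if exists $INTERNAL{$status};
   return "internal:unknown"            if $status =~ /^59/;

   "non-2xx"
}

print classify_status ({ Status => 596, Reason => "example only" }), "\n";
```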

           Additional parameters are key-value pairs, and are fully optional.
           They include:

           recurse => $count (default: $MAX_RECURSE)
               Whether to recurse requests or not, e.g. on redirects,
               authentication and other retries and so on, and how often to do
               so.

               Only redirects to http and https URLs are supported. While most
               common redirection forms are handled entirely within this
               module, some require the use of the optional URI module. If it
               is required but missing, then the request will fail with an
               error.

           headers => hashref
               The request headers to use. Currently, "http_request" may
               provide its own "Host:", "Content-Length:", "Connection:" and
               "Cookie:" headers and will provide defaults at least for "TE:",
               "Referer:" and "User-Agent:" (this can be suppressed by using
               "undef" for these headers in which case they won't be sent at
               all).

               You really should provide your own "User-Agent:" header value
               that is appropriate for your program - I wouldn't be surprised
               if the default AnyEvent string gets blocked by webservers
               sooner or later.

               Also, make sure that your header names and values do not
               contain any embedded newlines.

           timeout => $seconds
               The time-out to use for various stages - each connect attempt
               will reset the timeout, as will read or write activity, i.e.
               this is not an overall timeout.

               Default timeout is 5 minutes.

           proxy => [$host, $port[, $scheme]] or undef
               Use the given http proxy for all requests, or no proxy if
               "undef" is used.

               $scheme must be either missing or must be "http" for HTTP.

               If not specified, then the default proxy is used (see
               "AnyEvent::HTTP::set_proxy").

               Currently, if your proxy requires authorization, you have to
               specify an appropriate "Proxy-Authorization" header in every
               request.
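
               As a rough sketch of what such a header could look like,
               assuming the proxy accepts HTTP Basic authentication (other
               schemes need different code): "proxy_basic_auth" is a
               hypothetical helper and the credentials are placeholders.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: builds the value of a Basic Proxy-Authorization header.
sub proxy_basic_auth {
   my ($user, $pass) = @_;

   # the second argument "" suppresses the trailing newline, which must
   # not end up inside a header value
   "Basic " . encode_base64 ("$user:$pass", "")
}

# placeholder credentials, for illustration only
my %headers = (
   "proxy-authorization" => proxy_basic_auth ("Aladdin", "open sesame"),
);

print "$headers{'proxy-authorization'}\n";
```

               The resulting hash would then be passed as "headers =>
               \%headers" with each request that goes through the proxy.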

           body => $string
               The request body, usually empty. Will be sent as-is (future
               versions of this module might offer more options).

           cookie_jar => $hash_ref
               Passing this parameter enables (simplified) cookie-processing,
               loosely based on the original Netscape specification.

               The $hash_ref must be an (initially empty) hash reference which
               will get updated automatically. It is possible to save the
               cookie jar to persistent storage with something like JSON or
               Storable - see the "AnyEvent::HTTP::cookie_jar_expire" function
               if you wish to remove expired or session-only cookies, and also
               for documentation on the format of the cookie jar.

               Note that this cookie implementation is not meant to be
               complete. If you want complete cookie management you have to do
               that on your own. "cookie_jar" is meant as a quick fix to get
               most cookie-using sites working. Cookies are a privacy
               disaster; do not use them unless required to.

               When cookie processing is enabled, the "Cookie:" and
               "Set-Cookie:" headers will be set and handled by this module,
               otherwise they will be left untouched.

           tls_ctx => $scheme | $tls_ctx
               Specifies the AnyEvent::TLS context to be used for https
               connections. This parameter follows the same rules as the
               "tls_ctx" parameter to AnyEvent::Handle, but additionally, the
               two strings "low" and "high" can be specified, which give you a
               predefined low-security (no verification, highest
               compatibility) or high-security (CA and common-name
               verification) TLS context.

               The default for this option is "low", which could be
               interpreted as "give me the page, no matter what".

               See also the "sessionid" parameter.

           session => $string
               The module might reuse connections to the same host internally.
               Sometimes (e.g. when using TLS), you do not want to reuse
               connections from other sessions. This can be achieved by
               setting this parameter to some unique ID (such as the address
               of an object storing your state data, or the TLS context) -
               only connections using the same unique ID will be reused.

           on_prepare => $callback->($fh)
               In rare cases you need to "tune" the socket before it is used
               to connect (for example, to bind it on a given IP address).
               This parameter overrides the prepare callback passed to
               "AnyEvent::Socket::tcp_connect" and behaves exactly the same
               way (e.g. it has to provide a timeout). See the description for
               the $prepare_cb argument of "AnyEvent::Socket::tcp_connect" for
               details.

           tcp_connect => $callback->($host, $service, $connect_cb,
           $prepare_cb)
               In even rarer cases you want total control over how
               AnyEvent::HTTP establishes connections. Normally it uses
               AnyEvent::Socket::tcp_connect to do this, but you can provide
               your own "tcp_connect" function - obviously, it has to follow
               the same calling conventions, except that it may always return
               a connection guard object.

               There are probably lots of weird uses for this function,
               starting from tracing the hosts "http_request" actually tries
               to connect to, to (inexact but fast) host => IP address caching
               or even socks protocol support.

           on_header => $callback->($headers)
               When specified, this callback will be called with the header
               hash as soon as headers have been successfully received from
               the remote server (not on locally-generated errors).

               It has to return either true (in which case AnyEvent::HTTP will
               continue), or false, in which case AnyEvent::HTTP will cancel
               the download (and call the completion callback with an error
               code of 598).

               This callback is useful, among other things, to quickly reject
               unwanted content, which, if it is supposed to be rare, can be
               faster than first doing a "HEAD" request.

               The downside is that cancelling the request makes it impossible
               to re-use the connection. Also, the "on_header" callback will
               not receive any trailer (headers sent after the response body).

               Example: cancel the request unless the content-type is
               "text/html".

                  on_header => sub {
                     $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/
                  },

           on_body => $callback->($partial_body, $headers)
               When specified, all body data will be passed to this callback
               instead of to the completion callback. The completion callback
               will get the empty string instead of the body data.

               It has to return either true (in which case AnyEvent::HTTP will
               continue), or false, in which case AnyEvent::HTTP will cancel
               the download (and call the completion callback with an error
               code of 598).

               The downside to cancelling the request is that it makes it
               impossible to re-use the connection.

               This callback is useful when the data is too large to be held
               in memory (so the callback writes it to a file) or when only
               some information should be extracted, or when the body should
               be processed incrementally.

               It is usually preferred over doing your own body handling via
               "want_body_handle", but in case of streaming APIs, where HTTP
               is only used to create a connection, "want_body_handle" is the
               better alternative, as it allows you to install your own event
               handler, reducing resource usage.
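
               The true/false protocol of "on_body" can be illustrated
               without a network connection. In this sketch,
               "limited_body_collector" is a hypothetical factory that
               returns an "on_body" callback; the callback appends each chunk
               to a buffer and returns false as soon as a size limit is
               exceeded. The two direct calls at the end merely simulate what
               the module would do with incoming chunks.

```perl
use strict;
use warnings;

# Hypothetical factory: returns an on_body callback that collects the body
# in $$bufref and asks for cancellation once more than $limit bytes arrived.
sub limited_body_collector {
   my ($limit, $bufref) = @_;

   sub {
      my ($data, $hdr) = @_;

      $$bufref .= $data;

      # a false value cancels the request (completion callback sees 598)
      length ($$bufref) <= $limit
   }
}

my $body = "";
my $cb   = limited_body_collector (8, \$body);

# simulate two incoming chunks
print $cb->("12345", {}) ? "continue\n" : "abort\n";
print $cb->("67890", {}) ? "continue\n" : "abort\n";
```

               In a real request the callback would be passed as "on_body =>
               limited_body_collector ($limit, \$body)".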

           want_body_handle => $enable
               When enabled (default is disabled), the behaviour of
               AnyEvent::HTTP changes considerably: after parsing the headers,
               and instead of downloading the body (if any), the completion
               callback will be called. Instead of the $body argument
               containing the body data, the callback will receive the
               AnyEvent::Handle object associated with the connection. In
               error cases, "undef" will be passed. When there is no body
               (e.g. status 304), the empty string will be passed.

               The handle object might or might not be in TLS mode, might be
               connected to a proxy, be a persistent connection, use chunked
               transfer encoding etc., and configured in unspecified ways. The
               user is responsible for this handle (it will not be used by
               this module anymore).

               This is useful with some push-type services, where, after the
               initial headers, an interactive protocol is used (a typical
               example would be the push-style twitter API which starts a
               JSON/XML stream).

               If you think you need this, first have a look at "on_body", to
               see if that doesn't solve your problem in a better way.

           persistent => $boolean
               Try to create/reuse a persistent connection. When this flag is
               set (default: true for idempotent requests, false for all
               others), then "http_request" tries to re-use an existing
               (previously-created) persistent connection to the host and,
               failing that, tries to create a new one.

               Requests failing in certain ways will be automatically retried
               once, which is dangerous for non-idempotent requests, which is
               why it defaults to off for them. The reason for this is that
               the bozos who designed HTTP/1.1 made it impossible to
               distinguish between a fatal error and a normal connection
               timeout, so you never know whether there was a problem with
               your request or not.

               When reusing an existing connection, many parameters (such as
               TLS context) will be ignored. See the "session" parameter for a
               workaround.

           keepalive => $boolean
               Only used when "persistent" is also true. This parameter
               decides whether "http_request" tries to handshake an
               HTTP/1.0-style keep-alive connection (as opposed to only an
               HTTP/1.1 persistent connection).

               The default is true, except when using a proxy, in which case
               it defaults to false, as HTTP/1.0 proxies cannot support this
               in a meaningful way.

           handle_params => { key => value ... }
               The key-value pairs in this hash will be passed to any
               AnyEvent::Handle constructor that is called - not all requests
               will create a handle, and sometimes more than one is created,
               so this parameter is only good for setting hints.

               Example: set the maximum read size to 4096, to potentially
               conserve memory at the cost of speed.

                  handle_params => {
                     max_read_size => 4096,
                  },

           Example: do a simple HTTP GET request for http://www.nethype.de/
           and print the response body.

              http_request GET => "http://www.nethype.de/", sub {
                 my ($body, $hdr) = @_;
                 print "$body\n";
              };

           Example: do an HTTP HEAD request on https://www.google.com/, using
           a timeout of 30 seconds.

              http_request
                 HEAD    => "https://www.google.com",
                 headers => { "user-agent" => "MySearchClient 1.0" },
                 timeout => 30,
                 sub {
                    my ($body, $hdr) = @_;
                    use Data::Dumper;
                    print Dumper $hdr;
                 }
              ;

           Example: do another simple HTTP GET request, but immediately try to
           cancel it.

              my $request = http_request GET => "http://www.nethype.de/", sub {
                 my ($body, $hdr) = @_;
                 print "$body\n";
              };

              undef $request;

   DNS CACHING
       AnyEvent::HTTP uses the AnyEvent::Socket::tcp_connect function for the
       actual connection, which in turn uses AnyEvent::DNS to resolve
       hostnames. The latter is a simple stub resolver and does no caching on
       its own. If you want DNS caching, you currently have to provide your
       own default resolver (by storing a suitable resolver object in
       $AnyEvent::DNS::RESOLVER) or your own "tcp_connect" callback.

   GLOBAL FUNCTIONS AND VARIABLES
       AnyEvent::HTTP::set_proxy "proxy-url"
           Sets the default proxy server to use. The proxy-url must begin with
           a string of the form "http://host:port"; the function croaks
           otherwise.

           To clear an already-set proxy, use "undef".

           When AnyEvent::HTTP is loaded for the first time it will query the
           default proxy from the operating system, currently by looking at
           "$ENV{http_proxy}".
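
           For programs that prefer to pass the per-request "proxy"
           parameter instead of relying on the global default, the same URL
           form can be split by hand. This is only a sketch: "proxy_from_url"
           is a hypothetical helper, it is not how the module parses the URL
           internally, and it deliberately ignores IPv6 literals and embedded
           credentials.

```perl
use strict;
use warnings;

# Hypothetical helper: turns "http://host:port..." into [$host, $port, $scheme],
# the shape accepted by the "proxy" request parameter; undef if it doesn't match.
sub proxy_from_url {
   my ($url) = @_;

   defined $url && $url =~ m{^(http)://([^:/]+):(\d+)}i
      or return undef;

   [$2, $3, lc $1]
}

my $proxy = proxy_from_url ($ENV{http_proxy});

print $proxy
   ? "would use proxy $proxy->[0], port $proxy->[1]\n"
   : "no usable http_proxy setting found\n";
```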

       AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end]
           Remove all cookies from the cookie jar that have expired. If
           $session_end is given and true, then additionally remove all
           session cookies.

           You should call this function (with a true $session_end) before you
           save cookies to disk, and you should call this function after
           loading them again. If you have a long-running program you can
           additionally call this function from time to time.

           A cookie jar is initially an empty hash-reference that is managed
           by this module. Its format is subject to change, but currently it
           is as follows:

           The key "version" has to contain 1, otherwise the hash gets
           emptied. All other keys are hostnames or IP addresses pointing to
           hash-references. The key for these inner hash references is the
           server path for which this cookie is meant, and the values are
           again hash-references. Each key of those hash-references is a
           cookie name, and the value, you guessed it, is another
           hash-reference, this time with the key-value pairs from the cookie,
           except for "expires" and "max-age", which have been replaced by a
           "_expires" key that contains the cookie expiry timestamp. Session
           cookies are indicated by not having an "_expires" key.

           Here is an example of a cookie jar with a single cookie, so you
           have a chance of understanding the above paragraph:

              {
                 version    => 1,
                 "10.0.0.1" => {
                    "/" => {
                       "mythweb_id" => {
                         _expires => 1293917923,
                         value    => "ooRung9dThee3ooyXooM1Ohm",
                       },
                    },
                 },
              }
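
           Walking this structure takes three nested loops. The following
           sketch flattens a jar into rows; "jar_cookies" is a hypothetical
           helper, the "theme" cookie is a made-up entry added to show a
           session cookie, and since the jar format is subject to change the
           code should be treated as an illustration of the current layout
           only.

```perl
use strict;
use warnings;

# Hypothetical helper: flattens a version-1 cookie jar into a list of
# [host, path, name, value, expires-or-undef] rows, sorted for stable output.
sub jar_cookies {
   my ($jar) = @_;

   my @hosts = grep { $_ ne "version" } keys %$jar;

   my @rows;
   for my $host (sort @hosts) {
      my $paths = $jar->{$host};
      for my $path (sort keys %$paths) {
         my $cookies = $paths->{$path};
         for my $name (sort keys %$cookies) {
            my $c = $cookies->{$name};
            push @rows, [$host, $path, $name, $c->{value}, $c->{_expires}];
         }
      }
   }

   @rows
}

my $jar = {
   version    => 1,
   "10.0.0.1" => {
      "/" => {
         "mythweb_id" => { _expires => 1293917923, value => "ooRung9dThee3ooyXooM1Ohm" },
         "theme"      => { value => "dark" }, # made-up session cookie (no _expires)
      },
   },
};

for my $row (jar_cookies ($jar)) {
   my ($host, $path, $name, $value, $expires) = @$row;
   printf "%s%s %s=%s (%s)\n", $host, $path, $name, $value,
      defined $expires ? "expires $expires" : "session";
}
```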

       $date = AnyEvent::HTTP::format_date $timestamp
           Takes a POSIX timestamp (seconds since the epoch) and formats it as
           an HTTP Date (RFC 2616).

       $timestamp = AnyEvent::HTTP::parse_date $date
           Takes an HTTP Date (RFC 2616) or a Cookie date (netscape cookie
           spec) or a bunch of minor variations of those, and returns the
           corresponding POSIX timestamp, or "undef" if the date cannot be
           parsed.
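
           To make the date format concrete, here is a stand-alone sketch of
           the preferred RFC 2616 form ("Sun, 06 Nov 1994 08:49:37 GMT") and
           its inverse, using only core modules. The "sketch_" functions are
           hypothetical illustrations, not the module's implementation, and
           the parser accepts only this one form, whereas "parse_date" is
           documented to accept several variations.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Illustrative stand-in: POSIX timestamp -> "Sun, 06 Nov 1994 08:49:37 GMT"
sub sketch_format_date {
   my ($time) = @_;
   my ($S, $M, $H, $d, $m, $y, $wd) = gmtime $time;

   sprintf "%s, %02d %s %04d %02d:%02d:%02d GMT",
      $DAY[$wd], $d, $MON[$m], $y + 1900, $H, $M, $S
}

# Illustrative stand-in: parses only the exact form produced above and
# returns undef for anything else.
sub sketch_parse_date {
   my ($date) = @_;

   my ($d, $mon, $y, $H, $M, $S) =
      $date =~ /^\w{3}, (\d\d) (\w{3}) (\d{4}) (\d\d):(\d\d):(\d\d) GMT$/
      or return undef;

   my ($m) = grep { $MON[$_] eq $mon } 0 .. $#MON;
   defined $m or return undef;

   # timegm croaks on impossible dates, so trap that and return undef
   my $time = eval { timegm ($S, $M, $H, $d, $m, $y) };
   $time
}

my $stamp = 784111777;
my $date  = sketch_format_date ($stamp);

print "$date\n";
print sketch_parse_date ($date) == $stamp ? "round trip ok\n" : "round trip failed\n";
```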

       $AnyEvent::HTTP::MAX_RECURSE
           The default value for the "recurse" request parameter (default:
           10).

       $AnyEvent::HTTP::TIMEOUT
           The default timeout for connection operations, in seconds
           (default: 300).

       $AnyEvent::HTTP::USERAGENT
           The default value for the "User-Agent" header (the default is
           "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION;
           +http://software.schmorp.de/pkg/AnyEvent)").

       $AnyEvent::HTTP::MAX_PER_HOST
           The maximum number of concurrent connections to the same host
           (identified by the hostname). If the limit is exceeded, then
           additional requests are queued until previous connections are
           closed. Both persistent and non-persistent connections are counted
           in this limit.

           The default value for this is 4, and it is highly advisable to not
           increase it much.

           For comparison: the RFCs recommend 4 non-persistent or 2
           persistent connections, older browsers used 2, newer ones (such as
           firefox 3) typically use 6, and Opera uses 8 because like, they
           have the fastest browser and give a shit for everybody else on the
           planet.

       $AnyEvent::HTTP::PERSISTENT_TIMEOUT
           The time, in seconds, after which idle persistent connections get
           closed by AnyEvent::HTTP (default: 3).

       $AnyEvent::HTTP::ACTIVE
           The number of active connections. This is not the number of
           currently running requests, but the number of currently open and
           non-idle TCP connections. This number can be useful for
           load-leveling.

   SHOWCASE
       This section contains some more elaborate "real-world" examples or code
       snippets.

   HTTP/1.1 FILE DOWNLOAD
       Downloading files with HTTP can be quite tricky, especially when
       something goes wrong and you want to resume.

       Here is a function that initiates and resumes a download. It uses the
       last modified time to check for file content changes, and works with
       many HTTP/1.0 servers as well, and usually falls back to a complete
       re-download on older servers.

       It calls the completion callback with either "undef", which means a
       nonretryable error occurred, 0 when the download was partial and should
       be retried, or 1 if it was successful.

          use Fcntl;
          use AnyEvent::HTTP;

          sub download($$$) {
             my ($url, $file, $cb) = @_;

             # open read/write, creating the file on the first attempt
             # ("+<" alone would fail if the file does not exist yet)
             sysopen my $fh, $file, O_RDWR | O_CREAT
                or die "$file: $!";

             my %hdr;
             my $ofs = 0;

             # a non-empty file means we try to resume where we left off
             if (stat $fh and -s _) {
                $ofs = -s _;
                warn "-s is ", $ofs;
                $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9];
                $hdr{"range"} = "bytes=$ofs-";
             }

             http_get $url,
                headers   => \%hdr,
                on_header => sub {
                   my ($hdr) = @_;

                   if ($hdr->{Status} == 200 && $ofs) {
                      # resume failed, server sends the whole file
                      truncate $fh, $ofs = 0;
                   }

                   sysseek $fh, $ofs, 0;

                   1
                },
                on_body   => sub {
                   my ($data, $hdr) = @_;

                   if ($hdr->{Status} =~ /^2/) {
                      length $data == syswrite $fh, $data
                         or return; # abort on write errors
                   }

                   1
                },
                sub {
                   my (undef, $hdr) = @_;

                   my $status = $hdr->{Status};

                   # remember the server's modification time for later resumes
                   if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) {
                      utime $time, $time, $fh;
                   }

                   if ($status == 200 || $status == 206 || $status == 416) {
                      # download ok || resume ok || file already fully downloaded
                      $cb->(1, $hdr);

                   } elsif ($status == 412) {
                      # file has changed while resuming, delete and retry
                      unlink $file;
                      $cb->(0, $hdr);

                   } elsif ($status == 500 or $status == 503 or $status =~ /^59/) {
                      # retry later
                      $cb->(0, $hdr);

                   } else {
                      $cb->(undef, $hdr);
                   }
                }
             ;
          }

          download "http://server/somelargefile", "/tmp/somelargefile", sub {
             if ($_[0]) {
                print "OK!\n";
             } elsif (defined $_[0]) {
                print "please retry later\n";
             } else {
                print "ERROR\n";
             }
          };
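
       The three-valued result (1, 0 or "undef") lends itself to a small
       retry driver. The sketch below is hypothetical and independent of
       AnyEvent::HTTP: "with_retries" takes any function that accepts a
       completion callback, so the demonstration uses a scripted fake instead
       of a real download. It retries immediately; a real program would
       probably wait between attempts, for example with an AnyEvent timer.

```perl
use strict;
use warnings;

# Hypothetical driver: $start->($cb) begins one attempt and later calls
# $cb with 1 (done), 0 (partial, retry) or undef (nonretryable error).
# $done receives the final result after at most $tries attempts.
sub with_retries {
   my ($tries, $start, $done) = @_;

   $start->(sub {
      my ($ok) = @_;

      if (!$ok && defined $ok && $tries > 1) {
         # partial download: try again with one attempt less
         with_retries ($tries - 1, $start, $done);
      } else {
         $done->($ok);
      }
   });
}

# scripted stand-in for a download: two partial results, then success
my @script = (0, 0, 1);
my $calls  = 0;

with_retries (5, sub { $calls++; $_[0]->(shift @script) }, sub {
   my ($ok) = @_;
   print !defined $ok ? "ERROR\n"
       : $ok          ? "OK after $calls attempts\n"
       :                "still partial, giving up\n";
});
```

       With the function above, an attempt could be started as "sub {
       download $url, $file, $_[0] }".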

   SOCKS PROXIES
       Socks proxies are not directly supported by AnyEvent::HTTP. You can
       compile your perl to support socks, or use an external program such as
       socksify (dante) or tsocks to make your program use a socks proxy
       transparently.

       Alternatively, for AnyEvent::HTTP only, you can use your own
       "tcp_connect" function that does the proxy handshake - here is an
       example that works with socks4a proxies:

          use Errno;
          use AnyEvent::Util;
          use AnyEvent::Socket;
          use AnyEvent::Handle;

          # host, port and username of/for your socks4a proxy
          my $socks_host = "10.0.0.23";
          my $socks_port = 9050;
          my $socks_user = "";

          sub socks4a_connect {
             my ($host, $port, $connect_cb, $prepare_cb) = @_;

             my $hdl = AnyEvent::Handle->new (
                connect    => [$socks_host, $socks_port],
                on_prepare => sub { $prepare_cb->($_[0]{fh}) },
                on_error   => sub { $connect_cb->() },
             );

             # socks4a request: version 4, command 1 (connect), port,
             # dummy IP 0.0.0.1, user name and target host name
             $hdl->push_write (pack "CCnNZ*Z*", 4, 1, $port, 1, $socks_user, $host);

             # the reply is always 8 octets long
             $hdl->push_read (chunk => 8, sub {
                my ($hdl, $chunk) = @_;
                my ($status, $port, $ipn) = unpack "xCna4", $chunk;

                if ($status == 0x5a) {
                   # request granted, hand over the connected socket
                   $connect_cb->($hdl->{fh}, (format_address $ipn) . ":$port");
                } else {
                   $! = Errno::ENXIO; $connect_cb->();
                }
             });

             $hdl
          }

       Use "socks4a_connect" instead of "tcp_connect" when doing
       "http_request"s, possibly after switching off other proxy types:

          AnyEvent::HTTP::set_proxy undef; # usually you do not want other proxies

          http_get 'http://www.google.com', tcp_connect => \&socks4a_connect, sub {
             my ($data, $headers) = @_;
             ...
          };

SEE ALSO

       AnyEvent.

AUTHOR

          Marc Lehmann <schmorp@schmorp.de>
          http://home.schmorp.de/

       With many thanks to Дмитрий Шалашов, who provided
       countless test cases and bug reports.

perl v5.30.1                      2020-01-29                           HTTP(3)