HTTP(3)               User Contributed Perl Documentation              HTTP(3)

NAME

       AnyEvent::HTTP - simple but non-blocking HTTP/HTTPS client

SYNOPSIS

          use AnyEvent::HTTP;

          http_get "http://www.nethype.de/", sub { print $_[1] };

          # ... do something else here

DESCRIPTION

       This module is an AnyEvent user; you need to make sure that you use and
       run a supported event loop.

       This module implements a simple, stateless and non-blocking HTTP
       client. It supports GET, POST and other request methods, cookies and
       more, all on a very low level. It can follow redirects, supports
       proxies, and automatically limits the number of connections to the
       values specified in the RFC.

       It should generally be a "good client" that is sufficient for most
       HTTP tasks. Simple tasks should be simple, but complex tasks should
       still be possible as the user retains control over request and
       response headers.

       The caller is responsible for authentication management, cookies (if
       the simplistic implementation in this module doesn't suffice), referer
       and other high-level protocol details for which this module offers only
       limited support.

   METHODS
       http_get $url, key => value..., $cb->($data, $headers)
           Executes an HTTP-GET request. See the http_request function for
           details on additional parameters and the return value.

       http_head $url, key => value..., $cb->($data, $headers)
           Executes an HTTP-HEAD request. See the http_request function for
           details on additional parameters and the return value.

       http_post $url, $body, key => value..., $cb->($data, $headers)
           Executes an HTTP-POST request with a request body of $body. See the
           http_request function for details on additional parameters and the
           return value.

       http_request $method => $url, key => value..., $cb->($data, $headers)
           Executes an HTTP request of type $method (e.g. "GET", "POST"). The
           URL must be an absolute http or https URL.

           When called in void context, nothing is returned. In other
           contexts, "http_request" returns a "cancellation guard" - you have
           to keep the object alive at least until the callback gets called.
           If the object gets destroyed before the callback is called, the
           request will be cancelled.

           The callback will be called with the response body data as first
           argument (or "undef" if an error occurred), and a hash-ref with
           response headers (and trailers) as second argument.

           All the headers in that hash are lowercased. In addition to the
           response headers, the "pseudo-headers" (which contain uppercase
           letters to avoid clashing with possible response headers)
           "HTTPVersion", "Status" and "Reason" contain the three parts of
           the HTTP Status-Line of the same name. If an error occurs during
           the body phase of a request, then the original "Status" and
           "Reason" values from the header are available as "OrigStatus" and
           "OrigReason".

           The pseudo-header "URL" contains the actual URL (which can differ
           from the requested URL when following redirects - for example, you
           might get an error that your URL scheme is not supported even
           though your URL is a valid http URL because it redirected to an ftp
           URL, in which case you can look at the URL pseudo header).

           The pseudo-header "Redirect" only exists when the request was a
           result of an internal redirect. In that case it is an array
           reference with the "($data, $headers)" from the redirect response.
           Note that this response could in turn be the result of a redirect
           itself, and "$headers->{Redirect}[1]{Redirect}" will then contain
           the original response, and so on.
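
           The nesting of "Redirect" entries can be unwound with a small
           loop. The following is a minimal sketch, not part of this module:
           the helper name redirect_chain is hypothetical, and it relies only
           on the header layout described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of AnyEvent::HTTP): returns the header
# hashes from the first response to the final one, by following the
# nested "Redirect" pseudo-headers.
sub redirect_chain {
   my ($hdr) = @_;
   my @chain;

   while ($hdr) {
      unshift @chain, $hdr;
      # Redirect is [$data, $headers] of the previous response, if any
      $hdr = $hdr->{Redirect} ? $hdr->{Redirect}[1] : undef;
   }

   @chain
}

# usage inside a completion callback:
#    print join (" -> ", map "$_->{Status} $_->{URL}", redirect_chain ($hdr)), "\n";
```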

           If the server sends a header multiple times, then its values
           will be joined together with a comma (","), as per the HTTP spec.

           If an internal error occurs, such as not being able to resolve a
           hostname, then $data will be "undef", "$headers->{Status}" will be
           in the range 590-599 and the "Reason" pseudo-header will contain
           an error message. Currently the following status codes are used:

           595 - errors during connection establishment, proxy handshake.
           596 - errors during TLS negotiation, request sending and header
                 processing.
           597 - errors during body receiving or processing.
           598 - user aborted request via "on_header" or "on_body".
           599 - other, usually nonretryable, errors (garbled URL etc.).

           A typical callback might look like this:

              sub {
                 my ($body, $hdr) = @_;

                 if ($hdr->{Status} =~ /^2/) {
                    # ... everything should be ok
                 } else {
                    print "error, $hdr->{Status} $hdr->{Reason}\n";
                 }
              }
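
           Building on the status codes listed above, a callback often wants
           to decide between "done", "try again" and "give up". The sketch
           below shows one way to do that. The helper name classify_status is
           hypothetical, and which codes count as retryable is an assumption
           of this sketch, not a rule of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: maps a Status value to "ok", "retry" or "fatal".
# Only the meaning of 595-599 comes from the documentation; the decision
# which codes are worth retrying is an assumption of this sketch.
sub classify_status {
   my ($status) = @_;

   return "ok"    if $status =~ /^2/;
   return "retry" if $status >= 595 && $status <= 597; # connect, TLS/header, body errors
   return "fatal" if $status == 598 || $status == 599; # aborted by user, garbled URL etc.
   return "retry" if $status == 500 || $status == 503; # server-side, possibly transient

   "fatal"
}

# usage inside a completion callback:
#    my $verdict = classify_status ($hdr->{Status});
```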

           Additional parameters are key-value pairs, and are fully optional.
           They include:

           recurse => $count (default: $MAX_RECURSE)
               Whether to recurse requests or not, e.g. on redirects,
               authentication and other retries and so on, and how often to do
               so.

               Only redirects to http and https URLs are supported. While most
               common redirection forms are handled entirely within this
               module, some require the use of the optional URI module. If it
               is required but missing, then the request will fail with an
               error.

           headers => hashref
               The request headers to use. Currently, "http_request" may
               provide its own "Host:", "Content-Length:", "Connection:" and
               "Cookie:" headers and will provide defaults at least for "TE:",
               "Referer:" and "User-Agent:" (this can be suppressed by using
               "undef" for these headers in which case they won't be sent at
               all).

               You really should provide your own "User-Agent:" header value
               that is appropriate for your program - I wouldn't be surprised
               if the default AnyEvent string gets blocked by webservers
               sooner or later.

               Also, make sure that your header names and values do not
               contain any embedded newlines.
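
               When header values come from user input, the newline rule
               can be enforced before the request is made. This is a minimal
               sketch; the helper name checked_headers is hypothetical and
               not provided by this module.

```perl
use strict;
use warnings;

# Hypothetical helper: dies if any header name or value contains a CR or
# LF, otherwise returns the hash reference unchanged. Undefined values are
# allowed, because "undef" is how a default header is suppressed.
sub checked_headers {
   my ($hdr) = @_;

   for my $name (keys %$hdr) {
      my $value = $hdr->{$name};

      die "header name contains a newline\n"
         if $name =~ /[\r\n]/;
      die "value of header '$name' contains a newline\n"
         if defined $value && $value =~ /[\r\n]/;
   }

   $hdr
}

# usage:
#    http_get $url, headers => checked_headers ({ "user-agent" => $ua }), sub { ... };
```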

           timeout => $seconds
               The time-out to use for various stages - each connect attempt
               will reset the timeout, as will read or write activity, i.e.
               this is not an overall timeout.

               The default timeout is 5 minutes.
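
               Because "timeout" only limits inactivity, an overall deadline
               has to be built by the caller. The sketch below combines the
               cancellation guard with an AnyEvent timer. The wrapper name
               http_get_deadline is hypothetical, and reporting the deadline
               as status 599 is a choice made by this sketch, not something
               the module does.

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::HTTP;

# Hypothetical wrapper: like http_get, but gives up after $deadline seconds
# in total and then calls $cb with undef and a synthetic header hash.
sub http_get_deadline {
   my ($url, $deadline, $cb) = @_;
   my ($guard, $timer, $done);

   my $finish = sub {
      return if $done++;   # report exactly once
      undef $timer;        # stop the deadline timer
      undef $guard;        # cancels the request if it is still running
      $cb->(@_);
   };

   $guard = http_get $url, sub { $finish->(@_) };

   if ($done) {
      # the callback already ran, e.g. for an error detected immediately
      undef $guard;
   } else {
      $timer = AE::timer $deadline, 0, sub {
         $finish->(undef, {
            URL    => $url,
            Status => 599,   # assumption of this sketch
            Reason => "overall deadline exceeded",
         });
      };
   }

   ()
}
```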

           proxy => [$host, $port[, $scheme]] or undef
               Use the given http proxy for all requests, or no proxy if
               "undef" is used.

               $scheme must be either missing or must be "http" for HTTP.

               If not specified, then the default proxy is used (see
               "AnyEvent::HTTP::set_proxy").

               Currently, if your proxy requires authorization, you have to
               specify an appropriate "Proxy-Authorization" header in every
               request.

               Note that this module will prefer an existing persistent
               connection, even if that connection was made using another
               proxy. If you need to ensure that a new connection is made in
               this case, you can either force "persistent" to false or e.g.
               use the proxy address in your "sessionid".
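
               For proxies that use HTTP Basic authentication, the
               "Proxy-Authorization" value can be built with the core
               MIME::Base64 module. This is a sketch under that assumption;
               the helper name proxy_basic_auth and the proxy address in the
               usage comment are placeholders.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: builds the value of a Basic "Proxy-Authorization"
# header. Whether your proxy accepts Basic authentication is an assumption.
sub proxy_basic_auth {
   my ($user, $password) = @_;

   # the empty second argument suppresses the trailing newline, which must
   # not end up inside a header value
   "Basic " . encode_base64 ("$user:$password", "")
}

# usage (proxy host, port and credentials are placeholders):
#    http_get $url,
#       proxy   => ["proxy.example", 3128],
#       headers => { "proxy-authorization" => proxy_basic_auth ("user", "secret") },
#       sub { ... };
```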

           body => $string
               The request body, usually empty. Will be sent as-is (future
               versions of this module might offer more options).

           cookie_jar => $hash_ref
               Passing this parameter enables (simplified) cookie-processing,
               loosely based on the original Netscape specification.

               The $hash_ref must be an (initially empty) hash reference which
               will get updated automatically. It is possible to save the
               cookie jar to persistent storage with something like JSON or
               Storable - see the "AnyEvent::HTTP::cookie_jar_expire" function
               if you wish to remove expired or session-only cookies, and also
               for documentation on the format of the cookie jar.

               Note that this cookie implementation is not meant to be
               complete. If you want complete cookie management you have to do
               that on your own. "cookie_jar" is meant as a quick fix to get
               most cookie-using sites working. Cookies are a privacy
               disaster; do not use them unless required to.

               When cookie processing is enabled, the "Cookie:" and
               "Set-Cookie:" headers will be set and handled by this module,
               otherwise they will be left untouched.

           tls_ctx => $scheme | $tls_ctx
               Specifies the AnyEvent::TLS context to be used for https
               connections. This parameter follows the same rules as the
               "tls_ctx" parameter to AnyEvent::Handle, but additionally, the
               two strings "low" or "high" can be specified, which give you a
               predefined low-security (no verification, highest
               compatibility) and high-security (CA and common-name
               verification) TLS context.

               The default for this option is "low", which could be
               interpreted as "give me the page, no matter what".

               See also the "sessionid" parameter.

           sessionid => $string
               The module might reuse connections to the same host internally
               (regardless of other settings, such as "tcp_connect" or
               "proxy"). Sometimes (e.g. when using TLS or a specific proxy),
               you do not want to reuse connections from other sessions. This
               can be achieved by setting this parameter to some unique ID
               (such as the address of an object storing your state data or
               the TLS context, or the proxy IP) - only connections using the
               same unique ID will be reused.

           on_prepare => $callback->($fh)
               In rare cases you need to "tune" the socket before it is used
               to connect (for example, to bind it on a given IP address).
               This parameter overrides the prepare callback passed to
               "AnyEvent::Socket::tcp_connect" and behaves exactly the same
               way (e.g. it has to provide a timeout). See the description for
               the $prepare_cb argument of "AnyEvent::Socket::tcp_connect" for
               details.

           tcp_connect => $callback->($host, $service, $connect_cb, $prepare_cb)
               In even rarer cases you want total control over how
               AnyEvent::HTTP establishes connections. Normally it uses
               AnyEvent::Socket::tcp_connect to do this, but you can provide
               your own "tcp_connect" function - obviously, it has to follow
               the same calling conventions, except that it may always return
               a connection guard object.

               The connections made by this hook will be treated as equivalent
               to connections made the built-in way, specifically, they will
               be put into and taken from the persistent connection cache. If
               your $tcp_connect function is incompatible with this kind of
               re-use, consider switching off "persistent" connections and/or
               providing a "sessionid" identifier.

               There are probably lots of weird uses for this function,
               starting from tracing the hosts "http_request" actually tries
               to connect to, to (inexact but fast) host => IP address caching
               or even socks protocol support.

           on_header => $callback->($headers)
               When specified, this callback will be called with the header
               hash as soon as headers have been successfully received from
               the remote server (not on locally-generated errors).

               It has to return either true (in which case AnyEvent::HTTP will
               continue), or false, in which case AnyEvent::HTTP will cancel
               the download (and call the finish callback with an error code
               of 598).

               This callback is useful, among other things, to quickly reject
               unwanted content, which, if it is supposed to be rare, can be
               faster than first doing a "HEAD" request.

               The downside is that cancelling the request makes it impossible
               to re-use the connection. Also, the "on_header" callback will
               not receive any trailer (headers sent after the response body).

               Example: cancel the request unless the content-type is
               "text/html".

                  on_header => sub {
                     $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/
                  },
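
               The same idea can be packaged as a reusable callback that
               also looks at the declared body size. This is only a sketch:
               the factory name header_filter is hypothetical, and a server
               is free to omit or misstate "content-length", so the size
               check is best-effort.

```perl
use strict;
use warnings;

# Hypothetical factory: returns an on_header callback that accepts only
# responses of the given content type whose declared length (if any) does
# not exceed $max_bytes.
sub header_filter {
   my ($type_re, $max_bytes) = @_;

   sub {
      my ($hdr) = @_;

      my $type = $hdr->{"content-type"};
      return 0 unless defined $type && $type =~ $type_re;

      my $len = $hdr->{"content-length"};
      return 0 if defined $len && $len =~ /^\d+$/ && $len > $max_bytes;

      1
   }
}

# usage:
#    http_get $url,
#       on_header => header_filter (qr{^text/html\s*(?:;|$)}, 1_000_000),
#       sub { ... };
```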

           on_body => $callback->($partial_body, $headers)
               When specified, all body data will be passed to this callback
               instead of to the completion callback. The completion callback
               will get the empty string instead of the body data.

               It has to return either true (in which case AnyEvent::HTTP will
               continue), or false, in which case AnyEvent::HTTP will cancel
               the download (and call the completion callback with an error
               code of 598).

               The downside to cancelling the request is that it makes it
               impossible to re-use the connection.

               This callback is useful when the data is too large to be held
               in memory (so the callback writes it to a file) or when only
               some information should be extracted, or when the body should
               be processed incrementally.

               It is usually preferred over doing your own body handling via
               "want_body_handle", but in case of streaming APIs, where HTTP
               is only used to create a connection, "want_body_handle" is the
               better alternative, as it allows you to install your own event
               handler, reducing resource usage.
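
               As an illustration of the "write it to a file" case, the
               following sketch builds an "on_body" callback around an
               already opened file handle. The factory name body_writer and
               the size limit are hypothetical additions of this sketch.

```perl
use strict;
use warnings;

# Hypothetical factory: returns an on_body callback that appends each chunk
# of a 2xx response to $fh and aborts (by returning false) on write errors
# or once more than $max_bytes have been received.
sub body_writer {
   my ($fh, $max_bytes) = @_;
   my $received = 0;

   sub {
      my ($data, $hdr) = @_;

      # do not store error pages
      return 1 unless $hdr->{Status} =~ /^2/;

      $received += length $data;
      return 0 if $received > $max_bytes;

      my $written = syswrite $fh, $data;
      return 0 unless defined $written && $written == length $data;

      1
   }
}

# usage ($fh is a file handle opened for writing):
#    http_get $url, on_body => body_writer ($fh, 100_000_000), sub { ... };
```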

           want_body_handle => $enable
               When enabled (default is disabled), the behaviour of
               AnyEvent::HTTP changes considerably: after parsing the headers,
               and instead of downloading the body (if any), the completion
               callback will be called. Instead of the $body argument
               containing the body data, the callback will receive the
               AnyEvent::Handle object associated with the connection. In
               error cases, "undef" will be passed. When there is no body
               (e.g. status 304), the empty string will be passed.

               The handle object might or might not be in TLS mode, might be
               connected to a proxy, be a persistent connection, use chunked
               transfer encoding etc., and configured in unspecified ways. The
               user is responsible for this handle (it will not be used by
               this module anymore).

               This is useful with some push-type services, where, after the
               initial headers, an interactive protocol is used (typical
               example would be the push-style Twitter API which starts a
               JSON/XML stream).

               If you think you need this, first have a look at "on_body", to
               see if that doesn't solve your problem in a better way.

           persistent => $boolean
               Try to create/reuse a persistent connection. When this flag is
               set (default: true for idempotent requests, false for all
               others), then "http_request" tries to re-use an existing
               (previously-created) persistent connection to the same host
               (i.e. identical URL scheme, hostname, port and sessionid) and,
               failing that, tries to create a new one.

               Requests failing in certain ways will be automatically retried
               once, which is dangerous for non-idempotent requests, which is
               why it defaults to off for them. The reason for this is that
               the bozos who designed HTTP/1.1 made it impossible to
               distinguish between a fatal error and a normal connection
               timeout, so you never know whether there was a problem with
               your request or not.

               When reusing an existing connection, many parameters (such as
               TLS context) will be ignored. See the "sessionid" parameter for
               a workaround.

           keepalive => $boolean
               Only used when "persistent" is also true. This parameter
               decides whether "http_request" tries to handshake an
               HTTP/1.0-style keep-alive connection (as opposed to only an
               HTTP/1.1 persistent connection).

               The default is true, except when using a proxy, in which case
               it defaults to false, as HTTP/1.0 proxies cannot support this
               in a meaningful way.

           handle_params => { key => value ... }
               The key-value pairs in this hash will be passed to any
               AnyEvent::Handle constructor that is called - not all requests
               will create a handle, and sometimes more than one is created,
               so this parameter is only good for setting hints.

               Example: set the maximum read size to 4096, to potentially
               conserve memory at the cost of speed.

                  handle_params => {
                     max_read_size => 4096,
                  },

           Example: do a simple HTTP GET request for http://www.nethype.de/
           and print the response body.

              http_request GET => "http://www.nethype.de/", sub {
                 my ($body, $hdr) = @_;
                 print "$body\n";
              };

           Example: do an HTTP HEAD request on https://www.google.com/, use a
           timeout of 30 seconds.

              http_request
                 HEAD    => "https://www.google.com",
                 headers => { "user-agent" => "MySearchClient 1.0" },
                 timeout => 30,
                 sub {
                    my ($body, $hdr) = @_;
                    use Data::Dumper;
                    print Dumper $hdr;
                 }
              ;

           Example: do another simple HTTP GET request, but immediately try to
           cancel it.

              my $request = http_request GET => "http://www.nethype.de/", sub {
                 my ($body, $hdr) = @_;
                 print "$body\n";
              };

              undef $request;

   DNS CACHING
       AnyEvent::HTTP uses the AnyEvent::Socket::tcp_connect function for the
       actual connection, which in turn uses AnyEvent::DNS to resolve
       hostnames. The latter is a simple stub resolver and does no caching on
       its own. If you want DNS caching, you currently have to provide your
       own default resolver (by storing a suitable resolver object in
       $AnyEvent::DNS::RESOLVER) or your own "tcp_connect" callback.

   GLOBAL FUNCTIONS AND VARIABLES
       AnyEvent::HTTP::set_proxy "proxy-url"
           Sets the default proxy server to use. The proxy-url must begin with
           a string of the form "http://host:port", croaks otherwise.

           To clear an already-set proxy, use "undef".

           When AnyEvent::HTTP is loaded for the first time it will query the
           default proxy from the operating system, currently by looking at
           "$ENV{http_proxy}".

       AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end]
           Remove all cookies from the cookie jar that have expired. If
           $session_end is given and true, then additionally remove all
           session cookies.

           You should call this function (with a true $session_end) before you
           save cookies to disk, and you should call this function after
           loading them again. If you have a long-running program you can
           additionally call this function from time to time.
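
           The save and load advice above might be followed along these
           lines, using the core Storable module. This is a sketch: the
           helper names save_cookie_jar and load_cookie_jar are hypothetical,
           and the file name is whatever the caller chooses.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use AnyEvent::HTTP;

# Hypothetical helper: expires the jar and writes it to $file.
sub save_cookie_jar {
   my ($jar, $file) = @_;

   # drop expired and session-only cookies before writing
   AnyEvent::HTTP::cookie_jar_expire ($jar, 1);
   nstore ($jar, $file);
}

# Hypothetical helper: reads the jar from $file, or starts with an empty
# one if the file does not exist or is empty.
sub load_cookie_jar {
   my ($file) = @_;

   my $jar = -s $file ? retrieve ($file) : +{};

   # drop whatever expired while the jar was stored
   AnyEvent::HTTP::cookie_jar_expire ($jar);

   $jar
}

# usage:
#    my $jar = load_cookie_jar ("/tmp/cookies.sto");
#    http_get $url, cookie_jar => $jar, sub { ... };
#    save_cookie_jar ($jar, "/tmp/cookies.sto");   # e.g. at program exit
```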

           A cookie jar is initially an empty hash-reference that is managed
           by this module. Its format is subject to change, but currently it
           is as follows:

           The key "version" has to contain 2, otherwise the hash gets
           cleared. All other keys are hostnames or IP addresses pointing to
           hash-references. The key for these inner hash references is the
           server path for which this cookie is meant, and the values are
           again hash-references. Each key of those hash-references is a
           cookie name, and the value, you guessed it, is another
           hash-reference, this time with the key-value pairs from the cookie,
           except for "expires" and "max-age", which have been replaced by a
           "_expires" key that contains the cookie expiry timestamp. Session
           cookies are indicated by not having an "_expires" key.

           Here is an example of a cookie jar with a single cookie, so you
           have a chance of understanding the above paragraph:

              {
                 version    => 2,
                 "10.0.0.1" => {
                    "/" => {
                       "mythweb_id" => {
                          _expires => 1293917923,
                          value    => "ooRung9dThee3ooyXooM1Ohm",
                       },
                    },
                 },
              }
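
           To make the nesting concrete, the following sketch walks a jar
           of this shape and lists its cookies. The helper name
           cookie_jar_lines is hypothetical, and since the jar format is
           subject to change, so is any code that depends on it.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one "host path name=value (kind)" line per
# cookie, relying on the version 2 jar layout described above.
sub cookie_jar_lines {
   my ($jar) = @_;
   my @lines;

   # all keys except "version" point to hash references
   my @hosts = grep { ref $jar->{$_} } keys %$jar;

   for my $host (sort @hosts) {
      my $paths = $jar->{$host};

      for my $path (sort keys %$paths) {
         my $cookies = $paths->{$path};

         for my $name (sort keys %$cookies) {
            my $kv   = $cookies->{$name};
            my $kind = exists $kv->{_expires} ? "expires $kv->{_expires}" : "session";

            push @lines, "$host $path $name=$kv->{value} ($kind)";
         }
      }
   }

   @lines
}
```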

       $date = AnyEvent::HTTP::format_date $timestamp
           Takes a POSIX timestamp (seconds since the epoch) and formats it as
           an HTTP Date (RFC 2616).

       $timestamp = AnyEvent::HTTP::parse_date $date
           Takes an HTTP Date (RFC 2616) or a Cookie date (Netscape cookie
           spec) or a bunch of minor variations of those, and returns the
           corresponding POSIX timestamp, or "undef" if the date cannot be
           parsed.
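
           The two functions are inverses of each other for valid
           timestamps, which the following short sketch demonstrates. The
           timestamp is simply the expiry time from the cookie jar example,
           and the variable names are arbitrary.

```perl
use strict;
use warnings;
use AnyEvent::HTTP;

my $timestamp = 1293917923;   # the expiry time from the cookie jar example

my $date = AnyEvent::HTTP::format_date ($timestamp);
my $back = AnyEvent::HTTP::parse_date ($date);

print "$date\n";
print "round trip ok\n" if $back == $timestamp;

# parse_date returns undef for input it cannot parse
my $bad = AnyEvent::HTTP::parse_date ("not a date");
print "unparsable\n" unless defined $bad;
```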

       $AnyEvent::HTTP::MAX_RECURSE
           The default value for the "recurse" request parameter (default:
           10).

       $AnyEvent::HTTP::TIMEOUT
           The default timeout for connection operations, in seconds
           (default: 300).

       $AnyEvent::HTTP::USERAGENT
           The default value for the "User-Agent" header (the default is
           "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION;
           +http://software.schmorp.de/pkg/AnyEvent)").

       $AnyEvent::HTTP::MAX_PER_HOST
           The maximum number of concurrent connections to the same host
           (identified by the hostname). If the limit is exceeded, then
           additional requests are queued until previous connections are
           closed. Both persistent and non-persistent connections are counted
           in this limit.

           The default value for this is 4, and it is highly advisable to not
           increase it much.

           For comparison: the RFCs recommend 4 non-persistent or 2
           persistent connections, older browsers used 2, newer ones (such as
           Firefox 3) typically use 6, and Opera uses 8 because like, they
           have the fastest browser and give a shit for everybody else on the
           planet.

       $AnyEvent::HTTP::PERSISTENT_TIMEOUT
           The time, in seconds, after which idle persistent connections get
           closed by AnyEvent::HTTP (default: 3).

       $AnyEvent::HTTP::ACTIVE
           The number of active connections. This is not the number of
           currently running requests, but the number of currently open and
           non-idle TCP connections. This number can be useful for
           load-leveling.

   SHOWCASE
       This section contains some more elaborate "real-world" examples or code
       snippets.

   HTTP/1.1 FILE DOWNLOAD
       Downloading files with HTTP can be quite tricky, especially when
       something goes wrong and you want to resume.

       Here is a function that initiates and resumes a download. It uses the
       last modified time to check for file content changes, and works with
       many HTTP/1.0 servers as well, and usually falls back to a complete
       re-download on older servers.

       It calls the completion callback with either "undef", which means a
       nonretryable error occurred, 0 when the download was partial and should
       be retried, or 1 if it was successful.

          use AnyEvent::HTTP;
          use Fcntl qw(O_RDWR O_CREAT);

          sub download($$$) {
             my ($url, $file, $cb) = @_;

             # open read-write, creating the file if it does not exist yet,
             # so that the first (non-resuming) download works as well
             sysopen my $fh, $file, O_RDWR | O_CREAT
                or die "$file: $!";

             my %hdr;
             my $ofs = 0;

             # a non-empty file means we try to resume
             if (stat $fh and -s _) {
                $ofs = -s _;
                $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9];
                $hdr{"range"} = "bytes=$ofs-";
             }

             http_get $url,
                headers   => \%hdr,
                on_header => sub {
                   my ($hdr) = @_;

                   if ($hdr->{Status} == 200 && $ofs) {
                      # resume failed
                      truncate $fh, $ofs = 0;
                   }

                   sysseek $fh, $ofs, 0;

                   1
                },
                on_body   => sub {
                   my ($data, $hdr) = @_;

                   if ($hdr->{Status} =~ /^2/) {
                      length $data == syswrite $fh, $data
                         or return; # abort on write errors
                   }

                   1
                },
                sub {
                   my (undef, $hdr) = @_;

                   my $status = $hdr->{Status};

                   if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) {
                      utime $time, $time, $fh;
                   }

                   if ($status == 200 || $status == 206 || $status == 416) {
                      # download ok || resume ok || file already fully downloaded
                      $cb->(1, $hdr);

                   } elsif ($status == 412) {
                      # file has changed while resuming, delete and retry
                      unlink $file;
                      $cb->(0, $hdr);

                   } elsif ($status == 500 or $status == 503 or $status =~ /^59/) {
                      # retry later
                      $cb->(0, $hdr);

                   } else {
                      $cb->(undef, $hdr);
                   }
                }
             ;
          }

          download "http://server/somelargefile", "/tmp/somelargefile", sub {
             if ($_[0]) {
                print "OK!\n";
             } elsif (defined $_[0]) {
                print "please retry later\n";
             } else {
                print "ERROR\n";
             }
          };
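
       The "please retry later" case can be automated with a timer. The
       following sketch is one possible driver, not part of the module: the
       name with_retries, the retry limit and the doubling delay are all
       assumptions made here for illustration.

```perl
use strict;
use warnings;
use AnyEvent;

# Hypothetical driver: $start->($cb) begins one attempt and must eventually
# call $cb->($result), where $result is undef (give up), 0 (retry later) or
# 1 (success), as the download function above does. When finished, $done is
# called with the last result and the number of attempts made.
sub with_retries {
   my ($start, $max_tries, $delay, $done) = @_;
   my ($attempt, $timer);
   my $tries = 0;

   $attempt = sub {
      $tries++;

      $start->(sub {
         my ($result) = @_;

         if (defined $result && !$result && $tries < $max_tries) {
            # partial result: wait, then start the next attempt
            $timer = AE::timer $delay, 0, sub {
               undef $timer;
               $attempt->();
            };
            $delay *= 2;
         } else {
            undef $attempt;   # break the reference cycle
            $done->($result, $tries);
         }
      });
   };

   $attempt->();
   ()
}

# usage with the download function above:
#    with_retries (sub { download ($url, $file, $_[0]) }, 5, 1, sub {
#       my ($result, $tries) = @_;
#       ...
#    });
```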

       SOCKS PROXIES

       Socks proxies are not directly supported by AnyEvent::HTTP. You can
       compile your perl to support socks, or use an external program such as
       socksify (dante) or tsocks to make your program use a socks proxy
       transparently.

       Alternatively, for AnyEvent::HTTP only, you can use your own
       "tcp_connect" function that does the proxy handshake - here is an
       example that works with socks4a proxies:

          use Errno;
          use AnyEvent::Util;
          use AnyEvent::Socket;
          use AnyEvent::Handle;

          # host, port and username of/for your socks4a proxy
          my $socks_host = "10.0.0.23";
          my $socks_port = 9050;
          my $socks_user = "";

          sub socks4a_connect {
             my ($host, $port, $connect_cb, $prepare_cb) = @_;

             my $hdl = AnyEvent::Handle->new (
                connect    => [$socks_host, $socks_port],
                on_prepare => sub { $prepare_cb->($_[0]{fh}) },
                on_error   => sub { $connect_cb->() },
             );

             # socks4a request: version 4, command 1 (connect), port, the
             # invalid IP 0.0.0.1, then user name and host name
             $hdl->push_write (pack "CCnNZ*Z*", 4, 1, $port, 1, $socks_user, $host);

             $hdl->push_read (chunk => 8, sub {
                my ($hdl, $chunk) = @_;
                my ($status, $port, $ipn) = unpack "xCna4", $chunk;

                if ($status == 0x5a) {
                   $connect_cb->($hdl->{fh}, (format_address $ipn) . ":$port");
                } else {
                   $! = Errno::ENXIO; $connect_cb->();
                }
             });

             $hdl
          }

       Use "socks4a_connect" instead of "tcp_connect" when doing
       "http_request"s, possibly after switching off other proxy types:

          AnyEvent::HTTP::set_proxy undef; # usually you do not want other proxies

          http_get 'http://www.google.com', tcp_connect => \&socks4a_connect, sub {
             my ($data, $headers) = @_;
             ...
          };

SEE ALSO

       AnyEvent.

AUTHOR

          Marc Lehmann <schmorp@schmorp.de>
          http://home.schmorp.de/

       With many thanks to Дмитрий Шалашов, who provided
       countless testcases and bugreports.

perl v5.32.1                      2021-01-26                           HTTP(3)