HTTP(3)               User Contributed Perl Documentation              HTTP(3)



NAME
       AnyEvent::HTTP - simple but non-blocking HTTP/HTTPS client

SYNOPSIS
           use AnyEvent::HTTP;

           http_get "http://www.nethype.de/", sub { print $_[0] };

           # ... do something else here

DESCRIPTION
       This module is an AnyEvent user; you need to make sure that you use and
       run a supported event loop.

       This module implements a simple, stateless and non-blocking HTTP
       client. It supports GET, POST and other request methods, cookies and
       more, all on a very low level. It can follow redirects, supports
       proxies, and automatically limits the number of connections to the
       values specified in the RFC.

       It should generally be a "good client" that is enough for most HTTP
       tasks. Simple tasks should be simple, but complex tasks should still be
       possible as the user retains control over request and response headers.

       The caller is responsible for authentication management, cookies (if
       the simplistic implementation in this module doesn't suffice), referer
       and other high-level protocol details for which this module offers only
       limited support.

METHODS
       http_get $url, key => value..., $cb->($data, $headers)
           Executes an HTTP-GET request. See the http_request function for
           details on additional parameters and the return value.

       http_head $url, key => value..., $cb->($data, $headers)
           Executes an HTTP-HEAD request. See the http_request function for
           details on additional parameters and the return value.

       http_post $url, $body, key => value..., $cb->($data, $headers)
           Executes an HTTP-POST request with a request body of $body. See the
           http_request function for details on additional parameters and the
           return value.

       http_request $method => $url, key => value..., $cb->($data, $headers)
           Executes an HTTP request of type $method (e.g. "GET", "POST"). The
           URL must be an absolute http or https URL.

           When called in void context, nothing is returned. In other
           contexts, "http_request" returns a "cancellation guard": you have
           to keep the object alive at least until the callback gets called.
           If the object gets destroyed before the callback is called, the
           request will be cancelled.

           The callback will be called with the response body data as first
           argument (or "undef" if an error occurred), and a hash-ref with
           response headers (and trailers) as second argument.

           All the headers in that hash are lowercased. In addition to the
           response headers, the "pseudo-headers" (uppercase to avoid clashing
           with possible response headers) "HTTPVersion", "Status" and
           "Reason" contain the three parts of the HTTP Status-Line of the
           same name. If an error occurs during the body phase of a request,
           then the original "Status" and "Reason" values from the header are
           available as "OrigStatus" and "OrigReason".

           The pseudo-header "URL" contains the actual URL, which can differ
           from the requested URL when following redirects. For example, you
           might get an error that your URL scheme is not supported even
           though your URL is a valid http URL, because it redirected to an
           ftp URL; in that case you can look at the "URL" pseudo-header.

           The pseudo-header "Redirect" only exists when the request was a
           result of an internal redirect. In that case it is an array
           reference with the "($data, $headers)" from the redirect response.
           Note that this response could in turn be the result of a redirect
           itself, and "$headers->{Redirect}[1]{Redirect}" will then contain
           the original response, and so on.
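
           The "Redirect" chain can be walked back to the first response. The
           following is a minimal sketch that relies only on the
           [$data, $headers] layout described above. The helper
           redirect_chain and the sample header hashes are hypothetical, and
           no request is made:

```perl
use strict;
use warnings;

# Hypothetical helper: follow $hdr->{Redirect}[1] back to the original
# response and return the "URL" pseudo-headers, oldest request first.
sub redirect_chain {
    my ($hdr) = @_;
    my @urls;
    while ($hdr) {
        unshift @urls, $hdr->{URL};
        # Redirect holds [$data, $headers] of the previous response
        $hdr = $hdr->{Redirect} ? $hdr->{Redirect}[1] : undef;
    }
    return @urls;
}

# Fabricated headers shaped like the result of a two-hop redirect.
my $final = {
    Status   => 200,
    URL      => "http://example.org/c",
    Redirect => [ "", {
        Status   => 302,
        URL      => "http://example.org/b",
        Redirect => [ "", { Status => 301, URL => "http://example.org/a" } ],
    } ],
};

print join(" -> ", redirect_chain($final)), "\n";
```

           Run as is, it prints the three URLs in request order, from
           http://example.org/a to http://example.org/c.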

           If the server sends a header multiple times, then their contents
           will be joined together with a comma (","), as per the HTTP spec.
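
           The folding rule can be sketched on raw header lines. This is a
           simplified illustration, not the module's parser. The helper
           fold_headers is hypothetical and assumes clean "Name: value" lines
           without continuation lines:

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase header names and join repeated fields
# with a comma, as described for the callback's header hash.
sub fold_headers {
    my (@lines) = @_;
    my %hdr;
    for my $line (@lines) {
        my ($name, $value) = $line =~ /^([^:\s]+)\s*:\s*(.*?)\s*$/
            or next;    # skip malformed lines
        $name = lc $name;
        $hdr{$name} = exists $hdr{$name} ? "$hdr{$name},$value" : $value;
    }
    return \%hdr;
}

my $h = fold_headers(
    "Cache-Control: no-cache",
    "cache-control: no-store",
    "Content-Type: text/html",
);
print "$h->{'cache-control'}\n";    # no-cache,no-store
```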

           If an internal error occurs, such as not being able to resolve a
           hostname, then $data will be "undef", "$headers->{Status}" will be
           590-599 and the "Reason" pseudo-header will contain an error
           message. Currently the following status codes are used:

              595 - errors during connection establishment, proxy handshake.
              596 - errors during TLS negotiation, request sending and header
                    processing.
              597 - errors during body receiving or processing.
              598 - user aborted request via "on_header" or "on_body".
              599 - other, usually nonretryable, errors (garbled URL etc.).
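
           Because all internal errors fall into the 590-599 range, a callback
           can tell local failures apart from server responses by the status
           alone. The following is a minimal sketch that assumes only the code
           assignments listed above. The function classify_status and its
           labels are hypothetical names:

```perl
use strict;
use warnings;

# Hypothetical labels for the internal status codes listed above.
my %internal = (
    595 => "connect",    # connection establishment, proxy handshake
    596 => "request",    # TLS negotiation, request sending, headers
    597 => "body",       # body receiving or processing
    598 => "aborted",    # on_header/on_body returned false
    599 => "other",      # garbled URL etc., usually not retryable
);

# Return "server" for real HTTP status codes, else an internal label.
sub classify_status {
    my ($status) = @_;
    return "server" if $status < 590 || $status > 599;
    return $internal{$status} // "internal";    # 590-594 are unassigned
}

print classify_status($_), "\n" for 200, 404, 595, 598;
```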

           A typical callback might look like this:

              sub {
                 my ($body, $hdr) = @_;

                 if ($hdr->{Status} =~ /^2/) {
                    # ... everything should be ok
                 } else {
                    print "error, $hdr->{Status} $hdr->{Reason}\n";
                 }
              }

           Additional parameters are key-value pairs, and are fully optional.
           They include:

           recurse => $count (default: $MAX_RECURSE)
               Whether to recurse requests or not (e.g. on redirects,
               authentication and other retries), and how often to do so.

               Only redirects to http and https URLs are supported. While most
               common redirection forms are handled entirely within this
               module, some require the use of the optional URI module. If it
               is required but missing, then the request will fail with an
               error.

           headers => hashref
               The request headers to use. Currently, "http_request" may
               provide its own "Host:", "Content-Length:", "Connection:" and
               "Cookie:" headers and will provide defaults at least for "TE:",
               "Referer:" and "User-Agent:" (this can be suppressed by using
               "undef" for these headers, in which case they won't be sent at
               all).

               You really should provide your own "User-Agent:" header value
               that is appropriate for your program - I wouldn't be surprised
               if the default AnyEvent string gets blocked by web servers
               sooner or later.

               Also, make sure that your header names and values do not
               contain any embedded newlines.

           timeout => $seconds
               The timeout to use for various stages - each connect attempt
               will reset the timeout, as will read or write activity, i.e.
               this is not an overall timeout.

               Default timeout is 5 minutes.

           proxy => [$host, $port[, $scheme]] or undef
               Use the given http proxy for all requests, or no proxy if
               "undef" is used.

               $scheme must be either missing or must be "http" for HTTP.

               If not specified, then the default proxy is used (see
               "AnyEvent::HTTP::set_proxy").

               Currently, if your proxy requires authorization, you have to
               specify an appropriate "Proxy-Authorization" header in every
               request.

           body => $string
               The request body, usually empty. Will be sent as-is (future
               versions of this module might offer more options).

           cookie_jar => $hash_ref
               Passing this parameter enables (simplified) cookie-processing,
               loosely based on the original Netscape specification.

               The $hash_ref must be an (initially empty) hash reference which
               will get updated automatically. It is possible to save the
               cookie jar to persistent storage with something like JSON or
               Storable - see the "AnyEvent::HTTP::cookie_jar_expire" function
               if you wish to remove expired or session-only cookies, and also
               for documentation on the format of the cookie jar.

               Note that this cookie implementation is not meant to be
               complete. If you want complete cookie management you have to do
               that on your own. "cookie_jar" is meant as a quick fix to get
               most cookie-using sites working. Cookies are a privacy
               disaster; do not use them unless required to.

               When cookie processing is enabled, the "Cookie:" and
               "Set-Cookie:" headers will be set and handled by this module,
               otherwise they will be left untouched.

           tls_ctx => $scheme | $tls_ctx
               Specifies the AnyEvent::TLS context to be used for https
               connections. This parameter follows the same rules as the
               "tls_ctx" parameter to AnyEvent::Handle, but additionally, the
               two strings "low" or "high" can be specified, which give you a
               predefined low-security (no verification, highest
               compatibility) and high-security (CA and common-name
               verification) TLS context.

               The default for this option is "low", which could be
               interpreted as "give me the page, no matter what".

               See also the "session" parameter.

           session => $string
               The module might reuse connections to the same host internally.
               Sometimes (e.g. when using TLS), you do not want to reuse
               connections from other sessions. This can be achieved by
               setting this parameter to some unique ID (such as the address
               of an object storing your state data, or the TLS context) -
               only connections using the same unique ID will be reused.

           on_prepare => $callback->($fh)
               In rare cases you need to "tune" the socket before it is used
               to connect (for example, to bind it on a given IP address).
               This parameter overrides the prepare callback passed to
               "AnyEvent::Socket::tcp_connect" and behaves exactly the same
               way (e.g. its return value is used as the connect timeout). See
               the description for the $prepare_cb argument of
               "AnyEvent::Socket::tcp_connect" for details.

           tcp_connect => $callback->($host, $service, $connect_cb,
           $prepare_cb)
               In even rarer cases you want total control over how
               AnyEvent::HTTP establishes connections. Normally it uses
               AnyEvent::Socket::tcp_connect to do this, but you can provide
               your own "tcp_connect" function - obviously, it has to follow
               the same calling conventions, except that it may always return
               a connection guard object.

               There are probably lots of weird uses for this function,
               starting from tracing the hosts "http_request" actually tries
               to connect to, to (inexact but fast) host => IP address caching
               or even SOCKS protocol support.

           on_header => $callback->($headers)
               When specified, this callback will be called with the header
               hash as soon as headers have been successfully received from
               the remote server (not on locally-generated errors).

               It has to return either true (in which case AnyEvent::HTTP will
               continue), or false, in which case AnyEvent::HTTP will cancel
               the download (and call the completion callback with an error
               code of 598).

               This callback is useful, among other things, to quickly reject
               unwanted content, which, if it is supposed to be rare, can be
               faster than first doing a "HEAD" request.

               The downside is that cancelling the request makes it impossible
               to re-use the connection. Also, the "on_header" callback will
               not receive any trailer (headers sent after the response body).

               Example: cancel the request unless the content-type is
               "text/html".

                  on_header => sub {
                     $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/
                  },

           on_body => $callback->($partial_body, $headers)
               When specified, all body data will be passed to this callback
               instead of to the completion callback. The completion callback
               will get the empty string instead of the body data.

               It has to return either true (in which case AnyEvent::HTTP will
               continue), or false, in which case AnyEvent::HTTP will cancel
               the download (and call the completion callback with an error
               code of 598).

               The downside to cancelling the request is that it makes it
               impossible to re-use the connection.

               This callback is useful when the data is too large to be held
               in memory (so the callback writes it to a file), when only
               some information should be extracted, or when the body should
               be processed incrementally.

               It is usually preferred over doing your own body handling via
               "want_body_handle", but in case of streaming APIs, where HTTP
               is only used to create a connection, "want_body_handle" is the
               better alternative, as it allows you to install your own event
               handler, reducing resource usage.
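
               The true/false return value makes it easy to cap how much data
               is accepted. The following is a minimal sketch.
               make_capped_on_body is a hypothetical factory, and the loop at
               the end only simulates the module delivering chunks, so the
               code runs without a network:

```perl
use strict;
use warnings;

# Hypothetical factory: returns an on_body callback that appends each
# chunk to $$buf_ref and returns false (cancel, status 598) as soon as
# accepting the next chunk would exceed $max bytes.
sub make_capped_on_body {
    my ($max, $buf_ref) = @_;
    return sub {
        my ($partial, $hdr) = @_;
        return 0 if length($$buf_ref) + length($partial) > $max;
        $$buf_ref .= $partial;
        return 1;    # true: keep downloading
    };
}

my $buf = "";
my $cb  = make_capped_on_body(10, \$buf);

# Simulated delivery; the module stops calling after a false return.
for my $chunk ("abcd", "efgh", "ijkl") {
    last unless $cb->($chunk, { Status => 200 });
}
print "kept: $buf\n";    # kept: abcdefgh
```

               In real use, the returned callback would be passed as
               on_body => make_capped_on_body (1_000_000, \$body) to one of
               the request functions, with $body declared beforehand.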

           want_body_handle => $enable
               When enabled (default is disabled), the behaviour of
               AnyEvent::HTTP changes considerably: after parsing the headers,
               and instead of downloading the body (if any), the completion
               callback will be called. Instead of the $body argument
               containing the body data, the callback will receive the
               AnyEvent::Handle object associated with the connection. In
               error cases, "undef" will be passed. When there is no body
               (e.g. status 304), the empty string will be passed.

               The handle object might or might not be in TLS mode, might be
               connected to a proxy, be a persistent connection, use chunked
               transfer encoding etc., and be configured in unspecified ways.
               The user is responsible for this handle (it will not be used by
               this module anymore).

               This is useful with some push-type services, where, after the
               initial headers, an interactive protocol is used (a typical
               example would be the push-style Twitter API, which starts a
               JSON/XML stream).

               If you think you need this, first have a look at "on_body", to
               see if that doesn't solve your problem in a better way.

           persistent => $boolean
               Try to create/reuse a persistent connection. When this flag is
               set (default: true for idempotent requests, false for all
               others), then "http_request" tries to re-use an existing
               (previously-created) persistent connection to the host and,
               failing that, tries to create a new one.

               Requests failing in certain ways will be automatically retried
               once. This is dangerous for non-idempotent requests, which is
               why it defaults to off for them. The reason is that the bozos
               who designed HTTP/1.1 made it impossible to distinguish between
               a fatal error and a normal connection timeout, so you never
               know whether there was a problem with your request or not.

               When reusing an existing connection, many parameters (such as
               TLS context) will be ignored. See the "session" parameter for a
               workaround.
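
               The default depends on whether the request method is
               idempotent. The following is a minimal sketch of that decision,
               assuming the idempotent set from RFC 7231 (GET, HEAD, PUT,
               DELETE, OPTIONS, TRACE). The list AnyEvent::HTTP itself uses may
               differ, and default_persistent is a hypothetical name:

```perl
use strict;
use warnings;

# Assumed idempotent methods (RFC 7231, section 4.2.2); the module's
# own list may not match exactly.
my %idempotent = map { $_ => 1 } qw(GET HEAD PUT DELETE OPTIONS TRACE);

# Hypothetical helper: an explicit persistent => ... setting wins;
# otherwise persistence (and with it the automatic retry) is only
# enabled for idempotent methods.
sub default_persistent {
    my ($method, %arg) = @_;
    return $arg{persistent} if exists $arg{persistent};
    return $idempotent{uc $method} ? 1 : 0;
}

printf "%s: %d\n", $_, default_persistent($_) for qw(GET POST);
```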

           keepalive => $boolean
               Only used when "persistent" is also true. This parameter
               decides whether "http_request" tries to handshake an
               HTTP/1.0-style keep-alive connection (as opposed to only an
               HTTP/1.1 persistent connection).

               The default is true, except when using a proxy, in which case
               it defaults to false, as HTTP/1.0 proxies cannot support this
               in a meaningful way.

           handle_params => { key => value ... }
               The key-value pairs in this hash will be passed to any
               AnyEvent::Handle constructor that is called - not all requests
               will create a handle, and sometimes more than one is created,
               so this parameter is only good for setting hints.

               Example: set the maximum read size to 4096, to potentially
               conserve memory at the cost of speed.

                  handle_params => {
                     max_read_size => 4096,
                  },

           Example: do a simple HTTP GET request for http://www.nethype.de/
           and print the response body.

              http_request GET => "http://www.nethype.de/", sub {
                 my ($body, $hdr) = @_;
                 print "$body\n";
              };

           Example: do an HTTP HEAD request on https://www.google.com/, use a
           timeout of 30 seconds.

              http_request
                 HEAD => "https://www.google.com",
                 headers => { "user-agent" => "MySearchClient 1.0" },
                 timeout => 30,
                 sub {
                    my ($body, $hdr) = @_;
                    use Data::Dumper;
                    print Dumper $hdr;
                 }
              ;

           Example: do another simple HTTP GET request, but immediately try to
           cancel it.

              my $request = http_request GET => "http://www.nethype.de/", sub {
                 my ($body, $hdr) = @_;
                 print "$body\n";
              };

              undef $request;

DNS CACHING
       AnyEvent::HTTP uses the AnyEvent::Socket::tcp_connect function for the
       actual connection, which in turn uses AnyEvent::DNS to resolve
       hostnames. The latter is a simple stub resolver and does no caching on
       its own. If you want DNS caching, you currently have to provide your
       own default resolver (by storing a suitable resolver object in
       $AnyEvent::DNS::RESOLVER) or your own "tcp_connect" callback.
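
       The "(inexact but fast)" caching idea mentioned for "tcp_connect" can
       be sketched independently of any resolver. This illustration rests on
       several assumptions. make_cached_lookup is hypothetical. The wrapped
       lookup is assumed to follow a $lookup->($host, $cb->(@addrs))
       convention, and something like AnyEvent::DNS::a could be adapted to
       it. Hooking the result into a custom "tcp_connect" is not shown:

```perl
use strict;
use warnings;

# Hypothetical caching wrapper around an asynchronous lookup function
# with the calling convention $lookup->($host, $cb->(@addrs)).
# Successful results are reused for $ttl seconds regardless of the real
# DNS TTL (hence "inexact"); $now is injectable for testing.
sub make_cached_lookup {
    my ($lookup, $ttl, $now) = @_;
    $now ||= sub { time };
    my %cache;    # host => [expiry, @addrs]

    return sub {
        my ($host, $cb) = @_;
        my $entry = $cache{$host};
        if ($entry && $entry->[0] > $now->()) {
            return $cb->(@$entry[1 .. $#$entry]);    # cache hit
        }
        $lookup->($host, sub {
            my @addrs = @_;
            # failures (empty results) are not cached
            $cache{$host} = [$now->() + $ttl, @addrs] if @addrs;
            $cb->(@addrs);
        });
    };
}

# Stand-in resolver that answers immediately and counts its calls.
my $calls = 0;
my $fake  = sub { my ($host, $cb) = @_; $calls++; $cb->("192.0.2.1") };

my $clock  = 1000;
my $cached = make_cached_lookup($fake, 60, sub { $clock });

$cached->("example.org", sub { print "got @_\n" }) for 1 .. 3;
$clock += 61;    # let the entry expire
$cached->("example.org", sub { print "got @_\n" });
print "lookups: $calls\n";    # lookups: 2
```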

GLOBAL FUNCTIONS AND VARIABLES
       AnyEvent::HTTP::set_proxy "proxy-url"
           Sets the default proxy server to use. The proxy-url must begin with
           a string of the form "http://host:port"; the function croaks
           otherwise.

           To clear an already-set proxy, use "undef".

           When AnyEvent::HTTP is loaded for the first time it will query the
           default proxy from the operating system, currently by looking at
           "$ENV{http_proxy}".
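
           A small parser can show the accepted proxy-url shape. It produces
           the [$host, $port, $scheme] triple that the "proxy" request
           parameter takes. This is a minimal sketch only: parse_proxy_url is
           hypothetical and accepts just the "http://host:port" prefix
           described above (no IPv6 literals). The module's own parsing may
           differ:

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical parser: turn "http://host:port[/...]" into the
# [$host, $port, $scheme] triple accepted by the proxy parameter.
sub parse_proxy_url {
    my ($url) = @_;
    my ($scheme, $host, $port) = $url =~ m{^(http)://([^:/]+):(\d+)}i
        or croak "$url: proxy URL must begin with http://host:port";
    return [$host, $port, lc $scheme];
}

my $p = parse_proxy_url("http://proxy.example:3128/");
print "@$p\n";    # proxy.example 3128 http
```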

       AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end]
           Remove all cookies from the cookie jar that have expired. If
           $session_end is given and true, then additionally remove all
           session cookies.

           You should call this function (with a true $session_end) before you
           save cookies to disk, and you should call this function after
           loading them again. If you have a long-running program you can
           additionally call this function from time to time.

           A cookie jar is initially an empty hash-reference that is managed
           by this module. Its format is subject to change, but currently it
           is as follows:

           The key "version" has to contain 1, otherwise the hash gets
           emptied. All other keys are hostnames or IP addresses pointing to
           hash-references. The key for these inner hash references is the
           server path for which this cookie is meant, and the values are
           again hash-references. Each key of those hash-references is a
           cookie name, and the value, you guessed it, is another
           hash-reference, this time with the key-value pairs from the cookie,
           except for "expires" and "max-age", which have been replaced by a
           "_expires" key that contains the cookie expiry timestamp. Session
           cookies are indicated by not having an "_expires" key.

           Here is an example of a cookie jar with a single cookie, so you
           have a chance of understanding the above paragraph:

              {
                 version    => 1,
                 "10.0.0.1" => {
                    "/" => {
                       "mythweb_id" => {
                          _expires => 1293917923,
                          value    => "ooRung9dThee3ooyXooM1Ohm",
                       },
                    },
                 },
              }
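
           The expiry rules can be sketched directly on this layout. This is
           not the module's code. expire_jar is a hypothetical stand-in for
           "cookie_jar_expire" and takes an optional $now for testing.
           Treating a timestamp equal to $now as not yet expired is an
           arbitrary choice:

```perl
use strict;
use warnings;

# Hypothetical re-implementation of the expiry rules over the jar
# layout documented above: host => path => name => { value, _expires? }.
# Removes cookies whose _expires lies in the past and, if $session_end
# is true, also session cookies (those without _expires). Empty path
# and host entries are pruned afterwards.
sub expire_jar {
    my ($jar, $session_end, $now) = @_;
    $now //= time;

    # assumption: a jar with the wrong version is reset to an empty one
    %$jar = (version => 1) unless ($jar->{version} // 0) == 1;

    for my $host (grep { $_ ne "version" } keys %$jar) {
        my $paths = $jar->{$host};
        for my $path (keys %$paths) {
            my $cookies = $paths->{$path};
            for my $name (keys %$cookies) {
                my $exp = $cookies->{$name}{_expires};
                delete $cookies->{$name}
                    if defined $exp ? $exp < $now : $session_end;
            }
            delete $paths->{$path} unless %$cookies;
        }
        delete $jar->{$host} unless %$paths;
    }
}

my $jar = {
    version    => 1,
    "10.0.0.1" => {
        "/" => {
            mythweb_id => { _expires => 1293917923, value => "ooRung9dThee3ooyXooM1Ohm" },
            lang       => { value => "en" },    # session cookie
        },
    },
};

expire_jar($jar, 1, 1293917000);    # session end: "lang" goes
print join(",", sort keys %{ $jar->{"10.0.0.1"}{"/"} }), "\n";    # mythweb_id
expire_jar($jar, 0, 1293918000);    # past _expires: jar is empty
print join(",", sort keys %$jar), "\n";    # version
```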

       $date = AnyEvent::HTTP::format_date $timestamp
           Takes a POSIX timestamp (seconds since the epoch) and formats it as
           an HTTP date (RFC 2616).
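
           Core Perl alone can produce an RFC 1123 style date, the preferred
           HTTP date form. This is a minimal sketch. http_date is a
           hypothetical name, and its output is not claimed to match
           "format_date" byte for byte:

```perl
use strict;
use warnings;

# English names are hard-coded because strftime's %a/%b follow the
# locale, while HTTP dates must use exactly these names.
my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical equivalent of format_date: RFC 1123 form, always GMT.
sub http_date {
    my ($time) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime $time;
    return sprintf "%s, %02d %s %04d %02d:%02d:%02d GMT",
        $DAY[$wday], $mday, $MON[$mon], $year + 1900, $hour, $min, $sec;
}

print http_date(0), "\n";             # Thu, 01 Jan 1970 00:00:00 GMT
print http_date(1293917923), "\n";    # Sat, 01 Jan 2011 21:38:43 GMT
```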

       $timestamp = AnyEvent::HTTP::parse_date $date
           Takes an HTTP date (RFC 2616) or a cookie date (Netscape cookie
           spec) or a bunch of minor variations of those, and returns the
           corresponding POSIX timestamp, or "undef" if the date cannot be
           parsed.
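
           The core Time::Local module can parse the three classic HTTP date
           forms. This is a simplified sketch. parse_http_date is
           hypothetical and handles fewer variations than the description
           above. It maps two-digit years 70-99 to 19xx and 00-69 to 20xx,
           following the rule in RFC 6265:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MON;
@MON{qw(jan feb mar apr may jun jul aug sep oct nov dec)} = 0 .. 11;

# Hypothetical, simplified parse_date. Accepts RFC 1123
# ("Sun, 06 Nov 1994 08:49:37 GMT"), RFC 850 / Netscape cookie
# ("Sunday, 06-Nov-94 08:49:37 GMT") and asctime
# ("Sun Nov  6 08:49:37 1994"); returns undef for anything else.
sub parse_http_date {
    my ($date) = @_;
    return undef unless defined $date;
    my ($d, $m, $y, $H, $M, $S);

    if ($date =~ /(\d{1,2})[- ]([A-Za-z]{3})[- ](\d{2,4})\s+(\d\d):(\d\d):(\d\d)/) {
        ($d, $m, $y, $H, $M, $S) = ($1, $2, $3, $4, $5, $6);
    } elsif ($date =~ /([A-Za-z]{3})\s+(\d{1,2})\s+(\d\d):(\d\d):(\d\d)\s+(\d{4})/) {
        ($m, $d, $H, $M, $S, $y) = ($1, $2, $3, $4, $5, $6);
    } else {
        return undef;
    }

    my $mon = $MON{lc $m};
    return undef unless defined $mon;
    $y += $y < 70 ? 2000 : 1900 if $y < 100;    # two-digit years

    # timegm croaks on out-of-range fields, so trap that as "unparsable"
    my $t = eval { timegm($S, $M, $H, $d, $mon, $y) };
    return $t;
}

print parse_http_date("Sun, 06 Nov 1994 08:49:37 GMT"), "\n";    # 784111777
```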

       $AnyEvent::HTTP::MAX_RECURSE
           The default value for the "recurse" request parameter (default:
           10).

       $AnyEvent::HTTP::TIMEOUT
           The default timeout for connection operations (default: 300
           seconds).

       $AnyEvent::HTTP::USERAGENT
           The default value for the "User-Agent" header (the default is
           "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION;
           +http://software.schmorp.de/pkg/AnyEvent)").

       $AnyEvent::HTTP::MAX_PER_HOST
           The maximum number of concurrent connections to the same host
           (identified by the hostname). If the limit is exceeded, then
           additional requests are queued until previous connections are
           closed. Both persistent and non-persistent connections are counted
           in this limit.

           The default value for this is 4, and it is highly advisable to not
           increase it much.

           For comparison: the RFCs recommend 4 non-persistent or 2
           persistent connections, older browsers used 2, newer ones (such as
           Firefox 3) typically use 6, and Opera uses 8 because, like, they
           have the fastest browser and don't give a shit about everybody
           else on the planet.
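
           The queueing behaviour can be modelled in a few lines. This is a
           toy model, not the module's implementation. make_limiter and the
           $done protocol are hypothetical, and jobs stand in for
           connections:

```perl
use strict;
use warnings;

# Hypothetical model of per-host limiting: at most $max jobs per host
# run at once and the rest wait in a FIFO queue. A job receives a
# $done callback and must call it once its connection is closed.
sub make_limiter {
    my ($max) = @_;
    my (%active, %queue);
    my $pump;

    $pump = sub {
        my ($host) = @_;
        while (($active{$host} // 0) < $max && @{ $queue{$host} || [] }) {
            my $job = shift @{ $queue{$host} };
            $active{$host}++;
            $job->(sub { $active{$host}--; $pump->($host) });
        }
    };

    return sub {
        my ($host, $job) = @_;
        push @{ $queue{$host} }, $job;
        $pump->($host);
    };
}

my $limit = make_limiter(2);
my (@started, @done);
for my $n (1 .. 3) {
    $limit->("example.org", sub { push @started, $n; push @done, $_[0] });
}
print "started: @started\n";    # started: 1 2
(shift @done)->();              # job 1 finishes, so job 3 may start
print "started: @started\n";    # started: 1 2 3
```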

       $AnyEvent::HTTP::PERSISTENT_TIMEOUT
           The time after which idle persistent connections get closed by
           AnyEvent::HTTP (default: 3 seconds).

       $AnyEvent::HTTP::ACTIVE
           The number of active connections. This is not the number of
           currently running requests, but the number of currently open and
           non-idle TCP connections. This number can be useful for
           load-leveling.

SHOWCASE
       This section contains some more elaborate "real-world" examples or code
       snippets.

   HTTP/1.1 FILE DOWNLOAD
       Downloading files with HTTP can be quite tricky, especially when
       something goes wrong and you want to resume.

       Here is a function that initiates and resumes a download. It uses the
       last modified time to check for file content changes. It works with
       many HTTP/1.0 servers as well, and usually falls back to a complete
       re-download on older servers.

       It calls the completion callback with either "undef", which means a
       nonretryable error occurred, 0 when the download was partial and should
       be retried, or 1 if it was successful.

           use AnyEvent::HTTP;
           use Fcntl qw(O_RDWR O_CREAT);

           sub download($$$) {
              my ($url, $file, $cb) = @_;

              # open read-write, creating the file if it does not exist yet
              # (a plain "+<" open fails on the very first download)
              sysopen my $fh, $file, O_RDWR | O_CREAT
                 or die "$file: $!";

              my %hdr;
              my $ofs = 0;

              if (stat $fh and -s _) {
                 # partial file present: request only the missing bytes,
                 # and only if the remote file has not changed since
                 $ofs = -s _;
                 $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9];
                 $hdr{"range"} = "bytes=$ofs-";
              }

              http_get $url,
                 headers   => \%hdr,
                 on_header => sub {
                    my ($hdr) = @_;

                    if ($hdr->{Status} == 200 && $ofs) {
                       # resume failed, start over
                       truncate $fh, $ofs = 0;
                    }

                    sysseek $fh, $ofs, 0;

                    1
                 },
                 on_body   => sub {
                    my ($data, $hdr) = @_;

                    if ($hdr->{Status} =~ /^2/) {
                       syswrite ($fh, $data) == length $data
                          or return; # abort on write errors
                    }

                    1
                 },
                 sub {
                    my (undef, $hdr) = @_;

                    my $status = $hdr->{Status};

                    if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) {
                       utime $time, $time, $fh;
                    }

                    if ($status == 200 || $status == 206 || $status == 416) {
                       # download ok || resume ok || file already fully downloaded
                       $cb->(1, $hdr);

                    } elsif ($status == 412) {
                       # file has changed while resuming, delete and retry
                       unlink $file;
                       $cb->(0, $hdr);

                    } elsif ($status == 500 or $status == 503 or $status =~ /^59/) {
                       # retry later
                       $cb->(0, $hdr);

                    } else {
                       $cb->(undef, $hdr);
                    }
                 }
              ;
           }

           download "http://server/somelargefile", "/tmp/somelargefile", sub {
              if ($_[0]) {
                 print "OK!\n";
              } elsif (defined $_[0]) {
                 print "please retry later\n";
              } else {
                 print "ERROR\n";
              }
           };

   SOCKS PROXIES
       SOCKS proxies are not directly supported by AnyEvent::HTTP. You can
       compile your perl to support SOCKS, or use an external program such as
       socksify (Dante) or tsocks to make your program use a SOCKS proxy
       transparently.

       Alternatively, for AnyEvent::HTTP only, you can use your own
       "tcp_connect" function that does the proxy handshake - here is an
       example that works with SOCKS4a proxies:

           use Errno;
           use AnyEvent::Util;
           use AnyEvent::Socket;
           use AnyEvent::Handle;

           # host, port and username of/for your socks4a proxy
           my $socks_host = "10.0.0.23";
           my $socks_port = 9050;
           my $socks_user = "";

           sub socks4a_connect {
              my ($host, $port, $connect_cb, $prepare_cb) = @_;

              my $hdl = new AnyEvent::Handle
                 connect    => [$socks_host, $socks_port],
                 on_prepare => sub { $prepare_cb->($_[0]{fh}) },
                 on_error   => sub { $connect_cb->() },
              ;

              $hdl->push_write (pack "CCnNZ*Z*", 4, 1, $port, 1, $socks_user, $host);

              $hdl->push_read (chunk => 8, sub {
                 my ($hdl, $chunk) = @_;
                 my ($status, $port, $ipn) = unpack "xCna4", $chunk;

                 if ($status == 0x5a) {
                    $connect_cb->($hdl->{fh}, (format_address $ipn) . ":$port");
                 } else {
                    $! = Errno::ENXIO; $connect_cb->();
                 }
              });

              $hdl
           }

       Use "socks4a_connect" instead of "tcp_connect" when doing
       "http_request"s, possibly after switching off other proxy types:

           AnyEvent::HTTP::set_proxy undef; # usually you do not want other proxies

           http_get 'http://www.google.com', tcp_connect => \&socks4a_connect, sub {
              my ($data, $headers) = @_;
              # ... process $data and $headers here
           };
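
       The byte layout used by "socks4a_connect" can be checked offline,
       without a proxy. This is a minimal sketch. socks4a_request and
       socks4a_reply are hypothetical helpers that reuse the same pack/unpack
       templates as the example above. The reply is fabricated for the
       demonstration:

```perl
use strict;
use warnings;

# Build a SOCKS4a CONNECT request like socks4a_connect does: version 4,
# command 1 (connect), port, the marker address 0.0.0.1 ("hostname
# follows"), then user id and hostname, both NUL-terminated.
sub socks4a_request {
    my ($host, $port, $user) = @_;
    return pack "CCnNZ*Z*", 4, 1, $port, 1, $user, $host;
}

# Decode the 8-byte reply: one ignored byte, status, port, IPv4 address.
sub socks4a_reply {
    my ($chunk) = @_;
    my ($status, $port, $ipn) = unpack "xCna4", $chunk;
    return {
        granted => $status == 0x5a ? 1 : 0,
        port    => $port,
        addr    => join(".", unpack "C4", $ipn),
    };
}

my $req = socks4a_request("www.google.com", 80, "");
print unpack("H*", $req), "\n";

my $r = socks4a_reply(pack "CCnC4", 0, 0x5a, 80, 10, 0, 0, 23);
print "granted=$r->{granted} addr=$r->{addr} port=$r->{port}\n";
```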

SEE ALSO
       AnyEvent.

AUTHOR
           Marc Lehmann <schmorp@schmorp.de>
           http://home.schmorp.de/

       With many thanks to Дмитрий Шалашов, who provided
       countless test cases and bug reports.

POD ERRORS
       Hey! The above document had some coding errors, which are explained
       below:

       Around line 1604:
           Non-ASCII character seen before =encoding in 'Дмитрий'.
           Assuming CP1252



perl v5.30.1                      2020-01-29                           HTTP(3)