LWP::UserAgent(3)     User Contributed Perl Documentation    LWP::UserAgent(3)

NAME
       LWP::UserAgent - Web user agent class

SYNOPSIS
        require LWP::UserAgent;

        my $ua = LWP::UserAgent->new;
        $ua->timeout(10);
        $ua->env_proxy;

        my $response = $ua->get('http://search.cpan.org/');

        if ($response->is_success) {
            print $response->content;  # or whatever
        }
        else {
            die $response->status_line;
        }

DESCRIPTION
       "LWP::UserAgent" is a class implementing a web user agent.
       "LWP::UserAgent" objects can be used to dispatch web requests.

       In normal use the application creates an "LWP::UserAgent" object, and
       then configures it with values for timeouts, proxies, name, etc. It
       then creates an instance of "HTTP::Request" for the request that needs
       to be performed. This request is then passed to one of the request
       methods of the UserAgent, which dispatches it using the relevant
       protocol and returns an "HTTP::Response" object. There are convenience
       methods for sending the most common request types: get(), head() and
       post(). When these methods are used, the creation of the request
       object is hidden, as shown in the synopsis above.

       The basic approach of the library is to use HTTP style communication
       for all protocol schemes. This means that you will construct
       "HTTP::Request" objects and receive "HTTP::Response" objects even for
       non-HTTP resources like gopher and ftp. In order to achieve even more
       similarity to HTTP style communications, gopher menus and file
       directories are converted to HTML documents.
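
       The explicit request/response cycle described above can be sketched
       as follows. This is only an illustration: it fetches a temporary
       local file through a file: URL (built with URI::file, which is an
       assumption of this sketch) so that it runs without network access,
       and it shows that the result is still an "HTTP::Response" object.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use URI::file;
use File::Temp qw(tempfile);

# Create a small local file so the example needs no network access.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "hello from a local file\n";
close $fh or die "close failed: $!";

my $ua = LWP::UserAgent->new;
$ua->timeout(10);

# Build the request object explicitly instead of calling get().
my $url      = URI::file->new($path);    # a file: URL for the temp file
my $request  = HTTP::Request->new(GET => $url);
my $response = $ua->request($request);

# The response is an HTTP::Response even though no HTTP was spoken.
if ($response->is_success) {
    print $response->status_line, "\n";
    print $response->content;
}
else {
    die $response->status_line;
}
```

       The same code works for an http: URL; only the string passed to
       HTTP::Request->new changes.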

CONSTRUCTOR METHODS
       The following constructor methods are available:

       $ua = LWP::UserAgent->new( %options )
           This method constructs a new "LWP::UserAgent" object and returns
           it. Key/value pair arguments may be provided to set up the initial
           state. The following options correspond to attribute methods
           described below:

              KEY                     DEFAULT
              -----------             --------------------
              agent                   "libwww-perl/#.##"
              from                    undef
              conn_cache              undef
              cookie_jar              undef
              default_headers         HTTP::Headers->new
              max_size                undef
              max_redirect            7
              parse_head              1
              protocols_allowed       undef
              protocols_forbidden     undef
              requests_redirectable   ['GET', 'HEAD']
              timeout                 180

           The following additional options are also accepted: If the
           "env_proxy" option is passed in with a TRUE value, then proxy
           settings are read from environment variables (see env_proxy()
           method below). If the "keep_alive" option is passed in, then a
           "LWP::ConnCache" is set up (see conn_cache() method below). The
           "keep_alive" value is passed on as the "total_capacity" for the
           connection cache.
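
           A minimal sketch of passing options to the constructor. The
           product token and the chosen values are made up for the
           example; the option names are the ones listed above.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Attribute options plus the extra keep_alive option described above.
my $ua = LWP::UserAgent->new(
    agent        => 'MyApp/0.1',    # hypothetical product token
    timeout      => 30,             # seconds, instead of the default 180
    max_redirect => 3,
    keep_alive   => 4,              # total_capacity of the connection cache
);

printf "agent:   %s\n", $ua->agent;
printf "timeout: %d\n", $ua->timeout;
printf "cache:   %s\n", ref $ua->conn_cache;
```

           Each attribute can still be changed later through the attribute
           method of the same name.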

       $ua->clone
           Returns a copy of the LWP::UserAgent object.

ATTRIBUTES
       The settings of the configuration attributes modify the behaviour of
       the "LWP::UserAgent" when it dispatches requests. Most of these can
       also be initialized by options passed to the constructor method.

       The following attribute methods are provided. The attribute value is
       left unchanged if no argument is given. The return value from each
       method is the old attribute value.

       $ua->agent
       $ua->agent( $product_id )
           Get/set the product token that is used to identify the user agent
           on the network. The agent value is sent as the "User-Agent" header
           in the requests. The default is the string returned by the
           _agent() method (see below).

           If the $product_id ends with space then the _agent() string is
           appended to it.

           The user agent string should be one or more simple product
           identifiers with an optional version number separated by the "/"
           character. Examples are:

               $ua->agent('Checkbot/0.4 ' . $ua->_agent);
               $ua->agent('Checkbot/0.4 ');    # same as above
               $ua->agent('Mozilla/5.0');
               $ua->agent("");                 # don't identify

       $ua->_agent
           Returns the default agent identifier. This is a string of the form
           "libwww-perl/#.##", where "#.##" is substituted with the version
           number of this library.

       $ua->from
       $ua->from( $email_address )
           Get/set the e-mail address for the human user who controls the
           requesting user agent. The address should be machine-usable, as
           defined in RFC 822. The "from" value is sent as the "From" header
           in the requests. Example:

               $ua->from('gaas@cpan.org');

           The default is to not send a "From" header. See the
           default_headers() method for the more general interface that
           allows any header to be defaulted.

       $ua->cookie_jar
       $ua->cookie_jar( $cookie_jar_obj )
           Get/set the cookie jar object to use. The only requirement is that
           the cookie jar object must implement the extract_cookies($response)
           and add_cookie_header($request) methods. These methods will then
           be invoked by the user agent as requests are sent and responses are
           received. Normally this will be a "HTTP::Cookies" object or some
           subclass.

           The default is to have no cookie_jar, i.e. never automatically add
           "Cookie" headers to the requests.

           Shortcut: If a reference to a plain hash is passed in as the
           $cookie_jar_obj, then it is replaced with an instance of
           "HTTP::Cookies" that is initialized based on the hash. This form
           also automatically loads the "HTTP::Cookies" module. It means
           that:

               $ua->cookie_jar({ file => "$ENV{HOME}/.cookies.txt" });

           is really just a shortcut for:

               require HTTP::Cookies;
               $ua->cookie_jar(HTTP::Cookies->new(file => "$ENV{HOME}/.cookies.txt"));

       $ua->default_headers
       $ua->default_headers( $headers_obj )
           Get/set the headers object that will provide default header values
           for any requests sent. By default this will be an empty
           "HTTP::Headers" object. Example:

               $ua->default_headers->push_header('Accept-Language' => "no, en");

       $ua->default_header( $field )
       $ua->default_header( $field => $value )
           This is just a short-cut for $ua->default_headers->header( $field
           => $value ). Example:

               $ua->default_header('Accept-Language' => "no, en");

       $ua->conn_cache
       $ua->conn_cache( $cache_obj )
           Get/set the "LWP::ConnCache" object to use. See LWP::ConnCache for
           details.

       $ua->credentials( $netloc, $realm, $uname, $pass )
           Set the user name and password to be used for a realm. It is often
           more useful to specialize the get_basic_credentials() method
           instead.
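
           A small sketch of storing credentials and reading them back
           through the base get_basic_credentials() method. The host,
           realm, user name and password are invented, and the "host:port"
           form used for $netloc is an assumption of this example rather
           than something stated above.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use URI;

my $ua = LWP::UserAgent->new;

# Hypothetical host, realm, user name and password.
# The $netloc is assumed to take the "host:port" form.
$ua->credentials('www.example.com:80', 'Members Area', 'alice', 's3cret');

# The base get_basic_credentials() looks up what was stored above.
my ($user, $pass) = $ua->get_basic_credentials(
    'Members Area', URI->new('http://www.example.com/private/'), 0,
);
print "would authenticate as $user\n";
```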

       $ua->max_size
       $ua->max_size( $bytes )
           Get/set the size limit for response content. The default is
           "undef", which means that there is no limit. If the returned
           response content is only partial, because the size limit was
           exceeded, then a "Client-Aborted" header will be added to the
           response. The content might end up longer than "max_size" as we
           abort once appending a chunk of data makes the length exceed the
           limit. The "Content-Length" header, if present, will indicate the
           length of the full content and will normally not be the same as
           "length($res->content)".
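
           The check for a truncated response might look like the sketch
           below. It reads a temporary local file through a file: URL so
           that no network is needed; that the file protocol applies
           "max_size" in the same way as HTTP is an assumption of this
           example.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use URI::file;
use File::Temp qw(tempfile);

# A local 10000-byte file stands in for a large remote document.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} 'x' x 10_000;
close $fh or die "close failed: $!";

my $ua = LWP::UserAgent->new;
$ua->max_size(100);    # give up once more than 100 bytes have arrived

my $res = $ua->get(URI::file->new($path)->as_string);

if (defined $res->header('Client-Aborted')) {
    # Truncated: the content may still be longer than max_size.
    printf "partial content, %d bytes kept\n", length $res->content;
}
else {
    printf "complete content, %d bytes\n", length $res->content;
}
```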

       $ua->max_redirect
       $ua->max_redirect( $n )
           This reads or sets the object's limit of how many times it will
           obey redirection responses in a given request cycle.

           By default, the value is 7. This means that if you call the
           request() method and the response is a redirect elsewhere which is
           in turn a redirect, and so on seven times, then LWP gives up after
           that seventh request.

       $ua->parse_head
       $ua->parse_head( $boolean )
           Get/set a value indicating whether we should initialize response
           headers from the <head> section of HTML documents. The default is
           TRUE. Do not turn this off, unless you know what you are doing.

       $ua->protocols_allowed
       $ua->protocols_allowed( \@protocols )
           This reads (or sets) this user agent's list of protocols that the
           request methods will exclusively allow. The protocol names are
           case insensitive.

           For example: "$ua->protocols_allowed( [ 'http', 'https'] );" means
           that this user agent will allow only those protocols, and attempts
           to use this user agent to access URLs with any other schemes (like
           "ftp://...") will result in a 500 error.

           To delete the list, call: "$ua->protocols_allowed(undef)"

           By default, an object has neither a "protocols_allowed" list, nor a
           "protocols_forbidden" list.

           Note that having a "protocols_allowed" list causes any
           "protocols_forbidden" list to be ignored.
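
           A short sketch of the effect of an allow list. The ftp host name
           is invented; the expectation that the request is refused locally,
           without contacting any server, is an assumption of this example.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->protocols_allowed( [ 'http', 'https' ] );

# ftp is not on the list, so this object reports it as unsupported.
my $ftp_ok = $ua->is_protocol_supported('ftp');
print 'ftp is ', ($ftp_ok ? 'allowed' : 'not allowed'), "\n";

# A request for a scheme outside the list yields an error response.
my $res = $ua->get('ftp://ftp.example.com/README');
print $res->status_line, "\n" if $res->is_error;

# Remove the restriction again.
$ua->protocols_allowed(undef);
```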

       $ua->protocols_forbidden
       $ua->protocols_forbidden( \@protocols )
           This reads (or sets) this user agent's list of protocols that the
           request methods will not allow. The protocol names are case
           insensitive.

           For example: "$ua->protocols_forbidden( [ 'file', 'mailto'] );"
           means that this user agent will not allow those protocols, and
           attempts to use this user agent to access URLs with those schemes
           will result in a 500 error.

           To delete the list, call: "$ua->protocols_forbidden(undef)"

       $ua->requests_redirectable
       $ua->requests_redirectable( \@requests )
           This reads or sets the object's list of request names that
           "$ua->redirect_ok(...)" will allow redirection for. By default,
           this is "['GET', 'HEAD']", as per RFC 2616. To change to include
           'POST', consider:

               push @{ $ua->requests_redirectable }, 'POST';

       $ua->timeout
       $ua->timeout( $secs )
           Get/set the timeout value in seconds. The default timeout() value
           is 180 seconds, i.e. 3 minutes.

           The request is aborted if no activity on the connection to the
           server is observed for "timeout" seconds. This means that the time
           it takes for the complete transaction and the request() method to
           actually return might be longer.

   Proxy attributes

       The following methods set up when requests should be passed via a proxy
       server.

       $ua->proxy(\@schemes, $proxy_url)
       $ua->proxy($scheme, $proxy_url)
           Set/retrieve the proxy URL for a scheme:

               $ua->proxy(['http', 'ftp'], 'http://proxy.sn.no:8001/');
               $ua->proxy('gopher', 'http://proxy.sn.no:8001/');

           The first form specifies that the URL is to be used for proxying
           the access methods listed in the first method argument, i.e.
           'http' and 'ftp'.

           The second form is a shorthand for specifying the proxy URL for
           a single access scheme.

       $ua->no_proxy( $domain, ... )
           Do not proxy requests to the given domains. Calling no_proxy
           without any domains clears the list of domains. E.g.:

               $ua->no_proxy('localhost', 'no', ...);

       $ua->env_proxy
           Load proxy settings from *_proxy environment variables. You might
           specify proxies like this (sh-syntax):

               gopher_proxy=http://proxy.my.place/
               wais_proxy=http://proxy.my.place/
               no_proxy="localhost,my.domain"
               export gopher_proxy wais_proxy no_proxy

           csh or tcsh users should use the "setenv" command to define these
           environment variables.

           On systems with case insensitive environment variables there exists
           a name clash between the CGI environment variables and the
           "HTTP_PROXY" environment variable normally picked up by
           env_proxy(). Because of this "HTTP_PROXY" is not honored for CGI
           scripts. The "CGI_HTTP_PROXY" environment variable can be used
           instead.

REQUEST METHODS
       The methods described in this section are used to dispatch requests via
       the user agent. The following request methods are provided:

       $ua->get( $url )
       $ua->get( $url , $field_name => $value, ... )
           This method will dispatch a "GET" request on the given $url.
           Further arguments can be given to initialize the headers of the
           request. These are given as separate name/value pairs. The return
           value is a response object. See HTTP::Response for a description
           of the interface it provides.

           Field names that start with ":" are special. These will not
           initialize headers of the request but will determine how the
           response content is treated. The following special field names are
           recognized:

               :content_file   => $filename
               :content_cb     => \&callback
               :read_size_hint => $bytes

           If a $filename is provided with the ":content_file" option, then
           the response content will be saved here instead of in the response
           object. If a callback is provided with the ":content_cb" option
           then this function will be called for each chunk of the response
           content as it is received from the server. If neither of these
           options is given, then the response content will accumulate in the
           response object itself. This might not be suitable for very large
           response bodies. Only one of ":content_file" or ":content_cb" can
           be specified. The content of unsuccessful responses will always
           accumulate in the response object itself, regardless of the
           ":content_file" or ":content_cb" options passed in.

           The ":read_size_hint" option is passed to the protocol module which
           will try to read data from the server in chunks of this size. A
           smaller value for the ":read_size_hint" will result in a higher
           number of callback invocations.

           The callback function is called with 3 arguments: a chunk of data,
           a reference to the response object, and a reference to the protocol
           object. The callback can abort the request by invoking die(). The
           exception message will show up as the "X-Died" header field in the
           response returned by the get() function.
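
           The ":content_cb" and ":read_size_hint" options can be sketched
           as follows. To stay runnable without a network the example reads
           a temporary local file through a file: URL, which is a choice
           made for this sketch only; the callback merely counts bytes.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use URI::file;
use File::Temp qw(tempfile);

# A local file keeps the sketch runnable without a network connection.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "line $_\n" for 1 .. 1000;
close $fh or die "close failed: $!";

my $ua = LWP::UserAgent->new;
my ($bytes, $calls) = (0, 0);

my $res = $ua->get(
    URI::file->new($path)->as_string,
    ':read_size_hint' => 1024,
    ':content_cb'     => sub {
        my ($chunk, $response, $protocol) = @_;
        $bytes += length $chunk;    # process the chunk instead of storing it
        $calls++;
    },
);

die $res->status_line unless $res->is_success;
print "received $bytes bytes in $calls callback calls\n";
```

           Replacing the ":content_cb" pair with ':content_file' =>
           $filename would write the same data to a file instead.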

       $ua->head( $url )
       $ua->head( $url , $field_name => $value, ... )
           This method will dispatch a "HEAD" request on the given $url.
           Otherwise it works like the get() method described above.

       $ua->post( $url, \%form )
       $ua->post( $url, \@form )
       $ua->post( $url, \%form, $field_name => $value, ... )
           This method will dispatch a "POST" request on the given $url, with
           %form or @form providing the key/value pairs for the fill-in form
           content. Additional headers and content options are the same as for
           the get() method.

           This method will use the POST() function from
           "HTTP::Request::Common" to build the request. See
           HTTP::Request::Common for details on how to pass form content and
           other advanced features.
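
           To see what kind of request is built for a form, the POST()
           function can be called directly, as in this sketch. The URL and
           the form fields are invented for the example, and nothing is
           sent over the network.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Hypothetical form URL and fields. An array reference keeps field order.
my $req = POST 'http://www.example.com/search',
    [ q => 'perl', lang => 'en' ];

print $req->method, ' ', $req->uri, "\n";
print $req->header('Content-Type'), "\n";
print $req->content, "\n";

# Sending it would be: my $res = LWP::UserAgent->new->request($req);
```

           Calling $ua->post( $url, \@form ) builds such a request and
           dispatches it in one step.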

       $ua->mirror( $url, $filename )
           This method will get the document identified by $url and store it
           in a file called $filename. If the file already exists, then the
           request will contain an "If-Modified-Since" header matching the
           modification time of the file. If the document on the server has
           not changed since this time, then nothing happens. If the document
           has been updated, it will be downloaded again. The modification
           time of the file will be forced to match that of the server.

           The return value is the response object.
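
           A sketch of calling mirror() twice on the same document. A file
           in a scratch directory plays the part of the remote document and
           is addressed through a file: URL, which is an assumption made to
           keep the example self-contained. With an unchanged document the
           second call typically reports "304 Not Modified".

```perl
use strict;
use warnings;
use LWP::UserAgent;
use URI::file;
use File::Temp qw(tempdir);

# Work in a scratch directory; a local file plays the remote document.
my $dir    = tempdir(CLEANUP => 1);
my $source = "$dir/source.txt";
my $copy   = "$dir/copy.txt";

open my $out, '>', $source or die "cannot write $source: $!";
print {$out} "mirrored content\n";
close $out or die "close failed: $!";

my $ua  = LWP::UserAgent->new;
my $url = URI::file->new($source)->as_string;

# First call: $copy does not exist yet, so the document is fetched.
my $first = $ua->mirror($url, $copy);
print 'first:  ', $first->status_line, "\n";

# Second call: sends If-Modified-Since based on the mtime of $copy.
my $second = $ua->mirror($url, $copy);
print 'second: ', $second->status_line, "\n";
```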

       $ua->request( $request )
       $ua->request( $request, $content_file )
       $ua->request( $request, $content_cb )
       $ua->request( $request, $content_cb, $read_size_hint )
           This method will dispatch the given $request object. Normally this
           will be an instance of the "HTTP::Request" class, but any object
           with a similar interface will do. The return value is a response
           object. See HTTP::Request and HTTP::Response for a description of
           the interface provided by these classes.

           The request() method will process redirects and authentication
           responses transparently. This means that it may actually send
           several simple requests via the simple_request() method described
           below.

           The request methods described above, get(), head(), post() and
           mirror(), will all dispatch the request they build via this
           method. They are convenience methods that simply hide the creation
           of the request object for you.

           The $content_file, $content_cb and $read_size_hint all correspond
           to options described with the get() method above.

           You are allowed to use a CODE reference as "content" in the request
           object passed in. The "content" function should return the content
           when called. The content can be returned in chunks. The content
           function will be invoked repeatedly until it returns an empty
           string to signal that there is no more content.

       $ua->simple_request( $request )
       $ua->simple_request( $request, $content_file )
       $ua->simple_request( $request, $content_cb )
       $ua->simple_request( $request, $content_cb, $read_size_hint )
           This method dispatches a single request and returns the response
           received. Arguments are the same as for request() described above.

           The difference from request() is that simple_request() will not try
           to handle redirects or authentication responses. The request()
           method will in fact invoke this method for each simple request it
           sends.

       $ua->is_protocol_supported( $scheme )
           You can use this method to test whether this user agent object
           supports the specified "scheme". (The "scheme" might be a string
           (like 'http' or 'ftp') or it might be a URI object reference.)

           Whether a scheme is supported is determined by the user agent's
           "protocols_allowed" or "protocols_forbidden" lists (if any), and by
           the capabilities of LWP. I.e., this will return TRUE only if LWP
           supports this protocol and it's permitted for this particular
           object.

   Callback methods

       The following methods will be invoked as requests are processed. These
       methods are documented here because subclasses of "LWP::UserAgent"
       might want to override their behaviour.

       $ua->prepare_request( $request )
           This method is invoked by simple_request(). Its task is to modify
           the given $request object by setting up various headers based on
           the attributes of the user agent. The return value should normally
           be the $request object passed in. If a different request object is
           returned it will be the one actually processed.

           The headers affected by the base implementation are: "User-Agent",
           "From", "Range" and "Cookie".

       $ua->redirect_ok( $prospective_request, $response )
           This method is called by request() before it tries to follow a
           redirection to the request in $response. This should return a TRUE
           value if this redirection is permissible. The $prospective_request
           will be the request to be sent if this method returns TRUE.

           The base implementation will return FALSE unless the method is in
           the object's "requests_redirectable" list, FALSE if the proposed
           redirection is to a "file://..." URL, and TRUE otherwise.

       $ua->get_basic_credentials( $realm, $uri, $isproxy )
           This is called by request() to retrieve credentials for documents
           protected by Basic or Digest Authentication. The arguments passed
           in are the $realm provided by the server, the $uri requested and a
           boolean flag to indicate if this is authentication against a proxy
           server.

           The method should return a username and password. It should return
           an empty list to abort the authentication resolution attempt.
           Subclasses can override this method to prompt the user for the
           information. An example of this can be found in the "lwp-request"
           program distributed with this library.

           The base implementation simply checks a set of pre-stored member
           variables, set up with the credentials() method.
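
           Overriding this method in a subclass could look like the sketch
           below. The package name, the lookup table and its contents are
           all hypothetical; a real program might prompt the user instead.

```perl
package My::UserAgent;    # hypothetical subclass name
use strict;
use warnings;
use parent 'LWP::UserAgent';

# Hypothetical lookup table: realm => [ user name, password ].
my %login_for = ( 'Members Area' => [ 'alice', 's3cret' ] );

sub get_basic_credentials {
    my ($self, $realm, $uri, $isproxy) = @_;
    return if $isproxy;                      # empty list: no proxy logins
    my $login = $login_for{$realm} or return;
    return @$login;                          # (user name, password)
}

package main;

my $ua = My::UserAgent->new;
my ($user, $pass) = $ua->get_basic_credentials('Members Area', undef, 0);
my @unknown       = $ua->get_basic_credentials('Other Realm', undef, 0);

print "user: $user\n";
print 'unknown realm returns ', scalar(@unknown), " values\n";
```

           During a real request the method is called by request() itself,
           so application code does not normally invoke it directly.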

SEE ALSO
       See LWP for a complete overview of libwww-perl5. See lwpcook and the
       scripts lwp-request and lwp-download for examples of usage.

       See HTTP::Request and HTTP::Response for a description of the message
       objects dispatched and received. See HTTP::Request::Common and
       HTML::Form for other ways to build request objects.

       See WWW::Mechanize and WWW::Search for examples of more specialized
       user agents based on "LWP::UserAgent".

COPYRIGHT
       Copyright 1995-2004 Gisle Aas.

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.



perl v5.8.8                       2004-04-06                 LWP::UserAgent(3)