1WWW::Mechanize(3) User Contributed Perl Documentation WWW::Mechanize(3)
2
3
4
6 WWW::Mechanize - Handy web browsing in a Perl object
7
9 version 1.88
10
12 WWW::Mechanize supports performing a sequence of page fetches including
13 following links and submitting forms. Each fetched page is parsed and
14 its links and forms are extracted. A link or a form can be selected,
15 form fields can be filled and the next page can be fetched. Mech also
16 stores a history of the URLs you've visited, which can be queried and
17 revisited.
18
19 use WWW::Mechanize ();
20 my $mech = WWW::Mechanize->new();
21
22 $mech->get( $url );
23
24 $mech->follow_link( n => 3 );
25 $mech->follow_link( text_regex => qr/download this/i );
26 $mech->follow_link( url => 'http://host.com/index.html' );
27
28 $mech->submit_form(
29 form_number => 3,
30 fields => {
31 username => 'mungo',
32 password => 'lost-and-alone',
33 }
34 );
35
36 $mech->submit_form(
37 form_name => 'search',
38 fields => { query => 'pot of gold', },
39 button => 'Search Now'
40 );
41
43 "WWW::Mechanize", or Mech for short, is a Perl module for stateful
44 programmatic web browsing, used for automating interaction with
45 websites.
46
47 Features include:
48
49 · All HTTP methods
50
51 · High-level hyperlink and HTML form support, without having to parse
52 HTML yourself
53
54 · SSL support
55
56 · Automatic cookies
57
58 · Custom HTTP headers
59
60 · Automatic handling of redirections
61
62 · Proxies
63
64 · HTTP authentication
65
66 Mech is well suited for use in testing web applications. If you use
67 one of the Test::* modules, like Test::HTML::Lint, you can check the
68 fetched content and use that as input to a test call.
69
70 use Test::More;
71 like( $mech->content(), qr/$expected/, "Got expected content" );
72
73 Each page fetch stores its URL in a history stack which you can
74 traverse.
75
76 $mech->back();
77
78 If you want finer control over your page fetching, you can use these
79 methods. "follow_link" and "submit_form" are just high level wrappers
80 around them.
81
82 $mech->find_link( n => $number );
83 $mech->form_number( $number );
84 $mech->form_name( $name );
85 $mech->field( $name, $value );
86 $mech->set_fields( %field_values );
87 $mech->set_visible( @criteria );
88 $mech->click( $button );
89
90 WWW::Mechanize is a proper subclass of LWP::UserAgent and you can also
91 use any of LWP::UserAgent's methods.
92
93 $mech->add_header($name => $value);
94
95 Please note that Mech does NOT support JavaScript; you need additional
96 software for that. Please check "JavaScript" in WWW::Mechanize::FAQ for
97 more.
98
100 · <https://github.com/libwww-perl/WWW-Mechanize/issues>
101
102 The queue for bugs & enhancements in WWW::Mechanize. Please note
103 that the queue at <http://rt.cpan.org> is no longer maintained.
104
105 · <https://metacpan.org/pod/WWW::Mechanize>
106
107 The CPAN documentation page for Mechanize.
108
109 · <https://metacpan.org/pod/distribution/WWW-Mechanize/lib/WWW/Mechanize/FAQ.pod>
110
111 Frequently asked questions. Make sure you read here FIRST.
112
114 new()
115 Creates and returns a new WWW::Mechanize object, hereafter referred to
116 as the "agent".
117
118 my $mech = WWW::Mechanize->new();
119
120 The constructor for WWW::Mechanize overrides two of the parms to the
121 LWP::UserAgent constructor:
122
123 agent => 'WWW-Mechanize/#.##'
124 cookie_jar => {} # an empty, memory-only HTTP::Cookies object
125
126 You can override these overrides by passing parms to the constructor,
127 as in:
128
129 my $mech = WWW::Mechanize->new( agent => 'wonderbot 1.01' );
130
131 If you want none of the overhead of a cookie jar, or don't want your
132 bot accepting cookies, you have to explicitly disallow it, like so:
133
134 my $mech = WWW::Mechanize->new( cookie_jar => undef );
135
136 Here are the parms that WWW::Mechanize recognizes. These do not
137 include parms that LWP::UserAgent recognizes.
138
139 · "autocheck => [0|1]"
140
141 Checks each request made to see if it was successful. This saves
142 you the trouble of manually checking yourself. Any errors found
143 are errors, not warnings.
144
145 The default value is ON, unless it's being subclassed, in which
146 case it is OFF. This means that standalone WWW::Mechanize
147 instances have autocheck turned on, which is protective for the
148 vast majority of Mech users who don't bother checking the return
149 value of get() and post() and can't figure why their code fails.
150 However, if WWW::Mechanize is subclassed, such as for
151 Test::WWW::Mechanize or Test::WWW::Mechanize::Catalyst, this may
152 not be an appropriate default, so it's off.
153
154 · "noproxy => [0|1]"
155
156 Turn off the automatic call to the LWP::UserAgent "env_proxy"
157 function.
158
159 This needs to be explicitly turned off if you're using
160 Crypt::SSLeay to access a https site via a proxy server. Note: you
161 still need to set your HTTPS_PROXY environment variable as
162 appropriate.
163
164 · "onwarn => \&func"
165
166 Reference to a "warn"-compatible function, such as "Carp::carp",
167 that is called when a warning needs to be shown.
168
169 If this is set to "undef", no warnings will ever be shown.
170 However, it's probably better to use the "quiet" method to control
171 that behavior.
172
173 If this value is not passed, Mech uses "Carp::carp" if Carp is
174 installed, or "CORE::warn" if not.
175
176 · "onerror => \&func"
177
178 Reference to a "die"-compatible function, such as "Carp::croak",
179 that is called when there's a fatal error.
180
181 If this is set to "undef", no errors will ever be shown.
182
183 If this value is not passed, Mech uses "Carp::croak" if Carp is
184 installed, or "CORE::die" if not.
185
186 · "quiet => [0|1]"
187
188 Don't complain on warnings. Setting "quiet => 1" is the same as
189 calling "$mech->quiet(1)". Default is off.
190
191 · "stack_depth => $value"
192
193 Sets the depth of the page stack that keeps track of all the
194 downloaded pages. Default is effectively infinite stack size. If
195 the stack is eating up your memory, then set this to a smaller
196 number, say 5 or 10. Setting this to zero means Mech will keep no
197 history.
198
199 To support forms, WWW::Mechanize's constructor pushes POST on to the
200 agent's "requests_redirectable" list (see also LWP::UserAgent.)
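
The constructor parameters described above can be combined in a single call. The following is a minimal sketch rather than part of the original documentation: the stack depth of 5 and the temporary local page (used so the example needs no network access) are assumptions chosen for illustration. With "autocheck" off, the caller is responsible for checking "success()" after each request.

```perl
use strict;
use warnings;
use WWW::Mechanize ();
use File::Temp qw(tempdir);
use URI::file ();

# Write a throwaway page so the example needs no network access.
my $dir  = tempdir( CLEANUP => 1 );
my $page = "$dir/index.html";
open my $fh, '>', $page or die "Cannot write $page: $!";
print {$fh} '<html><head><title>Home</title></head><body>hi</body></html>';
close $fh;

my $mech = WWW::Mechanize->new(
    agent       => 'wonderbot 1.01',  # replaces the default agent string
    cookie_jar  => undef,             # no cookie handling at all
    autocheck   => 0,                 # failed requests return instead of dying
    stack_depth => 5,                 # keep at most 5 pages of history
);

# With autocheck off, check each request yourself.
$mech->get( URI::file->new_abs($page) );
print "first fetch succeeded\n" if $mech->success;

# A missing file yields a failed response rather than an exception.
$mech->get( URI::file->new_abs("$dir/missing.html") );
print 'second fetch failed with status ', $mech->status, "\n"
    unless $mech->success;
```

Leaving "autocheck" at its default would make the second get() die instead, which is usually what a standalone script wants.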
201
202 $mech->agent_alias( $alias )
203 Sets the user agent string to the expanded version from a table of
204 actual user agent strings. $alias can be one of the following:
205
206 · Windows IE 6
207
208 · Windows Mozilla
209
210 · Mac Safari
211
212 · Mac Mozilla
213
214 · Linux Mozilla
215
216 · Linux Konqueror
217
218 The user agent string is then replaced with the full string for that alias. For instance,
219
220 $mech->agent_alias( 'Windows IE 6' );
221
222 sets your User-Agent to
223
224 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
225
226 The list of valid aliases can be returned from "known_agent_aliases()".
240
241 known_agent_aliases()
242 Returns a list of all the agent aliases that Mech knows about.
243
245 $mech->get( $uri )
246 Given a URL/URI, fetches it. Returns an HTTP::Response object. $uri
247 can be a well-formed URL string, a URI object, or a
248 WWW::Mechanize::Link object.
249
250 The results are stored internally in the agent object, but you don't
251 know that. Just use the accessors listed below. Poking at the
252 internals is deprecated and subject to change in the future.
253
254 "get()" is a well-behaved overloaded version of the method in
255 LWP::UserAgent. This lets you do things like
256
257 $mech->get( $uri, ':content_file' => $tempfile );
258
259 and you can rest assured that the parms will get filtered down
260 appropriately.
261
262 NOTE: Because ":content_file" causes the page contents to be stored in
263 a file instead of the response object, some Mech functions that expect
264 it to be there won't work as expected. Use with caution.
265
266 $mech->post( $uri, content => $content )
267 POSTs $content to $uri. Returns an HTTP::Response object. $uri can be
268 a well-formed URI string, a URI object, or a WWW::Mechanize::Link
269 object.
270
271 $mech->put( $uri, content => $content )
272 PUTs $content to $uri. Returns an HTTP::Response object. $uri can be
273 a well-formed URI string, a URI object, or a WWW::Mechanize::Link
274 object.
275
276 $mech->reload()
277 Acts like the reload button in a browser: repeats the current request.
278 The history (as per the back() method) is not altered.
279
280 Returns the HTTP::Response object from the reload, or "undef" if
281 there's no current request.
282
283 $mech->back()
284 The equivalent of hitting the "back" button in a browser. Returns to
285 the previous page. Won't go back past the first page. (Really, what
286 would it do if it could?)
287
288 Returns true if it could go back, or false if not.
289
290 $mech->history_count()
291 This returns the number of items in the browser history. This number
292 does include the most recently made request.
293
294 $mech->history($n)
295 This returns the nth item in history. The 0th item is the most recent
296 request and response, which would be acted on by methods like
297 "find_link()". The 1st item is the state you'd return to if you called
298 "back()".
299
300 The maximum useful value for $n is "$mech->history_count - 1".
301 Requests beyond that bound will return "undef".
302
303 History items are returned as hash references, in the form:
304
305 { req => $http_request, res => $http_response }
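
The history methods can be exercised together. This is a small sketch under stated assumptions: the two page names ("first" and "second") and the temporary directory are made up for illustration, and local files stand in for a real site.

```perl
use strict;
use warnings;
use WWW::Mechanize ();
use File::Temp qw(tempdir);
use URI::file ();

# Create two local pages so the example runs without a network.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(first second)) {
    open my $fh, '>', "$dir/$name.html" or die "Cannot write $name.html: $!";
    print {$fh} "<html><head><title>$name</title></head><body>$name</body></html>";
    close $fh;
}

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs("$dir/first.html") );
$mech->get( URI::file->new_abs("$dir/second.html") );

# Walk the history from the most recent entry (index 0) backwards.
for my $n ( 0 .. $mech->history_count - 1 ) {
    my $item = $mech->history($n);
    printf "%d: %s (status %s)\n", $n, $item->{req}->uri, $item->{res}->code;
}

# back() returns true while there is an earlier page to return to.
my $went_back = $mech->back();
print 'now at ', $mech->uri, "\n" if $went_back;
```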
306
308 $mech->success()
309 Returns a boolean telling whether the last request was successful. If
310 there hasn't been an operation yet, returns false.
311
312 This is a convenience function that wraps "$mech->res->is_success".
313
314 $mech->uri()
315 Returns the current URI as a URI object. This object stringifies to the
316 URI itself.
317
318 $mech->response() / $mech->res()
319 Returns the current response as an HTTP::Response object.
320
321 "$mech->res()" is a synonym for "$mech->response()".
322
323 $mech->status()
324 Returns the HTTP status code of the response. This is a 3-digit number
325 like 200 for OK, 404 for not found, and so on.
326
327 $mech->ct() / $mech->content_type()
328 Returns the content type of the response.
329
330 $mech->base()
331 Returns the base URI for the current response.
332
333 $mech->forms()
334 When called in a list context, returns a list of the forms found in the
335 last fetched page. In a scalar context, returns a reference to an array
336 with those forms. The forms returned are all HTML::Form objects.
337
338 $mech->current_form()
339 Returns the current form as an HTML::Form object.
340
341 $mech->links()
342 When called in a list context, returns a list of the links found in the
343 last fetched page. In a scalar context it returns a reference to an
344 array with those links. Each link is a WWW::Mechanize::Link object.
345
346 $mech->is_html()
347 Returns true/false on whether our content is HTML, according to the
348 HTTP headers.
349
350 $mech->title()
351 Returns the contents of the "<TITLE>" tag, as parsed by
352 HTML::HeadParser. Returns undef if the content is not HTML.
353
355 $mech->content(...)
356 Returns the content that the mech uses internally for the last page
357 fetched. Ordinarily this is the same as
358 "$mech->response()->decoded_content()", but this may differ for HTML
359 documents if update_html is overloaded (in which case the value passed
360 to the base-class implementation of same will be returned), and/or
361 extra named arguments are passed to content():
362
363 $mech->content( format => 'text' )
364 Returns a text-only version of the page, with all HTML markup
365 stripped. This feature requires HTML::TreeBuilder to be installed, or
366 a fatal error will be thrown. This works only if the contents are
367 HTML.
368
369 $mech->content( base_href => [$base_href|undef] )
370 Returns the HTML document, modified to contain a "<base
371 href="$base_href">" mark-up in the header. $base_href is
372 "$mech->base()" if not specified. This is handy to pass the HTML to
373 e.g. HTML::Display. This works only if the contents are HTML.
374
375 $mech->content( raw => 1 )
376 Returns "$self->response()->content()", i.e. the raw contents from
377 the response.
378
379 $mech->content( decoded_by_headers => 1 )
380 Returns the content after applying all "Content-Encoding" headers but
381 with no additional mangling.
382
383 $mech->content( charset => $charset )
384 Returns "$self->response()->decoded_content(charset => $charset)"
385 (see HTTP::Response for details).
386
387 To preserve backwards compatibility, additional parameters will be
388 ignored unless none of "raw | decoded_by_headers | charset" is
389 specified and the text is HTML, in which case an error will be
390 triggered.
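
A short sketch of three of the content() variants follows. The local test page and the base URL "http://example.com/" are placeholders invented for this example. The "format => 'text'" variant is left out because it depends on HTML::TreeBuilder being installed.

```perl
use strict;
use warnings;
use WWW::Mechanize ();
use File::Temp qw(tempdir);
use URI::file ();

# A local HTML page stands in for a fetched document.
my $dir  = tempdir( CLEANUP => 1 );
my $page = "$dir/page.html";
open my $fh, '>', $page or die "Cannot write $page: $!";
print {$fh} '<html><head><title>Demo</title></head><body>Hello</body></html>';
close $fh;

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($page) );

my $decoded = $mech->content();             # content as Mech uses it internally
my $raw     = $mech->content( raw => 1 );   # raw contents from the response

# Ask for a copy with a <base href> element added to the header.
my $based = $mech->content( base_href => 'http://example.com/' );

print "raw length: ", length($raw), "\n";
print "base element added\n" if $based =~ m{<base href="http://example.com/">};
```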
391
392 $mech->text()
393 Returns the text of the current HTML content. If the content isn't
394 HTML, $mech will die.
395
396 The text is extracted by parsing the content, and then the extracted
397 text is cached, so don't worry about performance of calling this
398 repeatedly.
399
401 $mech->links()
402 Lists all the links on the current page. Each link is a
403 WWW::Mechanize::Link object. In list context, returns a list of all
404 links. In scalar context, returns an array reference of all links.
405
406 $mech->follow_link(...)
407 Follows a specified link on the page. You specify the match to be
408 found using the same parms that "find_link()" uses.
409
410 Here are some examples:
411
412 · 3rd link called "download"
413
414 $mech->follow_link( text => 'download', n => 3 );
415
416 · first link where the URL has "download" in it, regardless of case:
417
418 $mech->follow_link( url_regex => qr/download/i );
419
420 or
421
422 $mech->follow_link( url_regex => qr/(?i:download)/ );
423
424 · 3rd link on the page
425
426 $mech->follow_link( n => 3 );
427
428 · the link with the url
429
430 $mech->follow_link( url => '/other/page' );
431
432 or
433
434 $mech->follow_link( url => 'http://example.com/page' );
435
436 Returns the result of the GET method (an HTTP::Response object) if a
437 link was found. If the page has no links, or the specified link
438 couldn't be found, returns undef.
439
440 $mech->find_link( ... )
441 Finds a link in the currently fetched page. It returns a
442 WWW::Mechanize::Link object which describes the link. (You'll probably
443 be most interested in the "url()" property.) If it fails to find a
444 link it returns undef.
445
446 You can take the URL part and pass it to the "get()" method. If that's
447 your plan, you might as well use the "follow_link()" method directly,
448 since it does the "get()" for you automatically.
449
450 Note that "<FRAME SRC="...">" tags are parsed out of the HTML and
451 treated as links so this method works with them.
452
453 You can select which link to find by passing in one or more of these
454 key/value pairs:
455
456 · "text => 'string'," and "text_regex => qr/regex/,"
457
458 "text" matches the text of the link against string, which must be
459 an exact match. To select a link with text that is exactly
460 "download", use
461
462 $mech->find_link( text => 'download' );
463
464 "text_regex" matches the text of the link against regex. To select
465 a link with text that has "download" anywhere in it, regardless of
466 case, use
467
468 $mech->find_link( text_regex => qr/download/i );
469
470 Note that the text extracted from the page's links is trimmed.
471 For example, "<a> foo </a>" is stored as 'foo', and searching for
472 leading or trailing spaces will fail.
473
474 · "url => 'string'," and "url_regex => qr/regex/,"
475
476 Matches the URL of the link against string or regex, as
477 appropriate. The URL may be a relative URL, like foo/bar.html,
478 depending on how it's coded on the page.
479
480 · "url_abs => string" and "url_abs_regex => regex"
481
482 Matches the absolute URL of the link against string or regex, as
483 appropriate. The URL will be an absolute URL, even if it's
484 relative in the page.
485
486 · "name => string" and "name_regex => regex"
487
488 Matches the name of the link against string or regex, as
489 appropriate.
490
491 · "id => string" and "id_regex => regex"
492
493 Matches the attribute 'id' of the link against string or regex, as
494 appropriate.
495
496 · "class => string" and "class_regex => regex"
497
498 Matches the attribute 'class' of the link against string or regex,
499 as appropriate.
500
501 · "tag => string" and "tag_regex => regex"
502
503 Matches the tag that the link came from against string or regex, as
504 appropriate. The "tag_regex" is probably most useful to check for
505 more than one tag, as in:
506
507 $mech->find_link( tag_regex => qr/^(a|frame)$/ );
508
509 The tags and attributes looked at are defined below.
510
511 If "n" is not specified, it defaults to 1. Therefore, if you don't
512 specify any parms, this method defaults to finding the first link on
513 the page.
514
515 Note that you can specify multiple text or URL parameters, which will
516 be ANDed together. For example, to find the first link with text of
517 "News" and with "cnn.com" in the URL, use:
518
519 $mech->find_link( text => 'News', url_regex => qr/cnn\.com/ );
520
521 The return value is a WWW::Mechanize::Link object describing the
522 matching link, or undef if no link matches.
523
524 The links come from the following:
525
526 "<a href=...>"
527 "<area href=...>"
528 "<frame src=...>"
529 "<iframe src=...>"
530 "<link href=...>"
531 "<meta content=...>"
532
533 $mech->find_all_links( ... )
534 Returns all the links on the current page that match the criteria. The
535 method for specifying link criteria is the same as in "find_link()".
536 Each of the links returned is a WWW::Mechanize::Link object.
537
538 In list context, "find_all_links()" returns a list of the links.
539 Otherwise, it returns a reference to the list of links.
540
541 "find_all_links()" with no parameters returns all links in the page.
542
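
The link criteria can be tried out against a small local page. This is an illustrative sketch only: the page, its three links, and the file names are assumptions made up for the example.

```perl
use strict;
use warnings;
use WWW::Mechanize ();
use File::Temp qw(tempdir);
use URI::file ();

# A local page with three links stands in for a real site.
my $dir  = tempdir( CLEANUP => 1 );
my $page = "$dir/links.html";
open my $fh, '>', $page or die "Cannot write $page: $!";
print {$fh} <<'HTML';
<html><head><title>Links</title></head><body>
<a href="docs/intro.html"> Introduction </a>
<a href="files/manual.pdf">download</a>
<a href="files/extra.pdf">Download extras</a>
</body></html>
HTML
close $fh;

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($page) );

# Exact text match: only the link whose text is exactly "download".
my $exact = $mech->find_link( text => 'download' );

# Link text is trimmed, so search without the surrounding spaces.
my $intro = $mech->find_link( text => 'Introduction' );

# Second link whose text contains "download", regardless of case.
my $second = $mech->find_link( text_regex => qr/download/i, n => 2 );

# Every link whose URL (as written in the page) ends in ".pdf".
my @pdfs = $mech->find_all_links( url_regex => qr/\.pdf$/ );

print 'relative: ', $exact->url,     "\n";
print 'absolute: ', $exact->url_abs, "\n";
print 'pdf links: ', scalar(@pdfs),  "\n";
```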
543 $mech->find_all_inputs( ... criteria ... )
544 find_all_inputs() returns an array of all the input controls in the
545 current form whose properties match all of the regexes passed in. The
546 controls returned are all descended from HTML::Form::Input. See
547 "INPUTS" in HTML::Form for details.
548
549 If no criteria are passed, all inputs will be returned.
550
551 If there is no current page, there is no form on the current page, or
552 there are no submit controls in the current form then the return will
553 be an empty array.
554
555 You may use a regex or a literal string:
556
557 # get all textarea controls whose names begin with "customer"
558 my @customer_text_inputs = $mech->find_all_inputs(
559 type => 'textarea',
560 name_regex => qr/^customer/,
561 );
562
563 # get all text or textarea controls called "customer"
564 my @customer_text_inputs = $mech->find_all_inputs(
565 type_regex => qr/^(text|textarea)$/,
566 name => 'customer',
567 );
568
569 $mech->find_all_submits( ... criteria ... )
570 "find_all_submits()" does the same thing as "find_all_inputs()" except
571 that it only returns controls that are submit controls, ignoring other
572 types of input controls like text and checkboxes.
573
575 $mech->images
576 Lists all the images on the current page. Each image is a
577 WWW::Mechanize::Image object. In list context, returns a list of all
578 images. In scalar context, returns an array reference of all images.
579
580 $mech->find_image()
581 Finds an image in the current page. It returns a WWW::Mechanize::Image
582 object which describes the image. If it fails to find an image it
583 returns undef.
584
585 You can select which image to find by passing in one or more of these
586 key/value pairs:
587
588 · "alt => 'string'" and "alt_regex => qr/regex/,"
589
590 "alt" matches the ALT attribute of the image against string, which
591 must be an exact match. To select an image with an ALT attribute that is
592 exactly "download", use
593
594 $mech->find_image( alt => 'download' );
595
596 "alt_regex" matches the ALT attribute of the image against a
597 regular expression. To select an image with an ALT attribute that
598 has "download" anywhere in it, regardless of case, use
599
600 $mech->find_image( alt_regex => qr/download/i );
601
602 · "url => 'string'," and "url_regex => qr/regex/,"
603
604 Matches the URL of the image against string or regex, as
605 appropriate. The URL may be a relative URL, like foo/bar.html,
606 depending on how it's coded on the page.
607
608 · "url_abs => string" and "url_abs_regex => regex"
609
610 Matches the absolute URL of the image against string or regex, as
611 appropriate. The URL will be an absolute URL, even if it's
612 relative in the page.
613
614 · "tag => string" and "tag_regex => regex"
615
616 Matches the tag that the image came from against string or regex,
617 as appropriate. The "tag_regex" is probably most useful to check
618 for more than one tag, as in:
619
620 $mech->find_image( tag_regex => qr/^(img|input)$/ );
621
622 The tags supported are "<img>" and "<input>".
623
624 If "n" is not specified, it defaults to 1. Therefore, if you don't
625 specify any parms, this method defaults to finding the first image on
626 the page.
627
628 Note that you can specify multiple ALT or URL parameters, which will be
629 ANDed together. For example, to find the first image with ALT text of
630 "News" and with "cnn.com" in the URL, use:
631
632 $mech->find_image( alt => 'News', url_regex => qr/cnn\.com/ );
633
634 The return value is a WWW::Mechanize::Image object describing the
635 matching image, or undef if no image matches.
636
637 $mech->find_all_images( ... )
638 Returns all the images on the current page that match the criteria.
639 The method for specifying image criteria is the same as in
640 "find_image()". Each of the images returned is a WWW::Mechanize::Image
641 object.
642
643 In list context, "find_all_images()" returns a list of the images.
644 Otherwise, it returns a reference to the list of images.
645
646 "find_all_images()" with no parameters returns all images in the page.
647
649 These methods let you work with the forms on a page. The idea is to
650 choose a form that you'll later work with using the field methods
651 below.
652
653 $mech->forms
654 Lists all the forms on the current page. Each form is an HTML::Form
655 object. In list context, returns a list of all forms. In scalar
656 context, returns an array reference of all forms.
657
658 $mech->form_number($number)
659 Selects the numberth form on the page as the target for subsequent
660 calls to "field()" and "click()". Also returns the form that was
661 selected.
662
663 If it is found, the form is returned as an HTML::Form object and set
664 internally for later use with Mech's form methods such as "field()" and
665 "click()". When called in a list context, the number of the found form
666 is also returned as a second value.
667
668 Emits a warning and returns undef if no form is found.
669
670 The first form is number 1, not zero.
671
672 $mech->form_name( $name )
673 Selects a form by name. If there is more than one form on the page
674 with that name, then the first one is used, and a warning is generated.
675
676 If it is found, the form is returned as an HTML::Form object and set
677 internally for later use with Mech's form methods such as "field()" and
678 "click()".
679
680 Returns undef if no form is found.
681
682 $mech->form_id( $id )
683 Selects a form by ID. If there is more than one form on the page with
684 that ID, then the first one is used, and a warning is generated.
685
686 If it is found, the form is returned as an HTML::Form object and set
687 internally for later use with Mech's form methods such as "field()" and
688 "click()".
689
690 If no form is found it returns "undef". This will also trigger a
691 warning, unless "quiet" is enabled.
692
693 $mech->all_forms_with_fields( @fields )
694 Selects a form by passing in a list of field names it must contain.
695 All matching forms (perhaps none) are returned as a list of HTML::Form
696 objects.
697
698 $mech->form_with_fields( @fields )
699 Selects a form by passing in a list of field names it must contain. If
700 there is more than one form on the page that matches, then the
701 first one is used, and a warning is generated.
702
703 If it is found, the form is returned as an HTML::Form object and set
704 internally for later use with Mech's form methods such as "field()"
705 and "click()".
706
707 Returns undef and emits a warning if no form is found.
708
709 Note that this functionality requires libwww-perl 5.69 or higher.
710
711 $mech->all_forms_with( $attr1 => $value1, $attr2 => $value2, ... )
712 Searches for forms with arbitrary attribute/value pairs within the
713 <form> tag. (Currently does not work for attribute "action" due to
714 implementation details of HTML::Form.) When given more than one pair,
715 all criteria must match. Using "undef" as value means that the
716 attribute in question must not be present.
717
718 All matching forms (perhaps none) are returned as a list of HTML::Form
719 objects.
720
721 $mech->form_with( $attr1 => $value1, $attr2 => $value2, ... )
722 Searches for forms with arbitrary attribute/value pairs within the
723 <form> tag. (Currently does not work for attribute "action" due to
724 implementation details of HTML::Form.) When given more than one pair,
725 all criteria must match. Using "undef" as value means that the
726 attribute in question must not be present.
727
728 If it is found, the form is returned as an HTML::Form object and set
729 internally for later use with Mech's form methods such as "field()"
730 and "click()".
731
732 Returns undef if no form is found.
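
The form selectors above can be compared side by side on a page with two forms. This is a sketch under stated assumptions: the form names, ids, field names and "http://example.com" actions are invented for illustration, and nothing is submitted.

```perl
use strict;
use warnings;
use WWW::Mechanize ();
use File::Temp qw(tempdir);
use URI::file ();

# A local page with two forms; nothing here is ever submitted.
my $dir  = tempdir( CLEANUP => 1 );
my $page = "$dir/forms.html";
open my $fh, '>', $page or die "Cannot write $page: $!";
print {$fh} <<'HTML';
<html><head><title>Forms</title></head><body>
<form name="search" id="f-search" action="http://example.com/search">
  <input type="text" name="query">
  <input type="submit" name="go" value="Search Now">
</form>
<form name="login" id="f-login" method="post" action="http://example.com/login">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="submit" value="Log in">
</form>
</body></html>
HTML
close $fh;

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($page) );

my $second = $mech->form_number(2);        # forms are numbered from 1
my $search = $mech->form_name('search');   # select by name attribute
my $by_id  = $mech->form_id('f-login');    # select by id attribute

# Returns every matching form without changing the current form.
my @named = $mech->all_forms_with( name => 'login' );

# Select by the fields a form contains; this also sets the current form.
my $login = $mech->form_with_fields( 'username', 'password' );

print 'current form: ', $mech->current_form->attr('name'), "\n";
```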
733
735 These methods allow you to set the values of fields in a given form.
736
737 $mech->field( $name, $value, $number )
738 $mech->field( $name, \@values, $number )
739 Given the name of a field, set its value to the value specified. This
740 applies to the current form (as set by the "form_name()" or
741 "form_number()" method or defaulting to the first form on the page).
742
743 The optional $number parameter is used to distinguish between two
744 fields with the same name. The fields are numbered from 1.
745
746 $mech->select($name, $value)
747 $mech->select($name, \@values)
748 Given the name of a "select" field, set its value to the value
749 specified. If the field is not "<select multiple>" and the $value is
750 an array, only the first value will be set. [Note: the documentation
751 previously claimed that only the last value would be set, but this was
752 incorrect.] Passing $value as a hash with an "n" key selects an item
753 by number (e.g. "{n => 3}" or "{n => [2,4]}"). The numbering starts
754 at 1. This applies to the current form.
755
756 If you have a field with "<select multiple>" and you pass a single
757 $value, then $value will be added to the list of fields selected,
758 without clearing the others. However, if you pass an array reference,
759 then all previously selected values will be cleared.
760
761 Returns true on successfully setting the value. On failure, returns
762 false and calls "$self->warn()" with an error message.
763
764 $mech->set_fields( $name => $value ... )
765 This method sets multiple fields of the current form. It takes a list
766 of field name and value pairs. If there is more than one field with the
767 same name, the first one found is set. If you want to select which of
768 the duplicate fields to set, use a value which is an anonymous array
769 which has the field value and its number as the 2 elements.
770
771 # set the second field named $name to 'foo'
772 $mech->set_fields( $name => [ 'foo', 2 ] );
773
774 The fields are numbered from 1.
775
776 This applies to the current form.
777
778 $mech->set_visible( @criteria )
779 This method sets fields of the current form without having to know
780 their names. So if you have a login screen that wants a username and
781 password, you do not have to fetch the form and inspect the source (or
782 use the mech-dump utility, installed with WWW::Mechanize) to see what
783 the field names are; you can just say
784
785 $mech->set_visible( $username, $password );
786
787 and the first and second fields will be set accordingly. The method is
788 called set_visible because it acts only on visible fields; hidden form
789 inputs are not considered. The order of the fields is the order in
790 which they appear in the HTML source which is nearly always the order
791 anyone viewing the page would think they are in, but some creative work
792 with tables could change that; caveat user.
793
794 Each element in @criteria is either a field value or a field specifier.
795 A field value is a scalar. A field specifier allows you to specify the
796 type of input field you want to set and is denoted with an arrayref
797 containing two elements. So you could specify the first radio button
798 with
799
800 $mech->set_visible( [ radio => 'KCRW' ] );
801
802 Field values and specifiers can be intermixed, hence
803
804 $mech->set_visible( 'fred', 'secret', [ option => 'Checking' ] );
805
806 would set the first two fields to "fred" and "secret", and the next
807 "OPTION" menu field to "Checking".
808
809 The possible field specifier types are: "text", "password", "hidden",
810 "textarea", "file", "image", "submit", "radio", "checkbox" and
811 "option".
812
813 "set_visible" returns the number of values set.
814
815 $mech->tick( $name, $value [, $set] )
816 "Ticks" the first checkbox that has both the name and value associated
817 with it on the current form. Dies if there is no named check box for
818 that value. Passing in a false value as the third optional argument
819 will cause the checkbox to be unticked.
820
821 $mech->untick($name, $value)
822 Causes the checkbox to be unticked. Shorthand for
823 "tick($name,$value,undef)"
824
825 $mech->value( $name [, $number] )
826 Given the name of a field, return its value. This applies to the
827 current form.
828
829 The optional $number parameter is used to distinguish between two
830 fields with the same name. The fields are numbered from 1.
831
832 If the field is of type file (file upload field), the value is always
833 cleared to prevent remote sites from downloading your local files. To
834 upload a file, specify its file name explicitly.
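
The field methods can be checked without submitting anything, by setting values and reading them back with "value()". The following is a minimal sketch: the "prefs" form and its field names are hypothetical and exist only for this example.

```perl
use strict;
use warnings;
use WWW::Mechanize ();
use File::Temp qw(tempdir);
use URI::file ();

# A local form with a duplicated field name, a select and a checkbox.
my $dir  = tempdir( CLEANUP => 1 );
my $page = "$dir/prefs.html";
open my $fh, '>', $page or die "Cannot write $page: $!";
print {$fh} <<'HTML';
<html><head><title>Prefs</title></head><body>
<form name="prefs" method="post" action="http://example.com/prefs">
  <input type="text" name="nick">
  <input type="text" name="tag">
  <input type="text" name="tag">
  <select name="color">
    <option value="red">Red</option>
    <option value="blue">Blue</option>
  </select>
  <input type="checkbox" name="news" value="yes">
  <input type="submit" value="Save">
</form>
</body></html>
HTML
close $fh;

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($page) );
$mech->form_name('prefs');

$mech->field( nick => 'mungo' );              # single field by name
$mech->set_fields( tag => [ 'perl', 2 ] );    # second field named "tag"
$mech->select( color => 'blue' );             # choose an option by value
$mech->tick( news => 'yes' );                 # check the checkbox

# Read the values back; the form has not been submitted.
print 'nick:  ', $mech->value('nick'),     "\n";
print 'tag 2: ', $mech->value( 'tag', 2 ), "\n";
print 'color: ', $mech->value('color'),    "\n";
print 'news:  ', $mech->value('news'),     "\n";
```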
835
836 $mech->click( $button [, $x, $y] )
837 Has the effect of clicking a button on the current form. The first
838 argument is the name of the button to be clicked. The second and third
839 arguments (optional) allow you to specify the (x,y) coordinates of the
840 click.
841
842 If there is only one button on the form, "$mech->click()" with no
843 arguments simply clicks that one button.
844
845 Returns an HTTP::Response object.
846
847 $mech->click_button( ... )
848 Has the effect of clicking a button on the current form by specifying
849 its name, value, or index. Its arguments are a list of key/value
850 pairs. Exactly one of name, id, number, input or value must be specified in
851 the keys.
852
853 · "name => name"
854
855 Clicks the button named name in the current form.
856
857 · "id => id"
858
859 Clicks the button with the id id in the current form.
860
861 · "number => n"
862
863 Clicks the nth button in the current form. Numbering starts at 1.
864
865 · "value => value"
866
867 Clicks the button with the value value in the current form.
868
869 · "input => $inputobject"
870
871 Clicks on the button referenced by $inputobject, an instance of
872 HTML::Form::SubmitInput obtained e.g. from
873
874 $mech->current_form()->find_input( undef, 'submit' )
875
876 $inputobject must belong to the current form.
877
878 · "x => x"
879
880 · "y => y"
881
882 These arguments (optional) allow you to specify the (x,y)
883 coordinates of the click.
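
When debugging, it can help to see what a click would send before sending it. The sketch below does not call Mech's "click()" or "click_button()"; instead it calls the "click()" method of the underlying HTML::Form object, which builds an HTTP::Request without dispatching it. The "search" form, the "go" button and the "http://example.com" action are assumptions made up for the example.

```perl
use strict;
use warnings;
use WWW::Mechanize ();
use File::Temp qw(tempdir);
use URI::file ();

# A local page with a search form; the action URL is never contacted.
my $dir  = tempdir( CLEANUP => 1 );
my $page = "$dir/search.html";
open my $fh, '>', $page or die "Cannot write $page: $!";
print {$fh} <<'HTML';
<html><head><title>Search</title></head><body>
<form name="search" action="http://example.com/search">
  <input type="text" name="query">
  <input type="submit" name="go" value="Search Now">
</form>
</body></html>
HTML
close $fh;

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($page) );

$mech->form_name('search');
$mech->field( query => 'gold' );

# HTML::Form's click() returns the request object instead of sending it.
my $request = $mech->current_form->click('go');
print $request->method, ' ', $request->uri, "\n";
```

Once the request looks right, "$mech->click('go')" or "$mech->click_button( name => 'go' )" performs the actual submission.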
884
885 $mech->submit()
886 Submits the current form, without specifying a button to click.
887 Actually, no button is clicked at all.
888
889 Returns an HTTP::Response object.
890
891 This used to be a synonym for "$mech->click( 'submit' )", but is no
892 longer so.
893
894 $mech->submit_form( ... )
895 This method lets you select a form from the previously fetched page,
896 fill in its fields, and submit it. It combines the
897 "form_number"/"form_name", "set_fields" and "click" methods into one
898 higher level call. Its arguments are a list of key/value pairs, all of
899 which are optional.
900
901 · "fields => \%fields"
902
903 Specifies the fields to be filled in the current form.
904
905 · "with_fields => \%fields"
906
907 Probably all you need for the common case. It combines a smart form
908 selector and data setting in one operation. It selects the first
909 form that contains all fields mentioned in "\%fields". This is
910 nice because you don't need to know the name or number of the form
911 to do this.
912
913 (calls "form_with_fields()" and "set_fields()").
914
915 If you choose "with_fields", the "fields" option will be ignored.
916 The "form_number", "form_name" and "form_id" options will still be
917 used. An exception will be thrown unless exactly one form matches
918 all of the provided criteria.
919
920 · "form_number => n"
921
922 Selects the nth form (calls "form_number()"). If this parameter is
923 not specified, the currently-selected form is used.
924
925 · "form_name => name"
926
927 Selects the form named name (calls "form_name()")
928
929 · "form_id => ID"
930
931 Selects the form with ID ID (calls "form_id()")
932
933 · "button => button"
934
935 Clicks on button button (calls "click()")
936
937 · "x => x, y => y"
938
939 Sets the x or y values for "click()"
940
941 · "strict_forms => bool"
942
943 Sets the HTML::Form strict flag which causes form submission to
944 croak if any of the passed fields don't exist on the page, and/or a
945 value doesn't exist in a select element. HTML::Form defaults this
946 value to false.
947
948 If no form is selected, the first form found is used.
949
950 If button is not passed, then the "submit()" method is used instead.
951
952 If you want to submit a file and get its content from a scalar rather
953 than a file in the filesystem, you can use:
954
955 $mech->submit_form(with_fields => { logfile => [ [ undef, 'whatever', Content => $content ], 1 ] } );
956
957 Returns an HTTP::Response object.
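
A rough, offline sketch of "with_fields" picking the right form on a
page that has more than one. The page is read from a temporary file
and a "request_send" handler (a standard LWP::UserAgent hook) stands
in for the server; the host "example.invalid", the form layout and the
field names are all made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );
use HTTP::Response ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical page: a search form followed by a login form.
my ( $fh, $path ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print {$fh} <<'HTML';
<html><body>
<form name="search" action="http://example.invalid/search" method="get">
  <input type="text"   name="q">
  <input type="submit" name="go" value="Search">
</form>
<form name="login" action="http://example.invalid/login" method="post">
  <input type="text"     name="username">
  <input type="password" name="password">
  <input type="submit"   name="login" value="Log in">
</form>
</body></html>
HTML
close $fh or die "close: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($path)->as_string );

# Stand-in for a real server, for demonstration only: report the
# request line and echo the submitted data.
$mech->add_handler(
    request_send => sub {
        my ($request) = @_;
        my $body = join "\n",
            $request->method . ' ' . $request->uri,
            $request->content;
        return HTTP::Response->new( 200, 'OK',
            [ 'Content-Type' => 'text/plain' ], $body );
    },
    m_scheme => 'http',
);

# Only the second form has both fields, so it is the one selected.
my $response = $mech->submit_form(
    with_fields => {
        username => 'mungo',
        password => 'lost-and-alone',
    },
    button => 'login',
);
my $echo = $mech->content;

print "$echo\n";
```

Because the form is found by its fields, the script keeps working if
the site later reorders or renames its forms.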
958
960 $mech->add_header( name => $value [, name => $value... ] )
961 Sets HTTP headers for the agent to add or remove from the HTTP request.
962
963 $mech->add_header( Encoding => 'text/klingon' );
964
965 If a value is "undef", then that header will be removed from any future
966 requests. For example, to never send a Referer header:
967
968 $mech->add_header( Referer => undef );
969
970 If you want to delete a header, use "delete_header".
971
972 Returns the number of name/value pairs added.
973
974 NOTE: This method was very different in WWW::Mechanize before 1.00.
975 Back then, the headers were stored in a package hash, not as a member
976 of the object instance. Calling "add_header()" would modify the
977 headers for every WWW::Mechanize object, even after your object no
978 longer existed.
979
980 $mech->delete_header( name [, name ... ] )
981 Removes HTTP headers from the agent's list of special headers. For
982 instance, you might need to do something like:
983
984 # Don't send a Referer for this URL
985 $mech->add_header( Referer => undef );
986
987 # Get the URL
988 $mech->get( $url );
989
990 # Back to the default behavior
991 $mech->delete_header( 'Referer' );
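
To see the effect of "add_header" and "delete_header" on outgoing
requests without contacting a server, the sketch below uses a
"request_send" handler (a standard LWP::UserAgent hook) to record what
would have been sent. The header name "X-Trace-Id" and the host
"example.invalid" are made up for illustration.

```perl
use strict;
use warnings;
use HTTP::Response ();
use WWW::Mechanize ();

my $mech = WWW::Mechanize->new();

# Stand-in for a real server, for demonstration only: record the header
# we care about and answer locally without touching the network.
my @seen;
$mech->add_handler(
    request_send => sub {
        my ($request) = @_;
        push @seen, scalar $request->header('X-Trace-Id');
        return HTTP::Response->new( 200, 'OK',
            [ 'Content-Type' => 'text/plain' ], 'ok' );
    },
);

# One name/value pair is added, so add_header() returns 1.
my $added = $mech->add_header( 'X-Trace-Id' => 'abc123' );
$mech->get('http://example.invalid/first');

# After delete_header() the header is no longer sent.
$mech->delete_header('X-Trace-Id');
$mech->get('http://example.invalid/second');

printf "first: %s, second: %s\n",
    map { defined $_ ? $_ : '(not sent)' } @seen;
```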
992
993 $mech->quiet(true/false)
994 Allows you to suppress warnings to the screen.
995
996 $mech->quiet(0); # turns on warnings (the default)
997 $mech->quiet(1); # turns off warnings
998 $mech->quiet(); # returns the current quietness status
999
1000 $mech->stack_depth( $max_depth )
1001 Get or set the page stack depth. Use this if you're doing a lot of page
1002 scraping and running out of memory.
1003
1004 A value of 0 means "no history at all." By default, the max stack
1005 depth is humongously large, effectively keeping all history.
1006
1007 $mech->save_content( $filename, %opts )
1008 Dumps the contents of "$mech->content" into $filename. $filename will
1009 be overwritten. Dies if there are any errors.
1010
1011 If the content type does not begin with "text/", then the content is
1012 saved in binary mode (i.e. "binmode()" is set on the output
1013 filehandle).
1014
1015 Additional arguments can be passed as key/value pairs:
1016
1017 $mech->save_content( $filename, binary => 1 )
1018 Filehandle is set with "binmode" to ":raw" and contents are taken by
1019 calling "$self->content(decoded_by_headers => 1)". Same as calling:
1020
1021 $mech->save_content( $filename, binmode => ':raw',
1022 decoded_by_headers => 1 );
1023
1024 This should be the safest way to save contents verbatim.
1025
1026 $mech->save_content( $filename, binmode => $binmode )
1027 Filehandle is set to binary mode. If $binmode begins with ':', it
1028 is passed as a parameter to "binmode":
1029
1030 binmode $fh, $binmode;
1031
1032 otherwise the filehandle is set to binary mode if $binmode is true:
1033
1034 binmode $fh;
1035
1036 all other arguments
1037 are passed as-is to "$mech->content(%opts)". In particular,
1038 "decoded_by_headers" might come in handy if you want to revert the
1039 effect of line compression performed by the web server but without
1040 further interpreting the contents (e.g. decoding it according to
1041 the charset).
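
A small sketch of saving a page verbatim with "binary => 1". The page
is served from a temporary directory through a file: URL so the
example runs offline; the file names and page contents are made up.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );
use URI::file ();
use WWW::Mechanize ();

my $dir    = tempdir( CLEANUP => 1 );
my $source = "$dir/page.html";
my $copy   = "$dir/copy.html";

# Hypothetical page to fetch, written to a temporary directory.
my $html = "<html><head><title>Saved</title></head><body>hi</body></html>\n";
open my $out, '>:raw', $source or die "open $source: $!";
print {$out} $html;
close $out or die "close $source: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($source)->as_string );

# binary => 1 is shorthand for binmode => ':raw', decoded_by_headers => 1.
$mech->save_content( $copy, binary => 1 );

# Read the copy back so it can be compared with what was served.
open my $in, '<:raw', $copy or die "open $copy: $!";
my $saved = do { local $/; <$in> };
close $in or die "close $copy: $!";

print length($saved), " bytes saved\n";
```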
1042
1043 $mech->dump_headers( [$fh] )
1044 Prints a dump of the HTTP response headers for the most recent
1045 response. If $fh is not specified or is undef, it dumps to STDOUT.
1046
1047 Unlike the rest of the dump_* methods, $fh can be a scalar. It will be
1048 used as a file name.
1049
1050 $mech->dump_links( [[$fh], $absolute] )
1051 Prints a dump of the links on the current page to $fh. If $fh is not
1052 specified or is undef, it dumps to STDOUT.
1053
1054 If $absolute is true, links displayed are absolute, not relative.
1055
1056 $mech->dump_images( [[$fh], $absolute] )
1057 Prints a dump of the images on the current page to $fh. If $fh is not
1058 specified or is undef, it dumps to STDOUT.
1059
1060 If $absolute is true, the image URLs displayed are absolute, not relative.
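
Since the dump methods accept any filehandle, their output can be
captured in a string instead of going to STDOUT. This sketch loads a
made-up page from a temporary file and dumps its links and images into
in-memory filehandles; the page contents are purely illustrative.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );
use URI::file ();
use WWW::Mechanize ();

# Hypothetical page with two links and one image.
my ( $tmp, $path ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print {$tmp} <<'HTML';
<html><body>
<a href="docs/intro.html">Intro</a>
<a href="http://example.com/faq">FAQ</a>
<img src="logo.png" alt="logo">
</body></html>
HTML
close $tmp or die "close: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($path)->as_string );

# Dump into in-memory filehandles instead of STDOUT.
open my $links_fh, '>', \my $links or die "open: $!";
$mech->dump_links($links_fh);    # URLs as written in the page
close $links_fh or die "close: $!";

open my $images_fh, '>', \my $images or die "open: $!";
$mech->dump_images($images_fh);
close $images_fh or die "close: $!";

print $links, $images;
```

Passing a true second argument, as in "$mech->dump_links( $links_fh, 1 )",
would resolve each URL against the page's base URL first.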
1061
1062 $mech->dump_forms( [$fh] )
1063 Prints a dump of the forms on the current page to $fh. If $fh is not
1064 specified or is undef, it dumps to STDOUT. Running the following:
1065
1066 my $mech = WWW::Mechanize->new();
1067 $mech->get("https://www.google.com/");
1068 $mech->dump_forms;
1069
1070 will print:
1071
1072 GET https://www.google.com/search [f]
1073 ie=ISO-8859-1 (hidden readonly)
1074 hl=en (hidden readonly)
1075 source=hp (hidden readonly)
1076 biw= (hidden readonly)
1077 bih= (hidden readonly)
1078 q= (text)
1079 btnG=Google Search (submit)
1080 btnI=I'm Feeling Lucky (submit)
1081 gbv=1 (hidden readonly)
1082
1083 $mech->dump_text( [$fh] )
1084 Prints a dump of the text on the current page to $fh. If $fh is not
1085 specified or is undef, it dumps to STDOUT.
1086
1088 $mech->clone()
1089 Clone the mech object. The clone will be using the same cookie jar as
1090 the original mech.
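
A short sketch that checks the sharing described above: the clone is a
separate object, but both objects hold a reference to the same cookie
jar.

```perl
use strict;
use warnings;
use Scalar::Util qw( refaddr );
use WWW::Mechanize ();

my $mech  = WWW::Mechanize->new();
my $clone = $mech->clone();

# Two distinct agents...
my $distinct = refaddr($mech) != refaddr($clone);

# ...sharing one cookie jar, so a cookie stored through either agent
# is visible to the other.
my $shared_jar
    = refaddr( $mech->cookie_jar ) == refaddr( $clone->cookie_jar );

print 'distinct objects: ',  ( $distinct   ? 'yes' : 'no' ), "\n";
print 'shared cookie jar: ', ( $shared_jar ? 'yes' : 'no' ), "\n";
```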
1091
1092 $mech->redirect_ok()
1093 An overridden version of "redirect_ok()" in LWP::UserAgent. This
1094 method is used to determine whether a redirection in the request should
1095 be followed.
1096
1097 Note that WWW::Mechanize's constructor pushes POST on to the agent's
1098 "requests_redirectable" list.
1099
1100 $mech->request( $request [, $arg [, $size]])
1101 Overridden version of "request()" in LWP::UserAgent. Performs the
1102 actual request. Normally, if you're using WWW::Mechanize, it's because
1103 you don't want to deal with this level of stuff anyway.
1104
1105 Note that $request will be modified.
1106
1107 Returns an HTTP::Response object.
1108
1109 $mech->update_html( $html )
1110 Allows you to replace the HTML that the mech has found. Updates the
1111 forms and links parse-trees that the mech uses internally.
1112
1113 Say you have a page that you know has malformed output, and you want to
1114 update it so the links come out correctly:
1115
1116 my $html = $mech->content;
1117 $html =~ s[</option>.{0,3}</td>][</option></select></td>]isg;
1118 $mech->update_html( $html );
1119
1120 This method is also used internally by the mech itself to update its
1121 own HTML content when loading a page. This means that if you would like
1122 to systematically perform the above HTML substitution, you would
1123 override update_html in a subclass thusly:
1124
1125 package MyMech;
1126 use base 'WWW::Mechanize';
1127
1128 sub update_html {
1129 my ($self, $html) = @_;
1130 $html =~ s[</option>.{0,3}</td>][</option></select></td>]isg;
1131 $self->WWW::Mechanize::update_html( $html );
1132 }
1133
1134 If you do this, then the mech will use the tidied-up HTML instead of
1135 the original both when parsing for its own needs, and for returning to
1136 you through "content".
1137
1138 Overriding this method is also the recommended way of implementing
1139 extra validation steps (e.g. link checkers) for every HTML page
1140 received. "warn" and "die" would then come in handy to signal
1141 validation errors.
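
As one possible shape for such a validation step, the sketch below
defines a subclass that warns about pages lacking a <title> element
and then hands the HTML on unchanged. The class name
"TitleCheckingMech", the check itself and the test page are all
invented for the example.

```perl
use strict;
use warnings;

# Hypothetical subclass: warn about pages that have no <title>.
package TitleCheckingMech;
use parent 'WWW::Mechanize';

sub update_html {
    my ( $self, $html ) = @_;
    $self->warn('page has no <title> element')
        unless $html =~ m{<title\b}i;
    return $self->SUPER::update_html($html);
}

package main;
use File::Temp qw( tempfile );
use URI::file ();

# A made-up page without a title, stored in a temporary file.
my ( $fh, $path ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print {$fh} "<html><body><p>No title here</p></body></html>\n";
close $fh or die "close: $!";

# Collect warnings instead of printing them, via the "onwarn" hook.
my @warnings;
my $mech = TitleCheckingMech->new( onwarn => sub { push @warnings, "@_" } );
$mech->get( URI::file->new_abs($path)->as_string );

print "$_\n" for @warnings;
```

Calling "$self->die" instead of "$self->warn" would turn the check
into a hard failure for every page fetched through the subclass.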
1142
1143 $mech->credentials( $username, $password )
1144 Provide credentials to be used for HTTP Basic authentication for all
1145 sites and realms until further notice.
1146
1147 The four argument form described in LWP::UserAgent is still supported.
1148
1149 $mech->get_basic_credentials( $realm, $uri, $isproxy )
1150 Returns the credentials for the realm and URI.
1151
1152 $mech->clear_credentials()
1153 Remove any credentials set up with "credentials()".
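
The three credential methods fit together roughly as follows. The user
name, password, realm and URL are placeholders, and calling
"get_basic_credentials" directly is done here only to show what the
agent would offer when a server asks for authentication.

```perl
use strict;
use warnings;
use URI ();
use WWW::Mechanize ();

my $mech = WWW::Mechanize->new();

# Placeholder credentials, offered to every site and realm.
$mech->credentials( 'alice', 's3cret' );

# Ask the agent what it would send for a (made-up) realm and URL.
my ( $user, $pass ) = $mech->get_basic_credentials(
    'Members Only',
    URI->new('http://example.invalid/private'),
    0,    # not a proxy challenge
);

print "would send: $user / $pass\n";

# Forget the credentials once they are no longer needed.
$mech->clear_credentials();
```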
1154
1156 As a subclass of LWP::UserAgent, WWW::Mechanize inherits all of
1157 LWP::UserAgent's methods, many of which are overridden or extended.
1158 The following methods are inherited unchanged. See the LWP::UserAgent
1159 documentation for details.
1160
1161 This is not meant to be an exhaustive list. LWP::UserAgent may have
1162 added others.
1163
1164 $mech->head()
1165 Inherited from LWP::UserAgent.
1166
1167 $mech->mirror()
1168 Inherited from LWP::UserAgent.
1169
1170 $mech->simple_request()
1171 Inherited from LWP::UserAgent.
1172
1173 $mech->is_protocol_supported()
1174 Inherited from LWP::UserAgent.
1175
1176 $mech->prepare_request()
1177 Inherited from LWP::UserAgent.
1178
1179 $mech->progress()
1180 Inherited from LWP::UserAgent.
1181
1183 These methods are only used internally. You probably don't need to
1184 know about them.
1185
1186 $mech->_update_page($request, $response)
1187 Updates all internal variables in $mech as if $request was just
1188 performed, and returns $response. The page stack is not altered by this
1189 method; it is up to the caller (e.g. "request") to do that.
1190
1191 $mech->_modify_request( $req )
1192 Modifies an HTTP::Request before the request is sent out, for both GET
1193 and POST requests.
1194
1195 We add a "Referer" header, as well as a header to note that we can
1196 accept gzip encoded content, if Compress::Zlib is installed.
1197
1198 $mech->_make_request()
1199 Convenience method to make it easier for subclasses like
1200 WWW::Mechanize::Cached to intercept the request.
1201
1202 $mech->_reset_page()
1203 Resets the internal fields that track parsed page data.
1204
1205 $mech->_extract_links()
1206 Extracts links from the content of a webpage, and populates the
1207 "{links}" property with WWW::Mechanize::Link objects.
1208
1209 $mech->_push_page_stack()
1210 The agent keeps a stack of visited pages, which it can pop when it
1211 needs to go BACK and so on.
1212
1213 The current page needs to be pushed onto the stack before we get a new
1214 page, and the stack needs to be popped when BACK occurs.
1215
1216 Neither of these takes any arguments; they just operate on the $mech
1217 object.
1218
1219 warn( @messages )
1220 Centralized warning method, for diagnostics and non-fatal problems.
1221 Defaults to calling "CORE::warn", but may be overridden by setting
1222 "onwarn" in the constructor.
1223
1224 die( @messages )
1225 Centralized error method. Defaults to calling "CORE::die", but may be
1226 overridden by setting "onerror" in the constructor.
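
A minimal sketch of both hooks: warnings are collected in an array
instead of going to STDERR, and fatal errors are rethrown with a
prefix. The messages and the "mech: " prefix are arbitrary.

```perl
use strict;
use warnings;
use WWW::Mechanize ();

my @log;
my $mech = WWW::Mechanize->new(
    # Collect warnings in memory instead of sending them to STDERR.
    onwarn  => sub { push @log, join '', @_ },
    # Prefix fatal errors before rethrowing them.
    onerror => sub { die 'mech: ' . join( '', @_ ) . "\n" },
);

# Both methods route through the handlers given to the constructor.
$mech->warn('cache is cold');

my $ok    = eval { $mech->die('no such page'); 1 };
my $error = $ok ? '' : $@;

print "warnings: @log\n";
print "error: $error";
```

The same handlers are also used for warnings and errors raised inside
WWW::Mechanize itself, which makes them a convenient place to plug in
a logging framework.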
1227
1229 The default settings can get you up and running quickly, but there are
1230 settings you can change in order to make your life easier.
1231
1232 autocheck
1233 "autocheck" can save you the overhead of checking status codes for
1234 success. You may outgrow it as your needs get more sophisticated,
1235 but it's a safe option to start with.
1236
1237 my $agent = WWW::Mechanize->new( autocheck => 1 );
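
The difference "autocheck" makes can be seen with a request that is
certain to fail. This sketch uses a file: URL pointing at a file that
does not exist, so it runs without network access; the file name is
arbitrary.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );
use URI::file ();
use WWW::Mechanize ();

# A URL that is certain to fail: a file that does not exist.
my $dir     = tempdir( CLEANUP => 1 );
my $missing = URI::file->new_abs("$dir/missing.html")->as_string;

# With autocheck on, a failed fetch throws, so no status check is needed.
my $strict = WWW::Mechanize->new( autocheck => 1 );
my $ok     = eval { $strict->get($missing); 1 };
my $error  = $ok ? '' : $@;

# With autocheck off, the caller has to inspect the result.
my $lenient = WWW::Mechanize->new( autocheck => 0 );
$lenient->get($missing);
my $failed = !$lenient->success;

print 'autocheck on:  ', ( $ok     ? 'no error'       : 'died' ), "\n";
print 'autocheck off: ', ( $failed ? 'request failed' : 'ok' ),   "\n";
```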
1238
1239 cookie_jar
1240 You are encouraged to install Mozilla::PublicSuffix and use
1241 HTTP::CookieJar::LWP as your cookie jar. HTTP::CookieJar::LWP
1242 provides a better security model matching that of current Web
1243 browsers when Mozilla::PublicSuffix is installed.
1244
1245 use HTTP::CookieJar::LWP ();
1246
1247 my $jar = HTTP::CookieJar::LWP->new;
1248 my $agent = WWW::Mechanize->new( cookie_jar => $jar );
1249
1250 protocols_allowed
1251 This option is inherited directly from LWP::UserAgent. It allows
1252 you to whitelist the protocols you're willing to allow.
1253
1254 my $agent = WWW::Mechanize->new(
1255 protocols_allowed => [ 'http', 'https' ]
1256 );
1257
1258 This will prevent you from inadvertently following URLs like
1259 "file:///etc/passwd"
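
A quick way to confirm the restriction is in force, sketched with
"autocheck" turned off so the refused request can be inspected rather
than caught:

```perl
use strict;
use warnings;
use WWW::Mechanize ();

my $mech = WWW::Mechanize->new(
    protocols_allowed => [ 'http', 'https' ],
    autocheck         => 0,    # inspect the response instead of dying
);

# is_protocol_supported() is inherited from LWP::UserAgent.
my $file_ok = $mech->is_protocol_supported('file');

# The request is refused by LWP; the file itself is never read.
my $response = $mech->get('file:///etc/passwd');

print 'file supported: ', ( $file_ok ? 'yes' : 'no' ), "\n";
print 'fetch result:   ', $response->status_line, "\n";
```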
1260
1261 protocols_forbidden
1262 This option is also inherited directly from LWP::UserAgent. It
1263 allows you to blacklist the protocols you're unwilling to allow.
1264
1265 my $agent = WWW::Mechanize->new(
1266 protocols_forbidden => [ 'file', 'mailto', 'ssh', ]
1267 );
1268
1269 This will prevent you from inadvertently following URLs like
1270 "file:///etc/passwd"
1271
1272 strict_forms
1273 Consider supplying the "strict_forms" argument as a rule when you
1274 are using "submit_form". This will perform a helpful sanity check
1275 on the form fields you are submitting, which can save you a lot of
1276 debugging time.
1277
1278 $agent->submit_form( fields => { foo => 'bar' } , strict_forms => 1 );
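
To illustrate the kind of mistake "strict_forms" catches, this sketch
submits a misspelled field name to a made-up login form loaded from a
temporary file. The page, the host "example.invalid" and the field
names are assumptions for the example.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );
use URI::file ();
use WWW::Mechanize ();

# Hypothetical login page; note the field is "username", not "user".
my ( $fh, $path ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print {$fh} <<'HTML';
<html><body>
<form action="http://example.invalid/login" method="post">
  <input type="text"     name="username">
  <input type="password" name="password">
  <input type="submit"   name="go" value="Log in">
</form>
</body></html>
HTML
close $fh or die "close: $!";

my $agent = WWW::Mechanize->new( autocheck => 1 );
$agent->get( URI::file->new_abs($path)->as_string );

# The misspelled field name should be rejected while the form is being
# filled in, before anything is submitted.
my $ok = eval {
    $agent->submit_form(
        fields       => { user => 'mungo', password => 'lost-and-alone' },
        strict_forms => 1,
    );
    1;
};
my $error = $ok ? '' : $@;

print $ok ? "submitted\n" : "rejected: $error";
```

Without "strict_forms", a typo like this would be submitted silently
as an extra field, and the login would most likely fail for reasons
that are hard to spot.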
1279
1281 WWW::Mechanize is hosted at GitHub.
1282
1283 Repository: <https://github.com/libwww-perl/WWW-Mechanize>. Bugs:
1284 <https://github.com/libwww-perl/WWW-Mechanize/issues>.
1285
1287 Spidering Hacks, by Kevin Hemenway and Tara Calishain
1288 Spidering Hacks from O'Reilly
1289 (<http://www.oreilly.com/catalog/spiderhks/>) is a great book for
1290 anyone wanting to know more about screen-scraping and spidering.
1291
1292 There are six hacks that use Mech or a Mech derivative:
1293
1294 #21 WWW::Mechanize 101
1295 #22 Scraping with WWW::Mechanize
1296 #36 Downloading Images from Webshots
1297 #44 Archiving Yahoo! Groups Messages with WWW::Yahoo::Groups
1298 #64 Super Author Searching
1299 #73 Scraping TV Listings
1300
1301 The book was also positively reviewed on Slashdot:
1302 <http://books.slashdot.org/article.pl?sid=03/12/11/2126256>
1303
1305 · WWW::Mechanize mailing list
1306
1307 The Mech mailing list is at
1308 <http://groups.google.com/group/www-mechanize-users> and is
1309 specific to Mechanize, unlike the LWP mailing list below. Although
1310 it is a users list, all development discussion takes place here,
1311 too.
1312
1313 · LWP mailing list
1314
1315 The LWP mailing list is at
1316 <http://lists.perl.org/showlist.cgi?name=libwww>, and is more user-
1317 oriented and well-populated than the WWW::Mechanize list.
1318
1319 · Perlmonks
1320
1321 <http://perlmonks.org> is an excellent community of support, and
1322 many questions about Mech have already been answered there.
1323
1324 · WWW::Mechanize::Examples
1325
1326 A random array of examples submitted by users, included with the
1327 Mechanize distribution.
1328
1330 · <http://www.ibm.com/developerworks/linux/library/wa-perlsecure/>
1331
1332 IBM article "Secure Web site access with Perl"
1333
1334 · <http://www.oreilly.com/catalog/googlehks2/chapter/hack84.pdf>
1335
1336 Leland Johnson's hack #84 in Google Hacks, 2nd Edition is an
1337 example of a production script that uses WWW::Mechanize and
1338 HTML::TableContentParser. It takes in keywords and returns the
1339 estimated price of these keywords on Google's AdWords program.
1340
1341 · <http://www.perl.com/pub/a/2004/06/04/recorder.html>
1342
1343 Linda Julien writes about using HTTP::Recorder to create
1344 WWW::Mechanize scripts.
1345
1346 · <http://www.developer.com/lang/other/article.php/3454041>
1347
1348 Jason Gilmore's article on using WWW::Mechanize for scraping sales
1349 information from Amazon and eBay.
1350
1351 · <http://www.perl.com/pub/a/2003/01/22/mechanize.html>
1352
1353 Chris Ball's article about using WWW::Mechanize for scraping TV
1354 listings.
1355
1356 · <http://www.stonehenge.com/merlyn/LinuxMag/col47.html>
1357
1358 Randal Schwartz's article on scraping Yahoo News for images. It's
1359 already out of date: He manually walks the list of links hunting
1360 for matches, which wouldn't have been necessary if the
1361 "find_link()" method existed at press time.
1362
1363 · <http://www.perladvent.org/2002/16th/>
1364
1365 WWW::Mechanize on the Perl Advent Calendar, by Mark Fowler.
1366
1367 · <http://www.linux-magazin.de/ausgaben/2004/03/datenruessel/>
1368
1369 Michael Schilli's article on Mech and WWW::Mechanize::Shell for the
1370 German magazine Linux Magazin.
1371
1372 Other modules that use Mechanize
1373 Here are modules that use or subclass Mechanize. Let me know of any
1374 others:
1375
1376 · Finance::Bank::LloydsTSB
1377
1378 · HTTP::Recorder
1379
1380 Acts as a proxy for web interaction, and then generates
1381 WWW::Mechanize scripts.
1382
1383 · Win32::IE::Mechanize
1384
1385 Just like Mech, but using Microsoft Internet Explorer to do the
1386 work.
1387
1388 · WWW::Bugzilla
1389
1390 · WWW::CheckSite
1391
1392 · WWW::Google::Groups
1393
1394 · WWW::Hotmail
1395
1396 · WWW::Mechanize::Cached
1397
1398 · WWW::Mechanize::Cached::GZip
1399
1400 · WWW::Mechanize::FormFiller
1401
1402 · WWW::Mechanize::Shell
1403
1404 · WWW::Mechanize::Sleepy
1405
1406 · WWW::Mechanize::SpamCop
1407
1408 · WWW::Mechanize::Timed
1409
1410 · WWW::SourceForge
1411
1412 · WWW::Yahoo::Groups
1413
1414 · WWW::Scripter
1415
1417 Thanks to the numerous people who have helped out on WWW::Mechanize in
1418 one way or another, including Kirrily Robert for the original
1419 "WWW::Automate", Lyle Hopkins, Damien Clark, Ansgar Burchardt, Gisle
1420 Aas, Jeremy Ary, Hilary Holz, Rafael Kitover, Norbert Buchmuller, Dave
1421 Page, David Sainty, H.Merijn Brand, Matt Lawrence, Michael Schwern,
1422 Adriano Ferreira, Miyagawa, Peteris Krumins, Rafael Kitover, David
1423 Steinbrunner, Kevin Falcone, Mike O'Regan, Mark Stosberg, Uri Guttman,
1424 Peter Scott, Philippe Bruhat, Ian Langworth, John Beppu, Gavin Estey,
1425 Jim Brandt, Ask Bjoern Hansen, Greg Davies, Ed Silva, Mark-Jason
1426 Dominus, Autrijus Tang, Mark Fowler, Stuart Children, Max Maischein,
1427 Meng Wong, Prakash Kailasa, Abigail, Jan Pazdziora, Dominique
1428 Quatravaux, Scott Lanning, Rob Casey, Leland Johnson, Joshua Gatcomb,
1429 Julien Beasley, Abe Timmerman, Peter Stevens, Pete Krawczyk, Tad
1430 McClellan, and the late great Iain Truskett.
1431
1433 Andy Lester <andy at petdance.com>
1434
1436 This software is copyright (c) 2004-2016 by Andy Lester.
1437
1438 This is free software; you can redistribute it and/or modify it under
1439 the same terms as the Perl 5 programming language system itself.
1440
1453perl v5.28.0 2018-03-23 WWW::Mechanize(3)