WWW::Mechanize(3)     User Contributed Perl Documentation    WWW::Mechanize(3)

NAME

       WWW::Mechanize - Handy web browsing in a Perl object

VERSION

       Version 1.62

SYNOPSIS

       "WWW::Mechanize", or Mech for short, is a Perl module for stateful
       programmatic web browsing, used for automating interaction with
       websites.

       Features include:

       ·   All HTTP methods

       ·   High-level hyperlink and HTML form support, without having to parse
           HTML yourself

       ·   SSL support

       ·   Automatic cookies

       ·   Custom HTTP headers

       ·   Automatic handling of redirections

       ·   Proxies

       ·   HTTP authentication

       Mech supports performing a sequence of page fetches including following
       links and submitting forms. Each fetched page is parsed and its links
       and forms are extracted. A link or a form can be selected, form fields
       can be filled and the next page can be fetched.  Mech also stores a
       history of the URLs you've visited, which can be queried and revisited.

           use WWW::Mechanize;
           my $mech = WWW::Mechanize->new();

           $mech->get( $url );

           $mech->follow_link( n => 3 );
           $mech->follow_link( text_regex => qr/download this/i );
           $mech->follow_link( url => 'http://host.com/index.html' );

           $mech->submit_form(
               form_number => 3,
               fields      => {
                   username    => 'mungo',
                   password    => 'lost-and-alone',
               }
           );

           $mech->submit_form(
               form_name => 'search',
               fields    => { query  => 'pot of gold', },
               button    => 'Search Now'
           );

       Mech is well suited for use in testing web applications.  If you use
       one of the Test::* modules, such as Test::HTML::Lint, you can check the
       fetched content and use that as input to a test call.

           use Test::More;
           like( $mech->content(), qr/$expected/, "Got expected content" );

       Each page fetch stores its URL in a history stack which you can
       traverse.

           $mech->back();

       If you want finer control over your page fetching, you can use these
       methods. "follow_link" and "submit_form" are just high-level wrappers
       around them.

           $mech->find_link( n => $number );
           $mech->form_number( $number );
           $mech->form_name( $name );
           $mech->field( $name, $value );
           $mech->set_fields( %field_values );
           $mech->set_visible( @criteria );
           $mech->click( $button );

       WWW::Mechanize is a proper subclass of LWP::UserAgent, so you can also
       use any of LWP::UserAgent's methods.

           $mech->add_header( $name => $value );

       Please note that Mech does NOT support JavaScript.  See the FAQ in
       WWW::Mechanize::FAQ for more.

IMPORTANT LINKS

       ·   http://code.google.com/p/www-mechanize/issues/list

           The queue for bugs & enhancements in WWW::Mechanize and
           Test::WWW::Mechanize.  Please note that the queue at
           <http://rt.cpan.org> is no longer maintained.

       ·   http://search.cpan.org/dist/WWW-Mechanize/

           The CPAN documentation page for Mechanize.

       ·   http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize/FAQ.pod

           Frequently asked questions.  Make sure you read this FIRST.

CONSTRUCTOR AND STARTUP

   new()
       Creates and returns a new WWW::Mechanize object, hereafter referred to
       as the "agent".

           my $mech = WWW::Mechanize->new();

       The constructor for WWW::Mechanize overrides two of the parameters to
       the LWP::UserAgent constructor:

           agent => 'WWW-Mechanize/#.##'
           cookie_jar => {}    # an empty, memory-only HTTP::Cookies object

       You can override these overrides by passing parameters to the
       constructor, as in:

           my $mech = WWW::Mechanize->new( agent => 'wonderbot 1.01' );

       If you want none of the overhead of a cookie jar, or don't want your
       bot accepting cookies, you have to explicitly disallow it, like so:

           my $mech = WWW::Mechanize->new( cookie_jar => undef );

       Here are the parameters that WWW::Mechanize recognizes.  These do not
       include parameters that LWP::UserAgent recognizes.

       ·   "autocheck => [0|1]"

           Checks each request made to see if it was successful.  This saves
           you the trouble of manually checking yourself.  Any failures are
           treated as fatal errors, not warnings.

           The default value is ON, unless it's being subclassed, in which
           case it is OFF.  This means that standalone WWW::Mechanize
           instances have autocheck turned on, which is protective for the
           vast majority of Mech users who don't bother checking the return
           value of get() and post() and can't figure out why their code
           fails. However, if WWW::Mechanize is subclassed, such as for
           Test::WWW::Mechanize or Test::WWW::Mechanize::Catalyst, this may
           not be an appropriate default, so it's off.

       ·   "noproxy => [0|1]"

           Turn off the automatic call to the LWP::UserAgent "env_proxy"
           function.

           This needs to be explicitly turned off if you're using
           Crypt::SSLeay to access an https site via a proxy server.  Note:
           you still need to set your HTTPS_PROXY environment variable as
           appropriate.

       ·   "onwarn => \&func"

           Reference to a "warn"-compatible function, such as "Carp::carp",
           that is called when a warning needs to be shown.

           If this is set to "undef", no warnings will ever be shown.
           However, it's probably better to use the "quiet" method to control
           that behavior.

           If this value is not passed, Mech uses "Carp::carp" if Carp is
           installed, or "CORE::warn" if not.

       ·   "onerror => \&func"

           Reference to a "die"-compatible function, such as "Carp::croak",
           that is called when there's a fatal error.

           If this is set to "undef", no errors will ever be shown.

           If this value is not passed, Mech uses "Carp::croak" if Carp is
           installed, or "CORE::die" if not.

       ·   "quiet => [0|1]"

           Suppresses warnings.  Setting "quiet => 1" is the same as calling
           "$mech->quiet(1)".  Default is off.

       ·   "stack_depth => $value"

           Sets the depth of the page stack that keeps track of all the
           downloaded pages. Default is effectively infinite stack size.  If
           the stack is eating up your memory, then set this to a smaller
           number, say 5 or 10.  Setting this to zero means Mech will keep no
           history.

       To support forms, WWW::Mechanize's constructor pushes POST onto the
       agent's "requests_redirectable" list (see also LWP::UserAgent).
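
       To see what "autocheck" changes, the sketch below requests a page that
       does not exist, once with the default agent and once with
       "autocheck => 0" ("stack_depth" and "quiet" are passed only for
       illustration).  It is a minimal example under stated assumptions:
       WWW::Mechanize and its LWP dependencies are installed, and a file: URL
       for a hypothetical missing.html inside a fresh temporary directory
       stands in for a failing web request, so no network access is needed.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# A file: URL inside a fresh temporary directory; "missing.html" is a
# hypothetical name that does not exist, so the request fails offline.
my $dir     = tempdir( CLEANUP => 1 );
my $missing = URI::file->new_abs( File::Spec->catfile( $dir, 'missing.html' ) );

# Standalone agent: autocheck is on by default, so the failed GET dies.
my $checked = WWW::Mechanize->new();
my $died    = !eval { $checked->get($missing); 1 };
print 'autocheck on: ', ( $died ? "died with: $@" : "no error\n" );

# autocheck turned off, plus a short history and no warnings: the failure
# is only visible through the response, so the caller has to check it.
my $unchecked = WWW::Mechanize->new(
    autocheck   => 0,
    stack_depth => 5,
    quiet       => 1,
);
my $res = $unchecked->get($missing);
print 'autocheck off: status ', $res->code, "\n" unless $unchecked->success;
```

       With autocheck off, every fetch should be followed by a check such as
       "$mech->success" or "$res->is_success".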

   $mech->agent_alias( $alias )
       Sets the user agent string to the expanded version from a table of
       actual user strings.  $alias can be one of the following:

       ·   Windows IE 6

       ·   Windows Mozilla

       ·   Mac Safari

       ·   Mac Mozilla

       ·   Linux Mozilla

       ·   Linux Konqueror

       The alias is replaced with the full User-Agent string it stands for.
       For instance,

           $mech->agent_alias( 'Windows IE 6' );

       sets your User-Agent to

           Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

       The list of valid aliases can also be retrieved with
       "known_agent_aliases()".

   known_agent_aliases()
       Returns a list of all the agent aliases that Mech knows about.
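
       A short sketch of the alias methods, assuming only that WWW::Mechanize
       is installed.  It picks the first name returned by
       "known_agent_aliases()" rather than hard-coding one, since the set of
       aliases may differ between Mech releases.

```perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

# Every alias the installed version knows about.
my @aliases = $mech->known_agent_aliases();
print "Known aliases:\n", map { "  $_\n" } @aliases;

# Replace the default 'WWW-Mechanize/#.##' string with a browser-like one.
my $before = $mech->agent;
$mech->agent_alias( $aliases[0] );
print "User-Agent changed from '$before' to '", $mech->agent, "'\n";
```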

PAGE-FETCHING METHODS

   $mech->get( $uri )
       Given a URL/URI, fetches it.  Returns an HTTP::Response object.  $uri
       can be a well-formed URL string, a URI object, or a
       WWW::Mechanize::Link object.

       The results are stored internally in the agent object, but you don't
       know that.  Just use the accessors listed below.  Poking at the
       internals is deprecated and subject to change in the future.

       "get()" is a well-behaved overloaded version of the method in
       LWP::UserAgent.  This lets you do things like

           $mech->get( $uri, ':content_file' => $tempfile );

       and you can rest assured that the parameters will get filtered down
       appropriately.

       NOTE: Because ":content_file" causes the page contents to be stored in
       a file instead of the response object, some Mech functions that expect
       it to be there won't work as expected. Use with caution.

   $mech->put( $uri, content => $content )
       PUTs $content to $uri.  Returns an HTTP::Response object.  $uri can be
       a well-formed URI string, a URI object, or a WWW::Mechanize::Link
       object.

   $mech->reload()
       Acts like the reload button in a browser: repeats the current request.
       The history (as per the "back" method) is not altered.

       Returns the HTTP::Response object from the reload, or "undef" if
       there's no current request.

   $mech->back()
       The equivalent of hitting the "back" button in a browser.  Returns to
       the previous page.  Won't go back past the first page. (Really, what
       would it do if it could?)

       Returns true if it could go back, or false if not.
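
       The history behaviour described above can be sketched with two local
       pages.  This is an illustration, not part of the module: first.html
       and second.html are hypothetical files written to a temporary
       directory and fetched through file: URLs, so no network access is
       needed.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# Two throwaway local pages (hypothetical names) stand in for real URLs.
my $dir = tempdir( CLEANUP => 1 );
my %url;
for my $name (qw(first second)) {
    my $file = File::Spec->catfile( $dir, "$name.html" );
    open my $fh, '>', $file or die "Cannot write $file: $!";
    print {$fh} "<html><head><title>$name</title></head>"
              . "<body><p>$name page</p></body></html>\n";
    close $fh;
    $url{$name} = URI::file->new_abs($file);
}

my $mech = WWW::Mechanize->new();
$mech->get( $url{first} );
$mech->get( $url{second} );   # the first page goes onto the history stack
$mech->reload();              # repeats the current request; history unchanged
$mech->back();                # returns to the first page
print 'Back at: ', $mech->uri, "\n";
```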

STATUS METHODS

   $mech->success()
       Returns a boolean telling whether the last request was successful.  If
       there hasn't been an operation yet, returns false.

       This is a convenience function that wraps "$mech->res->is_success".

   $mech->uri()
       Returns the current URI as a URI object. This object stringifies to the
       URI itself.

   $mech->response() / $mech->res()
       Returns the current response as an HTTP::Response object.

       "$mech->res()" is a synonym for "$mech->response()".

   $mech->status()
       Returns the HTTP status code of the response.  This is a 3-digit number
       like 200 for OK, 404 for not found, and so on.

   $mech->ct() / $mech->content_type()
       Returns the content type of the response.

   $mech->base()
       Returns the base URI for the current response.

   $mech->forms()
       When called in a list context, returns a list of the forms found in the
       last fetched page. In a scalar context, returns a reference to an array
       with those forms. The forms returned are all HTML::Form objects.

   $mech->current_form()
       Returns the current form as an HTML::Form object.

   $mech->links()
       When called in a list context, returns a list of the links found in the
       last fetched page.  In a scalar context it returns a reference to an
       array with those links.  Each link is a WWW::Mechanize::Link object.

   $mech->is_html()
       Returns true if the current content is HTML, according to the HTTP
       headers, and false otherwise.

   $mech->title()
       Returns the contents of the "<TITLE>" tag, as parsed by
       HTML::HeadParser.  Returns undef if the content is not HTML.
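
       Most of the status accessors can be exercised against a single local
       page.  In this sketch the page, its title, link, and form are
       hypothetical; the page is written to a temporary directory and fetched
       through a file: URL, and LWP guesses the text/html content type from
       the .html extension.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# A throwaway local page with one link and one form (hypothetical content).
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'status.html' );
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} q{<html><head><title>Status demo</title></head><body>
<a href="next.html">Next</a>
<form action="search.html"><input name="query"></form>
</body></html>
};
close $fh;

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($file) );

my @links = $mech->links;
my @forms = $mech->forms;
printf "success: %s, status: %d\n", ( $mech->success ? 'yes' : 'no' ), $mech->status;
printf "type: %s, is_html: %s\n", $mech->ct, ( $mech->is_html ? 'yes' : 'no' );
printf "uri: %s\nbase: %s\n", $mech->uri, $mech->base;
printf "title: %s, links: %d, forms: %d\n", $mech->title, scalar @links, scalar @forms;
```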

CONTENT-HANDLING METHODS

   $mech->content(...)
       Returns the content that the mech uses internally for the last page
       fetched. Ordinarily this is the same as $mech->response()->content(),
       but this may differ for HTML documents if "update_html" is overloaded
       (in which case the value passed to the base-class implementation of
       same will be returned), and/or extra named arguments are passed to
       content():

       $mech->content( format => 'text' )
         Returns a text-only version of the page, with all HTML markup
         stripped. This feature requires HTML::TreeBuilder to be installed, or
         a fatal error will be thrown.

       $mech->content( base_href => [$base_href|undef] )
         Returns the HTML document, modified to contain a
         "<base href="$base_href">" mark-up in the header.  $base_href is
         "$mech->base()" if not specified. This is handy to pass the HTML to
         e.g. HTML::Display.

       Passing arguments to "content()" if the current document is not HTML
       currently has no effect (i.e. the return value is the same as
       "$self->response()->content()").  This may change in the future, but
       will likely be backwards-compatible when it does.
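
       A minimal sketch of the "base_href" option, assuming a hypothetical
       local page fetched through a file: URL and an arbitrary base of
       http://example.com/.  The "format => 'text'" variant is only mentioned
       in a comment, because it depends on HTML::TreeBuilder being installed.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# A throwaway local page; http://example.com/ below is an arbitrary base.
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'doc.html' );
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} q{<html><head><title>Doc</title></head>},
            q{<body><a href="img/logo.png">logo</a></body></html>}, "\n";
close $fh;

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($file) );

# The HTML as Mech holds it for the last page.
my $html = $mech->content;

# The same document with <base href="..."> added to <head>, so relative
# links keep resolving when the HTML is shown somewhere else.
my $rebased = $mech->content( base_href => 'http://example.com/' );
print $rebased;

# content( format => 'text' ) would strip the markup instead, but it needs
# HTML::TreeBuilder, so it is left out of this sketch.
```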

LINK METHODS

   $mech->links()
       Lists all the links on the current page.  Each link is a
       WWW::Mechanize::Link object. In list context, returns a list of all
       links.  In scalar context, returns an array reference of all links.

   $mech->follow_link(...)
       Follows a specified link on the page.  You specify the match to be
       found using the same parameters that "find_link()" uses.

       Here are some examples:

       ·   3rd link called "download"

               $mech->follow_link( text => 'download', n => 3 );

       ·   first link where the URL has "download" in it, regardless of case:

               $mech->follow_link( url_regex => qr/download/i );

           or

               $mech->follow_link( url_regex => qr/(?i:download)/ );

       ·   3rd link on the page

               $mech->follow_link( n => 3 );

       Returns the result of the GET method (an HTTP::Response object) if a
       link was found. If the page has no links, or the specified link
       couldn't be found, returns undef.
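
       The following sketch follows links by regex and by position on a small
       set of hypothetical local pages (index.html, news.html, manual.html)
       written to a temporary directory and fetched through file: URLs;
       "back()" is used in between so both lookups start from the index.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# Throwaway local pages (hypothetical names): an index and its targets.
my $dir   = tempdir( CLEANUP => 1 );
my %pages = (
    'index.html'  => '<a href="news.html">News</a> '
                   . '<a href="manual.html">Download the manual</a> '
                   . '<a href="news.html">More news</a>',
    'news.html'   => '<p>News page</p>',
    'manual.html' => '<p>Manual page</p>',
);
for my $name ( keys %pages ) {
    my $file = File::Spec->catfile( $dir, $name );
    open my $fh, '>', $file or die "Cannot write $file: $!";
    print {$fh} "<html><body>$pages{$name}</body></html>\n";
    close $fh;
}

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs( File::Spec->catfile( $dir, 'index.html' ) ) );

# First link whose text mentions "manual", in any case.
$mech->follow_link( text_regex => qr/manual/i );
my $after_regex = $mech->content;

# Back to the index, then follow the 3rd link on the page ("More news").
$mech->back();
$mech->follow_link( n => 3 );
print 'Now on: ', $mech->uri, "\n";
```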

   $mech->find_link( ... )
       Finds a link in the currently fetched page. It returns a
       WWW::Mechanize::Link object which describes the link.  (You'll probably
       be most interested in the "url()" property.)  If it fails to find a
       link it returns undef.

       You can take the URL part and pass it to the "get()" method.  If that's
       your plan, you might as well use the "follow_link()" method directly,
       since it does the "get()" for you automatically.

       Note that "<FRAME SRC="...">" tags are parsed out of the HTML and
       treated as links so this method works with them.

       You can select which link to find by passing in one or more of these
       key/value pairs:

       ·   "text => 'string'," and "text_regex => qr/regex/,"

           "text" matches the text of the link against string, which must be
           an exact match.  To select a link with text that is exactly
           "download", use

               $mech->find_link( text => 'download' );

           "text_regex" matches the text of the link against regex.  To select
           a link with text that has "download" anywhere in it, regardless of
           case, use

               $mech->find_link( text_regex => qr/download/i );

           Note that the text extracted from the page's links is trimmed.
           For example, "<a> foo </a>" is stored as 'foo', and searching for
           leading or trailing spaces will fail.

       ·   "url => 'string'," and "url_regex => qr/regex/,"

           Matches the URL of the link against string or regex, as
           appropriate.  The URL may be a relative URL, like foo/bar.html,
           depending on how it's coded on the page.

       ·   "url_abs => string" and "url_abs_regex => regex"

           Matches the absolute URL of the link against string or regex, as
           appropriate.  The URL will be an absolute URL, even if it's
           relative in the page.

       ·   "name => string" and "name_regex => regex"

           Matches the name of the link against string or regex, as
           appropriate.

       ·   "id => string" and "id_regex => regex"

           Matches the attribute 'id' of the link against string or regex, as
           appropriate.

       ·   "class => string" and "class_regex => regex"

           Matches the attribute 'class' of the link against string or regex,
           as appropriate.

       ·   "tag => string" and "tag_regex => regex"

           Matches the tag that the link came from against string or regex, as
           appropriate.  The "tag_regex" is probably most useful to check for
           more than one tag, as in:

               $mech->find_link( tag_regex => qr/^(a|frame)$/ );

           The tags and attributes looked at are listed below.

       The "n => number" key selects the nth matching link.  If "n" is not
       specified, it defaults to 1.  Therefore, if you don't specify any
       parameters, this method defaults to finding the first link on the
       page.

       Note that you can specify multiple text or URL parameters, which will
       be ANDed together.  For example, to find the first link with text of
       "News" and with "cnn.com" in the URL, use:

           $mech->find_link( text => 'News', url_regex => qr/cnn\.com/ );

       Links are extracted from "$self->content" as WWW::Mechanize::Link
       objects, one for every link on the page.  The links come from the
       following tags:

       "<a href=...>"
       "<area href=...>"
       "<frame src=...>"
       "<iframe src=...>"
       "<link href=...>"
       "<meta content=...>"
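
       A sketch of several "find_link()" criteria on one hypothetical local
       page: the ANDed text/URL example above, whitespace trimming, attribute
       matching, frame-style tags, and "n".  The URLs and attributes are made
       up, and the page is fetched through a file: URL.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# A throwaway local page; every URL and attribute here is hypothetical.
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'links.html' );
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} q{<html><body>
<a href="docs/intro.html" id="intro" class="nav">Intro</a>
<a href="http://www.cnn.com/news">News</a>
<a href="local-news.html">News</a>
<a href="spaced.html">   Spaced out   </a>
<iframe src="frame.html"></iframe>
</body></html>
};
close $fh;

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($file) );

# Criteria are ANDed: text must be "News" and the URL must mention cnn.com.
my $cnn = $mech->find_link( text => 'News', url_regex => qr/cnn\.com/ );

# Link text is trimmed, so match it without the surrounding spaces.
my $spaced = $mech->find_link( text => 'Spaced out' );

# Attribute and tag criteria; <iframe src> is treated as a link too.
my $intro = $mech->find_link( id => 'intro' );
my $frame = $mech->find_link( tag_regex => qr/^(frame|iframe)$/ );

# n picks the nth match: the second link whose text is exactly "News".
my $local = $mech->find_link( text => 'News', n => 2 );

printf "%s\n%s\n%s\n", $cnn->url, $intro->url_abs, $frame->url_abs;
```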

   $mech->find_all_links( ... )
       Returns all the links on the current page that match the criteria.  The
       method for specifying link criteria is the same as in "find_link()".
       Each of the links returned is a WWW::Mechanize::Link object.

       In list context, "find_all_links()" returns a list of the links.
       Otherwise, it returns a reference to the list of links.

       "find_all_links()" with no parameters returns all links in the page.
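
       Collecting every link that matches a criterion, and every link on the
       page, might look like the sketch below; the page and its .pdf links
       are hypothetical and are read through a file: URL.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# A throwaway local page with a mix of (hypothetical) document links.
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'downloads.html' );
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} q{<html><body>
<a href="guide.pdf">Guide</a>
<a href="about.html">About</a>
<a href="files/specs.pdf">Specs</a>
</body></html>
};
close $fh;

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($file) );

# List context: every link whose URL ends in .pdf.
my @pdfs = $mech->find_all_links( url_regex => qr/\.pdf$/ );
print $_->text, ' => ', $_->url_abs, "\n" for @pdfs;

# Scalar context with no criteria: a reference to all links on the page.
my $all = $mech->find_all_links();
printf "%d of %d links are PDFs\n", scalar @pdfs, scalar @{$all};
```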

   $mech->find_all_inputs( ... criteria ... )
       "find_all_inputs()" returns an array of all the input controls in the
       current form whose properties match all of the criteria passed in.
       The controls returned are all descended from HTML::Form::Input.

       If no criteria are passed, all inputs will be returned.

       If there is no current page, there is no form on the current page, or
       there are no matching controls in the current form, then the return
       will be an empty array.

       You may use a regex or a literal string:

           # get all textarea controls whose names begin with "customer"
           my @customer_text_inputs = $mech->find_all_inputs(
               type       => 'textarea',
               name_regex => qr/^customer/,
           );

           # get all text or textarea controls called "customer"
           my @customer_inputs = $mech->find_all_inputs(
               type_regex => qr/^(text|textarea)$/,
               name       => 'customer',
           );

   $mech->find_all_submits( ... criteria ... )
       "find_all_submits()" does the same thing as "find_all_inputs()" except
       that it only returns controls that are submit controls, ignoring other
       types of input controls like text and checkboxes.
524

IMAGE METHODS

   $mech->images
       Lists all the images on the current page.  Each image is a
       WWW::Mechanize::Image object. In list context, returns a list of all
       images.  In scalar context, returns an array reference of all images.

   $mech->find_image()
       Finds an image in the current page. It returns a WWW::Mechanize::Image
       object which describes the image.  If it fails to find an image it
       returns undef.

       You can select which image to find by passing in one or more of these
       key/value pairs:

       ·   "alt => 'string'" and "alt_regex => qr/regex/,"

           "alt" matches the ALT attribute of the image against string, which
           must be an exact match. To select an image with an ALT attribute
           that is exactly "download", use

               $mech->find_image( alt => 'download' );

           "alt_regex" matches the ALT attribute of the image against a
           regular expression.  To select an image with an ALT attribute that
           has "download" anywhere in it, regardless of case, use

               $mech->find_image( alt_regex => qr/download/i );

       ·   "url => 'string'," and "url_regex => qr/regex/,"

           Matches the URL of the image against string or regex, as
           appropriate.  The URL may be a relative URL, like foo/bar.html,
           depending on how it's coded on the page.

       ·   "url_abs => string" and "url_abs_regex => regex"

           Matches the absolute URL of the image against string or regex, as
           appropriate.  The URL will be an absolute URL, even if it's
           relative in the page.

       ·   "tag => string" and "tag_regex => regex"

           Matches the tag that the image came from against string or regex,
           as appropriate.  The "tag_regex" is probably most useful to check
           for more than one tag, as in:

               $mech->find_image( tag_regex => qr/^(img|input)$/ );

           The tags supported are "<img>" and "<input>".

       The "n => number" key selects the nth matching image.  If "n" is not
       specified, it defaults to 1.  Therefore, if you don't specify any
       parameters, this method defaults to finding the first image on the
       page.

       Note that you can specify multiple ALT or URL parameters, which will be
       ANDed together.  For example, to find the first image with ALT text of
       "News" and with "cnn.com" in the URL, use:

           $mech->find_image( alt => 'News', url_regex => qr/cnn\.com/ );

       Images are extracted from "$self->content" as WWW::Mechanize::Image
       objects, one for every image on the page; the criteria above select
       among them.

   $mech->find_all_images( ... )
       Returns all the images on the current page that match the criteria.
       The method for specifying image criteria is the same as in
       "find_image()".  Each of the images returned is a WWW::Mechanize::Image
       object.

       In list context, "find_all_images()" returns a list of the images.
       Otherwise, it returns a reference to the list of images.

       "find_all_images()" with no parameters returns all images in the page.
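
       The image methods can be tried on a hypothetical local page with two
       "<img>" tags and one "<input type="image">" control, fetched through a
       file: URL.  The first lookup uses the "alt"/"url_regex" combination
       described above.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# A throwaway local page; image names and ALT texts are hypothetical.
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'images.html' );
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} q{<html><body>
<img src="logo.png" alt="Company logo">
<img src="/img/news.gif" alt="News">
<form action="go.html"><input type="image" src="go.png" name="go"></form>
</body></html>
};
close $fh;

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($file) );

# Exact ALT match combined with a URL regex (both must hold).
my $news = $mech->find_image( alt => 'News', url_regex => qr/news/ );

# Case-insensitive ALT match.
my $logo = $mech->find_image( alt_regex => qr/logo/i );

# <input type="image"> controls count as images too.
my $button = $mech->find_image( tag => 'input' );

my @all = $mech->find_all_images();
print $_->tag, ': ', $_->url, "\n" for @all;
```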

FORM METHODS

       These methods let you work with the forms on a page.  The idea is to
       choose a form that you'll later work with using the field methods
       below.

   $mech->forms
       Lists all the forms on the current page.  Each form is an HTML::Form
       object.  In list context, returns a list of all forms.  In scalar
       context, returns an array reference of all forms.

   $mech->form_number($number)
       Selects the $number-th form on the page as the target for subsequent
       calls to "field()" and "click()".  Also returns the form that was
       selected.

       If it is found, the form is returned as an HTML::Form object and set
       internally for later use with Mech's form methods such as "field()" and
       "click()".

       Emits a warning and returns undef if no form is found.

       The first form is number 1, not zero.

   $mech->form_name( $name )
       Selects a form by name.  If there is more than one form on the page
       with that name, then the first one is used, and a warning is generated.

       If it is found, the form is returned as an HTML::Form object and set
       internally for later use with Mech's form methods such as "field()" and
       "click()".

       Returns undef if no form is found.

   $mech->form_id( $id )
       Selects a form by ID.  If there is more than one form on the page with
       that ID, then the first one is used, and a warning is generated.

       If it is found, the form is returned as an HTML::Form object and set
       internally for later use with Mech's form methods such as "field()" and
       "click()".

       Returns undef if no form is found.

   $mech->form_with_fields( @fields )
       Selects a form by passing in a list of field names it must contain.  If
       more than one form on the page matches, then the first one is used, and
       a warning is generated.

       If it is found, the form is returned as an HTML::Form object and set
       internally for later use with Mech's form methods such as "field()"
       and "click()".

       Returns undef if no form is found.

       Note that this functionality requires libwww-perl 5.69 or higher.
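
       Each selector can be compared on a page with two hypothetical forms,
       login and search, read through a file: URL.  In the sketch below every
       call changes the current form, so the last selection (by field names)
       is the one later field methods would act on.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# A throwaway local page with two hypothetical forms.
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'forms.html' );
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} q{<html><body>
<form name="login" id="login-form" action="login.html">
<input name="username"> <input type="password" name="password">
</form>
<form name="search" id="search-form" action="search.html">
<input name="query">
</form>
</body></html>
};
close $fh;

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($file) );

# Each call selects a form (it becomes the current form) and returns it.
my $second  = $mech->form_number(2);    # numbering starts at 1
my $by_name = $mech->form_name('login');
my $by_id   = $mech->form_id('search-form');
my $by_flds = $mech->form_with_fields(qw(username password));

printf "%s %s %s %s\n", map { $_->attr('name') } $second, $by_name, $by_id, $by_flds;
print 'Current form: ', $mech->current_form->attr('name'), "\n";
```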

FIELD METHODS

       These methods allow you to set the values of fields in a given form.

   $mech->field( $name, $value, $number )
   $mech->field( $name, \@values, $number )
       Given the name of a field, set its value to the value specified.  This
       applies to the current form (as set by the "form_name()" or
       "form_number()" method or defaulting to the first form on the page).

       The optional $number parameter is used to distinguish between two
       fields with the same name.  The fields are numbered from 1.

   $mech->select($name, $value)
   $mech->select($name, \@values)
       Given the name of a "select" field, set its value to the value
       specified.  If the field is not "<select multiple>" and the $value is
       an array, only the first value will be set.  [Note: the documentation
       previously claimed that only the last value would be set, but this was
       incorrect.]  Passing $value as a hash reference with an "n" key
       selects an item by number (e.g. "{n => 3}" or "{n => [2,4]}").  The
       numbering starts at 1.  This applies to the current form.

       If you have a field with "<select multiple>" and you pass a single
       $value, then $value will be added to the list of fields selected,
       without clearing the others.  However, if you pass an array reference,
       then all previously selected values will be cleared.

       Returns true on successfully setting the value. On failure, returns
       false and calls "$self->warn()" with an error message.

   $mech->set_fields( $name => $value ... )
       This method sets multiple fields of the current form. It takes a list
       of field name and value pairs. If there is more than one field with the
       same name, the first one found is set. If you want to select which of
       the duplicate fields to set, use a value which is an anonymous array
       whose two elements are the field value and the field's number.

           # set the second field named $name to 'foo'
           $mech->set_fields( $name => [ 'foo', 2 ] );

       The fields are numbered from 1.

       This applies to the current form.
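
       A sketch of the three setters on one hypothetical form, with "value()"
       used to read the results back.  The form deliberately contains two
       fields named nick to show the "[ value, number ]" form of
       "set_fields()"; the page is fetched through a file: URL.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# A throwaway local page with one hypothetical profile form.
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'profile.html' );
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} q{<html><body><form action="profile.html">
<input name="username">
<select name="color">
<option>red</option><option>green</option><option>blue</option>
</select>
<input name="nick" value="first">
<input name="nick" value="second">
</form></body></html>
};
close $fh;

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($file) );
$mech->form_number(1);

$mech->field( username => 'mungo' );
$mech->select( color => 'green' );

# Two fields share the name "nick"; [ value, number ] targets the 2nd one.
$mech->set_fields( nick => [ 'changed', 2 ] );

printf "username=%s color=%s nick#1=%s nick#2=%s\n",
    $mech->value('username'), $mech->value('color'),
    $mech->value('nick'),     $mech->value( 'nick', 2 );
```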

   $mech->set_visible( @criteria )
       This method sets fields of the current form without having to know
       their names.  So if you have a login screen that wants a username and
       password, you do not have to fetch the form and inspect the source (or
       use the mech-dump utility, installed with WWW::Mechanize) to see what
       the field names are; you can just say

           $mech->set_visible( $username, $password );

       and the first and second fields will be set accordingly.  The method is
       called set_visible because it acts only on visible fields; hidden form
       inputs are not considered.  The order of the fields is the order in
       which they appear in the HTML source, which is nearly always the order
       anyone viewing the page would think they are in, but some creative work
       with tables could change that; caveat user.

       Each element in @criteria is either a field value or a field specifier.
       A field value is a scalar.  A field specifier allows you to specify the
       type of input field you want to set and is denoted with an arrayref
       containing two elements.  So you could specify the first radio button
       with

           $mech->set_visible( [ radio => 'KCRW' ] );

       Field values and specifiers can be intermixed, hence

           $mech->set_visible( 'fred', 'secret', [ option => 'Checking' ] );

       would set the first two fields to "fred" and "secret", and the next
       "OPTION" menu field to "Checking".

       The possible field specifier types are: "text", "password", "hidden",
       "textarea", "file", "image", "submit", "radio", "checkbox" and
       "option".

       "set_visible" returns the number of values set.
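
       A sketch of "set_visible()" on a hypothetical login form that has a
       hidden token between the visible fields.  The hidden field is skipped,
       and the "[ radio => 'Checking' ]" specifier targets the radio group
       that follows the password field.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# A throwaway local page with one hypothetical login form.
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'login.html' );
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} q{<html><body><form action="login.html">
<input type="text" name="user">
<input type="hidden" name="token" value="abc123">
<input type="password" name="pass">
<input type="radio" name="account" value="Savings">
<input type="radio" name="account" value="Checking">
<input type="submit" value="Log in">
</form></body></html>
};
close $fh;

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($file) );

# Values fill the visible fields in source order (the hidden token is
# skipped); the [ radio => ... ] specifier targets the next radio group.
$mech->set_visible( 'fred', 'secret', [ radio => 'Checking' ] );

printf "user=%s pass=%s account=%s token=%s\n",
    map { $mech->value($_) } qw(user pass account token);
```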

   $mech->tick( $name, $value [, $set] )
       "Ticks" the first checkbox that has both the name and value associated
       with it on the current form.  Dies if there is no named check box for
       that value.  Passing in a false value as the third optional argument
       will cause the checkbox to be unticked.

   $mech->untick($name, $value)
       Causes the checkbox to be unticked.  Shorthand for
       "tick($name,$value,undef)".
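
       Ticking and unticking one of two same-named checkboxes on a
       hypothetical form can be sketched as follows, reading the state back
       with "value()"; an unticked checkbox reports an undefined value.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# A throwaway local page with two hypothetical checkboxes named "topic".
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'prefs.html' );
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} q{<html><body><form action="prefs.html">
<input type="checkbox" name="topic" value="news">
<input type="checkbox" name="topic" value="sports">
</form></body></html>
};
close $fh;

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($file) );

# Tick the "topic" checkbox whose value is "sports" (the 2nd one).
$mech->tick( topic => 'sports' );
my $ticked = $mech->value( 'topic', 2 );

# Untick it again; an unticked checkbox has an undefined value.
$mech->untick( topic => 'sports' );
my $unticked = $mech->value( 'topic', 2 );

printf "after tick: %s, after untick: %s\n",
    $ticked, defined $unticked ? $unticked : 'undef';
```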
746   $mech->value( $name [, $number] )
747       Given the name of a field, return its value. This applies to the
748       current form.
749
750       The optional $number parameter is used to distinguish between two
751       fields with the same name.  The fields are numbered from 1.
752
753       If the field is of type file (file upload field), the value is always
754       cleared to prevent remote sites from downloading your local files.  To
755       upload a file, specify its file name explicitly.
756
757   $mech->click( $button [, $x, $y] )
758       Has the effect of clicking a button on the current form.  The first
759       argument is the name of the button to be clicked.  The second and third
760       arguments (optional) allow you to specify the (x,y) coordinates of the
761       click.
762
763       If there is only one button on the form, "$mech->click()" with no
764       arguments simply clicks that one button.
765
766       Returns an HTTP::Response object.
767
   $mech->click_button( ... )
       Has the effect of clicking a button on the current form by specifying
       its name, value, or index.  Its arguments are a list of key/value
       pairs.  Exactly one of name, number, input or value must be specified
       in the keys.

       ·   "name => name"

           Clicks the button named name in the current form.

       ·   "number => n"

           Clicks the nth button in the current form.  Numbering starts at 1.

       ·   "value => value"

           Clicks the button with the value value in the current form.

       ·   "input => $inputobject"

           Clicks on the button referenced by $inputobject, an instance of
           HTML::Form::SubmitInput obtained e.g. from

               $mech->current_form()->find_input( undef, 'submit' )

           $inputobject must belong to the current form.

       ·   "x => x"

       ·   "y => y"

           These optional arguments specify the (x,y) coordinates of the
           click.

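       click() and click_button() send the request immediately.  To see what
       a given button press would submit without sending anything, one option
       is to work with the HTML::Form object from current_form() directly:
       find_input() locates the button as in the "input => $inputobject"
       example, and HTML::Form's own click() only builds an HTTP::Request.
       The form, its fields, the example.com action and the fixture_url()
       helper are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# Illustrative helper: write an HTML fixture to a temp file, return its file: URL.
sub fixture_url {
    my ($html) = @_;
    my ( $fh, $path ) =
        tempfile( 'mechXXXXXX', SUFFIX => '.html', TMPDIR => 1, UNLINK => 1 );
    print {$fh} $html;
    close $fh or die "close $path: $!";
    return URI::file->new_abs($path)->as_string;
}

my $url = fixture_url(q{<html><body>
<form action="http://example.com/search" method="get">
  <input type="text" name="query">
  <input type="submit" name="go" value="Search Now">
  <input type="submit" name="lucky" value="Feeling Lucky">
</form>
</body></html>});

my $mech = WWW::Mechanize->new();
$mech->get($url);
$mech->form_number(1);
$mech->field( query => 'pot of gold' );

# Locate the second button, as for click_button( input => $button ) ...
my $form   = $mech->current_form();
my $button = $form->find_input( 'lucky', 'submit' );

# ... but let HTML::Form build the request instead of sending it.
my $req  = $button->click($form);
my %sent = $req->uri->query_form;

print $req->method, ' ', $req->uri, "\n";
```

       Only the clicked button contributes its name and value, so "go" is
       absent from the query string.
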
   $mech->submit()
       Submits the page, without specifying a button to click.  Actually, no
       button is clicked at all.

       Returns an HTTP::Response object.

       This used to be a synonym for "$mech->click( 'submit' )", but is no
       longer so.

   $mech->submit_form( ... )
       This method lets you select a form from the previously fetched page,
       fill in its fields, and submit it.  It combines the
       form_number/form_name, set_fields and click methods into one
       higher-level call.  Its arguments are a list of key/value pairs, all
       of which are optional.

       ·   "fields => \%fields"

           Specifies the fields to be filled in the current form.

       ·   "with_fields => \%fields"

           Probably all you need for the common case.  It combines a smart
           form selector and data setting in one operation.  It selects the
           first form that contains all fields mentioned in "\%fields".  This
           is nice because you don't need to know the name or number of the
           form to do this.

           (Calls "form_with_fields()" and "set_fields()".)

           If you choose this, the form_number, form_name, form_id and fields
           options will be ignored.

       ·   "form_number => n"

           Selects the nth form (calls "form_number()").  If this parameter is
           not specified, the currently-selected form is used.

       ·   "form_name => name"

           Selects the form named name (calls "form_name()").

       ·   "form_id => ID"

           Selects the form with ID ID (calls "form_id()").

       ·   "button => button"

           Clicks on button button (calls "click()").

       ·   "x => x, y => y"

           Sets the x and y coordinates for "click()".

       If no form is selected, the first form found is used.

       If button is not passed, then the "submit()" method is used instead.

       Returns an HTTP::Response object.

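       A sketch of submit_form() with "with_fields", assuming a hypothetical
       page with two forms, only one of which contains a "query" field.  Both
       pages live in a temporary directory, and the form uses GET with a
       file: action, so the submission stays on the local machine.  This
       relies on LWP's file: handler ignoring the query string when it reads
       the target file; the write_page() helper is illustrative.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI;
use URI::file;
use WWW::Mechanize;

# Illustrative helper: write an HTML fixture into $dir and return its file: URL.
sub write_page {
    my ( $dir, $name, $html ) = @_;
    my $path = File::Spec->catfile( $dir, $name );
    open my $fh, '>', $path or die "open $path: $!";
    print {$fh} $html;
    close $fh or die "close $path: $!";
    return URI::file->new_abs($path)->as_string;
}

my $dir        = tempdir( CLEANUP => 1 );
my $result_url = write_page( $dir, 'result.html', '<html><body>Thanks</body></html>' );

# Two GET forms pointing at the local result page; only the second has "query".
my $form_url = write_page( $dir, 'search.html', qq{<html><body>
<form name="login" action="$result_url" method="get">
  <input type="text" name="username">
</form>
<form name="search" action="$result_url" method="get">
  <input type="text" name="query">
  <input type="submit" name="go" value="Search Now">
</form>
</body></html>} );

my $mech = WWW::Mechanize->new();
$mech->get($form_url);

# with_fields selects the first form containing every listed field and fills
# it in; "button" then clicks the named submit button.
my $response = $mech->submit_form(
    with_fields => { query => 'pot of gold' },
    button      => 'go',
);

# For a GET form the submitted fields end up in the query string.
my %sent = URI->new( '' . $mech->uri )->query_form;
print $response->code, " query=$sent{query} go=$sent{go}\n";
```

       The form is found by its fields alone, so neither form_name nor
       form_number is needed here.
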

MISCELLANEOUS METHODS

   $mech->add_header( name => $value [, name => $value... ] )
       Sets HTTP headers for the agent to add or remove from the HTTP request.

           $mech->add_header( Encoding => 'text/klingon' );

       If a value is "undef", then that header will be removed from any future
       requests.  For example, to never send a Referer header:

           $mech->add_header( Referer => undef );

       To remove a header from the agent's list of special headers entirely,
       restoring the default behavior, use "delete_header".

       Returns the number of name/value pairs added.

       NOTE: This method was very different in WWW::Mechanize before 1.00.
       Back then, the headers were stored in a package hash, not as a member
       of the object instance.  Calling "add_header()" would modify the
       headers for every WWW::Mechanize object, even after your object no
       longer existed.

   $mech->delete_header( name [, name ... ] )
       Removes HTTP headers from the agent's list of special headers.  For
       instance, you might need to do something like:

           # Don't send a Referer for this URL
           $mech->add_header( Referer => undef );

           # Get the URL
           $mech->get( $url );

           # Back to the default behavior
           $mech->delete_header( 'Referer' );

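       A sketch of the add_header()/delete_header() cycle.  It inspects the
       request that was actually sent, via "$mech->response->request",
       rather than a server's logs.  "X-Example" is a made-up header name and
       fixture_url() an illustrative helper.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# Illustrative helper: write an HTML fixture to a temp file, return its file: URL.
sub fixture_url {
    my ($html) = @_;
    my ( $fh, $path ) =
        tempfile( 'mechXXXXXX', SUFFIX => '.html', TMPDIR => 1, UNLINK => 1 );
    print {$fh} $html;
    close $fh or die "close $path: $!";
    return URI::file->new_abs($path)->as_string;
}

my $url  = fixture_url('<html><body>hello</body></html>');
my $mech = WWW::Mechanize->new();

# "X-Example" is a made-up header used only for this illustration.
$mech->add_header( 'X-Example' => 'klingon' );
$mech->get($url);
my $with = $mech->response->request->header('X-Example');

# Drop it from the agent's special headers; the next request goes out without it.
$mech->delete_header('X-Example');
$mech->get($url);
my $without = $mech->response->request->header('X-Example');

printf "with=%s without=%s\n", $with, defined $without ? $without : 'undef';
```
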
   $mech->quiet(true/false)
       Allows you to suppress warnings to the screen.

           $mech->quiet(0); # turns on warnings (the default)
           $mech->quiet(1); # turns off warnings
           $mech->quiet();  # returns the current quietness status

   $mech->stack_depth( $max_depth )
       Get or set the page stack depth.  Use this if you're doing a lot of
       page scraping and running out of memory.

       A value of 0 means "no history at all."  By default, the max stack
       depth is humongously large, effectively keeping all history.

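       A sketch of how a small stack depth limits back(), assuming three
       hypothetical local pages titled A, B and C.  With a depth of 1 only
       the most recently left page is kept, so a second back() has nowhere to
       go.  fixture_url() is an illustrative helper.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# Illustrative helper: write an HTML fixture to a temp file, return its file: URL.
sub fixture_url {
    my ($html) = @_;
    my ( $fh, $path ) =
        tempfile( 'mechXXXXXX', SUFFIX => '.html', TMPDIR => 1, UNLINK => 1 );
    print {$fh} $html;
    close $fh or die "close $path: $!";
    return URI::file->new_abs($path)->as_string;
}

my @pages = map {
    fixture_url("<html><head><title>$_</title></head><body>$_</body></html>")
} qw(A B C);

my $mech = WWW::Mechanize->new();
$mech->stack_depth(1);          # remember only the most recently left page
$mech->get($_) for @pages;      # visit A, then B, then C

my $first_back  = $mech->back;  # C -> B: B is the one page kept
my $title       = $mech->title;
my $second_back = $mech->back;  # A was discarded, so nothing is left

print "first back: ", ( $first_back ? 'ok' : 'failed' ), ", now on $title; ",
      "second back: ", ( $second_back ? 'ok' : 'nothing left' ), "\n";
```
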
   $mech->save_content( $filename )
       Dumps the contents of "$mech->content" into $filename.  $filename will
       be overwritten.  Dies if there are any errors.

       If the content type does not begin with "text/", then the content is
       saved in binary mode.

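       A minimal round trip through save_content(), assuming a hypothetical
       local page and a destination file in a temporary directory.  Because
       the page is text/html and plain ASCII, the saved file should match
       "$mech->content" exactly.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempfile tempdir);
use URI::file;
use WWW::Mechanize;

# Illustrative helper: write an HTML fixture to a temp file, return its file: URL.
sub fixture_url {
    my ($html) = @_;
    my ( $fh, $path ) =
        tempfile( 'mechXXXXXX', SUFFIX => '.html', TMPDIR => 1, UNLINK => 1 );
    print {$fh} $html;
    close $fh or die "close $path: $!";
    return URI::file->new_abs($path)->as_string;
}

my $url  = fixture_url('<html><head><title>Saved</title></head><body>Hello</body></html>');
my $mech = WWW::Mechanize->new();
$mech->get($url);

# Hypothetical destination; an existing file at this path would be overwritten.
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'page.html' );
$mech->save_content($file);    # text/html, so written in text mode

open my $in, '<', $file or die "open $file: $!";
my $saved = do { local $/; <$in> };
close $in;

print length($saved), " bytes saved\n";
```
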
   $mech->dump_headers( [$fh] )
       Prints a dump of the HTTP response headers for the most recent
       response to $fh.  If $fh is not specified or is undef, it dumps to
       STDOUT.

   $mech->dump_links( [[$fh], $absolute] )
       Prints a dump of the links on the current page to $fh.  If $fh is not
       specified or is undef, it dumps to STDOUT.

       If $absolute is true, links displayed are absolute, not relative.

   $mech->dump_images( [[$fh], $absolute] )
       Prints a dump of the images on the current page to $fh.  If $fh is not
       specified or is undef, it dumps to STDOUT.

       If $absolute is true, image URLs displayed are absolute, not relative.

   $mech->dump_forms( [$fh] )
       Prints a dump of the forms on the current page to $fh.  If $fh is not
       specified or is undef, it dumps to STDOUT.

   $mech->dump_all( [[$fh], $absolute] )
       Prints a dump of all links, images and forms on the current page to
       $fh.  If $fh is not specified or is undef, it dumps to STDOUT.

       If $absolute is true, link and image URLs displayed are absolute, not
       relative.

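       Since the dump_* methods accept a filehandle, their output can be
       captured in a string through Perl's in-memory filehandles, which is
       handy in tests.  The page and its links are hypothetical; note how the
       $absolute flag resolves "about.html" against the page's own file: URL.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# Illustrative helper: write an HTML fixture to a temp file, return its file: URL.
sub fixture_url {
    my ($html) = @_;
    my ( $fh, $path ) =
        tempfile( 'mechXXXXXX', SUFFIX => '.html', TMPDIR => 1, UNLINK => 1 );
    print {$fh} $html;
    close $fh or die "close $path: $!";
    return URI::file->new_abs($path)->as_string;
}

my $url = fixture_url(q{<html><body>
<a href="about.html">About</a>
<a href="http://example.com/">Example</a>
</body></html>});

my $mech = WWW::Mechanize->new();
$mech->get($url);

# Capture each dump in a string via an in-memory filehandle.
open my $rel_fh, '>', \my $relative or die "open: $!";
$mech->dump_links($rel_fh);            # URLs as written in the page
close $rel_fh;

open my $abs_fh, '>', \my $absolute or die "open: $!";
$mech->dump_links( $abs_fh, 1 );       # URLs resolved against the page's URL
close $abs_fh;

open my $hdr_fh, '>', \my $headers or die "open: $!";
$mech->dump_headers($hdr_fh);          # headers of the most recent response
close $hdr_fh;

print $relative, $absolute, $headers;
```
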

OVERRIDDEN LWP::UserAgent METHODS

   $mech->clone()
       Clone the mech object.  The clone uses the same cookie jar as the
       original mech.

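       A quick check of the shared cookie jar: the clone is a distinct
       object, but cookie_jar() (inherited from LWP::UserAgent) should return
       the same HTTP::Cookies instance for both.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
my $copy = $mech->clone();

# A separate agent, but the very same cookie jar: cookies collected while
# browsing with one agent are visible to the other.
my $same_jar = refaddr( $copy->cookie_jar ) == refaddr( $mech->cookie_jar );
print $same_jar ? "cookie jar shared\n" : "cookie jars differ\n";
```
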
   $mech->redirect_ok()
       An overridden version of "redirect_ok()" in LWP::UserAgent.  This
       method is used to determine whether a redirection in the request
       should be followed.

       Note that WWW::Mechanize's constructor pushes POST onto the agent's
       "requests_redirectable" list.

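       The effect of the constructor's change can be inspected through
       requests_redirectable(), which WWW::Mechanize inherits from
       LWP::UserAgent.  The sketch also shows one way to opt a single agent
       back out of following redirects for POST; whether you want that
       depends on the site.

```perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

# LWP::UserAgent follows redirects only for the methods in this list.
my @methods     = @{ $mech->requests_redirectable };
my $follow_post = grep { $_ eq 'POST' } @methods;
print "redirectable: @methods\n";

# To opt out for this agent only, store a list without POST.
$mech->requests_redirectable( [ grep { $_ ne 'POST' } @methods ] );
my $still_post = grep { $_ eq 'POST' } @{ $mech->requests_redirectable };
```
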
   $mech->request( $request [, $arg [, $size]])
       Overridden version of "request()" in LWP::UserAgent.  Performs the
       actual request.  Normally, if you're using WWW::Mechanize, it's because
       you don't want to deal with this level of detail anyway.

       Note that $request will be modified.

       Returns an HTTP::Response object.

   $mech->update_html( $html )
       Allows you to replace the HTML that the mech has found.  Updates the
       forms and links parse-trees that the mech uses internally.

       Say you have a page that you know has malformed output, and you want to
       update it so the links come out correctly:

           my $html = $mech->content;
           $html =~ s[</option>.{0,3}</td>][</option></select></td>]isg;
           $mech->update_html( $html );

       This method is also used internally by the mech itself to update its
       own HTML content when loading a page.  This means that if you would
       like to systematically perform the above HTML substitution, you would
       override update_html in a subclass thusly:

           package MyMech;
           use strict;
           use warnings;
           use base 'WWW::Mechanize';

           sub update_html {
               my ( $self, $html ) = @_;

               # Close the unterminated <select> before the parent class parses
               $html =~ s[</option>.{0,3}</td>][</option></select></td>]isg;
               return $self->SUPER::update_html( $html );
           }

           1;

       If you do this, then the mech will use the tidied-up HTML instead of
       the original both when parsing for its own needs, and for returning to
       you through "content".

       Overriding this method is also the recommended way of implementing
       extra validation steps (e.g. link checkers) for every HTML page
       received.  "warn" and "die" would then come in handy to signal
       validation errors.

   $mech->credentials( $username, $password )
       Provide credentials to be used for HTTP Basic authentication for all
       sites and realms until further notice.

       The four-argument form described in LWP::UserAgent is still supported.

   $mech->get_basic_credentials( $realm, $uri, $isproxy )
       Returns the credentials for the realm and URI.

   $mech->clear_credentials()
       Remove any credentials set up with "credentials()".

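       A sketch of the two-argument form, reusing the credentials from the
       SYNOPSIS.  The realms and URLs are hypothetical; the point is that the
       same pair comes back for any of them.  Nothing is sent over the
       network, since get_basic_credentials() is called directly here.

```perl
use strict;
use warnings;
use URI;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->credentials( 'mungo', 'lost-and-alone' );

# Hypothetical realms and sites: the two-argument form is not tied to either.
my @members = $mech->get_basic_credentials( 'Members', URI->new('http://example.com/'), 0 );
my @admin   = $mech->get_basic_credentials( 'Admin', URI->new('http://example.org/admin'), 0 );
print "members: @members; admin: @admin\n";

# Forget them again.
$mech->clear_credentials();
```
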

INTERNAL-ONLY METHODS

       These methods are only used internally.  You probably don't need to
       know about them.

   $mech->_update_page($request, $response)
       Updates all internal variables in $mech as if $request was just
       performed, and returns $response.  The page stack is not altered by
       this method; it is up to the caller (e.g. "request") to do that.

   $mech->_modify_request( $req )
       Modifies an HTTP::Request before the request is sent out, for both GET
       and POST requests.

       We add a "Referer" header, as well as a header to note that we can
       accept gzip-encoded content, if Compress::Zlib is installed.

   $mech->_make_request()
       Convenience method to make it easier for subclasses like
       WWW::Mechanize::Cached to intercept the request.

   $mech->_reset_page()
       Resets the internal fields that track the parsed contents of the page.

   $mech->_extract_links()
       Extracts links from the content of a webpage, and populates the
       "{links}" property with WWW::Mechanize::Link objects.

   $mech->_push_page_stack()
       The agent keeps a stack of visited pages, which it can pop when it
       needs to go BACK and so on.

       The current page needs to be pushed onto the stack before we get a new
       page, and the stack needs to be popped when BACK occurs.

       Neither of these takes any arguments; they just operate on the $mech
       object.

   warn( @messages )
       Centralized warning method, for diagnostics and non-fatal problems.
       Defaults to calling "CORE::warn", but may be overridden by setting
       "onwarn" in the constructor.

   die( @messages )
       Centralized error method.  Defaults to calling "CORE::die", but may be
       overridden by setting "onerror" in the constructor.

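       A sketch of routing warnings and errors through custom handlers, for
       example to collect them in a log instead of printing them.  It calls
       warn() and die() directly for illustration, and assumes that quiet
       mode stops warn() from reaching the onwarn handler, in line with the
       description of quiet() above.  The log format is arbitrary.

```perl
use strict;
use warnings;
use WWW::Mechanize;

my @log;
my $mech = WWW::Mechanize->new(
    onwarn  => sub { push @log, "warn: @_" },
    onerror => sub { push @log, "error: @_" },   # records instead of dying
);

$mech->warn('first warning');
$mech->quiet(1);
$mech->warn('silenced');         # assumed: quiet mode skips the onwarn handler
$mech->quiet(0);
$mech->die('recorded error');    # the handler returns, so execution continues

print "$_\n" for @log;
```
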

REQUESTS & BUGS

       The bug queue for WWW::Mechanize and Test::WWW::Mechanize is at
       <http://code.google.com/p/www-mechanize/issues/list>.  Please do not
       add any tickets to the old queue at <http://rt.cpan.org/>.


WWW::MECHANIZE'S SUBVERSION REPOSITORY

       Mech and Test::WWW::Mechanize are both hosted at Google Code:
       http://code.google.com/p/www-mechanize/.  The Subversion repository is
       at http://www-mechanize.googlecode.com/svn/wm/.


OTHER DOCUMENTATION

   Spidering Hacks, by Kevin Hemenway and Tara Calishain
       Spidering Hacks from O'Reilly
       (<http://www.oreilly.com/catalog/spiderhks/>) is a great book for
       anyone wanting to know more about screen-scraping and spidering.

       There are six hacks that use Mech or a Mech derivative:

       #21 WWW::Mechanize 101
       #22 Scraping with WWW::Mechanize
       #36 Downloading Images from Webshots
       #44 Archiving Yahoo! Groups Messages with WWW::Yahoo::Groups
       #64 Super Author Searching
       #73 Scraping TV Listings

       The book was also positively reviewed on Slashdot:
       <http://books.slashdot.org/article.pl?sid=03/12/11/2126256>


ONLINE RESOURCES AND SUPPORT

       ·   WWW::Mechanize mailing list

           The Mech mailing list is at
           <http://groups.google.com/group/www-mechanize-users> and is
           specific to Mechanize, unlike the LWP mailing list below.  Although
           it is a users list, all development discussion takes place here,
           too.

       ·   LWP mailing list

           The LWP mailing list is at
           <http://lists.perl.org/showlist.cgi?name=libwww>, and is more
           user-oriented and well-populated than the WWW::Mechanize list.

       ·   Perlmonks

           <http://perlmonks.org> is an excellent community of support, and
           many questions about Mech have already been answered there.

       ·   WWW::Mechanize::Examples

           A random array of examples submitted by users, included with the
           Mechanize distribution.


ARTICLES ABOUT WWW::MECHANIZE

       ·   <http://www-128.ibm.com/developerworks/linux/library/wa-perlsecure.html>

           IBM article "Secure Web site access with Perl"

       ·   <http://www.oreilly.com/catalog/googlehks2/chapter/hack84.pdf>

           Leland Johnson's hack #84 in Google Hacks, 2nd Edition is an
           example of a production script that uses WWW::Mechanize and
           HTML::TableContentParser.  It takes in keywords and returns the
           estimated price of these keywords on Google's AdWords program.

       ·   <http://www.perl.com/pub/a/2004/06/04/recorder.html>

           Linda Julien writes about using HTTP::Recorder to create
           WWW::Mechanize scripts.

       ·   <http://www.developer.com/lang/other/article.php/3454041>

           Jason Gilmore's article on using WWW::Mechanize for scraping sales
           information from Amazon and eBay.

       ·   <http://www.perl.com/pub/a/2003/01/22/mechanize.html>

           Chris Ball's article about using WWW::Mechanize for scraping TV
           listings.

       ·   <http://www.stonehenge.com/merlyn/LinuxMag/col47.html>

           Randal Schwartz's article on scraping Yahoo News for images.  It's
           already out of date: he manually walks the list of links hunting
           for matches, which wouldn't have been necessary if the
           "find_link()" method had existed at press time.

       ·   <http://www.perladvent.org/2002/16th/>

           WWW::Mechanize on the Perl Advent Calendar, by Mark Fowler.

       ·   <http://www.linux-magazin.de/Artikel/ausgabe/2004/03/perl/perl.html>

           Michael Schilli's article on Mech and WWW::Mechanize::Shell for the
           German magazine Linux Magazin.

   Other modules that use Mechanize
       Here are modules that use or subclass Mechanize.  Let me know of any
       others:

       ·   Finance::Bank::LloydsTSB

       ·   HTTP::Recorder

           Acts as a proxy for web interaction, and then generates
           WWW::Mechanize scripts.

       ·   Win32::IE::Mechanize

           Just like Mech, but using Microsoft Internet Explorer to do the
           work.

       ·   WWW::Bugzilla

       ·   WWW::CheckSite

       ·   WWW::Google::Groups

       ·   WWW::Hotmail

       ·   WWW::Mechanize::Cached

       ·   WWW::Mechanize::FormFiller

       ·   WWW::Mechanize::Shell

       ·   WWW::Mechanize::Sleepy

       ·   WWW::Mechanize::SpamCop

       ·   WWW::Mechanize::Timed

       ·   WWW::SourceForge

       ·   WWW::Yahoo::Groups


ACKNOWLEDGEMENTS

       Thanks to the numerous people who have helped out on WWW::Mechanize in
       one way or another, including Kirrily Robert for the original
       "WWW::Automate", Gisle Aas, Jeremy Ary, Hilary Holz, Rafael Kitover,
       Norbert Buchmuller, Dave Page, David Sainty, H.Merijn Brand, Matt
       Lawrence, Michael Schwern, Adriano Ferreira, Miyagawa, Peteris Krumins,
       David Steinbrunner, Kevin Falcone, Mike O'Regan, Mark Stosberg, Uri
       Guttman, Peter Scott, Phillipe Bruhat, Ian Langworth, John Beppu, Gavin
       Estey, Jim Brandt, Ask Bjoern Hansen, Greg Davies, Ed Silva, Mark-Jason
       Dominus, Autrijus Tang, Mark Fowler, Stuart Children, Max Maischein,
       Meng Wong, Prakash Kailasa, Abigail, Jan Pazdziora, Dominique
       Quatravaux, Scott Lanning, Rob Casey, Leland Johnson, Joshua Gatcomb,
       Julien Beasley, Abe Timmerman, Peter Stevens, Pete Krawczyk, Tad
       McClellan, and the late great Iain Truskett.


COPYRIGHT

       Copyright (c) 2005-2010 Andy Lester. All rights reserved. This program
       is free software; you can redistribute it and/or modify it under the
       same terms as Perl itself.



perl v5.12.0                      2010-04-11                 WWW::Mechanize(3)