WWW::Mechanize(3)     User Contributed Perl Documentation    WWW::Mechanize(3)

NAME

6       WWW::Mechanize - Handy web browsing in a Perl object
7

VERSION

9       Version 1.32
10

SYNOPSIS

12       "WWW::Mechanize", or Mech for short, helps you automate interaction
13       with a website. It supports performing a sequence of page fetches
14       including following links and submitting forms. Each fetched page is
15       parsed and its links and forms are extracted. A link or a form can be
16       selected, form fields can be filled and the next page can be fetched.
17       Mech also stores a history of the URLs you've visited, which can be
18       queried and revisited.
19
20           use WWW::Mechanize;
21           my $mech = WWW::Mechanize->new();
22
23           $mech->get( $url );
24
25           $mech->follow_link( n => 3 );
26           $mech->follow_link( text_regex => qr/download this/i );
27           $mech->follow_link( url => 'http://host.com/index.html' );
28
29           $mech->submit_form(
30               form_number => 3,
31               fields      => {
32                   username    => 'mungo',
33                   password    => 'lost-and-alone',
34               }
35           );
36
37           $mech->submit_form(
38               form_name => 'search',
39               fields    => { query  => 'pot of gold', },
40               button    => 'Search Now'
41           );
42
       Mech is well suited for use in testing web applications.  If you use
       one of the Test::* modules, like Test::HTML::Lint, you can check the
       fetched content and use that as input to a test call.
46
47           use Test::More;
48           like( $mech->content(), qr/$expected/, "Got expected content" );
49
       Each page fetch stores its URL in a history stack which you can
       traverse.
52
53           $mech->back();
54
55       If you want finer control over your page fetching, you can use these
56       methods. "follow_link" and "submit_form" are just high level wrappers
57       around them.
58
59           $mech->find_link( n => $number );
60           $mech->form_number( $number );
61           $mech->form_name( $name );
62           $mech->field( $name, $value );
63           $mech->set_fields( %field_values );
64           $mech->set_visible( @criteria );
65           $mech->click( $button );
66
67       WWW::Mechanize is a proper subclass of LWP::UserAgent and you can also
68       use any of LWP::UserAgent's methods.
69
70           $mech->add_header($name => $value);
71
72       Please note that Mech does NOT support JavaScript.  Please check the
73       FAQ in WWW::Mechanize::FAQ for more.
74
IMPORTANT LINKS

       * <http://code.google.com/p/www-mechanize/issues/list>
           The queue for bugs & enhancements in WWW::Mechanize and
           Test::WWW::Mechanize.  Please note that the queue at
           <http://rt.cpan.org> is no longer maintained.
80
81       * <http://search.cpan.org/dist/WWW-Mechanize/>
82           The CPAN documentation page for Mechanize.
83
84       * <http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize/FAQ.pod>
85           Frequently asked questions.  Make sure you read here FIRST.
86

CONSTRUCTOR AND STARTUP

88       new()
89
90       Creates and returns a new WWW::Mechanize object, hereafter referred to
91       as the "agent".
92
           my $mech = WWW::Mechanize->new();
94
95       The constructor for WWW::Mechanize overrides two of the parms to the
96       LWP::UserAgent constructor:
97
98           agent => 'WWW-Mechanize/#.##'
99           cookie_jar => {}    # an empty, memory-only HTTP::Cookies object
100
101       You can override these overrides by passing parms to the constructor,
102       as in:
103
104           my $mech = WWW::Mechanize->new( agent => 'wonderbot 1.01' );
105
106       If you want none of the overhead of a cookie jar, or don't want your
107       bot accepting cookies, you have to explicitly disallow it, like so:
108
109           my $mech = WWW::Mechanize->new( cookie_jar => undef );
110
111       Here are the parms that WWW::Mechanize recognizes.  These do not
112       include parms that LWP::UserAgent recognizes.
113
       * "autocheck => [0|1]"
           Checks each request made to see if it was successful.  This saves
           you the trouble of manually checking yourself.  Any errors found
           are errors, not warnings.  Default is off.

       * "onwarn => \&func"
           Reference to a "warn"-compatible function, such as "Carp::carp",
           that is called when a warning needs to be shown.

           If this is set to "undef", no warnings will ever be shown.
           However, it's probably better to use the "quiet" method to control
           that behavior.

           If this value is not passed, Mech uses "Carp::carp" if Carp is
           installed, or "CORE::warn" if not.

       * "onerror => \&func"
           Reference to a "die"-compatible function, such as "Carp::croak",
           that is called when there's a fatal error.

           If this is set to "undef", no errors will ever be shown.

           If this value is not passed, Mech uses "Carp::croak" if Carp is
           installed, or "CORE::die" if not.

       * "quiet => [0|1]"
           Don't complain on warnings.  Setting "quiet => 1" is the same as
           calling "$mech->quiet(1)".  Default is off.
142
       * "stack_depth => $value"
           Sets the depth of the page stack that keeps track of all the
           downloaded pages.  The default is effectively unlimited.  If the
           stack is eating up your memory, set it to a smaller value; a
           value of 0 keeps no history at all.
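
       The parms above can be combined in a single constructor call.  The
       following is a minimal sketch rather than a recommended setup: the
       agent string, the stack depth of 5 and the @errors array are
       illustrative choices, and the "onerror" handler is a hypothetical one
       that records the message before dying.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Collects fatal error messages; purely illustrative.
my @errors;

my $mech = WWW::Mechanize->new(
    agent       => 'wonderbot 1.01',    # override the default agent string
    cookie_jar  => undef,               # do not accept cookies
    autocheck   => 1,                   # treat failed requests as fatal
    stack_depth => 5,                   # keep at most five pages of history
    onerror     => sub { push @errors, "@_"; die @_ },
);

print 'agent: ',       $mech->agent,       "\n";
print 'stack depth: ', $mech->stack_depth, "\n";
```

       Here "agent" is inherited from LWP::UserAgent, while "stack_depth" is
       the accessor described under MISCELLANEOUS METHODS.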
147
148       $mech->agent_alias( $alias )
149
       Sets the user agent string to the expanded version from a table of
       actual user agent strings.  $alias can be one of the following:

       * Windows IE 6
       * Windows Mozilla
       * Mac Safari
       * Mac Mozilla
       * Linux Mozilla
       * Linux Konqueror

       For instance,
161
162           $mech->agent_alias( 'Windows IE 6' );
163
164       sets your User-Agent to
165
166           Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
167
       The list of valid aliases can be returned from "known_agent_aliases()".
177
178       known_agent_aliases()
179
180       Returns a list of all the agent aliases that Mech knows about.
181

PAGE-FETCHING METHODS

183       $mech->get( $uri )
184
       Given a URL/URI, fetches it.  Returns an HTTP::Response object.  $uri
       can be a well-formed URL string, a URI object, or a
       WWW::Mechanize::Link object.

       The results are stored internally in the agent object, but you don't
       know that.  Just use the accessors listed below.  Poking at the
       internals is deprecated and subject to change in the future.
192
193       "get()" is a well-behaved overloaded version of the method in
194       LWP::UserAgent.  This lets you do things like
195
196           $mech->get( $uri, ':content_file' => $tempfile );
197
       and you can rest assured that the parms will get filtered down
       appropriately.
200
201       $mech->put( $uri, content => $content )
202
203       PUTs $content to $uri.  Returns an HTTP::Response object.  $uri can be
204       a well-formed URI string, a URI object, or a WWW::Mechanize::Link
205       object.
206
207       $mech->reload()
208
209       Acts like the reload button in a browser: repeats the current request.
210       The history (as per the back method) is not altered.
211
212       Returns the HTTP::Response object from the reload, or "undef" if
213       there's no current request.
214
215       $mech->back()
216
217       The equivalent of hitting the "back" button in a browser.  Returns to
218       the previous page.  Won't go back past the first page. (Really, what
219       would it do if it could?)
220

STATUS METHODS

222       $mech->success()
223
224       Returns a boolean telling whether the last request was successful.  If
225       there hasn't been an operation yet, returns false.
226
227       This is a convenience function that wraps "$mech->res->is_success".
228
229       $mech->uri()
230
231       Returns the current URI as a URI object. This object stringifies to the
232       URI itself.
233
       $mech->response() / $mech->res()

       Returns the current response as an HTTP::Response object.
       "$mech->res()" is a synonym for "$mech->response()".
239
240       $mech->status()
241
242       Returns the HTTP status code of the response.
243
244       $mech->ct()
245
246       Returns the content type of the response.
247
248       $mech->base()
249
       Returns the base URI for the current response.
251
252       $mech->forms()
253
254       When called in a list context, returns a list of the forms found in the
255       last fetched page. In a scalar context, returns a reference to an array
256       with those forms. The forms returned are all HTML::Form objects.
257
258       $mech->current_form()
259
260       Returns the current form as an HTML::Form object.
261
262       $mech->links()
263
264       When called in a list context, returns a list of the links found in the
265       last fetched page.  In a scalar context it returns a reference to an
266       array with those links.  Each link is a WWW::Mechanize::Link object.
267
268       $mech->is_html()
269
270       Returns true/false on whether our content is HTML, according to the
271       HTTP headers.
272
273       $mech->title()
274
       Returns the contents of the "<TITLE>" tag, as parsed by
       HTML::HeadParser.  Returns undef if the content is not HTML.
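
       The status methods above are typically read together right after a
       fetch.  The sketch below is one way to do that, under the assumption
       that a local temporary file stands in for a real website (so no
       network is needed); the page content and variable names are invented
       for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# Write a small page to a temporary file so the sketch needs no network.
my ( $fh, $path ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print {$fh} <<'HTML';
<html>
  <head><title>Pot of Gold</title></head>
  <body><a href="next.html">next</a></body>
</html>
HTML
close $fh or die "Cannot close $path: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($path) );

if ( $mech->success ) {
    print 'URI:    ', $mech->uri,    "\n";
    print 'Status: ', $mech->status, "\n";
    print 'Type:   ', $mech->ct,     "\n";
    print 'Title:  ', $mech->title,  "\n" if $mech->is_html;
}
else {
    warn 'Fetch failed: ', $mech->response->status_line, "\n";
}
```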
277

CONTENT-HANDLING METHODS

279       $mech->content(...)
280
       Returns the content that the mech uses internally for the last page
       fetched. Ordinarily this is the same as $mech->response()->content(),
       but this may differ for HTML documents if "update_html" is overloaded
       (in which case the value passed to the base-class implementation of
       same will be returned), and/or extra named arguments are passed to
       content():
287
288       $mech->content( format => 'text' )
289         Returns a text-only version of the page, with all HTML markup
290         stripped. This feature requires HTML::TreeBuilder to be installed, or
291         a fatal error will be thrown.
292
       $mech->content( base_href => [$base_href|undef] )
         Returns the HTML document, modified to contain a "<base
         href="$base_href">" mark-up in the header.  $base_href is
         "$mech->base()" if not specified. This is handy to pass the HTML to
         e.g. HTML::Display.
298
       Passing arguments to "content()" if the current document is not HTML
       has no effect now (i.e. the return value is the same as
       "$self->response()->content()").  This may change in the future, but
       will likely be backwards-compatible when it does.
303
LINK METHODS

       $mech->links
306
       Lists all the links on the current page.  Each link is a
       WWW::Mechanize::Link object. In list context, returns a list of all
       links.  In scalar context, returns an array reference of all links.
310
311       $mech->follow_link(...)
312
313       Follows a specified link on the page.  You specify the match to be
314       found using the same parms that "find_link()" uses.
315
       Here are some examples:
317
318       * 3rd link called "download"
319               $mech->follow_link( text => 'download', n => 3 );
320
321       * first link where the URL has "download" in it, regardless of case:
322               $mech->follow_link( url_regex => qr/download/i );
323
324           or
325
326               $mech->follow_link( url_regex => qr/(?i:download)/ );
327
328       * 3rd link on the page
329               $mech->follow_link( n => 3 );
330
331       Returns the result of the GET method (an HTTP::Response object) if a
332       link was found. If the page has no links, or the specified link
333       couldn't be found, returns undef.
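
       As a rough illustration of "follow_link()" together with "back()",
       the sketch below builds two local pages and moves between them.  The
       "write_page" helper, the file names and the page titles are all
       hypothetical; "autocheck" is switched off explicitly so that a
       missing link shows up as an undef return value, as described above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

my $dir = tempdir( CLEANUP => 1 );

# Hypothetical helper: writes one HTML page into the temporary directory.
sub write_page {
    my ( $name, $title, $body ) = @_;
    open my $out, '>', "$dir/$name" or die "Cannot write $name: $!";
    print {$out} "<html><head><title>$title</title></head>",
                 "<body>$body</body></html>";
    close $out or die "Cannot close $name: $!";
    return;
}

write_page( 'index.html',  'First',  '<a href="second.html">Go to second</a>' );
write_page( 'second.html', 'Second', '<p>No links here.</p>' );

my $mech = WWW::Mechanize->new( autocheck => 0 );
$mech->get( URI::file->new_abs("$dir/index.html") );

# Follow the first link whose text mentions "second".
my $response = $mech->follow_link( text_regex => qr/second/i );
die "No matching link on the page\n" unless defined $response;
my $title_after_follow = $mech->title;    # title of second.html

# Step back through the history stack to the page we came from.
$mech->back();
my $title_after_back = $mech->title;      # title of index.html again

print "followed to: $title_after_follow\n";
print "back on:     $title_after_back\n";
```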
334
335       $mech->find_link( ... )
336
       Finds a link in the currently fetched page. It returns a
       WWW::Mechanize::Link object which describes the link.  (You'll
       probably be most interested in the "url()" property.)  If it fails to
       find a link it returns undef.

       You can take the URL part and pass it to the "get()" method.  If that's
       your plan, you might as well use the "follow_link()" method directly,
       since it does the "get()" for you automatically.

       Note that "<FRAME SRC="...">" tags are parsed out of the HTML and
       treated as links so this method works with them.
348
349       You can select which link to find by passing in one or more of these
350       key/value pairs:
351
       * "text => 'string'," and "text_regex => qr/regex/,"
           "text" matches the text of the link against string, which must be
           an exact match.  To select a link with text that is exactly
           "download", use

               $mech->find_link( text => 'download' );

           "text_regex" matches the text of the link against regex.  To select
           a link with text that has "download" anywhere in it, regardless of
           case, use

               $mech->find_link( text_regex => qr/download/i );

           Note that the text extracted from the page's links is trimmed.
           For example, "<a> foo </a>" is stored as 'foo', and searching for
           leading or trailing spaces will fail.
368
       * "url => 'string'," and "url_regex => qr/regex/,"
           Matches the URL of the link against string or regex, as
           appropriate.  The URL may be a relative URL, like foo/bar.html,
           depending on how it's coded on the page.

       * "url_abs => string" and "url_abs_regex => regex"
           Matches the absolute URL of the link against string or regex, as
           appropriate.  The URL will be an absolute URL, even if it's
           relative in the page.

       * "name => string" and "name_regex => regex"
           Matches the name of the link against string or regex, as
           appropriate.

       * "id => string" and "id_regex => regex"
           Matches the attribute 'id' of the link against string or regex, as
           appropriate.

       * "class => string" and "class_regex => regex"
           Matches the attribute 'class' of the link against string or regex,
           as appropriate.

       * "tag => string" and "tag_regex => regex"
           Matches the tag that the link came from against string or regex, as
           appropriate.  The "tag_regex" is probably most useful to check for
           more than one tag, as in:

               $mech->find_link( tag_regex => qr/^(a|frame)$/ );

           The tags and attributes looked at are defined below, at
           "$mech->find_link() : link format".
400
401       If "n" is not specified, it defaults to 1.  Therefore, if you don't
402       specify any parms, this method defaults to finding the first link on
403       the page.
404
405       Note that you can specify multiple text or URL parameters, which will
406       be ANDed together.  For example, to find the first link with text of
407       "News" and with "cnn.com" in the URL, use:
408
409           $mech->find_link( text => 'News', url_regex => qr/cnn\.com/ );
410
       The return value is a reference to an array containing a
       WWW::Mechanize::Link object for every link in "$self->content".
413
414       The links come from the following:
415
416       "<A HREF=...>"
417       "<AREA HREF=...>"
418       "<FRAME SRC=...>"
419       "<IFRAME SRC=...>"
420       "<META CONTENT=...>"
421
422       $mech->find_all_links( ... )
423
424       Returns all the links on the current page that match the criteria.  The
425       method for specifying link criteria is the same as in "find_link()".
426       Each of the links returned is a WWW::Mechanize::Link object.
427
       In list context, "find_all_links()" returns a list of the links.
       Otherwise, it returns a reference to the list of links.
430
431       "find_all_links()" with no parameters returns all links in the page.
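
       The difference between list and scalar context can be sketched as
       follows.  This assumes a small local page written to a temporary
       file in place of a real site; the link texts and URLs are made up
       for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# A local page with three links, so the sketch needs no network.
my ( $fh, $path ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print {$fh} <<'HTML';
<html><body>
  <a href="/dl/a.tgz">Download A</a>
  <a href="about.html">About</a>
  <a href="/dl/b.tgz"> download B </a>
</body></html>
HTML
close $fh or die "Cannot close $path: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($path) );

# List context: a plain list of WWW::Mechanize::Link objects.
my @downloads = $mech->find_all_links( text_regex => qr/download/i );

# Scalar context: a reference to an array of the same kind of objects.
my $all_links = $mech->find_all_links();

for my $link (@downloads) {
    printf "%-12s %s\n", $link->text, $link->url_abs;
}
print scalar( @{$all_links} ), " links in total\n";
```

       Note that the third link is matched despite the padding around its
       text, because link text is trimmed before matching.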
432
433       $mech->find_all_inputs( ... criteria ... )
434
435       find_all_inputs() returns an array of all the input controls in the
436       current form whose properties match all of the regexes passed in.  The
437       controls returned are all descended from HTML::Form::Input.
438
439       If no criteria are passed, all inputs will be returned.
440
       If there is no current page, there is no form on the current page, or
       there are no matching input controls in the current form, then the
       return will be an empty array.
444
       You may use a regex or a literal string:

           # get all textarea controls whose names begin with "customer"
           my @customer_text_inputs =
               $mech->find_all_inputs(
                   type       => 'textarea',
                   name_regex => qr/^customer/,
               );

           # get all text or textarea controls called "customer"
           my @customer_text_inputs =
               $mech->find_all_inputs(
                   type_regex => qr/^(text|textarea)$/,
                   name       => 'customer',
               );
462
463       $mech->find_all_submits( ... criteria ... )
464
465       "find_all_submits()" does the same thing as "find_all_inputs()" except
466       that it only returns controls that are submit controls, ignoring other
467       types of input controls like text and checkboxes.
468

IMAGE METHODS

470       $mech->images
471
       Lists all the images on the current page.  Each image is a
       WWW::Mechanize::Image object. In list context, returns a list of all
       images.  In scalar context, returns an array reference of all images.
475
476       $mech->find_image()
477
478       Finds an image in the current page. It returns a WWW::Mechanize::Image
479       object which describes the image.  If it fails to find an image it
480       returns undef.
481
482       You can select which image to find by passing in one or more of these
483       key/value pairs:
484
       * "alt => 'string'" and "alt_regex => qr/regex/,"
           "alt" matches the ALT attribute of the image against string, which
           must be an exact match. To select an image with an ALT attribute
           that is exactly "download", use

               $mech->find_image( alt => 'download' );

           "alt_regex" matches the ALT attribute of the image against a
           regular expression.  To select an image with an ALT attribute that
           has "download" anywhere in it, regardless of case, use

               $mech->find_image( alt_regex => qr/download/i );
497
       * "url => 'string'," and "url_regex => qr/regex/,"
           Matches the URL of the image against string or regex, as
           appropriate.  The URL may be a relative URL, like foo/bar.html,
           depending on how it's coded on the page.

       * "url_abs => string" and "url_abs_regex => regex"
           Matches the absolute URL of the image against string or regex, as
           appropriate.  The URL will be an absolute URL, even if it's
           relative in the page.

       * "tag => string" and "tag_regex => regex"
           Matches the tag that the image came from against string or regex,
           as appropriate.  The "tag_regex" is probably most useful to check
           for more than one tag, as in:

               $mech->find_image( tag_regex => qr/^(img|input)$/ );

           The tags supported are "<img>" and "<input>".
516
517       If "n" is not specified, it defaults to 1.  Therefore, if you don't
518       specify any parms, this method defaults to finding the first image on
519       the page.
520
521       Note that you can specify multiple ALT or URL parameters, which will be
522       ANDed together.  For example, to find the first image with ALT text of
523       "News" and with "cnn.com" in the URL, use:
524
           $mech->find_image( alt => 'News', url_regex => qr/cnn\.com/ );
526
       The return value is a reference to an array containing a
       WWW::Mechanize::Image object for every image in "$self->content".
529
530       $mech->find_all_images( ... )
531
532       Returns all the images on the current page that match the criteria.
533       The method for specifying image criteria is the same as in
534       "find_image()".  Each of the images returned is a WWW::Mechanize::Image
535       object.
536
537       In list context, "find_all_images()" returns a list of the images.
538       Otherwise, it returns a reference to the list of images.
539
540       "find_all_images()" with no parameters returns all images in the page.
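
       A short sketch of the image methods, again assuming a local
       temporary page instead of a live site.  The image file names and
       ALT texts are invented for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# A local page with two images, so the sketch needs no network.
my ( $fh, $path ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print {$fh} <<'HTML';
<html><body>
  <img src="logo.png" alt="News">
  <img src="/img/get.png" alt="Download now">
</body></html>
HTML
close $fh or die "Cannot close $path: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($path) );

# Exact match on the ALT attribute; returns one object or undef.
my $news = $mech->find_image( alt => 'News' );

# Every image on the page, then only those whose ALT mentions "download".
my @all_images = $mech->find_all_images();
my @downloads  = $mech->find_all_images( alt_regex => qr/download/i );

print 'news image: ', $news->url, "\n" if defined $news;
print 'download:   ', $_->url, "\n" for @downloads;
print scalar(@all_images), " images in total\n";
```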
541

FORM METHODS

543       $mech->forms
544
       Lists all the forms on the current page.  Each form is an HTML::Form
       object.  In list context, returns a list of all forms.  In scalar
       context, returns an array reference of all forms.
548
549       $mech->form_number($number)
550
551       Selects the numberth form on the page as the target for subsequent
552       calls to "field()" and "click()".  Also returns the form that was
553       selected.
554
555       If it is found, the form is returned as an HTML::Form object and set
556       internally for later use with Mech's form methods such as "field()" and
557       "click()".
558
559       Emits a warning and returns undef if no form is found.
560
561       The first form is number 1, not zero.
562
563       $mech->form_name( $name )
564
565       Selects a form by name.  If there is more than one form on the page
566       with that name, then the first one is used, and a warning is generated.
567
568       If it is found, the form is returned as an HTML::Form object and set
569       internally for later use with Mech's form methods such as "field()" and
570       "click()".
571
572       Returns undef if no form is found.
573
574       Note that this functionality requires libwww-perl 5.69 or higher.
575
576       $mech->form_with_fields( @fields )
577
       Selects a form by passing in a list of field names it must contain.  If
       there is more than one form on the page that matches, then the
       first one is used, and a warning is generated.

       If it is found, the form is returned as an HTML::Form object and set
       internally for later use with Mech's form methods such as "field()"
       and "click()".
585
586       Returns undef if no form is found.
587
588       Note that this functionality requires libwww-perl 5.69 or higher.
589
590       $mech->field( $name, $value, $number )
591
592       $mech->field( $name, \@values, $number )
593
594       Given the name of a field, set its value to the value specified.  This
595       applies to the current form (as set by the form_name() or form_number()
596       method or defaulting to the first form on the page).
597
598       The optional $number parameter is used to distinguish between two
599       fields with the same name.  The fields are numbered from 1.
600
601       $mech->select($name, $value)
602
603       $mech->select($name, \@values)
604
       Given the name of a "select" field, set its value to the value
       specified.  If the field is not <select multiple> and the $value is an
       array, only the first value will be set.  [Note: the documentation
       previously claimed that only the last value would be set, but this was
       incorrect.]  Passing $value as a hash with an "n" key selects an item
       by number (e.g. "{n => 3}" or "{n => [2,4]}").  The numbering starts
       at 1.  This applies to the current form.

       Returns 1 on successfully setting the value. On failure, returns undef
       and calls "$self->warn()" with an error message.
615
616       $mech->set_fields( $name => $value ... )
617
       This method sets multiple fields of the current form. It takes a list
       of field name and value pairs. If there is more than one field with the
       same name, the first one found is set. If you want to select which of
       the duplicate fields to set, use a value which is an anonymous array
       which has the field value and its number as the 2 elements.
623
624               # set the second foo field
625               $mech->set_fields( $name => [ 'foo', 2 ] ) ;
626
627       The fields are numbered from 1.
628
629       This applies to the current form.
630
631       $mech->set_visible( @criteria )
632
633       This method sets fields of the current form without having to know
634       their names.  So if you have a login screen that wants a username and
635       password, you do not have to fetch the form and inspect the source (or
636       use the mech-dump utility, installed with WWW::Mechanize) to see what
637       the field names are; you can just say
638
639           $mech->set_visible( $username, $password ) ;
640
641       and the first and second fields will be set accordingly.  The method is
642       called set_visible because it acts only on visible fields; hidden form
643       inputs are not considered.  The order of the fields is the order in
644       which they appear in the HTML source which is nearly always the order
645       anyone viewing the page would think they are in, but some creative work
646       with tables could change that; caveat user.
647
648       Each element in @criteria is either a field value or a field specifier.
649       A field value is a scalar.  A field specifier allows you to specify the
650       type of input field you want to set and is denoted with an arrayref
651       containing two elements.  So you could specify the first radio button
652       with
653
654           $mech->set_visible( [ radio => 'KCRW' ] ) ;
655
656       Field values and specifiers can be intermixed, hence
657
658           $mech->set_visible( 'fred', 'secret', [ option => 'Checking' ] ) ;
659
660       would set the first two fields to "fred" and "secret", and the next
661       "OPTION" menu field to "Checking".
662
663       The possible field specifier types are: "text", "password", "hidden",
664       "textarea", "file", "image", "submit", "radio", "checkbox" and
665       "option".
666
667       "set_visible" returns the number of values set.
668
669       $mech->tick( $name, $value [, $set] )
670
671       "Ticks" the first checkbox that has both the name and value associated
672       with it on the current form.  Dies if there is no named check box for
673       that value.  Passing in a false value as the third optional argument
674       will cause the checkbox to be unticked.
675
676       $mech->untick($name, $value)
677
678       Causes the checkbox to be unticked.  Shorthand for
679       "tick($name,$value,undef)"
680
681       $mech->value( $name, $number )
682
       Given the name of a field, return its value. This applies to the
       current form.

       The optional $number parameter is used to distinguish between two
       fields with the same name.  The fields are numbered from 1.
688
689       If the field is of type file (file upload field), the value is always
690       cleared to prevent remote sites from downloading your local files.  To
691       upload a file, specify its file name explicitly.
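
       The field-setting methods and "value()" can be tried out without
       submitting anything.  The sketch below assumes a local temporary
       page containing a made-up form named "search" with two text fields
       that share the name "tag"; all field names and values are
       illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# A local page with one form, so the sketch needs no network.
my ( $fh, $path ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print {$fh} <<'HTML';
<html><body>
<form name="search" action="search.cgi" method="get">
  <input type="text" name="query">
  <input type="text" name="tag">
  <input type="text" name="tag">
  <select name="lang">
    <option value="en">English</option>
    <option value="pl">Polish</option>
  </select>
  <input type="submit" name="go" value="Search Now">
</form>
</body></html>
HTML
close $fh or die "Cannot close $path: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($path) );

# Make the named form the current form for the calls that follow.
$mech->form_name('search') or die "Form 'search' not found\n";

$mech->field( query => 'pot of gold' );       # a single field by name
$mech->set_fields( tag => [ 'perl', 2 ] );    # the second "tag" field
$mech->select( 'lang', 'pl' );                # an option by its value

# Read the values back; the number picks among same-named fields.
print 'query: ', $mech->value('query'), "\n";
print 'tag 2: ', $mech->value( 'tag', 2 ), "\n";
print 'lang:  ', $mech->value('lang'), "\n";
```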
692
693       $mech->click( $button [, $x, $y] )
694
695       Has the effect of clicking a button on the current form.  The first
696       argument is the name of the button to be clicked.  The second and third
697       arguments (optional) allow you to specify the (x,y) coordinates of the
698       click.
699
       If there is only one button on the form, "$mech->click()" with no
       arguments simply clicks that one button.
702
703       Returns an HTTP::Response object.
704
705       $mech->click_button( ... )
706
       Has the effect of clicking a button on the current form by specifying
       its name, value, or index.  Its arguments are a list of key/value
       pairs.  Exactly one of name, number, input or value must be specified
       in the keys.
711
712       * name => name
713           Clicks the button named name in the current form.
714
715       * number => n
716           Clicks the nth button in the current form. Numbering starts at 1.
717
718       * value => value
719           Clicks the button with the value value in the current form.
720
721       * input => $inputobject
722           Clicks on the button referenced by $inputobject, an instance of
723           HTML::Form::SubmitInput obtained e.g. from
724
725               $mech->current_form()->find_input( undef, 'submit' )
726
727           $inputobject must belong to the current form.
728
       * x => x
       * y => y
           These arguments (optional) allow you to specify the (x,y)
           coordinates of the click.
733
734       $mech->submit()
735
736       Submits the page, without specifying a button to click.  Actually, no
737       button is clicked at all.
738
739       This used to be a synonym for "$mech->click( 'submit' )", but is no
740       longer so.
741
742       $mech->submit_form( ... )
743
       This method lets you select a form from the previously fetched page,
       fill in its fields, and submit it. It combines the
       form_number/form_name, set_fields and click methods into one higher
       level call.  Its arguments are a list of key/value pairs, all of which
       are optional.
748
749       * fields => \%fields
750           Specifies the fields to be filled in the current form.
751
752       * with_fields => \%fields
753           Probably all you need for the common case. It combines a smart form
754           selector and data setting in one operation. It selects the first
755           form that contains all fields mentioned in "\%fields".  This is
756           nice because you don't need to know the name or number of the form
757           to do this.
758
759           (calls "form_with_fields" and "set_fields()").
760
761           If you choose this, the form_number, form_name and fields options
762           will be ignored.
763
764       * form_number => n
765           Selects the nth form (calls "form_number()").  If this parm is not
766           specified, the currently-selected form is used.
767
768       * form_name => name
769           Selects the form named name (calls "form_name()")
770
771       * button => button
772           Clicks on button button (calls "click()")
773
774       * x => x, y => y
775           Sets the x or y values for "click()"
776
777       If no form is selected, the first form found is used.
778
779       If button is not passed, then the "submit()" method is used instead.
780
781       Returns an HTTP::Response object.
782

MISCELLANEOUS METHODS

784       $mech->add_header( name => $value [, name => $value... ] )
785
786       Sets HTTP headers for the agent to add or remove from the HTTP request.
787
           $mech->add_header( Encoding => 'text/klingon' );
789
790       If a value is "undef", then that header will be removed from any future
791       requests.  For example, to never send a Referer header:
792
           $mech->add_header( Referer => undef );
794
       To drop a header from the agent's list of special headers and restore
       the default behavior for it, use "delete_header".
796
797       Returns the number of name/value pairs added.
798
       NOTE: This method was very different in WWW::Mechanize before 1.00.
       Back then, the headers were stored in a package hash, not as a member
       of the object instance.  Calling "add_header()" would modify the
       headers for every WWW::Mechanize object, even after your object no
       longer existed.
804
805       $mech->delete_header( name [, name ... ] )
806
807       Removes HTTP headers from the agent's list of special headers.  For
808       instance, you might need to do something like:
809
           # Don't send a Referer for this URL
           $mech->add_header( Referer => undef );

           # Get the URL
           $mech->get( $url );

           # Back to the default behavior
           $mech->delete_header( 'Referer' );
818
819       $mech->quiet(true/false)
820
821       Allows you to suppress warnings to the screen.
822
           $mech->quiet(0); # turns on warnings (the default)
           $mech->quiet(1); # turns off warnings
           $mech->quiet();  # returns the current quietness status
826
827       $mech->stack_depth( $max_depth )
828
829       Get or set the page stack depth. Use this if you're doing a lot of page
830       scraping and running out of memory.
831
832       A value of 0 means "no history at all."  By default, the max stack
833       depth is humongously large, effectively keeping all history.
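
       As a small sketch of the getter/setter behavior described above: the
       limit of five pages is an arbitrary figure picked for illustration,
       not a recommended value.

```perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

# Keep at most five pages of history to bound memory use
# (five is an arbitrary figure for illustration).
$mech->stack_depth( 5 );

# Called with no argument, the method reports the current limit.
print 'History limit: ', $mech->stack_depth, "\n";
```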
834
835       $mech->save_content( $filename )
836
837       Dumps the contents of "$mech->content" into $filename.  $filename will
838       be overwritten.  Dies if there are any errors.
839
840       $mech->dump_links( [[$fh], $absolute] )
841
842       Prints a dump of the links on the current page to $fh.  If $fh is not
843       specified or is undef, it dumps to STDOUT.
844
845       If $absolute is true, links displayed are absolute, not relative.
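
       Since $fh can be any filehandle, the dump can also be captured in a
       scalar instead of being printed. The sketch below assumes a temporary
       local page with two made-up links and uses Perl's in-memory
       filehandles; it is one possible approach, not the only one.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# A throwaway page with two relative links (hypothetical names).
my $dir = tempdir( CLEANUP => 1 );
open my $out, '>', "$dir/index.html" or die "Cannot write index.html: $!";
print {$out} <<'HTML';
<html><head><title>Index</title></head><body>
<a href="about.html">About</a>
<a href="docs/faq.html">FAQ</a>
</body></html>
HTML
close $out or die "Cannot close index.html: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new( "$dir/index.html" )->as_string );

# Open a filehandle that writes into $dump rather than to a file.
my $dump = '';
open my $fh, '>', \$dump or die "Cannot open in-memory handle: $!";
$mech->dump_links( $fh, 1 );    # true second argument: absolute URLs
close $fh or die "Cannot close in-memory handle: $!";

print $dump;
```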
846
847       $mech->dump_images( [[$fh], $absolute] )
848
849       Prints a dump of the images on the current page to $fh.  If $fh is not
850       specified or is undef, it dumps to STDOUT.
851
       If $absolute is true, the image URLs displayed are absolute, not
       relative.
853
854       $mech->dump_forms( [$fh] )
855
856       Prints a dump of the forms on the current page to $fh.  If $fh is not
857       specified or is undef, it dumps to STDOUT.
858
859       $mech->dump_all( [[$fh], $absolute] )
860
861       Prints a dump of all links, images and forms on the current page to
862       $fh.  If $fh is not specified or is undef, it dumps to STDOUT.
863
864       If $absolute is true, links displayed are absolute, not relative.
865

OVERRIDDEN LWP::UserAgent METHODS

867       $mech->clone()
868
       Clones the mech object. We override it here to be sure the cookie jar
       gets copied over.
871
872       $mech->redirect_ok()
873
       An overridden version of "redirect_ok()" in LWP::UserAgent.  This
       method is used to determine whether a redirection in the request should
       be followed.
877
878       $mech->request( $request [, $arg [, $size]])
879
       Overridden version of "request()" in LWP::UserAgent.  Performs the
       actual request.  Normally, if you're using WWW::Mechanize, it's because
       you don't want to deal with this level of detail anyway.
883
884       Note that $request will be modified.
885
886       Returns an HTTP::Response object.
887
888       $mech->update_html( $html )
889
890       Allows you to replace the HTML that the mech has found.  Updates the
891       forms and links parse-trees that the mech uses internally.
892
893       Say you have a page that you know has malformed output, and you want to
894       update it so the links come out correctly:
895
           my $html = $mech->content;
           $html =~ s[</option>.{0,3}</td>][</option></select></td>]isg;
           $mech->update_html( $html );
899
       This method is also used internally by the mech itself to update its
       own HTML content when loading a page. This means that if you would like
       to systematically perform the above HTML substitution, you would
       override update_html in a subclass thusly:
904
           package MyMech;
           use base 'WWW::Mechanize';

           sub update_html {
               my ($self, $html) = @_;

               # Close the unterminated <select> before the HTML is parsed.
               $html =~ s[</option>.{0,3}</td>][</option></select></td>]isg;

               # Hand the repaired HTML to the parent implementation.
               return $self->SUPER::update_html( $html );
           }
913
914       If you do this, then the mech will use the tidied-up HTML instead of
915       the original both when parsing for its own needs, and for returning to
916       you through "content".
917
       Overriding this method is also the recommended way of implementing
       extra validation steps (e.g. link checkers) for every HTML page
       received.  "warn" and "die" would then come in handy to signal
       validation errors.
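
       A validation step of this kind might look like the sketch below. The
       subclass name "TitleCheckingMech", the rule that every page needs a
       title element, and the temporary test page are all invented for
       illustration; a real checker would apply whatever rules matter to
       you.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use URI::file;

package TitleCheckingMech;    # hypothetical subclass name
use base 'WWW::Mechanize';

sub update_html {
    my ( $self, $html ) = @_;

    # Illustrative validation rule: every page should carry a <title>.
    $self->warn( 'Page has no <title> element' )
        unless $html =~ m{<title\b}i;

    return $self->SUPER::update_html( $html );
}

package main;

# A throwaway page that deliberately lacks a <title>.
my $dir = tempdir( CLEANUP => 1 );
open my $out, '>', "$dir/untitled.html" or die "Cannot write page: $!";
print {$out} "<html><body><p>No title here</p></body></html>\n";
close $out or die "Cannot close page: $!";

# Collect warnings in an array instead of printing them.
my @problems;
my $mech = TitleCheckingMech->new(
    onwarn => sub { push @problems, join '', @_ },
);
$mech->get( URI::file->new( "$dir/untitled.html" )->as_string );

print "Validation problem: $_\n" for @problems;
```

       Calling "$self->die" instead of "$self->warn" would turn the same
       check into a fatal error, subject to any "onerror" handler in effect.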
922
923       $mech->credentials( $username, $password )
924
925       Provide credentials to be used for HTTP Basic authentication for all
926       sites and realms until further notice.
927
       The four-argument form described in LWP::UserAgent is still supported.
929

INTERNAL-ONLY METHODS

931       These methods are only used internally.  You probably don't need to
932       know about them.
933
934       $mech->_update_page($request, $response)
935
       Updates all internal variables in $mech as if $request was just
       performed, and returns $response. The page stack is not altered by
       this method; it is up to the caller (e.g. "request") to do that.
939
940       $mech->_modify_request( $req )
941
       Modifies an HTTP::Request before the request is sent out, for both GET
       and POST requests.

       We add a "Referer" header, as well as a header to note that we can
       accept gzip-encoded content, if Compress::Zlib is installed.
947
948       $mech->_make_request()
949
       Convenience method to make it easier for subclasses like
       WWW::Mechanize::Cached to intercept the request.
952
953       $mech->_reset_page()
954
       Resets the internal fields that track the parsed state of the current
       page.
956
957       $mech->_extract_links()
958
959       Extracts links from the content of a webpage, and populates the
960       "{links}" property with WWW::Mechanize::Link objects.
961
962       $mech->_push_page_stack() / $mech->_pop_page_stack()
963
964       The agent keeps a stack of visited pages, which it can pop when it
965       needs to go BACK and so on.
966
967       The current page needs to be pushed onto the stack before we get a new
968       page, and the stack needs to be popped when BACK occurs.
969
       Neither of these takes any arguments; they just operate on the $mech
       object.
972
973       warn( @messages )
974
975       Centralized warning method, for diagnostics and non-fatal problems.
976       Defaults to calling "CORE::warn", but may be overridden by setting
977       "onwarn" in the constructor.
978
979       die( @messages )
980
981       Centralized error method.  Defaults to calling "CORE::die", but may be
982       overridden by setting "onerror" in the constructor.
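
       As a sketch of how the two constructor options fit together, the
       handlers below collect messages in arrays rather than printing or
       throwing. The array names and message texts are made up for the
       example.

```perl
use strict;
use warnings;
use WWW::Mechanize;

my ( @warnings, @errors );

my $mech = WWW::Mechanize->new(
    onwarn  => sub { push @warnings, join '', @_ },
    onerror => sub { push @errors,   join '', @_ },
);

# Both calls are routed to the handlers above, so the first prints
# nothing and the second does not throw.
$mech->warn( 'Something looks odd' );
$mech->die( 'Something went wrong' );

print "Collected warning: $_\n" for @warnings;
print "Collected error: $_\n"   for @errors;
```

       Note that an "onerror" handler which returns normally, as this one
       does, lets execution continue after an error, so real code would
       usually log and then rethrow.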
983

WWW::MECHANIZE'S SUBVERSION REPOSITORY

985       Mech and Test::WWW::Mechanize are both hosted at Google Code:
986       http://code.google.com/p/www-mechanize/.  The Subversion repository is
987       at http://www-mechanize.googlecode.com/svn/wm/.
988

OTHER DOCUMENTATION

990       Spidering Hacks, by Kevin Hemenway and Tara Calishain
991
       Spidering Hacks from O'Reilly
       (<http://www.oreilly.com/catalog/spiderhks/>) is a great book for
       anyone wanting to know more about screen-scraping and spidering.
995
996       There are six hacks that use Mech or a Mech derivative:
997
998       #21 WWW::Mechanize 101
999       #22 Scraping with WWW::Mechanize
1000       #36 Downloading Images from Webshots
1001       #44 Archiving Yahoo! Groups Messages with WWW::Yahoo::Groups
1002       #64 Super Author Searching
1003       #73 Scraping TV Listings
1004
       The book was also positively reviewed on Slashdot:
       <http://books.slashdot.org/article.pl?sid=03/12/11/2126256>
1007

ONLINE RESOURCES AND SUPPORT

1009       * LWP mailing list
1010           The LWP mailing list is at
1011           <http://lists.perl.org/showlist.cgi?name=libwww>, and is more user-
1012           oriented and well-populated than the WWW::Mechanize Development
1013           list.  This is a good list for Mech users, since LWP is the basis
1014           for Mech.
1015
1016       * Perlmonks
1017           <http://perlmonks.org> is an excellent community of support, and
1018           many questions about Mech have already been answered there.
1019
1020       * WWW::Mechanize::Examples
1021           A random array of examples submitted by users, included with the
1022           Mechanize distribution.
1023

ARTICLES ABOUT WWW::MECHANIZE

       * <http://www-128.ibm.com/developerworks/linux/library/wa-perlsecure.html>
1027           IBM article "Secure Web site access with Perl"
1028
1029       * <http://www.oreilly.com/catalog/googlehks2/chapter/hack84.pdf>
           Leland Johnson's hack #84 in Google Hacks, 2nd Edition is an
           example of a production script that uses WWW::Mechanize and
           HTML::TableContentParser. It takes in keywords and returns the
           estimated price of these keywords on Google's AdWords program.
1034
1035       * <http://www.perl.com/pub/a/2004/06/04/recorder.html>
           Linda Julien writes about using HTTP::Recorder to create
           WWW::Mechanize scripts.
1038
1039       * <http://www.developer.com/lang/other/article.php/3454041>
1040           Jason Gilmore's article on using WWW::Mechanize for scraping sales
1041           information from Amazon and eBay.
1042
1043       * <http://www.perl.com/pub/a/2003/01/22/mechanize.html>
1044           Chris Ball's article about using WWW::Mechanize for scraping TV
1045           listings.
1046
1047       * <http://www.stonehenge.com/merlyn/LinuxMag/col47.html>
           Randal Schwartz's article on scraping Yahoo News for images.  It's
           already out of date: He manually walks the list of links hunting
           for matches, which wouldn't have been necessary if the
           "find_link()" method had existed at press time.
1052
1053       * <http://www.perladvent.org/2002/16th/>
1054           WWW::Mechanize on the Perl Advent Calendar, by Mark Fowler.
1055
1056       * <http://www.linux-magazin.de/Artikel/ausgabe/2004/03/perl/perl.html>
1057           Michael Schilli's article on Mech and WWW::Mechanize::Shell for the
1058           German magazine Linux Magazin.
1059
1060       Other modules that use Mechanize
1061
1062       Here are modules that use or subclass Mechanize.  Let me know of any
1063       others:
1064
1065       * Finance::Bank::LloydsTSB
1066       * HTTP::Recorder
           Acts as a proxy for web interaction, and then generates
           WWW::Mechanize scripts.
1069
1070       * Win32::IE::Mechanize
1071           Just like Mech, but using Microsoft Internet Explorer to do the
1072           work.
1073
1074       * WWW::Bugzilla
1075       * WWW::CheckSite
1076       * WWW::Google::Groups
1077       * WWW::Hotmail
1078       * WWW::Mechanize::Cached
1079       * WWW::Mechanize::FormFiller
1080       * WWW::Mechanize::Shell
1081       * WWW::Mechanize::Sleepy
1082       * WWW::Mechanize::SpamCop
1083       * WWW::Mechanize::Timed
1084       * WWW::SourceForge
1085       * WWW::Yahoo::Groups
1086

REQUESTS & BUGS

       Please report any requests, suggestions or (gasp!) bugs via the
       excellent RT bug-tracking system at http://rt.cpan.org/, or email to
       "bug-WWW-Mechanize at rt.cpan.org".  This makes it much easier for me
       to track things.
1092
1093       <http://rt.cpan.org/NoAuth/Bugs.html?Dist=WWW-Mechanize> is the RT
1094       queue for Mechanize.  Please check to see if your bug has already been
1095       reported.
1096
1097       Please note that this is NOT for support requests.  Please be sure to
1098       read the FAQ if you have support requests.
1099

ACKNOWLEDGEMENTS

1101       Thanks to the numerous people who have helped out on WWW::Mechanize in
1102       one way or another, including Kirrily Robert for the original
1103       "WWW::Automate", Adriano Ferreira, Miyagawa, Peteris Krumins, Rafael
       Kitover, David Steinbrunner, Kevin Falcone, Mike O'Regan, Mark
       Stosberg, Uri Guttman, Peter Scott, Philippe Bruhat, Ian Langworth, John
1106       Beppu, Gavin Estey, Jim Brandt, Ask Bjoern Hansen, Greg Davies, Ed
1107       Silva, Mark-Jason Dominus, Autrijus Tang, Mark Fowler, Stuart Children,
1108       Max Maischein, Meng Wong, Prakash Kailasa, Abigail, Jan Pazdziora,
1109       Dominique Quatravaux, Scott Lanning, Rob Casey, Leland Johnson, Joshua
1110       Gatcomb, Julien Beasley, Abe Timmerman, Peter Stevens, Pete Krawczyk,
1111       Tad McClellan, and the late great Iain Truskett.
1112

COPYRIGHT

1114       Copyright (c) 2005-2007 Andy Lester. All rights reserved. This program
1115       is free software; you can redistribute it and/or modify it under the
1116       same terms as Perl itself.
1117
1118
1119
perl v5.8.8                       2007-10-30                 WWW::Mechanize(3)