WWW::Mechanize(3)     User Contributed Perl Documentation    WWW::Mechanize(3)

NAME

       WWW::Mechanize - Handy web browsing in a Perl object

VERSION

       version 2.07

SYNOPSIS

       WWW::Mechanize supports performing a sequence of page fetches including
       following links and submitting forms. Each fetched page is parsed and
       its links and forms are extracted. A link or a form can be selected,
       form fields can be filled and the next page can be fetched.  Mech also
       stores a history of the URLs you've visited, which can be queried and
       revisited.

           use WWW::Mechanize ();
           my $mech = WWW::Mechanize->new();

           $mech->get( $url );

           $mech->follow_link( n => 3 );
           $mech->follow_link( text_regex => qr/download this/i );
           $mech->follow_link( url => 'http://host.com/index.html' );

           $mech->submit_form(
               form_number => 3,
               fields      => {
                   username    => 'mungo',
                   password    => 'lost-and-alone',
               }
           );

           $mech->submit_form(
               form_name => 'search',
               fields    => { query  => 'pot of gold', },
               button    => 'Search Now'
           );

           # Enable strict form processing to catch typos and nonexistent form fields.
           my $strict_mech = WWW::Mechanize->new( strict_forms => 1);

           $strict_mech->get( $url );

           # This method call will die, saving you lots of time looking for the bug.
           $strict_mech->submit_form(
               form_number => 3,
               fields      => {
                   usernaem     => 'mungo',           # typo in field name
                   password     => 'lost-and-alone',
                   extra_field  => 123,               # field does not exist
               }
           );

DESCRIPTION

58       "WWW::Mechanize", or Mech for short, is a Perl module for stateful
59       programmatic web browsing, used for automating interaction with
60       websites.
61
62       Features include:
63
64       •   All HTTP methods
65
66       •   High-level hyperlink and HTML form support, without having to parse
67           HTML yourself
68
69       •   SSL support
70
71       •   Automatic cookies
72
73       •   Custom HTTP headers
74
75       •   Automatic handling of redirections
76
77       •   Proxies
78
79       •   HTTP authentication
80
       Mech is well suited for use in testing web applications.  If you use
       one of the Test::* modules, like Test::HTML::Lint, you can check the
       fetched content and use that as input to a test call.
84
85           use Test::More;
86           like( $mech->content(), qr/$expected/, "Got expected content" );
87
88       Each page fetch stores its URL in a history stack which you can
89       traverse.
90
91           $mech->back();
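
       As a sketch of how the history stack behaves, the following builds two
       throwaway local pages, follows a link from one to the other, and then
       steps back.  The file names, the temporary directory, and the "next"
       link are assumptions made for illustration only.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw( tempdir );
use URI::file;
use WWW::Mechanize ();

# Hypothetical pages written to a temporary directory so no network is needed.
my $dir   = tempdir( CLEANUP => 1 );
my %pages = (
    'a.html' => '<html><body><a href="b.html">next</a></body></html>',
    'b.html' => '<html><body><p>second page</p></body></html>',
);
for my $name ( keys %pages ) {
    my $path = File::Spec->catfile( $dir, $name );
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $pages{$name};
    close $fh or die "Cannot close $path: $!";
}

my $mech  = WWW::Mechanize->new();
my $start = URI::file->new( File::Spec->catfile( $dir, 'a.html' ) )->as_string;

$mech->get( $start );
$mech->follow_link( text => 'next' );       # now on b.html

my $count_on_b = $mech->history_count();    # counts a.html and b.html
$mech->back();                              # return to a.html

print "history entries while on b.html: $count_on_b\n";
print 'current page after back(): ', $mech->uri(), "\n";
```

       Local "file:" URIs are used here only to keep the sketch
       self-contained; the same calls apply to pages fetched over HTTP.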
92
93       If you want finer control over your page fetching, you can use these
94       methods. "follow_link" and "submit_form" are just high level wrappers
95       around them.
96
97           $mech->find_link( n => $number );
98           $mech->form_number( $number );
99           $mech->form_name( $name );
100           $mech->field( $name, $value );
101           $mech->set_fields( %field_values );
102           $mech->set_visible( @criteria );
103           $mech->click( $button );
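
       The lower-level calls above can be combined roughly as follows.  This
       is a minimal sketch against a hypothetical local page; the form name
       "search", the field "query", and the temporary file are illustrative
       assumptions, and the final "click()" is left out because nothing is
       listening behind the form's action URL.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw( tempdir );
use URI::file;
use WWW::Mechanize ();

# Hypothetical page with one form, saved locally so no server is required.
my $dir  = tempdir( CLEANUP => 1 );
my $path = File::Spec->catfile( $dir, 'search.html' );
open my $fh, '>', $path or die "Cannot write $path: $!";
print {$fh} <<'HTML';
<html><head><title>Search</title></head><body>
<form name="search" action="results.html" method="get">
  <input type="text" name="query">
  <input type="submit" name="go" value="Search Now">
</form>
</body></html>
HTML
close $fh or die "Cannot close $path: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new($path)->as_string );

my $form = $mech->form_name('search');      # select the form; returns an HTML::Form
$mech->field( query => 'pot of gold' );     # fill a field on the selected form

print 'query is now: ', $form->value('query'), "\n";

# $mech->click('go') would submit the form at this point.
```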
104
105       WWW::Mechanize is a proper subclass of LWP::UserAgent and you can also
106       use any of LWP::UserAgent's methods.
107
108           $mech->add_header($name => $value);
109
       Please note that Mech does NOT support JavaScript; you need additional
       software for that. Please check "JavaScript" in WWW::Mechanize::FAQ for
       more.
113

IMPORTANT LINKS

115       •   <https://github.com/libwww-perl/WWW-Mechanize/issues>
116
117           The queue for bugs & enhancements in WWW::Mechanize.  Please note
118           that the queue at <http://rt.cpan.org> is no longer maintained.
119
120       •   <https://metacpan.org/pod/WWW::Mechanize>
121
122           The CPAN documentation page for Mechanize.
123
124       •   <https://metacpan.org/pod/distribution/WWW-Mechanize/lib/WWW/Mechanize/FAQ.pod>
125
126           Frequently asked questions.  Make sure you read here FIRST.
127

CONSTRUCTOR AND STARTUP

129   new()
130       Creates and returns a new WWW::Mechanize object, hereafter referred to
131       as the "agent".
132
           my $mech = WWW::Mechanize->new();
134
135       The constructor for WWW::Mechanize overrides two of the params to the
136       LWP::UserAgent constructor:
137
138           agent => 'WWW-Mechanize/#.##'
139           cookie_jar => {}    # an empty, memory-only HTTP::Cookies object
140
141       You can override these overrides by passing params to the constructor,
142       as in:
143
144           my $mech = WWW::Mechanize->new( agent => 'wonderbot 1.01' );
145
146       If you want none of the overhead of a cookie jar, or don't want your
147       bot accepting cookies, you have to explicitly disallow it, like so:
148
149           my $mech = WWW::Mechanize->new( cookie_jar => undef );
150
151       Here are the params that WWW::Mechanize recognizes.  These do not
152       include params that LWP::UserAgent recognizes.
153
154       •   "autocheck => [0|1]"
155
156           Checks each request made to see if it was successful.  This saves
157           you the trouble of manually checking yourself.  Any errors found
158           are errors, not warnings.
159
160           The default value is ON, unless it's being subclassed, in which
161           case it is OFF.  This means that standalone WWW::Mechanize
162           instances have autocheck turned on, which is protective for the
           vast majority of Mech users who don't bother checking the return
           value of get() and post() and can't figure out why their code fails.
165           However, if WWW::Mechanize is subclassed, such as for
166           Test::WWW::Mechanize or Test::WWW::Mechanize::Catalyst, this may
167           not be an appropriate default, so it's off.
168
169       •   "noproxy => [0|1]"
170
171           Turn off the automatic call to the LWP::UserAgent "env_proxy"
172           function.
173
174           This needs to be explicitly turned off if you're using
175           Crypt::SSLeay to access a https site via a proxy server.  Note: you
176           still need to set your HTTPS_PROXY environment variable as
177           appropriate.
178
179       •   "onwarn => \&func"
180
181           Reference to a "warn"-compatible function, such as "Carp::carp",
182           that is called when a warning needs to be shown.
183
184           If this is set to "undef", no warnings will ever be shown.
185           However, it's probably better to use the "quiet" method to control
186           that behavior.
187
188           If this value is not passed, Mech uses "Carp::carp" if Carp is
189           installed, or "CORE::warn" if not.
190
191       •   "onerror => \&func"
192
193           Reference to a "die"-compatible function, such as "Carp::croak",
194           that is called when there's a fatal error.
195
196           If this is set to "undef", no errors will ever be shown.
197
198           If this value is not passed, Mech uses "Carp::croak" if Carp is
199           installed, or "CORE::die" if not.
200
201       •   "quiet => [0|1]"
202
203           Don't complain on warnings.  Setting "quiet => 1" is the same as
204           calling "$mech->quiet(1)".  Default is off.
205
206       •   "stack_depth => $value"
207
208           Sets the depth of the page stack that keeps track of all the
209           downloaded pages. Default is effectively infinite stack size.  If
210           the stack is eating up your memory, then set this to a smaller
211           number, say 5 or 10.  Setting this to zero means Mech will keep no
212           history.
213
214       In addition, WWW::Mechanize also allows you to globally enable strict
215       and verbose mode for form handling, which is done with HTML::Form.
216
217       •   "strict_forms => [0|1]"
218
219           Globally sets the HTML::Form strict flag which causes form
220           submission to croak if any of the passed fields don't exist in the
221           form, and/or a value doesn't exist in a select element. This can
222           still be disabled in individual calls to "submit_form()".
223
224           Default is off.
225
226       •   "verbose_forms => [0|1]"
227
228           Globally sets the HTML::Form verbose flag which causes form
229           submission to warn about any bad HTML form constructs found. This
230           cannot be disabled later.
231
232           Default is off.
233
234       •   "marked_sections => [0|1]"
235
           Globally sets the HTML::Parser marked sections flag which causes
           HTML "<![CDATA[" sections to be honoured. This cannot be disabled
           later.
239
240           Default is on.
241
242       To support forms, WWW::Mechanize's constructor pushes POST on to the
243       agent's "requests_redirectable" list (see also LWP::UserAgent.)
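
       To illustrate the "autocheck" parameter described above, the sketch
       below requests a page that does not exist, once with autocheck turned
       off and once with the default.  The missing file name is a made-up
       placeholder, and a local "file:" URI is used only so that the example
       needs no network access.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw( tempdir );
use URI::file;
use WWW::Mechanize ();

# Hypothetical URI pointing at a file that is never created.
my $dir     = tempdir( CLEANUP => 1 );
my $missing = URI::file->new(
    File::Spec->catfile( $dir, 'no-such-page.html' )
)->as_string;

# With autocheck off, a failed fetch returns normally and must be checked.
my $lenient = WWW::Mechanize->new( autocheck => 0 );
$lenient->get( $missing );
printf "autocheck off: success=%d status=%s\n",
    ( $lenient->success() ? 1 : 0 ), $lenient->status();

# With the default (autocheck on), the same fetch throws an exception.
my $default = WWW::Mechanize->new();
my $survived = eval { $default->get( $missing ); 1 };
print "autocheck on: get() died: $@" if !$survived;
```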
244
245   $mech->agent_alias( $alias )
       Sets the user agent string to the expanded version from a table of
       actual user agent strings.  $alias can be one of the following:

       •   Windows IE 6

       •   Windows Mozilla

       •   Mac Safari

       •   Mac Mozilla

       •   Linux Mozilla

       •   Linux Konqueror

       If $alias is one of these, the User-Agent string is replaced with the
       corresponding full browser string.  For instance,
262
263           $mech->agent_alias( 'Windows IE 6' );
264
265       sets your User-Agent to
266
267           Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
268
       The list of valid aliases can be returned from "known_agent_aliases()".
283
284   known_agent_aliases()
285       Returns a list of all the agent aliases that Mech knows about.
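
       A short sketch of the two alias methods together: it lists the
       aliases the installed version knows about, applies one, and prints
       the resulting User-Agent string.  No request is made.  The exact
       strings depend on the installed WWW::Mechanize version.

```perl
use strict;
use warnings;
use WWW::Mechanize ();

my $mech = WWW::Mechanize->new();

# Ask Mech which aliases it knows instead of hard-coding the list.
my @aliases = $mech->known_agent_aliases();
print "known alias: $_\n" for @aliases;

# Switch the User-Agent header to the string behind one alias.
$mech->agent_alias('Windows IE 6');
print 'User-Agent is now: ', $mech->agent(), "\n";
```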
286

PAGE-FETCHING METHODS

288   $mech->get( $uri )
289       Given a URL/URI, fetches it.  Returns an HTTP::Response object.  $uri
290       can be a well-formed URL string, a URI object, or a
291       WWW::Mechanize::Link object.
292
293       The results are stored internally in the agent object, but you don't
294       know that.  Just use the accessors listed below.  Poking at the
295       internals is deprecated and subject to change in the future.
296
297       "get()" is a well-behaved overloaded version of the method in
298       LWP::UserAgent.  This lets you do things like
299
300           $mech->get( $uri, ':content_file' => $filename );
301
302       and you can rest assured that the params will get filtered down
303       appropriately. See "get" in LWP::UserAgent for more details.
304
305       NOTE: Because ":content_file" causes the page contents to be stored in
306       a file instead of the response object, some Mech functions that expect
307       it to be there won't work as expected. Use with caution.
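
       A minimal sketch of ":content_file" in use, assuming a throwaway
       local page as the source (both file names are placeholders).  The
       body is written to the target file, so the sketch checks the file on
       disk rather than calling "content()".

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw( tempdir );
use URI::file;
use WWW::Mechanize ();

# Hypothetical source page and download target in a temporary directory.
my $dir  = tempdir( CLEANUP => 1 );
my $src  = File::Spec->catfile( $dir, 'page.html' );
my $copy = File::Spec->catfile( $dir, 'copy.html' );

open my $fh, '>', $src or die "Cannot write $src: $!";
print {$fh} "<html><body><p>saved to disk</p></body></html>\n";
close $fh or die "Cannot close $src: $!";

my $mech = WWW::Mechanize->new();

# The response body goes to $copy instead of the response object.
$mech->get( URI::file->new($src)->as_string, ':content_file' => $copy );

print 'bytes written to copy: ', -s $copy, "\n";
```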
308
       Here is an incomplete list of methods that do not work as expected
       with ":content_file": "forms()", "current_form()", "links()",
       "title()", "content(...)", "text()", all content-handling methods,
       all link methods, all image methods, all form methods, all field
       methods, "save_content(...)", "dump_links(...)", "dump_images(...)",
       "dump_forms(...)", "dump_text(...)".
315
316   $mech->post( $uri, content => $content )
317       POSTs $content to $uri.  Returns an HTTP::Response object.  $uri can be
318       a well-formed URI string, a URI object, or a WWW::Mechanize::Link
319       object.
320
321   $mech->put( $uri, content => $content )
322       PUTs $content to $uri.  Returns an HTTP::Response object.  $uri can be
323       a well-formed URI string, a URI object, or a WWW::Mechanize::Link
324       object.
325
   $mech->head( $uri )
       Performs a HEAD request to $uri. Returns an HTTP::Response object.
       $uri can be a well-formed URI string, a URI object, or a
       WWW::Mechanize::Link object.

           my $res = $mech->head( $uri );
           my $res = $mech->head( $uri, $field_name => $value, ... );
333
334   $mech->reload()
335       Acts like the reload button in a browser: repeats the current request.
336       The history (as per the back() method) is not altered.
337
338       Returns the HTTP::Response object from the reload, or "undef" if
339       there's no current request.
340
341   $mech->back()
342       The equivalent of hitting the "back" button in a browser.  Returns to
343       the previous page.  Won't go back past the first page. (Really, what
344       would it do if it could?)
345
346       Returns true if it could go back, or false if not.
347
348   $mech->clear_history()
349       This deletes all the history entries and returns true.
350
351   $mech->history_count()
352       This returns the number of items in the browser history.  This number
353       does include the most recently made request.
354
355   $mech->history($n)
356       This returns the nth item in history.  The 0th item is the most recent
357       request and response, which would be acted on by methods like
358       "find_link()".  The 1st item is the state you'd return to if you called
359       "back()".
360
361       The maximum useful value for $n is "$mech->history_count - 1".
362       Requests beyond that bound will return "undef".
363
364       History items are returned as hash references, in the form:
365
366         { req => $http_request, res => $http_response }
367

STATUS METHODS

369   $mech->success()
370       Returns a boolean telling whether the last request was successful.  If
371       there hasn't been an operation yet, returns false.
372
373       This is a convenience function that wraps "$mech->res->is_success".
374
375   $mech->uri()
376       Returns the current URI as a URI object. This object stringifies to the
377       URI itself.
378
379   $mech->response() / $mech->res()
       Returns the current response as an HTTP::Response object.

       "$mech->res()" is a synonym for "$mech->response()".
383
384   $mech->status()
385       Returns the HTTP status code of the response.  This is a 3-digit number
386       like 200 for OK, 404 for not found, and so on.
387
388   $mech->ct() / $mech->content_type()
389       Returns the content type of the response.
390
391   $mech->base()
       Returns the base URI for the current response.
393
394   $mech->forms()
395       When called in a list context, returns a list of the forms found in the
396       last fetched page. In a scalar context, returns a reference to an array
397       with those forms. The forms returned are all HTML::Form objects.
398
399   $mech->current_form()
400       Returns the current form as an HTML::Form object.
401
402   $mech->links()
403       When called in a list context, returns a list of the links found in the
404       last fetched page.  In a scalar context it returns a reference to an
405       array with those links.  Each link is a WWW::Mechanize::Link object.
406
407   $mech->is_html()
408       Returns true/false on whether our content is HTML, according to the
409       HTTP headers.
410
411   $mech->title()
412       Returns the contents of the "<TITLE>" tag, as parsed by
413       HTML::HeadParser.  Returns undef if the content is not HTML.
414
415   $mech->redirects()
416       Convenience method to get the redirects from the most recent
417       HTTP::Response.
418
       Note that you can also use "is_redirect" to see whether the most recent
       response was a redirect, like this:
421
422           $mech->get($url);
423           do_stuff() if $mech->res->is_redirect;
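
       The status accessors above can be read together after a single
       fetch.  The sketch below uses a hypothetical local page (the title
       and file name are made up) so that it runs without a network.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw( tempdir );
use URI::file;
use WWW::Mechanize ();

# Hypothetical page saved locally.
my $dir  = tempdir( CLEANUP => 1 );
my $path = File::Spec->catfile( $dir, 'hello.html' );
open my $fh, '>', $path or die "Cannot write $path: $!";
print {$fh} <<'HTML';
<html><head><title>Hello from Mech</title></head>
<body><p>Hi.</p></body></html>
HTML
close $fh or die "Cannot close $path: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new($path)->as_string );

# Inspect the outcome of the fetch through the status methods.
print 'success: ', ( $mech->success() ? 'yes' : 'no' ), "\n";
print 'status:  ', $mech->status(), "\n";
print 'type:    ', $mech->ct(), "\n";
print 'is HTML: ', ( $mech->is_html() ? 'yes' : 'no' ), "\n";
print 'title:   ', $mech->title(), "\n";
```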
424

CONTENT-HANDLING METHODS

426   $mech->content(...)
427       Returns the content that the mech uses internally for the last page
428       fetched. Ordinarily this is the same as
429       "$mech->response()->decoded_content()", but this may differ for HTML
       documents if update_html is overridden (in which case the value passed
431       to the base-class implementation of same will be returned), and/or
432       extra named arguments are passed to content():
433
434       $mech->content( format => 'text' )
435         Returns a text-only version of the page, with all HTML markup
436         stripped. This feature requires HTML::TreeBuilder version 5 or higher
437         to be installed, or a fatal error will be thrown. This works only if
438         the contents are HTML.
439
440       $mech->content( base_href => [$base_href|undef] )
441         Returns the HTML document, modified to contain a "<base
442         href="$base_href">" mark-up in the header.  $base_href is
443         "$mech->base()" if not specified. This is handy to pass the HTML to
444         e.g. HTML::Display. This works only if the contents are HTML.
445
446       $mech->content( raw => 1 )
447         Returns "$self->response()->content()", i.e. the raw contents from
448         the response.
449
450       $mech->content( decoded_by_headers => 1 )
         Returns the content after applying all "Content-Encoding" headers but
         with no additional mangling.
453
454       $mech->content( charset => $charset )
455         Returns "$self->response()->decoded_content(charset => $charset)"
456         (see HTTP::Response for details).
457
458       To preserve backwards compatibility, additional parameters will be
459       ignored unless none of "raw | decoded_by_headers | charset" is
460       specified and the text is HTML, in which case an error will be
461       triggered.
462
463       A fresh instance of WWW::Mechanize will return "undef" when
464       "$mech->content()" is called, because no content is present before a
465       request has been made.
466
467   $mech->text()
468       Returns the text of the current HTML content.  If the content isn't
469       HTML, $mech will die.
470
471       The text is extracted by parsing the content, and then the extracted
472       text is cached, so don't worry about performance of calling this
473       repeatedly.
474

LINK METHODS

476   $mech->links()
477       Lists all the links on the current page.  Each link is a
478       WWW::Mechanize::Link object. In list context, returns a list of all
479       links.  In scalar context, returns an array reference of all links.
480
481   $mech->follow_link(...)
482       Follows a specified link on the page.  You specify the match to be
483       found using the same params that "find_link()" uses.
484
       Here are some examples:
486
487       •   3rd link called "download"
488
489               $mech->follow_link( text => 'download', n => 3 );
490
491       •   first link where the URL has "download" in it, regardless of case:
492
493               $mech->follow_link( url_regex => qr/download/i );
494
495           or
496
497               $mech->follow_link( url_regex => qr/(?i:download)/ );
498
499       •   3rd link on the page
500
501               $mech->follow_link( n => 3 );
502
503       •   the link with the url
504
505               $mech->follow_link( url => '/other/page' );
506
507           or
508
509               $mech->follow_link( url => 'http://example.com/page' );
510
511       Returns the result of the "GET" method (an HTTP::Response object) if a
512       link was found.
513
514       If the page has no links, or the specified link couldn't be found,
515       returns "undef".  If "autocheck" is enabled an exception will be thrown
516       instead.
517
518   $mech->find_link( ... )
519       Finds a link in the currently fetched page. It returns a
520       WWW::Mechanize::Link object which describes the link.  (You'll probably
521       be most interested in the "url()" property.)  If it fails to find a
522       link it returns undef.
523
524       You can take the URL part and pass it to the "get()" method.  If that's
525       your plan, you might as well use the "follow_link()" method directly,
526       since it does the "get()" for you automatically.
527
528       Note that "<FRAME SRC="...">" tags are parsed out of the HTML and
529       treated as links so this method works with them.
530
531       You can select which link to find by passing in one or more of these
532       key/value pairs:
533
534       •   "text => 'string'," and "text_regex => qr/regex/,"
535
536           "text" matches the text of the link against string, which must be
537           an exact match.  To select a link with text that is exactly
538           "download", use
539
540               $mech->find_link( text => 'download' );
541
542           "text_regex" matches the text of the link against regex.  To select
543           a link with text that has "download" anywhere in it, regardless of
544           case, use
545
546               $mech->find_link( text_regex => qr/download/i );
547
           Note that the text extracted from the page's links is trimmed.
549           For example, "<a> foo </a>" is stored as 'foo', and searching for
550           leading or trailing spaces will fail.
551
552       •   "url => 'string'," and "url_regex => qr/regex/,"
553
554           Matches the URL of the link against string or regex, as
555           appropriate.  The URL may be a relative URL, like foo/bar.html,
556           depending on how it's coded on the page.
557
558       •   "url_abs => string" and "url_abs_regex => regex"
559
560           Matches the absolute URL of the link against string or regex, as
561           appropriate.  The URL will be an absolute URL, even if it's
562           relative in the page.
563
564       •   "name => string" and "name_regex => regex"
565
566           Matches the name of the link against string or regex, as
567           appropriate.
568
569       •   "rel => string" and "rel_regex => regex"
570
571           Matches the rel of the link against string or regex, as
572           appropriate.  This can be used to find stylesheets, favicons, or
573           links the author of the page does not want bots to follow.
574
575       •   "id => string" and "id_regex => regex"
576
577           Matches the attribute 'id' of the link against string or regex, as
578           appropriate.
579
580       •   "class => string" and "class_regex => regex"
581
582           Matches the attribute 'class' of the link against string or regex,
583           as appropriate.
584
585       •   "tag => string" and "tag_regex => regex"
586
587           Matches the tag that the link came from against string or regex, as
588           appropriate.  The "tag_regex" is probably most useful to check for
589           more than one tag, as in:
590
591               $mech->find_link( tag_regex => qr/^(a|frame)$/ );
592
593           The tags and attributes looked at are defined below.
594
595       If "n" is not specified, it defaults to 1.  Therefore, if you don't
596       specify any params, this method defaults to finding the first link on
597       the page.
598
599       Note that you can specify multiple text or URL parameters, which will
600       be ANDed together.  For example, to find the first link with text of
601       "News" and with "cnn.com" in the URL, use:
602
603           $mech->find_link( text => 'News', url_regex => qr/cnn\.com/ );
604
       The return value is a single WWW::Mechanize::Link object, or undef if
       no link matches.  To get every matching link, use "find_all_links()".
607
608       The links come from the following:
609
610       "<a href=...>"
611       "<area href=...>"
612       "<frame src=...>"
613       "<iframe src=...>"
614       "<link href=...>"
615       "<meta content=...>"
616
617   $mech->find_all_links( ... )
618       Returns all the links on the current page that match the criteria.  The
619       method for specifying link criteria is the same as in "find_link()".
620       Each of the links returned is a WWW::Mechanize::Link object.
621
622       In list context, "find_all_links()" returns a list of the links.
623       Otherwise, it returns a reference to the list of links.
624
625       "find_all_links()" with no parameters returns all links in the page.
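
       As an illustration of the link criteria, the sketch below looks up
       links on a hypothetical local page.  The page content, the "docs/"
       prefix, and the link texts are assumptions chosen for the example.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw( tempdir );
use URI::file;
use WWW::Mechanize ();

# Hypothetical page with three links, saved locally.
my $dir  = tempdir( CLEANUP => 1 );
my $path = File::Spec->catfile( $dir, 'index.html' );
open my $fh, '>', $path or die "Cannot write $path: $!";
print {$fh} <<'HTML';
<html><body>
<a href="docs/intro.html">Intro</a>
<a href="docs/api.html">API reference</a>
<a href="news.html">News</a>
</body></html>
HTML
close $fh or die "Cannot close $path: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new($path)->as_string );

# All links whose href, as written in the page, starts with "docs/".
my @docs = $mech->find_all_links( url_regex => qr{^docs/} );
for my $link (@docs) {
    printf "%-15s %s\n", $link->text(), $link->url_abs();
}

# A single link selected by its exact text.
my $news = $mech->find_link( text => 'News' );
print 'news href: ', $news->url(), "\n";
```

       "url()" gives the href as written in the page, while "url_abs()"
       resolves it against the page's base URI.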
626
627   $mech->find_all_inputs( ... criteria ... )
628       find_all_inputs() returns an array of all the input controls in the
629       current form whose properties match all of the regexes passed in.  The
630       controls returned are all descended from HTML::Form::Input.  See
631       "INPUTS" in HTML::Form for details.
632
633       If no criteria are passed, all inputs will be returned.
634
635       If there is no current page, there is no form on the current page, or
636       there are no submit controls in the current form then the return will
637       be an empty array.
638
639       You may use a regex or a literal string:
640
641           # get all textarea controls whose names begin with "customer"
642           my @customer_text_inputs = $mech->find_all_inputs(
643               type       => 'textarea',
644               name_regex => qr/^customer/,
645           );
646
647           # get all text or textarea controls called "customer"
648           my @customer_text_inputs = $mech->find_all_inputs(
649               type_regex => qr/^(text|textarea)$/,
650               name       => 'customer',
651           );
652
653   $mech->find_all_submits( ... criteria ... )
654       "find_all_submits()" does the same thing as "find_all_inputs()" except
655       that it only returns controls that are submit controls, ignoring other
656       types of input controls like text and checkboxes.
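
       A small sketch of both input finders against a hypothetical form;
       the field names and the local file are illustrative assumptions.  No
       form is selected explicitly, so the first form on the page is the
       current one.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw( tempdir );
use URI::file;
use WWW::Mechanize ();

# Hypothetical form with text, textarea and submit controls.
my $dir  = tempdir( CLEANUP => 1 );
my $path = File::Spec->catfile( $dir, 'order.html' );
open my $fh, '>', $path or die "Cannot write $path: $!";
print {$fh} <<'HTML';
<html><body>
<form name="order" action="submit" method="post">
  <input type="text" name="customer_name">
  <input type="text" name="customer_email">
  <textarea name="customer_notes"></textarea>
  <input type="text" name="coupon">
  <input type="submit" name="go" value="Order">
</form>
</body></html>
HTML
close $fh or die "Cannot close $path: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new($path)->as_string );

# Text inputs whose names begin with "customer" (the textarea is excluded).
my @text_inputs = $mech->find_all_inputs(
    type       => 'text',
    name_regex => qr/^customer/,
);
print 'text inputs: ', join( ', ', map { $_->name() } @text_inputs ), "\n";

# Only the submit controls of the current form.
my @submits = $mech->find_all_submits();
print 'submit controls: ', scalar @submits, "\n";
```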
657

IMAGE METHODS

659   $mech->images
660       Lists all the images on the current page.  Each image is a
661       WWW::Mechanize::Image object. In list context, returns a list of all
662       images.  In scalar context, returns an array reference of all images.
663
664   $mech->find_image()
665       Finds an image in the current page. It returns a WWW::Mechanize::Image
666       object which describes the image.  If it fails to find an image it
667       returns undef.
668
669       You can select which image to find by passing in one or more of these
670       key/value pairs:
671
672       •   "alt => 'string'" and "alt_regex => qr/regex/"
673
           "alt" matches the ALT attribute of the image against string, which
           must be an exact match. To select an image with an ALT attribute
           that is exactly "download", use

               $mech->find_image( alt => 'download' );

           "alt_regex" matches the ALT attribute of the image against a
681           regular expression.  To select an image with an ALT attribute that
682           has "download" anywhere in it, regardless of case, use
683
684               $mech->find_image( alt_regex => qr/download/i );
685
686       •   "url => 'string'" and "url_regex => qr/regex/"
687
688           Matches the URL of the image against string or regex, as
689           appropriate.  The URL may be a relative URL, like foo/bar.html,
690           depending on how it's coded on the page.
691
692       •   "url_abs => string" and "url_abs_regex => regex"
693
694           Matches the absolute URL of the image against string or regex, as
695           appropriate.  The URL will be an absolute URL, even if it's
696           relative in the page.
697
698       •   "tag => string" and "tag_regex => regex"
699
700           Matches the tag that the image came from against string or regex,
701           as appropriate.  The "tag_regex" is probably most useful to check
702           for more than one tag, as in:
703
704               $mech->find_image( tag_regex => qr/^(img|input)$/ );
705
706           The tags supported are "<img>" and "<input>".
707
708       •   "id => string" and "id_regex => regex"
709
710           "id" matches the id attribute of the image against string, which
711           must be an exact match. To select an image with the exact id
712           "download-image", use
713
714               $mech->find_image( id => 'download-image' );
715
716           "id_regex" matches the id attribute of the image against a regular
717           expression. To select the first image with an id that contains
718           "download" anywhere in it, use
719
720               $mech->find_image( id_regex => qr/download/ );
721
       •   "class => string" and "class_regex => regex"

           "class" matches the class attribute of the image against string,
           which must be an exact match. To select an image with the exact
           class "img-fluid", use
727
728               $mech->find_image( class => 'img-fluid' );
729
730           To select an image with the class attribute "rounded float-left",
731           use
732
733               $mech->find_image( class => 'rounded float-left' );
734
735           Note that the classes have to be matched as a complete string, in
736           the exact order they appear in the website's source code.
737
738           "class_regex" matches the class attribute of the image against a
739           regular expression. Use this if you want a partial class name, or
740           if an image has several classes, but you only care about one.
741
742           To select the first image with the class "rounded", where there are
743           multiple images that might also have either class "float-left" or
744           "float-right", use
745
746               $mech->find_image( class_regex => qr/\brounded\b/ );
747
748           Selecting an image with multiple classes where you do not care
749           about the order they appear in the website's source code is not
750           currently supported.
751
752       If "n" is not specified, it defaults to 1.  Therefore, if you don't
753       specify any params, this method defaults to finding the first image on
754       the page.
755
756       Note that you can specify multiple ALT or URL parameters, which will be
757       ANDed together.  For example, to find the first image with ALT text of
758       "News" and with "cnn.com" in the URL, use:
759
           $mech->find_image( alt => 'News', url_regex => qr/cnn\.com/ );
761
       The return value is a single WWW::Mechanize::Image object, or undef if
       no image matches.  To get every matching image, use
       "find_all_images()".
764
765   $mech->find_all_images( ... )
766       Returns all the images on the current page that match the criteria.
767       The method for specifying image criteria is the same as in
768       "find_image()".  Each of the images returned is a WWW::Mechanize::Image
769       object.
770
771       In list context, "find_all_images()" returns a list of the images.
772       Otherwise, it returns a reference to the list of images.
773
774       "find_all_images()" with no parameters returns all images in the page.
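
       The image criteria can be tried out on a hypothetical local page, as
       sketched below.  The image file names, ALT texts, and class names are
       assumptions for illustration; the image files themselves are never
       fetched.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw( tempdir );
use URI::file;
use WWW::Mechanize ();

# Hypothetical page that refers to two images.
my $dir  = tempdir( CLEANUP => 1 );
my $path = File::Spec->catfile( $dir, 'gallery.html' );
open my $fh, '>', $path or die "Cannot write $path: $!";
print {$fh} <<'HTML';
<html><body>
<img src="logo.png" alt="Site logo" class="rounded float-left">
<img src="photo.jpg" alt="Team photo" class="rounded float-right">
</body></html>
HTML
close $fh or die "Cannot close $path: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new($path)->as_string );

# First image whose ALT text contains "logo", regardless of case.
my $logo = $mech->find_image( alt_regex => qr/logo/i );
print 'logo URL: ', $logo->url(), "\n";

# Every image that has "rounded" among its classes.
my @rounded = $mech->find_all_images( class_regex => qr/\brounded\b/ );
print 'rounded images: ', scalar @rounded, "\n";
```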
775

FORM METHODS

777       These methods let you work with the forms on a page.  The idea is to
778       choose a form that you'll later work with using the field methods
779       below.
780
781   $mech->forms
782       Lists all the forms on the current page.  Each form is an HTML::Form
783       object.  In list context, returns a list of all forms.  In scalar
784       context, returns an array reference of all forms.
785
786   $mech->form_number($number)
       Selects the form at position $number on the page as the target for
       subsequent calls to "field()" and "click()".  Also returns the form
       that was selected.
790
791       If it is found, the form is returned as an HTML::Form object and set
792       internally for later use with Mech's form methods such as "field()" and
793       "click()".  When called in a list context, the number of the found form
794       is also returned as a second value.
795
796       Emits a warning and returns undef if no form is found.
797
798       The first form is number 1, not zero.
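
       As a rough sketch of "forms()" and "form_number()" in both contexts,
       assuming a made-up two-form page stored in a temporary file:

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical page with two forms, served from a temporary file.
my $html = q{<html><body>
<form name="search" action="/search"><input type="text" name="q"></form>
<form id="login" action="/login" method="post">
<input type="text" name="user"><input type="password" name="pass">
</form>
</body></html>};
my $tmp = File::Temp->new( SUFFIX => '.html' );
print {$tmp} $html;
close $tmp;

my $mech = WWW::Mechanize->new;
$mech->get( URI::file->new_abs( $tmp->filename )->as_string );

my @forms     = $mech->forms;    # list context: list of HTML::Form objects
my $forms_ref = $mech->forms;    # scalar context: array reference

# Forms are numbered from 1; list context also returns the form's number.
my ( $form, $number ) = $mech->form_number(2);
print scalar(@forms), " forms; selected form $number, id ",
    $form->attr('id'), "\n";
```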
799
800   $mech->form_name( $name )
801       Selects a form by name.  If there is more than one form on the page
802       with that name, then the first one is used, and a warning is generated.
803
804       If it is found, the form is returned as an HTML::Form object and set
805       internally for later use with Mech's form methods such as "field()" and
806       "click()".
807
808       Returns undef if no form is found.
809
810   $mech->form_id( $id )
811       Selects a form by ID.  If there is more than one form on the page with
812       that ID, then the first one is used, and a warning is generated.
813
814       If it is found, the form is returned as an HTML::Form object and set
815       internally for later use with Mech's form methods such as "field()" and
816       "click()".
817
818       If no form is found it returns "undef".  This will also trigger a
819       warning, unless "quiet" is enabled.
820
821   $mech->all_forms_with_fields( @fields )
822       Selects a form by passing in a list of field names it must contain.
823       All matching forms (perhaps none) are returned as a list of HTML::Form
824       objects.
825
826   $mech->form_with_fields( @fields )
       Selects a form by passing in a list of field names it must contain.  If
       more than one form on the page matches, then the first one is used,
       and a warning is generated.

       If it is found, the form is returned as an HTML::Form object and set
       internally for later use with Mech's form methods such as "field()"
       and "click()".
834
835       Returns undef and emits a warning if no form is found.
836
837       Note that this functionality requires libwww-perl 5.69 or higher.
838
839   $mech->all_forms_with( $attr1 => $value1, $attr2 => $value2, ... )
840       Searches for forms with arbitrary attribute/value pairs within the
841       <form> tag.  (Currently does not work for attribute "action" due to
842       implementation details of HTML::Form.)  When given more than one pair,
843       all criteria must match.  Using "undef" as value means that the
844       attribute in question must not be present.
845
846       All matching forms (perhaps none) are returned as a list of HTML::Form
847       objects.
848
849   $mech->form_with( $attr1 => $value1, $attr2 => $value2, ... )
850       Searches for forms with arbitrary attribute/value pairs within the
851       <form> tag.  (Currently does not work for attribute "action" due to
852       implementation details of HTML::Form.)  When given more than one pair,
853       all criteria must match.  Using "undef" as value means that the
854       attribute in question must not be present.
855
       If it is found, the form is returned as an HTML::Form object and set
       internally for later use with Mech's form methods such as "field()"
       and "click()".
859
860       Returns undef if no form is found.
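
       The selectors above can be compared side by side.  This is a minimal
       sketch against a made-up page in a temporary file; the form names,
       IDs and class are assumptions of the example, not part of the API.

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical page with two forms, served from a temporary file.
my $html = q{<html><body>
<form name="search" action="/search"><input type="text" name="q"></form>
<form id="login" class="auth" action="/login" method="post">
<input type="text" name="user"><input type="password" name="pass">
</form>
</body></html>};
my $tmp = File::Temp->new( SUFFIX => '.html' );
print {$tmp} $html;
close $tmp;

my $mech = WWW::Mechanize->new;
$mech->get( URI::file->new_abs( $tmp->filename )->as_string );

my $by_name   = $mech->form_name('search');
my $by_id     = $mech->form_id('login');
my $by_fields = $mech->form_with_fields( 'user', 'pass' );
my $by_attr   = $mech->form_with( class => 'auth' );

# undef as the value means "the attribute must not be present".
my @without_id = $mech->all_forms_with( id => undef );

printf "%s / %s / %s / %s / %d form(s) without an id\n",
    $by_name->attr('name'), $by_id->attr('id'),
    $by_fields->attr('id'), $by_attr->attr('id'), scalar @without_id;
```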
861

FIELD METHODS

863       These methods allow you to set the values of fields in a given form.
864
865   $mech->field( $name, $value, $number )
866   $mech->field( $name, \@values, $number )
867       Given the name of a field, set its value to the value specified.  This
868       applies to the current form (as set by the "form_name()" or
869       "form_number()" method or defaulting to the first form on the page).
870
871       The optional $number parameter is used to distinguish between two
872       fields with the same name.  The fields are numbered from 1.
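
       A short sketch of the optional $number argument, assuming a made-up
       form that has two fields named "tag":

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical form with a duplicated field name.
my $html = q{<html><body>
<form action="/post">
<input type="text" name="title">
<input type="text" name="tag">
<input type="text" name="tag">
</form>
</body></html>};
my $tmp = File::Temp->new( SUFFIX => '.html' );
print {$tmp} $html;
close $tmp;

my $mech = WWW::Mechanize->new;
$mech->get( URI::file->new_abs( $tmp->filename )->as_string );

# No form was selected, so the first form on the page is used.
$mech->field( title => 'Hello' );
$mech->field( 'tag', 'perl', 1 );    # first "tag" field
$mech->field( 'tag', 'web',  2 );    # second "tag" field

print join( ', ', $mech->value('title'), $mech->value('tag'),
    $mech->value( 'tag', 2 ) ), "\n";
```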
873
874   $mech->select($name, $value)
875   $mech->select($name, \@values)
876       Given the name of a "select" field, set its value to the value
877       specified.  If the field is not "<select multiple>" and the $value is
878       an array, only the first value will be set.  [Note: the documentation
879       previously claimed that only the last value would be set, but this was
880       incorrect.]  Passing $value as a hash with an "n" key selects an item
881       by number (e.g.  "{n => 3}" or "{n => [2,4]}").  The numbering starts
882       at 1.  This applies to the current form.
883
884       If you have a field with "<select multiple>" and you pass a single
885       $value, then $value will be added to the list of fields selected,
886       without clearing the others.  However, if you pass an array reference,
887       then all previously selected values will be cleared.
888
889       Returns true on successfully setting the value. On failure, returns
890       false and calls "$self->warn()" with an error message.
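
       The three calling styles can be sketched as follows.  The page and
       its option values are invented for the example; the selected values
       are read back through the underlying HTML::Form object.

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical form with a single-value and a multiple-value select.
my $html = q{<html><body>
<form action="/order">
<select name="color">
<option value="red">Red</option>
<option value="green">Green</option>
<option value="blue">Blue</option>
</select>
<select name="sizes" multiple>
<option value="s">S</option>
<option value="m">M</option>
<option value="l">L</option>
</select>
</form>
</body></html>};
my $tmp = File::Temp->new( SUFFIX => '.html' );
print {$tmp} $html;
close $tmp;

my $mech = WWW::Mechanize->new;
$mech->get( URI::file->new_abs( $tmp->filename )->as_string );

$mech->select( 'color', 'green' );         # select by value
my $by_value = $mech->value('color');

$mech->select( 'color', { n => 3 } );      # select the third option
my $by_number = $mech->value('color');

$mech->select( 'sizes', [ 's', 'l' ] );    # replace the whole selection
my @sizes = $mech->current_form->param('sizes');

print "$by_value, $by_number, @sizes\n";
```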
891
892   $mech->set_fields( $name => $value ... )
       This method sets multiple fields of the current form. It takes a list
       of field name and value pairs. If there is more than one field with the
       same name, the first one found is set. To choose which of the duplicate
       fields to set, pass an anonymous array containing the field value and
       the field's number as its two elements.

               # set the second field named $name to 'foo'
               $mech->set_fields( $name => [ 'foo', 2 ] );
901
902       The fields are numbered from 1.
903
904       This applies to the current form.
905
906   $mech->set_visible( @criteria )
907       This method sets fields of the current form without having to know
908       their names.  So if you have a login screen that wants a username and
909       password, you do not have to fetch the form and inspect the source (or
910       use the mech-dump utility, installed with WWW::Mechanize) to see what
911       the field names are; you can just say
912
913           $mech->set_visible( $username, $password );
914
915       and the first and second fields will be set accordingly.  The method is
916       called set_visible because it acts only on visible fields; hidden form
917       inputs are not considered.  The order of the fields is the order in
918       which they appear in the HTML source which is nearly always the order
919       anyone viewing the page would think they are in, but some creative work
920       with tables could change that; caveat user.
921
922       Each element in @criteria is either a field value or a field specifier.
923       A field value is a scalar.  A field specifier allows you to specify the
924       type of input field you want to set and is denoted with an arrayref
925       containing two elements.  So you could specify the first radio button
926       with
927
928           $mech->set_visible( [ radio => 'KCRW' ] );
929
930       Field values and specifiers can be intermixed, hence
931
932           $mech->set_visible( 'fred', 'secret', [ option => 'Checking' ] );
933
934       would set the first two fields to "fred" and "secret", and the next
935       "OPTION" menu field to "Checking".
936
937       The possible field specifier types are: "text", "password", "hidden",
938       "textarea", "file", "image", "submit", "radio", "checkbox" and
939       "option".
940
941       "set_visible" returns the number of values set.
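
       To see how values and specifiers line up with the visible inputs,
       here is a minimal sketch.  The login form is hypothetical and lives
       in a temporary file; note that the hidden "token" input is skipped.

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical login form: one hidden input followed by three visible ones.
my $html = q{<html><body>
<form action="/login" method="post">
<input type="hidden" name="token" value="abc">
<input type="text" name="user">
<input type="password" name="pass">
<select name="account">
<option value="Savings">Savings</option>
<option value="Checking">Checking</option>
</select>
</form>
</body></html>};
my $tmp = File::Temp->new( SUFFIX => '.html' );
print {$tmp} $html;
close $tmp;

my $mech = WWW::Mechanize->new;
$mech->get( URI::file->new_abs( $tmp->filename )->as_string );

# Two plain values, then a specifier for the next "option" (select) input.
my $count = $mech->set_visible( 'fred', 'secret', [ option => 'Checking' ] );

printf "%d values set: user=%s account=%s (token still %s)\n",
    $count, $mech->value('user'), $mech->value('account'),
    $mech->value('token');
```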
942
943   $mech->tick( $name, $value [, $set] )
944       "Ticks" the first checkbox that has both the name and value associated
945       with it on the current form.  Dies if there is no named check box for
946       that value.  Passing in a false value as the third optional argument
947       will cause the checkbox to be unticked.
948
949   $mech->untick($name, $value)
       Causes the checkbox to be unticked.  Shorthand for
       "tick($name,$value,undef)".
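
       A brief sketch of "tick()" and "untick()", including the failure
       case.  The checkbox names and values are made up; the ticked values
       are read back through the current HTML::Form object.

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical form with two checkboxes sharing a name.
my $html = q{<html><body>
<form action="/pizza">
<input type="checkbox" name="topping" value="cheese">
<input type="checkbox" name="topping" value="olives">
</form>
</body></html>};
my $tmp = File::Temp->new( SUFFIX => '.html' );
print {$tmp} $html;
close $tmp;

my $mech = WWW::Mechanize->new;
$mech->get( URI::file->new_abs( $tmp->filename )->as_string );

$mech->tick( 'topping', 'olives' );
my @ticked = $mech->current_form->param('topping');

$mech->untick( 'topping', 'olives' );
my @after = $mech->current_form->param('topping');

# tick() dies when no checkbox has the requested name and value.
my $ok = eval { $mech->tick( 'topping', 'anchovies' ); 1 };
print "tick failed: $@" unless $ok;
```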
952
953   $mech->value( $name [, $number] )
954       Given the name of a field, return its value. This applies to the
955       current form.
956
957       The optional $number parameter is used to distinguish between two
958       fields with the same name.  The fields are numbered from 1.
959
960       If the field is of type file (file upload field), the value is always
961       cleared to prevent remote sites from downloading your local files.  To
962       upload a file, specify its file name explicitly.
963
964   $mech->click( $button [, $x, $y] )
965       Has the effect of clicking a button on the current form.  The first
966       argument is the name of the button to be clicked.  The second and third
967       arguments (optional) allow you to specify the (x,y) coordinates of the
968       click.
969
970       If there is only one button on the form, "$mech->click()" with no
971       arguments simply clicks that one button.
972
973       Returns an HTTP::Response object.
974
975   $mech->click_button( ... )
       Has the effect of clicking a button on the current form by specifying
       its attributes. The arguments are a list of key/value pairs. Exactly
       one of name, id, number, input or value must be specified in the keys.
979
980       Dies if no button is found.
981
982       •   "name => name"
983
984           Clicks the button named name in the current form.
985
986       •   "id => id"
987
988           Clicks the button with the id id in the current form.
989
990       •   "number => n"
991
992           Clicks the nth button with type submit in the current form.
993           Numbering starts at 1.
994
995       •   "value => value"
996
997           Clicks the button with the value value in the current form.
998
999       •   "input => $inputobject"
1000
1001           Clicks on the button referenced by $inputobject, an instance of
1002           HTML::Form::SubmitInput obtained e.g. from
1003
1004               $mech->current_form()->find_input( undef, 'submit' )
1005
1006           $inputobject must belong to the current form.
1007
1008       •   "x => x"
1009
1010       •   "y => y"
1011
1012           These arguments (optional) allow you to specify the (x,y)
1013           coordinates of the click.
1014
1015   $mech->submit()
1016       Submits the current form, without specifying a button to click.
1017       Actually, no button is clicked at all.
1018
1019       Returns an HTTP::Response object.
1020
1021       This used to be a synonym for "$mech->click( 'submit' )", but is no
1022       longer so.
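
       The difference between submitting and clicking shows up in the
       request that gets built.  The sketch below only builds the requests
       through the underlying HTML::Form object ("make_request" and
       "click"), so nothing is sent; it is meant as an approximation of
       what "submit()" and "click()" would send.  The form and the
       example.com URL are placeholders.

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical search form; example.com is a placeholder and never contacted.
my $html = q{<html><body>
<form action="http://example.com/search" method="get">
<input type="text" name="q">
<input type="submit" name="go" value="Search">
</form>
</body></html>};
my $tmp = File::Temp->new( SUFFIX => '.html' );
print {$tmp} $html;
close $tmp;

my $mech = WWW::Mechanize->new;
$mech->get( URI::file->new_abs( $tmp->filename )->as_string );
$mech->field( q => 'perl' );

my $form = $mech->current_form;

# No button is clicked, so the "go" button is not part of the request.
my $plain = $form->make_request;

# Clicking "go" adds the button's name and value to the request.
my $clicked = $form->click('go');

print $plain->uri->as_string,   "\n";
print $clicked->uri->as_string, "\n";
```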
1023
1024   $mech->submit_form( ... )
1025       This method lets you select a form from the previously fetched page,
1026       fill in its fields, and submit it. It combines the
1027       "form_number"/"form_name", "set_fields" and "click" methods into one
1028       higher level call. Its arguments are a list of key/value pairs, all of
1029       which are optional.
1030
1031       •   "fields => \%fields"
1032
1033           Specifies the fields to be filled in the current form.
1034
1035       •   "with_fields => \%fields"
1036
1037           Probably all you need for the common case. It combines a smart form
1038           selector and data setting in one operation. It selects the first
1039           form that contains all fields mentioned in "\%fields".  This is
1040           nice because you don't need to know the name or number of the form
1041           to do this.
1042
           (calls "form_with_fields()" and "set_fields()").
1045
1046           If you choose "with_fields", the "fields" option will be ignored.
1047           The "form_number", "form_name" and "form_id" options will still be
1048           used.  An exception will be thrown unless exactly one form matches
1049           all of the provided criteria.
1050
1051       •   "form_number => n"
1052
           Selects the nth form (calls "form_number()").  If this param is not
           specified, the currently-selected form is used.
1055
1056       •   "form_name => name"
1057
1058           Selects the form named name (calls "form_name()")
1059
1060       •   "form_id => ID"
1061
1062           Selects the form with ID ID (calls "form_id()")
1063
1064       •   "button => button"
1065
1066           Clicks on button button (calls "click()")
1067
1068       •   "x => x, y => y"
1069
1070           Sets the x or y values for "click()"
1071
1072       •   "strict_forms => bool"
1073
1074           Sets the HTML::Form strict flag which causes form submission to
1075           croak if any of the passed fields don't exist on the page, and/or a
1076           value doesn't exist in a select element.  By default HTML::Form
1077           sets this value to false.
1078
1079           This behavior can also be turned on globally by passing
1080           "strict_forms => 1" to "WWW::Mechanize->new". If you do that, you
1081           can still disable it for individual calls by passing "strict_forms
1082           => 0" here.
1083
1084       If no form is selected, the first form found is used.
1085
1086       If button is not passed, then the "submit()" method is used instead.
1087
1088       If you want to submit a file and get its content from a scalar rather
1089       than a file in the filesystem, you can use:
1090
1091           $mech->submit_form(with_fields => { logfile => [ [ undef, 'whatever', Content => $content ], 1 ] } );
1092
1093       Returns an HTTP::Response object.
1094

MISCELLANEOUS METHODS

1096   $mech->add_header( name => $value [, name => $value... ] )
1097       Sets HTTP headers for the agent to add or remove from the HTTP request.
1098
1099           $mech->add_header( Encoding => 'text/klingon' );
1100
1101       If a value is "undef", then that header will be removed from any future
1102       requests.  For example, to never send a Referer header:
1103
1104           $mech->add_header( Referer => undef );
1105
1106       If you want to delete a header, use "delete_header".
1107
1108       Returns the number of name/value pairs added.
1109
1110       NOTE: This method was very different in WWW::Mechanize before 1.00.
1111       Back then, the headers were stored in a package hash, not as a member
1112       of the object instance.  Calling "add_header()" would modify the
1113       headers for every WWW::Mechanize object, even after your object no
1114       longer existed.
1115
1116   $mech->delete_header( name [, name ... ] )
1117       Removes HTTP headers from the agent's list of special headers.  For
1118       instance, you might need to do something like:
1119
1120           # Don't send a Referer for this URL
1121           $mech->add_header( Referer => undef );
1122
1123           # Get the URL
1124           $mech->get( $url );
1125
1126           # Back to the default behavior
1127           $mech->delete_header( 'Referer' );
1128
1129   $mech->quiet(true/false)
1130       Allows you to suppress warnings to the screen.
1131
1132           $mech->quiet(0); # turns on warnings (the default)
1133           $mech->quiet(1); # turns off warnings
1134           $mech->quiet();  # returns the current quietness status
1135
1136   $mech->stack_depth( $max_depth )
1137       Get or set the page stack depth. Use this if you're doing a lot of page
1138       scraping and running out of memory.
1139
1140       A value of 0 means "no history at all."  By default, the max stack
1141       depth is humongously large, effectively keeping all history.
1142
1143   $mech->save_content( $filename, %opts )
1144       Dumps the contents of "$mech->content" into $filename.  $filename will
1145       be overwritten.  Dies if there are any errors.
1146
1147       If the content type does not begin with "text/", then the content is
1148       saved in binary mode (i.e. "binmode()" is set on the output
1149       filehandle).
1150
1151       Additional arguments can be passed as key/value pairs:
1152
1153       $mech->save_content( $filename, binary => 1 )
           Filehandle is set with "binmode" to ":raw" and contents are obtained
           by calling "$self->content(decoded_by_headers => 1)". Same as
           calling:
1156
1157               $mech->save_content( $filename, binmode => ':raw',
1158                                    decoded_by_headers => 1 );
1159
1160           This should be the safest way to save contents verbatim.
1161
1162       $mech->save_content( $filename, binmode => $binmode )
1163           Filehandle is set to binary mode. If $binmode begins with ':', it
1164           is passed as a parameter to "binmode":
1165
1166               binmode $fh, $binmode;
1167
1168           otherwise the filehandle is set to binary mode if $binmode is true:
1169
1170               binmode $fh;
1171
1172       all other arguments
           are passed as-is to "$mech->content(%opts)". In particular,
           "decoded_by_headers" might come in handy if you want to revert the
           effect of line compression performed by the web server but without
           further interpreting the contents (e.g. decoding it according to
           the charset).
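
       A minimal round-trip sketch of "save_content()" with "binary => 1",
       assuming a made-up page fetched from one temporary file and saved
       to another:

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical source page, written to a temporary file.
my $html = q{<html><body><p>Hello, Mech</p></body></html>};
my $tmp  = File::Temp->new( SUFFIX => '.html' );
print {$tmp} $html;
close $tmp;

my $mech = WWW::Mechanize->new;
$mech->get( URI::file->new_abs( $tmp->filename )->as_string );

# Save the fetched content verbatim; an existing file is overwritten.
my $out = File::Temp->new( SUFFIX => '.html' );
close $out;
$mech->save_content( $out->filename, binary => 1 );

# Read the copy back to check that nothing changed on the way.
open my $in, '<:raw', $out->filename or die "Cannot read saved copy: $!";
my $saved = do { local $/; <$in> };
close $in;

print "Saved copy matches the source\n" if $saved eq $html;
```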
1178
1179   $mech->dump_headers( [$fh] )
1180       Prints a dump of the HTTP response headers for the most recent
1181       response.  If $fh is not specified or is undef, it dumps to STDOUT.
1182
1183       Unlike the rest of the dump_* methods, $fh can be a scalar. It will be
1184       used as a file name.
1185
1186   $mech->dump_links( [[$fh], $absolute] )
1187       Prints a dump of the links on the current page to $fh.  If $fh is not
1188       specified or is undef, it dumps to STDOUT.
1189
1190       If $absolute is true, links displayed are absolute, not relative.
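
       Because $fh may be any filehandle, the dump can be captured in a
       string through an in-memory filehandle.  A small sketch, assuming a
       made-up page with two relative links:

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical page with two relative links.
my $html = q{<html><body>
<a href="a.html">A</a>
<a href="b.html">B</a>
</body></html>};
my $tmp = File::Temp->new( SUFFIX => '.html' );
print {$tmp} $html;
close $tmp;

my $mech = WWW::Mechanize->new;
$mech->get( URI::file->new_abs( $tmp->filename )->as_string );

# Relative links, captured in $relative instead of going to STDOUT.
open my $fh, '>', \my $relative or die "Cannot open in-memory handle: $!";
$mech->dump_links($fh);
close $fh;

# Absolute links: pass a true value as the second argument.
open $fh, '>', \my $absolute or die "Cannot open in-memory handle: $!";
$mech->dump_links( $fh, 1 );
close $fh;

print $relative, $absolute;
```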
1191
1192   $mech->dump_images( [[$fh], $absolute] )
1193       Prints a dump of the images on the current page to $fh.  If $fh is not
1194       specified or is undef, it dumps to STDOUT.
1195
       If $absolute is true, image URLs displayed are absolute, not relative.

       The output will include empty lines for images that have no "src"
       attribute and therefore no "->url".
1200
1201   $mech->dump_forms( [$fh] )
1202       Prints a dump of the forms on the current page to $fh.  If $fh is not
1203       specified or is undef, it dumps to STDOUT. Running the following:
1204
1205           my $mech = WWW::Mechanize->new();
1206           $mech->get("https://www.google.com/");
1207           $mech->dump_forms;
1208
1209       will print:
1210
1211           GET https://www.google.com/search [f]
1212             ie=ISO-8859-1                  (hidden readonly)
1213             hl=en                          (hidden readonly)
1214             source=hp                      (hidden readonly)
1215             biw=                           (hidden readonly)
1216             bih=                           (hidden readonly)
1217             q=                             (text)
1218             btnG=Google Search             (submit)
1219             btnI=I'm Feeling Lucky         (submit)
1220             gbv=1                          (hidden readonly)
1221
1222   $mech->dump_text( [$fh] )
1223       Prints a dump of the text on the current page to $fh.  If $fh is not
1224       specified or is undef, it dumps to STDOUT.
1225

OVERRIDDEN LWP::UserAgent METHODS

1227   $mech->clone()
1228       Clone the mech object.  The clone will be using the same cookie jar as
1229       the original mech.
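
       One way to confirm that a clone shares its cookie jar with the
       original is to compare the jar references.  A small sketch:

```perl
use strict;
use warnings;
use Scalar::Util qw( refaddr );
use WWW::Mechanize ();

my $mech  = WWW::Mechanize->new;
my $clone = $mech->clone;

# The clone is a separate object...
my $distinct = refaddr($mech) != refaddr($clone);

# ...but both objects point at the very same cookie jar.
my $shared = refaddr( $mech->cookie_jar ) == refaddr( $clone->cookie_jar );

print "separate objects, shared cookie jar\n" if $distinct && $shared;
```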
1230
1231   $mech->redirect_ok()
       An overridden version of "redirect_ok()" in LWP::UserAgent.  This
       method is used to determine whether a redirection in the request should
       be followed.
1235
1236       Note that WWW::Mechanize's constructor pushes POST on to the agent's
1237       "requests_redirectable" list.
1238
1239   $mech->request( $request [, $arg [, $size]])
       Overridden version of "request()" in LWP::UserAgent.  Performs the
       actual request.  Normally, if you're using WWW::Mechanize, it's because
       you don't want to deal with this level of stuff anyway.
1243
1244       Note that $request will be modified.
1245
1246       Returns an HTTP::Response object.
1247
1248   $mech->update_html( $html )
1249       Allows you to replace the HTML that the mech has found.  Updates the
1250       forms and links parse-trees that the mech uses internally.
1251
1252       Say you have a page that you know has malformed output, and you want to
1253       update it so the links come out correctly:
1254
1255           my $html = $mech->content;
1256           $html =~ s[</option>.{0,3}</td>][</option></select></td>]isg;
1257           $mech->update_html( $html );
1258
1259       This method is also used internally by the mech itself to update its
1260       own HTML content when loading a page. This means that if you would like
1261       to systematically perform the above HTML substitution, you would
1262       overload update_html in a subclass thusly:
1263
1264          package MyMech;
1265          use base 'WWW::Mechanize';
1266
1267          sub update_html {
1268              my ($self, $html) = @_;
1269              $html =~ s[</option>.{0,3}</td>][</option></select></td>]isg;
1270              $self->WWW::Mechanize::update_html( $html );
1271          }
1272
1273       If you do this, then the mech will use the tidied-up HTML instead of
1274       the original both when parsing for its own needs, and for returning to
1275       you through "content()".
1276
1277       Overloading this method is also the recommended way of implementing
1278       extra validation steps (e.g. link checkers) for every HTML page
1279       received.  "warn" and "die" would then come in handy to signal
1280       validation errors.
1281
1282   $mech->credentials( $username, $password )
1283       Provide credentials to be used for HTTP Basic authentication for all
1284       sites and realms until further notice.
1285
1286       The four argument form described in LWP::UserAgent is still supported.
1287
1288   $mech->get_basic_credentials( $realm, $uri, $isproxy )
1289       Returns the credentials for the realm and URI.
1290
1291   $mech->clear_credentials()
1292       Remove any credentials set up with "credentials()".
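
       The three credential methods fit together roughly as sketched here.
       The user name, password, realm and URL are placeholders, and no
       request is made.

```perl
use strict;
use warnings;
use URI ();
use WWW::Mechanize ();

my $mech = WWW::Mechanize->new;

# Placeholder URL and realm; nothing is fetched in this sketch.
my $uri = URI->new('http://example.com/private');

# Two-argument form: use these credentials for all sites and realms.
$mech->credentials( 'fred', 'secret' );
my @set = $mech->get_basic_credentials( 'Some Realm', $uri, 0 );

# After clearing, no credentials are available for this realm and URI.
$mech->clear_credentials;
my @cleared = $mech->get_basic_credentials( 'Some Realm', $uri, 0 );

printf "before clearing: %d values, after clearing: %d values\n",
    scalar @set, scalar @cleared;
```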
1293

INHERITED UNCHANGED LWP::UserAgent METHODS

       As a subclass of LWP::UserAgent, WWW::Mechanize inherits all of
       LWP::UserAgent's methods, many of which are overridden or extended.
       The following methods are inherited unchanged. View the LWP::UserAgent
       documentation for their implementation descriptions.

       This is not meant to be an exhaustive list.  LWP::UserAgent may have
       added others.
1302
1303   $mech->head()
1304       Inherited from LWP::UserAgent.
1305
1306   $mech->mirror()
1307       Inherited from LWP::UserAgent.
1308
1309   $mech->simple_request()
1310       Inherited from LWP::UserAgent.
1311
1312   $mech->is_protocol_supported()
1313       Inherited from LWP::UserAgent.
1314
1315   $mech->prepare_request()
1316       Inherited from LWP::UserAgent.
1317
1318   $mech->progress()
1319       Inherited from LWP::UserAgent.
1320

INTERNAL-ONLY METHODS

1322       These methods are only used internally.  You probably don't need to
1323       know about them.
1324
1325   $mech->_update_page($request, $response)
       Updates all internal variables in $mech as if $request was just
       performed, and returns $response. The page stack is not altered by this
       method; it is up to the caller (e.g.  "request") to do that.
1329
1330   $mech->_modify_request( $req )
       Modifies an HTTP::Request before the request is sent out, for both GET
       and POST requests.

       We add a "Referer" header, as well as a header to note that we can
       accept gzip encoded content, if Compress::Zlib is installed.
1336
1337   $mech->_make_request()
1338       Convenience method to make it easier for subclasses like
1339       WWW::Mechanize::Cached to intercept the request.
1340
1341   $mech->_reset_page()
1342       Resets the internal fields that track page parsed stuff.
1343
1344   $mech->_extract_links()
1345       Extracts links from the content of a webpage, and populates the
1346       "{links}" property with WWW::Mechanize::Link objects.
1347
1348   $mech->_push_page_stack()
1349       The agent keeps a stack of visited pages, which it can pop when it
1350       needs to go BACK and so on.
1351
1352       The current page needs to be pushed onto the stack before we get a new
1353       page, and the stack needs to be popped when BACK occurs.
1354
       Neither of these takes any arguments; they just operate on the $mech
       object.
1357
1358   warn( @messages )
1359       Centralized warning method, for diagnostics and non-fatal problems.
1360       Defaults to calling "CORE::warn", but may be overridden by setting
1361       "onwarn" in the constructor.
1362
1363   die( @messages )
1364       Centralized error method.  Defaults to calling "CORE::die", but may be
1365       overridden by setting "onerror" in the constructor.
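
       A minimal sketch of routing warnings and errors through custom
       handlers.  The handlers and message texts are made up for the
       example.

```perl
use strict;
use warnings;
use WWW::Mechanize ();

my @warnings;
my $mech = WWW::Mechanize->new(
    onwarn  => sub { push @warnings, join '', @_ },     # collect warnings
    onerror => sub { die 'Mech error: ', @_, "\n" },    # prefix fatal errors
);

$mech->warn('first warning');    # handled by the onwarn callback

$mech->quiet(1);
$mech->warn('suppressed');       # dropped because quiet is enabled

# die() is routed through the onerror callback.
my $ok    = eval { $mech->die('something broke'); 1 };
my $error = $@;

print scalar(@warnings), " warning(s) collected; error was: $error";
```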
1366

BEST PRACTICES

1368       The default settings can get you up and running quickly, but there are
1369       settings you can change in order to make your life easier.
1370
1371       autocheck
1372           "autocheck" can save you the overhead of checking status codes for
1373           success.  You may outgrow it as your needs get more sophisticated,
1374           but it's a safe option to start with.
1375
1376               my $agent = WWW::Mechanize->new( autocheck => 1 );
1377
1378       cookie_jar
1379           You are encouraged to install Mozilla::PublicSuffix and use
1380           HTTP::CookieJar::LWP as your cookie jar.  HTTP::CookieJar::LWP
1381           provides a better security model matching that of current Web
1382           browsers when Mozilla::PublicSuffix is installed.
1383
1384               use HTTP::CookieJar::LWP ();
1385
1386               my $jar = HTTP::CookieJar::LWP->new;
1387               my $agent = WWW::Mechanize->new( cookie_jar => $jar );
1388
1389       protocols_allowed
           This option is inherited directly from LWP::UserAgent.  It may be
           used to restrict requests to an explicit list of allowed protocols.

               my $agent = WWW::Mechanize->new(
                   protocols_allowed => [ 'http', 'https' ]
               );

           This will prevent you from inadvertently following URLs like
           "file:///etc/passwd".
1399
1400       protocols_forbidden
           This option is also inherited directly from LWP::UserAgent.  It may
           be used to deny specific protocols.

               my $agent = WWW::Mechanize->new(
                   protocols_forbidden => [ 'file', 'mailto', 'ssh', ]
               );

           This will prevent you from inadvertently following URLs like
           "file:///etc/passwd".
1410
1411       strict_forms
1412           Consider turning on the "strict_forms" option when you create a new
1413           Mech.  This will perform a helpful sanity check on form fields
1414           every time you are submitting a form, which can save you a lot of
1415           debugging time.
1416
1417               my $agent = WWW::Mechanize->new( strict_forms => 1 );
1418
1419           If you do not want to have this option globally, you can still turn
1420           it on for individual forms.
1421
1422               $agent->submit_form( fields => { foo => 'bar' } , strict_forms => 1 );
1423

WWW::MECHANIZE'S GIT REPOSITORY

1425       WWW::Mechanize is hosted at GitHub.
1426
1427       Repository: <https://github.com/libwww-perl/WWW-Mechanize>.  Bugs:
1428       <https://github.com/libwww-perl/WWW-Mechanize/issues>.
1429

OTHER DOCUMENTATION

1431   Spidering Hacks, by Kevin Hemenway and Tara Calishain
1432       Spidering Hacks from O'Reilly
1433       (<http://www.oreilly.com/catalog/spiderhks/>) is a great book for
1434       anyone wanting to know more about screen-scraping and spidering.
1435
1436       There are six hacks that use Mech or a Mech derivative:
1437
1438       #21 WWW::Mechanize 101
1439       #22 Scraping with WWW::Mechanize
1440       #36 Downloading Images from Webshots
1441       #44 Archiving Yahoo! Groups Messages with WWW::Yahoo::Groups
1442       #64 Super Author Searching
1443       #73 Scraping TV Listings
1444
1445       The book was also positively reviewed on Slashdot:
1446       <http://books.slashdot.org/article.pl?sid=03/12/11/2126256>
1447

ONLINE RESOURCES AND SUPPORT

1449       •   WWW::Mechanize mailing list
1450
1451           The Mech mailing list is at
1452           <http://groups.google.com/group/www-mechanize-users> and is
1453           specific to Mechanize, unlike the LWP mailing list below.  Although
1454           it is a users list, all development discussion takes place here,
1455           too.
1456
1457       •   LWP mailing list
1458
           The LWP mailing list is at
           <http://lists.perl.org/showlist.cgi?name=libwww>, and is more
           user-oriented and well-populated than the WWW::Mechanize list.
1462
1463       •   Perlmonks
1464
1465           <http://perlmonks.org> is an excellent community of support, and
1466           many questions about Mech have already been answered there.
1467
1468       •   WWW::Mechanize::Examples
1469
1470           A random array of examples submitted by users, included with the
1471           Mechanize distribution.
1472

ARTICLES ABOUT WWW::MECHANIZE

1474       •   <http://www.ibm.com/developerworks/linux/library/wa-perlsecure/>
1475
1476           IBM article "Secure Web site access with Perl"
1477
1478       •   <http://www.oreilly.com/catalog/googlehks2/chapter/hack84.pdf>
1479
1480           Leland Johnson's hack #84 in Google Hacks, 2nd Edition is an
1481           example of a production script that uses WWW::Mechanize and
1482           HTML::TableContentParser. It takes in keywords and returns the
1483           estimated price of these keywords on Google's AdWords program.
1484
1485       •   <http://www.perl.com/pub/a/2004/06/04/recorder.html>
1486
1487           Linda Julien writes about using HTTP::Recorder to create
1488           WWW::Mechanize scripts.
1489
1490       •   <http://www.developer.com/lang/other/article.php/3454041>
1491
1492           Jason Gilmore's article on using WWW::Mechanize for scraping sales
1493           information from Amazon and eBay.
1494
1495       •   <http://www.perl.com/pub/a/2003/01/22/mechanize.html>
1496
1497           Chris Ball's article about using WWW::Mechanize for scraping TV
1498           listings.
1499
1500       •   <http://www.stonehenge.com/merlyn/LinuxMag/col47.html>
1501
1502           Randal Schwartz's article on scraping Yahoo News for images.  It's
1503           already out of date: He manually walks the list of links hunting
1504           for matches, which wouldn't have been necessary if the
1505           "find_link()" method existed at press time.
1506
1507       •   <http://www.perladvent.org/2002/16th/>
1508
1509           WWW::Mechanize on the Perl Advent Calendar, by Mark Fowler.
1510
1511       •   <http://www.linux-magazin.de/ausgaben/2004/03/datenruessel/>
1512
1513           Michael Schilli's article on Mech and WWW::Mechanize::Shell for the
1514           German magazine Linux Magazin.
1515
1516   Other modules that use Mechanize
1517       Here are modules that use or subclass Mechanize.  Let me know of any
1518       others:
1519
1520       •   Finance::Bank::LloydsTSB
1521
1522       •   HTTP::Recorder
1523
1524           Acts as a proxy for web interaction, and then generates
1525           WWW::Mechanize scripts.
1526
1527       •   Win32::IE::Mechanize
1528
1529           Just like Mech, but using Microsoft Internet Explorer to do the
1530           work.
1531
1532       •   WWW::Bugzilla
1533
1534       •   WWW::Google::Groups
1535
1536       •   WWW::Hotmail
1537
1538       •   WWW::Mechanize::Cached
1539
1540       •   WWW::Mechanize::Cached::GZip
1541
1542       •   WWW::Mechanize::FormFiller
1543
1544       •   WWW::Mechanize::Shell
1545
1546       •   WWW::Mechanize::Sleepy
1547
1548       •   WWW::Mechanize::SpamCop
1549
1550       •   WWW::Mechanize::Timed
1551
1552       •   WWW::SourceForge
1553
1554       •   WWW::Yahoo::Groups
1555
1556       •   WWW::Scripter
1557

ACKNOWLEDGEMENTS

1559       Thanks to the numerous people who have helped out on WWW::Mechanize in
1560       one way or another, including Kirrily Robert for the original
1561       "WWW::Automate", Lyle Hopkins, Damien Clark, Ansgar Burchardt, Gisle
1562       Aas, Jeremy Ary, Hilary Holz, Rafael Kitover, Norbert Buchmuller, Dave
1563       Page, David Sainty, H.Merijn Brand, Matt Lawrence, Michael Schwern,
1564       Adriano Ferreira, Miyagawa, Peteris Krumins, David
1565       Steinbrunner, Kevin Falcone, Mike O'Regan, Mark Stosberg, Uri Guttman,
1566       Peter Scott, Philippe Bruhat, Ian Langworth, John Beppu, Gavin Estey,
1567       Jim Brandt, Ask Bjoern Hansen, Greg Davies, Ed Silva, Mark-Jason
1568       Dominus, Autrijus Tang, Mark Fowler, Stuart Children, Max Maischein,
1569       Meng Wong, Prakash Kailasa, Abigail, Jan Pazdziora, Dominique
1570       Quatravaux, Scott Lanning, Rob Casey, Leland Johnson, Joshua Gatcomb,
1571       Julien Beasley, Abe Timmerman, Peter Stevens, Pete Krawczyk, Tad
1572       McClellan, and the late great Iain Truskett.
1573

AUTHOR

1575       Andy Lester <andy at petdance.com>
1576

COPYRIGHT AND LICENSE

1578       This software is copyright (c) 2004 by Andy Lester.
1579
1580       This is free software; you can redistribute it and/or modify it under
1581       the same terms as the Perl 5 programming language system itself.
1582
1583
1584
1585perl v5.34.1                      2022-05-01                 WWW::Mechanize(3)