WWW::Mechanize(3) User Contributed Perl Documentation WWW::Mechanize(3)

WWW::Mechanize - Handy web browsing in a Perl object

version 2.11

WWW::Mechanize supports performing a sequence of page fetches, including
following links and submitting forms. Each fetched page is parsed and
its links and forms are extracted. A link or a form can be selected,
form fields can be filled, and the next page can be fetched. Mech also
stores a history of the URLs you've visited, which can be queried and
revisited.

    use WWW::Mechanize ();
    my $mech = WWW::Mechanize->new();

    $mech->get( $url );

    $mech->follow_link( n => 3 );
    $mech->follow_link( text_regex => qr/download this/i );
    $mech->follow_link( url => 'http://host.com/index.html' );

    $mech->submit_form(
        form_number => 3,
        fields      => {
            username => 'mungo',
            password => 'lost-and-alone',
        }
    );

    $mech->submit_form(
        form_name => 'search',
        fields    => { query => 'pot of gold', },
        button    => 'Search Now'
    );

    # Enable strict form processing to catch typos and non-existent form fields.
    my $strict_mech = WWW::Mechanize->new( strict_forms => 1 );

    $strict_mech->get( $url );

    # This method call will die, saving you lots of time looking for the bug.
    $strict_mech->submit_form(
        form_number => 3,
        fields      => {
            usernaem    => 'mungo',             # typo in field name
            password    => 'lost-and-alone',
            extra_field => 123,                 # field does not exist
        }
    );

"WWW::Mechanize", or Mech for short, is a Perl module for stateful
programmatic web browsing, used for automating interaction with
websites.

Features include:

• All HTTP methods

• High-level hyperlink and HTML form support, without having to parse
HTML yourself

• SSL support

• Automatic cookies

• Custom HTTP headers

• Automatic handling of redirections

• Proxies

• HTTP authentication

Mech is well suited for use in testing web applications. If you use
one of the Test::* modules, such as Test::HTML::Lint, you can check the
fetched content and use it as input to a test call.

    use Test::More;
    like( $mech->content(), qr/$expected/, "Got expected content" );

Each page fetch stores its URL in a history stack which you can
traverse.

    $mech->back();

If you want finer control over your page fetching, you can use these
methods. "follow_link()" and "submit_form()" are just high-level
wrappers around them.

    $mech->find_link( n => $number );
    $mech->form_number( $number );
    $mech->form_name( $name );
    $mech->field( $name, $value );
    $mech->set_fields( %field_values );
    $mech->set_visible( @criteria );
    $mech->click( $button );

WWW::Mechanize is a proper subclass of LWP::UserAgent, so you can also
use any of LWP::UserAgent's methods.

    $mech->add_header( $name => $value );

Please note that Mech does NOT support JavaScript; you need additional
software for that. Please check "JavaScript" in WWW::Mechanize::FAQ for
more.

• <https://github.com/libwww-perl/WWW-Mechanize/issues>

The queue for bugs & enhancements in WWW::Mechanize. Please note
that the queue at <http://rt.cpan.org> is no longer maintained.

• <https://metacpan.org/pod/WWW::Mechanize>

The CPAN documentation page for Mechanize.

• <https://metacpan.org/pod/distribution/WWW-Mechanize/lib/WWW/Mechanize/FAQ.pod>

Frequently asked questions. Make sure you read this FIRST.

new()
Creates and returns a new WWW::Mechanize object, hereafter referred to
as the "agent".

    my $mech = WWW::Mechanize->new();

The constructor for WWW::Mechanize overrides two of the params to the
LWP::UserAgent constructor:

    agent      => 'WWW-Mechanize/#.##'
    cookie_jar => {}    # an empty, memory-only HTTP::Cookies object

You can override these overrides by passing params to the constructor,
as in:

    my $mech = WWW::Mechanize->new( agent => 'wonderbot 1.01' );

If you want none of the overhead of a cookie jar, or don't want your
bot accepting cookies, you have to explicitly disallow it, like so:

    my $mech = WWW::Mechanize->new( cookie_jar => undef );

Here are the params that WWW::Mechanize recognizes. These do not
include params that LWP::UserAgent recognizes.

• "autocheck => [0|1]"

Checks each request made to see if it was successful. This saves
you the trouble of manually checking yourself. Any errors found
are treated as fatal errors, not warnings.

The default value is ON, unless it's being subclassed, in which
case it is OFF. This means that standalone WWW::Mechanize
instances have autocheck turned on, which is protective for the
vast majority of Mech users who don't bother checking the return
value of get() and post() and can't figure out why their code fails.
However, if WWW::Mechanize is subclassed, such as for
Test::WWW::Mechanize or Test::WWW::Mechanize::Catalyst, this may
not be an appropriate default, so it's off.

• "noproxy => [0|1]"

Turns off the automatic call to the LWP::UserAgent "env_proxy"
function.

This needs to be explicitly turned off if you're using
Crypt::SSLeay to access an https site via a proxy server. Note: you
still need to set your HTTPS_PROXY environment variable as
appropriate.

• "onwarn => \&func"

Reference to a "warn"-compatible function, such as "Carp::carp",
that is called when a warning needs to be shown.

If this is set to "undef", no warnings will ever be shown.
However, it's probably better to use the "quiet" method to control
that behavior.

If this value is not passed, Mech uses "Carp::carp" if Carp is
installed, or "CORE::warn" if not.

• "onerror => \&func"

Reference to a "die"-compatible function, such as "Carp::croak",
that is called when there's a fatal error.

If this is set to "undef", no errors will ever be shown.

If this value is not passed, Mech uses "Carp::croak" if Carp is
installed, or "CORE::die" if not.

• "quiet => [0|1]"

Suppresses warnings. Setting "quiet => 1" is the same as
calling "$mech->quiet(1)". Default is off.

• "stack_depth => $value"

Sets the depth of the page stack that keeps track of all the
downloaded pages. Default is effectively infinite stack size. If
the stack is eating up your memory, then set this to a smaller
number, say 5 or 10. Setting this to zero means Mech will keep no
history.

In addition, WWW::Mechanize allows you to globally enable strict
and verbose mode for form handling, which is done with HTML::Form.

• "strict_forms => [0|1]"

Globally sets the HTML::Form strict flag, which causes form
submission to croak if any of the passed fields don't exist in the
form, and/or a value doesn't exist in a select element. This can
still be disabled in individual calls to "submit_form()".

Default is off.

• "verbose_forms => [0|1]"

Globally sets the HTML::Form verbose flag, which causes form
submission to warn about any bad HTML form constructs found. This
cannot be disabled later.

Default is off.

• "marked_sections => [0|1]"

Globally sets the HTML::Parser marked sections flag, which causes
HTML "<![CDATA[ ... ]]>" sections to be honoured. This cannot be
disabled later.

Default is on.

To support forms, WWW::Mechanize's constructor pushes POST onto the
agent's "requests_redirectable" list (see also LWP::UserAgent).

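Here is a minimal sketch of what "autocheck" changes, with "success()" and "status()" used for manual checking. It assumes WWW::Mechanize and its LWP dependencies are installed. To stay offline, it requests a file:// URL for a file that was never created, which LWP's file handler answers with a 404 response. The temp directory and file name are illustrative only.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# Illustrative only: a file:// URL for a file that was never created.
my $dir     = tempdir( CLEANUP => 1 );
my $missing = URI::file->new_abs( File::Spec->catfile( $dir, 'missing.html' ) );

# autocheck is on by default for a plain WWW::Mechanize object, so the
# failed request dies; eval traps that.
my $checked = WWW::Mechanize->new();
my $died    = !eval { $checked->get($missing); 1 };

# With autocheck off, get() returns normally and the caller must check.
my $unchecked = WWW::Mechanize->new( autocheck => 0 );
$unchecked->get($missing);
my $ok     = $unchecked->success;
my $status = $unchecked->status;

print 'autocheck => 1 died: ', ( $died ? 'yes' : 'no' ), "\n";
print 'autocheck => 0 success: ', ( $ok ? 'yes' : 'no' ), ", status $status\n";
```

Leaving autocheck on is the safer default for scripts. Turning it off only makes sense when your code inspects every response itself.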
$mech->agent_alias( $alias )
Sets the user agent string to the expanded version from a table of
actual user agent strings. $alias can be one of the following:

• Windows IE 6

• Windows Mozilla

• Mac Safari

• Mac Mozilla

• Linux Mozilla

• Linux Konqueror

The alias is then replaced with the corresponding full User-Agent
string. For instance,

    $mech->agent_alias( 'Windows IE 6' );

sets your User-Agent to

    Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

The list of valid aliases, currently the six shown above, can be
returned from "known_agent_aliases()".

$mech->known_agent_aliases()
Returns a list of all the agent aliases that Mech knows about. This
can also be called as a package or class method.

    @aliases = WWW::Mechanize::known_agent_aliases();
    @aliases = WWW::Mechanize->known_agent_aliases();
    @aliases = $mech->known_agent_aliases();

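This short sketch combines the two methods and needs no network access. It lists the known aliases and then applies one. The expected User-Agent string is the one given above for 'Windows IE 6'. "agent()" is the accessor inherited from LWP::UserAgent.

```perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

# Class-method call: list every alias this version of Mech knows about.
my @aliases = WWW::Mechanize->known_agent_aliases();
print "$_\n" for @aliases;

# Swap the default 'WWW-Mechanize/#.##' User-Agent for one of the aliases.
# agent() is the LWP::UserAgent accessor, inherited by Mech.
$mech->agent_alias('Windows IE 6');
my $ua_string = $mech->agent();
print "User-Agent: $ua_string\n";
```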
$mech->get( $uri )
Given a URL/URI, fetches it. Returns an HTTP::Response object. $uri
can be a well-formed URL string, a URI object, or a
WWW::Mechanize::Link object.

The results are stored internally in the agent object, but you don't
know that. Just use the accessors listed below. Poking at the
internals is deprecated and subject to change in the future.

"get()" is a well-behaved overloaded version of the method in
LWP::UserAgent. This lets you do things like

    $mech->get( $uri, ':content_file' => $filename );

and you can rest assured that the params will get filtered down
appropriately. See "get" in LWP::UserAgent for more details.

NOTE: Because ":content_file" causes the page contents to be stored in
a file instead of the response object, some Mech functions that expect
it to be there won't work as expected. Use with caution.

Here is a partial list of methods that do not work as expected with
":content_file": "forms()", "current_form()", "links()", "title()",
"content(...)", "text()", all content-handling methods, all link
methods, all image methods, all form methods, all field methods,
"save_content(...)", "dump_links(...)", "dump_images(...)",
"dump_forms(...)", and "dump_text(...)".

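You can see the ":content_file" behaviour without a web server by fetching a local file:// URL. In this sketch the file names and contents are hypothetical. The source is plain text, so no HTML parsing is involved. The body lands on disk rather than staying in the response object.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# Hypothetical source and destination files in a throwaway directory.
my $dir    = tempdir( CLEANUP => 1 );
my $source = File::Spec->catfile( $dir, 'report.txt' );
my $saved  = File::Spec->catfile( $dir, 'saved.txt' );

open my $out, '>', $source or die "Cannot write $source: $!";
print {$out} "line one\nline two\n";
close $out;

my $mech = WWW::Mechanize->new();

# ':content_file' is passed through to LWP::UserAgent, which streams the
# body into $saved instead of keeping it in the response object.
$mech->get( URI::file->new_abs($source), ':content_file' => $saved );

open my $in, '<', $saved or die "Cannot read $saved: $!";
my $copy = do { local $/; <$in> };
close $in;
print $copy;
```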
$mech->post( $uri, content => $content )
POSTs $content to $uri. Returns an HTTP::Response object. $uri can be
a well-formed URI string, a URI object, or a WWW::Mechanize::Link
object.

$mech->put( $uri, content => $content )
PUTs $content to $uri. Returns an HTTP::Response object. $uri can be
a well-formed URI string, a URI object, or a WWW::Mechanize::Link
object.

$mech->head( $uri )
Performs a HEAD request to $uri. Returns an HTTP::Response object.
$uri can be a well-formed URI string, a URI object, or a
WWW::Mechanize::Link object. It can be called as:

    my $res = $mech->head( $uri );
    my $res = $mech->head( $uri, $field_name => $value, ... );

$mech->reload()
Acts like the reload button in a browser: repeats the current request.
The history (as per the back() method) is not altered.

Returns the HTTP::Response object from the reload, or "undef" if
there's no current request.

$mech->back()
The equivalent of hitting the "back" button in a browser. Returns to
the previous page. Won't go back past the first page. (Really, what
would it do if it could?)

Returns true if it could go back, or false if not.

$mech->clear_history()
This deletes all the history entries and returns true.

$mech->history_count()
This returns the number of items in the browser history. This number
includes the most recently made request.

$mech->history($n)
This returns the nth item in history. The 0th item is the most recent
request and response, which would be acted on by methods like
"find_link()". The 1st item is the state you'd return to if you called
"back()".

The maximum useful value for $n is "$mech->history_count - 1".
Requests beyond that bound will return "undef".

History items are returned as hash references, in the form:

    { req => $http_request, res => $http_response }

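The following sketch shows how the history methods relate to each other. It assumes a local setup with two hypothetical HTML files fetched through file:// URLs. The counts follow the rules above: history_count includes the current page, and history(1) is where back() would go.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# Hypothetical fixture: two tiny local pages, fetched via file:// URLs.
my $dir = tempdir( CLEANUP => 1 );
my %url;
for my $name (qw(first second)) {
    my $path = File::Spec->catfile( $dir, "$name.html" );
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} "<html><head><title>$name</title></head><body></body></html>\n";
    close $fh;
    $url{$name} = URI::file->new_abs($path);
}

my $mech = WWW::Mechanize->new();
$mech->get( $url{first} );
$mech->get( $url{second} );

my $count_before = $mech->history_count;    # counts the current page too
my $previous     = $mech->history(1);       # { req => ..., res => ... }
my $went_back    = $mech->back;
my $title_after  = $mech->title;
my $count_after  = $mech->history_count;

print "before back(): $count_before item(s); history(1) is ", $previous->{req}->uri, "\n";
print "after back(): on '$title_after', $count_after item(s)\n";
```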
$mech->success()
Returns a boolean telling whether the last request was successful. If
there hasn't been an operation yet, returns false.

This is a convenience function that wraps "$mech->res->is_success".

$mech->uri()
Returns the current URI as a URI object. This object stringifies to the
URI itself.

$mech->response() / $mech->res()
Returns the current response as an HTTP::Response object.
"$mech->res()" is a synonym for "$mech->response()".

$mech->status()
Returns the HTTP status code of the response. This is a 3-digit number
like 200 for OK, 404 for not found, and so on.

$mech->ct() / $mech->content_type()
Returns the content type of the response.

$mech->base()
Returns the base URI for the current response.

$mech->forms()
When called in list context, returns a list of the forms found in the
last fetched page. In scalar context, returns a reference to an array
with those forms. The forms returned are all HTML::Form objects.

$mech->current_form()
Returns the current form as an HTML::Form object.

$mech->links()
When called in list context, returns a list of the links found in the
last fetched page. In scalar context, returns a reference to an array
with those links. Each link is a WWW::Mechanize::Link object.

$mech->is_html()
Returns true if the current content is HTML, according to the HTTP
headers, and false otherwise.

$mech->title()
Returns the contents of the "<TITLE>" tag, as parsed by
HTML::HeadParser. Returns undef if the content is not HTML.

$mech->redirects()
Convenience method to get the redirects from the most recent
HTTP::Response.

You can also use is_redirect to check whether the most recent
response was a redirect, like this:

    $mech->get($url);
    do_stuff() if $mech->res->is_redirect;

$mech->content(...)
Returns the content that the mech uses internally for the last page
fetched. Ordinarily this is the same as
"$mech->response()->decoded_content()", but it may differ for HTML
documents if "update_html" is overloaded (in which case the value
passed to the base-class implementation of same will be returned),
and/or extra named arguments are passed to "content()":

$mech->content( format => 'text' )
Returns a text-only version of the page, with all HTML markup
stripped. This feature requires HTML::TreeBuilder version 5 or higher
to be installed, or a fatal error will be thrown. This works only if
the contents are HTML.

$mech->content( base_href => [$base_href|undef] )
Returns the HTML document, modified to contain a
"<base href="$base_href">" mark-up in the header. $base_href is
"$mech->base()" if not specified. This is handy to pass the HTML to
e.g. HTML::Display. This works only if the contents are HTML.

$mech->content( raw => 1 )
Returns "$self->response()->content()", i.e. the raw contents from
the response.

$mech->content( decoded_by_headers => 1 )
Returns the content after applying all "Content-Encoding" headers, but
with no additional mangling.

$mech->content( charset => $charset )
Returns "$self->response()->decoded_content(charset => $charset)"
(see HTTP::Response for details).

To preserve backwards compatibility, additional parameters will be
ignored unless none of "raw | decoded_by_headers | charset" is
specified and the text is HTML, in which case an error will be
triggered.

A fresh instance of WWW::Mechanize will return "undef" when
"$mech->content()" is called, because no content is present before a
request has been made.

$mech->text()
Returns the text of the current HTML content. If the content isn't
HTML, $mech will die.

The text is extracted by parsing the content, and then the extracted
text is cached, so don't worry about the performance of calling this
repeatedly.

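This sketch shows a few of the "content()" variants side by side. It fetches a hypothetical local HTML page through a file:// URL, so no network is needed. The base URL 'http://example.com/' is made up for illustration. The "format => 'text'" variant is left out because it needs HTML::TreeBuilder.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# Hypothetical fixture page.
my $dir  = tempdir( CLEANUP => 1 );
my $path = File::Spec->catfile( $dir, 'page.html' );
open my $fh, '>', $path or die "Cannot write $path: $!";
print {$fh} "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>\n";
close $fh;

my $mech   = WWW::Mechanize->new();
my $before = $mech->content;    # nothing fetched yet, so undef
$mech->get( URI::file->new_abs($path) );

# raw => 1: the body exactly as the response carried it.
my $raw = $mech->content( raw => 1 );

# base_href: the same HTML with a <base href="..."> added to the head.
my $with_base = $mech->content( base_href => 'http://example.com/' );

print defined $before ? "content before first request\n" : "undef before first request\n";
print $with_base;
```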
$mech->links()
Lists all the links on the current page. Each link is a
WWW::Mechanize::Link object. In list context, returns a list of all
links. In scalar context, returns an array reference of all links.

$mech->follow_link(...)
Follows a specified link on the page. You specify the match to be
found using the same params that "find_link()" uses.

Here are some examples:

• 3rd link called "download"

    $mech->follow_link( text => 'download', n => 3 );

• first link where the URL has "download" in it, regardless of case:

    $mech->follow_link( url_regex => qr/download/i );

or

    $mech->follow_link( url_regex => qr/(?i:download)/ );

• 3rd link on the page

    $mech->follow_link( n => 3 );

• the link with the URL

    $mech->follow_link( url => '/other/page' );

or

    $mech->follow_link( url => 'http://example.com/page' );

Returns the result of the "GET" method (an HTTP::Response object) if a
link was found.

If the page has no links, or the specified link couldn't be found,
returns "undef". If "autocheck" is enabled, an exception will be thrown
instead.

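Here is a sketch of "follow_link()" resolving a relative href and failing loudly under autocheck. The two pages are hypothetical local files fetched via file:// URLs, so no network is involved.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# Hypothetical fixture: an index page with a relative link to a second page.
my $dir   = tempdir( CLEANUP => 1 );
my %pages = (
    'index.html' => '<html><head><title>Index</title></head><body>'
        . '<a href="other.html">Elsewhere</a> <a href="next.html">Next page</a>'
        . '</body></html>',
    'next.html' => '<html><head><title>Next</title></head><body>No links here.</body></html>',
);
while ( my ( $name, $html ) = each %pages ) {
    my $path = File::Spec->catfile( $dir, $name );
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $html;
    close $fh;
}

my $mech = WWW::Mechanize->new();    # autocheck is on
$mech->get( URI::file->new_abs( File::Spec->catfile( $dir, 'index.html' ) ) );

# The relative href "next.html" is resolved against the current page's base.
$mech->follow_link( text_regex => qr/^next/i );
my $title = $mech->title;
print "Now on: $title\n";

# next.html has no links, so with autocheck on this call dies.
my $followed = eval { $mech->follow_link( text => 'No such link' ); 1 };
print 'Missing link raised an error: ', ( $followed ? 'no' : 'yes' ), "\n";
```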
$mech->find_link( ... )
Finds a link in the currently fetched page. It returns a
WWW::Mechanize::Link object which describes the link. (You'll probably
be most interested in the "url()" property.) If it fails to find a
link, it returns "undef".

You can take the URL part and pass it to the "get()" method. If that's
your plan, you might as well use the "follow_link()" method directly,
since it does the "get()" for you automatically.

Note that "<FRAME SRC="...">" tags are parsed out of the HTML and
treated as links, so this method works with them.

You can select which link to find by passing in one or more of these
key/value pairs:

• "text => 'string'" and "text_regex => qr/regex/"

"text" matches the text of the link against string, which must be
an exact match. To select a link with text that is exactly
"download", use

    $mech->find_link( text => 'download' );

"text_regex" matches the text of the link against regex. To select
a link with text that has "download" anywhere in it, regardless of
case, use

    $mech->find_link( text_regex => qr/download/i );

Note that the text extracted from the page's links is trimmed.
For example, "<a> foo </a>" is stored as 'foo', and searching for
leading or trailing spaces will fail.

• "url => 'string'" and "url_regex => qr/regex/"

Matches the URL of the link against string or regex, as
appropriate. The URL may be a relative URL, like foo/bar.html,
depending on how it's coded on the page.

• "url_abs => string" and "url_abs_regex => regex"

Matches the absolute URL of the link against string or regex, as
appropriate. The URL will be an absolute URL, even if it's
relative in the page.

• "name => string" and "name_regex => regex"

Matches the name of the link against string or regex, as
appropriate.

• "rel => string" and "rel_regex => regex"

Matches the rel attribute of the link against string or regex, as
appropriate. This can be used to find stylesheets, favicons, or
links the author of the page does not want bots to follow.

• "id => string" and "id_regex => regex"

Matches the attribute 'id' of the link against string or regex, as
appropriate.

• "class => string" and "class_regex => regex"

Matches the attribute 'class' of the link against string or regex,
as appropriate.

• "tag => string" and "tag_regex => regex"

Matches the tag that the link came from against string or regex, as
appropriate. The "tag_regex" is probably most useful to check for
more than one tag, as in:

    $mech->find_link( tag_regex => qr/^(a|frame)$/ );

The tags and attributes looked at are defined below.

If "n" is not specified, it defaults to 1. Therefore, if you don't
specify any params, this method defaults to finding the first link on
the page.

Note that you can specify multiple text or URL parameters, which will
be ANDed together. For example, to find the first link with text of
"News" and with "cnn.com" in the URL, use:

    $mech->find_link( text => 'News', url_regex => qr/cnn\.com/ );

The candidate links are the WWW::Mechanize::Link objects extracted
from "$mech->content", which come from the following tags:

    <a href=...>
    <area href=...>
    <frame src=...>
    <iframe src=...>
    <link href=...>
    <meta content=...>

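Here is a sketch of several "find_link()" criteria against a hypothetical local page fetched via a file:// URL. The link targets are made up. It shows ANDed criteria, "n", trimmed link text, the difference between "url()" and "url_abs()", and the "undef" returned when nothing matches.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# Hypothetical fixture page with two "News" links and two "Download" links.
my $dir  = tempdir( CLEANUP => 1 );
my $path = File::Spec->catfile( $dir, 'links.html' );
open my $fh, '>', $path or die "Cannot write $path: $!";
print {$fh} <<'HTML';
<html><head><title>Links</title></head><body>
<a href="/news/local.html">News</a>
<a href="http://cnn.com/world">News</a>
<a href="download.zip">  Download  </a>
<a href="mirror/download.zip">Download</a>
</body></html>
HTML
close $fh;

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($path) );

# Criteria are ANDed: text must be "News" and the URL must mention cnn.com.
my $cnn = $mech->find_link( text => 'News', url_regex => qr/cnn\.com/ );

# n picks the nth match; link text is trimmed, so '  Download  ' counts too.
my $mirror = $mech->find_link( text => 'Download', n => 2 );

# url() is the href as written; url_abs() resolves it against the page base.
my $first_dl = $mech->find_link( url => 'download.zip' );

# No match: find_link returns undef rather than dying.
my $none = $mech->find_link( text => 'Missing' );

print $cnn->url,          "\n";
print $mirror->url,       "\n";
print $first_dl->url_abs, "\n";
print defined $none ? "found\n" : "no match\n";
```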
$mech->find_all_links( ... )
Returns all the links on the current page that match the criteria. The
method for specifying link criteria is the same as in "find_link()".
Each of the links returned is a WWW::Mechanize::Link object.

In list context, "find_all_links()" returns a list of the links.
Otherwise, it returns a reference to the list of links.

"find_all_links()" with no parameters returns all links in the page.

$mech->find_all_inputs( ... criteria ... )
"find_all_inputs()" returns an array of all the input controls in the
current form whose properties match all of the criteria passed in. The
controls returned are all descended from HTML::Form::Input. See
"INPUTS" in HTML::Form for details.

If no criteria are passed, all inputs will be returned.

If there is no current page, there is no form on the current page, or
there are no matching controls in the current form, then the return
will be an empty array.

You may use a regex or a literal string:

    # get all textarea controls whose names begin with "customer"
    my @customer_text_inputs = $mech->find_all_inputs(
        type       => 'textarea',
        name_regex => qr/^customer/,
    );

    # get all text or textarea controls called "customer"
    my @customer_inputs = $mech->find_all_inputs(
        type_regex => qr/^(text|textarea)$/,
        name       => 'customer',
    );

$mech->find_all_submits( ... criteria ... )
"find_all_submits()" does the same thing as "find_all_inputs()" except
that it only returns controls that are submit controls, ignoring other
types of input controls like text and checkboxes.

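Here is a sketch of "find_all_inputs()" and "find_all_submits()" run against a hypothetical local form fetched via a file:// URL. The form and its field names are invented for illustration. The form is selected explicitly with "form_name()" so that the current form is unambiguous.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# Hypothetical fixture form.
my $dir  = tempdir( CLEANUP => 1 );
my $path = File::Spec->catfile( $dir, 'order.html' );
open my $fh, '>', $path or die "Cannot write $path: $!";
print {$fh} <<'HTML';
<html><head><title>Order</title></head><body>
<form name="order" action="order.cgi" method="post">
  <input type="text" name="customer_name">
  <textarea name="customer_notes"></textarea>
  <input type="checkbox" name="gift" value="1">
  <input type="submit" name="go" value="Order">
</form>
</body></html>
HTML
close $fh;

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($path) );
$mech->form_name('order');    # make this the current form

# Literal match on type, regex match on name.
my @notes = $mech->find_all_inputs(
    type       => 'textarea',
    name_regex => qr/^customer/,
);

# Regex-only criteria: every control whose name starts with "customer_".
my @customer = $mech->find_all_inputs( name_regex => qr/^customer_/ );

# Only submit controls.
my @submits = $mech->find_all_submits();

print join( ',', map { $_->name } @notes ),    "\n";
print join( ',', map { $_->name } @customer ), "\n";
print join( ',', map { $_->name } @submits ),  "\n";
```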
$mech->images
Lists all the images on the current page. Each image is a
WWW::Mechanize::Image object. In list context, returns a list of all
images. In scalar context, returns an array reference of all images.

$mech->find_image()
Finds an image in the current page. It returns a WWW::Mechanize::Image
object which describes the image. If it fails to find an image, it
returns undef.

You can select which image to find by passing in one or more of these
key/value pairs:

• "alt => 'string'" and "alt_regex => qr/regex/"

"alt" matches the ALT attribute of the image against string, which
must be an exact match. To select an image with an ALT attribute that
is exactly "download", use

    $mech->find_image( alt => 'download' );

"alt_regex" matches the ALT attribute of the image against a
regular expression. To select an image with an ALT attribute that
has "download" anywhere in it, regardless of case, use

    $mech->find_image( alt_regex => qr/download/i );

• "url => 'string'" and "url_regex => qr/regex/"

Matches the URL of the image against string or regex, as
appropriate. The URL may be a relative URL, like foo/bar.html,
depending on how it's coded on the page.

• "url_abs => string" and "url_abs_regex => regex"

Matches the absolute URL of the image against string or regex, as
appropriate. The URL will be an absolute URL, even if it's
relative in the page.

• "tag => string" and "tag_regex => regex"

Matches the tag that the image came from against string or regex,
as appropriate. The "tag_regex" is probably most useful to check
for more than one tag, as in:

    $mech->find_image( tag_regex => qr/^(img|input)$/ );

The tags supported are "<img>" and "<input>".

• "id => string" and "id_regex => regex"

"id" matches the id attribute of the image against string, which
must be an exact match. To select an image with the exact id
"download-image", use

    $mech->find_image( id => 'download-image' );

"id_regex" matches the id attribute of the image against a regular
expression. To select the first image with an id that contains
"download" anywhere in it, use

    $mech->find_image( id_regex => qr/download/ );

• "class => string" and "class_regex => regex"

"class" matches the class attribute of the image against string,
which must be an exact match. To select an image with the exact
class "img-fluid", use

    $mech->find_image( class => 'img-fluid' );

To select an image with the class attribute "rounded float-left",
use

    $mech->find_image( class => 'rounded float-left' );

Note that the classes have to be matched as a complete string, in
the exact order they appear in the website's source code.

"class_regex" matches the class attribute of the image against a
regular expression. Use this if you want a partial class name, or
if an image has several classes but you only care about one.

To select the first image with the class "rounded", where there are
multiple images that might also have either class "float-left" or
"float-right", use

    $mech->find_image( class_regex => qr/\brounded\b/ );

Selecting an image with multiple classes where you do not care
about the order they appear in the website's source code is not
currently supported.

If "n" is not specified, it defaults to 1. Therefore, if you don't
specify any params, this method defaults to finding the first image on
the page.

Note that you can specify multiple ALT or URL parameters, which will be
ANDed together. For example, to find the first image with ALT text of
"News" and with "cnn.com" in the URL, use:

    $mech->find_image( alt => 'News', url_regex => qr/cnn\.com/ );

The candidate images are the WWW::Mechanize::Image objects extracted
from "$mech->content".

$mech->find_all_images( ... )
Returns all the images on the current page that match the criteria.
The method for specifying image criteria is the same as in
"find_image()". Each of the images returned is a WWW::Mechanize::Image
object.

In list context, "find_all_images()" returns a list of the images.
Otherwise, it returns a reference to the list of images.

"find_all_images()" with no parameters returns all images in the page.

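Here is a sketch of the image criteria described above, run against a hypothetical local page fetched via a file:// URL. The image paths, ALT texts and classes are invented. It relies on the "class" and "class_regex" support this section documents.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# Hypothetical fixture page with three images.
my $dir  = tempdir( CLEANUP => 1 );
my $path = File::Spec->catfile( $dir, 'images.html' );
open my $fh, '>', $path or die "Cannot write $path: $!";
print {$fh} <<'HTML';
<html><head><title>Images</title></head><body>
<img src="/img/logo.png" alt="Site logo" class="rounded float-left">
<img src="/img/dl.png" alt="Download now" class="icon rounded">
<img src="/img/banner.png" alt="News banner" class="img-fluid">
</body></html>
HTML
close $fh;

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($path) );

# Case-insensitive regex on the ALT attribute.
my $download = $mech->find_image( alt_regex => qr/download/i );

# Exact match on the whole class attribute.
my $banner = $mech->find_image( class => 'img-fluid' );

# \b lets "rounded" match wherever it sits in the class list.
my @rounded = $mech->find_all_images( class_regex => qr/\brounded\b/ );

my $none = $mech->find_image( alt => 'missing' );

print $download->url, "\n";
print $banner->alt,   "\n";
print scalar(@rounded), " rounded image(s)\n";
print defined $none ? "found\n" : "no match\n";
```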
These methods let you work with the forms on a page. The idea is to
choose a form that you'll later work with using the field methods
below.

$mech->forms
Lists all the forms on the current page. Each form is an HTML::Form
object. In list context, returns a list of all forms. In scalar
context, returns an array reference of all forms.

$mech->form_number($number)
Selects the numberth form on the page as the target for subsequent
calls to "field()" and "click()". Also returns the form that was
selected.

If it is found, the form is returned as an HTML::Form object and set
internally for later use with Mech's form methods such as "field()" and
"click()". When called in list context, the number of the found form
is also returned as a second value.

Emits a warning and returns undef if no form is found.

The first form is number 1, not zero.

$mech->form_action( $action )
Selects a form by action, using a regex containing $action. If there
is more than one form on the page matching that action, then the first
one is used, and a warning is generated.

If it is found, the form is returned as an HTML::Form object and set
internally for later use with Mech's form methods such as "field()" and
"click()".

Returns "undef" if no form is found.

$mech->form_name( $name )
Selects a form by name. If there is more than one form on the page
with that name, then the first one is used, and a warning is generated.

If it is found, the form is returned as an HTML::Form object and set
internally for later use with Mech's form methods such as "field()" and
"click()".

Returns undef if no form is found.

$mech->form_id( $id )
Selects a form by ID. If there is more than one form on the page with
that ID, then the first one is used, and a warning is generated.

If it is found, the form is returned as an HTML::Form object and set
internally for later use with Mech's form methods such as "field()" and
"click()".

If no form is found, it returns "undef". This will also trigger a
warning, unless "quiet" is enabled.

$mech->all_forms_with_fields( @fields )
Selects forms by passing in a list of field names they must contain.
All matching forms (perhaps none) are returned as a list of HTML::Form
objects.

$mech->form_with_fields( @fields )
Selects a form by passing in a list of field names it must contain. If
there is more than one form on the page that matches, then the
first one is used, and a warning is generated.

If it is found, the form is returned as an HTML::Form object and set
internally for later use with Mech's form methods such as "field()"
and "click()".

Returns undef and emits a warning if no form is found.

Note that this functionality requires libwww-perl 5.69 or higher.

$mech->all_forms_with( $attr1 => $value1, $attr2 => $value2, ... )
Searches for forms with arbitrary attribute/value pairs within the
<form> tag. (Currently does not work for attribute "action" due to
implementation details of HTML::Form.) When given more than one pair,
all criteria must match. Using "undef" as value means that the
attribute in question must not be present.

All matching forms (perhaps none) are returned as a list of HTML::Form
objects.

$mech->form_with( $attr1 => $value1, $attr2 => $value2, ... )
Searches for forms with arbitrary attribute/value pairs within the
<form> tag. (Currently does not work for attribute "action" due to
implementation details of HTML::Form. Use "form_action()" instead.)
When given more than one pair, all criteria must match. Using "undef"
as value means that the attribute in question must not be present.

If it is found, the form is returned as an HTML::Form object and set
internally for later use with Mech's form methods such as "field()"
and "click()".

Returns undef if no form is found.

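Here is a sketch comparing several of the form selectors on one hypothetical local page with two forms, fetched via a file:// URL. Form names, IDs and the "auth" class are invented. The selected forms are HTML::Form objects, so their attributes are read back with HTML::Form's "attr()".

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# Hypothetical fixture page with two forms.
my $dir  = tempdir( CLEANUP => 1 );
my $path = File::Spec->catfile( $dir, 'forms.html' );
open my $fh, '>', $path or die "Cannot write $path: $!";
print {$fh} <<'HTML';
<html><head><title>Forms</title></head><body>
<form name="search" id="search-form" action="search.cgi">
  <input type="text" name="query">
</form>
<form name="login" id="login-form" action="login.cgi" method="post" class="auth">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
</body></html>
HTML
close $fh;

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($path) );

# Each of these returns an HTML::Form and makes it the current form.
my $by_name   = $mech->form_name('login');
my $by_id     = $mech->form_id('search-form');
my $by_fields = $mech->form_with_fields( 'username', 'password' );
my $by_attr   = $mech->form_with( class => 'auth' );

# The all_* variant returns every matching form as a list.
my @with_query = $mech->all_forms_with_fields('query');

print $by_name->attr('id'),     "\n";
print $by_id->attr('name'),     "\n";
print $by_fields->attr('name'), "\n";
print $by_attr->attr('name'),   "\n";
print scalar(@with_query), " form(s) with a 'query' field\n";
```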
879 These methods allow you to set the values of fields in a given form.
880
881 $mech->field( $name, $value, $number )
882 $mech->field( $name, \@values, $number )
883 Given the name of a field, set its value to the value specified. This
884 applies to the current form (as set by the "form_name()" or
885 "form_number()" method or defaulting to the first form on the page).
886
887 The optional $number parameter is used to distinguish between two
888 fields with the same name. The fields are numbered from 1.
889
890 $mech->select($name, $value)
891 $mech->select($name, \@values)
892 Given the name of a "select" field, set its value to the value
893 specified. If the field is not "<select multiple>" and the $value is
894 an array, only the first value will be set. [Note: the documentation
895 previously claimed that only the last value would be set, but this was
896 incorrect.] Passing $value as a hash with an "n" key selects an item
897 by number (e.g. "{n => 3}" or "{n => [2,4]}"). The numbering starts
898 at 1. This applies to the current form.
899
900 If you have a field with "<select multiple>" and you pass a single
$value, then $value will be added to the list of selected options,
902 without clearing the others. However, if you pass an array reference,
903 then all previously selected values will be cleared.
904
905 Returns true on successfully setting the value. On failure, returns
906 false and calls "$self->warn()" with an error message.
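The sketch below shows "select()" on an ordinary select box, selecting by value and by position. It also passes an array reference to a "<select multiple>". It assumes a hypothetical order page loaded from a temporary file. Reading the multi-select state via HTML::Form's "find_input()" relies on HTML::Form modelling each option of a multi-select as its own input.

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical order form: one ordinary select and one <select multiple>.
my $page = File::Temp->new( SUFFIX => '.html' );
print {$page} <<'HTML';
<html><head><title>Order</title></head><body>
<form action="/order" method="post">
  <select name="size">
    <option value="S">Small</option>
    <option value="M">Medium</option>
    <option value="L">Large</option>
  </select>
  <select name="toppings" multiple>
    <option value="cheese">Cheese</option>
    <option value="ham">Ham</option>
    <option value="olives">Olives</option>
  </select>
</form>
</body></html>
HTML
close $page or die "Cannot close temp file: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new( $page->filename )->as_string );

$mech->select( size => 'L' ) or die 'select by value failed';
my $by_value = $mech->value('size');

# Select by 1-based position instead of by value.
$mech->select( size => { n => 2 } ) or die 'select by number failed';
my $by_number = $mech->value('size');

# An array ref on a multi-select replaces the whole selection.
$mech->select( toppings => [ 'cheese', 'olives' ] ) or die 'multi-select failed';

# Each option of the multi-select is its own input: selected ones report
# their value, unselected ones report undef.
my $form   = $mech->current_form;
my @picked = map { $form->find_input( 'toppings', 'option', $_ )->value } 1 .. 3;
```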
907
908 $mech->set_fields( $name => $value ... )
$mech->set_fields( $name => \@value_and_instance_number )
910 $mech->set_fields( $name => \$value_instance_number )
This method sets multiple fields of the current form. It takes a list
of field name and value pairs. If there is more than one field with the
same name, the first one found is set. To choose which of the duplicate
fields to set, use an anonymous array whose two elements are the field
value and the field's instance number.
916
917 # set the second $name field to 'foo'
918 $mech->set_fields( $name => [ 'foo', 2 ] );
919
920 The fields are numbered from 1.
921
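For fields that have a predefined set of values, you may also provide a
reference to an integer if you don't know the options for the field
but you know you just want (e.g.) the first one. The integer indexes
the field's list of possible values, counting from 0; negative numbers
count back from the end.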
925
926 # select the first value in the $name select box
927 $mech->set_fields( $name => \0 );
928 # select the last value in the $name select box
929 $mech->set_fields( $name => \-1 );
930
931 This applies to the current form.
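This is a minimal runnable sketch combining the three value forms accepted by "set_fields()". It assumes a hypothetical contact page, with a duplicated "phone" field and a select box, loaded from a temporary file.

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical contact form with a duplicated "phone" field and a select box.
my $page = File::Temp->new( SUFFIX => '.html' );
print {$page} <<'HTML';
<html><head><title>Contact</title></head><body>
<form action="/contact" method="post">
  <input type="text" name="name">
  <input type="text" name="phone">
  <input type="text" name="phone">
  <select name="country">
    <option value="de">Germany</option>
    <option value="fr">France</option>
    <option value="nl">Netherlands</option>
  </select>
</form>
</body></html>
HTML
close $page or die "Cannot close temp file: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new( $page->filename )->as_string );

$mech->set_fields(
    name    => 'Mungo',
    phone   => [ '555-0100', 2 ],    # set the second "phone" field
    country => \-1,                  # pick the last option, whatever its value
);

printf "%s / %s / %s\n",
    $mech->value('name'), $mech->value( 'phone', 2 ), $mech->value('country');
```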
932
933 $mech->set_visible( @criteria )
934 This method sets fields of the current form without having to know
935 their names. So if you have a login screen that wants a username and
936 password, you do not have to fetch the form and inspect the source (or
937 use the mech-dump utility, installed with WWW::Mechanize) to see what
938 the field names are; you can just say
939
940 $mech->set_visible( $username, $password );
941
942 and the first and second fields will be set accordingly. The method is
943 called set_visible because it acts only on visible fields; hidden form
944 inputs are not considered. The order of the fields is the order in
945 which they appear in the HTML source which is nearly always the order
946 anyone viewing the page would think they are in, but some creative work
947 with tables could change that; caveat user.
948
949 Each element in @criteria is either a field value or a field specifier.
950 A field value is a scalar. A field specifier allows you to specify the
type of input field you want to set and is denoted with an arrayref
containing two elements, the input type and the value. So you could specify the first radio button
953 with
954
955 $mech->set_visible( [ radio => 'KCRW' ] );
956
957 Field values and specifiers can be intermixed, hence
958
959 $mech->set_visible( 'fred', 'secret', [ option => 'Checking' ] );
960
961 would set the first two fields to "fred" and "secret", and the next
962 "OPTION" menu field to "Checking".
963
964 The possible field specifier types are: "text", "password", "hidden",
965 "textarea", "file", "image", "submit", "radio", "checkbox" and
966 "option".
967
968 "set_visible" returns the number of values set.
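The next sketch shows "set_visible()" skipping a hidden input. It assumes a hypothetical login page, loaded from a temporary file, in which a hidden token comes before the visible username and password fields.

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical login page: a hidden token precedes the visible fields.
my $page = File::Temp->new( SUFFIX => '.html' );
print {$page} <<'HTML';
<html><head><title>Login</title></head><body>
<form action="/login" method="post">
  <input type="hidden" name="token" value="abc123">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="submit" name="go" value="Log in">
</form>
</body></html>
HTML
close $page or die "Cannot close temp file: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new( $page->filename )->as_string );

# Fills the first two visible inputs in source order; the hidden token is skipped.
my $count = $mech->set_visible( 'mungo', 'lost-and-alone' );
print "set $count fields; username=", $mech->value('username'), "\n";
```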
969
970 $mech->tick( $name, $value [, $set] )
"Ticks" the first checkbox on the current form that has both the given
name and value. If the checkbox has no value attribute, pass an empty
string as the value. Dies if no checkbox with that name (and, unless
the value is an empty string, that value) exists. Passing a false value
as the optional third argument unticks the checkbox instead; omit the
third argument if you simply want to tick the box.
978
979 $mech->tick('extra', 'cheese');
980 $mech->tick('extra', 'mushrooms');
981
982 $mech->tick('no_value', ''); # <input type="checkbox" name="no_value">
983
984 $mech->untick($name, $value)
985 Causes the checkbox to be unticked. Shorthand for
986 "tick($name,$value,undef)"
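Here is a minimal sketch of "tick()" and "untick()", assuming a hypothetical pizza form with two checkboxes that share a name. The checkbox state is read back through HTML::Form's "find_input()".

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical form: two checkboxes share the name "extra".
my $page = File::Temp->new( SUFFIX => '.html' );
print {$page} <<'HTML';
<html><head><title>Pizza</title></head><body>
<form action="/pizza" method="post">
  <input type="checkbox" name="extra" value="cheese">
  <input type="checkbox" name="extra" value="mushrooms">
</form>
</body></html>
HTML
close $page or die "Cannot close temp file: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new( $page->filename )->as_string );

$mech->tick( 'extra', 'cheese' );
$mech->tick( 'extra', 'mushrooms' );
$mech->untick( 'extra', 'cheese' );    # same as tick( 'extra', 'cheese', undef )

# A ticked box reports its value; an unticked one reports undef.
my $form = $mech->current_form;
my ( $cheese, $mushrooms ) =
    map { $form->find_input( 'extra', 'checkbox', $_ )->value } 1, 2;

# tick() dies when no checkbox matches both the name and the value.
my $died = !eval { $mech->tick( 'extra', 'anchovies' ); 1 };
```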
987
988 $mech->value( $name [, $number] )
989 Given the name of a field, return its value. This applies to the
990 current form.
991
992 The optional $number parameter is used to distinguish between two
993 fields with the same name. The fields are numbered from 1.
994
995 If the field is of type file (file upload field), the value is always
996 cleared to prevent remote sites from downloading your local files. To
997 upload a file, specify its file name explicitly.
998
999 $mech->click( $button [, $x, $y] )
1000 Has the effect of clicking a button on the current form. The first
1001 argument is the name of the button to be clicked. The second and third
1002 arguments (optional) allow you to specify the (x,y) coordinates of the
1003 click.
1004
1005 If there is only one button on the form, "$mech->click()" with no
1006 arguments simply clicks that one button.
1007
1008 Returns an HTTP::Response object.
1009
1010 $mech->click_button( ... )
1011 Has the effect of clicking a button on the current form by specifying
its attributes. The arguments are a list of key/value pairs. Exactly one
of name, id, number, input or value must be specified in the keys.
1014
1015 Dies if no button is found.
1016
1017 • "name => name"
1018
1019 Clicks the button named name in the current form.
1020
1021 • "id => id"
1022
1023 Clicks the button with the id id in the current form.
1024
1025 • "number => n"
1026
1027 Clicks the nth button with type submit in the current form.
1028 Numbering starts at 1.
1029
1030 • "value => value"
1031
1032 Clicks the button with the value value in the current form.
1033
1034 • "input => $inputobject"
1035
1036 Clicks on the button referenced by $inputobject, an instance of
1037 HTML::Form::SubmitInput obtained e.g. from
1038
1039 $mech->current_form()->find_input( undef, 'submit' )
1040
1041 $inputobject must belong to the current form.
1042
1043 • "x => x"
1044
1045 • "y => y"
1046
1047 These arguments (optional) allow you to specify the (x,y)
1048 coordinates of the click.
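The following offline sketch shows "click_button()" selecting a button by name and by number. It uses a hypothetical GET form with no action attribute, saved to a temporary file, so the form submits back to its own file:// URL. A GET on a file:// URL ignores the query string and simply reloads the file. That lets the query built by the click be read back with "$mech->uri" without a web server.

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical search page; with no action, the form submits to its own URL.
my $page = File::Temp->new( SUFFIX => '.html' );
print {$page} <<'HTML';
<html><head><title>Search</title></head><body>
<form method="get">
  <input type="text" name="q" value="perl">
  <input type="submit" name="go" value="Search">
  <input type="submit" name="lucky" value="Feeling Lucky">
</form>
</body></html>
HTML
close $page or die "Cannot close temp file: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new( $page->filename )->as_string );

# Click by name: only the clicked button's name=value pair is sent.
$mech->click_button( name => 'go' );
my %by_name = $mech->uri->query_form;

# Click the second submit button on the reloaded page by position.
$mech->click_button( number => 2 );
my %by_number = $mech->uri->query_form;

print "name: go=$by_name{go}; number: lucky=$by_number{lucky}\n";
```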
1049
1050 $mech->submit()
1051 Submits the current form, without specifying a button to click.
1052 Actually, no button is clicked at all.
1053
1054 Returns an HTTP::Response object.
1055
1056 This used to be a synonym for "$mech->click( 'submit' )", but is no
1057 longer so.
1058
1059 $mech->submit_form( ... )
1060 This method lets you select a form from the previously fetched page,
1061 fill in its fields, and submit it. It combines the
1062 "form_number"/"form_name", "set_fields" and "click" methods into one
1063 higher level call. Its arguments are a list of key/value pairs, all of
1064 which are optional.
1065
1066 • "fields => \%fields"
1067
1068 Specifies the fields to be filled in the current form.
1069
1070 • "with_fields => \%fields"
1071
1072 Probably all you need for the common case. It combines a smart form
1073 selector and data setting in one operation. It selects the first
1074 form that contains all fields mentioned in "\%fields". This is
1075 nice because you don't need to know the name or number of the form
1076 to do this.
1077
1078 (calls "form_with_fields()" and
1079 "set_fields()").
1080
1081 If you choose "with_fields", the "fields" option will be ignored.
1082 The "form_number", "form_name" and "form_id" options will still be
1083 used. An exception will be thrown unless exactly one form matches
1084 all of the provided criteria.
1085
1086 • "form_number => n"
1087
Selects the nth form (calls "form_number()"). If this param is not
specified, the currently-selected form is used.
1090
1091 • "form_name => name"
1092
Selects the form named name (calls "form_name()").

• "form_id => ID"

Selects the form with ID ID (calls "form_id()").

• "button => button"

Clicks on button button (calls "click()").

• "x => x, y => y"

Sets the x and y coordinates passed to "click()".
1106
1107 • "strict_forms => bool"
1108
1109 Sets the HTML::Form strict flag which causes form submission to
1110 croak if any of the passed fields don't exist on the page, and/or a
1111 value doesn't exist in a select element. By default HTML::Form
1112 sets this value to false.
1113
1114 This behavior can also be turned on globally by passing
1115 "strict_forms => 1" to "WWW::Mechanize->new". If you do that, you
1116 can still disable it for individual calls by passing "strict_forms
1117 => 0" here.
1118
1119 If no form is selected, the first form found is used.
1120
1121 If button is not passed, then the "submit()" method is used instead.
1122
1123 If you want to submit a file and get its content from a scalar rather
1124 than a file in the filesystem, you can use:
1125
1126 $mech->submit_form(with_fields => { logfile => [ [ undef, 'whatever', Content => $content ], 1 ] } );
1127
1128 Returns an HTTP::Response object.
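Below is a minimal sketch of the "with_fields" shortcut. It assumes a hypothetical page with two GET forms and uses the same file:// trick as before. Neither form has an action, so submitting reloads the page, and the submitted fields can be read from "$mech->uri".

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical page with two forms; only "filter" has a "category" field.
my $page = File::Temp->new( SUFFIX => '.html' );
print {$page} <<'HTML';
<html><head><title>Shop</title></head><body>
<form name="search" method="get">
  <input type="text" name="q">
</form>
<form name="filter" method="get">
  <input type="text" name="category">
  <input type="text" name="max_price">
</form>
</body></html>
HTML
close $page or die "Cannot close temp file: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new( $page->filename )->as_string );

# Finds the single form containing both fields, fills them in and submits
# it. No button is given, so submit() is used.
$mech->submit_form(
    with_fields => { category => 'books', max_price => 20 },
);

my %sent = $mech->uri->query_form;
print join( ', ', map {"$_=$sent{$_}"} sort keys %sent ), "\n";
```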
1129
1131 $mech->add_header( name => $value [, name => $value... ] )
1132 Sets HTTP headers for the agent to add or remove from the HTTP request.
1133
1134 $mech->add_header( Encoding => 'text/klingon' );
1135
1136 If a value is "undef", then that header will be removed from any future
1137 requests. For example, to never send a Referer header:
1138
1139 $mech->add_header( Referer => undef );
1140
If you want to stop adding (or suppressing) a header and restore the default behavior, use "delete_header".
1142
1143 Returns the number of name/value pairs added.
1144
1145 NOTE: This method was very different in WWW::Mechanize before 1.00.
1146 Back then, the headers were stored in a package hash, not as a member
1147 of the object instance. Calling "add_header()" would modify the
1148 headers for every WWW::Mechanize object, even after your object no
1149 longer existed.
1150
1151 $mech->delete_header( name [, name ... ] )
1152 Removes HTTP headers from the agent's list of special headers. For
1153 instance, you might need to do something like:
1154
1155 # Don't send a Referer for this URL
1156 $mech->add_header( Referer => undef );
1157
1158 # Get the URL
1159 $mech->get( $url );
1160
1161 # Back to the default behavior
1162 $mech->delete_header( 'Referer' );
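The next sketch checks which headers were actually sent by inspecting "$mech->response->request". It assumes a local page fetched through a file:// URL. The header name "X-Example-Trace" is made up for illustration.

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

my $page = File::Temp->new( SUFFIX => '.html' );
print {$page} "<html><head><title>Hi</title></head><body>Hello</body></html>\n";
close $page or die "Cannot close temp file: $!";
my $url = URI::file->new( $page->filename )->as_string;

my $mech = WWW::Mechanize->new();

# "X-Example-Trace" is a hypothetical header, sent until it is deleted.
my $added = $mech->add_header( 'X-Example-Trace' => 'abc123' );
$mech->get($url);
my $sent = $mech->response->request->header('X-Example-Trace');

# After delete_header() the agent stops adding it.
$mech->delete_header('X-Example-Trace');
$mech->get($url);
my $sent_after = $mech->response->request->header('X-Example-Trace');
```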
1163
1164 $mech->quiet(true/false)
1165 Allows you to suppress warnings to the screen.
1166
1167 $mech->quiet(0); # turns on warnings (the default)
1168 $mech->quiet(1); # turns off warnings
1169 $mech->quiet(); # returns the current quietness status
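This small sketch shows the effect of "quiet()" on Mech's own "warn()" method. With the default "onwarn" handler, warnings go through Perl's normal warning mechanism, so a "$SIG{__WARN__}" hook is used here to collect them instead of printing them.

```perl
use strict;
use warnings;
use WWW::Mechanize ();

# Collect warnings instead of letting them reach STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my $mech = WWW::Mechanize->new();

$mech->warn('first message');     # reported (the default is not quiet)
$mech->quiet(1);
$mech->warn('second message');    # suppressed
my $is_quiet = $mech->quiet();
```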
1170
1171 $mech->stack_depth( $max_depth )
1172 Get or set the page stack depth. Use this if you're doing a lot of page
1173 scraping and running out of memory.
1174
1175 A value of 0 means "no history at all." By default, the max stack
1176 depth is humongously large, effectively keeping all history.
1177
1178 $mech->save_content( $filename, %opts )
1179 Dumps the contents of "$mech->content" into $filename. $filename will
1180 be overwritten. Dies if there are any errors.
1181
1182 If the content type does not begin with "text/", then the content is
1183 saved in binary mode (i.e. "binmode()" is set on the output
1184 filehandle).
1185
1186 Additional arguments can be passed as key/value pairs:
1187
1188 $mech->save_content( $filename, binary => 1 )
1189 Filehandle is set with "binmode" to ":raw" and contents are taken
by calling "$self->content(decoded_by_headers => 1)". Same as calling:
1191
1192 $mech->save_content( $filename, binmode => ':raw',
1193 decoded_by_headers => 1 );
1194
1195 This should be the safest way to save contents verbatim.
1196
1197 $mech->save_content( $filename, binmode => $binmode )
1198 Filehandle is set to binary mode. If $binmode begins with ':', it
1199 is passed as a parameter to "binmode":
1200
1201 binmode $fh, $binmode;
1202
1203 otherwise the filehandle is set to binary mode if $binmode is true:
1204
1205 binmode $fh;
1206
1207 all other arguments
1208 are passed as-is to "$mech->content(%opts)". In particular,
"decoded_by_headers" might come in handy if you want to undo any
content encoding (e.g. gzip compression) applied by the web server without
further interpreting the contents (e.g. decoding it according to
1212 the charset).
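Here is a minimal round-trip sketch of "save_content( ..., binary => 1 )". It assumes a small local page served via a file:// URL and a temporary output directory. It saves the page and reads it back as raw bytes to confirm that the copy is verbatim.

```perl
use strict;
use warnings;
use File::Spec ();
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

my $html = "<html><head><title>Saved</title></head><body>Hello</body></html>\n";
my $page = File::Temp->new( SUFFIX => '.html' );
print {$page} $html;
close $page or die "Cannot close temp file: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new( $page->filename )->as_string );

# Save the response body verbatim (binmode ':raw', no charset decoding).
my $dir    = File::Temp->newdir();
my $target = File::Spec->catfile( $dir->dirname, 'copy.html' );
$mech->save_content( $target, binary => 1 );

# Read the copy back as raw bytes to compare it with what was served.
open my $in, '<:raw', $target or die "Cannot read $target: $!";
my $saved = do { local $/; <$in> };
close $in;
```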
1213
1214 $mech->dump_headers( [$fh] )
1215 Prints a dump of the HTTP response headers for the most recent
1216 response. If $fh is not specified or is "undef", it dumps to STDOUT.
1217
1218 Unlike the rest of the "dump_*" methods, $fh can be a scalar. It will
1219 be used as a file name.
1220
1221 $mech->dump_links( [[$fh], $absolute] )
1222 Prints a dump of the links on the current page to $fh. If $fh is not
1223 specified or is "undef", it dumps to STDOUT.
1224
1225 If $absolute is true, links displayed are absolute, not relative.
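The sketch below sends "dump_links()" output to an in-memory filehandle instead of STDOUT, once with relative and once with absolute URLs. It assumes a hypothetical page with two links, loaded from a temporary file. Absolute URLs are therefore resolved against a file:// base.

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

my $page = File::Temp->new( SUFFIX => '.html' );
print {$page} <<'HTML';
<html><head><title>Links</title></head><body>
<a href="/about">About</a>
<a href="help.html">Help</a>
</body></html>
HTML
close $page or die "Cannot close temp file: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new( $page->filename )->as_string );

# Dump to an in-memory filehandle instead of STDOUT.
open my $fh, '>', \my $relative or die "Cannot open in-memory handle: $!";
$mech->dump_links($fh);
close $fh;

# Second argument true: print absolute URLs (resolved against the page URL).
open my $abs_fh, '>', \my $absolute or die "Cannot open in-memory handle: $!";
$mech->dump_links( $abs_fh, 1 );
close $abs_fh;

print $relative, $absolute;
```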
1226
1227 $mech->dump_images( [[$fh], $absolute] )
1228 Prints a dump of the images on the current page to $fh. If $fh is not
1229 specified or is "undef", it dumps to STDOUT.
1230
If $absolute is true, image URLs displayed are absolute, not relative.
1232
1233 The output will include empty lines for images that have no "src"
1234 attribute and therefore no URL.
1235
1236 $mech->dump_forms( [$fh] )
1237 Prints a dump of the forms on the current page to $fh. If $fh is not
1238 specified or is "undef", it dumps to STDOUT. Running the following:
1239
1240 my $mech = WWW::Mechanize->new();
1241 $mech->get("https://www.google.com/");
1242 $mech->dump_forms;
1243
1244 will print:
1245
1246 GET https://www.google.com/search [f]
1247 ie=ISO-8859-1 (hidden readonly)
1248 hl=en (hidden readonly)
1249 source=hp (hidden readonly)
1250 biw= (hidden readonly)
1251 bih= (hidden readonly)
1252 q= (text)
1253 btnG=Google Search (submit)
1254 btnI=I'm Feeling Lucky (submit)
1255 gbv=1 (hidden readonly)
1256
1257 $mech->dump_text( [$fh] )
1258 Prints a dump of the text on the current page to $fh. If $fh is not
1259 specified or is "undef", it dumps to STDOUT.
1260
1262 $mech->clone()
1263 Clone the mech object. The clone will be using the same cookie jar as
1264 the original mech.
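This short sketch checks that a clone shares its cookie jar with the original agent. The agent string is a hypothetical example.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);
use WWW::Mechanize ();

my $mech  = WWW::Mechanize->new( agent => 'ExampleBot/0.1' );    # hypothetical agent
my $clone = $mech->clone();

# Same cookie jar object, so cookies set through one agent are seen by the other.
my $shared = refaddr( $clone->cookie_jar ) == refaddr( $mech->cookie_jar );
print $shared ? "cookie jar shared\n" : "separate cookie jars\n";
```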
1265
1266 $mech->redirect_ok()
An overridden version of "redirect_ok()" in LWP::UserAgent. This
1268 method is used to determine whether a redirection in the request should
1269 be followed.
1270
1271 Note that WWW::Mechanize's constructor pushes POST on to the agent's
1272 "requests_redirectable" list.
1273
1274 $mech->request( $request [, $arg [, $size]])
Overridden version of "request()" in LWP::UserAgent. Performs the
1276 actual request. Normally, if you're using WWW::Mechanize, it's because
1277 you don't want to deal with this level of stuff anyway.
1278
1279 Note that $request will be modified.
1280
1281 Returns an HTTP::Response object.
1282
1283 $mech->update_html( $html )
1284 Allows you to replace the HTML that the mech has found. Updates the
1285 forms and links parse-trees that the mech uses internally.
1286
1287 Say you have a page that you know has malformed output, and you want to
1288 update it so the links come out correctly:
1289
1290 my $html = $mech->content;
1291 $html =~ s[</option>.{0,3}</td>][</option></select></td>]isg;
1292 $mech->update_html( $html );
1293
1294 This method is also used internally by the mech itself to update its
1295 own HTML content when loading a page. This means that if you would like
1296 to systematically perform the above HTML substitution, you would
override "update_html" in a subclass thusly:
1298
1299 package MyMech;
1300 use base 'WWW::Mechanize';
1301
1302 sub update_html {
1303 my ($self, $html) = @_;
1304 $html =~ s[</option>.{0,3}</td>][</option></select></td>]isg;
1305 $self->WWW::Mechanize::update_html( $html );
1306 }
1307
1308 If you do this, then the mech will use the tidied-up HTML instead of
1309 the original both when parsing for its own needs, and for returning to
1310 you through "content()".
1311
Overriding this method is also the recommended way of implementing
extra validation steps (e.g. link checkers) for every HTML page
received. "warn()" and "die()" would then come in handy to signal
1315 validation errors.
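The following is a minimal sketch of such a validation hook. The subclass name "My::CheckedMech" and the "missing <title>" rule are hypothetical and serve only as illustration. Problems are reported through "warn()", and an "onwarn" handler collects them.

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical subclass that runs an extra check on every HTML page it loads.
package My::CheckedMech;
use parent -norequire, 'WWW::Mechanize';

sub update_html {
    my ( $self, $html ) = @_;

    # Illustrative check only: report pages that lack a <title> element.
    $self->warn('Page has no <title> element') unless $html =~ /<title\b/i;

    return $self->SUPER::update_html($html);
}

package main;

my $page = File::Temp->new( SUFFIX => '.html' );
print {$page} "<html><body><p>No title here</p></body></html>\n";
close $page or die "Cannot close temp file: $!";

# Collect warnings through the onwarn hook instead of printing them.
my @problems;
my $mech = My::CheckedMech->new( onwarn => sub { push @problems, join '', @_ } );
$mech->get( URI::file->new( $page->filename )->as_string );

print "validation problems: @problems\n";
```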
1316
1317 $mech->credentials( $username, $password )
1318 Provide credentials to be used for HTTP Basic authentication for all
1319 sites and realms until further notice.
1320
1321 The four argument form described in LWP::UserAgent is still supported.
1322
1323 $mech->get_basic_credentials( $realm, $uri, $isproxy )
1324 Returns the credentials for the realm and URI.
1325
1326 $mech->clear_credentials()
1327 Remove any credentials set up with "credentials()".
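Here is a minimal sketch of the two-argument "credentials()" form. The account and realm are hypothetical. It calls "get_basic_credentials()" directly, as LWP would when a server answers with 401.

```perl
use strict;
use warnings;
use URI ();
use WWW::Mechanize ();

my $mech = WWW::Mechanize->new();

# Hypothetical account: used for HTTP Basic auth on every site and realm.
$mech->credentials( 'mungo', 'lost-and-alone' );

# LWP calls this when a server answers 401; Mech hands back the pair above.
my @creds = $mech->get_basic_credentials(
    'Members Only', URI->new('http://example.com/'), 0 );
print "would send: @creds\n";

# Forget the pair again.
$mech->clear_credentials();
```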
1328
1330 As a subclass of LWP::UserAgent, WWW::Mechanize inherits all of
LWP::UserAgent's methods, many of which are overridden or extended.
The following methods are inherited unchanged. See the LWP::UserAgent
documentation for their descriptions.

This is not meant to be an exhaustive list. LWP::UserAgent may have
added others.
1337
1338 $mech->head()
1339 Inherited from LWP::UserAgent.
1340
1341 $mech->mirror()
1342 Inherited from LWP::UserAgent.
1343
1344 $mech->simple_request()
1345 Inherited from LWP::UserAgent.
1346
1347 $mech->is_protocol_supported()
1348 Inherited from LWP::UserAgent.
1349
1350 $mech->prepare_request()
1351 Inherited from LWP::UserAgent.
1352
1353 $mech->progress()
1354 Inherited from LWP::UserAgent.
1355
1357 These methods are only used internally. You probably don't need to
1358 know about them.
1359
1360 $mech->_update_page($request, $response)
Updates all internal variables in $mech as if $request was just
performed, and returns $response. The page stack is not altered by this
method; it is up to the caller (e.g. "request") to do that.
1364
1365 $mech->_modify_request( $req )
Modifies an HTTP::Request before the request is sent out, for both GET
and POST requests.

We add a "Referer" header, as well as an "Accept-Encoding" header noting
that we can accept gzip-encoded content if Compress::Zlib is installed.
1371
1372 $mech->_make_request()
1373 Convenience method to make it easier for subclasses like
1374 WWW::Mechanize::Cached to intercept the request.
1375
1376 $mech->_reset_page()
Resets the internal fields that track the parsed contents of the current page.
1378
1379 $mech->_extract_links()
1380 Extracts links from the content of a webpage, and populates the
1381 "{links}" property with WWW::Mechanize::Link objects.
1382
1383 $mech->_push_page_stack()
1384 The agent keeps a stack of visited pages, which it can pop when it
1385 needs to go BACK and so on.
1386
1387 The current page needs to be pushed onto the stack before we get a new
1388 page, and the stack needs to be popped when BACK occurs.
1389
Neither of these takes any arguments; they just operate on the $mech
1391 object.
1392
1393 warn( @messages )
1394 Centralized warning method, for diagnostics and non-fatal problems.
1395 Defaults to calling "CORE::warn", but may be overridden by setting
1396 "onwarn" in the constructor.
1397
1398 die( @messages )
1399 Centralized error method. Defaults to calling "CORE::die", but may be
1400 overridden by setting "onerror" in the constructor.
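This is a minimal sketch of replacing both handlers through the constructor's "onwarn" and "onerror" options. The handler bodies and messages are made up for illustration.

```perl
use strict;
use warnings;
use WWW::Mechanize ();

my @warnings;
my $mech = WWW::Mechanize->new(
    onwarn  => sub { push @warnings, join '', @_ },               # collect, don't print
    onerror => sub { die 'mech error: ', join( '', @_ ), "\n" },  # custom fatal error
);

$mech->warn('something odd happened');

my $ok    = eval { $mech->die('something broke'); 1 };
my $error = $@;
```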
1401
1403 The default settings can get you up and running quickly, but there are
1404 settings you can change in order to make your life easier.
1405
1406 autocheck
1407 "autocheck" can save you the overhead of checking status codes for
1408 success. You may outgrow it as your needs get more sophisticated,
1409 but it's a safe option to start with.
1410
1411 my $agent = WWW::Mechanize->new( autocheck => 1 );
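The sketch below contrasts the two settings on a request that fails. A file:// URL pointing at a missing file is assumed, so the failure (a 404 from LWP's file handler) happens without any network access.

```perl
use strict;
use warnings;
use File::Spec ();
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# A file:// URL that does not exist, so the request fails offline.
my $dir     = File::Temp->newdir();
my $missing = URI::file->new( File::Spec->catfile( $dir->dirname, 'missing.html' ) )->as_string;

# With autocheck on, a failed request dies, so errors cannot go unnoticed.
my $checked = WWW::Mechanize->new( autocheck => 1 );
my $died    = !eval { $checked->get($missing); 1 };

# With autocheck off, you must inspect the result yourself.
my $unchecked = WWW::Mechanize->new( autocheck => 0 );
$unchecked->get($missing);
my $success = $unchecked->success;
my $status  = $unchecked->status;
```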
1412
1413 cookie_jar
1414 You are encouraged to install Mozilla::PublicSuffix and use
1415 HTTP::CookieJar::LWP as your cookie jar. HTTP::CookieJar::LWP
1416 provides a better security model matching that of current Web
1417 browsers when Mozilla::PublicSuffix is installed.
1418
1419 use HTTP::CookieJar::LWP ();
1420
1421 my $jar = HTTP::CookieJar::LWP->new;
1422 my $agent = WWW::Mechanize->new( cookie_jar => $jar );
1423
1424 protocols_allowed
This option is inherited directly from LWP::UserAgent. It may be
used to restrict the agent to an explicit list of allowed protocols.
1427
1428 my $agent = WWW::Mechanize->new(
1429 protocols_allowed => [ 'http', 'https' ]
1430 );
1431
1432 This will prevent you from inadvertently following URLs like
"file:///etc/passwd".
1434
1435 protocols_forbidden
1436 This option is also inherited directly from LWP::UserAgent. It may
be used to deny specific protocols.
1438
1439 my $agent = WWW::Mechanize->new(
1440 protocols_forbidden => [ 'file', 'mailto', 'ssh', ]
1441 );
1442
1443 This will prevent you from inadvertently following URLs like
"file:///etc/passwd".
1445
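Here is a minimal sketch of how the two options above affect "is_protocol_supported()", which is inherited unchanged from LWP::UserAgent. The lists mirror the examples above. Only LWP's built-in "http" and "file" handlers are queried.

```perl
use strict;
use warnings;
use WWW::Mechanize ();

my $web_only = WWW::Mechanize->new( protocols_allowed   => [ 'http', 'https' ] );
my $no_file  = WWW::Mechanize->new( protocols_forbidden => [ 'file', 'mailto', 'ssh' ] );

# is_protocol_supported() honours both the allowed and the forbidden list.
my %report = (
    web_only_file => $web_only->is_protocol_supported('file'),
    web_only_http => $web_only->is_protocol_supported('http'),
    no_file_file  => $no_file->is_protocol_supported('file'),
    no_file_http  => $no_file->is_protocol_supported('http'),
);
print map {"$_: $report{$_}\n"} sort keys %report;
```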
1446 strict_forms
1447 Consider turning on the "strict_forms" option when you create a new
1448 Mech. This will perform a helpful sanity check on form fields
1449 every time you are submitting a form, which can save you a lot of
1450 debugging time.
1451
1452 my $agent = WWW::Mechanize->new( strict_forms => 1 );
1453
1454 If you do not want to have this option globally, you can still turn
1455 it on for individual forms.
1456
1457 $agent->submit_form( fields => { foo => 'bar' } , strict_forms => 1 );
1458
1460 WWW::Mechanize is hosted at GitHub.
1461
1462 Repository: <https://github.com/libwww-perl/WWW-Mechanize>. Bugs:
1463 <https://github.com/libwww-perl/WWW-Mechanize/issues>.
1464
1466 Spidering Hacks, by Kevin Hemenway and Tara Calishain
1467 Spidering Hacks from O'Reilly
1468 (<http://www.oreilly.com/catalog/spiderhks/>) is a great book for
1469 anyone wanting to know more about screen-scraping and spidering.
1470
1471 There are six hacks that use Mech or a Mech derivative:
1472
1473 #21 WWW::Mechanize 101
1474 #22 Scraping with WWW::Mechanize
1475 #36 Downloading Images from Webshots
1476 #44 Archiving Yahoo! Groups Messages with WWW::Yahoo::Groups
1477 #64 Super Author Searching
1478 #73 Scraping TV Listings
1479
1480 The book was also positively reviewed on Slashdot:
1481 <http://books.slashdot.org/article.pl?sid=03/12/11/2126256>
1482
1484 • WWW::Mechanize mailing list
1485
1486 The Mech mailing list is at
1487 <http://groups.google.com/group/www-mechanize-users> and is
1488 specific to Mechanize, unlike the LWP mailing list below. Although
1489 it is a users list, all development discussion takes place here,
1490 too.
1491
1492 • LWP mailing list
1493
1494 The LWP mailing list is at
1495 <http://lists.perl.org/showlist.cgi?name=libwww>, and is more user-
1496 oriented and well-populated than the WWW::Mechanize list.
1497
1498 • Perlmonks
1499
1500 <http://perlmonks.org> is an excellent community of support, and
1501 many questions about Mech have already been answered there.
1502
1503 • WWW::Mechanize::Examples
1504
1505 A random array of examples submitted by users, included with the
1506 Mechanize distribution.
1507
1509 • <http://www.ibm.com/developerworks/linux/library/wa-perlsecure/>
1510
1511 IBM article "Secure Web site access with Perl"
1512
1513 • <http://www.oreilly.com/catalog/googlehks2/chapter/hack84.pdf>
1514
1515 Leland Johnson's hack #84 in Google Hacks, 2nd Edition is an
1516 example of a production script that uses WWW::Mechanize and
1517 HTML::TableContentParser. It takes in keywords and returns the
1518 estimated price of these keywords on Google's AdWords program.
1519
1520 • <http://www.perl.com/pub/a/2004/06/04/recorder.html>
1521
1522 Linda Julien writes about using HTTP::Recorder to create
1523 WWW::Mechanize scripts.
1524
1525 • <http://www.developer.com/lang/other/article.php/3454041>
1526
1527 Jason Gilmore's article on using WWW::Mechanize for scraping sales
1528 information from Amazon and eBay.
1529
1530 • <http://www.perl.com/pub/a/2003/01/22/mechanize.html>
1531
1532 Chris Ball's article about using WWW::Mechanize for scraping TV
1533 listings.
1534
1535 • <http://www.stonehenge.com/merlyn/LinuxMag/col47.html>
1536
1537 Randal Schwartz's article on scraping Yahoo News for images. It's
1538 already out of date: He manually walks the list of links hunting
1539 for matches, which wouldn't have been necessary if the
1540 "find_link()" method existed at press time.
1541
1542 • <http://www.perladvent.org/2002/16th/>
1543
1544 WWW::Mechanize on the Perl Advent Calendar, by Mark Fowler.
1545
1546 • <http://www.linux-magazin.de/ausgaben/2004/03/datenruessel/>
1547
1548 Michael Schilli's article on Mech and WWW::Mechanize::Shell for the
1549 German magazine Linux Magazin.
1550
1551 Other modules that use Mechanize
1552 Here are modules that use or subclass Mechanize. Let me know of any
1553 others:
1554
1555 • Finance::Bank::LloydsTSB
1556
1557 • HTTP::Recorder
1558
1559 Acts as a proxy for web interaction, and then generates
1560 WWW::Mechanize scripts.
1561
1562 • Win32::IE::Mechanize
1563
1564 Just like Mech, but using Microsoft Internet Explorer to do the
1565 work.
1566
1567 • WWW::Bugzilla
1568
1569 • WWW::Google::Groups
1570
1571 • WWW::Hotmail
1572
1573 • WWW::Mechanize::Cached
1574
1575 • WWW::Mechanize::Cached::GZip
1576
1577 • WWW::Mechanize::FormFiller
1578
1579 • WWW::Mechanize::Shell
1580
1581 • WWW::Mechanize::Sleepy
1582
1583 • WWW::Mechanize::SpamCop
1584
1585 • WWW::Mechanize::Timed
1586
1587 • WWW::SourceForge
1588
1589 • WWW::Yahoo::Groups
1590
1591 • WWW::Scripter
1592
1594 Thanks to the numerous people who have helped out on WWW::Mechanize in
1595 one way or another, including Kirrily Robert for the original
1596 "WWW::Automate", Lyle Hopkins, Damien Clark, Ansgar Burchardt, Gisle
1597 Aas, Jeremy Ary, Hilary Holz, Rafael Kitover, Norbert Buchmuller, Dave
1598 Page, David Sainty, H.Merijn Brand, Matt Lawrence, Michael Schwern,
Adriano Ferreira, Miyagawa, Peteris Krumins, David
1600 Steinbrunner, Kevin Falcone, Mike O'Regan, Mark Stosberg, Uri Guttman,
1601 Peter Scott, Philippe Bruhat, Ian Langworth, John Beppu, Gavin Estey,
1602 Jim Brandt, Ask Bjoern Hansen, Greg Davies, Ed Silva, Mark-Jason
1603 Dominus, Autrijus Tang, Mark Fowler, Stuart Children, Max Maischein,
1604 Meng Wong, Prakash Kailasa, Abigail, Jan Pazdziora, Dominique
1605 Quatravaux, Scott Lanning, Rob Casey, Leland Johnson, Joshua Gatcomb,
1606 Julien Beasley, Abe Timmerman, Peter Stevens, Pete Krawczyk, Tad
1607 McClellan, and the late great Iain Truskett.
1608
1610 Andy Lester <andy at petdance.com>
1611
1613 This software is copyright (c) 2004 by Andy Lester.
1614
1615 This is free software; you can redistribute it and/or modify it under
1616 the same terms as the Perl 5 programming language system itself.
1617
1618
1619
1620perl v5.36.0 2022-07-22 WWW::Mechanize(3)