1WWW::Mechanize(3) User Contributed Perl Documentation WWW::Mechanize(3)
2
3
4
6 WWW::Mechanize - Handy web browsing in a Perl object
7
9 version 2.03
10
12 WWW::Mechanize supports performing a sequence of page fetches including
13 following links and submitting forms. Each fetched page is parsed and
14 its links and forms are extracted. A link or a form can be selected,
15 form fields can be filled and the next page can be fetched. Mech also
16 stores a history of the URLs you've visited, which can be queried and
17 revisited.
18
19 use WWW::Mechanize ();
20 my $mech = WWW::Mechanize->new();
21
22 $mech->get( $url );
23
24 $mech->follow_link( n => 3 );
25 $mech->follow_link( text_regex => qr/download this/i );
26 $mech->follow_link( url => 'http://host.com/index.html' );
27
28 $mech->submit_form(
29 form_number => 3,
30 fields => {
31 username => 'mungo',
32 password => 'lost-and-alone',
33 }
34 );
35
36 $mech->submit_form(
37 form_name => 'search',
38 fields => { query => 'pot of gold', },
39 button => 'Search Now'
40 );
41
42 # Enable strict form processing to catch typos and nonexistent form fields.
43 my $strict_mech = WWW::Mechanize->new( strict_forms => 1);
44
45 $strict_mech->get( $url );
46
47 # This method call will die, saving you lots of time looking for the bug.
48 $strict_mech->submit_form(
49 form_number => 3,
50 fields => {
51 usernaem => 'mungo', # typo in field name
52 password => 'lost-and-alone',
53 extra_field => 123, # field does not exist
54 }
55 );
56
58 "WWW::Mechanize", or Mech for short, is a Perl module for stateful
59 programmatic web browsing, used for automating interaction with
60 websites.
61
62 Features include:
63
64 • All HTTP methods
65
66 • High-level hyperlink and HTML form support, without having to parse
67 HTML yourself
68
69 • SSL support
70
71 • Automatic cookies
72
73 • Custom HTTP headers
74
75 • Automatic handling of redirections
76
77 • Proxies
78
79 • HTTP authentication
80
81 Mech is well suited for use in testing web applications. If you use
82 one of the Test::* modules, like Test::HTML::Lint, you can check the
83 fetched content and use that as input to a test call.
84
85 use Test::More;
86 like( $mech->content(), qr/$expected/, "Got expected content" );
87
88 Each page fetch stores its URL in a history stack which you can
89 traverse.
90
91 $mech->back();
92
93 If you want finer control over your page fetching, you can use these
94 methods. "follow_link" and "submit_form" are just high level wrappers
95 around them.
96
97 $mech->find_link( n => $number );
98 $mech->form_number( $number );
99 $mech->form_name( $name );
100 $mech->field( $name, $value );
101 $mech->set_fields( %field_values );
102 $mech->set_visible( @criteria );
103 $mech->click( $button );
104
105 WWW::Mechanize is a proper subclass of LWP::UserAgent and you can also
106 use any of LWP::UserAgent's methods.
107
108 $mech->add_header($name => $value);
109
110 Please note that Mech does NOT support JavaScript; you need additional
111 software for that. Please check "JavaScript" in WWW::Mechanize::FAQ for
112 more.
113
115 • <https://github.com/libwww-perl/WWW-Mechanize/issues>
116
117 The queue for bugs & enhancements in WWW::Mechanize. Please note
118 that the queue at <http://rt.cpan.org> is no longer maintained.
119
120 • <https://metacpan.org/pod/WWW::Mechanize>
121
122 The CPAN documentation page for Mechanize.
123
124 • <https://metacpan.org/pod/distribution/WWW-Mechanize/lib/WWW/Mechanize/FAQ.pod>
125
126 Frequently asked questions. Make sure you read here FIRST.
127
129 new()
130 Creates and returns a new WWW::Mechanize object, hereafter referred to
131 as the "agent".
132
133 my $mech = WWW::Mechanize->new()
134
135 The constructor for WWW::Mechanize overrides two of the params to the
136 LWP::UserAgent constructor:
137
138 agent => 'WWW-Mechanize/#.##'
139 cookie_jar => {} # an empty, memory-only HTTP::Cookies object
140
141 You can override these overrides by passing params to the constructor,
142 as in:
143
144 my $mech = WWW::Mechanize->new( agent => 'wonderbot 1.01' );
145
146 If you want none of the overhead of a cookie jar, or don't want your
147 bot accepting cookies, you have to explicitly disallow it, like so:
148
149 my $mech = WWW::Mechanize->new( cookie_jar => undef );
150
151 Here are the params that WWW::Mechanize recognizes. These do not
152 include params that LWP::UserAgent recognizes.
153
154 • "autocheck => [0|1]"
155
156 Checks each request made to see if it was successful. This saves
157 you the trouble of manually checking yourself. Any errors found
158 are errors, not warnings.
159
160 The default value is ON, unless it's being subclassed, in which
161 case it is OFF. This means that standalone WWW::Mechanize
162 instances have autocheck turned on, which is protective for the
163 vast majority of Mech users who don't bother checking the return
164 value of get() and post() and can't figure out why their code fails.
165 However, if WWW::Mechanize is subclassed, such as for
166 Test::WWW::Mechanize or Test::WWW::Mechanize::Catalyst, this may
167 not be an appropriate default, so it's off.
168
169 • "noproxy => [0|1]"
170
171 Turn off the automatic call to the LWP::UserAgent "env_proxy"
172 function.
173
174 This needs to be explicitly turned off if you're using
175 Crypt::SSLeay to access a https site via a proxy server. Note: you
176 still need to set your HTTPS_PROXY environment variable as
177 appropriate.
178
179 • "onwarn => \&func"
180
181 Reference to a "warn"-compatible function, such as "Carp::carp",
182 that is called when a warning needs to be shown.
183
184 If this is set to "undef", no warnings will ever be shown.
185 However, it's probably better to use the "quiet" method to control
186 that behavior.
187
188 If this value is not passed, Mech uses "Carp::carp" if Carp is
189 installed, or "CORE::warn" if not.
190
191 • "onerror => \&func"
192
193 Reference to a "die"-compatible function, such as "Carp::croak",
194 that is called when there's a fatal error.
195
196 If this is set to "undef", no errors will ever be shown.
197
198 If this value is not passed, Mech uses "Carp::croak" if Carp is
199 installed, or "CORE::die" if not.
200
201 • "quiet => [0|1]"
202
203 Don't complain on warnings. Setting "quiet => 1" is the same as
204 calling "$mech->quiet(1)". Default is off.
205
206 • "stack_depth => $value"
207
208 Sets the depth of the page stack that keeps track of all the
209 downloaded pages. Default is effectively infinite stack size. If
210 the stack is eating up your memory, then set this to a smaller
211 number, say 5 or 10. Setting this to zero means Mech will keep no
212 history.
213
214 In addition, WWW::Mechanize also allows you to globally enable strict
215 and verbose mode for form handling, which is done with HTML::Form.
216
217 • "strict_forms => [0|1]"
218
219 Globally sets the HTML::Form strict flag which causes form
220 submission to croak if any of the passed fields don't exist in the
221 form, and/or a value doesn't exist in a select element. This can
222 still be disabled in individual calls to "submit_form()".
223
224 Default is off.
225
226 • "verbose_forms => [0|1]"
227
228 Globally sets the HTML::Form verbose flag which causes form
229 submission to warn about any bad HTML form constructs found. This
230 cannot be disabled later.
231
232 Default is off.
233
234 • "marked_sections => [0|1]"
235
236 Globally sets the HTML::Parser marked sections flag which causes
237 HTML "CDATA[[" sections to be honoured. This cannot be disabled
238 later.
239
240 Default is on.
241
242 To support forms, WWW::Mechanize's constructor pushes POST on to the
243 agent's "requests_redirectable" list (see also LWP::UserAgent.)
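
As a minimal sketch of the constructor parameters described above: the
agent string, the stack depth and the file name below are made up for
illustration, and the request deliberately targets a file that does not
exist so the example can run without a network connection.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize ();

# Hypothetical settings: do not die on failed requests, keep no cookies,
# use a custom agent string, and remember at most five pages.
my $mech = WWW::Mechanize->new(
    autocheck   => 0,
    cookie_jar  => undef,
    agent       => 'wonderbot 1.01',
    stack_depth => 5,
);

# Build a file: URI for a page that is assumed not to exist.
my $dir     = tempdir( CLEANUP => 1 );
my $missing = URI::file->new_abs("$dir/no-such-page.html")->as_string;

# With autocheck off, get() hands back the failed response instead of dying,
# so the caller has to check for success.
my $res = $mech->get($missing);
if ( $mech->success ) {
    print "Fetched ", $mech->uri, "\n";
}
else {
    print "Request failed: ", $res->status_line, "\n";
}
```

With "autocheck" left at its default for a standalone object, the same
get() call would throw an exception instead of returning.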
244
245 $mech->agent_alias( $alias )
246 Sets the user agent string to the expanded version from a table of
247 actual user agent strings. $alias can be one of the following:
248
249 • Windows IE 6
250
251 • Windows Mozilla
252
253 • Mac Safari
254
255 • Mac Mozilla
256
257 • Linux Mozilla
258
259 • Linux Konqueror
260
261 The alias is replaced with the corresponding full User-Agent string. For instance,
262
263 $mech->agent_alias( 'Windows IE 6' );
264
265 sets your User-Agent to
266
267 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
268
269 The list of valid aliases can be returned from "known_agent_aliases()".
283
284 known_agent_aliases()
285 Returns a list of all the agent aliases that Mech knows about.
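
A short sketch of how the two alias methods might be used together. It
needs no network access, and the alias chosen is simply the one used in
the example above.

```perl
use strict;
use warnings;
use WWW::Mechanize ();

my $mech = WWW::Mechanize->new();

# List every alias this version of Mech knows about.
my @aliases = $mech->known_agent_aliases();
print "Known alias: $_\n" for @aliases;

# Switch to one of them and show the expanded User-Agent string.
$mech->agent_alias('Windows IE 6');
print "User-Agent is now: ", $mech->agent, "\n";
```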
286
288 $mech->get( $uri )
289 Given a URL/URI, fetches it. Returns an HTTP::Response object. $uri
290 can be a well-formed URL string, a URI object, or a
291 WWW::Mechanize::Link object.
292
293 The results are stored internally in the agent object, but you don't
294 know that. Just use the accessors listed below. Poking at the
295 internals is deprecated and subject to change in the future.
296
297 "get()" is a well-behaved overridden version of the method in
298 LWP::UserAgent. This lets you do things like
299
300 $mech->get( $uri, ':content_file' => $tempfile );
301
302 and you can rest assured that the params will get filtered down
303 appropriately.
304
305 NOTE: Because ":content_file" causes the page contents to be stored in
306 a file instead of the response object, some Mech functions that expect
307 it to be there won't work as expected. Use with caution.
308
309 $mech->post( $uri, content => $content )
310 POSTs $content to $uri. Returns an HTTP::Response object. $uri can be
311 a well-formed URI string, a URI object, or a WWW::Mechanize::Link
312 object.
313
314 $mech->put( $uri, content => $content )
315 PUTs $content to $uri. Returns an HTTP::Response object. $uri can be
316 a well-formed URI string, a URI object, or a WWW::Mechanize::Link
317 object.
318
319 $mech->reload()
320 Acts like the reload button in a browser: repeats the current request.
321 The history (as per the back() method) is not altered.
322
323 Returns the HTTP::Response object from the reload, or "undef" if
324 there's no current request.
325
326 $mech->back()
327 The equivalent of hitting the "back" button in a browser. Returns to
328 the previous page. Won't go back past the first page. (Really, what
329 would it do if it could?)
330
331 Returns true if it could go back, or false if not.
332
333 $mech->clear_history()
334 This deletes all the history entries and returns true.
335
336 $mech->history_count()
337 This returns the number of items in the browser history. This number
338 does include the most recently made request.
339
340 $mech->history($n)
341 This returns the nth item in history. The 0th item is the most recent
342 request and response, which would be acted on by methods like
343 "find_link()". The 1st item is the state you'd return to if you called
344 "back()".
345
346 The maximum useful value for $n is "$mech->history_count - 1".
347 Requests beyond that bound will return "undef".
348
349 History items are returned as hash references, in the form:
350
351 { req => $http_request, res => $http_response }
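
The history methods can be tried out offline. The sketch below is
illustrative only: it writes two throwaway pages (the names "first" and
"second" are made up) to a temporary directory and fetches them through
file: URIs.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize ();

# Write two throwaway pages so the example runs without a network.
my $dir = tempdir( CLEANUP => 1 );
my @uris;
for my $name (qw(first second)) {
    my $path = "$dir/$name.html";
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} "<html><head><title>$name</title></head>",
                "<body>$name</body></html>\n";
    close $fh or die "Cannot close $path: $!";
    push @uris, URI::file->new_abs($path)->as_string;
}

my $mech = WWW::Mechanize->new();
$mech->get($_) for @uris;

# Two fetches: the current page plus one earlier entry.
my $count = $mech->history_count;
print "History holds $count item(s)\n";

# Item 0 is the current request/response pair; item 1 is where back() goes.
my $current  = $mech->history(0);
my $previous = $mech->history(1);
print "Current:  ", $current->{req}->uri,  "\n";
print "Previous: ", $previous->{req}->uri, "\n";

# Step back to the first page.
my $went_back = $mech->back();
print "After back(): ", $mech->title, "\n";
```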
352
354 $mech->success()
355 Returns a boolean telling whether the last request was successful. If
356 there hasn't been an operation yet, returns false.
357
358 This is a convenience function that wraps "$mech->res->is_success".
359
360 $mech->uri()
361 Returns the current URI as a URI object. This object stringifies to the
362 URI itself.
363
364 $mech->response() / $mech->res()
365 Returns the current response as an HTTP::Response object.
366
367 "$mech->res()" is a synonym for "$mech->response()".
368
369 $mech->status()
370 Returns the HTTP status code of the response. This is a 3-digit number
371 like 200 for OK, 404 for not found, and so on.
372
373 $mech->ct() / $mech->content_type()
374 Returns the content type of the response.
375
376 $mech->base()
377 Returns the base URI for the current response.
378
379 $mech->forms()
380 When called in a list context, returns a list of the forms found in the
381 last fetched page. In a scalar context, returns a reference to an array
382 with those forms. The forms returned are all HTML::Form objects.
383
384 $mech->current_form()
385 Returns the current form as an HTML::Form object.
386
387 $mech->links()
388 When called in a list context, returns a list of the links found in the
389 last fetched page. In a scalar context it returns a reference to an
390 array with those links. Each link is a WWW::Mechanize::Link object.
391
392 $mech->is_html()
393 Returns true/false on whether our content is HTML, according to the
394 HTTP headers.
395
396 $mech->title()
397 Returns the contents of the "<TITLE>" tag, as parsed by
398 HTML::HeadParser. Returns undef if the content is not HTML.
399
400 $mech->redirects()
401 Convenience method to get the redirects from the most recent
402 HTTP::Response.
403
404 Note that you can also use is_redirect to see if the most recent
405 response was a redirect, like this:
406
407 $mech->get($url);
408 do_stuff() if $mech->res->is_redirect;
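
The status accessors above can be exercised together. This is a sketch
under stated assumptions: the page contents and file name are invented,
and the page is served from a temporary file so no network is needed.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize ();

# A made-up page with two links and one form, stored in a temp file.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/status.html";
open my $fh, '>', $path or die "Cannot write $path: $!";
print {$fh} <<'HTML';
<html><head><title>Status demo</title></head>
<body>
<a href="one.html">One</a> <a href="two.html">Two</a>
<form action="/search"><input type="text" name="q"></form>
</body></html>
HTML
close $fh or die "Cannot close $path: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($path)->as_string );

# In scalar context forms() and links() return array references.
my $forms = $mech->forms;
my $links = $mech->links;

printf "success: %s\n", $mech->success ? 'yes' : 'no';
printf "status:  %s\n", $mech->status;
printf "type:    %s\n", $mech->ct;
printf "is_html: %s\n", $mech->is_html ? 'yes' : 'no';
printf "title:   %s\n", $mech->title;
printf "uri:     %s\n", $mech->uri;
printf "forms:   %d\n", scalar @{$forms};
printf "links:   %d\n", scalar @{$links};
```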
409
411 $mech->content(...)
412 Returns the content that the mech uses internally for the last page
413 fetched. Ordinarily this is the same as
414 "$mech->response()->decoded_content()", but this may differ for HTML
415 documents if update_html is overridden (in which case the value passed
416 to the base-class implementation of same will be returned), and/or
417 extra named arguments are passed to content():
418
419 $mech->content( format => 'text' )
420 Returns a text-only version of the page, with all HTML markup
421 stripped. This feature requires HTML::TreeBuilder version 5 or higher
422 to be installed, or a fatal error will be thrown. This works only if
423 the contents are HTML.
424
425 $mech->content( base_href => [$base_href|undef] )
426 Returns the HTML document, modified to contain a "<base
427 href="$base_href">" mark-up in the header. $base_href is
428 "$mech->base()" if not specified. This is handy to pass the HTML to
429 e.g. HTML::Display. This works only if the contents are HTML.
430
431 $mech->content( raw => 1 )
432 Returns "$self->response()->content()", i.e. the raw contents from
433 the response.
434
435 $mech->content( decoded_by_headers => 1 )
436 Returns the content after applying all "Content-Encoding" headers but
437 with no additional mangling.
438
439 $mech->content( charset => $charset )
440 Returns "$self->response()->decoded_content(charset => $charset)"
441 (see HTTP::Response for details).
442
443 To preserve backwards compatibility, additional parameters will be
444 ignored unless none of "raw | decoded_by_headers | charset" is
445 specified and the text is HTML, in which case an error will be
446 triggered.
447
448 A fresh instance of WWW::Mechanize will return "undef" when
449 "$mech->content()" is called, because no content is present before a
450 request has been made.
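
A small sketch of a few of the content() variants described above. The
page and the base URL "http://example.com/" are made up, and the page is
read from a temporary file rather than a web server.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize ();

# A made-up page stored in a temp file.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/content.html";
open my $fh, '>', $path or die "Cannot write $path: $!";
print {$fh} <<'HTML';
<html><head><title>Content demo</title></head>
<body><p>Hello, world</p></body></html>
HTML
close $fh or die "Cannot close $path: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($path)->as_string );

# The content Mech works with internally.
my $decoded = $mech->content();

# The raw bytes of the response, without any decoding.
my $raw = $mech->content( raw => 1 );

# The HTML with a <base> element added to the header.
my $based = $mech->content( base_href => 'http://example.com/' );

print "decoded length: ", length($decoded), "\n";
print "raw length:     ", length($raw),     "\n";
print $based;
```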
451
452 $mech->text()
453 Returns the text of the current HTML content. If the content isn't
454 HTML, $mech will die.
455
456 The text is extracted by parsing the content, and then the extracted
457 text is cached, so don't worry about performance of calling this
458 repeatedly.
459
461 $mech->links()
462 Lists all the links on the current page. Each link is a
463 WWW::Mechanize::Link object. In list context, returns a list of all
464 links. In scalar context, returns an array reference of all links.
465
466 $mech->follow_link(...)
467 Follows a specified link on the page. You specify the match to be
468 found using the same params that "find_link()" uses.
469
470 Here are some examples:
471
472 • 3rd link called "download"
473
474 $mech->follow_link( text => 'download', n => 3 );
475
476 • first link where the URL has "download" in it, regardless of case:
477
478 $mech->follow_link( url_regex => qr/download/i );
479
480 or
481
482 $mech->follow_link( url_regex => qr/(?i:download)/ );
483
484 • 3rd link on the page
485
486 $mech->follow_link( n => 3 );
487
488 • the link with the url
489
490 $mech->follow_link( url => '/other/page' );
491
492 or
493
494 $mech->follow_link( url => 'http://example.com/page' );
495
496 Returns the result of the "GET" method (an HTTP::Response object) if a
497 link was found.
498
499 If the page has no links, or the specified link couldn't be found,
500 returns "undef". If "autocheck" is enabled an exception will be thrown
501 instead.
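
The following sketch shows both outcomes of follow_link(). It assumes two
invented pages in a temporary directory, and turns "autocheck" off so
that a missing link yields "undef" rather than an exception.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize ();

# Two made-up pages: an index that links to a download page.
my $dir   = tempdir( CLEANUP => 1 );
my %pages = (
    'index.html'    => '<html><body><a href="about.html">About</a> '
                     . '<a href="download.html">Download this</a></body></html>',
    'download.html' => '<html><body>Nothing else to see here.</body></html>',
);
for my $name ( keys %pages ) {
    open my $fh, '>', "$dir/$name" or die "Cannot write $name: $!";
    print {$fh} $pages{$name};
    close $fh or die "Cannot close $name: $!";
}

my $mech = WWW::Mechanize->new( autocheck => 0 );
$mech->get( URI::file->new_abs("$dir/index.html")->as_string );

# Follow the first link whose text contains "download", ignoring case.
my $res = $mech->follow_link( text_regex => qr/download/i );
print "Now at: ", $mech->uri, "\n" if $res;

# The download page has no links, so this lookup fails and returns undef.
my $none = $mech->follow_link( text => 'no such link' );
print "No matching link found\n" unless defined $none;
```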
502
503 $mech->find_link( ... )
504 Finds a link in the currently fetched page. It returns a
505 WWW::Mechanize::Link object which describes the link. (You'll probably
506 be most interested in the "url()" property.) If it fails to find a
507 link it returns undef.
508
509 You can take the URL part and pass it to the "get()" method. If that's
510 your plan, you might as well use the "follow_link()" method directly,
511 since it does the "get()" for you automatically.
512
513 Note that "<FRAME SRC="...">" tags are parsed out of the HTML and
514 treated as links so this method works with them.
515
516 You can select which link to find by passing in one or more of these
517 key/value pairs:
518
519 • "text => 'string'," and "text_regex => qr/regex/,"
520
521 "text" matches the text of the link against string, which must be
522 an exact match. To select a link with text that is exactly
523 "download", use
524
525 $mech->find_link( text => 'download' );
526
527 "text_regex" matches the text of the link against regex. To select
528 a link with text that has "download" anywhere in it, regardless of
529 case, use
530
531 $mech->find_link( text_regex => qr/download/i );
532
533 Note that the text extracted from the page's links is trimmed.
534 For example, "<a> foo </a>" is stored as 'foo', and searching for
535 leading or trailing spaces will fail.
536
537 • "url => 'string'," and "url_regex => qr/regex/,"
538
539 Matches the URL of the link against string or regex, as
540 appropriate. The URL may be a relative URL, like foo/bar.html,
541 depending on how it's coded on the page.
542
543 • "url_abs => string" and "url_abs_regex => regex"
544
545 Matches the absolute URL of the link against string or regex, as
546 appropriate. The URL will be an absolute URL, even if it's
547 relative in the page.
548
549 • "name => string" and "name_regex => regex"
550
551 Matches the name of the link against string or regex, as
552 appropriate.
553
554 • "rel => string" and "rel_regex => regex"
555
556 Matches the rel of the link against string or regex, as
557 appropriate. This can be used to find stylesheets, favicons, or
558 links the author of the page does not want bots to follow.
559
560 • "id => string" and "id_regex => regex"
561
562 Matches the attribute 'id' of the link against string or regex, as
563 appropriate.
564
565 • "class => string" and "class_regex => regex"
566
567 Matches the attribute 'class' of the link against string or regex,
568 as appropriate.
569
570 • "tag => string" and "tag_regex => regex"
571
572 Matches the tag that the link came from against string or regex, as
573 appropriate. The "tag_regex" is probably most useful to check for
574 more than one tag, as in:
575
576 $mech->find_link( tag_regex => qr/^(a|frame)$/ );
577
578 The tags and attributes looked at are defined below.
579
580 If "n" is not specified, it defaults to 1. Therefore, if you don't
581 specify any params, this method defaults to finding the first link on
582 the page.
583
584 Note that you can specify multiple text or URL parameters, which will
585 be ANDed together. For example, to find the first link with text of
586 "News" and with "cnn.com" in the URL, use:
587
588 $mech->find_link( text => 'News', url_regex => qr/cnn\.com/ );
589
590 The return value is a single WWW::Mechanize::Link object describing
591 the matching link, or undef if no link matches.
592
593 The links come from the following:
594
595 "<a href=...>"
596 "<area href=...>"
597 "<frame src=...>"
598 "<iframe src=...>"
599 "<link href=...>"
600 "<meta content=...>"
601
602 $mech->find_all_links( ... )
603 Returns all the links on the current page that match the criteria. The
604 method for specifying link criteria is the same as in "find_link()".
605 Each of the links returned is a WWW::Mechanize::Link object.
606
607 In list context, "find_all_links()" returns a list of the links.
608 Otherwise, it returns a reference to the list of links.
609
610 "find_all_links()" with no parameters returns all links in the page.
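
To illustrate how the link criteria combine, here is a sketch that runs
find_link() and find_all_links() against an invented page held in a
temporary file. The link targets are made up and are never fetched.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize ();

# A made-up page with three links, two of which share the same text.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/links.html";
open my $fh, '>', $path or die "Cannot write $path: $!";
print {$fh} <<'HTML';
<html><body>
<a href="news.html">News</a>
<a href="http://cnn.com/world">News</a>
<a href="files/get.html"> Download now </a>
</body></html>
HTML
close $fh or die "Cannot close $path: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($path)->as_string );

# Criteria are ANDed: the text must be "News" and the URL must match.
my $cnn = $mech->find_link( text => 'News', url_regex => qr/cnn\.com/ );
print "Matched: ", $cnn->url, "\n";

# "n" picks the nth match; here, the second link called "News".
my $second = $mech->find_link( text => 'News', n => 2 );
print "Second News link: ", $second->url, "\n";

# find_all_links() returns every match; link text is stored trimmed.
my @downloads = $mech->find_all_links( text_regex => qr/download/i );
printf "'%s' -> %s\n", $_->text, $_->url_abs for @downloads;
```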
611
612 $mech->find_all_inputs( ... criteria ... )
613 find_all_inputs() returns an array of all the input controls in the
614 current form whose properties match all of the regexes passed in. The
615 controls returned are all descended from HTML::Form::Input. See
616 "INPUTS" in HTML::Form for details.
617
618 If no criteria are passed, all inputs will be returned.
619
620 If there is no current page, there is no form on the current page, or
621 there are no submit controls in the current form then the return will
622 be an empty array.
623
624 You may use a regex or a literal string:
625
626 # get all textarea controls whose names begin with "customer"
627 my @customer_text_inputs = $mech->find_all_inputs(
628 type => 'textarea',
629 name_regex => qr/^customer/,
630 );
631
632 # get all text or textarea controls called "customer"
633 my @customer_text_inputs = $mech->find_all_inputs(
634 type_regex => qr/^(text|textarea)$/,
635 name => 'customer',
636 );
637
638 $mech->find_all_submits( ... criteria ... )
639 "find_all_submits()" does the same thing as "find_all_inputs()" except
640 that it only returns controls that are submit controls, ignoring other
641 types of input controls like text and checkboxes.
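
As a sketch of both methods on a concrete form (the form and its field
names are invented for the example, and the page is read from a
temporary file):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize ();

# A made-up order form stored in a temp file.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/order.html";
open my $fh, '>', $path or die "Cannot write $path: $!";
print {$fh} <<'HTML';
<html><body>
<form action="/order" method="post">
<input type="text" name="customer_name">
<textarea name="customer_notes"></textarea>
<input type="text" name="coupon">
<input type="submit" name="go" value="Order">
</form>
</body></html>
HTML
close $fh or die "Cannot close $path: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($path)->as_string );

# Every control in the current form whose name starts with "customer".
my @customer = $mech->find_all_inputs( name_regex => qr/^customer/ );
printf "%-8s %s\n", $_->type, $_->name for @customer;

# Only the submit controls.
my @submits = $mech->find_all_submits();
printf "submit control: %s\n", $_->name for @submits;
```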
642
644 $mech->images
645 Lists all the images on the current page. Each image is a
646 WWW::Mechanize::Image object. In list context, returns a list of all
647 images. In scalar context, returns an array reference of all images.
648
649 $mech->find_image()
650 Finds an image in the current page. It returns a WWW::Mechanize::Image
651 object which describes the image. If it fails to find an image it
652 returns undef.
653
654 You can select which image to find by passing in one or more of these
655 key/value pairs:
656
657 • "alt => 'string'" and "alt_regex => qr/regex/"
658
659 "alt" matches the ALT attribute of the image against string, which
660 must be an exact match. To select an image with an ALT attribute that is
661 exactly "download", use
662
663 $mech->find_image( alt => 'download' );
664
665 "alt_regex" matches the ALT attribute of the image against a
666 regular expression. To select an image with an ALT attribute that
667 has "download" anywhere in it, regardless of case, use
668
669 $mech->find_image( alt_regex => qr/download/i );
670
671 • "url => 'string'" and "url_regex => qr/regex/"
672
673 Matches the URL of the image against string or regex, as
674 appropriate. The URL may be a relative URL, like foo/bar.html,
675 depending on how it's coded on the page.
676
677 • "url_abs => string" and "url_abs_regex => regex"
678
679 Matches the absolute URL of the image against string or regex, as
680 appropriate. The URL will be an absolute URL, even if it's
681 relative in the page.
682
683 • "tag => string" and "tag_regex => regex"
684
685 Matches the tag that the image came from against string or regex,
686 as appropriate. The "tag_regex" is probably most useful to check
687 for more than one tag, as in:
688
689 $mech->find_image( tag_regex => qr/^(img|input)$/ );
690
691 The tags supported are "<img>" and "<input>".
692
693 • "id => string" and "id_regex => regex"
694
695 "id" matches the id attribute of the image against string, which
696 must be an exact match. To select an image with the exact id
697 "download-image", use
698
699 $mech->find_image( id => 'download-image' );
700
701 "id_regex" matches the id attribute of the image against a regular
702 expression. To select the first image with an id that contains
703 "download" anywhere in it, use
704
705 $mech->find_image( id_regex => qr/download/ );
706
707 • "class => string" and "class_regex => regex"
708
709 "class" matches the class attribute of the image against string,
710 which must be an exact match. To select an image with the exact
711 class "img-fluid", use
712
713 $mech->find_image( class => 'img-fluid' );
714
715 To select an image with the class attribute "rounded float-left",
716 use
717
718 $mech->find_image( class => 'rounded float-left' );
719
720 Note that the classes have to be matched as a complete string, in
721 the exact order they appear in the website's source code.
722
723 "class_regex" matches the class attribute of the image against a
724 regular expression. Use this if you want a partial class name, or
725 if an image has several classes, but you only care about one.
726
727 To select the first image with the class "rounded", where there are
728 multiple images that might also have either class "float-left" or
729 "float-right", use
730
731 $mech->find_image( class_regex => qr/\brounded\b/ );
732
733 Selecting an image with multiple classes where you do not care
734 about the order they appear in the website's source code is not
735 currently supported.
736
737 If "n" is not specified, it defaults to 1. Therefore, if you don't
738 specify any params, this method defaults to finding the first image on
739 the page.
740
741 Note that you can specify multiple ALT or URL parameters, which will be
742 ANDed together. For example, to find the first image with ALT text of
743 "News" and with "cnn.com" in the URL, use:
744
745 $mech->find_image( alt => 'News', url_regex => qr/cnn\.com/ );
746
747 The return value is a single WWW::Mechanize::Image object describing
748 the matching image, or undef if no image matches.
749
750 $mech->find_all_images( ... )
751 Returns all the images on the current page that match the criteria.
752 The method for specifying image criteria is the same as in
753 "find_image()". Each of the images returned is a WWW::Mechanize::Image
754 object.
755
756 In list context, "find_all_images()" returns a list of the images.
757 Otherwise, it returns a reference to the list of images.
758
759 "find_all_images()" with no parameters returns all images in the page.
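
The image criteria can be sketched the same way. The page below is
invented for the example, and the image files it names do not need to
exist because nothing is downloaded.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize ();

# A made-up page with two <img> tags and one image input.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/images.html";
open my $fh, '>', $path or die "Cannot write $path: $!";
print {$fh} <<'HTML';
<html><body>
<img src="logo.png" alt="Logo" class="rounded float-left">
<img src="http://cnn.com/news.gif" alt="News" class="img-fluid">
<form action="/go"><input type="image" src="go.png" alt="Go"></form>
</body></html>
HTML
close $fh or die "Cannot close $path: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($path)->as_string );

# Criteria are ANDed: ALT text "News" and a URL containing "cnn.com".
my $news = $mech->find_image( alt => 'News', url_regex => qr/cnn\.com/ );
print "News image: ", $news->url, "\n";

# Match one class out of several with a regex.
my $rounded = $mech->find_image( class_regex => qr/\brounded\b/ );
print "Rounded image: ", $rounded->url, "\n";

# Restrict the search to <img> tags, leaving out the image input.
my @imgs = $mech->find_all_images( tag => 'img' );
print "img tag: ", $_->url, "\n" for @imgs;
```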
760
762 These methods let you work with the forms on a page. The idea is to
763 choose a form that you'll later work with using the field methods
764 below.
765
766 $mech->forms
767 Lists all the forms on the current page. Each form is an HTML::Form
768 object. In list context, returns a list of all forms. In scalar
769 context, returns an array reference of all forms.
770
771 $mech->form_number($number)
772 Selects the numberth form on the page as the target for subsequent
773 calls to "field()" and "click()". Also returns the form that was
774 selected.
775
776 If it is found, the form is returned as an HTML::Form object and set
777 internally for later use with Mech's form methods such as "field()" and
778 "click()". When called in a list context, the number of the found form
779 is also returned as a second value.
780
781 Emits a warning and returns undef if no form is found.
782
783 The first form is number 1, not zero.
784
785 $mech->form_name( $name )
786 Selects a form by name. If there is more than one form on the page
787 with that name, then the first one is used, and a warning is generated.
788
789 If it is found, the form is returned as an HTML::Form object and set
790 internally for later use with Mech's form methods such as "field()" and
791 "click()".
792
793 Returns undef if no form is found.
794
795 $mech->form_id( $name )
796 Selects a form by ID. If there is more than one form on the page with
797 that ID, then the first one is used, and a warning is generated.
798
799 If it is found, the form is returned as an HTML::Form object and set
800 internally for later use with Mech's form methods such as "field()" and
801 "click()".
802
803 If no form is found it returns "undef". This will also trigger a
804 warning, unless "quiet" is enabled.
805
806 $mech->all_forms_with_fields( @fields )
807 Selects a form by passing in a list of field names it must contain.
808 All matching forms (perhaps none) are returned as a list of HTML::Form
809 objects.
810
811 $mech->form_with_fields( @fields )
812 Selects a form by passing in a list of field names it must contain. If
813 there is more than one form on the page that matches, then the
814 first one is used, and a warning is generated.
815
816 If it is found, the form is returned as an HTML::Form object and set
817 internally for later use with Mech's form methods such as "field()"
818 and "click()".
819
820 Returns undef and emits a warning if no form is found.
821
822 Note that this functionality requires libwww-perl 5.69 or higher.
823
824 $mech->all_forms_with( $attr1 => $value1, $attr2 => $value2, ... )
825 Searches for forms with arbitrary attribute/value pairs within the
826 <form> tag. (Currently does not work for attribute "action" due to
827 implementation details of HTML::Form.) When given more than one pair,
828 all criteria must match. Using "undef" as value means that the
829 attribute in question must not be present.
830
831 All matching forms (perhaps none) are returned as a list of HTML::Form
832 objects.
833
834 $mech->form_with( $attr1 => $value1, $attr2 => $value2, ... )
835 Searches for forms with arbitrary attribute/value pairs within the
836 <form> tag. (Currently does not work for attribute "action" due to
837 implementation details of HTML::Form.) When given more than one pair,
838 all criteria must match. Using "undef" as value means that the
839 attribute in question must not be present.
840
841 If it is found, the form is returned as an HTML::Form object and set
842 internally for later use with Mech's form methods such as "field()"
843 and "click()".
844
845 Returns undef if no form is found.
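
The different ways of selecting a form can be compared side by side. In
this sketch the two forms, their names and their fields are invented,
and the page is loaded from a temporary file.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize ();

# A made-up page with a search form and a login form.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/forms.html";
open my $fh, '>', $path or die "Cannot write $path: $!";
print {$fh} <<'HTML';
<html><body>
<form name="search" action="/search">
<input type="text" name="query">
<input type="submit" value="Search Now">
</form>
<form id="login" action="/login" method="post">
<input type="text" name="username">
<input type="password" name="password">
<input type="submit" name="go" value="Log in">
</form>
</body></html>
HTML
close $fh or die "Cannot close $path: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($path)->as_string );

# Each call returns an HTML::Form object and makes it the current form.
my $first     = $mech->form_number(1);            # forms count from 1
my $search    = $mech->form_name('search');
my $login     = $mech->form_id('login');
my $by_fields = $mech->form_with_fields(qw(username password));
my $by_attr   = $mech->form_with( name => 'search' );

print "form_number(1) selected: ", $first->attr('name'),   "\n";
print "form_id selected:        ", $login->attr('id'),     "\n";
print "form_with_fields chose:  ", $by_fields->attr('id'), "\n";
print "current form is now:     ", $mech->current_form->attr('name'), "\n";
```

The last selection wins: after the final form_with() call, the search
form is the current form again.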
846
848 These methods allow you to set the values of fields in a given form.
849
850 $mech->field( $name, $value, $number )
851 $mech->field( $name, \@values, $number )
852 Given the name of a field, set its value to the value specified. This
853 applies to the current form (as set by the "form_name()" or
854 "form_number()" method or defaulting to the first form on the page).
855
856 The optional $number parameter is used to distinguish between two
857 fields with the same name. The fields are numbered from 1.
858
859 $mech->select($name, $value)
860 $mech->select($name, \@values)
861 Given the name of a "select" field, set its value to the value
862 specified. If the field is not "<select multiple>" and the $value is
863 an array, only the first value will be set. [Note: the documentation
864 previously claimed that only the last value would be set, but this was
865 incorrect.] Passing $value as a hash with an "n" key selects an item
866 by number (e.g. "{n => 3}" or "{n => [2,4]}"). The numbering starts
867 at 1. This applies to the current form.
868
869 If you have a field with "<select multiple>" and you pass a single
870 $value, then $value will be added to the list of fields selected,
871 without clearing the others. However, if you pass an array reference,
872 then all previously selected values will be cleared.
873
874 Returns true on successfully setting the value. On failure, returns
875 false and calls "$self->warn()" with an error message.
876
877 $mech->set_fields( $name => $value ... )
878 This method sets multiple fields of the current form. It takes a list
879 of field name and value pairs. If there is more than one field with the
880 same name, the first one found is set. If you want to select which of
881 the duplicate fields to set, use a value which is an anonymous array
882 which has the field value and its number as the 2 elements.
883
884 # set the second field named $name to 'foo'
885 $mech->set_fields( $name => [ 'foo', 2 ] );
886
887 The fields are numbered from 1.
888
889 This applies to the current form.
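
A sketch of field(), select() and set_fields() working on the same form.
The form and its field names are made up, and the values are read back
through the HTML::Form object instead of being submitted.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize ();

# A made-up form with a duplicated field name and a select menu.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/fields.html";
open my $fh, '>', $path or die "Cannot write $path: $!";
print {$fh} <<'HTML';
<html><body>
<form action="/order" method="post">
<input type="text" name="name">
<input type="text" name="tag">
<input type="text" name="tag">
<select name="size">
<option value="s">Small</option>
<option value="m">Medium</option>
<option value="l">Large</option>
</select>
</form>
</body></html>
HTML
close $fh or die "Cannot close $path: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($path)->as_string );
$mech->form_number(1);

# Set a single field, then the first of the two "tag" fields.
$mech->field( name => 'mungo' );
$mech->field( tag  => 'first' );

# Set the second "tag" field by passing [ value, number ].
$mech->set_fields( tag => [ 'second', 2 ] );

# Choose an option in the select menu by its value.
$mech->select( 'size', 'm' );

# Read the values back from the underlying HTML::Form object.
my $form = $mech->current_form;
print "name:  ", $form->value('name'), "\n";
print "tag 1: ", $form->find_input( 'tag', undef, 1 )->value, "\n";
print "tag 2: ", $form->find_input( 'tag', undef, 2 )->value, "\n";
print "size:  ", $form->value('size'), "\n";
```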
890
891 $mech->set_visible( @criteria )
892 This method sets fields of the current form without having to know
893 their names. So if you have a login screen that wants a username and
894 password, you do not have to fetch the form and inspect the source (or
895 use the mech-dump utility, installed with WWW::Mechanize) to see what
896 the field names are; you can just say
897
898 $mech->set_visible( $username, $password );
899
900 and the first and second fields will be set accordingly. The method is
901 called set_visible because it acts only on visible fields; hidden form
902 inputs are not considered. The order of the fields is the order in
903 which they appear in the HTML source, which is nearly always the order
904 anyone viewing the page would think they are in, but some creative work
905 with tables could change that; caveat user.
906
907 Each element in @criteria is either a field value or a field specifier.
908 A field value is a scalar. A field specifier allows you to specify the
909 type of input field you want to set and is denoted with an arrayref
910 containing two elements. So you could specify the first radio button
911 with
912
913 $mech->set_visible( [ radio => 'KCRW' ] );
914
915 Field values and specifiers can be intermixed, hence
916
917 $mech->set_visible( 'fred', 'secret', [ option => 'Checking' ] );
918
919 would set the first two fields to "fred" and "secret", and the next
920 "OPTION" menu field to "Checking".
921
922 The possible field specifier types are: "text", "password", "hidden",
923 "textarea", "file", "image", "submit", "radio", "checkbox" and
924 "option".
925
926 "set_visible" returns the number of values set.
927
928 $mech->tick( $name, $value [, $set] )
929 "Ticks" the first checkbox that has both the name and value associated
930 with it on the current form. Dies if there is no checkbox with that
931 name and value. Passing in a false value as the third optional argument
932 will cause the checkbox to be unticked.
933
934 $mech->untick($name, $value)
935 Causes the checkbox to be unticked. Shorthand for
936 "tick($name,$value,undef)"
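A short sketch of ticking and unticking, assuming a hypothetical local page with two checkboxes that share the name "genre". The ticked values are read back with HTML::Form's "param()", and the failure case is wrapped in "eval" because "tick()" dies when no checkbox matches.

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical page with two checkboxes sharing one name.
my $page = File::Temp->new( SUFFIX => '.html' );
print {$page} <<'HTML';
<html><body><form>
<input type="checkbox" name="genre" value="jazz"> Jazz
<input type="checkbox" name="genre" value="rock"> Rock
</form></body></html>
HTML
close $page;

my $mech = WWW::Mechanize->new;
$mech->get( URI::file->new_abs( $page->filename ) );

$mech->tick( 'genre', 'jazz' );
$mech->tick( 'genre', 'rock' );
$mech->untick( 'genre', 'rock' );    # same as tick( 'genre', 'rock', undef )

# Only checkboxes that are currently ticked contribute a value.
my @ticked = $mech->current_form->param('genre');
print "ticked: @ticked\n";

# There is no "polka" checkbox, so tick() dies; trap it with eval.
my $died = !eval { $mech->tick( 'genre', 'polka' ); 1 };
print "tick() on a missing checkbox died\n" if $died;
```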
937
938 $mech->value( $name [, $number] )
939 Given the name of a field, return its value. This applies to the
940 current form.
941
942 The optional $number parameter is used to distinguish between two
943 fields with the same name. The fields are numbered from 1.
944
945 If the field is of type file (file upload field), the value is always
946 cleared to prevent remote sites from downloading your local files. To
947 upload a file, specify its file name explicitly.
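The optional $number argument can be illustrated with a form that has two fields of the same name. This is a sketch under assumptions: the local page and the "email" field name are hypothetical.

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical page with two text fields that share a name.
my $page = File::Temp->new( SUFFIX => '.html' );
print {$page} <<'HTML';
<html><body><form>
<input type="text" name="email" value="">
<input type="text" name="email" value="">
</form></body></html>
HTML
close $page;

my $mech = WWW::Mechanize->new;
$mech->get( URI::file->new_abs( $page->filename ) );

# Without $number the first matching field is used.
$mech->field( 'email', 'first@example.com' );

# With $number => 2 the second field of that name is used.
$mech->field( 'email', 'second@example.com', 2 );

my $first  = $mech->value('email');
my $second = $mech->value( 'email', 2 );

print "first:  $first\n";
print "second: $second\n";
```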
948
949 $mech->click( $button [, $x, $y] )
950 Has the effect of clicking a button on the current form. The first
951 argument is the name of the button to be clicked. The second and third
952 arguments (optional) allow you to specify the (x,y) coordinates of the
953 click.
954
955 If there is only one button on the form, "$mech->click()" with no
956 arguments simply clicks that one button.
957
958 Returns an HTTP::Response object.
959
960 $mech->click_button( ... )
961 Has the effect of clicking a button on the current form by specifying
962 its attributes. The arguments are a list of key/value pairs. Exactly one
963 of name, id, number, input or value must be specified in the keys.
964
965 Dies if no button is found.
966
967 • "name => name"
968
969 Clicks the button named name in the current form.
970
971 • "id => id"
972
973 Clicks the button with the id id in the current form.
974
975 • "number => n"
976
977 Clicks the nth button with type submit in the current form.
978 Numbering starts at 1.
979
980 • "value => value"
981
982 Clicks the button with the value value in the current form.
983
984 • "input => $inputobject"
985
986 Clicks on the button referenced by $inputobject, an instance of
987 HTML::Form::SubmitInput obtained e.g. from
988
989 $mech->current_form()->find_input( undef, 'submit' )
990
991 $inputobject must belong to the current form.
992
993 • "x => x"
994
995 • "y => y"
996
997 These arguments (optional) allow you to specify the (x,y)
998 coordinates of the click.
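As an illustration of choosing between buttons by value, here is a sketch against a hypothetical local page. Because the page is a local file, the "submission" is a GET against a file: URL; autocheck is switched off so the sketch does not depend on how that request is answered. It only inspects the request that Mech built.

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical page: two submit buttons with the same name.
my $page = File::Temp->new( SUFFIX => '.html' );
print {$page} <<'HTML';
<html><body><form>
<input type="text" name="note" value="hello">
<input type="submit" name="action" value="Save">
<input type="submit" name="action" value="Delete">
</form></body></html>
HTML
close $page;

# autocheck is off so a failed fetch would not abort the sketch.
my $mech = WWW::Mechanize->new( autocheck => 0 );
$mech->get( URI::file->new_abs( $page->filename ) );

# Click the button whose value is "Delete".
my $response = $mech->click_button( value => 'Delete' );

# The request Mech sent carries only the clicked button's value.
my $sent = $response->request->uri;
print "requested: $sent\n";
```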
999
1000 $mech->submit()
1001 Submits the current form, without specifying a button to click.
1002 Actually, no button is clicked at all.
1003
1004 Returns an HTTP::Response object.
1005
1006 This used to be a synonym for "$mech->click( 'submit' )", but is no
1007 longer so.
1008
1009 $mech->submit_form( ... )
1010 This method lets you select a form from the previously fetched page,
1011 fill in its fields, and submit it. It combines the
1012 "form_number"/"form_name", "set_fields" and "click" methods into one
1013 higher level call. Its arguments are a list of key/value pairs, all of
1014 which are optional.
1015
1016 • "fields => \%fields"
1017
1018 Specifies the fields to be filled in the current form.
1019
1020 • "with_fields => \%fields"
1021
1022 Probably all you need for the common case. It combines a smart form
1023 selector and data setting in one operation. It selects the first
1024 form that contains all fields mentioned in "\%fields". This is
1025 nice because you don't need to know the name or number of the form
1026 to do this.
1027
1028 (calls "form_with_fields()" and
1029 "set_fields()").
1030
1031 If you choose "with_fields", the "fields" option will be ignored.
1032 The "form_number", "form_name" and "form_id" options will still be
1033 used. An exception will be thrown unless exactly one form matches
1034 all of the provided criteria.
1035
1036 • "form_number => n"
1037
1038 Selects the nth form (calls "form_number()"). If this param is not
1039 specified, the currently-selected form is used.
1040
1041 • "form_name => name"
1042
1043 Selects the form named name (calls "form_name()")
1044
1045 • "form_id => ID"
1046
1047 Selects the form with ID ID (calls "form_id()")
1048
1049 • "button => button"
1050
1051 Clicks on button button (calls "click()")
1052
1053 • "x => x, y => y"
1054
1055 Sets the x or y values for "click()"
1056
1057 • "strict_forms => bool"
1058
1059 Sets the HTML::Form strict flag which causes form submission to
1060 croak if any of the passed fields don't exist on the page, and/or a
1061 value doesn't exist in a select element. By default HTML::Form
1062 sets this value to false.
1063
1064 This behavior can also be turned on globally by passing
1065 "strict_forms => 1" to "WWW::Mechanize->new". If you do that, you
1066 can still disable it for individual calls by passing "strict_forms
1067 => 0" here.
1068
1069 If no form is selected, the first form found is used.
1070
1071 If button is not passed, then the "submit()" method is used instead.
1072
1073 If you want to submit a file and get its content from a scalar rather
1074 than a file in the filesystem, you can use:
1075
1076 $mech->submit_form(with_fields => { logfile => [ [ undef, 'whatever', Content => $content ], 1 ] } );
1077
1078 Returns an HTTP::Response object.
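The "with_fields" option can be sketched as follows. The two-form local page and its field names are hypothetical; since the page is a local file, the form is submitted as a GET against a file: URL, and autocheck is off so the outcome of that fetch does not matter. The sketch only checks which fields ended up in the request.

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical page with a search form and a login form.
my $page = File::Temp->new( SUFFIX => '.html' );
print {$page} <<'HTML';
<html><body>
<form name="search">
<input type="text" name="q">
<input type="submit" name="go" value="Go">
</form>
<form name="login">
<input type="text" name="username">
<input type="password" name="password">
<input type="submit" name="login" value="Log in">
</form>
</body></html>
HTML
close $page;

my $mech = WWW::Mechanize->new( autocheck => 0 );
$mech->get( URI::file->new_abs( $page->filename ) );

# No form name or number is given: the form is chosen because it
# is the only one containing both "username" and "password".
my $response = $mech->submit_form(
    with_fields => {
        username => 'mungo',
        password => 'lost-and-alone',
    },
    button => 'login',
);

my $sent = $response->request->uri;
print "requested: $sent\n";
```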
1079
1081 $mech->add_header( name => $value [, name => $value... ] )
1082 Sets HTTP headers for the agent to add or remove from the HTTP request.
1083
1084 $mech->add_header( Encoding => 'text/klingon' );
1085
1086 If a value is "undef", then that header will be removed from any future
1087 requests. For example, to never send a Referer header:
1088
1089 $mech->add_header( Referer => undef );
1090
1091 If you want to delete a header, use "delete_header".
1092
1093 Returns the number of name/value pairs added.
1094
1095 NOTE: This method was very different in WWW::Mechanize before 1.00.
1096 Back then, the headers were stored in a package hash, not as a member
1097 of the object instance. Calling "add_header()" would modify the
1098 headers for every WWW::Mechanize object, even after your object no
1099 longer existed.
1100
1101 $mech->delete_header( name [, name ... ] )
1102 Removes HTTP headers from the agent's list of special headers. For
1103 instance, you might need to do something like:
1104
1105 # Don't send a Referer for this URL
1106 $mech->add_header( Referer => undef );
1107
1108 # Get the URL
1109 $mech->get( $url );
1110
1111 # Back to the default behavior
1112 $mech->delete_header( 'Referer' );
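The effect of these two methods can be observed on the request object that Mech actually sent. The following is a sketch: the "X-Example" header name and the local page are hypothetical, and the headers are inspected through "$mech->response->request".

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical local page to fetch repeatedly.
my $page = File::Temp->new( SUFFIX => '.html' );
print {$page} "<html><body><p>Hello</p></body></html>\n";
close $page;

my $uri  = URI::file->new_abs( $page->filename );
my $mech = WWW::Mechanize->new;

# add_header() returns the number of name/value pairs it stored.
my $added = $mech->add_header( 'X-Example' => 'demo' );
$mech->get($uri);
my $custom = $mech->response->request->header('X-Example');

# An undef value suppresses the header on later requests.
$mech->add_header( Referer => undef );
$mech->get($uri);
my $suppressed = $mech->response->request->header('Referer');

# delete_header() restores the default behavior.
$mech->delete_header('Referer');
$mech->get($uri);
my $restored = $mech->response->request->header('Referer');

print "pairs added: $added\n";
print "X-Example:   $custom\n";
print 'Referer while suppressed: ',
    ( defined $suppressed ? $suppressed : '(none)' ), "\n";
print 'Referer after delete_header: ',
    ( defined $restored ? $restored : '(none)' ), "\n";
```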
1113
1114 $mech->quiet(true/false)
1115 Allows you to suppress warnings to the screen.
1116
1117 $mech->quiet(0); # turns on warnings (the default)
1118 $mech->quiet(1); # turns off warnings
1119 $mech->quiet(); # returns the current quietness status
1120
1121 $mech->stack_depth( $max_depth )
1122 Get or set the page stack depth. Use this if you're doing a lot of page
1123 scraping and running out of memory.
1124
1125 A value of 0 means "no history at all." By default, the max stack
1126 depth is humongously large, effectively keeping all history.
1127
1128 $mech->save_content( $filename, %opts )
1129 Dumps the contents of "$mech->content" into $filename. $filename will
1130 be overwritten. Dies if there are any errors.
1131
1132 If the content type does not begin with "text/", then the content is
1133 saved in binary mode (i.e. "binmode()" is set on the output
1134 filehandle).
1135
1136 Additional arguments can be passed as key/value pairs:
1137
1138 $mech->save_content( $filename, binary => 1 )
1139 Filehandle is set with "binmode" to ":raw" and contents are taken
1140 calling "$self->content(decoded_by_headers => 1)". Same as calling:
1141
1142 $mech->save_content( $filename, binmode => ':raw',
1143 decoded_by_headers => 1 );
1144
1145 This should be the safest way to save contents verbatim.
1146
1147 $mech->save_content( $filename, binmode => $binmode )
1148 Filehandle is set to binary mode. If $binmode begins with ':', it
1149 is passed as a parameter to "binmode":
1150
1151 binmode $fh, $binmode;
1152
1153 otherwise the filehandle is set to binary mode if $binmode is true:
1154
1155 binmode $fh;
1156
1157 all other arguments
1158 are passed as-is to "$mech->content(%opts)". In particular,
1159 "decoded_by_headers" might come in handy if you want to revert the
1160 effect of line compression performed by the web server but without
1161 further interpreting the contents (e.g. decoding it according to
1162 the charset).
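A minimal sketch of saving a page verbatim with "binary => 1". Both the fetched page and the destination are hypothetical temporary files; the saved copy is read back in raw mode purely to show that something was written.

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical local page standing in for a fetched document.
my $page = File::Temp->new( SUFFIX => '.html' );
print {$page} "<html><body><p>Hello, Mech</p></body></html>\n";
close $page;

my $mech = WWW::Mechanize->new;
$mech->get( URI::file->new_abs( $page->filename ) );

# Destination file; save_content() overwrites whatever is there.
my $copy = File::Temp->new( SUFFIX => '.html' );
close $copy;

# binary => 1 writes the content in :raw mode, decoded only
# according to the response headers.
$mech->save_content( $copy->filename, binary => 1 );

# Read the copy back to confirm what was written.
open my $in, '<:raw', $copy->filename
    or die "Cannot read saved copy: $!";
my $saved = do { local $/; <$in> };
close $in;

print 'saved ', length($saved), " bytes\n";
```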
1163
1164 $mech->dump_headers( [$fh] )
1165 Prints a dump of the HTTP response headers for the most recent
1166 response. If $fh is not specified or is undef, it dumps to STDOUT.
1167
1168 Unlike the rest of the dump_* methods, $fh can be a scalar. It will be
1169 used as a file name.
1170
1171 $mech->dump_links( [[$fh], $absolute] )
1172 Prints a dump of the links on the current page to $fh. If $fh is not
1173 specified or is undef, it dumps to STDOUT.
1174
1175 If $absolute is true, links displayed are absolute, not relative.
1176
1177 $mech->dump_images( [[$fh], $absolute] )
1178 Prints a dump of the images on the current page to $fh. If $fh is not
1179 specified or is undef, it dumps to STDOUT.
1180
1181 If $absolute is true, links displayed are absolute, not relative.
1182
1183 The output will include empty lines for images that have no "src"
1184 attribute and therefore no URL.
1185
1186 $mech->dump_forms( [$fh] )
1187 Prints a dump of the forms on the current page to $fh. If $fh is not
1188 specified or is undef, it dumps to STDOUT. Running the following:
1189
1190 my $mech = WWW::Mechanize->new();
1191 $mech->get("https://www.google.com/");
1192 $mech->dump_forms;
1193
1194 will print:
1195
1196 GET https://www.google.com/search [f]
1197 ie=ISO-8859-1 (hidden readonly)
1198 hl=en (hidden readonly)
1199 source=hp (hidden readonly)
1200 biw= (hidden readonly)
1201 bih= (hidden readonly)
1202 q= (text)
1203 btnG=Google Search (submit)
1204 btnI=I'm Feeling Lucky (submit)
1205 gbv=1 (hidden readonly)
1206
1207 $mech->dump_text( [$fh] )
1208 Prints a dump of the text on the current page to $fh. If $fh is not
1209 specified or is undef, it dumps to STDOUT.
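The dump methods accept any filehandle, so their output can be captured rather than printed. The sketch below assumes a hypothetical local page and uses in-memory filehandles (opened on scalar references) to collect the output of "dump_links()" and "dump_forms()".

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical page with one link and one form.
my $page = File::Temp->new( SUFFIX => '.html' );
print {$page} <<'HTML';
<html><body>
<a href="about.html">About</a>
<form name="search">
<input type="text" name="q">
<input type="submit" name="go" value="Search">
</form>
</body></html>
HTML
close $page;

my $mech = WWW::Mechanize->new;
$mech->get( URI::file->new_abs( $page->filename ) );

# Capture the link dump; a true second argument asks for absolute URLs.
open my $links_fh, '>', \my $links or die "Cannot open buffer: $!";
$mech->dump_links( $links_fh, 1 );
close $links_fh;

# Capture the form dump the same way.
open my $forms_fh, '>', \my $forms or die "Cannot open buffer: $!";
$mech->dump_forms($forms_fh);
close $forms_fh;

print "Links:\n$links";
print "Forms:\n$forms";
```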
1210
1212 $mech->clone()
1213 Clone the mech object. The clone will be using the same cookie jar as
1214 the original mech.
1215
1216 $mech->redirect_ok()
1217 An overridden version of "redirect_ok()" in LWP::UserAgent. This
1218 method is used to determine whether a redirection in the request should
1219 be followed.
1220
1221 Note that WWW::Mechanize's constructor pushes POST on to the agent's
1222 "requests_redirectable" list.
1223
1224 $mech->request( $request [, $arg [, $size]])
1225 Overridden version of "request()" in LWP::UserAgent. Performs the
1226 actual request. Normally, if you're using WWW::Mechanize, it's because
1227 you don't want to deal with this level of stuff anyway.
1228
1229 Note that $request will be modified.
1230
1231 Returns an HTTP::Response object.
1232
1233 $mech->update_html( $html )
1234 Allows you to replace the HTML that the mech has found. Updates the
1235 forms and links parse-trees that the mech uses internally.
1236
1237 Say you have a page that you know has malformed output, and you want to
1238 update it so the links come out correctly:
1239
1240 my $html = $mech->content;
1241 $html =~ s[</option>.{0,3}</td>][</option></select></td>]isg;
1242 $mech->update_html( $html );
1243
1244 This method is also used internally by the mech itself to update its
1245 own HTML content when loading a page. This means that if you would like
1246 to systematically perform the above HTML substitution, you would
1247 override update_html in a subclass thusly:
1248
1249 package MyMech;
1250 use base 'WWW::Mechanize';
1251
1252 sub update_html {
1253 my ($self, $html) = @_;
1254 $html =~ s[</option>.{0,3}</td>][</option></select></td>]isg;
1255 $self->WWW::Mechanize::update_html( $html );
1256 }
1257
1258 If you do this, then the mech will use the tidied-up HTML instead of
1259 the original both when parsing for its own needs, and for returning to
1260 you through "content()".
1261
1262 Overriding this method is also the recommended way of implementing
1263 extra validation steps (e.g. link checkers) for every HTML page
1264 received. "warn" and "die" would then come in handy to signal
1265 validation errors.
1266
1267 $mech->credentials( $username, $password )
1268 Provide credentials to be used for HTTP Basic authentication for all
1269 sites and realms until further notice.
1270
1271 The four argument form described in LWP::UserAgent is still supported.
1272
1273 $mech->get_basic_credentials( $realm, $uri, $isproxy )
1274 Returns the credentials for the realm and URI.
1275
1276 $mech->clear_credentials()
1277 Remove any credentials set up with "credentials()".
1278
1280 As a subclass of LWP::UserAgent, WWW::Mechanize inherits all of
1281 LWP::UserAgent's methods, many of which are overridden or extended.
1282 The following methods are inherited unchanged. View the LWP::UserAgent
1283 documentation for their implementation descriptions.
1284
1285 This is not meant to be an exhaustive list. LWP::UA may have added
1286 others.
1287
1288 $mech->head()
1289 Inherited from LWP::UserAgent.
1290
1291 $mech->mirror()
1292 Inherited from LWP::UserAgent.
1293
1294 $mech->simple_request()
1295 Inherited from LWP::UserAgent.
1296
1297 $mech->is_protocol_supported()
1298 Inherited from LWP::UserAgent.
1299
1300 $mech->prepare_request()
1301 Inherited from LWP::UserAgent.
1302
1303 $mech->progress()
1304 Inherited from LWP::UserAgent.
1305
1307 These methods are only used internally. You probably don't need to
1308 know about them.
1309
1310 $mech->_update_page($request, $response)
1311 Updates all internal variables in $mech as if $request was just
1312 performed, and returns $response. The page stack is not altered by this
1313 method; it is up to the caller (e.g. "request") to do that.
1314
1315 $mech->_modify_request( $req )
1316 Modifies an HTTP::Request before the request is sent out, for both GET
1317 and POST requests.
1318
1319 We add a "Referer" header, as well as a header to note that we can accept
1320 gzip encoded content, if Compress::Zlib is installed.
1321
1322 $mech->_make_request()
1323 Convenience method to make it easier for subclasses like
1324 WWW::Mechanize::Cached to intercept the request.
1325
1326 $mech->_reset_page()
1327 Resets the internal fields that track parsed page content.
1328
1329 $mech->_extract_links()
1330 Extracts links from the content of a webpage, and populates the
1331 "{links}" property with WWW::Mechanize::Link objects.
1332
1333 $mech->_push_page_stack()
1334 The agent keeps a stack of visited pages, which it can pop when it
1335 needs to go BACK and so on.
1336
1337 The current page needs to be pushed onto the stack before we get a new
1338 page, and the stack needs to be popped when BACK occurs.
1339
1340 Neither of these takes any arguments; they just operate on the $mech
1341 object.
1342
1343 warn( @messages )
1344 Centralized warning method, for diagnostics and non-fatal problems.
1345 Defaults to calling "CORE::warn", but may be overridden by setting
1346 "onwarn" in the constructor.
1347
1348 die( @messages )
1349 Centralized error method. Defaults to calling "CORE::die", but may be
1350 overridden by setting "onerror" in the constructor.
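The "onwarn" and "onerror" constructor settings can be sketched like this. The handlers here simply collect messages into arrays (the array names are illustrative); note that an "onerror" handler that returns instead of dying lets execution continue after an error, which may or may not be what you want.

```perl
use strict;
use warnings;
use WWW::Mechanize ();

# Collect diagnostics instead of printing or dying.
my ( @warnings, @errors );
my $mech = WWW::Mechanize->new(
    onwarn  => sub { push @warnings, join q{}, @_ },
    onerror => sub { push @errors,   join q{}, @_ },
);

# Routed to the onwarn handler.
$mech->warn('first warning');

# With quiet on, warnings are dropped before reaching the handler.
$mech->quiet(1);
$mech->warn('suppressed warning');
$mech->quiet(0);

# Routed to the onerror handler, which here records and returns.
$mech->die('recorded, not fatal');

print 'warnings: ', scalar @warnings, "\n";
print 'errors:   ', scalar @errors,   "\n";
```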
1351
1353 The default settings can get you up and running quickly, but there are
1354 settings you can change in order to make your life easier.
1355
1356 autocheck
1357 "autocheck" can save you the overhead of checking status codes for
1358 success. You may outgrow it as your needs get more sophisticated,
1359 but it's a safe option to start with.
1360
1361 my $agent = WWW::Mechanize->new( autocheck => 1 );
1362
1363 cookie_jar
1364 You are encouraged to install Mozilla::PublicSuffix and use
1365 HTTP::CookieJar::LWP as your cookie jar. HTTP::CookieJar::LWP
1366 provides a better security model matching that of current Web
1367 browsers when Mozilla::PublicSuffix is installed.
1368
1369 use HTTP::CookieJar::LWP ();
1370
1371 my $jar = HTTP::CookieJar::LWP->new;
1372 my $agent = WWW::Mechanize->new( cookie_jar => $jar );
1373
1374 protocols_allowed
1375 This option is inherited directly from LWP::UserAgent. It allows
1376 you to whitelist the protocols you're willing to allow.
1377
1378 my $agent = WWW::Mechanize->new(
1379 protocols_allowed => [ 'http', 'https' ]
1380 );
1381
1382 This will prevent you from inadvertently following URLs like
1383 "file:///etc/passwd".
1384
1385 protocols_forbidden
1386 This option is also inherited directly from LWP::UserAgent. It
1387 allows you to blacklist the protocols you're unwilling to allow.
1388
1389 my $agent = WWW::Mechanize->new(
1390 protocols_forbidden => [ 'file', 'mailto', 'ssh', ]
1391 );
1392
1393 This will prevent you from inadvertently following URLs like
1394 "file:///etc/passwd".
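The effect of restricting protocols can be checked without touching the network. In this sketch autocheck is turned off so that the refused request comes back as an error response instead of an exception; the file is never read because the scheme is rejected first.

```perl
use strict;
use warnings;
use WWW::Mechanize ();

# Only http and https are permitted; autocheck is off so a refused
# request is returned as an error response rather than dying.
my $mech = WWW::Mechanize->new(
    autocheck         => 0,
    protocols_allowed => [ 'http', 'https' ],
);

# is_protocol_supported() is inherited from LWP::UserAgent and
# honors the restriction.
my $file_supported = $mech->is_protocol_supported('file');

# The request is refused before any file access happens.
my $response = $mech->get('file:///etc/passwd');
my $refused  = !$response->is_success;

print 'file scheme supported: ', ( $file_supported ? 'yes' : 'no' ), "\n";
print 'status: ', $response->status_line, "\n";
```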
1395
1396 strict_forms
1397 Consider turning on the "strict_forms" option when you create a new
1398 Mech. This will perform a helpful sanity check on form fields
1399 every time you are submitting a form, which can save you a lot of
1400 debugging time.
1401
1402 my $agent = WWW::Mechanize->new( strict_forms => 1 );
1403
1404 If you do not want to have this option globally, you can still turn
1405 it on for individual forms.
1406
1407 $agent->submit_form( fields => { foo => 'bar' } , strict_forms => 1 );
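A sketch of the sanity check in action, assuming a hypothetical local login page. The misspelled field name makes "submit_form()" die before any request is sent, so the call is wrapped in "eval"; the exact error text comes from HTML::Form and is not relied on here.

```perl
use strict;
use warnings;
use File::Temp ();
use URI::file ();
use WWW::Mechanize ();

# Hypothetical login page.
my $page = File::Temp->new( SUFFIX => '.html' );
print {$page} <<'HTML';
<html><body><form>
<input type="text" name="username">
<input type="password" name="password">
</form></body></html>
HTML
close $page;

my $mech = WWW::Mechanize->new( strict_forms => 1 );
$mech->get( URI::file->new_abs( $page->filename ) );

# "usernaem" is a deliberate typo; strict mode rejects it.
my $ok = eval {
    $mech->submit_form( fields => { usernaem => 'mungo' } );
    1;
};
my $error = $@;

print "submit_form died: $error" unless $ok;
```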
1408
1410 WWW::Mechanize is hosted at GitHub.
1411
1412 Repository: <https://github.com/libwww-perl/WWW-Mechanize>. Bugs:
1413 <https://github.com/libwww-perl/WWW-Mechanize/issues>.
1414
1416 Spidering Hacks, by Kevin Hemenway and Tara Calishain
1417 Spidering Hacks from O'Reilly
1418 (<http://www.oreilly.com/catalog/spiderhks/>) is a great book for
1419 anyone wanting to know more about screen-scraping and spidering.
1420
1421 There are six hacks that use Mech or a Mech derivative:
1422
1423 #21 WWW::Mechanize 101
1424 #22 Scraping with WWW::Mechanize
1425 #36 Downloading Images from Webshots
1426 #44 Archiving Yahoo! Groups Messages with WWW::Yahoo::Groups
1427 #64 Super Author Searching
1428 #73 Scraping TV Listings
1429
1430 The book was also positively reviewed on Slashdot:
1431 <http://books.slashdot.org/article.pl?sid=03/12/11/2126256>
1432
1434 • WWW::Mechanize mailing list
1435
1436 The Mech mailing list is at
1437 <http://groups.google.com/group/www-mechanize-users> and is
1438 specific to Mechanize, unlike the LWP mailing list below. Although
1439 it is a users list, all development discussion takes place here,
1440 too.
1441
1442 • LWP mailing list
1443
1444 The LWP mailing list is at
1445 <http://lists.perl.org/showlist.cgi?name=libwww>, and is more user-
1446 oriented and well-populated than the WWW::Mechanize list.
1447
1448 • Perlmonks
1449
1450 <http://perlmonks.org> is an excellent community of support, and
1451 many questions about Mech have already been answered there.
1452
1453 • WWW::Mechanize::Examples
1454
1455 A random array of examples submitted by users, included with the
1456 Mechanize distribution.
1457
1459 • <http://www.ibm.com/developerworks/linux/library/wa-perlsecure/>
1460
1461 IBM article "Secure Web site access with Perl"
1462
1463 • <http://www.oreilly.com/catalog/googlehks2/chapter/hack84.pdf>
1464
1465 Leland Johnson's hack #84 in Google Hacks, 2nd Edition is an
1466 example of a production script that uses WWW::Mechanize and
1467 HTML::TableContentParser. It takes in keywords and returns the
1468 estimated price of these keywords on Google's AdWords program.
1469
1470 • <http://www.perl.com/pub/a/2004/06/04/recorder.html>
1471
1472 Linda Julien writes about using HTTP::Recorder to create
1473 WWW::Mechanize scripts.
1474
1475 • <http://www.developer.com/lang/other/article.php/3454041>
1476
1477 Jason Gilmore's article on using WWW::Mechanize for scraping sales
1478 information from Amazon and eBay.
1479
1480 • <http://www.perl.com/pub/a/2003/01/22/mechanize.html>
1481
1482 Chris Ball's article about using WWW::Mechanize for scraping TV
1483 listings.
1484
1485 • <http://www.stonehenge.com/merlyn/LinuxMag/col47.html>
1486
1487 Randal Schwartz's article on scraping Yahoo News for images. It's
1488 already out of date: He manually walks the list of links hunting
1489 for matches, which wouldn't have been necessary had the
1490 "find_link()" method existed at press time.
1491
1492 • <http://www.perladvent.org/2002/16th/>
1493
1494 WWW::Mechanize on the Perl Advent Calendar, by Mark Fowler.
1495
1496 • <http://www.linux-magazin.de/ausgaben/2004/03/datenruessel/>
1497
1498 Michael Schilli's article on Mech and WWW::Mechanize::Shell for the
1499 German magazine Linux Magazin.
1500
1501 Other modules that use Mechanize
1502 Here are modules that use or subclass Mechanize. Let me know of any
1503 others:
1504
1505 • Finance::Bank::LloydsTSB
1506
1507 • HTTP::Recorder
1508
1509 Acts as a proxy for web interaction, and then generates
1510 WWW::Mechanize scripts.
1511
1512 • Win32::IE::Mechanize
1513
1514 Just like Mech, but using Microsoft Internet Explorer to do the
1515 work.
1516
1517 • WWW::Bugzilla
1518
1519 • WWW::CheckSite
1520
1521 • WWW::Google::Groups
1522
1523 • WWW::Hotmail
1524
1525 • WWW::Mechanize::Cached
1526
1527 • WWW::Mechanize::Cached::GZip
1528
1529 • WWW::Mechanize::FormFiller
1530
1531 • WWW::Mechanize::Shell
1532
1533 • WWW::Mechanize::Sleepy
1534
1535 • WWW::Mechanize::SpamCop
1536
1537 • WWW::Mechanize::Timed
1538
1539 • WWW::SourceForge
1540
1541 • WWW::Yahoo::Groups
1542
1543 • WWW::Scripter
1544
1546 Thanks to the numerous people who have helped out on WWW::Mechanize in
1547 one way or another, including Kirrily Robert for the original
1548 "WWW::Automate", Lyle Hopkins, Damien Clark, Ansgar Burchardt, Gisle
1549 Aas, Jeremy Ary, Hilary Holz, Rafael Kitover, Norbert Buchmuller, Dave
1550 Page, David Sainty, H.Merijn Brand, Matt Lawrence, Michael Schwern,
1551 Adriano Ferreira, Miyagawa, Peteris Krumins, David
1552 Steinbrunner, Kevin Falcone, Mike O'Regan, Mark Stosberg, Uri Guttman,
1553 Peter Scott, Philippe Bruhat, Ian Langworth, John Beppu, Gavin Estey,
1554 Jim Brandt, Ask Bjoern Hansen, Greg Davies, Ed Silva, Mark-Jason
1555 Dominus, Autrijus Tang, Mark Fowler, Stuart Children, Max Maischein,
1556 Meng Wong, Prakash Kailasa, Abigail, Jan Pazdziora, Dominique
1557 Quatravaux, Scott Lanning, Rob Casey, Leland Johnson, Joshua Gatcomb,
1558 Julien Beasley, Abe Timmerman, Peter Stevens, Pete Krawczyk, Tad
1559 McClellan, and the late great Iain Truskett.
1560
1562 Andy Lester <andy at petdance.com>
1563
1565 This software is copyright (c) 2004 by Andy Lester.
1566
1567 This is free software; you can redistribute it and/or modify it under
1568 the same terms as the Perl 5 programming language system itself.
1569
1570
1571
1572perl v5.32.1 2021-01-27 WWW::Mechanize(3)