HTML::TreeBuilderX::ASP_NET(3)  User Contributed Perl Documentation  HTML::TreeBuilderX::ASP_NET(3)

NAME

       HTML::TreeBuilderX::ASP_NET - Scrape ASP.NET/VB.NET sites which utilize
       JavaScript POST-backs.

SYNOPSIS

               use LWP::UserAgent;
               use HTML::TreeBuilder;
               use HTML::TreeBuilderX::ASP_NET;

               my $ua = LWP::UserAgent->new;
               my $resp = $ua->get('http://uniqueUrl.com/Server.aspx');
               my $root = HTML::TreeBuilder->new_from_content( $resp->content );
               my $a = $root->look_down( _tag => 'a', id => 'nextPage' );
               my $aspnet = HTML::TreeBuilderX::ASP_NET->new({
                       element   => $a
                       , baseURL => $resp->request->uri ## takes into account posting redirects
               });
               ## httpRequest yields the HTTP::Request for the post-back
               my $next = $ua->request( $aspnet->httpRequest );

               ## or the easy cheating way; see the SEE ALSO section for links
               my $aspnet = HTML::TreeBuilderX::ASP_NET->new_with_traits( traits => ['htmlElement'] );
               $form->look_down( _tag => 'a' )->httpRequest;

DESCRIPTION

       Scrape ASP.NET sites which utilize the framework's __VIEWSTATE,
       __EVENTTARGET, __EVENTARGUMENT, __LASTFOCUS, et al. This module builds
       an HTTP::Request for the form's post-back, available through the
       method "->httpRequest".

       In this scheme many of the links on a webpage will appear to be
       JavaScript functions. The default JavaScript function is
       "__doPostBack(eventTarget, eventArgument)". ASP.NET has two hidden
       fields which record state: __VIEWSTATE and __LASTFOCUS. It abstracts
       each link with a method that utilizes an HTTP post-back to the server.
       The JavaScript behind "__doPostBack" simply appends
       __EVENTTARGET=$eventTarget&__EVENTARGUMENT=$eventArgument onto the POST
       request from the parent form and submits it. When the server receives
       this request it decodes and decompresses the __VIEWSTATE and uses it
       along with the new __EVENTTARGET and __EVENTARGUMENT to perform the
       action, which is often no more than serializing the data back into the
       __VIEWSTATE.
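
       The post-back described above can be sketched in plain Perl. This is
       only an illustration of how the two event fields end up in the POST
       body, not the module's own code; the helper names form_escape and
       build_postback_body are hypothetical, and the field values are taken
       from the diagram in this section.

```perl
use strict;
use warnings;

# Percent-encode a value for an application/x-www-form-urlencoded body.
# Assumes plain byte strings; wide characters are out of scope here.
sub form_escape {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9\-_.~ ])/sprintf('%%%02X', ord $1)/ge;
    $value =~ tr/ /+/;
    return $value;
}

# Hypothetical helper: join the form's existing fields with the two
# post-back fields, mirroring what __doPostBack submits.
sub build_postback_body {
    my ($fields, $event_target, $event_argument) = @_;
    my @pairs = (
        @$fields,
        [ __EVENTTARGET   => $event_target ],
        [ __EVENTARGUMENT => $event_argument ],
    );
    return join '&',
        map { form_escape( $_->[0] ) . '=' . form_escape( $_->[1] ) } @pairs;
}

my $body = build_postback_body(
    [ [ __VIEWSTATE => 'encryptedXML-FOO' ] ],
    'gotopage', '2',
);
print "$body\n";
```

       Run as is, the sketch prints the same body that appears in the POST
       box of the diagram. In real use the module assembles this request for
       you from the parsed form.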

       Sometimes developers cloak the "__doPostBack(target,arg)" with names
       akin to "changepage(arg)" which simply call "__doPostBack("target",
       arg)". This module will handle this use case as well, using an
       explicit eventTriggerArgument in the constructor.
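
       A rough sketch of the cloaked case follows. It assumes you already
       know, from reading the page's JavaScript, which real target a wrapper
       such as "changepage" forwards to; the %wrapper_target table and the
       wrapper_to_trigger helper are hypothetical and not part of this
       module. The resulting HashRef has the { eventTarget => eventArgument }
       shape that eventTriggerArgument is documented to take.

```perl
use strict;
use warnings;

# Hypothetical table: wrapper function name => real __EVENTTARGET.
my %wrapper_target = ( changepage => 'gotopage' );

# Turn "javascript:changepage(2)" into { gotopage => '2' }.
# Returns undef when the href is not a known wrapper call.
sub wrapper_to_trigger {
    my ($href) = @_;
    my ( $name, $arg ) =
        $href =~ /^\s*javascript:\s*(\w+)\(\s*['"]?([^'")]*?)['"]?\s*\)/i
        or return;
    my $target = $wrapper_target{$name} or return;
    return +{ $target => $arg };
}

my $trigger = wrapper_to_trigger('javascript:changepage(2)');
print join( '=', %$trigger ), "\n";

# The HashRef would then be handed to the constructor, for example:
#   HTML::TreeBuilderX::ASP_NET->new({
#       form => $form, eventTriggerArgument => $trigger
#   });
```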

       This flow is a bane on RESTful HTTP and makes no sense whatsoever.
       Thanks Microsoft.

             .-------------------------------------------------------------------.
             |                            HTML FORM 1                            |
             | <form action="Server.aspx" method="post">                         |
             | <input type="hidden" name="__VIEWSTATE" value="encryptedXML-FOO"> |
             | <a>1</a> |                                                        |
             | <a href="javascript:__doPostBack('gotopage','2')">2</a>           |
             | ...                                                               |
             '-------------------------------------------------------------------'
                                               |
                                               v
                              _________________________________
                              \                                \
                               ) User clicks the link named "2" )
                              /________________________________/
                                               |
                                               v
          .------------------------------------------------------------------------.
          | POST http://aspxnonsensery/Server.aspx                                 |
          | Content-Length: 2659                                                   |
          | Content-Type: application/x-www-form-urlencoded                        |
          |                                                                        |
          | __VIEWSTATE=encryptedXML-FOO&__EVENTTARGET=gotopage&__EVENTARGUMENT=2  |
          '------------------------------------------------------------------------'
                                               |
                                               v
           .----------------------------------------------------------------------.
           |                             HTML FORM 2                              |
           |                       (different __VIEWSTATE)                        |
           | <form action="Server.aspx" method="post">                            |
           | <input type="hidden" name="__VIEWSTATE" value="encryptedXML-BAR">    |
           | <a href="javascript:__doPostBack('gotopage','1')">1</a> |            |
           | <a>2</a>                                                             |
           | ...                                                                  |
           '----------------------------------------------------------------------'

   METHODS
        IN ADDITION TO ALL OF THE METHODS FROM HTTP::Request::Form

       ->new({ hashref })
           Takes a HashRef and returns a new instance. Some of the possible
           key/value pairs are:

           form => $htmlElement
               optional: Explicitly supplies the HTML::Element representing
               the form. If you do not supply one, it will be deduced
               implicitly from $self->element, which makes
               element => $htmlElement a requirement.

           eventTriggerArgument => $hashRef
               Not needed if you supply an element. This takes a HashRef and
               creates HTML::Elements that mimic hidden input fields, which
               are then tacked onto $self->form.

           element => $htmlElement
               Not needed if you send an eventTriggerArgument. Attempts to
               deduce the __EVENTARGUMENT and __EVENTTARGET from the 'href'
               attribute of the element, just as if the two were supplied
               explicitly. It will also be used to deduce a form, by looking
               up the HTML tree, if one is not supplied.

           debug => *0|1
               optional: Passes the debug flag to HTTP::Request::Form
               (H:R:F). The default is off.

           baseURL => $uri
               optional: Sets the base of the URL for the post action.

       ->httpRequest
           Returns an HTTP::Request object for the HTTP POST.

       ->hrf
           Explicitly returns the underlying HTTP::Request::Form object. All
           methods fall back to it anyway, but this returns the object
           directly.

   FUNCTIONS
       None of these are exported...

       createInputElements( {eventTarget => eventArgument} )
           Helper function that takes two values in a HashRef. Assumes the
           key is the __EVENTTARGET and the value the __EVENTARGUMENT;
           returns two HTML::Element pseudo-input fields with the
           information.

       parseDoPostBack( $str )
           Accepts a string that is often the "href" attribute of an
           HTML::Element. It simply parses out the call to JavaScript, using
           regexes, and makes the two args usable to Perl in the form of a
           HashRef.
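
       To make the parsing step concrete, here is a minimal sketch of pulling
       the two arguments out of an href with a regex. It is a stand-in written
       for illustration, not the module's actual parseDoPostBack; the name
       parse_do_post_back_sketch is hypothetical, and the returned
       { eventTarget => eventArgument } shape is an assumption borrowed from
       the createInputElements signature above.

```perl
use strict;
use warnings;

# Hypothetical stand-in: extract the two quoted arguments of a
# __doPostBack(...) call. Returns undef when no such call is found.
sub parse_do_post_back_sketch {
    my ($href) = @_;
    return unless $href =~ m{
        __doPostBack \s* \( \s*
        (['"]) (.*?) \1 \s* , \s*    # first argument:  the event target
        (['"]) (.*?) \3 \s* \)       # second argument: the event argument
    }x;
    return +{ $2 => $4 };
}

my $parsed = parse_do_post_back_sketch(
    q{javascript:__doPostBack('gotopage','2')}
);
my ($target) = keys %$parsed;
print "$target => $parsed->{$target}\n";
```

       The sketch only handles quoted string arguments; hrefs that pass
       unquoted values or expressions would need a more forgiving pattern.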

SEE ALSO

       HTML::TreeBuilderX::ASP_NET::Roles::htmlElement
           For an easy way to glue the two together

       HTTP::Request
           For the object the method httpRequest returns

       HTTP::Request::Form
           For a base class, all of whose methods are available

       HTML::Element
           For the base class of all HTML tokens

AUTHOR

       Evan Carroll, "<me at evancarroll.com>"

BUGS

       None, though *much* more support should be added to ->element. Not
       everything is a simple anchor tag.

SUPPORT

       You can find documentation for this module with the perldoc command.

           perldoc HTML::TreeBuilderX::ASP_NET

       You can also look for information at:

       ·   RT: CPAN's request tracker

           <http://rt.cpan.org/NoAuth/Bugs.html?Dist=HTML-TreeBuilderX-ASP_NET>

       ·   AnnoCPAN: Annotated CPAN documentation

           <http://annocpan.org/dist/HTML-TreeBuilderX-ASP_NET>

       ·   CPAN Ratings

           <http://cpanratings.perl.org/d/HTML-TreeBuilderX-ASP_NET>

       ·   Search CPAN

           <http://search.cpan.org/dist/HTML-TreeBuilderX-ASP_NET>

COPYRIGHT & LICENSE

       Copyright 2008 Evan Carroll, all rights reserved.

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.28.1                      2009-08-26    HTML::TreeBuilderX::ASP_NET(3)