HTML::TreeBuilderX::ASP_NET(3)  User Contributed Perl Documentation  HTML::TreeBuilderX::ASP_NET(3)

NAME
       HTML::TreeBuilderX::ASP_NET - Scrape ASP.NET/VB.NET sites which utilize
       JavaScript POST-backs.

SYNOPSIS
           use LWP::UserAgent;
           use HTML::TreeBuilder;
           use HTML::TreeBuilderX::ASP_NET;

           my $ua   = LWP::UserAgent->new;
           my $resp = $ua->get('http://uniqueUrl.com/Server.aspx');
           my $root = HTML::TreeBuilder->new_from_content( $resp->content );
           my $link = $root->look_down( _tag => 'a', id => 'nextPage' );
           my $aspnet = HTML::TreeBuilderX::ASP_NET->new({
               element   => $link
               , baseURL => $resp->request->uri ## takes into account posting redirects
           });
           ## ->httpRequest (see METHODS) gives the HTTP::Request that LWP sends
           my $next = $ua->request( $aspnet->httpRequest );

           ## or the easy cheating way see the SEE ALSO section for links
           my $aspnet = HTML::TreeBuilderX::ASP_NET->new_with_traits( traits => ['htmlElement'] );
           $form->look_down(_tag=> 'a')->httpRequest;

DESCRIPTION
       Scrape ASP.NET sites which utilize the framework's __VIEWSTATE,
       __EVENTTARGET, __EVENTARGUMENT, __LASTFOCUS, et al. This module returns
       an HTTP::Request for the form through the method "->httpRequest".

       In this scheme many of the links on a webpage will appear to be
       JavaScript functions. The default JavaScript function is
       "__doPostBack(eventTarget, eventArgument)". ASP.NET has two hidden
       fields which record state: __VIEWSTATE and __LASTFOCUS. It abstracts
       each link with a method that utilizes an HTTP post-back to the server.
       The JavaScript behind "__doPostBack" simply appends
       __EVENTTARGET=$eventTarget&__EVENTARGUMENT=$eventArgument onto the POST
       request from the parent form and submits it. When the server receives
       this request it decodes and decompresses the __VIEWSTATE and uses it
       along with the new __EVENTTARGET and __EVENTARGUMENT to perform the
       action, which is often no more than serializing the data back into the
       __VIEWSTATE.
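
       The post-back described above can be sketched in plain Perl. This is
       only an illustration of how the request body is put together, not the
       module's own code: the names form_escape and postback_body are
       hypothetical, the escaping is a simplified hand-rolled version, and
       the field values are the placeholders used elsewhere in this page.

```perl
use strict;
use warnings;

# Percent-encode one value for application/x-www-form-urlencoded.
# Simplified and hand-rolled so the sketch needs no CPAN modules;
# it assumes plain byte strings.
sub form_escape {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9\-_.~ ])/sprintf '%%%02X', ord $1/ge;
    $value =~ tr/ /+/;
    return $value;
}

# Build the POST body: the form's existing hidden fields plus the two
# fields that __doPostBack fills in before it submits the form.
sub postback_body {
    my ($fields, $target, $argument) = @_;
    my %post = (
        %$fields,
        __EVENTTARGET   => $target,
        __EVENTARGUMENT => $argument,
    );
    return join '&',
        map { form_escape($_) . '=' . form_escape( $post{$_} ) }
        sort keys %post;
}

# Hypothetical values taken from the diagram in this page.
my $body = postback_body( { __VIEWSTATE => 'encryptedXML-FOO' }, 'gotopage', '2' );
print "$body\n";
# prints: __EVENTARGUMENT=2&__EVENTTARGET=gotopage&__VIEWSTATE=encryptedXML-FOO
```

       The keys are sorted only to make the output predictable; the order
       of fields in a real form submission follows the form itself.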

       Sometimes developers cloak the "__doPostBack(target,arg)" with names
       akin to "changepage(arg)" which simply call "__doPostBack("target",
       arg)". This module handles this use case as well, through an explicit
       eventTriggerArgument in the constructor.
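
       One way to prepare such an explicit argument is sketched below. It is
       an assumption-laden illustration, not part of the module: the helper
       wrapper_to_trigger and the %wrapper_target table are hypothetical, and
       the table must be filled in by reading the site's own JavaScript. The
       result is a HashRef with the event target as key and the event
       argument as value.

```perl
use strict;
use warnings;

# Hypothetical mapping from a site's wrapper function to the event target
# it hard-codes when it calls __doPostBack.
my %wrapper_target = ( changepage => 'gotopage' );

# Turn "javascript:changepage(2)" into { gotopage => '2' }.
# Returns nothing when the href is not a known wrapper call.
sub wrapper_to_trigger {
    my ($href) = @_;
    my ($name, $arg) =
        $href =~ m{ (\w+) \s* \( \s* ["']? ([^"')]*?) ["']? \s* \) }x
        or return;
    my $target = $wrapper_target{$name} or return;
    return +{ $target => $arg };
}

my $trigger = wrapper_to_trigger('javascript:changepage(2)');
print join( ' => ', %$trigger ), "\n";

# The HashRef has the target => argument shape described for
# eventTriggerArgument, so it could be handed to the constructor as:
#   HTML::TreeBuilderX::ASP_NET->new({
#       form => $form, eventTriggerArgument => $trigger,
#   });
```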

       This flow is a bane on RESTful HTTP and makes no sense whatsoever.
       Thanks Microsoft.

           .-------------------------------------------------------------------.
           |                            HTML FORM 1                            |
           | <form action="Server.aspx" method="post">                         |
           | <input type="hidden" name="__VIEWSTATE" value="encryptedXML-FOO"> |
           | <a>1</a> |                                                        |
           | <a href="javascript:__doPostBack('gotopage','2')">2</a>           |
           | ...                                                               |
           '-------------------------------------------------------------------'
                                             |
                                             v
                             _________________________________
                            \                                \
                             ) User clicks the link named "2" )
                            /________________________________/
                                             |
                                             v
           .------------------------------------------------------------------------.
           | POST http://aspxnonsensery/Server.aspx                                 |
           | Content-Length: 2659                                                   |
           | Content-Type: application/x-www-form-urlencoded                        |
           |                                                                        |
           | __VIEWSTATE=encryptedXML-FOO&__EVENTTARGET=gotopage&__EVENTARGUMENT=2  |
           '------------------------------------------------------------------------'
                                             |
                                             v
           .----------------------------------------------------------------------.
           |                             HTML FORM 2                              |
           |                       (different __VIEWSTATE)                        |
           | <form action="Server.aspx" method="post">                            |
           | <input type="hidden" name="__VIEWSTATE" value="encryptedXML-BAR">    |
           | <a href="javascript:__doPostBack('gotopage','1')">1</a> |            |
           | <a>2</a>                                                             |
           | ...                                                                  |
           '----------------------------------------------------------------------'

METHODS
   IN ADDITION TO ALL OF THE METHODS FROM HTTP::Request::Form

       ->new({ hashref })
           Takes a HashRef and returns a new instance. Some of the possible
           key/value pairs are:

           form => $htmlElement
               optional: Explicitly supplies the HTML::Element representing
               the form. If you do not supply one, it will be deduced from
               $self->element, which makes element => $htmlElement a
               requirement.

           eventTriggerArgument => $hashRef
               Not needed if you supply an element. This takes a HashRef and
               creates HTML::Elements that mimic hidden input fields, which
               are then tacked onto $self->form.

           element => $htmlElement
               Not needed if you send an eventTriggerArgument. Attempts to
               deduce the __EVENTARGUMENT and __EVENTTARGET from the 'href'
               attribute of the element just as if the two were supplied
               explicitly. It will also be used to deduce a form by looking
               up in the HTML tree if one is not supplied.

           debug => *0|1
               optional: Passes the debug flag to HTTP::Request::Form;
               default is off.

           baseURL => $uri
               optional: Sets the base of the URL for the post action.

       ->httpRequest
           Returns an HTTP::Request object for the HTTP POST.

       ->hrf
           Explicitly returns the underlying HTTP::Request::Form object. All
           methods fall back here anyway, but this will return that object
           directly.

FUNCTIONS
       None of these are exported...

       createInputElements( {eventTarget => eventArgument} )
           Helper function that takes a single key/value pair in a HashRef.
           Assumes the key is the __EVENTTARGET and the value the
           __EVENTARGUMENT, and returns two HTML::Element pseudo-input fields
           with the information.

       parseDoPostBack( $str )
           Accepts a string that is often the "href" attribute of an
           HTML::Element. It simply parses out the call to JavaScript, using
           regexes, and makes the two args usable to Perl in the form of a
           HashRef.
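
       The kind of regex parsing described for parseDoPostBack can be
       sketched as follows. This is not the module's implementation: the
       function name parse_postback_href and the HashRef keys eventTarget
       and eventArgument are assumptions made for the illustration, and the
       pattern only covers calls with two quoted arguments.

```perl
use strict;
use warnings;

# Hypothetical stand-in for parseDoPostBack: pulls the two quoted
# arguments out of a __doPostBack call found in an href-like string.
# Returns a HashRef on success and nothing when there is no such call.
sub parse_postback_href {
    my ($href) = @_;
    return unless defined $href;
    if ( $href =~ m{
            __doPostBack \s* \( \s*
            (["']) (.*?) \1 \s* , \s*    # first quoted argument, the event target
            (["']) (.*?) \3 \s*          # second quoted argument, the event argument
            \)
        }x )
    {
        return { eventTarget => $2, eventArgument => $4 };
    }
    return;
}

my $parsed = parse_postback_href(q{javascript:__doPostBack('gotopage','2')});
print "$parsed->{eventTarget} $parsed->{eventArgument}\n";
# prints: gotopage 2
```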

SEE ALSO
       HTML::TreeBuilderX::ASP_NET::Roles::htmlElement
           For an easy way to glue the two together

       HTTP::Request
           For the object the method httpRequest returns

       HTTP::Request::Form
           For a base class, to which all methods are valid

       HTML::Element
           For the base class of all HTML tokens

AUTHOR
       Evan Carroll, "<me at evancarroll.com>"

BUGS
       None, though *much* more support should be added to ->element. Not
       everything is a simple anchor tag.

SUPPORT
       You can find documentation for this module with the perldoc command.

           perldoc HTML::TreeBuilderX::ASP_NET

       You can also look for information at:

       ·   RT: CPAN's request tracker

           <http://rt.cpan.org/NoAuth/Bugs.html?Dist=HTML-TreeBuilderX-ASP_NET>

       ·   AnnoCPAN: Annotated CPAN documentation

           <http://annocpan.org/dist/HTML-TreeBuilderX-ASP_NET>

       ·   CPAN Ratings

           <http://cpanratings.perl.org/d/HTML-TreeBuilderX-ASP_NET>

       ·   Search CPAN

           <http://search.cpan.org/dist/HTML-TreeBuilderX-ASP_NET>

COPYRIGHT & LICENSE
       Copyright 2008 Evan Carroll, all rights reserved.

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.28.0                      2009-08-26      HTML::TreeBuilderX::ASP_NET(3)