LW2(3) User Contributed Perl Documentation LW2(3)

NAME
LW2 - Perl HTTP library version 2.4

SYNOPSIS
use LW2;

require 'LW2.pm';

DESCRIPTION
Libwhisker is a Perl library useful for HTTP testing scripts. It
contains a pure-Perl reimplementation of functionality found in the
"LWP", "URI", "Digest::MD5", "Digest::MD4", "Data::Dumper",
"Authen::NTLM", "HTML::Parser", "HTML::FormParser", "CGI::Upload",
"MIME::Base64", and "Getopt::Std" modules.

Libwhisker is designed to be portable (a single perl file), fast
(general benchmarks show libwhisker is faster than LWP), and flexible
(great care was taken to ensure the library does exactly what you want
it to do, even if it means breaking the protocol).

FUNCTIONS
The following are the functions contained in Libwhisker:
27
28 auth_brute_force
29 Params: $auth_method, \%req, $user, \@passwords [, $domain,
30 $fail_code ]
31
32 Return: $first_valid_password, undef if error/none found
33
Perform an HTTP authentication brute force against a server (host
35 and URI defined in %req). It will try every password in the pass‐
36 word array for the given user. The first password (in conjunction
37 with the given user) that doesn't return HTTP 401 is returned (and
38 the brute force is stopped at that point). You should retry the
39 request with the given password and double-check that you got a
40 useful HTTP return code that indicates successful authentication
41 (200, 302), and not something a bit more abnormal (407, 500, etc).
42 $domain is optional, and is only used for NTLM auth.
43
44 Note: set up any proxy settings and proxy auth in %req before call‐
45 ing this function.
46
47 You can brute-force proxy authentication by setting up the target
48 proxy as proxy_host and proxy_port in %req, using an arbitrary host
49 and uri (preferably one that is reachable upon successful proxy
50 authorization), and setting the $fail_code to 407. The
51 $auth_method passed to this function should be a proxy-based one
52 ('proxy-basic', 'proxy-ntlm', etc).
53
If your server returns something other than 401 upon auth failure,
then set $fail_code to whatever is returned (and it needs to be
something *different* from what is received on auth success, or
this function won't be able to tell the difference).
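
The stopping rule described above can be sketched without any network access. This is only an illustration, not LW2's implementation: first_accepted_password() and the $try callback are hypothetical names, and the callback stands in for whatever actually sends the request and returns the HTTP status code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of LW2): return the first password whose
# status code differs from $fail_code, or undef if every attempt fails.
sub first_accepted_password {
    my ($try, $user, $passwords, $fail_code) = @_;
    $fail_code = 401 unless defined $fail_code;   # default failure code
    for my $pass (@$passwords) {
        my $code = $try->($user, $pass);
        next unless defined $code;                # treat errors as failures
        return $pass if $code != $fail_code;      # stop at the first non-failure
    }
    return undef;
}

# Stand-in for a real server: only 'secret' is accepted.
my $fake_server = sub {
    my ($user, $pass) = @_;
    return $pass eq 'secret' ? 200 : 401;
};

my $found = first_accepted_password($fake_server, 'admin',
                                    [qw(guest admin secret letmein)]);
print defined $found ? "found: $found\n" : "none found\n";
```

As the text notes, a non-failure code is not proof of success, so the caller should still re-check the returned password against a real request.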
58
59 auth_unset
60 Params: \%req
61
62 Return: nothing (modifies %req)
63
Modifies %req to disable all authentication (regular and proxy).

Note: it removes the values set by auth_set(). Manually defined
[Proxy-]Authorization headers will also be deleted (but you
shouldn't be using the auth_* functions if you're manually handling
your own auth...)
70
71 auth_set
72 Params: $auth_method, \%req, $user, $password [, $domain]
73
74 Return: nothing (modifies %req)
75
Modifies %req to use the indicated authentication info.
77
78 Auth_method can be: 'basic', 'proxy-basic', 'ntlm', 'proxy-ntlm'.
79
80 Note: this function may not necessarily set any headers after being
81 called. Also, proxy-ntlm with SSL is not currently supported.
82
83 cookie_new_jar
84 Params: none
85
86 Return: $jar
87
88 Create a new cookie jar, for use with the other functions. Even
89 though the jar is technically just a hash, you should still use
90 this function in order to be future-compatible (should the jar for‐
91 mat change).
92
93 cookie_read
94 Params: $jar, \%response [, \%request, $reject ]
95
96 Return: $num_of_cookies_read
97
Read in cookies from a %response hash, and put them in $jar.
99
100 Notice: cookie_read uses internal magic done by http_do_request in
101 order to read cookies regardless of 'Set-Cookie[2]' header appear‐
102 ance.
103
104 If the optional %request hash is supplied, then it will be used to
105 calculate default host and path values, in case the cookie doesn't
106 specify them explicitly. If $reject is set to 1, then the %request
107 hash values are used to calculate and reject cookies which are not
108 appropriate for the path and domains of the given request.
109
110 cookie_parse
111 Params: $jar, $cookie [, $default_domain, $default_path, $reject ]
112
113 Return: nothing
114
115 Parses the cookie into the various parts and then sets the appro‐
116 priate values in the cookie $jar. If the cookie value is blank, it
117 will delete it from the $jar. See the 'docs/cookies.txt' document
118 for a full explanation of how Libwhisker parses cookies and what
119 RFC aspects are supported.
120
121 The optional $default_domain value is taken literally. Values with
122 no leading dot (e.g. 'www.host.com') are considered to be strict
123 hostnames and will only match the identical hostname. Values with
124 leading dots (e.g. '.host.com') are treated as sub-domain matches
125 for a single domain level. If the cookie does not indicate a
126 domain, and a $default_domain is not provided, then the cookie is
127 considered to match all domains/hosts.
128
129 The optional $default_path is used when the cookie does not specify
130 a path. $default_path must be absolute (start with '/'), or it
131 will be ignored. If the cookie does not specify a path, and
132 $default_path is not provided, then the default value '/' will be
133 used.
134
135 Set $reject to 1 if you wish to reject cookies based upon the pro‐
136 vided $default_domain and $default_path. Note that $default_domain
137 and $default_path must be specified for $reject to actually do
138 something meaningful.
139
140 cookie_write
141 Params: $jar, \%request, $override
142
143 Return: nothing
144
Goes through the given $jar and sets the Cookie header in %request
for the cookies that match the request's domain and path. If
$override is true, then the secure, domain and path restrictions of
the cookies are ignored and all cookies are essentially included.

Notice: cookie expiration is currently not implemented. URL
restriction comparison is also case-insensitive.
152
153 cookie_get
154 Params: $jar, $name
155
156 Return: @elements
157
158 Fetch the named cookie from the $jar, and return the components.
159 The returned items will be an array in the following order:
160
161 value, domain, path, expire, secure
162
value  = cookie value, should always be a non-empty string
domain = domain root for cookie, can be undefined
path   = URL path for cookie, should always be a non-empty string
expire = undefined (deprecated, but exists for backwards-compatibility)
secure = whether or not the cookie is limited to HTTPS; value is 0 or 1
168
169 cookie_get_names
170 Params: $jar
171
172 Return: @names
173
Fetch all the cookie names from the jar, which then lets you
cookie_get() them individually.
176
177 cookie_get_valid_names
178 Params: $jar, $domain, $url, $ssl
179
180 Return: @names
181
182 Fetch all the cookie names from the jar which are valid for the
given $domain, $url, and $ssl values. $domain should be a string
184 scalar of the target host domain ('www.example.com', etc.). $url
185 should be the absolute URL for the page ('/index.html',
186 '/cgi-bin/foo.cgi', etc.). $ssl should be 0 for non-secure cook‐
187 ies, or 1 for all (secure and normal) cookies. The return value is
188 an array of names compatible with cookie_get().
189
190 cookie_set
191 Params: $jar, $name, $value, $domain, $path, $expire, $secure
192
193 Return: nothing
194
Set the named cookie with the provided values into the $jar. $name
196 is required to be a non-empty string. $value is required, and will
197 delete the named cookie from the $jar if it is an empty string.
198 $domain and $path can be strings or undefined. $expire is ignored
199 (but exists for backwards-compatibility). $secure should be the
200 numeric value of 0 or 1.
201
202 crawl_new
203 Params: $START, $MAX_DEPTH, \%request_hash [, \%tracking_hash ]
204
205 Return: $crawl_object
206
The crawl_new() function initializes a crawl object (hash) to the
208 default values, and then returns it for later use by crawl().
209 $START is the starting URL (in the form of
210 'http://www.host.com/url'), and MAX_DEPTH is the maximum number of
211 levels to crawl (the START URL counts as 1, so a value of 2 will
212 crawl the START URL and all URLs found on that page). The
213 request_hash is a standard initialized request hash to be used for
214 requests; you should set any authentication information or headers
215 in this hash in order for the crawler to use them. The optional
216 tracking_hash lets you supply a hash for use in tracking URL
217 results (otherwise crawl_new() will allocate a new anon hash).
218
219 crawl
220 Params: $crawl_object [, $START, $MAX_DEPTH ]
221
222 Return: $count [ undef on error ]
223
224 The heart of the crawl package. Will perform an HTTP crawl on the
225 specified HOST, starting at START URI, proceeding up to MAX_DEPTH.
226
227 Crawl_object needs to be the variable returned by crawl_new(). You
228 can also indirectly call crawl() via the crawl_object itself:
229
230 $crawl_object->{crawl}->($START,$MAX_DEPTH)
231
232 Returns the number of URLs actually crawled (not including those
233 skipped).
234
235 dump
236 Params: $name, \@array [, $name, \%hash, $name, \$scalar ]
237
238 Return: $code [ undef on error ]
239
240 The dump function will take the given $name and data reference, and
241 will create an ASCII perl code representation suitable for eval'ing
242 later to recreate the same structure. $name is the name of the
243 variable that it will be saved as. Example:
244
245 $output = LW2::dump('request',\%request);
246
247 NOTE: dump() creates anonymous structures under the name given.
248 For example, if you dump the hash %hin under the name 'hin', then
249 when you eval the dumped code you will need to use %$hin, since
250 $hin is now a *reference* to a hash.
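
The note about references can be seen in a small round trip. The string below is hand-written to resemble dumped code; the exact text that dump() emits is an assumption here, and only the "you get a reference back" behaviour is being illustrated.

```perl
use strict;
use warnings;

# Hand-written stand-in for dumped code; the exact text LW2::dump()
# emits is an assumption, only the reference behaviour matters here.
my $code = q{$hin = { 'host' => 'localhost', 'port' => 80 };};

my $hin;                       # the name that was used when dumping
eval $code;                    # recreate the structure
die "eval failed: $@" if $@;

# $hin is now a hash *reference*, so dereference it to get the hash back.
my %restored = %$hin;
print "$_ = $restored{$_}\n" for sort keys %restored;
```

Only eval code that you dumped yourself; eval'ing a file from an untrusted source runs whatever Perl it contains.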
251
252 dump_writefile
Params: $file, $name, \@array [, $name, \%hash, $name, \$scalar ]
254
255 Return: 0 if success; 1 if error
256
257 This calls dump() and saves the output to the specified $file.
258
Note: LW2 does no checking on the validity of the file name, its
creation, or anything of the sort. Files are opened in overwrite
mode.
262
263 encode_base64
264 Params: $data [, $eol]
265
266 Return: $b64_encoded_data
267
268 This function does Base64 encoding. If the binary MIME::Base64
269 module is available, it will use that; otherwise, it falls back to
270 an internal perl version. The perl version carries the following
271 copyright:
272
273 Copyright 1995-1999 Gisle Aas <gisle@aas.no>
274
NOTE: the $eol parameter will be inserted every 76 characters.
This is used to format the data for output on an 80 character wide
terminal.
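
The effect of $eol can be seen with the core MIME::Base64 module, which the text above says is used when available. This sketch calls MIME::Base64 directly rather than LW2, and assumes LW2::encode_base64 treats $eol the same way.

```perl
use strict;
use warnings;
use MIME::Base64 ();

my $data = 'A' x 100;          # 100 bytes encode to 136 base64 characters

# With "\n" as $eol the output is broken into lines of at most 76 characters.
my $wrapped = MIME::Base64::encode_base64($data, "\n");
my @lines   = split /\n/, $wrapped;
printf "%d lines, lengths: %s\n",
    scalar @lines, join(',', map { length } @lines);

# With an empty $eol the output stays on a single line.
my $single = MIME::Base64::encode_base64($data, '');
printf "single line length: %d\n", length $single;
```

Passing an empty string is the usual choice when the result goes into an HTTP header, where embedded line breaks are not wanted.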
278
279 decode_base64
280 Params: $data
281
282 Return: $b64_decoded_data
283
284 A perl implementation of base64 decoding. The perl code for this
285 function was actually taken from an older MIME::Base64 perl module,
286 and bears the following copyright:
287
288 Copyright 1995-1999 Gisle Aas <gisle@aas.no>
289
290 encode_uri_hex
291 Params: $data
292
293 Return: $result
294
295 This function encodes every character (except the / character) with
296 normal URL hex encoding.
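
A rough re-creation of the described behaviour, for illustration only. hex_encode_all() is a hypothetical name and is not LW2's code; lowercase hex digits are an assumption, since either case is valid URL encoding.

```perl
use strict;
use warnings;

# Hypothetical re-creation of the described behaviour (not LW2's code):
# hex-encode every character except '/'. Lowercase hex digits are an
# assumption; uppercase would be equally valid URL encoding.
sub hex_encode_all {
    my ($data) = @_;
    $data =~ s/([^\/])/sprintf("%%%02x", ord($1))/ge;
    return $data;
}

print hex_encode_all('/cgi-bin/a b'), "\n";
```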
297
298 encode_uri_randomhex
299 Params: $data
300
301 Return: $result
302
303 This function randomly encodes characters (except the / character)
304 with normal URL hex encoding.
305
306 encode_uri_randomcase
307 Params: $data
308
309 Return: $result
310
311 This function randomly changes the case of characters in the
312 string.
313
314 encode_unicode
315 Params: $data
316
317 Return: $result
318
319 This function converts a normal string into Windows unicode format
320 (non-overlong or anything fancy).
321
322 decode_unicode
323 Params: $unicode_string
324
325 Return: $decoded_string
326
This function attempts to decode a unicode (UTF-8) string by
converting it into a single-byte-character string. Overlong
characters are converted to their standard characters in place;
non-overlong (aka multi-byte) characters are substituted with 0xff;
invalid encoding characters are left as-is.
332
333 Note: this function is useful for dealing with the various unicode
334 exploits/vulnerabilities found in web servers; it is *not* good for
335 doing actual UTF-8 parsing, since characters over a single byte are
336 basically dropped/replaced with a placeholder.
337
338 encode_anti_ids
339 Params: \%request, $modes
340
341 Return: nothing
342
encode_anti_ids computes the proper anti-ids encoding/tricks
specified by $modes, and sets up %request in order to use those
tricks. Valid modes are (the mode numbers are the same as those
found in whisker 1.4):
347
348 1 Encode some of the characters via normal URL encoding
349 2 Insert directory self-references (/./)
350 3 Premature URL ending (make it appear the request line is done)
351 4 Prepend a long random string in the form of "/string/../URL"
352 5 Add a fake URL parameter
353 6 Use a tab instead of a space as a request spacer
354 7 Change the case of the URL (works against Windows and Novell)
8 Change normal separators ('/') to Windows version ('\')
356 9 Session splicing [NOTE: not currently available]
357
358 You can set multiple modes by setting the string to contain all the
359 modes desired; i.e. $modes="146" will use modes 1, 4, and 6.
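
The mode string is simply one digit per mode. A small sketch of splitting such a string apart follows; parse_modes() and %MODE_NAME are hypothetical names used only for illustration, with the descriptions abbreviated from the list above.

```perl
use strict;
use warnings;

# Mode descriptions, abbreviated from the list above.
my %MODE_NAME = (
    1 => 'URL-encode some characters',
    2 => 'insert directory self-references (/./)',
    3 => 'premature URL ending',
    4 => 'prepend a long random string',
    5 => 'add a fake URL parameter',
    6 => 'tab instead of space as request spacer',
    7 => 'change the case of the URL',
    8 => 'Windows-style separators',
    9 => 'session splicing (not currently available)',
);

# Hypothetical helper: split a mode string such as "146" into the
# individual mode numbers, ignoring anything that is not a known mode.
sub parse_modes {
    my ($modes) = @_;
    return grep { exists $MODE_NAME{$_} } split //, $modes;
}

print "$_: $MODE_NAME{$_}\n" for parse_modes('146');
```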
360
361 FORMS FUNCTIONS
The goal is to parse the variable, human-readable HTML into
concrete structures usable by your program. The forms functions do a
good job at making these structures, but I will admit: they are not
exactly simple, and thus not a cinch to work with. But then again,
representing something as complex as an HTML form is not a simple
thing either. I think the results are acceptable for what's trying
to be done. Anyways...
369
370 Forms are stored in perl hashes, with elements in the following
371 format:
372
373 $form{'element_name'}=@([ 'type', 'value', @params ])
374
375 Thus every element in the hash is an array of anonymous arrays.
376 The first array value contains the element type (which is 'select',
377 'textarea', 'button', or an 'input' value of the form 'input-text',
378 'input-hidden', 'input-radio', etc).
379
380 The second value is the value, if applicable (it could be undef if
381 no value was specified). Note that select elements will always
382 have an undef value--the actual values are in the subsequent
383 options elements.
384
385 The third value, if defined, is an anonymous array of additional
386 tag parameters found in the element (like 'onchange="blah"',
387 'size="20"', 'maxlength="40"', 'selected', etc).
388
The form hash does contain one special element, which is stored
under a NULL character ("\0") key. This element is of the format:
392
393 $form{"\0"}=['name', 'method', 'action', @parameters];
394
395 The element is an anonymous array that contains strings of the
396 form's name, method, and action (values can be undef), and a
397 @parameters array similar to that found in normal elements (above).
398
399 Accessing individual values stored in the form hash becomes a test
400 of your perl referencing skills. Hint: to access the 'value' of
401 the third element named 'choices', you would need to do:
402
403 $form{'choices'}->[2]->[1];
404
405 The '[2]' is the third element (normal array starts with 0), and
406 the actual value is '[1]' (the type is '[0]', and the parameter
407 array is '[2]').
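
To make the indexing concrete, here is a hand-built hash in the layout described above. The element names and values are sample data invented for illustration; a real form hash would come from forms_read().

```perl
use strict;
use warnings;

# Hand-built form hash in the layout described above (sample data only).
my %form = (
    'choices' => [
        [ 'input-radio', 'red',   [] ],
        [ 'input-radio', 'green', [] ],
        [ 'input-radio', 'blue',  ['checked'] ],
    ],
    'comment' => [
        [ 'textarea', undef, ['rows="4"'] ],
    ],
);

# Third element named 'choices' is index [2]; its value is index [1].
print "value: $form{'choices'}->[2]->[1]\n";

# Walk every element, printing its type, value, and extra parameters.
for my $name (sort keys %form) {
    next if $name eq "\0";     # skip the special form-level entry, if any
    for my $el (@{ $form{$name} }) {
        my ($type, $value, $params) = @$el;
        printf "%s: type=%s value=%s params=%s\n",
            $name, $type,
            defined $value ? $value : '(undef)',
            join(' ', @{ $params || [] });
    }
}
```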
408
409 forms_read
410 Params: \$html_data
411
412 Return: \@found_forms
413
414 This function parses the given $html_data into libwhisker form
415 hashes. It returns a reference to an array of hash references to
416 the found forms.
417
418 forms_write
419 Params: \%form_hash
420
421 Return: $html_of_form [undef on error]
422
423 This function will take the given %form hash and compose a generic
424 HTML representation of it, formatted with tabs and newlines in
425 order to make it neat and tidy for printing.
426
427 Note: this function does *not* escape any special characters that
428 were embedded in the element values.
429
430 html_find_tags
431 Params: \$data, \&callback_function [, $xml_flag, $funcref,
432 \%tag_map]
433
434 Return: nothing
435
436 html_find_tags parses a piece of HTML and 'extracts' all found
437 tags, passing the info to the given callback function. The call‐
438 back function must accept two parameters: the current tag (as a
439 scalar), and a hash ref of all the tag's elements. For example, the
440 tag <a href="/file"> will pass 'a' as the current tag, and a hash
441 reference which contains {'href'=>"/file"}.
442
The xml_flag, when set, causes the parser to do some extra
processing and checks to accommodate XML style tags such as
<tag foo="bar"/>.

The optional %tag_map is a hash of lowercase tag names. If a tag map
is supplied, then the parser will only call the callback function
if the tag name exists in the tag map.
450
451 The optional $funcref variable is passed straight to the callback
452 function, allowing you to pass flags or references to more complex
453 structures to your callback function.
454
455 html_find_tags_rewrite
456 Params: $position, $length, $replacement
457
458 Return: nothing
459
460 html_find_tags_rewrite() is used to 'rewrite' an HTML stream from
461 within an html_find_tags() callback function. In general, you can
462 think of html_find_tags_rewrite working as:
463
464 substr(DATA, $position, $length) = $replacement
465
466 Where DATA is the current HTML string the html parser is using.
467 The reason you need to use this function and not substr() is
468 because a few internal parser pointers and counters need to be
adjusted to accommodate the changes.
470
471 If you want to remove a piece of the string, just set the replace‐
472 ment to an empty string (''). If you wish to insert a string
473 instead of overwrite, just set $length to 0; your string will be
474 inserted at the indicated $position.
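
The three cases (overwrite, remove, insert) behave like Perl's own substr() used as an lvalue. The sketch below uses plain substr() on copies of a string purely to show the position/length semantics; inside a real callback you would call html_find_tags_rewrite() instead, for the reason given above.

```perl
use strict;
use warnings;

my $html = '<b>bold</b> text';

# Overwrite: replace the 3-character tag at position 0.
my $replaced = $html;
substr($replaced, 0, 3) = '<strong>';

# Remove: an empty replacement deletes the span.
my $removed = $html;
substr($removed, 0, 3) = '';

# Insert: a length of 0 inserts at the position without overwriting.
my $inserted = $html;
substr($inserted, 3, 0) = 'very ';

print "$replaced\n$removed\n$inserted\n";
```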
475
476 html_link_extractor
477 Params: \$html_data
478
479 Return: @urls
480
481 The html_link_extractor() function uses the internal crawl tests to
482 extract all the HTML links from the given HTML data stream.
483
484 Note: html_link_extractor() does not unique the returned array of
485 discovered links, nor does it attempt to remove javascript links or
486 make the links absolute. It just extracts every raw link from the
487 HTML stream and returns it. You'll have to do your own post-pro‐
488 cessing.
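
A minimal post-processing sketch, assuming you only want to drop javascript: links and duplicates. The @urls list is sample data standing in for what html_link_extractor() might return; making links absolute is left out.

```perl
use strict;
use warnings;

# Sample data standing in for the list html_link_extractor() would return.
my @urls = ('/index.html', 'javascript:void(0)', '/about.html',
            '/index.html', 'JavaScript:open()', 'http://www.example.com/');

# Drop javascript: links and duplicates, keeping first-seen order.
my %seen;
my @clean = grep { !/^\s*javascript:/i && !$seen{$_}++ } @urls;

print "$_\n" for @clean;
```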
489
490 http_new_request
491 Params: %parameters
492
493 Return: \%request_hash
494
495 This function basically 'objectifies' the creation of whisker
496 request hash objects. You would call it like:
497
498 $req = http_new_request( host=>'www.example.com', uri=>'/' )
499
500 where 'host' and 'uri' can be any number of {whisker} hash control
501 values (see http_init_request for default list).
502
503 http_new_response
504 Params: [none]
505
506 Return: \%response_hash
507
508 This function basically 'objectifies' the creation of whisker
509 response hash objects. You would call it like:
510
511 $resp = http_new_response()
512
513 http_init_request
514 Params: \%request_hash_to_initialize
515
516 Return: Nothing (modifies input hash)
517
518 Sets default values to the input hash for use. Sets the host to
519 'localhost', port 80, request URI '/', using HTTP 1.1 with GET
520 method. The timeout is set to 10 seconds, no proxies are defined,
521 and all URI formatting is set to standard HTTP syntax. It also
522 sets the Connection (Keep-Alive) and User-Agent headers.
523
524 NOTICE!! It's important to use http_init_request before calling
525 http_do_request, or http_do_request might puke. Thus, a special
526 magic value is placed in the hash to let http_do_request know that
527 the hash has been properly initialized. If you really must 'roll
528 your own' and not use http_init_request before you call
529 http_do_request, you will at least need to set the MAGIC value
530 (amongst other things).
531
532 http_do_request
533 Params: \%request, \%response [, \%configs]
534
535 Return: >=1 if error; 0 if no error (also modifies response hash)
536
537 *THE* core function of libwhisker. http_do_request actually per‐
538 forms the HTTP request, using the values submitted in %request, and
539 placing result values in %response. This allows you to resubmit
540 %request in subsequent requests (%response is automatically cleared
541 upon execution). You can submit 'runtime' config directives as
%configs, which will be spliced into $req{whisker}->{} before
anything else. That means you can do:
544
545 LW2::http_do_request(\%req,\%resp,{'uri'=>'/cgi-bin/'});
546
547 This will set $req{whisker}->{'uri'}='/cgi-bin/' before execution,
548 and provides a simple shortcut (note: it does modify %req).
549
This function will also retry any requests that bomb out during the
transaction (but not during the connecting phase). This is
controlled by the {whisker}->{retry} value. Also note that the
returned error message in %response is the *last* error received.
All retry errors are put into {whisker}->{retry_errors}, which is an
anonymous array.
556
557 Also note that all NTLM auth logic is implemented in
558 http_do_request(). NTLM requires multiple requests in order to
559 work correctly, and so this function attempts to wrap that and make
560 it all transparent, so that the final end result is what's passed
561 to the application.
562
563 This function will return 0 on success, 1 on HTTP protocol error,
564 and 2 on non-recoverable network connection error (you can retry
565 error 1, but error 2 means that the server is totally unreachable
566 and there's no point in retrying).
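
One way a caller might act on these return codes is sketched below. request_with_retry() is a hypothetical wrapper, not part of LW2, and the stand-in code reference simply replays a fixed list of return codes; in practice it could be a closure around LW2::http_do_request.

```perl
use strict;
use warnings;

# Hypothetical wrapper: retry on return code 1 (protocol error), give up
# at once on 2 (unreachable), succeed on 0. $do_request is any code ref
# that returns those codes.
sub request_with_retry {
    my ($do_request, $max_tries) = @_;
    my $rc = 2;
    for my $attempt (1 .. $max_tries) {
        $rc = $do_request->();
        return ($rc, $attempt) if $rc == 0 || $rc == 2;
    }
    return ($rc, $max_tries);
}

# Stand-in that fails twice with a protocol error, then succeeds.
my @script = (1, 1, 0);
my ($rc, $tries) = request_with_retry(sub { shift @script }, 5);
print "rc=$rc after $tries tries\n";
```

This is separate from the library's own {whisker}->{retry} handling described above; it only shows how the 0/1/2 distinction can drive a caller's decision.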
567
568 http_req2line
569 Params: \%request, $uri_only_switch
570
571 Return: $request
572
http_req2line is used internally by http_do_request, and also
provides a convenient way to turn a %request configuration into an
actual HTTP request line. If $uri_only_switch is set to 1, then the
returned $request will be the URI only ('/requested/page.html'),
versus the entire HTTP request
('GET /requested/page.html HTTP/1.0\n\n'). Also, if the
'full_request_override' whisker config variable is set in %request,
then it will be returned instead of the constructed URI.
580
581 http_resp2line
582 Params: \%response
583
584 Return: $response
585
http_resp2line provides a convenient way to turn a %response hash
587 back into the original HTTP response line.
588
589 http_fixup_request
590 Params: $hash_ref
591
592 Return: Nothing
593
This function takes a %request hash reference and makes sure the
proper headers exist (for example, it will add the Host: header,
calculate the Content-Length: header for POST requests, etc). For
standard requests (i.e. you want the request to be HTTP
RFC-compliant), you should call this function right before you call
http_do_request.
599
600 http_reset
601 Params: Nothing
602
603 Return: Nothing
604
605 The http_reset function will walk through the %http_host_cache,
606 closing all open sockets and freeing SSL resources. It also clears
607 out the host cache in case you need to rerun everything fresh.
608
609 Note: if you just want to close a single connection, and you have a
610 copy of the %request hash you used, you should use the http_close()
611 function instead.
612
613 ssl_is_available
614 Params: Nothing
615
616 Return: $boolean [, $lib_name, $version]
617
The ssl_is_available() function will inform you whether SSL
requests are allowed, which is dependent on whether the appropriate
SSL libraries are installed on the machine. In scalar context, the
function will return 1 or 0. In array context, the second element
will be the SSL library name that is currently being used by LW2,
and the third element will be the SSL library version number.
Elements two and three (name and version) will be undefined if
called in array context and no SSL libraries are available.
626
627 http_read_headers
628 Params: $stream, \%in, \%out
629
630 Return: $result_code, $encoding, $length, $connection
631
Read HTTP headers from the given stream, storing the results in
%out. On success, $result_code will be 1 and $encoding, $length,
and $connection will hold the values of the Transfer-Encoding,
Content-Length, and Connection headers, respectively. If any of
those headers is not present, the corresponding value will be undef.
On an error, the $result_code will be 0 and $encoding will contain
an error message.
639
640 This function can be used to parse both request and response head‐
641 ers.
642
643 Note: if there are multiple Transfer-Encoding, Content-Length, or
644 Connection headers, then only the last header value is the one
645 returned by the function.
646
647 http_read_body
648 Params: $stream, \%in, \%out, $encoding, $length
649
Return: 1 on success, 0 on error (and sets
$out->{whisker}->{error})
652
653 Read the body from the given stream, placing it in
654 $out->{whisker}->{data}. Handles chunked encoding. Can be used to
655 read HTTP (POST) request or HTTP response bodies. $encoding param‐
656 eter should be lowercase encoding type.
657
658 NOTE: $out->{whisker}->{data} is erased/cleared when this function
659 is called, leaving {data} to just contain this particular HTTP
660 body.
661
662 http_construct_headers
663 Params: \%in
664
665 Return: $data
666
667 This function assembles the headers in the given hash into a data
668 string.
669
670 http_close
671 Params: \%request
672
673 Return: nothing
674
675 This function will close any open streams for the given request.
676
677 Note: in order for http_close() to find the right connection, all
678 original host/proxy/port parameters in %request must be the exact
679 same as when the original request was made.
680
681 http_do_request_timeout
682 Params: \%request, \%response, $timeout
683
684 Return: $result
685
686 This function is identical to http_do_request(), except that it
687 wraps the entire request in a timeout wrapper. $timeout is the
688 number of seconds to allow for the entire request to be completed.
689
690 Note: this function uses alarm() and signals, and thus will only
691 work on Unix-ish platforms. It should be safe to call on any plat‐
692 form though.
693
md5
Params: $data
695
696 Return: $hex_md5_string
697
698 This function takes a data scalar, and composes a MD5 hash of it,
699 and returns it in a hex ascii string. It will use the fastest MD5
700 function available.
701
md4
Params: $data
703
704 Return: $hex_md4_string
705
706 This function takes a data scalar, and composes a MD4 hash of it,
707 and returns it in a hex ascii string. It will use the fastest MD4
708 function available.
709
710 multipart_set
711 Params: \%multi_hash, $param_name, $param_value
712
713 Return: nothing
714
715 This function sets the named parameter to the given value within
716 the supplied multipart hash.
717
718 multipart_get
719 Params: \%multi_hash, $param_name
720
721 Return: $param_value, undef on error
722
This function retrieves the value of the named parameter from the
supplied multipart hash. There is a special case where the named
parameter is actually a file--in which case the resulting value will
be "\0FILE". In general, all special values will be prefixed with a
NULL character. In order to get a file's info, use
multipart_getfile().
729
730 multipart_setfile
731 Params: \%multi_hash, $param_name, $file_path [, $filename]
732
733 Return: undef on error, 1 on success
734
735 NOTE: this function does not actually add the contents of
736 $file_path into the %multi_hash; instead, multipart_write() inserts
737 the content when generating the final request.
738
739 multipart_getfile
740 Params: \%multi_hash, $file_param_name
741
742 Return: $path, $name ($path=undef on error)
743
744 multipart_getfile is used to retrieve information for a file param‐
745 eter contained in %multi_hash. To use this you would most likely
746 do:
747
748 ($path,$fname)=LW2::multipart_getfile(\%multi,"param_name");
749
750 multipart_boundary
751 Params: \%multi_hash [, $new_boundary_name]
752
753 Return: $current_boundary_name
754
755 multipart_boundary is used to retrieve, and optionally set, the
756 multipart boundary used for the request.
757
758 NOTE: the function does no checking on the supplied boundary, so if
759 you want things to work make sure it's a legit boundary. Lib‐
760 whisker does *not* prefix it with any '---' characters.
761
762 multipart_write
763 Params: \%multi_hash, \%request
764
765 Return: 1 if successful, undef on error
766
767 multipart_write is used to parse and construct the multipart data
768 contained in %multi_hash, and place it ready to go in the given
769 whisker hash (%request) structure, to be sent to the server.
770
771 NOTE: file contents are read into the final %request, so it's pos‐
772 sible for the hash to get *very* large if you have (a) large
773 file(s).
774
775 multipart_read
776 Params: \%multi_hash, \%hout_response [, $filepath ]
777
778 Return: 1 if successful, undef on error
779
780 multipart_read will parse the data contents of the supplied
781 %hout_response hash, by passing the appropriate info to multi‐
782 part_read_data(). Please see multipart_read_data() for more info
783 on parameters and behaviour.
784
785 NOTE: this function will return an error if the given
786 %hout_response Content-Type is not set to "multipart/form-data".
787
788 multipart_read_data
789 Params: \%multi_hash, \$data, $boundary [, $filepath ]
790
791 Return: 1 if successful, undef on error
792
793 multipart_read_data parses the contents of the supplied data using
794 the given boundary and puts the values in the supplied %multi_hash.
795 Embedded files will *not* be saved unless a $filepath is given,
796 which should be a directory suitable for writing out temporary
797 files.
798
NOTE: currently application/octet-stream is the only supported
file encoding. All other file encodings will not be parsed/saved.
801
802 multipart_files_list
803 Params: \%multi_hash
804
805 Return: @files
806
807 multipart_files_list returns an array of parameter names for all
808 the files that are contained in %multi_hash.
809
810 multipart_params_list
811 Params: \%multi_hash
812
813 Return: @params
814
multipart_params_list returns an array of parameter names for all
816 the regular parameters (non-file) that are contained in
817 %multi_hash.
818
819 ntlm_new
820 Params: $username, $password [, $domain, $ntlm_only]
821
822 Return: $ntlm_object
823
Returns a reference to an array (otherwise known as the 'ntlm
object') which contains the various pieces of information specific
to a user/pass combo. If $ntlm_only is set to 1, then only the NTLM
hash (and not the LanMan hash) will be generated. This results in
a speed boost, and is typically fine for using against IIS servers.
829
830 The array contains the following items, in order: username, pass‐
831 word, domain, lmhash(password), ntlmhash(password)
832
833 ntlm_decode_challenge
834 Params: $challenge
835
836 Return: @challenge_parts
837
838 Splits the supplied challenge into the various parts. The returned
839 array contains elements in the following order:
840
841 unicode_domain, ident, packet_type, domain_len, domain_maxlen,
842 domain_offset, flags, challenge_token, reserved, empty, raw_data
843
844 ntlm_client
845 Params: $ntlm_obj [, $server_challenge]
846
847 Return: $response
848
849 ntlm_client() is responsible for generating the base64-encoded text
850 you include in the HTTP Authorization header. If you call
851 ntlm_client() without a $server_challenge, the function will return
852 the initial NTLM request packet (message packet #1). You send this
853 to the server, and take the server's response (message packet #2)
854 and pass that as $server_challenge, causing ntlm_client() to gener‐
855 ate the final response packet (message packet #3).
856
857 Note: $server_challenge is expected to be base64 encoded.
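
The three-message exchange described above can be sketched as follows. This is only an illustration of the call order: ntlm_handshake and the $send callback are hypothetical names (not part of LW2), and the NTLM routine is passed in as a code reference so the flow can be exercised without a server. The "NTLM <base64>" header value is the usual convention for NTLM over HTTP.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of LW2): drives the message #1 -> #2 -> #3
# exchange. $client is a code ref with the ntlm_client() calling
# convention. $send is a caller-supplied code ref that transmits one
# Authorization header value and returns whatever the server sent back
# (the base64 challenge after message #1, the final result after #3).
sub ntlm_handshake {
    my ($client, $ntlm_obj, $send) = @_;

    # No challenge supplied: the client produces message packet #1.
    my $msg1 = $client->($ntlm_obj);

    # The server's reply carries message packet #2 (base64 encoded).
    my $challenge = $send->("NTLM $msg1");
    return undef unless defined $challenge;

    # With the challenge supplied, the client produces message packet #3.
    my $msg3 = $client->($ntlm_obj, $challenge);
    return $send->("NTLM $msg3");
}
```

With libwhisker loaded, this would be called roughly as ntlm_handshake(\&LW2::ntlm_client, LW2::ntlm_new($user, $pass, $domain), \&my_send), where my_send is your own transport routine.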
858
859 get_page
860 Params: $url [, \%request]
861
862 Return: $code, $data ($code will be set to undef on error, $data
863 will contain error message)
864
865 This function will fetch the page at the given URL, and return the
866 HTTP response code and page contents. Use this in the form of:
867 ($code,$html)=LW2::get_page("http://host.com/page.html")
868
869 The optional %request will be used if supplied. This allows you to
870 set headers and other parameters.
871
872 get_page_hash
873 Params: $url [, \%request]
874
875 Return: $hash_ref (undef on no URL)
876
877 This function will fetch the page at the given URL, and return the
878 whisker HTTP response hash. The return code of the function is set
879 to $hash_ref->{whisker}->{get_page_hash}, and uses the
880 http_do_request() return values.
881
882 Note: undef is returned if no URL is supplied
883
884 get_page_to_file
885 Params: $url, $filepath [, \%request]
886
887 Return: $code ($code will be set to undef on error)
888
889 This function will fetch the page at the given URL, place the
890 resulting HTML in the file specified, and return the HTTP response
891 code. The optional %request hash sets the default parameters to be
892 used in the request.
893
894 NOTE: libwhisker does not do any file checking; libwhisker will
895 open the supplied filepath for writing, overwriting any previously-
896 existing files. Libwhisker does not differentiate between a bad
897 request and a bad file open. If you have trouble making
898 this function work, make sure that your $filepath is legal and
899 valid, and that you have appropriate write permissions to cre‐
900 ate/overwrite that file.
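
Because a bad file open cannot be told apart from a bad request, it may help to check the path before calling get_page_to_file. The following is a minimal sketch using only core Perl; can_write_path is a hypothetical helper name, not a libwhisker function.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical pre-flight check (not part of LW2). Returns 1 if $path
# looks writable by the current user, 0 otherwise.
sub can_write_path {
    my ($path) = @_;
    return 0 unless defined $path && length $path;

    my $ok;
    if (-e $path) {
        # Existing target: it must be a plain file that we may overwrite.
        $ok = (-f $path) && (-w $path);
    }
    else {
        # New file: the parent directory must exist and be writable.
        my $dir = dirname($path);
        $ok = (-d $dir) && (-w $dir);
    }
    return $ok ? 1 : 0;
}
```

A caller would test can_write_path($filepath) first and only then request the page, so that an undef return code can be attributed to the request itself.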
901
902 time_mktime
903 Params: $seconds, $minutes, $hours, $day_of_month, $month,
904 $year_minus_1900
905
906 Return: $seconds [ -1 on error ]
907
908 Performs a general mktime calculation with the given time compo‐
909 nents. Note that the input parameter values are expected to be in
910 the format output by localtime/gmtime. Namely, $seconds is 0-60
911 (yes, there can occasionally be a leap second value of 60), $minutes
912 is 0-59, $hours is 0-23, $day_of_month is 1-31, $month is 0-11, and
913 $year_minus_1900 is 70-127. This function is limited in that it will not
914 process dates prior to 1970 or after 2037 (that way 32-bit time_t
915 overflow calculations aren't required).
916
917 Additional parameters passed to the function are ignored, so it is
918 safe to use the full localtime/gmtime output, such as:
919
920 $seconds = LW2::time_mktime( localtime( time ) );
921
922 Note: this function does not adjust for time zone, daylight saving
923 time, etc. You must do that yourself.
924
925 time_gmtolocal
926 Params: $seconds_gmt
927
928 Return: $seconds_local_timezone
929
930 Takes a seconds value in UTC/GMT time and adjusts it to reflect the
931 current timezone. This function is slightly expensive; it takes
932 the gmtime() and localtime() representations of the current time,
933 calculates the delta between them by turning both back into seconds
934 via time_mktime, and then applies this delta to
935 $seconds_gmt.
936
937 Note that if you give this function a time and subtract the return
938 value from the original time, you will get the delta value. At
939 that point, you can just apply the delta directly and skip calling
940 this function, which is a massive performance boost. However, this
941 will cause problems if you have a long-running program which
942 crosses daylight saving time boundaries, as the DST adjustment
943 will not be accounted for unless you recalculate the delta.
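
The delta-caching idea above can be sketched like this. The sketch uses the core Time::Local module in place of time_mktime so that it runs without libwhisker; utc_offset and make_gm_to_local are hypothetical names, and the refresh interval is an assumption you would tune yourself.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Offset in seconds between local time and UTC at the instant $when
# (positive east of Greenwich). Both broken-down times are turned back
# into seconds as if they were UTC, and the difference is the delta.
sub utc_offset {
    my ($when) = @_;
    my @l = (localtime($when))[0 .. 5];
    my @g = (gmtime($when))[0 .. 5];
    $l[5] += 1900;    # pass full four-digit years to timegm
    $g[5] += 1900;
    return timegm(@l) - timegm(@g);
}

# Returns a converter that reuses the cached delta and recomputes it only
# after $max_age seconds, so a long-running program eventually picks up a
# DST change.
sub make_gm_to_local {
    my ($max_age) = @_;
    my ($delta, $stamp);
    return sub {
        my ($secs_gmt) = @_;
        my $now = time;
        if (!defined $stamp || $now - $stamp >= $max_age) {
            ($delta, $stamp) = (utc_offset($now), $now);
        }
        return $secs_gmt + $delta;
    };
}
```

For example, my $to_local = make_gm_to_local(3600); creates a converter that recomputes the delta at most once an hour.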
944
945 uri_split
946 Params: $uri_string [, \%request_hash]
947
948 Return: @uri_parts
949
950 Return an array of the following values, in order: uri, protocol,
951 host, port, params, frag, user, password. Values not defined are
952 given an undef value. If a %request hash is passed in, then
953 uri_split() will also set the appropriate values in the hash.
954
955 Note: uri_split() will only set the %request hash if the protocol
956 is HTTP or HTTPS!
957
958 uri_join
959 Params: @vals
960
961 Return: $url
962
963 Takes the @vals array output from uri_split, and returns a
964 single scalar/string with them joined again, in the form of:
965 protocol://user:pass@host:port/uri?params#frag
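
To make the element order concrete, here is a minimal re-implementation sketch in plain Perl. It is not the libwhisker source: sketch_uri_join is a hypothetical name, the element order is taken from the uri_split description, and the handling of undefined parts is an assumption.

```perl
use strict;
use warnings;

# Illustration only. Expects the parts in uri_split() order:
# uri, protocol, host, port, params, frag, user, password.
sub sketch_uri_join {
    my ($uri, $proto, $host, $port, $params, $frag, $user, $pass) = @_;
    my $out = '';

    if (defined $host) {
        $out .= "$proto://" if defined $proto;
        if (defined $user) {
            $out .= $user;
            $out .= ":$pass" if defined $pass;
            $out .= '@';
        }
        $out .= $host;
        $out .= ":$port" if defined $port;
    }

    # Undefined parts are simply left out (an assumption of this sketch).
    $out .= $uri       if defined $uri;
    $out .= "?$params" if defined $params;
    $out .= "#$frag"   if defined $frag;
    return $out;
}

# Yields http://rfp:secret@example.com:8080/dir/page.php?a=1#top
my $url = sketch_uri_join('/dir/page.php', 'http', 'example.com', 8080,
                          'a=1', 'top', 'rfp', 'secret');
```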
966
967 uri_absolute
968 Params: $uri, $base_uri [, $normalize_flag ]
969
970 Return: $absolute_uri
971
972 Double checks that the given $uri is in absolute form (that is,
973 "http://host/file"), and if not (it's in the form "/file"), then it
974 will combine it with the given $base_uri to make it absolute. This
975 provides behavior similar to that found in the URI module.
976
977 If $normalize_flag is set to 1, then the output will be passed
978 through uri_normalize before being returned.
979
980 uri_normalize
981 Params: $uri [, $fix_windows_slashes ]
982
983 Return: $normalized_uri [ undef on error ]
984
985 Takes the given $uri and does any /./ and /../ dereferencing in
986 order to come up with the correct absolute URL. If the
987 $fix_windows_slashes parameter is set to 1, all \ (back slashes) will be
988 converted to / (forward slashes).
989
990 Non-http/https URIs return an error.
991
992 uri_get_dir
993 Params: $uri
994
995 Return: $uri_directory
996
997 Will take a URI and return the directory base of it, i.e.
998 /rfp/page.php will return /rfp/.
999
1000 uri_strip_path_parameters
1001 Params: $uri [, \%param_hash]
1002
1003 Return: $stripped_uri
1004
1005 This function removes all URI path parameters of the form
1006
1007 /blah1;foo=bar/blah2;baz
1008
1009 and returns the stripped URI ('/blah1/blah2'). If the optional
1010 parameter hash reference is provided, the stripped parameters are
1011 saved in the form of 'blah1'=>'foo=bar', 'blah2'=>'baz'.
1012
1013 Note: only the last value of a duplicate name is saved into the
1014 param_hash, if provided. So a $uri of '/foo;A/foo;B/' will result
1015 in a single hash entry of 'foo'=>'B'.
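
The stripping rule and the duplicate-name behavior can be illustrated with a few lines of plain Perl. This is a sketch of the documented behavior, not the libwhisker source; sketch_strip_path_parameters is a hypothetical name, and query strings are not treated specially here.

```perl
use strict;
use warnings;

# Illustration only: removes ";param" suffixes from each path segment and
# optionally records them in %$params, keyed by segment name.
sub sketch_strip_path_parameters {
    my ($uri, $params) = @_;
    my @clean;

    # The -1 limit keeps trailing empty fields, so a final "/" survives.
    for my $segment (split m{/}, $uri, -1) {
        my ($name, $param) = split /;/, $segment, 2;
        $name = '' unless defined $name;

        # A later segment with the same name overwrites the earlier one.
        $params->{$name} = $param
            if defined $param && ref($params) eq 'HASH';

        push @clean, $name;
    }
    return join '/', @clean;
}

my %saved;
my $stripped = sketch_strip_path_parameters('/blah1;foo=bar/blah2;baz', \%saved);
# $stripped is '/blah1/blah2'; %saved is (blah1 => 'foo=bar', blah2 => 'baz')
```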
1016
1017 uri_parse_parameters
1018 Params: $parameter_string [, $decode, $multi_flag ]
1019
1020 Return: \%parameter_hash
1021
1022 This function takes a string in the form of:
1023
1024 foo=1&bar=2&baz=3&foo=4
1025
1026 And parses it into a hash. In the above example, the element 'foo'
1027 has two values (1 and 4). If $multi_flag is set to 1, then the
1028 'foo' hash entry will hold an anonymous array of both values. Oth‐
1029 erwise, the default is to just contain the last value (in this
1030 case, '4').
1031
1032 If $decode is set to 1, then normal hex decoding is done on the
1033 characters, where needed (both the name and value are decoded).
1034
1035 Note: if a URL parameter name appears without a value, then the
1036 value will be set to undef. E.g. for the string "foo=1&bar&baz=2",
1037 the 'bar' hash element will have an undef value.
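
The effect of $decode and $multi_flag is easiest to see side by side. The following is a plain-Perl sketch of the documented behavior, not the libwhisker source; sketch_parse_parameters is a hypothetical name, and how single-valued names are stored when $multi_flag is set is an assumption.

```perl
use strict;
use warnings;

# Illustration only: parses "a=1&b=2" style strings into a hash reference.
sub sketch_parse_parameters {
    my ($string, $decode, $multi) = @_;
    my %parsed;

    for my $pair (split /&/, $string) {
        my ($name, $value) = split /=/, $pair, 2;   # $value is undef for "bar"
        next unless defined $name && length $name;

        if ($decode) {
            # Hex-decode %XX sequences in both the name and the value.
            for my $item ($name, $value) {
                next unless defined $item;
                $item =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
            }
        }

        if ($multi && exists $parsed{$name}) {
            # Repeated name: promote to an anonymous array and append.
            $parsed{$name} = [ $parsed{$name} ] unless ref $parsed{$name};
            push @{ $parsed{$name} }, $value;
        }
        else {
            # Default: the last value wins.
            $parsed{$name} = $value;
        }
    }
    return \%parsed;
}

my $last  = sketch_parse_parameters('foo=1&bar=2&baz=3&foo=4');
my $multi = sketch_parse_parameters('foo=1&bar=2&baz=3&foo=4', 0, 1);
# $last->{foo} is '4'; $multi->{foo} is [ '1', '4' ]
```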
1038
1039 uri_escape
1040 Params: $data
1041
1042 Return: $encoded_data
1043
1044 This function encodes the given $data so it is safe to be used in
1045 URIs.
1046
1047 uri_unescape
1048 Params: $encoded_data
1049
1050 Return: $data
1051
1052 This function decodes the given $data out of URI format.
1053
1054 utils_recperm
1055 Params: $uri, $depth, \@dir_parts, \@valid, \&func, \%track,
1056 \%arrays, \&cfunc
1057
1058 Return: nothing
1059
1060 This is a special function which is used to recursively permute
1061 through a given directory listing. This is really only used by
1062 whisker, in order to traverse down directories, testing them as it
1063 goes. See whisker 2.0 for exact usage examples.
1064
1065 utils_array_shuffle
1066 Params: \@array
1067
1068 Return: nothing
1069
1070 This function will randomize the order of the elements in the given
1071 array.
1072
1073 utils_randstr
1074 Params: [ $size, $chars ]
1075
1076 Return: $random_string
1077
1078 This function generates a random string between 10 and 20 charac‐
1079 ters long, or of $size if specified. If $chars is specified, then
1080 the random function picks characters from the supplied string. For
1081 example, to have a random string of 10 characters, composed of only
1082 the characters 'abcdef', then you would run:
1083
1084 utils_randstr(10,'abcdef');
1085
1086 The default character string is alphanumeric.
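
For readers who want to see the selection logic spelled out, here is a minimal sketch of the documented behavior in plain Perl. It is not the libwhisker source, sketch_randstr is a hypothetical name, and rand() is not suitable where unpredictability matters.

```perl
use strict;
use warnings;

# Illustration only: builds a random string of $size characters drawn
# from $chars, with the documented defaults when either is omitted.
sub sketch_randstr {
    my ($size, $chars) = @_;

    # Default length is a random value from 10 to 20 inclusive.
    $size = 10 + int(rand(11)) unless $size;

    # Default alphabet is alphanumeric.
    $chars = join('', 'a' .. 'z', 'A' .. 'Z', 0 .. 9)
        unless defined $chars && length $chars;

    my $out = '';
    for (1 .. $size) {
        $out .= substr($chars, int(rand(length $chars)), 1);
    }
    return $out;
}

my $hex_like = sketch_randstr(10, 'abcdef');   # ten characters from a-f
```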
1087
1088 utils_port_open
1089 Params: $host, $port
1090
1091 Return: $result
1092
1093 Quick function to attempt to make a connection to the given host
1094 and port. If a connection was successfully made, function will
1095 return true (1). Otherwise it returns false (0).
1096
1097 Note: this uses standard TCP connections, so it is extremely slow
1098 and not recommended for use in port-scanning type applications.
1099
1100 utils_lowercase_keys
1101 Params: \%hash
1102
1103 Return: $number_changed
1104
1105 Will lowercase all the header names (but not values) of the given
1106 hash.
1107
1108 utils_find_lowercase_key
1109 Params: \%hash, $key
1110
1111 Return: $value, undef on error or not exist
1112
1113 Searches the given hash for the $key (regardless of case), and
1114 returns the value. If the return value is placed into an array, the
1115 function will dereference any multi-value references and return an array of
1116 all values.
1117
1118 WARNING! In scalar context, $value can either be a single-value
1119 scalar or an array reference for multiple scalar values. That
1120 means you either need to check the return value and act appropri‐
1121 ately, or use an array context (even if you only want a single
1122 value). This is very important, even if you know there are no
1123 multi-value hash keys. This function may still return an array of
1124 multiple values even if all hash keys are single value, since low‐
1125 ercasing the keys could result in multiple keys matching. For
1126 example, a hash with the values { 'Foo'=>'a', 'fOo'=>'b' } techni‐
1127 cally has two keys with the lowercase name 'foo', and so this func‐
1128 tion will either return an array or array reference with both 'a'
1129 and 'b'.
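
The context-dependent return described in the warning can be sketched in plain Perl as below. This illustrates the documented behavior only; sketch_find_lowercase_key is a hypothetical name, and the order in which matching keys are visited (sorted here) is an assumption.

```perl
use strict;
use warnings;

# Illustration only: case-insensitive lookup with a context-sensitive
# return value.
sub sketch_find_lowercase_key {
    my ($hash, $key) = @_;
    return unless ref($hash) eq 'HASH' && defined $key;

    my @values;
    for my $name (sort keys %$hash) {
        next unless lc($name) eq lc($key);
        my $v = $hash->{$name};
        # Flatten multi-value entries stored as array references.
        push @values, ref($v) eq 'ARRAY' ? @$v : $v;
    }

    return unless @values;          # undef in scalar context, () in list
    return @values if wantarray;    # list context: every value
    return @values == 1 ? $values[0] : \@values;
}

my %headers = ('Foo' => 'a', 'fOo' => 'b');

# List context is the safe form: it always yields plain values.
my @all = sketch_find_lowercase_key(\%headers, 'foo');

# Scalar context needs a ref check before use.
my $one  = sketch_find_lowercase_key(\%headers, 'foo');
my @safe = ref($one) eq 'ARRAY' ? @$one : ($one);
```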
1130
1131 utils_find_key
1132 Params: \%hash, $key
1133
1134 Return: $value, undef on error or not exist
1135
1136 Searches the given hash for the $key (case-sensitive), and returns
1137 the value. If the return value is placed into an array, the function
1138 will dereference any multi-value references and return an array of all
1139 values.
1140
1141 utils_delete_lowercase_key
1142 Params: \%hash, $key
1143
1144 Return: $number_found
1145
1146 Searches the given hash for the $key (regardless of case), and
1147 deletes the key out of the hash if found. The function returns the
1148 number of keys found and deleted (since multiple keys can exist
1149 under the names 'Key', 'key', 'keY', 'KEY', etc.).
1150
1151 utils_getline
1152 Params: \$data [, $resetpos ]
1153
1154 Return: $line (undef if no more data)
1155
1156 Fetches the next \n terminated line from the given data. Use the
1157 optional $resetpos to reset the internal position pointer. Does
1158 *NOT* return the trailing \n.
1159
1160 utils_getline_crlf
1161 Params: \$data [, $resetpos ]
1162
1163 Return: $line (undef if no more data)
1164
1165 Fetches the next \r\n terminated line from the given data. Use the
1166 optional $resetpos to reset the internal position pointer. Does
1167 *NOT* return the trailing \r\n.
1168
1169 utils_save_page
1170 Params: $file, \%response
1171
1172 Return: 0 on success, 1 on error
1173
1174 Saves the data portion of the given whisker %response hash to the
1175 indicated file. Can technically save the data portion of a
1176 %request hash too. A file is not written if there is no data.
1177
1178 Note: Libwhisker does not do any special file checking; files are opened in
1179 overwrite mode.
1180
1181 utils_getopts
1182 Params: $opt_str, \%opt_results
1183
1184 Return: 0 on success, 1 on error
1185
1186 This function is a general implementation of Getopt::Std. It will
1187 parse @ARGV, looking for the options specified in $opt_str, and
1188 will put the results in %opt_results. Behavior/parameter values
1189 are similar to Getopt::Std's getopts().
1190
1191 Note: this function does *not* support long options (--option),
1192 option grouping (-opq), or options with immediate values (-ovalue).
1193 If an option is indicated as having a value, it will take the next
1194 argument regardless.
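
Since the option string format follows Getopt::Std, the core module itself can serve as a reference for what $opt_str looks like. The sketch below uses Getopt::Std rather than libwhisker, with a simulated command line, and sticks to the separated "-o value" form that the note above says is supported. Keep in mind that Getopt::Std's getopts() returns true on success, whereas utils_getopts is documented above as returning 0 on success.

```perl
use strict;
use warnings;
use Getopt::Std qw(getopts);

# Simulated command line: one flag, one option with a value, one argument.
@ARGV = ('-v', '-o', 'out.txt', 'input.txt');

my %opt;
# "v" is a plain flag; "o:" means -o takes the next argument as its value.
my $ok = getopts('vo:', \%opt);

# Afterwards $opt{v} is 1, $opt{o} is 'out.txt', and @ARGV holds only
# the remaining non-option argument, 'input.txt'.
```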
1195
1196 utils_text_wrapper
1197 Params: $long_text_string [, $crlf, $width ]
1198
1199 Return: $formatted_text_string
1200
1201 This is a simple function used to format a long line of text for
1202 display on a typical limited-character screen, such as a unix shell
1203 console.
1204
1205 $crlf defaults to "\n", and $width defaults to 76.
1206
1207 utils_bruteurl
1208 Params: \%req, $pre, $post, \@values_in, \@values_out
1209
1210 Return: nothing (adds to @values_out)
1211
1212 Bruteurl will perform a brute force against the host/server
1213 specified in %req. It makes one request per entry in @values_in,
1214 taking each value and setting $req{'whisker'}->{'uri'} to
1215 $pre.value.$post. Any URI responding with an HTTP 200 or 403
1216 response is pushed onto @values_out. An example of this would be to
1217 brute force usernames, putting a list of common usernames in
1218 @values_in and setting $pre='/~' and $post='/'.
1219
1220 utils_join_tag
1221 Params: $tag_name, \%attributes
1222
1223 Return: $tag_string [undef on error]
1224
1225 This function takes the $tag_name (like 'A') and a hash full of
1226 attributes (like {href=>'http://foo/'}) and returns the constructed
1227 HTML tag string (<A href="http://foo/">).
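
A minimal sketch of the tag construction in plain Perl follows. It is not the libwhisker source: sketch_join_tag is a hypothetical name, and the attribute ordering (sorted), the quoting style, and the handling of valueless attributes are assumptions.

```perl
use strict;
use warnings;

# Illustration only: builds "<NAME attr="value" ...>" from a tag name and
# a hash reference of attributes. No escaping of values is attempted.
sub sketch_join_tag {
    my ($name, $attributes) = @_;
    return undef unless defined $name && ref($attributes) eq 'HASH';

    my $tag = "<$name";
    for my $attr (sort keys %$attributes) {
        my $value = $attributes->{$attr};
        # An undefined value is emitted as a bare attribute name.
        $tag .= defined $value ? qq{ $attr="$value"} : " $attr";
    }
    return "$tag>";
}

my $anchor = sketch_join_tag('A', { href => 'http://foo/' });
# $anchor is '<A href="http://foo/">'
```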
1228
1229 utils_request_clone
1230 Params: \%from_request, \%to_request
1231
1232 Return: 1 on success, 0 on error
1233
1234 This function takes the connection/request-specific values from the
1235 given from_request hash, and copies them to the to_request hash.
1236
1237 utils_request_fingerprint
1238 Params: \%request [, $hash ]
1239
1240 Return: $fingerprint [undef on error]
1241
1242 This function constructs a 'fingerprint' of the given request by
1243 using a cryptographic hashing function on the constructed original
1244 HTTP request.
1245
1246 Note: $hash can be 'md5' (default) or 'md4'.
1247
1248 utils_flatten_lwhash
1249 Params: \%lwhash
1250
1251 Return: $flat_version [undef on error]
1252
1253 This function takes a %request or %response libwhisker hash, and
1254 creates an approximate flat data string of the original request/
1255 response (i.e. before it was parsed into components and placed into
1256 the libwhisker hash).
1257
1258 utils_carp
1259 Params: [ $package_name ]
1260
1261 Return: nothing
1262
1263 This function acts like Carp's carp function. It warns with the
1264 file and line number of the user's code which causes a problem. It
1265 traces up the call stack and reports the first function that is not
1266 in the LW2 package or the optional $package_name package.
1267
1268 utils_croak
1269 Params: [ $package_name ]
1270
1271 Return: nothing
1272
1273 This function acts like Carp's croak function. It dies with the
1274 file and line number of the user's code which causes a problem. It
1275 traces up the call stack and reports the first function that is not
1276 in the LW2 package or the optional $package_name package.
1277
1279 LWP
1280
1282 Copyright 2001-2006 Rain Forest Puppy
1283
1284 This program is free software; you can redistribute it and/or modify it
1285 under the terms of the GPL.
1286
1287
1288
12892.4 2007-05-27 LW2(3)