1LW2(3) User Contributed Perl Documentation LW2(3)
2
3
4
5 NAME
6 LW2 - Perl HTTP library version 2.5
7
8 SYNOPSIS
9 use LW2;
10
11 require 'LW2.pm';
12
13 DESCRIPTION
14 Libwhisker is a Perl library useful for HTTP testing scripts. It
15 contains a pure-Perl reimplementation of functionality found in the
16 "LWP", "URI", "Digest::MD5", "Digest::MD4", "Data::Dumper",
17 "Authen::NTLM", "HTML::Parser", "HTML::FormParser", "CGI::Upload",
18 "MIME::Base64", and "Getopt::Std" modules.
19
20 Libwhisker is designed to be portable (a single perl file), fast
21 (general benchmarks show libwhisker is faster than LWP), and flexible
22 (great care was taken to ensure the library does exactly what you want
23 to do, even if it means breaking the protocol).
24
25 FUNCTIONS
26 The following are the functions contained in Libwhisker:
27
28 auth_brute_force
29 Params: $auth_method, \%req, $user, \@passwords [, $domain,
30 $fail_code ]
31
32 Return: $first_valid_password, undef if error/none found
33
34 Perform an HTTP authentication brute force against a server (host
35 and URI defined in %req). It will try every password in the
36 password array for the given user. The first password (in
37 conjunction with the given user) that doesn't return HTTP 401 is
38 returned (and the brute force is stopped at that point). You
39 should retry the request with the given password and double-check
40 that you got a useful HTTP return code that indicates successful
41 authentication (200, 302), and not something a bit more abnormal
42 (407, 500, etc). $domain is optional, and is only used for NTLM
43 auth.
44
45 Note: set up any proxy settings and proxy auth in %req before
46 calling this function.
47
48 You can brute-force proxy authentication by setting up the target
49 proxy as proxy_host and proxy_port in %req, using an arbitrary host
50 and uri (preferably one that is reachable upon successful proxy
51 authorization), and setting the $fail_code to 407. The
52 $auth_method passed to this function should be a proxy-based one
53 ('proxy-basic', 'proxy-ntlm', etc).
54
55 If your server returns something other than 401 upon auth failure,
56 then set $fail_code to whatever is returned (and it needs to be
57 something *different* than what is received on auth success, or
58 this function won't be able to tell the difference).
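The stop-at-first-non-failure rule, and the reason the result deserves a second look, can be sketched in plain Perl. This is not LW2's own code: first_non_failing() and the %fake_server table are hypothetical stand-ins for auth_brute_force() and a real server, so the example runs without a network.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a server: maps a password to the HTTP status
# code the server would answer with. Anything not listed yields 401.
my %fake_server = ( 'letmein' => 500, 'secret' => 200 );

# Return the first password whose status differs from $fail_code,
# mirroring the rule described above; undef if none is found.
sub first_non_failing {
    my ( $passwords, $fail_code, $try ) = @_;
    $fail_code = 401 unless defined $fail_code;
    for my $password (@$passwords) {
        my $code = $try->($password);
        return $password if $code != $fail_code;
    }
    return undef;
}

my $try   = sub { my $pw = shift; return $fake_server{$pw} || 401 };
my $found = first_non_failing( [qw(guess letmein secret)], 401, $try );

# 'letmein' is reported because 500 is not 401, even though it is not a
# successful login. Re-check the status before trusting the result.
my $status = $try->($found);
my $usable = ( $status == 200 || $status == 302 ) ? 1 : 0;
print "candidate=$found status=$status usable=$usable\n";
```

With the real library, the equivalent call would be LW2::auth_brute_force('basic', \%req, $user, \@passwords), followed by a fresh request using the returned password.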
59
60 auth_unset
61 Params: \%req
62
63 Return: nothing (modifies %req)
64
65 Modifies %req to disable all authentication (regular and proxy).
66
67 Note: it removes the values set by auth_set(). Manually-
68 defined [Proxy-]Authorization headers will also be deleted (but you
69 shouldn't be using the auth_* functions if you're manually handling
70 your own auth...)
71
72 auth_set
73 Params: $auth_method, \%req, $user, $password [, $domain]
74
75 Return: nothing (modifies %req)
76
77 Modifies %req to use the indicated authentication info.
78
79 Auth_method can be: 'basic', 'proxy-basic', 'ntlm', 'proxy-ntlm'.
80
81 Note: this function may not necessarily set any headers after being
82 called. Also, proxy-ntlm with SSL is not currently supported.
83
84 cookie_new_jar
85 Params: none
86
87 Return: $jar
88
89 Create a new cookie jar, for use with the other functions. Even
90 though the jar is technically just a hash, you should still use
91 this function in order to be future-compatible (should the jar
92 format change).
93
94 cookie_read
95 Params: $jar, \%response [, \%request, $reject ]
96
97 Return: $num_of_cookies_read
98
99 Read in cookies from a %response hash, and put them in $jar.
100
101 Notice: cookie_read uses internal magic done by http_do_request in
102 order to read cookies regardless of 'Set-Cookie[2]' header
103 appearance.
104
105 If the optional %request hash is supplied, then it will be used to
106 calculate default host and path values, in case the cookie doesn't
107 specify them explicitly. If $reject is set to 1, then the %request
108 hash values are used to calculate and reject cookies which are not
109 appropriate for the path and domains of the given request.
110
111 cookie_parse
112 Params: $jar, $cookie [, $default_domain, $default_path, $reject ]
113
114 Return: nothing
115
116 Parses the cookie into the various parts and then sets the
117 appropriate values in the cookie $jar. If the cookie value is
118 blank, it will delete it from the $jar. See the 'docs/cookies.txt'
119 document for a full explanation of how Libwhisker parses cookies
120 and what RFC aspects are supported.
121
122 The optional $default_domain value is taken literally. Values with
123 no leading dot (e.g. 'www.host.com') are considered to be strict
124 hostnames and will only match the identical hostname. Values with
125 leading dots (e.g. '.host.com') are treated as sub-domain matches
126 for a single domain level. If the cookie does not indicate a
127 domain, and a $default_domain is not provided, then the cookie is
128 considered to match all domains/hosts.
129
130 The optional $default_path is used when the cookie does not specify
131 a path. $default_path must be absolute (start with '/'), or it
132 will be ignored. If the cookie does not specify a path, and
133 $default_path is not provided, then the default value '/' will be
134 used.
135
136 Set $reject to 1 if you wish to reject cookies based upon the
137 provided $default_domain and $default_path. Note that
138 $default_domain and $default_path must be specified for $reject to
139 actually do something meaningful.
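The $default_path fallback order described for cookie_parse() can be written out as a small plain-Perl function. The helper name effective_path() is hypothetical and is not part of LW2; it only restates the three rules (cookie path first, then an absolute default, then '/').

```perl
use strict;
use warnings;

# Hypothetical helper: pick the path to store for a cookie, following
# the fallback order described for cookie_parse().
sub effective_path {
    my ( $cookie_path, $default_path ) = @_;
    return $cookie_path if defined $cookie_path && length $cookie_path;

    # A default is only honoured when it is absolute.
    return $default_path
      if defined $default_path && $default_path =~ m{^/};
    return '/';
}

my @results = (
    effective_path( '/app', '/ignored' ),    # cookie supplies its own path
    effective_path( undef,  '/cgi-bin' ),    # absolute default is used
    effective_path( undef,  'cgi-bin' ),     # relative default is ignored
    effective_path( undef,  undef ),         # nothing given
);
print join( ' ', @results ), "\n";
```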
140
141 cookie_write
142 Params: $jar, \%request, $override
143
144 Return: nothing
145
146 Goes through the given $jar and sets the Cookie header in %request
147 for every cookie whose domain and path match the request. If
148 $override is true, the secure, domain and path restrictions of the
149 cookies are ignored and all cookies are included.
150
151 Notice: cookie expiration is currently not implemented. URL
152 restriction comparison is also case-insensitive.
153
154 cookie_get
155 Params: $jar, $name
156
157 Return: @elements
158
159 Fetch the named cookie from the $jar, and return the components.
160 The returned items will be an array in the following order:
161
162 value, domain, path, expire, secure
163
164 value = cookie value, should always be a non-empty string
165 domain = domain root for cookie, can be undefined
166 path = URL path for cookie, should always be a non-empty string
167 expire = undefined (deprecated, but exists for backwards-compatibility)
168 secure = whether or not the cookie is limited to HTTPS; value is 0 or 1
169
170 cookie_get_names
171 Params: $jar
172
173 Return: @names
174
175 Fetch all the cookie names from the jar, which then lets you
176 cookie_get() them individually.
177
178 cookie_get_valid_names
179 Params: $jar, $domain, $url, $ssl
180
181 Return: @names
182
183 Fetch all the cookie names from the jar which are valid for the
184 given $domain, $url, and $ssl values. $domain should be a string
185 scalar of the target host domain ('www.example.com', etc.). $url
186 should be the absolute URL for the page ('/index.html',
187 '/cgi-bin/foo.cgi', etc.). $ssl should be 0 for non-secure
188 cookies, or 1 for all (secure and normal) cookies. The return
189 value is an array of names compatible with cookie_get().
190
191 cookie_set
192 Params: $jar, $name, $value, $domain, $path, $expire, $secure
193
194 Return: nothing
195
196 Set the named cookie with the provided values into the $jar. $name
197 is required to be a non-empty string. $value is required, and will
198 delete the named cookie from the $jar if it is an empty string.
199 $domain and $path can be strings or undefined. $expire is ignored
200 (but exists for backwards-compatibility). $secure should be the
201 numeric value of 0 or 1.
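A minimal sketch of the jar functions working together, using only the signatures documented above. LW2 is not a core module, so the example loads it defensively and does nothing if it is missing; the cookie name and values are made up for illustration.

```perl
use strict;
use warnings;

# LW2 is not a core module, so load it defensively.
my $have_lw2 = eval { require LW2; 1 } ? 1 : 0;

if ($have_lw2) {
    my $jar = LW2::cookie_new_jar();

    # name, value, domain, path, expire (ignored), secure
    LW2::cookie_set( $jar, 'session', 'abc123', 'www.example.com', '/',
        undef, 0 );

    for my $name ( LW2::cookie_get_names($jar) ) {
        my ( $value, $domain, $path, $expire, $secure ) =
          LW2::cookie_get( $jar, $name );
        print "$name=$value domain=$domain path=$path secure=$secure\n";
    }

    # An empty value removes the cookie again.
    LW2::cookie_set( $jar, 'session', '', undef, undef, undef, 0 );
}
else {
    print "LW2 not installed; skipping cookie jar example\n";
}
```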
202
203 crawl_new
204 Params: $START, $MAX_DEPTH, \%request_hash [, \%tracking_hash ]
205
206 Return: $crawl_object
207
208 The crawl_new() function initializes a crawl object (hash) to the
209 default values, and then returns it for later use by crawl().
210 $START is the starting URL (in the form of
211 'http://www.host.com/url'), and MAX_DEPTH is the maximum number of
212 levels to crawl (the START URL counts as 1, so a value of 2 will
213 crawl the START URL and all URLs found on that page). The
214 request_hash is a standard initialized request hash to be used for
215 requests; you should set any authentication information or headers
216 in this hash in order for the crawler to use them. The optional
217 tracking_hash lets you supply a hash for use in tracking URL
218 results (otherwise crawl_new() will allocate a new anon hash).
219
220 crawl
221 Params: $crawl_object [, $START, $MAX_DEPTH ]
222
223 Return: $count [ undef on error ]
224
225 The heart of the crawl package. Will perform an HTTP crawl on the
226 specified HOST, starting at START URI, proceeding up to MAX_DEPTH.
227
228 Crawl_object needs to be the variable returned by crawl_new(). You
229 can also indirectly call crawl() via the crawl_object itself:
230
231 $crawl_object->{crawl}->($START,$MAX_DEPTH)
232
233 Returns the number of URLs actually crawled (not including those
234 skipped).
235
236 dump
237 Params: $name, \@array [, $name, \%hash, $name, \$scalar ]
238
239 Return: $code [ undef on error ]
240
241 The dump function will take the given $name and data reference, and
242 will create an ASCII perl code representation suitable for eval'ing
243 later to recreate the same structure. $name is the name of the
244 variable that it will be saved as. Example:
245
246 $output = LW2::dump('request',\%request);
247
248 NOTE: dump() creates anonymous structures under the name given.
249 For example, if you dump the hash %hin under the name 'hin', then
250 when you eval the dumped code you will need to use %$hin, since
251 $hin is now a *reference* to a hash.
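The reference caveat can be seen with a short eval round trip. The string below is only a stand-in for what dump() would return; the exact text LW2 emits is assumed, and only its shape (an anonymous hash assigned to $hin) matters here.

```perl
use strict;
use warnings;

# Stand-in for the text dump() would return for ('hin', \%hin). The
# exact layout LW2 emits is an assumption; only the shape matters:
# an assignment of an anonymous hash to $hin.
my $dumped = q{$hin = { 'host' => 'localhost', 'port' => 80 };};

our $hin;    # package variable so the eval'd assignment can find it
eval $dumped;
die "eval failed: $@" if $@;

# $hin is a reference, so dereference it to get the hash back.
my %restored = %$hin;
print "$restored{host}:$restored{port}\n";
```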
252
253 dump_writefile
254 Params: $file, $name, \@array [, $name, \%hash, $name, \$scalar ]
255
256 Return: 0 if success; 1 if error
257
258 This calls dump() and saves the output to the specified $file.
259
260 Note: LW2 does no checking on the validity of the file name, its
261 creation, or anything of the sort. Files are opened in overwrite
262 mode.
263
264 encode_base64
265 Params: $data [, $eol]
266
267 Return: $b64_encoded_data
268
269 This function does Base64 encoding. If the binary MIME::Base64
270 module is available, it will use that; otherwise, it falls back to
271 an internal perl version. The perl version carries the following
272 copyright:
273
274 Copyright 1995-1999 Gisle Aas <gisle@aas.no>
275
276 NOTE: the $eol parameter will be inserted every 76 characters.
277 This is used to format the data for output on an 80-character-wide
278 terminal.
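The effect of $eol is easiest to see with the core MIME::Base64 module, which is the implementation LW2 prefers when it is present. The sketch assumes LW2's internal fallback breaks lines the same way, since it is described as derived from that module.

```perl
use strict;
use warnings;
use MIME::Base64 ();    # core module; the one LW2 prefers when present

my $data = 'A' x 100;

# With an $eol, the output is broken after every 76 characters.
my $wrapped = MIME::Base64::encode_base64( $data, "\n" );
my @lines   = split /\n/, $wrapped;

# With an empty $eol, the output is one unbroken string.
my $flat = MIME::Base64::encode_base64( $data, '' );

printf "%d lines, first is %d chars; flat is %d chars\n",
  scalar(@lines), length( $lines[0] ), length($flat);
```

Passing an empty string as $eol is the usual choice when the result goes into an HTTP header, where embedded newlines are not wanted.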
279
280 decode_base64
281 Params: $data
282
283 Return: $b64_decoded_data
284
285 A perl implementation of base64 decoding. The perl code for this
286 function was actually taken from an older MIME::Base64 perl module,
287 and bears the following copyright:
288
289 Copyright 1995-1999 Gisle Aas <gisle@aas.no>
290
291 encode_uri_hex
292 Params: $data
293
294 Return: $result
295
296 This function encodes every character (except the / character) with
297 normal URL hex encoding.
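As a rough illustration of the described behaviour, the following plain-Perl function hex-encodes every character except '/'. The name uri_hex_sketch() is hypothetical, and the lowercase hex digits are a guess; LW2's actual output may use a different case.

```perl
use strict;
use warnings;

# Hypothetical re-creation of the behaviour described above: every
# character except '/' becomes %XX. Lowercase hex digits are a guess.
sub uri_hex_sketch {
    my ($data) = @_;
    $data =~ s/([^\/])/sprintf( '%%%02x', ord($1) )/ge;
    return $data;
}

my $encoded = uri_hex_sketch('/a b');
print "$encoded\n";
```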
298
299 encode_uri_randomhex
300 Params: $data
301
302 Return: $result
303
304 This function randomly encodes characters (except the / character)
305 with normal URL hex encoding.
306
307 encode_uri_randomcase
308 Params: $data
309
310 Return: $result
311
312 This function randomly changes the case of characters in the
313 string.
314
315 encode_unicode
316 Params: $data
317
318 Return: $result
319
320 This function converts a normal string into Windows unicode format
321 (non-overlong or anything fancy).
322
323 decode_unicode
324 Params: $unicode_string
325
326 Return: $decoded_string
327
328 This function attempts to decode a unicode (UTF-8) string by
329 converting it into a single-byte-character string. Overlong
330 characters are converted to their standard characters in place;
331 non-overlong (aka multi-byte) characters are substituted with
332 0xff; invalid encoding characters are left as-is.
333
334 Note: this function is useful for dealing with the various unicode
335 exploits/vulnerabilities found in web servers; it is *not* good for
336 doing actual UTF-8 parsing, since characters over a single byte are
337 basically dropped/replaced with a placeholder.
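The difference between an overlong and a genuine multi-byte sequence can be shown by decoding two-byte UTF-8 by hand. This is a sketch of the bit arithmetic only, not LW2's decoder; decode_two_byte() is a hypothetical helper.

```perl
use strict;
use warnings;

# Decode one two-byte UTF-8 sequence by hand: 110xxxxx 10yyyyyy.
sub decode_two_byte {
    my ( $b1, $b2 ) = @_;
    return ( ( $b1 & 0x1f ) << 6 ) | ( $b2 & 0x3f );
}

# 0xC0 0xAF is an overlong encoding of '/' (0x2F): the value fits in
# one byte but was spread across two.
my $overlong = decode_two_byte( 0xc0, 0xaf );

# 0xC5 0x82 is a genuine two-byte character (U+0142). Per the text
# above, decode_unicode() would substitute 0xff for such a character.
my $multibyte = decode_two_byte( 0xc5, 0x82 );

printf "overlong -> 0x%02x ('%s'), multibyte -> 0x%x\n",
  $overlong, chr($overlong), $multibyte;
```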
338
339 encode_anti_ids
340 Params: \%request, $modes
341
342 Return: nothing
343
344 encode_anti_ids computes the proper anti-ids encoding/tricks
345 specified by $modes, and sets up %request in order to use those tricks.
346 Valid modes are (the mode numbers are the same as those found in
347 whisker 1.4):
348
349 1 Encode some of the characters via normal URL encoding
350 2 Insert directory self-references (/./)
351 3 Premature URL ending (make it appear the request line is done)
352 4 Prepend a long random string in the form of "/string/../URL"
353 5 Add a fake URL parameter
354 6 Use a tab instead of a space as a request spacer
355 7 Change the case of the URL (works against Windows and Novell)
356 8 Change normal separators ('/') to Windows version ('\')
357 9 Session splicing [NOTE: not currently available]
358 A Use a carriage return (0x0d) as a request spacer
359 B Use binary value 0x0b as a request spacer
360
361 You can set multiple modes by setting the string to contain all the
362 modes desired; i.e. $modes="146" will use modes 1, 4, and 6.
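Since $modes is read one character at a time, a mode string can be expanded into its individual modes as follows. The lookup table is abbreviated from the list above; how LW2 itself treats unknown or lowercase characters is not stated, so this sketch simply drops anything not in the table.

```perl
use strict;
use warnings;

# Mode descriptions abbreviated from the list above.
my %mode_name = (
    '1' => 'URL-encode some characters',
    '2' => 'insert /./ self-references',
    '3' => 'premature URL ending',
    '4' => 'prepend /string/../',
    '5' => 'add a fake URL parameter',
    '6' => 'tab as request spacer',
    '7' => 'change URL case',
    '8' => 'backslash separators',
    '9' => 'session splicing (not available)',
    'A' => 'carriage return as request spacer',
    'B' => '0x0b as request spacer',
);

my $modes    = '146';
my @selected = grep { exists $mode_name{$_} } split //, $modes;
print "mode $_: $mode_name{$_}\n" for @selected;
```

The same string would be passed to the library as LW2::encode_anti_ids(\%request, '146').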
363
364 FORMS FUNCTIONS
365 The goal is to parse the variable, human-readable HTML into
366 concrete structures usable by your program. The forms functions
367 do a good job of making these structures, but I will admit:
368 they are not exactly simple, and thus not a cinch to work with.
369 But then again, representing something as complex as an HTML form is
370 not a simple thing either. I think the results are acceptable for
371 what's trying to be done. Anyways...
372
373 Forms are stored in perl hashes, with elements in the following
374 format:
375
376 $form{'element_name'}=@([ 'type', 'value', @params ])
377
378 Thus every element in the hash is an array of anonymous arrays.
379 The first array value contains the element type (which is 'select',
380 'textarea', 'button', or an 'input' value of the form 'input-text',
381 'input-hidden', 'input-radio', etc).
382
383 The second value is the value, if applicable (it could be undef if
384 no value was specified). Note that select elements will always
385 have an undef value--the actual values are in the subsequent
386 options elements.
387
388 The third value, if defined, is an anonymous array of additional
389 tag parameters found in the element (like 'onchange="blah"',
390 'size="20"', 'maxlength="40"', 'selected', etc).
391
392 The hash does contain one special element, which is stored in the
393 hash under a NULL character ("\0") key. This element is of the
394 format:
395
396 $form{"\0"}=['name', 'method', 'action', @parameters];
397
398 The element is an anonymous array that contains strings of the
399 form's name, method, and action (values can be undef), and a
400 @parameters array similar to that found in normal elements (above).
401
402 Accessing individual values stored in the form hash becomes a test
403 of your perl referencing skills. Hint: to access the 'value' of
404 the third element named 'choices', you would need to do:
405
406 $form{'choices'}->[2]->[1];
407
408 The '[2]' is the third element (normal array starts with 0), and
409 the actual value is '[1]' (the type is '[0]', and the parameter
410 array is '[2]').
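To make the layout concrete, here is a hand-built form hash that follows the description above, together with the lookups it supports. The form contents are invented for illustration, and the "\0" entry is shown with only its first three fields.

```perl
use strict;
use warnings;

# Hand-built form hash following the layout described above. Each key
# holds an array of [ type, value, \@params ] entries.
my %form = (
    "\0"      => [ 'survey', 'POST', '/submit.cgi' ],
    'choices' => [
        [ 'input-radio', 'red',   [] ],
        [ 'input-radio', 'green', [] ],
        [ 'input-radio', 'blue',  ['checked'] ],
    ],
    'comment' => [ [ 'textarea', undef, ['rows="4"'] ] ],
);

# Third 'choices' element, second field (the value).
my $third_value = $form{'choices'}->[2]->[1];

# The "\0" entry describes the form itself.
my ( $form_name, $method, $action ) = @{ $form{"\0"} };
print "$method $action: third choice is $third_value\n";

# Walk every real element, skipping the special "\0" key.
for my $name ( sort grep { $_ ne "\0" } keys %form ) {
    for my $element ( @{ $form{$name} } ) {
        my ( $type, $value ) = @$element;
        printf "%s (%s) = %s\n", $name, $type,
          defined $value ? $value : '(undef)';
    }
}
```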
411
412 forms_read
413 Params: \$html_data
414
415 Return: \@found_forms
416
417 This function parses the given $html_data into libwhisker form
418 hashes. It returns a reference to an array of hash references to
419 the found forms.
420
421 forms_write
422 Params: \%form_hash
423
424 Return: $html_of_form [undef on error]
425
426 This function will take the given %form hash and compose a generic
427 HTML representation of it, formatted with tabs and newlines in
428 order to make it neat and tidy for printing.
429
430 Note: this function does *not* escape any special characters that
431 were embedded in the element values.
432
433 html_find_tags
434 Params: \$data, \&callback_function [, $xml_flag, $funcref,
435 \%tag_map]
436
437 Return: nothing
438
439 html_find_tags parses a piece of HTML and 'extracts' all found
440 tags, passing the info to the given callback function. The
441 callback function must accept two parameters: the current tag (as a
442 scalar), and a hash ref of all the tag's elements. For example, the
443 tag <a href="/file"> will pass 'a' as the current tag, and a hash
444 reference which contains {'href'=>"/file"}.
445
446 The xml_flag, when set, causes the parser to do some extra
447 processing and checks to accommodate XML style tags such as <tag
448 foo="bar"/>.
449
450 The optional %tag_map is a hash of lowercase tag names. If a tag map
451 is supplied, then the parser will only call the callback function
452 if the tag name exists in the tag map.
453
454 The optional $funcref variable is passed straight to the callback
455 function, allowing you to pass flags or references to more complex
456 structures to your callback function.
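A possible callback, assuming LW2 is installed. Only the first two callback arguments documented above (tag name and attribute hash ref) are used; the HTML snippet and the collect_href() name are made up for the example, and the script does nothing if LW2 cannot be loaded.

```perl
use strict;
use warnings;

# LW2 is not a core module, so load it defensively.
my $have_lw2 = eval { require LW2; 1 } ? 1 : 0;
my @links;

# Callback: receives the tag name and a hash ref of its attributes.
sub collect_href {
    my ( $tag, $attrs ) = @_;
    push @links, $attrs->{href} if defined $attrs->{href};
}

if ($have_lw2) {
    my $html = '<p><a href="/file">one</a> <img src="/x.png"></p>';

    # Tag map restricts the callback to <a> tags; no XML mode, no $funcref.
    LW2::html_find_tags( \$html, \&collect_href, 0, undef, { 'a' => 1 } );
    print "found: @links\n";
}
else {
    print "LW2 not installed; skipping html_find_tags example\n";
}
```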
457
458 html_find_tags_rewrite
459 Params: $position, $length, $replacement
460
461 Return: nothing
462
463 html_find_tags_rewrite() is used to 'rewrite' an HTML stream from
464 within an html_find_tags() callback function. In general, you can
465 think of html_find_tags_rewrite working as:
466
467 substr(DATA, $position, $length) = $replacement
468
469 Where DATA is the current HTML string the html parser is using.
470 The reason you need to use this function and not substr() is
471 because a few internal parser pointers and counters need to be
472 adjusted to accommodate the changes.
473
474 If you want to remove a piece of the string, just set the
475 replacement to an empty string (''). If you wish to insert a
476 string instead of overwrite, just set $length to 0; your string
477 will be inserted at the indicated $position.
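The three cases (replace, remove, insert) behave like Perl's own lvalue substr(), which this standalone sketch uses directly on an ordinary string. Inside a real callback you would call html_find_tags_rewrite() with the same position, length, and replacement values instead.

```perl
use strict;
use warnings;

my $data = '<b>bold</b> text';

# Replace: overwrite the 3 characters at position 0.
substr( $data, 0, 3 ) = '<strong>';

# Remove: an empty replacement deletes the 4-character closing tag.
substr( $data, 12, 4 ) = '';

# Insert: a length of 0 adds text without overwriting anything.
substr( $data, 12, 0 ) = '</strong>';

print "$data\n";
```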
478
479 html_link_extractor
480 Params: \$html_data
481
482 Return: @urls
483
484 The html_link_extractor() function uses the internal crawl tests to
485 extract all the HTML links from the given HTML data stream.
486
487 Note: html_link_extractor() does not de-duplicate the returned array of
488 discovered links, nor does it attempt to remove javascript links or
489 make the links absolute. It just extracts every raw link from the
490 HTML stream and returns it. You'll have to do your own post-
491 processing.
492
493 http_new_request
494 Params: %parameters
495
496 Return: \%request_hash
497
498 This function basically 'objectifies' the creation of whisker
499 request hash objects. You would call it like:
500
501 $req = http_new_request( host=>'www.example.com', uri=>'/' )
502
503 where 'host' and 'uri' can be any number of {whisker} hash control
504 values (see http_init_request for default list).
505
506 http_new_response
507 Params: [none]
508
509 Return: \%response_hash
510
511 This function basically 'objectifies' the creation of whisker
512 response hash objects. You would call it like:
513
514 $resp = http_new_response()
515
516 http_init_request
517 Params: \%request_hash_to_initialize
518
519 Return: Nothing (modifies input hash)
520
521 Sets default values to the input hash for use. Sets the host to
522 'localhost', port 80, request URI '/', using HTTP 1.1 with GET
523 method. The timeout is set to 10 seconds, no proxies are defined,
524 and all URI formatting is set to standard HTTP syntax. It also
525 sets the Connection (Keep-Alive) and User-Agent headers.
526
527 NOTICE!! It's important to use http_init_request before calling
528 http_do_request, or http_do_request might puke. Thus, a special
529 magic value is placed in the hash to let http_do_request know that
530 the hash has been properly initialized. If you really must 'roll
531 your own' and not use http_init_request before you call
532 http_do_request, you will at least need to set the MAGIC value
533 (amongst other things).
534
535 http_do_request
536 Params: \%request, \%response [, \%configs]
537
538 Return: >=1 if error; 0 if no error (also modifies response hash)
539
540 *THE* core function of libwhisker. http_do_request actually
541 performs the HTTP request, using the values submitted in %request,
542 and placing result values in %response. This allows you to
543 resubmit %request in subsequent requests (%response is
544 automatically cleared upon execution). You can submit 'runtime'
545 config directives as %configs, which will be spliced into
546 $request{whisker}->{} before anything else. That means you can do:
547
548 LW2::http_do_request(\%req,\%resp,{'uri'=>'/cgi-bin/'});
549
550 This will set $req{whisker}->{'uri'}='/cgi-bin/' before execution,
551 and provides a simple shortcut (note: it does modify %req).
552
553 This function will also retry any requests that bomb out during the
554 transaction (but not during the connecting phase). This is
555 controlled by the {whisker}->{retry} value. Also note that the
556 returned error message in %response is the *last* error received. All
557 retry errors are put into {whisker}->{retry_errors}, which is an
558 anonymous array.
559
560 Also note that all NTLM auth logic is implemented in
561 http_do_request(). NTLM requires multiple requests in order to
562 work correctly, and so this function attempts to wrap that and make
563 it all transparent, so that the final end result is what's passed
564 to the application.
565
566 This function will return 0 on success, 1 on HTTP protocol error,
567 and 2 on non-recoverable network connection error (you can retry
568 error 1, but error 2 means that the server is totally unreachable
569 and there's no point in retrying).
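An application-level retry loop built on these return codes might look like the sketch below. It is separate from the library's own {whisker}->{retry} handling; request_with_retry() is a hypothetical wrapper, and a canned list of outcomes stands in for real requests so the example runs offline.

```perl
use strict;
use warnings;

# Hypothetical wrapper: $do_request is any code ref that returns
# 0 (success), 1 (protocol error) or 2 (unreachable), as described above.
sub request_with_retry {
    my ( $do_request, $max_attempts ) = @_;
    my $result;
    for my $attempt ( 1 .. $max_attempts ) {
        $result = $do_request->();
        return ( $result, $attempt ) if $result == 0;    # done
        return ( $result, $attempt ) if $result >= 2;    # no point retrying
    }
    return ( $result, $max_attempts );
}

# Simulated server: fails twice with a protocol error, then succeeds.
my @outcomes = ( 1, 1, 0 );
my ( $result, $attempts ) =
  request_with_retry( sub { shift @outcomes }, 5 );
print "result=$result after $attempts attempt(s)\n";
```

With the real library, the code ref would wrap the actual call, for example sub { LW2::http_do_request(\%req, \%resp) }, after %req has been created with http_new_request() and passed through http_fixup_request().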
570
571 http_req2line
572 Params: \%request, $uri_only_switch
573
574 Return: $request
575
576 http_req2line is used internally by http_do_request, and also provides
577 a convenient way to turn a %request configuration into an actual
578 HTTP request line. If $uri_only_switch is set to 1, then the returned
579 $request will be the URI only ('/requested/page.html'), versus the
580 entire HTTP request ('GET /requested/page.html HTTP/1.0\n\n').
581 Also, if the 'full_request_override' whisker config variable is set
582 in %request, then it will be returned instead of the constructed URI.
583
584 http_resp2line
585 Params: \%response
586
587 Return: $response
588
589 http_resp2line provides a convenient way to turn a %response hash
590 back into the original HTTP response line.
591
592 http_fixup_request
593 Params: $hash_ref
594
595 Return: Nothing
596
597 This function takes a %hin hash reference and makes sure the proper
598 headers exist (for example, it will add the Host: header, calculate
599 the Content-Length: header for POST requests, etc). For standard
600 requests (i.e. you want the request to be HTTP RFC-compliant), you
601 should call this function right before you call http_do_request.
602
603 http_reset
604 Params: Nothing
605
606 Return: Nothing
607
608 The http_reset function will walk through the %http_host_cache,
609 closing all open sockets and freeing SSL resources. It also clears
610 out the host cache in case you need to rerun everything fresh.
611
612 Note: if you just want to close a single connection, and you have a
613 copy of the %request hash you used, you should use the http_close()
614 function instead.
615
616 ssl_is_available
617 Params: Nothing
618
619 Return: $boolean [, $lib_name, $version]
620
621 The ssl_is_available() function will inform you whether SSL
622 requests are allowed, which is dependent on whether the appropriate
623 SSL libraries are installed on the machine. In scalar context, the
624 function will return 1 or 0. In array context, the second element
625 will be the SSL library name that is currently being used by LW2,
626 and the third element will be the SSL library version number.
627 Elements two and three (name and version) will be undefined if
628 called in array context and no SSL libraries are available.
629
630 http_read_headers
631 Params: $stream, \%in, \%out
632
633 Return: $result_code, $encoding, $length, $connection
634
635 Read HTTP headers from the given stream, storing the results in
636 %out. On success, $result_code will be 1 and $encoding, $length,
637 and $connection will hold the values of the Transfer-Encoding,
638 Content-Length, and Connection headers, respectively. If any of
639 those headers is not present, the corresponding value will be undef.
640 On an error, the $result_code will be 0 and $encoding will contain
641 an error message.
642
643 This function can be used to parse both request and response
644 headers.
645
646 Note: if there are multiple Transfer-Encoding, Content-Length, or
647 Connection headers, then only the last header value is the one
648 returned by the function.
649
650 http_read_body
651 Params: $stream, \%in, \%out, $encoding, $length
652
653 Return: 1 on success, 0 on error (and sets
654 $out->{whisker}->{error})
655
656 Read the body from the given stream, placing it in
657 $out->{whisker}->{data}. Handles chunked encoding. Can be used to
658 read HTTP (POST) request or HTTP response bodies. $encoding
659 parameter should be lowercase encoding type.
660
661 NOTE: $out->{whisker}->{data} is erased/cleared when this function
662 is called, leaving {data} to just contain this particular HTTP
663 body.
664
665 http_construct_headers
666 Params: \%in
667
668 Return: $data
669
670 This function assembles the headers in the given hash into a data
671 string.
672
673 http_close
674 Params: \%request
675
676 Return: nothing
677
678 This function will close any open streams for the given request.
679
680 Note: in order for http_close() to find the right connection, all
681 original host/proxy/port parameters in %request must be the exact
682 same as when the original request was made.
683
684 http_do_request_timeout
685 Params: \%request, \%response, $timeout
686
687 Return: $result
688
689 This function is identical to http_do_request(), except that it
690 wraps the entire request in a timeout wrapper. $timeout is the
691 number of seconds to allow for the entire request to be completed.
692
693 Note: this function uses alarm() and signals, and thus will only
694 work on Unix-ish platforms. It should be safe to call on any
695 platform though.
696
697 md5 Params: $data
698
699 Return: $hex_md5_string
700
701 This function takes a data scalar, and composes a MD5 hash of it,
702 and returns it in a hex ascii string. It will use the fastest MD5
703 function available.
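Because MD5 output is fixed by the algorithm, the core Digest::MD5 module can serve as a reference when checking what md5() returns. The sketch below uses only Digest::MD5; it assumes LW2::md5() produces the same lowercase 32-character hex string for the same input.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);    # core module, used here as a reference

my $digest = md5_hex('abc');
print "$digest\n";
```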
704
705 md4 Params: $data
706
707 Return: $hex_md4_string
708
709 This function takes a data scalar, and composes a MD4 hash of it,
710 and returns it in a hex ascii string. It will use the fastest MD4
711 function available.
712
713 multipart_set
714 Params: \%multi_hash, $param_name, $param_value
715
716 Return: nothing
717
718 This function sets the named parameter to the given value within
719 the supplied multipart hash.
720
721 multipart_get
722 Params: \%multi_hash, $param_name
723
724 Return: $param_value, undef on error
725
726 This function retrieves the value of the named parameter from
727 the supplied multipart hash. There is a special case where
728 the named parameter is actually a file--in which case the resulting
729 value will be "\0FILE". In general, all special values will be
730 prefixed with a NULL character. In order to get a file's info, use
731 multipart_getfile().
732
733 multipart_setfile
734 Params: \%multi_hash, $param_name, $file_path [, $filename]
735
736 Return: undef on error, 1 on success
737
738 NOTE: this function does not actually add the contents of
739 $file_path into the %multi_hash; instead, multipart_write() inserts
740 the content when generating the final request.
741
742 multipart_getfile
743 Params: \%multi_hash, $file_param_name
744
745 Return: $path, $name ($path=undef on error)
746
747 multipart_getfile is used to retrieve information for a file
748 parameter contained in %multi_hash. To use this you would most
749 likely do:
750
751 ($path,$fname)=LW2::multipart_getfile(\%multi,"param_name");
752
753 multipart_boundary
754 Params: \%multi_hash [, $new_boundary_name]
755
756 Return: $current_boundary_name
757
758 multipart_boundary is used to retrieve, and optionally set, the
759 multipart boundary used for the request.
760
761 NOTE: the function does no checking on the supplied boundary, so if
762 you want things to work make sure it's a legit boundary.
763 Libwhisker does *not* prefix it with any '---' characters.
764
765 multipart_write
766 Params: \%multi_hash, \%request
767
768 Return: 1 if successful, undef on error
769
770 multipart_write is used to parse and construct the multipart data
771 contained in %multi_hash, and place it ready to go in the given
772 whisker hash (%request) structure, to be sent to the server.
773
774 NOTE: file contents are read into the final %request, so it's
775 possible for the hash to get *very* large if you have (a) large
776 file(s).
777
778 multipart_read
779 Params: \%multi_hash, \%hout_response [, $filepath ]
780
781 Return: 1 if successful, undef on error
782
783 multipart_read will parse the data contents of the supplied
784 %hout_response hash, by passing the appropriate info to
785 multipart_read_data(). Please see multipart_read_data() for more
786 info on parameters and behaviour.
787
788 NOTE: this function will return an error if the given
789 %hout_response Content-Type is not set to "multipart/form-data".
790
791 multipart_read_data
792 Params: \%multi_hash, \$data, $boundary [, $filepath ]
793
794 Return: 1 if successful, undef on error
795
796 multipart_read_data parses the contents of the supplied data using
797 the given boundary and puts the values in the supplied %multi_hash.
798 Embedded files will *not* be saved unless a $filepath is given,
799 which should be a directory suitable for writing out temporary
800 files.
801
802 NOTE: currently application/octet-stream is the only supported
803 file encoding. All other file encodings will not be parsed/saved.
804
805 multipart_files_list
806 Params: \%multi_hash
807
808 Return: @files
809
810 multipart_files_list returns an array of parameter names for all
811 the files that are contained in %multi_hash.
812
813 multipart_params_list
814 Params: \%multi_hash
815
816 Return: @params
817
818 multipart_params_list returns an array of parameter names for all
819 the regular parameters (non-file) that are contained in
820 %multi_hash.
821
822 ntlm_new
823 Params: $username, $password [, $domain, $ntlm_only]
824
825 Return: $ntlm_object
826
827 Returns a reference to an array (otherwise known as the 'ntlm
828 object') which contains the various pieces of information specific to a
829 user/pass combo. If $ntlm_only is set to 1, then only the NTLM
830 hash (and not the LanMan hash) will be generated. This results in
831 a speed boost, and is typically fine for using against IIS servers.
832
833 The array contains the following items, in order: username,
834 password, domain, lmhash(password), ntlmhash(password)
835
836 ntlm_decode_challenge
837 Params: $challenge
838
839 Return: @challenge_parts
840
841 Splits the supplied challenge into the various parts. The returned
842 array contains elements in the following order:
843
844 unicode_domain, ident, packet_type, domain_len, domain_maxlen,
845 domain_offset, flags, challenge_token, reserved, empty, raw_data
846
847 ntlm_client
848 Params: $ntlm_obj [, $server_challenge]
849
850 Return: $response
851
852 ntlm_client() is responsible for generating the base64-encoded text
853 you include in the HTTP Authorization header. If you call
854 ntlm_client() without a $server_challenge, the function will return
855 the initial NTLM request packet (message packet #1). You send this
856 to the server, and take the server's response (message packet #2)
857 and pass that as $server_challenge, causing ntlm_client() to
858 generate the final response packet (message packet #3).
859
860 Note: $server_challenge is expected to be base64 encoded.
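
The three-message exchange described above could be driven roughly as follows. This is only a sketch: $send is a hypothetical callback (not part of LW2) that places the given text in the HTTP Authorization header, performs the request, and returns the server's base64-encoded challenge, if any. Only ntlm_new() and ntlm_client() come from the library.

```perl
use strict;
use warnings;

# Sketch of the NTLM exchange using the documented LW2 calls.
# $send is a HYPOTHETICAL transport callback supplied by the caller.
sub ntlm_handshake {
    my ( $user, $pass, $domain, $send ) = @_;
    require LW2;    # loaded lazily so this file compiles without it

    my $ntlm = LW2::ntlm_new( $user, $pass, $domain );

    # Message packet #1: no server challenge is available yet.
    my $msg1      = LW2::ntlm_client($ntlm);
    my $challenge = $send->($msg1);    # server answers with packet #2
    return unless defined $challenge;

    # Message packet #3: built from the base64-encoded challenge.
    my $msg3 = LW2::ntlm_client( $ntlm, $challenge );
    return $send->($msg3);
}
```

How the callback extracts the challenge from the server's response is left to the caller, since that depends on how the request is sent.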
861
862 get_page
863 Params: $url [, \%request]
864
865 Return: $code, $data ($code will be set to undef on error, $data
866 will contain error message)
867
868 This function will fetch the page at the given URL, and return the
869 HTTP response code and page contents. Use this in the form of:
870 ($code,$html)=LW2::get_page("http://host.com/page.html")
871
872 The optional %request will be used if supplied. This allows you to
873 set headers and other parameters.
874
875 get_page_hash
876 Params: $url [, \%request]
877
878 Return: $hash_ref (undef on no URL)
879
880 This function will fetch the page at the given URL, and return the
881 whisker HTTP response hash. The return code of the function is set
882 to $hash_ref->{whisker}->{get_page_hash}, and uses the
883 http_do_request() return values.
884
885 Note: undef is returned if no URL is supplied
886
887 get_page_to_file
888 Params: $url, $filepath [, \%request]
889
890 Return: $code ($code will be set to undef on error)
891
892 This function will fetch the page at the given URL, place the
893 resulting HTML in the file specified, and return the HTTP response
894 code. The optional %request hash sets the default parameters to be
895 used in the request.
896
897 NOTE: libwhisker does not do any file checking; libwhisker will
898 open the supplied filepath for writing, overwriting any previously-
899 existing files. Libwhisker does not differentiate between a bad
900 request, and a bad file open. If you're having troubles making
901 this function work, make sure that your $filepath is legal and
902 valid, and that you have appropriate write permissions to
903 create/overwrite that file.
904
905 time_mktime
906 Params: $seconds, $minutes, $hours, $day_of_month, $month,
907 $year_minus_1900
908
909 Return: $seconds [ -1 on error ]
910
911 Performs a general mktime calculation with the given time
912 components. Note that the input parameter values are expected to
913 be in the format output by localtime/gmtime. Namely, $seconds is
914 0-60 (yes, there can be a leap second value of 60 occasionally),
915 $minutes is 0-59, $hours is 0-23, $day_of_month is 1-31, $month is 0-11,
916 and $year_minus_1900 is 70-127. This function will not
917 process dates prior to 1970 or after 2037 (that way 32-bit time_t
918 overflow calculations aren't required).
919
920 Additional parameters passed to the function are ignored, so it is
921 safe to use the full localtime/gmtime output, such as:
922
923 $seconds = LW2::time_mktime( localtime( time ) );
924
925 Note: this function does not adjust for time zone, daylight savings
926 time, etc. You must do that yourself.
927
928 time_gmtolocal
929 Params: $seconds_gmt
930
931 Return: $seconds_local_timezone
932
933 Takes a seconds value in UTC/GMT time and adjusts it to reflect the
934 current timezone. This function is slightly expensive; it takes
935 the gmtime() and localtime() representations of the current time,
936 calculates the difference (delta) by turning them back into seconds
937 via time_mktime, and then applies this delta to
938 $seconds_gmt.
939
940 Note that if you give this function a time and subtract the return
941 value from the original time, you will get the delta value. At
942 that point, you can just apply the delta directly and skip calling
943 this function, which is a massive performance boost. However, this
944 will cause problems if you have a long running program which
945 crosses daylight savings time boundaries, as the DST adjustment
946 will not be accounted for unless you recalculate the new delta.
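
The delta-caching idea above can be sketched in plain Perl. This is an illustration, not LW2 code: it uses timegm() from the core Time::Local module as a stand-in for the gmtime/localtime/time_mktime calculation, and the one-hour refresh interval is an assumption chosen so that a DST change is eventually picked up.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Offset in seconds between local clock time and UTC at the moment $now.
# Stand-in for the documented approach; LW2 uses its own time_mktime.
sub tz_delta {
    my ($now) = @_;
    # Treat the local broken-down time as if it were UTC and compare.
    return timegm( ( localtime $now )[ 0 .. 5 ] ) - $now;
}

my $delta       = tz_delta(time);
my $computed_at = time;
my $max_age     = 3600;    # assumed refresh interval, tune as needed

sub gmt_to_local {
    my ($seconds_gmt) = @_;
    # Recompute the cached delta once it is older than $max_age.
    if ( time - $computed_at >= $max_age ) {
        $computed_at = time;
        $delta       = tz_delta($computed_at);
    }
    return $seconds_gmt + $delta;
}

print gmt_to_local(time), "\n";
```

Between refreshes each conversion is a single addition, which is where the saving over recalculating the delta on every call comes from.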
947
948 uri_split
949 Params: $uri_string [, \%request_hash]
950
951 Return: @uri_parts
952
953 Return an array of the following values, in order: uri, protocol,
954 host, port, params, frag, user, password. Values not defined are
955 given an undef value. If a %request hash is passed in, then
956 uri_split() will also set the appropriate values in the hash.
957
958 Note: uri_split() will only set the %request hash if the protocol
959 is HTTP or HTTPS!
960
961 uri_join
962 Params: @vals
963
964 Return: $url
965
966 Takes the @vals array output from uri_split, and returns a
967 single scalar/string with them joined again, in the form of:
968 protocol://user:pass@host:port/uri?params#frag
969
970 uri_absolute
971 Params: $uri, $base_uri [, $normalize_flag ]
972
973 Return: $absolute_uri
974
975 Checks whether the given $uri is in absolute form (that is,
976 "http://host/file"); if it is not (e.g. it is in the form "/file"), it
977 is resolved against the given $base_uri to make it absolute. This
978 provides behavior similar to that of the URI module.
979
980 If $normalize_flag is set to 1, then the output will be passed
981 through uri_normalize before being returned.
982
983 uri_normalize
984 Params: $uri [, $fix_windows_slashes ]
985
986 Return: $normalized_uri [ undef on error ]
987
988 Takes the given $uri and does any /./ and /../ dereferencing in
989 order to come up with the correct absolute URL. If the
990 $fix_windows_slashes parameter is set to 1, all \ (back slashes) will be
991 converted to / (forward slashes).
992
993 Non-http/https URIs return an error.
994
995 uri_get_dir
996 Params: $uri
997
998 Return: $uri_directory
999
1000 Will take a URI and return the directory base of it, i.e.
1001 /rfp/page.php will return /rfp/.
1002
1003 uri_strip_path_parameters
1004 Params: $uri [, \%param_hash]
1005
1006 Return: $stripped_uri
1007
1008 This function removes all URI path parameters of the form
1009
1010 /blah1;foo=bar/blah2;baz
1011
1012 and returns the stripped URI ('/blah1/blah2'). If the optional
1013 parameter hash reference is provided, the stripped parameters are
1014 saved in the form of 'blah1'=>'foo=bar', 'blah2'=>'baz'.
1015
1016 Note: only the last value of a duplicate name is saved into the
1017 param_hash, if provided. So a $uri of '/foo;A/foo;B/' will result
1018 in a single hash entry of 'foo'=>'B'.
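
To make the stripping and the duplicate-name rule concrete, here is a small stand-in written in plain Perl. It is not LW2's implementation; sketch_strip_path_parameters is a hypothetical name, and the sketch only handles a bare path (no query string or fragment).

```perl
use strict;
use warnings;

# Illustrative stand-in, NOT LW2's code: mimics the documented behavior
# of uri_strip_path_parameters for a plain path.
sub sketch_strip_path_parameters {
    my ( $uri, $param_hash ) = @_;
    my @clean;
    for my $segment ( split m{/}, $uri, -1 ) {
        my ( $name, $param ) = split /;/, $segment, 2;
        $name = '' unless defined $name;    # empty segment around a slash
        push @clean, $name;
        # A later duplicate name overwrites the earlier entry.
        $param_hash->{$name} = $param
          if ref($param_hash) eq 'HASH' && defined $param;
    }
    return join '/', @clean;
}

my %params;
my $stripped =
  sketch_strip_path_parameters( '/blah1;foo=bar/blah2;baz', \%params );
print "$stripped\n";    # /blah1/blah2
print "$_ => $params{$_}\n" for sort keys %params;
```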
1019
1020 uri_parse_parameters
1021 Params: $parameter_string [, $decode, $multi_flag ]
1022
1023 Return: \%parameter_hash
1024
1025 This function takes a string in the form of:
1026
1027 foo=1&bar=2&baz=3&foo=4
1028
1029 And parses it into a hash. In the above example, the element 'foo'
1030 has two values (1 and 4). If $multi_flag is set to 1, then the
1031 'foo' hash entry will hold an anonymous array of both values.
1032 Otherwise, the default is to just contain the last value (in this
1033 case, '4').
1034
1035 If $decode is set to 1, then normal hex decoding is done on the
1036 characters, where needed (both the name and value are decoded).
1037
1038 Note: if a URL parameter name appears without a value, then the
1039 value will be set to undef. E.g. for the string "foo=1&bar&baz=2",
1040 the 'bar' hash element will have an undef value.
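
The effect of $decode and $multi_flag can be illustrated with a plain-Perl stand-in. This is a sketch of the documented behavior, not the library's code: sketch_parse_parameters is a hypothetical name, decoding covers only %XX sequences, and keeping single values as plain scalars when $multi_flag is set is an assumption.

```perl
use strict;
use warnings;

# Illustrative stand-in, NOT LW2's code: parses "a=1&b=2" style strings
# the way the uri_parse_parameters description reads.
sub sketch_parse_parameters {
    my ( $string, $decode, $multi ) = @_;
    my %parsed;
    for my $pair ( split /&/, $string ) {
        my ( $name, $value ) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        if ($decode) {
            for ( $name, $value ) {
                next unless defined $_;
                s/%([0-9a-fA-F]{2})/chr hex $1/ge;    # %XX hex decoding
            }
        }
        if ( $multi && exists $parsed{$name} ) {
            # Promote to an anonymous array on the second occurrence.
            $parsed{$name} = [ $parsed{$name} ]
              unless ref( $parsed{$name} ) eq 'ARRAY';
            push @{ $parsed{$name} }, $value;
        }
        else {
            $parsed{$name} = $value;    # last value wins by default
        }
    }
    return \%parsed;
}

my $last  = sketch_parse_parameters('foo=1&bar=2&baz=3&foo=4');
my $multi = sketch_parse_parameters( 'foo=1&bar=2&baz=3&foo=4', 0, 1 );
print "default: $last->{foo}\n";           # 4
print "multi:   @{ $multi->{foo} }\n";     # 1 4
```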
1041
1042 uri_escape
1043 Params: $data
1044
1045 Return: $encoded_data
1046
1047 This function encodes the given $data so it is safe to be used in
1048 URIs.
1049
1050 uri_unescape
1051 Params: $encoded_data
1052
1053 Return: $data
1054
1055 This function decodes the given $data out of URI format.
1056
1057 utils_recperm
1058 Params: $uri, $depth, \@dir_parts, \@valid, \&func, \%track,
1059 \%arrays, \&cfunc
1060
1061 Return: nothing
1062
1063 This is a special function which is used to recursively permute
1064 through a given directory listing. This is really only used by
1065 whisker, in order to traverse down directories, testing them as it
1066 goes. See whisker 2.0 for exact usage examples.
1067
1068 utils_array_shuffle
1069 Params: \@array
1070
1071 Return: nothing
1072
1073 This function will randomize the order of the elements in the given
1074 array.
1075
1076 utils_randstr
1077 Params: [ $size, $chars ]
1078
1079 Return: $random_string
1080
1081 This function generates a random string between 10 and 20
1082 characters long, or of $size if specified. If $chars is specified,
1083 then the random function picks characters from the supplied string.
1084 For example, to have a random string of 10 characters, composed of
1085 only the characters 'abcdef', then you would run:
1086
1087 utils_randstr(10,'abcdef');
1088
1089 The default character string is alphanumeric.
1090
1091 utils_port_open
1092 Params: $host, $port
1093
1094 Return: $result
1095
1096 Quick function to attempt to make a connection to the given host
1097 and port. If a connection was successfully made, function will
1098 return true (1). Otherwise it returns false (0).
1099
1100 Note: this uses standard TCP connections and is extremely slow, so
1101 it is not recommended for use in port-scanning type applications.
1102
1103 utils_lowercase_keys
1104 Params: \%hash
1105
1106 Return: $number_changed
1107
1108 Will lowercase all the header names (but not values) of the given
1109 hash.
1110
1111 utils_find_lowercase_key
1112 Params: \%hash, $key
1113
1114 Return: $value, undef on error or not exist
1115
1116 Searches the given hash for the $key (regardless of case), and
1117 returns the value. If the return value is placed into an array, the
1118 function will dereference any multi-value references and return an array of
1119 all values.
1120
1121 WARNING! In scalar context, $value can either be a single-value
1122 scalar or an array reference for multiple scalar values. That
1123 means you either need to check the return value and act
1124 appropriately, or use an array context (even if you only want a
1125 single value). This is very important, even if you know there are
1126 no multi-value hash keys. This function may still return an array
1127 of multiple values even if all hash keys are single value, since
1128 lowercasing the keys could result in multiple keys matching. For
1129 example, a hash with the values { 'Foo'=>'a', 'fOo'=>'b' }
1130 technically has two keys with the lowercase name 'foo', and so this
1131 function will either return an array or array reference with both
1132 'a' and 'b'.
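
One way to act on this warning when a scalar-context result is already in hand is to normalize it before use. The helper below is a hypothetical convenience (as_list is not an LW2 function); the sample values stand in for the shapes the function may return.

```perl
use strict;
use warnings;

# Normalize a value that may be undef, a plain scalar, or an array
# reference into a flat list.
sub as_list {
    my ($value) = @_;
    return ()        unless defined $value;
    return @{$value} if ref($value) eq 'ARRAY';
    return ($value);
}

# With LW2 loaded, the scalar-context call would look like:
#   my $value = LW2::utils_find_lowercase_key( \%hash, 'foo' );
# Here sample values stand in for its possible return shapes.
for my $value ( 'a', [ 'a', 'b' ], undef ) {
    my @all = as_list($value);
    print scalar(@all), " value(s): @all\n";
}
```

Calling the function in array context avoids the need for such a helper, as described above.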
1133
1134 utils_find_key
1135 Params: \%hash, $key
1136
1137 Return: $value, undef on error or not exist
1138
1139 Searches the given hash for the $key (case-sensitive), and returns
1140 the value. If the return value is placed into an array, the function will
1141 dereference any multi-value references and return an array of all
1142 values.
1143
1144 utils_delete_lowercase_key
1145 Params: \%hash, $key
1146
1147 Return: $number_found
1148
1149 Searches the given hash for the $key (regardless of case), and
1150 deletes the key out of the hash if found. The function returns the
1151 number of keys found and deleted (since multiple keys can exist
1152 under the names 'Key', 'key', 'keY', 'KEY', etc.).
1153
1154 utils_getline
1155 Params: \$data [, $resetpos ]
1156
1157 Return: $line (undef if no more data)
1158
1159 Fetches the next \n terminated line from the given data. Use the
1160 optional $resetpos to reset the internal position pointer. Does
1161 *NOT* return the trailing \n.
1162
1163 utils_getline_crlf
1164 Params: \$data [, $resetpos ]
1165
1166 Return: $line (undef if no more data)
1167
1168 Fetches the next \r\n terminated line from the given data. Use the
1169 optional $resetpos to reset the internal position pointer. Does
1170 *NOT* return the trailing \r\n.
1171
1172 utils_save_page
1173 Params: $file, \%response
1174
1175 Return: 0 on success, 1 on error
1176
1177 Saves the data portion of the given whisker %response hash to the
1178 indicated file. Can technically save the data portion of a
1179 %request hash too. A file is not written if there is no data.
1180
1181 Note: LW does not do any special file checking; files are opened in
1182 overwrite mode.
1183
1184 utils_getopts
1185 Params: $opt_str, \%opt_results
1186
1187 Return: 0 on success, 1 on error
1188
1189 This function is a general implementation of Getopt::Std. It will
1190 parse @ARGV, looking for the options specified in $opt_str, and
1191 will put the results in %opt_results. Behavior/parameter values
1192 are similar to Getopt::Std's getopts().
1193
1194 Note: this function does *not* support long options (--option),
1195 option grouping (-opq), or options with immediate values (-ovalue).
1196 If an option is indicated as having a value, it will take the next
1197 argument regardless.
1198
1199 utils_text_wrapper
1200 Params: $long_text_string [, $crlf, $width ]
1201
1202 Return: $formatted_text_string
1203
1204 This is a simple function used to format a long line of text for
1205 display on a typical limited-character screen, such as a unix shell
1206 console.
1207
1208 $crlf defaults to "\n", and $width defaults to 76.
1209
1210 utils_bruteurl
1211 Params: \%req, $pre, $post, \@values_in, \@values_out
1212
1213 Return: Nothing (adds to @values_out)
1214
1215 Bruteurl will perform a brute force against the host/server
1216 specified in %req. It makes one request per entry in
1217 @values_in, taking the value and setting $req{'whisker'}->{'uri'}=
1218 $pre.value.$post. Any URI responding with an HTTP 200 or 403
1219 response is pushed into @values_out. An example of this would be to brute
1220 force usernames, putting a list of common usernames in @values_in, setting
1221 $pre='/~' and $post='/'.
1222
1223 utils_join_tag
1224 Params: $tag_name, \%attributes
1225
1226 Return: $tag_string [undef on error]
1227
1228 This function takes the $tag_name (like 'A') and a hash full of
1229 attributes (like {href=>'http://foo/'}) and returns the constructed
1230 HTML tag string (<A href="http://foo/">).
1231
1232 utils_request_clone
1233 Params: \%from_request, \%to_request
1234
1235 Return: 1 on success, 0 on error
1236
1237 This function takes the connection/request-specific values from the
1238 given from_request hash, and copies them to the to_request hash.
1239
1240 utils_request_fingerprint
1241 Params: \%request [, $hash ]
1242
1243 Return: $fingerprint [undef on error]
1244
1245 This function constructs a 'fingerprint' of the given request by
1246 using a cryptographic hashing function on the constructed original
1247 HTTP request.
1248
1249 Note: $hash can be 'md5' (default) or 'md4'.
1250
1251 utils_flatten_lwhash
1252 Params: \%lwhash
1253
1254 Return: $flat_version [undef on error]
1255
1256 This function takes a %request or %response libwhisker hash, and
1257 creates an approximate flat data string of the original request/
1258 response (i.e. before it was parsed into components and placed into
1259 the libwhisker hash).
1260
1261 utils_carp
1262 Params: [ $package_name ]
1263
1264 Return: nothing
1265
1266 This function acts like Carp's carp function. It warns with the
1267 file and line number of the user's code which causes a problem. It
1268 traces up the call stack and reports the first function that is not
1269 in the LW2 package or the optional $package_name package.
1270
1271 utils_croak
1272 Params: [ $package_name ]
1273
1274 Return: nothing
1275
1276 This function acts like Carp's croak function. It dies with the
1277 file and line number of the user's code which causes a problem. It
1278 traces up the call stack and reports the first function that is not
1279 in the LW2 package or the optional $package_name package.
1280
1282 LWP
1283
1285 Copyright 2009 Jeff Forristal
1286
1287
1288
12892.5 2020-02-07 LW2(3)