URI::Fetch(3)         User Contributed Perl Documentation        URI::Fetch(3)

NAME

       URI::Fetch - Smart URI fetching/caching

SYNOPSIS

           use URI::Fetch;
           use Cache::File;    # needed only for the cache example below

           ## Simple fetch.
           my $res = URI::Fetch->fetch('http://example.com/atom.xml')
               or die URI::Fetch->errstr;

           ## Fetch using specified ETag and Last-Modified headers.
           $res = URI::Fetch->fetch('http://example.com/atom.xml',
                   ETag => '123-ABC',
                   LastModified => time - 3600,
           )
               or die URI::Fetch->errstr;

           ## Fetch using an on-disk cache that URI::Fetch manages for you.
           my $cache = Cache::File->new( cache_root => '/tmp/cache' );
           $res = URI::Fetch->fetch('http://example.com/atom.xml',
                   Cache => $cache
           )
               or die URI::Fetch->errstr;

DESCRIPTION

       URI::Fetch is a smart client for fetching HTTP pages, notably
       syndication feeds (RSS, Atom, and others), in an intelligent,
       bandwidth- and time-saving way. That means:

       * GZIP support
           If you have Compress::Zlib installed, URI::Fetch will automatically
           try to download a compressed version of the content, saving
           bandwidth (and time).

       * Last-Modified and ETag support
           If you use a local cache (see the Cache parameter to fetch),
           URI::Fetch will keep track of the Last-Modified and ETag headers
           from the server, allowing you to download only pages that have
           been modified since the last time you checked.

       * Proper understanding of HTTP error codes
           Certain HTTP error codes are special, particularly when fetching
           syndication feeds, and well-written clients should pay special
           attention to them.  URI::Fetch can only do so much for you in this
           regard, but it gives you the tools to be a well-written client.

           The response from fetch gives you the raw HTTP response code, along
           with special handling of four codes:

           * 200 (OK)
               Signals that the content of a page/feed was retrieved
               successfully.

           * 301 (Moved Permanently)
               Signals that a page/feed has moved permanently, and that your
               database of feeds should be updated to reflect the new URI.

           * 304 (Not Modified)
               Signals that a page/feed has not changed since it was last
               fetched.

           * 410 (Gone)
               Signals that a page/feed is gone and will never be coming back,
               so you should stop trying to fetch it.

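       As a rough illustration of the four special codes, a feed reader
       might map each one to an action. The sketch below is not part of
       URI::Fetch: the action names and the action_for_status helper are
       hypothetical, and it assumes the raw code is read from the response
       object (for example through an http_status accessor).

```perl
use strict;
use warnings;

# Hypothetical dispatch table: what a feed reader might do for each of the
# four specially handled codes. The action names are made up for this sketch.
my %ACTION_FOR = (
    200 => 'process_content',      # OK: content was retrieved
    301 => 'update_stored_uri',    # Moved Permanently: remember the new URI
    304 => 'keep_cached_copy',     # Not Modified: nothing new to process
    410 => 'stop_fetching',        # Gone: drop the feed from the schedule
);

# Return the action for a numeric HTTP code, or a fallback for any code
# outside the four listed above.
sub action_for_status {
    my ($code) = @_;
    return exists $ACTION_FOR{$code} ? $ACTION_FOR{$code} : 'inspect_raw_code';
}

# Assumed usage: action_for_status( $res->http_status )
my $action = action_for_status(410);
```
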

USAGE

       URI::Fetch->fetch($uri, %param)

       Fetches a page identified by the URI $uri.

       On success, returns a URI::Fetch::Response object; on failure, returns
       "undef".

       %param can contain:

       * LastModified
       * ETag
           LastModified and ETag can be supplied so that the server returns
           the full page only if it has changed since the last request. If
           you're writing your own feed client, this is recommended practice,
           because it limits both your bandwidth use and the server's.

           If you'd rather not have to store the LastModified time and ETag
           yourself, see the Cache parameter below (and the SYNOPSIS above).

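           If you do store the validators yourself, one possible way to
           turn them back into fetch parameters is sketched here. The
           conditional_params helper and the layout of the saved record are
           hypothetical; only the ETag and LastModified parameter names come
           from this document.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the validators saved from a previous fetch into
# fetch() parameters, leaving out any that were never recorded.
sub conditional_params {
    my ($saved) = @_;
    my %param;
    $param{ETag}         = $saved->{etag}          if defined $saved->{etag};
    $param{LastModified} = $saved->{last_modified} if defined $saved->{last_modified};
    return %param;
}

# Made-up saved record; LastModified is an epoch time, as in the SYNOPSIS.
my %saved = ( etag => '123-ABC', last_modified => time() - 3600 );
my %param = conditional_params(\%saved);

# With URI::Fetch installed, the call would then look like:
#   my $res = URI::Fetch->fetch('http://example.com/atom.xml', %param)
#       or die URI::Fetch->errstr;
```
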
       * Cache
           If you'd like URI::Fetch to cache responses between requests,
           provide the Cache parameter with an object supporting the Cache API
           (e.g. Cache::File, Cache::Memory). Specifically, an object that
           supports "$cache->get($key)" and "$cache->set($key, $value,
           $expires)".

           If supplied, URI::Fetch will store the page content, ETag, and
           last-modified time of the response in the cache, and will pull the
           content from the cache on subsequent requests if the server
           returns a 304 (Not Modified) response.

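           To show how small the required interface is, here is a minimal
           in-memory cache with just get and set. The My::TinyCache class is
           hypothetical, and it assumes $expires is a number of seconds from
           now; the real Cache modules accept richer expiry specifications.

```perl
use strict;
use warnings;

# Hypothetical minimal cache exposing only the two methods named above.
package My::TinyCache;

sub new {
    my ($class) = @_;
    return bless { store => {} }, $class;
}

# Return the stored value, or nothing if the key is missing or has expired.
sub get {
    my ($self, $key) = @_;
    my $entry = $self->{store}{$key} or return;
    if (defined $entry->{expires_at} && time() >= $entry->{expires_at}) {
        delete $self->{store}{$key};
        return;
    }
    return $entry->{value};
}

# Assumption: $expires, when given, is a number of seconds from now.
sub set {
    my ($self, $key, $value, $expires) = @_;
    $self->{store}{$key} = {
        value      => $value,
        expires_at => defined $expires ? time() + $expires : undef,
    };
    return 1;
}

package main;

my $cache = My::TinyCache->new;
$cache->set('http://example.com/atom.xml', 'frozen response data');

# With URI::Fetch installed, the object would be passed as:
#   URI::Fetch->fetch('http://example.com/atom.xml', Cache => $cache);
```
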
       * UserAgent
           Optional.  You may provide your own LWP::UserAgent instance.  Look
           into LWPx::ParanoidAgent if you're fetching URLs given to you by
           possibly malicious parties.

       * NoNetwork
           Optional.  Controls the interaction between the cache and HTTP
           requests with If-Modified-Since/If-None-Match headers.  Possible
           behaviors are:

           false (default)
               If a page is in the cache, the origin HTTP server is always
               checked for a fresher copy with an If-Modified-Since and/or
               If-None-Match header.

           1   If set to 1, the origin HTTP server is never contacted,
               regardless of whether the page is in the cache.  If the page
               is missing from the cache, the fetch method will return undef.
               If the page is in the cache, that page will be returned, no
               matter how old it is.  Note that setting this option means the
               URI::Fetch::Response object will never have the http_response
               member set.

           "N", where N > 1
               The origin HTTP server is not contacted if the page is in the
               cache and the cached page was inserted in the last N seconds.
               If the cached copy is older than N seconds, a normal HTTP
               request (full or cache check) is done.

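           The three NoNetwork behaviors can be restated as a small pure
           function. This is an illustration of the rules above, not
           URI::Fetch's own code; the function name and the return labels
           are hypothetical.

```perl
use strict;
use warnings;

# Illustrative only: restates the NoNetwork rules as a pure function.
# $no_network is the parameter value; $cached_age is the age in seconds of
# the cached copy, or undef when the page is not in the cache.
# Returns 'network' (contact the server), 'cache' (serve the cached copy
# without a request), or 'fail' (fetch would return undef).
sub no_network_decision {
    my ($no_network, $cached_age) = @_;

    # false (default): the origin server is always checked.
    return 'network' unless $no_network;

    # 1: the origin server is never contacted.
    if ($no_network == 1) {
        return defined($cached_age) ? 'cache' : 'fail';
    }

    # N > 1: use a cached copy inserted within the last N seconds.
    return 'cache' if defined($cached_age) && $cached_age <= $no_network;

    # Otherwise fall back to a normal request (full or cache check).
    return 'network';
}

my $decision = no_network_decision(300, 120);
```
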
       * ContentAlterHook
           Optional.  A subref that gets called with a scalar reference to
           your content so you can modify the content before it's returned and
           before it's put in the cache.

           For instance, you may want to cache only the <head> section of an
           HTML document, or you may want to take a feed URL and cache only a
           pre-parsed version of it.  If you modify the scalarref given to
           your hook and change it into a hashref, scalarref, or some blessed
           object, that same value will be returned to you later on
           not-modified responses.

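           A hook for the <head>-only case might look like the following
           sketch. The variable names are hypothetical, and the regular
           expression is only good enough for illustration; it is not a
           robust HTML parser.

```perl
use strict;
use warnings;

# Sketch of a hook that keeps only the <head> section of an HTML page.
# It receives a reference to the content and edits it in place; content
# without a <head> section is left untouched.
my $keep_head_only = sub {
    my ($content_ref) = @_;
    if ($$content_ref =~ m{(<head\b.*?</head>)}is) {
        $$content_ref = $1;
    }
    return;
};

# Trying the hook on a sample document, without any network access.
my $html = '<html><head><title>Feed</title></head><body>Long body</body></html>';
$keep_head_only->(\$html);

# With URI::Fetch installed, the hook would be passed as:
#   URI::Fetch->fetch($uri, Cache => $cache, ContentAlterHook => $keep_head_only);
```
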
       * CacheEntryGrep
           Optional.  A subref that gets called with the URI::Fetch::Response
           object about to be cached (with the contents already possibly
           transformed by your "ContentAlterHook").  If your subref returns
           true, the page goes into the cache.  If false, it doesn't.

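           One possible filter caches only reasonably small, plain-string
           pages. The size limit and variable names are hypothetical, and
           the sketch assumes the response object offers a content accessor.

```perl
use strict;
use warnings;

# Hypothetical size limit for cached entries, in bytes.
my $MAX_CACHED_BYTES = 512 * 1024;

# Returns true (cache it) only when the content is a plain string no longer
# than the limit. Assumes the response object has a content() accessor.
my $cache_small_pages = sub {
    my ($res) = @_;
    my $content = $res->content;
    return 0 unless defined($content) && !ref($content);
    return length($content) <= $MAX_CACHED_BYTES ? 1 : 0;
};

# With URI::Fetch installed, the filter would be passed as:
#   URI::Fetch->fetch($uri, Cache => $cache, CacheEntryGrep => $cache_small_pages);
```
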
       * Freeze
       * Thaw
           Optional. Subrefs that get called to serialize and deserialize,
           respectively, the data that will be cached. The cached data should
           be assumed to be an arbitrary Perl data structure, containing
           (potentially) references to arrays, hashes, etc.

           Freeze should serialize the structure into a scalar; Thaw should
           deserialize the scalar into a data structure.

           By default, Storable will be used for freezing and thawing the
           cached data structure.

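           As a sketch of a custom pair, the subrefs below serialize with
           JSON::PP (bundled with Perl since 5.14) instead of Storable. The
           sample structure is made up, since the real cached layout is
           internal to URI::Fetch. JSON cannot represent blessed objects or
           scalar references, so this pair would not suit a ContentAlterHook
           that produces them.

```perl
use strict;
use warnings;
use JSON::PP ();

# One shared codec: utf8 yields a byte string, canonical sorts hash keys.
my $json = JSON::PP->new->utf8->canonical;

# Freeze: data structure in, scalar out.
my $freeze = sub {
    my ($data) = @_;
    return $json->encode($data);
};

# Thaw: scalar in, data structure out.
my $thaw = sub {
    my ($frozen) = @_;
    return $json->decode($frozen);
};

# Round trip on a made-up structure.
my $sample = { ETag => '123-ABC', LastModified => 1180224000, Content => '<feed/>' };
my $copy   = $thaw->( $freeze->($sample) );

# With URI::Fetch installed, the pair would be passed as:
#   URI::Fetch->fetch($uri, Cache => $cache, Freeze => $freeze, Thaw => $thaw);
```
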
       * ForceResponse
           Optional. A boolean that indicates a URI::Fetch::Response should be
           returned regardless of the HTTP status. By default "undef" is
           returned when a response is not a "success" (a 2xx code) or one of
           the recognized HTTP status codes listed above. The HTTP status
           message can then be retrieved using the "errstr" method on the
           class.

LICENSE

       URI::Fetch is free software; you may redistribute it and/or modify it
       under the same terms as Perl itself.

       Except where otherwise noted, URI::Fetch is Copyright 2004 Benjamin
       Trott, ben+cpan@stupidfool.org. All rights reserved.

perl v5.8.8                       2007-05-27                     URI::Fetch(3)