URI::Fetch(3)         User Contributed Perl Documentation        URI::Fetch(3)

NAME

       URI::Fetch - Smart URI fetching/caching

SYNOPSIS

           use URI::Fetch;
           use Cache::File;   ## needed for the on-disk cache example below

           ## Simple fetch.
           my $res = URI::Fetch->fetch('http://example.com/atom.xml')
               or die URI::Fetch->errstr;

           ## Fetch using specified ETag and Last-Modified headers.
           $res = URI::Fetch->fetch('http://example.com/atom.xml',
                   ETag => '123-ABC',
                   LastModified => time - 3600,
           )
               or die URI::Fetch->errstr;

           ## Fetch using an on-disk cache that URI::Fetch manages for you.
           my $cache = Cache::File->new( cache_root => '/tmp/cache' );
           $res = URI::Fetch->fetch('http://example.com/atom.xml',
                   Cache => $cache
           )
               or die URI::Fetch->errstr;

DESCRIPTION

       URI::Fetch is a smart client for fetching HTTP pages, notably
       syndication feeds (RSS, Atom, and others), in an intelligent,
       bandwidth- and time-saving way. That means:

       ·   GZIP support

           If you have Compress::Zlib installed, URI::Fetch will automatically
           try to download a compressed version of the content, saving
           bandwidth (and time).

       ·   Last-Modified and ETag support

           If you use a local cache (see the Cache parameter to fetch),
           URI::Fetch will keep track of the Last-Modified and ETag headers
           from the server, allowing you to only download pages that have been
           modified since the last time you checked.

       ·   Proper understanding of HTTP error codes

           Certain HTTP error codes are special, particularly when fetching
           syndication feeds, and well-written clients should pay special
           attention to them.  URI::Fetch can only do so much for you in this
           regard, but it gives you the tools to be a well-written client.

           The response from fetch gives you the raw HTTP response code, along
           with special handling of four codes:

           ·   200 (OK)

               Signals that the content of a page/feed was retrieved
               successfully.

           ·   301 (Moved Permanently)

               Signals that a page/feed has moved permanently, and that your
               database of feeds should be updated to reflect the new URI.

           ·   304 (Not Modified)

               Signals that a page/feed has not changed since it was last
               fetched.

           ·   410 (Gone)

               Signals that a page/feed is gone and will never be coming back,
               so you should stop trying to fetch it.
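
       The four codes above suggest a simple dispatch in a feed client. The
       following is a minimal sketch, not part of URI::Fetch: the function
       name action_for_status and the action labels are hypothetical, and the
       raw status code is assumed to have been read from the response
       already.

```perl
use strict;
use warnings;

# Map an HTTP status code to the action a feed client might take.
# The action labels are hypothetical and are not defined by URI::Fetch.
sub action_for_status {
    my ($code) = @_;
    my %action = (
        200 => 'process_content',      # page/feed retrieved successfully
        301 => 'update_stored_uri',    # feed moved permanently
        304 => 'reuse_cached_copy',    # unchanged since the last fetch
        410 => 'stop_fetching',        # gone and not coming back
    );
    # Any other code is left for the caller to interpret.
    return $action{$code} // 'inspect_raw_code';
}

print "$_: ", action_for_status($_), "\n" for (200, 301, 304, 410, 503);
```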

USAGE

   URI::Fetch->fetch($uri, %param)
       Fetches a page identified by the URI $uri.

       On success, returns a URI::Fetch::Response object; on failure, returns
       "undef".

       %param can contain:

       ·   LastModified

       ·   ETag

           LastModified and ETag can be supplied to ask the server to return
           the full page only if it has changed since the last request. If
           you're writing your own feed client, this is recommended practice,
           because it limits both your bandwidth use and the server's.

           If you'd rather not have to store the LastModified time and ETag
           yourself, see the Cache parameter below (and the SYNOPSIS above).

       ·   Cache

           If you'd like URI::Fetch to cache responses between requests,
           provide the Cache parameter with an object supporting the Cache API
           (e.g. Cache::File, Cache::Memory). Specifically, an object that
           supports "$cache->get($key)" and "$cache->set($key, $value,
           $expires)".

           If supplied, URI::Fetch will store the page content, ETag, and
           last-modified time of the response in the cache, and will pull the
           content from the cache on subsequent requests if the server
           returns a 304 (Not Modified) response.

       ·   UserAgent

           Optional.  You may provide your own LWP::UserAgent instance.  Look
           into LWPx::ParanoidAgent if you're fetching URLs given to you by
           possibly malicious parties.

       ·   NoNetwork

           Optional.  Controls the interaction between the cache and HTTP
           requests with If-Modified-Since/If-None-Match headers.  Possible
           behaviors are:

           false (default)
               If a page is in the cache, the origin HTTP server is always
               checked for a fresher copy with an If-Modified-Since and/or
               If-None-Match header.

           1   If set to 1, the origin HTTP server is never contacted,
               regardless of the page being in cache or not.  If the page is
               missing from cache, the fetch method will return undef.  If
               the page is in cache, that page will be returned, no matter
               how old it is.  Note that setting this option means the
               URI::Fetch::Response object will never have the http_response
               member set.

           "N", where N > 1
               The origin HTTP server is not contacted if the page is in cache
               and the cached page was inserted in the last N seconds.  If the
               cached copy is older than N seconds, a normal HTTP request
               (full or cache check) is done.

       ·   ContentAlterHook

           Optional.  A subref that gets called with a scalar reference to
           your content so you can modify the content before it's returned and
           before it's put in cache.

           For instance, you may want to only cache the <head> section of an
           HTML document, or you may want to take a feed URL and cache only a
           pre-parsed version of it.  If you modify the scalarref given to
           your hook and change it into a hashref, scalarref, or some blessed
           object, that same value will be returned to you later on
           not-modified responses.

       ·   CacheEntryGrep

           Optional.  A subref that gets called with the URI::Fetch::Response
           object about to be cached (with the contents already possibly
           transformed by your "ContentAlterHook").  If your subref returns
           true, the page goes into the cache.  If false, it doesn't.

       ·   Freeze

       ·   Thaw

           Optional. Subrefs that get called to serialize and deserialize,
           respectively, the data that will be cached. The cached data should
           be assumed to be an arbitrary Perl data structure, containing
           (potentially) references to arrays, hashes, etc.

           Freeze should serialize the structure into a scalar; Thaw should
           deserialize the scalar into a data structure.

           By default, Storable will be used for freezing and thawing the
           cached data structure.

       ·   ForceResponse

           Optional. A boolean that indicates a URI::Fetch::Response should be
           returned regardless of the HTTP status. By default "undef" is
           returned when a response is not a "success" (2xx codes) or one of
           the recognized HTTP status codes listed above. The HTTP status
           message can then be retrieved using the "errstr" method on the
           class.
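
       The Cache, Freeze, and Thaw parameters described above only need
       objects and subrefs with the stated shapes. The sketch below shows one
       way to satisfy them. The package My::HashCache is hypothetical, it
       ignores $expires, and it is assumed here that each subref receives its
       input as the first argument.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

# Hypothetical in-memory cache that exposes only get() and set().
package My::HashCache;

sub new {
    my ($class) = @_;
    return bless { store => {} }, $class;
}

sub get {
    my ($self, $key) = @_;
    return $self->{store}{$key};    # undef when the key is absent
}

sub set {
    my ($self, $key, $value, $expires) = @_;
    # $expires is accepted but ignored in this sketch.
    $self->{store}{$key} = $value;
    return 1;
}

package main;

# Freeze: data structure in, scalar out.  Thaw: scalar in, structure out.
# Assumption: the input arrives as the first argument.
my $freeze = sub { my ($data)   = @_; return nfreeze($data) };
my $thaw   = sub { my ($frozen) = @_; return thaw($frozen) };

my %param = (
    Cache  => My::HashCache->new,
    Freeze => $freeze,
    Thaw   => $thaw,
);

# These would then be passed along as: URI::Fetch->fetch($uri, %param)
```

       Storable is used here only because it is already the documented
       default; any pair of subrefs that round-trips the structure should
       fit the same slots.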

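       The ContentAlterHook and CacheEntryGrep parameters described above
       can be sketched as two small subrefs. This is an illustration only:
       the variable names and the 100,000-character limit are hypothetical,
       and the grep subref assumes the response object offers a content
       accessor.

```perl
use strict;
use warnings;

# ContentAlterHook sketch: receives a scalar reference to the content and
# rewrites it in place, keeping only the <head> element when one exists.
my $keep_head_only = sub {
    my ($content_ref) = @_;
    if ($$content_ref =~ m{(<head\b.*?</head>)}is) {
        $$content_ref = $1;
    }
    return;
};

# CacheEntryGrep sketch: receives the response about to be cached and
# returns true to cache it.  Assumption: the response has a content() method.
my $cache_if_small = sub {
    my ($res) = @_;
    my $content = $res->content;
    # Skip entries whose content is missing or was turned into a reference.
    return 0 unless defined $content && !ref $content;
    return length($content) <= 100_000 ? 1 : 0;
};

my %param = (
    ContentAlterHook => $keep_head_only,
    CacheEntryGrep   => $cache_if_small,
);

# Demonstration on a literal string, with no network access.
my $html = '<html><head><title>T</title></head><body>x</body></html>';
$keep_head_only->(\$html);
print "$html\n";    # prints <head><title>T</title></head>
```
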
LICENSE

       URI::Fetch is free software; you may redistribute it and/or modify it
       under the same terms as Perl itself.

       Except where otherwise noted, URI::Fetch is Copyright 2004 Benjamin
       Trott, ben+cpan@stupidfool.org. All rights reserved.

perl v5.12.3                      2011-01-28                     URI::Fetch(3)