URI::Fetch(3)          User Contributed Perl Documentation         URI::Fetch(3)

NAME
    URI::Fetch - Smart URI fetching/caching

SYNOPSIS
        use URI::Fetch;
        use Cache::File;

        ## Simple fetch.
        my $res = URI::Fetch->fetch('http://example.com/atom.xml')
            or die URI::Fetch->errstr;

        ## Fetch using specified ETag and Last-Modified headers.
        $res = URI::Fetch->fetch('http://example.com/atom.xml',
            ETag         => '123-ABC',
            LastModified => time - 3600,
        )
            or die URI::Fetch->errstr;

        ## Fetch using an on-disk cache that URI::Fetch manages for you.
        my $cache = Cache::File->new( cache_root => '/tmp/cache' );
        $res = URI::Fetch->fetch('http://example.com/atom.xml',
            Cache => $cache
        )
            or die URI::Fetch->errstr;

DESCRIPTION
    URI::Fetch is a client for fetching HTTP pages, notably syndication
    feeds (RSS, Atom, and others), in a way that saves both bandwidth and
    time. It provides:

    · GZIP support

      If you have Compress::Zlib installed, URI::Fetch will automatically
      ask the server for a compressed version of the content, saving
      bandwidth (and time).

    · Last-Modified and ETag support

      If you use a local cache (see the Cache parameter to fetch),
      URI::Fetch keeps track of the Last-Modified and ETag headers sent by
      the server, so that a page is downloaded again only if it has been
      modified since the last time you checked.

    · Proper understanding of HTTP status codes

      Certain HTTP status codes are special, particularly when fetching
      syndication feeds, and well-written clients should pay special
      attention to them. URI::Fetch can only do so much for you in this
      regard, but it gives you the tools to be a well-written client.

      The response from fetch gives you the raw HTTP response code, along
      with special handling of four codes:

      · 200 (OK)

        Signals that the content of a page/feed was retrieved
        successfully.

      · 301 (Moved Permanently)

        Signals that a page/feed has moved permanently, and that your
        database of feeds should be updated to reflect the new URI.

      · 304 (Not Modified)

        Signals that a page/feed has not changed since it was last
        fetched.

      · 410 (Gone)

        Signals that a page/feed is gone and will never be coming back,
        so you should stop trying to fetch it.
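
As a rough illustration of the GZIP support described above, the sketch below simulates both ends of the exchange. It gzips a feed body and then decodes it again with Compress::Zlib's in-memory helpers memGzip and memGunzip. This is not URI::Fetch's internal code. The Accept-Encoding header value and the sample feed body are assumptions chosen for illustration.

```perl
use strict;
use warnings;
use Compress::Zlib ();    # provides memGzip and memGunzip

# Assumed request header: a gzip-aware client advertises support this
# way, and the server may then send a gzip-encoded body.
my %request_headers = ('Accept-Encoding' => 'gzip');

# Illustrative, highly repetitive feed body, so it compresses well.
my $original = '<feed>' . ('<entry/>' x 100) . '</feed>';

# Server side (simulated): gzip the body before sending it.
my $compressed = Compress::Zlib::memGzip($original);
defined $compressed or die "memGzip failed";

# Client side: decode the body before handing it to the caller.
my $decoded = Compress::Zlib::memGunzip($compressed);
defined $decoded or die "memGunzip failed";

printf "%d bytes on the wire, %d bytes after decoding\n",
    length $compressed, length $decoded;
```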
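
The Last-Modified and ETag support relies on HTTP conditional requests. The client echoes its stored validators back in If-None-Match and If-Modified-Since headers, and the server answers 304 (Not Modified) if nothing has changed. The sketch below shows how those headers could be built from a stored ETag and a Last-Modified time given in epoch seconds. The helper names http_date and conditional_headers are hypothetical and are not part of URI::Fetch.

```perl
use strict;
use warnings;

# Fixed English names, so the date format does not depend on the locale.
my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical helper: format an epoch time as an HTTP date (always GMT).
sub http_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime $epoch;
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAY[$wday], $mday, $MON[$mon], $year + 1900, $hour, $min, $sec;
}

# Hypothetical helper: build conditional request headers from whatever
# validators were stored after the previous fetch.
sub conditional_headers {
    my (%stored) = @_;
    my %h;
    $h{'If-None-Match'} = $stored{etag} if defined $stored{etag};
    $h{'If-Modified-Since'} = http_date($stored{last_modified})
        if defined $stored{last_modified};
    return %h;
}

my %headers = conditional_headers(etag => '"123-ABC"', last_modified => 0);
print "$_: $headers{$_}\n" for sort keys %headers;
```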
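
A feed reader will usually map the four specially handled status codes above to distinct actions. The dispatch table below is a hypothetical sketch. The action names and the feed_action subroutine are illustrative assumptions, and in practice the numeric code would come from the response object returned by fetch.

```perl
use strict;
use warnings;

# Hypothetical mapping from the four special status codes to the
# action a feed reader might take.
my %ACTION_FOR = (
    200 => 'parse',          # new content retrieved: process it
    301 => 'update-uri',     # moved permanently: store the new URI
    304 => 'use-cached',     # not modified: nothing new to process
    410 => 'unsubscribe',    # gone for good: stop polling this feed
);

sub feed_action {
    my ($code) = @_;
    return $ACTION_FOR{$code} // 'report-error';    # anything else
}

print "$_ => ", feed_action($_), "\n" for 200, 301, 304, 410, 500;
```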

USAGE
  URI::Fetch->fetch($uri, %param)
    Fetches the page identified by the URI $uri.

    On success, returns a URI::Fetch::Response object; on failure, returns
    "undef".

    %param can contain:

    · LastModified

    · ETag

      LastModified and ETag can be supplied to ask the server to return
      the full page only if it has changed since the last request
      (LastModified is a time in epoch seconds, as in the SYNOPSIS). If
      you're writing your own feed client, this is recommended practice,
      because it limits both your bandwidth use and the server's.

      If you'd rather not store the LastModified time and ETag yourself,
      see the Cache parameter below (and the SYNOPSIS above).

    · Cache

      If you'd like URI::Fetch to cache responses between requests,
      provide the Cache parameter with an object supporting the Cache API
      (e.g. Cache::File, Cache::Memory). Specifically, the object must
      support "$cache->get($key)" and "$cache->set($key, $value, $expires)".

      If supplied, URI::Fetch will store the page content, ETag, and
      last-modified time of the response in the cache, and will return the
      cached content on subsequent requests when the server replies with
      304 (Not Modified).

    · UserAgent

      Optional. You may provide your own LWP::UserAgent instance. If you
      fetch URLs supplied by potentially malicious parties, consider using
      LWPx::ParanoidUserAgent.

    · NoNetwork

      Optional. Controls the interaction between the cache and HTTP
      requests with If-Modified-Since/If-None-Match headers. Possible
      behaviors are:

      false (default)
          If a page is in the cache, the origin HTTP server is always
          checked for a fresher copy with an If-Modified-Since and/or
          If-None-Match header.

      1   If set to 1, the origin HTTP server is never contacted,
          regardless of whether the page is in the cache. If the page is
          missing from the cache, the fetch method returns undef. If the
          page is in the cache, that page is returned, no matter how old
          it is. Note that setting this option means the
          URI::Fetch::Response object will never have the http_response
          member set.

      "N", where N > 1
          The origin HTTP server is not contacted if the page is in the
          cache and the cached page was inserted in the last N seconds. If
          the cached copy is older than N seconds, a normal HTTP request
          (a full fetch or a conditional check) is made.

    · ContentAlterHook

      Optional. A subref that is called with a scalar reference to your
      content, so you can modify the content before it's returned and
      before it's put in the cache.

      For instance, you may want to cache only the <head> section of an
      HTML document, or you may want to take a feed URL and cache only a
      pre-parsed version of it. If your hook replaces the content behind
      the scalar reference with a hashref, scalarref, or some blessed
      object, that same value will be returned to you later on Not
      Modified (304) responses.

    · CacheEntryGrep

      Optional. A subref that is called with the URI::Fetch::Response
      object about to be cached (with its content possibly already
      transformed by your "ContentAlterHook"). If your subref returns
      true, the page goes into the cache; if it returns false, it doesn't.

    · Freeze

    · Thaw

      Optional. Subrefs that are called to serialize and deserialize,
      respectively, the data that will be cached. The cached data should
      be assumed to be an arbitrary Perl data structure, containing
      (potentially) references to arrays, hashes, etc.

      Freeze should serialize the structure into a scalar; Thaw should
      deserialize the scalar back into a data structure.

      By default, Storable is used for freezing and thawing the cached
      data structure.

    · ForceResponse

      Optional. A boolean that indicates a URI::Fetch::Response should be
      returned regardless of the HTTP status. By default "undef" is
      returned when a response is neither a success (a 2xx code) nor one
      of the recognized HTTP status codes listed above. The HTTP status
      message can then be retrieved using the "errstr" method on the
      class.
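
If you supply LastModified and ETag yourself rather than using the Cache parameter, you need somewhere to keep them between requests. The sketch below keeps them in an in-memory hash keyed by URI. The helpers remember and fetch_params are hypothetical, and a real client would persist the values, for example in its feed database. LastModified is stored as epoch seconds, matching the SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical per-URI store for validators you manage yourself.
my %validators;

# Record the validators seen on the most recent successful fetch.
sub remember {
    my ($uri, $etag, $last_modified) = @_;
    $validators{$uri} = { ETag => $etag, LastModified => $last_modified };
    return;
}

# Build the extra fetch arguments for $uri, omitting unknown validators.
sub fetch_params {
    my ($uri) = @_;
    my $v = $validators{$uri} or return ();
    return map { defined $v->{$_} ? ($_ => $v->{$_}) : () }
        qw(ETag LastModified);
}

remember('http://example.com/atom.xml', '123-ABC', 1_000_000_000);
my %param = fetch_params('http://example.com/atom.xml');

# The result would then be passed along, e.g.:
#   my $res = URI::Fetch->fetch('http://example.com/atom.xml', %param)
#       or die URI::Fetch->errstr;
```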
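
The Cache parameter only requires get($key) and set($key, $value, $expires). The class below is therefore a hypothetical, minimal in-memory object exposing just those two methods. It is useful for tests or short-lived scripts. TinyCache is an invented name. Treating $expires as a number of seconds from now, or undef for "never", is an assumption of this sketch; real Cache modules such as Cache::Memory accept richer formats.

```perl
use strict;
use warnings;

# Hypothetical minimal cache with only the methods the Cache parameter needs.
package TinyCache;

sub new { my ($class) = @_; return bless { data => {} }, $class }

# Assumption: $expires is seconds from now, or undef for "never expires".
sub set {
    my ($self, $key, $value, $expires) = @_;
    my $deadline = defined $expires ? time + $expires : undef;
    $self->{data}{$key} = [ $value, $deadline ];
    return;
}

sub get {
    my ($self, $key) = @_;
    my $entry = $self->{data}{$key} or return undef;
    my ($value, $deadline) = @$entry;
    if (defined $deadline && time >= $deadline) {
        delete $self->{data}{$key};    # expired: drop it and report a miss
        return undef;
    }
    return $value;
}

package main;

my $cache = TinyCache->new;
$cache->set('http://example.com/atom.xml', 'serialized entry');
# It could then be passed as: URI::Fetch->fetch($uri, Cache => $cache)
```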
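
The three NoNetwork behaviors can be summarized as a small decision function. The code below is a hypothetical model of the documented rules, not URI::Fetch's implementation. The name network_decision and its result strings are invented. Treating an entry exactly N seconds old as stale is an assumption about the boundary.

```perl
use strict;
use warnings;

# Hypothetical model of the NoNetwork rules. $cached_at is the epoch time
# the page entered the cache, or undef if the page is not cached.
sub network_decision {
    my ($no_network, $cached_at, $now) = @_;
    my $cached = defined $cached_at;

    if (!$no_network) {                   # false (default): always revalidate
        return $cached ? 'conditional-request' : 'full-request';
    }
    if ($no_network == 1) {               # never contact the server
        return $cached ? 'use-cache' : 'fail';
    }
    # NoNetwork => N (N > 1): trust a cached copy younger than N seconds
    # (assumed boundary: exactly N seconds old counts as stale).
    return 'use-cache' if $cached && $now - $cached_at < $no_network;
    return $cached ? 'conditional-request' : 'full-request';
}

my $now = 1_000_000;
print network_decision(3600, $now - 60, $now), "\n";
```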
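
For ContentAlterHook, the <head>-only example mentioned above might look like the sketch below. The hook receives a scalar reference and edits the content in place. The variable name $keep_head_only is hypothetical, and the regular expression is a simplification that is not a full HTML parser.

```perl
use strict;
use warnings;

# Hypothetical ContentAlterHook: keep only the <head> section of an HTML
# page by rewriting the content through the scalar reference it receives.
my $keep_head_only = sub {
    my ($content_ref) = @_;
    if ($$content_ref =~ m{(<head\b.*?</head>)}is) {
        $$content_ref = $1;
    }
    return;
};

my $html = '<html><head><title>Feed</title></head><body>big body</body></html>';
$keep_head_only->(\$html);

# It could then be passed as:
#   URI::Fetch->fetch($uri, Cache => $cache, ContentAlterHook => $keep_head_only)
```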
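
A CacheEntryGrep subref receives the URI::Fetch::Response about to be cached and returns true or false. The sketch below caches only non-empty pages under 1 MB, which is an arbitrary, assumed policy. It relies on the response's content accessor. Because running it against the real class would need a network fetch, it uses a hypothetical stand-in class, FakeResponse, that offers only content().

```perl
use strict;
use warnings;

# Hypothetical stand-in for a response object, used only so the subref
# can be exercised offline; it offers just a content() accessor.
{
    package FakeResponse;
    sub new     { my ($class, $content) = @_; return bless { content => $content }, $class }
    sub content { return $_[0]{content} }
}

# Hypothetical CacheEntryGrep policy: cache non-empty pages under 1 MB.
# Content already turned into a reference by a ContentAlterHook is
# cached as is.
my $max_bytes = 1024 * 1024;
my $cache_small_pages = sub {
    my ($res) = @_;
    my $content = $res->content;
    return 0 unless defined $content;
    return 1 if ref $content;
    return length($content) > 0 && length($content) < $max_bytes;
};

# It could then be passed as: CacheEntryGrep => $cache_small_pages
```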
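
The default serializer is Storable. If you supply your own Freeze and Thaw subrefs, one option is Storable's nfreeze, which writes in network byte order so cache entries stay readable across machines of different endianness. This sketch assumes each subref receives the value to convert as its first argument, and that the cached data is a reference (nfreeze requires one). The sample data structure is invented for illustration.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

# Assumed calling convention: the value to convert is the first argument.
my $freeze = sub { return nfreeze($_[0]) };    # structure -> scalar
my $thaw   = sub { return thaw($_[0]) };       # scalar -> structure

# Illustrative cached structure.
my $data   = { content => '<feed/>', etag => '123-ABC', headers => [1, 2] };
my $frozen = $freeze->($data);
my $copy   = $thaw->($frozen);

# They could then be passed as:
#   URI::Fetch->fetch($uri, Cache => $cache, Freeze => $freeze, Thaw => $thaw)
```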
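
The ForceResponse rule can be stated as a predicate. The sketch below is a hypothetical model of the documented behavior: a response object is returned for any 2xx code, for 301, 304, and 410, or for any status when ForceResponse is set. The name returns_response is invented, and this is not the module's source.

```perl
use strict;
use warnings;

# Non-2xx codes that fetch() recognizes and still returns a response for.
my %RECOGNIZED = map { $_ => 1 } 301, 304, 410;

# Hypothetical model: would fetch() return a response object (true)
# or undef (false) for this status?
sub returns_response {
    my ($status, $force_response) = @_;
    return 1 if $force_response;                    # ForceResponse => 1
    return 1 if $status >= 200 && $status < 300;    # any 2xx success
    return $RECOGNIZED{$status} ? 1 : 0;
}
```

Under this model, a 404 yields undef unless ForceResponse is set, and URI::Fetch->errstr then carries the status message.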

LICENSE
    URI::Fetch is free software; you may redistribute it and/or modify it
    under the same terms as Perl itself.

AUTHOR & COPYRIGHT
    Except where otherwise noted, URI::Fetch is Copyright 2004 Benjamin
    Trott, ben+cpan@stupidfool.org. All rights reserved.

perl v5.12.3                      2011-01-28                   URI::Fetch(3)