URI::Fetch(3) User Contributed Perl Documentation URI::Fetch(3)

URI::Fetch - Smart URI fetching/caching

    use URI::Fetch;
    use Cache::File;    # needed for the on-disk cache example below

    ## Simple fetch.
    my $res = URI::Fetch->fetch('http://example.com/atom.xml')
        or die URI::Fetch->errstr;
    do_something($res->content) if $res->is_success;

    ## Fetch using specified ETag and Last-Modified headers.
    $res = URI::Fetch->fetch('http://example.com/atom.xml',
            ETag => '123-ABC',
            LastModified => time - 3600,
    )
        or die URI::Fetch->errstr;

    ## Fetch using an on-disk cache that URI::Fetch manages for you.
    my $cache = Cache::File->new( cache_root => '/tmp/cache' );
    $res = URI::Fetch->fetch('http://example.com/atom.xml',
            Cache => $cache
    )
        or die URI::Fetch->errstr;

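The ETag and LastModified values in the second example above would normally come from an earlier response rather than being hard-coded. The following is a minimal sketch of that bookkeeping. The helper name conditional_params and the layout of the saved hash are hypothetical, and the closing comment assumes the response object exposes etag and last_modified accessors.

```perl
use strict;
use warnings;

# Hypothetical helper: turn validators remembered from an earlier fetch
# into the ETag / LastModified parameters for the next one. Values that
# were never stored are simply left out.
sub conditional_params {
    my ($saved) = @_;
    my %param;
    $param{ETag}         = $saved->{etag}          if defined $saved->{etag};
    $param{LastModified} = $saved->{last_modified} if defined $saved->{last_modified};
    return %param;
}

# Local demonstration with made-up values; no network access involved.
my %param = conditional_params({ etag => '123-ABC', last_modified => time - 3600 });
print join(', ', sort keys %param), "\n";

# With URI::Fetch installed, the result would be passed straight through:
#   my $res = URI::Fetch->fetch($uri, conditional_params($saved))
#       or die URI::Fetch->errstr;
#   # afterwards, remember $res->etag and $res->last_modified (assumed
#   # accessors) for the next run
```

Supplying a Cache object, as in the third example, makes this bookkeeping unnecessary.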
URI::Fetch is a smart client for fetching HTTP pages, notably
syndication feeds (RSS, Atom, and others), in an intelligent,
bandwidth- and time-saving way. That means:

• GZIP support

  If you have Compress::Zlib installed, URI::Fetch will automatically
  try to download a compressed version of the content, saving
  bandwidth (and time).

• Last-Modified and ETag support

  If you use a local cache (see the Cache parameter to fetch),
  URI::Fetch will keep track of the Last-Modified and ETag headers
  from the server, so that you download only pages that have been
  modified since the last time you checked.

• Proper understanding of HTTP error codes

  Certain HTTP error codes are special, particularly when fetching
  syndication feeds, and well-written clients should pay special
  attention to them. URI::Fetch can only do so much for you in this
  regard, but it gives you the tools to be a well-written client.

  The response from fetch gives you the raw HTTP response code, along
  with special handling of four codes:

  • 200 (OK)

    Signals that the content of a page/feed was retrieved
    successfully.

  • 301 (Moved Permanently)

    Signals that a page/feed has moved permanently, and that your
    database of feeds should be updated to reflect the new URI.

  • 304 (Not Modified)

    Signals that a page/feed has not changed since it was last
    fetched.

  • 410 (Gone)

    Signals that a page/feed is gone and will never be coming back,
    so you should stop trying to fetch it.

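As a rough illustration of how a feed client might act on the four codes above, the sketch below maps a raw HTTP status to an action. The function name action_for_status and the action labels are hypothetical and not part of URI::Fetch. In a real client the code would be read from the response object; the comment assumes an http_status accessor for that.

```perl
use strict;
use warnings;

# Hypothetical helper: map a raw HTTP status code to what a feed client
# might do next. The action names are illustrative only.
sub action_for_status {
    my ($code) = @_;
    return 'process_content'   if $code == 200;   # OK: content retrieved
    return 'update_stored_uri' if $code == 301;   # Moved Permanently
    return 'keep_cached_copy'  if $code == 304;   # Not Modified
    return 'stop_fetching'     if $code == 410;   # Gone
    return 'treat_as_error';                      # anything else
}

# Local demonstration; no network access involved.
print action_for_status($_), "\n" for 200, 301, 304, 410;

# With URI::Fetch installed, the code would come from the response:
#   my $action = action_for_status( $res->http_status );   # assumed accessor
```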
Change from 0.09
    If you make a request using a cache and get back a 304 (Not
    Modified) response, and the content was returned from the cache,
    then is_success() will return true and "$response->content" will
    contain the cached content.

    I think this is the right behaviour, given the philosophy of
    "URI::Fetch", but please let me (NEILB) know if you disagree.

URI::Fetch->fetch($uri, %param)
    Fetches a page identified by the URI $uri.

    On success, returns a URI::Fetch::Response object; on failure, returns
    "undef".

    %param can contain:

    • LastModified

    • ETag

      LastModified and ETag can be supplied so that the server returns
      the full page only if it has changed since the last request. If
      you're writing your own feed client, this is recommended practice,
      because it limits both your bandwidth use and the server's.

      If you'd rather not have to store the LastModified time and ETag
      yourself, see the Cache parameter below (and the SYNOPSIS above).

    • Cache

      If you'd like URI::Fetch to cache responses between requests,
      provide the Cache parameter with an object supporting the Cache API
      (e.g. Cache::File, Cache::Memory). Specifically, an object that
      supports "$cache->get($key)" and "$cache->set($key, $value,
      $expires)".

      If supplied, URI::Fetch will store the page content, ETag, and
      last-modified time of the response in the cache, and will pull the
      content from the cache on subsequent requests if the server returns
      a 304 (Not Modified) response.

    • UserAgent

      Optional. You may provide your own LWP::UserAgent instance. Look
      into LWPx::ParanoidUserAgent if you're fetching URLs given to you
      by possibly malicious parties.

    • NoNetwork

      Optional. Controls the interaction between the cache and HTTP
      requests with If-Modified-Since/If-None-Match headers. Possible
      behaviors are:

      false (default)
          If a page is in the cache, the origin HTTP server is always
          checked for a fresher copy with an If-Modified-Since and/or
          If-None-Match header.

      1   If set to 1, the origin HTTP server is never contacted,
          regardless of whether the page is in the cache. If the page is
          missing from the cache, the fetch method will return undef. If
          the page is in the cache, that page will be returned, no matter
          how old it is. Note that setting this option means the
          URI::Fetch::Response object will never have the http_response
          member set.

      "N", where N > 1
          The origin HTTP server is not contacted if the page is in the
          cache and the cached page was inserted in the last N seconds.
          If the cached copy is older than N seconds, a normal HTTP
          request (full or cache check) is done.

    • ContentAlterHook

      Optional. A subref that gets called with a scalar reference to
      your content so you can modify the content before it's returned and
      before it's put in cache.

      For instance, you may want to only cache the <head> section of an
      HTML document, or you may want to take a feed URL and cache only a
      pre-parsed version of it. If you modify the scalarref given to
      your hook and change it into a hashref, scalarref, or some blessed
      object, that same value will be returned to you later on
      not-modified responses.

    • CacheEntryGrep

      Optional. A subref that gets called with the URI::Fetch::Response
      object about to be cached (with the contents already possibly
      transformed by your "ContentAlterHook"). If your subref returns
      true, the page goes into the cache. If false, it doesn't.

    • Freeze

    • Thaw

      Optional. Subrefs that get called to serialize and deserialize,
      respectively, the data that will be cached. The cached data should
      be assumed to be an arbitrary Perl data structure, containing
      (potentially) references to arrays, hashes, etc.

      Freeze should serialize the structure into a scalar; Thaw should
      deserialize the scalar into a data structure.

      By default, Storable will be used for freezing and thawing the
      cached data structure.

    • ForceResponse

      Optional. A boolean that indicates a URI::Fetch::Response should be
      returned regardless of the HTTP status. By default "undef" is
      returned when a response is neither a "success" (a 2xx code) nor
      one of the recognized HTTP status codes listed above. When "undef"
      is returned, the HTTP status message can be retrieved using the
      "errstr" method on the class.

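Several of the cache-related parameters above can be sketched together. Everything named here is hypothetical and only for illustration: the My::MemoryCache class, the hook names, and the "content" key in the demonstration data. Two further assumptions are made: that $expires means "seconds from now", and that Freeze and Thaw receive their data as the first argument. JSON::PP stands in for the default Storable purely as an example of a swapped-in serializer; it only copes with plain hashes, arrays and scalars.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical in-memory cache exposing the get/set pair described under
# Cache. Treating $expires as "seconds from now" is an assumption.
package My::MemoryCache {
    sub new {
        my ($class) = @_;
        return bless { store => {} }, $class;
    }

    sub set {
        my ($self, $key, $value, $expires) = @_;
        $self->{store}{$key} = {
            value   => $value,
            expires => defined $expires ? time + $expires : undef,
        };
        return;
    }

    sub get {
        my ($self, $key) = @_;
        my $entry = $self->{store}{$key} or return;
        if ( defined $entry->{expires} && time >= $entry->{expires} ) {
            delete $self->{store}{$key};    # drop the stale entry
            return;
        }
        return $entry->{value};
    }
}

# ContentAlterHook candidate: gets a scalar reference to the content and
# trims it to the <head> section when one is present.
sub keep_head_only {
    my ($content_ref) = @_;
    if ( ${$content_ref} =~ m{(<head\b.*?</head>)}is ) {
        ${$content_ref} = $1;
    }
    return;
}

# CacheEntryGrep candidate: gets the response object about to be cached;
# a true return value lets it into the cache. The size limit is arbitrary.
sub small_enough_to_cache {
    my ($response) = @_;
    my $content = $response->content;
    return 0 if !defined $content || ref $content;
    return length($content) <= 1_000_000 ? 1 : 0;
}

# Freeze/Thaw candidates using JSON instead of Storable.
my $json = JSON::PP->new->utf8->canonical;
sub freeze_as_json { return $json->encode( $_[0] ) }
sub thaw_from_json { return $json->decode( $_[0] ) }

# Local demonstration; no network access involved.
my $cache = My::MemoryCache->new;
my $html  = '<html><head><title>t</title></head><body>big</body></html>';
keep_head_only(\$html);
$cache->set( 'http://example.com/', freeze_as_json({ content => $html }) );
my $entry = thaw_from_json( $cache->get('http://example.com/') );
print $entry->{content}, "\n";

# With URI::Fetch installed, these would be wired in roughly as:
#   URI::Fetch->fetch($uri,
#       Cache            => $cache,
#       ContentAlterHook => \&keep_head_only,
#       CacheEntryGrep   => \&small_enough_to_cache,
#       Freeze           => \&freeze_as_json,
#       Thaw             => \&thaw_from_json,
#   );
```

Adding NoNetwork => 300 to such a call would, per the description above, skip the HTTP request whenever the cached copy is less than 300 seconds old.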
<https://github.com/neilbowers/URI-Fetch>

URI::Fetch is free software; you may redistribute it and/or modify it
under the same terms as Perl itself.

Except where otherwise noted, URI::Fetch is Copyright 2004 Benjamin
Trott, ben+cpan@stupidfool.org. All rights reserved.

Currently maintained by Neil Bowers.

• Tim Appnel

• Mario Domgoergen

• Karen Etheridge

• Brad Fitzpatrick

• Jason Hall

• Naoya Ito

• Tatsuhiko Miyagawa

perl v5.36.0 2023-01-20 URI::Fetch(3)