URI::Fetch(3) User Contributed Perl Documentation URI::Fetch(3)

NAME
URI::Fetch - Smart URI fetching/caching

SYNOPSIS
    use URI::Fetch;
    use Cache::File;    # needed only for the on-disk cache example

    ## Simple fetch.
    my $res = URI::Fetch->fetch('http://example.com/atom.xml')
        or die URI::Fetch->errstr;

    ## Fetch using specified ETag and Last-Modified headers.
    $res = URI::Fetch->fetch('http://example.com/atom.xml',
        ETag         => '123-ABC',
        LastModified => time - 3600,
    )
        or die URI::Fetch->errstr;

    ## Fetch using an on-disk cache that URI::Fetch manages for you.
    my $cache = Cache::File->new( cache_root => '/tmp/cache' );
    $res = URI::Fetch->fetch('http://example.com/atom.xml',
        Cache => $cache,
    )
        or die URI::Fetch->errstr;

DESCRIPTION
URI::Fetch is a smart client for fetching HTTP pages, notably syndication
feeds (RSS, Atom, and others), in an intelligent, bandwidth- and
time-saving way. That means:

* GZIP support
If you have Compress::Zlib installed, URI::Fetch will automatically
try to download a compressed version of the content, saving bandwidth
(and time).

* Last-Modified and ETag support
If you use a local cache (see the Cache parameter to fetch),
URI::Fetch keeps track of the Last-Modified and ETag headers
from the server, so that you only download pages that have been
modified since the last time you checked.

* Proper understanding of HTTP error codes
Certain HTTP error codes are special, particularly when fetching
syndication feeds, and well-written clients should pay special
attention to them. URI::Fetch can only do so much for you in this
regard, but it gives you the tools to be a well-written client.

The response from fetch gives you the raw HTTP response code, along
with special handling of 4 codes:

* 200 (OK)
Signals that the content of a page/feed was retrieved successfully.

* 301 (Moved Permanently)
Signals that a page/feed has moved permanently, and that your
database of feeds should be updated to reflect the new URI.

* 304 (Not Modified)
Signals that a page/feed has not changed since it was last
fetched.

* 410 (Gone)
Signals that a page/feed is gone and will never be coming back,
so you should stop trying to fetch it.
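
A feed client can turn the four special codes above into a simple dispatch. The following is a minimal sketch. The helper name action_for_status and its action labels are hypothetical. In real code you would pass it the status of the URI::Fetch::Response object; that the status is available as $res->status is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: map the four status codes that URI::Fetch
# handles specially to the action a feed client should take.
my %ACTION_FOR = (
    200 => 'process',      # OK: parse and store the new content
    301 => 'update_uri',   # Moved Permanently: rewrite the stored URI
    304 => 'use_cached',   # Not Modified: keep the copy you already have
    410 => 'unsubscribe',  # Gone: stop fetching this URI
);

sub action_for_status {
    my ($code) = @_;
    return exists $ACTION_FOR{$code} ? $ACTION_FOR{$code} : 'unknown';
}
```

For example, my $action = action_for_status($res->status); picks the next step. A hash keeps the mapping in one place and makes an unrecognized code fall through to 'unknown' rather than being silently treated as success.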

USAGE
URI::Fetch->fetch($uri, %param)

Fetches a page identified by the URI $uri.

On success, returns a URI::Fetch::Response object; on failure, returns
"undef".

%param can contain:

* LastModified
* ETag
LastModified and ETag can be supplied to ask the server to return the
full page only if it has changed since the last request (they are sent
as If-Modified-Since and If-None-Match request headers). If you're
writing your own feed client, this is recommended practice, because it
limits both your bandwidth use and the server's.

If you'd rather not have to store the LastModified time and ETag
yourself, see the Cache parameter below (and the SYNOPSIS above).

* Cache
If you'd like URI::Fetch to cache responses between requests, provide
the Cache parameter with an object supporting the Cache API (e.g.
Cache::File, Cache::Memory). Specifically, an object that supports
"$cache->get($key)" and "$cache->set($key, $value, $expires)".

If supplied, URI::Fetch will store the page content, ETag, and
last-modified time of the response in the cache, and will pull the
content from the cache on subsequent requests if the server returns a
304 (Not Modified) response.

* UserAgent
Optional. You may provide your own LWP::UserAgent instance. Consider
LWPx::ParanoidUserAgent if you're fetching URLs given to you by
possibly malicious parties.

* NoNetwork
Optional. Controls the interaction between the cache and HTTP
requests with If-Modified-Since/If-None-Match headers. Possible
behaviors are:

false (default)
If a page is in the cache, the origin HTTP server is always
checked for a fresher copy with an If-Modified-Since and/or
If-None-Match header.

1
If set to 1, the origin HTTP server is never contacted, whether or
not the page is in the cache. If the page is missing from the
cache, the fetch method will return undef. If the page is in the
cache, that page will be returned, no matter how old it is.
Note that setting this option means the URI::Fetch::Response
object will never have the http_response member set.

"N", where N > 1
The origin HTTP server is not contacted if the page is in the cache
and the cached page was inserted in the last N seconds. If the
cached copy is older than N seconds, a normal HTTP request
(full or cache check) is done.

* ContentAlterHook
Optional. A subref that gets called with a scalar reference to
your content so you can modify the content before it's returned and
before it's put in the cache.

For instance, you may want to cache only the <head> section of an
HTML document, or you may want to take a feed URL and cache only a
pre-parsed version of it. If your hook replaces the content behind
the scalar reference with a hashref, scalarref, or some blessed
object, that same value will be returned to you later on
not-modified responses.

* CacheEntryGrep
Optional. A subref that gets called with the URI::Fetch::Response
object about to be cached (with the contents already possibly
transformed by your "ContentAlterHook"). If your subref returns
true, the page goes into the cache. If false, it doesn't.

* Freeze
* Thaw
Optional. Subrefs that get called to serialize and deserialize,
respectively, the data that will be cached. The cached data should
be assumed to be an arbitrary Perl data structure, containing
(potentially) references to arrays, hashes, etc.

Freeze should serialize the structure into a scalar; Thaw should
deserialize the scalar into a data structure.

By default, Storable will be used for freezing and thawing the
cached data structure.

* ForceResponse
Optional. A boolean that indicates a URI::Fetch::Response should be
returned regardless of the HTTP status. By default "undef" is
returned when a response is not a "success" (200 codes) or one of
the recognized HTTP status codes listed above. The HTTP status
message can then be retrieved using the "errstr" method on the class.
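
You may prefer to store the LastModified and ETag values yourself instead of using Cache. In that case, a small helper can turn a saved record into fetch arguments and pass only the values you actually have. The following is a sketch. The helper name conditional_params and the record keys etag and last_modified are hypothetical. LastModified is assumed to be an epoch time, as in the SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical helper: build the ETag/LastModified arguments for
# URI::Fetch->fetch from a record saved after the previous fetch.
# Missing values are left out entirely rather than passed as undef.
sub conditional_params {
    my ($saved) = @_;
    my @param;
    push @param, ETag => $saved->{etag}
        if defined $saved->{etag};
    push @param, LastModified => $saved->{last_modified}
        if defined $saved->{last_modified};
    return @param;
}
```

You might then call URI::Fetch->fetch($uri, conditional_params(\%saved)) and refresh %saved from the response afterwards. The response accessors for this (assumed here to be etag and last_modified) should be checked against URI::Fetch::Response.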
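
For the Cache parameter, any object with get and set methods qualifies. The in-memory class below is a minimal sketch. The name My::HashCache is hypothetical, and the class ignores $expires: entries never expire and nothing survives the process. For real use, prefer Cache::File or Cache::Memory.

```perl
use strict;
use warnings;

package My::HashCache;    # hypothetical class name

sub new {
    my ($class) = @_;
    return bless { data => {} }, $class;
}

# get($key): return the stored value, or undef if the key is absent.
sub get {
    my ($self, $key) = @_;
    return $self->{data}{$key};
}

# set($key, $value, $expires): store the value. $expires is accepted
# for compatibility with the Cache API but ignored in this sketch.
sub set {
    my ($self, $key, $value, $expires) = @_;
    $self->{data}{$key} = $value;
    return;
}

package main;
```

Pass it as Cache => My::HashCache->new in the call to fetch.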
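
The three NoNetwork behaviors can be restated as a single decision. The function below is a hypothetical illustration of the rules as documented, not URI::Fetch's own code. It assumes you know whether the page is cached and the epoch time at which it was stored.

```perl
use strict;
use warnings;

# Hypothetical restatement of the documented NoNetwork rules.
# Returns 1 if the origin server would be contacted, 0 if not.
sub contacts_origin {
    my (%arg) = @_;    # keys: no_network, in_cache, cached_at, now
    my $nn = $arg{no_network};

    return 1 unless $nn;             # false: always check the origin
    return 0 if $nn == 1;            # 1: never contact the origin
    return 1 unless $arg{in_cache};  # N > 1 but nothing cached: fetch
    my $age = $arg{now} - $arg{cached_at};
    return $age > $nn ? 1 : 0;       # N > 1: fetch only if older than N s
}
```

Reading the rules this way makes the one subtle case visible: with NoNetwork => 1, a cache miss never reaches the network, which is why fetch returns undef.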
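
For ContentAlterHook, the <head>-only case mentioned above might look like the following sketch. The variable name $head_only is illustrative. The regular expression is a rough extraction rather than a real HTML parser, so unusual markup can fool it.

```perl
use strict;
use warnings;

# Illustrative ContentAlterHook: replace fetched HTML with just its
# <head>...</head> section. The hook receives a scalar reference and
# edits the content in place; content without a <head> is left as-is.
my $head_only = sub {
    my ($content_ref) = @_;
    if ($$content_ref =~ m{(<head\b.*?</head>)}is) {
        $$content_ref = $1;
    }
    return;
};
```

Pass it as ContentAlterHook => $head_only. Because the hook runs before caching, only the <head> section is stored.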
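
For CacheEntryGrep, a filter can keep empty or oversized pages out of the cache. This sketch assumes the response object exposes its (possibly altered) content through a content accessor. The 512 KB limit and the variable names are arbitrary choices for illustration.

```perl
use strict;
use warnings;

# Illustrative CacheEntryGrep filter. Assumes the response object has a
# content() accessor returning the (possibly altered) content.
my $max_bytes = 512 * 1024;    # arbitrary size limit for this sketch

my $worth_caching = sub {
    my ($res) = @_;
    my $content = $res->content;
    return 0 unless defined $content;
    return 1 if ref $content;    # already transformed by ContentAlterHook
    return (length($content) > 0 && length($content) <= $max_bytes) ? 1 : 0;
};
```

Pass it as CacheEntryGrep => $worth_caching. References are always accepted here because their size cannot be judged with length.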
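
For Freeze and Thaw, you could swap Storable for a text format such as JSON, for instance to make cache entries readable by other tools. The pair below uses JSON::PP, a core module since Perl 5.14. This is an illustrative alternative, not what URI::Fetch does by default. JSON cannot represent blessed objects or code refs. It therefore only suits caches whose entries (after any ContentAlterHook) are plain strings, numbers, arrays, and hashes.

```perl
use strict;
use warnings;
use JSON::PP;

# Illustrative Freeze/Thaw pair using JSON::PP instead of Storable.
#   utf8:         encode to (and decode from) UTF-8 bytes
#   canonical:    sort hash keys, giving stable output for equal data
#   allow_nonref: also accept a plain scalar at the top level
my $json = JSON::PP->new->utf8->canonical->allow_nonref;

my $freeze = sub { return $json->encode($_[0]) };    # structure -> scalar
my $thaw   = sub { return $json->decode($_[0]) };    # scalar -> structure
```

Pass them as Freeze => $freeze, Thaw => $thaw, and start with an empty cache. Entries already written with Storable will not decode as JSON.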

LICENSE
URI::Fetch is free software; you may redistribute it and/or modify it
under the same terms as Perl itself.

AUTHOR & COPYRIGHT
Except where otherwise noted, URI::Fetch is Copyright 2004 Benjamin
Trott, ben+cpan@stupidfool.org. All rights reserved.

perl v5.8.8 2007-05-27 URI::Fetch(3)