URI::Fetch(3)         User Contributed Perl Documentation        URI::Fetch(3)

NAME

       URI::Fetch - Smart URI fetching/caching

SYNOPSIS

           use URI::Fetch;
           use Cache::File;

           ## Simple fetch.
           my $res = URI::Fetch->fetch('http://example.com/atom.xml')
               or die URI::Fetch->errstr;
           do_something($res->content) if $res->is_success;

           ## Fetch using specified ETag and Last-Modified headers.
           $res = URI::Fetch->fetch('http://example.com/atom.xml',
                   ETag => '123-ABC',
                   LastModified => time - 3600,
           )
               or die URI::Fetch->errstr;

           ## Fetch using an on-disk cache that URI::Fetch manages for you.
           my $cache = Cache::File->new( cache_root => '/tmp/cache' );
           $res = URI::Fetch->fetch('http://example.com/atom.xml',
                   Cache => $cache
           )
               or die URI::Fetch->errstr;

DESCRIPTION

       URI::Fetch is a smart client for fetching HTTP pages, notably
       syndication feeds (RSS, Atom, and others), in an intelligent,
       bandwidth- and time-saving way. That means:

       •   GZIP support

           If you have Compress::Zlib installed, URI::Fetch will automatically
           try to download a compressed version of the content, saving
           bandwidth (and time).

       •   Last-Modified and ETag support

           If you use a local cache (see the Cache parameter to fetch),
           URI::Fetch will keep track of the Last-Modified and ETag headers
           from the server, allowing you to download only pages that have
           been modified since the last time you checked.

       •   Proper understanding of HTTP error codes

           Certain HTTP error codes are special, particularly when fetching
           syndication feeds, and well-written clients should pay special
           attention to them.  URI::Fetch can only do so much for you in this
           regard, but it gives you the tools to be a well-written client.

           The response from fetch gives you the raw HTTP response code, along
           with special handling of four codes:

           •   200 (OK)

               Signals that the content of a page/feed was retrieved
               successfully.

           •   301 (Moved Permanently)

               Signals that a page/feed has moved permanently, and that your
               database of feeds should be updated to reflect the new URI.

           •   304 (Not Modified)

               Signals that a page/feed has not changed since it was last
               fetched.

           •   410 (Gone)

               Signals that a page/feed is gone and will never be coming back,
               so you should stop trying to fetch it.

   Change from 0.09
       If you make a request using a cache and get back a 304 response code
       (Not Modified), and the content was returned from the cache, then
       "is_success()" will return true and "$response->content" will contain
       the cached content.

       I think this is the right behaviour, given the philosophy of
       "URI::Fetch", but please let me (NEILB) know if you disagree.
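
       The four specially handled codes map naturally onto feed-reader
       actions. The following is a minimal sketch, not part of URI::Fetch
       itself: the helper name action_for_status and the action labels are
       hypothetical, and the commented usage assumes that the response
       object's http_status method returns the raw HTTP code.

```perl
use strict;
use warnings;

# Hypothetical helper: map a raw HTTP status code to a feed-reader action.
sub action_for_status {
    my ($code) = @_;
    return 'process'     if $code == 200;   # content retrieved
    return 'update_uri'  if $code == 301;   # feed moved permanently
    return 'keep_cached' if $code == 304;   # unchanged since last fetch
    return 'unsubscribe' if $code == 410;   # gone for good
    return 'retry_later';                   # anything else
}

# Sketch of use with URI::Fetch (commented out: it needs network access):
#   my $res = URI::Fetch->fetch($uri, Cache => $cache)
#       or die URI::Fetch->errstr;
#   my $action = action_for_status($res->http_status);
```

       With a cache in place, a 304 response still yields usable content, so
       a caller might treat 'keep_cached' and 'process' alike and only skip
       re-parsing when nothing has changed.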

USAGE

   URI::Fetch->fetch($uri, %param)
       Fetches a page identified by the URI $uri.

       On success, returns a URI::Fetch::Response object; on failure, returns
       "undef".

       %param can contain:

       •   LastModified

       •   ETag

           LastModified and ETag can be supplied to force the server to
           return the full page only if it has changed since the last
           request. If you're writing your own feed client, this is
           recommended practice, because it limits both your bandwidth use
           and the server's.

           If you'd rather not have to store the LastModified time and ETag
           yourself, see the Cache parameter below (and the SYNOPSIS above).

       •   Cache

           If you'd like URI::Fetch to cache responses between requests,
           provide the Cache parameter with an object supporting the Cache API
           (e.g.  Cache::File, Cache::Memory). Specifically, an object that
           supports "$cache->get($key)" and "$cache->set($key, $value,
           $expires)".

           If supplied, URI::Fetch will store the page content, ETag, and
           last-modified time of the response in the cache, and will pull the
           content from the cache on subsequent requests if the page returns a
           Not-Modified response.

       •   UserAgent

           Optional.  You may provide your own LWP::UserAgent instance.  Look
           into LWPx::ParanoidUserAgent if you're fetching URLs given to you
           by possibly malicious parties.

       •   NoNetwork

           Optional.  Controls the interaction between the cache and HTTP
           requests with If-Modified-Since/If-None-Match headers.  Possible
           behaviors are:

           false (default)
               If a page is in the cache, the origin HTTP server is always
               checked for a fresher copy with an If-Modified-Since and/or
               If-None-Match header.

           1   If set to 1, the origin HTTP server is never contacted,
               regardless of the page being in cache or not.  If the page is
               missing from cache, the fetch method will return undef.  If
               the page is in cache, that page will be returned, no matter
               how old it is.  Note that setting this option means the
               URI::Fetch::Response object will never have the http_response
               member set.

           "N", where N > 1
               The origin HTTP server is not contacted if the page is in cache
               and the cached page was inserted in the last N seconds.  If the
               cached copy is older than N seconds, a normal HTTP request
               (full or cache check) is done.

       •   ContentAlterHook

           Optional.  A subref that gets called with a scalar reference to
           your content so you can modify the content before it's returned and
           before it's put in cache.

           For instance, you may want to only cache the <head> section of an
           HTML document, or you may want to take a feed URL and cache only a
           pre-parsed version of it.  If you modify the scalarref given to
           your hook and change it into a hashref, scalarref, or some blessed
           object, that same value will be returned to you later on
           not-modified responses.

       •   CacheEntryGrep

           Optional.  A subref that gets called with the URI::Fetch::Response
           object about to be cached (with the contents already possibly
           transformed by your "ContentAlterHook").  If your subref returns
           true, the page goes into the cache.  If false, it doesn't.

       •   Freeze

       •   Thaw

           Optional. Subrefs that get called to serialize and deserialize,
           respectively, the data that will be cached. The cached data should
           be assumed to be an arbitrary Perl data structure, containing
           (potentially) references to arrays, hashes, etc.

           Freeze should serialize the structure into a scalar; Thaw should
           deserialize the scalar into a data structure.

           By default, Storable will be used for freezing and thawing the
           cached data structure.

       •   ForceResponse

           Optional. A boolean that indicates a URI::Fetch::Response should be
           returned regardless of the HTTP status. By default "undef" is
           returned when a response is not a "success" (200 codes) or one of
           the recognized HTTP status codes listed above. When "undef" is
           returned, the HTTP status message can be retrieved using the
           "errstr" method on the class.
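
       The Cache, Freeze, and Thaw parameters only require simple
       interfaces, which can be sketched as follows. This is an illustration
       under stated assumptions, not code from URI::Fetch: the class name
       My::TinyCache is hypothetical, $expires is assumed to be an optional
       lifetime in seconds, and the data handed to Freeze is assumed to be a
       reference.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

{
    # Hypothetical in-memory cache; any object with get() and set() would do.
    package My::TinyCache;

    sub new {
        my ($class) = @_;
        return bless { store => {} }, $class;
    }

    # Return the stored value, or nothing if the key is absent or expired.
    sub get {
        my ($self, $key) = @_;
        my $entry = $self->{store}{$key} or return;
        if (defined $entry->{expires_at} && $entry->{expires_at} <= time) {
            delete $self->{store}{$key};
            return;
        }
        return $entry->{value};
    }

    # Assumption: $expires, when given, is a lifetime in seconds.
    sub set {
        my ($self, $key, $value, $expires) = @_;
        $self->{store}{$key} = {
            value      => $value,
            expires_at => defined $expires ? time + $expires : undef,
        };
        return 1;
    }
}

# Freeze turns a data structure into a scalar; Thaw reverses it.
# nfreeze() writes Storable's portable (network-order) format.
my $freeze = sub { my ($data)   = @_; return nfreeze($data) };
my $thaw   = sub { my ($frozen) = @_; return thaw($frozen) };

# Sketch of use with URI::Fetch (commented out: it needs network access):
#   my $res = URI::Fetch->fetch($uri,
#       Cache  => My::TinyCache->new,
#       Freeze => $freeze,
#       Thaw   => $thaw,
#   ) or die URI::Fetch->errstr;
```

       An in-memory cache like this only helps within one process; an
       on-disk cache such as Cache::File is the better fit when the cached
       pages must survive between runs.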
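
       The two hook parameters, ContentAlterHook and CacheEntryGrep, are
       plain subrefs. The sketch below shows one possible pair; the variable
       names, the regular expression, and the 1,000,000-character limit are
       illustrative choices, not anything URI::Fetch prescribes.

```perl
use strict;
use warnings;

# ContentAlterHook: receives a scalar reference to the content.
# This hypothetical hook keeps only the <head> section of an HTML page
# and leaves the content untouched when no <head> element is found.
my $keep_head = sub {
    my ($content_ref) = @_;
    if ($$content_ref =~ m{(<head\b.*?</head>)}is) {
        $$content_ref = $1;
    }
    return;
};

# CacheEntryGrep: receives the response object about to be cached.
# This hypothetical filter refuses to cache empty or very large bodies,
# and anything an earlier hook has turned into a reference.
my $cacheable = sub {
    my ($res) = @_;
    my $content = $res->content;
    return 0 unless defined($content) && !ref($content);
    my $len = length $content;
    return ($len > 0 && $len <= 1_000_000) ? 1 : 0;
};

# Sketch of use with URI::Fetch (commented out: it needs network access):
#   my $res = URI::Fetch->fetch($uri,
#       Cache            => $cache,
#       ContentAlterHook => $keep_head,
#       CacheEntryGrep   => $cacheable,
#   ) or die URI::Fetch->errstr;
```

       Because the alter hook runs before the cache filter, the filter sees
       the already trimmed content and judges its size accordingly.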

REPOSITORY

       <https://github.com/neilbowers/URI-Fetch>

LICENSE

       URI::Fetch is free software; you may redistribute it and/or modify it
       under the same terms as Perl itself.

       Except where otherwise noted, URI::Fetch is Copyright 2004 Benjamin
       Trott, ben+cpan@stupidfool.org. All rights reserved.

       Currently maintained by Neil Bowers.

CONTRIBUTORS

       •   Tim Appnel

       •   Mario Domgoergen

       •   Karen Etheridge

       •   Brad Fitzpatrick

       •   Jason Hall

       •   Naoya Ito

       •   Tatsuhiko Miyagawa

perl v5.36.0                      2022-07-22                     URI::Fetch(3)