uri(n)            Tcl Uniform Resource Identifier Management            uri(n)

______________________________________________________________________________

NAME
uri - URI utilities

SYNOPSIS
package require Tcl 8.2

package require uri ?1.2.7?

uri::setQuirkOption option ?value?

uri::split url ?defaultscheme?

uri::join ?key value?...

uri::resolve base url

uri::isrelative url

uri::geturl url ?options...?

uri::canonicalize uri

uri::register schemeList script

______________________________________________________________________________

DESCRIPTION
This package does two things.

First, it provides a number of commands for manipulating URLs/URIs and
fetching data specified by them. For fetching data this package analyses
the requested URL/URI and then dispatches it to the appropriate package
(http, ftp, ...) for actual retrieval. Currently these commands are
defined for the schemes http, https, ftp, mailto, news, ldap, ldaps and
file. The package uri::urn adds scheme urn.

Second, it provides regular expressions for a number of registered
URL/URI schemes. Registered schemes are currently ftp, ldap, ldaps, file,
http, https, gopher, mailto, news, wais and prospero. The package
uri::urn adds scheme urn.

The commands of the package conform to RFC 3986
(https://www.rfc-editor.org/rfc/rfc3986.txt), with the exception of a
loophole arising from RFC 1630 and described in RFC 3986 Sections 5.2.2
and 5.4.2. The loophole allows a relative URI to include a scheme if it
is the same as the scheme of the base URI against which it is resolved.
RFC 3986 recommends avoiding this usage.

COMMANDS
uri::setQuirkOption option ?value?
uri::setQuirkOption is an accessor command for a number of "quirk
options". The command has the same semantics as the command set: when
called with one argument it reads an existing value; with two arguments
it writes a new value. The value of a "quirk option" is boolean: the
value false requests conformance with RFC 3986, while true requests use
of the quirk. See section QUIRK OPTIONS for discussion of the different
options and their purpose.

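As a brief illustration of these set-like semantics, the sketch below
reads an option, changes it, and then restores the earlier value. It
assumes that uri 1.2.7 or later is installed; the variable name old is
arbitrary.

```tcl
package require uri 1.2.7

# One argument: read the current value of the option.
set old [uri::setQuirkOption NoInitialSlash]

# Two arguments: write a new value (false requests RFC 3986 behavior).
uri::setQuirkOption NoInitialSlash 0

# ... code that relies on RFC 3986 style paths would go here ...

# Restore the value that was in effect before.
uri::setQuirkOption NoInitialSlash $old
```

Saving and restoring the value in this way keeps a local change from
leaking into other code that shares the interpreter.
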
uri::split url ?defaultscheme?
uri::split takes a url, parses it, and returns a list of key/value pairs
suitable for array set containing the constituents of the url. If the
scheme is missing from the url, it defaults to the value of
defaultscheme if that was specified, and to http otherwise. Currently
the schemes http, https, ftp, mailto, news, ldap, ldaps and file are
supported by the package itself. See section EXTENDING on how to expand
that range.

The set of constituents of a URL (= the set of keys in the returned
dictionary) depends on the scheme of the URL. The only key that is
always present is therefore scheme. For the following schemes the
constituents and their keys are known:

ftp      user, pwd, host, port, path, type, pbare. The key pbare is
         optional.

http(s)  user, pwd, host, port, path, query, fragment, pbare. The key
         pbare is optional.

file     path, host. The key host is optional.

mailto   user, host. The key host is optional.

ldap(s)  host, port, dn, attrs, scope, filter, extensions

news     Either message-id or newsgroup-name.

For discussion of the boolean pbare see options NoInitialSlash and
NoExtraKeys in QUIRK OPTIONS.

The constituents are returned as slices of the argument url, without
removal of percent-encoding ("url-encoding") or other adaptations.
Notably, on Windows® the path in scheme file is not a valid local
filename. See EXAMPLES for more information.

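A minimal sketch of typical use follows. The URL is an invented
example; the array name parts is arbitrary. The value of path is
printed without an expected result because its form depends on the
quirk option NoInitialSlash.

```tcl
package require uri

# Split a URL and load the key/value pairs into an array.
array set parts [uri::split http://www.example.com:8080/docs/index.html?lang=en]

puts "scheme = $parts(scheme)"   ;# http
puts "host   = $parts(host)"     ;# www.example.com
puts "port   = $parts(port)"     ;# 8080
puts "query  = $parts(query)"    ;# lang=en

# The form of path depends on the quirk option NoInitialSlash.
puts "path   = $parts(path)"
```

Because the order of the returned keys is not specified, loading them
into an array (or reading them with dict get) is safer than indexing
the list by position.
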
uri::join ?key value?...
uri::join takes a list of key/value pairs (generated by uri::split, for
example) and returns the canonical URL they represent. Currently the
schemes http, https, ftp, mailto, news, ldap, ldaps and file are
supported by the package itself. See section EXTENDING on how to expand
that range.

The arguments are expected to be slices of a valid URL, with
percent-encoding ("url-encoding") and any other necessary adaptations.
Notably, on Windows the path in scheme file is not a valid local
filename. See EXAMPLES for more information.

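The following sketch shows a round trip through uri::split and
uri::join, and then the replacement of one constituent before joining.
The URL and the host name mirror.example.org are invented for the
example, and the {*} expansion syntax assumes Tcl 8.5 or later.

```tcl
package require uri

set url http://www.example.com/docs/index.html?lang=en

# Split, then pass the key/value pairs straight back to uri::join.
set parts   [uri::split $url]
set rebuilt [uri::join {*}$parts]
puts $rebuilt

# Individual constituents can be replaced before joining.
array set edit $parts
set edit(host) mirror.example.org
set moved [uri::join {*}[array get edit]]
puts $moved
```

Editing the split form and joining it again avoids string surgery on
the URL itself.
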
uri::resolve base url
uri::resolve resolves the specified url relative to base, in conformance
with RFC 3986. In other words: a non-relative url is returned unchanged,
whereas for a relative url the missing parts are taken from base and
prepended to it. The result of this operation is returned. For an empty
url the result is base, without its URI fragment (if any). The command
is available for schemes http, https, ftp, and file.

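The two main cases can be sketched as follows. The URLs are invented
for the example.

```tcl
package require uri

set base http://www.example.com/docs/guide/intro.html

# A relative reference takes its missing parts from the base.
set rel [uri::resolve $base chapter1.html]
puts $rel   ;# http://www.example.com/docs/guide/chapter1.html

# A non-relative URL is returned unchanged.
set abs [uri::resolve $base ftp://ftp.example.org/pub/file.txt]
puts $abs   ;# ftp://ftp.example.org/pub/file.txt
```
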
uri::isrelative url
uri::isrelative determines whether the specified url is absolute or
relative. The command is available for a url of any scheme.

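A short sketch, using invented URLs, that classifies a few inputs:

```tcl
package require uri

foreach url {http://www.example.com/a/b ../images/logo.png mailto:user@example.com} {
    # uri::isrelative returns a boolean, so it can be used directly in if.
    if {[uri::isrelative $url]} {
        puts "$url is relative"
    } else {
        puts "$url is absolute"
    }
}
```
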
uri::geturl url ?options...?
uri::geturl parses the specified url and then dispatches the request to
the package appropriate for the scheme found in the URL. The command
assumes that the package to handle the given scheme either has the same
name as the scheme itself (including possible capitalization) followed
by ::geturl, or, failing that, has the same name as the scheme itself
(including possible capitalization). It further assumes that whatever
package was loaded provides a geturl command in the namespace of the
same name as the package itself. This command is called with the given
url and all given options. Currently geturl does not handle any options
itself.

Note: file-URLs are an exception to the rule described above. They are
handled internally.

The results of the command cannot be specified here. They depend on the
geturl command for the scheme the request was dispatched to.

uri::canonicalize uri
uri::canonicalize returns the canonical form of a URI. The canonical
form of a URI is one where relative path specifications, i.e. "." and
"..", have been resolved. The command is available for all URI schemes
that have uri::split and uri::join commands. The command returns a
canonicalized URI if the URI scheme has a path component (i.e. http,
https, ftp, and file). For schemes that have uri::split and uri::join
commands but no path component (i.e. mailto, news, ldap, and ldaps), the
command returns the uri unchanged.

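Both cases can be sketched with invented URIs:

```tcl
package require uri

# "." and ".." segments in the path are resolved.
set dotted [uri::canonicalize http://www.example.com/a/./b/../c/index.html]
puts $dotted   ;# http://www.example.com/a/c/index.html

# mailto has no path component, so the URI comes back unchanged.
set mail [uri::canonicalize mailto:user@example.com]
puts $mail     ;# mailto:user@example.com
```
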
uri::register schemeList script
uri::register registers the first element of schemeList as a new scheme
and the remaining elements as aliases for this scheme. It creates the
namespace for the scheme and executes the script in the new namespace.
The script has to declare variables containing regular expressions
relevant to the scheme. At least the variable schemepart has to be
declared as that one is used to extend the variables keeping track of
the registered schemes.

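A minimal sketch of a registration follows. The scheme myapp, its alias
myapps, the helper variables hostpart and pathpart, and the patterns
themselves are all hypothetical and deliberately simple. Only
schemepart is required by the description above; the variable url
follows the convention described in section SCHEMES.

```tcl
package require uri

# Register a hypothetical scheme "myapp" with the alias "myapps".
uri::register {myapp myapps} {
    variable hostpart   {[a-z0-9.-]+}
    variable pathpart   {(/[a-zA-Z0-9._/-]*)?}
    variable schemepart "//${hostpart}${pathpart}"
    variable url        "myapp:${schemepart}"
}

# The new scheme now appears in the list of registered schemes ...
puts [lsearch -exact $uri::schemes myapp]   ;# an index >= 0

# ... and its pattern can be used to recognize URLs.
puts [regexp -- "^${uri::myapp::url}\$" myapp://server.example/data/item]
```

Registration only provides pattern information. It does not make
uri::split or uri::join work for the new scheme; see section EXTENDING.
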
SCHEMES
In addition to the commands mentioned above this package provides
regular expressions to recognize URLs for a number of URL schemes.

For each supported scheme a namespace of the same name as the scheme
itself is provided inside of the namespace uri containing the variable
url whose contents are a regular expression to recognize URLs of that
scheme. Additional variables may contain regular expressions for parts
of URLs for that scheme.

The variable uri::schemes contains a list of all registered schemes.
Currently these are ftp, ldap, ldaps, file, http, https, gopher, mailto,
news, wais and prospero.

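The sketch below lists the registered schemes and uses the http
pattern to pick a URL out of running text. The sample text is invented,
and the exact extent of the match depends on the pattern shipped with
the installed version, so no exact output is shown.

```tcl
package require uri

# List the schemes for which patterns are registered.
puts $uri::schemes

# Use the http pattern to find a URL in running text.
set text "See http://www.example.com/docs/index.html for details."
set found [regexp -- $uri::http::url $text match]
if {$found} {
    puts "found: $match"
}
```
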
EXTENDING
Extending the range of schemes supported by uri::split and uri::join is
easy because neither command handles the request by itself. Each
dispatches it to another command in the uri namespace, using the scheme
of the URL as criterion.

uri::split and uri::join call Split[string totitle <scheme>] and
Join[string totitle <scheme>] respectively.

The provision of split and join commands is sufficient to extend the
commands uri::canonicalize and uri::geturl (the latter subject to the
availability of a suitable package with a geturl command). In contrast,
to extend the command uri::resolve to a new scheme, the command itself
must be modified.

To extend the range of schemes for which pattern information is
available, use the command uri::register.

An example of a package that provides both commands and pattern
information for a new scheme is uri::urn, which adds scheme urn.

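The dispatch described above can be sketched for a hypothetical scheme
demo with URLs of the form demo://host/path. The calling conventions
used here are assumptions that this text does not state: that
uri::split removes the leading "demo:" before calling SplitDemo and
adds the key scheme itself, and that uri::join passes its key/value
pairs to JoinDemo as separate arguments. Check them against the
installed uri.tcl before relying on them.

```tcl
package require uri

# Hypothetical split command for scheme "demo".
# Assumption: $url arrives without the leading "demo:".
proc uri::SplitDemo {url} {
    array set parts {host {} path {}}
    regexp -- {^//([^/]*)(/.*)?$} $url -> parts(host) parts(path)
    return [array get parts]
}

# Hypothetical join command for scheme "demo".
# Assumption: the key/value pairs arrive as separate arguments.
proc uri::JoinDemo {args} {
    array set parts {host {} path {}}
    array set parts $args
    return "demo://$parts(host)$parts(path)"
}

array set d [uri::split demo://server.example/data/item]
puts "$d(scheme) $d(host) $d(path)"

set joined [uri::join scheme demo host server.example path /data/item]
puts $joined
```

The procedure names follow from [string totitle demo], which yields
Demo. Both procedures must live in the uri namespace to be found.
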
QUIRK OPTIONS
The value of a "quirk option" is boolean: the value false requests
conformance with RFC 3986, while true requests use of the quirk. Use
command uri::setQuirkOption to access the values of quirk options.

Quirk options are useful both for allowing backwards compatibility when
a command specification changes, and for adding useful features that are
not included in RFC specifications. The following quirk options are
currently defined:

NoInitialSlash
This quirk option concerns the leading character of path (if non-empty)
in the schemes http, https, and ftp.

RFC 3986 defines path in an absolute URI to have an initial "/", unless
the value of path is the empty string. For the scheme file, all versions
of package uri follow this rule. The quirk option NoInitialSlash does
not apply to scheme file.

For the schemes http, https, and ftp, versions of uri before 1.2.7
define the path NOT to include an initial "/". When the quirk option
NoInitialSlash is true (the default), this behavior is also used in
version 1.2.7. To use instead values of path as defined by RFC 3986, set
this quirk option to false.

This setting does not affect RFC 3986 conformance. If NoInitialSlash is
true, then the value of path in the schemes http, https, or ftp cannot
distinguish between URIs in which the full "RFC 3986 path" is the empty
string "" or a single slash "/" respectively. The missing information is
recorded in an additional uri::split key pbare.

The boolean pbare is defined when quirk options NoInitialSlash and
NoExtraKeys have values true and false respectively. In this case, if
the value of path is the empty string "", pbare is true if the full
"RFC 3986 path" is "", and pbare is false if the full "RFC 3986 path" is
"/".

Using the quirk option NoInitialSlash is a matter of preference.

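The role of pbare can be sketched as follows. The sketch assumes uri
1.2.7 or later and sets both quirk options explicitly to the values
under which pbare is defined. The URLs are invented, and the list name
seen is arbitrary.

```tcl
package require uri 1.2.7

# pbare is defined when NoInitialSlash is true and NoExtraKeys is false.
uri::setQuirkOption NoInitialSlash 1
uri::setQuirkOption NoExtraKeys 0

set seen {}
foreach url {http://www.example.com http://www.example.com/} {
    array unset parts
    array set parts [uri::split $url]
    # In both cases path is the empty string; only pbare differs.
    if {$parts(pbare)} {
        puts "$url: the RFC 3986 path is the empty string"
    } else {
        puts "$url: the RFC 3986 path is \"/\""
    }
    lappend seen [expr {!!$parts(pbare)}] $parts(path)
}
```
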
NoExtraKeys
This quirk option permits full backward compatibility with versions of
uri before 1.2.7, by omitting the uri::split key pbare described above
(see quirk option NoInitialSlash). The outcome is greater backward
compatibility of the uri::split command, but an inability to distinguish
between URIs in which the full "RFC 3986 path" is the empty string "" or
a single slash "/" respectively - i.e. a minor non-conformance with RFC
3986.

If the quirk option NoExtraKeys is false (the default), command
uri::split returns an additional key pbare, and the commands comply with
RFC 3986. If the quirk option NoExtraKeys is true, the key pbare is not
defined and there is not full conformance with RFC 3986.

Using the quirk option NoExtraKeys is NOT recommended, because if set to
true it will reduce conformance with RFC 3986. The option is included
only for compatibility with code, written for earlier versions of uri,
that needs values of path without a leading "/", AND ALSO cannot
tolerate unexpected keys in the results of uri::split.

HostAsDriveLetter
When handling the scheme file on the Windows platform, versions of uri
before 1.2.7 use the host field to represent a Windows drive letter and
the colon that follows it, and the path field to represent the filename
path after the colon. Such URIs are invalid, and are not recognized by
any RFC. When the quirk option HostAsDriveLetter is true, this behavior
is also used in version 1.2.7. To use file URIs on Windows that conform
to RFC 3986, set this quirk option to false (the default).

Using this quirk is NOT recommended, because if set to true it will
cause the uri commands to expect and produce invalid URIs. The option is
included only for compatibility with legacy code.

RemoveDoubleSlashes
When a URI is canonicalized by uri::canonicalize, its path is normalized
by removal of segments "." and "..". RFC 3986 does not mandate the
removal of empty segments "" (i.e. the merger of double slashes, which
is a feature of filename normalization but not of URI path
normalization): it treats URIs with excess slashes as referring to
different resources. When the quirk option RemoveDoubleSlashes is true
(the default), empty segments will be removed from path. To prevent
removal, and thereby conform to RFC 3986, set this quirk option to
false.

Using this quirk is a matter of preference. A URI with double slashes in
its path was most likely generated by error, certainly so if it has a
straightforward mapping to a file on a server. In some cases it may be
better to sanitize the URI; in others, to keep the URI and let the
server handle the possible error.

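The effect of this option can be sketched by canonicalizing the same
URI under both settings. The sketch assumes uri 1.2.7 or later; the URI
is invented. The outputs are not shown, but following the description
above the first result should have the double slash merged and the
second should keep it.

```tcl
package require uri 1.2.7

set url http://www.example.com/a//b/index.html

# Quirk enabled (the default): empty path segments are removed.
uri::setQuirkOption RemoveDoubleSlashes 1
set merged [uri::canonicalize $url]
puts $merged

# Quirk disabled: empty segments are kept, as RFC 3986 specifies.
uri::setQuirkOption RemoveDoubleSlashes 0
set kept [uri::canonicalize $url]
puts $kept
```
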
BACKWARD COMPATIBILITY
To behave as similarly as possible to versions of uri earlier than
1.2.7, set the following quirk options:

·  uri::setQuirkOption NoInitialSlash 1

·  uri::setQuirkOption NoExtraKeys 1

·  uri::setQuirkOption HostAsDriveLetter 1

·  uri::setQuirkOption RemoveDoubleSlashes 0

In code that can tolerate the return by uri::split of an additional key
pbare, set

·  uri::setQuirkOption NoExtraKeys 0

in order to achieve greater compliance with RFC 3986.

NEW DESIGNS
For new projects, the following settings are recommended:

·  uri::setQuirkOption NoInitialSlash 0

·  uri::setQuirkOption NoExtraKeys 0

·  uri::setQuirkOption HostAsDriveLetter 0

·  uri::setQuirkOption RemoveDoubleSlashes 0|1

DEFAULT VALUES
The default values for package uri version 1.2.7 are intended to be a
compromise between backwards compatibility and improved features.
Different default values may be chosen in future versions of package
uri.

·  uri::setQuirkOption NoInitialSlash 1

·  uri::setQuirkOption NoExtraKeys 0

·  uri::setQuirkOption HostAsDriveLetter 0

·  uri::setQuirkOption RemoveDoubleSlashes 1

EXAMPLES
A Windows® local filename such as "C:\Other Files\startup.txt" is not
suitable for use as the path element of a URI in the scheme file.

The Tcl command file normalize will convert the backslashes to forward
slashes. To generate a valid path for the scheme file, the normalized
filename must be prepended with "/", and then any characters that do not
match the regexp bracket expression

    [a-zA-Z0-9$_.+!*'(,)?:@&=-]

must be percent-encoded.

The result in this example is "/C:/Other%20Files/startup.txt" which is a
valid value for path.

    % uri::join path /C:/Other%20Files/startup.txt scheme file
    file:///C:/Other%20Files/startup.txt

    % uri::split file:///C:/Other%20Files/startup.txt
    path /C:/Other%20Files/startup.txt scheme file

On UNIX® systems filenames begin with "/" which is also used as the
directory separator. The only action needed to convert a filename to a
valid path is percent-encoding.

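The conversion described above can be sketched as a small helper. The
procedure fileToUriPath is hypothetical and not part of the package. It
assumes that its argument already uses forward slashes (for example,
the output of file normalize), that "/" itself is kept as the segment
separator, and that other characters are encoded as their UTF-8 bytes.

```tcl
# Hypothetical helper, not part of package uri.
# Converts a filename that uses forward slashes into a URI path.
proc fileToUriPath {name} {
    # Windows names such as C:/dir/file need a leading "/".
    if {[string index $name 0] ne "/"} {
        set name /$name
    }
    set segments {}
    foreach segment [split $name /] {
        set encoded ""
        foreach ch [split $segment ""] {
            if {[regexp -- {^[a-zA-Z0-9$_.+!*'(,)?:@&=-]$} $ch]} {
                # Character is allowed as it stands.
                append encoded $ch
            } else {
                # Percent-encode each UTF-8 byte of the character.
                foreach byte [split [encoding convertto utf-8 $ch] ""] {
                    append encoded [format %%%02X [scan $byte %c]]
                }
            }
        }
        lappend segments $encoded
    }
    return [join $segments /]
}

puts [fileToUriPath "C:/Other Files/startup.txt"]
# prints /C:/Other%20Files/startup.txt
```

The result can be passed to uri::join as the value of path. On UNIX
systems the same helper applies, since the filename already begins with
"/" and only the percent-encoding step has any effect.
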
CREDITS
Original code (regular expressions) by Andreas Kupries. Modularisation
by Steve Ball, also the split/join/resolve functionality. RFC 3986
conformance by Keith Nash.

BUGS, IDEAS, FEEDBACK
This document, and the package it describes, will undoubtedly contain
bugs and other problems. Please report such in the category uri of the
Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
report any ideas for enhancements you may have for either package and/or
documentation.

When proposing code changes, please provide unified diffs, i.e. the
output of diff -u.

Note further that attachments are strongly preferred over inlined
patches. Attachments can be made by going to the Edit form of the ticket
immediately after its creation, and then using the left-most button in
the secondary navigation bar.

KEYWORDS
fetching information, file, ftp, gopher, http, https, ldap, mailto,
news, prospero, rfc 1630, rfc 2255, rfc 2396, rfc 3986, uri, url, wais,
www

CATEGORY
Networking

tcllib                               1.2.7                              uri(n)