uri(n)            Tcl Uniform Resource Identifier Management            uri(n)

______________________________________________________________________________

NAME

       uri - URI utilities

SYNOPSIS

       package require Tcl 8.2

       package require uri ?1.2.7?

       uri::setQuirkOption option ?value?

       uri::split url ?defaultscheme?

       uri::join ?key value?...

       uri::resolve base url

       uri::isrelative url

       uri::geturl url ?options...?

       uri::canonicalize uri

       uri::register schemeList script

______________________________________________________________________________

DESCRIPTION

       This package does two things.

       First, it provides a number of commands for manipulating URLs/URIs and
       fetching the data they specify. To fetch data, the package analyses
       the requested URL/URI and then dispatches it to the appropriate
       package (http, ftp, ...) for the actual retrieval. Currently these
       commands are defined for the schemes http, https, ftp, mailto, news,
       ldap, ldaps and file. The package uri::urn adds the scheme urn.

       Second, it provides regular expressions for a number of registered
       URL/URI schemes. The registered schemes are currently ftp, ldap,
       ldaps, file, http, https, gopher, mailto, news, wais and prospero. The
       package uri::urn adds the scheme urn.

       The commands of the package conform to RFC 3986
       (https://www.rfc-editor.org/rfc/rfc3986.txt), with the exception of a
       loophole arising from RFC 1630 and described in RFC 3986 Sections
       5.2.2 and 5.4.2. The loophole allows a relative URI to include a
       scheme if it is the same as the scheme of the base URI against which
       it is resolved. RFC 3986 recommends avoiding this usage.

COMMANDS

       uri::setQuirkOption option ?value?
              uri::setQuirkOption is an accessor command for a number of
              "quirk options". The command has the same semantics as the
              command set: when called with one argument it reads an existing
              value; with two arguments it writes a new value. The value of a
              "quirk option" is boolean: the value false requests conformance
              with RFC 3986, while true requests use of the quirk. See
              section QUIRK OPTIONS for a discussion of the different options
              and their purpose.
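
              The read and write forms described above can be sketched as
              follows. This is an illustration only; it assumes version
              1.2.7 of the package is installed and uses the option
              NoInitialSlash as the example.

```tcl
package require uri 1.2.7

# One argument: read the current value of the quirk option.
set old [uri::setQuirkOption NoInitialSlash]

# Two arguments: write a new value (0 requests RFC 3986 behavior).
uri::setQuirkOption NoInitialSlash 0
set strict [uri::setQuirkOption NoInitialSlash]

# Restore the previous value so that other code is not affected.
uri::setQuirkOption NoInitialSlash $old
```

              Saving and restoring the old value is a design choice of this
              sketch: quirk options are global to the package, so a change
              affects every caller in the interpreter.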

       uri::split url ?defaultscheme?
              uri::split takes a url, decodes it and then returns a list of
              key/value pairs, suitable for array set, containing the
              constituents of the url. If the scheme is missing from the url
              it defaults to the value of defaultscheme if that was
              specified, or to http otherwise. Currently the schemes http,
              https, ftp, mailto, news, ldap, ldaps and file are supported by
              the package itself. See section EXTENDING on how to expand that
              range.

              The set of constituents of a URL (= the set of keys in the
              returned dictionary) depends on the scheme of the URL. The only
              key which is always present is therefore scheme. For the
              following schemes the constituents and their keys are known:

              ftp    user, pwd, host, port, path, type, pbare. The pbare is
                     optional.

              http(s)
                     user, pwd, host, port, path, query, fragment, pbare. The
                     pbare is optional.

              file   path, host. The host is optional.

              mailto user, host. The host is optional.

              ldap(s)
                     host, port, dn, attrs, scope, filter, extensions

              news   Either message-id or newsgroup-name.

              For a discussion of the boolean pbare see options
              NoInitialSlash and NoExtraKeys in QUIRK OPTIONS.

              The constituents are returned as slices of the argument url,
              without removal of percent-encoding ("url-encoding") or other
              adaptations. Notably, on Windows® the path in scheme file is
              not a valid local filename. See EXAMPLES for more information.
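
              A short sketch of splitting an http URL into an array. The
              host www.example.com is a placeholder. The value of path is
              not shown as fixed output because it depends on the quirk
              option NoInitialSlash.

```tcl
package require uri 1.2.7

# The result is a flat key/value list, so it can be fed to array set.
array set parts [uri::split {http://www.example.com/docs/index.html?lang=en#top}]

# Print every constituent that was returned for this scheme.
foreach key [lsort [array names parts]] {
    puts [format "%-10s %s" $key $parts($key)]
}
```

              Iterating over the returned keys, rather than assuming a fixed
              set, keeps the code working for schemes whose constituents
              differ.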

       uri::join ?key value?...
              uri::join takes a list of key/value pairs (generated by
              uri::split, for example) and returns the canonical URL they
              represent. Currently the schemes http, https, ftp, mailto,
              news, ldap, ldaps and file are supported by the package itself.
              See section EXTENDING on how to expand that range.

              The arguments are expected to be slices of a valid URL, with
              percent-encoding ("url-encoding") and any other necessary
              adaptations. Notably, on Windows® the path in scheme file is
              not a valid local filename. See EXAMPLES for more information.
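
              As an illustration, the following sketch splits a URL, changes
              one constituent and joins the parts again. The URL is a
              placeholder, and the expected result assumes the default quirk
              settings of version 1.2.7.

```tcl
package require uri 1.2.7

array set parts [uri::split {http://www.example.com/docs/index.html?lang=en}]

# Change a single constituent; all other slices are passed back untouched.
set parts(query) lang=de

# uri::join takes the pairs as separate arguments, hence eval/linsert
# (this form also works on Tcl versions without the {*} syntax).
set url [eval [linsert [array get parts] 0 uri::join]]
puts $url
```

              Feeding the output of uri::split back into uri::join avoids
              having to know whether path carries a leading "/" under the
              current quirk settings.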

       uri::resolve base url
              uri::resolve resolves the specified url relative to base, in
              conformance with RFC 3986. In other words: a non-relative url
              is returned unchanged, whereas for a relative url the missing
              parts are taken from base and prepended to it. The result of
              this operation is returned. For an empty url the result is
              base, without its URI fragment (if any). The command is
              available for the schemes http, https, ftp, and file.
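
              The cases described above can be sketched as follows. The
              hosts are placeholders, and the results in the comments are
              those expected from the reference resolution rules of RFC
              3986.

```tcl
package require uri 1.2.7

set base http://www.example.com/a/b/c.html

# Relative references: the missing parts are taken from base.
set sibling [uri::resolve $base d.html]      ;# .../a/b/d.html
set parent  [uri::resolve $base ../d.html]   ;# .../a/d.html

# A non-relative url is returned unchanged.
set other [uri::resolve $base http://other.example.org/x]

puts $sibling
puts $parent
puts $other
```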

       uri::isrelative url
              uri::isrelative determines whether the specified url is
              absolute or relative. The command is available for a url of any
              scheme.
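
              A minimal usage sketch, with placeholder URLs. The result is
              used as a boolean: true for a relative url, false for an
              absolute one.

```tcl
package require uri 1.2.7

set rel [uri::isrelative ../images/logo.png]
set abs [uri::isrelative http://www.example.com/images/logo.png]

foreach {label flag} [list relative $rel absolute $abs] {
    puts "$label reference -> isrelative returns $flag"
}
```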

       uri::geturl url ?options...?
              uri::geturl decodes the specified url and then dispatches the
              request to the package appropriate for the scheme found in the
              URL. The command assumes that the package to handle the given
              scheme either has the same name as the scheme itself (including
              possible capitalization) followed by ::geturl, or, if loading
              that package fails, has the same name as the scheme itself
              (including possible capitalization). It further assumes that
              whatever package was loaded provides a geturl command in the
              namespace of the same name as the package itself. This command
              is called with the given url and all given options. Currently
              uri::geturl does not handle any options itself.

              Note: file URLs are an exception to the rule described above.
              They are handled internally.

              The results of the command cannot be specified here. They
              depend on the geturl command of the package the request was
              dispatched to.
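
              For the scheme http the request ends up in the geturl command
              of the standard http package, so the options and the returned
              token are those of that package. The helper fetchBody below is
              hypothetical and not part of package uri; it is only a sketch
              of how such a dispatch might be wrapped, and it needs network
              access when called.

```tcl
package require uri 1.2.7

# fetchBody is a hypothetical helper, not part of package uri.
proc fetchBody {url} {
    # -timeout is an option of http::geturl; uri::geturl passes it through.
    set token [uri::geturl $url -timeout 5000]

    # The token is an http token, so the http accessors apply.
    set status [http::status $token]
    set body   [http::data $token]
    http::cleanup $token

    if {![string equal $status ok]} {
        error "fetch of $url failed: $status"
    }
    return $body
}
```

              A call such as fetchBody http://www.example.com/ would then
              return the body of the response. For other schemes the result
              type differs, which is why the helper is limited to http here.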

       uri::canonicalize uri
              uri::canonicalize returns the canonical form of a URI. The
              canonical form of a URI is one where relative path
              specifications, i.e. "." and "..", have been resolved. The
              command is available for all URI schemes that have uri::split
              and uri::join commands. The command returns a canonicalized URI
              if the URI scheme has a path component (i.e. http, https, ftp,
              and file). For schemes that have uri::split and uri::join
              commands but no path component (i.e. mailto, news, ldap, and
              ldaps), the command returns the uri unchanged.
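
              The two cases can be sketched as follows, using placeholder
              addresses: a scheme with a path component, and one without.

```tcl
package require uri 1.2.7

# http has a path component: "." and ".." segments are resolved.
set web [uri::canonicalize http://www.example.com/a/b/../c/./d.html]

# mailto has no path component: the uri comes back unchanged.
set mail [uri::canonicalize mailto:user@example.com]

puts $web
puts $mail
```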

       uri::register schemeList script
              uri::register registers the first element of schemeList as a
              new scheme and the remaining elements as aliases for this
              scheme. It creates the namespace for the scheme and executes
              the script in the new namespace. The script has to declare
              variables containing regular expressions relevant to the
              scheme. At least the variable schemepart has to be declared, as
              that one is used to extend the variables keeping track of the
              registered schemes.
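
              A sketch of a registration. The scheme name sample, its alias
              smp and both patterns are invented for this illustration; the
              variable url follows the convention described in section
              SCHEMES.

```tcl
package require uri 1.2.7

# "sample" is a made-up scheme and "smp" a made-up alias for it.
uri::register {sample smp} {
    # Mandatory: pattern for everything that follows "<scheme>:".
    variable schemepart {//[a-z0-9.-]+(/[a-zA-Z0-9._~/-]*)?}

    # Conventional: pattern for a complete URL of this scheme.
    variable url "sample:$schemepart"
}

puts "registered schemes: [lsort $uri::schemes]"
```

              Registration only provides pattern information. It does not
              make uri::split or uri::join work for the new scheme; see
              section EXTENDING for that.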

SCHEMES

       In addition to the commands mentioned above, this package provides
       regular expressions to recognize URLs for a number of URL schemes.

       For each supported scheme, a namespace of the same name as the scheme
       itself is provided inside the namespace uri. It contains the variable
       url, whose contents are a regular expression to recognize URLs of that
       scheme. Additional variables may contain regular expressions for parts
       of URLs for that scheme.

       The variable uri::schemes contains a list of all registered schemes.
       Currently these are ftp, ldap, ldaps, file, http, https, gopher,
       mailto, news, wais and prospero.
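
       As an illustration, the pattern for the scheme http can be used to
       find a URL inside a piece of text. The sample text is invented; the
       exact extent of the match depends on the pattern shipped with the
       package, so no fixed output is shown.

```tcl
package require uri 1.2.7

set text {See http://www.example.com/index.html for details.}

# uri::http::url holds the regular expression for complete http URLs.
set found [regexp -- $uri::http::url $text match]
if {$found} {
    puts "found: $match"
}

# The list of schemes for which such patterns exist.
puts "schemes: [lsort $uri::schemes]"
```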

EXTENDING

       Extending the range of schemes supported by uri::split and uri::join
       is easy, because neither command handles the request itself. Each
       dispatches it to another command in the uri namespace, using the
       scheme of the URL as the criterion.

       uri::split and uri::join call Split[string totitle <scheme>] and
       Join[string totitle <scheme>] respectively.

       The provision of split and join commands is sufficient to extend the
       commands uri::canonicalize and uri::geturl (the latter subject to the
       availability of a suitable package with a geturl command). In
       contrast, to extend the command uri::resolve to a new scheme, the
       command itself must be modified.

       To extend the range of schemes for which pattern information is
       available, use the command uri::register.

       An example of a package that provides both commands and pattern
       information for a new scheme is uri::urn, which adds the scheme urn.
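
       The dispatch described above can be sketched with a made-up scheme
       demo of the form demo:<body>. The scheme, the key body and both
       commands are assumptions of this illustration, not part of the
       package.

```tcl
package require uri 1.2.7

# [string totitle demo] is "Demo", hence SplitDemo and JoinDemo.
proc ::uri::SplitDemo {url} {
    # Drop the scheme prefix in case the caller has not already done so.
    regsub -- {^demo:} $url {} url
    return [list body $url]
}

proc ::uri::JoinDemo {args} {
    # Default for a missing key, then overlay the supplied pairs.
    array set parts {body {}}
    array set parts $args
    return "demo:$parts(body)"
}

# Both public commands now accept the new scheme.
array set parts [uri::split demo:hello-world]
set rebuilt [eval [linsert [array get parts] 0 uri::join]]
puts "$parts(scheme) / $parts(body) / $rebuilt"
```

       The split command returns only the scheme-specific keys; the key
       scheme is added by uri::split itself.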

QUIRK OPTIONS

       The value of a "quirk option" is boolean: the value false requests
       conformance with RFC 3986, while true requests use of the quirk. Use
       the command uri::setQuirkOption to access the values of quirk options.

       Quirk options are useful both for allowing backwards compatibility
       when a command specification changes, and for adding useful features
       that are not included in RFC specifications. The following quirk
       options are currently defined:

       NoInitialSlash
              This quirk option concerns the leading character of path (if
              non-empty) in the schemes http, https, and ftp.

              RFC 3986 defines path in an absolute URI to have an initial
              "/", unless the value of path is the empty string. For the
              scheme file, all versions of package uri follow this rule. The
              quirk option NoInitialSlash does not apply to scheme file.

              For the schemes http, https, and ftp, versions of uri before
              1.2.7 define the path NOT to include an initial "/". When the
              quirk option NoInitialSlash is true (the default), this
              behavior is also used in version 1.2.7. To use instead values
              of path as defined by RFC 3986, set this quirk option to false.

              This setting does not affect RFC 3986 conformance. If
              NoInitialSlash is true, then the value of path in the schemes
              http, https, or ftp cannot distinguish between URIs in which
              the full "RFC 3986 path" is the empty string "" or a single
              slash "/" respectively. The missing information is recorded in
              an additional uri::split key pbare.

              The boolean pbare is defined when the quirk options
              NoInitialSlash and NoExtraKeys have the values true and false
              respectively. In this case, if the value of path is the empty
              string "", pbare is true if the full "RFC 3986 path" is "", and
              pbare is false if the full "RFC 3986 path" is "/".

              Using the quirk option NoInitialSlash is a matter of
              preference.
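
              The effect on the key path can be sketched as follows. The
              URL is a placeholder, and the comments give the values
              expected from the description above.

```tcl
package require uri 1.2.7

set old [uri::setQuirkOption NoInitialSlash]
set url http://www.example.com/a/b

# Quirk in use: path is reported without its initial "/".
uri::setQuirkOption NoInitialSlash 1
array set quirk [uri::split $url]

# RFC 3986 behavior: path keeps its initial "/".
uri::setQuirkOption NoInitialSlash 0
array set strict [uri::split $url]

# Restore the previous setting.
uri::setQuirkOption NoInitialSlash $old

puts "quirk path:  $quirk(path)"    ;# a/b
puts "strict path: $strict(path)"   ;# /a/b

# pbare is only present for certain quirk settings, so test before use.
if {[info exists quirk(pbare)]} {
    puts "pbare: $quirk(pbare)"
}
```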

       NoExtraKeys
              This quirk option permits full backward compatibility with
              versions of uri before 1.2.7, by omitting the uri::split key
              pbare described above (see quirk option NoInitialSlash). The
              outcome is greater backward compatibility of the uri::split
              command, but an inability to distinguish between URIs in which
              the full "RFC 3986 path" is the empty string "" or a single
              slash "/" respectively, i.e. a minor non-conformance with RFC
              3986.

              If the quirk option NoExtraKeys is false (the default), the
              command uri::split returns an additional key pbare, and the
              commands comply with RFC 3986. If the quirk option NoExtraKeys
              is true, the key pbare is not defined and there is not full
              conformance with RFC 3986.

              Using the quirk option NoExtraKeys is NOT recommended, because
              if set to true it will reduce conformance with RFC 3986. The
              option is included only for compatibility with code, written
              for earlier versions of uri, that needs values of path without
              a leading "/", AND ALSO cannot tolerate unexpected keys in the
              results of uri::split.

       HostAsDriveLetter
              When handling the scheme file on the Windows platform, versions
              of uri before 1.2.7 use the host field to represent a Windows
              drive letter and the colon that follows it, and the path field
              to represent the filename path after the colon. Such URIs are
              invalid, and are not recognized by any RFC. When the quirk
              option HostAsDriveLetter is true, this behavior is also used in
              version 1.2.7. To use file URIs on Windows that conform to RFC
              3986, set this quirk option to false (the default).

              Using this quirk is NOT recommended, because if set to true it
              will cause the uri commands to expect and produce invalid URIs.
              The option is included only for compatibility with legacy code.

       RemoveDoubleSlashes
              When a URI is canonicalized by uri::canonicalize, its path is
              normalized by removal of the segments "." and "..". RFC 3986
              does not mandate the removal of empty segments "" (i.e. the
              merger of double slashes, which is a feature of filename
              normalization but not of URI path normalization): it treats
              URIs with excess slashes as referring to different resources.
              When the quirk option RemoveDoubleSlashes is true (the
              default), empty segments will be removed from path. To prevent
              removal, and thereby conform to RFC 3986, set this quirk option
              to false.

              Using this quirk is a matter of preference. A URI with double
              slashes in its path was most likely generated by error,
              certainly so if it has a straightforward mapping to a file on a
              server. In some cases it may be better to sanitize the URI; in
              others, to keep the URI and let the server handle the possible
              error.
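
              A sketch of both settings, using a placeholder URL. The
              comments give the results expected from the description above
              when all other quirk options have their default values.

```tcl
package require uri 1.2.7

set old [uri::setQuirkOption RemoveDoubleSlashes]
set url http://www.example.com/a//b/c.html

# Quirk in use: the empty segment is removed.
uri::setQuirkOption RemoveDoubleSlashes 1
set merged [uri::canonicalize $url]   ;# http://www.example.com/a/b/c.html

# RFC 3986 behavior: the empty segment is kept.
uri::setQuirkOption RemoveDoubleSlashes 0
set kept [uri::canonicalize $url]     ;# http://www.example.com/a//b/c.html

# Restore the previous setting.
uri::setQuirkOption RemoveDoubleSlashes $old

puts $merged
puts $kept
```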

   BACKWARD COMPATIBILITY
       To behave as similarly as possible to versions of uri earlier than
       1.2.7, set the following quirk options:

              uri::setQuirkOption NoInitialSlash 1
              uri::setQuirkOption NoExtraKeys 1
              uri::setQuirkOption HostAsDriveLetter 1
              uri::setQuirkOption RemoveDoubleSlashes 0

       In code that can tolerate the return by uri::split of an additional
       key pbare, set

              uri::setQuirkOption NoExtraKeys 0

       in order to achieve greater compliance with RFC 3986.

   NEW DESIGNS
       For new projects, the following settings are recommended:

              uri::setQuirkOption NoInitialSlash 0
              uri::setQuirkOption NoExtraKeys 0
              uri::setQuirkOption HostAsDriveLetter 0
              uri::setQuirkOption RemoveDoubleSlashes 0|1

   DEFAULT VALUES
       The default values for package uri version 1.2.7 are intended to be a
       compromise between backwards compatibility and improved features.
       Different default values may be chosen in future versions of package
       uri.

              uri::setQuirkOption NoInitialSlash 1
              uri::setQuirkOption NoExtraKeys 0
              uri::setQuirkOption HostAsDriveLetter 0
              uri::setQuirkOption RemoveDoubleSlashes 1

EXAMPLES

       A Windows® local filename such as "C:\Other Files\startup.txt" is not
       suitable for use as the path element of a URI in the scheme file.

       The Tcl command file normalize will convert the backslashes to forward
       slashes. To generate a valid path for the scheme file, the normalized
       filename must be prefixed with "/", and then any characters, other
       than the "/" separators, that do not match the regexp bracket
       expression

                  [a-zA-Z0-9$_.+!*'(,)?:@&=-]

       must be percent-encoded.

       The result in this example is "/C:/Other%20Files/startup.txt", which
       is a valid value for path.

              % uri::join path /C:/Other%20Files/startup.txt scheme file
              file:///C:/Other%20Files/startup.txt

              % uri::split file:///C:/Other%20Files/startup.txt
              path /C:/Other%20Files/startup.txt scheme file

       On UNIX® systems filenames begin with "/", which is also used as the
       directory separator. The only action needed to convert a filename to
       a valid path is percent-encoding.
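
       The conversion described above can be sketched as a small helper.
       The command fileNameToUriPath is hypothetical and not part of package
       uri. It assumes its argument has already been passed through file
       normalize, and it encodes characters as UTF-8 bytes, which is an
       assumption of this sketch rather than something the package
       prescribes.

```tcl
# fileNameToUriPath is a hypothetical helper, not part of package uri.
proc fileNameToUriPath {name} {
    set segments {}
    # Encode each segment separately so the "/" separators survive.
    foreach seg [split $name /] {
        set out ""
        foreach byte [split [encoding convertto utf-8 $seg] ""] {
            if {[regexp {^[a-zA-Z0-9$_.+!*'(,)?:@&=-]$} $byte]} {
                # Allowed character: copy as is.
                append out $byte
            } else {
                # Anything else: percent-encode the byte value.
                scan $byte %c code
                append out [format %%%02X $code]
            }
        }
        lappend segments $out
    }
    set path [join $segments /]
    # Windows names such as C:/... lack the leading "/"; add it.
    if {![string match /* $path]} {
        set path /$path
    }
    return $path
}

puts [fileNameToUriPath "C:/Other Files/startup.txt"]
puts [fileNameToUriPath "/home/user/my file.txt"]
```

       The result can be passed to uri::join as the value of path together
       with scheme file.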

CREDITS

       Original code (regular expressions) by Andreas Kupries. Modularisation
       by Steve Ball, also the split/join/resolve functionality. RFC 3986
       conformance by Keith Nash.

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will undoubtedly contain
       bugs and other problems. Please report such in the category uri of the
       Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
       report any ideas for enhancements you may have for either package
       and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches. Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS

       fetching information, file, ftp, gopher, http, https, ldap, mailto,
       news, prospero, rfc 1630, rfc 2255, rfc 2396, rfc 3986, uri, url,
       wais, www

CATEGORY

       Networking

tcllib                               1.2.7                              uri(n)