url(7) - f39

uri(7)                 Miscellaneous Information Manual                 uri(7)

NAME

       uri, url, urn - uniform resource identifier (URI), including a URL or
       URN

SYNOPSIS

       URI = [ absoluteURI | relativeURI ] [ "#" fragment ]

       absoluteURI = scheme ":" ( hierarchical_part | opaque_part )

       relativeURI = ( net_path | absolute_path | relative_path )
                     [ "?" query ]

       scheme = "http" | "ftp" | "gopher" | "mailto" | "news" | "telnet" |
                "file" | "man" | "info" | "whatis" | "ldap" | "wais" |
                ...

       hierarchical_part = ( net_path | absolute_path ) [ "?" query ]

       net_path = "//" authority [ absolute_path ]

       absolute_path = "/" path_segments

       relative_path = relative_segment [ absolute_path ]

DESCRIPTION

       A Uniform Resource Identifier (URI) is a short string of characters
       identifying an abstract or physical resource (for example, a web page).
       A Uniform Resource Locator (URL) is a URI that identifies a resource
       through its primary access mechanism (e.g., its network "location"),
       rather than by name or some other attribute of that resource.  A
       Uniform Resource Name (URN) is a URI that must remain globally unique
       and persistent even when the resource ceases to exist or becomes
       unavailable.

       URIs are the standard way to name hypertext link destinations for tools
       such as web browsers.  The string "http://www.kernel.org" is a URL (and
       thus it is also a URI).  Many people use the term URL loosely as a
       synonym for URI (though technically URLs are a subset of URIs).

       URIs can be absolute or relative.  An absolute identifier refers to a
       resource independent of context, while a relative identifier refers to
       a resource by describing the difference from the current context.
       Within a relative path reference, the complete path segments "." and
       ".." have special meanings: "the current hierarchy level" and "the
       level above this hierarchy level", respectively, just like they do in
       UNIX-like systems.  A path segment which contains a colon character
       can't be used as the first segment of a relative URI path (e.g.,
       "this:that"), because it would be mistaken for a scheme name; precede
       such segments with ./ (e.g., "./this:that").  Note that descendants of
       MS-DOS (e.g., Microsoft Windows) replace devicename colons with the
       vertical bar ("|") in URIs, so "C:" becomes "C|".
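
       The resolution of relative references can be tried out with the
       urllib.parse module from the Python standard library.  This is only a
       minimal sketch; the base URI and the file names are made-up examples.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical base URI, standing in for "the current context".
base = "http://example.com/a/b/c.html"

same_level = urljoin(base, "./d.html")     # "." is the current level
level_above = urljoin(base, "../d.html")   # ".." is the level above
print(same_level)
print(level_above)

# A first segment containing a colon is taken for a scheme name ...
mistaken_scheme = urlsplit("this:that").scheme
print(mistaken_scheme)

# ... unless the segment is preceded by "./".
colon_segment = urljoin(base, "./this:that")
print(colon_segment)
```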

       A fragment identifier, if included, refers to a particular named
       portion (fragment) of a resource; text after a '#' identifies the
       fragment.  A URI beginning with '#' refers to that fragment in the
       current resource.
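
       As an illustration, the following Python sketch splits a URI into the
       components named in the SYNOPSIS and separates the fragment from the
       rest.  The URI itself is a made-up example, and urllib.parse is just
       one of many libraries that can do this.

```python
from urllib.parse import urldefrag, urlsplit

# Hypothetical URI used only for illustration.
uri = "http://www.kernel.org/doc/index.html?lang=en#top"

parts = urlsplit(uri)
print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)

# Separate the fragment from the rest of the URI.
without_fragment, fragment = urldefrag(uri)
print(without_fragment, fragment)

# A reference that starts with '#' carries nothing but a fragment.
same_document = urlsplit("#top")
print(same_document.fragment)
```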

   Usage
       There are many different URI schemes, each with specific additional
       rules and meanings, but they are intentionally made to be as similar as
       possible.  For example, many URL schemes permit the authority to take
       the following format, called here an ip_server (square brackets show
       what's optional):

       ip_server = [user [ : password ] @ ] host [ : port]

       This format allows you to optionally insert a username, a user plus
       password, and/or a port number.  The host is the name of the host
       computer, either its name as determined by DNS or an IP address
       (numbers separated by periods).  Thus the URI
       <http://fred:fredpassword@example.com:8080/> logs into a web server on
       host example.com as fred (using fredpassword) using port 8080.  Avoid
       including a password in a URI if possible because of the many security
       risks of having a password written down.  If the URL supplies a
       username but no password, and the remote server requests a password,
       the program interpreting the URL should request one from the user.
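
       The ip_server parts can be picked apart with Python's urllib.parse, as
       in this small sketch.  The variable names are illustrative only.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://fred:fredpassword@example.com:8080/")
print(parts.username)   # user
print(parts.password)   # password (avoid putting real ones in a URI)
print(parts.hostname)   # host
print(parts.port)       # port, as an integer

# When the URI gives no port, the attribute is None and the scheme's
# default port (80 for http) applies.
no_port = urlsplit("http://example.com/")
print(no_port.port)
```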

       Here are some of the most common schemes in use on UNIX-like systems
       that are understood by many tools.  Note that many tools using URIs
       also have internal schemes or specialized schemes; see those tools'
       documentation for information on those schemes.

       http - Web (HTTP) server

       http://ip_server/path
       http://ip_server/path?query

       This is a URL accessing a web (HTTP) server.  The default port is 80.
       If the path refers to a directory, the web server will choose what to
       return; usually if there is a file named "index.html" or "index.htm"
       its content is returned, otherwise, a list of the files in the current
       directory (with appropriate links) is generated and returned.  An
       example is <http://lwn.net>.

       A query can be given in the archaic "isindex" format, consisting of a
       word or phrase and not including an equal sign (=).  A query can also
       be in the longer "GET" format, which has one or more query entries of
       the form key=value separated by the ampersand character (&).  Note that
       a key can be repeated, though it's up to the web server and its
       application programs to determine if there's any meaning to that.
       There is an unfortunate interaction with HTML/XML/SGML and the GET
       query format; when such URIs with more than one key are embedded in
       SGML/XML documents (including HTML), the ampersand (&) has to be
       rewritten as &amp;.  Note that not all queries use this format; larger
       forms may be too long to store as a URI, so they use a different
       interaction mechanism (called POST) which does not include the data in
       the URI.  See the Common Gateway Interface specification at
       ⟨http://www.w3.org/CGI⟩ for more information.
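
       The following Python sketch shows a "GET" format query with a repeated
       key, and the rewriting of the ampersand that is needed before the URI
       is embedded in HTML.  The URL and the key names are invented for the
       example.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL with the key "key" given twice.
url = "http://example.com/search?key=a&key=b&lang=en"

# parse_qs collects the values of a repeated key into one list.
entries = parse_qs(urlsplit(url).query)
print(entries)

# Inside HTML (or other SGML/XML) the "&" must be written as "&amp;".
href = escape(url)
print('<A HREF="%s">search</A>' % href)
```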

       ftp - File Transfer Protocol (FTP)

       ftp://ip_server/path

       This is a URL accessing a file through the file transfer protocol
       (FTP).  The default port (for control) is 21.  If no username is
       included, the username "anonymous" is supplied, and in that case many
       clients provide as the password the requestor's Internet email address.
       An example is <ftp://ftp.is.co.za/rfc/rfc1808.txt>.

       gopher - Gopher server

       gopher://ip_server/gophertype selector
       gopher://ip_server/gophertype selector%09search
       gopher://ip_server/gophertype selector%09search%09gopher+_string

       The default gopher port is 70.  gophertype is a single-character field
       to denote the Gopher type of the resource to which the URL refers.  The
       entire path may also be empty, in which case the delimiting "/" is also
       optional and the gophertype defaults to "1".

       selector is the Gopher selector string.  In the Gopher protocol, Gopher
       selector strings are a sequence of octets which may contain any octets
       except 09 hexadecimal (US-ASCII HT or tab), 0A hexadecimal (US-ASCII
       character LF), and 0D hexadecimal (US-ASCII character CR).

       mailto - Email address

       mailto:email-address

       This is an email address, usually of the form name@hostname.  See
       mailaddr(7) for more information on the correct format of an email
       address.  Note that any % character must be rewritten as %25.  An
       example is <mailto:dwheeler@dwheeler.com>.

       news - Newsgroup or News message

       news:newsgroup-name
       news:message-id

       A newsgroup-name is a period-delimited hierarchical name, such as
       "comp.infosystems.www.misc".  If <newsgroup-name> is "*" (as in
       <news:*>), it is used to refer to "all available news groups".  An
       example is <news:comp.lang.ada>.

       A message-id corresponds to the Message-ID of IETF RFC 1036,
       ⟨http://www.ietf.org/rfc/rfc1036.txt⟩ without the enclosing "<" and
       ">"; it takes the form unique@full_domain_name.  A message identifier
       may be distinguished from a news group name by the presence of the "@"
       character.
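
       The "@" rule can be expressed in a few lines of Python.  The function
       name and the returned labels are hypothetical, chosen only for this
       sketch; the message-id in the usage line is invented.

```python
def classify_news_uri(uri):
    """Tell what a news: URI refers to, using the rules given above."""
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "news" or not rest:
        raise ValueError("not a news: URI: %r" % uri)
    if rest == "*":
        return "all-groups"
    # Only a message-id contains the "@" character.
    return "message-id" if "@" in rest else "newsgroup"


print(classify_news_uri("news:comp.lang.ada"))
print(classify_news_uri("news:unique@full.domain.example"))
```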

       telnet - Telnet login

       telnet://ip_server/

       The Telnet URL scheme is used to designate interactive text services
       that may be accessed by the Telnet protocol.  The final "/" character
       may be omitted.  The default port is 23.  An example is
       <telnet://melvyl.ucop.edu/>.

       file - Normal file

       file://ip_server/path_segments
       file:path_segments

       This represents a file or directory accessible locally.  As a special
       case, ip_server can be the string "localhost" or the empty string; this
       is interpreted as "the machine from which the URL is being
       interpreted".  If the path is to a directory, the viewer should display
       the directory's contents with links to each containee; not all viewers
       currently do this.  KDE supports generated files through the URL
       <file:/cgi-bin>.  If the given file isn't found, browser writers may
       want to try to expand the filename via filename globbing (see glob(7)
       and glob(3)).

       The second format (e.g., <file:/etc/passwd>) is a correct format for
       referring to a local file.  However, older standards did not permit
       this format, and some programs don't recognize this as a URI.  A more
       portable syntax is to use an empty string as the server name, for
       example, <file:///etc/passwd>; this form does the same thing and is
       easily recognized by pattern matchers and older programs as a URI.
       Note that if you really mean to say "start from the current location",
       don't specify the scheme at all; use a relative address like
       <../test.txt>, which has the side-effect of being scheme-independent.
       An example of this scheme is <file:///etc/passwd>.
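
       The equivalence of these forms can be checked with a short Python
       sketch.  The helper local_path() is hypothetical and does no more than
       the text above describes.

```python
from urllib.parse import unquote, urlsplit


def local_path(uri):
    """Return the path named by a file: URI that refers to this machine."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError("not a file: URI: %r" % uri)
    # An empty server name or "localhost" both mean "this machine".
    if parts.netloc.lower() not in ("", "localhost"):
        raise ValueError("file is on another host: %r" % parts.netloc)
    return unquote(parts.path)


for uri in ("file:/etc/passwd", "file:///etc/passwd",
            "file://localhost/etc/passwd"):
    print(uri, "->", local_path(uri))
```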

       man - Man page documentation

       man:command-name
       man:command-name(section)

       This refers to local online manual (man) reference pages.  The command
       name can optionally be followed by a parenthesis and section number;
       see man(7) for more information on the meaning of the section numbers.
       This URI scheme is unique to UNIX-like systems (such as Linux) and is
       not currently registered by the IETF.  An example is <man:ls(1)>.

       info - Info page documentation

       info:virtual-filename
       info:virtual-filename#nodename
       info:(virtual-filename)
       info:(virtual-filename)nodename

       This scheme refers to online info reference pages (generated from
       texinfo files), a documentation format used by programs such as the GNU
       tools.  This URI scheme is unique to UNIX-like systems (such as Linux)
       and is not currently registered by the IETF.  As of this writing, GNOME
       and KDE differ in their URI syntax and do not accept the other's
       syntax.  The first two formats are the GNOME format; in nodenames all
       spaces are written as underscores.  The second two formats are the KDE
       format; spaces in nodenames must be written as spaces, even though this
       is forbidden by the URI standards.  It's hoped that in the future most
       tools will understand all of these formats and will always accept
       underscores for spaces in nodenames.  In both GNOME and KDE, if the
       form without the nodename is used the nodename is assumed to be "Top".
       Examples of the GNOME format are <info:gcc> and <info:gcc#G++_and_GCC>.
       Examples of the KDE format are <info:(gcc)> and
       <info:(gcc)G++ and GCC>.

       whatis - Documentation search

       whatis:string

       This scheme searches the database of short (one-line) descriptions of
       commands and returns a list of descriptions containing that string.
       Only complete word matches are returned.  See whatis(1).  This URI
       scheme is unique to UNIX-like systems (such as Linux) and is not
       currently registered by the IETF.

       ghelp - GNOME help documentation

       ghelp:name-of-application

       This loads GNOME help for the given application.  Note that not much
       documentation currently exists in this format.

       ldap - Lightweight Directory Access Protocol

       ldap://hostport
       ldap://hostport/
       ldap://hostport/dn
       ldap://hostport/dn?attributes
       ldap://hostport/dn?attributes?scope
       ldap://hostport/dn?attributes?scope?filter
       ldap://hostport/dn?attributes?scope?filter?extensions

       This scheme supports queries to the Lightweight Directory Access
       Protocol (LDAP), a protocol for querying a set of servers for
       hierarchically organized information (such as people and computing
       resources).  See RFC 2255 ⟨http://www.ietf.org/rfc/rfc2255.txt⟩ for
       more information on the LDAP URL scheme.  The components of this URL
       are:

       hostport
              the LDAP server to query, written as a hostname optionally
              followed by a colon and the port number.  The default LDAP port
              is TCP port 389.  If empty, the client determines which LDAP
              server to use.

       dn     the LDAP Distinguished Name, which identifies the base object of
              the LDAP search (see RFC 2253
              ⟨http://www.ietf.org/rfc/rfc2253.txt⟩ section 3).

       attributes
              a comma-separated list of attributes to be returned; see
              RFC 2251 section 4.1.5.  If omitted, all attributes should be
              returned.

       scope  specifies the scope of the search, which can be one of "base"
              (for a base object search), "one" (for a one-level search), or
              "sub" (for a subtree search).  If scope is omitted, "base" is
              assumed.

       filter specifies the search filter (subset of entries to return).  If
              omitted, all entries should be returned.  See RFC 2254
              ⟨http://www.ietf.org/rfc/rfc2254.txt⟩ section 4.

       extensions
              a comma-separated list of type=value pairs, where the =value
              portion may be omitted for options not requiring it.  An
              extension prefixed with a '!' is critical (must be supported to
              be valid), otherwise it is noncritical (optional).

       LDAP queries are easiest to explain by example.  Here's a query that
       asks ldap.itd.umich.edu for information about the University of
       Michigan in the U.S.:

       ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US

       To just get its postal address attribute, request:

       ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US?postalAddress

       To ask host.com at port 6666 for information about the person with
       common name (cn) "Babs Jensen" at University of Michigan, request:

       ldap://host.com:6666/o=University%20of%20Michigan,c=US??sub?(cn=Babs%20Jensen)
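
       To make the positional "?" fields concrete, here is a rough Python
       sketch that splits an LDAP URL into the components listed above and
       fills in the documented defaults.  The function split_ldap_url() and
       its dictionary keys are hypothetical, and the sketch assumes that any
       "?" inside a component has been escaped.

```python
from urllib.parse import unquote, urlsplit


def split_ldap_url(url):
    """Split an ldap: URL into its components (illustrative sketch only)."""
    parts = urlsplit(url)
    # Everything after the first "?" is attributes?scope?filter?extensions.
    fields = parts.query.split("?") if parts.query else []
    fields += [""] * (4 - len(fields))   # pad omitted trailing components
    attributes, scope, search_filter, extensions = fields[:4]
    return {
        "host": parts.hostname,
        "port": parts.port or 389,                 # default LDAP port
        "dn": unquote(parts.path[1:]),
        "attributes": attributes.split(",") if attributes else [],
        "scope": scope or "base",                  # default scope
        "filter": unquote(search_filter),
        "extensions": extensions.split(",") if extensions else [],
    }


query = split_ldap_url(
    "ldap://host.com:6666/o=University%20of%20Michigan,c=US"
    "??sub?(cn=Babs%20Jensen)")
print(query)
```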

       wais - Wide Area Information Servers

       wais://hostport/database
       wais://hostport/database?search
       wais://hostport/database/wtype/wpath

       This scheme designates a WAIS database, search, or document (see IETF
       RFC 1625 ⟨http://www.ietf.org/rfc/rfc1625.txt⟩ for more information on
       WAIS).  hostport is the hostname, optionally followed by a colon and
       port number (the default port number is 210).

       The first form designates a WAIS database for searching.  The second
       form designates a particular search of the given WAIS database.  The
       third form designates a particular document within a WAIS database to
       be retrieved.  wtype is the WAIS designation of the type of the object
       and wpath is the WAIS document-id.

       other schemes

       There are many other URI schemes.  Most tools that accept URIs support
       a set of internal URIs (e.g., Mozilla has the about: scheme for
       internal information, and the GNOME help browser has the toc: scheme
       for various starting locations).  There are many schemes that have been
       defined but are not as widely used at the current time (e.g.,
       prospero).  The nntp: scheme is deprecated in favor of the news:
       scheme.  URNs are to be supported by the urn: scheme, with a
       hierarchical name space (e.g., urn:ietf:... would identify IETF
       documents); at this time URNs are not widely implemented.  Not all
       tools support all schemes.

   Character encoding
       URIs use a limited number of characters so that they can be typed in
       and used in a variety of situations.

       The following characters are reserved, that is, they may appear in a
       URI but their use is limited to their reserved purpose (conflicting
       data must be escaped before forming the URI):

                  ; / ? : @ & = + $ ,

       Unreserved characters may be included in a URI.  Unreserved characters
       include uppercase and lowercase Latin letters, decimal digits, and the
       following limited set of punctuation marks and symbols:

                  - _ . ! ~ * ' ( )

       All other characters must be escaped.  An escaped octet is encoded as a
       character triplet, consisting of the percent character "%" followed by
       the two hexadecimal digits representing the octet code (you can use
       uppercase or lowercase letters for the hexadecimal digits).  For
       example, a blank space must be escaped as "%20", a tab character as
       "%09", and the "&" as "%26".  Because the percent "%" character always
       has the reserved purpose of being the escape indicator, it must be
       escaped as "%25".  It is common practice to escape space characters as
       the plus symbol (+) in query text; this practice isn't uniformly
       defined in the relevant RFCs (which recommend %20 instead) but any tool
       accepting URIs with query text should be prepared for them.  A URI is
       always shown in its "escaped" form.
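
       The escapes mentioned above can be reproduced with Python's
       urllib.parse, as in the sketch below.  Note as an aside that this
       library follows a later revision of the URI rules, so its idea of which
       punctuation needs escaping is not identical to the lists given here.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Escaped octets: "%" followed by two hexadecimal digits.
space = quote(" ")
tab = quote("\t")
ampersand = quote("&")
percent = quote("%")
print(space, tab, ampersand, percent)

# Upper- and lowercase hexadecimal digits decode to the same octet.
tilde_lower = unquote("%7e")
tilde_upper = unquote("%7E")

# The "+" convention for spaces in query text.
plus_form = quote_plus("linux kernel")
decoded = unquote_plus(plus_form)
print(plus_form, "->", decoded)
```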

       Unreserved characters can be escaped without changing the semantics of
       the URI, but this should not be done unless the URI is being used in a
       context that does not allow the unescaped character to appear.  For
       example, "%7e" is sometimes used instead of "~" in an HTTP URL path,
       but the two are equivalent for an HTTP URL.

       For URIs which must handle characters outside the US ASCII character
       set, the HTML 4.01 specification (section B.2) and IETF RFC 3986 (last
       paragraph of section 2.5) recommend the following approach:

       (1)  translate the character sequences into UTF-8 (IETF RFC 3629; see
            utf-8(7)), and then

       (2)  use the URI escaping mechanism, that is, use the %HH encoding for
            unsafe octets.
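
       The two steps can be followed literally in Python.  The word used here
       is an arbitrary example.

```python
from urllib.parse import quote, unquote

text = "caf\u00e9"               # "cafe" with an acute accent on the "e"

octets = text.encode("utf-8")    # step (1): translate into UTF-8
escaped = quote(octets)          # step (2): %HH-escape the unsafe octets
print(escaped)

# Decoding reverses both steps (unquote assumes UTF-8 by default).
restored = unquote(escaped)
```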

   Writing a URI
       When written, URIs should be placed inside double quotes (e.g.,
       "http://www.kernel.org"), enclosed in angle brackets (e.g.,
       <http://lwn.net>), or placed on a line by themselves.  A warning for
       those who use double-quotes: never move extraneous punctuation (such as
       the period ending a sentence or the comma in a list) inside a URI,
       since this will change the value of the URI.  Instead, use angle
       brackets, or switch to a quoting system that never includes extraneous
       characters inside quotation marks.  This latter system, called the
       'new' or 'logical' quoting system by "Hart's Rules" and the "Oxford
       Dictionary for Writers and Editors", is preferred practice in Great
       Britain and in various European languages.  Older documents suggested
       inserting the prefix "URL:" just before the URI, but this form has
       never caught on.

       The URI syntax was designed to be unambiguous.  However, as URIs have
       become commonplace, traditional media (television, radio, newspapers,
       billboards, etc.) have increasingly used abbreviated URI references
       consisting of only the authority and path portions of the identified
       resource (e.g., <www.w3.org/Addressing>).  Such references are
       primarily intended for human interpretation rather than machine, with
       the assumption that context-based heuristics are sufficient to complete
       the URI (e.g., hostnames beginning with "www" are likely to have a URI
       prefix of "http://" and hostnames beginning with "ftp" are likely to
       have a prefix of "ftp://").  Many client implementations heuristically
       resolve these references.  Such heuristics may change over time,
       particularly when new schemes are introduced.  Since an abbreviated URI
       has the same syntax as a relative URL path, abbreviated URI references
       cannot be used where relative URIs are permitted, and can be used only
       when there is no defined base (such as in dialog boxes).  Don't use
       abbreviated URIs as hypertext links inside a document; use the standard
       format as described here.
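
       A client-side heuristic of the kind described above might look like
       the following Python sketch.  The function complete_abbreviated() is
       hypothetical and implements only the two example rules from the text;
       real clients use their own, differing heuristics.

```python
def complete_abbreviated(reference):
    """Guess a full URI for an abbreviated reference, or return None."""
    if "://" in reference:
        return reference                  # already has a scheme
    host = reference.split("/", 1)[0].lower()
    if host.startswith("www"):
        return "http://" + reference
    if host.startswith("ftp"):
        return "ftp://" + reference
    return None                           # no rule applies; don't guess


print(complete_abbreviated("www.w3.org/Addressing"))
print(complete_abbreviated("ftp.is.co.za/rfc/rfc1808.txt"))
```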

STANDARDS

       (IETF RFC 2396) ⟨http://www.ietf.org/rfc/rfc2396.txt⟩, (HTML 4.0)
       ⟨http://www.w3.org/TR/REC-html40⟩.

NOTES

       Any tool accepting URIs (e.g., a web browser) on a Linux system should
       be able to handle (directly or indirectly) all of the schemes described
       here, including the man: and info: schemes.  Handling them by invoking
       some other program is fine and in fact encouraged.

       Technically the fragment isn't part of the URI.

       For information on how to embed URIs (including URLs) in a data format,
       see documentation on that format.  HTML uses the format <A HREF="uri">
       text </A>.  Texinfo files use the format @uref{uri}.  Man and mdoc have
       the recently added UR macro, or just include the URI in the text
       (viewers should be able to detect :// as part of a URI).

       The GNOME and KDE desktop environments currently vary in the URIs they
       accept, in particular in their respective help browsers.  To list man
       pages, GNOME uses <toc:man> while KDE uses <man:(index)>, and to list
       info pages, GNOME uses <toc:info> while KDE uses <info:(dir)> (the
       author of this man page prefers the KDE approach here, though a more
       regular format would be even better).  In general, KDE uses
       <file:/cgi-bin/> as a prefix to a set of generated files.  KDE prefers
       documentation in HTML, accessed via <file:/cgi-bin/helpindex>.  GNOME
       prefers the ghelp scheme to store and find documentation.  Neither
       browser handles file: references to directories at the time of this
       writing, making it difficult to refer to an entire directory with a
       browsable URI.  As noted above, these environments differ in how they
       handle the info: scheme, probably the most important variation.  It is
       expected that GNOME and KDE will converge to common URI formats, and a
       future version of this man page will describe the converged result.
       Efforts to aid this convergence are encouraged.

   Security
       A URI does not in itself pose a security threat.  There is no general
       guarantee that a URL, which at one time located a given resource, will
       continue to do so.  Nor is there any guarantee that a URL will not
       locate a different resource at some later point in time; such a
       guarantee can be obtained only from the person(s) controlling that
       namespace and the resource in question.

       It is sometimes possible to construct a URL such that an attempt to
       perform a seemingly harmless operation, such as the retrieval of an
       entity associated with the resource, will in fact cause a possibly
       damaging remote operation to occur.  The unsafe URL is typically
       constructed by specifying a port number other than that reserved for
       the network protocol in question.  The client unwittingly contacts a
       site that is in fact running a different protocol.  The content of the
       URL contains instructions that, when interpreted according to this
       other protocol, cause an unexpected operation.  An example has been the
       use of a gopher URL to cause an unintended or impersonating message to
       be sent via an SMTP server.

       Caution should be used when using any URL that specifies a port number
       other than the default for the protocol, especially when it is a number
       within the reserved space.
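
       A tool could flag such URLs before using them, for instance along the
       lines of this Python sketch.  The table holds the default ports stated
       on this page; the function name is hypothetical, and treating ports
       below 1024 as "the reserved space" is an assumption of the sketch.

```python
from urllib.parse import urlsplit

# Default ports as given in the scheme descriptions on this page.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "gopher": 70,
                 "telnet": 23, "ldap": 389, "wais": 210}


def port_warning(url):
    """Return a warning string for a suspicious port, or None."""
    parts = urlsplit(url)
    default = DEFAULT_PORTS.get(parts.scheme)
    if parts.port is None or default is None or parts.port == default:
        return None
    if parts.port < 1024:   # assumption: ports below 1024 are "reserved"
        return "non-default port in the reserved space"
    return "non-default port"


print(port_warning("gopher://example.com:25/"))
print(port_warning("http://example.com:8080/"))
print(port_warning("http://example.com/"))
```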

       When a URI contains escaped delimiters for a given protocol (for
       example, CR and LF characters for telnet protocols), take care that
       these are not unescaped before transmission.  This might violate the
       protocol, but avoids the potential for such characters to be used to
       simulate an extra operation or parameter in that protocol, which might
       cause an unexpected and possibly harmful remote operation to be
       performed.

       It is clearly unwise to use a URI that contains a password which is
       intended to be secret.  In particular, the use of a password within the
       "userinfo" component of a URI is strongly recommended against except in
       those rare cases where the "password" parameter is intended to be
       public.
BUGS

       Documentation may be placed in a variety of locations, so there
       currently isn't a good URI scheme for general online documentation in
       arbitrary formats.  References of the form <file:///usr/doc/ZZZ> don't
       work because different distributions and local installation
       requirements may place the files in different directories (they may be
       in /usr/doc, or /usr/local/doc, or /usr/share, or somewhere else).
       Also, the directory ZZZ usually changes when a version changes (though
       filename globbing could partially overcome this).  Finally, using the
       file: scheme doesn't easily support people who dynamically load
       documentation from the Internet (instead of loading the files onto a
       local filesystem).  A future URI scheme may be added (e.g., "userdoc:")
       to permit programs to include cross-references to more detailed
       documentation without having to know the exact location of that
       documentation.  Alternatively, a future version of the filesystem
       specification may specify file locations sufficiently so that the file:
       scheme will be able to locate documentation.

       Many programs and file formats don't include a way to incorporate or
       implement links using URIs.

       Many programs can't handle all of these different URI formats; there
       should be a standard mechanism to load an arbitrary URI that
       automatically detects the user's environment (e.g., text or graphics,
       desktop environment, local user preferences, and currently executing
       tools) and invokes the right tool for any URI.
