1URI(7) Linux Programmer's Manual URI(7)
2
3
4
6 uri, url, urn - uniform resource identifier (URI), including a URL or
7 URN
8
10 URI = [ absoluteURI | relativeURI ] [ "#" fragment ]
11
12 absoluteURI = scheme ":" ( hierarchical_part | opaque_part )
13
14 relativeURI = ( net_path | absolute_path | relative_path ) [ "?" query ]
15
16
17 scheme = "http" | "ftp" | "gopher" | "mailto" | "news" | "telnet" | "file" | "man" | "info" | "whatis" | "ldap" | "wais" | ...
18
19 hierarchical_part = ( net_path | absolute_path ) [ "?" query ]
20
21
22 net_path = "//" authority [ absolute_path ]
23
24 absolute_path = "/" path_segments
25
26 relative_path = relative_segment [ absolute_path ]
27
29 A Uniform Resource Identifier (URI) is a short string of characters
30 identifying an abstract or physical resource (for example, a web page).
31 A Uniform Resource Locator (URL) is a URI that identifies a resource
32 through its primary access mechanism (e.g., its network "location"),
33 rather than by name or some other attribute of that resource. A Uni‐
34 form Resource Name (URN) is a URI that must remain globally unique and
35 persistent even when the resource ceases to exist or becomes unavail‐
36 able.
37
38 URIs are the standard way to name hypertext link destinations for tools
39 such as web browsers. The string "http://www.kernelnotes.org" is a URL
40 (and thus it's a URI). Many people use the term URL loosely as a syn‐
41 onym for URI (though technically URLs are a subset of URIs).
42
43 URIs can be absolute or relative. An absolute identifier refers to a
44 resource independent of context, while a relative identifier refers to
45 a resource by describing the difference from the current context.
46 Within a relative path reference, the complete path segments "." and
47 ".." have special meanings: "the current hierarchy level" and "the
48 level above this hierarchy level", respectively, just like they do in
49 Unix-like systems. A path segment which contains a colon character
50 can't be used as the first segment of a relative URI path (e.g.,
51 "this:that"), because it would be mistaken for a scheme name; precede
52 such segments with ./ (e.g., "./this:that"). Note that descendants of
53 MS-DOS (e.g., Microsoft Windows) replace devicename colons with the
54 vertical bar ("|") in URIs, so "C:" becomes "C|".
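
The resolution rules above can be tried out with a short sketch. It uses Python's standard urllib.parse module; the base URI and the reference strings are hypothetical values chosen only for illustration.

```python
from urllib.parse import urljoin

# Hypothetical base URI, used only for illustration.
BASE = "http://example.org/a/b/c"

# ".." climbs one hierarchy level; "." stays at the current level.
up_one = urljoin(BASE, "../d")
same_level = urljoin(BASE, "./d")

# A first segment containing a colon is read as a scheme name,
# so the reference is returned unchanged instead of being resolved.
mistaken = urljoin(BASE, "this:that")

# Prefixing "./" keeps the reference a relative path.
protected = urljoin(BASE, "./this:that")

for result in (up_one, same_level, mistaken, protected):
    print(result)
```

The third call shows why the "./" prefix matters: without it, the library treats "this" as a scheme and performs no resolution at all.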
55
56 A fragment identifier, if included, refers to a particular named por‐
57 tion (fragment) of a resource; text after a '#' identifies the frag‐
58 ment. A URI beginning with '#' refers to that fragment in the current
59 resource.
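
As a minimal sketch of fragment handling, again assuming Python's urllib.parse and hypothetical example URIs, the fragment can be separated from the rest of a URI, and a reference consisting only of a fragment resolves against the current resource:

```python
from urllib.parse import urldefrag, urljoin

# Hypothetical URIs, used only for illustration.
resource, fragment = urldefrag("http://example.org/doc.html#section2")

# A reference beginning with "#" names a fragment of the current resource.
same_document = urljoin("http://example.org/doc.html", "#notes")

print(resource, fragment)
print(same_document)
```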
60
62 There are many different URI schemes, each with specific additional
63 rules and meanings, but they are intentionally made to be as similar as
64 possible. For example, many URL schemes permit the authority to be the
65 following format, called here an ip_server (square brackets show what's
66 optional):
67
68 ip_server = [user [ : password ] @ ] host [ : port]
69
70 This format allows you to optionally insert a user name, a user plus
71 password, and/or a port number. The host is the name of the host com‐
72 puter, either its name as determined by DNS or an IP address (numbers
73 separated by periods). Thus the URI <http://fred:fredpass‐
74 word@xyz.com:8080/> logs into a web server on host xyz.com as fred
75 (using fredpassword) using port 8080. Avoid including a password in a
76 URI if possible because of the many security risks of having a password
77 written down. If the URL supplies a user name but no password, and the
78 remote server requests a password, the program interpreting the URL
79 should request one from the user.
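
The ip_server parts can be picked apart programmatically. The following sketch assumes Python's urllib.parse; the second URI is a hypothetical example with a user name but no password.

```python
from urllib.parse import urlsplit

# The example URI from the text above.
parts = urlsplit("http://fred:fredpassword@xyz.com:8080/")
user, password = parts.username, parts.password
host, port = parts.hostname, parts.port

# Hypothetical URI with a user name only; a client would have to
# ask the user for the password if the server requests one.
no_password = urlsplit("ftp://fred@xyz.com/")

print(user, password, host, port)
print(no_password.username, no_password.password, no_password.port)
```

Missing components are reported as None, which lets a program fall back to the scheme's default port or prompt for a password.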
80
81 Here are some of the most common schemes in use on Unix-like systems
82 that are understood by many tools. Note that many tools using URIs
83 also have internal schemes or specialized schemes; see those tools'
84 documentation for information on those schemes.
85
86 http - Web (HTTP) server
87 http://ip_server/path
88 http://ip_server/path?query
89
90 This is a URL accessing a web (HTTP) server. The default port is 80.
91 If the path refers to a directory, the web server will choose what to
92 return; usually if there is a file named "index.html" or "index.htm"
93 its content is returned, otherwise, a list of the files in the current
94 directory (with appropriate links) is generated and returned. An exam‐
95 ple is <http://lwn.net>.
96
97 A query can be given in the archaic "isindex" format, consisting of a
98 word or phrase and not including an equal sign (=). A query can also
99 be in the longer "GET" format, which has one or more query entries of
100 the form key=value separated by the ampersand character (&). Note that
101 key can be repeated more than once, though it's up to the web server
102 and its application programs to determine if there's any meaning to
103 that. There is an unfortunate interaction with HTML/XML/SGML and the
104 GET query format; when such URIs with more than one key are embedded in
105 SGML/XML documents (including HTML), the ampersand (&) has to be
106 rewritten as &amp;. Note that not all queries use this format; larger
107 forms may be too long to store as a URI, so they use a different inter‐
108 action mechanism (called POST) which does not include the data in the
109 URI. See the Common Gateway Interface specification at
110 <http://www.w3.org/CGI> for more information.
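
The key=value query format and the ampersand rewriting can be illustrated with a short sketch. It assumes Python's urllib.parse and html modules; the URI, its keys, and its values are hypothetical.

```python
import html
from urllib.parse import parse_qs, urlsplit

# Hypothetical URI with a repeated key.
url = "http://example.org/search?key=a&key=b&lang=en"

# Repeated keys are collected into lists; their meaning is up to the server.
params = parse_qs(urlsplit(url).query)

# Before embedding the URI in HTML/XML/SGML, "&" must become "&amp;".
embedded = html.escape(url)

print(params)
print(embedded)
```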
111
112 ftp - File Transfer Protocol (FTP)
113 ftp://ip_server/path
114
115 This is a URL accessing a file through the file transfer protocol
116 (FTP). The default port (for control) is 21. If no username is
117 included, the user name "anonymous" is supplied, and in that case many
118 clients provide as the password the requestor's Internet email address.
119 An example is <ftp://ftp.is.co.za/rfc/rfc1808.txt>.
120
121 gopher - Gopher server
122 gopher://ip_server/gophertype selector
123 gopher://ip_server/gophertype selector%09search
124 gopher://ip_server/gophertype selector%09search%09gopher+_string
125
126 The default gopher port is 70. gophertype is a single-character field
127 to denote the Gopher type of the resource to which the URL refers. The
128 entire path may also be empty, in which case the delimiting "/" is also
129 optional and the gophertype defaults to "1".
130
131 selector is the Gopher selector string. In the Gopher protocol, Gopher
132 selector strings are a sequence of octets which may contain any octets
133 except 09 hexadecimal (US-ASCII HT or tab), 0A hexadecimal (US-ASCII
134 character LF), and 0D hexadecimal (US-ASCII character CR).
135
136 mailto - Email address
137 mailto:email-address
138
139 This is an email address, usually of the form name@hostname. See
140 mailaddr(7) for more information on the correct format of an email
141 address. Note that any % character must be rewritten as %25. An exam‐
142 ple is <mailto:dwheeler@dwheeler.com>.
143
144 news - Newsgroup or News message
145 news:newsgroup-name
146 news:message-id
147
148 A newsgroup-name is a period-delimited hierarchical name, such as
149 "comp.infosystems.www.misc". If <newsgroup-name> is "*" (as in
150 <news:*>), it is used to refer to "all available news groups". An
151 example is <news:comp.lang.ada>.
152
153 A message-id corresponds to the Message-ID of IETF RFC 1036, without
154 the enclosing "<" and ">"; it takes the form unique@full_domain_name.
155 A message identifier may be distinguished from a news group name by the
156 presence of the "@" character.
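
The "@" rule lends itself to a very small check. In this sketch the function name classify_news_uri is hypothetical, and no validation beyond the scheme name is attempted.

```python
def classify_news_uri(uri: str) -> str:
    """Return "message-id" or "newsgroup" for a news: URI."""
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "news":
        raise ValueError("not a news: URI")
    # A message identifier is recognized by the presence of "@".
    return "message-id" if "@" in rest else "newsgroup"


print(classify_news_uri("news:comp.lang.ada"))
print(classify_news_uri("news:unique@full_domain_name"))
```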
157
158 telnet - Telnet login
159 telnet://ip_server/
160
161 The Telnet URL scheme is used to designate interactive text services
162 that may be accessed by the Telnet protocol. The final "/" character
163 may be omitted. The default port is 23. An example is <tel‐
164 net://melvyl.ucop.edu/>.
165
166 file - Normal file
167 file://ip_server/path_segments
168 file:path_segments
169
170 This represents a file or directory accessible locally. As a special
171 case, host can be the string "localhost" or the empty string; this is
172 interpreted as "the machine from which the URL is being interpreted".
173 If the path is to a directory, the viewer should display the direc‐
174 tory's contents with links to each containee; not all viewers currently
175 do this. KDE supports generated files through the URL <file:/cgi-bin>.
176 If the given file isn't found, browser writers may want to try to
177 expand the filename via filename globbing (see glob(7) and glob(3)).
178
179 The second format (e.g., <file:/etc/passwd>) is a correct format for
180 referring to a local file. However, older standards did not permit
181 this format, and some programs don't recognize this as a URI. A more
182 portable syntax is to use an empty string as the server name, e.g.,
183 <file:///etc/passwd>; this form does the same thing and is easily rec‐
184 ognized by pattern matchers and older programs as a URI. Note that if
185 you really mean to say "start from the current location," don't specify
186 the scheme at all; use a relative address like <../test.txt>, which has
187 the side-effect of being scheme-independent. An example of this scheme
188 is <file:///etc/passwd>.
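
The two file: formats can be compared with a short sketch, assuming Python's urllib.parse. Both forms yield the same path; the variable names are illustrative only.

```python
from urllib.parse import quote, urlsplit

# Build the portable form: empty server name followed by the escaped path.
portable = "file://" + quote("/etc/passwd")

short_form = urlsplit("file:/etc/passwd")
long_form = urlsplit(portable)

print(portable)
print(short_form.path, long_form.path)
```

Both forms carry an empty authority, so a parser that accepts either one ends up with the same local path.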
189
190 man - Man page documentation
191 man:command-name
192 man:command-name(section)
193
194 This refers to local online manual (man) reference pages. The command
195 name can optionally be followed by a parenthesis and section number;
196 see man(7) for more information on the meaning of the section numbers.
197 This URI scheme is unique to Unix-like systems (such as Linux) and is
198 not currently registered by the IETF. An example is <man:ls(1)>.
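
Because the man: scheme is not registered, tools have to parse it themselves. The following is a sketch of one way to do so; the pattern and the function name parse_man_uri are assumptions, not part of any standard.

```python
import re

# Command name, optionally followed by a parenthesized section.
MAN_URI = re.compile(r"^man:([^()\s]+)(?:\(([^()\s]+)\))?$")


def parse_man_uri(uri: str):
    """Return (command_name, section); section is None if absent."""
    match = MAN_URI.match(uri)
    if match is None:
        raise ValueError("not a man: URI")
    return match.group(1), match.group(2)


print(parse_man_uri("man:ls(1)"))
print(parse_man_uri("man:ls"))
```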
199
200 info - Info page documentation
201 info:virtual-filename
202 info:virtual-filename#nodename
203 info:(virtual-filename)
204 info:(virtual-filename)nodename
205
206 This scheme refers to online info reference pages (generated from tex‐
207 info files), a documentation format used by programs such as the GNU
208 tools. This URI scheme is unique to Unix-like systems (such as Linux)
209 and is not currently registered by the IETF. As of this writing, GNOME
210 and KDE differ in their URI syntax and do not accept the other's syn‐
211 tax. The first two formats are the GNOME format; in nodenames all spa‐
212 ces are written as underscores. The second two formats are the KDE
213 format; spaces in nodenames must be written as spaces, even though this
214 is forbidden by the URI standards. It's hoped that in the future most
215 tools will understand all of these formats and will always accept
216 underscores for spaces in nodenames. In both GNOME and KDE, if the
217 form without the nodename is used the nodename is assumed to be "Top".
218 Examples of the GNOME format are <info:gcc> and <info:gcc#G++_and_GCC>.
219 Examples of the KDE format are <info:(gcc)> and <info:(gcc)G++ and
220 GCC>.
221
222 whatis - Documentation search
223 whatis:string
224
225 This scheme searches the database of short (one-line) descriptions of
226 commands and returns a list of descriptions containing that string.
227 Only complete word matches are returned. See whatis(1). This URI
228 scheme is unique to Unix-like systems (such as Linux) and is not cur‐
229 rently registered by the IETF.
230
231 ghelp - GNOME help documentation
232 ghelp:name-of-application
233
234 This loads GNOME help for the given application. Note that not much
235 documentation currently exists in this format.
236
237 ldap - Lightweight Directory Access Protocol
238 ldap://hostport
239 ldap://hostport/
240 ldap://hostport/dn
241 ldap://hostport/dn?attributes
242 ldap://hostport/dn?attributes?scope
243 ldap://hostport/dn?attributes?scope?filter
244 ldap://hostport/dn?attributes?scope?filter?extensions
245
246 This scheme supports queries to the Lightweight Directory Access Proto‐
247 col (LDAP), a protocol for querying a set of servers for hierarchi‐
248 cally-organized information (such as people and computing resources).
249 More information on the LDAP URL scheme is available in RFC 2255
250 ⟨http://www.ietf.org/rfc/rfc2255.txt⟩. The components of this URL are:
251
252 hostport the LDAP server to query, written as a hostname optionally
253 followed by a colon and the port number. The default LDAP
254 port is TCP port 389. If empty, the client determines
255 which LDAP server to use.
256
257 dn the LDAP Distinguished Name, which identifies the base
258 object of the LDAP search (see RFC 2253
259 ⟨http://www.ietf.org/rfc/rfc2253.txt⟩ section 3).
260
261 attributes a comma-separated list of attributes to be returned; see
262 RFC 2251 section 4.1.5. If omitted, all attributes should
263 be returned.
264
265 scope specifies the scope of the search, which can be one of
266 "base" (for a base object search), "one" (for a one-level
267 search), or "sub" (for a subtree search). If scope is
268 omitted, "base" is assumed.
269
270 filter specifies the search filter (subset of entries to return).
271 If omitted, all entries should be returned. See RFC 2254
272 ⟨http://www.ietf.org/rfc/rfc2254.txt⟩ section 4.
273
274 extensions a comma-separated list of type=value pairs, where the
275 =value portion may be omitted for options not requiring it.
276 An extension prefixed with a '!' is critical (must be sup‐
277 ported to be valid), otherwise it's non-critical
278 (optional).
279
280 LDAP queries are easiest to explain by example. Here's a query that
281 asks ldap.itd.umich.edu for information about the University of Michi‐
282 gan in the U.S.:
283 ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US
284
285 To just get its postal address attribute, request:
286 ldap://ldap.itd.umich.edu/o=University%20of%20Michi‐
287 gan,c=US?postalAddress
288
289 To ask a host.com at port 6666 for information about the person with
290 common name (cn) "Babs Jensen" at University of Michigan, request:
291 ldap://host.com:6666/o=University%20of%20Michi‐
292 gan,c=US??sub?(cn=Babs%20Jensen)
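
The component layout can be sketched in code. This is a simplified illustration that assumes Python's urllib.parse and applies only the defaults stated above (port 389, scope "base"); the function name parse_ldap_url is hypothetical, and the sketch is not a complete RFC 2255 parser.

```python
from urllib.parse import unquote, urlsplit


def parse_ldap_url(url: str) -> dict:
    parts = urlsplit(url)
    if parts.scheme != "ldap":
        raise ValueError("not an ldap: URL")
    # After the dn come up to four "?"-separated fields.
    fields = parts.query.split("?") if parts.query else []
    fields += [""] * (4 - len(fields))
    attributes, scope, search_filter, extensions = fields[:4]
    return {
        "host": parts.hostname,
        "port": parts.port or 389,          # default LDAP port
        "dn": unquote(parts.path.lstrip("/")),
        "attributes": [unquote(a) for a in attributes.split(",")] if attributes else [],
        "scope": scope or "base",           # "base" is assumed if omitted
        "filter": unquote(search_filter),
        "extensions": extensions.split(",") if extensions else [],
    }


babs = parse_ldap_url(
    "ldap://host.com:6666/o=University%20of%20Michigan,c=US??sub?(cn=Babs%20Jensen)"
)
postal = parse_ldap_url(
    "ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US?postalAddress"
)
print(babs)
print(postal)
```

An empty attribute list here stands for "all attributes", and an empty filter for "all entries", matching the defaults described above.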
293
294 wais - Wide Area Information Servers
295 wais://hostport/database
296 wais://hostport/database?search
297 wais://hostport/database/wtype/wpath
298
299 This scheme designates a WAIS database, search, or document (see IETF
300 RFC 1625 ⟨http://www.ietf.org/rfc/rfc1625.txt⟩ for more information on
301 WAIS). Hostport is the hostname, optionally followed by a colon and
302 port number (the default port number is 210).
303
304 The first form designates a WAIS database for searching. The second
305 form designates a particular search of that WAIS database. The
306 third form designates a particular document within a WAIS database to
307 be retrieved. wtype is the WAIS designation of the type of the object
308 and wpath is the WAIS document-id.
309
310 other schemes
311 There are many other URI schemes. Most tools that accept URIs support
312 a set of internal URIs (e.g., Mozilla has the about: scheme for inter‐
313 nal information, and the GNOME help browser has the toc: scheme for
314 various starting locations). There are many schemes that have been
315 defined but are not as widely used at the current time (e.g., pros‐
316 pero). The nntp: scheme is deprecated in favor of the news: scheme.
317 URNs are to be supported by the urn: scheme, with a hierarchical name
318 space (e.g., urn:ietf:... would identify IETF documents); at this time
319 URNs are not widely implemented. Not all tools support all schemes.
320
322 URIs use a limited number of characters so that they can be typed in
323 and used in a variety of situations.
324
325 The following characters are reserved, that is, they may appear in a
326 URI but their use is limited to their reserved purpose (conflicting
327 data must be escaped before forming the URI):
328
329 ; / ? : @ & = + $ ,
330
331 Unreserved characters may be included in a URI. Unreserved characters
332 include upper and lower case English letters, decimal digits, and the
333 following limited set of punctuation marks and symbols:
334
335 - _ . ! ~ * ' ( )
336
337 All other characters must be escaped. An escaped octet is encoded as a
338 character triplet, consisting of the percent character "%" followed by
339 the two hexadecimal digits representing the octet code (you can use
340 upper or lower case letters for the hexadecimal digits). For example, a
341 blank space must be escaped as "%20", a tab character as "%09", and the
342 "&" as "%26". Because the percent "%" character always has the
343 reserved purpose of being the escape indicator, it must be escaped as
344 "%25". It is common practice to escape space characters as the plus
345 symbol (+) in query text; this practice isn't uniformly defined in the
346 relevant RFCs (which recommend %20 instead) but any tool accepting URIs
347 with query text should be prepared for them. A URI is always shown in
348 its "escaped" form.
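
The escaping rules can be checked with a brief sketch, assuming Python's urllib.parse. The sample strings are illustrative only.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

space = quote(" ")       # blank space
tab = quote("\t")        # tab character
ampersand = quote("&")   # reserved character used as data
percent = quote("%")     # the escape indicator itself

# The "+" convention for spaces in query text, and its decoding.
plus_form = quote_plus("two words")
decoded = unquote_plus("two+words")

# Hexadecimal digits may be upper or lower case.
same_octet = unquote("%7e") == unquote("%7E")

print(space, tab, ampersand, percent, plus_form, decoded, same_octet)
```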
349
350 Unreserved characters can be escaped without changing the semantics of
351 the URI, but this should not be done unless the URI is being used in a
352 context that does not allow the unescaped character to appear. For
353 example, "%7e" is sometimes used instead of "~" in an http URL path,
354 but the two are equivalent for an http URL.
355
356 For URIs which must handle characters outside the US ASCII character
357 set, the HTML 4.01 specification (section B.2) and IETF RFC 2718 (sec‐
358 tion 2.2.5) recommend the following approach:
359
360 1. translate the character sequences into UTF-8 (IETF RFC 2279) — see
361 utf-8(7) — and then
362
363 2. use the URI escaping mechanism, that is, use the %HH encoding for
364 unsafe octets.
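
The two steps can be written out explicitly. The sketch below assumes Python; the function name escape_non_ascii is hypothetical, and it deliberately leaves ASCII characters untouched so that only the UTF-8 step is shown. The library call urllib.parse.quote is included for comparison.

```python
from urllib.parse import quote


def escape_non_ascii(text: str) -> str:
    """Encode non-ASCII characters as UTF-8 octets written as %HH."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # ASCII is left as-is in this sketch
        else:
            # Step 1: translate to UTF-8; step 2: escape each octet.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


manual = escape_non_ascii("/caf\u00e9")
library = quote("/caf\u00e9")
print(manual, library)
```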
365
367 When written, URIs should be placed inside doublequotes (e.g.,
368 "http://www.kernelnotes.org"), enclosed in angle brackets (e.g.,
369 <http://lwn.net>), or placed on a line by themselves. A warning for
370 those who use double-quotes: never move extraneous punctuation (such as
371 the period ending a sentence or the comma in a list) inside a URI,
372 since this will change the value of the URI. Use angle brackets
373 instead, or switch to a quoting system that never includes extraneous
374 characters inside quotation marks. This latter system, called the
375 'new' or 'logical' quoting system by "Hart's Rules" and the "Oxford
376 Dictionary for Writers and Editors", is preferred practice in Great
377 Britain and among hackers worldwide (see the Jargon File's section on Hacker
378 Writing Style, http://www.fwi.uva.nl/~mes/jargon/h/HackerWrit‐
379 ingStyle.html, for more information). Older documents suggested
380 inserting the prefix "URL:" just before the URI, but this form has
381 never caught on.
382
383 The URI syntax was designed to be unambiguous. However, as URIs have
384 become commonplace, traditional media (television, radio, newspapers,
385 billboards, etc.) have increasingly used abbreviated URI references
386 consisting of only the authority and path portions of the identified
387 resource (e.g., <www.w3.org/Addressing>). Such references are primar‐
388 ily intended for human interpretation rather than machine, with the
389 assumption that context-based heuristics are sufficient to complete the
390 URI (e.g., hostnames beginning with "www" are likely to have a URI pre‐
391 fix of "http://" and hostnames beginning with "ftp" likely to have a
392 prefix of "ftp://"). Many client implementations heuristically resolve
393 these references. Such heuristics may change over time, particularly
394 when new schemes are introduced. Since an abbreviated URI has the same
395 syntax as a relative URL path, abbreviated URI references cannot be
396 used where relative URIs are permitted, and can only be used when there
397 is no defined base (such as in dialog boxes). Don't use abbreviated
398 URIs as hypertext links inside a document; use the standard format as
399 described here.
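
A client-side heuristic of the kind described might look like the following sketch. The function name complete_abbreviated_uri and the choice to return None for unrecognized input are assumptions; real clients use their own, possibly different, rules.

```python
from typing import Optional


def complete_abbreviated_uri(ref: str) -> Optional[str]:
    """Guess a scheme for an abbreviated URI reference, or return None."""
    if "://" in ref:
        return ref  # already has a scheme and authority
    host = ref.split("/", 1)[0].lower()
    if host.startswith("www"):
        return "http://" + ref
    if host.startswith("ftp"):
        return "ftp://" + ref
    return None  # no guess; ask the user rather than assume


print(complete_abbreviated_uri("www.w3.org/Addressing"))
print(complete_abbreviated_uri("ftp.is.co.za/rfc/rfc1808.txt"))
```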
400
402 Any tool accepting URIs (e.g., a web browser) on a Linux system should
403 be able to handle (directly or indirectly) all of the schemes described
404 here, including the man: and info: schemes. Handling them by invoking
405 some other program is fine and in fact encouraged.
406
407 Technically the fragment isn't part of the URI.
408
409 For information on how to embed URIs (including URLs) in a data format,
410 see documentation on that format. HTML uses the format <A HREF="uri">
411 text </A>. Texinfo files use the format @uref{uri}. Man and mdoc have
412 the recently-added UR macro, or just include the URI in the text (view‐
413 ers should be able to detect :// as part of a URI).
414
415 The GNOME and KDE desktop environments currently vary in the URIs they
416 accept, in particular in their respective help browsers. To list man
417 pages, GNOME uses <toc:man> while KDE uses <man:(index)>, and to list
418 info pages, GNOME uses <toc:info> while KDE uses <info:(dir)> (the
419 author of this man page prefers the KDE approach here, though a more
420 regular format would be even better). In general, KDE uses <file:/cgi-
421 bin/> as a prefix to a set of generated files. KDE prefers documenta‐
422 tion in HTML, accessed via the <file:/cgi-bin/helpindex>. GNOME
423 prefers the ghelp scheme to store and find documentation. Neither
424 browser handles file: references to directories at the time of this
425 writing, making it difficult to refer to an entire directory with a
426 browsable URI. As noted above, these environments differ in how they
427 handle the info: scheme, probably the most important variation. It is
428 expected that GNOME and KDE will converge to common URI formats, and a
429 future version of this man page will describe the converged result.
430 Efforts to aid this convergence are encouraged.
431
433 A URI does not in itself pose a security threat. There is no general
434 guarantee that a URL, which at one time located a given resource, will
435 continue to do so. Nor is there any guarantee that a URL will not
436 locate a different resource at some later point in time; such a guaran‐
437 tee can only be obtained from the person(s) controlling that namespace
438 and the resource in question.
439
440 It is sometimes possible to construct a URL such that an attempt to
441 perform a seemingly harmless operation, such as the retrieval of an
442 entity associated with the resource, will in fact cause a possibly dam‐
443 aging remote operation to occur. The unsafe URL is typically con‐
444 structed by specifying a port number other than that reserved for the
445 network protocol in question. The client unwittingly contacts a site
446 that is in fact running a different protocol. The content of the URL
447 contains instructions that, when interpreted according to this other
448 protocol, cause an unexpected operation. An example has been the use
449 of a gopher URL to cause an unintended or impersonating message to be
450 sent via a SMTP server.
451
452 Caution should be used when using any URL that specifies a port number
453 other than the default for the protocol, especially when it is a number
454 within the reserved space.
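
A tool could flag such URLs before acting on them. The sketch below assumes Python's urllib.parse and uses only the default ports stated in this page; the function name uses_nondefault_port and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit

# Default ports as given in the scheme descriptions of this page.
DEFAULT_PORTS = {
    "http": 80, "ftp": 21, "gopher": 70,
    "telnet": 23, "ldap": 389, "wais": 210,
}


def uses_nondefault_port(url: str) -> bool:
    """True if the URL names an explicit port that differs from the default."""
    parts = urlsplit(url)
    default = DEFAULT_PORTS.get(parts.scheme)
    return (
        parts.port is not None
        and default is not None
        and parts.port != default
    )


print(uses_nondefault_port("gopher://example.org:25/"))
print(uses_nondefault_port("http://example.org:80/"))
```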
455
456 When a URI contains escaped delimiters for a given protocol (for
457 example, CR and LF characters for telnet protocols), take care that
458 these are not unescaped before transmission. Leaving them escaped might
459 violate the protocol, but it prevents such characters from being used to
460 simulate an extra operation or parameter in that protocol, which might
461 cause an unexpected and possibly harmful remote operation to be
462 performed.
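
One way to spot such URIs is to look for escaped CR and LF octets before any unescaping takes place. This is a minimal sketch; the function name has_escaped_line_break and the example URIs are hypothetical, and other protocols may have further delimiters worth checking.

```python
import re

# %0D is CR and %0A is LF; hexadecimal digits may be in either case.
ESCAPED_CRLF = re.compile(r"%0[ad]", re.IGNORECASE)


def has_escaped_line_break(uri: str) -> bool:
    """True if the URI contains an escaped CR or LF octet."""
    return ESCAPED_CRLF.search(uri) is not None


print(has_escaped_line_break("gopher://example.org:25/1HELO%0D%0AMAIL"))
print(has_escaped_line_break("http://example.org/a%20b"))
```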
463
464 It is clearly unwise to use a URI that contains a password which is
465 intended to be secret. In particular, the use of a password within the
466 'userinfo' component of a URI is strongly discouraged except in
467 those rare cases where the 'password' parameter is intended to be
468 public.
469
471 http://www.ietf.org/rfc/rfc2396.txt (IETF RFC 2396),
472 http://www.w3.org/TR/REC-html40 (HTML 4.0).
473 ⟨http://www.ietf.org/rfc/rfc1625.txt⟩
474
476 Documentation may be placed in a variety of locations, so there cur‐
477 rently isn't a good URI scheme for general online documentation in
478 arbitrary formats. References of the form <file:///usr/doc/ZZZ> don't
479 work because different distributions and local installation require‐
480 ments may place the files in different directories (it may be in
481 /usr/doc, or /usr/local/doc, or /usr/share, or somewhere else). Also,
482 the directory ZZZ usually changes when a version changes (though file‐
483 name globbing could partially overcome this). Finally, using the file:
484 scheme doesn't easily support people who dynamically load documentation
485 from the Internet (instead of loading the files onto a local filesys‐
486 tem). A future URI scheme may be added (e.g., "userdoc:") to permit
487 programs to include cross-references to more detailed documentation
488 without having to know the exact location of that documentation.
489 Alternatively, a future version of the filesystem specification may
490 specify file locations sufficiently so that the file: scheme will be
491 able to locate documentation.
492
493 Many programs and file formats don't include a way to incorporate or
494 implement links using URIs.
495
496 Many programs can't handle all of these different URI formats; there
497 should be a standard mechanism to load an arbitrary URI that automati‐
498 cally detects the users' environment (e.g., text or graphics, desktop
499 environment, local user preferences, and currently-executing tools) and
500 invokes the right tool for any URI.
501
503 David A. Wheeler (dwheeler@dwheeler.com) wrote this man page.
504
506 lynx(1), man2html(1), mailaddr(7), utf-8(7), IETF RFC 2255
507 ⟨http://www.ietf.org/rfc/rfc2255.txt⟩
508
509
510
511Linux 2000-03-14 URI(7)