URI(7) Linux Programmer's Manual URI(7)

NAME
uri, url, urn - uniform resource identifier (URI), including a URL or
URN
8
SYNOPSIS
URI = [ absoluteURI | relativeURI ] [ "#" fragment ]

absoluteURI = scheme ":" ( hierarchical_part | opaque_part )

relativeURI = ( net_path | absolute_path | relative_path ) [ "?" query ]

scheme = "http" | "ftp" | "gopher" | "mailto" | "news" | "telnet" |
         "file" | "man" | "info" | "whatis" | "ldap" | "wais" | ...

hierarchical_part = ( net_path | absolute_path ) [ "?" query ]

net_path = "//" authority [ absolute_path ]

absolute_path = "/" path_segments

relative_path = relative_segment [ absolute_path ]

DESCRIPTION
A Uniform Resource Identifier (URI) is a short string of characters
identifying an abstract or physical resource (for example, a web page).
A Uniform Resource Locator (URL) is a URI that identifies a resource
through its primary access mechanism (e.g., its network "location"),
rather than by name or some other attribute of that resource. A
Uniform Resource Name (URN) is a URI that must remain globally unique
and persistent even when the resource ceases to exist or becomes
unavailable.
36
37 URIs are the standard way to name hypertext link destinations for tools
38 such as web browsers. The string "http://www.kernelnotes.org" is a URL
39 (and thus it is also a URI). Many people use the term URL loosely as a
40 synonym for URI (though technically URLs are a subset of URIs).
41
42 URIs can be absolute or relative. An absolute identifier refers to a
43 resource independent of context, while a relative identifier refers to
44 a resource by describing the difference from the current context.
45 Within a relative path reference, the complete path segments "." and
46 ".." have special meanings: "the current hierarchy level" and "the
47 level above this hierarchy level", respectively, just like they do in
48 Unix-like systems. A path segment which contains a colon character
49 can't be used as the first segment of a relative URI path (e.g.,
50 "this:that"), because it would be mistaken for a scheme name; precede
51 such segments with ./ (e.g., "./this:that"). Note that descendants of
52 MS-DOS (e.g., Microsoft Windows) replace devicename colons with the
53 vertical bar ("|") in URIs, so "C:" becomes "C|".
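
The resolution rules above can be tried out with a short Python sketch.
It uses the standard urllib.parse module; the base URI and the file
names are hypothetical and stand in for "the current context".

```python
from urllib.parse import urljoin

# Hypothetical base URI standing in for the current context.
BASE = "http://example.com/a/b/c.html"

# ".." climbs one hierarchy level; "." stays at the current level.
up_one = urljoin(BASE, "../d.html")
same_level = urljoin(BASE, "./e.html")

# A first segment containing a colon is taken for a scheme name,
# so the reference is returned unchanged instead of being resolved.
mistaken = urljoin(BASE, "this:that")

# Prefixing "./" turns it back into an ordinary relative path.
intended = urljoin(BASE, "./this:that")

print(up_one)
print(same_level)
print(mistaken)
print(intended)
```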
54
55 A fragment identifier, if included, refers to a particular named por‐
56 tion (fragment) of a resource; text after a '#' identifies the frag‐
57 ment. A URI beginning with '#' refers to that fragment in the current
58 resource.
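
As a small illustration of fragment handling, the following Python
sketch splits a fragment off a URI and resolves a bare "#" reference
against the current resource. The URI and fragment names are
hypothetical.

```python
from urllib.parse import urldefrag, urljoin

# Hypothetical URI with a fragment identifier.
resource, fragment = urldefrag("http://example.com/manual.html#install")

# A reference that begins with "#" points into the current resource.
same_doc = urljoin("http://example.com/manual.html", "#usage")

print(resource, fragment)
print(same_doc)
```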
59
60 Usage
61 There are many different URI schemes, each with specific additional
62 rules and meanings, but they are intentionally made to be as similar as
63 possible. For example, many URL schemes permit the authority to be the
64 following format, called here an ip_server (square brackets show what's
65 optional):
66
67 ip_server = [user [ : password ] @ ] host [ : port]
68
This format allows you to optionally insert a username, a user plus
password, and/or a port number. The host is the name of the host
computer, either its name as determined by DNS or an IP address (numbers
separated by periods). Thus the URI
<http://fred:fredpassword@xyz.com:8080/> logs into a web server on host
xyz.com as fred (using fredpassword) using port 8080. Avoid including a
password in a URI if possible because of the many security risks of
having a password written down. If the URL supplies a username but no
password, and the remote server requests a password, the program
interpreting the URL should request one from the user.
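
The parts of an ip_server can be pulled apart with Python's
urllib.parse, as in this sketch. It reuses the example URI from the
text; the second URI is a hypothetical one without a password, where a
real client would be expected to prompt the user.

```python
from urllib.parse import urlsplit

# The example from the text: user, password, host, and port all given.
full = urlsplit("http://fred:fredpassword@xyz.com:8080/")
print(full.username, full.password, full.hostname, full.port)

# Hypothetical URI with a username only; password and port are absent.
partial = urlsplit("ftp://fred@xyz.com/pub/")
needs_prompt = partial.username is not None and partial.password is None
print(partial.username, partial.port, needs_prompt)
```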
79
80 Here are some of the most common schemes in use on Unix-like systems
81 that are understood by many tools. Note that many tools using URIs
82 also have internal schemes or specialized schemes; see those tools'
83 documentation for information on those schemes.
84
85 http - Web (HTTP) server
86
87 http://ip_server/path
88 http://ip_server/path?query
89
90 This is a URL accessing a web (HTTP) server. The default port is 80.
91 If the path refers to a directory, the web server will choose what to
92 return; usually if there is a file named "index.html" or "index.htm"
93 its content is returned, otherwise, a list of the files in the current
94 directory (with appropriate links) is generated and returned. An exam‐
95 ple is <http://lwn.net>.
96
97 A query can be given in the archaic "isindex" format, consisting of a
98 word or phrase and not including an equal sign (=). A query can also
99 be in the longer "GET" format, which has one or more query entries of
100 the form key=value separated by the ampersand character (&). Note that
101 key can be repeated more than once, though it's up to the web server
102 and its application programs to determine if there's any meaning to
that. There is an unfortunate interaction with HTML/XML/SGML and the
GET query format; when such URIs with more than one key are embedded in
SGML/XML documents (including HTML), the ampersand (&) has to be
rewritten as &amp;. Note that not all queries use this format; larger
forms may be too long to store as a URI, so they use a different
interaction mechanism (called POST) which does not include the data in
the URI. See the Common Gateway Interface specification at
<http://www.w3.org/CGI> for more information.
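
The "GET" query format can be sketched in Python as follows. The host,
keys, and values are hypothetical; the sketch builds a query with a
repeated key, parses it back, and rewrites the ampersands for embedding
in HTML.

```python
from html import escape
from urllib.parse import parse_qs, urlencode

# Build a query in which "key" appears twice (hypothetical names).
query = urlencode([("key", "a"), ("key", "b"), ("lang", "en")])

# Parsing keeps every value of a repeated key; what the repetition
# means is up to the server-side application.
parsed = parse_qs(query)

# When the URI is embedded in HTML, each "&" must become "&amp;".
uri = "http://example.com/search?" + query
href = escape(uri)

print(query)
print(parsed)
print(href)
```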
111
112 ftp - File Transfer Protocol (FTP)
113
114 ftp://ip_server/path
115
116 This is a URL accessing a file through the file transfer protocol
117 (FTP). The default port (for control) is 21. If no username is
118 included, the username "anonymous" is supplied, and in that case many
119 clients provide as the password the requestor's Internet email address.
120 An example is <ftp://ftp.is.co.za/rfc/rfc1808.txt>.
121
122 gopher - Gopher server
123
124 gopher://ip_server/gophertype selector
125 gopher://ip_server/gophertype selector%09search
126 gopher://ip_server/gophertype selector%09search%09gopher+_string
127
128 The default gopher port is 70. gophertype is a single-character field
129 to denote the Gopher type of the resource to which the URL refers. The
130 entire path may also be empty, in which case the delimiting "/" is also
131 optional and the gophertype defaults to "1".
132
133 selector is the Gopher selector string. In the Gopher protocol, Gopher
134 selector strings are a sequence of octets which may contain any octets
135 except 09 hexadecimal (US-ASCII HT or tab), 0A hexadecimal (US-ASCII
136 character LF), and 0D (US-ASCII character CR).
137
138 mailto - Email address
139
140 mailto:email-address
141
142 This is an email address, usually of the form name@hostname. See
143 mailaddr(7) for more information on the correct format of an email
144 address. Note that any % character must be rewritten as %25. An exam‐
145 ple is <mailto:dwheeler@dwheeler.com>.
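
A minimal Python sketch of the % rule, using a hypothetical address
that contains a percent sign:

```python
from urllib.parse import quote, unquote

# Hypothetical address containing a literal "%".
address = "50%off@example.com"

# Keep "@" as-is; the "%" is rewritten as "%25".
uri = "mailto:" + quote(address, safe="@")
print(uri)

# Unescaping the part after the scheme recovers the address.
recovered = unquote(uri[len("mailto:"):])
print(recovered)
```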
146
147 news - Newsgroup or News message
148
149 news:newsgroup-name
150 news:message-id
151
152 A newsgroup-name is a period-delimited hierarchical name, such as
153 "comp.infosystems.www.misc". If <newsgroup-name> is "*" (as in
154 <news:*>), it is used to refer to "all available news groups". An
155 example is <news:comp.lang.ada>.
156
157 A message-id corresponds to the Message-ID of IETF RFC 1036,
158 ⟨http://www.ietf.org/rfc/rfc1036.txt⟩ without the enclosing "<" and
159 ">"; it takes the form unique@full_domain_name. A message identifier
160 may be distinguished from a news group name by the presence of the "@"
161 character.
162
163 telnet - Telnet login
164
165 telnet://ip_server/
166
The Telnet URL scheme is used to designate interactive text services
that may be accessed by the Telnet protocol. The final "/" character
may be omitted. The default port is 23. An example is
<telnet://melvyl.ucop.edu/>.
171
172 file - Normal file
173
174 file://ip_server/path_segments
175 file:path_segments
176
177 This represents a file or directory accessible locally. As a special
178 case, host can be the string "localhost" or the empty string; this is
179 interpreted as "the machine from which the URL is being interpreted".
180 If the path is to a directory, the viewer should display the direc‐
181 tory's contents with links to each containee; not all viewers currently
182 do this. KDE supports generated files through the URL <file:/cgi-bin>.
183 If the given file isn't found, browser writers may want to try to
184 expand the filename via filename globbing (see glob(7) and glob(3)).
185
186 The second format (e.g., <file:/etc/passwd>) is a correct format for
187 referring to a local file. However, older standards did not permit
188 this format, and some programs don't recognize this as a URI. A more
189 portable syntax is to use an empty string as the server name, for exam‐
190 ple, <file:///etc/passwd>; this form does the same thing and is easily
191 recognized by pattern matchers and older programs as a URI. Note that
192 if you really mean to say "start from the current location," don't
193 specify the scheme at all; use a relative address like <../test.txt>,
194 which has the side-effect of being scheme-independent. An example of
195 this scheme is <file:///etc/passwd>.
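
The two file: forms can be compared with a short Python sketch. It
builds the portable form by hand with urllib.parse.quote and then checks
that both forms carry the same path; nothing here touches the file
system.

```python
from urllib.parse import quote, urlsplit

# Build the portable form: empty server name, then the absolute path.
portable_uri = "file://" + quote("/etc/passwd")
print(portable_uri)

# Both forms parse to the same path; only the portable form has
# an (empty) authority written out.
short = urlsplit("file:/etc/passwd")
portable = urlsplit(portable_uri)
print(short.path, portable.path)
```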
196
197 man - Man page documentation
198
199 man:command-name
200 man:command-name(section)
201
202 This refers to local online manual (man) reference pages. The command
203 name can optionally be followed by a parenthesis and section number;
204 see man(7) for more information on the meaning of the section numbers.
205 This URI scheme is unique to Unix-like systems (such as Linux) and is
206 not currently registered by the IETF. An example is <man:ls(1)>.
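
Since the man: scheme is not registered, tools have to parse it
themselves. The following Python sketch shows one possible way; the
function name parse_man_uri and the regular expression are illustrative
assumptions, not part of any standard.

```python
import re

# Command name, optionally followed by "(section)".
MAN_URI = re.compile(r"^man:(?P<name>[^()]+)(?:\((?P<section>[^()]+)\))?$")

def parse_man_uri(uri):
    """Return (name, section); section is None when not given."""
    match = MAN_URI.match(uri)
    if match is None:
        raise ValueError("not a man: URI: " + uri)
    return match["name"], match["section"]

print(parse_man_uri("man:ls(1)"))
print(parse_man_uri("man:ls"))
```

A caller could pass the result to man(1) as "man section name" when a
section is present, and as "man name" otherwise.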
207
208 info - Info page documentation
209
210 info:virtual-filename
211 info:virtual-filename#nodename
212 info:(virtual-filename)
213 info:(virtual-filename)nodename
214
215 This scheme refers to online info reference pages (generated from tex‐
216 info files), a documentation format used by programs such as the GNU
217 tools. This URI scheme is unique to Unix-like systems (such as Linux)
218 and is not currently registered by the IETF. As of this writing, GNOME
219 and KDE differ in their URI syntax and do not accept the other's syn‐
220 tax. The first two formats are the GNOME format; in nodenames all spa‐
221 ces are written as underscores. The second two formats are the KDE
222 format; spaces in nodenames must be written as spaces, even though this
223 is forbidden by the URI standards. It's hoped that in the future most
224 tools will understand all of these formats and will always accept
225 underscores for spaces in nodenames. In both GNOME and KDE, if the
226 form without the nodename is used the nodename is assumed to be "Top".
227 Examples of the GNOME format are <info:gcc> and <info:gcc#G++_and_GCC>.
228 Examples of the KDE format are <info:(gcc)> and <info:(gcc)G++ and
229 GCC>.
230
231 whatis - Documentation search
232
233 whatis:string
234
235 This scheme searches the database of short (one-line) descriptions of
236 commands and returns a list of descriptions containing that string.
237 Only complete word matches are returned. See whatis(1). This URI
238 scheme is unique to Unix-like systems (such as Linux) and is not cur‐
239 rently registered by the IETF.
240
241 ghelp - GNOME help documentation
242
243 ghelp:name-of-application
244
245 This loads GNOME help for the given application. Note that not much
246 documentation currently exists in this format.
247
248 ldap - Lightweight Directory Access Protocol
249
250 ldap://hostport
251 ldap://hostport/
252 ldap://hostport/dn
253 ldap://hostport/dn?attributes
254 ldap://hostport/dn?attributes?scope
255 ldap://hostport/dn?attributes?scope?filter
256 ldap://hostport/dn?attributes?scope?filter?extensions
257
This scheme supports queries to the Lightweight Directory Access
Protocol (LDAP), a protocol for querying a set of servers for
hierarchically organized information (such as people and computing
resources). More information on the LDAP URL scheme is available in
RFC 2255 ⟨http://www.ietf.org/rfc/rfc2255.txt⟩. The components of this
URL are:

hostport the LDAP server to query, written as a hostname optionally
followed by a colon and the port number. The default LDAP
port is TCP port 389. If empty, the client determines
which LDAP server to use.
268
269 dn the LDAP Distinguished Name, which identifies the base
270 object of the LDAP search (see RFC 2253
271 ⟨http://www.ietf.org/rfc/rfc2253.txt⟩ section 3).
272
273 attributes a comma-separated list of attributes to be returned; see
274 RFC 2251 section 4.1.5. If omitted, all attributes should
275 be returned.
276
277 scope specifies the scope of the search, which can be one of
278 "base" (for a base object search), "one" (for a one-level
279 search), or "sub" (for a subtree search). If scope is
280 omitted, "base" is assumed.
281
282 filter specifies the search filter (subset of entries to return).
283 If omitted, all entries should be returned. See RFC 2254
284 ⟨http://www.ietf.org/rfc/rfc2254.txt⟩ section 4.
285
286 extensions a comma-separated list of type=value pairs, where the
287 =value portion may be omitted for options not requiring it.
288 An extension prefixed with a '!' is critical (must be sup‐
289 ported to be valid), otherwise it is noncritical
290 (optional).
291
292 LDAP queries are easiest to explain by example. Here's a query that
293 asks ldap.itd.umich.edu for information about the University of Michi‐
294 gan in the U.S.:
295
296 ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US
297
298 To just get its postal address attribute, request:
299
300 ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US?postalAddress
301
To ask host.com at port 6666 for information about the person with
common name (cn) "Babs Jensen" at the University of Michigan, request:
304
305 ldap://host.com:6666/o=University%20of%20Michigan,c=US??sub?(cn=Babs%20Jensen)
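
The component layout described above can be sketched in Python. The
function parse_ldap_url is a hypothetical helper written for this
example; it applies the defaults stated in the text (port 389, scope
"base") and does not validate the filter or extensions.

```python
from urllib.parse import unquote, urlsplit

def parse_ldap_url(url):
    """Split an ldap: URL into the components described above."""
    parts = urlsplit(url)
    if parts.scheme != "ldap":
        raise ValueError("not an ldap: URL")
    # Fields after the dn are separated by "?"; missing ones are empty.
    fields = (parts.query.split("?") + [""] * 4)[:4]
    attributes, scope, ldap_filter, extensions = fields
    return {
        "host": parts.hostname,
        "port": parts.port or 389,
        "dn": unquote(parts.path.lstrip("/")),
        "attributes": [unquote(a) for a in attributes.split(",")] if attributes else [],
        "scope": unquote(scope) or "base",
        "filter": unquote(ldap_filter),
        "extensions": [unquote(e) for e in extensions.split(",")] if extensions else [],
    }

person = parse_ldap_url(
    "ldap://host.com:6666/o=University%20of%20Michigan,c=US??sub?(cn=Babs%20Jensen)"
)
address = parse_ldap_url(
    "ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US?postalAddress"
)
print(person)
print(address)
```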
306
307 wais - Wide Area Information Servers
308
309 wais://hostport/database
310 wais://hostport/database?search
311 wais://hostport/database/wtype/wpath
312
313 This scheme designates a WAIS database, search, or document (see IETF
314 RFC 1625 ⟨http://www.ietf.org/rfc/rfc1625.txt⟩ for more information on
315 WAIS). Hostport is the hostname, optionally followed by a colon and
316 port number (the default port number is 210).
317
318 The first form designates a WAIS database for searching. The second
319 form designates a particular search of the WAIS database database. The
320 third form designates a particular document within a WAIS database to
321 be retrieved. wtype is the WAIS designation of the type of the object
322 and wpath is the WAIS document-id.
323
324 other schemes
325
326 There are many other URI schemes. Most tools that accept URIs support
327 a set of internal URIs (e.g., Mozilla has the about: scheme for inter‐
328 nal information, and the GNOME help browser has the toc: scheme for
329 various starting locations). There are many schemes that have been
330 defined but are not as widely used at the current time (e.g., pros‐
331 pero). The nntp: scheme is deprecated in favor of the news: scheme.
332 URNs are to be supported by the urn: scheme, with a hierarchical name
333 space (e.g., urn:ietf:... would identify IETF documents); at this time
334 URNs are not widely implemented. Not all tools support all schemes.
335
336 Character Encoding
337 URIs use a limited number of characters so that they can be typed in
338 and used in a variety of situations.
339
340 The following characters are reserved, that is, they may appear in a
341 URI but their use is limited to their reserved purpose (conflicting
342 data must be escaped before forming the URI):
343
344 ; / ? : @ & = + $ ,
345
346 Unreserved characters may be included in a URI. Unreserved characters
347 include upper and lower case English letters, decimal digits, and the
348 following limited set of punctuation marks and symbols:
349
350 - _ . ! ~ * ' ( )
351
352 All other characters must be escaped. An escaped octet is encoded as a
353 character triplet, consisting of the percent character "%" followed by
354 the two hexadecimal digits representing the octet code (you can use
355 upper or lower case letters for the hexadecimal digits). For example,
356 a blank space must be escaped as "%20", a tab character as "%09", and
357 the "&" as "%26". Because the percent "%" character always has the
358 reserved purpose of being the escape indicator, it must be escaped as
359 "%25". It is common practice to escape space characters as the plus
360 symbol (+) in query text; this practice isn't uniformly defined in the
361 relevant RFCs (which recommend %20 instead) but any tool accepting URIs
362 with query text should be prepared for them. A URI is always shown in
363 its "escaped" form.
364
365 Unreserved characters can be escaped without changing the semantics of
366 the URI, but this should not be done unless the URI is being used in a
367 context that does not allow the unescaped character to appear. For
368 example, "%7e" is sometimes used instead of "~" in an HTTP URL path,
369 but the two are equivalent for an HTTP URL.
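
The escaping rules above map onto Python's urllib.parse functions, as
this sketch shows. The sample strings are arbitrary.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Characters that must be escaped become "%" plus two hex digits.
escaped = [quote(" "), quote("\t"), quote("&"), quote("%")]
print(escaped)

# In query text, spaces are commonly written as "+" instead of "%20".
plus_form = quote_plus("two words")
print(plus_form)

# Only the query-aware decoder turns "+" back into a space.
print(unquote_plus(plus_form), unquote(plus_form))

# Hex digits may be upper or lower case; "%7e" and "~" are equivalent.
tilde_lower, tilde_upper = unquote("%7e"), unquote("%7E")
print(tilde_lower, tilde_upper)
```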
370
371 For URIs which must handle characters outside the US ASCII character
372 set, the HTML 4.01 specification (section B.2) and IETF RFC 2718 (sec‐
373 tion 2.2.5) recommend the following approach:
374
375 1. translate the character sequences into UTF-8 (IETF RFC 2279) — see
376 utf-8(7) — and then
377
378 2. use the URI escaping mechanism, that is, use the %HH encoding for
379 unsafe octets.
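
The two steps above can be written out in Python. The helper
escape_non_ascii is a hypothetical name used for illustration; it
escapes only the non-ASCII characters and leaves everything else alone.
The library function urllib.parse.quote follows the same two steps by
default.

```python
from urllib.parse import quote

def escape_non_ascii(text):
    """Encode non-ASCII characters as UTF-8, then as %HH triplets."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            # Step 1: UTF-8 octets; step 2: %HH for each octet.
            out.extend("%{:02X}".format(octet) for octet in ch.encode("utf-8"))
    return "".join(out)

by_hand = escape_non_ascii("café")
by_library = quote("café")
print(by_hand, by_library)
```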
380
381 Writing a URI
When written, URIs should be placed inside double quotes (e.g.,
"http://www.kernelnotes.org"), enclosed in angle brackets (e.g.,
<http://lwn.net>), or placed on a line by themselves. A warning for
those who use double quotes: never move extraneous punctuation (such as
the period ending a sentence or the comma in a list) inside a URI,
since this will change the value of the URI. Instead, use angle
brackets, or switch to a quoting system that never includes extraneous
characters inside quotation marks. This latter system, called the
'new' or 'logical' quoting system by "Hart's Rules" and the "Oxford
Dictionary for Writers and Editors", is preferred practice in Great
Britain and among hackers worldwide (see the Jargon File's section on
Hacker Writing Style,
http://www.fwi.uva.nl/~mes/jargon/h/HackerWritingStyle.html, for more
information). Older documents suggested inserting the prefix "URL:"
just before the URI, but this form has never caught on.
397
398 The URI syntax was designed to be unambiguous. However, as URIs have
399 become commonplace, traditional media (television, radio, newspapers,
400 billboards, etc.) have increasingly used abbreviated URI references
401 consisting of only the authority and path portions of the identified
402 resource (e.g., <www.w3.org/Addressing>). Such references are primar‐
403 ily intended for human interpretation rather than machine, with the
404 assumption that context-based heuristics are sufficient to complete the
405 URI (e.g., hostnames beginning with "www" are likely to have a URI pre‐
406 fix of "http://" and hostnames beginning with "ftp" likely to have a
407 prefix of "ftp://"). Many client implementations heuristically resolve
408 these references. Such heuristics may change over time, particularly
409 when new schemes are introduced. Since an abbreviated URI has the same
410 syntax as a relative URL path, abbreviated URI references cannot be
411 used where relative URIs are permitted, and can only be used when there
412 is no defined base (such as in dialog boxes). Don't use abbreviated
413 URIs as hypertext links inside a document; use the standard format as
414 described here.
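
The kind of heuristic a client might apply to abbreviated references can
be sketched in Python. The function complete_abbreviated is a
hypothetical helper that implements only the two guesses mentioned
above; real clients use their own, changing, rules.

```python
def complete_abbreviated(ref):
    """Guess a scheme for an abbreviated reference, or return None."""
    if "://" in ref:
        return ref  # already has a scheme and authority
    host = ref.split("/", 1)[0].lower()
    if host.startswith("www"):
        return "http://" + ref
    if host.startswith("ftp"):
        return "ftp://" + ref
    return None  # no guess; the reference stays ambiguous

print(complete_abbreviated("www.w3.org/Addressing"))
print(complete_abbreviated("ftp.is.co.za/rfc/rfc1808.txt"))
print(complete_abbreviated("docs/readme.txt"))
```

The last call returns None because the string is indistinguishable from
a relative path, which is why such references should not appear where
relative URIs are permitted.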
415
CONFORMING TO
http://www.ietf.org/rfc/rfc2396.txt (IETF RFC 2396),
http://www.w3.org/TR/REC-html40 (HTML 4.0).
419
NOTES
Any tool accepting URIs (e.g., a web browser) on a Linux system should
be able to handle (directly or indirectly) all of the schemes described
here, including the man: and info: schemes. Handling them by invoking
some other program is fine and in fact encouraged.
425
426 Technically the fragment isn't part of the URI.
427
428 For information on how to embed URIs (including URLs) in a data format,
429 see documentation on that format. HTML uses the format <A HREF="uri">
430 text </A>. Texinfo files use the format @uref{uri}. Man and mdoc have
431 the recently added UR macro, or just include the URI in the text (view‐
432 ers should be able to detect :// as part of a URI).
433
434 The GNOME and KDE desktop environments currently vary in the URIs they
435 accept, in particular in their respective help browsers. To list man
436 pages, GNOME uses <toc:man> while KDE uses <man:(index)>, and to list
437 info pages, GNOME uses <toc:info> while KDE uses <info:(dir)> (the
438 author of this man page prefers the KDE approach here, though a more
regular format would be even better). In general, KDE uses
<file:/cgi-bin/> as a prefix to a set of generated files. KDE prefers
documentation in HTML, accessed via <file:/cgi-bin/helpindex>. GNOME
prefers the ghelp scheme to store and find documentation. Neither
443 browser handles file: references to directories at the time of this
444 writing, making it difficult to refer to an entire directory with a
445 browsable URI. As noted above, these environments differ in how they
446 handle the info: scheme, probably the most important variation. It is
447 expected that GNOME and KDE will converge to common URI formats, and a
448 future version of this man page will describe the converged result.
449 Efforts to aid this convergence are encouraged.
450
451 Security
452 A URI does not in itself pose a security threat. There is no general
453 guarantee that a URL, which at one time located a given resource, will
454 continue to do so. Nor is there any guarantee that a URL will not
455 locate a different resource at some later point in time; such a guaran‐
456 tee can only be obtained from the person(s) controlling that namespace
457 and the resource in question.
458
459 It is sometimes possible to construct a URL such that an attempt to
460 perform a seemingly harmless operation, such as the retrieval of an
461 entity associated with the resource, will in fact cause a possibly dam‐
462 aging remote operation to occur. The unsafe URL is typically con‐
463 structed by specifying a port number other than that reserved for the
464 network protocol in question. The client unwittingly contacts a site
465 that is in fact running a different protocol. The content of the URL
466 contains instructions that, when interpreted according to this other
467 protocol, cause an unexpected operation. An example has been the use
468 of a gopher URL to cause an unintended or impersonating message to be
469 sent via a SMTP server.
470
471 Caution should be used when using any URL that specifies a port number
472 other than the default for the protocol, especially when it is a number
473 within the reserved space.
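
A client could flag such URLs before following them. The Python sketch
below is one possible check; port_warning is a hypothetical helper, the
default ports are those listed earlier in this page, and treating
"reserved" as "below 1024" is an assumption made for the example.

```python
from urllib.parse import urlsplit

# Default ports as listed in the scheme descriptions above.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "gopher": 70,
                 "telnet": 23, "ldap": 389, "wais": 210}

def port_warning(url):
    """Return a warning string for a non-default port, else None."""
    parts = urlsplit(url)
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        return None
    if port < 1024:  # assumption: "reserved space" means ports below 1024
        return "non-default port in reserved range"
    return "non-default port"

print(port_warning("http://lwn.net"))
print(port_warning("http://xyz.com:8080/"))
print(port_warning("gopher://example.com:25/"))
```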
474
Care should be taken when a URI contains escaped delimiters for a given
protocol (for example, CR and LF characters for telnet protocols) that
these are not unescaped before transmission. This might violate the
protocol, but avoids the potential for such characters to be used to
simulate an extra operation or parameter in that protocol, which might
cause an unexpected and possibly harmful remote operation to be
performed.
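
One way to notice such URIs is to look for the escaped forms of CR and
LF before doing anything else with the string. This Python sketch uses a
hypothetical helper, has_escaped_line_break, and a made-up gopher URL.

```python
import re

# "%0D" is CR and "%0A" is LF; hex digits may be in either case.
ESCAPED_CRLF = re.compile(r"%0[ad]", re.IGNORECASE)

def has_escaped_line_break(uri):
    """True if the URI contains an escaped CR or LF."""
    return bool(ESCAPED_CRLF.search(uri))

# Hypothetical URL that smuggles a line break into the selector.
suspicious = "gopher://example.com:25/1HELO%0D%0A"
print(has_escaped_line_break(suspicious))
print(has_escaped_line_break("http://lwn.net/"))
```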
482
483 It is clearly unwise to use a URI that contains a password which is
484 intended to be secret. In particular, the use of a password within the
485 "userinfo" component of a URI is strongly recommended against except in
486 those rare cases where the "password" parameter is intended to be pub‐
487 lic.
488
BUGS
Documentation may be placed in a variety of locations, so there
currently isn't a good URI scheme for general online documentation in
492 arbitrary formats. References of the form <file:///usr/doc/ZZZ> don't
493 work because different distributions and local installation require‐
494 ments may place the files in different directories (it may be in
495 /usr/doc, or /usr/local/doc, or /usr/share, or somewhere else). Also,
496 the directory ZZZ usually changes when a version changes (though file‐
497 name globbing could partially overcome this). Finally, using the file:
498 scheme doesn't easily support people who dynamically load documentation
499 from the Internet (instead of loading the files onto a local file sys‐
500 tem). A future URI scheme may be added (e.g., "userdoc:") to permit
501 programs to include cross-references to more detailed documentation
502 without having to know the exact location of that documentation.
503 Alternatively, a future version of the file-system specification may
504 specify file locations sufficiently so that the file: scheme will be
505 able to locate documentation.
506
507 Many programs and file formats don't include a way to incorporate or
508 implement links using URIs.
509
510 Many programs can't handle all of these different URI formats; there
511 should be a standard mechanism to load an arbitrary URI that automati‐
512 cally detects the users' environment (e.g., text or graphics, desktop
513 environment, local user preferences, and currently executing tools) and
514 invokes the right tool for any URI.
515
SEE ALSO
lynx(1), man2html(1), mailaddr(7), utf-8(7), IETF RFC 2255
⟨http://www.ietf.org/rfc/rfc2255.txt⟩
519
COLOPHON
This page is part of release 3.25 of the Linux man-pages project. A
description of the project, and information about reporting bugs, can
be found at http://www.kernel.org/doc/man-pages/.

Linux 2000-03-14 URI(7)