uri(7) Miscellaneous Information Manual uri(7)
2
3
4
NAME
6 uri, url, urn - uniform resource identifier (URI), including a URL or
7 URN
8
SYNOPSIS
10 URI = [ absoluteURI | relativeURI ] [ "#" fragment ]
11
12 absoluteURI = scheme ":" ( hierarchical_part | opaque_part )
13
14 relativeURI = ( net_path | absolute_path | relative_path )
15 [ "?" query ]
16
scheme = "http" | "ftp" | "gopher" | "mailto" | "news" | "telnet" |
"file" | "man" | "info" | "whatis" | "ldap" | "wais" |
...
20
21 hierarchical_part = ( net_path | absolute_path ) [ "?" query ]
22
23 net_path = "//" authority [ absolute_path ]
24
25 absolute_path = "/" path_segments
26
27 relative_path = relative_segment [ absolute_path ]
28
DESCRIPTION
30 A Uniform Resource Identifier (URI) is a short string of characters
31 identifying an abstract or physical resource (for example, a web page).
32 A Uniform Resource Locator (URL) is a URI that identifies a resource
33 through its primary access mechanism (e.g., its network "location"),
34 rather than by name or some other attribute of that resource. A Uni‐
35 form Resource Name (URN) is a URI that must remain globally unique and
36 persistent even when the resource ceases to exist or becomes unavail‐
37 able.
38
39 URIs are the standard way to name hypertext link destinations for tools
40 such as web browsers. The string "http://www.kernel.org" is a URL (and
41 thus it is also a URI). Many people use the term URL loosely as a syn‐
42 onym for URI (though technically URLs are a subset of URIs).
43
44 URIs can be absolute or relative. An absolute identifier refers to a
45 resource independent of context, while a relative identifier refers to
46 a resource by describing the difference from the current context.
47 Within a relative path reference, the complete path segments "." and
48 ".." have special meanings: "the current hierarchy level" and "the
49 level above this hierarchy level", respectively, just like they do in
50 UNIX-like systems. A path segment which contains a colon character
51 can't be used as the first segment of a relative URI path (e.g.,
52 "this:that"), because it would be mistaken for a scheme name; precede
53 such segments with ./ (e.g., "./this:that"). Note that descendants of
54 MS-DOS (e.g., Microsoft Windows) replace devicename colons with the
55 vertical bar ("|") in URIs, so "C:" becomes "C|".
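
The rules for relative references can be tried with Python's standard urllib.parse module. The following is a minimal sketch, not a reference implementation; the base URI and the file names are assumptions made up for illustration.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical base URI, used only for illustration.
BASE = "http://example.com/a/b/c.html"

# ".." climbs one hierarchy level; "." stays at the current level.
up_one = urljoin(BASE, "../d.html")
same_level = urljoin(BASE, "./e.html")

# A first path segment containing a colon is parsed as a scheme name ...
mistaken_scheme = urlsplit("this:that").scheme

# ... unless the segment is preceded by "./".
colon_segment = urljoin(BASE, "./this:that")

print(up_one)
print(same_level)
print(mistaken_scheme)
print(colon_segment)
```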
56
57 A fragment identifier, if included, refers to a particular named por‐
58 tion (fragment) of a resource; text after a '#' identifies the frag‐
59 ment. A URI beginning with '#' refers to that fragment in the current
60 resource.
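
As a brief illustration of fragment handling, the sketch below separates a fragment from a URI and resolves a reference that consists only of a fragment. It uses Python's urllib.parse; the document URI and the fragment names are hypothetical.

```python
from urllib.parse import urldefrag, urljoin

# Hypothetical document URI carrying a fragment identifier.
uri = "http://example.com/doc.html#section2"

# Split the fragment off the rest of the URI.
resource, fragment = urldefrag(uri)

# A reference of the form "#name" stays within the current resource.
same_doc = urljoin(resource, "#top")

print(resource, fragment, same_doc)
```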
61
62 Usage
63 There are many different URI schemes, each with specific additional
64 rules and meanings, but they are intentionally made to be as similar as
65 possible. For example, many URL schemes permit the authority to be the
66 following format, called here an ip_server (square brackets show what's
67 optional):
68
69 ip_server = [user [ : password ] @ ] host [ : port]
70
71 This format allows you to optionally insert a username, a user plus
72 password, and/or a port number. The host is the name of the host com‐
73 puter, either its name as determined by DNS or an IP address (numbers
74 separated by periods). Thus the URI <http://fred:fredpassword@exam‐
75 ple.com:8080/> logs into a web server on host example.com as fred (us‐
76 ing fredpassword) using port 8080. Avoid including a password in a URI
77 if possible because of the many security risks of having a password
78 written down. If the URL supplies a username but no password, and the
79 remote server requests a password, the program interpreting the URL
80 should request one from the user.
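
The ip_server components can be picked apart programmatically. The sketch below uses Python's urllib.parse on the example URI from the text; it only parses the URI and does not contact any server.

```python
from urllib.parse import urlsplit

# The example URI from the text; the password is shown only because
# the text uses it, and embedding one is discouraged.
parts = urlsplit("http://fred:fredpassword@example.com:8080/")
print(parts.username, parts.password, parts.hostname, parts.port)

# When user and port are omitted, the parser reports None for them,
# and the scheme's default port applies.
bare = urlsplit("http://example.com/")
print(bare.username, bare.port)
```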
81
82 Here are some of the most common schemes in use on UNIX-like systems
83 that are understood by many tools. Note that many tools using URIs
84 also have internal schemes or specialized schemes; see those tools'
85 documentation for information on those schemes.
86
87 http - Web (HTTP) server
88
89 http://ip_server/path
90 http://ip_server/path?query
91
92 This is a URL accessing a web (HTTP) server. The default port is 80.
93 If the path refers to a directory, the web server will choose what to
94 return; usually if there is a file named "index.html" or "index.htm"
95 its content is returned, otherwise, a list of the files in the current
96 directory (with appropriate links) is generated and returned. An exam‐
97 ple is <http://lwn.net>.
98
99 A query can be given in the archaic "isindex" format, consisting of a
100 word or phrase and not including an equal sign (=). A query can also
101 be in the longer "GET" format, which has one or more query entries of
102 the form key=value separated by the ampersand character (&). Note that
103 key can be repeated more than once, though it's up to the web server
104 and its application programs to determine if there's any meaning to
105 that. There is an unfortunate interaction with HTML/XML/SGML and the
106 GET query format; when such URIs with more than one key are embedded in
SGML/XML documents (including HTML), the ampersand (&) has to be
rewritten as &amp;. Note that not all queries use this format; larger
109 forms may be too long to store as a URI, so they use a different inter‐
110 action mechanism (called POST) which does not include the data in the
111 URI. See the Common Gateway Interface specification at
112 ⟨http://www.w3.org/CGI⟩ for more information.
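
The key=value query format, a repeated key, and the ampersand rewriting needed inside HTML can be sketched with Python's standard library. The URL and the key names below are assumptions chosen for illustration.

```python
from html import escape
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical URL whose query repeats the key "key".
url = "http://example.com/search?key=a&key=b&lang=en"

# Repeated keys are collected into a list of values.
params = parse_qs(urlsplit(url).query)

# Building a query from (key, value) pairs keeps repeated keys.
rebuilt = urlencode([("key", "a"), ("key", "b"), ("lang", "en")])

# Before embedding the URL in HTML, "&" must be written as "&amp;".
html_attr = escape(url)

print(params)
print(rebuilt)
print(html_attr)
```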
113
114 ftp - File Transfer Protocol (FTP)
115
116 ftp://ip_server/path
117
118 This is a URL accessing a file through the file transfer protocol
119 (FTP). The default port (for control) is 21. If no username is in‐
120 cluded, the username "anonymous" is supplied, and in that case many
121 clients provide as the password the requestor's Internet email address.
122 An example is <ftp://ftp.is.co.za/rfc/rfc1808.txt>.
123
124 gopher - Gopher server
125
126 gopher://ip_server/gophertype selector
127 gopher://ip_server/gophertype selector%09search
128 gopher://ip_server/gophertype selector%09search%09gopher+_string
129
130 The default gopher port is 70. gophertype is a single-character field
131 to denote the Gopher type of the resource to which the URL refers. The
132 entire path may also be empty, in which case the delimiting "/" is also
133 optional and the gophertype defaults to "1".
134
135 selector is the Gopher selector string. In the Gopher protocol, Gopher
136 selector strings are a sequence of octets which may contain any octets
137 except 09 hexadecimal (US-ASCII HT or tab), 0A hexadecimal (US-ASCII
138 character LF), and 0D (US-ASCII character CR).
139
140 mailto - Email address
141
142 mailto:email-address
143
144 This is an email address, usually of the form name@hostname. See
145 mailaddr(7) for more information on the correct format of an email ad‐
146 dress. Note that any % character must be rewritten as %25. An example
147 is <mailto:dwheeler@dwheeler.com>.
148
149 news - Newsgroup or News message
150
151 news:newsgroup-name
152 news:message-id
153
154 A newsgroup-name is a period-delimited hierarchical name, such as
155 "comp.infosystems.www.misc". If <newsgroup-name> is "*" (as in
156 <news:*>), it is used to refer to "all available news groups". An ex‐
157 ample is <news:comp.lang.ada>.
158
159 A message-id corresponds to the Message-ID of IETF RFC 1036,
160 ⟨http://www.ietf.org/rfc/rfc1036.txt⟩ without the enclosing "<" and
161 ">"; it takes the form unique@full_domain_name. A message identifier
162 may be distinguished from a news group name by the presence of the "@"
163 character.
164
165 telnet - Telnet login
166
167 telnet://ip_server/
168
169 The Telnet URL scheme is used to designate interactive text services
170 that may be accessed by the Telnet protocol. The final "/" character
171 may be omitted. The default port is 23. An example is <tel‐
172 net://melvyl.ucop.edu/>.
173
174 file - Normal file
175
176 file://ip_server/path_segments
177 file:path_segments
178
179 This represents a file or directory accessible locally. As a special
180 case, ip_server can be the string "localhost" or the empty string; this
181 is interpreted as "the machine from which the URL is being inter‐
182 preted". If the path is to a directory, the viewer should display the
183 directory's contents with links to each containee; not all viewers cur‐
184 rently do this. KDE supports generated files through the URL
185 <file:/cgi-bin>. If the given file isn't found, browser writers may
186 want to try to expand the filename via filename globbing (see glob(7)
187 and glob(3)).
188
189 The second format (e.g., <file:/etc/passwd>) is a correct format for
190 referring to a local file. However, older standards did not permit
191 this format, and some programs don't recognize this as a URI. A more
192 portable syntax is to use an empty string as the server name, for exam‐
193 ple, <file:///etc/passwd>; this form does the same thing and is easily
194 recognized by pattern matchers and older programs as a URI. Note that
195 if you really mean to say "start from the current location", don't
196 specify the scheme at all; use a relative address like <../test.txt>,
197 which has the side-effect of being scheme-independent. An example of
198 this scheme is <file:///etc/passwd>.
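
To see that the different file: forms name the same path, they can be compared with a generic URI parser. This sketch uses Python's urllib.parse and only parses the strings; it does not open any file.

```python
from urllib.parse import urlsplit

short_form = urlsplit("file:/etc/passwd")
portable_form = urlsplit("file:///etc/passwd")
localhost_form = urlsplit("file://localhost/etc/passwd")

# All three forms carry the same path; only the server part differs.
for form in (short_form, portable_form, localhost_form):
    print(repr(form.netloc), form.path)
```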
199
200 man - Man page documentation
201
202 man:command-name
203 man:command-name(section)
204
205 This refers to local online manual (man) reference pages. The command
206 name can optionally be followed by a parenthesis and section number;
207 see man(7) for more information on the meaning of the section numbers.
208 This URI scheme is unique to UNIX-like systems (such as Linux) and is
209 not currently registered by the IETF. An example is <man:ls(1)>.
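
A tool that wants to handle man: URIs by invoking another program first has to split the command name from the optional section. The following is one possible sketch; the function name parse_man_uri and the regular expression are assumptions, not part of any standard.

```python
import re

# Matches "man:name" or "man:name(section)"; an illustrative pattern only.
MAN_URI = re.compile(r"^man:(?P<name>[^()]+)(?:\((?P<section>[^()]+)\))?$")


def parse_man_uri(uri):
    """Return (name, section); section is None when it is not given."""
    match = MAN_URI.match(uri)
    if match is None:
        raise ValueError("not a man: URI: %r" % uri)
    return match.group("name"), match.group("section")


print(parse_man_uri("man:ls(1)"))
print(parse_man_uri("man:ls"))
```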
210
211 info - Info page documentation
212
213 info:virtual-filename
214 info:virtual-filename#nodename
215 info:(virtual-filename)
216 info:(virtual-filename)nodename
217
218 This scheme refers to online info reference pages (generated from tex‐
219 info files), a documentation format used by programs such as the GNU
220 tools. This URI scheme is unique to UNIX-like systems (such as Linux)
221 and is not currently registered by the IETF. As of this writing, GNOME
222 and KDE differ in their URI syntax and do not accept the other's syn‐
223 tax. The first two formats are the GNOME format; in nodenames all spa‐
224 ces are written as underscores. The second two formats are the KDE
225 format; spaces in nodenames must be written as spaces, even though this
226 is forbidden by the URI standards. It's hoped that in the future most
227 tools will understand all of these formats and will always accept un‐
228 derscores for spaces in nodenames. In both GNOME and KDE, if the form
229 without the nodename is used the nodename is assumed to be "Top". Ex‐
230 amples of the GNOME format are <info:gcc> and <info:gcc#G++_and_GCC>.
231 Examples of the KDE format are <info:(gcc)> and <info:(gcc)G++ and
232 GCC>.
233
234 whatis - Documentation search
235
236 whatis:string
237
238 This scheme searches the database of short (one-line) descriptions of
239 commands and returns a list of descriptions containing that string.
240 Only complete word matches are returned. See whatis(1). This URI
241 scheme is unique to UNIX-like systems (such as Linux) and is not cur‐
242 rently registered by the IETF.
243
244 ghelp - GNOME help documentation
245
246 ghelp:name-of-application
247
248 This loads GNOME help for the given application. Note that not much
249 documentation currently exists in this format.
250
251 ldap - Lightweight Directory Access Protocol
252
253 ldap://hostport
254 ldap://hostport/
255 ldap://hostport/dn
256 ldap://hostport/dn?attributes
257 ldap://hostport/dn?attributes?scope
258 ldap://hostport/dn?attributes?scope?filter
259 ldap://hostport/dn?attributes?scope?filter?extensions
260
261 This scheme supports queries to the Lightweight Directory Access Proto‐
262 col (LDAP), a protocol for querying a set of servers for hierarchically
263 organized information (such as people and computing resources). See
264 RFC 2255 ⟨http://www.ietf.org/rfc/rfc2255.txt⟩ for more information on
265 the LDAP URL scheme. The components of this URL are:
266
267 hostport
the LDAP server to query, written as a hostname optionally followed
by a colon and the port number. The default LDAP port is
TCP port 389. If empty, the client determines which LDAP
server to use.
272
dn the LDAP Distinguished Name, which identifies the base object of
the LDAP search (see RFC 2253
⟨http://www.ietf.org/rfc/rfc2253.txt⟩ section 3).
276
277 attributes
278 a comma-separated list of attributes to be returned; see
279 RFC 2251 section 4.1.5. If omitted, all attributes should be
280 returned.
281
282 scope specifies the scope of the search, which can be one of "base"
283 (for a base object search), "one" (for a one-level search), or
284 "sub" (for a subtree search). If scope is omitted, "base" is
285 assumed.
286
287 filter specifies the search filter (subset of entries to return). If
288 omitted, all entries should be returned. See RFC 2254
289 ⟨http://www.ietf.org/rfc/rfc2254.txt⟩ section 4.
290
291 extensions
292 a comma-separated list of type=value pairs, where the =value
293 portion may be omitted for options not requiring it. An exten‐
294 sion prefixed with a '!' is critical (must be supported to be
295 valid), otherwise it is noncritical (optional).
296
297 LDAP queries are easiest to explain by example. Here's a query that
298 asks ldap.itd.umich.edu for information about the University of Michi‐
299 gan in the U.S.:
300
301 ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US
302
303 To just get its postal address attribute, request:
304
305 ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US?postalAddress
306
To ask host.com at port 6666 for information about the person with
common name (cn) "Babs Jensen" at University of Michigan, request:
309
310 ldap://host.com:6666/o=University%20of%20Michigan,c=US??sub?(cn=Babs%20Jensen)
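
The "?"-separated components of an LDAP URL can be split apart as in the sketch below. It is a simplified illustration built on Python's urllib.parse, not a complete RFC 2255 parser; the function name parse_ldap_url and the dictionary layout are assumptions. The defaults (port 389, scope "base") follow the component descriptions above.

```python
from urllib.parse import unquote, urlsplit


def parse_ldap_url(url):
    """Split an ldap: URL into its components (simplified sketch)."""
    parts = urlsplit(url)
    if parts.scheme != "ldap":
        raise ValueError("not an ldap: URL")
    # After the first "?", up to four fields are separated by "?".
    fields = parts.query.split("?") if parts.query else []
    fields += [""] * (4 - len(fields))
    attributes, scope, ldap_filter, extensions = fields[:4]
    return {
        "host": parts.hostname,
        "port": parts.port or 389,
        "dn": unquote(parts.path.lstrip("/")),
        "attributes": [unquote(a) for a in attributes.split(",")] if attributes else [],
        "scope": scope or "base",
        "filter": unquote(ldap_filter),
        "extensions": extensions.split(",") if extensions else [],
    }


babs = parse_ldap_url(
    "ldap://host.com:6666/o=University%20of%20Michigan,c=US??sub?(cn=Babs%20Jensen)"
)
print(babs)
```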
311
312 wais - Wide Area Information Servers
313
314 wais://hostport/database
315 wais://hostport/database?search
316 wais://hostport/database/wtype/wpath
317
318 This scheme designates a WAIS database, search, or document (see IETF
319 RFC 1625 ⟨http://www.ietf.org/rfc/rfc1625.txt⟩ for more information on
320 WAIS). Hostport is the hostname, optionally followed by a colon and
321 port number (the default port number is 210).
322
The first form designates a WAIS database for searching. The second
form designates a particular search of that database. The
325 third form designates a particular document within a WAIS database to
326 be retrieved. wtype is the WAIS designation of the type of the object
327 and wpath is the WAIS document-id.
328
329 other schemes
330
331 There are many other URI schemes. Most tools that accept URIs support
332 a set of internal URIs (e.g., Mozilla has the about: scheme for inter‐
333 nal information, and the GNOME help browser has the toc: scheme for
334 various starting locations). There are many schemes that have been de‐
335 fined but are not as widely used at the current time (e.g., prospero).
336 The nntp: scheme is deprecated in favor of the news: scheme. URNs are
337 to be supported by the urn: scheme, with a hierarchical name space
338 (e.g., urn:ietf:... would identify IETF documents); at this time URNs
339 are not widely implemented. Not all tools support all schemes.
340
341 Character encoding
342 URIs use a limited number of characters so that they can be typed in
343 and used in a variety of situations.
344
345 The following characters are reserved, that is, they may appear in a
346 URI but their use is limited to their reserved purpose (conflicting
347 data must be escaped before forming the URI):
348
349 ; / ? : @ & = + $ ,
350
351 Unreserved characters may be included in a URI. Unreserved characters
352 include uppercase and lowercase Latin letters, decimal digits, and the
353 following limited set of punctuation marks and symbols:
354
355 - _ . ! ~ * ' ( )
356
357 All other characters must be escaped. An escaped octet is encoded as a
358 character triplet, consisting of the percent character "%" followed by
359 the two hexadecimal digits representing the octet code (you can use up‐
360 percase or lowercase letters for the hexadecimal digits). For example,
361 a blank space must be escaped as "%20", a tab character as "%09", and
362 the "&" as "%26". Because the percent "%" character always has the re‐
363 served purpose of being the escape indicator, it must be escaped as
364 "%25". It is common practice to escape space characters as the plus
365 symbol (+) in query text; this practice isn't uniformly defined in the
366 relevant RFCs (which recommend %20 instead) but any tool accepting URIs
367 with query text should be prepared for them. A URI is always shown in
368 its "escaped" form.
369
370 Unreserved characters can be escaped without changing the semantics of
371 the URI, but this should not be done unless the URI is being used in a
372 context that does not allow the unescaped character to appear. For ex‐
373 ample, "%7e" is sometimes used instead of "~" in an HTTP URL path, but
374 the two are equivalent for an HTTP URL.
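
The escaping rules can be exercised with the quoting helpers in Python's urllib.parse. This is a small sketch; the sample strings are arbitrary.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Space, tab, "&" and "%" are escaped as %20, %09, %26 and %25.
escaped = quote("a b\t&%", safe="")

# Hexadecimal digits may be uppercase or lowercase when unescaping.
tilde_lower = unquote("%7e")
tilde_upper = unquote("%7E")

# In query text, "+" is commonly used for a space.
plus_form = quote_plus("two words")
plus_decoded = unquote_plus("two+words")

print(escaped, tilde_lower, tilde_upper, plus_form, plus_decoded)
```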
375
376 For URIs which must handle characters outside the US ASCII character
377 set, the HTML 4.01 specification (section B.2) and IETF RFC 3986 (last
378 paragraph of section 2.5) recommend the following approach:
379
380 (1) translate the character sequences into UTF-8 (IETF RFC 3629)—see
381 utf-8(7)—and then
382
383 (2) use the URI escaping mechanism, that is, use the %HH encoding for
384 unsafe octets.
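
The two steps above can be sketched as follows, using Python's urllib.parse. The sample word is an arbitrary choice for illustration.

```python
from urllib.parse import quote, unquote

text = "caf\u00e9"  # "cafe" with an acute accent on the final letter

# Step (1): translate the characters into UTF-8 octets.
utf8_octets = text.encode("utf-8")

# Step (2): apply %HH escaping to the octets that need it.
escaped = quote(utf8_octets)

# Unescaping and decoding as UTF-8 recovers the original text.
round_trip = unquote(escaped)

print(utf8_octets, escaped, round_trip == text)
```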
385
386 Writing a URI
387 When written, URIs should be placed inside double quotes (e.g.,
388 "http://www.kernel.org"), enclosed in angle brackets (e.g.,
389 <http://lwn.net>), or placed on a line by themselves. A warning for
390 those who use double-quotes: never move extraneous punctuation (such as
391 the period ending a sentence or the comma in a list) inside a URI,
since this will change the value of the URI. Instead, use angle brackets,
or switch to a quoting system that never includes extraneous
characters inside quotation marks. This latter system, called the
395 'new' or 'logical' quoting system by "Hart's Rules" and the "Oxford
396 Dictionary for Writers and Editors", is preferred practice in Great
397 Britain and in various European languages. Older documents suggested
398 inserting the prefix "URL:" just before the URI, but this form has
399 never caught on.
400
401 The URI syntax was designed to be unambiguous. However, as URIs have
402 become commonplace, traditional media (television, radio, newspapers,
403 billboards, etc.) have increasingly used abbreviated URI references
404 consisting of only the authority and path portions of the identified
405 resource (e.g., <www.w3.org/Addressing>). Such references are primar‐
406 ily intended for human interpretation rather than machine, with the as‐
407 sumption that context-based heuristics are sufficient to complete the
408 URI (e.g., hostnames beginning with "www" are likely to have a URI pre‐
409 fix of "http://" and hostnames beginning with "ftp" likely to have a
410 prefix of "ftp://"). Many client implementations heuristically resolve
411 these references. Such heuristics may change over time, particularly
412 when new schemes are introduced. Since an abbreviated URI has the same
413 syntax as a relative URL path, abbreviated URI references cannot be
414 used where relative URIs are permitted, and can be used only when there
415 is no defined base (such as in dialog boxes). Don't use abbreviated
416 URIs as hypertext links inside a document; use the standard format as
417 described here.
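
The kind of heuristic a client might apply to an abbreviated reference can be sketched as below. This is purely illustrative: the function name expand_abbreviated is hypothetical, and falling back to "http://" for other hostnames is an assumption of this sketch, not something this page specifies.

```python
def expand_abbreviated(reference):
    """Guess a full URI for an abbreviated reference (heuristic sketch)."""
    if "://" in reference:
        return reference  # already has a scheme and authority
    host = reference.split("/", 1)[0].lower()
    if host.startswith("ftp"):
        return "ftp://" + reference
    # Hostnames beginning with "www" get "http://"; using the same prefix
    # for every other hostname is an assumption of this sketch.
    return "http://" + reference


print(expand_abbreviated("www.w3.org/Addressing"))
print(expand_abbreviated("ftp.is.co.za/rfc/rfc1808.txt"))
```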
418
STANDARDS
420 (IETF RFC 2396) ⟨http://www.ietf.org/rfc/rfc2396.txt⟩, (HTML 4.0)
421 ⟨http://www.w3.org/TR/REC-html40⟩.
422
NOTES
424 Any tool accepting URIs (e.g., a web browser) on a Linux system should
425 be able to handle (directly or indirectly) all of the schemes described
426 here, including the man: and info: schemes. Handling them by invoking
427 some other program is fine and in fact encouraged.
428
429 Technically the fragment isn't part of the URI.
430
431 For information on how to embed URIs (including URLs) in a data format,
432 see documentation on that format. HTML uses the format <A HREF="uri">
433 text </A>. Texinfo files use the format @uref{uri}. Man and mdoc have
434 the recently added UR macro, or just include the URI in the text (view‐
435 ers should be able to detect :// as part of a URI).
436
437 The GNOME and KDE desktop environments currently vary in the URIs they
438 accept, in particular in their respective help browsers. To list man
439 pages, GNOME uses <toc:man> while KDE uses <man:(index)>, and to list
info pages, GNOME uses <toc:info> while KDE uses <info:(dir)> (the author
of this man page prefers the KDE approach here, though a more regular
format would be even better). In general, KDE uses <file:/cgi-bin/>
as a prefix to a set of generated files. KDE prefers documentation
in HTML, accessed via <file:/cgi-bin/helpindex>. GNOME prefers
the ghelp scheme to store and find documentation. Neither browser
446 handles file: references to directories at the time of this writing,
447 making it difficult to refer to an entire directory with a browsable
448 URI. As noted above, these environments differ in how they handle the
449 info: scheme, probably the most important variation. It is expected
450 that GNOME and KDE will converge to common URI formats, and a future
451 version of this man page will describe the converged result. Efforts
452 to aid this convergence are encouraged.
453
454 Security
455 A URI does not in itself pose a security threat. There is no general
456 guarantee that a URL, which at one time located a given resource, will
457 continue to do so. Nor is there any guarantee that a URL will not lo‐
458 cate a different resource at some later point in time; such a guarantee
459 can be obtained only from the person(s) controlling that namespace and
460 the resource in question.
461
462 It is sometimes possible to construct a URL such that an attempt to
463 perform a seemingly harmless operation, such as the retrieval of an en‐
464 tity associated with the resource, will in fact cause a possibly damag‐
465 ing remote operation to occur. The unsafe URL is typically constructed
466 by specifying a port number other than that reserved for the network
467 protocol in question. The client unwittingly contacts a site that is
468 in fact running a different protocol. The content of the URL contains
469 instructions that, when interpreted according to this other protocol,
470 cause an unexpected operation. An example has been the use of a gopher
471 URL to cause an unintended or impersonating message to be sent via a
472 SMTP server.
473
474 Caution should be used when using any URL that specifies a port number
475 other than the default for the protocol, especially when it is a number
476 within the reserved space.
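
A client could flag URLs whose port differs from the scheme's default before acting on them. The sketch below is one possible check, using the default ports listed on this page; the function name uses_nondefault_port is hypothetical, and treating unknown schemes as unflagged is an assumption of this sketch.

```python
from urllib.parse import urlsplit

# Default ports as given in the scheme descriptions on this page.
DEFAULT_PORTS = {
    "http": 80, "ftp": 21, "gopher": 70,
    "telnet": 23, "ldap": 389, "wais": 210,
}


def uses_nondefault_port(url):
    """Return True if url names a port other than its scheme's default."""
    parts = urlsplit(url)
    default = DEFAULT_PORTS.get(parts.scheme)
    if parts.port is None or default is None:
        return False  # no explicit port, or a scheme this sketch does not know
    return parts.port != default


# A gopher URL aimed at port 25 (the SMTP port) deserves suspicion.
print(uses_nondefault_port("gopher://example.com:25/"))
print(uses_nondefault_port("http://example.com:80/"))
```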
477
Care should be taken when a URI contains escaped delimiters for a given
protocol (for example, CR and LF characters for telnet protocols) that
these are not unescaped before transmission. This might violate the
protocol, but avoids the potential for such characters to be used to
simulate an extra operation or parameter in that protocol, which might
cause an unexpected and possibly harmful remote operation to be
performed.
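
One way to notice such delimiters is to look for escaped CR and LF octets before the URI is processed any further. This is a minimal sketch; the function name has_escaped_line_break is hypothetical, and a real client would need protocol-specific checks as well.

```python
import re

# Matches %0A (LF) and %0D (CR), with either case of hexadecimal digit.
ESCAPED_CRLF = re.compile(r"%0[ad]", re.IGNORECASE)


def has_escaped_line_break(uri):
    """Return True if uri contains an escaped CR or LF octet."""
    return ESCAPED_CRLF.search(uri) is not None


print(has_escaped_line_break("telnet://example.com/%0D%0Aquit"))
print(has_escaped_line_break("http://example.com/a%20b"))
```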
485
It is clearly unwise to use a URI that contains a password which is
intended to be secret. In particular, the use of a password within the
"userinfo" component of a URI is strongly discouraged except in
those rare cases where the "password" parameter is intended to be
public.
491
BUGS
493 Documentation may be placed in a variety of locations, so there cur‐
494 rently isn't a good URI scheme for general online documentation in ar‐
495 bitrary formats. References of the form <file:///usr/doc/ZZZ> don't
work because different distributions and local installation requirements
may place the files in different directories (they may be in
/usr/doc, or /usr/local/doc, or /usr/share, or somewhere else). Also,
499 the directory ZZZ usually changes when a version changes (though file‐
500 name globbing could partially overcome this). Finally, using the file:
501 scheme doesn't easily support people who dynamically load documentation
502 from the Internet (instead of loading the files onto a local filesys‐
503 tem). A future URI scheme may be added (e.g., "userdoc:") to permit
504 programs to include cross-references to more detailed documentation
505 without having to know the exact location of that documentation. Al‐
506 ternatively, a future version of the filesystem specification may spec‐
507 ify file locations sufficiently so that the file: scheme will be able
508 to locate documentation.
509
510 Many programs and file formats don't include a way to incorporate or
511 implement links using URIs.
512
513 Many programs can't handle all of these different URI formats; there
514 should be a standard mechanism to load an arbitrary URI that automati‐
515 cally detects the users' environment (e.g., text or graphics, desktop
516 environment, local user preferences, and currently executing tools) and
517 invokes the right tool for any URI.
518
SEE ALSO
520 lynx(1), man2html(1), mailaddr(7), utf-8(7)
521
522 IETF RFC 2255 ⟨http://www.ietf.org/rfc/rfc2255.txt⟩
523
524
525
Linux man-pages 6.05 2023-04-30 uri(7)