uri_string(3)              Erlang Module Definition              uri_string(3)

uri_string - URI processing functions.

This module contains functions for parsing and handling URIs (RFC 3986)
and form-urlencoded query strings (HTML 5.2).

Parsing and serializing non-UTF-8 form-urlencoded query strings is also
supported (HTML 5.0).

A URI is an identifier consisting of a sequence of characters matching
the syntax rule named URI in RFC 3986.

The generic URI syntax consists of a hierarchical sequence of components
referred to as the scheme, authority, path, query, and fragment:

    URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
    hier-part   = "//" authority path-abempty
                / path-absolute
                / path-rootless
                / path-empty
    scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
    authority   = [ userinfo "@" ] host [ ":" port ]
    userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )

    reserved    = gen-delims / sub-delims
    gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
    sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                / "*" / "+" / "," / ";" / "="

    unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

The interpretation of a URI depends only on the characters used and not
on how those characters are represented in a network protocol.

The functions implemented by this module cover the following use cases:

* Parsing URIs into their components and returning a map
  parse/1

* Recomposing a map of URI components into a URI string
  recompose/1

* Changing inbound binary and percent-encoding of URIs
  transcode/2

* Transforming URIs into a normalized form
  normalize/1
  normalize/2

* Composing form-urlencoded query strings from a list of key-value
  pairs
  compose_query/1
  compose_query/2

* Dissecting form-urlencoded query strings into a list of key-value
  pairs
  dissect_query/1

* Decoding percent-encoded triplets in a URI map or in a specific
  component of a URI
  percent_decode/1

* Preparing and retrieving application-specific data included in URI
  components
  quote/1
  quote/2
  unquote/1

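As an illustration of how the parsing and recomposing use cases fit
together, the following is a minimal sketch that replaces the path of a
URI. The module name uri_roundtrip and the function set_path/2 are
hypothetical names chosen for this example; only uri_string:parse/1 and
uri_string:recompose/1 come from this module.

```erlang
-module(uri_roundtrip).
-export([set_path/2]).

%% Hypothetical helper (not part of uri_string): replace the path of a URI.
%% Returns the new URI string, or the error tuple from parse/1 unchanged.
set_path(URI, NewPath) ->
    case uri_string:parse(URI) of
        {error, _Reason, _Info} = Error ->
            Error;
        Map when is_map(Map) ->
            %% Only the path component is replaced; all other parsed
            %% components are passed back to recompose/1 as they are.
            uri_string:recompose(Map#{path => NewPath})
    end.
```

With this sketch, set_path("http://example.com/a?x=1", "/b") is expected
to return "http://example.com/b?x=1".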
There are four different encodings present during the handling of URIs:

* Inbound binary encoding in binaries

* Inbound percent-encoding in lists and binaries

* Outbound binary encoding in binaries

* Outbound percent-encoding in lists and binaries

Functions with a uri_string() argument accept lists, binaries, and mixed
lists (lists with binary elements) as input. All functions except
transcode/2 expect input as lists of Unicode code points, UTF-8 encoded
binaries, and UTF-8 percent-encoded URI parts ("%C3%B6" corresponds to
the Unicode character "ö").

Unless otherwise specified, the return value type and encoding are the
same as the input type and encoding. That is, binary input returns
binary output, while list input and mixed input both return list output.

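The type rule above can be checked with a small sketch. The module name
uri_types and the function same_type/1 are hypothetical; normalize/1 is
used here only as a convenient function that returns a uri_string().

```erlang
-module(uri_types).
-export([same_type/1]).

%% Hypothetical check: does normalize/1 return the same container type
%% (binary or list) that it was given? Mixed lists count as lists.
same_type(URI) ->
    Out = uri_string:normalize(URI),
    (is_binary(URI) andalso is_binary(Out)) orelse
        (is_list(URI) andalso is_list(Out)).
```

Both same_type(<<"http://example.com/a/../b">>) and
same_type("http://example.com/a/../b") are expected to return true.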
In lists there is only percent-encoding. In binaries, however, both
binary encoding and percent-encoding must be considered. transcode/2
provides the means to convert between the supported encodings: it takes
a uri_string() and a list of options specifying the inbound and outbound
encodings.

RFC 3986 does not mandate any specific character encoding; it is usually
defined by the protocol or the surrounding text. This library makes the
same assumption: binary encoding and percent-encoding are handled as one
configuration unit and cannot be set to different values.

Quoting functions are intended to be used by URI-producing applications
during the component preparation or retrieval phase, to avoid conflicts
between data and the characters used in URI syntax. Quoting functions
use percent-encoding, but with different rules than, for example,
recompose/1. It is the user's responsibility to give the quoting
functions application data only, and to use their output to build a URI
component.

Quoting functions can, for instance, be used to construct a path
component with a segment containing a '/' character that must not
collide with the '/' used as the general delimiter in the path component.

error() = {error, atom(), term()}

Error tuple indicating the type of error. Possible values of the second
component:

* invalid_character

* invalid_encoding

* invalid_input

* invalid_map

* invalid_percent_encoding

* invalid_scheme

* invalid_uri

* invalid_utf8

* missing_value

The third component is a term providing additional information about the
cause of the error.

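Because errors are returned as tuples rather than raised, callers
normally match on the result. The following is a minimal sketch; the
module name uri_errors and the function describe/1 are hypothetical, and
the assumption that a raw space makes a URI invalid follows from the
syntax rules quoted earlier.

```erlang
-module(uri_errors).
-export([describe/1]).

%% Hypothetical helper: parse a URI and describe the outcome.
describe(URI) ->
    case uri_string:parse(URI) of
        {error, Reason, Info} ->
            %% Reason is one of the atoms listed above;
            %% Info carries additional detail about the cause.
            {failed, Reason, Info};
        Map when is_map(Map) ->
            {ok, maps:keys(Map)}
    end.
```

A call such as describe("http://example.com/a b") is expected to take the
error branch, because a space is not an allowed URI character.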
uri_map() =
    #{fragment => unicode:chardata(),
      host => unicode:chardata(),
      path => unicode:chardata(),
      port => integer() >= 0 | undefined,
      query => unicode:chardata(),
      scheme => unicode:chardata(),
      userinfo => unicode:chardata()}

Map holding the main components of a URI.

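All keys of a uri_map() are optional, so code that reads a component
usually supplies a default. The following is a small sketch; the module
name uri_parts and the function host_and_port/1 are hypothetical.

```erlang
-module(uri_parts).
-export([host_and_port/1]).

%% Hypothetical helper: fetch host and port with explicit defaults,
%% because a uri_map() only contains the components found in the URI.
host_and_port(URI) ->
    case uri_string:parse(URI) of
        {error, _, _} = Error ->
            Error;
        Map ->
            {maps:get(host, Map, undefined),
             maps:get(port, Map, undefined)}
    end.
```

For "foo://user@example.com:8042/over/there" this is expected to return
{"example.com", 8042}; for the relative reference "/over/there" it is
expected to return {undefined, undefined}.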
uri_string() = iodata()

List of Unicode code points, a UTF-8 encoded binary, or a mix of the
two, representing an RFC 3986 compliant URI (percent-encoded form). A
URI is a sequence of characters from a very limited set: the letters of
the basic Latin alphabet, digits, and a few special characters.

allowed_characters() -> [{atom(), list()}]

This is a utility function meant to be used in the shell for printing
the allowed characters in each major URI component, and also in the most
important character sets. Note that this function does not replace the
ABNF rules defined by the standards; these character sets are derived
directly from those rules. For more information, see the Uniform
Resource Identifiers chapter in the STDLIB User's Guide.

compose_query(QueryList) -> QueryString

Types:

    QueryList = [{unicode:chardata(), unicode:chardata() | true}]
    QueryString = uri_string() | error()

Composes a form-urlencoded QueryString based on a QueryList, a list of
non-percent-encoded key-value pairs. Form-urlencoding is defined in
section 4.10.21.6 of the HTML 5.2 specification and in section 4.10.22.6
of the HTML 5.0 specification for non-UTF-8 encodings.

See also the opposite operation dissect_query/1.

Example:

    1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}]).
    "foo+bar=1&city=%C3%B6rebro"
    2> uri_string:compose_query([{<<"foo bar">>,<<"1">>},
    2> {<<"city">>,<<"örebro"/utf8>>}]).
    <<"foo+bar=1&city=%C3%B6rebro">>

compose_query(QueryList, Options) -> QueryString

Types:

    QueryList = [{unicode:chardata(), unicode:chardata() | true}]
    Options = [{encoding, atom()}]
    QueryString = uri_string() | error()

Same as compose_query/1 but with an additional Options parameter that
controls the encoding ("charset") used by the encoding algorithm. There
are two supported encodings: utf8 (or unicode) and latin1.

Each character in the entry's name and value that cannot be expressed
using the selected character encoding is replaced by a string consisting
of a U+0026 AMPERSAND character (&), a "#" (U+0023) character, one or
more ASCII digits representing the Unicode code point of the character
in base ten, and finally a ";" (U+003B) character.

Bytes outside the set 0x2A, 0x2D, 0x2E, 0x30 to 0x39, 0x41 to 0x5A,
0x5F, and 0x61 to 0x7A are percent-encoded (a U+0025 PERCENT SIGN
character (%) followed by uppercase ASCII hex digits representing the
hexadecimal value of the byte).

See also the opposite operation dissect_query/1.

Example:

    1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],
    1> [{encoding, latin1}]).
    "foo+bar=1&city=%F6rebro"
    2> uri_string:compose_query([{<<"foo bar">>,<<"1">>},
    2> {<<"city">>,<<"東京"/utf8>>}], [{encoding, latin1}]).
    <<"foo+bar=1&city=%26%2326481%3B%26%2320140%3B">>

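To make the effect of the encoding option easier to compare, the sketch
below encodes the same pair under a caller-supplied charset. The module
name query_charset and the function city_query/1 are hypothetical; the
expected results are the ones shown in the examples of compose_query/1
and compose_query/2.

```erlang
-module(query_charset).
-export([city_query/1]).

%% Hypothetical helper: encode one key-value pair with the given charset.
%% "\x{F6}" is the code point of "ö", written as an escape so that the
%% example does not depend on the encoding of the source file.
city_query(Encoding) ->
    uri_string:compose_query([{"city", "\x{F6}rebro"}],
                             [{encoding, Encoding}]).
```

city_query(utf8) is expected to return "city=%C3%B6rebro", and
city_query(latin1) is expected to return "city=%F6rebro".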
dissect_query(QueryString) -> QueryList

Types:

    QueryString = uri_string()
    QueryList =
        [{unicode:chardata(), unicode:chardata() | true}] | error()

Dissects a form-urlencoded QueryString and returns a QueryList, a list
of non-percent-encoded key-value pairs. Form-urlencoding is defined in
section 4.10.21.6 of the HTML 5.2 specification and in section 4.10.22.6
of the HTML 5.0 specification for non-UTF-8 encodings.

See also the opposite operation compose_query/1.

Example:

    1> uri_string:dissect_query("foo+bar=1&city=%C3%B6rebro").
    [{"foo bar","1"},{"city","örebro"}]
    2> uri_string:dissect_query(<<"foo+bar=1&city=%26%2326481%3B%26%2320140%3B">>).
    [{<<"foo bar">>,<<"1">>},
     {<<"city">>,<<230,157,177,228,186,172>>}]

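Since compose_query/1 and dissect_query/1 are described as opposite
operations, a round trip is a quick way to see both at work. This is a
minimal sketch; the module name query_roundtrip and the function
roundtrip/1 are hypothetical.

```erlang
-module(query_roundtrip).
-export([roundtrip/1]).

%% Hypothetical helper: compose a query string from key-value pairs,
%% then dissect it again. Returns both the string and the decoded pairs.
roundtrip(Pairs) ->
    Query = uri_string:compose_query(Pairs),
    {Query, uri_string:dissect_query(Query)}.
```

For [{"a","1"},{"b","x y"}] the expected result is
{"a=1&b=x+y", [{"a","1"},{"b","x y"}]}: the space is encoded as "+" and
decoded back to a space.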
normalize(URI) -> NormalizedURI

Types:

    URI = uri_string() | uri_map()
    NormalizedURI = uri_string() | error()

Transforms a URI into a normalized form using Syntax-Based Normalization
as defined by RFC 3986.

This function implements case normalization, percent-encoding
normalization, path segment normalization, and scheme-based
normalization for HTTP(S), with basic support for FTP, SSH, SFTP, and
TFTP.

Example:

    1> uri_string:normalize("/a/b/c/./../../g").
    "/a/g"
    2> uri_string:normalize(<<"mid/content=5/../6">>).
    <<"mid/6">>
    3> uri_string:normalize("http://localhost:80").
    "http://localhost/"
    4> uri_string:normalize(#{scheme => "http",port => 80,path => "/a/b/c/./../../g",
    4> host => "localhost-örebro"}).
    "http://localhost-%C3%B6rebro/a/g"

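One common use of normalization is comparing two URIs for equivalence.
The sketch below assumes that both arguments are of the same type (both
lists or both binaries), since the normalized results are compared with
exact equality. The module name uri_equiv and the function equivalent/2
are hypothetical.

```erlang
-module(uri_equiv).
-export([equivalent/2]).

%% Hypothetical helper: two URIs are treated as equivalent when their
%% normalized forms are identical. Invalid input is never equivalent.
equivalent(A, B) ->
    case {uri_string:normalize(A), uri_string:normalize(B)} of
        {{error, _, _}, _} -> false;
        {_, {error, _, _}} -> false;
        {NormA, NormB} -> NormA =:= NormB
    end.
```

With case, default-port, and dot-segment normalization applied,
equivalent("HTTP://Example.COM:80/a/./b/../c", "http://example.com/a/c")
is expected to return true.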
normalize(URI, Options) -> NormalizedURI

Types:

    URI = uri_string() | uri_map()
    Options = [return_map]
    NormalizedURI = uri_string() | uri_map() | error()

Same as normalize/1 but with an additional Options parameter that
controls whether the normalized URI is returned as a uri_map(). There is
one supported option: return_map.

Example:

    1> uri_string:normalize("/a/b/c/./../../g", [return_map]).
    #{path => "/a/g"}
    2> uri_string:normalize(<<"mid/content=5/../6">>, [return_map]).
    #{path => <<"mid/6">>}
    3> uri_string:normalize("http://localhost:80", [return_map]).
    #{scheme => "http",path => "/",host => "localhost"}
    4> uri_string:normalize(#{scheme => "http",port => 80,path => "/a/b/c/./../../g",
    4> host => "localhost-örebro"}, [return_map]).
    #{scheme => "http",path => "/a/g",host => "localhost-örebro"}

parse(URIString) -> URIMap

Types:

    URIString = uri_string()
    URIMap = uri_map() | error()

Parses an RFC 3986 compliant uri_string() into a uri_map() that holds
the parsed components of the URI. If parsing fails, an error tuple is
returned.

See also the opposite operation recompose/1.

Example:

    1> uri_string:parse("foo://user@example.com:8042/over/there?name=ferret#nose").
    #{fragment => "nose",host => "example.com",
      path => "/over/there",port => 8042,query => "name=ferret",
      scheme => "foo",userinfo => "user"}
    2> uri_string:parse(<<"foo://user@example.com:8042/over/there?name=ferret">>).
    #{host => <<"example.com">>,path => <<"/over/there">>,
      port => 8042,query => <<"name=ferret">>,scheme => <<"foo">>,
      userinfo => <<"user">>}

percent_decode(URI) -> Result

Types:

    URI = uri_string() | uri_map()
    Result =
        uri_string() |
        uri_map() |
        {error, {invalid, {atom(), {term(), term()}}}}

Decodes all percent-encoded triplets in the input, which can be either a
uri_string() or a uri_map(). Note that this function performs raw
decoding and is meant to be used on already parsed URI components.
Applying it directly to a standard URI can change the URI.

If the input encoding is not UTF-8, an error tuple is returned.

Example:

    1> uri_string:percent_decode(#{host => "localhost-%C3%B6rebro",path => [],
    1> scheme => "http"}).
    #{host => "localhost-örebro",path => [],scheme => "http"}
    2> uri_string:percent_decode(<<"%C3%B6rebro">>).
    <<"örebro"/utf8>>

Warning:
Using uri_string:percent_decode/1 directly on a URI is not safe. The
following example shows that each consecutive application of the
function changes the resulting URI. None of these URIs refers to the
same resource.

    3> uri_string:percent_decode(<<"http://local%252Fhost/path">>).
    <<"http://local%2Fhost/path">>
    4> uri_string:percent_decode(<<"http://local%2Fhost/path">>).
    <<"http://local/host/path">>

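In line with the warning above, a safer pattern is to parse first and
decode a single component afterwards. The following is a minimal sketch;
the module name uri_decode and the function decoded_path/1 are
hypothetical.

```erlang
-module(uri_decode).
-export([decoded_path/1]).

%% Hypothetical helper: parse the URI first, then percent-decode only
%% the path component, never the URI as a whole.
decoded_path(URI) ->
    case uri_string:parse(URI) of
        {error, _, _} = Error ->
            Error;
        #{path := Path} ->
            uri_string:percent_decode(Path)
    end.
```

For "http://example.com/caf%C3%A9" the expected result is the list
"/caf" followed by the code point 233 ("é").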
quote(Data) -> QuotedData

Types:

    Data = QuotedData = unicode:chardata()

Replaces characters outside the unreserved set with their
percent-encoded equivalents.

Unreserved characters defined in RFC 3986 are not quoted.

Example:

    1> uri_string:quote("SomeId/04").
    "SomeId%2F04"
    2> uri_string:quote(<<"SomeId/04">>).
    <<"SomeId%2F04">>

Warning:
This function is not aware of any URI component context and should not
be used on a whole URI. Applying it more than once to the same data
might produce unexpected results.

quote(Data, Safe) -> QuotedData

Types:

    Data = unicode:chardata()
    Safe = string()
    QuotedData = unicode:chardata()

Same as quote/1, but Safe allows the user to provide a list of
characters to be protected from encoding.

Example:

    1> uri_string:quote("SomeId/04", "/").
    "SomeId/04"
    2> uri_string:quote(<<"SomeId/04">>, "/").
    <<"SomeId/04">>

Warning:
This function is not aware of any URI component context and should not
be used on a whole URI. Applying it more than once to the same data
might produce unexpected results.

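The quoting functions are meant for a single piece of application data
that becomes part of one component. The sketch below builds a path whose
last segment is an identifier that may contain "/", and then recovers the
identifier. The module name uri_segments, the functions item_uri/1 and
item_id/1, the "/items/" prefix, and the host example.com are all
hypothetical choices for this illustration.

```erlang
-module(uri_segments).
-export([item_uri/1, item_id/1]).

%% Hypothetical helper: build a URI whose last path segment is an
%% application identifier (a string) that may itself contain "/".
item_uri(Id) ->
    Path = "/items/" ++ uri_string:quote(Id),
    uri_string:recompose(#{scheme => "https",
                           host => "example.com",
                           path => Path}).

%% Reverse step: split the path on "/" first, then unquote the segment,
%% so that a quoted "/" inside the identifier is not taken as a delimiter.
item_id(URI) ->
    #{path := Path} = uri_string:parse(URI),
    uri_string:unquote(lists:last(string:split(Path, "/", all))).
```

item_uri("SomeId/04") is expected to return
"https://example.com/items/SomeId%2F04", and item_id/1 applied to that
URI is expected to return "SomeId/04".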
recompose(URIMap) -> URIString

Types:

    URIMap = uri_map()
    URIString = uri_string() | error()

Creates an RFC 3986 compliant URIString (percent-encoded), based on the
components of URIMap. If the URIMap is invalid, an error tuple is
returned.

See also the opposite operation parse/1.

Example:

    1> URIMap = #{fragment => "nose", host => "example.com", path => "/over/there",
    1> port => 8042, query => "name=ferret", scheme => "foo", userinfo => "user"}.
    #{fragment => "nose",host => "example.com",
      path => "/over/there",port => 8042,query => "name=ferret",
      scheme => "foo",userinfo => "user"}

    2> uri_string:recompose(URIMap).
    "foo://user@example.com:8042/over/there?name=ferret#nose"

resolve(RefURI, BaseURI) -> TargetURI

Types:

    RefURI = BaseURI = uri_string() | uri_map()
    TargetURI = uri_string() | error()

Converts a RefURI reference, which might be relative to a given base
URI, into the parsed components of the reference's target, which can
then be recomposed to form the target URI.

Example:

    1> uri_string:resolve("/abs/ol/ute", "http://localhost/a/b/c?q").
    "http://localhost/abs/ol/ute"
    2> uri_string:resolve("../relative", "http://localhost/a/b/c?q").
    "http://localhost/a/relative"
    3> uri_string:resolve("http://localhost/full", "http://localhost/a/b/c?q").
    "http://localhost/full"
    4> uri_string:resolve(#{path => "path", query => "xyz"}, "http://localhost/a/b/c?q").
    "http://localhost/a/b/path?xyz"

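A typical use of resolve/2 is turning a batch of references, for example
links collected from one document, into absolute URIs. This is a minimal
sketch; the module name uri_links and the function absolute_links/2 are
hypothetical, and dropping invalid references silently is a design
choice of the example, not of uri_string.

```erlang
-module(uri_links).
-export([absolute_links/2]).

%% Hypothetical helper: resolve every reference against Base and drop
%% the ones for which resolve/2 returns an error tuple.
absolute_links(Refs, Base) ->
    [Target || Ref <- Refs,
               Target <- [uri_string:resolve(Ref, Base)],
               not is_tuple(Target)].
```

Called with ["/abs/ol/ute", "../relative", "http://localhost/full"] and
the base "http://localhost/a/b/c?q", it is expected to return the three
target URIs shown in the example above, in the same order.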
resolve(RefURI, BaseURI, Options) -> TargetURI

Types:

    RefURI = BaseURI = uri_string() | uri_map()
    Options = [return_map]
    TargetURI = uri_string() | uri_map() | error()

Same as resolve/2 but with an additional Options parameter that controls
whether the target URI is returned as a uri_map(). There is one
supported option: return_map.

Example:

    1> uri_string:resolve("/abs/ol/ute", "http://localhost/a/b/c?q", [return_map]).
    #{host => "localhost",path => "/abs/ol/ute",scheme => "http"}
    2> uri_string:resolve(#{path => "/abs/ol/ute"}, #{scheme => "http",
    2> host => "localhost", path => "/a/b/c?q"}, [return_map]).
    #{host => "localhost",path => "/abs/ol/ute",scheme => "http"}

transcode(URIString, Options) -> Result

Types:

    URIString = uri_string()
    Options =
        [{in_encoding, unicode:encoding()} |
         {out_encoding, unicode:encoding()}]
    Result = uri_string() | error()

Transcodes an RFC 3986 compliant URIString, where Options is a list of
tagged tuples specifying the inbound (in_encoding) and outbound
(out_encoding) encodings. in_encoding and out_encoding each specify both
the binary encoding and the percent-encoding of the input and output
data. Mixed encoding, where the binary encoding is not the same as the
percent-encoding, is not supported. If an argument is invalid, an error
tuple is returned.

Example:

    1> uri_string:transcode(<<"foo%00%00%00%F6bar"/utf32>>,
    1> [{in_encoding, utf32},{out_encoding, utf8}]).
    <<"foo%C3%B6bar"/utf8>>
    2> uri_string:transcode("foo%F6bar", [{in_encoding, latin1},
    2> {out_encoding, utf8}]).
    "foo%C3%B6bar"

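Because the other functions in this module expect UTF-8, input in
another encoding is typically transcoded before it is parsed. The sketch
below does this for Latin-1 input; the module name uri_legacy and the
function parse_latin1/1 are hypothetical.

```erlang
-module(uri_legacy).
-export([parse_latin1/1]).

%% Hypothetical helper: convert a Latin-1 percent-encoded URI to UTF-8
%% and parse the result. Error tuples from either step are returned
%% unchanged.
parse_latin1(URI) ->
    case uri_string:transcode(URI, [{in_encoding, latin1},
                                    {out_encoding, utf8}]) of
        {error, _, _} = Error ->
            Error;
        Utf8URI ->
            uri_string:parse(Utf8URI)
    end.
```

For "foo%F6bar" the transcoded form is "foo%C3%B6bar", as in the example
above, so the parsed map is expected to contain that string as its path.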
unquote(QuotedData) -> Data

Types:

    QuotedData = Data = unicode:chardata()

Percent-decodes the characters in QuotedData.

Example:

    1> uri_string:unquote("SomeId%2F04").
    "SomeId/04"
    2> uri_string:unquote(<<"SomeId%2F04">>).
    <<"SomeId/04">>

Warning:
This function is not aware of any URI component context and should not
be used on a whole URI. Applying it more than once to the same data
might produce unexpected results.

Ericsson AB                     stdlib 4.3.1.3                   uri_string(3)