uri_string(3)              Erlang Module Definition              uri_string(3)

NAME

       uri_string - URI processing functions.

DESCRIPTION

       This module contains functions for parsing and handling URIs (RFC 3986)
       and form-urlencoded query strings (HTML 5.2).

       Parsing and serializing non-UTF-8 form-urlencoded query strings are
       also supported (HTML 5.0).

       A URI is an identifier consisting of a sequence of characters matching
       the syntax rule named URI in RFC 3986.

       The generic URI syntax consists of a hierarchical sequence of
       components referred to as the scheme, authority, path, query, and
       fragment:

           URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
           hier-part   = "//" authority path-abempty
                          / path-absolute
                          / path-rootless
                          / path-empty
           scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
           authority   = [ userinfo "@" ] host [ ":" port ]
           userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )

           reserved    = gen-delims / sub-delims
           gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
           sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                       / "*" / "+" / "," / ";" / "="

           unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

       The interpretation of a URI depends only on the characters used and not
       on how those characters are represented in a network protocol.

       The functions implemented by this module cover the following use cases:

         * Parsing URIs into their components and returning a map
           parse/1

         * Recomposing a map of URI components into a URI string
           recompose/1

         * Changing inbound binary and percent-encoding of URIs
           transcode/2

         * Transforming URIs into a normalized form
           normalize/1
           normalize/2

         * Composing form-urlencoded query strings from a list of key-value
           pairs
           compose_query/1
           compose_query/2

         * Dissecting form-urlencoded query strings into a list of key-value
           pairs
           dissect_query/1

         * Decoding percent-encoded triplets in a URI map or in a specific
           URI component
           percent_decode/1

         * Preparing and retrieving application-specific data included in URI
           components
           quote/1
           quote/2
           unquote/1

       There are four different encodings present during the handling of URIs:

         * Inbound binary encoding in binaries

         * Inbound percent-encoding in lists and binaries

         * Outbound binary encoding in binaries

         * Outbound percent-encoding in lists and binaries

       Functions with a uri_string() argument accept lists, binaries, and
       mixed lists (lists with binary elements) as input. All functions except
       transcode/2 expect input as lists of Unicode code points, UTF-8 encoded
       binaries, and UTF-8 percent-encoded URI parts ("%C3%B6" corresponds to
       the Unicode character "ö").

       Unless otherwise specified, the return value type and encoding are the
       same as the input type and encoding. That is, binary input returns
       binary output and list input returns list output, but mixed input
       returns list output.
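
       As a rough illustration of the input and output type rule, the sketch
       below parses a URI and reports the type of the returned host component.
       The module name uri_types_demo and the function host_type/1 are made up
       for this example and are not part of uri_string.

```erlang
-module(uri_types_demo).
-export([host_type/1]).

%% Hypothetical helper (not part of uri_string): parse a URI and report
%% whether the host component came back as a list or as a binary.
host_type(URI) ->
    case uri_string:parse(URI) of
        #{host := Host} when is_binary(Host) -> binary;
        #{host := Host} when is_list(Host) -> list;
        Other -> {unexpected, Other}
    end.
```

       Under the rule stated above, a list such as "http://example.com/" and
       a mixed list such as ["http://", <<"example.com">>, "/"] should both
       give list, while the binary <<"http://example.com/">> should give
       binary.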

       Lists carry only percent-encoding. Binaries, however, carry both a
       binary encoding and a percent-encoding. transcode/2 provides the means
       to convert between the supported encodings; it takes a uri_string() and
       a list of options specifying inbound and outbound encodings.

       RFC 3986 does not mandate any specific character encoding; it is
       usually defined by the protocol or surrounding text. This library makes
       the same assumption: binary encoding and percent-encoding are handled
       as one configuration unit and cannot be set to different values.

       Quoting functions are intended to be used by URI-producing applications
       during the component preparation or retrieval phase to avoid conflicts
       between data and characters used in URI syntax. Quoting functions use
       percent-encoding, but with different rules than, for example,
       recompose/1. It is the user's responsibility to pass only application
       data to the quoting functions and to use their output to build a URI
       component.
       Quoting functions can, for instance, be used to construct a path
       component with a segment containing a '/' character, which must not
       collide with the '/' used as the general delimiter in the path
       component.
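
       The path-segment case can be sketched as follows. The module name
       uri_quote_demo, the function file_uri/1, and the host example.com are
       assumptions made for this example. It also assumes that Name is a
       flat string (a list), so that "++" can be used to build the path.

```erlang
-module(uri_quote_demo).
-export([file_uri/1]).

%% Hypothetical helper: build a URI whose last path segment is arbitrary
%% application data (for example a name that itself contains '/').
%% Assumes Name is a flat string, so quote/1 also returns a string.
file_uri(Name) ->
    %% Quote only the application data, never the whole path.
    Segment = uri_string:quote(Name),
    uri_string:recompose(#{scheme => "http",
                           host => "example.com",
                           path => "/files/" ++ Segment}).
```

       With the name "SomeId/04" the quoted segment is "SomeId%2F04", so the
       resulting path keeps two segments instead of gaining a third one.
       uri_string:unquote/1 can be applied to the segment after parsing to
       get the original data back.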

DATA TYPES

       error() = {error, atom(), term()}

              Error tuple indicating the type of error. Possible values of the
              second component:

                * invalid_character

                * invalid_encoding

                * invalid_input

                * invalid_map

                * invalid_percent_encoding

                * invalid_scheme

                * invalid_uri

                * invalid_utf8

                * missing_value

              The third component is a term providing additional information
              about the cause of the error.
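
              Because most functions return either a result or an error()
              tuple, callers usually match on both shapes. The wrapper below
              is a minimal sketch of that pattern; the module name
              uri_error_demo and the function safe_parse/1 are hypothetical.

```erlang
-module(uri_error_demo).
-export([safe_parse/1]).

%% Hypothetical wrapper: turn the map-or-error result of parse/1 into
%% a tagged tuple that is easier to match on.
safe_parse(URI) ->
    case uri_string:parse(URI) of
        {error, Reason, Info} ->
            %% Reason is one of the atoms listed above; Info is the
            %% additional term describing the cause.
            {error, {Reason, Info}};
        Map when is_map(Map) ->
            {ok, Map}
    end.
```

              A caller can then write a single case expression over
              {ok, Map} and {error, {Reason, Info}} without inspecting the
              shape of the raw return value.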

       uri_map() =
           #{fragment => unicode:chardata(),
             host => unicode:chardata(),
             path => unicode:chardata(),
             port => integer() >= 0 | undefined,
             query => unicode:chardata(),
             scheme => unicode:chardata(),
             userinfo => unicode:chardata()}

              Map holding the main components of a URI.

       uri_string() = iodata()

              List of Unicode code points, a UTF-8 encoded binary, or a mix of
              the two, representing an RFC 3986 compliant URI (percent-encoded
              form). A URI is a sequence of characters from a very limited
              set: the letters of the basic Latin alphabet, digits, and a few
              special characters.

EXPORTS

       allowed_characters() -> [{atom(), list()}]

              This is a utility function meant to be used in the shell for
              printing the allowed characters in each major URI component and
              in the most important character sets. Note that this function
              does not replace the ABNF rules defined by the standards; these
              character sets are derived directly from those rules. For more
              information, see the Uniform Resource Identifiers chapter in
              stdlib's User's Guide.

       compose_query(QueryList) -> QueryString

              Types:

                 QueryList = [{unicode:chardata(), unicode:chardata() | true}]
                 QueryString = uri_string() | error()

              Composes a form-urlencoded QueryString based on a QueryList, a
              list of non-percent-encoded key-value pairs. Form-urlencoding is
              defined in section 4.10.21.6 of the HTML 5.2 specification and
              in section 4.10.22.6 of the HTML 5.0 specification for non-UTF-8
              encodings.

              See also the opposite operation dissect_query/1.

              Example:

              1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}]).
              "foo+bar=1&city=%C3%B6rebro"
              2> uri_string:compose_query([{<<"foo bar">>,<<"1">>},
              2> {<<"city">>,<<"örebro"/utf8>>}]).
              <<"foo+bar=1&city=%C3%B6rebro">>


       compose_query(QueryList, Options) -> QueryString

              Types:

                 QueryList = [{unicode:chardata(), unicode:chardata() | true}]
                 Options = [{encoding, atom()}]
                 QueryString = uri_string() | error()

              Same as compose_query/1 but with an additional Options parameter
              that controls the encoding ("charset") used by the encoding
              algorithm. There are two supported encodings: utf8 (or unicode)
              and latin1.

              Each character in the entry's name and value that cannot be
              expressed using the selected character encoding is replaced by a
              string consisting of a U+0026 AMPERSAND character (&), a "#"
              (U+0023) character, one or more ASCII digits representing the
              Unicode code point of the character in base ten, and finally a
              ";" (U+003B) character.

              Bytes other than 0x2A, 0x2D, 0x2E, 0x30 to 0x39, 0x41 to 0x5A,
              0x5F, and 0x61 to 0x7A are percent-encoded (a U+0025 PERCENT
              SIGN character (%) followed by uppercase ASCII hex digits
              representing the hexadecimal value of the byte). The space
              character (0x20) is the exception: it is written as "+", as the
              examples show.

              See also the opposite operation dissect_query/1.

              Example:

              1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],
              1> [{encoding, latin1}]).
              "foo+bar=1&city=%F6rebro"
              2> uri_string:compose_query([{<<"foo bar">>,<<"1">>},
              2> {<<"city">>,<<"東京"/utf8>>}], [{encoding, latin1}]).
              <<"foo+bar=1&city=%26%2326481%3B%26%2320140%3B">>


       dissect_query(QueryString) -> QueryList

              Types:

                 QueryString = uri_string()
                 QueryList =
                     [{unicode:chardata(), unicode:chardata() | true}] |
                     error()

              Dissects a form-urlencoded QueryString and returns a QueryList,
              a list of non-percent-encoded key-value pairs. Form-urlencoding
              is defined in section 4.10.21.6 of the HTML 5.2 specification
              and in section 4.10.22.6 of the HTML 5.0 specification for
              non-UTF-8 encodings.

              See also the opposite operation compose_query/1.

              Example:

              1> uri_string:dissect_query("foo+bar=1&city=%C3%B6rebro").
              [{"foo bar","1"},{"city","örebro"}]
              2> uri_string:dissect_query(<<"foo+bar=1&city=%26%2326481%3B%26%2320140%3B">>).
              [{<<"foo bar">>,<<"1">>},
               {<<"city">>,<<230,157,177,228,186,172>>}]
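
              Since compose_query/1 and dissect_query/1 are opposite
              operations, a round trip is a convenient sanity check. The
              following is a small sketch; the module name uri_query_demo and
              the function roundtrip/1 are hypothetical.

```erlang
-module(uri_query_demo).
-export([roundtrip/1]).

%% Hypothetical helper: compose a query string from key-value pairs and
%% dissect it again, returning both the string and the recovered pairs.
roundtrip(Pairs) ->
    Query = uri_string:compose_query(Pairs),
    {Query, uri_string:dissect_query(Query)}.
```

              For pairs given as plain strings, such as
              [{"foo bar","1"},{"a&b","x=y"}], the second element of the
              result is expected to equal the input, because the "&" and "="
              inside the data are percent-encoded before the pairs are
              joined.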


       normalize(URI) -> NormalizedURI

              Types:

                 URI = uri_string() | uri_map()
                 NormalizedURI = uri_string() | error()

              Transforms a URI into a normalized form using Syntax-Based
              Normalization as defined by RFC 3986.

              This function implements case normalization, percent-encoding
              normalization, path segment normalization, and scheme-based
              normalization for HTTP(S), with basic support for FTP, SSH,
              SFTP, and TFTP.

              Example:

              1> uri_string:normalize("/a/b/c/./../../g").
              "/a/g"
              2> uri_string:normalize(<<"mid/content=5/../6">>).
              <<"mid/6">>
              3> uri_string:normalize("http://localhost:80").
              "http://localhost/"
              4> uri_string:normalize(#{scheme => "http",port => 80,path => "/a/b/c/./../../g",
              4> host => "localhost-örebro"}).
              "http://localhost-%C3%B6rebro/a/g"
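
              One common use of normalization is comparing URIs for
              equivalence. The sketch below assumes that both URIs are given
              in the same input type (both lists or both binaries); the
              module name uri_norm_demo and the function equivalent/2 are
              hypothetical.

```erlang
-module(uri_norm_demo).
-export([equivalent/2]).

%% Hypothetical helper: treat two URIs as equivalent when their
%% syntax-based normal forms are identical. Inputs that fail to
%% normalize are never considered equivalent.
equivalent(A, B) ->
    case {uri_string:normalize(A), uri_string:normalize(B)} of
        {{error, _, _}, _} -> false;
        {_, {error, _, _}} -> false;
        {Same, Same} -> true;
        _ -> false
    end.
```

              Going by the examples above, "http://localhost:80" and
              "http://localhost/" compare as equivalent, and so do
              "/a/b/c/./../../g" and "/a/g".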


       normalize(URI, Options) -> NormalizedURI

              Types:

                 URI = uri_string() | uri_map()
                 Options = [return_map]
                 NormalizedURI = uri_string() | uri_map() | error()

              Same as normalize/1 but with an additional Options parameter
              that controls whether the normalized URI is returned as a
              uri_map(). There is one supported option: return_map.

              Example:

              1> uri_string:normalize("/a/b/c/./../../g", [return_map]).
              #{path => "/a/g"}
              2> uri_string:normalize(<<"mid/content=5/../6">>, [return_map]).
              #{path => <<"mid/6">>}
              3> uri_string:normalize("http://localhost:80", [return_map]).
              #{scheme => "http",path => "/",host => "localhost"}
              4> uri_string:normalize(#{scheme => "http",port => 80,path => "/a/b/c/./../../g",
              4> host => "localhost-örebro"}, [return_map]).
              #{scheme => "http",path => "/a/g",host => "localhost-örebro"}


       parse(URIString) -> URIMap

              Types:

                 URIString = uri_string()
                 URIMap = uri_map() | error()

              Parses an RFC 3986 compliant uri_string() into a uri_map() that
              holds the parsed components of the URI. If parsing fails, an
              error tuple is returned.

              See also the opposite operation recompose/1.

              Example:

              1> uri_string:parse("foo://user@example.com:8042/over/there?name=ferret#nose").
              #{fragment => "nose",host => "example.com",
                path => "/over/there",port => 8042,query => "name=ferret",
                scheme => "foo",userinfo => "user"}
              2> uri_string:parse(<<"foo://user@example.com:8042/over/there?name=ferret">>).
              #{host => <<"example.com">>,path => <<"/over/there">>,
                port => 8042,query => <<"name=ferret">>,scheme => <<"foo">>,
                userinfo => <<"user">>}


       percent_decode(URI) -> Result

              Types:

                 URI = uri_string() | uri_map()
                 Result =
                     uri_string() |
                     uri_map() |
                     {error, {invalid, {atom(), {term(), term()}}}}

              Decodes all percent-encoded triplets in the input, which can be
              either a uri_string() or a uri_map(). Note that this function
              performs raw decoding and should be used on already parsed URI
              components. Applying it directly to a complete URI can change
              the meaning of that URI.

              If the input encoding is not UTF-8, an error tuple is returned.

              Example:

              1> uri_string:percent_decode(#{host => "localhost-%C3%B6rebro",path => [],
              1> scheme => "http"}).
              #{host => "localhost-örebro",path => [],scheme => "http"}
              2> uri_string:percent_decode(<<"%C3%B6rebro">>).
              <<"örebro"/utf8>>


          Warning:
              Using uri_string:percent_decode/1 directly on a URI is not safe.
              This example shows that each consecutive application of the
              function changes the resulting URI. None of these URIs refer to
              the same resource.

              3> uri_string:percent_decode(<<"http://local%252Fhost/path">>).
              <<"http://local%2Fhost/path">>
              4> uri_string:percent_decode(<<"http://local%2Fhost/path">>).
              <<"http://local/host/path">>
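
              A safer pattern, in line with the advice above, is to parse
              first and decode a single component afterwards. The following
              is a minimal sketch; the module name uri_decode_demo and the
              function decoded_path/1 are hypothetical.

```erlang
-module(uri_decode_demo).
-export([decoded_path/1]).

%% Hypothetical helper: parse first, then decode only the path, so that
%% decoded delimiters cannot be mistaken for URI syntax.
decoded_path(URI) ->
    case uri_string:parse(URI) of
        {error, _, _} = Error ->
            Error;
        #{path := Path} ->
            %% May itself return an error tuple if Path is not valid UTF-8.
            uri_string:percent_decode(Path)
    end.
```

              Here a "%2F" in the path is decoded only after the component
              boundaries are already known, so it no longer affects how the
              URI is split.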


       quote(Data) -> QuotedData

              Types:

                 Data = QuotedData = unicode:chardata()

              Replaces characters outside the unreserved set with their
              percent-encoded equivalents.

              Unreserved characters defined in RFC 3986 are not quoted.

              Example:

              1> uri_string:quote("SomeId/04").
              "SomeId%2F04"
              2> uri_string:quote(<<"SomeId/04">>).
              <<"SomeId%2F04">>


          Warning:
              The function is not aware of any URI component context and
              should not be used on a whole URI. Applying it more than once to
              the same data might produce unexpected results.


       quote(Data, Safe) -> QuotedData

              Types:

                 Data = unicode:chardata()
                 Safe = string()
                 QuotedData = unicode:chardata()

              Same as quote/1, but Safe allows the user to provide a list of
              characters to be protected from encoding.

              Example:

              1> uri_string:quote("SomeId/04", "/").
              "SomeId/04"
              2> uri_string:quote(<<"SomeId/04">>, "/").
              <<"SomeId/04">>


          Warning:
              The function is not aware of any URI component context and
              should not be used on a whole URI. Applying it more than once to
              the same data might produce unexpected results.


       recompose(URIMap) -> URIString

              Types:

                 URIMap = uri_map()
                 URIString = uri_string() | error()

              Creates an RFC 3986 compliant URIString (percent-encoded) based
              on the components of URIMap. If the URIMap is invalid, an error
              tuple is returned.

              See also the opposite operation parse/1.

              Example:

              1> URIMap = #{fragment => "nose", host => "example.com", path => "/over/there",
              1> port => 8042, query => "name=ferret", scheme => "foo", userinfo => "user"}.
              #{fragment => "nose",host => "example.com",
                path => "/over/there",port => 8042,query => "name=ferret",
                scheme => "foo",userinfo => "user"}

              2> uri_string:recompose(URIMap).
              "foo://user@example.com:8042/over/there?name=ferret#nose"
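
              Because parse/1 and recompose/1 are opposite operations, a
              single component can be changed by editing the map in between.
              The sketch below replaces the query component. The module name
              uri_edit_demo and the function set_query/2 are hypothetical,
              and the URI and the pairs are assumed to use the same input
              type (for example, all lists).

```erlang
-module(uri_edit_demo).
-export([set_query/2]).

%% Hypothetical helper: replace the query component of a URI with one
%% composed from a list of key-value pairs.
set_query(URI, Pairs) ->
    case uri_string:parse(URI) of
        {error, _, _} = ParseError ->
            ParseError;
        Map ->
            case uri_string:compose_query(Pairs) of
                {error, _, _} = QueryError ->
                    QueryError;
                Query ->
                    %% All other components are passed through unchanged.
                    uri_string:recompose(Map#{query => Query})
            end
    end.
```

              Editing the map instead of the string avoids having to locate
              the "?" and "#" delimiters by hand.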

       resolve(RefURI, BaseURI) -> TargetURI

              Types:

                 RefURI = BaseURI = uri_string() | uri_map()
                 TargetURI = uri_string() | error()

              Resolves a RefURI reference, which might be relative, against
              the given BaseURI and returns the target URI that the reference
              points to.

              Example:

              1> uri_string:resolve("/abs/ol/ute", "http://localhost/a/b/c?q").
              "http://localhost/abs/ol/ute"
              2> uri_string:resolve("../relative", "http://localhost/a/b/c?q").
              "http://localhost/a/relative"
              3> uri_string:resolve("http://localhost/full", "http://localhost/a/b/c?q").
              "http://localhost/full"
              4> uri_string:resolve(#{path => "path", query => "xyz"}, "http://localhost/a/b/c?q").
              "http://localhost/a/b/path?xyz"
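
              A typical use is resolving every link found in a document
              against the document's own URI. The following is a minimal
              sketch; the module name uri_resolve_demo and the function
              resolve_all/2 are hypothetical.

```erlang
-module(uri_resolve_demo).
-export([resolve_all/2]).

%% Hypothetical helper: resolve each reference in Refs against BaseURI.
%% Elements that cannot be resolved appear as error tuples in the result.
resolve_all(Refs, BaseURI) ->
    [uri_string:resolve(Ref, BaseURI) || Ref <- Refs].
```

              Applied to the first three references from the example above
              with the base "http://localhost/a/b/c?q", this returns the
              same three target URIs as a list.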


       resolve(RefURI, BaseURI, Options) -> TargetURI

              Types:

                 RefURI = BaseURI = uri_string() | uri_map()
                 Options = [return_map]
                 TargetURI = uri_string() | uri_map() | error()

              Same as resolve/2 but with an additional Options parameter that
              controls whether the target URI is returned as a uri_map().
              There is one supported option: return_map.

              Example:

              1> uri_string:resolve("/abs/ol/ute", "http://localhost/a/b/c?q", [return_map]).
              #{host => "localhost",path => "/abs/ol/ute",scheme => "http"}
              2> uri_string:resolve(#{path => "/abs/ol/ute"}, #{scheme => "http",
              2> host => "localhost", path => "/a/b/c?q"}, [return_map]).
              #{host => "localhost",path => "/abs/ol/ute",scheme => "http"}


       transcode(URIString, Options) -> Result

              Types:

                 URIString = uri_string()
                 Options =
                     [{in_encoding, unicode:encoding()} |
                      {out_encoding, unicode:encoding()}]
                 Result = uri_string() | error()

              Transcodes an RFC 3986 compliant URIString, where Options is a
              list of tagged tuples specifying the inbound (in_encoding) and
              outbound (out_encoding) encodings. in_encoding and out_encoding
              specify both the binary encoding and the percent-encoding for
              the input and output data. Mixed encoding, where the binary
              encoding is not the same as the percent-encoding, is not
              supported. If an argument is invalid, an error tuple is
              returned.

              Example:

              1> uri_string:transcode(<<"foo%00%00%00%F6bar"/utf32>>,
              1> [{in_encoding, utf32},{out_encoding, utf8}]).
              <<"foo%C3%B6bar"/utf8>>
              2> uri_string:transcode("foo%F6bar", [{in_encoding, latin1},
              2> {out_encoding, utf8}]).
              "foo%C3%B6bar"


       unquote(QuotedData) -> Data

              Types:

                 QuotedData = Data = unicode:chardata()

              Percent-decodes the characters in QuotedData.

              Example:

              1> uri_string:unquote("SomeId%2F04").
              "SomeId/04"
              2> uri_string:unquote(<<"SomeId%2F04">>).
              <<"SomeId/04">>


          Warning:
              The function is not aware of any URI component context and
              should not be used on a whole URI. Applying it more than once to
              the same data might produce unexpected results.

Ericsson AB                       stdlib 4.2                     uri_string(3)