uri_string(3)              Erlang Module Definition              uri_string(3)

NAME

       uri_string - URI processing functions.

DESCRIPTION

       This module contains functions for parsing and handling URIs (RFC 3986)
       and form-urlencoded query strings (HTML 5.2).

       Parsing and serializing non-UTF-8 form-urlencoded query strings are
       also supported (HTML 5.0).

       A URI is an identifier consisting of a sequence of characters matching
       the syntax rule named URI in RFC 3986.

       The generic URI syntax consists of a hierarchical sequence of
       components referred to as the scheme, authority, path, query, and
       fragment:

           URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
           hier-part   = "//" authority path-abempty
                          / path-absolute
                          / path-rootless
                          / path-empty
           scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
           authority   = [ userinfo "@" ] host [ ":" port ]
           userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )

           reserved    = gen-delims / sub-delims
           gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
           sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                       / "*" / "+" / "," / ";" / "="

           unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

       The interpretation of a URI depends only on the characters used and not
       on how those characters are represented in a network protocol.

       The functions implemented by this module cover the following use cases:

         * Parsing a URI into its components and returning a map
           parse/1

         * Recomposing a map of URI components into a URI string
           recompose/1

         * Changing inbound binary and percent-encoding of URIs
           transcode/2

         * Transforming URIs into a normalized form
           normalize/1
           normalize/2

         * Composing form-urlencoded query strings from a list of key-value
           pairs
           compose_query/1
           compose_query/2

         * Dissecting form-urlencoded query strings into a list of key-value
           pairs
           dissect_query/1

       There are four different encodings present during the handling of URIs:

         * Inbound binary encoding in binaries

         * Inbound percent-encoding in lists and binaries

         * Outbound binary encoding in binaries

         * Outbound percent-encoding in lists and binaries

       Functions with a uri_string() argument accept lists, binaries, and
       mixed lists (lists with binary elements) as input. All functions
       except transcode/2 expect input as lists of unicode codepoints, UTF-8
       encoded binaries, and UTF-8 percent-encoded URI parts ("%C3%B6"
       corresponds to the unicode character "ö").

       Unless otherwise specified, the return value type and encoding are the
       same as the input type and encoding. That is, binary input returns
       binary output and list input returns list output, but mixed input
       returns list output.
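
       As a sketch of the rule above, the following expressions can be
       pasted into the Erlang shell. Each match succeeds when the output type
       follows the input type as described. The URI "http://example.com/" is
       an arbitrary illustration.

           %% List input yields list values.
           #{host := "example.com"} = uri_string:parse("http://example.com/").
           %% Binary input yields binary values.
           #{host := <<"example.com">>} = uri_string:parse(<<"http://example.com/">>).
           %% Mixed input (a list containing a binary) yields list values.
           #{host := "example.com"} = uri_string:parse(["http://", <<"example.com/">>]).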

       For lists there is only percent-encoding. For binaries, however, both
       binary encoding and percent-encoding must be considered. transcode/2
       provides the means to convert between the supported encodings: it
       takes a uri_string() and a list of options specifying the inbound and
       outbound encodings.

       RFC 3986 does not mandate any specific character encoding; it is
       usually defined by the protocol or surrounding text. This library
       makes the same assumption: binary encoding and percent-encoding are
       handled as one configuration unit and cannot be set to different
       values.

DATA TYPES

       error() = {error, atom(), term()}

              Error tuple indicating the type of error. Possible values of the
              second component:

                * invalid_character

                * invalid_encoding

                * invalid_input

                * invalid_map

                * invalid_percent_encoding

                * invalid_scheme

                * invalid_uri

                * invalid_utf8

                * missing_value

              The third component is a term providing additional information
              about the cause of the error.
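
              A minimal sketch of telling an error tuple apart from a
              successful result. The function name classify/1 is hypothetical
              and not part of this module, and no particular error atom is
              assumed for any given input.

                  %% Hypothetical helper, to be placed in a module of your own.
                  classify(URIString) ->
                      case uri_string:parse(URIString) of
                          {error, Reason, Detail} ->
                              %% Reason is one of the atoms listed above.
                              {rejected, Reason, Detail};
                          URIMap when is_map(URIMap) ->
                              {ok, URIMap}
                      end.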

       uri_map() =
           #{fragment => unicode:chardata(),
             host => unicode:chardata(),
             path => unicode:chardata(),
             port => integer() >= 0 | undefined,
             query => unicode:chardata(),
             scheme => unicode:chardata(),
             userinfo => unicode:chardata()} |
           #{}

              Map holding the main components of a URI.

       uri_string() = iodata()

              List of unicode codepoints, a UTF-8 encoded binary, or a mix of
              the two, representing an RFC 3986 compliant URI (percent-encoded
              form). A URI is a sequence of characters from a very limited
              set: the letters of the basic Latin alphabet, digits, and a few
              special characters.

EXPORTS

       compose_query(QueryList) -> QueryString

              Types:

                 QueryList = [{unicode:chardata(), unicode:chardata() | true}]
                 QueryString = uri_string() | error()

              Composes a form-urlencoded QueryString based on a QueryList, a
              list of non-percent-encoded key-value pairs. Form-urlencoding is
              defined in section 4.10.21.6 of the HTML 5.2 specification and
              in section 4.10.22.6 of the HTML 5.0 specification for non-UTF-8
              encodings.

              See also the opposite operation dissect_query/1.

              Example:

              1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}]).
              "foo+bar=1&city=%C3%B6rebro"
              2> uri_string:compose_query([{<<"foo bar">>,<<"1">>},
              2> {<<"city">>,<<"örebro"/utf8>>}]).
              <<"foo+bar=1&city=%C3%B6rebro">>


       compose_query(QueryList, Options) -> QueryString

              Types:

                 QueryList = [{unicode:chardata(), unicode:chardata() | true}]
                 Options = [{encoding, atom()}]
                 QueryString = uri_string() | error()

              Same as compose_query/1 but with an additional Options parameter
              that controls the encoding ("charset") used by the encoding
              algorithm. There are two supported encodings: utf8 (or unicode)
              and latin1.

              Each character in the entry's name and value that cannot be
              expressed using the selected character encoding is replaced by
              a string consisting of a U+0026 AMPERSAND character (&), a "#"
              (U+0023) character, one or more ASCII digits representing the
              Unicode code point of the character in base ten, and finally a
              ";" (U+003B) character.

              Bytes outside the set 0x2A, 0x2D, 0x2E, 0x30 to 0x39, 0x41 to
              0x5A, 0x5F, and 0x61 to 0x7A are percent-encoded (a U+0025
              PERCENT SIGN character (%) followed by uppercase ASCII hex
              digits representing the hexadecimal value of the byte).

              See also the opposite operation dissect_query/1.

              Example:

              1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],
              1> [{encoding, latin1}]).
              "foo+bar=1&city=%F6rebro"
              2> uri_string:compose_query([{<<"foo bar">>,<<"1">>},
              2> {<<"city">>,<<"東京"/utf8>>}], [{encoding, latin1}]).
              <<"foo+bar=1&city=%26%2326481%3B%26%2320140%3B">>


       dissect_query(QueryString) -> QueryList

              Types:

                 QueryString = uri_string()
                 QueryList =
                     [{unicode:chardata(), unicode:chardata() | true}] |
                     error()

              Dissects a urlencoded QueryString and returns a QueryList, a
              list of non-percent-encoded key-value pairs. Form-urlencoding is
              defined in section 4.10.21.6 of the HTML 5.2 specification and
              in section 4.10.22.6 of the HTML 5.0 specification for non-UTF-8
              encodings.

              See also the opposite operation compose_query/1.

              Example:

              1> uri_string:dissect_query("foo+bar=1&city=%C3%B6rebro").
              [{"foo bar","1"},{"city","örebro"}]
              2> uri_string:dissect_query(<<"foo+bar=1&city=%26%2326481%3B%26%2320140%3B">>).
              [{<<"foo bar">>,<<"1">>},
               {<<"city">>,<<230,157,177,228,186,172>>}]


       normalize(URI) -> NormalizedURI

              Types:

                 URI = uri_string() | uri_map()
                 NormalizedURI = uri_string() | error()

              Transforms a URI into a normalized form using Syntax-Based
              Normalization as defined by RFC 3986.

              This function implements case normalization, percent-encoding
              normalization, path segment normalization, and scheme-based
              normalization for HTTP(S), with basic support for FTP, SSH,
              SFTP, and TFTP.

              Example:

              1> uri_string:normalize("/a/b/c/./../../g").
              "/a/g"
              2> uri_string:normalize(<<"mid/content=5/../6">>).
              <<"mid/6">>
              3> uri_string:normalize("http://localhost:80").
              "http://localhost/"
              4> uri_string:normalize(#{scheme => "http",port => 80,path => "/a/b/c/./../../g",
              4> host => "localhost-örebro"}).
              "http://localhost-%C3%B6rebro/a/g"
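
              Normalized forms are useful when comparing URIs. As a sketch,
              the expressions below compare two spellings of what should be
              the same resource; A and B are illustrative values. Given the
              normalizations listed above (case, path segment, and the
              default HTTP port), the comparison is expected to yield true.

                  A = "HTTP://Example.COM:80/a/./b/../c".
                  B = "http://example.com/a/c".
                  uri_string:normalize(A) =:= uri_string:normalize(B).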


       normalize(URI, Options) -> NormalizedURI

              Types:

                 URI = uri_string() | uri_map()
                 Options = [return_map]
                 NormalizedURI = uri_string() | uri_map()

              Same as normalize/1 but with an additional Options parameter
              that controls whether the normalized URI is returned as a
              uri_map(). There is one supported option: return_map.

              Example:

              1> uri_string:normalize("/a/b/c/./../../g", [return_map]).
              #{path => "/a/g"}
              2> uri_string:normalize(<<"mid/content=5/../6">>, [return_map]).
              #{path => <<"mid/6">>}
              3> uri_string:normalize("http://localhost:80", [return_map]).
              #{scheme => "http",path => "/",host => "localhost"}
              4> uri_string:normalize(#{scheme => "http",port => 80,path => "/a/b/c/./../../g",
              4> host => "localhost-örebro"}, [return_map]).
              #{scheme => "http",path => "/a/g",host => "localhost-örebro"}


       parse(URIString) -> URIMap

              Types:

                 URIString = uri_string()
                 URIMap = uri_map() | error()

              Parses an RFC 3986 compliant uri_string() into a uri_map() that
              holds the parsed components of the URI. If parsing fails, an
              error tuple is returned.

              See also the opposite operation recompose/1.

              Example:

              1> uri_string:parse("foo://user@example.com:8042/over/there?name=ferret#nose").
              #{fragment => "nose",host => "example.com",
                path => "/over/there",port => 8042,query => "name=ferret",
                scheme => "foo",userinfo => "user"}
              2> uri_string:parse(<<"foo://user@example.com:8042/over/there?name=ferret">>).
              #{host => <<"example.com">>,path => <<"/over/there">>,
                port => 8042,query => <<"name=ferret">>,scheme => <<"foo">>,
                userinfo => <<"user">>}
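
              A sketch of combining parse/1 with dissect_query/1 to turn the
              query component into key-value pairs. The URI is an arbitrary
              illustration.

                  #{query := Query} = uri_string:parse("http://example.com/s?q=erlang+otp&page=2").
                  %% Query is bound to "q=erlang+otp&page=2" at this point.
                  [{"q","erlang otp"},{"page","2"}] = uri_string:dissect_query(Query).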


       recompose(URIMap) -> URIString

              Types:

                 URIMap = uri_map()
                 URIString = uri_string() | error()

              Creates an RFC 3986 compliant URIString (percent-encoded), based
              on the components of URIMap. If the URIMap is invalid, an error
              tuple is returned.

              See also the opposite operation parse/1.

              Example:

              1> URIMap = #{fragment => "nose", host => "example.com", path => "/over/there",
              1> port => 8042, query => "name=ferret", scheme => "foo", userinfo => "user"}.
              #{fragment => "nose",host => "example.com",
                path => "/over/there",port => 8042,query => "name=ferret",
                scheme => "foo",userinfo => "user"}
              2> uri_string:recompose(URIMap).
              "foo://user@example.com:8042/over/there?name=ferret#nose"
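
              A sketch of editing one component of a parsed URI and
              recomposing the result. The URI and the new path are
              illustrative values.

                  Map0 = uri_string:parse("http://example.com/old?x=1").
                  %% Replace only the path; the other components are kept.
                  Map1 = Map0#{path := "/new"}.
                  "http://example.com/new?x=1" = uri_string:recompose(Map1).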


       transcode(URIString, Options) -> Result

              Types:

                 URIString = uri_string()
                 Options =
                     [{in_encoding, unicode:encoding()} |
                      {out_encoding, unicode:encoding()}]
                 Result = uri_string() | error()

              Transcodes an RFC 3986 compliant URIString, where Options is a
              list of tagged tuples specifying the inbound (in_encoding) and
              outbound (out_encoding) encodings. in_encoding and out_encoding
              specify both the binary encoding and the percent-encoding of the
              input and output data. Mixed encoding, where the binary encoding
              is not the same as the percent-encoding, is not supported. If an
              argument is invalid, an error tuple is returned.

              Example:

              1> uri_string:transcode(<<"foo%00%00%00%F6bar"/utf32>>,
              1> [{in_encoding, utf32},{out_encoding, utf8}]).
              <<"foo%C3%B6bar"/utf8>>
              2> uri_string:transcode("foo%F6bar", [{in_encoding, latin1},
              2> {out_encoding, utf8}]).
              "foo%C3%B6bar"



Ericsson AB                       stdlib 3.10                    uri_string(3)