uri_string(3)              Erlang Module Definition              uri_string(3)

NAME

       uri_string - URI processing functions.

DESCRIPTION

       This module contains functions for parsing and handling URIs (RFC 3986)
       and form-urlencoded query strings (HTML 5.2).

       Parsing and serializing non-UTF-8 form-urlencoded query strings are
       also supported (HTML 5.0).

       A URI is an identifier consisting of a sequence of characters matching
       the syntax rule named URI in RFC 3986.

       The generic URI syntax consists of a hierarchical sequence of
       components referred to as the scheme, authority, path, query, and
       fragment:

           URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
           hier-part   = "//" authority path-abempty
                          / path-absolute
                          / path-rootless
                          / path-empty
           scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
           authority   = [ userinfo "@" ] host [ ":" port ]
           userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )

           reserved    = gen-delims / sub-delims
           gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
           sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                       / "*" / "+" / "," / ";" / "="

           unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

       The interpretation of a URI depends only on the characters used and not
       on how those characters are represented in a network protocol.

       The functions implemented by this module cover the following use cases:

         * Parsing a URI into its components and returning a map
           parse/1

         * Recomposing a map of URI components into a URI string
           recompose/1

         * Changing inbound binary and percent-encoding of URIs
           transcode/2

         * Transforming URIs into a normalized form
           normalize/1
           normalize/2

         * Composing form-urlencoded query strings from a list of key-value
           pairs
           compose_query/1
           compose_query/2

         * Dissecting form-urlencoded query strings into a list of key-value
           pairs
           dissect_query/1

       There are four different encodings present during the handling of URIs:

         * Inbound binary encoding in binaries

         * Inbound percent-encoding in lists and binaries

         * Outbound binary encoding in binaries

         * Outbound percent-encoding in lists and binaries

       Functions with a uri_string() argument accept lists, binaries, and
       mixed lists (lists with binary elements) as input. All functions except
       transcode/2 expect input as lists of Unicode code points, UTF-8 encoded
       binaries, and UTF-8 percent-encoded URI parts ("%C3%B6" corresponds to
       the Unicode character "ö").

       Unless otherwise specified, the return value type and encoding are the
       same as the input type and encoding. That is, binary input returns
       binary output and list input returns list output. Mixed input also
       returns list output.

       Lists carry only percent-encoding. In binaries, however, both binary
       encoding and percent-encoding must be considered. transcode/2 provides
       the means to convert between the supported encodings. It takes a
       uri_string() and a list of options specifying the inbound and outbound
       encodings.

       RFC 3986 does not mandate any specific character encoding; the encoding
       is usually defined by the protocol or surrounding text. This library
       makes the same assumption: binary encoding and percent-encoding are
       handled as one configuration unit and cannot be set to different values.

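       As a hedged illustration of the input and output type rules above, the
       following sketch parses the same URI given as a list, as a binary, and
       as a mixed list. The module name uri_io_types and the helper path_of/1
       are hypothetical and not part of uri_string.

```erlang
-module(uri_io_types).
-export([path_of/1]).

%% Hypothetical helper: parse a URI and return only its path component.
%% The path is expected to have the same type as the input (list or
%% binary); mixed input is expected to yield a list.
path_of(URI) ->
    case uri_string:parse(URI) of
        {error, _Reason, _Term} = Error ->
            Error;
        Map ->
            %% Fall back to undefined if no path key is present.
            maps:get(path, Map, undefined)
    end.
```

       Under these assumptions, path_of("http://example.com/a/b") returns a
       list, path_of(<<"http://example.com/a/b">>) returns a binary, and
       path_of(["http://", <<"example.com">>, "/a/b"]) returns a list.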

DATA TYPES

       error() = {error, atom(), term()}

              Error tuple indicating the type of error. Possible values of the
              second component:

                * invalid_character

                * invalid_encoding

                * invalid_input

                * invalid_map

                * invalid_percent_encoding

                * invalid_scheme

                * invalid_uri

                * invalid_utf8

                * missing_value

              The third component is a term providing additional information
              about the cause of the error.

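              Because the functions of this module return the error tuple
              instead of raising an exception, callers usually match on it
              explicitly. A minimal sketch, assuming a hypothetical wrapper
              safe_parse/1 in a hypothetical module uri_error_demo:

```erlang
-module(uri_error_demo).
-export([safe_parse/1]).

%% Hypothetical wrapper: turn the result of uri_string:parse/1 into an
%% {ok, Map} | {error, {Reason, Term}} shape that is easy to match on.
safe_parse(URI) ->
    case uri_string:parse(URI) of
        {error, Reason, Term} ->
            %% Reason is one of the atoms listed above; Term gives detail.
            {error, {Reason, Term}};
        Map when is_map(Map) ->
            {ok, Map}
    end.
```

              The tagged {ok, Map} shape is a design choice of this sketch
              only; the module itself returns the bare map.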
       uri_map() =
           #{fragment => unicode:chardata(),
             host => unicode:chardata(),
             path => unicode:chardata(),
             port => integer() >= 0 | undefined,
             query => unicode:chardata(),
             scheme => unicode:chardata(),
             userinfo => unicode:chardata()} |
           #{}

              Map holding the main components of a URI.

       uri_string() = iodata()

              List of Unicode code points, a UTF-8 encoded binary, or a mix of
              the two, representing an RFC 3986 compliant URI (percent-encoded
              form). A URI is a sequence of characters from a very limited
              set: the letters of the basic Latin alphabet, digits, and a few
              special characters.

EXPORTS

       compose_query(QueryList) -> QueryString

              Types:

                 QueryList = [{unicode:chardata(), unicode:chardata()}]
                 QueryString = uri_string() | error()

              Composes a form-urlencoded QueryString based on a QueryList, a
              list of non-percent-encoded key-value pairs. Form-urlencoding is
              defined in section 4.10.21.6 of the HTML 5.2 specification and
              in section 4.10.22.6 of the HTML 5.0 specification for non-UTF-8
              encodings.

              See also the opposite operation dissect_query/1.

              Example:

              1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}]).
              "foo+bar=1&city=%C3%B6rebro"
              2> uri_string:compose_query([{<<"foo bar">>,<<"1">>},
              2> {<<"city">>,<<"örebro"/utf8>>}]).
              <<"foo+bar=1&city=%C3%B6rebro">>

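              Since compose_query/1 and dissect_query/1 are opposite
              operations, a round trip is a quick way to check a list of
              pairs. The sketch below assumes a hypothetical module
              uri_query_roundtrip with a helper roundtrip/1:

```erlang
-module(uri_query_roundtrip).
-export([roundtrip/1]).

%% Hypothetical helper: compose a query string from key-value pairs and
%% immediately dissect it again. Returns {QueryString, Pairs} on success
%% or the error tuple from compose_query/1.
roundtrip(Pairs) ->
    case uri_string:compose_query(Pairs) of
        {error, _Reason, _Term} = Error ->
            Error;
        QueryString ->
            {QueryString, uri_string:dissect_query(QueryString)}
    end.
```

              For plain string pairs, the second element of the result is
              expected to equal the original list.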

       compose_query(QueryList, Options) -> QueryString

              Types:

                 QueryList = [{unicode:chardata(), unicode:chardata()}]
                 Options = [{encoding, atom()}]
                 QueryString = uri_string() | error()

              Same as compose_query/1 but with an additional Options parameter
              that controls the encoding ("charset") used by the encoding
              algorithm. There are two supported encodings: utf8 (or unicode)
              and latin1.

              Each character in the entry's name and value that cannot be
              expressed using the selected character encoding is replaced by
              a string consisting of a U+0026 AMPERSAND character (&), a "#"
              (U+0023) character, one or more ASCII digits representing the
              Unicode code point of the character in base ten, and finally a
              ";" (U+003B) character.

              Bytes outside the set 0x2A, 0x2D, 0x2E, 0x30 to 0x39, 0x41 to
              0x5A, 0x5F, 0x61 to 0x7A are percent-encoded (U+0025 PERCENT
              SIGN character (%) followed by uppercase ASCII hex digits
              representing the hexadecimal value of the byte). The space
              character (0x20) is the exception: it is encoded as "+", as the
              examples show.

              See also the opposite operation dissect_query/1.

              Example:

              1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],
              1> [{encoding, latin1}]).
              "foo+bar=1&city=%F6rebro"
              2> uri_string:compose_query([{<<"foo bar">>,<<"1">>},
              2> {<<"city">>,<<"東京"/utf8>>}], [{encoding, latin1}]).
              <<"foo+bar=1&city=%26%2326481%3B%26%2320140%3B">>


       dissect_query(QueryString) -> QueryList

              Types:

                 QueryString = uri_string()
                 QueryList =
                     [{unicode:chardata(), unicode:chardata()}] | error()

              Dissects a urlencoded QueryString and returns a QueryList, a
              list of non-percent-encoded key-value pairs. Form-urlencoding is
              defined in section 4.10.21.6 of the HTML 5.2 specification and
              in section 4.10.22.6 of the HTML 5.0 specification for non-UTF-8
              encodings.

              See also the opposite operation compose_query/1.

              Example:

              1> uri_string:dissect_query("foo+bar=1&city=%C3%B6rebro").
              [{"foo bar","1"},{"city","örebro"}]
              2> uri_string:dissect_query(<<"foo+bar=1&city=%26%2326481%3B%26%2320140%3B">>).
              [{<<"foo bar">>,<<"1">>},
               {<<"city">>,<<230,157,177,228,186,172>>}]

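              The returned list of pairs is often more convenient as a map.
              A minimal sketch, assuming a hypothetical module uri_query_map
              with a helper query_to_map/1; when a key occurs more than once,
              maps:from_list/1 keeps the last value:

```erlang
-module(uri_query_map).
-export([query_to_map/1]).

%% Hypothetical helper: dissect a form-urlencoded query string and turn
%% the resulting key-value list into a map. The error tuple is passed
%% through unchanged.
query_to_map(QueryString) ->
    case uri_string:dissect_query(QueryString) of
        {error, _Reason, _Term} = Error ->
            Error;
        Pairs ->
            maps:from_list(Pairs)
    end.
```

              Keys and values keep the type of the input, so a binary query
              string yields binary keys and values.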

       normalize(URI) -> NormalizedURI

              Types:

                 URI = uri_string() | uri_map()
                 NormalizedURI = uri_string() | error()

              Transforms a URI into a normalized form using Syntax-Based
              Normalization as defined by RFC 3986.

              This function implements case normalization, percent-encoding
              normalization, path segment normalization, and scheme-based
              normalization for HTTP(S), with basic support for FTP, SSH,
              SFTP, and TFTP.

              Example:

              1> uri_string:normalize("/a/b/c/./../../g").
              "/a/g"
              2> uri_string:normalize(<<"mid/content=5/../6">>).
              <<"mid/6">>
              3> uri_string:normalize("http://localhost:80").
              "http://localhost/"
              4> uri_string:normalize(#{scheme => "http",port => 80,path => "/a/b/c/./../../g",
              4> host => "localhost-örebro"}).
              "http://localhost-%C3%B6rebro/a/g"

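              A common use of normalization is comparing two URIs for
              equivalence. The sketch below assumes a hypothetical module
              uri_equiv with a helper equivalent/2. Both arguments should be
              of the same type, because a list result never equals a binary
              result:

```erlang
-module(uri_equiv).
-export([equivalent/2]).

%% Hypothetical helper: two URIs are treated as equivalent when their
%% normalized forms are identical. Any normalization error counts as
%% "not equivalent".
equivalent(A, B) ->
    case {uri_string:normalize(A), uri_string:normalize(B)} of
        {{error, _, _}, _} -> false;
        {_, {error, _, _}} -> false;
        {NormA, NormB} -> NormA =:= NormB
    end.
```

              With the examples above, equivalent("/a/b/c/./../../g", "/a/g")
              is expected to return true.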

       normalize(URI, Options) -> NormalizedURI

              Types:

                 URI = uri_string() | uri_map()
                 Options = [return_map]
                 NormalizedURI = uri_string() | uri_map()

              Same as normalize/1 but with an additional Options parameter
              that controls whether the normalized URI is returned as a
              uri_map(). There is one supported option: return_map.

              Example:

              1> uri_string:normalize("/a/b/c/./../../g", [return_map]).
              #{path => "/a/g"}
              2> uri_string:normalize(<<"mid/content=5/../6">>, [return_map]).
              #{path => <<"mid/6">>}
              3> uri_string:normalize("http://localhost:80", [return_map]).
              #{scheme => "http",path => "/",host => "localhost"}
              4> uri_string:normalize(#{scheme => "http",port => 80,path => "/a/b/c/./../../g",
              4> host => "localhost-örebro"}, [return_map]).
              #{scheme => "http",path => "/a/g",host => "localhost-örebro"}


       parse(URIString) -> URIMap

              Types:

                 URIString = uri_string()
                 URIMap = uri_map() | error()

              Parses an RFC 3986 compliant uri_string() into a uri_map() that
              holds the parsed components of the URI. If parsing fails, an
              error tuple is returned.

              See also the opposite operation recompose/1.

              Example:

              1> uri_string:parse("foo://user@example.com:8042/over/there?name=ferret#nose").
              #{fragment => "nose",host => "example.com",
                path => "/over/there",port => 8042,query => "name=ferret",
                scheme => "foo",userinfo => "user"}
              2> uri_string:parse(<<"foo://user@example.com:8042/over/there?name=ferret">>).
              #{host => <<"example.com">>,path => <<"/over/there">>,
                port => 8042,query => <<"name=ferret">>,scheme => <<"foo">>,
                userinfo => <<"user">>}

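              Components that are absent from the URI are absent from the
              map, so reading optional keys with a default avoids a badkey
              error. A minimal sketch, assuming a hypothetical module
              uri_fields with a helper host_and_port/1:

```erlang
-module(uri_fields).
-export([host_and_port/1]).

%% Hypothetical helper: return {Host, Port} for a URI, using undefined
%% for any component that is not present in the parsed map.
host_and_port(URI) ->
    case uri_string:parse(URI) of
        {error, _Reason, _Term} = Error ->
            Error;
        Map ->
            {maps:get(host, Map, undefined),
             maps:get(port, Map, undefined)}
    end.
```

              For a relative reference such as "/only/path" both elements are
              expected to be undefined.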

       recompose(URIMap) -> URIString

              Types:

                 URIMap = uri_map()
                 URIString = uri_string() | error()

              Creates an RFC 3986 compliant URIString (percent-encoded), based
              on the components of URIMap. If the URIMap is invalid, an error
              tuple is returned.

              See also the opposite operation parse/1.

              Example:

              1> URIMap = #{fragment => "nose", host => "example.com", path => "/over/there",
              1> port => 8042, query => "name=ferret", scheme => "foo", userinfo => "user"}.
              #{fragment => "nose",host => "example.com",
                path => "/over/there",port => 8042,query => "name=ferret",
                scheme => "foo",userinfo => "user"}

              2> uri_string:recompose(URIMap).
              "foo://user@example.com:8042/over/there?name=ferret#nose"

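              parse/1 and recompose/1 can be combined to change a single
              component of a URI. The sketch below replaces the query part;
              the module uri_edit and the helper set_query/2 are hypothetical
              names used only for illustration:

```erlang
-module(uri_edit).
-export([set_query/2]).

%% Hypothetical helper: parse URI, replace its query component with a
%% query string composed from Pairs, and recompose the result. Error
%% tuples from parse/1 or compose_query/1 are returned unchanged.
set_query(URI, Pairs) ->
    case uri_string:parse(URI) of
        {error, _, _} = ParseError ->
            ParseError;
        Map ->
            case uri_string:compose_query(Pairs) of
                {error, _, _} = QueryError ->
                    QueryError;
                Query ->
                    uri_string:recompose(Map#{query => Query})
            end
    end.
```

              The URI and the pairs should be of the same type (both lists or
              both binaries) so that the recomposed map is consistent.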
       transcode(URIString, Options) -> Result

              Types:

                 URIString = uri_string()
                 Options =
                     [{in_encoding, unicode:encoding()} |
                      {out_encoding, unicode:encoding()}]
                 Result = uri_string() | error()

              Transcodes an RFC 3986 compliant URIString, where Options is a
              list of tagged tuples specifying the inbound (in_encoding) and
              outbound (out_encoding) encodings. in_encoding and out_encoding
              specify both binary encoding and percent-encoding for the input
              and output data. Mixed encoding, where binary encoding is not
              the same as percent-encoding, is not supported. If an argument
              is invalid, an error tuple is returned.

              Example:

              1> uri_string:transcode(<<"foo%00%00%00%F6bar"/utf32>>,
              1> [{in_encoding, utf32},{out_encoding, utf8}]).
              <<"foo%C3%B6bar"/utf8>>
              2> uri_string:transcode("foo%F6bar", [{in_encoding, latin1},
              2> {out_encoding, utf8}]).
              "foo%C3%B6bar"

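              Because the other functions of this module expect UTF-8, a URI
              received in another encoding is typically transcoded first. A
              minimal sketch, assuming the input is known to be Latin-1; the
              module uri_transcode_demo and the helper latin1_to_utf8/1 are
              hypothetical:

```erlang
-module(uri_transcode_demo).
-export([latin1_to_utf8/1]).

%% Hypothetical helper: convert a Latin-1 percent-encoded URI into its
%% UTF-8 percent-encoded form. Returns the error tuple from
%% uri_string:transcode/2 if the input cannot be transcoded.
latin1_to_utf8(URI) ->
    uri_string:transcode(URI, [{in_encoding, latin1},
                               {out_encoding, utf8}]).
```

              The result can then be passed to parse/1 or normalize/1.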

Ericsson AB                     stdlib 3.8.2.1                   uri_string(3)