uri_string(3)              Erlang Module Definition              uri_string(3)

NAME
       uri_string - URI processing functions.

DESCRIPTION
       This module contains functions for parsing and handling URIs (RFC 3986)
       and form-urlencoded query strings (HTML 5.2).

       Parsing and serializing non-UTF-8 form-urlencoded query strings are
       also supported (HTML 5.0).

       A URI is an identifier consisting of a sequence of characters matching
       the syntax rule named URI in RFC 3986.

       The generic URI syntax consists of a hierarchical sequence of
       components referred to as the scheme, authority, path, query, and
       fragment:

           URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
           hier-part   = "//" authority path-abempty
                       / path-absolute
                       / path-rootless
                       / path-empty
           scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
           authority   = [ userinfo "@" ] host [ ":" port ]
           userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )

           reserved    = gen-delims / sub-delims
           gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
           sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                       / "*" / "+" / "," / ";" / "="

           unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

       The interpretation of a URI depends only on the characters used and not
       on how those characters are represented in a network protocol.

       The functions implemented by this module cover the following use cases:

         * Parsing a URI into its components and returning a map
           parse/1

         * Recomposing a map of URI components into a URI string
           recompose/1

         * Changing the inbound binary encoding and percent-encoding of URIs
           transcode/2

         * Transforming URIs into a normalized form
           normalize/1
           normalize/2

         * Composing form-urlencoded query strings from a list of key-value
           pairs
           compose_query/1
           compose_query/2

         * Dissecting form-urlencoded query strings into a list of key-value
           pairs
           dissect_query/1

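       The use cases above can be combined. The following is a minimal
       sketch, not part of this module: the module name uri_roundtrip and
       the helper set_query/2 are hypothetical. It parses a URI, replaces
       the query component in the resulting map, and recomposes the URI,
       passing any error tuple through unchanged.

```erlang
-module(uri_roundtrip).
-export([set_query/2]).

%% set_query/2 is a hypothetical helper, not part of uri_string.
%% It replaces (or adds) the query component of URI.
set_query(URI, Query) ->
    case uri_string:parse(URI) of
        {error, _Reason, _Term} = Error ->
            %% Parsing failed: hand the error tuple back to the caller.
            Error;
        Map ->
            %% Update the query key and turn the map back into a URI.
            uri_string:recompose(Map#{query => Query})
    end.
```

       For example, uri_roundtrip:set_query("http://example.com/a?old=0",
       "x=1") is expected to return "http://example.com/a?x=1".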
       There are four different encodings present during the handling of URIs:

         * Inbound binary encoding in binaries

         * Inbound percent-encoding in lists and binaries

         * Outbound binary encoding in binaries

         * Outbound percent-encoding in lists and binaries

       Functions with a uri_string() argument accept lists, binaries and mixed
       lists (lists with binary elements) as input. All of the functions
       except transcode/2 expect input as lists of Unicode code points, UTF-8
       encoded binaries and UTF-8 percent-encoded URI parts ("%C3%B6"
       corresponds to the Unicode character "ö").

       Unless otherwise specified, the return value type and encoding are the
       same as the input type and encoding. That is, binary input returns
       binary output and list input returns list output, but mixed input
       returns list output.

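       The input and output typing rule can be illustrated with a small
       sketch. The module name uri_io_types and the helper host/1 are
       hypothetical and only serve this example; the helper assumes the URI
       parses successfully and contains a host.

```erlang
-module(uri_io_types).
-export([host/1]).

%% host/1 is a hypothetical helper: it returns the host component in
%% whatever representation uri_string:parse/1 chose for the result.
%% It crashes with badmatch if parsing fails or no host is present.
host(URI) ->
    #{host := Host} = uri_string:parse(URI),
    Host.
```

       With list input the host comes back as a list, with binary input as
       a binary, and with a mixed list such as
       ["http://", <<"example.com">>, "/"] as a list.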
       In lists there is only percent-encoding. In binaries, however, both
       binary encoding and percent-encoding must be considered. transcode/2
       provides the means to convert between the supported encodings: it
       takes a uri_string() and a list of options specifying the inbound and
       outbound encodings.

       RFC 3986 does not mandate any specific character encoding; it is
       usually defined by the protocol or surrounding text. This library
       makes the same assumption: binary encoding and percent-encoding are
       handled as one configuration unit and cannot be set to different
       values.

DATA TYPES
       error() = {error, atom(), term()}

              Error tuple indicating the type of error. Possible values of the
              second component:

                * invalid_character

                * invalid_encoding

                * invalid_input

                * invalid_map

                * invalid_percent_encoding

                * invalid_scheme

                * invalid_uri

                * invalid_utf8

                * missing_value

              The third component is a term providing additional information
              about the cause of the error.

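              Because errors are returned as three-element tuples rather
              than raised as exceptions, callers usually match on them
              explicitly. A minimal sketch follows; the module name
              uri_errors and the helper to_result/1 are hypothetical and
              not part of this module.

```erlang
-module(uri_errors).
-export([to_result/1]).

%% to_result/1 is a hypothetical helper that wraps the return value of a
%% uri_string function into the common {ok, Value} | {error, Info} shape.
to_result({error, Reason, Detail}) when is_atom(Reason) ->
    %% Reason is one of the atoms listed above, Detail the extra term.
    {error, {Reason, Detail}};
to_result(Value) ->
    {ok, Value}.
```

              A call such as uri_errors:to_result(uri_string:parse(URI))
              then yields either {ok, Map} or {error, {Reason, Detail}}.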
       uri_map() =
           #{fragment => unicode:chardata(),
             host => unicode:chardata(),
             path => unicode:chardata(),
             port => integer() >= 0 | undefined,
             query => unicode:chardata(),
             scheme => unicode:chardata(),
             userinfo => unicode:chardata()} |
           #{}

              Map holding the main components of a URI.

       uri_string() = iodata()

              List of Unicode code points, a UTF-8 encoded binary, or a mix of
              the two, representing an RFC 3986 compliant URI (percent-encoded
              form). A URI is a sequence of characters from a very limited
              set: the letters of the basic Latin alphabet, digits, and a few
              special characters.

EXPORTS
       compose_query(QueryList) -> QueryString

              Types:

                 QueryList = [{unicode:chardata(), unicode:chardata() | true}]
                 QueryString = uri_string() | error()

              Composes a form-urlencoded QueryString based on a QueryList, a
              list of non-percent-encoded key-value pairs. Form-urlencoding is
              defined in section 4.10.21.6 of the HTML 5.2 specification and
              in section 4.10.22.6 of the HTML 5.0 specification for non-UTF-8
              encodings.

              See also the opposite operation dissect_query/1.

              Example:

              1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}]).
              "foo+bar=1&city=%C3%B6rebro"
              2> uri_string:compose_query([{<<"foo bar">>,<<"1">>},
              2> {<<"city">>,<<"örebro"/utf8>>}]).
              <<"foo+bar=1&city=%C3%B6rebro">>

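              The QueryList type also allows the atom true as a value. The
              sketch below assumes that such a pair is serialized as a bare
              key without "=" and a value; the module name query_flags and
              the helper flags_query/1 are hypothetical.

```erlang
-module(query_flags).
-export([flags_query/1]).

%% flags_query/1 is a hypothetical helper that turns a list of flag names
%% into a query string made of valueless keys.
flags_query(Flags) ->
    %% Pair every flag with the atom true, as allowed by QueryList.
    uri_string:compose_query([{Flag, true} || Flag <- Flags]).
```

              Under that assumption, query_flags:flags_query(["debug",
              "verbose"]) produces the query string "debug&verbose".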

       compose_query(QueryList, Options) -> QueryString

              Types:

                 QueryList = [{unicode:chardata(), unicode:chardata() | true}]
                 Options = [{encoding, atom()}]
                 QueryString = uri_string() | error()

              Same as compose_query/1 but with an additional Options parameter
              that controls the encoding ("charset") used by the encoding
              algorithm. There are two supported encodings: utf8 (or unicode)
              and latin1.

              Each character in the entry's name and value that cannot be
              expressed using the selected character encoding is replaced by
              a string consisting of a U+0026 AMPERSAND character (&), a "#"
              (U+0023) character, one or more ASCII digits representing the
              Unicode code point of the character in base ten, and finally a
              ";" (U+003B) character.

              A space is written as "+", as the examples show. All other
              bytes except 0x2A, 0x2D, 0x2E, 0x30 to 0x39, 0x41 to 0x5A, 0x5F,
              and 0x61 to 0x7A are percent-encoded (a U+0025 PERCENT SIGN
              character (%) followed by uppercase ASCII hex digits
              representing the hexadecimal value of the byte).

              See also the opposite operation dissect_query/1.

              Example:

              1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],
              1> [{encoding, latin1}]).
              "foo+bar=1&city=%F6rebro"
              2> uri_string:compose_query([{<<"foo bar">>,<<"1">>},
              2> {<<"city">>,<<"東京"/utf8>>}], [{encoding, latin1}]).
              <<"foo+bar=1&city=%26%2326481%3B%26%2320140%3B">>


       dissect_query(QueryString) -> QueryList

              Types:

                 QueryString = uri_string()
                 QueryList =
                     [{unicode:chardata(), unicode:chardata() | true}] |
                     error()

              Dissects a form-urlencoded QueryString and returns a QueryList,
              a list of non-percent-encoded key-value pairs. Form-urlencoding
              is defined in section 4.10.21.6 of the HTML 5.2 specification
              and in section 4.10.22.6 of the HTML 5.0 specification for
              non-UTF-8 encodings.

              See also the opposite operation compose_query/1.

              Example:

              1> uri_string:dissect_query("foo+bar=1&city=%C3%B6rebro").
              [{"foo bar","1"},{"city","örebro"}]
              2> uri_string:dissect_query(<<"foo+bar=1&city=%26%2326481%3B%26%2320140%3B">>).
              [{<<"foo bar">>,<<"1">>},
               {<<"city">>,<<230,157,177,228,186,172>>}]

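              A common use of dissect_query/1 is looking up a single
              parameter of a URI. The following sketch combines it with
              parse/1; the module name uri_query and the helper
              query_param/2 are hypothetical, and the key is assumed to use
              the same representation (list or binary) as the URI.

```erlang
-module(uri_query).
-export([query_param/2]).

%% query_param/2 is a hypothetical helper. It returns the decoded value
%% stored under Key in the query of URI, undefined if the URI has no
%% query or no such key, or an error tuple if parsing fails.
query_param(URI, Key) ->
    case uri_string:parse(URI) of
        {error, _, _} = Error ->
            Error;
        #{query := Query} ->
            case uri_string:dissect_query(Query) of
                {error, _, _} = Error -> Error;
                Pairs -> proplists:get_value(Key, Pairs)
            end;
        _NoQuery ->
            undefined
    end.
```

              For the URI "foo://example.com/over/there?name=ferret#nose"
              and the key "name", the helper returns "ferret".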

       normalize(URI) -> NormalizedURI

              Types:

                 URI = uri_string() | uri_map()
                 NormalizedURI = uri_string() | error()

              Transforms a URI into a normalized form using Syntax-Based
              Normalization as defined by RFC 3986.

              This function implements case normalization, percent-encoding
              normalization, path segment normalization and scheme-based
              normalization for HTTP(S) with basic support for FTP, SSH, SFTP
              and TFTP.

              Example:

              1> uri_string:normalize("/a/b/c/./../../g").
              "/a/g"
              2> uri_string:normalize(<<"mid/content=5/../6">>).
              <<"mid/6">>
              3> uri_string:normalize("http://localhost:80").
              "http://localhost/"
              4> uri_string:normalize(#{scheme => "http",port => 80,path => "/a/b/c/./../../g",
              4> host => "localhost-örebro"}).
              "http://localhost-%C3%B6rebro/a/g"

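              Normalization is typically used to decide whether two URIs
              are equivalent. A minimal sketch, assuming both arguments use
              the same representation (both lists or both binaries); the
              module name uri_compare and the helper equivalent/2 are
              hypothetical.

```erlang
-module(uri_compare).
-export([equivalent/2]).

%% equivalent/2 is a hypothetical helper. Two URIs are treated as
%% equivalent if their normalized forms are identical. A URI that fails
%% to normalize is never considered equivalent to anything.
equivalent(A, B) ->
    case {uri_string:normalize(A), uri_string:normalize(B)} of
        {{error, _, _}, _} -> false;
        {_, {error, _, _}} -> false;
        {NormA, NormB} -> NormA =:= NormB
    end.
```

              With this helper, "http://localhost:80" and
              "http://localhost/" compare as equivalent, since both
              normalize to "http://localhost/".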

       normalize(URI, Options) -> NormalizedURI

              Types:

                 URI = uri_string() | uri_map()
                 Options = [return_map]
                 NormalizedURI = uri_string() | uri_map()

              Same as normalize/1 but with an additional Options parameter
              that controls whether the normalized URI is returned as a
              uri_map(). There is one supported option: return_map.

              Example:

              1> uri_string:normalize("/a/b/c/./../../g", [return_map]).
              #{path => "/a/g"}
              2> uri_string:normalize(<<"mid/content=5/../6">>, [return_map]).
              #{path => <<"mid/6">>}
              3> uri_string:normalize("http://localhost:80", [return_map]).
              #{scheme => "http",path => "/",host => "localhost"}
              4> uri_string:normalize(#{scheme => "http",port => 80,path => "/a/b/c/./../../g",
              4> host => "localhost-örebro"}, [return_map]).
              #{scheme => "http",path => "/a/g",host => "localhost-örebro"}

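              The return_map option is convenient when only one normalized
              component is needed. A small sketch; the module name
              uri_norm_path and the helper normalized_path/1 are
              hypothetical.

```erlang
-module(uri_norm_path).
-export([normalized_path/1]).

%% normalized_path/1 is a hypothetical helper that returns only the
%% normalized path of a URI, or the error tuple if normalization fails.
normalized_path(URI) ->
    case uri_string:normalize(URI, [return_map]) of
        #{path := Path} -> Path;
        {error, _, _} = Error -> Error
    end.
```

              For the input "/a/b/c/./../../g" the helper returns "/a/g",
              matching the first example above.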

       parse(URIString) -> URIMap

              Types:

                 URIString = uri_string()
                 URIMap = uri_map() | error()

              Parses an RFC 3986 compliant uri_string() into a uri_map() that
              holds the parsed components of the URI. If parsing fails, an
              error tuple is returned.

              See also the opposite operation recompose/1.

              Example:

              1> uri_string:parse("foo://user@example.com:8042/over/there?name=ferret#nose").
              #{fragment => "nose",host => "example.com",
                path => "/over/there",port => 8042,query => "name=ferret",
                scheme => "foo",userinfo => "user"}
              2> uri_string:parse(<<"foo://user@example.com:8042/over/there?name=ferret">>).
              #{host => <<"example.com">>,path => <<"/over/there">>,
                port => 8042,query => <<"name=ferret">>,scheme => <<"foo">>,
                userinfo => <<"user">>}

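              Components that do not occur in the URI are absent from the
              returned map, so optional keys are best read with a default.
              A minimal sketch; the module name uri_endpoint and the helper
              host_port/1 are hypothetical.

```erlang
-module(uri_endpoint).
-export([host_port/1]).

%% host_port/1 is a hypothetical helper that extracts the host and the
%% port of a URI. The port is undefined when the URI does not specify
%% one, and the result is undefined when the URI has no host.
host_port(URI) ->
    case uri_string:parse(URI) of
        {error, _, _} = Error ->
            Error;
        #{host := Host} = Map ->
            %% port is optional, so fall back to a default value.
            {Host, maps:get(port, Map, undefined)};
        _NoAuthority ->
            undefined
    end.
```

              For "foo://user@example.com:8042/over/there" this gives
              {"example.com", 8042}; for "foo://example.com/" it gives
              {"example.com", undefined}.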

       recompose(URIMap) -> URIString

              Types:

                 URIMap = uri_map()
                 URIString = uri_string() | error()

              Creates an RFC 3986 compliant URIString (percent-encoded), based
              on the components of URIMap. If the URIMap is invalid, an error
              tuple is returned.

              See also the opposite operation parse/1.

              Example:

              1> URIMap = #{fragment => "nose", host => "example.com", path => "/over/there",
              1> port => 8042, query => "name=ferret", scheme => "foo", userinfo => "user"}.
              #{fragment => "nose",host => "example.com",
                path => "/over/there",port => 8042,query => "name=ferret",
                scheme => "foo",userinfo => "user"}

              2> uri_string:recompose(URIMap).
              "foo://user@example.com:8042/over/there?name=ferret#nose"

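              recompose/1 pairs naturally with compose_query/1 when a URI is
              built from separate parts. The sketch below is one possible
              arrangement, not a prescribed pattern: the module name
              uri_build, the helper build/3, and the fixed "https" scheme
              are assumptions made for the example.

```erlang
-module(uri_build).
-export([build/3]).

%% build/3 is a hypothetical helper that assembles an https URI from a
%% host, a path and a list of query parameters (all given as lists).
build(Host, Path, Params) ->
    case uri_string:compose_query(Params) of
        {error, _, _} = Error ->
            %% The parameters could not be form-urlencoded.
            Error;
        Query ->
            uri_string:recompose(#{scheme => "https",
                                   host => Host,
                                   path => Path,
                                   query => Query})
    end.
```

              A call such as uri_build:build("example.com", "/search",
              [{"q","erlang otp"}]) is expected to return
              "https://example.com/search?q=erlang+otp".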
       transcode(URIString, Options) -> Result

              Types:

                 URIString = uri_string()
                 Options =
                     [{in_encoding, unicode:encoding()} |
                      {out_encoding, unicode:encoding()}]
                 Result = uri_string() | error()

              Transcodes an RFC 3986 compliant URIString, where Options is a
              list of tagged tuples specifying the inbound (in_encoding) and
              outbound (out_encoding) encodings. in_encoding and out_encoding
              specify both the binary encoding and the percent-encoding for
              the input and output data. Mixed encoding, where binary encoding
              is not the same as percent-encoding, is not supported. If an
              argument is invalid, an error tuple is returned.

              Example:

              1> uri_string:transcode(<<"foo%00%00%00%F6bar"/utf32>>,
              1> [{in_encoding, utf32},{out_encoding, utf8}]).
              <<"foo%C3%B6bar"/utf8>>
              2> uri_string:transcode("foo%F6bar", [{in_encoding, latin1},
              2> {out_encoding, utf8}]).
              "foo%C3%B6bar"

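              Since the other functions of this module expect UTF-8
              percent-encoding, transcode/2 is often applied first to
              legacy input. A minimal sketch under that assumption; the
              module name uri_legacy and the helper latin1_to_utf8/1 are
              hypothetical.

```erlang
-module(uri_legacy).
-export([latin1_to_utf8/1]).

%% latin1_to_utf8/1 is a hypothetical helper that converts a URI whose
%% binary encoding and percent-encoding are Latin-1 into UTF-8, so that
%% it can be passed on to parse/1 or normalize/1. An error tuple from
%% transcode/2 is returned unchanged.
latin1_to_utf8(URI) ->
    uri_string:transcode(URI, [{in_encoding, latin1},
                               {out_encoding, utf8}]).
```

              For the input "foo%F6bar" the helper returns "foo%C3%B6bar",
              as in the second example above.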

Ericsson AB                       stdlib 3.10                    uri_string(3)