trurl(1)                         trurl Manual                         trurl(1)

NAME
       trurl - transpose URLs

SYNOPSIS
       trurl [options / URLs]

DESCRIPTION
       trurl parses, manipulates and outputs URLs and parts of URLs.

       It uses the RFC 3986 definition of URLs and it uses libcurl's URL
       parser to do so, which includes a few "extensions". The URL support is
       limited to "hierarchical" URLs, the ones that use "://" separators
       after the scheme.

       Typically you pass in one or more URLs and decide which parts of them
       you want output, possibly modifying the URLs as well.

       trurl knows URLs and every URL consists of up to ten separate and
       independent "components". These components can be extracted, removed
       and updated with trurl and they are referred to by their respective
       names: scheme, user, password, options, host, port, path, query,
       fragment and zoneid.

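To make the component names concrete, here is a minimal Python sketch that
splits a URL into most of the same parts. It uses Python's urllib.parse, not
libcurl's parser, so it only approximates what trurl does; the sample URL is
made up for illustration, and the "options" and "zoneid" components have no
direct counterpart in this sketch.

```python
from urllib.parse import urlsplit

# Hypothetical sample URL, chosen only for illustration.
parts = urlsplit("https://user:secret@example.com:8080/a/b?x=1&y=2#top")

# Map the parsed result onto the component names used in this manual.
components = {
    "scheme": parts.scheme,
    "user": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}
for name, value in components.items():
    print(f"{name}: {value}")
```
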
OPTIONS
       Options start with one or two dashes. Many of the options require an
       additional value next to them.

       Any other argument is interpreted as a URL argument, and is treated as
       if it followed a --url option.

       The first argument that is exactly two dashes ("--") marks the end of
       options; any argument after the end of options is interpreted as a URL
       argument even if it starts with a dash.

       -a, --append [component]=[data]
              Append data to a component. This can only append data to the
              path and the query components.

              For path, this URL encodes and appends the new segment to the
              path, separated with a slash.

              For query, this URL encodes and appends the new segment to the
              query, separated with an ampersand (&). If the appended segment
              contains an equal sign ('='), that one will be kept verbatim and
              both sides of the first occurrence will be URL encoded
              separately.

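The query rule above can be sketched in Python as follows. This is an
illustration of the described behavior, not trurl's code: the helper name
append_query is hypothetical, and urllib.parse.quote may escape a different
set of characters (or escape them differently) than trurl does.

```python
from urllib.parse import quote

def append_query(query: str, segment: str, sep: str = "&") -> str:
    """Append one segment to a query string, encoding around the first '='."""
    if "=" in segment:
        # Keep the first '=' verbatim and encode both sides separately.
        key, value = segment.split("=", 1)
        encoded = quote(key, safe="") + "=" + quote(value, safe="")
    else:
        encoded = quote(segment, safe="")
    # Only add the separator when there is an existing query to extend.
    return query + sep + encoded if query else encoded

print(append_query("name=hello", "search=string"))
```
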
       --accept-space
              When set, trurl will try to accept spaces as part of the URL and
              instead URL encode such occurrences accordingly.

              According to RFC 3986, a space cannot legally be part of a URL.
              This option makes a best-effort attempt to convert the provided
              string into a valid URL.

       --default-port
              When set, trurl will use the scheme's default port number for
              URLs with a known scheme, and without an explicit port number.

              Note that trurl only knows default port numbers for URL schemes
              that are supported by libcurl.

              Since, by default, trurl removes default port numbers from URLs
              with a known scheme, this option has little visible effect
              unless --get, --json, or --keep-port is also specified.

       -f, --url-file [file name]
              Read URLs to work on from the given file. Use the file name "-"
              (a single minus) to tell trurl to read the URLs from stdin.

              Each line needs to be a single valid URL. trurl will remove one
              carriage return character at the end of the line if present,
              trim off all the trailing space and tab characters, and skip all
              empty (after trimming) lines.

              The maximum line length supported in a file like this is 4094
              bytes. Lines that exceed that length are skipped, and a warning
              is printed to stderr when they are encountered.

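The line handling described for --url-file can be sketched like this in
Python. It is an approximation under stated assumptions: the helper name
clean_url_lines is hypothetical, the warning text is invented, and whether
trurl counts the line terminator toward the 4094 byte limit is not stated
above, so this sketch measures the line without its newline.

```python
import sys

MAX_LINE_BYTES = 4094  # limit stated in the text above

def clean_url_lines(raw_lines):
    """Yield URL lines after the trimming and skipping rules described above."""
    for raw in raw_lines:
        line = raw.rstrip(b"\n")
        if len(line) > MAX_LINE_BYTES:
            # Assumed wording; the manual only says a warning goes to stderr.
            print("warning: skipping long line", file=sys.stderr)
            continue
        if line.endswith(b"\r"):
            line = line[:-1]            # remove a single carriage return
        line = line.rstrip(b" \t")      # trim trailing spaces and tabs
        if not line:
            continue                    # skip lines that are empty after trimming
        yield line.decode("utf-8", errors="replace")

sample = [b"https://curl.se/ \t\r\n", b"\r\n", b"   \n",
          b"https://example.com\n", b"x" * 5000 + b"\n"]
cleaned = list(clean_url_lines(sample))
print(cleaned)
```
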
       -g, --get [format]
              Output text and URL data according to the provided format
              string. Components from the URL can be output when specified as
              {component} or [component], with the name of the part shown
              within curly braces or brackets. You cannot mix braces and
              brackets for this purpose in the same command line.

              The following component names are available (case sensitive):
              url, scheme, user, password, options, host, port, path, query,
              fragment and zoneid.

              {component} will expand to nothing if the given component does
              not have a value.

              Components are shown URL decoded by default. If you instead
              write the component prefixed with a colon like "{:path}", it
              gets output URL encoded.

              You may also prefix components with default: and/or puny:, in
              any order.

              If default: is specified, like "{default:url}" or
              "{default:port}", and the port is not explicitly specified in
              the URL, the scheme's default port will be output if it is
              known.

              If puny: is specified, like "{puny:url}" or "{puny:host}", the
              "punycoded" version of the host name will be used in the output.

              If --default-port is specified, all formats are expanded as if
              they used default:; and if --punycode is used, all formats are
              expanded as if they used puny:. Also note that "{url}" is
              affected by the --keep-port option.

              Hosts provided as IPv6 numerical addresses will be output
              within square brackets, like "[fe80::20c:29ff:fe9c:409b]".

              Hosts provided as IPv4 numerical addresses will be "normalized"
              and provided as four dot-separated decimal numbers when output.

              You can access specific keys in the query string using the
              format {query:key}. Then the value of the first matching key
              will be output using a case sensitive match. When extracting a
              URL decoded query key that contains %00, that octet will be
              replaced with a single period '.' in the output.

              You can access specific keys in the query string and output all
              values using the format {query-all:key}. This looks for 'key'
              case sensitively and will output all values for that key,
              space-separated.

              The "format" string supports the following backslash sequences:

              \\ - backslash

              \t - tab

              \n - newline

              \r - carriage return

              \{ - an open curly brace that does not start a variable

              \[ - an open bracket that does not start a variable

              All other text in the format string will be shown as-is.

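The difference between {query:key} and {query-all:key} described above can be
illustrated with a small Python sketch. The helper name query_values and the
sample query are hypothetical, and comparing the raw (still encoded) key name
is an assumption of this sketch rather than documented trurl behavior.

```python
from urllib.parse import unquote

def query_values(query: str, key: str, sep: str = "&"):
    """Return the decoded values of every pair whose name equals key exactly."""
    values = []
    for pair in query.split(sep):
        name, _, value = pair.partition("=")
        if name == key:                      # case sensitive match
            values.append(unquote(value))
    return values

query = "a=home&here=now&a=again&A=upper"    # hypothetical query string
matches = query_values(query, "a")
first = matches[0] if matches else ""        # analogous to {query:a}
joined = " ".join(matches)                   # analogous to {query-all:a}
print(first)
print(joined)
```
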
       -h, --help
              Show the help output.

       --iterate [component]=[item1 item2 ...]
              Set the component to multiple values and output the result once
              for each iteration. Several combined iterations are allowed to
              generate combinations, but only one --iterate option per
              component. The listed items to iterate over should be separated
              by single spaces.

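Combining several iterations amounts to a Cartesian product of the item
lists. The Python sketch below shows the idea with a hypothetical iterate
helper built on urllib.parse; it only handles the fields Python exposes
(scheme, netloc, path, query, fragment), and the order in which trurl emits
combinations is an assumption here.

```python
from itertools import product
from urllib.parse import urlsplit

def iterate(url, **choices):
    """Yield one URL per combination of the space-separated items."""
    base = urlsplit(url)
    names = list(choices)
    item_lists = [choices[name].split(" ") for name in names]
    for combo in product(*item_lists):
        # Replace each named component with the item picked for this round.
        yield base._replace(**dict(zip(names, combo))).geturl()

variants = list(iterate("https://curl.se/path/index.html",
                        scheme="http ftp sftp"))
for url in variants:
    print(url)
```

Passing two components, for example scheme="http https" and path="/a /b",
yields four URLs, one per combination.
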
       --json Outputs all set components of the URLs as JSON objects. All
              components of the URL that have data will get populated in the
              parts object using their component names. See below for details
              on the format.

       --keep-port
              By default, trurl removes default port numbers from URLs with a
              known scheme even if they are explicitly specified in the input
              URL. This option makes trurl keep them.

       --no-guess-scheme
              Disables libcurl's scheme guessing feature. URLs that do not
              contain a scheme will be treated as invalid URLs.

       --punycode
              Uses the "punycoded" version of the host name, which is how
              International Domain Names are converted into plain ASCII. If
              the host name is not using IDN, the regular ASCII name is used.

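As a rough illustration of what "punycoded" means, Python's built-in "idna"
codec performs a similar conversion. This is not the code path trurl uses,
the helper name to_punycode is hypothetical, and the two implementations may
follow different IDNA revisions, so treat the result as indicative only.

```python
def to_punycode(host: str) -> str:
    """Convert a host name to its ASCII form; plain ASCII names pass through."""
    return host.encode("idna").decode("ascii")

idn_host = "r\u00e4ksm\u00f6rg\u00e5s.se"   # sample IDN host, for illustration
ascii_host = to_punycode(idn_host)
print(ascii_host)                            # an ASCII name with an "xn--" label
print(to_punycode("example.com"))            # already ASCII, left unchanged
```
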
       --query-separator [what]
              Specify the single character used for separating query pairs.
              The default is "&", but semicolons ";" or even colons ":" have
              sometimes been used for this purpose, at least in the past. If
              your URL uses something other than the default character,
              setting the right one makes sure trurl can do its query
              operations properly.

       --redirect [URL]
              Redirect the URL to this new location. The redirection is
              performed on the base URL, so, if no base URL is specified, no
              redirection will be performed.

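Redirecting relative to a base URL follows the reference resolution rules of
RFC 3986. Python's urllib.parse.urljoin implements the same kind of
resolution, so it can serve as a quick way to preview the result; it is a
stand-in for illustration, not what trurl runs internally.

```python
from urllib.parse import urljoin

base = "https://curl.se/we/are.html"

relative = urljoin(base, "here.html")      # resolved against the base directory
absolute = urljoin(base, "/index.html")    # an absolute path replaces the path
parent = urljoin(base, "../up.html")       # dot-dot segments are resolved

print(relative)
print(absolute)
print(parent)
```
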
       -s, --set [component][:]=[data]
              Set this URL component. Setting a blank string ("") will clear
              the component from the URL.

              The following components can be set: url, scheme, user,
              password, options, host, port, path, query, fragment and zoneid.

              If a simple "="-assignment is used, the data is URL encoded when
              applied. If ":=" is used, the data is assumed to already be URL
              encoded and will be stored as-is.

              If no URL or --url-file argument is provided, trurl will try to
              create a URL using the components provided by the --set options.
              If not enough components are specified, this will fail.

       --sort-query
              The "variable=content" pairs in the query component are sorted
              in a case insensitive alphabetical order. This helps make URLs
              identical that otherwise only differ in the order of their query
              pairs.

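A minimal Python sketch of such a sort is shown below. The helper name
sort_query is hypothetical, and sorting on the name part only, with ties kept
in their original order, is an assumption of this sketch; trurl's exact tie
breaking is not described above.

```python
def sort_query(query: str, sep: str = "&") -> str:
    """Sort query pairs by name, ignoring case; equal names keep their order."""
    pairs = query.split(sep)
    pairs.sort(key=lambda pair: pair.partition("=")[0].lower())
    return sep.join(pairs)

print(sort_query("b=a&c=b&a=c"))
```

Two URLs whose queries differ only in pair order produce the same string
after this step, which is what makes them comparable.
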
       --url [URL]
              Set the input URL to work with. The URL may be provided without
              a scheme, which then typically is not actually a legal URL but
              trurl will try to figure out what is meant and guess what scheme
              to use (unless --no-guess-scheme is used).

              Providing multiple URLs will make trurl act on all URLs in a
              serial fashion.

              If the URL cannot be parsed for whatever reason, trurl will
              simply move on to the next provided URL, unless --verify is
              used.

       --urlencode
              Outputs URL encoded version of components by default when using
              --get or --json.

       --trim [component]=[what]
              Trims data off a component. Currently this can only trim a query
              component.

              "what" is specified as a full word or as a word prefix (using a
              single trailing asterisk ('*')) which makes trurl remove the
              tuples from the query string that match the instruction.

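The matching rule for "what" can be sketched in Python as follows. The
helper name trim_query is hypothetical, and the comparison is case sensitive
here purely as an assumption; the text above does not say how trurl treats
case when trimming.

```python
def trim_query(query: str, what: str, sep: str = "&") -> str:
    """Drop query pairs whose name equals `what`, or starts with it if it ends in '*'."""
    is_prefix = what.endswith("*")
    needle = what[:-1] if is_prefix else what

    def matches(name: str) -> bool:
        return name.startswith(needle) if is_prefix else name == needle

    kept = [pair for pair in query.split(sep)
            if not matches(pair.partition("=")[0])]
    return sep.join(kept)

print(trim_query("search=hey&utm_source=tracker", "utm_*"))
print(trim_query("search=fool;page=5", "search", sep=";"))
```
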
       -v, --version
              Show version information and exit.

       --verify
              When a URL is provided, return an error immediately if it does
              not parse as a valid URL. In normal cases, trurl can forgive a
              bad URL input.

JSON output format
       The --json option outputs a JSON array with one or more objects, one
       for each URL.

       Each URL JSON object contains a number of properties, a series of
       key/value pairs. The exact set depends on the given URL.

       url    This key exists in every object. It is the complete URL.
              Affected by --default-port, --keep-port, and --punycode.

       parts  This key exists in every object, and contains an object with a
              key for each of the settable URL components. If a component is
              missing, it means it is not present in the URL. The parts are
              URL decoded unless --urlencode is used.

              scheme The URL scheme.

              user   The user name.

              password
                     The password.

              options
                     The options. Note that only a few URL schemes support the
                     "options" component.

              host   The normalized host name. It might be a UTF-8 name if an
                     IDN name was used. It can also be a normalized IPv4 or
                     IPv6 address. An IPv6 address always starts with a
                     bracket ([), and no other host name can contain such a
                     symbol. If --punycode is used, the punycode version of
                     the host is output instead.

              port   The provided port number as a string. If the port number
                     was not provided in the URL, but the scheme is a known
                     one, and --default-port is in use, the default port for
                     that scheme will be provided here.

              path   The path. Including the leading slash.

              query  The full query, excluding the question mark separator.

              fragment
                     The fragment, excluding the pound sign separator.

              zoneid The zone id, which can only be present in an IPv6
                     address. When this key is present, then host is an IPv6
                     numerical address.

       params This key contains an array of query key/value objects. Each such
              pair is listed with "key" and "value" and their respective
              contents in the output.

              The key/values are extracted from the query where they are
              separated by ampersands (&), or by the separator the user sets
              with --query-separator.

              The query pairs are listed in their left-to-right order of
              appearance, but can be made alpha-sorted with --sort-query.

              It is only present if the URL has a query.

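A consumer of this format might read it as in the following Python sketch.
The embedded text is a hand-written sample shaped like the structure
described above, not captured trurl output, and the handling of missing keys
simply reflects the rule that absent components are left out of "parts".

```python
import json

# Hand-written sample shaped like the --json structure described above.
output = """
[
  {
    "url": "https://fake.host/search?q=answers&user=me#frag",
    "parts": {
      "scheme": "https",
      "host": "fake.host",
      "path": "/search",
      "query": "q=answers&user=me",
      "fragment": "frag"
    },
    "params": [
      {"key": "q", "value": "answers"},
      {"key": "user", "value": "me"}
    ]
  }
]
"""

entries = json.loads(output)
for entry in entries:
    parts = entry["parts"]
    port = parts.get("port")                 # None when the URL has no port
    # "params" is only present when the URL has a query.
    params = {p["key"]: p["value"] for p in entry.get("params", [])}
    print(entry["url"], parts["host"], port, params)
```
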
EXAMPLES
       Replace the host name of a URL
              $ trurl --url https://curl.se --set host=example.com
              https://example.com/

       Create a URL by setting components
              $ trurl --set host=example.com --set scheme=ftp
              ftp://example.com/

       Redirect a URL
              $ trurl --url https://curl.se/we/are.html --redirect here.html
              https://curl.se/we/here.html

       Change port number
              This also shows how trurl will remove dot-dot sequences
              $ trurl --url https://curl.se/we/../are.html --set port=8080
              https://curl.se:8080/are.html

       Extract the path from a URL
              $ trurl --url https://curl.se/we/are.html --get '{path}'
              /we/are.html

       Extract the port from a URL
              This gets the default port based on the scheme if the port is
              not set in the URL.
              $ trurl --url https://curl.se/we/are.html --get '{default:port}'
              443

       Append a path segment to a URL
              $ trurl --url https://curl.se/hello --append path=you
              https://curl.se/hello/you

       Append a query segment to a URL
              $ trurl --url "https://curl.se?name=hello" --append query=search=string
              https://curl.se/?name=hello&search=string

       Read URLs from stdin
              $ cat urllist.txt | trurl --url-file -
              ...

       Output JSON
              $ trurl "https://fake.host/search?q=answers&user=me#frag" --json
              [
                {
                  "url": "https://fake.host/search?q=answers&user=me#frag",
                  "parts": {
                    "scheme": "https",
                    "host": "fake.host",
                    "path": "/search",
                    "query": "q=answers&user=me",
                    "fragment": "frag"
                  },
                  "params": [
                    {
                      "key": "q",
                      "value": "answers"
                    },
                    {
                      "key": "user",
                      "value": "me"
                    }
                  ]
                }
              ]

       Remove tracking tuples from query
              $ trurl "https://curl.se?search=hey&utm_source=tracker" --trim query="utm_*"
              https://curl.se/?search=hey

       Show a specific query key value
              $ trurl "https://example.com?a=home&here=now&thisthen" -g '{query:a}'
              home

       Sort the key/value pairs in the query component
              $ trurl "https://example.com?b=a&c=b&a=c" --sort-query
              https://example.com/?a=c&b=a&c=b

       Work with a query that uses a semicolon separator
              $ trurl "https://curl.se?search=fool;page=5" --trim query="search" --query-separator ";"
              https://curl.se/?page=5

       Accept spaces in the URL path
              $ trurl "https://curl.se/this has space/index.html" --accept-space
              https://curl.se/this%20has%20space/index.html

       Create multiple variations of a URL with different schemes
              $ trurl "https://curl.se/path/index.html" --iterate "scheme=http ftp sftp"
              http://curl.se/path/index.html
              ftp://curl.se/path/index.html
              sftp://curl.se/path/index.html

WWW
       https://curl.se/trurl

SEE ALSO
       curl_url_set(3) curl_url_get(3)

trurl                           April 27, 2023                        trurl(1)