trurl(1)                         trurl Manual                         trurl(1)

NAME

       trurl - transpose URLs

SYNOPSIS

       trurl [options / URLs]

DESCRIPTION

       trurl parses, manipulates and outputs URLs and parts of URLs.

       It uses the RFC 3986 definition of URLs and it uses libcurl's URL
       parser to do so, which includes a few "extensions". The URL support is
       limited to "hierarchical" URLs, the ones that use "://" separators
       after the scheme.

       Typically you pass in one or more URLs and decide which parts of them
       you want output, possibly modifying the URLs as well.

       trurl knows URLs and every URL consists of up to ten separate and
       independent "components". These components can be extracted, removed
       and updated with trurl and they are referred to by their respective
       names: scheme, user, password, options, host, port, path, query,
       fragment and zoneid.

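       As a rough illustration of what "components" means, the sketch below
       splits a made-up URL with Python's urllib.parse. This is not libcurl's
       parser: it follows slightly different rules and has no separate
       "options" or "zoneid" fields, so treat it only as an approximation of
       the component names listed above.

```python
from urllib.parse import urlsplit

# Hypothetical example URL, not taken from the trurl manual.
url = "https://user:secret@example.com:8080/we/are.html?q=1#top"
parts = urlsplit(url)

# Map urllib's attribute names onto the component names trurl uses.
components = {
    "scheme": parts.scheme,
    "user": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}

for name, value in components.items():
    print(f"{name}: {value}")
```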

OPTIONS

       Options start with one or two dashes. Many of the options require an
       additional value next to them.

       Any other argument is interpreted as a URL argument, and is treated as
       if it was following a --url option.

       The first argument that is exactly two dashes ("--") marks the end of
       options; any argument after the end of options is interpreted as a URL
       argument even if it starts with a dash.

       -a, --append [component]=[data]
              Append data to a component. This can only append data to the
              path and the query components.

              For path, this URL encodes and appends the new segment to the
              path, separated with a slash.

              For query, this URL encodes and appends the new segment to the
              query, separated with an ampersand (&). If the appended segment
              contains an equal sign ('='), the first one is kept verbatim
              and both sides of it are URL encoded separately.

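              The query rule can be sketched in Python as follows. The
              function name append_query is hypothetical, and the exact set
              of characters trurl escapes is an assumption here (this sketch
              percent-encodes everything outside the unreserved set).

```python
from urllib.parse import quote

def append_query(query: str, segment: str, sep: str = "&") -> str:
    """Append one segment, encoding each side of the first '=' separately."""
    # partition() splits on the first '=' only; later ones stay in `value`.
    name, eq, value = segment.partition("=")
    encoded = quote(name, safe="") + eq + quote(value, safe="")
    return encoded if not query else query + sep + encoded

print(append_query("name=hello", "search=string"))
print(append_query("name=hello", "search=a b=c"))
```

              Only the first equal sign survives unencoded; any later one in
              the same segment is escaped like other data.
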
       --accept-space
              When set, trurl will try to accept spaces as part of the URL and
              instead URL encode such occurrences accordingly.

              According to RFC 3986, a space cannot legally be part of a URL.
              This option makes a best-effort attempt to convert the provided
              string into a valid URL.

       --default-port
              When set, trurl will use the scheme's default port number for
              URLs with a known scheme, and without an explicit port number.

              Note that trurl only knows default port numbers for URL schemes
              that are supported by libcurl.

              Since, by default, trurl removes default port numbers from URLs
              with a known scheme, this option is pretty much ignored unless
              one of --get, --json, and --keep-port is also specified.

       -f, --url-file [file name]
              Read URLs to work on from the given file. Use the file name "-"
              (a single minus) to tell trurl to read the URLs from stdin.

              Each line needs to be a single valid URL. trurl will remove one
              carriage return character at the end of the line if present,
              trim off all the trailing space and tab characters, and skip all
              empty (after trimming) lines.

              The maximum line length supported in a file like this is 4094
              bytes. Lines that exceed that length are skipped, and a warning
              is printed to stderr when they are encountered.

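              The line handling described above can be sketched in Python.
              The function name clean_lines is hypothetical, and whether the
              length check happens before or after trimming is an assumption
              of this sketch; the text does not say.

```python
import sys

MAX_LINE = 4094  # maximum supported line length in bytes, per the text above

def clean_lines(raw_lines):
    """Yield usable URL lines from an iterable of raw byte lines."""
    for raw in raw_lines:
        line = raw[:-1] if raw.endswith(b"\n") else raw
        if len(line) > MAX_LINE:
            # Assumption: the limit is checked before any trimming.
            print("warning: skipping too long line", file=sys.stderr)
            continue
        if line.endswith(b"\r"):
            line = line[:-1]          # remove exactly one carriage return
        line = line.rstrip(b" \t")    # then all trailing spaces and tabs
        if line:                      # skip lines that are empty after trimming
            yield line.decode()

sample = [
    b"https://curl.se/ \t\r\n",
    b"\n",
    b"   \n",
    b"x" * 5000 + b"\n",
    b"example.com\n",
]
print(list(clean_lines(sample)))
```
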
       -g, --get [format]
              Output text and URL data according to the provided format
              string. Components from the URL can be output when specified as
              {component} or [component], with the name of the part shown
              within curly braces or brackets. You cannot mix braces and
              brackets for this purpose in the same command line.

              The following component names are available (case sensitive):
              url, scheme, user, password, options, host, port, path, query,
              fragment and zoneid.

              {component} will expand to nothing if the given component does
              not have a value.

              Components are shown URL decoded by default. If you instead
              write the component prefixed with a colon like "{:path}", it
              gets output URL encoded.

              You may also prefix components with default: and/or puny:, in
              any order.

              If default: is specified, like "{default:url}" or
              "{default:port}", and the port is not explicitly specified in
              the URL, the scheme's default port will be output if it is
              known.

              If puny: is specified, like "{puny:url}" or "{puny:host}", the
              "punycoded" version of the host name will be used in the output.

              If --default-port is specified, all formats are expanded as if
              they used default:; and if --punycode is used, all formats are
              expanded as if they used puny:. Also note that "{url}" is
              affected by the --keep-port option.

              Hosts provided as IPv6 numerical addresses will be provided
              within square brackets, like "[fe80::20c:29ff:fe9c:409b]".

              Hosts provided as IPv4 numerical addresses will be "normalized"
              and provided as four dot-separated decimal numbers when output.

              You can access specific keys in the query string using the
              format {query:key}. Then the value of the first matching key
              will be output using a case sensitive match. When extracting a
              URL decoded query key that contains %00, that octet will be
              replaced with a single period '.' in the output.

              You can access specific keys in the query string and output all
              values using the format {query-all:key}. This looks for 'key'
              case sensitively and will output all values for that key
              space-separated.

              The "format" string supports the following backslash sequences:

              \\ - backslash

              \t - tab

              \n - newline

              \r - carriage return

              \{ - an open curly brace that does not start a variable

              \[ - an open bracket that does not start a variable

              All other text in the format string will be shown as-is.

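              The {query:key} and {query-all:key} lookups described above can
              be approximated in Python. The helper name query_values is
              hypothetical, and comparing the URL decoded key name is an
              assumption; this is a sketch of the documented behavior, not
              trurl's code.

```python
from urllib.parse import unquote

def query_values(query: str, key: str, sep: str = "&"):
    """Return the decoded values of every pair whose name equals `key`."""
    values = []
    for pair in query.split(sep):
        name, _, value = pair.partition("=")
        if unquote(name) == key:  # case sensitive comparison
            # A decoded %00 octet is shown as a period, as described above.
            values.append(unquote(value).replace("\x00", "."))
    return values

query = "a=home&here=now&a=again&thisthen"
matches = query_values(query, "a")

print(matches[0] if matches else "")  # roughly what {query:a} selects
print(" ".join(matches))              # roughly what {query-all:a} selects
```
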
       -h, --help
              Show the help output.

       --iterate [component]=[item1 item2 ...]
              Set the component to multiple values and output the result once
              for each iteration. Several combined iterations are allowed to
              generate combinations, but only one --iterate option per
              component. The listed items to iterate over should be separated
              by single spaces.

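              Combining several iterations amounts to a Cartesian product of
              the item lists. The Python sketch below shows the idea; the
              function name iterate is hypothetical, only the field names
              that urllib's SplitResult knows (scheme, netloc, path, query,
              fragment) are supported, and the order in which trurl emits
              combinations is not specified above, so the order here is an
              assumption.

```python
from itertools import product
from urllib.parse import urlsplit

def iterate(url, **choices):
    """Yield one URL per combination of the space-separated items."""
    base = urlsplit(url)
    names = list(choices)
    item_lists = [choices[name].split(" ") for name in names]
    for combo in product(*item_lists):
        # Replace each named field with the item chosen for this combination.
        yield base._replace(**dict(zip(names, combo))).geturl()

for result in iterate("https://curl.se/path/index.html", scheme="http ftp sftp"):
    print(result)
```
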
       --json Outputs all set components of the URLs as JSON objects. All
              components of the URL that have data will get populated in the
              parts object using their component names. See below for details
              on the format.

       --keep-port
              By default, trurl removes default port numbers from URLs with a
              known scheme even if they are explicitly specified in the input
              URL. This option makes trurl keep them.

       --no-guess-scheme
              Disables libcurl's scheme guessing feature. URLs that do not
              contain a scheme will be treated as invalid URLs.

       --punycode
              Uses the "punycoded" version of the host name, which is how
              International Domain Names are converted into plain ASCII. If
              the host name is not using IDN, the regular ASCII name is used.

       --query-separator [what]
              Specify the single character used for separating query pairs.
              The default is "&" but at least in the past sometimes semicolons
              ";" or even colons ":" have been used for this purpose. If your
              URL uses something other than the default character, setting
              the right one makes sure trurl can do its query operations
              properly.

       --redirect [URL]
              Redirect the URL to this new location. The redirection is
              performed on the base URL, so, if no base URL is specified, no
              redirection will be performed.

       -s, --set [component][:]=[data]
              Set this URL component. Setting a blank string ("") will clear
              the component from the URL.

              The following components can be set: url, scheme, user,
              password, options, host, port, path, query, fragment and zoneid.

              If a simple "="-assignment is used, the data is URL encoded when
              applied. If ":=" is used, the data is assumed to already be URL
              encoded and will be stored as-is.

              If no URL or --url-file argument is provided, trurl will try to
              create a URL using the components provided by the --set options.
              If not enough components are specified, this will fail.

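              The difference between "=" and ":=" can be sketched in Python.
              The function name parse_set is hypothetical, and which
              characters are left unescaped (here only "/" and the
              unreserved set) is an assumption of this sketch rather than a
              statement about trurl.

```python
from urllib.parse import quote

def parse_set(argument: str):
    """Split 'component=data' or 'component:=data' into (component, stored)."""
    name, _, data = argument.partition("=")
    if name.endswith(":"):
        # ':=' form: the data is taken to be URL encoded already.
        return name[:-1], data
    # '=' form: the data is URL encoded before it is stored.
    return name, quote(data, safe="/")

print(parse_set("path=/a b"))
print(parse_set("path:=/a%20b"))
```

              Both calls end up storing the same path, which is the point of
              offering the two forms.
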
       --sort-query
              The "variable=content" tuples in the query component are sorted
              in a case insensitive alphabetical order. This helps make URLs
              identical that otherwise only had their query pairs in different
              orders.

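              A minimal Python sketch of such a sort follows. The function
              name sort_query is hypothetical; sorting on the name before
              the first "=" and keeping equal names in their original order
              are assumptions, since the text only says the order is case
              insensitive and alphabetical.

```python
def sort_query(query: str, sep: str = "&") -> str:
    """Sort query pairs by name, ignoring case; equal names keep their order."""
    pairs = query.split(sep)
    # sorted() is stable, so pairs with the same name are not reordered.
    pairs = sorted(pairs, key=lambda pair: pair.partition("=")[0].lower())
    return sep.join(pairs)

print(sort_query("b=a&c=b&a=c"))
```
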
       --url [URL]
              Set the input URL to work with. The URL may be provided without
              a scheme, which then typically is not actually a legal URL but
              trurl will try to figure out what is meant and guess what scheme
              to use (unless --no-guess-scheme is used).

              Providing multiple URLs will make trurl act on all URLs in a
              serial fashion.

              If the URL cannot be parsed for whatever reason, trurl will
              simply move on to the next provided URL - unless --verify is
              used.

       --urlencode
              Outputs URL encoded version of components by default when using
              --get or --json.

       --trim [component]=[what]
              Trims data off a component. Currently this can only trim a query
              component.

              "what" is specified as a full word or as a word prefix (using a
              single trailing asterisk ('*')) which makes trurl remove the
              tuples from the query string that match the instruction.

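              The matching rule can be sketched in Python. The function name
              trim_query is hypothetical, and this sketch compares names
              case sensitively, which is an assumption; the text above does
              not state the case rule.

```python
def trim_query(query: str, what: str, sep: str = "&") -> str:
    """Drop query pairs whose name matches `what` (a word or a 'prefix*')."""
    is_prefix = what.endswith("*")
    word = what[:-1] if is_prefix else what
    kept = []
    for pair in query.split(sep):
        name = pair.partition("=")[0]
        matched = name.startswith(word) if is_prefix else name == word
        if not matched:
            kept.append(pair)
    return sep.join(kept)

print(trim_query("search=hey&utm_source=tracker", "utm_*"))
print(trim_query("search=fool;page=5", "search", sep=";"))
```
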
       -v, --version
              Show version information and exit.

       --verify
              When a URL is provided, return an error immediately if it does
              not parse as a valid URL. In normal cases, trurl can forgive a
              bad URL input.

JSON output format

       The --json option outputs a JSON array with one or more objects, one
       for each URL.

       Each URL JSON object contains a number of properties, a series of
       key/value pairs. The exact set depends on the given URL.

       url    This key exists in every object. It is the complete URL.
              Affected by --default-port, --keep-port, and --punycode.

       parts  This key exists in every object, and contains an object with a
              key for each of the settable URL components. If a component is
              missing, it means it is not present in the URL. The parts are
              URL decoded unless --urlencode is used.

              scheme The URL scheme.

              user   The user name.

              password
                     The password.

              options
                     The options. Note that only a few URL schemes support the
                     "options" component.

              host   The normalized host name. It might be a UTF-8 name if
                     an IDN name was used. It can also be a normalized IPv4
                     or IPv6 address. An IPv6 address always starts with a
                     bracket ([) - and no other host names can contain such a
                     symbol. If --punycode is used, the punycode version of
                     the host is output instead.

              port   The provided port number as a string. If the port number
                     was not provided in the URL, but the scheme is a known
                     one, and --default-port is in use, the default port for
                     that scheme will be provided here.

              path   The path, including the leading slash.

              query  The full query, excluding the question mark separator.

              fragment
                     The fragment, excluding the pound sign separator.

              zoneid The zone id, which can only be present in an IPv6
                     address. When this key is present, then host is an IPv6
                     numerical address.

       params This key contains an array of query key/value objects. Each such
              pair is listed with "key" and "value" and their respective
              contents in the output.

              The key/values are extracted from the query where they are
              separated by ampersands (&) - or by the separator the user sets
              with --query-separator.

              The query pairs are listed in the order of appearance in a
              left-to-right order, but can be made alpha-sorted with
              --sort-query.

              It is only present if the URL has a query.

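       A script that consumes this format might look like the Python sketch
       below. The embedded JSON text is a hand-written sample that follows
       the structure described above, not captured trurl output, and the
       variable names are illustrative only.

```python
import json

# Hand-written sample following the format described above.
output = """
[
  {
    "url": "https://fake.host/search?q=answers&user=me#frag",
    "parts": {
      "scheme": "https",
      "host": "fake.host",
      "path": "/search",
      "query": "q=answers&user=me",
      "fragment": "frag"
    },
    "params": [
      {"key": "q", "value": "answers"},
      {"key": "user", "value": "me"}
    ]
  }
]
"""

records = json.loads(output)
for record in records:
    parts = record["parts"]
    # "params" is only present when the URL has a query, so default to [].
    params = {p["key"]: p["value"] for p in record.get("params", [])}
    # A missing component means it is not present in the URL: use .get().
    print(parts["host"], parts.get("port"), params)
```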

EXAMPLES

       Replace the host name of a URL
              $ trurl --url https://curl.se --set host=example.com
              https://example.com/

       Create a URL by setting components
              $ trurl --set host=example.com --set scheme=ftp
              ftp://example.com/

       Redirect a URL
              $ trurl --url https://curl.se/we/are.html --redirect here.html
              https://curl.se/we/here.html

       Change port number
              This also shows how trurl will remove dot-dot sequences.
              $ trurl --url https://curl.se/we/../are.html --set port=8080
              https://curl.se:8080/are.html

       Extract the path from a URL
              $ trurl --url https://curl.se/we/are.html --get '{path}'
              /we/are.html

       Extract the port from a URL
              This gets the default port based on the scheme if the port is
              not set in the URL.
              $ trurl --url https://curl.se/we/are.html --get '{default:port}'
              443

       Append a path segment to a URL
              $ trurl --url https://curl.se/hello --append path=you
              https://curl.se/hello/you

       Append a query segment to a URL
              $ trurl --url "https://curl.se?name=hello" --append query=search=string
              https://curl.se/?name=hello&search=string

       Read URLs from stdin
              $ cat urllist.txt | trurl --url-file -
              ...

       Output JSON
              $ trurl "https://fake.host/search?q=answers&user=me#frag" --json
              [
                {
                  "url": "https://fake.host/search?q=answers&user=me#frag",
                  "parts": {
                      "scheme": "https",
                      "host": "fake.host",
                      "path": "/search",
                      "query": "q=answers&user=me",
                      "fragment": "frag"
                  },
                  "params": [
                    {
                      "key": "q",
                      "value": "answers"
                    },
                    {
                      "key": "user",
                      "value": "me"
                    }
                  ]
                }
              ]

       Remove tracking tuples from query
              $ trurl "https://curl.se?search=hey&utm_source=tracker" --trim query="utm_*"
              https://curl.se/?search=hey

       Show a specific query key value
              $ trurl "https://example.com?a=home&here=now&thisthen" -g '{query:a}'
              home

       Sort the key/value pairs in the query component
              $ trurl "https://example.com?b=a&c=b&a=c" --sort-query
              https://example.com/?a=c&b=a&c=b

       Work with a query that uses a semicolon separator
              $ trurl "https://curl.se?search=fool;page=5" --trim query="search" --query-separator ";"
              https://curl.se/?page=5

       Accept spaces in the URL path
              $ trurl "https://curl.se/this has space/index.html" --accept-space
              https://curl.se/this%20has%20space/index.html

       Create multiple variations of a URL with different schemes
              $ trurl "https://curl.se/path/index.html" --iterate "scheme=http ftp sftp"
              http://curl.se/path/index.html
              ftp://curl.se/path/index.html
              sftp://curl.se/path/index.html

WWW

       https://curl.se/trurl

SEE ALSO

       curl_url_set(3), curl_url_get(3)

trurl                           April 27, 2023                        trurl(1)