string::token::shell(n)    Text and string utilities    string::token::shell(n)

______________________________________________________________________________

NAME
string::token::shell - Parsing of shell command line

SYNOPSIS
package require Tcl 8.5

package require string::token::shell ?1.2?

package require string::token ?1?

package require fileutil

::string token shell ?-indices? ?-partial? ?--? string

______________________________________________________________________________

DESCRIPTION
This package provides a command which parses a line of text using basic
sh-syntax into a list of words.

The complete set of procedures is described below.

::string token shell ?-indices? ?-partial? ?--? string
    This command parses the input string under the assumption that it
    follows basic sh-syntax. The result of the command is a list of
    words in the string. An error is thrown if the input does not
    follow the allowed syntax. The behaviour can be modified by
    specifying either or both of the options -indices and -partial.

    --  When specified, option parsing stops at this point. This
        option is needed if the input string may start with a dash.
        In other words, this is pretty much required if string is
        user input.

    -indices
        When specified, the output is not a list of words, but a
        list of 4-tuples describing the words. Each tuple contains
        the type of the word, its start and end indices in the
        input, and the actual text of the word.

        Note that the length of the word as given by the indices
        can differ from the length of the word found in the last
        element of the tuple. The indices describe the word's
        extent in the input, including delimiters, intra-word
        quoting, etc., whereas for the actual text of the word
        delimiters are stripped, intra-word quoting decoded, etc.

        The possible token types are

        PLAIN
            Plain word, not quoted.

        D:QUOTED
            Word is delimited by double-quotes.

        S:QUOTED
            Word is delimited by single-quotes.

        D:QUOTED:PART

        S:QUOTED:PART
            Like the previous types, but the word has no closing
            quote, i.e. is incomplete. These token types can occur
            only if the option -partial was specified, and only for
            the last word of the result. If the option -partial was
            not specified, such incomplete words cause the command
            to throw an error instead.

    -partial
        When specified, the parser will accept an incomplete
        quoted word (i.e. one without a closing quote) at the end
        of the line as valid instead of throwing an error.
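
The options described above can be illustrated with a short sketch. It
assumes that Tcllib is installed and that the command is reachable under
the namespace-qualified name ::string::token::shell (the synopsis form is
::string token shell). The sample line and the variable names are made up
for the example.

```tcl
package require Tcl 8.5
package require string::token::shell

# Hypothetical sample input; any sh-style line works.
set line {cp "my file.txt" 'backup dir'}

# Default mode: a plain list of words, with the quotes stripped.
set words [::string::token::shell -- $line]
puts $words

# -indices: one 4-tuple {type start end text} per word. The raw span
# taken from the input still carries the quotes, the text does not.
foreach tuple [::string::token::shell -indices -- $line] {
    lassign $tuple type start end text
    puts [format {%-10s %2d..%-2d raw=<%s> text=<%s>} \
        $type $start $end [string range $line $start $end] $text]
}

# -partial: an unterminated quoted word at the end of the line is
# accepted and reported with a :PART token type.
set partial [::string::token::shell -indices -partial -- {echo "unfinished arg}]
puts [lindex $partial end 0]
```

Passing -- before the input keeps a line that happens to start with a
dash from being mistaken for an option.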

The basic shell syntax accepted here consists of unquoted, single-quoted
and double-quoted words, separated by whitespace. Leading and trailing
whitespace is permitted and is stripped. Shell variables in their various
forms are not recognized, nor are sub-shells. The recognized forms of
words are specified in detail below.

single-quoted word
    A single-quoted word begins with a single-quote character, i.e. '
    (ASCII 39), followed by zero or more Unicode characters that are
    not a single-quote, and is closed by a single-quote.

    The word must be followed by either the end of the string or
    whitespace. Another word cannot directly follow it.

double-quoted word
    A double-quoted word begins with a double-quote character, i.e. "
    (ASCII 34), followed by zero or more Unicode characters that are
    not a double-quote, and is closed by a double-quote.

    In contrast to single-quoted words, a double-quote can be embedded
    into the word by escaping it with a backslash character, i.e. \ (ASCII
    92). Similarly, a backslash character must be escaped with itself
    to be inserted literally.

unquoted word
    Unquoted words are not delimited by quotes and thus cannot contain
    whitespace or single-quote characters. Double-quote and backslash
    characters can be put into unquoted words by escaping them as in
    double-quoted words.

whitespace
    Whitespace is any Unicode space character. This is equivalent to
    string is space, or the regular expression \s.

    Whitespace may occur before the first word, or after the last
    word. Whitespace must occur between adjacent words.
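
The word forms specified above can be exercised as follows. This is a
sketch under the assumption that Tcllib is installed and that the command
can be called by the namespace-qualified name ::string::token::shell. The
inputs are invented for illustration, and the behaviour noted in the
comments is what the rules above imply.

```tcl
package require Tcl 8.5
package require string::token::shell

# Double-quoted words: \" decodes to " and \\ decodes to \.
# The braces keep Tcl itself from interpreting the backslashes.
set words [::string::token::shell -- {"say \"hi\"" "a\\b"}]
puts [lindex $words 0]
puts [lindex $words 1]

# Single-quoted words take everything up to the next single-quote
# literally, including double-quotes.
puts [::string::token::shell -- {'a "b" c'}]

# Words must be separated by whitespace; two quoted words written
# back to back are rejected with an error.
if {[catch {::string::token::shell -- {'a''b'}} msg]} {
    puts "rejected: $msg"
}

# Without -partial, an unterminated quote is an error as well.
if {[catch {::string::token::shell -- {echo "oops}} msg]} {
    puts "rejected: $msg"
}
```

Wrapping the call in catch is one way to handle malformed user input; the
error message text is produced by the package and is not specified here.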

BUGS, IDEAS, FEEDBACK
This document, and the package it describes, will undoubtedly contain
bugs and other problems. Please report such in the category textutil
of the Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please
also report any ideas for enhancements you may have for either package
and/or documentation.

When proposing code changes, please provide unified diffs, i.e. the
output of diff -u.

Note further that attachments are strongly preferred over inlined
patches. Attachments can be made by going to the Edit form of the
ticket immediately after its creation, and then using the left-most
button in the secondary navigation bar.

KEYWORDS
bash, lexing, parsing, shell, string, tokenization

CATEGORY
Text processing

tcllib                               1.2               string::token::shell(n)