string::token::shell(n)    Text and string utilities   string::token::shell(n)

______________________________________________________________________________

NAME

       string::token::shell - Parsing of shell command line

SYNOPSIS

       package require Tcl  8.5

       package require string::token::shell  ?1.2?

       package require string::token  ?1?

       package require fileutil

       ::string token shell ?-indices? ?-partial? ?--? string

______________________________________________________________________________

DESCRIPTION

       This package provides a command which parses a line of text using basic
       sh-syntax into a list of words.

       The complete set of procedures is described below.

       ::string token shell ?-indices? ?-partial? ?--? string
              This command parses the input string, assuming that it follows
              basic sh-syntax.  The result of the command is a list of the
              words in the string.  An error is thrown if the input does not
              follow the allowed syntax.  The behaviour can be modified by
              specifying either or both of the options -indices and -partial.

              --     When specified, option parsing stops at this point.  This
                     option is needed if the input string may start with a
                     dash.  In other words, it is effectively required
                     whenever string is user input.

              -indices
                     When specified, the output is not a list of words, but a
                     list of 4-tuples describing the words.  Each tuple
                     contains the type of the word, its start and end indices
                     in the input, and the actual text of the word.

                     Note that the length of the word as given by the indices
                     can differ from the length of the word found in the last
                     element of the tuple.  The indices describe the word's
                     extent in the input, including delimiters, intra-word
                     quoting, etc., whereas for the actual text of the word
                     the delimiters are stripped, intra-word quoting is
                     decoded, etc.

                     The possible token types are

                     PLAIN  Plain word, not quoted.

                     D:QUOTED
                            Word is delimited by double-quotes.

                     S:QUOTED
                            Word is delimited by single-quotes.

                     D:QUOTED:PART

                     S:QUOTED:PART
                            Like the previous types, but the word has no
                            closing quote, i.e. is incomplete.  These token
                            types can occur only if the option -partial was
                            specified, and only for the last word of the
                            result.  If the option -partial was not specified,
                            such incomplete words cause the command to throw
                            an error instead.

              -partial
                     When specified, the parser will accept an incomplete
                     quoted word (i.e. without closing quote) at the end of
                     the line as valid instead of throwing an error.

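       The options above can be tried out with a short script.  This is only
       a sketch, assuming that Tcllib is installed and that the package loads
       under the name given in the SYNOPSIS.  The variable names (line, words,
       broken, last) and the sample inputs are made up for illustration and
       are not part of the package.

```tcl
package require Tcl 8.5
package require string::token::shell

# Sample input as it might arrive from a user. It starts with a dash,
# so "--" is used to keep it from being taken for an option.
set line {-v "hello world" 'single quoted'}

# Default mode: the result is a plain list of decoded words.
set words [::string token shell -- $line]
foreach w $words {
    puts "<$w>"
}

# -indices: each element is a 4-tuple of type, start, end, and text.
foreach tuple [::string token shell -indices -- $line] {
    lassign $tuple type start end text
    # The index range covers the word as written, quotes included,
    # so it is printed next to the decoded text for comparison.
    puts [format "%-10s %2d..%-2d raw=%s text=%s" \
        $type $start $end [string range $line $start $end] $text]
}

# An unterminated quote is an error by default ...
set broken {echo "unfinished}
if {[catch {::string token shell -- $broken} msg]} {
    puts "rejected: $msg"
}

# ... but is accepted with -partial. The type of the last tuple then
# marks the word as incomplete.
set last [lindex [::string token shell -indices -partial -- $broken] end]
puts [lindex $last 0]
```

       Passing -- unconditionally, as done here, is the simplest way to stay
       safe when the origin of the input is not known.
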
       The basic shell syntax accepted here consists of unquoted, single-quoted,
       and double-quoted words, separated by whitespace.  Leading and trailing
       whitespace is permitted and is stripped.  Shell variables in their
       various forms are not recognized, nor are sub-shells.  The recognized
       forms of words are specified in detail below.

              single-quoted word
                     A single-quoted word begins with a single-quote
                     character, i.e. ' (ASCII 39), followed by zero or more
                     Unicode characters which are not a single-quote, and is
                     then closed by a single-quote.

                     The word must be followed by either the end of the
                     string, or whitespace.  Another word cannot directly
                     follow it.

              double-quoted word
                     A double-quoted word begins with a double-quote
                     character, i.e. " (ASCII 34), followed by zero or more
                     Unicode characters which are not a double-quote, and is
                     then closed by a double-quote.

                     Contrary to single-quoted words, a double-quote can be
                     embedded into the word by escaping it with a preceding
                     backslash character \ (ASCII 92).  Similarly, a backslash
                     character must be escaped with a second backslash to be
                     inserted literally.

              unquoted word
                     Unquoted words are not delimited by quotes and thus
                     cannot contain whitespace or single-quote characters.
                     Double-quote and backslash characters can be put into
                     unquoted words by quoting them in the same way as for
                     double-quoted words.

              whitespace
                     Whitespace is any Unicode space character.  This is
                     equivalent to string is space, or the regular expression
                     \s.

                     Whitespace may occur before the first word, or after the
                     last word.  Whitespace must occur between adjacent words.
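
       The word forms above can be exercised as follows.  This is a sketch
       assuming that Tcllib is installed.  The sample inputs are hypothetical,
       and no output is claimed beyond what the rules above state.  Each call
       is wrapped in catch so that a rejected input is reported instead of
       ending the script.  The final loop uses only core Tcl commands to put
       the two whitespace tests named above side by side.

```tcl
package require Tcl 8.5
package require string::token::shell

# Hypothetical sample inputs, one per word form. The braces keep Tcl
# from interpreting the quotes and backslashes before the parser does.
set samples {
    {'single quoted: "double quotes" need no escaping'}
    {"double quoted with \"embedded\" quotes and a \\ backslash"}
    {plain\"word\\with-escapes}
    {   leading and trailing whitespace   }
}

foreach input $samples {
    # -indices exposes the token type alongside the decoded text.
    if {[catch {::string token shell -indices -- $input} tuples]} {
        puts "rejected: $tuples"
        continue
    }
    foreach tuple $tuples {
        lassign $tuple type start end text
        puts "$type: $text"
    }
}

# Whitespace classification with core Tcl only. Each output row holds
# the character code, the result of "string is space", and the result
# of matching the regular expression \s.
foreach ch [list " " "\t" "\n" "a"] {
    puts [list [scan $ch %c] \
        [string is space -strict $ch] \
        [regexp {^\s$} $ch]]
}
```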

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will undoubtedly contain
       bugs and other problems.  Please report such in the category textutil
       of the Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist].  Please
       also report any ideas for enhancements you may have for either package
       and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches.  Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS

       bash, lexing, parsing, shell, string, tokenization

CATEGORY

       Text processing

tcllib                                1.2              string::token::shell(n)