scan(n) - f39

scan(n)                      Tcl Built-In Commands                     scan(n)

______________________________________________________________________________

NAME

       scan - Parse string using conversion specifiers in the style of sscanf

SYNOPSIS

       scan string format ?varName varName ...?
______________________________________________________________________________

INTRODUCTION

       This command parses substrings from an input string in a fashion
       similar to the ANSI C sscanf procedure and returns a count of the
       number of conversions performed, or -1 if the end of the input string
       is reached before any conversions have been performed.  String gives
       the input to be parsed and format indicates how to parse it, using %
       conversion specifiers as in sscanf.  Each varName gives the name of a
       variable; when a substring is scanned from string that matches a
       conversion specifier, the substring is assigned to the corresponding
       variable.  If no varName variables are specified, then scan works in
       an inline manner, returning the data that would otherwise be stored in
       the variables as a list.  In the inline case, an empty string is
       returned when the end of the input string is reached before any
       conversions have been performed.
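       A minimal sketch of the two calling styles described above.  The
       sample input and the variable names (qty, item, n, and so on) are
       illustrative assumptions, not part of the command:

```tcl
# Variable form: the return value is the number of conversions performed.
set count [scan "42 apples" "%d %s" qty item]
puts "count=$count qty=$qty item=$item"

# Inline form: with no varName arguments, the converted values
# come back as a list.
set fields [scan "42 apples" "%d %s"]
puts "fields=$fields"

# End of input reached before any conversion:
# -1 in the variable form, an empty string in the inline form.
set atEnd [scan "" "%d" n]
set inlineAtEnd [scan "" "%d"]
puts "atEnd=$atEnd inlineLength=[string length $inlineAtEnd]"
```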

DETAILS ON SCANNING

       Scan operates by scanning string and format together.  If the next
       character in format is a blank or tab then it matches any number of
       white space characters in string (including zero).  Otherwise, if it
       is not a % character then it must match the next character of string.
       When a % is encountered in format, it indicates the start of a
       conversion specifier.  A conversion specifier contains up to four
       fields after the %: an XPG3 position specifier (or a * to indicate
       the converted value is to be discarded instead of assigned to any
       variable); a number indicating a maximum substring width; a size
       modifier; and a conversion character.  All of these fields are
       optional except for the conversion character.  The fields that are
       present must appear in the order given above.

       When scan finds a conversion specifier in format, it first skips any
       white-space characters in string (unless the conversion character is
       [ or c).  Then it converts the next input characters according to the
       conversion specifier and stores the result in the variable given by
       the next argument to scan.

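       The matching rules above can be sketched as follows.  The input
       strings and variable names are made up for illustration:

```tcl
# A blank in format matches any amount of white space in string,
# including none at all.
set a [scan "a=1" "a = %d"]
set b [scan "a   =   1" "a = %d"]
puts "a=$a b=$b"

# Any other non-% character must match the input exactly.  Here the
# first literal does not match, so scanning stops before any conversion.
set n [scan "b=1" "a = %d" v]
puts "conversions=$n"

# %s skips leading white space before converting, while %c does not,
# so %c sees the first space (character code 32).
set word [scan "  x" %s]
set code [scan "  x" %c]
puts "word=$word code=$code"
```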
   OPTIONAL POSITIONAL SPECIFIER
       If the % is followed by a decimal number and a $, as in “%2$d”, then
       the variable to use is not taken from the next sequential argument.
       Instead, it is taken from the argument indicated by the number, where
       1 corresponds to the first varName.  If there are any positional
       specifiers in format then all of the specifiers must be positional.
       Every varName on the argument list must correspond to exactly one
       conversion specifier, or an error is generated.  In the inline case,
       any position can be specified at most once, and the empty positions
       will be filled in with empty strings.

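       A short sketch of positional specifiers, using illustrative variable
       names.  Note that the format is written in braces so that Tcl does
       not try to substitute $d as a variable:

```tcl
# The first conversion is stored in the second variable and vice versa.
set n [scan "10 20" {%2$d %1$d} first second]
puts "n=$n first=$first second=$second"

# Inline form: each value is placed in the result list by position.
set swapped [scan "10 20" {%2$d %1$d}]
puts "swapped=$swapped"
```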
   OPTIONAL SIZE MODIFIER
       The size modifier field is used only when scanning a substring into
       one of Tcl's integer values.  The size modifier field dictates the
       integer range acceptable to be stored in a variable, or, for the
       inline case, in a position in the result list.  The syntactically
       valid values for the size modifier are h, L, l, and ll.  The h size
       modifier value is equivalent to the absence of a size modifier in the
       conversion specifier.  Either one indicates the integer range to be
       stored is limited to the same range produced by the int() function of
       the expr command.  The L size modifier is equivalent to the l size
       modifier.  Either one indicates the integer range to be stored is
       limited to the same range produced by the wide() function of the expr
       command.  The ll size modifier indicates that the integer range to be
       stored is unlimited.

   MANDATORY CONVERSION CHARACTER
       The following conversion characters are supported:

       d      The input substring must be a decimal integer.  It is read in
              and the integer value is stored in the variable, truncated as
              required by the size modifier value.

       o      The input substring must be an octal integer.  It is read in
              and the integer value is stored in the variable, truncated as
              required by the size modifier value.

       x or X The input substring must be a hexadecimal integer.  It is read
              in and the integer value is stored in the variable, truncated
              as required by the size modifier value.

       b      The input substring must be a binary integer.  It is read in
              and the integer value is stored in the variable, truncated as
              required by the size modifier value.

       u      The input substring must be a decimal integer.  The integer
              value is truncated as required by the size modifier value, and
              the corresponding unsigned value for that truncated range is
              computed and stored in the variable as a decimal string.  The
              conversion makes no sense without reference to a truncation
              range, so the size modifier ll is not permitted in combination
              with conversion character u.

       i      The input substring must be an integer.  The base (i.e.
              decimal, octal, or hexadecimal) is determined by the C
              convention (leading 0 for octal; prefix 0x for hexadecimal).
              The integer value is stored in the variable, truncated as
              required by the size modifier value.

       c      A single character is read in and its Unicode value is stored
              in the variable as an integer value.  Initial white space is
              not skipped in this case, so the input substring may be a
              white-space character.

       s      The input substring consists of all the characters up to the
              next white-space character; the characters are copied to the
              variable.

       e or f or g or E or G
              The input substring must be a floating-point number consisting
              of an optional sign, a string of decimal digits possibly
              containing a decimal point, and an optional exponent
              consisting of an e or E followed by an optional sign and a
              string of decimal digits.  It is read in and stored in the
              variable as a floating-point value.

       [chars]
              The input substring consists of one or more characters in
              chars.  The matching string is stored in the variable.  If the
              first character between the brackets is a ] then it is treated
              as part of chars rather than the closing bracket for the set.
              If chars contains a sequence of the form a-b then any
              character between a and b (inclusive) will match.  If the
              first or last character between the brackets is a -, then it
              is treated as part of chars rather than indicating a range.

       [^chars]
              The input substring consists of one or more characters not in
              chars.  The matching string is stored in the variable.  If the
              character immediately following the ^ is a ] then it is
              treated as part of the set rather than the closing bracket for
              the set.  If chars contains a sequence of the form a-b then
              any character between a and b (inclusive) will be excluded
              from the set.  If the first or last character between the
              brackets is a -, then it is treated as part of chars rather
              than indicating a range value.

       n      No input is consumed from the input string.  Instead, the
              total number of characters scanned from the input string so
              far is stored in the variable.

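       A quick tour of several of the conversion characters listed above, in
       the inline form.  The inputs and variable names are illustrative
       only, and the sketch assumes a Tcl version that documents %b, as this
       page does.  Formats containing [ are braced so that Tcl does not
       treat the brackets as command substitution:

```tcl
set hex    [scan "ff" %x]      ;# hexadecimal digits
set oct    [scan "17" %o]      ;# octal digits
set bin    [scan "101" %b]     ;# binary digits
set auto   [scan "0x1F" %i]    ;# base taken from the 0x prefix
set code   [scan "A" %c]       ;# Unicode value of a single character
set real   [scan "3.5e2" %f]   ;# floating-point value

# A run of characters from a set, followed by an integer.
set parts  [scan "abc123" {%[a-z]%d}]
# Everything up to "=", the literal "=", then the rest of the word.
set pair   [scan "key=value" {%[^=]=%s}]
# A word, then the number of characters consumed so far.
set offset [scan "hello world" "%s%n"]

puts "$hex $oct $bin $auto $code $real"
puts "parts=$parts pair=$pair offset=$offset"
```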
       The number of characters read from the input for a conversion is the
       largest number that makes sense for that particular conversion (e.g.
       as many decimal digits as possible for %d, as many octal digits as
       possible for %o, and so on).  The input substring for a given
       conversion terminates either when a white-space character is
       encountered or when the maximum substring width has been reached,
       whichever comes first.  If a * is present in the conversion specifier
       then no variable is assigned and the next scan argument is not
       consumed.
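       A sketch of the maximum substring width and the * (discard) flag.
       The date-like and record-like inputs are invented for the example:

```tcl
# Widths split a run of digits into fixed-size fields.
set n [scan "20240131" "%4d%2d%2d" year month day]
puts "n=$n year=$year month=$month day=$day"

# %*s consumes a word without assigning it, so only two variables
# are needed for three conversion specifiers.
set m [scan "id: 17 ignored 99" "id: %d %*s %d" id last]
puts "m=$m id=$id last=$last"

# A width also limits %s; the second %s picks up where the first stopped.
set halves [scan "abcdef" "%3s%s"]
puts "halves=$halves"
```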

DIFFERENCES FROM ANSI SSCANF

       The behavior of the scan command is the same as the behavior of the
       ANSI C sscanf procedure except for the following differences:

       [1]    %p conversion specifier is not supported.

       [2]    For %c conversions a single character value is converted to a
              decimal string, which is then assigned to the corresponding
              varName; no substring width may be specified for this
              conversion.

       [3]    The h modifier is always ignored and the l and L modifiers are
              ignored when converting real values (i.e. type double is used
              for the internal representation).  The ll modifier has no
              sscanf counterpart.

       [4]    If the end of the input string is reached before any
              conversions have been performed and no variables are given, an
              empty string is returned.
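       Differences [1] and [2] can be observed with a small sketch.  It
       only checks that an error is raised and makes no assumption about
       the wording of the error messages; the variable names are
       illustrative:

```tcl
# %p is not supported, so the format is rejected with an error.
set pRejected [catch {scan "0x10" %p} pMsg]

# A substring width on %c is rejected as well.
set cRejected [catch {scan "ab" %2c} cMsg]
puts "pRejected=$pRejected cRejected=$cRejected"

# %c yields the character's value as a decimal string;
# format %c turns that value back into the character.
set code [scan "x" %c]
set back [format %c $code]
puts "code=$code back=$back"
```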

EXAMPLES

       Convert a Unicode character to its numeric value:

              set char "x"
              set value [scan $char %c]

       Parse a simple color specification of the form #RRGGBB using
       hexadecimal conversions with substring sizes:

              set string "#08D03F"
              scan $string "#%2x%2x%2x" r g b
              # r, g and b now hold 8, 208 and 63

       Parse a HH:MM time string, noting that this avoids problems with
       octal numbers by forcing interpretation as decimals (if we did not
       care, we would use the %i conversion instead):

              set string "08:08"   ;# *Not* octal!
              if {[scan $string "%d:%d" hours minutes] != 2} {
                  error "not a valid time string"
              }
              # We have to understand numeric ranges ourselves...
              if {$minutes < 0 || $minutes > 59} {
                  error "invalid number of minutes"
              }

       Break a string up into sequences of non-whitespace characters (note
       the use of the %n conversion so that we get skipping over leading
       whitespace correct):

              set string " a string {with braced words} + leading space "
              set words {}
              while {[scan $string %s%n word length] == 2} {
                  lappend words $word
                  set string [string range $string $length end]
              }

       Parse a simple coordinate string, checking that it is complete by
       looking for the terminating character explicitly:

              set string "(5.2,-4e-2)"
              # Note that the spaces before the literal parts of
              # the scan pattern are significant, and that ")" is
              # the Unicode character \u0029
              if {
                  [scan $string " (%f ,%f %c" x y last] != 3
                  || $last != 0x0029
              } then {
                  error "invalid coordinate string"
              }
              puts "X=$x, Y=$y"

       An interactive session demonstrating the truncation of integer values
       determined by size modifiers:

              % set tcl_platform(wordSize)
              4
              % scan 20000000000000000000 %d
              2147483647
              % scan 20000000000000000000 %ld
              9223372036854775807
              % scan 20000000000000000000 %lld
              20000000000000000000

KEYWORDS

       conversion specifier, parse, scan

Tcl                                   8.4                              scan(n)