Scanf(3)                         OCaml library                        Scanf(3)

Scanf - Formatted input functions.

Module Scanf
 : sig end

Formatted input functions.

=== Introduction ===

=== Functional input with format strings ===

The module Scanf provides formatted input functions, or scanners.

The formatted input functions can read from any kind of input,
including strings, files, or anything that can return characters. The
most general source of characters is named a scanning buffer and has
type Scanf.Scanning.scanbuf. The most general formatted input function
reads from any scanning buffer and is named bscanf.

Generally speaking, the formatted input functions have 3 arguments:
- the first argument is a source of characters for the input,
- the second argument is a format string that specifies the values to
  read,
- the third argument is a receiver function that is applied to the
  values read.

Hence, a typical call to the formatted input function Scanf.bscanf is
bscanf ib fmt f, where:
- ib is a source of characters (typically a scanning buffer with type
  Scanf.Scanning.scanbuf),
- fmt is a format string (the same format strings as those used to
  print material with module Printf or Format),
- f is a function that has as many arguments as the number of values to
  read in the input.

=== A simple example ===

As suggested above, the expression bscanf ib "%d" f reads a decimal
integer n from the source of characters ib and returns f n.

For instance,
- if we use stdib as the source of characters (Scanf.Scanning.stdib is
  the predefined input buffer that reads from standard input),
- if we define the receiver f as let f x = x + 1,

then bscanf stdib "%d" f reads an integer n from the standard input and
returns f n (that is n + 1). Thus, if we evaluate bscanf stdib "%d" f,
and then enter 41 at the keyboard, we get 42 as the final result.
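
The same call can be tried without a keyboard by scanning from a
string. The following is a minimal sketch, not part of the original
page: it assumes Scanf.Scanning.from_string (a function of the Scanning
submodule, which this page does not detail) to build a scanning buffer
that stands in for stdib.

```ocaml
(* Receiver function: takes the value read and returns its successor. *)
let f x = x + 1

(* Assumption: Scanf.Scanning.from_string builds a scanning buffer from
   a string; it replaces stdib so that no keyboard input is needed. *)
let ib = Scanf.Scanning.from_string "41"

(* Reads a decimal integer from ib and applies f to it. *)
let result = Scanf.bscanf ib "%d" f
```

Here result is the value returned by f, that is 42.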

=== Formatted input as a functional feature ===

The Caml scanning facility is reminiscent of the corresponding C
feature. However, it is also largely different, simpler, and yet more
powerful:
- the formatted input functions are higher-order functionals, and the
  parameter passing mechanism is just regular function application, not
  the variable-assignment-based mechanism that is typical of formatted
  input in imperative languages;
- the Caml format strings also feature useful additions to easily
  define complex tokens;
- as expected within a functional programming language, the formatted
  input functions also support polymorphism, in particular arbitrary
  interaction with polymorphic user-defined scanners.

Furthermore, the Caml formatted input facility is fully type-checked at
compile time.

module Scanning : sig end

Scanning buffers

=== Type of formatted input functions ===

type ('a, 'b, 'c, 'd) scanner = ('a, Scanning.scanbuf, 'b, 'c, 'a ->
    'd, 'd) format6 -> 'c

The type of formatted input scanners: ('a, 'b, 'c, 'd) scanner is the
type of a formatted input function that reads from some scanning buffer
according to some format string. More precisely, if scan is some
formatted input function, then scan ib fmt f applies f to the arguments
specified by the format string fmt, once scan has read those arguments
from the scanning input buffer ib.

For instance, the scanf function below has type ('a, 'b, 'c, 'd)
scanner, since it is a formatted input function that reads from stdib:
scanf fmt f applies f to the arguments specified by fmt, reading those
arguments from stdin as expected.

If the format fmt has some %r indications, the corresponding input
functions must be provided before the receiver f argument. For
instance, if read_elem is an input function for values of type t, then
bscanf ib "%r;" read_elem f reads a value v of type t followed by a ';'
character, and returns f v.
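
A small sketch of a %r conversion may help here. The record type t and
the textual syntax "(x,y)" are hypothetical choices made for this
illustration only; read_elem is named after the text above, and sscanf
is used so that the example reads from a string.

```ocaml
(* Hypothetical element type, chosen for illustration only. *)
type t = { x : int; y : int }

(* An input function for values of type t: it takes a scanning buffer
   and returns a t, which is what a %r conversion requires. *)
let read_elem ib = Scanf.bscanf ib "(%d,%d)" (fun x y -> { x; y })

(* The reader is passed before the receiver; the ';' that follows %r in
   the format is matched as a plain character. *)
let v = Scanf.sscanf "(3,4);" "%r;" read_elem (fun v -> v)
```

The identity receiver simply returns the value produced by read_elem.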

exception Scan_failure of string

The exception that formatted input functions raise when the input
cannot be read according to the given format.

=== The general formatted input function ===

val bscanf : Scanning.scanbuf -> ('a, 'b, 'c, 'd) scanner

bscanf ib fmt r1 ... rN f reads arguments for the function f from the
scanning buffer ib, according to the format string fmt, and applies f
to these values. The result of this call to f is returned as the result
of the entire bscanf call. For instance, if f is the function
fun s i -> i + 1, then Scanf.sscanf "x = 1" "%s = %i" f returns 2.

Arguments r1 to rN are user-defined input functions that read the
argument corresponding to a %r conversion.
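
The instance above can be written out as follows. This is a sketch that
assumes Scanf.Scanning.from_string (from the Scanning submodule, not
detailed on this page) to obtain a scanning buffer over a string.

```ocaml
(* Receiver with two arguments, one per conversion in the format. *)
let f _name i = i + 1

(* Assumption: from_string wraps a string into a scanning buffer. *)
let ib = Scanf.Scanning.from_string "x = 1"

(* %s reads the name and stops at the first whitespace, the plain
   characters of the format are matched, then %i reads the integer.
   The result of f is the result of the whole call. *)
let result = Scanf.bscanf ib "%s = %i" f
```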

=== Format string description ===

The format is a character string which contains three types of objects:
- plain characters, which are simply matched with the characters of the
  input,
- conversion specifications, each of which causes reading and
  conversion of one argument for the function f,
- scanning indications to specify boundaries of tokens.

=== The space character in format strings ===

As mentioned above, a plain character in the format string is just
matched with the next character of the input; however, two characters
are special exceptions to this rule: the space character (' ' or ASCII
code 32) and the line feed character ('\n' or ASCII code 10).

A space does not match a single space character, but any amount of
``whitespace'' in the input. More precisely, a space inside the format
string matches any number of tab, space, line feed and carriage return
characters. Similarly, a line feed character in the format string
matches either a single line feed or a carriage return followed by a
line feed.

Since it matches any amount of whitespace, a space in the format string
also matches no whitespace at all; hence, the call
bscanf ib "Price = %d $" (fun p -> p) succeeds and returns 1 when
reading an input with various whitespace in it, such as "Price = 1 $",
"Price  =  1    $", or even "Price=1$".
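
The whitespace rule can be checked on the three inputs quoted above.
This is a sketch; read_price is a hypothetical helper name, and
Scanf.Scanning.from_string is assumed as the way to scan from a string.

```ocaml
(* Hypothetical helper: scans a price from a string with the format
   discussed above. Each space in the format matches zero or more
   whitespace characters in the input. *)
let read_price s =
  let ib = Scanf.Scanning.from_string s in
  Scanf.bscanf ib "Price = %d $" (fun p -> p)

let prices =
  List.map read_price [ "Price = 1 $"; "Price  =  1    $"; "Price=1$" ]
```

All three inputs yield the same value, 1.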

=== Conversion specifications in format strings ===

Conversion specifications consist of the % character, followed by an
optional flag, an optional field width, and followed by one or two
conversion characters. The conversion characters and their meanings
are:
- d: reads an optionally signed decimal integer.
- i: reads an optionally signed integer (usual input formats for
  hexadecimal (0x[d]+ and 0X[d]+), octal (0o[d]+), and binary (0b[d]+)
  notations are understood).
- u: reads an unsigned decimal integer.
- x or X: reads an unsigned hexadecimal integer.
- o: reads an unsigned octal integer.
- s: reads a string argument that spreads as much as possible, until
  the following bounding condition holds: a whitespace has been found,
  a scanning indication has been encountered, or the end-of-input has
  been reached. Hence, this conversion always succeeds: it returns an
  empty string if the bounding condition holds when the scan begins.
- S: reads a delimited string argument (delimiters and special escaped
  characters follow the lexical conventions of Caml).
- c: reads a single character. To test the current input character
  without reading it, specify a null field width, i.e. use
  specification %0c. Raises Invalid_argument if the field width
  specification is greater than 1.
- C: reads a single delimited character (delimiters and special escaped
  characters follow the lexical conventions of Caml).
- f, e, E, g, G: reads an optionally signed floating-point number in
  decimal notation, in the style dddd.ddd e/E+-dd.
- F: reads a floating point number according to the lexical conventions
  of Caml (hence the decimal point is mandatory if the exponent part is
  not mentioned).
- B: reads a boolean argument (true or false).
- b: reads a boolean argument (for backward compatibility; do not use
  in new programs).
- ld, li, lu, lx, lX, lo: reads an int32 argument to the format
  specified by the second letter (decimal, hexadecimal, etc).
- nd, ni, nu, nx, nX, no: reads a nativeint argument to the format
  specified by the second letter.
- Ld, Li, Lu, Lx, LX, Lo: reads an int64 argument to the format
  specified by the second letter.
- [ range ]: reads characters that match one of the characters
  mentioned in the range of characters range (or not mentioned in it,
  if the range starts with ^). Reads a string that can be empty, if the
  next input character does not match the range. The set of characters
  from c1 to c2 (inclusively) is denoted by c1-c2. Hence, %[0-9]
  returns a string representing a decimal number or an empty string if
  no decimal digit is found; similarly, %[\\048-\\057\\065-\\070]
  returns a string of hexadecimal digits. If a closing bracket appears
  in a range, it must occur as the first character of the range (or
  just after the ^ in case of range negation); hence []] matches a ]
  character and [^]] matches any character that is not ].
- r: user-defined reader. Takes the next ri formatted input function
  and applies it to the scanning buffer ib to read the next argument.
  The input function ri must therefore have type Scanning.scanbuf -> 'a
  and the argument read has type 'a.
- { fmt %}: reads a format string argument. The format string read must
  have the same type as the format string specification fmt. For
  instance, %{%i%} reads any format string that can read a value of
  type int; hence Scanf.sscanf "fmt:\"number is %u\"" "fmt:%{%i%}"
  succeeds and returns the format string "number is %u".
- ( fmt %): scanning format substitution. Reads a format string to
  replace fmt. The format string read must have the same type as the
  format string specification fmt. For instance, %(%i%) reads any
  format string that can read a value of type int; hence
  Scanf.sscanf "\"%4d\"1234.00" "%(%i%)" is equivalent to
  Scanf.sscanf "1234.00" "%4d".
- l: returns the number of lines read so far.
- n: returns the number of characters read so far.
- N or L: returns the number of tokens read so far.
- !: matches the end of input condition.
- %: matches one % character in the input.
- ,: the no-op delimiter for conversion specifications.

Following the % character that introduces a conversion, there may be
the special flag _: the conversion that follows occurs as usual, but
the resulting value is discarded. For instance, if f is the function
fun i -> i + 1, then Scanf.sscanf "x = 1" "%_s = %i" f returns 2.

The field width is composed of an optional integer literal indicating
the maximal width of the token to read. For instance, %6d reads an
integer having at most 6 decimal digits; %4f reads a float with at most
4 characters; and %8[\\000-\\255] returns the next 8 characters (or all
the characters still available, if fewer than 8 characters are
available in the input).

Notes:
- as mentioned above, a %s conversion always succeeds, even if there is
  nothing to read in the input: it simply returns "".
- in addition to the relevant digits, '_' characters may appear inside
  numbers (this is reminiscent of the usual Caml lexical conventions).
  If stricter scanning is desired, use the range conversion facility
  instead of the number conversions.
- the scanf facility is not intended for heavy duty lexical analysis
  and parsing. If it appears not expressive enough for your needs,
  several alternatives exist: regular expressions (module Str), stream
  parsers, ocamllex-generated lexers, ocamlyacc-generated parsers.
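
A few of the conversions listed above can be exercised on string
inputs. The sketch below is illustrative only; the binding names and
the sample inputs are chosen for this example and do not come from the
original page.

```ocaml
(* Identity receiver, reused for the single-value scans below. *)
let id x = x

(* %i understands the 0x prefix; %x reads bare hexadecimal digits. *)
let hex_literal = Scanf.sscanf "0x1F" "%i" id
let hex_digits = Scanf.sscanf "ff" "%x" id

(* %S reads a delimited string; %B reads a boolean. *)
let quoted = Scanf.sscanf "\"hi there\"" "%S" id
let flag = Scanf.sscanf "true" "%B" id

(* A field width bounds the token: two integers of at most 4 digits. *)
let two_fields = Scanf.sscanf "12345678" "%4d%4d" (fun a b -> (a, b))

(* A range conversion followed by a number conversion. *)
let word_and_number = Scanf.sscanf "abc123" "%[a-z]%d" (fun s n -> (s, n))

(* The _ flag: the string is scanned but not passed to the receiver. *)
let skipped = Scanf.sscanf "x = 1" "%_s = %i" (fun i -> i + 1)
```

Each receiver takes exactly as many arguments as there are conversions
that are not discarded by the _ flag.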

=== Scanning indications in format strings ===

Scanning indications appear just after the string conversions %s and
%[ range ] to delimit the end of the token. A scanning indication is
introduced by a @ character, followed by some constant character c. It
means that the string token should end just before the next matching c
(which is skipped). If no c character is encountered, the string token
spreads as much as possible. For instance, "%s@\t" reads a string up to
the next tab character or to the end of input. If a scanning indication
@c does not follow a string conversion, it is treated as a plain c
character.

Note:
- the scanning indications introduce slight differences in the syntax
  of Scanf format strings, compared to those used for the Printf
  module. However, the scanning indications are similar to those used
  in the Format module; hence, when producing formatted text to be
  scanned by Scanf.bscanf, it is wise to use printing functions from
  the Format module (or, if you need to use functions from Printf,
  banish or carefully double check the format strings that contain '@'
  characters).
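
The tab-delimited case described above might look as follows. This is a
sketch with made-up input strings; it relies only on sscanf and on the
scanning indication syntax given in this section.

```ocaml
(* The first token ends just before the next tab, which is skipped;
   the second %s then reads the following token. *)
let fields = Scanf.sscanf "key\tvalue" "%s@\t%s" (fun k v -> (k, v))

(* Without any tab in the input, the token spreads to the end of
   input. *)
let whole = Scanf.sscanf "no-tab-here" "%s@\t" (fun s -> s)
```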

=== Exceptions during scanning ===

Scanners may raise the following exceptions when the input cannot be
read according to the format string:
- Raise Scanf.Scan_failure if the input does not match the format.
- Raise Failure if a conversion to a number is not possible.
- Raise End_of_file if the end of input is encountered while some more
  characters are needed to read the current conversion specification.
- Raise Invalid_argument if the format string is invalid.

Note:
- as a consequence, scanning a %s conversion never raises exception
  End_of_file: if the end of input is reached the conversion succeeds
  and simply returns the characters read so far, or "" if none were
  read.
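
A caller can tell these cases apart with an ordinary exception handler.
The sketch below uses a hypothetical helper, classify, that reports
which outcome occurred when scanning one integer from a string.

```ocaml
(* Hypothetical helper: classifies the outcome of scanning one decimal
   integer from the string s. *)
let classify s =
  try Scanf.sscanf s "%d" (fun n -> Printf.sprintf "read %d" n) with
  | Scanf.Scan_failure _ -> "Scan_failure"
  | End_of_file -> "End_of_file"
  | Failure _ -> "Failure"

(* A %s conversion succeeds even on empty input. *)
let empty = Scanf.sscanf "" "%s" (fun s -> s)
```

With this helper, an input that does not start with a digit is reported
as a Scan_failure, while an empty input is reported as End_of_file.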

=== Specialised formatted input functions ===

val fscanf : Pervasives.in_channel -> ('a, 'b, 'c, 'd) scanner

Same as Scanf.bscanf, but reads from the given channel.

Warning: since all formatted input functions operate from a scanning
buffer, be aware that each fscanf invocation will operate with a
scanning buffer reading from the given channel. This extra level of
buffering can lead to strange scanning behaviour if you use low level
primitives on the channel (reading characters, seeking the reading
position, and so on).

As a consequence, never mix direct low level reading and high level
scanning from the same input channel.

val sscanf : string -> ('a, 'b, 'c, 'd) scanner

Same as Scanf.bscanf, but reads from the given string.

val scanf : ('a, 'b, 'c, 'd) scanner

Same as Scanf.bscanf, but reads from the predefined scanning buffer
Scanf.Scanning.stdib that is connected to stdin.

val kscanf : Scanning.scanbuf -> (Scanning.scanbuf -> exn -> 'a) ->
    ('b, 'c, 'd, 'a) scanner

Same as Scanf.bscanf, but takes an additional function argument ef that
is called in case of error: if the scanning process or some conversion
fails, the scanning function aborts and calls the error handling
function ef with the scanning buffer and the exception that aborted the
scanning process.
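
One possible use of kscanf is to turn scanning errors into an option
value. The sketch below is an illustration under assumptions:
parse_int_opt is a hypothetical helper, and Scanf.Scanning.from_string
is assumed as the way to scan from a string.

```ocaml
(* Hypothetical helper: scans one integer, mapping any scanning error
   to None instead of letting an exception escape. *)
let parse_int_opt s =
  let ib = Scanf.Scanning.from_string s in
  (* ef receives the scanning buffer and the exception that aborted the
     scan; both are ignored here. *)
  let ef _ib _exn = None in
  Scanf.kscanf ib ef "%d" (fun n -> Some n)
```

The error handler and the receiver must return the same type, here an
int option.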

=== Reading format strings from input ===

val bscanf_format : Scanning.scanbuf -> ('a, 'b, 'c, 'd, 'e, 'f)
    format6 -> (('a, 'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g

bscanf_format ib fmt f reads a format string token from the scanning
buffer ib, according to the given format string fmt, and applies f to
the resulting format string value. Raise Scan_failure if the format
string value read does not have the same type as fmt.

val sscanf_format : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 -> (('a,
    'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g

Same as Scanf.bscanf_format, but reads from the given string.
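
As an illustration, a format read from a string can be handed directly
to a printing function. This sketch assumes that the format string
token is written as a delimited (quoted) string in the input; the
sample text is made up for the example.

```ocaml
(* The input holds a delimited string token, hence the inner quotes.
   The format read is checked against the type of the %d specification,
   then passed to the receiver, which uses it for printing. *)
let message =
  Scanf.sscanf_format "\"%d items\"" "%d" (fun fmt -> Printf.sprintf fmt 4)
```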

val format_from_string : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 ->
    ('a, 'b, 'c, 'd, 'e, 'f) format6

format_from_string s fmt converts a string argument to a format string,
according to the given format string fmt. Raise Scan_failure if s,
considered as a format string, does not have the same type as fmt.
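
A typical use is to obtain a format from an ordinary string computed at
run time. In the sketch below, render is a hypothetical helper and the
template strings are made up for the example.

```ocaml
(* Hypothetical helper: renders n with a format held in an ordinary
   string, for instance one loaded from a configuration file. *)
let render template n =
  Printf.sprintf (Scanf.format_from_string template "%d") n

let ok = render "count: %d" 3

(* A template whose type differs from that of the specification is
   rejected at run time with Scan_failure. *)
let rejected =
  try Some (render "count: %s" 3) with Scanf.Scan_failure _ -> None
```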

OCamldoc                          2010-01-29                          Scanf(3)