Scanf(3) OCaml library Scanf(3)

Scanf - Formatted input functions.

Module Scanf

Module Scanf
 : sig end

Formatted input functions.

=== Introduction ===

=== Functional input with format strings ===

The module Scanf provides formatted input functions, or scanners.

The formatted input functions can read from any kind of input,
including strings, files, or anything that can return characters. The
most general source of characters is named a formatted input channel
(or scanning buffer) and has type Scanf.Scanning.in_channel. The most
general formatted input function reads from any scanning buffer and is
named bscanf.

Generally speaking, the formatted input functions have 3 arguments:
- the first argument is a source of characters for the input,
- the second argument is a format string that specifies the values to
  read,
- the third argument is a receiver function that is applied to the
  values read.

Hence, a typical call to the formatted input function Scanf.bscanf is
bscanf ic fmt f, where:
- ic is a source of characters (typically a formatted input channel
  with type Scanf.Scanning.in_channel),
- fmt is a format string (the same format strings as those used to
  print material with module Printf or Format),
- f is a function that has as many arguments as the number of values
  to read in the input according to fmt.
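
The three-argument shape described above can be sketched as follows.
This is only an illustration: the names ic, describe and result are
made up for the example, and the source of characters is a
string-backed channel built with Scanf.Scanning.from_string.

```ocaml
(* Source of characters: a formatted input channel over a string. *)
let ic = Scanf.Scanning.from_string "3 apples"

(* Receiver: one argument per value read by the format "%d %s". *)
let describe count item = Printf.sprintf "%s x%d" item count

(* bscanf ic fmt f: read an int and a string, then apply the receiver. *)
let result = Scanf.bscanf ic "%d %s" describe
```

Here the receiver takes two arguments because the format string
contains two conversions.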

=== A simple example ===

As suggested above, the expression bscanf ic "%d" f reads a decimal
integer n from the source of characters ic and returns f n.

For instance,
- if we use stdin as the source of characters (Scanf.Scanning.stdin is
  the predefined formatted input channel that reads from standard
  input),
- if we define the receiver f as let f x = x + 1,

then bscanf Scanning.stdin "%d" f reads an integer n from the standard
input and returns f n (that is n + 1). Thus, if we evaluate bscanf
Scanning.stdin "%d" f, and then enter 41 at the keyboard, the result we
get is 42.
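
The same computation can be tried without a keyboard by reading from a
string. The sketch below assumes that the string "41" stands in for the
text typed on standard input; the name answer is illustrative.

```ocaml
let f x = x + 1

(* Reading from a string stands in for typing 41 at the keyboard. *)
let answer = Scanf.sscanf "41" "%d" f

(* The interactive version would read from standard input instead:
   let answer = Scanf.bscanf Scanf.Scanning.stdin "%d" f *)
```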

=== Formatted input as a functional feature ===

The OCaml scanning facility is reminiscent of the corresponding C
feature. However, it is also largely different, simpler, and yet more
powerful:
- the formatted input functions are higher-order functionals, and the
  parameter passing mechanism is just regular function application,
  not the variable-assignment-based mechanism that is typical of
  formatted input in imperative languages;
- the OCaml format strings also feature useful additions to easily
  define complex tokens;
- as expected within a functional programming language, the formatted
  input functions also support polymorphism, in particular arbitrary
  interaction with polymorphic user-defined scanners.

Furthermore, the OCaml formatted input facility is fully type-checked
at compile time.

=== Formatted input channel ===

module Scanning : sig end

=== Type of formatted input functions ===

type ('a, 'b, 'c, 'd) scanner = ('a, Scanning.in_channel, 'b, 'c, 'a ->
'd, 'd) Pervasives.format6 -> 'c

The type of formatted input scanners: ('a, 'b, 'c, 'd) scanner is the
type of a formatted input function that reads from some formatted input
channel according to some format string. More precisely, if scan is
some formatted input function, then scan ic fmt f applies f to all the
arguments specified by the format string fmt, once scan has read those
arguments from the Scanf.Scanning.in_channel formatted input channel
ic.

For instance, the Scanf.scanf function below has type ('a, 'b, 'c, 'd)
scanner, since it is a formatted input function that reads from
Scanf.Scanning.stdin: scanf fmt f applies f to the arguments specified
by fmt, reading those arguments from Pervasives.stdin as expected.

If the format fmt has some %r indications, the corresponding formatted
input functions must be provided before the receiver function f. For
instance, if read_elem is an input function for values of type t, then
bscanf ic "%r;" read_elem f reads a value v of type t followed by a ';'
character, and returns f v.
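
A possible reading of the %r rule, as a sketch: read_elem below is a
hypothetical reader that takes t to be int, and sum_two is an
illustrative helper. Note that the readers are passed before the
receiver, one per %r conversion.

```ocaml
(* Hypothetical reader for one element: here, a decimal integer,
   possibly preceded by whitespace. *)
let read_elem ic = Scanf.bscanf ic " %d" (fun n -> n)

(* Two %r conversions, so two readers come before the receiver. *)
let sum_two input =
  let ic = Scanf.Scanning.from_string input in
  Scanf.bscanf ic "%r; %r;" read_elem read_elem (fun a b -> a + b)

let total = sum_two "12; 34;"
```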

Since 3.10.0

exception Scan_failure of string

When the input cannot be read according to the format string
specification, formatted input functions typically raise exception
Scan_failure.

=== The general formatted input function ===

val bscanf : Scanning.in_channel -> ('a, 'b, 'c, 'd) scanner

bscanf ic fmt r1 ... rN f reads characters from the
Scanf.Scanning.in_channel formatted input channel ic and converts them
to values according to format string fmt. As a final step, receiver
function f is applied to the values read and gives the result of the
bscanf call.

For instance, if f is the function fun s i -> i + 1, then Scanf.sscanf
"x = 1" "%s = %i" f returns 2.

Arguments r1 to rN are user-defined input functions that read the
argument corresponding to the %r conversions specified in the format
string.
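
The example from the text can be written out as below. The second
binding is an added illustration (the input "width = 0x10" is made up)
showing that the receiver may return any value built from the
arguments read.

```ocaml
(* The receiver from the text: ignore the string, increment the int. *)
let f _s i = i + 1
let two = Scanf.sscanf "x = 1" "%s = %i" f

(* A receiver can just as well return a structured result. *)
let binding =
  Scanf.sscanf "width = 0x10" "%s = %i" (fun name i -> (name, i))
```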

=== Format string description ===

The format string is a character string which contains three types of
objects:
- plain characters, which are simply matched with the characters of
  the input (with a special case for space and line feed, see
  Scanf.space),
- conversion specifications, each of which causes reading and
  conversion of one argument for the function f (see
  Scanf.conversion),
- scanning indications to specify boundaries of tokens (see
  Scanf.indication).

=== The space character in format strings ===

As mentioned above, a plain character in the format string is just
matched with the next character of the input; however, two characters
are special exceptions to this rule: the space character (' ' or ASCII
code 32) and the line feed character ('\n' or ASCII code 10).

A space does not match a single space character, but any amount of
'whitespace' in the input. More precisely, a space inside the format
string matches any number of tab, space, line feed and carriage return
characters. Similarly, a line feed character in the format string
matches either a single line feed or a carriage return followed by a
line feed.

Since it matches any amount of whitespace, a space in the format string
also matches no whitespace at all; hence, the call
bscanf ib "Price = %d $" (fun p -> p) succeeds and returns 1 when
reading an input with various whitespace in it, such as "Price = 1 $",
"Price  =  1    $", or even "Price=1$".
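
The whitespace rules above can be checked with a short sketch. The
helper name price and the sample inputs are illustrative; sscanf is
used in place of bscanf so that each input is a plain string.

```ocaml
(* A space in the format matches any amount of whitespace, even none. *)
let price input = Scanf.sscanf input "Price = %d $" (fun p -> p)

let prices =
  List.map price [ "Price = 1 $"; "Price  =  1    $"; "Price=1$" ]

(* A line feed in the format also accepts a CR LF pair in the input. *)
let sum = Scanf.sscanf "1\r\n2" "%d\n%d" (fun a b -> a + b)
```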

=== Conversion specifications in format strings ===

Conversion specifications consist of the % character, followed by an
optional flag, an optional field width, and followed by one or two
conversion characters.

The conversion characters and their meanings are:
- d: reads an optionally signed decimal integer ([0-9]+).
- i: reads an optionally signed integer (usual input conventions for
  decimal ([0-9]+), hexadecimal (0x[0-9a-f]+ and 0X[0-9A-F]+), octal
  (0o[0-7]+), and binary (0b[0-1]+) notations are understood).
- u: reads an unsigned decimal integer.
- x or X: reads an unsigned hexadecimal integer ([0-9a-fA-F]+).
- o: reads an unsigned octal integer ([0-7]+).
- s: reads a string argument that spreads as much as possible, until
  one of the following bounding conditions holds:
  - a whitespace has been found (see Scanf.space),
  - a scanning indication (see Scanf.indication) has been encountered,
  - the end-of-input has been reached.
  Hence, this conversion always succeeds: it returns an empty string
  if the bounding condition holds when the scan begins.
- S: reads a delimited string argument (delimiters and special escaped
  characters follow the lexical conventions of OCaml).
- c: reads a single character. To test the current input character
  without reading it, specify a null field width, i.e. use
  specification %0c. Raises Invalid_argument if the field width
  specification is greater than 1.
- C: reads a single delimited character (delimiters and special
  escaped characters follow the lexical conventions of OCaml).
- f, e, E, g, G: reads an optionally signed floating-point number in
  decimal notation, in the style dddd.ddd e/E+-dd.
- h, H: reads an optionally signed floating-point number in
  hexadecimal notation.
- F: reads a floating point number according to the lexical
  conventions of OCaml (hence the decimal point is mandatory if the
  exponent part is not mentioned).
- B: reads a boolean argument (true or false).
- b: reads a boolean argument (for backward compatibility; do not use
  in new programs).
- ld, li, lu, lx, lX, lo: reads an int32 argument to the format
  specified by the second letter for regular integers.
- nd, ni, nu, nx, nX, no: reads a nativeint argument to the format
  specified by the second letter for regular integers.
- Ld, Li, Lu, Lx, LX, Lo: reads an int64 argument to the format
  specified by the second letter for regular integers.
- [ range ]: reads characters that match one of the characters
  mentioned in the range of characters range (or not mentioned in it,
  if the range starts with ^). Reads a string that can be empty, if
  the next input character does not match the range. The set of
  characters from c1 to c2 (inclusively) is denoted by c1-c2. Hence,
  %[0-9] returns a string representing a decimal number or an empty
  string if no decimal digit is found; similarly, %[0-9a-f] returns a
  string of hexadecimal digits. If a closing bracket appears in a
  range, it must occur as the first character of the range (or just
  after the ^ in case of range negation); hence []] matches a ]
  character and [^]] matches any character that is not ]. Use %% and
  %@ to include a % or a @ in a range.
- r: user-defined reader. Takes the next ri formatted input function
  and applies it to the scanning buffer ib to read the next argument.
  The input function ri must therefore have type
  Scanning.in_channel -> 'a and the argument read has type 'a.
- { fmt %}: reads a format string argument. The format string read
  must have the same type as the format string specification fmt. For
  instance, "%{ %i %}" reads any format string that can read a value
  of type int; hence, if s is the string "fmt:\"number is %u\"", then
  Scanf.sscanf s "fmt: %{%i%}" succeeds and returns the format string
  "number is %u".
- ( fmt %): scanning sub-format substitution. Reads a format string rf
  in the input, then goes on scanning with rf instead of scanning with
  fmt. The format string rf must have the same type as the format
  string specification fmt that it replaces. For instance, "%( %i %)"
  reads any format string that can read a value of type int. The
  conversion returns the format string read rf, and then a value read
  using rf. Hence, if s is the string "\"%4d\"1234.00", then
  Scanf.sscanf s "%(%i%)" (fun fmt i -> fmt, i) evaluates to
  ("%4d", 1234). This behaviour is not mere format substitution, since
  the conversion returns the format string read as an additional
  argument. If you need pure format substitution, use the special flag
  _ to discard the extraneous argument: conversion %_( fmt %) reads a
  format string rf and then behaves the same as format string rf.
  Hence, if s is the string "\"%4d\"1234.00", then
  Scanf.sscanf s "%_(%i%)" is simply equivalent to
  Scanf.sscanf "1234.00" "%4d".
- l: returns the number of lines read so far.
- n: returns the number of characters read so far.
- N or L: returns the number of tokens read so far.
- !: matches the end of input condition.
- %: matches one % character in the input.
- @: matches one @ character in the input.
- ,: does nothing.

Following the % character that introduces a conversion, there may be
the special flag _: the conversion that follows occurs as usual, but
the resulting value is discarded. For instance, if f is the function
fun i -> i + 1, and s is the string "x = 1", then
Scanf.sscanf s "%_s = %i" f returns 2.

The field width is composed of an optional integer literal indicating
the maximal width of the token to read. For instance, %6d reads an
integer, having at most 6 decimal digits; %4f reads a float with at
most 4 characters; and %8[\000-\255] returns the next 8 characters (or
all the characters still available, if fewer than 8 characters are
available in the input).

Notes:
- as mentioned above, a %s conversion always succeeds, even if there
  is nothing to read in the input: in this case, it simply returns "".
- in addition to the relevant digits, '_' characters may appear inside
  numbers (this is reminiscent of the usual OCaml lexical
  conventions). If stricter scanning is desired, use the range
  conversion facility instead of the number conversions.
- the scanf facility is not intended for heavy duty lexical analysis
  and parsing. If it appears not expressive enough for your needs,
  several alternatives exist: regular expressions (module Str), stream
  parsers, ocamllex-generated lexers, ocamlyacc-generated parsers.
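
A few of the conversions listed above, exercised on small made-up
inputs. This is a sketch rather than an exhaustive tour; all the
binding names and sample strings are illustrative.

```ocaml
(* %d is decimal only; %i also understands 0x, 0o and 0b prefixes. *)
let dec = Scanf.sscanf "-42" "%d" (fun n -> n)
let hex = Scanf.sscanf "0x2A" "%i" (fun n -> n)
let bin = Scanf.sscanf "0b101010" "%i" (fun n -> n)

(* Range conversion: the leading digits, then the rest of the word. *)
let digits, rest = Scanf.sscanf "2024abc" "%[0-9]%s" (fun d r -> (d, r))

(* Field width: each integer is read from at most 2 characters. *)
let hh, mm = Scanf.sscanf "0930" "%2d%2d" (fun h m -> (h, m))

(* The _ flag: the string is scanned but not passed to the receiver. *)
let rhs = Scanf.sscanf "x = 1" "%_s = %i" (fun i -> i + 1)

(* %S reads an OCaml-style delimited string, %B a boolean. *)
let quoted, flag = Scanf.sscanf "\"a b\" true" "%S %B" (fun s b -> (s, b))

(* %f reads a float in decimal notation. *)
let pi_ish = Scanf.sscanf "3.14" "%f" (fun x -> x)
```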

=== Scanning indications in format strings ===

Scanning indications appear just after the string conversions %s and
%[ range ] to delimit the end of the token. A scanning indication is
introduced by a @ character, followed by some plain character c. It
means that the string token should end just before the next matching c
(which is skipped). If no c character is encountered, the string token
spreads as much as possible. For instance, "%s@\t" reads a string up to
the next tab character or to the end of input. If a @ character appears
anywhere else in the format string, it is treated as a plain character.

Note:
- As usual in format strings, % and @ characters must be escaped using
  %% and %@; this rule still holds within range specifications and
  scanning indications. For instance, format "%s@%%" reads a string up
  to the next % character, and format "%s@%@" reads a string up to the
  next @.
- The scanning indications introduce slight differences in the syntax
  of Scanf format strings, compared to those used for the Printf
  module. However, the scanning indications are similar to those used
  in the Format module; hence, when producing formatted text to be
  scanned by Scanf.bscanf, it is wise to use printing functions from
  the Format module (or, if you need to use functions from Printf,
  banish or carefully double check the format strings that contain '@'
  characters).
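
As an illustration of scanning indications, the sketch below splits
two made-up records on a comma and on a tab. The delimiter itself is
skipped, so it does not appear in either field.

```ocaml
(* %s@, ends the string token just before the next ',' and skips it. *)
let name, age = Scanf.sscanf "Ada,36" "%s@,%d" (fun n a -> (n, a))

(* %s@\t does the same with a tab character as the delimiter. *)
let key, value = Scanf.sscanf "key\tvalue" "%s@\t%s" (fun k v -> (k, v))
```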

=== Exceptions during scanning ===

Scanners may raise the following exceptions when the input cannot be
read according to the format string:
- Raise Scanf.Scan_failure if the input does not match the format.
- Raise Failure if a conversion to a number is not possible.
- Raise End_of_file if the end of input is encountered while some more
  characters are needed to read the current conversion specification.
- Raise Invalid_argument if the format string is invalid.

Note:
- as a consequence, scanning a %s conversion never raises exception
  End_of_file: if the end of input is reached the conversion succeeds
  and simply returns the characters read so far, or "" if none were
  ever read.
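
One way to handle these exceptions is to turn them into an option, as
in the sketch below. The helper name parse_int is illustrative, and
the trailing %! (end of input) is a choice made for this example so
that inputs with trailing garbage are rejected as well.

```ocaml
(* Returns None whenever the input cannot be read as a whole integer. *)
let parse_int s =
  try Some (Scanf.sscanf s "%d%!" (fun n -> n))
  with Scanf.Scan_failure _ | Failure _ | End_of_file -> None

(* %s never raises End_of_file: on empty input it returns "". *)
let empty = Scanf.sscanf "" "%s" (fun s -> s)
```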

=== Specialised formatted input functions ===

val sscanf : string -> ('a, 'b, 'c, 'd) scanner

Same as Scanf.bscanf, but reads from the given string.

val scanf : ('a, 'b, 'c, 'd) scanner

Same as Scanf.bscanf, but reads from the predefined formatted input
channel Scanf.Scanning.stdin that is connected to Pervasives.stdin.

val kscanf : Scanning.in_channel -> (Scanning.in_channel -> exn -> 'd)
-> ('a, 'b, 'c, 'd) scanner

Same as Scanf.bscanf, but takes an additional function argument ef
that is called in case of error: if the scanning process or some
conversion fails, the scanning function aborts and calls the error
handling function ef with the formatted input channel and the exception
that aborted the scanning process as arguments.

val ksscanf : string -> (Scanning.in_channel -> exn -> 'd) -> ('a, 'b,
'c, 'd) scanner

Same as Scanf.kscanf, but reads from the given string.

Since 4.02.0
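
A sketch of the error-handler variants: instead of letting an
exception escape, the handler ef supplies the result. The helper names
int_or_default and int_opt are illustrative, and both handlers ignore
their arguments for brevity.

```ocaml
(* kscanf: fall back to a default value when scanning fails. *)
let int_or_default default s =
  let ic = Scanf.Scanning.from_string s in
  Scanf.kscanf ic (fun _ic _exn -> default) "%d" (fun n -> n)

(* ksscanf: the same idea, reading directly from a string. *)
let int_opt s =
  Scanf.ksscanf s (fun _ic _exn -> None) "%d" (fun n -> Some n)
```

The handler and the receiver must return the same type, since either
one may provide the final result.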

=== Reading format strings from input ===

val bscanf_format : Scanning.in_channel -> ('a, 'b, 'c, 'd, 'e, 'f)
Pervasives.format6 -> (('a, 'b, 'c, 'd, 'e, 'f) Pervasives.format6 ->
'g) -> 'g

bscanf_format ic fmt f reads a format string token from the formatted
input channel ic, according to the given format string fmt, and
applies f to the resulting format string value. Raise
Scanf.Scan_failure if the format string value read does not have the
same type as fmt.

Since 3.09.0

val sscanf_format : string -> ('a, 'b, 'c, 'd, 'e, 'f)
Pervasives.format6 -> (('a, 'b, 'c, 'd, 'e, 'f) Pervasives.format6 ->
'g) -> 'g

Same as Scanf.bscanf_format, but reads from the given string.

Since 3.09.0

val format_from_string : string -> ('a, 'b, 'c, 'd, 'e, 'f)
Pervasives.format6 -> ('a, 'b, 'c, 'd, 'e, 'f) Pervasives.format6

format_from_string s fmt converts a string argument to a format string,
according to the given format string fmt. Raise Scanf.Scan_failure if
s, considered as a format string, does not have the same type as fmt.

Since 3.10.0
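
A typical use of these functions is to accept a format string at run
time while keeping the static type of a known format. The sketch below
assumes the template must have the type of "%d"; the names render and
from_input are illustrative.

```ocaml
(* Convert a run-time string to a format with the same type as "%d",
   then print with it. *)
let render template n =
  Printf.sprintf (Scanf.format_from_string template "%d") n

(* sscanf_format reads a delimited format string token from a string. *)
let from_input =
  Scanf.sscanf_format "\"%d items\"" "%d" (fun fmt -> Printf.sprintf fmt 3)
```

A template whose type differs from "%d", such as "count: %s", is
rejected with Scanf.Scan_failure when render is called.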

val unescaped : string -> string

unescaped s returns a copy of s with escape sequences (according to the
lexical conventions of OCaml) replaced by their corresponding special
characters. More precisely, Scanf.unescaped has the following
property: for all strings s, Scanf.unescaped (String.escaped s) = s.

Always returns a copy of the argument, even if there is no escape
sequence in the argument. Raise Scanf.Scan_failure if s is not
properly escaped (i.e. s has invalid escape sequences or special
characters that are not properly escaped). For instance,
Scanf.unescaped "\"" will fail.

Since 4.00.0
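
The round-trip property and the failure case can be sketched as
follows; the sample string and the binding names are made up for the
illustration.

```ocaml
let original = "tab:\t quote:\" newline:\n"

(* String.escaped produces backslash sequences; unescaped undoes them. *)
let round_trip = Scanf.unescaped (String.escaped original)

(* A bare double quote is not a properly escaped string. *)
let rejects_bare_quote =
  try
    ignore (Scanf.unescaped "\"");
    false
  with Scanf.Scan_failure _ -> true
```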

=== Deprecated ===

val fscanf : Pervasives.in_channel -> ('a, 'b, 'c, 'd) scanner

Deprecated.

Scanf.fscanf is error prone and deprecated since 4.03.0.

This function violates the following invariant of the Scanf module: To
preserve scanning semantics, all scanning functions defined in Scanf
must read from a user defined Scanf.Scanning.in_channel formatted input
channel.

If you need to read from a Pervasives.in_channel input channel ic,
simply define a Scanf.Scanning.in_channel formatted input channel as in
let ib = Scanning.from_channel ic, then use Scanf.bscanf ib as usual.
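
The recommended replacement for fscanf can be sketched as below. To
keep the example self-contained it writes a temporary file first; the
function name sum_from_file, the file prefix and the file contents are
all made up for the illustration.

```ocaml
let sum_from_file () =
  (* Prepare a small input file to read back. *)
  let path = Filename.temp_file "scanf_demo" ".txt" in
  let oc = open_out path in
  output_string oc "10 32\n";
  close_out oc;
  (* Wrap the regular input channel into a formatted input channel. *)
  let ic = open_in path in
  let ib = Scanf.Scanning.from_channel ic in
  let total = Scanf.bscanf ib "%d %d" (fun a b -> a + b) in
  close_in ic;
  Sys.remove path;
  total
```

Creating ib once and reusing it for every bscanf call on the same
channel keeps all reads going through a single formatted input
channel, which is the invariant stated above.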

val kfscanf : Pervasives.in_channel -> (Scanning.in_channel -> exn ->
'd) -> ('a, 'b, 'c, 'd) scanner

Deprecated.

Scanf.kfscanf is error prone and deprecated since 4.03.0.

OCamldoc 2019-02-02 Scanf(3)