Scanf(3) OCaml library Scanf(3)
2
3
4
6 Scanf - Formatted input functions.
7
9 Module Scanf
10
12 Module Scanf
13 : sig end
14
15
16 Formatted input functions.
17
18
19
20
21
22
23
24 Introduction
25 Functional input with format strings
26 The module Scanf provides formatted input functions or scanners.
27
The formatted input functions can read from any kind of input, including
strings, files, or anything that can return characters. The most general
source of characters is named a formatted input channel (or scanning
buffer) and has type Scanf.Scanning.in_channel. The most general
formatted input function reads from any scanning buffer and is named
bscanf.
34
35 Generally speaking, the formatted input functions have 3 arguments:
36
37 -the first argument is a source of characters for the input,
38
39 -the second argument is a format string that specifies the values to
40 read,
41
42 -the third argument is a receiver function that is applied to the val‐
43 ues read.
44
45 Hence, a typical call to the formatted input function Scanf.bscanf is
46 bscanf ic fmt f , where:
47
48
49 - ic is a source of characters (typically a formatted input channel
50 with type Scanf.Scanning.in_channel ),
51
52
53 - fmt is a format string (the same format strings as those used to
54 print material with module Printf or Format ),
55
56
57 - f is a function that has as many arguments as the number of values to
58 read in the input according to fmt .
59
60
61 A simple example
62 As suggested above, the expression bscanf ic "%d" f reads a decimal
63 integer n from the source of characters ic and returns f n .
64
65 For instance,
66
67
68 -if we use stdin as the source of characters ( Scanf.Scanning.stdin is
69 the predefined formatted input channel that reads from standard input),
70
71
72 -if we define the receiver f as let f x = x + 1 ,
73
then bscanf Scanning.stdin "%d" f reads an integer n from the standard
input and returns f n (that is n + 1). Thus, if we evaluate bscanf
Scanning.stdin "%d" f, and then enter 41 at the keyboard, the result we
get is 42.
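
The example above needs a keyboard. A minimal sketch of the same
computation, assuming the characters come from the string "41" through
Scanf.Scanning.from_string instead of standard input (the names ib and
result are illustrative only):

```ocaml
(* Receiver function from the example above. *)
let f x = x + 1

(* Assumption: a string-backed scanning buffer stands in for Scanning.stdin. *)
let ib = Scanf.Scanning.from_string "41"

(* Reads the decimal integer 41 from ib and returns f 41. *)
let result = Scanf.bscanf ib "%d" f

let () = Printf.printf "%d\n" result
```

Replacing ib with Scanf.Scanning.stdin gives the interactive version
described above.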
78
79 Formatted input as a functional feature
The OCaml scanning facility is reminiscent of the corresponding C
feature. However, it is also largely different, simpler, and yet more
powerful. The formatted input functions are higher-order functionals,
and the parameter passing mechanism is just regular function
application, not the variable-assignment-based mechanism that is typical
of formatted input in imperative languages. The OCaml format strings
also feature useful additions to easily define complex tokens. As
expected within a functional programming language, the formatted input
functions also support polymorphism, in particular arbitrary interaction
with polymorphic user-defined scanners. Furthermore, the OCaml formatted
input facility is fully type-checked at compile time.
91
92 Formatted input channel
93 module Scanning : sig end
94
95
96
97
98
99
100 Type of formatted input functions
101 type ('a, 'b, 'c, 'd) scanner = ('a, Scanning.in_channel, 'b, 'c, 'a ->
102 'd, 'd) format6 -> 'c
103
104
The type of formatted input scanners: ('a, 'b, 'c, 'd) scanner is the
type of a formatted input function that reads from some formatted input
channel according to some format string; more precisely, if scan is some
formatted input function, then scan ic fmt f applies f to all the
arguments specified by format string fmt, when scan has read those
arguments from the Scanf.Scanning.in_channel formatted input channel ic.
112
113 For instance, the Scanf.scanf function below has type ('a, 'b, 'c, 'd)
114 scanner , since it is a formatted input function that reads from
115 Scanf.Scanning.stdin : scanf fmt f applies f to the arguments specified
116 by fmt , reading those arguments from stdin as expected.
117
118 If the format fmt has some %r indications, the corresponding formatted
119 input functions must be provided before receiver function f . For
120 instance, if read_elem is an input function for values of type t , then
121 bscanf ic "%r;" read_elem f reads a value v of type t followed by a ';'
122 character, and returns f v .
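
A minimal sketch of a %r conversion, assuming a hypothetical element
type point written as (x,y) in the input. The type point and the reader
read_point are illustrative names, not part of the Scanf module:

```ocaml
(* Hypothetical element type, used only for this illustration. *)
type point = { x : int; y : int }

(* A user-defined reader: it has type Scanf.Scanning.in_channel -> point. *)
let read_point ib = Scanf.bscanf ib "(%d,%d)" (fun x y -> { x; y })

(* The reader for %r is passed before the receiver function. *)
let p = Scanf.sscanf "(3,4);" "%r;" read_point (fun v -> v)
```

The receiver here is the identity, so p is the point that read_point
built from the input.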
123
124
125 Since 3.10.0
126
127
128
129 exception Scan_failure of string
130
131
When the input cannot be read according to the format string
specification, formatted input functions typically raise exception
Scan_failure.
135
136
137
138
139 The general formatted input function
140 val bscanf : Scanning.in_channel -> ('a, 'b, 'c, 'd) scanner
141
142
143
144
145
146 bscanf ic fmt r1 ... rN f reads characters from the Scanf.Scan‐
147 ning.in_channel formatted input channel ic and converts them to values
148 according to format string fmt . As a final step, receiver function f
149 is applied to the values read and gives the result of the bscanf call.
150
For instance, if f is the function fun s i -> i + 1 , then Scanf.sscanf
"x = 1" "%s = %i" f returns 2 .
153
154 Arguments r1 to rN are user-defined input functions that read the argu‐
155 ment corresponding to the %r conversions specified in the format
156 string.
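
The call described above can be sketched as follows, assuming the input
is written with whitespace around the = sign so that %s stops before it
(the names f, a and b are illustrative):

```ocaml
(* Receiver: ignores the string read by %s and increments the integer. *)
let f _name i = i + 1

(* %s reads "x", the literal = is matched, %i reads 1. *)
let a = Scanf.sscanf "x = 1" "%s = %i" f

(* %i also understands the 0x prefix, so this reads 16. *)
let b = Scanf.sscanf "y = 0x10" "%s = %i" f
```

The number of parameters of f follows the number of conversions in the
format string: one for %s and one for %i.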
157
158 Format string description
159 The format string is a character string which contains three types of
160 objects:
161
162 -plain characters, which are simply matched with the characters of the
163 input (with a special case for space and line feed, see Scanf.space ),
164
165 -conversion specifications, each of which causes reading and conversion
166 of one argument for the function f (see Scanf.conversion ),
167
-scanning indications to specify boundaries of tokens (see
Scanf.indication).
170
171
172 The space character in format strings
173 As mentioned above, a plain character in the format string is just
174 matched with the next character of the input; however, two characters
175 are special exceptions to this rule: the space character ( ' ' or ASCII
176 code 32) and the line feed character ( '\n' or ASCII code 10). A space
177 does not match a single space character, but any amount of 'whitespace'
178 in the input. More precisely, a space inside the format string matches
179 any number of tab, space, line feed and carriage return characters.
180 Similarly, a line feed character in the format string matches either a
181 single line feed or a carriage return followed by a line feed.
182
Matching any amount of whitespace, a space in the format string also
matches no amount of whitespace at all; hence, the call
bscanf ib "Price = %d $" (fun p -> p) succeeds and returns 1 when
reading an input with various whitespace in it, such as "Price = 1 $",
"Price  =  1    $", or even "Price=1$".
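
A short sketch of the whitespace rule, assuming the inputs are given as
strings and read with sscanf (the names price, inputs and results are
illustrative):

```ocaml
(* Each space in the format matches zero or more whitespace characters. *)
let price s = Scanf.sscanf s "Price = %d $" (fun p -> p)

let inputs =
  [ "Price = 1 $"; "Price  =  1    $"; "Price=1$"; "Price\t=\n1 $" ]

(* Every input is accepted and yields the same integer. *)
let results = List.map price inputs
```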
188
189 Conversion specifications in format strings
Conversion specifications consist of the % character, followed by an
optional flag, an optional field width, and one or two conversion
characters.
193
194 The conversion characters and their meanings are:
195
196
- d : reads an optionally signed decimal integer ( [0-9]+ ).

- i : reads an optionally signed integer (usual input conventions for
decimal ( [0-9]+ ), hexadecimal ( 0x[0-9a-f]+ and 0X[0-9A-F]+ ), octal (
0o[0-7]+ ), and binary ( 0b[0-1]+ ) notations are understood).
202
203 - u : reads an unsigned decimal integer.
204
205 - x or X : reads an unsigned hexadecimal integer ( [0-9a-fA-F]+ ).
206
207 - o : reads an unsigned octal integer ( [0-7]+ ).
208
- s : reads a string argument that spreads as much as possible, until
one of the following bounding conditions holds:

-a whitespace has been found (see Scanf.space),

-a scanning indication (see Scanf.indication) has been encountered,

-the end-of-input has been reached.

Hence, this conversion always succeeds: it returns an empty string if
a bounding condition holds when the scan begins.
221
222 - S : reads a delimited string argument (delimiters and special escaped
223 characters follow the lexical conventions of OCaml).
224
- c : reads a single character. To test the current input character
without reading it, specify a null field width, i.e. use specification
%0c. Raises Invalid_argument if the field width specification is greater
than 1.
229
230 - C : reads a single delimited character (delimiters and special
231 escaped characters follow the lexical conventions of OCaml).
232
- f , e , E , g , G : reads an optionally signed floating-point number
in decimal notation, in the style dddd.ddd e/E+-dd .
236
237 - h , H : reads an optionally signed floating-point number in hexadeci‐
238 mal notation.
239
240 - F : reads a floating point number according to the lexical conven‐
241 tions of OCaml (hence the decimal point is mandatory if the exponent
242 part is not mentioned).
243
244 - B : reads a boolean argument ( true or false ).
245
246 - b : reads a boolean argument (for backward compatibility; do not use
247 in new programs).
248
249 - ld , li , lu , lx , lX , lo : reads an int32 argument to the format
250 specified by the second letter for regular integers.
251
252 - nd , ni , nu , nx , nX , no : reads a nativeint argument to the for‐
253 mat specified by the second letter for regular integers.
254
255 - Ld , Li , Lu , Lx , LX , Lo : reads an int64 argument to the format
256 specified by the second letter for regular integers.
257
- [ range ] : reads characters that match one of the characters
mentioned in the range of characters range (or not mentioned in it, if
the range starts with ^ ). Reads a string that can be empty, if the next
input character does not match the range. The set of characters from c1
to c2 (inclusively) is denoted by c1-c2 . Hence, %[0-9] returns a string
representing a decimal number or an empty string if no decimal digit is
found; similarly, %[0-9a-f] returns a string of hexadecimal digits. If a
closing bracket appears in a range, it must occur as the first character
of the range (or just after the ^ in case of range negation); hence []]
matches a ] character and [^]] matches any character that is not ] . Use
%% and %@ to include a % or a @ in a range.
269
270 - r : user-defined reader. Takes the next ri formatted input function
271 and applies it to the scanning buffer ib to read the next argument. The
272 input function ri must therefore have type Scanning.in_channel -> 'a
273 and the argument read has type 'a .
274
275 - { fmt %} : reads a format string argument. The format string read
276 must have the same type as the format string specification fmt . For
277 instance, "%{ %i %}" reads any format string that can read a value of
278 type int ; hence, if s is the string "fmt:\"number is %u\"" , then
279 Scanf.sscanf s "fmt: %{%i%}" succeeds and returns the format string
280 "number is %u" .
281
282 - ( fmt %) : scanning sub-format substitution. Reads a format string
283 rf in the input, then goes on scanning with rf instead of scanning with
284 fmt . The format string rf must have the same type as the format
285 string specification fmt that it replaces. For instance, "%( %i %)"
286 reads any format string that can read a value of type int . The con‐
287 version returns the format string read rf , and then a value read using
288 rf . Hence, if s is the string "\"%4d\"1234.00" , then Scanf.sscanf s
289 "%(%i%)" (fun fmt i -> fmt, i) evaluates to ("%4d", 1234) . This be‐
290 haviour is not mere format substitution, since the conversion returns
291 the format string read as additional argument. If you need pure format
292 substitution, use special flag _ to discard the extraneous argument:
293 conversion %_( fmt %) reads a format string rf and then behaves the
294 same as format string rf . Hence, if s is the string "\"%4d\"1234.00"
295 , then Scanf.sscanf s "%_(%i%)" is simply equivalent to Scanf.sscanf
296 "1234.00" "%4d" .
297
298 - l : returns the number of lines read so far.
299
300 - n : returns the number of characters read so far.
301
302 - N or L : returns the number of tokens read so far.
303
304 - ! : matches the end of input condition.
305
306 - % : matches one % character in the input.
307
308 - @ : matches one @ character in the input.
309
310 - , : does nothing.
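
A few of the conversions listed above, sketched with sscanf on literal
strings. The helper id and the binding names are illustrative; the
inputs are assumptions chosen to show one conversion each:

```ocaml
let id x = x

let dec = Scanf.sscanf "-42" "%d" id            (* signed decimal *)
let auto = Scanf.sscanf "0x1F" "%i" id          (* 0x prefix selects base 16 *)
let hex = Scanf.sscanf "ff" "%x" id             (* unsigned hexadecimal *)
let word = Scanf.sscanf "hello world" "%s" id   (* stops at whitespace *)
let quoted = Scanf.sscanf "\"a b\"" "%S" id     (* delimited OCaml string *)
let ch = Scanf.sscanf "'z'" "%C" id             (* delimited OCaml character *)
let flag = Scanf.sscanf "true" "%B" id          (* boolean *)
let big = Scanf.sscanf "42" "%Ld" id            (* int64 *)

(* A range conversion followed by a number conversion. *)
let ident, num = Scanf.sscanf "abc123" "%[a-z]%d" (fun s n -> (s, n))

(* %! requires that the end of input follows the integer. *)
let at_end = Scanf.sscanf "7" "%d%!" id
```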
311
312 Following the % character that introduces a conversion, there may be
313 the special flag _ : the conversion that follows occurs as usual, but
314 the resulting value is discarded. For instance, if f is the function
315 fun i -> i + 1 , and s is the string "x = 1" , then Scanf.sscanf s "%_s
316 = %i" f returns 2 .
317
318 The field width is composed of an optional integer literal indicating
319 the maximal width of the token to read. For instance, %6d reads an
320 integer, having at most 6 decimal digits; %4f reads a float with at
321 most 4 characters; and %8[\000-\255] returns the next 8 characters (or
322 all the characters still available, if fewer than 8 characters are
323 available in the input).
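
The _ flag and the field width can be sketched as follows, assuming
literal string inputs (all binding names are illustrative):

```ocaml
(* %_s reads a string but passes nothing to the receiver. *)
let r = Scanf.sscanf "x = 1" "%_s = %i" (fun i -> i + 1)

(* Field widths split a run of digits into fixed-size tokens. *)
let year, month, day =
  Scanf.sscanf "20200901" "%4d%2d%2d" (fun y m d -> (y, m, d))

(* %3s reads at most three characters. *)
let prefix = Scanf.sscanf "abcdef" "%3s" (fun s -> s)
```

Because %_s discards its value, the receiver in the first call takes a
single argument even though the format contains two conversions.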
324
325 Notes:
326
327
328 -as mentioned above, a %s conversion always succeeds, even if there is
329 nothing to read in the input: in this case, it simply returns "" .
330
331
-in addition to the relevant digits, '_' characters may appear inside
numbers (this is reminiscent of the usual OCaml lexical conventions). If
stricter scanning is desired, use the range conversion facility instead
of the number conversions.


-the scanf facility is not intended for heavy duty lexical analysis and
parsing. If it appears not expressive enough for your needs, several
alternatives exist: regular expressions (module Str ), stream parsers,
ocamllex -generated lexers, ocamlyacc -generated parsers.
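
The note on '_' characters can be illustrated with a small sketch,
assuming the input "1_000" (the names lenient and strict are
illustrative):

```ocaml
(* The number conversion skips the underscore inside the digits. *)
let lenient = Scanf.sscanf "1_000" "%d" (fun n -> n)

(* The range conversion only accepts digits, so it stops at the '_'. *)
let strict = Scanf.sscanf "1_000" "%[0-9]" (fun s -> s)
```

With the range conversion the caller receives a string and decides how
to convert or reject it.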
342
343
344 Scanning indications in format strings
345 Scanning indications appear just after the string conversions %s and %[
346 range ] to delimit the end of the token. A scanning indication is
347 introduced by a @ character, followed by some plain character c . It
348 means that the string token should end just before the next matching c
349 (which is skipped). If no c character is encountered, the string token
350 spreads as much as possible. For instance, "%s@\t" reads a string up to
351 the next tab character or to the end of input. If a @ character appears
352 anywhere else in the format string, it is treated as a plain character.
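
A minimal sketch of scanning indications, assuming colon-separated and
comma-separated inputs (the inputs and binding names are illustrative):

```ocaml
(* %s@: ends the first token just before the next ':' and skips it. *)
let key, value = Scanf.sscanf "name:Ada" "%s@:%s" (fun k v -> (k, v))

(* Two comma-delimited strings followed by an integer. *)
let first, last, year =
  Scanf.sscanf "Ada,Lovelace,1815" "%s@,%s@,%d" (fun a b y -> (a, b, y))
```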
353
Notes:
355
356
357 -As usual in format strings, % and @ characters must be escaped using
358 %% and %@ ; this rule still holds within range specifications and scan‐
359 ning indications. For instance, format "%s@%%" reads a string up to
360 the next % character, and format "%s@%@" reads a string up to the next
361 @ .
362
363 -The scanning indications introduce slight differences in the syntax of
364 Scanf format strings, compared to those used for the Printf module.
365 However, the scanning indications are similar to those used in the For‐
366 mat module; hence, when producing formatted text to be scanned by
367 Scanf.bscanf , it is wise to use printing functions from the Format
368 module (or, if you need to use functions from Printf , banish or care‐
369 fully double check the format strings that contain '@' characters).
370
371
372 Exceptions during scanning
Scanners may raise the following exceptions when the input cannot be
read according to the format string:


-Scanf.Scan_failure if the input does not match the format.


-Failure if a conversion to a number is not possible.


-End_of_file if the end of input is encountered while some more
characters are needed to read the current conversion specification.


-Invalid_argument if the format string is invalid.
388
389 Note:
390
391
392 -as a consequence, scanning a %s conversion never raises exception
393 End_of_file : if the end of input is reached the conversion succeeds
394 and simply returns the characters read so far, or "" if none were ever
395 read.
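
One possible way to handle these exceptions is to turn them into a
result value. This is only a sketch: the function parse_int and its
error messages are illustrative and not part of the Scanf module.

```ocaml
(* Wraps a scan so that the documented exceptions become Error values. *)
let parse_int s =
  try Ok (Scanf.sscanf s "%d" (fun n -> n)) with
  | Scanf.Scan_failure msg -> Error ("input does not match: " ^ msg)
  | Failure msg -> Error ("not a number: " ^ msg)
  | End_of_file -> Error "input ended too early"

let ok = parse_int "42"
let mismatch = parse_int "abc"   (* no decimal digit to read *)
let empty = parse_int ""         (* nothing to read at all *)

(* A %s conversion succeeds even on empty input. *)
let empty_string = Scanf.sscanf "" "%s" (fun s -> s)
```

Invalid_argument is deliberately not caught here, since it signals a
problem in the format string rather than in the input.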
396
397
398 Specialised formatted input functions
399 val sscanf : string -> ('a, 'b, 'c, 'd) scanner
400
401 Same as Scanf.bscanf , but reads from the given string.
402
403
404
405 val scanf : ('a, 'b, 'c, 'd) scanner
406
407 Same as Scanf.bscanf , but reads from the predefined formatted input
408 channel Scanf.Scanning.stdin that is connected to stdin .
409
410
411
412 val kscanf : Scanning.in_channel -> (Scanning.in_channel -> exn -> 'd)
413 -> ('a, 'b, 'c, 'd) scanner
414
415 Same as Scanf.bscanf , but takes an additional function argument ef
416 that is called in case of error: if the scanning process or some con‐
417 version fails, the scanning function aborts and calls the error han‐
418 dling function ef with the formatted input channel and the exception
419 that aborted the scanning process as arguments.
420
421
422
423 val ksscanf : string -> (Scanning.in_channel -> exn -> 'd) -> ('a, 'b,
424 'c, 'd) scanner
425
426 Same as Scanf.kscanf but reads from the given string.
427
428
429 Since 4.02.0
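
A minimal sketch of an error handler passed to Scanf.ksscanf, assuming
that a fallback value is an acceptable reaction to a failed scan (the
function int_or_default is an illustrative name):

```ocaml
(* The handler ignores the channel and the exception and returns default. *)
let int_or_default ~default s =
  Scanf.ksscanf s (fun _ib _exn -> default) "%d" (fun n -> n)

let good = int_or_default ~default:0 "12"
let bad = int_or_default ~default:0 "oops"
```

The handler and the receiver must return the same type, since either
one may produce the result of the call.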
430
431
432
433
434 Reading format strings from input
435 val bscanf_format : Scanning.in_channel -> ('a, 'b, 'c, 'd, 'e, 'f)
436 format6 -> (('a, 'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g
437
438
439 bscanf_format ic fmt f reads a format string token from the formatted
440 input channel ic , according to the given format string fmt , and
441 applies f to the resulting format string value.
442
443
444 Since 3.09.0
445
446
447 Raises Scan_failure if the format string value read does not have the
448 same type as fmt .
449
450
451
452 val sscanf_format : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 -> (('a,
453 'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g
454
455 Same as Scanf.bscanf_format , but reads from the given string.
456
457
458 Since 3.09.0
459
460
461
462 val format_from_string : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 ->
463 ('a, 'b, 'c, 'd, 'e, 'f) format6
464
465
466 format_from_string s fmt converts a string argument to a format string,
467 according to the given format string fmt .
468
469
470 Since 3.10.0
471
472
473 Raises Scan_failure if s , considered as a format string, does not have
474 the same type as fmt .
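
A short sketch of Scanf.format_from_string, assuming the string to
convert is "value: %d" and the reference format is "%d" (the binding
names are illustrative):

```ocaml
(* The string has the same type as "%d", so the conversion succeeds. *)
let fmt = Scanf.format_from_string "value: %d" "%d"

(* The resulting format can be used like a format string literal. *)
let rendered = Printf.sprintf fmt 7

(* "%s" expects a string where "%d" expects an int: the types differ. *)
let rejected =
  try
    ignore (Scanf.format_from_string "%s" "%d");
    false
  with Scanf.Scan_failure _ -> true
```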
475
476
477
478 val unescaped : string -> string
479
480
unescaped s returns a copy of s with escape sequences (according to the
lexical conventions of OCaml) replaced by their corresponding special
characters. More precisely, Scanf.unescaped has the following property:
for all strings s , Scanf.unescaped (String.escaped s) = s .

Always returns a copy of the argument, even if there is no escape
sequence in the argument.
488
489
490 Since 4.00.0
491
492
493 Raises Scan_failure if s is not properly escaped (i.e. s has invalid
494 escape sequences or special characters that are not properly escaped).
495 For instance, Scanf.unescaped "\"" will fail.
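
The property and the failure case can be sketched as follows, assuming
a sample string raw that contains a tab, quotes and a line feed (all
binding names are illustrative):

```ocaml
let raw = "tab:\there \"quoted\"\n"

(* Escaping and then unescaping gives back the original string. *)
let round_trip = Scanf.unescaped (String.escaped raw) = raw

(* The two characters backslash and t become a single tab character. *)
let decoded = Scanf.unescaped "a\\tb"

(* A lone double quote is not properly escaped. *)
let rejected =
  try
    ignore (Scanf.unescaped "\"");
    false
  with Scanf.Scan_failure _ -> true
```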
496
497
498
499
500 Deprecated
501 val fscanf : in_channel -> ('a, 'b, 'c, 'd) scanner
502
503 Deprecated.
504
505 Scanf.fscanf is error prone and deprecated since 4.03.0.
506
507 This function violates the following invariant of the Scanf module: To
508 preserve scanning semantics, all scanning functions defined in Scanf
509 must read from a user defined Scanf.Scanning.in_channel formatted input
510 channel.
511
If you need to read from an in_channel input channel ic , simply define
a Scanf.Scanning.in_channel formatted input channel as in let ib =
Scanning.from_channel ic , then use Scanf.bscanf ib as usual.
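
The replacement suggested above can be sketched as follows. The
temporary file and its contents are assumptions made only so that the
example is self-contained:

```ocaml
(* Create a small input file for the demonstration. *)
let path = Filename.temp_file "scanf_demo" ".txt"

let () =
  let oc = open_out path in
  output_string oc "10 20\n";
  close_out oc

(* Wrap the in_channel once, then scan from the formatted channel. *)
let sum =
  let ic = open_in path in
  let ib = Scanf.Scanning.from_channel ic in
  let r = Scanf.bscanf ib "%d %d" (fun a b -> a + b) in
  close_in ic;
  r

let () = Sys.remove path
```

Keeping a single ib for the lifetime of ic means that successive bscanf
calls continue from where the previous one stopped.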
515
516
517
518 val kfscanf : in_channel -> (Scanning.in_channel -> exn -> 'd) -> ('a,
519 'b, 'c, 'd) scanner
520
521 Deprecated.
522
523 Scanf.kfscanf is error prone and deprecated since 4.03.0.
524
525
526
527
528
OCamldoc 2020-09-01 Scanf(3)