Scanf(3) OCaml library Scanf(3)
2
3
4
6 Scanf - Formatted input functions.
7
9 Module Scanf
10
12 Module Scanf
13 : sig end
14
15
16 Formatted input functions.
17
18
Alert unsynchronized_access. Unsynchronized accesses to
Scanning.in_channel are a programming error.
21
22
23
24
25
26
27
28 Introduction
29 Functional input with format strings
The module Scanf provides formatted input functions, also called
scanners.

The formatted input functions can read from any kind of input,
including strings, files, or anything that can return characters. The
most general source of characters is named a formatted input channel
(or scanning buffer) and has type Scanf.Scanning.in_channel . The most
general formatted input function reads from any scanning buffer and is
named bscanf .
38
39 Generally speaking, the formatted input functions have 3 arguments:
40
41 -the first argument is a source of characters for the input,
42
43 -the second argument is a format string that specifies the values to
44 read,
45
46 -the third argument is a receiver function that is applied to the val‐
47 ues read.
48
49 Hence, a typical call to the formatted input function Scanf.bscanf is
50 bscanf ic fmt f , where:
51
52
53 - ic is a source of characters (typically a formatted input channel
54 with type Scanf.Scanning.in_channel ),
55
56
57 - fmt is a format string (the same format strings as those used to
58 print material with module Printf or Format ),
59
60
61 - f is a function that has as many arguments as the number of values to
62 read in the input according to fmt .
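
The three arguments can be seen in a minimal sketch. It assumes a
string-backed channel built with Scanf.Scanning.from_string as the
source of characters; the input text and the names ic and pair are
chosen only for illustration.

```ocaml
(* Assumption for this sketch: a string-backed channel stands in for an
   arbitrary source of characters. *)
let ic = Scanf.Scanning.from_string "3 apples"

(* The format reads two values (an int and a string), so the receiver
   takes two arguments. *)
let pair = Scanf.bscanf ic "%d %s" (fun count item -> (count, item))
(* pair = (3, "apples") *)
```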
63
64
65 A simple example
66 As suggested above, the expression bscanf ic "%d" f reads a decimal in‐
67 teger n from the source of characters ic and returns f n .
68
69 For instance,
70
71
72 -if we use stdin as the source of characters ( Scanf.Scanning.stdin is
73 the predefined formatted input channel that reads from standard input),
74
75
76 -if we define the receiver f as let f x = x + 1 ,
77
then bscanf Scanning.stdin "%d" f reads an integer n from the standard
input and returns f n (that is n + 1 ). Thus, if we evaluate bscanf
Scanning.stdin "%d" f , and then enter 41 at the keyboard, the result
we get is 42 .
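
The same call can be sketched without keyboard input. As an assumption
of this sketch, a channel created with Scanf.Scanning.from_string
replaces Scanning.stdin so that the example is reproducible.

```ocaml
(* The receiver from the text: applied to the integer that was read. *)
let f x = x + 1

(* Assumption: the string "41" plays the role of the keyboard input. *)
let ic = Scanf.Scanning.from_string "41"

(* Reads the decimal integer 41 from ic and returns f 41. *)
let result = Scanf.bscanf ic "%d" f
(* result = 42 *)
```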
82
83 Formatted input as a functional feature
The OCaml scanning facility is reminiscent of the corresponding C
feature. However, it is also largely different, simpler, and yet more
powerful:

-the formatted input functions are higher-order functionals, and the
parameter passing mechanism is just regular function application, not
the variable-assignment-based mechanism that is typical for formatted
input in imperative languages;

-the OCaml format strings also feature useful additions to easily
define complex tokens;

-as expected within a functional programming language, the formatted
input functions also support polymorphism, in particular arbitrary
interaction with polymorphic user-defined scanners.

Furthermore, the OCaml formatted input facility is fully type-checked
at compile time.
95
96 Unsynchronized accesses
97
Unsynchronized accesses to a Scanf.Scanning.in_channel may lead to an
invalid Scanf.Scanning.in_channel state. Thus, concurrent accesses to
Scanf.Scanning.in_channel values must be synchronized (for instance
with a Mutex.t ).
102
103 Formatted input channel
104 module Scanning : sig end
105
106
107
108
109
110
111 Type of formatted input functions
112 type ('a, 'b, 'c, 'd) scanner = ('a, Scanning.in_channel, 'b, 'c, 'a ->
113 'd, 'd) format6 -> 'c
114
115
The type of formatted input scanners: ('a, 'b, 'c, 'd) scanner is the
type of a formatted input function that reads from some formatted input
channel according to some format string; more precisely, if scan is
some formatted input function, then scan ic fmt f applies f to all the
arguments specified by format string fmt , when scan has read those
arguments from the Scanf.Scanning.in_channel formatted input channel
ic .
123
124 For instance, the Scanf.scanf function below has type ('a, 'b, 'c, 'd)
125 scanner , since it is a formatted input function that reads from
126 Scanf.Scanning.stdin : scanf fmt f applies f to the arguments specified
127 by fmt , reading those arguments from stdin as expected.
128
129 If the format fmt has some %r indications, the corresponding formatted
130 input functions must be provided before receiver function f . For in‐
131 stance, if read_elem is an input function for values of type t , then
132 bscanf ic "%r;" read_elem f reads a value v of type t followed by a ';'
133 character, and returns f v .
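
A minimal sketch of a %r conversion follows. The type t and the reader
read_elem are hypothetical, introduced only to illustrate the
description above; sscanf is used so that the input is a plain string.

```ocaml
(* Hypothetical element type, for illustration only. *)
type t = Id of int

(* read_elem has type Scanf.Scanning.in_channel -> t, as %r requires. *)
let read_elem ic = Scanf.bscanf ic "%d" (fun n -> Id n)

(* The reader is passed before the receiver; the ';' after %r is a plain
   character that must follow the value in the input. *)
let v = Scanf.sscanf "21;" "%r;" read_elem (fun v -> v)
(* v = Id 21 *)
```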
134
135
136 Since 3.10.0
137
138
139 type ('a, 'b, 'c, 'd) scanner_opt = ('a, Scanning.in_channel, 'b, 'c,
140 'a -> 'd option, 'd) format6 -> 'c
141
142
143
144
145
146 exception Scan_failure of string
147
148
When the input cannot be read according to the format string
specification, formatted input functions typically raise exception
Scan_failure .
152
153
154
155
156 The general formatted input function
157 val bscanf : Scanning.in_channel -> ('a, 'b, 'c, 'd) scanner
158
159
160
161
162
bscanf ic fmt r1 ... rN f reads characters from the
Scanf.Scanning.in_channel formatted input channel ic and converts them
to values according to format string fmt . As a final step, receiver
function f is applied to the values read and gives the result of the
bscanf call.
167
168 For instance, if f is the function fun s i -> i + 1 , then Scanf.sscanf
169 "x = 1" "%s = %i" f returns 2 .
170
171 Arguments r1 to rN are user-defined input functions that read the argu‐
172 ment corresponding to the %r conversions specified in the format
173 string.
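
The sscanf example above, written out as a small sketch (the unused
string argument is named _s here only to avoid an unused-variable
warning):

```ocaml
(* Receiver with one argument per conversion: the string read by %s is
   ignored, the integer read by %i is incremented. *)
let f _s i = i + 1

let result = Scanf.sscanf "x = 1" "%s = %i" f
(* result = 2 *)
```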
174
175 val bscanf_opt : Scanning.in_channel -> ('a, 'b, 'c, 'd) scanner_opt
176
177 Same as Scanf.bscanf , but returns None in case of scanning failure.
178
179
180 Since 5.0
181
182
183
184
185 Format string description
186 The format string is a character string which contains three types of
187 objects:
188
-plain characters, which are simply matched with the characters of the
input (with a special case for space and line feed, see Scanf.space ),

-conversion specifications, each of which causes reading and conversion
of one argument for the function f (see Scanf.conversion ),

-scanning indications to specify boundaries of tokens (see
Scanf.indication ).
197
198
199 The space character in format strings
200 As mentioned above, a plain character in the format string is just
201 matched with the next character of the input; however, two characters
202 are special exceptions to this rule: the space character ( ' ' or ASCII
203 code 32) and the line feed character ( '\n' or ASCII code 10). A space
204 does not match a single space character, but any amount of 'whitespace'
205 in the input. More precisely, a space inside the format string matches
206 any number of tab, space, line feed and carriage return characters.
207 Similarly, a line feed character in the format string matches either a
208 single line feed or a carriage return followed by a line feed.
209
Because it matches any amount of whitespace, a space in the format
string also matches no whitespace at all; hence, the call bscanf ib
"Price = %d $" (fun p -> p) succeeds and returns 1 when reading an
input with various whitespace in it, such as Price = 1 $ ,
Price  =  1    $ , or even Price=1$ .
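
A short sketch of this whitespace rule, using sscanf on three
illustrative inputs (the helper name price is an assumption of the
sketch):

```ocaml
(* Each space in the format matches zero or more whitespace characters. *)
let price s = Scanf.sscanf s "Price = %d $" (fun p -> p)

let results =
  List.map price [ "Price = 1 $"; "Price  =  1    $"; "Price=1$" ]
(* results = [1; 1; 1] *)
```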
215
216 Conversion specifications in format strings
Conversion specifications consist of the % character, followed by an
optional flag, an optional field width, and one or two conversion
characters.
220
221 The conversion characters and their meanings are:
222
223
- d : reads an optionally signed decimal integer ( [0-9]+ ).

- i : reads an optionally signed integer (usual input conventions for
decimal ( [0-9]+ ), hexadecimal ( 0x[0-9a-f]+ and 0X[0-9A-F]+ ), octal
( 0o[0-7]+ ), and binary ( 0b[0-1]+ ) notations are understood).
229
230 - u : reads an unsigned decimal integer.
231
232 - x or X : reads an unsigned hexadecimal integer ( [0-9a-fA-F]+ ).
233
234 - o : reads an unsigned octal integer ( [0-7]+ ).
235
- s : reads a string argument that spreads as much as possible, until
one of the following bounding conditions holds:

-a whitespace has been found (see Scanf.space ),

-a scanning indication (see Scanf.indication ) has been encountered,

-the end-of-input has been reached.

Hence, this conversion always succeeds: it returns an empty string if
a bounding condition holds when the scan begins.
248
249 - S : reads a delimited string argument (delimiters and special escaped
250 characters follow the lexical conventions of OCaml).
251
- c : reads a single character. To test the current input character
without reading it, specify a null field width, i.e. use specification
%0c . Raises Invalid_argument if the field width specification is
greater than 1.
256
257 - C : reads a single delimited character (delimiters and special es‐
258 caped characters follow the lexical conventions of OCaml).
259
- f , e , E , g , G : reads an optionally signed floating-point number
in decimal notation, in the style dddd.ddd e/E+-dd .
263
264 - h , H : reads an optionally signed floating-point number in hexadeci‐
265 mal notation.
266
267 - F : reads a floating point number according to the lexical conven‐
268 tions of OCaml (hence the decimal point is mandatory if the exponent
269 part is not mentioned).
270
271 - B : reads a boolean argument ( true or false ).
272
273 - b : reads a boolean argument (for backward compatibility; do not use
274 in new programs).
275
- ld , li , lu , lx , lX , lo : reads an int32 argument in the format
specified by the second letter, as for regular integers.

- nd , ni , nu , nx , nX , no : reads a nativeint argument in the
format specified by the second letter, as for regular integers.

- Ld , Li , Lu , Lx , LX , Lo : reads an int64 argument in the format
specified by the second letter, as for regular integers.
284
- [ range ] : reads characters that match one of the characters
mentioned in the range of characters range (or not mentioned in it, if
the range starts with ^ ). Reads a string that can be empty, if the
next input character does not match the range. The set of characters
from c1 to c2 (inclusively) is denoted by c1-c2 . Hence, %[0-9] returns
a string representing a decimal number or an empty string if no decimal
digit is found; similarly, %[0-9a-f] returns a string of hexadecimal
digits. If a closing bracket appears in a range, it must occur as the
first character of the range (or just after the ^ in case of range
negation); hence []] matches a ] character and [^]] matches any
character that is not ] . Use %% and %@ to include a % or a @ in a
range.
296
297 - r : user-defined reader. Takes the next ri formatted input function
298 and applies it to the scanning buffer ib to read the next argument. The
299 input function ri must therefore have type Scanning.in_channel -> 'a
300 and the argument read has type 'a .
301
302 - { fmt %} : reads a format string argument. The format string read
303 must have the same type as the format string specification fmt . For
304 instance, "%{ %i %}" reads any format string that can read a value of
305 type int ; hence, if s is the string "fmt:\"number is %u\"" , then
306 Scanf.sscanf s "fmt: %{%i%}" succeeds and returns the format string
307 "number is %u" .
308
309 - ( fmt %) : scanning sub-format substitution. Reads a format string
310 rf in the input, then goes on scanning with rf instead of scanning with
311 fmt . The format string rf must have the same type as the format
312 string specification fmt that it replaces. For instance, "%( %i %)"
313 reads any format string that can read a value of type int . The con‐
314 version returns the format string read rf , and then a value read using
315 rf . Hence, if s is the string "\"%4d\"1234.00" , then Scanf.sscanf s
316 "%(%i%)" (fun fmt i -> fmt, i) evaluates to ("%4d", 1234) . This be‐
317 haviour is not mere format substitution, since the conversion returns
318 the format string read as additional argument. If you need pure format
319 substitution, use special flag _ to discard the extraneous argument:
320 conversion %_( fmt %) reads a format string rf and then behaves the
321 same as format string rf . Hence, if s is the string "\"%4d\"1234.00"
322 , then Scanf.sscanf s "%_(%i%)" is simply equivalent to Scanf.sscanf
323 "1234.00" "%4d" .
324
325 - l : returns the number of lines read so far.
326
327 - n : returns the number of characters read so far.
328
329 - N or L : returns the number of tokens read so far.
330
331 - ! : matches the end of input condition.
332
333 - % : matches one % character in the input.
334
335 - @ : matches one @ character in the input.
336
337 - , : does nothing.
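
The following sketch exercises a few of the conversions listed above
through sscanf. The inputs and the helper id are chosen for
illustration only; the comments give the values the descriptions above
lead one to expect.

```ocaml
let id x = x

let dec    = Scanf.sscanf "-42" "%d" id            (* -42 *)
let hex_i  = Scanf.sscanf "0x1F" "%i" id           (* 31: %i accepts 0x *)
let bin_i  = Scanf.sscanf "0b101" "%i" id          (* 5: %i accepts 0b *)
let hex    = Scanf.sscanf "ff" "%x" id             (* 255 *)
let oct    = Scanf.sscanf "17" "%o" id             (* 15 *)
let word   = Scanf.sscanf "hello world" "%s" id    (* "hello" *)
let quoted = Scanf.sscanf "\"a b\\n\"" "%S" id     (* a, space, b, newline *)
let ch     = Scanf.sscanf "xyz" "%c" id            (* 'x' *)
let flag   = Scanf.sscanf "true" "%B" id           (* true *)
let fl     = Scanf.sscanf "3.5e2" "%f" id          (* 350. *)
let big    = Scanf.sscanf "9000000000" "%Ld" id    (* 9000000000L *)
let digits = Scanf.sscanf "123abc" "%[0-9]" id     (* "123" *)
```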
338
339 Following the % character that introduces a conversion, there may be
340 the special flag _ : the conversion that follows occurs as usual, but
341 the resulting value is discarded. For instance, if f is the function
342 fun i -> i + 1 , and s is the string "x = 1" , then Scanf.sscanf s "%_s
343 = %i" f returns 2 .
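
The _ flag example above as a runnable sketch:

```ocaml
(* %_s reads a string token but passes nothing to the receiver, so f
   takes only the integer read by %i. *)
let f i = i + 1
let s = "x = 1"

let result = Scanf.sscanf s "%_s = %i" f
(* result = 2 *)
```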
344
345 The field width is composed of an optional integer literal indicating
346 the maximal width of the token to read. For instance, %6d reads an in‐
347 teger, having at most 6 decimal digits; %4f reads a float with at most
348 4 characters; and %8[\000-\255] returns the next 8 characters (or all
349 the characters still available, if fewer than 8 characters are avail‐
350 able in the input).
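
A small sketch of field widths, with inputs chosen for illustration:
the width caps the token, and the next conversion continues where the
previous one stopped.

```ocaml
(* %6d consumes at most 6 digits; the following %d reads the rest. *)
let split = Scanf.sscanf "1234567" "%6d%d" (fun a b -> (a, b))
(* split = (123456, 7) *)

(* %3s consumes at most 3 characters of the string token. *)
let parts = Scanf.sscanf "abcdef" "%3s%s" (fun a b -> (a, b))
(* parts = ("abc", "def") *)
```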
351
352 Notes:
353
354
355 -as mentioned above, a %s conversion always succeeds, even if there is
356 nothing to read in the input: in this case, it simply returns "" .
357
358
-in addition to the relevant digits, '_' characters may appear inside
numbers (this is reminiscent of the usual OCaml lexical conventions).
If stricter scanning is desired, use the range conversion facility
instead of the number conversions.


-the scanf facility is not intended for heavy-duty lexical analysis and
parsing. If it is not expressive enough for your needs, several
alternatives exist: regular expressions (module Str ), stream parsers,
ocamllex -generated lexers, ocamlyacc -generated parsers.
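
The first two notes can be sketched as follows; the input strings are
illustrative.

```ocaml
(* %s succeeds on empty input and returns the empty string. *)
let empty = Scanf.sscanf "" "%s" (fun s -> s)
(* empty = "" *)

(* Number conversions skip '_' characters inside the digits. *)
let lenient = Scanf.sscanf "1_000" "%d" (fun n -> n)
(* lenient = 1000 *)

(* A range conversion is stricter: it stops at the first non-digit. *)
let strict = Scanf.sscanf "1_000" "%[0-9]" (fun s -> s)
(* strict = "1" *)
```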
369
370
371 Scanning indications in format strings
372 Scanning indications appear just after the string conversions %s and %[
373 range ] to delimit the end of the token. A scanning indication is in‐
374 troduced by a @ character, followed by some plain character c . It
375 means that the string token should end just before the next matching c
376 (which is skipped). If no c character is encountered, the string token
377 spreads as much as possible. For instance, "%s@\t" reads a string up to
378 the next tab character or to the end of input. If a @ character appears
379 anywhere else in the format string, it is treated as a plain character.
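
A minimal sketch of scanning indications, with illustrative inputs.
Note that once an indication is given, whitespace no longer ends the
%s token; only the indicated character (or the end of input) does.

```ocaml
(* "%s@:" reads up to the next ':' and skips it. *)
let kv = Scanf.sscanf "first name:Ada" "%s@:%s" (fun k v -> (k, v))
(* kv = ("first name", "Ada") *)

(* "%s@\t" reads up to the next tab character. *)
let upto_tab = Scanf.sscanf "a b\tc" "%s@\t" (fun s -> s)
(* upto_tab = "a b" *)
```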
380
381 Note:
382
383
384 -As usual in format strings, % and @ characters must be escaped using
385 %% and %@ ; this rule still holds within range specifications and scan‐
386 ning indications. For instance, format "%s@%%" reads a string up to
387 the next % character, and format "%s@%@" reads a string up to the next
388 @ .
389
390 -The scanning indications introduce slight differences in the syntax of
391 Scanf format strings, compared to those used for the Printf module.
392 However, the scanning indications are similar to those used in the For‐
393 mat module; hence, when producing formatted text to be scanned by
394 Scanf.bscanf , it is wise to use printing functions from the Format
395 module (or, if you need to use functions from Printf , banish or care‐
396 fully double check the format strings that contain '@' characters).
397
398
399 Exceptions during scanning
400 Scanners may raise the following exceptions when the input cannot be
401 read according to the format string:
402
403
-Scanf.Scan_failure if the input does not match the format.

-Failure if a conversion to a number is not possible.

-End_of_file if the end of input is encountered while some more
characters are needed to read the current conversion specification.

-Invalid_argument if the format string is invalid.
415
416 Note:
417
418
419 -as a consequence, scanning a %s conversion never raises exception
420 End_of_file : if the end of input is reached the conversion succeeds
421 and simply returns the characters read so far, or "" if none were ever
422 read.
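
A sketch of how a caller might distinguish two of these exceptions.
The helper classify and its result strings are hypothetical, chosen
only for illustration.

```ocaml
(* Hypothetical helper: report how scanning one integer ended. *)
let classify s =
  match Scanf.sscanf s "%d" (fun n -> n) with
  | n -> Printf.sprintf "read %d" n
  | exception Scanf.Scan_failure _ -> "Scan_failure"
  | exception End_of_file -> "End_of_file"

let ok = classify "12"        (* "read 12" *)
let mismatch = classify "abc" (* "Scan_failure": 'a' is not a digit *)
let eof = classify ""         (* "End_of_file": %d needs a character *)
```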
423
424
425 Specialised formatted input functions
426 val sscanf : string -> ('a, 'b, 'c, 'd) scanner
427
428 Same as Scanf.bscanf , but reads from the given string.
429
430
431
432 val sscanf_opt : string -> ('a, 'b, 'c, 'd) scanner_opt
433
434 Same as Scanf.sscanf , but returns None in case of scanning failure.
435
436
437 Since 5.0
438
439
440
441 val scanf : ('a, 'b, 'c, 'd) scanner
442
443 Same as Scanf.bscanf , but reads from the predefined formatted input
444 channel Scanf.Scanning.stdin that is connected to stdin .
445
446
447
448 val scanf_opt : ('a, 'b, 'c, 'd) scanner_opt
449
450 Same as Scanf.scanf , but returns None in case of scanning failure.
451
452
453 Since 5.0
454
455
456
457 val kscanf : Scanning.in_channel -> (Scanning.in_channel -> exn -> 'd)
458 -> ('a, 'b, 'c, 'd) scanner
459
460 Same as Scanf.bscanf , but takes an additional function argument ef
461 that is called in case of error: if the scanning process or some con‐
462 version fails, the scanning function aborts and calls the error han‐
463 dling function ef with the formatted input channel and the exception
464 that aborted the scanning process as arguments.
465
466
467
468 val ksscanf : string -> (Scanning.in_channel -> exn -> 'd) -> ('a, 'b,
469 'c, 'd) scanner
470
471 Same as Scanf.kscanf but reads from the given string.
472
473
474 Since 4.02.0
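
A minimal sketch of the error-handler variant, using ksscanf. The
helper int_or_default is hypothetical; its handler ignores both the
channel and the exception and simply returns a default value.

```ocaml
(* Hypothetical helper: fall back to a default instead of raising. *)
let int_or_default ~default s =
  Scanf.ksscanf s (fun _ic _exn -> default) "%d" (fun n -> n)

let seven = int_or_default ~default:0 "7"        (* 7 *)
let fallback = int_or_default ~default:0 "seven" (* 0 *)
```

The handler must return the same type as the receiver, since either one
may produce the result of the call.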
475
476
477
478
479 Reading format strings from input
480 val bscanf_format : Scanning.in_channel -> ('a, 'b, 'c, 'd, 'e, 'f)
481 format6 -> (('a, 'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g
482
483
484 bscanf_format ic fmt f reads a format string token from the formatted
485 input channel ic , according to the given format string fmt , and ap‐
486 plies f to the resulting format string value.
487
488
489 Since 3.09.0
490
491
492 Raises Scan_failure if the format string value read does not have the
493 same type as fmt .
494
495
496
497 val sscanf_format : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 -> (('a,
498 'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g
499
500 Same as Scanf.bscanf_format , but reads from the given string.
501
502
503 Since 3.09.0
504
505
506
507 val format_from_string : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 ->
508 ('a, 'b, 'c, 'd, 'e, 'f) format6
509
510
511 format_from_string s fmt converts a string argument to a format string,
512 according to the given format string fmt .
513
514
515 Since 3.10.0
516
517
518 Raises Scan_failure if s , considered as a format string, does not have
519 the same type as fmt .
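
A sketch of format_from_string used to check a format supplied at run
time. The helper render is hypothetical; the literal "%d" serves only
as a type witness for the expected format.

```ocaml
(* Hypothetical helper: accept a template only if it has the same type
   as "%d", then print an integer with it. *)
let render (template : string) (n : int) : string =
  let fmt = Scanf.format_from_string template "%d" in
  Printf.sprintf fmt n

let shown = render "value: %d" 3
(* shown = "value: 3" *)

(* A template of another type, such as "%s", is rejected. *)
let rejected =
  match render "%s" 3 with
  | _ -> false
  | exception Scanf.Scan_failure _ -> true
```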
520
521
522
523 val unescaped : string -> string
524
525
unescaped s returns a copy of s with escape sequences (according to the
lexical conventions of OCaml) replaced by their corresponding special
characters. More precisely, Scanf.unescaped has the following property:
for every string s , Scanf.unescaped (String.escaped s) = s .

Always returns a copy of the argument, even if there is no escape
sequence in the argument.
533
534
535 Since 4.00.0
536
537
538 Raises Scan_failure if s is not properly escaped (i.e. s has invalid
539 escape sequences or special characters that are not properly escaped).
540 For instance, Scanf.unescaped "\"" will fail.
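
The behaviour described above can be sketched as follows; the names
tab, round_trip, and rejects_bare_quote are illustrative.

```ocaml
(* The four characters a \ t b become a, TAB, b. *)
let tab = Scanf.unescaped "a\\tb"

(* The round-trip property stated above. *)
let round_trip s = Scanf.unescaped (String.escaped s) = s

(* A lone double quote is not properly escaped, so scanning fails. *)
let rejects_bare_quote =
  match Scanf.unescaped "\"" with
  | _ -> false
  | exception Scanf.Scan_failure _ -> true
```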
541
542
543
544
545
OCamldoc 2023-07-20 Scanf(3)