Stdlib.Scanf(3)                  OCaml library                 Stdlib.Scanf(3)

Stdlib.Scanf - no description

Module Stdlib.Scanf

Module Scanf
 : (module Stdlib__Scanf)
14
23 Introduction
24 Functional input with format strings
25 The module Scanf provides formatted input functions or scanners.
26
The formatted input functions can read from any kind of input, including
strings, files, or anything that can return characters. The most general
source of characters is named a formatted input channel (or scanning
buffer) and has type Scanf.Scanning.in_channel . The most general
formatted input function reads from any scanning buffer and is named
bscanf .
33
34 Generally speaking, the formatted input functions have 3 arguments:
35
36 -the first argument is a source of characters for the input,
37
38 -the second argument is a format string that specifies the values to
39 read,
40
41 -the third argument is a receiver function that is applied to the val‐
42 ues read.
43
44 Hence, a typical call to the formatted input function Scanf.bscanf is
45 bscanf ic fmt f , where:
46
47
48 - ic is a source of characters (typically a formatted input channel
49 with type Scanf.Scanning.in_channel ),
50
51
52 - fmt is a format string (the same format strings as those used to
53 print material with module Printf or Format ),
54
55
56 - f is a function that has as many arguments as the number of values to
57 read in the input according to fmt .
58
59
60 A simple example
61 As suggested above, the expression bscanf ic "%d" f reads a decimal in‐
62 teger n from the source of characters ic and returns f n .
63
64 For instance,
65
66
67 -if we use stdin as the source of characters ( Scanf.Scanning.stdin is
68 the predefined formatted input channel that reads from standard input),
69
70
71 -if we define the receiver f as let f x = x + 1 ,
72
then bscanf Scanning.stdin "%d" f reads an integer n from the standard
input and returns f n (that is n + 1 ). Thus, if we evaluate bscanf
Scanning.stdin "%d" f , and then enter 41 at the keyboard, the result
we get is 42 .
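
The example above can be reproduced without a keyboard. The following is a minimal sketch that assumes Scanf.Scanning.from_string as a stand-in for Scanning.stdin, so that the input is fixed; the name result is illustrative only.

```ocaml
(* Receiver: takes the integer that was read and returns its successor. *)
let f x = x + 1

(* Build a formatted input channel from a string instead of reading
   from standard input, then scan one decimal integer from it. *)
let result =
  let ic = Scanf.Scanning.from_string "41" in
  Scanf.bscanf ic "%d" f

let () = Printf.printf "%d\n" result  (* prints 42 *)
```

Replacing the channel ic with Scanf.Scanning.stdin gives the interactive variant described in the text.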
77
78 Formatted input as a functional feature
The OCaml scanning facility is reminiscent of the corresponding C
feature. However, it is also largely different, simpler, and yet more
powerful: the formatted input functions are higher-order functionals,
and values are passed by regular function application rather than by
the variable-assignment mechanism typical of formatted input in
imperative languages; the OCaml format strings also feature useful
additions to easily define complex tokens; as expected within a
functional programming language, the formatted input functions also
support polymorphism, in particular arbitrary interaction with
polymorphic user-defined scanners. Furthermore, the OCaml formatted
input facility is fully type-checked at compile time.
90
91 Unsynchronized accesses
92
93 Unsynchronized accesses to a Scanf.Scanning.in_channel may lead to an
94 invalid Scanf.Scanning.in_channel state. Thus, concurrent accesses to
95 Scanf.Scanning.in_channel s must be synchronized (for instance with a
96 Mutex.t ).
97
98 Formatted input channel
99 module Scanning : sig end
100
106 Type of formatted input functions
107 type ('a, 'b, 'c, 'd) scanner = ('a, Scanning.in_channel, 'b, 'c, 'a ->
108 'd, 'd) format6 -> 'c
109
110
The type of formatted input scanners: ('a, 'b, 'c, 'd) scanner is the
type of a formatted input function that reads from some formatted input
channel according to some format string; more precisely, if scan is
some formatted input function, then scan ic fmt f applies f to all the
arguments specified by format string fmt , when scan has read those
arguments from the Scanf.Scanning.in_channel formatted input channel
ic .
118
119 For instance, the Scanf.scanf function below has type ('a, 'b, 'c, 'd)
120 scanner , since it is a formatted input function that reads from
121 Scanf.Scanning.stdin : scanf fmt f applies f to the arguments specified
122 by fmt , reading those arguments from stdin as expected.
123
124 If the format fmt has some %r indications, the corresponding formatted
125 input functions must be provided before receiver function f . For in‐
126 stance, if read_elem is an input function for values of type t , then
127 bscanf ic "%r;" read_elem f reads a value v of type t followed by a ';'
128 character, and returns f v .
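
As an illustration of the %r mechanism, here is a small sketch in which the record type point and the reader read_point are hypothetical names introduced for this example; they are not part of the library.

```ocaml
(* A hypothetical user-defined type and its input function. *)
type point = { x : int; y : int }

(* read_point has type Scanf.Scanning.in_channel -> point, which is the
   shape required of a reader passed for a %r conversion. *)
let read_point ic = Scanf.bscanf ic "(%d,%d)" (fun x y -> { x; y })

(* The reader is supplied before the receiver function. *)
let p =
  let ic = Scanf.Scanning.from_string "(3,4);" in
  Scanf.bscanf ic "%r;" read_point (fun v -> v)
```

Here the receiver is the identity, so p is the value produced by read_point.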
129
130
131 Since 3.10.0
132
133
134 type ('a, 'b, 'c, 'd) scanner_opt = ('a, Scanning.in_channel, 'b, 'c,
135 'a -> 'd option, 'd) format6 -> 'c
136
137
138
139
140
141 exception Scan_failure of string
142
143
When the input cannot be read according to the format string
specification, formatted input functions typically raise exception
Scan_failure .
147
148
149
150
151 The general formatted input function
152 val bscanf : Scanning.in_channel -> ('a, 'b, 'c, 'd) scanner
153
154
155
156
157
bscanf ic fmt r1 ... rN f reads characters from the
Scanf.Scanning.in_channel formatted input channel ic and converts them
to values according to format string fmt . As a final step, receiver
function f is applied to the values read and gives the result of the
bscanf call.
162
163 For instance, if f is the function fun s i -> i + 1 , then Scanf.sscanf
164 "x = 1" "%s = %i" f returns 2 .
165
Arguments r1 to rN are user-defined input functions that read the
arguments corresponding to the %r conversions specified in the format
string.
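
The sscanf call mentioned above can be written out as follows. This is only a sketch; the names next and binding are illustrative.

```ocaml
(* The receiver gets one argument per conversion: a string for %s and
   an int for %i. The first receiver ignores the name it is given. *)
let next = Scanf.sscanf "x = 1" "%s = %i" (fun _name i -> i + 1)

(* The same format with a receiver that keeps both values. *)
let binding = Scanf.sscanf "x = 1" "%s = %i" (fun name i -> (name, i))
```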
169
170 val bscanf_opt : Scanning.in_channel -> ('a, 'b, 'c, 'd) scanner_opt
171
172 Same as Scanf.bscanf , but returns None in case of scanning failure.
173
174
175 Since 5.0
176
177
178
179
180 Format string description
181 The format string is a character string which contains three types of
182 objects:
183
-plain characters, which are simply matched with the characters of the
input (with a special case for space and line feed, see "The space
character in format strings"),

-conversion specifications, each of which causes reading and conversion
of one argument for the function f (see "Conversion specifications in
format strings"),

-scanning indications to specify boundaries of tokens (see "Scanning
indications in format strings").
192
193
194 The space character in format strings
195 As mentioned above, a plain character in the format string is just
196 matched with the next character of the input; however, two characters
197 are special exceptions to this rule: the space character ( ' ' or ASCII
198 code 32) and the line feed character ( '\n' or ASCII code 10). A space
199 does not match a single space character, but any amount of 'whitespace'
200 in the input. More precisely, a space inside the format string matches
201 any number of tab, space, line feed and carriage return characters.
202 Similarly, a line feed character in the format string matches either a
203 single line feed or a carriage return followed by a line feed.
204
205 Matching any amount of whitespace, a space in the format string also
206 matches no amount of whitespace at all; hence, the call bscanf ib
207 "Price = %d $" (fun p -> p) succeeds and returns 1 when reading an
208 input with various whitespace in it, such as Price = 1 $ , Price = 1
209 $ , or even Price=1$ .
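
The whitespace rule can be checked with sscanf. In this sketch the helper price and the sample inputs are assumptions made for illustration.

```ocaml
(* Each space in the format matches any amount of whitespace in the
   input, including none at all. *)
let price input = Scanf.sscanf input "Price = %d $" (fun p -> p)

let results =
  List.map price
    [ "Price = 1 $";        (* single spaces *)
      "Price  =  1    $";   (* runs of spaces *)
      "Price=1$";           (* no whitespace *)
      "Price\t=\n1 $" ]     (* tab and line feed *)
```

All four inputs are accepted by the same format string.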
210
211 Conversion specifications in format strings
Conversion specifications consist of the % character, followed by an
optional flag, an optional field width, and one or two conversion
characters.
215
216 The conversion characters and their meanings are:
217
218
- d : reads an optionally signed decimal integer ( [0-9]+ ).

- i : reads an optionally signed integer (usual input conventions for
decimal ( [0-9]+ ), hexadecimal ( 0x[0-9a-f]+ and 0X[0-9A-F]+ ), octal
( 0o[0-7]+ ), and binary ( 0b[0-1]+ ) notations are understood).
224
225 - u : reads an unsigned decimal integer.
226
227 - x or X : reads an unsigned hexadecimal integer ( [0-9a-fA-F]+ ).
228
229 - o : reads an unsigned octal integer ( [0-7]+ ).
230
- s : reads a string argument that spreads as much as possible, until
one of the following bounding conditions holds:

-a whitespace has been found (see "The space character in format
strings"),

-a scanning indication (see "Scanning indications in format strings")
has been encountered,

-the end-of-input has been reached.

Hence, this conversion always succeeds: it returns an empty string if
a bounding condition holds when the scan begins.
243
244 - S : reads a delimited string argument (delimiters and special escaped
245 characters follow the lexical conventions of OCaml).
246
- c : reads a single character. To test the current input character
without reading it, specify a null field width, i.e. use specification
%0c . Raises Invalid_argument if the field width specification is
greater than 1.
251
252 - C : reads a single delimited character (delimiters and special es‐
253 caped characters follow the lexical conventions of OCaml).
254
- f , e , E , g , G : reads an optionally signed floating-point number
in decimal notation, in the style dddd.ddd e/E+-dd .
258
259 - h , H : reads an optionally signed floating-point number in hexadeci‐
260 mal notation.
261
262 - F : reads a floating point number according to the lexical conven‐
263 tions of OCaml (hence the decimal point is mandatory if the exponent
264 part is not mentioned).
265
266 - B : reads a boolean argument ( true or false ).
267
268 - b : reads a boolean argument (for backward compatibility; do not use
269 in new programs).
270
271 - ld , li , lu , lx , lX , lo : reads an int32 argument to the format
272 specified by the second letter for regular integers.
273
274 - nd , ni , nu , nx , nX , no : reads a nativeint argument to the for‐
275 mat specified by the second letter for regular integers.
276
277 - Ld , Li , Lu , Lx , LX , Lo : reads an int64 argument to the format
278 specified by the second letter for regular integers.
279
- [ range ] : reads characters that match one of the characters
mentioned in the range of characters range (or not mentioned in it, if
the range starts with ^ ). Reads a string that can be empty, if the
next input character does not match the range. The set of characters
from c1 to c2 (inclusively) is denoted by c1-c2 . Hence, %[0-9] returns
a string representing a decimal number or an empty string if no decimal
digit is found; similarly, %[0-9a-f] returns a string of hexadecimal
digits. If a closing bracket appears in a range, it must occur as the
first character of the range (or just after the ^ in case of range
negation); hence []] matches a ] character and [^]] matches any
character that is not ] . Use %% and %@ to include a % or a @ in a
range.
291
292 - r : user-defined reader. Takes the next ri formatted input function
293 and applies it to the scanning buffer ib to read the next argument. The
294 input function ri must therefore have type Scanning.in_channel -> 'a
295 and the argument read has type 'a .
296
297 - { fmt %} : reads a format string argument. The format string read
298 must have the same type as the format string specification fmt . For
299 instance, "%{ %i %}" reads any format string that can read a value of
300 type int ; hence, if s is the string "fmt:\"number is %u\"" , then
301 Scanf.sscanf s "fmt: %{%i%}" succeeds and returns the format string
302 "number is %u" .
303
304 - ( fmt %) : scanning sub-format substitution. Reads a format string
305 rf in the input, then goes on scanning with rf instead of scanning with
306 fmt . The format string rf must have the same type as the format
307 string specification fmt that it replaces. For instance, "%( %i %)"
308 reads any format string that can read a value of type int . The con‐
309 version returns the format string read rf , and then a value read using
310 rf . Hence, if s is the string "\"%4d\"1234.00" , then Scanf.sscanf s
311 "%(%i%)" (fun fmt i -> fmt, i) evaluates to ("%4d", 1234) . This be‐
312 haviour is not mere format substitution, since the conversion returns
313 the format string read as additional argument. If you need pure format
314 substitution, use special flag _ to discard the extraneous argument:
315 conversion %_( fmt %) reads a format string rf and then behaves the
316 same as format string rf . Hence, if s is the string "\"%4d\"1234.00"
317 , then Scanf.sscanf s "%_(%i%)" is simply equivalent to Scanf.sscanf
318 "1234.00" "%4d" .
319
320 - l : returns the number of lines read so far.
321
322 - n : returns the number of characters read so far.
323
324 - N or L : returns the number of tokens read so far.
325
326 - ! : matches the end of input condition.
327
328 - % : matches one % character in the input.
329
330 - @ : matches one @ character in the input.
331
332 - , : does nothing.
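
A few of the conversions listed above, exercised through sscanf. This is a sketch only: the helper id and the sample inputs are assumptions chosen for illustration.

```ocaml
(* Identity receiver: returns the value read by a single conversion. *)
let id x = x

let dec = Scanf.sscanf "-42" "%d" id            (* signed decimal *)
let hex_i = Scanf.sscanf "0x1F" "%i" id         (* %i accepts the 0x prefix *)
let hex_x = Scanf.sscanf "ff" "%x" id           (* %x takes bare hex digits *)
let flt = Scanf.sscanf "3.5" "%f" id            (* decimal float *)
let boolean = Scanf.sscanf "true" "%B" id       (* boolean *)
let quoted = Scanf.sscanf "\"a b\"" "%S" id     (* OCaml-delimited string *)
let big = Scanf.sscanf "4294967296" "%Ld" id    (* int64 argument *)

(* A range conversion followed by a number conversion. *)
let ranged = Scanf.sscanf "abc123" "%[a-z]%d" (fun s n -> (s, n))

(* %! succeeds only when the whole input has been consumed. *)
let whole = Scanf.sscanf "7" "%d%!" id
```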
333
334 Following the % character that introduces a conversion, there may be
335 the special flag _ : the conversion that follows occurs as usual, but
336 the resulting value is discarded. For instance, if f is the function
337 fun i -> i + 1 , and s is the string "x = 1" , then Scanf.sscanf s "%_s
338 = %i" f returns 2 .
339
340 The field width is composed of an optional integer literal indicating
341 the maximal width of the token to read. For instance, %6d reads an in‐
342 teger, having at most 6 decimal digits; %4f reads a float with at most
343 4 characters; and %8[\000-\255] returns the next 8 characters (or all
344 the characters still available, if fewer than 8 characters are avail‐
345 able in the input).
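
The _ flag and the field width can be sketched as follows; the names skipped, date and parts, and the sample inputs, are illustrative assumptions.

```ocaml
(* %_s reads a string token and discards it, so the receiver only
   gets the integer. *)
let skipped = Scanf.sscanf "x = 1" "%_s = %i" (fun i -> i + 1)

(* Field widths split a run of digits into fixed-size tokens. *)
let date = Scanf.sscanf "20230720" "%4d%2d%2d" (fun y m d -> (y, m, d))

(* A width on %s bounds the token; the next %s reads the rest. *)
let parts = Scanf.sscanf "abcdefghij" "%3s%s" (fun a b -> (a, b))
```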
346
347 Notes:
348
349
350 -as mentioned above, a %s conversion always succeeds, even if there is
351 nothing to read in the input: in this case, it simply returns "" .
352
353
-in addition to the relevant digits, '_' characters may appear inside
numbers (this is reminiscent of the usual OCaml lexical conventions).
If stricter scanning is desired, use the range conversion facility
instead of the number conversions.


-the scanf facility is not intended for heavy-duty lexical analysis and
parsing. If it is not expressive enough for your needs, several
alternatives exist: regular expressions (module Str ), stream parsers,
ocamllex -generated lexers, ocamlyacc -generated parsers.
364
365
366 Scanning indications in format strings
367 Scanning indications appear just after the string conversions %s and %[
368 range ] to delimit the end of the token. A scanning indication is in‐
369 troduced by a @ character, followed by some plain character c . It
370 means that the string token should end just before the next matching c
371 (which is skipped). If no c character is encountered, the string token
372 spreads as much as possible. For instance, "%s@\t" reads a string up to
373 the next tab character or to the end of input. If a @ character appears
374 anywhere else in the format string, it is treated as a plain character.
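
A short sketch of scanning indications, using ':' and a tab as the delimiting characters. The names pair, cols and solo and the sample inputs are assumptions for illustration.

```ocaml
(* "%s@:" ends the first token just before ':' and skips the ':'. *)
let pair = Scanf.sscanf "name:Ada" "%s@:%s" (fun k v -> (k, v))

(* The same idea with a tab as the delimiting character. *)
let cols = Scanf.sscanf "a\tb" "%s@\t%s" (fun a b -> (a, b))

(* When the delimiter never occurs, the token runs to the end of input. *)
let solo = Scanf.sscanf "solo" "%s@:" (fun s -> s)
```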
375
Notes:
377
378
379 -As usual in format strings, % and @ characters must be escaped using
380 %% and %@ ; this rule still holds within range specifications and scan‐
381 ning indications. For instance, format "%s@%%" reads a string up to
382 the next % character, and format "%s@%@" reads a string up to the next
383 @ .
384
385 -The scanning indications introduce slight differences in the syntax of
386 Scanf format strings, compared to those used for the Printf module.
387 However, the scanning indications are similar to those used in the For‐
388 mat module; hence, when producing formatted text to be scanned by
389 Scanf.bscanf , it is wise to use printing functions from the Format
390 module (or, if you need to use functions from Printf , banish or care‐
391 fully double check the format strings that contain '@' characters).
392
393
394 Exceptions during scanning
395 Scanners may raise the following exceptions when the input cannot be
396 read according to the format string:
397
398
-Scanf.Scan_failure if the input does not match the format.


-Failure if a conversion to a number is not possible.


-End_of_file if the end of input is encountered while some more
characters are needed to read the current conversion specification.


-Invalid_argument if the format string is invalid.
410
411 Note:
412
413
414 -as a consequence, scanning a %s conversion never raises exception
415 End_of_file : if the end of input is reached the conversion succeeds
416 and simply returns the characters read so far, or "" if none were ever
417 read.
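
The following sketch shows one way to observe these exceptions with sscanf. The function classify and its result strings are hypothetical; only the exception names come from the text above.

```ocaml
(* Try to read one decimal integer and report how the scan ended. *)
let classify input =
  match Scanf.sscanf input "%d" (fun n -> n) with
  | n -> Printf.sprintf "ok %d" n
  | exception Scanf.Scan_failure _ -> "scan failure"
  | exception End_of_file -> "end of file"
  | exception Failure _ -> "failure"

(* A %s conversion at the end of input succeeds with the empty string. *)
let empty = Scanf.sscanf "" "%s" (fun s -> s)
```

With this helper, an input that does not start with a digit is reported as a scan failure, while an empty input is reported as end of file because %d still needs characters to read.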
418
419
420 Specialised formatted input functions
421 val sscanf : string -> ('a, 'b, 'c, 'd) scanner
422
423 Same as Scanf.bscanf , but reads from the given string.
424
425
426
427 val sscanf_opt : string -> ('a, 'b, 'c, 'd) scanner_opt
428
429 Same as Scanf.sscanf , but returns None in case of scanning failure.
430
431
432 Since 5.0
433
434
435
436 val scanf : ('a, 'b, 'c, 'd) scanner
437
438 Same as Scanf.bscanf , but reads from the predefined formatted input
439 channel Scanf.Scanning.stdin that is connected to stdin .
440
441
442
443 val scanf_opt : ('a, 'b, 'c, 'd) scanner_opt
444
445 Same as Scanf.scanf , but returns None in case of scanning failure.
446
447
448 Since 5.0
449
450
451
452 val kscanf : Scanning.in_channel -> (Scanning.in_channel -> exn -> 'd)
453 -> ('a, 'b, 'c, 'd) scanner
454
455 Same as Scanf.bscanf , but takes an additional function argument ef
456 that is called in case of error: if the scanning process or some con‐
457 version fails, the scanning function aborts and calls the error han‐
458 dling function ef with the formatted input channel and the exception
459 that aborted the scanning process as arguments.
460
461
462
463 val ksscanf : string -> (Scanning.in_channel -> exn -> 'd) -> ('a, 'b,
464 'c, 'd) scanner
465
466 Same as Scanf.kscanf but reads from the given string.
467
468
469 Since 4.02.0
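
As a sketch of how the error handler argument can be used, the hypothetical function parse_int_opt below turns any scanning error into None . The handler ignores both of its arguments; this is one possible design, not the only one.

```ocaml
(* ksscanf calls the handler instead of raising when scanning fails.
   "%d%!" requires an integer followed by the end of input. *)
let parse_int_opt s =
  Scanf.ksscanf s (fun _ic _exn -> None) "%d%!" (fun n -> Some n)
```

Because the handler and the receiver must return the same type, the receiver wraps its result in Some .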
470
471
472
473
474 Reading format strings from input
475 val bscanf_format : Scanning.in_channel -> ('a, 'b, 'c, 'd, 'e, 'f)
476 format6 -> (('a, 'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g
477
478
479 bscanf_format ic fmt f reads a format string token from the formatted
480 input channel ic , according to the given format string fmt , and ap‐
481 plies f to the resulting format string value.
482
483
484 Since 3.09.0
485
486
487 Raises Scan_failure if the format string value read does not have the
488 same type as fmt .
489
490
491
492 val sscanf_format : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 -> (('a,
493 'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g
494
495 Same as Scanf.bscanf_format , but reads from the given string.
496
497
498 Since 3.09.0
499
500
501
502 val format_from_string : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 ->
503 ('a, 'b, 'c, 'd, 'e, 'f) format6
504
505
506 format_from_string s fmt converts a string argument to a format string,
507 according to the given format string fmt .
508
509
510 Since 3.10.0
511
512
513 Raises Scan_failure if s , considered as a format string, does not have
514 the same type as fmt .
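
A minimal sketch of format_from_string , assuming a template string that is only known at run time. The function render and the templates are hypothetical names chosen for this example.

```ocaml
(* Check a run-time string against the format type of "%d", then use
   the resulting format with Printf.sprintf. *)
let render template n =
  Printf.sprintf (Scanf.format_from_string template "%d") n

let ok = render "value: %d" 7

(* A template whose conversions do not have the type of "%d" is
   rejected with Scan_failure. *)
let rejected =
  match render "value: %s" 7 with
  | _ -> false
  | exception Scanf.Scan_failure _ -> true
```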
515
516
517
518 val unescaped : string -> string
519
520
unescaped s returns a copy of s with escape sequences (according to the
lexical conventions of OCaml) replaced by their corresponding special
characters. More precisely, Scanf.unescaped has the following
property: for all strings s , Scanf.unescaped (String.escaped s) = s .

Always returns a copy of the argument, even if there is no escape
sequence in the argument.
528
529
530 Since 4.00.0
531
532
533 Raises Scan_failure if s is not properly escaped (i.e. s has invalid
534 escape sequences or special characters that are not properly escaped).
535 For instance, Scanf.unescaped "\"" will fail.
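
The behaviour of unescaped can be sketched as follows; the names decoded, round_trip and bad_quote and the sample strings are illustrative assumptions.

```ocaml
(* The literal below contains a backslash followed by 't'; unescaped
   turns that two-character sequence into a real tab character. *)
let decoded = Scanf.unescaped "tab:\\there"

(* The round-trip property stated above, checked on one string. *)
let round_trip =
  let s = "line1\nline2 \"quoted\"" in
  Scanf.unescaped (String.escaped s) = s

(* A lone double quote is not properly escaped, so the call fails. *)
let bad_quote =
  match Scanf.unescaped "\"" with
  | _ -> false
  | exception Scanf.Scan_failure _ -> true
```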
536
537
538
539
540
541OCamldoc 2023-07-20 Stdlib.Scanf(3)