Stdlib.Scanf(3)              OCaml library              Stdlib.Scanf(3)

Stdlib.Scanf - formatted input functions

Module Stdlib.Scanf

Module Scanf : (module Stdlib__scanf)

Introduction
Functional input with format strings
The module Scanf provides formatted input functions, or scanners.

The formatted input functions can read from any kind of input, including
strings, files, or anything that can return characters. The most general
source of characters is called a formatted input channel (or scanning
buffer) and has type Scanf.Scanning.in_channel. The most general formatted
input function reads from any scanning buffer and is named bscanf.

Generally speaking, the formatted input functions take three arguments:

- the first argument is a source of characters for the input,

- the second argument is a format string that specifies the values to
read,

- the third argument is a receiver function that is applied to the values
read.

Hence, a typical call to the formatted input function Scanf.bscanf is
bscanf ic fmt f, where:

- ic is a source of characters (typically a formatted input channel with
type Scanf.Scanning.in_channel),

- fmt is a format string (the same format strings as those used to print
material with module Printf or Format),

- f is a function that has as many arguments as the number of values to
read in the input according to fmt.
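As a minimal sketch of this three-argument shape, the call below builds the source ic with Scanf.Scanning.from_string instead of reading a file or the keyboard; the input text and the name total are illustrative only.

```ocaml
(* ic: a formatted input channel that reads from an in-memory string *)
let ic = Scanf.Scanning.from_string "12 34"

(* fmt = "%d %d" describes two integers; the receiver
   fun a b -> a + b takes exactly as many arguments as fmt reads. *)
let total = Scanf.bscanf ic "%d %d" (fun a b -> a + b)
```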

A simple example
As suggested above, the expression bscanf ic "%d" f reads a decimal
integer n from the source of characters ic and returns f n.

For instance,

- if we use stdin as the source of characters (Scanf.Scanning.stdin is
the predefined formatted input channel that reads from standard input),

- if we define the receiver f as let f x = x + 1,

then bscanf Scanning.stdin "%d" f reads an integer n from the standard
input and returns f n (that is, n + 1). Thus, if we evaluate bscanf
Scanning.stdin "%d" f and then enter 41 at the keyboard, the result is
42.
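The same computation can be tried without a keyboard by reading from a string with Scanf.sscanf. This is a sketch: the literal "41" simply stands in for the typed input, and result is an illustrative name.

```ocaml
(* The receiver from the example above *)
let f x = x + 1

(* Scanf.sscanf reads from a string instead of Scanf.Scanning.stdin;
   the interactive form would be Scanf.bscanf Scanf.Scanning.stdin "%d" f. *)
let result = Scanf.sscanf "41" "%d" f
```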

Formatted input as a functional feature
The OCaml scanning facility is reminiscent of the corresponding C feature.
However, it is also largely different: simpler, and yet more powerful. The
formatted input functions are higher-order functionals, and parameters are
passed by regular function application, not by the variable-assignment
mechanism typical of formatted input in imperative languages. OCaml format
strings also offer useful additions for easily defining complex tokens. As
expected in a functional programming language, the formatted input
functions also support polymorphism, in particular arbitrary interaction
with polymorphic user-defined scanners. Furthermore, the OCaml formatted
input facility is fully type-checked at compile time.
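To illustrate the point about parameter passing, this sketch lets the receiver build an ordinary value (a pair) from the scanned arguments instead of writing into variables; parse_person and the sample line are hypothetical. A receiver whose argument types disagree with the format, for example one expecting a string where %d supplies an int, is rejected by the type checker at compile time.

```ocaml
(* The scanned values arrive as ordinary function arguments,
   and the receiver is free to combine them into any result. *)
let parse_person (line : string) : string * int =
  Scanf.sscanf line "%s %d" (fun name age -> (name, age))
```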

Formatted input channel
module Scanning : sig end

Type of formatted input functions
type ('a, 'b, 'c, 'd) scanner = ('a, Scanning.in_channel, 'b, 'c, 'a -> 'd, 'd) format6 -> 'c

The type of formatted input scanners: ('a, 'b, 'c, 'd) scanner is the type
of a formatted input function that reads from some formatted input channel
according to some format string. More precisely, if scan is some formatted
input function, then scan ic fmt f applies f to all the arguments
specified by format string fmt, once scan has read those arguments from
the Scanf.Scanning.in_channel formatted input channel ic.

For instance, the Scanf.scanf function below has type ('a, 'b, 'c, 'd)
scanner, since it is a formatted input function that reads from
Scanf.Scanning.stdin: scanf fmt f applies f to the arguments specified by
fmt, reading those arguments from stdin as expected.

If the format fmt has some %r indications, the corresponding formatted
input functions must be provided before the receiver function f. For
instance, if read_elem is an input function for values of type t, then
bscanf ic "%r;" read_elem f reads a value v of type t followed by a ';'
character, and returns f v.

Since 3.10.0
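A minimal sketch of the %r convention described above, assuming the elements are decimal integers. Here read_elem is a hypothetical reader that itself calls Scanf.bscanf on the buffer it receives. The values v and pair_sum are illustrative.

```ocaml
(* Hypothetical reader: consumes one decimal integer from the buffer. *)
let read_elem (ib : Scanf.Scanning.in_channel) : int =
  Scanf.bscanf ib "%d" (fun n -> n)

(* "%r;" hands the buffer to read_elem, then matches the ';'. *)
let v = Scanf.sscanf "17;" "%r;" read_elem (fun x -> x)

(* Each %r consumes the next reader argument, in order, before the receiver. *)
let pair_sum =
  Scanf.sscanf "(3,4)" "(%r,%r)" read_elem read_elem (fun a b -> a + b)
```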

exception Scan_failure of string

When the input cannot be read according to the format string
specification, formatted input functions typically raise exception
Scan_failure.

The general formatted input function
val bscanf : Scanning.in_channel -> ('a, 'b, 'c, 'd) scanner

bscanf ic fmt r1 ... rN f reads characters from the
Scanf.Scanning.in_channel formatted input channel ic and converts them to
values according to format string fmt. As a final step, receiver function
f is applied to the values read and gives the result of the bscanf call.

For instance, if f is the function fun s i -> i + 1, then Scanf.sscanf
"x = 1" "%s = %i" f returns 2.

Arguments r1 to rN are user-defined input functions that read the
arguments corresponding to the %r conversions specified in the format
string.
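A formatted input channel keeps its position between calls, so bscanf can be called repeatedly on the same channel to consume input piece by piece. The sketch below sums whitespace-separated integers. sum_ints is a hypothetical helper, and Scanf.Scanning.end_of_input is used to detect when the input is exhausted.

```ocaml
(* Hypothetical helper: sum every integer in s by calling bscanf
   repeatedly on a single formatted input channel. *)
let sum_ints (s : string) : int =
  let ib = Scanf.Scanning.from_string s in
  let rec loop acc =
    if Scanf.Scanning.end_of_input ib then acc
    else
      (* The spaces around %d skip any surrounding whitespace. *)
      loop (acc + Scanf.bscanf ib " %d " (fun n -> n))
  in
  loop 0
```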

Format string description
The format string is a character string which contains three types of
objects:

- plain characters, which are simply matched with the characters of the
input (with a special case for space and line feed, see Scanf.space),

- conversion specifications, each of which causes reading and conversion
of one argument for the function f (see Scanf.conversion),

- scanning indications to specify boundaries of tokens (see
Scanf.indication).

The space character in format strings
As mentioned above, a plain character in the format string is just matched
with the next character of the input; however, two characters are special
exceptions to this rule: the space character (' ' or ASCII code 32) and
the line feed character ('\n' or ASCII code 10). A space does not match a
single space character, but any amount of whitespace in the input. More
precisely, a space inside the format string matches any number of tab,
space, line feed and carriage return characters. Similarly, a line feed
character in the format string matches either a single line feed or a
carriage return followed by a line feed.

Because it matches any amount of whitespace, a space in the format string
also matches no whitespace at all; hence, the call bscanf ib
"Price = %d $" (fun p -> p) succeeds and returns 1 when reading an input
with various amounts of whitespace in it, such as "Price = 1 $",
"Price  =  1    $", or even "Price=1$".
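The whitespace rules above can be checked with Scanf.sscanf on a few literal inputs. In this sketch, price is a hypothetical helper wrapping the format from the text, and crlf_sum is an illustrative name.

```ocaml
(* Hypothetical helper around the format string from the text. *)
let price (s : string) : int = Scanf.sscanf s "Price = %d $" (fun p -> p)

(* A line feed in the format accepts "\n" or "\r\n" in the input. *)
let crlf_sum = Scanf.sscanf "1\r\n2" "%d\n%d" (fun a b -> a + b)
```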

Conversion specifications in format strings
Conversion specifications consist of the % character, followed by an
optional flag, an optional field width, and one or two conversion
characters.

The conversion characters and their meanings are:

- d : reads an optionally signed decimal integer ([0-9]+).

- i : reads an optionally signed integer (usual input conventions for
decimal ([0-9]+), hexadecimal (0x[0-9a-f]+ and 0X[0-9A-F]+), octal
(0o[0-7]+), and binary (0b[0-1]+) notations are understood).

- u : reads an unsigned decimal integer.

- x or X : reads an unsigned hexadecimal integer ([0-9a-fA-F]+).

- o : reads an unsigned octal integer ([0-7]+).

- s : reads a string argument that spreads as much as possible, until the
following bounding condition holds:

  - a whitespace has been found (see Scanf.space),

  - a scanning indication (see Scanf.indication) has been encountered,

  - the end-of-input has been reached.

Hence, this conversion always succeeds: it returns an empty string if the
bounding condition holds when the scan begins.

- S : reads a delimited string argument (delimiters and special escaped
characters follow the lexical conventions of OCaml).

- c : reads a single character. To test the current input character
without reading it, specify a null field width, i.e. use specification
%0c. Raises Invalid_argument if the field width specification is greater
than 1.

- C : reads a single delimited character (delimiters and special escaped
characters follow the lexical conventions of OCaml).

- f, e, E, g, G : reads an optionally signed floating-point number in
decimal notation, in the style dddd.ddd e/E+-dd.

- h, H : reads an optionally signed floating-point number in hexadecimal
notation.

- F : reads a floating-point number according to the lexical conventions
of OCaml (hence the decimal point is mandatory if the exponent part is not
mentioned).

- B : reads a boolean argument (true or false).

- b : reads a boolean argument (for backward compatibility; do not use in
new programs).

- ld, li, lu, lx, lX, lo : reads an int32 argument in the format
specified by the second letter for regular integers.

- nd, ni, nu, nx, nX, no : reads a nativeint argument in the format
specified by the second letter for regular integers.

- Ld, Li, Lu, Lx, LX, Lo : reads an int64 argument in the format
specified by the second letter for regular integers.

- [ range ] : reads characters that match one of the characters mentioned
in the range of characters range (or not mentioned in it, if the range
starts with ^). Reads a string that can be empty, if the next input
character does not match the range. The set of characters from c1 to c2
(inclusive) is denoted by c1-c2. Hence, %[0-9] returns a string
representing a decimal number, or an empty string if no decimal digit is
found; similarly, %[0-9a-f] returns a string of hexadecimal digits. If a
closing bracket appears in a range, it must occur as the first character
of the range (or just after the ^ in case of range negation); hence []]
matches a ] character and [^]] matches any character that is not ]. Use
%% and %@ to include a % or a @ in a range.

- r : user-defined reader. Takes the next ri formatted input function and
applies it to the scanning buffer ib to read the next argument. The input
function ri must therefore have type Scanning.in_channel -> 'a, and the
argument read has type 'a.

- { fmt %} : reads a format string argument. The format string read must
have the same type as the format string specification fmt. For instance,
"%{ %i %}" reads any format string that can read a value of type int;
hence, if s is the string "fmt:\"number is %u\"", then Scanf.sscanf s
"fmt: %{%i%}" succeeds and returns the format string "number is %u".

- ( fmt %) : scanning sub-format substitution. Reads a format string rf in
the input, then goes on scanning with rf instead of scanning with fmt. The
format string rf must have the same type as the format string
specification fmt that it replaces. For instance, "%( %i %)" reads any
format string that can read a value of type int. The conversion returns
the format string read rf, and then a value read using rf. Hence, if s is
the string "\"%4d\"1234.00", then Scanf.sscanf s "%(%i%)"
(fun fmt i -> fmt, i) evaluates to ("%4d", 1234). This behaviour is not
mere format substitution, since the conversion returns the format string
read as an additional argument. If you need pure format substitution, use
the special flag _ to discard the extraneous argument: conversion
%_( fmt %) reads a format string rf and then behaves the same as format
string rf. Hence, if s is the string "\"%4d\"1234.00", then Scanf.sscanf s
"%_(%i%)" is simply equivalent to Scanf.sscanf "1234.00" "%4d".

- l : returns the number of lines read so far.

- n : returns the number of characters read so far.

- N or L : returns the number of tokens read so far.

- ! : matches the end of input condition.

- % : matches one % character in the input.

- @ : matches one @ character in the input.

- , : does nothing.
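A few of the conversions above, exercised with Scanf.sscanf. This is a sketch for illustration only: the sample inputs and binding names are made up, and Stdlib's string_of_format is used only to turn the format string read by %{ %} into a printable string (the %{ %} input is the one from the text).

```ocaml
(* %i understands the 0x and 0b prefixes. *)
let from_hex = Scanf.sscanf "0x1f" "%i" (fun n -> n)
let from_bin = Scanf.sscanf "0b101" "%i" (fun n -> n)

(* %S reads an OCaml string literal and decodes its escapes. *)
let quoted = Scanf.sscanf "\"a\\tb\"" "%S" (fun s -> s)

(* %B reads the word true or false. *)
let flag = Scanf.sscanf "true" "%B" (fun b -> b)

(* %Ld reads an int64 in decimal notation. *)
let big = Scanf.sscanf "9000000000" "%Ld" (fun n -> n)

(* %[0-9] stops at the first non-digit; %s then reads the rest. *)
let digits, rest = Scanf.sscanf "123abc" "%[0-9]%s" (fun d r -> (d, r))

(* %{ %} reads a quoted format string whose type must match %i. *)
let fmt_text =
  Scanf.sscanf "fmt:\"number is %u\"" "fmt: %{%i%}" string_of_format
```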

Following the % character that introduces a conversion, there may be the
special flag _: the conversion that follows occurs as usual, but the
resulting value is discarded. For instance, if f is the function
fun i -> i + 1, and s is the string "x = 1", then Scanf.sscanf s
"%_s = %i" f returns 2.

The field width is composed of an optional integer literal indicating the
maximal width of the token to read. For instance, %6d reads an integer
having at most 6 decimal digits; %4f reads a float with at most 4
characters; and %8[\000-\255] returns the next 8 characters (or all the
characters still available, if fewer than 8 characters are available in
the input).
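The _ flag and field widths, sketched on made-up inputs. The first call is the example from the text. The others show a width splitting a run of digits and a width-limited range that accepts any character. All binding names are illustrative.

```ocaml
(* The example from the text: %_s reads "x" and discards it. *)
let label_skipped = Scanf.sscanf "x = 1" "%_s = %i" (fun i -> i + 1)

(* A width of 2 makes each %2d stop after two digits. *)
let first_two, last_two = Scanf.sscanf "1234" "%2d%2d" (fun a b -> (a, b))

(* %3[\000-\255] returns the next three characters, spaces included. *)
let first3 = Scanf.sscanf "ab cd" "%3[\000-\255]" (fun s -> s)
```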

Notes:

- as mentioned above, a %s conversion always succeeds, even if there is
nothing to read in the input: in this case, it simply returns "".

- in addition to the relevant digits, '_' characters may appear inside
numbers (this is reminiscent of the usual OCaml lexical conventions). If
stricter scanning is desired, use the range conversion facility instead
of the number conversions.

- the scanf facility is not intended for heavy-duty lexical analysis and
parsing. If it does not seem expressive enough for your needs, several
alternatives exist: regular expressions (module Str), stream parsers,
ocamllex-generated lexers, and ocamlyacc-generated parsers.
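The difference between number conversions and range conversions on '_' can be seen directly. In this sketch, the input "1_000" and the binding names are illustrative.

```ocaml
(* Number conversions skip '_' separators inside digits. *)
let lenient = Scanf.sscanf "1_000" "%d" (fun n -> n)

(* A range conversion is stricter: %[0-9] stops at the '_'. *)
let strict_digits, leftover =
  Scanf.sscanf "1_000" "%[0-9]%s" (fun d r -> (d, r))
```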

Scanning indications in format strings
Scanning indications appear just after the string conversions %s and
%[ range ] to delimit the end of the token. A scanning indication is
introduced by a @ character, followed by some plain character c. It means
that the string token should end just before the next matching c (which
is skipped). If no c character is encountered, the string token spreads
as much as possible. For instance, "%s@\t" reads a string up to the next
tab character or to the end of input. If a @ character appears anywhere
else in the format string, it is treated as a plain character.

Note:

- As usual in format strings, % and @ characters must be escaped using
%% and %@; this rule still holds within range specifications and
scanning indications. For instance, format "%s@%%" reads a string up to
the next % character, and format "%s@%@" reads a string up to the next @.

- The scanning indications introduce slight differences in the syntax of
Scanf format strings, compared to those used for the Printf module.
However, the scanning indications are similar to those used in the Format
module; hence, when producing formatted text to be scanned by
Scanf.bscanf, it is wise to use printing functions from the Format module
(or, if you need to use functions from Printf, avoid or carefully double
check the format strings that contain '@' characters).
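Scanning indications sketched on made-up inputs: a tab-separated pair (the "%s@\t" case from the text), a host:port string, and a string where the stopper never appears, so the token spreads to the end of input. The binding names are illustrative.

```ocaml
(* "%s@\t" reads up to the tab, which is skipped. *)
let key, value = Scanf.sscanf "name\tAda" "%s@\t%s" (fun k v -> (k, v))

(* "%s@:" stops at the colon; %d then reads the port number. *)
let host, port = Scanf.sscanf "localhost:8080" "%s@:%d" (fun h p -> (h, p))

(* With no ':' present, the token spreads to the end of input. *)
let whole = Scanf.sscanf "no-colon-here" "%s@:" (fun s -> s)
```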

Exceptions during scanning
Scanners may raise the following exceptions when the input cannot be read
according to the format string:

- Raise Scanf.Scan_failure if the input does not match the format.

- Raise Failure if a conversion to a number is not possible.

- Raise End_of_file if the end of input is encountered while more
characters are needed to read the current conversion specification.

- Raise Invalid_argument if the format string is invalid.

Note:

- as a consequence, scanning a %s conversion never raises exception
End_of_file: if the end of input is reached, the conversion succeeds and
simply returns the characters read so far, or "" if none were ever read.
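A sketch of how a caller might distinguish two of the exceptions listed above. The outcome type and read_int are hypothetical names. The last binding shows the %s behaviour from the note: on empty input it returns "" instead of raising End_of_file.

```ocaml
type outcome = Value of int | Mismatch | Eof

(* Hypothetical helper mapping each failure mode to a constructor. *)
let read_int (s : string) : outcome =
  try Value (Scanf.sscanf s "%d" (fun n -> n)) with
  | Scanf.Scan_failure _ -> Mismatch
  | End_of_file -> Eof

(* %s never raises End_of_file: on empty input it returns "". *)
let empty = Scanf.sscanf "" "%s" (fun s -> s)
```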

Specialised formatted input functions
val sscanf : string -> ('a, 'b, 'c, 'd) scanner

Same as Scanf.bscanf, but reads from the given string.

val scanf : ('a, 'b, 'c, 'd) scanner

Same as Scanf.bscanf, but reads from the predefined formatted input
channel Scanf.Scanning.stdin that is connected to stdin.

val kscanf : Scanning.in_channel -> (Scanning.in_channel -> exn -> 'd) -> ('a, 'b, 'c, 'd) scanner

Same as Scanf.bscanf, but takes an additional function argument ef that
is called in case of error: if the scanning process or some conversion
fails, the scanning function aborts and calls the error handling function
ef with the formatted input channel and the exception that aborted the
scanning process as arguments.

val ksscanf : string -> (Scanning.in_channel -> exn -> 'd) -> ('a, 'b, 'c, 'd) scanner

Same as Scanf.kscanf, but reads from the given string.

Since 4.02.0
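With ksscanf, the error handler replaces exception handling at the call site. In this sketch, int_or is a hypothetical helper whose handler ignores both the channel and the exception and returns a default value.

```ocaml
(* Hypothetical helper: the handler ignores the channel and the
   exception and returns the default instead of raising. *)
let int_or (default : int) (s : string) : int =
  Scanf.ksscanf s (fun _ib _exn -> default) "%d" (fun n -> n)
```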

Reading format strings from input
val bscanf_format : Scanning.in_channel -> ('a, 'b, 'c, 'd, 'e, 'f) format6 -> (('a, 'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g

bscanf_format ic fmt f reads a format string token from the formatted
input channel ic, according to the given format string fmt, and applies f
to the resulting format string value.

Since 3.09.0

Raises Scan_failure if the format string value read does not have the
same type as fmt.

val sscanf_format : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 -> (('a, 'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g

Same as Scanf.bscanf_format, but reads from the given string.

Since 3.09.0

val format_from_string : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 -> ('a, 'b, 'c, 'd, 'e, 'f) format6

format_from_string s fmt converts a string argument to a format string,
according to the given format string fmt.

Since 3.10.0

Raises Scan_failure if s, considered as a format string, does not have
the same type as fmt.
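A sketch of the functions above; the strings and binding names are illustrative. Note that sscanf_format expects the format token as a quoted OCaml string in its input, while format_from_string takes the raw format text.

```ocaml
(* format_from_string checks "%d apples" against the type of "%d". *)
let apples = Printf.sprintf (Scanf.format_from_string "%d apples" "%d") 3

(* sscanf_format expects the format as a quoted string in its input. *)
let items =
  Scanf.sscanf_format "\"%d items\"" "%d" (fun fmt -> Printf.sprintf fmt 5)

(* A string whose type differs from "%d" is rejected with Scan_failure. *)
let rejected =
  try ignore (Scanf.format_from_string "%s" "%d"); false
  with Scanf.Scan_failure _ -> true
```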

val unescaped : string -> string

unescaped s returns a copy of s with escape sequences (according to the
lexical conventions of OCaml) replaced by their corresponding special
characters. More precisely, Scanf.unescaped has the following property:
for all strings s, Scanf.unescaped (String.escaped s) = s.

Always returns a copy of the argument, even if there is no escape
sequence in the argument.

Since 4.00.0

Raises Scan_failure if s is not properly escaped (i.e. s has invalid
escape sequences or special characters that are not properly escaped).
For instance, Scanf.unescaped "\"" will fail.
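The round-trip property and the failure case from the text, as a small sketch. round_trip and the sample strings are illustrative.

```ocaml
(* Backslash-t in the argument becomes a real tab character. *)
let decoded = Scanf.unescaped "a\\tb"

(* The documented round-trip property with String.escaped. *)
let round_trip (s : string) : bool = Scanf.unescaped (String.escaped s) = s

(* A lone double quote is not properly escaped, so it is rejected. *)
let bad_input_rejected =
  try ignore (Scanf.unescaped "\""); false
  with Scanf.Scan_failure _ -> true
```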

Deprecated
val fscanf : in_channel -> ('a, 'b, 'c, 'd) scanner

Deprecated.

Scanf.fscanf is error prone and deprecated since 4.03.0.

This function violates the following invariant of the Scanf module: to
preserve scanning semantics, all scanning functions defined in Scanf must
read from a user-defined Scanf.Scanning.in_channel formatted input
channel.

If you need to read from an in_channel input channel ic, simply define a
Scanf.Scanning.in_channel formatted input channel as in let ib =
Scanning.from_channel ic, then use Scanf.bscanf ib as usual.
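A sketch of the recommended replacement for fscanf: open a regular in_channel, wrap it with Scanf.Scanning.from_channel, and scan it with bscanf. To stay self-contained, it first writes a temporary file with Filename.temp_file. The helper read_pair_from_file and the file contents are hypothetical.

```ocaml
(* Hypothetical helper: write "3 4" to a temporary file, then read it
   back through a Scanf.Scanning.in_channel built from an in_channel. *)
let read_pair_from_file () : int * int =
  let path = Filename.temp_file "scanf_demo" ".txt" in
  let oc = open_out path in
  output_string oc "3 4\n";
  close_out oc;
  (* Wrap the regular in_channel in a formatted input channel. *)
  let ic = open_in path in
  let ib = Scanf.Scanning.from_channel ic in
  let pair = Scanf.bscanf ib "%d %d" (fun a b -> (a, b)) in
  close_in ic;
  Sys.remove path;
  pair
```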

val kfscanf : in_channel -> (Scanning.in_channel -> exn -> 'd) -> ('a, 'b, 'c, 'd) scanner

Deprecated.

Scanf.kfscanf is error prone and deprecated since 4.03.0.

OCamldoc                     2020-09-01                     Stdlib.Scanf(3)