Scanf(3) OCaml library Scanf(3)

Scanf - Formatted input functions.

Module Scanf

Module Scanf
 : sig end

Formatted input functions.

Introduction
Functional input with format strings
The module Scanf provides formatted input functions or scanners.

The formatted input functions can read from any kind of input, including strings, files, or anything that can return characters. The most general source of characters is named a formatted input channel (or scanning buffer) and has type Scanf.Scanning.in_channel. The most general formatted input function reads from any scanning buffer and is named bscanf.

Generally speaking, the formatted input functions have 3 arguments:

- the first argument is a source of characters for the input,

- the second argument is a format string that specifies the values to read,

- the third argument is a receiver function that is applied to the values read.

Hence, a typical call to the formatted input function Scanf.bscanf is bscanf ic fmt f, where:

- ic is a source of characters (typically a formatted input channel with type Scanf.Scanning.in_channel),

- fmt is a format string (the same format strings as those used to print material with module Printf or Format),

- f is a function that has as many arguments as the number of values to read in the input according to fmt.

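The call shape above can be sketched as follows. This is a minimal illustration, assuming an in-memory source built with Scanf.Scanning.from_string instead of a file or the keyboard; the names ib, sum and word are illustrative. It also shows that a formatted input channel keeps its position between successive bscanf calls.

```ocaml
(* A formatted input channel over an in-memory string. *)
let ib = Scanf.Scanning.from_string "10 32 hello"

(* One receiver argument per conversion: two ints here. *)
let sum = Scanf.bscanf ib "%d %d" (fun a b -> a + b)

(* The channel remembers its position, so this call starts right after "32";
   the leading space in the format skips the whitespace that is left over. *)
let word = Scanf.bscanf ib " %s" (fun s -> s)

let () = Printf.printf "sum = %d, word = %s\n" sum word
```
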
A simple example
As suggested above, the expression bscanf ic "%d" f reads a decimal integer n from the source of characters ic and returns f n.

For instance,

- if we use stdin as the source of characters (Scanf.Scanning.stdin is the predefined formatted input channel that reads from standard input),

- if we define the receiver f as let f x = x + 1,

then bscanf Scanning.stdin "%d" f reads an integer n from the standard input and returns f n (that is n + 1). Thus, if we evaluate bscanf Scanning.stdin "%d" f, and then enter 41 at the keyboard, the result we get is 42.

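The same computation can be reproduced without keyboard input by reading from a string with Scanf.sscanf (described further below), which behaves like bscanf but takes a string as its source. The binding name result is illustrative.

```ocaml
(* The receiver from the text: add one to the integer read. *)
let f x = x + 1

(* Same shape as bscanf Scanning.stdin "%d" f, but the "keyboard input"
   is the string "41", so the example runs unattended. *)
let result = Scanf.sscanf "41" "%d" f

let () = print_int result; print_newline ()
```
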
Formatted input as a functional feature
The OCaml scanning facility is reminiscent of the corresponding C feature. However, it is also largely different, simpler, and yet more powerful. The formatted input functions are higher-order functionals, and parameters are passed by regular function application, not by the variable-assignment mechanism that is typical for formatted input in imperative languages. The OCaml format strings also feature useful additions to easily define complex tokens. As expected within a functional programming language, the formatted input functions also support polymorphism, in particular arbitrary interaction with polymorphic user-defined scanners. Furthermore, the OCaml formatted input facility is fully type-checked at compile time.

Formatted input channel
module Scanning : sig end

Type of formatted input functions
type ('a, 'b, 'c, 'd) scanner = ('a, Scanning.in_channel, 'b, 'c, 'a -> 'd, 'd) format6 -> 'c

The type of formatted input scanners: ('a, 'b, 'c, 'd) scanner is the type of a formatted input function that reads from some formatted input channel according to some format string. More precisely, if scan is some formatted input function, then scan ic fmt f applies f to all the arguments specified by format string fmt, when scan has read those arguments from the Scanf.Scanning.in_channel formatted input channel ic.

For instance, the Scanf.scanf function below has type ('a, 'b, 'c, 'd) scanner, since it is a formatted input function that reads from Scanf.Scanning.stdin: scanf fmt f applies f to the arguments specified by fmt, reading those arguments from stdin as expected.

If the format fmt has some %r conversions, the corresponding formatted input functions must be provided before receiver function f. For instance, if read_elem is an input function for values of type t, then bscanf ic "%r;" read_elem f reads a value v of type t followed by a ';' character, and returns f v.

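A sketch of the %r mechanism. The text only says read_elem reads values of some type t; here, as an assumption, t is int and read_elem reads one integer after optional whitespace. The binding name doubled is illustrative.

```ocaml
(* Illustrative read_elem: skip optional whitespace, then read an int.
   Any function of type Scanf.Scanning.in_channel -> 'a can be a %r reader. *)
let read_elem ib = Scanf.bscanf ib " %d" (fun n -> n)

(* "%r;" hands the scanning buffer to read_elem, then matches the ';'.
   The reader argument comes before the final receiver. *)
let doubled = Scanf.sscanf " 7;" "%r;" read_elem (fun x -> x * 2)

let () = Printf.printf "%d\n" doubled
```
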
Since 3.10.0

exception Scan_failure of string

When the input cannot be read according to the format string specification, formatted input functions typically raise exception Scan_failure.

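A minimal sketch of handling this exception, assuming the caller prefers an option to an exception. parse_int is a hypothetical helper; it only catches Scan_failure, so the other exceptions listed under Exceptions during scanning (for instance End_of_file on empty input) still propagate.

```ocaml
(* Hypothetical helper: map a Scan_failure to None. *)
let parse_int s =
  try Some (Scanf.sscanf s "%d" (fun n -> n))
  with Scanf.Scan_failure _ -> None

let () =
  List.iter
    (fun s ->
      match parse_int s with
      | Some n -> Printf.printf "%S -> %d\n" s n
      | None -> Printf.printf "%S -> no integer\n" s)
    [ "123"; "abc" ]
```
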
The general formatted input function
val bscanf : Scanning.in_channel -> ('a, 'b, 'c, 'd) scanner

bscanf ic fmt r1 ... rN f reads characters from the Scanf.Scanning.in_channel formatted input channel ic and converts them to values according to format string fmt. As a final step, receiver function f is applied to the values read and gives the result of the bscanf call.

For instance, if f is the function fun s i -> i + 1, then Scanf.sscanf "x = 1" "%s = %i" f returns 2.

Arguments r1 to rN are user-defined input functions that read the argument corresponding to the %r conversions specified in the format string.

Format string description
The format string is a character string which contains three types of objects:

- plain characters, which are simply matched with the characters of the input (with a special case for space and line feed, see The space character in format strings),

- conversion specifications, each of which causes reading and conversion of one argument for the function f (see Conversion specifications in format strings),

- scanning indications to specify boundaries of tokens (see Scanning indications in format strings).

The space character in format strings
As mentioned above, a plain character in the format string is just matched with the next character of the input; however, two characters are special exceptions to this rule: the space character (' ' or ASCII code 32) and the line feed character ('\n' or ASCII code 10). A space does not match a single space character, but any amount of 'whitespace' in the input. More precisely, a space inside the format string matches any number of tab, space, line feed and carriage return characters. Similarly, a line feed character in the format string matches either a single line feed or a carriage return followed by a line feed.

Matching any amount of whitespace, a space in the format string also matches no amount of whitespace at all; hence, the call bscanf ib "Price = %d $" (fun p -> p) succeeds and returns 1 when reading an input with various whitespace in it, such as "Price = 1 $", "Price   =  1    $", or even "Price=1$".

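The example above can be checked with sscanf, which reads from a string instead of a scanning buffer ib; price is a hypothetical helper name. The last input additionally uses a tab and a line feed, which a format space is described above as matching.

```ocaml
(* One format, several spellings of the same input: each space in the
   format matches zero or more spaces, tabs, line feeds or carriage returns. *)
let price s = Scanf.sscanf s "Price = %d $" (fun p -> p)

let () =
  List.iter
    (fun s -> Printf.printf "%S -> %d\n" s (price s))
    [ "Price = 1 $"; "Price   =  1    $"; "Price=1$"; "Price =\t1\n$" ]
```
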
Conversion specifications in format strings
Conversion specifications consist of the % character, followed by an optional flag, an optional field width, and followed by one or two conversion characters.

The conversion characters and their meanings are:

- d : reads an optionally signed decimal integer ([0-9]+).

- i : reads an optionally signed integer (usual input conventions for decimal ([0-9]+), hexadecimal (0x[0-9a-f]+ and 0X[0-9A-F]+), octal (0o[0-7]+), and binary (0b[0-1]+) notations are understood).

- u : reads an unsigned decimal integer.

- x or X : reads an unsigned hexadecimal integer ([0-9a-fA-F]+).

- o : reads an unsigned octal integer ([0-7]+).

- s : reads a string argument that spreads as much as possible, until the following bounding condition holds:

  - a whitespace character has been found (see The space character in format strings),

  - a scanning indication (see Scanning indications in format strings) has been encountered,

  - the end-of-input has been reached.

  Hence, this conversion always succeeds: it returns an empty string if the bounding condition holds when the scan begins.

- S : reads a delimited string argument (delimiters and special escaped characters follow the lexical conventions of OCaml).

- c : reads a single character. To test the current input character without reading it, specify a null field width, i.e. use specification %0c. Raise Invalid_argument if the field width specification is greater than 1.

- C : reads a single delimited character (delimiters and special escaped characters follow the lexical conventions of OCaml).

- f, e, E, g, G : reads an optionally signed floating-point number in decimal notation, in the style dddd.ddd e/E+-dd.

- h, H : reads an optionally signed floating-point number in hexadecimal notation.

- F : reads a floating-point number according to the lexical conventions of OCaml (hence the decimal point is mandatory if the exponent part is not mentioned).

- B : reads a boolean argument (true or false).

- b : reads a boolean argument (for backward compatibility; do not use in new programs).

- ld, li, lu, lx, lX, lo : reads an int32 argument in the format specified by the second letter for regular integers.

- nd, ni, nu, nx, nX, no : reads a nativeint argument in the format specified by the second letter for regular integers.

- Ld, Li, Lu, Lx, LX, Lo : reads an int64 argument in the format specified by the second letter for regular integers.

- [ range ] : reads characters that match one of the characters mentioned in the range of characters range (or not mentioned in it, if the range starts with ^). The string read can be empty if the next input character does not match the range. The set of characters from c1 to c2 (inclusively) is denoted by c1-c2. Hence, %[0-9] returns a string representing a decimal number or an empty string if no decimal digit is found; similarly, %[0-9a-f] returns a string of hexadecimal digits. If a closing bracket appears in a range, it must occur as the first character of the range (or just after the ^ in case of range negation); hence []] matches a ] character and [^]] matches any character that is not ]. Use %% and %@ to include a % or a @ in a range.

- r : user-defined reader. Takes the next ri formatted input function and applies it to the current scanning buffer to read the next argument. The input function ri must therefore have type Scanning.in_channel -> 'a and the argument read has type 'a.

- { fmt %} : reads a format string argument. The format string read must have the same type as the format string specification fmt. For instance, "%{ %i %}" reads any format string that can read a value of type int; hence, if s is the string "fmt:\"number is %u\"", then Scanf.sscanf s "fmt: %{%i%}" succeeds and returns the format string "number is %u".

- ( fmt %) : scanning sub-format substitution. Reads a format string rf in the input, then goes on scanning with rf instead of scanning with fmt. The format string rf must have the same type as the format string specification fmt that it replaces. For instance, "%( %i %)" reads any format string that can read a value of type int. The conversion returns the format string read rf, and then a value read using rf. Hence, if s is the string "\"%4d\"1234.00", then Scanf.sscanf s "%(%i%)" (fun fmt i -> fmt, i) evaluates to ("%4d", 1234). This behaviour is not mere format substitution, since the conversion returns the format string read as an additional argument. If you need pure format substitution, use the special flag _ to discard the extraneous argument: conversion %_( fmt %) reads a format string rf and then behaves the same as format string rf. Hence, if s is the string "\"%4d\"1234.00", then Scanf.sscanf s "%_(%i%)" is simply equivalent to Scanf.sscanf "1234.00" "%4d".

- l : returns the number of lines read so far.

- n : returns the number of characters read so far.

- N or L : returns the number of tokens read so far.

- ! : matches the end of input condition.

- % : matches one % character in the input.

- @ : matches one @ character in the input.

- , : does nothing.

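A few of the conversions above, exercised on string inputs with sscanf. The inputs and binding names are illustrative choices, not part of the library.

```ocaml
(* %i accepts 0x, 0o and 0b prefixes; %x reads bare hexadecimal digits. *)
let from_i = Scanf.sscanf "0x1F" "%i" (fun n -> n)
let from_x = Scanf.sscanf "ff" "%x" (fun n -> n)

(* %S reads an OCaml string literal, so spaces inside the quotes are kept. *)
let quoted = Scanf.sscanf "\"hello world\"" "%S" (fun s -> s)

(* %B reads the words true or false. *)
let flag = Scanf.sscanf "true" "%B" (fun b -> b)

(* %[a-z] stops at the first character outside the range; %d reads the rest. *)
let letters, number = Scanf.sscanf "abc123" "%[a-z]%d" (fun l n -> (l, n))

(* The l and L prefixes select int32 and int64 results. *)
let small = Scanf.sscanf "7" "%ld" (fun n -> n)
let big = Scanf.sscanf "9000000000" "%Ld" (fun n -> n)

(* %f reads a decimal floating-point number. *)
let ratio = Scanf.sscanf "2.5" "%f" (fun x -> x)
```
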
Following the % character that introduces a conversion, there may be the special flag _: the conversion that follows occurs as usual, but the resulting value is discarded. For instance, if f is the function fun i -> i + 1, and s is the string "x = 1", then Scanf.sscanf s "%_s = %i" f returns 2.

The field width is composed of an optional integer literal indicating the maximal width of the token to read. For instance, %6d reads an integer having at most 6 decimal digits; %4f reads a float with at most 4 characters; and %8[\000-\255] returns the next 8 characters (or all the characters still available, if fewer than 8 characters are available in the input).

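The _ flag and field widths, sketched with sscanf. The first call is the example from the text; the other inputs and the names first, second and prefix are illustrative additions.

```ocaml
(* Example from the text: %_s reads "x" but passes nothing on,
   so the receiver takes only the int read by %i. *)
let two = Scanf.sscanf "x = 1" "%_s = %i" (fun i -> i + 1)

(* Field widths split a run of digits that has no separator. *)
let first, second = Scanf.sscanf "12345678" "%4d%4d" (fun a b -> (a, b))

(* %3s reads at most three characters. *)
let prefix = Scanf.sscanf "abcdef" "%3s" (fun s -> s)
```
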
Notes:

- as mentioned above, a %s conversion always succeeds, even if there is nothing to read in the input: in this case, it simply returns "".

- in addition to the relevant digits, '_' characters may appear inside numbers (this is reminiscent of the usual OCaml lexical conventions). If stricter scanning is desired, use the range conversion facility instead of the number conversions.

- the scanf facility is not intended for heavy duty lexical analysis and parsing. If it appears not expressive enough for your needs, several alternatives exist: regular expressions (module Str), stream parsers, ocamllex-generated lexers, ocamlyacc-generated parsers.

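The note on '_' inside numbers can be illustrated by contrasting a number conversion with a range conversion on the same input. The names lenient, strict and leftover are illustrative.

```ocaml
(* %d skips '_' between digits, so "1_000" is read as 1000. *)
let lenient = Scanf.sscanf "1_000" "%d" (fun n -> n)

(* A digit range stops at the first '_' and leaves it unread;
   %s then collects the remainder. *)
let strict, leftover = Scanf.sscanf "1_000" "%[0-9]%s" (fun d r -> (d, r))
```
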
Scanning indications in format strings
Scanning indications appear just after the string conversions %s and %[ range ] to delimit the end of the token. A scanning indication is introduced by a @ character, followed by some plain character c. It means that the string token should end just before the next matching c (which is skipped). If no c character is encountered, the string token spreads as much as possible. For instance, "%s@\t" reads a string up to the next tab character or to the end of input. If a @ character appears anywhere else in the format string, it is treated as a plain character.

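A short sketch of scanning indications with sscanf; the inputs and names are illustrative. It shows that a token ended by an indication may contain spaces, and that it spreads to the end of input when the stop character never appears.

```ocaml
(* "%s@\t" reads up to the next tab, spaces included, and skips the tab.
   A plain %s would have stopped at the space after "Ada". *)
let name, year =
  Scanf.sscanf "Ada Lovelace\t1815" "%s@\t%d" (fun n y -> (n, y))

(* No ':' in the input: the token spreads to the end of input. *)
let whole = Scanf.sscanf "no colon here" "%s@:" (fun s -> s)
```
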
Note:

- As usual in format strings, % and @ characters must be escaped using %% and %@; this rule still holds within range specifications and scanning indications. For instance, format "%s@%%" reads a string up to the next % character, and format "%s@%@" reads a string up to the next @.

- The scanning indications introduce slight differences in the syntax of Scanf format strings, compared to those used for the Printf module. However, the scanning indications are similar to those used in the Format module; hence, when producing formatted text to be scanned by Scanf.bscanf, it is wise to use printing functions from the Format module (or, if you need to use functions from Printf, banish or carefully double check the format strings that contain '@' characters).

Exceptions during scanning
Scanners may raise the following exceptions when the input cannot be read according to the format string:

- Raise Scanf.Scan_failure if the input does not match the format.

- Raise Failure if a conversion to a number is not possible.

- Raise End_of_file if the end of input is encountered while some more characters are needed to read the current conversion specification.

- Raise Invalid_argument if the format string is invalid.

Note:

- as a consequence, scanning a %s conversion never raises exception End_of_file: if the end of input is reached the conversion succeeds and simply returns the characters read so far, or "" if none were ever read.

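The asymmetry described in the note can be observed directly. This sketch relies on the rule listed above that a conversion needing more characters at end of input raises End_of_file; empty_string and missing_int are illustrative names.

```ocaml
(* %s on empty input: no exception, just "". *)
let empty_string = Scanf.sscanf "" "%s" (fun s -> s)

(* %d needs at least one character, so on empty input it raises End_of_file,
   which is caught here and turned into None. *)
let missing_int =
  try Some (Scanf.sscanf "" "%d" (fun n -> n)) with End_of_file -> None
```
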
Specialised formatted input functions
val sscanf : string -> ('a, 'b, 'c, 'd) scanner

Same as Scanf.bscanf, but reads from the given string.

val scanf : ('a, 'b, 'c, 'd) scanner

Same as Scanf.bscanf, but reads from the predefined formatted input channel Scanf.Scanning.stdin that is connected to stdin.

val kscanf : Scanning.in_channel -> (Scanning.in_channel -> exn -> 'd) -> ('a, 'b, 'c, 'd) scanner

Same as Scanf.bscanf, but takes an additional function argument ef that is called in case of error: if the scanning process or some conversion fails, the scanning function aborts and calls the error handling function ef with the formatted input channel and the exception that aborted the scanning process as arguments.

val ksscanf : string -> (Scanning.in_channel -> exn -> 'd) -> ('a, 'b, 'c, 'd) scanner

Same as Scanf.kscanf but reads from the given string.

Since 4.02.0

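kscanf and ksscanf make it possible to turn scanning errors into ordinary values. A minimal sketch, where int_opt is a hypothetical helper whose error handler ef ignores the buffer and the exception it receives and returns None.

```ocaml
(* The error handler receives the scanning buffer and the exception;
   this one ignores both and returns None. *)
let int_opt s = Scanf.ksscanf s (fun _ _ -> None) "%d" (fun n -> Some n)

let () =
  List.iter
    (fun s ->
      Printf.printf "%S -> %s\n" s
        (match int_opt s with Some n -> string_of_int n | None -> "error"))
    [ "42"; "abc"; "" ]
```
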
Reading format strings from input
val bscanf_format : Scanning.in_channel -> ('a, 'b, 'c, 'd, 'e, 'f) format6 -> (('a, 'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g

bscanf_format ic fmt f reads a format string token from the formatted input channel ic, according to the given format string fmt, and applies f to the resulting format string value. Raise Scanf.Scan_failure if the format string value read does not have the same type as fmt.

Since 3.09.0

val sscanf_format : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 -> (('a, 'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g

Same as Scanf.bscanf_format, but reads from the given string.

Since 3.09.0

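A sketch of sscanf_format. The format string token is read as a delimited string, so the input is written as an OCaml string literal (quotes included) whose contents have the same type as the given format. message is an illustrative name.

```ocaml
(* The token "%d apples" has the same type as "%d" (one int argument),
   so it can be used directly with Printf.sprintf. *)
let message =
  Scanf.sscanf_format "\"%d apples\"" "%d" (fun fmt -> Printf.sprintf fmt 3)

let () = print_endline message
```
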
val format_from_string : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 -> ('a, 'b, 'c, 'd, 'e, 'f) format6

format_from_string s fmt converts a string argument to a format string, according to the given format string fmt. Raise Scanf.Scan_failure if s, considered as a format string, does not have the same type as fmt.

Since 3.10.0

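format_from_string is convenient when a format is only known at run time. The sketch below (line and mismatch_rejected are illustrative names) turns a string into a typed format and checks that a string of a different type is rejected with Scan_failure, as stated above.

```ocaml
(* "%d" fixes the expected type: the string must describe a format
   whose only argument is an int. *)
let line =
  let fmt = Scanf.format_from_string "Total: %d items" "%d" in
  Printf.sprintf fmt 12

(* "%s" expects a string argument, not an int, so it is rejected. *)
let mismatch_rejected =
  try ignore (Scanf.format_from_string "%s" "%d"); false
  with Scanf.Scan_failure _ -> true
```
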
val unescaped : string -> string

unescaped s returns a copy of s with escape sequences (according to the lexical conventions of OCaml) replaced by their corresponding special characters. More precisely, Scanf.unescaped has the following property: for all strings s, Scanf.unescaped (String.escaped s) = s.

Always returns a copy of the argument, even if there is no escape sequence in the argument. Raise Scanf.Scan_failure if s is not properly escaped (i.e. s has invalid escape sequences or special characters that are not properly escaped). For instance, Scanf.unescaped "\"" will fail.

Since 4.00.0

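The properties stated above, sketched as small checks; the binding names are illustrative.

```ocaml
(* The four-character input a, backslash, t, b becomes a, TAB, b. *)
let unescaped_tab = Scanf.unescaped "a\\tb"

(* Round trip through String.escaped, per the property stated above. *)
let round_trips s = Scanf.unescaped (String.escaped s) = s

(* A lone double quote is not a properly escaped string. *)
let lone_quote_rejected =
  try ignore (Scanf.unescaped "\""); false
  with Scanf.Scan_failure _ -> true
```
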
Deprecated
val fscanf : in_channel -> ('a, 'b, 'c, 'd) scanner

Deprecated.

Scanf.fscanf is error prone and deprecated since 4.03.0.

This function violates the following invariant of the Scanf module: to preserve scanning semantics, all scanning functions defined in Scanf must read from a user defined Scanf.Scanning.in_channel formatted input channel.

If you need to read from an in_channel input channel ic, simply define a Scanf.Scanning.in_channel formatted input channel as in let ib = Scanning.from_channel ic, then use Scanf.bscanf ib as usual.

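The recommended replacement for fscanf, sketched end to end. To stay self-contained, the example first writes a temporary file with Filename.temp_file; the file contents and the names path and total are assumptions. The ordinary channel is wrapped once with Scanning.from_channel and the same ib is reused for every read.

```ocaml
(* Write a small temporary input file. *)
let path = Filename.temp_file "scanf_demo" ".txt"

let () =
  let oc = open_out path in
  output_string oc "10 20\n";
  close_out oc

(* Wrap the in_channel once, then scan it with bscanf as usual. *)
let total =
  let ic = open_in path in
  let ib = Scanf.Scanning.from_channel ic in
  let a = Scanf.bscanf ib " %d" (fun n -> n) in
  let b = Scanf.bscanf ib " %d" (fun n -> n) in
  close_in ic;
  Sys.remove path;
  a + b

let () = Printf.printf "total = %d\n" total
```

A scanning buffer may read ahead from the underlying channel, which is one likely reason for the invariant quoted above: create ib once per channel rather than once per read.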
506
507
508
509 val kfscanf : in_channel -> (Scanning.in_channel -> exn -> 'd) -> ('a,
510 'b, 'c, 'd) scanner
511
512 Deprecated.
513
514 Scanf.kfscanf is error prone and deprecated since 4.03.0.
515
516
517
518
519
520OCamldoc 2019-07-30 Scanf(3)