Stdlib.Scanf(3)                  OCaml library                 Stdlib.Scanf(3)

Stdlib.Scanf - no description

Module Stdlib.Scanf

Module Scanf
 : (module Stdlib__scanf)

Introduction
Functional input with format strings
The module Scanf provides formatted input functions or scanners.

The formatted input functions can read from any kind of input,
including strings, files, or anything that can return characters. The
most general source of characters is named a formatted input channel
(or scanning buffer) and has type Scanf.Scanning.in_channel. The most
general formatted input function reads from any scanning buffer and is
named bscanf.

Generally speaking, the formatted input functions have 3 arguments:

-the first argument is a source of characters for the input,

-the second argument is a format string that specifies the values to
read,

-the third argument is a receiver function that is applied to the
values read.

Hence, a typical call to the formatted input function Scanf.bscanf is
bscanf ic fmt f, where:

- ic is a source of characters (typically a formatted input channel
with type Scanf.Scanning.in_channel),

- fmt is a format string (the same format strings as those used to
print material with module Printf or Format),

- f is a function that has as many arguments as the number of values to
read in the input according to fmt.

A simple example
As suggested above, the expression bscanf ic "%d" f reads a decimal
integer n from the source of characters ic and returns f n.

For instance,

-if we use stdin as the source of characters (Scanf.Scanning.stdin is
the predefined formatted input channel that reads from standard input),

-if we define the receiver f as let f x = x + 1,

then bscanf Scanning.stdin "%d" f reads an integer n from the standard
input and returns f n (that is n + 1). Thus, if we evaluate bscanf
Scanning.stdin "%d" f, and then enter 41 at the keyboard, the result we
get is 42.
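
The call above can be tried without a keyboard. The following is a
minimal sketch, assuming a string-backed channel built with
Scanf.Scanning.from_string stands in for standard input; the names ic
and result are chosen for this illustration only.

```ocaml
(* Receiver: takes the integer that was read and returns its successor. *)
let f x = x + 1

(* Assumption for this sketch: a string plays the role of the keyboard.
   Scanning.from_string wraps it as a formatted input channel. *)
let ic = Scanf.Scanning.from_string "41"

(* Reads the decimal integer 41 from ic, then applies f to it. *)
let result = Scanf.bscanf ic "%d" f
```

Replacing ic with Scanf.Scanning.stdin gives the interactive version
described in the text.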

Formatted input as a functional feature
The OCaml scanning facility is reminiscent of the corresponding C
feature. However, it is also largely different, simpler, and yet more
powerful: the formatted input functions are higher-order functionals,
and the parameter passing mechanism is just regular function
application, not the variable-assignment-based mechanism which is
typical for formatted input in imperative languages; the OCaml format
strings also feature useful additions to easily define complex tokens;
as expected within a functional programming language, the formatted
input functions also support polymorphism, in particular arbitrary
interaction with polymorphic user-defined scanners. Furthermore, the
OCaml formatted input facility is fully type-checked at compile time.

Formatted input channel
module Scanning : sig end

Type of formatted input functions
type ('a, 'b, 'c, 'd) scanner = ('a, Scanning.in_channel, 'b, 'c, 'a ->
'd, 'd) format6 -> 'c

The type of formatted input scanners: ('a, 'b, 'c, 'd) scanner is the
type of a formatted input function that reads from some formatted input
channel according to some format string; more precisely, if scan is
some formatted input function, then scan ic fmt f applies f to all the
arguments specified by format string fmt, when scan has read those
arguments from the Scanf.Scanning.in_channel formatted input channel
ic.

For instance, the Scanf.scanf function below has type ('a, 'b, 'c, 'd)
scanner, since it is a formatted input function that reads from
Scanf.Scanning.stdin: scanf fmt f applies f to the arguments specified
by fmt, reading those arguments from stdin as expected.

If the format fmt has some %r indications, the corresponding formatted
input functions must be provided before receiver function f. For
instance, if read_elem is an input function for values of type t, then
bscanf ic "%r;" read_elem f reads a value v of type t followed by a ';'
character, and returns f v.

Since 3.10.0
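
The %r case can be sketched as follows. The type t, its constructor
Elem, and the body of read_elem are hypothetical, introduced only to
make the example self-contained; the point is the argument order, with
the reader passed before the receiver.

```ocaml
(* Hypothetical element type, for illustration only. *)
type t = Elem of int

(* An input function for values of type t. It has the shape required by
   %r, namely Scanning.in_channel -> t. *)
let read_elem ib = Scanf.bscanf ib " %d" (fun n -> Elem n)

let ic = Scanf.Scanning.from_string "7; rest"

(* read_elem comes before the receiver. The receiver gets the value that
   read_elem produced, once the following ';' has been matched. *)
let v = Scanf.bscanf ic "%r;" read_elem (fun v -> v)
```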

exception Scan_failure of string

When the input cannot be read according to the format string
specification, formatted input functions typically raise exception
Scan_failure.

The general formatted input function
val bscanf : Scanning.in_channel -> ('a, 'b, 'c, 'd) scanner

bscanf ic fmt r1 ... rN f reads characters from the
Scanf.Scanning.in_channel formatted input channel ic and converts them
to values according to format string fmt. As a final step, receiver
function f is applied to the values read and gives the result of the
bscanf call.

For instance, if f is the function fun s i -> i + 1, then Scanf.sscanf
"x = 1" "%s = %i" f returns 2.

Arguments r1 to rN are user-defined input functions that read the
argument corresponding to the %r conversions specified in the format
string.
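
As a small sketch of a receiver taking several values, the following
reads a name and an integer and returns them as a pair. The input text
and the name binding are made up for the illustration; the integer is
written in hexadecimal to show that %i accepts the 0x prefix.

```ocaml
(* %s reads the token "x", the plain characters " = " are matched against
   the input, and %i reads "0x1f" as a hexadecimal integer. *)
let binding = Scanf.sscanf "x = 0x1f" "%s = %i" (fun s i -> (s, i))
```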

Format string description
The format string is a character string which contains three types of
objects:

-plain characters, which are simply matched with the characters of the
input (with a special case for space and line feed, see "The space
character in format strings"),

-conversion specifications, each of which causes reading and conversion
of one argument for the function f (see "Conversion specifications in
format strings"),

-scanning indications to specify boundaries of tokens (see "Scanning
indications in format strings").

The space character in format strings
As mentioned above, a plain character in the format string is just
matched with the next character of the input; however, two characters
are special exceptions to this rule: the space character (' ' or ASCII
code 32) and the line feed character ('\n' or ASCII code 10). A space
does not match a single space character, but any amount of 'whitespace'
in the input. More precisely, a space inside the format string matches
any number of tab, space, line feed and carriage return characters.
Similarly, a line feed character in the format string matches either a
single line feed or a carriage return followed by a line feed.

Matching any amount of whitespace, a space in the format string also
matches no amount of whitespace at all; hence, the call bscanf ib
"Price = %d $" (fun p -> p) succeeds and returns 1 when reading an
input with various whitespace in it, such as "Price = 1 $",
"Price  =  1    $", or even "Price=1$".
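
The whitespace rule can be checked directly. This sketch uses sscanf on
three spellings of the same input; the helper name price is an
assumption of the example.

```ocaml
(* Each space in the format matches zero or more whitespace characters. *)
let price s = Scanf.sscanf s "Price = %d $" (fun p -> p)

let spaced = price "Price = 1 $"
let padded = price "Price  =  1    $"
let tight  = price "Price=1$"
```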

Conversion specifications in format strings
Conversion specifications consist of the % character, followed by an
optional flag, an optional field width, and followed by one or two
conversion characters.

The conversion characters and their meanings are:

- d : reads an optionally signed decimal integer ([0-9]+).

- i : reads an optionally signed integer (usual input conventions for
decimal ([0-9]+), hexadecimal (0x[0-9a-f]+ and 0X[0-9A-F]+), octal
(0o[0-7]+), and binary (0b[0-1]+) notations are understood).

- u : reads an unsigned decimal integer.

- x or X : reads an unsigned hexadecimal integer ([0-9a-fA-F]+).

- o : reads an unsigned octal integer ([0-7]+).

- s : reads a string argument that spreads as much as possible, until
one of the following bounding conditions holds:

-a whitespace has been found (see "The space character in format
strings"),

-a scanning indication (see "Scanning indications in format strings")
has been encountered,

-the end-of-input has been reached.

Hence, this conversion always succeeds: it returns an empty string if
the bounding condition holds when the scan begins.

- S : reads a delimited string argument (delimiters and special escaped
characters follow the lexical conventions of OCaml).

- c : reads a single character. To test the current input character
without reading it, specify a null field width, i.e. use specification
%0c. Raise Invalid_argument if the field width specification is
greater than 1.

- C : reads a single delimited character (delimiters and special
escaped characters follow the lexical conventions of OCaml).

- f , e , E , g , G : reads an optionally signed floating-point number
in decimal notation, in the style dddd.ddd e/E+-dd.

- h , H : reads an optionally signed floating-point number in
hexadecimal notation.

- F : reads a floating point number according to the lexical
conventions of OCaml (hence the decimal point is mandatory if the
exponent part is not mentioned).

- B : reads a boolean argument (true or false).

- b : reads a boolean argument (for backward compatibility; do not use
in new programs).

- ld , li , lu , lx , lX , lo : reads an int32 argument in the format
specified by the second letter, as for regular integers.

- nd , ni , nu , nx , nX , no : reads a nativeint argument in the
format specified by the second letter, as for regular integers.

- Ld , Li , Lu , Lx , LX , Lo : reads an int64 argument in the format
specified by the second letter, as for regular integers.

- [ range ] : reads characters that match one of the characters
mentioned in the range of characters range (or not mentioned in it, if
the range starts with ^). Reads a string that can be empty, if the next
input character does not match the range. The set of characters from c1
to c2 (inclusively) is denoted by c1-c2. Hence, %[0-9] returns a
string representing a decimal number or an empty string if no decimal
digit is found; similarly, %[0-9a-f] returns a string of hexadecimal
digits. If a closing bracket appears in a range, it must occur as the
first character of the range (or just after the ^ in case of range
negation); hence []] matches a ] character and [^]] matches any
character that is not ]. Use %% and %@ to include a % or a @ in a
range.

- r : user-defined reader. Takes the next ri formatted input function
and applies it to the scanning buffer ib to read the next argument. The
input function ri must therefore have type Scanning.in_channel -> 'a
and the argument read has type 'a.

- { fmt %} : reads a format string argument. The format string read
must have the same type as the format string specification fmt. For
instance, "%{ %i %}" reads any format string that can read a value of
type int; hence, if s is the string "fmt:\"number is %u\"", then
Scanf.sscanf s "fmt: %{%i%}" succeeds and returns the format string
"number is %u".

- ( fmt %) : scanning sub-format substitution. Reads a format string
rf in the input, then goes on scanning with rf instead of scanning with
fmt. The format string rf must have the same type as the format
string specification fmt that it replaces. For instance, "%( %i %)"
reads any format string that can read a value of type int. The
conversion returns the format string read rf, and then a value read
using rf. Hence, if s is the string "\"%4d\"1234.00", then
Scanf.sscanf s "%(%i%)" (fun fmt i -> fmt, i) evaluates to
("%4d", 1234). This behaviour is not mere format substitution, since
the conversion returns the format string read as additional argument.
If you need pure format substitution, use special flag _ to discard the
extraneous argument: conversion %_( fmt %) reads a format string rf and
then behaves the same as format string rf. Hence, if s is the string
"\"%4d\"1234.00", then Scanf.sscanf s "%_(%i%)" is simply equivalent to
Scanf.sscanf "1234.00" "%4d".

- l : returns the number of lines read so far.

- n : returns the number of characters read so far.

- N or L : returns the number of tokens read so far.

- ! : matches the end of input condition.

- % : matches one % character in the input.

- @ : matches one @ character in the input.

- , : does nothing.
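
A few of the conversions listed above can be combined in one format.
The following is only a sketch: the input strings and the names sample
and date are invented for the illustration, and the receiver simply
packs the values into a tuple.

```ocaml
(* %d decimal int, %x hexadecimal int, %f float, %B boolean,
   %S OCaml-style delimited string, %c single character. *)
let sample =
  Scanf.sscanf "42 ff 3.5 true \"a b\" z" "%d %x %f %B %S %c"
    (fun d x f b s c -> (d, x, f, b, s, c))

(* Range conversions return strings; the '-' between them is a plain
   character that must match the input. *)
let date = Scanf.sscanf "2024-06" "%[0-9]-%[0-9]" (fun y m -> (y, m))
```

Because ranges return the matched text unchanged, "06" keeps its
leading zero, which a number conversion such as %d would drop.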

Following the % character that introduces a conversion, there may be
the special flag _ : the conversion that follows occurs as usual, but
the resulting value is discarded. For instance, if f is the function
fun i -> i + 1, and s is the string "x = 1", then Scanf.sscanf s
"%_s = %i" f returns 2.

The field width is composed of an optional integer literal indicating
the maximal width of the token to read. For instance, %6d reads an
integer, having at most 6 decimal digits; %4f reads a float with at
most 4 characters; and %8[\000-\255] returns the next 8 characters (or
all the characters still available, if fewer than 8 characters are
available in the input).
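
The _ flag and the field width can be sketched together. The inputs and
the names skipped, ymd and halves are assumptions of this example.

```ocaml
(* %_s reads the token "x" and discards it, so the receiver takes
   only the integer. *)
let skipped = Scanf.sscanf "x = 1" "%_s = %i" (fun i -> i + 1)

(* Field widths split an unseparated run of digits into fixed-size tokens. *)
let ymd = Scanf.sscanf "20240615" "%4d%2d%2d" (fun y m d -> (y, m, d))

(* A width also bounds string tokens: %3s takes at most 3 characters. *)
let halves = Scanf.sscanf "abcdef" "%3s%s" (fun a b -> (a, b))
```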

Notes:

-as mentioned above, a %s conversion always succeeds, even if there is
nothing to read in the input: in this case, it simply returns "".

-in addition to the relevant digits, '_' characters may appear inside
numbers (this is reminiscent of the usual OCaml lexical conventions).
If stricter scanning is desired, use the range conversion facility
instead of the number conversions.

-the scanf facility is not intended for heavy duty lexical analysis and
parsing. If it appears not expressive enough for your needs, several
alternatives exist: regular expressions (module Str), stream parsers,
ocamllex-generated lexers, ocamlyacc-generated parsers.

Scanning indications in format strings
Scanning indications appear just after the string conversions %s and
%[ range ] to delimit the end of the token. A scanning indication is
introduced by a @ character, followed by some plain character c. It
means that the string token should end just before the next matching c
(which is skipped). If no c character is encountered, the string token
spreads as much as possible. For instance, "%s@\t" reads a string up to
the next tab character or to the end of input. If a @ character appears
anywhere else in the format string, it is treated as a plain character.

Notes:

-As usual in format strings, % and @ characters must be escaped using
%% and %@; this rule still holds within range specifications and
scanning indications. For instance, format "%s@%%" reads a string up to
the next % character, and format "%s@%@" reads a string up to the next
@.

-The scanning indications introduce slight differences in the syntax of
Scanf format strings, compared to those used for the Printf module.
However, the scanning indications are similar to those used in the
Format module; hence, when producing formatted text to be scanned by
Scanf.bscanf, it is wise to use printing functions from the Format
module (or, if you need to use functions from Printf, banish or
carefully double check the format strings that contain '@' characters).
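
A scanning indication is convenient for splitting a token at a
delimiter. This sketch uses ':' as the delimiter; the host and port
input and the names endpoint and no_delimiter are made up for the
example.

```ocaml
(* %s@: reads up to the next ':' and skips it; %d then reads the port. *)
let endpoint = Scanf.sscanf "host:8080" "%s@:%d" (fun h p -> (h, p))

(* With no ':' in the input, the string token spreads to the end of input. *)
let no_delimiter = Scanf.sscanf "localhost" "%s@:" (fun h -> h)
```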

Exceptions during scanning
Scanners may raise the following exceptions when the input cannot be
read according to the format string:

-Raise Scanf.Scan_failure if the input does not match the format.

-Raise Failure if a conversion to a number is not possible.

-Raise End_of_file if the end of input is encountered while some more
characters are needed to read the current conversion specification.

-Raise Invalid_argument if the format string is invalid.

Note:

-as a consequence, scanning a %s conversion never raises exception
End_of_file: if the end of input is reached the conversion succeeds
and simply returns the characters read so far, or "" if none were ever
read.
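
One possible way to turn these exceptions into values is sketched
below. The function name parse_int and the error messages are
assumptions of the example, and the result type assumes OCaml 4.03 or
later.

```ocaml
(* Wraps a scan so that the documented scanning exceptions become an
   Error value instead of propagating. *)
let parse_int s =
  try Ok (Scanf.sscanf s "%d" (fun n -> n)) with
  | Scanf.Scan_failure msg -> Error ("scan failure: " ^ msg)
  | Failure msg -> Error ("failure: " ^ msg)
  | End_of_file -> Error "end of input"

let good = parse_int "12"     (* a well-formed integer *)
let bad = parse_int "abc"     (* input that does not match %d *)
let empty = parse_int ""      (* no characters available at all *)
```

Invalid_argument is deliberately not caught here, since it signals an
invalid format string rather than bad input.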

Specialised formatted input functions
val sscanf : string -> ('a, 'b, 'c, 'd) scanner

Same as Scanf.bscanf, but reads from the given string.

val scanf : ('a, 'b, 'c, 'd) scanner

Same as Scanf.bscanf, but reads from the predefined formatted input
channel Scanf.Scanning.stdin that is connected to stdin.

val kscanf : Scanning.in_channel -> (Scanning.in_channel -> exn -> 'd)
-> ('a, 'b, 'c, 'd) scanner

Same as Scanf.bscanf, but takes an additional function argument ef
that is called in case of error: if the scanning process or some
conversion fails, the scanning function aborts and calls the error
handling function ef with the formatted input channel and the exception
that aborted the scanning process as arguments.

val ksscanf : string -> (Scanning.in_channel -> exn -> 'd) -> ('a, 'b,
'c, 'd) scanner

Same as Scanf.kscanf but reads from the given string.

Since 4.02.0
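
The error-handler variants avoid a try ... with around the call. A
minimal sketch, assuming a fallback value is an acceptable way to
handle a failed scan; the name int_or_default is hypothetical.

```ocaml
(* The handler receives the input channel and the exception that aborted
   the scan; both are ignored here and the fallback is returned instead. *)
let int_or_default default s =
  Scanf.ksscanf s (fun _ib _exn -> default) "%d" (fun n -> n)

let parsed = int_or_default 0 "17"
let fallback = int_or_default 0 "oops"
```

The handler and the receiver must return the same type, since either
one may produce the result of the call.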

Reading format strings from input
val bscanf_format : Scanning.in_channel -> ('a, 'b, 'c, 'd, 'e, 'f)
format6 -> (('a, 'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g

bscanf_format ic fmt f reads a format string token from the formatted
input channel ic, according to the given format string fmt, and
applies f to the resulting format string value. Raise
Scanf.Scan_failure if the format string value read does not have the
same type as fmt.

Since 3.09.0

val sscanf_format : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 -> (('a,
'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g

Same as Scanf.bscanf_format, but reads from the given string.

Since 3.09.0

val format_from_string : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 ->
('a, 'b, 'c, 'd, 'e, 'f) format6

format_from_string s fmt converts a string argument to a format string,
according to the given format string fmt. Raise Scanf.Scan_failure if
s, considered as a format string, does not have the same type as fmt.

Since 3.10.0
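
format_from_string is useful when the text of a format is only known at
run time. In this sketch the strings "value: %d" and "name: %s" stand
in for such run-time text, and the names render and mismatch_rejected
are assumptions of the example.

```ocaml
(* The second argument "%d" fixes the expected type: the run-time string
   must be a format that consumes exactly one int. *)
let render n =
  let fmt = Scanf.format_from_string "value: %d" "%d" in
  Printf.sprintf fmt n

(* A string whose conversions do not agree with "%d" is rejected. *)
let mismatch_rejected =
  try ignore (Scanf.format_from_string "name: %s" "%d"); false
  with Scanf.Scan_failure _ -> true
```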

val unescaped : string -> string

unescaped s returns a copy of s with escape sequences (according to the
lexical conventions of OCaml) replaced by their corresponding special
characters. More precisely, Scanf.unescaped has the following
property: for all string s, Scanf.unescaped (String.escaped s) = s.

Always returns a copy of the argument, even if there is no escape
sequence in the argument. Raise Scanf.Scan_failure if s is not
properly escaped (i.e. s has invalid escape sequences or special
characters that are not properly escaped). For instance,
Scanf.unescaped "\"" will fail.

Since 4.00.0
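
The behaviour of unescaped can be sketched with three small checks. The
names decoded, round_trip and rejects_bare_quote are assumptions of the
example.

```ocaml
(* The argument holds the four characters a, backslash, t, b.
   unescaped turns the backslash-t pair into a single tab character. *)
let decoded = Scanf.unescaped "a\\tb"

(* The property stated above, checked for one string. *)
let round_trip s = Scanf.unescaped (String.escaped s) = s

(* A lone double quote is not properly escaped, so the call fails. *)
let rejects_bare_quote =
  try ignore (Scanf.unescaped "\""); false
  with Scanf.Scan_failure _ -> true
```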

Deprecated
val fscanf : in_channel -> ('a, 'b, 'c, 'd) scanner

Deprecated.

Scanf.fscanf is error prone and deprecated since 4.03.0.

This function violates the following invariant of the Scanf module: To
preserve scanning semantics, all scanning functions defined in Scanf
must read from a user defined Scanf.Scanning.in_channel formatted input
channel.

If you need to read from an in_channel input channel ic, simply define
a Scanf.Scanning.in_channel formatted input channel as in let ib =
Scanning.from_channel ic, then use Scanf.bscanf ib as usual.
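
The replacement for fscanf can be sketched as follows. The function
name read_two_ints, its path argument and the two-integer file layout
are assumptions made for the illustration.

```ocaml
(* Opens a file, wraps the regular channel as a formatted input channel,
   and scans two integers from it. The file is closed on both the normal
   and the exceptional path. *)
let read_two_ints path =
  let ic = open_in path in
  let ib = Scanf.Scanning.from_channel ic in
  match Scanf.bscanf ib " %d %d" (fun a b -> (a, b)) with
  | result -> close_in ic; result
  | exception e -> close_in_noerr ic; raise e
```

Creating ib once and reusing it for every scan on the same file keeps
all reads going through one formatted input channel, which is the
invariant the deprecation note refers to.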

val kfscanf : in_channel -> (Scanning.in_channel -> exn -> 'd) -> ('a,
'b, 'c, 'd) scanner

Deprecated.

Scanf.kfscanf is error prone and deprecated since 4.03.0.

OCamldoc                         2019-07-30                    Stdlib.Scanf(3)